FOREWORD


In bygone centuries, our physical world appeared to be filled to the brim with mysteries. Divine powers 
could provide for genuine miracles; water and sunlight could turn arid land into fertile pastures, but the 
same powers could lead to miseries and disasters. The force of life, the vis vitalis, was assumed to be the 
special agent responsible for all living things. The heavens, whatever they were for, contained stars and other 
heavenly bodies that were the exclusive domain of the Gods. 

Mathematics did exist, of course. Indeed, there was one aspect of our physical world that was recognised to 
be controlled by precise, mathematical logic: the geometric structure of space, elaborated to become a genuine 
form of art by the ancient Greeks. From my perspective, the Greeks were the first practitioners of ‘mathematical 
physics’, when they discovered that all geometric features of space could be reduced to a small number of 
axioms. Today, these would be called ‘fundamental laws of physics’. The fact that the flow of time could be 
addressed with similar exactitude, and that it could be handled geometrically together with space, was only 
recognised much later. And, yes, there were a few crazy people who were interested in the magic of numbers, 
but the real world around us seemed to contain so much more that was way beyond our capacities of analysis. 

Gradually, all this changed. The Moon and the planets appeared to follow geometrical laws. Galilei and 
Newton managed to identify their logical rules of motion, and by noting that the concept of mass could be 
applied to things in the sky just like apples and cannon balls on Earth, they made the sky a little bit more 
accessible to us. Electricity, magnetism, light and sound were also found to behave in complete accordance 
with mathematical equations. 

Yet all of this was just a beginning. The real changes came with the twentieth century. A completely new 
way of thinking, by emphasizing mathematical, logical analysis rather than empirical evidence, was pioneered 
by Albert Einstein. Applying advanced mathematical concepts, only known to a few pure mathematicians, to 
notions as mundane as space and time, was new to the physicists of his time. Einstein himself had a hard 
time struggling through the logic of connections and curvatures, notions that were totally new to him, but are 
only too familiar to students of mathematical physics today. Indeed, there is no better testimony of Einstein’s 
deep insights at that time, than the fact that we now teach these things regularly in our university classrooms. 

Special and general relativity are only small corners of the realm of modern physics that is presently being 
studied using advanced mathematical methods. We have notoriously complex subjects such as phase transitions in 
condensed matter physics, superconductivity, Bose-Einstein condensation, the quantum Hall effect, particularly 
the fractional quantum Hall effect, and numerous topics from elementary particle physics, ranging from fibre 
bundles and renormalization groups to supergravity, algebraic topology, superstring theory, Calabi-Yau spaces 
and what not, all of which require the utmost of our mental skills to comprehend them. 

The most bewildering observation that we make today is that our entire physical world 
appears to be controlled by mathematical equations, and these are not just sloppy and debatable models, but 
precisely documented properties of materials, of systems, and of phenomena in all echelons of our universe. 

Does this really apply to our entire world, or only to parts of it? Do features, notions, entities exist that are 
emphatically not mathematical? What about intuition, or dreams, and what about consciousness? What 
about religion? Here, most of us would say, one should not even try to apply mathematical analysis, although 
even here, some brave social scientists are making attempts at coordinating rational approaches. 


No, there are clear and important differences between the physical world and the mathematical world. 
Where the physical world stands out is the fact that it refers to ‘reality’, whatever ‘reality’ is. Mathematics is 
the world of pure logic and pure reasoning. In physics, it is the experimental evidence that ultimately decides 
whether a theory is acceptable or not. Also, the methodology in physics is different. 

A beautiful example is the serendipitous discovery of superconductivity. In 1911, the Dutch physicist Heike 
Kamerlingh Onnes was the first to achieve the liquefaction of helium, for which a temperature below 4.25 K 
had to be realized. Heike decided to measure the specific conductivity of mercury, a metal that is frozen solid 
at such low temperatures. But something appeared to go wrong during the measurements, since the 
voltmeter did not show any voltage at all. All experienced physicists in the team assumed that they were dealing 
with a malfunction. It would not have been the first time for a short circuit to occur in the electrical 
equipment, but, this time, in spite of several efforts, they failed to locate it. One of the assistants was 
responsible for keeping the temperature of the sample well within that of liquid helium, a dull job, requiring 
nothing else than continuously watching some dials. During one of the many tests, however, he dozed off. 
The temperature rose, and suddenly the measurements showed the normal values again. It then occurred to 
the investigators that the effect and its temperature dependence were completely reproducible. Below 4.19 
degrees Kelvin the conductivity of mercury appeared to be strictly infinite. Above that temperature, it is 
finite, and the transition is a very sudden one. Superconductivity was discovered (D. van Delft, “Heike 
Kamerlingh Onnes”, Uitgeverij Bert Bakker, Amsterdam, 2005 (in Dutch)). 

This is not the way mathematical discoveries are made. Theorems are not produced by assistants falling 
asleep, even if examples do exist of incidents involving some miraculous fortune. 

The hybrid science of mathematical physics is a very curious one. Some of the topics in this Encyclopedia 
are undoubtedly physical. High-$T_c$ superconductivity, breaking water waves, and magneto-hydrodynamics 
are definitely topics of physics where experimental data are considered more decisive than any high-brow 
theory. Cohomology theory, Donaldson-Witten theory, and AdS/CFT correspondence, however, are examples 
of purely mathematical exercises, even if these subjects, like all of the others in this compilation, are strongly 
inspired by, and related to, questions posed in physics. 

It is inevitable, in a compilation of a large number of short articles with many different authors, to see quite a 
bit of variation in style and level. In this Encyclopedia, theoretical physicists as well as mathematicians together 
made a huge effort to present in a concise and understandable manner their vision on numerous important 
issues in advanced mathematical physics. All include references for further reading. We hope and expect that 
these efforts will serve a good purpose. 


Gerard ’t Hooft, 
Spinoza Institute, 
Utrecht University, 
The Netherlands. 


PREFACE 


Mathematical Physics as a distinct discipline is relatively new. The International Association of 
Mathematical Physics was founded only in 1976. The interaction between physics and mathematics 
has, of course, existed since ancient times, but the recent decades, perhaps partly because we are living 
through them, appear to have witnessed tremendous progress, yielding new results and insights at a dizzying 
pace, so much so that an encyclopedia seems now needed to collate the gathered knowledge. 

Mathematical Physics brings together the two great disciplines of Mathematics and Physics to the benefit of 
both, the relationship between them being symbiotic. On the one hand, it uses mathematics as a tool to 
organize physical ideas of increasing precision and complexity, and on the other it draws on the questions 
that physicists pose as a source of inspiration to mathematicians. A classical example of this relationship 
exists in Einstein’s theory of relativity, where differential geometry played an essential role in the formulation 
of the physical theory while the problems raised by the ensuing physics have in turn boosted the development 
of differential geometry. It is indeed a happy coincidence that we are writing now a preface to an 
encyclopedia of mathematical physics in the centenary of Einstein’s annus mirabilis. 

The project of putting together an encyclopedia of mathematical physics looked, and still looks, to us a 
formidable enterprise. We would never have had the courage to undertake such a task if we did not believe, 
first, that it is worthwhile and of benefit to the community, and second, that we would get the much-needed 
support from our colleagues. And this support we did get, in the form of advice, encouragement, and 
practical help too, from members of our Editorial Advisory Board, from our authors, and from others as well, 
who have given unstintingly so much of their time to help us shape this Encyclopedia. 

Mathematical Physics being a relatively new subject, it is not yet clearly delineated and could mean 
different things to different people. In our choice of topics, we were guided in part by the programs of recent 
International Congresses on Mathematical Physics, but mainly by the advice from our Editorial Advisory 
Board and from our authors. The limitations of space and time, as well as our own limitations, necessitated 
the omission of certain topics, but we have tried to include all that we believe to be core subjects and to cover 
as much as possible the most active areas. 

Our subject being interdisciplinary, we think it appropriate that the Encyclopedia should have certain 
special features. Applications of the same mathematical theory, for instance, to different problems in physics 
will have different emphasis and treatment. By the same token, the same problem in physics can draw upon 
resources from different mathematical fields. This is why we divide the Encyclopedia into two broad sections: 
physics subjects and related mathematical subjects. Articles in either section are deliberately allowed a fair 
amount of overlap with one another and many articles will appear under more than one heading, but all are 
linked together by elaborate cross referencing. We think this gives a better picture of the subject as a whole 
and will serve better a community of researchers from widely scattered yet related fields. 

The Encyclopedia is intended primarily for experienced researchers but should be of use also to beginning 
graduate students. For the latter category of readers, we have included eight elementary introductory articles for easy 
reference, with those on mathematics aimed at physics graduates and those on physics aimed at mathematics 
graduates, so that these articles can serve as their first port of call to enable them to embark on any of the main 
articles without the need to consult other material beforehand. In fact, we think these articles may even form the 
foundation of advanced undergraduate courses, as we know that some authors have already made such use of them. 

In addition to the printed version, an on-line version of the Encyclopedia is planned, which will allow both 
the contents and the articles themselves to be updated if and when the occasion arises. This is probably a 
necessary provision in such a rapidly advancing field. 

This project was some four years in the making. Our foremost thanks at its completion go to the members 
of our Editorial Advisory Board, who have advised, helped and encouraged us all along, and to all our 
authors who have so generously devoted so much of their time to writing these articles and given us much 
useful advice as well. We ourselves have learnt a lot from these colleagues, and made some wonderful 
contacts with some among them. Special thanks are due also to Arthur Greenspoon whose technical expertise 
was indispensable. 

The project was started with Academic Press, which was later taken over by Elsevier. We thank warmly 
members of their staff who have made this transition admirably seamless and gone on to assist us greatly in 
our task: both Carey Chapman and Anne Guillaume, who were in charge of the whole project and have been 
with us since the beginning, and Edward Taylor responsible for the copy-editing. And Martin Ruck, who 
manages to keep an overwhelming amount of details constantly at his fingertips, and who is never known to 
have lost a single email, deserves a very special mention. 

As a postscript, we would like to express our gratitude to the very large number of authors who generously 
agreed to donate their honorariums to support the Committee for Developing Countries of the European 
Mathematical Society in their work to help our less fortunate colleagues in the developing world. 


Jean-Pierre Françoise 
Gregory L. Naber 
Tsou Sheung Tsun 
GUIDE TO USE OF THE ENCYCLOPEDIA 


Structure of the Encyclopedia 


The material in this Encyclopedia is organised into two sections. At the start of Volume 1 are eight Introductory Articles. 
The introductory articles on mathematics are aimed at physics graduates; those on physics are aimed at mathematics 
graduates. It is intended that these articles should serve as the first port of call for graduate students, to enable them to 
embark on any of the main entries without the need to consult other material beforehand. 

Following the Introductory Articles, the main body of the Encyclopedia is arranged as a series of entries in alphabetical 
order. These entries fill the remainder of Volume 1 and all of the subsequent volumes (2-5). 

To help you realize the full potential of the material in the Encyclopedia we have provided four features to help you find 
the topic of your choice: a contents list by subject, an alphabetical contents list, cross-references, and a full subject index. 


1. Contents List by Subject 


Your first point of reference will probably be the contents list by subject. This list appears at the front of each volume, 
and groups the entries under subject headings describing the broad themes of mathematical physics. This will enable the 
reader to make quick connections between entries and to locate the entry of interest. The contents list by subject is divided 
into two main sections: Physics Subjects and Related Mathematics Subjects. Under each main section heading, you will 
find several subject areas (such as GENERAL RELATIVITY in Physics Subjects or NONCOMMUTATIVE GEOMETRY 
in Related Mathematics Subjects). Under each subject area is a list of those entries that cover aspects of that subject, 
together with the volume and page numbers on which these entries may be found. 

Because mathematical physics is so highly interconnected, individual entries may appear under more than one subject 
area. For example, the entry GAUGE THEORY: MATHEMATICAL APPLICATIONS is listed under the Physics Subject 
GAUGE THEORY as well as in a broad range of Related Mathematics Subjects. 


2. Alphabetical Contents List 


The alphabetical contents list, which also appears at the front of each volume, lists the entries in the order in which they 
appear in the Encyclopedia. This list provides both the volume number and the page number of the entry. 

You will find “dummy entries” where obvious synonyms exist for entries or where we have grouped together related 
topics. Dummy entries appear in both the contents list and the body of the text. 


Example 
If you were attempting to locate material on path integral methods via the alphabetical contents list: 


PATH INTEGRAL METHODS see Functional Integration in Quantum Physics; Feynman Path Integrals 


The dummy entry directs you to two other entries in which path integral methods are covered. At the appropriate 
locations in the contents list, the volume and page numbers for these entries are given. 

If you were trying to locate the material by browsing through the text and you had looked up Path Integral Methods, 
then the following information would be provided in the dummy entry: 


Path Integral Methods see Functional Integration in Quantum Physics; Feynman Path Integrals 
3. Cross-References 


All of the articles in the Encyclopedia have been extensively cross-referenced. The cross-references, which appear at the 
end of an entry, serve three different functions: 


i. To indicate if a topic is discussed in greater detail elsewhere. 
ii. To draw the reader’s attention to parallel discussions in other entries. 
iii. To indicate material that broadens the discussion. 


Example 
The following list of cross-references appears at the end of the entry STOCHASTIC HYDRODYNAMICS 


See also: Cauchy Problem for Burgers-Type Equations; Hamiltonian 
Fluid Dynamics; Incompressible Euler Equations: Mathematical Theory; 
Malliavin Calculus; Non-Newtonian Fluids; Partial Differential Equations: 
Some Examples; Stochastic Differential Equations; Turbulence Theories; 
Viscous Incompressible Fluids: Mathematical Theory; Vortex Dynamics 


Here you will find examples of all three functions of the cross-reference list: a topic discussed in greater detail elsewhere 
(e.g. Incompressible Euler Equations: Mathematical Theory), parallel discussion in other entries (e.g. Stochastic 
Differential Equations) and reference to entries that broaden the discussion (e.g. Turbulence Theories). 

The eight Introductory Articles are not cross-referenced from any of the main entries, as it is expected that introductory 
articles will be of general interest. As mentioned above, the Introductory Articles may be found at the start of Volume 1. 


4. Index 


The index will provide you with the volume and page number where the material is located. The index entries 
differentiate between material that is a whole entry, is part of an entry, or is data presented in a figure or table. Detailed 
notes are provided on the opening page of the index. 


5. Contributors 


A full list of contributors appears at the beginning of each volume. 
General Principles 


Classical mechanics is a theory of motions of point
particles. If $X = (x_1,\ldots,x_n)$ are the particle positions
in a Cartesian inertial system of coordinates, the
equations of motion are determined by their masses
$(m_1,\ldots,m_n)$, $m_j > 0$, and by the potential energy of
interaction, $V(x_1,\ldots,x_n)$, as

$$m_i \ddot{x}_i = -\partial_{x_i} V(x_1,\ldots,x_n), \qquad i = 1,\ldots,n \qquad [1]$$

here $x_i = (x_{i1},\ldots,x_{id})$ are coordinates of the $i$th
particle and $\partial_{x_i}$ is the gradient $(\partial_{x_{i1}},\ldots,\partial_{x_{id}})$; $d$ is the
space dimension (i.e., $d = 3$, usually). The potential
energy function will be supposed “smooth,” that is,
analytic except, possibly, when two positions coincide.
The latter exception is necessary to include the
important cases of gravitational attraction or, when
dealing with electrically charged particles, of Coulomb
interaction. A basic result is that if $V$ is
bounded below, eqn [1] admits, given initial data
$X_0 = X(0)$, $\dot{X}_0 = \dot{X}(0)$, a unique global solution
$t \to X(t)$, $t \in (-\infty, \infty)$; otherwise a solution can fail
to be global if and only if, in a finite time, it reaches
infinity or a singularity point (i.e., a configuration in
which two or more particles occupy the same point:
an event called a collision).

In eqn [1], $-\partial_{x_i} V(x_1,\ldots,x_n)$ is the force acting on
the points. More general forces are often admitted,
for instance, velocity-dependent friction forces: they
are not considered here because of their phenomenological
nature as models for microscopic phenomena
which should also, in principle, be explained in
terms of conservative forces (furthermore, even from
a macroscopic viewpoint, they are rather incomplete
models, as they should be considered together with
the important heat generation phenomena that
accompany them). Another interesting example of
forces not corresponding to a potential are certain
velocity-dependent forces like the Coriolis force
(which, however, appears only in noninertial frames
of reference) and the closely related Lorentz force
(in electromagnetism): they could be easily accommodated
in the Hamiltonian formulation of
mechanics; see Appendix 2.

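The action principle states that an equivalent
formulation of the eqns [1] is that a motion
$t \to X_0(t)$ satisfying [1] during a time interval
$[t_1,t_2]$ and leading from $X^1 = X_0(t_1)$ to $X^2 = X_0(t_2)$,
renders stationary the action

$$A(\{X\}) = \int_{t_1}^{t_2} \Big( \sum_{i=1}^{n} \frac{1}{2} m_i \dot{x}_i(t)^2 - V(X(t)) \Big)\, dt \qquad [2]$$

within the class $\mathcal{M}_{t_1,t_2}(X^1, X^2)$ of smooth (i.e.,
analytic) “motions” $t \to X(t)$ defined for $t \in [t_1,t_2]$
and leading from $X^1$ to $X^2$.

The function

$$\mathcal{L}(Y, X) \stackrel{\mathrm{def}}{=} \frac{1}{2} \sum_{i=1}^{n} m_i y_i^2 - V(X) \equiv K(Y) - V(X), \qquad Y = (y_1,\ldots,y_n)$$

is called the Lagrangian function and the action can
be written as

$$\int_{t_1}^{t_2} \mathcal{L}(\dot{X}(t), X(t))\, dt$$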

The quantity $K(\dot{X}(t))$ is called kinetic energy and
motions satisfying [1] conserve energy as time
$t$ varies, that is,

$$K(\dot{X}(t)) + V(X(t)) = E = \mathrm{const.} \qquad [3]$$


Hence the action principle can be intuitively thought 
of as saying that motions proceed by keeping 
constant the energy, sum of the kinetic and potential 
energies, while trying to share as evenly as possible 
their (average over time) contribution to the energy. 
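
As a numerical illustration of [1] and [3], the following is a minimal sketch only. It assumes a hypothetical system that does not appear in the text: two particles on a line ($d = 1$, $n = 2$) coupled by the potential $V = \frac{k}{2}(x_1 - x_2)^2$, integrated with a fixed-step velocity-Verlet scheme. The function names, masses, and initial data are illustrative choices.

```python
# Sketch: integrate m_i x_i'' = -dV/dx_i (eqn [1]) and monitor K + V (eqn [3]).
# Assumptions (not from the text): d = 1, n = 2, V = k/2 (x1 - x2)^2,
# velocity-Verlet time stepping with a fixed step dt.

def potential(x, k=1.0):
    return 0.5 * k * (x[0] - x[1]) ** 2

def forces(x, k=1.0):
    # -dV/dx_1 and -dV/dx_2; the two forces are equal and opposite.
    f = -k * (x[0] - x[1])
    return [f, -f]

def kinetic(v, m):
    return sum(0.5 * mi * vi * vi for mi, vi in zip(m, v))

def simulate(x, v, m, dt=1e-3, steps=10000):
    """Return the total energy K + V recorded at every time step."""
    x, v = list(x), list(v)
    f = forces(x)
    energies = [kinetic(v, m) + potential(x)]
    for _ in range(steps):
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        f = forces(x)
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]
        energies.append(kinetic(v, m) + potential(x))
    return energies

energies = simulate(x=[1.0, -0.5], v=[0.0, 0.3], m=[1.0, 2.0])
drift = max(abs(e - energies[0]) for e in energies)
print(f"E(0) = {energies[0]:.6f}, max |E(t) - E(0)| = {drift:.2e}")
```

The discrete scheme conserves energy only approximately; the small residual drift is a property of the integrator, not of eqn [1].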

In the special case in which $V$ is translation invariant,
motions conserve linear momentum $\mathbf{Q} \stackrel{\mathrm{def}}{=} \sum_i m_i \dot{x}_i$; if $V$
is rotation invariant around the origin $O$, motions
conserve angular momentum $\mathbf{M} \stackrel{\mathrm{def}}{=} \sum_i m_i x_i \wedge \dot{x}_i$, where $\wedge$
denotes the vector product in $\mathbb{R}^d$, that is, it is the tensor
$(a \wedge b)_{ij} = a_i b_j - b_i a_j$, $i,j = 1,\ldots,d$: if the dimension
$d = 3$ then $a \wedge b$ will be naturally regarded as a vector.
More generally, to any continuous symmetry group of
the Lagrangian correspond conserved quantities: this is
formalized in the Noether theorem.
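
The two conservation laws can be checked numerically. The sketch below assumes a hypothetical example not taken from the text: two particles in the plane ($d = 2$) interacting through $V = \frac{k}{2}|x_1 - x_2|^2$, which is invariant under both translations and rotations, advanced with a velocity-Verlet step. For $d = 2$ the tensor $a \wedge b$ has a single independent component, which is what `angular_momentum` returns.

```python
# Sketch: two particles in the plane (d = 2) with V = k/2 |x1 - x2|^2,
# which is both translation and rotation invariant (an assumed example).

def pair_force(x1, x2, k=1.0):
    # Force on particle 1; particle 2 feels the opposite force.
    return [-k * (a - b) for a, b in zip(x1, x2)]

def step(xs, vs, ms, dt):
    """One velocity-Verlet step for the two-particle system."""
    f1 = pair_force(xs[0], xs[1])
    fs = [f1, [-c for c in f1]]
    vs = [[v + 0.5 * dt * f / m for v, f in zip(vi, fi)]
          for vi, fi, m in zip(vs, fs, ms)]
    xs = [[x + dt * v for x, v in zip(xi, vi)] for xi, vi in zip(xs, vs)]
    f1 = pair_force(xs[0], xs[1])
    fs = [f1, [-c for c in f1]]
    vs = [[v + 0.5 * dt * f / m for v, f in zip(vi, fi)]
          for vi, fi, m in zip(vs, fs, ms)]
    return xs, vs

def linear_momentum(vs, ms):
    return [sum(m * v[c] for m, v in zip(ms, vs)) for c in range(2)]

def angular_momentum(xs, vs, ms):
    # The only independent component (a ^ b)_{12} of the tensor when d = 2.
    return sum(m * (x[0] * v[1] - x[1] * v[0]) for m, x, v in zip(ms, xs, vs))

ms = [1.0, 3.0]
xs = [[1.0, 0.0], [-0.5, 0.2]]
vs = [[0.0, 0.4], [0.1, -0.2]]
Q0, M0 = linear_momentum(vs, ms), angular_momentum(xs, vs, ms)
for _ in range(5000):
    xs, vs = step(xs, vs, ms, dt=1e-3)
Q1, M1 = linear_momentum(vs, ms), angular_momentum(xs, vs, ms)
print("Q:", Q0, "->", Q1)
print("M:", M0, "->", M1)
```

For pairwise central forces this particular integrator preserves both quantities up to rounding, so the printed values should agree to many digits.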

It is convenient to think that the scalar product
in $\mathbb{R}^{nd}$ is defined in terms of the ordinary scalar product
in $\mathbb{R}^d$, $a \cdot b = \sum_{j=1}^{d} a_j b_j$, by $(v, w) = \sum_{i=1}^{n} m_i v_i \cdot w_i$:
so that kinetic energy and line element $ds$ can be
written as $K(\dot{X}) = \frac{1}{2}(\dot{X}, \dot{X})$ and $ds^2 = \sum_{i=1}^{n} m_i\, dx_i^2$,
respectively. Therefore, the metric generated by the
latter scalar product can be called kinetic energy
metric.

The interest of the kinetic metric appears from the
Maupertuis’ principle (equivalent to [1]): the principle
allows us to identify the trajectory traced in $\mathbb{R}^{nd}$
by a motion that leads from $X^1$ to $X^2$ moving with
energy $E$. Parametrizing such trajectories as
$\tau \to X(\tau)$ by a parameter $\tau$ varying in $[0, 1]$ so that
the line element is $ds^2 = (\partial_\tau X, \partial_\tau X)\, d\tau^2$, the principle
states that the trajectory of a motion with energy $E$
which leads from $X^1$ to $X^2$ makes stationary, among
the analytic curves $\xi \in \mathcal{M}_{0,1}(X^1, X^2)$, the function

$$L(\xi) = \int_{\xi} \sqrt{E - V(\xi(s))}\, ds \qquad [4]$$

so that the possible trajectories traced by the
solutions of [1] in $\mathbb{R}^{nd}$ and with energy $E$ can be
identified with the geodesics of the metric
$dm^2 \stackrel{\mathrm{def}}{=} (E - V(X)) \cdot ds^2$.

For more details, the reader is referred to Landau 
and Lifshitz (1976) and Gallavotti (1983). 


Constraints 


Often particles are subject to constraints which force
the motion to take place on a surface $M \subset \mathbb{R}^{nd}$, i.e.,
$X(t)$ is forced to be a point on the manifold
$M$. A typical example is provided by rigid systems
in which motions are subject to forces which keep
the mutual distances of the particles constant:
$|x_i - x_j| = \rho_{ij}$, with $\rho_{ij}$ time-independent positive
quantities. In essentially all cases, the forces that imply
constraints, called constraint reactions, are velocity
dependent and, therefore, are not in the class of
conservative forces considered here, cf. [1]. Hence,
from a fundamental viewpoint admitting only conservative
forces, constrained systems should be regarded
as idealizations of systems subject to conservative
forces which approximately imply the constraints.


In general, the $\ell$-dimensional manifold $M$ will not
admit a global system of coordinates: however, it
will be possible to describe points in the vicinity
of any $X^0 \in M$ by using $N = nd$ coordinates
$q = (q_1,\ldots,q_\ell, q_{\ell+1},\ldots,q_N)$ varying in an open ball
$B_{X^0}$: $X = X(q_1,\ldots,q_\ell, q_{\ell+1},\ldots,q_N)$.

The $q$-coordinates can be chosen well adapted to
the surface $M$ and to the kinetic metric, i.e., so that
the points of $M$ are identified by $q_{\ell+1} = \cdots = q_N = 0$
(which is the meaning of “adapted”); furthermore,
infinitesimal displacements $(0,\ldots,0, d\varepsilon_{\ell+1},\ldots,d\varepsilon_N)$
out of a point $X^0 \in M$ are orthogonal to $M$ (in the
kinetic metric) and have a length independent of the
position of $X^0$ on $M$ (which is the meaning of “well
adapted” to the kinetic metric).

Motions constrained on $M$ arise when the
potential $V$ has the form

$$V(X) = V_a(X) + \lambda W(X) \qquad [5]$$

where $W$ is a smooth function which reaches its
minimum value, say equal to 0, precisely on the
manifold $M$ while $V_a$ is another smooth potential.
The factor $\lambda > 0$ is a parameter called the rigidity of
the constraint.

A particularly interesting case arises when the level
surfaces of $W$ also have the geometric property of
being “parallel” to the surface $M$: in the precise sense
that the matrix $\partial^2_{q_i q_j} W(X)$, $i,j > \ell$ is positive definite
and $X$-independent, for all $X \in M$, in a system of
coordinates well adapted to the kinetic metric.

A potential $W$ with the latter properties can be
called an approximately ideal constraint reaction. In
fact, it can be proved that, given an initial datum
$X^0 \in M$ with velocity $\dot{X}^0$ tangent to $M$, i.e., given
an initial datum whose coordinates in a local system
of coordinates are $(q_0, 0)$ and $(\dot{q}_0, 0)$ with
$q_0 = (q_{01},\ldots,q_{0\ell})$ and $\dot{q}_0 = (\dot{q}_{01},\ldots,\dot{q}_{0\ell})$, the motion
generated by [1] with $V$ given by [5] is a motion
$t \to X_\lambda(t)$ which

1. as $\lambda \to \infty$ tends to a motion $t \to X_\infty(t)$;

2. as long as $X_\infty(t)$ stays in the vicinity of the initial
data, say for $0 \le t \le t_1$, so that it can be
described in the above local adapted coordinates,
its coordinates have the form $t \to (q(t), 0) =
(q_1(t),\ldots,q_\ell(t),0,\ldots,0)$: that is, it is a motion
developing on the constraint surface $M$; and

3. the curve $t \to X_\infty(t)$, $t \in [0,t_1]$, as an element of
the space $\mathcal{M}_{0,t_1}(X^0, X_\infty(t_1))$ of analytic curves on
$M$ connecting $X^0$ to $X_\infty(t_1)$, renders the action

$$A(X) = \int_0^{t_1} \big( K(\dot{X}(t)) - V_a(X(t)) \big)\, dt \qquad [6]$$

stationary.


The latter property can be formulated “intrinsically,”
that is, referring only to $M$ as a surface, via the
restriction of the metric $ds^2$ to line elements $ds =
(dq_1,\ldots,dq_\ell,0,\ldots,0)$ tangent to $M$ at the point
$X = (q, 0,\ldots,0) \in M$; we write $ds^2 = \sum_{i,j=1}^{\ell} g_{ij}(q)\, dq_i\, dq_j$.
The $\ell \times \ell$ symmetric positive-definite matrix $g$
can be called the metric on $M$ induced by the kinetic
energy. Then the action in [6] can be written as

$$A(q) = \int_0^{t_1} \Big( \frac{1}{2} \sum_{i,j=1}^{\ell} g_{ij}(q(t))\, \dot{q}_i(t)\, \dot{q}_j(t) - \bar{V}_a(q(t)) \Big)\, dt \qquad [7]$$

where $\bar{V}_a(q) \stackrel{\mathrm{def}}{=} V_a(X(q_1,\ldots,q_\ell,0,\ldots,0))$: the function

$$\mathcal{L}(\eta, q) \stackrel{\mathrm{def}}{=} \frac{1}{2} \sum_{i,j=1}^{\ell} g_{ij}(q)\, \eta_i \eta_j - \bar{V}_a(q) \equiv \frac{1}{2}\, g(q)\eta \cdot \eta - \bar{V}_a(q) \qquad [8]$$

is called the constrained Lagrangian of the system.
An important property is that the constrained motions
conserve the energy defined as $E = \frac{1}{2}(g(q)\dot{q}, \dot{q}) +
\bar{V}_a(q)$; see next section.

The constrained motion $X_\infty(t)$ of energy $E$ satisfies
the Maupertuis’ principle in the sense that the curve
on $M$ on which the motion develops renders

$$L(\xi) = \int_{\xi} \sqrt{E - V_a(\xi(s))}\, ds \qquad [9]$$

stationary among the (smooth) curves that develop
on $M$ connecting two fixed values $X_1$ and $X_2$. In the
particular case in which $\ell = nd$ this is again Maupertuis’
principle for unconstrained motions under the
potential $V(X)$. In general, $\ell$ is called the number of
degrees of freedom because a complete description
of the initial data requires $2\ell$ coordinates $q(0), \dot{q}(0)$.

If $W$ is minimal on $M$ but the condition on $W$ of
having level surfaces parallel to $M$ is not satisfied, i.e.,
if $W$ is not an approximate ideal constraint reaction,
it still remains true that the limit motion $X_\infty(t)$ takes
place on $M$. However, in general, it will not satisfy the
above variational principles. For this reason, motions
arising as limits (as $\lambda \to \infty$) of motions developing
under the potential [5] with $W$ having minimum on $M$
and level curves parallel (in the above sense) to $M$ are
called ideally constrained motions or motions subject
by ideal constraints to the surface $M$.

As an example, suppose that $W$ has the form
$W(X) = \sum_{i,j \in P} w_{ij}(|x_i - x_j|)$ with $w_{ij}(|\xi|) \ge 0$ an analytic
function vanishing only when $|\xi| = \rho_{ij}$, for $i, j$ in
some set of pairs $P$ and for some given distances $\rho_{ij}$ (e.g.,
$w_{ij}(\xi) = (\xi^2 - \rho_{ij}^2)^2 \gamma$, $\gamma > 0$). Then $W$ can be shown to
satisfy the mentioned conditions and therefore, the so
constrained motions $X_\infty(t)$ of the body satisfy the
variational principles mentioned in connection with [7]
and [9]: in other words, the above natural way of
realizing a rather general rigidity constraint is ideal.
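
The role of the rigidity $\lambda$ in [5] can be illustrated numerically. The sketch below rests on assumptions not made in the text: a single unit-mass particle in the plane, $V_a(x,y) = y$ (uniform gravity of unit strength), and $W = (x^2 + y^2 - 1)^2$, so that $M$ is the unit circle and the limit motion is that of a pendulum. It only measures how far $X_\lambda(t)$ strays from $M$ for increasing $\lambda$; it does not test the variational properties [6] to [9].

```python
import math

# Sketch of eqn [5]: a planar pendulum obtained as the limit of a stiff potential.
# Assumptions (not from the text): one unit-mass particle in the plane,
# V_a(x, y) = y, W = (x^2 + y^2 - 1)^2 so that M is the unit circle,
# velocity-Verlet time stepping.

def force(pos, lam):
    x, y = pos
    s = x * x + y * y - 1.0             # W = s^2, so grad W = 4 s (x, y)
    return (-lam * 4.0 * s * x, -1.0 - lam * 4.0 * s * y)

def max_deviation(lam, dt=5e-4, t_end=2.0):
    """Largest distance from the unit circle along the motion X_lambda(t)."""
    pos, vel = (0.0, -1.0), (1.0, 0.0)  # datum on M, velocity tangent to M
    f = force(pos, lam)
    worst = 0.0
    for _ in range(int(round(t_end / dt))):
        vel = (vel[0] + 0.5 * dt * f[0], vel[1] + 0.5 * dt * f[1])
        pos = (pos[0] + dt * vel[0], pos[1] + dt * vel[1])
        f = force(pos, lam)
        vel = (vel[0] + 0.5 * dt * f[0], vel[1] + 0.5 * dt * f[1])
        worst = max(worst, abs(math.hypot(*pos) - 1.0))
    return worst

rigidities = [10.0, 100.0, 1000.0]
deviations = [max_deviation(lam) for lam in rigidities]
for lam, dev in zip(rigidities, deviations):
    print(f"lambda = {lam:7.1f}   max | |X| - 1 | = {dev:.2e}")
```

The time step must resolve the fast oscillation transverse to $M$, whose frequency grows like $\sqrt{\lambda}$; this is why larger rigidities would require a smaller `dt` than the one assumed here.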

The modern viewpoint on the physical meaning of
the constraint reactions is as follows: looking at
motions in an inertial Cartesian system, it will appear
that the system is subject to the applied forces with
potential $V_a(X)$ and to constraint forces which are
defined as the differences $R_i = m_i \ddot{x}_i + \partial_{x_i} V_a(X)$. The
latter reflect the action of the forces with potential
$\lambda W(X)$ in the limit of infinite rigidity ($\lambda \to \infty$).

In applications, sometimes the action of a constraint
can be regarded as ideal: the motion will then verify the
variational principles mentioned and $R$ can be computed
as the differences between the $m_i \ddot{x}_i$ and the active
forces $-\partial_{x_i} V_a(X)$. In dynamics problems it is, however,
a very difficult and important matter, particularly in
engineering, to judge whether a system of particles can
be considered as subject to ideal constraints: this leads
to important decisions in the construction of machines.
It simplifies the calculations of the reactions and fatigue
of the materials but a misjudgment can have serious
consequences about stability and safety. For statics
problems, the difficulty is of lower order: usually
assuming that the constraint reaction is ideal leads to
an overestimate of the requirements for stability of
equilibria. Hence, employing the action principle to
statics problems, where it constitutes the principle of
virtual work, generally leads to economic problems
rather than to safety issues. Its discovery even predates
Newtonian mechanics.

We refer the reader to Arnol’d (1989) and
Gallavotti (1983) for more details.


Lagrange and Hamilton Forms of the Equations of Motion


The stationarity condition for the action $A(q)$, cf.
[7], [8], is formulated in terms of the Lagrangian
$\mathcal{L}(\eta, \xi)$, see [8], by

$$\frac{d}{dt}\, \partial_{\eta_i} \mathcal{L}(\dot{q}(t), q(t)) = \partial_{\xi_i} \mathcal{L}(\dot{q}(t), q(t)), \qquad i = 1,\ldots,\ell \qquad [10]$$

which is a second-order differential equation called
the Lagrangian equation of motion. It can be cast in
“normal form”: for this purpose, adopting the
convention of “summation over repeated indices,”
introduce the “generalized momenta”

$$p_i \stackrel{\mathrm{def}}{=} g(q)_{ij}\, \dot{q}_j, \qquad i = 1,\ldots,\ell \qquad [11]$$
Since $g(q) > 0$, the motions $t \to q(t)$ and the corresponding
velocities $t \to \dot{q}(t)$ can be described equivalently
by $t \to (q(t), p(t))$: and the equations of motion
[10] become the first-order equations

$$\dot{q}_i = \partial_{p_i} H(p, q), \qquad \dot{p}_i = -\partial_{q_i} H(p, q) \qquad [12]$$

where the function $H$, called the Hamiltonian of the
system, is defined by

$$H(p, q) \stackrel{\mathrm{def}}{=} \frac{1}{2}\big(g(q)^{-1} p, p\big) + \bar{V}_a(q) \qquad [13]$$

Equations [12], regarded as equations of motion for
phase space points $(p, q)$, are called Hamilton
equations. In general, $q$ are local coordinates on $M$
and motions are specified by giving $q, \dot{q}$ or $p, q$.
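
To make [11] to [13] concrete, the following sketch treats one degree of freedom ($\ell = 1$). The example is an assumption and not part of the text: a bead of mass $m$ sliding on the parabola $y = x^2/2$ under gravity $G$, with $q = x$, for which the induced metric is $g(q) = m(1 + q^2)$ and $\bar{V}_a(q) = mGq^2/2$. The equations [12] are integrated with a classical fourth-order Runge-Kutta step, and $H$ is compared at the start and end of the run.

```python
# Sketch of [11]-[13] for one degree of freedom (l = 1).
# Assumed example (not from the text): a bead of mass m on the parabola
# y = x^2 / 2 under gravity G, with q = x. The induced metric is
# g(q) = m (1 + q^2) and the potential is V_a(q) = m G q^2 / 2.

M_BEAD, G = 1.0, 9.81

def metric(q):
    return M_BEAD * (1.0 + q * q)

def hamiltonian(p, q):
    # H = (1/2) g(q)^{-1} p^2 + V_a(q), eqn [13]
    return 0.5 * p * p / metric(q) + 0.5 * M_BEAD * G * q * q

def hamilton_rhs(p, q):
    """Right-hand sides of [12]: (dp/dt, dq/dt) = (-dH/dq, dH/dp)."""
    dq = p / metric(q)
    dp = M_BEAD * q * dq * dq - M_BEAD * G * q
    return dp, dq

def rk4_step(p, q, dt):
    k1p, k1q = hamilton_rhs(p, q)
    k2p, k2q = hamilton_rhs(p + 0.5 * dt * k1p, q + 0.5 * dt * k1q)
    k3p, k3q = hamilton_rhs(p + 0.5 * dt * k2p, q + 0.5 * dt * k2q)
    k4p, k4q = hamilton_rhs(p + dt * k3p, q + dt * k3q)
    p += dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0
    q += dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6.0
    return p, q

q, qdot = 0.5, 0.0
p = metric(q) * qdot            # generalized momentum, eqn [11]
h0 = hamiltonian(p, q)
for _ in range(5000):
    p, q = rk4_step(p, q, dt=1e-3)
h1 = hamiltonian(p, q)
print(f"H at t = 0: {h0:.8f}, H at t = 5: {h1:.8f}")
```

Because $g$ depends on $q$ here, the term $-\partial_q H$ contains a velocity-dependent contribution in addition to the force $-\partial_q \bar{V}_a$.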

Looking for a coordinate-free representation of
motions consider the pairs $X, Y$ with $X \in M$ and $Y$ a
vector $Y \in T_X$ tangent to $M$ at the point $X$. The
collection of pairs $(Y, X)$ is denoted $T(M) = \bigcup_{X \in M}
(T_X \times \{X\})$ and a motion $t \to (\dot{X}(t), X(t)) \in T(M)$ in
local coordinates is represented by $(\dot{q}(t), q(t))$. The
space $T(M)$ can be called the space of initial data for
Lagrange’s equations of motion: it has $2\ell$ dimensions
(also known as the “tangent bundle” of $M$).

Likewise, the space of initial data for the Hamilton equations will be denoted $T^*(M)$ and it consists of pairs $X,P$ with $X \in M$ and $P = g(X)Y$ with $Y$ a vector tangent to $M$ at $X$. The space $T^*(M)$ is called the phase space of the system: it has $2\ell$ dimensions (and it is occasionally called the “cotangent bundle” of $M$).

An immediate consequence of [12] is

$$\frac{d}{dt}H(p(t),q(t)) \equiv 0$$

and it means that $H(p(t),q(t))$ is constant along the solutions of [12]. Noting that $H(p,q) = \tfrac12(g(q)\dot q,\dot q) + V_a(q)$ is the sum of the kinetic and potential energies, it follows that the conservation of $H$ along solutions means energy conservation in the presence of ideal constraints.

Let $S_t$ be the flow generated on the phase space variables $(p,q)$ by the solutions of the equations of motion [12], that is, let $t \to S_t(p,q) \equiv (p(t),q(t))$ denote a solution of [12] with initial data $(p,q)$. Then a (measurable) set $A$ in phase space evolves in time $t$ into a new set $S_tA$ with the same volume: this is obvious because the Hamilton equations [12] have manifestly zero divergence (“Liouville’s theorem”).
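As an illustration of the last two remarks, the following Python sketch integrates the Hamilton equations [12] numerically and checks that $H$ stays constant along the solution and that the time-$t$ flow has Jacobian determinant 1 (area preservation for one degree of freedom). The test Hamiltonian $H = p^2/2 + (1-\cos q)$, the Runge-Kutta integrator, the step counts, and all function names are assumptions made for this example; they are not taken from the text.

```python
import math

def hamiltonian(p, q):
    """Example Hamiltonian H = p^2/2 + (1 - cos q); unit constants are an assumption."""
    return 0.5 * p * p + (1.0 - math.cos(q))

def hamilton_rhs(p, q):
    """Right-hand side of [12] for this H: (dp/dt, dq/dt) = (-dH/dq, dH/dp)."""
    return -math.sin(q), p

def rk4_step(p, q, dt):
    """One classical fourth-order Runge-Kutta step (accurate, but not symplectic)."""
    k1p, k1q = hamilton_rhs(p, q)
    k2p, k2q = hamilton_rhs(p + 0.5 * dt * k1p, q + 0.5 * dt * k1q)
    k3p, k3q = hamilton_rhs(p + 0.5 * dt * k2p, q + 0.5 * dt * k2q)
    k4p, k4q = hamilton_rhs(p + dt * k3p, q + dt * k3q)
    p_new = p + dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    q_new = q + dt * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
    return p_new, q_new

def flow(p, q, t, steps=2000):
    """Approximate S_t(p, q) by `steps` Runge-Kutta steps."""
    dt = t / steps
    for _ in range(steps):
        p, q = rk4_step(p, q, dt)
    return p, q

def energy_drift(p0, q0, t):
    """|H(S_t(p0, q0)) - H(p0, q0)|; zero for the exact flow."""
    p, q = flow(p0, q0, t)
    return abs(hamiltonian(p, q) - hamiltonian(p0, q0))

def flow_jacobian_det(p0, q0, t, eps=1e-5):
    """Determinant of the Jacobian of S_t, estimated by central differences."""
    p_a, q_a = flow(p0 + eps, q0, t)
    p_b, q_b = flow(p0 - eps, q0, t)
    p_c, q_c = flow(p0, q0 + eps, t)
    p_d, q_d = flow(p0, q0 - eps, t)
    dp_dp, dq_dp = (p_a - p_b) / (2 * eps), (q_a - q_b) / (2 * eps)
    dp_dq, dq_dq = (p_c - p_d) / (2 * eps), (q_c - q_d) / (2 * eps)
    return dp_dp * dq_dq - dp_dq * dq_dp
```

Calling `energy_drift(0.3, 1.0, 10.0)` should return a number close to zero and `flow_jacobian_det(0.3, 1.0, 5.0)` a number close to 1, up to discretization and finite-difference error.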

The Hamilton equations also satisfy a variational principle, called the Hamilton action principle: that is, if $\mathcal{M}_{t_1,t_2}((p_1,q_1),(p_2,q_2);M)$ denotes the space of the analytic functions $\varphi: t \to (\pi(t),\kappa(t))$ which in the time interval $[t_1,t_2]$ lead from $(p_1,q_1)$ to $(p_2,q_2)$, then the condition that $\varphi_0(t) = (p(t),q(t))$ satisfies [12] can be equivalently formulated by requiring that the function

$$A_H(\varphi) \overset{\mathrm{def}}{=} \int_{t_1}^{t_2}\big(\pi(t)\cdot\dot\kappa(t) - H(\pi(t),\kappa(t))\big)\,dt \qquad [14]$$

be stationary for $\varphi = \varphi_0$: in fact, eqns [12] are the stationarity conditions for the Hamilton action [14] on $\mathcal{M}_{t_1,t_2}((p_1,q_1),(p_2,q_2);M)$. And, since the derivatives of $\pi(t)$ do not appear in [14], stationarity is even achieved in the larger space $\mathcal{M}_{t_1,t_2}(q_1,q_2;M)$ of the motions $\varphi: t \to (\pi(t),\kappa(t))$ leading from $q_1$ to $q_2$ without any restriction on the initial and final momenta $p_1,p_2$ (which, therefore, cannot be prescribed a priori independently of $q_1,q_2$). If the prescribed data $p_1,q_1,p_2,q_2$ are not compatible with the equations of motion (e.g., $H(p_1,q_1) \ne H(p_2,q_2)$), then the action functional has no stationary trajectory in $\mathcal{M}_{t_1,t_2}((p_1,q_1),(p_2,q_2);M)$.

For more details, the reader is referred to Landau 
and Lifshitz (1976), Arnol’d (1989), and Gallavotti 
(1983). 


Canonical Transformations of Phase 
Space Coordinates 


The Hamiltonian form, [13], of the equations of 
motion turns out to be quite useful in several 
problems. It is, therefore, important to remark that 
it is invariant under a special class of transformations 
of coordinates, called canonical transformations. 

Consider a local change of coordinates on phase space, i.e., a smooth, smoothly invertible map $C(\pi,\kappa) = (\pi',\kappa')$ between an open set $U$ in the phase space of a Hamiltonian system with $\ell$ degrees of freedom, into an open set $U'$ in a $2\ell$-dimensional space. The change of coordinates is said to be canonical if for any solution $t \to (\pi(t),\kappa(t))$ of equations like [12], for any Hamiltonian $H(\pi,\kappa)$ defined on $U$, the $C$-image $t \to (\pi'(t),\kappa'(t)) = C(\pi(t),\kappa(t))$ is a solution of [12] with the “same” Hamiltonian, that is, with Hamiltonian $H'(\pi',\kappa') \overset{\mathrm{def}}{=} H(C^{-1}(\pi',\kappa'))$.

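The condition that a transformation of coordinates is canonical is obtained by using the arbitrariness of the function $H$ and is simply expressed as a necessary and sufficient property of the Jacobian $L$,

$$L = \begin{pmatrix} A & B\\ C & D\end{pmatrix}, \qquad A_{ij} = \partial_{\pi_j}\pi'_i,\quad B_{ij} = \partial_{\kappa_j}\pi'_i,\quad C_{ij} = \partial_{\pi_j}\kappa'_i,\quad D_{ij} = \partial_{\kappa_j}\kappa'_i \qquad [15]$$

where $i,j = 1,\ldots,\ell$. Let

$$E = \begin{pmatrix} 0 & 1\\ -1 & 0\end{pmatrix}$$

denote the $2\ell\times2\ell$ matrix formed by four $\ell\times\ell$ blocks, equal to the 0 matrix or, as indicated, to the $\pm$(identity matrix); then, if a superscript $T$ denotes matrix transposition, the condition that the map be canonical is that

$$L\,E\,L^{T} = E \qquad [16]$$

which immediately implies that $\det L = \pm1$. In fact, it is possible to show that [16] implies $\det L = 1$. Equation [16] is equivalent to the four relations $AD^T - BC^T = 1$, $-AB^T + BA^T = 0$, $CD^T - DC^T = 0$, and $-CB^T + DA^T = 1$. More explicitly, since the first and the fourth relations coincide, these can be expressed as

$$\{\pi'_i,\kappa'_j\} = \delta_{ij}, \qquad \{\pi'_i,\pi'_j\} = 0, \qquad \{\kappa'_i,\kappa'_j\} = 0 \qquad [17]$$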


where, for any two functions $F(\pi,\kappa)$, $G(\pi,\kappa)$, the Poisson bracket is

$$\{F,G\}(\pi,\kappa) \overset{\mathrm{def}}{=} \sum_{k=1}^{\ell}\big(\partial_{\pi_k}F(\pi,\kappa)\,\partial_{\kappa_k}G(\pi,\kappa) - \partial_{\kappa_k}F(\pi,\kappa)\,\partial_{\pi_k}G(\pi,\kappa)\big) \qquad [18]$$

The latter satisfies Jacobi’s identity: $\{\{F,G\},O\} + \{\{G,O\},F\} + \{\{O,F\},G\} = 0$, for any three functions $F,G,O$ on the phase space. It is quite useful to remark that if $t \to (p(t),q(t)) = S_t(p,q)$ is a solution to Hamilton equations with Hamiltonian $H$ then, given any observable $F(p,q)$, it “evolves” as $F(t) \overset{\mathrm{def}}{=} F(p(t),q(t))$ satisfying

$$\frac{d}{dt}F(p(t),q(t)) = \{H,F\}(p(t),q(t))$$

Requiring the latter identity to hold for all observables $F$ is equivalent to requiring that $t \to (p(t),q(t))$ be a solution of Hamilton’s equations for $H$.
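The evolution rule for observables can be checked numerically in a simple case. The Python sketch below evaluates the bracket [18] by central differences for one degree of freedom and compares $\{H,F\}$ with the time derivative of $F$ along the exact harmonic-oscillator flow. The choices $H = p^2/2 + \omega^2q^2/2$, $F = pq$, the value of `OMEGA`, and the function names are assumptions made only for illustration.

```python
import math

OMEGA = 1.5  # assumed oscillator frequency (unit mass)

def H(p, q):
    """Harmonic-oscillator Hamiltonian used as a test case."""
    return 0.5 * p * p + 0.5 * OMEGA ** 2 * q * q

def F(p, q):
    """Sample observable F = p q."""
    return p * q

def poisson_bracket(f, g, p, q, eps=1e-6):
    """{f, g} = df/dp * dg/dq - df/dq * dg/dp, cf. [18], via central differences."""
    df_dp = (f(p + eps, q) - f(p - eps, q)) / (2 * eps)
    df_dq = (f(p, q + eps) - f(p, q - eps)) / (2 * eps)
    dg_dp = (g(p + eps, q) - g(p - eps, q)) / (2 * eps)
    dg_dq = (g(p, q + eps) - g(p, q - eps)) / (2 * eps)
    return df_dp * dg_dq - df_dq * dg_dp

def exact_flow(p0, q0, t):
    """Exact solution of Hamilton's equations for H above."""
    c, s = math.cos(OMEGA * t), math.sin(OMEGA * t)
    return p0 * c - OMEGA * q0 * s, q0 * c + (p0 / OMEGA) * s

def time_derivative_of_F(p0, q0, eps=1e-5):
    """d/dt F(p(t), q(t)) at t = 0, estimated by a central difference in t."""
    p_plus, q_plus = exact_flow(p0, q0, eps)
    p_minus, q_minus = exact_flow(p0, q0, -eps)
    return (F(p_plus, q_plus) - F(p_minus, q_minus)) / (2 * eps)
```

For this example both quantities equal $p^2 - \omega^2q^2$, so `poisson_bracket(H, F, 0.8, 0.3)` and `time_derivative_of_F(0.8, 0.3)` should agree up to finite-difference error.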

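Let $C: U \to U'$ be a smooth, smoothly invertible transformation between two open $2\ell$-dimensional sets: $C(\pi,\kappa) = (\pi',\kappa')$. Suppose that there is a function $\Phi(\pi',\kappa)$ defined on a suitable domain $W$ such that

$$C(\pi,\kappa) = (\pi',\kappa') \;\Rightarrow\; \begin{cases}\pi = \partial_{\kappa}\Phi(\pi',\kappa)\\ \kappa' = \partial_{\pi'}\Phi(\pi',\kappa)\end{cases} \qquad [19]$$

then $C$ is canonical. This is because [19] implies that if $\pi,\kappa$ are varied and if $\pi,\kappa,\pi',\kappa'$ are related by $C(\pi,\kappa) = (\pi',\kappa')$, then $\pi\cdot d\kappa + \kappa'\cdot d\pi' = d\Phi(\pi',\kappa)$, which implies that

$$\pi\cdot d\kappa - H(\pi,\kappa)\,dt \equiv \pi'\cdot d\kappa' - H(C^{-1}(\pi',\kappa'))\,dt + d\Phi(\pi',\kappa) - d(\pi'\cdot\kappa') \qquad [20]$$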




It means that the Hamiltonians $H(p,q)$ and $H'(p',q') \overset{\mathrm{def}}{=} H(C^{-1}(p',q'))$ have Hamilton actions $A_H$ and $A_{H'}$ differing by a constant, if evaluated on corresponding motions $(p(t),q(t))$ and $(p'(t),q'(t)) = C(p(t),q(t))$.

The constant depends only on the initial and final values $(p(t_1),q(t_1))$ and $(p(t_2),q(t_2))$ and, respectively, $(p'(t_1),q'(t_1))$ and $(p'(t_2),q'(t_2))$ so that if $(p(t),q(t))$ makes $A_H$ extreme, then $(p'(t),q'(t)) = C(p(t),q(t))$ also makes $A_{H'}$ extreme.

Hence, if $t \to (p(t),q(t))$ solves the Hamilton equations with Hamiltonian $H(p,q)$ then the motion $t \to (p'(t),q'(t)) = C(p(t),q(t))$ solves the Hamilton equations with Hamiltonian $H'(p',q') = H(C^{-1}(p',q'))$ no matter which $H$ is chosen: therefore, the transformation is canonical. The function $\Phi$ is called its generating function.

Equation [19] provides a way to construct canonical maps. Suppose that a function $\Phi(\pi',\kappa)$ is given and defined on some domain $W$; then setting

$$\pi = \partial_{\kappa}\Phi(\pi',\kappa), \qquad \kappa' = \partial_{\pi'}\Phi(\pi',\kappa)$$

and inverting the first equation in the form $\pi' = \Xi(\pi,\kappa)$ and substituting the value for $\pi'$ thus obtained in the second equation, a map $C(\pi,\kappa) = (\pi',\kappa')$ is defined on some domain (where the mentioned operations can be performed) and if such domain is open and not empty then $C$ is a canonical map.

For similar reasons, if $\Gamma(\kappa,\kappa')$ is a function defined on some domain then setting $\pi = \partial_{\kappa}\Gamma(\kappa,\kappa')$, $\pi' = -\partial_{\kappa'}\Gamma(\kappa,\kappa')$ and solving the first relation to express $\kappa' = \Delta(\pi,\kappa)$ and substituting in the second relation, a map $(\pi',\kappa') = C(\pi,\kappa)$ is defined on some domain (where the mentioned operations can be performed) and if such domain is open and not empty then $C$ is a canonical map.

Likewise, canonical transformations can be constructed starting from a priori given functions $F(\pi,\kappa')$ or $G(\pi,\pi')$. And the most general canonical map can be generated locally (i.e., near a given point in phase space) by a single one of the above four ways, possibly composed with a few “trivial” canonical maps in which one pair of coordinates $(\pi_i,\kappa_i)$ is transformed into $(-\kappa_i,\pi_i)$. The necessity of also including the trivial maps can be traced to the existence of homogeneous canonical maps, that is, maps such that $\pi\cdot d\kappa = \pi'\cdot d\kappa'$ (e.g., the identity map, see below or [49] for nontrivial examples) which are action preserving hence canonical, but which evidently cannot be generated by a function $\Phi(\kappa,\kappa')$ although they can be generated by a function depending on $\pi',\kappa$.

Simple examples of homogeneous canonical maps are maps in which the coordinates $q$ are changed into $q' = R(q)$ and, correspondingly, the $p$’s are transformed as $p' = (\partial_qR(q))^{-1\,T}p$, linearly: indeed, this map is generated by the function $F(p',q) \overset{\mathrm{def}}{=} p'\cdot R(q)$.
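For instance, consider the map “Cartesian-polar” coordinates $(q_1,q_2) \leftrightarrow (\rho,\theta)$ with $(\rho,\theta)$ the polar coordinates of $q$ (namely $\rho = \sqrt{q_1^2+q_2^2}$, $\theta = \arctan(q_2/q_1)$) and let $n = q/|q| = (n_1,n_2)$ and $t = (-n_2,n_1)$. Setting $p_\rho \overset{\mathrm{def}}{=} p\cdot n$, $p_\theta \overset{\mathrm{def}}{=} \rho\,p\cdot t$, the map $(p_1,p_2,q_1,q_2) \leftrightarrow (p_\rho,p_\theta,\rho,\theta)$ is homogeneous canonical (because $p\cdot dq = p\cdot n\,d\rho + p\cdot t\,\rho\,d\theta = p_\rho\,d\rho + p_\theta\,d\theta$).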
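The canonical character of the Cartesian-polar map can also be verified through the Poisson bracket relations [17]. The following Python sketch does so numerically at a sample point, using central differences for the derivatives in [18]. The helper names (`to_polar`, `canonical_defect`), the finite-difference step, and the sample point are assumptions introduced for the illustration.

```python
import math

def to_polar(x):
    """Map (p1, p2, q1, q2) to (p_rho, p_theta, rho, theta)."""
    p1, p2, q1, q2 = x
    rho = math.hypot(q1, q2)
    theta = math.atan2(q2, q1)
    p_rho = (p1 * q1 + p2 * q2) / rho   # p . n
    p_theta = q1 * p2 - q2 * p1         # rho * (p . t)
    return (p_rho, p_theta, rho, theta)

def partial(f, x, k, eps=1e-6):
    """Central-difference derivative of f with respect to the k-th entry of x."""
    x_plus, x_minus = list(x), list(x)
    x_plus[k] += eps
    x_minus[k] -= eps
    return (f(tuple(x_plus)) - f(tuple(x_minus))) / (2 * eps)

def poisson_bracket(f, g, x):
    """Bracket [18] in the old variables; entries 0,1 are momenta, 2,3 positions."""
    total = 0.0
    for k in range(2):
        total += partial(f, x, k) * partial(g, x, k + 2)
        total -= partial(f, x, k + 2) * partial(g, x, k)
    return total

def component(i):
    """The i-th new coordinate as a function of the old phase-space point."""
    return lambda x: to_polar(x)[i]

def canonical_defect(x):
    """Largest deviation from the relations [17] at the point x."""
    new_p = [component(0), component(1)]
    new_q = [component(2), component(3)]
    worst = 0.0
    for i in range(2):
        for j in range(2):
            delta = 1.0 if i == j else 0.0
            worst = max(worst,
                        abs(poisson_bracket(new_p[i], new_q[j], x) - delta),
                        abs(poisson_bracket(new_p[i], new_p[j], x)),
                        abs(poisson_bracket(new_q[i], new_q[j], x)))
    return worst
```

A call such as `canonical_defect((0.7, -0.4, 1.2, 0.5))` should return a value close to zero; the sample point is kept away from the origin and from the branch cut of `atan2`.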

As a further example, any area-preserving map $(p,q) \leftrightarrow (p',q')$ defined on an open region of the plane $\mathbb{R}^2$ is canonical: because in this case the matrices $A,B,C,D$ are just numbers, which satisfy $AD - BC = 1$ and, therefore, [16] holds.

For more details, the reader is referred to Landau 
and Lifshitz (1976) and Gallavotti (1983). 


Quadratures 


The simplest mechanical systems are integrable by quadratures. For instance, the Hamiltonian on $\mathbb{R}^2$,

$$H(p,q) = \frac{1}{2m}p^2 + V(q) \qquad [21]$$

generates a motion $t \to q(t)$ with initial data $q_0,\dot q_0$ such that $H(p_0,q_0) = E$, i.e., $\tfrac12m\dot q_0^2 + V(q_0) = E$, satisfying

$$\dot q(t) = \pm\sqrt{\frac{2}{m}\big(E - V(q(t))\big)}$$

If the equation $E = V(q)$ has only two solutions $q_-(E) < q_+(E)$ and $|\partial_qV(q_\pm(E))| > 0$, the motion is periodic with period

$$T(E) = 2\int_{q_-(E)}^{q_+(E)}\frac{dx}{\sqrt{(2/m)(E - V(x))}} \qquad [22]$$

The special solution with initial data $q_0 = q_-(E)$, $\dot q_0 = 0$ will be denoted $Q(t)$, and it is an analytic function (by the general regularity theorem on ordinary differential equations). For $0 \le t \le T/2$ or for $T/2 \le t \le T$ it is given, respectively, by

$$t = \int_{q_-(E)}^{Q(t)}\frac{dx}{\sqrt{(2/m)(E - V(x))}}, \qquad t = \frac{T}{2} - \int_{q_+(E)}^{Q(t)}\frac{dx}{\sqrt{(2/m)(E - V(x))}} \qquad [23]$$

The most general solution with energy $E$ has the form $q(t) = Q(t_0 + t)$, where $t_0$ is defined by $q_0 = Q(t_0)$, $\dot q_0 = \dot Q(t_0)$, i.e., it is the time needed for the “standard solution” $Q(t)$ to reach the initial data for the new motion.
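To make the quadrature [22] concrete, here is a small Python sketch that evaluates the period numerically. The integrand has inverse-square-root singularities at the turning points, so the sketch substitutes $x = \text{mid} + \text{half}\cdot\sin u$ and applies the midpoint rule in $u$. The substitution, the function name `period`, the number of nodes, and the sample potentials are choices made for this illustration and are not part of the text; the turning points $q_\pm(E)$ are assumed to be supplied by the caller.

```python
import math

def period(V, E, q_minus, q_plus, m=1.0, n=2000):
    """T(E) = 2 * integral_{q-}^{q+} dx / sqrt((2/m)(E - V(x))), cf. [22].

    The substitution x = mid + half*sin(u), u in (-pi/2, pi/2), makes the
    integrand smooth; the midpoint rule never evaluates it at the turning points.
    """
    mid = 0.5 * (q_plus + q_minus)
    half = 0.5 * (q_plus - q_minus)
    du = math.pi / n
    total = 0.0
    for j in range(n):
        u = -0.5 * math.pi + (j + 0.5) * du
        x = mid + half * math.sin(u)
        total += half * math.cos(u) / math.sqrt((2.0 / m) * (E - V(x)))
    return 2.0 * total * du

def harmonic_potential(x, omega=2.0):
    """V(x) = omega^2 x^2 / 2 for unit mass (assumed sample potential)."""
    return 0.5 * omega ** 2 * x * x

def pendulum_potential(x):
    """V(x) = 1 - cos x, i.e. the pendulum with m = g = h = 1 (assumed)."""
    return 1.0 - math.cos(x)
```

For the harmonic potential with $\omega = 2$ and $E = 2$ the turning points are $\pm1$ and `period(harmonic_potential, 2.0, -1.0, 1.0)` should be close to $2\pi/\omega = \pi$, independently of $E$. For the pendulum potential the period grows with the energy, e.g. `period(pendulum_potential, 1.0, -math.pi / 2, math.pi / 2)` exceeds the small-oscillation value $2\pi$.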

If the derivative of $V$ vanishes in one of the extremes or if at least one of the two solutions $q_\pm(E)$ does not exist, the motion is not periodic and it may be unbounded: nevertheless, it is still expressible via integrals of the type [22]. If the potential $V$ is periodic in $q$ and the variable $q$ is considered to be varying on a circle then essentially all solutions are periodic: exceptions can occur if the energy $E$ has a value such that $V(q) = E$ admits a solution where $V$ has zero derivative.

Typical examples are the harmonic oscillator, the pendulum, and the Kepler oscillator, whose Hamiltonians, if $m,\omega,g,h,G,k$ are positive constants, are, respectively,

$$\frac{p^2}{2m} + \frac12m\omega^2q^2, \qquad \frac{p^2}{2m} + mg\Big(1 - \cos\frac{q}{h}\Big), \qquad \frac{p^2}{2m} - \frac{mk}{|q|} + \frac{mG^2}{2q^2} \qquad [24]$$

The Kepler oscillator Hamiltonian has a potential which is singular at $q = 0$ but if $G \ne 0$ the energy conservation forbids too close an approach to $q = 0$ and the singularity becomes irrelevant.

The integral in [23] is called a quadrature and the systems in [21] are therefore integrable by quadratures. Such systems, at least when the motion is periodic, are best described in new coordinates in which periodicity is more manifest. Namely, when $V(q) = E$ has only two roots $q_\pm(E)$ and $\pm V'(q_\pm(E)) > 0$ the energy-time coordinates can be used by replacing $q,\dot q$ or $p,q$ by $E,\tau$, where $\tau$ is the time needed for the standard solution $t \to Q(t)$ to reach the given data, that is, $Q(\tau) = q$, $\dot Q(\tau) = \dot q$. In such coordinates, the motion is simply $(E,\tau) \to (E,\tau + t)$ and, of course, the variable $\tau$ has to be regarded as varying on a circle of radius $T/2\pi$. The $E,\tau$ variables are a kind of polar coordinates, as can be checked by drawing the curves of constant $E$, “energy levels,” in the plane $p,q$ in the cases in [24]; see Figure 1.

In the harmonic oscillator case, all trajectories are periodic. In the pendulum case, all motions are periodic except the ones which separate the oscillatory motions (the closed curves in the second drawing) from the rotatory motions (the apparently open curves) which, in fact, are on closed curves as well if the $q$ coordinate, that is, the vertical coordinate in Figure 1, is regarded as “periodic” with period $2\pi h$. In the Kepler case, only the negative-energy trajectories are periodic and a few of them are drawn in Figure 1. The single dots represent the equilibrium points in phase space.

Figure 1 The energy levels of the harmonic oscillator, the pendulum, and the Kepler motion.

The region of phase space where motions are periodic is a set of points $(p,q)$ with the topological structure of $\cup_{u\in U}(\{u\}\times C_u)$, where $u$ is a coordinate varying in an open interval $U$ (e.g., the set of values of the energy), and $C_u$ is a closed curve whose points $(p,q)$ are identified by a coordinate (e.g., by the time necessary for an arbitrarily fixed datum with the same energy to evolve into $(p,q)$).

In the above cases, [24], if the “radial” coordinate is chosen to be the energy the set $U$ is the interval $(0,+\infty)$ for the harmonic oscillator, $(0,2mg)$ or $(2mg,+\infty)$ for the pendulum, and $(-\tfrac12mk^2/G^2,0)$ in the Kepler case. The fixed datum for the reference motion can be taken, in all cases, to be of the form $(0,q_0)$ with the time coordinate $t_0$ given by [23].

It is remarkable that the energy-time coordinates are canonical coordinates: for instance, in the vicinity of $(p_0,q_0)$ and if $p_0 > 0$, this can be seen by setting

$$S(q,E) = \int_{q_0}^{q}\sqrt{2m(E - V(x))}\,dx \qquad [25]$$

and checking that $p = \partial_qS(q,E)$, $t = \partial_ES(q,E)$ are identities if $(p,q)$ and $(E,t)$ are coordinates for the same point so that the criterion expressed by [20] applies.

It is convenient to standardize the coordinates by replacing the time variable by an angle $\alpha = (2\pi/T(E))\,t$; and instead of the energy any invertible function of it can be used.

It is natural to look for a coordinate $A = A(E)$ such that the map $(p,q) \leftrightarrow (A,\alpha)$ is a canonical map: this is easily done as the function

$$S(q,A) = \int_{q_0}^{q}\sqrt{2m(E(A) - V(x))}\,dx \qquad [26]$$

generates (locally) the correspondence between $p = \sqrt{2m(E(A) - V(q))}$ and

$$\alpha = E'(A)\int_{q_0}^{q}\frac{dx}{\sqrt{2m^{-1}(E(A) - V(x))}}$$

Therefore, by the criterion [20], if $E'(A) = 2\pi/T(E(A))$, i.e., if $A'(E) = T(E)/2\pi$, the coordinates $(A,\alpha)$ will be canonical coordinates. Hence, by [22], $A(E)$ can be taken as

$$A = \frac{1}{2\pi}\,2\int_{q_-(E)}^{q_+(E)}\sqrt{2m(E - V(q))}\,dq \equiv \frac{1}{2\pi}\oint p\,dq \qquad [27]$$

where the last integral is extended to the closed curve of energy $E$; see Figure 1. The action-angle coordinates $(A,\alpha)$ are defined in open regions of phase space covered by periodic motions: in action-angle coordinates such regions have the form $W = J\times\mathbb{T}$ of a product of an open interval $J$ and a one-dimensional “torus” $\mathbb{T} = [0,2\pi]$ (i.e., a unit circle). For details, the reader is again referred to Landau and Lifshitz (1976), Arnol’d (1989), and Gallavotti (1983).
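The relation $A'(E) = T(E)/2\pi$ behind [27] lends itself to a quick numerical check. The Python sketch below computes the action [27] and the period [22] by the same change of variable $x = \text{mid} + \text{half}\cdot\sin u$ and compares a finite-difference derivative of $A(E)$ with $T(E)/2\pi$ for a pendulum. The pendulum constants $m = g = h = 1$, the helper names, and the quadrature rule are assumptions made for this example.

```python
import math

def endpoint_safe_integral(f, a, b, n=2000):
    """Midpoint rule for the integral of f over [a, b] after x = mid + half*sin(u)."""
    mid, half = 0.5 * (a + b), 0.5 * (b - a)
    du = math.pi / n
    total = 0.0
    for j in range(n):
        u = -0.5 * math.pi + (j + 0.5) * du
        total += f(mid + half * math.sin(u)) * half * math.cos(u)
    return total * du

def action(V, E, q_minus, q_plus, m=1.0):
    """A(E) = (1/pi) * integral_{q-}^{q+} sqrt(2m(E - V(q))) dq, cf. [27]."""
    def integrand(x):
        return math.sqrt(max(2.0 * m * (E - V(x)), 0.0))
    return endpoint_safe_integral(integrand, q_minus, q_plus) / math.pi

def period(V, E, q_minus, q_plus, m=1.0):
    """T(E) = 2 * integral_{q-}^{q+} dx / sqrt((2/m)(E - V(x))), cf. [22]."""
    def integrand(x):
        return 1.0 / math.sqrt((2.0 / m) * (E - V(x)))
    return 2.0 * endpoint_safe_integral(integrand, q_minus, q_plus)

def pendulum_potential(x):
    """V(x) = 1 - cos x (pendulum with m = g = h = 1, an assumption)."""
    return 1.0 - math.cos(x)

def pendulum_action(E):
    """Action of the oscillatory pendulum motion of energy 0 < E < 2."""
    q_turn = math.acos(1.0 - E)
    return action(pendulum_potential, E, -q_turn, q_turn)

def pendulum_period(E):
    """Period of the oscillatory pendulum motion of energy 0 < E < 2."""
    q_turn = math.acos(1.0 - E)
    return period(pendulum_potential, E, -q_turn, q_turn)

def action_derivative(E, dE=1e-4):
    """Central-difference estimate of A'(E) for the pendulum."""
    return (pendulum_action(E + dE) - pendulum_action(E - dE)) / (2.0 * dE)
```

With these definitions `action_derivative(1.0)` and `pendulum_period(1.0) / (2 * math.pi)` should agree to several digits, and for the harmonic potential $V(x) = \omega^2x^2/2$ (unit mass) the action reduces to $E/\omega$.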


Quasiperiodicity and Integrability 


A Hamiltonian is called integrable in an open region $W \subset T^*(M)$ of phase space if

1. there is an analytic and nonsingular (i.e., with nonzero Jacobian) change of coordinates $(p,q) \leftrightarrow (I,\varphi)$ mapping $W$ into a set of the form $\mathcal{I}\times\mathbb{T}^\ell$ with $\mathcal{I} \subset \mathbb{R}^\ell$ (open); and furthermore

2. the flow $t \to S_t(p,q)$ on phase space is transformed into $(I,\varphi) \to (I,\varphi + \omega(I)t)$ where $\omega(I)$ is a smooth function on $\mathcal{I}$.


This means that, in suitable coordinates, which can be called “integrating coordinates,” the system appears as a set of $\ell$ points with coordinates $\varphi = (\varphi_1,\ldots,\varphi_\ell)$ moving on a unit circle at angular velocities $\omega(I) = (\omega_1(I),\ldots,\omega_\ell(I))$ depending on the actions of the initial data.

A system integrable in a region $W$ which, in integrating coordinates $I,\varphi$, has the form $\mathcal{I}\times\mathbb{T}^\ell$ is said to be anisochronous if $\det\partial_I\omega(I) \ne 0$. It is said to be isochronous if $\omega(I) \equiv \omega$ is independent of $I$. The motions of integrable systems are called quasiperiodic with frequency spectrum $\omega(I)$, or with frequencies $\omega(I)/2\pi$, in the coordinates $(I,\varphi)$.

Clearly, an integrable system admits $\ell$ independent constants of motion, the $I = (I_1,\ldots,I_\ell)$, and, for each choice of $I$, the other coordinates vary on a “standard” $\ell$-dimensional torus $\mathbb{T}^\ell$: hence, it is possible to say that a phase space region of integrability is foliated into $\ell$-dimensional invariant tori $\mathcal{T}(I)$ parametrized by the values of the constants of motion $I \in \mathcal{I}$.

If an integrable system is anisochronous then it is canonically integrable: that is, it is possible to define on $W$ a canonical change of coordinates $(p,q) = C(A,\alpha)$ mapping $W$ onto $J\times\mathbb{T}^\ell$ and such that $H(C(A,\alpha)) = h(A)$ for a suitable $h$. Then, if $\omega(A) \overset{\mathrm{def}}{=} \partial_Ah(A)$, the equations of motion become

$$\dot A = 0, \qquad \dot\alpha = \omega(A) \qquad [28]$$

Given a system $(I,\varphi)$ of coordinates integrating an anisochronous system the construction of action-angle coordinates can be performed, in principle, via a classical procedure (under a few extra assumptions).

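Let $\gamma_1,\ldots,\gamma_\ell$ be $\ell$ topologically independent circles on $\mathbb{T}^\ell$, for definiteness let $\gamma_i(I) = \{\varphi \mid \varphi_1 = \varphi_2 = \cdots = \varphi_{i-1} = \varphi_{i+1} = \cdots = 0,\ \varphi_i \in [0,2\pi]\}$, and set

$$A_i(I) = \frac{1}{2\pi}\oint_{\gamma_i(I)}p\cdot dq \qquad [29]$$

If the map $I \leftrightarrow A(I)$ is analytically invertible as $I = I(A)$, the function

$$S(A,\varphi) = \int_{\lambda}p\cdot dq \qquad [30]$$

is well defined if the integral is over any path $\lambda$ joining the points $(p(I(A),0),q(I(A),0))$ and $(p(I(A),\varphi),q(I(A),\varphi))$ and lying on the torus parametrized by $I(A)$.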

The key remark in the proof that [30] really defines a function of the only variables $A,\varphi$ is that anisochrony implies the vanishing of the Poisson brackets (cf. [18]): $\{I_i,I_j\} = 0$ (hence also $\{A_i,A_j\} \equiv \sum_{h,k}\partial_{I_k}A_i\,\partial_{I_h}A_j\,\{I_k,I_h\} = 0$). And the property $\{I_i,I_j\} = 0$ can be checked to be precisely the integrability condition for the differential form $p\cdot dq$ restricted to the surface obtained by varying $q$ while $p$ is constrained so that $(p,q)$ stays on the surface $I = $ constant, i.e., on the invariant torus of the points with fixed $I$.

The latter property is necessary and sufficient in order that the function $S(A,\varphi)$ be well defined (i.e., be independent of the integration path $\lambda$) up to an additive quantity of the form $\sum_i2\pi n_iA_i$ with $n = (n_1,\ldots,n_\ell)$ integers.

Then the action-angle variables are defined by the canonical change of coordinates with $S(A,\varphi)$ as generating function, i.e., by setting

$$\alpha_i = \partial_{A_i}S(A,\varphi), \qquad I_i = \partial_{\varphi_i}S(A,\varphi) \qquad [31]$$


and, since the computation of $S(A,\varphi)$ is “reduced to integrations” which can be regarded as a natural extension of the quadratures discussed in the one-dimensional cases, such systems are also called integrable by quadratures. The just-described construction is a version of the more general Arnol’d-Liouville theorem.

In practice, however, the actual evaluation of the 
integrals in [29], [30] can be difficult: its analysis in 
various cases (even as “elementary” as the pendu- 
lum) has in fact led to key progress in various 
domains, for example, in the theory of special 
functions and in group theory. 

In general, any surface on phase space on which the restriction of the differential form $p\cdot dq$ is locally integrable is called a Lagrangian manifold: hence the invariant tori of an anisochronous integrable system are Lagrangian manifolds.

If an integrable system is anisochronous, it cannot admit more than $\ell$ independent constants of motion; furthermore, it does not admit invariant tori of dimension $> \ell$. Hence $\ell$-dimensional invariant tori are called maximal.

Of course, invariant tori of dimension $< \ell$ can also exist: this happens when the variables $I$ are such that the frequencies $\omega(I)$ admit nontrivial rational relations; i.e., there is an integer components vector $\nu \in \mathbb{Z}^\ell$, $\nu = (\nu_1,\ldots,\nu_\ell) \ne 0$ such that

$$\omega(I)\cdot\nu = \sum_i\omega_i(I)\,\nu_i = 0 \qquad [32]$$

in this case, the invariant torus $\mathcal{T}(I)$ is called resonant. If the system is anisochronous then $\det\partial_I\omega(I) \ne 0$ and, therefore, the resonant tori are associated with values of the constants of motion $I$ which form a set of measure zero in the space $\mathcal{I}$ but which is not empty and dense.

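Examples of isochronous systems are the systems of harmonic oscillators, i.e., systems with Hamiltonian

$$\sum_{i=1}^{\ell}\frac{p_i^2}{2m_i} + \frac12\sum_{i,j=1}^{\ell}c_{ij}\,q_iq_j$$

where the matrix $c$ is a positive-definite matrix. This is an isochronous system with frequencies $\omega = (\omega_1,\ldots,\omega_\ell)$ whose squares are the eigenvalues of the matrix $m_i^{-1/2}c_{ij}m_j^{-1/2}$. It is integrable in the region $W$ of the data $x = (p,q) \in \mathbb{R}^{2\ell}$ such that, setting

$$A_\beta = \frac{1}{2\omega_\beta}\left(\Big(\sum_{i=1}^{\ell}\frac{v_{\beta,i}\,p_i}{\sqrt{m_i}}\Big)^2 + \omega_\beta^2\Big(\sum_{i=1}^{\ell}v_{\beta,i}\sqrt{m_i}\,q_i\Big)^2\right)$$

for all eigenvectors $v_\beta$, $\beta = 1,\ldots,\ell$, of the above matrix, the vectors $A$ have all components $> 0$.

Even though this system is isochronous, it nevertheless admits a system of canonical action-angle coordinates in which the Hamiltonian takes the simplest form

$$h(A) = \sum_{\beta=1}^{\ell}\omega_\beta A_\beta \equiv \omega\cdot A \qquad [33]$$

with

$$\alpha_\beta = -\arctan\frac{\sum_{i=1}^{\ell}v_{\beta,i}\,p_i/\sqrt{m_i}}{\sum_{i=1}^{\ell}\sqrt{m_i}\,\omega_\beta\,v_{\beta,i}\,q_i}$$

as conjugate angles.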

An example of anisochronous system is the free rotators or free wheels: i.e., $\ell$ noninteracting points on a circle of radius $R$ or $\ell$ noninteracting homogeneous coaxial wheels of radius $R$. If $J_i = m_iR^2$ or, respectively, $J_i = (1/2)m_iR^2$ are the inertia moments and if the positions are determined by $\ell$ angles $\alpha = (\alpha_1,\ldots,\alpha_\ell)$, the angular velocities are constants related to the angular momenta $A = (A_1,\ldots,A_\ell)$ by $\omega_i = A_i/J_i$. The Hamiltonian and the spectrum are

$$h(A) = \sum_{i=1}^{\ell}\frac{A_i^2}{2J_i}, \qquad \omega(A) = \Big(\frac{A_i}{J_i}\Big)_{i=1,\ldots,\ell}$$

For further details see Landau and Lifshitz (1976), Gallavotti (1983), Arnol’d (1989), and Fassò (1998).


Multidimensional Quadratures: 
Central Motion 


Several important mechanical systems with more than one degree of freedom are integrable by canonical quadratures in vast regions of phase space. This is checked by showing that there is a foliation into invariant tori $\mathcal{T}(I)$ of dimension equal to the number of degrees of freedom ($\ell$) parametrized by $\ell$ constants of motion $I$ in involution, i.e., such that $\{I_i,I_j\} = 0$. One then performs, if possible, the construction of the action-angle variables by the quadratures discussed in the previous section.

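The above procedure is well illustrated by the theory of the planar motion of a point of mass $m$ attracted by a coplanar center of force: the Lagrangian is, in polar coordinates $(\rho,\theta)$,

$$\mathcal{L} = \frac{m}{2}\big(\dot\rho^2 + \rho^2\dot\theta^2\big) - V(\rho)$$

The planarity of the motion is not a strong restriction as central motion always takes place on a plane.

Hence, the equations of motion are

$$\frac{d}{dt}\big(m\rho^2\dot\theta\big) = 0$$

i.e., $m\rho^2\dot\theta = G$ is a constant of motion (it is the angular momentum), and

$$m\ddot\rho = -\partial_\rho V(\rho) + m\rho\,\dot\theta^2 = -\partial_\rho V(\rho) + \frac{G^2}{m\rho^3}$$

Then the energy conservation yields a second constant of motion $E$,

$$\frac{m}{2}\dot\rho^2 + \frac12\frac{G^2}{m\rho^2} + V(\rho) = E = \frac{1}{2m}p_\rho^2 + \frac{1}{2m}\frac{p_\theta^2}{\rho^2} + V(\rho) \qquad [35]$$

The right-hand side (rhs) is the Hamiltonian for the system, derived from $\mathcal{L}$, if $p_\rho,p_\theta$ denote conjugate momenta of $\rho,\theta$: $p_\rho = m\dot\rho$ and $p_\theta = m\rho^2\dot\theta$ (note that $p_\theta = G$).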

Suppose $\rho^2V(\rho) \to 0$ as $\rho \to 0$: then the singularity at the origin cannot be reached by any motion starting with $\rho > 0$ if $G > 0$. Assume also that the function

$$V_G(\rho) \overset{\mathrm{def}}{=} \frac12\frac{G^2}{m\rho^2} + V(\rho)$$

has only one minimum $E_0(G)$, no maximum and no horizontal inflection, and tends to a limit $E_\infty(G) \le \infty$ when $\rho \to \infty$. Then the system is integrable in the domain $W = \{(p,q) \mid E_0(G) < E < E_\infty(G),\ G \ne 0\}$. This is checked by introducing a “standard” periodic solution $t \to R(t)$ of $m\ddot\rho = -\partial_\rho V_G(\rho)$ with energy $E_0(G) < E < E_\infty(G)$ and initial data $\rho = \rho_{E,-}(G)$, $\dot\rho = 0$ at time $t = 0$, where $\rho_{E,\pm}(G)$ are the two solutions of $V_G(\rho) = E$, see the section “Quadratures”: this is a periodic analytic function of $t$ with period

$$T(E,G) = 2\int_{\rho_{E,-}(G)}^{\rho_{E,+}(G)}\frac{dx}{\sqrt{(2/m)(E - V_G(x))}}$$

The function $R(t)$ is given, for $0 \le t \le \tfrac12T(E,G)$ or for $\tfrac12T(E,G) \le t \le T(E,G)$, by the quadratures

$$t = \int_{\rho_{E,-}(G)}^{R(t)}\frac{dx}{\sqrt{(2/m)(E - V_G(x))}}, \qquad t = \frac{T(E,G)}{2} - \int_{\rho_{E,+}(G)}^{R(t)}\frac{dx}{\sqrt{(2/m)(E - V_G(x))}} \qquad [36]$$

respectively. The analytic regularity of $R(t)$ follows from the general existence, uniqueness, and regularity theorems applied to the differential equation for $\rho$.

Given an initial datum $\dot\rho_0,\rho_0,\dot\theta_0,\theta_0$ with energy $E$ and angular momentum $G$, define $t_0$ to be the time such that $R(t_0) = \rho_0$, $\dot R(t_0) = \dot\rho_0$: then $\rho(t) \equiv R(t + t_0)$ and $\theta(t)$ can be computed as

$$\theta(t) = \theta_0 + \int_0^t\frac{G}{m\,R(t'+t_0)^2}\,dt'$$

a second quadrature. Therefore, we can use as coordinates for the motion $E,G,t_0$, which determine $\dot\rho_0,\rho_0,\dot\theta_0$, and a fourth coordinate that determines $\theta_0$ which could be $\theta_0$ itself but which is conveniently determined, via the second quadrature, as follows. The function $G\,m^{-1}R(t)^{-2}$ is periodic with period $T(E,G)$; hence it can be expressed in a Fourier series

$$\chi_0(E,G) + \sum_{k\ne0}\chi_k(E,G)\,\exp\Big(\frac{2\pi}{T(E,G)}\,i\,t\,k\Big)$$

the quadrature for $\theta(t)$ can be performed by integrating the series terms. Setting

$$\bar\theta(t_0) \overset{\mathrm{def}}{=} \frac{T(E,G)}{2\pi}\sum_{k\ne0}\frac{\chi_k(E,G)}{i\,k}\,\exp\Big(\frac{2\pi}{T(E,G)}\,i\,t_0\,k\Big)$$

and $\varphi_1(0) = \theta_0 - \bar\theta(t_0)$, the expression

$$\theta(t) = \theta_0 + \int_0^t\frac{G}{m\,R(t'+t_0)^2}\,dt'$$

becomes, with $\varphi_1(t) = \theta(t) - \bar\theta(t + t_0)$,

$$\varphi_1(t) = \varphi_1(0) + \chi_0(E,G)\,t \qquad [37]$$

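Hence the system is integrable and the spectrum is $\omega(E,G) = (\omega_0(E,G),\omega_1(E,G)) \equiv (\omega_0,\omega_1)$ with

$$\omega_0 \overset{\mathrm{def}}{=} \frac{2\pi}{T(E,G)} \qquad\text{and}\qquad \omega_1 \overset{\mathrm{def}}{=} \chi_0(E,G)$$

while $I = (E,G)$ are constants of motion and the angles $\varphi = (\varphi_0,\varphi_1)$ can be taken as

$$\varphi_0 \overset{\mathrm{def}}{=} \omega_0t_0, \qquad \varphi_1 \overset{\mathrm{def}}{=} \theta_0 - \bar\theta(t_0)$$

At $E,G$ fixed, the motion takes place on a two-dimensional torus $\mathcal{T}(E,G)$ with $\varphi_0,\varphi_1$ as angles.

In the anisochronous cases, i.e., when $\det\partial_{E,G}\,\omega(E,G) \ne 0$, canonical action-angle variables conjugated to $(p_\rho,\rho,p_\theta,\theta)$ can be constructed via [29], [30] by using two cycles $\gamma_1,\gamma_2$ on the torus $\mathcal{T}(E,G)$. It is convenient to choose

1. $\gamma_1$ as the cycle consisting of the points $\rho = x$, $\theta = 0$ whose first half (where $p_\rho \ge 0$) consists in the set $\rho_{E,-}(G) \le x \le \rho_{E,+}(G)$, $p_\rho = \sqrt{2m(E - V_G(x))}$ and $d\theta = 0$; and

2. $\gamma_2$ as the cycle $\rho = $ const, $\theta \in [0,2\pi]$ on which $d\rho = 0$ and $p_\theta = G$, obtaining

$$A_1 = \frac{2}{2\pi}\int_{\rho_{E,-}(G)}^{\rho_{E,+}(G)}\sqrt{2m(E - V_G(x))}\,dx, \qquad A_2 = G \qquad [38]$$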
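The quantities $\omega_0 = 2\pi/T(E,G)$, $\chi_0(E,G)$ and $A_1$ of [38] are all plain quadratures, so they can be evaluated numerically. The Python sketch below does this for the attractive potential $V(\rho) = -km/\rho$, where closed formulas are available for comparison. The choice of potential, the sample values $m = k = 1$, the change of variable $x = \text{mid} + \text{half}\cdot\sin u$, and all function names are assumptions made for this illustration.

```python
import math

def endpoint_safe_integral(f, a, b, n=4000):
    """Midpoint rule for the integral of f over [a, b] after x = mid + half*sin(u)."""
    mid, half = 0.5 * (a + b), 0.5 * (b - a)
    du = math.pi / n
    total = 0.0
    for j in range(n):
        u = -0.5 * math.pi + (j + 0.5) * du
        total += f(mid + half * math.sin(u)) * half * math.cos(u)
    return total * du

def central_motion_data(E, G, m=1.0, k=1.0):
    """Return (omega_0, chi_0, A_1) for V(rho) = -k m / rho with E < 0 and G != 0."""
    def effective_potential(x):
        # V_G(x) = G^2 / (2 m x^2) + V(x)
        return G * G / (2.0 * m * x * x) - k * m / x

    # Turning points: the two roots of V_G(rho) = E.
    disc = math.sqrt(k * k * m * m + 2.0 * E * G * G / m)
    rho_minus = (k * m - disc) / (-2.0 * E)
    rho_plus = (k * m + disc) / (-2.0 * E)

    def radial_speed(x):
        return math.sqrt((2.0 / m) * (E - effective_potential(x)))

    # Radial period T(E, G) and omega_0 = 2 pi / T.
    T = 2.0 * endpoint_safe_integral(lambda x: 1.0 / radial_speed(x),
                                     rho_minus, rho_plus)
    omega_0 = 2.0 * math.pi / T
    # chi_0: time average over one period of G / (m R(t)^2).
    chi_0 = (2.0 / T) * endpoint_safe_integral(
        lambda x: G / (m * x * x) / radial_speed(x), rho_minus, rho_plus)
    # A_1 = (2 / 2 pi) * integral of sqrt(2 m (E - V_G)), cf. [38].
    A_1 = endpoint_safe_integral(lambda x: m * radial_speed(x),
                                 rho_minus, rho_plus) / math.pi
    return omega_0, chi_0, A_1
```

For $E = -1/2$, $G = 0.8$ and $m = k = 1$ the orbit is an ellipse with major semiaxis 1, so one expects $\omega_0 = 1$, $\chi_0 = \omega_0$ (the orbit closes after one radial period), and $A_1 = \sqrt{k^2m^3/(-2E)} - G = 0.2$.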
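According to the general theory (cf. the previous section) a generating function for the canonical change of coordinates from $(p_\rho,\rho,p_\theta,\theta)$ to action-angle variables is (if, to fix ideas, $p_\rho > 0$)

$$S(A_1,A_2,\rho,\theta) = G\theta + \int_{\rho_{E,-}}^{\rho}\sqrt{2m(E - V_G(x))}\,dx \qquad [39]$$

In terms of the above $\omega_0,\chi_0$ the Jacobian matrix $\partial(E,G)/\partial(A_1,A_2)$ is computed from [38], [39] to be $\begin{pmatrix}\omega_0 & \chi_0\\ 0 & 1\end{pmatrix}$. It follows that $\partial_ES = t$, $\partial_GS = \theta - \bar\theta(t) - \chi_0t$ so that, see [31],

$$\alpha_1 \overset{\mathrm{def}}{=} \partial_{A_1}S = \omega_0t, \qquad \alpha_2 \overset{\mathrm{def}}{=} \partial_{A_2}S = \theta - \bar\theta(t) \qquad [40]$$

and $(A_1,\alpha_1)$, $(A_2,\alpha_2)$ are the action-angle pairs.

For more details, see Landau and Lifshitz (1976) and Gallavotti (1983).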


Newtonian Potential and Kepler’s Laws 


The anisochrony property, that is, $\det\partial(\omega_0,\chi_0)/\partial(A_1,A_2) \ne 0$ or, equivalently, $\det\partial(\omega_0,\chi_0)/\partial(E,G) \ne 0$, is not satisfied in the important cases of the harmonic potential and the Newtonian potential. Anisochrony being only a sufficient condition for canonical integrability it is still possible (and true) that, nevertheless, in both cases the canonical transformation generated by [39] integrates the system. This is expected since the two potentials are limiting cases of anisochronous ones (e.g., $|q|^{2+\varepsilon}$ and $|q|^{-1-\varepsilon}$ with $\varepsilon \to 0$).

Figure 2 Eccentric and true anomalies of $P$, which moves on a small circle $E$ centered at a point $c$ moving on the circle $D$ located half-way between the two concentric circles containing the Keplerian ellipse: the anomaly of $c$ with respect to the axis $OS$ is $\xi$. The circle $D$ is eccentric with respect to $S$ and therefore $\xi$ is, even today, called eccentric anomaly, whereas the circle $D$ is, in ancient terminology, the deferent circle (eccentric circles were introduced in astronomy by Ptolemy). The small circle $E$ on which the point $P$ moves is, in ancient terminology, an epicycle. The deferent and the epicyclical motions are synchronous (i.e., they have the same period); Kepler discovered that his key a priori hypothesis of inverse proportionality between angular velocity on the deferent and distance between $P$ and $S$ (i.e., $\rho\dot\xi = $ constant) implied both synchrony and elliptical shape of the orbit, with focus in $S$. The latter law is equivalent to $\rho^2\dot\theta = $ constant (because of the identity $a\dot\xi = \rho\dot\theta$). Small eccentricity ellipses can hardly be distinguished from circles.

The Newtonian potential

$$H(p,q) = \frac{1}{2m}p^2 - \frac{km}{|q|}$$

is integrable in the region $G \ne 0$, $E_0(G) = -k^2m^3/2G^2 < E < 0$, $|G| < \sqrt{k^2m^3/(-2E)}$. Proceeding as in the last section, one finds integrating coordinates and that the integrable motions develop on ellipses with one focus on the center of attraction $S$ so that motions are periodic, hence not anisochronous: nevertheless, the construction of the canonical coordinates via [29]-[31] (hence [39]) works and leads to canonical coordinates $(L',\lambda',G',\gamma')$. To obtain action-angle variables with a simple interpretation, it is convenient to perform on the variables $(L',\lambda',G',\gamma')$ (constructed by following the procedure just indicated) a further trivial canonical transformation by setting $L = L' + G'$, $G = G'$, $\lambda = \lambda'$, $\gamma = \gamma' - \lambda'$; then


1. $\lambda$ (average anomaly) is the time necessary for the point $P$ to move from the pericenter to its actual position, in units of the period, times $2\pi$;

2. $L$ (action) is essentially the energy $E = -k^2m^3/2L^2$;

3. $G$ (angular momentum);

4. $\gamma$ (axis longitude), is the angle between a fixed axis and the major axis of the ellipse oriented from the center of the ellipse $O$ to the center of attraction $S$.

The eccentricity of the ellipse is $e$ such that $G = \pm L\sqrt{1 - e^2}$. The ellipse equation is $\rho = a(1 - e\cos\xi)$, where $\xi$ is the eccentric anomaly (see Figure 2), $a = L^2/km^2$ is the major semiaxis, and $\rho$ is the distance to the center of attraction $S$.

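Finally, the relations between eccentric anomaly $\xi$, average anomaly $\lambda$, true anomaly $\theta$ (the latter is the polar angle), and $SP$ distance $\rho$ are given by the Kepler equations

$$\begin{aligned}\lambda &= \xi - e\sin\xi\\ (1 - e\cos\xi)(1 + e\cos\theta) &= 1 - e^2\\ \lambda &= (1 - e^2)^{3/2}\int_0^\theta\frac{d\theta'}{(1 + e\cos\theta')^2}\\ \frac{\rho}{a} &= \frac{1 - e^2}{1 + e\cos\theta}\end{aligned} \qquad [41]$$

and the relation between true anomaly and average anomaly can be inverted in the form

$$\xi = \lambda + g_\lambda, \qquad \theta = \lambda + f_\lambda, \qquad \frac{\rho}{a} = \frac{1 - e^2}{1 + e\cos(\lambda + f_\lambda)} \qquad [42]$$

where $g_\lambda = g(e\sin\lambda,e\cos\lambda)$, $f_\lambda = f(e\sin\lambda,e\cos\lambda)$, and $g(x,y)$, $f(x,y)$ are suitable functions analytic for $|x|,|y| < 1$. Furthermore, $g(x,y) = x(1 + y + \cdots)$, $f(x,y) = 2x(1 + \tfrac54y + \cdots)$ and the ellipses denote terms of degree 2 or higher in $x,y$, containing only even powers of $x$.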
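The inversion in [42] can be explored numerically. The Python sketch below solves the first Kepler equation of [41] for $\xi$ by Newton iteration, obtains the true anomaly from the standard half-angle relation between $\xi$ and $\theta$, and compares both with the lowest-order terms of $g$ and $f$ quoted above. The Newton scheme, the half-angle formula, the tolerances, and the function names are assumptions introduced for the example rather than anything prescribed by the text.

```python
import math

def eccentric_anomaly(lam, e, tol=1e-14, max_iter=50):
    """Solve lam = xi - e*sin(xi) for xi by Newton iteration (0 <= e < 1)."""
    xi = lam
    for _ in range(max_iter):
        delta = (xi - e * math.sin(xi) - lam) / (1.0 - e * math.cos(xi))
        xi -= delta
        if abs(delta) < tol:
            break
    return xi

def true_anomaly(xi, e):
    """True anomaly from tan(theta/2) = sqrt((1+e)/(1-e)) * tan(xi/2)."""
    return 2.0 * math.atan2(math.sqrt(1.0 + e) * math.sin(0.5 * xi),
                            math.sqrt(1.0 - e) * math.cos(0.5 * xi))

def radius_over_semiaxis(xi, e):
    """rho / a = 1 - e*cos(xi), the ellipse equation."""
    return 1.0 - e * math.cos(xi)

def series_eccentric_anomaly(lam, e):
    """Truncation xi ~ lam + x*(1 + y), with x = e*sin(lam), y = e*cos(lam)."""
    x, y = e * math.sin(lam), e * math.cos(lam)
    return lam + x * (1.0 + y)

def series_true_anomaly(lam, e):
    """Truncation theta ~ lam + 2*x*(1 + 5*y/4)."""
    x, y = e * math.sin(lam), e * math.cos(lam)
    return lam + 2.0 * x * (1.0 + 1.25 * y)
```

For a small eccentricity such as $e = 0.01$ the truncated series should agree with the numerical solution up to terms of third order in $e$, while for larger $e$ only the Newton solution remains accurate.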

For more details, the reader is referred to Landau and Lifshitz (1976) and Gallavotti (1983).


Rigid Body 


Another fundamental integrable system is the rigid body in the absence of gravity and with a fixed point $O$. It can be naturally described in terms of the Euler angles $\theta_0,\varphi_0,\psi_0$ (see Figure 3) and their derivatives $\dot\theta_0,\dot\varphi_0,\dot\psi_0$.

Let $I_1,I_2,I_3$ be the three principal inertia moments of the body along the three principal axes with unit vectors $\mathbf{i}_1,\mathbf{i}_2,\mathbf{i}_3$. The inertia moments and the principal axes are the eigenvalues and the associated unit eigenvectors of the $3\times3$ inertia matrix $\mathcal{I}$, which is defined by $\mathcal{I}_{hk} = \sum_{i=1}^{n}m_i(\mathbf{x}_i)_h(\mathbf{x}_i)_k$, where $h,k = 1,2,3$ and $\mathbf{x}_i$ is the position of the $i$th particle in a reference frame with origin at $O$ and in which all particles are at rest: this comoving frame exists as a consequence of the rigidity constraint. The principal axes form a coordinate system which is comoving as well: that is, in the frame $(O;\mathbf{i}_1,\mathbf{i}_2,\mathbf{i}_3)$ as well, the particles are at rest.

Figure 3 The Euler angles of the comoving frame $\mathbf{i}_1,\mathbf{i}_2,\mathbf{i}_3$ with respect to a fixed frame $x,y,z$. The direction $\mathbf{n}$ is the “node line,” intersection between the planes $x,y$ and $\mathbf{i}_1,\mathbf{i}_2$.

The Lagrangian is simply the kinetic energy: we imagine the rigidity constraint to be ideal (e.g., as realized by internal central forces in the limit of infinite rigidity, as mentioned in the section “Lagrange and Hamilton forms of equations of motion”). The angular velocity of the rigid motion is defined by

$$\boldsymbol\omega = \dot\theta_0\,\mathbf{n} + \dot\varphi_0\,\mathbf{z} + \dot\psi_0\,\mathbf{i}_3 \qquad [43]$$

expressing that a generic infinitesimal motion must consist of a variation of the three Euler angles and, therefore, it has to be a rotation of speeds $\dot\theta_0,\dot\varphi_0,\dot\psi_0$ around the axes $\mathbf{n},\mathbf{z},\mathbf{i}_3$ as shown in Figure 3.

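Let $(\omega_1,\omega_2,\omega_3)$ be the components of $\boldsymbol\omega$ along the principal axes $\mathbf{i}_1,\mathbf{i}_2,\mathbf{i}_3$: for brevity, the latter axes will often be called 1, 2, 3. Then the angular momentum $\mathbf{M}$, with respect to the pivot point $O$, and the kinetic energy $K$ can be checked to be

$$\mathbf{M} = I_1\omega_1\mathbf{i}_1 + I_2\omega_2\mathbf{i}_2 + I_3\omega_3\mathbf{i}_3, \qquad K = \frac12\big(I_1\omega_1^2 + I_2\omega_2^2 + I_3\omega_3^2\big) \qquad [44]$$

and are constants of motion. From Figure 3 it follows that $\omega_1 = \dot\theta_0\cos\psi_0 + \dot\varphi_0\sin\theta_0\sin\psi_0$, $\omega_2 = -\dot\theta_0\sin\psi_0 + \dot\varphi_0\sin\theta_0\cos\psi_0$ and $\omega_3 = \dot\varphi_0\cos\theta_0 + \dot\psi_0$, so that the Lagrangian, uninspiring at first, is

$$\mathcal{L} \overset{\mathrm{def}}{=} \frac12I_1(\dot\theta_0\cos\psi_0 + \dot\varphi_0\sin\theta_0\sin\psi_0)^2 + \frac12I_2(-\dot\theta_0\sin\psi_0 + \dot\varphi_0\sin\theta_0\cos\psi_0)^2 + \frac12I_3(\dot\varphi_0\cos\theta_0 + \dot\psi_0)^2 \qquad [45]$$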

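Angular momentum conservation does not imply
that the components $\omega_j$ are constants because
$\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$ also change with time according to

$$\frac{d}{dt}\mathbf{i}_j = \boldsymbol{\omega}\wedge\mathbf{i}_j, \qquad j = 1,2,3$$

Hence, $\dot{\mathbf{M}} = 0$ becomes, by the first of [44] and
denoting $I\boldsymbol{\omega} = (I_1\omega_1, I_2\omega_2, I_3\omega_3)$, the Euler equations
$I\dot{\boldsymbol{\omega}} + \boldsymbol{\omega}\wedge I\boldsymbol{\omega} = 0$, or

$$I_1\dot\omega_1 = (I_2 - I_3)\,\omega_2\omega_3, \qquad I_2\dot\omega_2 = (I_3 - I_1)\,\omega_3\omega_1, \qquad I_3\dot\omega_3 = (I_1 - I_2)\,\omega_1\omega_2 \qquad [46]$$

which can be considered together with the conserved
quantities [44].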
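The statement that the Euler equations [46] preserve the kinetic energy and the length of the angular momentum can be checked numerically. The following is a minimal sketch, not part of the original text: it integrates [46] with a fourth-order Runge-Kutta step and evaluates $K$ and $G^2 = \sum_j I_j^2\omega_j^2$. The function names, the step size and the sample inertia moments are illustrative assumptions.

```python
def euler_rhs(w, I):
    """Right-hand side of the Euler equations [46] for principal moments I = (I1, I2, I3)."""
    w1, w2, w3 = w
    I1, I2, I3 = I
    return ((I2 - I3) * w2 * w3 / I1,
            (I3 - I1) * w3 * w1 / I2,
            (I1 - I2) * w1 * w2 / I3)


def rk4_step(w, I, dt):
    """One classical Runge-Kutta step for the body-frame angular velocity."""
    def shifted(a, b, s):
        return tuple(x + s * y for x, y in zip(a, b))
    k1 = euler_rhs(w, I)
    k2 = euler_rhs(shifted(w, k1, dt / 2), I)
    k3 = euler_rhs(shifted(w, k2, dt / 2), I)
    k4 = euler_rhs(shifted(w, k3, dt), I)
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(w, k1, k2, k3, k4))


def kinetic_energy(w, I):
    """K of [44]."""
    return 0.5 * sum(Ij * wj * wj for Ij, wj in zip(I, w))


def momentum_squared(w, I):
    """G^2, the squared length of the angular momentum M of [44]."""
    return sum((Ij * wj) ** 2 for Ij, wj in zip(I, w))


def integrate(w0, I, dt=1e-3, steps=10000):
    """Advance the angular velocity for steps * dt time units."""
    w = w0
    for _ in range(steps):
        w = rk4_step(w, I, dt)
    return w


if __name__ == "__main__":
    I = (1.0, 2.0, 3.0)           # illustrative inertia moments
    w0 = (1.0, 0.1, 0.5)          # illustrative initial angular velocity
    w1 = integrate(w0, I)
    print(kinetic_energy(w0, I), kinetic_energy(w1, I))
    print(momentum_squared(w0, I), momentum_squared(w1, I))
```

The individual components $\omega_j$ change along the run while the two printed pairs agree up to the integration error, which is the content of the remark preceding [46].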


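Since angular momentum is conserved, it is convenient
to introduce the laboratory frame $(O; \mathbf{x}_0, \mathbf{y}_0, \mathbf{z}_0)$
with fixed axes $\mathbf{x}_0, \mathbf{y}_0, \mathbf{z}_0$ and (see Figure 4):

1. $(O; \mathbf{x}, \mathbf{y}, \mathbf{z})$, the momentum frame with fixed axes,
but with z-axis oriented as $\mathbf{M}$, and x-axis
coinciding with the node (i.e., the intersection)
of the $\mathbf{x}_0$-$\mathbf{y}_0$ plane and the $\mathbf{x}$-$\mathbf{y}$ plane (orthogonal
to $\mathbf{M}$). Therefore, $\mathbf{x}, \mathbf{y}, \mathbf{z}$ is determined by the two
Euler angles $\zeta, \gamma$ of $(O; \mathbf{x}, \mathbf{y}, \mathbf{z})$ in $(O; \mathbf{x}_0, \mathbf{y}_0, \mathbf{z}_0)$;

2. $(O; 1, 2, 3)$, the comoving frame, that is, the
frame fixed with the body, and with unit vectors
$\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$ parallel to the principal axes of the body.
The frame is determined by three Euler angles
$\theta_0, \varphi_0, \psi_0$;

3. the Euler angles of $(O; 1, 2, 3)$ with respect to
$(O; \mathbf{x}, \mathbf{y}, \mathbf{z})$, which are denoted $\theta, \varphi, \psi$;

4. $G$, the total angular momentum: $G^2 = \sum_j I_j^2\omega_j^2$;

5. $M_3$, the angular momentum along the $\mathbf{z}_0$ axis;
$M_3 = G\cos\zeta$; and

6. $L$, the projection of $\mathbf{M}$ on the axis 3, $L = G\cos\theta$.

The quantities $G, M_3, L, \varphi, \gamma, \psi$ determine $\theta_0, \varphi_0, \psi_0$
and $\dot\theta_0, \dot\varphi_0, \dot\psi_0$, or the $p_{\theta_0}, p_{\varphi_0}, p_{\psi_0}$ variables
conjugated to $\theta_0, \varphi_0, \psi_0$, as shown by the following
comment.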

Considering Figure 4, the angles $\zeta, \gamma$ determine the
location, in the fixed frame $(O; \mathbf{x}_0, \mathbf{y}_0, \mathbf{z}_0)$, of the
direction of $\mathbf{M}$ and the node line $\mathbf{m}$, which are,
respectively, the z-axis and the x-axis of the fixed
frame associated with the angular momentum; the
angles $\theta, \varphi, \psi$ then determine the position of the
comoving frame with respect to the fixed frame
$(O; \mathbf{x}, \mathbf{y}, \mathbf{z})$, hence its position with respect to
$(O; \mathbf{x}_0, \mathbf{y}_0, \mathbf{z}_0)$, that is, $(\theta_0, \varphi_0, \psi_0)$. From this and
$G$, it is possible to determine $\boldsymbol{\omega}$ because
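$$\cos\theta = \frac{I_3\omega_3}{G}, \qquad \tan\psi = \frac{I_1\omega_1}{I_2\omega_2}, \qquad \omega_2^2 = I_2^{-2}\left(G^2 - I_1^2\omega_1^2 - I_3^2\omega_3^2\right) \qquad [47]$$

and, from [43], $\dot\theta_0, \dot\varphi_0, \dot\psi_0$ are determined.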





Figure 4 The laboratory frame, the angular momentum frame, 
and the comoving frame (and the Deprit angles). 


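The Lagrangian [45] gives immediately (after
expressing $\boldsymbol{\omega}$, i.e., $\mathbf{n}, \mathbf{z}, \mathbf{i}_3$, in terms of the Euler
angles $\theta_0, \varphi_0, \psi_0$) an expression for the variables
$p_{\theta_0}, p_{\varphi_0}, p_{\psi_0}$ conjugated to $\theta_0, \varphi_0, \psi_0$:

$$p_{\theta_0} = \mathbf{M}\cdot\mathbf{n}_0, \qquad p_{\varphi_0} = \mathbf{M}\cdot\mathbf{z}_0, \qquad p_{\psi_0} = \mathbf{M}\cdot\mathbf{i}_3 \qquad [48]$$

and, in principle, we could proceed to compute the
Hamiltonian.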

However, the computation can be avoided
because of the very remarkable property (Deprit),
which can be checked with some patience, making
use of [48] and of elementary spherical trigonometry
identities,

$$M_3\,d\gamma + G\,d\varphi + L\,d\psi = p_{\varphi_0}\,d\varphi_0 + p_{\psi_0}\,d\psi_0 + p_{\theta_0}\,d\theta_0 \qquad [49]$$

which means that the map $((M_3,\gamma),(L,\psi),(G,\varphi)) \to ((p_{\theta_0},\theta_0),(p_{\varphi_0},\varphi_0),(p_{\psi_0},\psi_0))$ is a canonical
map. And in the new coordinates, the kinetic
energy, hence the Hamiltonian, takes the form

$$K = \frac12\,\frac{L^2}{I_3} + \frac12\left(G^2 - L^2\right)\left(\frac{\sin^2\psi}{I_1} + \frac{\cos^2\psi}{I_2}\right) \qquad [50]$$


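This again shows that $G, M_3$ are constants of
motion, and the $L, \psi$ variables are determined by a
quadrature, because the Hamilton equation for $\psi$
combined with the energy conservation yields

$$\dot\psi = \pm\left(\frac{1}{I_3} - \frac{\sin^2\psi}{I_1} - \frac{\cos^2\psi}{I_2}\right)\sqrt{\frac{2E - G^2\left(\dfrac{\sin^2\psi}{I_1} + \dfrac{\cos^2\psi}{I_2}\right)}{\dfrac{1}{I_3} - \dfrac{\sin^2\psi}{I_1} - \dfrac{\cos^2\psi}{I_2}}} \qquad [51]$$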


In the integrability region, this motion is periodic
with some period $T_L(E,G)$. Once $\psi(t)$ is determined,
the Hamilton equation for $\varphi$ leads to the further
quadrature

$$\dot\varphi = G\left(\frac{\sin^2\psi(t)}{I_1} + \frac{\cos^2\psi(t)}{I_2}\right) \qquad [52]$$

which determines a second periodic motion with
period $T_G(E,G)$. The $\gamma, M_3$ are constants and,
therefore, the motion takes place on three-dimensional
invariant tori $\mathcal{T}_{E,G,M_3}$ in phase space,
each of which is "always" foliated into two-dimensional
invariant tori parametrized by the
angle $\gamma$ which is constant (by [50], because $K$ is
$M_3$-independent): the latter are in turn foliated by
one-dimensional invariant tori, that is, by periodic
orbits, with $E, G$ such that the value of
$T_L(E,G)/T_G(E,G)$ is rational.

Note that if $I_1 = I_2 = I$, the above analysis is
extremely simplified. Furthermore, if gravity $g$ acts
on the system the Hamiltonian will simply change by
the addition of a potential $-mgz$ if $z$ is the height of
the center of mass. Then (see Figure 4), if the center
of mass of the body is on the axis $\mathbf{i}_3$ and $z = h\cos\theta_0$,
and $h$ is the distance of the center of mass from O,
since $\cos\theta_0 = \cos\theta\cos\zeta - \sin\theta\sin\zeta\cos\varphi$, the Hamiltonian
will become $H = K - mgh\cos\theta_0$ or

$$H = \frac{L^2}{2I_3} + \frac{G^2 - L^2}{2I} - mgh\left(\frac{M_3 L}{G^2} - \Big(1 - \frac{M_3^2}{G^2}\Big)^{1/2}\Big(1 - \frac{L^2}{G^2}\Big)^{1/2}\cos\varphi\right) \qquad [53]$$

so that, again, the system is integrable by quadratures
(with the roles of $\psi$ and $\varphi$ "interchanged" with respect
to the previous case) in suitable regions of phase space.
This is called Lagrange's gyroscope.

A less elementary integrable case is when the
inertia moments are related as $I_1 = I_2 = 2I_3$ and the
center of mass is in the $\mathbf{i}_1$-$\mathbf{i}_2$ plane (rather than on
the $\mathbf{i}_3$-axis) and only gravity acts, besides the
constraint force on the pivot point O; this is called
Kowalevskaia's gyroscope.

For more details, see Gallavotti (1983). 




Other Quadratures 


An interesting classical integrable motion is that of a 
point mass attracted by two equal-mass centers of 
gravitational attraction, or a point ideally constrained 
to move on the surface of a general ellipsoid. 

New integrable systems have been discovered 
quite recently and have generated a wealth of new 
developments ranging from group theory (as integ- 
rable systems are closely related to symmetries) to 
partial differential equations. 

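It is convenient to extend the notion of integrability
by stating that a system is integrable in a
region $W$ of phase space if

1. there is a change of coordinates $(\mathbf{p},\mathbf{q}) \in W \to \{\mathbf{A}, \boldsymbol{\alpha}, \mathbf{Y}, \mathbf{y}\} \in (U\times\mathbb{T}^\ell)\times(V\times\mathbb{R}^m)$ where
$U\subset\mathbb{R}^\ell$, $V\subset\mathbb{R}^m$, with $\ell + m \ge 1$, are open sets; and

2. the $\mathbf{A}, \mathbf{Y}$ are constants of motion while the other
coordinates vary "linearly":

$$(\boldsymbol{\alpha}, \mathbf{y}) \to \left(\boldsymbol{\alpha} + \boldsymbol{\omega}(\mathbf{A},\mathbf{Y})\,t,\ \mathbf{y} + \mathbf{v}(\mathbf{A},\mathbf{Y})\,t\right) \qquad [54]$$

where $\boldsymbol{\omega}(\mathbf{A},\mathbf{Y})$, $\mathbf{v}(\mathbf{A},\mathbf{Y})$ are smooth functions.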


In the new sense, the systems studied in the previous
sections are integrable in much wider regions (essentially
on the entire phase space with the exception of a
set of data which lie on lower-dimensional surfaces
forming sets of zero volume). The notion is convenient
also because it allows us to say that even the
systems of free particles are integrable.

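Two very remarkable systems integrable in the
new sense are the Hamiltonian systems, respectively
called Toda lattice (Kruskal, Zabusky), and
Calogero lattice (Calogero, Moser); if $(p_i, q_i) \in \mathbb{R}^2$,
they are

$$H_T(\mathbf{p},\mathbf{q}) = \frac{1}{2m}\sum_{i=1}^{n} p_i^2 + \sum_{i=1}^{n-1} g\,e^{-\kappa(q_i - q_{i+1})}$$
$$H_C(\mathbf{p},\mathbf{q}) = \frac{1}{2m}\sum_{i=1}^{n} p_i^2 + \sum_{i<j}^{n} \frac{g}{(q_i - q_j)^2} + \frac12\sum_{i=1}^{n} m\,\omega^2 q_i^2 \qquad [55]$$

where $m > 0$ and $\kappa, \omega, g \ge 0$. They describe the
motion of $n$ interacting particles on a line.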

The integration method for the above systems is
again to find first the constants of motion and later
to look for quadratures, when appropriate. The
constants of motion can be found with the method
of the Lax pairs. One shows that there is a pair of
self-adjoint $n\times n$ matrices $M(\mathbf{p},\mathbf{q})$, $N(\mathbf{p},\mathbf{q})$ such that
the equations of motion become

$$\frac{d}{dt}M(\mathbf{p},\mathbf{q}) = i\,[M(\mathbf{p},\mathbf{q}), N(\mathbf{p},\mathbf{q})], \qquad i = \sqrt{-1} \qquad [56]$$

which imply that $M(t) = U(t)M(0)U(t)^{-1}$, with $U(t)$ a
unitary matrix. When the equations can be written in
the above form, it is clear that the $n$ eigenvalues of the
matrix $M(0) = M(\mathbf{p}_0,\mathbf{q}_0)$ are constants of motion.
When appropriate (e.g., in the Calogero lattice case
with $\omega > 0$), it is possible to proceed to find canonical
action-angle coordinates: a task that is quite difficult
due to the arbitrariness of $n$, but which is possible.

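The Lax pairs for the Calogero lattice (with
$\omega = 0$, $g = m = 1$) are

$$M_{hh} = p_h, \quad N_{hh} = 0, \qquad M_{hk} = \frac{i}{q_h - q_k}, \quad N_{hk} = \frac{1}{(q_h - q_k)^2}, \quad h \ne k \qquad [57]$$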


while for the Toda lattice (with $m = g = \tfrac12\kappa = 1$) the
nonzero matrix elements of $M$, $N$ are

$$M_{hh} = p_h, \qquad M_{h,h+1} = M_{h+1,h} = e^{-(q_h - q_{h+1})}, \qquad N_{h,h+1} = -N_{h+1,h} = i\,e^{-(q_h - q_{h+1})} \qquad [58]$$

which are checked by first trying the case $n = 2$.
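The check suggested for the Toda Lax pair can also be done numerically for any $n$. The sketch below is an illustration, not part of the original text. It assumes the Hamiltonian $H = \tfrac12\sum_h p_h^2 + \sum_{h<n} e^{-2(q_h - q_{h+1})}$ (that is, $m = g = 1$, $\kappa = 2$), builds the matrices $M$, $N$ of [58], computes $dM/dt$ from Hamilton's equations, and compares it with $i[M,N]$ as required by [56]. All function names are hypothetical.

```python
import math


def toda_force(q):
    """-dH/dq for H = sum p^2/2 + sum_{h<n} exp(-2 (q_h - q_{h+1}))  (assumed form)."""
    n = len(q)
    force = [0.0] * n
    for h in range(n - 1):
        a2 = math.exp(-2.0 * (q[h] - q[h + 1]))
        force[h] += 2.0 * a2
        force[h + 1] -= 2.0 * a2
    return force


def lax_pair(p, q):
    """Matrices M, N with the nonzero elements listed in [58]."""
    n = len(p)
    M = [[0j] * n for _ in range(n)]
    N = [[0j] * n for _ in range(n)]
    for h in range(n):
        M[h][h] = complex(p[h])
    for h in range(n - 1):
        a = math.exp(-(q[h] - q[h + 1]))
        M[h][h + 1] = M[h + 1][h] = complex(a)
        N[h][h + 1] = 1j * a
        N[h + 1][h] = -1j * a
    return M, N


def commutator(A, B):
    """Matrix commutator AB - BA for square nested lists."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] - B[r][k] * A[k][c] for k in range(n))
             for c in range(n)] for r in range(n)]


def lax_residual(p, q):
    """Largest entry of dM/dt - i[M, N], with dM/dt taken from Hamilton's equations."""
    n = len(p)
    M, N = lax_pair(p, q)
    force = toda_force(q)
    dM = [[0j] * n for _ in range(n)]
    for h in range(n):
        dM[h][h] = complex(force[h])                 # dp_h/dt
    for h in range(n - 1):
        a = math.exp(-(q[h] - q[h + 1]))
        da = a * (p[h + 1] - p[h])                   # chain rule with dq/dt = p
        dM[h][h + 1] = dM[h + 1][h] = complex(da)
    C = commutator(M, N)
    return max(abs(dM[r][c] - 1j * C[r][c]) for r in range(n) for c in range(n))


if __name__ == "__main__":
    print(lax_residual([0.3, -0.2, 0.5, -0.6], [0.0, 0.4, -0.3, 0.9]))
```

A residual at the level of rounding error for arbitrary data indicates that, under the stated assumption on the Hamiltonian, the equations of motion are equivalent to [56].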
Another integrable system (Sutherland) is

$$H_S(\mathbf{p},\mathbf{q}) = \frac{1}{2m}\sum_{k=1}^{n} p_k^2 + \sum_{h<k}^{n} \frac{g}{\sinh^2(q_h - q_k)} \qquad [59]$$

whose Lax pair is related to that of the Calogero
lattice.

By taking suitable limits as $n \to \infty$ and as the
other parameters tend to 0 or $\infty$ at suitable rates,
integrability of a few differential equations, among
which the Korteweg-de Vries equation or the nonlinear
Schrödinger equation, can be derived.

As mentioned in the introductory section, sym- 
metry properties under continuous groups imply 
existence of constants of motion. Hence, it is natural 
to think that integrability of a mechanical system 
reflects enough symmetry to imply the existence of 
as many constants of motion, independent and in 
involution, as the number of degrees of freedom, n. 

This is in fact always true, and in some respects it
is a tautological statement in the anisochronous
cases. Integrability in a region $W$ implies existence
of canonical action-angle coordinates $(\mathbf{A}, \boldsymbol{\alpha})$ (see the
section "Quasiperiodicity and integrability") and the
Hamiltonian depends solely on the $\mathbf{A}$'s: therefore, its
restriction to $W$ is invariant with respect to the
action of the continuous commutative group $\mathbb{T}^n$ of
the translations of the angle variables. The actions
can be seen as constants of motion whose existence
follows from Noether's theorem, at least in the
anisochronous cases in which the Hamiltonian
formulation is equivalent to a Lagrangian one.

What is nontrivial is to recognize, prior to
realizing integrability, that a system admits this
kind of symmetry: in most of the interesting cases,
the systems either do not exhibit obvious symmetries
or they exhibit symmetries apparently unrelated to
the group $\mathbb{T}^n$, which nevertheless imply existence of
sufficiently many independent constants of motion
as required for integrability. Hence, nontrivial
integrable systems possess a "hidden" symmetry
under $\mathbb{T}^n$: the rigid body is an example.

However, very often the symmetries of a Hamiltonian
$H$ which imply integrability also imply partial
isochrony, that is, they imply that the number of
independent frequencies is smaller than $n$ (see the
section "Quasiperiodicity and integrability"). Even
in such cases, often a map exists from the original
coordinates $(\mathbf{p},\mathbf{q})$ to the integrating variables $(\mathbf{A}, \boldsymbol{\alpha})$
in which $\mathbf{A}$ are constants of motion and the $\boldsymbol{\alpha}$ are
uniformly rotating angles (some of which are also
constant) with spectrum $\boldsymbol{\omega}(\mathbf{A})$, which is the gradient
$\partial_{\mathbf{A}}h(\mathbf{A})$ for some function $h(\mathbf{A})$ depending only on a
few of the $\mathbf{A}$ coordinates. However, the map might
fail to be canonical. The system is then said to be
bi-Hamiltonian: in the sense that one can represent
motions in two systems of canonical coordinates,
not related by a canonical transformation, and by
two Hamiltonian functions $H$ and $H' \equiv h$ which
generate the same motions in the respective
coordinates (the latter changes of variables are
sometimes called "canonical with respect to the
pair $H, H'$" while the transformations considered in
the section "Canonical transformations of phase
space coordinates" are called completely
canonical).

For more details, we refer the reader to Calogero 
and Degasperis (1982). 


Generic Nonintegrability 


It is natural to try to prove that a system “close” to 
an integrable one has motions with properties very 
close to quasiperiodic. This is indeed the case, but in 
a rather subtle way. That there is a problem is easily 
seen in the case of a perturbation of an anisochro- 
nous integrable system. 

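Assume that a system is integrable in a region $W$
of phase space which, in the integrating action-angle
variables $(\mathbf{A}, \boldsymbol{\alpha})$, has the standard form $U\times\mathbb{T}^\ell$ with
a Hamiltonian $h(\mathbf{A})$ with gradient $\boldsymbol{\omega}(\mathbf{A}) = \partial_{\mathbf{A}}h(\mathbf{A})$. If
the forces are perturbed by a potential which is
smooth then the new system will be described, in the
same coordinates, by a Hamiltonian like

$$H_\varepsilon(\mathbf{A}, \boldsymbol{\alpha}) = h(\mathbf{A}) + \varepsilon f(\mathbf{A}, \boldsymbol{\alpha}) \qquad [60]$$

with $h, f$ analytic in the variables $\mathbf{A}, \boldsymbol{\alpha}$.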

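If the system really behaved like the unperturbed
one, it ought to have $\ell$ constants of motion of the
form $F_\varepsilon(\mathbf{A},\boldsymbol{\alpha})$ analytic in $\varepsilon$ near $\varepsilon = 0$ and uniform,
that is, single valued (which is the same as periodic)
in the variables $\boldsymbol{\alpha}$. However, the following theorem
(Poincaré) shows that this is a somewhat unlikely
possibility.

Theorem 1 If the matrix $\partial^2_{\mathbf{A}\mathbf{A}}h(\mathbf{A})$ has rank $\ge 2$, the
Hamiltonian [60] "generically" (an intuitive notion
made precise below) cannot be integrated by a canonical
transformation $\mathcal{C}_\varepsilon(\mathbf{A},\boldsymbol{\alpha})$ which

(i) reduces to the identity as $\varepsilon \to 0$; and
(ii) is analytic in $\varepsilon$ near $\varepsilon = 0$ and in $(\mathbf{A},\boldsymbol{\alpha}) \in U'\times\mathbb{T}^\ell$, with $U'\subset U$ open.

Furthermore, no uniform constants of motion $F_\varepsilon(\mathbf{A},\boldsymbol{\alpha})$,
defined for $\varepsilon$ near 0 and $(\mathbf{A},\boldsymbol{\alpha})$ in an open domain $U'\times\mathbb{T}^\ell$,
exist other than the functions of $H_\varepsilon$ itself.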


Integrability in the sense (i), (ii) can be called 
analytic integrability and it is the strongest (and 
most naive) sense that can be given to the attribute. 

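The first part of the theorem, that is, (i), (ii), holds
simply because, if integrability was assumed, a
generating function of the integrating map would
have the form $\mathbf{A}'\cdot\boldsymbol{\alpha} + \Phi_\varepsilon(\mathbf{A}',\boldsymbol{\alpha})$ with $\Phi$ admitting a
power series expansion in $\varepsilon$ as $\Phi_\varepsilon = \varepsilon\Phi^1 + \varepsilon^2\Phi^2 + \cdots$.
Hence, $\Phi^1$ would have to satisfy

$$\boldsymbol{\omega}(\mathbf{A}')\cdot\partial_{\boldsymbol{\alpha}}\Phi^1(\mathbf{A}',\boldsymbol{\alpha}) + f(\mathbf{A}',\boldsymbol{\alpha}) = \bar f(\mathbf{A}') \qquad [61]$$

where $\bar f(\mathbf{A}')$ depends only on $\mathbf{A}'$ (hence integrating
both sides with respect to $\boldsymbol{\alpha}$, it appears that $\bar f(\mathbf{A}')$
must coincide with the average of $f(\mathbf{A}',\boldsymbol{\alpha})$ over $\boldsymbol{\alpha}$).

This implies that the Fourier transform $f_{\boldsymbol{\nu}}(\mathbf{A}')$,
$\boldsymbol{\nu}\in\mathbb{Z}^\ell$, should satisfy

$$f_{\boldsymbol{\nu}}(\mathbf{A}') = 0 \quad\text{if } \boldsymbol{\omega}(\mathbf{A}')\cdot\boldsymbol{\nu} = 0,\ \boldsymbol{\nu}\ne\mathbf{0} \qquad [62]$$

which is equivalent to the existence of $\tilde f_{\boldsymbol{\nu}}(\mathbf{A}')$ such that
$f_{\boldsymbol{\nu}}(\mathbf{A}') = \boldsymbol{\omega}(\mathbf{A}')\cdot\boldsymbol{\nu}\,\tilde f_{\boldsymbol{\nu}}(\mathbf{A}')$ for $\boldsymbol{\nu}\ne\mathbf{0}$. But since there is no
relation between $\boldsymbol{\omega}(\mathbf{A})$ and $f(\mathbf{A},\boldsymbol{\alpha})$, this property
"generically" will not hold in the sense that as close
as wished to an $f$ which satisfies the property [62] there
will be another $f$ which does not satisfy it, essentially no
matter how "closeness" is defined (e.g., with respect to
the metric $\|f - g\| = \sum_{\boldsymbol{\nu}} |f_{\boldsymbol{\nu}}(\mathbf{A}) - g_{\boldsymbol{\nu}}(\mathbf{A})|$). This is so
because the rank of $\partial^2_{\mathbf{A}\mathbf{A}}h(\mathbf{A})$ is higher than 1 and $\boldsymbol{\omega}(\mathbf{A})$
varies at least on a two-dimensional surface, so that
$\boldsymbol{\omega}\cdot\boldsymbol{\nu} = 0$ becomes certainly possible for some $\boldsymbol{\nu}\ne\mathbf{0}$
while $f_{\boldsymbol{\nu}}(\mathbf{A})$ in general will not vanish, so that $\Phi^1$,
hence $\Phi_\varepsilon$, does not exist.

This means that close to a function $f$ there is a
function $f'$ which violates [62] for some $\boldsymbol{\nu}$. Of course,
this depends on what is meant by "close": however,
here essentially any topology introduced on the
space of the functions $f$ will make the statement
correct. This is so, for instance, if the distance between two
functions is defined by $\sum_{\boldsymbol{\nu}}\sup_{\mathbf{A}\in U}|f_{\boldsymbol{\nu}}(\mathbf{A}) - g_{\boldsymbol{\nu}}(\mathbf{A})|$ or
by $\sup_{\mathbf{A},\boldsymbol{\alpha}}|f(\mathbf{A},\boldsymbol{\alpha}) - g(\mathbf{A},\boldsymbol{\alpha})|$.

The idea behind the last statement of the theorem
is in essence the same: consider, for simplicity, the
anisochronous case in which the matrix $\partial^2_{\mathbf{A}\mathbf{A}}h(\mathbf{A})$
has maximal rank $\ell$, that is, the determinant
$\det\partial^2_{\mathbf{A}\mathbf{A}}h(\mathbf{A})$ does not vanish. Anisochrony implies
that $\boldsymbol{\omega}(\mathbf{A})\cdot\boldsymbol{\nu}\ne 0$ for all $\boldsymbol{\nu}\ne\mathbf{0}$ and $\mathbf{A}$ on a dense set,
and this property will be used repeatedly in the
following analysis.

Let $B(\varepsilon,\mathbf{A},\boldsymbol{\alpha})$ be a "uniform" constant of motion,
meaning that it is single valued and analytic in the
non-simply-connected region $U\times\mathbb{T}^\ell$ and, for $\varepsilon$ small,

$$B(\varepsilon,\mathbf{A},\boldsymbol{\alpha}) = B_0(\mathbf{A},\boldsymbol{\alpha}) + \varepsilon B_1(\mathbf{A},\boldsymbol{\alpha}) + \varepsilon^2 B_2(\mathbf{A},\boldsymbol{\alpha}) + \cdots \qquad [63]$$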


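The condition that $B$ is a constant of motion can be
written order by order in its expansion in $\varepsilon$: the first
two orders are

$$\boldsymbol{\omega}(\mathbf{A})\cdot\partial_{\boldsymbol{\alpha}}B_0(\mathbf{A},\boldsymbol{\alpha}) = 0$$
$$\partial_{\mathbf{A}}f(\mathbf{A},\boldsymbol{\alpha})\cdot\partial_{\boldsymbol{\alpha}}B_0(\mathbf{A},\boldsymbol{\alpha}) - \partial_{\boldsymbol{\alpha}}f(\mathbf{A},\boldsymbol{\alpha})\cdot\partial_{\mathbf{A}}B_0(\mathbf{A},\boldsymbol{\alpha}) + \boldsymbol{\omega}(\mathbf{A})\cdot\partial_{\boldsymbol{\alpha}}B_1(\mathbf{A},\boldsymbol{\alpha}) = 0 \qquad [64]$$

Then the above two relations and anisochrony imply
(1) that $B_0$ must be a function of $\mathbf{A}$ only and (2) that
$\boldsymbol{\omega}(\mathbf{A})\cdot\boldsymbol{\nu}$ and $\partial_{\mathbf{A}}B_0(\mathbf{A})\cdot\boldsymbol{\nu}$ vanish simultaneously for all
$\boldsymbol{\nu}$. Hence, the gradient of $B_0$ must be proportional to
$\boldsymbol{\omega}(\mathbf{A})$, that is, to the gradient of $h(\mathbf{A})$: $\partial_{\mathbf{A}}B_0(\mathbf{A}) = \lambda(\mathbf{A})\,\partial_{\mathbf{A}}h(\mathbf{A})$. Therefore, generically (because of the
anisochrony) it must be that $B_0$ depends on $\mathbf{A}$
through $h(\mathbf{A})$: $B_0(\mathbf{A}) = F(h(\mathbf{A}))$ for some $F$.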

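Looking again, with the new information, at the
second of [64] it follows that at fixed $\mathbf{A}$ the
$\boldsymbol{\alpha}$-derivative in the direction $\boldsymbol{\omega}(\mathbf{A})$ of $B_1$ equals
$F'(h(\mathbf{A}))$ times the $\boldsymbol{\alpha}$-derivative of $f$, that is,
$B_1(\mathbf{A},\boldsymbol{\alpha}) = f(\mathbf{A},\boldsymbol{\alpha})F'(h(\mathbf{A})) + C_1(\mathbf{A})$.

Summarizing: the constant of motion $B$ has been
written as $B(\mathbf{A},\boldsymbol{\alpha}) = F(h(\mathbf{A})) + \varepsilon F'(h(\mathbf{A}))f(\mathbf{A},\boldsymbol{\alpha}) + \varepsilon C_1(\mathbf{A}) + \varepsilon^2 B_2 + \cdots$, which is equivalent to
$B(\mathbf{A},\boldsymbol{\alpha}) = F(H_\varepsilon) + \varepsilon(B'_0 + \varepsilon B'_1 + \cdots)$, and therefore
$B'_0 + \varepsilon B'_1 + \cdots$ is another analytic constant of
motion. Repeating the argument, also $B'_0 + \varepsilon B'_1 + \cdots$
must have the form $F_1(H_\varepsilon) + \varepsilon(B''_0 + \varepsilon B''_1 + \cdots)$;
conclusion

$$B = F(H_\varepsilon) + \varepsilon F_1(H_\varepsilon) + \varepsilon^2 F_2(H_\varepsilon) + \cdots + \varepsilon^n F_n(H_\varepsilon) + O(\varepsilon^{n+1}) \qquad [65]$$

By analyticity, $B = F_\varepsilon(H_\varepsilon(\mathbf{A},\boldsymbol{\alpha}))$ for some $F_\varepsilon$: hence
generically all constants of motion are trivial.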

Therefore, a system close to integrable cannot
behave as it would naively be expected. The
problem, however, was not manifest until Poincaré's
proof of the above results: because in most
applications the function $f$ has only finitely many
Fourier components, or at least is replaced by an
approximation with this property, so that at least
[62] and even a few of the higher-order constraints
like [64] become possible in open regions of action
space. In fact, it may happen that the values of $\mathbf{A}$ of
interest are restricted so that $\boldsymbol{\omega}(\mathbf{A})\cdot\boldsymbol{\nu} = 0$ only for
"large" values of $\boldsymbol{\nu}$ for which $f_{\boldsymbol{\nu}} = 0$. Nevertheless,
the property that $f_{\boldsymbol{\nu}}(\mathbf{A}') = (\boldsymbol{\omega}(\mathbf{A}')\cdot\boldsymbol{\nu})\tilde f_{\boldsymbol{\nu}}(\mathbf{A}')$ (or the
analogous higher-order conditions, e.g., [64]),
which we have seen to be necessary for analytic
integrability of the perturbed system, can be
checked to fail in important problems, if no
approximation is made on $f$. Hence a conceptual
problem arises.
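The role of the resonances $\boldsymbol{\omega}(\mathbf{A})\cdot\boldsymbol{\nu} = 0$ in condition [62] can be made concrete in the simplest anisochronous example. The sketch below is an illustration under stated assumptions, not part of the original argument: it takes $\ell = 2$ and $h(\mathbf{A}) = \tfrac12|\mathbf{A}|^2$, so that $\boldsymbol{\omega}(\mathbf{A}) = \mathbf{A}$, and lists the integer vectors $\boldsymbol{\nu}$ whose resonance line crosses a given box of actions. The function name and the sample box are hypothetical.

```python
def resonances_in_box(box, nu_max):
    """Nonzero integer vectors nu with |nu_i| <= nu_max whose resonance line
    omega(A) . nu = 0 crosses the open action box, assuming omega(A) = A."""
    (a1, b1), (a2, b2) = box
    corners = [(x, y) for x in (a1, b1) for y in (a2, b2)]
    hits = []
    for n1 in range(-nu_max, nu_max + 1):
        for n2 in range(-nu_max, nu_max + 1):
            if n1 == 0 and n2 == 0:
                continue
            values = [x * n1 + y * n2 for x, y in corners]
            # a linear function vanishes inside the box iff it changes sign on the corners
            if min(values) < 0.0 < max(values):
                hits.append((n1, n2))
    return hits


if __name__ == "__main__":
    box = ((1.0, 1.1), (1.0, 1.1))
    print(resonances_in_box(box, 3))
    print(len(resonances_in_box(box, 20)))
```

Enlarging the cutoff `nu_max` produces more and more resonance lines through the same small box. If $f$ is truncated to finitely many harmonics, only finitely many lines have to be avoided, which is why the obstruction is invisible in approximations of $f$ by trigonometric polynomials.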

For more details see Poincaré (1987). 


Perturbing Functions 


To check, in a given problem, the nonexistence of 
nontrivial constants of motion along the lines 
indicated in the previous section, it is necessary to 
express the potential, usually given in Cartesian 


coordinates as eV(x), in terms of the action—angle 
variables of the unperturbed, integrable, system. 

In particular, the problem arises when trying to
check nonexistence of nontrivial constants of
motion when the anisochrony assumption (cf. the
previous section) is not satisfied. Usually it
becomes satisfied "to second order" (or higher):
but to show this, more detailed information on
the structure of the perturbing function expressed
in action-angle variables is needed. For instance,
this is often necessary even when the perturbation
is approximated by a trigonometric polynomial, as
is essentially always the case in celestial
mechanics.

Finding explicit expressions for the action-angle
variables is in itself a rather nontrivial task which
leads to many problems of intrinsic interest even in
seemingly simple cases. For instance, in the case of
the planar gravitational central motion, the Kepler
equation $\lambda = \xi - e\sin\xi$ (see the first of [41]) must be
solved expressing $\xi$ in terms of $\lambda$ (see the first of
[42]). It is obvious that for small $e$, the variable $\xi$
can be expressed as an analytic function of $e$:
nevertheless, the actual construction of this expression
leads to several problems. For small $e$, an
interesting algorithm is the following.

Let $h(\lambda) = \xi - \lambda$, so that the equation to solve (i.e.,
the first of [41]) is

$$h(\lambda) = e\sin(\lambda + h(\lambda)) \equiv -e\,\frac{\partial c}{\partial\lambda}(\lambda + h(\lambda)) \qquad [66]$$

where $c(\lambda) = \cos\lambda$; the function $\lambda \to h(\lambda)$ should be
periodic in $\lambda$, with period $2\pi$, and analytic in $e, \lambda$ for
$e$ small and $\lambda$ real. If $h(\lambda) = e h^{(1)} + e^2 h^{(2)} + \cdots$, the
Fourier transform of $h^{(k)}(\lambda)$ satisfies the recursion
relation

$$h^{(k)}_{\nu} = -\sum_{p=1}^{\infty}\frac{1}{p!}\sum_{\substack{k_1+\cdots+k_p = k-1\\ \nu_0+\nu_1+\cdots+\nu_p = \nu}} (i\nu_0)\,c_{\nu_0}\,(i\nu_0)^p \prod_{j=1}^{p} h^{(k_j)}_{\nu_j}, \qquad k > 1 \qquad [67]$$

with $c_\nu$ the Fourier transform of the cosine ($c_{\pm1} = \tfrac12$,
$c_\nu = 0$ if $\nu \ne \pm1$), and (of course) $h^{(1)}_\nu = -i\nu c_\nu$.
Equation [67] is obtained by expanding the RHS
of [66] in powers of $h$ and then taking the Fourier
transform of both sides retaining only terms of order
$k$ in $e$.

Iterating the above relation, imagine drawing all
trees $\theta$ with $k$ "branches," or "lines," distinguished
by a label taking $k$ values, and $k$ nodes and attach to
each node $v$ a harmonic label $\nu_v = \pm1$ as in Figure 5.
The trees will be assumed to start with a root line $vr$
linking a point $r$ and the "first node" $v$ (see Figure 5)
and then bifurcate arbitrarily (such trees are sometimes
called "rooted trees").

Figure 5 An example of a tree graph and its labels. It contains
only one simple node (3). Harmonics are indicated next to their
nodes. Labels distinguishing lines are not marked.

Imagine the tree oriented from the endpoints
towards the root $r$ (not to be considered a node)
and given a node $v$ call $v'$ the node immediately
following it. If $v$ is the first node before the root $r$,
let $v' = r$ and $\nu_{v'} = 1$. For each such decorated tree
define its numerical value

$$\mathrm{Val}(\theta) = \frac{-i}{k!}\prod_{\text{lines } l = v'v}(\nu_{v'}\nu_v)\prod_{\text{nodes}} c_{\nu_v} \qquad [68]$$

and define a current $\nu(l)$ on a line $l = v'v$ to be the
sum of the harmonics of the nodes preceding
$v'$: $\nu(l) = \sum_{w\le v}\nu_w$. Call $\nu(\theta)$ the current flowing in
the root branch and call order of $\theta$ the number of
nodes (or branches). Then

$$h^{(k)}_{\nu} = \sum_{\substack{\theta,\ \nu(\theta) = \nu\\ \mathrm{order}(\theta) = k}} \mathrm{Val}(\theta) \qquad [69]$$

provided trees are considered identical if they can be
overlapped (labels included) after suitably scaling
the lengths of their branches and pivoting them
around the nodes out of which they emerge (the root
is always imagined to be fixed at the origin).

If the trees are stripped of the harmonic labels,
their number is finite and it can be estimated to be
$\le k!\,4^k$ (because the labels which distinguish the lines
can be attached to an unlabeled tree in many ways).
The harmonic labels (i.e., $\nu_v = \pm1$) can be laid
down in $2^k$ ways, and the value of each tree can be
bounded by $(1/k!)\,2^{-k}$ (because $c_{\pm1} = \tfrac12$).

Hence $\sum_\nu |h^{(k)}_\nu| \le 4^k$, which gives a (rough)
estimate of the radius of convergence of the
expansion of $h$ in powers of $e$: namely 0.25 (easily
improvable to 0.3678 if $4^k k!$ is replaced by $k^{k-1}$
using Cayley's formula for the enumeration of
rooted trees). A simple expression for $h^{(k)}(\psi)$
(Lagrange) is

$$h^{(k)}(\psi) = \frac{1}{k!}\,\partial_\psi^{\,k-1}\sin^k\psi$$

(also readable from the tree representation): the
actual radius of convergence, first determined by
Laplace, of the series for $h$ can also be determined
from the latter expression for $h$ (Rouché) or directly
from the tree representation: it is $\approx 0.6627$.
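The Lagrange expression for the coefficients of the Kepler series can be tested against a direct numerical solution of $\lambda = \xi - e\sin\xi$. The following is a minimal sketch, not taken from the original text: it uses the classical Lagrange inversion formula $h^{(k)}(\lambda) = \frac{1}{k!}\frac{d^{k-1}}{d\lambda^{k-1}}\sin^k\lambda$, stores each $\sin^k\lambda$ as a finite Fourier sum, and compares the truncated series with a Newton solution. The function names, the truncation order and the sample values are assumptions.

```python
import cmath
import math


def kepler_newton(lam, e, tol=1e-14):
    """Solve xi - e*sin(xi) = lam for xi by Newton's method (0 <= e < 1)."""
    xi = lam
    for _ in range(100):
        step = (xi - e * math.sin(xi) - lam) / (1.0 - e * math.cos(xi))
        xi -= step
        if abs(step) < tol:
            break
    return xi


def lagrange_term(k):
    """Fourier coefficients {n: c_n} of (1/k!) d^(k-1)/dlam^(k-1) sin^k(lam)."""
    coeffs = {0: 1.0 + 0j}
    for _ in range(k):                       # multiply by sin = (e^{i lam} - e^{-i lam}) / (2i)
        nxt = {}
        for n, c in coeffs.items():
            nxt[n + 1] = nxt.get(n + 1, 0j) + c / 2j
            nxt[n - 1] = nxt.get(n - 1, 0j) - c / 2j
        coeffs = nxt
    fact = math.factorial(k)
    # each derivative multiplies the n-th harmonic by (i n)
    return {n: c * (1j * n) ** (k - 1) / fact for n, c in coeffs.items()}


def h_coefficient(k, lam):
    """Value of the k-th coefficient function h^(k) at lam."""
    return sum(c * cmath.exp(1j * n * lam) for n, c in lagrange_term(k).items()).real


def h_series(lam, e, order):
    """Truncated power series e*h^(1) + ... + e^order * h^(order)."""
    return sum(e ** k * h_coefficient(k, lam) for k in range(1, order + 1))


if __name__ == "__main__":
    lam, e = 1.0, 0.2
    print(h_series(lam, e, 20), kepler_newton(lam, e) - lam)
```

For eccentricities well inside the Laplace radius the two printed numbers agree to many digits; closer to 0.6627 more terms are needed, and beyond it the power series in $e$ is no longer usable for all real $\lambda$.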

One can find better estimates or at least more 
efficient methods for evaluating the sums in [69]: 
in fact, in performing the sum in [69] important 
cancellations occur. For instance, the harmonic 
labels can be subject to the further strong constraint 
that no line carries zero current because the 
sum of the values of the trees of fixed order and 
with at least one line carrying zero current 
vanishes. 

The above expansion can also be simplified by
partial resummations. For the purpose of an
example, let the nodes with one entering and one
exiting line (see Figure 5) be called "simple"
nodes. Then all tree graphs which, on any line
between two nonsimple nodes, contain any number
of simple nodes can be eliminated. This is done by
replacing, in evaluating the (remaining) tree values,
the factors $\nu_{v'}\nu_v$ in [68] by $\nu_{v'}\nu_v/(1 - e\cos\psi)$: then
the value of $\theta$ (denoted $\mathrm{Val}(\theta)_\psi$) for a tree becomes a
function of $\psi$ and $e$ and [69] is replaced by

$$h(\psi) = \sum_{k=1}^{\infty}\ \sideset{}{^*}\sum_{\substack{\theta,\ \nu(\theta) = \nu\\ \mathrm{order}(\theta) = k}} e^k\,e^{i\nu\psi}\,\mathrm{Val}(\theta)_\psi \qquad [70]$$

where the $*$ means that the trees are subject to the
further restriction of not containing any simple
node. It should be noted that the above graphical
representation of the solution of the Kepler equation
is strongly reminiscent of the representations of
quantities in terms of graphs that occur often in
quantum field theory. Here the trees correspond to
Feynman graphs, the factors associated with the
nodes are the couplings, the factors associated with
the lines are the propagators, and the resummations
are analogous to the self-energy resummations,
while the cancellations mentioned above can be
related to the class of identities called Ward
identities. Not only can the analogy be shown not
to be superficial, but it also turns out to be very
helpful in key mechanical problems: see Appendix 1.

The existence of a vast number of identities
relating the tree values is shown already by the
simple form of the Lagrange series and by the
even more remarkable resummation (Levi-Civita)
leading to

$$h(\psi) = \sum_{k=1}^{\infty}\frac{(e\sin\psi)^k}{k!}\left(\frac{1}{1 - e\cos\psi}\,\partial_\psi\right)^{k}\psi$$

It is even possible to further collect the series
terms to express it as a series with much better
convergence properties; for instance, its terms can be
reorganized and collected (resummed) so that $h$ is
expressed as a power series in the parameter

$$\eta = \frac{e\,e^{\sqrt{1 - e^2}}}{1 + \sqrt{1 - e^2}}$$

with radius of convergence 1, which corresponds to
$e = 1$ (via a simple argument by Levi-Civita). The
analyticity domain for the Lagrange series is $|\eta| < 1$.
This also determines the value of Laplace radius,
which is the point closest to the origin of the
complex curve $|\eta(e)| = 1$: it is imaginary so that it is
the root of the equation

$$\frac{e\,e^{\sqrt{1 + e^2}}}{1 + \sqrt{1 + e^2}} = 1 \qquad [72]$$
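The root of [72] is easy to locate numerically. The sketch below is an illustration rather than part of the original text: it evaluates $|\eta|$ on the imaginary axis, where $e = ix$ with $x$ real, and finds by bisection the value of $x$ at which it equals 1. The function names and the bracketing interval $[0,1]$ are choices made here.

```python
import math


def eta_modulus_on_imaginary_axis(x):
    """|eta(i x)| for eta(e) = e*exp(sqrt(1 - e^2)) / (1 + sqrt(1 - e^2)), x >= 0 real."""
    s = math.sqrt(1.0 + x * x)
    return x * math.exp(s) / (1.0 + s)


def laplace_radius(tol=1e-12):
    """Root of equation [72] by bisection; the left-hand side is increasing in x."""
    lo, hi = 0.0, 1.0            # the modulus is 0 at x = 0 and exceeds 1 at x = 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eta_modulus_on_imaginary_axis(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(laplace_radius())
```

The computed root reproduces the value $\approx 0.6627$ quoted for the radius of convergence of the Lagrange series.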


The analysis provides an example, in a simple
case of great interest in applications, of the kind of
computations actually necessary to represent the
perturbing function in terms of action-angle
variables. The property that the function $c(\lambda)$ in
[66] is the cosine has been used only to limit the
range of the label $\nu$ to be $\pm1$; hence the same
method, with similar results, can be applied to
study the inversion of the relation between the
average anomaly $\lambda$ and the true anomaly $\theta$ and to
efficiently obtain, for instance, the properties of
$f$, $g$ in [42].

For more details, the reader is referred to Levi- 
Civita (1956). 


Lindstedt and Birkhoff Series: Divergences


Nonexistence of constants of motion, rather than 
being the end of the attempts to study motions close 
to integrable ones by perturbation methods, marks 
the beginning of renewed efforts to understand their 
nature. 

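Let $(\mathbf{A},\boldsymbol{\alpha}) \in U\times\mathbb{T}^\ell$ be action-angle variables
defined in the integrability region for an analytic
Hamiltonian and let $h(\mathbf{A})$ be its value in the action-angle
coordinates. Suppose that $h(\mathbf{A})$ is anisochronous
and let $f(\mathbf{A},\boldsymbol{\alpha})$ be an analytic perturbing
function. Consider, for $\varepsilon$ small, the Hamiltonian
$H_\varepsilon(\mathbf{A},\boldsymbol{\alpha}) = H_0(\mathbf{A}) + \varepsilon f(\mathbf{A},\boldsymbol{\alpha})$, with $H_0 \equiv h$.

Let $\boldsymbol{\omega}_0 = \boldsymbol{\omega}(\mathbf{A}_0) \equiv \partial_{\mathbf{A}}H_0(\mathbf{A}_0)$ be the frequency spectrum
(see the section "Quasiperiodicity and integrability")
of one of the invariant tori of the
unperturbed system corresponding to an action $\mathbf{A}_0$.
Short of integrability, the question to ask at this
point is whether the perturbed system admits an
analytic invariant torus on which the motion is
quasiperiodic and

1. has the same spectrum $\boldsymbol{\omega}_0$,

2. depends analytically on $\varepsilon$ at least for $\varepsilon$ small,

3. reduces to the "unperturbed torus" $\{\mathbf{A}_0\}\times\mathbb{T}^\ell$ as
$\varepsilon \to 0$.

More concretely, the question is:

Are there functions $\mathbf{H}_\varepsilon(\boldsymbol{\psi})$, $\mathbf{h}_\varepsilon(\boldsymbol{\psi})$ analytic in $\boldsymbol{\psi}\in\mathbb{T}^\ell$
and in $\varepsilon$ near 0, vanishing as $\varepsilon \to 0$ and such that the
torus with parametric equations

$$\mathbf{A} = \mathbf{A}_0 + \mathbf{H}_\varepsilon(\boldsymbol{\psi}), \qquad \boldsymbol{\alpha} = \boldsymbol{\psi} + \mathbf{h}_\varepsilon(\boldsymbol{\psi}), \qquad \boldsymbol{\psi}\in\mathbb{T}^\ell \qquad [73]$$

is invariant and, if $\boldsymbol{\omega}_0 \equiv \boldsymbol{\omega}(\mathbf{A}_0)$, the motion on it is
simply $\boldsymbol{\psi} \to \boldsymbol{\psi} + \boldsymbol{\omega}_0 t$, i.e., it is quasiperiodic with
spectrum $\boldsymbol{\omega}_0$?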


In this context, Poincaré’s theorem (in the section 
“Generic nonintegrability”) had followed another 
key result, earlier developed in particular cases and 
completed by him, which provides a partial answer 
to the question. 

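Suppose that $\boldsymbol{\omega}_0 = \boldsymbol{\omega}(\mathbf{A}_0) \in \mathbb{R}^\ell$ satisfies a Diophantine
property, namely suppose that there exist
constants $C, \tau > 0$ such that

$$|\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}| \ge \frac{1}{C\,|\boldsymbol{\nu}|^{\tau}} \qquad \text{for all } \mathbf{0}\ne\boldsymbol{\nu}\in\mathbb{Z}^\ell \qquad [74]$$

which, for each $\tau > \ell - 1$ fixed, is a property
enjoyed by all $\boldsymbol{\omega}\in\mathbb{R}^\ell$ but for a set of zero measure.
Then the motions on the unperturbed torus run over
trajectories that fill the torus densely because of the
"irrationality" of $\boldsymbol{\omega}_0$ implied by [74]. Writing
Hamilton's equations,

$$\dot{\boldsymbol{\alpha}} = \partial_{\mathbf{A}}H_0(\mathbf{A}) + \varepsilon\,\partial_{\mathbf{A}}f(\mathbf{A},\boldsymbol{\alpha}), \qquad \dot{\mathbf{A}} = -\varepsilon\,\partial_{\boldsymbol{\alpha}}f(\mathbf{A},\boldsymbol{\alpha})$$

with $\mathbf{A}, \boldsymbol{\alpha}$ given by [73] with $\boldsymbol{\psi}$ replaced by $\boldsymbol{\psi} + \boldsymbol{\omega}_0 t$,
and using the density of the unperturbed trajectories
implied by [74], the conditions that [73] are
equations for an invariant torus on which the
motion is $\boldsymbol{\psi} \to \boldsymbol{\psi} + \boldsymbol{\omega}_0 t$ are

$$\boldsymbol{\omega}_0 + (\boldsymbol{\omega}_0\cdot\partial_{\boldsymbol{\psi}})\,\mathbf{h}_\varepsilon(\boldsymbol{\psi}) = \partial_{\mathbf{A}}H_0(\mathbf{A}_0 + \mathbf{H}_\varepsilon(\boldsymbol{\psi})) + \varepsilon\,\partial_{\mathbf{A}}f(\mathbf{A}_0 + \mathbf{H}_\varepsilon(\boldsymbol{\psi}), \boldsymbol{\psi} + \mathbf{h}_\varepsilon(\boldsymbol{\psi}))$$
$$(\boldsymbol{\omega}_0\cdot\partial_{\boldsymbol{\psi}})\,\mathbf{H}_\varepsilon(\boldsymbol{\psi}) = -\varepsilon\,\partial_{\boldsymbol{\alpha}}f(\mathbf{A}_0 + \mathbf{H}_\varepsilon(\boldsymbol{\psi}), \boldsymbol{\psi} + \mathbf{h}_\varepsilon(\boldsymbol{\psi})) \qquad [75]$$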
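The Diophantine condition [74] that enters these equations can be probed numerically for $\ell = 2$. The following sketch is an illustration, not part of the original text: it computes the smallest value of $|\boldsymbol{\omega}\cdot\boldsymbol{\nu}|\,|\boldsymbol{\nu}|^{\tau}$ over a finite range of integer vectors, which estimates $1/C$ from above. The choice $|\boldsymbol{\nu}| = |\nu_1| + |\nu_2|$, the cutoff and the sample frequency vectors are assumptions, and a finite search can only suggest, never prove, that [74] holds.

```python
import math


def diophantine_constant(omega, tau, n_max):
    """Minimum over 0 != nu, |nu_i| <= n_max, of |omega . nu| * |nu|^tau,
    with |nu| = |nu_1| + |nu_2| (a finite-range estimate of 1/C in [74])."""
    best = math.inf
    for n1 in range(-n_max, n_max + 1):
        for n2 in range(-n_max, n_max + 1):
            if n1 == 0 and n2 == 0:
                continue
            size = abs(n1) + abs(n2)
            small_divisor = abs(omega[0] * n1 + omega[1] * n2)
            best = min(best, small_divisor * size ** tau)
    return best


if __name__ == "__main__":
    golden = (1.0, (math.sqrt(5.0) - 1.0) / 2.0)   # frequency ratio is a quadratic irrational
    resonant = (1.0, 0.5)                          # omega . (1, -2) = 0
    print(diophantine_constant(golden, 1.0, 100))
    print(diophantine_constant(resonant, 1.0, 100))
```

For the golden-mean vector the minimum stays bounded away from zero as the cutoff grows, consistent with [74] for $\tau = 1$, while for the resonant vector it vanishes exactly.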


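The theorem referred to above (Poincaré) is that

Theorem 2 If the unperturbed system is anisochronous
and $\boldsymbol{\omega}_0 = \boldsymbol{\omega}(\mathbf{A}_0)$ satisfies [74] for some $C, \tau > 0$
there exist two well defined power series $\mathbf{h}_\varepsilon(\boldsymbol{\psi}) = \sum_{k=1}^{\infty}\varepsilon^k\mathbf{h}^{(k)}(\boldsymbol{\psi})$
and $\mathbf{H}_\varepsilon(\boldsymbol{\psi}) = \sum_{k=1}^{\infty}\varepsilon^k\mathbf{H}^{(k)}(\boldsymbol{\psi})$ which
solve [75] to all orders in $\varepsilon$. The series for $\mathbf{H}_\varepsilon$ is
uniquely determined, and such is also the series for
$\mathbf{h}_\varepsilon$ up to the addition of an arbitrary constant at each
order, so that it is unique if $\mathbf{h}_\varepsilon$ is required, as
henceforth done with no loss of generality, to have
zero average over $\boldsymbol{\psi}$.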


The algorithm for the construction is illustrated in
a simple case in the next section (see eqns [83],
[84]). Convergence of the above series, called
Lindstedt series, even for small $\varepsilon$ has been a problem
for rather a long time. Poincaré proved the existence
of the formal solution; but his other result, discussed
in the section "Generic nonintegrability," casts
doubts on convergence although it does not exclude
it, as was immediately stressed by several authors
(including Poincaré himself). The result in that
section shows the impossibility of solving [75] for
all $\boldsymbol{\omega}_0$'s near a given spectrum, analytically and
uniformly, but it does not exclude the possibility of
solving it for a single $\boldsymbol{\omega}_0$.

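The theorem admits several extensions or analogs:
an interesting one is to the case of isochronous
unperturbed systems:

Given the Hamiltonian $H_\varepsilon(\mathbf{A},\boldsymbol{\alpha}) = \boldsymbol{\omega}_0\cdot\mathbf{A} + \varepsilon f(\mathbf{A},\boldsymbol{\alpha})$,
with $\boldsymbol{\omega}_0$ satisfying [74] and $f$ analytic, there exist
power series $\mathcal{C}_\varepsilon(\mathbf{A}',\boldsymbol{\alpha}')$, $u_\varepsilon(\mathbf{A}')$ such that $H_\varepsilon(\mathcal{C}_\varepsilon(\mathbf{A}',\boldsymbol{\alpha}')) = \boldsymbol{\omega}_0\cdot\mathbf{A}' + u_\varepsilon(\mathbf{A}')$ holds as an equality between formal
power series (i.e., order by order in $\varepsilon$) and at the
same time the $\mathcal{C}_\varepsilon$, regarded as a map, satisfies order by
order the condition (i.e., (4.3)) that it is a canonical map.

This means that there is a generating function
$\mathbf{A}'\cdot\boldsymbol{\alpha} + \Phi_\varepsilon(\mathbf{A}',\boldsymbol{\alpha})$ also defined by a formal power
series $\Phi_\varepsilon(\mathbf{A}',\boldsymbol{\alpha}) = \sum_{k=1}^{\infty}\varepsilon^k\Phi^{(k)}(\mathbf{A}',\boldsymbol{\alpha})$, that is, such
that if $\mathcal{C}_\varepsilon(\mathbf{A}',\boldsymbol{\alpha}') = (\mathbf{A},\boldsymbol{\alpha})$ then it is true, order by
order in powers of $\varepsilon$, that $\mathbf{A} = \mathbf{A}' + \partial_{\boldsymbol{\alpha}}\Phi_\varepsilon(\mathbf{A}',\boldsymbol{\alpha})$ and
$\boldsymbol{\alpha}' = \boldsymbol{\alpha} + \partial_{\mathbf{A}'}\Phi_\varepsilon(\mathbf{A}',\boldsymbol{\alpha})$. The series for $\Phi_\varepsilon$, $u_\varepsilon$ are called
Birkhoff series.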

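In this isochronous case, if Birkhoff series were
convergent for small $\varepsilon$ and $(\mathbf{A}',\boldsymbol{\alpha})$ in a region of the
form $U\times\mathbb{T}^\ell$ with $U\subset\mathbb{R}^\ell$ open and bounded, it
would follow that, for small $\varepsilon$, $H_\varepsilon$ would be integrable
in a large region of phase space (i.e., where the
generating function can be used to build a canonical
map: this would essentially be $U\times\mathbb{T}^\ell$ deprived of a
small layer of points near the boundary of $U$).
However, convergence for small $\varepsilon$ is false (in general),
as shown by the simple two-dimensional example

$$H_\varepsilon(\mathbf{A},\boldsymbol{\alpha}) = \boldsymbol{\omega}_0\cdot\mathbf{A} + \varepsilon\,(A_2 + f(\boldsymbol{\alpha})), \qquad (\mathbf{A},\boldsymbol{\alpha})\in\mathbb{R}^2\times\mathbb{T}^2 \qquad [76]$$

with $f(\boldsymbol{\alpha})$ an arbitrary analytic function with all
Fourier coefficients $f_{\boldsymbol{\nu}}$ positive for $\boldsymbol{\nu}\ne\mathbf{0}$ and $f_{\mathbf{0}} = 0$.
In the latter case, the solution is

$$u_\varepsilon(\mathbf{A}') = \varepsilon A'_2, \qquad \Phi_\varepsilon(\mathbf{A}',\boldsymbol{\alpha}) = -\sum_{k=1}^{\infty}\varepsilon^k\sum_{\mathbf{0}\ne\boldsymbol{\nu}\in\mathbb{Z}^2} f_{\boldsymbol{\nu}}\,e^{i\boldsymbol{\alpha}\cdot\boldsymbol{\nu}}\,\frac{(-i\nu_2)^{k-1}}{\left(i(\omega_{01}\nu_1 + \omega_{02}\nu_2)\right)^{k}} \qquad [77]$$

The series does not converge: in fact, its convergence
would imply integrability and, consequently,
bounded trajectories in phase space: however, the
equations of motion for [76] can be easily solved
explicitly and in any open region near given initial
data there are other data which have unbounded
trajectories if $\omega_{01}/(\omega_{02} + \varepsilon)$ is rational.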
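The unbounded trajectories can be exhibited in closed form in a simplified instance of [76]. The sketch below is an illustration under stated assumptions, not part of the original text: it replaces $f$ by the single harmonic $f(\boldsymbol{\alpha}) = \cos(\alpha_1 - \alpha_2)$ (the text requires all $f_{\boldsymbol{\nu}} > 0$; one harmonic is enough to see the mechanism), so that $\dot\alpha_1 = \omega_{01}$, $\dot\alpha_2 = \omega_{02} + \varepsilon$ and $\dot A_1 = \varepsilon\sin(\alpha_1 - \alpha_2)$. The function name and the numerical values are hypothetical.

```python
import math


def action_drift(eps, w01, w02, delta0, t):
    """A_1(t) - A_1(0) for H = w0 . A + eps * (A_2 + cos(alpha_1 - alpha_2)),
    where delta0 = alpha_1(0) - alpha_2(0)."""
    detune = w01 - (w02 + eps)                 # d/dt (alpha_1 - alpha_2)
    if abs(detune) < 1e-12:                    # resonance: secular, linear growth
        return eps * math.sin(delta0) * t
    # off resonance: integral of eps * sin(delta0 + detune * s) over [0, t]
    return eps * (math.cos(delta0) - math.cos(delta0 + detune * t)) / detune


if __name__ == "__main__":
    t = 1000.0
    print(action_drift(0.25, 1.0, 0.75, math.pi / 2, t))   # w01 / (w02 + eps) = 1
    print(action_drift(0.25, 1.0, 0.50, math.pi / 2, t))   # off resonance
```

At resonance the action grows linearly in time, whereas off resonance it oscillates with amplitude of order $\varepsilon/|\omega_{01} - \omega_{02} - \varepsilon|$; since resonant values of $\varepsilon$ are dense, no power series in $\varepsilon$ can describe both behaviors.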

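Nevertheless, even in this elementary case a
formal sum of the series yields

$$u_\varepsilon(A')=\varepsilon A'_2,\qquad \Phi_\varepsilon(A',\alpha)=-\varepsilon\sum_{0\neq\nu\in\mathbb{Z}^2}\frac{f_\nu\,e^{i\alpha\cdot\nu}}{i\big(\omega_{01}\nu_1+(\omega_{02}+\varepsilon)\nu_2\big)}$$ [78]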


and the series in [78] (no longer a power series in $\varepsilon$)
is really convergent if $\omega=(\omega_{01},\omega_{02}+\varepsilon)$ is a
Diophantine vector (by [74], because analyticity implies
exponential decay of $|f_\nu|$). Remarkably, for such
values of $\varepsilon$ the Hamiltonian $H_\varepsilon$ is integrable and it is
integrated by the canonical map generated by [78],
in spite of the fact that [78] is obtained, from [77],
via the nonrigorous sum rule

$$\sum_{k=0}^{\infty} z^k=\frac{1}{1-z}\qquad\text{for } z\neq1$$ [79]

(applied to cases with $|z|>1$, which are certainly
realized for a dense set of $\varepsilon$'s even if $\omega$ is Diophantine
because the $z$'s have values $z=-\varepsilon\nu_2/(\omega_0\cdot\nu)$). In other
words, the integration of the equations is elementary
and once performed it becomes apparent that, if $\omega$ is
Diophantine, the solutions can be rigorously found
from [78]. Note that, for instance, this means that
relations like $\sum_{k=0}^{\infty}2^k=-1$ are really used to obtain
[78] from [77].
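The role of the sum rule can be checked on a single Fourier coefficient. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the coefficient is taken to solve $i(\omega_0\cdot\nu+\varepsilon\nu_2)\,\Phi_\nu=-\varepsilon f_\nu$, which is what the generating-function convention $A=A'+\partial_\alpha\Phi_\varepsilon$ gives for the example [76]. It compares the value obtained by "summing" the geometric series in $z=-\varepsilon\nu_2/(\omega_0\cdot\nu)$ with the exact coefficient, in a case where $|z|>1$ and the partial sums diverge.

```python
def phi_coefficient_via_sum_rule(omega0, eps, nu, f_nu):
    """Coefficient of Phi_eps obtained by 'summing' the geometric series
    in eps with the rule sum_k z**k = 1/(1 - z), valid formally for z != 1."""
    d0 = omega0[0] * nu[0] + omega0[1] * nu[1]      # omega0 . nu, assumed != 0
    z = -eps * nu[1] / d0
    return -eps * f_nu / (1j * d0) / (1.0 - z), z

def phi_coefficient_exact(omega0, eps, nu, f_nu):
    """Coefficient solving i*(omega0.nu + eps*nu_2)*Phi_nu = -eps*f_nu."""
    d = omega0[0] * nu[0] + (omega0[1] + eps) * nu[1]
    return -eps * f_nu / (1j * d)

def partial_sums(z, n_terms):
    """Partial sums of sum_k z**k, to exhibit divergence when |z| > 1."""
    total, out = 0.0, []
    for k in range(n_terms):
        total += z ** k
        out.append(total)
    return out
```

For instance, with $\omega_0=(1,\sqrt2)$, $\varepsilon=0.2$ and $\nu=(3,-2)$ one finds $|z|>1$, yet the sum-rule value coincides with the exact coefficient; similarly the partial sums of $\sum_k 2^k$ grow without bound while the rule assigns them the value $-1$.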

Another extension of Lindstedt series arises in a
perturbation of an anisochronous system when
asking the question as to what happens to the
unperturbed invariant tori $\mathcal{T}_{\omega_0}$ on which the
spectrum is resonant, that is, $\omega_0\cdot\nu=0$ for some $\nu\neq0$,
$\nu\in\mathbb{Z}^\ell$. The result is that even in such a case there is a
formal power series solution showing that at least
a few of the (infinitely many) invariant tori into
which $\mathcal{T}_{\omega_0}$ is in turn foliated in the unperturbed case
can be formally continued at $\varepsilon\neq0$ (see the section
“Resonances and their stability”).

For more details, we refer the reader to Poincaré 
(1987). 




Quasiperiodicity and KAM Stability 


To discuss more advanced results, it is convenient
to restrict attention to a special (nontrivial)
paradigmatic case

$$H_\varepsilon(A,\alpha)=\tfrac12 A^2+\varepsilon f(\alpha)$$ [80]

In this simple case (called the Thirring model: representing
$\ell$ particles on a circle interacting via a potential
$\varepsilon f(\alpha)$) the equations for the maximal tori [75]
reduce to equations for the only functions $h_\varepsilon$:

$$(\omega_0\cdot\partial_\psi)^2\,h_\varepsilon(\psi)=-\varepsilon\,\partial_\alpha f(\psi+h_\varepsilon(\psi)),\qquad \psi\in\mathbb{T}^\ell$$ [81]

as the second of [75] simply becomes the definition
of $H_\varepsilon$ because the RHS does not involve $H_\varepsilon$.
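To first order in $\varepsilon$ the equation for $h_\varepsilon$ can be solved harmonic by harmonic, which the following sketch verifies numerically. It assumes, as an illustration only, that $f(\alpha)=\sum_j a_j\cos(\nu_j\cdot\alpha)$ is a finite cosine sum; the names `grad_f`, `h1` and `second_derivative_along_flow` are hypothetical. Writing $h_\varepsilon=\varepsilon h^{(1)}+\dots$, the first-order equation is $(\omega_0\cdot\partial_\psi)^2h^{(1)}(\psi)=-\partial_\alpha f(\psi)$, solved by $h^{(1)}(\psi)=-\sum_j a_j\,\nu_j\sin(\nu_j\cdot\psi)/(\omega_0\cdot\nu_j)^2$.

```python
import math

def grad_f(alpha, harmonics):
    """Gradient of f(alpha) = sum_j a_j * cos(nu_j . alpha)."""
    g = [0.0] * len(alpha)
    for a, nu in harmonics:
        s = -a * math.sin(sum(n * x for n, x in zip(nu, alpha)))
        for i, n in enumerate(nu):
            g[i] += s * n
    return g

def h1(psi, omega, harmonics):
    """First-order coefficient h^(1)(psi), solving
    (omega . d_psi)^2 h^(1) = -grad f(psi)."""
    h = [0.0] * len(psi)
    for a, nu in harmonics:
        div = sum(n * w for n, w in zip(nu, omega))   # omega . nu, must be != 0
        s = -a * math.sin(sum(n * x for n, x in zip(nu, psi))) / div ** 2
        for i, n in enumerate(nu):
            h[i] += s * n
    return h

def second_derivative_along_flow(psi, omega, harmonics, dt=1e-3):
    """Central finite difference of t -> h1(psi + omega*t) at t = 0,
    i.e. a numerical stand-in for (omega . d_psi)^2 h^(1)."""
    def shifted(t):
        return [x + w * t for x, w in zip(psi, omega)]
    hp = h1(shifted(dt), omega, harmonics)
    h0 = h1(shifted(0.0), omega, harmonics)
    hm = h1(shifted(-dt), omega, harmonics)
    return [(p - 2.0 * c + m) / dt ** 2 for p, c, m in zip(hp, h0, hm)]
```

The small divisors $(\omega_0\cdot\nu_j)^2$ already appear at this order; they are the quantities that the convergence analysis has to control at all orders.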

The real problem is therefore whether the formal
series considered in the last section converge at least
for small $\varepsilon$: and the example [76] on the Birkhoff
series shows that sometimes sum rules might be 
needed in order to give a meaning to the series. In 
fact, whenever a problem (of physical interest) 
admits a formal power series solution which is not 
convergent, or which is such that it is not known 
whether it is convergent, then one should look for 
sum rules for it. 

The modern theory of perturbations starts with
the proof of the convergence for $\varepsilon$ small enough of
the Lindstedt series (Kolmogorov). The general
“KAM” result is:


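Theorem 3 (KAM) Consider the Hamiltonian
$H_\varepsilon(A,\alpha)=h(A)+\varepsilon f(A,\alpha)$, defined in $U=V\times\mathbb{T}^\ell$
with $V\subset\mathbb{R}^\ell$ open and bounded and with $f(A,\alpha)$,
$h(A)$ analytic in the closure $\overline V\times\mathbb{T}^\ell$ where $h(A)$ is also
anisochronous; let $\omega_0=\omega(A_0)=\partial_A h(A_0)$ and assume
that $\omega_0$ satisfies [74]. Then

(i) there is $\varepsilon_{C,\tau}>0$ such that the Lindstedt series
converges for $|\varepsilon|<\varepsilon_{C,\tau}$;

(ii) its sum yields two functions $H_\varepsilon(\psi)$, $h_\varepsilon(\psi)$ on $\mathbb{T}^\ell$
which parametrize an invariant torus
$\mathcal{T}_{C,\tau}(A_0,\varepsilon)$;

(iii) on $\mathcal{T}_{C,\tau}(A_0,\varepsilon)$ the motion is $\psi\to\psi+\omega_0t$, see
[73]; and

(iv) the set of data in $U$ which belong to invariant
tori $\mathcal{T}_{C,\tau}(A_0,\varepsilon)$ with $\omega(A_0)$ satisfying [74]
with prefixed $C,\tau$ has complement with volume
$<\mathrm{const}\,C^{-a}$ for a suitable $a>0$ and with area
also $<\mathrm{const}\,C^{-a}$ on each nontrivial surface of
constant energy $H_\varepsilon=E$.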


In other words, for small $\varepsilon$ the spectra of most
unperturbed quasiperiodic motions can still be found 
as spectra of perturbed quasiperiodic motions devel- 
oping on tori which are close to the corresponding 
unperturbed ones (i.e., with the same spectrum). 


This is a stability result: for instance, in systems 
with two degrees of freedom the invariant tori of 
dimension two which lie on a given three-dimensional 
energy surface, will separate the points on the energy 
surface into the set which is “inside” the torus and the 
set which is “outside.” Hence, an initial datum 
starting (say) inside cannot reach the outside. Like- 
wise, a point starting between two tori has to stay in 
between forever. Further, if the two tori are close, this 
means that motion will stay very localized in action 
space, with a trajectory accessing only points close to 
the tori and coming close to all such points, within a 
distance of the order of the distance between the 
confining tori. The case of three or more degrees of 
freedom is quite different (see sections “Diffusion in 
phase space” and “The three-body problem”). 

In the simple case of the rotators system [80] the
equations for the parametric representation of the
tori are given by [81]. The latter bear some analogy
with the easier problem in [66]: but [81] are $\ell$
equations instead of one and they are differential
equations rather than ordinary equations. Furthermore,
the function $f(\alpha)$ which plays here the role of
c(A) in [66] has Fourier coefficients $f_\nu$ with no
restrictions on $\nu$, while the Fourier coefficients $c_\nu$
for $c$ in [66] do not vanish only for $\nu=\pm1$.

The above differences are, to some extent,
“minor” and the power series solution to [81] can
be constructed by the same algorithm as used in the
case of [66]: namely one forms trees as in Figure 5
with the harmonic labels $\nu_v\in\mathbb{Z}$ replaced by $\nu_v\in\mathbb{Z}^\ell$
(still to be thought of as possible harmonic indices in
the Fourier expansion of the perturbing function $f$).
All other labels affixed to the trees in the section
“Generic nonintegrability” will be the same. In
particular, the current flowing on a branch $l=v'v$
will be defined as the sum of the harmonics of the
nodes $w\le v$ preceding $v$:

$$\nu(l)\ \stackrel{\rm def}{=}\ \sum_{w\le v}\nu_w$$ [82]

and we call $\nu(\theta)$ the current flowing in the root
branch.

Here the value $\mathrm{Val}(\theta)$ of a tree has to be defined
differently because the equation to be solved ([81])
contains the differential operator $(\omega_0\cdot\partial_\psi)^2$ which,
when Fourier transformed, becomes multiplication
of the Fourier component with harmonic $\nu$ by
$(i\omega_0\cdot\nu)^2$.

The variation due to the presence of the operator
$(\omega_0\cdot\partial_\psi)^2$ and the necessity of its inversion in the
evaluation of $u\cdot h^{(k)}_\nu$, that is, of the component of
$h^{(k)}_\nu$ along an arbitrary unit vector $u$, is nevertheless
quite simple: the value of a tree graph $\theta$ of order $k$
(i.e., with $k$ nodes and $k$ branches) has to be defined
by (cf. [68])

$$\mathrm{Val}(\theta)=\frac{-i\,(-1)^k}{k!}\Big(\prod_{\text{lines } l=v'v}\frac{\nu_{v'}\cdot\nu_v}{(\omega_0\cdot\nu(l))^2}\Big)\Big(\prod_{\text{nodes } v} f_{\nu_v}\Big)$$ [83]

where the $\nu_{v'}$ appearing in the factor relative to the
root line $rv$ from the first node $v$ to the root $r$ (see
Figure 5) is interpreted as a unit vector $u$ (it was
interpreted as 1 in the one-dimensional case [66]).
Equation [83] makes sense only for trees in which
no line carries zero current. Then the component
along $u$ (the harmonic label attached to the root of a
tree) of $h^{(k)}_\nu$ is given (see also [69]) by

$$u\cdot h^{(k)}_\nu=\sum^{*}_{\substack{\theta,\ \nu(\theta)=\nu\\ \mathrm{order}(\theta)=k}}\mathrm{Val}(\theta)$$ [84]

where the $*$ means that the sum is only over trees in
which a nonzero current $\nu(l)$ flows on the lines $l\in\theta$.
The quantity $u\cdot h^{(k)}_0$ will be defined to be 0 (see the
previous section).

In the case of [66] zero-current lines could appear: 
but the contributions from tree graphs containing at 
least one zero current line would cancel. In the 
present case, the statement that the above algorithm
actually gives $h^{(k)}_\nu$ by simply ignoring trees with lines
with zero current is nontrivial. It was Poincaré’s 
contribution to the theory of Lindstedt series to show 
that even in the general case (cf. [75]) the equations 
for the invariant tori can be solved by a formal power 
series. Equation [84] is proved by induction on k after 
checking it for the first few orders. 

The algorithm just described leading to [83] can 
be extended to the case of the general Hamiltonian 
considered in the KAM theorem. 

The convergence proof is more delicate than the
(elementary) one for eqn [66]. In fact, the values of
trees of order $k$ can give large contributions to $h^{(k)}_\nu$
because the “new” factors $(\omega_0\cdot\nu(l))^2$, although not
zero, can be quite small and their small size can
overwhelm the smallness of the factors $f_\nu$ and $\varepsilon$. In
fact, even if $f$ is a trigonometric polynomial (so that $f_\nu$
vanishes identically for $|\nu|$ large enough) the currents
flowing in the branches can be very large, of the
order of the number $k$ of nodes in the tree; see [82].

This is called the small-divisors problem. The key 
to its solution goes back to a related work (SIEGEL) 
which shows that 


Theorem 4 Consider the contribution to the sum
in [84] from graphs $\theta$ in which no pairs of lines
which lie on the same path to the root carry the
same current and, furthermore, the node harmonics
are bounded by $|\nu|\le N$ for some $N$. Then the
number of lines $\ell$ in $\theta$ with divisor $\omega_0\cdot\nu_\ell$ satisfying
$2^{-n}<C\,|\omega_0\cdot\nu_\ell|\le2^{-n+1}$ does not exceed $4Nk\,2^{-n/\tau}$.


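Hence, setting

$$F\ \stackrel{\rm def}{=}\ C^2\max_{|\nu|\le N}|f_\nu|$$

the corresponding $\mathrm{Val}(\theta)$ can be bounded by

$$\frac{1}{k!}\,F^kN^{2k}\prod_{n=0}^{\infty}2^{2n\,(4Nk\,2^{-n/\tau})}\ \stackrel{\rm def}{=}\ \frac{1}{k!}\,B^k,\qquad B=F\,N^2\,2^{\sum_n 8nN\,2^{-n/\tau}}$$ [85]

since the product is convergent. In the case in which
$f$ is a trigonometric polynomial of degree $N$, the
above restricted contributions to $u\cdot h^{(k)}_\nu$ would
generate a convergent series for $\varepsilon$ small enough. In
fact, the number of trees is bounded (as in the
section “Perturbing functions”) by $k!\,4^k(2N+1)^{\ell k}$ so
that the series $\sum_k|\varepsilon|^k\,|u\cdot h^{(k)}_\nu|$ would converge for
small $\varepsilon$ (i.e., $|\varepsilon|<(B\cdot4(2N+1)^\ell)^{-1}$).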
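To get a feeling for the size of the constants in this bound, the following sketch evaluates them numerically. It is only an illustration under stated assumptions: the function name `siegel_bound_constants` is hypothetical, the exponent of 2 in $B$ is taken to be $\sum_{n\ge0}8nN\,2^{-n/\tau}$ (as follows from multiplying the factors $2^{2n}$ over at most $4Nk\,2^{-n/\tau}$ lines per scale), and base-2 logarithms are returned because $B$ itself easily overflows floating point.

```python
import math

def siegel_bound_constants(F, N, tau, ell):
    """Return (log2 B, log2 eps_max), where
    B = F * N^2 * 2^(sum_{n>=0} 8*n*N*2^(-n/tau)) and
    eps_max = 1 / (B * 4 * (2N+1)^ell) is the convergence threshold."""
    x = 2.0 ** (-1.0 / tau)
    # Closed form of sum_{n>=0} n * x^n for 0 < x < 1.
    s = x / (1.0 - x) ** 2
    log2_B = math.log2(F * N ** 2) + 8.0 * N * s
    log2_eps_max = -(log2_B + 2.0 + ell * math.log2(2 * N + 1))
    return log2_B, log2_eps_max
```

Even for $F=N=\tau=1$ and $\ell=2$ this gives $B=2^{16}$ and a threshold of roughly $2^{-21}$, and the threshold shrinks very rapidly as $\tau$ or $N$ grows; the estimate is a proof device rather than a practical stability bound.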

Given this comment, the analysis of the “remaining
contributions” becomes the real problem, and it
requires new ideas because among the excluded trees
there are some simple $k$th order trees whose value
alone, if considered separately from the other
contributions, would generate a factorially divergent
power series in $\varepsilon$.

However, the contributions of all large-valued 
trees of order k can be shown to cancel: although 
not exactly (unlike the case of the elementary 
problem in the section “Perturbing functions,” 
where the cancellation is not necessary for the 
proof, in spite of its exact occurrence), but enough 
so that in spite of the existence of exceedingly large 
values of individual tree graphs their total sum can 
still be bounded by a constant to the power k so that 
the power series actually converges for $\varepsilon$ small
enough. The idea is discussed in Appendix 1. 

For more details, the reader is referred to Poincaré 
(1987), Kolmogorov (1954), Moser (1962), and Arnol’d 
(1989). 


Resonances and their Stability 


A quasiperiodic motion with $r$ rationally independent
frequencies is called resonant if $r$ is strictly less
than the number of degrees of freedom, $\ell$. The
difference $s=\ell-r$ is the degree of the resonance.

Of particular interest are the cases of a perturbation
of an integrable system in which resonant
motions take place.

A typical example is the $n$-body problem which
studies the mutual perturbations of the motions of
$n-1$ particles gravitating around a more massive
particle. If the particle masses can be considered to
be negligible, the system will consist of $n-1$ central
Keplerian motions: it will therefore have $\ell=3(n-1)$
degrees of freedom. In general, only one frequency
per body occurs in the absence of the perturbations
(the period of the Keplerian orbit). Hence, $r\le n-1$
and $s\ge2(n-1)$ (or in the planar case $s\ge(n-1)$)
with equality holding when the periods are rationally
independent.
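The counting in this example is simple arithmetic and can be written down as a small sketch; the function name `resonance_degree_bounds` is hypothetical and the code only restates the bounds quoted above (one frequency per Keplerian orbit, three or two degrees of freedom per body).

```python
def resonance_degree_bounds(n, planar=False):
    """Degrees of freedom ell, maximal number r_max of independent
    frequencies, and minimal resonance degree s_min = ell - r_max
    for n-1 Keplerian motions around a massive body."""
    dim = 2 if planar else 3
    ell = dim * (n - 1)
    r_max = n - 1          # at most one independent frequency per body
    return ell, r_max, ell - r_max
```

For three bodies in space this gives $\ell=6$, $r\le2$ and $s\ge4$, so the unperturbed motions are always highly resonant.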

Another example is the rigid body with a fixed 
point perturbed by a conservative force: in this case, 
the unperturbed system has three degrees of freedom 
but, in general, only two frequencies (see the 
discussion following [52]). 

Furthermore, in the above examples there is the 
possibility that the independent frequencies assume, 
for special initial data, values which are rationally 
related, giving rise to resonances of even higher
order (i.e., with smaller values of $r$).

In an integrable anisochronous system, resonant
motions will be dense in phase space because the
frequencies $\omega(A)$ will vary as much as the actions
and therefore resonances of any order (i.e., any
$r<\ell$) will be dense in phase space: in particular, the
periodic motions (i.e., the highest-order resonances)
will be dense.

Resonances, in integrable systems, can arise in
a priori stable integrable systems and in a priori
unstable systems: the former are systems whose
Hamiltonian admits canonical action-angle coordinates
$(A,\alpha)\in U\times\mathbb{T}^\ell$ with $U\subset\mathbb{R}^\ell$ open, while the
latter are systems whose Hamiltonian has, in
suitable local canonical coordinates, the form

$$H_0(A)+\sum_{i=1}^{s_1}\frac12\,(p_i^2-\lambda_i^2q_i^2)+\sum_{j=1}^{s_2}\frac12\,(\pi_j^2+\mu_j^2\kappa_j^2),\qquad \lambda_i,\mu_j>0$$ [86]

where $(A,\alpha)\in U\times\mathbb{T}^r$, $U\subset\mathbb{R}^r$, $(p,q)\in V\subset\mathbb{R}^{2s_1}$,
$(\pi,\kappa)\in V'\subset\mathbb{R}^{2s_2}$ with $V,V'$ neighborhoods of the
origin and $\ell=r+s_1+s_2$, $s_i\ge0$, $s_1+s_2>0$ and
$\pm\sqrt{\lambda_j}$, $\pm\sqrt{\mu_j}$ are called Lyapunov coefficients of
the resonance. The perturbations considered are
supposed to have the form $\varepsilon f(A,\alpha,p,q,\pi,\kappa)$. The
denomination of a priori stable or unstable refers to
the properties of the “a priori given unperturbed
Hamiltonian.” The label “a priori unstable” is
certainly appropriate if $s_1>0$: here also $s_1=0$ is
allowed for notational convenience implying that the
Lyapunov coefficients in a priori unstable cases are all
of order 1 (whether real $\lambda_j$ or imaginary $i\sqrt{\mu_j}$). In
other words, the a priori stable case, $s_1=s_2=0$ in
[86], is the only excluded case. Of course, the stability
properties of the motions when a perturbation acts
will depend on the perturbation in both cases.

The a priori stable systems usually have a great 
variety of resonances (e.g., in the anisochronous 
case, resonances of any dimension are dense). The 
a priori unstable systems have (among possible other 
resonances) some very special r-dimensional 
resonances occurring when the unstable coordinates 
$(p,q)$ and $(\pi,\kappa)$ are zero and the frequencies of the $r$
action-angle coordinates are rationally independent. 

In the first case (a priori stable), the general 
question is whether the resonant motions, which 
form invariant tori of dimension $r$ arranged into
families that fill $\ell$-dimensional invariant tori, continue
to exist, in presence of small enough perturbations
$\varepsilon f(A,\alpha)$, on slightly deformed invariant tori.
Similar questions can be asked in the a priori 
unstable cases. To examine the matter more closely 
consider the formulation of the simplest problems. 

A priori stable resonances: more precisely, suppose
$H_0=\tfrac12A^2$ and let $\{A_0\}\times\mathbb{T}^\ell$ be the unperturbed
invariant torus $\mathcal{T}_{A_0}$ with spectrum $\omega_0=\omega(A_0)=
\partial_AH_0(A_0)$ with only $r$ rationally independent components.
For simplicity, suppose that $\omega_0=(\omega_1,\ldots,\omega_r,0,\ldots,0)
\stackrel{\rm def}{=}(\omega,0)$ with $\omega\in\mathbb{R}^r$. The more general
case in which $\omega_0$ has only $r$ rationally independent
components can be reduced to the special case above
by a canonical linear change of coordinates at the price
of changing the $H_0$ to a new one, still quadratic in the
actions but containing mixed products $A_iB_j$: the proofs
of the results that are discussed here would not be
really affected by such more general form of $H$.

It is convenient to distinguish between the “fast”
angles $\alpha_1,\ldots,\alpha_r$ and the “resonant” angles
$\alpha_{r+1},\ldots,\alpha_\ell$ (also called “slow” or “secular”) and
call $\alpha=(\alpha',\beta)$ with $\alpha'\in\mathbb{T}^r$ and $\beta\in\mathbb{T}^s$. Likewise,
we distinguish the fast actions $A'=(A_1,\ldots,A_r)$ and
the resonant ones $A_{r+1},\ldots,A_\ell$ and set $A=(A',B)$
with $A'\in\mathbb{R}^r$ and $B\in\mathbb{R}^s$.

Therefore, the torus $\mathcal{T}_{A_0}$, $A_0=(A'_0,B_0)$, is in turn a
continuum of invariant tori $\mathcal{T}_{A_0,\beta}$ with trivial
parametric equations: $\beta$ fixed, $\alpha'=\psi$, $\psi\in\mathbb{T}^r$, and
$A'=A'_0$, $B=B_0$. On each of them the motion is:
$A',B,\beta$ constant and $\alpha'\to\alpha'+\omega t$, with rationally
independent $\omega\in\mathbb{R}^r$.

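Then the natural question is whether there exist
functions $h_\varepsilon,k_\varepsilon,H_\varepsilon,K_\varepsilon$ smooth in $\varepsilon$ near $\varepsilon=0$ and in
$\psi\in\mathbb{T}^r$, vanishing for $\varepsilon=0$, and such that the torus
$\mathcal{T}_{A_0,\beta_0,\varepsilon}$ with parametric equations

$$A'=A'_0+H_\varepsilon(\psi),\quad \alpha'=\psi+h_\varepsilon(\psi),\quad B=B_0+K_\varepsilon(\psi),\quad \beta=\beta_0+k_\varepsilon(\psi),\qquad \psi\in\mathbb{T}^r$$ [87]

is invariant for the motions with Hamiltonian

$$H_\varepsilon(A,\alpha)=\tfrac12A'^2+\tfrac12B^2+\varepsilon f(\alpha',\beta)$$

and the motions on it are $\psi\to\psi+\omega t$. The above
property, when satisfied, is summarized by saying
that the unperturbed resonant motions
$A=(A'_0,B_0)$, $\alpha=(\alpha'_0+\omega t,\beta_0)$ can be continued in
presence of perturbation $\varepsilon f$, for small $\varepsilon$, to quasiperiodic
motions with the same spectrum and on a
slightly deformed torus $\mathcal{T}_{A'_0,\beta_0,\varepsilon}$.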

A priori unstable resonances: here the question is
whether the special invariant tori continue to exist
in presence of small enough perturbations, of
course slightly deformed. This means asking
whether, given $A_0$ such that $\omega(A_0)=\partial_AH_0(A_0)$ has
rationally independent components, there are functions
$(H_\varepsilon(\psi),h_\varepsilon(\psi))$, $(P_\varepsilon(\psi),Q_\varepsilon(\psi))$ and
$(\Pi_\varepsilon(\psi),K_\varepsilon(\psi))$ smooth in $\varepsilon$ near $\varepsilon=0$, vanishing for $\varepsilon=0$,
analytic in $\psi\in\mathbb{T}^r$ and such that the $r$-dimensional
surface

$$\begin{aligned} A&=A_0+H_\varepsilon(\psi), & \alpha&=\psi+h_\varepsilon(\psi)\\ p&=P_\varepsilon(\psi), & q&=Q_\varepsilon(\psi)\\ \pi&=\Pi_\varepsilon(\psi), & \kappa&=K_\varepsilon(\psi)\end{aligned}\qquad \psi\in\mathbb{T}^r$$ [88]

is an invariant torus $\mathcal{T}_{A_0,\varepsilon}$ on which the motion is
$\psi\to\psi+\omega(A_0)t$. Again, the above property is
summarized by saying that the unperturbed special
resonant motions can be continued in presence of
perturbation $\varepsilon f$ for small $\varepsilon$ to quasiperiodic motions
with the same spectrum and on a slightly deformed
torus $\mathcal{T}_{A_0,\varepsilon}$.

Some answers to the above questions are pre- 
sented in the following section. For more details, the 
reader is referred to Gallavotti et al. (2004). 


Resonances and Lindstedt Series 


We discuss eqns [87] in the paradigmatic case in
which the Hamiltonian $H_0(A)$ is $\tfrac12A^2$ (cf. [80]). It
will be $\omega(A')=A'$ so that $A'_0=\omega$, $B_0=0$ and the
perturbation $f(\alpha)$ can be considered as a function
of $\alpha=(\alpha',\beta)$: let $\overline f(\beta)$ be defined as its average over
$\alpha'$. The determination of the invariant torus of
dimension $r$ which can be continued in the sense
discussed in the last section is easily understood in
this case.

A resonant invariant torus which, among the tori
$\mathcal{T}_{A_0,\beta}$, has parametric equations that can be continued
as a formal power series in $\varepsilon$ is the torus
$\mathcal{T}_{A_0,\beta_0}$ with $\beta_0$ a stationarity point for $\overline f(\beta)$, that is,
an equilibrium point for the average perturbation:
$\partial_\beta\overline f(\beta_0)=0$. In fact, the following theorem holds:




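Theorem 5 If $\omega\in\mathbb{R}^r$ satisfies a Diophantine
property and if $\beta_0$ is a nondegenerate stationarity
point for the “fast angle average” $\overline f(\beta)$ (i.e., such
that $\det\partial^2_{\beta}\overline f(\beta_0)\neq0$), then the following equations
for the functions $h_\varepsilon,k_\varepsilon$,

$$\begin{aligned}(\omega\cdot\partial_\psi)^2h_\varepsilon(\psi)&=-\varepsilon\,\partial_{\alpha'}f\big(\psi+h_\varepsilon(\psi),\,\beta_0+k_\varepsilon(\psi)\big)\\ (\omega\cdot\partial_\psi)^2k_\varepsilon(\psi)&=-\varepsilon\,\partial_{\beta}f\big(\psi+h_\varepsilon(\psi),\,\beta_0+k_\varepsilon(\psi)\big)\end{aligned}$$ [89]

can be formally solved in powers of $\varepsilon$.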


Given the simplicity of the Hamiltonian [80] that 
we are considering, it is not necessary to discuss the 
functions $H_\varepsilon,K_\varepsilon$ because the equations that they
should obey reduce to their definitions as in the 
section “Quasiperiodicity and KAM stability,” and 
for the same reason. 

In other words, also the resonant tori admit a 
Lindstedt series representation. It is however very 
unlikely that the series are, in general, convergent. 

Physically, this new aspect is due to the fact that
the linearization of the motion near the torus $\mathcal{T}_{A'_0,\beta_0}$
introduces oscillatory motions around $\mathcal{T}_{A'_0,\beta_0}$ with
frequencies proportional to the square roots of the
positive eigenvalues of the matrix $\varepsilon\,\partial^2_{\beta}\overline f(\beta_0)$: therefore,
it is naively expected that it has to be necessary
that a Diophantine property be required on the
vector $(\omega,\sqrt{\varepsilon\mu_1},\ldots)$, where $\varepsilon\mu_j$ are the positive
eigenvalues. Hence, some values of $\varepsilon$, namely those
for which $(\omega,\sqrt{\varepsilon\mu_1},\ldots)$ is not a Diophantine vector
or is too close to a non-Diophantine vector, should
be excluded or at least should be expected to
generate difficulties. Note that the problem arises
irrespective of the assumptions about the nondegenerate
matrix $\partial^2_{\beta}\overline f(\beta_0)$ (since $\varepsilon$ can have either
sign), and no matter how small $|\varepsilon|$ is supposed to be.
But we can expect that if the matrix $\partial^2_{\beta}\overline f(\beta_0)$ is
(say) positive definite (i.e., $\beta_0$ is a minimum point
for $\overline f(\beta)$) then the problem should be easier for $\varepsilon<0$
and vice versa, if $\beta_0$ is a maximum, it should be
easier for $\varepsilon>0$ (i.e., in the cases in which the
eigenvalues of $\varepsilon\,\partial^2_{\beta}\overline f(\beta_0)$ are negative and their roots
do not have the interpretation of frequencies).

Technically, the sums of the formal series can be 
given (so far) a meaning only via summation rules 
involving divergent series: typically, one has to 
identify in the formal expressions (denumerably 
many) geometric series which, although divergent, 
can be given a meaning by applying the rule [79]. 
Since the rule can only be applied if $z\neq1$, this leads
to conditions on the parameter $\varepsilon$, in order to exclude
that the various $z$ that have to be considered are very
close to 1. Hence, this stability result turns out to be
rather different from the KAM result for the
maximal tori. Namely the series can be given a
meaning via summation rules provided $f$ and $\beta_0$
satisfy certain additional conditions and provided
certain values of $\varepsilon$ are excluded. An example of a
theorem is the following:


Theorem 6 Given the Hamiltonian [80] and a
resonant torus $\mathcal{T}_{A'_0,\beta_0}$ with $\omega=A'_0\in\mathbb{R}^r$ satisfying a
Diophantine property, let $\beta_0$ be a nondegenerate
maximum point for the average potential
$\overline f(\beta)\stackrel{\rm def}{=}(2\pi)^{-r}\int_{\mathbb{T}^r}f(\alpha',\beta)\,d^r\alpha'$.
Consider the Lindstedt series
solution for eqns [89] of the perturbed resonant
torus with spectrum $(\omega,0)$. It is possible to express
the single $n$th-order term of the series as a sum of
many terms and then rearrange the series thus
obtained so that the resummed series converges for
$\varepsilon$ in a domain $\mathcal{E}$ which contains a segment $[0,\varepsilon_0]$ and
also a subset of $[-\varepsilon_0,0]$ which, although with open
dense complement, is so large that it has 0 as a
Lebesgue density point. Furthermore, the resummed
series for $h_\varepsilon,k_\varepsilon$ define an invariant $r$-dimensional
analytic torus with spectrum $\omega$.


More generally, if $\beta_0$ is only a nondegenerate
stationarity point for $\overline f(\beta)$, the domain of definition
of the resummed series is a set $\mathcal{E}\subset[-\varepsilon_0,\varepsilon_0]$ which
on both sides of the origin has an open dense
complement although it has 0 as a Lebesgue density
point.

Theorem 6 can be naturally extended to the
general case in which the Hamiltonian is the most
general perturbation of an anisochronous integrable
system $H_\varepsilon(A,\alpha)=h(A)+\varepsilon f(A,\alpha)$ if $\partial^2_{AA}h$ is a
nonsingular matrix and the resonance arises from a
spectrum $\omega(A_0)$ which has $r$ independent components
(while the remaining are not necessarily zero).

We see that the convergence is a delicate problem 
for the Lindstedt series for nearly integrable reso- 
nant motions. They might even be divergent 
(mathematically, a proof of divergence is an open 
problem but it is a very reasonable conjecture in 
view of the above physical interpretation); nevertheless,
Theorem 6 shows that sum rules can be
given that sometimes (i.e., for $\varepsilon$ in a large set near
$\varepsilon=0$) yield a true solution to the problem.

This is reminiscent of the phenomenon met in 
discussing perturbations of isochronous systems in 
[76], but it is a much more complex situation. It 
leaves many open problems: foremost among them 
is the question of uniqueness. The sum rules of 
divergent series always contain some arbitrary 
choices, which lead to doubts about the uniqueness 
of the functions parametrizing the invariant tori 
constructed in this way. It might even be that the
convergence set $\mathcal{E}$ may depend upon the arbitrary
choices, and that considering several of them no $\varepsilon$
with $|\varepsilon|<\varepsilon_0$ is left out.


The case of a priori unstable systems has also
been widely studied. In this case too resonances
with Diophantine $r$-dimensional spectrum $\omega$ are
considered. However, in the case $s_2=0$ (called a
priori unstable hyperbolic resonance) the Lindstedt
series can be shown to be convergent, while in the
case $s_1=0$ (called a priori unstable elliptic resonance)
or in the mixed cases $s_1,s_2>0$ extra
conditions are needed. They involve $\omega$ and
$\mu=(\mu_1,\ldots,\mu_{s_2})$ (cf. [86]) and properties of the
perturbations as well. It is also possible to study a
slightly different problem: namely to look for
conditions on $\omega,\mu,f$ which imply that, for small
$\varepsilon$, invariant tori with spectrum $\varepsilon$-dependent but
close, in a suitable sense, to $\omega$ exist.

The literature is vast, but it seems fair to say that, 
given the above comments, particularly those con- 
cerning uniqueness and analyticity, the situation is still 
quite unsatisfactory. We refer the reader to Gallavotti 
et al. (2004) for more details. 


Diffusion in Phase Space 


The KAM theorem implies that a perturbation of an
analytic anisochronous integrable system, i.e., with
an analytic Hamiltonian $H_\varepsilon(A,\alpha)=H_0(A)+
\varepsilon f(A,\alpha)$ and nondegenerate Hessian matrix
$\partial^2_{AA}H_0(A)$, generates large families of maximal invariant
tori. Such tori lie on the energy surfaces but do
not have codimension 1 on them, i.e., they do not
split the $(2\ell-1)$-dimensional energy surfaces into
disconnected regions except, of course, in the case of
systems with two degrees of freedom (see the section
“Quasiperiodicity and KAM stability”).

Therefore, there might exist trajectories with
initial data close to $A^i$ in action space which reach
phase space points close to $A^f\neq A^i$ in action space
for $\varepsilon\neq0$, no matter how small. Such a diffusion
phenomenon would occur in spite of the fact that
the corresponding trajectory has to move in a space
in which very close to each $\{A\}\times\mathbb{T}^\ell$ there is an
invariant surface on which points move keeping
$A$ constant within $O(\varepsilon)$, which for $\varepsilon$ small can be
$\ll|A^f-A^i|$.

In a priori unstable systems (cf. the section
“Resonances and their stability”) with $s_1=1$,
$s_2=0$, it is not difficult to see that the corresponding
phenomenon can actually occur: the paradigmatic
example (Arnol’d) is the a priori unstable
system

$$H_\varepsilon=\frac{A_1^2}{2}+A_2+\frac{p^2}{2}+g\,(\cos q-1)+\varepsilon\,(\cos\alpha_1+\sin\alpha_2)(\cos q-1)$$ [90]


This is a system describing a motion of a “pendulum”
($(p,q)$ coordinates) interacting with a “rotating
wheel” ($(A_1,\alpha_1)$ coordinates) and a “clock”
($(A_2,\alpha_2)$ coordinates) a priori unstable near the
points $p=0$, $q=0,2\pi$ ($s_1=1$, $s_2=0$, $\lambda_1=\sqrt g$,
cf. [86]). It can be proved that on the energy surface
of energy $E$ and for each $\varepsilon\neq0$ small enough (no
matter how small) there are initial data with action
coordinates close to $A^i=(A^i_1,A^i_2)$ with $(1/2)(A^i_1)^2+A^i_2$
close to $E$ eventually evolving to a datum
$A'=(A'_1,A'_2)$ with $A'_1$ at a distance from $A^f_1$ smaller
than an arbitrarily prefixed distance (of course with
energy $E$). Furthermore, during the whole process
the pendulum energy stays close to zero within $o(\varepsilon)$
(i.e., the pendulum swings following closely the
unperturbed separatrices).

In other words, [90] describes a machine (the 
pendulum) which, working approximately in a 
cycle, extracts energy from a reservoir (the clock) 
to transfer it to a mechanical device (the wheel). The 
statement that diffusion is possible means that the
machine can work as soon as $\varepsilon\neq0$, if the initial
actions and the initial phases (i.e., $\alpha_1,\alpha_2,p,q$) are
suitably tuned (as functions of $\varepsilon$).

The peculiarity of the system [90] is that the fixed
points $P_\pm$ of the unperturbed pendulum (i.e., the
equilibria $p=0$, $q=0,2\pi$) remain unstable equilibria
even when $\varepsilon\neq0$ and this is an important simplifying
feature.
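The structure of this example can be explored with a short numerical sketch. It assumes the Hamiltonian $H_\varepsilon=A_1^2/2+A_2+p^2/2+g(\cos q-1)+\varepsilon(\cos\alpha_1+\sin\alpha_2)(\cos q-1)$ and integrates Hamilton's equations with a generic fourth-order Runge-Kutta step; the function names, the integrator and the parameter values are illustrative choices, not part of the original argument. The sketch only checks two elementary facts: the pendulum equilibrium $p=q=0$ stays an equilibrium for $\varepsilon\neq0$ (the perturbation vanishes there together with its $q$-derivative), and the energy is conserved along trajectories.

```python
import math

def hamiltonian(state, g, eps):
    A1, A2, p, a1, a2, q = state
    return (0.5 * A1 ** 2 + A2 + 0.5 * p ** 2 + g * (math.cos(q) - 1.0)
            + eps * (math.cos(a1) + math.sin(a2)) * (math.cos(q) - 1.0))

def vector_field(state, g, eps):
    """Hamilton's equations in the order (A1, A2, p, alpha1, alpha2, q):
    actions evolve with -dH/d(angle), angles with +dH/d(action)."""
    A1, A2, p, a1, a2, q = state
    c = math.cos(q) - 1.0
    dA1 = eps * math.sin(a1) * c
    dA2 = -eps * math.cos(a2) * c
    dp = (g + eps * (math.cos(a1) + math.sin(a2))) * math.sin(q)
    return (dA1, dA2, dp, A1, 1.0, p)

def rk4_step(state, dt, g, eps):
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = vector_field(state, g, eps)
    k2 = vector_field(shifted(k1, 0.5 * dt), g, eps)
    k3 = vector_field(shifted(k2, 0.5 * dt), g, eps)
    k4 = vector_field(shifted(k3, dt), g, eps)
    return tuple(s + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate(state, t_final, dt, g, eps):
    for _ in range(int(round(t_final / dt))):
        state = rk4_step(state, dt, g, eps)
    return state
```

Observing actual diffusion would require finely tuned initial data and very long integration times, so this sketch should be read as a consistency check of the equations of motion rather than as a demonstration of the phenomenon.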

It is a peculiarity that permits bypassing the
obstacle, arising in the analysis of more general
cases, represented by the resonance surfaces consisting
of the $A$’s with $A_1\nu_1+\nu_2=0$: the latter
correspond to harmonics $(\nu_1,\nu_2)$ present in the
perturbing function, i.e., the harmonics which
would lead to division by zero in an attempt to
construct (as necessary in studying [90] by Arnol’d’s
method) the parametric equations of the perturbed
invariant tori with action close to such $A$’s. In the
case of [90] the problem arises only on the
resonance marked in Figure 6 by a heavy line, i.e.,
$A_1=0$, corresponding to $\cos\alpha_1$ in [90].

If $\varepsilon=0$, the point $P_-$ with $p=0$, $q=0$ and the
point $P_+$ with $p=0$, $q=2\pi$ are both unstable
equilibria (and they are, of course, the same point,
if $q$ is an angular variable). The unstable manifold
(it is a curve) of $P_+$ coincides with the stable
manifold of $P_-$ and vice versa. So that the
unperturbed system admits nontrivial motions leading
from $P_+$ to $P_-$ and from $P_-$ to $P_+$, both in a
bi-infinite time interval $(-\infty,\infty)$: the $p,q$ variables
describe a pendulum and $P_\pm$ are its unstable
equilibria which are connected by the separatrices
(which constitute the zero-energy surfaces for the
pendulum).

Figure 6 (a) The $\varepsilon=0$ geometry: the “partial energy” lines are
parabolas, $(1/2)A_1^2+A_2=\mathrm{const}$. The vertical lines are the
resonances $A_1=$ rational (i.e., $\nu_1A_1+\nu_2=0$). The disks are
neighborhoods of the points $A^i$ and $A^f$ (the dots at their centers).
(b) $\varepsilon\neq0$: an artist’s rendering of a trajectory in $A$ space, driven
by the pendulum swings to accelerate the wheel from $A^i_1$ to $A^f_1$ at
the expenses of the clock energy, sneaking through invariant tori
not represented and (approximately) located “away” from the
intersections between resonances and partial energy lines (a
dense set, however). The pendulum coordinates are not shown:
its energy stays close to zero, within a power of $\varepsilon$. Hence the
pendulum swings, staying close to the separatrix. The oscillations
symbolize the wiggly behavior of the partial energy
$(1/2)A_1^2+A_2$ in the process of sneaking between invariant tori
which, because of their invariance, would be impossible without
the pendulum. The energy $(1/2)A_1^2$ of the wheel increases
slightly at each pendulum swing: accurate estimates yield an
increase of the wheel speed $A_1$ of the order of $\varepsilon/(\log\varepsilon^{-1})$ at
each swing of the pendulum implying a transition time of the
order of $g^{-1/2}\varepsilon^{-1}\log\varepsilon^{-1}$.


The latter property remains true for more general
a priori unstable Hamiltonians

$$H_\varepsilon=H_0(A)+H_u(p,q)+\varepsilon f(A,\alpha,p,q)\qquad\text{in } (U\times\mathbb{T}^\ell)\times(\mathbb{R}^2)$$ [91]

where $H_u$ is a one-dimensional Hamiltonian which
has two unstable equilibrium points $P_+$ and $P_-$
linearly repulsive in one direction and linearly
attractive in another which are connected by two
heteroclinic trajectories which, as time tends to $\pm\infty$,
approach $P_-$ and $P_+$ and vice versa.

Actually, the points need not be different but, if
coinciding, the trajectories linking them must be
nontrivial: in the case [90] the variable $q$ can be
considered an angle and then $P_+$ and $P_-$ would
coincide (but are connected by nontrivial trajectories,
i.e., by trajectories that also visit points
different from $P_\pm$). Such trajectories are called
heteroclinic if $P_+\neq P_-$ and homoclinic if $P_+=P_-$.

In the general case, besides the homoclinicity (or
heteroclinicity) condition, certain weak genericity
conditions, automatically satisfied in the example
[90], have to be imposed in order to show that,
given $A^i$ and $A^f$ with the same unperturbed energy
$E$, one can find, for all $\varepsilon$ small enough but not equal
to zero, initial data ($\varepsilon$-dependent) with actions
arbitrarily close to $A^i$ which evolve to data with
actions arbitrarily close to $A^f$. This is a phenomenon
called the Arnol’d diffusion. Simple sufficient conditions
for a transition from near $A^i$ to near $A^f$ are
expressed by the following result:


Theorem 7 Given the Hamiltonian [91] with $H_u$
admitting two hyperbolic fixed points $P_\pm$ with
heteroclinic connections, $t\to(p_a(t),q_a(t))$, $a=1,2$,
suppose that:

(i) On the unperturbed energy surface of energy
$E=H_0(A^i)+H_u(P_\pm)$ there is a regular curve
$\gamma:s\to A(s)$ joining $A^i$ to $A^f$ such that the
unperturbed tori $\{A(s)\}\times\mathbb{T}^\ell$ can be continued
at $\varepsilon\neq0$ into invariant tori $\mathcal{T}_{A(s),\varepsilon}$ for a set of
values of $s$ which fills the curve $\gamma$ leaving only
gaps of size of order $o(\varepsilon)$.

(ii) The $\ell\times\ell$ matrix $D_{ij}$ of the second derivatives of
the integral of $f$ over the heteroclinic motions is
not degenerate, that is,

$$|\det D|=\Big|\det\Big(\int_{-\infty}^{\infty}dt\ \partial_{\alpha_i\alpha_j}f\big(A,\alpha+\omega(A)t,\,p_a(t),q_a(t)\big)\Big)\Big|>c>0$$ [92]

for all $A$’s on the curve $\gamma$ and all $\alpha\in\mathbb{T}^\ell$.

Given arbitrary $\rho>0$, for $\varepsilon\neq0$ small enough
there are initial data with action and energy closer
than $\rho$ to $A^i$ and $E$, respectively, which after a long
enough time acquire an action closer than $\rho$ to $A^f$
(keeping the initial energy).


The above two conditions can be shown to hold
generically for many pairs $A^i \neq A^f$ (and many
choices of the curves $\gamma$ connecting them) if the
number of degrees of freedom is $\geq 3$. Thus, the result,
obtained by a simple extension of the argument
originally outlined by Arnol’d to discuss the paradigmatic
example [90], proves the existence of
diffusion in a priori unstable systems. The integral
in [92] is called the Melnikov integral.

The real difficulty is to estimate the time needed
for the transition: it is a time that obviously has to
diverge as $\varepsilon \to 0$. Assuming $g$ fixed (i.e., $\varepsilon$-independent),
a naive approach easily leads to estimates
which can even be worse than $O(\exp(a \varepsilon^{-b}))$ with
some $a, b > 0$. It has finally been shown that in such
cases the minimum time can be, for rather general
perturbations $\varepsilon f(\alpha, q)$, estimated above by
$O(\varepsilon^{-1} \log \varepsilon^{-1})$, which is the best that can be hoped
for under generic assumptions.

The reader is referred to Arnol’d (1989) and 
Chierchia and Valdinoci (2000) for more details. 


Long-Time Stability of Quasiperiodic 
Motions 


A more difficult problem is whether the same
phenomenon of migration in action space occurs in
a priori stable systems. The root of the difficulty is a
remarkable stability property of quasiperiodic
motions. Consider Hamiltonians $H_\varepsilon(A, \alpha) = h(A) + \varepsilon f(A, \alpha)$
with $H_0(A) = h(A)$ strictly convex, analytic,
and anisochronous on the closure $\bar{U}$ of an open
bounded region $U \subset R^\ell$, and a perturbation $\varepsilon f(A, \alpha)$
analytic in $\bar{U} \times T^\ell$.

Then a priori bounds are available on how long it
can possibly take to migrate from an action close to
$A_1$ to one close to $A_2$: the bound is of
“exponential type” as $\varepsilon \to 0$ (i.e., it admits a lower
bound which behaves as the exponential of an
inverse power of $\varepsilon$). The simplest theorem is
(NEKHOROSSEV):


Theorem 7 There are constants $0 < a, b, d, g, \tau$
such that any initial datum $(A, \alpha)$ evolves so that $A$
will not change by more than $a \varepsilon^g$ before a long time
bounded below by $\tau \exp(b \varepsilon^{-d})$.


Thus, this puts an exponential bound, i.e., a
bound exponential in an inverse power of $\varepsilon$, to the
diffusion time: before a time $\tau \exp(b \varepsilon^{-d})$ actions can
only change by $O(\varepsilon^g)$ so that their variation cannot
be large no matter how small $\varepsilon \neq 0$ is chosen. This
places a (long) lower bound to the time of diffusion
in a priori stable systems.
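
To get a feeling for the sizes involved, the two estimates quoted so far can be evaluated for sample values of the perturbation parameter. The following is only an illustrative sketch: the numerical values given to the constants a, b, d, g, tau are hypothetical placeholders (the theorem fixes them in terms of the Hamiltonian), and the function names are not taken from the text. Logarithms of times are returned to avoid numerical overflow.

```python
import math


def nekhoroshev_bounds(eps, a=1.0, b=1.0, d=0.5, g=0.5, tau=1.0):
    """Return (maximal action drift, log of the stability time) for a given eps.

    The default constants are hypothetical placeholders, not values
    derived from the theorem.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    drift = a * eps ** g                        # actions move by at most a * eps^g
    log_time = math.log(tau) + b * eps ** (-d)  # log of tau * exp(b * eps^(-d))
    return drift, log_time


def fast_diffusion_log_time(eps):
    """Log of eps^(-1) * log(eps^(-1)), the estimate quoted for a priori unstable systems."""
    if not 0 < eps < 1:
        raise ValueError("eps must lie in (0, 1)")
    inv = 1.0 / eps
    return math.log(inv * math.log(inv))


if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4):
        drift, log_t = nekhoroshev_bounds(eps)
        print(eps, drift, log_t, fast_diffusion_log_time(eps))
```

With these placeholder constants the logarithm of the exponential bound grows like an inverse power of the perturbation parameter, while the logarithm of the fast-diffusion estimate grows only logarithmically, which is the contrast between the a priori stable and a priori unstable cases.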

The proof of the theorem provides, actually, an 
interesting and detailed picture of the variations in 
actions showing that some actions may vary more 
slowly than others. 

The theorem is constructive, i.e., all constants
$0 < a, b, d, \tau$ can be explicitly chosen and depend
on $\ell, H_0, f$ although some of them can be fixed to
depend only on $\ell$ and on the minimum curvature of
the convex graph of $H_0$. Its proof can be adapted
to cover many cases which do not fall in the class of
systems with strictly convex unperturbed Hamiltonian,
and even to cases with a resonant unperturbed
Hamiltonian.

However, in important problems (e.g., in the
three-body problems met in celestial mechanics)
there is empirical evidence that diffusion takes
place at a fast pace (i.e., not exponentially slowly in
the above sense), while the above results would
forbid a rapid migration in phase space if they
applied. In such problems, however, the assumptions
of the theorem are not satisfied, because the
unperturbed system is strongly resonant (as in the
celestial mechanics problems, where the number of
independent frequencies is a fraction of the number
of degrees of freedom and $h(A)$ is far from strictly
convex), leaving wide open the possibility of observing
rapid diffusion.

Further, changing the assumptions can dramatically
change the results. For instance, rapid diffusion
can sometimes be proved even though it might be
feared that it should require exponentially long
times: an example that has been proposed is the
case of a three-timescales system, with Hamiltonian


$$H_\varepsilon = \omega_1 A_1 + \omega_2 A_2 + \frac{p^2}{2} + g (1 + \cos q) + \varepsilon f(\alpha_1, \alpha_2, p, q)$$  [93]


with $\omega_\varepsilon \stackrel{def}{=} (\omega_1, \omega_2)$, where $\omega_1 = \varepsilon^{-1/2} \bar\omega$, $\omega_2 = \varepsilon^{1/2} \tilde\omega$
and $\bar\omega, \tilde\omega > 0$ constants. The three scales are
$\omega_1^{-1}$, $g^{-1/2}$, $\omega_2^{-1}$. In this case, there are many
(although by no means all) pairs $A_1, A_2$ which can
be connected within a time that can be estimated to
be of order $O(\varepsilon^{-1} \log \varepsilon^{-1})$.

This is a rapid-diffusion case in an a priori
unstable system in which condition [92] is not
satisfied, because the $\varepsilon$-dependence of $\omega(A)$ implies
that the lower bound $c$ in [92] must depend on $\varepsilon$
(and be exponentially small with an inverse power
of $\varepsilon$ as $\varepsilon \to 0$).

The unperturbed system in [93] is nonresonant in
the $H_0$ part for $\varepsilon > 0$ outside a set of zero measure
(i.e., where the vector $\omega_\varepsilon$ satisfies a suitable
Diophantine property) and, furthermore, it is
a priori unstable: cases met in applications can be
a priori stable and resonant (and often not anisochronous)
in the $H_0$ part. In such a system, not only
is the speed of diffusion not understood, but
proposals to prove its existence, if present (as
expected), have so far not given really satisfactory
results.

For more details, the reader is referred
to Nekhorossev (1977).


The Three-Body Problem 


Mechanics and the three-body problem can be
almost identified with each other, in the sense that
the motion of three gravitating masses has long been
a key astronomical problem and at the same time
the source of inspiration for many techniques:
foremost among them the theory of perturbations.
As an introduction, consider a special case. Let
three masses $m_S = m_0$, $m_J = m_1$, $m_M = m_2$ interact
via gravity, that is, with interaction potential
$-k m_i m_j |x_i - x_j|^{-1}$: the simplest problem arises
when the third body has a negligible mass compared
to the two others and the latter are supposed to be
on a circular orbit; furthermore, the mass $m_J$ is $\varepsilon m_S$
with $\varepsilon$ small and the mass $m_M$ moves in the plane of
the circular orbit. This will be called the “circular
restricted three-body problem.”

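In a reference system with center $S$ and rotating at
the angular speed of $J$ around $S$, inertial forces
(centrifugal and Coriolis) act. Supposing that the
body $J$ is located on the axis with unit vector $i$ at
distance $R$ from the origin $S$, the acceleration of the
point $M$ is


$$\ddot\varrho = F + \omega_0^2 \left( \varrho - \frac{\varepsilon R}{1 + \varepsilon} i \right) - 2 \omega_0 \wedge \dot\varrho$$


if $F$ is the force of attraction and $\omega_0 \wedge \dot\varrho \equiv \omega_0 \dot\varrho^\perp$,
where $\omega_0$ is a vector with $|\omega_0| = \omega_0$ and perpendicular
to the orbital plane and $\varrho^\perp \stackrel{def}{=} (-\rho_2, \rho_1)$ if
$\varrho = (\rho_1, \rho_2)$. Here, taking into account that the origin
$S$ rotates around the fixed center of mass,
$\omega_0^2 (\varrho - \frac{\varepsilon R}{1 + \varepsilon} i)$ is the centrifugal force while
$-2 \omega_0 \wedge \dot\varrho$ is the Coriolis force. The equations of motion can
therefore be derived from a Lagrangian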


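$$L = \frac12 \dot\varrho^2 - W + \omega_0 \varrho^\perp \cdot \dot\varrho + \frac12 \omega_0^2 \varrho^2 - \omega_0^2 \frac{\varepsilon R}{1 + \varepsilon} \varrho \cdot i$$  [94]

with


$$\omega_0^2 R^3 = k m_S (1 + \varepsilon) \stackrel{def}{=} g_0, \qquad W = -\frac{k m_S}{|\varrho|} - \frac{k m_S \varepsilon}{|\varrho - R i|}$$


where $k$ is the gravitational constant, $R$ the distance
between $S$ and $J$, and finally the last three terms in [94]
come from the Coriolis force (the first) and from the
centrifugal force (the other two, taking into account that
the origin $S$ rotates around the fixed center of mass).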
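
The forces acting in the rotating frame can be made concrete with a short numerical sketch. It is only an illustration under stated assumptions: units are chosen so that $k m_S = 1$ and $R = 1$ by default, the primary $S$ sits at the origin and $J$ at $(R, 0)$, and the function name is hypothetical. The sketch evaluates the gravitational attraction of $S$ and $J$, the centrifugal term (centered on the center of mass) and the Coriolis term for a planar position and velocity.

```python
import math


def rotating_frame_acceleration(rho, rho_dot, eps, R=1.0, k_mS=1.0):
    """Acceleration of M in the frame centered at S and rotating with J.

    rho, rho_dot: planar position and velocity (pairs of floats).
    eps: mass ratio m_J / m_S. R and k_mS are unit choices (assumptions).
    Returns (total acceleration, Coriolis part) as pairs of floats.
    """
    x, y = rho
    vx, vy = rho_dot
    omega0_sq = k_mS * (1.0 + eps) / R ** 3   # omega_0^2 R^3 = k m_S (1 + eps)
    omega0 = math.sqrt(omega0_sq)

    # Gravitational attraction of S (at the origin).
    r = math.hypot(x, y)
    fx = -k_mS * x / r ** 3
    fy = -k_mS * y / r ** 3

    # Gravitational attraction of J (mass eps * m_S, located at (R, 0)).
    dx, dy = x - R, y
    d = math.hypot(dx, dy)
    fx += -eps * k_mS * dx / d ** 3
    fy += -eps * k_mS * dy / d ** 3

    # Centrifugal term: omega_0^2 (rho - eps R / (1 + eps) i).
    cx = omega0_sq * (x - eps * R / (1.0 + eps))
    cy = omega0_sq * y

    # Coriolis term: -2 omega_0 rho_dot_perp, with rho_dot_perp = (-vy, vx).
    kx = -2.0 * omega0 * (-vy)
    ky = -2.0 * omega0 * vx

    return (fx + cx + kx, fy + cy + ky), (kx, ky)


if __name__ == "__main__":
    # A body at rest at the equilateral point, at distance R from both S and J.
    acc, _ = rotating_frame_acceleration((0.5, math.sqrt(3.0) / 2.0), (0.0, 0.0), 0.01)
    print(acc)
```

A useful sanity check of the signs is that a body at rest at the equilateral point (at distance $R$ from both $S$ and $J$) feels no net acceleration, and that the Coriolis term is always perpendicular to the velocity.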

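Setting $g = g_0/(1 + \varepsilon) \equiv k m_S$, the Hamiltonian of
the system is


$$H = \frac12 (p - \omega_0 \varrho^\perp)^2 - \frac{g}{|\varrho|} - \frac12 \omega_0^2 \varrho^2 - \varepsilon \frac{g}{R} \left( \left| \frac{\varrho}{R} - i \right|^{-1} - \frac{\varrho}{R} \cdot i \right)$$  [95]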


The first part can be expressed immediately in the
action-angle coordinates for the two-body problem
(cf. the section “Newtonian potential and Kepler’s
laws”). Calling such coordinates $(L_0, \lambda_0, G_0, \gamma_0)$ and
$\theta_0$ the polar angle of $M$ with respect to the major axis
of the ellipse and $\lambda_0$ the mean anomaly of $M$ on its
ellipse, the Hamiltonian becomes, taking into account
that for $\varepsilon = 0$ the ellipse axis rotates at speed $-\omega_0$,


$$H = -\frac{g^2}{2 L_0^2} - \omega_0 G_0 - \varepsilon \frac{g}{R} \left( \left| \frac{\varrho}{R} - i \right|^{-1} - \frac{\varrho}{R} \cdot i \right)$$  [96]

which is convenient if we study the interior problem,
i.e., $|\varrho| < R$. This can be expressed in the action-angle
coordinates via [41], [42]:


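$$\theta_0 = \lambda_0 + f_{\lambda_0}, \qquad \theta_0 + \gamma_0 = \lambda_0 + \gamma_0 + f_{\lambda_0},$$

$$\frac{|\varrho|}{R} = \frac{G_0^2}{g R} \, \frac{1}{1 + e \cos(\lambda_0 + f_{\lambda_0})}$$  [97]


where (see [42]) $f_\lambda = f(e \sin\lambda, e \cos\lambda)$ and


$$f(x, y) = 2x \left( 1 + \frac54 y + \cdots \right)$$


with the ellipsis denoting higher orders in $x, y$, even
in $x$. The Hamiltonian takes the form, if $\omega^2 = g R^{-3}$,


$$H_\varepsilon = -\frac{g^2}{2 L_0^2} - \omega G_0 + \varepsilon \frac{g}{R} F(G_0, L_0, \lambda_0, \lambda_0 + \gamma_0)$$  [98]

where the only important feature (for our purposes) is
that $F(L, G, \alpha, \beta)$ is an analytic function of $L, G, \alpha, \beta$
near a datum with $|G| < L$ (i.e., $e > 0$) and $|\varrho| < R$.
However, the domain of analyticity in $G$ is rather
small as it is constrained by $|G| < L$, excluding in
particular the circular orbit case $G = \pm L$.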

Note that apparently the KAM theorem fails to be
applicable to [98] because the matrix of the second
derivatives of $H_0(L, G)$ has vanishing determinant.
Nevertheless, the proof of the theorem also goes
through in this case, with minor changes. This can
be checked by studying the proof or, following a
remark by Poincaré, by simply noting that the
“squared” Hamiltonian $H'_\varepsilon \stackrel{def}{=} (H_\varepsilon)^2$ has the form


$$H'_\varepsilon = \left( -\frac{g^2}{2 L_0^2} - \omega G_0 \right)^2 + \varepsilon F'(G_0, L_0, \lambda_0, \lambda_0 + \gamma_0)$$  [99]


with $F'$ still analytic. But this time


$$\det \frac{\partial^2 H'_0}{\partial (G_0, L_0)} = -6 g^2 L_0^{-4} \omega^2 h \neq 0 \qquad \text{if } h = -g^2 L_0^{-2} - 2 \omega G_0 \neq 0$$
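
The determinant formula can be checked numerically. The sketch below is only an illustration: it approximates the Hessian determinant of the unperturbed part of the squared Hamiltonian by central finite differences and compares it with the closed form $-6 g^2 L_0^{-4} \omega^2 h$. The values $g = 1$, $\omega = 0.3$, the step size and the function names are arbitrary choices made here, not taken from the text.

```python
def unperturbed(L, G, g=1.0, omega=0.3):
    """H_0(L, G) = -g^2 / (2 L^2) - omega G (degenerate Hessian)."""
    return -g ** 2 / (2.0 * L ** 2) - omega * G


def squared(L, G, g=1.0, omega=0.3):
    """Unperturbed part of the squared Hamiltonian, (H_0)^2."""
    return unperturbed(L, G, g, omega) ** 2


def hessian_det(f, L, G, step=1e-4):
    """Determinant of the 2x2 Hessian of f(L, G), by central finite differences."""
    f_LL = (f(L + step, G) - 2.0 * f(L, G) + f(L - step, G)) / step ** 2
    f_GG = (f(L, G + step) - 2.0 * f(L, G) + f(L, G - step)) / step ** 2
    f_LG = (f(L + step, G + step) - f(L + step, G - step)
            - f(L - step, G + step) + f(L - step, G - step)) / (4.0 * step ** 2)
    return f_LL * f_GG - f_LG ** 2


def closed_form_det(L, G, g=1.0, omega=0.3):
    """Closed form -6 g^2 L^(-4) omega^2 h with h = -g^2 L^(-2) - 2 omega G."""
    h = -g ** 2 / L ** 2 - 2.0 * omega * G
    return -6.0 * g ** 2 * L ** (-4) * omega ** 2 * h


if __name__ == "__main__":
    L0, G0 = 1.2, 0.5
    print(hessian_det(unperturbed, L0, G0))   # numerically zero
    print(hessian_det(squared, L0, G0), closed_form_det(L0, G0))
```

The first determinant vanishes up to discretization error, which is the degeneracy that seems to block the KAM theorem, while the second agrees with the closed form and is nonzero as long as $h \neq 0$.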


Therefore, the KAM theorem applies to $H'_\varepsilon$ and
the key observation is that the orbits generated by
the Hamiltonian $(H_\varepsilon)^2$ are geometrically the same as
those generated by the Hamiltonian $H_\varepsilon$: they are
only run at a different speed because of the need of a
time rescaling by the constant factor $2 H_\varepsilon$.
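
The rescaling can be made explicit in one line. In the following remark the symbols $z$ (the collection of canonical coordinates), $J$ (the standard symplectic matrix), $E$ (the conserved value of the energy) and $t'$ (the rescaled time) are notation introduced here only for illustration; the energy is assumed to be nonzero.

```latex
\dot z \;=\; J\,\partial_z \bigl(H_\varepsilon\bigr)^2
       \;=\; 2\,H_\varepsilon(z)\, J\,\partial_z H_\varepsilon(z),
\qquad
H_\varepsilon(z(t)) \equiv E
\;\;\Longrightarrow\;\;
\frac{dz}{dt'} \;=\; J\,\partial_z H_\varepsilon(z),
\quad t' = 2E\,t .
```

Since $H_\varepsilon$ is conserved along the motions of $(H_\varepsilon)^2$, the factor $2 H_\varepsilon$ is a constant on each orbit, so the two flows trace the same curves in phase space.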

This shows that, given an unperturbed ellipse of
parameters $(L_0, G_0)$ such that $\boldsymbol\omega = (g^2/L_0^3, -\omega)$,
$G_0 > 0$, with $\omega_1/\omega_2$ Diophantine, then the perturbed
system admits a motion which is quasiperiodic with
spectrum proportional to $\boldsymbol\omega$ and takes place on an orbit
which wraps around a torus remaining forever close to
the unperturbed torus (which can be visualized as
described by a point moving, according to the area law,
on an ellipse rotating at a rate $-\omega_0$) with actions
$(L_0, G_0)$, provided $\varepsilon$ is small enough. Hence,


The KAM theorem answers, at least conceptually, the 
classical question: can a solution of the three-body 
problem remain close to an unperturbed one forever? 
That is, is it possible that a solar system is stable 
forever? 


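Assuming $e, |\varrho|/R \ll 1$ and retaining only the lowest
orders in $e$ and $|\varrho|/R \ll 1$, the Hamiltonian [98]
simplifies into


$$H = -\frac{g^2}{2 L_0^2} - \omega G_0 + \delta_\varepsilon(G_0) - \frac{\varepsilon g}{2R} \frac{G_0^4}{g^2 R^2} \left( 3 \cos 2(\lambda_0 + \gamma_0) - e \cos\lambda_0 - \frac92 e \cos(\lambda_0 + 2\gamma_0) + \frac32 e \cos(3\lambda_0 + 2\gamma_0) \right)$$  [100]

where

$$\delta_\varepsilon(G_0) = -\left( (1 + \varepsilon)^{1/2} - 1 \right) \omega G_0 - \frac{\varepsilon g}{R} \left( 1 + \frac{G_0^4}{4 g^2 R^2} \right)$$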


It is an interesting exercise to estimate, assuming
as model the Hamiltonian [100] and following the
proof of the KAM theorem, how small $\varepsilon$ has to be if
a planet with the data of Mercury can be stable
forever on a (slowly precessing) orbit with actions
close to the present-day values under the influence
of a mass $\varepsilon$ times the solar mass orbiting on a circle,
at a distance from the Sun equal to that of Jupiter. It
is possible either to follow the above reduction to
the ordinary KAM theorem or to apply directly to
[100] the Lindstedt series expansion, proceeding
along the lines of the section “Quasiperiodicity and
KAM stability.” The first approach is easy but the
second is more efficient: in both cases, unless the
estimates are done in a particularly careful manner,
the value found for $\varepsilon m_S$ is not interesting from the
viewpoint of astronomy.

The reader is referred to Arnol’d (1989) for more
details.


Rationalization and Regularization of 
Singularities 


Often integrable systems have interesting data which
lie on the boundary of the integrability domain. For
instance, the central motion when $L = G$ (circular
orbits) or the rigid body in a rotation around one of
the principal axes or the two-body problem when
$G = 0$ (collisional data). In such cases, perturbation
theory cannot be applied as discussed above.
Typically, the perturbation depends on quantities
like $\sqrt{L - G}$ and is not analytic at $L = G$. Nevertheless,
it is sometimes possible to enlarge phase space
and introduce new coordinates in the vicinity of the
data which in the initial phase space are singular.

A notable example is the failure of the analysis of
the circular restricted three-body problem: it apparently
fails when the orbit that we want to perturb is
circular.

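It is convenient to introduce the canonical
coordinates $L, \lambda$ and $G, \gamma$:


$$L = L_0, \qquad G = L_0 - G_0, \qquad \lambda = \lambda_0 + \gamma_0, \qquad \gamma = -\gamma_0$$  [101]

so that $e = \sqrt{2 G L^{-1}} \sqrt{1 - G (2L)^{-1}}$, $\lambda_0 = \lambda + \gamma$
and $\theta_0 = \lambda_0 + f_{\lambda_0}$, where $f_{\lambda_0}$ is defined in [42] (see
also [97]). Hence,


$$\theta_0 = \lambda + \gamma + f_{\lambda + \gamma}, \qquad \theta_0 + \gamma_0 = \lambda + f_{\lambda + \gamma},$$

$$\frac{|\varrho|}{R} = \frac{L^2 (1 - e^2)}{g R} \, \frac{1}{1 + e \cos(\lambda + \gamma + f_{\lambda + \gamma})}$$  [102]

and the Hamiltonian [100] takes the form

$$H_\varepsilon = -\frac{g^2}{2 L^2} - \omega L + \omega G + \varepsilon \frac{g}{R} F(L - G, L, \lambda + \gamma, \lambda)$$  [103]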


In the coordinates $L, G$ of [101] the unperturbed
circular case corresponds to $G = 0$ and [96], once
expressed in the action-angle variables $G, L, \gamma, \lambda$, is
analytic in a domain whose size is controlled by
$\sqrt{G}$. Nevertheless, very often problems of perturbation
theory can be “regularized.”

This is done by “enlarging the integrability” 
domain by adding to it points (one or more) around 
the singularity (a boundary point of the domain of 
the coordinates) and introducing new coordinates to 
describe simultaneously the data close to the 
singularity and the newly added points: in many 
interesting cases, the equations of motion are no 
longer singular (i.e., become analytic) in the new 
coordinates and are therefore apt to describe the 
motions that reach the singularity in a finite time. 
One can say that the singularity was only apparent. 

Perhaps this is best illustrated precisely in the
above circular restricted three-body problem, with
the singularity occurring where $G = 0$, that is, at a
circular unperturbed orbit. If we describe the points
with $G$ small in a new system of coordinates
obtained from the one in [101] by leaving
$L, \lambda$ unchanged and setting


$$p = \sqrt{2G} \cos\gamma, \qquad q = \sqrt{2G} \sin\gamma$$


then $p, q$ vary in a neighborhood of the origin with
the origin itself excluded.

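Adding the origin of the $p$-$q$ plane, then in a full
neighborhood of the origin the Hamiltonian [96] is
analytic in $L, \lambda, p, q$. This is because it is analytic
(cf. [96], [97]) as a function of $L, \lambda$ and $e \cos\theta_0$
and of $\cos(\gamma_0 + \theta_0)$. Since $\theta_0 = \lambda + \gamma + f_{\lambda + \gamma}$ and
$\theta_0 + \gamma_0 = \lambda + f_{\lambda + \gamma}$ by [97], the Hamiltonian [96] is
analytic in $L, \lambda, e \cos(\lambda + \gamma + f_{\lambda + \gamma}), \cos(\lambda + f_{\lambda + \gamma})$
for $e$ small (i.e., for $G$ small) and, by [42], $f_{\lambda + \gamma}$ is
analytic in $e \sin(\lambda + \gamma)$ and $e \cos(\lambda + \gamma)$. Hence the
trigonometric identities


$$e \sin(\lambda + \gamma) = \frac{p \sin\lambda + q \cos\lambda}{\sqrt{L}} \sqrt{1 - \frac{G}{2L}}$$  [104]

$$e \cos(\lambda + \gamma) = \frac{p \cos\lambda - q \sin\lambda}{\sqrt{L}} \sqrt{1 - \frac{G}{2L}}$$  [105]
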
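together with $G = (1/2)(p^2 + q^2)$ imply that [103] is
analytic near $p = q = 0$ and $L > 0$, $\lambda \in [0, 2\pi]$. The
Hamiltonian becomes analytic and the new coordinates
are suitable to describe motions crossing the
origin: for example, by setting


$$C \stackrel{def}{=} \frac12 \left( 1 - \frac{p^2 + q^2}{4L} \right)^{1/2} L^{-1/2}$$


[100] becomes


$$H = -\frac{g^2}{2 L^2} - \omega L + \frac{\omega}{2} (p^2 + q^2) + \delta_\varepsilon\!\left( L - \tfrac12 (p^2 + q^2) \right) - \frac{\varepsilon g}{2R} \frac{\left( L - \frac12 (p^2 + q^2) \right)^4}{g^2 R^2}$$

$$\times \left( 3 \cos 2\lambda - \left( (-11 \cos\lambda + 3 \cos 3\lambda) p - (7 \sin\lambda + 3 \sin 3\lambda) q \right) C \right)$$  [106]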
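
The reason the coordinates $p, q$ remove the singularity can be checked numerically: the combinations $e \sin(\lambda + \gamma)$ and $e \cos(\lambda + \gamma)$ can be computed from $L, \lambda, p, q$ alone, with no reference to the angle $\gamma$, which is undefined at $G = 0$. The sketch below is only an illustration under the definitions $e = \sqrt{2G/L}\,\sqrt{1 - G/(2L)}$, $p = \sqrt{2G} \cos\gamma$, $q = \sqrt{2G} \sin\gamma$; the function names and sample values are hypothetical.

```python
import math


def eccentricity(L, G):
    """e = sqrt(2G/L) * sqrt(1 - G/(2L)), valid for 0 <= G <= L."""
    return math.sqrt(2.0 * G / L) * math.sqrt(1.0 - G / (2.0 * L))


def to_cartesian(G, gamma):
    """(G, gamma) -> (p, q) with p = sqrt(2G) cos(gamma), q = sqrt(2G) sin(gamma)."""
    r = math.sqrt(2.0 * G)
    return r * math.cos(gamma), r * math.sin(gamma)


def e_sin_cos(L, lam, p, q):
    """Return (e sin(lam + gamma), e cos(lam + gamma)) from L, lam, p, q only."""
    G = 0.5 * (p * p + q * q)
    factor = math.sqrt(1.0 - G / (2.0 * L)) / math.sqrt(L)
    return ((p * math.sin(lam) + q * math.cos(lam)) * factor,
            (p * math.cos(lam) - q * math.sin(lam)) * factor)


if __name__ == "__main__":
    L, G, gamma, lam = 2.0, 0.3, 0.7, 1.1
    e = eccentricity(L, G)
    p, q = to_cartesian(G, gamma)
    print(e_sin_cos(L, lam, p, q))
    print(e * math.sin(lam + gamma), e * math.cos(lam + gamma))
    print(e_sin_cos(L, lam, 0.0, 0.0))  # well defined at the circular orbit
```

Both expressions are polynomial in $p, q$ up to the factor $\sqrt{1 - G/(2L)}$, which is analytic near $G = 0$, so nothing singular happens at the origin of the $p$-$q$ plane.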


The KAM theorem does not apply in the form 
discussed above to “Cartesian coordinates,” that is, 
when, as in [106], the unperturbed system is not 
assigned in action-angle variables. However, there 
are versions of the theorem (actually its corollaries) 
which do apply and therefore it becomes possible to 
obtain some results even for the perturbations of 
circular motions by the techniques that have been 
illustrated here. 

Likewise, the Hamiltonian of the rigid body with
a fixed point $O$ and subject to analytic external
forces becomes singular, if expressed in the action-angle
coordinates of Deprit, when the body motion
nears a rotation around a principal axis or, more
generally, nears a configuration in which any two of
the axes $i_3$, $z$, or $z_0$ coincide (i.e., any two among the
principal axis, the angular momentum axis and the
inertial $z$-axis coincide; see the section “Rigid
body”). Nevertheless, by imitating the procedure
just described in the simpler cases of the circular
three-body problem, it is possible to enlarge the
phase space so that in the new coordinates the
Hamiltonian is analytic near the singular
configurations.

A regularization also arises when considering
collisional orbits in the unrestricted planar three-body
problem, and the result is a very remarkable one.
After proving that, if the total angular momentum
does not vanish, simultaneous collisions of the three
masses cannot occur within any finite time interval,
the question is reduced to the regularization of
two-body collisions.

The local change of coordinates, which changes the
relative position coordinates $(x, y)$ of two colliding
bodies as $(x, y) \to (\xi, \eta)$, with $x + iy = (\xi + i\eta)^2$, is not
one to one, hence it has to be regarded as an
enlargement of the positions space, if points with
different $(\xi, \eta)$ are considered different. However, the
equations of motion written in the variables $\xi, \eta$ have
no singularity at $\xi, \eta = 0$ (Levi-Civita).
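
The two-to-one character of the map is easy to see numerically. The following minimal sketch (the function name is hypothetical) implements $x + iy = (\xi + i\eta)^2$ with complex arithmetic and shows that $(\xi, \eta)$ and $(-\xi, -\eta)$ are sent to the same relative position, and that the distance between the colliding bodies is $\xi^2 + \eta^2$.

```python
def levi_civita(xi, eta):
    """Map regularized coordinates (xi, eta) to the relative position (x, y)."""
    z = complex(xi, eta) ** 2      # x + i y = (xi + i eta)^2
    return z.real, z.imag


if __name__ == "__main__":
    print(levi_civita(1.0, 2.0))    # one preimage
    print(levi_civita(-1.0, -2.0))  # the opposite point gives the same (x, y)
```

Each relative position other than the collision point therefore has two preimages, which is why the change of variables has to be regarded as an enlargement of the positions space.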

Another celebrated regularization is the regularization
of the Schwarzschild metric, i.e., of the
general relativity version of the two-body problem:
it is, however, somewhat out of the scope of this
review (SYNGE, KRUSKAL).

For more details, the reader is referred to
Levi-Civita (1956).


Appendix 1: KAM Resummation Scheme 


The idea to control the “remaining contributions” is to
reduce the problem to the case in which there are no
pairs of lines that follow each other in the tree order
and which have the same current. Mark by a scale
label “0” the lines, see [74], [83], of a tree whose
divisors $C |\omega_0 \cdot \nu(\ell)|$ are $> 1$: these are lines which give
no problems in the estimates. Then mark by a scale
label “$\geq 1$” the lines with current $\nu(\ell)$ such that
$C |\omega_0 \cdot \nu(\ell)| \leq 2^{-n+1}$ for $n = 1$ (i.e., the remaining lines).

The lines labeled 0 are said to be on scale 0, while
those labeled $\geq 1$ are said to be on scale $\geq 1$. A cluster
of scale 0 will be a maximal collection of lines of
scale 0 forming a connected subgraph of a tree $\theta$.

Consider only trees $\theta_0 \in \Theta_0$ of the family $\Theta_0$ of
trees containing no clusters of lines with scale label
0 which have only one line entering the cluster and
one exiting it with equal current.


It is useful to introduce the notion of a line $\ell_1$
situated “between” two lines $\ell, \ell'$ with $\ell' > \ell$: this
will mean that $\ell_1$ precedes $\ell'$ but not $\ell$.

All trees $\theta$ in which there are some pairs $\ell' > \ell$ of
consecutive lines of scale label $\geq 1$ which have equal
current and such that all lines between them bear
scale label 0 are obtained by “inserting” on the lines
of trees in $\Theta_0$ with label $\geq 1$ any number of clusters
of lines and nodes, with lines of scale 0 and with the
property that the sum of the harmonics of the nodes
inserted vanishes.

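Consider a line $\ell_0 \in \theta_0 \in \Theta_0$ linking nodes $v_1 < v_2$
and labeled $\geq 1$ and imagine inserting on it a cluster
$\gamma$ of lines of scale 0 with sum of the node harmonics
vanishing and out of which emerges one line
connecting a node $v_{out}$ in $\gamma$ to $v_2$ and into which
enters one line linking $v_1$ to a node $v_{in} \in \gamma$. The
insertion of a $k$-lines, $|\gamma| = (k + 1)$-nodes, cluster
changes the tree value by replacing the line factor,
that will be briefly called “value of the cluster $\gamma$”, as


$$\frac{\nu_{v_1} \cdot \nu_{v_2}}{(\omega_0 \cdot \nu(\ell_0))^2} \;\to\; \frac{\nu_{v_1} \cdot M(\gamma; \nu(\ell_0)) \, \nu_{v_2}}{(\omega_0 \cdot \nu(\ell_0))^2} \, \frac{1}{(\omega_0 \cdot \nu(\ell_0))^2}$$  [107]


where $M$ is an $\ell \times \ell$ matrix


$$M_{rs}(\gamma; \nu(\ell_0)) = \frac{\varepsilon^{|\gamma|}}{k!} \, \nu_{out, r} \, \nu_{in, s} \prod_{v \in \gamma} (-f_{\nu_v}) \prod_{\ell \in \gamma} \frac{\nu_{v'} \cdot \nu_v}{(\omega_0 \cdot \nu(\ell))^2}$$


if $\ell = v'v$ denotes a line linking $v'$ and $v$. Therefore, if
all possible connected clusters are inserted and the
resulting values are added up, the result can be taken
into account by attributing to the original line $\ell_0$ a
factor like [107] with $M^{(0)}(\nu(\ell_0)) \stackrel{def}{=} \sum_\gamma M(\gamma; \nu(\ell_0))$
replacing $M(\gamma; \nu(\ell_0))$.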

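If several connected clusters $\gamma$ are inserted on the
same line and their values are summed, the result is
a modification of the factor associated with the line
$\ell_0$ into


$$\nu_{v_1} \cdot \sum_{k=0}^{\infty} \left( \frac{M^{(0)}(\nu(\ell_0))}{(\omega_0 \cdot \nu(\ell_0))^2} \right)^k \frac{1}{(\omega_0 \cdot \nu(\ell_0))^2} \, \nu_{v_2} = \nu_{v_1} \cdot \frac{1}{(\omega_0 \cdot \nu(\ell_0))^2 - M^{(0)}(\nu(\ell_0))} \, \nu_{v_2}$$  [108]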


The series defining $M^{(0)}$ involves, by construction, only
trees with lines of scale 0, hence with large divisors, so
that it converges to a matrix of small size of order $\varepsilon$
(actually $\varepsilon^2$, more precisely) if $\varepsilon$ is small enough.
Convergence can be established by simply remarking
that the series defining $M^{(0)}$ is built with lines
with values $> 1/2$ of the propagator, so that it
certainly converges for $\varepsilon$ small enough (by the
estimates in the section “Perturbing functions,”
where the propagators were identically 1) and the
sum is of order $\varepsilon$ (actually $\varepsilon^2$), hence $< 1$. However,
such an argument cannot be repeated when dealing
with lines with smaller propagators (which still have
to be discussed). Therefore, a method not relying on
so trivial a remark on the size of the propagators has
eventually to be used when considering lines of scale
higher than 1, as it will soon become necessary.

The advantage of the collection of terms achieved 
with [108] is that we can represent h as a sum of 
values of trees which are simpler because they 
contain no pair of lines of scale >1 with in between 
lines of scale 0 with total sum of the node harmonics 
vanishing. The price is that the divisors are now more 
involved and we even have a problem due to the fact 
that we have not proved that the series in [108] 
converges. In fact, it is a geometric series whose value 
is the RHS of [108] obtained by the sum rule [79] 
unless we can prove that the ratio of the geometric 
series is <1. This is trivial in this case by the previous 
remark: but it is better to note that there is another 
reason for convergence, whose use is not really 
necessary here but will become essential later. 
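
The resummation in [108] is, line by line, the sum of a geometric series. A scalar caricature makes the mechanism explicit: with a number standing for the squared divisor and another for the (small) cluster correction, the partial sums approach the resummed factor whenever the ratio has modulus less than 1. The sketch below is only an illustration with hypothetical names and sample values; in the text the correction is a matrix.

```python
def resummed_factor(divisor_sq, correction):
    """Closed form 1 / (divisor_sq - correction) of the resummed line factor."""
    return 1.0 / (divisor_sq - correction)


def partial_geometric_sum(divisor_sq, correction, terms):
    """Sum over k < terms of (correction / divisor_sq)^k / divisor_sq."""
    ratio = correction / divisor_sq
    return sum(ratio ** k for k in range(terms)) / divisor_sq


if __name__ == "__main__":
    x2, m = 0.25, 0.05          # ratio m / x2 = 0.2 < 1: the series converges
    for n in (1, 5, 20, 60):
        print(n, partial_geometric_sum(x2, m, n), resummed_factor(x2, m))
```

When the ratio is not smaller than 1 the partial sums do not converge, even though the closed form may still make sense: this is the situation in which the right-hand side of [108] has to be understood through the sum rule.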

The property that the ratio of the geometric series
is $< 1$ can be regarded as a consequence of
the cancellation mentioned in the section “Quasiperiodicity
and KAM stability” which can be
shown to imply that the ratio is $< 1$ because
$M^{(0)}(\nu) = \varepsilon^2 (\omega_0 \cdot \nu)^2 m^{(0)}(\nu)$ with $C |m^{(0)}(\nu)| < D_0$
for some $D_0 > 0$ and for all $|\varepsilon| < \varepsilon_0$ for some $\varepsilon_0$.
Then for small $\varepsilon$ the divisor in [108] is essentially
still what it was before starting the resummation.

At this point, an induction can be started. Consider
trees evaluated with the new rule and place a scale
level “$\geq 2$” on the lines with $C |\omega_0 \cdot \nu(\ell)| \leq 2^{-n+1}$ for
$n = 2$: leave the label “0” on the lines already marked
so and label by “1” the other lines. The lines of scale
“1” will satisfy $2^{-n} < C |\omega_0 \cdot \nu(\ell)| \leq 2^{-n+1}$ for $n = 1$.
The graphs will now possibly contain lines of scale 0,
1 or $\geq 2$ while lines with label “$\geq 1$” no longer can
appear, by construction.
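
The assignment of scale labels amounts to a dyadic binning of the divisors. As a minimal sketch, assuming the convention that a line is on scale 0 when $C |\omega_0 \cdot \nu(\ell)| > 1$ and on scale $n \geq 1$ when $2^{-n} < C |\omega_0 \cdot \nu(\ell)| \leq 2^{-n+1}$, the label of a line can be computed as follows (the function name and the default value of the constant are hypothetical):

```python
def scale_label(divisor, C=1.0):
    """Return the scale n of a line whose divisor is omega_0 . nu.

    Scale 0 means C * |divisor| > 1; scale n >= 1 means
    2^(-n) < C * |divisor| <= 2^(-n + 1).
    """
    x = C * abs(divisor)
    if x == 0.0:
        raise ValueError("a vanishing divisor has no finite scale")
    if x > 1.0:
        return 0
    n = 1
    while x <= 2.0 ** (-n):   # move to finer scales until 2^(-n) < x
        n += 1
    return n


if __name__ == "__main__":
    for d in (1.5, 1.0, 0.5, 0.3, 0.1):
        print(d, scale_label(d))
```

At the first step of the induction only the distinction between scale 0 and scale $\geq 1$ is used; each further step splits off one more dyadic shell.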

A cluster of scale 1 will be a maximal collection of
lines of scales 0, 1 forming a connected subgraph of
a tree $\theta$ and containing at least one line of scale 1.

The construction carried out by considering clusters
of scale 0 can be repeated by considering trees $\theta_1 \in \Theta_1$,
with $\Theta_1$ the collection of trees with lines marked 0, 1,
or $\geq 2$ and in which no pairs of lines with equal
momentum appear to follow each other if between
them there are only lines marked 0 or 1.

Insertion of connected clusters $\gamma$ of such lines on a
line $\ell_0$ of $\theta_1$ leads to define a matrix $M^{(1)}$ formed by
summing tree values of clusters $\gamma$ with lines of scales
0 or 1 evaluated with the line factors defined in
[107] and with the restriction that in $\gamma$ there are no
pairs of lines $\ell < \ell'$ with the same current and which
follow each other while any line between them has
lower scale (i.e., 0); here “between” means “preceding
$\ell'$ but not preceding $\ell$,” as above.

Therefore, a scale-independent method has to be
devised to check the convergence for $M^{(1)}$ and for the
matrices to be introduced later to deal with even
smaller propagators. This is achieved by the following
extension of Siegel’s theorem mentioned in the section
“Quasiperiodicity and KAM stability”:


Theorem 8 Let $\omega_0$ satisfy [74] and set $\omega = C \omega_0$.
Consider the contribution to the sum in [82] from
graphs $\theta$ in which


(i) no pairs $\ell' > \ell$ of lines which lie on the same
path to the root carry the same current $\nu$ if all
lines $\ell_1$ between them have current $\nu(\ell_1)$ such
that $|\omega \cdot \nu(\ell_1)| > 2 |\omega \cdot \nu|$;

(ii) the node harmonics are bounded by $|\nu| \leq N$ for
some $N$.


Then the number of lines $\ell$ in $\theta$ with divisor $\omega \cdot \nu_\ell$
satisfying $2^{-n} < |\omega \cdot \nu_\ell| \leq 2^{-n+1}$ does not exceed
$4 N k 2^{-n/\tau}$, $n = 1, 2, \ldots$.


This implies, by the same estimates in [85], that
the series defining $M^{(1)}$ converges. Again, it must be
checked that there are cancellations implying that
$M^{(1)}(\nu) = \varepsilon^2 (\omega \cdot \nu)^2 m^{(1)}(\nu)$ with $|m^{(1)}(\nu)| < D_0$ for
the same $D_0 > 0$ and the same $\varepsilon_0$.

At this point, one deals with trees containing only
lines carrying labels 0, 1, $\geq 2$, and the line factors for
the lines $\ell = v'v$ of scale 0 are $\nu_{v'} \cdot \nu_v / (\omega_0 \cdot \nu(\ell))^2$,
those of the lines $\ell = v'v$ of scale 1 have line factors
$\nu_{v'} \cdot ((\omega_0 \cdot \nu(\ell))^2 - M^{(0)}(\nu(\ell)))^{-1} \nu_v$, and those of the
lines $\ell = v'v$ of scale $\geq 2$ have line factors


$$\nu_{v'} \cdot \left( (\omega_0 \cdot \nu(\ell))^2 - M^{(1)}(\nu(\ell)) \right)^{-1} \nu_v$$


Furthermore, no pair of lines of scale “1” or of scale
“$\geq 2$” with the same momentum and with only lines
of lower scale (i.e., of scale “0” in the first case or of
scale “0”, “1” in the second) between them can
follow each other.

This procedure can be iterated until, after infinitely
many steps, the problem is reduced to the
evaluation of tree values in which each line carries a
scale label $n$ and there are no pairs of lines which
follow each other and which have only lines of
lower scale in between. Then the Siegel argument
applies once more and the series so resummed is an
absolutely convergent series of functions analytic in
$\varepsilon$: hence the original series is convergent.

Although at each step there is a lower bound on the
denominators, it would not be possible to avoid using
Siegel’s theorem. In fact, the lower bound would become
worse and worse as the scale increases. In order to check
the estimates of the constants $D_0, \varepsilon_0$ which control the
scale independence of the convergence of the various
series, it is necessary to take advantage of the theorem,
and of the absence (at each step) of the necessity of
considering trees with pairs of consecutive lines with
equal momentum and intermediate lines of higher scale.

One could also perform the analysis by bounding
$h^{(k)}$ order by order with no resummations (i.e.,
without changing the line factors) and exhibiting the
necessary cancellations. Alternatively, the paths that
Kolmogorov, Arnol’d and Moser used to prove
the first three (somewhat different) versions of the
theorem, by successive approximations of the
equations for the tori, can be followed.

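The invariant tori are Lagrangian manifolds just
as the unperturbed ones (cf. comments after [31])
and, in the case of the Hamiltonian [80], the
generating function $A \cdot \psi + \Phi(A, \psi)$ can be
expressed in terms of their parametric equations


$$\Phi(A, \psi) = G(\psi) + a \cdot \psi + h(\psi) \cdot (A - \omega - \Delta h(\psi))$$

$$\partial_\psi G(\psi) = -\Delta h(\psi) + h(\psi) \, \partial_\psi \Delta h(\psi) - a$$

$$a = \int \left( -\Delta h(\psi) + h(\psi) \, \partial_\psi \Delta h(\psi) \right) \frac{d\psi}{(2\pi)^\ell} = \int h(\psi) \, \partial_\psi \Delta h(\psi) \, \frac{d\psi}{(2\pi)^\ell}$$  [109]


where $\Delta = (\omega \cdot \partial_\psi)$ and the invariant torus corresponds
to $A' = \omega$ in the map $\alpha = \psi + \partial_A \Phi(A, \psi)$ and
$A' = A + \partial_\psi \Phi(A, \psi)$. In fact, by [109] the latter
becomes $A' = A - \Delta h$ and, from the second of [75]
written for $f$ depending only on the angles $\alpha$, it is
$A = \omega + \Delta h$ when $A, \alpha$ are on the invariant torus.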

Note that if $a$ exists it is necessarily determined by the
third relation in [109] but the check that the second
equation in [109] is soluble (i.e., that the RHS is an exact
gradient up to a constant) is nontrivial. The canonical
map generated by $A \cdot \psi + \Phi(A, \psi)$ is also defined for $A'$
close to $\omega$ and foliates the neighborhood of the invariant
torus with other tori: of course, for $A' \neq \omega$ the tori
defined in this way are, in general, not invariant.

The reader is referred to Gallavotti et al. (2004) 
for more details. 


Appendix 2: Coriolis and Lorentz 
Forces - Larmor Precession 


Larmor precession refers to the motion of an
electrically charged particle in a magnetic field $H$
(in an inertial frame of reference). It is due to the
Lorentz force which, on a unit mass with unit
charge, produces an acceleration $\ddot\varrho = v \wedge H$ if the
speed of light is $c = 1$.


Therefore, if $H = H k$ is directed along the $k$-axis,
the acceleration it produces is the same that the
Coriolis force would impress on a unit mass located
in a reference frame which rotates with angular
velocity $\omega_0 k$ around the $k$-axis if $H = 2 \omega_0 k$.
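
The identification can be verified with a few lines of vector algebra. In the sketch below (function names and sample values are hypothetical; unit mass, unit charge and $c = 1$ are assumed as in the text) the Lorentz acceleration $v \wedge H$ with $H = 2 \omega_0 k$ is compared with the Coriolis acceleration $-2 \omega_0 k \wedge v$.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def lorentz_acceleration(v, H):
    """Acceleration v ^ H of a unit mass with unit charge (c = 1)."""
    return cross(v, H)


def coriolis_acceleration(v, omega):
    """Coriolis acceleration -2 omega ^ v in a frame rotating with angular velocity omega."""
    w = cross(omega, v)
    return (-2.0 * w[0], -2.0 * w[1], -2.0 * w[2])


if __name__ == "__main__":
    omega0 = 0.7
    v = (0.3, -1.2, 0.5)
    print(lorentz_acceleration(v, (0.0, 0.0, 2.0 * omega0)))
    print(coriolis_acceleration(v, (0.0, 0.0, omega0)))
```

The two accelerations coincide for every velocity, since $v \wedge (2 \omega_0 k) = -2 \omega_0 k \wedge v$ by antisymmetry of the cross product.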

The above remarks imply that a homogeneous
sphere electrically charged uniformly with a unit
charge and freely pivoting about its center in a
constant magnetic field $H$ directed along the $k$-axis
undergoes the same motion as it would follow if not
subject to the magnetic field but seen in a
noninertial reference frame rotating at constant
angular velocity $\omega_0$ around the $k$-axis if $H$ and $\omega_0$
are related by $H = 2 \omega_0$: in this frame, the Coriolis
force is interpreted as a magnetic field.

This holds, however, only if the centrifugal force 
has zero moment with respect to the center: true in 
the spherical symmetry case only. In spherically 
nonsymmetric cases, the centrifugal forces have in 
general nonzero moment, so the equivalence 
between Coriolis force and the Lorentz force is 
only approximate. 

The Larmor theorem makes this more precise. It 
gives a quantitative estimate of the difference between 
the motion of a general system of particles of mass m 
in a magnetic field and the motion of the same 
particles in a rotating frame of reference but in the 
absence of a magnetic field. The approximation is 
estimated in terms of the size of the Larmor frequency 
eH/2mc, which should be small compared to the 
other characteristic frequencies of the motion of the 
system: the physical meaning is that the centrifugal 
force should be small compared to the other forces. 

The vector potential $A$ for a constant magnetic
field in the $k$-direction, $H = 2 \omega_0 k$, is
$A = \frac12 H \wedge \varrho = \omega_0 k \wedge \varrho \equiv \omega_0 \varrho^\perp$.
Therefore, from the treatment of the Coriolis
force in the section “Three-body problem” (see
[95]), the motion of a charge $e$ with mass $m$ in a
magnetic field $H$ with vector potential $A$ and subject
to other forces with potential $W$ can be described, in
an inertial frame and in generic units, in which the
speed of light is $c$, by a Hamiltonian


$$H = \frac{1}{2m} \left( p - \frac{e}{c} A \right)^2 + W(\varrho)$$  [110]

where $p = m \dot\varrho + (e/c) A$ and $\varrho$ are canonically
conjugate variables.


Differential geometry is the study of differential
properties of geometric objects such as curves, 
surfaces and higher-dimensional manifolds endowed 
with additional structures such as metrics and 
connections. One of the main ideas of differential 
geometry is to apply the tools of analysis to 
investigate geometric problems; in particular, it 
studies their “infinitesimal parts,” thereby lineariz- 
ing the problem. However, historically, geometric 
concepts often anticipated the analytic tools 
required to define them from a differential geometric 
point of view; the notion of tangent to a curve, for 
example, arose well before the notion of derivative. 

In its barely more than two centuries of existence, 
differential geometry has always had strong (often 
two-way) interactions with physics. Just to name a 
few examples, the theory of curves is used in 
kinematics, symplectic manifolds arise in Hamilto- 
nian mechanics, pseudo-Riemannian manifolds in 
general relativity, spinors in quantum mechanics, Lie 
groups and principal bundles in gauge theory, and 
infinite-dimensional manifolds in the path-integral 
approach to quantum field theory. 


Curves and Surfaces 


The study of differential properties of curves and 
surfaces resulted from a combination of the coordi- 
nate method (or analytic geometry) developed by 
Descartes and Fermat during the first half of the 
seventeenth century and infinitesimal calculus devel- 
oped by Leibniz and Newton during the second half 
of the seventeenth and beginning of the eighteenth 
century. 


Differential geometry appeared later in the eighteenth
century with the works of Euler, Recherches
sur la courbure des surfaces (1760) (Investigations
on the curvature of surfaces), and Monge, Une
application de l'analyse à la géométrie (1795) (An
application of analysis to geometry). Until Gauss’
fundamental article Disquisitiones generales circa
superficies curvas (General investigations of curved
surfaces), published in Latin in 1827 (of which one
can find a partial translation to English in Spivak
(1979)), surfaces embedded in $\mathbb{R}^3$ were either
described by an equation, $W(x,y,z)=0$, or by
expressing one variable in terms of the others.
Although Euler had already noticed that the 
coordinates of a point on a surface could be 
expressed as functions of two independent variables, 
it was Gauss who first made a systematic use of such 
a parametric representation, thereby initiating the 
concept of “local chart” which underlies differential 
geometry. 


Differentiable Manifolds 


The actual notion of n-manifold independent of a
particular embedding in a Euclidean space goes back
to a lecture Über die Hypothesen, welche der
Geometrie zu Grunde liegen (On the hypotheses
which lie at the foundations of geometry) (of which
one can find a translation to English and comments
in Spivak (1979)) delivered by Riemann at Göttingen
University in 1854, in which he makes clear the
fact that n-manifolds are locally like n-dimensional
Euclidean space. In his work, Riemann mentions
the existence of infinite-dimensional manifolds, 
such as function spaces, which today play an 
important role since they naturally arise as config- 
uration spaces in quantum field theories. 

In modern language a differentiable manifold
modeled on a topological vector space V (which can be
finite dimensional, Fréchet, Banach, or Hilbert for
example) is a topological space M equipped with a
family of local coordinate charts $(U_i, \phi_i)_{i \in I}$ such that the
open subsets $U_i \subset M$ cover M and where $\phi_i: U_i \to V$,
$i \in I$, are homeomorphisms onto open subsets of V which
give rise to smooth transition maps
$\phi_i \circ \phi_j^{-1}: \phi_j(U_i \cap U_j) \to \phi_i(U_i \cap U_j)$.
An n-dimensional differentiable manifold is a differentiable
manifold modeled on $\mathbb{R}^n$. The sphere
$S^{n-1} := \{(x_1, \ldots, x_n) \in \mathbb{R}^n, \sum_{i=1}^n x_i^2 = 1\}$ is a
differentiable manifold of dimension $n - 1$.
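
As an illustration of the chart definition above, the following minimal sketch covers the unit sphere in $\mathbb{R}^3$ by two stereographic charts and checks numerically that the transition map is the inversion $u \mapsto u/|u|^2$, which is smooth away from the origin. The choice of stereographic projection and all function names are assumptions made for this example; they do not come from the text.

```python
import math

def chart_north(p):
    """Stereographic projection from the north pole (0, 0, 1); defined for z != 1."""
    x, y, z = p
    return (x / (1 - z), y / (1 - z))

def chart_south(p):
    """Stereographic projection from the south pole (0, 0, -1); defined for z != -1."""
    x, y, z = p
    return (x / (1 + z), y / (1 + z))

def chart_north_inverse(u):
    """Inverse of chart_north: maps a point of the plane back to the sphere."""
    a, b = u
    s = a * a + b * b
    return (2 * a / (s + 1), 2 * b / (s + 1), (s - 1) / (s + 1))

def transition(u):
    """Transition map chart_south o chart_north^{-1}: the inversion u -> u / |u|^2."""
    a, b = u
    s = a * a + b * b
    return (a / s, b / s)

# A sample point on the unit sphere, away from both poles.
theta, phi = 1.1, 0.7
p = (math.sin(theta) * math.cos(phi),
     math.sin(theta) * math.sin(phi),
     math.cos(theta))

u = chart_north(p)                  # coordinates of p in the first chart
q = chart_north_inverse(u)          # should recover p
v_direct = chart_south(p)           # coordinates of p in the second chart
v_via_transition = transition(u)    # same coordinates, obtained from the first chart
```

The two charts overlap on the sphere minus both poles, and the agreement of `v_direct` with `v_via_transition` is exactly the compatibility condition required of an atlas.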

Simple differentiable curves in $\mathbb{R}^n$ are one-dimensional
differentiable manifolds locally specified
by coordinates $x(t) = (x_1(t), \ldots, x_n(t)) \in \mathbb{R}^n$,
where $t \mapsto x_i(t)$ is of class $C^1$. The tangent at point
$x(t_0)$ to such a curve, which is a straight line passing
through this point with direction given by the vector
$x'(t_0)$, generalizes to the concept of tangent space
$T_mM$ at point $m \in M$ of a smooth manifold M
modeled on V, which is a vector space isomorphic to
V spanned by tangent vectors at point m to curves
$\gamma(t)$ of class $C^1$ on M such that $\gamma(t_0) = m$.

In order to make this more precise, one needs the
notion of differentiable mapping. Given two differentiable
manifolds M and N, a mapping $f: M \to N$
is differentiable at point m if, for every chart $(U, \phi)$
of M containing m and every chart $(V, \psi)$ of N such
that $f(U) \subset V$, the mapping $\psi \circ f \circ \phi^{-1}: \phi(U) \to \psi(V)$
is differentiable at point $\phi(m)$. In particular, differentiable
mappings $f: M \to \mathbb{R}$ form the algebra $C^\infty(M, \mathbb{R})$
of smooth real-valued functions on M. Differentiable
mappings $\gamma: [a,b] \to M$ from an interval $[a,b] \subset \mathbb{R}$ to
a differentiable manifold M are called “differentiable
curves” on M. A differentiable mapping $f: M \to N$
which is invertible and with differentiable inverse
$f^{-1}: N \to M$ is called a diffeomorphism.

The derivative of a function $f \in C^\infty(M, \mathbb{R})$ along
a curve $\gamma: [a,b] \to M$ at point $\gamma(t_0) \in M$ with $t_0 \in
[a,b]$ is given by

\[ Xf := \frac{d}{dt}\Big|_{t=t_0} f \circ \gamma(t) \]

and the map $f \mapsto Xf$ is called the tangent vector to
the curve $\gamma$ at point $\gamma(t_0)$. Tangent vectors to
curves $\gamma: [a,b] \to M$ at a given point $m \in \gamma([a,b])$
form a vector space $T_mM$ called the “tangent space”
to M at point m.

A (smooth) map which, to a point $m \in M$, assigns
a tangent vector $X(m) \in T_mM$ is called a (smooth)
vector field. It can also be seen as a derivation
$X: f \mapsto Xf$ on $C^\infty(M, \mathbb{R})$ defined by
$(Xf)(m) := X(m)f$ for any $m \in M$, and the bracket of vector
fields is thereby defined from the operator bracket
$[X, Y] := X \circ Y - Y \circ X$. The linear operations on
tangent vectors carry over to vector fields,
$(X + Y)(m) := X(m) + Y(m)$, $(\lambda X)(m) := \lambda X(m)$ for any
$m \in M$, any vector fields X, Y and any $\lambda \in \mathbb{R}$, so that
vector fields on M form a linear space.

One can generate tangent vectors to M via local
one-parameter groups of differentiable transformations
of M, that is, mappings $(t, m) \mapsto \phi_t(m)$ from
$]-\epsilon, \epsilon[ \times U$ to M (with $\epsilon > 0$ and $U \subset M$ an
open subset of M) such that $\phi_0 = \mathrm{Id}$, $\phi_{t+s} = \phi_t \circ \phi_s$
$\forall s, t \in ]-\epsilon, \epsilon[$ with $t + s \in ]-\epsilon, \epsilon[$, and $m \mapsto \phi_t(m)$ is a
diffeomorphism of U onto an open subset $\phi_t(U)$.
The tangent vector at t = 0 to the curve $\gamma(t) = \phi_t(m)$
yields a tangent vector to M at point $m = \gamma(0)$.
Conversely, when M is finite dimensional, the
fundamental theorem for systems of ordinary
differential equations yields, for any vector field X on M, the
existence (around any point $m \in M$) of a
local one-parameter group of local transformations
$\phi: ]-\epsilon, \epsilon[ \times U \to M$ (with U an open subset containing
m) which induces the tangent vector
$X(m) \in T_mM$.
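
The correspondence between vector fields and one-parameter groups can be checked on a simple case. The sketch below takes the rotation field $X(x, y) = (-y, x)$ on the plane, an example chosen here and not taken from the text, whose flow $\phi_t$ is the rotation by angle t. It verifies the group law and recovers $X(m)$ as the tangent vector at t = 0 by a finite difference.

```python
import math

def X(m):
    """Hypothetical example field X(x, y) = (-y, x) on the plane."""
    x, y = m
    return (-y, x)

def flow(t, m):
    """phi_t(m): rotation by angle t, the one-parameter group generated by X."""
    x, y = m
    c, s = math.cos(t), math.sin(t)
    return (c * x - s * y, s * x + c * y)

m = (0.3, -1.2)
s, t = 0.4, 0.25

# Group law: phi_{t+s} = phi_t o phi_s.
lhs = flow(t + s, m)
rhs = flow(t, flow(s, m))

# Tangent vector at t = 0 to the curve t -> phi_t(m), by a central difference.
h = 1e-6
plus, minus = flow(h, m), flow(-h, m)
tangent = ((plus[0] - minus[0]) / (2 * h),
           (plus[1] - minus[1]) / (2 * h))
```

Here the flow happens to be defined for all t, so the group is global; on a general manifold only the local statement of the text is available.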

A differentiable mapping $\phi: M \to N$ induces a map
$\phi_*(m): T_mM \to T_{\phi(m)}N$ defined by $(\phi_*X)f = X(f \circ \phi)$.
An “immersion” of a manifold M in a manifold N is a
differentiable mapping $\phi: M \to N$ such that the maps
$\phi_*(m)$ are injective at any point $m \in M$. Such a map is
an embedding if it is moreover injective and a homeomorphism
onto its image, in which case
$\phi(M) \subset N$ is a submanifold of N. The unit sphere $S^n$
is a submanifold of $\mathbb{R}^{n+1}$. Whitney showed that every
smooth real n-dimensional manifold can be embedded
in $\mathbb{R}^{2n+1}$.

A differentiable manifold whose coordinate charts 
take values in a complex vector space V and whose 
transition maps are holomorphic is called a complex 
manifold, which is complex n-dimensional if $V = \mathbb{C}^n$.
The complex projective space $\mathbb{C}P^n$, the set of
complex straight lines through 0 in $\mathbb{C}^{n+1}$, is a
compact complex manifold of dimension n. Similarly
to the notion of differentiable mapping between 
differentiable manifolds, we have the notion of 
holomorphic mapping between complex manifolds. 

A smooth family $m \mapsto J_m$ of endomorphisms of the
tangent spaces $T_mM$ to a differentiable manifold M such
that $J_m^2 = -\mathrm{Id}$ gives rise to an almost-complex manifold.
The prototype is the almost-complex structure on $\mathbb{C}^n$
defined by $J(\partial_{x_i}) = \partial_{y_i}$, $J(\partial_{y_i}) = -\partial_{x_i}$ with
$z = (x_1 + i y_1, \ldots, x_n + i y_n) \in \mathbb{C}^n$, which can be transferred to a
complex manifold M by means of local charts. An
almost-complex structure J on a manifold M is called 
complex if M is the underlying differentiable manifold 
of a complex manifold which induces J in this way. 

Studying smooth functions on a differentiable 
manifold can provide information on the topology 
of the manifold: for example, the behavior of a 
smooth function on a compact manifold at its
critical points is strongly restricted by the topological
properties of the manifold. This leads to the Morse 


critical point theory which extends to infinite- 
dimensional manifolds and, among other conse- 
quences, leads to conclusions on extremals or closed 
extremals of variational problems. Rather than 
privileging points on a manifold, one can study 
instead the geometry of manifolds from the point of 
view of spaces of functions, which leads to an 
algebraic approach to differential geometry. The 
initial concept there is a commutative ring (which 
becomes a possibly noncommutative algebra in the 
framework of noncommutative geometry), namely 
the ring of smooth functions on the manifold, while 
the manifold itself is defined in terms of the ring as the 
space of maximal ideals. In particular, this point of 
view proves to be fruitful to understand supermani- 
folds, a generalization of manifolds which is impor- 
tant for supersymmetric field theories. 

One can further consider the sheaf of smooth 
functions on an open subset of the manifold; this 
point of view leads to sheaf theory which provides a 
unified approach to establishing connections between 
local and global properties of topological spaces. 


Metric Properties 


Riemann focused on the metric properties of manifolds
but the first clear formulation of the concept of a
manifold equipped with a metric was given by Weyl in
Die Idee der Riemannschen Fläche. A Riemannian
metric on a differentiable manifold M is a positive-definite
scalar product $g_m$ on $T_mM$ for every point
$m \in M$ depending smoothly on the point m. A manifold
equipped with a Riemannian metric is called a 
Riemannian manifold. A Weyl transformation, which 
is multiplying the metric by a smooth positive function, 
yields a new Riemannian metric with the same angle 
measurement as the original one, and hence leaves the 
“conformal” structure on M unchanged. 

Riemann also suggested considering metrics on 
the tangent spaces that are not induced from scalar 
products; metrics on the manifold built this way 
were first systematically investigated by Finsler and 
are therefore called Finsler metrics. Geodesics on a
Riemannian manifold M, which correspond to
smooth curves $\gamma: [a,b] \to M$ that minimize the
length functional

\[ \int_a^b \sqrt{ g_{\gamma(t)}\!\left( \frac{d\gamma}{dt}, \frac{d\gamma}{dt} \right) } \; dt, \]

then generalize to curves which realize the shortest
distance between two points chosen sufficiently close.
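
The minimizing property of geodesics can be observed numerically. The sketch below assumes the round metric $d\theta^2 + \sin^2\theta \, d\varphi^2$ on the unit sphere (a choice made for this example) and approximates the length functional for two curves joining the same pair of points on the equator: the equatorial arc, which is a geodesic, and a detour through higher latitudes. The function names, the detour and the discretization are illustrative assumptions.

```python
import math

def length(curve, a=0.0, b=1.0, n=20000):
    """Approximate the integral over [a, b] of sqrt(g(dγ/dt, dγ/dt)) dt for the
    round metric g = dθ² + sin²θ dφ² on the unit sphere (midpoint-type rule)."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        th0, ph0 = curve(a + k * h)
        th1, ph1 = curve(a + (k + 1) * h)
        th_mid = 0.5 * (th0 + th1)
        dth, dph = (th1 - th0) / h, (ph1 - ph0) / h
        total += math.sqrt(dth ** 2 + math.sin(th_mid) ** 2 * dph ** 2) * h
    return total

DELTA = 1.0  # angular separation (in radians) of the two endpoints on the equator

def equator_arc(t):
    """Geodesic: stays on the equator θ = π/2."""
    return (math.pi / 2, DELTA * t)

def detour(t):
    """Same endpoints, but the curve leaves the equator in between."""
    return (math.pi / 2 + 0.3 * math.sin(math.pi * t), DELTA * t)

geodesic_length = length(equator_arc)
detour_length = length(detour)
```

Since the endpoints are less than half a great circle apart, the equatorial arc is the shortest curve between them, so `geodesic_length` is close to `DELTA` and `detour_length` is strictly larger.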

Euclid’s axioms, which naturally lead to Riemannian
geometry, are also satisfied up to the axiom
of parallelism by a geometry developed by
Lobachevsky in 1829 and Bolyai in 1832. Non-Euclidean
geometries actually played a major role in
the development of differential geometry, and Lobachevsky’s
work inspired Riemann and later Klein.

Dropping the positivity assumption for the
bilinear forms $g_m$ on $T_mM$ leads to Lorentzian
manifolds, which are (n + 1)-dimensional smooth
manifolds equipped with bilinear forms on the
tangent spaces with signature (1, n). These occur in
general relativity and tangent vectors with negative, 
positive, or vanishing squared length are called 
timelike, spacelike, and lightlike, respectively. 

Just as complex vector spaces can be equipped with 
positive-definite Hermitian products, a complex 
manifold M can come equipped with a Hermitian 
metric, namely a positive-definite Hermitian product 
$h_m$ on $T_mM$ for every point $m \in M$ depending
smoothly on the point m; every Hermitian metric
induces a Riemannian one given by its real part. The
complex projective space $\mathbb{C}P^n$ comes naturally
equipped with the Fubini-Study Hermitian metric. 


Transformation Groups 


Metric properties can be seen from the point of view 
of transformation groups. Poncelet in his Traité des
propriétés projectives des figures (1822) had investigated classical
Euclidean geometry from a projective geometric 
point of view, but it was not until Cayley (1858) 
that metric properties were interpreted as those 
stable under any “projective” transformation which 
leaves “cyclic points” (points at infinity on the 
imaginary axis of the complex plane) invariant. 
Transformation groups were further investigated by 
Lie, leading to the modern concept of Lie group, a 
smooth manifold endowed with a group structure 
such that the group operations are smooth. 

A vector field X on a Lie group G is called left-
(resp. right-) invariant if it is invariant under left
translations $L_g: h \mapsto gh$ (resp. right translations
$R_g: h \mapsto hg$) for every $g \in G$, that is, if
$(L_g)_* X(h) = X(gh)$ $\forall (g, h) \in G^2$ (resp.
$(R_g)_* X(h) = X(hg)$ $\forall (g, h) \in G^2$). The set of all
left-invariant vector fields
equipped with the sum, scalar multiplication, and
the bracket operation on vector fields forms an
algebra called the Lie algebra of G.

The group $Gl_n(\mathbb{R})$ (resp. $Gl_n(\mathbb{C})$) of all real (resp.
complex) invertible n × n matrices is a Lie group
with Lie algebra the algebra $gl_n(\mathbb{R})$ (resp. $gl_n(\mathbb{C})$) of
all real (resp. complex) n × n matrices, and the
bracket operation reads $[A, B] = AB - BA$.

The orthogonal (resp. unitary) group
$O_n(\mathbb{R}) := \{A \in Gl_n(\mathbb{R}), A^tA = 1\}$, where $A^t$ denotes the
transposed matrix (resp. $U_n(\mathbb{C}) := \{A \in Gl_n(\mathbb{C}), A^*A = 1\}$,
where $A^* = \bar{A}^t$), is a compact Lie group with Lie
algebra $o_n(\mathbb{R}) := \{A \in gl_n(\mathbb{R}), A^t = -A\}$ (resp.
$u_n(\mathbb{C}) := \{A \in gl_n(\mathbb{C}), A^* = -A\}$).

A left-invariant vector field X on a finite-dimensional
Lie group G (or equivalently an element X of
the Lie algebra of G) generates a global one-parameter
group of transformations $\phi_X(t)$, $t \in \mathbb{R}$.
The mapping from the Lie algebra of G into G
defined by $\exp(X) := \phi_X(1)$ is called the exponential
mapping. The exponential mapping on $Gl_n(\mathbb{R})$ (resp.
$Gl_n(\mathbb{C})$) is given by the series $\exp(A) = \sum_{i=0}^{\infty} A^i/i!$.
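
To see the exponential series at work, the following sketch sums a truncated series for an antisymmetric 3 × 3 matrix, that is, an element of the Lie algebra of the orthogonal group as described above, and checks that the result g satisfies $g^t g = 1$. The helper names, the sample matrix and the truncation at 30 terms are assumptions made for this example.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def mat_exp(a, terms=30):
    """Partial sum of the series exp(A) = sum over i of A^i / i!."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # current term A^i / i!, starting at i = 0
    for i in range(1, terms):
        term = [[x / i for x in row] for row in mat_mul(term, a)]
        result = [[result[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return result

# An antisymmetric matrix: A^t = -A.
A = [[0.0, -0.5, 0.2],
     [0.5, 0.0, -0.3],
     [-0.2, 0.3, 0.0]]

g = mat_exp(A)
should_be_identity = mat_mul(transpose(g), g)
```

For matrices of small norm such as this one, 30 terms are far more than needed; a production implementation would rather rely on a scaling-and-squaring routine from a numerical library.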

As symmetry groups of physical systems, Lie 
groups play an important role in physics, in 
particular in quantum mechanics and Yang-Mills 
theory. Infinite-dimensional Lie groups arise as 
symmetry groups, such as the group of diffeomorph- 
isms of a manifold in general relativity, the group of 
gauge transformations in Yang-Mills theory, and 
the group of Weyl transformations of metrics on a 
surface in string theory. The principle “the physics 
should not depend on how it is described” translates 
to an invariance under the action of the (possibly 
infinite-dimensional) group of symmetries of the
theory. Anomalies arise when such an invariance 
holds for the classical action of a physical theory but 
“breaks” at the quantized level. 

In his Erlangen program (1872), Klein puts the 
concept of transformation group in the foreground 
introducing a novel idea by which one should 
consider a space endowed with some properties 
as a set of objects invariant under a given group of 
transformations. One thereby reaches a classifica- 
tion of geometric results according to which group is 
relevant in a particular problem as, for example, the
projective linear group for projective geometry, 
the orthogonal group for Riemannian geometry, or 
the symplectic group for “symplectic” geometry. 


Fiber Bundles 


Transformation groups give rise to principal fiber 
bundles which play a major role in Yang-Mills 
theory. The notion of fiber bundle first arose out of 
questions posed in the 1930s on the topology and the 
geometry of manifolds, and by 1950 the definition of 
fiber bundle had been clearly formulated by Steenrod. 

A smooth fiber bundle with typical fiber a
manifold F is a triple $(E, \pi, B)$, where E and B are
smooth manifolds called the total space and the base
space, and $\pi: E \to B$ is a smooth surjective map
called the projection of the bundle such that the
preimage $\pi^{-1}(b)$ of a point $b \in B$, called the fiber of
the bundle over b, is isomorphic to F and any base
point b has a neighborhood $U \subset B$ with preimage
$\pi^{-1}(U)$ diffeomorphic to $U \times F$, where the diffeomorphisms
commute with the projection on the base
space. Smooth sections of E are maps $\sigma: B \to E$ such
that $\pi \circ \sigma = \mathrm{Id}_B$.

When F is a vector space and when, given open
subsets $U_i \subset B$ that cover B with corresponding
coordinate charts $(U_i, \phi_i)_{i \in I}$, the local diffeomorphisms
$\tau_i: \pi^{-1}(U_i) \simeq \phi_i(U_i) \times F$ give rise to transition
maps $\tau_i \circ \tau_j^{-1}: \phi_j(U_i \cap U_j) \times F \to \phi_i(U_i \cap U_j) \times F$ that
are linear in the fiber, the bundle is called a “vector
bundle.” The tangent bundle $TM = \bigcup_{m \in M} T_mM$ to a
differentiable manifold M modeled on a vector space
V is a vector bundle with typical fiber V and
transition maps $\tau_{ij} = (\phi_i \circ \phi_j^{-1}, d(\phi_i \circ \phi_j^{-1}))$ expressed
in terms of the differentials of the transition maps on
the manifold M. So are the cotangent bundle, the
dual of the tangent bundle, and tensor products of
the tangent and cotangent vector bundles, with
typical fiber the dual V* and tensor products of V
and V*. Vector fields defined previously are sections
of the tangent bundle, 1-forms on M are sections of
the cotangent bundle, and contravariant tensors,
resp. covariant tensors, are sections of tensor
products of the tangent, resp. cotangent, bundles. A
differentiable mapping $\phi: M \to N$ takes covariant
p-tensor fields on N to their pullbacks by $\phi$,
covariant p-tensors on M given by

\[ (\phi^*T)(X_1, \ldots, X_p) = T(\phi_*X_1, \ldots, \phi_*X_p) \]

for any vector fields $X_1, \ldots, X_p$ on M.

Differentiating a smooth function f on M gives
rise to a 1-form df on M. More generally, exterior p-forms
are antisymmetric smooth covariant p-tensors,
so that $\omega(X_{\sigma(1)}, \ldots, X_{\sigma(p)}) = \epsilon(\sigma)\, \omega(X_1, \ldots, X_p)$ for
any vector fields $X_1, \ldots, X_p$ on M and any permutation
$\sigma \in \Sigma_p$ with signature $\epsilon(\sigma)$.
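
The antisymmetry condition can be checked directly on the simplest example, the 3-form $dx_1 \wedge dx_2 \wedge dx_3$ on $\mathbb{R}^3$, which evaluates three vectors to the determinant of the matrix they form. The sketch below, whose names and sample vectors are assumptions made for illustration, tests the identity $\omega(X_{\sigma(1)}, X_{\sigma(2)}, X_{\sigma(3)}) = \epsilon(\sigma)\, \omega(X_1, X_2, X_3)$ over all six permutations.

```python
from itertools import permutations

def sign(perm):
    """Signature of a permutation of 0-based indices, computed from its inversion count."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def omega(*vectors):
    """The 3-form dx1 ^ dx2 ^ dx3: determinant of the three argument vectors (Leibniz formula)."""
    total = 0.0
    for perm in permutations(range(3)):
        prod = 1.0
        for i in range(3):
            prod *= vectors[i][perm[i]]
        total += sign(perm) * prod
    return total

X = [(1.0, 2.0, 0.5), (0.0, 1.0, -1.0), (3.0, 0.0, 2.0)]
base = omega(*X)

# omega evaluated on permuted arguments equals the signature times the original value.
checks = [
    abs(omega(*(X[s] for s in perm)) - sign(perm) * base) < 1e-12
    for perm in permutations(range(3))
]
```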

Riemannian metrics are covariant 2-tensors and 
the space of Riemannian metrics on a manifold M is 
an infinite-dimensional manifold which arises as a 
configuration space in string theory and general 
relativity. 

A principal bundle is a fiber bundle $(P, \pi, B)$ with
typical fiber a Lie group G acting freely and properly
on the total space P via a right action
$(p, g) \in P \times G \mapsto pg = R_g(p) \in P$ and such that the local
diffeomorphisms $\pi^{-1}(U) \simeq U \times G$ are G-equivariant.
Given a principal fiber bundle $(P, \pi, B)$ with structure
group a finite-dimensional Lie group G, the action of
G on P induces a homomorphism which to an
element X of the Lie algebra of G assigns a vector
field X* on P called the “fundamental vector field”
generated by X. It is defined at $p \in P$ by

\[ X^*(p) := \frac{d}{dt}\Big|_{t=0} R_{\exp(tX)}(p) \]

where exp is the exponential map on G.


Given an action of G on a vector space V, one 
builds from a principal bundle with typical fiber G an 
associated vector bundle with typical fiber V. 
Principal bundles are essential in gauge theory; U(1)-principal
bundles arise in electromagnetism and
nonabelian structure groups arise in Yang-Mills 
theory. There the fields are connections on the 
principal bundle, and the action of gauge transforma- 
tions on (irreducible) connections gives rise to an 
infinite-dimensional principal bundle over the moduli 
space with structure group given by gauge transfor- 
mations. Infinite-dimensional bundles arise in other 
field theories such as string theory where the moduli 
space corresponds to inequivalent complex structures 
on a Riemann surface and the infinite-dimensional 
structure group is built up from Weyl transformations 
of the metric and diffeomorphisms of the surface. 


Connections 


On a manifold there is no canonical method to 
identify tangent spaces at different points. Such an 
identification, which is needed in order to differenti- 
ate vector fields, can be achieved on a Riemannian 
manifold via “parallel transport” of the vector fields. 
The basic concepts of the theory of covariant 
differentiation on a Riemannian manifold were given 
at the end of the nineteenth century by Ricci and, in a 
more complete form, in 1901 in collaboration with 
Levi-Civita in Méthodes de calcul différentiel absolu et 
leurs applications; on a Riemannian manifold, it is 
possible to define in a canonical manner a parallel 
displacement of tangent vectors and thereby to 
differentiate vector fields covariantly using what has
since been called the Levi-Civita connection.

More generally, a (linear) connection (or equivalently
a covariant derivation) on a vector bundle E
over a manifold M provides a way to identify fibers
of the vector bundle at different points; it is a map $\nabla$
taking sections $\sigma$ of E to E-valued 1-forms on M
which satisfies a Leibniz rule, $\nabla(f\sigma) = df \otimes \sigma + f \nabla\sigma$,
for any smooth function f on M. When E is the
tangent bundle over M, curves $\gamma$ on the manifold
with covariantly constant velocity, $\nabla\dot{\gamma}(t) = 0$, give rise
to geodesics. Given an initial velocity $\dot{\gamma}(0) = X \in T_mM$
and provided X has small enough norm, $\gamma_X(1)$
defines a point on the corresponding geodesic and
the map $\exp_m: X \mapsto \gamma_X(1)$ is a diffeomorphism from a
neighborhood of 0 in $T_mM$ to a neighborhood of
$m \in M$ called the “exponential map” of $\nabla$.

The concept of connection extends to principal 
bundles where it was developed by Ehresmann 
building on the work of Cartan. A connection on a 
principal bundle $(P, \pi, B)$ with structure group G,
which is a smooth equivariant (under the action of
the group G) decomposition of the tangent space
$T_pP = H_pP \oplus V_pP$ at each point p into a horizontal
space $H_pP$ and the vertical space $V_pP = \mathrm{Ker}\, d\pi_p$,
gives rise to a linear connection on the associated 
vector bundle. 

A connection on P gives rise to a 1-form $\omega$ on P
with values in the Lie algebra of the structure group
G called the connection 1-form and defined as
follows. For each $X \in T_pP$, $\omega(X)$ is the unique
element U of the Lie algebra of G such that the
corresponding fundamental vector field U*(p) at
point p coincides with the vertical component of X.
In particular, $\omega(U^*) = U$ for any element U of the Lie
algebra of G. 

The space of connections which is an infinite- 
dimensional manifold arises as a configuration space 
in Yang-Mills theory and also comes into play in the 
Seiberg—Witten theory. 


Geometric Differential Operators 


From connections one defines a number of differential
operators on a Riemannian manifold, among
them second-order Laplacians. In particular, the
Laplace-Beltrami operator $f \mapsto -\mathrm{tr}(\nabla^{T^*M} df)$ on
smooth functions, where $\nabla^{T^*M}$ is the connection on
the cotangent bundle induced by the Levi-Civita
connection on M, generalizes the ordinary Laplace
operator on Euclidean space. This in turn generalizes
to second-order operators $\Delta^E := -\mathrm{tr}(\nabla^{T^*M \otimes E} \nabla^E)$
acting on smooth sections of a vector bundle E over
a Riemannian manifold M, where $\nabla^E$ is a connection
on E and $\nabla^{T^*M \otimes E}$ the connection on $T^*M \otimes E$
induced by $\nabla^E$ and the Levi-Civita connection on M.

The Dirac operator on a spin Riemannian
manifold, a first-order differential operator whose
square coincides with the Laplace-Beltrami operator
up to zeroth-order terms, can be best understood
going back to the initial idea of Dirac. A
first-order differential operator with constant
matrix coefficients $\sum_{i=1}^n \gamma_i (\partial/\partial x_i)$ has square
given by the Laplace operator $-\sum_{i=1}^n \partial^2/\partial x_i^2$ on
$\mathbb{R}^n$ if and only if its coefficients satisfy the
Clifford relations

\[ \gamma_i^2 = -1 \quad \forall i = 1, \ldots, n \]
\[ \gamma_i \gamma_j + \gamma_j \gamma_i = 0 \quad \forall i \neq j \]
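
For n = 2 the Clifford relations can be realized by 2 × 2 complex matrices. The sketch below uses one possible choice, $\gamma_1 = i\sigma_1$ and $\gamma_2 = i\sigma_2$ built from the Pauli matrices (an assumption made for this example, not a choice taken from the text), and checks the relations together with their consequence $(\xi_1\gamma_1 + \xi_2\gamma_2)^2 = -(\xi_1^2 + \xi_2^2)$, the algebraic counterpart of the statement that the operator squares to the Laplace operator.

```python
def mat_mul(a, b):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def is_close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

IDENTITY = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]

# Hypothetical representation: gamma1 = i * sigma_1, gamma2 = i * sigma_2.
gamma1 = [[0, 1j], [1j, 0]]
gamma2 = [[0, 1], [-1, 0]]

# Clifford relations: each gamma squares to -1 and distinct gammas anticommute.
square1_ok = is_close(mat_mul(gamma1, gamma1), scale(-1, IDENTITY))
square2_ok = is_close(mat_mul(gamma2, gamma2), scale(-1, IDENTITY))
anticommute_ok = is_close(
    mat_add(mat_mul(gamma1, gamma2), mat_mul(gamma2, gamma1)), ZERO)

# Consequence: (xi_1 gamma1 + xi_2 gamma2)^2 = -(xi_1^2 + xi_2^2) * Id.
xi = (0.7, -1.3)
symbol = mat_add(scale(xi[0], gamma1), scale(xi[1], gamma2))
symbol_square_ok = is_close(
    mat_mul(symbol, symbol), scale(-(xi[0] ** 2 + xi[1] ** 2), IDENTITY))
```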


The resulting Clifford algebra, once complexified, is
isomorphic in even dimensions n = 2k to the space
$\mathrm{End}(S_n)$ (and $\mathrm{End}(S_n) \oplus \mathrm{End}(S_n)$ in odd dimensions
n = 2k + 1) of endomorphisms of the space $S_n = \mathbb{C}^{2^k}$
of complex n-spinors. When instead of the canonical
metric on $\mathbb{R}^n$ one starts from the metric on
the tangent bundle TM induced by the Riemannian
metric on M and provided the corresponding spinor
spaces patch up to a “spinor bundle” over M, M is
called a spin manifold. The Dirac operator on a
spin Riemannian manifold M is a first-order
differential operator acting on spinors given by
$D = \sum_{i=1}^n \gamma_i \nabla_{e_i}$, where $\nabla$ is the connection
on spinors (sections of the spinor bundle S) induced
by the Levi-Civita connection and $e_1, \ldots, e_n$ is
an orthonormal frame of the tangent bundle TM.
This is a particular case of more general twisted
Dirac operators $D^W$ on a twisted spinor bundle
$S \otimes W$ equipped with the connection $\nabla^{S \otimes W}$ which
combines the connection $\nabla$ with a connection $\nabla^W$
on an auxiliary vector bundle W. Their square
$(D^W)^2$ relates to the Laplacian $\Delta^{S \otimes W}$ built from this
twisted connection via the Lichnerowicz formula,
which is useful for estimates on the spectrum of the
Dirac operator in terms of the underlying geometric
data.

When there is no spin structure on M, one can still
hope for a $\mathrm{Spin}^c$ structure and a Dirac operator $D^c$
associated with a connection compatible with that
structure. In particular, every compact orientable
4-manifold can be equipped with a $\mathrm{Spin}^c$ structure
and one can build invariants of the differentiable
manifold called Seiberg-Witten invariants from
solutions of a system of two partial differential
equations, one of which is the Dirac equation
$D^c \psi = 0$ associated with a connection compatible
with the $\mathrm{Spin}^c$ structure and the other a nonlinear
equation involving the curvature.


Curvature 




The concept of “curvature,” which is now understood
in terms of connections (the curvature of a
connection $\nabla$ is defined by $\Omega = \nabla^2$), historically
arose prior to that of connection. In its modern
form, the concept of curvature dates back to Gauss.
Using a spherical representation of surfaces, the
Gauss map $\nu$, which sends a point m of an oriented
surface $\Sigma \subset \mathbb{R}^3$ to the outward pointing unit normal
vector $\nu_m$, Gauss defined what has since been called
the Gaussian curvature $K_m$ at point $m \in U \subset \Sigma$ as
the limit when the area of U tends to zero of the
ratio area($\nu(U)$)/area(U). It measures the obstruction
to finding a distance-preserving map from a
piece of the surface around m to a region in the
standard plane. Gauss’ Theorema Egregium says that
the Gaussian curvature of a smooth surface in $\mathbb{R}^3$ is
defined in terms of the metric on the surface so that
it agrees for two isometric surfaces.

From the curvature $\Omega$ of a connection on a
Riemannian manifold (M, g), one builds the
Riemannian curvature tensor, a 4-tensor which in
local coordinates reads

\[ R_{ijkl} := g\!\left( \Omega\!\left( \frac{\partial}{\partial x_i}, \frac{\partial}{\partial x_j} \right) \frac{\partial}{\partial x_l}, \frac{\partial}{\partial x_k} \right); \]

further taking a partial trace leads to the Ricci
curvature given by the 2-tensor $\mathrm{Ric}_{ij} = \sum_k R_{ikjk}$,
the trace of which gives in turn the scalar curvature
$R = \sum_i \mathrm{Ric}_{ii}$. Sectional curvature at a point
m in the direction of a two-dimensional plane
spanned by two vectors U and V corresponds to
$K(U, V) = g(\Omega(U, V)V, U)$. A manifold has constant
sectional curvature whenever $K(U, V)/\|U \wedge V\|^2$ is a
constant K for all linearly independent vectors U, V.
A Riemannian manifold with constant sectional
curvature is said to be of spherical, flat, or hyperbolic
type depending on whether K > 0, K = 0, or K < 0,
respectively. One owes to Cartan the discovery of an
important class of Riemannian manifolds, symmetric
spaces, which contains the spheres, the Euclidean
spaces, the hyperbolic spaces, and compact Lie
groups. A connected Riemannian manifold M
equipped at every point m with an isometry $\sigma_m$
such that $\sigma_m(m) = m$ and the tangent map $T_m\sigma_m$
equals -Id on the tangent space (it therefore reverses
the geodesics through m) is called symmetric. $\mathbb{C}P^n$
equipped with the Fubini-Study metric is a symmetric
space with the isometry given by the reflection with
respect to a line in $\mathbb{C}^{n+1}$. A compact symmetric space
has non-negative sectional curvature K.

Constraints on the curvature can have topological 
consequences. Spheres are the only simply connected 
manifolds with constant positive sectional curvature; 
if a simply connected complete Riemannian mani- 
fold of dimension >1 has non-positive sectional 
curvature along every plane, then it is homeo- 
morphic to the Euclidean space. 

A manifold with Ricci curvature tensor propor- 
tional to the metric tensor is called an Einstein 
manifold. Since Einstein, curvature is a cornerstone 
of general relativity with gravitational force being 
interpreted in terms of curvature. For example, the 
vacuum Einstein equation reads $\mathrm{Ric}_g = (1/2) R_g\, g$ with
$\mathrm{Ric}_g$ the Ricci curvature of a metric g and $R_g$ its scalar
curvature. In addition, Kaluza-Klein supergravity is a
unified theory modeled on a direct product of the
Minkowski four-dimensional space and an Einstein
manifold with positive scalar curvature. 

The Ricci flow $dg(t)/dt = -2\, \mathrm{Ric}_{g(t)}$, which is
related to the Einstein equation in general
relativity, was only fairly recently introduced in the 
mathematical literature. Hopes are strong to get a 
classification of closed 3-manifolds using the Ricci 
flow as an essential ingredient. 


Cohomology 


Differentiation of functions $f \mapsto df$ on a differentiable
manifold M generalizes to exterior differentiation
$\alpha \mapsto d\alpha$ of differential forms. A form $\alpha$ is closed
whenever it is in the kernel of d and it is exact
whenever it lies in the range of d. Since $d^2 = 0$, exact
forms are closed.

Cartan’s structure equations $d\omega = -(1/2)[\omega, \omega] + \Omega$
relate the exterior differential of the connection 1-form
$\omega$ on a principal bundle to its curvature $\Omega$ given by
the exterior covariant derivative $D\omega := d\omega \circ h$, where
$h: T_pP \to H_pP$ is the projection onto the horizontal
space.

On a complex manifold, forms split into sums
of (p,q)-forms, those with p holomorphic and
q antiholomorphic components, and exterior differentiation
splits as $d = \partial + \bar{\partial}$ into holomorphic and
antiholomorphic derivatives, with $\partial^2 = \bar{\partial}^2 = 0$.

Geometric data are often expressed in terms of 
closedness conditions on certain differential forms. 
For example, a “symplectic manifold” is a manifold 
M equipped with a closed nondegenerate differential 
2-form called the “symplectic form.” The theory of 
J-holomorphic curves on a manifold equipped with 
an almost-complex structure J has proved fruitful in 
building invariants on symplectic manifolds. A 
Kähler manifold is a complex manifold equipped
with a Hermitian metric h whose imaginary part
Im h yields a closed (1,1)-form. The complex
projective space $\mathbb{C}P^n$ is Kähler.

The exterior differentiation d gives rise to de Rham
cohomology as Ker d/Im d, and de Rham’s theorem
establishes an isomorphism between de Rham cohomology
and the real singular cohomology of a
manifold. Chern (or characteristic) classes are topological
invariants associated to fiber bundles and play
a crucial role in index theory. Chern-Weil theory
builds representatives of these de Rham cohomology
classes from a connection $\nabla$ of the form $\mathrm{tr}(f(\nabla^2))$,
where f is some analytic function.

When the manifold is Riemannian, the Laplace-Beltrami
operator on functions generalizes to differential
forms in two different ways, namely to the
Bochner Laplacian $\Delta^{\Lambda T^*M}$ on forms (i.e., sections of
$\Lambda T^*M$), where the cotangent bundle $T^*M$ is
equipped with a connection induced by the Levi-Civita
connection, and to the Laplace-Beltrami operator on
forms $(d + d^*)^2 = d^*d + d\,d^*$, where d* is the (formal)
adjoint of the exterior differential d. These are related
via Weitzenböck’s formula which in the particular case
of 1-forms states that the difference of those two
operators is measured by the Ricci curvature.

When the manifold is compact, Hodge’s theorem
asserts that the de Rham cohomology groups are
isomorphic to the space of harmonic (i.e., annihilated
by the Laplace-Beltrami operator) differential
forms. Thus, the dimension of the space of harmonic
k-forms equals the kth Betti number, from which
one can define the Euler characteristic $\chi(M)$ of the
manifold M by taking their alternating sum. Hodge
theory plays an important role in mirror symmetry 
which posits a duality between different manifolds 
on the geometric side and between different field 
theories via their correlation functions on the 
physics side. Calabi-Yau manifolds, which are 
Ricci-flat Kähler manifolds, are studied extensively
in the context of duality. 


Index Theory 


While the Gaussian curvature is the solution to a 
local problem, it has strong influence on the global 
topology of a surface. The Gauss-Bonnet formula
(1850) relates the Euler characteristic of a closed
surface to the Gaussian curvature by

\[ \chi(M) = \frac{1}{2\pi} \int_M K_m \, dA_m, \]

where $dA_m$ is the volume element on M. This is the
first result relating curvature to global properties 
and can be seen as one of the starting points for 
index theory. It generalizes to the Chern-Gauss-Bonnet
theorem (1944) on an even-dimensional
closed manifold and can be interpreted as an
example of the Atiyah-Singer index theorem (1963)

\[ \mathrm{ind}(D_g^W) = \int_M \hat{A}(\Omega_g)\, \mathrm{tr}\!\left(e^{\Omega^W}\right) \]

where g denotes a Riemannian metric on a spin
manifold M, $D_g^W$ a Dirac operator acting on sections
of some twisted bundle $S \otimes W$ with S the spinor
bundle on M and W an auxiliary vector bundle over
M, $\mathrm{ind}(D_g^W)$ the “index” of the Dirac operator,
$\Omega_g$, $\Omega^W$ respectively the curvatures of the Levi-Civita
connection and a connection on W, and $\hat{A}(\Omega_g)$ a
particular Chern form called the $\hat{A}$-genus. Index
theorems are useful to compute anomalies in gauge 
theories arising from functional quantisation of 
classical actions. 

Given an even-dimensional closed spin manifold
(M, g) and a Hermitian vector bundle W over M, the
index of the associated Dirac operator $D^W_g$ yields the
so-called Atiyah map $K^0(M) \to \mathbb{Z}$ defined by
$W \mapsto \operatorname{ind}(D^W_g)$, where $K^0(M)$ is the group of formal
differences of stable homotopy classes of smooth
vector bundles over M. This is the starting point for
the noncommutative geometry approach to index
theory, in which the space of smooth functions on a
manifold, which arises here in disguised form since
$K^0(M) \simeq K_0(C^\infty(M))$ (which consists of formal
differences of smooth homotopy classes of idempotents
in the inductive limit of spaces of matrices
$\mathrm{gl}_\infty(C^\infty(M))$), is generalized to any noncommutative
smooth algebra.


Introduction


The modern theory of electromagnetism is built on 
the foundations of Maxwell’s equations: 


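$$ \operatorname{div} \mathbf{E} = \frac{\rho}{\epsilon_0} \qquad [1] $$
$$ \operatorname{div} \mathbf{B} = 0 \qquad [2] $$
$$ \operatorname{curl} \mathbf{B} - \frac{1}{c^2} \frac{\partial \mathbf{E}}{\partial t} = \mu_0 \mathbf{J} \qquad [3] $$
$$ \operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = 0 \qquad [4] $$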


On the left-hand side are the electric and magnetic 
fields, E and B, which are vector-valued functions 
of position and time. On the right are the sources,
the charge density $\rho$, which is a scalar function of
position and time, and the current density J. The
source terms encode the distribution and velocities
of charges, and the equations, together with
boundary conditions at infinity, determine the fields
that they generate. From these equations, one can
derive the familiar predictions of electrostatics and 
magnetostatics, as well as the dynamical behavior 
of fields and charges, in particular, the generation 
and propagation of electromagnetic waves — light 
waves. 

Maxwell would not have recognized the equations 
in this compact vector notation — still less in the 
tensorial form that they take in special relativity. It 
is notable that although his contribution is univer- 
sally acknowledged in the naming of the equations, 
it is rare to see references to “Maxwell’s theory.” 
This is for a good reason. In his early studies of 
electromagnetism, Maxwell worked with elaborate 
mechanical models, which he saw as analogies 
rather than as literal descriptions of the underlying 
physical reality. In his later work, the mechanical 
models, in particular the mechanical properties of
the “luminiferous ether” through which light waves
propagate, were put forward more literally as
the foundations of his electromagnetic theory. The 
equations survive in the modern theory, but the 
mechanical models with which Maxwell, Faraday, 
and others wrestled live on only in the survival of 
archaic terminology, such as “lines of force” and 
“magnetic flux.” The luminiferous ether evaporated 
with the advent of special relativity. 

Maxwell’s legacy is not his “theory,” but his 
equations: a consistent system of partial differential 
equations that describe the whole range of known 
interactions of electric and magnetic fields with
moving charges. They unify the treatment of
electricity and magnetism by revealing for the first 
time the full duality between the electric and 
magnetic fields. They have been verified over an 
almost unimaginable variety of physical processes, 
from the propagation of light over cosmological 
distances, through the behavior of the magnetic 
fields of stars and the everyday applications in 
electrical engineering and laboratory experiments, 
down — in their quantum version — to the exchange 
of photons between individual electrons. 

The history of Maxwell’s equations is convoluted, 
with many false turns. Maxwell himself wrote down 
an inconsistent form of the equations, with a
different sign for $\rho$ in the first equation, in his
1865 work “A dynamical theory of the electromag- 
netic field.” The consistent form appeared later in 
his Treatise on Electricity and Magnetism (1873); 
see Chalmers (1975). 

In this article, we shall not follow the historical 
route to the equations. Some of the complex story of 
the development hinted at in the remarks above can 
be found in the articles by Chalmers (1975), Siegel 
(1985), and Roche (1998). Neither shall we follow 
the traditional pedagogic route of many textbooks in 
building up to the full dynamical equations through 
the study of basic electrical and magnetic phenom- 
ena. Instead, we shall follow a path to Maxwell’s 
equations that is informed by knowledge of their 
most critical feature, invariance under Lorentz 
transformations. Maxwell, of course, knew nothing 
of this. 

We shall start with a summary of basic facts 
about the behavior of charges in electric and 
magnetic fields, and then establish the full dynami- 
cal framework by considering this behavior as seen 
from moving frames of reference. It is impossible, of 
course, to do this consistently within the framework 
of classical ideas of space and time since Maxwell’s 
equations are inconsistent with Galilean relativity. 
But it is at least possible to understand some of the 
key features of the equations, in particular the need 
for the term involving the time derivative of E, the 
so-called “displacement current,” in the third of 
Maxwell’s equations. 

We shall begin with some remarks concerning the 
role of relativity in classical dynamics. 


Relativity in Newtonian Dynamics 


Newton’s laws hold in all inertial frames. The 
formalism of classical mechanics is invariant under 
Galilean transformations and it is impossible to tell 
by observing the dynamical behavior of particles 
and other bodies whether a frame of reference is at
rest or in uniform motion. In the world of classical
mechanics, therefore:


Principle of Relativity. There is no absolute standard
of rest; only relative motion is observable.


In his “Dialogue concerning the two chief world 
systems,” Galileo illustrated the principle by arguing 
that the uniform motion of a ship on a calm sea does 
not affect the behavior of fish, butterflies, and other 
moving objects, as observed in a cabin below deck. 

Relativity theory takes the principle as funda- 
mental, as a statement about the nature of space and 
time as much as about the properties of the 
Newtonian equations of motion. But if it is to be 
given such universal significance, then it must apply 
to all of physics, and not just to Newtonian 
dynamics. At first this seems unproblematic — it is 
hard to imagine that it holds at such a basic level, 
but not for more complex physical interactions. 
Nonetheless, deep problems emerge when we try to 
extend it to electromagnetism since Galilean invari- 
ance conflicts with Maxwell’s equations. 

All appears straightforward for systems involving 
slow-moving charges and slowly varying electric and 
magnetic fields. These are governed by laws that 
appear to be invariant under transformations 
between uniformly moving frames of reference. 
One can imagine a modern version of Galileo’s
ship also carrying some magnets, batteries, semiconductors,
and other electrical components. Salviati’s
argument for relativity would seem just as
compelling.

The problem arises when we include rapidly 
varying fields — in particular, when we consider the 
propagation of light. As Einstein (1905) put it, 
“Maxwell’s electrodynamics..., when applied to 
moving bodies, leads to asymmetries which do not 
appear to be inherent in the phenomena.” The 
central difficulty is that Maxwell’s equations give 
light, along with other electromagnetic waves, a 
definite velocity: in empty space, it travels with the 
same speed in every direction, independently of the 
motion of the source — a fact that is incompatible 
with Galilean invariance. Light traveling with speed 
c in one frame should have speed c + u in a frame
moving towards the source of the light with speed u. 
Thus, it should be possible for light to travel with 
any speed. Light that travels with speed c in a frame 
in which its source is at rest should have some other 
speed in a moving frame; so Galilean invariance 
would imply dependence of the velocity of light on 
the motion of the source. 

A full resolution of the conflict can only be 
achieved within the special theory of relativity: here, 
remarkably, Maxwell’s equations retain exactly
their classical form, but the transformations between
the space and time coordinates of frames of
reference in relative motion do not. The difference
appears when the velocities involved are not insignificant
when compared with the velocity of light.
So long as one can ignore terms of order $u^2/c^2$,
Maxwell’s equations are compatible with the Galilean
principle of relativity.


Charges, Fields, and the Lorentz-Force Law


The basic objects in the modern form of electro- 
magnetic theory are 


• charged particles; and

• the electric and magnetic fields E and B, which
are vector quantities that depend on position and
time.


The charge e of a particle, which can be positive 
or negative, is an intrinsic quantity analogous 
to gravitational mass. It determines the strength 
of the particle’s interaction with the electric 
and magnetic fields — as its mass determines 
the strength of its interaction with gravitational 
fields. 

The interaction is in two directions. First, electric 
and magnetic fields exert a force on a charged 
particle which depends on the value of the charge, 
the particle’s velocity, and the values of E and B at 
the location of the particle. The force is given by the 
Lorentz-force law 


$$ \mathbf{f} = e(\mathbf{E} + \mathbf{u} \wedge \mathbf{B}) \qquad [5] $$


in which e is the charge and u is the velocity. It is 
analogous to the gravitational force 


f = mg [6] 


on a particle of mass m in a gravitational field g. It is 
through the force law that an observer can, in 
principle, measure the electric and magnetic fields at 
a point, by measuring the force on a standard charge 
moving with known velocity. 
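
As an illustration of the force law [5], the following minimal Python sketch evaluates the force on a charge for given field values. The helper names `cross` and `lorentz_force` and the numerical values are assumptions introduced here for illustration only; they are not taken from the text.

```python
def cross(a, b):
    """Vector (wedge) product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def lorentz_force(e, E, u, B):
    """Force f = e (E + u ^ B) on a charge e moving with velocity u."""
    u_wedge_B = cross(u, B)
    return tuple(e * (E[i] + u_wedge_B[i]) for i in range(3))


# Illustrative values (assumed): a unit charge moving along x through a
# magnetic field along z, with no electric field.
f = lorentz_force(1.0, (0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.0, 3.0))
print(f)  # prints (0.0, -6.0, 0.0)
```

Setting u = 0 in the same function returns eE, which is how a stationary standard charge would be used to measure E alone.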

Second, moving charges generate electric and 
magnetic fields. We shall not yet consider in detail 
the way in which they do this, beyond stating the 
following basic principles. 


EM1. The fields depend linearly on the charges. 


This means that if we superimpose two distributions 
of charge, then the resultant E and B fields are the 
sums of the respective fields that the two distribu- 
tions generate separately. 


EM2. A stationary point charge e generates an electric
field, but no magnetic field. The electric field is
given by

$$ \mathbf{E} = \frac{k e \, \mathbf{r}}{r^3} \qquad [7] $$

where $\mathbf{r}$ is the position vector from the charge,
$r = |\mathbf{r}|$, and k is a positive constant, analogous
to the gravitational constant.


By combining [7] and [5], we obtain an inverse-square
law electrostatic force

$$ \frac{k e e'}{r^2} \qquad [8] $$

between two stationary charges; unlike gravity, it is
repulsive when the charges have the same sign.


EM3. A point charge moving with velocity v generates
a magnetic field

$$ \mathbf{B} = \frac{k' e \, \mathbf{v} \wedge \mathbf{r}}{r^3} \qquad [9] $$

where k′ is a second positive constant.


This is extrapolated from measurements of the 
magnetic field generated by currents flowing in 
electrical circuits. 

The constants k and k’ in EM2 and EM3 
determine the strengths of electric and magnetic 
interactions. They are usually denoted by 


$$ k = \frac{1}{4\pi\epsilon_0}, \qquad k' = \frac{\mu_0}{4\pi} \qquad [10] $$





Charge e is measured in coulombs, |B| in teslas, and
|E| in volts per meter. With other quantities in SI units,

$$ \epsilon_0 = 8.9 \times 10^{-12}, \qquad \mu_0 = 1.3 \times 10^{-6} \qquad [11] $$

The charge of an electron is $-1.6 \times 10^{-19}$ C; the
current through an electric fire is a flow
of 5-10 C s$^{-1}$. The earth’s magnetic field is about
$4 \times 10^{-5}$ T; a bar magnet’s is about 1 T; there is a
field of about 50 T on the second floor of the
Clarendon Laboratory in Oxford; and the magnetic
field on the surface of a neutron star is about $10^{8}$ T.

Although we are more aware of gravity in everyday
life, it is very much weaker than the electrostatic
force: the electrostatic repulsion between two
protons is a factor of $1.2 \times 10^{36}$ greater than their
gravitational attraction (at any separation, since both
forces obey the inverse-square law).
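
The ratio of electrostatic to gravitational force between two protons can be checked with a short calculation. The sketch below is only illustrative: the rounded values of the gravitational constant, the proton mass, and the proton charge are assumptions supplied here, not figures from the text, and the function name `force_ratio` is hypothetical.

```python
import math

# Rounded SI values, assumed for this sketch.
EPSILON_0 = 8.854e-12       # permittivity constant
G = 6.674e-11               # gravitational constant
PROTON_CHARGE = 1.602e-19   # coulombs
PROTON_MASS = 1.673e-27     # kilograms


def force_ratio():
    """Electrostatic repulsion divided by gravitational attraction.

    Both forces fall off as 1/r^2, so the separation cancels.
    """
    k = 1.0 / (4.0 * math.pi * EPSILON_0)
    electrostatic = k * PROTON_CHARGE ** 2
    gravitational = G * PROTON_MASS ** 2
    return electrostatic / gravitational


print(f"{force_ratio():.1e}")  # of order 1e36
```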

Our aim is to pass from EM1-EM3 to Maxwell’s
equations, by replacing [7] and [9] by partial
differential equations that relate the field strengths
to the charge and current densities $\rho$ and J of a
continuous distribution of charge. The densities are
defined as the limits

$$ \rho = \lim_{V \to 0} \left( \frac{\sum e}{V} \right), \qquad \mathbf{J} = \lim_{V \to 0} \left( \frac{\sum e \mathbf{v}}{V} \right) \qquad [12] $$

where V is a small volume containing the point, e is
a charge within the volume, and v is its velocity; the
sums are over the charges in V and the limits are
taken as the volume is shrunk (although we shall not
worry too much about the precise details of the
limiting process).


Stationary Distributions of Charge 


We begin the task of converting the basic principles 
into partial differential equations by looking at the 
electric field of a stationary distribution of charge, 
where the passage to the continuous limit is made by 
using the Gauss theorem to restate the inverse- 
square law. 

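The Gauss theorem relates the integral of the
electric field over a closed surface to the total charge
contained within it. For a point charge, the electric
field is given by EM2:

$$ \mathbf{E} = \frac{e \, \mathbf{r}}{4\pi\epsilon_0 r^3} $$

Since $\operatorname{div} \mathbf{r} = 3$ and $\operatorname{grad} r = \mathbf{r}/r$, we have

$$ \operatorname{div} \mathbf{E} = \operatorname{div}\left( \frac{e \, \mathbf{r}}{4\pi\epsilon_0 r^3} \right) = \frac{e}{4\pi\epsilon_0} \left( \frac{3}{r^3} - \frac{3 \, \mathbf{r} \cdot \mathbf{r}}{r^5} \right) = 0 $$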

everywhere except at r=0. Therefore, by the 
divergence theorem, 


$$ \int_{\partial V} \mathbf{E} \cdot d\mathbf{S} = 0 \qquad [13] $$

for any closed surface $\partial V$ bounding a volume V that
does not contain the charge.

What if the volume does contain the charge?
Consider the region bounded by the sphere $S_R$ of
radius R centered on the charge; $S_R$ has outward
unit normal $\mathbf{r}/r$. Therefore,

$$ \int_{S_R} \mathbf{E} \cdot d\mathbf{S} = \frac{e}{4\pi R^2 \epsilon_0} \int_{S_R} dS = \frac{e}{\epsilon_0} $$

In particular, the value of the surface integral on the
left-hand side does not depend on R.

Now consider an arbitrary finite volume bounded by
a closed surface S. If the charge is not inside
the volume, then the integral of E over S vanishes
by [13]. If it is, then we can apply [13] to the
volume V between S and a small sphere $S_R$ to
deduce that

$$ \int_S \mathbf{E} \cdot d\mathbf{S} - \int_{S_R} \mathbf{E} \cdot d\mathbf{S} = \int_{\partial V} \mathbf{E} \cdot d\mathbf{S} = 0 $$

and that the integrals of E over S and $S_R$ are the
same. Therefore,

$$ \int_S \mathbf{E} \cdot d\mathbf{S} = \begin{cases} e/\epsilon_0 & \text{if the charge is in the volume bounded by } S \\ 0 & \text{otherwise} \end{cases} $$


When we sum over a distribution of charges, 
the integral on the left picks out the total charge 
within S. Therefore, we have the Gauss theorem. 


The Gauss theorem. For any closed surface $\partial V$
bounding a volume V,

$$ \int_{\partial V} \mathbf{E} \cdot d\mathbf{S} = Q/\epsilon_0 $$

where E is the total electric field and Q is the total
charge within V.


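Now we can pass to the continuous limit. Suppose
that E is generated by a distribution of charges with
density $\rho$ (charge per unit volume). Then by the
Gauss theorem,

$$ \int_{\partial V} \mathbf{E} \cdot d\mathbf{S} = \frac{1}{\epsilon_0} \int_V \rho \, dV $$

for any volume V. But then, by the divergence
theorem,

$$ \int_V \left( \operatorname{div} \mathbf{E} - \rho/\epsilon_0 \right) dV = 0 $$

Since this holds for any volume V, it follows that

$$ \operatorname{div} \mathbf{E} = \rho/\epsilon_0 \qquad [14] $$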
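
The surface-integral statement behind [14] can be checked numerically. The following sketch, under assumptions of our own choosing, estimates the flux of $\mathbf{r}/r^3$ through the surface of a cube by the midpoint rule; multiplying by $e/4\pi\epsilon_0$ gives the flux of E. The function name `flux_through_cube`, the cube, and the charge positions are illustrative and not part of the text.

```python
import math


def flux_through_cube(charge_pos, half_side=1.0, n=80):
    """Midpoint-rule estimate of the integral of r/r^3 over the surface of
    the cube [-half_side, half_side]^3, with r measured from charge_pos."""
    h = 2.0 * half_side / n
    total = 0.0
    for axis in range(3):            # faces perpendicular to x, y, z
        for sign in (-1.0, 1.0):     # outward normal is sign * unit vector
            for i in range(n):
                for j in range(n):
                    a = -half_side + (i + 0.5) * h
                    b = -half_side + (j + 0.5) * h
                    point = [a, b]
                    point.insert(axis, sign * half_side)
                    r = [point[k] - charge_pos[k] for k in range(3)]
                    r_cubed = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2) ** 3
                    total += sign * r[axis] / r_cubed * h * h
    return total


inside = flux_through_cube((0.3, -0.2, 0.1))   # charge inside the cube
outside = flux_through_cube((3.0, 0.0, 0.0))   # charge outside the cube
print(inside / (4.0 * math.pi), outside)
```

With the charge inside, the estimate should be close to $4\pi$, corresponding to a flux $e/\epsilon_0$ of E whatever the position of the charge; with the charge outside, it should be close to zero.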


By an argument in a similar spirit, we can also 
show that the electric field of a stationary distribu- 
tion of charge is conservative in the sense that the 
total work done by the field when a charge is moved 
around a closed loop vanishes; that is, 


$$ \oint \mathbf{E} \cdot d\mathbf{s} = 0 $$


for any closed path. This is equivalent to

$$ \operatorname{curl} \mathbf{E} = 0 \qquad [15] $$

since, by Stokes’ theorem,

$$ \oint \mathbf{E} \cdot d\mathbf{s} = \int_S \operatorname{curl} \mathbf{E} \cdot d\mathbf{S} $$

where S is any surface spanning the path. This vanishes
for every path and for every S if and only if [15] holds.

The field of a single stationary charge is conservative
since

$$ \mathbf{E} = -\operatorname{grad} \phi, \qquad \phi = \frac{e}{4\pi\epsilon_0 r} $$

and therefore curl E = 0 since the curl of a gradient
vanishes identically. For a continuous distribution,
$\mathbf{E} = -\operatorname{grad} \phi$, where

$$ \phi(\mathbf{r}) = \frac{1}{4\pi\epsilon_0} \int \frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, dV' \qquad [16] $$

In the integral, $\mathbf{r}$ (the position of the point at which
$\phi$ is evaluated) is fixed, and the integration is over
the positions $\mathbf{r}'$ of the individual charges. In spite of
the singularity at $\mathbf{r} = \mathbf{r}'$, the integral is well defined.
So, [15] also holds for a continuous distribution of
stationary charge.








The Divergence of the Magnetic Field 


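We can apply the same argument that established
the Gauss theorem to the magnetic field of a slow-moving
charge. Here,

$$ \mathbf{B} = \frac{\mu_0 e \, \mathbf{v} \wedge \mathbf{r}}{4\pi r^3} $$

where $\mathbf{r}$ is the vector from the charge to the point at
which the field is measured. Since $\mathbf{r}/r^3 = -\operatorname{grad}(1/r)$,
we have

$$ \operatorname{div}\left( \mathbf{v} \wedge \frac{\mathbf{r}}{r^3} \right) = \mathbf{v} \cdot \operatorname{curl}\left( \operatorname{grad} \frac{1}{r} \right) = 0 $$

Therefore, div B = 0 except at r = 0, as in the case of
the electric field. However, in the magnetic case, the
integral of the field over a surface surrounding the
charge also vanishes, since if $S_R$ is a sphere of radius
R centered on the charge, then

$$ \int_{S_R} \mathbf{B} \cdot d\mathbf{S} = \frac{\mu_0 e}{4\pi} \int_{S_R} \frac{(\mathbf{v} \wedge \mathbf{r}) \cdot \mathbf{r}}{r^4} \, dS = 0 $$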


By the divergence theorem, the same is true for any 
surface surrounding the charge. We deduce that if 
magnetic fields are generated only by moving 
charges, then 





$$ \int_{\partial V} \mathbf{B} \cdot d\mathbf{S} = 0 $$

for any volume V, and hence that

$$ \operatorname{div} \mathbf{B} = 0 \qquad [17] $$


Of course, if there were free “magnetic poles” 
generating magnetic fields in the same way that 
charges generate electric fields, then this would not 
hold; there would be a “magnetic pole density” on
the right-hand side, by analogy with the charge
density in [14].
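
The vanishing of the divergence away from the charge can also be seen numerically. The sketch below, which is illustrative only, estimates the divergence of $\mathbf{v} \wedge \mathbf{r}/r^3$ by central differences at an arbitrarily chosen point; the constant factor $\mu_0 e/4\pi$ is omitted, and the function names and sample values are assumptions made here.

```python
import math


def cross(a, b):
    """Vector (wedge) product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def field(v, r):
    """v ^ r / r^3: the field of a slow-moving charge up to a constant factor."""
    r_cubed = math.sqrt(sum(x * x for x in r)) ** 3
    return tuple(c / r_cubed for c in cross(v, r))


def divergence(v, r, h=1e-4):
    """Central-difference estimate of div(v ^ r / r^3) at a point r != 0."""
    total = 0.0
    for i in range(3):
        plus = list(r)
        minus = list(r)
        plus[i] += h
        minus[i] -= h
        total += (field(v, plus)[i] - field(v, minus)[i]) / (2.0 * h)
    return total


print(divergence((1.0, 2.0, 3.0), (0.5, -1.0, 2.0)))  # close to zero
```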


Inconsistency with Galilean Relativity 


Our central concern is the compatibility of the laws 
of electromagnetism with the principle of relativity. 
As Einstein observed, simple electromagnetic inter- 
actions do indeed depend only on relative motion; 
the current induced in a conductor moving through 
the field of a magnet is the same as that generated in 
a stationary conductor when a magnet is moved past 
it with the same relative velocity (Einstein 1905). 
Unfortunately, this symmetry is not reflected in our 
basic principles. We very quickly come up against 
contradictions if we assume that they hold in every 
inertial frame of reference. 

One contradiction emerges as follows. An observer O can measure
the values of B and E at a point by measuring the force
on a particle of standard charge, which is related to the
velocity v of the charge by the Lorentz-force law,

$$ \mathbf{f} = e(\mathbf{E} + \mathbf{v} \wedge \mathbf{B}) $$

A second observer O′ moving relative to the first with
velocity v will see the same force, but now acting on a
particle at rest. He will therefore measure the electric
field to be E′ = f/e. We conclude that an observer
moving with velocity v through a magnetic field B and
an electric field E should see an electric field

$$ \mathbf{E}' = \mathbf{E} + \mathbf{v} \wedge \mathbf{B} \qquad [18] $$

By interchanging the roles of the two observers, we
should also have

$$ \mathbf{E} = \mathbf{E}' - \mathbf{v} \wedge \mathbf{B}' \qquad [19] $$

where B′ is the magnetic field measured by the
second observer. If both are to hold, then B − B′
must be a scalar multiple of v.

But this is incompatible with EM3; if the fields are
those of a point charge at rest relative to the first
observer, then E is given by [7], and

$$ \mathbf{B} = 0 $$

On the other hand, the second observer sees the field
of a point charge moving with velocity −v. Therefore,

$$ \mathbf{B}' = -\frac{\mu_0 e \, \mathbf{v} \wedge \mathbf{r}}{4\pi r^3} $$

So B − B′ is orthogonal to v, not parallel to it.

This conspicuous paradox is resolved, in part, by
the realization that EM3 is not exact; it holds only
when the velocities are small enough for the
magnetic force between two particles to be negligible
in comparison with the electrostatic force. If v
is a typical velocity, then the condition is that $v^2 \mu_0$
should be much less than $1/\epsilon_0$. That is, the velocities
involved should be much less than

$$ c = \frac{1}{\sqrt{\epsilon_0 \mu_0}} = 3 \times 10^{8} \ \mathrm{m\,s^{-1}} $$

This, of course, is the velocity of light.
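
The numerical value of this velocity follows directly from the constants in [11]. The short sketch below uses slightly more precise rounded values of $\epsilon_0$ and $\mu_0$, which are assumptions supplied here, and also evaluates the small parameter $v^2 \mu_0 \epsilon_0 = v^2/c^2$ for an illustrative velocity; the function names are hypothetical.

```python
import math

EPSILON_0 = 8.854e-12      # rounded SI value, assumed for this sketch
MU_0 = 4.0e-7 * math.pi    # rounded SI value, assumed for this sketch


def speed_of_light():
    """c = 1 / sqrt(epsilon_0 mu_0), in metres per second."""
    return 1.0 / math.sqrt(EPSILON_0 * MU_0)


def magnetic_to_electric_ratio(v):
    """Order of magnitude v^2 mu_0 epsilon_0 = v^2 / c^2 of the magnetic
    force relative to the electrostatic force at a typical velocity v."""
    return v * v * MU_0 * EPSILON_0


print(f"{speed_of_light():.3e}")
print(f"{magnetic_to_electric_ratio(3.0e5):.1e}")
```

For a velocity of 300 km/s, chosen purely as an example, the ratio is of order one part in a million, which indicates how good the slow-motion approximation is for everyday velocities.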


The Limits of Galilean Invariance 


Our basic principles EM1-EM3 must now be seen to
be approximations: they describe the interactions of
particles and fields when the particles are moving
relative to each other at speeds much less than that of
light. To emphasize that we cannot expect, in
particular, EM3 to hold for particles moving at
speeds comparable with c, we must replace it by

EM3′. A charge moving with velocity v, where $v \ll c$,
generates a magnetic field

$$ \mathbf{B} = \frac{\mu_0 e \, \mathbf{v} \wedge \mathbf{r}}{4\pi r^3} + O(v^2/c^2) \qquad [20] $$

The magnetic field of a system of charges in
general motion satisfies

$$ \operatorname{div} \mathbf{B} = 0 \qquad [21] $$

In the second part, we have retained [21] as a
differential form of the statement that there are no
free magnetic poles; the magnetic field is generated
only by the motion of the charges. With this change,
the theory is consistent with the principle of
relativity, provided that we ignore terms of order
$v^2/c^2$. The substitution of EM3′ for EM3 resolves the
conspicuous paradox; the symmetry noted by Einstein
between the current generated by the motion of
the conductor in a magnetic field and by the motion
of a magnet past a conductor is explained, provided
that the velocities are much less than that of light.

The central problem remains however; the equa- 
tions of electromagnetism are not invariant under 
a Galilean transformation with velocity comparable 
to c. The paradox is still there, but it is more subtle 
than it appeared to be at first. There are three 
possible ways out: (1) the noninvariance is real and
has observable effects (necessarily of order $v^2/c^2$ or
smaller); (2) Maxwell’s theory is wrong; or (3) the
Galilean transformation is wrong. Disconcertingly, 
it is the last path that physics has taken. But that is 
to jump ahead in the story. Our task is to complete 
the derivation of Maxwell’s equations. 


Faraday’s Law of Induction 


The magnetic field of a slow-moving charge will
always be small in relation to its electric field (even
when we replace B by cB to put it into the same
units as E). The magnetic fields generated by
currents in electrical circuits are not, however, 
dominated by large electric fields. This is because 
the currents are created by the flow, at slow 
velocity, of electrons, while overall the matter in 
the wire is roughly electrically neutral, with the 
electric fields of the positively charged nuclei and 
negatively charged electrons canceling. 

This is the physical context to keep in mind in 
the following deduction of Faraday’s law of 
induction from Galilean invariance for velocities 
much less than c. The law relates the electromotive 
force or “voltage” around an electrical circuit 
to the rate of change of the magnetic field B over 
a surface spanning the circuit. In its differential 
form, the law becomes one of Maxwell’s 
equations. 

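Suppose first that the fields are generated by
charges all moving relative to a given inertial
frame of reference R with the same velocity v.
Then in a second frame R′ moving relative to R
with velocity v, there is a stationary distribution of
charge. If the velocity is much less than that of
light, then the electric field E′ measured in R′ is
related to the electric and magnetic fields E and B
measured in R by

$$ \mathbf{E}' = \mathbf{E} + \mathbf{v} \wedge \mathbf{B} $$

Since the field measured in R′ is that of a stationary
distribution of charge, we have

$$ \operatorname{curl} \mathbf{E}' = 0 $$

In R, the charges are all moving with velocity v, so
their configuration looks exactly the same from the
point $\mathbf{r}$ at time t as it does from the point $\mathbf{r} + \mathbf{v}\tau$ at
time $t + \tau$. Therefore,

$$ \mathbf{B}(\mathbf{r} + \mathbf{v}\tau, t + \tau) = \mathbf{B}(\mathbf{r}, t), \qquad \mathbf{E}(\mathbf{r} + \mathbf{v}\tau, t + \tau) = \mathbf{E}(\mathbf{r}, t) $$

and hence, by taking derivatives with respect to $\tau$
at $\tau = 0$,

$$ \mathbf{v} \cdot \operatorname{grad} \mathbf{B} + \frac{\partial \mathbf{B}}{\partial t} = 0, \qquad \mathbf{v} \cdot \operatorname{grad} \mathbf{E} + \frac{\partial \mathbf{E}}{\partial t} = 0 \qquad [22] $$

So we must have

$$ \begin{aligned} 0 &= \operatorname{curl} \mathbf{E}' \\ &= \operatorname{curl} \mathbf{E} + \operatorname{curl}(\mathbf{v} \wedge \mathbf{B}) \\ &= \operatorname{curl} \mathbf{E} + \mathbf{v} \operatorname{div} \mathbf{B} - \mathbf{v} \cdot \operatorname{grad} \mathbf{B} \\ &= \operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} \qquad [23] \end{aligned} $$

since div B = 0. It follows that

$$ \operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = 0 \qquad [24] $$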


Equation [24] is linear in B and E; so by adding 
the magnetic and electric fields of different streams 
of charges moving relative to R with different 
velocities, we deduce that it holds generally for the 
electric and magnetic fields generated by moving 
charges. 

Equation [24] encodes Faraday’s law of electromagnetic
induction, which describes how changing
magnetic fields can generate currents. In the static case

$$ \frac{\partial \mathbf{B}}{\partial t} = 0 $$

and the equation reduces to curl E = 0, the
condition that the electrostatic field should be
conservative; that is, it should do no net work
when a charge is moved around a closed loop.

More generally, consider a wire loop in the shape of
a closed curve $\gamma$. Let S be a fixed surface spanning $\gamma$.
Then we can deduce from eqn [24] that

$$ \oint_\gamma \mathbf{E} \cdot d\mathbf{s} = \int_S \operatorname{curl} \mathbf{E} \cdot d\mathbf{S} = -\int_S \frac{\partial \mathbf{B}}{\partial t} \cdot d\mathbf{S} = -\frac{d}{dt} \int_S \mathbf{B} \cdot d\mathbf{S} \qquad [25] $$

If the magnetic field is varying, so that the integral of B
over S is not constant, then the integral of E around the
loop will not be zero. There will be a nonzero electric
field along the wire, which will exert a force on the
electrons in the wire and cause a current to flow.

The quantity

$$ \oint_\gamma \mathbf{E} \cdot d\mathbf{s} $$

which is measured in volts, is the work done by the
electric field when a unit charge makes one circuit
of the wire. It is called the electromotive force
around the circuit. The integral $\int_S \mathbf{B} \cdot d\mathbf{S}$ is the magnetic flux
linking the circuit. The relationship [25] between
electromotive force and rate of change of magnetic
flux is Faraday’s law.
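
To make the relationship concrete, the following sketch evaluates the electromotive force as minus the rate of change of flux, under the assumption of a uniform field $B(t) = b_0 \cos(\omega t)$ normal to a flat circular loop. This configuration, the parameter values, and the function names `flux` and `emf` are illustrative choices made here and do not appear in the text.

```python
import math


def flux(t, b0=0.1, omega=100.0, radius=0.05):
    """Flux of a uniform field b0 cos(omega t), normal to a flat circular
    loop of the given radius (SI units; all values illustrative)."""
    return math.pi * radius ** 2 * b0 * math.cos(omega * t)


def emf(t, dt=1e-6):
    """Electromotive force: minus the rate of change of the flux,
    estimated by a central difference in time."""
    return -(flux(t + dt) - flux(t - dt)) / (2.0 * dt)


# Compare with the exact derivative, pi a^2 b0 omega sin(omega t).
t = 0.005
exact = math.pi * 0.05 ** 2 * 0.1 * 100.0 * math.sin(100.0 * t)
print(emf(t), exact)
```

The two printed values should agree closely, and the electromotive force vanishes whenever the field is momentarily stationary, in line with the static case discussed above.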


The Field of Charges in Uniform Motion 


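We can extract another of Maxwell’s equations
from this argument. By EM3′, a single charge e with
velocity v generates an electric field E and a
magnetic field

$$ \mathbf{B} = \frac{\mu_0 e \, \mathbf{v} \wedge \mathbf{r}}{4\pi r^3} + O(v^2/c^2) $$

where $\mathbf{r}$ is the vector from the charge to the point at
which the field is measured. In the frame of reference
R′ in which the charge is at rest, its electric field is

$$ \mathbf{E}' = \frac{e \, \mathbf{r}}{4\pi\epsilon_0 r^3} $$

In the frame in which it is moving with velocity
v, $\mathbf{E} = \mathbf{E}' + O(v/c)$. Therefore,

$$ c\mathbf{B} = \frac{\mathbf{v} \wedge \mathbf{E}'}{c} + O(v^2/c^2) = \frac{\mathbf{v} \wedge \mathbf{E}}{c} + O(v^2/c^2) $$

By taking the curl of both sides, and dropping terms
of order $v^2/c^2$,

$$ \operatorname{curl}(c\mathbf{B}) = \operatorname{curl}\left( \frac{\mathbf{v} \wedge \mathbf{E}}{c} \right) = \frac{1}{c} \left( \mathbf{v} \operatorname{div} \mathbf{E} - \mathbf{v} \cdot \operatorname{grad} \mathbf{E} \right) $$

But

$$ \operatorname{div} \mathbf{E} = \rho/\epsilon_0, \qquad \mathbf{v} \cdot \operatorname{grad} \mathbf{E} = -\frac{\partial \mathbf{E}}{\partial t} $$

by [22]. Therefore,

$$ \operatorname{curl}(c\mathbf{B}) - \frac{1}{c} \frac{\partial \mathbf{E}}{\partial t} = \frac{1}{c\epsilon_0} \mathbf{J} = c\mu_0 \mathbf{J} $$

where $\mathbf{J} = \rho\mathbf{v}$. By summing over the separate particle
velocities, we conclude that

$$ \operatorname{curl} \mathbf{B} - \frac{1}{c^2} \frac{\partial \mathbf{E}}{\partial t} = \mu_0 \mathbf{J} $$

holds for an arbitrary distribution of charges, provided
that their velocities are much less than that of light.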


Maxwell’s Equations 


The basic principles, together with the assumption of 
Galilean invariance for velocities much less than that 
of light, have allowed us to deduce that the electric and 
magnetic fields generated by a continuous distribution 
of moving charges in otherwise empty space satisfy 


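$$ \operatorname{div} \mathbf{E} = \frac{\rho}{\epsilon_0} \qquad [26] $$
$$ \operatorname{div} \mathbf{B} = 0 \qquad [27] $$
$$ \operatorname{curl} \mathbf{B} - \frac{1}{c^2} \frac{\partial \mathbf{E}}{\partial t} = \mu_0 \mathbf{J} \qquad [28] $$
$$ \operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = 0 \qquad [29] $$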


where $\rho$ is the charge density, J is the current
density, and $c^2 = 1/\epsilon_0\mu_0$. These are Maxwell’s
equations, the basis of modern electrodynamics.
Together with the Lorentz-force law, they describe 
the dynamics of charges and electromagnetic fields. 

We have arrived at them by considering how basic 
electromagnetic processes appear in moving frames 
of reference — an unsatisfactory route because we 
have seen on the way that the principles on which 
we based the derivation are incompatible with 
Galilean invariance for velocities comparable with 
that of light. Maxwell derived them by analyzing an 
elaborate mechanical model of electric and magnetic 
fields — as displacements in the luminiferous ether. 
That is also unsatisfactory because the model has 
long been abandoned. The reason that they are 
accepted today as the basis of theoretical and 
practical applications of electromagnetism has little 
to do with either argument. It is first that they are 
self-consistent, and second that they describe the 
behavior of real fields with unreasonable accuracy. 


The Continuity Equation 


It is not immediately obvious that the equations are
self-consistent. Given $\rho$ and J as functions of the
coordinates and time, Maxwell’s equations are two
scalar and two vector equations in the unknown
components of E and B. That is, a total of eight
equations for six unknowns: more equations than
unknowns. Therefore, it is possible that they are in
fact inconsistent.

If we take the divergence of eqn [29], then we
obtain

$$ \frac{\partial}{\partial t} (\operatorname{div} \mathbf{B}) = 0 $$

which is consistent with eqn [27]; so no problem
arises here. However, by taking the divergence of
eqn [28] and substituting from eqn [26], we get

$$ \begin{aligned} 0 &= \operatorname{div} \operatorname{curl} \mathbf{B} \\ &= \frac{1}{c^2} \frac{\partial}{\partial t} (\operatorname{div} \mathbf{E}) + \mu_0 \operatorname{div} \mathbf{J} \\ &= \mu_0 \left( \frac{\partial \rho}{\partial t} + \operatorname{div} \mathbf{J} \right) \end{aligned} $$

This gives a contradiction unless

$$ \frac{\partial \rho}{\partial t} + \operatorname{div} \mathbf{J} = 0 \qquad [30] $$

So the choice of $\rho$ and J is not unconstrained; they
must be related by the continuity equation [30]. This
holds for physically reasonable distributions of
charge; it is a differential form of the statement
that charges are neither created nor destroyed.
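
As a simple check of the continuity equation, consider a one-dimensional example in which a fixed charge profile drifts rigidly with speed V, so that J = ρV. The sketch below is an assumption-laden illustration: the Gaussian profile, the value of V, and the function names are all chosen here for convenience. It evaluates the left-hand side of [30] by central differences.

```python
import math

V = 2.0  # drift speed of the charge distribution (illustrative value)


def rho(x, t):
    """Charge density of a Gaussian profile drifting rigidly along x."""
    return math.exp(-(x - V * t) ** 2)


def current(x, t):
    """Current density J = rho v for charges all moving with speed V."""
    return rho(x, t) * V


def continuity_residual(x, t, h=1e-5):
    """Central-difference estimate of d(rho)/dt + dJ/dx at (x, t)."""
    drho_dt = (rho(x, t + h) - rho(x, t - h)) / (2.0 * h)
    dj_dx = (current(x + h, t) - current(x - h, t)) / (2.0 * h)
    return drho_dt + dj_dx


print(continuity_residual(0.3, 0.1))  # close to zero
```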


Conservation of Charge 


To see the connection between the continuity
equation and charge conservation, let us look at
the total charge within a fixed volume V bounded by a
surface S. If charge is conserved, then any increase
or decrease in a short period of time must be
exactly balanced by an inflow or outflow of charge
across S.

Consider a small element dS of S with outward
unit normal $\mathbf{n}$ and consider all the particles that have a
particular charge e and a particular velocity v at
time t. Suppose that there are $\sigma$ of these per unit
volume ($\sigma$ is a function of position). Those that cross
the surface element between t and $t + \delta t$ are those
that at time t lie in the region of volume

$$ |\mathbf{v} \cdot \mathbf{n} \, dS \, \delta t| $$

shown in Figure 1. They contribute $e \sigma \mathbf{v} \cdot d\mathbf{S} \, \delta t$ to the
outflow of charge through the surface element. But
the value of J at the surface element is the sum of
$e \sigma \mathbf{v}$ over all possible values of v and e. By summing
over v, e, and the elements of the surface, therefore,
and by passing to the limit of a continuous
distribution, the total rate of outflow is

$$ \int_S \mathbf{J} \cdot d\mathbf{S} $$

Charge conservation implies that the rate of
outflow should be equal to the rate of decrease in
the total charge within V. That is,

$$ \frac{d}{dt} \int_V \rho \, dV + \int_S \mathbf{J} \cdot d\mathbf{S} = 0 \qquad [31] $$

By differentiating the first term under the integral
sign and by applying the divergence theorem to the
second integral,

$$ \int_V \left( \frac{\partial \rho}{\partial t} + \operatorname{div} \mathbf{J} \right) dV = 0 \qquad [32] $$

If this is to hold for any choice of V, then $\rho$ and J
must satisfy the continuity equation. Conversely, the
continuity equation implies charge conservation.


Figure 1 The outflow through a surface element.


The Displacement Current


The third of Maxwell’s equations can be written as

$$ \operatorname{curl} \mathbf{B} = \mu_0 \left( \mathbf{J} + \epsilon_0 \frac{\partial \mathbf{E}}{\partial t} \right) \qquad [33] $$

in which form it can be read as an equation
for an unknown magnetic field B in terms of
a known current distribution J and electric
field E. When E and J are independent of t, it
reduces to

$$ \operatorname{curl} \mathbf{B} = \mu_0 \mathbf{J} $$

which determines the magnetic field of a steady
current, in a way that was already familiar
to Maxwell’s contemporaries. But his second
term on the right-hand side of [33] was new; it
adds to J the so-called vacuum displacement
current

$$ \epsilon_0 \frac{\partial \mathbf{E}}{\partial t} $$

The name comes from an analogy with the 
behavior of charges in an insulating material. 
Here no steady current can flow, but the distribu- 
tion of charges within the material is distorted 
by an external electric field. When the field 
changes, the distortion also changes, and the result 
appears as a current — the displacement current — 
which flows during the period of change. Max- 
well’s central insight was that the same term 
should be present even in empty space. The 
consequence was profound; it allowed him to 
explain the propagation of light as an electromag- 
netic phenomenon. 


The Source-Free Equations 


In a region of empty space, away from the 
charges generating the electric and magnetic fields, 


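we have ρ = 0 = J, and Maxwell’s equations 
reduce to 

$$\operatorname{div} E = 0 \qquad [34]$$
$$\operatorname{div} B = 0 \qquad [35]$$
$$\operatorname{curl} B - \frac{1}{c^2}\frac{\partial E}{\partial t} = 0 \qquad [36]$$
$$\operatorname{curl} E + \frac{\partial B}{\partial t} = 0 \qquad [37]$$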


where $c = 1/\sqrt{\epsilon_0 \mu_0}$. By taking the curl of eqn [36] 
and by substituting from eqns [35] and [37], we 
obtain 


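$$\begin{aligned}
0 &= \operatorname{grad}(\operatorname{div} B) - \nabla^2 B - \frac{1}{c^2}\operatorname{curl}\left(\frac{\partial E}{\partial t}\right)\\
  &= -\nabla^2 B - \frac{1}{c^2}\frac{\partial}{\partial t}(\operatorname{curl} E)\\
  &= -\nabla^2 B + \frac{1}{c^2}\frac{\partial^2 B}{\partial t^2}
\end{aligned}$$

Therefore, the three components of B in empty space 
satisfy the (scalar) wave equation 

$$\Box u = 0 \qquad [38]$$

Here $\Box$ is the d’Alembertian operator, defined by 

$$\Box = \frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2} - \frac{\partial^2}{\partial z^2}$$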


By taking the curl of eqn [37], we also obtain 
$\Box E = 0$. 


Monochromatic Plane Waves 


The fact that E and B are vector-valued solutions of 
the wave equation in empty space suggests that we 
look for “plane wave” solutions of Maxwell’s 
equations in which 

$$E = \alpha \cos\Omega + \beta \sin\Omega \qquad [39]$$

where α, β are constant vectors and 

$$\Omega = \frac{\omega}{c}(ct - r \cdot e), \qquad e \cdot e = 1 \qquad [40]$$

with ω > 0, α, β, and e constant; ω is the frequency 
and e is a unit vector that gives the direction of 
propagation (adding T to t and cTe to r leaves Ω 
unchanged). This satisfies the wave equation, but for 
a general choice of the constants, it will not be 
possible to find B such that eqns [34]-[37] also hold. 
By taking the divergence of eqn [39], we obtain 


$$\operatorname{div} E = \frac{\omega}{c}(e \cdot \alpha \sin\Omega - e \cdot \beta \cos\Omega) \qquad [41]$$


For eqn [34] to hold, therefore, we must choose α 
and β orthogonal to e. For eqn [37] to hold, we 
must find B such that 

$$\operatorname{curl} E = \frac{\omega}{c}(e \wedge \alpha \sin\Omega - e \wedge \beta \cos\Omega) = -\frac{\partial B}{\partial t} \qquad [42]$$


A possible choice is 

$$B = \frac{e \wedge E}{c} = \frac{1}{c}(e \wedge \alpha \cos\Omega + e \wedge \beta \sin\Omega) \qquad [43]$$





and it is not hard to see that E and B then satisfy 
[35] and [36] as well. 


The solutions obtained in this way are called 
“monochromatic electromagnetic plane waves.” 

Note that such waves are transverse in the sense 
that E and B are orthogonal to the direction of 
propagation. The definition of E can be written more 
concisely in the form 

$$E = \operatorname{Re}\left[(\alpha + i\beta)\, e^{-i\Omega}\right] \qquad [44]$$

It is an exercise in Fourier analysis to show that every 
solution in empty space is a combination of 
monochromatic plane waves. A plane wave has 
“plane” or “linear” polarization if α and β are 
proportional. It has “circular” polarization if 
α·α = β·β and α·β = 0. 

At the heart of Maxwell’s theory was the idea that 
a light wave with definite frequency or color is 
represented by a monochromatic plane solution of 
his equations. 
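
The claim that [39] and [43] solve the source-free equations can be spot-checked numerically. The following is a minimal sketch, assuming units in which c = 1 and hypothetical values for ω, e, α and β (with α and β orthogonal to e, as [41] requires). Derivatives are approximated by central differences, so the residuals are small rather than exactly zero.

```python
import math

C = 1.0  # assumption: units chosen so that c = 1

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Illustrative constants (hypothetical values): e is a unit vector and
# alpha, beta are orthogonal to it.
OMEGA = 2.0
E_DIR = (0.0, 0.0, 1.0)
ALPHA = (1.0, 0.0, 0.0)
BETA = (0.0, 0.5, 0.0)

def electric(t, r):
    """E = alpha cos(Omega) + beta sin(Omega), eqn [39]."""
    phase = OMEGA / C * (C * t - dot(r, E_DIR))
    return tuple(a * math.cos(phase) + b * math.sin(phase)
                 for a, b in zip(ALPHA, BETA))

def magnetic(t, r):
    """B = (e ^ E) / c, eqn [43]."""
    return tuple(x / C for x in cross(E_DIR, electric(t, r)))

def partial(field, t, r, axis, h=1e-5):
    """Central difference of a vector field along axis 0 (t) or 1..3 (x, y, z)."""
    def shifted(s):
        if axis == 0:
            return field(t + s, r)
        moved = list(r)
        moved[axis - 1] += s
        return field(t, tuple(moved))
    return tuple((p - m) / (2 * h) for p, m in zip(shifted(h), shifted(-h)))

def curl(field, t, r):
    dx, dy, dz = (partial(field, t, r, k) for k in (1, 2, 3))
    return (dy[2] - dz[1], dz[0] - dx[2], dx[1] - dy[0])

def div(field, t, r):
    return sum(partial(field, t, r, k)[k - 1] for k in (1, 2, 3))

t0, r0 = 0.3, (0.1, -0.4, 0.7)
# curl E + dB/dt, eqn [37]
faraday = [c + d for c, d in zip(curl(electric, t0, r0), partial(magnetic, t0, r0, 0))]
# curl B - (1/c^2) dE/dt, eqn [36]
ampere = [c - d / C ** 2 for c, d in zip(curl(magnetic, t0, r0), partial(electric, t0, r0, 0))]
residual = max(abs(x) for x in faraday + ampere
               + [div(electric, t0, r0), div(magnetic, t0, r0)])
print(residual)
```

Replacing ALPHA by a vector with a component along E_DIR should make the divergence residual visibly nonzero, which is the numerical counterpart of the transversality condition.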


Potentials 


For every solution of Maxwell’s equations in vacuo, 
the components of E and B satisfy the three- 
dimensional wave equation; but the converse is not 
true. That is, it is not true in general that if 


$$\Box B = 0, \qquad \Box E = 0$$


then E and B satisfy Maxwell’s equations. For this 
to happen, the divergence of both fields must vanish, 
and they must be related by [36] and [37]. These 
additional constraints are somewhat simpler to 
handle if we work not with the fields themselves, 
but with auxiliary quantities called “potentials.” 

The definition of the potentials depends on 
standard integrability conditions from vector calcu- 
lus. Suppose that v is a vector field, which may 
depend on time. If curl v = 0, then there exists a 
function φ such that 

$$v = \operatorname{grad} \phi \qquad [45]$$

If div v = 0, then there exists a second vector field a 
such that 

$$v = \operatorname{curl} a \qquad [46]$$


Neither φ nor a is uniquely determined by v. In the 
first case, if [45] holds, then it also holds when φ is 
replaced by φ′ = φ + f, where f is a function of time 
alone; in the second, if [46] holds, then it also holds 
when a is replaced by 

$$a' = a + \operatorname{grad} u$$


for any scalar function u of position and time. It 
should be kept in mind that the existence statements 
are local. If v is defined on a region U with 
nontrivial topology, then it may not be possible to 
find a suitable φ or a throughout the whole of U. 
Suppose now that we are given fields E and B 
satisfying Maxwell’s equations [26]-[29] with 
sources represented by the charge density ρ and the 
current density J. Since div B = 0, there exists a time- 
dependent vector field A(t, x, y, z) such that 


B=curlA 


If we substitute B=curlA into [29] and interchange 
curl with the time derivative, then we obtain 


$$\operatorname{curl}\left(E + \frac{\partial A}{\partial t}\right) = 0$$


It follows that there exists a scalar φ(t, x, y, z) such 
that 

$$E = -\operatorname{grad}\phi - \frac{\partial A}{\partial t} \qquad [47]$$

Such a vector field A is called a “magnetic vector 
potential”; a function φ such that eqn [47] holds is 
called an “electric scalar potential.” 
Conversely, given scalar and vector functions φ 
and A of t, x, y, z, we can define B and E by 

$$B = \operatorname{curl} A, \qquad E = -\operatorname{grad}\phi - \frac{\partial A}{\partial t} \qquad [48]$$
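Then two of Maxwell’s equations hold automatically, 
since 

$$\operatorname{div} B = 0, \qquad \operatorname{curl} E + \frac{\partial B}{\partial t} = 0$$

The remaining pair translate into conditions on A 
and φ. Equation [26] becomes 

$$\operatorname{div} E = -\nabla^2\phi - \frac{\partial}{\partial t}(\operatorname{div} A) = \frac{\rho}{\epsilon_0}$$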


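and eqn [28] becomes 

$$\operatorname{curl} B - \frac{1}{c^2}\frac{\partial E}{\partial t} = -\nabla^2 A + \operatorname{grad}\operatorname{div} A + \frac{1}{c^2}\frac{\partial}{\partial t}\left(\operatorname{grad}\phi + \frac{\partial A}{\partial t}\right) = \mu_0 J$$

If we put 

$$\alpha = \frac{1}{c^2}\frac{\partial\phi}{\partial t} + \operatorname{div}(A)$$

then we can rewrite the equations for A and φ more 
simply as 

$$\Box\phi - \frac{\partial\alpha}{\partial t} = \frac{\rho}{\epsilon_0}$$
$$\Box A + \operatorname{grad}\alpha = \mu_0 J$$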
Here we have four equations (one scalar, one vector) 
in four unknowns (φ and the components of A). Any 
set of solutions φ, A determines a solution of 
Maxwell’s equations via [48]. 


Gauge Transformations 


Given solutions E and B of Maxwell’s equations, 
what freedom is there in the choice of A and φ? 
First, A is determined by curl A = B up to the 
replacement of A by 

$$A' = A + \operatorname{grad} u$$


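for some function u of position and time. The scalar 
potential φ′ corresponding to A′ must be chosen so 
that 

$$-\operatorname{grad}\phi' = E + \frac{\partial A'}{\partial t} = E + \frac{\partial A}{\partial t} + \operatorname{grad}\frac{\partial u}{\partial t} = -\operatorname{grad}\left(\phi - \frac{\partial u}{\partial t}\right)$$

That is, φ′ = φ − ∂u/∂t + f(t), where f is a function 
of t alone. We can absorb f into u by subtracting 

$$\int f\, dt$$

(this does not alter A′). So the freedom in the choice 
of A and φ is to make the transformation 

$$A \mapsto A' = A + \operatorname{grad} u, \qquad \phi \mapsto \phi' = \phi - \frac{\partial u}{\partial t} \qquad [49]$$

for any u = u(t, x, y, z). The transformation [49] is 
called a “gauge transformation.” 

Under [49], the quantity α = (1/c²)∂φ/∂t + div A transforms as 

$$\alpha \mapsto \alpha' = \alpha - \Box u$$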
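
As a worked check, offered as a sketch rather than as part of the original derivation, one can verify directly that a transformation of the form A′ = A + grad u, φ′ = φ − ∂u/∂t leaves the fields unchanged, and see how the combination (1/c²)∂φ/∂t + div A responds. The only facts used are curl grad u = 0 and the definition of the d’Alembertian as (1/c²)∂²/∂t² − ∇².

```latex
\begin{aligned}
B' &= \operatorname{curl} A' = \operatorname{curl} A + \operatorname{curl}\operatorname{grad} u = B, \\
E' &= -\operatorname{grad}\phi' - \frac{\partial A'}{\partial t}
    = -\operatorname{grad}\phi + \operatorname{grad}\frac{\partial u}{\partial t}
      - \frac{\partial A}{\partial t} - \frac{\partial}{\partial t}\operatorname{grad} u = E, \\
\frac{1}{c^2}\frac{\partial \phi'}{\partial t} + \operatorname{div} A'
   &= \frac{1}{c^2}\frac{\partial \phi}{\partial t} + \operatorname{div} A
      - \left(\frac{1}{c^2}\frac{\partial^2 u}{\partial t^2} - \nabla^2 u\right)
    = \frac{1}{c^2}\frac{\partial \phi}{\partial t} + \operatorname{div} A - \Box u .
\end{aligned}
```

The last line shows why solving an inhomogeneous wave equation for u is enough to make the transformed combination vanish.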


It is possible to show, under certain very mild 
conditions on α, that the inhomogeneous wave 
equation 

$$\Box u = \alpha \qquad [50]$$

has a solution u = u(t, x, y, z). If we choose u so that 
[50] holds, then the transformed potentials A′ and φ′ 
satisfy 

$$\frac{1}{c^2}\frac{\partial\phi'}{\partial t} + \operatorname{div}(A') = 0$$

This is the “Lorenz gauge condition,” named after 
L Lorenz (not the H A Lorentz of the “Lorentz 
contraction”). 


If we impose the Lorenz condition, then the only 
remaining freedom in the choice of A and φ is to 
make gauge transformations [49] in which u is a 
solution of the wave equation □u = 0. Under the 
Lorenz condition, Maxwell’s equations take the 
form 

$$\Box\phi = \rho/\epsilon_0, \qquad \Box A = \mu_0 J \qquad [51]$$

Consistency with the Lorenz condition follows from 
the continuity equation on ρ and J. 

In the absence of sources, therefore, Maxwell’s 
equations for the potential in the Lorenz gauge 
reduce to 

$$\Box\phi = 0, \qquad \Box A = 0 \qquad [52]$$

together with the constraint 

$$\frac{1}{c^2}\frac{\partial\phi}{\partial t} + \operatorname{div} A = 0$$

We can, for example, choose three arbitrary solutions 
of the scalar wave equation for the components 
of the vector potential, and then define φ by 

$$\phi = -c^2 \int \operatorname{div} A \, dt$$


Whatever choice we make, we shall get a solution of 
Maxwell’s equations, and every solution of Max- 
well’s equations (without sources) will arise from 
some such choice. 


Historical Note 


At the end of the eighteenth century, four types of 
electromagnetic phenomena were known, but not 
the connections between them. 


• Magnetism; the word derives from the Greek for 
“stone from Magnesia.” 

• Static electricity, produced by rubbing amber with 
fur; the word “electricity” derives from the Greek 
for “amber.” 

• Light. 

• Galvanism or “animal electricity”: the electricity 
produced by batteries, discovered by Luigi 
Galvani. 


The construction of a unified theory was a slow 
and painful business. It was hindered by attempts, 
which seem bizarre in retrospect, to understand 
electromagnetism in terms of underlying mechanical 
models involving such inventions as “electric fluids” 
and “magnetic vortices.” We can see the legacy of 
this period, which ended with Einstein’s work in 
1905, in the misleading and archaic terms that still 
survive in modern terminology: “magnetic flux,” 
“lines of force,” “electric displacement,” and so on. 
Maxwell’s contribution was decisive, although 
much of what we now call “Maxwell’s theory” is 
due to his successors (Lorentz, Hertz, Einstein, and 
so on); and, as we shall see, a key element in 
Maxwell’s own description of electromagnetism, 
the “electromagnetic ether,” an all-pervasive 
medium which was supposed to transmit 
electromagnetic waves, was thrown out by Einstein. 


A rough chronology is as follows. 


• 1800 Volta demonstrated the connection between 
galvanism and static electricity. 

• 1820 Oersted showed that the current from a 
battery generates a force on a magnet. 

• 1822 Ampère suggested that light was a wave 
motion in a “luminiferous ether” made up of two 
types of electric fluid. In the same year, Galileo’s 
“Dialogue concerning the two chief world systems” 
was removed from the index of prohibited 
books. 

• 1831 Faraday showed that moving magnets can 
induce currents. 

• 1846 Faraday suggested that light is a vibration 
in magnetic lines of force. 

• 1863 Maxwell published the equations that 
describe the dynamics of electric and magnetic 
fields. 

• 1905 Einstein’s paper “On the electrodynamics 
of moving bodies.” 
Foundations: Atoms and Molecules 


Classical statistical mechanics studies properties of 
macroscopic aggregates of particles, atoms, and 
molecules, based on the assumption that they are 
point masses subject to the laws of classical 
mechanics. Distinction between macroscopic and 
microscopic systems is evanescent and in fact the 
foundations of statistical mechanics have been laid 
on properties, proved or assumed, of few-particle 
systems. 

Macroscopic systems are often considered in 
stationary states, which means that their micro- 
scopic configurations follow each other as time 
evolves while looking the same macroscopically. 
Observing time evolution is the same as sampling 
(“not too closely” time-wise) independent copies of 
the system prepared in the same way. 

A basic distinction is necessary: a stationary state 
may or may not be in equilibrium. The first case 
arises when the particles are enclosed in a container 
Ω and are subject only to their mutual conservative 
interactions and, possibly, to external conservative 
forces: a typical example is a gas in a container 
subject to forces due to the walls of Ω and gravity, 
besides the internal interactions. This is a very 
restricted class of systems and states. 

A more general case is when the system is in a 
stationary state but it is also subject to nonconservative 
forces: a typical example is a gas or fluid in which a 
wheel rotates, as in the Joule experiment, with some 
device acting to keep the temperature constant. The 
device is called a thermostat and in statistical 
mechanics it has to be modeled by forces, including 
nonconservative ones, which prevent an indefinite 
energy transfer from the external forcing to the system: 
such a transfer would impede the occurrence of 
stationary states. For instance, the thermostat could 
simply be a constant friction force (as in stirred 
incompressible liquids or as in electric wires in which 
current circulates because of an electromotive force). 

A more fundamental approach would be to 
imagine that the thermostat device is not a phenom- 
enologically introduced nonconservative force (e.g., 
a friction force) but is due to the interaction with an 
external infinite system which is in “equilibrium at 
infinity.” 

In any event nonequilibrium stationary states are 
intrinsically more complex than equilibrium states. 
Here attention will be confined to equilibrium 
statistical mechanics of systems of N identical point 
particles Q = (q_1, ..., q_N) enclosed in a cubic box Ω, 
with volume V and side L, normally assumed to 
have perfectly reflecting walls. 

Particles of mass m located at q, q′ will be 
supposed to interact via a pair potential φ(q − q′). 
The microscopic motion follows the equations 

$$m\ddot{q}_i = -\sum_{j \neq i} \partial_{q_i}\varphi(q_i - q_j) - \partial_{q_i} W_{\rm wall}(q_i) \overset{\rm def}{=} -\partial_{q_i}\Phi(Q) \qquad [1]$$

where the potential φ is assumed to be smooth 
except, possibly, for |q − q′| < r_0 where it could be 
+∞, that is, the particles cannot come closer than 
r_0, and at r_0 [1] is interpreted by imagining that they 
undergo elastic collisions; the potential W_wall models 
the container and it will be replaced, unless 
explicitly stated, by an elastic collision rule. 

The time evolution $(Q, \dot{Q}) \to S_t(Q, \dot{Q})$ will, therefore, 
be described on the position-velocity space, 
F(N), of the N particles or, more conveniently, on 
the phase space, i.e., by a time evolution S_t on the 
momentum-position (P, Q, with $P = m\dot{Q}$) space, 
F(N). The motion being conservative, the energy 

$$U \overset{\rm def}{=} \sum_i \frac{p_i^2}{2m} + \sum_{i<j}\varphi(q_i - q_j) + \sum_i W_{\rm wall}(q_i) \overset{\rm def}{=} K(P) + \Phi(Q)$$

will be a constant of motion; the last term in Φ is 
missing if walls are perfect. This makes it convenient to 
regard the dynamics as associated with two dynamical 
systems (F(N), S_t) on the 6N-dimensional phase 
space, and (F_U(N), S_t) on the (6N − 1)-dimensional 
surface of energy U. Since the dynamics [1] is 
Hamiltonian on phase space, with Hamiltonian 

$$H(P,Q) \overset{\rm def}{=} \sum_i \frac{p_i^2}{2m} + \Phi(Q) \overset{\rm def}{=} K + \Phi$$

it follows that the volume $d^{3N}P\,d^{3N}Q$ is conserved 
(i.e., a region E has the same volume as S_t E) and 
also the area $\delta(H(P,Q) - U)\,d^{3N}P\,d^{3N}Q$ is conserved. 

The above dynamical systems are well defined, 
i.e., S_t is a map on phase space globally defined for 
all t ∈ (−∞, ∞), when the interaction potential is 
bounded below: this is implied by the a priori 
bounds due to energy conservation. For gravita- 
tional or Coulomb interactions, much more has to 
be said, assumed, and done in order to even define 
the key quantities needed for a statistical theory of 
motion. 

Although our world is three dimensional (or at 
least was believed to be so until recent revolutionary 
theories), it will be useful to consider also systems of 
particles in dimension d ≠ 3: in this case the above 
6N and 3N become, respectively, 2dN and dN. 
Systems with dimension d = 1, 2 are in fact sometimes 
very good models for thin filaments or thin 
films. For the same reason, it is often useful to 
imagine that space is discrete and particles can only 
be located on a lattice, for example, on Z^d (see the 
section “Lattice models”). 

The reader is referred to Gallavotti (1999) for 
more details. 


Pressure, Temperature, and Kinetic 
Energy 


The beginning was BERNOULLI’s derivation of 
the perfect gas law via the identification of 
the pressure at numerical density ρ with the 
average momentum transferred per unit time to 
a surface element of area dS on the walls: that is, 
the average of the observable 2mv ρv dS, with v 
the normal component of the velocity of 
the particles that undergo collisions with dS. 
If f(v)dv is the distribution of the normal component 
of velocity and $f(\mathbf{v})\,d^3\mathbf{v} \equiv \prod_i f(v_i)\,d^3\mathbf{v}$, $\mathbf{v} = (v_1, v_2, v_3)$, 
is the total velocity distribution, 
the average of the momentum transferred is p dS 
given by 

$$dS \int_{v>0} 2mv^2 \rho f(v)\,dv = dS \int mv^2 \rho f(v)\,dv = \rho\,\frac{2}{3}\,dS \int \frac{m}{2}\mathbf{v}^2 f(\mathbf{v})\,d^3\mathbf{v} = \rho\,\frac{2}{3}\left\langle \frac{K}{N} \right\rangle dS \qquad [2]$$

Furthermore (2/3)⟨K/N⟩ was identified as proportional 
to the absolute temperature, ⟨K/N⟩ ≝ const (3/2)T, 
which, with present-day notations, is 
written as (2/3)⟨K/N⟩ = k_B T. The constant k_B was 
(later) called Boltzmann’s constant and it is the 
same for at least all perfect gases. Its independence 
of the particular nature of the gas is a consequence 
of Avogadro’s law stating that equal 
volumes of gases at the same conditions of 
temperature and pressure contain equal numbers 
of molecules. 

Proportionality between average kinetic energy 
and temperature via the universal constant k_B 
became in fact a fundamental assumption extending 
to all aggregates of particles, gaseous or not, never 
challenged in all later works (until quantum 
mechanics, where this is no longer true; see the 
section “Quantum statistics”). 

For more details, we refer the reader to Gallavotti 
(1999). 
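
The chain of equalities in [2] can be illustrated by sampling. The sketch below assumes, purely for illustration, that each velocity component is Gaussian with variance kT/m (Bernoulli’s argument only needs the components to be identically and symmetrically distributed); the density, mass, and kT values are hypothetical. It compares the momentum transferred to the wall per unit area and time with (2/3)ρ times the average kinetic energy per particle.

```python
import random

def bernoulli_pressure(samples, density=1.0, mass=1.0, kT=1.0, seed=0):
    """Return (wall-transfer estimate, kinetic estimate) of the pressure."""
    rng = random.Random(seed)
    sigma = (kT / mass) ** 0.5
    transfer = 0.0   # accumulates 2 m v^2 for particles moving toward the wall
    kinetic = 0.0    # accumulates (m/2) |v|^2
    for _ in range(samples):
        vx, vy, vz = (rng.gauss(0.0, sigma) for _ in range(3))
        if vx > 0.0:
            transfer += 2.0 * mass * vx * vx
        kinetic += 0.5 * mass * (vx * vx + vy * vy + vz * vz)
    p_wall = density * transfer / samples
    p_kinetic = density * (2.0 / 3.0) * kinetic / samples
    return p_wall, p_kinetic

p_wall, p_kinetic = bernoulli_pressure(200_000)
print(p_wall, p_kinetic)  # the two estimates agree up to sampling noise
```

With these illustrative parameters both numbers estimate ρ k_B T, which is the perfect gas law in the form p = ρ k_B T.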
Heat and Entropy 


After Clausius’ discovery of entropy, BOLTZMANN, in 
order to explain it mechanically, introduced the heat 
theorem, which he developed to full generality 
between 1866 and 1884. Together with the men- 
tioned identification of absolute temperature with 
average kinetic energy, the heat theorem can also be 
considered a founding element of statistical 
mechanics. 

The theorem makes precise the notion of time 
average and then states in great generality that 
given any mechanical system one can associate with 
its dynamics four quantities U, V,p, T, defined as 
time averages of suitable mechanical observables 
(i.e., functions on phase space), so that when the 
external conditions are infinitesimally varied and 
the quantities U, V change by dU,dV, respectively, 
the ratio (dU+pdV)/T is exact, i.e., there is a 
function S(U,V) whose corresponding variation 
equals the ratio. It will be better, for the purpose of 
considering very large boxes (V → ∞), to write this 
relation in terms of intensive quantities u = U/N and 
v = V/N as 

$$\frac{du + p\,dv}{T} \ \text{is exact} \qquad [3]$$

i.e., the ratio equals the variation ds of 
s(U/N, V/N) ≡ (1/N) S(U, V). 

The proof originally dealt with monocyclic 
systems, i.e., systems in which all motions are 
periodic. The assumption is clearly much too 
restrictive and justification for it developed from 
the early “nonperiodic motions can be regarded 
as periodic with infinite period” (1866), to the 
later ergodic hypothesis and finally to the 
realization that, after all, the heat theorem 
does not really depend on the ergodic hypothesis 
(1884). 

Although for a one-dimensional system the proof 
of the heat theorem is a simple check, it was a real 
breakthrough because it led to an answer to the 
general question as to under which conditions one 
could define mechanical quantities whose variations 
were constrained to satisfy [3] and therefore could 
be interpreted as a mechanical model of Clausius’ 
macroscopic thermodynamics. It is reproduced in 
the following. 

Consider a one-dimensional system subject to 
forces with a confining potential φ(x) such that 
|φ′(x)| > 0 for |x| > 0, φ″(0) > 0 and φ(x) → +∞ as |x| → ∞. 
All motions are periodic, so that the system is 
monocyclic. Suppose that the potential φ(x) depends 
on a parameter V and define a state to be a motion with 
given energy U and given V; let 


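$$\begin{aligned}
U &= \text{total energy of the system} \equiv K + \Phi\\
T &= \text{time average of the kinetic energy } K \equiv \langle K\rangle\\
V &= \text{the parameter on which } \varphi \text{ is supposed to depend}\\
p &= -\,\text{time average of } \partial_V\varphi, \ -\langle \partial_V\varphi\rangle
\end{aligned} \qquad [4]$$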


A state is thus parametrized by U, V. If such 
parameters change by dU, dV, respectively, and 
if dL ≝ −p dV, dQ ≝ dU + p dV, then [3] holds. In 
fact, let x_±(U, V) be the extremes of the oscillations of 
the motion with given U, V and define S as 

$$S = 2\log \int_{x_-(U,V)}^{x_+(U,V)} \sqrt{U - \varphi(x)}\,dx \;\Rightarrow\; dS = \frac{\int (dU - \partial_V\varphi(x)\,dV)\,(dx/\sqrt{K})}{\int K\,(dx/\sqrt{K})} \qquad [5]$$

Noting that $dx/\sqrt{K} = \sqrt{2/m}\,dt$, [3] follows because 
time averages are given by integrating with respect 
to $dx/\sqrt{K}$ and dividing by the integral of $1/\sqrt{K}$. 

For more details, the reader is referred to Boltzmann 
(1968b) and Gallavotti (1999). 
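
The one-dimensional heat theorem lends itself to a direct numerical check. The sketch below assumes a hypothetical even potential φ(x) = V x²/2 + x⁴/4, chosen only because its turning points are available in closed form. It computes S from [5], the time averages T = ⟨K⟩ and p = −⟨∂_V φ⟩ as integrals with weight dx/√K, and compares finite-difference derivatives of S with 1/T and p/T. The substitution x = x_+ sin θ is a numerical convenience that removes the endpoint singularity of 1/√K.

```python
import math

def phi(x, V):
    """Confining potential depending on a parameter V (hypothetical choice)."""
    return 0.5 * V * x * x + 0.25 * x ** 4

def dphi_dV(x, V):
    return 0.5 * x * x

def turning_point(U, V):
    """Positive root of phi(x, V) = U; the potential is even, so x_- = -x_+."""
    return math.sqrt(-V + math.sqrt(V * V + 4.0 * U))

def orbit_integrals(U, V, n=4000):
    """Integrals of sqrt(K), 1/sqrt(K) and (d phi/dV)/sqrt(K) over [x_-, x_+]."""
    xp = turning_point(U, V)
    h = math.pi / n
    i_sqrt = i_inv = i_dv = 0.0
    for k in range(n):
        theta = -0.5 * math.pi + (k + 0.5) * h
        x = xp * math.sin(theta)
        w = xp * math.cos(theta) * h              # dx
        root = math.sqrt(max(U - phi(x, V), 1e-300))
        i_sqrt += root * w
        i_inv += w / root
        i_dv += dphi_dV(x, V) * w / root
    return i_sqrt, i_inv, i_dv

def entropy(U, V):
    """S = 2 log of the integral of sqrt(U - phi), as in [5]."""
    return 2.0 * math.log(orbit_integrals(U, V)[0])

def temperature_and_pressure(U, V):
    i_sqrt, i_inv, i_dv = orbit_integrals(U, V)
    return i_sqrt / i_inv, -i_dv / i_inv          # <K>, -<d phi/dV>

U0, V0, eps = 1.0, 1.0, 1e-4
T, p = temperature_and_pressure(U0, V0)
dS_dU = (entropy(U0 + eps, V0) - entropy(U0 - eps, V0)) / (2 * eps)
dS_dV = (entropy(U0, V0 + eps) - entropy(U0, V0 - eps)) / (2 * eps)
print(dS_dU - 1.0 / T, dS_dV - p / T)  # both differences should be tiny
```

Agreement of the two derivatives with 1/T and p/T is exactly the statement that (dU + p dV)/T is the differential of S.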


Heat Theorem and Ergodic Hypothesis 


Boltzmann tried to extend the result beyond the one- 
dimensional systems (e.g., to Keplerian motions, 
which are not monocyclic unless only motions with 
a fixed eccentricity are considered). However, the 
early statement that “aperiodic motions can be 
regarded as periodic with infinite period” is really 
the heart of the application of the heat theorem 
for monocyclic systems to the far more complex gas 
in a box. 

Imagine that the gas container Ω is closed by a 
piston of section A located to the right of the 
origin at distance L and acting as a lid, so that the 
volume is V = AL. The microscopic model for the 
piston will be a potential $\bar\varphi(L - \xi)$ if x = (ξ, η, ζ) are 
the coordinates of a particle. The function $\bar\varphi(r)$ 
will vanish for r > r_0, for some r_0 < L, and 
diverge to +∞ at r = 0. Thus, r_0 is the width of 
the layer near the piston where the force of the 
wall is felt by the particles that happen to be 
roaming there. 

The contribution to the total potential energy 
Φ due to the walls is $W_{\rm wall} = \sum_j \bar\varphi(L - \xi_j)$ and 
$\partial_V = A^{-1}\partial_L$; assuming monocyclicity, it is necessary 
to evaluate the time average of $\partial_L \Phi(x) = \partial_L W_{\rm wall} \equiv -\sum_j \bar\varphi'(L - \xi_j)$. 
As time evolves, the 
particles x_j with ξ_j in the layer within r_0 of the 
wall will feel the force exercised by the wall and 
bounce back. One particle in the layer will contribute 
to the average of $\partial_L \Phi(x)$ the amount 

$$\frac{1}{\text{total time}}\; 2\int_{t_0}^{t_1} -\bar\varphi'(L - \xi_j)\,dt$$

if t_0 is the first instant when the point j enters the 
layer and t_1 is the instant when the ξ-component of 
the velocity vanishes “against the wall.” Since 
$-\bar\varphi'(L - \xi_j)$ is the ξ-component of the force, the 
integral is $2m|\dot\xi_j|$ (by Newton’s law), provided, of 
course, $\dot\xi_j > 0$. 

Suppose that no collisions between particles occur 
while the particles travel within the range of the 
potential of the wall, i.e., the mean free path is much 
greater than the range of the potential $\bar\varphi$ defining the 
wall. The contribution of collisions to the average 
momentum transfer to the wall per unit time is 
therefore given by, see [2], 

$$\int_{v>0} 2mv\, f(v)\, \rho_{\rm wall}\, A\, v\, dv$$

if ρ_wall, f(v) are the average density near the wall 
and, respectively, the average fraction of particles 
with a velocity component normal to the wall 
between v and v + dv. Here ρ, f are supposed to be 
independent of the point on the wall: this should be 
true up to corrections of size o(A). 

Thus, writing the average kinetic energy per particle 
and per velocity component, $\int (m/2) v^2 f(v)\,dv$, as 
(1/2)β⁻¹ (cf. [2]) it follows that 

$$p \overset{\rm def}{=} -\langle \partial_V \Phi\rangle = \rho_{\rm wall}\,\beta^{-1} \qquad [7]$$

has the physical interpretation of pressure. (1/2)β⁻¹ 
is the average kinetic energy per degree of freedom: 
hence, it is proportional to the absolute temperature 
T (see the section “Pressure, temperature, and 
kinetic energy”). 

On the other hand, if motion on the energy 
surface takes place on a single periodic orbit, the 
quantity p in [7] is the right quantity that would 
make the heat theorem work; see [4]. Hence, 
regarding the trajectory on each energy surface as 
periodic (i.e., the system as monocyclic) leads to the 
heat theorem with p,U,V,T having the right 
physical interpretation corresponding to their appel- 
lations. This shows that monocyclic systems provide 
natural models of thermodynamic behavior. 

Assuming that a chaotic system like a gas in a 
container of volume V will satisfy, for practical 
purposes, the above property, a quantity p can be 
defined such that dU + p dV admits the inverse of 
the average kinetic energy ⟨K⟩ as an integrating 
factor and, furthermore, p, U, V, ⟨K⟩ have the 
physical interpretations of pressure, energy, volume, 
and (up to a proportionality factor) absolute 
temperature, respectively. 

Boltzmann’s conception of space (and time) as 
discrete allowed him to conceive the property that 
the energy surface is constituted by “points” all of 
which belong to a single trajectory: a property that 
would be impossible if the phase space was really a 
continuum. Regarding phase space as consisting of a 
finite number of “cells” of finite volume h^{3N}, for 
some h > 0 (rather than of a continuum of points), 
allowed him to think, without logical contradiction, 
that the energy surface consisted of a single 
trajectory and, hence, that motion was a cyclic 
permutation of its points (actually cells). 

Furthermore, it implied that the time average of 
an observable F(P,Q) had to be identified with its 
average on the energy surface computed via the 
Liouville distribution 


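$$C^{-1}\int F(P,Q)\,\delta(H(P,Q) - U)\,dP\,dQ$$

with 

$$C = \int \delta(H(P,Q) - U)\,dP\,dQ$$

(the appropriate normalization factor): a property 
that was written symbolically 

$$\frac{dt}{T} = \frac{dP\,dQ}{\int dP\,dQ}$$

or 

$$\lim_{T\to\infty} \frac{1}{T}\int_0^T F(S_t(P,Q))\,dt = \frac{\int F(P',Q')\,\delta(H(P',Q') - U)\,dP'\,dQ'}{\int \delta(H(P',Q') - U)\,dP'\,dQ'} \qquad [8]$$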


The validity of [8] for all (piecewise smooth) 
observables F and for all points of the energy 
surface, with the exception of a set of zero area, is 
called the ergodic hypothesis. 

For more details, the reader is referred to 
Boltzmann (1968) and Gallavotti (1999). 
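
For a one-dimensional oscillator the energy surface is a single closed orbit, so a relation of the form [8] can be tested directly. The sketch below assumes m = 1 and a hypothetical potential φ(x) = x²/2 + x⁴/4. It compares the long-time average of x² along a trajectory, integrated with a leapfrog scheme, with the energy-surface average, in which integrating δ(H − U) over p leaves the weight dx/√K. All names and parameter values are illustrative.

```python
import math

def potential(x):
    return 0.5 * x * x + 0.25 * x ** 4

def force(x):
    return -(x + x ** 3)

def turning_point(U):
    """Positive root of potential(x) = U."""
    return math.sqrt(-1.0 + math.sqrt(1.0 + 4.0 * U))

def time_average_x2(U, dt=1e-3, steps=400_000):
    """Average of x^2 along the trajectory of energy U (leapfrog integration)."""
    x, p = turning_point(U), 0.0     # start at rest at the turning point
    total = 0.0
    for _ in range(steps):
        p += 0.5 * dt * force(x)
        x += dt * p
        p += 0.5 * dt * force(x)
        total += x * x
    return total / steps

def surface_average_x2(U, n=4000):
    """Average of x^2 with weight dx / sqrt(K), K = U - potential(x)."""
    xp = turning_point(U)
    h = math.pi / n
    num = den = 0.0
    for k in range(n):
        theta = -0.5 * math.pi + (k + 0.5) * h
        x = xp * math.sin(theta)
        w = xp * math.cos(theta) * h / math.sqrt(U - potential(x))
        num += x * x * w
        den += w
    return num / den

time_avg = time_average_x2(1.0)
surface_avg = surface_average_x2(1.0)
print(time_avg, surface_avg)
```

The residual difference comes from the finite observation time and the time step; for systems with many degrees of freedom no such elementary check is available, which is why the statement remains a hypothesis there.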


Ensembles 


Eventually Boltzmann in 1884 realized that the 
validity of the heat theorem for averages computed 
via the right-hand side (rhs) of [8] held indepen- 
dently of the ergodic hypothesis, that is, [8] was not 
necessary because the heat theorem (i.e., [3]) could 
also be derived under the only assumption that the 
averages involved in its formulation were computed 
as averages over phase space with respect to the 
probability distribution on the rhs of [8]. 
Furthermore, if T was identified with the average 
kinetic energy, U with the average energy, and p 
with the average force per unit surface on the walls 
of the container Ω with volume V, the relation [3] 
held for a variety of families of probability distributions 
on phase space, besides [8]. Among these are: 


1. The “microcanonical ensemble,” which is the 
collection of probability distributions on the rhs 
of [8] parametrized by u = U/N, v = V/N (energy 
and volume per particle), 

$$\mu^{mc}_{u,v}(dP\,dQ) = \frac{1}{Z_{mc}(U,N,V)}\,\delta(H(P,Q) - U)\,\frac{dP\,dQ}{N!\,h^{3N}} \qquad [9]$$

where h is a constant with the dimensions of an 
action which, in the discrete representation of 
phase space mentioned in the previous section, can 
be taken such that h^{3N} equals the volume of the 
cells and, therefore, the integrals with respect to [9] 
can be interpreted as an (approximate) sum over 
the cells conceived as microscopic configurations 
of N indistinguishable particles (whence the N!). 

2. The “canonical ensemble,” which is the collection 
of probability distributions parametrized by 
β, v = V/N, 

$$\mu^{c}_{\beta,v}(dP\,dQ) = \frac{1}{Z_c(\beta,N,V)}\,e^{-\beta H(P,Q)}\,\frac{dP\,dQ}{N!\,h^{3N}} \qquad [10]$$

to which more ensembles can be added, such as 
the grand canonical ensemble (Gibbs). 

3. The “grand canonical ensemble,” which is the 
collection of probability distributions parameterized 
by β, λ and defined over the space 
$F_{gc} = \bigcup_{N=0}^{\infty} F(N)$, 

$$\mu^{gc}_{\beta,\lambda}(dP\,dQ) = \frac{1}{Z_{gc}(\beta,\lambda,V)}\,e^{\beta\lambda N - \beta H(P,Q)}\,\frac{dP\,dQ}{N!\,h^{3N}} \qquad [11]$$


Hence, there are several different models of thermo- 
dynamics. The key tests for accepting them as real 
microscopic descriptions of macroscopic thermo- 
dynamics are as follows. 


1. A correspondence between the macroscopic 
states of thermodynamic equilibrium and the 
elements of a collection of probability distribu- 
tions on phase space can be established by 
identifying, on the one hand, macroscopic 
thermodynamic states with given values of the 
thermodynamic functions and, on the other, 


probability distributions attributing the same 
average values to the corresponding microscopic 
observables (i.e., whose averages have the inter- 
pretation of thermodynamic functions). 

2. Once the correct correspondence between the 
elements of the different ensembles is established, 
that is, once the pairs (u, v), (β, v), (β, λ) are so 
related to produce the same values for the 
averages U, V, k_B T ≝ β⁻¹, p|∂Ω| of 

$$H(P,Q), \quad V, \quad \frac{2K(P)}{3N}, \quad \int_{\partial\Omega} \delta_{\partial\Omega}(q_1)\,2m\,(v_1 \cdot n)^2\,dq_1 \qquad [12]$$

(where $\delta_{\partial\Omega}(q_1)$ is a delta-function pinning q_1 to 
the surface ∂Ω), then the averages of all physically 
interesting observables should coincide at 
least in the thermodynamic limit, Ω → ∞. In this 
way, the elements μ of the considered collection 
of probability distributions can be identified with 
the states of macroscopic equilibrium of the 
system. The μ’s depend on parameters and therefore 
they form an ensemble: each of them 
corresponds to a macroscopic equilibrium state 
whose thermodynamic functions are appropriate 
averages of microscopic observables and therefore 
are functions of the parameters identifying μ. 
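
The identification of k_B T with β⁻¹ can be made concrete in the canonical case, where the momentum part of the weight factorizes into Gaussians. The sketch below is an illustration under that assumption: it evaluates, by simple quadrature, the canonical average of p²/m for a single momentum component, which is the per-component contribution to 2K(P)/(3N). The function name, cutoff, and grid size are choices made for this example.

```python
import math

def canonical_average(observable, beta, mass=1.0, n=20_000):
    """Average of observable(p) with weight exp(-beta p^2 / 2m), midpoint rule."""
    p_max = 12.0 * math.sqrt(mass / beta)   # cutoff far in the Gaussian tail
    h = 2.0 * p_max / n
    num = den = 0.0
    for k in range(n):
        p = -p_max + (k + 0.5) * h
        w = math.exp(-beta * p * p / (2.0 * mass))
        num += observable(p) * w
        den += w
    return num / den

beta = 2.5
mass = 1.0
two_k_per_dof = canonical_average(lambda p: p * p / mass, beta, mass)
print(two_k_per_dof, 1.0 / beta)
```

Since every momentum component contributes the same amount, the canonical average of 2K/(3N) equals 1/β for any N, independently of the interaction potential.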


Remark The word “ensemble” is often used to 
indicate the individual probability distributions of 
what has been called here an ensemble. The meaning 
used here seems closer to the original sense in the 
1884 paper of Boltzmann (in other words, often by 
“ensemble” one means that collection of the phase 
space points on which a given probability distribu- 
tion is considered, and this does not seem to be the 
original sense). 


For instance, in the case of the microcanonical 
distributions this means interpreting energy, volume, 
temperature, and pressure of the equilibrium state 
with specific energy u and specific volume v as 
proportional, through appropriate universal proportionality 
constants, to the integrals with respect to 
$\mu^{mc}_{u,v}(dP\,dQ)$ of the mechanical quantities in [12]. 
The averages of other thermodynamic observables in 
the state with specific energy u and specific volume 
v should be given by their integrals with respect 
to $\mu^{mc}_{u,v}$. 

Likewise, one can interpret energy, volume, 
temperature, and pressure of the equilibrium state 
with specific energy u and specific volume v as the 
averages of the mechanical quantities [12] with 
respect to the canonical distribution $\mu^{c}_{\beta,v}(dP\,dQ)$ 
which has average specific energy precisely u. The 
averages of other thermodynamic observables in the 
state with specific energy and volume u and v are 
given by their integrals with respect to $\mu^{c}_{\beta,v}$. A 
similar definition can be given for the description of 
thermodynamic equilibria via the grand canonical 
distributions. 

For more details, see Gibbs (1981) and Gallavotti 
(1999). 


Equivalence of Ensembles 


BOLTZMANN proved that, computing averages via the 
microcanonical or canonical distributions, the essential 
property [3] was satisfied when changes in their 
parameters (i.e., u, v or β, v, respectively) induced 
changes du and dv on energy and volume, respectively. 
He also proved that the function s, whose 
existence is implied by [3], was the same function 
once expressed as a function of u, v (or of any pair 
of thermodynamic parameters, e.g., of T, v or p, u). 
A close examination of Boltzmann’s proof shows 
that [3] holds exactly in the canonical ensemble 
and up to corrections tending to 0 as Ω → ∞ in the 
microcanonical ensemble. Identity of thermodynamic 
functions evaluated in the two ensembles 
holds, as a consequence, up to corrections of this 
order. In addition, Gibbs added that the same held 
for the grand canonical ensemble. 
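
A simple way to see finite-size corrections of this kind is the ideal gas, which is used here only as an illustrative special case and is not the argument of the text. With m = 1 and total kinetic energy fixed at 3N/2, the microcanonical momenta are uniform on a sphere of squared radius 3N, and the fourth moment of one momentum component is 3n/(n + 2) with n = 3N, whereas the canonical (Gaussian, β = 1) value is 3. The sketch evaluates the exact expression for growing N and adds a small sampling check for N = 1; the function names are hypothetical.

```python
import math
import random

CANONICAL_FOURTH = 3.0  # fourth moment of a unit-variance Gaussian (beta = m = 1)

def exact_fourth_moment(n_dof):
    """<p_1^4> for momenta uniform on the sphere |P|^2 = n_dof."""
    return 3.0 * n_dof / (n_dof + 2.0)

def sampled_moments(n_dof, samples, seed=0):
    """Monte Carlo estimate of <p_1^2> and <p_1^4> on the same sphere."""
    rng = random.Random(seed)
    s2 = s4 = 0.0
    for _ in range(samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(n_dof)]
        scale = math.sqrt(n_dof / sum(x * x for x in g))  # project onto the sphere
        p1 = g[0] * scale
        s2 += p1 * p1
        s4 += p1 ** 4
    return s2 / samples, s4 / samples

mc2, mc4 = sampled_moments(3, 20_000)
table = [(n, exact_fourth_moment(3 * n)) for n in (1, 10, 100, 1000)]
for n_particles, m4 in table:
    print(n_particles, m4, CANONICAL_FOURTH - m4)
print(mc2, mc4)
```

The second moment agrees in both ensembles by construction, while the fourth moment differs by a term of order 1/N that disappears in the limit of large systems.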

Of course, not every collection of stationary 
probability distributions on phase space would 
provide a model for thermodynamics: Boltzmann 
called “orthodic” the collections of stationary 
distributions which generated models of thermo- 
dynamics through the above-mentioned identifica- 
tion of its elements with macroscopic equilibrium 
states. The microcanonical, canonical, and the later 
grand canonical ensembles are the chief examples 
of orthodic ensembles. Boltzmann and Gibbs 
proved these ensembles to be not only orthodic 
but to generate the same thermodynamic functions, 
that is to generate the same thermodynamics. 

This meant freedom from the analysis of the truth 
of the doubtful ergodic hypothesis (still unproved in 
any generality) or of the monocyclicity (manifestly 
false if understood literally rather than regarding the 
phase space as consisting of finitely many small, 
discrete cells), and allowed Gibbs to formulate the 
problem of statistical mechanics of equilibrium as 
follows. 


Problem Study the properties of the collection of 
probability distributions constituting (any) one of 
the above ensembles. 


However, the three ensembles just introduced by no
means exhaust the class of orthodic ensembles
producing the same models of thermodynamics in
the limit of infinitely large systems. The wealth of
ensembles with the orthodicity property, hence
leading to equivalent mechanical models of thermodynamics,
can be naturally interpreted in connection
with the phenomenon of phase transition (see the
section “Phase transitions and boundary conditions”).

Clearly, the quoted results do not “prove” 
that thermodynamic equilibria “are” described by 
the microcanonical, canonical, or grand canonical 
ensembles. However, they certainly show that, 
for most systems, independently of the number of 
degrees of freedom, one can define quite unambigu- 
ously a mechanical model of thermodynamics estab- 
lishing parameter-free, system-independent, physically 
important relations between thermodynamic quantities
(e.g., $\partial_u (p(u,v)/T(u,v)) \equiv \partial_v (1/T(u,v))$, from [3]).

The ergodic hypothesis which was at the root 
of the mechanical theorems on heat and entropy 
cannot be taken as a justification of their validity. 
Naively one would expect that the time scale 
necessary to see an equilibrium attained, called 
recurrence time scale, would have to be at least the 
time that a phase space point takes to visit all 
possible microscopic states of given energy: hence, 
an explanation of why the necessarily enormous size 
of the recurrence time is not a problem becomes 
necessary. 

In fact, the recurrence time can be estimated once
the phase space is regarded as discrete: for the
purpose of countering mounting criticism, Boltzmann
assumed that momentum was discretized in
units of $(2 m k_B T)^{1/2}$ (i.e., the average momentum
size) and space was discretized in units of $\rho^{-1/3}$
(i.e., the average spacing), implying a volume of
cells $h^3$ with $h \stackrel{def}{=} \rho^{-1/3} (2 m k_B T)^{1/2}$; then he calculated
that, even with such a gross discretization, a
cell representing a microscopic state of 1 cm$^3$ of
hydrogen at normal condition would require a time
(called “recurrence time”) of the order of $\sim 10^{10^{19}}$
times the age of the Universe (!) to visit the entire
energy surface. In fact, the phase space volume is
$\Gamma = (\rho^{-1} N (2 m k_B T)^{3/2})^{N} \equiv (N h^3)^{N}$ and the number of
cells of volume $h^{3N}$ is $\Gamma/(N!\, h^{3N}) \simeq e^{N}$; and the
time to visit all will be $e^{N} \tau_0$, with $\tau_0$ a typical
atomic unit, e.g., $10^{-12}$ s, but $N = 10^{19}$. In this
sense, the statement boldly made by young Boltzmann
that “aperiodic motions can be regarded as
periodic with infinite period” was even made
quantitative.
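As a rough check of the orders of magnitude quoted above, the following sketch evaluates the logarithm of the recurrence time in units of the age of the Universe. It assumes that the number of cells is about $e^{N}$, with $N = 10^{19}$ and $\tau_0 = 10^{-12}$ s as in the estimate above; the age of the Universe (about $4.3 \times 10^{17}$ s) and all variable names are assumptions introduced here for illustration only.

```python
import math

# Illustrative inputs (assumed values, not taken from a measurement).
N = 1e19                # number of molecules, order of magnitude used in the text
tau0 = 1e-12            # typical atomic time unit, in seconds
age_universe = 4.3e17   # age of the Universe in seconds (roughly 13.8 billion years)

# Number of cells ~ e^N, so the time to visit them all is ~ e^N * tau0.
# The number itself overflows any float, so only its base-10 logarithm is computed.
log10_ratio = N * math.log10(math.e) + math.log10(tau0) - math.log10(age_universe)

print(f"recurrence time / age of the Universe ~ 10^({log10_ratio:.3e})")
```

The exponent is itself an astronomically large number, so the prefactors $\tau_0$ and the age of the Universe play no role in the conclusion.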

The recurrence time is clearly so long as to be
irrelevant for all purposes: nevertheless, the correctness
of the microscopic theory of thermodynamics
can still rely on the microscopic dynamics once it is
understood (as stressed by Boltzmann) that the
reason why we observe approach to equilibrium,
and equilibrium itself, over “human” timescales
(which are far shorter than the recurrence times) is
due to the property that on most of the energy surface
the (very few) observables whose averages yield
macroscopic thermodynamic functions (namely pressure,
temperature, energy, ...) assume the same value
even if $N$ is only very moderately large (of the order of
$10^{3}$ rather than $10^{19}$). This implies that this value
coincides with the average and therefore satisfies the
heat theorem without any contradiction with the
length of the recurrence time. The latter rather
concerns the time needed for the generic observable to
thermalize, that is, to reach its time average: the
generic observable will indeed take a very long time to
“thermalize” but no one will ever notice, because the
generic observable (e.g., the position of a pre-identified
particle) is not relevant for thermodynamics.

The word “proof” is not used in the mathematical 
sense so far in this article: the relevance of a 
mathematically rigorous analysis was widely rea- 
lized only around the 1960s at the same time when 
the first numerical studies of the thermodynamic 
functions became possible and rigorous results were 
needed to check the correctness of various numerical 
simulations. 

For more details, the reader is referred to Boltzmann 
(1968a, b) and Gallavotti (1999). 


Thermodynamic Limit 


Adopting Gibbs’ axiomatic point of view, it is
interesting to see the path to be followed to achieve
an equivalence proof of the three ensembles introduced
in the section “Heat theorem and ergodic
hypothesis.”

A preliminary step is to consider, given a cubic
box $\Omega$ of volume $V = L^d$, the normalization factors
$Z^{gc}(\beta, \lambda, V)$, $Z^{c}(\beta, N, V)$, and $Z^{mc}(U, N, V)$ in [9],
[10], and [11], respectively, and to check that the
following thermodynamic limits exist:

$$\beta p_{gc}(\beta, \lambda) \stackrel{def}{=} \lim_{V \to \infty} \frac{1}{V} \log Z^{gc}(\beta, \lambda, V)$$

$$-\beta f_c(\beta, \rho) \stackrel{def}{=} \lim_{V \to \infty,\; N/V = \rho} \frac{1}{N} \log Z^{c}(\beta, N, V) \qquad [13]$$

$$k_B^{-1} s_{mc}(u, \rho) \stackrel{def}{=} \lim_{V \to \infty,\; N/V = \rho,\; U/N = u} \frac{1}{N} \log Z^{mc}(U, N, V)$$

where the density $\rho \stackrel{def}{=} v^{-1} \equiv N/V$ is used, instead of
$v$, for later reference. The normalization factors play
an important role because they have simple thermodynamic
interpretation (see the next section): they
are called grand canonical, canonical, and microcanonical
partition functions, respectively.


Not surprisingly, assumptions on the interparticle
potential $\varphi(q - q')$ are necessary to achieve an
existence proof of the limits in [13]. The assumptions
on $\varphi$ are not only quite general but also have a
clear physical meaning. They are


1. stability: that is, existence of a constant $B \ge 0$
such that $\sum_{i<j}^{N} \varphi(q_i - q_j) \ge -BN$ for all $N \ge 0$,
$q_1, \ldots, q_N \in R^d$, and

2. temperedness: that is, existence of constants $\varepsilon_0$,
$R > 0$ such that $|\varphi(q - q')| < B |q - q'|^{-d-\varepsilon_0}$ for
$|q - q'| > R$.


The assumptions are satisfied by essentially all 
microscopic interactions with the notable exceptions 
of the gravitational and Coulombic interactions, 
which require a separate treatment (and lead to 
somewhat different results on the thermodynamic 
behavior). 

For instance, assumptions (1), (2) are satisfied
if $\varphi(q)$ is $+\infty$ for $|q| < r_0$ and smooth for $|q| > r_0$,
for some $r_0 \ge 0$, and furthermore $\varphi(q) > B_0 |q|^{-(d+\varepsilon_0)}$
if $r_0 < |q| \le R$, while for $|q| > R$ it is
$|\varphi(q)| < B_1 |q|^{-(d+\varepsilon_0)}$, with some $B_0, B_1, \varepsilon_0 > 0$, $R > r_0$. Briefly,
$\varphi$ is fast diverging at contact and fast approaching 0
at large distance. This is called a (generalized)
Lennard-Jones potential. If $r_0 > 0$, $\varphi$ is called a
hard-core potential. If $B_1 = 0$, the potential is said
to have finite range. (See Appendix 1 for physical
implications of violations of the above stability and
temperedness properties.) However, in the following,
it will be necessary, both for simplicity and to contain
the length of the exposition, to restrict consideration
to the case $B_1 = 0$, i.e., to

$$\varphi(q) > B_0 |q|^{-(d+\varepsilon_0)}, \quad r_0 < |q| \le R, \qquad |\varphi(q)| \equiv 0, \quad |q| > R \qquad [14]$$

unless explicitly stated.

Assuming stability and temperedness, the existence
of the limits in [13] can be mathematically
proved: in Appendix 2, the proof of the first is
analyzed to provide the simplest example of the
technique. A remarkable property of the functions
$\beta p_{gc}(\beta, \lambda)$, $-\beta \rho f_c(\beta, \rho)$, and $\rho s_{mc}(u, \rho)$ is that they are
convex functions: hence, they are continuous in the
interior of their domains of definition and, at one
variable fixed, are differentiable with respect to the
other with at most countably many exceptions.

In the case of a potential without hard core
($\rho_{max} = \infty$), $-\rho f_c(\beta, \rho)$ can be checked to tend to 0
slower than $\rho$ as $\rho \to 0$, and to $-\infty$ faster than $-\rho$ as
$\rho \to \infty$ (essentially proportionally to $-\rho \log \rho$ in both
cases). Likewise, in the same case, $s_{mc}(u, \rho)$ can be
shown to tend to 0 slower than $u - u_{min}$ as $u \to u_{min}$,
and to $-\infty$ faster than $-u$ as $u \to \infty$. The latter
asymptotic properties can be exploited to derive, from
the relations between the partition functions in [13],

$$Z^{gc}(\beta, \lambda, V) = \sum_{N=0}^{\infty} e^{\beta \lambda N} Z^{c}(\beta, N, V), \qquad Z^{c}(\beta, N, V) = \int_{-BN}^{\infty} e^{-\beta U} Z^{mc}(U, N, V)\, dU \qquad [15]$$

and, from the above-mentioned convexity, the
consequences

$$\beta p_{gc}(\beta, \lambda) = \max_{v} \left( \beta \lambda v^{-1} - \beta v^{-1} f_c(\beta, v^{-1}) \right)$$

$$-\beta f_c(\beta, v^{-1}) = \max_{u} \left( -\beta u + k_B^{-1} s_{mc}(u, v^{-1}) \right) \qquad [16]$$

and that the maxima are attained in points, or
intervals, internal to the intervals of definition. Let
$v_{gc}$, $u_c$ be points where the maxima are, respectively,
attained in [16].

Note that the quantity $e^{\beta \lambda N} Z^{c}(\beta, N, V)/Z^{gc}(\beta, \lambda, V)$
has the interpretation of probability of a density
$v^{-1} = N/V$ evaluated in the grand canonical distribution.
It follows that, if the maximum in the first of
[16] is strict, that is, it is reached at a single point, the
values of $v^{-1}$ in closed intervals not containing the
maximum point $v_{gc}^{-1}$ have a probability behaving as
$< e^{-cV}$, $c > 0$, as $V \to \infty$, compared to the probability
of $v^{-1}$’s in any interval containing $v_{gc}^{-1}$. Hence, $v_{gc}$ has
the interpretation of average value of $v$ in the grand
canonical distribution, in the limit $V \to \infty$.
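The variational relation in the first of [16] can be checked numerically in the one case where everything is explicit. The sketch below assumes an ideal gas with the thermal wavelength set to 1, for which $-\beta f_c(\beta, \rho) = 1 - \log \rho$; this choice, the function names, and the use of a golden-section search are illustrative assumptions and are not part of the original argument.

```python
import math

def minus_beta_rho_f(rho):
    """-beta * rho * f_c for an ideal gas, thermal wavelength set to 1 (assumption)."""
    return rho * (1.0 - math.log(rho))

def beta_p_gc(beta_lambda, lo=1e-12, hi=1e3, iters=200):
    """Maximise beta*lambda*rho - beta*rho*f_c(rho) over the density rho.

    Uses a golden-section search, which is adequate here because the
    function being maximised is strictly concave in rho.
    Returns (maximum value, density at which it is attained).
    """
    g = lambda rho: beta_lambda * rho + minus_beta_rho_f(rho)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if g(c) < g(d):
            a = c      # the maximum lies in [c, b]
        else:
            b = d      # the maximum lies in [a, d]
    rho_star = 0.5 * (a + b)
    return g(rho_star), rho_star

value, rho_star = beta_p_gc(-1.0)
print(value, rho_star, math.exp(-1.0))
```

In this example the maximum is attained at $\rho = e^{\beta \lambda}$ and its value equals $\rho$ itself, that is, $\beta p = \rho$, the ideal-gas law.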

Likewise, the interpretation of

$$e^{-\beta u N} Z^{mc}(uN, N, V)/Z^{c}(\beta, N, V)$$

as probability in the canonical distribution of an
energy density $u$ shows that, if the maximum in the
second of [16] is strict, the values of $u$ in closed
intervals not containing the maximum point $u_c$ have
a probability behaving as $< e^{-cV}$, $c > 0$, as $V \to \infty$,
compared to the probability of $u$’s in any interval
containing $u_c$. Hence, in the limit $\Omega \to \infty$, the
average value of $u$ in the canonical distribution is $u_c$.
If the maxima are strict, [16] also establishes a
relation between the grand canonical density, the
canonical free energy and the grand canonical parameter
$\lambda$, or between the canonical energy, the microcanonical
entropy, and the canonical parameter $\beta$:

$$\lambda = \partial_{v^{-1}} \left( v^{-1} f_c(\beta, v^{-1}) \right) \big|_{v = v_{gc}}, \qquad k_B \beta = \partial_u s_{mc}(u_c, v^{-1}) \qquad [17]$$

where convexity and strictness of the maxima imply
the existence of the derivatives.


Remark Therefore, in the equivalence between
canonical and microcanonical ensembles, the canonical
distribution with parameters $(\beta, v)$ should
correspond with the microcanonical with parameters
$(u_c, v)$. The grand canonical distribution
with parameters $(\beta, \lambda)$ should correspond with the
canonical with parameters $(\beta, v_{gc})$.


For more details, the reader is referred to Ruelle 
(1969) and Gallavotti (1999). 


Physical Interpretation of 
Thermodynamic Functions 


The existence of the limits [13] implies several
properties of interest. The first is the possibility of
finding the physical meaning of the functions
$p_{gc}$, $f_c$, $s_{mc}$ and of the parameters $\lambda$, $\beta$.

Note first that, for all $V$, the grand canonical average
$\langle K \rangle_{\beta, \lambda}$ is $(d/2) \beta^{-1} \langle N \rangle_{\beta, \lambda}$, so that $\beta^{-1}$ is proportional to
the temperature $T_{gc} = T(\beta, \lambda)$ in the grand canonical
distribution: $\beta^{-1} = k_B T(\beta, \lambda)$. Proceeding heuristically,
the physical meaning of $p(\beta, \lambda)$ and $\lambda$ can be found
through the following remarks.

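Consider the microcanonical distribution $\mu^{mc}_{u,v}$, and
denote by $\int^*$ the integral over $(P, Q)$ extended to the
domain of the $(P, Q)$ such that $H(P, Q) = U$ and, at
the same time, $q_1 \in dV$, where $dV$ is an infinitesimal
volume surrounding the region $\Omega$. Then, by the
microscopic definition of the pressure $p$ (see the
introductory section), it is

$$p\, dV = \frac{N}{Z(U, N, V)} \int^* \delta\, \frac{2}{3} \frac{p_1^2}{2m} \frac{dP\, dQ}{N!\, h^{3N}} = \frac{2}{3\, Z(U, N, V)} \int^* \delta\, K(P) \frac{dP\, dQ}{N!\, h^{3N}} \qquad [18]$$

where $\delta = \delta(H(P, Q) - U)$. The RHS of [18] can be
compared with

$$\frac{\partial_V Z(U, N, V)\, dV}{Z(U, N, V)} = \frac{N}{Z(U, N, V)} \int^* \delta\, \frac{dP\, dQ}{N!\, h^{3N}}$$

to give

$$\frac{\partial_V Z\, dV}{Z} = N \frac{p\, dV}{(2/3) \langle K \rangle^*} = \beta p\, dV$$

because $\langle K \rangle^*$, which denotes the average $\int^* K / \int^* 1$,
should be essentially the same as the microcanonical
average $\langle K \rangle_{mc}$ (i.e., insensitive to the fact that one
particle is constrained to the volume $dV$) if $N$ is
large. In the limit $V \to \infty$, $V/N = v$, the latter
remark together with the second of [17] yields

$$k_B^{-1} \partial_v s_{mc}(u, v^{-1}) = \beta p(u, v), \qquad k_B^{-1} \partial_u s_{mc}(u, v^{-1}) = \beta \qquad [19]$$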


respectively. Note that $p \ge 0$ and it is not increasing
in $v$ because $s_{mc}(\rho)$ is concave as a function of
$v = \rho^{-1}$ (in fact, by the remark following [14],
$\rho s_{mc}(u, \rho)$ is convex in $\rho$ and, in general, if $\rho g(\rho)$ is
convex in $\rho$ then $g(v^{-1})$ is always concave in $v = \rho^{-1}$).
Hence, $ds_{mc}(u, v) = (du + p\, dv)/T$, so that taking
into account the physical meaning of $p$, $T$ (as
pressure and temperature, see the section “Pressure,
temperature, and kinetic energy”), $s_{mc}$ is, in thermodynamics,
the entropy. Therefore (see the second
of [16]), $-\beta f_c(\beta, \rho) = -\beta u_c + k_B^{-1} s_{mc}(u_c, \rho)$ becomes

$$f_c(\beta, \rho) = u_c - T_c s_{mc}(u_c, \rho), \qquad df_c = -p\, dv - s_{mc}\, dT \qquad [20]$$

and since $u_c$ has the interpretation (as mentioned in
the last section) of average energy in the canonical
distribution $\mu^{c}_{\beta, v}$, it follows that $f_c$ has the thermodynamic
interpretation of free energy (once compared
with the definition of free energy, $F = U - TS$,
in thermodynamics).


By [17] and [20],

$$\lambda = \partial_{v^{-1}} \left( v^{-1} f_c(\beta, v^{-1}) \right) \big|_{v = v_{gc}} \equiv u_c - T_c s_{mc} + p\, v_{gc}$$

and $v_{gc}$ has the meaning of specific volume $v$. Hence,
after comparison with the definition of chemical
potential, $\lambda N = U - TS + pV$, in thermodynamics, it
follows that the thermodynamic interpretation of $\lambda$
is the chemical potential and (see [16], [17]), the
grand canonical relation

$$\beta p_{gc}(\beta, \lambda) = \beta \lambda v_{gc}^{-1} + v_{gc}^{-1} \left( -\beta u_c + k_B^{-1} s_{mc}(u_c, v_{gc}^{-1}) \right)$$

shows that $p_{gc}(\beta, \lambda) \equiv p$, implying that $p_{gc}(\beta, \lambda)$ is
the pressure expressed, however, as a function of
temperature and chemical potential.

To go beyond the heuristic derivations above, it 
should be remarked that convexity and the property 
that the maxima in [16], [17] are reached in the 
interior of the intervals of variability of v or u are 
sufficient to turn the above arguments into rigorous 
mathematical deductions: this means that given [19] 
as definitions of $p(u, v)$, $\beta(u, v)$, the second of [20]
follows as well as $p_{gc}(\beta, \lambda) \equiv p(u_c, v_{gc})$. But the
values $v_{gc}$ and $u_c$ in [16] are not necessarily unique:
convex functions can contain horizontal segments
and therefore the general conclusion is that the
maxima may possibly be attained in intervals.
Hence, instead of a single $v_{gc}$, there might be a
whole interval $[v_-, v_+]$, where the rhs of [16] reaches
the maximum and, instead of a single $u_c$, there
might be a whole interval $[u_-, u_+]$ where the rhs of
[17] reaches the maximum.

Convexity implies that the values of $\lambda$ or $\beta$
for which the maxima in [16] or [17] are attained 
in intervals rather than in single points are rare 
(i.e., at most denumerably many): the interpretation 
is, in such cases, that the thermodynamic functions 
show discontinuities, and the corresponding 
phenomena are called phase transitions (see the 
next section). 


For more details the reader is referred to Ruelle 
(1969) and Gallavotti (1999). 


Phase Transitions and Boundary 
Conditions 


The analysis in the last two sections of the relations 
between elements of ensembles of distributions 
describing macroscopic equilibrium states not only 
allows us to obtain mechanical models of thermo- 
dynamics but also shows that the models, for a given 
system, coincide at least as $\Omega \to \infty$. Furthermore, the
equivalence between the thermodynamic functions 
computed via corresponding distributions in differ- 
ent ensembles can be extended to a full equivalence 
of the distributions. 

If the maxima in [16] are attained at single points
$v_{gc}$ or $u_c$, the equivalence should take place in the
sense that a correspondence between $\mu^{gc}_{\beta, \lambda}$, $\mu^{c}_{\beta, v}$, $\mu^{mc}_{u, v}$
can be established so that any local observable
$F(P, Q)$, defined as an observable depending
on $(P, Q)$ only through the $p_i, q_i$ with $q_i \in \Lambda$, where
$\Lambda \subset \Omega$ is a finite region, has the same average with
respect to corresponding distributions in the limit
$\Omega \to \infty$.

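The correspondence is established by considering
$(\lambda, \beta) \leftrightarrow (\beta, v_{gc}) \leftrightarrow (u_{mc}, v)$, where $v_{gc}$ is where the
maximum in [16] is attained, $u_{mc} \equiv u_c$ is where the
maximum in [17] is attained and $v_{gc} \equiv v$ (cf. also
[19], [20]). This means that the limits

$$\lim_{V \to \infty} \int F(P, Q)\, \mu^{a}(dP\, dQ) \stackrel{def}{=} \langle F \rangle_a \quad (a\text{-independent}), \qquad a = gc, c, mc \qquad [21]$$

coincide if the averages are evaluated by the
distributions $\mu^{gc}_{\beta, \lambda}$, $\mu^{c}_{\beta, v_{gc}}$, $\mu^{mc}_{u_{mc}, v_{gc}}$.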

Exceptions to [21] are possible, and are certainly
likely to occur at values of $u$, $v$ where the maxima in
[16] or [17] are attained in intervals rather than in 
isolated points; but this does not exhaust, in general, 
the cases in which [21] may not hold. 

However, no case in which [21] fails has to be 
regarded as an exception. It rather signals that an 
interesting and important phenomenon occurs. To 
understand it properly, it is necessary to realize that 
the grand canonical, canonical, and microcanonical 
families of probability distributions are by far not 
the only ensembles of probability distributions 
whose elements can be considered to generate 
models of thermodynamics, that is, which are 
orthodic in the sense of the discussion in the section 
“Equivalence of ensembles.” More general families 
of orthodic statistical ensembles of probability 
distributions can be very easily conceived. In
particular: 


Definition Consider the grand canonical, canonical,
and microcanonical distributions associated
with an energy function in which the potential
energy contains, besides the interaction $\Phi$ between
particles located inside the container, also the
interaction energy $\Phi_{in,out}$ between particles inside
the container and external particles, identical to the
ones in the container but not allowed to move and
fixed in positions such that in every unit cube $\Delta$
external to $\Omega$ there is a finite number of them
bounded independently of $\Delta$. Such configurations of
external particles will be called “boundary conditions
of fixed external particles.”


The thermodynamic limit with such boundary 
conditions is obtained by considering the grand 
canonical, canonical, and microcanonical distribu- 
tions constructed with potential energy function 
$\Phi + \Phi_{in,out}$ in containers $\Omega$ of increasing size taking
care that, while the size increases, the fixed particles
that would become internal to $\Omega$ are eliminated. The
argument used in the section “Thermodynamic limit” 
to show that the three models of thermodynamics, 
considered there, did define the same thermodynamic 
functions can be repeated to reach the conclusion that 
also the (infinitely many) “new” models of thermo- 
dynamics in fact give rise to the same thermodynamic 
functions and averages of local observables. Further- 
more, the values of the limits corresponding to [13] 
can be computed using the new partition functions 
and coincide with the ones in [13] (i.e., they are 
independent of the boundary conditions). 

However, it may happen, and in general it is 
the case, for many models and for particular values 
of the state parameters, that the limits in [21] do 
not coincide with the analogous limits computed 
in the new ensembles, that is, the averages of 
some local observables are unstable with respect 
to changes of boundary conditions with fixed 
particles. 

There is a very natural interpretation of such 
apparent ambiguity of the various models of 
thermodynamics: namely, at the values of the 
parameters that are selected to describe the macro- 
scopic states under consideration, there may corre- 
spond different equilibrium states with the same 
parameters. When the maximum in [16] is reached 
on an interval of densities, one should not think of 
any failure of the microscopic models for thermo- 
dynamics: rather one has to think that there are 
several states possible with the same $(\beta, \lambda)$, and that
they can be identified with the probability distribu- 
tions obtained by forming the grand canonical, 


canonical, or microcanonical distributions with 
different kinds of boundary conditions. 

For instance, a boundary condition with high 
density may produce an equilibrium state with 
parameters $(\beta, \lambda)$ which also has high density, i.e., the
density $v^{-1}$ at the right extreme of the interval in
which the maximum in [16] is attained, while using a
low-density boundary condition the limit in [21] may
describe the averages taken in a state with density $v^{-1}$
at the left extreme of the interval or, perhaps, with a
density intermediate between the two extremes. 
Therefore, the following definition emerges. 


Definition If the grand canonical distributions
with parameters $(\beta, \lambda)$ and different choices of
fixed external particles boundary conditions generate
for some local observable $F$ average values
which are different by more than a quantity $\delta > 0$
for all large enough volumes $\Omega$ then one says that
the system has a phase transition at $(\beta, \lambda)$. This
implies that the limits in [21], when existing, will
depend on the boundary condition and their values
will represent averages of the observables in
“different phases.” A corresponding definition is
given in the case of the canonical and microcanonical
distributions when, given $(\beta, v)$ or $(u, v)$, the
limit in [21] depends on the boundary conditions
for some $F$.


Remarks 


1. The idea is that by fixing one of the thermodynamic 
ensembles and by varying the boundary conditions 
one can realize all possible states of equilibrium of 
the system that can exist with the given values of 
the parameters determining the state in the chosen 
ensemble (i.e., $(\beta, \lambda)$, $(\beta, v)$, or $(u, v)$ in the grand
canonical, canonical, or microcanonical cases, 
respectively). 

2. The impression that in order to define a phase 
transition the thermodynamic limit is necessary 
is incorrect: the definition does not require 
considering the limit $\Omega \to \infty$. The phenomenon
that occurs is that by changing boundary condi- 
tions the average of a local observable can 
change at least by amounts independent of the 
system size. Hence, occurrence of a phase 
transition is perfectly observable in finite volume: 
it suffices to check that by changing boundary 
conditions the average of some observable 
changes by an amount whose minimal size is 
volume independent. It is a manifestation of an 
instability of the averages with respect to changes 
in boundary conditions: an instability which does 
not fade away when the boundary recedes to 
infinity, i.e., boundary perturbations produce 
bulk effects and at a phase transition the averages
of the local observable, if existing at all, will 
exhibit a nontrivial dependence on the boundary 
conditions. This is also called “long range order.” 

3. It is possible to show that when this happens then 
some thermodynamic function whose value is 
independent of the boundary condition (e.g., the 
free energy in the canonical distributions) has 
discontinuous derivatives in terms of the para- 
meters of the ensemble. This is in fact one of the 
frequently-used alternative definitions of phase 
transitions: the latter two natural definitions of 
first-order phase transition are equivalent. How- 
ever, it is very difficult to prove that a given system 
shows a phase transition. For instance, existence of 
a liquid-gas phase transition is still an open 
problem in systems of the type considered until 
the section “Lattice models” below. 

4. A remarkable unification of the theory of the 
equilibrium ensembles emerges: all distributions of 
any ensemble describe equilibrium states. If a 
boundary condition is fixed once and for all, then 
some equilibrium states might fail to be described 
by an element of an ensemble. However, if all 
boundary conditions are allowed then all equili- 
brium states should be realizable in a given 
ensemble by varying the boundary conditions. 

5. The analysis leads us to consider as completely 
equivalent without exceptions grand canonical, 
canonical, or microcanonical ensembles enlarged 
by adding to them the distributions with poten- 
tial energy augmented by the interaction with 
fixed external particles. 

6. The above picture is really proved only for 
special classes of models (typically in models 
in which particles are constrained to occupy 
points of a lattice and in systems with hard core 
interactions, $r_0 > 0$ in [14]) but it is believed to
be correct in general. At least it is consistent 
with all that is known so far in classical 
statistical mechanics. The difficulty is that, 
conceivably, one might even need boundary 
conditions more complicated than the fixed 
particles boundary conditions (e.g., putting 
different particles outside, interacting with 
the system with an arbitrary potential, rather 
than via $\varphi$).


The discussion of the equivalence of the ensembles 
and the question of the importance of boundary 
conditions has already imposed the consideration 
of several limits as $\Omega \to \infty$. Occasionally, it will
again come up. For conciseness, it is useful to set up 
a formal definition of equilibrium states of an 
infinite-volume system: although infinite volume is 


an idealization void of physical reality, it is never- 
theless useful to define such states because certain 
notions (e.g., that of pure state) can be sharply 
defined, with few words and avoiding wide circum- 
volutions, in terms of them. Therefore, let: 


Definition An infinite-volume state with parameters
$(\beta, v)$, $(u, v)$ or $(\beta, \lambda)$ is a collection of average values
$F \to \langle F \rangle$ obtained, respectively, as limits of finite-volume
averages $\langle F \rangle_{\Omega_n}$ defined from canonical, microcanonical,
or grand canonical distributions in $\Omega_n$, with
fixed parameters $(\beta, v)$, $(u, v)$ or $(\beta, \lambda)$ and with general
boundary condition of fixed external particles, on
sequences $\Omega_n \to \infty$ for which such limits exist simultaneously
for all local observables $F$.


Having set the definition of infinite-volume
state, consider a local observable $G(X)$ and let
$\tau_\xi G(X) = G(X + \xi)$, $\xi \in R^d$, with $X + \xi$ denoting the
configuration $X$ in which all particles are translated
by $\xi$: then an infinite-volume state is called
a pure state if for any pair of local observables
$F, G$ it is

$$\langle F\, \tau_\xi G \rangle - \langle F \rangle \langle \tau_\xi G \rangle \xrightarrow[\xi \to \infty]{} 0 \qquad [22]$$

which is called a cluster property of the pair $F, G$.

The result alluded to in remark (6) is that at least in 
the case of hard-core systems (or of the simple lattice 
systems discussed in the section “Lattice models”) the 
infinite-volume equilibrium states in the above sense 
exhaust at least the totality of the infinite-volume 
pure states. Furthermore, the other states that can be 
obtained in the same way are convex combinations of 
the pure states, i.e., they are “statistical mixtures” of
pure phases. Note that $\langle \tau_\xi G \rangle$ cannot be replaced, in
general, by (G) because not all infinite-volume states 
are necessarily translation invariant and in simple 
cases (e.g., crystals) it is even possible that no 
translation-invariant state is a pure state. 


Remarks 


1. This means that, in the latter models, general- 
izing the boundary conditions, for example 
considering external particles to be not identical 
to the ones inside the system, using periodic or 
partially periodic boundary conditions, or the 
widely used alternative of introducing a small 
auxiliary potential and first taking the infinite- 
volume states in presence of it and then letting 
the potential vanish, does not enlarge further the 
set of states (but may sometimes be useful: an 
example of a study of a phase transition by using 
the latter method of small fields will be given in 
the section “Continuous symmetries: ‘no d=2 
crystal’ theorem”). 
2. If $\chi$ is the indicator function of a local event, it
will make sense to consider the probability of
occurrence of the event in an infinite-volume state
defining it as $\langle \chi \rangle$. In particular, the probability
density for finding $p$ particles at $x_1, x_2, \ldots, x_p$,
called the p-point correlation function, will thus be
defined in an infinite-volume state. For instance,
if the state is obtained as a limit of canonical
states $\langle \cdot \rangle_{\Omega_n}$ with parameters $\beta, \rho$, $\rho = N_n/V_n$, in a
sequence of containers $\Omega_n$, then

$$\rho(x) = \lim_{n} \Big\langle \sum_{j=1}^{N_n} \delta(x - q_j) \Big\rangle_{\Omega_n}, \qquad \rho(x_1, x_2, \ldots, x_p) = \lim_{n} \Big\langle \sum_{i_1, \ldots, i_p} \prod_{j=1}^{p} \delta(x_j - q_{i_j}) \Big\rangle_{\Omega_n}$$

where the sum is over the ordered p-ples
$(i_1, \ldots, i_p)$. Thus, the pair correlation $\rho(q, q')$
and its possible cluster property are

$$\rho(q, q') \stackrel{def}{=} \lim_{n} \frac{\int \exp(-\beta U(q, q', q_1, \ldots, q_{N_n - 2}))\, dq_1 \cdots dq_{N_n - 2}}{(N_n - 2)!\, Z^{c}_{0}(\beta, \rho, V_n)}, \qquad \rho(q, (q' + \xi)) - \rho(q) \rho(q' + \xi) \xrightarrow[\xi \to \infty]{} 0 \qquad [23]$$

where

$$Z^{c}_{0}(\beta, \rho, V_n) \stackrel{def}{=} \frac{1}{N_n!} \int e^{-\beta U(Q)}\, dQ$$

is the “configurational” partition function.


The reader is referred to Ruelle (1969), Dobrushin
(1968), Lanford and Ruelle (1969), and Gallavotti
(1999).


Virial Theorem and Atomic Dimensions 


For a long time it has been doubted that “just 
changing boundary conditions” could produce such 
dramatic changes as macroscopically different states 
(i.e., phase transitions in the sense of the definition in 
the last section). The first evidence that by taking the 
thermodynamic limit very regular analytic functions 
like $N^{-1} \log Z^{c}(\beta, N, V)$ (as a function of $\beta$, $v = V/N$)
could develop, in the limit $\Omega \to \infty$, singularities like
discontinuous derivatives (corresponding to the maximum
in [16] being reached on a plateau and to a
consequent existence of several pure phases) arose in
van der Waals’ theory of liquid-gas transition.
Consider a real gas with $N$ identical particles with
mass $m$ in a container $\Omega$ with volume $V$. Let the
force acting on the $i$th particle be $f_i$; multiplying
both sides of the equations of motion, $m \ddot{q}_i = f_i$, by
$-(1/2) q_i$ and summing over $i$, it follows that

$$-\frac{1}{2} \sum_{i=1}^{N} m\, q_i \cdot \ddot{q}_i = -\frac{1}{2} \sum_{i=1}^{N} q_i \cdot f_i \stackrel{def}{=} \frac{1}{2} C(q)$$

and the quantity $C(q)$ defines the virial of the forces
in the configuration $q$. Note that $C(q)$ is not
translation invariant because of the presence of the
forces due to the walls.

Writing the force f; as a sum of the internal and 
the external forces (due to the walls) the virial C can 
be expressed naturally as sum of the virial $C_{int}$ of the
internal forces (translation invariant) and of the
virial $C_{ext}$ of the external forces.

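By dividing both sides of the definition of the
virial by $\tau$ and integrating over the time interval
$[0, \tau]$, one finds in the limit $\tau \to +\infty$, that is, up to
quantities relatively infinitesimal as $\tau \to \infty$, that

$$\langle K \rangle = \frac{1}{2} \langle C \rangle \quad \text{and} \quad \langle C_{ext} \rangle = 3pV$$

where $p$ is the pressure and $V$ the volume. Hence

$$\langle K \rangle = \frac{3}{2} pV + \frac{1}{2} \langle C_{int} \rangle$$

or

$$\frac{1}{\beta} = pv + \frac{\langle C_{int} \rangle}{3N} \qquad [24]$$

Equation [24] is Clausius’ virial theorem: in the case
of no internal forces, it yields $\beta pv = 1$, the ideal-gas
equation.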

The internal virial $C_{int}$ can be written, if
$f_{j \to i} = -\partial_{q_i} \varphi(q_i - q_j)$, as

$$C_{int} = -\sum_{i=1}^{N} \sum_{j \ne i} f_{j \to i} \cdot q_i \equiv \sum_{i<j} \partial_{q_i} \varphi(q_i - q_j) \cdot (q_i - q_j)$$

which shows that the contribution to the virial by
the internal repulsive forces is negative while that of
the attractive forces is positive. The average of $C_{int}$
can be computed by the canonical distribution,
which is convenient for the purpose. van der Waals
first used the virial theorem to perform an actual
computation of the corrections to the perfect-gas
laws. Simply neglect the third-order term in the
density and use the approximation $\rho(q_1, q_2) = \rho^2 e^{-\beta \varphi(q_1 - q_2)}$
for the pair correlation function, [23], then

$$\frac{1}{2} \langle C_{int} \rangle = V \frac{3}{2\beta} \rho^2 I(\beta) + V\, O(\rho^3) \qquad [25]$$

where

$$I(\beta) = \frac{1}{2} \int \left( e^{-\beta \varphi(q)} - 1 \right) d^3 q$$

and the equation of state [24] becomes

$$pv + \frac{I(\beta)}{\beta v} + O(v^{-2}) = \beta^{-1}$$
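The integral $I(\beta)$ is easy to estimate numerically once a potential is chosen. The sketch below assumes, for illustration, the Lennard-Jones form discussed in the next paragraph, in units where $\varepsilon = r_0 = 1$; the cutoff `r_max`, the midpoint rule, and the function names are choices made here and are not taken from the original text.

```python
import math

def lj(r, eps=1.0, r0=1.0):
    """Lennard-Jones potential 4*eps*((r0/r)**12 - (r0/r)**6)."""
    x6 = (r0 / r) ** 6
    return 4.0 * eps * (x6 * x6 - x6)

def I_of_beta(beta, eps=1.0, r0=1.0, r_max=20.0, n=200_000):
    """Midpoint-rule estimate of I(beta) = (1/2) * int (exp(-beta*phi) - 1) d^3q.

    The angular integration gives the factor 4*pi*r**2; the radial integral is
    truncated at r_max, where the integrand already decays like r**(-4).
    """
    h = r_max / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * h
        total += (math.exp(-beta * lj(r, eps, r0)) - 1.0) * 4.0 * math.pi * r * r
    return 0.5 * total * h

for beta in (1.0 / 3.0, 0.25, 0.1):
    print(f"beta*eps = {beta:.3f}  ->  I = {I_of_beta(beta):+.4f}")
```

In these units $I(\beta)$ is positive at low temperature, where attraction dominates, and negative at high temperature, where the repulsive core dominates; the temperature at which it changes sign is where the first correction to $\beta pv = 1$ vanishes.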
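For the purpose of illustration, the calculation of $I$
can be performed approximately at “high temperature”
($\beta$ small) in the case

$$\varphi(r) = 4\varepsilon \left( \left( \frac{r_0}{r} \right)^{12} - \left( \frac{r_0}{r} \right)^{6} \right)$$

(the classical Lennard-Jones potential), $\varepsilon, r_0 > 0$.
The result is $I \simeq -(b - \beta a)$, with $b = \frac{2\pi}{3} r_0^3$ and
$a = \frac{8}{3} \varepsilon b$, so that

$$\left( p + \frac{a}{v^2} \right) v = \left( 1 + \frac{b}{v} \right) \frac{1}{\beta} = \frac{1}{1 - b/v}\, \frac{1}{\beta} + O\!\left( \frac{1}{\beta v^2} \right), \qquad \left( p + \frac{a}{v^2} \right) (v - b)\, \beta = 1 + O(v^{-2}) \qquad [26]$$

which gives the equation of state for $\beta \varepsilon < 1$. Equation
[26] can be compared with the well-known empirical
van der Waals equation of state:

$$\beta \left( p + \frac{a}{v^2} \right) (v - b) = 1 \quad \text{or} \quad \left( p + \frac{A n^2}{V^2} \right) (V - nB) = nRT \qquad [27]$$

where, if $N_A$ is Avogadro’s number, $A = a N_A^2$,
$B = b N_A$, $R = k_B N_A$, $n = N/N_A$. It shows the possibility
of accessing the microscopic parameters $\varepsilon$ and
$r_0$ of the potential $\varphi$ via measurements detecting
deviations from the Boyle-Mariotte law, $\beta pv = 1$,
of the rarefied gases: $\varepsilon = 3a/8b = 3A/8BN_A$,
$r_0 = (3b/2\pi)^{1/3} = (3B/2\pi N_A)^{1/3}$.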
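The last two relations can be turned into a small calculation. The sketch below applies them to van der Waals constants of the size commonly tabulated for argon; these input values, the SI unit conversions, and the function name are assumptions made here for illustration, and the output should be read only as an order-of-magnitude estimate.

```python
import math

N_A = 6.02214076e23   # Avogadro's number, 1/mol
k_B = 1.380649e-23    # Boltzmann constant, J/K

def microscopic_parameters(A, B):
    """Return (eps, r0) from van der Waals constants.

    A is in Pa m^6 / mol^2 and B in m^3 / mol, using
    eps = 3A / (8 B N_A) and r0 = (3B / (2 pi N_A))**(1/3).
    """
    eps = 3.0 * A / (8.0 * B * N_A)
    r0 = (3.0 * B / (2.0 * math.pi * N_A)) ** (1.0 / 3.0)
    return eps, r0

# Illustrative inputs (assumed): constants of the size tabulated for argon.
A_argon = 0.1355      # Pa m^6 / mol^2, i.e. 1.355 bar L^2 / mol^2
B_argon = 3.201e-5    # m^3 / mol, i.e. 0.03201 L / mol

eps, r0 = microscopic_parameters(A_argon, B_argon)
print(f"eps / k_B ~ {eps / k_B:.0f} K,  r0 ~ {r0 * 1e10:.2f} angstrom")
```

The resulting energy scale is of the order of a few hundred kelvin and the length scale of a few angstroms, which is the sense in which macroscopic deviations from the ideal-gas law give access to atomic dimensions.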

As a final comment, it is worth stressing that the 
virial theorem gives in principle the exact correc- 
tions to the equation of state, in a rather direct and 
simple form, as time averages of the virial of the 
internal forces. Since the virial of the internal forces 
is easy to calculate from the positions of the 
particles as a function of time, the theorem provides 
a method for computing the equation of state in numerical simulations. In fact, this idea has been exploited in many numerical experiments, in which [24] plays a key role.

For more details, the reader is referred to Gallavotti (1999).


van der Waals Theory 


Equation [27] is empirically used beyond its validity 
region (small density and small $\beta$) by regarding A, B as
phenomenological parameters to be experimentally 
determined by measuring them near generic values of 
p, V, T. The measured values of A, B do not “usually 
vary too much” as functions of v, T and, apart from 
this small variability, the predictions of [27] have 
reasonably agreed with experience until, as experi- 
mental precision increased over the years, serious 
inadequacies eventually emerged. 

Certain consequences of [27] are appealing: for 
example, Figure 1 shows that it does not give a p 
monotonic nonincreasing in v if the temperature is 
small enough. A critical temperature can be defined as the largest value, $T_c$, of the temperature below which the graph of $p$ as a function of $v$ is not monotonic decreasing; the critical volume $V_c$ is the value of $v$ at the horizontal inflection point occurring for $T = T_c$.

For $T < T_c$ the van der Waals interpretation of the equation of state is that the function $p(v)$ may describe metastable states while the actual equilibrium states would follow an equation with a monotonic dependence on $v$ and $p(v)$ becoming horizontal in the coexistence region of specific volumes. The precise value of $p$ at which to draw the plateau (see Figure 1) would then be fixed by experiment or theoretically predicted via the simple rule that the plateau associated with the represented isotherm is drawn at a height such that the areas of the two cycles in the resulting loop are equal.

This is Maxwell’s rule, obtained by assuming that the isotherm curve joining the extreme points of the plateau and the plateau itself define a cycle (see Figure 1) representing a sequence of possible macroscopic equilibrium states (the ones corresponding to the plateau) or states with extremely long time of stability (“metastable”) represented by the curved part. This would be an isothermal Carnot cycle which, therefore, could not produce work: since the work produced in the cycle (i.e., $\oint p\,dv$) is the signed area enclosed by the cycle, the rule just means that the area is zero. The argument is doubtful, at least because it is not clear that the intermediate states with $p$ increasing with $v$ could be realized experimentally or could even be theoretically possible.

Figure 1 The van der Waals equation of state at a temperature $T < T_c$ where the pressure is not monotonic. The horizontal line illustrates the “Maxwell rule.”
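
Maxwell’s rule can be sketched numerically. The example below works, as an assumption of convenience, in reduced variables (pressure, volume and temperature measured in units of their critical values), in which [27] takes the form $p = 8t/(3v - 1) - 3/v^2$. All function names are hypothetical, and the bracketing intervals are meant for temperatures moderately below critical (say $0.5 \le t < 1$).

```python
import math

def pressure(v, t):
    """Reduced van der Waals isotherm: p = 8t/(3v - 1) - 3/v^2."""
    return 8.0 * t / (3.0 * v - 1.0) - 3.0 / (v * v)

def work(v1, v2, t):
    """Exact integral of p dv from v1 to v2 along the isotherm."""
    return ((8.0 * t / 3.0) * math.log((3.0 * v2 - 1.0) / (3.0 * v1 - 1.0))
            + 3.0 / v2 - 3.0 / v1)

def bisect(f, lo, hi, steps=200):
    """Root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def maxwell_plateau(t):
    """Return (p, v_l, v_g): plateau pressure and its end volumes, for t < 1."""
    slope = lambda v: -24.0 * t / (3.0 * v - 1.0) ** 2 + 6.0 / v ** 3
    v_min = bisect(slope, 1.0 / 3.0 + 1e-9, 1.0)  # local minimum of p(v)
    v_max = bisect(slope, 1.0, 3.0 / t)           # local maximum of p(v)

    def ends(p):
        # Outermost intersections of the horizontal line at height p
        v_l = bisect(lambda v: pressure(v, t) - p, 1.0 / 3.0 + 1e-9, v_min)
        v_g = bisect(lambda v: pressure(v, t) - p, v_max, (8.0 * t / p + 1.0) / 3.0 + 1.0)
        return v_l, v_g

    def signed_area(p):
        # Area between isotherm and plateau; it vanishes under Maxwell's rule
        v_l, v_g = ends(p)
        return work(v_l, v_g, t) - p * (v_g - v_l)

    p_lo = max(pressure(v_min, t), 0.0) + 1e-9
    p_hi = pressure(v_max, t) - 1e-9
    p_star = bisect(signed_area, p_lo, p_hi)
    v_l, v_g = ends(p_star)
    return p_star, v_l, v_g
```

The signed area is a decreasing function of the trial height, which is why a plain bisection on the plateau pressure is enough here.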

A striking prediction of [27], taken literally, is 
that the gas undergoes a gas-liquid phase transition 
with a critical point at a temperature $T_c$, volume $v_c$, and pressure $p_c$ that can be computed via [27] and are given by $RT_c = 8A/27B$, $V_c = 3B$ ($n = 1$).
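
The quoted critical values can be checked directly: at the critical point the isotherm has a horizontal inflection, so both $\partial p/\partial V$ and $\partial^2 p/\partial V^2$ vanish. The sketch below evaluates these derivatives of [27] for $n = 1$; the function names and the numerical values of $A$, $B$ used in any call are arbitrary illustrations.

```python
def vdw_pressure(V, T, A, B, R=1.0):
    """Pressure from [27] for one mole (n = 1)."""
    return R * T / (V - B) - A / V ** 2

def dp_dV(V, T, A, B, R=1.0):
    """First derivative of the isotherm with respect to V."""
    return -R * T / (V - B) ** 2 + 2.0 * A / V ** 3

def d2p_dV2(V, T, A, B, R=1.0):
    """Second derivative of the isotherm with respect to V."""
    return 2.0 * R * T / (V - B) ** 3 - 6.0 * A / V ** 4

def critical_point(A, B, R=1.0):
    """Critical temperature, volume and pressure: R*T_c = 8A/27B, V_c = 3B."""
    T_c = 8.0 * A / (27.0 * B * R)
    V_c = 3.0 * B
    p_c = vdw_pressure(V_c, T_c, A, B, R)
    return T_c, V_c, p_c
```

With these values the critical pressure comes out as $p_c = A/27B^2$, which follows from substituting $T_c$ and $V_c$ into [27].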

At the same time, the above prediction is interesting as it shows that there are simple relations between the critical parameters and the microscopic interaction constants, i.e., $\varepsilon \simeq k_BT_c$ and $r_0 \simeq (V_c/N_A)^{1/3}$: or more precisely $\varepsilon = 81k_BT_c/64$, $r_0 = (V_c/2\pi N_A)^{1/3}$ if a classical Lennard-Jones potential (i.e., $\varphi = 4\varepsilon((r_0/|q|)^{12} - (r_0/|q|)^{6})$; see the previous section) is used for the interaction potential $\varphi$.

However, [27] cannot be accepted uncritically, not only because of the approximations (essentially the neglecting of $O(v^{-1})$ in the equation of state), but mainly because, as remarked above, for $T < T_c$ the function $p$ is no longer monotonic in $v$ as it must be; see comment following [19].

The van der Waals equation, refined and comple- 
mented by Maxwell’s rule, predicts the following 
behavior: 


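$$(p - p_c) \propto (v - v_c)^{\delta},\quad \delta = 3,\quad T = T_c$$
$$(v_g - v_l) \propto (T_c - T)^{\beta},\quad \beta = 1/2,\quad \text{for } T \to T_c^- \qquad [28]$$

which are in sharp contrast with the experimental data gathered in the twentieth century. For the simplest substances, one finds instead $\delta \cong 5$, $\beta \cong 1/3$.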
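
The van der Waals values of the two exponents can be read off from an expansion around the critical point. The following is a sketch in reduced variables, which are an assumption of this illustration: $v/v_c = 1 + \phi$, $T/T_c = 1 + t$.

```latex
% Reduced van der Waals equation expanded around the critical point
\frac{p}{p_c} = \frac{8(1+t)}{3(1+\phi) - 1} - \frac{3}{(1+\phi)^2}
             = 1 + 4t - 6t\phi - \tfrac{3}{2}\phi^3 + O(t\phi^2, \phi^4)

% Critical isotherm (t = 0): cubic dependence on the volume
\frac{p - p_c}{p_c} \simeq -\tfrac{3}{2}\left(\frac{v - v_c}{v_c}\right)^{3}
\quad\Longrightarrow\quad \delta = 3

% Below T_c (t < 0): to leading order the loop -6t\phi - (3/2)\phi^3 is odd in \phi,
% so Maxwell's rule puts the plateau at p/p_c = 1 + 4t with end points
\phi_{\pm} = \pm 2\sqrt{-t}
\quad\Longrightarrow\quad
\frac{v_g - v_l}{v_c} \simeq 4\left(\frac{T_c - T}{T_c}\right)^{1/2},
\qquad \beta = \tfrac12
```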

Finally, blind faith in the equation of state [27] is untenable also because nothing in the analysis would change if the space dimension was $d = 2$ or $d = 1$: but for $d = 1$, it is easily proved that the system, if the interaction decays rapidly at infinity, does not undergo phase transitions (see next section).

In fact, it is now understood that van der Waals’ 
equation represents rigorously only a limiting situa- 
tion, in which particles have a hard-core interaction 
(or a strongly repulsive one at close distance) and a 
further smooth interaction $\varphi$ with very long range. More precisely, suppose that the part of the potential outside a hard-core radius $r_0 > 0$ is attractive (i.e., nonpositive) and has the form $\gamma^d\varphi_1(\gamma|q|) \le 0$, and call $P_0(v)$ the ($\beta$-independent) product of $\beta$ times the pressure of the hard-core system without any attractive tail ($P_0(v)$ is not explicitly known except if $d = 1$, in which case it is $P_0(v)(v - b) = 1$, $b = r_0$), and let

$$a = \frac12\int_{|q| > r_0}|\varphi_1(q)|\,dq$$

If $p(\beta, v; \gamma)$ is the pressure when $\gamma > 0$ then it can be proved that

$$\beta p(\beta, v) = \lim_{\gamma\to 0}\beta p(\beta, v; \gamma) = \left[-\frac{\beta a}{v^2} + P_0(v)\right]_{\text{Maxwell's rule}} \qquad [29]$$

where the subscript means that the graph of $p(\beta, v)$ as a function of $v$ is obtained from the function in square bracket by applying to it Maxwell’s rule, described above in the case of the van der Waals equation. Equation [29] reduces exactly to the van der Waals equation for $d = 1$, and for $d > 1$ it leads to an equation with identical critical behavior (even though $P_0(v)$ cannot be explicitly computed).

The reader is referred to Lebowitz and Penrose 
(1979) and Gallavotti (1999) for more details. 


Absence of Phase Transitions: d = 1 


One of the most quoted no-go theorems in statistical 
mechanics is that one-dimensional systems of parti- 
cles interacting via short-range forces do not exhibit 
phase transitions (cf. the next section) unless the 
somewhat unphysical situation of having zero 
absolute temperature is considered. This is particu- 
larly easy to check in the case of “nearest-neighbor 
hard-core interactions.” Let the hard-core size be $r_0$, so that the interaction potential $\varphi(r) = +\infty$ if $r < r_0$, and suppose also that $\varphi(r) \equiv 0$ if $r > 2r_0$. In this
case, the thermodynamic functions can be exactly 
computed and checked to be analytic: hence the 
equation of state cannot have any phase transition 
plateau. This is a special case of van Hove’s theorem 
establishing smoothness of the equation of state for 
interactions extending beyond the nearest neighbor 
and rapidly decreasing at infinity. 
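
One standard way to carry out this exact computation (not spelled out in the text, and used here as an assumption) is the constant-pressure ensemble, in which nearest-neighbor gaps become independent and the specific volume is $v = -\partial\log\zeta/\partial(\beta p)$ with $\zeta = \int_{r_0}^{\infty} e^{-\beta\varphi(r) - \beta p r}\,dr$. The sketch below uses a hypothetical square well, $\varphi = -u$ for $r_0 < r < 2r_0$ and $\varphi = 0$ beyond, for which $\zeta$ has a closed form.

```python
import math

def log_zeta(beta_p, beta_u, r0=1.0):
    """log of the one-gap integral for a hard core r0 plus a square well of depth u.

    beta_p is beta times the pressure, beta_u is beta times the well depth.
    """
    inner = math.exp(beta_u) * (math.exp(-beta_p * r0) - math.exp(-2.0 * beta_p * r0))
    outer = math.exp(-2.0 * beta_p * r0)
    return math.log((inner + outer) / beta_p)

def specific_volume(beta_p, beta_u, r0=1.0, h=1e-6):
    """v = -d log(zeta) / d(beta p), estimated by a central difference."""
    return -(log_zeta(beta_p + h, beta_u, r0) - log_zeta(beta_p - h, beta_u, r0)) / (2.0 * h)
```

For `beta_u = 0` this reduces to hard rods, $\beta p\,(v - r_0) = 1$, and for any well depth $v$ is a smooth, strictly decreasing function of the pressure, so no plateau can appear.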

If the definition of phase transition based on the 
sensitivity of the thermodynamic limit to variations 
of boundary conditions is adopted then a more 
general, conceptually simple, argument can be given 
to show that in one-dimensional systems there 
cannot be any phase transition if the potential 
energy of mutual interaction between a configuration $Q$ of particles to the left of a reference particle (located at the origin $O$, say) and a configuration $Q'$ to the right of the particle (with $Q \cup O \cup Q'$ compatible with the hard cores) is uniformly bounded below. Then a mathematical
proof can be devised showing that the influence of 
boundary conditions disappears as the boundaries 
recede to infinity. One also says that no long-range 
order can be established in a one-dimensional case, 
in the sense that one loses any trace of the boundary 
conditions imposed. 

The analysis fails if the space dimension is $\ge 2$: in
this case, even if the interaction is short-ranged, the 
energy of interaction between two regions of space 
separated by a boundary is of the order of the 
boundary area. Hence, one cannot bound above and 
below the probability of any two configurations in 
two half-spaces by the product of the probabilities 
of the two configurations, each computed as if the 
other was not there. This is because such a bound 
would be proportional to the exponential of the 
surface of separation, which tends to $\infty$ when the
surface grows large. This means that we cannot 
consider, at least not in general, the configurations 
in the two half-spaces as independently distributed. 

Analytically, a condition on the potential suffi- 
cient to imply that the energy between a configura- 
tion to the left and one to the right of the origin is 
bounded below, if d=1, is simply expressed by 


$$\int_{r'}^{\infty} r\,|\varphi(r)|\,dr < +\infty \qquad \text{for } r' > r_0$$


Therefore, in order to have phase transitions in 
d= 1, a potential is needed that is “so long range” 
that it has a divergent first moment. It can be 
shown by counterexamples that if the latter condi- 
tion fails there can be phase transitions even in 
d= 1 systems. 
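
As a worked illustration of the first-moment condition, consider a hypothetical power-law tail $|\varphi(r)| = C r^{-\alpha}$ for $r > r_0$ (the constant $C$ and the exponent $\alpha$ are assumptions of this example):

```latex
\int_{r'}^{\infty} r\,|\varphi(r)|\,dr
  = C \int_{r'}^{\infty} r^{1-\alpha}\,dr
  = \begin{cases}
      \dfrac{C\,(r')^{2-\alpha}}{\alpha - 2} < +\infty, & \alpha > 2, \\[2ex]
      +\infty, & \alpha \le 2 .
    \end{cases}
```

So, within this family, only tails decaying no faster than $r^{-2}$ escape the one-dimensional no-go argument.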

The results just quoted also apply to discrete 
models like lattice gases or lattice spin models that 
will be considered later in the article. 

For more details, we refer the reader to Landau 
and Lifschitz (1967), Dyson (1969), Gallavotti 
(1999), and Gallavotti et al. (2004). 


Continuous Symmetries: “No d = 2 Crystal” Theorem


A second case in which it is possible to rule out 
existence of phase transitions or at least of certain 
kinds of transitions arises when the system under 
analysis enjoys large symmetry. By symmetry is 
meant a group of transformations acting on the 
configurations and transforming each of them into a configuration which, at least for one boundary condition (e.g., periodic or open), has the same energy.

A symmetry is said to be “continuous” if the group of transformations is a continuous group. For instance, continuous systems have translational symmetry if considered in a container $\Omega$ with periodic boundary conditions. Systems with “too much symmetry” sometimes cannot show phase transitions. For instance, the continuous translation symmetry of a gas in a container $\Omega$ with periodic boundary conditions is sufficient to exclude the possibility of crystallization in dimension $d = 2$.

To discuss this, which is a prototype of a proof 
which can be used to infer absence of many 
transitions in systems with continuous symmetries, 
consider the translational symmetry and a potential 
satisfying, besides the usual [14] and with the symbols used in [14], the further property that $|q|^2\,|\partial^2_{ij}\varphi(q)| < B|q|^{-(d+\varepsilon_0)}$, with $\varepsilon_0 > 0$, for some $B$ holds for $r_0 < |q| < R$. This is a very mild extra
requirement (and it allows for a hard-core 
interaction). 

Consider an “ideal crystal” on a square lattice (for simplicity) of spacing $a$, exactly fitting in its container $\Omega$ of side $L$ assumed with periodic boundary conditions: so that $N = (L/a)^d$ is the number of particles and $a^{-d}$ is the density, which is supposed to be smaller than the close packing density if the interaction $\varphi$ has a hard core. The probability distribution of the particles is rather trivial:

$$\overline{\mu} = \frac{1}{N!}\sum_{p}\prod_{n}\delta(q_{p(n)} - a\,n)\;dq_1\cdots dq_N$$

the sum running over the permutations $m \to p(m)$ of the sites $m \in \Omega$, $m \in \mathbb{Z}^d$, $0 < m_i \le La^{-1}$. The density at $q$ is

$$\hat\rho(q) = \sum_{n}\delta(q - a\,n) \equiv \Big\langle\sum_{j=1}^{N}\delta(q - q_j)\Big\rangle$$

and its Fourier transform is proportional to

$$\rho(k) \overset{\rm def}{=} \frac1N\Big\langle\sum_{j=1}^{N}e^{-ik\cdot q_j}\Big\rangle, \qquad k = \frac{2\pi}{L}\,n,\quad n \in \mathbb{Z}^d$$

$\rho(k)$ has value 1 for all $k$ of the form $K = (2\pi/a)n$ and $(1/N)\,O(\max_{c=1,2}|e^{ik_ca} - 1|^{-2})$ otherwise. In presence of interaction, it has to be expected that, in a crystal state, $\rho(k)$ has peaks near the values $K$: but the value of $\rho(k)$ can depend on the boundary conditions.
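
The statement about the Fourier transform of the ideal crystal is easy to check by direct summation. The sketch below does so for $d = 2$; the function name and the labeling of the sites by $0 \le m_1, m_2 < M$ (equivalent, by periodicity, to $0 < m_i \le L/a$) are choices made for this illustration.

```python
import cmath
import math

def rho_hat(n, M, a=1.0):
    """(1/N) * sum_j exp(-i k.q_j) for the ideal square crystal in d = 2.

    The box has side L = M*a, the particles sit at q = a*(m1, m2) with
    0 <= m1, m2 < M, and the wave vector is k = (2*pi/L) * n for an
    integer pair n.
    """
    L = M * a
    k = (2.0 * math.pi * n[0] / L, 2.0 * math.pi * n[1] / L)
    total = 0j
    for m1 in range(M):
        for m2 in range(M):
            total += cmath.exp(-1j * (k[0] * a * m1 + k[1] * a * m2))
    return total / (M * M)
```

For `M = 8`, the choice `n = (8, 0)` corresponds to the reciprocal vector $K = (2\pi/a)(1, 0)$ and gives modulus 1, while a generic allowed `n` such as `(3, 5)` gives a vanishing sum.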

Since the system is translation invariant, a crystal state defined as a state with a distribution “close” to $\overline{\mu}$, i.e., with $\hat\rho(q)$ with peaks at the ideal lattice points $q = na$, cannot be realized under periodic boundary conditions, even when the system state is crystalline. To realize such a state, a symmetry-breaking term is needed in the interaction.

This can be done in several ways, for example, by 
changing the boundary condition. Such a choice 
implies a discussion of how much the boundary 
conditions influence the positions of the peaks of 
p(k): for instance, it is not obvious that a boundary 
condition will not generate a state with a period 
different from the one that a priori has been selected 
for disproval (a possibility which would imply a 
reciprocal lattice of K’s different from the one 
considered to begin with). Therefore, here the choice will be to imagine that an external weak force with potential $\varepsilon W(q)$ acts, forcing a symmetry breaking that favors the occupation of regions around the points of the ideal lattice (which would mark the average positions of the particles in the crystal state that is being sought). The proof (Mermin’s theorem) that no equilibrium state exists with particle distribution “close” to $\overline{\mu}$, i.e., with peaks in place of the delta functions (see below), is essentially reproduced below.

Take $W(q) = \sum_{na \in \Omega}\chi(q - na)$, where $\chi(q) \le 0$ is smooth and zero everywhere except in a small vicinity of the lattice points around which it decreases to some negative minimum keeping a rotation symmetry around them. The potential $W$ is invariant under translations by the lattice steps. By the choice of the boundary condition and $\varepsilon W$, the density $\hat\rho_\varepsilon(q)$ will be periodic with period $a$ so that $\rho_\varepsilon(k)$ will, possibly, not have a vanishing limit as $N \to \infty$ only if $k$ is a reciprocal vector $K = (2\pi/a)n$. If the potential is $\varphi + \varepsilon W$ and if there exists a crystal state in which particles have higher probability of being near the lattice points $na$, it should be expected that for small $\varepsilon > 0$ the system will be found in a state with Fourier transform of the density, $\rho_\varepsilon(k)$, satisfying, for some vector $K \ne 0$ in the reciprocal lattice,

$$\lim_{\varepsilon\to 0}\lim_{N\to\infty}|\rho_\varepsilon(K)| = r > 0 \qquad [30]$$


that is, the requirement is that uniformly in $\varepsilon \to 0$ the Fourier transform of the density has a peak at some $K \ne 0$. Note that if $k$ is not in the reciprocal lattice $\rho_\varepsilon(k) \to 0$ as $N \to \infty$, being bounded above by

$$\frac1N\,O\Big(\max_{c=1,2}|e^{ik_ca} - 1|^{-2}\Big)$$

because $(1/N)\hat\rho_\varepsilon$ is periodic and its integral over $q$ is equal to 1. Hence, excluding the existence of a crystal will be identified with the impossibility of [30]. Other criteria can be imagined, for example, considering crystals with a lattice different from simple cubic, which lead to the same result by following the same technique. Nevertheless, it is not mathematically excluded (but unlikely) that, with some weaker existence definition, a crystal state could be possible even in two dimensions.

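The following inequalities hold under the present assumptions on the potential and in the canonical distribution with periodic boundary conditions and parameters $(\beta, \rho)$, $\rho = a^{-d}$, in a box $\Omega$ with side multiple of $a$ (so that $N = (La^{-1})^d$) and potential of interaction $\varphi + \varepsilon W$. The further assumption that the lattice $na$ is not a close-packed lattice is (of course) necessary when the interaction potential has a hard core. Then, for suitable $B_0, B, B_1, B_2 > 0$, independent of $N$ and $\varepsilon$, and for $|\kappa| < \pi/a$ and for all $\Omega$ (if $K \ne 0$)

$$\frac1N\Big\langle\Big|\sum_{j=1}^{N}e^{-i(\kappa + K)\cdot q_j}\Big|^2\Big\rangle \ge B\,\frac{(\rho_\varepsilon(K) + \rho_\varepsilon(K + 2\kappa))^2}{B_1\kappa^2 + \varepsilon B_2}$$

$$\frac1N\sum_{\kappa}\gamma(\kappa)\,\frac1N\Big\langle\Big|\sum_{j=1}^{N}e^{-i(K + \kappa)\cdot q_j}\Big|^2\Big\rangle \le B_0 < \infty \qquad [31]$$

where the averages are in the canonical distribution $(\beta, \rho)$ with periodic boundary conditions and a symmetry-breaking potential $\varepsilon W(q)$; $\gamma(\kappa) \ge 0$ is an (arbitrary) smooth function vanishing for $2|\kappa| \ge \delta$ with $\delta < 2\pi/a$ and $B_0$ depends on $\gamma$. See Appendix 3 for a derivation of [31].

Multiplying both sides of the first equation in [31] by $N^{-1}\gamma(\kappa)$ and summing over $\kappa$, the crystallinity condition in the form [30] implies

$$B_0 \ge B\,r^2a^d\int_{|\kappa| < \delta}\frac{\gamma(\kappa)\,d\kappa}{\kappa^2B_1 + \varepsilon B_2}$$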

For $d = 1, 2$ the integral diverges, as $\varepsilon^{-1/2}$ or $\log\varepsilon^{-1}$, respectively, implying $|\rho_\varepsilon(K)| \to r = 0$ as $\varepsilon \to 0$: the criterion of crystallinity [30] cannot be satisfied if $d = 1, 2$.

The above inequality is an example of a general class of inequalities called infrared inequalities, stemming from another inequality called Bogoliubov’s inequality (see Appendix 3), which lead to the proof that certain kinds of ordered phases cannot exist if the dimension of the ambient space is $d = 2$ when a finite volume, under suitable boundary conditions (e.g., periodic), shows a continuous symmetry. The excluded phenomenon is, more precisely, the existence of equilibrium states exhibiting, in the thermodynamic limit, a symmetry lower than the continuous symmetry holding in a finite volume.
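
The dimension dependence of the divergence can be made explicit in a simplified case. The following sketch assumes, for illustration only, that the cutoff function equals 1 on a ball $|\kappa| < \delta'$ (with $\delta'$ a hypothetical radius) and evaluates the radial integral.

```latex
\int_{|\kappa|<\delta'} \frac{d^d\kappa}{B_1\kappa^2 + \varepsilon B_2} =
\begin{cases}
\dfrac{2}{\sqrt{B_1B_2\,\varepsilon}}\,
   \arctan\!\Big(\delta'\sqrt{\tfrac{B_1}{\varepsilon B_2}}\Big)
   \;\sim\; \dfrac{\pi}{\sqrt{B_1B_2}}\;\varepsilon^{-1/2}, & d = 1, \\[2.5ex]
\dfrac{\pi}{B_1}\,\log\!\Big(1 + \dfrac{B_1\delta'^2}{\varepsilon B_2}\Big)
   \;\sim\; \dfrac{\pi}{B_1}\,\log\varepsilon^{-1}, & d = 2, \\[2.5ex]
\dfrac{4\pi}{B_1}\Big[\delta' - \sqrt{\tfrac{\varepsilon B_2}{B_1}}\,
   \arctan\!\Big(\delta'\sqrt{\tfrac{B_1}{\varepsilon B_2}}\Big)\Big]
   \;\longrightarrow\; \dfrac{4\pi\delta'}{B_1}, & d = 3,
\end{cases}
\qquad \varepsilon \to 0 .
```

Only for $d \ge 3$ does the right-hand side stay bounded, so that a positive $r$ is compatible with the finite constant $B_0$.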
In general, existence of thermodynamic equilibrium states with symmetry lower than the symmetry enjoyed by the system in finite volume and under suitable boundary conditions is called a “spontaneous symmetry breaking.” It is yet another manifestation of instability with respect to changes in boundary conditions, hence its occurrence reveals a phase transition. There is a large class of systems for which an infrared inequality implies absence of spontaneous symmetry breaking: in most of the one- or two-dimensional systems a continuous symmetry cannot be spontaneously broken.

The limitation to dimension $d \le 2$ is a strong
limitation to the generality of the applicability of 
infrared theorems to exclude phase transitions. 
More precisely, systems can be divided into classes 
each of which has a “critical dimension” below 
which too much symmetry implies absence of 
phase transitions (or of certain kinds of phase 
transitions). 

It should be stressed that, at the critical dimen- 
sion, the symmetry breaking is usually so weakly 
forbidden that one might need astronomically large 
containers to destroy small effects (due to boundary 
conditions or to very small fields) which break the 
symmetry. For example, in the crystallization just discussed, the Fourier transform peaks are only bounded by $O(1/\sqrt{\log\varepsilon^{-1}})$. Hence, from a practical
point of view, it might still be possible to have some 
kind of order even in large containers. 

The reader is referred to Mermin (1968), Hohen- 
berg (1969), and Ruelle (1969). 


High Temperature and Small Density 


There is another class of systems in which no phase 
transitions take place. These are the systems with 
stable and tempered interactions $\varphi$ (e.g., those satisfying [14]) in the high-temperature and low-density region. The property is obtained by showing that the equation of state is analytic in the variables $(\beta, \rho)$ near the origin $(0,0)$.

A simple algorithm (Mayer’s series) yields the 
coefficients of the virial series 


$$\beta p(\beta, \rho) = \rho + \sum_{k=2}^{\infty}c_k(\beta)\,\rho^k$$

It has the drawback that the $k$th order coefficient $c_k(\beta)$ is expressed as a sum of many terms (a number growing more than exponentially fast in the order $k$) and it is not so easy (but possible) to show combinatorially that their sum is bounded exponentially in $k$ if $\beta$ is small enough. A more efficient approach leads quickly to the desired solution.


Denoting $\Phi(q_1,\ldots,q_n) \overset{\rm def}{=} \sum_{i<j}\varphi(q_i - q_j)$, consider the (“spatial or configurational”) correlation functions defined, in the grand canonical distribution with parameters $\beta, \lambda$ (and empty boundary conditions), by

$$\rho_\Omega(q_1,\ldots,q_n) \overset{\rm def}{=} \frac{1}{Z^{gc}(\beta,\lambda,V)}\sum_{m=0}^{\infty}z^{n+m}\int_\Omega e^{-\beta\Phi(q_1,\ldots,q_n,y_1,\ldots,y_m)}\,\frac{dy_1\cdots dy_m}{m!} \qquad [32]$$

This is the probability density for finding particles with any momentum in the volume element $dq_1\cdots dq_n$ (irrespective of where other particles are), and $z = e^{\beta\lambda}\big(\sqrt{2\pi m\beta^{-1}h^{-2}}\big)^d$ accounts for the integration over the momenta variables and is called the activity: it has the dimension of a density (cf. [23]).

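Assuming that the potential has a hard core (for simplicity) of radius $R$, the interaction energy $\Phi_{q_1}(q_2,\ldots,q_n)$ of a particle at $q_1$ with any number of other particles at $q_2,\ldots,q_n$ with $|q_i - q_j| > R$ is bounded below by $-B$ for some $B > 0$ (related but not equal to the $B$ in [14]). The functions $\rho_\Omega$ will be regarded as a sequence of functions “of one, two, ... particle positions”: $\rho_\Omega = \{\rho_\Omega(q_1,\ldots,q_n)\}_{n=1}^{\infty}$ vanishing for $q_j \notin \Omega$. Then, one checks that

$$\rho_\Omega(q_1,\ldots,q_n) = z\,\delta_{n,1}\chi_\Omega(q_1) + z\,K\rho_\Omega(q_1,\ldots,q_n) \qquad [33a]$$

with

$$K\rho_\Omega(q_1,\ldots,q_n) \overset{\rm def}{=} e^{-\beta\Phi_{q_1}(q_2,\ldots,q_n)}\Big(\rho_\Omega(q_2,\ldots,q_n)\,\delta_{n>1} + \sum_{s=1}^{\infty}\int_\Omega\frac{dy_1\cdots dy_s}{s!}\prod_{k=1}^{s}\big(e^{-\beta\varphi(q_1 - y_k)} - 1\big)\,\rho_\Omega(q_2,\ldots,q_n,y_1,\ldots,y_s)\Big) \qquad [33b]$$

where $\delta_{n,1}$, $\delta_{n>1}$ are Kronecker deltas and $\chi_\Omega(q)$ is the indicator function of $\Omega$. Equation [33] is called the Kirkwood-Salzburg equation for the family of correlation functions in $\Omega$. The kernel $K$ of the equations is independent of $\Omega$, but the domain of integration is $\Omega$.

Calling $\alpha_\Omega$ the sequence of functions $\alpha_\Omega(q_1,\ldots,q_n) \equiv 0$ if $n \ne 1$ and $\alpha_\Omega(q) = \chi_\Omega(q)$, a recursive expansion arises, namely

$$\rho_\Omega = z\alpha_\Omega + z^2K\alpha_\Omega + z^3K^2\alpha_\Omega + z^4K^3\alpha_\Omega + \cdots \qquad [34]$$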

It gives the correlation functions, provided the series converges. The inequality

$$|K^p\alpha_\Omega(q_1,\ldots,q_n)| \le e^{(2\beta B+1)p}\Big(\int|e^{-\beta\varphi(q)} - 1|\,d^3q\Big)^p \overset{\rm def}{=} e^{(2\beta B+1)p}\,r(\beta)^{3p} \qquad [35]$$

shows that the series [34], called Mayer’s series, converges if $|z| < e^{-(2\beta B+1)}r(\beta)^{-3}$. Convergence is uniform (as $\Omega \to \infty$) and $(K^p)\alpha_\Omega(q_1,\ldots,q_n)$ tends to a limit as $V \to \infty$ at fixed $q_1,\ldots,q_n$ and the limit is simply $(K^p\alpha)(q_1,\ldots,q_n)$, if $\alpha(q_1,\ldots,q_n) \equiv 0$ for $n \ne 1$, and $\alpha(q_1) \equiv 1$. This is because the kernel $K$ contains the factors $(e^{-\beta\varphi(q_1 - y)} - 1)$ which decay rapidly or, if $\varphi$ has finite range, will eventually even vanish. It is also clear that $(K^p\alpha)(q_1,\ldots,q_n)$ is translation invariant.

Hence, if $|z|e^{2\beta B+1}r(\beta)^3 < 1$, the limits, as $\Omega \to \infty$, of the correlation functions exist and can be computed by a convergent power series in $z$; the correlation functions will be translation invariant (in the thermodynamic limit).

In particular, the one-point correlation function $\rho = \rho(q)$ is $\rho = z(1 + O(z\,r(\beta)^3))$, which, to lowest order in $z$, just shows that activity and density essentially coincide when they are small enough. Furthermore, $\beta p_\Omega = (1/V)\log Z^{gc}(\beta,\lambda,V)$ is such that

$$z\,\partial_z\,\beta p_\Omega = \frac1V\int\rho_\Omega(q)\,dq$$

(from the definition of $\rho_\Omega$ in [32]). Therefore,

$$\beta p(\beta, z) = \lim_{V\to\infty}\frac1V\log Z^{gc}(\beta,\lambda,V) = \int_0^z\frac{dz'}{z'}\,\rho(\beta, z') \qquad [36]$$

and, since the density $\rho$ is analytic in $z$ as well and $\rho \simeq z$ for $z$ small, the grand canonical pressure is analytic in the density and $\beta p = \rho(1 + O(\rho))$, at small density. In other words, the equation of state is, to lowest order, essentially the equation of a perfect gas.
All quantities that are conceivably of some interest 
turn out to be analytic functions of temperature and 
density. The system is essentially a free gas and it has 
no phase transitions in the sense of a discontinuity or 
of a singularity in the dependence of a thermodynamic 
function in terms of others. Furthermore, the system 
cannot show phase transitions in the sense of sensitive 
dependence on boundary conditions of fixed external 
particles. This also follows, with some extra work, 
from the Kirkwood-Salzburg equations. 
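
To see how the expansion works in practice, the first correction to the perfect-gas law can be worked out by hand. The sketch below takes $d = 3$, the infinite-volume limit, and uses the integral $I(\beta) = \frac12\int(e^{-\beta\varphi(q)} - 1)\,d^3q$ of the virial-theorem section; the intermediate steps are an illustration, not a quotation.

```latex
% n = 1 component of K\alpha from [33b]: only the s = 1 term survives
(K\alpha)(q_1) = \int \big(e^{-\beta\varphi(q_1 - y)} - 1\big)\,dy = 2 I(\beta)

% One-point function from [34], then pressure from [36]
\rho = z + 2 I(\beta)\, z^2 + O(z^3), \qquad
\beta p = \int_0^z \frac{dz'}{z'}\,\rho(\beta, z') = z + I(\beta)\, z^2 + O(z^3)

% Eliminating the activity: z = \rho - 2 I(\beta)\,\rho^2 + O(\rho^3)
\beta p = \rho - I(\beta)\,\rho^2 + O(\rho^3)
```

With $\rho = 1/v$ this is the same second-order correction, $pv + I(\beta)/(\beta v) = \beta^{-1}$, obtained earlier from the virial theorem.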

The reader is referred to Ruelle (1969) and 
Gallavotti (1969) for more details. 


Lattice Models 


The problem of proving the existence of phase 
transitions in models of homogeneous gases with 
pair interactions is still open. Therefore, it makes 
sense to study the problem of phase transitions 
in simpler models, tractable to some extent but 
nontrivial, and which are of practical interest in 
their own right. 

The simplest models are the so-called lattice models in which particles are constrained to points of a lattice: they cannot move in the ordinary sense of the word (but, of course, they could jump) and therefore their configurations do not contain momentum variables.

The interaction energy is just the potential energy, and ensembles are defined as collections of probability distributions on the position coordinates of the particle configurations. Usually, the potential is a pair potential decaying fast at $\infty$ and, often, with a hard core forbidding double or higher occupancy of the same lattice site. For instance, the lattice gas with potential $\varphi$, in a cubic box $\Omega$ with $|\Omega| = V = L^d$ sites of a square lattice with mesh $a > 0$, is defined by the potential energy attributed to the configuration $X$ of occupied distinct sites, i.e., subsets $X \subset \Omega$:

$$H(X) = -\sum_{(x,y)\in X}\varphi(x - y) \qquad [37]$$

where the sum is over pairs of distinct points in $X$. The canonical ensemble and the grand canonical ensemble are the collections of distributions, parametrized by $(\beta, \rho)$ ($\rho = N/V$) or, respectively, by $(\beta, \lambda)$, attributing to $X$ the probability

$$p_{\beta,\rho}(X) = \frac{e^{-\beta H(X)}}{Z_p^{c}(\beta, N, \Omega)}\,\delta_{|X|,N} \qquad [38a]$$

or

$$p_{\beta,\lambda}(X) = \frac{e^{\beta\lambda|X|}\,e^{-\beta H(X)}}{Z_p^{gc}(\beta,\lambda,\Omega)} \qquad [38b]$$

where the denominators are normalization factors that can, respectively, be called, in analogy with the theory of continuous systems, canonical and grand canonical partition functions; the subscript p stands for particles.

A lattice gas in which in each site there can be at 
most one particle can be regarded as a model for the 
distribution of a family of spins on a lattice. Such 
models are quite common and useful (e.g., they arise 
in studying systems with magnetic properties). 
Simply identify an “occupied” site with a “spin up” or $+$ and an “empty” site with a “spin down” or $-$ (say). If $\sigma = \{\sigma_x\}_{x\in\Omega}$ is a spin configuration, the energy of the configuration “for potential $\varphi$ and magnetic field $h$” will be

$$H(\sigma) = -\sum_{(x,y)\in\Omega}\varphi(x - y)\,\sigma_x\sigma_y - h\sum_{x}\sigma_x \qquad [39]$$

with the sum running over pairs $(x, y) \in \Omega$ of distinct sites. If $\varphi(x - y) \equiv J_{xy} \ge 0$, the model is called a ferromagnetic Ising model. As in the case of continuous systems, it will be assumed to have a finite range for $\varphi$: that is, $\varphi(x) = 0$ for $|x| > R$, for some $R$, unless explicitly stated otherwise.

The canonical and grand canonical ensembles in the box $\Omega$ with respective parameters $(\beta, m)$ or $(\beta, h)$ will be defined as the probability distributions on the spin configurations $\sigma = \{\sigma_x\}_{x\in\Omega}$ with $\sum_{x\in\Omega}\sigma_x = M = mV$ or without constraint on $M$, respectively; hence, consistently with [39],

$$p_{\beta,m}(\sigma) = \frac{\exp\big(\beta\sum_{(x,y)}\varphi(x - y)\sigma_x\sigma_y\big)}{Z_s^{c}(\beta, M, \Omega)}, \qquad p_{\beta,h}(\sigma) = \frac{\exp\big(\beta h\sum_x\sigma_x + \beta\sum_{(x,y)}\varphi(x - y)\sigma_x\sigma_y\big)}{Z_s^{gc}(\beta, h, \Omega)} \qquad [40]$$

where the denominators are normalization factors again called, respectively, the canonical and grand canonical partition functions. As in the study of the previous continuous systems, canonical and grand canonical ensembles with “external fixed particle configurations” can be defined together with the corresponding ensembles with “external fixed spin configurations”; the subscript s stands for spins.

For each configuration $X \subset \Omega$ of a lattice gas, let $\{n_x\}$ be $n_x = 1$ if $x \in X$ and $n_x = 0$ if $x \notin X$. Then the transformation $\sigma_x = 2n_x - 1$ establishes a correspondence between lattice gas and spin distributions. In the correspondence, the potential $\varphi(x - y)$ of the lattice gas generates a potential $(1/4)\varphi(x - y)$ for the corresponding spin system and the chemical potential $\lambda$ for the lattice gas is associated with a magnetic field $h$ for the spin system with $h = (1/2)(\lambda + \sum_{x\ne 0}\varphi(x))$.
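
The correspondence can be verified by brute force on a small periodic ring. The sketch below makes several assumptions that are not in the text: the pair sums run over ordered pairs of distinct sites (the convention under which the stated relation for $h$ holds without an extra factor 1/2), $\beta$ is set to 1, and the potential `phi` is a hypothetical range-2 example. It checks that the exponent of the lattice-gas weight and the exponent of the spin weight differ only by a configuration-independent constant.

```python
from itertools import product

def phi(x, y, L):
    """Hypothetical pair potential on a ring of L sites, range 2, periodic distance."""
    d = min((x - y) % L, (y - x) % L)
    return {1: 1.0, 2: 0.3}.get(d, 0.0)

def gas_exponent(n, lam, L):
    """lam*|X| - H(X), with H(X) = -sum over ordered pairs x != y of phi * n_x * n_y."""
    pairs = sum(phi(x, y, L) * n[x] * n[y]
                for x in range(L) for y in range(L) if x != y)
    return lam * sum(n) + pairs

def spin_exponent(s, h, L):
    """h*sum(sigma) + sum over ordered pairs of (phi/4) * sigma_x * sigma_y."""
    pairs = sum(0.25 * phi(x, y, L) * s[x] * s[y]
                for x in range(L) for y in range(L) if x != y)
    return h * sum(s) + pairs

def mapping_defect(lam, L=6):
    """Spread of (gas exponent - spin exponent) over all 2^L configurations."""
    h = 0.5 * (lam + sum(phi(0, x, L) for x in range(1, L)))
    diffs = []
    for n in product((0, 1), repeat=L):
        s = tuple(2 * nx - 1 for nx in n)
        diffs.append(gas_exponent(n, lam, L) - spin_exponent(s, h, L))
    return max(diffs) - min(diffs)
```

A vanishing `mapping_defect` means the two probability distributions coincide after normalization; with unordered pairs the same check would require halving the sum of $\varphi$ in $h$.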

The correspondence between boundary conditions 
is natural: for instance, a boundary condition for the 
lattice gas in which all external sites are occupied 
becomes a boundary condition in which external 
sites contain a spin +. The close relation between 
lattice gas and spin systems permits switching from 
one to the other with little discussion. 

In the case of spin systems, empty boundary 
conditions are often considered (no spins outside $\Omega$).
In lattice gases and spin systems (as well as in 
continuum systems), often periodic and semiperiodic 
boundary conditions are considered (i.e., periodic in 
one or more directions and with empty or fixed 
external particles or spins in the others). 

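Thermodynamic limits for the partition functions

$$-\beta f(\beta, v) = \lim_{\Omega\to\infty,\ V/N = v}\frac1N\log Z_p^{c}(\beta, N, \Omega)$$
$$\beta p(\beta, \lambda) = \lim_{\Omega\to\infty}\frac1V\log Z_p^{gc}(\beta, \lambda, \Omega)$$
$$-\beta g(\beta, m) = \lim_{\Omega\to\infty,\ M/V\to m}\frac1V\log Z_s^{c}(\beta, M, \Omega)$$
$$\beta f(\beta, h) = \lim_{\Omega\to\infty}\frac1V\log Z_s^{gc}(\beta, h, \Omega) \qquad [41]$$
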

can be shown to exist by a method similar to the 
one discussed in Appendix 2. They have convexity 
and continuity properties as in the cases of the 
continuum systems. In the case of a lattice gas, the 
f, p functions are still interpreted as free energy 
and pressure, respectively. In the case of spin, $f(\beta, h)$
has the interpretation of magnetic free energy, 
while g(8,m) does not have a special name in the 
thermodynamics of magnetic systems. As in the 
continuum systems, it is occasionally useful to define 
infinite-volume equilibrium states: 


Definition An infinite-volume state with parameters $(\beta, h)$ or $(\beta, m)$ is a collection of average values $F \to \langle F\rangle$ obtained, respectively, as limits of finite-volume averages $\langle F\rangle_{\Omega_n}$ defined from canonical or grand canonical distributions in $\Omega_n$, with fixed parameters $(\beta, h)$ or $(\beta, m)$, or $(u, v)$ and with general boundary condition of fixed external spins or empty sites, on sequences $\Omega_n \to \infty$ for which such limits exist simultaneously for all local observables $F$.


This is taken verbatim from the definition in the 
section “Phase transitions and boundary condi- 
tions.” In this way, it makes sense to define the 
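spin correlation functions for $X = (\xi_1,\ldots,\xi_n)$ as $\langle\sigma_X\rangle$ if $\sigma_X = \prod_j\sigma_{\xi_j}$. For instance, we shall call $\rho(\xi_1,\xi_2) \overset{\rm def}{=} \langle\sigma_{\xi_1}\sigma_{\xi_2}\rangle$, and a pure phase can be defined as an infinite-volume state such that

$$\langle\sigma_X\sigma_{Y+\xi}\rangle - \langle\sigma_X\rangle\langle\sigma_{Y+\xi}\rangle \xrightarrow[\xi\to\infty]{} 0 \qquad [42]$$

Again, for more details, we refer the reader to Ruelle (1969) and Gallavotti (1969).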


Thermodynamic Limits and Inequalities 


An interesting property of lattice systems is that it is 
possible to study delicate questions like the existence 
of infinite-volume states in some (moderate) generality. 
A typical tool is the use of inequalities. As the simplest 
example of a vast class of inequalities, consider the 
ferromagnetic Ising model with some finite (but arbitrary) range interaction $J_{xy} \ge 0$ in a field $h_x \ge 0$: $J, h$ may even be not translationally invariant. Then the average of $\sigma_X \overset{\rm def}{=} \sigma_{x_1}\sigma_{x_2}\cdots\sigma_{x_n}$, $X = (x_1,\ldots,x_n)$, in a state with “empty boundary conditions” (i.e., no external spins) satisfies the inequalities

$$\langle\sigma_X\rangle,\ \partial_{h_x}\langle\sigma_X\rangle,\ \partial_{J_{xy}}\langle\sigma_X\rangle \ge 0, \qquad X = (x_1,\ldots,x_n)$$

More generally, let $H(\sigma)$ in [39] be replaced by $H(\sigma) = -\sum_XJ_X\sigma_X$ with $J_X \ge 0$, where $X$ can be any finite set; then, if $Y = (y_1,\ldots,y_n)$, $X = (x_1,\ldots,x_n)$, the following Griffiths inequalities hold:

$$\langle\sigma_X\rangle \ge 0, \qquad \partial_{J_Y}\langle\sigma_X\rangle \equiv \beta\big(\langle\sigma_X\sigma_Y\rangle - \langle\sigma_X\rangle\langle\sigma_Y\rangle\big) \ge 0 \qquad [43]$$

The inequalities can be used to check, in ferromagnetic Ising models, [39], existence of infinite-volume states (cf. the sections “Phase transitions and boundary conditions” and “Lattice models”) obtained by fixing the boundary condition $B$ to be either “all external spins $+$” or “all external sites empty.” If $\langle F\rangle_{B,\Omega}$ denotes the grand canonical average with boundary condition $B$ and any fixed $\beta, h > 0$, this means that for all local observables $F(\sigma_\Lambda)$ (i.e., for all $F$ depending on the spin configuration in any fixed region $\Lambda$) all the following limits exist:

$$\lim_{\Omega\to\infty}\langle F\rangle_{B,\Omega} = \langle F\rangle_B \qquad [44]$$


The reason is that the inequalities [43] imply that all averages $\langle\sigma_X\rangle_{B,\Omega}$ are monotonic in $\Omega$ for all fixed $X \subset \Omega$: so the limit [44] exists for $F(\sigma) = \sigma_X$. Hence, it exists for all $F$’s depending only on finitely many spins, because any local function $F$ “measurable in $\Lambda$” can be expressed (uniquely) as a linear combination of functions $\sigma_X$ with $X \subseteq \Lambda$.

Monotonicity with empty boundary conditions is seen by considering the sites outside $\Omega$ and in a region $\Omega'$ with side one unit larger than that of $\Omega$ and imagining that the couplings $J_X$ with $X \subset \Omega'$ but $X \not\subset \Omega$ vanish. Then, $\langle\sigma_X\rangle_{\Omega'} \ge \langle\sigma_X\rangle_{\Omega}$, because $\langle\sigma_X\rangle_{\Omega'}$ is an average computed with a distribution corresponding to an energy with the couplings $J_X$ with $X \not\subset \Omega$, but $X \subset \Omega'$, changed from 0 to $J_X \ge 0$.

Likewise, if the boundary condition is $+$, then enlarging the box from $\Omega$ to $\Omega'$ corresponds to decreasing an external field $h$ acting on the external spins from $+\infty$ (which would force all external spins to be $+$) to a finite value $h \ge 0$: so, increasing the box $\Omega$ causes $\langle\sigma_X\rangle_{+,\Omega}$ to decrease. Therefore, as $\Omega$ increases, the spin correlations of Ising ferromagnets increase if the boundary condition is empty and decrease if it is $+$.
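
Both monotonicity statements, together with the second inequality in [43], can be observed by exact enumeration on a short chain. The sketch below is an illustration with a hypothetical nearest-neighbor chain on the sites $-k,\ldots,k$; the function name and the parameter values used in any call are assumptions.

```python
import math
from itertools import product

def center_averages(k, beta, J, h, plus_boundary):
    """Return (<s_0>, <s_1>, <s_0 s_1>) for a chain on the sites -k..k, k >= 1.

    Energy: -J * sum_i s_i s_{i+1} - h * sum_i s_i, plus, for "+" boundary
    conditions, the coupling -J * (s_{-k} + s_k) to fixed external + spins.
    Averages are computed by summing over all 2^(2k+1) configurations.
    """
    L = 2 * k + 1
    z = m0 = m1 = c01 = 0.0
    for s in product((-1, 1), repeat=L):
        energy = -J * sum(s[i] * s[i + 1] for i in range(L - 1)) - h * sum(s)
        if plus_boundary:
            energy -= J * (s[0] + s[-1])
        w = math.exp(-beta * energy)
        z += w
        m0 += w * s[k]                 # s[k] is the spin at the central site 0
        m1 += w * s[k + 1]             # s[k + 1] is its right neighbor, site 1
        c01 += w * s[k] * s[k + 1]
    return m0 / z, m1 / z, c01 / z
```

Evaluating `center_averages` for growing `k` at fixed `beta`, `J >= 0` and `h >= 0` should give a magnetization at the center that increases with empty boundary conditions and decreases with `+` boundary conditions, with the former never exceeding the latter.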

The inequalities can be used in similar ways to prove that the
infinite-volume states obtained from + or empty boundary conditions are
translation invariant, and that in zero external field, $h = 0$, the +
and − boundary conditions generate pure states if the interaction
potential is only a pair ferromagnetic interaction.

There are many other important inequalities 
which can be used to prove several existence 
theorems along very simple paths. Unfortunately, 
their use is mostly restricted to lattice systems and 
requires very special assumptions on the energy 
(e.g., ferromagnetic interactions in the above exam- 
ple). The quoted examples were among the first 
discovered and provide a way to exhibit nontrivial 
thermodynamic limits and pure states. 

For more details, see Ruelle (1969), Lebowitz 
(1974), Gallavotti (1999), Lieb and Thirring (2001), 
and Lieb (2002). 


Symmetry-Breaking Phase Transitions 


The simplest phase transitions (see the section “Phase transitions and
boundary conditions”) are symmetry-breaking transitions in lattice
systems: they take place when the energy of the system in a container
$\Omega$ and with some special boundary condition (e.g., periodic,
antiperiodic, or empty) is invariant with respect to the action of a
group $G$ on phase space. This means that a group of transformations
$G$ acts on the points $x$ of phase space, so that with each
$\gamma \in G$ is associated a map $x \to x\gamma$ which transforms $x$
into $x\gamma$ respecting the composition law in $G$, that is,
$(x\gamma)\gamma' = x(\gamma\gamma')$. If $F$ is an observable, the
action of the group on phase space induces an action on the observable
$F$, changing $F(x)$ into
$F_\gamma(x) \stackrel{\rm def}{=} F(x\gamma^{-1})$.

A symmetry-breaking transition occurs when, by fixing suitable boundary
conditions and taking the thermodynamic limit, a state
$F \to \langle F\rangle$ is obtained in which some local observable
shows a nonsymmetric average
$\langle F\rangle \ne \langle F_\gamma\rangle$ for some $\gamma$.

An example is provided by the “nearest-neighbor ferromagnetic Ising
model” on a $d$-dimensional lattice with energy function given by [39]
with $h = 0$ and $\varphi(x - y) \equiv 0$ unless $|x - y| = 1$, i.e.,
unless $x, y$ are nearest neighbors, in which case
$\varphi(x - y) = J > 0$. With periodic or empty boundary conditions, it
exhibits a discrete “up-down” symmetry $\sigma \to -\sigma$.

Instability with respect to boundary conditions can be revealed by
considering the two boundary conditions, denoted + or −, in which the
lattice sites outside the container $\Omega$ are occupied either by
spins + or by spins −. Consider also, for later reference, (1) the
boundary conditions in which the boundary spins in the upper half of
the boundary are + and the ones in the lower part are −: call this the
±-boundary condition (see Figure 2); or (2) the boundary conditions in
which some of the opposite sides of $\Omega$ are identified while + or
− conditions are assigned on the remaining sides: call these
“cylindrical or semiperiodic boundary conditions.”

Figure 2 The dashed line is the boundary of $\Omega$; the outer spins
correspond to the ± boundary condition. The points A, B are points
where an open “line” $\lambda$ ends.

A new description of the spin configurations is useful: given $\sigma$,
draw a unit segment perpendicular to the center of each bond $b$ having
opposite spins at its extremes. An example of this construction is
provided by Figure 2 for the boundary condition ±.

The set of segments can be grouped into lines separating regions where
the spins are positive from regions where they are negative. If the
boundary condition is + or −, the lines form “closed polygons”,
whereas, if the condition is ±, there is also a single polygon
$\lambda_1$ which is not closed (as in Figure 2). If the boundary
condition is periodic or cylindrical, all polygons are closed but some
may “go around” $\Omega$. The polygons are also called “contours” and
the length of a polygon $\gamma$ will be denoted $|\gamma|$.

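The correspondence
$(\gamma_1, \gamma_2, \ldots, \gamma_n, \lambda_1) \leftrightarrow \sigma$
for the boundary condition ±, or
$\sigma \leftrightarrow (\gamma_1, \ldots, \gamma_n)$ for the boundary
condition + (or −), is one-to-one and, if $h = 0$, the energy
$H_\Omega(\sigma)$ of a configuration is higher than
$-J \times (\text{number of bonds in } \Omega)$ by an amount
$2J(|\lambda_1| + \sum_i |\gamma_i|)$ or, respectively,
$2J \sum_i |\gamma_i|$. The grand canonical probability of each spin
configuration is therefore proportional, if $h = 0$, respectively, to

\[ e^{-2\beta J (|\lambda_1| + \sum_i |\gamma_i|)} \quad \text{or} \quad e^{-2\beta J \sum_i |\gamma_i|} \qquad [45] \]

and the “up-down” symmetry is clearly reflected by [45].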
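
The energy bookkeeping behind [45] can be checked by brute force on a very small box. The sketch below is an illustration only: the $3 \times 3$ box, the value $J = 1$, the choice of + boundary spins and all function names are assumptions made here. It uses the fact that each unit segment of a contour crosses exactly one bond with opposite spins at its ends, so the total contour length equals the number of such "broken" bonds.

```python
from itertools import product

def bonds(L):
    """Nearest-neighbor bonds with at least one end in the L x L box, each listed once."""
    out = []
    for i in range(L):
        for j in range(L):
            out.append(((i, j), (i + 1, j)))      # bond towards the next row
            out.append(((i, j), (i, j + 1)))      # bond towards the next column
            if i == 0:
                out.append(((-1, j), (i, j)))     # bond to the outer row before the box
            if j == 0:
                out.append(((i, -1), (i, j)))     # bond to the outer column before the box
    return out

def spin(sigma, site, L, boundary):
    """Spin at `site`: taken from sigma inside the box, fixed to `boundary` outside."""
    i, j = site
    inside = 0 <= i < L and 0 <= j < L
    return sigma[i * L + j] if inside else boundary

def energy_and_contour_length(sigma, L, J=1.0, boundary=+1):
    """Return (H(sigma), number of broken bonds, number of bonds) at h = 0."""
    bs = bonds(L)
    pairs = [(spin(sigma, a, L, boundary), spin(sigma, b, L, boundary)) for a, b in bs]
    energy = -J * sum(sa * sb for sa, sb in pairs)
    broken = sum(1 for sa, sb in pairs if sa != sb)
    return energy, broken, len(bs)

def contour_identity_holds(L=3, J=1.0):
    """Check H = -J * (number of bonds) + 2 J * (total contour length) for every sigma."""
    for sigma in product((-1, 1), repeat=L * L):
        energy, broken, n_bonds = energy_and_contour_length(sigma, L, J)
        if abs(energy - (-J * n_bonds + 2.0 * J * broken)) > 1e-12:
            return False
    return True
```

Since the energy differs from a constant by $2J$ times the total contour length, the Gibbs weight $e^{-\beta H}$ is proportional to the second expression in [45].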

The average $\langle\sigma_x\rangle_{\Omega,+}$ of $\sigma_x$ with +
boundary conditions is given by
$\langle\sigma_x\rangle_{\Omega,+} = 1 - 2P_{\Omega,+}(-)$, where
$P_{\Omega,+}(-)$ is the probability that the spin $\sigma_x$ is $-1$.
If the site $x$ is occupied by a negative spin, then the point $x$ is
inside some contour $\gamma$ associated with the spin configuration
$\sigma$ under consideration. Hence, if $\rho(\gamma)$ is the
probability that a given contour belongs to the set of contours
describing a configuration $\sigma$, it is
$P_{\Omega,+}(-) \le \sum_{\gamma \circ x} \rho(\gamma)$, where
$\gamma \circ x$ means that $\gamma$ “surrounds” $x$.

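If $\Gamma = (\gamma_1, \ldots, \gamma_n)$ is a spin configuration and
if the symbol $\Gamma\ \mathrm{comp}\ \gamma$ means that the contour
$\gamma$ is “disjoint” from $\gamma_1, \ldots, \gamma_n$ (i.e.,
$\{\gamma \cup \Gamma\}$ is a new spin configuration), then

\[ \rho(\gamma) = \frac{\sum_{\Gamma \ni \gamma} e^{-2\beta J \sum_{\gamma' \in \Gamma} |\gamma'|}}{\sum_{\Gamma} e^{-2\beta J \sum_{\gamma' \in \Gamma} |\gamma'|}} \equiv e^{-2\beta J |\gamma|}\, \frac{\sum_{\Gamma\,\mathrm{comp}\,\gamma} e^{-2\beta J \sum_{\gamma' \in \Gamma} |\gamma'|}}{\sum_{\Gamma} e^{-2\beta J \sum_{\gamma' \in \Gamma} |\gamma'|}} \le e^{-2\beta J |\gamma|} \qquad [46] \]

because the last ratio in [46] does not exceed 1. Note that there are
at most $3^p$ different shapes of $\gamma$ with perimeter $p$ and at
most $p^2$ congruent $\gamma$'s containing $x$; therefore, the
probability that the spin at $x$ is − when the boundary condition is +
satisfies the inequality

\[ P_{\Omega,+}(-) \le \sum_{p=4}^{\infty} p^2\, 3^p\, e^{-2\beta J p} \xrightarrow[\beta \to \infty]{} 0 \]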


This probability can be made arbitrarily small, so that
$\langle\sigma_x\rangle_{\Omega,+}$ is estimated by a quantity which is
as close to 1 as desired provided $\beta$ is large enough; moreover,
the closeness of $\langle\sigma_x\rangle_{\Omega,+}$ to 1 is estimated
by a quantity which is independent of both $x$ and $\Omega$.
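
To get a feeling for how large $\beta$ has to be, the Peierls bound can be evaluated numerically. The following is a minimal sketch, assuming only the series $\sum_{p \ge 4} p^2 3^p e^{-2\beta J p}$ quoted above; the function name `peierls_bound` and the truncation parameter `p_max` are hypothetical choices. The series is geometric-like with ratio $3e^{-2\beta J}$, so it converges only for $\beta J > \tfrac{1}{2}\log 3$.

```python
from math import exp

def peierls_bound(beta_J, p_max=2000):
    """Partial sum of sum_{p>=4} p^2 3^p exp(-2 beta J p).

    The series converges only if r = 3 exp(-2 beta J) < 1; the truncation
    at p_max is an assumption of this sketch and is harmless unless r is
    very close to 1.
    """
    r = 3.0 * exp(-2.0 * beta_J)
    if r >= 1.0:
        raise ValueError("series diverges: need beta*J > log(3)/2")
    return sum(p * p * r ** p for p in range(4, p_max + 1))

def magnetization_lower_bound(beta_J):
    """Lower bound 1 - 2 P(-) on <sigma_x> with + boundary conditions."""
    return 1.0 - 2.0 * peierls_bound(beta_J)
```

The bound is crude: it becomes informative only at fairly low temperature (for instance at $\beta J = 1.5$ it gives a magnetization within a few percent of 1), but it is uniform in $x$ and $\Omega$, which is all the argument needs.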

A similar argument for the (−)-boundary condition, or the remark that
for $h = 0$ it is
$\langle\sigma_x\rangle_{\Omega,-} = -\langle\sigma_x\rangle_{\Omega,+}$,
leads to the conclusion that, at large $\beta$,
$\langle\sigma_x\rangle_{\Omega,-} \ne \langle\sigma_x\rangle_{\Omega,+}$
and that the difference between the two quantities is positive
uniformly in $\Omega$. This is the proof (Peierls' theorem) of the fact
that, if $\beta$ is large, there is a strong instability of the
magnetization with respect to the boundary conditions; i.e., the
nearest-neighbor Ising model in dimension 2 (or greater, by an
identical argument) has a phase transition. If the dimension is 1, the
argument clearly fails and no phase transition occurs (see the section
“Absence of phase transitions: $d = 1$”).
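
The dependence of the magnetization on the boundary condition can be made tangible by exact enumeration in a tiny box. The sketch below is an assumption-laden illustration (a $3 \times 3$ box, $h = 0$, and a hypothetical function name `center_magnetization`); a box this small cannot exhibit a phase transition, which is a statement about the limit $\Omega \to \infty$, but it does show the exact antisymmetry between the + and − boundary conditions and how strongly the boundary pins the central spin at low temperature.

```python
from itertools import product
from math import exp

def center_magnetization(L=3, beta_J=1.0, boundary=+1):
    """<sigma_x> at the center of an L x L box at h = 0, with all outer
    spins fixed to `boundary` (+1 or -1). Brute force over 2**(L*L) states."""
    def s(sigma, i, j):
        inside = 0 <= i < L and 0 <= j < L
        return sigma[i * L + j] if inside else boundary

    num = den = 0.0
    for sigma in product((-1, 1), repeat=L * L):
        # sum of sigma_x sigma_y over all bonds touching the box, each counted once
        coupling = 0
        for i in range(L):
            for j in range(L):
                here = s(sigma, i, j)
                coupling += here * (s(sigma, i + 1, j) + s(sigma, i, j + 1))
                if i == 0:
                    coupling += here * boundary
                if j == 0:
                    coupling += here * boundary
        weight = exp(beta_J * coupling)
        num += weight * s(sigma, L // 2, L // 2)
        den += weight
    return num / den
```

Calling `center_magnetization(3, 1.0, +1)` and `center_magnetization(3, 1.0, -1)` returns two values of opposite sign and equal magnitude. Peierls' theorem is the statement that, for $\beta$ large, this gap does not close as the box grows.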

For more details, see Gallavotti (1999). 


Finite-Volume Effects 


The description in the last section of the phase transition in the
nearest-neighbor Ising model can be made more precise from both the
physical and the mathematical points of view, giving insights into the
nature of the phase transitions. Assume that the boundary condition is
the (+)-boundary condition and describe a spin configuration $\sigma$
by means of the associated closed disjoint polygons
$(\gamma_1, \ldots, \gamma_n)$. Attribute to
$\sigma = (\gamma_1, \ldots, \gamma_n)$ a probability proportional to
[45]. Then the following theorem of Minlos and Sinai holds:


Theorem If $\beta$ is large enough, there exist $C > 0$ and
$\rho(\gamma) > 0$, with $\rho(\gamma) \le e^{-2\beta J |\gamma|}$, such
that a spin configuration $\sigma$ randomly chosen out of the grand
canonical distribution with + boundary conditions and $h = 0$ will
contain, with probability approaching 1 as $\Omega \to \infty$, a number
$K_{(\gamma)}(\sigma)$ of contours congruent to $\gamma$ such that

\[ \bigl| K_{(\gamma)}(\sigma) - \rho(\gamma)\, |\Omega| \bigr| \le C \sqrt{|\Omega|}\; e^{-\beta J |\gamma|} \qquad [47] \]

and this relation holds simultaneously for all $\gamma$'s.

Thus, there are very few contours (and the larger they are, the smaller
their number, in both absolute and relative value): a typical spin
configuration in the grand canonical ensemble with (+)-boundary
conditions is such that the large majority of the spins is “positive”
and, in the “sea” of positive spins, there are a few negative spins
distributed in small and rare regions (their number, however, is still
of the order of $|\Omega|$).

Another consequence of the analysis in the last section concerns the
approximate equation of state near the phase transition region at low
temperatures and finite $\Omega$. If $\Omega$ is finite, the graph of
$h$ versus $m_\Omega(\beta, h)$ will have a rather different behavior
depending on the possible boundary conditions. For example, if the
boundary condition is (+) or (−), one gets, respectively, the results
depicted in Figure 3a and 3b, where $m^*(\beta)$ denotes the
spontaneous magnetization (i.e.,
$m^*(\beta) \stackrel{\rm def}{=} \lim_{h \to 0^+} \lim_{\Omega \to \infty} m_\Omega(\beta, h)$).

With periodic or empty boundary conditions, the diagram changes as in
Figure 4. The thermodynamic limit
$m(\beta, h) = \lim_{\Omega \to \infty} m_\Omega(\beta, h)$ exists for
all $h \ne 0$ and the resulting graph is in Figure 4b, which shows that
at $h = 0$ the limit is discontinuous. It can be proved, if $\beta$ is
large enough, that
$\infty > \lim_{h \to 0^+} \partial_h m(\beta, h) = \chi(\beta) > 0$
(i.e., the angle between the vertical part of the graph and the rest is
sharp).

Furthermore, it can be proved that $m(\beta, h)$ is analytic in $h$ for
$h \ne 0$. If $\beta$ is small enough, analyticity holds at all $h$.
For $\beta$ large, the function $f(\beta, h)$ has an essential
singularity at $h = 0$: a result that can be interpreted as excluding a
naive theory of metastability as a description of states governed by an
equation of state obtained from an analytic continuation of
$f(\beta, h)$ to negative values of $h$.

The above considerations and results further 
clarify the meaning of a phase transition for a 
finite system. For more details, we refer the 
reader to Gallavotti (1999) and Friedli and Pfister 
(2004). 


Beyond Low Temperatures 
(Ferromagnetic Ising Model) 


A limitation of the results discussed above is the condition of low
temperature (“$\beta$ large enough”). A natural problem is to go beyond
the low-temperature region and to describe fully the phenomena in the
region where boundary condition instability takes place and first
develops. A number of interesting partial results are known, which
considerably improve the picture emerging from the previous analysis.
A striking list, but far from exhaustive, of such results follows and
focuses on the properties of ferromagnetic Ising spin systems. The
reason for restricting to such cases is that they are simple enough to
allow a rather fine analysis, which sheds considerable light on the
structure of statistical mechanics, suggesting precise formulations of
the problems that it would be desirable to understand in more general
systems.

Figure 4 (a) The $h$ vs $m_\Omega(\beta, h)$ graph for periodic or empty
boundary conditions. (b) The discontinuity (at $h = 0$) of the
thermodynamic limit.


1. Let $z \stackrel{\rm def}{=} e^{\beta h}$ and consider the product of
$z^V$ ($V$ is the number of sites $|\Omega|$ of $\Omega$) times the
partition function with periodic or perfect-wall boundary conditions
and with finite-range ferromagnetic interaction, not necessarily
nearest-neighbor; a polynomial in $z$ (of degree $2V$) is thus
obtained. Its zeros lie on the unit circle $|z| = 1$: this is the
Lee-Yang theorem. It implies that the only singularities of
$f(\beta, h)$ in the region $0 < \beta < \infty$,
$-\infty < h < +\infty$ can be found at $h = 0$.

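A singularity can appear only if the point $z = 1$ is an accumulation
point of the limiting distribution (as $\Omega \to \infty$) of the
zeros on the unit circle: if the zeros are $z_1, \ldots, z_{2V}$, then

\[ \frac{1}{V} \log z^V Z(\beta, h, \Omega, \text{periodic}) = 2\beta J + \beta h + \frac{1}{V} \sum_{i=1}^{2V} \log(z - z_i) \]

and if

\[ V^{-1} \times (\text{number of zeros of the form } z_j = e^{i\vartheta_j},\ \vartheta \le \vartheta_j \le \vartheta + d\vartheta) \xrightarrow[\Omega \to \infty]{} \frac{d\rho_\beta(\vartheta)}{2\pi} \]

it is

\[ \beta f(\beta, h) = 2\beta J + \frac{1}{2\pi} \int_{-\pi}^{\pi} \log(z - e^{i\vartheta})\, d\rho_\beta(\vartheta) \qquad [48] \]

The existence of the measure $d\rho_\beta(\vartheta)$ follows from the
existence of the thermodynamic limit, but $d\rho_\beta(\vartheta)$ is
not necessarily $d\vartheta$-continuous, i.e., not necessarily
proportional to $d\vartheta$.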

2. It can be shown that, with a not necessarily nearest-neighbor
interaction, the zeros of the partition function do not move too much
under small perturbations of the potential, even if one perturbs the
energy (at perfect-wall or periodic boundary conditions) into

\[ H'_\Omega(\sigma) = H_\Omega(\sigma) + (\delta H_\Omega)(\sigma), \qquad (\delta H_\Omega)(\sigma) = \sum_{X \subset \Omega} J'(X)\, \sigma_X \qquad [49] \]

where $J'(X)$ is very general and defined on subsets
$X = (x_1, \ldots, x_k) \subset \Omega$ such that the quantity
$\|J'\| = \sup_{y \in \mathbb{Z}^d} \sum_{X \ni y} |J'(X)|$ is small
enough. More precisely, with a ferromagnetic pair potential $J$ fixed,
suppose that one knows that, when $J' = 0$, the partition function
zeros in the variable $z = e^{\beta h}$ lie in a certain closed set $N$
(of the unit circle) in the $z$-plane. Then, if $J' \ne 0$, they lie in
a closed set $N^1$, $\Omega$-independent and contained in a
neighborhood of $N$ of width shrinking to 0 when $\|J'\| \to 0$. This
allows one to establish various relations between analyticity
properties and boundary condition instability, as described in (3)
below.


3. In the ferromagnetic Ising model, with a not necessarily
nearest-neighbor interaction, one says that there is a gap around 0 if
$d\rho_\beta(\vartheta) = 0$ near $\vartheta = 0$. It can be shown that
if $\beta$ is small enough there is a gap for all $h$ of width uniform
in $h$.


4. Another question is whether the boundary condition instability is
always revealed by the one-spin correlation function (i.e., by the
magnetization) or whether it might be shown only by some correlation
functions of higher order. It can be proved that no boundary condition
instability occurs for $h \ne 0$; at $h = 0$ it is possible only if

\[ \lim_{h \to 0^-} m(\beta, h) \ne \lim_{h \to 0^+} m(\beta, h) \qquad [50] \]


5. A consequence of the Griffiths inequalities (cf. the section
“Thermodynamic limits and inequalities”) is that if [50] is true for a
given $\beta_0$ then it is true for all $\beta > \beta_0$. Therefore,
item (4) leads to a natural definition of the critical temperature
$T_c$ as the least upper bound of the $T$'s such that [50] holds
($k_B T = \beta^{-1}$).


6. If $d = 2$, the free energy of the nearest-neighbor ferromagnetic
Ising model has a singularity at $\beta_c$, and the value of $\beta_c$
is known exactly from the exact solution of the model:
$m(\beta, 0^+) \equiv m^*(\beta) = (1 - \sinh^{-4} 2\beta J)^{1/8}$.
The location and nature of the singularities of $f(\beta, 0)$ as a
function of $\beta$ remain an open question for $d = 3$. In particular,
the question whether there is a singularity of $f(\beta, 0)$ at
$\beta = \beta_c$ is open.


7. For $\beta > \beta_c$ there is instability with respect to boundary
conditions (see (6) above) and a natural question is: how many “pure”
phases can exist in the ferromagnetic Ising model? (cf. the section
“Phase transitions and boundary conditions,” eqn [22]). Intuition
suggests that there should be only two phases: the positively
magnetized and the negatively magnetized ones.

One has to distinguish between translation-invariant pure phases and
non-translation-invariant ones. It can be proved that, in the case of
the two-dimensional nearest-neighbor ferromagnetic Ising model, all
infinite-volume states (cf. the section “Lattice models”) are
translationally invariant. Furthermore, they can be obtained by
considering just the two boundary conditions + and −: the latter states
are also pure states for models with non-nearest-neighbor ferromagnetic
interaction. The solution of this problem has led to the introduction
of many new ideas and techniques in statistical mechanics and
probability theory.

8. In any dimension $d \ge 2$, for $\beta$ large enough, it can be
proved that the nearest-neighbor Ising model has only two
translation-invariant phases. If the dimension is $\ge 3$ and $\beta$
is large, the + and − phases exhaust the set of translation-invariant
pure phases, but there exist non-translation-invariant phases. For
$\beta$ close to $\beta_c$, however, the question is much more
difficult.
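
The Lee-Yang theorem of item (1) can be checked numerically on a very small system without any root finding. The sketch below rests on assumptions made here for illustration: a ring of $V$ spins with nearest-neighbor coupling $J$, and hypothetical function names. On the unit circle $z = e^{i\varphi}$ the quantity $z^{-V}$ times the Lee-Yang polynomial equals $G(\varphi) = \sum_\sigma e^{\beta J \sum_i \sigma_i \sigma_{i+1}} \cos(\varphi \sum_i \sigma_i)$, which is real by the up-down symmetry. Since the polynomial has degree $2V$, finding $2V$ sign changes of $G$ over one turn of the circle shows that all its zeros lie on $|z| = 1$.

```python
from itertools import product
from math import cos, exp, pi

def partition_at_imaginary_field(phi, V, beta_J):
    """G(phi): partition function of a ring of V spins at beta*h = i*phi.
    The ring geometry is an assumption of this sketch."""
    total = 0.0
    for s in product((-1, 1), repeat=V):
        bond_sum = sum(s[i] * s[(i + 1) % V] for i in range(V))
        total += exp(beta_J * bond_sum) * cos(phi * sum(s))
    return total

def zero_angles(V, beta_J, grid=2000):
    """Approximate angles in [0, 2 pi) at which G changes sign on a uniform grid."""
    phis = [2.0 * pi * (k + 0.5) / grid for k in range(grid)]
    vals = [partition_at_imaginary_field(p, V, beta_J) for p in phis]
    angles = []
    for k in range(grid):
        if vals[k] * vals[(k + 1) % grid] < 0.0:
            angles.append(2.0 * pi * (k + 1) / grid % (2.0 * pi))
    return angles
```

For $V = 4$ the call `zero_angles(4, 0.5)` should return $2V = 8$ angles, none of them close to $\varphi = 0$, i.e., to $z = 1$. The grid resolution is adequate only when the zeros are well separated, so larger systems or very small $\beta J$ would need a finer grid.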


For more details, see Onsager (1944), Lee and 
Yang (1952), Ruelle (1971), Sinai (1991), Gallavotti 
(1999), Aizenman (1980), Higuchi (1981), and 
Friedli and Pfister (2004). 
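
The exact two-dimensional formulas quoted in item (6) are easy to evaluate. The following minimal sketch assumes the expression $m^*(\beta) = (1 - \sinh^{-4} 2\beta J)^{1/8}$ given above together with the critical-point condition $\sinh 2\beta_c J = 1$ (stated in the section “Critical Points”); setting $m^* = 0$ for $\beta \le \beta_c$ is an addition made here, and the function names are hypothetical.

```python
from math import asinh, sinh

def critical_beta_J():
    """beta_c * J for the d = 2 nearest-neighbor model, from sinh(2 beta_c J) = 1."""
    return 0.5 * asinh(1.0)

def spontaneous_magnetization(beta_J):
    """m*(beta) = (1 - sinh(2 beta J)**-4)**(1/8) above beta_c, and 0 otherwise."""
    if beta_J <= critical_beta_J():
        return 0.0
    x = sinh(2.0 * beta_J) ** -4
    return max(0.0, 1.0 - x) ** 0.125
```

Because of the exponent 1/8, $m^*(\beta)$ rises very steeply just above $\beta_c$ and is already close to 1 at $\beta J = 1$.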


Geometry of Phase Coexistence 


Intuition about the phenomena connected with the 
classical phase transitions is usually based on the 
properties of the liquid—gas phase transition; this 
transition is usually experimentally investigated in 
situations in which the total number of particles is 
fixed (canonical ensemble) and in presence of an 
external field (gravity). 

The importance of such experimental conditions 
is obvious; the external field produces a nontransla- 
tionally invariant situation and the corresponding 
separation of the two phases. The fact that the 
number of particles is fixed determines, on the other 
hand, the fraction of volume occupied by each of the 
two phases. 

Once more, consider the nearest-neighbor ferro- 
magnetic Ising model: the results available for it can 
be used to obtain a clear picture of the solution to 
problems that one would like to solve but which in 
most other models are intractable with present-day 
techniques. 

It will be convenient to discuss phase coexistence in the canonical
ensemble distributions on configurations of fixed total magnetization
$M = mV$ (see the section “Lattice models”; [40]). Let $\beta$ be large
enough to be in the two-phase region and, for a fixed
$\alpha \in (0, 1)$, let

\[ m = (1 - \alpha)\, m^*(\beta) + \alpha\, (-m^*(\beta)) = (1 - 2\alpha)\, m^*(\beta) \qquad [51] \]

that is, $m$ is in the vertical part of the diagram $m = m(\beta, h)$
at $\beta$ fixed (see Figure 4).

Fixing $m$ as in [51] does not yet determine the separation of the
phases in two different regions; for this effect, it will be necessary
to introduce some external cause favoring the occupation of a part of
the volume by a single phase. Such an asymmetry can be obtained in at
least two ways: through a weak uniform external field (in complete
analogy with the gravitational field in the liquid-vapor transition) or
through an asymmetric field acting only on boundary spins. The latter
should have the same qualitative effect as the former, because in a
phase transition region a boundary perturbation produces volume effects
(see the sections “Thermodynamic limits and inequalities” and
“Symmetry-breaking phase transitions”). From a mathematical point of
view, it is simpler to use a boundary asymmetry to produce phase
separations, and the simplest geometry is obtained by considering
±-cylindrical or ++-cylindrical boundary conditions: this means ++ or ±
boundary conditions periodic in one direction (e.g., in Figure 2
imagine the right and left boundaries identified after removing the
boundary spins on them).

Spins adjacent to the bases of $\Omega$ act as symmetry-breaking
external fields. The ++-cylindrical boundary condition should favor the
formation inside $\Omega$ of the positively magnetized phase;
therefore, it will be natural to consider, in the canonical
distribution, this boundary condition only when the total magnetization
is fixed to be the spontaneous magnetization $m^*(\beta)$.

On the other hand, the ±-boundary condition favors the separation of
phases (positively magnetized phase near the top of $\Omega$ and
negatively magnetized phase near the bottom). Therefore, it will be
natural to consider the latter boundary condition in the case of a
canonical distribution with magnetization
$m = (1 - 2\alpha)\, m^*(\beta)$ with $0 < \alpha < 1$ ([51]). In the
latter case, the positive phase can be expected to adhere to the top of
$\Omega$ and to extend, in some sense to be discussed, up to a distance
$O(L)$ from it; and then to change into the negatively magnetized pure
phase.

To make the phenomenological description precise, consider the spin
configurations $\sigma$ through the associated sets of disjoint
polygons (cf. the section “Symmetry-breaking phase transitions”). Fix
the boundary conditions to be ++ or ±-cylindrical boundary conditions
and note that the polygons associated with a spin configuration
$\sigma$ are all closed and of two types: the ones of the first type,
denoted $\gamma_1, \ldots, \gamma_n$, are polygons which do not
encircle $\Omega$; the polygons of the second type, denoted by the
symbols $\lambda_j$, are the ones which wind, at least once, around
$\Omega$.

So, a spin configuration $\sigma$ will be described by a set of
polygons; the statistical weight of a configuration
$\sigma = (\gamma_1, \ldots, \gamma_n, \lambda_1, \ldots, \lambda_h)$
is (cf. [45]):

\[ e^{-2\beta J \left( \sum_i |\gamma_i| + \sum_j |\lambda_j| \right)} \qquad [52] \]

The reason why the contours that go around the cylinder $\Omega$ are
denoted by $\lambda$ (rather than by $\gamma$) is that they “look like”
open contours (see the section “Symmetry-breaking phase transitions”)
if one forgets that the opposite sides of $\Omega$ have to be
identified. In the case of the ±-boundary conditions the number of
polygons of $\lambda$-type must be odd (hence $\ne 0$), while for the
++-boundary condition the number of $\lambda$-type polygons must be
even (hence it could be 0).

For more details, the reader is referred to Sinai 
(1991) and Gallavotti (1999). 


Separation and Coexistence of Phases 


In the context of the geometric description of the spin configuration
in the last section, consider the canonical distributions with
++-cylindrical or ±-cylindrical boundary conditions and zero field:
they will be denoted briefly as $\mu_{\beta,++}$, $\mu_{\beta,\pm}$,
respectively. The following theorem (Minlos-Sinai's theorem) provided
the foundations of the microscopic theory of coexistence: it is
formulated in dimension $d = 2$ but, modulo obvious changes, it holds
for $d \ge 2$.


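Theorem For $0 < \alpha < 1$ fixed, let
$m = (1 - 2\alpha)\, m^*(\beta)$; then, for $\beta$ large enough, a
spin configuration
$\sigma = (\gamma_1, \ldots, \gamma_n, \lambda_1, \ldots, \lambda_{2h+1})$
randomly chosen with the distribution $\mu_{\beta,\pm}$ enjoys the
properties (i)-(iv) below with a $\mu_{\beta,\pm}$-probability
approaching 1 as $\Omega \to \infty$:

(i) $\sigma$ contains only one contour of $\lambda$-type and

\[ \bigl| |\lambda| - (1 + \varepsilon(\beta)) L \bigr| < o(L) \qquad [53] \]

where $\varepsilon(\beta) > 0$ is a suitable ($\alpha$-independent)
function of $\beta$ tending to zero exponentially fast as
$\beta \to \infty$.

(ii) If $\Omega^+_\lambda$, $\Omega^-_\lambda$ denote, respectively,
the regions above and below $\lambda$, and $|\Omega| \equiv V$,
$|\Omega^+_\lambda|$, $|\Omega^-_\lambda|$ are, respectively, the
volumes of $\Omega$, $\Omega^+_\lambda$, $\Omega^-_\lambda$, then

\[ \bigl| |\Omega^+_\lambda| - (1 - \alpha) V \bigr| < \kappa(\beta)\, V^{3/4}, \qquad \bigl| |\Omega^-_\lambda| - \alpha V \bigr| < \kappa(\beta)\, V^{3/4} \qquad [54] \]

where $\kappa(\beta) \to 0$ exponentially fast as $\beta \to \infty$;
the exponent 3/4, here and below, is not optimal. (The fractions
$1 - \alpha$ and $\alpha$ are those fixed by
$m = (1 - 2\alpha)\, m^*(\beta)$, the positively magnetized phase being
the one above $\lambda$.)

(iii) If $M^+_\lambda = \sum_{x \in \Omega^+_\lambda} \sigma_x$ and
$M^-_\lambda = \sum_{x \in \Omega^-_\lambda} \sigma_x$, then

\[ \bigl| M^+_\lambda - (1 - \alpha)\, m^*(\beta)\, V \bigr| < \kappa(\beta)\, V^{3/4}, \qquad \bigl| M^-_\lambda + \alpha\, m^*(\beta)\, V \bigr| < \kappa(\beta)\, V^{3/4} \qquad [55] \]

(iv) If $K^\lambda_\gamma(\sigma)$ denotes the number of contours
congruent to a given $\gamma$ and lying in $\Omega^+_\lambda$ then,
simultaneously for all the shapes of $\gamma$:

\[ \bigl| K^\lambda_\gamma(\sigma) - \rho(\gamma)\, (1 - \alpha)\, V \bigr| \le C\, e^{-\beta J |\gamma|}\, V^{1/2}, \qquad C > 0 \qquad [56] \]

where $\rho(\gamma) \le e^{-2\beta J |\gamma|}$ is the same quantity as
already mentioned in the text of the theorem of “Finite-volume
effects”. A similar result holds for the contours below $\lambda$ (cf.
the comments on [47]).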


The above theorem not only provides a detailed and rather satisfactory
description of the phase separation phenomenon, but it also furnishes a
precise microscopic definition of the line of separation between the
two phases, which should be naturally identified with the (random) line
$\lambda$.

A similar result holds in the canonical distribution
$\mu_{\beta,++,m^*(\beta)}$, where (i) is replaced by: no
$\lambda$-type polygon is present, while (ii), (iii) become
superfluous, and (iv) is modified in the obvious way. In other words, a
typical configuration for the distribution $\mu_{\beta,++,m^*(\beta)}$
has the same appearance as a typical configuration of the corresponding
grand canonical ensemble with (+)-boundary condition (whose properties
are described by the theorem given in the section “Finite-volume
effects”).

For more details, see Sinai (1991) and Gallavotti 
(1999). 


Phase Separation Line and Surface 
Tension 


Continuing to refer to the nearest-neighbor Ising ferromagnet, the
theorem of the last section means that, if $\beta$ is large enough,
then the microscopic line $\lambda$, separating the two phases, is
almost straight (since $\varepsilon(\beta)$ is small). The deviations
of $\lambda$ from a straight line are more conveniently studied in the
grand canonical distributions $\mu^0_{\pm}$ with boundary condition set
to $+1$ in the upper half of $\partial\Omega$, vertical sites included,
and to $-1$ in the lower half: this is illustrated in Figure 2 (see the
section “Symmetry-breaking phase transitions”). The results can be
converted into very similar results for grand canonical distributions
with ±-cylindrical boundary conditions of the last section.

Define $\lambda$ to be rigid if the probability that $\lambda$ passes
through the center of the box $\Omega$ (i.e., 0) does not tend to 0 as
$\Omega \to \infty$; otherwise, it is not rigid.

The notion of rigidity distinguishes between the possibilities for the
line $\lambda$ to be “straight.” The “excess” length
$\varepsilon(\beta) L$ (see [53]) can be obtained in two ways: either
the line is essentially straight (in the geometric sense) with a few
“bumps” distributed with a density of order $\varepsilon(\beta)$ or,
otherwise, it is only locally straight and with an important part of
the excess length being gained through a small bending on a large
length scale. In three dimensions a similar phenomenon is possible.
Rigidity of $\lambda$, or its failure, can in principle be investigated
by optical means; there can be interference of coherent light scattered
by macroscopically separated surface elements of $\lambda$ only if
$\lambda$ is rigid in the above sense.

It has been rigorously proved that the line $\lambda$ is not rigid in
dimension 2 and that, at least at low temperature, the fluctuation of
its middle point is of the order $O(\sqrt{L})$. In dimension 3,
however, it has been shown that the surface $\lambda$ is rigid at low
enough temperature.

A deeper analysis is needed to study the shape of the separation
surface under other conditions, for example, with + boundary conditions
in a canonical distribution with magnetization intermediate between
$\pm m^*(\beta)$. It involves, as a prerequisite, the definition and
many properties of the surface tension between the two phases. Here
only the definition of surface tension in the case of ±-boundary
conditions in the two-dimensional case will be mentioned. If
$Z^{++}(\Omega, m^*(\beta))$ and $Z^{+-}(\Omega, m)$ are, respectively,
the canonical partition functions for the ++- and ±-cylindrical
boundary conditions, the tension $\tau(\beta)$ is defined as

\[ \beta\tau(\beta) = -\lim_{\Omega \to \infty} \frac{1}{L} \log \frac{Z^{+-}(\Omega, m)}{Z^{++}(\Omega, m^*(\beta))} \]


The limit can be shown to be $\alpha$-independent for $\beta$ large
enough: the definition and its justification are based on the
microscopic geometric description in the section “Geometry of phase
coexistence.” The definition can be naturally extended to higher
dimension (and to more general non-nearest-neighbor models). If
$d = 2$, the tension $\tau$ can be exactly computed at all temperatures
below criticality and is
$\beta\tau(\beta) = 2\beta J + \log\tanh\beta J$.
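
A quick numerical look at the exact two-dimensional formula shows two features that are not obvious at first sight: the tension vanishes at the critical point, and at low temperature it approaches the energy cost $2J$ per unit length of a straight contour. This is a minimal sketch assuming only $\beta\tau(\beta) = 2\beta J + \log\tanh\beta J$ and the condition $\sinh 2\beta_c J = 1$ from the section “Critical Points”; the function names are hypothetical.

```python
from math import asinh, log, tanh

def beta_tau(beta_J):
    """beta * tau(beta) = 2 beta J + log tanh(beta J), for d = 2 below criticality."""
    return 2.0 * beta_J + log(tanh(beta_J))

def critical_beta_J():
    """beta_c * J from sinh(2 beta_c J) = 1."""
    return 0.5 * asinh(1.0)
```

At $\beta = \beta_c$ one has $\tanh\beta_c J = \sqrt{2} - 1 = e^{-2\beta_c J}$, so `beta_tau(critical_beta_J())` is zero up to rounding, while for large $\beta J$ the logarithm is negligible and $\tau \approx 2J$.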

More remarkably, the definition can be extended to define the surface
tension $\tau(\beta, n)$ in the “direction $n$,” that is, when the
boundary conditions are such that the line of separation is on average
orthogonal to the unit vector $n$. In this way, if $d = 2$ and
$\alpha \in (0, 1)$ is fixed, it can be proved that at low enough
temperature the canonical distribution with + boundary conditions and
intermediate magnetization $m = (1 - 2\alpha)\, m^*(\beta)$ has typical
configurations containing a spin − region of area $\simeq \alpha V$;
furthermore, if the container is rescaled to size $L = 1$, the region
will have a limiting shape filling an area $\alpha$ bounded by a smooth
curve whose form is determined by the classical macroscopic Wulff
theory of the shape of crystals in terms of the surface tension
$\tau(n)$.

An interesting question remains open in the three-dimensional case: it
is conceivable that the surface, although rigid at low temperature,
might become “loose” at a temperature $\tilde T_c$ smaller than the
critical temperature $T_c$ (the latter being defined as the highest
temperature below which there are at least two pure phases). The
temperature $\tilde T_c$, whose existence is rather well established in
numerical experiments, would be called the “roughening transition”
temperature. The rigidity of $\lambda$ is connected with the existence
of translationally noninvariant equilibrium states. The latter exist in
dimension $d = 3$, but not in dimension $d = 2$, where the discussed
nonrigidity of $\lambda$, established all the way to $T_c$, provides
the intuitive reason for the absence of non-translation-invariant
states. It has been shown that in $d = 3$ the roughening temperature
$\tilde T_c$ cannot be smaller than the critical temperature of the
two-dimensional Ising model with the same coupling.

Note that existence of translationally noninvar- 
iant equilibrium states is not necessary for the 
description of coexistence phenomena. The theory 
of the nearest-neighbor two-dimensional Ising model 
is a clear proof of this statement. 

The reader is referred to Onsager (1944), van 
Beyeren (1975), Sinai (1991), Miracle-Solé (1995), 
Pfister and Velenik (1999), and Gallavotti (1999) for 
more details. 


Critical Points 


Correlation functions for a system with short-range interactions and in
an equilibrium state (which is a pure phase) have cluster properties
(see [22]): their physical meaning is that in a pure phase there is
independence between fluctuations occurring in widely separated
regions. The simplest cluster property concerns the “pair correlation
function,” that is, the probability density $\rho(q_1, q_2)$ of finding
particles at points $q_1, q_2$ independently of where the other
particles may happen to be (see [23]). In the case of spin systems, the
pair correlation
$\rho(q_1, q_2) = \langle\sigma_{q_1}\sigma_{q_2}\rangle$ will be
considered. The pair correlation of a translation-invariant equilibrium
state has a cluster property ([22], [42]) if

\[ \bigl| \rho(q_1, q_2) - \rho^2 \bigr| \xrightarrow[|q_1 - q_2| \to \infty]{} 0 \qquad [57] \]

where $\rho$ is the probability density for finding a particle at $q$
(i.e., the physical density of the state) or
$\rho = \langle\sigma_q\rangle$ is the average of the value of the spin
at $q$ (i.e., the magnetization of the state).

A general definition of critical point is a point $c$ in the space of
the parameters characterizing equilibrium states, for example,
$\beta, \lambda$ in grand canonical distributions, $\beta, v$ in
canonical distributions, or $\beta, h$ in the case of lattice spin
systems in a grand canonical distribution. In systems with short-range
interaction (i.e., with $\varphi(r)$ vanishing for $|r|$ large enough)
the point $c$ is a critical point if the pair correlation tends to 0
(see [57]) slower than exponentially (e.g., as a power of the distance
$|r| = |q_1 - q_2|$).

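A typical example is the two-dimensional Ising model on a square
lattice and with nearest-neighbor ferromagnetic interaction of size
$J$. It has a single critical point at $\beta = \beta_c$, $h = 0$ with
$\sinh 2\beta_c J = 1$. The cluster property is that
$\langle\sigma_x\sigma_y\rangle - \langle\sigma_x\rangle\langle\sigma_y\rangle \to 0$
as $|x - y| \to \infty$, behaving as

\[ A_+(\beta)\, \frac{e^{-\kappa(\beta)|x - y|}}{\sqrt{|x - y|}}, \qquad A_-(\beta)\, \frac{e^{-\kappa(\beta)|x - y|}}{|x - y|^2}, \qquad A_c\, \frac{1}{|x - y|^{1/4}} \qquad [58] \]

for $\beta < \beta_c$, $\beta > \beta_c$, or $\beta = \beta_c$,
respectively, where $A_\pm(\beta), A_c, \kappa(\beta) > 0$. The
properties [58] stem from the exact solution of the model.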

At the critical point, several interesting phenomena occur: the lack of
exponential decay indicates the lack of a length scale over which
really distinct phenomena can take place, and properties of the system
observed at different length scales are likely to be simply related by
suitable scaling transformations. Many efforts have been dedicated to
finding ways of understanding quantitatively the scaling properties
pertaining to different observables. The result has been the
development of the renormalization group approach to critical phenomena
(cf. the section “Renormalization group”). The picture that emerges is
that the closer the critical point is, the larger becomes the maximal
scale of length below which scaling properties are observed. For
instance, in a lattice spin system in zero field the magnetization
$M_\Lambda |\Lambda|^{-a}$ in a box $\Lambda \subset \Omega$ should
have essentially the same distribution for all $\Lambda$'s with side
$< l_0(\beta)$, and $l_0(\beta) \to \infty$ as $\beta \to \beta_c$,
provided $a$ is suitably chosen. The number $a$ is called a critical
exponent.

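There are several other “critical exponents” that can be defined near a
critical point. They can be associated with singularities of the
thermodynamic functions or with the behavior of the correlation
functions involving joint densities at two or more than two points. As
an example, consider a lattice spin system: then the “$2n$-spins
correlation”
$\langle\sigma_0\sigma_{\xi_1}\cdots\sigma_{\xi_{2n-1}}\rangle_c$ could
behave proportionally to $\chi_{2n}(0, \xi_1, \ldots, \xi_{2n-1})$,
$n = 1, 2, 3, \ldots$, for a suitable family of homogeneous functions
$\chi_{2n}$, of some degree $\omega_{2n}$, of the coordinates
$(\xi_1, \ldots, \xi_{2n-1})$, at least when the reciprocal distances
are large but $< l_0(\beta)$ and

\[ l_0(\beta) = \mathrm{const.}\, (\beta - \beta_c)^{-\nu} \xrightarrow[\beta \to \beta_c]{} \infty \]

This means that if the $\xi_i$ are regarded as points in
$\mathbb{R}^d$ there are functions $\chi_{2n}$ such that

\[ \chi_{2n}\Bigl(0, \frac{\xi_1}{\lambda}, \ldots, \frac{\xi_{2n-1}}{\lambda}\Bigr) = \lambda^{\omega_{2n}}\, \chi_{2n}(0, \xi_1, \ldots, \xi_{2n-1}), \qquad 0 < \lambda \in \mathbb{R} \qquad [59] \]

and
$\langle\sigma_0\sigma_{\xi_1}\cdots\sigma_{\xi_{2n-1}}\rangle_c \propto \chi_{2n}(0, \xi_1, \ldots, \xi_{2n-1})$
if $1 \ll |\xi_i - \xi_j| \ll l_0(\beta)$. The numbers $\omega_{2n}$
define a sequence of critical exponents.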

Other critical exponents can be associated with
approaching the critical point along other directions
(e.g., along $h \to 0$ at $\beta = \beta_c$). In this case, the length up
to which there are scaling phenomena is $l_0(h) = \ell_0 h^{-\bar\nu}$.
Further, the magnetization $m(h)$ tends to 0 as $h \to 0$ at
fixed $\beta = \beta_c$ as $m(h) = m_0 h^{1/\delta}$ for $\delta > 0$.

None of the features of critical exponents is known
rigorously, including their existence. An exception is the
case of the two-dimensional nearest-neighbor Ising
ferromagnet where some exponents are known exactly
(e.g., $\omega_2 = 1/4$, $\omega_{2n} = n\omega_2$, or $\nu = 1$, while $\delta$, $\bar\nu$ are not
rigorously known). Nevertheless, for Ising ferromagnets
(not even nearest-neighbor but, as always here,
finite-range) in all dimensions, all of the exponents
mentioned are conjectured to be the same as those
of the nearest-neighbor Ising ferromagnet. A further
exception is the derivation of rigorous relations
between critical exponents and, in some cases, even
their values under the assumption that they exist.


Remark Naively it could be expected that in a pure
state in zero field with $\langle\sigma_x\rangle = 0$ the quantity
$s = |\Lambda|^{-1/2} \sum_{x\in\Lambda} \sigma_x$, if $\Lambda$ is a cubic box of side $\ell$,
should have a probability distribution which is
Gaussian, with dispersion $\lim_{\Lambda\to\infty} \langle s^2\rangle$. This is
“usually true,” but not always. Properties [58]
show that in the $d = 2$ ferromagnetic nearest-neighbor
Ising model at $\beta = \beta_c$, $\langle s^2\rangle$ diverges proportionally
to $\ell^{2-1/4}$ so that the variable $s$ cannot have the above
Gaussian distribution. The variable $S = |\Lambda|^{-7/8} \sum_{x\in\Lambda} \sigma_x$
will have a finite dispersion: however,
there is no reason that it should be Gaussian. This
makes clear the great interest of a fluctuation theory
and its relevance for the critical point studies (see
the next two sections).
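
The growth rate of $\langle s^2\rangle$ in the remark can be illustrated numerically. The sketch below assumes, purely for illustration, that the pair correlation is exactly the power law $|x-y|^{-1/4}$ (with constant 1 and value 1 at coinciding sites), which is only the asymptotic form given in [58]; the function names are hypothetical.

```python
import math

def normalized_second_moment(side, decay=0.25):
    """<s^2> = |Lambda|^{-1} * sum over x, y in a side x side box of c(x - y),
    with the assumed power law c(r) = |r|**(-decay) for r != 0 and c(0) = 1."""
    total = 0.0
    for dx in range(-side + 1, side):
        for dy in range(-side + 1, side):
            # number of pairs (x, y) inside the box with x - y = (dx, dy)
            pairs = (side - abs(dx)) * (side - abs(dy))
            r = math.hypot(dx, dy)
            total += pairs * (1.0 if r == 0.0 else r ** (-decay))
    return total / side**2

def growth_exponent(side):
    """Slope of log <s^2> against log(side), measured between side and 2 * side."""
    ratio = normalized_second_moment(2 * side) / normalized_second_moment(side)
    return math.log(ratio) / math.log(2.0)

print(growth_exponent(32))   # expected to be close to 2 - 1/4 = 1.75
```

The double sum over pairs is reduced to a sum over displacements, so the cost grows only like the square of the side.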


For more details, the reader is referred to Onsager 
(1944), Domb and Green (1972), McCoy and Wu 
(1973), and Aizenman (1982). 


Fluctuations 


As appears from the discussion in the last section,
fluctuations of observables around their averages
have interesting properties, particularly at critical
points. Of particular interest are observables that
are averages, over large volumes $\Lambda$, of local functions
$F(x)$ on phase space: this is so because macroscopic
observables often have this form. For instance, given
a region $\Lambda$ inside the system container $\Omega$, $\Lambda \subset \Omega$,
consider a configuration $x = (P, Q)$ and the number
of particles $N_\Lambda = \sum_{q\in\Lambda} 1$ in $\Lambda$, or the potential energy
$\Phi_\Lambda = \sum_{(q,q')\in\Lambda} \varphi(q - q')$ or the kinetic energy
$K_\Lambda = \sum_{q\in\Lambda} (1/2m)\,p^2$. In the case of lattice spin
systems, consider a configuration $\sigma$ and, for instance,
the magnetization $M_\Lambda = \sum_{i\in\Lambda} \sigma_i$ in $\Lambda$. Label the
above four examples by $\alpha = 1, \ldots, 4$.

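Let $\mu_\alpha$ be the probability distribution describing
the equilibrium state in which the quantities $X_\alpha$ are
considered; let $x_\alpha = \langle X_\alpha/|\Lambda|\rangle_{\mu_\alpha}$ and $p = (X_\alpha - x_\alpha)/|\Lambda|$.
Then typical properties of fluctuations that
should be investigated are ($\alpha = 1, \ldots, 4$):

1. for all $\delta > 0$ it is $\lim_{\Lambda\to\infty} \mu_\alpha(|p| > \delta) = 0$ (law of
large numbers);
2. there is $D_\alpha > 0$ such that

$$\mu_\alpha\big(p\sqrt{|\Lambda|} \in [a,b]\big) \xrightarrow[\Lambda\to\infty]{} \int_a^b \frac{dz}{\sqrt{2\pi D_\alpha}}\, e^{-z^2/2D_\alpha}$$

(central limit law); and
3. there is an interval $I_\alpha = (p^*_{\alpha,-}, p^*_{\alpha,+})$ and a concave
function $F_\alpha(p)$, $p \in I_\alpha$, such that if $[a,b] \subset I_\alpha$ then

$$\frac{1}{|\Lambda|} \log \mu_\alpha(p \in [a,b]) \xrightarrow[\Lambda\to\infty]{} \max_{p\in[a,b]} F_\alpha(p)$$

(large deviations law).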


The law of large numbers provides the certainty
of the macroscopic values; the central limit law
controls the small fluctuations (of order $\sqrt{|\Lambda|}$) of $X_\alpha$
around its average; and the large deviations law
concerns the fluctuations of order $|\Lambda|$.
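
The large deviations law can be checked exactly in the simplest conceivable case. The sketch below assumes independent spins taking the values $\pm 1$ with probability 1/2 each (no interaction, zero field), a case chosen here only for illustration; for it the magnetization per site has binomial weights and the function $F(p)$ is the classical binomial entropy expression coded in `rate`. Both function names are hypothetical.

```python
import math

def log_prob_interval(n, a, b):
    """log Prob(a <= M/n <= b) for n independent spins equal to +1 or -1 with
    probability 1/2 each; exact binomial weights, summed in log space."""
    logs = []
    for up in range(n + 1):
        p = (2 * up - n) / n
        if a <= p <= b:
            logs.append(math.lgamma(n + 1) - math.lgamma(up + 1)
                        - math.lgamma(n - up + 1) - n * math.log(2.0))
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def rate(p):
    """F(p) for independent fair spins, valid for |p| < 1."""
    return -0.5 * ((1 + p) * math.log(1 + p) + (1 - p) * math.log(1 - p))

n, a, b = 2000, 0.2, 0.4
print(log_prob_interval(n, a, b) / n)   # approaches the maximum of F on [a, b]
print(rate(a))                          # F decreases on [0, 1): the maximum is F(a)
```

The two printed numbers differ by a correction that vanishes as $n$ grows, as expected from a statement about the leading exponential order only.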

The relations (1)-(3) above are not always true:
they can be proved under further general assumptions
if the potential $\varphi$ satisfies [14] in the case of
particle systems or if $\sum_q |\varphi(q)| < \infty$ in the case
of lattice spin systems. The function $F_\alpha(p)$ is
defined in terms of the thermodynamic limits of
suitable thermodynamic functions associated with
the equilibrium state $\mu_\alpha$. The further assumption is,
essentially in all cases, that a suitable thermodynamic
function in terms of which $F_\alpha(p)$ will be
expressed is smooth and has a nonvanishing second
derivative.

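For the purpose of a simple concrete example,
consider a lattice spin system of Ising type with
energy $-\sum_{x,y\in\Omega} \varphi(x-y)\sigma_x\sigma_y - \sum_x h\sigma_x$ and the fluctuations
of the magnetization $M_\Lambda = \sum_{x\in\Lambda} \sigma_x$, $\Lambda \subset \Omega$,
in the grand canonical equilibrium states $\mu_{h,\beta}$.

Let the free energy be $\beta f(\beta,h)$ (see [41]), let
$m = m(h) \overset{\text{def}}{=} \langle M_\Lambda/|\Lambda|\rangle$ and let $h(m)$ be the inverse
function of $m(h)$. If $p = M_\Lambda/|\Lambda|$ the function $F(p)$ is
given by

$$F(p) = \beta\big(f(\beta,h(p)) - f(\beta,h) - \partial_h f(\beta,h)\,(h(p) - h)\big) \qquad [60]$$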


then a quite general result is: 


Theorem The relations (1)-(3) hold if the potential
satisfies $\sum_x |\varphi(x)| < \infty$ and if $F(p)$ [60] is smooth
and $F''(p) \ne 0$ in open intervals around those in
which $p$ is considered, that is, around $p = 0$ for the
law of large numbers and for the central limit law or
in an open interval containing $a$, $b$ for the case of the
large deviations law.


In the cases envisaged, the theory of equivalence
of ensembles implies that the function $F$ can also be
computed via thermodynamic functions naturally
associated with other equilibrium ensembles. For
instance, instead of the grand canonical $\beta f(\beta,h)$, one
could consider the canonical $\beta g(\beta,m)$ (see [41]); then

$$F(p) = -\beta\big(g(\beta,p) - g(\beta,m) - \partial_m g(\beta,m)\,(p - m)\big) \qquad [61]$$


It has to be remarked that there should be a
strong relation between the central limit law and the
law of large deviations. Setting aside the
conditions for a precise mathematical theorem, the
statement can be efficiently illustrated in the case of
a ferromagnetic lattice spin system and with $\Lambda \equiv \Omega$,
by showing that the law of large deviations in small
intervals, around the average $m(h_0)$, at a value $h_0$ of
the external field, is implied by the validity of the
central limit law for all values of $h$ near $h_0$ and vice
versa (here $\beta$ is fixed). Taking $h_0 = 0$ (for simplicity),
the heuristic reasons are the following. Let $\mu_{h,\Omega}$ be
the grand canonical distribution in external field $h$.
Then:


1. The probability $\mu_{h,\Omega}(p \in dp)$ is proportional,
by definition, to $\mu_{0,\Omega}(p \in dp)\,e^{\beta h p|\Omega|}$. Hence,
if the central limit law holds for all $h$ near
$h_0 = 0$, there will exist two functions $m(h)$ and
$D(h) > 0$, defined for $h$ near $h_0 = 0$, with
$m(0) = 0$ and

$$\mu_0(p \in dp)\,e^{\beta h p|\Omega|} = \text{const.}\,\exp\Big(-|\Omega|\Big(\frac{(p - m(h))^2}{2D(h)} + o\big((p - m(h))^2\big)\Big)\Big)\,dp \qquad [62]$$

2. There is a function $\zeta(m)$ such that $\partial_m\zeta(m(h)) = \beta h$
and $\partial_m^2\zeta(m(h)) = D(h)^{-1}$. (This is obtained by
noting that, given $D(h)$, the differential equation
$\beta\,\partial_m h = D(h)^{-1}$ with the initial value $h(0) = 0$
determines the function $h(m)$; therefore, $\zeta(m)$
is determined by a second integration, from
$\partial_m\zeta(m) = \beta h(m)$.)

It then follows, heuristically, that the probability
of $p$ in zero field has the form const. $e^{\zeta(p)|\Omega|}\,dp$ so
that the probability that $p \in [a,b]$ will be const.
$\exp\big(|\Omega| \max_{p\in[a,b]} \zeta(p)\big)$.
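
The "second integration" in item 2 can be made concrete in a case where everything is explicit. The sketch below assumes independent spins in a field $h$, for which $m(h) = \tanh(\beta h)$ and the dispersion of $p\sqrt{|\Omega|}$ is $D = 1 - m^2$; this choice, and the function names, are assumptions made here for illustration only. Integrating $\partial_m\zeta = \beta h(m)$ from $\zeta(0) = 0$ reproduces the binomial entropy function, whose second derivative is $1/(1-m^2) = D^{-1}$; the sign with which $\zeta$ enters the exponent depends on the convention adopted, and the sketch only checks the integration.

```python
import math

def h_of_m(m, beta=1.0):
    """Inverse function of m(h) = tanh(beta * h) for independent spins."""
    return math.atanh(m) / beta

def zeta_numeric(m, beta=1.0, steps=20000):
    """zeta(m) from d(zeta)/dm = beta * h(m), zeta(0) = 0, by the trapezoid rule."""
    dm = m / steps
    total = 0.0
    for i in range(steps):
        left = beta * h_of_m(i * dm, beta)
        right = beta * h_of_m((i + 1) * dm, beta)
        total += 0.5 * (left + right) * dm
    return total

def zeta_closed(m):
    """Closed form of the same integral: the binomial entropy function."""
    return 0.5 * ((1 + m) * math.log(1 + m) + (1 - m) * math.log(1 - m))

print(zeta_numeric(0.5), zeta_closed(0.5))   # the two values should agree closely
```

Note that the result does not depend on $\beta$, because $\beta h(m) = \tanh^{-1}(m)$ for independent spins.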

Conversely, the large deviations law for $p$ at $h = 0$
implies the validity of the central limit law for the
fluctuations of $p$ in all small enough fields $h$: this
simply arises from the function $F(p)$ having a
negative second derivative.

This means that there is a “duality” between central 
limit law and large deviation law or that the law of 
large deviations is a “global version” of the central 
limit law, in the sense that: 


1. if the central limit law holds for $h$ in an interval
around $h_0$ then the fluctuations of the magnetization
at field $h_0$ satisfy a large deviation law in a
small enough interval $J$ around $m(h_0)$; and

2. if a large deviation law is satisfied in an interval
around $m(h_0)$ then the central limit law holds for the
fluctuations of magnetization around its average
in all fields $h$ with $h - h_0$ small enough.


Going beyond the heuristic level in establishing
the duality amounts to giving a precise meaning to
“small enough” and to discussing which properties of
$m(h)$ and $D(h)$, or of $F(p)$, are needed to derive
properties (1), (2).

For purposes of illustration consider the Ising
model with ferromagnetic short range interaction $\varphi$:
then the central limit law holds for all $h$ if $\beta$ is small
enough and, under the same condition on $\beta$, the
large deviations law holds for all $h$ and all intervals
$[a,b] \subset (-1,1)$. If $\beta$ is not small then the condition
$h \ne 0$ has to be added. Hence, the conditions are
fairly weak and the apparent exceptions concern the
value $h = 0$ and $\beta$ not small where the statements
may become invalid because of possible phase
transitions.

In presence of phase transitions, the law of large
numbers, the central limit law, and the law of large
deviations should be reformulated. Basically, one
has to add the requirement that fluctuations are
considered in pure phases and change, in a natural
way, the formulation of the laws. For instance,
the large fluctuations of magnetization in a pure
phase of the Ising model in zero field and large $\beta$
(i.e., in a state obtained as limit of finite-volume
states with $+$ or $-$ boundary conditions) in
intervals $[a,b]$ which do not contain the average
magnetization $m^*$ are not necessarily exponentially
small with the size of $|\Lambda|$: if $[a,b] \subset
[-m^*, m^*]$ they are exponentially small but only
with the size of the surface of $\Lambda$ (i.e., with
$|\Lambda|^{(d-1)/d}$) while they are exponentially small with
the volume if $[a,b] \cap [-m^*, m^*] = \emptyset$.


The discussion of the last section shows that at
the critical point the nature of the large fluctuations
is also expected to change: no central limit law is
expected to hold in general because of the example
of [58] with the divergence of the average of the
normal second moment of the magnetization in a
box as the side tends to $\infty$.

For more details the reader is referred to Olla 
(1987). 


Renormalization Group 


The theory of fluctuations just discussed concerns
only fluctuations of a single quantity. The problem
of joint fluctuations of several quantities is also
interesting and in fact led to really new developments
in the 1970s. It is necessary to restrict
attention to rather special cases in order to illustrate
some ideas and the philosophy behind the approach.
Consider, therefore, the equilibrium distribution $\mu_0$
associated with one of the classical equilibrium
ensembles. To fix the ideas we consider the
equilibrium distribution of an Ising energy function
$\beta H_0$, having included the temperature factor in the
energy: the inclusion is done because the discussion
will deal with the properties of $\mu_0$ as a function of $\beta$.
It will also be assumed that the average of each spin
is zero (“no magnetic field,” see [39] with $h = 0$).
Keeping in mind a concrete case, imagine that $\beta H_0$
is the energy function of the nearest-neighbor Ising
ferromagnet in zero field.

Imagine that the volume $\Omega$ of the container has
periodic boundary conditions and is very large,
ideally infinite. Define the family of blocks $k\xi$,
parametrized by $\xi \in Z^d$ and with $k$ an integer,
consisting of the lattice sites $x$ with $k\xi_i \le x_i < k(\xi_i + 1)$.
This is a lattice of cubic blocks with side size $k$
that will be called the “k-rescaled lattice.”

Given $\alpha$, the quantities $m_\xi = k^{-\alpha d} \sum_{x\in k\xi} \sigma_x$ are
called the block spins and define the map
$R^*_{\alpha,k}\mu_0 = \mu_k$ transforming the initial distribution on
the original spins into the distribution of the block
spins. Note that if the initial spins have only two
values $\sigma_x = \pm 1$, the block spins take values between
$-k^d/k^{\alpha d}$ and $k^d/k^{\alpha d}$ at steps of size $2/k^{\alpha d}$. Furthermore,
the map $R^*_{\alpha,k}$ makes sense independently of
how many values the initial spins can assume, and
even if they assume a continuum of values $S_x \in R$.
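
The block-spin construction acts very simply on a single configuration. The following is a minimal sketch for $d = 2$; the function name `block_spins` and the list-of-rows representation are choices made here for illustration.

```python
def block_spins(spins, k, alpha):
    """Block spins k**(-alpha*d) * (sum of the spins in each k x k block) for a
    square configuration `spins` (list of rows, d = 2), side a multiple of k."""
    d = 2
    side = len(spins)
    norm = k ** (-alpha * d)
    blocks = side // k
    return [[norm * sum(spins[k * i + a][k * j + b]
                        for a in range(k) for b in range(k))
             for j in range(blocks)]
            for i in range(blocks)]

all_up = [[1] * 4 for _ in range(4)]
print(block_spins(all_up, 2, 1.0))   # each block spin is k**d / k**(alpha*d) = 1
print(block_spins(all_up, 2, 0.5))   # each block spin is 4 / 2 = 2
```

The all-up configuration realizes the upper end of the range of values quoted above; flipping one spin lowers the corresponding block spin by one step, $2/k^{\alpha d}$.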

Taking $\alpha = 1$ means, for $k$ large, looking at the
probability distribution of the joint large fluctuations
in the blocks $k\xi$. Taking $\alpha = 1/2$ corresponds to
studying a joint central limit property for the block
variables.

Considering a one-parameter family of initial
distributions $\mu_0$ parametrized by a parameter $\beta$
(that will be identified with the inverse temperature),
typically there will be a unique value $\alpha(\beta)$ of $\alpha$ such
that the joint fluctuations of the block variables
admit a limiting distribution,

$$\mathrm{prob}_k\big(m_\xi \in [a_\xi, b_\xi],\ \xi \in \Lambda\big) \xrightarrow[k\to\infty]{} \int_{\{a_\xi\}}^{\{b_\xi\}} g_\Lambda\big((S_\xi)_{\xi\in\Lambda}\big) \prod_{\xi\in\Lambda} dS_\xi \qquad [63]$$

for some distribution $g_\Lambda(z)$ on $R^\Lambda$.

If $\alpha > \alpha(\beta)$, the limit will then be $\prod_{\xi\in\Lambda} \delta(S_\xi)\,dS_\xi$,
or if $\alpha < \alpha(\beta)$ the limit will not exist (because the
block variables will be too large, with a dispersion
diverging as $k \to \infty$).

It is convenient to choose as sequence of $k \to \infty$
the sequence $k = 2^n$ with $n = 0, 1, \ldots$ because in this
way it is $R^*_{\alpha,k} \equiv R^{*n}_{\alpha,1}$, and the limits $k \to \infty$ along
the sequence $k = 2^n$ can be regarded as limits on a
sequence of iterations of a map $R^*_{\alpha,1}$ acting on the
probability distributions of generic spins $S_x$ on the
lattice $Z^d$ (the sequence $3^n$ would be equally
suited).
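
The composition property behind the choice $k = 2^n$ can be seen already at the level of a single configuration. The sketch below works in $d = 1$ for brevity (an assumption of the example, as is the name `block_map`): forming blocks of side 8 in one step gives the same block spins as three successive steps with blocks of side 2.

```python
import random

def block_map(spins, k, alpha):
    """One-dimensional block-spin map: sums of k consecutive spins times k**(-alpha)."""
    return [k ** (-alpha) * sum(spins[i:i + k]) for i in range(0, len(spins), k)]

random.seed(0)
sigma = [random.choice((-1, 1)) for _ in range(64)]
alpha = 0.5
once = block_map(sigma, 8, alpha)                                  # k = 2**3 at once
thrice = block_map(block_map(block_map(sigma, 2, alpha), 2, alpha), 2, alpha)
print(max(abs(u - v) for u, v in zip(once, thrice)))               # zero up to rounding
```

The map on probability distributions discussed in the text is the one induced by this operation on configurations, so the same composition rule carries over to it.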

It is even more convenient to consider probability
distributions that are expressed in terms of energy
functions $H$ which generate, in the thermodynamic
limit, a distribution $\mu$: then $R^*_{\alpha,1}$ defines an action
$R_\alpha$ on the energy functions so that $R_\alpha H = H'$ if $H$
generates $\mu$, $H'$ generates $\mu'$ and $R^*_{\alpha,1}\mu = \mu'$. Of
course, the energy function will be more general
than [39] and at least a form like $\delta U$ in [49] has to
be admitted.

In other words, $R_\alpha$ gives the result of the action
of $R^*_{\alpha,1}$ expressed as a map acting on the energy
functions. Its iterates also define a semigroup
which is called the block spin renormalization
group.

While the map $R^*_{\alpha,1}$ is certainly well defined as a
map of probability distributions into probability
distributions, it is by no means clear that $R_\alpha$ is well
defined as a map on the energy functions, because, if
$\mu$ is given by an energy function, it is not clear that
$R^*_{\alpha,1}\mu$ is such.

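A remarkable theorem can be (easily) proved
when $R^*_{\alpha,1}$ and its iterates act on initial $\mu_0$'s which
are equilibrium states of a spin system with short-range
interactions and at high temperature ($\beta$ small).
In this case, if $\alpha = 1/2$, the sequence of distributions
$R^{*n}_{1/2,1}\mu_0(\beta)$ admits a limit which is given by
a product of independent Gaussians:

$$\mathrm{prob}_k\big(m_\xi \in [a_\xi, b_\xi],\ \xi \in \Lambda\big) \xrightarrow[k\to\infty]{} \int_{\{a_\xi\}}^{\{b_\xi\}} \prod_{\xi\in\Lambda} \exp\Big(-\frac{S_\xi^2}{2D(\beta)}\Big) \frac{dS_\xi}{\sqrt{2\pi D(\beta)}} \qquad [64]$$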

Note that this theorem is stated without even
mentioning the renormalization maps $R^n_{1/2}$: it can
nevertheless be interpreted as stating that

$$R^n_{1/2}\,\beta H_0 \xrightarrow[n\to\infty]{} \sum_{\xi\in Z^d} \frac{1}{2D(\beta)}\,S_\xi^2 \qquad [65]$$

but the interpretation is not rigorous because [64]
does not state nor require that $R^n_{1/2}\,\beta H_0$ makes sense
for $n \ge 1$. It states that at high temperature block
spins have normal independent fluctuations: it is
therefore an extension of the central limit law.
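
The content of the theorem can be illustrated by sampling in the most elementary high-temperature case. The sketch below assumes independent fair spins ($\beta = 0$) on a one-dimensional lattice, for which the dispersion is 1; both the model and the function name are choices made here for illustration, and the printed numbers are sample estimates with statistical error.

```python
import random

def sample_block_spin_pairs(k, samples, seed=1):
    """Draw `samples` pairs of adjacent one-dimensional block spins with
    alpha = 1/2 from independent fair spins (the beta = 0 case)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(samples):
        left = sum(rng.choice((-1, 1)) for _ in range(k)) / k ** 0.5
        right = sum(rng.choice((-1, 1)) for _ in range(k)) / k ** 0.5
        pairs.append((left, right))
    return pairs

pairs = sample_block_spin_pairs(k=64, samples=4000)
variance = sum(a * a for a, _ in pairs) / len(pairs)
covariance = sum(a * b for a, b in pairs) / len(pairs)
print(variance)     # expected near 1, the dispersion for independent spins
print(covariance)   # expected near 0: distinct blocks fluctuate independently
```

With interacting spins at small $\beta$ the same experiment would require a Monte Carlo sampler, which is beyond the scope of this sketch.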

There are a few cases in which the map $R_\alpha$ can be
rigorously shown to be well defined at least when
acting on special equilibrium states like the high-temperature
lattice spin systems: but these are
exceptional cases of relatively little interest.

Nevertheless, there is a vast literature dealing with
approximate representations of the map $R_\alpha$. The
reason is that, assuming not only its existence but
also that it has the properties that one would
normally expect to hold for a map acting on a finite
dimensional space, it follows that a number of
consequences can be drawn; quite nontrivial ones as
they led to the first theory of the critical point that
goes beyond the van der Waals theory described in
the section “van der Waals theory.”

The argument proceeds essentially as follows. At
the critical point, the fluctuations are expected to be
anomalous (cf. the last remark in the section “Critical
points”) in the sense that $\langle(\sum_{\xi\in\Lambda} \sigma_\xi/\sqrt{|\Lambda|})^2\rangle$ will
tend to $\infty$, because $\alpha = 1/2$ does not correspond to
the right fluctuation scale of $\sum_{\xi\in\Lambda} \sigma_\xi$, signaling that
$R^{*n}_{1/2,1}\mu_0(\beta_c)$ will not have a limit but, possibly, there
is $\alpha_c > 1/2$ such that $R^{*n}_{\alpha_c,1}\mu_0(\beta_c)$ converges to a limit
in the sense of [63]. In the case of the critical nearest-neighbor
Ising ferromagnet, $\alpha_c = 7/8$ (see ending
remark in the section “Critical points”). Therefore, if
the map $R^*_{\alpha_c,1}$ is considered as acting on $\mu_0(\beta)$, it will
happen that for all $\beta < \beta_c$, $R^{*n}_{\alpha_c,1}\mu_0(\beta)$ will converge to
a trivial limit $\prod_{\xi\in\Lambda} \delta(S_\xi)\,dS_\xi$, because the value $\alpha_c$ is
greater than 1/2 while normal fluctuations are expected.

If the map $R_{\alpha_c}$ can be considered as a map on the
energy functions, this says that $\prod_{\xi\in\Lambda} \delta(S_\xi)\,dS_\xi$ is a
“(trivial) fixed point of the renormalization group”
which “attracts” the energy functions $\beta H_0$ corresponding
to the high-temperature phases.

The existence of the critical $\beta_c$ can be associated
with the existence of a nontrivial fixed point $H^*$ for
$R_{\alpha_c}$ which is hyperbolic with just one Lyapunov
exponent $\lambda > 1$; hence, it has a stable manifold of
codimension 1. Call $\mu^*$ the probability distribution
corresponding to $H^*$.

The migration towards the trivial fixed point for
$\beta < \beta_c$ can be explained simply by the fact that for
such values of $\beta$ the initial energy function $\beta H_0$ is
outside the stable manifold of the nontrivial fixed
point and under application of the renormalization
transformation $R^n_{\alpha_c}\,\beta H_0$ migrates toward the trivial
fixed point, which is attractive in all directions.

By increasing $\beta$, it may happen that, for
$\beta = \beta_c$, $\beta H_0$ crosses the stable manifold of the
nontrivial fixed point $H^*$ for $R_{\alpha_c}$. Then $R^n_{\alpha_c}\,\beta_c H_0$
will no longer tend to the trivial fixed point but it
will tend to $H^*$: this means that the block spin
variables will exhibit a completely different fluctuation
behavior. If $\beta$ is close to $\beta_c$, the iterations of $R_{\alpha_c}$
will bring $R^n_{\alpha_c}\,\beta H_0$ close to $H^*$, only to be eventually
repelled along the unstable direction reaching a
distance from it increasing as $\lambda^n|\beta - \beta_c|$.

This means that up to a scale length $O(2^{n(\beta)})$ lattice
units with $\lambda^{n(\beta)}|\beta - \beta_c| = 1$ (i.e., up to a scale
$O(|\beta - \beta_c|^{-\log_\lambda 2})$), the fluctuations will be close to those of the
fixed point distribution $\mu^*$, but beyond that scale they
will come close to those of the trivial fixed point: to see
them the block spins would have to be normalized
with index $\alpha = 1/2$ and they would appear as
uncorrelated Gaussian fluctuations (cf. [64], [65]).
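
The arithmetic relating the number of iterations to the crossover length is short enough to be spelled out. In the sketch below the function names and the sample values of $\beta$, $\beta_c$ and $\lambda$ are hypothetical, chosen only to exercise the formulas.

```python
import math

def crossover_scale(beta, beta_c, lam):
    """Length 2**n(beta), in lattice units, with lam**n(beta) * |beta - beta_c| = 1."""
    n = -math.log(abs(beta - beta_c)) / math.log(lam)
    return 2.0 ** n

def scale_exponent(lam):
    """Exponent log_lam(2) in the law |beta - beta_c| ** (-log_lam 2)."""
    return math.log(2.0) / math.log(lam)

beta, beta_c, lam = 0.43, 0.44, 3.0
print(crossover_scale(beta, beta_c, lam))
print(abs(beta - beta_c) ** (-scale_exponent(lam)))   # same number, written as a power law
```

Comparing with the length $l_0(\beta) = \text{const.}\,(\beta - \beta_c)^{-\nu}$ introduced earlier, this picture would suggest the identification $\nu = \log_\lambda 2$, to be taken as heuristic as the picture itself.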

The next question concerns finding the nontrivial
fixed points, which means finding the energy
functions $H^*$ and the corresponding $\alpha_c$ which are
fixed points of $R_{\alpha_c}$. If the above picture is correct,
the distributions $\mu^*$ corresponding to the $H^*$ would
describe the critical fluctuations and, if there was
only one choice, or a limited number of choices, of
$\alpha_c$ and $H^*$ this would open the way to a universality
theory of the critical point hinted at already by the
“primitive” results of van der Waals’ theory.

The initial hope was, perhaps, that there would be a
very small number of critical values $\alpha_c$ and $H^*$
possible: but it rapidly faded away leaving, however,
the possibility that the critical fluctuations could be
classified into universality classes. Each class would
contain many energy functions which, upon iterated
actions of $R_{\alpha_c}$, would evolve under the control of the
trivial fixed point (always existing) for $\beta$ small while,
for $\beta = \beta_c$, they would be controlled, instead, by a
nontrivial fixed point $H^*$ for $R_{\alpha_c}$ with the same $\alpha_c$ and
the same $H^*$. For $\beta < \beta_c$, a “resolution” of the
approach to the trivial fixed point would be seen by
considering the map $R_{1/2}$ rather than $R_{\alpha_c}$, whose
iterates would, however, lead to a Gaussian distribution
like [64] (and to a limit energy function like [65]).

The picture is highly hypothetical: but it is
the first suggestion of a mechanism leading to
critical points with the character of universality
and with exponents different from those of the van
der Waals theory or, for ferromagnets on a lattice,
from those of its lattice version (the Curie-Weiss
theory). Furthermore, accepting the approximations
(e.g., the Wilson-Fisher $\varepsilon$-expansion) that allow one
to pass from the well-defined $R^*_{\alpha,1}$ to the action of
$R_\alpha$ on the energy functions, it is possible to obtain
quite unambiguously values for $\alpha_c$ and expressions
for $H^*$ which are associated with the action of $R_{\alpha_c}$
on various classes of models.

For instance, it can lead to the conclusion that
all ferromagnetic finite-range
lattice spin systems (with energy functions given by
[39]) have critical points controlled by the same $\alpha_c$
and the same nontrivial fixed point: this property is
far from being mathematically proved, but it is
considered a major success of the theory. One has to
compare it with van der Waals’ critical point theory:
for the first time, an approximation scheme has
led, even though under approximations not fully
controllable, to computable critical exponents which
are not equal to those of the van der Waals theory.

The renormalization group approach to critical 
phenomena has many variants, depending on which 
kind of fluctuations are considered and on the models 
to which it is applied. In statistical mechanics, there 
are a few mathematically complete applications: 
certain results in higher dimensions, theory of dipole 
gas in d=2, hierarchical models, some problems in 
condensed matter and in statistical mechanics of 
lattice spins, and a few others. Its main mathematical
successes have occurred in various related fields where
not only can the philosophy described above be
applied but it leads to renormalization transformations
that can be defined precisely and studied in
detail: for example, constructive field theory, KAM
theory of quasiperiodic motions, and various problems
in dynamical systems.

However, the applications always concern special 
cases and in each of them the general picture of the 
trivial-nontrivial fixed point dichotomy appears 
realized but without being accompanied, except in 
rare cases (like the hierarchical models or the 
universality theory of maps of the interval), by the 
full description of stable manifold, unstable direction, 
and action of the renormalization transformation on 
objects other than the one of immediate interest (a 
generality which looks often an intractable problem, 
but which also turns out not to be necessary). 

In the renormalization group context, mathema- 
tical physics has played an important role also by 
providing clear evidence that universality classes 
could not be too few: this was shown by the 
numerous exact solutions after Onsager’s solution 
of the nearest-neighbor Ising ferromagnet: there are 
in fact several lattice models in d=2 that exhibit 
critical points with some critical exponents exactly 
computable and that depend continuously on the
model parameters.

For more details, we refer the reader to McCoy
and Wu (1973), Baxter (1982), Bleher and Sinai
(1975), Wilson and Fisher (1972), Gawedzki and
Kupiainen (1983, 1985), Benfatto and Gallavotti
(1995), and Mastropietro (2004).


Quantum Statistics 


Statistical mechanics is extended to assemblies of
quantum particles rather straightforwardly. In the
case of $N$ identical particles, the observables are
operators $O$ on the Hilbert space

$$H_N = L_2(\Omega)^N_\alpha \quad\text{or}\quad H_N = \big(L_2(\Omega) \otimes C^2\big)^N_\alpha$$

where $\alpha = +, -$, of the symmetric ($\alpha = +$, bosonic
particles) or antisymmetric ($\alpha = -$, fermionic particles)
functions $\psi(Q)$, $Q = (q_1, \ldots, q_N)$, of the position
coordinates of the particles or of the position
and spin coordinates $\psi(Q,\sigma)$, $\sigma = (\sigma_1, \ldots, \sigma_N)$, normalized
so that

$$\int |\psi(Q)|^2\,dQ = 1 \quad\text{or}\quad \sum_\sigma \int |\psi(Q,\sigma)|^2\,dQ = 1;$$

here only $\sigma_j = \pm 1$ is considered. As in classical
mechanics, a state is defined by the average values
$\langle O\rangle$ that it attributes to the observables.
Microcanonical, canonical, and grand canonical
ensembles can be defined quite easily. For instance,
consider a system described by the Hamiltonian
($\hbar$ = Planck’s constant)

$$H_N = -\frac{\hbar^2}{2m} \sum_{j=1}^N \Delta_{q_j} + \sum_{j<j'} \varphi(q_j - q_{j'}) + \sum_j w(q_j) \overset{\text{def}}{=} K + \Phi \qquad [66]$$

where periodic boundary conditions are imagined
on $\Omega$ and $w(q)$ is a periodic, smooth potential (the side
of $\Omega$ is supposed to be a multiple of the periodic
potential period if $w \ne 0$). Then a canonical
equilibrium state with inverse temperature $\beta$ and
specific volume $v = V/N$ attributes to the observable
$O$ the average value

$$\langle O\rangle \overset{\text{def}}{=} \frac{\mathrm{Tr}\, e^{-\beta H_N}\,O}{\mathrm{Tr}\, e^{-\beta H_N}} \qquad [67]$$

Similar definitions can be given for the grand
canonical equilibrium states.

Remarkably, the ensembles are orthodic and a “heat
theorem” (see the section “Heat theorem and ergodic
hypothesis”) can be proved. However, “equipartition”
does not hold: that is, $\langle K\rangle \ne (d/2)N\beta^{-1}$, although $\beta^{-1}$
is still the integrating factor of $dU + p\,dV$ in the heat
theorem; hence, $\beta^{-1}$ continues to be proportional to
temperature.


Lack of equipartition is important, as it solves 
paradoxes that arise in classical statistical mechanics 
applied to systems with infinitely many degrees 
of freedom, like crystals (modeled by lattices of 
coupled oscillators) or fields (e.g., the electromagnetic 
field important in the study of black body radiation). 
However, although this has been the first surprise of 
quantum statistics (and in fact responsible for the 
very discovery of quanta), it is by no means the last. 
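
The failure of equipartition is easy to exhibit for a single degree of freedom of the kind appearing in crystal or field models. The sketch below is not the Hamiltonian [66]: it assumes one harmonic oscillator with levels $E_n = \hbar\omega(n + 1/2)$, evaluates the canonical average as a ratio of truncated traces, and compares it with the classical equipartition value $\beta^{-1}$ for an oscillator. The function name and the truncation are choices made here.

```python
import math

def oscillator_mean_energy(beta, hbar_omega=1.0, levels=2000):
    """<H> = Tr(exp(-beta*H) H) / Tr(exp(-beta*H)) for one harmonic oscillator
    with levels E_n = hbar_omega * (n + 1/2), truncated to `levels` terms."""
    energies = [hbar_omega * (n + 0.5) for n in range(levels)]
    weights = [math.exp(-beta * e) for e in energies]
    return sum(w * e for w, e in zip(weights, energies)) / sum(weights)

for beta in (0.01, 1.0, 10.0):
    # quantum average next to the classical equipartition value 1/beta
    print(beta, oscillator_mean_energy(beta), 1.0 / beta)
```

At high temperature ($\beta\hbar\omega \ll 1$) the two columns nearly agree, while at low temperature the quantum average stays close to the zero-point energy $\hbar\omega/2$ and is far from $\beta^{-1}$.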

At low temperatures, new unexpected (i.e., 
with no analogs in classical statistical mechanics) 
phenomena occur: Bose-Einstein condensation 
(superfluidity), Fermi surface instability (supercon- 
ductivity), and appearance of off-diagonal long- 
range order (ODLRO) will be selected to illustrate 
the deeply different kinds of problems of quantum 
statistical mechanics. Largely not yet understood, 
such phenomena pose very interesting problems not 
only from the physical point of view but also from 
the mathematical point of view and may pose 
challenges even at the level of a definition. However, 
it should be kept in mind that in the interesting cases 
(i.e., three-dimensional systems and even most two- 
and one-dimensional systems) there is no proof that 
the objects defined below really exist for the systems 
like [66] (see, however, the final comment for an 
important exception). 


Bose-Einstein Condensation 


In a canonical state with parameters $\beta, v$, a definition
of the occurrence of Bose condensation is in
terms of the eigenvalues $\nu_j(\Omega,N)$ of the kernel
$\rho(q,q')$ on $L_2(\Omega)$, called the one-particle reduced
density matrix, defined by

$$N \sum_{n=1}^{\infty} \frac{e^{-\beta E_n(\Omega,N)}}{\mathrm{Tr}\, e^{-\beta H_N}} \int \psi_n(q, q_1, \ldots, q_{N-1})\,\overline{\psi_n}(q', q_1, \ldots, q_{N-1})\,dq_1 \cdots dq_{N-1} \qquad [68]$$

where $E_n(\Omega,N)$ are the eigenvalues of $H_N$ and
$\psi_n(q_1, \ldots, q_N)$ are the corresponding eigenfunctions.
If $\nu_j$ are ordered by increasing value, the state with
parameters $\beta, v$ is said to contain a Bose-Einstein
condensate if $\nu_1(\Omega,N) \ge bN > 0$ for all large $\Omega$ at
$v = V/N$, $\beta$ fixed. This receives the interpretation
that there are more than $bN$ particles with equal
momentum. The free Bose gas exhibits a Bose
condensation phenomenon at fixed density and
small temperature.
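
For the free gas in a periodic box the one-particle reduced density matrix is diagonal on plane waves, so its eigenvalues are the mean occupation numbers of the momentum levels. The sketch below illustrates the macroscopic occupation of the lowest level under several assumptions made here for simplicity: occupation numbers are taken from the grand canonical ensemble with the chemical potential fixed by the mean particle number (rather than from the canonical state of the text), energies are measured in units of the lowest nonzero level, and momenta are truncated at `nmax`. All function names are hypothetical.

```python
import math
from collections import Counter

def shell_degeneracies(nmax):
    """Count the plane waves n in Z^3, |n_i| <= nmax, by energy |n|^2
    (units in which the lowest nonzero level of the periodic box is 1)."""
    rng = range(-nmax, nmax + 1)
    return Counter(a * a + b * b + c * c for a in rng for b in rng for c in rng)

def occupation(beta, energy, minus_mu):
    """Bose occupation 1/(exp(beta*(energy - mu)) - 1) with mu = -minus_mu < 0."""
    arg = beta * (energy + minus_mu)
    return 0.0 if arg > 700.0 else 1.0 / math.expm1(arg)

def condensate_fraction(beta, particles, nmax=25):
    """Fraction of particles in the n = 0 level at fixed mean particle number."""
    shells = shell_degeneracies(nmax)
    lo, hi = 1e-15, 1e3           # bracket for -mu; the total decreases with -mu
    for _ in range(200):
        mid = math.sqrt(lo * hi)   # bisection on a logarithmic scale
        total = sum(g * occupation(beta, e, mid) for e, g in shells.items())
        if total > particles:
            lo = mid
        else:
            hi = mid
    return occupation(beta, 0, hi) / particles

print(condensate_fraction(1.0, 1000))    # low temperature: close to 1
print(condensate_fraction(0.02, 1000))   # high temperature: very small
```

The contrast between the two regimes is the finite-size trace of the condensation phenomenon; a sharp transition appears only in the limit of large volume at fixed density.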


Fermi Surface 


The wave functions $\psi(q_1,\sigma_1,\ldots,q_N,\sigma_N) \equiv \psi_N(Q,\sigma)$
are now antisymmetric in the permutations of the
pairs $(q_i,\sigma_i)$. Let $\psi(Q,\sigma;N,n)$ denote the $n$th
eigenfunction of the $N$-particle energy $H_N$ in [66] with
eigenvalue $E(N,n)$ (labeled by $n = 0, 1, \ldots$ and
nondecreasingly ordered). Setting $Q'' = (q''_1, \ldots, q''_{N-p})$,
$\sigma'' = (\sigma''_1, \ldots, \sigma''_{N-p})$, introduce the kernels $\rho_p(Q,\sigma;Q',\sigma')$ by

$$\rho_p(Q,\sigma;Q',\sigma') \overset{\text{def}}{=} p!\binom{N}{p} \sum_{\sigma''} \int d^{N-p}Q'' \sum_{n=0}^{\infty} \frac{e^{-\beta E(N,n)}}{\mathrm{Tr}\, e^{-\beta H_N}}\, \psi(Q,\sigma;Q'',\sigma'';N,n)\,\overline{\psi}(Q',\sigma';Q'',\sigma'';N,n) \qquad [69]$$

which are called p-particle reduced density matrices
(extending the corresponding one-particle reduced
density matrix [68]). Denote $\rho(q_1 - q_2) \overset{\text{def}}{=} \sum_\sigma \rho_1(q_1,\sigma,q_2,\sigma)$.
It is also useful to consider spinless
fermionic systems: the corresponding definitions are
obtained simply by suppressing the spin labels and
will not be repeated.

Let $r_1(k)$ be the Fourier transform of $\rho_1(q - q')$: the
Fermi surface can be defined as the locus of the $k$’s in
the neighborhood of which $\partial_k r_1(k)$ is unbounded as
$\Omega \to \infty$, $\beta \to \infty$. The limit as $\beta \to \infty$ is important
because the notion of a Fermi surface is, possibly,
precise only at zero temperature, that is at $\beta = \infty$.

So far, existence of Fermi surface (i.e., the smoothness
of $r_1(k)$ except on a smooth surface in $k$-space)
has been proved in free Fermi systems ($\varphi = 0$) and

1. certain exactly soluble one-dimensional spinless
systems and

2. in rather general one-dimensional spinless systems
or systems with spin and repulsive pair interaction,
possibly in an external periodic potential.


The spinning case in a periodic potential and
dimension $d \ge 2$ is the most interesting case to study
for its relevance in the theory of conduction in
crystals. Essentially no mathematical results are
available as the above-mentioned ones do not
concern any case in dimension $> 1$: this is a rather
disappointing aspect of the theory and a challenge.

In dimension 2 or higher, for fermionic systems
with Hamiltonian [66], not only are there no results
available, even without spin, but it is not even clear
that a Fermi surface can exist in the presence of
interesting interactions.


Cooper Pairs 


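The superconductivity theory has been phenomenologically
related to the existence of Cooper pairs.
Consider the Hamiltonian [66] and define (cf. [69])

$$\rho(x - y, \sigma; x' - y', \sigma'; x - x') \overset{\text{def}}{=} \rho_2(x, \sigma, y, -\sigma; x', \sigma', y', -\sigma')$$

The system is said to contain Cooper pairs with
spins $\sigma, -\sigma$ ($\sigma = +$ or $\sigma = -$) if there exist functions
$g^\alpha(q,\sigma) \ne 0$ with

$$\int g^\alpha(q,\sigma)\,\overline{g^{\alpha'}}(q,\sigma)\,dq = 0 \quad\text{if } \alpha \ne \alpha'$$

such that

$$\lim_{V\to\infty} \rho(x - y, \sigma, x' - y', \sigma', x - x') \xrightarrow[x - x' \to \infty]{} \sum_\alpha g^\alpha(x - y, \sigma)\,\overline{g^\alpha}(x' - y', \sigma') \qquad [70]$$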
In this case, $g^\alpha(x - y, \sigma)$ with largest $L_2$ norm can be
called, after normalization, the wave function of the paired
state of lowest energy: this is the analog of the plane
wave for a free particle (and, like it, it is manifestly not
normalizable, i.e., it is not square integrable as a
function of $x, y$). If the system contains Cooper pairs
and the nonleading terms in the limit [70] vanish
quickly enough, the two-particle reduced density
matrix [70] regarded as a kernel operator has an
eigenvalue of order $V$ as $V \to \infty$: that is, the state of
lowest energy is “macroscopically occupied,” quite
like the free Bose condensation in the ground state.
Cooper pairs instability might destroy the Fermi
surface in the sense that $r_1(k)$ becomes analytic in $k$;
but it is also possible that, even in the presence of
them, there remains a surface which is the locus of the
singularities of the function $r_1(k)$. In the first case,
there should remain a trace of it as a very steep
gradient of $r_1(k)$ of the order of an exponential in the
inverse of the coupling strength; this is what happens
in the BCS model for superconductivity. The model is,
however, a mean-field model and this particular
regularity aspect might be one of its peculiarities. In
any event, a smooth singularity surface is very likely to
exist for some interesting density matrix (e.g., in the
BCS model with “gap parameter $\gamma$” the wave function

$$g(x - y, \sigma) = \frac{1}{(2\pi)^d} \int_{e(k) > 0} e^{ik\cdot(x - y)}\, \frac{\gamma}{\sqrt{e(k)^2 + \gamma^2}}\,dk$$

of the lowest energy level of the Cooper pairs is
singular on a surface coinciding with the Fermi
surface of the free system).


ODLRO 


Consider the k-fermion reduced density matrix
$\rho_k(Q,\sigma;Q',\sigma')$ as kernel operators $O_k$ on $L_2((\Omega \times C^2)^k)_-$.
Suppose $k$ is even; then, if $O_k$ has a (generalized)
eigenvalue of order $N^{k/2}$ as $N \to \infty$, $N/V = \rho$, the
system is said to exhibit off-diagonal long-range order
of order $k$. For $k$ odd, ODLRO is defined to exist if $O_k$
has an eigenvalue of order $N^{(k-1)/2}$ and $k \ge 3$ (if $k = 1$
the largest eigenvalue of $O_1$ is necessarily $\le 1$).

For bosons, consider the reduced density matrix
$\rho_k(Q;Q')$, regarding it as a kernel operator $O_k$ on
$L_2(\Omega)^k_+$, and define ODLRO of order $k$ to be present
if $O_k$ has a (generalized) eigenvalue of order $N^k$ as
$N \to \infty$, $N/V = \rho$.

ODLRO can be regarded as a unification of the 
notions of Bose condensation and of the existence of 
Cooper pairs, because Bose condensation could be 
said to correspond to the kernel operator $\rho_1(q_1-q_2)$
in [68] having a (generalized) eigenvalue of order N, 
and to be a case of ODLRO of order 1. If the state is 
pure in the sense that it has a cluster property (see 
the sections “Phase transitions and boundary condi- 
tions” and “Lattice models”), then the existence of 
ODLRO, Bose condensation, and Cooper pairs 
implies that the system shows a spontaneously 
broken symmetry: conservation of particle number 
and clustering imply that the off-diagonal elements 
of (all) reduced density matrices vanish at infinite 
separation in states obtained as limits of states with 
periodic boundary conditions and Hamiltonian [66], 
and this is incompatible with ODLRO. 

The free Fermi gas has no ODLRO, the BCS model 
of superconductivity has Cooper pairs and ODLRO 
with k =2, but no Fermi surface in the above sense 
(possibly too strict). Fermionic systems cannot have 
ODLRO of order 1 (because the reduced density 
matrix of order 1 is bounded by 1). 
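
The eigenvalue criterion can be illustrated numerically. The sketch below rests on simplifying assumptions that are not part of the text: a one-dimensional periodic lattice of L sites, ideal particles at zero temperature, and hypothetical helper names. Under these assumptions the one-particle reduced density matrix is translation invariant, so its eigenvalues are the discrete Fourier coefficients of its first row. For bosons all occupying the zero-momentum state the largest eigenvalue equals N, while for fermions filling N plane waves it equals 1.

```python
import cmath
import math


def circulant_eigenvalues(first_row):
    """Eigenvalues of the Hermitian circulant matrix rho[x][y] = first_row[(y - x) % L]."""
    L = len(first_row)
    eigenvalues = []
    for m in range(L):
        k = 2 * math.pi * m / L
        s = sum(first_row[r] * cmath.exp(1j * k * r) for r in range(L))
        eigenvalues.append(s.real)  # real because the matrix is Hermitian
    return eigenvalues


def boson_first_row(N, L):
    """Ideal bosons, all in the k = 0 plane wave: rho(x, y) = N / L."""
    return [N / L] * L


def fermion_first_row(N, L):
    """Ideal fermions filling the plane waves m = 0, ..., N-1 (illustrative choice)."""
    return [
        sum(cmath.exp(-2j * math.pi * m * r / L) for m in range(N)) / L
        for r in range(L)
    ]


if __name__ == "__main__":
    N, L = 16, 64
    print("largest boson eigenvalue:", max(circulant_eigenvalues(boson_first_row(N, L))))
    print("largest fermion eigenvalue:", max(circulant_eigenvalues(fermion_first_row(N, L))))
```

In both cases the eigenvalues sum to N; only in the bosonic case does a single eigenvalue grow proportionally to N, which is the order-1 ODLRO discussed above.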

The contribution of mathematical physics has 
been particularly effective in providing exactly 
soluble models: however, the soluble models deal 
with one-dimensional systems and it can be shown 
that in dimensions 1, 2 no ODLRO can take place. 
A major advance is the recent proof of ODLRO and 
Bose condensation in the case of a lattice version of 
[66] at a special density value (and d ≥ 3).

In no case, for the Hamiltonian [66] with φ ≠ 0,
has the existence of Cooper pairs been proved, nor
the existence of a Fermi surface for d > 1. Nevertheless,
both Bose condensation and Cooper pairs formation 
can be proved to occur rigorously in certain limiting 
situations. There are also a variety of phenomena 
(e.g., simple spectral properties of the Hamiltonians) 
which are believed to occur once some of the 
above-mentioned ones do occur and several of 
them can be proved to exist in concrete models. 

If d = 1, 2, ODLRO can be proved to be impossible
at T > 0 through the use of Bogoliubov’s
inequality (used in the “no d=2 crystal theorem,” 
see the section “Continuous symmetries: ‘no d=2 
crystal’ theorem”). 

For more details, the reader is referred to Penrose 
and Onsager (1956), Yang (1962), Ruelle (1969), 
Hohenberg (1967), Gallavotti (1999), and 
Aizenman et al. (2004). 


Appendix 1: The Physical Meaning of the 
Stability Conditions 


It is useful to see what would happen if the 
conditions of stability and temperedness (see [14]) 
are violated. The analysis also illustrates some of the 
typical methods of statistical mechanics. 


Coalescence Catastrophe due 
to Short-Distance Attraction 


The simplest violation of the first condition in [14]
occurs when the potential φ is smooth and negative
at the origin.

Let δ > 0 be so small that the potential at distances
≤ 2δ is ≤ −b < 0. Consider the canonical distribution
with parameters β, N in a (cubic) box Ω of volume V.
The probability $P_{\rm collapse}$ that all the N particles are
located in a little sphere of radius δ around the center
of the box (or around any prefixed point of the box) is
estimated from below by remarking that all pairs of
particles are then at mutual distance ≤ 2δ, hence

$$\Phi(q_1,\ldots,q_N)\le -b\binom{N}{2}=-\frac{b}{2}N(N-1)$$

so that

$$P_{\rm collapse}=\frac{\int_{\mathcal C}\frac{dp\,dq}{h^{3N}N!}\,e^{-\beta(K(p)+\Phi(q))}}{\int\frac{dp\,dq}{h^{3N}N!}\,e^{-\beta(K(p)+\Phi(q))}}\ \ge\ \frac{\left(\frac{4\pi}{3}\delta^3\right)^N\frac{1}{h^{3N}N!}\,e^{\beta b\frac12N(N-1)}}{\int\frac{dq}{h^{3N}N!}\,e^{-\beta\Phi(q)}}\qquad[71]$$

where $\mathcal C$ is the set of configurations with all particles in the
sphere, and the momentum integrals cancel between numerator
and denominator.

The phase space is extremely small: nevertheless,
such configurations are far more probable than the
configurations which “look macroscopically correct,”
that is, configurations with particles more or
less spaced by the average particle distance expected
in a macroscopically homogeneous configuration,
namely $(N/V)^{-1/3}=\rho^{-1/3}$. Their energy Φ(q) is of
the order of uN for some u, so that their probability
will be bounded above by

$$P_{\rm regular}\le\frac{\int\frac{dp\,dq}{h^{3N}N!}\,e^{-\beta(K(p)+uN)}}{\int\frac{dp\,dq}{h^{3N}N!}\,e^{-\beta(K(p)+\Phi(q))}}=\frac{\frac{V^N}{h^{3N}N!}\,e^{-\beta uN}}{\int\frac{dq}{h^{3N}N!}\,e^{-\beta\Phi(q)}}\qquad[72]$$

However, no matter how small δ is, the
ratio $P_{\rm regular}/P_{\rm collapse}$ will approach 0 as V → ∞,
N/V → $v^{-1}$; this occurs extremely rapidly because
$e^{\beta bN^2/2}$ eventually dominates over $V^N\sim e^{N\log N}$.
Thus, it is far more probable to find the system in a
microscopic volume of size δ rather than in a
configuration in which the energy has some macroscopic
value proportional to N. This catastrophe can
be called an ultraviolet catastrophe (as it is due to the
behavior at very short distances) and it causes the
collapse of the particles into configurations concentrated
in regions as small as we please as V → ∞.
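
The competition between the two bounds can be made concrete with a short calculation. The sketch below evaluates the logarithm of the upper bound on $P_{\rm regular}/P_{\rm collapse}$ obtained by dividing [72] by [71], namely N log V − βuN − N log(4πδ³/3) − βbN(N−1)/2 with V = vN. The numerical values of v, β, b, u, and δ are arbitrary illustrative assumptions, not values from the text, and the function name is hypothetical.

```python
import math


def log_ratio_bound(N, v=1.0, beta=1.0, b=0.1, u=-1.0, delta=0.1):
    """Logarithm of the upper bound on P_regular / P_collapse from [71] and [72].

    All default parameter values are illustrative assumptions.
    """
    V = v * N  # volume at fixed specific volume v
    # numerator of [72]: V^N * exp(-beta * u * N)
    log_regular = N * math.log(V) - beta * u * N
    # numerator of [71]: ((4 pi / 3) delta^3)^N * exp(beta * b * N (N - 1) / 2)
    log_collapse = (
        N * math.log(4.0 * math.pi * delta**3 / 3.0)
        + beta * b * N * (N - 1) / 2.0
    )
    return log_regular - log_collapse


if __name__ == "__main__":
    for N in (10, 100, 1000, 10000):
        print(N, log_ratio_bound(N))
```

For small N the entropic factor $V^N$ wins and the bound is uninformative, but the quadratic term −βbN²/2 eventually dominates the N log N growth, so the logarithm tends to −∞ as N increases.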


Coalescence Catastrophe due 
to Long-Range Attraction 


It occurs when the potential is too attractive near ∞.
For simplicity, suppose that the potential has a hard
core, i.e., it is +∞ for $r<r_0$, so that the above-discussed
coalescence cannot occur and the system
density is bounded above by a certain quantity $\rho_{cp}<\infty$
(close-packing density).

The catastrophe occurs if $\varphi(q)\sim -g|q|^{-3+\varepsilon}$, g, ε > 0,
for |q| large. For instance, this is the case for matter
interacting gravitationally; if k is the gravitational
constant and m is the particle mass, then $g=km^2$ and ε = 2.

The probability $P_{\rm regular}$ of “regular configurations,”
where particles are at distances of order $\rho^{-1/3}$ from
their close neighbors, is compared with the probability
$P_{\rm collapse}$ of “catastrophic configurations,” with the
particles at distances $r_0$ from their close neighbors to
form a configuration of density $\rho_{cp}/(1+\delta)^3$ almost in
close packing (so that $r_0$ is equal to the hard-core
radius times 1 + δ). In the latter case, the system does
not fill the available volume and leaves empty a region
whose volume is a fraction $\sim((\rho_{cp}-\rho)/\rho_{cp})V$ of V.
Further, it can be checked that the ratio $P_{\rm regular}/P_{\rm collapse}$
tends to 0 at a rate $O(\exp(g\tfrac12N(\rho_{cp}(1+\delta)^{-3}-\rho)))$
if δ is small enough (and $\rho<\rho_{cp}$).

A system which is too attractive at infinity will not
occupy the available volume but will stay confined in a
close-packed configuration even in empty space.

This is important in the theory of stars: stars cannot
be expected to obey “regular thermodynamics” and in
particular will not “evaporate” because their particles
interact via the gravitational force at large distances.
Stars do not occupy the whole volume given to them
(i.e., the universe); they do not collapse to a point only
because the interaction has a strongly repulsive core
(even when they are burnt out and the radiation pressure
is no longer able to keep them at a reasonable size).


Evaporation Catastrophe 


This is another infrared catastrophe, that is, a
catastrophe due to the long-range structure of the
interactions, as in the above subsection; it occurs when
the potential is too repulsive at ∞, that is,

$$\varphi(q)\sim +g|q|^{-3+\varepsilon}\quad\text{as } q\to\infty$$

so that the temperedness condition is again violated.

In this case too, the system does not
occupy the whole volume: it will generate a layer of
particles sticking, in close-packed configuration, to
the walls of the container. Therefore, if the density is
lower than the close-packing density, $\rho<\rho_{cp}$, the
system will leave a region around the center of the
container Ω empty; and the volume of the empty
region will still be of the order of the total volume of
the box (i.e., its diameter will be a fraction of the
box side L). The proof is completely analogous to
the one of the previous case, except that now the
configuration with lowest energy will be the one
sticking to the wall and close packed there, rather
than the one close packed at the center.

This catastrophe is also important, as it is realized in
systems of charged particles bearing the same charge:
the charges adhere to the boundary in close-packing
configuration, and dispose themselves so that the
electrostatic potential energy is minimal. Therefore,
charges deposited on a metal will not occupy the whole
volume: they will rather form a surface layer minimizing
the potential energy (i.e., so that the Coulomb
potential in the interior is constant). In general, charges
in excess of neutrality do not behave thermodynamically:
for instance, besides not occupying the whole
volume given to them, they will not contribute
normally to the specific heat.

Neutral systems of charges behave thermodynamically
if they have hard cores, so that the
ultraviolet catastrophe cannot occur, or if they obey
quantum-mechanical laws and consist of fermionic
particles (plus possibly bosonic particles with
charges of only one sign).

For more details, we refer the reader to Lieb
and Lebowitz (1972) and Lieb and Thirring (2001).


Appendix 2: The Subadditivity Method 


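A simple consequence of the assumptions is that the
exponential in (5.2) can be bounded above by
$e^{\beta BN}\exp\big(-\frac{\beta}{2m}\sum_ip_i^2\big)$ so that

$$1\le Z_{gc}(\beta,\lambda,V)\le\exp\Big(Ve^{\beta\lambda}e^{\beta B}\big(\sqrt{2m\beta^{-1}}\big)^{d}\Big)\ \Longrightarrow\ 0\le\frac1V\log Z_{gc}(\beta,\lambda,V)\le e^{\beta\lambda}e^{\beta B}\big(\sqrt{2m\beta^{-1}}\big)^{d}\qquad[73]$$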


Consider, for simplicity, the case of a hard-core
interaction with finite range (cf. [14]). Consider a
sequence of boxes $\Omega_n$ with sides $2^nL_0$, where $L_0>0$
is arbitrarily fixed to be > 2R. The partition function
$Z_{gc}(\beta,z)$ relative to the volume $\Omega_n$ is

$$Z_n=\sum_{N=0}^{\infty}\frac{z^N}{N!}\int_{\Omega_n^N}e^{-\beta\Phi(Q)}\,dQ$$

because the integral over the P variables can be
explicitly performed and included in $z^N$ if z is
defined as $z=e^{\beta\lambda}\big(\sqrt{2m\beta^{-1}}\big)^{d}$.

Then the box $\Omega_n$ contains $2^d$ boxes $\Omega_{n-1}$ for n ≥ 1
and

$$1\le Z_n\le Z_{n-1}^{2^d}\exp\big(\beta B\,2d\,(L_{n-1}/R)^{d-1}\,2^{2d}\big)\qquad[74]$$

because the corridor of width 2R around the
boundaries of the $2^d$ cubes $\Omega_{n-1}$ filling $\Omega_n$ has
volume $2RL_{n-1}^{d-1}\,2d$ and contains at most
$(L_{n-1}/R)^{d-1}\,2d\,2^{d}$ particles, each of which interacts
with at most $2^d$ other particles. Therefore,

$$\beta p_n\ \stackrel{\rm def}{=}\ L_n^{-d}\log Z_n\le L_{n-1}^{-d}\log Z_{n-1}+\beta B\gamma_d\,2^{-n}(L_0/R)^{-1}$$

for some $\gamma_d>0$. Hence, $0\le\beta p_n\le\beta p_{n-1}+\Gamma_d2^{-n}$
for some $\Gamma_d>0$ and $p_n$ is bounded above and below
uniformly in n. So, the limit [13] exists on the sequence
$L_n=L_02^n$ and defines a function $\beta p_\infty(\beta,\lambda)$.

A box of arbitrary size L can be filled with about
$(L/L_{\bar n})^d$ boxes of side $L_{\bar n}$ with $\bar n$ so large that,
prefixed δ > 0, $|p_\infty-p_n|<\delta$ for all $n\ge\bar n$. Likewise,
a box of size $L_n$ can be filled by about $(L_n/L)^d$
boxes of size L if n is large. The latter remarks lead
us to conclude, by standard inequalities, that the
limit in [13] exists and coincides with $p_\infty$.

The subadditivity method just demonstrated for 
finite-range potentials with hard core can be extended 
to the potentials satisfying just stability and tempered- 
ness (cf. the section “Thermodynamic limit”). 

For more details, the reader is referred to Ruelle 
(1969) and Gallavotti (1999). 


Appendix 3: An Infrared Inequality 


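The infrared inequalities stem from Bogoliubov’s
inequality. Consider as an example the problem of
crystallization discussed in the section “Continuous
symmetries: ‘no d=2 crystal’ theorem”. Let ⟨·⟩
denote average over a canonical equilibrium state
with Hamiltonian

$$H=\sum_{j=1}^N\tfrac12p_j^2+\Phi(q_1,\ldots,q_N)$$

with given temperature and density parameters
β, ρ, $\rho=a^{-d}$. Let $\{X,Y\}=\sum_j(\partial_{p_j}X\,\partial_{q_j}Y-\partial_{q_j}X\,\partial_{p_j}Y)$
be the Poisson bracket. Integration by parts, with
periodic boundary conditions, yields

$$\langle A^*\{C,H\}\rangle=-\frac{\int A^*\{C,e^{-\beta H}\}\,dP\,dQ}{\beta Z_c(\beta,\rho,N)}=-\beta^{-1}\langle\{A^*,C\}\rangle\qquad[75]$$

as a general identity. The latter identity implies, for
A = {C, H}, that

$$\langle\{H,C\}^*\{H,C\}\rangle=-\beta^{-1}\langle\{C,\{H,C^*\}\}\rangle\qquad[76]$$

Hence, the Schwarz inequality
$\langle A^*A\rangle\langle\{H,C\}^*\{H,C\}\rangle\ge|\langle A^*\{H,C\}\rangle|^2$ combined with the two
relations in [75], [76] yields Bogoliubov’s inequality:

$$\langle A^*A\rangle\ge\beta^{-1}\frac{|\langle\{A^*,C\}\rangle|^2}{\langle\{C,\{C^*,H\}\}\rangle}\qquad[77]$$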

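Let g, h be arbitrary complex (differentiable)
functions and $\partial_j=\partial_{q_j}$,

$$A(Q)=\sum_{j=1}^Ng(q_j),\qquad C(P,Q)=\sum_{j=1}^Np_j\,h(q_j)\qquad[78]$$

Then $H=\sum_j\tfrac12p_j^2+\Phi(q_1,\ldots,q_N)$, if

$$\Phi(q_1,\ldots,q_N)=\frac12\sum_{j\ne j'}\varphi(|q_j-q_{j'}|)+\varepsilon\sum_jW(q_j)$$

so that, via algebra,

$$\{C,H\}=\sum_j\big(h_j\,\partial_j\Phi-p_j\,(p_j\cdot\partial_j)h_j\big)$$

with $h_j=h(q_j)$. If h is real valued, $\langle\{C,\{C^*,H\}\}\rangle$
becomes, again via algebra,

$$\Big\langle\sum_{jj'}h_jh_{j'}\,\partial_j\cdot\partial_{j'}\,\tfrac12\sum_{i\ne i'}\varphi(|q_i-q_{i'}|)\Big\rangle+\Big\langle\varepsilon\sum_jh_j^2\,\Delta W(q_j)+\frac4\beta\sum_j(\partial_jh_j)^2\Big\rangle$$

(integrals on $p_j$ just replace $p_j^2$ by $2\beta^{-1}$ and
$\langle(p_j)_i(p_j)_{i'}\rangle=\beta^{-1}\delta_{i,i'}$). Therefore, the average
$\langle\{C,\{C^*,H\}\}\rangle$ becomes

$$\Big\langle\frac12\sum_{j\ne j'}(h_j-h_{j'})^2\,\Delta\varphi(|q_j-q_{j'}|)+\varepsilon\sum_jh_j^2\,\Delta W(q_j)+4\beta^{-1}\sum_j(\partial_jh_j)^2\Big\rangle\qquad[79]$$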


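Choose $g(q)=e^{-i(\kappa+K)\cdot q}$, $h(q)=\cos q\cdot\kappa$ and
bound $(h_j-h_{j'})^2$ by $\kappa^2(q_j-q_{j'})^2$, $(\partial_jh_j)^2$ by $\kappa^2$ and
$h_j^2$ by 1. Hence [79] is bounded above by $ND(\kappa)$
with

$$D(\kappa)\ \stackrel{\rm def}{=}\ \kappa^2\Big(4\beta^{-1}+\Big\langle\frac{1}{2N}\sum_{j\ne j'}(q_j-q_{j'})^2\,|\Delta\varphi(q_j-q_{j'})|\Big\rangle\Big)+\varepsilon\Big\langle\frac1N\sum_j|\Delta W(q_j)|\Big\rangle\qquad[80]$$

This can be used to estimate the denominator in
[77]. For the LHS remark that

$$\langle A^*A\rangle=\Big\langle\Big|\sum_{j=1}^Ne^{-iq_j\cdot(\kappa+K)}\Big|^2\Big\rangle$$

and

$$|\langle\{A^*,C\}\rangle|^2=\Big|\Big\langle\sum_jh_j\,\partial_jg^*(q_j)\Big\rangle\Big|^2=\tfrac14|K+\kappa|^2N^2\big(\rho_\varepsilon(K)+\rho_\varepsilon(K+2\kappa)\big)^2$$

hence [77] becomes, after multiplying both sides
by the auxiliary function γ(κ) (assumed even and
vanishing for |κ| > π/a) and summing over κ,

$$D_1\ \stackrel{\rm def}{=}\ \frac1N\sum_\kappa\gamma(\kappa)\Big\langle\frac1N\Big|\sum_{j=1}^Ne^{-i(K+\kappa)\cdot q_j}\Big|^2\Big\rangle\ \ge\ \frac1N\sum_\kappa\gamma(\kappa)\,\frac{|K+\kappa|^2\big(\rho_\varepsilon(K)+\rho_\varepsilon(K+2\kappa)\big)^2}{4\beta D(\kappa)}\qquad[81]$$

To apply [77] the averages in [80], [81] have to be
bounded above: this is a technical point that is
discussed here, as it illustrates a general method of
using the results on the thermodynamic limits and
their convexity properties to obtain estimates.

Note that $\big\langle\frac1N\sum_\kappa\gamma(\kappa)\frac1N\big|\sum_{j=1}^Ne^{-i(K+\kappa)\cdot q_j}\big|^2\big\rangle$ is
identically $\tilde\varphi(0)+\frac2N\big\langle\sum_{j<j'}\tilde\varphi(q_j-q_{j'})\big\rangle$ with
$\tilde\varphi(q)\stackrel{\rm def}{=}\frac1N\sum_\kappa\gamma(\kappa)\cos((K+\kappa)\cdot q)$.

Let $\varphi_{\lambda,\eta}(q)\stackrel{\rm def}{=}\varphi(q)+\lambda q^2|\Delta\varphi(q)|+\eta\tilde\varphi(q)$ and
let $F_V(\lambda,\eta,\zeta)=\frac1N\log Z^c(\lambda,\eta,\zeta)$ with $Z^c$ the
partition function in the volume Ω computed
with energy $U'=\sum_{j<j'}\varphi_{\lambda,\eta}(q_j-q_{j'})+\varepsilon\sum_jW(q_j)+\zeta\varepsilon\sum_j|\Delta W(q_j)|$.
Then $F_V(\lambda,\eta,\zeta)$ is convex in λ, η
and it is uniformly bounded above and below if
|η|, |ε|, |ζ| ≤ 1 (say) and |λ| ≤ $\lambda_0$: here $\lambda_0>0$ exists
if $r^2|\Delta\varphi(r)|$ satisfies the assumption set at the
beginning of the section “Continuous symmetries:
‘no d=2 crystal’ theorem” and the density is smaller
than close packing (this is because the potential U'
will still satisfy conditions similar to [14] uniformly
in |ε|, |η| < 1 and |λ| small enough).

Convexity and boundedness above and below
in an interval imply bounds on the derivatives in
the interior points, in this case on the derivatives of $F_V$
with respect to λ, η, ζ at 0. The latter are identical to
the averages in [80], [81]. In this way, the constants
$B_1,B_2,B_0$ such that $D(\kappa)\le\kappa^2B_1+\varepsilon B_2$ and $B_0\ge D_1$
are found.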


For more details, the reader is referred to Mermin 
(1968). 


Further Reading 


Aizenman M (1980) Translation invariance and instability of phase 
coexistence in the two dimensional Ising system. Communica- 
tions in Mathematical Physics 73: 83-94. 

Aizenman M (1982) Geometric analysis of $\varphi^4$ fields and Ising
models. Communications in Mathematical Physics 86: 1-48.

Aizenman M, Lieb EH, Seiringer R, Solovej JP, and Yngvason J 
(2004) Bose-Einstein condensation as a quantum phase 
transition in an optical lattice. Physical Review A 70: 023612.

Baxter R (1982) Exactly Solved Models. London: Academic Press. 

Benfatto G and Gallavotti G (1995) Renormalization group. 
Princeton: Princeton University Press. 

Bleher P and Sinai Y (1975) Critical indices for Dyson’s asympto- 
tically hierarchical models. Communications in Mathematical 
Physics 45: 247-278. 

Boltzmann L (1968a) Über die mechanische Bedeutung des zweiten
Hauptsatzes der Wärmetheorie. In: Hasenöhrl F (ed.) Wissenschaftliche
Abhandlungen, vol. I, pp. 9-33. New York: Chelsea.

Boltzmann L (1968b) Über die Eigenschaften monozyklischer und
anderer damit verwandter Systeme. In: Hasenöhrl F (ed.)
Wissenschaftliche Abhandlungen, vol. III, pp. 122-152.
New York: Chelsea.

Dobrushin RL (1968) Gibbsian random fields for lattice systems 
with pairwise interactions. Functional Analysis and Applica- 
tions 2: 31-43. 

Domb C and Green MS (1972) Phase Transitions and Critical 
Points. New York: Wiley. 

Dyson F (1969) Existence of a phase transition in a one-dimensional 
Ising ferromagnet. Communications in Mathematical Physics 12: 
91-107. 

Dyson F and Lenard A (1967, 1968) Stability of matter. Journal 
of Mathematical Physics 8: 423-434, 9: 698-711. 

Friedli S and Pfister C (2004) On the singularity of the free energy at 
a first order phase transition. Communications in Mathematical 
Physics 245: 69-103. 

Gallavotti G (1999) Statistical Mechanics. Berlin: Springer. 

Gallavotti G, Bonetto F and Gentile G (2004) Aspects of the 
Ergodic, Qualitative and Statistical Properties of Motion. 
Berlin: Springer. 

Gawedzki K and Kupiainen A (1983) Block spin renormalization
group for dipole gas and $(\nabla\varphi)^4$. Annals of Physics 147:
198-243.

Gawedzki K and Kupiainen A (1985) Massless lattice $\varphi^4_4$ theory:
rigorous control of a renormalizable asymptotically free model.
Communications in Mathematical Physics 99: 197-252.

Gibbs JW (1981) Elementary Principles in Statistical Mechanics. 
Woodbridge (Connecticut): Ox Bow Press (reprint of the 1902 
edition). 

Higuchi Y (1981) On the absence of non translationally invariant
Gibbs states for the two dimensional Ising system. In: Fritz J,
Lebowitz JL, and Szász D (eds.) Random Fields. Amsterdam:
North-Holland.

Hohenberg PC (1967) Existence of long range order in one and 
two dimensions. Physical Review 158: 383-386. 




Landau L and Lifschitz LE (1967) Physique Statistique. Moscow: 
MIR. 

Lanford O and Ruelle D (1969) Observables at infinity and 
states with short range correlations in statistical mechanics. 
Communications in Mathematical Physics 13: 194-215. 

Lebowitz JL (1974) GHS and other inequalities. Communications 
in Mathematical Physics 28: 313-321. 

Lebowitz JL and Penrose O (1979) Towards a rigorous molecular 
theory of metastability. In: Montroll EW and Lebowitz JL 
(eds.) Fluctuation Phenomena. Amsterdam: North-Holland. 

Lee TD and Yang CN (1952) Statistical theory of equations of 
state and phase transitions, II. Lattice gas and Ising model. 
Physical Review 87: 410-419. 

Lieb EH (2002) Inequalities. Berlin: Springer. 

Lieb EH and Lebowitz JL (1972) Lectures on the Thermodynamic 
Limit for Coulomb Systems, In: Lenard A (ed.) Springer 
Lecture Notes in Physics, vol. 20, pp. 135-161. Berlin: Springer. 

Lieb EH and Thirring WE (2001) Stability of Matter from Atoms 
to Stars. Berlin: Springer. 

Mastropietro V (2004) Ising models with four spin interaction at 
criticality. Communications in Mathematical Physics 244: 
595-642. 

McCoy BM and Wu TT (1973) The Two-Dimensional Ising
Model. Cambridge: Harvard University Press.

Mermin ND (1968) Crystalline order in two dimensions. Physical 
Review 176: 250-254. 


Miracle-Solé S (1995) Surface tension, step free energy and facets 
in the equilibrium crystal shape. Journal of Statistical Physics 79:
183-214. 

Olla S (1987) Large deviations for Gibbs random fields. 
Probability Theory and Related Fields 77: 343-357. 

Onsager L (1944) Crystal statistics. I. A two dimensional Ising 
model with an order—disorder transition. Physical Review 65: 
117-149. 

Penrose O and Onsager L (1956) Bose-Einstein condensation and 
liquid helium. Physical Review 104: 576-584. 

Pfister C and Velenik Y (1999) Interface, surface tension and
reentrant pinning transition in the 2D Ising model.
Communications in Mathematical Physics 204: 269-312.

Ruelle D (1969) Statistical Mechanics. New York: Benjamin. 

Ruelle D (1971) Extension of the Lee-Yang circle theorem. 
Physical Review Letters 26: 303-304. 

Sinai Ya G (1991) Mathematical Problems of Statistical Mechanics. 
Singapore: World Scientific. 

van Beijeren H (1975) Interface sharpness in the Ising system.
Communications in Mathematical Physics 40: 1-6. 

Wilson KG and Fisher ME (1972) Critical exponents in 3.99 
dimensions. Physical Review Letters 28: 240-243. 

Yang CN (1962) Concept of off-diagonal long-range order and 
the quantum phases of liquid He and of superconductors. 
Reviews of Modern Physics 34: 694-704. 


Introductory Article: Functional Analysis 


S Paycha, Université Blaise Pascal, Aubière, France


© 2006 Elsevier Ltd. All rights reserved. 


Introduction 


Functional analysis is concerned with the study of 
functions and function spaces, combining techniques 
borrowed from classical analysis with algebraic 
techniques. Modern functional analysis developed 
around the problem of solving equations with 
solutions given by functions. After the differential 
and partial differential equations, which were 
studied in the eighteenth century, came the integral 
equations and other types of functional equations 
investigated in the nineteenth century, at the end of 
which arose the need to develop a new analysis, 
with functions of an infinite number of variables 
instead of the usual functions. In 1887, Volterra, 
inspired by the calculus of variations, suggested a 
new infinitesimal calculus where usual functions are 
replaced by functionals, that is, by maps from a 
function space to R or C, but he and his followers 
were still missing some algebraic and topological 
tools to be developed later. Modern analysis was 
born with the development of an “algebra of the 
infinite” closely related to classical linear algebra 
which by 1890 had (up to the concept of duality,
which was developed later) settled on firm ground.
Strongly inspired by algebraic methods, Fredholm’s 
work at the turn of the nineteenth century, in which 
emerged the concept of kernel of an operator, 
became a founding stone for the modern theory of 
integral equations. Hilbert developed further Fred- 
holm’s methods for symmetric kernels, exploiting 
analogies with the theory of real quadratic forms 
and thereby making clear the importance of the 
notion of square-integrable functions. With Hilbert’s 
Grundzüge einer allgemeinen Theorie der linearen
Integralgleichungen, a further step was made from the
“algebra of the infinite” to the “geometry of the 
infinite.” The contribution of Fréchet, who intro- 
duced the abstract notion of a space endowed with a 
distance, made it possible to transfer Euclidean 
geometry to the framework of what have since 
then been called Hilbert spaces, a basic concept in 
mathematics and quantum physics. 

The usefulness of functional analysis in the study 
of quantum systems became clear in the 1950s when 
Kato proved the self-adjointness of atomic Hamilto- 
nians, and Garding and Wightman formulated 
axioms for quantum field theory. Ever since, functional
analysis has been at the very heart of many
approaches to quantum field theory. Applications
of functional analysis stretch out to many branches
of mathematics, among which are numerical
analysis, global analysis, the theory of pseudodifferential
operators, differential geometry, operator
algebras, noncommutative geometry, etc.


Topological Vector Spaces 


Most topological spaces one comes across in practice 
are metric spaces. A metric on a topological space E 
is a map d: E × E → [0, +∞[ which is symmetric,
such that d(u,v) = 0 ⇔ u = v and which verifies the
triangle inequality d(u,w) ≤ d(u,v) + d(v,w) for all
points u, v, w. A topological space E is metrizable if
there is a metric d on E compatible with the topology
on E, in which case the balls with radius 1/n centered
at any point x ∈ E form a local base at x, that is, a
collection of neighborhoods of x such that every
neighborhood of x contains a member of this
collection. A sequence $(u_n)$ in E then converges to
u ∈ E if and only if $d(u_n,u)$ converges to 0.

The Banach fixed-point theorem on a complete 
metric space (E,d) is a useful tool in nonlinear 
functional analysis: it states that a (strict) contraction
on E, that is, a map T: E → E such that
d(Tu, Tv) ≤ k d(u,v) for all u ≠ v ∈ E and fixed 0 <
k < 1, has a unique fixed point $Tu_0=u_0$. In
particular, it provides local existence and uniqueness 
of solutions of differential equations dy/dt = F(y, t) 
with initial condition y(0)=yo, where F is Lipschitz 
continuous. 
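
The proof of the theorem is constructive: iterating T from any starting point converges to the fixed point. The following is a minimal sketch of that iteration on the complete metric space [0,1] with d(u,v) = |u − v|; the choice of the map T = cos (a strict contraction on [0,1], since |sin x| ≤ sin 1 < 1 there) and the function name are illustrative assumptions, not taken from the text.

```python
import math


def banach_fixed_point(T, u0, tol=1e-12, max_iter=10_000):
    """Iterate u_{n+1} = T(u_n) until successive iterates differ by less than tol."""
    u = u0
    for _ in range(max_iter):
        u_next = T(u)
        if abs(u_next - u) < tol:
            return u_next
        u = u_next
    raise RuntimeError("iteration did not converge; T may not be a contraction")


if __name__ == "__main__":
    # cos maps [0, 1] into itself and contracts distances by at most sin(1) < 1
    u_star = banach_fixed_point(math.cos, 0.0)
    print(u_star, math.cos(u_star))
```

Starting from any other point of [0,1] yields the same limit, which reflects the uniqueness part of the statement.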

Linear functional analysis starts from topological 
vector spaces, that is, vector spaces equipped with a 
topology for which the operations are continuous. A 
topological vector space equipped with a local base 
whose members are convex is said to be locally 
convex. Examples of locally convex spaces are 
normed linear spaces, namely vector spaces 
equipped with a norm, a concept that first arose in 
the work of Fréchet. A seminorm on a vector space
V is a map p: V → [0,∞[ which obeys the triangle
inequality p(u+v) ≤ p(u) + p(v) for any vectors u, v
and such that p(λu) = |λ| p(u) for any scalar λ and
any vector u; if p(u) = 0 ⇒ u = 0, it is a norm, often
denoted by ‖·‖. A norm on a vector space E gives
rise to a translation-invariant distance function 
d(u,v) = ||u — v|| making it a metric space. 

Historically, one of the first examples of normed 
spaces is the space C([0,1]) investigated by Riesz of 
(real- or complex-valued) continuous functions on 
the interval [0,1] equipped with the supremum
norm $\|f\|_\infty:=\sup_{x\in[0,1]}|f(x)|$. In the 1920s, the
general definition of Banach space arose in connection
with the works of Hahn and Banach. A normed
linear space is a Banach space if it is complete as a
metric space for the induced metric, C([0,1]) being a
prototype of a Banach space. More generally, for
any non-negative integer k, the space $C^k([0,1])$ of
functions on [0,1] of class $C^k$ equipped with the
norm $\|f\|_k=\sum_{i=0}^k\|f^{(i)}\|_\infty$, expressed in terms of a
finite number of seminorms $\|f^{(i)}\|_\infty=\sup_{x\in[0,1]}|f^{(i)}(x)|$,
i = 0, …, k, is also a Banach space.

The space $C^\infty([0,1])$ of smooth functions on the
interval [0,1] is no longer a Banach space since
its topology is described by a countable family of
seminorms $\|f\|_k$ with k varying in the positive
integers. The metric

$$d(f,g)=\sum_{k=1}^{\infty}2^{-k}\,\frac{\|f-g\|_k}{1+\|f-g\|_k}$$

turns it into a Fréchet space, that is, a locally convex
complete metric space. The space $\mathcal S(\mathbb R^n)$ of rapidly
decreasing functions, which are smooth functions f
on $\mathbb R^n$ for which

$$\|f\|_{\alpha,\beta}:=\sup_{x\in\mathbb R^n}|x^\alpha D^\beta f(x)|$$

is finite for any multiindices α and β, is also a
Fréchet space with the topology given by the
seminorms $\|\cdot\|_{\alpha,\beta}$. Further examples of Fréchet
spaces are the space $C_0^\infty(K)$ of smooth functions
with support in a fixed compact subset $K\subset\mathbb R^n$
equipped with the countable family of seminorms

$$\|D^\alpha f\|_{\infty,K}=\sup_{x\in K}|D^\alpha f(x)|,\qquad\alpha\in\mathbb N_0^n$$

and the space $C^\infty(M,E)$ of smooth sections of a
vector bundle E over a closed manifold M equipped
with a similar countable family of seminorms. Given
an open subset $\Omega=\bigcup_{p\in\mathbb N}K_p$ with $K_p$, p ∈ ℕ, compact
subsets of $\mathbb R^n$, the space $\mathcal D(\Omega)=\bigcup_{p\in\mathbb N}C_0^\infty(K_p)$
equipped with the inductive limit topology (for
which a sequence $(f_n)$ in $\mathcal D(\Omega)$ converges to $f\in\mathcal D(\Omega)$
if each $f_n$ has support in some fixed compact subset
K and $(D^\alpha f_n)$ converges uniformly to $D^\alpha f$ on K for
each multiindex α) is a locally convex space.
Among Banach spaces are Hilbert spaces which 
have properties very similar to those of finite- 
dimensional spaces and are historically the first 
type of infinite-dimensional space to appear with the 
works of Hilbert at the beginning of the twentieth 
century. A Hilbert space is a Banach space equipped
with a norm ‖·‖ that derives from an inner product,
that is, $\|u\|^2=\langle u,u\rangle$ with ⟨·,·⟩ a positive-definite
bilinear (or sesquilinear, according to whether the
base space is real or complex) form. Hilbert spaces
are fundamental building blocks in quantum
mechanics; using (closed) tensor products, from a
Hilbert space H one builds the Fock space
$\mathcal F(H)=\bigoplus_{k=0}^\infty\otimes^kH$ and from there the bosonic
Fock space $\mathcal F_s(H)=\bigoplus_{k=0}^\infty\otimes_s^kH$ (where $\otimes_s$ stands
for the (closed) symmetrized tensor product) as well
as the fermionic Fock space $\mathcal F_a(H)=\bigoplus_{k=0}^\infty\Lambda^kH$
(where $\Lambda^k$ stands for the antisymmetrized (closed)
tensor product).

A prototype of Hilbert space is the space $l^2(\mathbb Z)$ of
complex-valued sequences $(u_n)_{n\in\mathbb Z}$ such that
$\sum_{n\in\mathbb Z}|u_n|^2$ is finite, which is already implicit in
Hilbert’s Grundzügen. Shortly afterwards, Riesz and
Fischer, with the help of the integration tool
introduced by Lebesgue, showed that the space
$L^2(]0,1[)$ (first introduced by Riesz) of square-summable
functions on the interval ]0,1[, that is,
functions f such that

$$\|f\|_{L^2}=\Big(\int_0^1|f(x)|^2\,dx\Big)^{1/2}$$

is finite, provides an example of Hilbert space.
These were then further generalized to spaces
$L^p(]0,1[)$ of p-summable (1 ≤ p < ∞) functions
on ]0,1[ (i.e., functions f such that

$$\|f\|_{L^p}=\Big(\int_0^1|f(x)|^p\,dx\Big)^{1/p}$$

is finite), which are not Hilbert unless p = 2 but which
provide further examples of Banach spaces, the space
$L^\infty(]0,1[)$ of functions on ]0,1[ bounded almost
everywhere with respect to the Lebesgue measure,
offering yet another example of Banach space.
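
One way to see why $L^p$ is not a Hilbert space for p ≠ 2 is the parallelogram law ‖f+g‖² + ‖f−g‖² = 2(‖f‖² + ‖g‖²), which holds for any norm derived from an inner product. The sketch below checks it numerically for two step functions on ]0,1[; the midpoint-rule quadrature, the choice of test functions, and the function names are illustrative assumptions.

```python
def lp_norm(f, p, n=1000):
    """Midpoint-rule approximation of the L^p norm of f on ]0,1[ (1 <= p < infinity)."""
    h = 1.0 / n
    total = sum(abs(f((i + 0.5) * h)) ** p for i in range(n)) * h
    return total ** (1.0 / p)


def parallelogram_defect(f, g, p):
    """||f+g||^2 + ||f-g||^2 - 2(||f||^2 + ||g||^2); zero for an inner-product norm."""
    plus = lp_norm(lambda x: f(x) + g(x), p)
    minus = lp_norm(lambda x: f(x) - g(x), p)
    return plus**2 + minus**2 - 2.0 * (lp_norm(f, p) ** 2 + lp_norm(g, p) ** 2)


def left_half(x):
    """Indicator function of ]0, 1/2[."""
    return 1.0 if x < 0.5 else 0.0


def right_half(x):
    """Indicator function of [1/2, 1[."""
    return 0.0 if x < 0.5 else 1.0


if __name__ == "__main__":
    for p in (1, 2, 4):
        print(p, parallelogram_defect(left_half, right_half, p))
```

For these two functions the defect equals $2-4\cdot2^{-2/p}$, which vanishes only at p = 2.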

In 1936, Sobolev gave a generalization of the
notion of functions and their derivatives through
integration by parts, which led to the so-called
Sobolev spaces $W^{k,p}(]0,1[)$ of functions
$f\in L^p(]0,1[)$ with derivatives up to order k lying in
$L^p(]0,1[)$, obtained as the closure of $C^\infty(]0,1[)$ for
the norm

$$f\mapsto\|f\|_{W^{k,p}}=\Big(\sum_{j=0}^k\|\partial^jf\|_{L^p}^p\Big)^{1/p}$$

(for p = 2, $W^{k,2}(]0,1[)$ is a Hilbert space often
denoted by $H^k(]0,1[)$). They differ from the Sobolev
spaces $W_0^{k,p}(]0,1[)$, which correspond to the closure
of the set $\mathcal D(]0,1[)$ for the norm $f\mapsto\|f\|_{W^{k,p}}$; for
example, an element $u\in W^{1,p}(]0,1[)$ lies in
$W_0^{1,p}(]0,1[)$ if and only if it vanishes at 0 and 1,
that is, if and only if it satisfies Dirichlet-type
boundary conditions on the boundary of the interval.
Similarly, one defines Sobolev spaces
$W^{k,p}(\mathbb R)=W_0^{k,p}(\mathbb R)$ on ℝ, Sobolev spaces $W^{k,p}(\Omega)$
and $W_0^{k,p}(\Omega)$ on open subsets $\Omega\subset\mathbb R^n$ and, using a
partition of unity on a closed manifold M, Sobolev
spaces $H^k(M,E)=W^{k,2}(M,E)$ of sections of vector
bundles E over M. Using the Fourier transform
(discussed later), one can drop the assumption that k
be an integer and extend the notion of Sobolev space
to define $W^{s,p}(\Omega)$ and $H^s(M,E)$ with s any real
number.

Sobolev spaces arise in many areas of mathematics;
one central example in probability theory is
the Cameron–Martin space $H^1([0,t])$ embedded in
the Wiener space C([0,t]). This embedding is a
particular case of more general Sobolev embedding
theorems, which embed (possibly continuously,
sometimes even compactly; the notion of compact
operator is discussed in a later section) $W^{k,p}$-Sobolev
spaces in $L^q$-spaces with q > p such as the
continuous inclusion $W^{k,p}(\mathbb R^n)\subset L^q(\mathbb R^n)$ with
1/q = 1/p − k/n, or in $C^l$-spaces with l < k such
as, for a bounded open and regular enough subset Ω
of $\mathbb R^n$ and for any s > l + n/p with p > n, the
continuous inclusion $W^{s,p}(\Omega)\subset C^l(\bar\Omega)$ (the set of
functions in $C^l(\Omega)$ such that $D^\alpha u$ can be continuously
extended to the closure $\bar\Omega$ for all |α| ≤ l).
Sobolev embeddings have important applications for
the regularity of solutions of partial differential
equations, when showing that weak solutions one
constructs are in fact smooth. In particular, on an
n-dimensional closed manifold M, for s > l + n/2, the
Sobolev space $H^s(M,E)$ can be continuously
embedded in the space $C^l(M,E)$ of sections of E of
class $C^l$, which in particular implies that the
solutions of a hypoelliptic partial differential equation
Au = v with $v\in L^2(M,E)$ are smooth, as for
example in the case of solutions of the Seiberg–Witten
equations.
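
The exponent relation 1/q = 1/p − k/n quoted above is easy to evaluate exactly. The helper below is a small sketch (the function name is hypothetical); it assumes kp < n, which is the range in which the formula yields a finite q > p, and rejects other inputs rather than guessing at the borderline cases.

```python
from fractions import Fraction


def sobolev_exponent(k, p, n):
    """Return q with 1/q = 1/p - k/n, assuming k*p < n so that q is finite and q > p."""
    inv_q = Fraction(1) / Fraction(p) - Fraction(k, n)
    if inv_q <= 0:
        raise ValueError("the relation 1/q = 1/p - k/n requires k*p < n")
    return 1 / inv_q


if __name__ == "__main__":
    # W^{1,2}(R^3) embeds continuously in L^q(R^3) for the q printed here
    print(sobolev_exponent(1, 2, 3))
```

Exact rational arithmetic avoids rounding issues when the resulting exponent is then compared with other integrability indices.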


Duality 


The concept of duality (in a topological sense) was 
initiated at the beginning of the twentieth century by 
Hadamard, who was looking for continuous linear 
functionals on the Banach space C(I) of continuous 
functions on a compact interval I equipped with a 
uniform topology. It is implicit in Hilbert’s theory 
and plays a central part in Riesz’ work, who 
managed to express such continuous functionals as 
Stieltjes integrals, one of the starting points for the 
modern theory of integration. 

The topological dual of a topological vector space 
E is the space E* of continuous linear forms on E 
which, when E is a normed space, can be equipped 
with the dual norm $\|L\|_{E^*}=\sup_{u\in E,\ \|u\|\le1}|L(u)|$.

Dual spaces often provide a receptacle for singular
objects; any of the functions $f\in L^p(\mathbb R^n)$ (p ≥ 1) and
the delta-function at a point $x\in\mathbb R^n$, $\delta_x:f\mapsto f(x)$, all lie
in the space $\mathcal S'(\mathbb R^n)$ dual to $\mathcal S(\mathbb R^n)$ of tempered
distributions on $\mathbb R^n$, which is itself contained in the
space $\mathcal D'(\mathbb R^n)$ of distributions dual to $\mathcal D(\mathbb R^n)$.
Furthermore, the topological dual E* of a nuclear 
space E contains the support of a probability 


measure with characteristic function (see the next 
section) given by a continuous positive-definite 
function on E. Among nuclear spaces are projective 
limits E=Mpyen Hp (a sequence (u,,) E€ E converges 
to u € E whenever it converges to u in each H,) of 
countably many nested Hilbert spaces --- C Hp C 
Hy-1 C+- C Ho such that the embedding Hy, C 
Hy-1 is a trace-class operator (see the section 
“Operator algebras”). If H, is the closure of E for 
the norm ||- ||,, the topological dual FE’ of E for the 
norm ||- ||) is an inductive limit E’= Upen, H-p, 
where H_, are the dual (with respect to || - ||9) 
Hilbert spaces with norm ||- ||_, (a sequence (u,) € 
E" converges to u € E’ whenever it lies in some H_, 
and converges to u for the topology of H_»,) and we 
have 


BC eee dig. Pigg (CS ereG 19 
=H, CHa Ci 4 GCE 


As a result of the theory of elliptic operators on a
closed manifold, the Fréchet space $C^\infty(M,E)$ of
smooth sections of a vector bundle $E$ over a closed
manifold $M$ is nuclear as the projective limit of
countably many Sobolev spaces $H^p(M,E)$, with
$L^2$-dual given by the inductive limit of countably
many Sobolev spaces $H^{-p}(M,E)$.

The existence of nontrivial continuous linear
forms on a normed linear space $E$ is ensured by the
Hahn-Banach theorem, which asserts that for any
proper closed linear subspace $F$ of $E$, there is a
nonvanishing continuous linear form that vanishes on $F$. When
the space is a Hilbert space $(H, \langle\cdot,\cdot\rangle_H)$, it follows
from the Riesz-Fréchet theorem that any continuous
linear form $L$ on $H$ is represented in a unique way
by a vector $v \in H$ such that $L(u) = \langle v, u\rangle_H$ for all
$u \in H$, thus relating the dual pairing on the left with
the Hilbert inner product on the right and identifying
the topological dual $H^*$ with $H$.

The strong topology induced by the norm $\|\cdot\|$ on
a normed vector space $E$, that is, the topology in
which a sequence $(u_n)$ converges to $u$ whenever
$\|u_n - u\| \to 0$, is too fine to have enough compact sets
when $E$ is infinite dimensional, since the compactness
of the unit ball in $E$ for the strong topology
characterizes finite-dimensional spaces. Since compact
sets are useful for existence theorems, one is
inclined to weaken the topology: the weak topology
on $E$, which coincides with the strong topology
when $E$ is finite dimensional and for which a
sequence $(u_n)$ converges to $u$ if and only if
$L(u_n) \to L(u)$ $\forall L \in E^*$, has compact unit ball if and only if $E$
is reflexive or, in other words, if $E$ can be canonically
identified with its double dual $(E^*)^*$. For $1 < p < \infty$,
given an open subset $\Omega \subset \mathbb{R}^n$, the topological dual of
$L^p(\Omega)$ can be identified via the Riesz representation
with $L^{p^*}(\Omega)$ with $p^*$ conjugate to $p$, that is,
$1/p + 1/p^* = 1$, and $L^p(\Omega)$ is reflexive, whereas the
topological duals of $W^{k,p}(\Omega)$ and $W_0^{k,p}(\Omega)$ both coincide
with $W_0^{-k,p^*}(\Omega)$ so that only $W_0^{k,p}(\Omega)$ is reflexive.
Neither $L^1(\Omega)$ nor its topological dual $L^\infty(\Omega)$ is
reflexive since $L^1(\Omega)$ is strictly contained in the
topological dual of $L^\infty(\Omega)$, for there are continuous
linear forms $L$ on $L^\infty(\Omega)$ that are not of the form

\[ L(u) = \int_\Omega u\,v \quad \forall u \in L^\infty(\Omega), \text{ with } v \in L^1(\Omega) \]


Similarly, the topological dual $E^*$ of a normed
linear space $E$ can be equipped with the topology
induced by the dual norm $\|\cdot\|_{E^*}$ and with the weak
$*$-topology, namely the weakest one for which the
maps $L \mapsto L(u)$, $u \in E$, are continuous, and the unit
ball in $E^*$ is indeed compact for this topology
(Banach-Alaoglu theorem).

Duality does not always preserve separability (a
topological vector space is separable if it has a
countable dense subset), since $L^\infty(\Omega)$, which is
not separable, is the topological dual of $L^1(\Omega)$,
which is separable. However, as a consequence of
the Hahn-Banach theorem, if the topological dual of
a Banach space is separable then so is the original
space, and one has equivalence when adding the
reflexivity assumption: a Banach space is reflexive
and separable if and only if its topological dual is. For
$1 \le p < \infty$, $L^p(\Omega)$ and $W_0^{k,p}(\Omega)$ are separable and
moreover reflexive if $p > 1$.


Fourier Transform 


In the middle of the eighteenth century, oscillations
of a vibrating string were interpreted by Bernoulli
as a limit case for the oscillation of $n$ point masses
when $n$ tends to infinity, and Bernoulli introduced
the novel idea of the superposition principle by
which the general oscillation of the string should
decompose into a superposition of “proper oscillations.”
This point of view triggered off a discussion
as to whether or not an arbitrary function can be
expanded as a trigonometric series. Other examples
of expansions in “orthogonal functions” (this terminology
actually only appears with Hilbert) had been
found in the meantime in relation to oscillation
problems and investigations on heat theory, but it
was only in the nineteenth century, with the works
of Fourier and Dirichlet, that the superposition
problem was solved.

Separable Hilbert spaces can be equipped with a
countable orthonormal system $\{e_n\}_{n \in \mathbb{Z}}$ ($\langle e_n, e_m\rangle_H = \delta_{mn}$
with $\langle\cdot,\cdot\rangle_H$ the scalar product on $H$) which is
complete, that is, any vector $u \in H$ can be expanded
in this system in a unique way $u = \sum_{n \in \mathbb{Z}} \hat u_n e_n$ with
Fourier coefficients $\hat u_n = \langle u, e_n\rangle_H$. The latter obey
Parseval’s relation $\sum_{n \in \mathbb{Z}} |\hat u_n|^2 = \|u\|^2$ (where $\|\cdot\|$ is
the norm associated with $\langle\cdot,\cdot\rangle_H$), and the Fourier
transform $u \mapsto (\hat u(n))_{n \in \mathbb{Z}}$ gives rise to an isometric
isomorphism between the separable Hilbert space
$H$ and the Hilbert space $l^2(\mathbb{Z})$ of square-summable
sequences of complex numbers. In particular, the
space $L^2(S^1)$ of $L^2$-functions on the unit circle
$S^1 = \mathbb{R}/\mathbb{Z}$ with its usual Haar measure $dt$ is separable
with complete orthonormal system $t \mapsto e_n(t) = e^{2i\pi n t}$,
$n \in \mathbb{Z}$, and the Fourier transform

\[ u \mapsto \Big( n \mapsto \hat u(n) = \int_0^1 e_{-n}(t)\, u(t)\, dt \Big)_{n \in \mathbb{Z}} \]

identifies it with the space $l^2(\mathbb{Z})$. Under this
identification, the Hilbert subspace $l^2(\mathbb{N})$ obtained
as the range in $l^2(\mathbb{Z})$ of the projection $p_+ : (u_n)_{n \in \mathbb{Z}} \mapsto (u_n)_{n \in \mathbb{N}}$
corresponds to the Hardy space $H^2(S^1)$.
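
Parseval's relation on the circle can be checked numerically. The sketch below is only an illustration under stated assumptions: the test function (a trigonometric polynomial with two modes), the number of sample points `N`, and the helper name `fourier_coefficients` are all hypothetical choices, and the integral over $[0,1]$ is replaced by a Riemann sum, which is exact for such low-frequency trigonometric polynomials.

```python
import cmath
import math

def fourier_coefficients(samples):
    """Riemann-sum approximation of u_hat(n) = int_0^1 exp(-2*pi*i*n*t) u(t) dt.

    Index n in the returned list stands for the frequency n modulo len(samples).
    """
    N = len(samples)
    return [
        sum(samples[k] * cmath.exp(-2j * math.pi * n * k / N) for k in range(N)) / N
        for n in range(N)
    ]

# Illustrative choice: u = 2 e_1 + (1 - 3i) e_{-2}, sampled at N points of [0, 1).
N = 64
ts = [k / N for k in range(N)]
u = [2 * cmath.exp(2j * math.pi * t) + (1 - 3j) * cmath.exp(-4j * math.pi * t) for t in ts]

u_hat = fourier_coefficients(u)

# Squared L^2 norm of u (Riemann sum) and squared l^2 norm of its coefficients.
norm_u_sq = sum(abs(x) ** 2 for x in u) / N
norm_hat_sq = sum(abs(c) ** 2 for c in u_hat)
```

Here `norm_u_sq` and `norm_hat_sq` agree up to rounding, and the only nonzero coefficients sit at the indices standing for the frequencies $1$ and $-2$.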

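The Fourier transform extends to the space $S(\mathbb{R}^n)$,
sending a function $f \in S(\mathbb{R}^n)$ to the map

\[ \xi \mapsto \hat f(\xi) = \int_{\mathbb{R}^n} e^{-i\langle x, \xi\rangle} f(x)\, dx \]

and maps $S(\mathbb{R}^n)$ onto itself linearly and continuously,
with continuous inverse given (up to a normalizing constant) by $f \mapsto \hat f(-\xi)$. When $n = 1$, the
Poisson formula relates $f \in S(\mathbb{R})$ with its Fourier
transform $\hat f$ by $\sum_{n \in \mathbb{Z}} \hat f(n) = 2\pi \sum_{n \in \mathbb{Z}} f(2\pi n)$.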

Since Fourier transformation turns (up to a
constant multiplicative factor) differentiation $D^\alpha$
for a multiindex $\alpha = (\alpha_1, \ldots, \alpha_n)$ into multiplication
by $\xi^\alpha = \xi_1^{\alpha_1} \cdots \xi_n^{\alpha_n}$, it can be used to define $W^{s,p}$-Sobolev
spaces with $s$ a real number as the space of
$L^p$-functions with finite Sobolev norms
$\|u\|_{W^{s,p}} = \big( \int |(1 + |\xi|)^s\, \hat u(\xi)|^p\, d\xi \big)^{1/p}$
(which coincide with the ones
defined above when $s = k$ is a non-negative
integer).

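Fourier transforms are also used to describe a
linear pseudodifferential operator $A$ (see next two
sections where the notions of bounded and
unbounded linear operator are discussed) of order
$a$ acting on smooth functions on an open subset $U$
of $\mathbb{R}^n$ in terms of its symbol $\sigma_A$, a smooth map $\sigma$
on $U \times \mathbb{R}^n$ with compact support in $x$ such that for
any multi-indices $\alpha, \beta \in \mathbb{N}_0^n$, there is a constant
$C_{\alpha,\beta}$ with

\[ |D_x^\alpha D_\xi^\beta \sigma(x, \xi)| \le C_{\alpha,\beta}\, (1 + |\xi|)^{a - |\beta|} \]

for any $\xi \in \mathbb{R}^n$, by

\[ (Af)(x) = \int_{\mathbb{R}^n} e^{i\langle x, \xi\rangle}\, \sigma_A(x, \xi)\, \hat f(\xi)\, d\xi \]

where the measure $d\xi$ is suitably normalized.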
The Fourier transform maps a Gaussian function
$x \mapsto e^{-\lambda |x|^2/2}$ on $\mathbb{R}^n$, where $\lambda$ is a nonzero scalar,
to another Gaussian function $\xi \mapsto e^{-|\xi|^2/(2\lambda)}$ (up to
a nonzero multiplicative factor), a starting point for
T-duality in string theory. More generally, the
characteristic function

\[ \hat\mu(\xi) = \int_H e^{i\langle x, \xi\rangle_H}\, \mu(dx) \]

of a Gaussian probability measure $\mu$ with covariance
$C$ on a Hilbert space $H$ is the function
$\xi \mapsto e^{-(1/2)\langle C\xi, \xi\rangle_H}$. Such probability measures typically
arise in Euclidean quantum field theory; in axio- 
matic quantum field theory, the analyticity proper- 
ties of n-point functions can be derived from the 
Wightman axioms using Fourier transforms. Thus, 
Fourier transformation underlies many different 
aspects of quantum field theory. 
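
The statement that Gaussians are mapped to Gaussians can be tested numerically in one dimension. The sketch below assumes the convention $\hat f(\xi) = \int e^{-ix\xi} f(x)\,dx$ and a positive scalar $\lambda$, for which the transform of $e^{-\lambda x^2/2}$ is $\sqrt{2\pi/\lambda}\, e^{-\xi^2/(2\lambda)}$; the value of `lam`, the truncation `half_width` and the trapezoidal rule are illustrative assumptions, not part of the text.

```python
import cmath
import math

def fourier_transform(f, xi, half_width=12.0, steps=4000):
    """Trapezoidal approximation of int exp(-i*x*xi) f(x) dx over [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # end points get half weight
        total += weight * cmath.exp(-1j * x * xi) * f(x)
    return total * h

lam = 2.0  # assumed positive so that the Gaussian is integrable

def gaussian(x):
    return math.exp(-lam * x * x / 2)

def expected(xi):
    """Closed form of the transform under the convention stated above."""
    return math.sqrt(2 * math.pi / lam) * math.exp(-xi * xi / (2 * lam))
```

Comparing `fourier_transform(gaussian, xi)` with `expected(xi)` for a few values of `xi` shows agreement to many digits; with another normalization of the Fourier transform only the multiplicative factor and the scaling of $\xi$ would change.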


Fredholm operators 


A complex-valued continuous function $K$ on $[0,1] \times [0,1]$
gives rise to an integral operator

\[ A : f \mapsto \int_0^1 K(x, y)\, f(y)\, dy \]

on complex-valued continuous functions on $[0,1]$
(equipped with the supremum norm $\|\cdot\|_\infty$) with the
following upper bound property:

\[ \|Af\|_\infty \le \sup_{[0,1] \times [0,1]} |K(x, y)| \; \|f\|_\infty \]

In other words, $A$ is a bounded linear operator with
norm bounded from above by $\sup_{[0,1] \times [0,1]} |K(x, y)|$;
a linear operator $A : E \to F$ from a normed linear
space $(E, \|\cdot\|_E)$ to a normed linear space $(F, \|\cdot\|_F)$ is
bounded (or continuous) if and only if its (operator)
norm $|||A||| := \sup_{\|u\|_E \le 1} \|Au\|_F$ is finite.

An integral operator

\[ A : f \mapsto \int_0^1 K(x, y)\, f(y)\, dy \]

defined by a continuous kernel $K$ is, moreover,
compact; a compact operator is a bounded operator
of normed spaces that maps bounded sets to
precompact sets, that is, to sets whose closure is
compact. Other examples of compact operators on
normed spaces are finite-rank operators, that is, operators
with finite-dimensional range. In fact, any compact
operator on a separable Hilbert space can be
approximated in the topology induced by the
operator norm $|||\cdot|||$ by a sequence of finite-rank
operators.
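
The upper bound for the integral operator can be observed on a discretization. The following is a minimal sketch under stated assumptions: the kernel $K(x,y) = e^{-|x-y|}$, the test function $\cos(7y)$, the midpoint rule and all function names are hypothetical choices made only for illustration.

```python
import math

def kernel(x, y):
    """A continuous kernel on [0,1] x [0,1], chosen only for illustration."""
    return math.exp(-abs(x - y))

def apply_operator(f_values, nodes):
    """Midpoint-rule approximation of (Af)(x) = int_0^1 K(x, y) f(y) dy at the nodes."""
    h = 1.0 / len(nodes)
    return [sum(kernel(x, y) * fy for y, fy in zip(nodes, f_values)) * h for x in nodes]

n = 200
nodes = [(k + 0.5) / n for k in range(n)]       # midpoints of n equal subintervals
f_values = [math.cos(7 * y) for y in nodes]     # samples of a continuous function f

Af_values = apply_operator(f_values, nodes)

# Discrete analogues of sup|K|, ||f||_inf and ||Af||_inf.
sup_K = max(abs(kernel(x, y)) for x in nodes for y in nodes)
sup_f = max(abs(v) for v in f_values)
sup_Af = max(abs(v) for v in Af_values)
```

On this grid `sup_Af` never exceeds `sup_K * sup_f`, which is the discrete counterpart of the bound on the operator norm.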

Inspired by the work of Volterra, who, in the case
of the integral operator defined above, produced
continuous solutions $\phi = (I - A)^{-1} f$ of the equation
$f = (I - A)\phi$ for $f \in C([0,1])$, Fredholm in 1900
(Sur une classe d’équations fonctionnelles) studied the
equation $f = (I - \lambda A)\phi$, introducing a complex parameter
$\lambda$. He proved what has since been called the
Fredholm alternative, which states that either the
equation $f = (I - \lambda A)\phi$ has a unique solution for every
$f \in C([0,1])$ or the corresponding homogeneous equation
$(I - \lambda A)\phi = 0$ has nontrivial solutions. In modern
language, it means that for a compact linear operator $A$ and $\mu \neq 0$,
the operator $A - \mu I$, whose inverse is the resolvent
$R(A, \mu) = (A - \mu I)^{-1}$, is surjective if and
only if it is injective. The Fredholm alternative is a
powerful tool to solve partial differential equations,
among which the Dirichlet problem, the solutions of
which are harmonic functions $u$ (i.e., $\Delta u = 0$, where
$\Delta u = -\sum_{i=1}^n \partial^2 u/\partial x_i^2$) on some domain $\Omega \subset \mathbb{R}^n$ with
Dirichlet boundary conditions $u|_{\partial\Omega} = f$, where $f$ is a
continuous function on the boundary $\partial\Omega$. The Dirichlet
problem has geometric applications, in particular to the
nonlinear Plateau problem, which minimizes the area of
a surface in $\mathbb{R}^d$ with given boundary curves and which
reduces to a (linear) Dirichlet problem.

The operator $B = I - A$ built from the compact
operator $A$ is a particular Fredholm operator, namely a
bounded linear operator $B : E \to F$ which is invertible
“up to compact operators,” that is, such that there is a
bounded linear operator $C : F \to E$ with both $BC - I_F$
and $CB - I_E$ compact. A Fredholm operator $B$ has a
finite-dimensional kernel $\operatorname{Ker} B$ and, when $(E, \langle\cdot,\cdot\rangle_E)$
and $(F, \langle\cdot,\cdot\rangle_F)$ are Hilbert spaces, its cokernel $\operatorname{Ker} B^*$,
where $B^*$ is the adjoint of $B$ defined by

\[ \langle Bu, v\rangle_F = \langle u, B^* v\rangle_E \quad \forall u \in E,\ \forall v \in F \]

is also finite dimensional, so that it has a well-defined
index $\operatorname{ind}(B) = \dim(\operatorname{Ker} B) - \dim(\operatorname{Ker} B^*)$, a
starting point for index theory. Toeplitz operators
$T_\phi$, where $\phi$ is a continuous function on the unit
circle $S^1$, provide first examples of Fredholm
operators; for $\phi = e_{-n}$ they act on the Hardy space $H^2(S^1)$ by

\[ T_{e_{-n}} \Big( \sum_{m \ge 0} a_m e_m \Big) = \sum_{m \ge 0} a_{m+n}\, e_m \]

under the identification $H^2(S^1) \simeq l^2(\mathbb{N}) \subset l^2(\mathbb{Z})$,
with $l^2(\mathbb{Z})$ equipped with the canonical complete
orthonormal basis $(e_n, n \in \mathbb{Z})$. The Fredholm index
$\operatorname{ind}(T_{e_{-n}})$ is exactly the integer $n$ so that the index of
its adjoint is $-n$, as a consequence of which the index
map from Fredholm operators to integers is onto.
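
The index computation can be written out for the shift-type operator that sends $\sum_{m \ge 0} a_m e_m$ to $\sum_{m \ge 0} a_{m+n} e_m$ on $l^2(\mathbb{N})$, with $n \ge 0$. The worked lines below are a sketch; the name $T_{e_{-n}}$ for this operator is a notational assumption made here.

```latex
% Assumption: n >= 0 and (e_m)_{m >= 0} is the canonical orthonormal basis of l^2(N).
\begin{align*}
T_{e_{-n}}\, e_m &=
  \begin{cases} e_{m-n}, & m \ge n,\\ 0, & 0 \le m < n, \end{cases}
  &\operatorname{Ker} T_{e_{-n}} &= \operatorname{span}\{e_0, \dots, e_{n-1}\},\\
T_{e_{-n}}^{*}\, e_m &= e_{m+n},
  &\operatorname{Ker} T_{e_{-n}}^{*} &= \{0\},\\
\operatorname{ind}(T_{e_{-n}}) &= n - 0 = n,
  &\operatorname{ind}(T_{e_{-n}}^{*}) &= 0 - n = -n.
\end{align*}
```

Since every integer is reached in this way, the index map is onto. No finite matrix truncation can reproduce this, because a square matrix always has index zero.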


One-Parameter (Semi) groups 


Unlike in the finite-dimensional situation, a linear
operator $A : E \to F$ between two normed linear
spaces $(E, \|\cdot\|_E)$ and $(F, \|\cdot\|_F)$ is not expected to be
bounded. Unbounded operators arise in partial
differential equations that involve differential operators
such as the Laplacian $\Delta$ on an open subset
$\Omega \subset \mathbb{R}^n$. The following equations provide fundamental
examples of partial differential equations which
arose over time from the study of various problems
in mathematical physics with the works of Poisson,
Fourier, and Cauchy:

\[ \Delta u = 0 \quad \text{(Laplace equation)} \]

\[ \frac{\partial^2 u}{\partial t^2} + \Delta u = 0 \quad \text{(wave equation)} \]

\[ \frac{\partial u}{\partial t} + \Delta u = 0 \quad \text{(heat equation)} \]

and later the Schrödinger equation in quantum
mechanics:

\[ i\, \frac{\partial u}{\partial t} = \Delta u \]

where $t$ is a time parameter.

An unbounded linear operator on an infinite-dimensional
normed space is usually defined on a
domain $D(A)$ which is strictly contained in $E$. The
Laplacian $\Delta$ is defined on the dense domain
$D(\Delta) = H^2(\mathbb{R}^n)$ in $L^2(\mathbb{R}^n)$; it defines a bounded
operator from $H^2(\mathbb{R}^n)$ to $L^2(\mathbb{R}^n)$ but does not
extend to a bounded operator on $L^2(\mathbb{R}^n)$. Like this
operator, most unbounded operators $A : E \to F$ one
comes across have dense domain $D(A)$ in $E$ and are
closed, that is, their graph $\{(u, Au), u \in D(A)\}$ is
closed as a subset of the normed linear space $E \times F$.
When not actually closed, they can be closable, that
is, they can have a closed extension, the smallest of which is called the
closure of the operator. By the closed-graph theorem,
when $E$ and $F$ are Banach spaces, a linear
operator $A : E \to F$ is continuous whenever its graph
is closed, as a consequence of which a closed linear
operator $A : E \to F$ defined on a dense domain is
bounded provided its domain coincides with the
whole space.

For a closed operator $A : E \to F$ with dense
domain $D(A)$, when $E$ and $F$ are Hilbert spaces
equipped with inner products $\langle\cdot,\cdot\rangle_E$ and $\langle\cdot,\cdot\rangle_F$, the
adjoint $A^*$ of $A$ is defined on its domain $D(A^*)$ by

\[ \langle Au, v\rangle_F = \langle u, A^* v\rangle_E \quad \forall (u, v) \in D(A) \times D(A^*) \]

A self-adjoint operator $A$ with domain $D(A)$ is one
for which $D(A) = D(A^*)$ and $A = A^*$; the Laplacian
$\Delta$ on $\mathbb{R}^n$ is self-adjoint on the Sobolev space $H^2(\mathbb{R}^n)$
but it is only essentially self-adjoint on the dense
domain $D(\mathbb{R}^n)$, the latter meaning that its closure is
self-adjoint.

Unbounded self-adjoint operators can arise as
generators of one-parameter semigroups of bounded
operators. A one-parameter family of bounded
operators $T_t$, $t \ge 0$ (resp. $T_t$, $t \in \mathbb{R}$) on a Hilbert space $H$
is a semigroup (resp. group) if $T_t T_s = T_{t+s}$ $\forall t, s \ge 0$
(resp. $\forall t, s \in \mathbb{R}$) and it is strongly continuous (or
simply continuous) if $\lim_{t \to t_0} T_t u = T_{t_0} u$ at any $t_0 \ge 0$
(resp. $t_0 \in \mathbb{R}$) and for any $u \in H$.

Stone’s theorem sets up a one-to-one correspondence
between continuous one-parameter unitary
($U_t^* U_t = U_t U_t^* = I$) groups $U_t$, $t \in \mathbb{R}$, on a Hilbert
space $H$ such that $U_0 = \mathrm{Id}$ and self-adjoint operators
$A$ obtained as infinitesimal generators, that is, as the
strong limit

\[ iAu = \lim_{t \to 0} \frac{U_t u - u}{t}, \quad u \in D(A) \]

of $U_t$, $t \in \mathbb{R}$, which in a compact form reads
$U_t = e^{itA}$. An important example in quantum
mechanics is $u_t = e^{itH} u_0$, $t \in \mathbb{R}$, with $H$ a self-adjoint
Hamiltonian, which solves the Schrödinger
equation $\frac{d}{dt} u = iHu$. The Lie-Trotter formula,
which has important applications for Feynman
path integrals, expresses the unitary group
generated by $A + B$, where $A$, $B$, and $A + B$ are
self-adjoint on their respective domains, as a strong
limit

\[ e^{it(A+B)} = \lim_{n \to \infty} \big( e^{itA/n}\, e^{itB/n} \big)^n \]
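
In finite dimensions the Lie-Trotter formula can be watched converging. The sketch below is an illustration under stated assumptions: it takes for $A$ and $B$ the Pauli matrices $\sigma_x$ and $\sigma_z$ (hypothetical choices, both Hermitian and not commuting) and uses the closed form $e^{it(a_x\sigma_x + a_z\sigma_z)} = \cos(rt)\,I + i \sin(rt)\,(a_x\sigma_x + a_z\sigma_z)/r$ with $r = \sqrt{a_x^2 + a_z^2}$.

```python
import math

def matmul(X, Y):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def exp_i_pauli(t, ax, az):
    """exp(i t (ax*sigma_x + az*sigma_z)) via cos(rt) I + i sin(rt) (a.sigma)/r."""
    r = math.hypot(ax, az)
    c, s = math.cos(r * t), math.sin(r * t)
    return [[c + 1j * s * az / r, 1j * s * ax / r],
            [1j * s * ax / r, c - 1j * s * az / r]]

def trotter(t, n):
    """(exp(itA/n) exp(itB/n))^n for A = sigma_x and B = sigma_z."""
    step = matmul(exp_i_pauli(t / n, 1.0, 0.0), exp_i_pauli(t / n, 0.0, 1.0))
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = matmul(result, step)
    return result

def max_entry_error(X, Y):
    """Largest absolute difference between corresponding entries."""
    return max(abs(X[i][j] - Y[i][j]) for i in range(2) for j in range(2))

t = 1.0
exact = exp_i_pauli(t, 1.0, 1.0)  # exp(it(A + B))
errors = {n: max_entry_error(trotter(t, n), exact) for n in (1, 10, 100, 1000)}
```

The entries of `errors` shrink roughly like $1/n$, the expected rate for this first-order splitting when $A$ and $B$ do not commute.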


On the other hand, positive operators on a
Hilbert space $(H, \langle\cdot,\cdot\rangle_H)$, that is, $A$ self-adjoint
and such that $\langle Au, u\rangle_H \ge 0$ $\forall u \in D(A)$, generate
one-parameter semigroups $T_t = e^{-tA}$, $t \ge 0$. Hille
and Yosida proved that on a Hilbert space, strongly
continuous contraction (i.e., $|||T_t||| \le 1$ $\forall t \ge 0$)
semigroups such that $T_0 = \mathrm{Id}$ are in one-to-one
correspondence with densely defined positive operators
$A : D(A) \subset H \to H$ that are maximal (i.e., $I + A$
is onto), obtained as (minus the) infinitesimal
generators

\[ Au = -\lim_{t \to 0^+} \frac{T_t u - u}{t}, \quad u \in D(A) \]

of the corresponding semigroups. Similarly, a positive
densely defined self-adjoint operator $A$ on a
Hilbert space $H$ gives rise to a densely defined closed
symmetric sesquilinear form $(u, v) \mapsto \langle \sqrt{A}\,u, \sqrt{A}\,v\rangle_H$
(see next section for a definition of $\sqrt{A}$; $\langle\cdot,\cdot\rangle_H$ is the
scalar product on $H$) and this map yields a one-to-one
correspondence between operators and
sesquilinear forms on $H$ with the aforementioned
properties, one of the starting points for the theory
of Dirichlet forms. To a probability measure $\mu$ on
a separable Banach space $E$, one can associate a
densely defined closed symmetric sesquilinear form
(it is in fact a Dirichlet form) on a Hilbert space $H$
such that $E^* \subset H^* = H \subset E$, which in the particular
case of the standard Wiener measure $\mu$ on the
Wiener space $E = C([0,t])$ and with Hilbert space
given by the Cameron-Martin space $H = H^1([0,t])$,
is the bilinear form

\[ (u, v) \mapsto \int_E \langle \nabla u, \nabla v\rangle_H \, d\mu \]

with $\nabla$ the (closed) gradient of Malliavin calculus.
The operator $-\Delta$, where $\Delta$ is the Laplacian on $\mathbb{R}^n$,
generates the heat-operator semigroup $e^{-t\Delta}$, $t > 0$. It
has a smooth kernel $K_t \in C^\infty(\mathbb{R}^n \times \mathbb{R}^n)$ defined by

\[ (e^{-t\Delta} f)(x) = \int_{\mathbb{R}^n} K_t(x, y)\, f(y)\, dy \quad \forall f \in C_0^\infty(\mathbb{R}^n) \]

and defines a smoothing operator, an operator that
maps Sobolev functions to smooth functions. In
general, a pseudodifferential operator $A$ on an
open subset $U$ of $\mathbb{R}^n$ with symbol $\sigma_A$ only has a
distribution kernel

\[ K_A(x, y) = \int_{\mathbb{R}^n} e^{i\langle x - y, \xi\rangle}\, \sigma_A(x, \xi)\, d\xi \]

The kernel of the inverse Laplacian $(\Delta + m^2)^{-1}$
on $\mathbb{R}^n$ (the non-negative real number $m^2$ stands
for the squared mass), called Green’s function on $\mathbb{R}^n$,
plays an essential role in the theory of Feynman
graphs.
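
On the real line, with $\Delta = -d^2/dx^2$, the heat kernel is the explicit Gaussian $K_t(x,y) = (4\pi t)^{-1/2} e^{-(x-y)^2/(4t)}$, and the semigroup property $e^{-t\Delta} e^{-s\Delta} = e^{-(t+s)\Delta}$ becomes a convolution identity for kernels. The sketch below checks it numerically; the sample values of `t`, `s`, `x`, `y`, the truncation of the integral and the function names are illustrative assumptions.

```python
import math

def heat_kernel(t, x, y):
    """Kernel of exp(-t*Delta) on the real line, with Delta = -d^2/dx^2."""
    return math.exp(-(x - y) ** 2 / (4 * t)) / math.sqrt(4 * math.pi * t)

def compose(t, s, x, y, half_width=30.0, steps=6000):
    """Trapezoidal approximation of the integral over z of K_t(x, z) K_s(z, y)."""
    h = 2 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # end points get half weight
        total += weight * heat_kernel(t, x, z) * heat_kernel(s, z, y)
    return total * h

t, s, x, y = 0.3, 0.7, 0.4, -1.1
lhs = compose(t, s, x, y)          # kernel of exp(-t*Delta) exp(-s*Delta) at (x, y)
rhs = heat_kernel(t + s, x, y)     # kernel of exp(-(t+s)*Delta) at (x, y)
```

The two numbers `lhs` and `rhs` agree up to quadrature error, as the semigroup law requires.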


Spectral Theory 


Spectral theory is the study of the distribution of the
values of the complex parameter $\lambda$ for which, given
a linear operator $A$ on a normed space $E$, the
operator $A - \lambda I$ has an inverse, and of the properties
of this inverse when it exists, the resolvent
$R(A, \lambda) = (A - \lambda I)^{-1}$ of $A$. The resolvent set $\rho(A)$ of $A$
is the set of complex numbers $\lambda$ for which $A - \lambda I$ is
invertible with densely defined bounded inverse. The
spectrum $\mathrm{Sp}(A)$ of $A$ is the complement in $\mathbb{C}$ of the
resolvent set; it consists of a union of three disjoint sets:
the set of all complex numbers $\lambda$ for which $A - \lambda I$ is
not injective, called the point spectrum (such a $\lambda$ is
an eigenvalue of $A$ with associated eigenfunction
any nonzero $u \in D(A)$ such that $Au = \lambda u$); the set of points $\lambda$
for which $A - \lambda I$ has a densely defined unbounded
inverse $R(A, \lambda)$, called the continuous spectrum; and
the set of points $\lambda$ for which $A - \lambda I$ has a
well-defined but not densely defined inverse
$R(A, \lambda)$, called the residual spectrum.

A bounded operator has bounded spectrum, and a
self-adjoint operator $A$ acting on a Hilbert space has
real spectrum and no residual spectrum since the
range of $A - \lambda I$ is dense whenever $A - \lambda I$ is injective. As a consequence of the
Fredholm alternative, the nonzero spectrum of a compact
operator consists only of point spectrum; it is
countable, with 0 as its only possible accumulation point. A Hamiltonian
of a quantum mechanical system can have
both point and continuous spectra, but its point
spectrum is of special interest because the corresponding
eigenfunctions are stationary states of the
system. As was first pointed out by Kac (“Can you
hear the shape of a drum?”), the spectrum of an
operator acting on functions can reflect the geometry
of the space these functions are defined on, a
starting point for many interesting and far-reaching
questions in differential geometry.

A self-adjoint linear operator on a Hilbert space
can be described in terms of a family of projections
$E_\lambda$, $\lambda \in \mathbb{R}$, via the spectral representation

\[ A = \int_{\mathrm{Sp}(A)} \lambda \, dE_\lambda \]

Given a Borel real-valued function $f$ on $\mathbb{R}$, the operator

\[ f(A) = \int_{\mathrm{Sp}(A)} f(\lambda) \, dE_\lambda \]

yields another self-adjoint operator. A positive
operator $A$ on a dense domain $D(A)$ of some Hilbert
space $(H, \langle\cdot,\cdot\rangle_H)$ has non-negative spectrum and for
any positive real number $t$, the map $\lambda \mapsto e^{-t\lambda}$ gives
the associated bounded heat-operator

\[ e^{-tA} = \int_{\mathrm{Sp}(A)} e^{-t\lambda} \, dE_\lambda \]

while the map $\lambda \mapsto \sqrt{\lambda}$ gives rise to a positive
operator $\sqrt{A}$ such that $(\sqrt{A})^2 = A$.
The resolvent can also be used to define new
operators

\[ f(A) = \frac{i}{2\pi} \int_C f(\lambda)\, R(A, \lambda)\, d\lambda \]

from a linear operator via a Cauchy-type integral
along a contour $C$ around the spectrum; this way
one defines complex powers $A^{-z}$ of (essentially
self-adjoint) positive elliptic pseudodifferential operators,
which enter the definition of the zeta-function,
$z \mapsto \zeta(A, z)$, of the operator $A$. The $\zeta$-function is a
useful tool to extend the ordinary determinant to
$\zeta$-determinants of self-adjoint elliptic operators,
thereby providing an ansatz to give a meaning to
partition functions in the path integral approach to
quantum field theory.
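
For a symmetric matrix the spectral representation reduces to a finite sum over eigenvalues, $f(A) = \sum_\lambda f(\lambda) P_\lambda$, with $P_\lambda$ the orthogonal projection onto the eigenspace. The sketch below works this out for the hypothetical example $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, whose eigenvalues $1$ and $3$ and eigenvectors $(1,-1)$ and $(1,1)$ are entered by hand; it is an illustration, not a general eigenvalue solver.

```python
import math

# Symmetric matrix A = [[2, 1], [1, 2]] with eigenvalues 1 and 3.
A = [[2.0, 1.0], [1.0, 2.0]]

# Orthogonal projections onto the eigenlines spanned by (1, -1) and (1, 1).
P1 = [[0.5, -0.5], [-0.5, 0.5]]
P3 = [[0.5, 0.5], [0.5, 0.5]]
spectral_data = [(1.0, P1), (3.0, P3)]

def functional_calculus(f):
    """f(A) = sum over eigenvalues of f(lambda) times the eigenprojection."""
    return [[sum(f(lam) * P[i][j] for lam, P in spectral_data) for j in range(2)]
            for i in range(2)]

def matmul(X, Y):
    """Product of two 2x2 real matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

sqrt_A = functional_calculus(math.sqrt)                            # square root of A
heat_A = functional_calculus(lambda lam: math.exp(-0.5 * lam))     # e^{-tA} with t = 0.5
sqrt_A_squared = matmul(sqrt_A, sqrt_A)
identity_A = functional_calculus(lambda lam: lam)                  # recovers A itself
```

Squaring `sqrt_A` returns `A`, and composing `heat_A` with itself gives the heat operator at time $t = 1$, the matrix form of the semigroup law.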


Operator Algebras 


Bounded linear operators on a Hilbert space $H$
form an algebra $\mathcal{L}(H)$ closed for the operator norm
with involution given by the adjoint operation
$A \mapsto A^*$; it is a $C^*$-algebra, that is, an algebra $\mathcal{A}$ over
$\mathbb{C}$ with a norm $\|\cdot\|$ and an involution $*$ such that $\mathcal{A}$
is closed for this norm and such that $\|ab\| \le \|a\|\,\|b\|$
and $\|a^*a\| = \|a\|^2$ for all $a, b \in \mathcal{A}$. By the
Gelfand-Naimark theorem, every $C^*$-algebra is
isomorphic to a sub-$C^*$-algebra of some $\mathcal{L}(H)$. The
notion of spectrum extends from bounded operators
to $C^*$-algebras; the spectrum $\mathrm{sp}(a)$ of an
element $a$ in a $C^*$-algebra $\mathcal{A}$ is the (compact) set of
complex numbers $\lambda$ such that $a - \lambda \cdot 1$ is not invertible.
The notion of self-adjointness also extends
($a = a^*$), and just as a self-adjoint operator
$B \in \mathcal{L}(H)$ is non-negative (in which case its spectrum
lies in $\mathbb{R}^+$) if and only if $B = A^*A$ for some bounded
operator $A$, an element $b \in \mathcal{A}$ is said to be
non-negative if and only if $b = a^*a$ for some $a \in \mathcal{A}$, in
which case $\mathrm{sp}(b) \subset \mathbb{R}^+$.

The algebra $C(X)$ of continuous functions $f : X \to \mathbb{C}$
vanishing at infinity on some locally compact
Hausdorff space $X$, equipped with the supremum
norm and the conjugation $f \mapsto \bar f$, is also a $C^*$-algebra
and a prototype for abelian $C^*$-algebras, since
Gelfand showed that every abelian $C^*$-algebra is
isometrically isomorphic to $C(X)$, with $X$ compact if
the algebra is unital. To a $C^*$-algebra $\mathcal{A}$, one can
associate an abelian group $K_0(\mathcal{A})$ which is dual to the
Grothendieck group $K^0(X)$ of isomorphism classes of
vector bundles over a compact Hausdorff space $X$.

Compact operators on a Hilbert space $H$ form
the only proper two-sided ideal $\mathcal{K}(H)$ of the
$C^*$-algebra $\mathcal{L}(H)$ which is closed for the operator norm
topology on $\mathcal{L}(H)$. The quotient $\mathcal{L}(H)/\mathcal{K}(H)$ is
called the Calkin algebra, after Calkin, who classified
all two-sided ideals in $\mathcal{L}(H)$ for a separable
Hilbert space $H$; one can set up a one-to-one
correspondence between such ideals and certain
sequence spaces. Corresponding to the Banach
space $l^1(\mathbb{Z})$ of complex-valued sequences $(u_n)$ such
that $\sum_{n \in \mathbb{Z}} |u_n| < \infty$ is the $*$-ideal $\mathcal{I}_1(H)$ of
trace-class operators. The trace $\mathrm{tr}(A) = \sum_{n \in \mathbb{Z}} \langle A e_n, e_n\rangle_H$
of a non-negative operator $A \in \mathcal{L}(H)$ lies in $[0, +\infty]$
and is independent of the choice of the complete
orthonormal basis $\{e_n, n \in \mathbb{Z}\}$ of $H$ equipped with
the inner product $\langle\cdot,\cdot\rangle_H$. $\mathcal{I}_1(H)$ is the Banach space
of bounded linear operators on $H$ such that
$\|A\|_1 = \mathrm{tr}(|A|)$ is finite. Given an (essentially
self-adjoint) positive differential operator $D$ of
order $d$ acting on smooth functions on a closed
$n$-dimensional Riemannian manifold $M$, its
complex power $D^{-z}$ is trace class on the space
of $L^2$-functions on $M$ provided $\mathrm{Re}(z) > n/d$, and the
corresponding trace $\mathrm{tr}(D^{-z})$ extends to a meromorphic
function on the whole plane, the
$\zeta$-function $\zeta(D, z)$, which is holomorphic at 0.
More generally, Banach spaces $l^p(\mathbb{Z})$, $1 \le p < \infty$,
of complex-valued sequences $(u_n)_{n \in \mathbb{Z}}$ such that
$\sum_{n \in \mathbb{Z}} |u_n|^p < \infty$ relate to Schatten ideals $\mathcal{I}_p(H)$,
$1 \le p < \infty$, where $\mathcal{I}_p(H)$ is the Banach space of bounded
linear operators on $H$ such that $\|A\|_p = (\mathrm{tr}(|A|^p))^{1/p}$
is finite. Just as all $l^p$-sequences converge to 0,
the Schatten ideals $\mathcal{I}_p(H)$ all lie in $\mathcal{K}(H)$ and we
have $\cdots \subset \mathcal{I}_p(H) \subset \mathcal{I}_{p+1}(H) \subset \cdots \subset \mathcal{K}(H)$.

Compact operators and Schatten ideals are
useful to extend index theory to a noncommutative
context; a Fredholm module $(H, F)$ over an
involutive algebra $\mathcal{A}$ is given by an involutive
representation $\pi$ of $\mathcal{A}$ in a Hilbert space $H$ and
a self-adjoint bounded linear operator $F$ on $H$
such that $F^2 = \mathrm{Id}_H$ and the operator brackets
$[F, \pi(a)]$ are compact for all $a \in \mathcal{A}$. To a
$p$-summable Fredholm module $(H, F)$, that is,
$[F, \pi(a)] \in \mathcal{I}_p(H)$ for all $a \in \mathcal{A}$, one associates a
representative $\tau$ of the Chern character $\mathrm{ch}^*(H, F)$
given by a cyclic cocycle on $\mathcal{A}$, which pairs up with
K-theory to build an integer-valued index map rT 
on K-theory. 

Schatten ideals are also useful to investigate the
geometry of infinite-dimensional spaces such as loop
groups, for which Hilbert-Schmidt operators
(that is, operators in $\mathcal{I}_2(H)$) are particularly useful. A Hölder-type
inequality shows that the product of two Hilbert-Schmidt
operators is trace class. Moreover, for any
two Hilbert-Schmidt operators $A$ and $B$, the
“cyclicity property” $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ holds,
and the sesquilinear form $(A, B) \mapsto \mathrm{tr}(AB^*)$ makes
$\mathcal{I}_2(H)$ a Hilbert space.
Introduction


Minkowski spacetime is generally regarded as the 
appropriate mathematical context within which to 
formulate those laws of physics that do not refer 
specifically to gravitational phenomena. Here we 
shall describe this context in rigorous terms, 
postulate what experience has shown to be its 
correct physical interpretation, and illustrate by 
means of examples its appropriateness for the 
formulation of physical laws. 


Minkowski Spacetime and the Lorentz Group


Minkowski spacetime $M$ is a four-dimensional real
vector space on which is defined a bilinear form
$g : M \times M \to \mathbb{R}$ that is symmetric ($g(v, w) = g(w, v)$
for all $v, w \in M$) and nondegenerate ($g(v, w) = 0$
for all $w \in M$ implies $v = 0$). Further, $g$ has index 1,
that is, there exists a basis $\{e_1, e_2, e_3, e_4\}$ for $M$ with

\[ g(e_a, e_b) = \eta_{ab} = \begin{cases} 1 & \text{if } a = b = 1, 2, 3 \\ -1 & \text{if } a = b = 4 \\ 0 & \text{if } a \ne b \end{cases} \]

$g$ is called a Lorentz inner product for $M$ and any
basis of the type just described is an orthonormal
basis for $M$. We shall often write $v \cdot w$ for the value
$g(v, w)$ of $g$ on $(v, w) \in M \times M$. A vector $v \in M$ is
said to be spacelike, timelike, or null if $v \cdot v$ is
positive, negative, or zero, respectively, and the set
$C_N$ of all null vectors is called the null cone in $M$. If
$\{e_1, e_2, e_3, e_4\}$ is an orthonormal basis and if
we write $v = v^1 e_1 + v^2 e_2 + v^3 e_3 + v^4 e_4 = v^a e_a$ (using
the Einstein summation convention, according to
which a repeated index, one subscript and one
superscript, is summed over its possible values) and
$w = w^b e_b$, then

\[ v \cdot w = v^1 w^1 + v^2 w^2 + v^3 w^3 - v^4 w^4 = \eta_{ab}\, v^a w^b \]

Figure 1 Spacelike, timelike and null vectors.

In particular, $v$ is null if and only if

\[ (v^4)^2 = (v^1)^2 + (v^2)^2 + (v^3)^2 \]

(hence the name null “cone” for $C_N$). Timelike vectors
are “inside” the null cone and spacelike vectors are
“outside” (see Figure 1).
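
The classification of vectors by the sign of $v \cdot v$ is easy to express in code. The following is a minimal sketch in an orthonormal basis; the function names, the tuple representation of components $(v^1, v^2, v^3, v^4)$ and the tolerance `tol` are assumptions made for the illustration.

```python
ETA = (1.0, 1.0, 1.0, -1.0)  # diagonal of eta_ab in an orthonormal basis

def lorentz_product(v, w):
    """v . w = v^1 w^1 + v^2 w^2 + v^3 w^3 - v^4 w^4."""
    return sum(e * a * b for e, a, b in zip(ETA, v, w))

def causal_type(v, tol=1e-12):
    """Return 'spacelike', 'timelike' or 'null' according to the sign of v . v."""
    q = lorentz_product(v, v)
    if abs(q) <= tol:
        return "null"
    return "spacelike" if q > 0 else "timelike"
```

For instance, `causal_type((3, 4, 0, 5))` is `"null"` because $3^2 + 4^2 = 5^2$, while `(1, 0, 0, 0)` is spacelike and `(0, 0, 0, 1)` is timelike.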

We select some orientation for the vector space $M$
and will henceforth consider only oriented, orthonormal
bases for $M$. From the Schwarz inequality
for $\mathbb{R}^3$, one can show (Naber 1992, theorem 1.3.1)
that, if $v$ is timelike and $w$ is either timelike or null
and nonzero, then $v \cdot w < 0$ if and only if $v^4 w^4 > 0$
in any orthonormal basis. In particular, one can
define an equivalence relation on the set of all
timelike vectors by decreeing that two such, $v$ and
$w$, are equivalent if and only if $v \cdot w < 0$. For
reasons that will emerge shortly we then say that $v$
and $w$ have the same time orientation. There are
precisely two equivalence classes, one of which we
select and designate future directed. Timelike vectors
in the other class are then called past directed. One
can show (Naber 1992, section 1.3 and corollary
1.4.5) that this classification can be extended to
nonzero null vectors as well (but not to spacelike
vectors). We will call an oriented, orthonormal basis
time oriented if its timelike vector $e_4$ is future
directed and will consider only these in what
follows. An oriented, time-oriented, orthonormal
basis for $M$ will be called an admissible basis. If
$\{e_1, e_2, e_3, e_4\}$ and $\{\hat e_1, \hat e_2, \hat e_3, \hat e_4\}$ are two such bases
and if we write

\[ e_b = \Lambda^1{}_b\, \hat e_1 + \Lambda^2{}_b\, \hat e_2 + \Lambda^3{}_b\, \hat e_3 + \Lambda^4{}_b\, \hat e_4 = \Lambda^a{}_b\, \hat e_a, \quad b = 1, 2, 3, 4 \qquad [1] \]

then the matrix $\Lambda = (\Lambda^a{}_b)$ ($a$ = row index,
$b$ = column index) can be shown to satisfy the
following three conditions (Naber 1992, section 1.3):

1. (orthogonality) $\Lambda^T \eta \Lambda = \eta$,
where $T$ means transpose and

\[ \eta = (\eta_{ab}) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \]

2. (orientability) $\det \Lambda = 1$, and
3. (time orientability) $\Lambda^4{}_4 \ge 1$.


We shall refer to any $4 \times 4$ matrix $\Lambda = (\Lambda^a{}_b)$ satisfying
these three conditions as a Lorentz transformation
(although one often sees the adjectives “proper” and
“orthochronous” appended to emphasize conditions
(2) and (3), respectively). The set $\mathcal{L}$ of all such matrices
forms a group under matrix multiplication that we call
simply the Lorentz group. It is a simple matter to show
(Naber 1992, lemma 1.3.4) from the orthogonality
condition (1) that, if $\Lambda^4{}_4 = 1$, then $\Lambda$ must be of the
form

\[ \begin{pmatrix} & & & 0 \\ & (R^i{}_j) & & 0 \\ & & & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

where $(R^i{}_j)$ is an element of SO(3), that is, a $3 \times 3$
orthogonal matrix with determinant 1. The set $\mathcal{R}$ of
all matrices of this form is a subgroup of $\mathcal{L}$ called
the rotation subgroup. Although it will play no role
in what we do here, it should be pointed out that in
many applications (e.g., in particle physics) it is
necessary to consider the larger group of transformations
of $M$ generated by the Lorentz group and
spacetime translations ($x^a \to x^a + \Lambda^a$, for some constants
$\Lambda^a$, $a = 1, 2, 3, 4$). This is called the inhomogeneous
Lorentz group, or Poincaré group.


Physical Interpretation 


For the purpose of describing how one is to think of 
Minkowski spacetime and the Lorentz group physi- 
cally it will be convenient to distinguish (intuitively 
and terminologically, if not mathematically) between a 
“vector” in M and a “point” in M (the “tip” of a 
vector). The points in M are called events and are to be 
thought of as actual physical occurrences, albeit 
idealized as “point events” which have no spatial 
extension and no duration. One might picture, for 
example, an instantaneous collision, or explosion, or 
an “instant” in the history of some point material 
particle or photon (“particle of light”). 

Events are observed and identified by the assignment
of coordinates. We will be interested in
coordinates assigned in a very particular way by a
very particular type of observer. Specifically, our
admissible observers preside over three-dimensional,
right-handed, Cartesian spatial coordinate systems,
relative to which photons always move along
straight lines in any direction. With a single clock
located at the origin, such an observer can determine
the speed, $c$, of light in vacuo by the so-called Fizeau
procedure (emit a photon from the origin when the
clock there reads $t_1$, bounce it back from a mirror
located at $(x^1, x^2, x^3)$, receive the photon at the
origin again when the clock there reads $t_2$ and set
$c = 2\sqrt{(x^1)^2 + (x^2)^2 + (x^3)^2}/(t_2 - t_1)$). Now place an
identical clock at each spatial point and synchronize
them by emitting from the origin a spherical
electromagnetic wave (photons in all directions)
and setting the clock whose location is $(x^1, x^2, x^3)$
to read $\sqrt{(x^1)^2 + (x^2)^2 + (x^3)^2}/c$ at the instant the
wave arrives. An observer now assigns to an event
the three spatial coordinates of the location at which
it occurred in his coordinate system as well as the
time reading on the clock at that location at the
instant the event occurred. We shall assume also
that our admissible observers are inertial in the sense
of Newtonian mechanics (the trajectory of a particle
on which no forces act, when described in terms
of the coordinates just introduced, is a point or a
straight line traversed at constant speed). It is an
experimental fact (and quite a remarkable one) that
all of these admissible observers (whether or not they
are in relative motion) agree on the numerical value of
the speed of light in vacuo ($c \approx 3.00 \times 10^{10}$ cm s$^{-1}$).
We shall exploit this fact at the outset to have all of our
admissible observers measure time in units of distance
by simply multiplying their time coordinates $t$ by $c$.
The resulting time coordinate is denoted $x^4 = ct$. In
these units all speeds are dimensionless and the speed
of light in vacuo is 1.

In our mathematical model M of the world of 
events, this very subtle and complex notion of an 
admissible observer is fully identified with the 
conceptually very simple notion of an admissible
basis $\{e_1, e_2, e_3, e_4\}$. If $x \in M$ is an event and if we
write $x = x^a e_a$, then $(x^1, x^2, x^3)$ are the spatial and $x^4$
is the time coordinate supplied for $x$ by the
corresponding observer. If $\{\hat{e}_1, \hat{e}_2, \hat{e}_3, \hat{e}_4\}$ is another
basis/observer related to $\{e_1, e_2, e_3, e_4\}$ by [1] and if
we write $x = \hat{x}^a \hat{e}_a$, then

$$\hat{x}^a = \Lambda^a{}_b\, x^b, \qquad a = 1, 2, 3, 4 \qquad [2]$$

Thus, Lorentz transformations relate the space and 
time coordinates supplied for any given event by two 
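admissible observers. If $(\Lambda^a{}_b) \in \mathcal{R}$, then the two
observers differ only in the orientation of their spatial
coordinate axes. On the other hand, for any real
number $\theta$ one can define an element $L(\theta)$ of $\mathcal{L}$ by

$$L(\theta) = \begin{pmatrix} \cosh\theta & 0 & 0 & -\sinh\theta \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\sinh\theta & 0 & 0 & \cosh\theta \end{pmatrix} \qquad [3]$$

and, if two admissible bases are related by this Lorentz
transformation, then the coordinate transformation [2]
becomes

$$\begin{aligned}
\hat{x}^1 &= (\cosh\theta)\, x^1 - (\sinh\theta)\, x^4 \\
\hat{x}^2 &= x^2 \\
\hat{x}^3 &= x^3 \\
\hat{x}^4 &= -(\sinh\theta)\, x^1 + (\cosh\theta)\, x^4
\end{aligned} \qquad [4]$$

Letting $\beta = \tanh\theta$ (so that $-1 < \beta < 1$) and suppressing
$\hat{x}^2 = x^2$ and $\hat{x}^3 = x^3$, one obtains

$$\begin{aligned}
\hat{x}^1 &= \frac{1}{\sqrt{1-\beta^2}}\, x^1 - \frac{\beta}{\sqrt{1-\beta^2}}\, x^4 \\
\hat{x}^4 &= -\frac{\beta}{\sqrt{1-\beta^2}}\, x^1 + \frac{1}{\sqrt{1-\beta^2}}\, x^4
\end{aligned} \qquad [5]$$

This corresponds to two observers whose spatial
axes are oriented as shown in Figure 2 with the
hatted coordinate system moving along the common
$x^1$-, $\hat{x}^1$-axis with speed $|\beta|$, to the right if $\beta > 0$ and
to the left if $\beta < 0$.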
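As a quick numerical check of [5], the following Python sketch applies the special Lorentz transformation to the pair $(x^1, x^4)$ and confirms that $(x^1)^2 - (x^4)^2$ is unchanged. The function names `boost` and `interval_2d` are illustrative choices made here, not notation from the text.

```python
import math

def boost(beta, x1, x4):
    """Apply [5]: map (x^1, x^4) to the hatted coordinates for -1 < beta < 1."""
    if not -1.0 < beta < 1.0:
        raise ValueError("beta must satisfy -1 < beta < 1")
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return gamma * (x1 - beta * x4), gamma * (x4 - beta * x1)

def interval_2d(x1, x4):
    """(x^1)^2 - (x^4)^2, the interval with x^2 and x^3 suppressed."""
    return x1 * x1 - x4 * x4

# An event on the world line of the hatted origin (x^1 = beta * x^4).
x1_hat, x4_hat = boost(0.6, 3.0, 5.0)
```

With $\beta = 0.6$ the event $(x^1, x^4) = (3, 5)$ lies on the hatted time axis, so it maps (up to rounding) to $(0, 4)$; both pairs give the same interval, $-16$. Applying `boost` again with $-\beta$ recovers the original coordinates.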

We remark that, reverting to traditional time units,
$\beta = v/c$, where $|v|$ is the relative speed of the two
coordinate systems, and [5] becomes what is generally
referred to as a “Lorentz transformation” in
elementary expositions of special relativity, that is,

$$\hat{x}^1 = \frac{x^1 - vt}{\sqrt{1 - v^2/c^2}}, \qquad \hat{t} = \frac{t - (v/c^2)\, x^1}{\sqrt{1 - v^2/c^2}} \qquad [6]$$




Figure 2 Observers in standard configuration.

There is a sense in which, to understand the
kinematic effects of special relativity, it is enough
to restrict one’s attention to the so-called special
Lorentz transformations $L(\theta)$. Specifically, one can
show (Naber 1992, theorem 1.3.5) that if $\Lambda \in \mathcal{L}$ is
any Lorentz transformation, then there exist a real
number $\theta$ and two rotations $R_1, R_2 \in \mathcal{R}$ such that
$\Lambda = R_1 L(\theta) R_2$. Since $R_1$ and $R_2$ involve no relative
motion, all of the kinematics is contained in $L(\theta)$.
We shall explore these kinematic effects in more
detail shortly.

Now suppose that $x$ and $x_0$ are two distinct events
in $M$ and consider the displacement vector $x - x_0$
from $x_0$ to $x$. If $\{e_1, e_2, e_3, e_4\}$ is an admissible basis
and if we write $x = x^a e_a$ and $x_0 = x_0^a e_a$, then
$x - x_0 = (x^a - x_0^a)\, e_a = \Delta x^a e_a$. If $x - x_0$ is null, then

$$(\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 = (\Delta x^4)^2$$

so the spatial separation of the two events is equal to
the distance light would travel during the time lapse
between the events. The same must be true in any
other admissible basis since Lorentz transformations
are the matrices of linear maps that preserve the
Lorentz inner product. Consequently, all admissible
observers agree that $x_0$ and $x$ are “connectible by
a photon.” They even agree as to which of the two
events is to be regarded as the “emission” of the
photon and which is to be regarded as its “reception”
since one can show (Naber 1992, theorem 1.3.3)
that, when a vector is either timelike or null and
nonzero, the sign of its fourth coordinate is the same
in every admissible basis (because $\Lambda^4{}_4 \geq 1$). Thus,
$x^4 - x_0^4$ is either positive for all admissible observers
($x_0$ occurred before $x$) or negative for all admissible
observers ($x_0$ occurred after $x$). Since photons move
along straight lines in admissible coordinate systems
we adopt the following terminology. If $x_0, x \in M$ are
such that $x - x_0$ is null, then the straight line in $M$
containing $x_0$ and $x$ is called the world line of a
photon in $M$ and is to be thought of as the set of all
events in the history of some particle of light that
“experiences” both $x_0$ and $x$.

Let us now suppose instead that $x - x_0$ is timelike.
Then, in any admissible basis,

$$(\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 < (\Delta x^4)^2$$

so the spatial separation of $x_0$ and $x$ is less than the
distance light would travel during the time lapse
between the events. In this case, one can prove (Naber
1992, section 1.4) that there exists an admissible basis
$\{\hat{e}_1, \hat{e}_2, \hat{e}_3, \hat{e}_4\}$ in which $\Delta\hat{x}^1 = \Delta\hat{x}^2 = \Delta\hat{x}^3 = 0$, that is,
there is an admissible observer for whom the two
events occur at the same spatial location, one after the
other. Thinking of this location as occupied by some
material object (e.g., the observer’s clock situated at
that point) we find that the events $x_0$ and $x$ are both
“experienced” by this material particle and that,
moreover, $\sqrt{|g(x - x_0, x - x_0)|}$ is just the time lapse
between the events recorded by a clock carried along by
this material particle. To any other admissible observer
this material particle appears “free” (not subject to
forces) because it moves on a straight line with constant
speed. This leads us to the following definitions. If
$x_0, x \in M$ are such that $x - x_0$ is timelike, then the
straight line in $M$ containing $x_0$ and $x$ is called the
world line of a free material particle in $M$ and
$\sqrt{|g(x - x_0, x - x_0)|}$, usually written $\tau(x - x_0)$, or
simply $\Delta\tau$, is the proper time separation of $x_0$ and $x$.
One can think of $\tau(x - x_0)$ as a sort of “length” for
$x - x_0$ measured, however, by a clock carried along by
a free material particle that experiences both $x_0$ and $x$.
It is an odd sort of length, however, since it satisfies
not the usual triangle inequality, but the following
“reversed” version.


Reversed triangle inequality (Naber 1992, theorem
1.4.2) Let $x_0$, $x$ and $y$ be events in $M$ for which $y - x$
and $x - x_0$ are timelike with the same time orientation.
Then $y - x_0 = (y - x) + (x - x_0)$ is timelike and

$$\tau(y - x_0) \geq \tau(y - x) + \tau(x - x_0) \qquad [7]$$

with equality holding if and only if $y - x$ and $x - x_0$
are linearly dependent.


The sense of the inequality in [7] has interesting 
consequences about which we will have more to say 
shortly. 
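The reversed inequality [7] is easy to see in a small numerical case. The Python sketch below (helper names `g` and `tau` are chosen here for illustration) takes two future-directed timelike displacements that are not parallel and compares the proper time of their sum with the sum of their proper times.

```python
import math

def g(v, w):
    """Lorentz inner product in an admissible basis, signature (+, +, +, -)."""
    return v[0] * w[0] + v[1] * w[1] + v[2] * w[2] - v[3] * w[3]

def tau(v):
    """Proper time separation sqrt(|g(v, v)|) of a timelike displacement v."""
    if g(v, v) >= 0:
        raise ValueError("displacement is not timelike")
    return math.sqrt(-g(v, v))

a = (3.0, 0.0, 0.0, 5.0)     # x - x_0, future directed
b = (-3.0, 0.0, 0.0, 5.0)    # y - x, future directed
total = tuple(p + q for p, q in zip(a, b))   # y - x_0

lhs = tau(total)             # proper time along the straight line from x_0 to y
rhs = tau(a) + tau(b)        # proper time along the two-leg route through x
```

Here each leg has proper time 4 while the direct displacement $(0, 0, 0, 10)$ has proper time 10, so the straight route is the longer one, as [7] requires.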

Finally, let us suppose that $x - x_0$ is spacelike.
Then, in any admissible basis

$$(\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 > (\Delta x^4)^2$$

so the spatial separation of $x_0$ and $x$ is greater than the
distance light could travel during the time lapse that
separates them. There is clearly no admissible observer
for whom the events occur at the same location. No
free material particle (or even photon) can experience
both $x_0$ and $x$. However, one can show (Naber 1992,
section 1.5) that, given any real number $T$ (positive,
negative, or zero), one can find an admissible basis
$\{\hat{e}_1, \hat{e}_2, \hat{e}_3, \hat{e}_4\}$ in which $\Delta\hat{x}^4 = T$. Some admissible
observers will judge the events simultaneous, some
will assert that $x_0$ occurred before $x$, and others will
reverse the order. Temporal order, cause and effect,
have no meaning for such pairs of events. For those
admissible observers for whom the events are simultaneous
($\Delta\hat{x}^4 = 0$), the quantity $\sqrt{g(x - x_0, x - x_0)}$ is
the distance between them and for this reason this
quantity is called the proper spatial separation of $x_0$
and $x$ (whenever $x - x_0$ is spacelike).

For any two events $x_0, x \in M$, $g(x - x_0, x - x_0)$ is
given in any admissible basis by $(\Delta x^1)^2 + (\Delta x^2)^2 +
(\Delta x^3)^2 - (\Delta x^4)^2$ and is called the interval separating
$x_0$ and $x$. It is the closest analog in Minkowskian
geometry to the (squared) length in Euclidean
geometry. It can, however, assume any real value
depending on the physical relationship between
the events $x_0$ and $x$. Historically, of course, it was
the various physical interpretations of this interval
that we have just described which led Minkowski
(Einstein et al. 1958) to the introduction of the
structure that bears his name.
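The three cases just described depend only on the sign of the interval, which the following Python sketch makes explicit. The names `interval`, `classify` and `separation` are illustrative; the comparison with zero is exact, so the sketch is best used with exactly representable inputs.

```python
import math

def interval(dx):
    """g(x - x_0, x - x_0) from the components (dx^1, dx^2, dx^3, dx^4)."""
    d1, d2, d3, d4 = dx
    return d1 * d1 + d2 * d2 + d3 * d3 - d4 * d4

def classify(dx):
    """Label a nonzero displacement by the sign of its interval."""
    s = interval(dx)
    if s < 0:
        return "timelike"
    if s > 0:
        return "spacelike"
    return "null"

def separation(dx):
    """Proper time separation (timelike case) or proper spatial separation (spacelike case)."""
    return math.sqrt(abs(interval(dx)))
```

For instance, the displacement $(3, 4, 0, 5)$ is null, $(1, 0, 0, 2)$ is timelike, and $(2, 0, 0, 1)$ is spacelike.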


Kinematic Effects 


All of the well-known kinematic effects of special 
relativity (the addition of velocities formula, the 
relativity of simultaneity, time dilation, and length 
contraction) follow easily from what we have done. 
Because it eases visualization and because, as we 
mentioned earlier, it suffices to do so, we will limit our 
discussion to the special Lorentz transformations. 

Let $\theta_1$ and $\theta_2$ be two real numbers and consider
the corresponding elements $L(\theta_1)$ and $L(\theta_2)$ of
$\mathcal{L}$ defined by [3]. Sum formulas for $\sinh\theta$ and
$\cosh\theta$ imply that $L(\theta_1)L(\theta_2) = L(\theta_1 + \theta_2)$. Defining
$\beta_i = \tanh\theta_i$, $i = 1, 2$, and $\beta = \tanh(\theta_1 + \theta_2)$, the sum
formula for $\tanh\theta$ then gives

$$\beta = \frac{\beta_1 + \beta_2}{1 + \beta_1\beta_2} \qquad [8]$$

The physical interpretation is simple. One has three
admissible observers whose spatial axes are related
in the manner shown in Figure 2. If the speed of the
second relative to the first is $\beta_1$ and the speed of the
third relative to the second is $\beta_2$, then the speed of
the third relative to the first is not $\beta_1 + \beta_2$ as a
Newtonian predisposition would lead one to expect,
but rather $\beta$, given by [8]. This is the relativistic
addition of velocities formula.
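A short Python sketch of [8] follows, computed both directly and by adding the parameters $\theta_i$ as in $L(\theta_1)L(\theta_2) = L(\theta_1 + \theta_2)$. The function names are illustrative, and speeds are in units with $c = 1$.

```python
import math

def add_velocities(beta1, beta2):
    """Relativistic sum [8] of two collinear speeds, in units with c = 1."""
    return (beta1 + beta2) / (1.0 + beta1 * beta2)

def add_via_rapidity(beta1, beta2):
    """Same sum obtained from theta_i = artanh(beta_i) and tanh(theta_1 + theta_2)."""
    return math.tanh(math.atanh(beta1) + math.atanh(beta2))

combined = add_velocities(0.5, 0.5)   # 0.8, not the Newtonian value 1.0
```

The two routes agree to rounding error, and the combined speed of two subluminal speeds stays below 1.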

We have seen already that, when the interval
between $x_0$ and $x$ is spacelike, the events will be
judged simultaneous by some admissible observers,
but not by others. Indeed, if $\Delta x^4 = 0$
and the observers are related by [5], then $\Delta\hat{x}^4 =
-(\beta/\sqrt{1 - \beta^2})\,\Delta x^1 = -\beta\,\Delta\hat{x}^1$, which will not be
zero unless $\beta = 0$ and so there is no relative motion
($\Delta x^1$ cannot be zero since then $\Delta x^a = 0$ for
$a = 1, 2, 3, 4$ and $x = x_0$). This phenomenon is
called the relativity of simultaneity and we now
construct a simple geometrical representation of it.

Select two perpendicular lines in the plane to
represent the $x^1$- and $x^4$-axes (the Euclidean
orthogonality of the lines has no physical significance and
is unnecessary, but makes the pictures easier to
draw). The $\hat{x}^1$-axis will be represented by the
straight line $\hat{x}^4 = 0$ which, from [5], is given by
$x^4 = \beta x^1$ (in Figure 3 we have assumed that $\beta > 0$).
Similarly, the $\hat{x}^4$-axis is identified with the line
$x^4 = (1/\beta)\,x^1$. Since Lorentz transformations leave
the Lorentz inner product invariant, the hyperbolas
$(x^1)^2 - (x^4)^2 = k$ coincide with $(\hat{x}^1)^2 - (\hat{x}^4)^2 = k$ and
we calibrate the axes accordingly, for example, the
branch of $(x^1)^2 - (x^4)^2 = 1$ with $x^1 > 0$ intersects
the $x^1$-axis at the point $(x^1, x^4) = (1, 0)$ and intersects
the $\hat{x}^1$-axis at the point $(\hat{x}^1, \hat{x}^4) = (1, 0)$. This
necessitates a different scale on the hatted and
unhatted axes, but one can show (Naber 1992,
section 1.3) that, with this calibration, all coordinates
can be obtained geometrically by projecting
parallel to the opposite axis (e.g., the $x^4$- and $\hat{x}^4$-
coordinates of an event result from projecting
parallel to the $x^1$- and $\hat{x}^1$-axes, respectively).

Thus, a line of simultaneity in the hatted
(respectively, unhatted) coordinates is parallel to
the $\hat{x}^1$- (respectively, $x^1$-) axis so that, in general, a
pair of events lying on one will not lie on the other 
(note, however, that these lines are “really” three- 
dimensional hyperplanes so what appears to be a 
point of intersection is actually a two-dimensional 
“plane of agreement”, any two events in which are 
judged simultaneous by both observers). 

Figure 3 Relativity of simultaneity (showing the hatted and unhatted lines of simultaneity and the calibration point $(\hat{x}^1, \hat{x}^4) = (1, 0)$).

For any two events whatsoever the relationship
between the time lapse $\Delta\hat{x}^4$ in the hatted coordinates
and the time lapse $\Delta x^4$ in the unhatted coordinates is,
from [5],

$$\Delta\hat{x}^4 = -\frac{\beta}{\sqrt{1-\beta^2}}\,\Delta x^1 + \frac{1}{\sqrt{1-\beta^2}}\,\Delta x^4$$

so the two are generally not equal. Consider, in
particular, two events on the world line of a point
at rest in the unhatted coordinate system, for
example, two readings on the clock at rest at the
origin in this system. Then $\Delta x^1 = 0$ so

$$\Delta\hat{x}^4 = \frac{1}{\sqrt{1-\beta^2}}\,\Delta x^4 > \Delta x^4$$

This effect is entirely symmetrical since, if $\Delta\hat{x}^1 = 0$,
then [5] implies

$$\Delta x^4 = \frac{1}{\sqrt{1-\beta^2}}\,\Delta\hat{x}^4 > \Delta\hat{x}^4$$

Each observer judges the other’s clocks to be
running slow. This phenomenon is called time
dilation and is clearly visible in the spacetime
diagram in Figure 4 (e.g., both observers agree
on the time reading “0” for the clock at the origin of
the unhatted system, but the line $\hat{x}^4 = 1$ intersects
the world line of the clock, i.e., the $x^4$-axis, at a
point below $(x^1, x^4) = (0, 1)$).

We should emphasize that this phenomenon is 
quite “real” in the physical sense. For example, 
certain types of elementary particles (mesons) found 
in cosmic radiation are so short-lived (at rest) that, 
even if they could travel at the speed of light, the 
time required to traverse our atmosphere would be 
some ten times their normal life span. They should 
not be able to reach the earth, but they do. Time 
dilation “keeps them young” in the sense that what 
seems a normal life time to the meson appears much 
longer to us. 
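To get a feel for the sizes involved, the sketch below evaluates the dilation factor $1/\sqrt{1-\beta^2}$ in Python. The speed $\beta = 0.995$ is an assumed value picked only so that the factor comes out near the “some ten times” mentioned above; it is not a measured meson speed, and the function names are illustrative.

```python
import math

def dilation_factor(beta):
    """1 / sqrt(1 - beta^2): ratio of the observed time lapse to the proper time lapse."""
    if not -1.0 < beta < 1.0:
        raise ValueError("beta must satisfy -1 < beta < 1")
    return 1.0 / math.sqrt(1.0 - beta * beta)

def observed_lifetime(proper_lifetime, beta):
    """Life span judged by an observer relative to whom the particle has speed beta."""
    return dilation_factor(beta) * proper_lifetime

beta = 0.995                     # assumed speed, for illustration only
factor = dilation_factor(beta)   # a little over 10
```

At this assumed speed a particle's life span, as judged from the ground, is stretched by a factor slightly greater than ten.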

Finally, since admissible observers generally 
disagree on which events are simultaneous and 
since the only way to measure the “length” of a 
moving object (say, a measuring rod) is to locate its 
end points “simultaneously,” it should come as no 
surprise that length, like simultaneity, and time, 
depends on the admissible observer measuring it. 
Specifically, let us consider a measuring rod lying
at rest along the $\hat{x}^1$-axis of the hatted coordinate
system. Its “length” in this coordinate system is $\Delta\hat{x}^1$.
The world lines of its end points are two straight
lines parallel to the $\hat{x}^4$-axis. If the unhatted observer
locates two events on these world lines “simultaneously”
their coordinates will satisfy $\Delta x^4 = 0$ and,
by [5], $\Delta\hat{x}^1 = (1/\sqrt{1 - \beta^2})\,\Delta x^1$ so

$$\Delta x^1 = \sqrt{1 - \beta^2}\;\Delta\hat{x}^1 < \Delta\hat{x}^1$$

and the moving measuring rod appears contracted in
its direction of motion by a factor of $\sqrt{1 - \beta^2}$. As
for time dilation, this phenomenon, known as length
contraction, is entirely symmetrical, quite real, and
clearly visible in a spacetime diagram (Figure 5).

Figure 4 Time dilation.

Figure 5 Length contraction.
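The contraction factor can be checked against [5] directly. In the Python sketch below (names are illustrative), `measured_length` applies the factor $\sqrt{1-\beta^2}$, and `boost` is used to confirm that two events that are simultaneous for the unhatted observer and separated by the contracted length are separated by the full rest length in the hatted coordinates.

```python
import math

def boost(beta, x1, x4):
    """Apply [5] to the pair (x^1, x^4)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return gamma * (x1 - beta * x4), gamma * (x4 - beta * x1)

def measured_length(rest_length, beta):
    """Length assigned to a rod of given rest length by an observer it moves past at speed beta."""
    return math.sqrt(1.0 - beta * beta) * rest_length

beta = 0.6
contracted = measured_length(1.0, beta)        # 0.8 for a unit rod
# Two events with delta x^4 = 0 and delta x^1 = contracted, seen in hatted coordinates:
end_a = boost(beta, 0.0, 0.0)
end_b = boost(beta, contracted, 0.0)
rest_length_recovered = end_b[0] - end_a[0]    # close to 1.0
```

Note that the two events are not simultaneous in the hatted coordinates (their $\hat{x}^4$ values differ), which is exactly why the two observers disagree about the length.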


The Relativity Principle 


We have found that admissible observers can disagree 
about some rather startling things (whether or not two 
events are simultaneous, the time lapse between two 
events even when no one thinks they are simultaneous, 
and the length of a measuring rod). This would be 
a matter of no concern at all, of course, if one could 
determine, in any given situation, who was really 
right. Surely, two events are either simultaneous or 
they are not and we need only sort out which 
admissible observer has the correct view of the 
situation? Unfortunately (or fortunately, depending 
on one’s point of view) this distinction between 
the judgments made by different admissible observers 
is precisely what physics forbids. 


The relativity principle (Einstein et al. 1958). All 
admissible observers are completely equivalent for 
the formulation of the laws of physics. 


We must be clear that this is not a mathematical 
statement. It is rather a statement about the physical 
world around us and how it should be described, 
gleaned from observations, some of which are
complex and subtle and some of which are commonplace
(a passenger in a smooth, quiet airplane
traveling at constant groundspeed cannot “feel”
his motion relative to the earth). It is a powerful
guide for constructing the laws of relativistic
physics, but even more fundamentally it prohibits
us from regarding any particular admissible observer
as having a privileged view of the universe. In
particular, we are forbidden from attaching any
objective significance to such questions as, “Were the
two supernovae simultaneous?”, “How long did the
meson survive?”, and “What is the distance between
the Crab Nebula and Alpha Centauri?” This is
severe, but one must deal with it.


Particles and 4-Momentum 


If $I \subseteq \mathbb{R}$ is an interval, then a map $\alpha: I \to M$ is a curve
in $M$. Relative to any admissible basis we can write

$$\alpha(\xi) = x^a(\xi)\, e_a$$

for each $\xi \in I$. We shall assume that $\alpha$ is smooth in
the sense that each $x^a(\xi)$, $a = 1, 2, 3, 4$, is infinitely
differentiable ($C^\infty$) on $I$ and the velocity vector

$$\alpha'(\xi) = \frac{dx^a}{d\xi}\, e_a$$

is nonzero for every $\xi \in I$ (we adopt the usual
custom, in a vector space, of identifying the tangent
space at each point with the vector space itself). This
definition of smoothness clearly does not depend on
the choice of admissible basis for $M$. The curve $\alpha$ is
said to be spacelike, timelike, or null if

$$\alpha'(\xi) \cdot \alpha'(\xi) = \eta_{ab}\,\frac{dx^a}{d\xi}\,\frac{dx^b}{d\xi}$$


is positive, negative, or zero, respectively, for each
$\xi \in I$. A timelike curve $\alpha$ for which $\alpha'(\xi)$ is future
directed for each $\xi \in I$ is called a timelike world line
and its image is identified with the set of all events
in the history of some (not necessarily free) point
material particle. If $I = [\xi_0, \xi_1]$ and $\alpha: [\xi_0, \xi_1] \to M$
is a timelike world line, then the proper time length
of $\alpha$ is defined by

$$L(\alpha) = \int_{\xi_0}^{\xi_1} \sqrt{|g(\alpha'(\xi), \alpha'(\xi))|}\; d\xi = \int_{\xi_0}^{\xi_1} \sqrt{-\eta_{ab}\,\frac{dx^a}{d\xi}\,\frac{dx^b}{d\xi}}\; d\xi$$

and interpreted as the time lapse between the events
$\alpha(\xi_0)$ and $\alpha(\xi_1)$ as recorded by a clock carried along by
the particle whose world line is $\alpha$. This interpretation
is easily motivated by writing out a Riemann sum
approximation to the integral and appealing to our
interpretation of the proper time separation
$\Delta\tau = \sqrt{-\eta_{ab}\,\Delta x^a \Delta x^b}$. There are subtleties, however,
both mathematical and physical (Naber 1992, section 
1.4). The mathematical ones are addressed by the 
following result (which combines theorems 1.4.6 
and 1.4.8 of Naber (1992)). 


Theorem Let $x_0$ and $x$ be two events in $M$. Then
$x - x_0$ is timelike and future directed if and only if
there exists a timelike world line $\alpha: [\xi_0, \xi_1] \to M$ in
$M$ with $\alpha(\xi_0) = x_0$ and $\alpha(\xi_1) = x$ and, in this case,

$$L(\alpha) \leq \tau(x - x_0) \qquad [9]$$

with equality holding if and only if $\alpha$ is a parametrization
of a timelike straight line.


The inequality [9] asserts that if two material
particles experience both $x_0$ and $x$, then the one
that is free (and so can be regarded as at rest in 
some admissible coordinate system) has longer to 
wait for the occurrence of the second event (moving 
clocks run slow). For many years this basically 
obvious fact was christened “The Twin Paradox.” 
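The Riemann sum interpretation of $L(\alpha)$ can be tried out numerically. The Python sketch below sums $\Delta\tau = \sqrt{(\Delta x^4)^2 - (\Delta x^1)^2}$ over small steps for a world line of the form $(x^1(t), 0, 0, t)$. The particular path `x1_path`, which leaves the origin and returns to it with speed always below 1, is an assumption made for illustration, as are the function names.

```python
import math

def proper_time_length(x1_of_t, t0, t1, steps=10000):
    """Riemann-sum approximation to L(alpha) for the world line (x^1(t), 0, 0, t)."""
    dt = (t1 - t0) / steps
    total = 0.0
    for k in range(steps):
        ta = t0 + k * dt
        tb = ta + dt
        dx1 = x1_of_t(tb) - x1_of_t(ta)
        total += math.sqrt(dt * dt - dx1 * dx1)   # proper time of one small timelike step
    return total

def x1_path(t):
    """Out-and-back motion; its maximum speed is 2*pi/10, about 0.63."""
    return 2.0 * math.sin(math.pi * t / 10.0) ** 2

traveller = proper_time_length(x1_path, 0.0, 10.0)
stay_at_home = proper_time_length(lambda t: 0.0, 0.0, 10.0)   # straight world line
```

Both world lines join the same two events, and the straight one records (to rounding error) the full 10 units of proper time, while the curved one records strictly less, in agreement with [9].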

Just as a smooth curve in Euclidean space has an 
arc length parametrization, so a timelike world line 
has a proper time parametrization defined as 
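follows. For each $\xi$ in $[\xi_0, \xi_1]$ let

$$\tau = \tau(\xi) = \int_{\xi_0}^{\xi} \sqrt{|g(\alpha'(\zeta), \alpha'(\zeta))|}\; d\zeta$$

(the proper time length of $\alpha$ from $\alpha(\xi_0)$ to $\alpha(\xi)$).
Then $\tau = \tau(\xi)$ has a smooth inverse $\xi = \xi(\tau)$ so $\alpha$ can
be reparametrized by $\tau$. We will abuse our notation
slightly and write

$$\alpha(\tau) = x^a(\tau)\, e_a$$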

The velocity vector with this parametrization is
denoted

$$U = U(\tau) = \frac{dx^a}{d\tau}\, e_a,$$

is called the 4-velocity of the world line, and is the unit
tangent vector field to $\alpha$, that is,

$$U(\tau) \cdot U(\tau) = -1 \qquad [10]$$

for each $\tau$. An admissible observer is, of course,
more likely to parametrize a world line by his own
time coordinate $x^4$. Then


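$$\alpha'(x^4) = \frac{dx^1}{dx^4}\, e_1 + \frac{dx^2}{dx^4}\, e_2 + \frac{dx^3}{dx^4}\, e_3 + e_4$$

so

$$|g(\alpha'(x^4), \alpha'(x^4))| = 1 - \|V\|^2$$

where

$$\|V\| = \left[\left(\frac{dx^1}{dx^4}\right)^2 + \left(\frac{dx^2}{dx^4}\right)^2 + \left(\frac{dx^3}{dx^4}\right)^2\right]^{1/2}$$

is the usual magnitude of the particle’s velocity
vector

$$V = V(x^4) = \frac{dx^1}{dx^4}\, e_1 + \frac{dx^2}{dx^4}\, e_2 + \frac{dx^3}{dx^4}\, e_3 = V^i e_i$$

in the given admissible coordinate system. One finds
then that

$$U = (1 - \|V\|^2)^{-1/2}\,(V + e_4) \qquad [11]$$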


We shall identify a material particle in $M$ with a
pair $(\alpha, m)$, where $\alpha$ is a timelike world line and $m$ is
a positive constant called the particle’s proper mass
(or rest mass). If each $dx^a/d\xi$, $a = 1, 2, 3, 4$, is
constant, then $(\alpha, m)$ is a free material particle with
proper mass $m$. The 4-momentum of $(\alpha, m)$ is
defined by $P = mU$. Thus,

$$P \cdot P = -m^2 \qquad [12]$$


In any admissible basis we write

$$P = P^a e_a = m\,\frac{dx^a}{d\tau}\, e_a = m\,(1 - \|V\|^2)^{-1/2}\,(V + e_4) \qquad [13]$$

The “spatial part” of $P$ in these coordinates is

$$\vec{P} = \frac{m}{\sqrt{1 - \|V\|^2}}\, V$$

which, for $\|V\| \ll 1$, is approximately $mV$. Identifying
$m$ with the inertial mass of Newtonian
mechanics (measured by an observer for whom the
particle’s speed is small), this is simply the classical
momentum of the particle. Somewhat more explicitly,
if one expands $1/\sqrt{1 - \|V\|^2}$ by the Binomial
Theorem one finds that

$$P^i = \frac{m}{\sqrt{1 - \|V\|^2}}\, V^i = mV^i + \frac{1}{2}\, mV^i \|V\|^2 + \cdots, \qquad i = 1, 2, 3 \qquad [14]$$


which gives the components of the classical momentum
plus “relativistic corrections.” In order
to preserve a formal similarity with Newtonian
mechanics one often sees $m/\sqrt{1 - \|V\|^2}$ referred
to as the “relativistic mass” of the particle, but we
shall avoid this terminology. The fourth component
of $P$ is given by

$$P^4 = -P \cdot e_4 = \frac{m}{\sqrt{1 - \|V\|^2}} = m + \frac{1}{2}\, m \|V\|^2 + \cdots \qquad [15]$$

The appearance of the term $(1/2)m\|V\|^2$ corresponding
to the Newtonian kinetic energy suggests
that $P^4$ be denoted $E$ and called the total relativistic
energy measured by the given admissible observer
for the particle:

$$E = -P \cdot e_4 \qquad [16]$$

Now, one must understand that the concept of
“energy” in physics is a subtle one and simply
giving $-P \cdot e_4$ this name does not ensure that there
is any physical content. Whether or not the name
is appropriate can only be determined experimentally.
In particular, one should ask if the appearance
of the term $m$ in [15] is consistent with
the view that $P^4$ represents the “energy” of the
particle. Observe that if $\|V\| = 0$ (i.e., if the particle
is at rest relative to the given observer), then [15]
gives

$$E = m \quad (= mc^2, \text{ in standard units}) \qquad [17]$$


which we interpret as saying that, even when the 
particle is at rest, it still has energy. If this is really 
“energy” in the physical sense, then it should be 
possible to liberate and use it. That this is, indeed, 
possible has, of course, been rather convincingly 
demonstrated. 
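The relations [12], [13] and [15] can be checked numerically. The Python sketch below builds the components of $P = mU$ from a proper mass and an ordinary velocity (units with $c = 1$) and then tests the normalization and the low-speed energy expansion. The function names are illustrative.

```python
import math

def g(v, w):
    """Lorentz inner product in an admissible basis, signature (+, +, +, -)."""
    return v[0] * w[0] + v[1] * w[1] + v[2] * w[2] - v[3] * w[3]

def four_momentum(m, V):
    """Components (P^1, P^2, P^3, P^4) of P = m U, with U given by [11]."""
    speed_sq = sum(v * v for v in V)
    if speed_sq >= 1.0:
        raise ValueError("a material particle must have speed below 1")
    gamma = 1.0 / math.sqrt(1.0 - speed_sq)
    return (m * gamma * V[0], m * gamma * V[1], m * gamma * V[2], m * gamma)

P = four_momentum(1.0, (0.6, 0.0, 0.0))
energy = P[3]                                   # P^4 = -P . e_4, as in [16]
slow = four_momentum(1.0, (0.01, 0.0, 0.0))[3]  # close to m + (1/2) m ||V||^2
```

For $\|V\| = 0.6$ the energy is $1.25\,m$, and $P \cdot P$ comes out as $-m^2$ regardless of the velocity, which is the content of [12].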

Next we observe that not only material particles, 
but also photons possess “momentum” and 
“energy” and therefore should have 4-momentum 
(witness, e.g., the photoelectric effect in which 
photons collide with and eject electrons from their 
orbits in an atom). Unlike a material particle, 
however, a photon’s characteristic feature is not 
proper mass, but frequency $\nu$, or wavelength
$\lambda = 1/\nu$, related to its energy $\varepsilon$ by $\varepsilon = h\nu$ ($h$ being
Planck’s constant) and these are highly observer
dependent (Doppler effect). There is, moreover, no
“proper frequency” analogous to “proper mass”
since there is no admissible observer for whom the
photon is at rest. In an attempt to model these
features we consider a point $x_0 \in M$, a future
directed null vector $N$ and an interval $I \subseteq \mathbb{R}$. The
curve $\alpha: I \to M$ defined by

$$\alpha(\xi) = x_0 + \xi N \qquad [18]$$

is a parametrization of the world line of a photon
through $x_0$. Being null, $N$ can be written in any
admissible basis as

$$N = (-N \cdot e_4)(d + e_4) \qquad [19]$$

where

$$d = \left((N \cdot e_1)^2 + (N \cdot e_2)^2 + (N \cdot e_3)^2\right)^{-1/2} \left((N \cdot e_1)\,e_1 + (N \cdot e_2)\,e_2 + (N \cdot e_3)\,e_3\right) \qquad [20]$$

is the direction vector of the world line in the
corresponding spatial coordinate system. Now, by
analogy with [16], we define a photon in $M$ to
be a curve in $M$ of the form [18], take $N$ to be its
4-momentum and define the energy $\varepsilon$ of the photon
in the admissible basis $\{e_1, e_2, e_3, e_4\}$ by

$$\varepsilon = -N \cdot e_4 \qquad [21]$$

Then, by [19],

$$N = \varepsilon\,(d + e_4) \qquad [22]$$


The corresponding frequency $\nu$ and wavelength $\lambda$
are then defined by $\nu = \varepsilon/h$ and $\lambda = 1/\nu$. In another
admissible basis, one has $N = \hat{\varepsilon}(\hat{d} + \hat{e}_4)$, where $\hat{d}$
and $\hat{\varepsilon}$ are defined by the hatted versions of [20] and
[21]. One can then show (Naber 1992, section 1.8)
that

$$\frac{\hat{\varepsilon}}{\varepsilon} = \frac{\hat{\nu}}{\nu} = \frac{1 - \beta\cos\theta}{\sqrt{1 - \beta^2}} = (1 - \beta\cos\theta) + \frac{1}{2}\beta^2 (1 - \beta\cos\theta) + \cdots \qquad [23]$$

where $\beta$ is the relative speed of the two spatial
coordinate systems and $\theta$ is the angle (in the
unhatted spatial coordinate system) between the
direction $d$ of the photon and the direction of
motion of the hatted spatial coordinate system.
Equation [23] is the formula for the relativistic
Doppler effect with the first term in the series being
the classical formula.
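The Doppler formula [23] can be cross-checked against the boost [5], since the components of the 4-momentum $N$ transform like coordinates and $\hat{\varepsilon}$ is the fourth hatted component of $N$. The Python sketch below assumes the standard configuration of Figure 2 (relative motion along the common $x^1$-axis), so that the angle $\theta$ is measured from $e_1$; the function names are illustrative.

```python
import math

def doppler_ratio(beta, theta):
    """[23]: hatted energy (or frequency) divided by the unhatted one."""
    return (1.0 - beta * math.cos(theta)) / math.sqrt(1.0 - beta * beta)

def boosted_energy(beta, energy, theta):
    """Fourth hatted component of N = energy * (d + e_4), obtained from [5]."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    n1 = energy * math.cos(theta)   # component of N along e_1
    n4 = energy                     # component of N along e_4
    return gamma * (n4 - beta * n1)

receding = doppler_ratio(0.6, 0.0)          # observer moving along with the photon
approaching = doppler_ratio(0.6, math.pi)   # observer moving against the photon
```

For $\beta = 0.6$ the ratio is $0.5$ when the hatted observer moves in the photon's direction and $2$ when moving against it; at right angles the ratio is $1/\sqrt{1-\beta^2}$, a purely relativistic effect absent from the classical first term.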

We conclude this section by examining a few 
simple interactions between particles of the sort 
modeled by our definitions, assuming only that 
4-momentum is conserved in the interaction. For 
convenience, we will use the term free particle to 
refer to either a free material particle or a photon. 
If A is a finite set of free particles, then each 
element of A has a unique 4-momentum which is a 
future-directed timelike or null vector. The sum of 
any such collection of vectors is timelike and future 
directed, except when all of the vectors are null and 


parallel, in which case the sum is null and future 
directed (Naber 1992, lemma 1.4.3). We call this 
sum the total 4-momentum of A. Now we formulate 
a definition which is intended to model a finite set 
of free particles colliding at some event with a 
(perhaps new) set of free particles emerging from the 
collision (e.g., an electron and proton collide, with a 
neutron and neutrino emerging from the collision). 
A contact interaction in $M$ is a triple $(A, x, \tilde{A})$,
where $A$ and $\tilde{A}$ are two finite sets of free particles,
neither of which contains a pair of particles with
linearly dependent 4-momenta (which would presumably
be physically indistinguishable) and $x \in M$
is an event such that

1. $x$ is the terminal point of all of the particles in $A$
(i.e., for each world line $\alpha: [\xi_0, \xi_1] \to M$ of a
particle in $A$, $\alpha(\xi_1) = x$);

2. $x$ is the initial point of all the particles in $\tilde{A}$, and

3. the total 4-momentum of $A$ equals the total
4-momentum of $\tilde{A}$.

Property (3) is called the conservation of 4-momentum.
If $A$ consists of a single free particle, then $(A, x, \tilde{A})$ is
called a decay (e.g., a neutron decays into a proton, an
electron and an antineutrino).

Consider, for example, an interaction $(A, x, \tilde{A})$
for which $\tilde{A}$ consists of a single photon. The total
4-momentum of $\tilde{A}$ is null so the same must be true of
$A$. Since the 4-momenta of the individual particles in
$A$ are timelike or null and future directed their sum
can be null only if they are, in fact, all null and
parallel. Since $A$ cannot contain distinct photons with
parallel 4-momenta, it must consist of a single photon
which, by (3), must have the same 4-momentum as
the photon in $\tilde{A}$. In essence, “nothing happened at
$x$.” We conclude that no nontrivial interaction of the
type modeled by our definition can result in a single
photon and nothing else. Reversing the roles of $A$
and $\tilde{A}$ shows that, if 4-momentum is to be conserved,
a photon cannot decay.

Next let us consider the decay of a single material
particle into two material particles, for example, the
spontaneous disintegration of an atom through
$\alpha$-emission. Thus, we consider a contact interaction
$(A, x, \tilde{A})$ in which $A$ consists of a single free material
particle of proper mass $m_0$ and $\tilde{A}$ consists of two
free material particles with proper masses $m_1$ and
$m_2$. Let $P_0$, $P_1$, and $P_2$ be the 4-momenta of the
particles of proper mass $m_0$, $m_1$, and $m_2$, respectively.
Then $P_0 = P_1 + P_2$. Appealing to the
“reversed triangle inequality,” the fact that $P_1$ and
$P_2$ are linearly independent and future directed, and
[12] we conclude that

$$m_0 > m_1 + m_2 \qquad [23]$$

The excess mass $m_0 - (m_1 + m_2)$ of the initial
particle is regarded, via [17], as a measure of the
amount of energy required to split $m_0$ into two
pieces. Stated somewhat differently, when the two
particles in $\tilde{A}$ were held together to form the single
particle in $A$, the “binding energy” contributed to
the mass of this latter particle.
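A small numerical instance of the mass inequality: the Python sketch below adds the 4-momenta of two unit-mass particles moving in opposite directions with speed 0.6 and recovers the proper mass of the total from [12]. The velocities and function names are assumptions made for illustration.

```python
import math

def g(v, w):
    """Lorentz inner product in an admissible basis, signature (+, +, +, -)."""
    return v[0] * w[0] + v[1] * w[1] + v[2] * w[2] - v[3] * w[3]

def four_momentum(m, V):
    """Components of P = m U for proper mass m and ordinary velocity V (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(v * v for v in V))
    return (m * gamma * V[0], m * gamma * V[1], m * gamma * V[2], m * gamma)

def proper_mass(P):
    """Recover m from P . P = -m^2, valid for a timelike 4-momentum."""
    return math.sqrt(-g(P, P))

P1 = four_momentum(1.0, (0.6, 0.0, 0.0))
P2 = four_momentum(1.0, (-0.6, 0.0, 0.0))
P0 = tuple(a + b for a, b in zip(P1, P2))   # conservation of 4-momentum
m0 = proper_mass(P0)                        # 2.5, which exceeds m1 + m2 = 2
```

The excess, here $0.5$, is the kinetic energy of the two products in the rest frame of the decaying particle, expressed in mass units.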

Reversing the roles of $A$ and $\tilde{A}$ in the last
example gives a contact interaction modelling an
inelastic collision (two free material particles with
masses $m_1$ and $m_2$ collide and coalesce to form a
third of mass $m_0$). The inequality [23] remains true,
of course, and a somewhat more detailed analysis
(Naber 1992, section 1.8) yields an approximate
formula for $m_0 - (m_1 + m_2)$ which can be compared
(favorably) with the Newtonian formula for
the loss in kinetic energy that results from the
collision (energy which, classically, is viewed as
taking the form of heat in the combined particle).
An analysis of the interaction in which both $A$ and
$\tilde{A}$ consist of an electron and a photon yields (Naber
1992, section 1.8) a formula for the so-called
Compton effect. Many more examples of this
sort are treated in great detail in Synge (1972,
chapter VI, § 14).


Charged Particles and Electromagnetic Fields


A charged particle in $M$ is a triple $(\alpha, m, q)$, where
$(\alpha, m)$ is a material particle and $q$ is a nonzero real
number called the charge of the particle. Charged
particles do two things of interest to us. By their 
very presence they create electromagnetic fields and 
they also respond to the electromagnetic fields 
created by other charges. 

Charged particles “respond” to an electromag- 
netic field by experiencing changes in 4-momentum. 
The quantitative nature of this response, that is, the 
equation of motion, is generally taken to be the 
so-called Lorentz 4-force law which expresses 
the proper time rate of change of the particle’s 
4-momentum at each point of the world line as a 
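linear function of the 4-velocity. Thus, at each point
$\alpha(\tau)$ of the world line

$$\frac{dP(\tau)}{d\tau} = q\,\tilde{F}_{\alpha(\tau)}(U(\tau)) \qquad [24]$$

where $\tilde{F}_{\alpha(\tau)}: M \to M$ is a linear transformation
determined, in each admissible coordinate system,
by the classical electric $E$ and magnetic $B$ fields (here
we are assuming that the contribution of $q$ to the
ambient electromagnetic field is negligible, that is,
$(\alpha, m, q)$ is a “test charge”). Let us write [24] more
simply as

$$\tilde{F}(U) = \frac{m}{q}\,\frac{dU}{d\tau} \qquad [25]$$

Dotting both sides of [25] with $U$ gives

$$\tilde{F}(U) \cdot U = \frac{m}{q}\,\frac{dU}{d\tau} \cdot U = \frac{m}{2q}\,\frac{d}{d\tau}(U \cdot U) = \frac{m}{2q}\,\frac{d}{d\tau}(-1) = 0$$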

Since any future-directed timelike unit vector $u$ is
the 4-velocity of some charged particle, we find
that $\tilde{F}(u) \cdot u = 0$ for any such vector. Linearity then
implies $\tilde{F}(v) \cdot v = 0$ for any timelike vector. Now,
if $u$ and $v$ are timelike and future directed, then $u + v$
is timelike so $0 = \tilde{F}(u + v) \cdot (u + v) = \tilde{F}(u) \cdot v +
u \cdot \tilde{F}(v)$ and therefore $\tilde{F}(u) \cdot v = -u \cdot \tilde{F}(v)$. But $M$
has a basis of future-directed timelike vectors so

$$\tilde{F}(x) \cdot y = -x \cdot \tilde{F}(y) \qquad [26]$$

for all $x, y \in M$. Thus, at each point, the linear
transformation $\tilde{F}$ must be skew-symmetric with
respect to the Lorentz inner product. One could
therefore model an electromagnetic field on $M$ by
an assignment to each point of a skew-symmetric
linear transformation whose job it is to assign to the
4-velocity of a charged particle whose world line
passes through that point the change in 4-momentum
that the particle should expect to experience
because of the presence of the field. However, a
slightly different perspective has proved more convenient.
Notice that a skew-symmetric linear transformation
$\tilde{F}: M \to M$ and the Lorentz inner
product together determine a bilinear form $F: M \times
M \to \mathbb{R}$ given by

$$F(x, y) = \tilde{F}(x) \cdot y$$

which is also skew-symmetric ($F(y, x) = \tilde{F}(y) \cdot x =
-F(x, y)$) and that, conversely, a skew-symmetric
bilinear form uniquely determines a skew-symmetric
linear transformation. Now, an assignment of a
skew-symmetric bilinear form to each point of $M$ is
nothing other than a 2-form on $M$ and it is in the
language of forms that we choose to phrase classical
electromagnetic theory (a concise introduction to
this language is available, for example, in Spivak
(1965, chapter 4)).
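The passage between a skew-symmetric bilinear form and a skew-symmetric linear transformation can be made concrete in components. In the Python sketch below, a bilinear form is stored as a matrix `F` with $F(x, y) = F_{ab}\,x^a y^b$, and the associated transformation has matrix entries $\eta^{cb} F_{ab}$. The sample entries of `F` are arbitrary numbers chosen for illustration (they are not field values from the text), and the function names are hypothetical.

```python
ETA = (1.0, 1.0, 1.0, -1.0)   # diagonal entries of eta_ab (equal to those of its inverse)

def dot(x, y):
    """Lorentz inner product of two vectors given by their components."""
    return sum(ETA[a] * x[a] * y[a] for a in range(4))

def form(F, x, y):
    """Bilinear form F(x, y) = F_ab x^a y^b."""
    return sum(F[a][b] * x[a] * y[b] for a in range(4) for b in range(4))

def transformation_from_form(F):
    """Matrix T with T[c][a] = eta^{cb} F_ab, so that dot(T x, y) equals F(x, y)."""
    return [[ETA[c] * F[a][c] for a in range(4)] for c in range(4)]

def apply(T, x):
    """Components of the vector T(x)."""
    return [sum(T[c][a] * x[a] for a in range(4)) for c in range(4)]

# An arbitrary skew-symmetric matrix of components.
F = [[0.0, 1.0, -2.0, 3.0],
     [-1.0, 0.0, 4.0, 5.0],
     [2.0, -4.0, 0.0, 6.0],
     [-3.0, -5.0, -6.0, 0.0]]
T = transformation_from_form(F)

x = (1.0, 2.0, 3.0, 4.0)
y = (0.5, -1.0, 2.0, 3.0)
u = (0.75, 0.0, 0.0, 1.25)    # a future-directed unit timelike vector
```

With these definitions `dot(apply(T, x), y)` reproduces `form(F, x, y)`, the transformation satisfies [26], and `dot(apply(T, u), u)` vanishes, which is the statement that the 4-force is orthogonal to the 4-velocity.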

Nature imposes a certain restriction on which 
2-forms can reasonably represent an electromagnetic 
field on M (“Maxwell’s equations”). To formulate 
these we introduce a source 1-form $J$ as follows: If
$x^1, x^2, x^3, x^4$ is any admissible coordinate system on
$M$, then

$$J = J_1\,dx^1 + J_2\,dx^2 + J_3\,dx^3 - \rho\,dx^4 \qquad [27]$$

where $\rho: M \to \mathbb{R}$ is a charge density function and
$\vec{J} = J_1 e_1 + J_2 e_2 + J_3 e_3$ is a current density vector field
(these are to be regarded as the usual “smoothed
out,” pointwise versions of “charge per unit
volume” and “charge flow per unit area per unit
time” as measured by the corresponding admissible
observer). Now, our formal definition is as follows:
The electromagnetic field on $M$ determined by the
source 1-form $J$ on $M$ is a 2-form $F$ on $M$ that
satisfies Maxwell’s equations

$$dF = 0 \qquad [28]$$

and

$${}^*d\,{}^*F = J \qquad [29]$$


A few comments are in order here. We have chosen
units in which not only the speed of light, but also
various other constants that one often finds in
Maxwell’s equations (the dielectric constant $\epsilon_0$ and
magnetic permeability $\mu_0$) are 1 and a factor of $4\pi$ in
[29] is “normalized out.” The $*$ in [29] is the Hodge
star operator determined by the Lorentz inner
product and the chosen orientation of M. This is a
natural isomorphism

$$* : \Omega^p(M) \to \Omega^{4-p}(M), \qquad p = 0, 1, 2, 3, 4$$


of the p-forms on M to the (4 - p)-forms on M and is
most simply defined as follows: let $x^1, x^2, x^3, x^4$ be any
admissible coordinate system on M. If $1 \in \Omega^0(M)$
is the constant function (0-form) on M whose value
is $1 \in \mathbb{R}$, then

$${*}1 = dx^1 \wedge dx^2 \wedge dx^3 \wedge dx^4$$

is the volume form on M. If $1 \le i_1 < \cdots < i_k \le 4$,
then ${*}(dx^{i_1} \wedge \cdots \wedge dx^{i_k})$ is uniquely determined by

$$(dx^{i_1} \wedge \cdots \wedge dx^{i_k}) \wedge {*}(dx^{i_1} \wedge \cdots \wedge dx^{i_k}) = -dx^1 \wedge dx^2 \wedge dx^3 \wedge dx^4$$

Thus, for example, ${*}dx^2 = dx^1 \wedge dx^3 \wedge dx^4$,
${*}(dx^1 \wedge dx^2) = -dx^3 \wedge dx^4$,
${*}(dx^1 \wedge dx^2 \wedge dx^3 \wedge dx^4) = -1$,
etc. It follows that, if $\mu$ is a p-form on M, then

$${*}{*}\mu = (-1)^{p+1}\mu \qquad [30]$$

(a more thorough discussion is available in Choquet-Bruhat
et al. (1977, chapter V A3)). In particular,
[29] is equivalent to

$$d{*}F = {*}J \qquad [31]$$
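
The step from [29] to [31] uses only [30]. As a short worked check (a sketch that relies on nothing beyond the degree count: F and *F are 2-forms, so d*F is a 3-form, while J is a 1-form):

```latex
\begin{aligned}
{*}\bigl({*}d{*}F\bigr) &= {*}{*}\,(d{*}F) = (-1)^{3+1}\, d{*}F = d{*}F
  && \text{so applying } {*} \text{ to [29] gives } d{*}F = {*}J, \\
{*}\bigl(d{*}F\bigr) &= {*}{*}J = (-1)^{1+1} J = J
  && \text{so applying } {*} \text{ to [31] gives back } {*}d{*}F = J .
\end{aligned}
```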


On regions in which there are no charges, so that
J = 0, [28] and [31] become the source free Maxwell
equations

$$dF = 0 \qquad [32]$$

and

$$d{*}F = 0 \qquad [33]$$

that is, both F and *F are closed 2-forms.

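Any 2-form F on M can be written in any admissible
coordinate system as $F = \tfrac{1}{2}F_{ab}\,dx^a \wedge dx^b$ (summation
convention!), where $(F_{ab})$ is the skew-symmetric
matrix of components of F. In order to make contact
with the notation generally employed in physics, we
introduce the following names for these components:

$$(F_{ab}) = \begin{pmatrix} 0 & B^3 & -B^2 & E^1 \\ -B^3 & 0 & B^1 & E^2 \\ B^2 & -B^1 & 0 & E^3 \\ -E^1 & -E^2 & -E^3 & 0 \end{pmatrix} \qquad [34]$$

Thus,

$$F = E^1\,dx^1 \wedge dx^4 + E^2\,dx^2 \wedge dx^4 + E^3\,dx^3 \wedge dx^4 + B^3\,dx^1 \wedge dx^2 + B^2\,dx^3 \wedge dx^1 + B^1\,dx^2 \wedge dx^3 \qquad [35]$$

Computing ${*}F$, $dF$, $d{*}F$ and ${*}d{*}F$ and writing
$\mathbf{E} = E^1 e_1 + E^2 e_2 + E^3 e_3$ and $\mathbf{B} = B^1 e_1 + B^2 e_2 + B^3 e_3$
one finds that dF = 0 is equivalent to

$$\operatorname{div} \mathbf{B} = 0 \qquad [36]$$

and

$$\operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial x^4} = 0 \qquad [37]$$

while *d*F = J is equivalent to

$$\operatorname{div} \mathbf{E} = \rho \qquad [38]$$

and

$$\operatorname{curl} \mathbf{B} - \frac{\partial \mathbf{E}}{\partial x^4} = \mathbf{J} \qquad [39]$$

Equations [36]-[39] are the more traditional renderings
of Maxwell’s equations.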
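
As an illustration of how [36] and [37] arise, two components of dF can be written out explicitly. The sketch below assumes the standard component formula for the exterior derivative of a 2-form, $(dF)_{abc} = \partial_a F_{bc} + \partial_b F_{ca} + \partial_c F_{ab}$ for $a < b < c$, and reads the components off [34]; the shorthand $\partial_a = \partial/\partial x^a$ is introduced here only for brevity.

```latex
\begin{aligned}
(dF)_{123} &= \partial_1 F_{23} + \partial_2 F_{31} + \partial_3 F_{12}
            = \partial_1 B^1 + \partial_2 B^2 + \partial_3 B^3
            = \operatorname{div}\mathbf{B}, \\
(dF)_{124} &= \partial_1 F_{24} + \partial_2 F_{41} + \partial_4 F_{12}
            = \partial_1 E^2 - \partial_2 E^1 + \partial_4 B^3
            = (\operatorname{curl}\mathbf{E})^3 + \frac{\partial B^3}{\partial x^4}.
\end{aligned}
```

The components $(dF)_{134}$ and $(dF)_{234}$ give the remaining two components of [37] in the same way.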

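In another admissible coordinate system
$\hat{x}^1, \hat{x}^2, \hat{x}^3, \hat{x}^4$ on M (related to the first by [2]) the
2-form F would be written $F = \tfrac{1}{2}\hat{F}_{ab}\,d\hat{x}^a \wedge d\hat{x}^b$.
Setting $\hat{x}^a = \Lambda^a{}_\alpha x^\alpha$ and $\hat{x}^b = \Lambda^b{}_\beta x^\beta$ gives
$F = \tfrac{1}{2}(\Lambda^a{}_\alpha \Lambda^b{}_\beta \hat{F}_{ab})\,dx^\alpha \wedge dx^\beta$, so

$$F_{\alpha\beta} = \Lambda^a{}_\alpha \Lambda^b{}_\beta \hat{F}_{ab}, \qquad \alpha, \beta = 1, 2, 3, 4 \qquad [40]$$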


Now, suppose that we wish to describe the electromagnetic
field of a uniformly moving charge.
According to the relativity principle, it does not
matter at all whether we view the charge as moving
relative to a “fixed” admissible observer, or the
observer as moving relative to a “stationary” charge.
Thus, we shall write out the field due to a charge
fixed at the origin of the hatted coordinate system
(“Coulomb’s law”) and transform, by [40], to an
unhatted coordinate system moving relative to it.
Relative to $\hat{x}^1, \hat{x}^2, \hat{x}^3, \hat{x}^4$, the familiar inverse square
law for a fixed point charge q located at the spatial
origin gives $\hat{\mathbf{B}} = 0$ and $\hat{\mathbf{E}} = (q/\hat{r}^3)\,\hat{\mathbf{r}}$, where
$\hat{\mathbf{r}} = \hat{x}^1 \hat{e}_1 + \hat{x}^2 \hat{e}_2 + \hat{x}^3 \hat{e}_3$ and
$\hat{r} = ((\hat{x}^1)^2 + (\hat{x}^2)^2 + (\hat{x}^3)^2)^{1/2}$ (note
that $\hat{\mathbf{E}}$ is defined only on $M - \mathrm{Span}\{\hat{e}_4\}$). Thus,

$$(\hat{F}_{ab}) = \frac{q}{\hat{r}^3} \begin{pmatrix} 0 & 0 & 0 & \hat{x}^1 \\ 0 & 0 & 0 & \hat{x}^2 \\ 0 & 0 & 0 & \hat{x}^3 \\ -\hat{x}^1 & -\hat{x}^2 & -\hat{x}^3 & 0 \end{pmatrix} \qquad [41]$$


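It is a simple matter to verify that, on its domain, $(\hat{F}_{ab})$
satisfies the source free Maxwell equations. Taking $\Lambda$ to
be the special Lorentz transformation corresponding to
[5] and writing out [40] with $(\hat{F}_{ab})$ given by [41] yields

$$E^1 = \frac{q\,\hat{x}^1}{\hat{r}^3}, \qquad E^2 = \frac{q}{\sqrt{1-\beta^2}}\,\frac{\hat{x}^2}{\hat{r}^3}, \qquad E^3 = \frac{q}{\sqrt{1-\beta^2}}\,\frac{\hat{x}^3}{\hat{r}^3},$$

$$B^1 = 0, \qquad B^2 = -\frac{q\beta}{\sqrt{1-\beta^2}}\,\frac{\hat{x}^3}{\hat{r}^3}, \qquad B^3 = \frac{q\beta}{\sqrt{1-\beta^2}}\,\frac{\hat{x}^2}{\hat{r}^3} \qquad [42]$$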


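We wish to express these in terms of measurements
made by the unhatted observer at the instant the
charge passes through his spatial origin. Setting
$x^4 = 0$ in [5] gives

$$\hat{x}^1 = \frac{1}{\sqrt{1-\beta^2}}\,x^1, \qquad \hat{x}^2 = x^2, \qquad \hat{x}^3 = x^3$$

and so

$$\hat{r}^2 = \frac{1}{1-\beta^2}\,(x^1)^2 + (x^2)^2 + (x^3)^2$$

which, for convenience, we write $r_\beta^2$. Making these
substitutions in [42] gives

$$\mathbf{E} = \frac{q}{\sqrt{1-\beta^2}} \left(\frac{1}{r_\beta^3}\right) (x^1 e_1 + x^2 e_2 + x^3 e_3) = \frac{q}{\sqrt{1-\beta^2}} \left(\frac{1}{r_\beta^3}\right) \mathbf{r} \qquad [43]$$

and

$$\mathbf{B} = \frac{q}{\sqrt{1-\beta^2}} \left(\frac{1}{r_\beta^3}\right) (0\,e_1 - \beta x^3 e_2 + \beta x^2 e_3) = \frac{q}{\sqrt{1-\beta^2}} \left(\frac{1}{r_\beta^3}\right) \bigl((\beta e_1) \times \mathbf{r}\bigr) \qquad [44]$$

for the field of a charge moving uniformly with
velocity $\beta e_1$ at the instant the charge passes through
the origin. Observe that when $\beta \ll 1$, $r_\beta \approx r$, so [43]
says that the electric field of a slowly moving charge
is approximately the Coulomb field. When $\beta \ll 1$,
[44] reduces to the Biot-Savart law.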
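
The transformation rule [40] applied to the Coulomb field [41] can be reproduced numerically and compared with the closed-form expressions [43] and [44]. This is a minimal sketch under a stated assumption: the special Lorentz transformation [5] is not reproduced in this section, so the code assumes the form $\hat{x}^1 = \gamma(x^1 - \beta x^4)$, $\hat{x}^4 = \gamma(x^4 - \beta x^1)$ with $\gamma = 1/\sqrt{1-\beta^2}$, which agrees with the substitution made at $x^4 = 0$. All function names are illustrative.

```python
import math

def coulomb_form(q, xh):
    """Matrix of [41]: field of a charge q at rest at the hatted spatial origin."""
    r = math.sqrt(xh[0] ** 2 + xh[1] ** 2 + xh[2] ** 2)
    k = q / r ** 3
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[i][3] = k * xh[i]
        F[3][i] = -k * xh[i]
    return F

def boost(beta):
    """Assumed form of [5]; entry L[a][al] is Lambda^a_alpha."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[g, 0.0, 0.0, -beta * g],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [-beta * g, 0.0, 0.0, g]]

def transform(L, Fh):
    """Rule [40]: F_{alpha beta} = Lambda^a_alpha Lambda^b_beta Fhat_{ab}."""
    return [[sum(L[a][al] * L[b][be] * Fh[a][b]
                 for a in range(4) for b in range(4))
             for be in range(4)] for al in range(4)]

def fields_from_form(F):
    """Read E and B off the component matrix as named in [34]."""
    E = [F[0][3], F[1][3], F[2][3]]
    B = [F[1][2], F[2][0], F[0][1]]   # B^1 = F_23, B^2 = F_31, B^3 = F_12
    return E, B

def moving_charge_fields(q, beta, x):
    """E and B at the spatial point x of the unhatted observer, at x^4 = 0."""
    L = boost(beta)
    X = [x[0], x[1], x[2], 0.0]
    xh = [sum(L[a][al] * X[al] for al in range(4)) for a in range(4)]
    return fields_from_form(transform(L, coulomb_form(q, xh)))

def closed_form(q, beta, x):
    """Closed-form expressions [43] and [44]."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    r_beta = math.sqrt(g ** 2 * x[0] ** 2 + x[1] ** 2 + x[2] ** 2)
    c = q * g / r_beta ** 3
    E = [c * x[0], c * x[1], c * x[2]]
    B = [0.0, -c * beta * x[2], c * beta * x[1]]
    return E, B

E_num, B_num = moving_charge_fields(1.0, 0.6, [1.0, 2.0, 3.0])
E_ref, B_ref = closed_form(1.0, 0.6, [1.0, 2.0, 3.0])
```

For this sample point the two computations agree to rounding error, component by component.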

Let us consider one other simple application, that
is, the response of a charged particle $(\alpha, m, q)$ to an
electromagnetic field which, for some admissible
observer, is constant and purely magnetic. For
simplicity, we assume that, for this observer, $\mathbf{E} = 0$
and $\mathbf{B} = b e_3$, where b is a nonzero constant. The
corresponding 2-form F has components

$$(F_{ab}) = \begin{pmatrix} 0 & b & 0 & 0 \\ -b & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

(from [34]). The corresponding linear transformation
F has the same matrix relative to this basis so,
with $\alpha(\tau) = x^a(\tau) e_a$ and $U(\tau) = U^a(\tau) e_a$, the Lorentz
4-force law [25] reduces to the system of linear
differential equations

$$\frac{dU^1}{d\tau} = \frac{qb}{m}\,U^2, \qquad \frac{dU^2}{d\tau} = -\frac{qb}{m}\,U^1, \qquad \frac{dU^3}{d\tau} = 0, \qquad \frac{dU^4}{d\tau} = 0$$

The system is easily solved and the results easily
integrated to give

$$\alpha(\tau) = x_0 + a \sin\left(\frac{qb\tau}{m} + \phi\right) e_1 + a \cos\left(\frac{qb\tau}{m} + \phi\right) e_2 + c\tau\, e_3 + \left(1 + \frac{a^2 q^2 b^2}{m^2} + c^2\right)^{1/2} \tau\, e_4 \qquad [45]$$

where $x_0 = x_0^a e_a \in M$ is constant and $a$, $\phi$, and $c$ are
real constants with $a > 0$ (we have used $U \cdot U = -1$
to eliminate one other arbitrary real constant). Note
that, at each point on $\alpha$, $(x^1 - x_0^1)^2 + (x^2 - x_0^2)^2 = a^2$.


Thus, if $c \neq 0$ the spatial trajectory in this coordinate
system is a helix along the $e_3$-direction
(i.e., along the magnetic field lines). If $c = 0$, the
trajectory is a circle in the $x^1$-$x^2$ plane. This case
is of some practical significance since one can
introduce constant magnetic fields in a bubble
chamber so as to induce a particle of interest to
follow a circular path. We show now how to
measure the charge-to-mass ratio for such a particle.
Taking $c = 0$ in [45] and computing $U(\tau)$, then using
[11] to solve for the coordinate velocity vector $\mathbf{V}$ of
the particle gives

$$\mathbf{V} = \frac{aqb}{m} \sqrt{1 - \|\mathbf{V}\|^2} \left( \cos\left(\frac{qb\tau}{m} + \phi\right) e_1 - \sin\left(\frac{qb\tau}{m} + \phi\right) e_2 \right)$$

From this one computes

$$\|\mathbf{V}\|^2 = \left(1 + \frac{m^2}{a^2 q^2 b^2}\right)^{-1}$$

(note that this is a constant). Solving this last equation
for q/m (and assuming q > 0 for convenience) one
arrives at

$$\frac{q}{m} = \frac{1}{a|b|}\,\frac{\|\mathbf{V}\|}{\sqrt{1 - \|\mathbf{V}\|^2}}$$

Since $a$, $b$, and $\|\mathbf{V}\|$ are measurable, one obtains the
desired charge-to-mass ratio.
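
The measurement recipe can be sketched in a few lines. This is an illustrative example only: it assumes units in which the speed of light is 1 (as in the text), takes q > 0, and solves the relation $\|V\|^2 = (1 + m^2/(a^2 q^2 b^2))^{-1}$ for q/m. The function names and the sample numbers are hypothetical.

```python
import math

def speed_from_orbit(a, b, q_over_m):
    """Speed ||V|| on a circular orbit of radius a in a field of strength b.

    Uses ||V||^2 = (1 + m^2 / (a^2 q^2 b^2))^(-1), in units with c = 1.
    """
    w = a * q_over_m * b
    return math.sqrt(1.0 / (1.0 + 1.0 / w ** 2))

def charge_to_mass(a, b, speed):
    """q/m (q > 0 assumed) from measured radius a, field strength b, speed < 1."""
    return speed / (a * abs(b) * math.sqrt(1.0 - speed ** 2))

# Round trip with made-up sample values: pick q/m, compute the speed the
# particle would have, then recover q/m from the "measured" quantities.
true_ratio = 2.5
v = speed_from_orbit(0.3, 1.7, true_ratio)
recovered = charge_to_mass(0.3, 1.7, v)
```

The round trip returns the ratio that was put in, which is a consistency check on the algebra rather than a physical measurement.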

To conclude we wish to briefly consider the
existence and use of “potentials” for electromagnetic
fields. Suppose F is an electromagnetic field defined
on some connected, open region X in M. Then F is
a 2-form on X which, by [28], is closed. Suppose
also that the second de Rham cohomology $H^2(X; \mathbb{R})$
of X is trivial (since M is topologically $\mathbb{R}^4$ this will
be the case, for example, when X is all of M, or an
open ball in M, or, more generally, an open “star-shaped”
region in M). Then, by definition, every
closed 2-form on X is exact so, in particular, there
exists a 1-form A on X satisfying

$$F = dA \qquad [46]$$

In particular, such a 1-form A always exists locally
on a neighborhood of any point in X for any F. Such
an A is not uniquely determined, however, because,
if A satisfies [46], then so does A + df for any
smooth real-valued function (0-form) f on X ($d^2 = 0$
implies $d(A + df) = dA + d^2 f = dA = F$). Any 1-form
A satisfying [46] is called a (gauge) potential for F.
The replacement $A \to A + df$ for some f is called a
gauge transformation of the potential and the
freedom to make such a replacement without
altering [46] is called gauge freedom.
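
In admissible coordinates the gauge freedom can be seen at the level of components. The following is a sketch assuming the conventions $A = A_a\,dx^a$ and $F = \tfrac{1}{2} F_{ab}\,dx^a \wedge dx^b$, with $\partial_a = \partial/\partial x^a$ as shorthand introduced here:

```latex
\begin{aligned}
F_{ab} &= \partial_a A_b - \partial_b A_a, \\
A_a &\;\longmapsto\; A_a + \partial_a f, \\
F_{ab} &\;\longmapsto\; \partial_a (A_b + \partial_b f) - \partial_b (A_a + \partial_a f)
        = F_{ab} + (\partial_a \partial_b - \partial_b \partial_a) f = F_{ab}.
\end{aligned}
```

The last equality is the component form of $d^2 f = 0$: mixed partial derivatives of a smooth function commute.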

One can show that, given F, it is always possible
to locally solve dA = F for A subject to an arbitrary
specification of the 0-form *d*A. More precisely, if F
is any 2-form satisfying dF = 0 and g is an arbitrary
0-form, then locally, on a neighborhood of any
point, there exists a 1-form A satisfying

$$dA = F \quad \text{and} \quad {*}d{*}A = g \qquad [47]$$

(a more general result is proved in Parrott (1987,
appendix 2) and a still more general one in section
2.9 of this same source). The usefulness of the
second condition in [47] can be illustrated as
follows. Suppose we are given some (physical)
configuration of charges and currents (i.e., some
source 1-form J) and we wish to find the corresponding
electromagnetic field F. We must solve
Maxwell’s equations dF = 0 and *d*F = J (subject to
whatever boundary conditions are appropriate).
Locally, at least, we may seek instead a corresponding
potential A (so that F = dA). Then the first of
Maxwell’s equations is automatically satisfied
(dF = d(dA) = 0) and we need only solve
*d*(dA) = J. To simplify the notation let us temporarily
write $\delta = {*}d{*}$ and consider the operator
$\Delta = d \circ \delta + \delta \circ d$ on forms (variously called the
Laplace-Beltrami operator, Laplace-de Rham operator, or
Hodge Laplacian on Minkowski spacetime). Then

$$\Delta A = d(\delta A) + \delta(dA) = d({*}d{*}A) + {*}d{*}(dA) \qquad [48]$$


According to the result quoted above, we may
narrow down our search by imposing the condition
*d*A = 0, that is

$$\delta A = 0 \qquad [49]$$

(this is generally referred to as imposing the Lorentz
gauge). With this, [48] becomes $\Delta A = {*}d{*}(dA)$ and
to satisfy the second Maxwell equation we must
solve

$$\Delta A = J \qquad [50]$$

Thus, we see that the problem of (locally) solving
Maxwell’s equations for a given source J reduces
to that of solving [49] and [50] for the potential A.
To understand how this simplifies the problem, we
note that a calculation in admissible coordinates
shows that the operator $\Delta$ reduces to the componentwise
d’Alembertian $\Box$, defined on real-valued
functions by

$$\Box = \frac{\partial^2}{\partial (x^1)^2} + \frac{\partial^2}{\partial (x^2)^2} + \frac{\partial^2}{\partial (x^3)^2} - \frac{\partial^2}{\partial (x^4)^2}$$

Thus, eqn [50] decouples into four scalar equations

$$\Box A_a = J_a, \qquad a = 1, 2, 3, 4 \qquad [51]$$

each of which is the well-studied inhomogeneous
wave equation.


Historical Background


In this section we shall briefly recall the basic
empirical facts and the first theoretical attempts
from which the theory and the formalism of present-day
quantum mechanics (QM) have grown. In the
next sections we shall give the mathematical and
computational structure of QM, mention the physical
problems that QM has solved with much
success, and describe the serious conceptual consistency
problems which are posed by QM (and which
remain unsolved up to now).

Empirical rules of discretization were already
observed, starting from the 1850s, in the absorption
and in the emission of light. Fraunhofer noticed
that the dark lines in the absorption spectrum of
the light of the sun coincide with the bright lines in
the emission spectra of the elements. G Kirchhoff and
R Bunsen reached the conclusion that the relative 
intensities of the emission and absorption of light 
implied that the ratio between energy emitted and 
absorbed is independent of the atom considered. 
This was the starting point of the analysis by 
Planck. 

On the other hand, in the early twentieth
century, the spatial structure of the atom had been
investigated; the most successful model was that of 
Rutherford, in which the atom appeared as a small 
nucleus of charge Z surrounded by Z electrons 
attracted by the nucleus according to Coulomb’s 
law. This model represents, for distances of the 
order of the size of an atom, a complete departure 
from Newton’s laws combined with the laws of 
classical electrodynamics; indeed, according to these 
laws, the atom would be unstable against collapse, 
and would certainly not exhibit a discrete energy 
spectrum. We must conclude that the classical laws 


are inadequate for the description of emission and 
absorption of light, in which the internal structure of 
the atom plays a major role. 

The birth of the old quantum theory is placed 
traditionally at the date of M Planck’s discussion of 
the blackbody radiation in 1900. 

Planck put forward the postulate that light is 
emitted and absorbed by matter in discrete energy 
quanta through “resonators” that have an energy 
proportional to their frequency. This assumption 
led, through the use of Gibbs’s rules of statistical
mechanics applied to a gas of resonators, to a law
(Planck’s law) which reproduces the empirical 
findings on the radiation from a blackbody. It led 
Einstein to ascribe to light (which had, since the 
times of Maxwell, a successful description in terms 
of waves) a discrete, particle-like nature. Nine years 
later A Einstein gave further support to Planck’s 
postulate by showing that it can reproduce correctly 
the energy fluctuations in blackbody radiation and 
even clarifies the properties of specific heat. Soon 
afterwards, Einstein (1924, 1925) proved that the 
putative particle of light satisfied the relativistic laws 
(relation between energy and momentum) of a 
particle with zero mass. 

This dual nature of light received further support
from the experiments on the Compton effect and
from the description, by Einstein, of the photoelectric
effect (Einstein 1905). It should be emphasized
that while Planck considered light of frequency $\nu$ in
interaction with matter as composed of bits of energy
$h\nu$ ($h \approx 6.6 \times 10^{-27}$ erg s), Einstein’s analysis went much
further in assigning to the quantum of light properties
of a particle-like (localized) object. This marks a
complete departure from the laws of classical
electromagnetism. Therefore, quoting Einstein,


It is conceivable that the wave theory of light, which 
retains its effectiveness for the representation of purely 
optical phenomena and is based on continuous functions 
over space, will lead to contradiction with the experiments 
when applied to phenomena in which there is creation or 
conversion of light; indeed these phenomena can be better
described on the assumption that light is distributed
discontinuously in space and described by a finite number 
of quanta which move without being divided and which 
must be absorbed or emitted as a whole. 


Notice that, for a wavelength of $8 \times 10^3$ Å, a 30 W
lamp emits roughly $10^{20}$ photons s$^{-1}$; for macroscopic
objects the discrete nature of light has no
appreciable consequence.
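
The order of magnitude quoted here follows from dividing the power by the energy $h\nu = hc/\lambda$ of one photon. A minimal sketch, assuming (as an idealization not stated in the text) that the full 30 W is radiated at the single wavelength given; the constants are rounded SI values and the function name is illustrative:

```python
H = 6.626e-34   # Planck's constant in J s (rounded)
C = 2.998e8     # speed of light in m/s (rounded)

def photons_per_second(power_watts, wavelength_m):
    """Photon emission rate if all the power goes into light of one wavelength."""
    photon_energy = H * C / wavelength_m   # energy h*nu of a single photon, in J
    return power_watts / photon_energy

# 8 x 10^3 angstrom = 8 x 10^-7 m
rate = photons_per_second(30.0, 8e3 * 1e-10)
```

With these inputs `rate` comes out slightly above $10^{20}$ photons per second.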

Planck’s postulate and energy conservation imply 
that in emitting and absorbing light the atoms of the 
various elements can lose or gain energy only by 
discrete amounts. Therefore, atoms as producers or 
absorbers of radiation are better described by a 
theory that assigns to each atom a (possibly infinite)
discrete set of states which have a definite energy. 

The old quantum theory of matter addresses 
precisely this question. Its main proponent is 
N Bohr (Bohr 1913, 1918). The new theory is 
entirely phenomenological (as is Planck’s theory) 
and based on Rutherford’s model and on three 
more postulates (Born 1924): 


(i) The states of the atom are stable periodic
orbits, as given by Newton’s laws, of energy
$E_n$, $n \in Z^+$, given by $E_n = h \nu_n f(n)$, where $h$ is
Planck’s constant, $\nu_n$ is the frequency of the
electron on that orbit, and $f(n)$ is for each atom
a function approximately linear in Z at least for
small values of Z.

(ii) When radiation is emitted or absorbed, the
atom makes a transition to a different state.
The frequency of the radiation emitted or
absorbed when making a transition is
$\nu_{n,m} = h^{-1} |E_n - E_m|$.

(iii) For large values of n and m and small values of
$(n - m)/(n + m)$ the predictions of the theory
should agree with those of the classical theory
of the interaction of matter with radiation.


Later, A Sommerfeld gave a different version of the 
first postulate, by requiring that the allowed orbits 
be those for which the classical action is an integer 
multiple of Planck’s constant. 

The old quantum theory met success when 
applied to simple systems (atoms with Z < 5) but 
it soon appeared evident that a new, radically 
different point of view was needed and a fresh 
start; the new theory was to contain few free 
parameters, and the role of postulate (iii) was now 
to fix the value of these parameters. 

There were two (successful) attempts to construct 
a consistent theory; both required a more sharply 
defined mathematical formalism. The first one was 
sparked by W Heisenberg, and further important 
ideas and mathematical support came from M Born, 


P Jordan, W Pauli, P Dirac and, on the mathematical
side, also from J von Neumann and H Weyl. This
formulation maintains that one should only consider 
relations between observable quantities, described 
by elements that depend only on the initial and final 
states of the system; each state has an internal 
energy. By energy conservation, the difference 
between the energies must be proportional (with a 
universal constant) to the frequency of the radiation 
absorbed or emitted. This is enough to define the 
energy of the state of a single atom modulo an 
additive constant. The theory must also take into 
account the probability of transitions under the 
influence of an external electromagnetic field. 

We shall give some details later on, which will 
help to follow the basis of this approach. 

The other attempt was originated by L de Broglie
following early remarks by W H Bragg and
M Brillouin. Instead of emphasizing the discrete
nature of light, he stressed the possible wave nature
of particles, using as a guide the Hamilton-Jacobi
formulation of classical mechanics. This attempt
was soon supported by the experiments of Davisson
and Germer (1927) on the scattering of a beam of electrons
from a crystal. These experiments showed that,
while electrons are recorded as “point particles,” 
their distribution follows the law of the intensity for 
the diffraction of a (dispersive) wave. Moreover, the 
relation between momentum and frequency was, 
within experimental errors, the same as that 
obtained by Einstein for photons. 

The theory started by de Broglie was soon placed
in almost definitive form by E Schrödinger. In this
approach one is naturally led to formulate and solve 
partial differential equations and the full develop- 
ment of the theory requires regularity results from 
the theory of functions. 

Schrödinger soon realized that the relations which 
were found in the approach of Heisenberg could be 
easily (modulo technical details which we shall 
discuss later) obtained within the formalism he was 
advocating and indeed he gave a proof that the two 
formalisms were equivalent. This proof was later 
refined, from the mathematical point of view, by 
J von Neumann and G Mackey. 

In fact, Schrödinger’s approach has proved much
more useful in the solution of most physical 
problems in the nonrelativistic domain, because it 
can rely on the developments and practical use of 
the theory of functions and of partial differential 
equations. Heisenberg’s “algebraic” approach has 
therefore a lesser role in solving concrete problems 
in (nonrelativistic) QM. 

If one considers processes in which the number of 
particles may change in time, one is forced to 


introduce a Hilbert space that accommodates states 
with an arbitrarily large number of particles, as is
the case in the theory of relativistic quantized fields
or in quantum statistical mechanics; it is then more
difficult to follow the line of Schrödinger, due to 
difficulties in handling spaces of functions of 
infinitely many variables. The approach of Heisen- 
berg, based on the algebra of matrices, has a rather 
natural extension to suitable algebras of operators; 
the approach of Schrödinger, based on the descrip- 
tion of a state as a (wave) function, encounters more 
difficulties since one must introduce functionals over 
spaces of functions and the description of dynamics 
does not have a simple form. 

From this point of view, the generalization of 
Heisenberg’s approach has led to much progress in 
the understanding of the structure of the resulting 
theory. Still some relevant results have been 
obtained in a Schrödinger representation. We shall 
not elaborate further on this point. 

We shall end this introductory section with a 
short description of the emergence of the structure 
of QM in Heisenberg’s and Schrödinger’s
approaches; this will provide a motivation for the
axioms of QM which we shall introduce in the
following section. For an extended analysis, see, for 
example, Jammer (1979). 

The specific form that was postulated by 
de Broglie (1923) for the wave nature of a particle 
relies on the relation of geometrical optics with 
wave propagation and on the formulation of 
Hamiltonian mechanics as a sort of “wave front 
propagation” through the solution of the Hamilton- 
Jacobi equation and the introduction of group 
velocity. 

By analogy with electromagnetic waves, it is
natural to associate with a free nonrelativistic
particle of momentum p and mass m the plane wave

$$\phi(x, t) = e^{i(p \cdot x - Et)/\hbar}, \qquad E = \frac{p^2}{2m}$$


Schrödinger obtained the equation for a quantum
particle in a field of conservative forces with
potential V(x) by considering an analogy with the
propagation of an electromagnetic wave in a
medium with refraction index $n(x, \omega)$ that varies
slowly on the scale of the wavelength. Indeed, in this
case the “wave” follows the laws of geometrical
optics, and has therefore a “particle-like” behavior.
If one denotes by $\hat{u}(x, \omega)$ the Fourier transform (with
respect to time) of a generic component of the
electric field and one assumes that the field is
essentially monochromatic (so that the support of
$\hat{u}(x, \omega)$ as a function of $\omega$ is in a very small
neighborhood of $\omega_0$), one finds that $\hat{u}(x, \omega)$ is an
approximate solution of the equation

$$-\Delta \hat{u}(x, \omega) = \frac{\omega^2}{c^2}\, n^2(x, \omega)\, \hat{u}(x, \omega) \qquad [1]$$

Writing $\hat{u}(x, \omega) = A(x, \omega)\, e^{i(\omega/c) W(x, \omega)}$, the phase
$W(x, \omega)$ satisfies, in the high-frequency limit, the
eikonal equation $|\nabla W(x, \omega)|^2 = n^2(x, \omega)$. One can
define for the solution a phase velocity $v_f$ and it
turns out that $v_f = c/|\nabla W(x, \omega)|$.
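
The passage to the eikonal equation can be made explicit. The following sketch assumes the ansatz $\hat{u} = A\,e^{i(\omega/c)W}$ with the phase normalized by $\omega/c$ (a normalization chosen here so that the eikonal equation comes out in the form quoted), and keeps only the terms of highest order in $\omega$:

```latex
\begin{aligned}
\Delta \hat{u} &= \Bigl[\, \Delta A
   + 2i\,\frac{\omega}{c}\,\nabla A \cdot \nabla W
   + i\,\frac{\omega}{c}\, A\, \Delta W
   - \frac{\omega^2}{c^2}\, A\, |\nabla W|^2 \Bigr]\, e^{i(\omega/c) W}, \\
-\Delta \hat{u} = \frac{\omega^2}{c^2}\, n^2\, \hat{u}
 \quad &\Longrightarrow \quad
 \frac{\omega^2}{c^2}\, A\, |\nabla W|^2 = \frac{\omega^2}{c^2}\, n^2 A
 \quad (\text{terms of order } \omega^2)
 \quad \Longrightarrow \quad |\nabla W|^2 = n^2 .
\end{aligned}
```

The terms of order $\omega$ and $\omega^0$ that were dropped determine the amplitude $A$ and are negligible in the high-frequency limit.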

On the other hand, classical mechanics can also be
described by propagation of surfaces of constant value
for the solution W(x, t) of the Hamilton-Jacobi
equation $H(x, \nabla W) = E$, with $H = p^2/2m + V(x)$.
Recall that high frequency (the realm of geometric
optics) corresponds to small distances. This analogy
led Schrödinger (1926) to postulate that the dynamics
satisfied by the waves associated with the particles was
given by the (Schrödinger) equation

$$i\hbar\,\frac{\partial \psi(x, t)}{\partial t} = -\frac{\hbar^2}{2m}\,\Delta \psi(x, t) + V(x)\,\psi(x, t) \qquad [2]$$

This wave was to describe the particle and its motion,
but, being complex valued, it could not represent any
measurable property. It is a mathematical property of
the solutions of [2] that the quantity $\int |\psi(x, t)|^2\, d^3x$ is
preserved in time. Furthermore, if one sets


$$\rho(x, t) = |\psi(x, t)|^2, \qquad j(x, t) = -i\,\frac{\hbar}{2m} \left[ \bar{\psi}(x, t)\, \nabla \psi(x, t) - \psi(x, t)\, \nabla \bar{\psi}(x, t) \right] \qquad [3]$$

one easily verifies the local conservation law

$$\frac{\partial \rho}{\partial t} + \operatorname{div} j(x, t) = 0 \qquad [4]$$

These mathematical properties led to the statistical
interpretation given by Max Born: in those
experiments in which the position of the particles is
measured, the integral of $|\psi(x, t)|^2$ over a region $\Omega$ of
space gives the probability that at time t the particle
is localized in the region $\Omega$. Moreover, the current
associated with a charged particle is given locally by
j(x, t) defined above.
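
The local conservation law [4] can be checked directly from the Schrödinger equation [2]. A sketch of the computation, assuming a real potential V and writing $\bar{\psi}$ for the complex conjugate of $\psi$:

```latex
\begin{aligned}
\frac{\partial \psi}{\partial t} &= \frac{i\hbar}{2m}\,\Delta\psi - \frac{i}{\hbar}\,V\psi,
\qquad
\frac{\partial \bar\psi}{\partial t} = -\frac{i\hbar}{2m}\,\Delta\bar\psi + \frac{i}{\hbar}\,V\bar\psi, \\
\frac{\partial \rho}{\partial t}
 &= \bar\psi\,\frac{\partial \psi}{\partial t} + \psi\,\frac{\partial \bar\psi}{\partial t}
  = \frac{i\hbar}{2m}\bigl(\bar\psi\,\Delta\psi - \psi\,\Delta\bar\psi\bigr), \\
\operatorname{div} j
 &= -\frac{i\hbar}{2m}\,\nabla\!\cdot\!\bigl(\bar\psi\,\nabla\psi - \psi\,\nabla\bar\psi\bigr)
  = -\frac{i\hbar}{2m}\bigl(\bar\psi\,\Delta\psi - \psi\,\Delta\bar\psi\bigr)
  = -\frac{\partial \rho}{\partial t}.
\end{aligned}
```

The potential terms cancel because V is real, and the cross terms $\nabla\bar\psi \cdot \nabla\psi$ cancel in the divergence.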

Let us now briefly review Heisenberg’s approach.
At the heart of this approach are: empirical formulas
for the intensities of emission and absorption of
radiation (dispersion relations), Sommerfeld’s quantum
condition for the action and the vague
statement “the analogue of the derivative for the
discrete action variable is the corresponding finite
difference quotient.” And, most important, the
remark that the correct description of atomic
physics was through quantities associated with
pairs of states, that is, (infinite) matrices and the
empirical fact that the frequency (or rather the wave
number) $\omega_{k,j}$ of the radiation (emitted or absorbed)
in the transition between the atomic levels k and
j ($k \neq j$) satisfies the Ritz combination principle
$\omega_{m,j} + \omega_{j,k} = \omega_{m,k}$. It is easy to see that any doubly
indexed family satisfying this relation must have the
form $\omega_{m,k} = E_m - E_k$ for suitable constants $E_j$.
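
The last remark can be illustrated concretely: fixing a reference level and setting $E_j = \omega_{j,\mathrm{ref}}$ recovers the family as differences of levels. The sketch below is a hedged illustration; it extends the relation to all index triples, with $\omega_{k,k} = 0$, which is an assumption beyond the case $k \neq j$ stated in the text. The sample levels and the function names are hypothetical.

```python
def satisfies_ritz(omega, tol=1e-12):
    """Check omega[m][j] + omega[j][k] == omega[m][k] for all index triples."""
    n = len(omega)
    return all(abs(omega[m][j] + omega[j][k] - omega[m][k]) <= tol
               for m in range(n) for j in range(n) for k in range(n))

def levels_from_frequencies(omega, ref=0):
    """Candidate levels E_j = omega[j][ref]; this fixes E_ref = 0."""
    return [omega[j][ref] for j in range(len(omega))]

# Made-up sample: build a family from four levels, then recover the
# levels (up to an additive constant) from the family alone.
sample_levels = [-1.0 / n ** 2 for n in range(1, 5)]
omega = [[sample_levels[m] - sample_levels[k] for k in range(4)] for m in range(4)]

recovered = levels_from_frequencies(omega)
differences_match = all(
    abs((recovered[m] - recovered[k]) - omega[m][k]) < 1e-12
    for m in range(4) for k in range(4)
)
```

The recovered levels differ from the sample levels by the constant $E_{\mathrm{ref}}$, which reflects the fact that only differences of levels are observable in the frequencies.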

It was empirically verified by Kramers that the
dipole moment of an atom in an external monochromatic
field with frequency $\nu$ was proportional
to the field with a coefficient (of polarization)


e? f; F; 
P— = s 5 
raed Be sere v — | | 


where e, m are the charge and the mass of the
electron and $f_i$, $F_i$ are the probabilities that the
frequency $\nu$ is emitted or absorbed.

A detailed analysis of the phenomenon of polarization 
in classical mechanics, with the clearly stated aim “of 
presenting the results in a way that may give hints for the 
construction of a New Mechanics” was made by Max 
Born (1924). He makes use of action-angle variables
$\{J_i, \theta_i\}$, assuming that the atom can be considered as a
collection of harmonic oscillators with frequency $\nu_i$
coupled linearly to the electric field of frequency $\mu$.

In the dipole approximation one obtains the 
following result for the polarization P (linear 
response in energy to the electric field): 











O S om yp DUm ig 


a 


(v-m)>0 


where $\nu = \partial H/\partial J$ (H is the interaction Hamiltonian),
and A(J) is a suitable matrix. In order to derive the
new dynamics, having as a guide the correspondence 
principle, one has to compare this result with the 
Kramers dispersion relation, which we write (to make 
the comparison easier) in the form 


2 


P _ e ` T Tim 
= E ogl p m 
4rm ae a 





Em > by [7] 


Bohr’s rule implies that $\nu(n + \tau, n) = (E(n + \tau) - E(n))/\hbar$.

Born and Heisenberg noticed that, for n suffi- 
ciently large and k small, one can approximate the 
differential operator in [6] with the corresponding 
difference operator, with an error of the order of k/n. 
Therefore, [6] could be substituted by 


EP 


p-p S| 
v(n+ m) — p? 


m,>0 
P 


v(n —m) — p? 








The conclusion Born and Heisenberg drew is that 
the matrix A that takes the place of the momentum 
in the classical theory must be such that 
|A = ehm f(n +m,n). In the same vein, 
considering the polarization in a static electric 
field, it is possible to find an expression for the 
matrix that takes the place of the coordinate x in 
classical Hamiltonian theory. 

In general, the new approach (matrix mechanics)
associates matrices with some relevant classical
observables (such as functions of position or
momentum) with a time dependence that is derived
from the empirical dispersion relations of Kramers,
the correspondence principle, Bohr’s rule, the Sommerfeld
action principle and first- (and second-) order
perturbation theory for the interaction of an atom
with an external electromagnetic field. It was soon
clear to Born and Jordan (1925) that this dynamics
took the form $i\hbar \dot{A} = AH - HA$ for a matrix H that
for the case of the hydrogen atom is obtained from the
classical Hamiltonian with the prescription given for
the coordinates x and p. The relation
$[\hat{x}_k, \hat{p}_k] = i\hbar$ among the matrices $\hat{x}_k$ and
$\hat{p}_k$ corresponding to position and momentum was also
seen as plausible. One
year later P Dirac (1926) pointed out the structural
identity of this relation with the Poisson bracket of
Hamiltonian dynamics, developed a “quantum algebra”
and a “quantum differentiation” and proved
that any *-derivation $\delta$ (derivation which preserves
the adjoint) of the algebra $B_N$ of $N \times N$ matrices is
inner, that is, is given by $\delta(a) = i[a, h]$ for a
Hermitian matrix h. Much later this theorem was
extended (with some assumptions) to the algebra of
all bounded operators on a separable Hilbert space.
Since the derivations are generators of a one-parameter
continuous group of automorphisms,
that is, of a dynamics, this result lent further strength
to the ideas of Born and Heisenberg.

The algebraic structure introduced by Born, 
Jordan, and Heisenberg (1926) was used by Pauli 
(1927) to give a purely group-theoretical derivation 
of the spectrum of the hydrogen atom, following the 
lines of the derivation in symplectic mechanics of the 
SO(4) symmetry of the Coulomb system. This 
remarkable success gave much strength to the 
Heisenberg formulation of QM, which was soon 
recognized as an efficient instrument in the study of 
the atomic world. 

The algebraic formulation was also instrumental
in the description given by Pauli (1928) of the
“spin” (a property of electrons empirically postulated
by Goudsmit and Uhlenbeck to account for a
hyperfine splitting of some emission lines) as an
“internal” degree of freedom without reference to
spatial coordinates and still connected with the
properties of the system under the group of
spatial rotations. This description through matrices
has a major role also in the formulation by Pauli of
the exclusion principle (and its relation with Fermi-Dirac
statistics), which gave further credit to
Heisenberg’s theory by helping in reproducing
correctly the classification of the atoms.

These features may explain why the “standard” 
formulation of the axioms of QM given in the next 
section shows the influence of Heisenberg’s 
approach. On the other hand, comparison with 
experiments is usually set in the framework of
Schrödinger’s approach. Posing the problems in
terms of properties of the solution of the Schrödinger 
equation, one is led to a pragmatic use of the 
formalism, leaving aside difficulties of interpreta- 
tion. This separation of “the axioms” from the 
“practical use” may be one of the reasons why a 
serious analysis of the axioms and of the problems 
that arise from them is apparently not a concern for 
most of the research in QM, even from the point of 
view of mathematical physics. 

One should stress that both the approach of Born 
and Heisenberg and that of de Broglie and Schrö- 
dinger are rooted in a mixture of attention to the 
experimental data, deep understanding of the pre- 
vious theory, bold analogies and approximations, 
and deep concern for the consistency of the “new 
mechanics.” 

There is an essential difference between the 
starting points of the two approaches. In Heisen- 
berg’s approach, the atom has a priori no spatial 
structure; the description is entirely in terms of its 
properties under emission and absorption of light, 
and therefore its observable quantities are repre- 
sented by matrices. Dynamics enters through the 
study of the interaction with the electromagnetic 
field, and some analogies with the classical theory of 
electrodynamics in an asymptotic regime (correspondence
principle). In this way, as we have briefly
indicated, there emerges the special role of some
matrices, which have a mutual relation similar to the
relation of position and momentum in Hamiltonian theory.
Following this analogy, it is possible to extend the 
theory beyond its original scope and consider 
phenomena in which the electrons are not bound 
to an atom. 

In the approach of Schrödinger, on the other 
hand, particles and collections of particles are 
represented by spatial structures (waves). Spatial 
coordinates are therefore introduced a priori, and 
the position of a particle is related to the intensity of 
the corresponding wave (this was stressed by Born). 
Position and momentum are both basic measurable 
quantities as in classical mechanics. Physical
interpretation forces the particle wave to be square
integrable, and mathematics provides a limitation on
the simultaneous localization in momentum and
position leading to Heisenberg’s uncertainty principle.
Dynamics is obtained from a particle-wave
duality and an analogy with the relativistic wave 
equation in the low-energy regime. The presence of 
bound states with quantized energies is seen as a 
consequence of the well-known fact that waves 
confined to a bounded spatial region have their 
wave number (and therefore energy) quantized. 


Formal Structure 


In this section we describe the formal mathematical 
structure that is commonly associated with QM. It 
constitutes a coherent mathematical theory, but the 
interpretation axiom it contains leads to conceptual 
difficulties. 

We state the axioms in the form in which they 
were codified by J von Neumann (1966); they 
constitute a mathematically precise rendering of the 
formalism of Born, Heisenberg, and Jordan. The 
formalism of Schrödinger per se does not require
general statements about the category of 
observables. 


Axiom I 


(i) Observables are represented by self-adjoint opera- 
tors in a complex separable Hilbert space H. 
(ii) Every such operator represents an observable. 


Remark Axiom I (ii) is introduced only for mathematical simplicity. There is no physical justification
for part (ii). In principle, an observable must be
connected to a procedure of measurement (observation), and for most of the self-adjoint operators on H
(e.g., in the Schrödinger representation, for
$i x_k (\partial/\partial x_h) x_k$) such a procedure has not yet been given.


Axiom II 


(i) Pure states of the systems are represented by 
normalized vectors in H. 

(ii) If a measurement of the observable A is made on
a system in the state represented by the element
$\phi \in H$, the average of the numerical values one
obtains is $\langle \phi, A\phi \rangle$, a real number because A is
self-adjoint (we have denoted by $\langle \phi, \psi \rangle$ the
scalar product in H).


Remark Notice that Axiom II makes no statement 
about the outcome of a single measurement. 


Using the natural complex structure of B(H), pure
states can be extended as linear real functionals on B(H).
One defines a state as any linear real positive
functional on B(H) (all bounded operators on the 
separable Hilbert space H) and says that a state is 
normal if it is continuous in the strong topology. 
It can be proved that a normal state can be 
decomposed into a convex combination of at most 
a denumerable set of pure states. With these 
definitions a state is pure iff it has no nontrivial 
decomposition. It is worth stressing that this state- 
ment is true only if the operators that correspond to 
observable quantities generate all of B(H); one refers 
to this condition by stating that there are no 
superselection rules. 

By general results in the theory of the algebra
B(H), a normal state $\rho$ is represented by a positive
operator of trace class $\sigma$ through the formula
$\rho(A) = \mathrm{Tr}(\sigma A)$. Since a positive trace-class operator
(usually referred to as density matrix in analogy
with its classical counterpart) has eigenvalues $\lambda_k$
that are positive and sum up to 1, the decomposition
of the normal state $\rho$ takes the form $\sigma = \sum_k \lambda_k \Pi_k$,
where $\Pi_k$ is the projection operator onto the kth
eigenstate (counting multiplicity).

It is also convenient to know that if a sequence of
normal states $\sigma_n$ on B(H) converges weakly (i.e., for
each $A \in B(H)$ the sequence $\sigma_n(A)$ converges) then
the limit state is normal. This useful result is false in
general for closed subalgebras of B(H), for example,
for algebras that contain no minimal projections.

Note that no pure state is dispersion free with
respect to all the observables (contrary to what
happens in classical mechanics). Recall that the
dispersion of the state $\rho$ with respect to the
observable A is defined as $\Delta_\rho(A) \equiv \rho(A^2) - (\rho(A))^2$.
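
As a small numerical illustration of the formulas above, the following sketch evaluates Tr(sigma A) and the dispersion for a two-level system. The matrices, the vector `phi` and the helper names are hypothetical choices made only for this example; NumPy is assumed to be available.

```python
import numpy as np

# Illustrative two-level example; all values are assumptions for this sketch.
A = np.array([[1.0, 0.0], [0.0, -1.0]])       # an observable (Pauli z)
phi = np.array([1.0, 1.0]) / np.sqrt(2)       # normalized vector, i.e. a pure state
sigma_pure = np.outer(phi, phi.conj())        # projection onto phi
sigma_mixed = np.diag([0.75, 0.25])           # density matrix with eigenvalues 3/4 and 1/4

def expectation(sigma, A):
    """rho(A) = Tr(sigma A)."""
    return np.trace(sigma @ A).real

def dispersion(sigma, A):
    """Dispersion rho(A^2) - rho(A)^2 of the state with respect to A."""
    return expectation(sigma, A @ A) - expectation(sigma, A) ** 2

# Spectral decomposition sigma = sum_k lambda_k Pi_k of the mixed state.
lam, vecs = np.linalg.eigh(sigma_mixed)
rebuilt = sum(l * np.outer(v, v.conj()) for l, v in zip(lam, vecs.T))

disp_pure = dispersion(sigma_pure, A)         # phi is not an eigenvector of A
disp_mixed = dispersion(sigma_mixed, A)
```

In this example the pure state has nonzero dispersion with respect to A because `phi` is not an eigenvector of A, in line with the remark that no pure state is dispersion free for all observables.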

The connection of the state with the outcome of a 
single measurement of an observable associated with 
an operator A is given by the following axiom, which 
we shall formulate only for the case when the self- 
adjoint operator A has only discrete spectrum. The 
generalization to the other case is straightforward but 
requires the use of the spectral projections of A. 


Axiom III 


(i) If A has only discrete spectrum, the possible
outcomes of a measurement of A are its
eigenvalues $\{a_k\}$.

(ii) If the state of the system immediately before the
measurement is represented by the vector $\phi \in H$,
the probability that the outcome be $a_k$ is
$\sum_h |\langle \psi^A_{k;h}, \phi \rangle|^2$, where the $\psi^A_{k;h}$ are a complete orthonormal
set in the Hilbert space spanned by the eigenvectors
of A to the eigenvalue $a_k$.

(iii) If a system is in the pure state $\phi$ and one
performs a measurement of the observable
A with outcome $a_j \in (b - \delta, b + \delta)$ for some
$b, \delta \in \mathbb{R}$, then immediately after the measurement
the system can be in any (not necessarily
pure) state which lies in the convex hull of the
pure states which are in the spectral subspace of
the operator A in the interval $\Delta_{b;\delta} \equiv (b - \delta, b + \delta)$.


Note Statements (ii) and (iii) can be extended 
without modification to the case in which the initial 
state is not a pure state, and is represented by a 
density matrix o. 


Remark 1 Axiom III makes sure that if one
performs, immediately after the first, a further
measurement of the same observable A the outcome
will still lie in the interval $\Delta_{b;\delta}$. This is needed to
give some objectivity to the statement made about
the outcome; notice that one must place the
condition “immediately after” because the evolution
may not leave invariant the spectral subspaces of A.
If the operator A has, in the interval $\Delta_{b;\delta}$, only
discrete (pure point) spectrum, one can express
Axiom III in the following way: the outcome can
be any state that can be represented by a convex
affine superposition of the eigenstates of A with
eigenvalues contained in $\Delta_{b;\delta}$.


In the very special case when A has only one
eigenvalue in $\Delta_{b;\delta}$ and this eigenvalue is not
degenerate, one can state Axiom III in the following
form (commonly referred to as “reduction of the
wave packet”): the system after the measurement is
pure and is represented by an eigenstate of the
operator A.
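
The probability rule of Axiom III (ii) and the nondegenerate case of the reduction of the wave packet can be sketched numerically as follows. The three-level observable, the vector `phi` and the function names are hypothetical and chosen only for illustration; NumPy is assumed.

```python
import numpy as np

# Illustrative observable with a doubly degenerate eigenvalue 1 and a simple eigenvalue 2.
A = np.diag([1.0, 1.0, 2.0])
phi = np.array([1.0, 1.0j, 1.0]) / np.sqrt(3)   # normalized state vector

eigvals, eigvecs = np.linalg.eigh(A)            # columns form an orthonormal eigenbasis

def eigenspace(a, tol=1e-9):
    """Orthonormal basis (as columns) of the eigenspace of A for the eigenvalue a."""
    return eigvecs[:, np.abs(eigvals - a) < tol]

def outcome_probability(a):
    """Sum over h of |<psi_{k;h}, phi>|^2 for the eigenvalue a."""
    basis = eigenspace(a)
    return float(np.sum(np.abs(basis.conj().T @ phi) ** 2))

def reduced_state(a):
    """Normalized projection of phi onto the eigenspace of a."""
    basis = eigenspace(a)
    projected = basis @ (basis.conj().T @ phi)
    return projected / np.linalg.norm(projected)

p1 = outcome_probability(1.0)
p2 = outcome_probability(2.0)
after = reduced_state(2.0)   # nondegenerate eigenvalue: the result is an eigenvector of A
```

For the degenerate eigenvalue the axiom allows any state in the convex hull of the pure states in the eigenspace; the projection used in `reduced_state` is only one admissible choice in that case.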


Remark 2 Notice that the third axiom makes a 
statement about the state of the system after the 
measurement is completed. 


It follows from Axiom III that one can measure
“simultaneously” only observables which are represented
by self-adjoint operators that commute with
each other (i.e., their spectral projections mutually
commute). It follows from the spectral representation
of the self-adjoint operators that a family $\{A_i\}$
of commuting operators can be considered (i.e.,
there is a representation in which they are) functions
over a common measure space.

Axioms I-III give a mathematically consistent 
formulation of QM and allow a statistical descrip- 
tion (and statistical prediction) of the outcome of 
the measurement of any observable. It is worth 
remarking that while the predictions will have only 
a statistical nature, the dynamical evolution of the 
observables (and by duality of the states) will be 
described by deterministic laws. The intrinsically
statistical aspect of the predictions comes only from
the third postulate, which connects the mathematical
content of the theory with the measurement
process.

The third axiom, while crucial for the connection 
of the mathematical formalism with the experimen- 
tal data, contains the seed of the conceptual 
difficulties which plague QM and have not been 
cured so far. 

Indeed, the third axiom indicates that the process
of measurement is described by laws that are
intrinsically different from the laws that rule the
evolution without measurement. This privileged role
of the change of state caused by a measurement leads to
serious conceptual difficulties, since the change is
independent of whether or not the result is recorded
by some observer; one should therefore have a way
to distinguish between measurements and generic
interactions with the environment.

A related problem originating from Axiom III
is that the formulation of this axiom refers implicitly
to the presence of a classical observer that certifies
the outcomes of measurements and is allowed to
make use of classical probability theory. This
observer is therefore not subject to the laws
of QM.

These two aspects of the conceptual difficulties
have their common origin in the separation of the
measuring device and of the measured systems into
disjoint entities satisfying different laws. The difficulties
in the theory of measurement have not yet
received a satisfactory answer, but various attempts
have been made, with varying degrees of success, and
some of them are described briefly in the section
“Interpretation problems.” It appears therefore that
QM in its present formulation is a refined and
successful instrument for the description of
nonrelativistic phenomena at the scale set by Planck’s constant, but
its internal consistency is still standing on shaky
ground.

Returning to the axioms, it is worth remarking
explicitly that according to Axiom II a state is a
linear functional over the observables, but it is
represented by a sesquilinear function on the
complex Hilbert space H. Since Axiom II states
that any normalized element of H represents a state
(and elements that differ only by a phase represent
the same state), together with $\phi, \psi$ also $\xi = a\phi + b\psi$,
$|a|^2 + |b|^2 = 1$ (with $\phi$ and $\psi$ orthogonal, so that $\xi$ is normalized),
represents a state, the superposition of
$\phi$ and $\psi$ (superposition principle).

But for an observable A, one has in general
$\rho_\xi(A) \neq |a|^2 \rho_\phi(A) + |b|^2 \rho_\psi(A)$, due to the cross-terms
in the scalar product. The superposition principle is
one of the characteristic features of QM. The
superposition of the two pure states $\phi$ and $\psi$ has
properties completely different from those of a
statistical mixture of the same two states, defined
by the density matrix $\sigma = |a|^2 \Pi_\phi + |b|^2 \Pi_\psi$, where we
have denoted by $\Pi_\phi$ the orthogonal projection onto
the normalized vector $\phi$. Therefore, the search for
these interference terms is one of the means to verify
the predictions of QM, and their smallness under
given conditions is a sign of quasiclassical behavior
of the system under study.
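
The difference between a superposition and a statistical mixture can be made concrete with a minimal two-level sketch. The vectors, the coefficients and the observable below are illustrative assumptions, not taken from the text; NumPy is assumed.

```python
import numpy as np

phi = np.array([1.0, 0.0])
psi = np.array([0.0, 1.0])                 # orthogonal to phi
a = b = 1 / np.sqrt(2)                     # |a|^2 + |b|^2 = 1
A = np.array([[0.0, 1.0], [1.0, 0.0]])     # observable with off-diagonal elements (Pauli x)

def projector(v):
    """Orthogonal projection onto the normalized vector v."""
    return np.outer(v, v.conj())

def rho(sigma, A):
    """Expectation Tr(sigma A)."""
    return np.trace(sigma @ A).real

xi = a * phi + b * psi                     # superposition of phi and psi
superposition = rho(projector(xi), A)
mixture = rho(abs(a) ** 2 * projector(phi) + abs(b) ** 2 * projector(psi), A)

# Cross-terms 2 Re( conj(a) b <phi, A psi> ) account for the whole difference.
cross_terms = 2 * (np.conj(a) * b * (phi.conj() @ A @ psi)).real
```

Here the mixture gives expectation 0 while the superposition gives 1; the gap is exactly the interference term.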

Strictly connected to superposition are entanglement
and the partial trace operation. Suppose that
one has two systems which when considered
separately are described by vectors in two Hilbert
spaces $H_i$, $i = 1, 2$, and which have observables
$A_i \in B(H_i)$. When we want to study their mutual
interaction, it is natural to describe both of them in
the Hilbert space $H_1 \otimes H_2$ and to consider the
observables $A_1 \otimes I$ and $I \otimes A_2$.

When the systems interact, the interaction will not
in general commute with the projection operator $\Pi_1$
onto $H_1$. Therefore, even if the initial state is of the
form $\phi_1 \otimes \phi_2$, $\phi_i \in H_i$, the final state (after the
interaction) is a vector $\xi \in H_1 \otimes H_2$ which cannot
be written as $\xi = \zeta_1 \otimes \zeta_2$ with $\zeta_i \in H_i$. It can be
shown, however, that there always exist two
orthonormal families of vectors $\phi_n \in H_1$ and $\psi_n \in H_2$
such that $\xi = \sum_n c_n \phi_n \otimes \psi_n$ for suitable $c_n \in \mathbb{C}$,
$\sum_n |c_n|^2 = 1$ (this decomposition is not unique in
general).

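Recalling that $\rho_{\phi \otimes \psi}(A_1 \otimes I) = \rho_\phi(A_1)$, one can write

$$ \rho_\xi(A_1 \otimes I) = \sum_n |c_n|^2 \rho_{\phi_n}(A_1) = \rho_\sigma(A_1), \qquad \sigma \equiv \sum_n |c_n|^2 \Pi_{\phi_n} $$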


The map $\Gamma_2 : \rho_\xi \mapsto \rho_\sigma$ is called reduction (or also
conditioning) with respect to $H_2$; it is also called
“partial trace” with respect to $H_2$. The name
“conditioning” reflects the analogy with conditioning in classical
probability theory.

The map $\Gamma_2$ can be extended by linearity to a map
from normal states (density matrices) on $B(H_1 \otimes H_2)$
to normal states on $B(H_1)$ and gives rise to a
positivity-preserving and trace-preserving map.
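
The decomposition into orthonormal families and the reduction map can be illustrated for two two-level systems. This is only a sketch under stated assumptions: the coefficient matrix `C` is an arbitrary illustrative choice, and the singular value decomposition is used here as one standard way to obtain such a decomposition; NumPy is assumed.

```python
import numpy as np

# A vector xi in H1 (x) H2 stored as a coefficient matrix C,
# with xi = sum_{ij} C[i, j] e_i (x) f_j. Values are illustrative.
C = np.array([[1.0, 1.0], [0.0, 1.0]], dtype=complex)
C /= np.linalg.norm(C)                      # normalize xi

# Singular value decomposition C = U diag(c) Vh gives
# xi = sum_n c_n phi_n (x) psi_n with phi_n = U[:, n] and psi_n = Vh[n, :].
U, c, Vh = np.linalg.svd(C)

# Reduced density matrix on H1: sigma = sum_n |c_n|^2 Pi_{phi_n}.
sigma = sum(cn ** 2 * np.outer(U[:, n], U[:, n].conj()) for n, cn in enumerate(c))

# Check rho_xi(A1 (x) I) = Tr(sigma A1) for an observable A1 on H1.
A1 = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)
xi = C.reshape(-1)                          # xi as a vector in C^4 (row-major ordering)
lhs = (xi.conj() @ np.kron(A1, np.eye(2)) @ xi).real
rhs = np.trace(sigma @ A1).real
```

The reduced state has unit trace and, for this entangled vector, two nonzero eigenvalues, so it is not pure.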

One can in fact prove (Takesaki 1971) that any
conditioning for normal states of a von Neumann
algebra M is completely positive in the sense that it
remains positive after tensorization of M with B(K),
where K is an arbitrary Hilbert space.

It can also be proved that a partial converse is
true, that is, that every completely positive trace-preserving
map $\Phi$ on normal states of a von
Neumann algebra $\mathcal{A} \subset B(H)$ can be written, for a
suitable choice of a larger Hilbert space K and
partial isometries $V_k$, in the form (Kraus form)
$\Phi(a) = \sum_k V_k^* a V_k$.

But it must be remarked that, if U(t) is a one-parameter
group of unitary operators on $H_1 \otimes H_2$
and $\sigma$ is a density matrix, the one-parameter family
of maps $\Gamma(t) : \sigma \mapsto \Gamma_2(U(t)\sigma U^*(t))$ does not, in
general, have the semigroup property $\Gamma(t+s) = \Gamma(t) \cdot \Gamma(s)$,
$s, t > 0$, and therefore there is in general
no generator (of a reduced dynamics) associated
with it. Only in special cases and under very strong
hypotheses and approximations is there a reduced
dynamics given by a semigroup (Markov property).
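
A minimal sketch of the failure of the semigroup property follows. To compose the reduced maps one needs a rule for re-embedding a state of the first system into the product space; here this is done with a fixed reference state `omega` of the second system, which is an assumption of the example and not part of the text. The coupling `H` is likewise a hypothetical choice; NumPy is assumed.

```python
import numpy as np

sz = np.diag([1.0, -1.0])
H = np.kron(sz, sz)                          # illustrative coupling on H1 (x) H2
plus = np.array([1.0, 1.0]) / np.sqrt(2)
omega = np.outer(plus, plus)                 # fixed reference state of system 2 (assumption)

def U(t):
    # H is diagonal, so exp(-i t H) is diagonal with entries exp(-i t h_k).
    return np.diag(np.exp(-1j * t * np.diag(H)))

def partial_trace_2(rho):
    """Trace out the second factor of a density matrix on C^2 (x) C^2."""
    return np.einsum('ijkj->ik', rho.reshape(2, 2, 2, 2))

def Gamma(t, sigma1):
    """Reduced evolution of system 1 started in sigma1 (x) omega."""
    total = np.kron(sigma1, omega)
    return partial_trace_2(U(t) @ total @ U(t).conj().T)

sigma1 = np.full((2, 2), 0.5)                # pure state (e_0 + e_1)/sqrt(2) of system 1
t, s = 0.3, 0.5
one_step = Gamma(t + s, sigma1)
two_steps = Gamma(t, Gamma(s, sigma1))
```

In this model the off-diagonal element of the reduced state is multiplied by cos(2t), and cos(2(t+s)) differs from cos(2t)cos(2s), so `one_step` and `two_steps` do not coincide.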

Since entanglement and (nontrivial) conditioning are
marks of QM, and on the other hand the Markov
property described above is typical of conditioning in
classical mechanics, it is natural to search for conditions
and approximations under which the Markov
property is recovered, and more generally under which
the coherence properties characteristic of QM are
suppressed (decoherence). We shall briefly discuss this
problem in the section “Interpretation problems,”
devoted to the attempts to overcome the serious
conceptual difficulties that descend from Axiom III.

It is seen from the remarks and definitions above 
that normal states (density matrices) play the role 
that in classical mechanics is attributed to measures 
over phase space, with the exception that pure states 
in QM do not correspond to Dirac measures (later
on we shall discuss the possibility of describing a
quantum-mechanical state with a function (Wigner
function) on phase space).

In this correspondence, evaluation of an observable
(a measurable function over phase space) over a
state (a normalized, positive measure) is related to
finding the (Hilbert space) trace of the product of an
operator in B(H) with a density matrix. Notice that
the trace operation shares some of the properties of
the integral, in particular $\mathrm{tr}\,AB = \mathrm{tr}\,BA$ if A is in
trace class and $B \in B(H)$ (cf. $g \in L^1$ and $f \in L^\infty$)
and $\mathrm{tr}\,AB \geq 0$ if A is a density matrix and B is a
positive operator. This suggests defining functions
over the density matrices that correspond to quantities
which are important in the theory of dynamical
systems, in particular the entropy.

This is readily done if the Hilbert space is finite 
dimensional, and in the infinite-dimensional case if 
one takes as observables all Hermitian bounded 
operators. In quantum statistical mechanics one is
led to consider an infinite collection of subsystems,
each one described with a Hilbert space (finite or
infinite dimensional) $H_i$, $i = 1, 2, \ldots$; the space of
representation is a subspace K of $H_1 \otimes H_2 \otimes \cdots$,
and the observables are a (weakly closed) subalgebra
$\mathcal{A}$ of B(K) (typically constructed as an inductive
limit of elements of the form $I \otimes I \cdots \otimes A_k \otimes I \cdots$).
In this context one also considers normal states on $\mathcal{A}$
and defines a trace operation, with the properties
described above for a trace. Most of the definitions
(e.g., of entropy) can be given in this enlarged
context, but differences may occur, since in general
$\mathcal{A}$ does not contain finite-dimensional projections,
and therefore the trace function is not the trace
commonly defined in a Hilbert space. We shall not
describe further this very interesting and much 
developed theory, of major relevance in quantum 
statistical mechanics. For a thorough presentation 
see Ohya and Petz (1993). 

The simplest and most-studied example is the
case when each Hilbert space $H_i$ is a complex
two-dimensional space. The resulting system is
constructed in analogy with the Ising model of
classical statistical mechanics, but in contrast to that
system it possesses, for each value of the index i,
infinitely many pure states. The corresponding
algebra of observables is a closed subalgebra of
$(\mathbb{C}^2 \times \mathbb{C}^2)^{\otimes \mathbb{Z}}$ and generically does not contain any
finite-dimensional projection.

This model, restricted to the case $(\mathbb{C}^2 \times \mathbb{C}^2)^{\otimes K}$, K a
finite integer, has become popular in the study of
quantum information and quantum computation, in
which case a normalized element of $H_i$ is called a q-bit
(in analogy with the bits of information in classical
information theory). It is clear that the unit sphere in
$(\mathbb{C}^2 \times \mathbb{C}^2)$ contains many more than four points, and
this gives much more freedom for operations on the
system. This is the basis of quantum computation and
quantum information, a very interesting field which
has received much attention in recent years.


Quantization and Dynamics 


The evolution in nonrelativistic QM is described by
the Schrödinger equation in the representation in
which for an N-particle system the Hilbert space is
$L^2(\mathbb{R}^{3N}) \otimes \mathbb{C}^k$, where $\mathbb{C}^k$ is a finite-dimensional space
which accounts for the fact that some of the
particles may have a spin content.

Apart from (often) inessential parameters, the
Schrödinger equation for spin-0 particles can be
written typically as

$$ i\hbar \frac{\partial \phi}{\partial t} = H\phi, \qquad H = \sum_{k=1}^{N} \frac{1}{2m_k} (i\hbar\nabla_k + A_k)^2 + \sum_{k=1}^{N} V_k(x_k) + \sum_{i \neq k} V_{i,k}(x_i - x_k) $$ [9]

where $\hbar$ is Planck’s constant, $A_k$ are vector-valued
functions (vector potentials), and $V_k$ and $V_{i,k}$ are
scalar-valued functions (scalar potentials) on $\mathbb{R}^3$.


If some particles have spin 1/2, the corresponding
kinetic energy term should read $-(i\hbar\, \sigma \cdot \nabla)^2$,
where $\sigma_k$, $k = 1, 2, 3$, are the Pauli matrices and one
must add a term W(x) which is a matrix field with
values in $\mathbb{C}^k \otimes \mathbb{C}^k$ and takes into account the
coupling between the spin degrees of freedom.
Notice that the local operator $i\sigma \cdot \nabla$ is a “square
root” of the Laplacian.
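
The “square root” property can be checked at the level of symbols: in Fourier space the gradient acts as multiplication by $ik$, so the statement reduces to the matrix identity $(\sigma \cdot k)^2 = |k|^2 I$. The following sketch verifies this identity for one arbitrarily chosen vector `k` (an illustrative assumption); NumPy is assumed.

```python
import numpy as np

# The three Pauli matrices.
sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def sigma_dot(k):
    """The 2x2 matrix sigma . k = sum_j k_j sigma_j."""
    return sum(kj * sj for kj, sj in zip(k, sigma))

k = np.array([0.3, -1.2, 2.0])             # arbitrary wave vector (illustrative)
square = sigma_dot(k) @ sigma_dot(k)       # equals |k|^2 times the identity
```

The identity follows from the anticommutation relations of the Pauli matrices, which make all mixed terms cancel.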

A relativistic extension of the Schrödinger equation
for a free particle of mass $m > 0$ in dimension
3 was obtained by Dirac in a space of spinor-valued
functions $\psi_k(x,t)$, $k = 0, 1, 2, 3$, which carries
an irreducible representation of the Lorentz group.
In analogy with the electromagnetic field, for which
a linear partial differential equation (PDE) can be
written using a four-dimensional representation of
the Lorentz group, the relativistic Dirac equation is
the linear PDE

$$ i \sum_{k=0}^{3} \gamma_k \frac{\partial \psi}{\partial x_k} = m\psi, \qquad x_0 \equiv ct $$

where the $\gamma_k$ generate the algebra of a representation
of the Lorentz group. The operator $\sum_k \gamma_k\, \partial/\partial x_k$ is a
local square root of the relativistically invariant
d’Alembert operator $-\partial^2/\partial x_0^2 + \Delta - m \cdot I$.
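
The algebraic fact behind the local square root can be sketched with explicit matrices. The block form of the gamma matrices and the metric signature (+, -, -, -) used below are one common convention and are assumptions of this example, since the text does not fix a representation; NumPy is assumed.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
pauli = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

# Gamma matrices in the standard (Dirac) block representation (assumed convention).
gamma = [np.block([[I2, Z2], [Z2, -I2]])]
gamma += [np.block([[Z2, s], [-s, Z2]]) for s in pauli]
eta = np.diag([1.0, -1.0, -1.0, -1.0])      # metric with signature (+, -, -, -)

def anticommutator(a, b):
    return a @ b + b @ a

# Clifford relations: gamma_m gamma_n + gamma_n gamma_m = 2 eta_mn I.
clifford_ok = all(
    np.allclose(anticommutator(gamma[m], gamma[n]), 2 * eta[m, n] * np.eye(4))
    for m in range(4) for n in range(4)
)

# Consequence: (sum_m k_m gamma_m)^2 = (k_0^2 - |k|^2) I for any real four-vector k.
k = np.array([1.5, 0.2, -0.7, 0.4])
slash = sum(km * g for km, g in zip(k, gamma))
```

Replacing each $k_m$ by a derivative turns the last identity into the statement that a first-order matrix operator squares to the wave operator times the identity.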

When one tries to introduce (relativistically
invariant) local interactions, one faces the same
problem as in classical mechanics, namely one
must introduce relativistically covariant fields (e.g.,
the electromagnetic field), that is, systems with an
infinite number of degrees of freedom. If this field is
considered as external, one faces technical problems,
which can be overcome in favorable cases. But if one
tries to obtain a fully quantized theory (by also
quantizing the field) the obstacles become insurmountable,
due also to the nonuniqueness of the
representation of the canonical commutation relations
if these are taken as the basis of quantization,
as in the finite-dimensional case.

In a favorable case (e.g., the interaction of a
quantum particle with the quantized electromagnetic
field) one can set up a perturbation scheme in a
parameter $\alpha$ (the physical value of $\alpha$ in natural units
is roughly 1/137). We shall come back later to
perturbation schemes in the context of the Schrödinger
operator; in the present case one has been
able to find procedures (renormalization) by which
the series in $\alpha$ that describe relevant physical
quantities are well defined term by term. But even
in this favorable case, where the sum of the first few
terms of the series is in excellent agreement with the
experimental data, one has reasons to believe that
the series is not convergent, and one does not even
know whether the series is asymptotic.

One is led to wonder whether the structure of
fields (operator-valued elements in the dual of
compactly supported smooth functions on classical
spacetime), taken over in a simple way from the
field structure of classical electromagnetism, is a
valid instrument in the description of phenomena
that take place at a scale incomparably smaller than
the scale (atomic scale) at which we have reasons to
believe that the formalisms of Schrödinger and
Heisenberg provide a suitable model for the description
of natural phenomena.

The phenomena related to the interaction
of a nonrelativistic quantum particle
with the quantized electromagnetic field take place
at the atomic scale. These phenomena have been the
subject of very intense research in theoretical
physics, mostly within perturbation theory, and the
analysis to the first few orders has led to very
spectacular results (although there is at present no
proof that the perturbation series are at least
asymptotic).

In this field rigorous results are scarce, but 
recently some progress has been made, establishing, 
among other things, the existence of the ground 
state (a nontrivial result, because there is no gap 
separating the ground-state energy from the con- 
tinuous part of the spectrum) and paving the way 
for the description of scattering phenomena; the 
latter result is again nontrivial because the photon 
field may lead to an anomalous infrared (long- 
range) behavior, much in the same way that the 
long-range Coulomb interaction requires a special 
treatment in nonrelativistic scattering theory. 

This contribution to the Encyclopedia is meant to 
be an introduction to QM and therefore we shall 
limit ourselves to the basic structure of nonrelativis- 
tic theory, which deals with systems of a finite 
number of particles interacting among themselves 
and with external (classical) potential fields, leaving 
for more specialized contributions a discussion of 
more advanced items in QM and of the successes 
and failures of a relativistically invariant theory of 
interaction between quantum particles and quan- 
tized fields. 

We shall return therefore to basics. 

One may begin a section on dynamics in QM by 
discussing some properties of the solutions of the 
Schrödinger equation, in particular dispersive effects 
and the related scattering theory, the problem of 
bound states and resonances, the case of time- 
dependent perturbation and the ionization effect, 
the binding of atoms and molecules, the Rayleigh 
scattering, the Hall effect and other effects in 
nanophysics, the various multiscale and adiabatic 
limits, and in general all the physical problems that 
have been successfully solved by Schrödinger’s QM
(as well as the very many interesting and unsolved 
problems). 

We will consider briefly these issues and the 
approximation schemes that have been developed in 
order to derive explicit estimates for quantities of 
physical interest. Since there are very many excellent 
reviews of present-day research in QM (e.g., Araki 
and Ezawa (2004), Blanchard and Dell’Antonio 
(2004), Cycon et al. (1986), Hislop and Sigal (1996),
Lieb (1990), Le Bris (2005), Simon (2002), and 
Schlag (2004)) we refer the reader to the more 
specialized contributions to this Encyclopedia for a 
detailed analysis and precise statements about the 
results. 

We prefer to come back first to the foundations of
the theory; we shall take the point of view of
Heisenberg and start by discussing the mapping properties
of the algebra of observables and of the states.
Since transition probabilities play an important role,
we consider only transformations $\alpha$ which are such
that, for any pair of pure states $\phi_1$ and $\phi_2$, one has
$|\langle \alpha(\phi_1), \alpha(\phi_2) \rangle| = |\langle \phi_1, \phi_2 \rangle|$. We call these maps
Wigner automorphisms.

A result of Wigner (see Weyl (1931)) states that if
$\alpha$ is a Wigner automorphism then there exists a
unique operator $U_\alpha$, either unitary or antiunitary,
such that $\alpha(P) = U_\alpha^* P U_\alpha$ for all projection operators P.
If there is a one-parameter group of such automorphisms,
the corresponding operators are all
unitary (but they need not form a group).

A generalization of this result is due to Kadison.
Denoting by $I_{1,+}$ the set of density matrices, a
Kadison automorphism $\beta$ is, by definition, such that
for all $\sigma_1, \sigma_2 \in I_{1,+}$ and all $0 < s < 1$ one has
$\beta(s\sigma_1 + (1 - s)\sigma_2) = s\beta(\sigma_1) + (1 - s)\beta(\sigma_2)$. For Kadison automorphisms
the same result holds as for Wigner’s.

A similar result holds for automorphisms of the 
observables. Notice that the product of two Hermi- 
tian operators is not Hermitian in general, but 
Hermiticity is preserved under Jordan’s product 
defined as $A \times B \equiv \frac{1}{2}[AB + BA]$.
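
A quick numerical check of this remark, with two arbitrarily chosen Hermitian matrices (illustrative values only; NumPy is assumed):

```python
import numpy as np

A = np.array([[1, 2 - 1j], [2 + 1j, 0]], dtype=complex)
B = np.array([[0, 1j], [-1j, 3]], dtype=complex)

def is_hermitian(M):
    return np.allclose(M, M.conj().T)

def jordan(A, B):
    """Jordan product (AB + BA)/2."""
    return 0.5 * (A @ B + B @ A)

ordinary_product_hermitian = is_hermitian(A @ B)      # False for this pair
jordan_product_hermitian = is_hermitian(jordan(A, B)) # True for any Hermitian pair
```

The Jordan product is Hermitian because the adjoint of AB is BA, so the symmetrized sum is invariant under taking adjoints.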

A Segal automorphism is, by definition, an
automorphism of the Hermitian operators that
preserves the Jordan product structure. A theorem
of Segal states that $\gamma$ is a Segal automorphism if and
only if there exist an orthogonal projector E, a
unitary operator U in EH, and an antiunitary
operator V in (I - E)H such that $\gamma(A) = W A W^*$,
where $W = U \oplus V$.

We can now study in more detail the description
of the dynamics in terms of automorphisms of
Wigner or Kadison type when it refers to states
and of Segal type when it refers to observables. We
require that the evolution be continuous in suitable
topologies. The strongest result refers to Wigner’s
case. One can prove that if a one-parameter group
of Wigner automorphisms $\alpha_t$ is measurable in the
weak topology (i.e., $\alpha_t \sigma(A)$ is measurable in t for
every choice of A and $\sigma$) then it is possible to choose
the U(t) provided by Wigner’s theorem in such a
way that they form a group which is continuous in
the strong topology. Similar results are obtained for
the cases of Kadison and Segal automorphisms, but
in both cases one has to assume continuity of $\alpha_t$ in a
stronger topology (the strong operator topology in
the Segal case, the norm topology in Kadison’s).
Weak continuity is sufficient if the operator product
is preserved (in this case one speaks of automorphisms
of the algebra of bounded operators). The
existence of the continuous group U(t) defines a
Hamiltonian evolution. One has indeed:


Theorem 1 (Stone). The map $t \mapsto U(t)$, $t \in \mathbb{R}$, is a
weakly continuous representation of $\mathbb{R}$ in the set of
unitary operators in a Hilbert space $\mathcal{H}$ if and only if
there exists a self-adjoint operator H on (a dense set
of) $\mathcal{H}$ such that $U(t) = e^{-itH}$ and therefore

$$ i \frac{dU(t)}{dt} \phi = H U(t) \phi, \qquad \phi \in D(H) $$ [10]

The operator H is called generator of the dynamics
described by U(t).
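
In finite dimensions the content of Stone’s theorem can be checked directly. The sketch below builds U(t) from the spectral decomposition of a Hermitian matrix, using the sign convention $U(t) = e^{-itH}$, which is an assumption of this example; the matrix `H` is an arbitrary illustrative choice and NumPy is assumed.

```python
import numpy as np

H = np.array([[1.0, 0.5j], [-0.5j, -1.0]])     # Hermitian generator (illustrative)
energies, V = np.linalg.eigh(H)                # H = V diag(energies) V^*

def U(t):
    """U(t) = exp(-i t H), computed through the spectral decomposition of H."""
    return V @ np.diag(np.exp(-1j * t * energies)) @ V.conj().T

t, s, h = 0.7, 1.1, 1e-6
group_ok = np.allclose(U(t + s), U(t) @ U(s))               # group property
unitary_ok = np.allclose(U(t).conj().T @ U(t), np.eye(2))   # unitarity
derivative = (U(t + h) - U(t - h)) / (2 * h)                # central difference
generator_ok = np.allclose(1j * derivative, H @ U(t), atol=1e-6)
```

In infinite dimensions the same construction goes through the spectral theorem, and the derivative exists only on the domain of H, which is why the theorem speaks of a dense set.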


Note In Schrödinger’s approach the operator
described in Stone’s theorem is called Hamiltonian,
in analogy with the classical case. In the case of one
particle of mass m in $\mathbb{R}^3$ subject to a conservative
force with potential energy V(x) it has the following
form, in units in which $\hbar = 1$:

$$ H = -\frac{1}{2m}\Delta + V(x) $$ [11]


If the potential V depends on time, Stone’s theorem
is not directly applicable, but still the spectral
properties of the self-adjoint operators $H_t$ and of
the kernel of the group $\tau \mapsto e^{-i\tau H_t}$ are essential to
solve the (time-dependent) Schrödinger equation.

The semigroup $t \mapsto e^{-tH_0}$ (with $H_0$ the Hamiltonian without potential) is usually a positivity-preserving
semigroup of contractions and defines a
Markov process; in favorable cases, the same is true
of $t \mapsto e^{-tH}$ (Feynman-Kac formula).

There is an analogous situation in the general
theory of dynamical systems on a von Neumann
algebra; in analogy with the case of elliptic
operators, one defines as “dissipation” a map $\Delta$ on
a von Neumann algebra M which satisfies
$\Delta(a^*a) \geq a^*\Delta(a) + \Delta(a^*)a$ for all $a \in M$. The positive dissipation
$\Delta$ is called completely positive if it remains
positive after tensorization with B(K) for any
Hilbert space K. Notice that according to this
definition every *-derivation is a completely positive
dissipation. For dissipations there is an analog of the
theorem of Stinespring, and often a bounded dissipation
can be written as

$$ \Delta(a) = i[h, a] + \sum_k V_k^* a V_k - \frac{1}{2} \sum_k \{V_k^* V_k, a\} \qquad \text{for } a \in M $$

(the symbols {.,.} denote the anticommutator).
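
For a map of the form just displayed, the dissipation inequality can be verified numerically. A direct computation gives $\Delta(a^*a) - a^*\Delta(a) - \Delta(a^*)a = \sum_k [a, V_k]^* [a, V_k]$, which is manifestly positive. The sketch below checks this with random matrices; the dimension, the seed, the number of operators and the name `Delta` are illustrative assumptions, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_matrix(n):
    return rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

def dag(x):
    """Adjoint (conjugate transpose)."""
    return x.conj().T

n = 3
h = random_matrix(n)
h = h + dag(h)                               # Hermitian h
Vs = [random_matrix(n) for _ in range(2)]    # two arbitrary operators V_k
a = random_matrix(n)

def Delta(a):
    """i[h, a] + sum_k V_k^* a V_k - (1/2) sum_k {V_k^* V_k, a}."""
    out = 1j * (h @ a - a @ h)
    for V in Vs:
        VdV = dag(V) @ V
        out = out + dag(V) @ a @ V - 0.5 * (VdV @ a + a @ VdV)
    return out

gap = Delta(dag(a) @ a) - dag(a) @ Delta(a) - Delta(dag(a)) @ a
expected = sum(dag(a @ V - V @ a) @ (a @ V - V @ a) for V in Vs)
min_eig = np.linalg.eigvalsh(gap).min()      # nonnegative up to rounding
```

The commutator term with h drops out of `gap` entirely, which reflects the fact that a derivation satisfies the inequality with equality.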

In general terms, by quantization is meant the
construction of a theory by deforming a commutative
algebra of functions on a classical phase space X in such a
way that the dynamics of the quantum system can be
derived from the prescription of deformation, usually
by deforming the Poisson brackets if X is a cotangent
bundle T*M (Halbout 2002, Landsman 2002). We
shall discuss only the Weyl quantization (Weyl 1931)
that has its roots in Heisenberg’s formulation of QM
and refers to the case in which the configuration space
is $\mathbb{R}^N$, or, with some variant (Floquet-Zak), the
N-dimensional torus. We shall add a few remarks
on the Wick (anti-Weyl) quantization. More general
formulations are needed when one tries to quantize a
classical system defined on the cotangent bundle of
a generic variety and even more so if it is defined on a
generic symplectic manifold.

The Weyl quantization is a mathematically accu- 
rate rendering of the essential content of the 
procedure adopted by Born and Heisenberg to 
construct dynamics by finding operators which 
play the role of symplectic coordinates. 

Consider a system with one degree of freedom.
The first naive attempt would be to find operators
$\hat q$, $\hat p$ that satisfy the relation

$$ [\hat q, \hat p] \subset i\,I $$ [12]

and to construct the Hamiltonian in analogy with
the classical case. To play a similar role, the
operators $\hat q$ and $\hat p$ must be self-adjoint and satisfy
[12] at least in a weak sense. If both are bounded,
[12] implies $e^{ib\hat p}\, \hat q\, e^{-ib\hat p} = \hat q + bI$ (the exponential is
defined through a convergent series) and therefore
the spectrum of $\hat q$ is the entire real line, a contradiction.
Therefore, the inclusion sign in [12] is strict
and we face domain problems, and as a consequence
[12] has many inequivalent solutions (“equivalence”
here means “unitary equivalence”).
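
In finite dimensions there is an even simpler obstruction than the spectral argument given above: the trace of a commutator vanishes, while the trace of $i\,I$ does not. The following sketch illustrates this simpler argument (not the one in the text) with random Hermitian matrices; the dimension and seed are arbitrary assumptions and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

def random_hermitian(n):
    m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return m + m.conj().T

q, p = random_hermitian(n), random_hermitian(n)
commutator = q @ p - p @ q

trace_of_commutator = np.trace(commutator)      # zero up to rounding, for any q and p
trace_of_target = np.trace(1j * np.eye(n))      # equals i * n, never zero
```

Since the two traces can never agree, no pair of finite matrices satisfies the relation as an identity.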

Apart from “pathological” ones, defined on 
$L^2$-spaces over multiple coverings of $\mathbb{R}$, there are
inequivalent solutions of [12] which are effectively 
used in QM. 

The most common solution is on the Hilbert space
$L^2(\mathbb{R})$ (with Lebesgue measure), with $\hat q$ defined as
the essentially self-adjoint operator that acts on the
smooth functions with compact support as multiplication
by the coordinate x, and $\hat p$ defined
similarly in Fourier space. This representation can
be trivially generalized to construct operators $\hat q_k$ and
$\hat p_k$ in $L^2(\mathbb{R}^N)$.

Another frequently used representation of [12] is
on $L^2(S^1)$ (and, when generalized to N degrees of
freedom, on $T^N$). In this representation, the operator
$\hat p$ is defined by $c_k \mapsto k c_k$ on functions
$f(\theta) = \sum_{k=-M}^{N} c_k e^{ik\theta}/\sqrt{2\pi}$, $0 \le M, N < \infty$. In this case the
operator $\hat q$ is defined as multiplication by the angle
coordinate $\theta$. It is easy to check that this representation
is inequivalent to the previous one and that [12]
is satisfied (as an identity) on the (dense) set of
vectors which are in the domain both of $\hat p \hat q$ and
of $\hat q \hat p$. But notice that the domain of essential
self-adjointness of $\hat p$ is not left invariant by the action of
$\hat q$ ($\theta f(\theta)$ is a function on $S^1$ only if $f(2\pi) = 0$).

We shall denote $\hat p$ in this representation by the
symbol $\partial/\partial\theta$, and refer to it as the Bloch
representation. It can be modified by setting the
action of $\hat p$ as $c_n \mapsto (n + \alpha) c_n$, $0 < \alpha < 2\pi$, and this
gives rise to the various Bloch-Zak and magnetic
representations.

The Bloch representation can be extended to
periodic functions on $\mathbb{R}^1$ noticing that
$L^2(\mathbb{R}) = L^2(S^1) \otimes l^2(\mathbb{N})$; similarly, the Bloch-Zak and the
magnetic representation can be extended to $L^2(\mathbb{R}^N)$.

The difference between the representations can be 
seen more clearly if one considers the one-parameter 
groups of unitary operators generated by the 
“canonical operators” ĝ and p. In the Schrödinger 
representation on L*(R), these groups satisfy 


U(a) V(b) =e V(b) U(a) 
U(a)=e'4, = V(b) = e*t 


and therefore, setting z=a+ib and W(z)= 


ei26/2 V(b) U(a) one has 


W(z)W(z’) = e 2/2 Wz + 2') 


[13] 
ZEC, 


w(Z, z) = Im(z, z) 


The unitary operators $W(z)$ are therefore projective representations of the additive group $\mathbb{C}$. This generalizes immediately to the case of $N$ degrees of freedom; the representation is now of the additive group $\mathbb{C}^N$ and $\omega$ is the standard symplectic form on $\mathbb{C}^N$.
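
As a consistency check, the composition law [13] can be recovered formally from the Baker–Campbell–Hausdorff identity. The sketch below assumes the normalization $[\hat q, \hat p] = i$ for the commutation relation and ignores all domain questions; with a different normalization the phases change accordingly.

```latex
\begin{align*}
e^{X}e^{Y} &= e^{X+Y+\frac12[X,Y]}
  \quad\text{if } [X,Y] \text{ is a multiple of the identity},\\
W(z) &= e^{i(a\hat q + b\hat p)} = e^{-iab/2}\,V(b)\,U(a),\\
W(z)\,W(z') &= e^{\frac12\,[\,i(a\hat q+b\hat p),\; i(a'\hat q+b'\hat p)\,]}\;W(z+z')
  = e^{-\frac{i}{2}(ab'-ba')}\,W(z+z'),\\
ab'-ba' &= \operatorname{Im}(\bar z\,z') = \omega(z,z').
\end{align*}
```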

In the Bloch representation, the unitaries $U(a)V(b)U^*(a)V^*(b)$ are not multiples of the identity, and have no particularly simple form. The map $\mathbb{C}^N \ni z \mapsto W(z)$ with the structure [13] is called a Weyl system; it plays a major role in QM. The following theorem is therefore of major importance in the mathematical theory of QM.


Theorem 2 (von Neumann 1965). There exists only one, modulo unitary equivalence, irreducible representation of the Weyl system.


The proof of this theorem follows a general pattern in the theory of group representations. One introduces an algebra $W^{(N)}$ of operators

$$W_f \equiv \int_{\mathbb{C}^N} f(z)\,W(z)\,dz, \qquad f \in L^1(\mathbb{C}^N),$$

called the Weyl algebra.

It is easy to see that $|W_f| = |f|$ and that $f \mapsto W_f$ is a linear isomorphism of algebras if one considers $W^{(N)}$ with its natural product structure and $L^1$ as a noncommutative algebra with product structure

$$(f * g)(z) = \int dz'\,f(z - z')\,g(z')\,\exp\{-\tfrac{i}{2}\,\omega(z, z')\}$$ [14]

So far the algebra $W^{(N)}$ is a concrete algebra of bounded operators on $L^2(\mathbb{R}^N)$. But it can also be considered as an abstract C*-algebra, which we still denote by $W^{(N)}$.

It is easy to see that, according to [14], if $f_0$ is chosen to be a suitable Gaussian, $W_{f_0}$ is a projection operator which commutes with all the $W_f$'s. Moreover, $W_f W_g = \phi_{f,g}\,W_{f*g}$ for a suitable phase factor $\phi$. Considering the Gelfand–Neumark–Segal construction for the C*-algebra $W^{(N)}$, one finds that these properties lead to a decomposition of any representation into cyclic irreducible equivalent ones, completing the proof of the theorem.

The Weyl system has a representation (equivalent to the Schrödinger one) in the space $L^2(\mathbb{R}^N, g)$, where $g$ is Gauss's measure. This allows an extension in which $\mathbb{C}^N$ is replaced by an infinite-dimensional Banach space equipped with a Gauss measure (weak distribution (Segal 1965, Gross 1972, Wiener 1938)). Uniqueness fails in this more general setting (uniqueness is closely connected with the compactness of the unit ball in $\mathbb{C}^N$). Notice that in the Schrödinger representation (and, therefore, in any other representation) the Hamiltonian for the harmonic oscillator defines a positive self-adjoint operator

$$N = \sum_{k=1}^{N} N_k, \qquad N_k = \frac12\left(-\frac{\partial^2}{\partial x_k^2} + x_k^2 - 1\right)$$

The spectrum of each of the commuting operators $N_k$ consists of the nonnegative integers, and $N_k$ is therefore called the number operator for the $k$th degree of freedom. The operator $N_k$ can be written as $N_k = a_k^* a_k$, where $a_k = (1/\sqrt{2})(x_k + \partial/\partial x_k)$ and $a_k^*$ is the formal adjoint of $a_k$ in $L^2(\mathbb{R})$. One has $|a_k (N_k + 1)^{-1/2}| \le 1$. In the domain of $N$ these operators satisfy the following relations (canonical commutation relations)

$$[a_h, a_k^*] = \delta_{h,k}, \qquad [a_h, a_k] = 0, \qquad [N_h, a_k] = -a_k\,\delta_{h,k}, \qquad [N_h, a_k^*] = a_k^*\,\delta_{h,k}$$ [15]
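
The relations above can be illustrated for one degree of freedom by writing $a$ as a matrix in the basis of eigenvectors of $N$, where $a\phi_n = \sqrt{n}\,\phi_{n-1}$. The following is only a sketch on a truncated basis of `dim` vectors (the truncation and all function names are assumptions made for illustration): on such a finite block $[N, a] = -a$ holds exactly, while $[a, a^*] = 1$ necessarily fails in the last diagonal entry, a reminder that the canonical commutation relations have no finite-dimensional representation.

```python
import math


def annihilation(dim):
    """Matrix of a on span{phi_0, ..., phi_{dim-1}}: a phi_n = sqrt(n) phi_{n-1}."""
    a = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        a[n - 1][n] = math.sqrt(n)
    return a


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def commutator(x, y):
    xy, yx = matmul(x, y), matmul(y, x)
    return [[xy[i][j] - yx[i][j] for j in range(len(xy))] for i in range(len(xy))]


dim = 6
a = annihilation(dim)
a_dag = transpose(a)        # real matrix, so the adjoint is the transpose
number = matmul(a_dag, a)   # N = a* a, diagonal with entries 0, 1, ..., dim-1

spectrum = [number[n][n] for n in range(dim)]
n_a = commutator(number, a)     # equals -a, also on the truncated space
ccr = commutator(a, a_dag)      # identity except for the last diagonal entry
```

Inspecting `ccr` shows ones on the diagonal except for the final entry `-(dim - 1)`, which is the trace-zero obstruction introduced by the truncation.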


In view of the last two relations, the operator $a_k$ is called the annihilation operator (relative to the $k$th degree of freedom) and its formal adjoint is called the creation operator. The operators $a_k$ have as spectrum the entire complex plane, while the operators $a_k^*$ have empty point spectrum; the eigenvectors of $N_k$ are the Hermite polynomials in the variable $x_k$. The eigenvectors of $a_k$ (i.e., the solutions in $L^2(\mathbb{R})$ of the equation $a_k\phi_\lambda = \lambda\phi_\lambda$, $\lambda \in \mathbb{C}$) are called coherent states; they have a major role in the Bargmann–Fock–Segal quantization and in general in the semiclassical limit.

The operators $\{N_k\}$ generate a maximal abelian system and therefore the space $L^2(\mathbb{R}^N)$ has a natural representation as the symmetrized subspace of $\bigoplus_k (\mathbb{C}^N)^{\otimes k}$ (Fock representation). In this representation, a natural basis is given by the common eigenvectors $\phi_{\{n_k\}}$, $k = 1, \ldots, N$, of the operators $N_k$. A generic vector can be written as

$$\psi = \sum_{\{n_k\}} c_{\{n_k\}}\,\phi_{\{n_k\}}, \qquad \sum_{\{n_k\}} |c_{\{n_k\}}|^2 < \infty$$

and therefore can be represented by the sequence $\{c_{\{n_k\}}\}$.

Notice that the creation operators do not create particles in $\mathbb{R}^N$ but rather act as a shift in the basis of the Hermite polynomials.

It is traditional to denote by $\Gamma(L^2(\mathbb{R}^N))$ the Fock representation (also called second quantization because for each degree of freedom the wave function is written in the quantized basis of the harmonic oscillator) and to denote by $\Gamma(A)$ the lift of a matrix $A \in B(\mathbb{C}^N)$. These notations are especially used if $\mathbb{C}^N$ is substituted with a Banach space $X$. This terminology was introduced by Segal in his work on quantization of the wave equation; it has been used ever since, mostly in a perturbative context.

In the theory of quantized fields, the space $\mathbb{C}^N$ is substituted with a Banach space, $X$, of functions. In this setting, “second quantization” (Segal 1965, Nelson 1974) considers the state $\phi_{\{n_k\}}$ as representing a configuration of the system in which there are precisely $n_k$ particles in the $k$th physical state (this presupposes having chosen a basis in the space of distributions on $\mathbb{R}^3$). There is no problem in doing this (Gross 1972) and one can choose for $X$ a suitable Sobolev space (which one depends on the Gaussian measure given in $X$) if one wants the generalization of the commutation relations [15] to be of the form $[a(f), a^*(g)] = \langle f, g\rangle$ with a suitable scalar product $\langle\cdot,\cdot\rangle$ in $X$. The problem with quantization of relativistic fields is that, in order to ensure locality, one is forced to use a Sobolev space of negative index (depending on the dimension of physical space), and this gives rise to difficulties in the definition of the dynamics for nonlinear vector fields.

One should notice that in the work of Segal (1965), and then in constructive field theory (Nelson 1974), the Fock representation is placed in a Schrödinger context exhibiting the relevant operators as acting on a space $L^2(X, g)$, where $X$ is a subspace of the space of Schwartz distributions on the physical space of the particles one wants to describe and $g$ is a suitably defined Gauss measure on $X$.

The Fock representation is related to the Bargmann–Fock–Segal representation (Bargmann 1967), a representation in a space of holomorphic functions on $\mathbb{C}^N$ square integrable with respect to a Gaussian measure.
For its development, this representation relies on the 
properties of Toeplitz operators and on Tauberian 
estimates. It is much used in the study of the 
semiclassical limit and in the formulation of QM in 
systems for which the classical version has, for phase 
space, a manifold which is not a cotangent bundle 
(e.g., the 2-sphere). 


Remark The Fock representation associated with 
the Weyl system in the infinite-dimensional context 
can describe only particles obeying Bose-Einstein 
statistics; indeed, the states are qualified by their 
particle content for each element of the basis chosen 
and there is no possibility of identifying each 
particle in an N-particle state. This is obvious in 
the finite-dimensional case: the Hermite polynomial 
of order 2 cannot be seen as “composed” of two 
polynomials of order 1. 


In the infinite-dimensional context, if one wants to treat particles which obey Fermi–Dirac statistics, one must rely on the Pauli exclusion principle (Pauli 1928), which states that two such particles cannot be in the same configuration; to ensure this, the wave function must be antisymmetric under permutation of the particle symbols. It is a matter of fact (and a theorem in relativistic quantum field theory, which follows in that theory from covariance, locality, and positivity of the energy (Streater and Wightman 1964)) that particles with half-integer spin obey the Fermi–Dirac statistics. Therefore, to quantize such systems, one must introduce (commutation) relations different from those of Weyl. Since it must now be that $(a^*)^2 = 0$, due to antisymmetry, it is reasonable to introduce the following relations (canonical anticommutation relations):

$$\{a_h, a_k^*\} = \delta_{h,k}, \qquad \{a_h, a_k\} = 0, \qquad \{A, B\} \equiv AB + BA$$

$$[N_h, a_k] = -a_k\,\delta_{h,k}, \qquad [N_h, a_k^*] = a_k^*\,\delta_{h,k}$$ [16]


The Hilbert space is now $\bigotimes^N \mathcal{H}_2$, where $\mathcal{H}_2$ is a two-dimensional complex Hilbert space. Notice that $\mathcal{H}_2$ carries an irreducible two-dimensional representation of $su(2) \simeq o(3)$ (spin representation) so that this quantization associates spin 1/2 and antisymmetry.
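
For a single degree of freedom the anticommutation relations can be realized explicitly on the two-dimensional space. The sketch below uses one possible choice of basis (first vector "empty", second vector "occupied"); the matrices and helper names are assumptions made for illustration only.

```python
def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def add(x, y):
    return [[x[i][j] + y[i][j] for j in range(len(x[0]))] for i in range(len(x))]


# Annihilation operator: sends the occupied vector (0, 1) to the empty vector (1, 0).
a = [[0, 1],
     [0, 0]]
# Creation operator: the adjoint (here simply the transpose) of a.
a_dag = [[0, 0],
         [1, 0]]

a_squared = matmul(a, a)                                   # vanishes: a^2 = 0
anticommutator = add(matmul(a, a_dag), matmul(a_dag, a))   # {a, a*} = identity
number = matmul(a_dag, a)                                  # N = a* a with eigenvalues 0 and 1
```

Both `a` and `a_dag` have operator norm 1, in agreement with the boundedness of the operators entering the anticommutation relations.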

The operators in [16] are all bounded (in fact 
bounded by 1 in norm). The Fock representation is 
constructed as in the case of Weyl (see Araki (1988)), with $n_k$ equal to 0 or 1 for each index $k$.
The infinite-dimensional case is defined in the same 
way, and leads to inequivalent irreducible represen- 
tations (Araki 1988); only in one of them is the 
number operator defined and bounded below. Some 
of these representations can be given a Schrödinger- 
like form, with the introduction of a gauge and an 
integration formalism based on a trace (Gross 
1972). This system is much used in quantum 
statistical mechanics because it deals with bounded 
operators and can take advantage of strong results 
in the theory of C*-algebras. In the finite-dimensional 
case (and occasionally also in the general case) it is 
used in quantum information (the space H2 is the 
space of a quantum bit). 

Returning to the Weyl system, we now introduce the closely related Wigner function, which plays an important role in the analysis of the semiclassical limit and in the discussion of some scaling limits, in particular the hydrodynamical limit and the Bose–Einstein condensation when $N \to \infty$.

The Wigner function $W_\psi$ for a pure state $\psi$ is a real-valued function on the phase space of the classical system which represents the state faithfully. It is defined as

$$W_\psi(x,\xi) = (2\pi)^{-N} \int e^{-i(\xi,y)}\,\psi\!\left(x + \frac{y}{2}\right)\bar\psi\!\left(x - \frac{y}{2}\right)dy$$

The Wigner function is not positive in general (the only exceptions are those Gaussian states that satisfy $\Delta(x)\cdot\Delta(p) \ge \hbar/2$). But it has the interesting property that its marginals reproduce correctly the Born rule. In fact, one has $\int W_\psi(x,\xi)\,d\xi = |\psi(x)|^2$ and $\int W_\psi(x,\xi)\,dx = |\hat\psi(\xi)|^2$. If the function $\phi(t,x)$, $x \in \mathbb{R}^N$, is a solution of the free Schrödinger equation $i\hbar\,\partial\phi/\partial t = -\hbar^2\Delta\phi$, then its Wigner function satisfies the Liouville (transport) equation $\partial W_\phi/\partial t + \xi\cdot\nabla W_\phi = 0$.
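
The two properties just mentioned (lack of positivity, and marginals given by the Born rule) can be checked numerically in one dimension. The following sketch assumes the normalization $(2\pi)^{-1}\int e^{-i\xi y}\,\psi(x+y/2)\,\bar\psi(x-y/2)\,dy$ with $\hbar = 1$ and approximates the integral by the trapezoidal rule; the function names, grid sizes, and test states (the first two Hermite functions) are choices made here for illustration.

```python
import cmath
import math


def wigner(psi, x, xi, half_width=20.0, steps=4000):
    """Trapezoidal approximation of W_psi(x, xi) for one degree of freedom."""
    h = 2.0 * half_width / steps
    total = 0.0 + 0.0j
    for j in range(steps + 1):
        y = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += (weight * cmath.exp(-1j * xi * y)
                  * psi(x + y / 2) * psi(x - y / 2).conjugate())
    return (total * h / (2.0 * math.pi)).real


def position_marginal(psi, x, xi_max=8.0, xi_step=0.1):
    """Riemann sum over xi of W_psi(x, xi); should approximate |psi(x)|^2."""
    count = int(round(2 * xi_max / xi_step))
    return xi_step * sum(wigner(psi, x, -xi_max + j * xi_step, steps=800)
                         for j in range(count + 1))


def ground(x):
    """Normalized Gaussian (lowest Hermite function)."""
    return math.pi ** -0.25 * math.exp(-x * x / 2)


def first_excited(x):
    """Normalized first excited Hermite function."""
    return math.sqrt(2.0) * math.pi ** -0.25 * x * math.exp(-x * x / 2)


w_gauss = wigner(ground, 0.3, -0.7)          # positive, equals exp(-x^2 - xi^2) / pi
w_excited = wigner(first_excited, 0.0, 0.0)  # negative at the origin, equals -1 / pi
marginal = position_marginal(ground, 0.3)    # close to ground(0.3) ** 2
```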

The Wigner function is closely linked with the Weyl quantization. This quantization associates with every function $\sigma(p,x)$ in a given regularity class an operator $\sigma(D,x)$ (the function $\sigma$ is called the Weyl symbol of the operator) defined by

$$(\sigma(D,x)f, g) \equiv \int \sigma(\xi,x)\,W(f,g)(\xi,x)\,d\xi\,dx, \qquad W(f,g)(\xi,x) \equiv (2\pi)^{-N}\int e^{-i(\xi,p)}\,f\!\left(x+\frac{p}{2}\right)\bar g\!\left(x-\frac{p}{2}\right)dp$$

It can be verified that the map $W$ preserves the Schwartz classes $S$ and $S'$ and is unitary (up to a normalization constant) in $L^2(\mathbb{R}^{2N})$. Moreover, one has $\sigma(D,x)^* = \bar\sigma(D,x)$.

The relation between Weyl's quantization and Wigner functions can be readily seen from the natural duality between bounded operators and pure states:

$$\operatorname{tr}(\hat A\,\hat\rho) = \int a(p,q)\,\rho(p,q)\,dp\,dq, \qquad \rho(p,q) \equiv (2\pi)^{-N}\int e^{-i(p,q')}\,\rho\!\left(q+\frac{q'}{2},\,q-\frac{q'}{2}\right)dq'$$



We now give a brief discussion of the general structure of a quantization, and apply it to the Weyl quantization. By quantization of a Hamiltonian system we mean a correspondence, parametrized by a small parameter $\hbar$, between classical observables (real functions on a phase space $F$) and quantum observables (self-adjoint operators on a Hilbert space $\mathcal{H}$) with the property that the corresponding structures coincide in the limit $\hbar \to 0$ and the difference for $\hbar \ne 0$ can be estimated in a suitable topology.

This last requirement is important for the applications and, from this point of view, Weyl's quantization gives stronger results than the other formalisms of quantization.

We limit our analysis to the case $F = T^*X$, with $X = \mathbb{R}^N$, and we make use of the realization of $\mathcal{H}$ as $L^2(\mathbb{R}^N)$.

Let $\{x_k\}$ be Cartesian coordinates in $\mathbb{R}^N$ and consider a correspondence $A \mapsto \hat A$ that satisfies the following requirements:


1. $A \mapsto \hat A$ is linear;

2. $x_k \mapsto \hat x_k$, where $\hat x_k$ is multiplication by $x_k$;

3. $p_k \mapsto -i\hbar\,\partial/\partial x_k$;

4. if $f$ is a continuous function in $\mathbb{R}^N$, one has $f(x) \mapsto f(\hat x)$ and $f(p) \mapsto \mathcal{F}^{-1} f(\hat x)\,\mathcal{F}$, where $\mathcal{F}$ denotes a Fourier transform;

5. $L_\zeta \mapsto \hat L_\zeta$, $\zeta = (\alpha, \beta)$, $\alpha, \beta \in \mathbb{R}^N$, where $L_\zeta$ is the generator of the translations in phase space in the direction $\zeta$ and $\hat L_\zeta$ is the generator of the one-parameter group $t \mapsto W(t\zeta)$ associated with $\zeta$ by the Weyl system.


Note that (1) and (4) imply (2) and (3) through a limit procedure.

Under the correspondence $A \leftrightarrow \hat A$, linear symplectic maps correspond to unitary transformations. This is not in general the case for nonlinear maps.

One can prove that conditions (1)–(5) give a complete characterization of the map $A \mapsto \hat A$. Moreover, the correspondence cannot be extended to other functions in phase space. Indeed, one has:


Theorem 3 (van Hove). Let $G$ be the class of $C^\infty$ functions on $\mathbb{R}^{2N}$ which are generators of global symplectic flows. For $g \in G$ let $\Phi_g(t)$ be the corresponding group. There cannot exist for every $g$ a correspondence $g \mapsto \hat g$, with $\hat g$ self-adjoint, such that $\hat g(x, p) = g(\hat x, \hat p)$.


We described the Weyl quantization as a correspondence between functions in the Schwartz class $S$ and a class of bounded operators. Weyl's quantization can be extended to a much wider class of functions. Operators that can be so constructed are called Fourier integral operators. One uses the notation $\hat\sigma \equiv \sigma(D,x)$.

We have the following useful theorems (Robert 1987):


Theorem 4 Let $l_1, \ldots, l_K$ be linear functions on $\mathbb{R}^{2N}$ such that $\{l_h, l_k\} = 0$. Let $P$ be a polynomial and let $\sigma(\xi,x) \equiv P[l_1(\xi,x), \ldots, l_K(\xi,x)]$. Then


(i) $\sigma(D,x)$ maps $S$ into $L^2(\mathbb{R}^N)$ and is self-adjoint;
(ii) if $g$ is continuous, then $(g(\sigma))(D,x) = g(\sigma(D,x))$.


One proves that $\sigma(D,x)$ extends to a continuous map $S'(X) \to S'(X)$ and, moreover,


Theorem 5 (Calderón–Vaillancourt). If $\sigma_0 \equiv \sum_{|\alpha|+|\beta| \le 2N+1} |D_\xi^\alpha D_x^\beta \sigma|_\infty < \infty$, the norm of the operator $\sigma(D,x)$ is bounded by $\sigma_0$.


Any operator obtained from a suitable class of functions through Weyl's quantization is called a pseudodifferential operator. If $\sigma(q,p) = P(p)$, where $P$ is a polynomial, $\hat\sigma(p,q)$ is a differential operator.

Moreover, if $\sigma(p,x) \in L^2$ then $\sigma(D,x)$ is a Hilbert–Schmidt operator and

$$|\sigma(D,x)|^2_{HS} = (2\pi\hbar)^{-N} \int |\sigma(z)|^2\,dz$$



Pseudodifferential operators turn out to be very 
important in particular in the quantum theory of 
molecules (Le Bris 2003), where adiabatic analysis 
and Peierls substitution rules force the use of 
pseudodifferential operators. 

The next important problem in the theory of 
quantization is related to dynamics. 

Let $\mathcal{G}$ be a quantization procedure and let $H(p,q)$ be a classical Hamiltonian on phase space. Let $A_t$ be the evolution of a classical observable $A$ under the flow defined by $H$ and assume that $\mathcal{G}(A_t)$ is well defined for all $t$.

Is there a self-adjoint operator $\hat H$ such that $\mathcal{G}(A_t) = e^{it\hat H}\,\mathcal{G}(A)\,e^{-it\hat H}$? If so, can one estimate $|\hat H - \mathcal{G}(H)|$? Conversely, if the generator of the quantized flow is, by definition, $\mathcal{G}(H)$ (as is usually assumed), is it possible to give an estimate of the difference $|(\mathcal{G}(A_t) - (\mathcal{G}(A))_t)\,\phi|$ for a dense set of $\phi \in \mathcal{H}$, where $\hat A_t \equiv e^{it\mathcal{G}(H)}\,\hat A\,e^{-it\mathcal{G}(H)}$, or to estimate $|A_t - \tilde A_t|$, where $\tilde A_t$ is defined by $\mathcal{G}(\tilde A_t) = (\mathcal{G}(A))_t$? Is it possible to write an asymptotic series in $\hbar$ for the differences?

For the Weyl quantization some quantitative 
results have been obtained if one makes use of the 
semiclassical observables (Robert 1987). We shall 
not elaborate further on this point. 

For completeness, we briefly mention another 
quantization procedure which is often used in 
mathematical physics. 


Wick Quantization 


This quantization assigns positive operators to positive functions, but does not preserve polynomial relations. It is closely related to the Bargmann–Fock–Segal representation.

Call coherent state centered in the point $(y, \eta)$ of phase space the normalized solution of $(i\hat p + \hat x - i\eta - y)\,\phi_{y,\eta}(x) = 0$.

Wick's quantization of the classical observable $A$ is by definition the map $A \mapsto \mathrm{Op}^W_\hbar(A)$, where

$$\mathrm{Op}^W_\hbar(A)\,\psi \equiv (2\pi\hbar)^{-N} \int\!\!\int A(y,\eta)\,(\phi_{y,\eta}, \psi)\,\phi_{y,\eta}\,dy\,d\eta$$

One can prove, either directly or going through Weyl's representation, that


1. if $A \ge 0$ then $\mathrm{Op}^W_\hbar(A) \ge 0$;
2. the Weyl symbol of the operator $\mathrm{Op}^W_\hbar(A)$ is

$$(\pi\hbar)^{-N} \int\!\!\int A(y,\eta)\,e^{-\frac{1}{\hbar}[(x-y)^2 + (\xi-\eta)^2]}\,dy\,d\eta$$

3. for every $A \in O(0)$ one has $|\mathrm{Op}^W_\hbar(A) - \hat A| = O(\hbar)$.


Wick's quantization associates with every vector $\psi \in \mathcal{H}$ a positive Radon measure $\mu_\psi$ in phase space, called the Husimi measure. It is defined by $\int A\,d\mu_\psi = (\mathrm{Op}^W_\hbar(A)\psi, \psi)$, $A \in S(Z)$. Wick's quantization is less adapted to the treatment of nonrelativistic particles; in particular, Ehrenfest's rule does not apply, and the semiclassical propagation theorem has a more complicated formulation. It is very much used for the analysis in Fock space in the theory of quantized relativistic fields, where a special role is assigned to Wick ordering, according to which the polynomials in $\hat x_k$ and $\hat p_k$ are reordered in terms of creation and annihilation operators by placing all creation operators to the left.
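
For one degree of freedom, the reordering that places all creation operators to the left can be carried out mechanically using only $[a, a^*] = 1$. The sketch below encodes a product of operators as a string, with `c` standing for $a^*$ and `a` for $a$ (an encoding chosen here purely for illustration), and relies on the identity $a^n (a^*)^p = \sum_k k!\binom{n}{k}\binom{p}{k}(a^*)^{p-k}a^{n-k}$.

```python
from math import comb, factorial


def wick_order(word):
    """Rewrite a product of 'c' (creation) and 'a' (annihilation) letters,
    read left to right, with all creation operators on the left.

    Returns {(m, n): coeff}, meaning the sum of coeff * (a*)^m a^n.
    """
    result = {(0, 0): 1}  # start from the identity operator
    for letter in word:
        p, q = (1, 0) if letter == "c" else (0, 1)
        updated = {}
        for (m, n), coeff in result.items():
            # (a*)^m a^n (a*)^p a^q
            #   = sum_k k! C(n,k) C(p,k) (a*)^(m+p-k) a^(n+q-k)
            for k in range(min(n, p) + 1):
                key = (m + p - k, n + q - k)
                term = coeff * factorial(k) * comb(n, k) * comb(p, k)
                updated[key] = updated.get(key, 0) + term
        result = updated
    return result


one_pair = wick_order("ac")     # a a* = a* a + 1
two_pairs = wick_order("aacc")  # a^2 (a*)^2 = (a*)^2 a^2 + 4 a* a + 2
```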

We now come back to Schrödinger's equation and notice that it can be derived within Heisenberg's formalism and Weyl's quantization scheme from the Hamiltonian of an N-particle system in Hamiltonian mechanics (at least if one neglects spin, which has no classical analog).

Apart from (often) inessential parameters, the Schrödinger equation for $N$ scalar particles in $\mathbb{R}^3$ can be written as

$$i\hbar\,\frac{\partial\phi}{\partial t} = \sum_{k=1}^{N} (i\hbar\nabla_k + A_k)^2\,\phi + V\phi \equiv H\phi, \qquad \phi \in L^2(\mathbb{R}^{3N})$$ [17]

where $A_k$ are vector-valued functions (vector potentials) and $V = \sum_k V_k(x_k) + \sum_{i<k} V_{i,k}(x_i - x_k)$ are scalar-valued functions (scalar potentials) on $\mathbb{R}^3$.

Typical problems in Schrödinger's quantum mechanics are:


1. Self-adjointness of $H$, existence of bound states (discrete spectrum of the operator), their number and distribution, and, in general, the properties of the spectrum.

2. Existence, completeness, and continuity properties of the wave operators

$$W_\pm \equiv \operatorname{s-lim}_{t\to\pm\infty} e^{itH}\,e^{-itH_0}$$ [18]

and the ensuing existence and properties of the S-matrix and of the scattering cross sections. In [18] $H_0$ is a suitable reference operator, usually $-\Delta$ (with periodic boundary conditions if the potentials are periodic in space), for which Schrödinger's equation can be somewhat analytically controlled.

3. Existence and properties of a semiclassical limit.
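
As a small numerical illustration of problem (1), the lowest bound-state energies of a one-dimensional operator $-d^2/dx^2 + V(x)$ can be approximated by a finite-difference discretization. The sketch below makes several assumptions not taken from the text: a uniform grid on $[-8, 8]$ with Dirichlet boundary conditions, the standard three-point Laplacian, and a Sturm-sequence count combined with bisection to locate eigenvalues of the resulting tridiagonal matrix. It is tried on the harmonic potential $V(x) = x^2$, whose eigenvalues are $2n + 1$.

```python
def count_below(diag, off, lam):
    """Number of eigenvalues below lam of the symmetric tridiagonal matrix
    with diagonal `diag` and constant off-diagonal entry `off` (Sturm count)."""
    count = 0
    pivot = 1.0
    for i, d in enumerate(diag):
        pivot = d - lam - (off * off / pivot if i > 0 else 0.0)
        if pivot == 0.0:
            pivot = 1e-290  # nudge an exact zero so the recurrence stays defined
        if pivot < 0.0:
            count += 1
    return count


def eigenvalue(potential, index, half_width=8.0, points=800):
    """index-th eigenvalue (0 = lowest) of the discretized -d^2/dx^2 + V."""
    h = 2.0 * half_width / (points + 1)
    diag = [2.0 / h ** 2 + potential(-half_width + (i + 1) * h)
            for i in range(points)]
    off = -1.0 / h ** 2
    # Gershgorin bounds enclose the whole spectrum of the matrix.
    lo = min(diag) - 2.0 * abs(off)
    hi = max(diag) + 2.0 * abs(off)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) > index:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


levels = [eigenvalue(lambda x: x * x, k) for k in range(3)]  # near 1, 3, 5
```

The discretization error is of second order in the grid spacing, so the computed values agree with $1, 3, 5$ only approximately.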


In [17] and [18] we have implicitly assumed that H 
is time independent; very interesting problems arise 
when H depends on time, in particular if it is 
periodic or quasiperiodic in time, giving rise to 
ionization phenomena. In the periodic case, one is 
helped by Floquet’s theory, but even in this case 
many interesting problems are still unsolved. 

If the potentials are sufficiently regular, the 
spectrum of H consists of an absolutely continuous 
part (made up of several bands in the space-periodic 
case) and a discrete part, with few accumulation 
points. 

By contrast, if $V(x,\omega)$ is a measurable function on some probability space $\Omega$, with a suitable distribution (e.g., Gaussian), the spectrum may have totally different properties almost surely. For example, in the case $N = 1$ (so that the terms $V_{i,j}$ are absent) in one and two spatial dimensions the
spectrum is pure point and dense, with eigenfunctions 
which decrease at infinity exponentially fast (although 
not uniformly); as a consequence, the evolution group 
does not give rise to a dispersive motion. The same is 
true in three dimensions if the potential is sufficiently 
strong and the kinetic energy content of the initial state 
is sufficiently limited. This very interesting behavior is 
due roughly to the randomness of the “barriers” 
generated by the potential and is also present, to a 
large extent, for potentials quasiperiodic in space 
(Pastur and Figotin 1992). 

In these as well as in most problems related to Schrödinger's equation, a crucial role is taken by the resolvent operator $(H - \lambda I)^{-1}$, where $\lambda$ is any complex number outside the spectrum of $H$; many of the results are obtained when the difference $(H - \lambda I)^{-1} - (H_0 - \lambda I)^{-1}$ is a compact operator.

Problems of type (1) and (2) are of great physical 
interest, and are of course common with theoretical 
physics and quantum chemistry (Le Bris 2003), 
although the instruments of investigation are some- 
what different in mathematical physics. The semi- 
classical limit is often more of theoretical interest, 
but its analysis has relevance in quantum chemistry 
and its methods are very useful whenever it is 
convenient to use multiscale methods, as in the 
study of molecular spectra. 

We start with a brief description of point (3); it provides a valid instrument in the description of quantum-mechanical systems at a scale where it is convenient to use units in which the physical constant $\hbar$ has a very small value ($\hbar \simeq 10^{-27}$ in CGS units). From Heisenberg's commutation relations, $[\hat x, \hat p] \subset i\hbar I$, it follows that the product of the dispersion (uncertainty) of the position and momentum variables is proportional to $\hbar$ and therefore at least one of these two quantities must have very large values (compared to $\hbar$). One considers usually the case in which these dispersions have comparable values, which are therefore very small, of the order of magnitude $\hbar^{1/2}$ (but very large as compared with $\hbar$). In order to make connection with the Hamilton–Jacobi formalism of classical mechanics one can also consider the case in which the dispersion in momentum is of the order $\hbar$ (the WKB method).

The semiclassical limit takes advantage mathematically of the fact that the parameter $\hbar$ is very small in natural units, and performs an asymptotic analysis, in which the terms of “lowest order” are exactly described and the difference is estimated. The problem one faces is that the Schrödinger equation becomes, in the “mathematical limit” $\hbar \to 0$, a very singular PDE (the coefficients of the differential terms go to zero in this limit).

Dividing each term of the equation by $\hbar$ (because we do not want to change the scale of time) leads, in the case of one quantum particle in $\mathbb{R}^3$ in a potential field $V(x)$ (we treat, for simplicity, only this case), to the equation

$$i\,\frac{\partial\phi(x,t)}{\partial t} = -\hbar\,\Delta\phi(x,t) + \hbar^{-1}\,V(x)\,\phi(x,t)$$ [19]

It is convenient therefore to “rescale” the spatial variables by a factor $\hbar^{1/2}$ (i.e., choose different units), setting $x = \sqrt{\hbar}\,X$, and look for solutions of [19] which remain regular in the limit $\hbar \to 0$ as functions of the rescaled variable $X$. One searches therefore for solutions that on the “physical scale” have support that becomes “vanishingly small” in the limit. It is therefore not surprising that, in the limit, these solutions may describe point particles; the main result of semiclassical analysis is that the coordinates of these particles obey Hamilton's laws of classical mechanics.

This can be roughly seen as follows (accurate 
estimates are needed to make this empirical analysis 
precise). Using multiscale analysis, one may write the 
solution in the form ¢(X,x,t) and seek solutions 
which are smooth in $X$ and $x$. Both terms on the right-hand side of [19] contain contributions of order $-2$ and $-1$ in $\sqrt{\hbar}$ and in order to have regular solutions
one must have cancellations between equally singular 
contributions. For this, one must perform an expan- 
sion to the second order of the potential (assumed at 
least twice differentiable) around a suitable trajectory 
q(t), q € R°, and choose this trajectory in such a way 
that the cancellations take place. 

A formal analysis shows that this is achieved only 
if the trajectory chosen is precisely a solution of the 
classical Lagrange equations. Of course, a more 
refined analysis and good estimates are needed to 
make this argument precise, and to estimate the 
error that is made when one neglects in the resulting equation terms of order $\sqrt{\hbar}$; in favorable cases, for each chosen $T$ the error in the solution for most initial conditions of the type described is of order $\sqrt{\hbar}$ for $|t| < T$.

This semiclassical result is most easily visualized using the formalism of Wigner functions (the technical details needed to turn the formal arguments into a proof take advantage of regularity estimates in the theory of functions).

In natural units, one defines

$$W^\hbar_\psi(x,\xi,t) \equiv \left(\frac{1}{\hbar}\right)^{N} W_\psi\!\left(x, \frac{\xi}{\hbar}, t\right)$$

In terms of the Wigner function $f^\hbar \equiv W^\hbar_\psi$, the Schrödinger equation [19] takes the form

$$\frac{\partial f^\hbar}{\partial t} + \xi\cdot\nabla_x f^\hbar + K_\hbar * f^\hbar = 0, \qquad f^\hbar(t=0) = f_0(\hbar)$$ [20]

where the convolution is taken in the variable $\xi$ and

$$K_\hbar(x,\xi) = \frac{i}{(2\pi)^N} \int e^{-i(\xi,y)}\,\hbar^{-1}\left[V\!\left(x + \frac{\hbar y}{2}\right) - V\!\left(x - \frac{\hbar y}{2}\right)\right]dy$$

It can be proved (Robert 1987) that if the potential is sufficiently regular and if the initial datum converges in a suitable topology to a positive measure $f_0$, then, for all times, $W^\hbar_\psi(x,\xi;t)$ converges to a (weak) solution of the Liouville equation

$$\frac{\partial f}{\partial t} + \xi\cdot\nabla_x f - \nabla V(x)\cdot\nabla_\xi f = 0$$

This leads to the semiclassical limit if, for example, one considers a sequence of initial data $\phi_\hbar$, where $\phi_\hbar$ is a sequence of functions centered at $x_0$ with Fourier transform centered at $p_0$ and dispersion of order $\hbar^{1/2}$ both in position and in momentum. In this case, the limit measure is a Dirac measure centered on the classical paths.

In the course of the proof of the semiclassical limit theorem, one becomes aware of the special status of the Hamiltonians that are at most quadratic in $x$ and $p$. Indeed, it is easy to verify that for these Hamiltonians the expectation values of $\hat x$ and $\hat p$ obey the classical equations of motion (P. Ehrenfest's rule).

From the point of view of Heisenberg, this can be understood as a consequence of the fact that operators at most bilinear in $a$ and $a^*$ form an algebra $D$ under commutation and, moreover, the homogeneous part of order 2 is a closed subalgebra such that its action on $D$ (by commutation) has the same structure as the algebra of generators of the Hamiltonian flow and its tangent flow. Apart from (important) technicalities, the proof of the semiclassical limit theorem reduces to the proof that one can estimate the contribution of the terms of order higher than 2 in the expansion of the quantum Hamiltonian at the classical trajectory as being of order $\hbar^{1/2}$ in a suitable topology (Hepp 1974).

We end this overview by giving a brief analysis of problems (1) and (2), which refer to the description of phenomena that are directly accessible to comparison with experimental data, and therefore have been extensively studied in theoretical physics and quantum chemistry (McWeeny 1992); some of them have been analyzed with the instruments of mathematical physics, often with considerable success. We give here a very naive introduction to these problems and refer the reader to the more specialized contributions to this Encyclopedia for a rigorous analysis and exact statements.

Of course, most of the problems of physical 
interest are not “exactly solvable,” in the sense that 
rarely the final result is given explicitly in terms of 
simple functions. As a consequence, exact numerical 
results, to be compared with experimental data, are 
rarely obtained in physically relevant problems, and 
most often one has to rely on approximation 
schemes with (in favorable cases) precise estimates 
on the error. 

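Formal perturbation theory is the easiest of such schemes, but it seldom gives reliable results for physically interesting problems. One writes

$$H_\epsilon \equiv H_0 + \epsilon V$$ [21]

where $\epsilon$ is a small real parameter, and sets a formal scheme in case (1) by writing

$$H_\epsilon\,\phi_\epsilon = E_\epsilon\,\phi_\epsilon, \qquad \phi_\epsilon \equiv \sum_{k=0}^{\infty} \epsilon^k \phi_k, \qquad E_\epsilon \equiv \sum_{k=0}^{\infty} \epsilon^k E_k$$

and, in case (2), iterating Duhamel's formula

$$e^{-itH_\epsilon} = e^{-itH_0} - i\epsilon \int_0^t e^{-i(t-s)H_\epsilon}\,V\,e^{-isH_0}\,ds$$ [22]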
0 


The perturbation series converges only very seldom, and one has to resort to more refined procedures.
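
In finite dimensions, where convergence for small $\epsilon$ is not an issue, the formal scheme for case (1) is easy to test. The sketch below takes a hypothetical two-level example, $H_0 = \mathrm{diag}(e_1, e_2)$ with $e_1 < e_2$ and a real symmetric $2 \times 2$ matrix $V$, and compares the exact lowest eigenvalue with the series truncated at second order, $E_\epsilon \approx e_1 + \epsilon V_{11} + \epsilon^2 V_{12}^2/(e_1 - e_2)$. All numerical values are illustrative choices.

```python
import math


def exact_lowest(e1, e2, v, eps):
    """Lowest eigenvalue of diag(e1, e2) + eps * v for a real symmetric 2x2 v."""
    a = e1 + eps * v[0][0]
    b = e2 + eps * v[1][1]
    c = eps * v[0][1]
    return 0.5 * (a + b) - math.sqrt(0.25 * (a - b) ** 2 + c * c)


def second_order(e1, e2, v, eps):
    """Perturbation series for the level starting at e1, truncated at order eps^2."""
    return e1 + eps * v[0][0] + eps ** 2 * v[0][1] ** 2 / (e1 - e2)


v = [[0.3, 0.5],
     [0.5, -0.2]]
eps = 0.01
exact = exact_lowest(0.0, 1.0, v, eps)
approx = second_order(0.0, 1.0, v, eps)
remainder = exact - approx  # of order eps^3 in this example
```

Halving `eps` should reduce `remainder` by roughly a factor of eight, as expected of a third-order error term.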

In some cases, it turns out to be convenient to consider the formal primitive of $E_\epsilon$ (as a differentiable function of $\epsilon$) and prove that it is differentiable in $\epsilon$ for $0 < \epsilon < \epsilon_0$ (but not for $\epsilon = 0$). In favorable cases, this procedure may lead to

$$E_\epsilon = \sum_{k=0}^{N} \epsilon^k E_k + R_N(\epsilon), \qquad \lim_{N\to\infty} |R_N|(\epsilon) = +\infty$$

with explicit estimates of $|R_N(\epsilon)|$ for $0 < \epsilon < \epsilon_0$.

Resummation techniques for the formal power series may be of help in some cases.

The estimate of the lowest eigenvalues of an operator bounded below is often done by variational analysis, making use of min–max techniques applied to the quadratic form $Q(\phi) \equiv (\phi, H\phi)$.
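
A minimal sketch of the variational idea, for the hypothetical example $H = -d^2/dx^2 + x^4$ in one dimension: on the normalized Gaussian trial functions $\phi_\alpha(x) \propto e^{-\alpha x^2/2}$ the quadratic form can be computed in closed form, $Q(\phi_\alpha) = \alpha/2 + 3/(4\alpha^2)$, and its minimum over $\alpha$ gives an upper bound for the lowest eigenvalue. The one-parameter trial family and the golden-section search are choices made here for illustration.

```python
import math


def quadratic_form(alpha):
    """Q(phi_alpha) for H = -d^2/dx^2 + x^4 on the normalized Gaussian
    phi_alpha(x) proportional to exp(-alpha x^2 / 2):
    kinetic part alpha / 2, potential part 3 / (4 alpha^2)."""
    return alpha / 2.0 + 3.0 / (4.0 * alpha ** 2)


def minimize(f, lo, hi, iterations=200):
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        left = hi - ratio * (hi - lo)
        right = lo + ratio * (hi - lo)
        if f(left) < f(right):
            hi = right
        else:
            lo = left
    return 0.5 * (lo + hi)


best_alpha = minimize(quadratic_form, 0.1, 10.0)  # analytically 3 ** (1/3)
upper_bound = quadratic_form(best_alpha)          # analytically 0.75 * 3 ** (1/3)
```

The resulting bound, about 1.08, lies slightly above the ground-state energy of this operator (numerically close to 1.06); enlarging the trial family can only lower the bound.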

Semiclassical analysis can be useful to search for 
the distribution of eigenvalues and in the study of 
the dynamics of states whose dispersions both in 
position and in momentum are very large in units in 
which h=1. 

A case of particular interest in molecular and atomic physics occurs when the physical parameters which appear in $H$ (typically the masses of the particles involved in the process) are such that one can a priori guess the presence of coordinates which have a rapid dependence on time (fast variables) and a complementary set of coordinates whose dependence on time is slow. This suggests that one can try an asymptotic analysis, often in connection with adiabatic techniques. One seldom deals with cases in which the hypotheses of elementary adiabatic theorems are satisfied, and one has to refine the analysis, mostly through subtle estimates which ensure the existence of quasi-invariant subspaces.

Asymptotic techniques and refined estimates are also needed to study the effective description of a system of $N$ interacting identical particles when $N$ becomes very large; for example, in statistical mechanics, one searches for results which are valid when $N \to \infty$.

The most spectacular results in this direction are the proof of stability of matter by E. Lieb and collaborators, and the study of the phenomenon of Bose–Einstein condensation and the related Gross–Pitaevskii (nonlinear Schrödinger) equation. The experimental discovery of the state of matter corresponding to a Bose–Einstein condensate is clear evidence of the nonclassical behavior of matter even at a comparatively macroscopic size. From the point of view of mathematical physics, the ongoing research in this direction is very challenging.

One should also recognize the increasing role that 
research in QM is taking in applications, also in 
connection with the increasing success of nanotech- 
nology. In this respect, from the point of view of 
mathematical physics, the study of nanostructure 
(quantum-mechanical systems constrained to very 
small regions of space or to lower-dimensional 
manifolds, such as sheets or graphs) is still in its 
infancy and will require refined mathematical 
techniques and most likely entirely new ideas. 

Finally, one should stress the important role 
played by numerical analysis (Le Bris 2003) and 
especially computer simulations. In problems involv- 
ing very many particles, present-day analytical 
techniques provide at most qualitative estimates 
and in favorable cases bounds on the value of the 
quantities of interest. Approximation schemes are 
not always applicable and often are not reliable. 

Hints for progress in the mathematical treatment of some relevant physical phenomena of interest in QM (mostly in condensed matter physics) may come from the ab initio analysis made by simulations on large computers; this may provide a qualitative and, to a certain extent, quantitative picture of the behavior of the solutions of Schrödinger's equation corresponding to “typical” initial conditions. In recent times the availability of more efficient computing tools has made computer simulation more reliable and better suited to contribute, together with mathematical investigation, to a fuller comprehension of QM.


Interpretation Problems 


In this section we describe some of the conceptual 
problems that plague present-day QM and some of 
the attempts that have been made to cure these 
problems, either within its formalism or with an 
altogether different approach. 


Approaches within the QM Formalism 


We begin with the approaches “from within.” We 
have pointed out that the main obstacle in the 
measurement problem is the description of what 
occurs during an act of measurement. Axiom III 
claims that it must be seen as a “destruction” act, 
and the outcome is to some extent random. The 
final state of the system is one of the eigenstates of 
the observable, and the dependence on the initial 
state is only through an a priori probability assign- 
ment; the act of measurement is therefore not a 
causal one, contrary to the (continuous) causal 
reversible description of the interaction with the 
environment. One should be able to distinguish 
a priori the acts of measurement from a generic 
interaction. 

There is a further difficulty. Due to the superposition
principle, if a system S on which we want
to make a measurement of the property associated
with the operator A “interacts” with an instrument
T described by the operator S, the final state $\xi$ of the
combined system will be a coherent superposition of
tensor products of (normalized) eigenstates of the
two systems


$$\xi = \sum_{n,m} c_{n,m}\, \phi^A_n \otimes \phi^S_m, \qquad \sum_{n,m} |c_{n,m}|^2 = 1 \qquad [23]$$


Measurement as described by Axiom III of QM
claims that once the measurement is over, the
measured system is, with probability $\sum_m |c_{n,m}|^2$, in
the state $\phi^A_n$ and the instrument is in a state which
carries the information about the final state of the
system (after all, what one reads at the end is an
indicator of the final state of the instrument).
It is therefore convenient to write $\xi$ in the form


$$\xi = \sum_n d_n\, \phi^A_n \otimes \zeta_n, \qquad \sum_n |d_n|^2 = 1 \qquad [24]$$

(this defines $\zeta_n$ if the spectrum of A is pure point and
nondegenerate). It is seen from [24] that, due to the
reduction postulate, we know that the measured
system is in the state $\phi^A_n$ if a measurement of an
observable T with nondegenerate spectrum,
eigenvectors $\{\zeta_n\}$, and eigenvalues $\{z_n\}$ gives the
result $z_n$.
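
The regrouping that leads from [23] to [24] can be checked on a small example. The following is only a sketch under stated assumptions: both systems are taken to be finite-dimensional, the coefficient matrix `c` is an illustrative choice not taken from the text, and `regroup` is a hypothetical helper name. Each row of `c` is normalized to give the instrument state attached to one eigenstate of A, and the squared norm of the row is the probability of that outcome.

```python
import math

def regroup(c):
    """Rewrite sum_{n,m} c[n][m] phi^A_n (x) phi^S_m as
    sum_n d[n] phi^A_n (x) zeta_n with each zeta_n normalized.
    Returns (d, zeta); zeta[n] lists the components of zeta_n
    in the instrument basis phi^S_m."""
    d, zeta = [], []
    for row in c:
        norm = math.sqrt(sum(abs(z) ** 2 for z in row))
        d.append(norm)
        # If d_n = 0 the state zeta_n is undefined; keep a zero vector.
        zeta.append([z / norm if norm > 0 else 0j for z in row])
    return d, zeta

# Assumed coefficients with sum |c_nm|^2 = 1, chosen so that the
# instrument states zeta_0 and zeta_1 come out orthogonal.
s = 1 / math.sqrt(2)
c = [[0.6 * s, 0.6 * s], [0.8 * s, -0.8 * s]]
d, zeta = regroup(c)
probabilities = [x ** 2 for x in d]
print(probabilities)  # approximately 0.36 and 0.64
```

In general the normalized rows need not be orthogonal; they can serve as eigenvectors of a single instrument observable only when they are, which is why the example was chosen that way.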

Along these lines, one does not solve the measure- 
ment problem (the outcome is still probabilistic) but 
at least one can find the reason why the measuring 
apparatus may be considered “classical.” 

It is more convenient to go back to [23] and to
assume that one is able to construct the measuring
apparatus in such a way that one divides (roughly)
its pure (microscopic) states into sets $\Phi_n$ (each
corresponding to a “macroscopic” state) which are
(roughly) in one-to-one correspondence with the
eigenstates of A. The sets $\Phi_n$ contain a very large
number, $N_{\Phi_n}$, of elements, so that the sets $\Phi_n$ need
not be given with extreme precision. And the sets $\Phi_n$
must be in a sense “stable” under small external
perturbations.

It is clear from this rough description that the
apparatus should contain a large number of small
components and still its interaction with the “small”
system A should lead to a more or less sudden
change of the sets $\Phi_n$.

A concrete model of this mechanism has been 
proposed by K Hepp (1972) for the case when A is a 
$2 \times 2$ matrix, and the measuring apparatus is made
of a chain of N spins, $N \to \infty$; the analysis was
recently completed by Sewell (2005) with an 
estimate on the error which is made if N is finite 
but large. This is a dynamical model, in which the 
observable A (a spin) interacts with a chain of spins 
(“moves over the spins”) leaving the trace of its 
passage. It is this trace (final macroscopic state of 
the apparatus) which is measured and associated 
with the final state of A. The interaction is not 
“instantaneous” but may require a very short time, 
depending on the parameters used to describe the 
apparatus and the interaction. 

We call “decoherence” the weakening of the 
superposition principle due to the interaction with 
the environment. 

Two different models of decoherence have been 
analyzed in some detail; we shall denote them 
thermal-bath model and scattering model; both are 
dynamical models and both point to a solution, to 
various extents, of the problem of the reduction to a 
final density matrix which commutes with the 
operator A (and therefore to the suppression of the 
interference terms). 

The thermal-bath model makes use of the 
Heisenberg representation and relies on results of 
the theory of C*-algebras. This approach is closely 
linked with (quantum) statistical mechanics; its aim 
is to prove, after conditioning with respect to the 
degrees of freedom of the bath, that a special role 
emerges for a commuting set of operators of the
measured system, and these are the observables that
specify the outcome of the measurement in probabilistic
terms.

The scattering approach relies on the Schrödinger
approach to QM, and on results from the theory of 
scattering. This approach describes the interaction of 
the system S (typically a heavy particle) with an 
environment made of a large number of light particles 
and seeks to describe the state of S after the 
interaction when one does not have any information 
on the final state of the light particles. One seeks to
prove that the reduced density matrix is (almost) 
diagonal in a given representation (typically the one 
given by the spatial coordinates). This defines the 
observable (typically, position) that can be measured 
and the probability of each outcome. 

Both approaches rely on the loss of information in 
the process to cancel the effect of the superposition 
principle and to bring the measurement problem 
within the realm of classical probability theory. 
Neither of them provides a causal dependence of the
result of the measurement on the initial state of the 
system. 

We describe these attempts only very briefly.

In its more basic form, the “scattering approach” 
has as starting point the Schrödinger equation for a 
system of two particles, one of which has mass very 
much smaller than the other one. The heavy particle 
may be seen as representing the system on which a 
measurement is being made. The outline of the 
method of analysis (which in favorable cases can be 
made rigorous) (Joos and Zeh 1985, Tegmark 1993) 
is the following. One chooses units in which the 
mass of the heavy particle is 1, and one denotes by $\epsilon$
the mass of the light particle. If x is the coordinate
of the heavy particle and y that of the light one, and
if the initial state of the system is denoted by
$\Phi_0(x,y)$, the solution of the equation for the system
is (apart from inessential factors)


$$\Phi_t = \exp\{i(-\Delta_x - \epsilon^{-1}\Delta_y + W(x) + V(x - y))t\}\,\Phi_0$$


Making use of center-of-mass and relative coordinates,
one sees that when $\epsilon$ is very small one should
be able to describe the system on two timescales,
one fast (for the light particle) and one slow (for the
heavy one) and, therefore, place oneself in a setting
which may allow the use of adiabatic techniques. In
this setting, for the measurement of the heavy particle
(e.g., of its position) one may be allowed to consider
the light particle in a scattering regime, and use the
wave operator corresponding to a potential
$V_x(y) = V(y - x)$.

Taking the partial trace with respect to the
degrees of freedom of the light particle (this
corresponds to no information on its final state) one
finds, at least heuristically, that the state of the
heavy particle is now described (due to the trace
operation) by a density matrix $\sigma$ for which in the
coordinate representation the off-diagonal terms
$\sigma_{x,x'}$ are slightly suppressed by a factor
$\zeta_{x,x'} = 1 - (W^{+}_{x}\psi, W^{+}_{x'}\psi)$, where $\psi$ represents the initial state of
the light particle and $W^{+}_{x}$ is the wave operator for
the motion of the light particle in the potential $\epsilon V_x$.
One must assume that the function $\phi$ which represents
the initial state of the heavy particle is sufficiently
localized so that $\zeta_{x,x'} < 1$ for every $x' \neq x$ in its
support.

If the environment is made of very many
particles (their number $N(\epsilon)$ must be such that
$\lim_{\epsilon \to 0} \epsilon N(\epsilon) = \infty$) and the heavy particle can be
supposed to have separate interactions with all of
them, the off-diagonal elements of the density
matrix tend to 0 as $\epsilon \to 0$ and the resulting density
matrix tends to have the form $\sigma(x,x') = \delta(x - x')\rho(x)$,
$\rho(x) \geq 0$, $\int \rho(x)\,dx = 1$. If it can be supposed
that all interactions take place within a time $T(\epsilon) < \epsilon^{\alpha}$,
$\alpha > 0$, one has $\rho(x) = |\phi(x)|^2$.
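
The mechanism by which independent interactions wipe out the off-diagonal terms can be illustrated with a toy model. This is a sketch under strong assumptions and is not the scattering computation itself: the heavy particle sits in a superposition of two positions, each light particle is a two-state system, and a light particle ends in a state rotated by an assumed angle `theta` when the heavy particle is at the second position. The names `kron`, `reduced_density_matrix`, and `scattered_env` are hypothetical helpers. Tracing out n light particles multiplies the off-diagonal element by the n-th power of the single-particle overlap.

```python
import math

def kron(u, v):
    """Kronecker product of two state vectors given as lists."""
    return [a * b for a in u for b in v]

def reduced_density_matrix(amps, env_states):
    """Partial trace over the environment for the state
    sum_i amps[i] |x_i> (x) env_states[i]."""
    rows = [[a * e for e in env] for a, env in zip(amps, env_states)]
    return [[sum(p * q.conjugate() for p, q in zip(ri, rj)) for rj in rows]
            for ri in rows]

def scattered_env(theta, n):
    """State of n light particles, each rotated by angle theta (assumed model)."""
    single = [complex(math.cos(theta)), complex(math.sin(theta))]
    state = [1 + 0j]
    for _ in range(n):
        state = kron(state, single)
    return state

amps = [complex(1 / math.sqrt(2)), complex(1 / math.sqrt(2))]
overlap = math.cos(0.3)  # overlap of the two single-particle final states
for n in (1, 4, 8):
    sigma = reduced_density_matrix(
        amps, [scattered_env(0.0, n), scattered_env(0.3, n)])
    # The two printed numbers agree: |sigma_01| = 0.5 * overlap**n.
    print(n, abs(sigma[0][1]), 0.5 * overlap ** n)
```

The diagonal entries stay equal to the initial probabilities, while the off-diagonal entry decays geometrically in the number of light particles, however close to 1 the single-particle overlap is.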

If the interactions are not independent, the 
analysis becomes much more involved since it has 
to be treated by many-body scattering theory; this 
suggests that the scattering approach can hardly be
used in the context of the “thermal-bath model.” In
any case, the selection of a “preferred basis” (the 
coordinate representation) depends on the fact that 
one is dealing with a scattering phenomenon. A few 
steps have been made for a rigorous analysis (Teta 
2004) but we are very far from a mathematically 
satisfactory answer. 

The thermal-bath approach has been studied 
within the algebraic formulation of QM and stands 
on good mathematical ground (Alicki 2002, 
Blanchard et al. 2003, Sewell 2005). Its drawback 
is that it is difficult to associate the formal scheme 
with actual physical situations and it is difficult to 
give a realistic estimate on the decoherence time. 

The thermal-bath approach attributes the deco- 
herence effect to the practical impossibility of 
distinguishing between a vast majority of the pure 
states of the systems and the corresponding statis- 
tical mixtures. In this approach, the observables are 
represented by self-adjoint elements of a weakly
closed subalgebra $\mathcal{M}$ of the algebra $B(\mathcal{H})$ of all bounded operators
on a Hilbert space $\mathcal{H}$. This subalgebra may depend
on the measuring apparatus (i.e., not all
apparatuses are fit to measure a given set of observables).
A “classical” observable by definition commutes
with all other observables and therefore must belong
to the center of $\mathcal{M}$, which is isomorphic to a
collection of functions on a probability space M.


So the appearance of classical properties of a 
quantum system corresponds to the “emergence” of 
an algebra with nontrivial center. Since automorphic 
evolutions of an algebra preserve its center, this 
program can be achieved only if we admit the loss of 
quantum coherence, and this requires that the 
quantum systems we describe are open and interact 
with the environment, and moreover that the 
commutative algebra which emerges be stable for 
time evolution. 

It may be shown that one must consider a quantum
environment in the thermodynamic limit, that is,
consider the interaction of the system to be 
measured with a thermal bath. A discussion of the 
possible emergence of classical observables and of 
the corresponding dynamics is given by Gell-Mann 
(1993). In all these approaches, the commutative 
subalgebra is selected by the specific form of the 
interaction; therefore, the measuring apparatus 
determines the algebra of classical observables. 

On the experimental side, a number of very 
interesting results have been obtained, using very 
refined techniques; these experiments usually also 
determine the “decoherence time.” The experiments,
both for the collision model (Hornberger
et al. 2003) and for the thermal-bath model
(Hackermueller et al. 2004), are done mostly with
fullerene (a molecule which is heavy enough that it is
not deflected too much after a collision with a
particle of the gas). They show reasonable
agreement with the (rough) theoretical conclusions.

The most refined experiments about decoherence 
are those connected with quantum optics (circularly 
polarized atoms in superconducting cavities). These 
are not related to the wave nature of the particles 
but in a sense to the “wave nature” of a photon as a 
single unit. The electromagnetic field is now 
regarded as an incoherent superposition of states 
with an arbitrarily large number of photons. 
Polarized photons can be produced one by one, 
and they retain their individuality and their polar- 
ization until each of them interacts with “the 
environment” (e.g., the boundary of the cavity or a 
particle of the gas). In a sense, these experimental 
results refer to a “decoherence by collision” theory. 

The experiments by Haroche (2003) prove that 
coherence may persist for a measurable interval of 
time and are the most controlled experiments on 
coherence so far. 


Other Approaches 


We end this section with a brief discussion of the 
problem of “hidden variables” and a presentation of 
an entirely different approach to QM, originated by 


D Bohm (1952) and put recently on firm mathema- 
tical grounds by Duerr et al. (1999). The approach is 
radically different from the traditional one and it is 
not clear at present whether it can give a solution to 
the measurement problem and a description of all 
the phenomena which traditional QM accounts for. 
But it is very interesting from the point of view of 
the mathematics involved. 

We have remarked that the formulation of QM
that is summarized in the three axioms given earlier
has many unsatisfactory aspects, mainly connected
with the superposition principle (described in its
extremal form by the Schrödinger cat “paradox”)
and with the problem of measurement, which
reveals, for example, through the Einstein–Podolsky–Rosen
“paradox,” an intrinsic nonlocality if one
maintains that “objective” properties can be
attributed to systems which are far apart. From the
very beginning of QM, attempts have been made to
attribute these features to the presence of “hidden
variables”; the statistical nature of the predictions
of QM is, from this point of view, due to the
incompleteness of the parameters used to describe
the systems. The impossibility of matching the
statistical predictions of QM (confirmed by experimental
findings) with a local theory based on hidden
variables and classical probability theory has been
known for some time (Kochen and Specker 1967),
also through the use of “Bell inequalities” (Bell
1964) among correlations of outcomes of separate
measurements performed on entangled systems
(mainly two photons or two spin-1/2 particles
created in a suitable entangled state).

A proof of the intrinsic nonlocality of QM (in the 
above sense) was given by L Hardy (see Haroche 
(2003)). 

While experimental results prove that one 
cannot substitute QM with a “naive” theory of 
hidden variables, more refined attempts may have 
success. We shall only discuss the approach of Bohm 
(following a previous attempt by de Broglie) as 
presented in Duerr et al. (1999). It is a dynamical 
theory in which representative points follow “classical 
paths” and their motion is governed by a time- 
dependent vector “velocity” field (in this sense, it is 
not Newtonian). In a sense, Bohmian mechanics is a 
minimal completion of QM if one wants to keep the 
position as primitive observable. To these primitive 
objects, Bohm’s theory adds a complex-valued function
$\phi$ (the “guiding wave” in Bohm’s terminology)
defined on the configuration space Q of the particles.
In the case of particles with spin, the function $\phi$ is
spinor-valued. Dynamics is given by two equations:
one for the coordinates of the particles and one for
the guiding wave. If $x = (x_1, \ldots, x_N)$ describes the
configuration of the points, the dynamics in a
potential field V(x) is described in the following
way: for the wave $\phi$ by a nonrelativistic Schrödinger
equation with potential V and for the coordinates by
the ordinary differential equation (ODE)


$$\dot{x}_m = \frac{\hbar}{m_m}\, \mathrm{Im}\left[\frac{\phi^{*}\nabla_m \phi}{\phi^{*}\phi}\right](x), \qquad x_m \in \mathbb{R}^3$$

where $m_m$ is the mass of the mth particle.

Notice that the vector field is singular at the zeros 
of the wave function, therefore global existence and 
uniqueness must be proved. To see why Bohmian 
mechanics is empirically equivalent to QM, at least 
for measurement of position, notice that the 
equation for the points coincides with the continuity 
equation in QM. It follows that if one has at time 
zero a collection of points distributed with density
$|\phi_0|^2$, the density at time t will be $|\phi(t)|^2$, where $\phi(t)$
is the solution of the Schrödinger equation with
initial datum $\phi_0$.
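
This transport of the density can be made concrete in the simplest case. The following is a minimal sketch, assuming one particle in one dimension, no potential, units with hbar = m = 1, and a free Gaussian packet of initial width `SIGMA0` (all choices are ours, not the text's). The guiding field is evaluated from the known wave function, the ODE is integrated with a standard fourth-order Runge-Kutta scheme, and the result is compared with the closed-form trajectory x0 * sqrt(1 + (t / (2 * SIGMA0**2))**2).

```python
import math

SIGMA0 = 1.0  # assumed initial width of the packet, units with hbar = m = 1

def log_derivative(x, t):
    """phi'(x,t)/phi(x,t) for the free Gaussian packet
    phi ~ exp(-x^2 / (4 sigma0^2 (1 + i t / (2 sigma0^2))))."""
    return -x / (2 * SIGMA0 ** 2 * complex(1.0, t / (2 * SIGMA0 ** 2)))

def velocity(x, t):
    """Guiding velocity field (hbar/m) Im(phi'/phi) with hbar = m = 1."""
    return log_derivative(x, t).imag

def trajectory(x0, t_final, steps=2000):
    """Integrate dx/dt = velocity(x, t) with the classical RK4 scheme."""
    x, t, h = x0, 0.0, t_final / steps
    for _ in range(steps):
        k1 = velocity(x, t)
        k2 = velocity(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = velocity(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = velocity(x + h * k3, t + h)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x

t_final = 3.0
spread = math.sqrt(1 + (t_final / (2 * SIGMA0 ** 2)) ** 2)  # sigma(t) / sigma0
for x0 in (-1.0, 0.5, 2.0):
    # Numerical end point versus the closed-form value x0 * spread.
    print(x0, trajectory(x0, t_final), x0 * spread)
```

Every trajectory is stretched by the same factor sigma(t)/sigma0, so points distributed at time zero as a Gaussian of width sigma0 are distributed at time t as a Gaussian of width sigma(t), which is the squared modulus of the evolved packet.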

Bohm (1952) formulated the theory as a modi- 
fication of Newton’s laws (and in this form it has 
been widely used) through the introduction of a 
“quantum potential” $V_Q$. This was achieved by
writing the wave function in its polar form
$\phi = R\,e^{iS/\hbar}$ and writing the continuity equation as a
modified Hamilton-Jacobi equation. The version of 
Bohm’s theory discussed in Duerr et al. (1999) 
introduces only the guiding wave function and the 
coordinates of the points, and puts the theory on 
firm mathematical grounds. Through an impressive 
series of mathematical results, these authors and 
their collaborators deal with the completeness of 
the velocity vector field, the asymptotic behavior of 
the point trajectories (both for the scattering regime
and for the trapped trajectories, which are shown to 
correspond to bound states in QM), with a rigorous 
analysis of the theorem on the flux across a surface 
(a cornerstone in scattering theory) and the detailed 
analysis of the “two-slit” experiment through a 
study of the interaction with the measuring appara- 
tus. The theory is completely causal, both for the 
trajectories of the points and for the time develop- 
ment of the pilot wave, and can also accommodate 
points with spin. It leads to a mathematically precise 
formulation of the semiclassical limit, and it may 
also resolve the measurement problem by relating 
the pilot wave of the entire system to its approximate 
decomposition in incoherent superposition of pilot 
wave associated with the particle and to the measur- 
ing apparatus (this would be the way to see the 
“collapse of the wave function” in QM). A weak
point of this approach is the relation of the
representative points with observable quantities.


Introduction


This will be an elementary introduction to general 
topology. We shall not even touch upon algebraic 
topology, which will be dealt with in Cohomology 
Theories, although in some mathematics departments 
it is introduced in an advanced undergraduate course. 

We believe such an elementary article is useful for 
the encyclopaedia, purely for quick reference. Most 
of the concepts will be familiar to physicists, but 
usually in a general rather vague sense. This article 
will provide the rigorous definitions and results 
whenever they are needed when consulting other 
articles in the work. To make sure that this is the 
case, we have in fact experimentally tested the 
article on physicists for usefulness. 

Topology is very often described as “rubber-sheet 
geometry,” that is, one is allowed to deform objects 
without actually breaking them. This is the all- 
important concept of continuity, which underlies 
most of what we shall study here. 

We shall give full definitions, state theorems 
rigorously, but shall not give any detailed proofs. 
On the other hand, we shall cite many examples, 
with a view to applications to mathematical physics, 
taking for granted that familiar more advanced 
concepts there need not be defined. By the same 
token, the choice of topics will also be so dictated.


Essential Concepts


Definition 1 Let X be a set. A collection $\mathcal{T}$ of
subsets of X is called a topology if the following are
satisfied:

(i) $\emptyset, X \in \mathcal{T}$.
(ii) Let $\mathcal{I}$ be an index set; then
$A_\alpha \in \mathcal{T},\ \alpha \in \mathcal{I} \Rightarrow \bigcup_{\alpha \in \mathcal{I}} A_\alpha \in \mathcal{T}$.
(iii) $A_i \in \mathcal{T},\ i = 1, \ldots, n \Rightarrow \bigcap_{i=1}^{n} A_i \in \mathcal{T}$.


Definition 2 A member of the topology $\mathcal{T}$ is called
an open set (of X with topology $\mathcal{T}$).


Remark The last two properties are more easily 
put as arbitrary unions of open sets are open, and 
finite intersections of open sets are open. One can 
easily see the significance of this: if we take the 
“usual topology” (which will be defined in due 
course) of the real line, then the intersection of all 
open intervals (—1/n,1/n),n a positive integer, is 
just the single point {0}, which is manifestly not 
open in the usual sense. 


Example If we postulate that Ø, and the entire set 
X, are the only open subsets, we get what is called 
the indiscrete or coarsest topology. At the other 
extreme, if we postulate that all subsets are open, 
then we get the discrete or finest topology. Both 
seem quite unnatural if we think in terms of the 
real line or plane, but in fact it would be more 
unnatural to explicitly exclude them from the 
definition. They prove to be quite useful in certain 
respects. 
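
For a finite set the axioms of Definition 1 can be verified mechanically. The sketch below is ours and makes one simplifying assumption: since the collection is finite, closure under pairwise unions and intersections already implies closure under arbitrary unions and finite intersections. The function name `is_topology` and the sample collections are illustrative.

```python
from itertools import combinations

def is_topology(X, T):
    """Check the axioms of Definition 1 for a finite set X.
    T is a collection of subsets of X. For a finite collection it is
    enough to test pairwise unions and intersections."""
    X = frozenset(X)
    T = {frozenset(A) for A in T}
    if frozenset() not in T or X not in T:
        return False
    if any(not A <= X for A in T):
        return False
    return all((A | B) in T and (A & B) in T for A, B in combinations(T, 2))

X = {1, 2, 3}
indiscrete = [set(), X]
discrete = [set(c) for r in range(4) for c in combinations(X, r)]
not_a_topology = [set(), {1}, {2}, X]   # the union {1, 2} is missing

print(is_topology(X, indiscrete))      # True
print(is_topology(X, discrete))        # True
print(is_topology(X, not_a_topology))  # False
```

The indiscrete and discrete collections pass, as the Example asserts, while a collection that omits a union of two of its members fails.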


Definition 3 A subset of X is closed if its
complement in X is open.


Remarks 


(i) One could easily build a topology using closed 
sets instead of open sets, because of the simple 
relation that the complement of a union is the 
intersection of the complements. 

(ii) From the definitions, there is nothing to prevent 
a set being both open and closed, or neither.


Definition 4 A set equipped with a topology is
called a topological space (with respect to the given 
topology). Elements of a topological space are 
sometimes called points. 


Definition 5 Let $x \in X$. A neighborhood of x is a
subset of X containing an open set which contains x. 


Remark This seems a clumsy definition, but turns 
out to be more useful in the general case than 
restricting to open neighborhoods, which is often done. 


Definition 6 A subcollection of open sets $\mathcal{B} \subset \mathcal{T}$ is
called a basis for the topology $\mathcal{T}$ if every open set is
a union of sets of $\mathcal{B}$.


Definition 7 A subcollection of open sets $\mathcal{S} \subset \mathcal{T}$ is
called a sub-basis for the topology $\mathcal{T}$ if every open
set is a union of finite intersections of sets of $\mathcal{S}$.


Definition 8 The closure $\bar{A}$ of a subset A of X is
the smallest closed set containing A.


Definition 9 The interior $A^{\circ}$ of a subset A of X is
the largest open set contained in A.


Remark It is sometimes useful to define the
boundary of A as the set $\bar{A} \setminus A^{\circ} = \{x \in \bar{A},\ x \notin A^{\circ}\}$.
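
On a finite space, closure, interior, and boundary can be computed directly from the definitions. The following sketch uses an illustrative topology on a three-point set (our choice); the closure is obtained as the complement of the interior of the complement, which is equivalent to the smallest closed set containing A.

```python
def interior(A, T):
    """Largest open set contained in A: the union of all open subsets of A."""
    A = frozenset(A)
    result = frozenset()
    for U in T:
        if frozenset(U) <= A:
            result |= frozenset(U)
    return result

def closure(A, X, T):
    """Smallest closed set containing A, computed as the complement
    of the interior of the complement of A."""
    X = frozenset(X)
    return X - interior(X - frozenset(A), T)

def boundary(A, X, T):
    """Closure minus interior."""
    return closure(A, X, T) - interior(A, T)

X = {1, 2, 3}
T = [set(), {1}, {1, 2}, X]   # illustrative topology on X
A = {2}

print(interior(A, T))      # empty: no nonempty open set lies inside {2}
print(closure(A, X, T))    # the closed set {2, 3}
print(boundary(A, X, T))   # {2, 3}
```

Here the closed sets are the complements X, {2, 3}, {3}, and the empty set, so {2, 3} is indeed the smallest closed set containing the point 2.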


Definition 10 Let A be a subset of a topological 
space X. A point $x \in X$ is called a limit point of A if
every open set containing x contains some point of
A other than x.


Definition 11 A subset A of X is said to be dense in
X if $\bar{A} = X$.


Definition 12 A topological space X is called a 
Hausdorff space if for any two distinct points $x, y \in X$,
there exist an open neighborhood A of x and an
open neighborhood B of y such that A and B are
disjoint (that is, $A \cap B = \emptyset$).


Remark and Examples 


(i) This is looking more like what we expect. 
However, certain mildly non-Hausdorff spaces 
turn out to be quite useful, for example, in twistor 
theory. A “pocket” furnishes such an example. 
Explicitly, consider X to be the subset of the real 
plane consisting of the interval [−1,1] on the x-axis,
together with the interval [0,1] on the line
y=1, where the following pairs of points are
identified: $(x, 0) \sim (x, 1)$, $0 < x < 1$. Then the two
points (0, 0) and (0,1) do not have any disjoint 
neighborhoods. Strictly speaking, one needs the 
notion of a quotient topology, introduced below. 

(ii) For a more “truly” non-Hausdorff topology, 
consider the space of positive integers $\mathbb{N} = \{1, 2, 3, \ldots\}$,
and take as open sets the following:
$\emptyset$, $\mathbb{N}$, and the sets $\{1, 2, \ldots, n\}$ for each $n \in \mathbb{N}$.


This space is neither Hausdorff nor compact (see 
later for definition of compactness). 
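
The failure of the Hausdorff property in example (ii) can be seen on a finite truncation. The sketch below is an assumption-laden illustration: it replaces the positive integers by {1, ..., 5} with the initial segments as open sets, which keeps the non-Hausdorff behavior (every nonempty open set contains the point 1) but, being finite, is compact, unlike the original space. The function name `is_hausdorff` is ours.

```python
from itertools import combinations

def is_hausdorff(X, T):
    """True if any two distinct points have disjoint open neighborhoods."""
    opens = [frozenset(U) for U in T]
    for x, y in combinations(X, 2):
        if not any(x in U and y in V and not (U & V)
                   for U in opens for V in opens):
            return False
    return True

n = 5
X = set(range(1, n + 1))
initial_segments = [set()] + [set(range(1, k + 1)) for k in range(1, n + 1)]
discrete = [set(), {1}, {2}, {1, 2}]

print(is_hausdorff(X, initial_segments))  # False
print(is_hausdorff({1, 2}, discrete))     # True
```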


Definition 13 Let X and Y be two topological
spaces and let $f : X \to Y$ be a map from X to Y. We
say that f is continuous if $f^{-1}(A)$ is open (in X)
whenever A is open (in Y).


Remark Continuity is the single most important 
concept here. In this general setting, it looks a little 
different from the “$\epsilon$–$\delta$” definition, but this latter works
only for metric spaces, which we shall come to shortly. 
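
The preimage formulation of continuity can be tested directly when the spaces are finite. In this sketch the spaces, topologies, and maps are illustrative choices of ours; a map is stored as a dictionary and `is_continuous` simply checks that the preimage of every open set is open.

```python
def preimage(f, domain, A):
    """Set of points of the domain that f sends into A."""
    return frozenset(x for x in domain if f[x] in A)

def is_continuous(f, X, TX, TY):
    """f is a dict X -> Y; TX and TY are the collections of open sets."""
    open_X = {frozenset(U) for U in TX}
    return all(preimage(f, X, A) in open_X for A in TY)

X = {1, 2, 3}
TX = [set(), {1}, {1, 2}, X]
Y = {"a", "b"}
TY = [set(), {"a"}, Y]
f = {1: "a", 2: "a", 3: "b"}   # preimage of {"a"} is {1, 2}: open
g = {1: "b", 2: "a", 3: "a"}   # preimage of {"a"} is {2, 3}: not open

print(is_continuous(f, X, TX, TY))  # True
print(is_continuous(g, X, TX, TY))  # False
```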


Definition 14 A map $f : X \to Y$ is a homeomorphism
if it is a continuous bijective map such that its
inverse $f^{-1}$ is also continuous.


Remark Homeomorphisms are the natural maps 
for topological spaces, in the sense that two home- 
omorphic spaces are “indistinguishable” from the 
point of view of topology. Topological invariants 
are properties of topological spaces which are 
preserved under homeomorphisms. 


Definition 15 Let A be a topological space and $B \subset A$. Then one can define the
relative topology of B by saying that a subset $C \subset B$
is open if and only if there exists an open set D of A
such that $C = D \cap B$.


Definition 16 A subset $B \subset A$ equipped with the
relative topology is called a subspace of the 
topological space A. 


Remark Thus, if for subsets of the real line, we 
consider A =[0, 3], B=[0, 2], then C=(1, 2] is open 
in B, in the relative topology induced by the usual 
topology of R. 


Definition 17 Given two topological spaces X and Y,
we can define a product topological space $Z = X \times Y$,
where the set is the Cartesian product of the two sets X
and Y, and sets of the form $A \times B$, where A is open in
X and B is open in Y, form a basis for the topology.


Remark Note that the open sets of $X \times Y$ are not
always of this product form ($A \times B$).


Definition 18 Suppose there is a partition of X into
disjoint subsets $A_\alpha$, $\alpha \in \mathcal{I}$, for some index set $\mathcal{I}$, or
equivalently, there is defined on X an equivalence
relation $\sim$. Then one can define the quotient
topology on the set of equivalence classes $\{A_\alpha, \alpha \in \mathcal{I}\}$,
usually denoted as the quotient space $X/\!\sim\; = Y$,
as follows. Consider the map $\pi : X \to Y$, called the
canonical projection, which maps the element $x \in X$
to its equivalence class [x]. Then a subset $U \subset Y$ is
open if and only if $\pi^{-1}(U)$ is open.


Proposition 1 Let $\mathcal{T}$ be the quotient topology on
the quotient space Y. Suppose $\mathcal{T}'$ is another
topology on Y such that the canonical projection is
continuous; then $\mathcal{T}' \subset \mathcal{T}$.
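
For a finite space the quotient topology can be enumerated explicitly. The following sketch rests on an illustrative space and partition chosen by us; `powerset` and `quotient_topology` are hypothetical helper names. A set of equivalence classes is declared open exactly when the union of those classes, which is its preimage under the canonical projection, is open in X.

```python
from itertools import combinations

def powerset(S):
    """All subsets of S, each returned as a frozenset."""
    S = list(S)
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def quotient_topology(X, TX, classes):
    """Open sets of the quotient: sets U of classes whose preimage under
    the canonical projection (the union of the classes in U) is open in X."""
    open_X = {frozenset(U) for U in TX}
    classes = [frozenset(c) for c in classes]
    return [U for U in powerset(classes)
            if frozenset().union(*U) in open_X]

X = {1, 2, 3, 4}
TX = [set(), {1}, {1, 2}, {1, 2, 3}, X]
classes = [{1}, {2, 3}, {4}]   # assumed partition of X
TY = quotient_topology(X, TX, classes)

for U in TY:
    print(sorted(sorted(c) for c in U))
```

By construction every set of classes with an open preimage is included, so any other topology on the quotient for which the projection is continuous can only contain a subcollection of these sets, in line with Proposition 1.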


Definition 19 An (open) cover $\{U_\alpha : \alpha \in \mathcal{I}\}$ for X is a
collection of open sets $U_\alpha \subset X$ such that their union
equals X. A subcover of this cover is then a subset of
the collection which is itself a cover for X.


Definition 20 A topological space X is said to be
compact if every open cover contains a finite subcover.


Remark So for a compact space, however one 
chooses to cover it, it is always sufficient to use a 
finite number of open subsets. This is one of the 
essential differences between an open interval (not 
compact) and a closed interval (compact). The former 
is in fact homeomorphic to the entire real line. 


Definition 21 A topological space X is said to be 
connected if it cannot be written as the union of two 
nonempty disjoint open sets. 


Remark A useful equivalent definition is that any 
continuous map from X to the two-point set {0, 1}, 
equipped with the discrete topology, cannot be 
surjective. 
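
Connectedness of a finite space can be tested by searching for a splitting into two nonempty disjoint open sets. This is a small sketch with illustrative topologies of our choosing; the test used is that no open set other than the empty set and X has an open complement.

```python
def is_connected(X, T):
    """True if X is not the union of two nonempty disjoint open sets,
    i.e. no open set other than the empty set and X has an open complement."""
    X = frozenset(X)
    opens = {frozenset(U) for U in T}
    return not any(U and U != X and (X - U) in opens for U in opens)

X = {1, 2, 3}
connected_T = [set(), {1}, {1, 2}, X]
disconnected_T = [set(), {1}, {2, 3}, X]

print(is_connected(X, connected_T))     # True
print(is_connected(X, disconnected_T))  # False
```

In the second topology the map sending 1 to 0 and the points 2 and 3 to 1 is continuous and surjective onto the discrete two-point set, which is the equivalent criterion of the Remark.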


Definition 22 Given two points x, y in a topolo- 
gical space X, a path from x to y is a continuous 
map $f : [0,1] \to X$ such that f(0)=x, f(1)=y. We
also say that such a path joins x and y. 


Definition 23 A topological space X is path- 
connected if every two points in X can be joined 
by a path lying entirely in X. 


Proposition 2. A path-connected space is connected. 


Proposition 3 A connected open subspace of $\mathbb{R}^n$ is
path-connected. 


Definition 24 Given a topological space X, define 
an equivalence relation by saying that x ~ y if and 
only if x and y belong to the same connected 
subspace of X. Then the equivalence classes are 
called (connected) components of X. 


Examples 


(i) The Lie group O(3) of $3 \times 3$ orthogonal matrices
has two connected components. The identity 
connected component is SO(3) and is a subgroup. 

(ii) The proper orthochronous Lorentz transformations 
of Minkowski space form the identity component 
of the group of Lorentz transformations. 


Metric Spaces 


A special class of topological spaces plays an 
important role: metric spaces.


Definition 25 A metric space is a set X together
with a function $d : X \times X \to \mathbb{R}$ satisfying


(i) $d(x,y) \geq 0$,
(ii) $d(x,y) = 0 \Leftrightarrow x = y$,
(iii) $d(x,z) \leq d(x,y) + d(y,z)$ (“triangle inequality”).


Remarks 


(i) The function d is called the metric, or distance 
function, between the two points. 

(ii) This concept of metric is what is generally 
known as “Euclidean” metric in mathematical 
physics. The distinguishing feature is the posi- 
tive definiteness (and the triangle inequality). 
One can, and does, introduce indefinite metrics 
(for example, the Minkowski metric) with 
various signatures. But these metrics are not 
usually used to induce topologies in the spaces 
concerned. 
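
The axioms of Definition 25 can be spot-checked numerically on a finite sample of points. The sketch below is ours: passing the check is only evidence, not a proof, that a function is a metric, while a single failure is conclusive. The three candidate functions are illustrative; the last one is an indefinite interval of Minkowski type on pairs (t, x).

```python
from itertools import product

def check_metric(points, d, tol=1e-12):
    """Test the axioms of Definition 25 on a finite sample of points."""
    for x, y, z in product(points, repeat=3):
        if d(x, y) < 0:
            return False
        if (d(x, y) == 0) != (x == y):
            return False
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return False
    return True

def euclidean(x, y):
    return abs(x - y)

def squared(x, y):
    return (x - y) ** 2    # violates the triangle inequality

def minkowski_interval(p, q):
    # Indefinite form on (t, x) pairs: vanishes for distinct points.
    return (p[0] - q[0]) ** 2 - (p[1] - q[1]) ** 2

print(check_metric([0, 1, 2.5, -3], euclidean))                       # True
print(check_metric([0, 1, 2], squared))                               # False
print(check_metric([(0, 0), (1, 1), (2, 0)], minkowski_interval))     # False
```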


Definition 26 Given a metric space X and a point
$x \in X$, we define the open ball centred at x with
radius r (a positive real number) as


$$B_r(x) = \{y \in X : d(x,y) < r\}$$


Given a metric space X, we can immediately 
define a topology on it by taking all the open balls in 
X as a basis. We say that this is the topology 
induced by the given metric. Then we can recover 
our usual “$\epsilon$–$\delta$” definition of continuity.


Proposition 4 Let $f : X \to Y$ be a map from the metric
space X to the metric space Y. Then f is continuous
(with respect to the corresponding induced topologies)
at $x \in X$ if and only if given any $\epsilon > 0$, $\exists\, \delta > 0$ such that
$d(x, x') < \delta$ implies $d(f(x), f(x')) < \epsilon$.


Note that we do not bother to give two different 
symbols to the two metrics, as it is clear which 
spaces are involved. The proof is easily seen by 
taking the relevant balls as neighborhoods. Equally 
easy is the following: 


Proposition 5 A metric space is Hausdorff. 


Definition 27 A map $f : X \to Y$ of metric spaces is
uniformly continuous if given any $\epsilon > 0$ there exists
$\delta > 0$ such that for any $x_1, x_2 \in X$, $d(x_1, x_2) < \delta$
implies $d(f(x_1), f(x_2)) < \epsilon$.


Remark Note the difference between continuity
and uniform continuity: the latter is stronger and
requires the same $\delta$ for the whole space.
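
The difference can be seen on the function f(x) = x^2 on the half-line x >= 0, an example of our choosing. It is continuous at every point, but the admissible increment shrinks as x grows, so no single choice works for the whole space. The sketch computes, for a fixed tolerance `eps`, the largest step to the right of x that keeps the change of f within `eps`.

```python
import math

def largest_delta(x, eps):
    """Largest delta > 0 with (x + delta)^2 - x^2 <= eps, for x >= 0."""
    return math.sqrt(x * x + eps) - x

eps = 0.1
deltas = [largest_delta(x, eps) for x in (0.0, 1.0, 10.0, 100.0, 1000.0)]
for x, delta in zip((0.0, 1.0, 10.0, 100.0, 1000.0), deltas):
    print(x, delta)   # the admissible delta decreases towards 0
```

Since the admissible increment behaves like eps / (2x) for large x, it has no positive lower bound, which is exactly the failure of uniform continuity.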


Definition 28 Two metrics $d_1$ and $d_2$ defined on X
are equivalent if there exist positive constants a and
b such that for any two points $x, y \in X$ we have


$$a\,d_1(x,y) \leq d_2(x,y) \leq b\,d_1(x,y)$$


Remark This is clearly an equivalence relation.
Two equivalent metrics induce the same topology. 


Examples 


(i) Given a set X, we can define the discrete metric 
as follows: $d_0(x,y) = 1$ whenever $x \neq y$, and $d_0(x,x) = 0$. This
induces the discrete topology on X. This is quite 
a convenient way of describing the discrete 
topology. 

(ii) In R, the usual metric is d(x, y)= |x —y|, and 
the usual topology is the one induced by this. 

(iii) More generally, in $\mathbb{R}^n$, we can define a metric
for every $p \geq 1$ by


$$d_p(x,y) = \left\{ \sum_{k=1}^{n} |x_k - y_k|^p \right\}^{1/p}$$


where $x = (x_1, x_2, \ldots, x_n)$, $y = (y_1, y_2, \ldots, y_n)$. In
particular, for p=2 we have the usual Euclidean
metric, but the other cases are also useful.
To continue the series, one can define


$$d_\infty = \max_{1 \leq k \leq n} \{|x_k - y_k|\}$$


All these metrics induce the same topology on $\mathbb{R}^n$.
(iv) In a vector space V, say over the real or the
complex field, a function $\|\cdot\| : V \to \mathbb{R}^{+}$ is called
a norm if it satisfies the following axioms:

(a) $\|x\| = 0$ if and only if x = 0,
(b) $\|\alpha x\| = |\alpha|\,\|x\|$, and
(c) $\|x + y\| \leq \|x\| + \|y\|$.


Then it is easy to see that a metric can be defined
using the norm


$$d(x,y) = \|x - y\|$$


In many cases, for example, the metrics defined in 
example (iii) above, one can define the norm of a 
vector as just the distance of it from the origin. One 
obvious exception is the discrete metric. 
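
The statement in example (iii) that all these metrics induce the same topology follows from the standard two-sided bound d_inf <= d_p <= n^(1/p) d_inf, which makes each of them equivalent to the maximum metric in the sense of Definition 28. The sketch below checks the bound on one pair of points; the points and the values of p are arbitrary choices of ours.

```python
def d_p(x, y, p):
    """The metric of example (iii) for a given p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def d_inf(x, y):
    """The maximum metric."""
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (1.0, -2.0, 0.5), (0.0, 1.0, 2.5)
n = len(x)
for p in (1, 2, 3, 10):
    lower = d_inf(x, y)
    value = d_p(x, y, p)
    upper = n ** (1.0 / p) * d_inf(x, y)
    print(p, lower, value, upper)   # lower <= value <= upper in each row
```

As p grows the upper and lower bounds approach each other, which is the sense in which the maximum metric continues the series.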

A slightly more general concept is found to be 
useful for spaces of functions and operators: that of 
seminorms. A seminorm is one which satisfies the 
last two of the conditions, but not necessarily the 
first, for a norm, as listed above. 


Definition 29 Given a metric space X, a sequence
of points $\{x_1, x_2, \ldots\}$ is called a Cauchy sequence if,
given any $\epsilon > 0$, there exists a positive integer N
such that for any k, l > N we have $d(x_k, x_l) < \epsilon$.


Definition 30 Given a sequence of points
$\{x_1, x_2, \ldots\}$ in a metric space X, a point $x \in X$ is
called a limit of the sequence if given any $\epsilon > 0$,
there exists a positive integer N such that for any
n > N we have $d(x, x_n) < \epsilon$. We say that the
sequence converges to x.


Definition 31 A metric space X is complete if every 
Cauchy sequence in X converges to a limit in it. 


Examples 


(i) The closed interval [0,1] on the real line is 
complete, whereas the open interval (0,1) is 
not. For example, the Cauchy sequence 
{1/n,n=2,3,...} has no limit in this open 
interval. (Considered as a sequence on the real 
line, it has of course the limit point 0.) 

(ii) The spaces $\mathbb{R}^n$ are complete.

(iii) The Hilbert space $\ell^2$ consisting of all
sequences of real numbers $\{x_1, x_2, \ldots\}$ such
that $\sum_{k=1}^{\infty} x_k^2$ converges is complete with respect
to the obvious metric which is a generalization
to infinite dimension of $d_2$ above. For arbitrary
$p \geq 1$, one can similarly define $\ell^p$, which
are also complete and are hence Banach
spaces.


Remarks Completeness is not a topological invariant.
For example, the open interval $(-1, 1)$ and the
whole real line are homeomorphic (with respect to
the usual topologies) but the former is not complete
while the latter is. The homeomorphism can
conveniently be given in terms of the trigonometric
function tangent, for instance $x \mapsto \tan(\pi x/2)$.
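
A small numerical sketch of this remark, assuming the concrete homeomorphism $x \mapsto \tan(\pi x/2)$ (one possible choice; the name `h` is introduced here): a sequence that is Cauchy in $(-1, 1)$ but has no limit there is mapped to an unbounded sequence on the real line, so the Cauchy property is not preserved.

```python
import math


def h(x):
    """A homeomorphism from the open interval (-1, 1) onto the real line."""
    return math.tan(math.pi * x / 2)


# Points 1 - 1/n approach the missing end point 1 of the interval.
xs = [1 - 1 / n for n in (2, 10, 100, 1000)]
gaps = [b - a for a, b in zip(xs, xs[1:])]   # shrinking gaps in (-1, 1)
images = [h(x) for x in xs]                  # images run off to infinity

print(gaps)
print(images)
```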


Definition 32 A subset B of the metric space X is 
bounded if there exists a ball of radius R (R > 0) 
which contains it entirely. 


Theorem 1 (Heine-Borel) Any closed bounded
subset of $\mathbb{R}^n$ is compact.


Remark The converse is also true. We have thus a
nice characterization of compact subsets of $\mathbb{R}^n$ as
being closed and bounded.


Proposition 6 Any bounded sequence in $\mathbb{R}^n$ has a
convergent subsequence.


Definition 33 Consider a sequence $\{f_n\}$ of real-valued
functions on a subset A (usually an interval)
of $\mathbb{R}$. We say that $\{f_n\}$ converges pointwise in A if
the sequence of real numbers $\{f_n(x)\}$ converges for
every $x \in A$. We can then define a function $f : A \to \mathbb{R}$
by $f(x) = \lim_{n \to \infty} f_n(x)$, and write $f_n \to f$.


Definition 34 A sequence of functions $f_n : A \to \mathbb{R}$,
$A \subset \mathbb{R}$, is said to converge uniformly to a function
$f : A \to \mathbb{R}$ if given any $\epsilon > 0$, there exists a positive
integer N such that, for all x, $|f_n(x) - f(x)| < \epsilon$
whenever $n > N$.


Theorem 2 Let $f_n : (a,b) \to \mathbb{R}$ be a sequence of
functions continuous at the point $c \in (a,b)$, and
suppose $f_n$ converges uniformly to f on $(a,b)$. Then f
is continuous at c.


Remark and Example The pointwise limit of
continuous functions need not be continuous, as
can be shown by the following example:
$f_n(x) = x^n$, $x \in [0,1]$. We see that the limit function
f is not continuous:

$$f(x) = \begin{cases} 0 & \text{if } 0 \leq x < 1 \\ 1 & \text{if } x = 1 \end{cases}$$
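
The failure of uniform convergence in this example can be made concrete. The sketch below (names `f_n`, `f_limit` and the sample points are chosen here for illustration) picks, for each n, a point $x_n < 1$ with $f_n(x_n) = 1/2$, so the gap to the limit function never drops below 1/2 even though $f_n(x) \to 0$ at every fixed $x < 1$.

```python
def f_n(n, x):
    return x ** n


def f_limit(x):
    """Pointwise limit of x**n on [0, 1]."""
    return 1.0 if x == 1.0 else 0.0


gaps = []
for n in (1, 10, 100, 1000):
    x_n = 0.5 ** (1.0 / n)            # a point of [0, 1) where f_n equals 1/2
    gaps.append(abs(f_n(n, x_n) - f_limit(x_n)))

print(gaps)                # each gap stays close to 0.5
print(f_n(1000, 0.9))      # at a fixed point x < 1 the values do tend to 0
```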


Definition 35 Let X be a metric space. A map
$f : X \to X$ is a contraction if there exists $c < 1$ such
that $d(f(x), f(y)) \leq c\,d(x,y)$ for all $x, y \in X$.


Theorem 3 (Banach) If X is a complete metric
space and f is a contraction in X, then f has a unique
fixed point $x \in X$, that is, $f(x) = x$.
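
A minimal sketch of how the fixed point is found in practice, by iterating the map. The example assumes $X = [0,1]$ with the usual metric and $f = \cos$, which maps $[0,1]$ into itself and is a contraction there with constant $\sin 1 < 1$; the helper name `fixed_point` and the tolerance are choices made here.

```python
import math


def fixed_point(f, x0, tol=1e-12, max_iter=10_000):
    """Iterate x -> f(x) until two successive iterates differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")


root = fixed_point(math.cos, 0.5)
other = fixed_point(math.cos, 1.0)   # a different starting point
print(root, other)
```

Both starting points lead to the same value, in line with the uniqueness statement of the theorem.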


Some Function and Operator Spaces 


The spaces of functions and operators can be 
equipped with different topologies, given by various 
concepts of convergence and of norms (or sometimes 
seminorms), very often with different such concepts 
for the same space. As we saw earlier, a norm in a 
vector space gives rise to a metric, and hence to a 
topology. Similarly with the concept of convergence 
for sequences of functions and operators, as one 
then knows what the limit points, and hence closed 
sets, are. 

But before we do that, let us introduce, in a 
slightly different context, a topology which is in 
some sense the natural one for the space of 
continuous maps from one space to another. 


Definition 36 Consider a family F of maps from a
topological space X to a topological space Y, and
define $W(K,U) = \{f : f \in F, f(K) \subset U\}$. Then the
family of all sets of the form $W(K,U)$ with K
compact (in X) and U open (in Y) form a sub-basis
for the compact open topology for F.


Consider a topological space X and sequences of
functions $(f_n)$ on it. Let $D \subset X$. We can then define
pointwise convergence and uniform convergence
exactly as for functions on subsets of the real line.


Definition 37 Let X, D and $(f_n)$ be as above.


(i) The functions $f_n$ converge pointwise on D to a
function f if the sequence of numbers $(f_n(x))$
converges to $f(x)$ for every $x \in D$.
(ii) The functions $f_n$ converge uniformly on D to a
function f if given $\epsilon > 0$, there exists N such that
for all $n > N$ we have $|f_n(x) - f(x)| < \epsilon$ for all $x \in D$.


Next we consider the Lebesgue spaces $L^p$, that
is, functions f defined on subsets of $\mathbb{R}^n$, such
that $|f(x)|^p$ is Lebesgue integrable, for real
numbers $p \geq 1$. To define these spaces, we tacitly
take equivalence classes of functions which are equal
almost everywhere (that is, up to a null set), but very
often we can take representatives of these classes
and just deal with genuine functions instead. Note
that of all $L^p$, only $L^2$ is a Hilbert space.


Definition 38 In the space $L^p$, we define its norm by


$$\|f\|_p = \Big(\int |f(x)|^p\,dx\Big)^{1/p}$$


Now we turn to general normed spaces, and
operators on them.


Definition 39 Convergence in the norm is also
called strong convergence. In other words, a
sequence $(x_n)$ in a normed space X is said to
converge strongly to x if


$$\lim_{n \to \infty} \|x_n - x\| = 0$$


Definition 40 A sequence $(x_n)$ in a normed space X
is said to converge weakly to x if


$$\lim_{n \to \infty} f(x_n) = f(x)$$


for all bounded linear functionals f.


Consider the space $B(X,Y)$ of bounded linear
operators T from X to Y. We can make this into a
normed space by defining the following norm:

$$\|T\| = \sup_{x \in X,\ \|x\| = 1} \|Tx\|$$


Then we can define three different concepts of 
convergence on B(X, Y). There are in fact more in 
current use in functional analysis. 


Definition 41 Let X and Y be normed spaces and
let $(T_n)$ be a sequence of operators $T_n \in B(X,Y)$.


(i) $(T_n)$ is uniformly convergent if it converges in
the norm.
(ii) $(T_n)$ is strongly convergent if $(T_n x)$ converges
strongly for every $x \in X$.
(iii) $(T_n)$ is weakly convergent if $(T_n x)$ converges
weakly for every $x \in X$.


Remark Clearly we have: uniform convergence $\Rightarrow$
strong convergence $\Rightarrow$ weak convergence, and the
limits are the same in all three cases. However, the
converses are in general not true.


Homotopy Groups 


The most elementary and obvious property of a
topological space X is the number of connected
components it has. The next such property, in a
certain sense, is the number of holes X has. There
are higher analogues of these, called the homotopy
groups, which are topological invariants, that is,
they are invariant under homeomorphisms. They
play important roles in many topological considerations
in field theory and other topics of mathematical
physics. The articles Topological Defects
and Their Homotopy Classification and Electric-
Magnetic Duality contain some examples.


Definition 42 Given a topological space X, the
zeroth homotopy set, denoted $\pi_0(X)$, is the set of
connected components of X. One sometimes writes
$\pi_0(X) = 0$ if X is connected.


To define the fundamental group of X, or $\pi_1(X)$,
we shall need the concept of closed loops, which we 
shall find useful in other ways too. For simplicity, 
we shall consider based loops (that is, loops passing 
through a fixed point in X). It seems that in most 
applications, these are the relevant ones. One could 
consider loops of various smoothness (when X is a 
manifold), but in view of applications to quantum 
field theory, we shall consider continuous loops, 
which are also the ones relevant for topology. 


Definition 43 Given a topological space X and a
point $x_0 \in X$, a (closed) (based) loop is a continuous
function of the parametrized circle to X:


$$\xi : [0, 2\pi] \to X$$
satisfying $\xi(0) = \xi(2\pi) = x_0$.


Definition 44 Given a connected topological space
X and a point $x_0 \in X$, the space of all closed based
loops is called the (parametrized based) loop space
of X, denoted $\Omega X$.


Remarks


(i) The loop space $\Omega X$ inherits the relative compact-open
topology from the space of continuous maps
from the closed interval $[0, 2\pi]$ to X. It also has a
natural base point: the constant function mapping
all of $[0, 2\pi]$ to $x_0$. Hence it is easy to iterate the
construction and define $\Omega^k X$, $k \geq 1$.

(ii) Here we have chosen to parametrize the circle
by $[0, 2\pi]$, as is more natural if we think in
terms of the phase angle. We could easily have
chosen the unit interval $[0, 1]$ instead. This
would perhaps harmonize better with our previous
definition of paths and the definitions of
homotopies below.


Proposition 7 The fundamental group of a topological
space X, denoted $\pi_1(X)$, consists of classes of
closed loops in X which cannot be continuously
deformed into one another while preserving the base
point.


Definition 45 A space X is called simply connected
if $\pi_1(X)$ is trivial.


To define the higher homotopy groups, let us go 
into a little detail about homotopy. 


Definition 46 Given two topological spaces X and
Y, and maps


$$p, q : X \to Y$$


we say that h is a homotopy between the maps p, q if


$$h : X \times I \to Y$$
is a continuous map such that $h(x,0) = p(x)$,
$h(x,1) = q(x)$, where I is the unit interval $[0,1]$. In
this case, we write $p \simeq q$.


Definition 47 A map $f : X \to Y$ is a homotopy
equivalence if there exists a map $g : Y \to X$ such
that $g \circ f \simeq \mathrm{id}_X$ and $f \circ g \simeq \mathrm{id}_Y$.


Remark This is an equivalence relation.


Definition 48 For a topological space X with base
point $x_0$, we define $\pi_n(X)$, $n \geq 0$, as the set of
homotopy equivalence classes of based maps from
the n-sphere $S^n$ to X.


Remark This coincides with the previous definitions
for $\pi_0$ and $\pi_1$.


There is a very nice relation between homotopy 
classes and loop spaces. 


Proposition 8 $\pi_n(X) = \pi_{n-1}(\Omega X) = \cdots = \pi_0(\Omega^n X)$.

Remarks


(i) When we consider the gauge group G in a Yang-Mills
theory, its fundamental group classifies the
monopoles that can occur in the theory.

(ii) For $n \geq 1$, $\pi_n(X)$ is a group, the group action
coming from the joining of two loops together
to form a new loop. On the other hand, $\pi_0(X)$
in general is not a group. However, when X is a
Lie group, then $\pi_0(X)$ inherits a group structure
from X, because it can be identified with the
quotient group of X by its identity-connected
component. For example, the two components
of O(3) can be identified with the two elements
of the group $\mathbb{Z}_2$, the component where the
determinant equals 1 corresponding to 0 in $\mathbb{Z}_2$
and the component where the determinant
equals $-1$ corresponding to 1 in $\mathbb{Z}_2$.

(iii) For $n \geq 2$, the group $\pi_n(X)$ is always abelian.

(iv) Examples of nonabelian $\pi_1$ are the fundamental
groups of some Riemann surfaces.

(v) Since $\pi_1$ is not necessarily abelian, much of the
direct-sum notation we use for the homotopy
groups should more correctly be written multiplicatively.
However, in most literature in
mathematical physics, the additive notation
seems to be preferred.


Examples 


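(i) $\pi_n(X \times Y) = \pi_n(X) + \pi_n(Y)$, $n \geq 1$.
(ii) For the spheres, we have the following results:


$$\pi_i(S^n) = \begin{cases} 0 & \text{if } i < n \\ \mathbb{Z} & \text{if } i = n \end{cases}$$

$$\pi_i(S^1) = 0 \quad \text{if } i > 1$$

$$\pi_{n+1}(S^n) = \mathbb{Z}_2 \quad \text{if } n \geq 3$$

$$\pi_{n+2}(S^n) = \mathbb{Z}_2 \quad \text{if } n \geq 2$$

$$\pi_6(S^3) = \mathbb{Z}_{12}$$


(iii) From the theory of sphere bundles, we can
deduce:


$$\pi_i(S^2) = \pi_{i-1}(S^1) + \pi_i(S^3) \quad \text{if } i \geq 2$$

$$\pi_i(S^4) = \pi_{i-1}(S^3) + \pi_i(S^7) \quad \text{if } i \geq 2$$

$$\pi_i(S^8) = \pi_{i-1}(S^7) + \pi_i(S^{15}) \quad \text{if } i \geq 2$$


and the first of these relations gives the following
more succinct result:


$$\pi_i(S^2) = \pi_i(S^3) \quad \text{if } i \geq 3$$


(iv) A result of Serre says that all the homotopy
groups of spheres are in fact finite except $\pi_n(S^n)$
and $\pi_{4n-1}(S^{2n})$, $n \geq 1$.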


Definition 49 Given a connected space X, a map
$\pi : B \to X$ is called a covering if (i) $\pi(B) = X$, and (ii) for
each $x \in X$, there exists an open connected neighborhood
V of x such that each component of $\pi^{-1}(V)$ is open
in B, and $\pi$ restricted to each component is a
homeomorphism. The space B is called a covering space.


Examples 


(i) The real line R is a covering of the group U(1). 

(ii) The group SU(2) is a double cover of the group 
SO(3). 

(iii) The group SL(2, C) is a double cover of the 
Lorentz group SO(1, 3). 

(iv) The group SU(2,2) is a 4-fold cover of the 
conformal group in four dimensions. This local 
isomorphism is of great importance in twistor 
theory. 
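
The two-to-one nature of example (ii) can be seen in a short computation. The sketch below relies on the standard identification of SU(2) with the unit quaternions, which is an assumption brought in here rather than something stated above; the function name `rotation_matrix` is likewise illustrative. The rotation matrix is quadratic in the quaternion components, so q and -q give the same element of SO(3).

```python
def rotation_matrix(q):
    """Rotation matrix in SO(3) of a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


q = (0.5, 0.5, 0.5, 0.5)              # a unit quaternion
minus_q = tuple(-c for c in q)        # its antipode on the 3-sphere

print(rotation_matrix(q))
print(rotation_matrix(q) == rotation_matrix(minus_q))
```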


Remarks 


(i) By considering closed loops in X and their
coverings in B it is easily seen that the
fundamental group $\pi_1(X)$ acts on the coverings
of X. If we further assume that the action is
transitive, then we have the following nice
result: coverings of X are in 1-1 correspondence
with normal subgroups of $\pi_1(X)$.

(ii) Given a connected space X, there always exists a
unique connected simply connected covering space
$\tilde{X}$, called the universal covering space. Furthermore,
$\tilde{X}$ covers all the other covering spaces of X.
For the higher homotopy groups, one has


$$\pi_n(\tilde{X}) = \pi_n(X), \quad n \geq 2$$


One very important class of homotopy groups are
those of Lie groups. To simplify matters, we shall
consider only connected groups, that is, $\pi_0(G) = 0$.
Also we shall deal mainly with the classical groups,
and in particular, the orthogonal and unitary groups.


Proposition 9 Suppose that G is a connected Lie
group.


(i) If G is compact and semi-simple, then $\pi_1(G)$ is
finite. This implies that the universal covering group $\tilde{G}$ is still compact.
(ii) $\pi_2(G) = 0$.
(iii) For G compact, simple, and nonabelian,
$\pi_3(G) = \mathbb{Z}$.
(iv) For G compact, simply connected, and simple,
$\pi_4(G) = 0$ or $\mathbb{Z}_2$.


Examples 


(i) $\pi_4(SU(n)) = 0$.

(ii) $\pi_4(SO(2)) = \mathbb{Z}_2$.

(iii) Since the unitary groups U(n) are topologically
the product of SU(n) with a circle $S^1$, their
homotopy groups are easily computed using the
product formula. We remind ourselves that
U(1) is topologically a circle and SU(2) topologically
$S^3$.

(iv) For $i \geq 2$, we have:

$$\pi_i(SO(3)) = \pi_i(SU(2))$$

$$\pi_i(SO(5)) = \pi_i(Sp(2))$$

$$\pi_i(SO(6)) = \pi_i(SU(4))$$

Just for interest, and to show the richness of the
subject, some isomorphisms for homotopy groups
are shown in Table 1 and some homotopy groups
for low SU(n) and SO(n) are listed in Table 2.


Table 1 Some isomorphisms for homotopy groups

Isomorphism                     Range
π_i(SO(n)) ≅ π_i(SO(m))         n, m ≥ i + 2
π_i(SU(n)) ≅ π_i(SU(m))         n, m ≥ (i + 1)/2
π_i(Sp(n)) ≅ π_i(Sp(m))         n, m ≥ (i − 1)/4
π_i(G_2) ≅ π_i(SO(7))           2 ≤ i ≤ 5
π_i(F_4) ≅ π_i(SO(9))           2 ≤ i ≤ 6
π_i(SO(9)) ≅ π_i(SO(7))         i ≤ 13


Table 2 Some homotopy groups for low SU(n) and SO(n)

          π_4     π_5     π_6     π_7
SU(2)     Z_2     Z_2     Z_12    Z_2
SU(3)     0       Z       Z_6     0
SU(4)     0       Z       0       Z
SU(5)     0       Z       0       Z
SU(6)     0       Z       0       Z
SO(5)     Z_2     Z_2     0       Z
SO(6)     0       Z       0       Z
SO(7)     0       0       0       Z
SO(8)     0       0       0       Z + Z
SO(9)     0       0       0       Z
SO(10)    0       0       0       Z


Appendix: A Mathematician’s 
Basic Toolkit 


The following is a drastically condensed list, most 
of which is what a mathematics undergraduate 
learns in the first few weeks. The rest is included 
for easy reference. These notations and concepts 
are used universally in mathematical writing. We 
have not endeavored to arrange the material in a 
logical order. Furthermore, given structures such as 
sets, groups, etc., one can usually define “substruc- 
tures” such as subsets, subgroups, etc., in a 
straightforward manner. We shall therefore not 
spell this out. 


Sets 
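$A \cup B = \{x : x \in A \text{ or } x \in B\}$   union
$A \cap B = \{x : x \in A \text{ and } x \in B\}$   intersection
$A \setminus B = \{x : x \in A \text{ and } x \notin B\}$   complement
$A \times B = \{(x,y) : x \in A, y \in B\}$   Cartesian product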
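
For finite sets these four operations correspond directly to built-in set operations; a minimal illustration with two small sets chosen here:

```python
A = {1, 2, 3}
B = {3, 4}

union = A | B                 # elements in A or in B
intersection = A & B          # elements in both A and B
complement = A - B            # elements of A that are not in B
product = {(x, y) for x in A for y in B}   # all ordered pairs

print(union, intersection, complement, len(product))
```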


Maps 


1. A map or mapping $f : A \to B$ is an assignment of
an element $f(x)$ of B for every $x \in A$.

2. A map $f : A \to B$ is injective if $f(x) = f(y)$
$\Rightarrow x = y$. This is sometimes called a 1-1 map, a
term to be avoided.

3. A map $f : A \to B$ is surjective if for every $y \in B$
there exists an $x \in A$ such that $y = f(x)$. This is
sometimes called an “onto” map.

4. A map $f : A \to B$ is bijective if it is both surjective
and injective. This is also sometimes called a 1-1
map, a term to be equally avoided.

5. For any map $f : A \to B$ and any subset $C \subset B$, the
inverse image $f^{-1}(C) = \{x : f(x) \in C\} \subset A$ is always
defined, although, of course, it can be empty. On


Zo Z3 ZA5 


T8 T9 T10 
Z2 Z3 Z30 
Z4 Zo Z120 + Z2 

0 Z Z120 

0 Z Z3 

0 0 Z120 
Z4 Zo Z120 + Z2 

Zo + Zo Zo + Zo Ling 
Le + Zə + Z2 Le + Le + Zo Z4 + Z24 
Ze + Zo Le + Le Z4 

Zo Z + Zo Z12 


the other hand, the map $f^{-1}$ is defined if and only
if f is bijective.

6. A map from a set to either the real or complex 
numbers is usually called a function. 

7. A map between vector spaces, and more particu- 
larly normed spaces (including Hilbert spaces), is 
called an operator. Most often, one considers 
linear operators. 

8. An operator from a vector space to its field of 
scalars is called a functional. Again, one con- 
siders almost exclusively linear functionals. 
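
For maps between finite sets, the notions in items 2 to 5 can be checked by brute force. The helper names below are illustrative, and `is_surjective` assumes that f really maps the domain into the given codomain.

```python
def is_injective(f, domain):
    images = [f(x) for x in domain]
    return len(set(images)) == len(images)


def is_surjective(f, domain, codomain):
    return {f(x) for x in domain} == set(codomain)


def inverse_image(f, domain, subset):
    """The set of all x in the domain with f(x) in the given subset."""
    return {x for x in domain if f(x) in subset}


A = [-2, -1, 0, 1, 2]
B = [0, 1, 4]


def square(x):
    return x * x


print(is_injective(square, A), is_surjective(square, A, B))
print(inverse_image(square, A, {4}), inverse_image(square, A, set()))
```

Here squaring is surjective onto B but not injective, so the inverse image of a subset is defined while the inverse map is not.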


Relations 


1. A relation ~ on a set A is a subset $R \subset A \times A$.
We say that x ~ y if $(x,y) \in R$.

2. We shall only be interested in equivalence relations.
An equivalence relation ~ is one satisfying, for all
$x, y, z \in A$:

(a) x ~ x (“reflexive”),
(b) x ~ y $\Rightarrow$ y ~ x (“symmetric”),
(c) x ~ y, y ~ z $\Rightarrow$ x ~ z (“transitive”).

3. If ~ is an equivalence relation in A, then for each
$x \in A$, we can define its equivalence class:


$$[x] = \{y \in A : y \sim x\}$$


It can be shown that equivalence classes are
nonempty, any two equivalence classes are either
equal or disjoint, and they together partition the set
A. Subgroup equivalence classes are called cosets.

4. An element of an equivalence class is called a 
representative. 
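
The partition into equivalence classes can be computed directly for a finite set. The sketch below uses congruence modulo 3 on {0, ..., 6} as the relation; both the relation and the helper name `equivalence_classes` are chosen here for illustration, and the helper assumes the relation passed in really is an equivalence relation.

```python
def equivalence_classes(elements, related):
    """Group elements into classes, comparing each with one representative."""
    classes = []
    for x in elements:
        for cls in classes:
            if related(x, next(iter(cls))):
                cls.add(x)
                break
        else:
            classes.append({x})      # x starts a new class
    return classes


A = range(7)
classes = equivalence_classes(A, lambda x, y: (x - y) % 3 == 0)
print(classes)
```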


Groups 


A group is a set G with a map, called multiplication
or group law


$$G \times G \to G, \qquad (x, y) \mapsto xy$$


satisfying


1. $(xy)z = x(yz)$, $\forall x, y, z \in G$ (“associative”);

2. there exists a neutral element (or identity) 1 such
that $1x = x1 = x$, $\forall x \in G$; and

3. every element $x \in G$ has an inverse $x^{-1}$, that is,
$x x^{-1} = x^{-1} x = 1$.


A map such as the multiplication in the definition
is an example of a binary operation. Note that we
have denoted the group law as multiplication here.
It is usual to denote it additively if the group is
abelian, that is, if $xy = yx$, $\forall x, y \in G$. In this case, we
may write the condition as $x + y = y + x$, and call
the identity element 0.
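
For a finite set the three axioms (together with closure under the group law) can be verified exhaustively. The following sketch uses arithmetic modulo 5 as a test case; the function name `is_group` and the examples are illustrative choices, not part of the text.

```python
from itertools import product


def is_group(elements, op, identity):
    """Brute-force check of closure, associativity, identity and inverses."""
    elements = list(elements)
    closed = all(op(x, y) in elements for x, y in product(elements, repeat=2))
    associative = all(
        op(op(x, y), z) == op(x, op(y, z))
        for x, y, z in product(elements, repeat=3)
    )
    neutral = all(op(identity, x) == x == op(x, identity) for x in elements)
    inverses = all(
        any(op(x, y) == identity == op(y, x) for y in elements) for x in elements
    )
    return closed and associative and neutral and inverses


def add_mod5(x, y):
    return (x + y) % 5


def mul_mod5(x, y):
    return (x * y) % 5


print(is_group(range(5), add_mod5, 0))      # integers mod 5 under addition
print(is_group(range(5), mul_mod5, 1))      # fails: 0 has no inverse
print(is_group(range(1, 5), mul_mod5, 1))   # nonzero residues under multiplication
```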


Rings 


A ring is a set R equipped with two binary
operations, $x + y$ called addition, and $xy$ called
multiplication, such that


1. R is an abelian group under addition;

2. the multiplication is associative; and

3. $(x + y)z = xz + yz$, $x(y + z) = xy + xz$
(“distributive”).


If the multiplication is commutative ($xy = yx$) then
the ring is said to be commutative. A ring may
contain a multiplicative identity, in which case it is
called a ring with unit element.

An ideal I of R is a subring of R, satisfying in
addition


$$r \in R,\ a \in I \ \Rightarrow\ ra \in I,\ ar \in I$$


One can define in an obvious fashion a left-ideal and
a right-ideal. The above definition will then be for a
two-sided ideal.


Modules 


Given a ring R, an R-module is an abelian group M,
together with an operation, $M \times R \to M$, denoted
multiplicatively, satisfying, for $x, y \in M$, $r, s \in R$,


1. $(x + y)r = xr + yr$,
2. $x(r + s) = xr + xs$,
3. $x(rs) = (xr)s$, and
4. $x1 = x$.


The term right R-module is sometimes used, to
distinguish it from obviously defined left R-modules.

Fields


A field F is a commutative ring in which every 
nonzero element is invertible. 

The additive identity 0 is never invertible, unless 
0 =1, so it is usual to assume that a field has at least 
two elements, 0 and 1. 

The most common fields we come across are, of 
course, the number fields: the rationals, the reals, 
and the complex numbers. 


Vector Spaces 


A vector space, or sometimes linear space, V, over a
field F, is an abelian group, written additively, with
a map $F \times V \to V$ such that, for $x, y \in V$, $a, b \in F$,


1. $a(x + y) = ax + ay$ (“linearity”),
2. $(a + b)x = ax + bx$,

3. $(ab)x = a(bx)$, and

4. $1x = x$.


A vector space is then a right (or left) F-module.
The elements of V are called vectors, and those of F
scalars.


Algebras 


An algebra A over a field F is a ring which is a
vector space over F, such that


$$\alpha(ab) = (\alpha a)b = a(\alpha b), \qquad \alpha \in F,\ a, b \in A$$


Note that in some older literature, particularly the
Russian school, an algebra of operators is called a
ring of operators.


Introduction


Quantum electrodynamics is the theory of the 
electromagnetic interactions of photons and elec- 
trons. When attempting to generalize this theory to 
other interactions it turns out to be necessary to 
identify its essential components. The essential 
properties of electrodynamics are contained in its 
formulation as an “abelian gauge theory.” The 
generalization to include other interactions is then 
reduced to incorporating the structure of nonabelian 
groups. This becomes particularly clear when we 
formulate the theory in the language of differential 
forms. 

Here we first present the formulation of electro- 
dynamics using differential forms. The electromag- 
netic fields are introduced via the Lorentz force 
equation. They are recognized as the components of 
a differential 2-form. This form fulfills two differ- 
ential conditions, which are equivalent to Maxwell’s 
equations. These are expressed with the help of a 
differential operator and its Hermitian conjugate, 
the codifferential operator. We consider the effects 
of charge conservation and introduce electromag- 
netic potentials, which are defined up to gauge 
transformations. We finally consider Weyl’s argu- 
ment for the existence of the electromagnetic 
interaction as a consequence of the local phase 
invariance of the electron wave function. 

We then go on to present the nonabelian generalization.
The gauge bosons appear in a theory with
fermions by requiring invariance of the theory with
respect to local gauge transformations. When the
fermions group into symmetry multiplets this gives
rise to a gauge group SU(N) involving $N^2 - 1$ gauge
bosons mediating the interaction, where $N^2 - 1$ is the
dimension of the Lie algebra. The interaction arises
through the necessity of replacing the usual derivatives
by covariant derivatives, which transform in a
natural way in order to preserve the gauge
invariance. The covariant derivatives involve the
gauge potentials, whose transformation properties
are dictated by those of the covariant derivative.
Whereas for an abelian gauge theory such as 
electromagnetism scalar-valued p-forms are suffi- 
cient (actually only p=1,2), a nonabelian gauge 
theory involves the use of Lie-algebra-valued 
p-forms. These are introduced and used to construct 
the Yang-Mills action, which involves the field 
strength tensor which is determined from the gauge 
potentials. This action leads to the Yang-Mills 
equations for the gauge potentials, which are the 
nonabelian generalizations of the Maxwell equations. 


Relativistic Kinematics 


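The trajectory of a mass point is described as $x^\mu(\tau)$,
where $\tau$ is the invariant proper time interval:


$$d\tau^2 = dt^2 - d\mathbf{x}\cdot d\mathbf{x} = dt^2(1 - v^2) \qquad [1]$$


with $\mathbf{v} = d\mathbf{x}/dt$. With the abbreviation $\gamma = (1 - v^2)^{-1/2}$
this yields $d\tau = (1/\gamma)\,dt$.

The 4-velocity of a point is defined as $u^\mu =
dx^\mu/d\tau = \gamma\,(dx^\mu/dt)$. The quantity


$$u^2 = g_{\mu\nu} u^\mu u^\nu = 1 \qquad [2]$$


is a relativistic invariant. Here


$$g_{\mu\nu} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \qquad [3]$$


is the metric of Minkowski space.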
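
The invariant of eqn [2] can be checked numerically for any velocity below the speed of light. A minimal sketch in units with c = 1, using the metric signature of eqn [3]; the function names are chosen here for illustration.

```python
import math


def gamma(v):
    """Lorentz factor for a 3-velocity v (units with c = 1)."""
    speed2 = sum(c * c for c in v)
    if speed2 >= 1:
        raise ValueError("speed must be below 1 in units with c = 1")
    return 1.0 / math.sqrt(1.0 - speed2)


def four_velocity(v):
    g = gamma(v)
    return (g,) + tuple(g * c for c in v)


def minkowski_dot(a, b):
    """Scalar product with signature (+, -, -, -)."""
    return a[0] * b[0] - sum(x * y for x, y in zip(a[1:], b[1:]))


u = four_velocity((0.6, 0.0, 0.0))
print(gamma((0.6, 0.0, 0.0)), minkowski_dot(u, u))
```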
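The 4-momentum of a particle is $p^\mu = m_0 u^\mu =
(m_0\gamma, m_0\gamma\mathbf{v})$, and $p^\mu p_\mu = m_0^2$. The 4-force is


$$f^\mu = \frac{dp^\mu}{d\tau} = \gamma\frac{dp^\mu}{dt} = \gamma\left(\frac{dp^0}{dt}, \mathbf{f}\right) \qquad [4]$$


with the 3-force

$$\mathbf{f} = \frac{d\mathbf{p}}{dt} \qquad [5]$$

Differentiating $p^2 = m_0^2$ with respect to $\tau$ yields


$$p^\mu f_\mu = \gamma^2 m_0\left(\frac{dp^0}{dt} - \mathbf{f}\cdot\mathbf{v}\right) = 0 \qquad [6]$$

or

$$\frac{dp^0}{dt} = \mathbf{f}\cdot\frac{d\mathbf{x}}{dt} \qquad [7]$$


This says that

$$dp^0 = \mathbf{f}\cdot d\mathbf{x} = dW \qquad [8]$$


where W is the work done and $p^0$ is the energy.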
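For a charged particle, the Lorentz force is


$$\mathbf{f} = q(\mathbf{E} + \mathbf{v}\times\mathbf{B}) \qquad [9]$$


where q is the charge of the particle, $\mathbf{E}$ is the electric,
and $\mathbf{B}$ the magnetic field strength. Since $\mathbf{f}\cdot\mathbf{v} = q\mathbf{E}\cdot\mathbf{v}$,
we have the four-dimensional form of the Lorentz
force:


$$f^\mu = q\gamma(\mathbf{E}\cdot\mathbf{v},\ \mathbf{E} + \mathbf{v}\times\mathbf{B}) \qquad [10]$$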
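
A quick numerical sanity check of this 4-force: it should be Minkowski-orthogonal to the 4-velocity, which is the content of the orthogonality between 4-momentum and 4-force derived earlier. The sketch below uses arbitrary sample values for q, v, E and B; the helper names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def lorentz_four_force(q, v, E, B):
    """Components q*gamma*(E.v, E + v x B) of the Lorentz 4-force (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - dot(v, v))
    f3 = tuple(q * (e + c) for e, c in zip(E, cross(v, B)))
    return (q * g * dot(E, v),) + tuple(g * c for c in f3)


q, v = 2.0, (0.3, -0.2, 0.5)
E, B = (1.0, 2.0, -1.0), (0.5, 0.0, 2.0)

f = lorentz_four_force(q, v, E, B)
g = 1.0 / math.sqrt(1.0 - dot(v, v))
u = (g,) + tuple(g * c for c in v)

# Minkowski product u.f with signature (+, -, -, -); the magnetic part drops
# out because v is orthogonal to v x B.
u_dot_f = u[0] * f[0] - dot(u[1:], f[1:])
print(u_dot_f)
```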


The Lorentz Force Equation with 
Differential Forms 


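We write the Lorentz force equation as an equation
for a differential form $f = f_\mu dx^\mu$, with $f_\mu = g_{\mu\nu} f^\nu$. The
velocity-dependent Lorentz force is


$$f = -q\, i_u F \qquad [11]$$


with


$$u = \gamma\left(\frac{\partial}{\partial t} + v^x\frac{\partial}{\partial x} + v^y\frac{\partial}{\partial y} + v^z\frac{\partial}{\partial z}\right) \qquad [12]$$


the 4-velocity and F the electromagnetic field
strength:


$$F = \mathcal{E} \wedge dt + \mathcal{B} \qquad [13]$$

where $\mathcal{E}$ is a 1-form in three dimensions,

$$\mathcal{E} = E_x\,dx + E_y\,dy + E_z\,dz \qquad [14]$$

and $\mathcal{B}$ is a 2-form in three dimensions,

$$\mathcal{B} = B_x\,dy \wedge dz + B_y\,dz \wedge dx + B_z\,dx \wedge dy \qquad [15]$$


The symbol $i_u$ indicates a contraction of a 2-form
with a vector, which is defined as


$$(i_u F)(v) = F(u, v) \qquad [16]$$


for an arbitrary vector v. The contraction of a
2-form with a vector yields a 1-form.

It is easily seen that a 2-form can be expressed in
terms of a polar vector and an axial vector: if it is to
be invariant with respect to parity transformations
with


$$t \to t, \quad x \to -x, \quad y \to -y, \quad z \to -z \qquad [17]$$


the fields in eqn [13] must transform as

$$\mathbf{E} \to -\mathbf{E}, \quad \mathbf{B} \to \mathbf{B} \qquad [18]$$

Now we check the validity of eqn [11]. We have

$$\begin{aligned} f &= -q\, i_u F \\ &= q\gamma(\mathbf{v}\cdot\mathbf{E})\,dt - q\gamma\big[(E^x + (\mathbf{v}\times\mathbf{B})^x)\,dx \\ &\qquad + (E^y + (\mathbf{v}\times\mathbf{B})^y)\,dy + (E^z + (\mathbf{v}\times\mathbf{B})^z)\,dz\big] \end{aligned} \qquad [19]$$


in agreement with eqn [10]. We remember to change
the signs in $E_x = -E^x$, $B_x = -B^x$, etc.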


The Codifferential Operator 


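The space of p-forms on an n-dimensional manifold
is an


$$\binom{n}{p} = \binom{n}{n-p} \qquad [20]$$


dimensional vector space. The space of p-forms is
thus isomorphic to the space of $(n - p)$-forms. The
Hodge dual operator maps the p-forms into the
$(n - p)$-forms, and is defined by


$$\alpha \wedge {*}\beta = \langle\alpha, \beta\rangle\, dx^1 \wedge \cdots \wedge dx^n \qquad [21]$$

Here $\langle\alpha, \beta\rangle$ is the scalar product of two p-forms:

$$\langle\alpha, \beta\rangle = \alpha_{i_1 \ldots i_p}\, \beta^{i_1 \ldots i_p} \qquad [22]$$


where $\alpha_{i_1 \ldots i_p}$ are the coefficients of the form $\alpha$,

$$\alpha = \alpha_{i_1 \ldots i_p}\, dx^{i_1} \wedge \cdots \wedge dx^{i_p} \qquad [23]$$

$\beta_{j_1 \ldots j_p}$ are the coefficients of the form $\beta$,

$$\beta = \beta_{j_1 \ldots j_p}\, dx^{j_1} \wedge \cdots \wedge dx^{j_p} \qquad [24]$$

and

$$\beta^{i_1 \ldots i_p} = g^{i_1 j_1} \cdots g^{i_p j_p}\, \beta_{j_1 \ldots j_p} \qquad [25]$$


The indices satisfy $i_1 < \cdots < i_p$ and $j_1 < \cdots < j_p$.
The basis elements are orthogonal with respect to
this scalar product, and


$$\langle dx^{i_1} \wedge \cdots \wedge dx^{i_p},\ dx^{i_1} \wedge \cdots \wedge dx^{i_p}\rangle = g^{i_1 i_1} \cdots g^{i_p i_p} \qquad [26]$$

The Hodge dual has the property that

$${*}(dx^{\sigma(1)} \wedge \cdots \wedge dx^{\sigma(p)}) = g^{\sigma(1)\sigma(1)} \cdots g^{\sigma(p)\sigma(p)}\,(\mathrm{sign}\,\sigma)\,(dx^{\sigma(p+1)} \wedge \cdots \wedge dx^{\sigma(n)}) \qquad [27]$$


where $\sigma$ is a permutation of the indices $(1, \ldots, n)$,
$\sigma(1) < \cdots < \sigma(p)$, and $\sigma(p+1) < \cdots < \sigma(n)$. We also
have

$${*}(dx^{\sigma(p+1)} \wedge \cdots \wedge dx^{\sigma(n)}) = g^{\sigma(p+1)\sigma(p+1)} \cdots g^{\sigma(n)\sigma(n)}\,(-1)^{p(n-p)}\,(\mathrm{sign}\,\sigma)\,(dx^{\sigma(1)} \wedge \cdots \wedge dx^{\sigma(p)}) \qquad [28]$$


We therefore find that the application of the
Hodge dual to a p-form twice yields


$$\begin{aligned} {*}{*}(dx^{\sigma(1)} \wedge \cdots \wedge dx^{\sigma(p)}) &= g^{\sigma(1)\sigma(1)} \cdots g^{\sigma(p)\sigma(p)}\,(\mathrm{sign}\,\sigma)\, {*}(dx^{\sigma(p+1)} \wedge \cdots \wedge dx^{\sigma(n)}) \\ &= g^{\sigma(1)\sigma(1)} \cdots g^{\sigma(n)\sigma(n)}\,(-1)^{p(n-p)}\,(dx^{\sigma(1)} \wedge \cdots \wedge dx^{\sigma(p)}) \end{aligned} \qquad [29]$$


or


$${*}{*} = (-1)^{p(n-p) + \mathrm{Ind}\,g}\ \mathrm{Id} \qquad [30]$$


where Ind g is the number of times $(-1)$ occurs along
the diagonal of g.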
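
The dimension count and the sign rule for the double Hodge dual are easy to tabulate. The sketch below evaluates the sign $(-1)^{p(n-p) + \mathrm{Ind}\,g}$ for Minkowski space (n = 4, three negative diagonal entries) and for Euclidean three-space; the function name `double_hodge_sign` is an illustrative choice.

```python
from math import comb


def double_hodge_sign(n, p, ind_g):
    """Sign picked up when the Hodge dual is applied twice to a p-form."""
    return (-1) ** (p * (n - p) + ind_g)


# Minkowski space: n = 4, Ind g = 3
minkowski = [double_hodge_sign(4, p, 3) for p in range(5)]
# Euclidean three-space: n = 3, Ind g = 0
euclidean = [double_hodge_sign(3, p, 0) for p in range(4)]
# Dimensions of the spaces of p-forms for n = 4
dims = [comb(4, p) for p in range(5)]

print(minkowski, euclidean, dims)
```

In particular the double dual acts as -1 on 2-forms such as the field strength in Minkowski space, and as +1 on 1-forms.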

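Now let $\alpha$ be a $(p-1)$-form, and $\beta$ a p-form.
Then $d{*}\beta$ is an $(n - p + 1)$-form, and


$$\begin{aligned} d(\alpha \wedge {*}\beta) &= d\alpha \wedge {*}\beta + (-1)^{p-1}\,\alpha \wedge d{*}\beta \\ &= d\alpha \wedge {*}\beta + (-1)^{p-1}(-1)^{(p-1)(n-p+1) + \mathrm{Ind}\,g}\,\alpha \wedge ({*}{*})\,d{*}\beta \\ &= d\alpha \wedge {*}\beta + (-1)^{(n-p)(p-1) + \mathrm{Ind}\,g}\,\alpha \wedge {*}({*}d{*}\beta) \end{aligned} \qquad [31]$$


We then have


$$(d\alpha, \beta) - (\alpha, d^*\beta) = \int d(\alpha \wedge {*}\beta) \qquad [32]$$


with

$$d^* = -(-1)^{(n-p)(p-1) + \mathrm{Ind}\,g}\ {*}d{*} \qquad [33]$$


We are here using the scalar product of two p-forms


$$(\alpha, \beta) = \int \alpha \wedge {*}\beta \qquad [34]$$


With the help of Stokes’ theorem the last integral in
eqn [32] may be turned into a surface term at
infinity, which vanishes for $\alpha$ and $\beta$ with compact
support. $d^*$ is the adjoint operator to d with respect
to the scalar product $(\ ,\ )$. Whereas the differential
operator d maps p-forms into $(p + 1)$-forms, the
codifferential operator $d^*$ maps p-forms into $(p - 1)$-forms.


The relation $d^2 = 0$ leads to


$$(d^*)^2 \propto ({*}d{*})({*}d{*}) \propto {*}d^2{*} = 0 \qquad [35]$$


This fact plays an essential role in connection with
the conservation laws.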
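Finally, we want to obtain a coordinate expression
for $d^*\beta$. Indeed $d^*\beta = -\mathrm{Div}\,\beta$ for


$$(\mathrm{Div}\,\beta)^L = \frac{\partial \beta^{jL}}{\partial x^j} \qquad [36]$$

where K is the multi-index of the coefficients in
$\beta = \beta_{\underline{K}}\, dx^{\underline{K}}$, and $\underline{K}$ indicates that $K = (k_1, \ldots, k_p)$ is in
the order $k_1 < \cdots < k_p$. We will show that
$(\alpha, d^*\beta) = (\alpha, -\mathrm{Div}\,\beta)$ for an arbitrary $(p - 1)$-form
$\alpha$. It is a fact that


$$(\alpha, d^*\beta) = (d\alpha, \beta) = \int (d\alpha)_K\, \beta^K\ {*}1 \qquad [37]$$


Now we have the coordinate expressions


$$d\alpha = (d\alpha_L) \wedge dx^L \qquad [38]$$

and $(dx^L)_M = \delta^L_M$. It follows that

$$(d\alpha)_K = (d\alpha_L \wedge dx^L)_K = \delta^{jM}_K\, \frac{\partial \alpha_L}{\partial x^j}\, \delta^L_M \qquad [39]$$

or

$$(d\alpha)_K = \delta^{jL}_K\, \frac{\partial \alpha_L}{\partial x^j} \qquad [40]$$

Here we use

$$(\alpha \wedge \beta)_I = \delta^{KL}_I\, \alpha_K\, \beta_L \qquad [41]$$

where

$$\delta^{KL}_I = \begin{cases} 1 & \text{if } (KL) \text{ is an even permutation of } I \\ -1 & \text{if } (KL) \text{ is an odd permutation of } I \\ 0 & \text{otherwise} \end{cases} \qquad [42]$$


Use of the Leibniz rule yields


$$\begin{aligned} \int (d\alpha)_K\, \beta^K\ {*}1 &= \int \delta^{jL}_K\, \frac{\partial \alpha_L}{\partial x^j}\, \beta^K\ {*}1 \\ &= \int \frac{\partial}{\partial x^j}\big(\delta^{jL}_K\, \alpha_L\, \beta^K\big)\ {*}1 - \int \alpha_L\, \delta^{jL}_K\, \frac{\partial \beta^K}{\partial x^j}\ {*}1 \end{aligned} \qquad [43]$$

The first term corresponds to a surface integration
and we can neglect it. We then have $\delta^{jL}_K \beta^K = \beta^{jL}$ from
the antisymmetry of $\beta$, so that


$$(\alpha, d^*\beta) = -\int \alpha_L\, \frac{\partial \beta^{jL}}{\partial x^j}\ {*}1 = (\alpha, -\mathrm{Div}\,\beta) \qquad [44]$$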


The Maxwell Equations 


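The Maxwell equations become remarkably concise
when expressed in terms of differential forms, namely


$$dF = 0, \quad d^*F = -j \qquad [45]$$

where F is the field strength and j is the current
density. We wish to demonstrate this. We use a
(3 + 1)-separation of the exterior derivative into a
timelike and a spacelike part:


$$d = \mathbf{d} + dt \wedge \frac{\partial}{\partial t} \qquad [46]$$

where $\mathbf{d}$ denotes the spacelike part. We then get

$$dF = \left(\mathbf{d}\mathcal{E} + \frac{\partial \mathcal{B}}{\partial t}\right) \wedge dt + \mathbf{d}\mathcal{B} = 0 \qquad [47]$$

By comparing coefficients, we arrive at

$$\mathbf{d}\mathcal{E} + \frac{\partial \mathcal{B}}{\partial t} = 0, \quad \mathbf{d}\mathcal{B} = 0 \qquad [48]$$

In vector notation

$$\mathrm{curl}\,\mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \quad \mathrm{div}\,\mathbf{B} = 0 \qquad [49]$$

the usual form of the homogeneous Maxwell
equations.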
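By direct application of the formula [27], one finds


$${*}F = -{\star}\mathcal{B} \wedge dt + {\star}\mathcal{E} \qquad [50]$$


where $\star$ means the Hodge dual in three space
dimensions. One finds


$$d{*}F = \mathbf{d}{\star}\mathcal{E} - \left(\mathbf{d}{\star}\mathcal{B} - \frac{\partial\,{\star}\mathcal{E}}{\partial t}\right) \wedge dt \qquad [51]$$

Therefore,

$$\begin{aligned} d{*}F = {}& -(\mathrm{div}\,\mathbf{E})\,dx \wedge dy \wedge dz \\ &+ \left((\mathrm{curl}\,\mathbf{B})^x - \frac{\partial E^x}{\partial t}\right) dy \wedge dz \wedge dt \\ &+ \left((\mathrm{curl}\,\mathbf{B})^y - \frac{\partial E^y}{\partial t}\right) dz \wedge dx \wedge dt \\ &+ \left((\mathrm{curl}\,\mathbf{B})^z - \frac{\partial E^z}{\partial t}\right) dx \wedge dy \wedge dt \end{aligned} \qquad [52]$$


We apply again the Hodge dual:


$$\begin{aligned} {*}d{*}F = {}& -(\mathrm{div}\,\mathbf{E})\,dt + \left((\mathrm{curl}\,\mathbf{B})^x - \frac{\partial E^x}{\partial t}\right) dx \\ &+ \left((\mathrm{curl}\,\mathbf{B})^y - \frac{\partial E^y}{\partial t}\right) dy + \left((\mathrm{curl}\,\mathbf{B})^z - \frac{\partial E^z}{\partial t}\right) dz \end{aligned} \qquad [53]$$


In Minkowski space the expression ${*}d{*}$ equals the
codifferential. Therefore, the equation $d^*F = {*}d{*}F = -j$
holds, with j given by $j^\mu = (\rho, \mathbf{J})$, which is
equivalent to


$$\mathrm{div}\,\mathbf{E} = \rho, \quad \mathrm{curl}\,\mathbf{B} - \frac{\partial \mathbf{E}}{\partial t} = \mathbf{J} \qquad [54]$$


the inhomogeneous Maxwell equations.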


Current Conservation 


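The electromagnetic 4-current is


$$j^\mu = \rho_0 u^\mu = (\rho_0\gamma,\ \rho_0\gamma\mathbf{v}) = (\rho, \mathbf{J}) \qquad [55]$$


where $\rho$ is the charge density and $\mathbf{J}$ the current
density. This corresponds to a 1-form


$$j = \rho\,dt - J^x\,dx - J^y\,dy - J^z\,dz \qquad [56]$$


The Hodge dual is ${*}j = \sigma^3 - j^2 \wedge dt$, with the 3-form
$\sigma^3 = \rho\,dx \wedge dy \wedge dz$, and the 2-form


$$j^2 = -J^x\,dy \wedge dz - J^y\,dz \wedge dx - J^z\,dx \wedge dy \qquad [57]$$


From the Maxwell equation $d^*F = -j$, it follows
that

$$(d^*)^2 F = -d^*j = 0 \qquad [58]$$

that is

$$\begin{aligned} {*}d({*}j) &= {*}d(\sigma^3 - j^2 \wedge dt) = {*}(d\sigma^3 - dj^2 \wedge dt) \\ &= {*}\left(\left(\frac{\partial \rho}{\partial t} + \mathrm{div}\,\mathbf{J}\right) dt \wedge dx \wedge dy \wedge dz\right), \\ \frac{\partial \rho}{\partial t} + \mathrm{div}\,\mathbf{J} &= 0 \end{aligned} \qquad [59]$$


This is the “continuity equation.”
The total charge inside a volume V is $Q = \int_V \rho\,dV$,
therefore

$$-\frac{dQ}{dt} = -\frac{\partial}{\partial t}\int_V \rho\,dV = \int_{\partial V} \mathbf{J}\cdot\mathbf{n}\,dS \qquad [60]$$


where $\partial V$ is the surface which encloses the
volume V, dS is the surface element, and $\mathbf{n}$ is the normal
vector to this surface. This is current conservation.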
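
The continuity equation can be illustrated with a simple one-dimensional example that is not taken from the text: a charge profile drifting rigidly with speed C carries the current $J = C\rho$, and the sum $\partial\rho/\partial t + \partial J/\partial x$ then vanishes. The sketch below checks this with central finite differences; the profile, the speed `C` and the step `h` are all assumptions made for the illustration.

```python
import math

C = 0.5  # hypothetical drift speed (units with c = 1)


def rho(t, x):
    """Gaussian charge profile moving rigidly to the right."""
    return math.exp(-((x - C * t) ** 2))


def current(t, x):
    return C * rho(t, x)


def continuity_residual(t, x, h=1e-5):
    """Central-difference estimate of d(rho)/dt + dJ/dx."""
    d_rho_dt = (rho(t + h, x) - rho(t - h, x)) / (2 * h)
    d_j_dx = (current(t, x + h) - current(t, x - h)) / (2 * h)
    return d_rho_dt + d_j_dx


residuals = [
    continuity_residual(t, x) for t in (0.0, 0.7) for x in (-1.0, 0.2, 1.5)
]
print(residuals)
```

The residuals are zero up to the discretization and rounding error of the finite differences.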
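The Gauge Potential


The “Poincaré lemma” tells us that $dF = 0$ implies
$F = dA$, with the 4-potential A:


$$A = \phi\,dt + \mathcal{A} \qquad [61]$$


and the vector potential $\mathcal{A} = A_x\,dx + A_y\,dy + A_z\,dz$.
From


$$\begin{aligned} F = \mathcal{E} \wedge dt + \mathcal{B} &= \left(\mathbf{d} + dt \wedge \frac{\partial}{\partial t}\right)(\phi\,dt + \mathcal{A}) \\ &= \mathbf{d}\phi \wedge dt + \mathbf{d}\mathcal{A} + dt \wedge \frac{\partial \mathcal{A}}{\partial t} \end{aligned} \qquad [62]$$


(with $\mathbf{d}$ the spacelike part of d, as in eqn [46]) it follows by comparing coefficients that


$$\mathcal{E} = \mathbf{d}\phi - \frac{\partial \mathcal{A}}{\partial t}, \quad \mathcal{B} = \mathbf{d}\mathcal{A} \qquad [63]$$

In vector notation this is

$$\mathbf{E} = -\mathrm{grad}\,\phi - \frac{\partial \mathbf{A}}{\partial t}, \quad \mathbf{B} = \mathrm{curl}\,\mathbf{A} \qquad [64]$$


The 4-potential is determined up to a gauge function $\lambda$:

$$A' = A + d\lambda \qquad [65]$$


This gauge freedom has no influence on the
observable quantities $\mathbf{E}$ and $\mathbf{B}$:


$$F' = dA' = dA + d^2\lambda = dA = F \qquad [66]$$


The Laplace operator is $\Delta = (d^* + d)^2 = dd^* +
d^*d$, so when the 4-potential A fulfills the condition
$d^*A = 0$, we have


$$\Delta A = d^*dA = d^*F = -j \qquad [67]$$


the “classical wave equation.” The condition
$d^*A = 0$ is called the “Lorentz gauge condition.”
This condition can always be fulfilled by using the
gauge freedom: $d^*(A + d\lambda) = 0$ is fulfilled when
$d^*d\lambda = \Delta\lambda = -d^*A$, where we have used the fact
that $d^*\lambda = 0$ for functions. That is to say, $d^*A' = 0$ is
fulfilled when $\lambda$ is a solution of the inhomogeneous
wave equation.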
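
The statement that a gauge transformation leaves the field strength unchanged can be checked numerically in a reduced setting. The sketch below works in two dimensions (t, x) only, where the single independent component is $F_{tx} = \partial_t A_x - \partial_x A_t$; the potential, the gauge function $\lambda(t,x) = \sin(t)\,x^2$ and all function names are made up for the illustration.

```python
import math


def A(t, x):
    """Sample potential, returned as the pair (A_t, A_x)."""
    return (t * x, t * t + x)


def grad_lam(t, x):
    """Exact derivatives (d/dt, d/dx) of lambda(t, x) = sin(t) * x**2."""
    return (math.cos(t) * x * x, 2.0 * x * math.sin(t))


def A_gauged(t, x):
    a, dl = A(t, x), grad_lam(t, x)
    return (a[0] + dl[0], a[1] + dl[1])


def field_strength(pot, t, x, h=1e-5):
    """Central-difference estimate of F_tx = d_t A_x - d_x A_t."""
    d_t_Ax = (pot(t + h, x)[1] - pot(t - h, x)[1]) / (2 * h)
    d_x_At = (pot(t, x + h)[0] - pot(t, x - h)[0]) / (2 * h)
    return d_t_Ax - d_x_At


F = field_strength(A, 0.4, 1.3)
F_gauged = field_strength(A_gauged, 0.4, 1.3)
print(F, F_gauged)
```

For this sample potential $F_{tx} = t$ exactly, and the gauged potential reproduces the same value up to finite-difference error.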


Gauge Invariance 


In quantum mechanics, the electron is described by a
wave function which is determined up to a free
phase. Indeed, at every point in space this phase can
be chosen arbitrarily:


$$\begin{aligned} \psi(x) &\to \psi'(x) = \exp\{i\alpha(x)\}\,\psi(x) \\ \bar\psi(x) &\to \bar\psi'(x) = \bar\psi(x)\,\exp\{-i\alpha(x)\} \end{aligned} \qquad [68]$$


with the only condition being that $\alpha(x)$ is a
continuous function. The gauge transformation is
of the form $g = \exp\{i\alpha(x)\}$, with g an element of the
abelian gauge group $G = U(1)$. The free action is


$$S_0 = \int \mathcal{L}_0\, d^4x \qquad [69]$$

with


$$\mathcal{L}_0 = \bar\psi(i\gamma^\mu\partial_\mu - m)\psi \qquad [70]$$


the “Lagrange density.” This action is not invariant
under gauge transformations:


$$\mathcal{L}_0 \to \mathcal{L}_0' = \bar\psi(i\gamma^\mu\partial_\mu - m)\psi - (\partial_\mu\alpha)\,\bar\psi\gamma^\mu\psi \qquad [71]$$


The undesired term can be compensated by the
introduction of a gauge potential $\omega$ in a covariant
derivative of $\psi$,


$$D\psi = (d + \omega)\psi \qquad [72]$$


which has the desired transformation property
$D\psi \to \exp\{i\alpha\}D\psi$ when besides the transformation
$\psi(x) \to \exp\{i\alpha(x)\}\psi(x)$ of the matter field the gauge
potential simultaneously transforms according to the
gauge transformation $\omega \to \omega - i\,d\alpha$. The new Lagrange
density is


$$\mathcal{L} = \bar\psi(i\gamma^\mu D_\mu - m)\psi = \mathcal{L}_0 + i\,\omega_\mu(x)\,\bar\psi(x)\gamma^\mu\psi(x) \qquad [73]$$


The substitution $\partial_\mu \to D_\mu$ is known to physicists;
with $\omega = -iqA$ it is the ansatz of minimal coupling
for taking into account electromagnetic effects:
$\partial_\mu \to \partial_\mu - iqA_\mu$. The Lagrange density becomes in
this notation $\mathcal{L} = \mathcal{L}_0 - A_\mu J^\mu$, where $J^\mu = -q\bar\psi\gamma^\mu\psi$.

The Lagrange density must now be completed by 
a kinetic term for the gauge potential and we get the 
complete electromagnetic Lagrange density 


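$$\mathcal{L} = \mathcal{L}_0 - A_\mu J^\mu - \tfrac14 F_{\mu\nu}F^{\mu\nu}$$ [74]


with $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$. In the action this corresponds to


$$S = S_0 - \int_M A_\mu J^\mu\,\mathrm{vol}^4 - \frac14\int_M F_{\mu\nu}F^{\mu\nu}\,\mathrm{vol}^4$$ [75]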


We get the field equations for the potential A by 
demanding that the variation of the action vanishes: 


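$$\delta S[A] = -\int_M \delta A_\mu J^\mu\,\mathrm{vol}^4 - \frac14\,\delta\!\int_M F_{\mu\nu}F^{\mu\nu}\,\mathrm{vol}^4$$ [76]


We write now


$$\int_M \delta A_\mu J^\mu\,\mathrm{vol}^4 = (\delta A, j)$$ [77]

and

$$\frac14\,\delta\!\int_M F_{\mu\nu}F^{\mu\nu}\,\mathrm{vol}^4 = \frac12\,\delta\!\int_M F\wedge *F = \frac12\,\delta(F,F)$$
$$= (\delta dA, F) = (d\,\delta A, F) = (\delta A, d^*F)$$ [78]


where we have exchanged the action of $\delta$ and $d$.
Since this holds for arbitrary variations $\delta A$ we find


$$d^*F = -j$$ [79]


the inhomogeneous Maxwell equation.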


Nonabelian Gauge Theories 


In SU(N) gauge theory the elementary particles are 
taken to be members of symmetry multiplets. For 
example, in electroweak theory the left-handed 
electron and the neutrino are members of an SU(2) 


doublet: 
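$$\psi = \begin{pmatrix}\nu_e \\ e\end{pmatrix}_L$$ [80]


$$\psi(x) \to \psi'(x) = g^{-1}(x)\,\psi(x), \qquad \bar\psi(x) \to \bar\psi'(x) = \bar\psi(x)\,g(x)$$ [81]


$$g(x) = \exp\{\Lambda(x)\}$$ [82]


where $g(x)$ is an element of the Lie group SU(2) and
$\Lambda$ is an element of the Lie algebra su(2). The Lie
algebra is a vector space, and its elements may be
expanded in terms of a basis: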


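$$\Lambda(x) = \Lambda^a(x)\,T_a$$ [83]


For su(2) the basis elements are traceless and anti-Hermitian (see below); they are conventionally
expressed in terms of the Pauli matrices,


$$T_a = \frac{\sigma_a}{2i}$$ [84]
with
$$\sigma_1 = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix}0 & -i\\ i & 0\end{pmatrix}, \qquad \sigma_3 = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}$$ [85]


They are conventionally normalized according to


$$\mathrm{tr}(T_aT_b) = -\tfrac12\,\delta_{ab}$$ [86]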
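The properties claimed for the generators in eqns [84]-[86] can be checked numerically. The following is a minimal sketch in plain Python, assuming the convention $T_a = \sigma_a/2i$; the helper names (`matmul`, `dagger`, `commutator`, and so on) are illustrative and not part of the article.

```python
# Illustrative check of eqns [84]-[86]; helper names are assumptions of this sketch.

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def dagger(a):
    """Hermitian conjugate: complex conjugate of the transpose."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
# T_a = sigma_a / (2i), eqn [84]
T = [scale(1 / 2j, sigma) for sigma in PAULI]

for a in range(3):
    assert abs(trace(T[a])) < 1e-12                 # traceless
    assert close(dagger(T[a]), scale(-1, T[a]))     # anti-Hermitian
    for b in range(3):
        expected = -0.5 if a == b else 0.0          # eqn [86]
        assert abs(trace(matmul(T[a], T[b])) - expected) < 1e-12

# With this normalization the structure constants are real: [T_1, T_2] = T_3.
assert close(commutator(T[0], T[1]), T[2])
```

With this choice the structure constants come out real, which is the reason for dividing the Hermitian Pauli matrices by $2i$ rather than by 2.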


The Dirac Lagrangian is not invariant with 
respect to local gauge transformations: 


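$$\mathcal{L}_0 = \bar\psi(i\gamma^\mu\partial_\mu - m)\psi \to \mathcal{L}_0' = \mathcal{L}_0 + i\bar\psi\gamma^\mu(g\,\partial_\mu g^{-1})\psi$$ [87]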
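We introduce the gauge potential
$$\omega_\mu(x) = \omega^a_\mu(x)\,T_a$$ [88]


with a gauge transformation
$$\omega_\mu \to \omega_\mu' = g^{-1}\omega_\mu g + g^{-1}\partial_\mu g$$ [89]


The Lagrange density is modified through a covariant derivative:


$$\partial_\mu \to D_\mu = \partial_\mu + \omega_\mu$$ [90]
The covariant derivative $D_\mu$ transforms according to
$$D_\mu \to D_\mu' = g^{-1}D_\mu\,g$$ [91]

and thus the modified Lagrange density
$$\mathcal{L} = \bar\psi(i\gamma^\mu D_\mu - m)\psi = \mathcal{L}_0 + i\bar\psi\gamma^\mu\omega_\mu\psi$$ [92]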


is invariant with respect to local gauge transformations. 
The extra term in the Lagrange density is
conventionally written


=A [93] 

with 
Al, = —igqui, [94] 

and 
Ji = p" Tap [95] 


In mathematical terminology $\omega$ is called a connection. The quantity $A$ is the physicist's gauge
potential. The connection is anti-Hermitian and the
gauge potential Hermitian. The gauge potential also
includes the coupling constant $q$. We will refer to
both $\omega$ and $A$ as the gauge potential, where the
relation between them is given by eqn [94].

We can write the gauge potential as $A = A^a_\mu\,dx^\mu\,T_a$
or, in the SU(2) case, as


$$A_\mu = A^1_\mu T_1 + A^2_\mu T_2 + A^3_\mu T_3$$ [96]


where we see explicitly that it involves three vector
fields, which couple to the electroweak currents [95]
with the single coupling constant $q$, and which will
become after symmetry breaking the three vector
bosons $W^+$, $W^-$, $Z^0$ of the electroweak gauge theory.
Actually, in an SU(2) × U(1) theory a mixture of the neutral SU(2) gauge boson and the
U(1) gauge boson yields the $Z^0$ boson, while the
orthogonal mixture is the photon, which mediates the electromagnetic
interaction. At this stage,
the gauge bosons are all massless; their masses are
generated by the “Higgs mechanism.”


Lie-Algebra-Valued p-Forms 


To describe nonabelian fields, we need Lie-algebra- 
valued p-forms: 


$$\phi = T_a\,\phi^a$$ [97]


where $T_a$ is a generator of the Lie algebra, the index
$a$ runs over the number of generators of the Lie
algebra, and the $\phi^a$ are the usual scalar-valued
p-forms. The composition in a Lie algebra is a Lie
bracket, which is defined for two Lie-algebra-valued
p-forms by


$$[\phi, \psi] = [T_a, T_b]\,\phi^a\wedge\psi^b$$ [98]
The Lie bracket in the algebra is
$$[T_a, T_b] = f^c_{ab}\,T_c$$ [99]


where $f^c_{ab}$ are the structure constants. It follows from
this that
$$[\psi, \phi] = [T_a, T_b]\,\psi^a\wedge\phi^b = -[T_b, T_a]\,\psi^a\wedge\phi^b$$ [100]


or


$$[\psi, \phi] = (-1)^{pq+1}\,[\phi, \psi]$$ [101]


when $\phi$ is a p-form and $\psi$ is a q-form. In the special
case that $T_a$ is a matrix, also the product $T_aT_b$ is
defined, and from this the product of two Lie-algebra-valued p-forms


$$\phi\wedge\psi = T_a\phi^a\wedge T_b\psi^b = T_aT_b\,\phi^a\wedge\psi^b$$ [102]
Now the Lie bracket is a commutator:
$$[T_a, T_b] = T_aT_b - T_bT_a$$ [103]
and
$$[\phi, \psi] = [T_a, T_b]\,\phi^a\wedge\psi^b$$
$$= T_a\phi^a\wedge T_b\psi^b - (-1)^{pq}\,T_b\psi^b\wedge T_a\phi^a$$
$$= \phi\wedge\psi - (-1)^{pq}\,\psi\wedge\phi$$ [104]


From this relation it follows that for $\phi$ and $\psi$ odd
p-forms


$$[\phi, \psi] = \phi\wedge\psi + \psi\wedge\phi$$ [105]
For $\phi$ an odd p-form
$$[\phi, \phi] = \phi\wedge\phi + \phi\wedge\phi = 2(\phi\wedge\phi)$$ [106]


The Gauge Potential and the 
Field Strength 


The generalization of the abelian relationship
between the gauge potential and the field strength,
$F = dA$, is

$$\theta = d\omega + \tfrac12[\omega, \omega] = d\omega + \omega\wedge\omega$$ [107]
where because $\omega$ is a 1-form we can use eqn [106].


The mathematician refers to $\theta$ as the curvature. The
physicist writes, in analogy to eqn [94],


F = —i q0 = s a Ads” Ta [108] 
One obtains for the components
$$F^a_{\mu\nu} = \partial_\mu A^a_\nu - \partial_\nu A^a_\mu - iq\,f^a_{bc}\,A^b_\mu A^c_\nu$$ [109]


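A generalization of the gauge transformation of
$A$, that is, $A' = A + d\Lambda$, is eqn [89]:


$$\omega' = g^{-1}\omega g + g^{-1}dg$$ [110]
A quantity $\phi$ with the transformation property
$$\phi' = g^{-1}\phi\,g$$ [111]


is called a “tensorial” quantity. The gauge potential
$\omega$ is according to this definition nontensorial.
Nevertheless the field strength is tensorial. Indeed


$$\theta' = d(g^{-1}\omega g) + (dg^{-1})\wedge dg + \tfrac12[g^{-1}\omega g + g^{-1}dg,\; g^{-1}\omega g + g^{-1}dg]$$
$$= (dg^{-1})\wedge\omega g + g^{-1}d\omega\,g - g^{-1}\omega\wedge dg + (dg^{-1})\wedge dg$$
$$\quad + \tfrac12 g^{-1}[\omega, \omega]g + \tfrac12[g^{-1}\omega g, g^{-1}dg] + \tfrac12[g^{-1}dg, g^{-1}\omega g] + \tfrac12[g^{-1}dg, g^{-1}dg]$$
$$= g^{-1}\theta g + (dg^{-1})\wedge\omega g - g^{-1}\omega\wedge dg + (dg^{-1})\wedge dg$$
$$\quad + g^{-1}\omega\wedge dg + g^{-1}dg\wedge g^{-1}\omega g + g^{-1}dg\wedge g^{-1}dg$$
$$= g^{-1}\theta g$$ [112]


where we have differentiated the relation
$g^{-1}g = \mathrm{Id}$ to get


$$dg^{-1} = -g^{-1}\,dg\,g^{-1}$$ [113]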


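In the abelian case, we had $dF = 0$. The nonabelian analog is


$$d\theta = d\omega\wedge\omega - \omega\wedge d\omega$$
$$= (\theta - \omega\wedge\omega)\wedge\omega - \omega\wedge(\theta - \omega\wedge\omega)$$
$$= \theta\wedge\omega - \omega\wedge\theta$$ [114]


or


$$d\theta + \omega\wedge\theta - \theta\wedge\omega = 0$$ [115]
the Bianchi identity. It can also be written as


$$d\theta + \omega\wedge\theta - \theta\wedge\omega = d\theta + [\omega, \theta] = 0$$ [116]
because from eqn [104]
$$\omega\wedge\theta + (-1)^{1\cdot 2+1}\,\theta\wedge\omega = [\omega, \theta]$$ [117]
The covariant derivative $D$ is defined as
$$D\phi := d\phi + [\omega, \phi]$$ [118]


for $\phi$ a tensorial quantity. The covariant derivative
takes tensorial p-forms into tensorial (p + 1)-forms:


$$D'\phi' = d(g^{-1}\phi g) + [g^{-1}\omega g + g^{-1}dg,\; g^{-1}\phi g]$$
$$= dg^{-1}\wedge\phi g + g^{-1}d\phi\,g + (-1)^p g^{-1}\phi\wedge dg + [g^{-1}\omega g, g^{-1}\phi g] + [g^{-1}dg, g^{-1}\phi g]$$
$$= g^{-1}D\phi\,g + dg^{-1}\wedge\phi g + (-1)^p g^{-1}\phi\wedge dg + g^{-1}dg\,g^{-1}\wedge\phi g - (-1)^p g^{-1}\phi\wedge dg$$
$$= g^{-1}D\phi\,g$$ [119]


We have thereby verified the transformation property of eqn [91].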


The Gauge Group 


From the gauge transformation $\psi' = g\psi$ the requirement $|\psi'|^2 = |\psi|^2$ leads to $g^\dagger g = 1$. That means that $g$
belongs to the unitary Lie group $G = U(n)$, whose
elements fulfill


$$g^\dagger = \bar g^{\,t} = g^{-1}$$ [120]


For elements $X$ of the Lie
algebra $\mathfrak{g} = u(n)$ this implies


$$X^\dagger = \bar X^{\,t} = -X$$ [121]


where $\bar X$ is complex conjugation and $X^t$ means
transposition.

For elements of the Lie algebra we can define a
scalar product (the Killing metric)


$$(X, Y) = -\mathrm{tr}(XY)$$ [122]
The scalar product is real:
$$\overline{(X, Y)} = -\bar X_{ij}\bar Y_{ji} = -X_{ji}Y_{ij} = (X, Y)$$ [123]
symmetric:
$$(X, Y) = -\mathrm{tr}(XY) = -\mathrm{tr}(YX) = (Y, X)$$ [124]
and positive definite:
$$(X, X) = -X_{ij}X_{ji} = X_{ij}\bar X_{ij} = \sum_{i,j}|X_{ij}|^2 \ge 0$$ [125]


The scalar product is invariant under the action of
$G$ on $\mathfrak{g}$: for $g \in G$


$$(gXg^{-1}, gYg^{-1}) = -\mathrm{tr}(gXYg^{-1}) = -\mathrm{tr}(XY) = (X, Y)$$ [126]
or for $X, Y, Z \in \mathfrak{g}$
$$(e^{tX}Ye^{-tX}, e^{tX}Ze^{-tX}) = (Y, Z)$$ [127]


We take the derivative of this equation with respect
to $t$ at the value $t = 0$ and get:


$$([X, Y], Z) + (Y, [X, Z]) = 0$$ [128]
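Eqn [128] is easy to test numerically for su(2). The sketch below assumes the basis $T_a = \sigma_a/2i$ of eqn [84] and the scalar product $(X,Y) = -\mathrm{tr}(XY)$; the function names and the sample coefficients are hypothetical, chosen only for illustration.

```python
# Numerical check of the ad-invariance identity [128] for su(2).
# Function names and sample coefficients are assumptions of this sketch.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

def killing(x, y):
    """Scalar product (X, Y) = -tr(XY); real for anti-Hermitian X, Y."""
    xy = matmul(x, y)
    return -(xy[0][0] + xy[1][1]).real

def su2_element(c1, c2, c3):
    """Return c1*T_1 + c2*T_2 + c3*T_3 with T_a = sigma_a/(2i), c_a real."""
    return [[-0.5j * c3, -0.5j * c1 - 0.5 * c2],
            [-0.5j * c1 + 0.5 * c2, 0.5j * c3]]

X = su2_element(0.3, -1.2, 0.7)
Y = su2_element(1.1, 0.4, -0.5)
Z = su2_element(-0.8, 0.9, 2.0)

# ([X, Y], Z) + (Y, [X, Z]) should vanish up to rounding.
residual = killing(commutator(X, Y), Z) + killing(Y, commutator(X, Z))
assert abs(residual) < 1e-12

# Positive definiteness [125]: (X, X) = (c1^2 + c2^2 + c3^2) / 2 in this basis.
assert abs(killing(X, X) - 0.5 * (0.3**2 + 1.2**2 + 0.7**2)) < 1e-12
```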


We define an action of the algebra $\mathfrak{g}$ on itself:
$$\mathrm{ad}(X): \mathfrak{g} \to \mathfrak{g}, \qquad \mathrm{ad}(X)Y = [X, Y]$$ [129]


We can then formulate our conclusion as follows:
the action of $\mathfrak{g}$ on itself is anti-Hermitian:
$$(\mathrm{ad}(X)Y, Z) = -(Y, \mathrm{ad}(X)Z)$$ [130]


or


$$[\mathrm{ad}(X)]^\dagger = -\mathrm{ad}(X)$$ [131]


From $g^\dagger g = 1$ we have $|\det(g)|^2 = 1$. For the gauge
group $G = SU(N)$ we require in addition $\det(g) = 1$.
Since


$$\det(g) = \det(\exp(X)) = \exp(\mathrm{tr}(X))$$ [132]


the elements $X \in su(N)$ must be traceless. A basis of
the vector space of traceless, anti-Hermitian (2 × 2)
matrices is given by the matrices $T_a = \sigma_a/2i$ built from the Pauli matrices, eqns [84] and [85].
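The identity [132] and the resulting tracelessness condition can be illustrated with a small numerical experiment. This is only a sketch: the matrix exponential is approximated by a truncated Taylor series, and the two sample matrices are hypothetical anti-Hermitian matrices chosen for the example.

```python
import cmath

# Sketch illustrating det(exp X) = exp(tr X), eqn [132], for 2x2 matrices.
# The series truncation and the sample matrices are assumptions of this example.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(x, terms=40):
    """Matrix exponential by its Taylor series; adequate for small 2x2 inputs."""
    result = [[1 + 0j, 0j], [0j, 1 + 0j]]
    term = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for n in range(1, terms):
        term = [[v / n for v in row] for row in matmul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

# Anti-Hermitian and traceless: an element of su(2).
X_su2 = [[0.3j, 0.5 + 0.2j], [-0.5 + 0.2j, -0.3j]]
# Anti-Hermitian with trace 0.4i: an element of u(2) but not of su(2).
X_u2 = [[0.7j, 0.5 + 0.2j], [-0.5 + 0.2j, -0.3j]]

g = expm(X_su2)
assert abs(det(g) - 1) < 1e-12                      # traceless X gives det g = 1
assert abs(det(expm(X_u2)) - cmath.exp(0.4j)) < 1e-12

# Unitarity: g^dagger g = 1 because X is anti-Hermitian.
gg = matmul(dagger(g), g)
assert abs(gg[0][0] - 1) < 1e-12 and abs(gg[0][1]) < 1e-12
```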


The Yang-Mills Action 


The SU(2) Yang-Mills action is, in analogy to the 
abelian case, 


1 i 1 a 
S=- 4q2 ial aie De: olf -za | Et )vol 


1 
=r J tr(F A x F) 


We have included the trace in our definition of the 
scalar product: 


[133] 


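$$(\phi, \psi) = \int_M \langle\phi, \psi\rangle\,\mathrm{vol} = -\int_M \mathrm{tr}(\phi\wedge *\psi)$$ [134]


We then write eqn [133] as


$$S[\omega] = \tfrac12\,(\theta, \theta)$$ [135]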


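taking into account the relation between $\theta$ and the
field strength $F$, and indicating the dependence on
the gauge potential. Since $\theta$ is tensorial the action is
invariant.
Now we calculate the variation of $S[\omega]$ with
respect to a variation of the gauge potential:


$$\delta S[\omega] = \frac{d}{d\epsilon}S[\omega(\epsilon)]\Big|_{\epsilon=0} = \tfrac12\,\delta(\theta, \theta) = \tfrac12\{(\delta\theta, \theta) + (\theta, \delta\theta)\} = (\delta\theta, \theta)$$
$$= \left(\delta\left(d\omega + \tfrac12[\omega, \omega]\right), \theta\right) = \left(\delta d\omega + \tfrac12[\delta\omega, \omega] + \tfrac12[\omega, \delta\omega], \theta\right)$$
$$= (d\,\delta\omega + [\omega, \delta\omega], \theta)$$ [136]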


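where we have exchanged the order of $\delta$ and $d$. We
remark that although $\omega$ is not a tensorial section, $\delta\omega$ is:
for $\omega_1' = g^{-1}\omega_1 g + g^{-1}dg$ and $\omega_2' = g^{-1}\omega_2 g + g^{-1}dg$ we have


$$\delta\omega' = \omega_1' - \omega_2' = g^{-1}(\omega_1 - \omega_2)\,g$$ [137]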


The quantity $\theta$ is in any case tensorial. Therefore,
the covariant derivative is defined, and we have


$$D\delta\omega = d\,\delta\omega + [\omega, \delta\omega]$$ [138]
and


$$D\theta = d\theta + [\omega, \theta]$$ [139]


In general, the action of the covariant derivative on
tensorial quantities can be written as $D = d + \mathrm{ad}(\omega)$,
where $\mathrm{ad}(X)$ is the representation of the Lie algebra on
itself introduced in the previous section. We now have


$$\delta S[\omega] = (D\delta\omega, \theta) = (\delta\omega, D^*\theta) = 0$$ [140]


for an arbitrary variation $\delta\omega$. Therefore, $D^*\theta = 0$.
We have obtained


$$D^*\theta = 0$$ [141]
the “Yang-Mills equations,” and
$$D\theta = 0$$ [142]


the “Bianchi identities.” These are the generalizations
of the Maxwell equations $d^*F = 0$ and $dF = 0$ in the
absence of external sources. For the general case of
interacting fermions, we write out the full action, in
analogy to eqn [74], and obtain, in analogy to eqns
[79] and [58],


$$D^*\theta = -J, \qquad D\theta = 0$$ [143]


We shall now derive, again for the pure gauge 
sector, coordinate expressions for the Yang-Mills 
equations. Consider the expression 


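$$\delta S[\omega] = (D\delta\omega, \theta) = (\delta\omega, D^*\theta) = (d\,\delta\omega + [\omega, \delta\omega], \theta)$$ [144]


The first term in the last expression is
$$(d\,\delta\omega, \theta) = (\delta\omega, d^*\theta) = -\mathrm{tr}\int_M \delta\omega_\nu\,(d^*\theta)^\nu\,\mathrm{vol}^4$$ [145]


The second term can be computed using


$$[\omega, \delta\omega]_{\mu\nu} = \{\omega\wedge\delta\omega + \delta\omega\wedge\omega\}(\partial_\mu, \partial_\nu) = \omega_\mu\delta\omega_\nu - \omega_\nu\delta\omega_\mu + \delta\omega_\mu\omega_\nu - \delta\omega_\nu\omega_\mu$$ [146]
and hence
$$[\omega, \delta\omega]_{\mu\nu}\,\theta^{\mu\nu} = 2[\omega_\mu, \delta\omega_\nu]\,\theta^{\mu\nu}$$ [147]
because $\theta$ is antisymmetric, $\theta^{\mu\nu} = -\theta^{\nu\mu}$. Thus,
$$([\omega, \delta\omega], \theta) = -\int_M \mathrm{tr}([\omega, \delta\omega]\wedge *\theta) = -\frac12\int_M \mathrm{tr}([\omega, \delta\omega]_{\mu\nu}\,\theta^{\mu\nu})\,\mathrm{vol}^4$$
$$= -\int_M \mathrm{tr}([\omega_\mu, \delta\omega_\nu]\,\theta^{\mu\nu})\,\mathrm{vol}^4 = \int_M \langle[\omega_\mu, \delta\omega_\nu], \theta^{\mu\nu}\rangle\,\mathrm{vol}^4$$ [148]


where $\langle\,,\rangle$ is the scalar product in $\mathfrak{g}$. From eqn [128]
this equals


$$-\int_M \langle\delta\omega_\nu, [\omega_\mu, \theta^{\mu\nu}]\rangle\,\mathrm{vol}^4 = \int_M \mathrm{tr}(\delta\omega_\nu[\omega_\mu, \theta^{\mu\nu}])\,\mathrm{vol}^4$$ [149]
Combining this with eqn [144] gives


$$(\delta\omega, D^*\theta) = -\mathrm{tr}\int_M \delta\omega_\nu\{(d^*\theta)^\nu - [\omega_\mu, \theta^{\mu\nu}]\}\,\mathrm{vol}^4 = (\delta\omega, \{(d^*\theta)^\nu - [\omega_\mu, \theta^{\mu\nu}]\})$$ [150]

We can now insert the coordinate expression for
$$(d^*\theta)^\nu = -\partial_\mu\theta^{\mu\nu}$$ [151]


Finally, the coordinate expressions of the Yang-Mills equations $D^*\theta = 0$ are


$$(D^*\theta)^\nu = -\{\partial_\mu\theta^{\mu\nu} + [\omega_\mu, \theta^{\mu\nu}]\} = 0$$ [152]


The Analogy with Electromagnetism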


The Yang-Mills equation and the Bianchi identity in
the absence of external sources are


$$\partial_\mu F^{\mu\nu} - iq[A_\mu, F^{\mu\nu}] = 0$$ [153]
and
$$\partial_{[\lambda}F_{\mu\nu]} - iq\,[A_{[\lambda}, F_{\mu\nu]}] = 0$$ [154]
We shall write these equations in terms of the fields


$$E^i = F^{i0}, \qquad i = 1, 2, 3$$ [155]


$$F^{23} = B^1, \qquad F^{31} = B^2, \qquad F^{12} = B^3$$ [156]


where the E and B vectors may be thought of as
“electric” and “magnetic” fields, even though they have
Lie-algebra indices, $F^{\mu\nu} = (F^a)^{\mu\nu}T_a$, etc. In the context of
the SU(3) theory, they are referred to as the “chromoelectric” and “chromomagnetic” fields, respectively.
The Yang-Mills equations with $\nu = 0$ are


$$\partial_i F^{i0} - iq[A_i, F^{i0}] = 0$$ [157]


with $i = 1, 2, 3$ a spatial index. In vector notation
this is


$$\operatorname{div}\mathbf{E} = iq(\mathbf{A}\cdot\mathbf{E} - \mathbf{E}\cdot\mathbf{A})$$ [158]


This is the analog of Gauss’s equation. Even though
we started out without external sources, $iq(\mathbf{A}\cdot\mathbf{E} - \mathbf{E}\cdot\mathbf{A})$ plays the role of a “charge density.” The
Yang-Mills field E and the potential A combine to
act as a source for the Yang-Mills field. This is an
essential feature of nonabelian gauge theories in
which they differ from the abelian case, due to the
fact that the commutator [A, E] is nonvanishing.

Now consider the Yang-Mills equations with a
spatial index $\nu = i$:


$$\partial_0 F^{0i} + \partial_j F^{ji} - iq[A_0, F^{0i}] - iq[A_j, F^{ji}] = 0$$ [159]
In vector notation this is
$$\operatorname{curl}\mathbf{B} - \frac{\partial\mathbf{E}}{\partial t} = iq(A_0\mathbf{E} - \mathbf{E}A_0) + iq(\mathbf{A}\times\mathbf{B} + \mathbf{B}\times\mathbf{A})$$ [160]


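replacing the Ampère-Maxwell law. Note that there
are two extra contributions to the “current” other
than the displacement current.

The analogs of the laws of Faraday and of the
absence of magnetic monopoles are derived similarly
from the Bianchi identities. The results are


$$\operatorname{curl}\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = iq\{(\mathbf{A}\times\mathbf{E} + \mathbf{E}\times\mathbf{A}) + (A_0\mathbf{B} - \mathbf{B}A_0)\}$$ [161]
and

$$\operatorname{div}\mathbf{B} = iq(\mathbf{A}\cdot\mathbf{B} - \mathbf{B}\cdot\mathbf{A})$$ [162]


Further Remarks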


The foundations of the mathematics of differential 
forms were laid down by Poincaré (1953). They 
were applied to the description of electrodynamics 


already by Cartan (1923). A modern presentation of 
differential forms and the manifolds on which they 
are defined is given in Abraham et al. (1983). A 
recent treatment of electrodynamics in this approach 
is Hehl and Obukhov (2003). Weyl’s argument is in 
his paper of 1929. 

Nonabelian gauge theories today explain the 
electromagnetic, the strong and weak nuclear 
interactions. The original paper is that of Yang 
and Mills (1954). Glashow, Salam, and Weinberg 
(1980) saw the way to apply it to the weak 
interactions by using spontaneous symmetry 
breaking to generate the masses through the use 
of the Higgs (1964) mechanism. ’t Hooft and
Veltman (1972) showed that the resulting quan- 
tum field theory was renormalizable. The strong 
interactions were recognized as the nonabelian 
gauge theory with gauge group SU(3) by Gell- 
Mann (1972). For a modern treatment which puts 
nonabelian gauge theories in the context of 
differential geometry, see Frankel (1987). 


See also: Dirac Fields in Gravitation and Nonabelian 
Gauge Theory; Electroweak Theory; Measure on Loop 
Spaces; Nonperturbative and Topological Aspects of 
Gauge Theory; Quantum Electrodynamics and its 
Precision Tests. 
Introduction


For the purpose of this article, vortices are topological
solitons arising in field theories in (2 + 1)-dimensional
spacetime when a complex-valued field $\phi$ is allowed to
acquire winding at infinity, meaning that the phase of
$\phi(t, \mathbf{x})$, as $\mathbf{x}$ traverses a large circle in the spatial plane,
changes by $2\pi n$, where $n$ is a nonzero integer. Such
winding cannot be removed by any continuous
deformation of $\phi$ (hence “topological”) and traps a
considerable amount of energy which tends to coalesce 
into smooth, stable lumps with highly particle-like 
characteristics (hence “solitons”). Clearly, the universe 
is (3+1) dimensional. Nonetheless, planar field 
theories are of physical interest for two main reasons. 
First, the theory may arise by dimensional reduction of 
a (3 + 1)-dimensional model under the assumption of 
translation invariance in one direction. Vortices are 
then transverse slices through straight tube-like objects 
variously interpreted as magnetic flux tubes in a 
superconductor or cosmic strings. Second, a crucial 
ingredient of the standard model of particle physics is 
spontaneous breaking of gauge symmetry by a Higgs 
field. As well as endowing the fundamental gauge 
bosons and chiral fermions with mass, this mechanism 
can potentially generate various types of topological 
solitons (monopoles, strings, and domain walls) whose 
structure and interactions one would like to under- 
stand. Vortices in (2 + 1) dimensions are interesting in 
this regard because they arise in the simplest field 
theory exhibiting the Higgs mechanism, the abelian 
Higgs model (AHM). They are thus a useful theoret- 
ical laboratory in which to test ideas which may 
ultimately find application in more realistic theories. 
This article describes the properties of abelian Higgs 
vortices and explains how, using a mixture of 
numerical and analytical techniques, a good under- 
standing of their dynamical interactions has been 
obtained. 


The Abelian Higgs Model 


Throughout this article spacetime will be $\mathbb{R}^{2+1}$
endowed with the Minkowski metric with signature
$(+,-,-)$, and Cartesian coordinates $x^\mu$, $\mu = 0, 1, 2$, with $x^0 = t$ (the speed of light $c = 1$). A
spacetime point will be denoted $x$, its spatial part by
$\mathbf{x} = (x^1, x^2)$. Latin indices $j, k, \ldots$ range over 1, 2, and
repeated indices (Latin or Greek) are summed over.
We sometimes use polar coordinates in the spatial
plane, $\mathbf{x} = r(\cos\theta, \sin\theta)$, and sometimes a complex
coordinate $z = x^1 + ix^2 = re^{i\theta}$. Occasionally, it is
convenient to think of $\mathbb{R}^{2+1}$ as a subspace of $\mathbb{R}^{3+1}$
and denote by $\mathbf{k}$ the unit vector in the (fictitious)
third spatial direction. The complex scalar Higgs
field is denoted $\phi$, and the electromagnetic gauge
potential $A_\mu$, best thought of as the components of a
1-form $A = A_\mu dx^\mu$. $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ is the field
strength tensor which, in $\mathbb{R}^{2+1}$, has only three
independent components, identified with the magnetic field $B = F_{12}$ and electric field $(E_1, E_2) = (F_{01}, F_{02})$. The gauge-covariant derivative is $D_\mu\phi = \partial_\mu\phi - ieA_\mu\phi$, $e$ being the electric charge of the Higgs.
Under a U(1) gauge transformation,


$$\phi \mapsto e^{i\Lambda}\phi, \qquad A_\mu \mapsto A_\mu + e^{-1}\partial_\mu\Lambda$$ [1]


$\Lambda: \mathbb{R}^{2+1} \to \mathbb{R}$ being any smooth function, $F_{\mu\nu}$ and
$|\phi|$ remain invariant, while $D_\mu\phi \mapsto e^{i\Lambda}D_\mu\phi$. Only
gauge-invariant quantities are physically observable
(classically).


With these conventions, the AHM has Lagrangian
density


$$\mathcal{L} = -\frac14 F_{\mu\nu}F^{\mu\nu} + \frac{\alpha}{2}\,\overline{D_\mu\phi}\,D^\mu\phi - \frac{\lambda}{8}(\nu^2 - |\phi|^2)^2$$ [2]
which is manifestly gauge invariant. By rescaling
$\phi$, $A_\mu$, $x$ and the unit of action, we can (and
henceforth will) assume that $e = \nu = \alpha = 1$. The
only parameter which cannot be scaled away is $\lambda > 0$.
Its value greatly influences the model’s behavior.
The field equations, obtained by demanding that
$\phi(x)$, $A_\mu(x)$ be a local extremal of the action
$S = \int \mathcal{L}\, dx^0 dx^1 dx^2$, are


$$D_\mu D^\mu\phi - \frac{\lambda}{2}(1 - |\phi|^2)\phi = 0$$
$$\partial_\mu F^{\mu\nu} + \frac{i}{2}(\bar\phi D^\nu\phi - \phi\,\overline{D^\nu\phi}) = 0$$ [3]
This is a coupled set of nonlinear second-order PDEs.
Of particular interest are solutions which have finite
total energy. Energy is not a Lorentz-invariant
quantity. To define it we must choose an inertial
frame and, having broken Lorentz invariance, it is
convenient to work in a temporal gauge, for which
$A_0 = 0$ (which may be obtained by a gauge transformation with $\Lambda(t, \mathbf{x}) = -\int_0^t A_0(t', \mathbf{x})\,dt'$, after which only
time-independent gauge transformations are permitted). The potential energy of a field is then


$$E = \frac12\int\left(B^2 + \overline{D_i\phi}\,D_i\phi + \frac{\lambda}{4}(1 - |\phi|^2)^2\right)dx^1 dx^2 = E_{mag} + E_{grad} + E_{self}$$ [4]
while its kinetic energy is
$$E_{kin} = \frac12\int\left(|\partial_0 A|^2 + |\partial_0\phi|^2\right)dx^1 dx^2$$ [5]


If $\phi$, $A$ satisfy the field equations then the total
energy $E_{tot} = E_{kin} + E$ is independent of $t$. By
Derrick’s theorem, static solutions have $E_{mag} = E_{self}$ (Manton and Sutcliffe 2004, pp. 82-87).

Configurations with finite energy have quantized
total magnetic flux. To see this, note that $E$ finite
implies $|\phi| \to 1$ as $r \to \infty$, so $\phi \sim e^{i\chi}$ at large $r$ for
some real (in general, multivalued) function $\chi$. The
winding number of $\phi$ is its winding around a circle of
large radius $R$, that is, the integer $n = (\chi(R, 2\pi) - \chi(R, 0))/2\pi$. Although the phase of $\phi$ is clearly gauge
dependent, $n$ is not, because to change this, a gauge
transformation $e^{i\Lambda}: \mathbb{R}^2 \to U(1)$ would itself need
nonzero winding around the circle, contradicting
smoothness of $e^{i\Lambda}$. The model is invariant under
spatial reflexions, under which $n \mapsto -n$, so we will
assume (unless noted otherwise) that $n \ge 0$. Finiteness of $E$ also implies that $D\phi = d\phi - iA\phi \to 0$, so
$A \sim -i\,d\phi/\phi \sim d\chi$ as $r \to \infty$ (note $\phi \ne 0$ for large $r$).
Hence, the total magnetic flux is


$$\int_{\mathbb{R}^2} B\,d^2x = \lim_{R\to\infty}\oint_{S_R} A = \lim_{R\to\infty}\int_0^{2\pi}\partial_\theta\chi\,d\theta = 2\pi n$$ [6]

where $S_R = \{\mathbf{x} : |\mathbf{x}| = R\}$ and we have used Stokes’s
theorem. The above argument uses only generic
properties of $E$, namely that finite $E_{self}$ requires $|\phi|$
to assume a nonzero constant value as $r \to \infty$. So
flux quantization is a robust feature of this type of
model. As presented, the argument is somewhat
formal, but it can be made mathematically rigorous
at the cost of gauge-fixing technicalities (Manton
and Sutcliffe 2004, pp. 164-166). Note that if $n \ne 0$
then, by continuity, $\phi(\mathbf{x})$ must vanish at some $\mathbf{x} \in \mathbb{R}^2$, and one expects a lump of energy density to be
associated with each such $\mathbf{x}$ since $\phi = 0$ maximizes
the integrand of $E_{self}$.
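The winding number defined above can be estimated for a sampled field by accumulating phase increments around a large circle. The following is a minimal sketch; the function names and the sample fields (a `tanh` profile, and a field with two zeros) are hypothetical illustrations, not solutions of the field equations.

```python
import cmath
import math

def winding_number(phi, radius, samples=2000):
    """Winding of the complex field phi(x, y) around the circle |x| = radius.

    Sums the phase increments between neighbouring sample points; phi must be
    nonzero on the circle and the sampling fine enough that each increment
    stays well below pi in magnitude.
    """
    total = 0.0
    prev = phi(radius, 0.0)
    for k in range(1, samples + 1):
        theta = 2 * math.pi * k / samples
        cur = phi(radius * math.cos(theta), radius * math.sin(theta))
        total += cmath.phase(cur / prev)
        prev = cur
    return round(total / (2 * math.pi))

def model_vortex(n):
    """Hypothetical n-vortex-like field: tanh(r) * exp(i n theta)."""
    def phi(x, y):
        z = complex(x, y)
        r = abs(z)
        return math.tanh(r) * (z / r) ** n
    return phi

def two_zeros(x, y):
    """Hypothetical field vanishing at (1, 0) and (-1, 0), with |phi| -> 1."""
    z = complex(x, y)
    return (z - 1) * (z + 1) / (1 + x * x + y * y)

n = winding_number(model_vortex(3), radius=5.0)
flux = 2 * math.pi * n   # total magnetic flux by eqn [6]
```

The second sample field has two separated simple zeros and total winding 2 on any circle enclosing both, in line with the remark that a nonzero winding forces $\phi$ to vanish somewhere.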


Radially Symmetric Vortices 


The model supports static solutions within the
radially symmetric ansatz $\phi = \sigma(r)e^{in\theta}$, $A = a(r)\,d\theta$,
which reduces the field equations to a coupled pair
of nonlinear ODEs:


$$\frac{d^2\sigma}{dr^2} + \frac1r\frac{d\sigma}{dr} - \frac{1}{r^2}(n - a)^2\sigma + \frac{\lambda}{2}(1 - \sigma^2)\sigma = 0$$
$$\frac{d^2a}{dr^2} - \frac1r\frac{da}{dr} + (n - a)\sigma^2 = 0$$ [7]


Finite energy requires $\lim_{r\to\infty}\sigma(r) = 1$, $\lim_{r\to\infty}a(r) = n$
while smoothness requires $\sigma(r) \sim \mathrm{const}_1\, r^n$, $a(r) \sim \mathrm{const}_2\, r^2$ as $r \to 0$. It is known that solutions to this
system, which we shall call n-vortices, exist for all
$n$, $\lambda$, though no explicit formulas for them are
known. They may be found numerically, and are
depicted in Figure 1. Note that $\sigma$ and $a$ always rise
monotonically to their vacuum values, and $B$ always
falls monotonically to 0, as $r$ increases. These
solutions have their magnetic flux concentrated in a
single, symmetric lump, a flux tube in the $\mathbb{R}^{3+1}$
picture. In contrast, the total energy density (integrand of $E$ in [4]) is nonmonotonic for $n \ge 2$, being
peaked on a ring whose radius grows with $n$. This is
a common feature of planar solitons.

The large $r$ asymptotics of n-vortices are well
understood. For $\lambda \le 4$ one may linearize [7] about
$\sigma = 1$, $a = n$, yielding


$$\sigma(r) \sim 1 + \frac{q_n}{2\pi}K_0(\sqrt{\lambda}\,r)$$ [8]
$$a(r) \sim n + \frac{m_n}{2\pi}\,r K_1(r)$$ [9]


where $q_n$, $m_n$ are unknown constants and $K_\alpha$
denotes the modified Bessel’s function. For $\lambda > 4$
linearization is no longer well justified, and the
asymptotic behaviour of $\sigma$ (though not $a$) is quite
different (Manton and Sutcliffe 2004, pp. 174-175).
We shall not consider this rather extreme regime
further. Note that


$$K_\alpha(r) \sim \sqrt{\frac{\pi}{2r}}\,e^{-r} \quad\text{as } r \to \infty$$ [10]


for all $\alpha$, so both $\sigma$ and $a$ approach their vacuum
values exponentially fast, but with different decay
lengths: $1/\sqrt{\lambda}$ for $\sigma$, 1 for $a$. This can be seen in
Figure 1a. The constants $q_n$ and $m_n$ depend on $\lambda$ and
must be inferred by comparing the numerical
solutions with [8], [9]; $q = q_1$ and $m = m_1$ will
receive a physical interpretation shortly.
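The asymptotic form [10] can be checked against a direct numerical evaluation of the modified Bessel function. The sketch below uses the standard integral representation $K_\alpha(x) = \int_0^\infty e^{-x\cosh t}\cosh(\alpha t)\,dt$ with a plain trapezoidal rule; the function names, the cutoff `t_max`, and the step count are choices made for this example only.

```python
import math

def bessel_k(order, x, t_max=12.0, steps=24000):
    """Modified Bessel function K_order(x) for x > 0 (not too close to 0).

    Trapezoidal rule applied to K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt,
    truncated at t_max where the integrand is negligible.
    """
    h = t_max / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(order * t)
    return total * h

def leading_asymptotic(x):
    """Right-hand side of eqn [10]: sqrt(pi / (2x)) * exp(-x)."""
    return math.sqrt(math.pi / (2 * x)) * math.exp(-x)

# The ratio tends to 1 for every order, but only slowly (corrections ~ 1/x).
for x in (5.0, 30.0):
    r0 = bessel_k(0, x) / leading_asymptotic(x)
    r1 = bessel_k(1, x) / leading_asymptotic(x)
    print(f"x = {x:5.1f}   K0/asymptotic = {r0:.4f}   K1/asymptotic = {r1:.4f}")
```

Since the relative corrections to [10] fall off only like $1/r$, fitting $q_n$ and $m_n$ to numerical profiles is better done against the Bessel functions in [8], [9] themselves than against the pure exponential.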

The 1-vortex (henceforth just “vortex”) is stable for
all $\lambda$, but n-vortices with $n \ge 2$ are unstable to break
up into $n$ separate vortices if $\lambda > 1$. We shall say that
the AHM is type I if $\lambda < 1$, type II if $\lambda > 1$, and
critically coupled if $\lambda = 1$, based on this distinction. Let
$E_n$ denote the energy of an n-vortex. Figure 2 shows
the energy per vortex $E_n/n$ plotted against $n$ for
$\lambda = 0.5$, 1, and 2. It decreases with $n$ for $\lambda = 0.5$,
indicating that it is energetically favorable for isolated
vortices to coalesce into higher winding lumps. For
$\lambda = 2$, by contrast, $E_n/n$ increases with $n$ indicating
that it is energetically favorable for n-vortices to fission
into their constituent vortex parts. The case $\lambda = 1$
balances between these behaviors: $E_n/n$ is independent
of $n$. In fact, the energy of a collection of vortices is
independent of their positions in this case.


Figure 1 Static, radially symmetric n-vortices: (a) the 1-vortex profile functions $\sigma(r)$ (solid curve) and $a(r)$ (dashed curve) for $\lambda = 2$, 1,
and 1/2, left to right; (b) the magnetic field $B$; and (c) the energy density of n-vortices, $n = 1$ to 5, left to right, for $\lambda = 1$.


Figure 2 The energy per unit winding $E_n/n$ of radially
symmetric n-vortices for $\lambda = 1/2$, 1, and 2.

Interaction Energy 


A precise understanding of the type I/II dichotomy
can be obtained using the 2-vortex interaction
energy $E_{int}(s)$ introduced by Jacobs and Rebbi. This
is defined to be the minimum of $E$ over all $n = 2$
configurations for which $\phi(\mathbf{x}) = 0$ at some pair of
points $\mathbf{x}_1$, $\mathbf{x}_2$ distance $s$ apart. One interprets $\mathbf{x}_1$, $\mathbf{x}_2$
as the vortex positions. $E_{int}$ can only depend on their
separation $s = |\mathbf{x}_1 - \mathbf{x}_2|$, by translation and rotation
invariance. Figure 3 presents graphs of $E_{int}(s)$
generated by a lattice minimization algorithm. For
$\lambda < 1$, vortices uniformly attract one another, so a
vortex pair has least energy when coincident. For
$\lambda > 1$, vortices uniformly repel, always lowering
their energy by moving further apart. The graph for
$\lambda = 1$ would be a horizontal line, $E_{int}(s) = 2\pi$.
Figure 3 The 2-vortex interaction energy $E_{int}(s)$ as a function of vortex separation (solid curve), in comparison with its asymptotic
form $E^\infty_{int}(s)$ (dashed curve) for (a) $\lambda = 1/2$ and (b) $\lambda = 2$.


The large $s$ behavior of $E_{int}(s)$ is known, and can
be understood in two ways (Manton and Sutcliffe
2004, pp. 177-181). Speight, adapting ideas of
Manton on asymptotic monopole interactions,
observed that, in the real $\phi$ gauge ($\phi \mapsto e^{-i\theta}\phi$,
$A \mapsto A - d\theta$), the difference between the vortex and
the vacuum $\phi = 1$, $A = 0$ at large $r$,


$$\psi = \phi - 1 \sim \frac{q}{2\pi}K_0(\sqrt{\lambda}\,r)$$ [11]
$$(A^0, \mathbf{A}) \sim \frac{m}{2\pi}\,(0, \mathbf{k}\times\nabla K_0(r))$$ [12]


is identical to the solution of a linear Klein-Gordon-Proca theory,


$$(\Box + \lambda)\psi = \kappa, \qquad (\Box + 1)A_\mu = j_\mu$$ [13]


in the presence of a composite point source,


$$\kappa = q\,\delta(\mathbf{x}), \qquad (j^0, \mathbf{j}) = m\,(0, \mathbf{k}\times\nabla\delta(\mathbf{x}))$$ [14]
located at the vortex position. Viewed from afar,
therefore, a vortex looks like a point particle
carrying both a scalar monopole charge $q$ and a
magnetic dipole moment $m$, a “point vortex,”
inducing a real scalar field of mass $\sqrt{\lambda}$ (the Higgs
particle) and a vector boson field of mass 1 (the
“photon”). If physics is to be model independent,
therefore, the interaction energy of a pair of well-separated vortices should approach that of the
corresponding pair of point vortices as the separation grows. Computing the latter is an easy exercise
in classical linear field theory, yielding


$$E_{int}(s) \sim E^\infty_{int}(s) = 2E_1 - \frac{q^2}{2\pi}K_0(\sqrt{\lambda}\,s) + \frac{m^2}{2\pi}K_0(s)$$ [15]


> 


Bettencourt and Rivers obtained the same formula
by a more direct superposition ansatz approach,
though they did not give the constants $q$, $m$ a
physical interpretation.

The force between a well-separated vortex pair,
$-E_{int}'(s)$, consists of the mutual attraction of
identical scalar monopoles, of range $1/\sqrt{\lambda}$, and the
mutual repulsion of identical magnetic dipoles, of
range 1. If $\lambda < 1$, scalar attraction dominates at
large $s$ so vortices attract. If $\lambda > 1$, magnetic
repulsion dominates and they repel. If $\lambda = 1$ then
$q = m$, as we shall see, so the forces cancel exactly.
Figure 3 shows both $E_{int}$ and $E^\infty_{int}$ for $\lambda = 0.5$, 2. The
agreement is good for $s$ large, but breaks down for
$s < 4$, as one expects. Vortices are not point
particles, as in the linear model, and when they lie
close together the overlap of their cores produces
significant effects.
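The qualitative statements about attraction and repulsion follow directly from the asymptotic formula [15] and can be reproduced in a few lines. In this sketch the values of `q`, `m`, and `e1` are placeholders (in the article $q$, $m$ must be fitted to numerical vortex profiles and depend on $\lambda$), and `bessel_k0` is a simple quadrature written for this example.

```python
import math

def bessel_k0(x, t_max=12.0, steps=24000):
    """K_0(x) for x > 0 via K_0(x) = int_0^inf exp(-x cosh t) dt (trapezoidal rule)."""
    h = t_max / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-x * math.cosh(k * h))
    return total * h

def interaction_energy(s, lam, q, m, e1):
    """Asymptotic 2-vortex interaction energy, eqn [15]."""
    scalar = q * q / (2 * math.pi) * bessel_k0(math.sqrt(lam) * s)   # attraction
    magnetic = m * m / (2 * math.pi) * bessel_k0(s)                  # repulsion
    return 2 * e1 - scalar + magnetic

# Placeholder constants, for illustration only (not fitted values).
Q, M, E1 = 10.0, 10.0, math.pi

for lam in (0.5, 1.0, 2.0):
    near = interaction_energy(4.0, lam, Q, M, E1)
    far = interaction_energy(6.0, lam, Q, M, E1)
    trend = "attract" if far > near else "repel" if far < near else "no force"
    print(f"lambda = {lam}: E(4) = {near:.5f}, E(6) = {far:.5f} -> {trend}")
```

With $q = m$ and $\lambda = 1$ the two Bessel terms cancel identically, which is the point-vortex counterpart of the statement that the forces cancel exactly at critical coupling.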

The same method predicts the interaction energy
between an $n_1$-vortex and an $n_2$-vortex at large
separation. We just replace $2E_1$ by $E_{n_1} + E_{n_2}$, $q^2$ by
$q_{n_1}q_{n_2}$, and $m^2$ by $m_{n_1}m_{n_2}$. In particular, an
antivortex ((−1)-vortex) has $E_{-1} = E_1$, $q_{-1} = q_1 = q$,
and $m_{-1} = -m_1 = -m$, so the interaction energy for
a vortex-antivortex pair is


$$E_{int}(s) \sim 2E_1 - \frac{q^2}{2\pi}K_0(\sqrt{\lambda}\,s) - \frac{m^2}{2\pi}K_0(s)$$ [16]


which is uniformly attractive. It would be pleasing if
$q_n$, $m_n$ could be deduced easily from $q$, $m$. One
might guess $q_n = |n|q$, $m_n = nm$, in analogy with
monopoles. Unfortunately, this is false: $q_n$, $m_n$
grow approximately exponentially with $|n|$.


Vortex Scattering 


The AHM being Lorentz invariant, one can obtain
time-dependent solutions wherein a single n-vortex
travels at constant velocity, with speed $0 < v < 1$
and $E_{tot} = (1 - v^2)^{-1/2}E_n$, by Lorentz boosting the
static solutions described above. Of more dynamical
interest are solutions in which two or more vortices
undergo relative motion. The simplest problem is
vortex scattering. Two vortices, initially well separated, are propelled towards one another. In the
center-of-mass (COM) frame they have, as $t \to -\infty$,
equal speed $v$, and approach one another along
parallel lines distance $b$ (the impact parameter)
apart, see Figure 4. If $b = 0$, they approach head-on. Assuming they do not capture one another, they
interact and, as $t \to \infty$, recede along parallel straight
lines having been deflected through an angle $\Theta$ (the
scattering angle). If scattering is elastic, the exit lines
also lie $b$ apart and each vortex travels at speed $v$ as
$t \to \infty$. The dependence of $\Theta$ on $v$, $b$, and $\lambda$ has
been studied through lattice simulations by several
authors, perhaps most comprehensively by Myers,
Rebbi, and Strilka (1992). We shall now describe
their results.

Note first that vortex scattering is actually
inelastic: vortices recede with speed $< v$ because
some of their initial kinetic energy is dispersed by
the collision as small-amplitude traveling waves
(“radiation”). This energy loss can be as high as
80% in very fast collisions at small $b$. At small $v$ the
energy loss is tiny, but can still have important
consequences for type I vortices: if $v$ is very small,
they start with only just enough energy to escape
their mutual attraction. In undergoing a small $b$
collision they can lose enough of this energy to
become trapped in an oscillating bound state. In this
case they do not truly scatter and $\Theta$ is ill-defined.
Myers et al. find that $v > 0.2$ suffices to avoid
capture when $\lambda = 1/2$. Since type I vortices attract,
one might expect $\Theta$ to be always negative, indicating
that the vortices deflect towards one another. In
fact, as Figure 5a shows, this happens only for small
$v$ and large $b$. Another naive expectation is that
$\Theta = 0$ or $\Theta = 180°$ when $b = 0$ (either vortices pass
through one another or ricochet backwards in a
head-on collision). In fact $\Theta = 90°$, the only other
possibility allowed by reflexion symmetry of the
initial data. Figure 6 depicts snapshots of such a
scattering process at modest $v$. The vortices deform
each other as they get close until, at the moment of
coincidence, they are close to the static 2-vortex
ring. They then break apart along a line perpendicular to their line of approach. One may consider
them to have exchanged half-vortices, so that each
emergent vortex is a mixture of the incoming
vortices. This rather surprising phenomenon was
actually predicted by Ruback in advance of any
numerical simulations and turns out to be a generic
feature of planar topological solitons.

Figure 4 The geometry of vortex scattering.

Consider now the type II case ($\lambda = 2$, Figure 5b).
Here, $\Theta > 0$ for all $v$, $b$ as one expects of particles
that repel each other. Head-on scattering is more
interesting now since two regimes emerge: for $v > v_{crit} \approx 0.3$, one has the surprising 90° scattering
already described, while for $v < v_{crit}$ the vortices
bounce backwards, $\Theta = 180°$. This is easily
explained. In order to undergo 90° head-on scattering, the vortices must become coincident (otherwise
reflexion symmetry is violated), hence must have
initial energy at least $E_2$. For $v < v_{crit}$, where


$$\frac{2E_1}{\sqrt{1 - v_{crit}^2}} = E_2$$ [17]


they have too little energy, so come to a halt before
coincidence, then recede from one another. The
solution $v_{crit}$ of [17] depends on $\lambda$ and is plotted in
Figure 7. For $v$ slightly above $v_{crit}$, we see that, in
contrast to the type I case, $\Theta(b)$ is not monotonic:
maximum deflection occurs at nonzero $b$.

The point vortex formalism yields a simple model
of type II vortex scattering which is remarkably
successful at small $v$. One writes down the Lagrangian
for two identical (nonrelativistic) point particles of
mass $E_1$ moving along trajectories $\mathbf{x}_1(t)$, $\mathbf{x}_2(t)$ under
the influence of the repulsive potential $E^\infty_{int}$,


$$L = \tfrac12 E_1\left(|\dot{\mathbf{x}}_1|^2 + |\dot{\mathbf{x}}_2|^2\right) - E^\infty_{int}(|\mathbf{x}_1 - \mathbf{x}_2|)$$ [18]


Figure 5 The 2-vortex scattering angle $\Theta$ as a function
of impact parameter b for v = 0.1, 0.2, 0.3, 0.4, 0.5, and
0.9 (one plot marker per speed), as computed by Myers et al.
(1992): (a) $\lambda = 1/2$; (b) $\lambda = 2$; (c)
$\lambda = 1$. The dotted curves are merely guides to the
eye. The solid curves in (b), (c) were computed using the
point vortex model. Note that Myers et al. use different
normalizations, so $b = \sqrt{2}\, b_{\mathrm{MRS}}$ and
$\lambda = \lambda_{\mathrm{MRS}}/2$.

Energy and angular momentum conservation reduce
$\Theta(v, b)$ to an integral over one variable
($s = |x_1 - x_2|$) which is easily computed numerically. To
illustrate, Figure 5b shows the result for $\lambda = 2$,
v = 0.1 in comparison with the lattice simulations of Myers
et al. The agreement is almost perfect. For large v the
approximation breaks down not only because relativistic
corrections become significant, but also because small b
collisions then probe the small $|x_1 - x_2|$ region where
vortex core overlap effects become important. For the same
reason, the point vortex model is less useful for type I
scattering. Here there is no repulsion to keep the vortices
well separated, so its validity is restricted to the small v,
large b regime.
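The reduction of the scattering angle to a one-variable integral can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the computation behind Figure 5b: the function name `scattering_angle` is hypothetical, each particle is assumed to have speed v in the centre-of-mass frame, b is taken to be the distance between the two incoming straight-line paths, and the potential is a user-supplied stand-in (a Coulomb-like 1/s law in the usage note) because the intervortex potential itself is not reproduced in this passage.

```python
import math

def scattering_angle(v, b, potential, mass=1.0, n=4000):
    """Deflection angle (radians) for two identical point particles.

    Assumptions of this sketch: potential(s) is positive, decreasing
    in s, and tends to zero as s -> infinity (purely repulsive).
    """
    if b == 0.0:
        return math.pi  # head-on: the particles halt and recede
    # Relative motion has reduced mass mass/2 and relative speed 2v.
    energy = mass * v * v

    def radicand(u):  # u = 1/s
        return 1.0 - (b * u) ** 2 - potential(1.0 / u) / energy

    # Bracket and bisect the turning point u_max (radicand = 0).
    lo, hi = 0.0, 1.0
    while radicand(hi) > 0.0:
        lo, hi = hi, 2.0 * hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if radicand(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    u_max = lo

    # Theta = pi - 2 b * integral_0^{u_max} du / sqrt(radicand(u)).
    # The substitution u = u_max (1 - t^2) removes the inverse
    # square-root singularity at the turning point; the midpoint
    # rule never evaluates the endpoints t = 0 or t = 1.
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        u = u_max * (1.0 - t * t)
        total += 2.0 * u_max * t / math.sqrt(radicand(u))
    return math.pi - 2.0 * b * total * h
```

For example, `scattering_angle(1.0, 1.0, lambda s: 1.0 / s)` can be compared with the closed-form Rutherford result for a 1/s potential, which is a convenient check of the quadrature before substituting a more realistic potential.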

Critical coupling is theoretically the most interesting
regime, where most analytic progress has been made. Since
$E_{\mathrm{int}} = E_{\mathrm{int}}^{\infty} = 0$, one might
expect vortex scattering to be trivial ($\Theta(v, b) = 0$),
but this is quite wrong, as shown in Figure 5c. In
particular, $\Theta(v, 0) = 90^\circ$ for all v, just as in
the large v type I and type II cases. The point is that
scalar attraction and magnetic repulsion of vortices are
mediated by fields with different Lorentz transformation
properties. While they cancel for static vortices, there is
no reason to expect them to cancel for vortices in relative
motion.


Critical Coupling 


The AHM with $\lambda = 1$ has many remarkable properties, at
which we have so far only hinted. These all stem from
Bogomol’nyi’s crucial observation (Manton and Sutcliffe 2004,
pp. 197-202) that the potential energy in this case can be
rewritten as
Taubes’s theorem shows that this n-vortex is just one point,
corresponding to the list [0, 0, ..., 0], in a 2n-dimensional
space of static multivortex solutions called the moduli space
$M_n$. This space may be visualized as the flat,
finite-dimensional valley bottom in $\mathcal{C}_n$, on which
E attains its minimum value, $\pi n$. Points in $M_n$ are in
one-to-one correspondence with distinct unordered lists
$[z_1, z_2, \ldots, z_n]$, which are themselves in one-to-one
correspondence with points in $\mathbb{C}^n$, as follows. To
each list, we assign the unique monic polynomial whose roots
are $z_r$,

$$p(z) = (z - z_1)(z - z_2)\cdots(z - z_n) = a_0 + a_1 z + \cdots + a_{n-1}z^{n-1} + z^n \qquad [22]$$

This polynomial is uniquely determined by its coefficients
$(a_0, a_1, \ldots, a_{n-1}) \in \mathbb{C}^n$, which give
good global coordinates on $M_n \cong \mathbb{C}^n$. The
zeros $z_r$ of $\phi$ may be used as local coordinates on
$M_n$, away from $\Delta$, the subset of $M_n$ on which two
or more of the zeros $z_r$ coincide, but are not good global
coordinates.
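The correspondence between unordered lists of zeros and polynomial coefficients can be illustrated with a short sketch. The helper name `monic_coefficients` is hypothetical; it simply expands the product of linear factors and returns the non-leading coefficients, lowest degree first.

```python
def monic_coefficients(zeros):
    """Return [a_0, ..., a_{n-1}] for the monic polynomial with these roots."""
    coeffs = [1 + 0j]  # the constant polynomial 1, lowest degree first
    for z in zeros:
        # Multiply the current polynomial q(x) by (x - z).
        shifted = [0j] + coeffs                    # coefficients of x * q(x)
        scaled = [-z * c for c in coeffs] + [0j]   # coefficients of -z * q(x)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs[:-1]  # drop the leading coefficient, which is always 1


# Reordering the zeros leaves the coefficients unchanged, which is why
# the coefficients (and not the ordered zeros) label unordered lists.
print(monic_coefficients([1, 2]), monic_coefficients([2, 1]))
# n coincident zeros at the origin give the origin of coefficient space.
print(monic_coefficients([0, 0, 0]))
```

The design point is that the coefficients are symmetric functions of the zeros, so they remain well defined when zeros coincide, whereas the zeros themselves cannot be consistently ordered there.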

Let $(\phi, A)_a$ denote the static solution corresponding to
$a \in \mathbb{C}^n$. If the zeros $z_r$ are all at least s
apart, Taubes showed the solution is just a linear
superposition of 1-vortices located at $z_r$, up to
corrections exponentially small in s. Imagine these
constituent vortices are pushed with small initial
velocities. Then $(\phi(t), A(t))$ must remain close to the
valley bottom $M_n$, since departing from it costs kinetic
energy, of which there is little. Manton has suggested,
therefore, that the dynamics is well approximated by the
constrained variational problem wherein
$(\phi(t), A(t)) = (\phi, A)_{a(t)} \in M_n$ for all t. Since
the action $S = \int L\,dt = \int (E_{\mathrm{kin}} - E)\,dt$,
and $E = \pi n$, constant, on $M_n$, this constrained problem
amounts to Lagrangian mechanics on configuration space $M_n$
with Lagrangian $L = E_{\mathrm{kin}}|_{M_n}$. Now
$E_{\mathrm{kin}}$ is real, positive, and quadratic in time
derivatives of $\phi$, A, so

$$L = \tfrac{1}{2}\gamma_{rs}(a)\,\dot{a}_r\,\overline{\dot{a}_s} \qquad [23]$$

$\gamma_{rs}$ forming the entries of a positive-definite
n × n Hermitian matrix
($\gamma_{sr} = \overline{\gamma_{rs}}$). Since
$(\phi, A)_a$ is not known explicitly, neither are
$\gamma_{rs}(a)$. Observe, however, that L is the Lagrangian
for geodesic motion in $M_n$ with respect to the Riemannian
metric

$$\gamma = \gamma_{rs}(a)\,da_r\,d\bar{a}_s \qquad [24]$$


Manton originally proposed this geodesic approximation for
monopoles, but it is now standard for all topological
solitons of Bogomol’nyi type (where one has a moduli space of
static multisolitons saturating a topological lower bound on
E). Note that geodesics are independent of initial speed,
which agrees with Myers et al.: Figure 5c shows that
$\Theta(v, b)$ is approximately independent of v for
v < 0.5. Further, Stuart (1994) has proved that, for initial
speeds of order $\epsilon$, small, the fields stay
(pointwise) $\epsilon^2$ close to their geodesic approximant
for times of order $\epsilon^{-1}$.

On symmetry grounds, two-vortex dynamics in the COM frame
reduces to geodesic motion in $M_2^0 \cong \mathbb{C}$, the
subspace of centered 2-vortices ($a_1 = 0$, so
$z_1 = -z_2$), with induced metric

$$\gamma^0 = G(|a_0|)\,da_0\,d\bar{a}_0 \qquad [25]$$

G being some positive function. Note that $a_0 = z_1 z_2$, so
the intervortex distance
$|z_1 - z_2| = 2|z_1| = 2|a_0|^{1/2}$. The line
$a_0 = \beta \in \mathbb{R}$, traversed with $\beta$
increasing, say, is geodesic in $M_2^0$. The vortex positions
(roots of $z^2 + a_0$) are $\pm\sqrt{|\beta|}$ for
$\beta < 0$ and $\pm i\sqrt{\beta}$ for $\beta > 0$. This
describes perfectly the $90^\circ$ scattering phenomenon: two
vortices approach head-on along the $x^1$ axis, coincide to
form a 2-vortex ring, then break apart along the $x^2$ axis,
as in Figure 6. This behavior occurs because
$a_0 = z_1 z_2$, rather than $z_1 - z_2$, is the correct
global coordinate on $M_2^0$, since vortices are classically
indistinguishable.
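The right-angle scattering along the real line in the coefficient plane can be traced with a few lines of code. This is only an illustrative sketch: the function name `centred_vortex_positions` is hypothetical, and the sample values of the real parameter are arbitrary.

```python
import cmath

def centred_vortex_positions(a0):
    """Roots of z**2 + a0, that is, the pair +/- sqrt(-a0)."""
    root = cmath.sqrt(-a0)
    return root, -root

# Sweep the real coefficient from negative to positive values:
# the roots start on the real axis, meet at the origin, and
# separate along the imaginary axis.
for beta in (-4.0, -1.0, 0.0, 1.0, 4.0):
    z1, z2 = centred_vortex_positions(beta)
    print(f"beta = {beta:+.1f}: z1 = {z1:.3f}, z2 = {z2:.3f}")
```

The smooth straight path in the coefficient becomes a path with a right-angle turn in the vortex positions because the positions depend on the square root of the coefficient.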

Samols found a useful formula (Manton and Sutcliffe 2004,
pp. 205-215) for $\gamma$ in terms of the behavior of
$|\phi|$ close to its zeros, using which he devised an
efficient numerical scheme to evaluate $G(|a_0|)$, and
computed $\Theta(b)$ in detail, finding excellent agreement
with lattice simulations at low speeds. He also studied the
quantum scattering of vortices, approximating the quantum
state by a wave function $\Psi$ on $M_n$ evolving according
to the natural Schrödinger equation for quantum geodesic
motion,

$$i\hbar\,\frac{\partial \Psi}{\partial t} = -\tfrac{1}{2}\hbar^2\,\Delta_\gamma \Psi \qquad [26]$$

where $\Delta_\gamma$ is the Laplace-Beltrami operator on
$(M_n, \gamma)$. This technique, introduced for monopoles by
Gibbons and Manton, is now standard for solitons of
Bogomol’nyi type.

By analyzing the forces between moving point vortices at
$\lambda = 1$, Manton and Speight (2003) showed that, as the
vortex separations become uniformly large, the metric on
$M_n$ approaches


2 
œ - 4 
y pe desde — 4r 2 Koller — Z,|) 
x (dz, — dz,)(dz, — de.) [27] 
This formula can also be obtained by a method of matched
asymptotic expansions. We can use [27] to study 2-vortex
scattering for large b, when the vortices remain well
separated. (Note that $\gamma_\infty$ is not positive
definite if any $|z_r - z_s|$ becomes too small.) The results
are good, provided v < 0.5 and b > 3 (see Figure 5c).


Other Developments 


The (critically coupled) AHM on a compact physical space
$\Sigma$ is of considerable theoretical and physical
interest. Bradlow showed that $M_n(\Sigma)$ is empty unless
$V = \mathrm{Area}(\Sigma) > 4\pi n$, so there is a limit to
how many vortices a space of finite area can accommodate
(Manton and Sutcliffe 2004, pp. 227-230). Manton has analyzed
the thermodynamics of a gas of vortices by studying the
statistical mechanics of geodesic flow on $M_n(\Sigma)$. In
this context, spatial compactness is a technical device to
allow nonzero vortex density n/V for finite n, without
confining the fields to a finite box, which would destroy the
Bogomol’nyi properties. In the limit of interest,
$n, V \to \infty$ with n/V fixed, the thermodynamical
properties turn out to depend on $\Sigma$ only through V, so
$\Sigma = S^2$ and $\Sigma = T^2$ give equivalent results,
for example. The equation of state of the gas is
(P = pressure, T = temperature)

$$P = \frac{nT}{V - 4\pi n} \qquad [28]$$

which is similar, at low density n/V, to that of a gas of
hard disks of area $2\pi$. The crucial step in deriving [28]
is to find the volume of $M_n(\Sigma)$ which, despite there
being no formula for $\gamma$, may be computed exactly by
remarkable indirect arguments (Manton and Sutcliffe 2004,
pp. 231-234).
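The comparison with hard disks can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, units are those of the text (Boltzmann constant set to one), and the hard-disk side uses just the standard second virial coefficient of a two-dimensional hard-disk gas, which equals twice the area of one disk.

```python
import math

def vortex_gas_pressure(n, volume, temperature):
    """Equation of state of the vortex gas: P = n T / (V - 4 pi n)."""
    excluded = 4.0 * math.pi * n
    if volume <= excluded:
        raise ValueError("no n-vortex solutions unless V > 4*pi*n")
    return n * temperature / (volume - excluded)

def hard_disk_virial_pressure(n, volume, temperature,
                              disk_area=2.0 * math.pi):
    """Ideal-gas law plus the second virial correction for hard disks."""
    density = n / volume
    return density * temperature * (1.0 + 2.0 * disk_area * density)

# At low density the two expressions agree up to second order in n/V.
n, V, T = 10, 1.0e5, 1.0
print(vortex_gas_pressure(n, V, T), hard_disk_virial_pressure(n, V, T))
```

Expanding the vortex-gas pressure in powers of n/V gives a first correction of 4πn/V, which is the hard-disk second virial term for disks of area 2π; the two laws differ at higher order.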

The static AHM coincides with the Ginzburg-Landau model of
superconductivity, which has precisely the same type I/II
classification. Here the “Higgs” field represents the wave
function of a condensate of Cooper pairs, usually (but not
always) electrons. There has been a parallel development of
the static model by condensed matter theorists, therefore;
see Fossheim and Sudbo (2004), for example. In fact the
vortex was actually first discovered by Abrikosov in the
condensed matter context. One important difference is that
type I superconductors do not support vortex solutions in an
external magnetic field $B_{\mathrm{ext}}$ because the
critical $|B_{\mathrm{ext}}|$ required to create a single
vortex is greater than the critical $|B_{\mathrm{ext}}|$
required to destroy the condensate completely ($\phi = 0$).
Type II superconductors do support vortices, and there are
such superconductors with $\lambda \approx 1$, but the vortex
dynamics we have described is not relevant to these systems.
In this context there is an obvious preferred reference frame
(the rest frame of the superconductor) so it is unsurprising
that the Lorentz-invariant AHM is inappropriate. Insofar as
vortices move at all, they seem to obey a first-order (in
time) dynamical system, in contrast to the second-order AHM.
Manton has devised a first-order system which may have
relevance to superconductivity, by replacing
$E_{\mathrm{kin}}$ with a Chern–Simons–Schrödinger functional
(Manton and Sutcliffe 2004, pp. 193-197). Rather than
attracting or repelling, vortices now tend to orbit one
another at constant separation. There is again a moduli space
approximation to slow vortex dynamics for
$\lambda \approx 1$, but it has a Hamiltonian-mechanical
rather than Riemannian-geometric flavor.

Finally, an interesting simplification of the AHM, which
arises, for example, as a phenomenological model of liquid
helium-4, is obtained if we discard the gauge field
$A_\mu$, or equivalently set the electric charge of $\phi$ to
e = 0. There is now no type I/II classification, since
$\lambda$ may be absorbed by rescaling. The resulting model,
which has only global U(1) phase symmetry, supports
n-vortices $\phi = \sigma(r)e^{in\theta}$ for all n, but
these are not exponentially spatially localized,

$$\sigma(r) = 1 - \frac{n^2}{2r^2} - \frac{n^2(8 + n^2)}{8r^4} + O(r^{-6}) \qquad [29]$$

and cannot have finite E by Derrick’s theorem. They are
unstable for |n| > 1, and 1-vortices uniformly repel one
another. They can be given an interesting first-order
dynamics (the Gross-Pitaevskii equation).


Abbreviations

$A_\mu$ electromagnetic gauge potential
b impact parameter
$D_\mu$ gauge-covariant derivative
E potential energy
$E_{\mathrm{kin}}$ kinetic energy
$F_{\mu\nu}$ electromagnetic field strength tensor
L Lagrangian
$\mathcal{L}$ Lagrangian density
S action
$\phi$ Higgs field
$\Theta$ scattering angle


See also: Fractional Quantum Hall Effect; Ginzburg-Landau
Equation; High $T_c$ Superconductor Theory; Integrable
Systems: Overview; Nonperturbative and Topological Aspects of
Gauge Theory; Quantum Fields with Topological Defects;
Solitons and Other Extended Field Configurations; Symmetry
Breaking in Field Theory; Topological Defects and Their
Homotopy Classification; Variational Techniques for
Ginzburg-Landau Energies.
Introduction
Macroscopic Problem


The “adiabatic piston” is an old problem of thermodynamics
which has had a long and controversial history. It is the
simplest example concerning the time evolution of an
adiabatic wall, that is, a wall which does not conduct heat.
The system consists of a gas in a cylinder divided by an
adiabatic wall (the piston). Initially, the piston is held
fixed by a clamp and the two gases are in thermal equilibrium
characterized by $(p^\pm, T^\pm, N^\pm)$, where the index −/+
refers to the gas on the left/right side of the piston and
(p, T, N) denote the pressure, the temperature, and the
number of particles (Figure 1). Since the piston is
adiabatic, the whole system remains in equilibrium even if
$T^- \neq T^+$. At time t = 0, the clamp is removed and the
piston is let free to move without any friction in the
cylinder. The question is to find the final state, that is,
the final position $X_f$ of the piston and the parameters
$(p_f^\pm, T_f^\pm)$ of the gases.

Figure 1 The adiabatic piston problem. The cylinder extends
from x = 0 to x = L with cross-sectional area A; the piston
at position X separates the left gas $(N^-, p^-, T^-)$ from
the right gas $(N^+, p^+, T^+)$.

In the late 1950s, using the two laws of equilibrium
thermodynamics (i.e., thermostatics), Landau and Lifshitz
concluded that the adiabatic piston will evolve toward a
final state where $p^-/T^- = p^+/T^+$. Later, Callen (1963)
and others realized that the maximum entropy condition
implies that the system will reach mechanical equilibrium
where the pressures are equal, $p_f^- = p_f^+$; however,
nothing could be said concerning the final position $X_f$ or
the final temperatures $T_f^\pm$, which should depend
explicitly on the viscosity of the fluids. It thus became a
controversial problem since one was forced to accept that the
two laws of thermostatics are not sufficient to predict the
final state as soon as adiabatic movable walls are involved
(see early references in Gruber (1999)).

Experimentally, the adiabatic piston was used already before
1924 to measure the ratio $c_p/c_v$ of the specific heats of
gases. In 2000, new measurements showed that one has to
distinguish between two regimes, corresponding to weak
damping or strong damping, with very different properties.
For example, for weak damping the frequency of oscillations
corresponds to adiabatic oscillations, whereas for strong
damping it corresponds to isothermal oscillations.


Microscopic Problem 


The “adiabatic piston” was first considered from a
microscopic point of view by Lebowitz, who introduced in 1959
a simple model to study heat conduction. In this model, the
gas consists of point particles of mass m making purely
elastic collisions on the wall of the cylinder and on the
piston. Furthermore, the gas is very dilute so that the
equation of state $p = n k_B T$ is satisfied at equilibrium,
where n is the density of particles in the gas and $k_B$ the
Boltzmann constant. The adiabatic piston is taken as a heavy
particle of mass $M \gg m$ without any internal degree of
freedom. Using this same model Feynman (1965) gave a
qualitative analysis in Lectures in Physics. He argued
intuitively but correctly that the system should converge
first toward a state of mechanical equilibrium where
$p^- = p^+$ and then very slowly toward thermal equilibrium.
This approach toward thermal equilibrium is associated with
the “wiggles” of the piston induced by the random collisions
with the atoms of the gas. Of course, this stochastic
behavior is not part of thermodynamics and the evolution
beyond the mechanical equilibrium cannot appear in the
macroscopical framework assuming that the piston does not
conduct heat.

From a microscopical point of view, one is 
confronted with two different problems: the 
approach toward mechanical equilibrium in the 
absence of any a priori friction (where the entropy 
of both gases should increase) and, on a different 
timescale, the approach toward thermal equilibrium 
(where the entropy of one gas should decrease but 
the total entropy increase). 

The conceptual difficulties of the problem beyond mechanical
equilibrium come from the following intuitive reasoning. When
the piston moves toward the hotter gas, the atoms of the
hotter gas gain energy, whereas those of the cooler gas lose
energy. When the piston moves toward the cooler side, it is
the opposite. Since on average the hotter side should cool
down and the cold side should warm up, we are led to conclude
that on average the piston should move toward the colder
side. On the other hand, from $p = n k_B T$, the piston
should move toward the warmer side to maintain pressure
balance.

In 1996, Crosignani, Di Porto, and Segev introduced a kinetic
model to obtain equations describing the adiabatic approach
toward mechanical equilibrium. Starting with the
microscopical model introduced by Lebowitz, Gruber, Piasecki,
and Frachebourg, later joined by Lesne and Pache, initiated
in 1998 a systematic investigation of the adiabatic piston
within the framework of statistical mechanics, together with
a large number of numerical simulations. This analysis
exploited the fact that m/M is a very small parameter, and
was organized as an expansion in powers of m/M (see Gruber
and Piasecki (1999) and Gruber et al. (2003) and references
therein). An approach using dynamical system methods was then
developed by Lebowitz et al. (2000) and Chernov et al.
(2002). An extension to hard-disk particles was analyzed at
the same time by Kestemont et al. (2000). Recently, several
other authors have contributed to this subject.

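The general picture which emerges from all the investigations
is the following. For an infinite cylinder, starting with
mechanical equilibrium $p^- = p^+ = p$, the piston evolves to
a stationary stochastic state with nonzero velocity toward
the warmer side

$$\langle V \rangle_{\mathrm{stat}} = \frac{m}{M}\sqrt{\frac{\pi k_B}{8m}}\left(\sqrt{T^+} - \sqrt{T^-}\right) + o\!\left(\frac{m}{M}\right) \qquad [1]$$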


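with relaxation time

$$\tau = \frac{M}{A}\sqrt{\frac{\pi k_B}{8m}}\;\frac{1}{p}\;\frac{\sqrt{T^+ T^-}}{\sqrt{T^+} + \sqrt{T^-}} \qquad [2]$$

where M/A is the mass per unit area of the piston.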


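In this state the piston has a temperature
$T_P = \sqrt{T^+ T^-}$ and there is a heat flux

$$j_Q = \frac{m}{M}\sqrt{\frac{8k_B}{m\pi}}\left(\sqrt{T^-} - \sqrt{T^+}\right)p + o\!\left(\frac{m}{M}\right) \qquad (p^- = p^+ = p) \qquad [3]$$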


For a finite cylinder and $p^+ \neq p^-$, the evolution
proceeds in four different stages. The first two are
deterministic and adiabatic. They correspond to the
thermodynamic evolution of the (macroscopic) adiabatic
piston. The last two stages, which go beyond thermodynamics,
are stochastic with heat transfer across the piston. More
precisely:


1. In the first stage, whose duration is the time needed for
   the shock wave to bounce back on the piston, the evolution
   corresponds to the case of the infinite cylinder (with
   $p^- \neq p^+$). If R = Nm/M > 10, the piston will be able
   to reach and maintain a constant velocity


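   $$\bar{V} = \sqrt{\frac{\pi k_B}{8m}}\;\frac{(p^- - p^+)\sqrt{T^- T^+}}{p^+\sqrt{T^-} + p^-\sqrt{T^+}} + O\!\left(\frac{m}{M}\right) \quad \text{for } |p^- - p^+| \ll 1 \qquad [4]$$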
2. In the second stage the evolution toward mechanical
   equilibrium is either weakly or strongly damped depending
   on R. If R < 1, the evolution is very weakly damped, the
   dynamics takes place on a timescale $t' = \sqrt{R}\,t$,
   and the effect of the collisions on the piston is to
   introduce an external potential
   $\Phi(X) = c_1/X^2 + c_2/(L - X)^2$. On the other hand, if
   R > 4, the evolution is strongly damped (with two
   oscillations only) and depends neither on M nor on R.

3. After mechanical equilibrium has been reached, the third
   stage is a stochastic approach toward thermal equilibrium
   associated with heat transfer across the piston. This
   evolution is very slow and exhibits a scaling property
   with respect to $t' = mt/M$.

4. After thermal equilibrium has been reached
   ($T^- = T^+$, $p^- = p^+$), in a fourth stage the gas will
   evolve very slowly toward a state with Maxwellian
   distribution of velocities, induced by the collision with
   the stochastic piston.


The general conclusion is thus that a wall which is adiabatic
when fixed will become a heat conductor under a stochastic
motion. It should be stressed, however, that for a
macroscopical piston the time required to reach thermal
equilibrium will be several orders of magnitude larger than
the age of the universe, so such a wall could not reasonably
be called a heat conductor. For mesoscopic systems, on the
other hand, the effect of stochasticity may lead to very
interesting properties, as shown by Van den Broeck et al.
(2004) in their investigations of Brownian (or biological)
motors.


Microscopical Model 


The system consists of two fluids separated by an “adiabatic”
piston inside a cylinder with x-axis, length L, and area A.
The fluids are made of $N^\pm$ identical light particles of
mass m. The piston is a heavy flat disk, without any internal
degree of freedom, of mass $M \gg m$, orthogonal to the
x-axis, and velocity parallel to this x-axis. If the piston
is fixed at some position $X_0$, and if the two fluids are in
thermal equilibrium characterized by
$(p_0^\pm, T_0^\pm, N^\pm)$, then they will remain in
equilibrium forever even if $T_0^- \neq T_0^+$: it is thus an
“adiabatic piston” in the sense of thermodynamics. At a
certain time t = 0, the piston is let free to move and the
problem is to study the time evolution. To define the
dynamics, we consider that the system is purely Hamiltonian,
that is, the particles and the piston move without any
friction according to the laws of mechanics. In particular,
the collisions between the particles and the walls of the
cylinder, or the piston, are purely elastic and the total
energy of the system is conserved. In most studies, one
considers that the particles are point particles making
purely elastic collisions. Since the piston is bound to move
only in the x-direction, the velocity components of the
particles in the transverse directions play no role in this
problem. Moreover, since there is no coupling between the
components in the x- and transverse directions, one can
simplify the model further by assuming that all probability
distributions are independent of the transverse coordinates.
We are thus led to a formally one-dimensional problem (except
for normalizations). Therefore, in this review, we consider
that the particles are noninteracting and all velocities are
parallel to the x-axis.
From the collision law, if v and V denote the velocities of a
particle and the piston before a collision, then under the
collision on the piston:

$$v \to v' = 2V - v + \alpha(v - V), \qquad V \to V' = V + \alpha(v - V) \qquad [5]$$

where

$$\alpha = \frac{2m}{M + m} \qquad [6]$$

Similarly, under a collision of a particle with the boundary
at x = 0 or x = L:

$$v \to v' = -v \qquad [7]$$
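The collision rules [5]-[7] are simple enough to check directly. The sketch below uses hypothetical function names (`collide`, `reflect`) and arbitrary sample values; it illustrates that the rules conserve total momentum and kinetic energy, as expected for purely elastic collisions.

```python
def collide(v, V, m, M):
    """Elastic particle-piston collision, eqns [5]-[6]."""
    alpha = 2.0 * m / (M + m)
    v_after = 2.0 * V - v + alpha * (v - V)
    V_after = V + alpha * (v - V)
    return v_after, V_after

def reflect(v):
    """Elastic reflection at a fixed end wall, eqn [7]."""
    return -v

# Sample values (arbitrary): a light particle hitting a heavy piston.
m, M, v, V = 1.0, 100.0, 3.0, 0.1
v2, V2 = collide(v, V, m, M)
momentum_change = (m * v2 + M * V2) - (m * v + M * V)
energy_change = 0.5 * (m * v2**2 + M * V2**2) - 0.5 * (m * v**2 + M * V**2)
print(v2, V2, momentum_change, energy_change)
```

Because the piston velocity changes by only a small multiple of the relative velocity in each collision when the piston is much heavier than the particle, many collisions are needed to change the piston velocity appreciably.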


Let us mention that more general models have also been
considered, for example, the case where the two fluids are
made of point particles with different masses $m^\pm$, or
two-dimensional models where the particles are hard disks.
However, no significant differences appear in these more
general models and we restrict this article to the simplest
case.

One can study different situations: $L = \infty$, L finite,
and $L \to \infty$. Furthermore, taking first M and A finite,
one can investigate several limits.


1. Thermodynamic limit for the piston only. In this limit, L
   is fixed (finite or infinite) and
   $A \to \infty$, $M \to \infty$, keeping constant the
   initial densities $n^\pm$ of the fluids and the parameter

   $$\gamma = \alpha A = \frac{2mA}{M + m} \qquad [8]$$

   If L is finite, this means that $N^\pm \to \infty$ while
   keeping constant the parameters

   $$R^\pm = \frac{mN^\pm}{M} = \frac{M^\pm}{M} \qquad [9]$$

2. Thermodynamic limit for the whole system, where
   $L \to \infty$ and $A \sim L^2$, $N^\pm \sim L^3$. In this
   limit, space and time variables are rescaled according to
   $x' = x/L$ and $t' = t/L$. This limit can be considered as
   a limiting case of (1) where
   $R^\pm \sim \sqrt{A} \to \infty$ (and time is scaled).

3. Continuum limit where L and M are fixed and
   $N^\pm \to \infty$, $m \to 0$ keeping $M^\pm$ constant,
   that is, $R^\pm$ constant.

The case L infinite and the limit (1) have been investigated
using statistical mechanics (Liouville or Boltzmann’s
equations). On the other hand, the limit (2) has been studied
using dynamical system methods, reducing first the system to
a billiard in an $(N^+ + N^- + 1)$-dimensional polyhedron.
The limit (3) has been introduced to derive hydrodynamical
equations for the fluids.

In this article, we present the approach based on statistical
mechanics. Although not as rigorous as (2) on a mathematical
level, it yields more information on the approach toward
mechanical and thermal equilibrium. Moreover, it indicates
which open problems remain to be solved mathematically. In
all investigations, advantage is taken of the fact that m/M
is very small and one introduces the small parameter

$$\epsilon = \sqrt{m/M} \ll 1 \qquad [10]$$

Let us note that $\epsilon$ measures the ratio of thermal
velocities for the piston and a fluid particle, whereas
$\alpha \sim \epsilon^2$ measures the ratio of velocity
changes during a collision.


Starting Point: Exact Equations 


Using the statistical point of view, the time evolution is
given by Liouville’s equation for the probability
distribution on the whole phase space for
$(N^+ + N^- + 1)$ particles, with L, A, $N^\pm$, and M
finite. Initially (t < 0), the piston is fixed at
$(X_0, V_0 = 0)$ and the fluids are in thermal equilibrium
with homogeneous densities $n_0^\pm$, velocity distributions
$\varphi_0^\pm(v) = \varphi_0^\pm(-v)$, and temperatures

$$k_B T_0^\pm = m\int_{-\infty}^{\infty} dv\, \varphi_0^\pm(v)\, v^2 \qquad [11]$$


Integrating out the irrelevant degrees of freedom,
Liouville’s equation yields the equations for the
distribution $\rho^\pm(x, v; t)$ of the right and left
particles:

$$\partial_t \rho^\pm(x, v; t) + v\,\partial_x \rho^\pm(x, v; t) = I^\pm(x, v; t) \qquad [12]$$

The collision term $I^\pm(x, v; t)$ is a functional of
$\rho_{\pm,P}(X, v; X, V; t)$, the two-point correlation
function for a right (resp. left) particle at (x = X, v) and
the piston at (X, V). Similarly, one obtains for the velocity
distribution of the piston:


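$$\partial_t \Phi(V; t) = A\int_{-\infty}^{\infty} |v - V|\left[\theta(V - v)\,\rho^-_{\mathrm{surf}}(v'; V'; t) + \theta(v - V)\,\rho^+_{\mathrm{surf}}(v'; V'; t)\right]dv$$
$$\qquad\qquad - A\int_{-\infty}^{\infty} |v - V|\left[\theta(v - V)\,\rho^-_{\mathrm{surf}}(v; V; t) + \theta(V - v)\,\rho^+_{\mathrm{surf}}(v; V; t)\right]dv \qquad [13]$$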
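where (v′, V′) are given by eqn [5] and

$$\rho^\pm_{\mathrm{surf}}(v; V; t) = \int dX\, \rho_{\pm,P}(X, v; X, V; t) \qquad [14]$$

We thus have to solve eqns [12]-[13] with initial conditions

$$\rho^-(x, v; t = 0) = n_0^-\,\varphi_0^-(v)\,\theta(x)\,\theta(X_0 - x)$$
$$\rho^+(x, v; t = 0) = n_0^+\,\varphi_0^+(v)\,\theta(L - x)\,\theta(x - X_0) \qquad [15]$$
$$\Phi(V; t = 0) = \delta(V)$$

Using the fact that $\alpha = 2m/(M + m) \ll 1$, we can
rewrite eqn [13] as a formal series in powers of $\alpha$:

$$\partial_t \Phi(V; t) = \gamma \sum_{k=1}^{\infty} \frac{(-1)^k \alpha^{k-1}}{k!}\,\frac{\partial^k}{\partial V^k}\,\tilde{F}_{k+1}(V; t) \qquad [16]$$

where

$$\tilde{F}_k(V; t) = \int_V^{\infty} (v - V)^k\, \rho^-_{\mathrm{surf}}(v; V; t)\,dv - \int_{-\infty}^{V} (v - V)^k\, \rho^+_{\mathrm{surf}}(v; V; t)\,dv \qquad [17]$$

from which one obtains the equations for the moments of the
piston velocity:

$$\frac{1}{\gamma}\frac{d\langle V^n\rangle}{dt} = \sum_{k=1}^{n} \alpha^{k-1}\,\frac{n!}{k!\,(n-k)!}\int_{-\infty}^{\infty} dV\, V^{n-k}\,\tilde{F}_{k+1}(V; t) \qquad [18]$$

However, we do not know the two-point correlation functions.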

If the length of the cylinder is infinite, the condition
$M \gg m$ implies that the probability for a particle to make
more than one collision on the piston is negligible.
Alternatively, one could choose initial distributions
$\varphi_0^\pm(v)$ which are zero for
$|v| < v_{\min}$, where $v_{\min}$ is taken such that the
probability of a recollision is strictly zero. Therefore, if
$L = \infty$, one can consider that before a collision on the
piston the particles are distributed with
$\varphi_0^\pm(v)$ for all t, and the two-point correlation
functions factorize, that is,

$$\rho^-_{\mathrm{surf}}(v; V; t) = \rho^-_{\mathrm{surf}}(v; t)\,\Phi(V; t), \quad \text{if } v > V$$
$$\rho^+_{\mathrm{surf}}(v; V; t) = \rho^+_{\mathrm{surf}}(v; t)\,\Phi(V; t), \quad \text{if } v < V \qquad [19]$$

where for $L = \infty$,
$\rho^\pm_{\mathrm{surf}}(v; t) = n_0^\pm \varphi_0^\pm(v)$
and thus the conditions to obtain eqn [18] are satisfied.

If L is finite, one can show that the factorization property
(eqn [19]) is an exact relation in the thermodynamic limit
for the piston ($A \to \infty$, M/A constant). For finite L
and finite A, we introduce

Assumption 1 (Factorization condition). Before a collision
the two-point correlation functions have the factorization
property (eqn [19]) to first order in $\alpha$.


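Under the factorization condition, we have

$$\tilde{F}_k(V; t) = F_k(V; t)\,\Phi(V; t) \qquad [20]$$

with

$$F_k(V; t) = \int_V^{\infty} dv\,(v - V)^k\, \rho^-_{\mathrm{surf}}(v; t) - \int_{-\infty}^{V} dv\,(v - V)^k\, \rho^+_{\mathrm{surf}}(v; t) = F_k^-(V; t) - F_k^+(V; t) \qquad [21]$$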
and from eqn [18] 
(F) g0) = Ma(P(V; i) p2 


(a) 


Introducing V=(V),, then from eqns [12] and [20], 
it follows that the (kinetic) energies satisfy 


d = ) oe Ma] |(F} (V: t)}a V 


=Mol(VF:(V5t))y +a(F;(V;t))o] [23] 





dt\ A 
+O - VFV: t))a 
+SEE: D)al 24 


which implies conservation of energy. 
From the first law of thermodynamics, 


E-i s 





where Pi * and P+ denote the work- and 
heat-power transmitted by the piston to the fluid, 
we conclude from eqns [22] and [25] that the heat 
flux is 


Í pot 


a = + Mal ((V - V)FE(V52))o 


+5 (FE(V50)) 9 26 


Since œa «& 1, it is interesting to introduce the 
irreducible moments 


A,=(V-V)')o 27 


and the expansion around V=(V),, 


FEV = YORO V-V [28] 


from which one obtains equations for dA,/dt. In 
particular, using the identities 





Se, UE ee 
n [22] and [24], we have 
(Vr a 
+ Gen TT Fy Aze [30] 
(2) = + Ma FEV: t)),V 
Pe — F£(V;t) +50 Ae 
x FY WDA, | [31] 


Depending on the questions or approximations one wants to
study, either the distribution $\Phi(V; t)$ or the moments
$\langle V^n \rangle_t$ will be the interesting objects.
Finally, with the condition [19], one can take eqn [12] for
$x \neq X_t$, and impose the boundary conditions at
$x = X_t$:


o Xatt =p (Xav ih ivs V 32] 
OG t) = DO (Xd I, if v > V: 
and similarly for x=0 and x= L with v’ = =v. 


Let us note that this factorization condition is of the same
nature as the molecular chaos assumption introduced in
kinetic theory, and with this condition eqn [13] yields the
Boltzmann equation for this model.

In the following, to obtain explicit results as a function of
the initial temperatures $T_0^\pm$, we take Maxwellian
distributions $\varphi_0^\pm(v)$ and initial conditions
$(p_0^\pm, T_0^\pm, n_0^\pm)$ such that the velocity of the
piston remains small (i.e., small compared with the thermal
velocities of the fluid particles).
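For Maxwellian distributions the second-moment integrals $F_2^\pm$ of eqn [21] have a closed form in terms of the complementary error function. The sketch below evaluates the resulting one-sided pressures $2mF_2^\pm(V)$ on a piston moving with velocity V and solves for the velocity at which they balance. It rests on several assumptions that go beyond the text at this point: the gases at the piston surface are taken to be the unperturbed initial Maxwellians (no recollisions, as for an infinite cylinder), the factorization condition holds, and all function names and default units (m = k_B = 1) are hypothetical.

```python
import math

def maxwell_second_moment(V, n, T, m=1.0, kB=1.0):
    """Integral over v > V of (v - V)^2 n phi(v) dv for a Maxwellian phi."""
    sigma = math.sqrt(kB * T / m)                 # thermal velocity
    x = V / sigma
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))    # probability that v > V
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return n * ((sigma**2 + V**2) * tail - V * sigma * pdf)

def surface_pressures(V, n_minus, T_minus, n_plus, T_plus, m=1.0, kB=1.0):
    """One-sided pressures 2 m F_2^-(V) and 2 m F_2^+(V) on the piston."""
    p_left = 2.0 * m * maxwell_second_moment(V, n_minus, T_minus, m, kB)
    # The Maxwellian is even, so F_2^+(V) equals the left-type
    # integral evaluated at -V with the right-hand gas parameters.
    p_right = 2.0 * m * maxwell_second_moment(-V, n_plus, T_plus, m, kB)
    return p_left, p_right

def stationary_velocity(n_minus, T_minus, n_plus, T_plus, m=1.0, kB=1.0):
    """Velocity at which the two one-sided pressures balance (bisection)."""
    def net(V):
        p_left, p_right = surface_pressures(
            V, n_minus, T_minus, n_plus, T_plus, m, kB)
        return p_left - p_right
    bound = 10.0 * math.sqrt(kB * max(T_minus, T_plus) / m)
    lo, hi = -bound, bound        # net(lo) > 0 > net(hi); net is decreasing
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if net(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Equal pressures but unequal temperatures: no deterministic drift.
print(stationary_velocity(1.0, 1.0, 0.5, 2.0))
# Slightly higher pressure on the left: the piston drifts to the right.
print(stationary_velocity(1.01, 1.0, 1.0, 1.0))
```

At V = 0 each one-sided pressure reduces to the ideal-gas value $n k_B T$, and the balance velocity depends only on the densities and temperatures of the two gases, not on the piston mass. The stochastic drift at equal pressures is a higher-order effect in m/M that this deterministic balance does not capture.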





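Distribution $\Phi(V; t)$ for the Infinite
Cylinder ($L = \infty$)


To lowest order in $\epsilon = \sqrt{m/M}$, and assuming
$|1 - p^+/p^-|$ is of order $\epsilon$, one obtains from
eqn [16] the usual Fokker-Planck equation whose solution
gives

$$\Phi_0(V; t) = \frac{1}{\sqrt{2\pi}\,\Delta(t)}\exp\left[-\frac{1}{2}\left(\frac{V - \bar{V}(t)}{\Delta(t)}\right)^2\right] \qquad [33]$$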
a2 L mT P t4 T* 4p vT 
E ptT- +p-vTt 


where we have dropped the index “zero” on the variables
$T^\pm, n^\pm$ and used the equation of state
$p^\pm = n^\pm k_B T^\pm$.

In conclusion, in the thermodynamic limit for the piston
($M \to \infty$, M/A fixed), eqn [33] shows that the
evolution is deterministic, that is,
$\Phi(V; t) = \delta(V - \bar{V}(t))$, where the velocity
$\bar{V}(t)$ of the piston tends exponentially fast toward
the stationary value $V_{\mathrm{stat}} = \bar{V}(\infty)$
with relaxation time $\tau = \lambda^{-1}$.

Let us note that for $p^+ = p^-$, we have $\bar{V}(t) = 0$
and the evolution [33] is identical to the
Ornstein-Uhlenbeck process of thermalization of the Brownian
particle starting with zero velocity and friction coefficient
$\lambda$. The analysis of [16] to first order in
$\epsilon$ yields then


jita (t))* 


where a(t) can be explicitly calculated and ao(t) = 





(1 ==) 


®o(V;t) [35] 








—A?(t)a2(t) because of the normalization condition. 
Moreover, 42(t)~(p — pt), that is, a2(t)=O0 if 
p =p". From [35], one obtains 
Ua ao 
t V 8m p+/T- +p- VTH 
x {7 —p*)(1-e™) 
ep) 
4 427 (DP 


x (1 —2dAte™” — e 
and 


(V2), — = A209) 1 + \e2a7(Ha2(0) 37] 


From eqn [36], we now conclude that for equal pressures
$p^- = p^+$, the piston will evolve stochastically to a
stationary state with nonzero velocity toward the warmer side

$$\langle V \rangle_{\mathrm{stat}} = \frac{m}{M}\sqrt{\frac{\pi k_B}{8m}}\left(\sqrt{T^+} - \sqrt{T^-}\right) \qquad [38]$$

Let us remark that we have established eqn [35] 
under the condition that |1—p‘/p~|=O(e), but as 
we see in the next section, the stationary value Vstat 
obtained from eqn [36] remains valid whenever 


\(1—pt/p-)\(1—./Tt+/T-)| <1. 


stat 


Moments $\langle V^n \rangle_t$: Thermodynamic Limit for the Piston
General Equations: Adiabatic Evolution


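In the thermodynamic limit $M \to \infty$, $\alpha \to 0$,
$\gamma = \alpha A$ is fixed and eqn [16] reduces to

$$\partial_t \Phi(V; t) = -\gamma\,\frac{\partial}{\partial V}\,\tilde{F}_2(V; t) \qquad [39]$$

Integrating [39] with initial condition
$\Phi(V; t = 0) = \delta(V)$ yields

$$\Phi(V, t) = \delta(V - V(t)), \quad \text{that is,} \quad \langle V^n \rangle_t = \langle V \rangle_t^n \qquad [40]$$

where

$$\frac{d}{dt}V(t) = \gamma F_2(V(t); t), \qquad V(t = 0) = 0 \qquad [41]$$

Moreover,

$$\tilde{F}_k(V; t) = F_k(V; t)\,\Phi(V; t) \qquad [42]$$

and

$$\rho_{\pm,P}(x, v; X, V; t) = \rho^\pm(x, v; t)\,\delta(X - X(t))\,\delta(V - V(t)) \qquad [43]$$

where $dX(t)/dt = V(t)$, $X(t = 0) = X_0$.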

In conclusion, as already mentioned, in this limit
the factorization condition (eqn [19]) is an exact
relation. Let us note that $\rho^\pm_{surf}(v;t) = \rho^\pm_{surf}(2V - v; t)$ if
$v > V(t)$ (on the right) or $v < V(t)$ (on the left). Let
us also remark that $2mF_2^\pm(V(t);t)$ represents the
effective pressure from the right/left exerted on the
piston. Moreover, since for any distribution
$\rho^\pm_{surf}(v;t)$, the functions $F_2^-(V;t)$ and $-F_2^+(V;t)$ are
monotonically decreasing, we can introduce the
decomposition


$$p^\pm_{surf}(V;t) \equiv 2mF_2^\pm(V;t) = p^\pm(t) \pm \frac{M}{A}\lambda^\pm(V;t)\,V \qquad [44]$$


where the static pressure at the surface is
$p^\pm(t) = p^\pm_{surf}(V=0;t)$ and the friction coefficients
$\lambda^\pm(V;t)$ are strictly positive. The evolution [41] is
thus of the form


$$\frac{d}{dt}V = \frac{A}{M}\left(p^- - p^+\right) - \lambda(V)\,V \qquad [45]$$


It involves the difference of static pressure and the
friction coefficient $\lambda(V) = \lambda^-(V) + \lambda^+(V)$. Finally,
from eqn [12], we obtain the evolution of the
(kinetic) energy per unit area for the fluids in the left
and right compartments:


$$\frac{d}{dt}\left(\frac{\langle E^\pm \rangle}{A}\right) = \pm 2mF_2^\pm(V;t)\,V \qquad [46]$$


Therefore, from [40] and [46], and the first law of
thermodynamics, we recover the conclusions
obtained in the previous section, that is, in the
thermodynamic limit for the piston, the evolution
(eqns [41], [12], and [35]) is deterministic and
adiabatic (i.e., in [46] only work and no heat is
involved).


Infinite Cylinder ($L = \infty$, $M = \infty$)


As already discussed, for $L = \infty$ we can neglect the
recollisions. Therefore, in $F_2$ the distribution $\rho^\pm_{surf}(v;t)$
can be replaced by $n_0^\pm \varphi_0^\pm(v)$ and $F_2(V)$ is independent
of $t$. In this case, the evolution of the piston is
simply given by the ordinary differential equation

$$\frac{d}{dt}V(t) = \frac{A}{M}\,2mF_2(V), \qquad V(t=0) = 0 \qquad [47]$$

where $F_2(V)$ is a strictly decreasing function of $V$. If
$p_0^+ = p_0^-$ then $V(t) = 0$, that is, the piston remains at
rest and the two fluids remain in their original
thermal equilibrium. If $p_0^+ \neq p_0^-$, that is, $n_0^+ k_B T_0^+ \neq n_0^- k_B T_0^-$,
the piston will evolve monotonically to a
stationary state with constant velocity $V_{stat}$, solution
of $F_2(V_{stat}) = 0$. From [34], it follows that $V_{stat}$ is a
function of $n_0^+/n_0^-$, $T_0^+$, $T_0^-$ but does not depend on
the value $M/A$. Moreover, the approach to this
stationary state is exponentially fast with relaxation
time $\tau = 1/\lambda(V=0)$. For Maxwellian distributions
$\varphi_0^\pm(v)$, $V_{stat}$ is a solution of


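$$k_B\left(n_0^- T_0^- - n_0^+ T_0^+\right) - V_{stat}\sqrt{\frac{8 k_B m}{\pi}}\left(n_0^-\sqrt{T_0^-} + n_0^+\sqrt{T_0^+}\right) + V_{stat}^2\, m\left(n_0^- - n_0^+\right) + O(V_{stat}^3) = 0 \qquad [48]$$


Moreover,


$$\tau^{-1} = \frac{A}{M}\sqrt{\frac{8 k_B m}{\pi}}\left(n_0^-\sqrt{T_0^-} + n_0^+\sqrt{T_0^+}\right) \qquad [49]$$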


which implies that the relaxation time will be very
small either if $M/A \ll 1$, or if $n_0^\pm = \xi \tilde{n}_0^\pm$ with $\xi \gg 1$.
In this case, the piston acquires almost immediately
its final velocity $V_{stat}$ and one can solve eqn [12] to
obtain the evolution of the fluids.
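
As a rough numerical illustration (a sketch, not the authors' code), the small-velocity stationarity condition for Maxwellian gases can be solved directly. The sketch assumes the quadratic truncation $k_B(n_0^- T_0^- - n_0^+ T_0^+) - V\sqrt{8k_B m/\pi}\,(n_0^-\sqrt{T_0^-} + n_0^+\sqrt{T_0^+}) + mV^2(n_0^- - n_0^+) = 0$, with the $O(V^3)$ remainder dropped, and the relaxation rate $\tau^{-1} = (A/M)\sqrt{8k_B m/\pi}\,(n_0^-\sqrt{T_0^-} + n_0^+\sqrt{T_0^+})$. The function names, argument names, and sample values are illustrative assumptions.

```python
import math


def stationary_velocity(n_minus, t_minus, n_plus, t_plus, m=1.0, k_b=1.0):
    """Root of the quadratic truncation of the stationarity condition.

    Assumption: the O(V^3) remainder is neglected, so the result is only
    meaningful when the returned velocity is small compared with the
    thermal velocities of the two gases.
    """
    c0 = k_b * (n_minus * t_minus - n_plus * t_plus)  # static pressure difference
    c1 = -math.sqrt(8.0 * k_b * m / math.pi) * (
        n_minus * math.sqrt(t_minus) + n_plus * math.sqrt(t_plus)
    )  # friction term (always negative)
    c2 = m * (n_minus - n_plus)
    disc = c1 * c1 - 4.0 * c2 * c0
    if disc < 0.0:
        raise ValueError("no real root: the small-velocity expansion does not apply")
    # Root that reduces to -c0/c1 when the quadratic term vanishes.
    return 2.0 * c0 / (-c1 + math.sqrt(disc))


def relaxation_time(n_minus, t_minus, n_plus, t_plus, mass_per_area, m=1.0, k_b=1.0):
    """Relaxation time tau = 1/lambda(V=0); mass_per_area stands for M/A."""
    friction = math.sqrt(8.0 * k_b * m / math.pi) * (
        n_minus * math.sqrt(t_minus) + n_plus * math.sqrt(t_plus)
    ) / mass_per_area
    return 1.0 / friction
```

With equal pressures the first coefficient vanishes and the function returns zero, in line with the statement that the piston then remains at rest in this limit; the relaxation time scales linearly with $M/A$.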


Finite Cylinder ($L < \infty$, $M = \infty$)


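For finite $L$, introducing the average temperature in
the fluids

$$T^\pm_{av} = \frac{2\langle E^\pm \rangle}{k_B N^\pm} \qquad [50]$$

we have to solve [41] and [46], that is,

$$\frac{d}{dt}V = \frac{A}{M}\,2m\left[F_2^-(V;t) - F_2^+(V;t)\right], \qquad k_B\frac{d}{dt}T^\pm_{av} = \pm 4m\frac{A}{N^\pm}F_2^\pm(V;t)\,V \qquad [51]$$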


where $F_2^\pm(V;t)$ is a functional of $\rho^\pm_{surf}(v;t)$ which we
decompose as


FE(V;t) = a*(t)kpT*(t) + (=) AV: OV [52 
with 
= / dup,..¢(V; t) 
an [53] 
=J dup. (v; t) 
and 
tkp Tt = pt [54 


For a time interval $\tau_1 = L\sqrt{m/k_B T}$, which is the time
for the shock wave to bounce back, the piston will
evolve as already discussed. In particular, if $R^\pm$ is
sufficiently large, then after a time $\tau_0 = O((R^\pm)^{-1})$ the
piston will reach the velocity $V$ given by $F_2(V;t) = 0$
(eqn [47]). For $t > \tau_1$, $F_2^\pm(V;t)$ depends explicitly on
time. For $R^\pm$ sufficiently large, we can expect that for
all $t$ the velocity $V(t)$ will be a functional of $\rho^\pm_{surf}(v;t)$
given by $F_2[V(t); \rho^\pm_{surf}(\cdot\,;t)] = 0$, and thus the problem
is to solve eqn [12] with the boundary condition (eqn
[32]). Since $V(t)$ so defined is independent of $M/A$,
the evolution will be independent of $M/A$ if $R^\pm$ is
sufficiently large. This conclusion, which we cannot
prove rigorously, will be confirmed by numerical
simulations.

To give a qualitative discussion of the evolution
for arbitrary values of $R^\pm$, we shall use the following
assumption already introduced in the experimental
measurement of $c_p/c_v$.


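Assumption 2 (Average assumption). The surface
coefficients $\hat{n}^\pm(t)$ and $\hat{T}^\pm(t)$ (eqns [52]-[53]) coincide
to order 1 in $\alpha$ with the average value of the
density and temperature in the fluids, that is,

$$\hat{n}^-(t) = \frac{N^-}{A\,X(t)}, \qquad \hat{n}^+(t) = \frac{N^+}{A\,(L - X(t))}, \qquad \hat{T}^\pm(t) = T^\pm_{av}(t) \qquad [55]$$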


We still need an expression for the friction
coefficients. From

$$2mF_2^\pm(V;t) = p^\pm(t) - 4mV F_1^\pm(V=0;t) + mV^2\,\hat{n}^\pm(t) + O(V^3) \qquad [56]$$

then, assuming that to first order in $\alpha$, $F_1^\pm(V=0;t)$ is
the same function of $\hat{T}^\pm(t)$ as for Maxwellian
distributions, we have


tn (A). 4 |, ST 
A v-(f mn = ty 


Therefore, choosing initial condition such that V(t) 
is small for all time, eqn [51] yields 


Vî-X-VÎHL-X) 
m n Tt (L-— Xo) [58] 


We thus obtain the equilibrium point for the 
adiabatic evolution (M =o): 


+ O(V*) [57] 
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N+)  2Eo X; 
a TF = Aks ( - a 60] 
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Solving [58]-[62] gives the equilibrium state (X;, T= ) 
which is a state of mechanical equilibrium $p_f^+ = p_f^-$,
but not thermal equilibrium $T_f^+ \neq T_f^-$. Moreover, this
equilibrium state does not depend on $M$. Having
obtained the equilibrium point, we can then investigate
the evolution close to the equilibrium point.
Linearizing eqn [51] around $(X_f, T_f^\pm)$ yields


= — y2 
TOE 





dt M x? 
Nt) Tt (L — X;)° 7 
E (S) e | V [63] 
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In other words, the effect of collisions on the piston 
is to induce an external potential of the form 
$[c_1 X^{-2} + c_2 (L - X)^{-2}]$ and a friction force. It is a
damped harmonic oscillator with


E 1 

2 0 

—6 ——_—_ 
“o ($ Jz Xe) 


v alya 


(recall that $R^\pm = mN^\pm/M$). For the case $N^- = N^+$ to
be considered in the simulations, eqn [64] implies
that the motion is weakly damped if
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and strongly damped if R > Rmax, in agreement with 
experimental observations. 


Moments $\langle V^n \rangle_t$: Piston with Finite Mass


Equation to First Order in $\alpha = 2m/(M + m)$


If the mass of the piston is finite with $M \gg m$, then
the irreducible moments $\Delta_r$ are of the order $\alpha^{[(r+1)/2]}$,
where $[(r+1)/2]$ is the integral part of $(r+1)/2$.
If the factorization condition [19] is satisfied, to first
order in $\alpha$ we have

$$\langle V^n \rangle_t = V^n(t) + \frac{n(n-1)}{2}V^{n-2}(t)\,\Delta_2(t) \qquad [67]$$

where $V(t) = \langle V \rangle_t$ and $\Delta_2(t) = \langle V^2 \rangle_t - \langle V \rangle_t^2$ are
solutions of

$$\frac{1}{\gamma}\frac{d}{dt}V(t) = F_2 + \Delta_2 F_0$$

$$\frac{1}{\gamma}\frac{d}{dt}\Delta_2(t) = -4\Delta_2 F_1 + \alpha F_3 \qquad [68]$$

1d 

aa (E*), = + {M[Fy + AoF5|V 


+ (M/2)[4A.Fy — aF3]} 
and $\Delta_2 = k_B T_P/M$ defines the temperature of the
piston.
Infinite Cylinder: Heat Transfer 


For the infinite cylinder, the factorization assumption
is an exact relation and in this case the
functions $F_k(V;t)$ are independent of $t$. The solution
of the autonomous system [68] with $F_k = F_k(V)$
shows that the piston evolves to a stationary state
with velocity $V$ given by

$$F_2(V) + \frac{\alpha}{4}\frac{F_3(V)\,F_0(V)}{F_1(V)} = 0 \qquad [69]$$


The temperature of the piston is

$$\Delta_2 = \frac{k_B T_P}{M} = \frac{\alpha}{4}\frac{F_3(V)}{F_1(V)} \qquad [70]$$

and the heat flux from the piston to the fluid is 
1 m? [Fi F; — F; Ft 
+ ppa- |234 veal 1 
A È A Fr — F} a 


If we choose initial conditions such that $|V(t)| \ll 1$
for all $t$, and Maxwellian distributions $\varphi^\pm(v)$, the
solutions $V(t)$, $\Delta_2(t)$ coincide with the solutions
previously obtained (eqns [36] and [37]) and


1 P=- _. fl — m Skp 
oe 


=p 
e (72) 
(ptVT- +p-vT*) 
In conclusion, to first order in $m/M$, there is a heat
flux from the warm side to the cold one proportional
to $(T^+ - T^-)$, induced by the stochastic
motion of the piston.


Finite Cylinder ($L < \infty$, $M < \infty$)


Singular character of the perturbation approach.
Whereas the leading order is actually the "thermodynamic
behavior" $M = \infty$ in the first two stages of
the evolution (fast relaxation toward mechanical
equilibrium), the fluctuations of order $O(\alpha)$ rule the
slow relaxation toward thermal equilibrium. It is
thus obvious that a naive perturbation approach
cannot give access to both regimes. This difficulty
is reminiscent of the boundary-layer problems
encountered in hydrodynamics, and the perturbation
method to be used here is the exact temporal analog
of the matched perturbative expansion method
developed for these boundary layers. The idea is to
implement two different perturbation approaches:


1. one at short times, with time variable $t$ describing
the fast dynamics ruling the fast relaxation
toward mechanical equilibrium; and

2. one for longer times, with a rescaled time
variable $\tau = \alpha t$.


The second perturbation approach above is supplemented
with a "slaving principle," expressing that at
each time of the slow evolution, that is, at fixed $\tau$,
the still present fast dynamics has reached a local
asymptotic state, slaved to the values of the slow
observables. The initial conditions are set on the
first-stage solution. The initial conditions of the
second regime match the asymptotic behavior of the
first-stage solution ("matching condition").

The slaving principle is implemented by interpreting
an evolution equation of the form

$$\alpha\frac{da}{d\tau} = \frac{da}{dt} = A(\tau, a), \qquad A = O(1) \qquad [73]$$

as follows: it indicates that $a$ is in fact a fast quantity
relaxing at short times ($\tau \ll 1$) toward a stationary
state $a_{eq}(\tau)$ slaved to the slow evolution and
determined by the condition

$$A[\tau, a_{eq}(\tau)] = 0 \qquad [74]$$

(at lowest order in $\alpha$; actually $A[\tau, a_{eq}(\tau)] = O(\alpha)$,
which prescribes the leading order of $a_{eq}(\tau)$); the
following-order terms can be arbitrarily fixed as
long as only the first order of perturbation is
implemented. Physically, such a condition arises to
express that an instantaneous mechanical equilibrium
takes place at each time $\tau$ of the slow
relaxation to thermal equilibrium.
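
The slaving idea can be illustrated on a toy fast-slow equation. The sketch below is only an illustration under stated assumptions: it integrates $\alpha\, da/d\tau = A(\tau, a)$ with the hypothetical choice $A(\tau, a) = \cos\tau - a$ (not taken from the piston equations), for which the slaved state is $a_{eq}(\tau) = \cos\tau$, and uses a plain explicit Euler step with a step size well below $\alpha$.

```python
import math


def slaved_state(tau):
    """Stationary state a_eq(tau) solving A(tau, a_eq) = 0 for the toy model."""
    return math.cos(tau)


def integrate_fast_variable(alpha, tau_end=1.0, steps_per_alpha=50):
    """Explicit Euler integration of alpha * da/dtau = cos(tau) - a.

    The toy right-hand side is a hypothetical example. The initial value
    a = 0 is deliberately far from the slaved state a_eq(0) = 1, so the
    run shows the fast initial relaxation followed by slow tracking.
    Returns the final rescaled time and the final value of a.
    """
    dtau = alpha / steps_per_alpha  # resolve the fast time scale
    n_steps = int(round(tau_end / dtau))
    a, tau = 0.0, 0.0
    for _ in range(n_steps):
        a += dtau * (math.cos(tau) - a) / alpha
        tau += dtau
    return tau, a
```

After the initial transient of duration $O(\alpha)$ in $\tau$, the integrated value stays within $O(\alpha)$ of `slaved_state(tau)`, which is the content of the slaving condition at lowest order.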


Equations for the fluctuation-induced evolution of
the system. Following this procedure, we arrive at
explicit expressions for the rescaled quantities (of order
$O(1)$) $\tilde{V} = V/\alpha$, $\tilde{\Delta}_2 = \Delta_2/\alpha$, and $\tilde{\Pi} = (p^- - p^+)/\alpha$:


~ m([AL\[(F F} -FiF 
¥=3(E)(A gE) +0 


II 2m (AL 
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We then introduce a (dimensionless) rescaled posi- 
tion for the piston 





$$\xi = \frac{1}{2} - \frac{X}{L} \in \left[-\frac{1}{2}, \frac{1}{2}\right] \qquad [76]$$

which satisfies
which satisfies 
dé _ 4. [2A\ Fr Ft 
$=- -mA 77 


To discuss eqn [77], a third assumption has to be 
introduced. 


Assumption 3 (Maxwellian Identities). In the
regime when $V = O(\alpha)$, the relations between the
functionals $F_1$, $F_2$, and $F_3$ are the same at lowest
order in $\alpha$ as if the distributions $\rho^\pm_{surf}(v; V; t)$ were
Maxwellian in $v$:





[kp T+ 
Fe y Æ 
i(V) ~ FP re 


rev) = (SES) FE(V) - VEE) 


m 


[78] 


Using these identities and the (dimensionless) 
rescaled time 


2 [kg DNT +N*TS) 


where $N = N^+ + N^-$, we obtain a deterministic
equation describing the piston motion (Gruber et al.
2003):





s0) =5- > 


where $X_{ad}$ is the piston position at the end of the
adiabatic regime (i.e., $X_f$, eqn [62]). The meaningful
observables straightforwardly follow from the solution
$\xi(s)$:

$$X(s) = L\left(\frac{1}{2} - \xi(s)\right), \qquad T^\pm(s) = \left[1 \pm 2\xi(s)\right]\frac{N^- T_0^- + N^+ T_0^+}{2N^\pm} \qquad [81]$$
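
The map from the rescaled position to the observables can be sketched in a few lines. The sketch assumes $X = L(1/2 - \xi)$, mechanical equilibrium $N^- T^-/X = N^+ T^+/(L - X)$, and conservation of $N^- T^- + N^+ T^+$ at its initial value, which together give $T^\pm = [1 \pm 2\xi]\,(N^- T_0^- + N^+ T_0^+)/(2N^\pm)$. The function and argument names are illustrative, not taken from the article.

```python
def observables_from_xi(xi, length, n_minus, n_plus, t0_minus, t0_plus):
    """Piston position and fluid temperatures for a given rescaled position xi.

    Assumptions: the two gases are in mechanical equilibrium and the total
    fluid energy keeps its initial value (piston kinetic energy neglected).
    Returns (X, T_minus, T_plus).
    """
    x = length * (0.5 - xi)
    energy = n_minus * t0_minus + n_plus * t0_plus  # equals 2E/k_B
    t_minus = (1.0 - 2.0 * xi) * energy / (2.0 * n_minus)
    t_plus = (1.0 + 2.0 * xi) * energy / (2.0 * n_plus)
    return x, t_minus, t_plus
```

For equal particle numbers, $L = 60$, $T_0^- = 1$ and $T_0^+ = 10$, the value of $\xi$ corresponding to $X = 8.42$ gives temperatures close to the predicted values 1.54 and 9.46 quoted in the caption of Figure 5, and $\xi = 0$ gives the common final temperature 5.5 quoted in the caption of Figure 8.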

The first-order perturbation analysis using a single
rescaled time $t_1 = \alpha t_0$ is valid in the regime when
$V = O(\alpha)$ and it gives access to the relaxation toward
thermal equilibrium up to a temperature difference
$T^+ - T^- = O(\alpha)$. For the sake of technical completeness
(rather than physical relevance, since the above
first-order analysis is enough to get the observable,
meaningful behavior), let us mention that the perturbation
analysis can be carried over at higher orders;
using further rescaled times $t_2 = \alpha^2 t_0, \ldots, t_n = \alpha^n t_0$, it
would allow us to control the evolution up to a
temperature difference $|T^+ - T^-| = O(\alpha^n)$; however,
one could expect that the factorization condition does
not hold at higher orders.


Numerical Simulations 


As we have seen, the results were established under
the condition that $m/M$ is a small parameter. Moreover,
for finite systems ($L < \infty$, $M < \infty$), it was
assumed that before collisions and to first order in
$m/M$, the factorization and the average assumptions
are satisfied. The numerical simulations are thus
essential to check the validity of these assumptions, to
determine the range of acceptable values $m/M$ for the
perturbation expansion, to investigate the thermodynamic
limit, and to guide the intuition.

In all simulations, we have taken $k_B = 1$, $m = 1$,
$T^- = 1$ and usually $T^+ = 10$. For $L$ finite, we have
taken $L = 60$, $X_0 = 10$, A = 10°, and $N^+ = N^- = N/2$,
that is, $p^- = R(M/A)(1/10)$ and $p^+ = 2p^-$. The
number of particles $N$ was varied from a few hundred
to one or several millions; the mass $M$ of the piston
from 1 to 10°. We give below some of the results
which have been obtained for $L = \infty$ (Figures 2 and 3)







Figure 2 Evolution of the piston for $L = \infty$ and $p^- = p^+ = 1$ as observed in simulations (stochastic line in (a), dots in (b)) compared
with prediction: (a) position $X(t)$ for $T^+ = 10$; and (b) stationary velocity for $T^+ = 10$ (continuous line) and $T^+ = 100$ (dotted line), as a
function of $M$.
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(a) (b) 
Figure 3 Evolution of the piston for L=oo, M=10*, and pt 4 p~ as observed in simulations (continuous line) compared with 
predictions (dotted line): (a) p =1,p*'=p + Ap, from top to bottom Ap/p- =0.05,0.1,0.2,1,2,3; and (b) p =¢, pt =2¢, 
Apr =tAXA = X HC, CS 10 10 10. 1. 10,107, 10" ,.10- 




















Figure 4 “Deterministic” evolution toward mechanical equilibrium for L < oo, M = 10°: (a) position X(f; one finds K = 8.3 whereas 
X!" =8.42 and (b) velocity V(t); one finds V*'""= —0.343 whereas V*™ = —0.3433. From top to bottom: R=12: strong damping, 
independent of R and M for R > 4 and M > 10°. R=2: critical damping. R =0.1: weak damping; damping coefficient increases with R 
and wo ~ VR for R < 1 but is independent of M for M > 10°. 
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(a) 


Figure 5 Same conditions as Figure 4, $R = 12$: (a) average pressure and temperature in the fluid, $p^\pm_{av}(t) = 2E^\pm n^\pm/N^\pm$ and
$T^\pm_{av} = 2E^\pm/N^\pm k_B$; and (b) pressure and temperature at the surface of the piston. Prediction: $T^-_{av} = 1.54$, $T^+_{av} = 9.46$, $p^-_{av} = p^+_{av} = 2.2$.
Simulations: $T^-_{av} = 1.52$, $T^+_{av} = 9.48$, $p^-_{av} = p^+_{av} = 2.2$.


and for $L < \infty$ approach to mechanical equilibrium
(Figures 4-6) and to thermal equilibrium (Figures 7
and 8).


Conclusions and Open Problems


In this article, the adiabatic piston has been
investigated to first order in the small parameter
$m/M$, but no attempt has been made to control the
remainder terms. For an infinite cylinder, no other
assumptions were necessary and the numerical
simulations (Figures 2 and 3) are in perfect agreement
with the theoretical prediction, in particular for
the stationary velocity $V_{stat}$, the friction coefficient
$\lambda(V)$, and the relaxation time $\tau$.

For a finite cylinder ($L < \infty$) and in the thermodynamic
limit ($M = \infty$), we were forced to introduce
the average assumption to obtain a set of autonomous
equations. As we have seen, when initially $p^- \neq p^+$,
this limiting case also describes the evolution
to lowest order during the first two stages characterized
by a time of the order $\tau_1 = L\sqrt{m/k_B T}$, where the
evolution is adiabatic and deterministic. In the first
stage, that is, before the shock wave bounces back on
the piston, the simulations confirm the theoretical
predictions. In particular, they show that if $R > 4$,
the piston will be able to reach and maintain for
some time the velocity $V_{stat}$, whereas this will not be
the case for $R < 1$ (Figure 4b). In the second stage of
the evolution, the simulations (Figure 4) exhibit 
damped oscillations toward mechanical equilibrium 
which are in very good agreement with the predictions
for the final state $(X_{ad}, T^\pm_{ad})$, the frequency of
oscillations and the existence of weak and strong
damping depending on $R < 1$ or $R > 4$. Moreover,
the general behavior of the evolution observed in the 
simulations as a function of the parameters was as 
predicted. However, the damping coefficient of these 
oscillations is wrong by one or several orders of 
magnitude. To understand this discrepancy, we note 
that using the average assumption we have related 
the damping to the friction coefficient. However, the 
simulations clearly show that those two dissipative 
effects have totally different origins. Indeed, as one
can see with $L = \infty$, friction is associated with the
fact that the density of the gas in front and in the
back of the piston is not the same as in the bulk, and 
this generates a shock wave that propagates in the 
fluid. For finite L, when R> 4, the stationary 
velocity Vstat is reached and the effect of friction is 
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Figure 6 Velocity distribution in the left compartment. Same conditions as Figure 4, R = 12. Dotted line corresponds to Maxwellian 
with $T^- = 1.52$: (a) $t = 12, 24, 36, 48, 60, 92, 144, 240$ from top to bottom and (b) $t = 276$ to $460$.


to transfer in this first stage more and more energy to 
the fluid on one side and vice versa on the other side. 
However, to stop the piston and reverse its motion, 
only a certain amount of the transferred energy is 
necessary and the rest remains as dissipated energy in 
the fluid leading to a strong damping. On the other 
hand, for $R < 1$, the value $V_{stat}$ is never reached and
all the energy transferred is necessary to revert the 
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motion. In this case very little dissipation is involved 
and the damping will be very small. This indicates 
that the mechanism responsible for damping is 
associated with shock waves bouncing back and 
forth and the average assumption, which corresponds 
to a homogeneity condition throughout the gas, 
cannot describe the situation. In fact, the simulations 
(Figure 5b) indicate that the average assumption does 
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Figure 7 Approach to thermal equilibrium, N+ =3 x 104. The smooth curves correspond to the predictions, the stochastic curves to 
simulations: (a) position $X(\tau)$, $\tau = \alpha t$, no visible difference for $M = 100, 200, 1000$ and (b) average temperatures $T^\pm(\tau)$, $\tau = \alpha t$, $M = 200$.
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Figure 8 Approach to thermal equilibrium from 7,,—1.54 (dotted line in(a)) to 7, =5.5 (heavy line in (b)). Velocity distribution 
function on the left for $M = 200$, $N^\pm = 5 \times 10^4$. (a) $\tau = \alpha t = 2, 4, 14, 48, 92, 144$ and (b) approach to Maxwellian distribution for $\tau > 445$.


not hold in this second stage. In conclusion, one is 
forced to admit that to describe correctly the 
adiabatic evolution, it is necessary to study the 
coupling between the motion of the piston and the 
hydrodynamic equations of the gas. Preliminary 
investigations have been initiated, but this is still 
one of the major open problems. Another problem 
would be to study the evolution in the case of 
interacting particles. However, investigations with 
hard disks suggest that no new effects should appear. 
To investigate adiabatic evolution, a simpler version 
of the adiabatic piston problem, without any con- 
troversy, has been introduced: this is the model of a 
standard piston with a constant force acting on it. 

In the third stage, that is, the very slow 
approach to thermal equilibrium, another assump- 
tion was necessary, namely the factorization 
condition. The simulations (Figure 7) show a very 
good agreement with the prediction, and in 
particular the scaling property with $\tau = t/M$ is
perfectly verified. It appears that the small discrepancy
between simulations and theoretical
predictions could be due to the fact that, to
compute explicitly the coefficients in the equations 
of motion, we have taken Maxwellian relations for 
the velocities of the gas particles, which is clearly 
not the case (Figure 8a). 

The fourth stage of the evolution, that is, the 
approach to Maxwellian distributions (Figure 8b), is 
still another major open problem. Some preliminary 
studies have been conducted, where one investigates 
the stability and the evolution of the system when 
initially the two gases are in the same equilibrium 
state, but characterized by a distribution function 
which is not Maxwellian. 


Finally, let us mention that the relation between the 
piston problem and the second law of thermodynamics 
is one more major problem. The question of entropy 
production out of equilibrium, and the validity of the 
second law, are still highly controversial. Again, 
preliminary results can be found in the literature. 
Among other things, this question has led to a model of
heat conductivity in gases, which reproduces the correct
behavior (Gruber and Lesne 2005).


See also: Billiards in Bounded Convex Domains; 
Boltzmann Equation (Classical and Quantum); 
Hamiltonian Fluid Dynamics; Multiscale Approaches; 
Nonequilibrium Statistical Mechanics (Stationary): 
Overview; Nonequilibrium Statistical Mechanics: 
Dynamical Systems Approach. 
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Introduction 
The anti-de Sitter/conformal field theory (AdS/CFT)
correspondence is a conjectured equivalence
between a quantum field theory in $d$ spacetime
dimensions with conformal scaling symmetry and a
quantum theory of gravity in $(d+1)$-dimensional
anti-de Sitter space. The most promising
approaches to quantizing gravity involve superstring
theories, which are most easily defined in
10 spacetime dimensions, or M-theory, which is
defined in 11 spacetime dimensions. Hence, the
AdS/CFT correspondences based on superstrings
typically involve backgrounds of the form $AdS_{d+1} \times Y_{9-d}$
while those based on M-theory involve backgrounds
of the form $AdS_{d+1} \times Y_{10-d}$, where $Y$ are
compact spaces.

The examples of the AdS/CFT correspondence
discussed in this article are dualities between
(super)conformal nonabelian gauge theories and
superstrings on $AdS_5 \times Y_5$, where $Y_5$ is a five-dimensional
Einstein space (i.e., a space whose
Ricci tensor is proportional to the metric,
$R_{ij} = 4g_{ij}$). In particular, the most basic (and maximally
supersymmetric) such duality relates
$\mathcal{N} = 4$ $SU(N)$ super Yang-Mills (SYM) and type IIB
superstring in the curved background $AdS_5 \times S^5$.

There exist special limits where this duality is
more tractable than in the general case. If we take
the large-$N$ limit while keeping the 't Hooft coupling
$\lambda = g_{YM}^2 N$ fixed ($g_{YM}$ is the Yang-Mills coupling
strength), then each Feynman graph of the gauge
theory carries a topological factor $N^\chi$, where $\chi$ is
the Euler characteristic of the graph. The graphs of
spherical topology (often called "planar"), to be
identified with string tree diagrams, are weighted by
$N^2$; the graphs of toroidal topology, to be identified
with string one-loop diagrams, by $N^0$, etc. This
counting corresponds to the closed-string coupling
constant of order $N^{-1}$. Thus, in the large-$N$ limit
the gauge theory becomes "planar," and the dual
string theory becomes classical. For small $g_{YM}^2 N$,
the gauge theory can be studied perturbatively; in
this regime the dual string theory has not been very
useful because the background becomes highly
curved. The real power of the AdS/CFT duality,
which already has made it a very useful tool, lies in
the fact that, when the gauge theory becomes
strongly coupled, the curvature in the dual description
becomes small; therefore, classical supergravity
provides a systematic starting point for approximating
the string theory.
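
The topological counting above can be written out explicitly. In the sketch below, $g$ is the genus of the surface on which a graph can be drawn, and the coefficient functions $f_g(\lambda)$ are placeholder symbols introduced here for illustration; only the powers of $N$ are taken from the text.

```latex
\chi = 2 - 2g, \qquad
\mathcal{A}(N, \lambda) = \sum_{g=0}^{\infty} N^{2-2g} f_g(\lambda)
  = N^2 f_0(\lambda) + N^0 f_1(\lambda) + N^{-2} f_2(\lambda) + \cdots
```

Each additional handle costs a factor $N^{-2} = (N^{-1})^2$, which is the sense in which the closed-string coupling is of order $N^{-1}$: at large $N$ with $\lambda$ fixed, the planar ($g = 0$) term dominates.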

There is a strong motivation for an improved 
understanding of dualities of this type. In one 
direction, generalizations of this duality provide the 
tantalizing hope of a better understanding of 
quantum chromodynamics (QCD); QCD is a non- 
abelian gauge theory that describes the strong 
interactions of mesons, baryons, and glueballs, and 
has a conformal symmetry which is broken by 
quantum effects. In the other direction, AdS/CFT 
suggests that quantum gravity may be understand- 
able as a gauge theory. Understanding the confine- 
ment of quarks and gluons that takes place in 
low-energy QCD and quantizing gravity are well 
acknowledged to be two of the most important 
outstanding problems of theoretical physics. 


Some Geometrical Preliminaries 


The $d$-dimensional sphere of radius $L$, $S^d$, may be
defined by a constraint

$$\sum_{i=1}^{d+1} (X^i)^2 = L^2 \qquad [1]$$

on $d+1$ real coordinates $X^i$. It is a positively curved
maximally symmetric space with symmetry group
$SO(d+1)$. We will denote the round metric on $S^d$ of
unit radius by $d\Omega_d^2$.


The $d$-dimensional anti-de Sitter space, $AdS_d$, may
be defined by a constraint

$$(X^0)^2 + (X^d)^2 - \sum_{i=1}^{d-1} (X^i)^2 = L^2 \qquad [2]$$

This constraint shows that the symmetry group of
$AdS_d$ is $SO(2, d-1)$. $AdS_d$ is a negatively curved
maximally symmetric space, that is, its curvature
tensor is related to the metric by

$$R_{abcd} = -\frac{1}{L^2}\left[g_{ac}g_{bd} - g_{ad}g_{bc}\right] \qquad [3]$$


Its metric may be written as

$$ds^2_{AdS} = L^2\left(-(y^2 + 1)\,dt^2 + \frac{dy^2}{y^2 + 1} + y^2\,d\Omega^2_{d-2}\right) \qquad [4]$$

where the radial coordinate $y \in [0, \infty)$, and $t$ is
defined on a circle of length $2\pi$. This space has
closed timelike curves; to eliminate them, we will
work with the universal covering space where
$t \in (-\infty, \infty)$. The boundary of $AdS_d$, which plays
an important role in the AdS/CFT correspondence, is
located at infinite $y$. There exists a subspace of $AdS_d$
called the Poincaré wedge, with the metric

$$ds^2 = \frac{L^2}{z^2}\left(dz^2 - (dx^0)^2 + \sum_{i=1}^{d-2} (dx^i)^2\right) \qquad [5]$$

where $z \in [0, \infty)$.

A Euclidean continuation of $AdS_d$ is the
Lobachevsky space (hyperboloid), $L_d$. It is obtained
by reversing the sign of $(X^d)^2$, $dt^2$, and $(dx^0)^2$ in [2],
[4], and [5], respectively. After this Euclidean
continuation, the metrics [4] and [5] become
equivalent; both of them cover the entire $L_d$.
Another equivalent way of writing the metric is

$$ds^2 = L^2\left(d\rho^2 + \sinh^2\rho\; d\Omega^2_{d-1}\right) \qquad [6]$$

which shows that the boundary at infinite $\rho$ has the
topology of $S^{d-1}$. In terms of the Euclideanized
metric [5], the boundary consists of the $\mathbb{R}^{d-1}$ at
$z = 0$, and a single point at $z = \infty$.
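
As a small consistency check, one can verify numerically that a standard global parametrization lands on the hyperboloid defining anti-de Sitter space. The sketch below treats the case $d = 3$ and assumes the embedding $X^0 = L\sqrt{1+y^2}\cos t$, $X^3 = L\sqrt{1+y^2}\sin t$, $X^1 = Ly\cos\theta$, $X^2 = Ly\sin\theta$; this parametrization and the function names are supplied here for illustration and are not spelled out in the text.

```python
import math


def ads3_embedding(length, t, y, theta):
    """Embedding coordinates (X0, X1, X2, X3) of a point of AdS_3.

    Assumed global parametrization: t is the periodic time coordinate,
    y >= 0 the radial coordinate, theta the angle on the circle S^1.
    """
    radial = length * math.sqrt(1.0 + y * y)
    return (
        radial * math.cos(t),
        length * y * math.cos(theta),
        length * y * math.sin(theta),
        radial * math.sin(t),
    )


def hyperboloid_constraint(point):
    """Left-hand side (X0)^2 + (X3)^2 - (X1)^2 - (X2)^2 of the constraint."""
    x0, x1, x2, x3 = point
    return x0 * x0 + x3 * x3 - x1 * x1 - x2 * x2
```

For any values of $t$, $y$ and $\theta$ the constraint evaluates to $L^2$ up to rounding, and shifting $t$ by $2\pi$ returns the same point, which is the origin of the closed timelike curves mentioned above.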


The Geometry of Dirichlet Branes 


Our path toward formulating the $AdS_5/CFT_4$
correspondence requires introduction of Dirichlet
branes, or D-branes for short. They are soliton-like 
“membranes” of various internal dimensionalities 
contained in type II superstring theories. A Dirichlet 
p-brane (or Dp brane) is a (p+ 1)-dimensional 
hyperplane in (9 + 1)-dimensional spacetime where 
strings are allowed to end. A D-brane is much like a 
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topological defect: upon touching a D-brane, a 
closed string can open up and turn into an open 
string whose ends are free to move along the 
D-brane. For the endpoints of such a string the p + 1 
longitudinal coordinates satisfy the conventional free 
(Neumann) boundary conditions, while the 9 — p 
coordinates transverse to the Dp brane have the fixed 
(Dirichlet) boundary conditions, hence the origin of 
the term “Dirichlet brane.” The Dp brane preserves 
half of the bulk supersymmetries and carries an 
elementary unit of charge with respect to the (p + 1)- 
form gauge potential from the Ramond—Ramond 
(RR) sector of type II superstring. 

For this article, the most important property of 
D-branes is that they realize gauge theories on their 
world volume. The massless spectrum of open 
strings living on a Dp brane is that of a maximally 
supersymmetric U(1) gauge theory in p +1 dimen- 
sions. The 9 — p massless scalar fields present in this 
supermultiplet are the expected Goldstone modes 
associated with the transverse oscillations of the Dp 
brane, while the photons and fermions provide the 
unique supersymmetric completion. If we consider
$N$ parallel D-branes, then there are $N^2$ different
species of open strings because they can begin and
end on any of the D-branes. $N^2$ is the dimension of
the adjoint representation of $U(N)$, and indeed we
find the maximally supersymmetric U(N) gauge 
theory in this setting. 

The relative separations of the Dp branes in the 
9—p transverse dimensions are determined by 
the expectation values of the scalar fields. We will 
be interested in the case where all scalar expectation 
values vanish, so that the N Dp branes are stacked 
on top of each other. If N is large, then this stack is 
a heavy object embedded into a theory of closed 
strings which contains gravity. Naturally, this 
macroscopic object will curve space: it may be 
described by some classical metric and other back- 
ground fields including the RR (p+ 2)-form field 
strength. Thus, we have two very different descrip- 
tions of the stack of Dp branes: one in terms of the 
U(N) supersymmetric gauge theory on its world 
volume, and the other in terms of the classical RR 
charged p-brane background of the type II closed 
superstring theory. The relation between these two 
descriptions is at the heart of the connections 
between gauge fields and strings that are the subject 
of this article. 


Coincident D3 Branes 


Gauge theories in 3 + 1 dimensions play an important
role in physics, and as explained above, parallel
D3 branes realize a (3 + 1)-dimensional $U(N)$ SYM
theory. Let us compare a stack of D3 branes with
the RR-charged black 3-brane classical solution
where the metric assumes the form

$$ds^2 = H^{-1/2}(r)\left[-f(r)(dx^0)^2 + (dx^i)^2\right] + H^{1/2}(r)\left[f^{-1}(r)\,dr^2 + r^2\,d\Omega_5^2\right] \qquad [7]$$

where $i = 1, 2, 3$ and

$$H(r) = 1 + \frac{L^4}{r^4}, \qquad f(r) = 1 - \frac{r_0^4}{r^4}$$


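The solution also contains an RR self-dual 5-form
field strength

$$F = dx^0 \wedge dx^1 \wedge dx^2 \wedge dx^3 \wedge d(H^{-1}) + 4L^4\,\mathrm{vol}(S^5) \qquad [8]$$

so that the Einstein equation of type IIB supergravity,
$R_{\mu\nu} = F_{\mu\alpha\beta\gamma\delta}F_\nu{}^{\alpha\beta\gamma\delta}/96$, is satisfied.

In the extremal limit $r_0 \to 0$, the 3-brane metric
becomes

$$ds^2 = \left(1 + \frac{L^4}{r^4}\right)^{-1/2}\left(-(dx^0)^2 + (dx^i)^2\right) + \left(1 + \frac{L^4}{r^4}\right)^{1/2}\left(dr^2 + r^2\,d\Omega_5^2\right) \qquad [9]$$
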
Just like the stack of parallel, ground-state D3
branes, the extremal solution preserves 16 of the
32 supersymmetries present in the type IIB theory.
Introducing $z = L^2/r$, one notes that the limiting
form of [9] as $r \to 0$ factorizes into the direct
product of two smooth spaces, the Poincaré wedge
[5] of $AdS_5$, and $S^5$, with equal radii of curvature $L$.
The 3-brane geometry may thus be viewed as a
semi-infinite throat of radius $L$ which, for $r \gg L$,
opens up into flat (9 + 1)-dimensional space. Thus,
for $L$ much larger than the string length scale, $\sqrt{\alpha'}$,
the entire 3-brane geometry has small curvatures
everywhere and is appropriately described by the
supergravity approximation to type IIB string
theory.
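
The factorization in the throat region can be made explicit with a short calculation. The steps below are a sketch that assumes the extremal metric in the form $ds^2 = H^{-1/2}(-(dx^0)^2 + (dx^i)^2) + H^{1/2}(dr^2 + r^2 d\Omega_5^2)$ with $H = 1 + L^4/r^4$, and keeps only the leading behavior for $r \ll L$.

```latex
\begin{aligned}
r \ll L:\quad & 1 + \frac{L^4}{r^4} \simeq \frac{L^4}{r^4}, \\
ds^2 \simeq{} & \frac{r^2}{L^2}\left(-(dx^0)^2 + (dx^i)^2\right)
   + \frac{L^2}{r^2}\,dr^2 + L^2\,d\Omega_5^2, \\
z = \frac{L^2}{r}:\quad & dr = -\frac{L^2}{z^2}\,dz, \qquad
   \frac{L^2}{r^2}\,dr^2 = \frac{L^2}{z^2}\,dz^2, \qquad
   \frac{r^2}{L^2} = \frac{L^2}{z^2}, \\
ds^2 \simeq{} & \frac{L^2}{z^2}\left(dz^2 - (dx^0)^2 + (dx^i)^2\right) + L^2\,d\Omega_5^2 .
\end{aligned}
```

The first term is the Poincaré-wedge metric with four boundary coordinates $x^0, \ldots, x^3$, that is, $AdS_5$ of radius $L$, and the second is a round $S^5$ of the same radius.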

The relation between $L$ and $\sqrt{\alpha'}$ may be found by
equating the gravitational tension of the extremal
3-brane classical solution to $N$ times the tension of a
single D3 brane:

$$\frac{2}{\kappa^2} L^4\,\mathrm{vol}(S^5) = N\frac{\sqrt{\pi}}{\kappa} \qquad [10]$$

where $\mathrm{vol}(S^5) = \pi^3$ is the volume of a unit 5-sphere,
and $\kappa = \sqrt{8\pi G}$ is the ten-dimensional gravitational
constant. It follows that

$$L^4 = \frac{\kappa}{2\pi^{5/2}} N = g_{YM}^2 N \alpha'^2 \qquad [11]$$

where we used the standard relations $\kappa = 8\pi^{7/2} g_{st}\alpha'^2$
and $g_{YM}^2 = 4\pi g_{st}$. Thus, the size of the throat in
string units is $\lambda^{1/4}$. This remarkable emergence
of the 't Hooft coupling from gravitational considerations
is at the heart of the success of the AdS/CFT
correspondence. Moreover, the requirement
$L \gg \sqrt{\alpha'}$ translates into $\lambda \gg 1$: the gravitational
approach is valid when the 't Hooft coupling is very
strong and the perturbative field-theoretic methods
are not applicable.
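
The arithmetic behind this relation is easy to check numerically. The sketch below assumes the tension-matching condition $(2/\kappa^2)L^4\,\mathrm{vol}(S^5) = N\sqrt{\pi}/\kappa$ with $\mathrm{vol}(S^5) = \pi^3$, together with $\kappa = 8\pi^{7/2} g_{st}\alpha'^2$ and $g_{YM}^2 = 4\pi g_{st}$; the function names and sample values are illustrative.

```python
import math


def throat_radius_fourth_power(n_branes, g_st, alpha_prime):
    """L^4 obtained by solving the tension-matching condition for L^4."""
    kappa = 8.0 * math.pi ** 3.5 * g_st * alpha_prime ** 2
    vol_s5 = math.pi ** 3  # volume of the unit 5-sphere
    # (2 / kappa^2) * L^4 * vol(S^5) = N * sqrt(pi) / kappa
    return n_branes * math.sqrt(math.pi) * kappa / (2.0 * vol_s5)


def t_hooft_coupling(n_branes, g_st):
    """lambda = g_YM^2 N with g_YM^2 = 4 pi g_st."""
    return 4.0 * math.pi * g_st * n_branes
```

For any choice of inputs, `throat_radius_fourth_power(N, g_st, alpha_prime)` agrees with `t_hooft_coupling(N, g_st) * alpha_prime**2`, so $L/\sqrt{\alpha'} = \lambda^{1/4}$.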


Example: Thermal Gauge Theory from 
Near-Extremal D3 Branes 


An important black hole observable is the Bekenstein-Hawking
(BH) entropy, which is proportional to the
area of the event horizon. For the 3-brane solution
[7], the horizon is located at $r = r_0$. For $r_0 > 0$ the
3-brane carries some excess energy $E$ above its
extremal value, and the BH entropy is also nonvanishing.
The Hawking temperature is then defined
by $T^{-1} = \partial S_{BH}/\partial E$.

Setting $r_0 \ll L$ in [9], we obtain a near-extremal
3-brane geometry, whose Hawking temperature is
found to be $T = r_0/(\pi L^2)$. The eight-dimensional
“area” of the horizon is

$$A_h = (r_0/L)^3\, V_3 L^5\, \mathrm{vol}(S^5) = \pi^6 L^8 T^3 V_3 \qquad [12]$$

where $V_3$ is the spatial volume of the D3 brane (i.e.,
the volume of the $x^1, x^2, x^3$ coordinates). Therefore,
the BH entropy is

$$S_{BH} = \frac{2\pi A_h}{\kappa^2} = \frac{\pi^2}{2} N^2 V_3 T^3 \qquad [13]$$

This gravitational entropy of a near-extremal
3-brane of Hawking temperature $T$ is to be
identified with the entropy of $\mathcal{N}=4$ supersymmetric
U(N) gauge theory (which lives on $N$
coincident D3 branes) heated up to the same
temperature.

The entropy of a free U(N) $\mathcal{N}=4$ supermultiplet,
which consists of the gauge field, $6N^2$ massless
scalars, and $4N^2$ Weyl fermions, can be calculated
using the standard statistical mechanics of a
massless gas (the blackbody problem), and the
answer is

$$S_0 = \frac{2\pi^2}{3} N^2 V_3 T^3 \qquad [14]$$

It is remarkable that the 3-brane geometry captures
the $T^3$ scaling characteristic of a conformal field
theory (CFT) (in a CFT this scaling is guaranteed by
the extensivity of the entropy and the absence of
dimensionful parameters). Also, the $N^2$ scaling
indicates the presence of $O(N^2)$ unconfined degrees
of freedom, which is exactly what we expect in the
$\mathcal{N}=4$ supersymmetric U(N) gauge theory. But what
is the explanation of the relative factor of 3/4
between $S_{BH}$ and $S_0$? In fact, this factor is not a
contradiction but rather a prediction about the
strongly coupled $\mathcal{N}=4$ SYM theory at finite
temperature. As we argued above, the supergravity
calculation of the BH entropy, [13], is relevant to
the $\lambda \to \infty$ limit of the $\mathcal{N}=4$ SU(N) gauge theory,
while the free-field calculation, [14], applies to the
$\lambda \to 0$ limit. Thus, the relative factor of 3/4 is not a
discrepancy: it relates two different limits of the
theory. Indeed, on general field-theoretic grounds,
we expect that in the 't Hooft large-N limit, the
entropy is given by

$$S = \frac{2\pi^2}{3} N^2 f(\lambda) V_3 T^3 \qquad [15]$$

The function $f$ is certainly not constant:
perturbative calculations valid for small $\lambda = g_{YM}^2 N$
give

$$f(\lambda) = 1 - \frac{3}{2\pi^2}\lambda + \frac{3+\sqrt{2}}{\pi^3}\lambda^{3/2} + \cdots \qquad [16]$$

Thus, the BH entropy in supergravity, [13], is
translated into the prediction that

$$\lim_{\lambda \to \infty} f(\lambda) = \frac{3}{4} \qquad [17]$$

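
The coefficients in [13] and [14], and hence the factor of 3/4, can be reproduced by elementary counting. The sketch below (Python, with hypothetical helper names) assumes the standard blackbody result that one bosonic degree of freedom contributes an entropy density $(2\pi^2/45)T^3$ and one fermionic degree of freedom 7/8 of that, and it uses $L^8 = \kappa^2 N^2/(4\pi^5)$ from [11] to reduce [12] and [13] to a pure number.

```python
import math

def free_entropy_coefficient():
    """S_0 / (N^2 V_3 T^3) for the free N=4 multiplet."""
    per_boson = 2.0 * math.pi**2 / 45.0  # entropy density / T^3 of one bosonic d.o.f.
    bosons = 2 + 6      # per N^2: two gauge-field polarizations plus six scalars
    fermions = 4 * 2    # per N^2: four Weyl fermions, two helicity states each
    return per_boson * (bosons + 7.0 / 8.0 * fermions)

def bh_entropy_coefficient():
    """S_BH / (N^2 V_3 T^3) from [12], [13] with L^8 = kappa^2 N^2 / (4 pi^5)."""
    return 2.0 * math.pi * math.pi**6 / (4.0 * math.pi**5)

ratio = bh_entropy_coefficient() / free_entropy_coefficient()
print(ratio)  # strong-coupling entropy relative to the free-field value
```

The bosonic and fermionic weights are $8 + 7 = 15$, which is how the factor $2\pi^2/3$ in [14] arises.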


The Essentials of the AdS/CFT Correspondence


The AdS/CFT correspondence asserts a detailed map
between the physics of type IIB string theory in the
throat of the classical 3-brane geometry, that is, the
region $r \ll L$, and the gauge theory living on a stack
of D3 branes. As already noted, in this limit $r \ll L$,
the extremal D3 brane geometry factors into a direct
product of AdS$_5 \times S^5$. Moreover, the gauge theory
on this stack of D3 branes is the maximally
supersymmetric $\mathcal{N}=4$ SYM.

Since the horizon of the near-extremal 3-brane lies
in the region $r \ll L$, the entropy calculation could
have been carried out directly in the throat limit,
where $H(r)$ is replaced by $L^4/r^4$. Another way to
motivate the identification of the gauge theory with
the throat is to think about the absorption of
massless particles. In the D-brane description, a
particle incident from asymptotic infinity is converted
into an excitation of the stack of D-branes,
that is, into an excitation of the gauge theory on the
world volume. In the supergravity description, a
particle incident from the asymptotic (large $r$) region
tunnels into the $r \ll L$ region and produces an
excitation of the throat. The fact that the two
different descriptions of the absorption process give
identical cross sections supports the identification of
excitations of AdS$_5 \times S^5$ with the excited states of
the $\mathcal{N}=4$ SYM theory.

Maldacena (1998) motivated this correspondence
by thinking about the low-energy ($\alpha' \to 0$) limit of
the string theory. On the D3 brane side, in this low-energy
limit, the interaction between the D3 branes
and the closed strings propagating in the bulk
vanishes, leaving a pure $\mathcal{N}=4$ SYM theory on the
D3 branes decoupled from type IIB superstrings in
flat space. Around the classical 3-brane solutions,
there are two types of low-energy excitations. The
first type propagate in the bulk region, $r \gg L$, and
have a cross section for absorption by the throat
which vanishes as the cube of their energy. The
second type are localized in the throat, $r < L$, and
find it harder to tunnel into the asymptotically flat
region as their energy is taken smaller. Thus, both
the D3 branes and the classical 3-brane solution
have two decoupled components in the low-energy
limit, and in both cases, one of these components is
type IIB superstrings in flat space. Maldacena
conjectured an equivalence between the other two
components.

Immediate support for this identification comes
from symmetry considerations. The isometry group
of AdS$_5$ is SO(2,4), and this is also the conformal
group in 3 + 1 dimensions. In addition, we have the
isometries of $S^5$ which form SU(4) ~ SO(6). This
group is identical to the R-symmetry of the $\mathcal{N}=4$
SYM theory. After including the fermionic generators
required by supersymmetry, the full isometry
supergroup of the AdS$_5 \times S^5$ background is
SU(2,2|4), which is identical to the $\mathcal{N}=4$ superconformal
symmetry. We will see that, in theories
with reduced supersymmetry, the $S^5$ factor is
replaced by other compact Einstein spaces $Y_5$, but
AdS$_5$ is the “universal” factor present in the dual
description of any large-N CFT and makes the
SO(2,4) conformal symmetry a geometric one.

The correspondence extends beyond the supergravity
limit, and we must think of AdS$_5 \times Y_5$ as a
background of string theory. Indeed, type IIB strings
are dual to the electric flux lines in the gauge theory,
providing a string-theoretic setup for calculating
correlation functions of Wilson loops. Furthermore,
if $N \to \infty$ while $g_{YM}^2 N$ is held fixed and finite, then
there are string scale corrections to the supergravity
limit (Maldacena 1998, Gubser et al. 1998, Witten
1998) which proceed in powers of
$\alpha'/L^2 = (g_{YM}^2 N)^{-1/2}$. For finite $N$, there are also
string loop corrections in powers of $\kappa^2/L^8 \sim N^{-2}$.
As expected, with $N \to \infty$ we can take the classical
limit of the string theory on AdS$_5 \times Y_5$. However, in
order to understand the large-N gauge theory with
finite 't Hooft coupling, we should think of AdS$_5 \times Y_5$
as the target space of a two-dimensional sigma
model describing the classical string physics.


Correlation Functions and the Bulk/Boundary 
Correspondence 


A basic premise of the AdS/CFT correspondence is
the existence of a one-to-one map between gauge-invariant
operators in the CFT and fields (or
extended objects) in AdS. Gubser et al. (1998) and
Witten (1998) formulated precise methods for
calculating correlation functions of various operators
in a CFT using its dual formulation. A physical
motivation for these methods comes from earlier
calculations of absorption by 3-branes. When a
wave is absorbed, it tunnels from asymptotic infinity
into the throat region, and then continues to
propagate toward smaller $r$. Let us separate the
3-brane geometry into two regions: $r > L$ and $r < L$.
For $r < L$ the metric is approximately that of
AdS$_5 \times S^5$, while for $r > L$ it becomes very different
and eventually approaches the flat metric. Signals
coming in from large $r$ (small $z = L^2/r$) may be
considered as disturbing the “boundary” of AdS$_5$ at
$r \sim L$, and then propagating into the bulk of AdS$_5$.
Discarding the $r > L$ part of the 3-brane metric, the
gauge theory correlation functions are related to the
response of the string theory to boundary conditions
at $r \sim L$. It is therefore natural to identify the
generating functional of correlation functions in the
gauge theory with the string theory path integral
subject to the boundary conditions that
$\phi(x, z) = \phi_0(x)$ at $z = L$ (at $z = \infty$ all fluctuations
are required to vanish). In calculating correlation
functions in a CFT, we will carry out the standard
Euclidean continuation; then on the string theory
side, we will work with $L_5$, which is the Euclidean
version of AdS$_5$.

More explicitly, we identify a gauge theory
quantity $W$ with a string-theory quantity $Z_{string}$:

$$W[\phi_0(x)] = Z_{string}[\phi_0(x)] \qquad [18]$$

$W$ generates the connected Euclidean Green's functions
of a gauge-theory operator $O$,

$$W[\phi_0(x)] = \left\langle \exp \int d^4x\, \phi_0 O \right\rangle \qquad [19]$$

$Z_{string}$ is the string theory path integral calculated as
a functional of $\phi_0$, the boundary condition on the
field $\phi$ related to $O$ by the AdS/CFT duality. In the
large-N limit, the string theory becomes classical,
which implies

$$Z_{string} \sim e^{-I[\phi_0(x)]} \qquad [20]$$

where $I[\phi_0(x)]$ is the extremum of the classical string
action calculated as a functional of $\phi_0$. If we are
further interested in correlation functions at very
large 't Hooft coupling, then the problem of
extremizing the classical string action reduces to
solving the equations of motion in type IIB supergravity
whose form is known explicitly. A simple
example of such a calculation is presented in the
next subsection.

Our reasoning suggests that from the point of
view of the metric [5], the boundary conditions are
imposed not quite at $z = 0$, which is the true
boundary of $L_5$, but at some finite value $z = \epsilon$. It
does not matter which value it is since the metric [5]
is unchanged by an overall rescaling of the coordinates
$(z, x)$; thus, such a rescaling can take $z = L$ into
$z = \epsilon$ for any $\epsilon$. The physical meaning of this cutoff is
that it acts as a UV regulator in the gauge theory.
Indeed, the radial coordinate $z$ is to be considered as
the effective energy scale of the gauge theory, and
decreasing $z$ corresponds to increasing the energy. A
safe method for performing calculations of correlation
functions, therefore, is to keep the cutoff on the
$z$-coordinate at intermediate stages and remove it
only at the end.


Two-Point Functions and Operator Dimensions 


In the following, we present a brief discussion of
two-point functions of scalar operators in CFT$_d$.
The corresponding field in $L_{d+1}$ is a scalar field of
mass $m$ whose Euclidean action is proportional to

$$\int d^dx\, dz\, z^{-d+1} \left[ (\partial_z \phi)^2 + \sum_{a=1}^{d} (\partial_a \phi)^2 + \frac{m^2 L^2}{z^2} \phi^2 \right] \qquad [21]$$

In calculating correlation functions of vertex
operators from the AdS/CFT correspondence, the
first problem is to reconstruct an on-shell field in
$L_{d+1}$ from its boundary behavior. The near-boundary,
that is, small $z$, behavior of the classical
solution is

$$\phi(z, x) \to z^{d-\Delta} \left[ \phi_0(x) + O(z^2) \right] + z^{\Delta} \left[ A(x) + O(z^2) \right] \qquad [22]$$

where $\Delta$ is one of the roots of

$$\Delta(\Delta - d) = m^2 L^2 \qquad [23]$$

$\phi_0(x)$ is regarded as a “source” in [19] that couples
to the dual gauge-invariant operator $O$ of dimension
$\Delta$, while $A(x)$ is related to the expectation value,

$$A(x) = \frac{1}{2\Delta - d} \langle O(x) \rangle \qquad [24]$$


It is possible to regularize the Euclidean action to
obtain the following value as a functional of the
source:

$$I[\phi_0(x)] = -\left(\Delta - \frac{d}{2}\right) \pi^{-d/2} \frac{\Gamma(\Delta)}{\Gamma(\Delta - (d/2))} \int d^dx \int d^dx'\, \frac{\phi_0(x)\phi_0(x')}{|x - x'|^{2\Delta}} \qquad [25]$$

Varying twice with respect to $\phi_0$, we find that the
two-point function of the corresponding operator is

$$\langle O(x) O(x') \rangle = \frac{(2\Delta - d)\,\Gamma(\Delta)}{\pi^{d/2}\, \Gamma(\Delta - (d/2))} \frac{1}{|x - x'|^{2\Delta}} \qquad [26]$$

Which of the two roots, $\Delta_+$ or $\Delta_-$, of [23]

$$\Delta_\pm = \frac{d}{2} \pm \sqrt{\frac{d^2}{4} + m^2 L^2} \qquad [27]$$

should we choose for the operator dimension? For
positive $m^2$, $\Delta_+$ is certainly the right choice: here the
other root, $\Delta_-$, is negative. However, it turns out
that for

$$-\frac{d^2}{4} < m^2 L^2 < -\frac{d^2}{4} + 1 \qquad [28]$$

both roots of [23] may be chosen. Thus, there are
two possible CFTs corresponding to the same
classical AdS action: in one of them the corresponding
operator has dimension $\Delta_+$, while in the other
the dimension is $\Delta_-$. We note that $\Delta_-$ is bounded
from below by $(d-2)/2$, which is precisely the
unitarity bound on dimensions of scalar operators in
$d$-dimensional field theory! Thus, the ability to
choose dimension $\Delta_-$ is crucial for consistency of
the AdS/CFT duality.

Whether string theory on AdS$_5 \times Y_5$ contains
fields with $m^2$ in the range [28] depends on $Y_5$.
The example discussed in the next section,
$Y_5 = T^{1,1}$, turns out to contain such fields, and the
possibility of having dimension $\Delta_-$, [27], is crucial
for consistency of the AdS/CFT duality in that case.
However, for $Y_5 = S^5$, which is dual to the $\mathcal{N}=4$
large-N SYM theory, there are no such fields and all
scalar dimensions are given by [27].
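
The relation between mass and dimension is easy to tabulate. The following Python sketch evaluates the two roots [27] of [23] and tests whether a given $m^2L^2$ lies in the window [28]; the function names are hypothetical, and the requirement $m^2L^2 \ge -d^2/4$ for real roots is imposed here as an assumption of the sketch.

```python
import math

def operator_dimensions(m2_l2, d=4):
    """Return (Delta_plus, Delta_minus), the two roots of Delta(Delta - d) = m^2 L^2."""
    disc = d * d / 4.0 + m2_l2
    if disc < 0:
        raise ValueError("m^2 L^2 below -d^2/4: the roots are complex")
    root = math.sqrt(disc)
    return d / 2.0 + root, d / 2.0 - root

def both_roots_allowed(m2_l2, d=4):
    """True when m^2 L^2 lies strictly inside the window [28]."""
    return -d * d / 4.0 < m2_l2 < -d * d / 4.0 + 1.0

# A massless scalar in d = 4 is dual to a dimension-4 operator.
print(operator_dimensions(0.0))
# A sample value inside the window: both roots respect the bound (d - 2)/2 = 1.
print(operator_dimensions(-3.75), both_roots_allowed(-3.75))
```

For $m^2L^2$ inside the window, $\Delta_-$ lies between $(d-2)/2$ and $d/2$, which is the statement about the unitarity bound made above.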

The operators in the $\mathcal{N}=4$ large-N SYM theory
naturally break up into two classes: those that
correspond to the Kaluza-Klein states of supergravity
and those that correspond to massive string
states. Since the radius of the $S^5$ is $L$, the masses of
the Kaluza-Klein states are proportional to $1/L$.
Thus, the dimensions of the corresponding operators
are independent of $L$ and therefore also of $\lambda$. On the
gauge-theory side, this independence is explained by
the fact that the supersymmetry protects the dimensions
of certain operators from being renormalized:
they are completely determined by the representation
under the superconformal symmetry. All
families of the Kaluza-Klein states, which correspond
to such protected operators, were classified
long ago. Correlation functions of such operators in
the strong 't Hooft coupling limit may be obtained
from the dependence of the supergravity action on
the boundary values of corresponding Kaluza-Klein
fields, as in [19]. A variety of explicit calculations
have been performed for two-, three-, and even four-point
functions. The four-point functions are particularly
interesting because their dependence on
operator positions is not determined by the conformal
invariance.

On the other hand, the masses of string excitations
are $m^2 = 4n/\alpha'$, where $n$ is an integer. For the
corresponding operators the formula [27] predicts
that the dimensions do depend on the 't Hooft
coupling and, in fact, blow up for large $\lambda = g_{YM}^2 N$ as
$2\lambda^{1/4}\sqrt{n}$.


Calculation of Wilson Loops 


The Wilson loop operator of a nonabelian gauge
theory

$$W(C) = \mathrm{tr}\, P \exp\left( i \oint_C A \right) \qquad [29]$$

involves the path-ordered integral of the gauge
connection $A$ along a contour $C$. For $\mathcal{N}=4$ SYM,
one typically uses a generalization of this loop
operator which incorporates other fields in the
$\mathcal{N}=4$ multiplet, the adjoint scalars and fermions.
Using a rectangular contour, we can calculate the
quark-antiquark potential from the expectation
value $\langle W(C) \rangle$. One thinks of the quarks located a
distance $L$ apart for a time $T$, yielding

$$\langle W \rangle \sim e^{-T V(L)} \qquad [30]$$

where $V(L)$ is the potential.

According to Maldacena, and Rey and Yee, the
AdS/CFT correspondence relates the Wilson loop
expectation value to a sum over string world sheets
ending on the boundary of $L_5$ ($z = 0$) along the
contour $C$:

$$\langle W \rangle \sim \int e^{-S} \qquad [31]$$

where $S$ is the action functional of the string world
sheet. In the large 't Hooft coupling limit $\lambda \to \infty$,
this path integral may be evaluated using a saddle-point
approximation. The leading answer is $\sim e^{-S_0}$,
where $S_0$ is the action for the classical solution,
which is proportional to the minimal area of the
string world sheet in $L_5$ subject to the boundary
conditions. The area as currently defined is
actually divergent, and to regularize it one must
position the contour at $z = \epsilon$ (this is the same type
of regulator as used in the definition of correlation
functions).

Consider a circular Wilson loop of radius $a$. The
action of the corresponding classical string world
sheet is

$$S_0 = \sqrt{\lambda} \left( \frac{a}{\epsilon} - 1 \right) \qquad [32]$$

Subtracting the linearly divergent term, which is
proportional to the length of the contour, one finds

$$\ln \langle W \rangle = \sqrt{\lambda} + O(\ln \lambda) \qquad [33]$$

a result which has been duplicated in field theory by
summing certain classes of rainbow Feynman diagrams
in $\mathcal{N}=4$ SYM. From these sums, one finds

$$\langle W \rangle_{\mathrm{rainbow}} = \frac{2}{\sqrt{\lambda}} I_1\left(\sqrt{\lambda}\right) \qquad [34]$$

where $I_1$ is a Bessel function. This formula is one of
the few available proposals for extrapolation of an
observable from small to large coupling. At large $\lambda$,

$$\langle W \rangle_{\mathrm{rainbow}} \to \sqrt{\frac{2}{\pi}} \frac{e^{\sqrt{\lambda}}}{\lambda^{3/4}} \qquad [35]$$

in agreement with the geometric prediction.

The quark-antiquark potential is extracted from a
rectangular Wilson loop of width $L$ and length $T$.
After regularizing the divergent contribution to the
energy, one finds the attractive potential

$$V(L) = -\frac{4\pi^2 \sqrt{\lambda}}{\Gamma(1/4)^4\, L} \qquad [36]$$

The Coulombic $1/L$ dependence is required by the
conformal invariance of the theory. The fact that the
potential scales as the square root of the 't Hooft
coupling indicates some screening of the charges at
large coupling.
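
The approach of the rainbow sum [34] to its strong-coupling form [35] can be seen numerically. The sketch below is a minimal Python illustration with hypothetical function names; it evaluates the modified Bessel function $I_1$ from its power series $I_1(x) = \sum_k (x/2)^{2k+1}/(k!\,(k+1)!)$ rather than relying on an external library, which is adequate for the moderate arguments used here.

```python
import math

def bessel_i1(x, terms=200):
    """Modified Bessel function I_1(x) from its power series."""
    total = 0.0
    term = x / 2.0  # k = 0 term
    for k in range(terms):
        total += term
        # ratio of consecutive terms: (x/2)^2 / ((k+1)(k+2))
        term *= (x / 2.0) ** 2 / ((k + 1) * (k + 2))
    return total

def wilson_rainbow(lam):
    """Circular Wilson loop from the rainbow sum, equation [34]."""
    s = math.sqrt(lam)
    return 2.0 / s * bessel_i1(s)

def wilson_strong(lam):
    """Large-lambda asymptotic form, equation [35]."""
    return math.sqrt(2.0 / math.pi) * math.exp(math.sqrt(lam)) / lam ** 0.75

for lam in (1.0, 100.0, 10000.0):
    print(lam, wilson_rainbow(lam) / wilson_strong(lam))
```

The printed ratio tends to 1 as $\lambda$ grows, while at small $\lambda$ the rainbow sum starts as $1 + \lambda/8$, the leading perturbative behavior.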


Conformal Field Theories and Einstein 
Manifolds 


Interesting generalizations of the duality between
AdS$_5 \times S^5$ and $\mathcal{N}=4$ SYM with less supersymmetry
and more complicated gauge groups can be
produced by placing D3 branes at the tip of a
Ricci-flat six-dimensional cone $X$ (see Figure 1). The
cone metric may be cast in the form

$$ds_X^2 = dr^2 + r^2\, ds_Y^2 \qquad [37]$$

where $Y$ is the level surface of $X$. In particular, $Y$ is a
positively curved Einstein manifold, that is, one for
which $R_{ij} = 4 g_{ij}$. In order to preserve the $\mathcal{N}=1$
supersymmetry, $X$ must be a Calabi-Yau space; then
$Y$ is defined to be Sasaki-Einstein.

Figure 1 D3 branes placed at the tip of a Ricci-flat cone $X$.

The D3 branes appear as a point in $X$ and span the
transverse Minkowski space $\mathbb{R}^{3,1}$. The ten-dimensional
metric they produce assumes the form [9], but
with the sphere metric $d\Omega_5^2$ replaced by the metric on
$Y$, $ds_Y^2$. The equality of tensions [10] now requires that

$$L^4 = \frac{\sqrt{\pi}\, \kappa N}{2\, \mathrm{vol}(Y)} = 4\pi g_{st} N \alpha'^2\, \frac{\pi^3}{\mathrm{vol}(Y)} \qquad [38]$$

In the near-horizon limit, $r \to 0$, the geometry factors
into AdS$_5 \times Y$. Because the D3 branes are located at a
singularity, the gauge theory becomes much more
complicated, typically involving a product of several
SU(N) factors coupled to matter in bifundamental
representations, often described using a quiver diagram
(see Figure 2 for an example).


Figure 2 The quiver for $Y^{4,3}$. Each node corresponds to an
SU(N) gauge group and each arrow to a bifundamental chiral
superfield.


The simplest examples of $X$ are orbifolds $\mathbb{C}^3/\Gamma$,
where $\Gamma$ is a discrete subgroup of SO(6). Indeed, if
$\Gamma \subset$ SU(3), then $\mathcal{N}=1$ supersymmetry is preserved.
The level surface of such an $X$ is $Y = S^5/\Gamma$. In this
case, the product structure of the gauge theory can
be motivated by thinking about image stacks of D3
branes from the action of $\Gamma$.

The next simplest example of a Calabi-Yau cone
$X$ is the conifold which may be described by the
following equation in four complex variables:

$$\sum_{a=1}^{4} z_a^2 = 0 \qquad [39]$$

Since this equation is symmetric under an overall
rescaling of the coordinates, this space is a cone. The
level surface $Y$ of the conifold is a coset manifold
$T^{1,1} = (SU(2) \times SU(2))/U(1)$. This space has the
SO(4) ~ SU(2) x SU(2) symmetry which rotates the
$z$'s, and also the U(1) R-symmetry under $z_a \to e^{i\theta} z_a$.
The metric on $T^{1,1}$ is known explicitly; it assumes
the form of an $S^1$ bundle over $S^2 \times S^2$.

The supersymmetric field theory on the D3 branes
probing the conifold singularity is SU(N) x SU(N)
gauge theory coupled to two chiral superfields, $A_i$,
in the $(\mathbf{N}, \bar{\mathbf{N}})$ representation and two chiral superfields,
$B_j$, in the $(\bar{\mathbf{N}}, \mathbf{N})$ representation. The $A$'s
transform as a doublet under one of the global
SU(2)'s, while the $B$'s transform as a doublet under
the other SU(2). Cancellation of the anomaly in the
U(1) R-symmetry requires that the $A$'s and the $B$'s
each have R-charge 1/2. For consistency of the
duality, it is necessary that we add an exactly
marginal superpotential which preserves the SU(2) x
SU(2) x U(1)$_R$ symmetry of the theory. Since a
marginal superpotential has R-charge equal to 2 it
must be quartic, and the symmetries fix it uniquely
up to overall normalization:

$$W = \epsilon^{ij} \epsilon^{kl}\, \mathrm{tr}\, A_i B_k A_j B_l \qquad [40]$$



There are in fact infinite families of Calabi-Yau 
cones X, but there are two problems one faces in 
studying these generalized AdS/CFT correspon- 
dences. The first is geometric: the cones X are not 
all well understood and only for relatively few do 
we have explicit metrics. However, it is often 
possible to calculate important quantities such as 
the vol(Y) without knowing the metric. The second 
problem is gauge theoretic: although many techni- 
ques exist, there is no completely general procedure 
for constructing the gauge theory on a stack of D- 
branes at an arbitrary singularity. 

Let us mention two important classes of Calabi-Yau
cones $X$. The first class consists of cones over
the so-called $Y^{p,q}$ Sasaki-Einstein spaces. Here, $p$
and $q$ are integers with $p \ge q$. Gauntlett et al. (2004)
discovered metrics on all the $Y^{p,q}$, and the quiver
gauge theories that live on the D-branes probing the
singularity are now known. Making contact with
the simpler examples discussed above, the $Y^{p,0}$ are
orbifolds of $T^{1,1}$ while the $Y^{p,p}$ are orbifolds of $S^5$.

In the second class of cones $X$, a del Pezzo surface
shrinks to zero size at the tip of the cone. A
del Pezzo surface is an algebraic surface of complex
dimension 2 with positive first Chern class. One
simple del Pezzo surface is a complex projective
space of dimension 2, $\mathbb{P}^2$, which gives rise to the
$\mathcal{N}=1$ preserving $S^5/\mathbb{Z}_3$ orbifold. Another simple
case is $\mathbb{P}^1 \times \mathbb{P}^1$, which leads to $T^{1,1}/\mathbb{Z}_2$. The
remaining del Pezzo surfaces $B_k$ are $\mathbb{P}^2$ blown up
at $k$ points, $1 \le k \le 8$. The cone where $B_1$ shrinks to
zero size has level surface $Y^{2,1}$. Gauge theories for
all the del Pezzos have been constructed. Except for
the three del Pezzos just discussed, and possibly also
for $B_6$, metrics on the cones over these del Pezzos
are not known. Nevertheless, it is known that for
$3 \le k \le 8$, the volume of the Sasaki-Einstein manifold
$Y$ associated with $B_k$ is $\pi^3 (9 - k)/27$.


The Central Charge 


The central charge provides one of the most
amazing ways to check the generalized AdS/CFT
correspondences. The central charge $c$ and conformal
anomaly $a$ can be defined as coefficients of
certain curvature invariants in the trace of the stress
energy tensor of the conformal gauge theory:

$$\langle T^\alpha_\alpha \rangle = -a E_4 - c I_4 \qquad [41]$$

(The curvature invariants $E_4$ and $I_4$ are quadratic in
the Riemann tensor and vanish for Minkowski
space.) As discussed above, correlators such as $\langle T^\alpha_\alpha \rangle$
can be calculated from supergravity, and one finds

$$a = c = \frac{\pi^3 N^2}{4\, \mathrm{vol}(Y)} \qquad [42]$$

On the gauge-theory side of the correspondence,
anomalies completely determine $a$ and $c$:

$$a = \frac{3}{32} \left( 3\, \mathrm{tr} R^3 - \mathrm{tr} R \right), \qquad c = \frac{1}{32} \left( 9\, \mathrm{tr} R^3 - 5\, \mathrm{tr} R \right) \qquad [43]$$

The trace notation implies a sum over the R-charges
of all of the fermions in the gauge theory. (From the
geometric knowledge that $a = c$, we can conclude
that $\mathrm{tr} R = 0$.)

The R-charges can be determined using the
principle of a-maximization. For a superconformal
gauge theory, the R-charges of the fermions
maximize $a$ subject to the constraints that the
Novikov-Shifman-Vainshtein-Zakharov (NSVZ)
beta function of each gauge group vanishes and
the R-charge of each superpotential term is 2.

For the $Y^{p,q}$ spaces mentioned above, one finds
that

$$\mathrm{vol}(Y^{p,q}) = \frac{q^2 \left( 2p + \sqrt{4p^2 - 3q^2} \right)}{3p^2 \left( 3q^2 - 2p^2 + p\sqrt{4p^2 - 3q^2} \right)}\, \pi^3 \qquad [44]$$

The gauge theory consists of $p - q$ fields $Z$, $p + q$
fields $Y$, $2p$ fields $U$, and $2q$ fields $V$. These fields all
transform in the bifundamental representation of a
pair of SU(N) gauge groups (the quiver diagram for
$Y^{4,3}$ is given in Figure 2). The NSVZ beta function
and superpotential constraints determine the
R-charges up to two free parameters $x$ and $y$. Let $x$
be the R-charge of $Z$ and $y$ the R-charge of $Y$. Then
the $U$ have R-charge $1 - (1/2)(x + y)$ and the $V$
have R-charge $1 + (1/2)(x - y)$.

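The technique of a-maximization leads to the result

$$x = \frac{1}{3q^2} \left( -4p^2 - 2pq + 3q^2 + (2p + q)\sqrt{4p^2 - 3q^2} \right)$$

$$y = \frac{1}{3q^2} \left( -4p^2 + 2pq + 3q^2 + (2p - q)\sqrt{4p^2 - 3q^2} \right)$$

(With the multiplicities and the $U$ and $V$ charges stated above,
this is the assignment of the two roots for which $a$ is stationary
and [45] holds; for $p = q$ it gives $y = 2/3$, as expected for an
orbifold of $S^5$.) Thus, as calculated by Benvenuti et al. (2004) and
Bertolini et al. (2004)

$$a(Y^{p,q}) = \frac{\pi^3 N^2}{4\, \mathrm{vol}(Y^{p,q})} \qquad [45]$$

in remarkable agreement with the prediction [42] of
the AdS/CFT duality.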
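
The agreement between [43] and [42] can be verified numerically for any given $p$ and $q$. The Python sketch below (hypothetical function names) works at large $N$, so that each adjoint or bifundamental field contributes $N^2$ fermions, and it assumes that the quiver has $2p$ gauge groups, each with a gaugino of R-charge 1; this count is not stated in the text above. Here $x$, the R-charge of $Z$, is taken to be the root containing $(2p+q)\sqrt{4p^2-3q^2}$ and $y$, the R-charge of $Y$, the root containing $(2p-q)\sqrt{4p^2-3q^2}$, which is the assignment consistent with the stated multiplicities. The fermion in a chiral superfield of R-charge $R$ has charge $R - 1$.

```python
import math

def r_charges(p, q):
    """R-charges (x, y) of the fields Z and Y for Y^{p,q}, with 1 <= q <= p."""
    s = math.sqrt(4 * p * p - 3 * q * q)
    x = (-4 * p * p - 2 * p * q + 3 * q * q + (2 * p + q) * s) / (3 * q * q)
    y = (-4 * p * p + 2 * p * q + 3 * q * q + (2 * p - q) * s) / (3 * q * q)
    return x, y

def fermion_content(p, q):
    """List of (multiplicity, fermion R-charge), in units of N^2."""
    x, y = r_charges(p, q)
    return [
        (2 * p, 1.0),             # gauginos (assumed: 2p gauge groups)
        (p - q, x - 1.0),         # Z
        (p + q, y - 1.0),         # Y
        (2 * p, -(x + y) / 2.0),  # U, superfield charge 1 - (x + y)/2
        (2 * q, (x - y) / 2.0),   # V, superfield charge 1 + (x - y)/2
    ]

def a_from_anomalies(p, q):
    """a / N^2 from equation [43]."""
    content = fermion_content(p, q)
    tr_r = sum(n * r for n, r in content)
    tr_r3 = sum(n * r ** 3 for n, r in content)
    return 3.0 / 32.0 * (3.0 * tr_r3 - tr_r)

def a_from_volume(p, q):
    """a / N^2 from equations [42] and [44]."""
    s = math.sqrt(4 * p * p - 3 * q * q)
    vol = math.pi ** 3 * q * q * (2 * p + s) / (3 * p * p * (3 * q * q - 2 * p * p + p * s))
    return math.pi ** 3 / (4.0 * vol)

for p, q in [(2, 1), (3, 1), (3, 2), (2, 2)]:
    print((p, q), a_from_anomalies(p, q), a_from_volume(p, q))
```

The sum defining $\mathrm{tr} R$ vanishes identically in $x$ and $y$, in line with the remark that $a = c$ implies $\mathrm{tr} R = 0$.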


A Path to a Confining Theory 


There exists an interesting way of breaking the
conformal invariance for spaces $Y$ whose topology
includes an $S^2$ factor (examples of such spaces
include $T^{1,1}$ and $Y^{p,q}$, which are topologically
$S^2 \times S^3$). At the tip of the cone over $Y$, one may
add $M$ wrapped D5 branes to the $N$ D3 branes. The
gauge theory on such a combined stack is no longer
conformal; it exhibits a novel pattern of quasiperiodic
renormalization group flow, called a duality cascade.

To date, the most extensive study of a theory of this
type has been carried out for the conifold, where one
finds an $\mathcal{N}=1$ supersymmetric SU(N) x SU(N + M)
theory coupled to chiral superfields $A_1$, $A_2$ in the
$(\mathbf{N}, \overline{\mathbf{N+M}})$ representation, and $B_1$, $B_2$ in the
$(\bar{\mathbf{N}}, \mathbf{N+M})$ representation. D5 branes source RR
3-form flux; hence, the supergravity dual of this
theory has to include $M$ units of this flux. Klebanov
and Strassler (2000) found an exact nonsingular
supergravity solution incorporating the 3-form and
the 5-form RR field strengths, and their back-reaction
on the geometry. This back-reaction creates a “geometric
transition” to the deformed conifold

$$\sum_{a=1}^{4} z_a^2 = \epsilon^2 \qquad [46]$$

and introduces a “warp factor” so that the full ten-dimensional
geometry has the form

$$ds_{10}^2 = h^{-1/2}(\tau) \left( -(dx^0)^2 + (dx^i)^2 \right) + h^{1/2}(\tau)\, d\tilde{s}_6^2 \qquad [47]$$

where $d\tilde{s}_6^2$ is the Calabi-Yau metric of the deformed
conifold, which is known explicitly.

The field-theoretic interpretation of this solution is
unconventional. After a finite amount of RG flow, the
SU(N + M) group undergoes a Seiberg duality transformation.
After this transformation, and
an interchange of the two gauge groups, the new
gauge theory is SU($\tilde{N}$) x SU($\tilde{N}$ + M) with the same
matter and superpotential, and with $\tilde{N} = N - M$. The
self-similar structure of the gauge theory under the
Seiberg duality is the crucial fact that allows this
pattern to repeat many times. If $N = (k + 1)M$, where
$k$ is an integer, then the duality cascade stops after $k$
steps, and we find SU(M) x SU(2M) gauge theory.
This IR gauge theory exhibits a multitude of interesting
effects visible in the dual supergravity background.
One of them is confinement, which follows from the
fact that the warp factor $h$ is finite and nonvanishing at
the smallest radial coordinate, $\tau = 0$. The methods
presented in the section “Calculation of Wilson Loops”
then imply that the quark-antiquark potential grows
linearly at large distances. Other notable IR effects
are chiral symmetry breaking and the Goldstone
mechanism. Particularly interesting is the appearance
of an entire “baryonic branch” of the moduli space in
the gauge theory, whose existence has been demonstrated
also in the dual supergravity language.
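
The bookkeeping of the cascade is simple enough to spell out. The following Python sketch (the function name is hypothetical) tracks only the ranks of the two gauge groups, replacing $N$ by $N - M$ at each Seiberg duality step, for the case $N = (k+1)M$ discussed above; it says nothing about the dynamics of the flow.

```python
def duality_cascade(n, m):
    """Ranks (N, N + M) of SU(N) x SU(N + M) at each stage of the cascade.

    Only the case N = (k + 1) M with integer k >= 0 is handled.
    """
    if m <= 0 or n < m or n % m != 0:
        raise ValueError("this sketch assumes N = (k + 1) M with k >= 0")
    stages = [(n, n + m)]
    while n > m:
        n -= m                     # one Seiberg duality step: N -> N - M
        stages.append((n, n + m))
    return stages

# Hypothetical example: M = 3 and N = 12, so k = 3.
stages = duality_cascade(12, 3)
print(stages)            # ends with (M, 2M)
print(len(stages) - 1)   # number of steps, equal to k
```

For $N = (k+1)M$ the loop performs exactly $k$ subtractions and terminates at SU(M) x SU(2M).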


Conclusions 


This article tries to present a logical path from
studying gravitational properties of D-branes to the
formulation of an exact duality between conformal
field theories and string theory in anti-de Sitter
backgrounds, and also sketches some methods for
breaking the conformal symmetry. Due to space
limitations, many aspects and applications of the
AdS/CFT correspondence have been omitted. At
the moment, practical applications of this duality
are limited mainly to very strongly coupled, large-N
gauge theories, where the dual string description is
well approximated by classical supergravity. To
understand the implications of the duality for more
general parameters, it is necessary to find better
methods for attacking the world sheet approach to
string theories in anti-de Sitter backgrounds with RR
background fields turned on. When such methods are
found, it is likely that the material presented here will
have turned out to be just a tiny tip of a monumental
iceberg of dualities between fields and strings.


One can distinguish three classes of affine quantum
groups, each leading to a different dependence of the 
R-matrices on the spectral parameter u: Yangians 
lead to rational R-matrices, quantum affine algebras 
lead to trigonometric R-matrices, and elliptic quan- 
tum groups lead to elliptic R-matrices. We will mostly 
concentrate on the quantum affine algebras but many 
results hold similarly for the other classes. 

After giving mathematical details about quantum
affine algebras and Yangians in the first two sections,
we describe how these algebras arise in different
areas of mathematical physics in the three following
sections. We end with a description of boundary
quantum groups which extend the formalism to the
boundary Yang-Baxter (reflection) equation.


Quantum Affine Algebras

Definition


A quantum affine algebra $U_q(\hat{g})$ is a quantization of
the enveloping algebra $U(\hat{g})$ of an affine Lie algebra
(Kac-Moody algebra) $\hat{g}$. So we start by introducing
affine Lie algebras and their enveloping algebras
before proceeding to give their quantizations.

Let $g$ be a semisimple finite-dimensional Lie algebra
over $\mathbb{C}$ of rank $r$ with Cartan matrix $(a_{ij})_{i,j=1,\ldots,r}$,
symmetrizable via positive integers $d_i$, so that $d_i a_{ij}$ is
symmetric. In terms of the simple roots $\alpha_i$, we have

$$a_{ij} = 2\, \frac{\alpha_i \cdot \alpha_j}{|\alpha_i|^2} \quad \text{and} \quad d_i = \frac{|\alpha_i|^2}{2} \qquad [1]$$

We can introduce an $\alpha_0 = \sum_{i=1}^{r} n_i \alpha_i$ in such a way
that the extended Cartan matrix $(a_{ij})_{i,j=0,\ldots,r}$ is of
affine type, that is, it is positive semidefinite of
rank $r$. The integers $n_i$ are referred to as Kac indices.
Choosing $\alpha_0$ to be the highest root of $g$ leads to an
untwisted affine Kac-Moody algebra while choosing
$\alpha_0$ to be the highest short root of $g$ leads to a twisted
affine Kac-Moody algebra.

One defines the affine Lie algebra $\hat{g}$ corresponding
to this affine Cartan matrix as the Lie algebra
(over $\mathbb{C}$) with generators $H_i, E_i^\pm$ for $i = 0, 1, \ldots, r$
and $D$ with relations

$$[H_i, E_j^\pm] = \pm a_{ij} E_j^\pm, \qquad [H_i, H_j] = 0$$
$$[E_i^+, E_j^-] = \delta_{ij} H_i$$
$$[D, H_i] = 0, \qquad [D, E_i^\pm] = \pm \delta_{i,0} E_i^\pm \qquad [2]$$
$$\sum_{k=0}^{1-a_{ij}} (-1)^k \binom{1-a_{ij}}{k} (E_i^\pm)^k E_j^\pm (E_i^\pm)^{1-a_{ij}-k} = 0, \quad i \ne j$$

The $E_i^\pm$ are referred to as Chevalley generators and
the last set of relations are known as Serre relations.
The generator $D$ is known as the canonical derivation.
We will denote the algebra obtained by
dropping the generator $D$ by $\hat{g}'$.

In applications to physics, the affine Lie algebra $\hat{g}$
often occurs in an isomorphic form as the loop Lie
algebra $g[z, z^{-1}] \oplus \mathbb{C} \cdot c$ with Lie product (for
untwisted $\hat{g}$)

$$[X z^k, Y z^l] = [X, Y] z^{k+l} + k\, \delta_{k,-l} (X, Y)\, c, \quad \text{for } X, Y \in g, \; k, l \in \mathbb{Z} \qquad [3]$$

and $c$ being the central element.

The universal enveloping algebra $U(\hat{g})$ of $\hat{g}$ is the
unital algebra over $\mathbb{C}$ with generators $H_i, E_i^\pm$ for
$i = 0, 1, \ldots, r$ and $D$ and with relations given by [2]
where now $[\,,]$ stands for the commutator instead of
the Lie product.


To define the quantization of $U(\hat{g})$, one can either
define $U_h(\hat{g})$ (Drinfeld 1985) as an algebra over the
ring $\mathbb{C}[[h]]$ of formal power series over an indeterminate
$h$ or one can define $U_q(\hat{g})$ (Jimbo 1985) as an
algebra over the field $\mathbb{Q}(q)$ of rational functions of $q$
with coefficients in $\mathbb{Q}$. We will present $U_h(\hat{g})$ first.

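The quantum affine algebra $U_h(\hat{g})$ is the unital
algebra over $\mathbb{C}[[h]]$ topologically generated by
$H_i, E_i^\pm$ for $i = 0, 1, \ldots, r$ and $D$ with relations

$$[H_i, E_j^\pm] = \pm a_{ij} E_j^\pm, \qquad [H_i, H_j] = 0$$
$$[E_i^+, E_j^-] = \delta_{ij}\, \frac{q_i^{H_i} - q_i^{-H_i}}{q_i - q_i^{-1}} \qquad [4]$$
$$[D, H_i] = 0, \qquad [D, E_i^\pm] = \pm \delta_{i,0} E_i^\pm$$
$$\sum_{k=0}^{1-a_{ij}} (-1)^k \begin{bmatrix} 1-a_{ij} \\ k \end{bmatrix}_{q_i} (E_i^\pm)^k E_j^\pm (E_i^\pm)^{1-a_{ij}-k} = 0, \quad i \ne j$$

where $q_i = q^{d_i}$ and $q = e^h$. The q-binomial coefficients
are defined by

$$\begin{bmatrix} m \\ n \end{bmatrix}_q = \frac{[m]_q!}{[n]_q!\, [m-n]_q!} \qquad [5]$$

$$[n]_q! = [n]_q [n-1]_q \cdots [2]_q [1]_q \qquad [6]$$

$$[n]_q = \frac{q^n - q^{-n}}{q - q^{-1}} \qquad [7]$$
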
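
The q-numbers entering the Serre relations are straightforward to compute. The Python sketch below implements [5], [6], and [7] with hypothetical function names; it treats $q$ as an ordinary number different from 0 and $\pm 1$ (an exact `Fraction` or a float) rather than as a formal variable, which is an assumption of the sketch.

```python
from fractions import Fraction

def q_number(n, q):
    """[n]_q = (q^n - q^-n) / (q - q^-1), equation [7]."""
    return (q ** n - q ** (-n)) / (q - q ** (-1))

def q_factorial(n, q):
    """[n]_q! = [n]_q [n-1]_q ... [1]_q, equation [6]."""
    result = q ** 0  # the number 1, in the same numeric type as q
    for k in range(1, n + 1):
        result *= q_number(k, q)
    return result

def q_binomial(m, n, q):
    """q-binomial coefficient, equation [5]."""
    return q_factorial(m, q) / (q_factorial(n, q) * q_factorial(m - n, q))

# Exact value at q = 2: equals q^4 + q^2 + 2 + q^-2 + q^-4.
print(q_binomial(4, 2, Fraction(2)))
# Close to q = 1 the ordinary binomial coefficient is recovered.
print(q_binomial(5, 2, 1.0001))
```

The second line illustrates the classical limit: as $q \to 1$ (that is, $h \to 0$) the q-binomials reduce to the binomial coefficients appearing in the Serre relations of [2].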


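The quantum affine algebra $U_h(\hat{g})$ is a Hopf
algebra with coproduct

$$\Delta(D) = D \otimes 1 + 1 \otimes D$$
$$\Delta(H_i) = H_i \otimes 1 + 1 \otimes H_i \qquad [8]$$
$$\Delta(E_i^\pm) = E_i^\pm \otimes q_i^{H_i/2} + q_i^{-H_i/2} \otimes E_i^\pm$$

antipode

$$S(D) = -D, \qquad S(H_i) = -H_i, \qquad S(E_i^\pm) = -q_i^{\mp 1} E_i^\pm \qquad [9]$$

and co-unit

$$\epsilon(D) = \epsilon(H_i) = \epsilon(E_i^\pm) = 0 \qquad [10]$$

It is easy to see that the classical enveloping
algebra $U(\hat{g})$ can be obtained from the above by
setting $h = 0$, or more formally,

$$U_h(\hat{g}) / h U_h(\hat{g}) = U(\hat{g})$$
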


We can also define the quantum affine algebra 
U_q(ĝ) as the algebra over Q(q) with generators 
K_i, E_i^±, D for i = 0, 1, ..., r and relations that are 
obtained from the ones given above for U_h(ĝ) by 
setting 


    q_i^{H_i/2} = K_i,   i = 0, ..., r                                      [11] 


One can go further to an algebraic formulation over 
C in which q is a complex number (with some points 
including q = 0 not allowed). This has the advantage 
that it becomes possible to specialize, for example, to 
q a root of unity, where special phenomena occur. 


Representations 


For applications in physics, the finite-dimensional 
representations of U_h(ĝ') are the most interesting. As 
will be explained in later sections, these occur, for 
example, as particle multiplets in 2D quantum field 
theory or as spin Hilbert spaces in quantum spin 
chains. In the next subsection, we will use them to 
derive matrix solutions to the Yang-Baxter equation. 

While for a nonaffine quantum algebra U_h(g) 
the ring of representations is isomorphic to that of 
the classical enveloping algebra U(g) (because in fact 
the algebras are isomorphic, as Drinfeld has pointed 
out), the corresponding fact is no longer true for affine 
quantum groups, except in the case ĝ = a_n^{(1)} = ŝl_{n+1}. 

For the classical enveloping algebras U(ĝ'), any 
finite-dimensional representation of U(g) also carries 
a finite-dimensional representation of U(ĝ'). In the 
quantum case, however, in general, an irreducible 
representation of U_h(ĝ') reduces to a sum of 
representations of U_h(g). 

To classify the finite-dimensional representations 
of U_h(ĝ'), it is necessary to use a different realization 
of U_h(ĝ') that looks more like a quantization of the 
loop algebra realization [3] than the realization in 
terms of Chevalley generators. In terms of the 
generators in this alternative realization, which we 
do not give here because of its complexity, the 
finite-dimensional representations can be viewed as 
pseudo-highest-weight representations. There is a set 
of r “fundamental” representations V^a, a = 1, ..., r, 
each containing the corresponding U_h(g) fundamental 
representation as a component, from the tensor 
products of which all the other finite-dimensional 
representations may be constructed. The details can 
be found in Chari and Pressley (1994). 

Given some representation ρ: U_h(ĝ') → End(V), 
we can introduce a parameter λ with the help of 
the automorphism τ_λ of U_h(ĝ') generated by D and 
given by 

    \tau_\lambda(E_i^\pm) = \lambda^{\pm s_i} E_i^\pm,   \tau_\lambda(H_i) = H_i   [12] 


Different choices for the s_i correspond to different 
gradations. Commonly used are the “homogeneous 
gradation,” s_0 = 1, s_1 = ··· = s_r = 0, and the “principal 
gradation,” s_0 = s_1 = ··· = s_r = 1. We shall 
also need the “spin gradation” s_i = d_i^{-1}. The 
representations 


    \rho_\lambda = \rho \circ \tau_\lambda 


play an important role in applications to integrable 
models where λ is referred to as the (multiplicative) 
spectral parameter. In applications to particle scattering 
introduced in a later section, it is related to the 
rapidity of the particle. The generator D can be 
realized as an infinitesimal scaling operator on λ and 
thus plays the role of the Lorentz boost generator. 

The tensor product representations ρ^a_λ ⊗ ρ^b_μ are 
irreducible generically but become reducible for 
certain values of λ/μ, a fact which again is important 
in applications (fusion procedure, particle-bound 
states). 


R-Matrices 


A Hopf algebra A is said to be “almost cocommutative” 
if there exists an invertible element R ∈ A ⊗ A 
such that 


    R \Delta(x) = (\sigma \circ \Delta(x)) R,   for all x \in A              [13] 


where σ: x ⊗ y ↦ y ⊗ x exchanges the two factors in 
the coproduct. In a quasitriangular Hopf algebra, 
this element R satisfies 


    (\Delta \otimes id)(R) = R_{13} R_{23} 
    (id \otimes \Delta)(R) = R_{13} R_{12}                                  [14] 

and is known as the “universal R-matrix” (see Hopf 
Algebras and q-Deformation Quantum Groups). As 
a consequence of [13] and [14], it automatically 
satisfies the Yang-Baxter equation 


    R_{12} R_{13} R_{23} = R_{23} R_{13} R_{12}                             [15] 


For technical reasons, to do with the infinite number 
of root vectors of ĝ, the quantum affine algebra U_h(ĝ) 
does not possess a universal R-matrix that is an 
element of U_h(ĝ) ⊗ U_h(ĝ). However, as pointed out 
by Drinfeld (1985), it possesses a pseudouniversal 
R-matrix R(λ) ∈ (U_h(ĝ') ⊗ U_h(ĝ'))((λ)). The λ is 
related to the automorphism τ_λ defined in [12]. 
When using the homogeneous gradation, R(λ) is a 
formal power series in λ. 

When the pseudouniversal R-matrix is evaluated 
in the tensor product of any two indecomposable 
finite-dimensional representations ρ^1 and ρ^2, one 
obtains a numerical R-matrix 


    R^{12}(\lambda) = (\rho^1 \otimes \rho^2) R(\lambda)                    [16] 

The entries of these numerical R-matrices are 
rational functions of the multiplicative spectral 
parameter but when written in terms of the 
additive spectral parameter u = log(λ) they are 
trigonometric functions of u and satisfy the Yang-Baxter 
equation in the form given in [1]. The matrix 


    \check{R}^{12}(\lambda) = \sigma \circ R^{12}(\lambda) 

satisfies the intertwining relation 

    \check{R}^{12}(\lambda/\mu) \cdot (\rho^1_\lambda \otimes \rho^2_\mu)(\Delta(x)) 
        = (\rho^2_\mu \otimes \rho^1_\lambda)(\Delta(x)) \cdot \check{R}^{12}(\lambda/\mu)   [17] 

for any x ∈ U_h(ĝ'), so that it maps V^1_λ ⊗ V^2_μ to V^2_μ ⊗ V^1_λ. 
It follows from the irreducibility 
of the tensor product representations that these 
R-matrices satisfy the Yang-Baxter equations 


    (id \otimes \check{R}^{12}(\lambda/\mu)) (\check{R}^{13}(\lambda/\nu) \otimes id) (id \otimes \check{R}^{23}(\mu/\nu)) 
        = (\check{R}^{23}(\mu/\nu) \otimes id) (id \otimes \check{R}^{13}(\lambda/\nu)) 
          \times (\check{R}^{12}(\lambda/\mu) \otimes id)                   [18] 

or, graphically, 

[Diagram: both sides of [18] map V^1_λ ⊗ V^2_μ ⊗ V^3_ν (bottom) to V^3_ν ⊗ V^2_μ ⊗ V^1_λ (top).] 

Explicit formulas for the pseudouniversal 
R-matrices were found by Khoroshkin and Tolstoy. 
However, these are difficult to evaluate explicitly in 
specific representations so that in practice it is easiest 
to find the numerical R-matrices R^{12}(λ) by solving the 
intertwining relation [17]. It should be stressed that 
solving the intertwining relation, which is a linear 
equation for the R-matrix, is much easier than directly 
solving the Yang-Baxter equation, a cubic equation. 
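The remark that the intertwining relation is a linear problem can be made concrete. The sketch below treats the simplest case, with two copies of the two-dimensional evaluation representation of the quantum affine algebra built on sl_2 in the homogeneous gradation, and finds the intertwiner as the null space of a linear system. The explicit matrices (E_1^± = e^±, H_1 = h, E_0^± = x^{±1} e^∓, H_0 = -h) and the numerical values of q, λ, μ are assumptions made for illustration; they are not given in the text.

```python
import numpy as np

q, lam, mu = 1.3, 0.7, 1.9  # arbitrary generic values (assumptions)
h = np.diag([1.0, -1.0])
e_plus = np.array([[0.0, 1.0], [0.0, 0.0]])
e_minus = e_plus.T
I2, I4 = np.eye(2), np.eye(4)


def q_pow(H, s):
    """q^{s H} for a diagonal matrix H."""
    return np.diag(q ** (s * np.diag(H)))


def chevalley(x):
    """(E_i^+, E_i^-, H_i) for i = 1, 0 in the assumed evaluation
    representation on C^2 with spectral parameter x."""
    return [(e_plus, e_minus, h), (x * e_minus, e_plus / x, -h)]


def coproducts(x, y):
    """Matrices (rho_x (x) rho_y)(Delta(g)) for g = E_i^+, E_i^-, H_i."""
    mats = []
    for (Exp, Exm, H), (Eyp, Eym, _) in zip(chevalley(x), chevalley(y)):
        K, K_inv = q_pow(H, 0.5), q_pow(H, -0.5)
        mats.append(np.kron(Exp, K_inv) + np.kron(K, Eyp))
        mats.append(np.kron(Exm, K_inv) + np.kron(K, Eym))
        mats.append(np.kron(H, I2) + np.kron(I2, H))
    return mats


# The relation R A = B R is linear in R. With row-major vec(R) it reads
# (I (x) A^T - B (x) I) vec(R) = 0, one block of equations per generator.
pairs = list(zip(coproducts(lam, mu), coproducts(mu, lam)))
M = np.vstack([np.kron(I4, A.T) - np.kron(B, I4) for A, B in pairs])
_, sing, Vt = np.linalg.svd(M)
nullity = int(np.sum(sing < 1e-10))
R_check = Vt[-1].reshape(4, 4)  # spans the null space when nullity == 1
residual = max(np.abs(R_check @ A - B @ R_check).max() for A, B in pairs)

if __name__ == "__main__":
    print(nullity, residual)
```

For generic λ/μ the null space is one-dimensional, which reflects the irreducibility of the tensor product: the intertwiner is fixed up to an overall scalar, and no cubic equation has to be solved.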


Yangians 


As remarked by Drinfeld (1986), for untwisted ĝ the 
quantum affine algebra U_h(ĝ') degenerates as h → 0 
into another quasipseudotriangular Hopf algebra, 
the “Yangian” Y(g) (Drinfeld 1985). It is associated 
with R-matrices which are rational functions of the 
additive spectral parameter u. Its representation ring 
coincides with that of U_h(ĝ'). 

Consider a general presentation of a Lie algebra g, 
with generators I_a and structure constants f_{abc}, 
so that 


    [I_a, I_b] = f_{abc} I_c,   \Delta(I_a) = I_a \otimes 1 + 1 \otimes I_a 


(with summation over repeated indices). The Yangian 
Y(g) is the algebra generated by these and a 
second set of generators J_a satisfying 


    [I_a, J_b] = f_{abc} J_c 
    \Delta(J_a) = J_a \otimes 1 + 1 \otimes J_a + \frac{1}{2} f_{abc} I_c \otimes I_b 


The requirement that Δ be a homomorphism 
imposes further relations: 


    [J_a, [J_b, I_c]] - [I_a, [J_b, J_c]] = \alpha_{abcdeg} \{I_d, I_e, I_g\} 


and 


    [[J_a, J_b], [I_l, J_m]] + [[J_l, J_m], [I_a, J_b]] 
        = (\alpha_{abcdeg} f_{lmc} + \alpha_{lmcdeg} f_{abc}) \{I_d, I_e, J_g\} 


where 


    \alpha_{abcdeg} = \frac{1}{24} f_{adi} f_{bej} f_{cgk} f_{ijk},   \{x_1, x_2, x_3\} = \sum_{i \neq j \neq k} x_i x_j x_k 


When g = sl_2 the first of these is trivial, while for 
g ≠ sl_2 the first implies the second. The co-unit is 
ε(I_a) = ε(J_a) = 0; the antipode is s(I_a) = -I_a, 
s(J_a) = -J_a + (1/2) f_{abc} I_c I_b. The Yangian may be obtained 
from U_h(ĝ') by expanding in powers of h. For 
the precise relationship, see Drinfeld (1985) and 
MacKay (2005). In the spin gradation, the automorphism 
[12] generated by D descends to Y(g) as 
I_a ↦ I_a, J_a ↦ J_a + u I_a. 
There are two other realizations of Y(g). The first 
(see, for example, Molev 2003) defines Y(gl_n) 
directly from 


    R(u - v) T_1(u) T_2(v) = T_2(v) T_1(u) R(u - v) 


where T_1(u) = T(u) ⊗ id, T_2(v) = id ⊗ T(v), and 


    T(u) = \sum_{i,j=1}^{n} t_{ij}(u) \otimes e_{ij} 
    t_{ij}(u) = \delta_{ij} + I_{ij} u^{-1} + J_{ij} u^{-2} + \cdots 


where e_{ij} are the standard matrix units for gl_n. The 
rational R-matrix for the n-dimensional representation 
of gl_n is 

    R(u - v) = 1 - \frac{P}{u - v},   where   P = \sum_{i,j=1}^{n} e_{ij} \otimes e_{ji} 


is the transposition operator. Y(gl_n) is then defined 
to be the algebra generated by I_{ij}, J_{ij}, and must be 
quotiented by the “quantum determinant” at its 
center to define Y(sl_n). The coproduct takes a 
particularly simple form, 


    \Delta(t_{ij}(u)) = \sum_{k} t_{ik}(u) \otimes t_{kj}(u) 
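The rational R-matrix above can be checked directly against the Yang-Baxter equation in its additive, difference form. The Python sketch below does this for n = 2 with arbitrarily chosen spectral parameters; the variable names and numerical values are illustrative assumptions.

```python
import numpy as np

n = 2  # dimension of the defining representation (assumed for the example)
I = np.eye(n)

# P = sum_{i,j} e_ij (x) e_ji exchanges the two tensor factors.
P = sum(np.kron(np.outer(I[i], I[j]), np.outer(I[j], I[i]))
        for i in range(n) for j in range(n))


def R(u):
    """Rational R-matrix R(u) = 1 - P/u on C^n (x) C^n."""
    return np.eye(n * n) - P / u


def R12(u):
    return np.kron(R(u), I)


def R23(u):
    return np.kron(I, R(u))


def R13(u):
    # Conjugating R12 by the swap of factors 2 and 3 gives R13.
    P23 = np.kron(I, P)
    return P23 @ R12(u) @ P23


u1, u2, u3 = 0.37, -1.2, 2.5  # arbitrary distinct spectral parameters
lhs = R12(u1 - u2) @ R13(u1 - u3) @ R23(u2 - u3)
rhs = R23(u2 - u3) @ R13(u1 - u3) @ R12(u1 - u2)

if __name__ == "__main__":
    print(np.allclose(lhs, rhs))
```

The same check works for any n, since only the permutation property of P is used.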


Here we do not give explicitly the third realization, 
namely Drinfeld’s “new” realization of Y(g) (Drinfeld 
1988), but we remark that it was in this presentation 
that Drinfeld found a correspondence between certain 
sets of polynomials and finite-dimensional irreducible 
representations of Y(g), thus classifying these (although 
not thereby deducing their dimension or constructing 
the action of Y(g)). As remarked earlier, the structure is 
as in the earlier section: Y(g) representations are in 
general g-reducible, and there is a set of r fundamental 
Y(g)-representations, containing the fundamental 
g-representations as components, from which all 
other representations can be constructed. 


Origins in the Quantum 
Inverse-Scattering Method 


Quantum affine algebras for general ĝ first appear in 
Drinfeld (1985, 1986) and Jimbo (1985, 1986), but 
they have their origin in the “quantum inverse-scattering 
method” (QISM) of the St. Petersburg 
school, and the essential features of U_q(sl_2) first 
appear in Kulish and Reshetikhin (1983). In this 
section, we explain how the quantization of the Lax-pair 
description of affine Toda theory led to the 
discovery of the U_q(ĝ) coproduct, commutation 
relations, and R-matrix. We use the normalizations 
of Jimbo (1986), in which the H_i are rescaled so that 
the Cartan matrix a_{ij} = α_i · α_j is symmetric. 
We begin with the affine Toda field equations 


2 r 
m 
OONO Z 3 X (ei? — T 
j=1 


an integrable model in R^{1+1} of r real scalar fields 
φ_i(x, t) with a mass parameter m and coupling 
constant β. Equivalently, we may write 
[∂_x + L_x, ∂_t + L_t] = 0 for the Lax pair 


a i010; + e(F/2)499i (EF + E7 
( ) 


1 
Ea ms (9/2409 (si a Ez) 
j=1 


n= ČS Hð; n m Şo e(/2ai (EF — E>) 
i=1 


- 1 
4 > e(9/2)aoi%j (E3 = Eo | 
j=1 


with arbitrary λ ∈ C. The classical integrability of the 
system is seen in the existence of r(λ, λ') such that 


    \{T(\lambda) \otimes_, T(\lambda')\} = [r(\lambda, \lambda'), T(\lambda) \otimes T(\lambda')] 


Lal wt) = 
where T(λ) = T(-∞, ∞; λ) and T(x, y; λ) = 
P exp(∫_x^y L(ξ; λ) dξ). Taking the trace of this relation 
gives an infinity of charges in involution. 

Quantization is problematic, owing to divergences 
in T. The QISM regularizes these by putting the 
model on a lattice of spacing Δ, defining the lattice 
Lax operator to be 


    L_n(\lambda) = T((n - 1/2)\Delta, (n + 1/2)\Delta; \lambda) 
                 = P \exp\left( \int_{(n-1/2)\Delta}^{(n+1/2)\Delta} L(\xi; \lambda) d\xi \right) 

The lattice monodromy matrix is then T(λ) = 
lim_{l→-∞, m→∞} T_l^m, where T_l^m = L_m L_{m-1} ··· L_{l+1}, 
and its trace again yields an infinity of commuting 
charges, provided that there exists a quantum 
R-matrix R(λ_1, λ_2) such that 


    R(\lambda_1, \lambda_2) L_n^1(\lambda_1) L_n^2(\lambda_2) 
        = L_n^2(\lambda_2) L_n^1(\lambda_1) R(\lambda_1, \lambda_2)         [19] 

where L_n^1(λ_1) = L_n(λ_1) ⊗ id, L_n^2(λ_2) = id ⊗ L_n(λ_2). 
That R solves the Yang-Baxter equation follows 
from the equivalence of the two ways of intertwining 
L_n(λ_1) ⊗ L_n(λ_2) ⊗ L_n(λ_3) with L_n(λ_3) ⊗ L_n(λ_2) ⊗ L_n(λ_1). 

To compute L_n(λ), one uses the canonical, equal-time 
commutation relations for the φ_i and their time 
derivatives. In terms of the lattice fields 


(n+(1/2))A 
Pin = J 
(n—(1/2))A 
(n+(1/2))A 
din -| 
(n—(1/2))A 


the only nontrivial relation is 


(ib3/2)diq),n, and one finds 
— exp! ES Hp, PY Hp, 
Li) =p 5 HiPin + exp z 2 HiPin 
i j 
m 7 
DA X din(Ej + E; ) 
(a8 58) 
+I TAE + z Fo 
een BS ao, i 
exp Zo HiPin FOLA) 
j 


the expression used by the St Petersburg school and 
by Jimbo. We now make the replacement 
E_i^± → q^{-H_i/4} E_i^± q^{H_i/4}, where q = exp(iħβ²/2), and 
compute the O(Δ) terms in [19], which reduce to 


di(x) dx 


e(9/2)aijOi(X) dx 
j 


ET A = 
    R(z) (H_i \otimes 1 + 1 \otimes H_i) 
        = (H_i \otimes 1 + 1 \otimes H_i) R(z) 

    R(z) (E_i^\pm \otimes q^{-H_i/2} + q^{H_i/2} \otimes E_i^\pm) 
        = (q^{-H_i/2} \otimes E_i^\pm + E_i^\pm \otimes q^{H_i/2}) R(z) 

    R(z) (z^{\pm 1} E_0^\pm \otimes q^{-H_0/2} + q^{H_0/2} \otimes E_0^\pm) 
        = (q^{-H_0/2} \otimes E_0^\pm + z^{\pm 1} E_0^\pm \otimes q^{H_0/2}) R(z) 


where z = λ_1/λ_2. We recognize in these the U_q(ĝ) 
coproduct and thus the intertwining relations, in the 
homogeneous gradation. These equations were 
solved for R in defining representations of 
nonexceptional ĝ by Jimbo (1986). 

For g = sl_2, it was Kulish and Reshetikhin (1983) 
who first discovered that the requirement that the 
coproduct must be an algebra homomorphism forces 
the replacement of the commutation relations of 
U(sl_2) by those of U_q(sl_2); more generally it requires 
the replacement of U(ĝ) by U_q(ĝ). 


Affine Quantum Group Symmetry 
and the Exact S-Matrix 


In the last section, we saw the origins of U_q(ĝ) in the 
“auxiliary” algebra introduced in the Lax pair. 
However, the quantum affine algebras also play a 
second role, as a symmetry algebra. An imaginary-coupled 
affine Toda field theory based on the affine 
algebra ĝ^∨ possesses the quantum affine algebra 
U_q(ĝ) as a symmetry algebra, where ĝ^∨ is the 
Langlands dual to ĝ (the algebra obtained by 
replacing roots by coroots). 

The solitonic particle states in affine Toda theories 
form multiplets which transform in the fundamental 
representations of the quantum affine algebra. Multiparticle 
states transform in tensor product representations 
V^a ⊗ V^b. The scattering of two solitons of type 
a and b with relative rapidity θ is described by the 
S-matrix S^{ab}(θ): V^a ⊗ V^b → V^b ⊗ V^a, graphically 
represented in Figure 1a. 

Figure 1 (a) Graphical representation of a two-particle 
scattering process described by the S-matrix S^{ab}(θ). (b) At 
special values θ^c_{ab} of the relative spectral parameter, the two 
particles of types a and b form a bound state of type c. 

It then follows from the 
symmetry that the two-particle scattering matrix 
(S-matrix) for solitons must be proportional to the 
intertwiner for these tensor product representations, 
the R-matrix: 


    S^{ab}(\theta) = f^{ab}(\theta) \check{R}^{ab}(\theta) 


with θ proportional to u, the additive spectral 
parameter. The scalar prefactor f^{ab}(θ) is not determined 
by the symmetry but is fixed by other 
requirements like unitarity, crossing symmetry, and 
the bootstrap principle. 

It turns out that the axiomatic properties of the 
R-matrices are in perfect agreement with the 
axiomatic properties of the analytic S-matrix. For 
example, crossing symmetry of the S-matrix, graphically 
represented by 

[Diagram: crossing symmetry of the two-particle S-matrix for particles a and b.] 

is a consequence of the property of the universal 
R-matrix with respect to the action of the antipode S, 


    (S \otimes 1) R = R^{-1} 


An S-matrix will have poles at certain imaginary 
rapidities θ^c_{ab} corresponding to the formation of 
virtual bound states. This is graphically represented 
in Figure 1b. The location of the pole is determined 
by the masses of the three particles involved, 


    m_c^2 = m_a^2 + m_b^2 + 2 m_a m_b \cos(i \theta^c_{ab}) 
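As a small worked example of the mass formula, the sketch below evaluates the bound-state mass for a pole at imaginary rapidity θ = iu with u real, so that cos(iθ) = cos(u). The function name and the sample values are illustrative assumptions, not data from the text.

```python
import math


def bound_state_mass(m_a, m_b, u):
    """Mass m_c from m_c^2 = m_a^2 + m_b^2 + 2 m_a m_b cos(u),
    for a bound-state pole at rapidity theta = i*u (u real)."""
    m_c_squared = m_a**2 + m_b**2 + 2.0 * m_a * m_b * math.cos(u)
    return math.sqrt(m_c_squared)


if __name__ == "__main__":
    # Two particles of equal mass fusing at u = 2*pi/3.
    print(bound_state_mass(1.0, 1.0, 2.0 * math.pi / 3.0))
```

For equal masses and u = 2π/3 the bound state has the same mass as its constituents, while u → 0 gives the threshold value m_a + m_b.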


At the bound state pole, the S-matrix will project 
onto the multiplet V^c. Thus, the R-matrix has to have 
this projection property as well and indeed, this turns 
out to be the case. The bootstrap principle, whereby 
the S-matrix for a bound state is obtained from the 
S-matrices of the constituent particles, 

[Diagram [21]: scattering of a particle d with the bound state c equals successive scattering of d with its constituents a and b.] 

is a consequence of the property [14] of the universal 
R-matrix with respect to the coproduct. 

There is a famous no-go theorem due to Coleman 
and Mandula which states the “impossibility of 
combining space-time and internal symmetries in 
any but a trivial way.” Affine quantum group 
symmetry circumvents this no-go theorem. In fact, 
the derivation D is the infinitesimal two-dimensional 
Lorentz boost generator and the other symmetry 


charges transform nontrivially under these Lorentz 
transformations, see [2]. 

The noncocommutative coproduct [8] means 
that a U_q(ĝ) symmetry generator, when acting on a 
2-soliton state, acts differently on the left soliton 
than on the right soliton. This is only possible 
because the generator is a nonlocal symmetry charge, 
that is, a charge which is obtained as the space 
integral of the time component of a current which 
itself is a nonlocal expression in terms of the fields 
of the theory. 

Similarly, many nonlinear sigma models possess 
nonlocal charges which form Y(g), and the construction 
proceeds similarly, now utilizing rational 
R-matrices, and with particle multiplets forming 
fundamental representations of Y(g). In each case, 
the three-point couplings corresponding to the 
formation of bound states, and thus the analogs for 
U_q(ĝ) and Y(g) of the Clebsch-Gordan couplings, 
obey a rather beautiful geometric rule originally 
deduced in simpler, purely elastic scattering models 
(Chari and Pressley 1996). 

More details about this topic can be found in 
Delius (1995) and MacKay (2005). 


Integrable Quantum Spin Chains 


Affine quantum groups provide an unlimited supply 
of integrable quantum spin chains. From any 
R-matrix R(θ) for any tensor product of finite-dimensional 
representations W ⊗ V, one can produce 
an integrable quantum system on the Hilbert 
space V^{⊗n}. This Hilbert space can then be interpreted 
as the space of n interacting spins. The space 
W is an auxiliary space required in the construction 
but not playing a role in the physics. 

Given an arbitrary R-matrix R(θ), one defines the 
monodromy matrix T(θ) ∈ End(W ⊗ V^{⊗n}) by 


    T(\theta) = R_{01}(\theta - \theta_1) R_{02}(\theta - \theta_2) \cdots R_{0n}(\theta - \theta_n) 


where, as usual, R_{ij} is the R-matrix acting on the 
ith and jth component of the tensor product space. 
The θ_i can be chosen arbitrarily for convenience. 
Graphically the monodromy matrix can be represented as 

[Diagram: the monodromy matrix acting on the spin spaces V_1, V_2, ..., V_{n-1}, V_n.] 


As a consequence of the Yang-Baxter equation 
satisfied by the R-matrices the monodromy matrix 
satisfies 


    R T T = T T R                                                           [22] 

or, graphically, 

[Diagram: graphical form of [22] for two monodromy matrices acting on V_1, V_2, ..., V_n.] 

One defines the transfer matrix 

    \tau(\theta) = tr_W T(\theta) 


which is now an operator on V^{⊗n}, the Hilbert space 
of the quantum spin chain. Due to [22], two transfer 
matrices commute, 


    [\tau(\theta), \tau(\theta')] = 0 


and thus the τ(θ) can be seen as a generating 
function of an infinite number of commuting 
charges, one of which will be chosen as the 
Hamiltonian. This Hamiltonian can then be diagonalized 
using the algebraic Bethe ansatz. 
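The construction of commuting transfer matrices can be tried out on a small chain. The sketch below builds the monodromy matrix from the rational R-matrix R(u) = 1 - P/u with W = V = C^2 and three sites, takes the partial trace over the auxiliary space, and evaluates the commutator of two transfer matrices. The choice of R-matrix, the chain length, and the inhomogeneities θ_j are assumptions made for the example.

```python
import numpy as np

d = 2  # dimension of each local space (auxiliary and quantum), assumed


def swap(n_sites, i, j):
    """Permutation operator exchanging tensor factors i and j of (C^d)^{(x) n_sites}."""
    dims = (d,) * n_sites
    dim = d ** n_sites
    P = np.zeros((dim, dim))
    for col in range(dim):
        digits = list(np.unravel_index(col, dims))
        digits[i], digits[j] = digits[j], digits[i]
        P[np.ravel_multi_index(tuple(digits), dims), col] = 1.0
    return P


def monodromy(theta, inhom):
    """T(theta) = R_01(theta - theta_1) ... R_0n(theta - theta_n), site 0 auxiliary."""
    n = len(inhom)
    dim = d ** (n + 1)
    T = np.eye(dim)
    for j, theta_j in enumerate(inhom, start=1):
        T = T @ (np.eye(dim) - swap(n + 1, 0, j) / (theta - theta_j))
    return T


def transfer(theta, inhom):
    """tau(theta): partial trace of the monodromy matrix over the auxiliary space."""
    D = d ** len(inhom)
    T = monodromy(theta, inhom).reshape(d, D, d, D)
    return np.trace(T, axis1=0, axis2=2)


inhom = [0.1, -0.4, 0.9]  # arbitrary inhomogeneities theta_j
tau_1 = transfer(2.3, inhom)
tau_2 = transfer(-1.7, inhom)
commutator_norm = np.linalg.norm(tau_1 @ tau_2 - tau_2 @ tau_1)

if __name__ == "__main__":
    print(commutator_norm)
```

The commutator vanishes to machine precision for any pair of spectral parameters, which is the numerical counterpart of taking the trace of [22].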

One is usually interested in the thermodynamic 
limit where the number of spins goes to infinity. In 
this limit, it has been conjectured, the Hilbert space 
of the spin chain carries a certain infinite-dimensional 
representation of the quantum affine algebra and this 
has been used to solve the model algebraically, using 
vertex operators (Jimbo and Miwa 1995). 


Boundary Quantum Groups 


In applications to physical systems that have a 
boundary, the Yang-Baxter equation [1] appears in 
conjunction with the boundary Yang-Baxter equation, 
also known as the reflection equation, 


    R_{12}(u - v) K_1(u) R_{21}(u + v) K_2(v) 
        = K_2(v) R_{12}(u + v) K_1(u) R_{21}(u - v)                         [23] 


The matrices K are known as reflection matrices. This 
equation was originally introduced by Cherednik to 
describe the reflection of particles from a boundary in 
an integrable scattering theory and was used by 
Sklyanin to construct integrable spin chains and 
quantum field theories with boundaries. 
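A simple solution of the reflection equation [23] can be verified numerically. The sketch below combines the rational R-matrix R(u) = 1 - P/u on C^2 ⊗ C^2 (for which R_21 = R_12) with the diagonal reflection matrix K(u) = ξ + u σ_z. Both choices, as well as the parameter values, are assumptions introduced for illustration and do not come from the text.

```python
import numpy as np

I2 = np.eye(2)
sigma_z = np.diag([1.0, -1.0])
P = np.array([[1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=float)  # swap on C^2 (x) C^2


def R(u):
    """Rational R-matrix; symmetric under exchange of factors, so R21 = R12."""
    return np.eye(4) - P / u


def K(u, xi=0.6):
    """Assumed diagonal reflection matrix K(u) = xi + u * sigma_z."""
    return xi * I2 + u * sigma_z


def K1(u):
    return np.kron(K(u), I2)


def K2(u):
    return np.kron(I2, K(u))


u, v = 0.8, -0.35  # arbitrary spectral parameters with u != +-v
lhs = R(u - v) @ K1(u) @ R(u + v) @ K2(v)
rhs = K2(v) @ R(u + v) @ K1(u) @ R(u - v)

if __name__ == "__main__":
    print(np.allclose(lhs, rhs))
```

The parameter ξ is free, so this gives a one-parameter family of reflection matrices for the same bulk R-matrix.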

Boundary quantum groups are certain co-ideal 
subalgebras of affine quantum groups. They provide 
the algebraic structures underlying the solutions of the 
boundary Yang-Baxter equation in the same way in 
which affine quantum groups underlie the solutions of 
the ordinary Yang-Baxter equation. Both allow one 
to find solutions of the respective Yang-Baxter 
equation by solving a linear intertwining relation. In 
the case without spectral parameters these algebras 
appear in the theory of braided groups (see Hopf 
Algebras and q-Deformation Quantum Groups and 
Braided and Modular Tensor Categories). 

For example, the subalgebra B_ε(ĝ) of U_h(ĝ') 
generated by 

    Q_i = q_i^{H_i/2} (E_i^+ + E_i^-) + \epsilon_i (q_i^{H_i} - 1),   i = 0, ..., r   [24] 

is a boundary quantum group for certain choices of 
the parameters ε_i ∈ C[[h]]. It is a left co-ideal 
subalgebra of U_h(ĝ') because 

    \Delta(Q_i) = Q_i \otimes 1 + q_i^{H_i} \otimes Q_i \in U_h(ĝ') \otimes B_\epsilon(ĝ)   [25] 

Intertwiners K(λ): V_{ηλ} → V_{η/λ} for some constant η 
satisfying 

    K(\lambda) \rho_{\eta\lambda}(Q) = \rho_{\eta/\lambda}(Q) K(\lambda),   for all Q \in B_\epsilon(ĝ)   [26] 


provide solutions of the reflection equation in the 
form 


    (id \otimes K^2(\mu)) \check{R}^{12}(\lambda\mu) (id \otimes K^1(\lambda)) \check{R}^{21}(\lambda/\mu) 
        = \check{R}^{12}(\lambda/\mu) (id \otimes K^1(\lambda)) 
          \times \check{R}^{21}(\lambda\mu) (id \otimes K^2(\mu))           [27] 

This can be extended to the case where the 
boundary itself carries a representation W of B_ε(ĝ). 


The boundary Yang-Baxter equation can be represented 
graphically as 

[Diagram: two particles, in V^1 and V^2, reflecting from a boundary carrying the representation W, with the two orderings of the bulk scatterings and the reflections.] 

Another example is provided by twisted Yangians 
where, when the I_a and J_a are constructed as 
nonlocal charges in sigma models, it is found that 
a boundary condition which preserves integrability 
leaves only the subset 


    I_i   and   \tilde{J}_p = J_p + \frac{1}{4} f_{piq} (I_i I_q + I_q I_i) 


conserved, where i labels the h-indices and p, q the 
k-indices of a symmetric splitting g = h + k. The 
algebra Y(g, h) generated by the I_i, \tilde{J}_p is, like B_ε(ĝ), 
a co-ideal subalgebra, Δ(Y(g, h)) ⊂ Y(g) ⊗ Y(g, h), 
and again yields an intertwining relation for 
K-matrices. For g = sl_n and h = so_n or sp_n, Y(g, h) 
is the “twisted Yangian” described in Molev (2003). 

All the constructions in earlier sections of this 
review have analogs in the boundary setting. For 
more details see Delius and MacKay (2003) and 
MacKay (2005). 


See also: Bethe Ansatz; Boundary Conformal Field 
Theory; Classical r-Matrices, Lie Bialgebras, and Poisson 
Lie Groups; Hopf Algebras and q-Deformation Quantum 
Groups; Riemann-Hilbert Problem; Solitons and 
Kac-Moody Lie Algebras; Yang-Baxter Equations. 

Introduction 


In classical electrodynamics, the interaction of charged 
particles with the electromagnetic field is local, 
through the pointlike coupling of the electric charge 
of the particles with the electric and magnetic fields, E 
and B, respectively. This is mathematically expressed 
by the Lorentz-force law. The scalar and vector 
potentials, φ and A, which are the time and space 
components of the relativistic 4-potential A_μ, are 
considered auxiliary quantities in terms of which 
the field strengths E and B, the observables, are 
expressed in a gauge-invariant manner. The homogeneous 
or first pair of Maxwell equations are a direct 
consequence of the definition of the field strengths in 
terms of A_μ. The inhomogeneous or second pair of 
Maxwell equations, which involve the charges and 
currents present in the problem, are also usually 
written in terms of E and B; however, when writing 
them in terms of A_μ, the number of degrees of freedom 
of the electromagnetic field is explicitly reduced from 
six to four; and finally, with two additional gauge 
transformations, one ends with the two physical 
degrees of freedom of the electromagnetic field. 

In quantum mechanics, however, both the 
Schrödinger equation and the path-integral approaches 
for scalar and unpolarized charged particles in the 
presence of electromagnetic fields are written in 
terms of the potential and not of the field strengths. 
Even in the case of the Schrödinger-Pauli equation 
for spin-1/2 electrons with magnetic moment μ 
interacting with a magnetic field B, one knows that 
the coupling -μ·B is the nonrelativistic limit of the 
Dirac equation, which depends on A_μ but not on E and 
B. Since gauge invariance also holds in the quantum 
domain, it was thought that A and φ were mere 
auxiliary quantities, as in the classical case. 

Aharonov and Bohm, in 1959, predicted a quantum 
interference effect due to the motion of charged 
particles in regions where B (E) vanishes, but not 
A (φ), leading to a nonlocal gauge-invariant effect 
depending on the flux of the magnetic field in the 
inaccessible region, in the magnetic case, and on the 
difference of the integrals over time of time-varying 
potentials, in the electric case. (The magnetic effect 
was already noticed 10 years before by Ehrenberg 
and Siday in a paper on the refractive index of 
electrons.) 

In the context of the Schrödinger equation, one 
can show that due to gauge invariance, if ψ_0 is a 
solution to the equation in the absence of an 
electromagnetic potential, then the product of 
ψ_0(x) and the phase factor built from the integral of A_μ over a path joining 
an arbitrary reference point x_0 to x is also a 
solution, if the integral is path independent. However, 
it is the path integral of Feynman which in the 
formulas for propagators of charged particles in the 
presence of electromagnetic fields clearly shows that 
the action of these fields on charged particles is 
nonlocal, and it is given by the celebrated nonintegrable 
(path-dependent) phase factor of Wu and 
Yang (1975). Moreover, this fact provides an 
additional proof of the nonlocal character of 
quantum mechanics: to surround fluxes, or to 
develop a potential difference, the particle has to 
travel simultaneously at least through two paths. 

Thus, the fact that the Aharonov-Bohm (A-B) 
effect was verified experimentally, by Chambers and 
others, demonstrates the necessity of introducing the 
(gauge-dependent) potential A_μ in describing the 
electromagnetic interactions of the quantum particle. 
This is widely regarded as the single most 
important piece of evidence for electromagnetism 
being a gauge theory. Moreover, it shows, to 
paraphrase Yang, that the field underdescribes the 
physical theory, while the potential overdescribes it, 
and it is the phase factor which describes it exactly. 

The content of this article is essentially twofold. 
The first four sections are mainly physical, where we 
describe the magnetic A-B effect using the 
Schrödinger equation and the Feynman path inte- 
gral. The fifth section is geometrical and is the long- 
est of the article. We describe the effect in the 
context of fiber bundles and connections, namely 
as a result of the coupling of the wave function 
(section of an associated bundle) to a nontrivial 
flat connection (non-pure gauge vector potential 
with zero magnetic field) in a trivial bundle (the 
A-B bundle) with topologically nontrivial (non- 
simply-connected) base space. We discuss the mod- 
uli space of flat connections and the holonomy 
groups giving the phase shifts of the interference 
patterns. Finally, in the last section, we briefly 
comment on the nonabelian A-B effect. 


Electromagnetic Fields in Classical Physics 


In classical physics, the motion of charged particles 
in the presence of electromagnetic fields is governed 
by the equation 

    \frac{d}{dt} p = q \left( E + \frac{v}{c} \times B \right)              [1] 

where 

    p = \frac{m v}{\sqrt{1 - v^2/c^2}} 


is the mechanical momentum of the particle with 
electric charge q, mass m, and velocity v = dx/dt (c is 
the velocity of light in vacuum, and for |v| ≪ c the 
left-hand side (LHS) of [1] is approximately m dv/dt); the 
right-hand side (RHS) is the Lorentz force, where E 
and B are, respectively, the electric and magnetic 
fields at the spacetime point (t, x) where the particle 
is located. Equation [1] is easily derived from the 
Euler-Lagrange equation 


    \frac{d}{dt} \left( \frac{\partial L}{\partial v} \right) - \frac{\partial L}{\partial x} = 0   [2] 

with the Lagrangian L given by the sum of the free 
Lagrangian for the particle, 


    L_0 = -m c^2 \sqrt{1 - v^2/c^2}                                         [3] 


and the Lagrangian describing the particle-field 
interaction, 


    L_{int} = \frac{q}{c} A \cdot v - q \varphi                             [4] 


In [4], A and φ are, respectively, the vector potential 
and the scalar potential, which together form the 
4-potential A_μ = (A_0, -A) = (φ, -A^i), i = 1, 2, 3, 
in terms of which the electric and magnetic field 
strengths are given by 


    E = -\frac{1}{c} \frac{\partial A}{\partial t} - \nabla \varphi         [5a] 
    B = \nabla \times A                                                     [5b] 


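The classical action corresponding to a given path of 
the particle is 


    S = \int_{t_1}^{t_2} dt L = \int_{t_1}^{t_2} dt (L_0 + L_{int}) 
      = \int_{t_1}^{t_2} dt L_0 + \int_{t_1}^{t_2} dt L_{int} = S_0 + S_{int}   [6] 


E, B, and S are invariant under the gauge 
transformation 

    A \to A' = A - \nabla \Lambda                                           [7a] 
    \varphi \to \varphi' = \varphi + \frac{1}{c} \frac{\partial \Lambda}{\partial t}   [7b] 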


where Λ is a real-valued differentiable scalar 
function (at least of class C^2) on spacetime. That 
is, if E', B', and S'_int are defined in terms of A' and 
φ' as E, B, and S_int are defined in terms of A and 
φ, then E' = E, B' = B, and S'_int = S_int (the latter up to 
a term depending only on Λ at the endpoints of the path, 
which does not affect the equations of motion). This fact 
leads to the concept that, classically, the observables 
E and B are the physical quantities, while A_μ 
is only an auxiliary quantity. Also, and most 
important in the present context, eqn [1] states 
that the motion of the particles is determined by 
the values or state of the field strengths in an 
infinitesimal neighborhood of the particles, that is, 
classically, E and B act locally. If one defines the 
differential 1-form A = A_μ dx^μ (with dx^0 = c dt), 
then the components of the differential 2-form 

    F = dA = \frac{1}{2} (\partial_\mu A_\nu - \partial_\nu A_\mu) dx^\mu \wedge dx^\nu = \frac{1}{2} F_{\mu\nu} dx^\mu \wedge dx^\nu   [8] 

are precisely the electric and magnetic 
fields: with the definitions [5a] and [5b], 

    F_{0i} = E^i,   F_{ij} = -\epsilon_{ijk} B^k 


At the level of A_μ, 

    dF = d^2 A = 0                                                          [9] 


is an identity, but at the level of E and B, [9] 
amounts to the homogeneous (or first pair of) 
Maxwell equations obeyed by the field strengths: 


    \nabla \cdot B = 0                                                      [10a] 
    \nabla \times E + \frac{1}{c} \frac{\partial B}{\partial t} = 0         [10b] 


Therefore, these equations have a geometrical 
origin. The second pair of Maxwell equations is 
dynamical, and is obtained from the field action (in 
the Heaviside system of units) 


S= -g | aise, PY [11] 
which leads to 

    \nabla \cdot E = 4 \pi \rho                                             [12a] 
    \nabla \times B - \frac{1}{c} \frac{\partial E}{\partial t} = \frac{4\pi}{c} j   [12b] 


where (ρ, -j) = (j^0, -j^i) is the 4-current satisfying, as a 
consequence of [12a] and [12b], the conservation law 


    \partial_\mu j^\mu = 0                                                  [13] 


For a pointlike particle, ρ(t, x) = q δ^3(x - x(t)) and 
j = ρv. 
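The gauge invariance of E and B under [7a] and [7b], stated earlier in this section, can be checked numerically at a sample spacetime point. In the sketch below the potentials A and φ, the gauge function Λ, the evaluation point, and the choice of units with c = 1 are all arbitrary assumptions made for illustration; the fields are obtained from [5a] and [5b] by central finite differences.

```python
import numpy as np

C = 1.0  # speed of light in the chosen units (assumption)


def A(t, r):
    x, y, z = r
    return np.array([-y * t, x**2, z * t])


def phi(t, r):
    x, y, z = r
    return x * y * z + t**2


# Gauge function Lambda = sin(x y) + t z^2 and its analytic derivatives.
def grad_Lambda(t, r):
    x, y, z = r
    return np.array([y * np.cos(x * y), x * np.cos(x * y), 2.0 * t * z])


def dt_Lambda(t, r):
    return r[2] ** 2


def A_prime(t, r):
    return A(t, r) - grad_Lambda(t, r)          # [7a]


def phi_prime(t, r):
    return phi(t, r) + dt_Lambda(t, r) / C      # [7b]


def fields(A_fn, phi_fn, t, r, h=1e-4):
    """E = -(1/c) dA/dt - grad(phi) and B = curl(A) by central differences."""
    r = np.asarray(r, dtype=float)
    dA_dt = (A_fn(t + h, r) - A_fn(t - h, r)) / (2 * h)
    jac = np.zeros((3, 3))   # jac[i, j] = dA_i / dx_j
    grad_phi = np.zeros(3)
    for j in range(3):
        dr = np.zeros(3)
        dr[j] = h
        jac[:, j] = (A_fn(t, r + dr) - A_fn(t, r - dr)) / (2 * h)
        grad_phi[j] = (phi_fn(t, r + dr) - phi_fn(t, r - dr)) / (2 * h)
    E = -dA_dt / C - grad_phi
    B = np.array([jac[2, 1] - jac[1, 2],
                  jac[0, 2] - jac[2, 0],
                  jac[1, 0] - jac[0, 1]])
    return E, B


point = (0.3, [0.5, -1.1, 0.7])
E, B = fields(A, phi, *point)
E_prime, B_prime = fields(A_prime, phi_prime, *point)

if __name__ == "__main__":
    print(np.abs(E - E_prime).max(), np.abs(B - B_prime).max())
```

The potentials themselves differ at the chosen point, while E and B agree to within the finite-difference error.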


Electromagnetic Fields in Quantum 
Physics 


In quantum physics, the motion of charged particles in 
external electromagnetic fields is governed by the 
Schrödinger equation or, equivalently, by the Feynman 
path integral. In both cases, however, it is the 
4-potential A_μ which appears in the equations, and 
not the field strengths. For simplicity, we consider here 
scalar (spinless) charged particles or unpolarized 
electrons (spin-1/2 particles), both of which, in the 
nonrelativistic approximation, can be described quantum 
mechanically by a complex wave function ψ(t, x). 

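To derive the Schrödinger equation, one starts 
from the classical Hamiltonian 


    H = P \cdot v - L - m c^2 = \frac{1}{2m} \left( P - \frac{q}{c} A \right)^2 + q \varphi   [14] 

where 

    P = \frac{\partial L}{\partial v} = p + \frac{q}{c} A 


is the canonical momentum of the particle, and we 
have subtracted its rest energy. The replacements 
P → -iħ∇ and H → iħ ∂/∂t lead to 


    i\hbar \frac{\partial \psi}{\partial t} 
        = \left( \frac{1}{2m} \left( -i\hbar \nabla - \frac{q}{c} A \right)^2 + q \varphi \right) \psi 
        = \left( -\frac{\hbar^2}{2m} \nabla^2 + \frac{q^2}{2mc^2} A^2 
          + \frac{i\hbar q}{2mc} \nabla \cdot A + \frac{i\hbar q}{mc} A \cdot \nabla + q \varphi \right) \psi   [15] 


The gauge transformation [7a] and [7b] is a 
symmetry of this equation, if simultaneously to the 
change of the 4-potential, the wave function transforms 
as follows: 


    \psi \to \psi' = e^{-(iq/\hbar c) \Lambda} \psi                          [7c] 


So, A'_μ and ψ' obey [15]. At each (t, x), e^{-(iq/ħc)Λ} 
belongs to U(1), the unit circle in the complex plane. 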

In the path-integral approach, the kernel 
K(t', x'; t, x), which gives the probability amplitude 
for the propagation of the particle from the spacetime 
point (t, x) to the spacetime point (t', x') (t < t'), is 
given by 


    K(t', x'; t, x) 
        = \int_{x(t)=x}^{x(t')=x'} Dx(\tau) \exp\left( \frac{i}{\hbar} (S_0 + S_{int}) \right) 
        = \int_{x(t)=x}^{x(t')=x'} Dx(\tau) \exp\left( \frac{i}{\hbar} \int_t^{t'} d\tau 
          \left( \frac{1}{2} m \dot{x}(\tau)^2 + \frac{q}{c} A \cdot v - q \varphi \right) \right) 
        = \int_{x(t)=x}^{x(t')=x'} Dx(\tau) \exp\left( \frac{i}{\hbar} \int_t^{t'} d\tau \frac{1}{2} m \dot{x}(\tau)^2 \right) 
          \times \exp\left( -\frac{iq}{\hbar c} \int_{(t,x)}^{(t',x')} dx^\mu A_\mu \right)   [16] 


where the integral ∫ Dx(τ) ... is over all continuous 
spacetime paths (τ, x(τ)) which join (t, x) with (t', x'). 
If one knows the wave function at (t, x), then the 
wave function at (t', x') is given by 


    \psi(t', x') = \int d^3x K(t', x'; t, x) \psi(t, x)                     [17] 


An important point is the natural appearance in the 
integrand of the functional integral of the factor 


    e^{-(iq/\hbar c) \int_\gamma A} 


for each path γ joining (t, x) with (t', x'). 


A Solution to the Schrodinger Equation 


In what follows, we shall restrict ourselves to static 
magnetic fields; then in the previous formulas, we 
set φ = 0 and A(t, x) = A(x). It is then easy to 
show that if x_0 is an arbitrary reference point and 
the integral ∫_{x_0}^{x} A(x')·dx' is independent of the 
integration path from x_0 to x, that is, it is a well-defined 
function f of x, and if ψ_0 is a solution of 
the free Schrödinger equation, that is, 


    i\hbar \frac{\partial}{\partial t} \psi_0 = -\frac{\hbar^2}{2m} \nabla^2 \psi_0   [18] 


then 


    \psi(t, x) = \exp\left( \frac{iq}{\hbar c} \int_{x_0}^{x} A(x') \cdot dx' \right) \psi_0(t, x)   [19] 


is a solution of [15]. In fact, replacing [19] in [15], 
the LHS gives 


    \exp\left( \frac{iq}{\hbar c} f \right) i\hbar \frac{\partial \psi_0}{\partial t} 


while for the RHS one has 


    \exp\left( \frac{iq}{\hbar c} f \right) \left( -\frac{\hbar^2}{2m} \nabla^2 \psi_0 \right) 

The cancelation of the exponential factors shows 
that, under the condition of path independence, 
there is no effect of the potential on the charged 
particles. Another way to see this is by making a 
gauge transformation [7a]-[7c] with Λ(x) = f(x), 
which changes ψ → ψ_0 and A → A' = A - ∇ 
∫_{x_0}^{x} A(x')·dx' = A - A = 0. 

The condition of path independence amounts, however, to the condition that no magnetic field is present since, if $\int_\gamma A$ depends on $\gamma$, then for some pair of paths $\gamma$ and $\gamma'$ from $(t,x)$ to $(t',x')$, $0 \neq \int_\gamma A - \int_{\gamma'}A = \int_\gamma A + \int_{-\gamma'}A = \oint_{\gamma\cup(-\gamma')}A = \int_\Sigma d\sigma\cdot(\nabla\times A)$, where in the last equality we applied Stokes theorem ($\Sigma$ is any surface with boundary $\gamma\cup(-\gamma')$), which shows that $B = \nabla\times A$ must not vanish everywhere and has a nonzero flux $\Phi$ through $\Sigma$ given by

$$\Phi = \int_\Sigma d\sigma\cdot B$$ [20]


The conclusion of this section is that the ansatz [19] for 
solving [15] can only be applied in simply connected 
regions with no magnetic field strength present. 
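
The cancellation used above rests on a single identity, which can be written out as a short worked step. This is an illustrative restatement under the assumptions of this section ($\varphi = 0$, static $A$, and $\nabla f = A$), not an additional result:

```latex
\begin{align*}
\Bigl(-i\hbar\nabla - \tfrac{q}{c}A\Bigr)\Bigl(e^{(iq/\hbar c)f}\,\psi_0\Bigr)
  &= e^{(iq/\hbar c)f}\Bigl(\tfrac{q}{c}(\nabla f)\,\psi_0 - i\hbar\nabla\psi_0 - \tfrac{q}{c}A\,\psi_0\Bigr) \\
  &= e^{(iq/\hbar c)f}\,\bigl(-i\hbar\nabla\psi_0\bigr)
  \qquad\text{since } \nabla f = A .
\end{align*}
```

Applying the operator a second time gives $e^{(iq/\hbar c)f}(-\hbar^2\nabla^2\psi_0)$, so that after division by $2m$ the RHS of [15] reduces to the free expression in [18] multiplied by the same exponential factor.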


Aharonov-Bohm Proposal 


In 1959, Aharonov and Bohm proposed an experiment to test, in quantum mechanics, the coupling of electric charges to electromagnetic field strengths through a local interaction with the electromagnetic potential $A_\mu$, but not with the field strengths themselves. However, as we saw before, no physical effect exists, that is, $A_\mu$ can be gauged away, unless magnetic and/or electric fields exist somewhere, although not necessarily overlapping the wave function of the particles.

Figure 1 Magnetic Aharonov-Bohm effect.

Consider the usual two-slit experiment as depicted in Figure 1, with the additional presence, behind the slits, of a long and narrow solenoid enclosing a nonvanishing magnetic flux $\Phi$ due to a constant and homogeneous magnetic field $B$ normal to the plane of the figure (in direction $z$); outside of the solenoid, the magnetic field is zero. If the radius of the solenoid is $R$, a vector potential $A$ that produces such field strength is given by


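$$A(x) = \begin{cases}(|B|r/2)\,\hat{\varphi}, & r\le R\\ (\Phi/2\pi r)\,\hat{\varphi}, & r\ge R\end{cases}$$ [21]

where $\Phi = \pi R^2|B|$ and $\hat{\varphi}$ is a unit vector in the azimuthal direction. In fact,

$$B = \nabla\times A(x) = \begin{cases}|B|\,\hat{z}, & r<R\\ 0, & r>R\end{cases}$$ [22]
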
Notice that at r= R, A is continuous but not 
continuously differentiable. Also, the ideal limit of 
an infinitely long solenoid makes the problem two- 
dimensional, that is, in the x-y plane. 
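
As a numerical sanity check of [21] and [22], the following Python sketch evaluates the circulation of $A$ around circles centred on the solenoid axis. It is an illustration only: the function names `vector_potential` and `circulation`, the step count, and the sample values of $|B|$ and $R$ are assumptions made for this example. By Stokes theorem the circulation should equal the enclosed flux, that is, $\pi r^2|B|$ for $r<R$ and $\Phi = \pi R^2|B|$ for $r>R$.

```python
import math

def vector_potential(x, y, B, R):
    """Cartesian components of A from [21] for a solenoid of radius R on the z axis."""
    r = math.hypot(x, y)
    if r == 0.0:
        return (0.0, 0.0)
    flux = math.pi * R**2 * B
    # Azimuthal component: linear in r inside the solenoid, 1/r outside.
    a_phi = 0.5 * B * r if r <= R else flux / (2.0 * math.pi * r)
    # The azimuthal unit vector is (-y/r, x/r).
    return (-a_phi * y / r, a_phi * x / r)

def circulation(radius, B, R, steps=20000):
    """Midpoint-rule estimate of the line integral of A around a centred circle."""
    total = 0.0
    dphi = 2.0 * math.pi / steps
    for k in range(steps):
        phi = (k + 0.5) * dphi
        x, y = radius * math.cos(phi), radius * math.sin(phi)
        ax, ay = vector_potential(x, y, B, R)
        # Line element: dl = radius * dphi * (-sin(phi), cos(phi)).
        total += (-ax * math.sin(phi) + ay * math.cos(phi)) * radius * dphi
    return total
```

Outside the solenoid the result does not depend on the radius of the circle, which is the statement that $A$ is closed but not exact in the region available to the electrons.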

The probability amplitude for an electron emitted 
at the source S to arrive at the point P on the screen 
II, is given by the sum of two probability ampli- 
tudes, namely those corresponding to passing 
through the slits 1 and 2. The solenoid is assumed 
to be impenetrable to the electrons; mathematically, 
this corresponds to a motion in a non-simply- 
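connected region. In the approximation for the path integral [16], in which one considers the contribution of only two classes of paths, that is, the class $\{\gamma\}$ represented by path I, and the class $\{\gamma'\}$ represented by path II, if the wave function at the source is $\psi_S$, then the wave function at P is given by

$$\begin{aligned}
\psi_P &= \int_{\{\gamma\}}e^{(i/\hbar)S_0(\gamma)}e^{-(i|e|/\hbar c)\int_\gamma A}\psi_S + \int_{\{\gamma'\}}e^{(i/\hbar)S_0(\gamma')}e^{-(i|e|/\hbar c)\int_{\gamma'}A}\psi_S\\
&= e^{-(i|e|/\hbar c)\int_{\rm I}A}\int_{\{\gamma\}}e^{(i/\hbar)S_0(\gamma)}\psi_S + e^{-(i|e|/\hbar c)\int_{\rm II}A}\int_{\{\gamma'\}}e^{(i/\hbar)S_0(\gamma')}\psi_S\\
&= e^{-(i|e|/\hbar c)\int_{\rm I}A}\left(\psi_P^0({\rm I}) + e^{(i|e|/\hbar c)\oint_{{\rm I}\cup(-{\rm II})}A}\,\psi_P^0({\rm II})\right)\\
&= e^{-(i|e|/\hbar c)\int_{\rm I}A}\left(\psi_P^0({\rm I}) + e^{2\pi i\Phi/\Phi_0}\,\psi_P^0({\rm II})\right)
\end{aligned}$$ [23]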


where, in the second line, we used the path independence of the integral of $A$ within each class of paths;

$$\psi_P^0({\rm I}) = \int_{\{\gamma\}}e^{(i/\hbar)S_0(\gamma)}\psi_S$$

and

$$\psi_P^0({\rm II}) = \int_{\{\gamma'\}}e^{(i/\hbar)S_0(\gamma')}\psi_S$$

and, in the last equality, we applied the extended version of Stokes theorem (by Craven), to allow for noncontinuously differentiable vector potentials; and the quantum of magnetic flux associated with the charge $|e|$ is defined by

$$\Phi_0 = 2\pi\frac{\hbar c}{|e|}\cong 4.135\times 10^{-7}\ {\rm G\,cm^2}$$ [24]

($= 2\pi/|e| = \sqrt{\pi/\alpha}\cong\sqrt{137\pi}$ in the natural system of units (n.s.u.) $\hbar = c = 1$; $\alpha$ is the fine structure constant). Then the probability of finding the electron at P is proportional to

$$|\psi_P|^2 = |\psi_P^0({\rm I})|^2 + |\psi_P^0({\rm II})|^2 + 2\,{\rm Re}\left(e^{2\pi i\Phi/\Phi_0}\,\psi_P^0({\rm I})^*\,\psi_P^0({\rm II})\right)$$ [25]


which exhibits an interference pattern shifted with respect to that without the magnetic field: as $B$ and therefore $\Phi$ change, dark and bright interference fringes alternate periodically at the screen II, with period $\Phi_0$. This is the magnetic A-B effect, which has been quantitatively verified in many experiments, the first one in 1960 by Chambers. The effect is:


1. gauge invariant, since $B$ and therefore $\Phi$ are
gauge invariant;

2. nonlocal, since it depends on the magnetic field 
inside the solenoid, where the electrons never 
enter; 

3. quantum mechanical, since classically the charges 
do not feel any force and therefore no effect 
would be expected in this limit; and 

4. topological, since the electrons necessarily move 
in a non-simply-connected space. 
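
The periodicity of the fringe pattern in the enclosed flux can be checked with a few lines of Python. The sketch below is illustrative only: it evaluates $|\psi_P^0({\rm I}) + e^{2\pi i\Phi/\Phi_0}\psi_P^0({\rm II})|^2$ for given complex amplitudes, and computes the flux quantum $hc/|e|$ from Gaussian-unit constants. The names `PHI_0` and `intensity`, and the sample amplitudes used when calling it, are assumptions made for the example.

```python
import cmath
import math

# Gaussian (CGS) values, used only to evaluate the flux quantum hc/|e|.
H_PLANCK = 6.62607015e-27   # erg s
C_LIGHT = 2.99792458e10     # cm / s
E_CHARGE = 4.80320471e-10   # esu

PHI_0 = H_PLANCK * C_LIGHT / E_CHARGE   # flux quantum in G cm^2

def intensity(flux, amp_1, amp_2, phi_0=PHI_0):
    """|amp_1 + exp(2 pi i flux / phi_0) * amp_2|^2, the flux-dependent part of |psi_P|^2."""
    phase = cmath.exp(2j * math.pi * flux / phi_0)
    return abs(amp_1 + phase * amp_2) ** 2
```

With equal amplitudes, a flux of half a quantum turns a bright point into a dark one, and adding a whole quantum restores the original value, which is the content of the statement that the pattern has period $\Phi_0$.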


But perhaps the most important implication of the A-B effect is a dramatic additional confirmation of the nonlocal character of quantum mechanics: the electron has to “travel” along the two paths (I and II) simultaneously; otherwise, no flux would be enclosed and no shift of the (then nonexistent) interference fringes would be observed at the screen II.

Calculations in the path-integral approach including the whole set of homotopy classes of paths around the solenoid, indexed by an integer $m$, have been performed by several authors, leading to a formula of the type

$$\psi_P = \sum_{m=-\infty}^{\infty}e^{im\delta}\,\psi_P^0(m)$$ [26]

with

$$\delta = 2\pi\frac{\Phi}{\Phi_0}$$ [27]

(Schulman 1971, Kobe 1979). As in [23],

$$\psi_P(\Phi + k\Phi_0) = \psi_P(\Phi),\quad k\in\mathbb{Z}$$ [28]


There is a close relation between the A-B effect and the Dirac quantization condition (DQC) in the presence of electric and magnetic charges: according to [25] (or [26]) the A-B effect disappears when the flux $\Phi$ equals $n\Phi_0 = 2\pi n(\hbar c/|e|)$, $n\in\mathbb{Z}$, that is, when the condition

$$|e|\Phi = nhc$$ [29]

holds (with $h = 2\pi\hbar$). But this is the DQC (Dirac 1931) when $\Phi$ is the flux associated with a magnetic charge $g$: $\Phi(g) = (g/4\pi r^2)\times 4\pi r^2 = g$, leading to $|e|g = nhc$ ($2\pi n$ in the n.s.u.). This is precisely the condition for the Dirac string to be unobservable in quantum mechanics: to give no A-B effect.


Geometry of the A-B Effect 


In this section we study the space of gauge classes of flat potentials outside the solenoid, which determine the A-B effect; the topological structure of the A-B bundle; and the holonomy groups of the connections, which precisely give the phase shifts of the wave functions. We use the n.s.u. system; in particular, if $[L]$ is the unit of length, then $[A_\mu] = [L]^{-1}$, $[|e|] = [L]^0$, and $\Phi_0 = 2\pi/|e| = \sqrt{\pi/\alpha}\cong\sqrt{137\pi}$, where $\alpha$ is the fine structure constant.

To synthesize, one can say that the abelian A-B 
effect is a nonlocal gauge-invariant quantum effect 
due to the coupling of the wave function (section of 
an associated bundle) to a nontrivial (non-exact) flat 
(closed) connection in a trivial principal bundle with 
a non-simply-connected base space. In the following 
subsections, we will give a detailed explanation of 
these statements. 


The A-B Bundle 


The gauge group of electromagnetism is the abelian Lie group U(1) with Lie algebra (the tangent space at the identity) $u(1) = i\mathbb{R}$. In the limit of an infinitely long and infinitesimally thin solenoid carrying the magnetic flux $\Phi$, the space available to the electrons is the plane minus a point, that is, $\mathbb{R}^{2*}$, which is of the same homotopy type as the circle $S^1$. Then the set of isomorphism classes of U(1) bundles over $\mathbb{R}^{2*}$ is in one-to-one correspondence with the set of homotopy classes of maps from $S^0$ to $S^1$ (Steenrod 1951), which consists of only one point: if $f,g: S^0\to S^1$ are given by $f(1) = e^{i\theta_1}$, $f(-1) = e^{i\theta_2}$, $g(1) = e^{i\theta'_1}$, and $g(-1) = e^{i\theta'_2}$, then $H: S^0\times[0,1]\to S^1$ given by $H(1,t) = e^{i((1-t)\theta_1 + t\theta'_1)}$ and $H(-1,t) = e^{i((1-t)\theta_2 + t\theta'_2)}$ is a homotopy between $f$ and $g$. Then, up to equivalence, the relevant bundle for the A-B effect is the product bundle

$$\xi_{A-B}: U(1)\to\mathbb{R}^{2*}\times U(1)\to\mathbb{R}^{2*}$$ [30a]

Since $\mathbb{R}^{2*}$ is homeomorphic to an open disk minus a point, $(D_0^2)^*$, the total space of the bundle is homeomorphic to an open solid 2-torus minus a circle, since $(T_0^2)^* = (D_0^2)^*\times S^1$. Then the A-B bundle has the topological structure

$$\xi_{A-B}: S^1\to(T_0^2)^*\to(D_0^2)^*$$ [30b]


The Gauge Group and the Moduli Space of Flat 
Connections 


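The gauge group of the bundle $\xi_{A-B}$ is the set of smooth functions from the base space to the structure group, that is, $\mathcal{G} = C^\infty(\mathbb{R}^{2*}, U(1))$. Since $\mathcal{G}\subset C^0(\mathbb{R}^{2*}, U(1)) = \{$continuous functions $\mathbb{R}^{2*}\to U(1)\}$ and $[\mathbb{R}^{2*}, U(1)] = \{$homotopy classes of continuous functions $\mathbb{R}^{2*}\to U(1)\}\cong[S^1,S^1]\cong\pi_1(S^1)\cong\mathbb{Z}$, given $f\in\mathcal{G}$ there exists a unique $n\in\mathbb{Z}$ such that $f$ is homotopic to $f_n$ ($f\sim f_n$), where $f_n:\mathbb{R}^{2*}\to U(1)$ is given by $f_n(re^{i\varphi}) = e^{in\varphi}$, $\varphi\in[0,2\pi)$.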
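
The integer $n$ labelling the homotopy class of a gauge transformation can be computed numerically as a winding number. The Python sketch below is an illustration under stated assumptions: the helper names `winding_number` and `f_n` are hypothetical, and the method (summing small phase increments along a circle around the origin) assumes the map is sampled finely enough that successive phases differ by less than $\pi$.

```python
import cmath
import math

def winding_number(f, radius=1.0, steps=2000):
    """Degree of the U(1)-valued map f restricted to a circle around the origin."""
    total = 0.0
    prev = f(complex(radius, 0.0))
    for k in range(1, steps + 1):
        z = radius * cmath.exp(2j * math.pi * k / steps)
        cur = f(z)
        # Phase increment in (-pi, pi]; correct while successive samples stay close.
        total += cmath.phase(cur / prev)
        prev = cur
    return round(total / (2.0 * math.pi))

def f_n(n):
    """The representative map r e^{i phi} -> e^{i n phi} of the class labelled by n."""
    return lambda z: (z / abs(z)) ** n
```

A map of the form $e^{ig}$ with $g$ a globally defined real function has winding number zero, which is the numerical counterpart of the statement that such maps are homotopic to the constant map.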

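$\mathcal{G}$ acts on the space of flat connections on $\xi_{A-B}$ given by the closed $u(1)$-valued differential 1-forms on $\mathbb{R}^{2*}$:

$$\mathcal{C}_0 = \{A\in\Omega^1(\mathbb{R}^{2*};u(1)),\ dA = 0\}$$ [31]

through

$$\mathcal{C}_0\times\mathcal{G}\to\mathcal{C}_0,\quad (A,f)\mapsto A + f^{-1}df$$ [32]

where $f^{-1}(x,y) = (f(x,y))^{-1}$. The moduli space

$$\mathcal{M}_0 = \frac{\mathcal{C}_0}{\mathcal{G}} = \{\text{gauge equivalence classes of flat connections on } \xi_{A-B}\} = \{[A] = \{A + f^{-1}df,\ f\in\mathcal{G}\},\ A\in\mathcal{C}_0\}$$ [33]

is isomorphic to the circle $S^1$ with length 1. This can be seen as follows: the de Rham cohomology of $\mathbb{R}^{2*}$ with coefficients in $i\mathbb{R}$ in dimension 1 is

$$H^1_{DR}(\mathbb{R}^{2*};i\mathbb{R}) = \{\lambda[A_0]_{DR}:\ \lambda\in\mathbb{R}\}\cong H^1_{DR}(S^1;i\mathbb{R})\cong\mathbb{R}$$ [34]

where

$$A_0 = i\,\frac{x\,dy - y\,dx}{x^2+y^2}\in\mathcal{C}_0$$ [35]

is the connection that, once multiplied by $-|e|^{-1}$ (see below), generates the flux $-\Phi_0$ and therefore no A-B effect: $A_0$ is closed ($dA_0 = 0$) but not exact ($(x\,dy - y\,dx)/(x^2+y^2) = d\varphi$ only for $\varphi\in(0,2\pi)$, $\varphi = 0$ is excluded); $[A_0]_{DR} = \{A_0 + d\beta\}$ with $\beta\in\Omega^0(\mathbb{R}^{2*};i\mathbb{R})$. $\beta$ gives an element of $\mathcal{G}$ through the composite $\exp\circ\beta:\mathbb{R}^{2*}\to U(1)$, $(x,y)\mapsto e^{\beta(x,y)}$. The A-B effect with flux $\Phi = -\lambda\Phi_0$ is produced by the connection $A = \lambda A_0$. To determine $\mathcal{M}_0$, one finds the smallest $\sigma\in\mathbb{R}$ such that $(\lambda+\sigma)A_0\sim\lambda A_0$, that is, $(\lambda+\sigma)A_0\in[\lambda A_0]$, which means, from [33], that $(\lambda+\sigma)A_0 = \lambda A_0 + f_1^{-1}df_1$ or $\sigma A_0 = f_1^{-1}df_1$. For $\varphi\neq 0$, $A_0 = i\,d\varphi$ and $f_1^{-1}df_1 = i\,d\varphi$, then $\sigma = 1$, and therefore $(\lambda+1)A_0\sim\lambda A_0$, in particular $A_0\sim 0$.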

A remark concerning the gauge group $\mathcal{G}$ is the following. In classical electrodynamics, according to [7a] and [7b], the symmetry group could be taken to be the additive group $(\mathbb{R},+)$ instead of the multiplicative group U(1). Since $\mathbb{R}$ is contractible, then the gauge group would be $\mathcal{G}_{\mathbb{R}} = C^\infty(\mathbb{R}^{2*},\mathbb{R})$ with $[\mathbb{R}^{2*},\mathbb{R}] = 0$, so that the homomorphism $\Psi:\mathcal{G}_{\mathbb{R}}\to\mathcal{G}$, $\Psi(f)(x) = e^{if(x)}$ would not exhaust $\mathcal{G}$ since $\Psi(f)\in[1]$ for any $f\in\mathcal{G}_{\mathbb{R}}$: in fact, $H:\mathbb{R}^{2*}\times[0,1]\to U(1)$ given by $H(x,t) = e^{i(1-t)f(x)}$ is a homotopy between $\Psi(f)$ and 1. However, the quantization of electric charges implies that in fact the gauge group is U(1) and not $\mathbb{R}$. This is equivalent mathematically to the possible existence of magnetic monopoles which require nontrivial bundles for their description.


Covariant Derivative, Parallel Transport, 
and Holonomy 


Let $G$ be a matrix Lie group with Lie algebra $\mathfrak{g}$, $B$ a differentiable manifold, $\xi: G\to P\xrightarrow{\pi}B$ a principal bundle, $V$ a vector space, $G\times V\to V$ an action, and $\xi_V: V\to P\times_G V\xrightarrow{\pi_V}B$ the corresponding associated vector bundle ($\xi_V$ is trivial if $\xi$ is trivial). Call $\Gamma(\xi_V)$ the sections of $\xi_V$, $\Gamma(TB)$ ($\Gamma(TP)$) the sections of the tangent bundle of $B$ ($P$), and $\Gamma_{eq}(P,V)$ the set of functions $\gamma: P\to V$ satisfying $\gamma(pg) = g^{-1}\gamma(p)$ (equivariant functions from $P$ to $V$). $s\in\Gamma(\xi_V)$ induces $\gamma_s\in\Gamma_{eq}(P,V)$ with $\gamma_s(p) = v$, where $s(\pi(p)) = [p,v]$, and $\gamma\in\Gamma_{eq}(P,V)$ induces $s_\gamma\in\Gamma(\xi_V)$ with $s_\gamma(b) = [p,\gamma(p)]$, where $p\in\pi^{-1}(\{b\})$. If $H$ is a connection on $\xi$, that is, a smooth assignment of a (horizontal) vector subspace $H_p$ of $T_pP$ at each $p$ of $P$, algebraically determined by a smooth $\mathfrak{g}$-valued 1-form $\omega$ on $P$ through $H_p = \ker(\omega_p)$, $s\in\Gamma(\xi_V)$, $X\in\Gamma(TB)$, and $X^\uparrow\in\Gamma(TP)$ the horizontal lifting of $X$ by $\omega$, then $X^\uparrow(\gamma_s)\in\Gamma_{eq}(P,V)$, and the covariant derivative of $s$ with respect to $\omega$ in the direction of $X$ is defined by

$$\nabla^\omega_X s := s_{X^\uparrow(\gamma_s)}$$ [36a]


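If $\phi_U:\pi^{-1}(U)\to U\times G$ is a local trivialization of $\xi$, $x^\mu$, $\mu = 1,\ldots,\dim B$ are local coordinates on $U$, and $e_i$, $i = 1,\ldots,\dim V$ is a basis of the local sections in $\pi_V^{-1}(U)$, then the local expression of [36a] is

$$\nabla^\omega_{X^\mu\partial/\partial x^\mu}(s^ie_i) = X^\mu\left(\delta^j_i\frac{\partial}{\partial x^\mu} + \mathcal{A}^j_{\mu i}\right)s^ie_j$$ [36b]

where

$$\mathcal{A}^j_{Ui} = \mathcal{A}^j_{\mu i}\,dx^\mu = (\sigma^*\omega_U)^j_i$$ [36c]

is the geometrical gauge potential in $U$, given by the pullback of $\omega_U$, the restriction of $\omega$ to $\pi^{-1}(U)$, by the local section $\sigma: U\to\pi^{-1}(U)$, $\sigma(b) = \phi_U^{-1}(b,1)$. ($\mathcal{A}^j_{\mu i}$ is defined through $\nabla^\omega_{\partial/\partial x^\mu}e_i = \mathcal{A}^j_{\mu i}e_j$.) The operator

$$D^j_{\mu i} = \delta^j_i\frac{\partial}{\partial x^\mu} + \mathcal{A}^j_{\mu i}$$ [36d]
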
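is the usual local covariant derivative. In an overlapping trivialization, [36b] is replaced by

$$\nabla^\omega_{X^\mu\partial/\partial x^\mu}(s'^ie'_i) = X^\mu\left(\delta^j_i\frac{\partial}{\partial x^\mu} + \mathcal{A}'^j_{\mu i}\right)s'^ie'_j$$

where $e'_i$ is the basis of local sections in the new trivialization. If the corresponding local sections are related by $\sigma' = \sigma g^{-1}$ on $U\cap U'$, with $g: U\cap U'\to G$, then the local potential transforms as

$$\mathcal{A}'_\mu = g\mathcal{A}_\mu g^{-1} + g\,\partial_\mu g^{-1}$$ [36e]

which for $G$ abelian has the form [32].

For each smooth path $c:[0,1]\to B$ joining the points $b$ and $b'$, and each $p\in P_b = \pi^{-1}(\{b\})$, there exists a unique path $c^\uparrow$ in $P$ through $p$ with $\dot{c}^\uparrow(t)\in H_{c^\uparrow(t)}$ for all $t\in[0,1]$. $c^\uparrow$ is the horizontal lifting of $c$ by $\omega$ through $p$. Thus, for each connection and path there exists a diffeomorphism $P^\omega_c: P_b\to P_{b'}$ called parallel transport. If $c$ is a loop at $b$, then $P^\omega_c\in{\rm Diff}(P_b)$ is called the holonomy of $\omega$ at $b$ along $c$. To the loop space of $B$ at $b$, $\Omega(B;b)$, corresponds a subgroup ${\rm Hol}^\omega_b$ of ${\rm Diff}(P_b)$ called the holonomy of $\omega$ at $b$. If $c\in\Omega(B;b)$ and $\tilde{c}$ is a lifting of $c$ through $q\in P_b$, then there exists a unique path $g:[0,1]\to G$ such that $c^\uparrow(t) = \tilde{c}(t)g(t)$ with $c^\uparrow(0) = qg(0) = p$; $g$ satisfies the differential equation

$$\frac{d}{dt}g(t) + \omega_{\tilde{c}(t)}(\dot{\tilde{c}}(t))\,g(t) = 0$$ [37]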
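whose solution is the time-ordered exponential

$$g(t) = T\exp\left(-\int_0^t d\tau\,\omega_{\tilde{c}(\tau)}(\dot{\tilde{c}}(\tau))\right)g(0) = \left(1 + \sum_{m=1}^{\infty}(-1)^m\int_0^t d\tau_1\int_0^{\tau_1}d\tau_2\cdots\int_0^{\tau_{m-1}}d\tau_m\,\omega_{\tilde{c}(\tau_1)}(\dot{\tilde{c}}(\tau_1))\cdots\omega_{\tilde{c}(\tau_m)}(\dot{\tilde{c}}(\tau_m))\right)g(0)$$ [38]

If $q = p$ then $g(0) = 1$. For each $p\in P$, the set of elements $g\in G$ such that $c^\uparrow(1) = pg$ for $c\in\Omega(B;\pi(p))$ is a subgroup of $G$, ${\rm Hol}^\omega_p$, called the holonomy of $\omega$ at $p$. (For each $p$, there exists a group isomorphism ${\rm Hol}^\omega_{\pi(p)}\to{\rm Hol}^\omega_p$, and if $p$ and $p'$ are connected by a horizontal curve, then ${\rm Hol}^\omega_p = {\rm Hol}^\omega_{p'}$; if all $p$'s in $P$ are horizontally connected, then ${\rm Hol}^\omega_p = G$ for all $p\in P$.) If $(U,\phi)$ is a local trivialization of $\xi$, $c\subset U$, and $\tilde{c}(t) = \sigma(c(t))$, then one has the local formula

$$c^\uparrow(t) = \phi^{-1}(c(t),1)\left(T\exp\left(-\int_{c(0)}^{c(t)}\mathcal{A}\right)\right)g(0)$$ [39]

In particular, if $\xi$ is a product bundle, then $\phi$ is the identity, and choosing $g(0) = 1$ gives

$$c^\uparrow(t) = \left(c(t),\ T\exp\left(-\int_{c(0)}^{c(t)}\mathcal{A}_U\right)\right)$$ [40]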
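
For an abelian structure group the time ordering in [40] is irrelevant, and parallel transport reduces to the exponential of an ordinary line integral. The Python sketch below illustrates this for the flat U(1) connection $i\lambda\,d\varphi$ on the punctured plane, integrating $a = \lambda(x\,dy - y\,dx)/(x^2+y^2)$ numerically along a closed loop. It is a sketch under stated assumptions: the names `holonomy` and `wobbly_loop`, the midpoint quadrature, and the particular loop shape are choices made for this example.

```python
import cmath
import math

def holonomy(lam, loop, steps=20000):
    """exp(-i * integral of a) around a closed loop, with a = lam*(x dy - y dx)/(x^2 + y^2)."""
    integral = 0.0
    x0, y0 = loop(0.0)
    for k in range(1, steps + 1):
        x1, y1 = loop(k / steps)
        xm, ym = 0.5 * (x0 + x1), 0.5 * (y0 + y1)   # midpoint of the segment
        integral += lam * (xm * (y1 - y0) - ym * (x1 - x0)) / (xm * xm + ym * ym)
        x0, y0 = x1, y1
    return cmath.exp(-1j * integral)

def wobbly_loop(n):
    """Closed loop based at (1.5, 0) that winds n times around the origin with varying radius."""
    def c(t):
        angle = 2.0 * math.pi * n * t
        radius = 1.0 + 0.5 * math.cos(2.0 * math.pi * t)
        return (radius * math.cos(angle), radius * math.sin(angle))
    return c
```

The result depends only on the winding number of the loop and on $\lambda$, not on the shape of the loop, as expected for a flat connection: a loop with $n = 2$ gives $e^{-4\pi i\lambda}$ up to discretization error, and a loop that does not enclose the origin gives 1.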


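In our case, $V = \mathbb{C}$, $\xi$ is a product bundle, $s = \psi$, the wave function, is a global section of the associated bundle

$$\xi_{\mathbb{C}}:\mathbb{C}\to\mathbb{R}^{2*}\times\mathbb{C}\xrightarrow{\pi_{\mathbb{C}}}\mathbb{R}^{2*}$$ [41]

$G = U(1)$ with $\mathfrak{g} = i\mathbb{R}$ and an action $U(1)\times\mathbb{C}\to\mathbb{C}$, $(e^{i\varphi},z)\mapsto e^{i\varphi}z$; therefore, $\mathcal{A}_\mu = \mathcal{A}_{U\mu} = ia_\mu$ with $a_\mu$ real valued, and the covariant derivative is

$$D_\mu\psi = \left(\frac{\partial}{\partial x^\mu} + ia_\mu\right)\psi$$ [36f]

If $\psi$ carries the electric charge $q$, we define the physical gauge potential $A_\mu$ through

$$a_\mu = qA_\mu$$ [42]

and, for the covariant derivative, after multiplying by $i$, we obtain the operator appearing in eqn [15], $iD_\mu\psi = (i(\partial/\partial x^\mu) - qA_\mu)\psi$: in fact, for the spatial part the coupling is $(i\nabla + qA)\psi$, and for the temporal part one has $(i\partial/\partial t - q\varphi)\psi$. For the electron, $q = -|e|$ and $a_\mu = -|e|A_\mu = -(2\pi/\Phi_0)A_\mu$.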
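For $c\in\Omega(\mathbb{R}^{2*};(x_0,y_0))$, which turns $n$ times around the solenoid at (0,0), eqn [40] gives

$$c^\uparrow(1) = \left((x_0,y_0),\,e^{-\oint_c\mathcal{A}}\right) = \left((x_0,y_0),\,e^{-i\oint_c a}\right) = \left((x_0,y_0),\,e^{i|e|\oint_c A}\right) = \left((x_0,y_0),\,e^{-2\pi in(\Phi/\Phi_0)}\right)$$

and therefore, for $\Phi/\Phi_0 = \lambda\in[0,1)$ we have the holonomy groups

$${\rm Hol}^{\omega(\Phi)}_{((x_0,y_0),1)} = \{e^{-2\pi in\lambda}\}_{n\in\mathbb{Z}}\cong\begin{cases}\mathbb{Z}_q, & \lambda = p/q,\ p,q\in\mathbb{Z},\ (p,q) = 1\\ \mathbb{Z}, & \lambda\notin\mathbb{Q}\end{cases}$$ [43]

In the second case, ${\rm Hol}^{\omega(\Phi)}_{((x_0,y_0),1)}$ is dense in U(1): in fact, suppose that for $n_1,n_2\in\mathbb{Z}$, $n_1\neq n_2$, $e^{2\pi in_1\lambda} = e^{2\pi in_2\lambda}$, then $e^{2\pi i(n_1-n_2)\lambda} = 1$ and so $(n_1-n_2)\lambda = m$ for some $m\in\mathbb{Z}$; therefore, $\lambda\in\mathbb{Q}$, which is a contradiction. Hence the elements $e^{-2\pi in\lambda}$ are pairwise distinct, and an infinite subgroup of U(1) is necessarily dense.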
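
The two cases in [43] can be explored with a short Python sketch. It is illustrative only: `holonomy_group_order` returns the order $q$ of the cyclic group for a rational $\lambda = p/q$ (passed as a `fractions.Fraction`), and `closest_approach` measures how near the phases $n\lambda \bmod 1$ come to an arbitrary target, which gives a finite-sample indication of density for irrational $\lambda$. Both function names and the sample values are assumptions made for this example.

```python
from fractions import Fraction
import math

def holonomy_group_order(lam):
    """Order q of the cyclic group {exp(-2 pi i n lam)} for rational lam = p/q."""
    # Fraction reduces to lowest terms, so the denominator is the group order.
    return Fraction(lam).denominator

def closest_approach(lam, target, n_max):
    """Smallest distance on the unit-length circle between target and n*lam mod 1, 0 <= n < n_max."""
    best = 1.0
    for n in range(n_max):
        d = (n * lam - target) % 1.0
        best = min(best, d, 1.0 - d)
    return best
```

For rational $\lambda$ the phases visit only $q$ points, so a generic target is never approached; for an irrational value such as $\sqrt{2}-1$ the minimal distance shrinks as more windings are included.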

Finally, we should mention that the A-B effect can be understood as a geometric phase à la Berry, though not necessarily through an adiabatic change of the parameters on which the Hamiltonian depends. The Berry potential $a_B$ turns out to be proportional to the real magnetic vector potential $A$: in the n.s.u., and for electrons,

$$a_B = -|e|A$$ [44]


Nonabelian and Gravitational A-B Effects 


Since the fundamental group $\Pi_1(\mathbb{R}^{2*},(x_0,y_0))\cong\mathbb{Z}$, eqn [43] shows that there is a homomorphism $\varphi(\omega):\Pi_1(\mathbb{R}^{2*},(x_0,y_0))\to U(1)$, $\varphi(\omega)(n) = e^{-2\pi in\lambda}$, with image ${\rm Hol}^{\omega(\Phi)}_{((x_0,y_0),1)}$, which characterizes the A-B effect in that case. In general, an A-B effect in a $G$-bundle with a connection $\omega$ is characterized by a group homomorphism from the fundamental group of the base space $B$ onto the holonomy group of the connection, which is a subgroup of the structure group. The A-B effect is nonabelian if the holonomy group is nonabelian, which requires both $G$ and $\Pi_1(B,x)$ to be nonabelian. Examples with Yang-Mills and gravitational fields are considered in the literature.
Introduction


Quantum field theory may be understood as the incorporation of the principle of locality, which is at the basis of classical field theory, into quantum physics. There are, however, severe obstacles against a straightforward translation of concepts of classical field theory into quantum theory, among them the notorious divergences of quantum field theory and the intrinsic nonlocality of quantum physics. Therefore, the concept of locality is somewhat obscured in the formalism of quantum field theory as it is typically presented in textbooks. Nonlocal concepts such as the vacuum, the notion of particles or the S-matrix play a fundamental role, and neither the relation to classical field theory nor the influence of background fields can be properly treated.

Algebraic quantum field theory (AQFT; synony- 
mously, local quantum physics), on the contrary, 
aims at emphasizing the concept of locality at every 
instance. As the nonlocal features of quantum 
physics occur at the level of states (“entangle- 
ment”), not at the level of observables, it is better 
not to base the theory on the Hilbert space of states 
but on the algebra of observables. Subsystems of a 
given system then simply correspond to subalgebras 
of a given algebra. The locality concept is abstractly 
encoded in a notion of independence of subsystems; 
two subsystems are independent if the algebra of 
observables which they generate is isomorphic 
to the tensor product of the algebras of the 
subsystems. 

Spacetime can then — in the spirit of Leibniz — be 
considered as an ordering device for systems. So, one 
associates with regions of spacetime the algebras of 
observables which can be measured in the pertinent 
region, with the condition that the algebras of 
subregions of a given region can be identified with 
subalgebras of the algebra of the region. 

Problems arise if one aims at a generally covariant 
approach in the spirit of general relativity. Then, in 
order to avoid pitfalls like in the “hole problem,” 
systems corresponding to isometric regions must be 
isomorphic. Since isomorphic regions may be 
embedded into different spacetimes, this amounts 
to a simultaneous treatment of all spacetimes of a 
suitable class. We will see that category theory 
furnishes such a description, where the objects are 
the systems and the morphisms the embeddings of a 
system as a subsystem of other systems. 

States arise as secondary objects via Hilbert space 
representations, or directly as linear functionals on 
the algebras of observables which can be interpreted 
as expectation values and are, therefore, positive 
and normalized. It is crucial that inequivalent 
representations (“sectors”) can occur, and the 
analysis of the structure of the sectors is one of 
the big successes of AQFT. One can also study the 
particle interpretation of certain states as well as 
(equilibrium and nonequilibrium) thermodynamical 
properties. 

The mathematical methods in AQFT are mainly 
taken from the theory of operator algebras, a field of 
mathematics which developed in close contact to 
mathematical physics, in particular to AQFT. 
Unfortunately, the most important field theories from the point of view of elementary particle physics, such as quantum electrodynamics or the standard model, could not yet be constructed beyond formal perturbation theory, with the annoying consequence that it seemed that the concepts of AQFT could not be applied to them. However, it has recently been shown that formal perturbation theory can be reshaped in the spirit of AQFT such that the algebras of observables of these models can be constructed as algebras of formal power series of Hilbert space operators. The price to pay is that the deep mathematics of operator algebras cannot be applied, but the crucial features of the algebraic approach can be used.

AQFT was originally proposed by Haag as a 
concept by which scattering of particles can be 
understood as a consequence of the principle of 
locality. It was then put into a mathematically 
precise form by Araki, Haag, and Kastler. After the analysis of particle scattering by Haag and Ruelle and the clarification of the relation to the Lehmann-Symanzik-Zimmermann (LSZ) formalism by Hepp, the structure of superselection sectors was studied first by Borchers and then in a fundamental series of papers by Doplicher, Haag, and Roberts (DHR) (see, e.g., Doplicher et al. (1971, 1974)); soon after, Buchholz and Fredenhagen established the relation to particles. Finally, Doplicher and Roberts uncovered the structure of superselection sectors as the dual of a compact group, thereby generalizing the Tannaka-Krein theorem of characterization of group duals.

With the advent of two-dimensional conformal 
field theory, new models were constructed and it was 
shown that the DHR analysis can be generalized to 
these models. Directly related to conformal theories is 
the algebraic approach to holography in anti-de Sitter 
(AdS) spacetime by Rehren. 

The general framework of AQFT may be described 
as a covariant functor between two categories. The 
first one contains the information on local relations 
and is crucial for the interpretation. Its objects are 
topological spaces with additional structures (typi- 
cally globally hyperbolic Lorentzian spaces, possibly 
spin bundles with connections, etc.), its morphisms 
being the structure-preserving embeddings. In the 
case of globally hyperbolic Lorentzian spacetimes, 
one requires that the embeddings are isometric and 
preserve the causal structure. The second category 
describes the algebraic structure of observables. In 
quantum physics the standard assumption is that one 
deals with the category of C*-algebras where the 
morphisms are unital embeddings. In classical phys- 
ics, one looks instead at Poisson algebras, and in 
perturbative quantum field theory one admits alge- 
bras which possess nontrivial representations as 
formal power series of Hilbert space operators. It is 
the leading principle of AQFT that the functor $\mathscr{A}$ contains all physical information. In particular, two theories are equivalent if the corresponding functors are naturally equivalent.

In the analysis of the functor $\mathscr{A}$, a crucial role is played by natural transformations from other functors on the locality category. For instance, a field $A$ may be defined as a natural transformation from the category of test function spaces to the category of observable algebras via their functors related to the locality category.


Quantum Field Theories as Covariant 
Functors 


The rigorous implementation of the generally covariant 
locality principle uses the language of category theory. 
The following two categories are used: 


Loc: The class of objects obj(Loc) is formed by all (smooth) $d$-dimensional ($d\ge 2$ is held fixed), globally hyperbolic Lorentzian spacetimes $M$ which are oriented and time oriented. Given any two such objects $M_1$ and $M_2$, the morphisms $\psi\in\hom_{\rm Loc}(M_1,M_2)$ are taken to be the isometric embeddings $\psi: M_1\to M_2$ of $M_1$ into $M_2$ but with the following constraints:

(i) if $\gamma:[a,b]\to M_2$ is any causal curve and $\gamma(a),\gamma(b)\in\psi(M_1)$ then the whole curve must be in the image $\psi(M_1)$, that is, $\gamma(t)\in\psi(M_1)$ for all $t\in[a,b]$;

(ii) any morphism preserves orientation and time orientation of the embedded spacetime.

The composition is defined as the composition of maps, the unit element in $\hom_{\rm Loc}(M,M)$ is given by the identical embedding ${\rm id}_M: M\mapsto M$ for any $M\in$ obj(Loc).


Obs: The class of objects obj(Obs) is formed by all C*-algebras possessing unit elements, and the morphisms are faithful (injective) unit-preserving *-homomorphisms. The composition is again defined as the composition of maps, the unit element in $\hom_{\rm Obs}(\mathcal{A},\mathcal{A})$ is for any $\mathcal{A}\in$ obj(Obs) given by the identical map ${\rm id}_{\mathcal{A}}: A\mapsto A$, $A\in\mathcal{A}$.


The categories are chosen for definiteness. One
may envisage changes according to particular needs,
as, for instance, in perturbation theory where instead
of C*-algebras general topological *-algebras are
better suited. Or one may use von Neumann
algebras, in case particular states are selected. On
the other hand, one might consider for Loc bundles 
over spacetimes, or (in conformally invariant the- 
ories) admit conformal embeddings as morphisms. In 
case one is interested in spacetimes which are not 
globally hyperbolic, one could look at the globally 
hyperbolic subregions (where one needs to be careful 
about the causal convexity condition (i) above). 


The concept of locally covariant quantum field 
theory is defined as follows. 


Definition 1 


(i) A locally covariant quantum field theory is a covariant functor $\mathscr{A}$ from Loc to Obs and (writing $\alpha_\psi$ for $\mathscr{A}(\psi)$) with the covariance properties

$$\alpha_{\psi'}\circ\alpha_\psi = \alpha_{\psi'\circ\psi},\qquad \alpha_{{\rm id}_M} = {\rm id}_{\mathscr{A}(M)}$$

for all morphisms $\psi\in\hom_{\rm Loc}(M_1,M_2)$, all morphisms $\psi'\in\hom_{\rm Loc}(M_2,M_3)$, and all $M\in$ obj(Loc).

(ii) A locally covariant quantum field theory described by a covariant functor $\mathscr{A}$ is called “causal” if the following holds: whenever there are morphisms $\psi_j\in\hom_{\rm Loc}(M_j,M)$, $j = 1,2$, so that the sets $\psi_1(M_1)$ and $\psi_2(M_2)$ are causally separated in $M$, then one has

$$[\alpha_{\psi_1}(\mathscr{A}(M_1)),\ \alpha_{\psi_2}(\mathscr{A}(M_2))] = \{0\}$$

where the element-wise commutation makes sense in $\mathscr{A}(M)$.

(iii) One says that a locally covariant quantum field theory given by the functor $\mathscr{A}$ obeys the “time-slice axiom” if

$$\alpha_\psi(\mathscr{A}(M)) = \mathscr{A}(M')$$

holds for all $\psi\in\hom_{\rm Loc}(M,M')$ such that $\psi(M)$ contains a Cauchy surface for $M'$.


Thus, a quantum field theory is an assignment of 
C*-algebras to (all) globally hyperbolic spacetimes 
so that the algebras are identifiable when the 
spacetimes are isometric, in the indicated way. This 
is a precise description of the generally covariant 
locality principle. 
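
The covariance properties in Definition 1(i) are simply the functor laws, and they can be illustrated with a deliberately crude toy model in Python. Everything in this sketch is an assumption made for illustration: "spacetimes" are replaced by finite sets of points, "embeddings" by injective maps stored as dictionaries, and the "algebra" of a region by a set of generator labels, one per point. There is no metric, causal structure, or C*-structure here; the sketch only shows what $\alpha_{\psi'}\circ\alpha_\psi = \alpha_{\psi'\circ\psi}$ and $\alpha_{{\rm id}_M} = {\rm id}_{\mathscr{A}(M)}$ mean operationally.

```python
def compose(g, f):
    """Return the composite map g after f, with maps stored as dicts."""
    return {x: g[f[x]] for x in f}

def identity(points):
    """Identity map on a collection of points."""
    return {p: p for p in points}

def algebra(region):
    """Toy stand-in for the algebra of a region: one generator label per point."""
    return frozenset(("phi", p) for p in region)

def alpha(psi):
    """Toy stand-in for alpha_psi: push generators forward along the embedding psi."""
    return {("phi", p): ("phi", q) for p, q in psi.items()}

# Three toy "spacetimes" and two injective "embeddings" between them.
M1, M2, M3 = {0, 1}, {"a", "b", "c"}, {10, 20, 30, 40}
psi = {0: "a", 1: "c"}                     # M1 -> M2
psi_prime = {"a": 10, "b": 20, "c": 40}    # M2 -> M3
```

In this toy setting the image of `alpha(psi)` is a subset of `algebra(M2)`, mirroring the requirement that the algebra of an embedded system is identified with a subalgebra of the algebra of the larger system.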


The Traditional Approach 


The traditional framework of AQFT, in the Araki-Haag-Kastler sense, on a fixed globally hyperbolic spacetime can be recovered from a locally covariant quantum field theory, that is, from a covariant functor $\mathscr{A}$ with the properties listed above.

Indeed, let $M$ be an object in obj(Loc). $K(M)$ denotes the set of all open subsets in $M$ which are relatively compact and also contain, with each pair of points $x$ and $y$, all $g$-causal curves in $M$ connecting $x$ and $y$ (cf. condition (i) in the definition of Loc). $O\in K(M)$, endowed with the metric of $M$ restricted to $O$ and with the induced orientation and time orientation, is a member of obj(Loc), and the injection map $\iota_{M,O}: O\to M$, that is, the identical map restricted to $O$, is an element in $\hom_{\rm Loc}(O,M)$.


With this notation, it is easy to prove the following 
assertion: 


Theorem 1 Let $\mathscr{A}$ be a covariant functor with the above-stated properties, and define a map $K(M)\ni O\mapsto\mathcal{A}(O)\subset\mathscr{A}(M)$ by setting

$$\mathcal{A}(O) := \alpha_{M,O}(\mathscr{A}(O))$$

where $\alpha_{M,O} := \alpha_{\iota_{M,O}}$. Then the following statements hold:

(i) The map fulfills isotony, that is,

$$O_1\subset O_2\ \Rightarrow\ \mathcal{A}(O_1)\subset\mathcal{A}(O_2)\quad\text{for all } O_1,O_2\in K(M)$$

(ii) If there exists a group $G$ of isometric diffeomorphisms $\kappa: M\to M$ (so that $\kappa_*g = g$) preserving orientation and time orientation, then there is a representation $G\ni\kappa\mapsto\tilde{\alpha}_\kappa$ of $G$ by C*-algebra automorphisms $\tilde{\alpha}_\kappa:\mathscr{A}(M)\to\mathscr{A}(M)$ such that

$$\tilde{\alpha}_\kappa(\mathcal{A}(O)) = \mathcal{A}(\kappa(O)),\quad O\in K(M)$$

(iii) If the theory given by $\mathscr{A}$ is additionally causal, then it holds that

$$[\mathcal{A}(O_1),\mathcal{A}(O_2)] = \{0\}$$

for all $O_1,O_2\in K(M)$ with $O_1$ causally separated from $O_2$.


These properties are just the basic assumptions of the Araki-Haag-Kastler framework.


The Achievements of the Traditional 
Approach 


In the Araki-Haag-Kastler approach in Minkowski spacetime $M$, many results have been obtained in the last 40 years, some of them also becoming a source of inspiration to mathematics. A description of the achievements can be organized in terms of a length-scale basis, from the small to the large. We assume in this section that the algebra $\mathscr{A}(M)$ is faithfully and irreducibly represented on a Hilbert space $\mathcal{H}$, that the Poincaré transformations are unitarily implemented with positive energy, and that the subspace of Poincaré invariant vectors is one dimensional (uniqueness of the vacuum). Moreover, algebras corresponding to regions which are spacelike to a nonempty open region are assumed to be weakly closed (i.e., von Neumann algebras on $\mathcal{H}$), and the condition of weak additivity is fulfilled, that is, for all $O\in K(M)$ the algebra generated from the algebras $\mathcal{A}(O+x)$, $x\in M$ is weakly dense in $\mathscr{A}(M)$.
Ultraviolet Structure and Idealized Localizations


This section deals with the problem of inspecting the theory at very small scales. In the limiting case, one is interested in idealized localizations, eventually the points of spacetimes. But the observable algebras are trivial at any point $x\in M$, namely

$$\bigcap_{O\ni x}\mathcal{A}(O) = \mathbb{C}1,\quad O\in K(M)$$

Hence, pointlike localized observables are necessarily singular. Actually, the Wightman formulation of quantum field theory is based on the use of distributions on spacetime with values in the algebra of observables (as a topological *-algebra). In spite
of technical complications whose physical signifi- 
cance is unclear, this formalism is well suited for a 
discussion of the connection with the Euclidean 
theory, which allows, in fortunate cases, a treatment 
by path integrals; it is more directly related to 
models and admits, via the operator-product expan- 
sion, a study of the short-distance behavior. It is, 
therefore, an important question how the algebraic 
approach is related to the Wightman formalism. The 
reader is referred to the literature for exploring the 
results on this relation. 

Whereas these results point to an essential equiva- 
lence of both formalisms, one needs in addition a 
criterion for the existence of sufficiently many Wight- 
man fields associated with a given local net. Such a 
criterion can be given in terms of a compactness 
condition to be discussed in the next subsection. As a 
benefit, one derives an operator-product expansion 
which has to be assumed in the Wightman approach. 

In the purely algebraic approach, the ultraviolet 
structure has been investigated by Buchholz and 
Verch. Small-scale properties of theories are studied 
with the help of the so-called scaling algebras whose 
elements can be described as orbits of observables 
under all possible renormalization group motions. 
There results a classification of theories in the scaling 
limit which can be grouped into three broad classes: 
theories for which the scaling limit is purely classical 
(commutative algebras), those for which the limit is 
essentially unique (stable ultraviolet fixed point) and 
not classical, and those for which this is not the case 
(unstable ultraviolet fixed point). This classification 
does not rely on perturbation expansions. It allows 
an intrinsic definition of confinement in terms of the 
so-called ultraparticles, that is, particles which are 
visible only in the scaling limit. 


Phase-Space Analysis 


As far as finite distances are concerned, there are 
two apparently competing principles, those of 
nuclearity and modularity. The first one suggests 
that locally, after a cutoff in energy, one has a 
situation similar to that of old quantum mechanics, 
namely a finite number of states in a finite volume 
of phase space. Aiming at a precise formulation, 
Haag and Swieca introduced their notion of com- 
pactness, which Buchholz and Wichmann sharpened 
into that of nuclearity. The latter authors proposed 
that the set generated from the vacuum vector $\Omega$, 


\[ \{ e^{-\beta H} A \Omega \mid A \in \mathfrak{A}(O),\ \|A\| < 1 \} \]


$H$ denoting the generator of time translations 
(Hamiltonian), is nuclear for any $\beta > 0$, roughly 
stating that it is contained in the image of the unit 
ball under a trace class operator. The nuclear size 
$Z(\beta, O)$ of the set plays the role of the partition 
function of the model and has to satisfy certain 
bounds in the parameter $\beta$. The consequence of this 
constraint is the existence of product states, namely 
those normal states for which observables localized in 
two given spacelike separated regions are uncorrelated. 
A further consequence is the existence of 
thermal equilibrium states (KMS states) for all $\beta > 0$. 

The second principle concerns the fact that, even 
locally, quantum field theory has infinitely many 
degrees of freedom. This becomes visible in the 
Reeh-Schlieder theorem, which states that every 
vector $\Phi$ which is in the range of $e^{-\beta H}$ for some 
$\beta > 0$ (in particular, the vacuum $\Omega$) is cyclic and 
separating for the algebras $\mathfrak{A}(O)$, $O \in \mathcal{K}(M)$, that is, 
$\mathfrak{A}(O)\Phi$ is dense in $\mathcal{H}$ ($\Phi$ is cyclic) and $A\Phi = 0$, 
$A \in \mathfrak{A}(O)$ implies $A = 0$ ($\Phi$ is separating). The pair 
$(\mathfrak{A}(O), \Omega)$ is then a von Neumann algebra in the 
so-called standard form. On such a pair, the 
Tomita-Takesaki theory can be applied, namely 
the densely defined operator 


\[ S A \Omega = A^* \Omega, \qquad A \in \mathfrak{A}(O) \]


is closable, and the polar decomposition of its 
closure $\bar{S} = J\Delta^{1/2}$ delivers an antiunitary involution 
$J$ (the modular conjugation) and a positive self-adjoint 
operator $\Delta$ (the modular operator) associated 
with the standard pair $(\mathfrak{A}(O), \Omega)$. These 
operators have the properties 


\[ J\, \mathfrak{A}(O)\, J = \mathfrak{A}(O)' \]

where the prime denotes the commutant, and 


\[ \Delta^{it}\, \mathfrak{A}(O)\, \Delta^{-it} = \mathfrak{A}(O), \qquad t \in \mathbb{R} \]


The importance of this structure is based on the 
fact disclosed by Bisognano and Wichmann using 
Poincaré-covariant Wightman fields and local alge- 
bras generated by them, that for specific regions in 
Minkowski spacetime the modular operators have a 
geometrical meaning. Indeed, these authors showed 
for the pair $(\mathfrak{A}(W), \Omega)$, where $W$ denotes the wedge 
region $W = \{x \in M \mid |x^0| < x^1\}$, that the associated 
modular unitary $\Delta^{it}$ is the Lorentz boost with velocity 
$\tanh(2\pi t)$ in the direction 1 and that the modular 
conjugation $J$ is the $CP_1T$ symmetry operator with 
parity $P_1$ the reflection with respect to the $x^1 = 0$ 
plane. Later, Borchers discovered that already on the 
purely algebraic level a corresponding structure exists. 
He proved that, given any standard pair $(\mathfrak{A}, \Phi)$ and a 
one-parameter group of unitaries $\tau \mapsto U(\tau)$ acting on 
the Hilbert space $\mathcal{H}$ with a positive generator and 
such that $\Phi$ is invariant and $U(\tau)\mathfrak{A}U(\tau)^* \subset \mathfrak{A}$, $\tau > 0$, 
then the associated modular operators $\Delta$ and $J$ fulfill 
the commutation relations 


\[ \Delta^{it}\, U(\tau)\, \Delta^{-it} = U(e^{-2\pi t}\tau) \]
\[ J\, U(\tau)\, J = U(-\tau) \]


which are just the commutation relations between 
boosts and lightlike translations. 
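
The geometric content of these relations can be checked in a small numerical sketch. It assumes a two-dimensional Minkowski plane with coordinates $(x^0, x^1)$ and the convention that the modular unitary corresponds to a boost of rapidity $-2\pi t$ (velocity of magnitude $\tanh(2\pi t)$); the helper names `boost`, `apply` and `lightlike_scale` are illustrative only. A boost maps the lightlike direction $(1, 1)$ to a multiple of itself, so conjugating a lightlike translation by the boost merely rescales the translation parameter, which is the pattern $U(\tau) \mapsto U(e^{-2\pi t}\tau)$.

```python
import math

def boost(rapidity):
    """Lorentz boost along x^1, as a 2x2 matrix acting on (x^0, x^1)."""
    c, s = math.cosh(rapidity), math.sinh(rapidity)
    return ((c, s), (s, c))

def apply(matrix, vector):
    """Matrix-vector product for the 2x2 case."""
    return tuple(sum(m * v for m, v in zip(row, vector)) for row in matrix)

def lightlike_scale(t):
    """Factor by which the boost of rapidity -2*pi*t rescales the vector (1, 1)."""
    x0, x1 = apply(boost(-2.0 * math.pi * t), (1.0, 1.0))
    # The image stays on the same lightlike ray, so both components agree.
    assert math.isclose(x0, x1)
    return x0

for t in (0.0, 0.05, 0.2):
    # The two printed numbers should agree: scale factor versus exp(-2*pi*t).
    print(t, lightlike_scale(t), math.exp(-2.0 * math.pi * t))
```

The opposite lightlike direction $(1, -1)$ is rescaled by the inverse factor, which is why the sign convention for the rapidity has to be stated as an assumption here.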

Surprisingly, there is a direct connection between 
the two concepts of nuclearity and modularity. 
Indeed, in the nuclearity condition, it is possible to 
replace the Hamiltonian operator by a specific 
function of the modular operator associated with a 
slightly larger region. Furthermore, under mild 
conditions, nuclearity and modularity together 
determine the structure of local algebras completely; 
they are isomorphic to the unique hyperfinite type 
III$_1$ von Neumann algebra. 


Sectors, Symmetries, Statistics, and Particles 


Large scales are appropriate for discussing global 
issues like superselection sectors, statistics and 
symmetries as far as large spacelike distances are 
concerned, and scattering theory, with the resulting 
notions of particles and infraparticles, as far as large 
timelike distances are concerned. 

In purely massive theories, where the vacuum 
sector has a mass gap and the mass shells of the 
particles are isolated, a very satisfactory description 
of the multiparticle structure at large times can be 
given. Using the concept of almost local particle 
generators, 


\[ \Psi = A(t)\Omega \]


where $\Psi$ is a single-particle state (i.e., an eigenstate 
of the mass operator) and $A(t)$ is a family of almost 
local operators essentially localized in the kinematical 
region accessible from a given point by a 
motion with the velocities contained in the spectrum 
of $\Psi$, one obtains the multiparticle states as limits of 
products $A_1(t) \cdots A_n(t)\Omega$ for disjoint velocity supports. 
The corresponding closed subspaces are 
invariant under Poincaré transformations and are 
unitarily equivalent to the Fock spaces of noninteracting 
particles. 

For massless particles, no almost-local particle 
generators can be expected to exist. In even 
dimensions, however, one can exploit Huygens' 
principle to construct asymptotic particle generators 
which are in the commutant of the algebra of the 
forward or backward lightcone, respectively. Again, 
their products can be determined and multiparticle 
states obtained. 

Much less well understood is the case of massive 
particles in a theory which also possesses massless 
particles. Here, in general, the corresponding states 
are not eigenstates of the mass operator. Since 
quantum electrodynamics (QED) as well as the 
standard model of elementary particles have this 
problem, the correct treatment of scattering in these 
models is still under discussion. One attempt at a 
correct treatment is based on the concept of the so- 
called particle weights, that is, unbounded positive 
functionals on a suitable algebra. This algebra is 
generated by positive almost-local operators annihi- 
lating the vacuum and interpreted as counters. 

The structure at large spacelike scales may be 
analyzed by the theory of superselection sectors. The 
best-understood case is that of locally generated 
sectors which are the objects of the DHR theory. 
Starting from a distinguished representation $\pi_0$ 
(vacuum representation) which is assumed to fulfill 
the Haag duality, 


\[ \pi_0(\mathfrak{A}(O)) = \pi_0(\mathfrak{A}(O'))' \]


for all double cones O, one may look at all 
representations which are equivalent to the vacuum 
representation if restricted to the observables loca- 
lized in double cones in the spacelike complement of 
a given double cone. Such representations give rise 
to endomorphisms of the algebra of observables, 
and the product of endomorphisms can be inter- 
preted as a product of sectors (“fusion”). In general, 
these representations violate the Haag duality, but 
there is a subclass of the so-called finite statistics 
sectors where the violation of Haag duality is small, 
in the sense that the nontrivial inclusion 


\[ \pi(\mathfrak{A}(O)) \subset \pi(\mathfrak{A}(O'))' \]


has a finite Jones index. These sectors form (in at least 
three spacetime dimensions) a symmetric tensor 
category with some further properties which can be 
identified, in a generalization of the Tannaka—Krein 
theorem, as the dual of a unique compact group. This 
group plays the role of a global gauge group. The 
symmetry of the category is expressed in terms of a 
representation of the symmetric group. One may then 
enlarge the algebra of observables and obtain an 
algebra of operators which transform covariantly 
under the global gauge group and satisfy Bose or 
Fermi commutation relations for spacelike separation. 

In two spacetime dimensions, one obtains instead 
braided tensor categories. They have been classified 
under additional conditions (conformal symmetry, 
central charge c<1) in a remarkable work by 
Kawahigashi and Longo. Moreover, in their paper, 
one finds that by using completely new methods 
($Q$-systems) a new model is unveiled, apparently 
inaccessible by methods used by others. To some 
extent, these categories can be interpreted as duals 
of generalized quantum groups. 

The question arises whether all representations 
describing elementary particles are, in the massive 
case, DHR representations. One can show that in the 
case of a representation with an isolated mass shell 
there is an associated vacuum representation which 
becomes equivalent to the particle representation after 
restriction to observables localized spacelike to a given 
infinitely extended spacelike cone. This property is 
weaker than the DHR condition but allows, in four 
spacetime dimensions, the same construction of a 
global gauge group and of covariant fields with Bose 
and Fermi commutation relations, respectively, as the 
DHR condition. In three spacetime dimensions, however, 
one finds a braided tensor category, which has 
properties similar to those known from topological field 
theories in three dimensions. 

The sector structure in massless theories is not 
well understood, due to the infrared problem. This is 
in particular true for QED. 


Fields as Natural Transformations 


In order to be able to interpret the theory in terms of 
measurements, one has to be able to compare 
observables associated with different regions of 
spacetime, or even different spacetimes. In the 
absence of nontrivial isometries, such a comparison 
can be made in terms of locally covariant fields. By 
definition, these are natural transformations from 
the functor of quantum field theory to another 
functor on the category of spacetimes Loc. 

The standard case is the functor which associates 
with every spacetime $M$ its space $\mathcal{D}(M)$ of smooth 
compactly supported test functions. There, the 
morphisms are the pushforwards $\mathcal{D}\psi = \psi_*$. 


Definition 2 A locally covariant quantum field $\Phi$ is 
a natural transformation between the functors $\mathcal{D}$ 
and $\mathfrak{A}$, that is, for any object $M$ in obj(Loc) there 
exists a morphism $\Phi_M : \mathcal{D}(M) \to \mathfrak{A}(M)$ such that for 
any pair of objects $M_1$ and $M_2$ and any morphism $\psi$ 
between them, the following diagram commutes: 


\[
\begin{array}{ccc}
\mathcal{D}(M_1) & \xrightarrow{\ \Phi_{M_1}\ } & \mathfrak{A}(M_1) \\
\psi_* \downarrow & & \downarrow \alpha_\psi \\
\mathcal{D}(M_2) & \xrightarrow{\ \Phi_{M_2}\ } & \mathfrak{A}(M_2)
\end{array}
\]


The commutativity of the diagram means, explicitly, 
that 


\[ \alpha_\psi \circ \Phi_{M_1} = \Phi_{M_2} \circ \psi_* \]


which is the requirement sought for the covariance 
of fields. It contains, in particular, the standard 
covariance condition for spacetime isometries. 

Fields in the above sense are not necessarily linear. 
Examples of fields which are also linear are the scalar 
massive free Klein-Gordon fields on all globally 
hyperbolic spacetimes and their locally covariant Wick 
polynomials. In particular, the energy-momentum 
tensors can be constructed as locally covariant fields, 
and they provide a crucial tool for discussing the back- 
reaction problem for matter fields. 

An example of the more general notion of a field 
is given by the local S-matrices in the Stückelberg-Bogolubov-Epstein-Glaser 
sense. These are unitaries $S_M(\lambda)$ with 
$M \in$ obj(Loc) and $\lambda \in \mathcal{D}(M)$ which satisfy the 
conditions 


\[ S_M(0) = 1 \]
\[ S_M(\lambda + \mu + \nu) = S_M(\lambda + \mu)\, S_M(\mu)^{-1}\, S_M(\mu + \nu) \]


for $\lambda, \mu, \nu \in \mathcal{D}(M)$ such that the supports of $\lambda$ and $\nu$ 
can be separated by a Cauchy surface of $M$ with 
supp $\lambda$ in the future of the surface. 

The importance of these S-matrices relies on the 
fact that they can be used to define a new quantum 
field theory. The new theory is locally covariant if the 
original theory is and if the local S-matrices satisfy 
the condition of the locally covariant field above. A 
perturbative construction of interacting quantum 
field theory on globally hyperbolic spacetimes was 
completed in this way by Hollands and Wald, based 
on previous work by Brunetti and Fredenhagen. 


Synopsis 


Anomalies are the breaking of classical symmetries by 
quantum mechanical radiative corrections, which arise 
when the regularizations needed to evaluate small 
fermion loop Feynman diagrams conflict with a 
classical symmetry of the theory. They have important 
implications for a wide range of issues in quantum 
field theory, mathematical physics, and string theory. 


Chiral Anomalies, Abelian and Nonabelian 


Consider quantum electrodynamics, with the fer- 
mionic Lagrangian density 


\[ \mathcal{L} = \bar\psi\,(i\gamma^\mu \partial_\mu - e_0 \gamma^\mu B_\mu - m_0)\,\psi \]   [1a] 


where $\bar\psi = \psi^\dagger \gamma^0$, $e_0$ and $m_0$ are the bare charge and 
mass, and $B_\mu$ is the electromagnetic gauge potential. 
(We reserve the notation $A$ for axial-vector quantities.) 
Under a chiral transformation 


\[ \psi \to e^{i\lambda\gamma_5}\psi \]   [1b] 


with constant $\lambda$, the kinetic term in eqn [1a] is 
invariant (because $\gamma_5$ commutes with $\gamma^0\gamma^\mu$), whereas 
the mass term is not invariant. Therefore, naive 
application of Noether’s theorem would lead one to 
expect that the axial-vector current 


\[ j^5_\mu = \bar\psi \gamma_\mu \gamma_5 \psi \]   [1c] 


obtained from the Lagrangian density by applying a 
chiral transformation with spatially varying $\lambda$, should 
have a divergence given by the change under chiral 
transformation of the mass term in eqn [1a]. Up to 
tree approximation, this is indeed true, but when one 
computes the AVV Feynman diagram with one axial-vector 
and two vector vertices (see Figure 1), and 
insists on conservation of the vector current 
$j_\mu = \bar\psi\gamma_\mu\psi$, one finds that to order $e_0^2$, the classical 
Noether theorem is modified to read 


\[ \partial^\mu j^5_\mu(x) = 2 i m_0\, j^5(x) + \frac{e_0^2}{16\pi^2}\, F^{\xi\sigma}(x) F^{\tau\rho}(x)\, \epsilon_{\xi\sigma\tau\rho} \]   [2] 


with $j^5 = \bar\psi\gamma_5\psi$ the pseudoscalar density and 
$F^{\xi\sigma}(x) = \partial^\sigma B^\xi(x) - \partial^\xi B^\sigma(x)$ the electromagnetic 
field strength tensor. The second term in eqn [2], 
which would be unexpected from the application of 
the classical Noether theorem, is the abelian axial-vector 
anomaly (often called the Adler-Bell-Jackiw 
(or ABJ) anomaly after the seminal papers on the 
subject). Since vector current conservation, together 
with the axial-vector current anomaly, implies that 
the left- and right-handed chiral currents $j_\mu \pm j^5_\mu$ are 
also anomalous, the axial-vector anomaly is frequently 
called the “chiral anomaly,” and we shall 
use the terms interchangeably in this article. 


Figure 1 The AVV triangle diagram (one axial-vector vertex A, two vector vertices V) responsible for the abelian 
chiral anomaly. 

There are a number of different ways to understand 
why the extra term in eqn [2] appears. (1) Working 
through the formal Feynman diagrammatic Ward 
identity proof of the Noether theorem, one finds that 
there is a step where the closed fermion loop contribu- 
tions are eliminated by a shift of the loop-integration 
variable. For Feynman diagrams that are convergent, 
this is not a problem, but the AVV diagram is linearly 
divergent. The linear divergence vanishes under sym- 
metric integration, but the shift then produces a finite 
residue, which gives the anomaly. (2) If one defines the 
AVV diagram by Pauli-Villars regularization with 
regulator mass $M_0$ that is allowed to approach infinity 
at the end of the calculation, one finds a classical 
Noether theorem in the regulated theory, 


\[ \partial^\mu j^5_\mu \big|_{m_0} - \partial^\mu j^5_\mu \big|_{M_0} = 2 i m_0\, j^5 \big|_{m_0} - 2 i M_0\, j^5 \big|_{M_0} \]   [3a] 


with the subscripts $m_0$ and $M_0$ indicating that 
fermion loops are to be calculated with fermion 
mass $m_0$ and $M_0$, respectively. Taking the vacuum 
to two-photon matrix element of eqn [3a], one finds 
that the matrix element $\langle 0 | j^5 |_{M_0} | \gamma\gamma \rangle$, which is 
unambiguously computable after imposing vector-current 
conservation, falls off only as $M_0^{-1}$ as the 
regulator mass approaches infinity. Thus, the 
product of $2iM_0$ with this matrix element has a 
finite limit, which gives the anomaly. (3) If the 
gauge-invariant axial-vector current is defined by 
point-splitting 


\[ j^5_\mu(x) = \bar\psi(x + \epsilon/2)\,\gamma_\mu\gamma_5\,\psi(x - \epsilon/2)\, e^{-i e_0 \epsilon^\sigma B_\sigma(x)} \]   [3b] 


with $\epsilon \to 0$ at the end of the calculation, one 
observes that the divergence of eqn [3b] contains 
an extra term with a factor of $\epsilon$. On careful 
evaluation, one finds that the coefficient of this 
factor is an expression that behaves as $\epsilon^{-1}$, which 
gives the anomaly in the limit of vanishing $\epsilon$. (4) 
Finally, if the field theory is defined by a functional 
integral over the classical action, the standard 
Noether analysis shows that the classical action is 
invariant under the chiral transformation of eqn 
[1b], apart from the contribution of the mass term, 
which gives the naive axial-vector divergence. How- 
ever, as pointed out by Fujikawa, the chiral 
transformation must also be applied to the func- 
tional integration measure, and since the measure is 
an infinite product, it must be regularized to be well 
defined. Careful calculation shows that the regular- 
ized measure is not chiral invariant, but contributes 
an extra term to the axial-vector Ward identity that 
is precisely the chiral anomaly. 

A key feature of the anomaly is that it is 
irreducible: a local polynomial counter term cannot 
be added to the AVV diagram that preserves 
vector-current conservation and eliminates the 
anomaly. More generally, one can show that there 
is no way of modifying quantum electrodynamics 
so as to eliminate the chiral anomaly, without 
spoiling either vector-current conservation (i.e., 
electromagnetic gauge invariance), renormalizabil- 
ity, or unitarity. Thus, the chiral anomaly is a new 
physical effect in renormalizable quantum field 
theory, which is not present in the prequantization 
classical theory. 

The abelian chiral anomaly is the simplest case of 
the anomaly phenomenon. It was extended to 
nonabelian gauge theories by Bardeen using a 
point-splitting method to compute the divergence, 
followed by adding polynomial counter terms to 
remove as many of the residual terms as possible. 
The resulting irreducible divergence is the nonabelian 
chiral anomaly, which in terms of Yang-Mills 
field strengths for vector and axial-vector gauge 
potentials $V^\mu$ and $A^\mu$, 


\[
\begin{aligned}
F_V^{\mu\nu}(x) &= \partial^\mu V^\nu(x) - \partial^\nu V^\mu(x) - i[V^\mu(x), V^\nu(x)] - i[A^\mu(x), A^\nu(x)] \\
F_A^{\mu\nu}(x) &= \partial^\mu A^\nu(x) - \partial^\nu A^\mu(x) - i[V^\mu(x), A^\nu(x)] - i[A^\mu(x), V^\nu(x)]
\end{aligned}
\]   [4a] 

is given by 


\[
\begin{aligned}
\partial^\mu j^5_\mu(x) = {} & \text{normal divergence term} \\
& + (1/4\pi^2)\,\epsilon_{\mu\nu\sigma\tau}\, \mathrm{tr}\, \lambda_A \big[ (1/4) F_V^{\mu\nu}(x) F_V^{\sigma\tau}(x) \\
& + (1/12) F_A^{\mu\nu}(x) F_A^{\sigma\tau}(x) \\
& + (2/3) i A^\mu(x) A^\nu(x) F_V^{\sigma\tau}(x) \\
& + (2/3) i F_V^{\mu\nu}(x) A^\sigma(x) A^\tau(x) \\
& + (2/3) i A^\mu(x) F_V^{\nu\sigma}(x) A^\tau(x) \\
& - (8/3) A^\mu(x) A^\nu(x) A^\sigma(x) A^\tau(x) \big]
\end{aligned}
\]   [4b] 

In eqn [4b], “tr” denotes a trace over internal 
degrees of freedom, and $\lambda_A$ is the internal symmetry 
matrix associated with the axial-vector external 
field. In the abelian case, where there is no internal 
symmetry structure, the terms involving two or four 
factors of $A^\mu, A^\nu, \ldots$ vanish by antisymmetry of 
$\epsilon_{\mu\nu\sigma\tau}$ and one recovers the AVV triangle anomaly, 
as well as a kinematically related anomaly in the 
AAA triangle diagram. In the nonabelian case, with 
nontrivial internal symmetry structure, there are also 
box- and pentagon-diagram anomalies. 

In addition to coupling to spin-1 gauge fields, 
fermions can also couple to spin-2 gauge fields, 
associated with the graviton. When the coupling of 
fermions to gravitation is taken into account, the 
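axial-vector current $\bar\psi T \gamma_\mu \gamma_5 \psi$, with $T$ an internal 
symmetry matrix, has an additional anomalous 
contribution to its divergence proportional to 


\[ \epsilon^{\xi\sigma\tau\rho}\, R_{\xi\sigma\alpha\beta}\, R_{\tau\rho}{}^{\alpha\beta}\; \mathrm{tr}\, T \]   [4c] 


where $R_{\xi\sigma\alpha\beta}$ is the Riemann curvature tensor of the 
gravitational field. 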


Chiral Anomaly Nonrenormalization 


A salient feature of the chiral anomaly is the fact 
that it is not renormalized by higher-order radia- 
tive corrections. In other words, the one-loop 
expressions of eqns [2] and [4b] give the exact 
anomaly coefficient without modification in higher 
orders of perturbation theory. In gauge theories 
such as quantum electrodynamics and quantum 
chromodynamics, this result (the Adler—Bardeen 
theorem) can be understood heuristically as fol- 
lows. Write down a modified Lagrangian, in 
which regulators are included for all gauge-boson 
fields. Since the gauge-boson regulators do not 
influence the chiral-symmetry properties of the 
theory, the divergences of the chiral currents are 
not affected by their inclusion, and so the only 
sources of anomalies in the regularized theory are 
small single-fermion loops, giving the anomaly 
expressions of eqns [2] and [4b]. Since the 
renormalized theory is obtained as the limit of 
the regularized theory as the regulator masses 
approach infinity, this result applies to the 
renormalized theory as well. 

The above argument can be made precise, and 
extends to nongauge theories such as the $\sigma$-model as 
well. For both gauge theories and the $\sigma$-model, 
cancellation of radiative corrections to the anomaly 
coefficient has been explicitly demonstrated in 
fourth-order calculations. Nonperturbative demonstrations 
of anomaly nonrenormalization have also been 
given using the Callan-Symanzik equations. For 
example, in quantum electrodynamics, Zee, and 
Lowenstein and Schroer, showed that a factor $f$ 
that gives the ratio of the true anomaly to its one-loop 
value obeys the differential equation 


\[ \left( m \frac{\partial}{\partial m} + \alpha\beta(\alpha) \frac{\partial}{\partial\alpha} \right) f = 0 \]   [5] 


Since $f$ is dimensionless, it can have no dependence 
on the mass $m$, and since $\beta(\alpha)$ is nonzero this implies 
$\partial f/\partial\alpha = 0$. Thus, $f$ has no dependence on $\alpha$, and so 
$f = 1$. 


Applications of Chiral Anomalies 


Chiral anomalies have numerous applications in the 
standard model of particle physics and its exten- 
sions, and we describe here a few of the most 
important ones. 


Neutral Pion Decay $\pi^0 \to \gamma\gamma$ 


As a result of the abelian chiral anomaly, the 
partially conserved axial-vector current (PCAC) 
equation relevant to neutral pion decay is modified 
to read 


\[ \partial^\mu \mathcal{F}^5_{3\mu}(x) = (f_\pi \mu_\pi^2/\sqrt{2})\,\phi_\pi(x) + S\,\frac{\alpha_0}{4\pi}\, F^{\xi\sigma}(x) F^{\tau\rho}(x)\,\epsilon_{\xi\sigma\tau\rho} \]   [6a] 


with $\mu_\pi$ the pion mass, $f_\pi \simeq 131$ MeV the charged-pion 
decay constant, and $S$ a constant determined 
by the constituent fermion charges and axial-vector 
couplings. Taking the matrix element of eqn [6a] 
between the vacuum state and a two-photon state, 
and using the fact that the left-hand side has a 
kinematic zero (the Sutherland-Veltman theorem), 
one sees that the $\pi^0 \to \gamma\gamma$ amplitude $F$ is completely 
determined by the anomaly term, giving the 
formula 


\[ F = -(\alpha/\pi)\, 2S\sqrt{2}/f_\pi \]   [6b] 


For a single set of fractionally charged quarks, the 
amplitude $F$ is a factor of three too small to agree 
with experiment; for three sets of fractionally charged 
quarks (or an equivalent Han-Nambu triplet), eqn 
[6b] gives the correct neutral pion decay rate. This 
calculation was one of the first pieces of evidence for 
the color degree of freedom of quarks. 
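
The factor of three can be made concrete with a short numerical sketch of eqn [6b]. Several ingredients are assumptions that do not appear in the text above: the form $S = \sum_j g_j Q_j^2$ with axial couplings $g_u = +1/2$, $g_d = -1/2$ and quark charges $2/3$, $-1/3$; the width formula $\Gamma = \mu_\pi^3 F^2/(64\pi)$; and the numerical values chosen for $\alpha$ and the neutral pion mass. The function names are illustrative.

```python
import math
from fractions import Fraction

ALPHA = 1 / 137.036      # fine-structure constant (assumed value)
F_PI_MEV = 131.0         # charged-pion decay constant quoted in the text
M_PI0_MEV = 134.977      # neutral pion mass in MeV (assumed value)

def anomaly_constant(n_colors):
    """S = sum of (axial coupling) * charge^2 over constituents (assumed form)."""
    q_up, q_down = Fraction(2, 3), Fraction(-1, 3)
    g_up, g_down = Fraction(1, 2), Fraction(-1, 2)
    return n_colors * (g_up * q_up**2 + g_down * q_down**2)

def amplitude(s):
    """Eqn [6b]: F = -(alpha/pi) * 2 S sqrt(2) / f_pi, in MeV^-1."""
    return -(ALPHA / math.pi) * 2 * float(s) * math.sqrt(2) / F_PI_MEV

def width_ev(s):
    """Two-photon width mu^3 F^2 / (64 pi), converted from MeV to eV."""
    return M_PI0_MEV**3 * amplitude(s)**2 / (64 * math.pi) * 1e6

for n_colors in (1, 3):
    s = anomaly_constant(n_colors)
    print(n_colors, s, width_ev(s))
```

With one set of quarks the sketch gives $S = 1/6$, with three colors $S = 1/2$; since the width scales as $F^2$, the two predictions differ by a factor of nine, and only the three-color value lands in the range of a few eV.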


Anomaly Cancellation in Gauge Theories 


In quantum electrodynamics, the gauge particle (the 
photon) couples to the vector current, and so the 
anomalous conservation properties of the axial- 
vector current have no effect. The same statement 
holds for the gauge gluons in quantum chromody- 
namics, when treated in isolation from the other 
interactions. However, in the electroweak theory 
that embeds quantum electrodynamics in a theory of 
the weak force, the gauge particles (the $W^\pm$ and $Z$ 
intermediate bosons) couple to chiral currents, 
which are left- or right-handed linear combinations 
of the vector and axial-vector currents. In this case, 
the chiral anomaly leads to problems with the 
renormalizability of the theory, unless the anomalies 
cancel between different fermion species. Writing all 
fermions as left-handed, the condition for anomaly 
cancellation is 


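\[ \mathrm{tr}\{T_\alpha, T_\beta\}T_\gamma = \mathrm{tr}(T_\alpha T_\beta + T_\beta T_\alpha)T_\gamma = 0 \quad \text{for all } \alpha, \beta, \gamma \]   [7] 


with $T_\alpha$ the coupling matrices of gauge bosons to 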
left-handed fermions. These conditions are obeyed 
in the standard model, by virtue of three nontrivial 
sum rules on the fermion gauge couplings being 
satisfied (four sum rules, if one includes the 
gravitational contribution to the chiral anomaly 
given in eqn [4c], which also cancels in the standard 
model). Note that anomaly cancellation in the 
locally gauged currents of the standard model does 
not imply anomaly cancellation in global-flavor 
currents. Thus, the flavor axial-vector current 
anomaly that gives the $\pi^0 \to \gamma\gamma$ matrix element 
remains anomalous in the full electroweak theory. 
Anomaly cancellation imposes important constraints 
on the construction of grand unified models that 
combine the electroweak theory with quantum 
chromodynamics. For instance, in SU(5) the fermions 
are put into a $\bar{5}$ and a 10 representation, which 
together, but not individually, are anomaly free. The 
larger unification groups SO(10) and $E_6$ satisfy eqn 
[7] for all representations, and so are automatically 
anomaly free. 
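
The hypercharge sum rules behind this cancellation can be checked by exact arithmetic. The sketch below assumes one generation of standard-model fermions written as left-handed Weyl fields, with hypercharge normalized so that $Q = T_3 + Y$; the field list and the function name `anomaly_sums` are illustrative and not taken from the text. Each sum is the part of eqn [7] that involves at least one U(1) generator, plus the mixed gravitational term.

```python
from fractions import Fraction as Fr

# (name, SU(3) dimension, SU(2) dimension, hypercharge Y), with Q = T3 + Y.
# Right-handed fields appear as left-handed conjugates (u^c, d^c, e^c).
GENERATION = [
    ("Q_L", 3, 2, Fr(1, 6)),
    ("u^c", 3, 1, Fr(-2, 3)),
    ("d^c", 3, 1, Fr(1, 3)),
    ("L_L", 1, 2, Fr(-1, 2)),
    ("e^c", 1, 1, Fr(1)),
]

def anomaly_sums(fields):
    """Return the four hypercharge sums that must vanish for cancellation."""
    return {
        # U(1)^3: sum of Y^3 over every component field.
        "U(1)^3": sum(c * w * y**3 for _, c, w, y in fields),
        # SU(2)^2 U(1): sum of Y over doublets, counting colors.
        "SU(2)^2 U(1)": sum(c * y for _, c, w, y in fields if w == 2),
        # SU(3)^2 U(1): sum of Y over color triplets, counting isospin components.
        "SU(3)^2 U(1)": sum(w * y for _, c, w, y in fields if c == 3),
        # Mixed gravitational anomaly: sum of Y over every component field.
        "grav^2 U(1)": sum(c * w * y for _, c, w, y in fields),
    }

for label, value in anomaly_sums(GENERATION).items():
    print(label, value)
```

Dropping any single field from the list (for example the `e^c` entry) leaves a nonzero U(1)^3 sum, which illustrates that the cancellation happens only between quarks and leptons together.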


Instanton Physics and the Theta Vacuum 


The theory of anomalies is intimately tied to the 
physics associated with instanton classical Yang- 
Mills theory solutions. Since the instanton field 
strength is self-dual, the nonvanishing instanton 
Euclidean action 


\[ S_E = \int d^4x\, \tfrac{1}{4} F_{\mu\nu} F^{\mu\nu} = 8\pi^2 \]   [8a] 


implies that the integral of the pseudoscalar density 
$F_{\mu\nu}\tilde{F}^{\mu\nu}$ over the instanton is also nonzero, 


\[ \int d^4x\, F_{\mu\nu}\tilde{F}^{\mu\nu} = 64\pi^2 \]   [8b] 


Referring back to eqn [4b], this means that the 
integral of the nonabelian chiral anomaly for 
fermions in the background field of an instanton is 
an integer, which in the Minkowski space continua- 
tion has the interpretation of a topological winding 
number change produced by the instanton tunneling 
solution. This fact has a number of profound 
consequences. Since a vacuum with a definite winding 
number $|\nu\rangle$ is unstable under instanton tunneling, 
careful analysis shows that the nonabelian 
vacuum that has correct clustering properties is a 
Fourier superposition 


\[ |\theta\rangle = \sum_\nu e^{i\theta\nu} |\nu\rangle \]   [8c] 


giving rise to the $\theta$-vacuum of quantum chromodynamics, 
and a host of issues associated with (the lack 
of) strong CP violation, the Peccei-Quinn mechanism, 
and axion physics. Also, the fact that the 
integral of eqn [8b] is nonzero means that the U(1) 
chiral symmetry of quantum chromodynamics is 
broken by instantons, which as shown by ’t Hooft 
resolves the longstanding “U(1) problem” of strong 
interactions, that of explaining why the flavor 
singlet pseudoscalar meson $\eta'$ is not light, unlike its 
flavor octet partners. 
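
Why a Fourier superposition is singled out can be seen in a toy model. The sketch below is an illustration under loud assumptions: the winding sectors are truncated to a ring of `N` states with periodic identification, tunneling is modeled as the shift $|\nu\rangle \to |\nu + 1\rangle$, and $\theta$ is restricted to multiples of $2\pi/N$ so that the truncation is consistent. None of these names or simplifications come from the text. In this model the superposition $\sum_\nu e^{i\theta\nu}|\nu\rangle$ is an eigenvector of the shift with eigenvalue $e^{-i\theta}$, whereas a state of definite winding number is not an eigenvector at all.

```python
import cmath
import math

N = 12  # number of winding sectors kept (toy periodic truncation)

def theta_state(k):
    """Coefficients exp(i*theta*nu) for theta = 2*pi*k/N, nu = 0..N-1."""
    theta = 2.0 * math.pi * k / N
    return theta, [cmath.exp(1j * theta * nu) for nu in range(N)]

def tunnel(state):
    """Shift |nu> -> |nu+1> (mod N): the new coefficient at nu is the old one at nu-1."""
    return [state[(nu - 1) % N] for nu in range(N)]

def eigenvalue_defect(k):
    """Largest deviation of tunnel(state) from exp(-i*theta) * state."""
    theta, state = theta_state(k)
    phase = cmath.exp(-1j * theta)
    return max(abs(s - phase * c) for s, c in zip(tunnel(state), state))

for k in range(N):
    print(k, eigenvalue_defect(k))  # each defect is at rounding-error level
```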


Anomaly Matching Conditions 


The anomaly structure of a theory, as shown by ’t 
Hooft, leads to important constraints on the forma- 
tion of massless composite bound states. Consider a 
theory with a set of left-handed fermions $\psi_{if}$, with $i$ a 
“color” index acted on by a nonabelian gauge force, 
and $f$ an ungauged family or “flavor” index. Suppose 
that the family multiplet structure is such that the 
global chiral symmetries associated with the flavor 
index have nonvanishing anomalies $\mathrm{tr}\{T_\alpha, T_\beta\}T_\gamma$. 
Then the ’t Hooft condition asserts that if the color 
forces result in the formation of composite massless 
bound states of the original completely confined 
fermions, and if there is no spontaneous breaking of 
the original global flavor symmetries, then these 
bound states must contain left-handed spin-1/2 
composites with a representation structure $S$ that 
has the same anomaly coefficient as that in the 
underlying theory. In other words, we must have 


\[ \mathrm{tr}\{S_\alpha, S_\beta\}S_\gamma = \mathrm{tr}\{T_\alpha, T_\beta\}T_\gamma \]   [9] 


To prove this, one adjoins to the theory a set of 
right-handed spectator fermions $\psi_f$ with the same 
flavor structure as the original set, but which are not 
acted on by the color force. These right-handed 
fermions cancel the original anomaly, making the 
underlying theory anomaly free at zero color 
coupling; since dynamics cannot spontaneously 
generate anomalies, the theory, when the color 
dynamics is turned on, must also have no global 
chiral anomalies. This implies that the bound-state 
spectrum must conspire to cancel the anomalies 
associated with the right-handed spectators; in other 
words, the bound-state anomaly structure must 
match that of the original fermions. This anomaly 
matching condition has found applications in the 
study of the possible compositeness of quarks and 
leptons. It has also been applied to the derivation of 
nonperturbative dynamical results in whole classes 
of supersymmetric theories, where the combined 
tools of holomorphicity, instanton physics, and 
anomaly matching have given incisive results. 


Global Structure of Anomalies 


We noted earlier that chiral anomalies are irreduci- 
ble, in that they cannot be eliminated by adding a 
local polynomial counter-term to the action. How- 
ever, anomalies can be described by a nonlocal 
effective action, obtained by integrating out the 
fermion field dynamics, and this point of view proves 
very useful in the nonabelian case. Starting with the 
abelian case for orientation, we note that if $A^\mu$ is an 
external axial-vector field, and we write an effective 
action $\Gamma[A]$, then the axial-vector current $j^5_\mu$ associated 
with $A^\mu$ is given (up to an overall constant) by 
the variational derivative expression 


\[ j^5_\mu(x) = \frac{\delta\Gamma[A]}{\delta A^\mu(x)} \]   [10a] 


and the abelian anomaly appears as the fact that the 
expression 


\[ \partial^\mu \frac{\delta\Gamma[A]}{\delta A^\mu(x)} = X\Gamma[A] = G \neq 0, \qquad X = \partial^\mu \frac{\delta}{\delta A^\mu(x)} \]   [10b] 
is nonvanishing even when the theory is classically 
chiral invariant. Turning now to the nonabelian 
case, the variational derivative appearing in eqns 
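[10a] and [10b] must be replaced by an appropriate 
covariant derivative. In terms of the internal-symmetry 
component fields $A^\mu_a$ and $V^\mu_a$ of the 
Yang-Mills potentials of eqn [4a], one introduces 
operators 


\[
\begin{aligned}
-X^a(x) &= \partial^\mu \frac{\delta}{\delta A^\mu_a(x)} + f_{abc} V^\mu_b(x) \frac{\delta}{\delta A^\mu_c(x)} + f_{abc} A^\mu_b(x) \frac{\delta}{\delta V^\mu_c(x)} \\
-Y^a(x) &= \partial^\mu \frac{\delta}{\delta V^\mu_a(x)} + f_{abc} V^\mu_b(x) \frac{\delta}{\delta V^\mu_c(x)} + f_{abc} A^\mu_b(x) \frac{\delta}{\delta A^\mu_c(x)}
\end{aligned}
\]   [11a] 


with $f_{abc}$ the antisymmetric nonabelian group structure 
constants. The operators $X^a$ and $Y^a$ are easily 
seen to obey the commutation relations 


\[
\begin{aligned}
[X^a(x), X^b(y)] &= f_{abc}\,\delta(x - y)\, Y^c(x) \\
[X^a(x), Y^b(y)] &= f_{abc}\,\delta(x - y)\, X^c(x) \\
[Y^a(x), Y^b(y)] &= f_{abc}\,\delta(x - y)\, Y^c(x)
\end{aligned}
\]   [11b] 


Let $\Gamma[V, A]$ be the effective action as a functional of 
the fields $V^\mu$, $A^\mu$, constructed so that the vector 
currents are covariantly conserved, as expressed 
formally by 


\[ Y^a\, \Gamma[V, A] = 0 \]   [12a] 

Then the nonabelian axial-vector current anomaly is 
given by 


\[ X^a\, \Gamma[V, A] = G^a \]   [12b] 


From eqns [12a] and [12b] and the first line of 
eqn [11b], we have 


\[ X^b G^a - X^a G^b = (X^b X^a - X^a X^b)\, \Gamma[V, A] \propto f_{abc}\, Y^c\, \Gamma[V, A] = 0 \]   [12c] 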


which is the Wess-Zumino consistency condition on 
the structure of the anomaly $G^a$. It can be shown 
that this condition uniquely fixes the form of the 
nonabelian anomaly to be that of eqn [4b], up to an 
overall constant, which can be determined by 
comparison with the simplest anomalous AVV 
triangle graph. A physical consequence of the 
consistency condition is that the $\pi^0 \to \gamma\gamma$ decay 
amplitude determines uniquely certain other anomalous 
amplitudes, such as $2\gamma \to 3\pi$, $\gamma \to 3\pi$, and a 
five-pseudoscalar vertex. 

Although the action $\Gamma[V, A]$ is necessarily nonlocal, 
Wess and Zumino were able to write down a 
local action, involving an auxiliary pseudoscalar 
field, that obeys the anomalous Ward identities and 
the consistency conditions. Subsequently, Witten 
gave a new construction of this local action, in 
terms of the integral of a fifth-rank antisymmetric 
tensor over a five-dimensional disk which has a 
four-dimensional space as its boundary. He also 
showed that requiring $e^{i\Gamma}$ to be independent of the 
choice of the spanning disk requires, in analogy with 
Dirac’s quantization condition for monopole charge, 
the condition that the overall coefficient in the 
nonabelian anomaly be quantized in integer multiples. 
Comparison with the lowest-order triangle 
diagram shows that in the case of SU($N_c$) gauge 
theory, this integer is just the number of colors $N_c$. 
Thus, global considerations tightly constrain the 
nonabelian chiral anomaly structure, and dictate 
that up to an integer-proportionality constant, it 
must have the form given in eqns [4a] and [4b]. 


Trace Anomalies 


The discovery of chiral anomalies inspired the search 
for other examples of anomalous behavior. First 
indications of a perturbative trace anomaly obtained 
in a study of broken scale invariance by Coleman and 
Jackiw were shown by Crewther, and by Chanowitz 
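and Ellis, to correspond to an anomaly in the three-point 
function $\theta^\sigma_\sigma V_\mu V_\nu$, where $\theta_{\mu\nu}$ is the energy-momentum 
tensor. Letting $\Delta_{\mu\nu}(p)$ be the momentum 
space expression for this three-point function, and $\Pi_{\mu\nu}$ 
the corresponding $V_\mu V_\nu$ two-point function, the trace 
anomaly equation in quantum electrodynamics reads 

    $\Delta_{\mu\nu}(p) = \left(2 - p_\sigma \frac{\partial}{\partial p_\sigma}\right) \Pi_{\mu\nu}(p) - \frac{R}{6\pi^2}\,(p_\mu p_\nu - \eta_{\mu\nu} p^2)$   [13a]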


with the first term on the right-hand side the naive 
divergence, and the second term the trace anomaly, 
with anomaly coefficient R given by 


    $R = \sum_{i,\ \mathrm{spin}\ 1/2} Q_i^2 + \frac{1}{4} \sum_{i,\ \mathrm{spin}\ 0} Q_i^2$   [13b]
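As a small numerical illustration of the anomaly coefficient, the sketch below evaluates R for a few field contents. It assumes that [13b] has the form R = (sum of squared charges of spin-1/2 fields) + 1/4 (sum of squared charges of spin-0 fields), with charges in units of e. The function name and the example field contents are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def anomaly_coefficient(spin_half_charges, spin_zero_charges=()):
    """Assumed form of [13b]: R = sum_{spin 1/2} Q^2 + (1/4) sum_{spin 0} Q^2."""
    fermion_part = sum((Fraction(q) ** 2 for q in spin_half_charges), Fraction(0))
    scalar_part = sum((Fraction(q) ** 2 for q in spin_zero_charges), Fraction(0)) / 4
    return fermion_part + scalar_part

# Hypothetical field contents, charges in units of e.
quark_charges = [Fraction(2, 3), Fraction(-1, 3), Fraction(-1, 3)] * 3  # u, d, s in 3 colors
R_quarks = anomaly_coefficient(quark_charges)
R_electron = anomaly_coefficient([-1])
R_scalar = anomaly_coefficient([], [1])
print(R_quarks, R_electron, R_scalar)
```

Under this assumption a single charged scalar contributes one quarter of what a unit-charge spin-1/2 field contributes.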


The fact that there should be a trace anomaly can 
readily be inferred from a trace analog of the Pauli- 
Villars regulator argument for the chiral anomaly 
given in eqn [3a]. Letting $j = \bar\psi\psi$ be the scalar 
current in abelian electrodynamics, one has 

    $\theta^\mu_\mu|_{m_0} - \theta^\mu_\mu|_{M_0} = m_0\, j|_{m_0} - M_0\, j|_{M_0}$   [13c]

Taking the vacuum to two-photon matrix element 
of this equation, and imposing vector-current conservation, 
one finds that the matrix element 
$\langle 0|\, j|_{M_0}\, |\gamma\gamma\rangle$ is proportional to $M_0^{-1} \langle 0| F_{\lambda\sigma}F^{\lambda\sigma} |\gamma\gamma\rangle$ 
for a large regulator mass, and so makes a 
nonvanishing contribution to the right-hand side of 
eqn [13c], giving the lowest-order trace anomaly. 
Unlike the chiral anomaly, the trace anomaly is 
renormalized in higher orders of perturbation 
theory; heuristically, the reason is that whereas 
boson field regulators do not affect the chiral 
symmetry properties of a gauge theory (which are 
determined just by the fermionic terms in the 
Lagrangian), they do alter the energy-momentum 
tensor, since gravitation couples to all fields, includ- 
ing regulator fields. An analysis using the Callan- 
Symanzik equations shows, however, that the trace 
anomaly is computable to all orders in terms of 
various renormalization group functions of the 
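coupling. For example, in abelian electrodynamics, 
defining $\beta(\alpha)$ and $\delta(\alpha)$ by $\beta(\alpha) = (m/\alpha)\,\partial\alpha/\partial m$ and 
$1 + \delta(\alpha) = (m/m_0)\,\partial m_0/\partial m$, the trace of the energy-momentum 
tensor is given to all orders by 

    $\theta^\mu_\mu = [1 + \delta(\alpha)]\, m_0 \bar\psi\psi + \frac{1}{4}\beta(\alpha)\, N[F_{\lambda\sigma}F^{\lambda\sigma}] + \cdots$   [14]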


with N[ ] specifying conditions that make the division 
into two terms in eqn [14] unique, and with the 
ellipsis indicating terms that vanish by the equations 
of motion. A similar relation holds in the 
nonabelian case, again with the $\beta$ function appearing 
as the coefficient of the anomalous $\mathrm{tr}\, N[F_{\lambda\sigma}F^{\lambda\sigma}]$ term. 

Just as in the chiral anomaly case, when spin-0, 
spin-1/2, or spin-1 fields propagate on a background 
spacetime, there are curvature-dependent contributions 
to the trace anomaly, in other words, gravitational 
anomalies. These typically take the form of 
complicated linear combinations of terms of the 
form $R^2$, $R_{\mu\nu}R^{\mu\nu}$, $R_{\mu\nu\lambda\sigma}R^{\mu\nu\lambda\sigma}$, $R_{,\mu}{}^{;\mu}$, with coefficients 
depending on the matter fields involved. 

In supersymmetric theories, the axial-vector current 
and the energy-momentum tensor are both 
components of the supercurrent, and so their anoma- 
lies imply the existence of corresponding supercurrent 
anomalies. The issue of how the nonrenormalization 
of chiral anomalies (which have a supercurrent 
generalization given by the Konishi anomaly), and 
the renormalization of trace anomalies, can coexist in 
supersymmetric theories originally engendered con- 
siderable confusion. This apparent puzzle is now 
understood in the context of a perturbatively exact 
expression for the $\beta$ function in supersymmetric field 
theories (the so-called NSVZ $\beta$ function, for Novikov, Shifman, 
Vainshtein, and Zakharov). Supersymmetry 
anomalies can be used to infer the structure of 
effective actions in supersymmetric theories, and these 
in turn have important implications for possibilities 
for dynamical supersymmetry breaking. Anomalies 
may also play a role, through anomaly mediation, in 
communicating supersymmetry breaking in “hidden 
sectors” of a theory, which do not contain the physical 
fields that we directly observe, to the “physical sector” 
containing the observed fields. 


Further Anomaly Topics 


The above discussion has focused on some of the 
principal features and applications of anomalies. 
There are further topics of interest in the physics and 
mathematics of anomalies that are discussed in 
detail in the references cited in the “Further reading” 
section. We briefly describe a few of them here. 


Anomalies in Other Spacetime Dimensions 
and in String Theory 


The focus above has been on anomalies in four- 
dimensional spacetime, but anomalies of various 
types occur both in lower-dimensional quantum 
field theories (such as theories in two- and three- 
dimensional spacetimes) and in quantum field the- 
ories in higher-dimensional spacetimes (such as N = 1 
supergravity in ten-dimensional spacetime). Anoma- 
lies also play an important role in the formulation 
and consistency of string theory. The bosonic string is 
consistent only in 26-dimensional spacetime, and the 
analogous supersymmetric string only in ten-dimen- 
sional spacetime, because in other dimensions both 
these theories violate Lorentz invariance after quanti- 
zation. In the Polyakov path-integral formulation of 
these string theories, these special dimensions are 
associated with the cancellation of the Weyl anomaly, 
which is the relevant form of the trace anomaly 
discussed above. Yang-Mills, gravitational, and 
mixed Yang-Mills gravitational anomalies make an 
appearance both in N=1 ten-dimensional super- 
gravity and in superstring theory, and again special 
dimensions play a role. In these theories, only when 
the associated internal symmetry groups are either 
SO(32) or $E_8 \times E_8$ is elimination of all anomalies 
possible, by cancellation of hexagon-diagram anoma- 
lies with anomalous tree diagrams involving 
exchange of a massless antisymmetric two-form 
field. This mechanism, due to Green and Schwarz, 
requires the factorization of a sixth-order trace 
invariant that appears in the hexagon anomaly in 
terms of lower-order invariants, as well as two 
numerical conditions on the adjoint representation 
generator structure, restricting the allowed gauge 
groups to the two noted above. 


Covariant versus Consistent Anomalies; 
Descent Equations 


The nonabelian anomaly of eqns [4a] and [4b] is 
called the “consistent anomaly,” because it obeys the 
Wess-Zumino consistency conditions of eqn [12c]. 
This anomaly, however, is not gauge covariant, as can 
be seen from the fact that it involves not only the 
Yang-Mills field strengths $F_{V,A}^{\mu\nu}$, but the potentials 
$V^\mu$, $A^\mu$ as well. It turns out to be possible, by adding 
appropriate polynomials to the currents, to transform 
the consistent anomaly to a form, called the “covariant 
anomaly,” which is gauge covariant under gauge 
transformations of the potentials $V^\mu$, $A^\mu$. This anomaly, 
however, does not obey the Wess-Zumino 
consistency conditions, and cannot be obtained from 
variation of an effective action functional. 

The consistent anomalies (but not the covariant 
anomalies) obey a remarkable set of relations, called 
the Stora-Zumino descent equations, which relate 
the abelian anomaly in $2n + 2$ spacetime dimensions 
to the nonabelian anomaly in $2n$ spacetime dimensions. 
This set of equations has been interpreted 
physically by Callan and Harvey as reflecting the 
fact that the Dirac equation has chiral zero modes in 
the presence of strings in 2n + 2 dimensions and of 
domain walls in 2n + 1 dimensions. 


Anomalies and Fermion Doubling in Lattice 
Gauge Theories 


A longstanding problem in lattice formulations of 
gauge field theories is that when fermions are 
introduced on the lattice, the process of discretization 
introduces an undesirable doubling of the fermion 
particle modes. In particular, when an attempt is made 
to put chiral gauge theories, such as the electroweak 
theory, on the lattice, one finds that the doublers 
eliminate the chiral anomalies, by cancellation between 
modes with positive and negative axial-vector charge. 
Thus, for a long time, it appeared doubtful whether 
chiral gauge theories could be simulated on the lattice. 
However, recent work has led to formulations of lattice 
fermions that use a mathematical analog of a domain 
wall to successfully incorporate chiral fermions and the 
chiral anomaly into lattice gauge theory calculations. 


Relation of Anomalies to the Atiyah-Singer 
Index Theorem 


The singlet ($\lambda_A^a = 1$) anomaly of eqn [4b] is closely 
related to the Atiyah—Singer index theorem. Specifi- 
cally, the Euclidean spacetime integral of the singlet 
anomaly constructed from a gauge field can be 
shown to give the index of the related Dirac 
operator for a fermion moving in that background 
gauge field, where the index is defined as the 
difference between the numbers of right- and left- 
handed zero-eigenvalue normalizable solutions of 
the Dirac equation. Since the index is a topological 
invariant, this again implies that the Euclidean 
spacetime integral of the anomaly is a topological 
invariant, as noted above in our discussion of 
instanton-related applications of anomalies. 


Retrospect 


The wide range of implications of anomalies has 
surprised — even astonished — the founders of the 
subject. New anomaly applications have appeared 
within the last few years, and very likely the future 
will see continued growth of the area of quantum 
field theory concerned with the physics and mathe- 
matics of anomalies. 
Introduction 


The central objective in the study of quantum chaos 
is to characterize universal properties of quantum 
systems that reflect the regular or chaotic features of 
the underlying classical dynamics. Most develop- 
ments of the past 25 years have been influenced by 
the pioneering models on statistical properties of 
eigenstates (Berry 1977) and energy levels (Berry 
and Tabor 1977, Bohigas et al. 1984). Arithmetic 
quantum chaos (AQC) refers to the investigation of 
quantum systems with additional arithmetic struc- 
tures that allow a significantly more extensive 
analysis than is generally possible. On the other 
hand, the special number-theoretic features also 
render these systems nongeneric, and thus some of 
the expected universal phenomena fail to emerge. 
Important examples of such systems include the 
modular surface and linear automorphisms of tori 
(“cat maps”) which will be described below. 

The geodesic motion of a point particle on a 
compact Riemannian surface M of constant nega- 
tive curvature is the prime example of an Anosov 
flow, one of the strongest characterizations of 
dynamical chaos. The corresponding quantum 
eigenstates $\varphi_j$ and energy levels $\lambda_j$ are given by the 
solution of the eigenvalue problem for the Laplace-Beltrami 
operator $\Delta$ (or Laplacian for short) 

    $(\Delta + \lambda)\varphi = 0, \qquad \|\varphi\|_{L^2(M)} = 1$   [1]

where the eigenvalues 

    $\lambda_0 = 0 < \lambda_1 \le \lambda_2 \le \cdots \to \infty$   [2]

form a discrete spectrum with an asymptotic density 
governed by Weyl’s law 

    $\#\{j : \lambda_j \le \lambda\} \sim \frac{\mathrm{Area}(M)}{4\pi}\,\lambda, \qquad \lambda \to \infty$   [3]

We rescale the sequence by setting 

    $X_j = \frac{\mathrm{Area}(M)}{4\pi}\,\lambda_j$   [4]

which yields a sequence of asymptotic density 1. 
One of the central conjectures in AQC says that, if 
M is an arithmetic hyperbolic surface (see the next 
section for examples of this very special class of 
surfaces of constant negative curvature), the eigenvalues 
of the Laplacian have the same local 
statistical properties as independent random variables 
from a Poisson process (see, e.g., the surveys by 
Sarnak (1995) and Bogomolny et al. (1997)). This 
means that the probability of finding k eigenvalues $X_j$ 
in a randomly shifted interval $[X, X + L]$ of fixed 
length L is distributed according to the Poisson law 
$L^k e^{-L}/k!$. The gaps between eigenvalues have an 
exponential distribution, 

    $\frac{1}{N}\,\#\{j \le N : X_{j+1} - X_j \in [a, b]\} \to \int_a^b e^{-s}\,ds$   [5]

as $N \to \infty$, and thus eigenvalues are likely to appear 
in clusters. This is in contrast to the general 
expectation that the energy level statistics of generic 
chaotic systems follow the distributions of random 
matrix ensembles; Poisson statistics are usually 
associated with quantized integrable systems. 
Although we are at present far from a proof of [5], 
the deviation from random matrix theory is well 
understood (see the section “Eigenvalue statistics 
and Selberg trace formula”). 
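To make the Poisson prediction concrete, the following sketch generates a synthetic unit-density Poisson sequence and compares the observed fraction of small gaps with the integral of e^{-s}. The points are a stand-in produced by a random number generator, not actual Laplacian eigenvalues, and the function names are hypothetical.

```python
import math
import random

def poisson_sequence(n, rng):
    """Return n points of a unit-rate Poisson process (cumulative exponential gaps)."""
    points, x = [], 0.0
    for _ in range(n):
        x += rng.expovariate(1.0)
        points.append(x)
    return points

def gap_fraction(points, a, b):
    """Fraction of consecutive gaps X_{j+1} - X_j lying in [a, b]."""
    gaps = [q - p for p, q in zip(points, points[1:])]
    return sum(a <= g <= b for g in gaps) / len(gaps)

rng = random.Random(0)                      # fixed seed for reproducibility
points = poisson_sequence(50_000, rng)
small_gap_fraction = gap_fraction(points, 0.0, 0.5)
predicted = 1.0 - math.exp(-0.5)            # integral of e^{-s} over [0, 0.5]
print(round(small_gap_fraction, 3), round(predicted, 3))
```

Roughly 39% of the gaps are shorter than half the mean spacing, which is the clustering referred to in the text. For random matrix ensembles small gaps are strongly suppressed by level repulsion.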

Figure 1 Image of the absolute-value-squared of an eigenfunction 
$\varphi_j(z)$ for a nonarithmetic surface of genus 2. The surface is 
obtained by identifying opposite sides of the fundamental region. 
Reproduced from Aurich and Steiner (1993) Statistical properties of 
highly excited quantum eigenstates of a strongly chaotic system. 
Physica D 64(1-3): 185-214, with permission from R Aurich. 

Highly excited quantum eigenstates $\varphi_j$ ($j \to \infty$) 
(cf. Figure 1) of chaotic systems are conjectured to 
behave locally like random wave solutions of [1], 
where boundary conditions are ignored. This 
hypothesis was put forward by Berry in 1977 and 
tested numerically, for example, in the case of 
certain arithmetic and nonarithmetic surfaces of 
constant negative curvature (Hejhal and Rackner 
1992, Aurich and Steiner 1993). One of the 
implications is that eigenstates should have uniform 
mass on the surface M, that is, for any bounded 
continuous function $g: M \to \mathbb{R}$ 

    $\int_M |\varphi_j|^2\, g\, dA \to \frac{1}{\mathrm{Area}(M)} \int_M g\, dA, \qquad j \to \infty$   [6]


where dA is the Riemannian area element on M. 
This phenomenon, referred to as quantum unique 
ergodicity (QUE), is expected to hold for general 
surfaces of negative curvature, according to a 
conjecture by Rudnick and Sarnak (1994). In the 
case of arithmetic hyperbolic surfaces, there has 
been substantial progress on this conjecture in the 
works of Lindenstrauss, Watson, and Luo-Sarnak 
(discussed later in this article; see also the review by 
Sarnak (2003)). For general manifolds with ergodic 
geodesic flow, the convergence in [6] is so far 
established only for subsequences of eigenfunctions 
of density 1 (Schnirelman-Zelditch-Colin de Verdière 
theorem, see Quantum Ergodicity and Mixing of 
Eigenfunctions), and it cannot be ruled out that 
exceptional subsequences of eigenfunctions have 
a singular limit, for example, localized on closed 
geodesics. Such “scarring” of eigenfunctions, at least 
in some weak form, has been suggested by numerical 
experiments in Euclidean domains, and the existence 
of singular quantum limits is a matter of controversy 
in the current physics and mathematics literature. A 
first rigorous proof of the existence of scarred 
eigenstates has recently been established in the case 
of quantized toral automorphisms. Remarkably, 
these quantum cat maps may also exhibit QUE. A 
more detailed account of results for these maps is 
given in the section “Quantum eigenstates of cat 
maps”; see also Rudnick (2001) and De Bièvre (to 
appear). 

There have been a number of other fruitful 
interactions between quantum chaos and number 
theory, in particular the connections of spectral 
statistics of integrable quantum systems with the 
value distribution properties of quadratic forms, and 
analogies in the statistical behavior of energy levels 
of chaotic systems and the zeros of the Riemann zeta 
function. We refer the reader to Marklof (2006) and 
Berry and Keating (1999), respectively, for informa- 
tion on these topics. 


Hyperbolic Surfaces 


Let us begin with some basic notions of hyperbolic 
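geometry. The hyperbolic plane $\mathbb{H}$ may be abstractly 
defined as the simply connected two-dimensional 
Riemannian manifold with Gaussian curvature $-1$. 
A convenient parametrization of $\mathbb{H}$ is provided by 
the complex upper-half plane, $\mathfrak{H} = \{x + iy : x \in \mathbb{R},\ y > 0\}$, 
with Riemannian line and volume elements 

    $ds^2 = \frac{dx^2 + dy^2}{y^2}, \qquad dA = \frac{dx\,dy}{y^2}$   [7]

respectively. The group of orientation-preserving 
isometries of $\mathbb{H}$ is given by fractional linear 
transformations 

    $\mathfrak{H} \to \mathfrak{H}, \qquad z \mapsto \frac{az + b}{cz + d}, \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}(2, \mathbb{R})$   [8]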


where $\mathrm{SL}(2, \mathbb{R})$ is the group of $2 \times 2$ matrices with 
unit determinant. Since the matrices $1$ and $-1$ 
represent the same transformation, the group of 
orientation-preserving isometries can be identified 
with $\mathrm{PSL}(2, \mathbb{R}) := \mathrm{SL}(2, \mathbb{R})/\{\pm 1\}$. A finite-volume 
hyperbolic surface may now be represented as the 
quotient $\Gamma\backslash\mathbb{H}$, where $\Gamma \subset \mathrm{PSL}(2, \mathbb{R})$ is a Fuchsian 
group of the first kind. An arithmetic hyperbolic 
surface (such as the modular surface) is obtained if $\Gamma$ 
has, loosely speaking, some representation in $n \times n$ 
matrices with integer coefficients, for some suitable n. 
This is evident in the case of the modular surface, 
where the fundamental group is the modular group 

    $\Gamma = \mathrm{PSL}(2, \mathbb{Z}) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{PSL}(2, \mathbb{R}) : a, b, c, d \in \mathbb{Z} \right\}$


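A fundamental domain for the action of the 
modular group $\mathrm{PSL}(2, \mathbb{Z})$ on $\mathfrak{H}$ is the set 

    $F_{\mathrm{PSL}(2,\mathbb{Z})} = \{ z \in \mathfrak{H} : |z| > 1,\ -\tfrac{1}{2} < \mathrm{Re}\, z < \tfrac{1}{2} \}$   [9]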


(see Figure 2). The modular group is generated by 
the translation 

    $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} : z \mapsto z + 1$

and the inversion 

    $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} : z \mapsto -1/z$

These generators identify sections of the boundary 
of $F_{\mathrm{PSL}(2,\mathbb{Z})}$. By gluing the fundamental domain 
along identified edges, we obtain a realization of the 
modular surface, a noncompact surface with one 
cusp at $z \to \infty$, and two conic singularities at $z = i$ 
and $z = 1/2 + i\sqrt{3}/2$. 
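The role of the two generators can be illustrated by a short reduction routine that moves an arbitrary point of the upper half-plane into the closure of the fundamental domain, using only the translation and the inversion. This is a minimal sketch in floating-point arithmetic; the function name and the step limit are hypothetical choices.

```python
def reduce_to_fundamental_domain(z, max_steps=1000):
    """Map z (Im z > 0) into |z| >= 1, |Re z| <= 1/2 using z -> z + 1 and z -> -1/z."""
    if z.imag <= 0:
        raise ValueError("z must lie in the upper half-plane")
    for _ in range(max_steps):
        # Integer power of the translation: bring Re z into [-1/2, 1/2].
        z = complex(z.real - round(z.real), z.imag)
        if abs(z) >= 1.0:
            return z
        # Inversion: strictly increases Im z whenever |z| < 1.
        z = -1.0 / z
    raise RuntimeError("reduction did not terminate within max_steps")

w = reduce_to_fundamental_domain(complex(3.7, 0.02))
print(w)
```

The loop terminates because each inversion applied to a point with |z| < 1 increases the imaginary part, and only finitely many points of an orbit lie above any given height.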

An interesting example of a compact arithmetic 
surface is the “regular octagon,” a hyperbolic 
surface of genus 2. Its fundamental domain is 
shown in Figure 3 as a subset of the Poincaré disc 
$\mathbb{D} = \{z \in \mathbb{C} : |z| < 1\}$, which yields an alternative 
parametrization of the hyperbolic plane $\mathbb{H}$. In these 
coordinates, the Riemannian line and volume 
elements read 

    $ds^2 = \frac{4\,(dx^2 + dy^2)}{(1 - x^2 - y^2)^2}, \qquad dA = \frac{4\,dx\,dy}{(1 - x^2 - y^2)^2}$   [10]

Figure 2 Fundamental domain of the modular group PSL(2, Z) 
in the complex upper-half plane. 

Figure 3 Fundamental domain of the regular octagon in the 
Poincaré disk. 


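The group of orientation-preserving isometries is 
now represented by $\mathrm{PSU}(1,1) = \mathrm{SU}(1,1)/\{\pm 1\}$, 
where 

    $\mathrm{SU}(1,1) = \left\{ \begin{pmatrix} \alpha & \beta \\ \bar\beta & \bar\alpha \end{pmatrix} : \alpha, \beta \in \mathbb{C},\ |\alpha|^2 - |\beta|^2 = 1 \right\}$   [11]

acting on $\mathbb{D}$ as above via fractional linear transformations. 
The fundamental group of the regular 
octagon surface is the subgroup of all elements in 
PSU(1,1) with coefficients of the form 

    $\alpha = k + l\sqrt{2}, \qquad \beta = (m + n\sqrt{2})\sqrt{1 + \sqrt{2}}$   [12]

where $k, l, m, n \in \mathbb{Z}[i]$, that is, Gaussian integers of 
the form $k_1 + i k_2$, $k_1, k_2 \in \mathbb{Z}$. Note that not all 
choices of $k, l, m, n \in \mathbb{Z}[i]$ satisfy the condition 
$|\alpha|^2 - |\beta|^2 = 1$. Since all elements $\gamma \ne 1$ of $\Gamma$ act 
fix-point free on $\mathbb{H}$, the surface $\Gamma\backslash\mathbb{H}$ is smooth 
without conic singularities. 

In the following, we will restrict our attention to a 
representative case, the modular surface with 
$\Gamma = \mathrm{PSL}(2, \mathbb{Z})$. 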








Eigenvalue Statistics and Selberg 
Trace Formula 


The statistical properties of the rescaled eigenvalues 
$X_j$ (cf. [4]) of the Laplacian can be characterized by 
their distribution in small intervals 

    $N(x, L) := \#\{j : x \le X_j \le x + L\}$   [13]


where x is uniformly distributed, say, in the 
interval [X, 2X], X large. Numerical experiments 
by Bogomolny, Georgeot, Giannoni, and Schmit, 
as well as Bolte, Steil, and Steiner (see references in 


Bogomolny (1997)) suggest that the $X_j$ are asymptotically 
Poisson distributed: 

Conjecture 1 For any bounded function $g: \mathbb{Z}_{\ge 0} \to \mathbb{C}$ 
we have 

    $\frac{1}{X} \int_X^{2X} g(N(x, L))\, dx \to \sum_{k=0}^{\infty} g(k)\, \frac{L^k e^{-L}}{k!}$   [14]

as $X \to \infty$. 


One may also consider larger intervals, where 
$L \to \infty$ as $X \to \infty$. In this case, the assumption on 
the independence of the $X_j$ predicts a central-limit 
theorem. Weyl’s law [3] implies that the expectation 
value is asymptotically, for $X \to \infty$, 

    $\frac{1}{X} \int_X^{2X} N(x, L)\, dx \sim L$   [15]

This asymptotics holds for any sequence of L 
bounded away from zero (e.g., L constant, or 
$L \to \infty$). 
Define the variance by 

    $\Sigma^2(X, L) = \frac{1}{X} \int_X^{2X} (N(x, L) - L)^2\, dx$   [16]


In view of the above conjecture, one expects 
$\Sigma^2(X, L) \sim L$ in the limit $X \to \infty$, $L/\sqrt{X} \to 0$ (the 
variance exhibits a less universal behavior in the 
range $L \gg \sqrt{X}$; the notation $A \ll B$ means there is a 
constant $c > 0$ such that $A \le cB$; cf. Sarnak (1995)), 
and a central-limit theorem for the fluctuations 
around the mean: 

Conjecture 2 For any bounded function $g: \mathbb{R} \to \mathbb{C}$ 
we have 

    $\frac{1}{X} \int_X^{2X} g\left( \frac{N(x, L) - L}{\sqrt{\Sigma^2(X, L)}} \right) dx \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(t)\, e^{-t^2/2}\, dt$   [17]

as $X, L \to \infty$, $L \ll X$. 
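The expectation [15] and the variance [16] can be estimated by Monte Carlo sampling of the window position x. The sketch below does this for a synthetic unit-density Poisson sequence, which is only a model for the behavior conjectured above and not a computation with true eigenvalues. The parameter values and names are hypothetical.

```python
import bisect
import random
import statistics

def count_in_window(points, x, L):
    """N(x, L) = #{j : x <= X_j <= x + L} for a sorted list of points."""
    return bisect.bisect_right(points, x + L) - bisect.bisect_left(points, x)

rng = random.Random(1)                      # fixed seed for reproducibility
X, L, samples = 20_000.0, 2.0, 20_000

# Synthetic Poisson sequence covering [0, 3X] (stand-in for the rescaled X_j).
points, t = [], 0.0
while t < 3 * X:
    t += rng.expovariate(1.0)
    points.append(t)

# Sample x uniformly in [X, 2X] and average, approximating (1/X) * integral dx.
counts = [count_in_window(points, rng.uniform(X, 2 * X), L) for _ in range(samples)]
mean_count = statistics.fmean(counts)
number_variance = statistics.fmean([(c - L) ** 2 for c in counts])
print(round(mean_count, 2), round(number_variance, 2))
```

For independent points both quantities are close to L, which is the behavior that the conjectures predict for arithmetic surfaces in the appropriate range of L.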


The main tool in the attempts to prove the above 
conjectures has been the Selberg trace formula. It 
relates sums over eigenvalues of the Laplacians to 
sums over lengths of closed geodesics on the 
hyperbolic surface. The trace formula is in its 
simplest form in the case of compact hyperbolic 
surfaces; we have 


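    $\sum_{j=0}^{\infty} h(\rho_j) = \frac{\mathrm{Area}(M)}{4\pi} \int_{-\infty}^{\infty} h(\rho) \tanh(\pi\rho)\, \rho\, d\rho + \sum_{\gamma \in H_*} \sum_{n=1}^{\infty} \frac{\ell_\gamma\, g(n\ell_\gamma)}{2 \sinh(n\ell_\gamma/2)}$   [18]

where $H_*$ is the set of all primitive oriented closed 
geodesics $\gamma$, and $\ell_\gamma$ their lengths. The quantity $\rho_j$ is 
related to the eigenvalue $\lambda_j$ by the equation $\lambda_j = \rho_j^2 + 1/4$. 
The trace formula [18] holds for a large class of 
even test functions $h$. For example, it is sufficient to 
assume that $h$ is infinitely differentiable, and that the 
Fourier transform of $h$, 

    $g(t) = \frac{1}{2\pi} \int_{\mathbb{R}} h(\rho)\, e^{-i\rho t}\, d\rho$   [19]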


has compact support. The trace formula for non- 
compact surfaces has additional terms from the 
parabolic elements in the corresponding group, and 
includes also sums over the resonances of the 
continuous part of the spectrum. The noncompact 
modular surface behaves in many ways like a 
compact surface. In particular, Selberg showed that 
the number of eigenvalues embedded in the con- 
tinuous spectrum satisfies the same Weyl law as in 
the compact case (Sarnak 2003). 
Setting 


    $h(\rho) = \chi_{[X, X+L]}\left( \frac{\mathrm{Area}(M)}{4\pi} \left( \rho^2 + \frac{1}{4} \right) \right)$   [20]

where $\chi_{[X, X+L]}$ is the characteristic function of the 
interval $[X, X + L]$, we may thus view $N(X, L)$ as 
the left-hand side of the trace formula. The above 
test function $h$ is, however, not admissible, and 
requires appropriate smoothing. Luo and Sarnak (cf. 
Sarnak (2003)) developed an argument of this type 
to obtain a lower bound on the average number 
variance, 


    $\frac{1}{L} \int_0^L \Sigma^2(X, L')\, dL' \gg \frac{\sqrt{X}}{(\log X)^2}$   [21]

in the regime $\sqrt{X}/\log X \ll L \ll \sqrt{X}$, which is 
consistent with the Poisson conjecture $\Sigma^2(X, L) \sim L$. 
Bogomolny, Leyvraz, and Schmit suggested a remarkable 
limiting formula for the two-point correlation 
function for the modular surface (cf. Bogomolny 
et al. (1997) and Bogomolny (2006)), based on an 
analysis of the correlations between multiplicities of 
lengths of closed geodesics. A rigorous analysis of the 
fluctuations of multiplicities is given by Peter (cf. 
Bogomolny (2006)). Rudnick (2005) has recently 
established a smoothed version of Conjecture 2 in the 
regime 


    $\frac{\sqrt{X}}{L} \to \infty, \qquad \frac{\sqrt{X}}{L \log X} \to 0$   [22]

where the characteristic function in [20] is replaced 
by a certain class of smooth test functions. 

All of the above approaches use the Selberg trace 
formula, exploiting the particular properties of the 
distribution of lengths of closed geodesics in 
arithmetic hyperbolic surfaces. These will be dis- 
cussed in more detail in the next section, following 
the work of Bogomolny, Georgeot, Giannoni and 
Schmit, Bolte, and Luo and Sarnak (see Bogomolny 
et al. (1997) and Sarnak (1995) for references). 


Distribution of Lengths of Closed 
Geodesics 


The classical prime geodesic theorem asserts that the 
number $N(\ell)$ of primitive closed geodesics of length 
less than $\ell$ is asymptotically 

    $N(\ell) \sim \frac{e^{\ell}}{\ell}$   [23]


One of the significant geometrical characteristics of 
arithmetic hyperbolic surfaces is that the number of 
closed geodesics with the same length $\ell$ grows 
exponentially with $\ell$. This phenomenon is most 
easily explained in the case of the modular surface, 
where the set of lengths $\ell$ appearing in the length 
spectrum is characterized by the condition 

    $2 \cosh(\ell/2) = |\mathrm{tr}\, \gamma|$   [24]

where $\gamma$ runs over all elements in $\mathrm{SL}(2, \mathbb{Z})$ with 
$|\mathrm{tr}\, \gamma| > 2$. It is not hard to see that any integer $n > 2$ 
appears in the set $\{|\mathrm{tr}\, \gamma| : \gamma \in \mathrm{SL}(2, \mathbb{Z})\}$, and hence 
the set of distinct lengths of closed geodesics is 

    $\mathcal{L} = \{2\, \mathrm{arcosh}(n/2) : n = 3, 4, 5, \ldots\}$   [25]

Therefore, the number of distinct lengths less than $\ell$ 
is asymptotically (for large $\ell$) 

    $N'(\ell) = \#(\mathcal{L} \cap [0, \ell]) \sim e^{\ell/2}$   [26]

Equations [26] and [23] say that on average the 
number of geodesics with the same length is at least 
$\asymp e^{\ell/2}/\ell$. 
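The counting of distinct lengths for the modular surface can be checked directly. The sketch below uses the explicit matrix ((n, -1), (1, 0)) to exhibit an element of SL(2, Z) with trace n, and counts the integers n >= 3 with 2 arcosh(n/2) <= l, which is the same as n <= 2 cosh(l/2). The function names are hypothetical.

```python
import math

def matrix_with_trace(n):
    """An integer matrix of determinant 1 and trace n: ((n, -1), (1, 0))."""
    return ((n, -1), (1, 0))

def det(m):
    (a, b), (c, d) = m
    return a * d - b * c

def distinct_lengths_up_to(ell):
    """N'(ell): number of n >= 3 with 2*arcosh(n/2) <= ell, i.e. n <= 2*cosh(ell/2)."""
    n_max = math.floor(2.0 * math.cosh(ell / 2.0))
    return max(n_max - 2, 0)

for ell in (5.0, 10.0, 20.0):
    count = distinct_lengths_up_to(ell)
    # The ratio to e^{ell/2} approaches 1 as ell grows.
    print(ell, count, round(count / math.exp(ell / 2.0), 4))
```

Dividing the prime geodesic count e^l / l by this number of distinct lengths gives the average multiplicity of order e^{l/2} / l quoted above.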

The prime geodesic theorem [23] holds equally for 
all hyperbolic surfaces with finite area, while [26] is 
specific to the modular surface. For general arithmetic 
surfaces, we have the upper bound 

    $N'(\ell) \le c\, e^{\ell/2}$   [27]

for some constant $c > 0$ that may depend on the 
surface. Although one expects $N'(\ell)$ to be asymptotic 
to $(1/2)N(\ell)$ for generic surfaces (since most 
geodesics have a time-reversal partner which thus 
has the same length, and otherwise all lengths are 
distinct), there are examples of nonarithmetic Hecke 
triangles where numerical and heuristic arguments 
suggest $N'(\ell) \sim c_1 e^{c_2 \ell}/\ell$ for suitable constants $c_1 > 0$ 
and $0 < c_2 < 1/2$ (cf. Bogomolny (2006)). Hence 
exponential degeneracy in the length spectrum seems 
to occur in a weaker form also for nonarithmetic 
surfaces. 

A further useful property of the length spectrum 
of arithmetic surfaces is the bounded clustering 
property: there is a constant C (again surface 
dependent) such that 

    $\#(\mathcal{L} \cap [\ell, \ell + 1]) \le C$   [28]

for all $\ell$. This fact is evident in the case of the 
modular surface; the general case is proved by Luo 
and Sarnak (cf. Sarnak (1995)). 


Quantum Unique Ergodicity 





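The unit tangent bundle of a hyperbolic surface $\Gamma\backslash\mathbb{H}$ 
describes the physical phase space on which the 
classical dynamics takes place. A convenient parametrization 
of the unit tangent bundle is given by 
the quotient $\Gamma\backslash\mathrm{PSL}(2, \mathbb{R})$; this may be seen by means 
of the Iwasawa decomposition for an element 
$g \in \mathrm{PSL}(2, \mathbb{R})$, 

    $g = \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} \begin{pmatrix} y^{1/2} & 0 \\ 0 & y^{-1/2} \end{pmatrix} \begin{pmatrix} \cos\theta/2 & \sin\theta/2 \\ -\sin\theta/2 & \cos\theta/2 \end{pmatrix}$   [29]

where $x + iy \in \mathfrak{H}$ represents the position of the 
particle in $\Gamma\backslash\mathbb{H}$ in half-plane coordinates, and $\theta \in 
[0, 2\pi)$ the direction of its velocity. Multiplying the 
matrix [29] from the left by $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and writing the 
result again in the Iwasawa form [29], one obtains 
the action 

    $(z, \theta) \mapsto \left( \frac{az + b}{cz + d},\ \theta - 2 \arg(cz + d) \right)$   [30]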


which represents precisely the geometric action of 
isometries on the unit tangent bundle. 
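The action on the coordinates (z, theta) can be verified numerically. The sketch below builds the Iwasawa product for given (x, y, theta), multiplies from the left by an integer matrix, and reads off the new coordinates. It relies on the observation, not stated in the text, that the bottom row (C, D) of g satisfies C i + D = y^{-1/2} e^{-i theta/2}, so that theta = -2 arg(C i + D) mod 2 pi. All function names and the sample values are hypothetical.

```python
import cmath
import math

def iwasawa_matrix(x, y, theta):
    """g = n(x) a(y) k(theta) as in [29], returned as ((A, B), (C, D))."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    r = math.sqrt(y)
    # n(x) a(y) = ((r, x / r), (0, 1 / r)); multiply on the right by k(theta).
    return ((r * c - (x / r) * s, r * s + (x / r) * c),
            (-s / r, c / r))

def coordinates(g):
    """Recover (z, theta): z = g.i and theta = -2 arg(C i + D) mod 2 pi."""
    (A, B), (C, D) = g
    z = (A * 1j + B) / (C * 1j + D)
    theta = (-2.0 * cmath.phase(C * 1j + D)) % (2.0 * math.pi)
    return z, theta

def left_multiply(gamma, g):
    (a, b), (c, d) = gamma
    (A, B), (C, D) = g
    return ((a * A + b * C, a * B + b * D), (c * A + d * C, c * B + d * D))

gamma = ((2, 1), (1, 1))                    # sample element of SL(2, Z)
x, y, theta = 0.3, 1.7, 1.1
z = complex(x, y)
z_new, theta_new = coordinates(left_multiply(gamma, iwasawa_matrix(x, y, theta)))

(a, b), (c, d) = gamma
z_expected = (a * z + b) / (c * z + d)
theta_expected = (theta - 2.0 * cmath.phase(c * z + d)) % (2.0 * math.pi)
print(z_new, z_expected, theta_new, theta_expected)
```

Because theta enters only through theta/2, replacing g by -g changes theta by 2 pi, so the coordinates are well defined on PSL(2, R).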

The geodesic flow $\Phi^t$ on $\Gamma\backslash\mathrm{PSL}(2, \mathbb{R})$ is represented 
by the right translation 

    $\Phi^t : \Gamma g \mapsto \Gamma g \begin{pmatrix} e^{t/2} & 0 \\ 0 & e^{-t/2} \end{pmatrix}$   [31]

The Haar measure $\mu$ on $\mathrm{PSL}(2, \mathbb{R})$ is thus trivially 
invariant under the geodesic flow. It is well known 
that $\mu$ is not the only invariant measure, that is, $\Phi^t$ is 
not uniquely ergodic, and that there is in fact an 
abundance of invariant measures. The simplest 
examples are those with uniform mass on one, or a 
countable collection of, closed geodesics. 

To test the distribution of an eigenfunction 
$\varphi_j$ in phase space, one associates with a function 
$a \in C^\infty(\Gamma\backslash\mathrm{PSL}(2, \mathbb{R}))$ the quantum observable 
Op(a), a zeroth order pseudodifferential operator 
with principal symbol a. Using semiclassical techniques 
based on Friedrichs symmetrization, one 
can show that the matrix element 

    $\nu_j(a) = \langle \mathrm{Op}(a)\varphi_j, \varphi_j \rangle$   [32]


is asymptotic (as $j \to \infty$) to a positive functional 
that defines a probability measure on 
$\Gamma\backslash\mathrm{PSL}(2, \mathbb{R})$. Therefore, if M is compact, any 
weak limit of $\nu_j$ represents a probability measure 
on $\Gamma\backslash\mathrm{PSL}(2, \mathbb{R})$. Egorov’s theorem (see Quantum 
Ergodicity and Mixing of Eigenfunctions) in turn 
implies that any such limit must be invariant 
under the geodesic flow, and the main challenge 
in proving QUE is to rule out all invariant 
measures apart from Haar. 

Conjecture 3 (Rudnick and Sarnak (1994); see 
Sarnak (1995, 2003)). For every compact hyperbolic 
surface $\Gamma\backslash\mathbb{H}$, the sequence $\nu_j$ converges weakly to $\mu$. 





Lindenstrauss has proved this conjecture for 
compact arithmetic hyperbolic surfaces of congruence 
type (such as the second example in the section 
“Hyperbolic surfaces”) for special bases of eigenfunctions, 
using ergodic-theoretic methods. These 
will be discussed in more detail in the next section. 
His results extend to the noncompact case, that is, to 
the modular surface where $\Gamma = \mathrm{PSL}(2, \mathbb{Z})$. Here he 
shows that any weak limit of subsequences of $\nu_j$ is 
of the form $c\mu$, where c is a constant with values in 
[0,1]. One believes that $c = 1$, but with present 
techniques it cannot be ruled out that a proportion 
of the mass of the eigenfunction escapes into the 
noncompact cusp of the surface. For the modular 
surface, $c = 1$ can be proved under the assumption of 
the generalized Riemann hypothesis (see the section 
“Eigenfunctions and L-functions” and Sarnak 
(2003)). QUE also holds for the continuous part of 
the spectrum, which is furnished by the Eisenstein 
series $E(z, s)$, where $s = 1/2 + ir$ is the spectral 
parameter. Note that the measures associated with 
the matrix elements 

    $\nu_r(a) = \langle \mathrm{Op}(a) E(\cdot, 1/2 + ir), E(\cdot, 1/2 + ir) \rangle$   [33]

are not probability measures but only Radon 
measures, since $E(z, s)$ is not square-integrable. Luo 
and Sarnak, and Jakobson have shown that 

    $\lim_{r \to \infty} \frac{\nu_r(a)}{\nu_r(b)} = \frac{\mu(a)}{\mu(b)}$   [34]

for suitable test functions $a, b \in C^\infty(\Gamma\backslash\mathrm{PSL}(2, \mathbb{R}))$ 
(cf. Sarnak (2003)). 
Hecke Operators, Entropy 
and Measure Rigidity 


For compact surfaces, the sequence of probability 
measures approaching the matrix elements $\nu_j$ is 
relatively compact. That is, every infinite sequence 
contains a convergent subsequence. Lindenstrauss’ 
central idea in the proof of QUE is to exploit the 
presence of Hecke operators to understand the 
invariance properties of possible quantum limits. 
We will sketch his argument in the case of the 
modular surface (ignoring issues related to the non- 
compactness of the surface), where it is most 
transparent. 

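For every positive integer n, the Hecke operator 
$T_n$ acting on continuous functions on $\Gamma\backslash\mathbb{H}$ with 
$\Gamma = \mathrm{SL}(2, \mathbb{Z})$ is defined by 

    $T_n f(z) = \frac{1}{\sqrt{n}} \sum_{\substack{a, d = 1 \\ ad = n}}^{n} \sum_{b=0}^{d-1} f\left( \frac{az + b}{d} \right)$   [35]

The set $M_n$ of matrices with integer coefficients and 
determinant n can be expressed as the disjoint union 

    $M_n = \bigcup_{\substack{a, d = 1 \\ ad = n}}^{n} \bigcup_{b=0}^{d-1} \Gamma \begin{pmatrix} a & b \\ 0 & d \end{pmatrix}$   [36]

and hence the sum in [35] can be viewed as a sum 
over the cosets in this decomposition. We note the 
product formula 

    $T_m T_n = \sum_{d \mid \gcd(m, n)} T_{mn/d^2}$   [37]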
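A simple consistency check of the product formula is to apply both sides to a constant function. With the normalization in [35], a constant is an eigenfunction of T_n with eigenvalue sigma(n)/sqrt(n), where sigma(n) is the number of coset representatives (a, b, d) in [36], that is, the sum of the divisors of n. The product formula then reduces to the divisor identity sigma(m) sigma(n) = sum over d | gcd(m, n) of d sigma(mn/d^2). The sketch below checks this identity for small m and n. It tests only this special case, not the full operator identity, and the function names are hypothetical.

```python
from math import gcd

def coset_representatives(n):
    """Representatives (a, b, d) with ad = n and 0 <= b < d, as in [36]."""
    return [(n // d, b, d) for d in range(1, n + 1) if n % d == 0 for b in range(d)]

def sigma(n):
    """Sum of the positive divisors of n."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

def product_formula_holds_on_constants(m, n):
    """Check sigma(m) * sigma(n) == sum_{d | gcd(m, n)} d * sigma(m n / d^2)."""
    g = gcd(m, n)
    rhs = sum(d * sigma(m * n // (d * d)) for d in range(1, g + 1) if g % d == 0)
    return sigma(m) * sigma(n) == rhs

all_ok = all(product_formula_holds_on_constants(m, n)
             for m in range(1, 31) for n in range(1, 31))
print(len(coset_representatives(12)), sigma(12), all_ok)
```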


The Hecke operators are normal, form a commuting 
family, and in addition they commute with 
the Laplacian $\Delta$. In the following, we consider an 
orthonormal basis of eigenfunctions $\varphi_j$ of $\Delta$ that 
are simultaneously eigenfunctions of all Hecke 
operators. We will refer to such eigenfunctions as 
Hecke eigenfunctions. The above assumption is 
automatically satisfied if the spectrum of $\Delta$ is 
simple (i.e., no eigenvalues coincide), a property 
conjectured by Cartier and supported by numerical 
computations. Lindenstrauss’ work is based on the 
following two observations. Firstly, all quantum 
limits of Hecke eigenfunctions are geodesic-flow 
invariant measures of positive entropy, and sec- 
ondly, the only such measure of positive entropy 
that is recurrent under Hecke correspondences is 
the Lebesgue measure. 

The first property is proved by Bourgain and 
Lindenstrauss (2003) and refines arguments of 
Rudnick and Sarnak (1994) and Wolpert (2001) on 
the distribution of Hecke points (see Sarnak (2003) for 
references to these papers). For a given point $z \in \mathbb{H}$ 
the set of Hecke points is defined as 

    $T_n(z) := M_n z$   [38]

For most primes, the set $T_{p^k}(z)$ comprises $(p + 1)p^{k-1}$ 
distinct points on $\Gamma\backslash\mathbb{H}$. For each z, the Hecke 
operator $T_n$ may now be interpreted as the 
adjacency matrix for a finite graph embedded in 
$\Gamma\backslash\mathbb{H}$, whose vertices are the Hecke points $T_n(z)$. 
Hecke eigenfunctions $\varphi_j$ with 

    $T_n \varphi_j = \lambda_j(n)\, \varphi_j$   [39]

give rise to eigenfunctions of the adjacency matrix. 
Exploiting this fact, Bourgain and Lindenstrauss 
show that for a large set of integers n 

    $|\varphi_j(z)|^2 \ll \sum_{w \in T_n(z)} |\varphi_j(w)|^2$   [40]

that is, pointwise values of $|\varphi_j|^2$ cannot be substantially 
larger than its sum over Hecke points. This, 
and the observation that Hecke points for a large set 
of integers n are sufficiently uniformly distributed 
on $\Gamma\backslash\mathbb{H}$ as $n \to \infty$, yields the estimate of positive 
entropy with a quantitative lower bound. 
Lindenstrauss' proof of the second property, which shows that
Lebesgue measure is the only quantum limit of Hecke eigenfunctions,
is a result of a currently very active branch of ergodic theory:
measure rigidity. Invariance under the geodesic flow alone is not
sufficient to rule out other possible limit measures. In fact, there
are uncountably many measures with this property. As limits of Hecke
eigenfunctions, all quantum limits possess an additional property,
namely recurrence under Hecke correspondences. Since the explanation
of these is rather involved, let us recall an analogous result in a
simpler setup. The map $\times 2 : x \mapsto 2x \bmod 1$ defines a
hyperbolic dynamical system on the unit circle with a wealth of
invariant measures, similar to the case of the geodesic flow on a
surface of negative curvature. Furstenberg conjectured that, up to
trivial invariant measures that are localized on finitely many
rational points, Lebesgue measure is the only $\times 2$-invariant
measure that is also invariant under the action of
$\times 3 : x \mapsto 3x \bmod 1$. This fundamental problem is still
unsolved and is one of the central conjectures in measure rigidity.
Rudolph, however, showed that Furstenberg's conjecture is true if one
restricts the statement to $\times 2$-invariant measures of positive
entropy (cf. Lindenstrauss (to appear)). In Lindenstrauss' work,
$\times 2$ plays the role of the geodesic flow, and $\times 3$ the
role of the Hecke correspondences. Although here it might also be
interesting to ask whether an analog of Furstenberg's conjecture
holds, it is inessential for the proof of QUE due to the positive
entropy of quantum limits discussed in the previous paragraph.
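The "trivial" invariant measures in Furstenberg's conjecture can be made concrete with exact rational arithmetic. The Python sketch below is an illustration with hypothetical function names: it uses the fact that the uniform measure on a finite set is invariant under a map exactly when the map permutes that set, and compares the $\times 2$-orbits of 1/7 and 1/5.

```python
from fractions import Fraction


def times(k, x):
    """The map x -> k*x mod 1, evaluated exactly on rationals."""
    return (k * x) % 1


def orbit(k, x):
    """Forward orbit of x under times-k, stopped at the first repetition."""
    seen = []
    while x not in seen:
        seen.append(x)
        x = times(k, x)
    return frozenset(seen)


def uniform_measure_is_invariant(k, support):
    """Uniform measure on a finite set is times-k invariant iff times-k permutes the set."""
    return frozenset(times(k, x) for x in support) == support


orbit_7 = orbit(2, Fraction(1, 7))  # the points 1/7, 2/7, 4/7
orbit_5 = orbit(2, Fraction(1, 5))  # the points 1/5, 2/5, 4/5, 3/5

if __name__ == "__main__":
    for name, support in (("orbit of 1/7", orbit_7), ("orbit of 1/5", orbit_5)):
        print(name,
              uniform_measure_is_invariant(2, support),
              uniform_measure_is_invariant(3, support))
```

The orbit of 1/7 carries a $\times 2$-invariant measure that $\times 3$ destroys, since $3 \cdot 1/7 = 3/7$ leaves the orbit, whereas the orbit of 1/5 is permuted by both maps and so supports one of the atomic measures that the conjecture excludes by hand.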


Eigenfunctions and L-Functions 


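An even eigenfunction $\varphi_j(z)$ for
$\Gamma = \mathrm{SL}(2,\mathbb{Z})$ has the Fourier expansion


$$\varphi_j(z) = \sum_{n=1}^{\infty} a_j(n)\, y^{1/2} K_{i\rho_j}(2\pi n y) \cos(2\pi n x) \qquad [41]$$


We associate with $\varphi_j(z)$ the Dirichlet series


$$L(s,\varphi_j) = \sum_{n=1}^{\infty} a_j(n)\, n^{-s} \qquad [42]$$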


which converges for Re s large enough. These series 
have an analytic continuation to the entire complex 
plane C and satisfy a functional equation, 


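$$\Lambda(s,\varphi_j) = \Lambda(1-s,\varphi_j) \qquad [43]$$


where

$$\Lambda(s,\varphi_j) = \pi^{-s}\, \Gamma\!\left(\frac{s+i\rho_j}{2}\right) \Gamma\!\left(\frac{s-i\rho_j}{2}\right) L(s,\varphi_j) \qquad [44]$$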


If $\varphi_j(z)$ is in addition an eigenfunction of all Hecke
operators, then the Fourier coefficients in fact coincide (up to a
normalization constant) with the eigenvalues of the Hecke operators


$$a_j(m) = \lambda_j(m)\, a_j(1) \qquad [45]$$


If we normalize $a_j(1) = 1$, the Hecke relations [37] result in an
Euler product formula for the L-function,


$$L(s,\varphi_j) = \prod_{p\ \mathrm{prime}} \left(1 - a_j(p)\, p^{-s} + p^{-2s}\right)^{-1} \qquad [46]$$


These L-functions behave in many other ways like the Riemann zeta or
classical Dirichlet L-functions. In particular, they are expected to
satisfy a Riemann hypothesis, that is, all nontrivial zeros are
constrained to the critical line $\mathrm{Re}\, s = 1/2$.
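The passage from the Hecke relations [37] to the Euler product [46] rests on a single local identity. For a prime $p$, [37] gives $\lambda(p)\lambda(p^k) = \lambda(p^{k+1}) + \lambda(p^{k-1})$, and summing the resulting geometric-type series yields the factor $(1 - \lambda(p)p^{-s} + p^{-2s})^{-1}$. The Python sketch below checks this numerically; the sample value 1.3 for $\lambda(p)$ and the function names are assumptions chosen for illustration, not data from the text.

```python
def prime_power_eigenvalues(lam_p, kmax):
    """lambda(p^k) for k = 0..kmax from lambda(p^(k+1)) = lambda(p) lambda(p^k) - lambda(p^(k-1))."""
    values = [1.0, lam_p]
    for _ in range(kmax - 1):
        values.append(lam_p * values[-1] - values[-2])
    return values[: kmax + 1]


def local_series(lam_p, p, s, kmax=200):
    """Truncated local Dirichlet series: sum over k of lambda(p^k) p^(-k s)."""
    x = p ** (-s)
    return sum(lam * x ** k
               for k, lam in enumerate(prime_power_eigenvalues(lam_p, kmax)))


def euler_factor(lam_p, p, s):
    """Closed form of the local factor in [46]."""
    x = p ** (-s)
    return 1.0 / (1.0 - lam_p * x + x * x)


if __name__ == "__main__":
    # lam_p = 1.3 is an arbitrary illustrative value, not a computed eigenvalue.
    print(local_series(1.3, 2, 2.0), euler_factor(1.3, 2, 2.0))
```

Multiplying these local factors over all primes and using the multiplicativity of the coefficients for coprime arguments (again a consequence of [37]) gives the full product [46] wherever the series converges absolutely.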

Questions on the distribution of Hecke eigenfunc- 
tions, such as QUE or value distribution properties, 
can now be translated to analytic properties of 
L-functions. We will discuss two examples. 

The asymptotics in [6] can be established by proving [6] for the
choices $g = \varphi_k$, $k = 1, 2, \ldots$, that is,


$$\int_{\mathcal{M}} |\varphi_j|^2\, \varphi_k \, dA \to 0 \qquad [47]$$


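Watson discovered the remarkable relation (Sarnak 2003)


$$\left| \int_{\mathcal{M}} \varphi_{j_1} \varphi_{j_2} \varphi_{j_3} \, dA \right|^2 = \frac{\pi^4\, \Lambda\!\left(\tfrac12, \varphi_{j_1} \times \varphi_{j_2} \times \varphi_{j_3}\right)}{\Lambda(1, \mathrm{sym}^2 \varphi_{j_1})\, \Lambda(1, \mathrm{sym}^2 \varphi_{j_2})\, \Lambda(1, \mathrm{sym}^2 \varphi_{j_3})} \qquad [48]$$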

The L-functions $\Lambda(s, \cdot)$ in Watson's formula are more
advanced cousins of those introduced earlier (see Sarnak (2003) for
details). The Riemann hypothesis for such L-functions then implies,
via [48], a precise rate of convergence to QUE for the modular
surface,


$$\int_{\mathcal{M}} |\varphi_j|^2\, g \, dA = \int_{\mathcal{M}} g \, dA + O\!\left(\lambda_j^{-1/4+\epsilon}\right) \qquad [49]$$


for any $\epsilon > 0$, where the implied constant depends on
$\epsilon$ and $g$.

A second example of the connection between statistical properties of
the matrix elements
$\nu_j(a) = \langle \mathrm{Op}(a)\varphi_j, \varphi_j \rangle$ (for
fixed $a$ and random $j$) and values of L-functions has appeared in
the work of Luo and Sarnak (cf. Sarnak (2003)). Define the variance


$$V_\lambda(a) = \frac{1}{N(\lambda)} \sum_{\lambda_j \le \lambda} |\nu_j(a) - \mu(a)|^2 \qquad [50]$$


with $N(\lambda) = \#\{j : \lambda_j \le \lambda\}$; cf. [3].
Following a conjecture by Feingold-Peres and Eckhardt et al. (see
Sarnak (2003) for references) for "generic" quantum chaotic systems,
one expects a central-limit theorem for the statistical fluctuations
of the $\nu_j(a)$, where the normalized variance
$N(\lambda)^{1/2} V_\lambda(a)$ is asymptotic to the classical
autocorrelation function $C(a)$, see eqn [54].


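Conjecture 4 For any bounded function $g : \mathbb{R} \to \mathbb{C}$
we have

$$\frac{1}{N(\lambda)} \sum_{\lambda_j \le \lambda} g\!\left( \frac{\nu_j(a) - \mu(a)}{\sqrt{V_\lambda(a)}} \right) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(t)\, e^{-t^2/2} \, dt \qquad [51]$$

as $\lambda \to \infty$.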


Luo and Sarnak prove that in the case of the modular surface the
variance has the asymptotics


$$\lim_{\lambda \to \infty} N(\lambda)^{1/2}\, V_\lambda(a) = \langle Ba, a \rangle \qquad [52]$$


where $B$ is a non-negative self-adjoint operator which commutes with
the Laplacian $\Delta$ and all Hecke operators $T_n$. In particular,
we have


$$B \varphi_j = \tfrac12\, L\!\left(\tfrac12, \varphi_j\right) C(\varphi_j)\, \varphi_j \qquad [53]$$

where

$$C(a) := \int_{\mathbb{R}} \int_{\Gamma \backslash \mathrm{SL}(2,\mathbb{R})} a(\Phi^t(g))\, \overline{a(g)} \, d\mu(g) \, dt \qquad [54]$$


is the classical autocorrelation function for the geodesic flow with
respect to the observable $a$ (Sarnak 2003). Up to the arithmetic
factor $\tfrac12 L(\tfrac12, \varphi_j)$, eqn [53] is consistent with
the Feingold-Peres prediction for the variance of generic chaotic
systems. Furthermore, recent estimates of moments by Rudnick and
Soundararajan (2005) indicate that Conjecture 4 is not valid in the
case of the modular surface.


Quantum Eigenstates of Cat Maps


Cat maps are probably the simplest area-preserving maps on a compact
surface that are highly chaotic. They are defined as linear
automorphisms on the torus
$\mathbb{T}^2 = \mathbb{R}^2/\mathbb{Z}^2$,


$$\Phi_A : \mathbb{T}^2 \to \mathbb{T}^2 \qquad [55]$$


where a point $\xi \in \mathbb{R}^2 \pmod{\mathbb{Z}^2}$ is mapped to
$A\xi \pmod{\mathbb{Z}^2}$; $A$ is a fixed matrix in
$\mathrm{GL}(2,\mathbb{Z})$ with eigenvalues off the unit circle
(this guarantees hyperbolicity). We view the torus $\mathbb{T}^2$ as
a symplectic manifold, the phase space of the dynamical system. Since
$\mathbb{T}^2$ is compact, the Hilbert space of quantum states is an
$N$-dimensional vector space $\mathcal{H}_N$, $N$ integer. The
semiclassical limit, or limit of small wavelengths, corresponds here
to $N \to \infty$.

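It is convenient to identify $\mathcal{H}_N$ with
$L^2(\mathbb{Z}/N\mathbb{Z})$, with the inner product


$$\langle \psi_1, \psi_2 \rangle = \frac{1}{N} \sum_{Q \bmod N} \psi_1(Q)\, \overline{\psi_2(Q)} \qquad [56]$$


For any smooth function $f \in C^\infty(\mathbb{T}^2)$, define a
quantum observable


$$\mathrm{Op}_N(f) = \sum_{n \in \mathbb{Z}^2} \hat{f}(n)\, T_N(n)$$


where $\hat{f}(n)$ are the Fourier coefficients of $f$, and $T_N(n)$
are translation operators


$$T_N(n) = e^{\pi i n_1 n_2 / N}\, t_2^{n_2}\, t_1^{n_1} \qquad [57]$$


$$[t_1 \psi](Q) = \psi(Q+1), \qquad [t_2 \psi](Q) = e^{2\pi i Q/N}\, \psi(Q) \qquad [58]$$


The operators $\mathrm{Op}_N(f)$ are the analogs of the
pseudodifferential operators discussed in the section "Quantum unique
ergodicity."

A quantization of $\Phi_A$ is a unitary operator $U_N(A)$ on
$L^2(\mathbb{Z}/N\mathbb{Z})$ satisfying the equation


$$U_N(A)^{-1}\, \mathrm{Op}_N(f)\, U_N(A) = \mathrm{Op}_N(f \circ \Phi_A) \qquad [59]$$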


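for all $f \in C^\infty(\mathbb{T}^2)$. There are explicit formulas
for $U_N(A)$ when $A$ is in the group


$$\Gamma = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}(2,\mathbb{Z}) : ab \equiv cd \equiv 0 \bmod 2 \right\} \qquad [60]$$


These may be viewed as analogs of the Shale-Weil or metaplectic
representation for SL(2). For example, the quantization of

$$A = \begin{pmatrix} 2 & 1 \\ 3 & 2 \end{pmatrix} \qquad [61]$$

yields

$$U_N(A)\psi(Q) = N^{-1/2} \sum_{Q' \bmod N} \exp\!\left[ \frac{2\pi i}{N} \left( Q^2 - QQ' + Q'^2 \right) \right] \psi(Q') \qquad [62]$$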
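Unitarity of the explicit propagator can be verified directly for small $N$. The Python sketch below is illustrative: it assumes that the kernel in [62] reads $N^{-1/2}\exp[2\pi i (Q^2 - QQ' + Q'^2)/N]$ for $A = \begin{pmatrix} 2 & 1 \\ 3 & 2 \end{pmatrix}$, and the function names are hypothetical.

```python
import cmath


def cat_map_propagator(N):
    """Matrix of U_N(A) for A = ((2, 1), (3, 2)), assuming the kernel described above."""
    norm = N ** -0.5
    return [
        [
            # The phase only depends on the exponent modulo N, so reduce it first.
            norm * cmath.exp(2j * cmath.pi * ((Q * Q - Q * Qp + Qp * Qp) % N) / N)
            for Qp in range(N)
        ]
        for Q in range(N)
    ]


def unitarity_defect(U):
    """Largest entry of |U U^dagger - identity| in absolute value."""
    N = len(U)
    worst = 0.0
    for i in range(N):
        for k in range(N):
            entry = sum(U[i][j] * U[k][j].conjugate() for j in range(N))
            worst = max(worst, abs(entry - (1.0 if i == k else 0.0)))
    return worst


if __name__ == "__main__":
    for N in (5, 8, 13):
        print(N, unitarity_defect(cat_map_propagator(N)))
```

The check succeeds because the kernel factors into two diagonal phase matrices and a discrete Fourier transform, each of which is unitary; it says nothing yet about the intertwining property [59], which would need the operators $T_N(n)$ as well.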


In analogy with [1], we are interested in the statistical features of
the eigenvalues and eigenfunctions of $U_N(A)$, that is, the
solutions to


$$U_N(A)\varphi = \lambda \varphi, \qquad \|\varphi\|_{L^2(\mathbb{Z}/N\mathbb{Z})} = 1 \qquad [63]$$


Unlike typical quantum-chaotic maps, the statistics of the $N$
eigenvalues


$$\lambda_{N1}, \lambda_{N2}, \ldots, \lambda_{NN} \in S^1 \qquad [64]$$

do not follow the distributions of unitary random matrices in the
limit $N \to \infty$, but are rather singular (Keating 1991). In
analogy with the Selberg trace formula for hyperbolic surfaces [18],
there is an exact trace formula relating sums over eigenvalues of
$U_N(A)$ with sums over fixed points of the classical map (Keating
1991).

As in the case of arithmetic surfaces, the eigenfunctions of cat maps
appear to behave more generically. The analog of the
Schnirelman-Zelditch-Colin de Verdière theorem states that, for any
orthonormal basis of eigenfunctions $\{\varphi_{Nj}\}_{j=1}^{N}$, we
have, for all $f \in C^\infty(\mathbb{T}^2)$,


$$\langle \mathrm{Op}_N(f)\varphi_{Nj}, \varphi_{Nj} \rangle \to \int_{\mathbb{T}^2} f(\xi)\, d\xi \qquad [65]$$


as $N \to \infty$, for all $j$ in an index set $J_N$ of full density,
that is, $\# J_N \sim N$. Kurlberg and Rudnick (see Rudnick (2001))
have characterized special bases of eigenfunctions
$\{\varphi_{Nj}\}_{j=1}^{N}$ (termed Hecke eigenbases, in analogy
with arithmetic surfaces) for which QUE holds, generalizing earlier
work of Degli Esposti, Graffi, and Isola (1995). That is, [65] holds
for all $j = 1, \ldots, N$. Rudnick and Kurlberg, and more recently
Gurevich and Hadani, have established results on the rate of
convergence analogous to [49]. These results are unconditional.
Gurevich and Hadani use methods from algebraic geometry based on
those developed by Deligne in his proof of the Weil conjectures (an
analog of the Riemann hypothesis for finite fields).

In the case of quantum cat maps, there are values of $N$ for which
the number of coinciding eigenvalues can be large, a major difference
from what is expected for the modular surface. Linear combinations of
eigenstates with the same eigenvalue are again eigenstates, and may
lead to different quantum limits. Indeed, Faure, Nonnenmacher, and De
Bièvre (see De Bièvre (to appear)) have shown that there are
subsequences of values of $N$, so that, for all
$f \in C^\infty(\mathbb{T}^2)$,


$$\langle \mathrm{Op}_N(f)\varphi, \varphi \rangle \to \frac12 \int_{\mathbb{T}^2} f(\xi)\, d\xi + \frac12 f(0) \qquad [66]$$


that is, half of the mass of the quantum limit localizes on the
hyperbolic fixed point of the map. This is the first, and to date the
only, rigorous result concerning the existence of scarred
eigenfunctions in systems with chaotic classical limit.


Acknowledgment 


The author is supported by an EPSRC Advanced 
Research Fellowship. 


See also: Quantum Ergodicity and Mixing of 
Eigenfunctions; Random Matrix Theory in Physics. 


Further Reading 


Aurich R and Steiner F (1993) Statistical properties of highly 
excited quantum eigenstates of a strongly chaotic system. 
Physica D 64(1-3): 185-214. 

Berry MV and Keating JP (1999) The Riemann zeros and 
eigenvalue asymptotics. SIAM Review 41(2): 236-266. 

Bogomolny EB (2006) Quantum and arithmetical chaos. In: 
Cartier PE, Julia B, Moussa P, and Vanhove P (eds.) Frontiers 
in Number Theory, Physics and Geometry on Random 
Matrices, Zeta Functions, and Dynamical Systems, Springer 
Lecture Notes. Les Houches. 

Bogomolny EB, Georgeot B, Giannoni M-J, and Schmit C (1997) 
Arithmetical chaos. Physics Reports 291(5-6): 219-324. 

De Bièvre S Recent Results on Quantum Map Eigenstates,
Proceedings of QMATH9, Giens 2004 (to appear).

Hejhal DA and Rackner BN (1992) On the topography of Maass 
waveforms for PSL(2, Z). Experiment. Math. 1(4): 275-305. 

Keating JP (1991) The cat maps: quantum mechanics and classical 
motion. Nonlinearity 4(2): 309-341. 


Lindenstrauss E Rigidity of multi-parameter actions. Israel 
Journal of Mathematics (Furstenberg Special Volume) (to 
appear). 

Marklof J (2006) Energy level statistics, lattice point problems and 
almost modular functions. In: Cartier PE, Julia B, Moussa P, 
and Vanhove P (eds.) Frontiers in Number Theory, Physics and 
Geometry on Random Matrices, Zeta Functions, and Dynami- 
cal Systems, Springer Lecture Notes. Les Houches. 

Rudnick Z (2001) On quantum unique ergodicity for linear maps
of the torus. In: European Congress of Mathematics,
(Barcelona, 2000), Progr. Math., vol. 202, pp. 429-437.
Basel: Birkhäuser.

Rudnick Z (2005) A central limit theorem for the spectrum of the 
modular group, Park city lectures. Annales Henri Poincaré 6: 
863-883. 

Sarnak P Arithmetic quantum chaos. The Schur lectures (1992) 
(Tel Aviv), Israel Math. Conf. Proc., 8, pp. 183-236. Bar-Ilan 
Univ., Ramat Gan, 1995. 

Sarnak P (2003) Spectra of hyperbolic surfaces. Bulletin of the 
American Mathematical Society (N.S.) 40(4): 441-478. 


Asymptotic Structure and Conformal Infinity 


J Frauendiener, Universität Tübingen, Tübingen,
Germany


© 2006 Elsevier Ltd. All rights reserved. 


Introduction 


A major motivation for studying the asymptotic 
structure of spacetimes has been the need for a 
rigorous description of what should be understood by 
an “isolated system” in Einstein’s theory of gravity. 
As an example, consider a gravitating system some- 
where in our universe (e.g., a galaxy, a cluster of 
galaxies, a binary system, or a star) evolving accord- 
ing to its own gravitational interaction, and possibly 
reacting to gravitational radiation impinging on it 
from the outside. Thereby it will emit gravitational 
radiation. We are interested in describing these waves 
because they provide us with important information 
about the physics governing the system. 

To adequately describe this situation, we need to 
idealize the real situation in an appropriate way, since 
it is hopeless to try to analyze the behavior of the 
system in its interaction with the rest of the universe. 
We are mainly interested in the behavior of the 
system, and not so much in other processes taking 
place at large distances from the system. Since we 
would like to ignore those regions, we need a way to 
isolate the system from their influence. 

The notion of an isolated system allows us to 
select individual subsystems of the universe and 
describe their properties regardless of the rest of the 
universe so that we can assign to each subsystem 
such physical attributes as its energy-momentum, 
angular momentum, or its emitted radiation field. 
Without this notion, we would always have to take 
into account the interaction of the system with its 
environment in full detail. 

In general relativity (GR) it turns out to be a rather 
difficult task to describe an isolated system and the 
reason is — as always in Einstein’s theory — the fact 
that the metric acts both as the physical field and as 


the background. In other theories, like electrody- 
namics, the physical field, such as the Maxwell field, 
is very different from the background field, the flat 
metric of Minkowski space. The fact that the metric 
in GR plays a dual role makes it difficult to extract 
physical meaning from the metric because there is no 
nondynamical reference point. 

Imagine a system alone in the universe. As we 
recede from the system we would expect its influence 
to decrease. So we expect that the spacetime which 
models this situation mathematically will resemble 
the flat Minkowski spacetime and it will approximate 
it even better the farther away we go. This implies 
that one needs to impose fall-off conditions for the 
curvature and that the manifold will be asymptoti- 
cally flat in an appropriate sense. However, there is 
the problem that fall-off conditions necessarily imply 
the use of coordinates and it is awkward to decide 
which coordinates should be “good ones.” Thus, it is 
not clear whether the notion of an asymptotically flat 
spacetime is an invariant concept. 

What is needed, therefore, is an invariant definition of
asymptotically flat spacetimes. The key observation in this context
is that "infinity" is far away with respect to the spacetime metric.
This means that geodesics heading away from the system should be able
to "run forever," that is, be defined for arbitrary values of their
affine parameter $s$. "Infinity" will be reached for $s \to \infty$.
However, suppose we do not use the spacetime metric $g$ but a metric
$\hat{g}$ which is scaled down with respect to $g$, that is, in such
a way that $\hat{g} = \Omega^2 g$ for some function $\Omega$. Then it
might be possible to arrange $\Omega$ in such a way that geodesics
for the metric $\hat{g}$ cover the same events (strictly speaking,
this holds only for null geodesics, but this is irrelevant for the
present plausibility argument) as those for the metric $g$ yet that
their affine parameter $\hat{s}$ (which is also scaled down with
respect to $s$) approaches a finite value $\hat{s}_0$ for
$s \to \infty$. Then we could attach a boundary to the spacetime
manifold consisting of all the limit points corresponding to the
events with $\hat{s} = \hat{s}_0$ on the $\hat{g}$-geodesics. This
boundary would have to be interpreted as "infinity" for the spacetime
because it takes infinitely long for the $g$-geodesics to get there.

We arrived at this idea of attaching a boundary by 
considering the metric structure only “up to arbi- 
trary scaling,” that is, by looking at metrics which 
differ only by a factor. This is the conformal 
structure of the spacetime manifold in question. By 
considering the spacetime only from the point of 
view of its conformal structure we obtain a picture 
of the spacetime which is essentially finite but which 
leaves its causal properties unchanged, and hence in 
particular the properties of wave propagation. This 
is exactly what is needed for a rigorous treatment of 
radiation emitted by the system. 


Infinity for Minkowski Spacetime 


The above discussion suggests that we should consider the spacetime
metric only up to scale, that is, focus on the conformal structure of
the spacetime in question. Since we are interested in systems which
approach Minkowski spacetime at large distances from the source, it
is illuminating to study Minkowski spacetime as a preliminary
example. So consider the manifold $\mathbb{M} = \mathbb{R}^4$
equipped with the flat metric


$$g = dt^2 - dr^2 - r^2\, d\sigma^2 \qquad [1]$$

where $r$ is the standard radial coordinate defined by
$r^2 = x^2 + y^2 + z^2$ and

$$d\sigma^2 = d\theta^2 + \sin^2\theta \, d\phi^2$$

is the standard metric on the unit sphere $S^2$. We now introduce
retarded and advanced time coordinates, which are adapted to the null
cone and hence to the conformal structure of $g$, by the definition


$$u = t - r, \qquad v = t + r$$


and obtain the metric in the form

$$g = du\, dv - \tfrac14 (v-u)^2\, d\sigma^2$$


The coordinates $u$ and $v$ both take arbitrary real values but they
are restricted by the relation $v - u = 2r \ge 0$. In order to see
what happens "at infinity," we introduce the coordinates $U$ and $V$
by the relations


$$u = \tan U, \qquad v = \tan V$$


Then $U$ and $V$ both take values in the open interval
$(-\pi/2, \pi/2)$ with $V \ge U$ and the metric is transformed to


$$g = \frac{1}{4\cos^2 U \cos^2 V} \left[ 4\, dU\, dV - \sin^2(V-U)\, d\sigma^2 \right] \qquad [2]$$


Clearly, the metric is undefined at events with $\cos U = 0$ or
$\cos V = 0$. These would correspond to events with $u = \pm\infty$
or $v = \pm\infty$ which do not lie in $\mathbb{M}$. However, by
defining the function


$$\Omega = 2 \cos U \cos V$$

we find that the metric $\hat{g} = \Omega^2 g$ with

$$\hat{g} = 4\, dU\, dV - \sin^2(V-U)\, d\sigma^2 \qquad [3]$$


is conformally equivalent to $g$ and is regular for all values of $U$
and $V$ (keeping $V \ge U$). In fact, by defining the coordinates

$$T = V + U, \qquad R = V - U$$

this metric takes the form

$$\hat{g} = dT^2 - dR^2 - \sin^2 R \, d\sigma^2 \qquad [4]$$


the metric of the static Einstein universe $E$. Thus, we may regard
the Minkowski spacetime as the part of the Einstein cylinder defined
by restricting the coordinates $T$ and $R$ to the region
$|T| + R < \pi$ as illustrated in Figure 1. Although $\mathbb{M}$ can
be considered as being diffeomorphic to the shaded part in Figure 1,
these two manifolds are not isometric. This is obvious from
considering the properties of the events lying on the boundary
$\partial\mathbb{M}$ of $\mathbb{M}$ in $E$. Fix a point $P$ inside
$\mathbb{M}$ and follow a null geodesic with respect to the metric
$\hat{g}$ from $P$ toward the future. It will intersect
$\partial\mathbb{M}$ after a finite amount of its affine parameter
has elapsed. When we follow a null geodesic with respect to $g$ from
$P$ in the same direction, we find that it does not reach
$\partial\mathbb{M}$ for any value of its affine parameter. Thus, the
boundary is at infinity for the metric $g$ but at a finite location
with respect to the metric $\hat{g}$. When we consider all possible
kinds of geodesics for the metric $g$ we find that
$\partial\mathbb{M}$ consists of five qualitatively different pieces.
The future-pointing timelike geodesics all approach the point $i^+$
given by $(T, R) = (\pi, 0)$, while the past-pointing geodesics
approach $i^-$ with coordinates $(-\pi, 0)$. All spacelike geodesics
come arbitrarily close to a point $i^0$ with coordinates $(0, \pi)$
(located on the front of the cylinder in Figure 1). Null geodesics,
however, are different. For any point $(T, \pi - |T|)$ with
$T \neq 0, \pm\pi$ on $\partial\mathbb{M}$ there are
$g$-null-geodesics which come arbitrarily close.

Figure 1 The embedding of Minkowski spacetime into the
Einstein cylinder.
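The location of these limit points can be explored numerically. The Python sketch below is an illustration, not part of the original text: it assumes the coordinates $T = V + U$, $R = V - U$ that bring [3] into the form [4], together with $u = t - r = \tan U$, $v = t + r = \tan V$ and $\Omega = 2\cos U \cos V$; the function names are hypothetical.

```python
import math


def compactify(t, r):
    """Map Minkowski coordinates (t, r), r >= 0, to Einstein-cylinder coordinates (T, R)."""
    U = math.atan(t - r)  # u = t - r = tan U
    V = math.atan(t + r)  # v = t + r = tan V
    return V + U, V - U   # assumed definitions T = V + U, R = V - U


def conformal_factor(t, r):
    """Omega = 2 cos U cos V; it tends to zero as the boundary is approached."""
    U = math.atan(t - r)
    V = math.atan(t + r)
    return 2.0 * math.cos(U) * math.cos(V)


if __name__ == "__main__":
    # Static observer at r = 0 followed to late times: approaches (pi, 0), i.e. i^+.
    print(compactify(1.0e9, 0.0))
    # Outgoing null ray u = 0 (t = r): approaches the point (pi/2, pi/2) on null infinity.
    print(compactify(1.0e9, 1.0e9))
    # Point of a t = 0 slice at large r: approaches (0, pi), i.e. i^0.
    print(compactify(0.0, 1.0e9))
```

In these coordinates the area radius is recovered as $r = \sin R / \Omega$, which makes explicit that $r \to \infty$ is reached at finite $(T, R)$ only because $\Omega \to 0$ there.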

In this sense, we may regard $\partial\mathbb{M}$ as consisting of
limit points obtained by tracing $g$-geodesics for infinite values of
their affine parameters. According to the causal character of the
geodesics, the set of their respective limit points is called
future/past timelike infinity $i^\pm$, spacelike infinity $i^0$, or
future/past null infinity, denoted by $\mathscr{I}^\pm$. These two
parts of null infinity are three-dimensional regular submanifolds of
the embedding manifold $E$, while the points $i^\pm$, $i^0$ are
regular points in $E$ in the sense that the metric $\hat{g}$ is
regular there. This is not automatic, considering the fact that
infinitely many geodesics converge to a single point. However, the
flatness of Minkowski spacetime guarantees that the geodesics
approach at just the appropriate rate for the limit points to be
regular.

This example shows that the structure of the boundary is determined
entirely by the metric $g$ of Minkowski spacetime. If we had chosen a
different function $\Omega' = \omega\Omega$ with $\omega > 0$ then we
would not have obtained the Einstein cylinder but some different
Lorentzian manifold $(M', \hat{g}')$. Yet, the boundary of
$\mathbb{M}$ in $M'$ would have had the same properties.


Asymptotically Flat Spacetimes 


The physical idea of an isolated system is captured mathematically by
an asymptotically flat spacetime. Since such a spacetime $M$ is
expected to approach Minkowski spacetime asymptotically, the
asymptotic structure of $M$ is also expected to be similar to that of
$\mathbb{M}$. This expectation is expressed in

Definition 1 A spacetime $(M, g_{ab})$ is called "asymptotically
simple" if there exists a manifold-with-boundary $\hat{M}$ with
metric $\hat{g}_{ab}$ and scalar field $\Omega$ on $\hat{M}$ and
boundary $\mathscr{I} = \partial\hat{M}$ such that the following
conditions hold:


1. $M$ is the interior of $\hat{M}$: $M = \mathrm{int}\, \hat{M}$;

2. $\hat{g}_{ab} = \Omega^2 g_{ab}$ on $M$;

3. $\Omega$ and $\hat{g}_{ab}$ are smooth on all of $\hat{M}$;

4. $\Omega > 0$ on $M$; $\Omega = 0$, $\nabla_a \Omega \neq 0$ on $\mathscr{I}$; and

5. each null geodesic acquires both future and past
endpoints on $\mathscr{I}$.


This definition formalizes the construction which 
was explicitly performed above, by which one 
attaches a regular (nonempty) boundary to a space- 
time after suitably rescaling its metric. Asymptoti- 
cally simple spacetimes are exactly those for which 
this process of conformal compactification is possi- 
ble. The purpose of condition 5 is to exclude 
pathological cases. There are spacetimes which do 
not satisfy this condition (e.g., the Schwarzschild 
spacetime, where some of the null geodesics enter 
the event horizon and cannot escape to infinity). 
Yet, one would like to include them as being 
asymptotically simple in a sense, because they 
clearly describe isolated systems. For these cases, 
there exists the notion of weakly asymptotically 
simple spacetimes. 

In order to arrive at asymptotically flat space- 
times, one needs to make certain assumptions about 
the behavior of the curvature near the boundary, 
thus: 


Definition 2 An asymptotically simple spacetime is called
"asymptotically flat" if its Ricci tensor $\mathrm{Ric}[g]$ vanishes
in a neighborhood of $\mathscr{I}$.


Note that this definition imposes a rather strong restriction on the
Ricci curvature; less restrictive assumptions are possible. This
condition applies only near $\mathscr{I}$. Thus, it is possible to
consider spacetimes which contain matter fields as long as these
fields do not extend to infinity.

Other asymptotically simple spacetimes which are not asymptotically
flat are the de Sitter and anti-de Sitter spacetimes, which are
solutions of the Einstein equations with nonvanishing cosmological
constant $\Lambda$. It is a simple consequence of the definition that
the boundary $\mathscr{I}$ is a regular three-dimensional
hypersurface of the embedding spacetime $\hat{M}$ which is timelike,
spacelike, or null depending on the sign of $\Lambda$. In particular,
for the Minkowski spacetime ($\Lambda = 0$) the boundary is
necessarily a null hypersurface, as noted above.

The requirement that the vacuum Einstein equations hold near
$\mathscr{I}$ has several important consequences. First,
$\mathscr{I}$ is a null hypersurface with the special property of
being shear-free. This means that any cross section of a bundle of
its null generators does not suffer any distortions when moved along
the generators. Only expansion or contraction can occur. The global
structure of $\mathscr{I}$ is the same as the one from the example
above. Null infinity consists of two connected components,
$\mathscr{I}^\pm$, each of which is diffeomorphic to
$S^2 \times \mathbb{R}$. Thus, topologically, $\mathscr{I}^\pm$ are
cylinders. The cone-like appearance as seen in Figure 1 is
artificial. It depends on the particular conformal factor $\Omega$
chosen for the conformal compactification. Furthermore, it is only in
very exceptional cases that the metric $\hat{g}$ is regular at $i^0$
or $i^\pm$.

The most important consequence, however, concerns the conformal Weyl
tensor $C^a{}_{bcd}$. This is the part of the full Riemann curvature
tensor $R^a{}_{bcd}$ which is trace-free. It is invariant under
conformal rescalings of the metric. Thus, on $M$,
$\hat{C}^a{}_{bcd} = C^a{}_{bcd}$. When the vanishing of the Ricci
tensor near $\mathscr{I}$ is assumed then it turns out that the Weyl
tensor necessarily vanishes on $\mathscr{I}$. This is the ultimate
justification for calling such manifolds asymptotically flat because
the entire curvature vanishes on $\mathscr{I}$.


Some Consequences 


There are several consequences of the existence of the conformal
boundary $\mathscr{I}$. They all can be traced back to the fact that
this boundary can be used to separate the geometric fields into a
universal background field and dynamical fields which propagate on
it. The background is given by the boundary points attached to an
asymptotically flat spacetime which always form a three-dimensional
null hypersurface $\mathscr{I}$ with two connected components (in the
sequel, we restrict our attention to $\mathscr{I}^+$ only;
$\mathscr{I}^-$ is treated similarly), each with the topology of a
cylinder. And in each case, $\mathscr{I}$ is shear-free.


The BMS Group 


Since the structure of null infinity is universal over all
asymptotically flat spacetimes, it is obvious that its symmetry group
should also possess a universal meaning. This group, the so-called
Bondi-Metzner-Sachs (BMS) group, is in many respects similar to the
Poincaré group, the symmetry group of $\mathbb{M}$. It is the
semidirect product of the Lorentz group with an abelian group which,
however, is not the four-dimensional translation group but an
infinite-dimensional group of supertranslations. The
supertranslations form a normal subgroup, so the factor group is
isomorphic to the Lorentz group.


In physical terms, the supertranslations arise because there are
infinitely many directions from which observers at infinity (whose
world lines coincide with the null generators of $\mathscr{I}$ in a
certain limit) can observe the system and because each observer is
free to choose its own origin of proper time $u$. The observers
surrounding the system are not synchronized, because under the
assumptions made there is no natural way to fix a unique common
origin. Hence, a supertranslation is a shift of the parameter along
each null generator of $\mathscr{I}^+$ corresponding to a change of
origin for each individual observer. It can be given as a map
$S^2 \to \mathbb{R}$. A choice of origin on each null generator of
$\mathscr{I}^+$ is referred to as a "cut" of $\mathscr{I}^+$. It is a
two-dimensional surface of spherical topology which intersects each
null generator exactly once. It is an open question whether one can
always synchronize the observers by imposing canonical conditions at
$i^0$ or $i^+$, thereby reducing the BMS group to the smaller
Poincaré group.

The supertranslations contain a unique four-dimensional normal
subgroup. In $\mathbb{M}$ these special supertranslations are the
ones which are induced by the translations of Minkowski spacetime in
the following way. Take the future light cone of some event $P$ and
follow it out to $\mathscr{I}^+$, where its intersection defines an
origin for each observer located there. Now consider the light cone
of another event $O$ obtained from $P$ by a translation in a spatial
direction. Then the light emitted from $O$ will arrive at
$\mathscr{I}^+$ earlier than that from $P$ for observers in the
direction of the translation, while it will be delayed for observers
in the opposite direction. This change in arrival time defines a
specific supertranslation. Similarly, for a translation in a temporal
direction, the light from $O$ will arrive later than that from $P$
for all observers. Thus, every translation in $\mathbb{M}$ defines a
particular supertranslation on $\mathscr{I}^+$. These can be
characterized in a different way, which is intrinsic to
$\mathscr{I}^+$ and which can be used in the general case even though
there will be no Killing vectors present in a general asymptotically
flat spacetime. In an appropriate coordinate system, the asymptotic
translations are given as linear combinations of the first four
spherical harmonics $Y_{00}$, $Y_{10}$, $Y_{1\pm 1}$. The space of
asymptotic translations $T$ is in a natural way isometric to
$\mathbb{M}$.
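The light-cone argument can be made quantitative in flat spacetime. The Python sketch below is illustrative and uses units with $c = 1$; the function names are hypothetical. Light emitted at the event $(t_0, x_0)$ reaches an observer at distance $r$ in the direction $n(\theta, \phi)$ at retarded time $u = t_0 + |r n - x_0| - r$, which tends to $t_0 - n \cdot x_0$ as $r \to \infty$. This limit is a combination of a constant and the three components of $n$, that is, of spherical harmonics with $\ell \le 1$.

```python
import math


def direction(theta, phi):
    """Unit vector n(theta, phi) pointing from the source to the observer."""
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )


def retarded_arrival_time(t0, x0, theta, phi, r):
    """u = t - r at which light emitted at (t0, x0) reaches the observer at distance r."""
    n = direction(theta, phi)
    distance = math.sqrt(sum((r * ni - xi) ** 2 for ni, xi in zip(n, x0)))
    return t0 + distance - r


def asymptotic_shift(t0, x0, theta, phi):
    """Limit r -> infinity of the arrival time: t0 - n . x0."""
    n = direction(theta, phi)
    return t0 - sum(ni * xi for ni, xi in zip(n, x0))


if __name__ == "__main__":
    spatial = (1.0, 0.0, 0.0)  # unit translation along the x-axis
    print(asymptotic_shift(0.0, spatial, math.pi / 2, 0.0))      # observer along +x
    print(asymptotic_shift(0.0, spatial, math.pi / 2, math.pi))  # observer along -x
    print(asymptotic_shift(1.0, (0.0, 0.0, 0.0), 0.3, 1.2))      # pure time translation
```

The sign pattern reproduces the description above: a spatial translation advances the signal for observers in its direction and delays it for observers on the opposite side, whereas a time translation delays it uniformly.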


The Peeling Property 


Now consider the Weyl tensor $C^a{}_{bcd}$ on $\hat{M}$. Since it
vanishes on $\mathscr{I}$, where $\Omega = 0$, we may form the
quotient


$$K^a{}_{bcd} = \Omega^{-1}\, C^a{}_{bcd}$$


which can be shown to be smooth on $\mathscr{I}$. The physical
interpretation of this tensor field is based on the following
properties. In source-free regions the field satisfies the spin-2
zero-rest-mass equation


$$\hat{\nabla}_a K^a{}_{bcd} = 0$$


which is very similar to the Maxwell equations for the
electromagnetic (spin-1) Faraday tensor. Thus, $K^a{}_{bcd}$ is
interpreted as the gravitational field, which describes the
gravitational waves contained inside the system. The zero-rest-mass
equation for $K^a{}_{bcd}$ and the fact that the field is smooth on
$\mathscr{I}$ imply that the Weyl tensor satisfies the "peeling"
property. This is a characteristic conspiracy between the fall-off
behavior of certain components of the Weyl tensor along outgoing
$g$-null-geodesics approaching $\mathscr{I}^+$ in $M$ with respect to
an affine parameter $s$ for $s \to \infty$ and their algebraic type.
Symbolically, the Weyl tensor has the following behavior as
$s \to \infty$ along the null geodesic:


$$C = \frac{[4]}{s} + \frac{[31]}{s^2} + \frac{[211]}{s^3} + \frac{[1111]}{s^4} + O\!\left(\frac{1}{s^5}\right) \qquad [5]$$


where the numerator of each component indicates its Petrov type. The
repeated principal null direction (PND) in the first three components
and one of the PNDs in the fourth component are aligned with the
tangent vector of the geodesic. This implies that the farthest
reaching component of the Weyl tensor, which is $O(1/s)$, has the
Petrov type of a radiation field. It is customary to combine the
components which are $O(1/s^i)$ into one complex function and denote
it by $\psi_{5-i}$. When expressed in terms of the field
$K^a{}_{bcd}$ on $\hat{M}$, this fall-off behavior implies that of
all components of $K^a{}_{bcd}$ only $\psi_4$ does not necessarily
vanish on $\mathscr{I}^+$.

In special cases like the Minkowski, Schwarzschild, Kerr, and more
generally in all asymptotically flat stationary spacetimes, even
$\psi_4$ vanishes on $\mathscr{I}^+$. For these reasons, $\psi_4$ is
called the radiation field of the system, that is, that part of the
gravitational field which can be registered by the observers at
infinity. It describes the outgoing radiation which is being emitted
by the system during its evolution.


The Bondi-Sachs Mass-Loss Formula 


Gravitational waves carry away energy from the system. This is a
consequence of the Bondi-Sachs mass-loss formula. The Bondi-Sachs
energy-momentum is related to a weighted integral over a cut $C$,


$$P[W] = -\frac{1}{4\pi G} \int_C W \left[ \psi_2 + \sigma \dot{\bar{\sigma}} \right] d^2S \qquad [6]$$

The quantity in brackets, the mass aspect, is a
combination of the scalar $\psi_2$, which in a sense measures the
strength of the Coulomb-like part of the gravitational field on
$\mathscr{I}^+$, and the complex quantity $\sigma$. In a so-called
Bondi coordinate system, this quantity is related to the radiation
field $\psi_4$ by the relation


$$\psi_4 = \ddot{\bar{\sigma}}$$


the dot indicating differentiation with respect to the affine
parameter along the null generators. Thus, $\sigma$ is essentially
the second time integral of the radiation field. The mass aspect is
integrated against a function $W$ which is an asymptotic translation,
that is, a linear combination of the first four spherical harmonics.
Thus, one can view the expression [6] as defining a linear map
$T \to \mathbb{R}$. Since $T$ and $\mathbb{M}$ are isometric this
defines a covector $P_a$ on $\mathbb{M}$, which can always be shown
to be timelike, $P_a P^a > 0$. This positivity property together with
the fact that in the special cases of Schwarzschild and Kerr
spacetimes the integral yields the mass parameters when evaluated for
a time translation ($W = 1$) motivates the interpretation of $P_a$ as
the energy-momentum 4-vector of the spacetime at the instant defined
by the cut $C$. In particular, for $W = 1$ the integral gives the
time component of $P_a$, the Bondi-Sachs energy $E$.
The interpretation of [6] as energy-momentum is 
strengthened by the fact that $P_a$ arises as dual to the 
translations which is familiar from Lagrangian field 
theories where energy and momentum appear as 
generators for time and space translations. In fact, 
one can set up a Hamiltonian framework where the 
role of the Bondi-Sachs energy-momentum as 
generator of asymptotic translations is made 
explicit. 

This point of view suggests that one should also 
be able to define a notion of angular momentum for 
asymptotically flat spacetimes because angular 
momentum arises as the generator of rotations, 
which can also be defined asymptotically. However, 
while there is a unique notion of translation on $\mathscr{I}^+$, 
this is not the case for rotations (and boosts). The 
reason is hidden in the structure of the BMS group 
where the Lorentz group appears naturally as a 
factor group but not as a unique subgroup. In 
physical terms, the angular momentum depends on 
an origin but there is no natural way to choose an 
origin on $\mathscr{I}^+$. This ambiguity in the choice of origin 
leads to several nonequivalent expressions for 
angular momentum in the literature. 

Consider now two cuts $C$ and $C'$, with $C'$ later than 
$C$. Then we may compute the difference $\Delta E = E - E'$ 
of the Bondi–Sachs energies with respect to the two 
cuts. It turns out that this difference can be 
expressed as an integral over the (three-dimensional) 
piece $\Sigma$ of $\mathscr{I}^+$ which is bounded by the two cuts 
(i.e., $\partial\Sigma = C' - C$): 

$$\Delta E = \frac{1}{4\pi G} \int_\Sigma \dot\sigma \, \dot{\bar\sigma} \, d^3V \qquad [7]$$

This result means that the Bondi–Sachs energy of the 
system decreases, since $E' < E$, and the rate of 
decrease is given by the (positive-definite) amount 
of gravitational radiation which leaves the system 
during the period defined by the two cuts. 

It is necessary to point out that in this article the 
structure of null infinity has been postulated based 
on physical reasonings. The Einstein equations have 
been used only in a very weak sense, namely only in 
a neighborhood of $\mathscr{I}^+$. It is an entirely different 
question whether the field equations are compatible 
with this postulated structure. To answer it, one 
needs to show that there are global solutions of the 
Einstein equations which exhibit the postulated 
behavior in the asymptotic region. This question 
has been settled recently in the affirmative: there are 
many global spacetimes which are asymptotically 
flat in the sense described here. 

This article has discussed the notion of null 
infinity, that is, of spacetimes which are asymptoti- 
cally flat in lightlike directions. Spacetimes which 
are asymptotically flat in spacelike directions have 
not been covered. The latter is a notion which has 
been developed largely independently of null infinity 
since it is essentially a property of an initial data set 
and not of the entire four-dimensional spacetime. 
Ultimately, these two notions should coincide, in the 
sense that if one has an initial data set which is 
asymptotically flat in spatial directions in an appro- 
priate sense then its Cauchy development will be an 
asymptotically flat spacetime. However, as of yet, it 
is not clear what the appropriate conditions should 
be because the structure of the gravitational field in 
the neighborhood of spacelike infinity $i^0$ is not 
sufficiently well understood so far. 


Introduction 


Averaging methods are the methods of perturbation 
theory that are based on the averaging principle and 
the idea of dividing the dynamics into slow drift and 
fast oscillations. The most common field of 
applications of averaging methods is the analysis of the 
behavior of dynamical systems that differ from 
integrable systems by small perturbations. 


Averaging Principle 


Equations of motion of a system that differs from an 
integrable system by small perturbations can often 
be written in the form 

$$\dot I = \varepsilon g(I,\varphi,\varepsilon), \qquad \dot\varphi = \omega(I) + \varepsilon f(I,\varphi,\varepsilon)$$
$$I = (I_1,\ldots,I_n) \in \mathbb{R}^n \qquad [1]$$
$$\varphi = (\varphi_1,\ldots,\varphi_m) \in \mathbb{T}^m \bmod 2\pi, \qquad 0 < \varepsilon \ll 1$$

The small parameter $\varepsilon$ characterizes the amplitude 
of the perturbation. For $\varepsilon = 0$ one gets the 
unperturbed system. The equation $I = \mathrm{const}$ singles 
out an invariant $m$-dimensional torus of the 
unperturbed system. The motion on this torus is 
quasiperiodic with frequency vector $\omega(I)$; the 
components of the vector $I$ are called “slow variables,” 
whereas the components of the vector $\varphi$ are called “fast 
variables” or “phases.” The right-hand sides of 
system [1] are $2\pi$-periodic with respect to all $\varphi_j$. It 
is assumed that they are smooth enough functions 
of all arguments. It is also assumed that the 
components of the frequency vector are not linearly 
dependent over the ring of integers 
identically with respect to $I$. System [1] is called 
a “system with rotating phases.” 

In applications, one is often interested mainly in 
the behavior of slow variables. The “averaging 
principle” (or method) consists in replacing the 
system of perturbed equations [1] by the “averaged 
system” 

$$\dot J = \varepsilon G(J), \qquad G(J) = (2\pi)^{-m} \oint_{\mathbb{T}^m} g(J,\varphi,0)\,d\varphi \qquad [2]$$

for the purpose of providing an approximate 
description of the evolution of the slow variables 
over time intervals of order $1/\varepsilon$ or longer. Here, 
$d\varphi = d\varphi_1 \cdots d\varphi_m$. System [2] contains only slow 
variables and, therefore, is much simpler to 
investigate than system [1]. When passing from 
system [1] to system [2], one ignores the terms 
$g(I,\varphi,0) - G(I)$ on the right-hand side of [1]. The 
averaging principle is based on the idea that these 
terms oscillate and lead only to small oscillations 
which are superimposed on the drift described by 
the averaged system. To justify the averaging 
principle, one should establish a relation between 
the behavior of the solutions of systems [1] and [2]. 
This problem is still far from being completely 
solved. 
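
As a rough numerical illustration of the averaging principle (a sketch that is not part of the original text), consider the one-frequency toy system $\dot I = \varepsilon I(\cos\varphi - 1)$, $\dot\varphi = 1$, whose averaged system is $\dot J = -\varepsilon J$. The toy right-hand side, the fixed-step Runge-Kutta integrator, and the names `rk4_step` and `max_deviation` are assumptions chosen only for illustration.

```python
import math


def rk4_step(rhs, state, dt):
    """Advance `state` by one classical fourth-order Runge-Kutta step."""
    k1 = rhs(state)
    k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = rhs([s + dt * k for s, k in zip(state, k3)])
    return [s + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def max_deviation(eps, i0=1.0, dt=0.01):
    """Largest |I(t) - J(t)| on 0 <= t <= 1/eps for the toy system."""

    def exact(state):
        slow, phase = state
        # Hypothetical perturbation g(I, phi) = I (cos(phi) - 1), omega = 1.
        return [eps * slow * (math.cos(phase) - 1.0), 1.0]

    def averaged(state):
        # Phase average of g over one period: G(J) = -J.
        return [-eps * state[0]]

    steps = int(round(1.0 / (eps * dt)))
    exact_state, avg_state, worst = [i0, 0.0], [i0], 0.0
    for _ in range(steps):
        exact_state = rk4_step(exact, exact_state, dt)
        avg_state = rk4_step(averaged, avg_state, dt)
        worst = max(worst, abs(exact_state[0] - avg_state[0]))
    return worst


if __name__ == "__main__":
    for eps in (0.05, 0.01):
        print(eps, max_deviation(eps))
```

For this particular toy system the deviation stays of order $\varepsilon$ over the whole interval $0 \le t \le 1/\varepsilon$; no such statement is implied for general systems.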

Another version of the averaging principle is 
used in the case when the frequencies are approximately 
in resonance. This means that one or 
several relations of the form $(k,\omega) = 0$ are approximately 
valid with irreducible integer coefficient 
vectors $k \ne 0$; here, $(k,\omega)$ is the standard scalar 
product in $\mathbb{R}^m$. Let $\Gamma$ be the sublattice of the integer 
lattice $\mathbb{Z}^m$ generated by these vectors. Let 
$r = \operatorname{rank}\Gamma$ and let $k^{(1)}, k^{(2)}, \ldots, k^{(m)}$ be a basis in $\mathbb{Z}^m$, 
the first $r$ vectors of which belong to $\Gamma$. Instead of 
$\varphi$, one can introduce new variables: 

$$\vartheta = (\vartheta_1,\ldots,\vartheta_r) \in \mathbb{T}^r \bmod 2\pi, \qquad \chi = (\chi_1,\ldots,\chi_{m-r}) \in \mathbb{T}^{m-r} \bmod 2\pi$$
$$\vartheta_i = (k^{(i)},\varphi), \qquad \chi_j = (k^{(r+j)},\varphi)$$

Let $R$ be an $r \times m$ matrix whose rows are the vectors 
$k^{(i)}$, $1 \le i \le r$. For an approximate description of the 
behavior of the variables $I, \vartheta$, the averaging principle 
prescribes replacing system [1] by the system 

$$\dot J = \varepsilon G_\Gamma(J,\gamma), \qquad \dot\gamma = R\,\omega(J) + \varepsilon R F_\Gamma(J,\gamma)$$
$$G_\Gamma(J,\vartheta) = (2\pi)^{-(m-r)} \oint_{\mathbb{T}^{m-r}} g(J,\varphi,0)\,d\chi \qquad [3]$$
$$F_\Gamma(J,\vartheta) = (2\pi)^{-(m-r)} \oint_{\mathbb{T}^{m-r}} f(J,\varphi,0)\,d\chi$$

(one should express $g, f$ through $\vartheta, \chi$ and then 
integrate over $\chi$; $d\chi = d\chi_1 \cdots d\chi_{m-r}$). System [3] is 
called the “partially averaged system” for resonances in 
$\Gamma$. The functions $G_\Gamma, F_\Gamma$ can be obtained from the Fourier 
series expansions of the functions $g, f$ at $\varepsilon = 0$ 
by throwing away the harmonics $\exp(i(k,\varphi))$, $k \notin \Gamma$ 
(nonresonant harmonics). Passing from system [1] 
to system [3] is based on the idea that the ignored 
nonresonant harmonics oscillate fast and do not 
essentially affect the evolution of the slow variables. 

Now let system [1] be a Hamiltonian system close 
to an integrable one. The Hamiltonian function has 
the form 

$$H = H_0(p) + \varepsilon H_1(p,\varphi,y,x,\varepsilon)$$

where $\varphi, x$ are coordinates and $p, y$ are the momenta 
conjugate to them. The equations of motion have the same 
form as [1], with $I = (p,y,x)$: 

$$\dot p = -\varepsilon \frac{\partial H_1}{\partial\varphi}, \qquad \dot y = -\varepsilon \frac{\partial H_1}{\partial x}$$
$$\dot\varphi = \frac{\partial H_0}{\partial p} + \varepsilon \frac{\partial H_1}{\partial p}, \qquad \dot x = \varepsilon \frac{\partial H_1}{\partial y} \qquad [4]$$

The averaging principle in the case when there are 
no resonant relations leads to the system 

$$\dot p = 0, \qquad \dot y = -\varepsilon \frac{\partial \mathcal{H}_1}{\partial x}, \qquad \dot x = \varepsilon \frac{\partial \mathcal{H}_1}{\partial y}$$
$$\mathcal{H}_1 = (2\pi)^{-m} \oint_{\mathbb{T}^m} H_1(p,\varphi,y,x,0)\,d\varphi \qquad [5]$$

Therefore, in this case there is no drift in $p$, and the 
behavior of $y, x$ is described by a Hamiltonian 
system that contains $p$ as a parameter. The equations 
of motion of planets around the Sun can be reduced 
to the form [4]. The absence of 
evolution of the momenta $p$ is known in this problem as 
the Lagrange–Laplace theorem on the absence of 
evolution of the semimajor axes of planetary orbits. 


Elimination of Fast Variables, Decoupling 
of Slow and Fast Motions 


The basic role in the averaging method is played by 
the idea that the exact system can, in the principal 
approximation, be transformed into the averaged 
system by means of a transformation of variables close 
to the identity. An extension of this idea is 
that a similar transformation of variables allows 
one to eliminate, up to an arbitrary degree of 
accuracy, the fast phases from the right-hand sides 
of the equations of perturbed motion and in this 
way decouple the slow motion from the fast one. 
For system [1], provided there are no resonant 
relations between the frequencies, the elimination of fast 
variables is performed as follows. The desired 
transformation of variables $(I,\varphi) \mapsto (J,\psi)$ is sought 
as a formal series 

$$I = J + \varepsilon u_1(J,\psi) + \varepsilon^2 u_2(J,\psi) + \cdots$$
$$\varphi = \psi + \varepsilon v_1(J,\psi) + \varepsilon^2 v_2(J,\psi) + \cdots \qquad [6]$$

where the functions $u_j, v_j$ are $2\pi$-periodic in $\psi$. The 
transformation [6] should be chosen in such a way 
that in the new variables the right-hand sides of the 
equations of motion do not contain fast variables, 
that is, the equations of motion should have the 
form 

$$\dot J = \varepsilon G_0(J) + \varepsilon^2 G_1(J) + \cdots$$
$$\dot\psi = \omega(J) + \varepsilon F_0(J) + \varepsilon^2 F_1(J) + \cdots \qquad [7]$$

Substituting [6] into [7], taking into account [1], and 
equating the terms of the same order in $\varepsilon$, we obtain 
the following set of relations: 

$$G_0(J) = g(J,\psi,0) - \frac{\partial u_1}{\partial\psi}\,\omega$$
$$F_0(J) = f(J,\psi,0) + \frac{\partial\omega}{\partial J}\,u_1 - \frac{\partial v_1}{\partial\psi}\,\omega \qquad [8]$$
$$G_i(J) = X_i(J,\psi) - \frac{\partial u_{i+1}}{\partial\psi}\,\omega$$
$$F_i(J) = Y_i(J,\psi) + \frac{\partial\omega}{\partial J}\,u_{i+1} - \frac{\partial v_{i+1}}{\partial\psi}\,\omega, \qquad i \ge 1$$

The functions $X_i, Y_i$ are uniquely determined by the 
terms $u_1, v_1, \ldots, u_i, v_i$ in expansion [6]. The first 
equation in [8] implies that 

$$G_0(J) = g_0(J) = G(J)$$
$$u_1 = \sum_{k \ne 0} \frac{g_k}{i(k,\omega)} \exp(i(k,\psi)) + u_1^0(J) \qquad [9]$$

where $g_k$, $k \in \mathbb{Z}^m$, are the Fourier coefficients of the 
function $g$ at $\varepsilon = 0$, and $u_1^0$ is an arbitrary function of $J$. It 
is assumed that the denominators in [9] do not 
vanish, and that the series in [9] converges and 
determines a smooth function. In the same way, 
from the other equations in [8] one can sequentially 
determine $F_0, v_1, \ldots, G_i, u_{i+1}, F_i, v_{i+1}$, $i \ge 1$. 
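
The first-order correction $u_1$ can be illustrated on a toy one-frequency example (a sketch under stated assumptions, not part of the original text). For the hypothetical perturbation $g(J,\psi) = J(\cos\psi - 1)$ with $\omega = 1$, the Fourier coefficients are $g_0 = -J$ and $g_{\pm 1} = J/2$, and the sum over $k \ne 0$ with $u_1^0 = 0$ gives $u_1 = J\sin\psi$. The sketch below evaluates that sum numerically and compares $J + \varepsilon u_1$ with the closed-form exact solution $I(t) = I_0 \exp(\varepsilon(\sin t - t))$ of this toy system; the function names are illustrative.

```python
import cmath
import math


def first_order_correction(fourier, omega, psi):
    """Sum g_k exp(i k psi) / (i k omega) over k != 0 (one frequency, u_1^0 = 0)."""
    total = 0j
    for k, g_k in fourier.items():
        if k != 0:
            total += g_k * cmath.exp(1j * k * psi) / (1j * k * omega)
    return total.real


def reconstruction_error(eps, i0=1.0, samples=2000):
    """Largest |I(t) - (J + eps * u_1)| on 0 <= t <= 1/eps for the toy system."""
    worst = 0.0
    for n in range(samples + 1):
        t = n / (samples * eps)
        # Closed-form solution of dI/dt = eps * I * (cos t - 1), I(0) = i0.
        exact = i0 * math.exp(eps * (math.sin(t) - t))
        # Solution of the averaged equation dJ/dt = -eps * J, J(0) = i0.
        drift = i0 * math.exp(-eps * t)
        # Fourier coefficients of g(J, psi) = J (cos psi - 1) at the current J.
        fourier = {0: -drift, 1: 0.5 * drift, -1: 0.5 * drift}
        approx = drift + eps * first_order_correction(fourier, 1.0, t)
        worst = max(worst, abs(exact - approx))
    return worst


if __name__ == "__main__":
    for eps in (0.05, 0.01):
        print(eps, reconstruction_error(eps))
```

In this toy case, adding the oscillating correction reduces the error from order $\varepsilon$ to order $\varepsilon^2$ on the interval considered.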

On truncating the series in [6] and [7] at the terms 
of order $\varepsilon^l$, we obtain a truncated system of the $l$th 
approximation. The equation for $J$ is decoupled 
from the other equations and can be solved 
separately. Then the behavior of $\psi$ is determined 
by means of quadrature. The behavior of the original 
variable $I$ in this approximation is a slow drift 
(described by the equation for $J$), on which small 
oscillations (described by the transformation of variables) 
are superimposed. The behavior of $\varphi$ can be 
represented as a rotation with slowly varying frequency, 
on which oscillations are also superimposed. For $l = 1$, 
the truncated system coincides with the averaged 
system [2]. 

If the sublattice $\Gamma \subset \mathbb{Z}^m$ specifying possible 
resonant relations is given, then in an analogous 
manner one can construct a formal transformation 
of variables $(I,\varphi) \mapsto (J,\psi)$ such that, in the new 
variables, the fast phase $\psi$ will appear on the right-hand 
sides of the differential equations for the new 
variables only in combinations $(k,\psi)$, with $k \in \Gamma$ 
(see, e.g., Arnol’d et al. (1988)). Again, on truncating 
the series on the right-hand sides of the 
differential equations for the new variables at the 
terms of order $\varepsilon^l$, we obtain a truncated system of 
the $l$th approximation. At $l = 1$, this truncated 
system coincides with the partially averaged system 
[3] (for some special choice of arbitrary functions 
that are contained in the formulas for the transformation 
of variables). If the original system is a Hamiltonian 
system of the form [4], then the transformation of 
variables eliminating the fast phases from the right-hand 
sides of the differential equations can be 
chosen to be symplectic. The corresponding 
procedures are called the “Lindstedt method” and 
“Newcomb method” (nonresonant case for $n = m$), 
“Delaunay method” (resonant case for $n = m$), and 
“von Zeipel method” (resonant case for $n > m$) (see 
Poincaré (1957) and Arnol’d et al. (1988)). 

The calculation of high-order terms in the 
procedures of elimination of fast variables is rather 
cumbersome. There are versions of these procedures 
which are convenient for symbolic processors 
(especially for Hamiltonian systems, e.g., the 
Deprit—Hori method; Giacaglia 1972). 

The averaging method consists in using the 
averaged system for the description of motion in 
the first approximation and the truncated systems 
obtained by means of the procedures of elimination 
of fast variables in the higher approximations, 
together with the corresponding transformations of 
variables. 


Justification of the Averaging Method 


To justify the averaging method, one should 
establish conditions under which the deviation of the 
slow variables along the solutions of the exact 
system from the solutions of the averaged system 
with appropriate initial data, on time intervals of 
order $1/\varepsilon$ or longer, tends to 0 as $\varepsilon \to 0$. It is 
desirable to have upper estimates for these 
deviations. The estimates of deviations of the 
solutions of the exact system from the solutions of 
the truncated systems obtained by means of the 
procedure of elimination of fast phases are 
important as well. It can happen that there are “bad” 
initial data for which the slow component of the 
solution of the exact system deviates from the 
solution of the averaged system by a value of order 
1 over a time of order $1/\varepsilon$. In this case, one should 
have upper estimates for the measure of the set 
of such “bad” initial data; on the complementary set 
of initial data, one should have upper estimates 
for the deviation of the slow variables along the 
solutions of the exact system from the solution of 
the averaged system. These problems are currently 
far from being completely solved. Some general 
results are described in the following. 

Let the functions $\omega, f, g$ on the right-hand side of 
system [1] be defined and bounded together with a 
sufficient number of derivatives in the domain $D\{I\} \times 
\mathbb{T}^m\{\varphi\} \times [0,\varepsilon_0]$. Let $J(t)$ be the solution of the 
averaged system [2] with initial condition $I_0 \in D$. 
Let $(I(t),\varphi(t))$ be the solution of the exact system [1] 
with initial conditions $(I_0,\varphi_0)$. So, $I(0) = J(0)$. It is 
assumed that the solution $J(t)$ is defined and stays at 
a positive distance from the boundary of $D$ on the 
time interval $0 \le t \le K/\varepsilon$, $K = \mathrm{const} > 0$. 

If system [1] is a one-frequency system ($m = 1$), 
and the frequency $\omega$ does not vanish in $D$, then for 
$0 \le t \le K/\varepsilon$ the solution $(I(t),\varphi(t))$ is well defined, 
and $|I(t) - J(t)| < C\varepsilon$, $C = \mathrm{const} > 0$. For $\omega = 1$, this 
assertion was proved by P Fatou (1928) and, by a 
different method, by L I Mandel’shtam and L D 
Papaleksi (1934). This was historically the 
first result on the justification of the averaging 
method (Mitropol’skii 1971). There is a proof 
based on the elimination of fast variables (see, e.g., 
Arnol’d (1983)). For a one-frequency system, higher 
approximations of the procedure of elimination of 
fast variables allow the description of the dynamics 
with an accuracy of the order of any power in $\varepsilon$ on 
time intervals of order $1/\varepsilon$ (Bogolyubov and 
Mitropol’skii 1961). 

If system [1] is a multifrequency system ($m \ge 2$), but 
the vector of frequencies is constant and nonresonant, 
then for any $\rho > 0$ and small enough $\varepsilon < \varepsilon_0(\rho)$ it holds 
that $|I(t) - J(t)| < \rho$ for $0 \le t \le K/\varepsilon$ (Bogolyubov 
1945, Bogolyubov and Mitropol’skii 1961). If, in 
addition, the frequencies satisfy the Diophantine 
condition $|(k,\omega)| > \mathrm{const}\,|k|^{-\nu}$ for all $k \in \mathbb{Z}^m \setminus \{0\}$ 
and some $\nu > 0$, then one can choose $\rho = O(\varepsilon)$. In 
this case, higher approximations of the procedure of 
elimination of fast variables allow one to describe 
the dynamics with an accuracy of the order of any 
power in $\varepsilon$ on time intervals of order $1/\varepsilon$ (see, e.g., 
Arnol’d et al. (1988)). 
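
The Diophantine condition can be probed numerically on a finite range of integer vectors (a sketch only: a finite search cannot prove the condition, and the choice of $|k|$ as the sum of absolute values of the components, as well as the name `diophantine_constant`, are assumptions made for illustration). The function returns the smallest value of $|(k,\omega)|\,|k|^{\nu}$ over all nonzero $k$ with $|k|$ up to a cutoff; a value of zero signals an exact resonance.

```python
import itertools
import math


def diophantine_constant(omega, nu, max_norm):
    """Smallest |(k, omega)| * |k|**nu over nonzero integer k with |k| <= max_norm."""
    best = math.inf
    box = range(-max_norm, max_norm + 1)
    for k in itertools.product(box, repeat=len(omega)):
        norm = sum(abs(c) for c in k)  # assumed norm: sum of |k_i|
        if norm == 0 or norm > max_norm:
            continue
        small_divisor = abs(sum(c * w for c, w in zip(k, omega)))
        best = min(best, small_divisor * norm ** nu)
    return best


if __name__ == "__main__":
    golden = (1.0 + math.sqrt(5.0)) / 2.0
    # Frequencies (1, golden ratio): the product stays bounded away from zero.
    print(diophantine_constant((1.0, golden), 1.0, 30))
    # Frequencies (1, 3/2): the resonance k = (3, -2) makes the product vanish.
    print(diophantine_constant((1.0, 1.5), 1.0, 30))
```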

If the system is a multifrequency system, and the 
frequencies are not constant (but depend on the slow 
variables $I$), then due to the evolution of the slow 
variables the frequencies themselves evolve 
slowly. At certain moments of time, they can satisfy 
certain resonant relations. One of the phenomena 
that can take place here is capture into 
resonance; this capture leads to a large deviation 
between the solutions of the exact and averaged systems. 
However, the general Anosov averaging theorem 
(Anosov 1960) implies that if the frequencies $\omega$ are 
nonresonant for almost all $I$, then for any $\rho > 0$, the 
inequality $|I(t) - J(t)| < \rho$ is satisfied for $0 \le t \le K/\varepsilon$ 
for all initial data outside a set $E(\rho,\varepsilon)$ whose 
measure tends to 0 as $\varepsilon \to 0$. In many cases, it 
turns out that $\operatorname{mes} E(\rho,\varepsilon) = O(\sqrt{\varepsilon}/\rho)$ (in particular, 
a sufficient condition for the last estimate is that 
$\operatorname{rank}(\partial\omega/\partial I) = m$) (Arnol’d et al. (1988)). 

The knowledge about averaging in two-frequency 
systems ($m = 2$) on time intervals of order 
$1/\varepsilon$ is relatively more complete (see Arnol’d 
(1983), Arnol’d et al. (1988), and Lochak and 
Meunier (1988)). For Hamiltonian and reversible 
systems, the justification of the averaging method is 
a by-product of Kolmogorov–Arnold–Moser (KAM) 
theory. The KAM theory provides estimates of the 
difference between the solutions of the exact and 
averaged systems for the majority of initial data on 
the infinite time interval $-\infty < t < +\infty$. For the remaining 
data this difference can grow because of Arnol’d 
diffusion, but, in general, very slowly. According to 
the Nekhoroshev theorem, this difference is small on 
time intervals whose length grows exponentially when 
the perturbation decays linearly (for an analytic 
Hamiltonian, if the unperturbed Hamiltonian is a 
generic function, the so-called steep function). 

Another aspect of the justification of the averaging 
method is establishing relations between invariant 
manifolds of the exact and averaged systems. 
Consider, in particular, the case of a one-frequency 
system and a multifrequency system with constant 
Diophantine frequencies. Suppose that the averaged 
system has an equilibrium such that the real parts of all 
its eigenvalues are different from 0, or a limit cycle 
such that the absolute values of all but one of its 
multipliers are different from 1. Then the exact 
system has an invariant torus, respectively, $m$- or 
$(m+1)$-dimensional, whose projection onto the 
space of the slow variables is $O(\varepsilon)$-close to the 
equilibrium (cycle) of the averaged system. This 
torus is stable or unstable together with the 
equilibrium (cycle) of the averaged system. For 
Hamiltonian and reversible systems, the problem of 
invariant manifolds is considered in the framework 
of the KAM theory. 


Averaging in Bogolyubov’s Systems 


Systems in the standard form of Bogolyubov (1945) 
are of the form 

$$\dot x = \varepsilon X(t,x,\varepsilon), \qquad x \in \mathbb{R}^p, \qquad 0 < \varepsilon \ll 1 \qquad [10]$$

It is assumed that the function $X$, besides the usual 
smoothness conditions, satisfies the condition of 
uniform average: the limit (time average) 

$$X_0(x) = \lim_{T \to \infty} \frac{1}{T} \int_0^T X(t,x,0)\,dt \qquad [11]$$

exists uniformly in $x$. The averaging principle of 
Bogolyubov consists of the replacement of the 
original system in standard form by the averaged 
system 

$$\dot\xi = \varepsilon X_0(\xi) \qquad [12]$$

with the goal of providing an approximate description 
of the behavior of $x$. This approach generalizes the 
approach of the section “Averaging principle” for 
the case of constant frequencies ($\omega = \mathrm{const}$). Upon 
introducing in the given system with constant 
frequencies the deviation from uniform rotation 
$a = \varphi - \omega t$ and denoting $x = (I,a)$, we obtain a 
system in the standard form [10]. Here the condition 
of uniform average is fulfilled because $X(t,x,0)$ is a 
quasiperiodic function of time $t$. The averaged 
system [12] for nonresonant frequencies coincides 
with the averaged system [2]; for resonant 
frequencies, it coincides with the partially averaged system 
[3] (one should only supply systems [2] and [3] with 
equations for some components of the vector $\varphi - \omega t$ 
that do not enter into the right-hand side of the 
averaged system). 
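
The time average in [11] can be approximated numerically by truncating the limit at a large but finite $T$ (a sketch under assumptions, not part of the original text). The quasiperiodic example $X(t,x) = -x(\cos t + \cos\sqrt{2}\,t)^2$, for which $X_0(x) = -x$, the trapezoidal rule, and the name `time_average` are all illustrative choices.

```python
import math


def time_average(field, x, horizon, dt=0.01):
    """Trapezoidal estimate of (1/T) * integral_0^T field(t, x) dt with T = horizon."""
    steps = int(round(horizon / dt))
    total = 0.5 * (field(0.0, x) + field(steps * dt, x))
    for n in range(1, steps):
        total += field(n * dt, x)
    return total * dt / (steps * dt)


def quasiperiodic_field(t, x):
    """Hypothetical right-hand side X(t, x, 0) with two incommensurate frequencies."""
    return -x * (math.cos(t) + math.cos(math.sqrt(2.0) * t)) ** 2


if __name__ == "__main__":
    for horizon in (20.0, 200.0, 2000.0):
        print(horizon, time_average(quasiperiodic_field, 2.0, horizon))
```

For this example the finite-time averages approach $X_0(2) = -2$ at a rate of order $1/T$, so the averaged system [12] reads $\dot\xi = -\varepsilon\xi$.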

The averaging principle of Bogolyubov is justified 
by three Bogolyubov theorems. According to the 
first theorem, if $\xi(t)$, $0 \le t \le K/\varepsilon$, is a solution of 
the averaged system, and $x(t)$ is a solution of the 
exact system with initial condition $x(0) = \xi(0)$, then 
for any $\rho > 0$ there exists $\varepsilon_0(\rho) > 0$ such that 
$|x(t) - \xi(t)| < \rho$ for $0 \le t \le K/\varepsilon$ and $0 < \varepsilon < \varepsilon_0(\rho)$. 
The second and the third Bogolyubov theorems 
describe the motion in the neighborhoods of 
equilibria and limit cycles of the averaged 
system. In particular, if for an equilibrium the real 
parts of all its eigenvalues are different from 0, or, 
for a limit cycle, the absolute values of all but one of its 
multipliers are different from 1, then the exact 
system has a solution which eternally stays near 
this equilibrium (cycle). The stability properties of 
this solution are the same as the stability properties 
of the corresponding equilibrium (cycle) of the 
averaged system. 

For systems of the form [10] a procedure exists 
that, similarly to the procedure in the section 
“Elimination of fast variables, decoupling of slow 
and fast motions,” allows us to eliminate time $t$ 
from the right-hand side of the system with an 
accuracy of the order of any power in $\varepsilon$ by means of 
a transformation of variables. (To perform this 
procedure, one should assume that the conditions 
of uniform average are satisfied for the functions 
that arise in the process of constructing higher 
approximations in this procedure (Bogolyubov and 
Mitropol’skii 1961).) In the first approximation, 
such a transformation of variables transforms the 
original system into the averaged one. 

The condition of uniform average is very important 
for the theory. If the limit in [11] exists, but the 
convergence is nonuniform in $x$, then the time 
average $X_0$ could be, for example, a discontinuous 
function of $x$, and the averaged system would not be 
well defined. 


Averaging in Slow-Fast Systems 


Systems of the form [1] are particular cases of the 
systems of the form 

$$\dot x = f(x,y,\varepsilon), \qquad \dot y = \varepsilon g(x,y,\varepsilon) \qquad [13]$$

which are called “slow-fast systems” (or systems 
with slow and fast motions, with slow and fast 
variables). The generalization of the approach of the 
section “Averaging principle” for these systems is 
the following averaging principle of Anosov (1960). 
In the system [13], let $x \in M$, $y \in \mathbb{R}^n$, where $M$ is a 
smooth compact $m$-dimensional manifold. At $\varepsilon = 0$, 
the system for the fast variables $x$ contains the slow 
variables $y$ as parameters. Assume that this system 
(which is called the “fast system”) has a finite smooth 
invariant measure $\mu_y$ and is ergodic for almost all 
values of $y$. Introduce the averaged system 

$$\dot Y = \varepsilon G(Y), \qquad G(Y) = \frac{1}{\mu_Y(M)} \int_M g(x,Y,0)\,d\mu_Y$$

According to the averaging principle, one should use 
the solution $Y(t)$ of the averaged system with initial 
condition $Y(0) = y(0)$ for an approximate description of 
the slow motion $y(t)$ in the original system. This 
averaging principle is justified by the following 
Anosov theorem [1]: for any positive $\rho$ the measure 
of the set $E(\rho,\varepsilon)$ of initial data (from a compact set in 
the phase space) such that 

$$\max_{0 \le t \le 1/\varepsilon} |y(t) - Y(t)| > \rho$$

tends to 0 as $\varepsilon \to 0$. 

The particular case when the original system is 
a Hamiltonian system depending on a slowly varying 
parameter $\lambda = \varepsilon t$, and for almost all values of 
$\lambda$ the motion of the system with $\lambda = \mathrm{const}$ is 
ergodic on almost all energy levels, is considered 
in Kasuga (1961). 

For the case when the fast system has strong mixing 
properties, see Bakhtin (2004) and Kifer (2004). 

For slow-fast systems, there is also a generalization 
of the approach of the previous section that uses 
time averaging and the condition of uniform average 
(Volosov 1962). 


Applications of the Averaging Method 


The averaging method is one of the most productive 
methods of perturbation theory, and its applications 
are immense. It is widely used in celestial mechanics 
and space flight dynamics for the description of the 
evolution of motions of celestial bodies, in plasma 
physics and theory of accelerators for description of 
motion of charged particles, and in radio engineer- 
ing for the description of nonlinear oscillatory 
regimes. There are also applications in hydrody- 
namics, physics of lasers, optics, acoustics, etc. (see 
Arnol’d et al. (1988), Bogolyubov and Mitropol’skii 
(1961), Lochak and Meunier (1988), Mitropol’skii 
(1971), and Volosov (1962)). 


Introduction 


The idea of topological invariants defined via path 
integrals was introduced by A S Schwartz (1977) in a 
special case and by E Witten (1988) in its full 
power. To formalize this idea, Witten (1988) 
introduced a notion of a topological quantum field 
theory (TQFT). Such theories, independent of 
Riemannian metrics, are rather rare in quantum 
physics. On the other hand, they admit a simple 
axiomatic description first suggested by M Atiyah 
(1989). This description was inspired by G Segal’s 
(1988) axioms for a two-dimensional conformal 
field theory. The axiomatic formulation of TQFTs 
makes them suitable for a purely mathematical 
research combining methods of topology, algebra, 
and mathematical physics. Several authors explored 
axiomatic foundations of TQFTs (see Quinn (1995) 
and Turaev (1994)). 


Axioms of a TQFT 


An $(n+1)$-dimensional TQFT $(V,\tau)$ over a scalar 
field $k$ assigns to every closed oriented $n$-dimensional 
manifold $X$ a finite-dimensional vector space 
$V(X)$ over $k$ and assigns to every cobordism 
$(M,X,Y)$ a $k$-linear map 

$$\tau(M) = \tau(M,X,Y) : V(X) \to V(Y)$$

Here a cobordism $(M,X,Y)$ between $X$ and $Y$ is a 
compact oriented $(n+1)$-dimensional manifold $M$ 
endowed with a diffeomorphism $\partial M \cong \overline{X} \sqcup Y$ (the 
overline indicates the orientation reversal). All 
manifolds and cobordisms are supposed to be 
smooth. A TQFT must satisfy the following axioms. 


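1. Naturality Any orientation-preserving diffeomorphism 
of closed oriented $n$-dimensional manifolds 
$f : X \to X'$ induces an isomorphism $f_\# : V(X) \to V(X')$. 
For a diffeomorphism $g$ between the 
cobordisms $(M,X,Y)$ and $(M',X',Y')$, the following 
diagram is commutative: 

$$\begin{array}{ccc}
V(X) & \xrightarrow{\;(g|_X)_\#\;} & V(X') \\
{\scriptstyle \tau(M)} \downarrow & & \downarrow {\scriptstyle \tau(M')} \\
V(Y) & \xrightarrow{\;(g|_Y)_\#\;} & V(Y')
\end{array}$$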


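2. Functoriality If a cobordism $(W,X,Z)$ is 
obtained by gluing two cobordisms $(M,X,Y)$ and 
$(M',Y',Z)$ along a diffeomorphism $f : Y \to Y'$, then 
the following diagram is commutative: 

$$\begin{array}{ccc}
V(X) & \xrightarrow{\;\tau(W)\;} & V(Z) \\
{\scriptstyle \tau(M)} \downarrow & & \uparrow {\scriptstyle \tau(M')} \\
V(Y) & \xrightarrow{\;f_\#\;} & V(Y')
\end{array}$$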


3. Normalization For any $n$-dimensional manifold 
$X$, the linear map 

$$\tau([0,1] \times X) : V(X) \to V(X)$$

is the identity. 


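4. Multiplicativity There are functorial 
isomorphisms 

$$V(X \sqcup Y) = V(X) \otimes V(Y), \qquad V(\emptyset) = k$$

such that the following diagrams are commutative: 

$$\begin{array}{ccc}
V((X \sqcup Y) \sqcup Z) & \cong & (V(X) \otimes V(Y)) \otimes V(Z) \\
\downarrow & & \downarrow \\
V(X \sqcup (Y \sqcup Z)) & \cong & V(X) \otimes (V(Y) \otimes V(Z))
\end{array}$$

$$\begin{array}{ccc}
V(X \sqcup \emptyset) & \cong & V(X) \otimes k \\
\downarrow & & \downarrow \\
V(X) & = & V(X)
\end{array}$$

Here $\otimes = \otimes_k$ is the tensor product over $k$. The 
vertical maps are respectively the ones induced 
by the obvious diffeomorphisms, and the standard 
isomorphisms of vector spaces. 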

5. Symmetry The isomorphism 

$$V(X \sqcup Y) \to V(Y \sqcup X)$$

induced by the obvious diffeomorphism corresponds 
to the standard isomorphism of vector 
spaces 

$$V(X) \otimes V(Y) \cong V(Y) \otimes V(X)$$


Given a TQFT $(V,\tau)$, we obtain an action of the 
group of diffeomorphisms of a closed oriented 
$n$-dimensional manifold $X$ on the vector space 
$V(X)$. This action can be used to study this group. 

An important feature of a TQFT $(V,\tau)$ is that it 
provides numerical invariants of compact oriented 
$(n+1)$-dimensional manifolds without boundary. 
Indeed, such a manifold $M$ can be considered as a 
cobordism between two copies of $\emptyset$, so that $\tau(M) \in 
\operatorname{Hom}_k(k,k) = k$. Any compact oriented $(n+1)$-dimensional 
manifold $M$ can be considered as a 
cobordism between $\emptyset$ and $\partial M$; the TQFT assigns to 
this cobordism a vector $\tau(M)$ in $\operatorname{Hom}_k(k, 
V(\partial M)) = V(\partial M)$ called the vacuum vector. 

The manifold $[0,1] \times X$, considered as a cobordism 
from $\overline{X} \sqcup X$ to $\emptyset$, induces a nonsingular pairing 

$$V(\overline{X}) \otimes V(X) \to k$$

We obtain a functorial isomorphism $V(\overline{X}) = 
V(X)^* = \operatorname{Hom}_k(V(X),k)$. 

We now outline definitions of several important 
classes of TQFTs. 

If the scalar field k has a conjugation and all the 
vector spaces V(X) are equipped with natural 
nondegenerate Hermitian forms, then the TQFT 
$(V,\tau)$ is Hermitian. If $k = \mathbb{C}$ is the field of complex 
numbers and the Hermitian forms are positive 
definite, then the TQFT is unitary. 

A TQFT $(V,\tau)$ is nondegenerate or cobordism 
generated if for any closed oriented n-dimensional 
manifold X, the vector space V(X) is generated by 
the vacuum vectors derived as above from the 
manifolds bounded by X. 

Fix a Dedekind domain $D \subset \mathbb{C}$. A TQFT $(V,\tau)$ 
over $\mathbb{C}$ is almost $D$-integral if it is nondegenerate and 
there is $d \in \mathbb{C}$ such that $d\,\tau(M) \in D$ for all $M$ with 
$\partial M = \emptyset$. Given an almost integral TQFT $(V,\tau)$ and a 
closed oriented $n$-dimensional manifold $X$, we define 
$S(X)$ to be the $D$-submodule of $V(X)$ generated by all 
the vacuum vectors. This module is preserved under 
the action of self-diffeomorphisms of $X$ and yields a 
finer “arithmetic” version of $V(X)$. 

The notion of an (n + 1)-dimensional TQFT over 
k can be reformulated in the categorical language as 
a symmetric monoidal functor from the category of 
n-manifolds and (n + 1)-cobordisms to the category 
of finite-dimensional vector spaces over k. The 
source category is called the (n+ 1)-dimensional 
cobordism category. Its objects are closed oriented 
n-dimensional manifolds. Its morphisms are cobord- 
isms considered up to the following equivalence: 
cobordisms $(M,X,Y)$ and $(M',X,Y)$ are equivalent 
if there is a diffeomorphism $M \to M'$ compatible 
with the diffeomorphisms $\partial M \cong \overline{X} \sqcup Y \cong \partial M'$. 


TQFTs in Low Dimensions 


TQFTs in dimension $0 + 1 = 1$ are in one-to-one 
correspondence with finite-dimensional vector 
spaces. The correspondence goes by associating 
with a one-dimensional TQFT $(V,\tau)$ the vector 
space $V(\mathrm{pt})$ where pt is a point with positive 
orientation. 

Let $(V,\tau)$ be a two-dimensional TQFT. The linear 
map $\tau$ associated with a pair of pants (a 2-disk with 
two holes considered as a cobordism between two 
circles $S^1 \sqcup S^1$ and one circle $S^1$) defines a commutative 
multiplication on the vector space $A = V(S^1)$. 
The 2-disk, considered as a cobordism between $S^1$ 
and $\emptyset$, induces a nondegenerate trace on the algebra 
$A$. This makes $A$ into a commutative Frobenius 
algebra (also called a symmetric algebra). This 
algebra completely determines the TQFT $(V,\tau)$. 
Moreover, this construction defines a one-to-one 
correspondence between equivalence classes of 
two-dimensional TQFTs and isomorphism classes of 
finite-dimensional commutative Frobenius algebras 
(Kock 2003). 
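
How a commutative Frobenius algebra determines invariants of closed surfaces can be sketched in a few lines. The sketch relies on a standard fact that is assumed here rather than stated in the text: a closed oriented surface of genus $g$ is sent to $\varepsilon(h^g)$, where $\varepsilon$ is the trace and $h$ is the handle element obtained by multiplying out the copairing dual to $(a,b) \mapsto \varepsilon(ab)$. The example algebra $\mathbb{Q}[x]/(x^2 - 1)$ with $\varepsilon(1) = 1$, $\varepsilon(x) = 0$ and all function names are illustrative choices.

```python
from fractions import Fraction


def invert(matrix):
    """Gauss-Jordan inverse over the rationals; fails if the matrix is singular."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def multiply(mult, a, b):
    """Product of coordinate vectors a, b; mult[i][j][k] is the e_k part of e_i e_j."""
    n = len(a)
    return [sum(a[i] * b[j] * mult[i][j][k] for i in range(n) for j in range(n))
            for k in range(n)]


def handle_element(mult, trace):
    """Multiply out the copairing that is inverse to the pairing trace(e_i e_j)."""
    n = len(trace)
    eta = [[sum(trace[k] * mult[i][j][k] for k in range(n)) for j in range(n)]
           for i in range(n)]
    eta_inv = invert(eta)
    return [sum(eta_inv[i][j] * mult[i][j][k] for i in range(n) for j in range(n))
            for k in range(n)]


def surface_invariant(mult, trace, unit, genus):
    """Value trace(h ** genus) assigned to the closed surface of the given genus."""
    h = handle_element(mult, trace)
    power = list(unit)
    for _ in range(genus):
        power = multiply(mult, power, h)
    return sum(t * p for t, p in zip(trace, power))


# Hypothetical example: A = Q[x]/(x^2 - 1) with basis (1, x).
MULT = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
TRACE = [1, 0]
UNIT = [1, 0]

if __name__ == "__main__":
    for genus in range(4):
        print(genus, surface_invariant(MULT, TRACE, UNIT, genus))
```

In this example the torus receives the value $\dim A = 2$, in line with the general principle that gluing the two ends of a cylinder computes a trace of the identity map.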

The formalism of TQFTs was to a great extent 
motivated by the three-dimensional case, specifically, 
Witten’s Chern–Simons TQFTs. A mathematical 
definition of these TQFTs was first given 
by Reshetikhin and Turaev using the theory of 
quantum groups. The Witten–Reshetikhin–Turaev 
three-dimensional TQFTs do not satisfy exactly the 
definition above: the naturality and the functoriality 
axioms only hold up to invertible scalar factors 
called framing anomalies. Such TQFTs are said to 
be projective. In order to get rid of the framing 
anomalies, one has to add extra structures on the 
three-dimensional cobordism category. Usually one 
endows surfaces $X$ with Lagrangians (maximal 
isotropic subspaces in $H_1(X;\mathbb{R})$). For 3-cobordisms, 
several competing (but essentially equivalent) 
additional structures are considered in the literature: 
2-framings (Atiyah 1989), $p_1$-structures (Blanchet 
et al. 1995), numerical weights (K Walker, V Turaev). 

Large families of three-dimensional TQFTs are 
obtained from the so-called modular categories. 
The latter are constructed from quantum groups at 
roots of unity or from the skein theory of links. 
See Quantum 3-Manifold Invariants. 


Additional Structures 


The axiomatic definition of a TQFT extends in
various directions. In dimension 2 it is interesting to
consider the so-called open-closed theories involving
1-manifolds formed by circles and intervals and
two-dimensional cobordisms with boundary
(G Moore, G Segal). In dimension 3 one often
considers cobordisms including framed links and
graphs whose components (resp. edges) are labeled
with objects of a certain fixed category C. In such a
theory, surfaces are endowed with finite sets of
points labeled with objects of C and enriched with
tangent directions. In all dimensions one can study
manifolds and cobordisms endowed with homotopy
classes of mappings to a fixed space (homotopy
quantum field theory, in the sense of Turaev).
Additional structures on the tangent bundles (spin
structures, framings, etc.) may also be considered
provided the gluing is well defined.


See also: Braided and Modular Tensor Categories; Hopf
Algebras and q-Deformation Quantum Groups; Indefinite
Metric; Quantum 3-Manifold Invariants; Topological
Gravity, Two-Dimensional; Topological Quantum Field
Theory: Overview.


Introduction


The term “axiomatic quantum field theory” sub- 
sumes a collection of research branches of quantum 
field theory analyzing the general principles of 
relativistic quantum physics. The content of the 
results typically is structural and retrospective rather 
than quantitative and predictive. 

The first axiomatic activities in quantum field theory
date back to the 1950s, when several groups started
investigating the notion of scattering and S-matrix in
detail: Lehmann, Symanzik, and Zimmermann 1955
(LSZ approach); Bogoliubov and Parasiuk 1957, Hepp,
and Zimmermann (BPHZ approach); Haag 1957-59
and Ruelle 1962 (Haag-Ruelle theory) (see Scattering,
Asymptotic Completeness and Bound States and
Scattering in Relativistic Quantum Field Theory:
Fundamental Concepts and Tools).

Wightman (1956) analyzed the properties of the 
vacuum expectation values used in these approaches 
and formulated a system of axioms that the vacuum 
expectation values ought to satisfy in general. Together 
with Gårding (1965), he later formulated a system of
axioms in order to characterize general quantum fields 
in terms of operator-valued functionals, and the two 
systems have been found to be equivalent. 

A couple of spectacular theorems such as the PCT
theorem and the spin-statistics theorem have been
obtained in this setting, but no interacting quantum
fields satisfying the axioms have been found so far
(in 1+3 spacetime dimensions). So the development
of alternatives and modifications of the setting
came into the focus of the theory, and the axioms
themselves became the objects of research. Their
role as axioms, understood in the common sense,
turned into the role of mere properties of quantum
fields. Today, the term “axiomatic quantum field
theory” is widely avoided for this reason.

In a long list of publications spread over the 
1960s, Araki, Borchers, Haag, Kastler, and others 
worked out an algebraic approach to quantum field 
theory in the spirit of Segal’s “Postulates for general
quantum mechanics” (1947) (see Algebraic Approach
to Quantum Field Theory). 

The Wightman setting was the basis of a frame- 
work into which the causal construction of the 
S-matrix developed by Stückelberg (1951) and
Bogoliubov and Shirkov (1959) has been fitted by 
Epstein and Glaser (1973). The causality principle 
fixes the time-ordered products up to a finite 
number of parameters at each order, which are to 
be put in as the renormalization constants. 

Already in 1949, Dyson had seen that problems in 
the formulation of quantum electrodynamics (QED) 
could be avoided by “just” multiplying the time 
variable and, correspondingly, the energy variable by 
the imaginary unit constant (“Wick rotation”). Schwin- 
ger then investigated time-ordered Green functions of 
QED in this Euclidean setting. This approach was 
formulated in terms of axioms by Osterwalder and 
Schrader (1973, 1975) (see Euclidean Field 
Theory). 

Other extensions of the aforementioned settings 
are objects of current research (see Indefinite Metric, 


Quantum Field Theory in Curved Spacetime, 
Symmetries in Quantum Field Theory of Lower 
Spacetime Dimensions, and Thermal Quantum Field 
Theory). 


Quantum Fields 


Gårding and Wightman characterized operator-valued
quantum fields on the Minkowski spacetime
$\mathbb{R}^{1+3}$ by a couple of axioms. Given additional
assumptions concerning the high-energy behavior,
the Gårding-Wightman fields are in one-to-one
correspondence with algebraic field theories.

Without specifying or presupposing these addi- 
tional assumptions, the axioms will now be for- 
mulated and discussed in detail and compared to the 
corresponding conditions in the algebraic setting. 
Adjoint operators are marked by an asterisk, and 
Einstein’s summation convention is used. 


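Operator-valued functionals The components of a
field F are an n-tuple $F_1, \dots, F_n$ of linear maps that
assign to each test function $\varphi \in C_0^\infty(\mathbb{R}^{1+3})$ linear
operators $F_1(\varphi), \dots, F_n(\varphi)$ in a Hilbert space $\mathcal{H}$ with
domains of definition $D(F_1(\varphi)), \dots, D(F_n(\varphi))$. There
exists a dense subspace $\mathcal{D}$ of $\mathcal{H}$ with
$\mathcal{D} \subset D(F_\nu(\varphi)) \cap D(F_\nu(\varphi)^*)$ and
$F_\nu(\varphi)\mathcal{D} \cup F_\nu(\varphi)^*\mathcal{D} \subset \mathcal{D}$
for all indices $\nu$. Consider m such fields $F^1, \dots, F^m$
with components $F^a_\nu$, $1 \le a \le m$, $1 \le \nu \le n_a$. Assume
there to be an involution $*: (1, \dots, m) \to (1, \dots, m)$ such
that $F^{a^*}_\nu(\varphi) = F^a_\nu(\overline{\varphi})^*$, where $\overline{\varphi}(x) := \overline{\varphi(x)}$.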

Quantum fields cannot be operator-valued functions
on $\mathbb{R}^{1+3}$ if one wants them to exhibit (part of)
the properties to follow. But point fields can be
quadratic forms; typically this is the case for fields in
a Fock space.

For each field $F^a$ and each open region
$\mathcal{O} \subset \mathbb{R}^{1+3}$, the field operators $F^a_\nu(\varphi)$ with supp $\varphi \subset \mathcal{O}$
generate a *-algebra $\mathcal{F}^a(\mathcal{O})$ of operators defined on
$\mathcal{D}$. These operators typically are unbounded, which
is one of the differences with the traditional setting
of the algebraic approach. There a C*-algebra $\mathfrak{A}(\mathcal{O})$
is assigned to each open region $\mathcal{O}$ in such a way
that $\mathcal{O} \subset P$ implies $\mathfrak{A}(\mathcal{O}) \subset \mathfrak{A}(P)$. Each C*-algebra
is a *-algebra, but in contrast to a C*-algebra,
a *-algebra does not need to be endowed with a
norm. The fundamental observables in quantum
theory are bounded positive operators (typically, but
not always, projections), and these generate a
C*-algebra.

There is no fundamental physical motivation for 
confining the setting to fields with a finite number of 
components, except that it includes most of the 
fields known from “daily life.”


Continuity as a distribution For all $\Phi, \Psi \in \mathcal{D}$, the
linear functionals $T^a_{\nu,\Phi,\Psi}$ on $C_0^\infty(\mathbb{R}^{1+3})$ defined by

$$T^a_{\nu,\Phi,\Psi}(\varphi) := \langle \Phi, F^a_\nu(\varphi)\, \Psi \rangle$$

are distributions. They can be extended to tempered
distributions.


The Fourier transform of a tempered distribution 
is well defined as a tempered distribution. It is 
mainly due to the importance of Fourier transforma- 
tions that the preceding assumption is convenient. 
Bogoliubov et al. (1975) remark that the assumption 
is not a mere technicality, since it rules out 
nonrenormalizable quantum fields. 


Microcausality (Bose-Fermi alternative) If $\varphi$ and $\psi$
are test functions with spacelike separated support,
then

$$F^a_\nu(\varphi)\, F^b_\mu(\psi)\big|_{\mathcal{D}} = \pm\, F^b_\mu(\psi)\, F^a_\nu(\varphi)\big|_{\mathcal{D}}.$$

The sign depends on the statistics of the fields: it
is “−” if and only if both $F^a$ and $F^b$ are fermion
fields.

Microcausality is closely related to Einstein 
causality. Einstein causality requires that any two 
observables located in spacelike separated regions 
commute in the strong sense, that is, their spectral 
measures commute. But fields with Fermi—Dirac 
statistics are not observables, and not even for Bose- 
Einstein fields with self-adjoint field operators does 
the above condition imply that the spectral projec- 
tions commute, which is the criterion for commen- 
surability. The sign on the right-hand side does, 
however, specify the statistics of the field. 

This is a crucial difference with the algebraic
approach. If $\mathcal{O}$ and P are spacelike separated open
regions and if $A \in \mathfrak{A}(\mathcal{O})$ and $B \in \mathfrak{A}(P)$, then one
assumes, like in the above case, that AB = BA
(locality). But being elements of C*-algebras, A and
B are bounded operators (or can be represented
accordingly), so if A and B are self-adjoint, they are,
indeed, commensurable.

Doplicher, Haag, and Roberts (1974) and Buch- 
holz and Fredenhagen (1984) have derived from this 
input of observables a field structure of localized 
particle states, and they showed that the statistics of 
these fields is Bose-Einstein, Fermi-Dirac, or some
corresponding parastatistics (which is, a priori, 
forbidden if one assumes microcausality). 

Recall that the unimodular group $SL(2,\mathbb{C})$ is
isomorphic to the universal covering group of
the restricted Lorentz group $L_+^\uparrow$ (the connected
component containing the unit element). Denote by
$\Lambda: SL(2,\mathbb{C}) \to L_+^\uparrow$ a covering map.


Covariance There exist strongly continuous unitary
representations U and T of $SL(2,\mathbb{C})$ and
$(\mathbb{R}^{1+3}, +)$, respectively, and representations
$D^1, \dots, D^m$ of $SL(2,\mathbb{C})$ in $\mathbb{C}^{n_1}, \dots, \mathbb{C}^{n_m}$, respectively,
such that

$$U(g)\, F^a_\nu(\varphi)\, U(g)^* = D^a(g^{-1})_\nu{}^\mu\, F^a_\mu(\varphi(\Lambda(g)^{-1}\,\cdot\,))$$

and

$$T(y)\, F^a_\nu(\varphi)\, T(y)^* = F^a_\nu(\varphi(\cdot - y)),$$

where $D^a(g^{-1})_\nu{}^\mu$ are the elements of the matrix
$D^a(g^{-1})$. Dropping coordinate indices, this reads

$$U(g)\, F^a(\varphi)\, U(g)^* = D^a(g^{-1})\, F^a(\varphi(\Lambda(g)^{-1}\,\cdot\,))$$

and

$$T(y)\, F^a(\varphi)\, T(y)^* = F^a(\varphi(\cdot - y)).$$

The representations U and T generate a representation
of the universal covering of the restricted
Poincaré group.


As it stands, this assumption is a very strong one,
since it manifestly fixes the action of the representation
on the field operators. In the algebraic
approach, the covariance assumption is more modestly
formulated. Namely, it is assumed that
$U(g)\,\mathfrak{A}(\mathcal{O})\,U(g)^* = \mathfrak{A}(\Lambda(g)\mathcal{O})$ and
$T(y)\,\mathfrak{A}(\mathcal{O})\,T(y)^* = \mathfrak{A}(\mathcal{O} + y)$, leaving open how the representation acts
on the single local observables.


Vacuum vector There exists a unique (up to a
multiple) vector $\Omega \in \mathcal{D}$ that is invariant under the
representations U and T and cyclic with respect to
the algebra $\mathcal{F}(\mathbb{R}^{1+3})$ generated by all field operators
$F^a_\nu(\varphi)$, that is, $\overline{\mathcal{F}(\mathbb{R}^{1+3})\,\Omega} = \mathcal{H}$.


Spectrum condition The joint spectrum of the
components of the 4-momentum, i.e., of the generators
of the spacetime translations, has support in
the closed forward light cone $\overline{V}_+$, that is, the set
$\{k \in \mathbb{R}^{1+3} : k^2 \ge 0,\ k_0 \ge 0\}$.


The existence of an invariant ground state called 
the vacuum is standard in algebraic quantum field 
theory as well. 
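
The two geometric notions used in the axioms, spacelike separation (microcausality) and membership in the closed forward light cone (spectrum condition), can be spelled out in a few lines. The following is a minimal sketch, assuming the metric signature (+, −, −, −) that the condition k² ≥ 0, k₀ ≥ 0 suggests; the function names are hypothetical and vectors are plain 4-tuples (time component first).

```python
def minkowski_square(v):
    """Lorentz square v^2 = v_0^2 - v_1^2 - v_2^2 - v_3^2 (signature +---)."""
    t, x, y, z = v
    return t * t - x * x - y * y - z * z


def spacelike_separated(p, q):
    """True if the points p and q of R^{1+3} are spacelike separated."""
    diff = [a - b for a, b in zip(p, q)]
    return minkowski_square(diff) < 0


def in_closed_forward_cone(k):
    """True if k^2 >= 0 and k_0 >= 0, as required by the spectrum condition."""
    return minkowski_square(k) >= 0 and k[0] >= 0
```

Two regions are spacelike separated when every pair of points drawn from them passes the first test; the sketch only treats single points.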


N-Point Functions 


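Consider the above fields $F^1, \dots, F^m$. For each $N \in \mathbb{N}$
and each N-tuple $(a_1, \dots, a_N)$ of natural numbers $\le m$
(labeling fields), define families
$(F^{a_1 \cdots a_N}) := (F^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N})$ and
$(w^{a_1 \cdots a_N}) := (w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N})$, with $\nu_i \le n_{a_i}$, of
distributions on $(\mathbb{R}^{1+3})^N$ by

$$F^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi_1 \otimes \cdots \otimes \varphi_N) := F^{a_1}_{\nu_1}(\varphi_1) \cdots F^{a_N}_{\nu_N}(\varphi_N)$$

(using the nuclear theorem) and

$$w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi) := \langle \Omega, F^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi)\, \Omega \rangle.$$ [1]


These distributions are called the “N-point functions”
of the fields $F^1, \dots, F^m$ and yield the vacuum
expectation values of the theory. It is straightforward
to deduce the following properties from the
Gårding-Wightman axioms.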


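Microcausality (Bose-Fermi alternative) If $\varphi_i$ and
$\varphi_{i+1}$ have spacelike separated supports, then

$$w^{a_1 \cdots a_i a_{i+1} \cdots a_N}_{\nu_1 \cdots \nu_i \nu_{i+1} \cdots \nu_N}(\varphi_1 \otimes \cdots \otimes \varphi_i \otimes \varphi_{i+1} \otimes \cdots \otimes \varphi_N) = \pm\, w^{a_1 \cdots a_{i+1} a_i \cdots a_N}_{\nu_1 \cdots \nu_{i+1} \nu_i \cdots \nu_N}(\varphi_1 \otimes \cdots \otimes \varphi_{i+1} \otimes \varphi_i \otimes \cdots \otimes \varphi_N),$$

or dropping coordinate indices,

$$w^{a_1 \cdots a_i a_{i+1} \cdots a_N}(\varphi_1 \otimes \cdots \otimes \varphi_i \otimes \varphi_{i+1} \otimes \cdots \otimes \varphi_N) = \pm\, w^{a_1 \cdots a_{i+1} a_i \cdots a_N}(\varphi_1 \otimes \cdots \otimes \varphi_{i+1} \otimes \varphi_i \otimes \cdots \otimes \varphi_N).$$


Invariance For all $g \in SL(2,\mathbb{C})$ and $y \in \mathbb{R}^{1+3}$, one has

$$w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi_1 \otimes \cdots \otimes \varphi_N) = D^{a_1}(g^{-1})_{\nu_1}{}^{\mu_1} \cdots D^{a_N}(g^{-1})_{\nu_N}{}^{\mu_N}\, w^{a_1 \cdots a_N}_{\mu_1 \cdots \mu_N}(\Lambda(g)\varphi_1 \otimes \cdots \otimes \Lambda(g)\varphi_N) = w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi_1(\cdot - y) \otimes \cdots \otimes \varphi_N(\cdot - y)),$$

or dropping coordinate indices,

$$w^{a_1 \cdots a_N}(\varphi_1 \otimes \cdots \otimes \varphi_N) = \bigl(D^{a_1}(g^{-1}) \otimes \cdots \otimes D^{a_N}(g^{-1})\bigr)\, w^{a_1 \cdots a_N}(\Lambda(g)\varphi_1 \otimes \cdots \otimes \Lambda(g)\varphi_N) = w^{a_1 \cdots a_N}(\varphi_1(\cdot - y) \otimes \cdots \otimes \varphi_N(\cdot - y)),$$

where $\Lambda(g)\varphi := \varphi(\Lambda(g)^{-1}\,\cdot\,)$.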


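By translation invariance, the N-point functions
$w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(x_1, \dots, x_N)$ only depend on the N − 1
relative-position vectors $\xi_1 := x_1 - x_2$, $\xi_2 := x_2 - x_3$, ...,
$\xi_{N-1} := x_{N-1} - x_N$. This means that there are distributions
$W^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}$ on $(\mathbb{R}^{1+3})^{N-1}$ related to the N-point
functions by the symbolic condition

$$w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(x_1, \dots, x_N) = W^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\xi_1, \dots, \xi_{N-1}).$$

In precise notation, this reads

$$w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi) = W^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi_\Sigma),$$

where

$$\varphi_\Sigma(\xi_1, \dots, \xi_{N-1}) := \int \varphi(x,\ x - \xi_1,\ x - \xi_1 - \xi_2,\ \dots,\ x - \xi_1 - \cdots - \xi_{N-1})\, dx.$$

The functions $W^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}$ are called the Wightman
functions, and they have the following property
because of the spectrum condition of the field.


Spectrum condition The support of the Fourier
transform of each $W^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}$ is contained in $(\overline{V}_+)^{N-1}$.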


The uniqueness of the vacuum vector (up to a 
phase) is equivalent to the following condition. 


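Cluster property For $N \ge 2$, let x be a spacelike
vector in $\mathbb{R}^{1+3}$, let L be a natural number < N, and
let $\varphi$ and $\psi$ be tempered test functions on $(\mathbb{R}^{1+3})^L$
and $(\mathbb{R}^{1+3})^{N-L}$, respectively. Then

$$\lim_{0 < \lambda \to \infty} w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi \otimes \psi(\cdot - \lambda x)) = w^{a_1 \cdots a_L}_{\nu_1 \cdots \nu_L}(\varphi)\; w^{a_{L+1} \cdots a_N}_{\nu_{L+1} \cdots \nu_N}(\psi).$$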





On the one hand, these properties have been
deduced from the Gårding-Wightman axioms via
eqn [1]. Conversely, a family of distributions
labeled in the above fashion and satisfying the
above properties may be used to construct a
Gårding-Wightman field theory provided that two
more conditions, which hold for all systems of
N-point functions, are satisfied. This requires
some elementary notation.

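Define the index sets

$$I_N := \left\{ \begin{pmatrix} a_1 \cdots a_N \\ \nu_1 \cdots \nu_N \end{pmatrix} : 1 \le a_i \le m,\ 1 \le \nu_i \le n_{a_i} \text{ for all } i \le N \right\}, \qquad N \in \mathbb{N},$$

$I_0 := \{\emptyset\}$, and $I := \bigcup_{N \in \mathbb{N}_0} I_N$. On I a concatenation
$\circ$ is defined by

$$\begin{pmatrix} a_1 \cdots a_N \\ \nu_1 \cdots \nu_N \end{pmatrix} \circ \begin{pmatrix} a_{N+1} \cdots a_M \\ \nu_{N+1} \cdots \nu_M \end{pmatrix} := \begin{pmatrix} a_1 \cdots a_N\, a_{N+1} \cdots a_M \\ \nu_1 \cdots \nu_N\, \nu_{N+1} \cdots \nu_M \end{pmatrix}$$

and

$$\emptyset \circ \kappa := \kappa \circ \emptyset := \kappa,$$

and an involution * by

$$\begin{pmatrix} a_1 \cdots a_N \\ \nu_1 \cdots \nu_N \end{pmatrix}^* := \begin{pmatrix} a_N^* \cdots a_1^* \\ \nu_N \cdots \nu_1 \end{pmatrix} \quad \text{and} \quad \emptyset^* := \emptyset.$$

Define an antilinear involution * on $\mathcal{S}^N := \mathcal{S}((\mathbb{R}^{1+3})^N)$ by

$$\varphi^*(x_1, \dots, x_N) := \overline{\varphi(x_N, \dots, x_1)}$$

for each $N \in \mathbb{N}$. Put $\mathcal{S}^0 := \mathbb{C}$ and $z^* := \overline{z}$ for all
$z \in \mathbb{C}$.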


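Define $\mathcal{S}_N := \mathcal{S}^N \times I_N$ and $\mathcal{S} := \bigcup_N \mathcal{S}_N$. For
each $\kappa \in I_N$, the set $\mathcal{S}^\kappa := \mathcal{S}((\mathbb{R}^{1+3})^N) \times \{\kappa\}$ is a
linear space. On the direct sum $\mathcal{B} := \bigoplus_{\kappa \in I} \mathcal{S}^\kappa$
define an associative product by

$$(\varphi, \kappa)(\psi, \lambda) := (\varphi \otimes \psi,\ \kappa \circ \lambda)$$

and an antilinear involution * by $(\varphi, \kappa)^* := (\varphi^*, \kappa^*)$.
This endows $\mathcal{B}$ with the structure of a nonabelian
*-algebra with unit element $1 = (1, \emptyset)$ (Borchers
algebra).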
If one defines $F_\emptyset(z) := z1$, then $w_\emptyset(z) = z$, and the
Wightman functions induce a $\mathbb{C}$-linear functional w
on $\mathcal{B}$ by

$$w(\varphi, \kappa) := w_\kappa(\varphi).$$ [2]


w exhibits the following two properties, which are
the announced additional conditions required for
reconstructing the fields from the N-point functions.

Hermiticity $w(\xi^*) = \overline{w(\xi)}$.

Positivity $w(\xi^* \xi) \ge 0$.


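To see Hermiticity, compute

$$w((\varphi, \kappa)^*) = \langle \Omega, F_\kappa(\varphi)^*\, \Omega \rangle = \langle F_\kappa(\varphi)\, \Omega, \Omega \rangle = \overline{w(\varphi, \kappa)}$$

and use $\mathbb{C}$-linearity to prove the statement for
arbitrary $\xi \in \mathcal{B}$. For positivity, write any $\xi$ as a finite
sum $\xi = (\varphi_1, \kappa_1) + \cdots + (\varphi_M, \kappa_M)$, and compute

$$\begin{aligned}
w(\xi^* \xi) &= w\Bigl(\sum_{i,j=1}^{M} (\varphi_i, \kappa_i)^* (\varphi_j, \kappa_j)\Bigr) \\
&= \sum_{i,j=1}^{M} w(\varphi_i^* \otimes \varphi_j,\ \kappa_i^* \circ \kappa_j) \\
&= \sum_{i,j=1}^{M} w_{\kappa_i^* \circ \kappa_j}(\varphi_i^* \otimes \varphi_j) \\
&= \sum_{i,j=1}^{M} \langle \Omega, F_{\kappa_i}(\varphi_i)^* F_{\kappa_j}(\varphi_j)\, \Omega \rangle \\
&= \sum_{i,j=1}^{M} \langle F_{\kappa_i}(\varphi_i)\, \Omega, F_{\kappa_j}(\varphi_j)\, \Omega \rangle \\
&= \Bigl\| \sum_{i=1}^{M} F_{\kappa_i}(\varphi_i)\, \Omega \Bigr\|^2 \ge 0.
\end{aligned}$$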








Theorem 1 (Wightman’s reconstruction theorem).
Let m and $n_1, \dots, n_m$ be natural numbers, let
$I_0, I_1, I_2, \dots$, and I be the above index sets, and
let $\mathcal{B}$ be the above Borchers algebra. Let $D^1, \dots, D^m$
be matrix representations of $SL(2,\mathbb{C})$ in $\mathbb{C}^{n_1}, \dots, \mathbb{C}^{n_m}$,
respectively.

For each natural number N, let $(w_\kappa)_{\kappa \in I_N}$ be a
family of distributions on $(\mathbb{R}^{1+3})^N$. Suppose the
family $(w_\kappa)_{\kappa \in I}$ defined this way satisfies microcausality,
covariance, spectrum condition, and the
cluster property. If the linear functional w defined
on $\mathcal{B}$ by eqn [2] is Hermitian and positive, then
there is (up to unitary equivalence) a unique family
$F^1, \dots, F^m$ of Gårding-Wightman fields with $n_1, \dots, n_m$
components such that eqn [1] holds.


The proof uses the GNS construction known from 
the theory of operator algebras. The Borchers 
algebra plays several roles. On the one hand, it is a 
linear space with an inner product. The Hilbert 
space H and the invariant space D of the field theory 
are constructed from this structure. On the other 
hand, the Borchers algebra acts on itself as an 
algebra of linear operators by its own algebra 
multiplication. This is the structure the *-algebra of
field operators is constructed from. 
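
In outline, the GNS step can be written as follows. This is a sketch of the standard construction under the hypotheses of the theorem, with the Borchers algebra written $\mathcal{B}$ and with the symbols $\mathcal{N}$, $[\xi]$ and $\pi$ introduced here only for illustration; it is not a transcription of the proof in the literature. Positivity of w yields the Cauchy-Schwarz inequality, which makes $\mathcal{N}$ a left ideal and the operators $\pi(\xi)$ well defined on the quotient.

```latex
\begin{align*}
\langle [\xi], [\eta] \rangle &:= w(\xi^* \eta), \qquad \xi, \eta \in \mathcal{B}, \\
\mathcal{N} &:= \{ \xi \in \mathcal{B} : w(\xi^* \xi) = 0 \}, \\
\mathcal{D} &:= \mathcal{B} / \mathcal{N}, \qquad
\mathcal{H} := \overline{\mathcal{D}}, \qquad
\Omega := [1], \\
\pi(\xi)[\eta] &:= [\xi \eta], \qquad
F_\kappa(\varphi) := \pi\bigl((\varphi, \kappa)\bigr), \\
\langle \Omega, F_\kappa(\varphi)\, \Omega \rangle
  &= w\bigl(1^* (\varphi, \kappa)\, 1\bigr) = w(\varphi, \kappa) = w_\kappa(\varphi).
\end{align*}
```

The last line shows that the vacuum expectation values of the reconstructed operators reproduce the distributions one started from.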


Results 


The mathematical and structural analysis of quan- 
tum fields has improved the understanding of 
scattering theory in the different approaches men- 
tioned above; see Bogoliubov et al. (1975) and the 
relevant articles in this encyclopedia. Apart from 
this, the following results deserve to be mentioned. 
Evidently, many others have to be omitted for 
practical reasons. 


PCT Symmetry 


An early famous result was Lüders’s proof (1957)
that all fields in the above setting exhibit PCT 
symmetry, that is, the symmetry under reflections in 
all space and time variables combined with a charge 
conjugation. This symmetry is exhibited by all 
particle reactions observed so far. The proof, like 
several of the main results, made extensive use of the 
fact that the N-point functions are boundary values 
of analytic functions due to the spectrum condition, 
and that a fundamental theorem by Bargmann, Hall, 
and Wightman (1957) yields invariant analytic 
extensions. 


Reeh-Schlieder Theorem 


For each field $F^a$ and each bounded open region
$\mathcal{O} \subset \mathbb{R}^{1+3}$, the vacuum vector is cyclic with respect
to $\mathcal{F}^a(\mathcal{O})$ (Reeh and Schlieder 1961). So excitations
of the vacuum vector by field operators located in O 
are not to be considered as state vectors of a particle 
localized in O, since they are not perpendicular to 
the excitations by field operators located outside O. 


Unruh Effect and Modular $P_1$CT Symmetry


In the 1970s, Bisognano and Wichmann (1975, 1976)
discovered a surprising link of symmetries to the
intrinsic algebraic structure of quantum fields, which is
established by the Tomita-Takesaki modular theory
(see Tomita-Takesaki Modular Theory). Namely, the
unitary operators implementing the Lorentz boosts on
the fields are elements of modular groups. This means
that a uniformly accelerated observer perceives the
vacuum as a thermal state with a temperature
proportional to its acceleration, corresponding to the
famous Unruh effect.

In addition, it was shown that $P_1$CT symmetries
(i.e., PCT combined with rotations by the angle $\pi$) are
implemented by modular conjugations (modular $P_1$CT
symmetry). Modular $P_1$CT symmetry is a consequence
of the Unruh effect (Guido and Longo 1995).


Spin and Statistics 


Immediately following Lüders’s PCT theorem, the
spin-statistics theorem was proved for the N-point
functions of the Wightman setting (Lüders and
Zumino 1958, Burgoyne 1958, Dell’Antonio 1961).
This was a remarkable and widely acknowledged
advance. But as remarked earlier, the confinement to
finite-component fields, which is used in the proof,
cannot be motivated by physical first principles (i.e., in
a truly axiomatic fashion). The representation D of
SL(2, C) acting on the components, however, is forced
to be finite dimensional by this assumption, and since
the representations $D^a$ are objects of investigation, a
considerable part of the result is assumed this way
from the outset. Moreover, there are examples of
fields with a “wrong” spin-statistics connection and
infinitely many components.

This was one reason to continue working on the
subject. At the beginning of the 1990s, it was found
that the spin-statistics theorem can be derived from
the symmetries discovered by Bisognano and Wichmann,
and by Unruh. Two approaches not referring to
the number of internal degrees of freedom have been
worked out: one assumes the Unruh effect (Guido
and Longo 1995), the other modular $P_1$CT symmetry
(Kuckert 1995, 2005, Kuckert and Lorenzen
2005). The first approach has been generalized to
conformal fields, the second to the case that the
symmetry group’s homogeneous part is not SL(2, C),
but only SU(2).

Both approaches can be applied to infinite- 
component fields. They yield existence theorems; a 
distinguished representation is constructed from the 
modular symmetries, and this representation exhib- 
its Pauli’s spin-statistics connection. As mentioned 
before, nothing more can be expected at this level of 
generality. The line of argument works in both the 
algebraic and the Wightman setting. 


A Dynamical Property of the Vacuum 


One can derive the spectrum condition, the
Bisognano-Wichmann symmetries/the Unruh effect, and


covariance from the condition that no (inertial or) 
uniformly accelerated observer can extract mechan- 
ical energy from the field in vacuo by means of a 
cyclic process (Kuckert 2002). 


Interacting Fields 


The examples of interacting quantum fields that fit 
into the above settings live in one or two spatial 
dimensions only, and their relevance for physics 
mainly consists in being such examples. This 
has contributed to some frustration and to doubts 
on whether one is not, in fact, proving theorems on 
pretty empty sets, or in other words, working on 
“the most sophisticated theory of the free field.” 

The computations in quantum field theory are, like
most of the computations in physics, perturbative. In
order to be successful, they need to yield good
agreement with experiment with reasonable computational
effort, that is, by evaluation up to the second
or third order. This asymptotic convergence is more
important than convergence of the series as a whole.
There are low-dimensional examples of interacting
Wightman fields (e.g., $(\varphi^4)_2$; cf. the monograph by
Glimm and Jaffe (1987)), and time will tell whether
four-dimensional interacting Wightman fields exist.
But there is no reason to expect convergence for
general interacting fields; for example, QED does not
fit into the Wightman framework.

The appropriate extension of the Wightman 
setting has been formulated by Epstein and Glaser 
(1973). It defines the S-matrix rather than the field 
itself as a (in general divergent) formal power series 
of operator-valued distributions. 

The above results apply to this somewhat more 
modest setting as well, so the “axiomatic” 
approaches do help in understanding the known 
high-energy physics interactions. This even includes 
gauge theories (see Perturbative Renormalization 
Theory and BRST). The high-precision results of 
QED can be reproduced within this setting, and 
there occur no UV singularities: renormalization 
amounts to the need to extend distributions by 
fixing some parameters, that is, the renormalization 
constants. The infrared problem is circumvented by 
considering the S-matrix as a (position-dependent) 
distribution taking values in the unitary formal 
power series of distributions rather than as a single 
(global) unitary operator (or unitary power series). 


Quantum Energy Inequalities 


Energy densities of Wightman fields admit negative 
expectation values (Epstein, Glaser, and Jaffe 1965). 
This is in contrast to the positivity conditions that 
the energy-momentum tensors of classical general
(and, hence, also special) relativity have to satisfy to
ensure causality. But the conflict can be solved by 
smearing the densities out in space or time, as has 
first been realized by Ford (1991). The extent to 
which the energy density can become negative 
depends on the extent to which it is smeared out: 
“more smearing means less violation of positivity,” 
so the classical positivity conditions are restored at 
medium and large scales. There are many ways to 
make this principle concrete. Quantum energy 
inequalities hold for thermodynamically well- 
behaved quantum fields on causally well-behaved 
classical spacetime backgrounds.
2005), see also the references given there for related
work. 

In different formulations and at differing degrees of 
mathematical sophistication, the causal approach to 
perturbation theory can be found in the monographs 
by Bogoliubov and Shirkov (1959), Scharf (1989, 
2001), and Steinmann (2000). Two modern review 
articles have been written by Brunetti and Fredenhagen 
(2000) and by Dütsch and Fredenhagen (2004).

The original reference articles on the Euclidean
axioms are those of Osterwalder and Schrader (1973,
1975). Note that the first one contains an error (cf.
also Zinoviev (1995)). A monograph on Euclidean
field theory and its relations to the other axiomatic 
settings of quantum field theory and to statistical 
mechanics is that by Glimm and Jaffe (1987). 

A recent review on quantum energy inequalities is 
due to Fewster (2003).


Introduction


Bäcklund transformations appeared for the first time
in the work of the geometers of the end of the
nineteenth century, for instance, Bianchi, Lie,
Bäcklund, and Darboux, when studying surfaces
of constant curvature. If on a surface in
three-dimensional Euclidean space, the asymptotic
directions are taken as coordinate directions, then the
surface metric may be written as

$$ds^2 = dx^2 + 2\cos(w)\,dx\,dy + dy^2$$ [1]

where w(x, y) is a function of the surface coordinates
x, y. A necessary and sufficient condition for
the surface to be of constant curvature is that w
satisfies the nonlinear partial differential equation

$$w_{xy} = \sin(w)$$ [2]

where the subscript denotes partial derivative.
Equation [2] is nowadays called the sine-Gordon
(sG) equation. Bianchi (1879), Lie (1888, 1890,
1893), and Bäcklund (1874) introduced a transformation
which allows one to pass from a solution of
eqn [2] to a new solution, that is, from a surface of
constant curvature to a new one. Starting from the
work of Clairin (1903), this transformation has been
referred to as Bäcklund transformation (BT). The


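BT for eqn [2] reads

$$\tilde{w}_x = w_x + 2a \sin\left(\frac{\tilde{w} + w}{2}\right)$$ [3a]

$$\tilde{w}_y = -w_y + \frac{2}{a} \sin\left(\frac{\tilde{w} - w}{2}\right)$$ [3b]

where a is a nonzero constant parameter and $\tilde{w}$ is a
different solution of eqn [2]. It is straightforward to prove,
by appropriate differentiation of eqns [3] with
respect to y and x, that both w and $\tilde{w}$ must satisfy
eqn [2]. The BT [3] provides a denumerable set of
exact solutions once a solution w is known. Bianchi
showed that four such solutions can be related in an
algebraic way:

$$\tan\left(\frac{\tilde{w}_{12} - w}{4}\right) = \frac{a_1 + a_2}{a_1 - a_2} \tan\left(\frac{\tilde{w}_1 - \tilde{w}_2}{4}\right)$$ [4]

Equation [4] is derived using the permutability
theorem proved by Bianchi in his Ph.D. thesis in
1872:

$$\begin{array}{ccc}
w & \xrightarrow{\;a_1\;} & \tilde{w}_1 \\
{\scriptstyle a_2}\downarrow & & \downarrow{\scriptstyle a_2} \\
\tilde{w}_2 & \xrightarrow{\;a_1\;} & \tilde{w}_{12}
\end{array}$$

where by the diagram $w \xrightarrow{\;a\;} w'$
we mean a BT from w to w′ with parameter a.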
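
The algebraic relation between four solutions lends itself to a numerical check. The sketch below is an illustration under stated assumptions: the relation is taken in the form tan((w₁₂ − w)/4) = ((a₁ + a₂)/(a₁ − a₂)) tan((w₁ − w₂)/4), the seed is the trivial solution w = 0, and the two intermediate solutions are the one-soliton profiles tan(w/4) = exp(ax + y/a) that the BT produces from that seed. All function names are hypothetical, and derivatives are approximated by central finite differences.

```python
import math


def kink(x, y, a):
    """One-soliton obtained from the seed w = 0: tan(w/4) = exp(a x + y / a)."""
    return 4.0 * math.atan(math.exp(a * x + y / a))


def superpose(w0, w1, w2, a1, a2):
    """Fourth solution from the algebraic relation between four solutions."""
    ratio = (a1 + a2) / (a1 - a2)
    return w0 + 4.0 * math.atan(ratio * math.tan((w1 - w2) / 4.0))


def two_soliton(x, y, a1, a2):
    """Apply the superposition formula to two kinks built on the seed w = 0."""
    return superpose(0.0, kink(x, y, a1), kink(x, y, a2), a1, a2)


def sine_gordon_residual(f, x, y, h=1e-4):
    """Central-difference estimate of f_xy - sin(f) at the point (x, y)."""
    mixed = (f(x + h, y + h) - f(x + h, y - h)
             - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)
    return mixed - math.sin(f(x, y))


def max_residual(a1, a2, points):
    """Largest sine-Gordon residual of the superposed solution on the points."""
    f = lambda s, t: two_soliton(s, t, a1, a2)
    return max(abs(sine_gordon_residual(f, px, py)) for px, py in points)
```

No differential equation has to be integrated to obtain the fourth solution, which is the practical content of the permutability theorem; the parameters must satisfy a₁ ≠ a₂.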
For the sG equation [2] a trivial solution is given, for
example, by $w(x, y) = \pi$. Then, from eqn [3a] we get

$$\tilde{w}(x, y) = 2 \arcsin\left[\frac{1 - e^{-2(ax + \alpha y)}}{1 + e^{-2(ax + \alpha y)}}\right]$$ [5]

Introducing this result in eqn [3b], we get $\alpha = -1/a$.
So, the application of the BT [3] to the sG equation gives
the nontrivial solution

$$\tilde{w} = 4 \arctan\left[\frac{1 - e^{-[ax - y/a]}}{1 + e^{-[ax - y/a]}}\right]$$ [6]
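
A direct numerical check of this worked example is straightforward. The sketch below assumes the BT in the form w̃ₓ = wₓ + 2a sin((w̃ + w)/2), w̃ᵧ = −wᵧ + (2/a) sin((w̃ − w)/2), the constant seed w = π, and the candidate solution 4 arctan(tanh(θ/2)) with θ = ax − y/a, which is the same function as 4 arctan[(1 − e^{−θ})/(1 + e^{−θ})]. The function names are hypothetical and derivatives are central finite differences.

```python
import math


def kink_from_pi(x, y, a):
    """Candidate new solution 4*arctan(tanh(theta/2)), theta = a*x - y/a."""
    theta = a * x - y / a
    return 4.0 * math.atan(math.tanh(theta / 2.0))


def bt_residuals(x, y, a, h=1e-4):
    """Residuals of the two BT equations and of w_xy = sin(w) at (x, y).

    The seed is the constant solution w = pi, so its derivatives vanish.
    Derivatives of the new solution are central finite differences with step h.
    """
    seed = math.pi
    f = lambda s, t: kink_from_pi(s, t, a)
    value = f(x, y)
    f_x = (f(x + h, y) - f(x - h, y)) / (2.0 * h)
    f_y = (f(x, y + h) - f(x, y - h)) / (2.0 * h)
    f_xy = (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)
    res_x = f_x - 2.0 * a * math.sin((value + seed) / 2.0)
    res_y = f_y - (2.0 / a) * math.sin((value - seed) / 2.0)
    res_sg = f_xy - math.sin(value)
    return res_x, res_y, res_sg
```

All three residuals vanish up to discretization error, confirming that the two first-order equations are compatible with the sine-Gordon equation for this pair of solutions.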


Clairin (1903) extended the results of Bäcklund to
the case of a generic partial differential equation of
second order,

$$F(x, y, w, w_x, w_y, w_{xx}, w_{xy}, w_{yy}) = 0$$ [7]

by assuming that
If the compatibility of eqns [8]

$$f_y - g_x = 0$$ [9]


is identically satisfied by eqn [7] for the variable
w(x,y), then we say that eqns [8] are an
auto-Bäcklund transformation for eqn [7]. In this
case, eqns [8] transform a solution of eqn [7] into a
new solution of the same equation. Thus, eqns [8]
simplify the problem of finding solutions of eqn [7].
Given one solution w(x, y) of eqn [7], the existence
of a BT reduces the problem of integrating eqn [7]
to that of solving two first-order ordinary
differential equations. From this point of view, the
Cauchy-Riemann relations

$$\tilde{w}_x = w_y, \qquad \tilde{w}_y = -w_x$$ [10]

for the Laplace equation

$$w_{xx} + w_{yy} = 0$$ [11]

are a BT ante litteram (however, without a free
parameter).
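
As a quick check of this reading of the Cauchy-Riemann relations, one may take a specific harmonic function. The choice $w = x^2 - y^2$ below, and the sign convention $\tilde{w}_x = w_y$, $\tilde{w}_y = -w_x$, are assumptions made purely for illustration.

```latex
\begin{align*}
w &= x^2 - y^2, & w_{xx} + w_{yy} &= 2 - 2 = 0, \\
\tilde{w}_x &= w_y = -2y, & \tilde{w}_y &= -w_x = -2x, \\
\tilde{w} &= -2xy + \text{const}, & \tilde{w}_{xx} + \tilde{w}_{yy} &= 0 .
\end{align*}
```

The two first-order relations are compatible precisely because w is harmonic ($\tilde{w}_{xy} = w_{yy}$ and $\tilde{w}_{yx} = -w_{xx}$), and the function they produce is again harmonic.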

Consider the case when $\tilde{w}(x, y)$ satisfies a different
partial differential equation,

$$G(x, y, \tilde{w}, \tilde{w}_x, \tilde{w}_y, \tilde{w}_{xx}, \tilde{w}_{xy}, \tilde{w}_{yy}) = 0$$ [12]

In this case, one still has a BT, but not an auto-BT.
The best-known cases are when $F_1 = w_y + w_{xxx} + w w_x$
and $G_1 = \tilde{w}_y + \tilde{w}_{xxx} + \tilde{w}^2 \tilde{w}_x$, and
$F_2 = w_{xy} - e^{w}$ and $G_2 = \tilde{w}_{xy}$ (Lamb 1976). In the first case, the
BT relates the Korteweg-de Vries (KdV) equation to
the modified KdV equation and this transformation
paved the way to the discovery of the complete
integrability of the KdV equation by Gardner et al.
(1967). In the second case, the BT relates the
Liouville equation to the wave equation, and can
be used to solve it completely. Due to the first
example, a non-auto-BT is often called a Miura
transformation.
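
The mechanism behind the first example can be displayed in one identity. The sketch below uses a commonly adopted normalization, which is an assumption here and differs from $F_1$ and $G_1$ above by rescalings: KdV in the form $u_t + u_{xxx} - 6uu_x = 0$, modified KdV in the form $v_t + v_{xxx} - 6v^2 v_x = 0$, and the Miura map $u = v^2 + v_x$. The symbols u, v and t are introduced only for this illustration.

```latex
\begin{align*}
u &= v^2 + v_x, \\
u_t + u_{xxx} - 6 u u_x
  &= (2v + \partial_x)\,\bigl(v_t + v_{xxx} - 6 v^2 v_x\bigr).
\end{align*}
```

Hence every solution v of the modified KdV equation is mapped to a solution u of the KdV equation; the converse does not follow from this identity alone, since the operator $2v + \partial_x$ has a nontrivial kernel.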

One can now state an operative definition of BT,
extending the results of Bäcklund and Clairin to
more general equations.


Definition 1 Consider two partial differential
equations of order $m_1$ and $m_2$:

$$F_1(x, u, u_{(1)}, u_{(2)}, \dots, u_{(m_1)}) = 0$$ [13a]

$$F_2(x, \tilde{u}, \tilde{u}_{(1)}, \tilde{u}_{(2)}, \dots, \tilde{u}_{(m_2)}) = 0$$ [13b]

where $x \in \mathbb{R}^n$ and $(u, \tilde{u}) \in \mathbb{C}^2$, and $u_{(k)}$ is the set of
k-order derivatives of u. The set of n equations

$$G_j(x, u, u_{(1)}, \dots, u_{(s_1)}, \tilde{u}, \tilde{u}_{(1)}, \dots, \tilde{u}_{(s_2)}) = 0, \qquad j = 1, 2, \dots, n$$ [14]

with $s_1 < m_1$ and $s_2 < m_2$ represents the BT of
eqns [13] iff the compatibility of eqns [14] is
identically satisfied on the solutions of eqns [13]
and $G_j$ depends on a set of essential arbitrary
constant parameters.


The Clairin formulation [8] and the classical BT
for the sG [3] are clearly special subcases of this
definition. When a solution of $F_1 = 0$ is known, a
solution of $F_2 = 0$ is obtained by solving a set of
lower-order partial differential equations. By a
proper choice of the BT parameters, once a new 
solution is obtained by solving the BT [14], one can 
use the obtained solution as a starting point to 
construct another one, and so on. In this way, one 
can construct a whole ladder of solutions, a priori a 
denumerable set of solutions. This same construc- 
tion has been applied also to the case of functional 
equations. In particular, it has been considered for 
the case of differential-difference and difference- 
difference equations both for finite (dynamical 
systems (Wojciechowski 1982)) and infinite lattices 
(Toda 1989). 

In the case when $F_1$ and $F_2$ represent the same
equation, $s_1 = s_2 = 1$ and the BTs $G_j = 0$ are linear in
u, then Definition 1 is closely related to the notion
of nonclassical symmetry or conditional symmetry
(Levi and Winternitz 1989, Olver 1993), an extension
of the concept of Lie symmetry used to reduce
and integrate a differential equation. In the case of
the nonclassical symmetries, the known solution $\tilde{u}$ is
included in the arbitrary x-dependent coefficients of
the transformation. In this case, the BT is just a way
to construct an explicit solution of the differential
equation [7].

Definition 1 is often too general to be able to get 
explicit results. It is constructive for any partial 
differential equation, linear or nonlinear, but if one 
is not able to get a nontrivial BT this does not 
mean that a BT does not exist. As noted later, the 
existence of an auto-BT is associated to the 
existence of an infinity of symmetries, and this is 
a condition for the exact integrability of eqn [13] 
(Fokas 1980, Ibragimov and Shabat 1980). So, the 
existence of a BT is closely related to the integr- 
ability of eqn [13]. 


Bäcklund via Integrability 


One can derive the BT from the integrability 
properties of eqn [13a]. Equation [13a] is said to 
be integrable if it can be written as the compatibility 
condition of an overdetermined system of linear 
partial differential equations for an auxiliary func- 
tion depending on a free parameter belonging to the 


complex C plane. The prototype of such a situation 
is given by the Lax pair for the KdV equation 


$u_t + u_{xxx} - 6uu_x = 0$  [15]

introduced by Lax (1968): 

$L\psi = k^2\psi, \quad L = -\partial_{xx} + u(x,t)$  [16a]

$\psi_t = -M\psi, \quad M = 4\partial_{xxx} - 3(u\partial_x + \partial_x u)$  [16b]

where k is a free parameter and $\psi = \psi(x,t;k)$. As eqn 
[16a] is nothing else but the stationary Schrödinger 
equation, the function $\psi$ can be interpreted as a 
wave function, and $k^2$ is the spectral parameter 
corresponding to the potential u(x,t). The condition 
for the existence of a solution $\psi$ of the overdetermined 
system of eqns [16] is given by the 
operator equation 

$L_t = [L, M]$  [17]


the so-called Lax equation. In the case of 
asymptotically bounded potentials, eqn [16a] 
defines the spectrum uniquely. Introducing the 
following asymptotic boundary conditions for the 
wave function $\psi$, 

$\psi(x,t;k) \to T(k,t)\,e^{-ikx}$ as $x \to -\infty$
$\psi(x,t;k) \to e^{-ikx} + R(k,t)\,e^{ikx}$ as $x \to +\infty$  [18]


where R(k,t) and T(k,t) are, respectively, the 
reflection and the transmission coefficient, the 
spectrum is defined in the complex plane of 
the variable k by 


$S[u] = \{R(k,t),\ -\infty < k < \infty;\ p_n,\ c_n(t),\ n = 1, 2, \ldots, N\}$  [19]

where $p_n$ are the bound state parameters corresponding 
to isolated singularities of the reflection 
coefficient on the imaginary positive k-axis, each corresponding 
to a solution $\phi_n(x,t;p_n)$ of the spectral 
problem vanishing for $x \to -\infty$ and such that 

$\lim_{x\to+\infty}\left[e^{p_n x}\,\phi_n(x,t;p_n)\right] = 1$  [20]

and $c_n$ are some functions of t related to the residues 
of R(k,t) at the poles $p_n$. There is a one-to-one 
correspondence between the evolution of the potential 
u(x,t) in eqn [15] and that of the spectrum S[u] 
of the Schrödinger spectral problem [16a]. In particular, 
for the KdV, taking into account eqn [16b], 
the evolution of the reflection coefficient R(k,t) is 
given by 

$\dfrac{dR(k,t)}{dt} = 8ik^3 R(k,t)$  [21]

In eqn [21] and henceforth, d/dt denotes the total 
derivative with respect to t. 
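
The Lax equation [17] can be checked by expanding the commutator term by term. The following is a sketch, assuming $L = -\partial_x^2 + u$ and $M = 4\partial_x^3 - 3(u\partial_x + \partial_x u) = 4\partial_x^3 - 6u\partial_x - 3u_x$ acting on a smooth test function; the grouping into three partial commutators is our own choice of presentation.

```latex
\begin{align*}
[-\partial_x^2,\; -6u\partial_x - 3u_x] &= 12u_x\partial_x^2 + 12u_{xx}\partial_x + 3u_{xxx},\\
[u,\; 4\partial_x^3] &= -12u_x\partial_x^2 - 12u_{xx}\partial_x - 4u_{xxx},\\
[u,\; -6u\partial_x - 3u_x] &= 6uu_x,\\
[L, M] &= 6uu_x - u_{xxx}.
\end{align*}
```

All differential-operator parts cancel, and since $L_t = u_t$ the operator equation $L_t = [L, M]$ reduces to $u_t + u_{xxx} - 6uu_x = 0$, which is eqn [15].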

In the following, for simplicity and concreteness, 
all the results on the BT are derived for the 
KdV equation. Similar results have been obtained in 
the literature for many classes of integrable 
partial differential equations in two and three 
dimensions and for differential-difference and 
difference-difference equations. For a partial 
review of the literature on the subject, see 
Rogers and Shadwick (1982) and Coley et al. (2001). 

A more general way of introducing the nonlinear 
partial differential equation as the compatibility 
condition of an overdetermined system of linear 
equations has been provided by Zakharov and 
Shabat (1979) with the dressing method (DM). In 
the DM, the differential equations [16] are 
replaced by a matrix system of linear equations 

$\Psi_x = U(u(x,t),k)\,\Psi$  [22a]
$\Psi_t = V(u(x,t),k)\,\Psi$  [22b]

where $\Psi = \Psi(x,t;k)$ and U and V are matrix 
functions. The existence of a nonsingular solution 
of the system of linear equations [22] requires 
that the matrix functions U and V satisfy the 
equation 

$U_t - V_x + [U, V] = 0$  [23]

often called the zero-curvature condition. The KdV 
equation [15] in the DM is obtained by choosing 

$U(u(x,t),k) = \begin{pmatrix} ik & u \\ 1 & -ik \end{pmatrix}$

$V(u(x,t),k) = \begin{pmatrix} u_x + 2iku + 4ik^3 & 2u(u + 2k^2) - 2iku_x - u_{xx} \\ 2u + 4k^2 & -u_x - 2iku - 4ik^3 \end{pmatrix}$  [24]
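
The zero-curvature condition [23] can be verified entry by entry. The following sketch assumes the traceless parametrization $U = \begin{pmatrix} ik & u \\ 1 & -ik \end{pmatrix}$, $V = \begin{pmatrix} a & b \\ c & -a \end{pmatrix}$ with $a = u_x + 2iku + 4ik^3$, $b = 2u(u + 2k^2) - 2iku_x - u_{xx}$, $c = 2u + 4k^2$; the names $a$, $b$, $c$ are introduced here only for the calculation.

```latex
\begin{align*}
[U, V] &= \begin{pmatrix} uc - b & 2ikb - 2ua \\ 2a - 2ikc & b - uc \end{pmatrix},\\
(2,1):\quad & -c_x + 2a - 2ikc = -2u_x + 2u_x = 0,\\
(1,1):\quad & -a_x + uc - b = 0,\\
(1,2):\quad & u_t - b_x + 2ikb - 2ua = u_t + u_{xxx} - 6uu_x .
\end{align*}
```

The $k$-dependent terms cancel identically, so the only surviving condition is the KdV equation [15].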


The existence of an auto-BT implies the existence 
of a differential equation (see Definition 1) which 
relates two solutions of the same nonlinear equation. 
The new solution $\tilde u(x,t)$ of eqn [15] will be 
associated to a different Lax operator and a 
different spectral problem (but of the same operational 
form) 

$\tilde L = -\partial_{xx} + \tilde u(x,t)$  [25a]

$\tilde L\tilde\psi = k^2\tilde\psi$  [25b]

The existence of a relation between the potentials 
$u(x,t)$ and $\tilde u(x,t)$ thus implies that there must be a 
$(u, \tilde u; k)$-dependent operator D such that 

$\tilde\psi = D\psi$  [26]

The compatibility of eqns [16a], [25b], and [26] 
implies that $\tilde L D\psi = Dk^2\psi$, that is, 

$\tilde L D = D L$  [27]

Equation [27] is the auto-BT in the Lax formalism. 
If $L$ and $\tilde L$ are two different spectral problems 
related to two different nonlinear partial differential 
equations, then eqn [27] will provide a Miura 
transformation. In the DM, the requirement of the 
existence of a BT is given again by eqn [26] with $\psi$ 
and $\tilde\psi$ replaced by $\Psi$ and $\tilde\Psi$ and the operator D 
replaced by a matrix function D. The BT in the 
DM is given by 

$D_x = U(\tilde u(x,t),k)\,D - D\,U(u(x,t),k)$  [28a]

$D_t = V(\tilde u(x,t),k)\,D - D\,V(u(x,t),k)$  [28b]

In the particular case of the Riemann-Hilbert 
problem with zeros, providing the soliton solutions, 
the matrix D can be expressed as a function of $\Psi$. In 
this way, one derives the Moutard or Darboux 
transformation (DT) (Moutard 1878, Levi et al. 
1984), the most efficient way to get soliton solutions 
of the nonlinear partial differential equation. 

Given a linear ordinary differential equation for 
the unknown $\psi$, depending on a set of arbitrary 
functions u(x) and parameters k, the DT provides a 
discrete transformation which leaves the equation 
invariant. In the particular case of the KdV equation 
associated with the stationary Schrödinger spectral 
problem [16a], we have 

$\tilde u(x,t) = u(x,t) - 2\,(\log F(x,t))_{xx}$  [29a]

$\tilde\psi(x,t;k) = \dfrac{1}{k + ip}\left[\psi_x(x,t;k) - \dfrac{F_x(x,t)}{F(x,t)}\,\psi(x,t;k)\right]$  [29b]

where the intermediate wave function 

$F(x,t) = \psi(x,t;k = ip) + a\,\psi(x,t;k = -ip)$

is a linear combination of the Jost solutions of the 
Schrödinger spectral problem with p a real parameter 
and a an arbitrary constant. If one looks for 
an equation involving only the potentials $u$ and $\tilde u$, 
from eqns [29], one gets the BT for the KdV 
equation. Given a trivial solution of the KdV 
equation, together with the corresponding solution 


of the spectral problem, eqn [29a] provides a new 
solution of the KdV, while eqn [29b] gives a new 
solution of the spectral problem. This procedure can 
be carried out recursively and gives a ladder of 
explicit solutions for the KdV equation. 
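
As an illustration of the first rung of this ladder, the sketch below dresses the trivial solution u = 0 and checks numerically that the result solves eqn [15]. It assumes the seed wave functions exp(±(px − 4p³t)), which satisfy ψ_xx = p²ψ and ψ_t = −4ψ_xxx for u = 0; the function names, the finite-difference step and the sample point are illustrative choices, not part of the original text.

```python
import math


def dressed_potential(x, t, p=1.0, a=1.0):
    """u~ = -2 (log F)_xx with F = exp(theta) + a*exp(-theta), seed u = 0.

    theta = p*x - 4*p**3*t is the phase of the vacuum wave function at k = i*p.
    """
    theta = p * x - 4.0 * p**3 * t
    e_plus, e_minus = math.exp(theta), a * math.exp(-theta)
    F = e_plus + e_minus
    F_x = p * (e_plus - e_minus)
    F_xx = p * p * F
    # (log F)_xx = F_xx/F - (F_x/F)^2
    return -2.0 * (F_xx / F - (F_x / F) ** 2)


def kdv_residual(u, x, t, h=1e-3):
    """Central-difference estimate of u_t + u_xxx - 6 u u_x at (x, t)."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
    u_xxx = (u(x + 2 * h, t) - 2 * u(x + h, t)
             + 2 * u(x - h, t) - u(x - 2 * h, t)) / (2 * h**3)
    return u_t + u_xxx - 6.0 * u(x, t) * u_x


if __name__ == "__main__":
    x0, t0 = 0.3, 0.1
    print("u~(x0, t0)   =", dressed_potential(x0, t0))
    print("KdV residual =", kdv_residual(dressed_potential, x0, t0))
```

For a > 0 the dressed potential is the one-soliton profile −2p² sech²(px − 4p³t − ½ log a), so the residual should vanish up to discretization error.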

The DM is a particularly simple setting in which 
one can derive DTs. In fact, expressing the matrix 
D in terms of $\Psi$, eqn [28a] gives a relation between 
the potentials of the type given by eqn [29a], while 
eqn [26] gives eqn [29b]. Depending on the form of 
the matrix D in terms of k, one can introduce more 
parameters in the DT. The classical DT [29] 
depends on just one parameter; however, in the 
case of the Schrödinger spectral problem [16a], one 
can also have DTs depending on two parameters, a 
TDT. 

A more general DT, which can provide solutions 
even when the initial solution is not bounded 
asymptotically, can be obtained for many equations 
and, in particular, also for the KdV equation. This is 
obtained in a particular limit of the TDT when the 
parameters coincide (Levi 1988) and it is often 
referred to as binary DT (Matveev and Salle 1991). 
The binary DT for the KdV is given by 





$\tilde u(x,t) = u(x,t) - 2\,(\log F(x,t))_{xx}$  [30a]
E 1 CA TA 
Jesh =p a (E SE) balet) 
7 TE 30b] 


where $\mu$ is a value of k for which the function 
$\psi(x,t;k)$ is asymptotically bounded at $+\infty$ and the 
function F(x,t) is given by 

$F(x,t) = 1 + \rho\int_x^{+\infty}\psi^2(y,t;\mu)\,dy$  [31]

with $\rho$ an arbitrary constant. The corresponding BT 
obtained eliminating the function F from eqns [30] 
reads 


_ 1. 
Jax — Vax =- 3 (4-9) 
— |q 


x + qx — 2g(x) + 2u\(4 — q) 


2 4-4 

where $q = \int_x^{+\infty} u_0(y,t)\,dy$ with $u_0(x,t) = u(x,t) - g(x)$, 
the asymptotically bounded part of u(x,t), 
and g(x) its asymptotic behavior, and 
$\tilde q = \int_x^{+\infty}\tilde u_0(y,t)\,dy$ with $\tilde u_0(x,t) = \tilde u(x,t) - g(x)$. 

Once the Lax operator L is given, we can obtain 
in a constructive way the operators M which 
give the admissible nonlinear partial differential 
equations and the operators D which give the 
admissible BT. A technique to do so is provided by 
the so-called Lax technique introduced by Bruschi 
and Ragnisco (1980a-c). Using the Lax technique, 
we can easily obtain the nonlinear partial differential 
equations and BT associated with the Lax 
operator [16a] both in the isospectral and nonisospectral 
case (when $k_t = 0$ and when $k_t \neq 0$) 
and the corresponding evolution of the spectrum. 
We have 

$u_t = f(\mathcal{L},t)\,u_x + g(\mathcal{L},t)\,[xu_x + 2u]$  [33a]

$k_t = k\,g(-4k^2,t), \qquad \dfrac{dR(k,t)}{dt} = 2ik\,f(-4k^2,t)\,R(k,t)$  [33b]

$F(\Lambda)(\tilde u - u) + G(\Lambda)\,\Gamma\,1 = 0$  [33c]

$\tilde R(k,t) = \dfrac{F(-4k^2) - 2ik\,G(-4k^2)}{F(-4k^2) + 2ik\,G(-4k^2)}\,R(k,t)$  [33d]

where the functions f, g, F, and G are entire 
functions of their first argument and the recursive 
operators $\mathcal{L}$, $\Lambda$, and $\Gamma$ are given by 

$\mathcal{L}f(x) = f_{xx}(x) - 4u(x,t)f(x) + 2u_x(x,t)\int_x^{+\infty} f(y)\,dy$  [34a]

$\Lambda f(x) = f_{xx}(x) - 2[\tilde u(x,t) + u(x,t)]f(x) + \Gamma\int_x^{+\infty} f(y)\,dy$  [34b]

$\Gamma f(x) = [\tilde u_x(x,t) + u_x(x,t)]f(x) + [\tilde u(x,t) - u(x,t)]\int_x^{+\infty}[\tilde u(y,t) - u(y,t)]f(y)\,dy$  [34c]

In the limit when $\tilde u \to u$ the operator $\Lambda \to \mathcal{L}$. A BT 
is obtained by choosing the functions F and G in 
eqn [33c]. The simplest BT is obtained by setting 
$F = \sigma$ and $G = 1$: 

$\tilde v_x + v_x + (\tilde v - v)\left[\sigma + \tfrac{1}{2}(\tilde v - v)\right] = 0$  [35]

with $u(x,t) = -v_x(x,t)$ and $\sigma$ is the Bäcklund 
parameter. By combining together BT of the form 
[35] with different parameters as in eqn [5], we get 
the permutability theorem for the KdV BTs: 


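$\tilde v_{12} = v - \dfrac{(\sigma_1 + \sigma_2)(\tilde v_1 - \tilde v_2)}{\sigma_1 - \sigma_2 + \tilde v_1 - \tilde v_2}$  [36]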


Its proof is immediate from the point of view of the 
spectrum. 
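
The permutability theorem can also be obtained by direct algebra. The following sketch assumes the BT in the form $\tilde v_x + v_x + (\tilde v - v)[\sigma + \tfrac12(\tilde v - v)] = 0$ and the hypothetical labels $v_1$, $v_2$ for the solutions obtained from $v$ with parameters $\sigma_1$, $\sigma_2$, and $v_{12}$ for the common solution reached by applying the other parameter to each.

```latex
\begin{align*}
(v_1 + v)_x + \sigma_1(v_1 - v) + \tfrac12 (v_1 - v)^2 &= 0, &
(v_2 + v)_x + \sigma_2(v_2 - v) + \tfrac12 (v_2 - v)^2 &= 0,\\
(v_{12} + v_1)_x + \sigma_2(v_{12} - v_1) + \tfrac12 (v_{12} - v_1)^2 &= 0, &
(v_{12} + v_2)_x + \sigma_1(v_{12} - v_2) + \tfrac12 (v_{12} - v_2)^2 &= 0.
\end{align*}
% Subtract within each row, then subtract the two results to eliminate (v_1 - v_2)_x.
% With A = v_{12} - v and B = v_1 - v_2:
\begin{align*}
(\sigma_1 - \sigma_2)A + (\sigma_1 + \sigma_2)B + AB &= 0
\quad\Longrightarrow\quad
v_{12} = v - \frac{(\sigma_1 + \sigma_2)(v_1 - v_2)}{\sigma_1 - \sigma_2 + v_1 - v_2}.
\end{align*}
```

No derivatives survive, which is why the nonlinear superposition formula is purely algebraic.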
Bäcklund and Symmetries 


A symmetry of the nonlinear equation [15] is given 
by a flow commuting with it, that is, by an 
equation 

$u_\varepsilon = f(u, u_x, u_t, \ldots)$  [37]

where $\varepsilon$ is the group parameter, $u = u(x,t;\varepsilon)$, and the 
$\varepsilon$ derivative of [15] is zero on its set of solutions. 
A group transformation is obtained by integrating it. 
Usually this is possible only when eqn [37] is a 
quasilinear partial differential equation of the first 
order. Taking into account the evolution of the 
spectrum of the KdV equation [15], it is easy to 
prove that its symmetries are given by 

$u_\varepsilon = \left[\sum_{n=0}^{+\infty}\alpha_n\mathcal{L}^n - 3\sum_{n=0}^{+\infty}\beta_n\,t\,\mathcal{L}^{n+1}\right]u_x + \sum_{n=0}^{+\infty}\beta_n\mathcal{L}^n\,[xu_x + 2u]$  [38]

where $\alpha_n$ and $\beta_n$ are a set of constant parameters. 
For each choice of the parameters $\alpha_n$ and $\beta_n$, 
one gets a symmetry of the KdV equation [15]. 
With eqn [38] one can associate the following 
evolution of the reflection coefficient $R(k,t;\varepsilon)$: 

$R_\varepsilon = 2ik\left[\sum_{n=0}^{+\infty}\alpha_n(-4k^2)^n - 3\sum_{n=0}^{+\infty}\beta_n\,t\,(-4k^2)^{n+1}\right]R$  [39]

and of the spectral parameter k 

$k_\varepsilon = \sum_{n=0}^{+\infty}\beta_n(-4k^2)^n\,k$  [40]

As $-\tfrac{1}{2}\mathcal{L}\,1 = xu_x + 2u$, one can add to the 
symmetries [38] the exceptional one (which has no 
spectral counterpart as u is not bounded 
asymptotically): 

$u_\varepsilon = 1 + 6tu_x$  [41]

By a proper natural choice of the constant parameters 
$\alpha_n$ and $\beta_n$, one can define two infinite series 
of symmetries. The first one is obtained by choosing 
$\beta_n = 0$ and $\alpha_n = \delta_{n,m}$ with $m = 1, 2, \ldots, \infty$ and can 
be denoted as the isospectral series as $k_\varepsilon = 0$. This is 
formed by commuting symmetries. The second one 
is given by $\alpha_n = 0$ and $\beta_n = \delta_{n,m}$ with $m = 1, 2, \ldots, \infty$ 
and can be denoted as the nonisospectral series as 
$k_\varepsilon \neq 0$. The nonisospectral symmetries have a 
nonzero commutation relation among themselves 
and with the isospectral ones. 
Except for a few Lie point symmetries (given by 
eqn [41] and by choosing inside the series [38] those 
with only $\beta_0$ or $\alpha_0$ or $\alpha_1$ different from zero) they 
are all generalized symmetries (Olver 1993). By 
analyzing their spectrum, it is easy to prove that the 
choice [38] is such that they are all independent. For 
the isospectral class, the evolution of the spectrum is 
simple and can be integrated to provide the group 
transformation of the spectrum 

$R(k,t;\varepsilon) = R(k,t)\,\exp\left[2ik\sum_{n=0}^{+\infty}\alpha_n(-4k^2)^n\,\varepsilon\right]$  [42]

Let us now consider the simplest BT obtained by 
choosing, in eqn [33c], $F(\Lambda) = \sigma$ and $G(\Lambda) = 1$, where 
$\sigma$ is an arbitrary parameter. In the spectral space, this 
corresponds to the following change of the spectrum: 

$\tilde R(k,t) = \dfrac{\sigma - 2ik}{\sigma + 2ik}\,R(k,t)$  [43]

Defining $\tilde R(k,t) = R(k,t;\varepsilon)$, eqn [42] is equal to 
eqn [43] iff 

$\alpha_n = -\dfrac{2}{\varepsilon\,\sigma^{2n+1}(2n+1)}, \quad n = 0, 1, \ldots, \infty$  [44]
So we need an infinite number of symmetries to 
be able to reconstruct the change of the spectrum 
given by the BT. This shows that the existence of a BT 
is strictly connected to the existence of an infinity of 
symmetries which is a condition for the exact 
integrability of the nonlinear partial differential 
equation (Fokas 1980, Ibragimov and Shabat 1980). 
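
The need for infinitely many symmetries can be seen numerically. The sketch below assumes the conventions that the BT multiplies R by (σ − 2ik)/(σ + 2ik) and that the isospectral flows multiply R by exp[2ik Σ αₙ(−4k²)ⁿ ε] with αₙε = −2/(σ^(2n+1)(2n+1)); the function names and truncation orders are illustrative. The series is the expansion of the logarithm of the BT factor and converges only for |2k| < σ.

```python
import cmath


def bt_factor(k, sigma):
    """Change of the reflection coefficient produced by the one-parameter BT."""
    return (sigma - 2j * k) / (sigma + 2j * k)


def symmetry_factor(k, sigma, n_terms=200):
    """exp[2ik * sum_n alpha_n*eps*(-4k^2)^n], truncated at n_terms symmetries.

    alpha_n*eps = -2 / (sigma**(2n+1) * (2n+1)); the ratio (-4k^2/sigma^2)^n is
    accumulated iteratively to avoid overflow for large sigma.
    """
    ratio = -4.0 * k * k / (sigma * sigma)
    power = 1.0
    total = 0.0
    for n in range(n_terms):
        total += -2.0 / (sigma * (2 * n + 1)) * power
        power *= ratio
    return cmath.exp(2j * k * total)


if __name__ == "__main__":
    k, sigma = 0.2, 1.0
    for n_terms in (1, 2, 5, 50):
        err = abs(symmetry_factor(k, sigma, n_terms) - bt_factor(k, sigma))
        print(n_terms, "symmetries: error", err)
```

Any finite truncation leaves a nonzero error; only the full infinite series reproduces the BT factor.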


Discretization via Bäcklund 


BTs, apart from providing classes of exact solutions 
to nonlinear equations, play a very important role in 
the discretization of partial differential equations. As 
noted earlier, an auto-BT is a differential relation 
between two different solutions of the same nonlinear 
partial differential equation. If it is assumed 
that the new solution $\tilde u$ is just the old solution $u$ 
computed in a different point of a lattice, then the 
BT becomes just a differential-difference equation 
(Chiu and Ladik 1977, Levi and Benguria 1980). 
This can be carried out also at the level of the 
associated compatibility condition and in such a 
way one is able to also obtain its Lax pair. This 
demonstrates the integrability of the differential-difference 
equation 

$v(n+1,t)_t + v(n,t)_t + [v(n+1,t) - v(n,t)]\left\{\sigma + \tfrac{1}{2}[v(n+1,t) - v(n,t)]\right\} = 0$  [45]

which is an integrable differential-difference 
approximation to the KdV equation or 

$w(n+1,t)_t = w(n,t)_t + 2a\sin\left[\dfrac{w(n+1,t) + w(n,t)}{2}\right]$  [46]

a discrete integrable differential-difference approximation 
to the sG equation (Hirota 1977, Orfanidis 1978). 

As the nonlinear superposition formulas are 
purely algebraic relations involving potentials asso- 
ciated with integrable nonlinear partial differential 
equations, one can interpret them as difference- 
difference equations. In the case of the sG equation 
from eqn [7], we have 


Wn+1,m+1 — Wim 


{adi ra Ww Ww 
— 4 arctan 1 Gea ptei ty) [47] 


d1 — a2 4 


where RN = tm Oi HW rion WX) 
Wn,m+1, and w (x, t) =Wn+1,m+1. In a similar manner, 
from [36], one gets 


$v_{n+1,m+1} = v_{n,m} - \dfrac{(\sigma_1 + \sigma_2)\,[v_{n+1,m} - v_{n,m+1}]}{\sigma_1 - \sigma_2 + [v_{n+1,m} - v_{n,m+1}]}$  [48]


The continuous limit of eqn [47], obtained by setting 
x=e,n and y =m and choosing 


d1 €1€2 


a2 4 


gives back eqn [2] (Rogers and Schief 1997). It is 
worth mentioning that one can also use known 
nonlinear lattice equations to construct BT for 
nonlinear partial differential equations (Levi 1981). 


See also: Integrable Systems and Discrete Geometry; 
Integrable Systems: Overview; Painlevé Equations; 
Solitons and Kac-Moody Lie Algebras; Toda Lattices. 
Introduction 


The Batalin-Vilkovisky formalism for quantizing 
gauge theories has a long history of development. It 
begins with the Faddeev-Popov procedure for 
quantizing Yang-Mills theory, involving the Faddeev-Popov 
ghost fields (Faddeev and Popov 1967). It 
continued with the discovery of BRST symmetry by 
Becchi et al. (1976). Zinn-Justin (1975) 
introduced sources for these transformations, and 
a symmetric structure in the space of fields and 
sources, in his study of the renormalizability of these 
theories. Finally, Batalin and Vilkovisky (1981) 
systematized and generalized these developments. 
A more detailed account of this history can be 
found in Gomis et al. (1994), where many worked 
examples of the Batalin-Vilkovisky formalism are 
given. At the present time, it is the most general 
treatment available. Alexandrov, Kontsevich, Schwarz, 
and Zaboronsky (AKSZ 1997) have presented a 
geometric interpretation for the case in which the 
action is topologically invariant. 


Structure of the Set of Gauge 
Transformations 


Consider a system whose dynamics is governed by 
a classical action $S_0[\phi^i]$ which depends on the 
fields $\phi^i(x)$, $i = 1, \ldots, n$. We employ a compact 
notation in which the multi-index i may denote 
the various fields involved, the discrete indices on 
which they depend, and the dependence on the 
spacetime variables as well. The generalized 
summation convention then means that a 
repeated index may denote not only a sum over 
discrete variables, but also integration over 
the spacetime variables. $\epsilon_i = \epsilon(\phi^i)$ denotes the 
Grassmann parity of the fields. Fields with $\epsilon_i = 0$ 
are called bosonic, with $\epsilon_i = 1$ fermionic. The 
graded commutation rule is 

$\phi^i(x)\,\phi^j(y) = (-1)^{\epsilon_i\epsilon_j}\,\phi^j(y)\,\phi^i(x)$  [1]

For a gauge theory the action is invariant under a set 
of gauge transformations with infinitesimal form 

$\delta\phi^i = R^i_\alpha\,\varepsilon^\alpha, \quad \alpha = 1 \text{ or } 2 \text{ or } \ldots m$  [2]

The $\varepsilon^\alpha$ are the infinitesimal gauge parameters and 
$R^i_\alpha$ the generators of the gauge transformations. 
When $\epsilon_\alpha = \epsilon(\varepsilon^\alpha) = 0$ we have an ordinary symmetry, 
when $\epsilon_\alpha = 1$ the equation is characteristic of a 
supersymmetry. The Grassmann parity of $R^i_\alpha$ is 
$\epsilon(R^i_\alpha) = \epsilon_i + \epsilon_\alpha$ (mod 2). 

A subscript after a comma denotes the right 
derivative with respect to the corresponding field, 
that is, the field is to be commuted to the far right 
and then dropped. The field equations may then be 
written as 

$S_{0,i} = 0$  [3]

where $S_0$ is the classical action. Let $\Sigma$ denote the 
surface in the space of fields where the field 
equations are satisfied: 

$S_{0,i}\big|_\Sigma = 0$  [4]

If the gauge transformations are "independent" 
on-shell, that is, 

$\mathrm{rank}\; R^i_\alpha\big|_\Sigma = m$  [5]

the gauge theory is said to be "irreducible." We 
assume here that this is the case. When it is not, the 
theory is "reducible." For details of the treatment in 
that case, see Gomis et al. (1994). The 
classical solutions are $\phi_0 \in \Sigma$. 

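The Noether identities are 

$S_{0,i}\,R^i_\alpha = 0$  [6]

The general solution $\lambda^i$ to the Noether identity $S_{0,i}\lambda^i = 0$ is 

$\lambda^i = R^i_\alpha T^\alpha + S_{0,j}E^{ji}$  [7]

The commutator of two gauge transformations is 

$[\delta_1, \delta_2]\phi^i = \left(R^i_{\alpha,j}R^j_\beta - (-1)^{\epsilon_\alpha\epsilon_\beta}R^i_{\beta,j}R^j_\alpha\right)\varepsilon_1^\beta\varepsilon_2^\alpha$  [8]

Since this commutator is a symmetry of the action, it 
satisfies the Noether identity 

$S_{0,i}\left(R^i_{\alpha,j}R^j_\beta - (-1)^{\epsilon_\alpha\epsilon_\beta}R^i_{\beta,j}R^j_\alpha\right) = 0$  [9]

which by eqn [7] implies that 

$R^i_{\alpha,j}R^j_\beta - (-1)^{\epsilon_\alpha\epsilon_\beta}R^i_{\beta,j}R^j_\alpha = R^i_\gamma T^\gamma_{\alpha\beta} - S_{0,j}E^{ji}_{\alpha\beta}$  [10]

Equations [8] and [10] lead to the following 
condition: 

$[\delta_1, \delta_2]\phi^i = \left(R^i_\gamma T^\gamma_{\alpha\beta} - S_{0,j}E^{ji}_{\alpha\beta}\right)\varepsilon_1^\beta\varepsilon_2^\alpha$  [11]

The tensors $T^\gamma_{\alpha\beta}$ are called the structure constants of the 
gauge algebra, although they depend, in general, on 
the fields of the theory. When $E^{ji}_{\alpha\beta} = 0$, the gauge 
algebra is said to be "closed," otherwise it is "open." 
Equation [11] defines a Lie algebra if the algebra is 
closed and the $T^\gamma_{\alpha\beta}$ are independent of the fields. 

The gauge tensors have the following graded 
symmetry properties: 

$T^\gamma_{\alpha\beta} = -(-1)^{\epsilon_\alpha\epsilon_\beta}\,T^\gamma_{\beta\alpha}$
$E^{ij}_{\alpha\beta} = -(-1)^{\epsilon_i\epsilon_j}E^{ji}_{\alpha\beta} = -(-1)^{\epsilon_\alpha\epsilon_\beta}E^{ij}_{\beta\alpha}$  [12]

The Grassmann parities are 

$\epsilon(T^\gamma_{\alpha\beta}) = \epsilon_\alpha + \epsilon_\beta + \epsilon_\gamma$ (mod 2)  [13]

and 

$\epsilon(E^{ij}_{\alpha\beta}) = \epsilon_i + \epsilon_j + \epsilon_\alpha + \epsilon_\beta$ (mod 2)  [14]

Various restrictions are imposed by the Jacobi 
identity 

$\sum_{\mathrm{cyclic}(123)} [\delta_1, [\delta_2, \delta_3]] = 0$  [15]

These restrictions are 

$\sum_{\mathrm{cyclic}(123)} \left(R^i_\delta A^\delta_{\alpha\beta\gamma} - S_{0,j}B^{ji}_{\alpha\beta\gamma}\right)\varepsilon_1^\gamma\varepsilon_2^\beta\varepsilon_3^\alpha = 0$  [16]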


where 


L = (T; Mo ) 4 (-1) ot) 


a an ~ Bry 
5 pk 5 
ú (TR! 7 a) 
+ (19) (TERE — TATZ) 


ym ap 


and 


sm (Eh — El, — (1 
xR, Ep + (1) at) 
i (1o > B -5 y) T (—1) to) 
x (a — yo B) 


As in the familiar Faddeev-Popov procedure, it is 
useful to introduce ghost fields $C^\alpha$ with opposite 
Grassmann parities to the gauge parameters $\varepsilon^\alpha$: 

$\epsilon(C^\alpha) = \epsilon_\alpha + 1$ (mod 2)  [17]


and to replace the gauge parameters by ghost fields. 
One must then modify the graded symmetry proper- 
ties of the gauge structure tensors according to 


Taisen. =e See T eds. [18] 
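The Noether identities then take the form 

$S_{0,i}\,R^i_\alpha C^\alpha = 0$  [19]

and the structure relations [10] become 

$\left(2R^i_{\alpha,j}R^j_\beta - R^i_\gamma T^\gamma_{\alpha\beta} + S_{0,j}E^{ji}_{\alpha\beta}\right)C^\beta C^\alpha = 0$  [20]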


Introducing the Antifields 


We incorporate the ghost fields into the field set 
$\Phi^A = \{\phi^i, C^\alpha\}$, where $i = 1, \ldots, n$ and $\alpha = 1, \ldots, m$. 
Clearly $A = 1, \ldots, N$, where $N = n + m$. One then 
further increases the set by introducing an antifield 
$\Phi^*_A$ for each field $\Phi^A$. The Grassmann parity of the 
antifields is 

$\epsilon(\Phi^*_A) = \epsilon(\Phi^A) + 1$ (mod 2)  [21]

Each field is assigned a ghost number, with 

$\mathrm{gh}[\phi^i] = 0$
$\mathrm{gh}[C^\alpha] = 1$  [22]
$\mathrm{gh}[\Phi^*_A] = -\mathrm{gh}[\Phi^A] - 1$


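In the space of fields and antifields, the antibracket 
is defined by 

$(X, Y) = \dfrac{\partial_r X}{\partial\Phi^A}\dfrac{\partial_l Y}{\partial\Phi^*_A} - \dfrac{\partial_r X}{\partial\Phi^*_A}\dfrac{\partial_l Y}{\partial\Phi^A}$  [23]

where $\partial_r$ denotes the right, $\partial_l$ the left derivative. The 
antibracket is graded antisymmetric: 

$(Y, X) = -(-1)^{(\epsilon_X + 1)(\epsilon_Y + 1)}(X, Y)$  [24]

It satisfies a graded Jacobi identity 

$((X, Y), Z) + (-1)^{(\epsilon_X + 1)(\epsilon_Y + \epsilon_Z)}((Y, Z), X) + (-1)^{(\epsilon_Z + 1)(\epsilon_X + \epsilon_Y)}((Z, X), Y) = 0$  [25]

It is a graded derivation 

$(X, YZ) = (X, Y)Z + (-1)^{\epsilon_Y\epsilon_Z}(X, Z)Y$
$(XY, Z) = X(Y, Z) + (-1)^{\epsilon_X\epsilon_Y}Y(X, Z)$  [26]

It has ghost number 

$\mathrm{gh}[(X, Y)] = \mathrm{gh}[X] + \mathrm{gh}[Y] + 1$  [27]

and Grassmann parity 

$\epsilon((X, Y)) = \epsilon(X) + \epsilon(Y) + 1$ (mod 2)  [28]

For bosonic fields 

$(B, B) = 2\,\dfrac{\partial_r B}{\partial\Phi^A}\dfrac{\partial_l B}{\partial\Phi^*_A}$  [29]

for fermionic fields 

$(F, F) = 0$  [30]

and for any X 

$((X, X), X) = 0$  [31]

If one groups the fields and the antifields together 
into the set 

$z^a = \{\Phi^A, \Phi^*_A\}, \quad a = 1, \ldots, 2N$  [32]

then the antibracket is seen to define a symplectic 
structure on the space of fields and antifields 

$(X, Y) = \dfrac{\partial_r X}{\partial z^a}\,\zeta^{ab}\,\dfrac{\partial_l Y}{\partial z^b}$  [33]

with 

$\zeta^{ab} = \begin{pmatrix} 0 & \delta^A_B \\ -\delta^A_B & 0 \end{pmatrix}$  [34]

The antifields can be thought of as conjugate 
variables to the fields, since 

$(\Phi^A, \Phi^*_B) = \delta^A_B$  [35]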


The Classical Master Equation 


Let $S[\Phi^A, \Phi^*_A]$ be a functional of the fields and 
antifields with the dimension of an action, vanishing 
ghost number and even Grassmann parity. The 
equation 

$(S, S) = 2\,\dfrac{\partial_r S}{\partial\Phi^A}\dfrac{\partial_l S}{\partial\Phi^*_A} = 0$  [36]

is the classical master equation. Solutions of the 
classical master equation with suitable boundary 
conditions turn out to be generating functionals for 
the gauge structure of the theory. S is also the 
starting point for the quantization. One denotes by 
$\Sigma$ the subspace of stationary points of the action in 
the space of fields and antifields: 

$\Sigma = \left\{ z^a \;\middle|\; \dfrac{\partial S}{\partial z^a} = 0 \right\}$  [37]

Given a classical solution $\phi_0$ of $S_0$ one stationary 
point is 

$\phi^i = \phi^i_0, \quad C^\alpha = 0, \quad \Phi^*_A = 0$  [38]
An action which satisfies the classical master 
equation has its own set of invariances: 

$\dfrac{\partial_r S}{\partial z^a}\,R^a_b = 0$  [39]

with 

$R^a_b = \zeta^{ac}\,\dfrac{\partial_l\partial_r S}{\partial z^c\,\partial z^b}$  [40]

This equation implies 

$R^a_c R^c_b\big|_\Sigma = 0$  [41]

One says that $R^a_b$ is nilpotent on-shell. A nilpotent 
$2N \times 2N$ matrix has rank $\leq N$. Let r be the rank of 
the hessian of S at the stationary point: 

$r = \mathrm{rank}\;\dfrac{\partial_l\partial_r S}{\partial z^a\,\partial z^b}\bigg|_\Sigma$  [42]

We then have $r \leq N$. The relevant solutions of the 
classical master equation are those for which $r = N$. 
In this case the number of independent gauge 
invariances of the type in eqn [39] equals the number 
of antifields. When at a later stage the gauge is fixed, 
the nonphysical antifields are eliminated. 

To ensure the correct classical limit, the proper 
solution must contain the classical action $S_0$ in the 
sense that 

$S[\Phi^A, \Phi^*_A]\big|_{\Phi^*_A = 0} = S_0[\phi^i]$  [43]


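The action $S[\Phi^A, \Phi^*_A]$ can be expanded in a series in 
the antifields, while maintaining vanishing ghost 
number and even Grassmann parity: 

$S[\Phi, \Phi^*] = S_0 + \phi^*_i R^i_\alpha C^\alpha + C^*_\alpha\,\tfrac{1}{2}T^\alpha_{\beta\gamma}(-1)^{\epsilon_\beta}C^\gamma C^\beta + \phi^*_i\phi^*_j(-1)^{\epsilon_i}\,\tfrac{1}{4}E^{ji}_{\alpha\beta}(-1)^{\epsilon_\alpha}C^\beta C^\alpha + \cdots$  [44]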


When this is inserted into the classical master 
equation, one finds that this equation implies the 
gauge structure of the classical theory. 
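
The requirement of vanishing ghost number and even Grassmann parity can be checked term by term. The following bookkeeping sketch assumes the expansion $S = S_0 + \phi^*_i R^i_\alpha C^\alpha + \tfrac12 C^*_\alpha T^\alpha_{\beta\gamma}C^\gamma C^\beta + \tfrac14\phi^*_i\phi^*_j E^{ji}_{\alpha\beta}C^\beta C^\alpha + \cdots$ (sign factors omitted, since they affect neither count) together with the assignments [17], [21], and [22], which give $\mathrm{gh}[\phi^*_i] = -1$, $\mathrm{gh}[C^*_\alpha] = -2$, $\epsilon(\phi^*_i) = \epsilon_i + 1$, and $\epsilon(C^*_\alpha) = \epsilon_\alpha$.

```latex
\begin{align*}
\mathrm{gh}[\phi^*_i R^i_\alpha C^\alpha] &= -1 + 0 + 1 = 0, &
\epsilon &= (\epsilon_i + 1) + (\epsilon_i + \epsilon_\alpha) + (\epsilon_\alpha + 1) \equiv 0,\\
\mathrm{gh}[C^*_\alpha T^\alpha_{\beta\gamma} C^\gamma C^\beta] &= -2 + 0 + 1 + 1 = 0, &
\epsilon &= \epsilon_\alpha + (\epsilon_\alpha + \epsilon_\beta + \epsilon_\gamma) + (\epsilon_\gamma + 1) + (\epsilon_\beta + 1) \equiv 0,\\
\mathrm{gh}[\phi^*_i \phi^*_j E^{ji}_{\alpha\beta} C^\beta C^\alpha] &= -1 - 1 + 0 + 1 + 1 = 0, &
\epsilon &= (\epsilon_i + 1) + (\epsilon_j + 1) + (\epsilon_i + \epsilon_j + \epsilon_\alpha + \epsilon_\beta) + (\epsilon_\beta + 1) + (\epsilon_\alpha + 1) \equiv 0,
\end{align*}
% all parities are taken mod 2
```

Each antifield thus has to be accompanied by exactly enough ghosts to compensate its negative ghost number, which is what fixes the form of the expansion.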


Gauge Fixing and Quantization 


Equation [39] shows that the action S still possesses 
gauge invariances, and hence is not yet suitable for 
quantization via the path integral approach: a 
gauge-fixing procedure is necessary. In the Batalin-Vilkovisky 
approach the gauge is fixed, and the 
antifields eliminated, by use of a gauge-fixing 
fermion $\Psi$ which has Grassmann parity $\epsilon(\Psi) = 1$ 
and $\mathrm{gh}[\Psi] = -1$. It is a functional of the fields $\Phi^A$ 
only; its relation to the antifields is 

$\Phi^*_A = \dfrac{\partial\Psi}{\partial\Phi^A}$  [45]

We define a surface in functional space 

$\Sigma_\Psi = \left\{ (\Phi^A, \Phi^*_A) \;\middle|\; \Phi^*_A = \dfrac{\partial\Psi}{\partial\Phi^A} \right\}$  [46]

so that for any functional $X[\Phi, \Phi^*]$ 

$X\big|_{\Sigma_\Psi} = X\left[\Phi, \dfrac{\partial\Psi}{\partial\Phi}\right]$  [47]

To construct a gauge-fixing fermion $\Psi$ of ghost 
number $-1$, one must again introduce additional 
fields. The simplest choice utilizes a trivial pair 
$\bar C_\alpha$, $\bar\pi_\alpha$ with 

$\epsilon(\bar C_\alpha) = \epsilon_\alpha + 1, \quad \epsilon(\bar\pi_\alpha) = \epsilon_\alpha, \quad \mathrm{gh}[\bar C_\alpha] = -1, \quad \mathrm{gh}[\bar\pi_\alpha] = 0$  [48]

The fields $\bar C_\alpha$ are the Faddeev-Popov antighosts. 
Along with these fields we include the corresponding 
antifields $\bar C^{*\alpha}$, $\bar\pi^{*\alpha}$. Adding the term $\bar\pi_\alpha\bar C^{*\alpha}$ to the 
action S does not spoil its properties as a proper 
solution to the classical master equation, and one 
gets the nonminimal action 

$S^{nm} = S + \bar\pi_\alpha\bar C^{*\alpha}$  [49]

The simplest possibility for $\Psi$ is 

$\Psi = \bar C_\alpha\,\chi^\alpha(\phi)$  [50]

where $\chi^\alpha$ are the gauge-fixing conditions for the 
fields $\phi$. The gauge-fixed action is denoted by 

$S_\Psi = S^{nm}\big|_{\Sigma_\Psi}$  [51]

Quantization is performed using the path integral 
to calculate a correlation function X, with the 
constraint [45] implemented by a $\delta$-function: 

$I_\Psi(X) = \int \mathcal{D}\Phi\,\mathcal{D}\Phi^*\;\delta\left(\Phi^*_A - \dfrac{\partial\Psi}{\partial\Phi^A}\right)\exp\left(\dfrac{i}{\hbar}W[\Phi, \Phi^*]\right)X[\Phi, \Phi^*]$  [52]

Here W is the quantum action, which reduces to S in 
the limit $\hbar \to 0$. An admissible $\Psi$ leads to well-defined 
propagators when the path integral is 
expressed as a perturbation series expansion. 

The results of a calculation should be independent 
of the gauge fixing. Consider the integrand in eqn 
[52], 

$I[\Phi, \Phi^*] = \exp\left(\dfrac{i}{\hbar}W[\Phi, \Phi^*]\right)X[\Phi, \Phi^*]$  [53]

Under an infinitesimal change in $\Psi$ 

$I_{\Psi + \delta\Psi}(X) - I_\Psi(X) = \int \mathcal{D}\Phi\;\Delta I\;\delta\Psi$  [54]

where the Laplacian $\Delta$ is 

$\Delta = (-1)^{\epsilon_A + 1}\,\dfrac{\partial_r}{\partial\Phi^A}\dfrac{\partial_r}{\partial\Phi^*_A}$  [55]

Obviously, the integral $I_\Psi(X)$ is independent of $\Psi$ if 
$\Delta I = 0$. For $X = 1$ one gets the requirement 

$\Delta\exp\left(\dfrac{i}{\hbar}W\right) = \exp\left(\dfrac{i}{\hbar}W\right)\left(\dfrac{i}{\hbar}\Delta W - \dfrac{1}{2\hbar^2}(W, W)\right) = 0$  [56]

The formula 

$\tfrac{1}{2}(W, W) = i\hbar\,\Delta W$  [57]

is the quantum master equation. A gauge-invariant 
correlation function satisfies 

$(X, W) = i\hbar\,\Delta X$  [58]

The terms of higher order in $\hbar$ by which the 
quantum action W may differ from the solution of 
the classical master equation S correspond to the 
counter-terms of the renormalizable gauge theory if 

$\Delta S = 0$  [59]

One must, of course, use a regularization scheme 
which respects the symmetries of the theory. For 
$W = S + O(\hbar)$ the quantum master equation [57] 
reduces in this case to the classical master equation 

$(S, S) = 0$  [60]

Hence, up to possible counter-terms, one may 
simply choose $W = S$. 

To implement the gauge fixing, one uses for the 
action $W = S^{nm}$. For the path integral $Z = I_\Psi(X = 1)$, 
the integration over the antifields in eqn [52] is 
performed by using the $\delta$-function. The result is 

$Z = \int \mathcal{D}\Phi\,\exp\left(\dfrac{i}{\hbar}S_\Psi\right)$  [61]


Geometrical Interpretation of Topological 
Field Theories 


The Batalin-Vilkovisky formalism for topological 
field theories has been given a geometrical interpretation 
by AKSZ (1997). 

A supermanifold equipped with an odd vector 
field Q satisfying $Q^2 = 0$ is called a Q-manifold. A 
Q-manifold provided with an odd symplectic structure 
$\omega$ (P-structure) is called a QP-manifold if the 
odd symplectic structure is Q-invariant, that is, 
$L_Q\omega = 0$. Every solution to the classical master 
equation determines a QP-structure on M and vice 
versa. The geometric object corresponding to a 
classical mechanical system in the Batalin-Vilkovisky 
formalism is a QP-manifold. 

The nondegenerate closed 2-form $\omega$ is written as 

$\omega = dz^a\,\omega_{ab}\,dz^b$  [62]

where $z^a$ are local coordinates in the supermanifold 
M. For functions on M, an (odd) Poisson bracket is 
defined as in eqn [33], where $\omega^{ab}$ stands for the 
inverse matrix of $\omega_{ab}$. An even function S on M 
satisfies the classical master equation if $(S, S) = 0$. 
The correspondence between vector fields and 
functions on M is given by $K_F G = (G, F)$, where $K_F$ 
is the vector field, F the given function, and G an 
arbitrary function. The function F is called the 
Hamiltonian of the vector field $K_F$. 

Geometrically, equivalent QP-manifolds describe 
the same physics. In particular, one can consider 
an even Hamiltonian vector field $K_F$ corresponding 
to an odd function F. This vector field determines 
an infinitesimal transformation preserving the P-structure. 
It transforms a solution S to the classical master 
equation into the physically equivalent solution 
$S + \varepsilon(S, F)$, where $\varepsilon$ is an infinitesimally small 
parameter. 

A submanifold L of a P-manifold M is called a 
Lagrangian submanifold if the restriction of the 
form $\omega$ to L vanishes. In the particular case when 
$M = \Pi T^*N$ (the cotangent bundle to N with reversed 
parity of fibres) with standard P-structure, one can 
construct many examples of Lagrangian submanifolds 
in the following way. Fix an odd function $\Psi$ on 
N, the gauge fermion. The submanifold $L_\Psi \subset M$ 
determined by the equation 

$\xi_a = \dfrac{\partial\Psi}{\partial x^a}$  [63]

where $\{x^a, \xi_a\}$ are coordinates corresponding to the 
identification $M = \Pi T^*N$, will be a Lagrangian submanifold 
of M. 

The P-manifold M in the neighborhood of L can 
be identified with $\Pi T^*L$. In other words, one can 
find a neighborhood U of L in M and a 
neighborhood V of L in $\Pi T^*L$ such that there exists an 
isomorphism of the P-manifolds U and V leaving L 
intact. Using this isomorphism a function $\Psi$ defined 
on a Lagrangian submanifold $L \subset M$ determines 
another Lagrangian submanifold $L_\Psi \subset M$. 

Consider a solution S to the classical master 
equation on M. In the Batalin-Vilkovisky formalism 
we have to restrict S to a Lagrangian submanifold 
$L \subset M$; the quantization of S can then be performed 
by integration of $\exp(iS/\hbar)$ over L. One may 
construct an odd vector field Q on L in such a 
way that the functional S restricted to L is 
Q-invariant. This invariance is BRST invariance. 

AKSZ apply these geometric constructions to obtain 
in a natural way the action functionals of two-dimensional 
sigma-models (Witten 1998) and to 
show that the Chern-Simons theory (Axelrod and 
Singer 1991) in Batalin-Vilkovisky formalism arises as 
a sigma-model with target space $\Pi G$, where G stands 
for a Lie algebra and $\Pi$ denotes parity inversion. 


The Poisson-Sigma Model 


The quantization of the Poisson-sigma model was 
performed by Hirshfeld and Schwarzweller (2000) 
and by Cattaneo and Felder (2001). The Poisson-sigma 
model is the simplest topological field theory 
in two dimensions. It is a field theory on a two-dimensional 
world sheet without boundary (Schaller 
and Strobl 1994). It involves a set of bosonic scalar 
fields, which can be seen as a set of maps 
$X^i : M \to N$, where N is a Poisson manifold. In 
addition, one has a 1-form A on the world sheet M 
which takes values in $T^*(N)$; for coordinates $x^\mu$ on M 
we have $A = A_{\mu i}\,dx^\mu \wedge dX^i$. Its action is 


So[X, A] = J u(t” (Au X + P'(X)A pA) [64] 
M 


where $\epsilon^{\mu\nu}$ is the antisymmetric tensor and $\mu$ is the 
volume form on M. The gauge transformations of 


the model are 
6X'=Pi(X)ej, — 6Ayi = DI; [65] 


where D’; = 8,8; + P” ;A„p. The equations of motion 
are 


e” D Ay =0 [66] 
and 
e” (8X? + PIA,;) = +” D, X = 0 [67] 
The gauge algebra is given by 
[5(€1), 6(€2) |X? = P” (P”” jEtnE2m) 
[6(€1), 6(€2) Ani = D u (P”” jE1nE2m) [68] 
= (EVD X Jeon. "JEn 


In our general notation the generators of the gauge 
transformations R are here P” and D',. The gauge 
tensors T and E are P”, and ep, P” ji. The higher- 
order gauge tensors A and B vanish. 

The ghost fields are again denoted by C’. The 
Noether identities are then 


J TaT +- (e”D, XDK) C=0 169] 


Considering the commutator of two gauge transfor- 
mations leads to (see eqns [8]-[11]) 


J CO a POP" VCC, 0 
M 
J u(2(P:D!, 4 PO AP) [70] 
M 
DPU a (D Xew P i) CC.=0 
The Jacobi identity is 
Pi gk“ CCC a0 [71] 
The fields and antifields of the model are 
o* = {A™,X',C;} and 6% = {A",X3,C*} [72] 


The extended action is 
S= J (Apax + P!(X)A,;A,;) 
M 
a, : 1o. 
+ AMD) Cj + X}P"(X)G +5 C* PI (X)CCy 
1... 
+ 7AA euP i(X)Cy cr] [73] 


The gauge-fixing conditions are taken to be of the 
form $\chi_i(A, X)$, so that the gauge fermion [50] becomes 
$\Psi = \bar C^i\chi_i(A, X)$. The antifields are then fixed to be 
= Oxj(A, X) 

A*, = G 

"moè T OA 


ars [74] 
Cr=0 
C; = xi(A, X) 


The gauge-fixed action is 
Sy = / 1( el” (Auð X + P” (X)A iA) 
M 
ES Ck OxR(A, X) 2 7 ôxk(A, X) 
OA pi ut OX! 


1 =n ON A, X) 
t3“ ~ aA, 


PIC; 


Xul A, X) 


C” ðA, w y (X) 


XCC + xi(A,X)) [73] 


Now consider different gauge conditions: 


1. First, the Landau gauge for the gauge potential
$\chi_i = \partial^\mu A_{\mu i}$, so that the gauge fermion becomes
$\Psi = \bar{C}^i \partial^\mu A_{\mu i}$. The antifields are fixed to be


At = Ot 
Cao a0 [76] 
Ci = OMA; 


for this gauge choice the gauge-fixed action is 
= J 1 ( "(Ay X’ + Pi(X)AyA,j) + ČO" D'C; 
1 mi V 
+g (C(I Ce P (X) 
x C,C; — 7(0"Ay) [77] 


Translating this action into the notation of Cattaneo 
and Felder, one sees that it is exactly the expression 
they use to derive the perturbation series. 

2. Now consider the temporal gauge $\chi_i = A_{0i}$. The
gauge fermion is given by $\Psi = \bar{C}^i A_{0i}$. The antifields
are fixed to

$$A^{*0i} = \bar{C}^i, \qquad A^{*1i} = 0, \qquad X^*_i = C^{*i} = 0, \qquad \bar{C}^*_i = A_{0i} \qquad [78]$$
The gauge-fixed action is 
Sy = J phe (A ðX + PH(X)AyiAry) 
M 
OD G= T (Aoi)) [79] 


3. Finally consider the Schwinger-Fock gauge
$\chi_i = x^\mu A_{\mu i}$. Then the antifields are fixed to be

$$A^{*\mu i} = x^\mu \bar{C}^i, \qquad X^*_i = C^{*i} = 0, \qquad \bar{C}^*_i = x^\mu A_{\mu i} \qquad [80]$$


for this gauge choice the gauge-fixed action is
SyS J (el (Auð X + P”(X)A iA)
M


+O D G- 7#(O"A,i) 81]

Notice that in the noncovariant gauges 2 and 3 the
action simplifies, in that the term which arose
because of the nonclosed nature of the gauge algebra
vanishes.


See also: BF Theories; BRST Quantization; Constrained
Systems; Graded Poisson Algebras; Operads;
Perturbative Renormalization Theory and BRST;
Supermanifolds; Topological Sigma Models.


Introduction


The Bethe ansatz is a particular form of wave function
introduced in the diagonalization of the Heisenberg
spin chain. It underpins the majority of exactly solved
models in statistical mechanics and quantum field
theory. At the heart of the Bethe ansatz is the way in
which multibody interactions factor into two-body
interactions. The Bethe ansatz is thus intimately
entwined with the theory of integrability.

The way in which the Bethe ansatz works is best 
understood by working through an explicit hands-on 
example. The canonical example is the isotropic 
antiferromagnetic Heisenberg Hamiltonian 

$$H = \sum_{i=1}^{L-1} h_{i,i+1} + h_{L,1}, \qquad h_{ij} = \tfrac{1}{2}\left( \sigma_i \cdot \sigma_j + 1 \right) \qquad [1]$$

where $\sigma = (\sigma^x, \sigma^y, \sigma^z)$ are Pauli matrices and L is the
length of the chain. Periodic boundary conditions are 
imposed. However, open boundary conditions may 
also be treated, along with the addition of magnetic 
bulk and boundary fields. The z-components of each 
of the spins are either up or down. Since the 
z-component of the total spin commutes with the 
Hamiltonian, the total number n of up spins serves as a 
good quantum number. A state of the system can 
therefore be conveniently described in terms of the 
coordinates of all the up spins. Denote these coordinates
by $x_i$, with $1 \le x_i \le L$. The quantum number n
ensures that the Hamiltonian decomposes into L + 1
sectors, each of size $\binom{L}{n}$. The antiferromagnetic
ground state occurs in the largest sector. 

The normalization of the Hamiltonian [1] is such 
that its action is that of the permutation operator: 


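$$h\,|{+}{+}\rangle = |{+}{+}\rangle, \qquad h\,|{+}{-}\rangle = |{-}{+}\rangle, \qquad h\,|{-}{+}\rangle = |{+}{-}\rangle, \qquad h\,|{-}{-}\rangle = |{-}{-}\rangle \qquad [2]$$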


Diagonalization of Sectors 


One can address the diagonalization of the sectors 
for various cases. 


Case 1: n=0 

Consider the case with all spins down. The
eigenstate is $\Psi = |{-}{-}\cdots{-}\rangle$, with $H\Psi = L\Psi$ and,
thus, E = L is the trivial solution.
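Because each bond term of [1] acts as a permutation, the action of H on a spin configuration can be tabulated directly. The following is a minimal sketch and not part of the original derivation: the function names `apply_h` and `sector` and the chain length L = 6 are illustrative assumptions. It applies the L periodic nearest-neighbour permutations to a basis state, confirms that the all-down state has E = L, and counts the states in each sector of fixed n.

```python
from itertools import combinations
from math import comb

def apply_h(state):
    """Apply H of eqn [1] to a basis state (tuple of +1/-1 spins).

    Each bond term h acts as a permutation (eqn [2]), so H maps a basis
    state to a sum of basis states; the result is {state: coefficient}.
    """
    L = len(state)
    result = {}
    for i in range(L):
        j = (i + 1) % L                      # periodic bond (i, i+1)
        swapped = list(state)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        key = tuple(swapped)
        result[key] = result.get(key, 0) + 1
    return result

def sector(L, n):
    """All basis states of a chain of length L with exactly n up spins."""
    return [tuple(+1 if x in ups else -1 for x in range(L))
            for ups in combinations(range(L), n)]

L = 6                                        # illustrative chain length
all_down = (-1,) * L
print(apply_h(all_down))                     # one entry, with coefficient L
sector_sizes = [len(sector(L, n)) for n in range(L + 1)]
print(sector_sizes == [comb(L, n) for n in range(L + 1)])
```

Since a permutation never changes the number of up spins, every state produced by `apply_h` stays in the sector of its input, which is the block structure used in the cases that follow.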

Case 2: n=1 


There are L states, with

$$\Psi = \sum_{x=1}^{L} a(x)\,|\psi(x)\rangle \qquad [3]$$

where $|\psi(x)\rangle$ is the state with an up spin at site x.
The aim is to find the amplitudes a(x). It is clear
that

$$H\,|\psi(x)\rangle = (L - 2)\,|\psi(x)\rangle + |\psi(x-1)\rangle + |\psi(x+1)\rangle \qquad [4]$$

in the bulk (away from either boundary). Insertion
of [3] into $H\Psi = E\Psi$ gives

$$E\,a(x) = (L - 2)\,a(x) + a(x-1) + a(x+1) \qquad [5]$$

Substitution of spin waves $a(x) = e^{ikx}$ gives

$$E = L - 2 + 2\cos k \qquad [6]$$

The boundary conditions are such that a(0) = a(L)
and a(L + 1) = a(1); either gives $e^{ikL} = 1$, from which
the L values of k follow.
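The n = 1 result can be checked numerically. The sketch below is an illustration only; the function name `spin_wave_check` and the chain length 8 are assumptions. For each allowed k = 2πm/L it verifies that the spin wave satisfies the recursion [5] with the energy [6], together with the periodicity conditions.

```python
import cmath
import math

def spin_wave_check(L):
    """Return (energies, worst residual of eqn [5]) for the n = 1 sector."""
    energies = []
    worst = 0.0
    for m in range(L):
        k = 2 * math.pi * m / L              # solves exp(i k L) = 1
        a = lambda x, k=k: cmath.exp(1j * k * x)
        E = L - 2 + 2 * math.cos(k)          # eqn [6]
        for x in range(1, L + 1):
            lhs = E * a(x)
            rhs = (L - 2) * a(x) + a(x - 1) + a(x + 1)
            worst = max(worst, abs(lhs - rhs))
        # periodicity: a(0) = a(L) and a(L + 1) = a(1)
        worst = max(worst, abs(a(0) - a(L)), abs(a(L + 1) - a(1)))
        energies.append(E)
    return energies, worst

energies, worst = spin_wave_check(8)
print(max(energies), min(energies), worst)
```

The largest energy, E = L, occurs at k = 0 and the smallest, E = L − 4, at k = π.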


Case 3: n=2 


Here the wave function can be written in terms of 
the two flipped spins as 


$$\Psi = \sum_{x<y} a(x, y)\,|\psi(x, y)\rangle \qquad [7]$$
It is to be emphasized that one is working in the 
region with x < y. There are two cases to consider: 
(1) y>x+1 and (2) y=x+1. Consider the 
interactions in the bulk. For (1) the action of the 
Hamiltonian implies 
$$E\,a(x,y) = (L-4)\,a(x,y) + a(x-1,y) + a(x+1,y) + a(x,y-1) + a(x,y+1) \qquad [8]$$
and for (2)
$$E\,a(x,x+1) = (L-2)\,a(x,x+1) + a(x-1,x+1) + a(x,x+2) \qquad [9]$$
The compatibility of these two equations requires that
$$2a(x,x+1) = a(x,x) + a(x+1,x+1) \qquad [10]$$


which is known as the “collision” or “meeting” 
condition. 

Some adjustments need to be made for spins 
which get flipped at the boundaries. Looking at 
[8] and [9] with x=1 and x= L, it is evident that 
one can take 


$$a(y, x + L) = a(x, y) \qquad [11]$$


to restore the original ordering. The terms which 
arise involve up spins at sites 0 and L +1. This 
illustrates the periodic boundary condition. 

We now assume (the Bethe ansatz) that 


$$a(x, y) = A_{12}\, e^{i(k_1 x + k_2 y)} + A_{21}\, e^{i(k_2 x + k_1 y)} \qquad [12]$$


Substitution of the ansatz [12] into [8] gives 


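$$E = L - 4 + 2\cos k_1 + 2\cos k_2 \qquad [13]$$
Substitution of [12] into [10] gives
$$\frac{A_{12}}{A_{21}} = -\,\frac{1 - 2e^{ik_1} + e^{i(k_1+k_2)}}{1 - 2e^{ik_2} + e^{i(k_1+k_2)}} \qquad [14]$$
The three relations [11], [12], and [14] give the
Bethe equations

$$e^{ik_1 L} = \frac{A_{12}}{A_{21}} \quad\text{and}\quad e^{ik_2 L} = \frac{A_{21}}{A_{12}} \qquad [15]$$

which are to be solved for $k_1$ and $k_2$. Note that
$e^{i(k_1+k_2)L} = 1$.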
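The two-particle ansatz can be tested without solving the Bethe equations, because the bulk equation [8] and the meeting condition [10] hold for arbitrary $k_1$, $k_2$ once the amplitude ratio [14] is imposed. The following sketch is illustrative only: the helper `s`, the function name `two_magnon_amplitude`, and the numerical values of $k_1$, $k_2$, and L are assumptions chosen for the check.

```python
import cmath
import math

def s(zi, zj):
    """Two-body factor 1 - 2 z_j + z_i z_j appearing in eqn [14]."""
    return 1 - 2 * zj + zi * zj

def two_magnon_amplitude(k1, k2):
    """Return a(x, y) of eqn [12] with A12/A21 fixed by eqn [14]."""
    z1, z2 = cmath.exp(1j * k1), cmath.exp(1j * k2)
    A21 = 1.0
    A12 = -s(z2, z1) / s(z1, z2)             # eqn [14]
    return lambda x, y: A12 * z1**x * z2**y + A21 * z2**x * z1**y

k1, k2, L = 0.7, 1.9, 10                     # arbitrary illustrative values
a = two_magnon_amplitude(k1, k2)
E = L - 4 + 2 * math.cos(k1) + 2 * math.cos(k2)    # eqn [13]

# Meeting condition [10] at several sites.
meeting = max(abs(2 * a(x, x + 1) - a(x, x) - a(x + 1, x + 1))
              for x in range(1, 6))
# Bulk equation [8] for well-separated up spins (y > x + 1).
bulk = max(abs(E * a(x, y) - ((L - 4) * a(x, y) + a(x - 1, y) + a(x + 1, y)
                              + a(x, y - 1) + a(x, y + 1)))
           for x in range(1, 4) for y in range(x + 2, 8))
print(meeting, bulk)
```

Both residuals vanish to rounding error; only the periodicity condition [11] restricts the momenta, which is the content of [15].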


Case 4: n=3 


The full power of the Bethe ansatz method becomes 
evident for three particles. Here 


$$\Psi = \sum_{x<y<z} a(x, y, z)\,|\psi(x, y, z)\rangle \qquad [16]$$


There are several cases to consider: 


1. y > x + 1 and z > y + 1, where

$$E\,a(x,y,z) = (L-6)\,a(x,y,z) + a(x \pm 1,y,z) + a(x,y \pm 1,z) + a(x,y,z \pm 1) \qquad [17]$$

By $a(x \pm 1, y, z)$, we mean $a(x+1,y,z) + a(x-1,y,z)$, etc.

2. y = x + 1 and z > y + 1, with

$$E\,a(x,x+1,z) = (L-4)\,a(x,x+1,z) + a(x-1,x+1,z) + a(x,x+2,z) + a(x,x+1,z \pm 1) \qquad [18]$$

3. y > x + 1 and z = y + 1, where

$$E\,a(x,y,y+1) = (L-4)\,a(x,y,y+1) + a(x \pm 1,y,y+1) + a(x,y-1,y+1) + a(x,y,y+2) \qquad [19]$$

4. y = x + 1 and z = y + 1, for which

$$E\,a(x,x+1,x+2) = (L-2)\,a(x,x+1,x+2) + a(x-1,x+1,x+2) + a(x,x+1,x+3) \qquad [20]$$


Again, we must ensure that these equations are 
compatible. This involves comparison of the last 
three equations with [17]. The three equations to be 
satisfied are 


$$2a(x,x+1,z) = a(x,x,z) + a(x+1,x+1,z) \qquad [21]$$

$$2a(x,y,y+1) = a(x,y,y) + a(x,y+1,y+1) \qquad [22]$$

$$4a(x,x+1,x+2) = a(x,x,x+2) + a(x,x+1,x+1) + a(x,x+2,x+2) + a(x+1,x+1,x+2) \qquad [23]$$


But note that setting z=x + 2 in [21] and y=x+1 
in [22] leads to [23] being automatically satisfied. 
We are thus left with only two equations [21] and 
[22]. Note the similarity between these two equa- 
tions and the meeting condition [10] for the n=2 
case.

In this case the Bethe ansatz is
$$a(x, y, z) = A_{123}\, z_1^x z_2^y z_3^z + A_{132}\, z_1^x z_3^y z_2^z + A_{213}\, z_2^x z_1^y z_3^z + A_{231}\, z_2^x z_3^y z_1^z + A_{312}\, z_3^x z_1^y z_2^z + A_{321}\, z_3^x z_2^y z_1^z \qquad [24]$$
in which $z_j = e^{ik_j}$. This is a sum over the 3!
permutations of the integers 1, 2, 3. Inserting this
ansatz into [17] gives

$$E = L - 6 + 2(\cos k_1 + \cos k_2 + \cos k_3) \qquad [25]$$
To determine the $k_j$, it is convenient to define
$$s_{ij} = 1 - 2z_j + z_i z_j \qquad [26]$$


Substitution of [24] into the meeting conditions [21] 
and [22] then gives 


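$$s_{12}A_{123} + s_{21}A_{213} + s_{13}A_{132} + s_{31}A_{312} + s_{23}A_{231} + s_{32}A_{321} = 0 \qquad [27]$$
$$s_{23}A_{123} + s_{32}A_{132} + s_{13}A_{213} + s_{31}A_{231} + s_{21}A_{321} + s_{12}A_{312} = 0 \qquad [28]$$

These equations are assumed to be satisfied in
permutation pairs, that is,

$$s_{12}A_{123} + s_{21}A_{213} = 0, \qquad s_{23}A_{123} + s_{32}A_{132} = 0, \quad \text{etc.} \qquad [29]$$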


Up to an overall constant, the relations [27] and [28] 
are satisfied by 


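$$A_{123} = s_{21}s_{31}s_{32}, \qquad A_{132} = -s_{31}s_{21}s_{23}$$
$$A_{312} = s_{13}s_{23}s_{21}, \qquad A_{321} = -s_{23}s_{13}s_{12} \qquad [30]$$
$$A_{231} = s_{32}s_{12}s_{13}, \qquad A_{213} = -s_{12}s_{32}s_{31}$$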


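The boundary condition, a(y, z, x + L) = a(x, y, z),
gives

$$(z_1^L A_{231} - A_{123})\, z_1^x z_2^y z_3^z + (z_1^L A_{321} - A_{132})\, z_1^x z_3^y z_2^z + (z_2^L A_{132} - A_{213})\, z_2^x z_1^y z_3^z$$
$$+\, (z_2^L A_{312} - A_{231})\, z_2^x z_3^y z_1^z + (z_3^L A_{123} - A_{312})\, z_3^x z_1^y z_2^z + (z_3^L A_{213} - A_{321})\, z_3^x z_2^y z_1^z = 0 \qquad [31]$$
This leads to the equations

$$z_1^L = \frac{A_{123}}{A_{231}} = \frac{A_{132}}{A_{321}} = \frac{s_{21}s_{31}}{s_{12}s_{13}}$$
$$z_2^L = \frac{A_{213}}{A_{132}} = \frac{A_{231}}{A_{312}} = \frac{s_{12}s_{32}}{s_{21}s_{23}} \qquad [32]$$
$$z_3^L = \frac{A_{321}}{A_{213}} = \frac{A_{312}}{A_{123}} = \frac{s_{13}s_{23}}{s_{31}s_{32}}$$

which can be solved for the Bethe roots $k_j$.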
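The three-particle amplitudes can be checked against the meeting conditions numerically. The sketch below is an illustration under stated assumptions: the function names, the 0-based labelling of the momenta, and the sample values of $k_j$ are not from the original text. It builds the amplitudes as a signed product of factors $s_{ij}$ (which for n = 3 reproduces [30]), evaluates the ansatz [24], and tests [21], [22], [23] and the first ratio in [32].

```python
import cmath
from itertools import permutations

def signature(p):
    """Sign of a permutation given as a tuple of 0-based indices."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def s(z, i, j):
    """s_ij = 1 - 2 z_j + z_i z_j (eqn [26]), with 0-based labels."""
    return 1 - 2 * z[j] + z[i] * z[j]

def amplitudes(z):
    """A_P = sign(P) * product over i < j of s_{p_j, p_i}."""
    n = len(z)
    A = {}
    for p in permutations(range(n)):
        value = signature(p)
        for i in range(n):
            for j in range(i + 1, n):
                value *= s(z, p[j], p[i])
        A[p] = value
    return A

def wavefunction(z, A, coords):
    """Bethe ansatz amplitude as a sum over permutations (eqn [24])."""
    total = 0
    for p, amp in A.items():
        term = amp
        for site, label in zip(coords, p):
            term *= z[label] ** site
        total += term
    return total

z = [cmath.exp(1j * k) for k in (0.3, 1.1, 2.0)]   # arbitrary momenta
A = amplitudes(z)
a = lambda *coords: wavefunction(z, A, coords)

res21 = abs(2 * a(2, 3, 6) - a(2, 2, 6) - a(3, 3, 6))          # eqn [21]
res22 = abs(2 * a(1, 4, 5) - a(1, 4, 4) - a(1, 5, 5))          # eqn [22]
res23 = abs(4 * a(2, 3, 4) - a(2, 2, 4) - a(2, 3, 3)
            - a(2, 4, 4) - a(3, 3, 4))                         # eqn [23]
ratio = A[(0, 1, 2)] / A[(1, 2, 0)]                            # A_123 / A_231
expected = s(z, 1, 0) * s(z, 2, 0) / (s(z, 0, 1) * s(z, 0, 2))
print(res21, res22, res23, abs(ratio - expected))
```

As in the two-particle case, the meeting conditions hold for any momenta; the quantization of the $k_j$ comes only from the boundary condition.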
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General n 


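The general Bethe ansatz is
$$a(x_1, \ldots, x_n) = \sum_P A_{p_1 \ldots p_n}\, z_{p_1}^{x_1} \cdots z_{p_n}^{x_n} \qquad [33]$$

where the sum is over all n! permutations
$P = \{p_1, \ldots, p_n\}$ of the integers 1, ..., n. The boundary
condition is
$$a(x_2, \ldots, x_n, x_1 + L) = a(x_1, \ldots, x_n) \qquad [34]$$

leading to the Bethe equations

$$z_{p_1}^L = \frac{A_{p_1 p_2 \ldots p_n}}{A_{p_2 \ldots p_n p_1}} \qquad [35]$$

with
$$A_{p_1 \ldots p_n} = \epsilon_P \prod_{1 \le i < j \le n} s_{p_j, p_i} \qquad [36]$$

where $\epsilon_P$ is the signature of the permutation. Finally,
$$z_j^L = (-1)^{n-1} \prod_{\ell = 1,\, \ell \ne j}^{n} \frac{s_{\ell j}}{s_{j \ell}} \qquad [37]$$
for j = 1, ..., n. The eigenvalues are given by

$$E = L + \sum_{j=1}^{n} (2\cos k_j - 2) \qquad [38]$$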


Another form of the Bethe equations is obtained 
by defining 


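$$e^{ik_j} = \frac{u_j - \tfrac{1}{2}i}{u_j + \tfrac{1}{2}i} \qquad [39]$$
which gives
$$E = L - \sum_{j=1}^{n} \frac{1}{u_j^2 + 1/4} \qquad [40]$$
with $u_j$ satisfying
$$\left( \frac{u_j - \tfrac{1}{2}i}{u_j + \tfrac{1}{2}i} \right)^{L} = -\prod_{\ell=1}^{n} \frac{u_j - u_\ell - i}{u_j - u_\ell + i} \qquad [41]$$

for j = 1, ..., n.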
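The passage from momenta to the variables $u_j$ rests on two elementary identities, which the following sketch checks numerically. It is an illustration only; the function names and the sample values of $u_j$ and $u_\ell$ are assumptions. The first identity gives the energy term in [40]; the second expresses the two-body ratio $s_{\ell j}/s_{j\ell}$ in the new variables.

```python
import cmath

def z_of_u(u):
    """Eqn [39]: z = exp(i k) expressed through the variable u."""
    return (u - 0.5j) / (u + 0.5j)

def s(zi, zj):
    """s_ij = 1 - 2 z_j + z_i z_j (eqn [26])."""
    return 1 - 2 * zj + zi * zj

u_j, u_l = 0.37, -1.2                        # arbitrary real sample values
z_j, z_l = z_of_u(u_j), z_of_u(u_l)

# Energy contribution: 2 cos k - 2 = z + 1/z - 2 when |z| = 1.
energy_term = (z_j + 1 / z_j - 2).real
energy_check = abs(energy_term + 1 / (u_j**2 + 0.25))

# Two-body ratio s_lj / s_jl written in terms of u.
lhs = s(z_l, z_j) / s(z_j, z_l)
rhs = (u_l - u_j + 1j) / (u_j - u_l + 1j)
print(energy_check, abs(lhs - rhs))
```

Each ratio equals $-(u_j - u_\ell - i)/(u_j - u_\ell + i)$, so the n − 1 minus signs cancel the prefactor $(-1)^{n-1}$; the overall minus sign in [41] then comes from including the term $\ell = j$ in the product.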

All eigenvalues of the Heisenberg spin chain may 
be obtained in terms of the Bethe ansatz solution. 
For example, the roots $u_j$ for the
ground state are real and distributed symmetrically about the
origin. Excitations may involve complex roots.
Although obtained exactly in terms of the Bethe 
roots, the Bethe ansatz wave function is 
cumbersome. 

We have thus seen how the Bethe ansatz works 
for the Heisenberg spin chain. The underlying 
mechanism is the way in which the collision or 


meeting conditions can be handled in terms of two-body
interactions. To see this more clearly, the six
permutation pair equations [29] can be written in
the general form $A_{abc} = Y_{ab} A_{bac}$ and $A_{abc} = Y_{bc} A_{acb}$,
where $Y_{ab} = -s_{ba}/s_{ab}$. Now there are two possible
paths to get from $A_{abc}$ to $A_{cba}$, namely

$$A_{abc} = Y_{ab} Y_{ac} Y_{bc}\, A_{cba}$$
$$A_{abc} = Y_{bc} Y_{ac} Y_{ab}\, A_{cba} \qquad [42]$$

Both paths must be equivalent, with
$$Y_{ab} Y_{ba} = 1 \quad\text{and}\quad Y_{ab} Y_{ac} Y_{bc} = Y_{bc} Y_{ac} Y_{ab} \qquad [43]$$


The latter is a condition of nondiffraction or 
equivalently a manifestation of the Yang—Baxter 
equation. 

Historically, the next model to be exactly solved in 
terms of the Bethe ansatz was the one-dimensional 
model of N interacting bosons on a line of length L 
defined by the Hamiltonian 


$$H = -\sum_{i=1}^{N} \frac{\partial^2}{\partial x_i^2} + 2c \sum_{1 \le i < j \le N} \delta(x_i - x_j) \qquad [44]$$


where c is a measure of the interaction strength. For 
this model the Bethe ansatz wave function is of the 
same form as [33] with the two-body interaction 
term given by 


$$s_{ab} = k_a - k_b + ic \qquad [45]$$


The Bethe equations are given by 


$$\exp(i k_j L) = -\prod_{\ell=1}^{N} \frac{k_j - k_\ell + ic}{k_j - k_\ell - ic} \qquad \text{for } j = 1, \ldots, N \qquad [46]$$
The energy eigenvalue is
$$E = \sum_{j=1}^{N} k_j^2 \qquad [47]$$


For repulsive (c > 0) interactions, one can prove that 
all Bethe roots are real. 
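For repulsive interactions the Bethe equations [46] can be solved numerically by simple iteration. The sketch below is one possible approach, not a method described in the original text: it takes the logarithm of [46] (valid for c > 0), iterates the resulting fixed-point map, and then checks the roots against [46] directly. The function names, the symmetric choice of quantum numbers $I_j$ (intended to target the ground state), and the parameter values are assumptions.

```python
import cmath
import math

def solve_bethe(N, L, c, iterations=500):
    """Fixed-point iteration for the logarithmic form of eqn [46].

    Taking logarithms (for c > 0) gives
        k_j L = 2 pi I_j - 2 sum_l atan((k_j - k_l) / c),
    with I_j integers (N odd) or half-odd integers (N even).
    """
    I = [j - (N - 1) / 2 for j in range(N)]
    k = [2 * math.pi * Ij / L for Ij in I]   # free-particle starting guess
    for _ in range(iterations):
        k = [(2 * math.pi * I[j]
              - 2 * sum(math.atan((k[j] - k[l]) / c) for l in range(N))) / L
             for j in range(N)]
    return k

def bethe_residual(k, L, c):
    """Largest violation of eqn [46] over j."""
    worst = 0.0
    for kj in k:
        product = 1.0 + 0.0j
        for kl in k:
            product *= (kj - kl + 1j * c) / (kj - kl - 1j * c)
        worst = max(worst, abs(cmath.exp(1j * kj * L) + product))
    return worst

N, L, c = 3, 10.0, 1.0                       # illustrative parameters
roots = solve_bethe(N, L, c)
energy = sum(kj**2 for kj in roots)          # eqn [47]
print(roots, energy, bethe_residual(roots, L, c))
```

The roots come out real and symmetric about zero, in line with the statement above for c > 0. Plain iteration is only expected to converge when cL is not too small; for weak coupling a Newton-type solver would be a safer choice.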

The Bethe ansatz has been applied to a number of 
other and more general models, both for discrete 
spins and in the continuum. These include the 
anisotropic Heisenberg (XXZ) spin chain, for 
which the above working readily generalizes to 
trigonometric functions. The underlying ansatz [33] 
remains the same. One key generalization is the 
nested Bethe ansatz, which arises, for example, in 
the solution of the general N-state permutator 
model, the Hubbard model, and the Gaudin—Yang 
model of interacting fermions. For such models the 
nested Bethe ansatz involves an additional level of 
work to determine the amplitudes appearing in the 


wave function [33] due to higher symmetries. This 
results in Bethe equations involving different types 
or colors of roots. 

The exactly solved one-dimensional quantum spin 
chains may also be obtained from their two-dimen- 
sional classical counterparts — the vertex models. For 
example, the six-vertex model shares the same Bethe 
ansatz wave function and Bethe equations as the 
XXZ spin chain. The more general permutator 
Hamiltonians are related to multistate vertex models. 
One may also consider other spin-S models. 

The discussion in this article has centered on what is 
known as the coordinate Bethe ansatz. Another 
formulation is the algebraic Bethe ansatz, which was 
developed for the systematic treatment of the higher- 
spin models. In this formulation, operators create the 
Bethe states by acting on a vacuum. The algebraic 
Bethe ansatz goes hand-in-hand with the quantum 
inverse-scattering method. In all of the exactly solved 
Bethe ansatz models, it is possible to derive quantities 
like the ground-state energy per site via the root density 
method, which assumes that the Bethe roots form a 
uniform distribution in the infinite-size limit. The 
thermodynamics of the Bethe ansatz solvable models 
may also be calculated in a systematic fashion. 

Despite Bethe’s early optimism, the Bethe ansatz 
has not been extended to higher-dimensional 
systems. 


See also: Affine Quantum Groups; Eight Vertex and Hard 
Hexagon Models; Integrability and Quantum Field 
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Introduction 


BF theories are a class of gauge theories with a 
nontrivial metric-independent classical action. As 
such these theories are candidate topological field 
theories akin to the Chern—Simons theory in three 
dimensions, but in contrast to the Chern—Simons 
theory these exist and are well defined in arbitrary 
dimensions. 

The name “BF theories” derives from the fact
that, roughly (see [1] below and the subsequent
discussion for a more precise description), the action
of the BF theory takes the form $\int B \wedge F_A$, with $F_A$ the
curvature of a connection A and B a Lagrange
multiplier. The classical equations of motion imply
that A is flat, $F_A = 0$, and thus BF theories are
topological gauge theories of flat connections.

Abelian BF theories and their relation to topolo- 
gical invariants (the Ray-Singer torsion) were 
originally discussed by Schwarz (1978, 1979). In 
the context of the topological field theory, non- 
abelian BF theories were introduced in Horowitz 
(1989) and Blau and Thompson (1989, 1991). 

Since then, BF theories have attracted a lot of 
attention as simple toy-models of (topological) 
gauge theories, and also because of their relation- 
ships with the Chern—Simons theory, the Yang-Mills 
theory, and gauge-theory formulations of gravity, as 
well as because of the rather rich and intricate 
structure of their quantum theories. 

The purpose of this article is to provide an 
overview of these various features of BF theories. 
The standard reference for the basic classical and 
quantum properties of BF theories is Birmingham 
et al. (1991). 
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Basic Classical Properties of BF Theories 
Nonabelian BF Theories 


The classical action and equations of motion Typically,
the classical action of the BF theory takes
the form


$$S_{BF}(A, B) = \int_M \mathrm{tr}_{\mathfrak g}\, B \wedge F_A \qquad [1]$$

where $F_A$ is the curvature of a connection A on a
principal G-bundle $P \to M$ over an n-dimensional
manifold M, B is an ad-equivariant horizontal
(n − 2)-form on P, and $\mathrm{tr}_{\mathfrak g}$ (a trace) denotes an
ad-invariant nondegenerate scalar product on the
Lie algebra $\mathfrak g$ of the Lie group G. Generalizations of
this are possible, in particular, for G abelian or for
n = 3 and are mentioned below.

We consider $F_A$ and B as forms on M taking
values in the bundle of Lie algebras $\mathrm{ad}\,P = P \times_{\mathrm{ad}} \mathfrak g$
and refer to such objects as elements of $\Omega^*(M, \mathfrak g)$.
Then $\mathrm{tr}_{\mathfrak g}\, B \wedge F_A \in \Omega^n(M, \mathbb R)$ is a volume form on M.
In order to simplify the exposition, in the following 
we will mostly assume that G is compact semisimple 
and that M is compact without a boundary (even 
though relaxing any one of these conditions is 
possible and also of interest in its own right). 

Varying the action [1] with respect to A and B, 
one obtains the classical equations of motion 


$$F_A = 0, \qquad d_A B = 0 \qquad [2]$$
where
$$d_A B = dB + [A, B] \qquad [3]$$


is the covariant exterior derivative. In particular, 
therefore, the equations of motion imply that the 
connection A is flat. 


Gauge invariance For any n, the action [1] is 
invariant under G gauge transformations (vertical 
automorphisms of P) acting on A and B as 


$$A \to g^{-1} A g + g^{-1} dg, \qquad B \to g^{-1} B g \qquad [4]$$


(the latter is what is meant by the fact that B takes
values in ad P), because $F_A$ is also ad-equivariant,
$F_A \to g^{-1} F_A g$, and $\mathrm{tr}_{\mathfrak g}$ is ad-invariant. The infinitesimal
version of this statement is that the action is
invariant under the variations


$$\delta A = d_A \lambda, \qquad \delta B = [B, \lambda] \qquad [5]$$


where $\lambda \in \Omega^0(M, \mathfrak g)$ can (formally) be thought of as
an element of the Lie algebra of the group of gauge
transformations.

Gauge-fixing this symmetry can proceed in the
usual way (via the Faddeev-Popov or Becchi-Rouet-Stora-Tyutin
procedure), a typical gauge choice
being $d_{A_0} * (A - A_0) = 0$, where $A_0$ is a reference
connection and $*$ is the Hodge duality operator
corresponding to a choice of metric on M.


Local p-form symmetries For n = 2, the only local
symmetries of the BF action are the above G gauge
transformations. For n > 2, however, there are other
local symmetries associated with shifts of $B_p \in \Omega^p(M, \mathfrak g)$
with p = n − 2 > 0. Indeed, integration by
parts using Stokes' theorem and $\partial M = \emptyset$ shows that [1]
is invariant under


$$A \to A, \qquad B_p \to B_p + d_A \lambda_{p-1}, \qquad \lambda_{p-1} \in \Omega^{p-1}(M, \mathfrak g) \qquad [6]$$


For p = 1, $\lambda$ is a 0-form and the invariance follows.
For p > 1, however, the gauge parameter has, in
some sense, its own gauge invariance. Namely,
under the shift


$$\lambda_{p-1} \to \lambda_{p-1} + d_A \lambda_{p-2} \qquad [7]$$
one has
$$d_A \lambda_{p-1} \to d_A \lambda_{p-1} + [F_A, \lambda_{p-2}] \qquad [8]$$


Thus for $F_A = 0$, the shift [7] has no effect on the
local symmetry [6]. Likewise, for p > 2 the parameter
$\lambda_{p-2}$ itself has a similar invariance, etc. Since $F_A = 0$
is one of the classical equations of motion, the shift
symmetry [6] is what is called an “on-shell reducible
symmetry.” Gauge-fixing such symmetries is not
straightforward, and one generally appeals to the
Batalin-Vilkovisky formalism to accomplish this.


Diffeomorphisms and local symmetries One mani- 
festation of the general covariance of the BF action 
[1] is the on-shell equivalence of (infinitesimal) 
diffeomorphisms and (infinitesimal) local symmetries.
Diffeomorphisms are generated by the Lie
derivative $L_X$ along a vector field X. The action of
$L_X$ on differential forms is given by the Cartan
formula $L_X = d\,i_X + i_X\,d$, where $i_X$ is the operation
of contraction. The action of the Lie derivative on
A and B can be written in gauge covariant form as


$$L_X A = i_X F_A + d_A \lambda(X)$$
$$L_X B = i_X d_A B + [B, \lambda(X)] + d_A \lambda'(X) \qquad [9]$$


where $\lambda(X) = i_X A$ and $\lambda'(X) = i_X B$. This shows that
on-shell diffeomorphisms are equivalent to field- 
dependent gauge and p-form symmetries of the 
BF action. 


The classical moduli space The classical moduli 
space C=C(P, M, G) is the space of solutions to the 
classical equations of motion modulo the local 
symmetries of the action. Since the field content 


and the nature of the local symmetries of the BF 
theory depend strongly on the dimension n of M, the 
structure and interpretation of the classical moduli 
space also depend on n. 

For n = 2, by [5] the equation of motion [2] for
$B \in \Omega^0(M, \mathfrak g)$ says that A is invariant under the
infinitesimal gauge transformation generated by B.
Thus if A is “irreducible,” there are no nontrivial 
solutions for B and, away from reducible flat 
connections, the classical moduli space is just the 
moduli space of flat connections on $P \to M$ over the
surface M:


$$\mathcal{C}_2 = \mathcal{M}_{\mathrm{flat}}(P, G) \qquad [10]$$


This space may or may not be empty, depending on 
whether P admits flat connections or not. 

For n = 3, the equation of motion [2] for
$B \in \Omega^1(M, \mathfrak g)$ says that B is a tangent vector to the
space of flat connections at the flat connection A, in
the sense that under the variation $\delta A = B$, one has


$$\delta F_A = d_A B = 0 \qquad [11]$$


The local G gauge symmetry and the 1-form symmetry
[6] now imply that the moduli space of classical
solutions can be identified with the (co-)tangent bundle
of the moduli space of flat connections on $P \to M$
over the 3-manifold M:


$$\mathcal{C}_3 = T\mathcal{M}_{\mathrm{flat}}(P, G) \qquad [12]$$


In higher dimensions there appears to be less 
geometrical structure associated with BF theories, 
and all that can be said in general is that the tangent 
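space to $\mathcal{C}_n$ at a solution (A, B) of the equations of
motion [2] is the vector space:


$$T_{(A,B)}\mathcal{C}_n = H^1_A(M, \mathfrak g) \oplus H^{n-2}_A(M, \mathfrak g) \qquad [13]$$


where $H^k_A(M, \mathfrak g)$ are the cohomology groups of the
deformation complex


$$d_A : \Omega^*(M, \mathfrak g) \to \Omega^{*+1}(M, \mathfrak g) \qquad [14]$$


associated with the flat connection A, $F_A = (d_A)^2 = 0$.
When M is topologically of the form $M = \Sigma \times \mathbb R$
(where one can think of $\mathbb R$ as time), one has


$$T_{(A,B)}\mathcal{C}_n \cong H^1_A(\Sigma, \mathfrak g) \oplus H^{n-2}_A(\Sigma, \mathfrak g) \qquad [15]$$


This is naturally a symplectic vector space (necessary
for a phase space), the nondegenerate antisymmetric
pairing being given by Poincaré duality:


$$\omega([a_1], [b_1]; [a_2], [b_2]) = \int_\Sigma \mathrm{tr}_{\mathfrak g} \left( a_1 \wedge b_2 - a_2 \wedge b_1 \right) \qquad [16]$$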


Metric independence Perhaps the most important 
property of the action [1] is that, in contrast to, 
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for example, the usual Yang-Mills action for 
nonabelian gauge fields 


$$S_{YM} = \frac{1}{4g^2} \int_M \mathrm{tr}_{\mathfrak g}\, F_A \wedge * F_A \qquad [17]$$


it does not require a metric (or the corresponding 
Hodge duality operator x) for its formulation. This 
makes it a candidate action for a “topological field 
theory,” this term loosely referring to field theories 
which, in a suitable sense, do not depend on 
additional structures imposed on the underlying 
space(-time) manifold M, in this case a Riemannian 
structure. 

To establish that BF theories are “topological 
quantum field theories,” one needs to show that 
the partition function (and correlation functions) 
of the quantized BF theory are also metric 
independent. This is not completely automatic as 
typically the metric enters in the gauge fixing of 
the local symmetries of the action which is 
required to make the quantum theory well defined. 
The usual lore is that since the metric only enters 
through the gauge fixing and since the quantum 
theory should be independent of the choice of 
gauge, it should also be metric independent. In the 
case of nonabelian BF theories, the complexity of 
their local symmetries complicates the analysis 
somewhat, but it can nevertheless be shown that 
BF theories indeed define topological field theories 
also at the quantum level. 


Special Features of Abelian BF Theories 


All the features of nonabelian BF theories discussed 
above are, of course, also valid when G is abelian 
(with some obvious modifications and simplifica- 
tions). However, when G is abelian, a more general 
action than [1] is possible. Indeed, although there is
no obvious higher p-form analog of nonabelian
gauge fields, there is one in the abelian case G = U(1) or G = R,
and the condition $F_A \in \Omega^2(M, \mathbb R)$ can be relaxed. In
particular, one can consider the actions


$$S(n, p) = S(B_p, C_{n-p-1}) = \int_M B_p \wedge dC_{n-p-1} \qquad [18]$$


with $B_p \in \Omega^p(M, \mathbb R)$, $C_{n-p-1} \in \Omega^{n-p-1}(M, \mathbb R)$, and
$F_C = dC$ its (n − p)-form field strength. More generally,
one can also consider the hybrid action


$$S_A(n, p) = \int_M B_p \wedge d_A C_{n-p-1} \qquad [19]$$


where A is a fixed (nondynamical) flat G-connection,
$F_A = 0$, and B and C take values in the corresponding
adjoint bundle. This action can be considered as the
linearization of the nonabelian BF action [1] around
the flat connection A, and it reduces to the abelian BF
action [18] for $\mathfrak g = \mathbb R$.

The action is invariant under the (reducible) local 
symmetries 


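$$B_p \to B_p + d_A \lambda_{p-1}$$
$$C_{n-p-1} \to C_{n-p-1} + d_A \lambda'_{n-p-2} \qquad [20]$$


The space of solutions to the equations of motion
$d_A C = d_A B = 0$ modulo gauge symmetries is (cf. [13])
the finite-dimensional vector space


$$\mathcal{C}_{n,p} = H^p_A(M, \mathfrak g) \oplus H^{n-p-1}_A(M, \mathfrak g) \qquad [21]$$
which is naturally symplectic for $M = \Sigma \times \mathbb R$.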


Uses and Applications of Quantum 
Abelian BF Theories 


Quantization of Abelian BF Theories and the 
Ray-Singer Torsion 


We will now show that the partition function of 
the abelian BF theory (actually more generally that 
of the linearized nonabelian BF action [19]) is 
related to the Ray-Singer torsion of M. This 
requires some preparatory material on Gaussian 
path integrals, determinants, and gauge fixing that 
we present first. 

In order to simplify the exposition, we assume 
that there are no harmonic modes, either because 
they have been gauged away or because the
cohomology groups of $d_A$ are trivial, $H^k_A(M, \mathfrak g) = 0$;
that is, the deformation complex [14] is “acyclic.”


Laplacians, determinants, and the Ray-Singer
torsion Choosing a Riemannian metric g (and
Hodge duality operator $*$) on M, the twisted
Laplacian on p-forms is


$$\Delta_A^{(p)} = (d_A + d_A^*)^2 = d_A d_A^* + d_A^* d_A \qquad [22]$$


where $d_A^* = \pm * d_A *$ is the adjoint of $d_A$ with respect to
the scalar product on p-forms defined by $*$. This is an
elliptic operator whose determinant can be defined, for
example, by a $\zeta$-function regularization. Denoting the
(nonzero) eigenvalues of $\Delta_A^{(p)}$ by $\lambda_k^{(p)}$, its $\zeta$-function is


$$\zeta^{(p)}(s) = \sum_k \left( \lambda_k^{(p)} \right)^{-s} \qquad [23]$$


This converges for Re(s) sufficiently large and can be
analytically continued to a meromorphic function of
s analytic at s = 0, so that


$$\det \Delta_A^{(p)} = e^{-\zeta^{(p)\prime}(0)} \qquad [24]$$


is well defined. The Ray-Singer torsion of (M, g)
(with respect to the flat connection A) is then


defined by


$$T_A(M) = \prod_{p=0}^{n} \left( \det \Delta_A^{(p)} \right)^{-(-1)^p p/2} \qquad [25]$$


Even though this definition depends strongly on the
metric g on M, the Ray-Singer torsion has the
remarkable property of being independent of g. The
Ray-Singer torsion can be shown to be trivial
(essentially = 1 modulo zero-mode contributions)
in even dimensions, but is a nontrivial topological
invariant in odd dimensions. Henceforth, we will
suppress the dependence on M and denote the
n-dimensional Ray-Singer torsion by $T_A(n)$.


Gaussian path integrals and determinants The path
integral for abelian BF theories is modeled on the
usual formula for a $\delta$-function


$$\delta(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dt\, e^{itx} \qquad [26]$$


from which one deduces the Gaussian integral
formula


$$\frac{1}{(2\pi)^n} \int_{\mathbb R^n \times \mathbb R^n} d^n t\, d^n x\, e^{i t \cdot D x + i t \cdot J + i x \cdot K} = \int_{\mathbb R^n} d^n x\, \delta^n(Dx + J)\, e^{i x \cdot K} = \frac{1}{\det D}\, e^{-i K \cdot D^{-1} J} \qquad [27]$$


Here, we have assumed that the operator (matrix) D
is invertible. The model that one uses in the path
integral is that


$$\int d[\phi]\, d[\psi]\, e^{i \int_M \psi D \phi} = (\det D)^{\mp 1} \qquad [28]$$


where $\phi$ is a set of fields and the $\psi$ are a set of dual
fields with D again a nondegenerate operator. The
inverse determinant arises for Grassmann even fields
(as in [27]), while it is the determinant that appears
for Grassmann odd fields.


Gauge fixing: the Faddeev-Popov trick If the
action [19], $S_A(n, p) = \int B_p \wedge d_A C_{n-p-1}$, were nondegenerate,
its partition function could be defined
directly by [28]. However, because of gauge invariance
of the action, the kinetic term is degenerate and one
needs to eliminate the gauge freedom to obtain an (at
least formally) well-defined expression for the partition
function. Concretely, this degeneracy can be seen by
recalling that, when there are no harmonic forms (as we
have assumed), there is a unique orthogonal Hodge
decomposition of a p-form $B_p \in \Omega^p(M, \mathfrak g)$ into a sum of
a $d_A$-exact and a $d_A$-coexact form:


$$B_p = d_A \lambda_{p-1} + d_A^* \tau_{p+1} \qquad [29]$$


(and likewise for C). Evidently, the exact (longitudinal)
parts $d_A \lambda$ of B and C do not appear in the action, and
these are precisely the gauge-dependent parts of B and
C under the gauge transformation [20]. Gauge fixing
amounts to imposing a condition $\mathcal F(B_p) = 0$ on $B_p$ that
determines the longitudinal part uniquely in terms of
the transversal part $d_A^* \tau$. A natural condition is


$$d_A \lambda_{p-1} = 0 \;\Leftrightarrow\; \mathcal F(B_p) = d_A^* B_p = 0 \qquad [30]$$


A partition function that is independent of the gauge-fixing
condition results from inserting “1” in the form of


$$1 = \int_{\mathcal G} d[g]\, \delta\!\left( \mathcal F(B^g) \right) \Delta_{\mathcal F}(B) \qquad [31]$$


into the functional integral (the Faddeev-Popov
trick), where $\mathcal G$ is the gauge group. This defines the
Faddeev-Popov determinant $\Delta_{\mathcal F}$, and the functional
properties of the delta functional imply that $\Delta_{\mathcal F}$ is
the determinant of the operator that one obtains
upon gauge variation of $\mathcal F(B)$.

In the general case of reducible gauge symmetries,
the nature of the gauge group is complicated and
requires some more thought. In the irreducible case,
however, that is, for p = 1, the Lie algebra of the
gauge group can be identified with $\Omega^0(M, \mathfrak g)$, and
$\Delta_{\mathcal F}$ is the determinant of the operator:


$$\frac{\delta \mathcal F}{\delta B}\, d_A : \Omega^0(M, \mathfrak g) \to \Omega^0(M, \mathfrak g) \qquad [32]$$


For [30], this is simply the Laplacian on 0-forms,
and thus


$$\Delta_{\mathcal F} = \det \Delta_A^{(0)} \qquad [33]$$


The partition function Following the finite-dimensional
model, both the $\delta$-function implementing the
gauge-fixing condition and the Faddeev-Popov
determinant can be lifted into the exponential, the
former by a Lagrange multiplier $\pi$ [26], a Grassmann
even 0-form, and the latter by a pair of Grassmann
odd 0-forms $c$ and $\bar c$ [28], the ghost and antighost
fields, respectively. The sum of the classical action
and these gauge-fixing and ghost terms defines the
(BRST-invariant) “quantum action” $S_A^q(n, p)$, and the
partition function is


$$Z_A(n, p) = \int d[\Phi]\, e^{i S_A^q(n, p)[\Phi]} \qquad [34]$$
where $\Phi$ denotes collectively all the fields. Concretely,
when n = 2 and p = 0 (or, equivalently, p = 1),
the quantum action is


$$S_A^q(2, 0) = \int_M B_0\, d_A C_1 + \pi\, d_A * C_1 + \bar c * \Delta_A^{(0)} c \qquad [35]$$


Likewise, for n = 3 and p = 1 (the only other case
when the gauge symmetry is indeed irreducible),
both $B_1$ and $C_1$ require separate gauge fixing, and
the quantum action is


$$S_A^q(3, 1) = \int_M B_1 \wedge d_A C_1 + \pi\, d_A * C_1 + \bar c * \Delta_A^{(0)} c + \pi'\, d_A * B_1 + \bar c' * \Delta_A^{(0)} c' \qquad [36]$$


Formally, therefore, the two-dimensional partition
function is


$$Z_A(2, 0) = \frac{\det \Delta_A^{(0)}}{\det D_A} \qquad [37]$$
where $D_A$ is the operator:
d 
D, = ( ʻi ) :Q'(M, 9) 
xd Ax 
— 0°(M,g) © 2°(M, 9) 38] 


One can define the determinant of this operator as
the square root of the determinant of the operator
$D_A^* D_A = \Delta_A^{(1)}$ and therefore the partition function


$$Z_A(2, 0) = \det \Delta_A^{(0)} \left( \det \Delta_A^{(1)} \right)^{-1/2} = T_A(2) \qquad [39]$$


is equal to the two-dimensional Ray-Singer torsion
[25]. In this case, it is easy to see directly that the
even-dimensional Ray-Singer torsion is trivial, as
one could have equally well defined the determinant
of Da as the square root of the operator 
DaD% =A © AY), which implies Z4(2,0)=1. 

In three dimensions, the two pairs of ghosts each
contribute a $\det \Delta_A^{(0)}$, and thus


$$Z_A(3, 1) = \frac{\left( \det \Delta_A^{(0)} \right)^2}{\det D_A} \qquad [40]$$
where 
D4 = H o) : Q? (M, g) © Q' (M, 9) 
— 0°(M, g) 6 Q'(M, g) [41] 


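is the operator acting on the fields $(B_1, C_1, \pi, \pi')$. As
before, this operator can be diagonalized by squaring
it, $D_A^* D_A = \Delta_A^{(0)} \oplus \Delta_A^{(1)}$, and thus


$$Z_A(3, 1) = \left( \det \Delta_A^{(0)} \right)^{3/2} \left( \det \Delta_A^{(1)} \right)^{-1/2} = T_A(3) \qquad [42]$$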
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is again related to the (this time genuinely nontrivial) 
Ray-Singer torsion. 

In spite of the complications caused by reducible 
gauge symmetries, it can be shown that all of the 
above generalizes to arbitrary n and p, with the 
result that (for n odd) 


Za(n,p) = Ta (n) [43] 


confirming the topological nature of BF theories. 

In the nonabelian case, the situation is significantly 
more complicated because of the complexity of the 
classical moduli space, the (higher cohomology) zero 
modes, and the on-shell reducibility of the gauge 
symmetries. Nevertheless, ignoring all the zero modes 
except those of A, that is, except the moduli m of flat
connections A(m), the result is similar to that in the
abelian case, in that the partition function reduces to an
integral over the moduli space of flat connections, with
measure determined by the Ray-Singer torsion $T_{A(m)}$.


Linking Numbers as Observables of Abelian 
BF Theories 


With the exception of p=0, there are no interesting 
“local” observables (gauge-invariant functionals of the 
fields C and B) in the abelian BF theory, since the gauge- 
invariant field strengths dC and dB vanish by the 
equations of motion. (For p = 0, B is a gauge-invariant 
0-form and hence B(x) is a good local observable.) 
However, as in the Chern-Simons and Yang-Mills 
theories, certain (weakly) nonlocal observables such as 
Wilson loops are also of interest. In the case at hand (eqn
[18]), we have abelian Wilson surface operators


$$W_S[B] = \int_S B, \qquad W_{S'}[C] = \int_{S'} C \qquad [44]$$


associated with p- and (n − p − 1)-dimensional submanifolds
S and S' of M, respectively. These operators
are gauge invariant, that is, invariant under the local
symmetries [20] provided that $\partial S = \partial S' = \emptyset$, so that S
and S' represent homology cycles of M.

For $M = \mathbb{R}^n$, correlation functions of these operators
are related to the topological linking number of
S and S′. We choose $S = \partial\Sigma$ and $S' = \partial\Sigma'$ to be
disjoint compact oriented boundaries of oriented
submanifolds Σ and Σ′ of $\mathbb{R}^n$. We also introduce
de Rham currents $\Delta_\Sigma$ and $\Delta_S$ (essentially distributional
differential forms with δ-function support on
Σ or S, respectively), characterized by the properties

$$\int_S \omega_p = \int_M \Delta_S \wedge \omega_p, \qquad \int_\Sigma \omega_{p+1} = \int_M \Delta_\Sigma \wedge \omega_{p+1} \qquad [45]$$

for all $\omega_p \in \Omega^p(M, \mathbb{R})$ and $\omega_{p+1} \in \Omega^{p+1}(M, \mathbb{R})$ (and likewise for S′ and Σ′).

Since the dimension of Σ is equal to the codimension
of $S' = \partial\Sigma'$, Σ and S′ will generically intersect
transversally at isolated points, and we define the
"linking number" of S and S′ to be the intersection
number of Σ and S′, expressed in terms of de Rham
currents as

$$L(S,S') = \int_{S'} \Delta_\Sigma = \int_M \Delta_{S'} \wedge \Delta_\Sigma \qquad [46]$$


In terms of de Rham currents, the Wilson surface
operators can be written as $W_S[B] = \int_M \Delta_S \wedge B$, etc.
Thus, the generating functional for correlation
functions of Wilson surface operators

$$\left\langle e^{i\beta W_S[B]}\, e^{i\alpha W_{S'}[C]} \right\rangle \qquad [47]$$


is simply a Gaussian path integral. Using the 
defining properties of de Rham currents, this can 
be formally evaluated (using [27]) to give 


(eif Ws [B] piaWy [C] J= etiapL(S,s’) |48] 


As expected, correlation functions of these topolog- 
ical field theories encode topological information. 
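
For n = 3 and p = 1, both S and S′ are closed curves in three-dimensional space, and the linking number above reduces to the classical Gauss linking integral. The following is a minimal numerical sketch of that special case, not part of the original treatment: the helper names `circle` and `gauss_linking_number` are hypothetical, the curves are approximated by polygons, and the double integral is evaluated by a midpoint rule. The overall sign depends on the orientations chosen for the two curves.

```python
import math

def circle(center, radius, plane, n=200):
    """Polygonal approximation of a circle lying in the coordinate plane `plane`."""
    i, j = plane
    points = []
    for k in range(n):
        angle = 2 * math.pi * k / n
        point = list(center)
        point[i] += radius * math.cos(angle)
        point[j] += radius * math.sin(angle)
        points.append(tuple(point))
    return points

def gauss_linking_number(curve_a, curve_b):
    """Midpoint-rule approximation of the Gauss linking integral of two closed polygons."""
    total = 0.0
    for a0, a1 in zip(curve_a, curve_a[1:] + curve_a[:1]):
        da = [a1[k] - a0[k] for k in range(3)]          # edge vector of curve A
        ma = [(a1[k] + a0[k]) / 2 for k in range(3)]    # edge midpoint of curve A
        for b0, b1 in zip(curve_b, curve_b[1:] + curve_b[:1]):
            db = [b1[k] - b0[k] for k in range(3)]
            mb = [(b1[k] + b0[k]) / 2 for k in range(3)]
            r = [ma[k] - mb[k] for k in range(3)]
            cross = (da[1] * db[2] - da[2] * db[1],
                     da[2] * db[0] - da[0] * db[2],
                     da[0] * db[1] - da[1] * db[0])
            dist = math.sqrt(sum(c * c for c in r))
            total += sum(r[k] * cross[k] for k in range(3)) / dist ** 3
    return total / (4 * math.pi)

# Two unit circles forming a Hopf link, and a second pair that is far apart.
ring_a = circle((0.0, 0.0, 0.0), 1.0, (0, 1))
ring_b_linked = circle((1.0, 0.0, 0.0), 1.0, (0, 2))
ring_b_unlinked = circle((5.0, 0.0, 0.0), 1.0, (0, 2))

linked_value = gauss_linking_number(ring_a, ring_b_linked)
unlinked_value = gauss_linking_number(ring_a, ring_b_unlinked)
```

For the linked pair the result is close to an integer of modulus one, while for the separated pair it is close to zero, illustrating that the value depends only on the topology of the configuration.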


Uses and Applications of Classical 
Nonabelian BF Theories 


Low-dimensional BF theories are closely related to 
other theories of interest, for example, the Yang- 
Mills theory, the Chern—Simons theory, and gravity. 
Here, we briefly review some of these relationships. 
In order to avoid the complexities of quantum 
nonabelian BF theories, we focus on their classical 
features. Brief suggestions for further reading are 
provided at the end of each subsection. 


Relation with Yang-Mills Theory 


In any dimension, the nonabelian BF action can be
regarded as the zero-coupling limit $g^2 \to 0$ of the
Yang-Mills theory since the Yang-Mills action [17]
can be written in first-order form as

$$\frac{1}{4g^2}\int_M \mathrm{tr}_G\, F_A \wedge *F_A = \int_M \mathrm{tr}_G \left[ i B_{n-2} \wedge F_A + g^2 B_{n-2} \wedge * B_{n-2} \right] \qquad [49]$$

However, whereas for $n \geq 3$ the $B^2$-term breaks the
p-form gauge invariance of the BF action (and thus
liberates the physical Yang-Mills degrees of freedom),
this limit is nonsingular in two dimensions
where this p-form symmetry is absent and, indeed,
both theories have zero physical degrees of freedom.


A nonsingular BF-like zero coupling limit of
the Yang-Mills theory for $n \geq 3$ can be obtained
by introducing an auxiliary (Stückelberg) field
$\eta \in \Omega^{n-3}(M,\mathfrak{g})$ which restores the p-form gauge
invariance. The resulting BF Yang-Mills action is

$$S_{BFYM} = \int_M \mathrm{tr}_G \left[ i B_{n-2} \wedge F_A + g^2 \left(B_{n-2} - \tfrac{1}{\sqrt{2} g} d_A\eta\right) \wedge * \left(B_{n-2} - \tfrac{1}{\sqrt{2} g} d_A\eta\right) \right] \qquad [50]$$

This action is not only invariant under ordinary G
gauge transformations, but also under the p-form
gauge symmetry $B \to B + d_A\lambda'$ [6] provided that η
transforms as $\eta \to \eta + \sqrt{2} g \lambda'$. Thus, this shift can be
used to set η to zero, upon which one recovers the
first-order form of the Yang-Mills action. Moreover,
in the zero-coupling limit all that survives is a
standard (and nontopological) minimal coupling of
η to the BF action:

$$\lim_{g^2 \to 0} S_{BFYM} = \int_M \mathrm{tr}_G \left[ i B_{n-2} \wedge F_A + \tfrac{1}{2} d_A\eta \wedge * d_A\eta \right] \qquad [51]$$

accounting for the correct number of degrees of
freedom of the Yang-Mills theory (the (n − 3)-form
η being absent for n = 2).

Two-dimensional quantum BF and Yang-Mills 
theories have a variety of interesting topological 
properties. An account of some of them can be found 
in Blau and Thompson (1994) and Witten (1991). For 
a detailed discussion of the gauge symmetries and gauge 
fixing of the BFYM action, see Cattaneo et al. (1998). 


Chern-Simons Theory, Gravity, and (Deformed) 
BF Theory 


The Chern-Simons theory is a three-dimensional
gauge theory. The Chern-Simons action for an
H-connection C, H the gauge group, is

$$S_{CS}(C) = \int_M \mathrm{tr}_H \left( C \wedge dC + \tfrac{2}{3} C \wedge C \wedge C \right) \qquad [52]$$

It is invariant under the infinitesimal gauge transformations
$\delta C = d_C \Lambda$, $\Lambda \in \Omega^0(M, \mathfrak{h})$, and the gauge-invariant
equation of motion is the flatness condition $F_C = 0$.
Now let H = TG be the tangent bundle group
$TG \simeq G \times_s \mathfrak{g}$. This is a semidirect product group
with G acting on $\mathfrak{g}$ via the adjoint and $\mathfrak{g}$ regarded
as an abelian Lie algebra of translations. Thus, in
terms of generators $(J_a, P_a)$, where the $J_a$ are
generators of G, the commutation relations are
$[J_a, J_b] = f_{ab}{}^{c} J_c$, $[J_a, P_b] = f_{ab}{}^{c} P_c$, and $[P_a, P_b] = 0$, and
the curvature of the TG-connection $C = J_a A^a + P_a B^a$ is

$$F_C = J_a F_A^a + P_a (d_A B)^a \qquad [53]$$


Thus, the equations of motion of the TG Chern- 
Simons theory are equivalent to the equations of 
motion [2] of the BF theory with gauge group G. 
This equivalence also holds at the level of the action: 


$$\tfrac{1}{2} S_{CS}(C) = S_{BF}(A, B) \qquad [54]$$


provided that one chooses the nondegenerate invar- 
iant scalar product to be 


$$\mathrm{tr}_{TG}(J_a P_b) = \mathrm{tr}_G(J_a J_b), \qquad \mathrm{tr}_{TG}(J_a J_b) = \mathrm{tr}_{TG}(P_a P_b) = 0 \qquad [55]$$

For G = SO(3), TG is the Euclidean group of
isometries of $\mathbb{R}^3$ and for G = SO(2,1), TG is the
Poincaré group of isometries of the three-dimensional
Minkowski space $\mathbb{R}^{2,1}$. For these gauge groups, the BF
action takes the form of the three-dimensional
(Euclidean or Lorentzian) Einstein-Hilbert action,
with the interpretation of B = e as the dreibein and
A = ω as the spin connection. The equations of motion
for e and ω express the vanishing of the torsion
and the Riemann tensor (equivalent to the vanishing
of the Ricci tensor for n = 3), respectively. This
Chern-Simons interpretation of three-dimensional
gravity extends to gravity with a cosmological
constant, with H the appropriate de Sitter or anti-de
Sitter isometry group (SO(4), SO(3, 1), or SO(2, 2),
depending on the signature and the sign of the
cosmological constant). In terms of the BF interpretation,
this corresponds to the simple topological
deformation

$$S_{\mu BF}(A, B) = \int_M \mathrm{tr}_G \left( B \wedge F_A + \tfrac{1}{3} \mu\, B \wedge B \wedge B \right) \qquad [56]$$


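of the BF action, which has the deformed local
symmetries (cf. [5] and [6])

$$\delta A = d_A \lambda + \mu [B, \lambda'], \qquad \delta B = [B, \lambda] + d_A \lambda' \qquad [57]$$

A simple way to understand these symmetries is to
note that the action can be written as the difference
of two Chern-Simons actions:

$$S_{CS}(A + \sqrt{\mu} B) - S_{CS}(A - \sqrt{\mu} B) = 4 \sqrt{\mu}\, S_{\mu BF}(A, B) \qquad [58]$$

whose evident standard local gauge symmetries
$\delta(A \pm \sqrt{\mu} B) = d_{A \pm \sqrt{\mu} B} \lambda^{\pm}$ are equivalent to [57] for
$\lambda^{\pm} = \lambda \pm \sqrt{\mu} \lambda'$.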

A detailed account of three-dimensional classical 
and quantum gravity can be found in Carlip 
(1998). 
Relation with Gravity


Theories of two-dimensional gravity and topological 
gravity also have a BF formulation (Blau and 
Thompson 1991, Birmingham et al. 1991) which 
resembles the Chern-—Simons BF formulation of 
three-dimensional gravity described above, the nat- 
ural gauge group now being SO(2, 1) or SO(3) or 
one of its contractions. 

In the first-order (Palatini) formulation, the 
Einstein—Hilbert action for four-dimensional gravity 
can be written as 


$$S_{EH} = \int_M \mathrm{tr}(e \wedge e \wedge F_\omega) \qquad [59]$$


where e is the vierbein and ω is the spin
connection. This action has the general form of a
BF action with a constraint that $B = e \wedge e$ be a
simple bi(co-)vector. Thus, four-dimensional
general relativity can be regarded as a constrained 
BF theory. Although this constraint drastically 
changes the number of physical degrees of freedom 
(BF theory has zero degrees of freedom, while 
four-dimensional gravity has two), this is never- 
theless a fruitful analogy which also lies at the 
heart of the spin-foam quantization approach to 
quantum gravity. This constrained BF description 
of gravity is also available for higher-dimensional 
gravity theories. 

For further details, and references, see Freidel et al. 
(1999) and the review article (Baez 2000). 


Knot and Generalized Knot Invariants 


The known relationship between Wilson loop 
observables of the Chern-Simons theory with 
a compact gauge group and knot invariants 
(Witten 1989), and the interpretation of the three- 
dimensional BF theory as a Chern—Simons theory 
with a noncompact gauge group raise the question of 
the relation of observables of an n= 3 BF theory to 
knot invariants, and suggest the possibility of using 
an n>4 BF theory to define higher-dimensional 
analogs of knot invariants. It turns out that an 
appropriate observable of n=3 BF theory for 
G=SU(2) is related to the Alexander-Conway 
polynomial. The analysis of higher-dimensional BF 
theories requires the full power of the Batalin— 
Vilkovisky (BV) formalism. BV observables general- 
izing Wilson loops have been shown to give rise to 
cohomology classes on the space of imbedded curves. 

For a detailed discussion of these issues, see 
Cattaneo and Rossi (2001) and references therein. 
A relation between the algebra of generalized 


Wilson loops and string topology has been investi- 
gated in Cattaneo et al. (2003). 


See also: Batalin—Vilkovisky Quantization; BRST 
Quantization; Chern—Simons Models: Rigorous Results; 
Gauge Theories From Strings; Knot Invariants and 
Quantum Gravity; Loop Quantum Gravity; Moduli 
Spaces: An Introduction; Nonperturbative and 
Topological Aspects of Gauge Theory; Schwarz-Type 
Topological Quantum Field Theory; Spin Foams; 
Topological Quantum Field Theory: Overview. 


Further Reading 


Baez J (2000) An introduction to spin foam models of 
quantum gravity and BF theory. Lecture Notes in Physics 
543: 25-94. 

Birmingham D, Blau M, Rakowski M, and Thompson G (1991) 
Topological field theory. Physics Reports 209: 129-340. 

Blau M and Thompson G (1989) A New Class of Topological 
Field Theories and the Ray—Singer Torsion. Physics Letters B 
228: 64-68. 

Blau M and Thompson G (1991) Topological gauge theories 
of antisymmetric tensor fields. Annals of Physics 
205: 130-172. 

Blau M and Thompson G (1994) Lectures on 2d gauge theories: 
topological aspects and path integral techniques. In: Gava E, 
Masiero A, Narain KS, Randjbar—Daemi S, and Shafi Q (eds.) 
Proceedings of the 1993 Trieste Summer School on High 
Energy Physics and Cosmology, pp. 175-244. Singapore: 
World Scientific. 

Carlip S (1998) Quantum Gravity in 2 + 1 Dimensions. Cambridge:
Cambridge University Press. 

Cattaneo A and Rossi C (2001) Higher-dimensional BF theories in 
the Batalin—Vilkovisky formalism: the BV action and general- 
ized Wilson loops. Communications in Mathematical Physics 
221: 591-657. 

Cattaneo A, Cotta-Ramusino P, Fucito F, Martellini M, and 
Rinaldi M, et al. (1998) Four-dimensional Yang-Mills theory 
as a deformation of topological BF theory. Communications in 
Mathematical Physics 197: 571-621. 

Cattaneo A, Pedrini P, and Frohlich J (2003) Topological field 
theory interpretation of string topology. Communications in 
Mathematical Physics 240: 397-421. 

Freidel L, Krasnov K, and Puzio R (1999) BF description of 
higher-dimensional gravity theories. Advances in Theoretical 
and Mathematical Physics 3: 1289-1324. 

Horowitz GT (1989) Exactly soluble diffeomorphism invariant 
theories. Communications in Mathematical Physics 
125: 417-437. 

Schwarz AS (1978) The partition function of a degenerate 
quadratic functional and Ray—Singer Invariants. Letters in 
Mathematical Physics 2: 247-252. 

Schwarz AS (1979) The partition function of a degenerate 
functional. Communications in Mathematical Physics 
67: 1-16. 

Witten E (1989) Quantum field theory and the Jones 
polynomial. Communications in Mathematical Physics 
127: 351-399. 

Witten E (1991) On quantum gauge theories in two dimen- 
sions. Communications in Mathematical Physics 
141: 153-209. 




Bicrossproduct Hopf Algebras and Noncommutative Spacetime 


S Majid, Queen Mary, University of London, 
London, UK 


© 2006 Elsevier Ltd. All rights reserved. 


Introduction 


One of the sources of quantum groups is a 
bicrossproduct construction coming in the case of 
Lie groups from considerations of Planck-scale 
physics in the 1980s. This article describes these 
objects and their currently known applications. See 
also the overview of Hopf algebras which provides 
the algebraic context (see Hopf Algebras and 
q-Deformation Quantum Groups). 

The construction of quantum groups here is 
viewed as a microcosm of the problem of quantiza- 
tion in a manner compatible with geometry. Here 
quantization enters in the noncommutativity of the 
algebra of observables and “curvature” enters as a 
quantum nonabelian group structure on phase 
space. Among the main features of the resulting 
bicrossproduct models (Majid 1988) are 


1. Compatibility takes the form of nonlinear “matched 
pair equations” generically leading to singular 
accumulation regions (event horizons or a max- 
imum value of momentum depending on context). 

2. The equations are solved in an “equal and 
opposite” form from local factorization of a 
larger object. 

3. Different classical limits are related by observer- 
observed symmetry and Hopf algebra duality. 

4. Nonabelian Born reciprocity re-emerges and is 
linked to T-duality. 


It has also been argued that noncommutative 
geometry should emerge as an effective theory of the 
first corrections to geometry coming from any 
unknown theory of quantum gravity. Concrete 
models of noncommutative spacetime currently 
provide the first framework for the experimental 
verification of such effects. The most basic of these 
possible effects is curvature in momentum space or 
“cogravity.” We start with this. 


Cogravity 


We recall that curvature in space or spacetime
means by definition noncommutativity among the
covariant derivatives $D_i$. Here the natural momenta
are $p_i = -i\hbar D_i$ and the situation is typified by the
top line in Figure 1. There are also mixed relations
between the $D_i$ and position functions as indicated





                     Position                                                       Momentum
Gravity              Curved                                                         Noncommutative, $[p_i, p_j] = i\hbar\gamma\,\epsilon_{ijk} p_k$
Cogravity            Noncommutative, $[x_i, x_j] = 2i\lambda\,\epsilon_{ijk} x_k$   Curved
Quantum mechanics    $[x_i, p_j] = i\hbar\,\delta_{ij}$

Figure 1 Noncommutative spacetime means curvature in
momentum space. The equations are for illustration.


for flat space in the bottom line, which is quantum 
mechanics (there is a similar story for quantum 
mechanics on a curved space). We see however a 
third and dual possibility — noncommutativity in 
position space which should be interpreted as 
curvature in momentum space, that is, the dual of 
gravity. This is an independent physical effect and
comes therefore with its own length scale which we
denote λ. These ideas were made precise in the mid
1990s using the quantum group Fourier transform;
see Majid (2000). Here we show what is involved in
three illustrative examples.


1. We consider the "spin space" algebra

$$\mathbb{R}^3_\lambda : \quad [x_i, x_j] = 2i\lambda\,\epsilon_{ijk} x_k$$

where $\epsilon_{123} = 1$ and where it is convenient to insert a
factor 2. This is the enveloping algebra $U(su_2)$, that
is, just angular momentum space but now regarded
"upside down" as a coordinate algebra (see Hopf
Algebras and q-Deformation Quantum Groups).
Then a plane wave is of the form

$$\psi_p = e^{i p \cdot x}, \qquad p \in \mathbb{R}^3$$

where we set ħ = 1 for this discussion. The momenta
$p_i$ are nothing but local coordinates for the
corresponding point $e^{i\lambda p \cdot \sigma} \in SU_2$, where $\lambda\sigma_i$ is the
representation of $x_i$ by Pauli matrices. It is really elements
of this curved space $SU_2$ where momenta live. Here
$\mathbb{R}^3_\lambda = U(su_2)$ has dual $\mathbb{C}[SU_2]$ and Hopf algebra
Fourier transform (after suitable completion) takes
one between these spaces. Thus, in one direction

$$\mathcal{F}(f) = \int_{SU_2} du\, f(u)\, u \approx \int_{\mathbb{R}^3} d^3p\, J(p)\, f(p)\, e^{i p \cdot x}$$

for f a function on $SU_2$. We use the Haar measure on
$SU_2$. The local result on the right has J the Jacobian
for the change to the local p coordinates and f is
written in terms of these. Note that the coproduct in
$\mathbb{C}[SU_2]$ in terms of the $p_i$ generators is an infinite
series given by the Campbell-Baker-Hausdorff series,
and not the usual linear one (this is why the measure
is not the Lebesgue one). The physical content here is
in the plane waves themselves; one can use any other
momentum coordinates to parametrize them with the
corresponding measure and coproduct. Differential
operators on $\mathbb{R}^3_\lambda$ are given by the action of elements of
$\mathbb{C}[SU_2]$ and are diagonal on these plane waves,

$$f.\psi_p = f(p)\,\psi_p$$

which corresponds under Fourier transform simply
to pointwise multiplication in $\mathbb{C}[SU_2]$. For example,
the function $\lambda^{-2}(\mathrm{tr} - 2)$ as a function on $SU_2$ will
give a rotationally invariant wave operator which is
also invariant under inversion in the group. Its value
on plane waves is

$$\frac{1}{\lambda^2}\,\mathrm{tr}\left(e^{i\lambda p \cdot \sigma} - 1\right) = \frac{2}{\lambda^2}\left(\cos(\lambda|p|) - 1\right)$$

In the limit λ → 0 this gives the usual wave operator
on $\mathbb{R}^3$.
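
The trace identity behind this wave operator, and its classical limit, can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the matrix exponential is a plain truncated Taylor series, and it assumes the plane wave is represented by the 2 × 2 matrix $e^{i\lambda p\cdot\sigma}$ built from the Pauli matrices.

```python
import math

def mat_mul(a, b):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(m, terms=60):
    """Truncated Taylor series for the exponential of a 2x2 matrix."""
    result = [[1 + 0j, 0j], [0j, 1 + 0j]]
    term = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for n in range(1, terms):
        term = [[entry / n for entry in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def plane_wave_group_element(p, lam):
    """The SU(2) element exp(i*lam*p.sigma) for a momentum 3-vector p."""
    px, py, pz = p
    generator = [[1j * lam * pz, 1j * lam * (px - 1j * py)],
                 [1j * lam * (px + 1j * py), -1j * lam * pz]]
    return mat_exp(generator)

def wave_operator_value(p, lam):
    """Value of lam^-2 (tr - 2) on the group element labelled by p."""
    u = plane_wave_group_element(p, lam)
    return (u[0][0] + u[1][1] - 2) / lam ** 2

p = (0.3, -1.2, 0.7)
norm = math.sqrt(sum(c * c for c in p))
closed_form = 2 * (math.cos(0.5 * norm) - 1) / 0.5 ** 2   # value at lam = 0.5
numeric = wave_operator_value(p, 0.5)
small_lambda = wave_operator_value(p, 1e-3)                # approaches -|p|^2
```

Here `numeric` agrees with `closed_form`, and `small_lambda` is close to $-|p|^2$, the eigenvalue of the flat-space Laplacian on $e^{ip\cdot x}$.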

It is also possible to put a differential graded
algebra (DGA) structure of differential forms on this
algebra, the natural one being

$$dx_i = \lambda\sigma_i, \qquad x_i\theta - \theta x_i = i\frac{\lambda^2}{\mu}\, dx_i$$

$$(dx_i) x_j - x_j\, dx_i = i\lambda\,\epsilon_{ijk}\, dx_k + i\mu\,\delta_{ij}\,\theta$$

where θ is the 2 × 2 identity matrix which, together
with the Pauli matrices $\sigma_i$, completes the basis of
left-invariant 1-forms. The 1-form θ provides a
natural time direction, even though there is no time
coordinate, and the new parameter μ ≠ 0 appears as
the freedom to change its normalization. The partial
derivatives $\partial^i$ are defined by

$$d\psi(x) = (\partial^i\psi)\, dx_i + (\partial^0\psi)\,\theta$$

and act diagonally on plane waves as

$$\partial^i = i\,\frac{p^i}{\lambda|p|}\,\sin(\lambda|p|)$$

while $\partial^0 = i\mu(\mathrm{tr} - 2)/2\lambda^2$ is computed as above.

Note that μ cannot be taken to be zero due to an
anomaly for translation invariance of the DGA. It is
in fact a typical feature of noncommutative differential
geometry that there is a 1-form θ generating d
by commutator, which can be required as an extra
cotangent direction, with its associated partial
derivative an induced Hamiltonian. In the present
model we have

$$\partial^0\psi = i\frac{\mu}{2}\sum_i (\partial^i)^2\psi + O(\lambda^2)$$

which is of the form of Schrödinger's equation with
respect to an auxiliary time variable and for a
particle with mass 1/μ.

The reader may ask what happens to the
Euclidean group of translations and rotations in
this context. From the above we find that
$U_\lambda(poinc_3) = \mathbb{C}[SU_2] \rtimes U(su_2)$, the semidirect product
generated by translations $\partial^i$ and usual rotations.
This in turn is the quantum double $D(U(su_2))$
of the classical enveloping algebra, and as such a
quantum group with braiding etc. (see Hopf
Algebras and q-Deformation Quantum Groups).
This quantum double has been identified as part
of an effective theory in 2+1 quantum gravity in a
Euclidean version based on Chern-Simons theory
with Lie algebra $poinc_3$, and the spin space algebra
proposed as an effective theory for this. The
quotient of $\mathbb{R}^3_\lambda$ by an allowed value of the quadratic
Casimir $x^2$ (which then makes it a matrix algebra)
is called a "fuzzy sphere" and appears as a "world-volume
algebra" in certain string theories and
reduced matrix models. The noncommutative differential
geometry that we have described is due to
Batista and the author.

2. We take the same type of construction to
obtain the "bicrossproduct model" spacetime
algebra

$$\mathbb{R}^{1,3}_\lambda : \quad [t, x_i] = i\lambda x_i, \qquad [x_i, x_j] = 0$$

These are the relations of a Lie algebra $b_+$ (say) but
again regarded as coordinates on a noncommutative
spacetime. Here λ is a timescale which can be
written as a mass scale κ = 1/λ instead. We
parametrize the plane waves as

$$\psi_{\vec p, p^0} = e^{i\vec p\cdot\vec x}\, e^{i p^0 t}, \qquad \psi_{\vec p, p^0}\,\psi_{\vec p\,', p'^0} = \psi_{\vec p + e^{-\lambda p^0}\vec p\,',\; p^0 + p'^0}$$

which identifies the $p^\mu$ as the coordinates of the
nonabelian group $B_+ = \mathbb{R} \ltimes_\lambda \mathbb{R}^3$ with Lie algebra
$b_+$. The group law in these coordinates is read off
as usual from the product of plane waves, which
also gives the coproduct of $\mathbb{C}[B_+]$ on the $p^\mu$. We
have parametrized plane waves in this way
(rather than the canonical way by the Lie algebra
as before) in order to have a more manageable
form for this. We do pay a price that in these
coordinates group inversion is not simply $-p^\mu$,
but

$$(\vec p, p^0)^{-1} = (-e^{\lambda p^0}\vec p,\; -p^0)$$

which is also the action of the antipode S on the
abstract $p^\mu$ generators.
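
The group structure in these momentum coordinates is easy to experiment with. The sketch below is only an illustration under stated assumptions: it takes the composition law to be $(\vec p, p^0)(\vec q, q^0) = (\vec p + e^{-\lambda p^0}\vec q,\; p^0 + q^0)$ and the deformed invariant to be $-\frac{2}{\lambda^2}(\cosh(\lambda p^0) - 1) + \vec p^{\,2} e^{\lambda p^0}$ (sign conventions vary in the literature), and the function names are hypothetical.

```python
import math

def multiply(a, b, lam):
    """Assumed composition law (p, p0)(q, q0) = (p + exp(-lam*p0) q, p0 + q0)."""
    (p, p0), (q, q0) = a, b
    scale = math.exp(-lam * p0)
    return (tuple(pi + scale * qi for pi, qi in zip(p, q)), p0 + q0)

def inverse(a, lam):
    """Group inverse (p, p0)^-1 = (-exp(lam*p0) p, -p0) for the law above."""
    p, p0 = a
    scale = math.exp(lam * p0)
    return (tuple(-scale * pi for pi in p), -p0)

def casimir(a, lam):
    """Assumed deformed invariant: -(2/lam^2)(cosh(lam p0) - 1) + |p|^2 exp(lam p0)."""
    p, p0 = a
    p_squared = sum(pi * pi for pi in p)
    return -2 / lam ** 2 * (math.cosh(lam * p0) - 1) + p_squared * math.exp(lam * p0)

def distance(a, b):
    """Largest coordinate difference between two group elements."""
    return max(max(abs(x - y) for x, y in zip(a[0], b[0])), abs(a[1] - b[1]))

lam = 0.4
a = ((0.5, -1.0, 2.0), 0.7)
b = ((1.5, 0.2, -0.3), -1.1)
c = ((-0.8, 0.9, 0.1), 0.3)
identity = ((0.0, 0.0, 0.0), 0.0)

associativity_gap = distance(multiply(multiply(a, b, lam), c, lam),
                             multiply(a, multiply(b, c, lam), lam))
inverse_gap = distance(multiply(a, inverse(a, lam), lam), identity)
commutator_gap = distance(multiply(a, b, lam), multiply(b, a, lam))
casimir_gap = abs(casimir(a, lam) - casimir(inverse(a, lam), lam))
```

With these assumptions the law is associative and has the stated inverse, it is visibly nonabelian (`commutator_gap` is not small), and the invariant takes the same value on an element and on its inverse.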

In particular, the right-invariant Haar measure on
$B_+$ in these coordinates is the usual $d^4p$ so the
quantum group Fourier transform reduces to the
usual one but normal ordered,

$$\mathcal{F}(f) = \int_{\mathbb{R}^4} d^4p\, f(p)\, e^{i\vec p\cdot\vec x}\, e^{i p^0 t}$$

(one can also Fourier transform with respect to the
left-invariant measure $d^4p\, e^{3\lambda p^0}$ on $B_+$). The inverse is again
given in terms of the usual inverse transform if we
specify general fields ψ in $\mathbb{R}^{1,3}_\lambda$ by normal ordering of
usual functions, which we shall do. As before, the action
of elements of $\mathbb{C}[B_+]$ defines differential operators on
$\mathbb{R}^{1,3}_\lambda$ and these act diagonally on plane waves.


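We also have a natural DGA with

$$(dx_j) x_\mu = x_\mu\, dx_j, \qquad (dt) x_\mu - x_\mu\, dt = i\lambda\, dx_\mu$$

which leads to the partial derivatives

$$\partial^i\psi = {:}\frac{\partial}{\partial x_i}\psi{:} = i p^i.\psi, \qquad \partial^0\psi = {:}\frac{\psi(\vec x, t + i\lambda) - \psi(\vec x, t)}{i\lambda}{:} = \frac{i}{\lambda}\left(1 - e^{-\lambda p^0}\right).\psi$$

for normal-ordered polynomial functions ψ or in
terms of the action of the coordinates $p^\mu$ in $\mathbb{C}[B_+]$.
These $\partial^\mu$ do respect our implicit *-structure
(unitarity) on $\mathbb{R}^{1,3}_\lambda$ but in a Hopf algebra sense
which is not the usual sense, since the action of the
antipode S is not just $-p^\mu$. This can be remedied by
using adjusted derivatives $L^{-1/2}\partial^\mu$ where

$$L\psi = {:}\psi(\vec x, t + i\lambda){:} = e^{-\lambda p^0}.\psi$$

In this case the natural 4D Laplacian is $L^{-1}((\partial^0)^2 - \sum_i (\partial^i)^2)$,
which acts on plane waves as

$$-\frac{2}{\lambda^2}\left(\cosh(\lambda p^0) - 1\right) + \vec p^{\,2} e^{\lambda p^0}$$

where $\vec p^{\,2} = \sum_i (p^i)^2$.
This deforms the usual Laplacian in such a way as to
remain invariant under the Lorentz group (which now
acts nonlinearly on $B_+$ in this model) and under group
inversion.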

This model may provide the first experimental test
for noncommutative spacetime and cogravity. For the
analysis of an experiment, we assume the identification
of noncommutative waves in the above normal-ordered
form with classical ones that a detector might register.
In that case one may argue (Amelino-Camelia and
Majid 2000) that the dispersion relation for such waves
has the classical derivation as $\partial p^0/\partial p^i$ which now
computes as propagation speed for a massless particle:

$$\left|\frac{\partial p^0}{\partial \vec p}\right| = e^{\lambda p^0}$$

in units where 1 is the usual speed of light. So
the prediction is that the speed of light depends
on energy. What is remarkable is that even if
$\lambda \sim 10^{-44}$ s (the Planck timescale), this prediction
could in principle be tested, for example using γ-ray
bursts. These are known in some cases to travel
cosmological distances before arriving on Earth, and
have a spread of energies from 0.1-100 MeV.
According to the above, the relative time delay $\Delta_T$
on traveling distance L for frequencies corresponding
to $p^0$, $p^0 + \Delta p^0$ is

$$\Delta_T \sim \lambda\,\Delta p^0\,\frac{L}{c} \sim 10^{-44}\,\mathrm{s} \times 100\,\mathrm{MeV} \times 10^{10}\,\mathrm{y} \sim 1\,\mathrm{ms}$$


which is in principle observable by statistical 
analysis of a large number of bursts correlated 
with distance (determined, e.g., by using the Hubble 
telescope to lock in on the host galaxy of each 
burst). Although the above is only one of a class of 
predictions, it is striking that even Planck-scale 
effects are now in principle within experimental 
reach. 
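
The order-of-magnitude estimate of the time delay can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch only: the function name is hypothetical, the energy spread is converted to a frequency by dividing by ħ (taken here as about 6.58 × 10⁻²² MeV s), and L/c is taken as the light travel time in years.

```python
HBAR_MEV_S = 6.582e-22        # reduced Planck constant in MeV * s (approximate)
SECONDS_PER_YEAR = 3.156e7    # approximate length of a year in seconds

def relative_time_delay(lambda_s, delta_energy_mev, travel_time_years):
    """Estimate lambda * (Delta p0 / hbar) * (L / c), returned in seconds."""
    dimensionless_shift = lambda_s * delta_energy_mev / HBAR_MEV_S
    return dimensionless_shift * travel_time_years * SECONDS_PER_YEAR

# Planck-scale lambda, 100 MeV energy spread, 10^10 years of travel.
delay_seconds = relative_time_delay(1e-44, 100.0, 1e10)
```

With these inputs `delay_seconds` comes out at a fraction of a millisecond, consistent with the rough figure of 1 ms quoted above.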

We now explain what happens to the full
Poincaré symmetry here. The nonlinear action of
the Lorentz group on $B_+$ Fourier transforms to an
action on the generators of $\mathbb{R}^{1,3}_\lambda$ which combines
with the above action of the $p^\mu$ to generate an entire
Poincaré quantum group $U(so_{1,3}) \triangleright\!\!\!\blacktriangleleft \mathbb{C}[B_+]$. We will
say more about its "bicrossproduct" structure in a
later section. The above wave operator in momentum
space is the natural Casimir in these momentum
coordinates. A common mistake in the literature for 
this model is to suppose that the Casimir relation 
alone amounts to a physical prediction, whereas in 
fact the momentum coordinates are arbitrary and 
have meaning only in conjunction with the plane 
waves that they parametrize. The deformed Poincaré 
as an algebra alone is actually isomorphic to the 
undeformed one by a different choice of generators, 
so by itself has no physical content; one needs rather 
the noncommutative spacetime as well. Prior work 
on the relevant deformed Poincaré algebra either did 
not consider it acting on spacetime or took it acting 
on classical (commutative) Minkowski spacetime 
with inconsistent results (there is no such action as a 
quantum group). 

The above model was introduced by Majid
and Ruegg (1994) and later tied up with a dual
approach of Woronowicz. There is also a previous
"κ-Poincaré" version of the Hopf algebra alone
obtained (Lukierski et al. 1991) in another context
(by contraction of $U_q(so_{2,3})$) but with fundamentally
different generators and relations and hence
different physical content (e.g., the Lorentz
generators there do not close among themselves but
mix with momentum).

3. The usual Heisenberg algebra of quantum
mechanics is another possible noncommutative
(phase) space; one may also take the same algebra
and view it as a noncommutative spacetime, so:

$$\mathbb{R}^{1,3}_\theta : \quad [x_\mu, x_\nu] = i\theta_{\mu\nu}$$

for any antisymmetric tensor $\theta_{\mu\nu}$. This is not a
Hopf algebra but it turns out that this model can
also be completely solved by Hopf algebra methods,
namely the theory of covariant twists. Twist
models also include versions of the noncommutative
torus studied by Connes, and related θ-spaces,
which are nontrivial at the level of C*-algebras.
However, at an algebraic level, all covariant
structures are automatically provided by applying
the twisting functor $\mathcal{T}$ to the desired classical
construction (see Hopf Algebras and q-Deformation
Quantum Groups). This is not usually appreciated in
the physics literature on such models, but see Oeckl
(2000).

Thus, consider $H = U(\mathbb{R}^{1,3})$ with generators $p^\mu = -i\partial^\mu$
acting as usual on functions on Minkowski
space. It has a cocycle

$$F = e^{\frac{i}{2}\theta_{\mu\nu}\, p^\mu \otimes p^\nu}$$

which induces a new product • on functions by
$\phi \bullet \psi = \cdot(F^{-1}(\phi \otimes \psi))$. This is just the standard
Moyal product, in the present case on $\mathbb{R}^{1,3}$, viewed
as a covariant twist using Hopf algebra methods.
The Hopf algebra $U(\mathbb{R}^{1,3})$ in principle has a twisted
coproduct given by $\Delta_F = F(\Delta(\ ))F^{-1}$ but this does
not change as the algebra is commutative.

Next, H also acts covariantly on $\Omega(\mathbb{R}^{1,3})$, the
usual algebra of differential forms, and twisting this
in the same way gives

$$\psi(x) \bullet dx_\mu = \psi\, dx_\mu = (dx_\mu)\psi = (dx_\mu) \bullet \psi$$

unchanged. This is because no terms higher than
$p^\mu \otimes p^\nu \theta_{\mu\nu}$ contribute and then d(1) = 0. The associated
partial derivatives defined by d are likewise
unchanged and act in the usual way as derivations
with respect to both the • product and the
undeformed product. The result may look different
when the same ψ(x) is expressed as a function of the
variables with the • product. In other words, the
only deformation comes from the Moyal product
itself, with the rest being automatic. Moreover, the
plane waves themselves are unchanged because
$(x \cdot k)^{\bullet n} = (x \cdot k)^n$ due to θ being antisymmetric.
Hence,

$$\psi_k(x) = e_\bullet^{i k \cdot x} = e^{i k \cdot x}, \qquad p^\mu.\psi_k(x) = k^\mu \psi_k(x)$$

where $p^\mu = -i\partial^\mu$. The wave operator $-\partial_\mu\partial^\mu$ is
therefore given by the action of $p_\mu p^\mu$ and has value
$k_\mu k^\mu$ as usual on plane waves. On the other hand,

$$\psi_k \bullet \psi_{k'} = e^{-\frac{i}{2} k^\mu \theta_{\mu\nu} k'^\nu}\, \psi_{k + k'}$$

or in algebraic terms the twist functor $\mathcal{T}$ applied
to the Fourier transform implies also a twisted
coproduct or coaddition law for the abstract $k^\mu$
generators, now different from the linear one for the
covariance momentum operators $p^\mu$. This leads to
some of the more interesting features of the model.

One immediately also has a Poincaré quantum
group here, $U_\theta(poinc_{1,3})$, obtained by similarly
twisting the classical $U(poinc_{1,3})$. We just view
F as living here rather than in the original H. The
translation sector is unchanged as before but if $M^{\mu\nu}$
are the usual Lorentz generators, then

$$\Delta_F M^{\mu\nu} = M^{\mu\nu} \otimes 1 + 1 \otimes M^{\mu\nu} + \tfrac{1}{2}\left(p^\mu \otimes \theta^\nu{}_\rho p^\rho - \theta^\nu{}_\rho p^\rho \otimes p^\mu\right) - \tfrac{1}{2}\left(p^\nu \otimes \theta^\mu{}_\rho p^\rho - \theta^\mu{}_\rho p^\rho \otimes p^\nu\right)$$

using the metric $\eta_{\mu\nu}$ to raise or lower indices. The
antipode is also modified according to the theory
in Majid (1995). The relations in the Poincaré
algebra are not modified (so, e.g., $p_\mu p^\mu$ will
remain central). Any construction originally Poincaré
covariant becomes covariant under this
twisted one after application of the twisting
functor. As with the differentials above, the
action on $\mathbb{R}^{1,3}_\theta$ is not actually modified but may
appear so when functions are expressed in terms
of the • product.

The above model is popular at the time of
writing in connection with string theory. Here, an
effective description of the endpoints of open
strings landing on a fixed 4-brane has been
modeled conveniently in terms of the • product
above (Seiberg and Witten 1999). It should be
borne in mind, however, that this fixed 4-brane
lives in some of the higher dimensions of the string
spacetime, so this is not necessarily a prediction of
noncommutative spacetime $\mathbb{R}^{1,3}_\theta$.

In fact, a proposal superficially similar to $\mathbb{R}^{1,3}_\theta$
above was already made in Snyder (1947).
Here

$$[x^\mu, x^\nu] = i\lambda^2 M^{\mu\nu}$$

where λ is our length scale and the $M^{\mu\nu}$ are now
operators with the usual commutation rules for the
Lorentz algebra with themselves and with $x^\mu$ and the
momenta $p^\mu$. The latter obey

$$[p^\mu, x^\nu] = i\left(\eta^{\mu\nu} - \lambda^2 p^\mu p^\nu\right), \qquad [p^\mu, p^\nu] = 0$$
so the entire Poincaré algebra is undeformed but the
phase-space relations are deformed. Snyder also
constructed the orbital angular momentum realization
$M^{\mu\nu} = x^\mu p^\nu - x^\nu p^\mu$. This model is not a proposal
for a noncommutative spacetime because the
algebra does not even close among the $x^\mu$. Rather it
is a proposal for "mixing" of position and Lorentz
generators. On the other hand (which was the point
of view in Snyder (1947)), in any representation of
the Poincaré algebra, the $M^{\mu\nu}$ become operators and
in some sense numerical. The rotational sector has
discrete eigenvalues as usual, so to this extent the
spacetime has been discretized. Although not fitting
into the methods in this article, it is also of interest
that the relations above were motivated by considering
$p^\mu$ as coordinates projected from a 5D flat
space to de Sitter space and $x^\mu$ as the 5-component
of orbital angular momentum in the flat space.

To conclude this section, let us note that there are
further models that we have not included for lack of
space. One of them is a much-studied $\mathbb{R}^{1,3}_q$ in which
t is central but the $x_i$ enjoy complicated q-relations
best understood as q-deformed Hermitian matrices.
One of the motivations in the theory was the result
in Majid (1990) that q-deformation could be used to
regularize infinities in quantum field theory as poles
at q = 1. Another entire class is to use noncommutative
geometry and quantum group methods on
finite or discrete spaces. Unlike lattice theory where
a finite lattice is viewed as approximation, these
models are not approximations but exact noncommutative
geometries valid even on a few points. The
noncommutativity enters into the fact that finite
differences are bilocal and hence naturally have
different left and right multiplications by functions.
Both aspects are mentioned briefly in the overview
article (see Hopf Algebras and q-Deformation
Quantum Groups). Also, on the experimental
front, another large area that we have not had
room to cover is the prediction of modified
uncertainty relations both in spacetime and phase
space (Kempf et al. 1995).

Moreover, for all of the models above, once one
has a noncommutative differential calculus one may
proceed to gauge theory etc., on noncommutative
spacetimes, at least at the level where a connection
is a noncommutative (anti-Hermitian) 1-form a.
Gauge transformations are invertible (unitary)
elements u of the noncommutative "coordinate
algebra" and the connection and curvature transform
as

$$a \to u^{-1} a u + u^{-1} du$$

$$F(a) = da + a \wedge a \to u^{-1} F(a) u$$


The full extent of quantum bundles and gravity 
(see Quantum Group Differentials, Bundles and 
Gauge Theory) and quantum field theory is not 
always possible, although both have been done for 
covariant twist examples (for functorial reasons) 
and for small finite sets. For the first two models 
above, for example, it is not clear at the time of 
writing how to interpret scattering when the addi- 
tion of momenta is nonabelian. 


Matched Pair Equations 


Although we have presented noncommutative space- 
time first, the first actual application of quantum 
group methods to Planck-scale physics was the 
Planck-scale Hopf algebra obtained by a theory of 
bicrossproducts. Like the Snyder model, the inten- 
tion here was to deform phase space itself, but since 
then bicrossproducts have had many further appli- 
cations. The main ingredient here is the notion of a 
pair of groups (G, M), say, acting on each other as 
we explain now. The mathematics here goes back to 
the early 1910s in group theory, but also arose in 
mathematical physics as a toy version of Einstein’s 
equation in the sense of compatibility between 
quantization and curvature (see the next section). 

By definition, (G, M) are a matched pair of 
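groups if there are left and right actions 

$$M \xleftarrow{\ \triangleleft\ } M\times G \xrightarrow{\ \triangleright\ } G$$

of each group on the set of the other, such that 

$$\begin{aligned}
& s\triangleleft e = s,\quad e\triangleright u = u,\quad s\triangleright e = e,\quad e\triangleleft u = e\\
& (s\triangleleft u)\triangleleft v = s\triangleleft(uv),\quad s\triangleright(t\triangleright u) = (st)\triangleright u\\
& s\triangleright(uv) = (s\triangleright u)\bigl((s\triangleleft u)\triangleright v\bigr)\\
& (st)\triangleleft u = \bigl(s\triangleleft(t\triangleright u)\bigr)(t\triangleleft u)
\end{aligned}$$

for all $u,v\in G$, $s,t\in M$. Here $e$ denotes the relevant 
group unit element. As a first application of such 
data, one may make a “double cross product group” 
$G\bowtie M$ with product 

$$(u,s)\cdot(v,t) = \bigl(u\,(s\triangleright v),\ (s\triangleleft v)\,t\bigr)$$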


and with G, M as subgroups. Since it is built on the 
direct product space, the bigger group factorizes into 
these subgroups. Conversely, if $X$ is a group 
factorization such that the product $G\times M\to X$ is 
bijective, each group acts on the other by actions 
$\triangleright$, $\triangleleft$ defined by $su=(s\triangleright u)(s\triangleleft u)$ for $u\in G$ and $s\in M$, 
where $s, u$ are multiplied in $X$ and the product is 
factorized as something in $G$ and something in $M$. 
So finite group matched pairs are equivalent to 
group factorizations. In the Lie group context, the 
corresponding system of differential equations is 
equivalent to a local factorization. 
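The equivalence between finite matched pairs and group factorizations can be checked by brute force on a small case. The sketch below is an illustration chosen here, not an example from the text: it takes $X=S_4$, with $G$ the copy of $S_3$ fixing one point and $M$ the cyclic group generated by a 4-cycle, reads off the two actions from $su=(s\triangleright u)(s\triangleleft u)$, and verifies the matched pair equations. All function names are hypothetical.

```python
from itertools import permutations, product

def compose(p, q):
    """Product pq in the permutation group: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

N = 4
IDENTITY = tuple(range(N))

# G: permutations of {0,1,2,3} fixing the point 3 (a copy of S_3 inside S_4).
G = [p for p in permutations(range(N)) if p[3] == 3]

# M: the cyclic group generated by the 4-cycle i -> i+1 (mod 4).
cycle = tuple((i + 1) % N for i in range(N))
M = [IDENTITY]
while compose(M[-1], cycle) != IDENTITY:
    M.append(compose(M[-1], cycle))

# Every x in X = S_4 factorizes uniquely as x = g m with g in G, m in M.
factor = {compose(g, m): (g, m) for g in G for m in M}

def act_left(s, u):
    """s |> u : the G-part of the product su taken in X."""
    return factor[compose(s, u)][0]

def act_right(s, u):
    """s <| u : the M-part of the product su taken in X."""
    return factor[compose(s, u)][1]

def is_matched_pair():
    """Check the matched pair equations for all u, v in G and s, t in M."""
    e = IDENTITY
    for s, u in product(M, G):
        if act_right(s, e) != s or act_left(e, u) != u:
            return False
        if act_left(s, e) != e or act_right(e, u) != e:
            return False
    for s, t, u, v in product(M, M, G, G):
        st, uv = compose(s, t), compose(u, v)
        if act_right(act_right(s, u), v) != act_right(s, uv):
            return False
        if act_left(s, act_left(t, u)) != act_left(st, u):
            return False
        if act_left(s, uv) != compose(act_left(s, u),
                                      act_left(act_right(s, u), v)):
            return False
        if act_right(st, u) != compose(act_right(s, act_left(t, u)),
                                       act_right(t, u)):
            return False
    return True

def double_cross_product(a, b):
    """Product (u,s).(v,t) = (u (s|>v), (s<|v) t) on the set G x M."""
    (u, s), (v, t) = a, b
    return (compose(u, act_left(s, v)), compose(act_right(s, v), t))

if __name__ == "__main__":
    print("bijective factorization:", len(factor) == len(G) * len(M))
    print("matched pair equations hold:", is_matched_pair())
```

Neither subgroup is normal in $S_4$ in this example, so both actions are nontrivial; choosing a normal subgroup would reduce the construction to an ordinary semidirect product.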

There is a nice graphical representation of the 
matched pair conditions which relates to “surface 
integration.” Thus, consider squares 

[Diagram: a square with left edge $s$, bottom edge $u$, top edge $s\triangleright u$, right edge $s\triangleleft u$, and the surface transport $\Rightarrow$ in its interior.] 

labeled by elements of $M$ on the left edge and 
elements of $G$ on the bottom edge. We can fill in the 
other two edges by thinking of an edge transformed 
by the other edge as it goes through the square either 
horizontally or vertically; the two together are the 
surface transport $\Rightarrow$ across the square. The matched 
pair equations have the meaning that a square can 
be subdivided either vertically or horizontally as 
shown in Figure 2, where the labeling on vertical 
edges is to be read from top down. The transport 
operation here is nothing other than normal order- 
ing in the factorizing group. In the Lie setting, it 
means that the equations can be solved from 
infinitesimal solutions (a matched pair of Lie 
algebras) by a simultaneous double integration over 
the group (i.e., building up a large box from many 
small ones). If one considers solving the quantum 
Yang-Baxter equations on groups, they appear in 
this notation as an equality of surface transport 
going two ways around a cube, and the classical 
Yang-Baxter equations as curvature of the under- 
lying higher-order connection. 

Also in this notation there is a bicrossproduct 
quantum group defined in Figure 3, at least when $M$ 
is finite. The expressions are considered zero unless 
the juxtaposed edges have the same group labels. In 
that case, the product is a semidirect product 
algebra $\mathbb{C}(M){>\!\!\!\triangleleft}\mathbb{C}G$ of functions on $M$ by the 
group algebra of $G$. The coproduct is the adjoint of 
this, so is a semidirect coalgebra $\mathbb{C}(M){\blacktriangleright\!\!<}\mathbb{C}G$. Hence 
the two together are denoted $\mathbb{C}(M){\blacktriangleright\!\!\triangleleft}\mathbb{C}G$. The dual 
needs $G$ finite and has the same form but with 
vertical and horizontal compositions interchanged, 
that is, a bicrossproduct $\mathbb{C}M{\triangleright\!\!\blacktriangleleft}\mathbb{C}(G)$. Both Hopf 
algebras have the above labeled squares as basis. 

Figure 2 Matched pair condition as a subdivision property. 

Figure 3 Bicrossproduct Hopf algebra showing horizontal 
product and vertical coproduct as an “unproduct.” 

It is possible to generalize both bicrossproducts 
and double cross products associated to matched 
pairs to general Hopf algebras $H_1{\blacktriangleright\!\!\triangleleft}H_2$ and 
$H_1\bowtie H_2$, respectively, where $H_1, H_2$ are Hopf 
algebras (see Majid 1990) and to relate the two in 
general by dualization of one factor. Another 
general result (Majid 1995) is that $H_1{\blacktriangleright\!\!\triangleleft}H_2$ acts 
covariantly on the algebra $H_1^*$ from the right, or 
$H_1{\triangleright\!\!\blacktriangleleft}H_2$ acts covariantly on $H_2^*$ from the left. A 
third general result is that bicrossproducts solve the 
extension problem 

$$H_1 \to H \to H_2$$

meaning that such a Hopf algebra $H$ subject to some 
technical requirements (such as an algebra splitting 
map $H_2\to H$) is of the form $H\cong H_1{\blacktriangleright\!\!\triangleleft}H_2$. The 
theory was also extended to include cocycle bicros- 
sproducts at the end of the 1980s (by the author). 
The finite group case, however, was first found by 
Kac and Paljutkin (1966) in the Russian literature 
and later rediscovered independently in Takeuchi 
(1981) and in the course of Majid (1988). 


The Planck-Scale Hopf Algebra 


We consider a quantum algebra of observables H 
and ask when it is a Hopf algebra extending some 
classical position coordinate algebra $\mathbb{C}[M]$ and some 
possibly noncommutative momentum coordinate 
algebra $U(\mathfrak{g})$ in the form of a strict extension 

$$\mathbb{C}[M] \to H \to U(\mathfrak{g})$$

From the theory above this problem is governed by local 
solutions of the matched pair equations on $(G, M)$. It 
requires that $H\cong\mathbb{C}[M]{>\!\!\!\triangleleft}U(\mathfrak{g})$ as an algebra, that is, 
the quantization of a particle moving on orbits in $M$ 
under some action of $G$ (in an algebraic setting, or 
one can use von Neumann or $C^*$-algebras etc.). And 
it requires the classical phase space to be a 
nonabelian or “curved” group $M{\triangleright\!\!<}\mathfrak{g}^*$. This extends 
to a coproduct on $H$ which becomes the bicrossproduct 
Hopf algebra $\mathbb{C}[M]{\blacktriangleright\!\!\triangleleft}U(\mathfrak{g})$. In this way, the 
problem which was open at the start of the 1980s of 
finding true examples of Hopf algebras was given a 
physical interpretation as being equivalent to finding 
quantum-mechanical systems reconciled with curva- 
ture, and the equations that governed this were the 
matched pair ones (Majid 1988). 

We still have to solve these equations. In the 
Lie case, they mean a pair of cross-coupled first- 
order equations on G x M. These can be solved 
locally as a double-holonomy construction in line 
with the surface transport point of view, but are 
nonlinear typically with singularities in the non- 
compact case. The equations are also symmetric 
under interchange of G,M so Born reciprocity 
between position and momentum is extended to 
the quantum system with generally “curved” 
position and momentum spaces. Moreover, in so 
far as Einstein’s equation $G_{\mu\nu}=8\pi T_{\mu\nu}$ is also a 
compatibility between a quantity in position 
space and a quantity originating (ultimately) in 
momentum space, the matched pair equations can 
be viewed as a toy version of these. 

Let us note that the reason to look for H a Hopf 
algebra in the first place, aside from the reasons 
already given, is for observer-observed symmetry 
(this was put forward as a postulate for Planck-scale 
physics). Thus, $H^*$ is also an algebra of observables 
of some dual system, in our case $U(\mathfrak{m}){\triangleright\!\!\blacktriangleleft}\mathbb{C}[G]$ or 
particles in $G$ moving on orbits under $M$. Thus, 
Born reciprocity is truly implemented in the 
quantum/curved system by Hopf algebra duality. 
Put another way, Hopf algebras are the simplest 
objects after abelian groups that admit Fourier 
transform (see Hopf Algebras and q-Deformation 
Quantum Groups) and we require this on phase 
space if Born reciprocity is to be extended to the 
quantum/curved system. 

The Planck-scale Hopf algebra is the simplest 
example of these ideas (Majid 1988). Here $G=M=\mathbb{R}$ 
and the matched pair equations can be solved 
completely. The general solution is 

$$\hat{p} = \imath\hbar\,(1-e^{-\gamma x})\,\frac{\partial}{\partial x},\qquad \hat{x} = \cdots$$

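for the action of one group with generator $p$ on 
functions of $x$ in the other group and vice-versa. It 
has two parameters which we have denoted as $\hbar$ and 
a background curvature scale $\gamma$, and the corresponding 
bicrossproduct $\mathbb{C}[p]{\triangleright\!\!\blacktriangleleft}\mathbb{C}[x]$ is 

$$[p,x] = \imath\hbar\,(1-e^{-\gamma x}),\qquad \Delta x = x\otimes 1 + 1\otimes x,\qquad \Delta p = p\otimes e^{-\gamma x} + 1\otimes p$$

$$\epsilon x = \epsilon p = 0,\qquad Sx = -x,\qquad Sp = -p\,e^{\gamma x}$$

where we should allow power series or take $e^{\gamma x}$ as 
an invertible generator. 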
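As a consistency check, one can verify that the coproduct respects the commutation relation. The following sketch uses only the relations $[p,x]=\imath\hbar(1-e^{-\gamma x})$, $\Delta x = x\otimes 1+1\otimes x$ and $\Delta p = p\otimes e^{-\gamma x}+1\otimes p$, together with the fact that $\Delta$ is an algebra map, so that $\Delta e^{-\gamma x} = e^{-\gamma x}\otimes e^{-\gamma x}$.

```latex
\begin{align*}
[\Delta p,\Delta x]
 &= [\,p\otimes e^{-\gamma x} + 1\otimes p,\; x\otimes 1 + 1\otimes x\,]\\
 &= [p,x]\otimes e^{-\gamma x} + 1\otimes[p,x]\\
 &= \imath\hbar\,(1-e^{-\gamma x})\otimes e^{-\gamma x}
    + \imath\hbar\, 1\otimes(1-e^{-\gamma x})\\
 &= \imath\hbar\,\bigl(1\otimes 1 - e^{-\gamma x}\otimes e^{-\gamma x}\bigr)
  = \Delta\bigl(\imath\hbar\,(1-e^{-\gamma x})\bigr).
\end{align*}
```

The two mixed commutators vanish because $e^{-\gamma x}$ commutes with $x$ and the tensor factors act independently. In the same spirit, the antipode axiom applied to $p$ reads $S(p)\,e^{-\gamma x} + p = \epsilon(p) = 0$, which gives $Sp = -p\,e^{\gamma x}$.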

It is important to note that the matched pair 
equations here have only this solution and it is 
necessarily singular at $p=0$ or $x=0$. The interpretation 
in position space is as follows. Consider an 
infalling particle of mass $m$ with fixed momentum 
$p=mv_\infty$ (in terms of the velocity at infinity). By 
definition, $p$ is the free-particle momentum and acts 
on $\mathbb{R}$ as above. This corresponds to a free-particle 
Hamiltonian $p^2/2m$ and induces 

$$\dot{p} = 0,\qquad \dot{x} = \frac{p}{m}\,(1-e^{-\gamma x}) = v_\infty\left(1-\frac{1}{1+\gamma x+\cdots}\right)$$

at the classical level. We see that the particle takes 
an infinite time to reach the origin, which is an 
accumulation point. This can be compared with the 
formula in standard radial infalling coordinates 

$$\dot{x} = v_\infty\left(1-\frac{1}{1+\frac{1}{2}\frac{c^2}{GM}\,x}\right)$$

for distance $x$ from the event horizon of a black hole 
of mass $M$ (here $G$ is Newton’s constant and $c$ the 
speed of light). So $\gamma\sim c^2/GM$ and for the sake of 
further discussion we will use this value. With a 
little more work, one can then see that 





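$$\mathbb{C}[p]{\triangleright\!\!\blacktriangleleft}\mathbb{C}[x]\ \longrightarrow\ 
\begin{cases}
\mathbb{C}[x]{>\!\!\!\triangleleft}\mathbb{C}[p]\quad \text{usual quantum mechanics} & (mM\ll m_P^2)\\[4pt]
\mathbb{C}(X)\quad \text{usual curved geometry} & (mM\gg m_P^2)
\end{cases}$$

where $m_P$ is the Planck mass of the order of $10^{-5}$ g 
and $X=\mathbb{R}{\triangleright\!\!<}\mathbb{R}$ is a nonabelian group. In the first 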
limit, the particle motion is not detectably different 
from usual flat space quantum mechanics outside 
the Compton wavelength from the origin. In the 
second limit, the estimate is such that noncommu- 
tativity would not show up for length scales much 
larger than the background curvature scale. 
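The infinite infall time noted above can be made explicit. The sketch below assumes the classical equation $\dot{x} = -v\,(1-e^{-\gamma x})$ with $v>0$ the infall speed at infinity (the sign convention and the function names are choices made here for illustration), integrates it in closed form, and cross-checks the result with a midpoint rule.

```python
import math

def infall_time(x_start, x_end, gamma, speed):
    """Time to fall from x_start to x_end (0 < x_end < x_start) under
    dx/dt = -speed * (1 - exp(-gamma * x)).

    Uses the antiderivative of 1 / (1 - e^{-gamma x}), which is
    ln(e^{gamma x} - 1) / gamma.
    """
    upper = math.log(math.expm1(gamma * x_start))
    lower = math.log(math.expm1(gamma * x_end))
    return (upper - lower) / (gamma * speed)

def infall_time_numeric(x_start, x_end, gamma, speed, steps=100_000):
    """Midpoint-rule evaluation of the same integral, as an independent check."""
    h = (x_start - x_end) / steps
    total = 0.0
    for k in range(steps):
        x = x_end + (k + 0.5) * h
        # -expm1(-gamma x) equals 1 - exp(-gamma x), computed stably.
        total += h / (speed * (-math.expm1(-gamma * x)))
    return total

if __name__ == "__main__":
    for x_end in (1e-2, 1e-4, 1e-8):
        print(x_end, infall_time(1.0, x_end, gamma=1.0, speed=1.0))
```

Near the origin the elapsed time grows like $\ln(1/x_{\rm end})/(\gamma v)$, so it diverges logarithmically as the endpoint approaches $x=0$, in line with the origin being an accumulation point.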

This Hopf algebra is also the simplest way to 
extend classical position C[x] and momentum C[p] 
in the sense above. In other words, requiring to 
maintain observer-observed symmetry or Born 
reciprocity throws up both quantum mechanics (in 
the form of $\hbar$) and something with the flavor of 
gravity (in the form of $\gamma$) and both are required for a 
nontrivial Hopf algebra. Moreover, the construction 
necessarily has a self-dual form and indeed the 
dually paired Hopf algebra is $\mathbb{C}[p]{\triangleright\!\!\blacktriangleleft}\mathbb{C}[x]$ with new 
parameters $\hbar'=1/\hbar$ and $\gamma'=\hbar\gamma$ if we take the 
standard pairing $x,p$ across the two algebras. Hopf 
algebra duality realized by the quantum group 
Fourier transform F takes one between the two 
models. 


Bicrossproduct Poincaré Quantum Groups 


Another example from the 1980s in the same family 
as the Planck-scale Hopf algebra is $G=SU_2$ and 
$M=B_+$, a nonabelian version of $\mathbb{R}^3$ with Lie algebra 
$b_+$ of the form 

$$[x_3, x_i] = \imath\lambda\,x_i,\qquad [x_i, x_j] = 0$$

for $i,j=1,2$. The required solution of the matched 
pair equations was found in Majid (1990) and has a 
nonlinear action of rotations on $B_+$. The interpretation 
of $\mathbb{C}[B_+]{\blacktriangleright\!\!\triangleleft}U(su_2)$ is of particles moving along 
orbits which are deformed spheres in $B_+$, and there 
is a dual model where particles move instead on 
orbits in $SU_2$ under the action of $b_+$. Moreover, 
from the general theory of bicrossproducts, we 
automatically have a covariant action of 
$\mathbb{C}[B_+]{\blacktriangleright\!\!\triangleleft}U(su_2)$ on the auxiliary noncommutative space 
$\mathbb{R}^3_\lambda=U(b_+)$ with relations as above. 

The quantum group here was actually obtained as a 
Hopf—von Neumann algebra but we limit ourselves to 
the underlying algebraic version. Also, there is of 
course nothing stopping one considering this Hopf 
algebra equally well as $U_\lambda(\mathrm{poinc}_3)$, that is, a deformation 
of the group of motions on $\mathbb{R}^3$, rather than as an 
algebra of observables. The only difference is to denote 
the generators of $\mathbb{C}[B_+]$ by the symbols $p^i$, reserving $x_i$ 
instead for the auxiliary noncommutative space. We 
lower $i,j,k$ indices using the Euclidean metric. Then 
the bicrossproduct has the form 


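$$[p_i,p_j]=0,\qquad [M_i,M_j]=\imath\epsilon_{ij}{}^{k}M_k$$

$$[M_3,p_j]=\imath\epsilon_{3j}{}^{k}p_k,\qquad [M_i,p_3]=\imath\epsilon_{i3}{}^{k}p_k$$

as usual, for $i,j=1,2,3$, and the modified relations 

$$[M_i,p_j]=\frac{\imath}{2}\,\epsilon_{ij}{}^{3}\left(\frac{1-e^{-2\lambda p_3}}{\lambda}-\lambda p^2\right)+\imath\lambda\,\epsilon_i{}^{k3}p_jp_k$$

for $i,j=1,2$ and $p^2=p_1^2+p_2^2$. The coproducts are 

$$\Delta M_i = M_i\otimes e^{-\lambda p_3}+\lambda M_3\otimes p_i+1\otimes M_i$$

$$\Delta p_i = p_i\otimes e^{-\lambda p_3}+1\otimes p_i$$

for $i=1,2$ and the usual additive ones for $p_3, M_3$. 
There is also an appropriate counit and antipode. 
The deformed spheres under the nonlinear rotation 
in Majid (1990) are constant values of the Casimir 
for the above algebra. This is 

$$\frac{2}{\lambda^2}\bigl(\cosh(\lambda p_3)-1\bigr)+p^2e^{\lambda p_3}$$

which from the group of motions point of view 
generates the noncommutative Laplacian when 
acting on $\mathbb{R}^3_\lambda$. The model here is a Euclidean 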
inhomogeneous one. 

The four-dimensional (4D) version $U(so_{1,3}){\triangleright\!\!\blacktriangleleft}\mathbb{C}[B_+]$ 
of this construction (Majid and Ruegg 
1994) is again linked to Planck-scale predictions, 
this time as a generalized symmetry. In terms of 
translation generators $p^\mu$, rotations $M_i$ and boosts 
$N_i$ we have 

$$[p^\mu,p^\nu]=0,\qquad [M_i,M_j]=\imath\epsilon_{ij}{}^{k}M_k$$

$$[N_i,N_j]=-\imath\epsilon_{ij}{}^{k}M_k,\qquad [M_i,N_j]=\imath\epsilon_{ij}{}^{k}N_k$$

$$[p^0,M_i]=0,\qquad [p^i,M_j]=\imath\epsilon^{i}{}_{j}{}^{k}p_k,\qquad [p^0,N_i]=-\imath p_i$$

as usual, and the modified relations and coproduct 

$$[p^i,N_j]=-\frac{\imath}{2}\,\delta^i_j\left(\frac{1-e^{-2\lambda p^0}}{\lambda}+\lambda\vec{p}^{\,2}\right)+\imath\lambda\,p^ip_j$$

$$\Delta N_i=N_i\otimes 1+e^{-\lambda p^0}\otimes N_i+\lambda\epsilon_{ij}{}^{k}p^j\otimes M_k$$

$$\Delta p^i=p^i\otimes 1+e^{-\lambda p^0}\otimes p^i$$

and the usual additive coproducts on $p^0, M_i$. This 
time the Lorentz group orbits in $B_+$ are deformed 
hyperboloids rather than deformed spheres, and the 
Casimir that controls this has the same form as 
above but with a minus sign in the cosh term, that is, the 
model is a Lorentzian one. We know from the 
general theory of bicrossproducts that this Hopf 
algebra acts on $U(b_+)=\mathbb{R}^{1,3}_\lambda$, the spacetime in the 
section “Cogravity,” and the Casimir induces the 
wave operator as we have seen there. 

Let us look a bit more closely at the deformed 
hyperboloids. Because neither group here is com- 
pact, one expects from the general theory of 
bicrossproducts to have limiting accumulation 
regions. This is visible in the contour plot of $p^0$ 
against $|\vec{p}|$ in Figure 4, where the $p^0>0$ mass shells 
are now cups with almost vertical walls, compressed 
into the vertical tube 

$$|\vec{p}|<\lambda^{-1}$$

In other words, the 3-momentum is bounded above 
by the Planck momentum scale (if $\lambda$ is the Planck 
time). Indeed, the light-cone equation (setting the 
Casimir to zero) reads $\lambda|\vec{p}|=1-e^{-\lambda p^0}$ so this is 
immediate. Nevertheless, this observation is so 
striking that the bicrossproduct model has been 
dubbed “doubly special” and spawned the search for 
other such models. Such accumulation regions are a 
main discovery of the noncompact bicrossproduct 
theory visible already in the Planck-scale Hopf 
algebra. The model further confirms the role of 
the matched pair equations as a toy version of 
Einstein’s. 

Figure 4 Deformed mass-shell orbits in the bicrossproduct 
curved momentum space for $\lambda=1$. 
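The bound on the 3-momentum can be illustrated numerically. The sketch below assumes a mass shell of the form $\frac{2}{\lambda^2}(\cosh(\lambda p^0)-1) - |\vec{p}|^2 e^{\lambda p^0} = m^2$; the overall sign and the normalization of the mass parameter $m$ are assumptions made here, chosen so that $m=0$ reproduces the light-cone equation $\lambda|\vec{p}| = 1-e^{-\lambda p^0}$. The function name is hypothetical.

```python
import math

def three_momentum(p0, mass=0.0, lam=1.0):
    """|p| on the assumed deformed mass shell

        (2 / lam^2) (cosh(lam p0) - 1) - |p|^2 exp(lam p0) = mass^2,

    rewritten in the numerically stable form
        |p|^2 = (1 - exp(-lam p0))^2 / lam^2 - mass^2 exp(-lam p0).

    Returns None when there is no real solution for the given p0.
    """
    decay = math.exp(-lam * p0)
    p_squared = ((1.0 - decay) / lam) ** 2 - mass ** 2 * decay
    return math.sqrt(p_squared) if p_squared >= 0.0 else None

if __name__ == "__main__":
    lam = 1.0
    for p0 in (0.5, 1.0, 2.0, 5.0, 10.0):
        print(p0, three_momentum(p0, mass=0.0, lam=lam),
              three_momentum(p0, mass=0.3, lam=lam))
```

The rewriting uses $2(\cosh a - 1)e^{-a} = (1-e^{-a})^2$. However large $p^0$ becomes, the returned value stays below $1/\lambda$, which is the vertical tube described in the text.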


Poisson-Lie T-Duality 


We have explained in Section 3 that the matched 
pair equations are equivalent to a local factorization 
of Lie groups, with the action and back-reaction 
created “equally and oppositely” from this. For the 
two models in the last section, these are $SL_2(\mathbb{C})$ 
factorizing as $SU_2$ and a 3D $B_+$, and $SO_{2,3}$ locally as 
$SO_{1,3}$ and a 4D $B_+$. The first of these examples is in 
fact one of a general family based on the Iwasawa 
decomposition $G_{\mathbb{C}}=G\bowtie G^*$ where $G$ is a compact 
Lie group with complexification $G_{\mathbb{C}}$ and $G^*$ a 
certain solvable group. From this, one may construct 
a solution $(G, G^*)$ of the matched pair equations and 
bicrossproduct quantum group 

$$\mathbb{C}[G^*]{\blacktriangleright\!\!\triangleleft}U(\mathfrak{g})$$


associated to all complex simple Lie algebras. This is 
again part of the bicrossproduct theory from the 
1980s. On the other hand, the Lie algebra $\mathfrak{g}^*$ here 
can be identified with the dual of $\mathfrak{g}$ in which case its 
Lie algebra corresponds to a Lie coproduct 
$\delta:\mathfrak{g}\to\mathfrak{g}\otimes\mathfrak{g}$ and makes $(\mathfrak{g},\delta)$ into a Lie bialgebra in 
the sense of Drinfeld. This $\delta$ exponentiates to a 
Poisson bracket on $G$ making it a “Poisson-Lie 
group” and the quantization of this is provided 
by the quantum group coordinate algebras $\mathbb{C}_q[G]$ 
(see Hopf Algebras and q-Deformation Quantum 
Groups and Classical r-matrices, Lie Bialgebras, and 
Poisson Lie Groups). The bicrossproduct quantum 
groups are nevertheless unrelated to the latter even 
though they spring from related classical data. 

As already discussed, one interpretation here is 
of quantized particles in $G^*$ moving on orbits 
under $G$ and vice versa in the dual model. The 
dual model is equivalent in the sense that the 
states of one (in the sense of positive-linear 
functionals) lie in the algebra of observables of 
the other and we also saw in the Planck-scale 
example inversion of structure constants reminiscent 
of T-duality in string theory. Motivated in 
part by this duality, Klimcik (1996) along with 
Severa in the mid 1990s showed that indeed a 
$\sigma$-model on $G$ could be constructed in such a way 
that there was a matching dual $\sigma$-model on $G^*$ in 
some sense equivalent in terms of solutions to the 
equations of motion. The Lagrangians here have 
the usual form 

$$L=E_u(u^{-1}\partial_+u,\ u^{-1}\partial_-u),\qquad \hat{L}=\hat{E}_s(s^{-1}\partial_+s,\ s^{-1}\partial_-s)$$

where $u:\mathbb{R}^{1,1}\to G$ and $s:\mathbb{R}^{1,1}\to G^*$ are the dynamical 
fields, except that the inner products $E$, $\hat{E}$ 
are not constant. Rather they are obtained by 
solving nonlinear differential equations on the 
groups defined through the structure constants 
of g, g* and the Drinfeld double D(g). At the time, 
T-duality here was well understood in the case of 
abelian groups while these Poisson—Lie T-duality 
models provided the first convincing nonabelian 
models. 

This construction was extended by Beggs and 
Majid (2001) to a general matched pair (G, M), that 
is, a $\sigma$-model on $G$ dual to one on $M$. The Poisson-
Lie case is the special case where the actions are 
coadjoint actions and the Lie algebra of $G\bowtie M$ is 
D(g). The solutions of the equations of motion for 
the two systems are created “equally and oppo- 
sitely” from one on the factorizing group. It could 
be expected that T-duality ideas again play a role in 
Planck-scale physics. 


Other Bicrossproducts 


There are also infinite-dimensional factorizations 
such as the Riemann-Hilbert problem (see 
Riemann-Hilbert Problem) in the theory of 
integrable systems and hence infinite-dimensional 
matched pairs and bicrossproducts linked to 
them. Here we mention just one partly infinite 
example of current interest. 

Thus, the diffeomorphisms on the line R may be 
factorized into transformations of the form ax + b 
and diffeomorphisms that fix the origin and have 
unit differential there. After a (logarithmic) change 
of generators to arrive at an algebraic picture, one 
has a bicrossproduct 


$$H(1) = U(b_+){\triangleright\!\!\blacktriangleleft}H_\infty$$

where $b_+$ is now the two-dimensional (2D) Lie 
algebra with relations $[x,y]=x$ and $H_\infty$ is the algebra 
of polynomials in generators $\delta_n$ and a certain 
coalgebra as a model of the coordinate algebra of 
the group of diffeomorphisms that fix the origin with 
unit differential. The Hopf algebra $H(1)$ was introduced 
by Connes and Moscovici (1998) although not 
actually as a bicrossproduct (but motivated by the 
bicrossproduct theory) as part of a family $H(n)$ useful 
in cyclic cohomology computations. It has cross 
relations and coproduct determined by 

$$[\delta_n,x]=\delta_{n+1},\qquad [\delta_n,y]=n\delta_n,\qquad \Delta\delta_1=\delta_1\otimes 1+1\otimes\delta_1$$

$$\Delta x=x\otimes 1+1\otimes x+\delta_1\otimes y,\qquad \Delta y=y\otimes 1+1\otimes y$$

which we see has a semidirect product form where 
$\delta_n\triangleleft x=\delta_{n+1}$, $\delta_n\triangleleft y=n\delta_n$. The coalgebra is also a 
semidirect coproduct by means of a back-reaction of 
$H_\infty$ on $B_+$ (expressed as a coaction). From the 
bicrossproduct theory, we also have a dual model 

$$\mathbb{C}[B_+]{\blacktriangleright\!\!\triangleleft}U(\mathrm{diff}_0)$$

where $\mathrm{diff}_0$ is the Lie algebra of the group of 
diffeomorphisms fixing the origin. As such it could be 
viewed as in the family of examples in the section 
“Bicrossproduct Poincaré quantum groups” but 
now with a 2D $B_+$. We also conclude from 
the bicrossproduct theory that this acts covariantly on 
$\mathbb{R}^2_\lambda=U(b_+)$ after introducing the scaling parameter $\lambda$. 
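As a quick consistency check on the cross relations of $H(1)$, one can verify that they are compatible with the bracket of $b_+$. The sketch uses only the relations $[x,y]=x$, $[\delta_n,x]=\delta_{n+1}$ and $[\delta_n,y]=n\delta_n$ together with the Jacobi identity in the form $[\delta_n,[x,y]] = [[\delta_n,x],y] + [x,[\delta_n,y]]$.

```latex
\begin{align*}
[\delta_n,[x,y]]
 &= [[\delta_n,x],y] + [x,[\delta_n,y]]\\
 &= [\delta_{n+1},y] + n\,[x,\delta_n]\\
 &= (n+1)\,\delta_{n+1} - n\,\delta_{n+1}
  = \delta_{n+1} = [\delta_n,x].
\end{align*}
```

Both sides agree, so the assignment $\delta_n\triangleleft x=\delta_{n+1}$, $\delta_n\triangleleft y=n\delta_n$ does define an action of the Lie algebra with $[x,y]=x$.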

Finally, the Hopf algebra H(1) is also part of a 
family of bicrossproduct Hopf algebras built on rooted 
trees and related to bookkeeping of overlapping 
divergences in renormalizable quantum field theories 
(see Hopf Algebra Structure of Renormalizable Quan- 
tum Field Theory). While we have not had room to 
cover all bicrossproduct quantum groups of interest, it 
would appear that bicrossproducts are indeed inti- 
mately tied up with actual quantum physics. 


See also: Classical r-matrices, Lie Bialgebras, and 
Poisson Lie Groups; Hopf Algebra Structure of 
Renormalizable Quantum Field Theory; Hopf Algebras 
and q-Deformation Quantum Groups; Quantum Group 
Differentials, Bundles and Gauge Theory; 
Riemann-Hilbert Problem; von Neumann Algebras: 
Introduction, Modular Theory, and Classification Theory. 


Introduction 

Consider the following equation: 
$$F(X,\mu)=0 \qquad [1]$$

where $X$ is the variable, $\mu$ is a parameter, and $X$, $\mu$, $F$ 
belong to appropriate (finite- or infinite-dimensional) 
spaces. The problem of bifurcation theory is to 
describe the singularities of the set of solutions 

$$S_\mu=\{X;\ (X,\mu)\ \text{satisfies}\ F(X,\mu)=0\}$$


The word “bifurcation” was introduced by H 
Poincaré (1885) in his study of equilibria of rotating 
liquid masses. 

The simplest example is the study of the real roots 
x of a quadratic polynomial 


$$x^2+bx+c=0 \qquad [2]$$

where $\mu$ is represented by the pair of parameters 
$(b,c)\in\mathbb{R}^2$. As is well known, real roots are 
determined by the sign of 

$$\Delta \stackrel{\text{def}}{=} b^2-4c$$

For $\Delta<0$, there is no real solution of [2], while 
there are two solutions $x_\pm$ in the region $\Delta>0$, 
which merge when the distance between the point 
$(b,c)$ and the parabola $\Delta=0$ tends towards 0. It is 
then clear that a singularity occurs in the structure 
of the set of solutions of [2] at the crossing of the 
parabola $\Delta=0$ or, in other words, a bifurcation 
occurs in the parameter space $(b,c)$ on the parabola 
$\Delta=0$. A point $(\mu_0,x_0)\in\mathbb{R}^3$ is then called a 
bifurcation point if $\mu_0=(b,c)$ satisfies $\Delta=0$, and 
$x_0=-b/2$. 
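The quadratic example can be explored directly. The following is a minimal sketch (the function name is chosen here for illustration) that returns the real roots of $x^2+bx+c=0$ according to the sign of the discriminant $b^2-4c$, so that the change in the number of solutions across the parabola $b^2=4c$ can be seen.

```python
import math

def real_roots(b, c):
    """Real roots of x^2 + b x + c = 0, classified by the discriminant b^2 - 4c."""
    disc = b * b - 4.0 * c
    if disc < 0:
        return []                      # no real solution
    if disc == 0:
        return [-b / 2.0]              # the two roots have merged at x0 = -b/2
    root = math.sqrt(disc)
    return [(-b - root) / 2.0, (-b + root) / 2.0]

if __name__ == "__main__":
    # Cross the parabola b^2 = 4c along the line b = 0 by varying c.
    for c in (1.0, 0.25, 0.0, -0.25, -1.0):
        print(c, real_roots(0.0, c))
```

Along $b=0$ the two roots are $\pm\sqrt{-c}$ for $c<0$; they approach each other as $c\to 0^-$, coincide at $c=0$, and disappear for $c>0$.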

In the theory of differential equations, $F(X,\mu)$ 
often represents a vector field. This study is then 
concerned with the existence of equilibrium solutions 
to the differential equation 

$$\frac{dX}{dt}=F(X,\mu) \qquad [3]$$

and is therefore referred to as static bifurcation 
theory. In addition, dynamic bifurcation theory is 
concerned here with “changes” in the dynamic 
properties of the solutions of the differential 
equation as $\mu$ varies. A widely used way to 
characterize these “changes” is to say that the vector 
field $F(\cdot,\mu_0)$ is structurally stable if the sets of orbits 
of the differential equation are homeomorphic for $\mu$ 
close to $\mu_0$, with homeomorphisms which preserve 
the orientation of the orbits in time $t$. Then a 
bifurcation occurs at $\mu=\mu_0$ if $F(\cdot,\mu_0)$ is not 
structurally stable. It turns out that there is a close 
link between the stability properties of equilibrium 
solutions of the differential equation and the type of 
the bifurcation in static theory. 

The tools developed in bifurcation theory are 
extensively used to solve concrete problems arising 
in physics and natural sciences. These problems may 
be modeled by ordinary or partial differential 
equations, integral equations, but also delay equa- 
tions or iteration maps, and in all these cases the 
presence of parameters naturally leads to bifurcation 
phenomena. They can be regarded as problems of 
the form [1] or [3], in suitable function spaces, and 
bifurcation theory allows one to detect solutions and to 
describe their qualitative properties. During the last 
decades, a class of problems in which the use of 
bifurcation theory led to significant progress is 
concerned with nonlinear waves in partial differen- 
tial equations, including hydrodynamic problems, 
nonlinear water waves, elasticity, but also pattern 
formation, front propagation, or spiral waves in 
reaction—diffusion type systems. 


Examples in One and Two Dimensions 


The most complete results in bifurcation theory are 
available in one and two dimensions. The study of 
static bifurcations in one dimension is concerned 
with scalar equations 


$$f(x,\mu)=0 \qquad [4]$$

where $x\in\mathbb{R}$, $\mu\in\mathbb{R}$, and the function $f$ is supposed to 
be regular enough with respect to $(x,\mu)$. When 
$f(x_0,\mu_0)=0$ and the derivative of $f$ with respect to $x$ 
satisfies $\partial_xf(x_0,\mu_0)\neq 0$, the implicit function theorem 
gives a unique branch of solutions $x(\mu)$ for $\mu$ close to 
$\mu_0$, and shows the absence of bifurcation points near 
$(\mu_0,x_0)$. Bifurcation theory intervenes when 

$$\partial_xf(x_0,\mu_0)=0 \qquad [5]$$

and one cannot apply the implicit function theorem 
for solving with respect to $x$ near $x_0$. A complete 
description of the set of solutions near $(x_0,\mu_0)$ can 
be obtained by looking at the partial derivatives of $f$ 
with respect to $x$ and $\mu$. 
For example, if 

$$\partial_\mu f(x_0,\mu_0)\neq 0,$$

it is possible to solve with respect to $\mu$ and obtain a 
regular solution $\mu(x)$ such that $\mu(x_0)=\mu_0$ and 
$f(x,\mu(x))=0$. In addition, if the second order 
derivative 

$$\partial_x^2f(x_0,\mu_0)\neq 0$$

the picture of the solution set in the plane $(\mu,x)$, also 
called bifurcation diagram, shows a turning point 
with a fold opened to the left or to the right 
depending upon the sign of the product 
$\partial_\mu f(x_0,\mu_0)\,\partial_x^2f(x_0,\mu_0)$; see Figure 1. Notice that here the 
bifurcation point $(\mu_0,x_0)\in\mathbb{R}^2$ corresponds to the 
appearance of a pair of solutions of [4] “from 
nowhere”. This is the simplest example of a one-sided 
bifurcation in which the bifurcating solutions 
exist for either $\mu>\mu_0$ or $\mu<\mu_0$. 
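A concrete instance may help. The model function below is chosen here for illustration and is not taken from the text; it satisfies the turning point conditions at $(x_0,\mu_0)=(0,0)$.

```latex
\begin{align*}
f(x,\mu) &= \mu - x^2, \qquad (x_0,\mu_0) = (0,0),\\
f(0,0) &= 0,\quad \partial_x f(0,0) = 0,\quad
\partial_\mu f(0,0) = 1 \neq 0,\quad \partial_x^2 f(0,0) = -2 \neq 0,\\
\mu(x) &= x^2 \quad\Longrightarrow\quad
x_\pm(\mu) = \pm\sqrt{\mu}\ \text{ for } \mu > 0,
\qquad \text{no solution for } \mu < 0 .
\end{align*}
```

Here the product $\partial_\mu f\,\partial_x^2 f = -2$ is negative and the pair of solutions exists only on the side $\mu>\mu_0$, so the parabola $\mu=x^2$ in the $(\mu,x)$ plane is the fold.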

A particularly interesting situation arises when the 
equation possesses a symmetry. For example, assume 
that in [4] the function $f$ is odd with respect to $x$. This 
implies that we always have the solution $x=0$, for any 
value of the parameter $\mu$. Assume now that $f$ satisfies 

$$\partial_xf(0,\mu_0)=0 \qquad [6]$$

and that 

$$\partial^2_{x\mu}f(0,\mu_0)\neq 0,\qquad \partial^3_xf(0,\mu_0)\neq 0 \qquad [7]$$

Then the point $(\mu_0,0)$ is a pitchfork bifurcation 
point, this denomination being related to the 
bifurcation diagram in the plane $(\mu,x)$; see Figure 2. 
Notice that here, the bifurcation point $(\mu_0,x_0)\in\mathbb{R}^2$ 
corresponds to the bifurcation from the origin of a pair 
of solutions exchanged by the symmetry $x\to -x$, in 
addition to the persistent “trivial” solution $x=0$ 
which is invariant under the above symmetry. Such a 
bifurcation is also referred to as a symmetry-breaking 
bifurcation. Similar bifurcation diagrams are found 
when the equation [4] has a “known” branch of 
solutions $x(\mu)$ for $\mu$ close to $\mu_0$. This situation arises 
often in applications where usually this branch consists 
of trivial solutions $x(\mu)=0$. Then at a bifurcation 
point $(\mu_0,x_0)$ a second branch of solutions appears 
forming either a one-sided bifurcation, or a two-sided 
bifurcation; see Figure 3. 

Figure 1 Turning point bifurcation in the case $\partial_\mu f(x_0,\mu_0)>0$ 
and $\partial_x^2f(x_0,\mu_0)<0$. The solid (dashed) line indicates the branch 
of stable (unstable) solutions in the differential equation. 

Figure 2 Supercritical pitchfork bifurcation in the case 
$\partial^2_{x\mu}f(0,\mu_0)>0$ and $\partial^3_xf(0,\mu_0)<0$. The solid (dashed) lines 
indicate the branch of stable (unstable) solutions in the 
differential equation. 

We can now view f as a vector field in the 
ordinary differential equation 


$$\frac{dx}{dt}=f(x,\mu) \qquad [8]$$

and the study above corresponds to looking for 
equilibrium solutions of [8]. The stability of such a 
solution is determined by the sign of the derivative 
$\partial_xf(x,\mu)$ of $f$ at this equilibrium, and it is closely 
related to the type of the static bifurcation. 

In the case of a turning point bifurcation, when 
$\partial_x^2f(x_0,\mu_0)\neq 0$, the sign of $\partial_xf(x,\mu)$ is different for 
the two bifurcating solutions. This means that one 
solution is attracting (i.e., stable), the other one 
being repelling (i.e., unstable); see Figure 1. In the 
case of a pitchfork bifurcation as above, the stability 
of the trivial solution $x=0$ changes when $\mu$ crosses 
$\mu_0$, and the stability of both bifurcating nonzero 
solutions is the opposite from the stability of the 
origin on the side of the bifurcation. The bifurcation 
is called supercritical if the bifurcating solutions lie 
on the side of the bifurcation point where the basic 
solution $x=0$ is unstable and subcritical otherwise; 
see Figure 2. The situation is the same in the case of 
one-sided bifurcations for an equation which has a 
“known” branch of solutions. In the case of a two-sided 
bifurcation, there is an exchange of stability at 
the bifurcation point $(\mu_0,x_0)$, solutions on the two 
branches having opposite stability for $\mu>\mu_0$ and 
$\mu<\mu_0$, which changes at $(\mu_0,x_0)$. Such a bifurcation 
is also referred to as transcritical; see Figure 3. 

Figure 3 Typical bifurcation diagrams in the case of a branch 
of trivial solutions. One-sided bifurcations: (a) supercritical, 
(b) subcritical; two-sided bifurcation: (c) transcritical. The solid 
(dashed) lines indicate the branch of stable (unstable) solutions 
in the differential equation. 
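The link between the static picture and stability can be illustrated on three model vector fields. The sketch below is an illustration under stated assumptions: the specific functions $\mu-x^2$, $\mu x-x^3$ and $\mu x-x^2$ are standard textbook choices made here, not formulas from the text, and the function name is hypothetical. Each equilibrium of $dx/dt=f(x,\mu)$ is labeled by the sign of $\partial_x f$ there.

```python
import math

def equilibria(kind, mu):
    """Equilibria of dx/dt = f(x, mu) for three model vector fields.

    Returns a list of (x, label) with label 'stable' if df/dx < 0,
    'unstable' if df/dx > 0 and 'degenerate' if df/dx = 0.
    """
    if kind == "fold":               # f = mu - x^2
        xs = [] if mu < 0 else sorted({-math.sqrt(mu), math.sqrt(mu)})
        slope = lambda x: -2.0 * x
    elif kind == "pitchfork":        # f = mu x - x^3, odd in x
        xs = [0.0] if mu <= 0 else [-math.sqrt(mu), 0.0, math.sqrt(mu)]
        slope = lambda x: mu - 3.0 * x * x
    elif kind == "transcritical":    # f = mu x - x^2, trivial branch x = 0
        xs = sorted({0.0, float(mu)})
        slope = lambda x: mu - 2.0 * x
    else:
        raise ValueError(f"unknown kind: {kind}")

    def label(s):
        if s < 0:
            return "stable"
        if s > 0:
            return "unstable"
        return "degenerate"

    return [(x, label(slope(x))) for x in xs]

if __name__ == "__main__":
    for kind in ("fold", "pitchfork", "transcritical"):
        for mu in (-1.0, 1.0):
            print(kind, mu, equilibria(kind, mu))
```

In the pitchfork model the nonzero equilibria appear for $\mu>0$, where $x=0$ is unstable, so it is supercritical in the sense defined above; in the transcritical model the two branches exchange stability as $\mu$ crosses zero.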

Notice that the study of fixed points or periodic 
points for maps fits into the above framework. Specifically, 
the period-doubling process occurring in 
successive bifurcations of one-dimensional maps is 
a common phenomenon in physics. 
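Period doubling is easy to observe numerically. The sketch below uses the logistic map $x\mapsto r\,x(1-x)$ as an example; this map is a standard illustration chosen here and is not named in the text, and the function names and tolerances are assumptions. It discards a transient and then reports the smallest period of the attracting orbit.

```python
def logistic(r, x):
    """One step of the logistic map x -> r x (1 - x)."""
    return r * x * (1.0 - x)

def attractor_period(r, x0=0.2, transient=20_000, max_period=64, tol=1e-9):
    """Smallest k <= max_period with |x_{n+k} - x_n| < tol after a transient.

    Returns None if no such k is found (for example in a chaotic regime).
    """
    x = x0
    for _ in range(transient):
        x = logistic(r, x)
    orbit = [x]
    for _ in range(max_period):
        orbit.append(logistic(r, orbit[-1]))
    for k in range(1, max_period + 1):
        if abs(orbit[k] - orbit[0]) < tol:
            return k
    return None

if __name__ == "__main__":
    for r in (2.8, 3.2, 3.5):
        print(r, attractor_period(r))
```

For this map the fixed point $x^*=1-1/r$ loses stability at $r=3$, where a period-2 orbit is born; that orbit in turn doubles at $r=1+\sqrt{6}$, and so on.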

The analysis of bifurcations in two dimensions 
leads to more complicated scenarios. Consider the 
differential equation [8] in which now $x\in\mathbb{R}^2$ and 
$f(x,\mu)\in\mathbb{R}^2$, and assume that $f(x_0,\mu_0)=0$. The 
behavior of solutions near $(x_0,\mu_0)$ is determined by 
the differential $D_xf(x_0,\mu_0)=:L$ of $f$ with respect to 
$x$, which can be identified with a $2\times 2$ matrix. For 
steady solutions, the implicit function theorem 
ensures the existence of a unique branch of solutions 
$x(\mu)$ provided $L$ is invertible or, in other words, zero 
does not belong to the spectrum of $L$. Consequently, 
the study of bifurcations of steady solutions is 
concerned with the case when zero belongs to the 
spectrum of $L$, and can be performed following 
the strategy described for one dimension, provided 
that the zero eigenvalue of $L$ is simple. For example, 
assuming that the second eigenvalue is negative 
leads in general to a saddle-node bifurcation, where 
an additional dimension is added to the previous 
picture of a turning point bifurcation, in which one 
of the two bifurcating steady solutions is a stable 
node, while the other one is a saddle. If, in addition, 
there is a symmetry $S$ commuting with $f$, that is, 
such that $f(Sx,\mu)=Sf(x,\mu)$, and if, for example, $x_0$ 
is invariant under $S$, $Sx_0=x_0$, and the eigenvector $\zeta_0$ 
associated to the zero eigenvalue of $L$ is antisymmetric, 
$S\zeta_0=-\zeta_0$, then there is again a pitchfork 
bifurcation. The equation possesses a branch of 
symmetric steady solutions the stability of which 
changes when crossing the value $\mu_0$ of the parameter, 
node on one side and saddle on the other, 
and a pair of solutions is created in a one-sided 
bifurcation which are exchanged by the symmetry $S$ 
and have stability opposite to the one of the 
symmetric solution, just as in the one-dimensional 
pitchfork bifurcation above. 

A new type of bifurcation that arises for vector 
fields in two dimensions is the so-called Hopf 
bifurcation. This bifurcation was first understood 
by Poincaré, and then proved in two dimensions by 
Andronov (1937) using a Poincaré map, and later in 
$n$ dimensions by Hopf (1948) by means of a 
Liapunov-Schmidt-type method. For the differential 
equation, the absence of the zero eigenvalue in the 
spectrum of $L$ is not enough to ensure that the 
vector field $f(\cdot,\mu_0)$ is structurally stable in a 
neighborhood of $x_0$. This only holds when the 
spectrum of $L$ does not contain purely imaginary 
eigenvalues, as asserted by the Hartman-Grobman 
theorem. We are then left with the case when $L$ has 
a pair of purely imaginary eigenvalues $\pm\imath\omega$, $\omega\in\mathbb{R}^*$. 
Static bifurcation theory gives that the system has a 
unique branch of equilibria $(x(\mu),\mu)$ for $\mu$ close to 
$\mu_0$, and typically their stability changes as $\mu$ crosses 
$\mu_0$. For the differential equation a Hopf bifurcation 
occurs in which a branch of periodic orbits 
bifurcates on one side of $\mu_0$, and their stability is 
opposite to that of the steady solution on this side; 
see Figure 4. A convenient way to study this 
bifurcation is through “normal form theory,” 
which is briefly described below. 

Figure 4 Supercritical Hopf bifurcation. 


Local Bifurcation Theory 


There are two aspects of bifurcation theory, local 
and global theory. As this designation suggests, local 
theory is concerned with (local) properties of the set 
of solutions in a neighborhood of a “known” 
solution, while global theory investigates solutions 
in the entire space. 

An important class of tools in local bifurcation 
theory consists of reduction methods, among which 
the Liapunov-Schmidt reduction and the center 
manifold reduction are often used to investigate 
static and dynamic bifurcations, respectively. The 
basic idea is to replace the bifurcation problem by 
an equivalent problem in lower dimensions, for 
example, a one- or a two-dimensional problem as 
the ones above. 

Consider again the equation [1] in which
$F:\mathcal{X}\times\mathcal{M}\to\mathcal{Y}$ is sufficiently regular, and
$\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{M}$ are
Banach spaces. Assume, without loss of generality,
that F(0,0)=0, or, in other words, that one solution
is known. The equation can then be written as


$$LX + G(X,\mu) = 0$$


in which $L = D_X F(0,0)$ represents the differential of
F with respect to X at (0, 0), and is assumed to have
a closed range. The implicit function theorem shows 
absence of bifurcation if L has a bounded inverse, so 
that bifurcations are related to the existence of a 
nontrivial kernel of L. The Liapunov-Schmidt 
reduction then goes as follows. 

Let N(L) and R(L) denote the kernel and the range of
L, respectively, and consider continuous projections
$P:\mathcal{X}\to N(L)$ and $Q:\mathcal{Y}\to R(L)$. Then there exists a
bounded linear operator $B: R(L)\to(\mathrm{id}-P)\mathcal{X}$, the right
inverse of L, satisfying $LB=\mathrm{id}$ on R(L) and $BL=\mathrm{id}-P$
on $\mathcal{X}$. For $X\in\mathcal{X}$ one may write


$$X = X_0 + X_1, \qquad X_0 = PX, \qquad X_1 = (\mathrm{id}-P)X,$$

and then by projecting with $\mathrm{id}-Q$ and $Q$ the
equation becomes


$$(\mathrm{id}-Q)\,G(X_0+X_1,\mu) = 0,$$
$$X_1 + BQ\,G(X_0+X_1,\mu) = 0.$$


The implicit function theorem allows one to solve the
second equation for $X_1=\Psi(X_0,\mu)$ in a neighborhood
of the origin. Substitution into the first equation leads
to the equation in $(\mathrm{id}-Q)\mathcal{Y}$ for $X_0$ in $P\mathcal{X}$,


$$(\mathrm{id}-Q)\,G(X_0+\Psi(X_0,\mu),\mu) = 0,$$


also called the bifurcation equation. This equation
completely describes the set of solutions to [1] in a
neighborhood of (0, 0), and the problem is then
posed in a space of dimension much smaller than the
dimension of $\mathcal{X}$.
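
As a minimal worked illustration of the reduction, consider the following two-dimensional example. The map F below is an assumption chosen for illustration and is not taken from the text; here $X=(x,y)\in\mathbb{R}^2$, $\mu\in\mathbb{R}$, and $e_1$, $e_2$ denote the standard basis vectors.

```latex
\begin{aligned}
F(X,\mu) &= \begin{pmatrix} \mu x - x y \\ y - x^2 \end{pmatrix}, \qquad
L = D_X F(0,0) = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
G(X,\mu) = \begin{pmatrix} \mu x - x y \\ -x^2 \end{pmatrix}, \\
N(L) &= \mathrm{span}\{e_1\}, \qquad R(L) = \mathrm{span}\{e_2\}, \qquad
P = \text{projection onto } e_1, \qquad Q = \text{projection onto } e_2, \\
X_0 &= (x,0), \qquad X_1 = (0,y), \qquad B = \mathrm{id} \ \text{on } R(L), \\
X_1 + BQ\,G(X_0+X_1,\mu) = 0 \ &\Longleftrightarrow\ y - x^2 = 0
  \ \Longleftrightarrow\ y = \Psi(x,\mu) = x^2, \\
(\mathrm{id}-Q)\,G(X_0+\Psi(X_0,\mu),\mu) = 0 \ &\Longleftrightarrow\
  \mu x - x^3 = x(\mu - x^2) = 0 .
\end{aligned}
```

The bifurcation equation is scalar and has the solutions x = 0 and $x^2=\mu$, so in this example the reduction recovers a pitchfork bifurcation at $\mu=0$.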

The basic principle of the Liapunov-Schmidt method
was discovered and used independently by different
authors. E Schmidt (1908) used this method for integral
equations, while Liapunov used it to study the stability 
of the zero solution of nonlinear partial differential 
equations when the linear part has zero eigenvalues 
(1947), and later in 1960 for the bifurcation problem 
studied by Poincaré (1885). In working in a Banach 
space of t-periodic functions, the Liapunov—Schmidt 
method may be used to solve the Hopf bifurcation 
problem, as did Hopf himself in 1948. 

The analog of this reduction procedure for the 
differential equation [3] is the center manifold 
reduction. Assuming that F(0,0)=0, we obtain the 
differential equation 


$$\frac{dX}{dt} = LX + G(X,\mu).$$


Since dynamic bifurcations are related to the existence
of purely imaginary spectral values of L, the kernel of L
alone is not enough to describe this situation. One has to
consider the spectral space $\mathcal{X}_c$ of L associated to the
purely imaginary spectrum of L. A spectral gap is
needed between this part of the spectrum and the rest
(always true in finite dimensions), so that the spectral
projection P onto $\mathcal{X}_c$ is well defined. One writes


$$X = X_c + X_h, \qquad X_c = PX, \qquad X_h = (\mathrm{id}-P)X,$$

and obtains the decomposed system

$$\frac{dX_c}{dt} = LX_c + P\,G(X_c+X_h,\mu),$$
$$\frac{dX_h}{dt} = LX_h + (\mathrm{id}-P)\,G(X_c+X_h,\mu).$$


The reduction procedure works provided the
nonhomogeneous linear equation

$$\frac{dX_h}{dt} = LX_h + f$$

possesses a unique solution in suitably chosen
function spaces with weak exponential growth,
such that one can then solve the second equation
for $X_h=\Psi(X_c)$ in a neighborhood of the origin in
these function spaces. This property is always true in
finite dimensions, but it has to be checked in infinite
dimensions. Different results showing the solvability
of this equation are available in both Banach and
Hilbert spaces, relying upon additional conditions
on the spectrum of L, decay properties of the
resolvent of L on the imaginary axis, and regularity
properties of the nonlinearity G. The map $\Psi$ is then
used to construct a map $\psi: P\mathcal{X}\times\mathcal{M}\to(\mathrm{id}-P)\mathcal{X}$,
defined in a neighborhood of the origin, which
parametrizes a local center manifold invariant under
the flow of the equation. The flow on this center
manifold is governed by the reduced equation in $\mathcal{X}_c$,


$$\frac{dX_c}{dt} = LX_c + P\,G(X_c+\psi(X_c,\mu),\mu),$$


which completely describes the bifurcation problem. 
The first proofs of this result were given in finite 
dimensions by Pliss (1964) and Kelley (1967). Center 
manifolds in infinite dimensions have been studied in 
different settings determined by assumptions on the 
linear part L and the nonlinear part G. One typical 
assumption in infinite dimensions is that the spectrum 
of L contains only a finite number of purely imaginary 
eigenvalues, so that the reduced equation above is a 
differential equation in a finite-dimensional space. 
These reduction methods work for a large class of 
problems and the advantage of such an approach is 
that one is left with a bifurcation problem in a 
lower-dimensional space. The methods involved in
solving this reduced bifurcation problem can be very
different from one problem to another, and often 
make use of some additional structure in the problem, 
such as a gradient-like structure, Hamiltonian 
structure, or the presence of symmetries, which 
are preserved by the reduction procedure. 
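
The center manifold reduction can be illustrated numerically on a planar model. The sketch below assumes the system x' = x y, y' = -y - x^2 (an illustrative choice, not taken from the text), whose linear part has the eigenvalues 0 and -1. A formal expansion gives the center manifold y = h(x) = -x^2 + O(x^4) and the reduced equation x' = -x^3 + O(x^5); the code compares the full flow with this reduced equation.

```python
def rk4(field, state, dt, steps):
    """Classical fourth-order Runge-Kutta integration of state' = field(state)."""
    for _ in range(steps):
        k1 = field(state)
        k2 = field(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
        k3 = field(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
        k4 = field(tuple(s + dt * k for s, k in zip(state, k3)))
        state = tuple(
            s + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state

def full_field(state):
    """Assumed model: center direction x, stable direction y."""
    x, y = state
    return (x * y, -y - x * x)

def reduced_field(state):
    """Leading-order reduced equation on the center manifold y = -x**2."""
    (x,) = state
    return (-x ** 3,)

x_full, y_full = rk4(full_field, (0.2, 0.0), 0.01, 2000)
(x_red,) = rk4(reduced_field, (0.2,), 0.01, 2000)
print(x_full, y_full, x_red)
```

After the fast transient, the full solution stays close to the graph y = -x^2 and its x-component follows the one-dimensional reduced equation, up to the higher-order terms neglected in this sketch.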

A powerful tool for the analysis of these reduced 
differential equations is provided by the normal 
form theory, which goes back to works of Poincaré 
(1885) and Birkhoff (1927). The idea is to use 
coordinate transformations to make the expression 
of the vector field as simple as possible. The 
transformed vector field is called normal form. 
There is an extensive literature on normal forms 
for vector fields in many different contexts, in both 
finite- and infinite-dimensional cases. Typically the 
classes of normal forms are characterized in terms of 
the linear part of the differential equation. 

For differential equations of the form


$$\frac{dx}{dt} = Lx + g(x,\mu) \qquad [9]$$

in which L is a matrix and g a sufficiently regular
map such that $g(0,0)=0$, $D_xg(0,0)=0$, as encountered
in bifurcation theory, one possible characterization
of normal forms makes use of the adjoint
matrix $L^*$. Fixing any order $k\ge 2$, there exist
polynomials $\Phi$ and $N$ of degree k in x with
coefficients which are regular functions of $\mu$,
and $\Phi(0,0)=N(0,0)=0$, $D_x\Phi(0,0)=D_xN(0,0)=0$,
such that by the change of variables


$$x = y + \Phi(y,\mu)$$


the equation [9] is transformed into the normal form


$$\frac{dy}{dt} = Ly + N(y,\mu) + o(\|y\|^k) \qquad [10]$$

in which the polynomial N is characterized through

$$N(e^{tL^*}y,\mu) = e^{tL^*}N(y,\mu)$$

for all y, $\mu$, and t, or, equivalently,

$$D_yN(y,\mu)\,L^*y = L^*N(y,\mu)$$


for all y and $\mu$. This characterization allows one to determine
the classes of possible normal forms for a given matrix L,
and also provides an efficient way to compute the 
normal form for a given vector field g. As for the 
reduction methods, normal form transformations can be 
made to preserve the additional structure of the 
problem, such as Hamiltonian structure or symmetries. 

As an example, consider a differential equation of
the form [9] with $x\in\mathbb{R}^n$ and $\mu\in\mathbb{R}$, which supports a
Hopf bifurcation so that L has simple eigenvalues
$\pm i\omega$, $\omega>0$, and no other eigenvalues with zero real
part. The center manifold reduction provides a
two-dimensional reduced system with linear part
having the simple eigenvalues $\pm i\omega$, for which it is
convenient to write the normal form in complex
variables


$$\frac{dA}{dt} = i\omega A + A\,Q(|A|^2,\mu) + o(|A|^{2k+2})$$

for $A(t)\in\mathbb{C}$, where Q is a complex polynomial of
degree k in $|A|^2$ with $Q(0,0)=0$, or, equivalently, in
polar coordinates $A=re^{i\theta}$,


$$\frac{dr}{dt} = r\,Q_r(r^2,\mu) + o(r^{2k+2}),$$
$$\frac{d\theta}{dt} = \omega + Q_i(r^2,\mu) + o(r^{2k+1}),$$


$Q_r$ and $Q_i$ being the real and imaginary part of Q,
respectively. The radial equation for r truncated at
order 2k + 1 decouples and admits a pitchfork bifurcation.
The bifurcating steady solutions of this equation
then lead first to periodic solutions for the truncated 
system, which are then shown to persist for the full 
equation by a standard perturbation analysis. 
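
The behavior of the truncated normal form can be sketched numerically. The code below assumes the simplest supercritical truncation with k = 1, namely Q(|A|^2, mu) = mu - |A|^2 (an illustrative choice of coefficients, not taken from the text), so that the radial equation reads dr/dt = r(mu - r^2).

```python
def hopf_rhs(a, mu, omega):
    """Truncated normal form dA/dt = i*omega*A + A*Q with the assumed Q = mu - |A|^2."""
    return 1j * omega * a + a * (mu - abs(a) ** 2)

def integrate(a0, mu, omega=1.0, dt=0.01, t_end=60.0):
    """Integrate the complex amplitude with a classical Runge-Kutta scheme."""
    a = complex(a0)
    for _ in range(int(round(t_end / dt))):
        k1 = hopf_rhs(a, mu, omega)
        k2 = hopf_rhs(a + 0.5 * dt * k1, mu, omega)
        k3 = hopf_rhs(a + 0.5 * dt * k2, mu, omega)
        k4 = hopf_rhs(a + dt * k3, mu, omega)
        a += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return a

amplitude_above = abs(integrate(0.05, mu=0.25))   # parameter beyond the bifurcation
amplitude_below = abs(integrate(0.05, mu=-0.25))  # parameter before the bifurcation
print(amplitude_above, amplitude_below)
```

In this sketch the nonzero steady state r = sqrt(mu) of the radial equation corresponds to a periodic orbit of the amplitude equation, which exists only on the side mu > 0 and attracts nearby solutions, while for mu < 0 the amplitude decays to the equilibrium.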

A situation that occurs in a large class of problems 
is when the problem possesses a reversibility 
symmetry, which often comes from some reflection 
invariance in the physical space, that is, when the 
vector field $F(\cdot,\mu)$ anticommutes with a symmetry
operator S. One of the simplest examples is the case
of a differential equation [9] when the matrix L has
a double eigenvalue at 0, no other eigenvalues with
zero real part, and a one-dimensional kernel which
is invariant under S. In this case, the center manifold
reduction provides a two-dimensional reduced reversible
system, which can be put in the normal form


$$\frac{da}{dt} = b,$$
$$\frac{db}{dt} = \mu - a^2 + o\bigl((|a|+|b|)^2\bigr),$$

which anticommutes with the symmetry
$(a,b)\mapsto(a,-b)$. The above system undergoes a
reversible Takens-Bogdanov bifurcation and has
for $\mu>0$ a phase portrait as in Figure 5. There are
two equilibria, one a saddle, the other a center, and 
a family of periodic orbits with the zero-amplitude 
limit at the center equilibrium, and the infinite- 
period limit a homoclinic orbit, originating at the 
saddle point. In concrete problems the bounded
orbits of such a reduced system determine the shape
of physically interesting solutions of the full system
of equations, such as, for example, in water-wave
theory where homoclinic and periodic orbits
correspond to solitary and periodic waves, respectively.

Figure 5 Phase portrait of the reduced system in a reversible
Takens-Bogdanov bifurcation (left) and sketch of the a-component
of solutions corresponding to homoclinic and periodic orbits (right).

Figure 6 Phase portrait of the reduced system in absence of
reversibility (left) and sketch of the a-component of the solution
corresponding to the bounded orbit (right).


Notice that in the absence of the reversibility 
symmetry, the same type of bifurcation may lead to 
a completely different phase portrait for the reduced 
system as, for example, the one in Figure 6 in which 
the homoclinic and the periodic orbits disappear. 
This situation often occurs in the presence of a small 
dissipation in nearly reversible systems. 
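
The structure of the reversible case can be checked on a truncated model. The sketch below assumes the truncated system da/dt = b, db/dt = mu - a^2; the sign convention and the omission of higher-order terms are assumptions made for illustration. For this truncation the quantity H = b^2/2 - mu*a + a^3/3 is conserved, which is why the equilibrium with purely imaginary eigenvalues is a genuine center.

```python
def field(a, b, mu):
    """Assumed truncated reversible system: da/dt = b, db/dt = mu - a**2."""
    return (b, mu - a * a)

def energy(a, b, mu):
    """First integral of the truncated system."""
    return 0.5 * b * b - mu * a + a ** 3 / 3.0

def classify(a_eq):
    """Jacobian [[0, 1], [-2a, 0]] has trace 0 and determinant 2a."""
    det = 2.0 * a_eq
    if det > 0:
        return "center"
    if det < 0:
        return "saddle"
    return "degenerate"

def integrate(a, b, mu, dt=0.001, steps=20000):
    """Classical Runge-Kutta integration of the planar system."""
    for _ in range(steps):
        k1 = field(a, b, mu)
        k2 = field(a + 0.5 * dt * k1[0], b + 0.5 * dt * k1[1], mu)
        k3 = field(a + 0.5 * dt * k2[0], b + 0.5 * dt * k2[1], mu)
        k4 = field(a + dt * k3[0], b + dt * k3[1], mu)
        a += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        b += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return a, b

mu = 1.0
a_end, b_end = integrate(1.2, 0.0, mu)  # orbit around the center at a = 1
drift = abs(energy(a_end, b_end, mu) - energy(1.2, 0.0, mu))
print(classify(1.0), classify(-1.0), drift)
```

The reversibility shows up as field(a, -b) = (-f_a, f_b), where (f_a, f_b) = field(a, b): the vector field anticommutes with the reflection (a, b) -> (a, -b). A small dissipative perturbation would destroy the conserved quantity and with it the family of periodic orbits.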


Global Bifurcation Theory 


Most of the existing results in global bifurcation 
theory concern the static problem [1]. The analysis 
of global sets of solutions often relies upon 
topological methods, degree theory, but also variational
methods, or analytic function theory. Significant
progress in understanding global branches of
solutions was made in the 1970s, in particular,
for nonlinear eigenvalue problems and the Hopf 
bifurcation problem (see, e.g., works by Rabinowitz, 
Crandall, Dancer, and Alexander, Yorke, Ize, 
respectively). 

A now-classical result in the topological theory of 
global bifurcations is the following theorem by 
Rabinowitz (1970), which gives a characterization 
of global sets of solutions for eigenvalue problems of 
the form 


$$X = F(X,\mu) = \mu LX + H(X,\mu),$$


$H(X,\mu)=o(\|X\|)$, posed for $(X,\mu)\in\mathcal{X}\times\mathbb{R}$, $\mathcal{X}$ being
a Banach space. In contrast to local theory where
the function F is usually k-times differentiable (with
a suitable k), in the global theory a typical
assumption is that $F:\mathcal{X}\times\mathbb{R}\to\mathcal{X}$ is compact. The
equation above possesses a “trivial” branch of
solutions $(0,\mu)$ for any $\mu$. The bifurcation result
asserts that if for some real parameter value $\mu_0$ zero
is an eigenvalue of odd multiplicity of the operator
$\mathrm{id}-\mu_0L$, then the closure S of the set of nontrivial solutions $(X,\mu)$
possesses a maximal subcontinuum which contains
$(0,\mu_0)$ and meets either infinity in $\mathcal{X}\times\mathbb{R}$ or another
trivial solution $(0,\mu_1)$, $\mu_1\ne\mu_0$. In particular, $(0,\mu_0)$
is a bifurcation point. A local version of this result is
often referred to as Krasnoselski’s theorem.
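
A one-dimensional example, chosen here for illustration and not taken from the text, shows the first alternative of the theorem. Take $\mathcal{X}=\mathbb{R}$, $L=\mathrm{id}$, and $H(X,\mu)=-X^3$, so that $H(X,\mu)=o(|X|)$ and F is compact because the space is finite-dimensional.

```latex
\begin{aligned}
X = \mu X - X^3 \ &\Longleftrightarrow\ X\,(1 - \mu + X^2) = 0
  \ \Longleftrightarrow\ X = 0 \ \text{ or } \ X^2 = \mu - 1, \\
\mathrm{id} - \mu_0 L = 1 - \mu_0 = 0 \ &\Longleftrightarrow\ \mu_0 = 1
  \quad (\text{zero is an eigenvalue of multiplicity } 1), \\
\{(X,\mu) : X^2 = \mu - 1,\ \mu \ge 1\} \ &\ni\ (0,1), \qquad
  |X| \to \infty \ \text{ as } \ \mu \to \infty .
\end{aligned}
```

The continuum of nontrivial solutions emanating from $(0,\mu_0)=(0,1)$ is unbounded in $\mathcal{X}\times\mathbb{R}$ and does not return to the trivial branch.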

Different versions and extensions of these theo- 
rems can be found in the literature, as, for example, 
in the case of a simple eigenvalue, or if the field F is 
real-analytic when the set of solutions is path- 
connected. More recent works address the question 
of lack of compactness, and a number of results are 
now available for problems with additional struc- 
ture (gradient-like or Hamiltonian structure), but 
also for concrete problems, such as the water-wave 
problem. 


See also: Bifurcations in Fluid Dynamics; Bifurcations of 
Periodic Orbits; Central Manifolds, Normal Forms; 
Dynamical Systems in Mathematical Physics: An 
Illustration from Water Waves; Ginzburg-Landau 
Equation; Integrable Systems: Overview; Leray- 
Schauder Theory and Mapping Degree; Singularity and 
Bifurcation Theory; Stability Theory and KAM; Symmetry 
and Symmetry Breaking in Dynamical Systems.

Introduction


Almost all classical hydrodynamical stability problems 
are experiments or gedankenexperiments which have
been designed to understand and to extract special 
phenomena in more complicated situations. Examples 
are the Taylor—Couette problem, Bénard’s problem, 
Poiseuille flow, or Kolmogorov flow. 

The Taylor—Couette problem consists in finding the 
flow of a viscous incompressible fluid contained in 
between two coaxial co- or counterrotating cylinders, 
cf. Figure 1. If the rotational velocity of the inner 
cylinder is below a certain threshold, the trivial 
solution, called the Couette flow, is asymptotically 
stable. At the threshold, this spatially homogeneous
solution becomes unstable and bifurcates via a pitchfork
bifurcation or a Hopf bifurcation into different
spatially periodic patterns, that is, depending on the
rotational velocity of the outer cylinder the basic
patterns are stationary (called the Taylor vortices) or
time-periodic. If the rotational velocity of the inner
cylinder is increased further, more complicated patterns
occur. The bifurcation scenario is well understood
from experiments and analytic investigations.

Figure 1 The Taylor-Couette problem with the Taylor vortices.

Bénard’s problem consists in finding the flow of a 
viscous incompressible fluid contained in between two 
plates, where the lower plate is heated and the upper 
plate is kept at a constant temperature, cf. Figure 2. If 
the temperature difference between the two plates is 
below a certain threshold, the transport of energy from 
below to above is made by pure conduction. At this
threshold, this spatially homogeneous solution becomes
unstable, convection sets in, and spatially periodic
patterns such as rolls or hexagons occur. Convection
problems play a big role in geophysical applications,
that is, in spherical domains such as the Earth. The paradigm
for an anisotropic pattern-forming system is
electroconvection in nematic liquid crystals.

Poiseuille flow consists in finding the flow of a 
viscous incompressible fluid flowing through a pipe 
driven by some pressure gradient, cf. Figure 3. In 
noncircular pipes, the trivial laminar flow becomes 
unstable at a critical pressure gradient. Experimentally,
a direct transition to turbulent flow with large
amplitudes is observed, consistent with the fact that in
general at the instability point of the trivial solution
a subcritical bifurcation occurs.

Figure 2 Bénard’s problem with rolls.

Figure 3 Poiseuille flow with the trivial solution.

Figure 4 The inclined-plane problem. The trivial Nusselt
solution possesses a flat top surface and a parabolic flow profile.


Kolmogorov flow consists in finding the flow of a 
viscous incompressible fluid under the action of an 
external force parallel to the flow direction x and 
varying periodically in the perpendicular y-direction. 
This gedankenexperiment was designed by
Kolmogorov in 1958 as a simplified model for the
Poiseuille flow problem in order to study the nature 
of turbulence. The trivial solution which is called 
Kolmogorov flow can become unstable via a long- 
wave instability along the flow direction. 

The inclined-plane problem consists in finding the 
flow of a viscous liquid running down an inclined 
plane, cf. Figure 4. The trivial solution, the so-called 
Nusselt solution, becomes sideband-unstable if the 
inclination angle ¢ is increased. Then the dynamics is 
dominated by traveling pulse trains, although the 
individual pulses are unstable due to the long-wave 
instability of the flat surface. Time series taken from
the motion of the individual pulses indicate the
occurrence of chaos directly at the onset of instability.

There are other famous hydrodynamical stability 
problems, with arbitrarily complicated bifurcation 
scenarios. 


Spectral Analysis of the Trivial Solution 


All classical hydrodynamical stability problems are
described by the Navier-Stokes equations


$$\partial_tU = \nu\Delta U - \nabla p - (U\cdot\nabla)U + f,$$
$$0 = \nabla\cdot U, \qquad [1]$$


where $U=U(x,t)\in\mathbb{R}^d$ with d=2,3 is the velocity
field, $p=p(x,t)\in\mathbb{R}$ the pressure field, f some external
forcing, and $\nu$ the dynamic viscosity. These equations
are completed with boundary conditions. In case of
Bénard’s problem, the Navier-Stokes equations are
coupled to a nonlinear heat equation.

By projecting U onto the space of divergence-free
vector fields and by taking the trivial solution as
new origin all problems from the previous section
can be written as an evolutionary system


$$\partial_tU = AU + N(U)$$


where U=0 corresponds to the trivial solution, where
A is a linear and $N(U)=O(\|U\|^2)$ for $U\to 0$ a nonlinear
operator. Most of the examples from the previous 
section are semilinear, that is, from a functional 
analytic point of view, the nonlinear operator N can 
be controlled in terms of the linear operator A. 

Since the form of the bifurcating pattern is only
slightly influenced by far away boundaries, for
instance, the upper and lower ends of the rotating
cylinders in the Taylor-Couette problem, the problems
are considered from a theoretical point of view in
unbounded domains, $\Omega=\mathbb{R}^d\times\Sigma$, with $\Sigma\subset\mathbb{R}^m$ the
bounded cross section; for instance, the
Taylor-Couette problem is considered with two cylinders
of infinite length. Then the eigenfunctions of the
linear operator A are given by Fourier modes, that is,


$$A\bigl(e^{ik\cdot x}\varphi_{k,n}(z)\bigr) = \lambda_n(k)\,e^{ik\cdot x}\varphi_{k,n}(z)$$


with $x\in\mathbb{R}^d$, $k\in\mathbb{R}^d$, $k\cdot x=\sum_{j=1}^dk_jx_j$, $z\in\Sigma$, $n\in\mathbb{N}$.
If an external control parameter is changed and the
trivial solution becomes unstable, then, independently
of the underlying physical problem, the surface
$k\mapsto\mathrm{Re}\,\lambda_1(k)$ intersects the plane $\{\mathrm{Re}\,\lambda_1(k)=0\}$.
Generically, this happens first at a nonzero wave
vector $k_c\ne 0$ (cf. Figure 5).

Examples for such an instability are the Taylor- 
Couette problem, Bénard’s problem, or Poiseuille 
flow. Very often, due to some conserved quantity in
the problem, we have $\mathrm{Re}\,\lambda_1(0)=0$ for all values of
the bifurcation parameter. Then, a so-called sideband
instability can occur, cf. Figure 6.

Examples for such an instability are the Kolmogorov
flow problem or the inclined-plane problem.

Owing to symmetries in the problem, for
instance, reflection along the cylinders in the
Taylor-Couette problem or rotational symmetry in
Bénard’s problem, the curves in Figure 5 are reflection
symmetric or rotationally symmetric.

In case of $\Omega$ being spherically symmetric, we have


$$A\bigl(f(r)\,\varphi_{l,m}(z)\bigr) = \lambda_l\,f(r)\,\varphi_{l,m}(z)$$


with $r>0$, $z\in S^{d-1}$, for $l\in\mathbb{N}_0$ and $m=-l,-l+1,\ldots,l-1,l$,
$\varphi_{l,m}$ being a spherical harmonic, that
is, if $\lambda_{l_0}$ is the eigenvalue having first positive real
part, then by symmetry, simultaneously $2l_0+1$
eigenvalues cross the imaginary axis.

Figure 5 Real part of the spectrum in case of an instability at a
wave number $k_c\ne 0$. Definition of the small bifurcation parameter $\varepsilon$.

Figure 6 Real part of the spectrum in case of a sideband
instability. Definition of the small bifurcation parameter $\varepsilon$.


Reduction of the Dimension 


In order to understand the occurrence of the spatially 
periodic Taylor vortices in the Taylor—Couette pro- 
blem and of the roll solutions and hexagons in 
Bénard’s problem, the problems are considered with 
periodic boundary conditions along the unbounded 
directions. Then the instability of the trivial solution 
occurs when at least one eigenvalue crosses the 
imaginary axis. Generically, this happens by a simple 
real eigenvalue or a pair of complex-conjugate 
eigenvalues crossing the imaginary axis (Figure 7). 
Center manifold theory and the Lyapunov-Schmidt
reduction allow one to reduce the a priori infinite-dimensional
bifurcation problem to a finite-dimensional one.

In case of a real eigenvalue $\lambda_1$ crossing the imaginary
axis, the solution u can be written as a sum of the
weakly unstable mode and the stable modes, that is,
$u=c_1\varphi_1+u_s$ ($c_1\in\mathbb{R}$), where $u_s$ lives in the closure of
the span of the stable eigenfunctions $\{\varphi_2,\varphi_3,\ldots\}$. For
the linearized system all solutions are attracted by the
one-dimensional set $E_c=\{u\mid u_s=0\}$, in which all
nonzero solutions diverge to infinity.

For the nonlinear system and small bifurcation
parameter this attracting structure survives, no
longer as a linear space, but as a manifold


$$M_c = \bigl\{u = c_1\varphi_1 + h(c_1) \;\bigm|\; h(c_1)\in\overline{\mathrm{span}\{\varphi_2,\varphi_3,\ldots\}}\bigr\},$$


the so-called center manifold which is tangential to $E_c$,
that is, $\|h(c_1)\|\le C|c_1|^2$ (Figure 8). The dynamics on
$M_c$ is no longer trivial due to the nonlinear terms.

Figure 7 Generically, a simple real eigenvalue or a pair of
complex-conjugate eigenvalues cross the imaginary axis.

Figure 8 The center manifold is invariant under the flow, is
tangential to the central subspace $E_c$, and attracts nearby
solutions with some exponential rate.

Since real-valued problems are considered,
$\mathrm{Re}\,\lambda_1(k_c)=0$ implies $\mathrm{Re}\,\lambda_1(-k_c)=0$, that is, in case
of $2\pi/k_c$-periodic boundary conditions always two
eigenvalues cross the imaginary axis simultaneously.
For Bénard’s problem in a strip or for the Taylor-Couette
problem in case of a bifurcation of fixed
points, the reduced system on the center manifold is
derived with the ansatz


$$U = \varepsilon A(\varepsilon^2t)\,e^{ik_cx}\varphi_{k_c,1}(z) + \mathrm{c.c.} + O(\varepsilon^2)$$


where $0<\varepsilon\ll 1$ is the small bifurcation parameter,
cf. Figure 5. Then due to $e^{ik_cx}e^{ik_cx}e^{-ik_cx}=e^{ik_cx}$ the
complex-valued amplitude A satisfies, with $T=\varepsilon^2t$, the so-called
Landau equation


$$\partial_TA = A - \gamma A|A|^2 + O(\varepsilon^2)$$


where the Landau coefficient $\gamma\in\mathbb{R}$ is obtained by
classical perturbation analysis (Figure 9). The
reduced system is symmetric under the $S^1$-symmetry
$A\mapsto Ae^{i\phi}$ with $\phi\in\mathbb{R}$ which corresponds to the
translation invariance of the original systems.

Figure 9 The dynamics of the Landau equation. Except for the
origin which corresponds to the Couette flow, all solutions
converge towards the circle of fixed points, which corresponds
to the family of Taylor vortices. The translation invariance of the
Taylor-Couette problem is reflected by the rotational symmetry of
the reduced system.
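
The dynamics of the Landau equation can be sketched numerically. The code below integrates the truncated equation dA/dT = A - gamma*A*|A|^2, with the O-term dropped and the value gamma = 4 chosen purely for illustration, and checks both the convergence to the circle |A| = 1/sqrt(gamma) and the rotational symmetry.

```python
import cmath

def landau_rhs(a, gamma):
    """Truncated Landau equation; the higher-order remainder is neglected."""
    return a - gamma * a * abs(a) ** 2

def evolve(a0, gamma, dt=0.01, t_end=40.0):
    """Classical Runge-Kutta integration of the complex amplitude."""
    a = complex(a0)
    for _ in range(int(round(t_end / dt))):
        k1 = landau_rhs(a, gamma)
        k2 = landau_rhs(a + 0.5 * dt * k1, gamma)
        k3 = landau_rhs(a + 0.5 * dt * k2, gamma)
        k4 = landau_rhs(a + dt * k3, gamma)
        a += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return a

gamma = 4.0                       # assumed positive Landau coefficient
start = 0.1 * cmath.exp(0.7j)     # small amplitude with phase 0.7
final = evolve(start, gamma)
rotated = evolve(start * cmath.exp(0.3j), gamma)
print(abs(final), cmath.phase(final))
```

Since the right-hand side is a real multiple of A, the phase of A is preserved along each solution, so every point of the circle is a fixed point; rotating the initial condition by an angle rotates the whole solution by the same angle.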

This so-called equivariant bifurcation theory has 
been applied successfully to convection problems in 
the plane and on the sphere. 

The stability of time-periodic flows can be 
analyzed with Floquet multipliers. Bifurcations 
from a time-periodic solution can lead to quasiperiodic
motion in time. Ruelle and Takens (1971)
showed that already the next bifurcations can lead to
chaotic dynamics. Since then many classical
hydrodynamical stability problems have been analyzed
with bifurcation theory up to turbulent flows.

It was observed that center manifold theory can 
also be applied successfully to elliptic PDE problems 
posed in spatially unbounded cylindrical domains. 
A famous example is the construction of capillary- 
gravity solitary waves for the so-called water-wave 
problem. 


Modulation Equations 


The analysis of the last section is of no use in case of
a sideband instability occurring at the wave number
$k_c=0$, as it happens in the inclined-plane problem
or in the Kolmogorov flow problem. Moreover, in
case of an instability at a wave vector $k_c\ne 0$, front
solutions cannot be described on the basis of the above
analysis. In such situations, the method of modulation
equations generalizes the role of the finite-dimensional
amplitude equations from the last
section.

The complex cubic Ginzburg-Landau equation in 
normal form is given by 


$$\partial_TA = (1+i\alpha)\,\partial_X^2A + A - (1+i\beta)\,A|A|^2$$


where the coefficients $\alpha,\beta\in\mathbb{R}$ are real, and we have
$X\in\mathbb{R}$, $T\ge 0$, and $A(X,T)\in\mathbb{C}$. The Ginzburg-
Landau equation is a universal amplitude equation 
that describes slowly varying modulations, in space 
and time, of the amplitude of bifurcating spatially 
periodic solutions in pattern-forming systems close 
to the threshold of the first instability. Whenever the 
instability drawn in Figure 5 occurs, that is, for the 
Taylor—Couette problem and Bénard’s problem in a 
strip, that is, d=1, it can be derived by a multiple 
scaling ansatz 


u(x,t) ~ cA(e(x — Cot), Sper OU T 


For instance, in case of $\alpha=\beta=0$, the Ginzburg-Landau
equation possesses front solutions connecting
the stable fixed point A=1 with the unstable
fixed point A=0. Such solutions correspond in the
Taylor-Couette problem to modulating fronts
connecting the stable Taylor vortices with the
unstable Couette flow, cf. Figure 10.

Figure 10 The front solution of the Ginzburg-Landau equation
modulates the underlying pattern in the original system.
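
A front of this kind can be observed in a simple simulation. The sketch below restricts to real-valued A with alpha = beta = 0 and uses an explicit finite-difference scheme; the grid size, time step, and step-like initial condition are assumptions made for illustration.

```python
def step(a, dx, dt):
    """One explicit Euler step of dA/dT = d2A/dX2 + A - A**3 with no-flux ends."""
    new = a[:]
    for i in range(1, len(a) - 1):
        lap = (a[i - 1] - 2.0 * a[i] + a[i + 1]) / (dx * dx)
        new[i] = a[i] + dt * (lap + a[i] - a[i] ** 3)
    new[0], new[-1] = new[1], new[-2]
    return new

def front_position(a, dx):
    """Position of the first grid point where the profile drops below 1/2."""
    for i, value in enumerate(a):
        if value < 0.5:
            return i * dx
    return len(a) * dx

def simulate(n=200, dx=0.5, dt=0.05, steps=400, front_index=20):
    """Start from a step connecting A = 1 (left) with A = 0 (right)."""
    a = [1.0 if i < front_index else 0.0 for i in range(n)]
    start = front_position(a, dx)
    for _ in range(steps):
        a = step(a, dx, dt)
    return a, start, front_position(a, dx)

profile, x_start, x_end = simulate()
print(x_start, x_end)
```

The stable state A = 1 invades the unstable state A = 0, so the front position moves to the right while the profile stays between the two fixed points. The time step satisfies dt/dx^2 = 0.2, which keeps the explicit scheme stable.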

The diffusion operator in the Ginzburg-Landau
equation reflects the parabolic shape of $\mathrm{Re}\,\lambda_1$ close
to $k=k_c$ in Figure 5. In case of the long-wave
instability, as drawn in Figure 6, the second-order
differential operator is replaced by a fourth-order
differential operator.

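For Kolmogorov flow with $T=\varepsilon^4t$ and $X=\varepsilon x$ and
the amplitude scaled with $\varepsilon$, we obtain that in lowest
order A has to satisfy a Cahn-Hilliard equation


$$\partial_TA = -\sqrt{2}\,\partial_X^2A - 3\,\partial_X^4A + \gamma\,\partial_X^2(A^3)$$


where $A(X,T)\in\mathbb{R}$ and $\gamma\in\mathbb{R}$ is a constant (cf. Figure 6).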
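The Kuramoto-Sivashinsky (KS)-perturbed KdV
equation


$$\partial_TA = -\partial_X^3A - \partial_X(A^2)/2 - \varepsilon(\partial_X^2+\partial_X^4)A$$


with $A=A(X,T)\in\mathbb{R}$, $X\in\mathbb{R}$, $T\ge 0$, where $0<\varepsilon\ll 1$
is still a small parameter, can be derived for the
inclined-plane problem with $T=\varepsilon^3t$ and $X=\varepsilon x$ and the
amplitude scaled with $\varepsilon^2$.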

The theory of modulation equations is nowadays a
well-established mathematical tool which allows one to
construct special solutions, to prove global existence results for
the solutions of pattern-forming systems, and to
characterize the attractors in such systems. The
method is based on approximation results, showing 
that solutions of the original systems can be approxi- 
mated by the modulation equation and attractivity 
results showing that every solution of the original 
system develops in such a way that it can be described 
by the modulation equation. 

This method can also be applied to secondary 
bifurcations describing instabilities of spatially per- 
iodic wave trains. Then the so-called phase-diffusion 
equations, conservation laws, Burgers equations, 
and again the KS equations occur. 

However, this method cannot be applied success- 
fully in all situations. There are counterexamples 
showing that not every formally derived modulation 
equation describes the original system in a correct 
way. Moreover, very often, owing to some
symmetries in the original problem, no consistent
multiple scaling analysis is possible, that is, the
modulation equations still depend on $\varepsilon$.

Figure 11 Spectrum for the flow around an obstacle.


Discussion 


There is no satisfactory bifurcation analysis for situations
where boundary layers play a role. The
simplest problem is the flow around some obstacle. The
difficulties stem from the fact that, due to the
unbounded flow region, there is always continuous
spectrum up to the imaginary axis. From the localized
obstacle discrete eigenvalues are created (cf. Figure 11).

In such a situation, so far there is no mathematical 
bifurcation theory available. 


See also: Bifurcation Theory; Dynamical Systems in 
Mathematical Physics: An Illustration from Water Waves;

Introduction


Bifurcation theory of periodic orbits relates to 
modeling of quite diverse subjects. It appeared 
classically in the field of celestial mechanics with 
the contributions of H Poincaré. Van der Pol (1926, 
1927, 1928, 1931) observed the frequency-locking 
phenomenon in electrical circuits. More recently, 
Malkin’s theory (Malkin 1952, 1956, Roseau 1966) 
was used to justify synchronization of weakly 
coupled oscillators modeling the electrical activity 
of the cells of the sinusal node in the heart. This 
article provides the essential mathematical back- 
ground necessary for existence of frequency locking. 
Applications can be found, for instance, in Weakly 
Coupled Oscillators.

The Asymptotic Phase of a Stable
Periodic Orbit


Let $\Gamma$ be a periodic orbit of a vector field and let
$S(\Gamma)$ denote the stable manifold of $\Gamma$ (resp. $U(\Gamma)$
denotes the unstable manifold of $\Gamma$). The following
theorem can be found, for instance, in Hartman 


(1964). 


Theorem There exist a and K such that $\mathrm{Re}(\lambda_j)<-a$,
J= lsk ONG RAN) > O69 Sh =F logd forall 
$x\in S(\Gamma)$, there is an asymptotic phase $t_0$ such that for
all $t\ge 0$

$$|\phi_t(x)-\gamma(t-t_0)|\le Ke^{-at}.$$

Similarly, for any $x\in U(\Gamma)$, there is a $t_0$ such that for $t\le 0$,

$$|\phi_t(x)-\gamma(t-t_0)|\le Ke^{at}.$$


If the periodic orbit is stable, the local stable
manifold coincides with an open neighborhood of $\Gamma$.
In such a case, there is a foliation of this open set
whose leaves are the points with a given asymptotic
phase. The asymptotic phase can be considered
as a coordinate function $\alpha$ defined on the
neighborhood $S(\Gamma)$.

If we consider now the particular case of a plane
system, this function can be completed with the
square of the distance function to the orbit into a
coordinate system called the “amplitude-phase”
system and denoted as $(\rho,\alpha)$.
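
The asymptotic phase can be made explicit on a model oscillator. The sketch below assumes the planar system r' = r(1 - r^2), theta' = 1 in polar coordinates (an illustrative choice, not taken from the text), whose limit cycle is the unit circle gamma(t) = (cos t, sin t). Because the angular speed does not depend on r, the asymptotic phase of a point with polar angle theta0 is t0 = -theta0, and the exact flow is known in closed form.

```python
import math

def flow(t, point):
    """Exact flow of the assumed system r' = r(1 - r^2), theta' = 1."""
    x, y = point
    r0 = math.hypot(x, y)
    theta0 = math.atan2(y, x)
    c = 1.0 / r0 ** 2 - 1.0                      # 1/r^2 - 1 decays like exp(-2t)
    r = 1.0 / math.sqrt(1.0 + c * math.exp(-2.0 * t))
    return (r * math.cos(theta0 + t), r * math.sin(theta0 + t))

def cycle(t):
    """Parametrization gamma(t) of the limit cycle."""
    return (math.cos(t), math.sin(t))

def asymptotic_phase(point):
    """Phase t0 such that the orbit of the point shadows gamma(t - t0)."""
    return -math.atan2(point[1], point[0])

def distance_to_shadow(t, point):
    """Distance between flow(t, point) and gamma(t - t0)."""
    px, py = flow(t, point)
    gx, gy = cycle(t - asymptotic_phase(point))
    return math.hypot(px - gx, py - gy)

start = (0.3, 0.4)                               # a point with r0 = 0.5
for t in (0.0, 1.0, 2.0, 4.0):
    print(t, distance_to_shadow(t, start))
```

For this starting point the distance is bounded by K e^{-at} with K = 1.5 and a = 2, which is the estimate of the theorem with explicit constants for this particular model.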


Frequency Locking and Phase Locking 


The term “oscillator” has two meanings. A con- 
servative “oscillator” is a plane vector field which 
displays an open set of periodic orbits. It is said to 
be isochronous if all orbits have the same period. A
dissipative “oscillator” is a planar vector field which 
displays an attractive limit cycle (attractive periodic 
orbit). 
We consider N dissipative oscillators:


$$\frac{dx_i}{dt} = f(x_i,y_i),$$
$$\frac{dy_i}{dt} = g(x_i,y_i), \qquad [1]$$


where i=1,...,m.

The dynamical system obtained by considering the
space of all the variables $(x_i,y_i)$, i = 1,...,m, displays
an invariant torus full of periodic orbits that
we denote by $T^m(0)$.

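Assume now that the N oscillators are weakly
coupled:


$$\frac{dx_i}{dt} = f(x_i,y_i) + \varepsilon F_i(x,y,\varepsilon),$$
$$\frac{dy_i}{dt} = g(x_i,y_i) + \varepsilon G_i(x,y,\varepsilon), \qquad [2]$$


where $\varepsilon$ can be considered as small as we wish.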


Definition The system [2] has a frequency locking
if it displays a family of stable periodic orbits $\Gamma_\varepsilon$ for
all values of $\varepsilon$ small enough which tends to (in the
sense of Hausdorff’s topology) a periodic orbit of [1]
contained in the periodic torus $T^m(0)$.


Assume now that [2] has a frequency locking
associated with the periodic orbit $\Gamma(t)$. Consider the
projections $\Gamma_i(t)$ of $\Gamma(t)$ on the coordinate planes
$(x_i, y_i)$, $i = 1, \ldots, m$. Assume that $\varepsilon$ is small enough
so that each projection belongs to the open set $S_i$ on
which the “amplitude-phase” coordinates of the system [1]
are defined. We can write the system [2],
restricted to the open set $S = \prod_i S_i$, as

\[ \frac{d\rho_i}{dt} = f_i(\rho, \alpha, \varepsilon), \qquad \frac{d\alpha_i}{dt} = \Theta_i(\rho, \alpha, \varepsilon), \qquad i = 1, \ldots, m \qquad [3] \]


Definition The system [2] has a phase locking if
the system induced by [3] on $\Gamma(t)$,

\[ \frac{d\alpha_i}{dt} = \Theta_i(0, \alpha, \varepsilon) \qquad [4] \]

has an attractive singular point.


As attractive singular points are structurally
stable, it is enough to assume that the system

\[ \frac{d\alpha_i}{dt} = \Theta_i(0, \alpha, 0) \qquad [5] \]

displays an attractive singular point.
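
To make the notion of an attractive singular point of the phase system concrete, the following sketch assumes, purely for illustration, that the induced phase dynamics reduces to a single equation $d\psi/dt = \varepsilon(\delta - \sin\psi)$ for one phase variable $\psi$. This right-hand side, the detuning `delta`, and the helper `phase_flow` are hypothetical and are not taken from the text. For $|\delta| < 1$ the point $\psi^* = \arcsin\delta$ is an attractive singular point, and trajectories started in its basin converge to it.

```python
import math

def phase_flow(psi0, delta, eps, t_end, dt=0.01):
    """Explicit Euler integration of the assumed model d(psi)/dt = eps*(delta - sin(psi))."""
    psi = psi0
    for _ in range(int(round(t_end / dt))):
        psi += dt * eps * (delta - math.sin(psi))
    return psi

delta, eps = 0.5, 0.1
psi_star = math.asin(delta)          # attractive: the derivative -eps*cos(psi_star) is negative
finals = [phase_flow(p0, delta, eps, t_end=400.0) for p0 in (-2.0, 0.0, 2.0)]
print(psi_star, finals)
```

The linearization at $\psi^*$ has eigenvalue $-\varepsilon\cos\psi^* < 0$, which persists under small changes of the right-hand side; this is the structural stability invoked above.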





Periodic Orbits of Linear Systems

Consider the linear system

\[ \frac{dx}{dt} = P(t) \cdot x + q(t) \qquad [6] \]

where $P$ is a continuous $T$-periodic matrix function
and $q$ is a continuous $T$-periodic vector function,
$x = (x_1, \ldots, x_n)$. Consider also the two associated
homogeneous equations:

\[ \frac{dx}{dt} = P(t) \cdot x \qquad [7a] \]

\[ \frac{dx}{dt} = -P^*(t) \cdot x \qquad [7b] \]

where $P^*$ denotes the transpose of $P$.

The set of $T$-periodic solutions of [7b] is a vector
space; $m$ denotes its dimension. Let $U^j(t)$, $j = 1, \ldots, m$,
be a basis of this vector space. This basis is completed
by adding $n - m$ solutions $U^j(t)$, $j = m + 1, \ldots, n$, to
obtain a basis of $\mathbb{R}^n$. Let $U(t)$ be the matrix whose
columns are these vectors; denote by $U_{ij}(t)$ the elements of
this matrix.

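With the change of variable $x = U^*(0)^{-1} y$, system
[6] gets transformed into

\[ \frac{dy}{dt} = Q(t) y + r(t) \qquad [8] \]

with $Q(t) = U^*(0) P(t) U^*(0)^{-1}$ and $r(t) = U^*(0) q(t)$.
The matrix $V(t) = U^{-1}(0) U(t)$ is such that

\[ \frac{dV}{dt} = -Q^*(t) V \]

and the first $m$ column vectors of $V(t)$, denoted as
$V^j(t)$, $j = 1, \ldots, m$, are $T$-periodic.
Let $X(t)$ be the fundamental solution defined by

\[ \frac{dX}{dt} = Q(t) X, \qquad X(0) = I \]

then,

\[ X^{-1}(t) = V^*(t) \]

The solution of [8] can be written as

\[ y(t) = X(t) \cdot y(0) + X(t) \cdot \int_0^t V^*(u) r(u)\, du \qquad [9] \]

This yields that $T$-periodic solutions of [8] have
initial data $y(0)$ given by

\[ (V^*(T) - I) \cdot y(0) = \int_0^T V^*(s) r(s)\, ds \qquad [10] \]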


Conversely, given a solution $y(0)$ of [10],
$T$-periodicity of $P$ and $q$ and uniqueness of solutions
of a differential equation imply that $y(0)$ represents the
initial data of a $T$-periodic solution of [8]. Hence, the
$T$-periodic solutions of [8] are in one-to-one correspondence
with the affine space defined by the
solutions of [10]. The first $m$ rows of $V^*(T) - I$ are
zero and its rank is exactly $n - m$. In the following,
assume that the determinant $\Delta$ formed by the last $(n - m)$
rows and last $(n - m)$ columns of $V^*(T) - I$ is not zero.

A necessary and sufficient condition so that [8]
displays a $T$-periodic solution is

\[ \int_0^T \sum_{j=1}^{n} V_{jk}(u) r_j(u)\, du = 0, \qquad k = 1, \ldots, m \qquad [11a] \]

\[ \sum_{j=m+1}^{n} (V_{jk}(T) - \delta_{jk})\, y_j(0) = \sum_{j=1}^{n} \int_0^T V_{jk}(s) r_j(s)\, ds, \qquad m + 1 \leq k \leq n \qquad [11b] \]


This yields the Fredholm alternative: if the $m$
conditions

\[ \sum_{j=1}^{n} \int_0^T U_{jk}(u) q_j(u)\, du = 0, \qquad k = 1, \ldots, m \qquad [12] \]

are satisfied, then [6] displays a family $x_\alpha(t)$ of
$T$-periodic solutions depending on $m$ parameters
$(\alpha_1, \ldots, \alpha_m)$:

\[ x_\alpha(t) = \alpha_1 \phi^1(t) + \cdots + \alpha_m \phi^m(t) + \bar{x}(t) \qquad [13] \]

where $\bar{x}(t)$ is a particular $T$-periodic solution and
the $\phi^j(t)$ denote independent $T$-periodic solutions of
[7a]. To be more specific, one can choose $\bar{x}(t)$ to
be the unique solution of [6] such that
$y_k(0) = 0$, $k = m + 1, \ldots, n$, and the $\phi^j(t)$ to be solutions of
[7a] such that $y_k(0) = \delta_{jk}$. With these notations,
$x_\alpha(t)$ is such that

\[ y_k(0) = \alpha_k, \qquad k = 1, \ldots, m \]

and its other initial conditions $y_k(0) = \beta_k$, $k = m + 1, \ldots, n$,
are fixed:

\[ \beta_k = \beta_k^0 \]
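
The Fredholm alternative can be tried out on a small example. The sketch below assumes the constant matrix $P = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ and $T = 2\pi$ (choices made here, not in the text). Since $-P^* = P$, every solution of the adjoint equation is $T$-periodic, so $m = n = 2$. The code evaluates the orthogonality integrals $\int_0^T U^k(t) \cdot q(t)\, dt$ for a non-resonant and a resonant forcing and compares them with the defect $|x(T) - x(0)|$ of a numerically integrated solution.

```python
import math

T = 2.0 * math.pi

def adjoint_basis(t):
    """Two independent T-periodic solutions of dx/dt = -P^T x for P = [[0, 1], [-1, 0]]."""
    return [(math.cos(t), -math.sin(t)), (math.sin(t), math.cos(t))]

def orthogonality_integrals(q, n=2000):
    """Approximate int_0^T U^k(t).q(t) dt for k = 1, 2 (periodic rectangle rule)."""
    h = T / n
    sums = [0.0, 0.0]
    for i in range(n):
        t = i * h
        qt = q(t)
        for k, U in enumerate(adjoint_basis(t)):
            sums[k] += h * (U[0] * qt[0] + U[1] * qt[1])
    return sums

def return_defect(q, x0=(0.0, 0.0), n=4000):
    """Integrate dx/dt = P x + q(t) over one period with RK4 and return |x(T) - x(0)|."""
    def rhs(t, x):
        qt = q(t)
        return (x[1] + qt[0], -x[0] + qt[1])
    h = T / n
    x = x0
    for i in range(n):
        t = i * h
        k1 = rhs(t, x)
        k2 = rhs(t + h / 2, (x[0] + h / 2 * k1[0], x[1] + h / 2 * k1[1]))
        k3 = rhs(t + h / 2, (x[0] + h / 2 * k2[0], x[1] + h / 2 * k2[1]))
        k4 = rhs(t + h, (x[0] + h * k3[0], x[1] + h * k3[1]))
        x = (x[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
             x[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)
    return math.hypot(x[0] - x0[0], x[1] - x0[1])

q_ok = lambda t: (0.0, math.cos(2.0 * t))    # orthogonal to both adjoint solutions
q_bad = lambda t: (0.0, math.cos(t))         # resonant: one integral equals pi

ints_ok, ints_bad = orthogonality_integrals(q_ok), orthogonality_integrals(q_bad)
print(ints_ok, return_defect(q_ok))
print(ints_bad, return_defect(q_bad))
```

In the first case both integrals vanish and the computed solution closes up after one period; in the second case the nonzero integral appears directly as the defect of the solution, so no $T$-periodic solution exists.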


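Malkin’s Theorem for Quasilinear Systems


Consider now nonlinear systems with the
perturbation:

\[ \frac{dx}{dt} = P(t) \cdot x + q(t) + \varepsilon f(x, t, \varepsilon) \qquad [14] \]

where $f$ is $C^1$ and $T$-periodic in $t$.

Assume that the solutions $y(t, y(0), \varepsilon)$ of [14] exist
for all values of $t$, $0 \leq t \leq T$. The solutions define a
differentiable function of their initial data $y(0)$. This is,
for instance, true for perturbations of linear systems
if $\varepsilon$ is small enough.

Assume that $q$ satisfies the condition [12] and that
there is a solution

\[ \alpha = \alpha^0 \]

to the equations

\[ \varphi_k(\alpha) = \sum_{j=1}^{n} \int_0^T U_{jk}(u)\, f_j(x_\alpha(u), u, 0)\, du = 0, \qquad k = 1, \ldots, m \qquad [15a] \]

such that

\[ \left. \frac{\partial \varphi_k}{\partial \alpha_j} \right|_{\alpha = \alpha^0}, \qquad k = 1, \ldots, m, \quad j = 1, \ldots, m \qquad [15b] \]

is invertible.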
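Proceed as in the previous section with the coordinate
change $x = U^*(0)^{-1} y$. Equation [14] gets transformed
into

\[ \frac{dy}{dt} = Q(t) y + r(t) + \varepsilon F(y, t, \varepsilon) \qquad [16] \]

with $F(y, t, \varepsilon) = U^*(0) f(U^*(0)^{-1} \cdot y, t, \varepsilon)$.

Solutions of [16] are uniquely determined by their
initial data. We can understand the parameters $(\alpha, \beta)$
as coordinates on the space of solutions. With this
viewpoint, for instance, the set of $T$-periodic
solutions of [6] is an affine space of dimension $m$
given by the equations $\beta = \beta^0$ and is parametrized by
the coordinates $\alpha$. In this space, we pick a point
(which corresponds to a particular $T$-periodic solution
of [6]): $\alpha = \alpha^0$. $T$-periodic solutions of [16] are
in one-to-one correspondence with the solutions of

\[ C_k(\alpha, \beta, \varepsilon) = \sum_{j=1}^{n} \int_0^T V_{jk}(s)\, F_j(y(s, \varepsilon, \alpha, \beta), s, \varepsilon)\, ds = 0, \qquad k = 1, \ldots, m \qquad [17a] \]

\[ C_k(\alpha, \beta, \varepsilon) = \sum_{j=m+1}^{n} (V_{jk}(T) - \delta_{jk})\, \beta_j - \sum_{j=1}^{n} \int_0^T V_{jk}(s) r_j(s)\, ds - \varepsilon \sum_{j=1}^{n} \int_0^T V_{jk}(s)\, F_j(y(s, \varepsilon, \alpha, \beta), s, \varepsilon)\, ds = 0, \qquad k = m + 1, \ldots, n \qquad [17b] \]

where $\alpha_k$, $k = 1, \ldots, m$ and $\beta_k = y_k(0)$, $k = m + 1, \ldots, n$
parametrize the solutions $y(t, \varepsilon, \alpha, \beta)$ of
[14] in this way: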


y(0)=U* (0) -x =D 0) )+x(0) [18] 


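Consider the determinant of the Jacobian matrix
of the mapping

\[ (\alpha, \beta) \mapsto C(\alpha, \beta, \varepsilon) \qquad [19] \]

for $\alpha = \alpha^0$, $\beta_k = \beta_k^0$, $k = m + 1, \ldots, n$, $\varepsilon = 0$. This is
equal to the product of $\Delta$ and the determinant of

\[ \left. \frac{\partial \varphi_k(\alpha)}{\partial \alpha_j} \right|_{\alpha = \alpha^0} \qquad [20] \]

which is nonzero.

The implicit-function theorem shows that the
differential equation [14] (and thus [16] as well)
has, for $\varepsilon$ small enough, a unique $T$-periodic solution
which tends to $x_{\alpha^0}$ when $\varepsilon$ tends to 0.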


Generalization of Malkin’s Theorem 


Finally, we consider the most general situation of
the perturbation of a general system (not necessarily
linear):

\[ \frac{dx}{dt} = f(x, t) + \varepsilon g(x, t, \varepsilon) \qquad [21] \]

where we assume that

\[ \frac{dx}{dt} = f(x, t) \qquad [22] \]

displays an $m$-parameter family $x_\alpha(t)$ of $T$-periodic
orbits.

Assume that the solutions $y(t, y(0), \varepsilon)$ exist for all
$0 \leq t \leq T$ and define a differentiable mapping of the
initial data $y(0)$. This is, for instance, the case if we
assume that the nonperturbed equation defines a
flow and if $\varepsilon$ is small enough.

Assume also that the different solutions $x_\alpha(t)$ are
independent in the sense that the mapping

\[ \alpha \mapsto x_\alpha(t) \]

is an immersion for any $t$. In other words, the $m$
vectors $\partial x_\alpha(t) / \partial \alpha_j$ are independent.

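We linearize the solution along the family of
periodic orbits:

\[ x = x_\alpha(t) + \varepsilon \xi \qquad [23] \]

Equation [21] gets transformed into

\[ \frac{d\xi}{dt} = D_x f(x_\alpha(t), t) \cdot \xi + g(x_\alpha(t), t, 0) + \varepsilon F(\xi, t, \varepsilon) \qquad [24] \]

Set, furthermore,

\[ P(t) = D_x f(x_\alpha(t), t), \qquad r(t) = g(x_\alpha(t), t, 0) \]

and denote by $U(t)$ the fundamental solution of [7b]
described earlier.


Theorem Assume that there is a solution $\alpha = \alpha^0$ to

\[ \varphi_k(\alpha) = \sum_{j=1}^{n} \int_0^T U_{jk}(u)\, g_j(x_\alpha(u), u, 0)\, du = 0, \qquad k = 1, \ldots, m \qquad [25a] \]

such that

\[ \left. \frac{\partial \varphi_k(\alpha)}{\partial \alpha_j} \right|_{\alpha = \alpha^0}, \qquad k = 1, \ldots, m, \quad j = 1, \ldots, m \qquad [25b] \]

is invertible. Then, for all $\varepsilon$ sufficiently small, eqn
[21] has a unique $T$-periodic solution which tends to
$x_{\alpha^0}$ when $\varepsilon$ tends to 0.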


We show that under the hypothesis of the 
theorem, we can apply the results proved in the 
preceding section. Note that one can prove the 
theorem for eqn [24] because it reduces to [21] with 
the change of variables [23]. 


Note first that the $m$ conditions [25a] imply that
the equations

\[ \frac{d\xi}{dt} = D_x f(x_{\alpha^0}(t), t) \cdot \xi + g(x_{\alpha^0}(t), t, 0) \]

display a family of $T$-periodic solutions which
depend on $m$ parameters $\gamma = (\gamma_1, \ldots, \gamma_m)$. From
[13], one can write

\[ \xi_\gamma(t) = \gamma_1 \phi^1(t) + \cdots + \gamma_m \phi^m(t) + \bar{\xi}(t) \qquad [26] \]

where $\bar{\xi}(t)$ is a particular $T$-periodic solution and
the $\phi^j(t)$ are independent $T$-periodic solutions
of (22a).


Lemma 1 A possible choice for the solutions $\phi^j(t)$
is $\partial x_\alpha(t) / \partial \alpha_j |_{\alpha = \alpha^0}$.


We have already assumed that these vectors are
independent. They are obviously $T$-periodic solutions
to (22a).

In the following, we will assume that all other periodic
solutions of (22a) are linear combinations of these.

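As a consequence of what was proved in the
section on periodic orbits of linear systems, system
[24] displays a periodic solution (for $\varepsilon$ small enough)
if there exists a solution

\[ \gamma = \gamma^0 \]

to the equations

\[ \psi_k(\gamma) = \sum_{j=1}^{n} \int_0^T U_{jk}(s)\, F_j(\xi_\gamma(s), s, 0)\, ds = 0, \qquad k = 1, \ldots, m \]

such that

\[ \left. \frac{\partial \psi_k(\gamma)}{\partial \gamma_j} \right|_{\gamma = \gamma^0}, \qquad k = 1, \ldots, m, \quad j = 1, \ldots, m \]

is invertible.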

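Lemma 2 The quantities $\psi_k(\gamma)$ depend linearly on $\gamma$.
Proof Observe first that the quantities $F_j(\xi, s, 0)$
depend quadratically on $\xi$:

\[ F_j(\xi, s, 0) = \frac{1}{2} \sum_{p,l} \frac{\partial^2 f_j}{\partial x_p \partial x_l}(x_{\alpha^0}(s), s)\, \xi_p \xi_l + \sum_{p} \frac{\partial g_j}{\partial x_p}(x_{\alpha^0}(s), s, 0)\, \xi_p + \frac{\partial g_j}{\partial \varepsilon}(x_{\alpha^0}(s), s, 0) \qquad [27] \]

Then, the solutions $\xi_\gamma(t)$ depend linearly on $\gamma$. We thus
obtain that a priori the $\psi_k(\gamma)$ are quadratic functions of $\gamma$:

\[ \psi_k(\gamma_1, \ldots, \gamma_m) = \frac{1}{2} \sum_{q,r} \gamma_q \gamma_r \sum_{j,p,l} \int_0^T U_{jk} \frac{\partial^2 f_j}{\partial x_p \partial x_l} \frac{\partial x_p}{\partial \alpha_q} \frac{\partial x_l}{\partial \alpha_r}\, ds + \sum_{q} \gamma_q \sum_{j,p} \int_0^T U_{jk} \left( \sum_{l} \frac{\partial^2 f_j}{\partial x_p \partial x_l} \bar{\xi}_l + \frac{\partial g_j}{\partial x_p} \right) \frac{\partial x_p}{\partial \alpha_q}\, ds + \cdots \qquad [28] \]

where the dots represent quantities independent of $\gamma$.
We use then the expression

\[ \frac{d}{dt} \frac{\partial^2 x_j}{\partial \alpha_q \partial \alpha_r} = \sum_{p,l} \frac{\partial^2 f_j}{\partial x_p \partial x_l} \frac{\partial x_p}{\partial \alpha_q} \frac{\partial x_l}{\partial \alpha_r} + \sum_{p} \frac{\partial f_j}{\partial x_p} \frac{\partial^2 x_p}{\partial \alpha_q \partial \alpha_r} \]

This allows one to find the homogeneous quadratic
part as

\[ \frac{1}{2} \sum_{q,r} \gamma_q \gamma_r \sum_{j} \int_0^T U_{jk} \left( \frac{d}{ds} \frac{\partial^2 x_j}{\partial \alpha_q \partial \alpha_r} - \sum_{p} \frac{\partial f_j}{\partial x_p} \frac{\partial^2 x_p}{\partial \alpha_q \partial \alpha_r} \right) ds \]

Integration by parts yields

\[ -\frac{1}{2} \sum_{q,r} \gamma_q \gamma_r \sum_{j} \int_0^T \left( \frac{dU_{jk}}{ds} + \sum_{p} U_{pk}(s) \frac{\partial f_p}{\partial x_j} \right) \frac{\partial^2 x_j}{\partial \alpha_q \partial \alpha_r}\, ds = 0 \]

because $U$ is a solution to [7b]. This shows that [28]
is linear in $\gamma$. It suffices to show that the determinant
of this system does not vanish to have existence and
uniqueness of the solution such that

\[ \frac{D(\psi_1, \ldots, \psi_m)}{D(\gamma_1, \ldots, \gamma_m)} \neq 0 \]

Consider now the coefficient of the linear part:

\[ \sum_{j,p} \int_0^T U_{jk} \left( \sum_{l} \frac{\partial^2 f_j}{\partial x_p \partial x_l} \bar{\xi}_l + \frac{\partial g_j}{\partial x_p} \right) \frac{\partial x_p}{\partial \alpha_q}\, ds \]

and the coefficient

\[ \left. \frac{\partial \varphi_k}{\partial \alpha_q} \right|_{\alpha = \alpha^0} \]

We can write

\[ \frac{\partial \varphi_k}{\partial \alpha_q} = \sum_{j} \int_0^T \left( \frac{\partial U_{jk}}{\partial \alpha_q}\, g_j + U_{jk} \sum_{p} \frac{\partial g_j}{\partial x_p} \frac{\partial x_p}{\partial \alpha_q} \right) ds \]

Note that

\[ \frac{d\bar{\xi}_j}{dt} = \sum_{p} \frac{\partial f_j}{\partial x_p} \bar{\xi}_p + g_j(x_{\alpha^0}(t), t, 0) \]

Integration by parts yields

\[ \sum_{j} \int_0^T \frac{\partial U_{jk}}{\partial \alpha_q}\, g_j\, ds = -\sum_{j} \int_0^T \left( \frac{d}{ds} \frac{\partial U_{jk}}{\partial \alpha_q} + \sum_{p} \frac{\partial U_{pk}}{\partial \alpha_q} \frac{\partial f_p}{\partial x_j} \right) \bar{\xi}_j\, ds \]

From the equation

\[ \frac{dU_{jk}}{dt} + \sum_{p} U_{pk} \frac{\partial f_p}{\partial x_j} = 0 \]

we deduce that

\[ \frac{d}{dt} \frac{\partial U_{jk}}{\partial \alpha_q} + \sum_{p} \frac{\partial U_{pk}}{\partial \alpha_q} \frac{\partial f_p}{\partial x_j} = -\sum_{p,l} U_{pk} \frac{\partial^2 f_p}{\partial x_j \partial x_l} \frac{\partial x_l}{\partial \alpha_q} \]

and thus this shows that

\[ \left. \frac{\partial \varphi_k}{\partial \alpha_q} \right|_{\alpha = \alpha^0} = \sum_{j,p} \int_0^T U_{jk} \left( \sum_{l} \frac{\partial^2 f_j}{\partial x_p \partial x_l} \bar{\xi}_l + \frac{\partial g_j}{\partial x_p} \right) \frac{\partial x_p}{\partial \alpha_q}\, ds \]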

This achieves the proof of the theorem. In the special
case of Hamiltonian systems, in the case of the
perturbations of an isochronous system, the method
explained is equivalent to Moser’s averaging theory.

The reader is referred to other articles in this
encyclopedia for a discussion of other aspects of
synchronization, frequency locking, and phase locking.


See also: Bifurcation Theory; Fractal Dimensions in 
Dynamics; Integrable Systems: Overview; Isochronous 
Systems; Leray—Schauder Theory and Mapping Degree; 
Ljusternik—Schnirelman Theory; Singularity and 
Bifurcation Theory; Symmetry and Symmetry Breaking in 
Dynamical Systems; Synchronization of Chaos; Weakly 
Coupled Oscillators. 
Introduction


At the end of the 1960s, the theory of integrable
systems received a great boost by the discovery
(made by Gardner, Greene, Kruskal, and Miura) of
the inverse-scattering method (see Integrable
Systems: Overview). It allows one to reduce the
solution of the (nonlinear) Korteweg-de Vries
equation (henceforth simply the KdV equation)

\[ u_t = \tfrac{1}{4}(u_{xxx} - 6uu_x) \qquad [1] \]

to the solution of linear equations. After the KdV
equation, many other nonlinear partial differential
equations, solvable by means of the inverse-scattering
method, were found. A common feature of such
equations is the existence of soliton solutions, that
is, solutions in the shape of a solitary wave (with
additional interaction properties). For this reason
they are called “soliton equations.”


It was soon observed that the KdV equation can 
be seen as an infinite-dimensional Hamiltonian 
system with an infinite sequence of constants of 
motion in involution; the corresponding (commut- 
ing) vector fields are symmetries for the KdV 
equation, and form the so-called KdV hierarchy. In 
particular, Zakharov and Faddeev constructed 
action-angle variables for the KdV equation. These 
facts pointed out that the KdV equation is an 
infinite-dimensional analog of a classical integrable 
Hamiltonian system (Dubrovin et al. 2001), whose 
theory has been developed during the nineteenth 
century by Liouville, Jacobi, and many others. 
Moreover, the infinite-dimensional case suggested 
methods (such as the existence of a Lax pair) which 
were applied successfully also to finite-dimensional 
cases such as the Toda lattices and the Calogero 
systems. More recently, after the discovery by 
Witten and Kontsevich of remarkable relations 
between the KdV hierarchy and matrix models of 
two-dimensional (2D) quantum gravity, there has 
been a renewed interest in the study of soliton 
equations in the community of theoretical physicists. 
We also mention that the classical versions of the
extended $W_n$-algebras of 2D conformal field theory
are the (second) Poisson structures of the
Gelfand-Dickey hierarchies.

In this article we describe the so-called 
bi-Hamiltonian formulation of soliton equations. 
This approach to integrable systems springs from the 
observation, made by Magri at the end of the 1970s, that 
the KdV equation can be seen as a Hamiltonian system 
in two different ways. In the same circle of ideas, there 
were important works by Adler, Dorfman, Gelfand, 
Kupershmidt, Wilson, and many others. Thus, the 
concept of bi-Hamiltonian manifold, which constitutes 
the geometric setting for the study of bi-Hamiltonian 
systems, emerged. This notion and its applications to the 
theory of finite-dimensional integrable systems is 
discussed in Multi-Hamiltonian Systems. 

In the first section of this article, we discuss the 
Hamiltonian form of soliton equations and, more 
generally, we present an important class of infinite- 
dimensional Poisson (also called Hamiltonian) 
structures, namely those of hydrodynamic type. 
Then we show how to use the bi-Hamiltonian 
properties of the KdV equation in order to construct 
its conserved quantities. We also recall that the KdV 
equation can be seen as an Euler equation on the 
dual of the Virasoro algebra. In the third section, we 
deal with other examples of integrable evolution 
equations admitting a bi-Hamiltonian representa- 
tion, that is, the Boussinesq and the Camassa—Holm 
equations, and we consider the bi-Hamiltonian 
structures of hydrodynamic type. 
Hamiltonian Methods in Soliton Theory


The most famous example of soliton equation is 
the KdV equation [1], where u is usually a 
periodic or rapidly decreasing real function. The 
choice of the coefficients in the equation has no 
special meaning, since they can be changed 
arbitrarily by rescaling x, t, and u. Right after 
the discovery of the inverse-scattering method for 
solving the Cauchy problem for the KdV equation, 
it was realized that this equation can be seen as an 
infinite-dimensional Hamiltonian system. Indeed, 
from a geometrical point of view, eqn [1] defines a
vector field $X(u) = (1/4)(u_{xxx} - 6uu_x)$ on $M$, the
infinite-dimensional vector space of $C^\infty$ functions
from the unit circle $S^1$ to $\mathbb{R}$. (For the sake of
simplicity, we consider only the periodic case; the
integrals in this article are therefore understood to
be taken on $S^1$.) The vector field $X$ associated with
the KdV equation is Hamiltonian, that is, it can be
factorized as

\[ X(u) = \left[ -2\partial_x \right] \left[ \tfrac{1}{8}(-u_{xx} + 3u^2) \right] \]

where $dH = (1/8)(-u_{xx} + 3u^2)$ is the differential of
the functional

\[ H(u) = \frac{1}{8} \int \left( u^3 + \tfrac{1}{2} u_x^2 \right) dx \]

that is, the variational derivative $\delta h / \delta u$ of the density
$h = (1/8)(u^3 + (1/2)u_x^2)$, and $P = -2\partial_x$ is a Poisson
(or Hamiltonian) operator. This means that the
corresponding composition law

\[ \{F, G\} = \int dF\, P(dG)\, dx = -2 \int dF\, (dG)_x\, dx \qquad [2] \]


between functionals of $u$ has the usual properties
of the Poisson bracket, that is, it is $\mathbb{R}$-bilinear
and skew-symmetric, and it fulfills the Leibniz
rule and the Jacobi identity. In other words,
$(M, P)$ is an infinite-dimensional Poisson manifold.
Using the Poisson bracket [2], eqn [1] can
be written as

\[ u_t = \{u, H\} \qquad [3] \]


corresponding to the usual Hamilton equation in
$\mathbb{R}^{2n}$,

\[ \dot{z}^i = \{z^i, H\} \]

up to the replacement of $z$ with $u$, and of the
discrete index $i$ with the continuous index $x$. More
precisely, in the expression $u_t = \{u, H\}$ the symbol $u$
should be replaced by $u^x$ (in analogy with $z^i$), the
functional assigning to the generic function $v \in M$
its value at a fixed point $x$, that is, $u^x : v \mapsto v(x)$. In
these notations, the Poisson bracket [2] takes the
form

\[ \{u^x, u^y\} = -2\delta'(x - y) \qquad [4] \]

where the $\delta$-function is as usual defined as

\[ \int f(y)\, \delta(x - y)\, dy = f(x) \]

so that its derivatives are given by

\[ \int f(y)\, \delta^{(k)}(x - y)\, dy = f^{(k)}(x) \]


Another important example is given by the
Boussinesq equation

\[ u_{tt} = \tfrac{1}{3}(-u_{xxxx} + 4u_x^2 + 4uu_{xx}) \qquad [5] \]

describing, like KdV, shallow water (soliton) waves
in a nonlinear approximation. It can be obtained from
the first-order (in time) system

\[ u^1_t = \tfrac{2}{3} u^2 u^2_x + u^1_{xx} - \tfrac{2}{3} u^2_{xxx}, \qquad u^2_t = 2u^1_x - u^2_{xx} \qquad [6] \]

by taking the derivative of its second equation with
respect to $t$, using the first equation to eliminate $u^1_t$, and
setting $u = u^2$. The system [6] is Hamiltonian, since it
can be written as

\[ \begin{pmatrix} u^1_t \\ u^2_t \end{pmatrix} = P \begin{pmatrix} \delta h / \delta u^1 \\ \delta h / \delta u^2 \end{pmatrix} \]

with $h = (u^1)^2 + (1/9)(u^2)^3 - u^1 u^2_x + (1/3)(u^2_x)^2$, and

\[ P = \begin{pmatrix} 0 & \partial_x \\ \partial_x & 0 \end{pmatrix} \qquad [7] \]

is easily seen to be a Poisson operator. Thus, the
Poisson manifold associated with the Boussinesq
equation is the space of periodic $C^\infty$ functions with
values in $\mathbb{R}^2$. More generally, one can consider the
space $M^n$ of $C^\infty$ functions from the unit circle $S^1$ to
$\mathbb{R}^n$. If $P^{ij}$, for $i, j = 1, \ldots, n$, are the entries of a
constant skew-symmetric matrix and $u^{i,x}$ assigns to
the generic function $v \in M^n$ the value of its $i$th
component at a fixed point $x$, then

\[ \{u^{i,x}, u^{j,y}\} = P^{ij}\, \delta(x - y) \]

defines a Poisson bracket on $M^n$. One can also let
the $P^{ij}$ depend on the $u^i$ in such a way that they
form the components of a Poisson tensor on $\mathbb{R}^n$. If
$H = \int h\, dx$ is a functional on $M^n$ with density $h$, the
associated Hamiltonian vector field gives rise to the
following system of partial differential equations:

\[ u^i_t = \sum_{j=1}^{n} P^{ij} \frac{\delta h}{\delta u^j}, \qquad i = 1, \ldots, n \]
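In particular, if $n = 2N$ and

\[ P = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix} \]

then we have the Hamiltonian formulation of the
field equations,

\[ q^i_t = \frac{\delta h}{\delta p_i}, \qquad p_{i,t} = -\frac{\delta h}{\delta q^i}, \qquad i = 1, \ldots, N \]

Another important example of Poisson bracket on
$M^n$ is given by

\[ \{u^{i,x}, u^{j,y}\} = g^{ij}\, \delta'(x - y) \qquad [8] \]

where $g^{ij}$ are the entries of a constant symmetric
matrix. In this case, the Hamiltonian vector field
associated with $H = \int h\, dx$ is given by

\[ u^i_t = \sum_{j=1}^{n} g^{ij}\, \partial_x \frac{\delta h}{\delta u^j} \qquad [9] \]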
Notice that this vector field is zero if $H = \int u^k\, dx$,
with $k = 1, \ldots, n$. This amounts to saying that such
an $H$ is a Casimir function of the Poisson bracket
[8], that is, that $\{H, F\} = 0$ for all functionals $F$. A
simple example of this class (with $n = 2$) is given by
the Poisson structure of the Boussinesq equation,
corresponding to the choice $g^{11} = g^{22} = 0$ and
$g^{12} = g^{21} = 1$. Suppose now that the matrix with
entries $g^{ij}$ is invertible. Then they can be interpreted
as the contravariant components of a flat
pseudo-Riemannian metric in $\mathbb{R}^n$. A change of coordinates
$(u^1, \ldots, u^n) \mapsto (\tilde{u}^1, \ldots, \tilde{u}^n)$ in $\mathbb{R}^n$ transforms the
Poisson bracket [8] into

\[ \{\tilde{u}^{i,x}, \tilde{u}^{j,y}\} = g^{ij}(\tilde{u}(x))\, \delta'(x - y) + \Gamma^{ij}_k(\tilde{u}(x))\, \tilde{u}^k_x\, \delta(x - y) \qquad [10] \]

where $g^{ij}(\tilde{u})$ are the components of the metric in the
new coordinates and the $\Gamma^{ij}_k$ are the contravariant
Christoffel symbols, related to the usual Christoffel
symbols by

\[ \Gamma^{ij}_k = -g^{il}\, \Gamma^{j}_{lk} \qquad [11] \]

Conversely, the expression [10] gives a Poisson
bracket if the metric defined by $g^{ij}$ is flat and its
Christoffel symbols are related to the $\Gamma^{ij}_k$ by [11].
These are the Poisson structures of hydrodynamic
type introduced by Dubrovin and Novikov. We will
consider them again later.


Bi-Hamiltonian Formulation 
of the KdV Equation 


The KdV equation [1] has a lot of remarkable
properties, such as the Lax representation and the
existence of a $\tau$-function. In this section, we recall a
geometrical feature of KdV, namely, the fact that it
has a second Hamiltonian structure, and we show
that the integrability of KdV can be seen as a natural
consequence of its double Hamiltonian representation.
We have already seen that the KdV vector field
$X(u) = (1/4)(u_{xxx} - 6uu_x)$ can be written as

\[ X(u) = P_0\, dH_2 \]

where $P_0 = -2\partial_x$ and

\[ H_2 = \frac{1}{8} \int \left( u^3 + \tfrac{1}{2} u_x^2 \right) dx \]

But $X$ admits another Hamiltonian representation:

\[ X(u) = P_1\, dH_1 \]

where $P_1 = -(1/2)\partial_{xxx} + 2u\partial_x + u_x$ and

\[ H_1 = -\frac{1}{4} \int u^2\, dx \]

The important point is that $P_1$ is also a Poisson
operator. Moreover, it is compatible with $P_0$, that is,
any linear combination of $P_0$ and $P_1$ is still a Poisson
operator. Thus, the KdV equation is a bi-Hamiltonian
system, that is, it can be seen in two different (but
compatible) ways as a Hamiltonian system. Next, we
will show how this property can be used to construct
an infinite sequence of conserved quantities for the
KdV equation, which are in involution with respect to
the Poisson brackets $\{\cdot, \cdot\}_0$ and $\{\cdot, \cdot\}_1$ associated with
$P_0$ and $P_1$. In particular, the phase space $M$ of KdV
is a bi-Hamiltonian manifold, that is, it has two
different (but compatible) Poisson structures. Let us
rename $X_1 = X$ the KdV vector field. Since
$X_1 = P_0\, dH_2 = P_1\, dH_1$, one is naturally led to consider
the vector fields

\[ X_0 = P_0\, dH_1, \qquad X_2 = P_1\, dH_2 \]

Explicitly, $X_0(u) = u_x$ and $X_2(u) = (1/16)(u_{xxxxx} -
10uu_{xxx} - 20u_x u_{xx} + 30u^2 u_x)$. One can check that
these vector fields are also bi-Hamiltonian. Indeed,
$X_0(u) = P_1\, dH_0$ with $H_0 = \int u\, dx$, and

\[ X_2 = P_0\, dH_3 \]

with

\[ H_3 = -\frac{1}{32} \int \left( \tfrac{1}{2} u_{xx}^2 + 5uu_x^2 + \tfrac{5}{2} u^4 \right) dx \]

The functional $H_0$ is a Casimir of $P_0$, that is,
$P_0\, dH_0 = 0$, so that the iteration ends on this side,
but it can be continued indefinitely from the other
side, as shown below. For the time being, let us take
for granted that there exists an infinite sequence
$\{H_k\}_{k \geq 0}$ of functionals such that $P_1\, dH_k = P_0\, dH_{k+1}$;
in other words,

\[ \{\cdot, H_{k+1}\}_0 = \{\cdot, H_k\}_1 \qquad [12] \]
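
The first recursion relations $P_1\, dH_k = P_0\, dH_{k+1}$, $k = 0, 1, 2$, can be spot-checked numerically. The sketch below is only a sanity check, not a proof: it assumes NumPy is available, uses a spectral derivative on a $2\pi$-periodic grid, and evaluates both sides on an arbitrarily chosen test profile `u`. The differentials `dH[0]` to `dH[3]` are the variational derivatives of $H_0, \ldots, H_3$ as written above, worked out by hand for this example.

```python
import numpy as np

N = 64
x = 2.0 * np.pi * np.arange(N) / N
k = np.fft.fftfreq(N, d=1.0 / N)           # integer wavenumbers

def dx(f, order=1):
    """Spectral x-derivative of a 2*pi-periodic grid function."""
    return np.real(np.fft.ifft((1j * k) ** order * np.fft.fft(f)))

u = np.cos(x) + 0.5 * np.sin(2.0 * x)      # arbitrary smooth periodic test profile

def P0(f):
    """First Poisson operator, P0 = -2 d/dx."""
    return -2.0 * dx(f)

def P1(f):
    """Second Poisson operator, P1 = -(1/2) d^3/dx^3 + 2u d/dx + u_x."""
    return -0.5 * dx(f, 3) + 2.0 * u * dx(f) + dx(u) * f

# Variational derivatives of H0, H1, H2, H3 (computed by hand).
dH = [
    np.ones_like(u),
    -0.5 * u,
    (3.0 * u**2 - dx(u, 2)) / 8.0,
    -(dx(u, 4) - 10.0 * u * dx(u, 2) - 5.0 * dx(u) ** 2 + 10.0 * u**3) / 32.0,
]

X1 = (dx(u, 3) - 6.0 * u * dx(u)) / 4.0    # KdV vector field
residuals = [float(np.max(np.abs(P1(dH[j]) - P0(dH[j + 1])))) for j in range(3)]
kdv_residual = float(np.max(np.abs(P1(dH[1]) - X1)))
print(residuals, kdv_residual)
```

All residuals are at the level of rounding error, since the test profile is a trigonometric polynomial and the grid resolves every product that occurs.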
Such relations are often called Lenard-Magri relations.
Then the functionals $H_k$ are in involution with
respect to both Poisson brackets. Indeed, for $k > j$,
one has

\[ \{H_j, H_k\}_0 = \{H_j, H_{k-1}\}_1 = \{H_{j+1}, H_{k-1}\}_0 = \cdots = \{H_k, H_j\}_0 \]

so that, by skew-symmetry, $\{H_j, H_k\}_0 = 0$ for all $j, k \geq 0$, and therefore
$\{H_j, H_k\}_1 = 0$ for all $j, k \geq 0$. Hence, these functionals
are constants of motion (in involution) for
the KdV equation. The Hamiltonian vector fields
associated with them are symmetries for the KdV
equation; the corresponding evolution equations are
called higher-order KdV equations. The set of such
equations is the well-known KdV hierarchy. We
remark that the existence of a sequence of functionals
$\{H_k\}_{k \geq 0}$, fulfilling the Lenard-Magri relations
[12] and starting from a Casimir of $P_0$, is
equivalent to the existence of a Casimir function
$H(\lambda) = \sum_{j \geq 0} H_j \lambda^{-j}$ for the Poisson pencil
$P_\lambda = P_1 - \lambda P_0$, where $\lambda$ is a real parameter. A
straightforward way (due essentially to Miura,
Gardner, and Kruskal) to determine such a Casimir
function is to consider the (generalized) Miura map
$h \mapsto u = h_x + h^2 - \lambda$. As shown by Kupershmidt
and Wilson, it transforms the Poisson structure
$(1/2)\partial_x$ (in the variable $h$) into the Poisson pencil
$P_\lambda = -(1/2)\partial_{xxx} + 2(u + \lambda)\partial_x + u_x$. Given $u$, the
Riccati equation

\[ h_x + h^2 = u + \lambda \qquad [13] \]


admits a unique solution with the asymptotic
expansion $h = z + \sum_{k > 0} h_k z^{-k}$, where $z^2 = \lambda$. Moreover,
the coefficients $h_k$ are differential polynomials
in $u$ (i.e., polynomials in $u$ and its $x$-derivatives) that
can be computed by recurrence. Thus, the generalized
Miura map can be seen as an invertible
transformation. Since the functional $h \mapsto \int h\, dx$ is a
Casimir of the Poisson structure $(1/2)\partial_x$, it follows
that if $h(u)$ is the solution of the Riccati equation
[13], then $u \mapsto \int h(u)\, dx$ is a Casimir of the Poisson
pencil $P_\lambda$. More precisely, one has to introduce the
functional $H(\lambda) = z \int h(u)\, dx$, that turns out to be a
Laurent series in $\lambda$, because the even coefficients of
$h(u)$ are $x$-derivatives. This is the Casimir function
we were looking for. Explicitly, one finds that the
first terms of $h(u)$ are

\[ h_1 = \tfrac{1}{2} u, \qquad h_2 = -\tfrac{1}{4} u_x, \qquad h_3 = \tfrac{1}{8}(u_{xx} - u^2), \]
\[ h_4 = -\tfrac{1}{16}(u_{xxx} - 4uu_x), \qquad h_5 = \tfrac{1}{32}(u_{xxxx} - 6uu_{xx} - 5u_x^2 + 2u^3) \]

Obviously, $h_1$ is the density of a Casimir function of
$P_0$, while $h_3$ and $h_5$ are (one-half of) the densities of the
two Hamiltonians $H_1$ and $H_2$ of the KdV equation.
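
The recurrence for the coefficients $h_k$ follows by inserting the expansion into the Riccati equation: order $z^0$ gives $2h_1 = u$, and order $z^{-k}$ gives $h_{k,x} + 2h_{k+1} + \sum_{i+j=k} h_i h_j = 0$. The following sketch carries this out exactly with rational arithmetic. The dictionary representation of differential polynomials and all helper names are ad hoc choices made for this illustration.

```python
from fractions import Fraction

# A differential polynomial in u, u_x, u_xx, ... is a dict mapping a monomial to
# its coefficient. A monomial is a tuple of exponents: (2, 0, 1) stands for
# u^2 * u_xx, and the empty tuple () stands for the constant 1.

def clean(p):
    """Drop monomials whose coefficient has cancelled to zero."""
    return {m: c for m, c in p.items() if c != 0}

def add(p, q):
    r = dict(p)
    for m, c in q.items():
        r[m] = r.get(m, 0) + c
    return clean(r)

def scale(c, p):
    return clean({m: c * v for m, v in p.items()})

def mul(p, q):
    r = {}
    for a, ca in p.items():
        for b, cb in q.items():
            n = max(len(a), len(b))
            m = tuple(s + t for s, t in zip(a + (0,) * (n - len(a)),
                                            b + (0,) * (n - len(b))))
            r[m] = r.get(m, 0) + ca * cb
    return clean(r)

def D(p):
    """Total x-derivative: u -> u_x, u_x -> u_xx, ..., extended by the Leibniz rule."""
    r = {}
    for m, c in p.items():
        for i, e in enumerate(m):
            if e == 0:
                continue
            lst = list(m) + [0]
            lst[i] -= 1
            lst[i + 1] += 1
            while lst and lst[-1] == 0:
                lst.pop()
            key = tuple(lst)
            r[key] = r.get(key, 0) + c * e
    return clean(r)

def riccati_coefficients(n):
    """Return {k: h_k}, k = 1..n, for h_x + h^2 = u + z^2 with h = z + sum h_k z^(-k)."""
    h = {1: {(1,): Fraction(1, 2)}}            # order z^0: 2 h_1 = u
    for k in range(1, n):
        s = D(h[k])                             # order z^(-k) of the Riccati equation
        for j in range(1, k):
            s = add(s, mul(h[j], h[k - j]))
        h[k + 1] = scale(Fraction(-1, 2), s)
    return h

h = riccati_coefficients(5)
for k in sorted(h):
    print(k, h[k])
```

The output reproduces the five coefficients listed above; for instance the entry for $k = 3$ contains the monomials $u_{xx}$ and $u^2$ with coefficients $1/8$ and $-1/8$.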
We conclude this section showing that, as observed
by Khesin and Ovsienko (Arnol’d and Khesin 1998),
the bi-Hamiltonian structures of KdV have a clear
Lie-algebraic origin. Indeed, the second Hamiltonian
structure is the Lie-Poisson structure on the dual of
the Virasoro algebra, while the first one can be
obtained by “freezing” the second one at a suitable
point. Let $\mathfrak{X}(S^1)$ be the Lie algebra of vector fields
on $S^1$. The Virasoro algebra is the vector space
$\mathfrak{g} = \mathfrak{X}(S^1) \oplus \mathbb{R}$ endowed with the Lie-algebra
structure

\[ \left[ \left( f \frac{\partial}{\partial x}, a \right), \left( g \frac{\partial}{\partial x}, b \right) \right] = \left( \left( f'(x) g(x) - g'(x) f(x) \right) \frac{\partial}{\partial x}, \int f'(x) g''(x)\, dx \right) \qquad [14] \]

It is called a central extension of $\mathfrak{X}(S^1)$ since it is
obtained by considering the usual commutator
between vector fields (up to a sign) and by adding
a copy of $\mathbb{R}$, which turns out to be the center of
the Virasoro algebra. Equation [14] gives rise
indeed to a Lie-algebra structure because the
expression $\int f'(x) g''(x)\, dx$ defines a 2-cocycle of
$\mathfrak{X}(S^1)$. The dual space $\mathfrak{g}^*$ of $\mathfrak{g}$ can be considered
as the space of the pairs $(u\, dx \otimes dx, c)$, where
$u \in C^\infty(S^1)$ and $c \in \mathbb{R}$. The pairing is obviously
given by

\[ \left\langle (u\, dx \otimes dx, c), \left( f \frac{\partial}{\partial x}, a \right) \right\rangle = \int u(x) f(x)\, dx + ac \]


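The Lie-Poisson structure on the dual $\mathfrak{g}^*$ of a Lie
algebra $\mathfrak{g}$ is defined as

\[ \{F, G\}(X) = \langle X, [dF(X), dG(X)] \rangle \qquad [15] \]

where $F, G \in C^\infty(\mathfrak{g}^*)$ and their differentials at $X \in \mathfrak{g}^*$
are seen as elements of $\mathfrak{g}$. When $\mathfrak{g}$ is the Virasoro algebra
and $F(u, c) = \int f(u, c)\, dx$, $G(u, c) = \int g(u, c)\, dx$ are
two functionals on $\mathfrak{g}^*$ whose densities $f$ and $g$ are
differential polynomials in $u$, one has

\[ \{F, G\}(u, c) = \left\langle (u\, dx \otimes dx, c), \left[ \left( \frac{\delta f}{\delta u} \frac{\partial}{\partial x}, \frac{\partial F}{\partial c} \right), \left( \frac{\delta g}{\delta u} \frac{\partial}{\partial x}, \frac{\partial G}{\partial c} \right) \right] \right\rangle \]
\[ = \int u \left( \left( \frac{\delta f}{\delta u} \right)_x \frac{\delta g}{\delta u} - \left( \frac{\delta g}{\delta u} \right)_x \frac{\delta f}{\delta u} \right) dx + c \int \left( \frac{\delta f}{\delta u} \right)_x \left( \frac{\delta g}{\delta u} \right)_{xx} dx \qquad [16] \]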

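This is (up to rescaling) the second Poisson
bracket of KdV. The KdV equation is therefore
an Euler equation, that is, it can be obtained from
the Euler equations for the rigid body by replacing
the Lie algebra of the rotation group with
the Virasoro algebra. To be more precise, the
Hamiltonian vector field associated with
$H_1(u, c) = -(1/2)(\int u^2\, dx + c^2)$ is

\[ u_t + 3uu_x + cu_{xxx} = 0, \qquad c_t = 0 \]

If $c \neq 0$, this is (up to rescaling) the KdV equation
[1]. For $c = 0$, we have the Burgers equation (also
called dispersionless KdV equation), to be discussed
again later on. The first Poisson bracket for the KdV
hierarchy can be obtained by “freezing” the
Lie-Poisson bracket at the point $((1/2)\, dx \otimes dx, 0)$ of the
dual of the Virasoro algebra. This means that
instead of [16] one has to consider

\[ \{F, G\}_0(u, c) = \left\langle \left( \tfrac{1}{2}\, dx \otimes dx, 0 \right), \left[ \left( \frac{\delta f}{\delta u} \frac{\partial}{\partial x}, \frac{\partial F}{\partial c} \right), \left( \frac{\delta g}{\delta u} \frac{\partial}{\partial x}, \frac{\partial G}{\partial c} \right) \right] \right\rangle \]
\[ = \frac{1}{2} \int \left( \left( \frac{\delta f}{\delta u} \right)_x \frac{\delta g}{\delta u} - \left( \frac{\delta g}{\delta u} \right)_x \frac{\delta f}{\delta u} \right) dx \qquad [17] \]

The corresponding Hamiltonian is $H = (1/2) \int (-u^3 + cu_x^2)\, dx$.
From this (Lie algebraic) point of
view, the compatibility between the two Poisson
brackets follows from the fact that the pencil $\{\cdot, \cdot\}_\lambda =
\{\cdot, \cdot\} - \lambda \{\cdot, \cdot\}_0$ is obtained from the Lie-Poisson
bracket $\{\cdot, \cdot\}$ by applying the translation

\[ (u\, dx \otimes dx, c) \mapsto \left( \left( u - \frac{\lambda}{2} \right) dx \otimes dx, c \right) \]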


Other Examples 


In the previous section, we have presented the
bi-Hamiltonian structure of the KdV equation and
some of its properties. Now we give two more
examples of equations, the Boussinesq equation
and the Camassa-Holm equation, admitting a
bi-Hamiltonian formulation. We have seen in an
earlier section that the system [6] associated with
the Boussinesq equation [5] is Hamiltonian with
respect to the Poisson structure [7] and the
Hamiltonian

\[ H = \int \left( (u^1)^2 + \tfrac{1}{9} (u^2)^3 - u^1 u^2_x + \tfrac{1}{3} (u^2_x)^2 \right) dx \]

A more complicated Poisson structure for this
system is

\[ P_2 = \begin{pmatrix} A & -3\partial_x^4 + 3u^2 \partial_x^2 + 9u^1 \partial_x + 3u^1_x \\ B & -6\partial_x^3 + 6u^2 \partial_x + 3u^2_x \end{pmatrix} \qquad [18] \]

with

\[ A = 2\partial_x^5 - 4u^2 \partial_x^3 - 6u^2_x \partial_x^2 + \left( 2(u^2)^2 + 6u^1_x - 6u^2_{xx} \right) \partial_x + \left( 3u^1_{xx} - 2u^2_{xxx} + 2u^2 u^2_x \right) \]

and

\[ B = 3\partial_x^4 - 3u^2 \partial_x^2 + (9u^1 - 6u^2_x) \partial_x + (6u^1_x - 3u^2_{xx}) \]

It can be obtained by means of the Drinfeld-Sokolov
reduction (or also by means of a
bi-Hamiltonian reduction) from the Lie-Poisson
structure (modified with the cocycle $\partial_x$) on the
space of $C^\infty$ maps from $S^1$ to the Lie algebra of
$3 \times 3$ traceless matrices. This is the reason why it is
a Poisson structure, compatible with [7]. The system
[6] can be written as

\[ \begin{pmatrix} u^1_t \\ u^2_t \end{pmatrix} = P_2 \begin{pmatrix} \delta h_2 / \delta u^1 \\ \delta h_2 / \delta u^2 \end{pmatrix} \]
where $h_2 = (1/3)u^1$ is the density of a Casimir of the
Poisson structure [7]. Thus, the Boussinesq equation
is a bi-Hamiltonian system and can be shown to
possess, like KdV, an infinite sequence of conserved
quantities and symmetries, forming the Boussinesq
hierarchy. The KdV and the Boussinesq hierarchies are
indeed particular examples of Gelfand-Dickey hierarchies
(Dickey 2003). They are hierarchies of
systems of $n$ equations with $n$ unknown functions
and they are related, via the Drinfeld-Sokolov
approach, to the Lie algebra $\mathfrak{sl}(n + 1)$. As shown by
Adler, Dickey, and Gelfand, these hierarchies have a
bi-Hamiltonian formulation. Also the generalized
KdV equations, associated by Drinfeld and Sokolov
with an arbitrary affine Kac-Moody Lie algebra, are
bi-Hamiltonian (or are obtained as suitable reductions
of bi-Hamiltonian systems). Let us consider
now the (dispersionless) Camassa-Holm equation

\[ u_t - u_{xxt} = -3uu_x + 2u_x u_{xx} + uu_{xxx} \qquad [19] \]


which also describes shallow water waves, and
possesses remarkable solutions called peakons, since
they represent traveling waves with discontinuous
first derivative. In order to supply this equation with a
(bi-)Hamiltonian structure, one has to perform the
change of variable $m = u - u_{xx}$, whose inverse, in the
space of period-1 functions, turns out to be given by

\[ u(x) = \int_0^x m(y) \sinh(y - x)\, dy + \frac{1}{2 \sinh(1/2)} \int_0^1 m(y) \cosh\left( y - x - \tfrac{1}{2} \right) dy \]
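
The inversion formula for $m = u - u_{xx}$ on period-1 functions, namely $u(x) = \int_0^x m(y)\sinh(y - x)\,dy + \frac{1}{2\sinh(1/2)}\int_0^1 m(y)\cosh(y - x - \tfrac12)\,dy$, is easy to test numerically. The sketch below assumes the test function $u(x) = \cos(2\pi x)$, for which $m = (1 + 4\pi^2)\cos(2\pi x)$, and evaluates the two integrals with Simpson's rule. The helper names `simpson` and `invert_helmholtz` are introduced here only for illustration.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def invert_helmholtz(m, x):
    """Recover u(x) from m = u - u_xx for period-1 functions via the integral formula."""
    local = simpson(lambda y: m(y) * math.sinh(y - x), 0.0, x)
    glob = simpson(lambda y: m(y) * math.cosh(y - x - 0.5), 0.0, 1.0)
    return local + glob / (2.0 * math.sinh(0.5))

w = 2.0 * math.pi
u_exact = lambda x: math.cos(w * x)                 # assumed test function
m = lambda y: (1.0 + w * w) * math.cos(w * y)       # m = u - u_xx for this u

errors = [abs(invert_helmholtz(m, x) - u_exact(x)) for x in (0.0, 0.3, 0.75)]
print(errors)
```

The second integral is the convolution of $m$ with the periodic Green's function $\cosh(|x| - 1/2) / (2\sinh(1/2))$ of $1 - \partial_x^2$, and the first integral corrects for the branch of the Green's function on $0 \leq y \leq x$.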


The Camassa-Holm equation is then bi-Hamiltonian
with respect to the Poisson pair

\[ P_1 = \partial_{xxx} - \partial_x, \qquad P_2 = 2m\partial_x + m_x \]

Indeed, it can be written as $m_t = P_1\, dH_2 = P_2\, dH_1$,
where

\[ H_1 = -\frac{1}{2} \int (u^2 + u_x^2)\, dx, \qquad H_2 = \frac{1}{2} \int (u^3 + uu_x^2)\, dx \]

Notice that the Poisson pair of the Camassa-Holm
equation can be obtained from that of KdV by
moving the cocycle $\partial_{xxx}$ from the second Poisson
structure to the first one. Indeed,

\[ P_{(a,b,c)} = a\partial_{xxx} + b\partial_x + c(2m\partial_x + m_x), \qquad a, b, c \in \mathbb{R} \qquad [20] \]

is a family of pairwise compatible Poisson operators. 
Moreover, we mention that Misiołek has shown that 
also the Camassa—Holm equation is an Euler equation 
on the dual of the Virasoro algebra. We conclude this 
article with a brief discussion concerning the so-called 
bi-Hamiltonian structures of hydrodynamic type. They 
play a relevant role in the theory of Frobenius 
manifolds, that, in turn, have deep relations with 
many important topics in contemporary mathematics 
and physics, such as Gromov—Witten invariants and 
isomonodromic deformations. As we have seen in the
earlier section, a Poisson structure of hydrodynamic
type is given, on the space of $C^\infty$ maps from $S^1$ to (an
open set of) $\mathbb{R}^n$, by

\[ \{u^{i,x}, u^{j,y}\} = g^{ij}(u(x))\, \delta'(x - y) + \Gamma^{ij}_k(u(x))\, u^k_x\, \delta(x - y) \qquad [21] \]

where $g^{ij}(u)$ are the contravariant components of
a (pseudo-)Riemannian flat metric and the $\Gamma^{ij}_k$ are
the (contravariant) Christoffel symbols of the
metric. If two Poisson structures of hydrodynamic 
type are given, it can be shown that they are 
compatible if and only if the two corresponding 
metrics form a flat pencil. This means that their 
linear combinations (with constant coefficients) 
are still flat (pseudo-)Riemannian metrics, and 
that the contravariant Christoffel symbols of the 
linear combinations are the linear combinations 
of the contravariant Christoffel symbols of the 
two metrics. The simplest example is given by the
bi-Hamiltonian formulation of the Burgers (or
dispersionless KdV) equation,

\[ u_t + 3uu_x = 0 \]

that we have already encountered. We know that
this equation is Hamiltonian with respect to the
(Lie-)Poisson operator $2u\partial_x + u_x$, with Hamiltonian
function $H_1 = -(1/2) \int u^2\, dx$, and with respect to
the Poisson operator $\partial_x$, with Hamiltonian function
$H_2 = -(1/2) \int u^3\, dx$. This also means that the
bi-Hamiltonian structure of the Burgers equation
comes from the family [20]. The first Hamiltonian
structure corresponds to the standard metric on $\mathbb{R}$,
that is, $du \otimes du$, whereas the second one is given by
the metric $(2u)\, du \otimes du$.
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Billiard Flow and Billiard Ball Map 


The billiard system describes the motion of a free 
particle inside a domain with elastic reflection off the 
boundary. More precisely, a billiard table is a 
Riemannian manifold M with a piecewise smooth 
boundary, for example, a domain in the plane. The 
point moves along a geodesic line with a constant speed 
until it hits the boundary. At a smooth boundary point, 
the billiard ball reflects so that the tangential compo- 
nent of its velocity remains the same, while the normal 
component changes its sign. This means that both 
energy and momentum are conserved. In dimension 2, 
this collision is described by a well-known law of 
geometrical optics: the angle of incidence equals the 
angle of reflection. Thus, the theory of billiards has 
much in common with geometrical optics. If the billiard 
ball hits a corner, its further motion is not defined. 
The billiard reflection law satisfies a variational 
principle. Let A and B be fixed points in the billiard 


table and let AXB be a billiard trajectory from A to 
B with reflection at a boundary point X. Then, the 
position of a variable point X extremizes the length 
AXB. This is the Fermat principle of geometrical 
optics. 

In this article, we discuss billiards in bounded 
convex domains with smooth boundary, also called 
Birkhoff billiards. A related article treats billiards in 
polygons (see Polygonal Billiards). 

The billiard flow is defined as a continuous-time 
dynamical system. The time-t billiard transformation 
acts on unit tangent vectors to M which constitute the 
phase space of the billiard flow, and the manifold M is 
its configuration space. Thus, the billiard flow is the 
geodesic flow on a manifold with boundary. 

It is useful to reduce the dimension by one and to
replace continuous time by discrete time, that is, to
replace the billiard flow by a mapping, called the
billiard ball map and denoted by T. The phase space 
of the billiard ball map consists of unit tangent 
vectors (x,v) with the foot point x on the boundary 
of M and the inward direction v. A vector (x,v) 
moves along the geodesic through x in the direction 
of v to the next point of its intersection $x_1$ with the
boundary $\partial M$, and then v reflects in $\partial M$ to the new
inward vector $v_1$. Then, one has: $T(x,v) = (x_1,v_1)$.
For a convex M, the map T is continuous. If M is
n-dimensional, then the dimension of the phase
space of the billiard ball map is $2n - 2$.

Figure 1 Billiard ball map.

Equivalently, and more in the spirit of geometrical
optics, one considers $\mathcal{L}$, the space of oriented
geodesics (rays of light) that intersect the billiard
table. This space of lines is in one-to-one correspondence
with the phase space of the billiard ball map:
to an inward unit vector (x,v) there corresponds the
oriented line through x in the direction v (Figure 1).

The space of rays $\mathcal{L}$ carries a canonical symplectic
structure, that is, a closed nondegenerate
differential 2-form. In the Euclidean case, this
symplectic structure $\omega$ is defined as follows. Given
an oriented line $\ell$ in $\mathbb{R}^n$, let q be the unit vector
along $\ell$ and p be the vector obtained by dropping
the perpendicular from the origin to $\ell$. Then,
$\omega = dp \wedge dq = \sum_i dp_i \wedge dq_i$. This construction
identifies $\mathcal{L}$ with the cotangent bundle of the unit sphere:
q is a unit vector and p is a (co)tangent vector at q,
and $\omega$ identifies with the canonical symplectic
structure of $T^*S^{n-1}$. In the general case of a
Riemannian manifold M, the symplectic structure
on the space of oriented geodesics is obtained from
that on $T^*M$ by symplectic reduction.

One has an important result: the billiard ball map
preserves the symplectic structure, $T^*(\omega) = \omega$. As a
consequence, T is also measure preserving. In the
planar case, one has the following explicit formula
for this measure. Let t be an arc length parameter
along the boundary of the billiard table and let
$\alpha \in [0,\pi]$ be the angle made by the unit vector with
this boundary. Then, $(\alpha,t)$ are coordinates in the
phase space, identified with the cylinder, and the
invariant measure is $\sin\alpha \, d\alpha \, dt$.

As a consequence, the total area of the phase
space equals 2L where L is the perimeter length of
the boundary of the billiard table, and the mean free
path equals $\pi A/L$, where A is the area of the billiard
table. In the general n-dimensional case, the mean
free path equals

$$\frac{\operatorname{vol}(S^{n-1})\,\operatorname{vol}(M)}{\operatorname{vol}(B^{n-1})\,\operatorname{vol}(\partial M)}$$

where $S^{n-1}$ and $B^{n-1}$ are the unit sphere and the unit
disk in Euclidean spaces.
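The mean free path formulas can be checked numerically on the simplest table. The sketch below is only an illustration under stated assumptions: the table is taken to be the unit disk (or unit ball), where a chord making the angle $\alpha$ with the boundary has length $2\sin\alpha$, and all function names are hypothetical. It averages the chord length against the invariant measure $\sin\alpha\,d\alpha\,dt$ and compares the result with $\pi A/L$ and with the n-dimensional expression.

```python
import math


def sphere_volume(k):
    """k-dimensional volume of the unit sphere S^k in R^(k+1)."""
    return 2.0 * math.pi ** ((k + 1) / 2) / math.gamma((k + 1) / 2)


def ball_volume(k):
    """k-dimensional volume of the unit ball B^k in R^k."""
    return math.pi ** (k / 2) / math.gamma(k / 2 + 1)


def mean_free_path(n, vol_table, vol_boundary):
    """vol(S^(n-1)) vol(M) / (vol(B^(n-1)) vol(dM)) for a table M in R^n."""
    return sphere_volume(n - 1) * vol_table / (ball_volume(n - 1) * vol_boundary)


def disk_mean_chord(steps=20000):
    """Average chord length in the unit disk w.r.t. sin(alpha) d(alpha) dt.

    The chord length 2 sin(alpha) does not depend on t, so the t-integral
    cancels and a midpoint rule in alpha is enough.
    """
    h = math.pi / steps
    weighted_length = 0.0
    total_measure = 0.0
    for i in range(steps):
        alpha = (i + 0.5) * h
        weight = math.sin(alpha) * h
        weighted_length += 2.0 * math.sin(alpha) * weight
        total_measure += weight
    return weighted_length / total_measure


if __name__ == "__main__":
    print(disk_mean_chord())                                     # direct average
    print(mean_free_path(2, math.pi, 2 * math.pi))               # pi * A / L
    print(mean_free_path(3, 4 * math.pi / 3, 4 * math.pi))       # unit ball in R^3
```

For n = 2 the general expression reduces to $\pi A/L$ because $\operatorname{vol}(S^1) = 2\pi$ and $\operatorname{vol}(B^1) = 2$.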


Existence and Nonexistence of Caustics 


Given a plane billiard table, a caustic is a curve 
inside the table such that if a segment of a billiard 
trajectory is tangent to this curve then so is each 
reflected segment. Caustics correspond to invariant 
circles of the billiard ball map (i.e., invariant curves 
that go around the phase cylinder): such an invariant 
circle is a one-parameter family of oriented lines, 
and the respective caustic is their envelope. An
envelope may have cusp-like singularities but if the
boundary of the billiard table is a smooth curve with 
positive curvature then a caustic, sufficiently close to 
the boundary, is smooth and convex. 

One can recover the table from a caustic by the
following string construction. Let $\gamma$ be a caustic.
Wrap a closed nonstretchable string around $\gamma$, pull it
tight at a point and move this point around $\gamma$ to
obtain a new curve $\Gamma$. Then, $\gamma$ is a caustic for the
billiard inside $\Gamma$. Note that this construction has one
parameter, the length of the string.

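The following useful “mirror equation” relates
various quantities depicted in Figure 2:

$$\frac{1}{a} + \frac{1}{b} = \frac{2k}{\sin\alpha}$$

where k is the curvature of the boundary at the
impact point, a and b are the distances from the impact
point to the points of tangency of the incoming and
outgoing segments with the caustic, and $\alpha$ is the angle
that these segments make with the boundary.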

Do caustics exist for every convex billiard table? 
This is important to know, in particular, because the 
existence of a caustic implies that the billiard ball 
map is not ergodic. The answer is given by a 
theorem of Lazutkin: if the boundary of the billiard 
table is sufficiently smooth and its curvature never 
vanishes, then there exists a collection of smooth 
caustics in the vicinity of the billiard curve whose 
union has a positive area. Originally this theorem
asked for 553 continuous derivatives; later this was
reduced to six. This result uses the techniques of the
KAM (Kolmogorov–Arnol’d–Moser) theory. The
crucial fact is that, in appropriate coordinates, the
billiard ball map is approximated, near the boundary
of the phase cylinder, by the integrable map
$(x,y) \mapsto (x+y, y)$.

Figure 2 String construction and mirror equation.

On the other hand, by a theorem of Mather, if the 
curvature of a convex smooth billiard curve vanishes 
at some point, then this billiard ball map has no 
invariant circles. This result belongs to the well- 
developed theory of area-preserving twist maps of 
the cylinder, of which the billiard ball map is an 
example. 


Integrable Billiards 


Let a plane billiard table be an ellipse with foci $F_1$
and $F_2$. It has been known since antiquity that a billiard
ball shot from $F_1$ reflects to $F_2$. A generalization of
this optical property of the ellipse is the following
theorem: a billiard trajectory inside an ellipse
forever remains tangent to a fixed confocal conic.
More precisely, if a segment of a billiard trajectory
does not intersect the segment $F_1F_2$, then none of the
segments of this trajectory intersects $F_1F_2$, and
all are tangent to the same ellipse with foci $F_1$ and $F_2$;
and if a segment of a trajectory intersects $F_1F_2$,
then all the segments of this trajectory intersect $F_1F_2$
and are all tangent to the same hyperbola with foci
$F_1$ and $F_2$.
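The confocal-caustic property can be observed numerically. The following is a minimal sketch, not taken from the article: it iterates the billiard ball map in the ellipse $x^2/a^2 + y^2/b^2 = 1$ and monitors the quantity $\lambda = (x v_y - y v_x)^2 - b^2 v_x^2 - a^2 v_y^2$, which (by a direct computation, assumed here) is the parameter of the confocal conic $x^2/(a^2+\lambda) + y^2/(b^2+\lambda) = 1$ tangent to the line through (x, y) with unit direction v. Function names and the chosen starting chord are illustrative assumptions.

```python
import math


def bounce(p, v, a, b):
    """One step of the billiard ball map inside x^2/a^2 + y^2/b^2 = 1.

    p is a boundary point, v a unit inward direction; returns the next
    boundary point and the reflected unit direction.
    """
    x, y = p
    vx, vy = v
    # Nonzero root of the quadratic for the second intersection with the ellipse.
    t = -2.0 * (x * vx / a**2 + y * vy / b**2) / (vx**2 / a**2 + vy**2 / b**2)
    x1, y1 = x + t * vx, y + t * vy
    # Unit normal at the impact point (gradient of the defining function).
    nx, ny = x1 / a**2, y1 / b**2
    norm = math.hypot(nx, ny)
    nx, ny = nx / norm, ny / norm
    dot = vx * nx + vy * ny
    # Reflection: the normal component changes sign, the tangential one is kept.
    return (x1, y1), (vx - 2.0 * dot * nx, vy - 2.0 * dot * ny)


def caustic_parameter(p, v, a, b):
    """Parameter lambda of the confocal conic tangent to the line (p, v)."""
    x, y = p
    vx, vy = v
    return (x * vy - y * vx) ** 2 - b**2 * vx**2 - a**2 * vy**2


def chord_direction(p, q):
    """Unit vector pointing from p to q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = math.hypot(dx, dy)
    return (dx / d, dy / d)


def caustic_drift(a=2.0, b=1.0, phi=0.3, psi=2.0, bounces=1000):
    """Largest change of lambda along an orbit, and its initial value."""
    p = (a * math.cos(phi), b * math.sin(phi))
    q = (a * math.cos(psi), b * math.sin(psi))
    v = chord_direction(p, q)
    lam0 = caustic_parameter(p, v, a, b)
    drift = 0.0
    for _ in range(bounces):
        p, v = bounce(p, v, a, b)
        drift = max(drift, abs(caustic_parameter(p, v, a, b) - lam0))
    return drift, lam0


if __name__ == "__main__":
    print(caustic_drift())
```

With this sign convention, values of $\lambda$ between $-b^2$ and 0 correspond to a confocal ellipse, values between $-a^2$ and $-b^2$ to a confocal hyperbola, and $\lambda = -b^2$ to trajectories through the foci.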

It follows that confocal ellipses are the caustics of 
the billiard inside an ellipse. In particular, a 
neighborhood of the boundary of such a billiard 
table is foliated by caustics. A long-standing 
conjecture, attributed to Birkhoff, asserts that if a 
neighborhood of a strictly convex smooth boundary 
of a billiard table is foliated by caustics, then this 
table is an ellipse. This conjecture remains open. The 
best result in this direction is a theorem of Bialy: if 
almost every phase point of the billiard ball map in a 
strictly convex billiard table belongs to an invariant 
circle, then the billiard table is a disk. 

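The multidimensional analogs of the optical
properties of an ellipse are as follows. Consider an
ellipsoid M in $\mathbb{R}^n$ given by the equation

$$\frac{x_1^2}{a_1^2} + \frac{x_2^2}{a_2^2} + \cdots + \frac{x_n^2}{a_n^2} = 1$$ [1]

and define the confocal family of quadrics $M_\lambda$ by the
equation

$$\frac{x_1^2}{a_1^2 + \lambda} + \frac{x_2^2}{a_2^2 + \lambda} + \cdots + \frac{x_n^2}{a_n^2 + \lambda} = 1$$

where $\lambda$ is a real parameter. The topological type of
$M_\lambda$ changes as $\lambda$ passes the values $-a_i^2$.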


One has the following theorem: a billiard
trajectory inside M remains tangent to $n - 1$ fixed
confocal quadrics. A similar and closely
related result holds for the geodesic curves on M:
the tangent lines to a fixed geodesic on M are
tangent to $n - 2$ other fixed quadrics, confocal
with M. For a triaxial ellipsoid, this theorem goes
back to Jacobi.

Explicit formulas for the integrals of the billiard
in an n-dimensional ellipsoid [1] are as follows. Let
(x,v) be a phase point, a unit inward tangent vector
whose foot point x lies on the boundary. The
following functions are invariant under the billiard
ball map:

$$F_i(x,v) = v_i^2 + \sum_{j \neq i} \frac{(v_i x_j - v_j x_i)^2}{a_i^2 - a_j^2}, \qquad i = 1, \ldots, n$$

These functions are not independent: $F_1 + \cdots + F_n = 1$.

In fact, the integrals $F_i$ Poisson-commute (with
respect to the Poisson bracket associated with the
symplectic structure in the phase space of the
billiard ball map that was described above). According
to the Arnol’d–Liouville theorem, this complete
integrability of the billiard inside an ellipsoid implies
that the phase space is foliated by invariant tori and,
in appropriate coordinates, the map on each torus is
a parallel translation.

Similar results on complete integrability hold 
for billiards inside quadrics in spaces of constant 
positive or negative curvature. The former is 
the intersection of a quadratic cone with the 
unit sphere, and the latter with the unit 
pseudosphere. 


Periodic Orbits 


Periodic billiard trajectories inside a planar billiard 
table correspond to inscribed polygons of extremal 
perimeter length. When counting periodic trajec- 
tories, one does not distinguish between polygons 
obtained from each other by cyclic permutation or 
reversing the order of the vertices. In other words,
one counts the orbits of the dihedral group $D_n$
acting on n-periodic billiard polygons.

An additional topological characteristic of a 
periodic billiard trajectory is the rotation number 
defined as follows. Assume that the boundary $\gamma$ of a
billiard table is parametrized by the unit circle and
consider a polygon $(x_1, x_2, \ldots, x_n)$ inscribed in $\gamma$.
For all i, one has $x_{i+1} = x_i + t_i$ with $t_i \in (0,1)$. Since
the polygon is closed, $t_1 + \cdots + t_n \in \mathbb{Z}$. This integer,
that takes values from 1 to $n - 1$, is called the
rotation number of the polygon and denoted by $\rho$.
Changing the orientation of a polygon replaces the
rotation number $\rho$ by $n - \rho$. The leftmost 5-periodic
trajectory in Figure 3 has $\rho = 1$ and the other three
$\rho = 2$.

Figure 3 Rotation numbers of periodic trajectories.
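The definition of the rotation number translates directly into a few lines of code. The sketch below is an illustration only: it assumes the vertices are given by their parameters on the unit circle $\mathbb{R}/\mathbb{Z}$ (as exact fractions, to avoid rounding issues), and the function name is hypothetical. It computes each step $t_i \in (0,1)$ and sums them.

```python
from fractions import Fraction


def rotation_number(params):
    """Rotation number of an inscribed polygon.

    params[i] is the parameter of the vertex x_i on the circle R/Z.
    Each step t_i = x_{i+1} - x_i is reduced to the interval (0, 1);
    the sum of the steps is an integer between 1 and n - 1.
    """
    n = len(params)
    total = 0
    for i in range(n):
        step = (params[(i + 1) % n] - params[i]) % 1
        if step == 0:
            raise ValueError("consecutive vertices must be distinct")
        total += step
    return round(total)


fifths = [Fraction(k, 5) for k in range(5)]
pentagon = fifths                                   # visits 0, 1/5, 2/5, ...
pentagram = [fifths[(2 * k) % 5] for k in range(5)]  # visits 0, 2/5, 4/5, ...

if __name__ == "__main__":
    print(rotation_number(pentagon))                  # convex pentagon
    print(rotation_number(pentagram))                 # star-shaped pentagon
    print(rotation_number(list(reversed(pentagon))))  # reversed orientation
```

Reversing the order of the vertices of the convex pentagon turns the rotation number 1 into 5 - 1 = 4, in agreement with the rule $\rho \mapsto n - \rho$.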

The following theorem is due to Birkhoff: for
every $n \geq 2$ and $\rho \leq \lfloor (n-1)/2 \rfloor$, coprime with n,
there exist two geometrically distinct n-periodic
billiard trajectories with the rotation number $\rho$. For
example, there are at least two 2-periodic billiard 
trajectories inside every smooth oval: one is the 
diameter, the longest chord, and another one is of 
minimax type, similar to the minor axis of an 
ellipse. 

In higher dimensions, lower bounds on the 
number of periodic billiard trajectories inside strictly 
convex domains with smooth boundaries were 
obtained only recently by Farber and the present 
author. Here is one of the results: for a generic
billiard table in $\mathbb{R}^m$, the number of n-periodic
trajectories is not less than $(n - 1)(m - 1)$. The
proof consists in using Morse theory to bound from
below the number of critical points of the perimeter
length function on the space of inscribed n-gons and
its quotient space by the dihedral group $D_n$, and the
main difficulty is in describing the topology of these
spaces.

Returning to convex smooth planar billiards, the
following conjecture has remained open for a long time:
the set of n-periodic points of the billiard ball map
has zero measure. This is easy for n = 2; for n = 3
this is a theorem by M Rychlik. The motivation for
this question comes from spectral geometry. In 
particular, according to a theorem of Ivrii, the 
above conjecture implies the Weyl conjecture on 
the second term for the spectral asymptotics of the 
Laplacian in a bounded domain with the Dirichlet 
or Neumann boundary conditions. 


Length Spectrum 


The set of lengths of the closed trajectories in a 
convex billiard M is called the length spectrum of M. 
There is a remarkable relation between the length
spectrum and the spectrum of the Laplace operator
in M with the Dirichlet boundary condition:

$$\Delta f = \lambda f, \qquad f|_{\partial M} = 0$$

From the physical point of view,
the eigenvalues $\lambda$ are the eigenfrequencies of the
membrane M with a fixed boundary. Roughly
speaking, one can recover the length spectrum from
that of the Laplacian. More precisely, the following
theorem of K Anderson and R Melrose holds:

$$\sum_{\lambda_i \in \operatorname{spec} \Delta} \cos\left(t\sqrt{-\lambda_i}\right)$$

is a well-defined generalized function (distribution)
of t, smooth away from the length spectrum. That is,
if $l > 0$ belongs to the singular support of this
distribution, then there exists either a closed billiard
trajectory of length l, or a closed geodesic of length l
in the boundary of the billiard table.

This relation between the Laplacian and the 
length spectrum is due to the fact that geometric 
optics is not a very accurate description of light. In 
wave optics, light is considered as electromagnetic 
waves, and geometric optics gives a realistic approx- 
imation only when the wave length is small. This 
small-wave approximation is based on the assump- 
tion that the waves are locally almost harmonic, 
while their amplitudes change slowly from point to 
point. The substitution of such a function into the 
corresponding PDEs gives, in the first approxima- 
tion, the equations of wave fronts, that is, of 
geometric optics. 

Here is another spectral result concerning a 
smooth strictly convex plane domain, due to 
S Marvizi and R Melrose. Let $L_n$ be the supremum
and $l_n$ the infimum of the perimeters of simple
billiard n-gons. Then,

$$\lim_{n \to \infty} n^k (L_n - l_n) = 0$$

for any positive k. Furthermore, $L_n$ has an asymptotic
expansion, as $n \to \infty$,

$$L_n \sim l + \sum_{i=1}^{\infty} \frac{c_i}{n^{2i}}$$

where l is the length of the boundary of the billiard table
and $c_i$ are constants, depending on the curvature of
the boundary.
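A circular table gives a quick sanity check of such an expansion. The sketch below is an illustration under an explicit assumption: in a circle of radius r the simple n-periodic billiard polygons are the inscribed regular n-gons, so $L_n = l_n = 2nr\sin(\pi/n)$, and the Taylor series of the sine gives $L_n = 2\pi r - \pi^3 r/(3n^2) + \cdots$, that is, only even powers of 1/n appear. The function names are hypothetical.

```python
import math


def regular_ngon_perimeter(n, radius=1.0):
    """Perimeter of the regular n-gon inscribed in a circle of given radius."""
    return 2.0 * n * radius * math.sin(math.pi / n)


def leading_coefficient_estimate(n, radius=1.0):
    """n^2 (L_n - l), which should approach the first coefficient c_1."""
    boundary_length = 2.0 * math.pi * radius
    return n * n * (regular_ngon_perimeter(n, radius) - boundary_length)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Compare with the value -pi^3 / 3 predicted by the Taylor series.
        print(n, leading_coefficient_estimate(n), -math.pi**3 / 3)
```

For the circle the difference $L_n - l_n$ vanishes identically, which is consistent with the limit statement for every k.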
Introduction


Over the last 30 years, black holes have been 
shown to have a number of surprising properties. 
These discoveries have revealed unforeseen relations 
between the otherwise distinct areas of general
relativity, quantum physics, and statistical
mechanics. This interplay, in turn, led to a number
of deep puzzles at the very foundations of physics. 
Some have been resolved while others continue to 
baffle physicists. The starting point of these 
fascinating developments was the discovery of 
laws of black hole mechanics by Bardeen, 
Bekenstein, Carter, and Hawking. They dictate the 
behavior of black holes in equilibrium, under small 
perturbations away from equilibrium, and in fully 
dynamical situations. While they are consequences 
of classical general relativity alone, they have a 
close similarity with the laws of thermodynamics. 
The origin of this seemingly strange coincidence lies 
in quantum physics. For further discussion, 
see Asymptotic Structure and Conformal Infinity; 
Loop Quantum Gravity; Quantum Geometry and 
Its Applications; Quantum Field Theory in Curved 
Spacetime; Stationary Black Holes. 

The focus of this article is just on black hole 
mechanics. The discussion is divided into three parts. 
In the first, we will introduce the notions of event 
horizons and black hole regions and discuss properties
of globally stationary black holes. In the second, we
will consider black holes which are themselves in
equilibrium but in surroundings which may be time
dependent. Finally, in the third part, we summarize
what is known in the fully dynamical situations. For
simplicity, all manifolds and fields are assumed to be
smooth and, unless otherwise stated, spacetime is
assumed to be four dimensional, with a metric of
signature $-, +, +, +$, and the cosmological constant
is assumed to be zero. An arrow under a
spacetime index denotes the pullback of that index to
the horizon.


Global Equilibrium 


To capture the intuitive notion that a black hole is a
region from which signals cannot escape to the
asymptotic part of spacetime, one needs a precise
definition of future infinity. The standard strategy is to
use Penrose’s conformal boundary $\mathcal{I}^+$. A black hole
region $\mathcal{B}$ of a spacetime $(\mathcal{M}, g_{ab})$ is defined as
$\mathcal{B} = \mathcal{M} \setminus I^-(\mathcal{I}^+)$, where $I^-$ denotes “chronological past.” The
boundary $\partial\mathcal{B}$ of the black hole region is called the
“event horizon” and denoted by $\mathcal{E}$. Thus, $\mathcal{E}$ is the
boundary of the past of $\mathcal{I}^+$. It therefore follows that $\mathcal{E}$ is
a null 3-surface, ruled by future inextendible null
geodesics without caustics. If the spacetime is globally
hyperbolic, an “instant of time” is represented by a
Cauchy surface M. The intersection of $\mathcal{B}$ with M may
have several disjoint components, each representing a
black hole at that instant of time. If M′ is a Cauchy
surface to the future of M, the number of disjoint
components of $M' \cap \mathcal{B}$ in the causal future of $M \cap \mathcal{B}$
must be less than or equal to those of $M \cap \mathcal{B}$
(see Hawking and Ellis (1973)). Thus, black holes can
merge but cannot bifurcate. (By a time reversal, i.e., by
replacing $\mathcal{I}^+$ with $\mathcal{I}^-$ and $I^-$ with $I^+$, one can define a
white hole region $\mathcal{W}$. However, here we will focus only
on black holes.)

A spacetime $(\mathcal{M}, g_{ab})$ is said to be stationary (i.e., time
independent) if $g_{ab}$ admits a Killing field $t^a$ that
represents an asymptotic time translation. By convention,
$t^a$ is assumed to be unit at infinity. $(\mathcal{M}, g_{ab})$ is said
to be axisymmetric if $g_{ab}$ admits a Killing field $\phi^a$
generating an SO(2) isometry. By convention $\phi^a$ is
normalized such that the affine length of its integral
curves is $2\pi$. Stationary spacetimes with nontrivial
$\mathcal{M} \setminus I^-(\mathcal{I}^+)$ represent black holes which are in global
equilibrium. In the Einstein–Maxwell theory in four
dimensions, there exists a unique three-parameter
family of stationary black hole solutions, generally
parametrized by mass m, angular momentum J, and
electric charge Q. This is the celebrated Kerr-Newman
family. Therefore, in general relativity a great deal of
work on black holes has focused on these solutions and
perturbations thereof. The Kerr-Newman family is
axisymmetric and furthermore, its metric has the
property that the 2-flats spanned by the Killing fields
$t^a$ and $\phi^a$ are orthogonal to a family of 2-surfaces. This
property is called “t-$\phi$ orthogonality.” These features of
Kerr-Newman spacetimes are widely used in black
hole physics. Note however that uniqueness fails in
higher dimensions, and also in the presence of
nonabelian gauge fields or rings of perfect fluids around
black holes in four dimensions. In mathematical
physics, there is significant literature on the new
stationary black hole solutions in Einstein–Yang–Mills–Higgs
theories. These are called “hairy black
holes.” Research on stationary black hole solutions with
rings received a boost by a recent discovery that these
black holes can violate the Kerr inequality $J \leq Gm^2$
between angular momentum J and mass m.

A null 3-manifold $\mathcal{K}$ in $\mathcal{M}$ is said to be a “Killing
horizon” if $g_{ab}$ admits a Killing field $K^a$ which is
everywhere normal to $\mathcal{K}$. On a Killing horizon, one
can show that the acceleration of $K^a$ is proportional
to $K^a$ itself:

$$K^a \nabla_a K^b = \kappa K^b$$ [1]

The proportionality function $\kappa$ is called “surface
gravity.” We will show in the next section that if a
mild energy condition holds on $\mathcal{K}$, then $\kappa$ must be
constant. Note that if we rescale $K^a$ via $K^a \to cK^a$,
where c is a constant, surface gravity also rescales as
$\kappa \to c\kappa$.

In the Kerr-Newman family, the event horizon is
a Killing horizon. More generally, if an axisymmetric,
stationary black hole spacetime $(\mathcal{M}, g_{ab})$
satisfies the t-$\phi$ orthogonality property, its event
horizon $\mathcal{E}$ is a Killing horizon. (Although one can
envisage stationary black holes in which these
additional symmetry conditions are not met, this
possibility has been ignored in black hole mechanics
on stationary spacetimes. Quasilocal horizons, discussed
below, do not require any spacetime symmetries.)
In these cases, the normalization freedom in
$K^a$ is fixed by requiring that $K^a$ have the form

$$K^a = t^a + \Omega \phi^a$$ [2]

on the horizon, where $\Omega$ is a constant, called the
“angular velocity of the horizon.” The resulting $\kappa$ is
called the surface gravity of the black hole. It is
remarkable that $\kappa$ is constant for all such black
holes, even when their horizon is highly distorted
(i.e., far from being spherically symmetric) either
due to rotation or due to external matter fields. This
is analogous to the fact that the temperature of a
thermodynamical system in equilibrium is constant,
independently of the details of the system. In
analogy with thermodynamics, constancy of $\kappa$ is
referred to as the “zeroth law of black hole
mechanics.”

Next, let us consider an infinitesimal perturbation
$\delta$ within the three-parameter Kerr-Newman family.
A simple calculation shows that the changes in the
Arnowitt–Deser–Misner (ADM) mass m, angular
momentum J, and the total charge Q of the
spacetime and in the area a of the horizon are
constrained via

$$\delta m = \frac{\kappa}{8\pi G}\,\delta a + \Omega\,\delta J + \Phi\,\delta Q$$ [3]

where the coefficients $\kappa, \Omega, \Phi$ are black hole parameters,
$\Phi = A_a K^a$ being the electrostatic potential at
the horizon. The last two terms, $\Omega\,\delta J$ and $\Phi\,\delta Q$, have
the interpretation of “work” required to spin the
black hole up by an amount $\delta J$ or to increase its
charge by $\delta Q$. Therefore, [3] has a striking resemblance
to the first law, $\delta E = T\delta S + \delta W$, of thermodynamics
if (as the zeroth law suggests) $\kappa$ is made
proportional to the temperature T, and the horizon
area a to the entropy S. Therefore, [3] and its
generalizations discussed below are referred to as
the “first law of black hole mechanics.”
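The first law [3] can be tested numerically within the Kerr-Newman family. The sketch below is an illustration under stated assumptions, none of which come from the article: units with G = c = 1 and Gaussian charge, the standard closed-form horizon quantities $r_+ = m + \sqrt{m^2 - (J/m)^2 - Q^2}$, $a = 4\pi(r_+^2 + (J/m)^2)$, $\kappa = \sqrt{m^2 - (J/m)^2 - Q^2}/(r_+^2 + (J/m)^2)$, $\Omega = (J/m)/(r_+^2 + (J/m)^2)$, $\Phi = Q r_+/(r_+^2 + (J/m)^2)$, and hypothetical function names. The variation of the area is approximated by a central difference.

```python
import math


def kerr_newman(m, J, Q):
    """Horizon quantities of a Kerr-Newman black hole (assumes G = c = 1)."""
    spin = J / m                      # rotation parameter J/m
    disc = m * m - spin * spin - Q * Q
    if disc < 0:
        raise ValueError("no horizon: m^2 < (J/m)^2 + Q^2")
    root = math.sqrt(disc)
    r_plus = m + root                 # radius of the outer horizon
    rho2 = r_plus**2 + spin**2
    return {
        "area": 4.0 * math.pi * rho2,
        "kappa": root / rho2,         # surface gravity
        "Omega": spin / rho2,         # angular velocity of the horizon
        "Phi": Q * r_plus / rho2,     # electrostatic potential at the horizon
    }


def first_law_residual(m, J, Q, dm, dJ, dQ):
    """Difference between the two sides of [3] for a small parameter change.

    A central difference is used, so the total change of the parameters
    is 2*dm, 2*dJ and 2*dQ.
    """
    h = kerr_newman(m, J, Q)
    d_area = (kerr_newman(m + dm, J + dJ, Q + dQ)["area"]
              - kerr_newman(m - dm, J - dJ, Q - dQ)["area"])
    rhs = (h["kappa"] / (8.0 * math.pi) * d_area
           + h["Omega"] * 2.0 * dJ + h["Phi"] * 2.0 * dQ)
    return 2.0 * dm - rhs


if __name__ == "__main__":
    print(kerr_newman(1.0, 0.0, 0.0))  # Schwarzschild limit
    print(first_law_residual(1.0, 0.5, 0.3, 1e-6, 2e-6, -1e-6))
```

The residual should be zero up to rounding and truncation error; in the Schwarzschild limit the function returns the familiar values $a = 16\pi m^2$ and $\kappa = 1/(4m)$.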

In Kerr-Newman spacetimes, the only contribution
to the stress-energy tensor comes from the
Maxwell field. Bardeen et al. (1973) consider
stationary black holes with matter such as perfect
fluids in the exterior region and stationary perturbations
$\delta$ thereof. Using Einstein’s equations, they
show that the form [3] of the first law does not
change; the only modification is the addition of certain
matter terms on the right-hand side which can be
interpreted as the work $\delta W$ done on the total
system. A generalization in another direction was
made by Iyer and Wald (1994) using Noether
currents. They allow nonstationary perturbations
and, more importantly, drop the restriction to
general relativity. Instead, they consider a wide
class of diffeomorphism-invariant Lagrangian
densities $L(g_{ab}, R_{abcd}, \nabla_a R_{bcde}, \ldots, \Phi_\alpha, \nabla_a \Phi_\alpha, \ldots)$
which depend on the metric $g_{ab}$, matter fields $\Phi_\alpha$,
and a finite number of derivatives of the Riemann
tensor and matter fields. Finally, they restrict
themselves to $\kappa \neq 0$. In this case, on the maximal
analytic extension of the spacetime, the Killing field
$K^a$ vanishes on a 2-sphere $S_0$, called the bifurcate
horizon. Then, [3] is generalized to

$$\delta m = \frac{\kappa}{2\pi}\,\delta S_{\rm hor} + \delta W$$ [4]

Here $\delta W$ again represents “work terms” and $S_{\rm hor}$ is
given by

$$S_{\rm hor} = -2\pi \oint_{S_0} \frac{\delta L}{\delta R_{abcd}}\, n_{ab}\, n_{cd}$$ [5]

where $n_{ab}$ is the binormal to $S_0$ (with $n_{ab}n^{ab} = -2$),
and the functional derivative inside the integral is
evaluated by formally viewing the Riemann tensor
as a field independent of the metric. For the
Einstein-Hilbert action, this yields $S_{\rm hor} = a/4G$ and
one recovers [3].

These results are striking. However, the under- 
lying assumptions have certain unsatisfactory 
aspects. First, although the laws are meant to refer 
just to black holes, one assumes that the entire 
spacetime is stationary. In thermodynamics, by 
contrast, one only assumes that the system under 
consideration is in equilibrium, not the whole 
universe. Second, in the first law, quantities $a, \Omega, \Phi$
are evaluated at the horizon while m, J are
evaluated at infinity and include contributions from
possible matter fields outside the black hole. A more
satisfactory law of black hole mechanics would
involve attributes of the black hole alone. Finally,
the notion of the event horizon is extremely global
and teleological since it explicitly refers to $\mathcal{I}^+$. An
event horizon may well be developing in the very
room you are sitting in today in anticipation of a
gravitational collapse in the center of our galaxy
which may occur a billion years hence. This feature
makes it impossible to generalize the first law to
fully dynamical situations and relate the change in
the event horizon area to the flux of energy and
angular momentum falling across it. Indeed, one can
construct explicit examples of dynamical black holes
in which an event horizon $\mathcal{E}$ forms and grows in the
flat part of a spacetime where nothing happens
physically. These considerations call for a replacement
of $\mathcal{E}$ by a quasilocal horizon which leads to a
first law involving only horizon attributes, and
which can grow only in response to the influx of
energy. Such horizons are discussed in the next two
sections.


Local Equilibrium 


The key idea here is to drop the requirement that
spacetime should admit a stationary Killing field and
ask only that the intrinsic horizon geometry be time
independent. Consider a null 3-surface $\Delta$ in a
spacetime $(\mathcal{M}, g_{ab})$ with a future-pointing normal
field $\ell^a$. The pullback $q_{ab} := g_{\underleftarrow{ab}}$ of the spacetime
metric to $\Delta$ is the intrinsic, degenerate “metric” of $\Delta$
with signature 0, +, +. The first condition is that it
be “time independent,” that is, $\mathcal{L}_\ell q_{ab} = 0$ on $\Delta$.
Then by restriction, the spacetime derivative operator
$\nabla$ induces a natural derivative operator D on $\Delta$.
While D is compatible with $q_{ab}$, that is, $D_a q_{bc} = 0$, it
is not uniquely determined by this property because
$q_{ab}$ is degenerate. Thus, D has extra information,
not contained in $q_{ab}$. The pair $(q_{ab}, D)$ is said to
determine the intrinsic geometry of the null surface
$\Delta$. This notion leads to a natural definition of a
horizon in local equilibrium. Let $\Delta$ be a null,
three-dimensional submanifold of $(\mathcal{M}, g_{ab})$ with topology
$S \times \mathbb{R}$, where S is compact and without boundary.


Definition 1 $\Delta$ is said to be an “isolated horizon” if it
admits a null normal $\ell^a$ such that:


(i) $\mathcal{L}_\ell q_{ab} = 0$ and $[\mathcal{L}_\ell, D] = 0$ on $\Delta$ and
(ii) $-T^a{}_b \ell^b$ is a future pointing causal vector on $\Delta$.


One can show that, generically, this null normal field
$\ell^a$ is unique up to rescalings by positive constants.


Both conditions are local to $\Delta$. In particular, $(\mathcal{M}, g_{ab})$
is not required to be asymptotically flat and there is no
longer any teleological feature. Since $\Delta$ is null and
$\mathcal{L}_\ell q_{ab} = 0$, the area of any of its cross sections is the
same, denoted by $a_\Delta$. As one would expect, one can
show that there is no flux of gravitational radiation or
matter across $\Delta$. This captures the idea that the black
hole itself is in equilibrium. Condition (ii) is a rather
weak “energy condition” which is satisfied by all
matter fields normally considered in classical general
relativity. The nontrivial condition is (i). It extracts
from the notion of a Killing horizon just a “tiny part”
that refers only to the intrinsic geometry of $\Delta$. As a
result, every Killing horizon $\mathcal{K}$ is, in particular, an
isolated horizon. However, a spacetime with an
isolated horizon $\Delta$ can admit gravitational radiation
and dynamical matter fields away from $\Delta$. In fact, as a
family of Robinson–Trautman spacetimes illustrates,
gravitational radiation could even be present arbitrarily
close to $\Delta$. Because of these possibilities, there are
many nontrivial examples and the transition from
event horizons of stationary spacetimes to isolated
horizons represents a significant generalization of
black hole mechanics. (In fact, the derivation of the
zeroth and the first law requires slightly weaker
assumptions, encoded in the notion of a “weakly
isolated horizon” (Ashtekar et al. 2000, 2001).)

An immediate consequence of the requirement
$\mathcal{L}_\ell q_{ab} = 0$ is that there exists a 1-form $\omega_a$ on $\Delta$ such
that $D_a \ell^b = \omega_a \ell^b$. Following the definition of $\kappa$ on a
Killing horizon, the surface gravity $\kappa_{(\ell)}$ of $(\Delta, \ell)$ is
defined as $\kappa_{(\ell)} = \omega_a \ell^a$. Again, under $\ell^a \to c\ell^a$, we have
$\kappa_{(c\ell)} = c\,\kappa_{(\ell)}$. Together with Einstein’s equations, the
two conditions of Definition 1 imply $\mathcal{L}_\ell \omega_a = 0$ and
$\ell^a D_{[a}\omega_{b]} = 0$. The Cartan identity relating the Lie
and exterior derivative now yields

$$D_a(\omega_b \ell^b) = D_a \kappa_{(\ell)} = 0$$ [6]

Thus, surface gravity is constant on every isolated
horizon. This is the zeroth law, extended to horizons
representing local equilibrium. In the presence of an
electromagnetic field, Definition 1 and the field
equations imply $\mathcal{L}_\ell F_{\underleftarrow{ab}} = 0$ and $\ell^a F_{\underleftarrow{ab}} = 0$. The first of
these equations implies that one can always choose a
gauge in which $\mathcal{L}_\ell A_{\underleftarrow{a}} = 0$. By the Cartan identity it then
follows that the electrostatic potential $\Phi_{(\ell)} := A_a \ell^a$ is
constant on the horizon. This is the Maxwell analog
of the zeroth law.

In this setting, the first law is derived using a
Hamiltonian framework (Ashtekar et al. 2000,
2001). For concreteness, let us assume that we are
in the asymptotically flat situation and the only
gauge field present is electromagnetic. One begins by
restricting oneself to horizon geometries such that $\Delta$
admits a rotational vector field $\varphi^a$ satisfying
$\mathcal{L}_\varphi q_{ab} = 0$. (In fact for black hole mechanics, it
suffices to assume only that $\mathcal{L}_\varphi \epsilon_{ab} = 0$, where $\epsilon_{ab}$ is
the intrinsic area 2-form on $\Delta$. The same is true on
dynamical horizons discussed in the next section.)
One then constructs a phase space $\Gamma$ of gravitational
and matter fields such that (1) $\mathcal{M}$ admits an internal
boundary $\Delta$ which is an isolated horizon; and (2) all
fields satisfy asymptotically flat boundary conditions
at infinity. Note that the horizon geometry is
allowed to vary from one phase-space point to
another; the pair $(q_{ab}, D)$ induced on $\Delta$ by the
spacetime metric only has to satisfy Definition 1 and
the condition $\mathcal{L}_\varphi q_{ab} = 0$.

Let us begin with angular momentum. Fix a
vector field $\phi^a$ on $\mathcal{M}$ which coincides with the fixed
$\varphi^a$ on $\Delta$ and is an asymptotic rotational symmetry
at infinity. (Note that $\phi^a$ is not restricted in any way
in the bulk.) Lie derivatives of gravitational and
matter fields along $\phi^a$ define a vector field $X(\phi)$ on
$\Gamma$. One shows that it is an infinitesimal canonical
transformation, that is, satisfies $\mathcal{L}_{X(\phi)} \boldsymbol{\Omega} = 0$, where $\boldsymbol{\Omega}$
is the symplectic structure on $\Gamma$. The Hamiltonian
$H(\phi)$ generating this canonical transformation is
given by


1 1 [7] 
Les -7G fs) m PAGE 


where $J^{(\phi)}$ is the ADM angular momentum at
infinity, S is any cross section of $\Delta$, and $\epsilon$ the area
element thereon. The horizon term is independent of the
choice of S made in its evaluation and is interpreted as
the “horizon angular momentum.” It has numerous
properties that support this interpretation. In particular,
it yields the standard angular momentum
expression in Kerr-Newman spacetimes.

To define horizon energy, one has to introduce a
“time-translation” vector field $t^a$. At infinity, $t^a$ must
tend to a unit time translation. On $\Delta$, it must be a
symmetry of $q_{ab}$. Since $\ell^a$ and $\varphi^a$ are both horizon
symmetries, $t^a = c\,\ell^a + \Omega\,\varphi^a$ on $\Delta$, for some constants
$c$ and $\Omega$. However, unlike $\phi^a$, the restriction of $t^a$ to
$\Delta$ cannot be fixed once and for all but must be
allowed to vary from one phase-space point to
another. In particular, on physical grounds, one
expects $\Omega$ to be zero at a phase-space point
representing a nonrotating black hole but nonzero
at a point representing a rotating black hole. This
freedom in the boundary value of $t^a$ introduces a
qualitatively new element. The vector field $X(t)$ on $\Gamma$
defined by the Lie derivatives of gravitational and
matter fields does not, in general, satisfy $\mathcal{L}_{X(t)}\Omega = 0$;
it need not be an infinitesimal canonical transformation.
The necessary and sufficient condition is that
$(\kappa_{(c\ell)}/8\pi G)\,\delta a_\Delta + \Omega\,\delta J_\Delta + \Phi_{(c\ell)}\,\delta Q_\Delta$ be an exact variation.
That is, $X(t)$ generates a Hamiltonian flow if
and only if there exists a function $E_\Delta^{t}$ on $\Gamma$ such that


$$ \delta E_\Delta^{t} = \frac{\kappa_{(c\ell)}}{8\pi G}\,\delta a_\Delta + \Omega\,\delta J_\Delta + \Phi_{(c\ell)}\,\delta Q_\Delta \qquad [8] $$


This is precisely the first law. Thus, the framework
provides a deeper insight into the origin of the first
law: it is the necessary and sufficient condition for
the evolution generated by $t^a$ to be Hamiltonian.
Equation [8] is a genuine restriction on the choice of
phase-space functions $c$ and $\Omega$, that is, of restrictions
to $\Delta$ of evolution fields $t^a$. It is easy to verify that $M$
admits many such vector fields. Given one, the
Hamiltonian $H(t)$ generating the time evolution
along $t^a$ takes the form


$$ H(t) = E_\infty^{t} - E_\Delta^{t} \qquad [9] $$


reinforcing the interpretation of $E_\Delta^{t}$ as the horizon
energy.

In general, there is a multitude of first laws, one for
each vector field $t^a$, the evolution along which preserves
the symplectic structure. In the Einstein-Maxwell
theory, given any phase-space point, one can choose a
canonical boundary value $t_o^a$ exploiting the uniqueness
theorem. $E_\Delta^{t_o}$ is then called the horizon mass and
denoted simply by $m_\Delta$. In the Kerr-Newman family,
$H(t_o)$ vanishes and $m_\Delta$ coincides with the ADM mass
$m_\infty$. Similarly, if $\phi^a$ is chosen to be a global rotational
Killing field, $J_\Delta^{(\varphi)}$ equals $J_\infty^{(\phi)}$. However, in more general
spacetimes where there are matter fields or gravitational
radiation outside $\Delta$, these equalities do not hold; $m_\Delta$
and $J_\Delta$ represent quantities associated with the
horizon alone while the ADM quantities represent
the total mass and angular momentum in the spacetime,
including contributions from matter fields and
gravitational radiation in the exterior region. In the
first law [8], only the contributions associated with
the horizon appear.

When the uniqueness theorem fails, as, for
example, in the Einstein-Yang-Mills-Higgs theory,
first laws continue to hold but the horizon mass $m_\Delta$
becomes ambiguous. Interestingly, these ambiguities
can be exploited to relate properties of hairy black
holes with those of the corresponding solitons. (For
a summary, see Ashtekar and Krishnan (2004).)


Dynamical Situations 


A natural question now is whether there is an analog of
the second law of thermodynamics. Using event
horizons, Hawking showed that the answer is in the
affirmative (see Hawking and Ellis (1973)). Let $(M, g_{ab})$
admit an event horizon $\mathcal{E}$. Denote by $\ell^a$ a geodesic null
normal to $\mathcal{E}$. Its expansion is defined as $\theta_{(\ell)} := q^{ab}\nabla_a \ell_b$,
where $q^{ab}$ is any inverse of the degenerate intrinsic
metric $q_{ab}$ on $\mathcal{E}$, and determines the rate of change of the
area element of $\mathcal{E}$ along $\ell^a$. Assuming that the null energy
condition and Einstein’s equations hold, the Raychaudhuri
equation immediately implies that if $\theta_{(\ell)}$ were to
become negative somewhere it would become infinite
within a finite affine parameter. Hawking showed that,
if there is a globally hyperbolic region containing
$J^-(\mathscr{I}^+) \cup \mathcal{E}$ (that is, if there are no naked singularities),
this cannot happen, whence $\theta_{(\ell)} \ge 0$ on $\mathcal{E}$. Hence, if a
cross section $S_2$ of $\mathcal{E}$ is to the future of a cross section $S_1$,
we must have $a_{S_2} \ge a_{S_1}$. Thus, in any (i.e., not
necessarily infinitesimal) dynamical process, the change
$\Delta a$ in the horizon area is always non-negative. This
result is known as the “second law of black hole
mechanics.” As in the first law, the analog of entropy is
the horizon area.
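
A classic consequence of the second law can be sketched numerically. The example below is only an illustration under stated assumptions that are not part of the text above: two widely separated, nonrotating (Schwarzschild) black holes merge into a single nonrotating black hole, the horizon area of a Schwarzschild hole of mass m is taken to be 16πG²m², and units with G = c = 1 are used. The function names are hypothetical.

```python
import math

G = 1.0  # assumption: geometric units, G = c = 1


def schwarzschild_area(m):
    """Horizon area a = 16*pi*G^2*m^2 of a Schwarzschild black hole of mass m."""
    return 16.0 * math.pi * (G * m) ** 2


def min_final_mass(m1, m2):
    """Smallest final mass compatible with a_final >= a_1 + a_2."""
    return math.sqrt(m1 ** 2 + m2 ** 2)


def max_radiated_fraction(m1, m2):
    """Upper bound on the fraction of the initial mass that can be radiated."""
    return 1.0 - min_final_mass(m1, m2) / (m1 + m2)


if __name__ == "__main__":
    print(max_radiated_fraction(1.0, 1.0))
```

For equal masses the bound is 1 - 1/√2, so under these assumptions the area theorem alone limits the radiated energy to a bit less than 30% of the initial mass.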


It is tempting to ask if there is a local physical 
process directly responsible for the growth of area. 
For event horizons, the answer is in the negative 
since they can grow in a flat portion of spacetime. 
However, one can introduce quasilocal horizons 
also in the dynamical situations and obtain the 
desired result (Ashtekar and Krishnan 2003). These 
constructions are strongly motivated by earlier ideas 
introduced by Hayward (1994). 


Definition 2 A three-dimensional spacelike submanifold
$H$ of $(M, g_{ab})$ is said to be a “dynamical
horizon” if it admits a foliation by compact
2-manifolds $S$ (without boundary) such that:


(i) the expansion $\theta_{(\ell)}$ of one (future directed) null
normal field $\ell^a$ to $S$ vanishes and the expansion
of the other (future directed) null normal field,
$n^a$, is negative; and

(ii) $-T^a{}_b\,\ell^b$ is a future pointing causal vector on $H$.


One can show that this foliation of $H$ is unique and
that $S$ is either a 2-sphere or, under degenerate and
physically over-restrictive conditions, a 2-torus. Each
leaf $S$ is a marginally trapped surface and is referred to as a
“cut” of $H$. Unlike event horizons $\mathcal{E}$, dynamical horizons
$H$ are locally defined and do not display any teleological
feature. In particular, they cannot lie in a flat portion of
spacetime. Dynamical horizons commonly arise in
numerical simulations of evolving black holes as world
tubes of apparent horizons. As the black hole settles
down, $H$ asymptotes to an isolated horizon $\Delta$, which
tightly hugs the asymptotic future portion of the event
horizon. However, during the dynamical phase, $H$
typically lies well inside $\mathcal{E}$.

The two conditions in Definition 2 immediately
imply that the area of cuts of $H$ increases monotonically
along the “outward direction” defined by
the projection of $\ell^a$ on $H$. Furthermore, this change
turns out to be directly related to the flux of energy
falling across $H$. Let $R$ denote the “radius function”
on $H$ so that the area of any cut $S$ is given by
$a_S = 4\pi R^2$. Let $N$ denote the norm of $\partial_a R$ and $\Delta H$
the portion of $H$ bounded by two cross sections $S_1$
and $S_2$. The appropriate energy turns out to be
associated with the vector field $N\ell^a$, where $\ell^a$ is
normalized such that its projection on $H$ is the unit
normal $\hat r^a$ to the cuts $S$. In the generic and
physically interesting case when $S$ is a 2-sphere, the
Gauss and the Codazzi (i.e., constraint) equations
imply

$$ \frac{1}{2G}(R_2 - R_1) = \int_{\Delta H} T_{ab}\,N\ell^a\hat\tau^b\,d^3V + \frac{1}{16\pi G}\int_{\Delta H} N\left(|\sigma|^2 + 2|\zeta|^2\right)d^3V \qquad [10] $$


Here $\hat\tau^a$ is the unit normal to $H$, $\sigma^{ab}$ the shear of $\ell^a$
(i.e., the tracefree part of $q^{am}q^{bn}\nabla_m\ell_n$), and
$\zeta^a = q^{ab}\hat r^c\nabla_c\ell_b$, where $q^{ab}$ is the projector onto the
tangent space of the cuts $S$. The first integral on
the right-hand side can be directly interpreted as the
flux across $\Delta H$ of matter-energy (relative to the
vector field $N\ell^a$). The second term is purely
geometric and is interpreted as the flux of energy
carried by gravitational waves across $\Delta H$. It has
several properties which support this interpretation.
Thus, not only does the second law of black hole
mechanics hold for a dynamical horizon $H$, but the
“cause” of the increase in the area can be directly
traced to physical processes happening near $H$.

Another natural question is whether the first law
[8] can be generalized to fully dynamical situations,
where $\delta$ is replaced by a finite transition. Again, the
answer is in the affirmative. We will outline the idea
for the case when there are no gauge fields on $H$. As
with isolated horizons, to have a well-defined notion
of angular momentum, let us suppose that the
intrinsic 3-metric on $H$ admits a rotational Killing
field $\varphi^a$. Then, the angular momentum associated
with any cut $S$ is given by


$$ J_S^{(\varphi)} = -\frac{1}{8\pi G}\oint_S K_{ab}\,\varphi^a\hat r^b\,d^2V \equiv \frac{1}{8\pi G}\oint_S j^{(\varphi)}\,d^2V \qquad [11] $$


where $K_{ab}$ is the extrinsic curvature of $H$ in $(M, g_{ab})$ and
$j^{(\varphi)}$ is interpreted as “the angular momentum density.”
Now, in the Kerr family, the mass, surface gravity, and
the angular velocity can be unambiguously expressed as
well-defined functions $\bar m(a, J)$, $\bar\kappa(a, J)$, and $\bar\Omega(a, J)$ of the
horizon area $a$ and angular momentum $J$. The idea is to
use these expressions to associate mass, surface gravity,
and angular velocity with each cut of $H$. Then, a
surprising result is that the difference between the
horizon masses associated with cuts $S_1$ and $S_2$ can be
expressed as the integral of a locally defined flux across
the portion $\Delta H$ of $H$ bounded by $S_1$ and $S_2$:

$$ \bar m_2 - \bar m_1 = \frac{1}{8\pi G}\int_{\Delta H}\bar\kappa\,da + \frac{1}{8\pi G}\left\{\oint_{S_2}\bar\Omega\,j^{(\varphi)}\,d^2V - \oint_{S_1}\bar\Omega\,j^{(\varphi)}\,d^2V - \int_{\bar\Omega_1}^{\bar\Omega_2} d\bar\Omega\oint_S j^{(\varphi)}\,d^2V\right\} \qquad [12] $$


If the cuts $S_2$ and $S_1$ are only infinitesimally separated,
this expression reduces precisely to the standard first
law involving infinitesimal variations. Therefore, [12] is
an integral generalization of the first law.
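
The Kerr functions of area and angular momentum used above are not written out in the text. The sketch below uses the standard Kerr expressions in terms of the area radius R (with a = 4πR²); these closed forms, the choice G = 1, and the function names are assumptions added here for illustration. The check at the end confirms numerically that they satisfy the infinitesimal first law δm = (κ/8πG) δa + Ω δJ.

```python
import math

G = 1.0  # assumption: units with G = c = 1


def _radius_sq(a):
    """Square of the area radius R, defined by a = 4*pi*R^2."""
    return a / (4.0 * math.pi)


def kerr_mass(a, J):
    """Kerr mass as a function of horizon area a and angular momentum J."""
    r2 = _radius_sq(a)
    return math.sqrt(r2 ** 2 + 4.0 * G ** 2 * J ** 2) / (2.0 * G * math.sqrt(r2))


def kerr_surface_gravity(a, J):
    """Kerr surface gravity as a function of a and J."""
    r2 = _radius_sq(a)
    root = math.sqrt(r2 ** 2 + 4.0 * G ** 2 * J ** 2)
    return (r2 ** 2 - 4.0 * G ** 2 * J ** 2) / (2.0 * r2 ** 1.5 * root)


def kerr_angular_velocity(a, J):
    """Kerr horizon angular velocity as a function of a and J."""
    r2 = _radius_sq(a)
    root = math.sqrt(r2 ** 2 + 4.0 * G ** 2 * J ** 2)
    return 2.0 * G * J / (math.sqrt(r2) * root)


def first_law_residuals(a, J, h=1e-4):
    """Central-difference check of dm = kappa/(8 pi G) da + Omega dJ."""
    dm_da = (kerr_mass(a + h, J) - kerr_mass(a - h, J)) / (2.0 * h)
    dm_dJ = (kerr_mass(a, J + h) - kerr_mass(a, J - h)) / (2.0 * h)
    res_a = dm_da - kerr_surface_gravity(a, J) / (8.0 * math.pi * G)
    res_J = dm_dJ - kerr_angular_velocity(a, J)
    return res_a, res_J


if __name__ == "__main__":
    print(first_law_residuals(50.0, 0.5))
```

In the dynamical-horizon construction these functions would be evaluated on each cut with its own area and angular momentum; the sketch only checks the algebraic identity that makes [12] reduce to the usual first law.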

Let us conclude with a general perspective. On the 
whole, in the passage from event horizons in 
stationary spacetimes to isolated horizons and then 
to dynamical horizons, one considers increasingly 
more realistic situations. In all the three cases, the
analysis has been extended to allow the presence of
a cosmological constant $\Lambda$. (The only significant
change is that the topology of cuts $S$ of dynamical
horizons is restricted to be $S^2$ if $\Lambda > 0$ and is
completely unrestricted if $\Lambda < 0$.) In the first two
frameworks, results have also been extended to higher 
dimensions. Since the notions of isolated and dynami- 
cal horizons make no reference to infinity, these 
frameworks can be used also in spatially compact 
spacetimes. The notion of an event horizon, by 
contrast, does not naturally extend to these space- 
times. On the other hand, the generalization [4] of the 
first law [3] is applicable to event horizons of 
stationary spacetimes in a wide class of theories while 
so far the isolated and dynamical horizon frameworks 
are tied to general relativity (coupled to matter 
satisfying rather weak energy conditions). From a 
mathematical physics perspective, extension to more 
general theories is an important open problem. 


See also: Asymptotic Structure and Conformal Infinity; 
Branes and Black Hole Statistical Mechanics; Dirac 
Fields in Gravitation and Nonabelian Gauge Theory; 
Geometric Flows and the Penrose Inequality; Loop 
Quantum Gravity; Minimal Submanifolds; Quantum Field 
Theory in Curved Spacetime; Quantum Geometry and its 
Applications; Random Algebraic Geometry, Attractors 
and Flux Vacua; Shock Wave Refinement of the 
Friedman-Robertson-Walker Metric; Stationary Black

Introduction


Ludwig Boltzmann (1872) established an evolution 
equation to describe the behavior of a rarefied gas, 
starting from the mathematical model of elastic balls 
and using mechanical and statistical considerations. 
The importance of this equation is twofold. First, it
provides a reduced description of the microscopic
world (as the hydrodynamical equations also do).
Second, it is also an important tool for
applications, especially for dilute fluids when the
hydrodynamical equations fail to hold.

The starting point of the Boltzmann analysis is to 
abandon the study of the gas in terms of the detailed 
motion of molecules which constitute it because of 
their large number. Instead, it is better to investigate 
a function $f(x,v)$, which is the probability density of
a given particle, where $x$ and $v$ denote its position
and velocity. Actually, $f(x,v)\,dx\,dv$ is often confused
with the fraction of molecules falling in the cell of
the phase space of size $dx\,dv$ around $x, v$. The two
concepts are not exactly the same, but they are
asymptotically equivalent (when the number of
particles tends to infinity) if a law of large numbers holds.

The Boltzmann equation is the following:


$$ (\partial_t + v\cdot\nabla_x) f = Q(f,f) \qquad [1] $$


where $Q$, the collision operator, is defined by eqn [2]:

$$ Q(f,f)(x,v) = \int_{\mathbb{R}^3} dv_1\int_{S^2_+} dn\,(v-v_1)\cdot n\,[f(x,v')f(x,v_1') - f(x,v)f(x,v_1)] \qquad [2] $$


and


$$ v' = v - n[n\cdot(v-v_1)], \qquad v_1' = v_1 + n[n\cdot(v-v_1)] \qquad [3] $$


Moreover, $n$ (the impact parameter) is a unit
vector and $S^2_+ = \{n \mid n\cdot(v-v_1) > 0\}$.

Note that $v', v_1'$ are the outgoing velocities after a
collision of two elastic balls with incoming velocities
$v$ and $v_1$ and centers $x$ and $x + rn$, $r$ being the
diameter of the spheres. Obviously, the collision
takes place if $n\cdot(v-v_1) > 0$. Equations [3] are a
consequence of the conservation of total energy,
momentum, and angular momentum. Note also that
$r$ does not enter in eqn [1] as a parameter.
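
The collision rule [3] is easy to check numerically. The following sketch, with hypothetical helper names `dot` and `collide`, applies the rule to a pair of incoming velocities and verifies that total momentum and kinetic energy are unchanged; it assumes that `n` is passed in as a unit vector.

```python
import math


def dot(a, b):
    """Euclidean scalar product of two 3-vectors given as sequences."""
    return sum(x * y for x, y in zip(a, b))


def collide(v, v1, n):
    """Outgoing velocities of eqn [3]; n is assumed to be a unit vector."""
    s = dot(n, [a - b for a, b in zip(v, v1)])  # n . (v - v1)
    v_out = [a - s * c for a, c in zip(v, n)]
    v1_out = [b + s * c for b, c in zip(v1, n)]
    return v_out, v1_out


if __name__ == "__main__":
    v, v1 = [1.0, 0.0, 0.0], [-1.0, 0.5, 0.0]
    n = [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0]
    vp, v1p = collide(v, v1, n)
    print("momentum before:", [a + b for a, b in zip(v, v1)])
    print("momentum after: ", [a + b for a, b in zip(vp, v1p)])
    print("energy before:", 0.5 * (dot(v, v) + dot(v1, v1)))
    print("energy after: ", 0.5 * (dot(vp, vp) + dot(v1p, v1p)))
```

Applying `collide` a second time with the same `n` returns the original pair, which reflects the fact that the map $(v, v_1) \mapsto (v', v_1')$ is an involution.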


As fundamental features of eqn [1], we have the
conservation in time of the following five quantities


$$ \int dx\int dv\, f(x,v;t)\,v^\alpha \qquad [4] $$


with $\alpha = 0, 1, 2$, expressing conservation of the
probability, momentum, and energy.

From now on we shall set $\int = \int_{\mathbb{R}^3}$ for notational
simplicity.

Moreover, Boltzmann introduced the (kinetic)
entropy defined as


$$ H(f) = \int dx\int dv\, f\log f\,(x,v) \qquad [5] $$


and proved the famous H-theorem, asserting that
$H(f(t))$ does not increase along the solutions to eqn [1].
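
The standard formal computation behind the H-theorem can be sketched as follows. It assumes enough decay and regularity for the transport term to integrate to zero, and it uses the shorthand $f' = f(x,v')$, $f_1' = f(x,v_1')$, $f_1 = f(x,v_1)$; this is a sketch of the usual argument, not a rigorous proof.

```latex
\begin{align*}
\frac{d}{dt} H(f)
  &= \int dx \int dv\, \partial_t f\,(1 + \log f)
   = \int dx \int dv\, Q(f,f)\,\log f \\
  &= -\frac{1}{4} \int dx \int dv \int dv_1 \int_{S^2_+} dn\,
     (v - v_1)\cdot n\;
     \bigl(f' f_1' - f f_1\bigr)\,
     \log\frac{f' f_1'}{f f_1} \;\le\; 0 .
\end{align*}
```

The second line follows from symmetrizing under the exchanges $v \leftrightarrow v_1$ and $(v, v_1) \leftrightarrow (v', v_1')$, and the sign follows from $(a - b)\log(a/b) \ge 0$ for $a, b > 0$. Equality holds only when $f' f_1' = f f_1$, that is, when $\log f$ is a collision invariant.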

Finally, in the case of bounded domains or
homogeneous solutions ($f = f(v;t)$ is independent of
$x$), the distribution defined for some $\beta > 0$, $\rho > 0$,
and $u \in \mathbb{R}^3$ by


$$ M(x,v) = \frac{\rho}{(2\pi/\beta)^{3/2}}\,e^{-(\beta/2)|v-u|^2} \qquad [6] $$


called Maxwellian distribution, is stationary for the
evolution given by eqn [1]. In addition, $M$ minimizes
$H$ among all distributions with given total mass $\rho$,
given mean velocity $u$, and mean energy. The
parameter $\beta$ is interpreted as the inverse
temperature.
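
The minimizing property can be motivated by a formal Lagrange multiplier argument, sketched here for the homogeneous case. The multipliers $a$, $b$, $c$ are notation introduced only for this sketch.

```latex
\begin{align*}
0 &= \delta\Bigl[\int dv\, f\log f
      - a\int dv\, f - b\cdot\!\int dv\, v f - c\int dv\, |v|^2 f\Bigr]
   = \int dv\,\bigl(\log f + 1 - a - b\cdot v - c\,|v|^2\bigr)\,\delta f ,\\
&\Longrightarrow\quad
   f(v) = \exp\bigl(a - 1 + b\cdot v + c\,|v|^2\bigr), \qquad c < 0 .
\end{align*}
```

Integrability forces $c < 0$, so $f$ is a Gaussian in $v$. Completing the square and fixing the three constraints gives eqn [6] with $\beta = -2c$ and $u = b/\beta$; convexity of $f \mapsto f\log f$ indicates that this critical point is a minimum.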

In conclusion, Boltzmann was able to introduce
an evolution equation that not only has the remarkable
properties expressing mass, momentum, and
energy conservation, but also describes the trend to
thermal equilibrium. In other words, he tried to
reconcile Newton’s laws with the second
principle of thermodynamics.


The Boltzmann Heuristic Argument 


Thus, we want to find an evolution equation for the
quantity $f(x,v;t)$. The molecular system we are
considering consists of $N$ identical particles of
diameter $r$ in the whole space $\mathbb{R}^3$. We denote by
$x_1, v_1, \ldots, x_N, v_N$ a state of the system, where $x_i$ and
$v_i$ indicate the position and the velocity of the
particle $i$. The particles cannot overlap (i.e., the
centers of two particles cannot be at a distance
smaller than the particle diameter $r$).

The particles move freely up to the first
instant of contact, that is, the first time when two
particles (say particles $i$ and $j$) arrive at a distance $r$.
Then the pair interacts through an elastic collision.
This means that they change their velocities
instantaneously, according to the conservation of
energy and of linear and angular momentum.
More precisely, the velocities after a collision
with incoming velocities $v$ and $v_1$ are those given
by formula [3]. After the first collision, the
system evolves by iterating the procedure. Here
we neglect triple collisions because they are
unlikely. The evolution equation for a tagged
particle is then of the form


$$ (\partial_t + v\cdot\nabla_x) f = \mathrm{Coll} \qquad [7] $$


where Coll denotes the variation of $f$ due to the
collisions.


We have

$$ \mathrm{Coll} = G - L \qquad [8] $$


where $L$ and $G$ (the loss and gain terms, respectively)
are the negative and positive contributions to the
variation of $f$ due to the collisions. More precisely,
$L\,dx\,dv\,dt$ is the probability of the test particle to
disappear from the cell $dx\,dv$ of the phase space
because of a collision in the time interval $(t, t+dt)$
and $G\,dx\,dv\,dt$ is the probability to appear in the
same time interval for the same reason. Let us
consider the sphere of center $x$ with radius $r$ and a
point $x + rn$ over the surface, where $n$ denotes the
generic unit vector. Consider also the cylinder with
base area $dS = r^2\,dn$ and height $|V|\,dt$ along the
direction of $V = v_2 - v$.

Then a given particle (say particle 2) with velocity
$v_2$ can contribute to $L$ because it can collide with the
test particle in the time $dt$, provided it is localized in
the cylinder and if $V\cdot n < 0$. Therefore, the contribution
to $L$ due to the particle 2 is the probability of
finding such a particle in the cylinder (conditioned to
the presence of the first particle in $x$). This quantity is
$f_2(x,v,x+nr,v_2)\,|(v_2-v)\cdot n|\,r^2\,dn\,dv_2\,dt$, where $f_2$
is the joint distribution of two particles. Integrating in
$dn$ and $dv_2$, we obtain that the total contribution to
$L$ due to any predetermined particle is


$$ r^2\int dv_2\int_{S_-} dn\, f_2(x,v,x+nr,v_2)\,|(v_2-v)\cdot n| \qquad [9] $$


where $S_-$ is the unit hemisphere $(v_2-v)\cdot n < 0$.
Finally, we obtain the total contribution multiplying
by the number of remaining particles:


$$ L = (N-1)\,r^2\int dv_2\int_{S_-} dn\, f_2(x,v,x+nr,v_2)\,|(v_2-v)\cdot n| \qquad [10] $$

The gain term can be derived analogously by
considering that we are looking at particles which
have velocities $v$ and $v_2$ after the collision, so
that we have to integrate over the hemisphere
$S_+ = \{n \mid (v_2-v)\cdot n > 0\}$:


$$ G = (N-1)\,r^2\int dv_2\int_{S_+} dn\, f_2(x,v,x+nr,v_2)\,|(v_2-v)\cdot n| \qquad [11] $$


Summing $G$ and $-L$, we get

$$ \mathrm{Coll} = (N-1)\,r^2\int dv_2\int_{S^2} dn\, f_2(x,v,x+nr,v_2)\,(v_2-v)\cdot n \qquad [12] $$


which, however, is not a very useful expression
because the time derivative of $f$ is expressed in terms
of another object, namely $f_2$. An evolution equation
for $f_2$ will involve $f_3$, the joint distribution of three
particles, and so on, until we include the total
particle number $N$. Here the basic assumption
of Boltzmann enters, namely that two given particles
are uncorrelated if the gas is rarefied:


$$ f_2(x,v,x_2,v_2) = f(x,v)\,f(x_2,v_2) \qquad [13] $$


Condition [13], referred to as the propagation of
chaos, seems contradictory at first sight: if two
particles collide, correlations are created. Even though
we could assume eqn [13] at some time, if the test
particle collides with particle 2, such an equation
cannot be satisfied anymore after the collision.

Before discussing the propagation of chaos
hypothesis, we first analyze the size of the collision
operator. We remark that, in practical situations
for a rarefied gas, the combination $Nr^3 \approx 10^{-4}\,\mathrm{cm}^3$
(i.e., the volume occupied by the particles) is very
small, while $Nr^2 = O(1)$. This implies that $G = O(1)$.
Therefore, since we are dealing with a very large
number of particles, we are tempted to perform the
limit $N \to \infty$ and $r \to 0$ in such a way that
$r^2 = O(N^{-1})$. As a consequence, the probability that
two tagged particles collide (which is of the order of
the surface of a ball, i.e., $O(r^2)$) is negligible.
However, the probability that a given particle
performs a collision with any one of the remaining
$N - 1$ particles (which is $O(Nr^2) = O(1)$) is not
negligible. Therefore, condition [13] is referring to
two preselected particles (say particles 1 and 2), so
that it is not unreasonable to conceive that it holds
in the limiting situation in which we are working.
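
The orders of magnitude in this scaling argument can be tabulated with a few lines of code. The sketch below assumes that $Nr^2$ is held equal to a fixed constant (called `c` here, a name introduced only for this example) and reports how the occupied volume $Nr^3$ and the pair-collision scale $r^2$ behave as $N$ grows.

```python
def boltzmann_grad_scaling(n_particles, c=1.0):
    """Return r, N r^2, N r^3 and r^2 when N r^2 is held equal to c."""
    r = (c / n_particles) ** 0.5
    return {
        "r": r,
        "N_r2": n_particles * r ** 2,   # stays equal to c
        "N_r3": n_particles * r ** 3,   # occupied volume, tends to 0
        "pair": r ** 2,                 # two tagged particles, tends to 0
    }


if __name__ == "__main__":
    for n in (10 ** 6, 10 ** 12, 10 ** 18, 10 ** 24):
        row = boltzmann_grad_scaling(n)
        print(n, row["N_r2"], row["N_r3"], row["pair"])
```

With `c = 1` the occupied volume decays like $N^{-1/2}$ and the pair-collision scale like $N^{-1}$, while the collision rate of one particle with all the others stays of order one.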

However, we cannot insert [13] in [12] because
this latter equation refers to instants both before and after
the collision and, if we know that a collision took
place, we certainly cannot invoke eqn [13]. Hence, it
is more convenient to assume eqn [13] in the loss
term and to rework the gain term so as to take advantage
of the factorization property, which will be assumed
only before the collision.

Coming back to eqn [11] for the outgoing pair of
velocities $v, v_2$ (satisfying the condition $(v_2-v)\cdot n > 0$),
we make use of the continuity property


$$ f_2(x,v,x+nr,v_2) = f_2(x,v',x+nr,v_2') \qquad [14] $$


where the pair $v', v_2'$ is pre-collisional. On $f_2$
expressed before the collision, we can reasonably
apply condition [13] and obtain


$$ G - L = (N-1)\,r^2\int dv_2\int_{S_+} dn\,(v-v_2)\cdot n\,[f(x,v')f(x-nr,v_2') - f(x,v)f(x+nr,v_2)] \qquad [15] $$


after a change $n \to -n$ in the gain term, using the
notation $S_+$ for the hemisphere $\{n \mid (v-v_2)\cdot n > 0\}$.
This transforms the pair $v', v_2'$ from a pre-collisional
to a post-collisional pair.

Finally, in the limit $N \to \infty$, $r \to 0$, $Nr^2 = \lambda^{-1}$, we
find

$$ (\partial_t + v\cdot\nabla_x) f = \lambda^{-1}\int dv_2\int_{S_+} dn\,(v-v_2)\cdot n\,[f(x,v')f(x,v_2') - f(x,v)f(x,v_2)] \qquad [16] $$


The parameter $\lambda$, called mean free path, represents,
roughly speaking, the typical length a particle can
cover without undergoing any collision. In eqns [1]
and [2], we just chose $\lambda = 1$.

Equation [16] (or, equivalently, eqns [1] and [2]) is
the Boltzmann equation for hard spheres. Such an
equation has a statistical nature, and it is not
equivalent to the Hamiltonian dynamics from which
it has been derived. Indeed, the H-theorem shows that
such an equation is not reversible in time, in contrast
to what is expected of any law of mechanics.

This concludes the heuristic preliminary analysis of
the Boltzmann equation. The above arguments are
clearly delicate and require a deeper and more
rigorous analysis. If we want the Boltzmann
equation to be a consequence of a mechanical model,
rather than a phenomenological model derived
from ad hoc assumptions and justified only by its
practical relevance, we must derive it rigorously. In
particular, the propagation of chaos should be not a
hypothesis but the statement of a theorem.


Beyond the Hard Spheres 


The heuristic arguments we have developed so far
can be extended to potentials different from that of
the hard-sphere system. If the particles interact via
a two-body interaction $V = V(r)$, the resulting
Boltzmann equation is eqn [1], with


$$ Q(f,f) = \int dv_1\int_{S^2_+} dn\, B(v-v_1; n)\,[f'f_1' - ff_1] \qquad [17] $$


where we are using the usual shorthand notation:


$$ f' = f(x,v'), \quad f_1' = f(x,v_1'), \quad f = f(x,v), \quad f_1 = f(x,v_1) \qquad [18] $$


and $B = B(v-v_1; n)$ is a suitable function of the
relative velocity $v - v_1$ and the impact parameter $n$,
which is proportional to the cross section relative to
the potential $V$. Another equivalent, sometimes
more convenient, way to express eqn [17] is


$$ Q(f,f) = \int dv_1\int dv'\int dv_1'\, W(v,v_1|v',v_1')\,[f'f_1' - ff_1] \qquad [19] $$

with

$$ W(v,v_1|v',v_1') = w(v,v_1|v',v_1')\,\delta(v+v_1-v'-v_1')\,\delta\!\left(\tfrac{1}{2}\left(v^2+v_1^2-(v')^2-(v_1')^2\right)\right) \qquad [20] $$

where $w$ is a suitable kernel. All the qualitative
properties, such as the conservation laws and the
H-theorem, are still valid.


Consequences 


The Boltzmann equation provoked a debate involving 
Loschmidt, Zermelo, and Poincaré, who outlined 
inconsistencies between the irreversibility of the equa- 
tion and the reversible character of the Hamiltonian 
dynamics. Boltzmann argued for the statistical nature of
his equation, and his answer to the irreversibility
paradox was that “most” of the configurations behave
as expected by the thermodynamical laws. However,
he did not have the probabilistic tools for formulating
in a precise way the statements of which he had a
clear intuition.

Grad (1949) stated clearly the limit $N \to \infty$,
$r \to 0$, $Nr^2 \to$ const., where $N$ is the number of
particles and $r$ is the diameter of the molecules, in
which the Boltzmann equation is expected to hold.
This limit is usually called the Boltzmann-Grad limit
(B-G limit in the sequel).

The problem of a rigorous derivation of the 
Boltzmann equation was an open and challenging 
problem for a long time. Lanford (1975) showed that, 
although for a very short time, the Boltzmann equation 
can be derived starting from the mechanical model of the 
hard-sphere system. The proof has a deep content but is 
relatively simple from a technical viewpoint. 


Existence 


The mathematical study of the Boltzmann equation 
starts with the problem of proving the existence of 
the solutions. One would like to be able to show that, 
for all (or at least for a physically significant family 
of) initial distributions (which are positive and 
summable functions) with finite momentum, energy, 
and entropy, there exists a unique solution to eqn [1] 
with the same mass, momentum, and energy as of the 
initial distribution. Moreover, the entropy should
decrease and the solution should approach the right
Maxwellian as $t \to \infty$. The problem, in such a
generality, is still unsolved, but several results in this 
direction have been achieved since the pioneering 
works due to Carleman (1933) for the homogeneous 
equation. Actually, there are satisfactory results for 
some special situations, such as the homogeneous 
solutions (independent of x) close to the equilibrium, 
to the vacuum, or to homogeneous data. The most 
general result we have up to now is, unfortunately, 
not constructive. This is due to Di Perna and Lions 
(1989), who showed the existence of suitable weak 
solutions to eqn [1]. However, we still do not know 
whether such solutions, which preserve mass and 
momentum, and satisfy the H-theorem, are unique 
and also preserve the energy. 


Hydrodynamics 


The derivation of hydrodynamical equations from 
the Boltzmann equation is a problem as old as the 
equation itself and, in fact, it goes back to Maxwell 
and Hilbert. Preliminary to the discussion of the 
hydrodynamic limit, we establish a few properties of 
the collision kernel. 

It is a well-known fact that the only solution to
the equation


$$ Q(f,f) = 0 \qquad [21] $$

is a local Maxwellian, namely

$$ f(x,v) := M(x,v) = \frac{\rho(x)}{(2\pi T(x))^{3/2}}\,\exp\!\left(-\frac{|v-u(x)|^2}{2T(x)}\right) \qquad [22] $$


where the local parameters $\rho$, $u$, and $T$ satisfy the
relations


$$ \int M\,dv = \rho \qquad [23] $$


$$ \int vM\,dv = \rho u \qquad [24] $$

$$ \frac{1}{2}\int v^2 M\,dv = \frac{3}{2}\rho T + \frac{1}{2}\rho u^2 \qquad [25] $$


Moreover, the only solutions $h$ to the equation


$$ \int h(v)\,Q(f,f)\,dv = 0 \qquad [26] $$


required to hold for every $f$, are the linear combinations
of the quantities $(1, v, v^2)$,
called collision invariants. The last property
obviously corresponds to the mass, momentum,
and energy conservation.
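
The moment relations [23] to [25] for the local Maxwellian can be checked by brute-force quadrature. The sketch below is an illustration only: it evaluates the Maxwellian of eqn [22] at a single point $x$ (so $\rho$, $u$, $T$ are plain numbers), and the grid parameters `n` and `width` are arbitrary choices made for this example.

```python
import math


def local_maxwellian(v, rho, u, temp):
    """Maxwellian of eqn [22] at fixed x, with density rho, velocity u, temperature temp."""
    d2 = sum((vi - ui) ** 2 for vi, ui in zip(v, u))
    return rho / (2.0 * math.pi * temp) ** 1.5 * math.exp(-d2 / (2.0 * temp))


def maxwellian_moments(rho, u, temp, n=40, width=8.0):
    """Riemann sums for mass, momentum and energy on a cube centred at u."""
    sigma = math.sqrt(temp)
    h = 2.0 * width * sigma / n
    axes = [[ui - width * sigma + h * k for k in range(n + 1)] for ui in u]
    mass, energy = 0.0, 0.0
    momentum = [0.0, 0.0, 0.0]
    for vx in axes[0]:
        for vy in axes[1]:
            for vz in axes[2]:
                w = local_maxwellian((vx, vy, vz), rho, u, temp) * h ** 3
                mass += w
                momentum[0] += vx * w
                momentum[1] += vy * w
                momentum[2] += vz * w
                energy += 0.5 * (vx * vx + vy * vy + vz * vz) * w
    return mass, momentum, energy


if __name__ == "__main__":
    print(maxwellian_moments(2.0, (0.3, -0.1, 0.5), 1.5))
```

The three returned values should agree with $\rho$, $\rho u$, and $\tfrac{3}{2}\rho T + \tfrac{1}{2}\rho u^2$ up to quadrature error, which is tiny for a Gaussian integrand on a uniform grid.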

With this in mind, consider a change of
variables in the Boltzmann equation [1], passing
from microscopic to macroscopic variables,
$x \to \varepsilon x$, $t \to \varepsilon t$. Here $\varepsilon$ is a small scale parameter
expressing the ratio between the typical interparticle
distances and the typical distances over
which the macroscopic quantities vary.
Such a change yields


$$ (\partial_t + v\cdot\nabla_x) f_\varepsilon = \frac{1}{\varepsilon}\,Q(f_\varepsilon, f_\varepsilon) \qquad [27] $$


We need to allow the small parameter $\varepsilon$ (mean free
path or the Knudsen number) to tend to zero. In
order to eliminate the singularity on the right-hand
side of [27], we multiply both sides by the collision
invariants $v^\alpha$ with $\alpha = 0, 1, 2$, integrate over $v$, and
obtain the five equations:


$$ \int dv\, v^\alpha(\partial_t + v\cdot\nabla_x) f_\varepsilon(x,v;t) = 0 \qquad [28] $$


On the other hand, if $f_\varepsilon$ converges to $f$ as $\varepsilon \to 0$,
necessarily $Q(f,f) = 0$ and hence $f = M$. Therefore,
we expect that in the limit $\varepsilon \to 0$,


$$ \int dv\, v^\alpha(\partial_t + v\cdot\nabla_x) M = 0 \qquad [29] $$

Equation [29] fixes a relation among the fields $\rho$, $u$, $T$
as functions of $x$ and $t$. A standard computation gives
us the Euler equations for a compressible gas


$$ \partial_t\rho + \mathrm{div}(\rho u) = 0 \qquad [30] $$

$$ \partial_t u + (u\cdot\nabla)u + \frac{1}{\rho}\nabla p = 0 \qquad [31] $$

$$ \partial_t T + (u\cdot\nabla)T + \frac{2}{3}\,T\,\nabla\cdot u = 0 \qquad [32] $$


where the pressure $p$ is related to the density $\rho$ and
the temperature $T$ by the perfect gas law


$$ p = \rho T \qquad [33] $$
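
The "standard computation" can be sketched as follows. The Gaussian moments listed first are elementary; inserting them into eqn [29] gives the Euler system in conservation form, from which the equations for $u$ and $T$ follow for smooth solutions. Index notation with summation over repeated indices is used only in this sketch.

```latex
\begin{align*}
&\int M\,dv = \rho, \qquad
 \int v_i M\,dv = \rho u_i, \qquad
 \int v_i v_j M\,dv = \rho\,(u_i u_j + T\,\delta_{ij}), \\
&\int \tfrac12 |v|^2 v_i\, M\,dv
   = \rho\,u_i\bigl(\tfrac12 |u|^2 + \tfrac52 T\bigr), \\[4pt]
&\alpha = 0:\quad \partial_t \rho + \partial_i(\rho u_i) = 0, \\
&\alpha = 1:\quad \partial_t(\rho u_j)
   + \partial_i\bigl(\rho u_i u_j + \rho T\,\delta_{ij}\bigr) = 0, \\
&\alpha = 2:\quad \partial_t\bigl(\tfrac12 \rho |u|^2 + \tfrac32 \rho T\bigr)
   + \partial_i\bigl[\rho u_i\bigl(\tfrac12 |u|^2 + \tfrac52 T\bigr)\bigr] = 0 .
\end{align*}
```

Subtracting $u_j$ times the first equation from the second gives the velocity equation with $p = \rho T$, and eliminating the kinetic energy from the third gives the temperature equation with the factor $2/3$.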
In order to make the above arguments rigorous,
Hilbert (1916) developed a useful tool, called the
Hilbert expansion, to control the limiting procedure.
Namely, he expressed a formal solution to eqn [27]
in the form of a power series expansion:


$$ f_\varepsilon = \sum_{j \ge 0} \varepsilon^j f_j \qquad [34] $$


where $f_0$ is the local Maxwellian, with the parameters
$\rho, u, T$ satisfying the Euler equations. All the
other coefficients $f_j$ of the expansion can be
determined by recurrence, inverting suitable opera- 
tors. However, the series is not expected to be 
convergent, so that the way to show the validity of 
the hydrodynamical limit rigorously is to truncate 
the expansion and to control the remainder. The 
first result in this direction was obtained by Caflisch 
(1980). However, this approach is based on the 
regularity of the solutions to the Euler equations, 
which is known to hold only for short times since 
shocks can be formed. How to approximate the 
shocks in terms of a kinetic description is still a 
difficult and open problem. 
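
To make the recurrence explicit, one can insert the power series into the rescaled Boltzmann equation and collect powers of $\varepsilon$. The sketch below writes $Q(f,g)$ for the symmetrized bilinear form associated with the collision operator, a notation introduced only here.

```latex
\begin{align*}
&(\partial_t + v\cdot\nabla_x) f_\varepsilon
   = \frac{1}{\varepsilon}\,Q(f_\varepsilon, f_\varepsilon),
 \qquad f_\varepsilon = \sum_{j\ge 0} \varepsilon^j f_j, \\[4pt]
&\varepsilon^{-1}:\quad Q(f_0, f_0) = 0, \\
&\varepsilon^{\,j}:\quad 2\,Q(f_0, f_{j+1})
   = (\partial_t + v\cdot\nabla_x) f_j
     - \sum_{\substack{k,\,l \ge 1 \\ k + l = j+1}} Q(f_k, f_l),
   \qquad j \ge 0 .
\end{align*}
```

The first condition forces $f_0$ to be a local Maxwellian. Each later equation can be solved for $f_{j+1}$ only if its right-hand side is orthogonal to the collision invariants; for $j = 0$ this solvability condition is exactly the Euler system for $\rho$, $u$, $T$.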

Note that the hydrodynamical picture of the 
Boltzmann equation just means that we are looking 
at the solutions of this equation at a suitable 
macroscopic scale. The rarefaction hypothesis 
underlying the Boltzmann description is reflected in 
the law of perfect gas, which states that the 
particles, in the local thermal equilibrium, are free. 


Stationary Problems 


Stationary non-Maxwellian solutions to the 
Boltzmann equation should describe stationary 
nonequilibrium states exhibiting nontrivial flows. 
In spite of the physical relevance of these problems, 
not many complete mathematical results are, at the 
moment, available. Among them, there is the 
traveling-wave problem, which can be formulated 
in the following way. We look for a solution
$f = f(x - ct, v)$, $f: \mathbb{R}\times\mathbb{R}^3 \to \mathbb{R}^+$, constant in form
but traveling with a constant velocity $c > 0$, to


$$ (v_1 - c)\,f' = Q(f,f) \qquad [35] $$


where $v_1$ is the first component of $v$ and $f'$ denotes
the spatial derivative of $f$. Equation [35] must be
complemented by the boundary conditions
$f \to M_\pm$ as $x \to \pm\infty$, where $M_\pm$ are the right
and left Maxwellians, namely two prescribed equilibrium
situations at infinity. The parameters (density,
mean velocity, and temperature) of the Maxwellians,
however, cannot be chosen arbitrarily. Indeed,
the conservations of the mass, momentum, and
energy (which are properties of $Q$) imply the
conservations (in $x$) of the fluxes of these quantities.
Hence, we have to impose five equations that relate
the upstream and the downstream values of the
densities, mean velocities, and temperatures. Such
relations are known in gas dynamics as the
Rankine-Hugoniot conditions. A solution of this
problem has been found by Caflisch and Nikolaenko
(1983) in case of a weak shock (namely, when $M_+$
and $M_-$ are close) by using Hilbert expansion
techniques. More recently, Liu and Yu (2004)
established also stability and positivity of this
solution.
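
The Rankine-Hugoniot conditions can be illustrated in the simplest setting. The sketch below makes several assumptions that go beyond the text: the flow is one-dimensional with no transverse velocity (so the five conditions reduce to three), velocities are measured in the frame moving with the wave, and the gas is the monatomic perfect gas $p = \rho T$, for which the ratio of specific heats is 5/3. The function names are hypothetical.

```python
GAMMA = 5.0 / 3.0  # assumption: monatomic perfect gas


def fluxes(rho, u, temp):
    """Mass, momentum and energy fluxes in the wave frame for p = rho * temp."""
    return (rho * u,
            rho * u * u + rho * temp,
            rho * u * (0.5 * u * u + 2.5 * temp))


def downstream_state(rho1, u1, temp1):
    """Downstream (rho, u, T) of a normal shock with supersonic upstream state."""
    mach2 = u1 * u1 / (GAMMA * temp1)
    if mach2 <= 1.0:
        raise ValueError("upstream flow must be supersonic in the wave frame")
    ratio = 4.0 * mach2 / (mach2 + 3.0)          # rho2 / rho1 for gamma = 5/3
    rho2 = rho1 * ratio
    u2 = u1 / ratio
    p2 = rho1 * temp1 * (5.0 * mach2 - 1.0) / 4.0
    return rho2, u2, p2 / rho2


if __name__ == "__main__":
    up = (1.0, 2.0, 1.0)
    down = downstream_state(*up)
    print(fluxes(*up))
    print(fluxes(*down))
```

The two printed flux triples coincide, which is the content of the jump conditions in this reduced setting; a weak shock corresponds to an upstream Mach number only slightly above one.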


Quantum Kinetic Theory 


Uehling and Uhlenbeck (1933) introduced the
following kinetic equation for describing a large
system of weakly interacting bosons or fermions:


$$ (\partial_t + v\cdot\nabla_x) f = \int dv_1\int dv'\int dv_1'\, W(v,v_1|v',v_1')\,\{(1\pm f)(1\pm f_1)\,f'f_1' - (1\pm f')(1\pm f_1')\,ff_1\} \qquad [36] $$


Here the $+/-$ signs stand for bosons/fermions,
respectively, and


$$ W(v,v_1|v',v_1') = [\hat V(v'-v) \pm \hat V(v'-v_1)]^2\,\delta(v+v_1-v'-v_1')\,\delta\!\left(\tfrac{1}{2}\left(v^2+v_1^2-(v')^2-(v_1')^2\right)\right) \qquad [37] $$


Moreover,


$$ \hat V(p) = \int dx\, e^{ip\cdot x}\,V(x) \qquad [38] $$


where $V$ is the interaction potential. Note that eqn
[37] is the expression of the cross section of a
quantum scattering in the Born approximation.

The unknown $f = f(x,v;t)$ in eqn [36] is the expected
number of molecules falling in the unit (quantum) cell
of the phase space. This function is proportional to the
one-particle Wigner function, introduced by Wigner
(1932) to handle kinetic problems in quantum
mechanics, and defined as (setting $\hbar = 1$):


$$ W(x,v) = \frac{1}{(2\pi)^3}\int dy\, e^{iy\cdot v}\,\rho\!\left(x + \tfrac{1}{2}y;\; x - \tfrac{1}{2}y\right) $$

where p(x;z) is the kernel of a one-particle density 
matrix. Basically, the Wigner function is an equiva- 
lent way to describe a state of a quantum system. 
For instance, eqn [40] below expresses the equili- 
brium distributions for bosons and fermions in 
terms of Wigner functions. In general, the Wigner 
functions, due to the uncertainty principle, are real 
but not necessarily positive; however, the integral 
with respect to x and v gives the probability 


distributions of the velocity and the position, 
respectively. In the kinetic regime, in which we are 
interested, the scales are mesoscopic, namely the 
typical quantum oscillations are on a scale much 
smaller than the characteristic scales of the problem, 
so that we expect that f should be a genuine 
probability distribution, since the Heisenberg 
principle does not play an essential role. However, 
the interaction occurs on a microscopic scale, so that 
we expect that the statistics play a role in addition 
to the quantum rules for the scattering. 
In this framework, the entropy functional is 


$$H(f) = \int dx \int dv\, [ f(x,v) \log f(x,v) \mp (1 \pm f(x,v)) \log(1 \pm f(x,v)) ]$$ [39]


It is decreasing along the solutions to eqn [36] and it is
also minimized (among the distributions with given 
mass, momentum, and energy) by the equilibria 


$$M(v) = \frac{z}{e^{(\beta/2)|v - u|^2} \mp z}$$ [40]


namely the Bose-Einstein and the Fermi-Dirac
distributions, respectively. Here $\beta > 0$ and $z > 0$
are the inverse temperature and the activity, respectively.
Note that, for the Bose-Einstein distribution,
z < 1. This creates, in a sense, an inconsistency with
eqn [36]. Indeed, assuming u = 0 and an initial
distribution $f = f_0(v)$ with the density larger than the
maximal density allowed by eqn [40], namely


$$\rho_c = \int dv\, \frac{1}{e^{(\beta/2) v^2} - 1}$$ [41]


it cannot converge to any equilibrium. In order to 
overcome this difficulty related to the Bose con- 
densation, one can enlarge the definition of the 
equilibria family by setting 


$$M(v) = \frac{1}{e^{(\beta/2) v^2} - 1} + \rho\, \delta(v)$$ [42]

to take care of the excess mass by means of a condensate
component. However, it is not clear whether eqn 
[36] can actually describe the Bose condensation 
since its derivation from the Schrödinger equation
requires, just from the very beginning, the existence of 
bosonic quasifree states which can be constructed only 
if the density is moderate. Further analyses are certainly 
needed to clarify the situation. A rigorous derivation of 
the Uehling and Uhlenbeck equation is, up to now, far
from being obtained even for short times; nevertheless, 
such an equation is extensively used in the applications. 
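
The maximal density in eqn [41] can be evaluated in closed form. The sketch below assumes a three-dimensional velocity space, so that expanding the Bose factor as a geometric series gives $\rho_c = (2\pi/\beta)^{3/2} \zeta(3/2)$; the function names and the quadrature parameters are illustrative choices, not part of the original text. It compares the closed form with a direct radial quadrature of the integral.

```python
import math

def zeta(s, terms=1000):
    """Riemann zeta for s > 1: partial sum plus an Euler-Maclaurin tail correction."""
    head = sum(k ** -s for k in range(1, terms))
    tail = (terms ** (1.0 - s) / (s - 1.0)
            + 0.5 * terms ** -s
            + s * terms ** (-s - 1.0) / 12.0)
    return head + tail

def critical_density(beta):
    """Closed form of eqn [41] in three dimensions (assumption): (2*pi/beta)^(3/2) * zeta(3/2)."""
    return (2.0 * math.pi / beta) ** 1.5 * zeta(1.5)

def critical_density_quadrature(beta, steps=200_000):
    """Midpoint rule for the radial integral of 4*pi*v^2 / (exp(beta*v^2/2) - 1)."""
    v_max = 12.0 / math.sqrt(beta)   # beyond this the integrand is below exp(-72)
    dv = v_max / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * dv
        total += 4.0 * math.pi * v * v / math.expm1(0.5 * beta * v * v)
    return total * dv

if __name__ == "__main__":
    beta = 1.0
    print(critical_density(beta), critical_density_quadrature(beta))
```

Any initial datum with density above this value cannot relax to a member of the family [40], which is the inconsistency the text describes.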
Equation [36] concerns a weakly interacting gas of 
quantum particles. From a mathematical viewpoint, it 
is expected to be valid in the so-called weak-coupling
limit, which consists in scaling space and time and the
interaction potential $\phi$ as


$$x \to \varepsilon x, \quad t \to \varepsilon t, \quad \phi \to \sqrt{\varepsilon}\, \phi$$ [43]


where $\varepsilon^{-1} = N^{1/3}$ is a parameter diverging when the
number of particles N tends to infinity.

We mention, incidentally, that under such a 
scaling, a classical system is described by a transport 
equation, called Fokker-Planck-Landau equation, 
with a diffusion operator in the velocity space. 

The B-G limit considered for classical particle 
systems is different from that considered here 
for weakly interacting quantum systems. It is actually 
equivalent to rescaling space and time according to 


$$x \to \varepsilon x, \quad t \to \varepsilon t$$ [44]


leaving the interaction unscaled but, in order to
control the total interaction, we let the density
diverge gently as $\varepsilon^{-2} = N$.

A quantum system under such a scaling is expected to 
be described by a Boltzmann equation [1] with the 
collision operator Q computed with the full quantum
cross section. Now we do not have any effect of the 
statistics because in this rarefaction limit these correc- 
tions disappear. On the other hand, the cross section is 
that arising from the analysis of the quantum scattering. 
Since we do not rescale the interaction, all the other 
terms in the Born expansion of the cross section play a 
role. This kind of Boltzmann equation is a good 
description of a rarefied gas in which quantum effects 
are not negligible. 


See also: Adiabatic Piston; Evolution Equations: Linear 
and Nonlinear; Gravitational N-Body Problem (Classical); 
Interacting Particle Systems and Hydrodynamic
Nonequilibrium Statistical Mechanics: Dynamical


Introduction


In 1924 the Indian physicist S N Bose introduced a new
statistical method to derive the blackbody radiation law
in terms of a gas of light quanta (photons). His work,
together with de Broglie's contemporary idea of
matter-wave duality, led A Einstein to apply the same
statistical approach to a gas of N indistinguishable
particles of mass m. An amazing result of his theory was
the prediction that below some critical temperature a
finite fraction of all the particles condense into the
lowest-energy single-particle state. This phenomenon,
named Bose-Einstein condensation (BEC), is a consequence
of purely statistical effects. For several years,
this prediction received little attention, until 1938,
when F London argued that BEC could be at the basis of
the superfluid properties observed in liquid $^4$He below
2.17 K. A strong boost to the investigation of
Bose-Einstein condensates was given in 1995 by the
observation of BEC in dilute gases confined in magnetic traps
and cooled down to temperatures of the order of a few
nK. Unlike superfluid helium, these gases
allow one to tune the relevant parameters (confining
potential, particle density, interactions, etc.), making
them an ideal testing ground for concepts and theories on
BEC.


What Is BEC? 


In nature, particles have either integer or half- 
integer spin. Those having half-integer spin, like 
electrons, are called fermions and obey the Fermi- 
Dirac statistics; those having integer spin are 
called bosons and obey the Bose-Einstein statistics.
Let us consider a system of N bosons. In
order to introduce the concept of BEC on
general grounds, one can start with the definition
of the one-body density matrix


$$n^{(1)}(r, r') = \langle \hat\Psi^\dagger(r)\, \hat\Psi(r') \rangle$$ [1]


The quantities $\hat\Psi^\dagger(r)$ and $\hat\Psi(r)$ are the field operators
which create and annihilate a particle at point r,
respectively; they satisfy the bosonic commutation
relations


$$[\hat\Psi(r), \hat\Psi^\dagger(r')] = \delta(r - r'), \qquad [\hat\Psi(r), \hat\Psi(r')] = 0$$ [2]

If the system is in a pure state described by the
N-body wave function $\Psi(r_1, \ldots, r_N)$, then the
average [1] is taken following the standard rules of
quantum mechanics and the one-body density
matrix can be written as


$$n^{(1)}(r, r') = N \int dr_2 \cdots dr_N\, \Psi^*(r, r_2, \ldots, r_N)\, \Psi(r', r_2, \ldots, r_N)$$ [3]

involving the integration over the N-1 variables
$r_2, \ldots, r_N$. In the more general case of a statistical
mixture of pure states, expression [3] must be 
averaged according to the probability for a system 
to occupy the different states. 

Since $n^{(1)}(r, r') = (n^{(1)}(r', r))^*$, the quantity $n^{(1)}$,
when regarded as a matrix function of its indices
r and r', is Hermitian. It is therefore always possible
to find a complete orthonormal basis of single-particle
eigenfunctions, $\varphi_i(r)$, in terms of which the
density matrix takes the diagonal form


$$n^{(1)}(r, r') = \sum_i n_i\, \varphi_i^*(r)\, \varphi_i(r')$$ [4]


The real eigenvalues $n_i$ are subject to the normalization
condition $\sum_i n_i = N$ and have the meaning of
occupation numbers of the single-particle states $\varphi_i$.
BEC occurs when one of these numbers (say, $n_0$)
becomes macroscopic, that is, when $n_0 \equiv N_0$ is a
number of order N, all the others remaining of order 1.


In this case eqn [4] can be conveniently rewritten in 
the form 


$$n^{(1)}(r, r') = N_0\, \varphi_0^*(r)\, \varphi_0(r') + \sum_{i \ne 0} n_i\, \varphi_i^*(r)\, \varphi_i(r')$$ [5]

and the state represented by $\varphi_0(r)$ is called the
Bose-Einstein condensate. This definition is rather
general, since it applies to any macroscopic ($N \gg 1$)
system of indistinguishable bosons independently of
mutual interactions and external fields.

The one-body density matrix [1] contains informa- 
tion on important physical observables. By setting 
r = r' one finds the diagonal density of the system 


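$$n(r) = n^{(1)}(r, r) = \langle \hat\Psi^\dagger(r)\, \hat\Psi(r) \rangle$$ [6]


with $N = \int dr\, n(r)$. The off-diagonal components
can instead be used to calculate the momentum
distribution


$$n(p) = \langle \hat\Psi^\dagger(p)\, \hat\Psi(p) \rangle$$ [7]

where $\hat\Psi(p) = (2\pi\hbar)^{-3/2} \int dr\, \hat\Psi(r) \exp[-i p\cdot r/\hbar]$ is the
field operator in momentum representation. By
inserting this expression for $\hat\Psi(p)$ into eqn [7] one
finds


$$n(p) = \frac{1}{(2\pi\hbar)^3} \int dR\, ds\; n^{(1)}\!\left(R + \frac{s}{2}, R - \frac{s}{2}\right) e^{i p\cdot s/\hbar}$$ [8]


where $s = r - r'$ and $R = (r + r')/2$.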

Let us consider a uniform system of N particles in
a volume V and take the thermodynamic limit
$N, V \to \infty$ with density N/V kept fixed. The eigenfunctions
of the density matrix are plane waves and
the lowest-energy state has zero momentum, p = 0,
and constant wave function $\varphi_0(r) = V^{-1/2}$. BEC in
this state implies a macroscopic number of particles
having zero momentum and constant density $N_0/V$.
The density matrix only depends on $s = r - r'$ and
can be written as


$$n^{(1)}(s) = \frac{N_0}{V} + \frac{1}{V} \sum_{p \ne 0} n_p\, e^{-i p\cdot s/\hbar}$$ [9]


In the $s \to \infty$ limit, the sum on the right vanishes due
to destructive interference between different plane
waves, but the first term survives. One thus finds that,
in the presence of BEC, the one-body density matrix
tends to a constant finite value at large distances. This
behavior is named off-diagonal long-range order,
since it involves the off-diagonal components of the
density matrix. Its counterpart in momentum space is
the appearance of a singular term at p = 0:


$$n(p) = N_0\, \delta(p) + \sum_{p' \ne 0} n_{p'}\, \delta(p - p')$$ [10]

The sum on the right accounts for the noncondensed
particles, $N - N_0$ in number, and the quantity $N_0/N$ is called
the condensate fraction.

If the system is not uniform, the eigenfunctions of 
the density matrix are no longer plane waves but, 
provided N is sufficiently large, the concept of BEC 
is still well defined, being associated with the 
occurrence of a macroscopic occupation of a 
single-particle eigenfunction $\varphi_0(r)$ of the density
matrix. Thus, the condensed bosons can be
described by means of the function $\Psi(r) = \sqrt{N_0}\, \varphi_0(r)$,
which is a classical complex field playing
the role of an order parameter. This is the analog of
the classical limit of quantum electrodynamics,
where the electromagnetic field replaces the microscopic
description of photons. The function $\Psi$ may
also depend on time and can be written as


$$\Psi(r, t) = |\Psi(r, t)|\, e^{i S(r, t)}$$ [11]


Its modulus determines the contribution of the 
condensate to the diagonal density [6], while the 
phase S is crucial in characterizing the coherence 
and superfluid properties of the system. The order 
parameter [11], also named macroscopic wave 
function or condensate wave function, is defined 
only up to a constant phase factor. One can always 
multiply this function by the numerical factor $e^{i\alpha}$
without changing any physical property. This 
reflects the gauge symmetry exhibited by all the 
physical equations of the problem. Making an 
explicit choice for the value of the order parameter, 
and hence for the phase, corresponds to a formal 
breaking of gauge symmetry. 


BEC in Ideal Gases 


Having defined what a Bose-Einstein condensate is,
the next question is when such a
condensation occurs in a given system. The ideal
Bose gas provides the simplest example. So, let us
consider a gas of noninteracting bosons described
by the Hamiltonian $\hat H = \sum_i \hat H_i^{(1)}$, where the Schrödinger
equation $\hat H^{(1)} \varphi_i(r) = \epsilon_i \varphi_i(r)$ gives the spectrum
of single-particle wave functions and
energies. One can define an occupation number
$n_i$ as the number of particles in the state with
energy $\epsilon_i$. Thus, any given state of the many-body
system is specified by a set $\{n_i\}$. The mean
occupation numbers, $\bar n_i$, can be calculated by
using the standard rules of statistical mechanics.
For instance, by considering a grand canonical
ensemble at temperature T, one finds


$$\bar n_i = \{\exp[\beta(\epsilon_i - \mu)] - 1\}^{-1}$$ [12]

with $\beta = 1/(k_B T)$. The chemical potential $\mu$ is fixed
by the normalization condition $\sum_i \bar n_i = N$, where N
is the average number of particles in the gas. For
$T \to \infty$ the chemical potential is negative and large.
It increases monotonically when T is lowered. Let us
call $\epsilon_0$ the lowest single-particle level in the
spectrum. If at some critical temperature $T_c$ the
normalization condition can be satisfied with
$\mu \to \epsilon_0$, then the occupation of the lowest state,
$\bar n_0 \equiv N_0$, becomes of order N and BEC is realized.
Below $T_c$ the normalization condition must be
replaced with $N = N_0 + N_T$, where $N_T = \sum_{i \ne 0} \bar n_i$ is
the number of particles out of the condensate, that
is, the thermal component of the gas. Whether BEC
occurs or not, and what the value of $T_c$ is, depends
on the dimensionality of the system and the type of
single-particle spectrum.
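
The way the normalization condition fixes the chemical potential can be made concrete with a small numerical sketch. The code below is only an illustration under stated assumptions: a finite list of levels with degeneracies (here those of an isotropic harmonic oscillator in units where the level spacing is 1), bisection as the root finder, and hypothetical function names. It solves $\sum_i \bar n_i = N$ for $\mu$ with the occupation of eqn [12], and shows that $\mu$ approaches the lowest level as the temperature is lowered.

```python
import math

def mean_occupation(energy, mu, beta):
    """Bose occupation {exp[beta (energy - mu)] - 1}^(-1) of eqn [12]; requires mu < energy."""
    x = beta * (energy - mu)
    if x > 700.0:          # exp would overflow; the occupation is negligible anyway
        return 0.0
    return 1.0 / math.expm1(x)

def total_number(levels, mu, beta):
    """Sum of degeneracy * occupation over a list of (energy, degeneracy) pairs."""
    return sum(g * mean_occupation(e, mu, beta) for e, g in levels)

def solve_mu(levels, n_particles, beta, iterations=200):
    """Bisection for mu below the lowest level such that total_number equals n_particles."""
    e0 = min(e for e, _ in levels)
    gap = 1.0 / beta
    while total_number(levels, e0 - gap, beta) > n_particles:
        gap *= 2.0                      # push the lower bracket down until it undershoots N
    lo, hi = e0 - gap, e0               # the total grows with mu and diverges at mu = e0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if not lo < mid < hi:           # bracket has collapsed to machine precision
            break
        if total_number(levels, mid, beta) > n_particles:
            hi = mid
        else:
            lo = mid
    return lo

def oscillator_levels(n_max):
    """Isotropic 3D oscillator: energy n + 3/2 with degeneracy (n + 1)(n + 2)/2."""
    return [(n + 1.5, (n + 1) * (n + 2) // 2) for n in range(n_max)]

if __name__ == "__main__":
    levels = oscillator_levels(400)
    for beta in (0.1, 0.2, 0.5):
        mu = solve_mu(levels, 1000.0, beta)
        print(beta, mu, mean_occupation(1.5, mu, beta))
```

At the lowest temperature in this example almost all particles sit in the lowest level, which is the finite-size counterpart of the macroscopic occupation described above.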

The simplest case is that of a gas confined in a
cubic box of volume $V = L^3$ with periodic boundary
conditions, where $\hat H^{(1)} = -(\hbar^2/2m)\nabla^2$. The eigenfunctions
are plane waves $\varphi_p(r) = V^{-1/2} \exp[i p\cdot r/\hbar]$,
with energy $\epsilon_p = p^2/2m$ and momentum
$p = 2\pi\hbar\, \mathbf{n}/L$. Here $\mathbf{n}$ is a vector whose components
$n_x, n_y, n_z$ are 0 or ± integers. The lowest eigenvalue
has zero energy ($\epsilon_0 = 0$) and zero momentum. The
mean occupation numbers are given by
$\bar n_p = \{\exp[\beta(p^2/2m - \mu)] - 1\}^{-1}$. In the thermodynamic
limit ($N, V \to \infty$ with N/V kept constant),
one can replace the sum $\sum_p$ with the integral
$\int d\epsilon\, \rho(\epsilon)$, where $\rho(\epsilon) = (2\pi)^{-2} V (2m/\hbar^2)^{3/2} \sqrt{\epsilon}$ is the
density of states. In this way, one can calculate the
thermal component of the gas as a function of T,
finding the critical temperature


$$k_B T_c = \frac{2\pi\hbar^2}{m} \left( \frac{N}{V\, \zeta(3/2)} \right)^{2/3}$$ [13]

where $\zeta$ is the Riemann zeta function and $\zeta(3/2) \simeq 2.612$.
For $T > T_c$, one has $\mu < 0$ and $N_T = N$. For
$T < T_c$ one instead has $\mu = 0$, $N_T = N - N_0$ and


$$N_0(T) = N [1 - (T/T_c)^{3/2}]$$ [14]


The critical temperature turns out to be fully 
determined by the density N/V and by the mass of 
the constituents. These results were first obtained 
by A Einstein in his seminal paper and used by 
F London in the context of superfluid helium. We 
notice that the replacement of the sum with an 
integral in the above derivation is justified only if 
the thermal energy $k_B T$ is much larger than the
energy spacing between single-particle levels, that is,
if $k_B T \gg \hbar^2/(2m V^{2/3})$. It is also worth noticing that
the above expression for $T_c$ can be written as
$\lambda_T^3 N/V \simeq 2.612$, where $\lambda_T = [2\pi\hbar^2/(m k_B T)]^{1/2}$ is
the thermal de Broglie wavelength. This is


equivalent to saying that BEC occurs when the 
mean distance between bosons is of the order of 
their de Broglie wavelength. 
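
Equations [13] and [14] are easy to evaluate numerically. The following sketch uses SI constants and the rounded value of $\zeta(3/2)$ quoted in the text; the mass and density in the example are illustrative assumptions (roughly a rubidium-like atom at $10^{20}$ m$^{-3}$), not data from the article. It also checks the statement that $\lambda_T^3 N/V \simeq 2.612$ at the critical temperature.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J/K
ZETA_3_2 = 2.612        # zeta(3/2), value quoted in the text

def critical_temperature(density, mass):
    """Eqn [13]: critical temperature (K) of a uniform ideal Bose gas of number density N/V."""
    return (2.0 * math.pi * HBAR ** 2 / mass) * (density / ZETA_3_2) ** (2.0 / 3.0) / K_B

def condensate_fraction(temperature, t_c):
    """Eqn [14] divided by N; zero at and above the critical temperature."""
    if temperature >= t_c:
        return 0.0
    return 1.0 - (temperature / t_c) ** 1.5

def thermal_wavelength(temperature, mass):
    """Thermal de Broglie wavelength [2 pi hbar^2 / (m k_B T)]^(1/2)."""
    return math.sqrt(2.0 * math.pi * HBAR ** 2 / (mass * K_B * temperature))

if __name__ == "__main__":
    mass = 1.443e-25      # kg, illustrative assumption
    density = 1.0e20      # m^-3, illustrative assumption
    t_c = critical_temperature(density, mass)
    print("T_c [K]:", t_c)
    print("lambda_T^3 * n at T_c:", thermal_wavelength(t_c, mass) ** 3 * density)
    print("N_0/N at T_c/2:", condensate_fraction(0.5 * t_c, t_c))
```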

Another interesting case, which is relevant for the 
recent experiments with BEC in dilute gases con- 
fined in magnetic and/or optical traps, is that of an 
ideal gas subject to harmonic potentials. Let us
consider, for simplicity, an isotropic external potential
$V_{ext}(r) = (1/2) m \omega_{ho}^2 r^2$. The single-particle Hamiltonian
is $\hat H^{(1)} = -(\hbar^2/2m)\nabla^2 + V_{ext}(r)$ and its
eigenvalues are $\epsilon_{n_x n_y n_z} = (n_x + n_y + n_z + 3/2)\hbar\omega_{ho}$.
The corresponding density of states is
$\rho(\epsilon) = (1/2)(\hbar\omega_{ho})^{-3} \epsilon^2$. A natural thermodynamic limit for
this system is obtained by letting $N \to \infty$ and
$\omega_{ho} \to 0$, while keeping the product $N\omega_{ho}^3$ constant.
The condition for BEC to occur is that $\mu$ approaches
the value $\epsilon_{000} = (3/2)\hbar\omega_{ho}$ from below by cooling the
gas down to $T_c$. Following the same procedure as
for the uniform gas, one finds


$$k_B T_c = \hbar\omega_{ho} [N/\zeta(3)]^{1/3} = 0.94\, \hbar\omega_{ho} N^{1/3}$$ [15]

and

$$N_0(T) = N [1 - (T/T_c)^3]$$ [16]


Notice that the condensate is not uniform in this case,
since it corresponds to the lowest eigenfunction of the
harmonic oscillator, which is a Gaussian of width
$a_{ho} = [\hbar/(m\omega_{ho})]^{1/2}$. Correspondingly, the condensate
in momentum space is also a Gaussian, of width
proportional to $a_{ho}^{-1}$. This implies that, unlike the gas in a box,
here the condensate can be seen both in coordinate and
momentum space in the form of a narrow distribution
emerging from a wider thermal component. Finally,
results [15] and [16] remain valid even for anisotropic
harmonic potentials, with trapping frequencies $\omega_x$, $\omega_y$,
and $\omega_z$, provided the frequency $\omega_{ho}$ is replaced by the
geometric average $(\omega_x \omega_y \omega_z)^{1/3}$.
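
As a numerical companion to eqns [15] and [16], the sketch below evaluates the trap critical temperature and condensate fraction, including the geometric-average rule for anisotropic traps. The constants are SI values, $\zeta(3) \simeq 1.2020569$, and the atom number and trap frequencies in the example are illustrative assumptions only.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J/K
ZETA_3 = 1.2020569      # Riemann zeta(3)

def geometric_mean_frequency(omega_x, omega_y, omega_z):
    """Geometric average (omega_x omega_y omega_z)^(1/3) for anisotropic traps."""
    return (omega_x * omega_y * omega_z) ** (1.0 / 3.0)

def trap_critical_temperature(n_atoms, omega_ho):
    """Eqn [15]: k_B T_c = hbar omega_ho [N / zeta(3)]^(1/3), returned in kelvin."""
    return HBAR * omega_ho * (n_atoms / ZETA_3) ** (1.0 / 3.0) / K_B

def trap_condensate_fraction(temperature, t_c):
    """Eqn [16] divided by N; zero at and above the critical temperature."""
    if temperature >= t_c:
        return 0.0
    return 1.0 - (temperature / t_c) ** 3

if __name__ == "__main__":
    # Illustrative trap: angular frequencies in rad/s, one million atoms.
    omega = geometric_mean_frequency(2 * math.pi * 50.0, 2 * math.pi * 50.0, 2 * math.pi * 200.0)
    t_c = trap_critical_temperature(1.0e6, omega)
    print("T_c [K]:", t_c)
    print("N_0/N at 0.8 T_c:", trap_condensate_fraction(0.8 * t_c, t_c))
```

The numerical prefactor $[1/\zeta(3)]^{1/3}$ evaluates to about 0.94, as quoted in eqn [15].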


BEC in Interacting Gases 


Actual condensates are made of interacting particles. 
The full many-body Hamiltonian is 


$$\hat H = \int dr\, \hat\Psi^\dagger(r) \hat H_0 \hat\Psi(r) + \frac{1}{2} \int dr'\, dr\; \hat\Psi^\dagger(r) \hat\Psi^\dagger(r') V(r - r') \hat\Psi(r') \hat\Psi(r)$$ [17]


where $V(r - r')$ is the particle-particle interaction and
$\hat H_0 = -(\hbar^2/2m)\nabla^2 + V_{ext}(r)$. Unlike the
case of ideal gases, $\hat H$ is no longer a sum of single-particle
Hamiltonians. However, the general definitions
given in the section "What Is BEC?" are still
valid. In particular, the one-body density matrix, in the
presence of BEC, can be separated as in eqn [5]. One
can write $n^{(1)}(r, r') = \Psi^*(r) \Psi(r') + \tilde n^{(1)}(r, r')$, where $\Psi$
is the order parameter of the condensate ($\Psi^*(r) \Psi(r')$
being of order N), while $\tilde n^{(1)}(r, r')$ vanishes for large
$|r - r'|$. This is equivalent to saying that the bosonic field
operator splits in two parts,


$$\hat\Psi(r) = \Psi(r) + \delta\hat\Psi(r)$$ [18]
where the first term is a complex function and the 
second one is the field operator associated with 
the noncondensed particles. This decomposition is 
particularly useful when the depletion of the 
condensate, that is, the fraction of noncondensed 
particles, is small. This happens when the interac- 
tion is weak, but also for particles with arbitrary 
interaction, provided the gas is dilute. In this case, 
one can expand the many-body Hamiltonian by 
treating the operator $\delta\hat\Psi$ as a small quantity.

A suitable strategy consists in writing the Heisenberg
equation for the evolution of the field operators,
$i\hbar\, \partial_t \hat\Psi = [\hat\Psi, \hat H]$, using the many-body
Hamiltonian [17]:


$$i\hbar\, \partial_t \hat\Psi(r, t) = \left( \hat H_0 + \int dr'\, \hat\Psi^\dagger(r', t) V(r' - r) \hat\Psi(r', t) \right) \hat\Psi(r, t)$$ [19]


The zeroth order is thus obtained by replacing the
operator $\hat\Psi$ with the classical field $\Psi$. In the integral
containing the interaction $V(r - r')$, this replacement is,
in general, a poor approximation when short distances
$(r - r')$ are involved. In a dilute and cold gas, one can
nevertheless obtain a proper expression for the interaction
term by observing that, in this case, only binary
collisions at low energy are relevant and these collisions
are characterized by a single parameter, the s-wave
scattering length, a, independently of the details of the
two-body potential. This allows one to replace $V(r - r')$
in $\hat H$ with an effective interaction $V_{eff}(r - r') = g\, \delta(r - r')$,
where the coupling constant g is given by $g = 4\pi\hbar^2 a/m$.
The scattering length can be measured with several
experimental techniques or calculated from the exact
two-body potential. Using this pseudopotential and
replacing the operator $\hat\Psi$ with the complex function $\Psi$ in
the Heisenberg equation of motion, one gets


$$i\hbar\, \partial_t \Psi(r, t) = \left( -\frac{\hbar^2 \nabla^2}{2m} + V_{ext}(r) + g |\Psi(r, t)|^2 \right) \Psi(r, t)$$ [20]

This is known as the Gross-Pitaevskii (GP) equation and
it was first introduced in 1961. It has the form of a
nonlinear Schrödinger equation, the nonlinearity
coming from the mean-field term, proportional to
$|\Psi|^2$. It has been derived assuming that N is large
while the fraction of noncondensed atoms is negligible.
On the one hand, this means that quantum
fluctuations of the field operator have to be small,
which is true when $n|a|^3 \ll 1$, where n is the particle
density. In fact, one can show that at T = 0 the
quantum depletion of the condensate is proportional
to $(n|a|^3)^{1/2}$. On the other hand, thermal fluctuations
also have to be negligible and this means that the
theory is limited to temperatures much lower than
$T_c$. Within these limits, one can identify the total
density with the condensate density. 
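
To get a feeling for the diluteness condition, the sketch below evaluates the coupling constant $g = 4\pi\hbar^2 a/m$, the gas parameter $n|a|^3$, and an estimate of the quantum depletion. The text only states that the depletion is proportional to $(n|a|^3)^{1/2}$; the prefactor $8/(3\sqrt{\pi})$ used here is the standard Bogoliubov result for a uniform gas with positive scattering length, brought in as an outside assumption. The numerical values of mass, scattering length, and density are illustrative, not taken from the article.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s

def coupling_constant(scattering_length, mass):
    """g = 4 pi hbar^2 a / m for the contact pseudopotential g * delta(r - r')."""
    return 4.0 * math.pi * HBAR ** 2 * scattering_length / mass

def gas_parameter(density, scattering_length):
    """Diluteness parameter n |a|^3, which must be much smaller than 1."""
    return density * abs(scattering_length) ** 3

def quantum_depletion(density, scattering_length):
    """Uniform-gas estimate (N - N_0)/N = 8/(3 sqrt(pi)) * (n a^3)^(1/2) (assumed prefactor)."""
    return 8.0 / (3.0 * math.sqrt(math.pi)) * math.sqrt(gas_parameter(density, scattering_length))

if __name__ == "__main__":
    mass = 1.443e-25   # kg, illustrative assumption
    a = 5.3e-9         # m, illustrative assumption
    n = 1.0e20         # m^-3, illustrative assumption
    print("g [J m^3]:", coupling_constant(a, mass))
    print("n|a|^3:", gas_parameter(n, a))
    print("depletion estimate:", quantum_depletion(n, a))
```

For parameters of this order the depletion stays below one percent, which is the regime where identifying the total density with the condensate density is reasonable.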

The stationary solution of eqn [20] corresponds to 
the condensate wave function in the ground state. One 
can write $\Psi(r, t) = \Psi_0(r) \exp(-i\mu t/\hbar)$, where $\mu$ is the
chemical potential. Then the GP equation [20] becomes


$$\left( -\frac{\hbar^2 \nabla^2}{2m} + V_{ext}(r) + g \Psi_0^2(r) \right) \Psi_0(r) = \mu \Psi_0(r)$$ [21]

where $n(r) = \Psi_0^2(r)$ is the particle density (the ground-state
$\Psi_0$ can be taken real). The same
equation can be obtained by minimizing the energy of
the system written as a functional of the density:


$$E[n] = \int dr \left[ \frac{\hbar^2}{2m} |\nabla\sqrt{n}|^2 + n V_{ext}(r) + \frac{g n^2}{2} \right]$$ [22]


The first term on the right corresponds to the
quantum kinetic energy coming from the uncertainty
principle; it is usually named "quantum pressure"
and vanishes for uniform systems.

The next order in $\delta\hat\Psi$ gives the excited states of the
condensate. In a uniform gas the ground-state order
parameter, $\Psi_0$, is a constant and the first-order
expansion of $\hat H$ was introduced by N Bogoliubov in
1947. In particular, he found an elegant way to
diagonalize the Hamiltonian by using simple linear
combinations of particle creation and annihilation
operators. These are known as Bogoliubov
transformations and are at the basis of the concept of
quasiparticle, one of the most important concepts in
quantum many-body theory.

A generalization of Bogoliubov's approach to the
case of nonuniform condensates is obtained by
considering small deviations around the ground
state in the form


$$\Psi(r, t) = e^{-i\mu t/\hbar} [ \Psi_0(r) + u(r) e^{-i\omega t} + v^*(r) e^{i\omega t} ]$$ [23]


Inserting this expression into eqn [20] and keeping
terms linear in the complex functions u and v, one gets


$$\hbar\omega\, u(r) = [\hat H_0 - \mu + 2g\Psi_0^2(r)]\, u(r) + g\Psi_0^2(r)\, v(r)$$ [24]


$$-\hbar\omega\, v(r) = [\hat H_0 - \mu + 2g\Psi_0^2(r)]\, v(r) + g\Psi_0^2(r)\, u(r)$$ [25]

These coupled equations allow one to calculate the
energies $\epsilon = \hbar\omega$ of the excitations. They also give the
so-called quasiparticle amplitudes u and v, which obey
the normalization condition


$$\int dr\, [u_i^*(r) u_j(r) - v_i^*(r) v_j(r)] = \delta_{ij}$$


In a uniform gas, u and v are plane waves and one
recovers the famous Bogoliubov spectrum


$$\hbar\omega = \left[ \frac{\hbar^2 q^2}{2m} \left( \frac{\hbar^2 q^2}{2m} + 2gn \right) \right]^{1/2}$$ [26]


where q is the wave vector of the excitations.
For large momenta the spectrum coincides with the
free-particle energy $\hbar^2 q^2/2m$. At low momenta, it
instead gives the phonon dispersion $\omega = cq$, where
$c = [gn/m]^{1/2}$ is the Bogoliubov sound velocity. The
transition between the two regimes occurs when the
excitation wavelength is of the order of the healing
length,


$$\xi = [8\pi n a]^{-1/2} = \hbar/(mc\sqrt{2})$$ [27]


which is an important length scale for superfluidity. 
When the order parameter is forced to vanish at some 
point (by an impurity, a wall, etc.), the healing length 
provides the typical distance over which it recovers its 
bulk value. In a nonuniform condensate the excitations 
are no longer plane waves but, at low energy, they have 
still a phonon-like character, in the sense that they 
involve a collective motion of the condensate. 
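
The two regimes of the Bogoliubov spectrum [26] and the role of the healing length [27] can be checked numerically. The sketch below is a minimal illustration with assumed parameter values (mass, scattering length, density) and hypothetical function names; it evaluates the dispersion at wave vectors well below and well above $1/\xi$ and compares it with the phonon and free-particle forms.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s

def coupling_constant(scattering_length, mass):
    """g = 4 pi hbar^2 a / m."""
    return 4.0 * math.pi * HBAR ** 2 * scattering_length / mass

def bogoliubov_energy(q, g, density, mass):
    """Eqn [26]: excitation energy hbar*omega at wave vector q in a uniform gas."""
    free = HBAR ** 2 * q ** 2 / (2.0 * mass)
    return math.sqrt(free * (free + 2.0 * g * density))

def sound_velocity(g, density, mass):
    """Bogoliubov sound velocity c = [g n / m]^(1/2)."""
    return math.sqrt(g * density / mass)

def healing_length(density, scattering_length):
    """Eqn [27]: xi = [8 pi n a]^(-1/2)."""
    return (8.0 * math.pi * density * scattering_length) ** -0.5

if __name__ == "__main__":
    mass, a, n = 1.443e-25, 5.3e-9, 1.0e20   # illustrative assumptions (SI units)
    g = coupling_constant(a, mass)
    c = sound_velocity(g, n, mass)
    xi = healing_length(n, a)
    for q_xi in (0.01, 1.0, 100.0):
        q = q_xi / xi
        energy = bogoliubov_energy(q, g, n, mass)
        print(q_xi, energy / (HBAR * c * q), energy / (HBAR ** 2 * q ** 2 / (2.0 * mass)))
```

The first ratio tends to 1 for $q\xi \ll 1$ (phonons) and the second tends to 1 for $q\xi \gg 1$ (free particles), with the crossover at $q\xi$ of order one.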

The GP equation [20] is the starting point for an 
accurate mean-field description of BEC in dilute 
cold gases, which is rigorous at T = 0 and for
$n|a|^3 \ll 1$. Static and dynamic properties of condensates
in different geometries can be calculated by
solving the GP equation numerically or using
suitable approximate methods. The inclusion of
effects beyond mean field is a highly nontrivial and
interesting problem. A rather extreme case is
represented by liquid $^4$He, which is a dense system
where the interaction between atoms causes a large 
depletion of the condensate even at T=0 (No/N 
being less than 10%) and thus a full many-body 
treatment is required for its rigorous description. 
Nevertheless, even in this case, the general defini- 
tions of the section “What is BEC?” are still useful. 


Superfluidity and Coherence 


With the word superfluidity, one summarizes a 
complex of macroscopic phenomena occurring in 
quantum fluids under particular conditions: persis- 
tent currents, equilibrium states at rest in rotating 


vessels, viscosity-free motion, quantized vorticity, and
others. These features can also be observed in BEC.
The link between BEC and superfluidity is given by
the phase of the order parameter [11]. To understand
this point, let us consider a uniform system. If
$\hat\Psi(r, t)$ is a solution of the Heisenberg equation [19]
with $V_{ext} = 0$, then


$$\hat\Psi'(r, t) = \hat\Psi(r - vt, t) \exp\left[ \frac{i}{\hbar} \left( m v\cdot r - \frac{1}{2} m v^2 t \right) \right]$$ [28]


where v is a constant vector, is also a solution. This
equation gives the Galilean transformation of
the field operator and also applies to its condensate
component $\Psi$. At equilibrium, the ground-state
order parameter is given by $\Psi_0 = \sqrt{n} \exp(-i\mu t/\hbar)$,
where n is a constant independent of r. In a frame
where the condensate moves with velocity v, the
order parameter instead takes the form
$\Psi_0 = \sqrt{n} \exp(iS)$, with $S(r, t) = \hbar^{-1} [m v\cdot r - (m v^2/2 + \mu) t]$.
The velocity of the condensate can thus be identified
with the gradient of the phase S:


$$v_s(r, t) = \frac{\hbar}{m} \nabla S(r, t)$$ [29]


This definition is also valid for $v_s$ varying slowly in
space and time. The modulus of the order parameter
plays a minor role in this definition and it is
not necessary to assume the gas to be dilute and
close to T = 0. Indeed, the relation [29] between the
velocity field and the phase of the order parameter
also applies in the presence of large quantum
depletion, as in superfluid $^4$He, and at $T \ne 0$. In
this case, n should not be identified with the
condensate density. Conversely, in dilute gases at
T = 0, n is the condensate density and the velocity
[29] can be simply obtained by applying the usual
definition of the current density operator, $\hat j$, to the order
parameter [11]. 
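
As a short consistency check of eqn [29], using only the phase of the moving condensate quoted above, one can verify that the gradient of S returns the boost velocity v of the Galilean transformation [28]:

```latex
\begin{aligned}
S(r,t) &= \frac{1}{\hbar}\left[ m\, v\cdot r - \left( \frac{m v^2}{2} + \mu \right) t \right], \\
\nabla S(r,t) &= \frac{m}{\hbar}\, v, \\
v_s(r,t) &= \frac{\hbar}{m}\, \nabla S(r,t) = v .
\end{aligned}
```

The time-dependent part of the phase does not contribute to the gradient, so the identification holds at all times.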

The velocity [29] describes a potential flow and 
corresponds to a collective motion of many particles 
occupying a single quantum state. Being equal to the 
gradient of a scalar function, it is irrotational
($\nabla \times v_s = 0$) and satisfies the Onsager-Feynman
quantization condition $\oint v_s \cdot dl = \kappa h/m$, with $\kappa$ a
non-negative integer. These conditions are not
satisfied by a classical fluid, where the hydrodynamic
velocity field, $v(r, t) = j(r, t)/n(r, t)$, is the
average over many different states and does not 
correspond to a potential flow. 

By using the definition of the phase S and velocity 
v, together with particle conservation, one can show 
that the dynamics of a condensate, as far as 
macroscopic motions are concerned, is governed by 
the hydrodynamic equations of an irrotational 


nonviscous fluid. Within the mean-field theory, this
can be easily seen by rewriting the GP equation [20]
in terms of the density $n = |\Psi|^2$ and the velocity
[29]. Neglecting the quantum pressure term, proportional to $\nabla^2 \sqrt{n}$
(hence limiting the description to length scales
larger than the healing length $\xi$), one gets


$$\frac{\partial n}{\partial t} + \nabla\cdot(v_s n) = 0$$ [30]

and

$$m \frac{\partial v_s}{\partial t} + \nabla \left( V_{ext} + \mu(n) + \frac{m v_s^2}{2} \right) = 0$$ [31]


with the local chemical potential $\mu(n) = gn$. These
equations have the typical structure of the dynamic 
equations of superfluids at zero temperature and can 
be viewed as the T=0 case of the more general 
Landau’s two-fluid theory. 
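
The step from the GP equation [20] to the hydrodynamic equations [30] and [31] can be sketched as follows. This is a standard manipulation (writing the order parameter in modulus-phase form and separating real and imaginary parts), given here as an outline rather than as the article's own derivation:

```latex
\begin{aligned}
&\text{Ansatz: } \Psi = \sqrt{n}\, e^{iS}, \qquad v_s = \frac{\hbar}{m} \nabla S . \\
&\text{Imaginary part of [20]: }
\hbar\, \partial_t \sqrt{n}
  = -\frac{\hbar^2}{2m} \left( 2\, \nabla\sqrt{n}\cdot\nabla S + \sqrt{n}\, \nabla^2 S \right) \\
&\quad\Longrightarrow\quad
\partial_t n = -\frac{\hbar}{m}\, \nabla\cdot( n \nabla S ) = -\nabla\cdot( n\, v_s ) . \\
&\text{Real part of [20]: }
\hbar\, \partial_t S + \frac{m v_s^2}{2} + V_{ext} + g n
  - \frac{\hbar^2}{2m} \frac{\nabla^2 \sqrt{n}}{\sqrt{n}} = 0 .
\end{aligned}
```

The first result is eqn [30]. Taking the gradient of the second, multiplying by $\hbar/m \cdot m/\hbar$ to express it through $v_s$, and dropping the last (quantum pressure) term gives eqn [31] with $\mu(n) = gn$.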

One of the most striking pieces of evidence for superfluidity
is the observation of quantized vortices, that is,
vortices obeying the Onsager—Feynman quantization 
condition. A vast literature is devoted to vortices in 
superfluid helium and, more recently, vortices have 
also been produced and studied in condensates of 
ultracold gases, including nice configurations of 
many vortices in regular triangular lattices, similar 
to the Abrikosov lattices in superconductors. Other 
phenomena, such as the reduction of the moment of 
inertia, the occurrence of Josephson tunneling 
through barriers, the existence of thresholds for 
dissipative processes (Landau criterion), and others, 
are typical subjects of intense investigation. 

Another important consequence of the fact that 
BEC is described by an order parameter with a well- 
defined phase is the occurrence of coherence effects 
which, in different words, mean that condensates 
behave like matter waves. For instance, one can 
measure the phase difference between two conden- 
sates by means of interference. This can be done in 
coordinate space by confining two condensates in 
two potential minima, a and b, at a distance d. Let 
us take d along z and assume that, at t = 0, the order
parameter is given by the linear combination
$\Psi(r) = \Psi_a(r) + \exp(i\phi) \Psi_b(r)$ with $\Psi_a$ and $\Psi_b$ real
and without overlap. Then let us switch off the
confining potentials so that the condensates expand
and overlap. If the overlap occurs when the density
is small enough to neglect interactions, the motion
is ballistic and the phase of each condensate evolves
as $S(r, t) \simeq m r^2/(2\hbar t)$, so that $v = r/t$. This implies
a relative phase $\phi + S(x, y, z + d/2) - S(x, y, z - d/2) = \phi + m d z/(\hbar t)$.
The total density $n = |\Psi|^2$ thus
exhibits periodic modulations along z with wavelength
$ht/(md)$. This interference pattern has indeed
been observed in condensates of ultracold atoms. In
these systems it was also possible to measure the
coherence length, that is, the distance $|r - r'|$ at which
the one-body density matrix vanishes and the phase of the
order parameter is no longer well defined. In most
situations, the coherence length turns out to be of the
order of, or larger than, the size of the condensates.
However, interesting situations exist when the coher- 
ence length is shorter but the system still preserves some 
features of BEC (quasicondensates). 
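
The fringe spacing of the interference pattern described above is simple to evaluate. The sketch below computes the wavelength $ht/(md)$ and the relative phase $\phi + mdz/(\hbar t)$; the atomic mass, separation, and expansion time in the example are illustrative assumptions and are not taken from any specific experiment.

```python
import math

H_PLANCK = 6.62607015e-34            # Planck constant, J s
HBAR = H_PLANCK / (2.0 * math.pi)    # reduced Planck constant, J s

def fringe_spacing(time, separation, mass):
    """Wavelength h t / (m d) of the density modulation along z."""
    return H_PLANCK * time / (mass * separation)

def relative_phase(z, time, separation, mass, phi=0.0):
    """Relative phase phi + m d z / (hbar t) of the two expanding condensates."""
    return phi + mass * separation * z / (HBAR * time)

if __name__ == "__main__":
    mass = 3.82e-26        # kg, illustrative (a sodium-like atom)
    separation = 40.0e-6   # m, illustrative
    time = 40.0e-3         # s, illustrative
    print("fringe spacing [m]:", fringe_spacing(time, separation, mass))
```

By construction the relative phase advances by $2\pi$ over one fringe spacing, which is why the spacing involves h rather than $\hbar$.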


Final Remarks 


Bose-Einstein condensates of ultracold atoms are 
easily manipulated by changing and tuning the 
external potentials. This means, for instance, that one 
can prepare condensates in different geometries, 
including very elongated (quasi-1D) or disk-shaped 
(quasi-2D) condensates. This is conceptually impor- 
tant, since BEC in lower dimensions is not as simple as 
in three dimensions: thermal and quantum fluctua- 
tions play a crucial role, superfluidity must be properly 
re-defined, and very interesting limiting cases can be 
explored (Tonks—Girardeau regime, Luttinger liquid, 
etc.). Another possibility is to use laser beams to 
produce standing waves acting as an external periodic 
potential (optical lattice). Condensates in optical 
lattices behave as a sort of perfect crystal, whose 
properties are the analog of the dynamic and transport 
properties in solid-state physics, but with controllable 
spacing between sites, no defects and tunable lattice 
geometry. One can investigate the role of phase 
coherence in the lattice, looking, for instance, at 
Josephson effects as in a chain of junctions. By tuning 
the lattice depth one can explore the transition from a
superfluid phase to a Mott-insulator phase, which is
a nice example of a quantum phase transition. Controlling
cold atoms in optical lattices can be a good starting
point for applications in quantum engineering,
interferometry, and quantum information.

Another interesting aspect of BECs is that the key 
equation for their description in mean-field theory, 
namely the GP equation [20], is a nonlinear Schrö- 
dinger equation very similar to the ones commonly 
used, for instance, in nonlinear quantum optics. This 
opens interesting perspectives in exploiting the analo- 
gies between the two fields, such as the occurrence of 
dynamical and parametric instabilities, the possibility 
to create different types of solitons, the occurrence of 
nonlinear processes like, for example, higher harmonic 
generation and mode mixing. 

A relevant part of the current research also involves 
systems made of mixtures of different gases, Bose—Bose 
or Fermi-Bose, and many activities with ultracold 
atoms now involve fermionic gases, where BEC can
also be realized by condensing molecules made of fermion
pairs. An extremely active line of research now concerns the
BCS-BEC crossover, which can be obtained in Fermi
gases by tuning the scattering length (and hence the 
interaction) by means of Feshbach resonances. 

Ten years after the first observation of BEC in 
ultracold gases, it is almost impossible to summarize
all the research done in this field. A large amount
of work has already been devoted to characterizing the
condensates and several new lines have been opened.
Rather detailed review articles and books are 
already available for the interested readers. 


See also: Interacting Particle Systems and Hydrodynamic 
Equations; Quantum Phase Transitions; Quantum 
Statistical Mechanics: Overview; Renormalization: 
Statistical Mechanics and Condensed Matter; Superfluids; 
Variational Techniques for Ginzburg-Landau Energies.


Introduction


In this article we discuss quantum theories which 
describe systems of nondistinguishable particles 
interacting with external fields. Such models are 
of interest also in the nonrelativistic case (in 
quantum statistical mechanics, nuclear physics, 
etc.), but the relativistic case has additional, 
interesting complications: relativistic models are 
genuine quantum field theories, that is, quantum 
theories with an infinite number of degrees of 
freedom, with nontrivial features like divergences 
and anomalies. Since interparticle interactions are 
ignored, such models can be regarded as a first 
approximation to more complicated theories, and 
they can be studied by mathematically precise 
methods. 

Models of relativistic particles in external electro- 
magnetic fields have received considerable attention 
in the physics literature, and interesting phenomena 
like the Klein paradox or particle-antiparticle pair 
creation in overcritical fields have been studied; see 
Rafelski et al. (1978) for an extensive review. We 
will not discuss these physics questions but only 
describe some prototype examples and a general 
Hamiltonian framework which has been used in 
mathematically precise work on such models. The 
general framework for this latter work is the 
mathematical theory of Hilbert space operators 
(see, e.g., Reed and Simon (1975)), but in our 
discussion we try to avoid presupposing knowledge 
of that theory. As mentioned briefly in the end, this 
work has had close relations to various topics of 
recent interest in mathematical physics, including 
anomalies, infinite-dimensional geometry and group 
theory, conformal field theory, and noncommutative 
geometry. 

We restrict our discussion to spin-0 bosons and 
spin-1/2 fermions, and we will not discuss models 
of particles in external gravitational fields but 
only refer the interested reader to DeWitt (2003). 
We also only mention in passing that external 
field problems have also been studied using 
functional integral approaches, and mathemati- 
cally precise work on this can be found in the 
extensive literature on determinants of differential 
operators. 


Examples 


Consider the Schrödinger equation describing a 
nonrelativistic particle of mass $m$ and charge $e$ 
moving in three-dimensional space and interacting 
with external vector and scalar potentials $A$ and 
$\phi$, respectively, 

$$ i\partial_t\psi = H\psi, \qquad H = \frac{1}{2m}(-i\nabla + eA)^2 - e\phi $$   [1] 

(we set $\hbar = c = 1$, $\partial_t = \partial/\partial t$, and $\psi$, $\phi$, and $A$ can 
depend on the space and time variables $x \in \mathbb{R}^3$ and 
$t \in \mathbb{R}$). This is a standard quantum-mechanical 
model, with $\psi$ the one-particle wave function 
allowing for the usual probabilistic interpretation. 
One interesting generalization to the relativistic 
regime is the Klein-Gordon equation 

$$ \left[(i\partial_t + e\phi)^2 - (-i\nabla + eA)^2 - m^2\right]\psi = 0 $$   [2] 

with a $\mathbb{C}$-valued function $\psi$. There is another 
important relativistic generalization, the Dirac 
equation 

$$ \left[(i\partial_t + e\phi) - (-i\nabla + eA)\cdot\alpha - m\beta\right]\psi = 0 $$   [3] 


with $\alpha = (\alpha_1, \alpha_2, \alpha_3)$ and $\beta$ Hermitian $4 \times 4$ 
matrices satisfying the relations 

$$ \alpha_i\alpha_j + \alpha_j\alpha_i = 2\delta_{ij}, \qquad \alpha_i\beta = -\beta\alpha_i, \qquad \beta^2 = 1 $$   [4] 

and a $\mathbb{C}^4$-valued function $\psi$ (we also write 1 for the 
identity). These two relativistic equations differ by 
the transformation properties of $\psi$ under Lorentz 
transformations: in [2] it transforms like a scalar 
and thus describes spin-0 particles, whereas in [3] it 
transforms like a spinor and describes spin-1/2 particles. While 
these equations are natural relativistic generaliza- 
tions of the Schrödinger equation, they no longer 
allow $\psi$ to be interpreted consistently as a one-particle 
wave function. The physical reason is that, in a 
relativistic theory, high-energy processes can create 
particle-antiparticle pairs, and this makes the 
restriction to a fixed particle number inconsistent. 
This problem can be remedied by constructing a 
many-body model allowing for an arbitrary number 
of particles and antiparticles. The requirement that 
this many-body model should have a ground state is 
an important ingredient in this construction. 

It is obviously of interest to formulate and study 
many-body models of nondistinguishable particles 
already in the nonrelativistic case. An important 
empirical fact is that such particles come in two 
kinds, bosons and fermions, distinguished by their 
exchange statistics (we ignore the interesting possi- 
bility of exotic statistics). For example, the fermion 
many-particle version of [1] for suitable $\phi$ and $A$ is a 
useful model for electrons in a metal. An elegant 
method to go from the one- to the many-particle 
description is the formalism of second quantization: 
one promotes $\psi$ to a quantum field operator with 
certain (anti)commutator relations, and this is a 
convenient way to construct the appropriate many- 
particle Hilbert space, Hamiltonian, etc. In the 
nonrelativistic case, this formalism can be regarded 
as an elegant reformulation of a pedestrian con- 
struction of a many-body quantum-mechanical 
model, which is useful since it provides convenient 
computational tools. However, this formalism nat- 
urally generalizes to the relativistic case where the 
one-particle model no longer has an acceptable 
physical interpretation, and one finds that one can 
nevertheless give a consistent physical interpretation 
to [2] and [3] provided that the $\psi$ are interpreted as 
quantum field operators describing bosons and 
fermions. This particular exchange statistics of the 
relativistic particles is a special case of the spin- 
statistics theorem: integer-spin particles are bosons 
and half-integer spin particles are fermions. While 
many structural features of this formalism are 
present already in the simpler nonrelativistic models, 
the relativistic models add some nontrivial features 
typical for quantum field theories. 

In the following, we discuss a precise mathema- 
tical formulation of the quantum field theory models 
described above. We emphasize the functorial nature 
of this construction, which makes manifest that it 
also applies to other situations, for example, where 
the bosons and fermions are also coupled to a 
gravitational background, are considered in other 
spacetime dimensions than 3 + 1, etc. 


Second Quantization: Nonrelativistic Case 


Consider a quantum system of nondistinguishable 
particles where the quantum-mechanical descrip- 
tion of one such particle is known. In general, this 
one-particle description is given by a Hilbert space 
h and one-particle observables and transforma- 
tions which are self-adjoint and unitary operators 
on h, respectively. The most important observable 
is the Hamiltonian H. We will describe a general 
construction of the corresponding many-body 
system. 


Example As a motivating example we take the 
Hilbert space $h = L^2(\mathbb{R}^3)$ of square-integrable functions 
$f(x)$, $x \in \mathbb{R}^3$, and the Hamiltonian $H$ in [1]. A 
specific example of a unitary operator on $h$ is the 
gauge transformation $(Uf)(x) = \exp(i\chi(x))f(x)$ with 
$\chi$ a smooth, real-valued function on $\mathbb{R}^3$. 


In this example, the corresponding wave functions 
for $N$ identical such particles are the $L^2$-functions 
$f_N(x_1, \ldots, x_N)$, $x_j \in \mathbb{R}^3$. It is obvious how to extend 
one-particle observables and transformations to such 
$N$-particle states: for example, the $N$-particle Hamiltonian 
corresponding to $H$ in [1] is 

$$ H_N = \sum_{j=1}^{N}\left[\frac{1}{2m}\bigl(-i\nabla_{x_j} + eA(t, x_j)\bigr)^2 - e\phi(t, x_j)\right] $$   [5] 

and the $N$-particle gauge transformation $U_N$ is defined 
through multiplication with $\prod_{j=1}^{N}\exp(i\chi(x_j))$. 

For systems of indistinguishable particles it is 
enough to restrict to wave functions which are even 
or odd under particle exchanges, 


$$ f_N(x_1, \ldots, x_j, \ldots, x_k, \ldots, x_N) = \pm f_N(x_1, \ldots, x_k, \ldots, x_j, \ldots, x_N) $$   [6] 


for all $1 \le j < k \le N$, with the upper and lower 
signs corresponding to bosons and fermions, respec- 
tively (this empirical fact is usually taken as a 
postulate in nonrelativistic many-body quantum 
physics). It is convenient to define the zero-particle 
Hilbert space as C (complex numbers) and to 
introduce a Hilbert space containing states with all 
possible particle numbers: this so-called Fock space 
contains all states 


$$ \begin{pmatrix} f_0 \\ f_1(x_1) \\ f_2(x_1, x_2) \\ f_3(x_1, x_2, x_3) \\ \vdots \end{pmatrix} $$   [7] 


with $f_0 \in \mathbb{C}$. The definition of $H_N$ and $U_N$ then 
naturally extends to this Fock space; see below. 
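
The exchange condition [6] can be made concrete on a finite grid. The following is a minimal sketch, assuming that a two-particle amplitude is discretized as a matrix `f2[i, j]` approximating $f_2(x_i, x_j)$; the grid size, the random test data and the helper name `exchange_project` are illustrative choices and not part of the article. It projects an arbitrary amplitude onto its even (boson) and odd (fermion) parts.

```python
import numpy as np

def exchange_project(f2, sign):
    """Project a discretized two-particle amplitude f2[i, j] ~ f(x_i, x_j)
    onto the sector that is even (sign=+1) or odd (sign=-1) under exchange."""
    return 0.5 * (f2 + sign * f2.T)

rng = np.random.default_rng(0)
f2 = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))

f_bose = exchange_project(f2, +1)   # upper sign in [6]
f_fermi = exchange_project(f2, -1)  # lower sign in [6]

# The odd part vanishes on the diagonal x_1 = x_2.
print(np.abs(np.diag(f_fermi)).max())
```

In this toy setting the two projections add up to the original amplitude, so the even and odd sectors together exhaust the two-particle space; for more than two particles further symmetry types appear, which is why [6] is imposed as a postulate.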


General Construction 


The construction of Fock spaces and many-particle 
observables and transformations just outlined in a 
specific example is conceptually simple. An alter- 
native, more efficient construction method is to use 
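“quantum fields,” which we denote as $\psi(x)$ and 
$\psi^\dagger(x)$, $x \in \mathbb{R}^3$. They can be fully characterized by the 
following (anti)commutator relations: 

$$ [\psi(x), \psi^\dagger(y)]_\mp = \delta^3(x - y), \qquad [\psi(x), \psi(y)]_\mp = 0 $$   [8] 

where $[a, b]_\mp = ab \mp ba$, with the commutator and 
the anticommutator (upper and lower signs, respec- 
tively) corresponding to the boson and fermion cases, 
respectively. It is convenient to “smear” these fields 
with one-particle wave functions and define 

$$ \psi(f) = \int_{\mathbb{R}^3} d^3x\, \overline{f(x)}\,\psi(x), \qquad \psi^\dagger(f) = \int_{\mathbb{R}^3} d^3x\, \psi^\dagger(x)\, f(x) $$   [9] 

for all $f \in h$. Then the relations characterizing the 
field operators can be written as 

$$ [\psi(f), \psi^\dagger(g)]_\mp = (f, g), \qquad [\psi(f), \psi(g)]_\mp = 0 \qquad \forall f, g \in h $$   [10] 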


where 


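$$ (f, g) = \int_{\mathbb{R}^3} d^3x\, \overline{f(x)}\, g(x) $$ 

is the inner product in $h$. The Fock space $\mathcal{F}_\pm(h)$ can 
then be defined by postulating that it contains a 
normalized vector $\Omega$ called “vacuum” such that 

$$ \psi(f)\Omega = 0 \qquad \forall f \in h $$   [11] 

and that all $\psi^{(\dagger)}(f)$ are operators on $\mathcal{F}_\pm(h)$ such that 
$\psi^\dagger(f) = \psi(f)^*$, where $*$ is the Hilbert space adjoint. 
Indeed, from this we conclude that $\mathcal{F}_\pm(h)$, as a vector 
space, is generated by 

$$ f_1 \wedge f_2 \wedge \cdots \wedge f_N \equiv \psi^\dagger(f_1)\psi^\dagger(f_2)\cdots\psi^\dagger(f_N)\Omega $$   [12] 

with $f_j \in h$ and $N = 0, 1, 2, \ldots$, and that the Hilbert 
space inner product of such vectors is 

$$ (f_1 \wedge f_2 \wedge \cdots \wedge f_N,\; g_1 \wedge g_2 \wedge \cdots \wedge g_M) = \delta_{N,M} \sum_{P \in S_N} (\pm 1)^{|P|} \prod_{j=1}^{N} (f_j, g_{P(j)}) $$   [13] 

with $S_N$ the permutation group, with $(+1)^{|P|} = 1$ 
always, and $(-1)^{|P|} = +1$ and $-1$ for even and odd 
permutations, respectively. The many-body Hamil- 
tonian $q(H)$ corresponding to the one-particle Hamil- 
tonian $H$ can now be defined by the following relations: 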


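$$ q(H)\Omega = 0, \qquad [q(H), \psi^\dagger(f)] = \psi^\dagger(Hf) $$   [14] 

for all $f \in h$ such that $Hf$ is defined. Indeed, this 
implies that 

$$ q(H)\, f_1 \wedge f_2 \wedge \cdots \wedge f_N = \sum_{j=1}^{N} f_1 \wedge \cdots \wedge (Hf_j) \wedge \cdots \wedge f_N $$   [15] 

which defines a self-adjoint operator on $\mathcal{F}_\pm(h)$, and 
it is easy to check that this coincides with our down- 
to-earth definition of $H_N$ above. Similarly, the 
many-body transformation $Q(U)$ corresponding to 
a one-particle transformation $U$ can be defined as 

$$ Q(U)\Omega = \Omega, \qquad Q(U)\psi^\dagger(f) = \psi^\dagger(Uf)\,Q(U) $$   [16] 

for all $f \in h$, which implies that 

$$ Q(U)\, f_1 \wedge f_2 \wedge \cdots \wedge f_N = (Uf_1) \wedge (Uf_2) \wedge \cdots \wedge (Uf_N) $$   [17] 

and thus coincides with our previous definition of 
$U_N$. 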

While we presented the construction above for a 
particular example, it is important to note that it 
does not make reference to what the one- 
particle formalism actually is. For example, if we 
had a model of particles on a space $\mathcal{M}$ given by 
some “nice” manifold of any dimension and with $M$ 
internal degrees of freedom, we would take 
$h = L^2(\mathcal{M}) \otimes \mathbb{C}^M$ and replace [9] by 

$$ \psi(f) = \int_{\mathcal{M}} d\mu(x) \sum_{j=1}^{M} \overline{f_j(x)}\,\psi_j(x) $$   [18] 

and its Hermitian conjugate, with the measure $\mu$ on 
$\mathcal{M}$ defining the inner product in $h$, 

$$ (f, g) = \int_{\mathcal{M}} d\mu(x) \sum_{j=1}^{M} \overline{f_j(x)}\, g_j(x) $$ 

With that, all formulas after [9] hold true as they stand. 
Given any one-particle Hilbert space $h$ with inner 
product $(\cdot, \cdot)$, observable $H$, and transformation $U$, the 
formulas above define the corresponding Fock spaces 
$\mathcal{F}_\pm(h)$ and many-body observable $q(H)$ and transfor- 
mation $Q(U)$. It is also interesting to note that this 
construction has various beautiful general (functorial) 
properties: the set of one-particle observables has a 
natural Lie algebra structure with the Lie bracket given 
by the commutator (strictly speaking: $i$ times the 
commutator, but we drop the common factor $i$ for 
simplicity). The definitions above imply that 

$$ [q(A), q(B)] = q([A, B]) $$   [19] 


for one-particle observables $A$, $B$, that is, the above- 
mentioned Lie algebra structure is preserved under 
this map $q$. In a similar manner, the set of one- 
particle transformations has a natural group struc- 
ture preserved by the map $Q$, 

$$ Q(U)Q(V) = Q(UV), \qquad Q(U)^{-1} = Q(U^{-1}) $$   [20] 

Moreover, if $A$ is self-adjoint, then $\exp(iA)$ is 
unitary, and one can show that 

$$ Q(\exp(iA)) = \exp(iq(A)) $$   [21] 

For later use, we note that, if $\{f_n\}_{n \in \mathbb{Z}}$ is some 
complete, orthonormal basis in $h$, then operators $A$ 
on $h$ can be represented by infinite matrices 
$(A_{mn})_{m,n \in \mathbb{Z}}$ with $A_{mn} = (f_m, Af_n)$ and 

$$ q(A) = \sum_{m,n \in \mathbb{Z}} A_{mn}\, \psi_m^\dagger \psi_n $$   [22] 

where $\psi_n^{(\dagger)} = \psi^{(\dagger)}(f_n)$ obey 

$$ [\psi_m, \psi_n^\dagger]_\mp = \delta_{m,n}, \qquad [\psi_m, \psi_n]_\mp = 0 $$   [23] 

for all $m$, $n$. We also note that, in our definition of 
$q(A)$, we made a convenient choice of normal- 
ization, but there is no physical reason not to choose 
a different normalization and define 

$$ q'(A) = q(A) - b(A) $$   [24] 

where $b$ is some linear function mapping self-adjoint 
operators $A$ to real numbers. For example, one may wish 
to use another reference vector $\tilde\Omega$ instead of $\Omega$ in the 
Fock space, and then would choose $b(A) = (\tilde\Omega, q(A)\tilde\Omega)$. 
Then the relations in [19] are changed to 

$$ [q'(A), q'(B)] = q'([A, B]) + S_0(A, B) $$   [25] 

where $S_0(A, B) = b([A, B])$. However, the C-number 
term $S_0(A, B)$ in the relations [25] is trivial, since it 
can be removed by going back to $q(A)$. 
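
The functorial relations [19], [16] and [21] can be checked numerically when $h$ is finite dimensional. The sketch below is an illustration under stated assumptions, not the construction used in the article: it takes $h = \mathbb{C}^3$, realizes fermion operators by the standard Jordan-Wigner matrices, and uses the bilinear form $q(A) = \sum_{m,n} A_{mn}\psi_m^\dagger\psi_n$; the helper names `fermion_modes`, `second_quantize` and `exp_i` are hypothetical.

```python
import numpy as np
from functools import reduce

def fermion_modes(n):
    """Jordan-Wigner matrices for annihilation operators psi_0..psi_{n-1}
    on the 2**n-dimensional fermion Fock space (an assumed realization)."""
    I2, Z = np.eye(2), np.diag([1.0, -1.0])
    a = np.array([[0.0, 1.0], [0.0, 0.0]])  # maps occupied to empty
    return [reduce(np.kron, [Z] * j + [a] + [I2] * (n - j - 1)) for j in range(n)]

def second_quantize(A, psi):
    """q(A) = sum_{m,n} A[m, n] psi_m^dagger psi_n."""
    n = len(psi)
    return sum(A[m, k] * psi[m].conj().T @ psi[k] for m in range(n) for k in range(n))

def exp_i(H):
    """exp(iH) for a Hermitian matrix H via its spectral decomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(1j * w)) @ V.conj().T

def random_hermitian(n, rng):
    X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (X + X.conj().T) / 2

rng = np.random.default_rng(1)
n = 3
psi = fermion_modes(n)
vacuum = np.zeros(2 ** n)
vacuum[0] = 1.0
A, B = random_hermitian(n, rng), random_hermitian(n, rng)
qA, qB = second_quantize(A, psi), second_quantize(B, psi)

# [19]: q maps commutators to commutators.
lie_defect = np.linalg.norm(qA @ qB - qB @ qA - second_quantize(A @ B - B @ A, psi))

# [16] with [21]: Q = exp(i q(A)) implements U = exp(iA) on creation operators.
Q, U = exp_i(qA), exp_i(A)
f = rng.normal(size=n) + 1j * rng.normal(size=n)

def create(g):
    """psi^dagger(g), linear in g."""
    return sum(gn * p.conj().T for gn, p in zip(g, psi))

intertwine_defect = np.linalg.norm(Q @ create(f) - create(U @ f) @ Q)
vacuum_defect = np.linalg.norm(Q @ vacuum - vacuum)
print(lie_defect, intertwine_defect, vacuum_defect)
```

All three printed defects should be at the level of floating-point rounding. The same check could be repeated for bosons only after truncating the Fock space, since the boson Fock space is infinite dimensional even for finite-dimensional $h$.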


Physical Interpretation 


The Fock space $\mathcal{F}_\pm(h)$ is the direct sum of subspaces 
of states with different particle numbers $N$, 

$$ \mathcal{F}_\pm(h) = \bigoplus_{N=0}^{\infty} h_\pm^{(N)} $$   [26] 

where the zero-particle subspace $h_\pm^{(0)} = \mathbb{C}$ is gener- 
ated by the vacuum $\Omega$, and $h_\pm^{(N)}$ is the $N$-particle 
subspace generated by the states 
$f_1 \wedge f_2 \wedge \cdots \wedge f_N$, $f_j \in h$. We note that 

$$ \mathcal{N} = q(1) $$   [27] 

is the “particle-number operator,” $\mathcal{N} F_N = N F_N$ for 
all $F_N \in h_\pm^{(N)}$. The field operators obviously change 
the particle number: $\psi^\dagger(f)$ increases the particle 
number by one (maps $h_\pm^{(N)}$ to $h_\pm^{(N+1)}$), and $\psi(f)$ 
decreases it by one. Since every $f \in h$ can be interpreted 
as a one-particle state, it is natural to interpret $\psi^\dagger(f)$ and 
$\psi(f)$ as “creation” and “annihilation” operators, 
respectively: they create and annihilate one particle in 
the state $f \in h$. It is important to note that, in the 
fermion case, [10] implies that $\psi^\dagger(f)^2 = 0$, which is a 
mathematical formulation of the Pauli exclusion 
principle: it is not possible to have two fermions in the 
same one-particle state. In the boson case, there is no 
such restriction. Thus, even though the formalisms 
used to describe boson and fermion systems look very 
similar, they describe dramatically different physics. 
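
For a finite-dimensional one-particle space the fermion statements above can be verified directly with matrices. The following sketch assumes $h = \mathbb{C}^3$ and the Jordan-Wigner realization of the fermion operators (an illustrative choice; the names `fermion_modes`, `create` and `annihilate` are hypothetical). It checks the anticommutator in [10], the Pauli principle $\psi^\dagger(f)^2 = 0$, and that $q(1)$ counts particles.

```python
import numpy as np
from functools import reduce

def fermion_modes(n):
    """Jordan-Wigner matrices for annihilation operators psi_0..psi_{n-1}."""
    I2, Z = np.eye(2), np.diag([1.0, -1.0])
    a = np.array([[0.0, 1.0], [0.0, 0.0]])  # maps occupied to empty
    return [reduce(np.kron, [Z] * j + [a] + [I2] * (n - j - 1)) for j in range(n)]

n = 3
psi = fermion_modes(n)
identity = np.eye(2 ** n)
vacuum = np.zeros(2 ** n)
vacuum[0] = 1.0  # all modes empty

def create(f):
    """psi^dagger(f), linear in f."""
    return sum(fn * p.conj().T for fn, p in zip(f, psi))

def annihilate(f):
    """psi(f), antilinear in f."""
    return sum(np.conj(fn) * p for fn, p in zip(f, psi))

rng = np.random.default_rng(2)
f = rng.normal(size=n) + 1j * rng.normal(size=n)
g = rng.normal(size=n) + 1j * rng.normal(size=n)

# Anticommutator {psi(f), psi^dagger(g)} = (f, g); np.vdot conjugates its first argument.
car_defect = np.linalg.norm(
    annihilate(f) @ create(g) + create(g) @ annihilate(f) - np.vdot(f, g) * identity)

# Pauli principle: two fermions cannot occupy the same one-particle state.
pauli_defect = np.linalg.norm(create(f) @ create(f))

# q(1) = sum_n psi_n^dagger psi_n has eigenvalue 2 on the two-particle state f ^ g.
number = sum(p.conj().T @ p for p in psi)
two_particle = create(f) @ create(g) @ vacuum
number_defect = np.linalg.norm(number @ two_particle - 2 * two_particle)
print(car_defect, pauli_defect, number_defect)
```

The three defects vanish up to rounding, while `two_particle` itself is nonzero for linearly independent `f` and `g`.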


Applications 


In our example, the many-body Hamiltonian 
$H_0 = q(H)$ can also be written in the following 
suggestive form: 

$$ H_0 = \int d^3x\, \psi^\dagger(x)\,(H\psi)(x) $$   [28] 

and similar formulas hold true for other observables 
and other Hilbert spaces $h = L^2(\mathcal{M}) \otimes \mathbb{C}^M$. It is 
rather easy to solve the model defined by such a 
Hamiltonian: all necessary computations can be 
reduced to one-particle computations. For example, 
in the static case, where $A$ and $\phi$ are time 
independent, a main quantity of interest in statistical 
physics is the free energy 

$$ E = -\beta^{-1} \log\bigl(\mathrm{tr}\bigl(\exp(-\beta[H_0 - \mu\mathcal{N}])\bigr)\bigr) $$   [29] 

where $\beta > 0$ is the inverse temperature, $\mu$ the 
chemical potential, and the trace is over the Fock 
space $\mathcal{F}_\pm(h)$. One can show that 

$$ E = \pm\,\mathrm{tr}\bigl(\beta^{-1} \log(1 \mp \exp(-\beta[H - \mu]))\bigr) $$   [30] 


where the trace is over the one-particle Hilbert space 
h. Thus, to compute E, one only needs to find the 
eigenvalues of H. 
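
The reduction of the Fock-space trace [29] to the one-particle trace [30] can be tested numerically in the fermion case. The sketch below is a toy check under stated assumptions: a random Hermitian $4 \times 4$ matrix stands in for $H$, the fermion operators are realized by Jordan-Wigner matrices, and the values of `beta` and `mu` are arbitrary; none of these choices comes from the article.

```python
import numpy as np
from functools import reduce

def fermion_modes(n):
    """Jordan-Wigner matrices for annihilation operators psi_0..psi_{n-1}."""
    I2, Z = np.eye(2), np.diag([1.0, -1.0])
    a = np.array([[0.0, 1.0], [0.0, 0.0]])  # maps occupied to empty
    return [reduce(np.kron, [Z] * j + [a] + [I2] * (n - j - 1)) for j in range(n)]

rng = np.random.default_rng(3)
n, beta, mu = 4, 0.7, 0.3
X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (X + X.conj().T) / 2  # toy one-particle Hamiltonian

psi = fermion_modes(n)    # real matrices, so .T is the adjoint
H0 = sum(H[m, k] * psi[m].T @ psi[k] for m in range(n) for k in range(n))
N = sum(p.T @ p for p in psi)

# [29]: trace over the 2**n-dimensional Fock space.
fock_levels = np.linalg.eigvalsh(H0 - mu * N)
E_fock = -np.log(np.sum(np.exp(-beta * fock_levels))) / beta

# [30] with the lower (fermion) signs: trace over the n-dimensional space h.
eps = np.linalg.eigvalsh(H)
E_one = -np.sum(np.log(1.0 + np.exp(-beta * (eps - mu)))) / beta

print(E_fock, E_one)
```

The two numbers agree to rounding accuracy, although the first requires diagonalizing a $16 \times 16$ matrix and the second only a $4 \times 4$ one; the gap between $2^n$ and $n$ is what makes the one-particle formula useful.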

It is important to mention that the framework 
discussed here is not only for external field 
problems but can be equally well used to for- 
mulate and study more complicated models with 
interparticle interactions. For example, while the 
model with the Hamiltonian $H_0$ above is often too 
simple to describe systems in nature, it is easy to 
write down more realistic models; for example, the 
Hamiltonian 

$$ H = H_0 + \frac{e^2}{2} \int d^3x \int d^3y\, \psi^\dagger(x)\psi^\dagger(y)\, |x - y|^{-1}\, \psi(y)\psi(x) $$   [31] 

describes electrons in an external electromagnetic 
field interacting through Coulomb interactions. This 
illustrates an important point which we would like 
to stress: the task in quantum theory is twofold, 
namely to formulate and to solve (exactly or other- 
wise) models. Obviously, in the nonrelativistic case, 
it is equally simple to formulate many-body models 
with and without interparticle interactions, and 
the latter are simpler only because they are easier to 
solve: the two tasks of formulating and solving 
models can be clearly separated. As we will see, in 
the relativistic case, even the formulation of an 
external field problem is nontrivial, and one finds 
that one cannot formulate the model without at 
least partially solving it. This is a common feature of 
quantum field theories, which makes them challenging and 
interesting. 


Relativistic Fermion and Boson Systems 


We now generalize the formalism developed in the 
previous section to the relativistic case. 


Field Algebras and Quasifree Representations 


In the previous section, we identified the field 
operators $\psi^{(\dagger)}(f)$ with particular Fock space opera- 
tors. This is analogous to identifying the operators 
$p_j = -i\partial_{x_j}$ and $q_j = x_j$ on $L^2(\mathbb{R}^M)$ with the generators 
of the Heisenberg algebra, as is usually done. (We 
recall: the Heisenberg algebra is the star algebra 
generated by $P_j$ and $Q_j$, $j = 1, 2, \ldots, M < \infty$, with 
the well-known relations 

$$ [P_j, Q_k] = -i\delta_{jk}, \qquad [P_j, P_k] = [Q_j, Q_k] = 0, \qquad P_j^\dagger = P_j, \qquad Q_j^\dagger = Q_j $$   [32] 

for all $j$, $k$.) Identifying the Heisenberg algebra with 
a particular representation is legitimate since, as is 
well known, all its irreducible representations are 
(essentially) the same (this statement is made precise 
by a celebrated theorem due to von Neumann). 

However, in the case of the algebra generated by the 
field operators $\psi^{(\dagger)}(f)$, there exist representations 
which are truly different from the ones discussed in 
the last section, and such representations are needed 
to construct relativistic external field problems. It is 
therefore important to distinguish the fields as 
generators of an algebra from the operators repre- 
senting them. We thus define the (boson or fermion) 
field algebra $\mathcal{A}_\pm(h)$ over a Hilbert space $h$ as the star 
algebra generated by $\Psi^\dagger(f)$, $f \in h$, such that the map 
$f \to \Psi^\dagger(f)$ is linear and the relations 

$$ [\Psi(f), \Psi^\dagger(g)]_\mp = (f, g), \qquad [\Psi(f), \Psi(g)]_\mp = 0 $$   [33] 


are fulfilled for all $f, g \in h$, with $\dagger$ the star 
operation in $\mathcal{A}_\pm(h)$. The particular representation 
of this algebra discussed in the last section will be 
denoted by $\pi_0$, $\pi_0(\Psi^{(\dagger)}(f)) = \psi^{(\dagger)}(f)$. Other represen- 
tations $\pi_{P_-}$ can be constructed from any projection 
operator $P_-$ on $h$, that is, any operator $P_-$ on $h$ 
satisfying $P_-^* = P_-^2 = P_-$. Writing $\hat\psi^{(\dagger)}(f)$ short for 
$\pi_{P_-}(\Psi^{(\dagger)}(f))$, this so-called quasifree representation 
is defined by 

$$ \hat\psi^\dagger(f) = \psi^\dagger(P_+ f) + \psi(\overline{P_- f}), \qquad \hat\psi(f) = \psi(P_+ f) \mp \psi^\dagger(\overline{P_- f}) $$   [34] 

where $P_+ = 1 - P_-$ and the bar means complex conjugation. It is 
important to note that, while the star operation is 
identical with the Hilbert space adjoint $*$ in the 
fermion case, we have 

$$ \hat\psi^\dagger(f) = \hat\psi(Ff)^* \quad \text{with} \quad F = P_+ - P_- \quad \text{for bosons} $$   [35] 

where $F$ is a grading operator, that is, $F^* = F$ and $F^2 = 1$. 
We stress that the “physical” star operation always is $*$, 
that is, physical observables $A$ obey $A = A^*$. 

The present framework suggests regarding quantiza- 
tion as the procedure of going from a 
one-particle Hilbert space $h$ to the corresponding field 
algebra $\mathcal{A}_\pm(h)$. Indeed, the Heisenberg algebra is 
identical with the boson field algebra $\mathcal{A}_+(\mathbb{C}^M)$ (since 
the latter is obviously identical with the algebra of $M$ 
harmonic oscillators), and thus conventional quantum 
mechanics can be regarded as boson quantization in the 
special case where the one-particle Hilbert space is 
finite dimensional. It is interesting to note that 
“fermion quantum mechanics” $\mathcal{A}_-(\mathbb{C}^M)$ is the natural 
framework for formulating and studying lattice fer- 
mion and spin systems, which play an important role in 
condensed matter physics. 

In the following, we elaborate the naive inter- 
pretations of the relativistic equations in [2] and [3] 
as a quantum theory of one particle, and we discuss 
why they are unphysical. For simplicity, we assume 
that the electromagnetic fields $\phi$, $A$ are time independent. 
We then show that quasifree representations 
as discussed above can provide physically 
acceptable many-particle theories. We first consider 
the Dirac case, which is somewhat simpler. 


Fermions 


One-particle formalism Recalling that $i\partial_t$ is the 
energy operator, we define the Dirac Hamiltonian $D$ 
by rewriting [3] in the following form: 

$$ i\partial_t\psi = D\psi, \qquad D = (-i\nabla + eA)\cdot\alpha + m\beta - e\phi $$   [36] 

This Dirac Hamiltonian is obviously a self-adjoint 
operator on the one-particle Hilbert space $h = L^2(\mathbb{R}^3) \otimes \mathbb{C}^4$, 
but, unlike the Schrödinger Hamiltonian in 
[1], it is not bounded from below: for any $E_0 > -\infty$, 
one can find a state $f$ such that the energy expectation 
value $(f, Df)$ is less than $E_0$. This can be easily seen for 
the simplest case where the external potential vanishes, 
$A = \phi = 0$. Then the eigenvalues of $D$ can be computed 
by Fourier transformation, and one finds 

$$ E = \pm\sqrt{p^2 + m^2}, \qquad p \in \mathbb{R}^3 $$   [37] 


Due to the negative energy eigenvalues we conclude 
that there is no ground state, and the Dirac 
Hamiltonian thus describes an unstable system, 
which is physically meaningless. 
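
The free spectrum [37] is easy to reproduce numerically. The sketch below assumes the standard Dirac representation of the matrices $\alpha_j$ and $\beta$ (one concrete choice among many unitarily equivalent ones; the article does not fix a representation) and arbitrary test values for the momentum `p` and mass `m`. After Fourier transformation the free Hamiltonian acts on each momentum mode as the $4 \times 4$ matrix $p \cdot \alpha + m\beta$.

```python
import numpy as np

# Pauli matrices
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
zero2, one2 = np.zeros((2, 2)), np.eye(2)

# Dirac representation (an assumed, conventional choice)
alpha = [np.block([[zero2, s], [s, zero2]]) for s in (s1, s2, s3)]
beta = np.block([[one2, zero2], [zero2, -one2]])

def anticommutator(a, b):
    return a @ b + b @ a

def free_dirac(p, m):
    """Momentum-space Dirac Hamiltonian for A = phi = 0."""
    return sum(pk * ak for pk, ak in zip(p, alpha)) + m * beta

p, m = np.array([0.3, -1.2, 0.5]), 0.8
levels = np.linalg.eigvalsh(free_dirac(p, m))  # ascending order
E = np.sqrt(p @ p + m ** 2)
print(levels, E)
```

Each of the two values $\pm E$ appears twice, reflecting the two spin orientations; since $|p|$ is unbounded, the negative branch has no lowest element.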

To summarize: a (unphysical) one-particle 
description of relativistic fermions is given by a 
Hilbert space $h$ together with a self-adjoint Hamil- 
tonian $D$ unbounded from below. Other observables 
and transformations are given by self-adjoint and 
unitary operators on $h$, respectively. 

Many-body formalism We now explain how to 
construct a physical many-body description from these 
data. To simplify notation, we first assume that $D$ has a 
purely discrete spectrum (which can be achieved by 
using a compact space). We can then label the eigen- 
functions $f_n$ by integers $n$ such that the corresponding 
eigenvalues $E_n > 0$ for $n > 0$ and $E_n < 0$ for $n < 0$. 
Using the naive representation of the fermion field 
algebra discussed in the last section, we get (we use the 
notation introduced in [22]) 

$$ q(D) = \sum_{n>0} |E_n|\,\psi_n^\dagger\psi_n - \sum_{n<0} |E_n|\,\psi_n^\dagger\psi_n $$   [38] 

which is obviously not bounded from below and thus 
not physically meaningful. However, $\psi_n^\dagger\psi_n = 1 - \psi_n\psi_n^\dagger$, 
which suggests that we can remedy this problem by 
interchanging the creation and annihilation operators 
for $n < 0$. This is possible: it is easy to see that 

$$ \hat\psi_n = \psi_n \quad \forall n > 0 \qquad \text{and} \qquad \hat\psi_n = \psi_n^\dagger \quad \forall n < 0 $$   [39] 

provides a representation of the algebra in [23]. We 
thus define 

$$ \hat q(D) = \sum_{n \in \mathbb{Z}} E_n\, {:}\hat\psi_n^\dagger\hat\psi_n{:} $$   [40] 

with the so-called normal ordering prescription 

$$ {:}\hat\psi_m^\dagger\hat\psi_n{:} = \hat\psi_m^\dagger\hat\psi_n - (\Omega, \hat\psi_m^\dagger\hat\psi_n\Omega) $$   [41] 

where we made use of the freedom of normalization 
explained after [23] to eliminate unwanted additive 
constants. We get $\hat q(D) = \sum_{n \in \mathbb{Z}} |E_n|\,\psi_n^\dagger\psi_n$, which is 
manifestly a non-negative self-adjoint operator with 
$\Omega$ as ground state. We have thus found a physical many- 
body description for our model. We can now define 
for other one-particle observables, 

$$ \hat q(A) = \sum_{m,n \in \mathbb{Z}} A_{mn}\, {:}\hat\psi_m^\dagger\hat\psi_n{:} $$   [42] 

and, by straightforward computations, we obtain 

$$ [\hat q(A), \hat q(B)] = \hat q([A, B]) + S(A, B) $$   [43] 

where 

$$ S(A, B) = \sum_{m<0} \sum_{n>0} (A_{mn}B_{nm} - B_{mn}A_{nm}), $$ 

that is, 

$$ S(A, B) = \mathrm{tr}(P_- A P_+ B P_- - P_- B P_+ A P_-) $$   [44] 

with $P_- = \sum_{n<0} f_n (f_n, \cdot)$ the projection onto the 
subspace spanned by the negative energy eigenvec- 
tors of $D$ and $P_+ = 1 - P_-$. One can show that $\hat q(A)$ 
is no longer defined for all operators but only if 

$$ P_- A P_+ \ \text{and} \ P_+ A P_- \ \text{are Hilbert-Schmidt operators} $$   [45] 


(we recall that $a$ is a Hilbert-Schmidt operator if 
$\mathrm{tr}(a^*a) < \infty$). The C-number term $S(A, B)$ in [43] is 
often called the Schwinger term and, unlike the 
similar term in [25], it is now nontrivial, that is, it is 
no longer possible to remove it by a redefinition 
$\hat q'(A) = \hat q(A) - b(A)$. This Schwinger term is an 
example of an anomaly, and it has various interest- 
ing implications. 
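
It is instructive to see why the nontriviality of the Schwinger term is a genuinely infinite-dimensional effect. In the sketch below, $h$ is replaced by $\mathbb{C}^6$ with a rank-3 projection $P_-$ (toy assumptions; the helper names are hypothetical). In this finite-dimensional setting the expression in [44] coincides with $b([A, B])$ for the linear function $b(X) = \mathrm{tr}(P_- X)$, so it is of the trivial form appearing in [25]. In infinite dimensions $\mathrm{tr}(P_- X)$ is in general not defined, and this route to removing the term is closed.

```python
import numpy as np

def schwinger_term(A, B, P_minus):
    """S(A, B) = tr(P_- A P_+ B P_- - P_- B P_+ A P_-), cf. [44]."""
    P_plus = np.eye(len(P_minus)) - P_minus
    return np.trace(P_minus @ A @ P_plus @ B @ P_minus
                    - P_minus @ B @ P_plus @ A @ P_minus)

def random_hermitian(n, rng):
    X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (X + X.conj().T) / 2

rng = np.random.default_rng(4)
n = 6
P_minus = np.diag([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # toy negative-energy projection
A, B = random_hermitian(n, rng), random_hermitian(n, rng)

S_AB = schwinger_term(A, B, P_minus)
S_BA = schwinger_term(B, A, P_minus)

def b(X):
    """Linear function b(X) = tr(P_- X); finite only because n is finite."""
    return np.trace(P_minus @ X)

print(S_AB, b(A @ B - B @ A))
```

The two printed numbers agree, and `S_AB = -S_BA` as required of a Lie algebra 2-cocycle.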

In a similar manner, one can construct the many- 
body transformations $\hat Q(U)$ of unitary operators $U$ 
on $h$ satisfying the same Hilbert-Schmidt condition 
in [45], and one obtains 

$$ \hat Q(U)\hat Q(V) = \chi(U, V)\,\hat Q(UV) $$   [46] 

with interesting phase-valued functions $\chi$. 

More generally, for any one-particle Hilbert 
space $h$ and Dirac Hamiltonian $D$, the physical 
representation is given by the quasifree representa- 
tion $\pi_{P_-}$ in [34] with $P_-$ the projection onto the 
negative energy subspace of $D$. The results about $\hat q$ 
and $\hat Q$ mentioned hold true in any such 
representation. 
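
The effect of interchanging creation and annihilation operators on the negative-energy modes can be illustrated with a handful of modes. The sketch below assumes a toy spectrum of four eigenvalues and the Jordan-Wigner realization of the fermion operators (illustrative choices; the names `psi_hat` and `normal_ordered` are hypothetical). With finitely many modes the naive operator is merely negative rather than unbounded from below; the unboundedness discussed in the text arises only with infinitely many negative-energy modes.

```python
import numpy as np
from functools import reduce

def fermion_modes(n):
    """Jordan-Wigner matrices for annihilation operators psi_0..psi_{n-1}."""
    I2, Z = np.eye(2), np.diag([1.0, -1.0])
    a = np.array([[0.0, 1.0], [0.0, 0.0]])  # maps occupied to empty
    return [reduce(np.kron, [Z] * j + [a] + [I2] * (n - j - 1)) for j in range(n)]

energies = np.array([-2.0, -0.5, 1.0, 3.0])  # toy eigenvalues of D
psi = fermion_modes(len(energies))           # real matrices, so .T is the adjoint
dim = 2 ** len(energies)
vacuum = np.zeros(dim)
vacuum[0] = 1.0

# Swap the roles of creation and annihilation for negative energies.
psi_hat = [p if E > 0 else p.T for p, E in zip(psi, energies)]

def normal_ordered(m, n):
    """:psi_hat_m^dagger psi_hat_n: = product minus its vacuum expectation value."""
    op = psi_hat[m].T @ psi_hat[n]
    return op - (vacuum @ op @ vacuum) * np.eye(dim)

q_naive = sum(E * p.T @ p for E, p in zip(energies, psi))
q_hat = sum(E * normal_ordered(k, k) for k, E in enumerate(energies))
q_abs = sum(abs(E) * p.T @ p for E, p in zip(energies, psi))

print(np.linalg.eigvalsh(q_naive).min(), np.linalg.eigvalsh(q_hat).min())
```

Here `q_hat` coincides with `q_abs`, its lowest eigenvalue is zero and is attained on `vacuum`, while the lowest eigenvalue of `q_naive` is the sum of the negative energies.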

Thus the one-particle Hamiltonian D determines 
which representation one has to use, and one 
therefore cannot construct the “physical” represen- 
tation without specific information about D. How- 
ever, not all these representations are truly different: 
if there is a unitary operator $\mathcal{U}$ on the Fock space 
$\mathcal{F}_-(h)$ such that 

$$ \mathcal{U}^* \pi_{P_-^{(1)}}(\Psi^{(\dagger)}(f))\,\mathcal{U} = \pi_{P_-^{(2)}}(\Psi^{(\dagger)}(f)) $$   [47] 

for all $f \in h$, then the quasifree representations 
associated with the different projections $P_-^{(1)}$ and 
$P_-^{(2)}$ are physically equivalent: one could equally well 
formulate the second model using the representation 
of the first. Two such quasifree representations are 
called unitarily equivalent, and a fundamental 
theorem due to Shale and Stinespring states that 
two quasifree representations $\pi_{P_-^{(1)}}$ and $\pi_{P_-^{(2)}}$ are unitarily 
equivalent if and only if $P_-^{(1)} - P_-^{(2)}$ is a Hilbert- 
Schmidt operator (a similar result holds true in the 
boson case). 


Bosons 


One-particle formalism As in the Dirac 
case, the solutions of the Klein-Gordon equation in 
[2] also do not define a physically acceptable one- 
particle quantum theory with a ground state: the 
energy eigenvalues in [37] for $A = \phi = 0$ are a 
consequence of relativistic invariance and thus 
equally true for the Klein-Gordon case. However, 
in this case there is a further problem. To find the 
one-particle Hamiltonian, one can rewrite the 
second-order equation in [2] as a system of first- 
order equations, 

$$ i\partial_t\Phi = K\Phi, \qquad \Phi = \begin{pmatrix} \psi \\ \pi^\dagger \end{pmatrix}, \qquad K = \begin{pmatrix} C & i \\ -iB^2 & C \end{pmatrix} $$   [48] 

with 

$$ B^2 = (-i\nabla + eA)^2 + m^2, \qquad C = -e\phi $$   [49] 

(the first row of [48] fixes the second component as $\pi^\dagger = (\partial_t - ie\phi)\psi$). 
Thus, one sees that the natural one-particle Hilbert 
space for the Klein-Gordon equation is 
$h = L^2(\mathbb{R}^3) \otimes \mathbb{C}^2$; here, and in the following, we 
identify $h$ with $h_0 \oplus h_0$, $h_0 = L^2(\mathbb{R}^3)$, and use a 
convenient $2 \times 2$ matrix notation naturally asso- 
ciated with that splitting. However, the one-particle 
Hamiltonian is not self-adjoint but rather obeys 

$$ K^* = JKJ, \qquad J = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix} $$   [50] 

with $*$ the Hilbert space adjoint. It is important to 
note that $J$ is a grading operator. Thus, we can 
define a sesquilinear form 

$$ (f, g)_J = (f, Jg) \qquad \forall f, g \in h $$   [51] 

with $(\cdot, \cdot)$ the standard inner product, and [50] is 
equivalent to $K$ being self-adjoint with respect to 
this sesquilinear form; in this case, we say that $K$ is 
$J$-self-adjoint. Thus, in the Klein-Gordon case, this 
sesquilinear form takes the role of the Hilbert space 
inner product and, in particular, not $(\Phi, \Phi)$ but $(\Phi, \Phi)_J$ is 
preserved under time evolution. However, unlike 
$\Phi^*\Phi$, $\Phi^* J \Phi$ is not positive definite, and it is 
therefore not possible to interpret it as a probability 
density as in conventional quantum mechanics. For 
consistency, one has to require that one-particle 
transformations $U$ are unitary with respect to $(\cdot, \cdot)_J$, 
that is, $U^{-1} = JU^*J$. We call such operators $J$-unitary. 

To summarize: a (unphysical) one-particle 
description of relativistic bosons is given by a 
Hilbert space of the form $h = h_0 \oplus h_0$, the grading 
operator $J$ in [50], and a $J$-self-adjoint Hamiltonian 
$K$ of the form as in eqn [48], where $B > 0$ and $C$ are 
self-adjoint operators on $h_0$. Other observables and 
transformations are given by $J$-self-adjoint and 
$J$-unitary operators on $h$, respectively. 


Many-body formalism We first consider the quasi- 
free representation $\pi_{P_-^{(0)}}$ of the boson field algebra 
$\mathcal{A}_+(h)$ so that the grading operator in [35] is 
equal to $J$, that is, $P_-^{(0)} = (1 - J)/2$. Writing 
$\pi_{P_-^{(0)}}(\Psi^{(\dagger)}(f)) = \psi^{(\dagger)}(f)$, one finds that 

$$ q(A)^* = q(JA^*J), \qquad Q(U)^* = Q(JU^*J) $$   [52] 

and thus $J$-self-adjoint operators and $J$-unitary 
operators are mapped to proper observables and 
transformations. In particular, $q(K)$ is a self-adjoint 
operator, which resolves one problem of the one-particle 
theory. However, $q(K)$ is not bounded from below, and 
thus $\pi_{P_-^{(0)}}$ is not yet the physical representation. 

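The physical representation can be constructed 
using the operators 

$$ T = \frac{1}{\sqrt{2}} \begin{pmatrix} B^{1/2} & iB^{-1/2} \\ B^{1/2} & -iB^{-1/2} \end{pmatrix}, \qquad F = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} $$   [53] 

(for simplicity, we restrict ourselves to the case $C = 0$ 
and $B > 0$; we use the calculus of self-adjoint operators 
here) with the following remarkable properties: 

$$ T^{-1} = JT^*F, \qquad TKT^{-1} = \begin{pmatrix} B & 0 \\ 0 & -B \end{pmatrix} \equiv \hat K $$   [54] 

One can check that 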


DA =T, AAETH [55] 


is a quasifree representation $\pi_{P_-}$ of $\mathcal{A}_+(h)$ with 
$P_- = (1 - F)/2$. With that the construction of $\hat q$ and 
$\hat Q$ is very similar to the fermion case described 
above (the crucial simplification is that $\hat K$ and $F$ now 
are diagonal). In particular, $\hat q(K)$ is a non-negative 
operator with the ground state $\Omega$, and $\hat q(A)$ and 
$\hat Q(U)$ are self-adjoint and unitary for every one- 
particle observable $A$ and transformation $U$, respec- 
tively. One also gets relations as in [43] and [46]. 
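
The algebraic properties of the transformation $T$ can be checked with finite matrices. The sketch below rests on explicit assumptions: $h_0$ is replaced by $\mathbb{C}^4$, $B^2$ by a random positive matrix, $C = 0$, and the block forms $K = \begin{pmatrix} 0 & i \\ -iB^2 & 0 \end{pmatrix}$, $J = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}$, $F = \mathrm{diag}(1, -1)$ and $T = 2^{-1/2}\begin{pmatrix} B^{1/2} & iB^{-1/2} \\ B^{1/2} & -iB^{-1/2} \end{pmatrix}$ are taken as one choice consistent with the relations $K^* = JKJ$ and $T^{-1} = JT^*F$ quoted in the text.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B2 = X.conj().T @ X + np.eye(n)  # toy positive operator standing in for B^2
w, V = np.linalg.eigh(B2)

def power(s):
    """(B^2)**s via the spectral decomposition of B^2."""
    return (V * w ** s) @ V.conj().T

B, sqrtB, inv_sqrtB = power(0.5), power(0.25), power(-0.25)
Z0, I = np.zeros((n, n)), np.eye(n)

K = np.block([[Z0, 1j * I], [-1j * B2, Z0]])  # one-particle Hamiltonian, C = 0
J = np.block([[Z0, 1j * I], [-1j * I, Z0]])   # grading operator
F = np.block([[I, Z0], [Z0, -I]])
T = np.block([[sqrtB, 1j * inv_sqrtB], [sqrtB, -1j * inv_sqrtB]]) / np.sqrt(2)

T_inv = J @ T.conj().T @ F                    # claimed inverse of T
K_hat = np.block([[B, Z0], [Z0, -B]])

print(np.linalg.norm(T @ T_inv - np.eye(2 * n)),
      np.linalg.norm(T @ K @ T_inv - K_hat))
```

Both printed norms vanish up to rounding. Note that `K` itself is not Hermitian, only $J$-self-adjoint, whereas the transformed operator `K_hat` is Hermitian and block diagonal with blocks $B$ and $-B$.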


Related Topics of Recent Interest 


The impossibility of constructing relativistic quantum- 
mechanical models played an important role in the 
early history of quantum field theory, as beautifully 
discussed in chapter 1 of Weinberg (1995). 

The abstract formalism of quasifree representations 
of fermion and boson field algebras was developed in 
many papers (see, e.g., Ruijsenaars (1977), Grosse and 
Langmann (1992), and Langmann (1994) for explicit 
results on $\hat Q$ and $\chi$). A nice textbook presentation 
with many references can be found in chapter 13 of 
Gracia-Bondia et al. (2001) (this chapter is rather self- 
contained but mainly restricted to the fermion case). 

Based on the Shale-Stinespring theorem, there has 
been a considerable amount of work to investigate 
whether the quasifree representations associated 
with different external electromagnetic fields 
$\phi_1, A_1$ and $\phi_2, A_2$ are unitarily equivalent, whether and 
which time-dependent many-body Hamiltonians 
exist, etc. (see chapter 13 of Gracia-Bondia et al. 
(2001), and references therein). 

The Lie algebra $\mathfrak{g}_2$ of Hilbert 
space operators satisfying the condition in [45] is an 
interesting infinite-dimensional Lie algebra with a 
beautiful representation theory. This subject is closely 
related to conformal field theory (see, e.g., Kac and 
Raina (1987) for a textbook presentation and Carey 
and Ruijsenaars (1987) for a detailed mathematical 
account within the framework described by us). 

It turns out that the mathematical framework 
discussed in the previous section is sufficient for 
constructing fully interacting quantum field theories, 
in particular Yang-Mills gauge theories, in 1 + 1 
but not in higher dimensions. The reason is that, in 
3 + 1 dimensions, the one-particle observables $A$ of 
interest do not obey the Hilbert-Schmidt condition 
in [45] but only the weaker condition 

$$ \mathrm{tr}\,(a^*a)^n < \infty, \qquad a = P_\mp A P_\pm $$   [56] 

with $n = 2$, and the natural analog of $\mathfrak{g}_2$ in 3 + 1 
dimensions thus seems to be the Lie algebra $\mathfrak{g}_{2n}$ of 
operators satisfying this condition with $n = 2$. Various 
results on the representation theory of such Lie 
algebras $\mathfrak{g}_{2n}$ with $n > 1$ have been developed (see Mickelsson 
(1989), where various interesting relations to infinite- 
dimensional geometry are also discussed). 

As mentioned, the Schwinger term S(A,B) in [44] is 
an example of an anomaly. Mathematically, it is a 
nontrivial 2-cocycle of the Lie algebra g2, and analogs 
for the groups 22,52 have been found. These cocycles 
provide a natural generalization of anomalies (in the 
meaning of particle physics) to operator algebras. They 
not only shed some interesting light on the latter, but 
also provide a link to notions and results from 
noncommutative geometry (see, e.g., Gracia-Bondia 
et al. (2001)). We believe that this link can provide a 
fruitful driving force and inspiration to find ways to 
deepen our understanding of quantum Yang-Mills 
theories in 3 + 1 dimensions (Langmann 1996). 


See also: Anomalies; C*-Algebras and Their 
Classification; Dirac Fields in Gravitation and Nonabelian 
Gauge Theory; Dirac Operator and Dirac Field; Gerbes in 
Quantum Field Theory; Quantum Field Theory in Curved 
Spacetime; Quantum n-Body Problem; Superfluids; 
Two-Dimensional Models. 

Introduction 


There is a common practice in mathematics of placing a 
boundary on an object which may not appear to come 
naturally equipped with one; this is often thought of as 
adding ideal points to the object. Perhaps the most 
famous example is the addition of a single “point at 
infinity” to the complex plane, resulting in the Riemann 
sphere: this is a boundary point in the sense of providing 
an ideal endpoint for lines and other endless curves in 
the plane. Often, there is more than one reasonable way 
to construct a boundary for a given object, depending 
on the intent; for instance, the plane is sometimes 
equipped, not with a single point at infinity, but with a 
circle at infinity, resulting in a space homeomorphic to a 
closed disk. Both these boundaries on the plane have 
useful but different things to tell us about the nature of 
the plane; the common feature is that, by bringing the 
infinite reach of the plane within the confines of a more 
finite object, we are better able to grasp the behavior of 
the original object. 

The general usefulness of the construction of 
boundaries for an object is to allow behavior of 
structures in the “completed” object to aid in 
visualization of behavior in the original object, 
such as by providing a degree of measurement or 
other classification of processes at infinity. This 
utility has not been overlooked for spacetimes. A 
variety of purposes may be served by various 
boundary construction methods: providing a locale 
for singularities (as the spacetime itself is modeled 
by a smooth manifold with a smooth metric, free of 
singular points); providing a platform from which to 
measure global properties such as total energy or 
angular momentum; displaying in finite form the 
causal structure at infinity; or providing a compact 
(or quasicompact) topological envelope for the 
spacetime while preserving the causal structure. 


This article will consider several of the methods 
that have been used or proposed for constructing 
boundaries for spacetimes, ranging from the ad hoc 
(but practical) to the universal. Perhaps the 
simplest way to classify these methods is into 
those which employ or analyze embeddings of the 
spacetime in question and those that do not. 


Boundaries from Embeddings 
General 


The simplest and most common method of constructing 
a boundary for a spacetime M is to find a suitable 
manifold N (of the same dimension) and an appropriate 
map $\phi: M \to N$ which is a topological embedding, 
that is, a homeomorphism onto its image $\phi(M)$. 
We can consider $\bar{M}_\phi$, the closure of $\phi(M)$ in N, as the 
$\phi$-completion of M, and $\partial_\phi(M) = \bar{M}_\phi - \phi(M)$ as the 
$\phi$-boundary. Typically, this embedding is chosen in 
such a way that curves of interest in M (such as 
timelike or null geodesics or causal curves of bounded 
acceleration) which have no endpoints in M do have 
endpoints in $\partial_\phi(M)$; in other words, if $c: [0, \infty) \to M$ is 
such a curve of interest, then $\lim_{t \to \infty} \phi(c(t))$ exists in N. 

The common practice, initiated by Penrose in 
1967, is to choose N to be another spacetime 
(often called the unphysical spacetime, while M is 
considered the spacetime of physical interest) and to 
require the embedding $\phi$ to be a conformal mapping, 
that is, $\phi$ carries the spacetime metric in M to a scalar 
multiple of the spacetime metric in N. As conformal 
maps preserve the local causal structure, leaving 
unchanged the notions of timelike curve or null 
curve, this means that $\bar{M}_\phi$ inherits from N a causal 
structure which, locally, is an extension of that of M. 
This allows us to speak of causal relationships within 
$\bar{M}_\phi$, closely related to those in M. 


Minkowski Space 


The prototypical example is the conformal embedding 
of Minkowski space into the Einstein static spacetime. 


Let $\mathbb{R}^n$ denote Euclidean n-space, $S^n$ the unit 
n-sphere, and $L^n$ Minkowski n-space, that is, $\mathbb{R}^n$ with 
metric $ds^2 = dx_1^2 + \cdots + dx_{n-1}^2 - dt^2$ (so 
$L^n = \mathbb{R}^{n-1} \times L^1$). The n-dimensional Einstein static 
spacetime is the product spacetime $E^n = S^{n-1} \times L^1$. 
Consider $S^{n-1}$ as embedded in $\mathbb{R}^n = \mathbb{R}^{n-1} \times \mathbb{R}^1$. Then the 
conformal embedding is $\phi: L^n \to E^n$, expressed as 
$\phi: \mathbb{R}^{n-1} \times L^1 \to S^{n-1} \times L^1 \subset \mathbb{R}^{n-1} \times \mathbb{R}^1 \times L^1$, given 
by $\phi(x, t) = ((x/|x|) \sin\theta, \cos\theta, \tau)$, where 
$\theta = \tan^{-1}(t + |x|) - \tan^{-1}(t - |x|)$ and 
$\tau = \tan^{-1}(t + |x|) + \tan^{-1}(t - |x|)$. The boundary 
$\partial_\phi(L^n)$ consists of the following: the points 
$\{\theta + \tau = \pi;\ 0 < \tau < \pi\}$, composed of an $S^{n-2}$ of null 
lines coming together at the point $i^+ = (0, 1, \pi)$; a 
similar cone of null lines $\{\theta - \tau = \pi;\ -\pi < \tau < 0\}$ with 
vertex at $i^- = (0, 1, -\pi)$; and a single limit-point for 
both cones at $i^0 = (0, -1, 0)$. The $\tau > 0$ null cone is 
called $\mathscr{I}^+$ (the letter is read "scri" for "script-I"), 
its counterpart $\mathscr{I}^-$ (Figures 1 and 2). As all 
future-directed timelike geodesics in $L^n$ have $i^+$ as an 
endpoint in $E^n$, $i^+$ is called future-timelike infinity; 
similarly, $i^-$ is past-timelike infinity. Every 
future-directed null geodesic ends up on $\mathscr{I}^+$, which is thus 
termed future-null infinity, and $\mathscr{I}^-$ is past-null infinity. 
All spacelike geodesics come to $i^0$, spacelike infinity. 

For n = 2, this picture produces the familiar 
diamond representation of $L^2$ (Figure 3): as $E^2$ is 
easily unrolled into another copy of $L^2$ (metric 
$d\theta^2 - d\tau^2$), this means that $\phi(L^2)$ is the region 
$|\theta| + |\tau| < \pi$ in $L^2$; timelike curves and null geodesics in 
the original $L^2$ are the same as in $\phi(L^2)$, and their 
endpoints in the boundary of the diamond are 
evident. For higher dimensions, the picture is not as 
visually obvious, since $E^n$ cannot be unrolled; but the 
principle of reading the causal structure at infinity of 
$L^n$ via its boundary points in $E^n$ remains the same. 

Figure 1 $L^2$ conformally embedded in $E^2 = S^1 \times L^1$. 

Figure 2 $L^3$ conformally embedded in $E^3 = S^2 \times L^1$. 

Figure 3 $L^2$ conformally embedded in unrolled $E^2$, i.e., 
$\mathbb{R}^1 \times L^1 = L^2$. 
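
The endpoint statements above can be checked numerically. The following is a minimal sketch, not part of the original construction: the function names `embed` and `image_point` and the sample geodesics are illustrative assumptions. It evaluates $(\theta, \tau)$ far along a timelike, a null, and a spacelike geodesic and compares the results with $i^+$, $\mathscr{I}^+$ and $i^0$.

```python
import math

def embed(r, t):
    """Return (theta, tau) for a point of L^n with radius r = |x| and time t."""
    p = math.atan(t + r)
    q = math.atan(t - r)
    return p - q, p + q

def image_point(x, t):
    """Image ((x/|x|) sin(theta), cos(theta), tau) in R^{n-1} x R^1 x L^1."""
    r = math.sqrt(sum(c * c for c in x))
    theta, tau = embed(r, t)
    direction = [c / r for c in x] if r > 0 else [0.0] * len(x)
    return [d * math.sin(theta) for d in direction], math.cos(theta), tau

# Timelike geodesic r = 0.5 t: far along it, (theta, tau) is close to (0, pi),
# the coordinates of i^+.
theta_t, tau_t = embed(0.5e9, 1.0e9)

# Outgoing null geodesic t - r = -1: far along it, theta + tau is close to pi
# with 0 < tau < pi, so the endpoint lies on scri^+.
theta_n, tau_n = embed(1.0e9 + 1.0, 1.0e9)

# Spacelike geodesic t = 0: far along it, (theta, tau) is close to (pi, 0),
# the coordinates of i^0.
theta_s, tau_s = embed(1.0e9, 0.0)

# The spatial part of every image point lies on the unit sphere.
spatial, height, _ = image_point([3.0, 4.0], 2.0)
sphere_defect = sum(v * v for v in spatial) + height * height - 1.0
```

Different null geodesics land on different points of $\mathscr{I}^+$ (here $\tau \approx \pi/4$), whereas every timelike geodesic reaches the single point $i^+$.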


Conformal Embeddings 


Various formulations have been designed to 
emulate the conformal mapping of $L^n$ for 
spacetimes which are, in some sense, asymptotically 
like Minkowski space, by conformally mapping them into 
larger spacetimes. A spacetime M with metric g is 
called asymptotically simple or (alternatively) 
asymptotically flat if there is a spacetime N with metric h, 
an embedding $\phi: M \to N$, and a scalar function $\Omega$ 
defined on N with $\phi^* h = (\Omega \circ \phi)^2 g$ (i.e., $\phi$ is 
conformal with $\Omega^2$ the conformal factor) and $\Omega = 0$ 
on $\partial_\phi(M)$, $d\Omega \neq 0$ on $\partial_\phi(M)$, and various other 
restrictions on $\Omega$, depending on the intent. One can 
define asymptotic symmetries of M by means of 
motions within $\partial_\phi(M)$, leading to notions of global 
energy and angular momentum (see Hawking and 
Ellis (1973) and Wald (1984) for details). 
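
As a worked illustration of this definition, consider the Minkowski embedding into the Einstein static spacetime. The derivation below is a sketch under stated assumptions: the auxiliary variables $p$, $q$, the radial coordinate $r = |x|$, and the symbol $d\omega^2$ for the round metric on the unit $S^{n-2}$ are introduced here and do not appear in the text.

```latex
% Set p = \tan^{-1}(t + r), q = \tan^{-1}(t - r), so that \theta = p - q, \tau = p + q.
\begin{align*}
g &= -dt^2 + dr^2 + r^2\, d\omega^2
   = -\,d(t-r)\, d(t+r) + r^2\, d\omega^2 \\
  &= \sec^2 p \,\sec^2 q \left( -\,dp\, dq + \tfrac14 \sin^2(p - q)\, d\omega^2 \right) \\
  &= \frac{1}{4 \cos^2 p \,\cos^2 q}
     \left( d\theta^2 + \sin^2\theta \, d\omega^2 - d\tau^2 \right), \\[4pt]
\Omega &= 2 \cos p \,\cos q = \cos\theta + \cos\tau , \qquad
h = d\theta^2 + \sin^2\theta \, d\omega^2 - d\tau^2 = \Omega^2 g .
\end{align*}
% \Omega vanishes exactly where \theta \pm \tau = \pi, and there
% d\Omega = -\sin\theta\, d\theta - \sin\tau\, d\tau ,
% which is nonzero for 0 < \theta < \pi and vanishes at i^0 and i^\pm.
```

In this example $\Omega = 0$ on the whole boundary, while $d\Omega \neq 0$ holds on the two null cones but fails at the three exceptional points $i^0$, $i^+$ and $i^-$.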


Classifications of Embeddings 


As a general rule, there is no uniqueness in the 
choice of an embedding $\phi$ for a spacetime M to 
construct a boundary, nor in the topology of the 
resulting boundary $\partial_\phi(M)$, or even of which curves 
of interest end up having endpoints in the boundary. 
In an attempt to categorize which embeddings yield 
equivalent results and what sort of results there are 
in terms of endpoints of curves, Scott and Szekeres 
(1994) formulated what they called the abstract 
boundary of a spacetime. This depends on a choice 
of class of “interesting” curves, each characterizable 
as having either infinite or finite parameter length; 
typical choices for this class would be timelike 
geodesics or causal geodesics or timelike curves of 
bounded acceleration. For instance, a boundary 
point may be said to represent a singularity with 
respect to the chosen class of curves if it is the 
endpoint of one such curve with finite parameter 
length; nonsingular points are points at infinity. 
These classifications do not require conformal 
embeddings, nor even that the target of the embed- 
dings be spacetimes; they accommodate boundaries 
of a far more general type than Penrose’s notion 
stemming from conformal embeddings. 

A somewhat different study of boundaries from 
embeddings has been formulated by Garcia-Parrado 
and Senovilla (2003), classifying points at infinity and 
singularities in $\partial_\phi(M)$ for embeddings $\phi: M \to N$ in 
which N is a spacetime, $\phi$ preserves the chronology 
relation $\ll$, and there is also a diffeomorphism 
$\psi: \phi(M) \to N$ which again preserves $\ll$ (the 
chronology relation in a spacetime is defined thus: $x \ll y$ if 
and only if there is a future-directed timelike curve 
from x to y). This scheme applies more generally than 
to conformal embeddings, but the requirement for 
chronology-preserving maps in both directions 
guarantees a strong sensitivity to causality; it amounts to a 
mild extension of Penrose's notion that is often much 
easier to construct. 


Universal Constructions 
B-Boundary 


Attempts have been made to formulate boundary 
concepts specifically for defining singularities as 
ideal endpoints for finite-length geodesics. The 
most complete venture in this direction is the 
b-boundary (“b” for “bundle”) of Schmidt (Hawking 
and Ellis 1973, pp. 276-284). This is a formulation 
that takes note only of the connection in the linear 
frames bundle L(M) of a spacetime M (or of any 
manifold with a linear connection, metric or other- 
wise); in other words, it takes no particular note of 
the spacetime metric or even of the causal structure of 
the spacetime, but only of the notion of parallel 
translation of tangent vectors along curves. Parallel 
translation of a frame (a basis for the tangent space) 
along a curve is used to obtain an ad hoc length for 
the curve by treating the translated frame as positive- 
definite orthonormal at each point; whether this 
length is finite or infinite is independent of the choice 
of the original frame. The Schmidt construction 
defines a boundary on M which gives an endpoint for 
each curve, endless in M, which is finite in that sense: 
select a positive-definite metric on L(M), give it a 
boundary by means of Cauchy completion, and then 
take the appropriate quotient by the bundle group. 
This has an appealing universality of application, but 
the problems of putting it into practice are quite 
formidable. Also, the fact that it takes no special note 
of the spacetime character of M suggests that it may 
not be of particular utility for physical insights. 


Causal Boundary: Basics 


In 1972 Geroch, Kronheimer, and Penrose (GKP) 
formulated a notion of boundary, the causal 
boundary, that is specifically adapted to the causal 
character of a spacetime M; indeed, it is defined in 
such a way that one need know only the chronology 
relation $\ll$ on M without any further reference to 
the metric (another way of saying this is that the 
causal boundary is conformally invariant). Like 
Schmidt’s b-boundary, the causal boundary is a 
universal construction, not depending on any extra- 
neous choices; however, although it has an obvious 
clarity in its causal structure, there are subtleties in 
the choice of an appropriate topology which are 
perhaps not yet fully resolved. As this boundary 
construction appears to embody the best hopes for a 
practical universal construction, it is detailed here in 
some depth. 

The causal boundary construction applies only to 
strongly causal spacetimes; essentially, this means 
that the local causal structure at each point is 
exactly reflective of the global causal structure. 

The basic construction of the causal boundary of 
a spacetime M starts with two separate parts: the 
future and past (pre-)boundaries of M, intended as 
yielding endpoints for, respectively, future- and past- 
endless causal curves. Part of the difficulty of the 
causal boundary is knowing how best to meld these 
two into one; currently, there are several answers to 
this conundrum. 

The elements of the future causal boundary of M 
are defined in terms of the past-set operator $I^-$. For 
a point $x \in M$, the past of x is $I^-(x) = \{y \mid y \ll x\}$; for 
a set $A \subset M$, $I^-[A] = \bigcup_{x \in A} I^-(x)$. A set $P \subset M$ is 
called a past set if $I^-[P] = P$; anything of the form 
$P = I^-[A]$ is a past set, and all past sets have this 
form. A past set P is an indecomposable past set (IP) 
if P cannot be written as $P_1 \cup P_2$ for past sets which 
are proper subsets $P_i \subsetneq P$. IPs come in exactly two 
varieties: pointlike IPs (PIPs), of the form $I^-(x)$ 
(Figure 4), and terminal IPs (TIPs), of the form $I^-[c]$ 
for c a future-endless causal curve (Figure 5). (Of 
course, any $I^-(x)$ can also be expressed as $I^-[c]$ for c 
a causal curve ending at x.) The future causal 
boundary of M, $\hat\partial(M)$, consists of all the TIPs of M; 
the future causal completion of M is $\hat{M} = \hat\partial(M) \cup M$. 
But that is just a set; the causal structure of M needs 
to be extended to $\hat{M}$. 

Figure 4 PIP $P = I^-(x)$. 

For any $x \in M$ and $P \in \hat\partial(M)$, set $x \ll P$ if and 
only if $x \in P$; set $P \ll x$ if and only if $P \subset I^-(y)$ for 
some $y \ll x$ ($y \in M$); and for P and Q in $\hat\partial(M)$, set 
$P \ll Q$ if and only if $P \subset I^-(y)$ for some $y \in Q$. 
If we consider this an extension of the $\ll$ relation on 
M, then we end up with a relation which, like that 
on M, is transitive and antireflexive. Furthermore, it 
has the property that for all $\alpha, \beta \in \hat{M}$, $\alpha \ll \beta$ if and 
only if for some $x \in M$, $\alpha \ll x \ll \beta$. (One can also 
amend the chronology relation within M to be more 
like the definition in the extension; that is not of 
major import.) 

We can also extend the causality relation $\prec$ on M 
to one on $\hat{M}$ (in M, $x \prec y$ if there is a future-directed 
causal curve from x to y): for $x \in M$ and $P, Q \in 
\hat\partial(M)$, $x \prec P$ for $I^-(x) \subset P$, $P \prec x$ for $P \subset I^-(x)$, and 
$P \prec Q$ for $P \subset Q$. 

Figure 5 TIP $P = I^-[c]$. 

The intent is to have the elements of $\hat\partial(M)$ provide 
future endpoints for future-endless causal curves in 
M; in particular, we want two such curves, $c_1$ and 
$c_2$, to be assigned the same future endpoint precisely 
when $I^-[c_1] = I^-[c_2]$. This is accomplished by the 
simple expedient of defining the future endpoint of a 
future-endless causal curve c to be $P = I^-[c]$. We do 
not have a topology on $\hat{M}$ as yet, but it is worth 
noting that if P is the assigned future endpoint of c, 
then $I^-(P) = I^-[c]$; this is at least the correct causal 
behavior for a putative future endpoint of c. 

We can perform all the operations above in the 
time-dual manner, obtaining the past causal 
boundary $\check\partial(M)$, consisting of terminal indecomposable 
future sets (TIFs), and the past causal completion 
$\check{M} = \check\partial(M) \cup M$. The full causal boundary of M 
consists of the union of $\hat\partial(M)$ with $\check\partial(M)$ with some 
sort of identifications to be made. 

As an example of the need for identifications, 
consider M to be $L^2$ with a closed timelike line 
segment deleted, say $M = L^2 - \{(0, t) \mid 0 \le t \le 1\}$. 
For $\hat\partial(M)$, we have first the boundary elements at 
infinity: the TIP $i^+ = M$ (the past of the positive time 
axis) and the set of TIPs making up $\mathscr{I}^+$ (the pasts of 
null lines going out to infinity in $L^2$); and then, the 
boundary elements coming from the deleted points: 
for each t with $0 < t \le 1$, two IPs emanating from 
(0, t), that is, $P_t^+$, the past of the null line going 
pastwards from (0, t) toward x > 0, and $P_t^-$, the past 
of the null line going pastwards from (0, t) toward 
x < 0; and $P_0$, emanating from (0, 0), that is, the 
past of the negative time axis. Similarly, $\check\partial(M)$ 
consists of $i^-$, $\mathscr{I}^-$, TIFs $F_t^+$ and $F_t^-$ emanating from 
(0, t) for $0 \le t < 1$, and the TIF $F_1$ emanating from 
(0, 1). We probably want to make at least the 
following identifications: for each t with 0 < t < 1, 
$P_t^+ = F_t^+$ and $P_t^- = F_t^-$; $P_1^+ = F_1 = P_1^-$; and 
$F_0^+ = P_0 = F_0^-$. This results in a two-sided replacement 
for the deleted segment; for some purposes, it might 
be deemed desirable to identify the two sides as one, 
but a universal boundary is probably a good idea, 
leaving further identifications as optional quotients 
of the universal object. 

How best to define the appropriate identifications 
in general is a matter of some controversy. GKP 
defined a somewhat complicated topology on the union 
$\hat\partial(M) \cup \check\partial(M) \cup M$, then used an identification 
intended to result in a Hausdorff space. There are 
significant problems with this approach in some 
outré spacetimes, as pointed out by Budic and Sachs 
(1974) and Szabados (1989), both of whom 
recommended a different set of identifications. But what is 
of more concern is that the topology prescribed by 
GKP is not what might be expected in even the 
simplest of cases, for example, Minkowski space: $L^n$ 
needs no identifications among boundary points (no 
matter whose identification procedure is followed). 
The GKP topology for $L^n$, restricted to $\hat\partial(L^n)$, is not 
that of a cone ($S^{n-2} \times \mathbb{R}^1$ with a point added), as is 
the case for $\mathscr{I}^+$ in the conformal embedding into $E^n$; 
but, instead, each null line in $\hat\partial(L^n)$ (not including $i^+$) 
is an open set, and $i^+$ has no neighborhood in $\hat\partial(L^n)$ 
save for the entire boundary. This is a topology 
bearing no relation at all to that of any embedding. 


Future Causal Boundary 


Construction 

An alternative approach, initiated by 
Harris (1998), is to forgo the full causal boundary 
and concentrate only on $\hat{M}$ and $\check{M}$ separately. There 
is an advantage to this in that the process of future 
causal completion (that is to say, forming $\hat{M}$ from 
M) can be made functorial in an appropriate 
category of "chronological sets": a set X with a 
relation $\ll$ which is transitive and antireflexive such 
that it possesses a countable subset S which is 
"chronologically dense," that is, for any $x, y \in X$ 
with $x \ll y$, there is some $s \in S$ with $x \ll s \ll y$. Any strongly 
causal spacetime M is a chronological set, as is $\hat{M}$. 
The entire construction of the future causal 
boundary works just as well for a chronological set. The 
role of a timelike curve in a chronological set is 
taken by a future chain: a sequence $c = \{x_n\}$ with 
$x_n \ll x_{n+1}$ for all n. For any future chain c, $I^-[c]$ is an 
IP, and any IP can be so expressed; but unlike in 
spacetimes, $I^-(x)$ may or may not be an IP for $x \in X$. 
Then, $\hat{X}$ is always future complete in the sense that 
for any future chain c in $\hat{X}$, there is an element $\alpha \in \hat{X}$ 
with $I^-(\alpha) = I^-[c]$: for instance, if the chain c lies in 
X but there is no $x \in X$ with $I^-(x) = I^-[c]$, just let 
$\alpha = I^-[c]$, which is an element of $\hat\partial(X)$. This yields a 
functor of future completion from the category of 
chronological sets to the category of future-complete 
chronological sets, and the embedding $X \to \hat{X}$ is a 
universal object in the sense of category theory; 
this implies that it is categorically unique and is the 
minimal future-completion process. 
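
The axioms of a chronological set can be spot-checked on a concrete example. The sketch below is illustrative only and is not taken from the cited construction: it uses $L^2$ with the chronology relation written out explicitly, a finite sample of rational points, and the hypothetical helper names `chron` and `between`. A finite sample cannot prove the axioms; it only shows what each axiom asserts.

```python
from fractions import Fraction
import itertools

def chron(x, y):
    """x << y in L^2: y lies strictly inside the future light cone of x."""
    (x1, t1), (x2, t2) = x, y
    return t2 - t1 > abs(x2 - x1)

def between(x, y):
    """Midpoint of x and y; for x << y it satisfies x << m << y."""
    return ((x[0] + y[0]) / 2, (x[1] + y[1]) / 2)

# Finite sample of points (space, time) with rational coordinates. The set of
# all rational points is countable, and midpoints of rational points are rational.
pts = [(Fraction(a, 2), Fraction(b, 2))
       for a in range(-3, 4) for b in range(-3, 4)]

antireflexive = all(not chron(p, p) for p in pts)

transitive = all(chron(p, r)
                 for p, q, r in itertools.product(pts, repeat=3)
                 if chron(p, q) and chron(q, r))

dense = all(chron(p, between(p, q)) and chron(between(p, q), q)
            for p, q in itertools.product(pts, repeat=2)
            if chron(p, q))

# A future chain: x_n = (0, 1 - 1/n) climbs the time axis toward (0, 1).
chain = [(Fraction(0), 1 - Fraction(1, n)) for n in range(1, 20)]
is_future_chain = all(chron(a, b) for a, b in zip(chain, chain[1:]))
```

The chain in the last lines never reaches the point (0, 1), yet its past coincides with the past of that point; if (0, 1) were deleted from the spacetime, the same chain would generate a TIP instead of a PIP.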

However, it is crucial to have more than the 
chronology relation operating in what is to be a 
boundary; topology of some sort is needed. This is 
accomplished by defining what might be called the 
future-chronological topology for any chronological 
set, including for $\hat{M}$ when M is a strongly causal 
spacetime. This topology is defined by means of a 
limit-operator L on sequences: if X is the 
chronological set, then for any sequence of points $\sigma = \{x_n\}$ 
in X, $L(\sigma)$ denotes a subset of X which is the set of 
limits of $\sigma$. It is explicitly recognized that there may 
be more than one limit of a sequence, as the space 
may not be Hausdorff; no attempt is made to 
remove any non-Hausdorffness, as this is viewed as 
giving important information on how, possibly, 
two points in the future causal boundary represent 
very similar and yet not identical pieces of 
information about the causal structure at infinity. 
Once the limit operator is in place, the actual 
topology on X is defined thus: a subset $A \subset X$ is 
said to be closed if and only if for any sequence 
$\sigma \subset A$, $L(\sigma) \subset A$ (and open sets are complements of 
closed sets). This yields the elements of $L(\sigma)$ as 
topological limits of $\sigma$. 

The definition of L is simplest when X has the 
property that $I^-(x)$ is an IP for any $x \in X$; as this is 
true for X being either a spacetime M or the future 
causal completion $\hat{M}$ of a spacetime, the discussion 
here is restricted to this situation. Let us also make 
the common assumption that X is past-distinguishing, 
that is, $I^-(x) = I^-(y)$ implies x = y. 

Let $\sigma = \{x_n\}$ be a sequence of points in a 
past-distinguishing chronological set X in which the past 
of any point is an IP. Then $L(\sigma)$ consists of those 
points x for which (see Figures 6 and 7) 


1. for all $y \in I^-(x)$, for n sufficiently large, $y \ll x_n$, 
and 

2. for any IP $P \supsetneq I^-(x)$, there is some $z \in P$ such that 
for n sufficiently large, $z \not\ll x_n$. 


Then the future-chronological topology on X has 
these features: 


1. It is a $T_1$ topology, that is, points are closed. 

2. If $I^-(x) = I^-[c]$ for a future chain $c = \{x_n\}$, then x 
is a topological limit of the sequence $\{x_n\}$. 

3. If X = M, a strongly causal spacetime, then the 
future-chronological topology is precisely the 
manifold topology. 

4. If $X = \hat{M}$, the future causal completion of a 
strongly causal spacetime M, then the induced 
topology on M is the manifold topology, $\hat\partial(M)$ is 
a closed subset of $\hat{M}$, and M is dense in $\hat{M}$. As per 
property (2), for any future-endless causal curve c 
in M, the point $I^-[c]$ in $\hat\partial(M)$ is the topological 
endpoint of c in $\hat{M}$. 

5. If $X = \hat{L}^n$, then X is homeomorphic to the conformal 
image of $L^n$ in $E^n$ together with $\mathscr{I}^+$ and $i^+$; in 
particular, $\hat\partial(L^n)$ has the topology of a cone. 

Figure 6 $x \in L(\{x_n\})$: for all $y \in I^-(x)$, eventually $y \ll x_n$, and for 
all IP $P \supsetneq I^-(x)$, there is some $z \in P$ such that eventually $z \not\ll x_n$. 

Figure 7 $x \notin L(\{x_n\})$: there is some IP $P \supsetneq I^-(x)$ such that for 
all $z \in P$, $z \ll x_n$ for infinitely many n. 


Examples 

The future causal boundary with the 
future-chronological topology can be calculated 
with a fair degree of success. For instance, if M 
is conformal to a simple product spacetime $Q \times L^1$ 
(Q a Riemannian manifold), then $\hat\partial(M)$ is much 
like $\hat\partial(L^n)$ in that it consists of null or timelike 
lines factored over a particular boundary 
construction $\partial(Q)$ on Q, coming together at a single point $i^+$ 
(the IP which is all of M); if Q is complete, then 
these are all null lines, and together they may be 
called $\mathscr{I}^+$. 

The elements of $\partial(Q)$ are defined in terms of the 
Lipschitz-1 functions on Q known as Busemann 
functions: if $c: [\alpha, \omega) \to Q$ is any endless unit-speed 
curve (typically, $\omega = \infty$), then the Busemann function 
$b_c: Q \to \mathbb{R}$ is defined by $b_c(q) = \lim_{s \to \omega} (s - d(c(s), q))$, 
where d is the distance function in Q; this function 
is either finite for all q or infinite for all q. The set 
B(Q) of finite Busemann functions has an $\mathbb{R}$-action 
defined by $a \cdot b_c = b_{a \cdot c}$, where $(a \cdot c)(s) = c(s + a)$. 
Then $\partial(Q) = B(Q)/\mathbb{R}$. For any $P \in \hat\partial(M)$, the 
boundary of P, as a subset of $Q \times L^1 = Q \times \mathbb{R}$, is 
the graph of a Busemann function (the function is 
$b_c$ for P generated by a null curve projecting to c); 
and a point x = (q, t) in M can be represented by 
$\partial(I^-(x))$, which is the graph of the function 
$t - d(-, q)$. Thus, one could use the 
function-space topology on B(Q) to topologize $\hat{M}$; in that 
function-space topology $\hat\partial(M)$ is a cone on $\partial(Q)$, 
and $\hat{M}$, apart from $i^+$, is the topological product of 
$\mathbb{R}$ with $Q \cup \partial(Q)$. The future-chronological 
topology is sometimes different from the function-space 
topology, allowing more convergent sequences 
than the function-space topology does. When this 
happens, the result is non-Hausdorff, revealing 
pairs of points in $\hat\partial(M)$ which are more closely 
related to one another than the function-space 
topology reveals; but it is still the case that $\hat\partial(M)$, 
apart from $i^+$, is fibered by $\mathbb{R}$ over $\partial(Q)$. 
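
The Busemann function is easy to evaluate in the simplest case. The sketch below assumes Q is the Euclidean plane and c is a ray from the origin with unit direction u; both choices, and the helper names `approx_busemann` and `ray`, are illustrative assumptions rather than part of the text. In this flat case the limit has the closed form $b_c(q) = \langle q, u \rangle$, which the code approximates by taking s large.

```python
import math

def approx_busemann(c, q, s):
    """Evaluate s - d(c(s), q) for a unit-speed curve c in the Euclidean plane."""
    cx, cy = c(s)
    return s - math.hypot(cx - q[0], cy - q[1])

def ray(u, shift=0.0):
    """Unit-speed ray s -> (s + shift) * u, for a unit vector u."""
    return lambda s: ((s + shift) * u[0], (s + shift) * u[1])

u = (math.cos(0.3), math.sin(0.3))      # unit direction of the ray
q1, q2 = (1.5, -2.0), (-0.5, 4.0)       # two sample points of Q
S = 1.0e7                               # large parameter standing in for the limit

b_q1 = approx_busemann(ray(u), q1, S)
b_q2 = approx_busemann(ray(u), q2, S)

# Closed form in the flat case: the inner product of q with u.
closed_form = q1[0] * u[0] + q1[1] * u[1]

# Reparametrizing c(s) -> c(s + a) shifts the function by the constant -a,
# so the shifted curve gives the same element of B(Q)/R.
a = 2.0
b_shifted = approx_busemann(ray(u, shift=a), q1, S)

# Lipschitz-1 property: |b(q1) - b(q2)| <= d(q1, q2).
lipschitz_gap = math.hypot(q1[0] - q2[0], q1[1] - q2[1]) - abs(b_q1 - b_q2)
```

Under these assumptions every ray direction u gives one class in $B(Q)/\mathbb{R}$, so the boundary of the plane obtained this way is a circle of directions.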

If Q is a warped product $Q = (a, b) \times K$ for a 
compact manifold K with metric $dr^2 + e^{\phi(r)} h$ with h 
a metric on K, then one can calculate more precisely: 
if, for instance, $\phi$ has a minimum in the interior of 
(a, b) and has suitable growth on either end, then 
$\partial(Q)$ represents two copies of K (one for each end of 
$(a, b) \times K$), the future-chronological topology is the 
same as the function-space topology, and $\hat{M}$ (apart 
from $i^+$) is a simple product of $\mathbb{R}$ with $Q \cup \partial(Q)$: 
$\hat\partial(M)$ is precisely a null cone over two copies of K. 
This applies, for instance, to exterior Schwarzschild, 
where $K = S^2$; the boundary at one end of exterior 
Schwarzschild is the usual $\mathscr{I}^+$, and the boundary at 
the other end is the null cone {r = 2m}, where 
exterior attaches to interior Schwarzschild. 

Calculations for the future-chronological topology 
become much easier when $\hat\partial(M)$ is purely spacelike, 
that is, no $P \in \hat\partial(M)$ is contained in the past of any 
other element of $\hat{M}$. For instance, if M is conformal 
to a multiwarped product, $Q_1 \times \cdots \times Q_m \times (a, b)$ 
with metric $f_1(t) h_1 + \cdots + f_m(t) h_m - dt^2$, where $h_i$ 
is a Riemannian metric on $Q_i$, then $\hat\partial(M)$ will be 
purely spacelike if all the Riemannian factors are 
complete and for each i, $\int_{b^-}^{b} f_i(t)^{-1/2}\, dt < \infty$; in that 
case, $\hat\partial(M) \cong Q$, where $Q = Q_1 \times \cdots \times Q_m$, and 
$\hat{M} \cong Q \times (a, b]$. This applies, for instance, to 
interior Schwarzschild, where $Q_1 = \mathbb{R}^1$ and $Q_2 = S^2$, 
yielding the topology of $\mathbb{R}^1 \times S^2$ for the 
Schwarzschild singularity. 

There is a categorical universality for spacelike 
boundaries and the future-chronological topology. 
This means that any other reasonable way of 
future-completing interior Schwarzschild must yield 
$\mathbb{R}^1 \times S^2$ or a topological quotient of that for the 
singularity; and if the result is to be past-distinguishing, 
$\mathbb{R}^1 \times S^2$ is the only possibility. 

Of course, all this can be done in the time-dual 
fashion, using the past-chronological topology on 
$\check{M}$. It would be desirable to combine the future and 
past causal boundaries with a suitable topology as 
well as appropriate identifications. There has been 
some work in that direction. 


Causal Boundary: Revisited 


Marolf and Ross (2003) have proposed an identification 
of TIPs and TIFs that relies on the equivalence relation 
defined by Szabados. For an IP P and IF F, call (P, F) a 
Szabados pair if $P \subset I^-(x)$ for all $x \in F$, P is maximal 
among IPs for that property, and dually for F with 
respect to P. For instance, for any $x \in M$, $(I^-(x), I^+(x))$ 
is a Szabados pair. The Marolf-Ross version of the 
causal boundary, $\bar\partial(M)$, consists of all Szabados pairs 
formed of TIPs and TIFs, plus any TIP or TIF that 
cannot be paired; this produces an appropriate set of 
identifications within $\hat\partial(M) \cup \check\partial(M)$. The chronology 
relation on M is extended to $\bar{M} = \bar\partial(M) \cup M$ by treating 
each point x in M as the Szabados pair $(I^-(x), I^+(x))$ and 
each unpaired IP P as $(P, \emptyset)$ and unpaired IF F as $(\emptyset, F)$, 
and then defining $(P, F) \ll (P', F')$ whenever 
$F \cap P' \neq \emptyset$. 

The resulting chronological set is not necessarily 
either past- or future-distinguishing, but it is (past and 
future)-distinguishing. The topology they propose 
places endpoints in $\bar\partial(M)$ for all causal curves which 
are endless in M, but there may be multiple future 
endpoints for a single future-endless curve. The 
topology need not be $T_1$: points can fail to be closed. 
For a product spacetime $M = Q \times L^1$, the Marolf-Ross 
topology on $\bar{M}$ is always the function-space topology. 

As of this writing, there is active research by J L Flores 
to institute a Marolf-Ross type of identification of $\hat\partial(M)$ 
with $\check\partial(M)$ using a topology that partakes more of the 
future- and past-chronological topologies. 


See also: Asymptotic Structure and Conformal Infinity; 
Spacetime Topology, Causal Structure and Singularities. 


Boundary conformal field theory (BCFT) is simply 
the study of conformal field theory (CFT) in 
domains with a boundary. It gains its significance 
[1] because, in some ways, it is mathematically 
simpler: the algebraic and geometric structures of 
CFT appear in a more straightforward manner; and 
[2] because it has important applications: in string 
theory in the physics of open strings and D-branes, 
and in condensed matter physics in boundary critical 
behavior and quantum impurity models. 

This article, however, describes the basic ideas 
from the point of view of quantum field theory, 
without regard to particular applications or to any 
deeper mathematical formulations. 


Review of CFT 
Stress Tensor and Ward Identities 


Two-dimensional CFTs are massless, local, relati- 
vistic renormalized quantum field theories. 
Usually they are considered in imaginary time, 
that is, on two-dimensional manifolds with 
Euclidean signature. In this article, the metric is 
also taken to be Euclidean, although the formula- 
tion of CFTs on general Riemann surfaces is also 
of great interest, especially for string theory. For 
the time being, the domain is the entire complex 
plane. 

Heuristically, the correlation functions of such a 
field theory may be thought of as being given by 
the Euclidean path integral, that is, as expectation 
values of products of local densities with respect 
to a Gibbs measure $Z^{-1} e^{-S_E(\psi)}\,[d\psi]$, where the 
$\{\psi(x)\}$ are some set of fundamental local fields, $S_E$ 
is the Euclidean action, and the normalization 
factor Z is the partition function. Of course, such 
an object is not in general well defined, and this 
picture should be seen only as a guide to 
formulating the basic principles of CFT which 
can then be developed into a mathematically 
consistent theory. 


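In two dimensions, it is useful to use the so-called 
complex coordinates $z = x^1 + i x^2$, $\bar z = x^1 - i x^2$. In 
CFT, there are local densities $\phi_j(z, \bar z)$, called primary 
fields, whose correlation functions transform 
covariantly under conformal mappings $z \to z' = f(z)$: 

$\langle \phi_1(z_1, \bar z_1)\, \phi_2(z_2, \bar z_2) \cdots \rangle 
 = \prod_j f'(z_j)^{h_j}\, \overline{f'(z_j)}^{\,\bar h_j}\, \langle \phi_1(z_1', \bar z_1')\, \phi_2(z_2', \bar z_2') \cdots \rangle$   [1] 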

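where $(h_j, \bar h_j)$ (usually real numbers, not complex 
conjugates of each other) are called the conformal 
weights of $\phi_j$. These local fields can in general be 
normalized so that their two-point functions have 
the form 


$\langle \phi_j(z_1, \bar z_1)\, \phi_k(z_2, \bar z_2) \rangle = \dfrac{\delta_{jk}}{(z_1 - z_2)^{2h_j}\, (\bar z_1 - \bar z_2)^{2\bar h_j}}$   [2] 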
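
As a consistency check, the two-point form [2] can be tested against the covariance law [1] numerically. The sketch below restricts to maps $f(z) = \lambda z + b$ with real $\lambda > 0$ (dilatations and translations), an assumption made here so that principal-branch powers cause no ambiguity; the weights, points and function names are arbitrary illustrative choices.

```python
def two_point(z1, z2, h, hbar):
    """Two-point function of the form [2] for a field of weights (h, hbar)."""
    z12 = z1 - z2
    return 1.0 / (z12 ** (2 * h) * z12.conjugate() ** (2 * hbar))

def rhs_of_covariance(z1, z2, h, hbar, lam, b):
    """Right-hand side of [1] for f(z) = lam * z + b with lam > 0 real."""
    f = lambda z: lam * z + b
    # f'(z) = lam at both points, and lam is real, so the product over
    # j = 1, 2 of f'^h * conj(f')^hbar is (lam^h * lam^hbar)^2.
    jacobian = (lam ** h * lam ** hbar) ** 2
    return jacobian * two_point(f(z1), f(z2), h, hbar)

z1, z2 = 0.3 + 1.1j, -0.8 + 0.2j
h, hbar = 0.7, 0.2

lhs = two_point(z1, z2, h, hbar)
rhs = rhs_of_covariance(z1, z2, h, hbar, lam=2.5, b=1.0 - 3.0j)
```

The two sides agree to rounding error because the Jacobian factor $\lambda^{2(h + \bar h)}$ cancels the rescaling of the separation $z_1 - z_2$.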


They satisfy an algebra known as the operator 
product expansion (OPE) 


$\phi_i(z_1, \bar z_1) \cdot \phi_j(z_2, \bar z_2) 
 = \sum_k c_{ijk}\, (z_1 - z_2)^{-h_i - h_j + h_k}\, (\bar z_1 - \bar z_2)^{-\bar h_i - \bar h_j + \bar h_k}\, \phi_k(z_1, \bar z_1) + \cdots$   [3] 


which is supposed to be valid when inserted into 
higher-order correlation functions in the limit when 
$|z_1 - z_2|$ is much less than the separations of all the 
other points. The ellipses denote the contributions of 
other nonprimary scaling fields to be described 
below. The structure constants $c_{ijk}$, along with the 
conformal weights, characterize the particular CFT. 

An essential role is played by the 
energy-momentum tensor, or, in Euclidean field theory 
language, the stress tensor $T^{\mu\nu}$. Heuristically, it is 
defined as the response of the partition function to 
a local change in the metric: 


$T^{\mu\nu}(x) = -(2\pi)\, \delta \ln Z / \delta g_{\mu\nu}(x)$   [4] 


(the factor of $2\pi$ is included so that similar factors 
disappear in later equations). 

The symmetry of the theory under translations 
and rotations implies that $T^{\mu\nu}$ is conserved, 
$\partial_\mu T^{\mu\nu} = 0$, and symmetric. Scale invariance implies 
that it is also traceless, $\Theta \equiv T^\mu_\mu = 0$. It should be 
noted that the vanishing of the trace of the stress 
tensor for a scale invariant classical field theory does 
not usually survive when quantum corrections are 
taken into account: indeed, $\Theta \propto \beta(g)$, the 
renormalization group (RG) beta-function. A quantum field 
theory is thus only a CFT when this vanishes, that is, 
at an RG fixed point. In complex coordinates, the 
components $T_{z\bar z} = T_{\bar z z} = 4\Theta$ vanish, while the 
conservation equations read 


$\partial_{\bar z} T_{zz} = \partial_z T_{\bar z \bar z} = 0$   [5] 


Thus, correlators of $T(z) \equiv T_{zz}$ are locally analytic 
(in fact, globally meromorphic) functions of z, while 
those of $\bar T(\bar z) \equiv T_{\bar z \bar z}$ are antianalytic. It is this 
property of analyticity which makes CFTs tractable 
in two dimensions. 

Since an infinitesimal conformal transformation 
$z \to z + \alpha(z)$ induces a change in the metric, its effect 
on a correlation function of primary fields, given by [1], 
may also be expressed through an appropriate integral 
involving an insertion of the stress tensor. This leads to 
the conformal Ward identity: 


$\displaystyle \oint_C \Big\langle T(z) \prod_j \phi_j(z_j, \bar z_j) \Big\rangle\, \alpha(z)\, \frac{dz}{2\pi i} 
 = \sum_j \big( h_j\, \alpha'(z_j) + \alpha(z_j)\, (\partial/\partial z_j) \big) \Big\langle \prod_j \phi_j(z_j, \bar z_j) \Big\rangle$   [6] 


where C is a contour encircling all the points $\{z_j\}$. 
(A similar equation holds for the insertion of $\bar T$.) 
Using Cauchy's theorem, this determines the first 
few terms in the OPE of T with any primary density: 

$T(z) \cdot \phi_j(z_j, \bar z_j) = \dfrac{h_j}{(z - z_j)^2}\, \phi_j(z_j, \bar z_j) 
 + \dfrac{1}{z - z_j}\, \partial_{z_j} \phi_j(z_j, \bar z_j) + O(1)$   [7] 

The other, regular, terms in the OPE generate new 
scaling fields, which are not in general primary, 
called descendants. One way of defining a density to 
be primary is by the condition that the most singular 
term in its OPE with T is a double pole. 

The OPE of T with itself has the form

$$T(z)\cdot T(z_1) = \frac{c/2}{(z-z_1)^4} + \frac{2}{(z-z_1)^2}\,T(z_1) + \frac{1}{z-z_1}\,\partial_{z_1}T(z_1) + \cdots$$ [8]

The first term is present because $\langle T(z)T(z_1)\rangle$ is
nonvanishing, and must take the form shown, with c
being some number (which cannot be scaled to
unity, since the normalization of T is fixed by its
definition) which is a property of the CFT. It is
known as the conformal anomaly number or the
central charge. This term implies that T is not itself
primary. In fact, under a finite conformal
transformation $z \to z' = f(z)$,

$$T(z) = f'(z)^2\,T(z') + \frac{c}{12}\{z',z\}$$ [9]

where $\{z',z\} = (f'''f' - \tfrac32 f''^2)/f'^2$ is the Schwarzian
derivative.
Virasoro Algebra 


As with any quantum field theory, the local fields 
can be realized as linear operators acting on a 
Hilbert space. In ordinary QFT, it is customary to 
quantize on a constant-time hypersurface. The 
generator of infinitesimal time translations is the 
Hamiltonian Ĥ, which itself is independent of 
which time slice is chosen, because of time 
translational symmetry. It is also given by the 
integral over the hypersurface of the time-time 
component of the stress tensor. In CFT, because of
scale invariance, one may instead quantize on a
circle of fixed radius. The analog of the
Hamiltonian is the dilatation operator $\hat D$, which
generates scale transformations. Unlike $\hat H$, the
spectrum of $\hat D$ is usually discrete, even in an
infinite system. It may also be expressed as an
integral over the radial component of the stress
tensor:

$$\hat D = \frac{1}{2\pi i}\int_C z\,T(z)\,dz + \mathrm{c.c.} \equiv \hat L_0 + \hat{\bar L}_0$$ [10]


where, because of analyticity, C can be any contour
encircling the origin.

This suggests that one define other operators

$$\hat L_n \equiv \frac{1}{2\pi i}\int_C z^{n+1}\,T(z)\,dz$$ [11]

and similarly the $\hat{\bar L}_n$. From the OPE [8] then follows
the Virasoro algebra $\mathcal V$:

$$[\hat L_n,\hat L_m] = (n-m)\,\hat L_{n+m} + \frac{c}{12}\,n(n^2-1)\,\delta_{n+m,0}$$ [12]

with an isomorphic algebra $\bar{\mathcal V}$ generated by the $\hat{\bar L}_n$.

In radial quantization, there is a vacuum state $|0\rangle$.
Acting on this with the operator corresponding to a
scaling field gives a state $|\phi_j\rangle \equiv \hat\phi_j(0,0)|0\rangle$ which is
an eigenstate of $\hat D$: in fact,

$$\hat L_0|\phi_j\rangle = h_j|\phi_j\rangle,\qquad \hat{\bar L}_0|\phi_j\rangle = \bar h_j|\phi_j\rangle$$ [13]

From the OPE [7], one sees that $|L_n\phi_j\rangle \propto \hat L_n|\phi_j\rangle$,
and, if $\phi_j$ is primary, $\hat L_n|\phi_j\rangle = 0$ for all $n \ge 1$.

The states corresponding to a given primary field,
and those generated by acting on these with all the
$\hat L_n$ with n < 0 an arbitrary number of times, form a
highest-weight representation of $\mathcal V$. However, this is
not necessarily irreducible. There may be null
vectors, which are linear combinations of states at
a given level which are themselves annihilated by all
the $\hat L_n$ with n > 0. They exist whenever h takes a
value from the Kac table:

$$h = h_{r,s} = \frac{(r(m+1)-sm)^2-1}{4m(m+1)}$$ [14]

with the central charge parametrized as $c = 1 - 6/(m(m+1))$,
and r, s are positive integers. These
null states should be projected out, giving an
irreducible representation $\mathcal V_h$.
The full Hilbert space of the CFT is then

$$\mathcal H = \bigoplus_{h,\bar h} n_{h,\bar h}\,\mathcal V_h\otimes\bar{\mathcal V}_{\bar h}$$ [15]

where the non-negative integers $n_{h,\bar h}$ specify how
many distinct primary fields of weights $(h,\bar h)$ there
are in the CFT.

The consistency of the OPE [3] with the existence
of null vectors leads to the fusion algebra of the
CFT. This applies separately to the holomorphic and
antiholomorphic sectors, and determines how many
copies of $\mathcal V_c$ occur in the fusion of $\mathcal V_a$ and $\mathcal V_b$:

$$\mathcal V_a\odot\mathcal V_b = \sum_c N^c_{ab}\,\mathcal V_c$$ [16]

where the $N^c_{ab}$ are non-negative integers.

A particularly important subset of all CFTs
consists of the minimal models. These have rational
central charge $c = 1 - 6(p-q)^2/pq$, in which case
the fusion algebra closes with a finite number of
possible values $1\le r<q$, $1\le s<p$ in the Kac
formula [14]. For these models, the fusion algebra
takes the form

$$\mathcal V_{r_1,s_1}\odot\mathcal V_{r_2,s_2} = \sum_{r=|r_1-r_2|}^{r_1+r_2-1}{}^{\prime}\ \sum_{s=|s_1-s_2|}^{s_1+s_2-1}{}^{\prime}\ \mathcal V_{r,s}$$ [17]

where the prime on the sums indicates that they are
to be restricted to the allowed intervals of r and s.

There is an important theorem which states that
the only unitary CFTs with c < 1 are the minimal
models with $p/q = (m+1)/m$, where m is an
integer $\ge 3$.
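The Kac formula [14] and the statement about null vectors can be checked by direct arithmetic. The sketch below is illustrative only: it assumes the index range 1 ≤ r ≤ m − 1, 1 ≤ s ≤ m commonly quoted for the unitary series, and the function names are hypothetical. The level-two Gram matrix entries are those that follow from the Virasoro algebra [12] together with $\hat L_n|h\rangle = 0$ for n ≥ 1; its determinant should vanish exactly when a null vector appears at level two.

```python
from fractions import Fraction


def central_charge(m):
    """Central charge c = 1 - 6/(m(m+1)) of the unitary series."""
    return 1 - Fraction(6, m * (m + 1))


def kac_weight(m, r, s):
    """Kac formula [14]: h_{r,s} = ((r(m+1) - s m)^2 - 1) / (4 m (m+1))."""
    return Fraction((r * (m + 1) - s * m) ** 2 - 1, 4 * m * (m + 1))


def kac_table(m):
    """Distinct weights for 1 <= r <= m-1, 1 <= s <= m (assumed index range)."""
    return sorted({kac_weight(m, r, s)
                   for r in range(1, m) for s in range(1, m + 1)})


def gram_det_level2(h, c):
    """Determinant of the level-2 Gram matrix in the basis L_{-1}^2|h>, L_{-2}|h>."""
    a = 4 * h * (2 * h + 1)        # <h| L_1^2 L_{-1}^2 |h>
    b = 6 * h                      # <h| L_2 L_{-1}^2 |h> = <h| L_1^2 L_{-2} |h>
    d = 4 * h + Fraction(c) / 2    # <h| L_2 L_{-2} |h>
    return a * d - b * b


if __name__ == "__main__":
    m = 3
    c = central_charge(m)
    print("c =", c, "weights:", kac_table(m))
    for r, s in [(1, 2), (2, 1)]:
        # Both weights should give a vanishing determinant (null vector at level rs = 2).
        print((r, s), kac_weight(m, r, s), gram_det_level2(kac_weight(m, r, s), c))
```

Exact rational arithmetic is used so that the vanishing of the determinant is not obscured by rounding.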


Modular Invariance 


The fusion algebra limits which values of $(h,\bar h)$
might appear in a consistent CFT, but not which
ones actually occur, that is, the values of the $n_{h,\bar h}$.
This is answered by the requirement of modular
invariance on the torus. First consider the theory on
an infinitely long cylinder, of unit circumference.
This is related to the (punctured) plane by the
conformal mapping $z \to (1/2\pi)\ln z \equiv t + ix$. The
result is a QFT on the circle $0\le x<1$, in
imaginary time t. The generator of infinitesimal
time translations is related to that for dilatations in
the plane:

$$\hat H = 2\pi\hat D - \frac{\pi c}{6} = 2\pi(\hat L_0 + \hat{\bar L}_0) - \frac{\pi c}{6}$$ [18]

where the last term comes from the Schwarzian
derivative in [9]. Similarly, the generator of
translations in x, the total momentum operator, is
$\hat P = 2\pi(\hat L_0 - \hat{\bar L}_0)$.
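As a brief worked check of where the constant in [18] comes from, one can evaluate the Schwarzian derivative of [9] for the map to the cylinder. The sketch assumes the normalization $\hat H = (1/2\pi)\int_0^1 \hat T_{tt}\,dx$, which is consistent with the factor of $2\pi$ in [4] but is not stated explicitly in the text; w denotes the cylinder coordinate and $T_{\rm cyl}$ the holomorphic stress tensor there (both labels are introduced here for illustration).

```latex
\begin{align*}
w &= f(z) = \frac{1}{2\pi}\ln z, \qquad
f' = \frac{1}{2\pi z},\quad f'' = -\frac{1}{2\pi z^{2}},\quad f''' = \frac{1}{\pi z^{3}},\\[2pt]
\{w,z\} &= \frac{f'''f' - \tfrac32 f''^{2}}{f'^{2}} = \frac{1}{2z^{2}},\\[2pt]
T_{\rm cyl}(w) &= f'(z)^{-2}\Big[T(z) - \frac{c}{12}\{w,z\}\Big]
 = 4\pi^{2}\Big[z^{2}T(z) - \frac{c}{24}\Big],\\[2pt]
\frac{1}{2\pi}\int_0^1 T_{\rm cyl}\,dx
 &= 2\pi\,\frac{1}{2\pi i}\oint z\,T(z)\,dz - \frac{\pi c}{12}
 = 2\pi\hat L_0 - \frac{\pi c}{12}.
\end{align*}
```

Here $dx = dz/(2\pi i z)$ along a circle of fixed t. Adding the antiholomorphic contribution, which is obtained in the same way, gives $2\pi(\hat L_0 + \hat{\bar L}_0) - \pi c/6$.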

A general torus is, up to a scale transformation, a
parallelogram with vertices $(0, 1, \tau, 1+\tau)$ in the
complex plane, with the opposite edges identified.
We can make this by taking a cylinder of unit
circumference and length $\mathrm{Im}\,\tau$, twisting the ends by
a relative amount $\mathrm{Re}\,\tau$, and sewing them together.
This means that the partition function of the CFT on
the torus can be written as

$$Z(\tau,\bar\tau) = \operatorname{tr}\,e^{-(\mathrm{Im}\,\tau)\hat H + i(\mathrm{Re}\,\tau)\hat P} = \operatorname{tr}\,q^{\hat L_0 - c/24}\,\bar q^{\hat{\bar L}_0 - c/24}$$ [19]

using the above expressions for $\hat H$ and $\hat P$ and
introducing $q \equiv e^{2\pi i\tau}$.

Through the decomposition [15] of $\mathcal H$, the trace
sum can be written as

$$Z(\tau,\bar\tau) = \sum_{h,\bar h} n_{h,\bar h}\,\chi_h(q)\,\chi_{\bar h}(\bar q)$$ [20]

where

$$\chi_h(q) \equiv \operatorname{tr}_{\mathcal V_h}\,q^{\hat L_0 - c/24} = \sum_{N=0}^{\infty} d_h(N)\,q^{h-(c/24)+N}$$ [21]

is the character of the representation of highest weight
h, which counts the degeneracy $d_h(N)$ at level N. It is
purely an algebraic property of the Virasoro algebra,
and its explicit form is known in many cases.

Figure 1 Two equivalent parametrizations of the same torus.

All of this would be less interesting were it not
for the observation that the parametrization of the
torus through $\tau$ is not unique. In fact, the
transformations $S:\tau\to-1/\tau$ and $T:\tau\to\tau+1$
give the same torus (see Figure 1). Together, these
operations generate the modular group SL(2, Z),
and the partition function $Z(\tau,\bar\tau)$ should be
invariant under them. T-invariance is simply
implemented by requiring that $h - \bar h$ is an integer, but
the S-invariance of the right-hand side of [20]
places highly nontrivial constraints on the $n_{h,\bar h}$.
That this can be satisfied at all relies on the
remarkable property of the characters that they
transform linearly under S:

$$\chi_h(e^{-2\pi i/\tau}) = \sum_{h'} S_h^{h'}\,\chi_{h'}(e^{2\pi i\tau})$$ [22]

This follows from applying the Poisson sum formula
to the explicit expressions for the characters, which
are related to Jacobi theta-functions. In many cases
(e.g., the minimal models) this representation is
finite dimensional, and the matrix S is symmetric
and orthogonal. This means that one can
immediately obtain a modular invariant partition function
by forming the diagonal sum

$$Z = \sum_h \chi_h(q)\,\chi_h(\bar q)$$ [23]

so that $n_{h,\bar h} = \delta_{h\bar h}$. However, because of various
symmetries of the characters, other modular invariants
are possible: for the minimal models (and some others)
these have been classified. Because of an analogy of the
results with the classification of semisimple Lie
algebras, the diagonal invariants are called the A-series.
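The transformation law [22] can be tested numerically in the simplest case, the c = 1/2 model with weights h = 0, 1/2, 1/16. The sketch below is an illustration under stated assumptions: it takes the standard fermionic product formulas for the three characters as given (they are not derived in this article), uses the modular S matrix of that model in the basis (0, 1/2, 1/16), and restricts to the imaginary axis $\tau = it$ so that q is real. The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)
# Modular S matrix of the c = 1/2 model in the basis (h = 0, 1/2, 1/16).
S = [[0.5, 0.5, 1 / SQRT2],
     [0.5, 0.5, -1 / SQRT2],
     [1 / SQRT2, -1 / SQRT2, 0.0]]


def ising_characters(q, terms=200):
    """Characters (chi_0, chi_1/2, chi_1/16) at real 0 < q < 1.

    Uses the standard product formulas (assumed here, not derived).
    """
    plus = minus = full = 1.0
    for n in range(1, terms + 1):
        plus *= 1 + q ** (n - 0.5)
        minus *= 1 - q ** (n - 0.5)
        full *= 1 + q ** n
    pref = q ** (-1 / 48)
    chi_0 = 0.5 * pref * (plus + minus)
    chi_half = 0.5 * pref * (plus - minus)
    chi_16 = q ** (1 / 24) * full
    return [chi_0, chi_half, chi_16]


def s_transform_residual(t):
    """Largest deviation from [22] at tau = i t."""
    chi = ising_characters(math.exp(-2 * math.pi * t))        # argument e^{2 pi i tau}
    chi_dual = ising_characters(math.exp(-2 * math.pi / t))   # argument e^{-2 pi i / tau}
    return max(abs(chi_dual[a] - sum(S[a][b] * chi[b] for b in range(3)))
               for a in range(3))


def diagonal_partition_function(t):
    """Diagonal invariant [23] at tau = i t, where q-bar equals q."""
    return sum(x * x for x in ising_characters(math.exp(-2 * math.pi * t)))


if __name__ == "__main__":
    for t in (0.5, 1.0, 1.7):
        print(t, s_transform_residual(t),
              diagonal_partition_function(t), diagonal_partition_function(1 / t))
```

Because S is symmetric and orthogonal, the diagonal sum takes the same value at t and 1/t, which is the S-invariance of [23] on the imaginary axis.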


Boundary CFT 


In any field theory in a domain with a boundary, 
one needs to consider how to impose a set of 
consistent boundary conditions. Since CFT is for- 
mulated independently of a particular set of funda- 
mental fields and a Lagrangian, this must be done in 
a more general manner. A natural requirement is
that the off-diagonal component $T_{\parallel\perp}$ of the stress
tensor parallel/perpendicular to the boundary should
vanish. This is called the conformal boundary
condition. If the boundary is parallel to the time 
axis, it implies that there is no momentum flow 
across the boundary. Moreover, it can be argued 
that, under the RG, any uniform boundary condi- 
tion will flow into a conformally invariant one. For 
a given bulk CFT, however, there may be many 
possible distinct such boundary conditions, and it is 
one task of BCFT to classify these. 

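To begin with, take the domain to be the upper
half-plane, so that the boundary is the real axis. The
conformal boundary condition then implies that
$T(z) = \bar T(\bar z)$ when z is on the real axis. This has the
immediate consequence that correlators of $\bar T$ are
those of T, analytically continued into the lower
half-plane. The conformal Ward identity, cf. [7],
now reads

$$\Big\langle T(z)\prod_j\phi_j(z_j,\bar z_j)\Big\rangle = \sum_j\Big(\frac{h_j}{(z-z_j)^2} + \frac{1}{z-z_j}\,\partial_{z_j} + \frac{\bar h_j}{(z-\bar z_j)^2} + \frac{1}{z-\bar z_j}\,\partial_{\bar z_j}\Big)\Big\langle\prod_j\phi_j(z_j,\bar z_j)\Big\rangle$$ [24]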


In radial quantization, in order that the Hilbert
spaces defined on different hypersurfaces be
equivalent, one must choose semicircles centered on some
point on the boundary, conventionally the origin.
The dilatation operator is now

$$\hat D = \frac{1}{2\pi i}\int_S z\,T(z)\,dz - \frac{1}{2\pi i}\int_S \bar z\,\bar T(\bar z)\,d\bar z$$ [25]

where S is a semicircle. Using the conformal
boundary condition, this can also be written as

$$\hat D = \hat L_0 = \frac{1}{2\pi i}\int_C z\,T(z)\,dz$$ [26]

where C is a complete circle around the origin. As
before, one may similarly define the $\hat L_n$, and they
satisfy a Virasoro algebra.

Note that there is now only one Virasoro algebra. 
This is related to the fact that conformal mappings 
which preserve the real axis correspond to real 
analytic functions. The eigenstates of $\hat L_0$ correspond
to boundary operators $\hat\phi_j(0)$ acting on the vacuum
state $|0\rangle$. It is well known that in a renormalizable
QFT operators at the boundary require a different 
renormalization from those in the bulk, and this will 
in general lead to a different set of conformal 
weights. It is one of the tasks of BCFT to determine 
these, for a given allowed boundary condition. 

However, there is one feature unique to boundary 
CFT in two dimensions. Radial quantization also 
makes sense, leading to the same form [26] for the
dilatation operator, if the boundary conditions on the
negative and positive real axes are different. As far as 
the structure of BCFT goes, correlation functions with 
this mixed boundary condition behave as though a 
local scaling field were inserted at the origin. This has 
led to the term “boundary condition changing (bcc) 
operator,” but it must be stressed that these are not 
local operators in the conventional sense. 


The Annulus Partition Function 


Just as consideration of the partition function on the
torus illuminates the bulk operator content $n_{h,\bar h}$, it
turns out that consistency on the annulus helps
classify both the allowed boundary conditions, and
the boundary operator content. To this end,
consider a CFT in an annulus formed of a rectangle of
unit width and height $\delta$, with the top and bottom
edges identified (see Figure 2). The boundary
conditions on the left and right edges, labeled by
a, b, ..., may be different. The partition function
with boundary conditions a and b on either edge is
denoted by $Z_{ab}(\delta)$.

Figure 2 The annulus, with boundary conditions a and b on
either boundary.

One way to compute this is by first considering
the CFT on an infinitely long strip of unit width.
This is conformally related to the upper half-plane
(with an insertion of bcc operators at 0 and $\infty$ if
$a \ne b$) by the mapping $z \to (1/\pi)\ln z$. The
generator of infinitesimal translations along the strip is

$$\hat H_{ab} = \pi\hat D - \pi c/24 = \pi\hat L_0 - \pi c/24$$ [27]

Thus, for the annulus,

$$Z_{ab}(\delta) = \operatorname{tr}\,e^{-\delta\hat H_{ab}} = \operatorname{tr}\,q^{\hat L_0 - c/24}$$ [28]

with $q \equiv e^{-\pi\delta}$. As before, this can be decomposed
into characters:

$$Z_{ab}(\delta) = \sum_h n^h_{ab}\,\chi_h(q)$$ [29]

but note that now the expression is linear. The
non-negative integers $n^h_{ab}$ give the operator content with
the boundary conditions (ab): the lowest value of h
with $n^h_{ab} > 0$ gives the conformal weight of the bcc
operator, and the others give conformal weights of
the other allowed primary fields which may also sit
at this point.

On the other hand, the annulus partition function
may be viewed, up to an overall rescaling, as the
path integral for a CFT on a circle of unit
circumference, being propagated for (imaginary)
time $\delta^{-1}$. From this point of view, the partition
function is no longer a trace, but rather the matrix
element of $e^{-\hat H/\delta}$ between boundary states:

$$Z_{ab}(\delta) = \langle a|\,e^{-\hat H/\delta}\,|b\rangle$$ [30]

Note that $\hat H$ is the same Hamiltonian that appears in
[18], and the boundary states lie in $\mathcal H$, [15].

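How are these boundary states to be characterized?
Using the transformation law [9], the
conformal boundary condition applied to the
circle implies that $L_n \to \bar L_{-n}$. This means that
any boundary state $|B\rangle$ lies in the subspace
satisfying

$$\hat L_n|B\rangle = \hat{\bar L}_{-n}|B\rangle$$ [31]

Moreover, because of the decomposition [15] of
$\mathcal H$, $|B\rangle$ is also some linear superposition of states from
$\mathcal V_h\otimes\bar{\mathcal V}_{\bar h}$. This condition can therefore be applied in
each subspace. Taking n = 0 in [31] constrains $\bar h = h$.
For simplicity, consider only the diagonal CFTs with
$n_{h,\bar h} = \delta_{h,\bar h}$. It can then be shown that the solution
of [31] is unique and has the following form.
The subspace at level N of $\mathcal V_h$ has dimension
$d_h(N)$. Denote an orthonormal basis by $|h,N;j\rangle$,
with $1\le j\le d_h(N)$, and the same basis for $\bar{\mathcal V}_h$ by
$\overline{|h,N;j\rangle}$. The solution to [31] in this subspace is
then

$$|h\rangle\rangle \equiv \sum_{N=0}^{\infty}\sum_{j=1}^{d_h(N)} |h,N;j\rangle\otimes\overline{|h,N;j\rangle}$$ [32]

These are called Ishibashi states. Matrix elements of
the translation operator along the cylinder between
them are simple:

$$\langle\langle h'|\,e^{-\hat H/\delta}\,|h\rangle\rangle = \sum_{N',j'}\sum_{N,j}\langle h',N';j'|\otimes\overline{\langle h',N';j'|}\;e^{-(2\pi/\delta)(\hat L_0+\hat{\bar L}_0-c/12)}\;|h,N;j\rangle\otimes\overline{|h,N;j\rangle}$$ [33]

$$= \delta_{h'h}\sum_{N=0}^{\infty}\sum_{j=1}^{d_h(N)} e^{-(4\pi/\delta)(h+N-(c/24))}$$ [34]

$$= \delta_{h'h}\,\chi_h(e^{-4\pi/\delta})$$ [35]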

Note that the characters which appear are
related to those in [29] by the modular
transformation S.

The physical boundary states satisfying [29],
sometimes called the Cardy states, are linear
combinations of the Ishibashi states:

$$|a\rangle = \sum_h \langle\langle h|a\rangle\,|h\rangle\rangle$$ [36]

Equating the two different expressions [29] and [30]
for $Z_{ab}$, and using the modular transformation law
[22] and the linear independence of the characters
gives the (equivalent) conditions:

$$n^h_{ab} = \sum_{h'} S^h_{h'}\,\langle a|h'\rangle\rangle\langle\langle h'|b\rangle$$ [37]

$$\langle a|h'\rangle\rangle\langle\langle h'|b\rangle = \sum_h S^{h'}_h\,n^h_{ab}$$ [38]

These are called the Cardy conditions. The
requirements that the right-hand side of [37] should give a
non-negative integer, and that the right-hand side of
[38] should factorize in a and b, give highly
nontrivial constraints on the allowed boundary
states and their operator content.

For the diagonal CFTs considered here (and for
the nondiagonal minimal models) a complete
solution is possible. It can be shown that the elements $S^h_0$
of S are all non-negative, so one may choose
$\langle\langle h|\tilde 0\rangle = (S^h_0)^{1/2}$. This defines a boundary state

$$|\tilde 0\rangle \equiv \sum_h (S^h_0)^{1/2}\,|h\rangle\rangle$$ [39]

and a corresponding boundary condition such that
$n^h_{00} = \delta_{h0}$. Then, for each $h' \ne 0$, one may define a
boundary state

$$\langle\langle h|\tilde h'\rangle \equiv S^h_{h'}/(S^h_0)^{1/2}$$ [40]

From [37], this gives $n^h_{h'0} = \delta_{h'h}$. For each allowed $h'$
in the torus partition function, there is therefore a
boundary state $|\tilde h'\rangle$ satisfying the Cardy conditions.
However, there is a further requirement:

$$n^h_{h'h''} = \sum_\ell \frac{S^h_\ell\,S^\ell_{h'}\,S^\ell_{h''}}{S^\ell_0}$$ [41]

should be a non-negative integer. Remarkably, this
combination of elements of S occurs in the Verlinde
formula, which follows from considering
consistency of the CFT on the torus. This states that the
right-hand side of [41] is equal to the fusion algebra
coefficient $N^h_{h'h''}$. Since these are non-negative
integers, the above ansatz for the
boundary states is consistent.

We conclude that, at least for the diagonal models, 
there is a bijection between the allowed primary fields 
in the bulk CFT and the allowed conformally invariant 
boundary conditions. For the minimal models, with a 
finite number of such primary fields, this correspon- 
dence has been followed through explicitly. 


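Example The simplest example is the diagonal c = 1/2
unitary CFT corresponding to m = 3. The allowed
values of the conformal weights are h = 0, 1/2, 1/16, and

$$S = \begin{pmatrix} \tfrac12 & \tfrac12 & \tfrac{1}{\sqrt2}\\[2pt] \tfrac12 & \tfrac12 & -\tfrac{1}{\sqrt2}\\[2pt] \tfrac{1}{\sqrt2} & -\tfrac{1}{\sqrt2} & 0 \end{pmatrix}$$ [42]

from which one finds the allowed boundary states

$$|\tilde 0\rangle = \tfrac{1}{\sqrt2}\,|0\rangle\rangle + \tfrac{1}{\sqrt2}\,|\tfrac12\rangle\rangle + \tfrac{1}{2^{1/4}}\,|\tfrac1{16}\rangle\rangle$$ [43]

$$|\tilde{\tfrac12}\rangle = \tfrac{1}{\sqrt2}\,|0\rangle\rangle + \tfrac{1}{\sqrt2}\,|\tfrac12\rangle\rangle - \tfrac{1}{2^{1/4}}\,|\tfrac1{16}\rangle\rangle$$ [44]

$$|\tilde{\tfrac1{16}}\rangle = |0\rangle\rangle - |\tfrac12\rangle\rangle$$ [45]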


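The nontrivial part of the fusion algebra of this
CFT is

$$\mathcal V_{1/16}\odot\mathcal V_{1/16} = \mathcal V_0 + \mathcal V_{1/2}$$ [46]

$$\mathcal V_{1/16}\odot\mathcal V_{1/2} = \mathcal V_{1/16}$$ [47]

$$\mathcal V_{1/2}\odot\mathcal V_{1/2} = \mathcal V_0$$ [48]

from which can be read off the boundary operator
content

$$n^{0}_{\frac1{16},\frac1{16}} = n^{\frac12}_{\frac1{16},\frac1{16}} = n^{\frac1{16}}_{\frac1{16},\frac12} = n^{0}_{\frac12,\frac12} = 1$$ [49]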


The c = 1/2 CFT is known to describe the continuum limit
of the critical Ising model, in which spins $s = \pm1$ are
localized on the sites of a regular lattice. The above
boundary conditions may be interpreted as the
continuum limit of the lattice boundary conditions s = 1,
s = −1, and free, respectively. Note there is a symmetry
of the fusion rules which means that one could
equally well have interchanged s = 1 and s = −1 in this
correspondence.
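The construction [39]–[41] can be carried out numerically for this example. The following is a minimal sketch, not a general implementation: it hard-codes the matrix [42] in the basis (0, 1/2, 1/16), and the function names are hypothetical. It evaluates the Ishibashi components of the three boundary states and the Verlinde combination of [41], which should reproduce the fusion rules [46]–[48] as non-negative integers.

```python
import math

LABELS = ["0", "1/2", "1/16"]
R2 = math.sqrt(2.0)
S = [[0.5, 0.5, 1 / R2],
     [0.5, 0.5, -1 / R2],
     [1 / R2, -1 / R2, 0.0]]   # matrix [42], rows and columns ordered as LABELS


def cardy_coefficients(S):
    """Coefficients <<h|a~> = S[h][a] / sqrt(S[h][0]) from [39] and [40].

    Row a of the result lists the Ishibashi components of the state |a~>.
    """
    n = len(S)
    return [[S[h][a] / math.sqrt(S[h][0]) for h in range(n)] for a in range(n)]


def verlinde(S):
    """N[a][b][h] = sum_l S[h][l] S[l][a] S[l][b] / S[l][0], as in [41]."""
    n = len(S)
    return [[[sum(S[h][l] * S[l][a] * S[l][b] / S[l][0] for l in range(n))
              for h in range(n)] for b in range(n)] for a in range(n)]


if __name__ == "__main__":
    for label, row in zip(LABELS, cardy_coefficients(S)):
        print("state", label, [round(x, 4) for x in row])
    N = verlinde(S)
    for a in range(3):
        for b in range(a, 3):
            content = [LABELS[h] for h in range(3) if round(N[a][b][h]) == 1]
            print(LABELS[a], "x", LABELS[b], "->", content)
```

For a = 0 the quotient in `cardy_coefficients` reduces to $(S^h_0)^{1/2}$, so the same expression covers both [39] and [40].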


Other Topics 
Boundary Entropy 


The partition function on an annulus of length L and
circumference $\beta$ can be thought of as the quantum
statistical mechanics partition function for a
one-dimensional QFT in an interval of length L, at
temperature $\beta^{-1}$. It is interesting to consider this
in the thermodynamic limit when $\delta = L/\beta$ is large. In
that case, only the ground state of $\hat H$ contributes in
[30], giving

$$Z_{ab}(L,\beta) \sim \langle a|0\rangle\langle 0|b\rangle\,e^{\pi cL/6\beta}$$ [50]

from which the free energy $F_{ab} = -\beta^{-1}\ln Z_{ab}$ and
the entropy $S_{ab} = \beta^2(\partial F_{ab}/\partial\beta)$ can be obtained.
The result is

$$S_{ab} = (\pi c/3\beta)L + s_a + s_b + o(1)$$ [51]

where the first term is the usual extensive
contribution. The other two pieces $s_a \equiv \ln(\langle a|0\rangle)$ and
$s_b \equiv \ln(\langle b|0\rangle)$ may be identified as the boundary entropy
associated with the corresponding boundary states.
A similar definition may be made in massive QFTs.
It is an unproven but well-verified conjecture that
the boundary entropy is a nonincreasing function
along boundary RG flows, and is stationary only for
conformal boundary states.
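The decomposition [51] can be checked numerically from the leading form [50]. The sketch below is illustrative: it uses the thermodynamic definition S = −∂F/∂T with T = 1/β, evaluates the derivative by a central difference, and takes the overlaps ⟨a|0⟩ for the c = 1/2 boundary states from [40] with the matrix [42]. The names and the chosen values of L and β are hypothetical.

```python
import math

# Overlaps <a|0> for the three c = 1/2 boundary states, from [40] with [42].
G_FACTORS = {"0~": 2 ** -0.5, "1/2~": 2 ** -0.5, "1/16~": 1.0}


def log_partition(L, beta, c, g_a, g_b):
    """ln Z_ab from the leading form [50]: g_a g_b exp(pi c L / (6 beta))."""
    return math.log(g_a * g_b) + math.pi * c * L / (6 * beta)


def entropy(L, beta, c, g_a, g_b, eps=1e-6):
    """S = -dF/dT with F = -T ln Z and T = 1/beta, by a central difference."""
    def free_energy(T):
        return -T * log_partition(L, 1.0 / T, c, g_a, g_b)
    T = 1.0 / beta
    return -(free_energy(T * (1 + eps)) - free_energy(T * (1 - eps))) / (2 * T * eps)


if __name__ == "__main__":
    L, beta, c = 50.0, 2.0, 0.5
    for name, g in G_FACTORS.items():
        total = entropy(L, beta, c, g, g)
        extensive = math.pi * c * L / (3 * beta)
        # The remainder is s_a + s_b = 2 ln g for equal boundary conditions.
        print(name, total - extensive, 2 * math.log(g))
```

The boundary contribution is independent of L, which is what distinguishes it from the extensive term.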


Bulk–Boundary OPE


The boundary Ward identity [24] has the
implication that, from the point of view of the dependence
of its correlators on $z_j$ and $\bar z_j$, a primary field
$\phi_j(z_j,\bar z_j)$ may be thought of as the product of two
local fields which are holomorphic functions of $z_j$
and $\bar z_j$, respectively. These will satisfy OPEs as
$|z_j - \bar z_j| \to 0$, with the appearance of primary fields on the
right-hand side being governed by the fusion rules.
These fields are localized on the real axis: they are
the boundary operators. There is therefore a kind of
bulk–boundary OPE:

$$\phi_j(z_j,\bar z_j) = \sum_k d_{jk}\,(\mathrm{Im}\,z_j)^{-h_j-\bar h_j+h_k}\,\hat\phi_k(\mathrm{Re}\,z_j)$$ [52]

where the sum on the right-hand side is, in principle,
over all the boundary fields consistent with the
boundary condition, and the coefficients $d_{jk}$ are
analogous to the OPE coefficients in the bulk. As
before, they are nonvanishing only if allowed by the
fusion algebra: a boundary field of conformal weight
$h_k$ is allowed only if $N^{h_k}_{h_j\bar h_j}$ is nonzero.

For example, in the c = 1/2 CFT, the bulk operator
with $h = \bar h = \tfrac1{16}$ goes over into the boundary
operator with h = 0, or that with h = 1/2, depending on the
boundary condition. The bulk operator with
$h = \bar h = \tfrac12$, however, can only go over into the
identity boundary operator with h = 0 (or a
descendant thereof).

The fusion rules also apply to the boundary
operators themselves. The consistency of these with
bulk–boundary and bulk–bulk fusion rules, as well
as the modular properties of partition functions, was
examined by Lewellen.


Extended Algebras 


CFTs may contain other conserved currents apart 
from the stress tensor, which generate algebras 
(Kac-Moody, superconformal, W-algebras) which 
extend the Virasoro algebra. In BCFT, in addition to 
the conformal boundary condition, it is possible (but 
not necessary) to impose further boundary condi- 
tions relating the holomorphic and antiholomorphic 
parts of the other currents on the boundary. It is 
believed that all rational CFTs can be obtained from
Kac–Moody algebras via the coset construction. The
classification of boundary conditions from this point 
of view is fruitful and also important for applica- 
tions, but is beyond the scope of this article. 


Stochastic Loewner Evolution 


In recent years, there has emerged a deep connection 
between BCFT and conformally invariant measures 
on curves in the plane which start at a boundary of a 
domain. These arise naturally in the continuum limit 
of certain statistical mechanics models. The measure 
is constructed dynamically as the curve is extended, 
using a sequence of random conformal mappings 
called stochastic Loewner evolution (SLE). In CFT, 
the point where the curve begins can be viewed as 
the insertion of a boundary operator. The require- 
ment that certain quantities should be conserved in 
mean under the stochastic process is then equivalent 
to this operator having a null state at level two. 
Many of the standard results of CFT correspond to 
an equivalent property of SLE.


Introduction


Inverse problems are generally posed as the
problems of determining a system (its structure,
parameters, etc.) from its “input → output”
correspondence.

The boundary-value inverse problems deal with 
systems which describe processes (wave, heat, electro- 
magnetic ones, etc.) occurring in media occupying a 
spatial domain. The process is initiated by a boundary 
source (input) and is described by a solution of a certain 
partial differential equation in the domain. Certain 
additional information about the solution, which can be 
extracted from measurements on the boundary, plays 
the role of the output. The objective is to determine the 
parameters of the medium — in particular, the coeffi- 
cients in the equation — from this information. 

The boundary control (BC) method (Belishev 
1986) is an approach to the boundary-value inverse 
problems based on their links with the control 
theory and system theory. The present article describes a
version of the BC method which solves the problem
of reconstruction of a Riemannian manifold from its
boundary spectral or dynamical data.


Forward Problems 
Manifold 


Let $(\Omega, d)$ be a smooth compact Riemannian manifold
with the boundary $\Gamma$, $\dim\Omega \ge 2$; d is the distance
determined by the metric tensor g. For $A\subset\Omega$ denote

$$\langle A\rangle^r := \{x\in\Omega \mid d(x,A)\le r\},\quad r\ge 0;$$

the hypersurfaces $\Gamma^\tau := \{x\in\Omega \mid d(x,\Gamma)=\tau\}$, $\tau\ge 0$,
are equidistant to $\Gamma$. In terms of the dynamics of
the system, the value

$$T_* := \min\{\tau>0 \mid \langle\Gamma\rangle^\tau = \Omega\} = \max_\Omega d(\cdot,\Gamma)$$

means the time needed for waves, moving from $\Gamma$
with the unit speed, to fill $\Omega$.


A point $x\in\Omega$ is said to belong to the set $c_0\subset\Omega$ if
x is connected with $\Gamma$ via more than one shortest
geodesic. The set $c := \overline{c_0}$ is called the separation set
(cut locus) of $\Omega$ with respect to $\Gamma$. It is a closed set of
zero volume. Let $\tau_*(\gamma)$ be the length of the geodesic
emanating from $\gamma\in\Gamma$ orthogonally to $\Gamma$ and
connecting $\gamma$ with c. The function $\tau_*(\cdot)$ is continuous
on $\Gamma$.

For $x\in\Omega\setminus c$ the pair $(\gamma,\tau)$, such that
$\tau = d(x,\Gamma) = d(x,\gamma)$, constitutes the semigeodesic
coordinates of x. The set of these coordinates

$$\Theta := \{(\gamma,\tau) \mid \gamma\in\Gamma,\ 0\le\tau<\tau_*(\gamma)\}\subset\Gamma\times[0,T_*]$$

is called the pattern of $\Omega$. Pictorially, to get the
pattern, one needs to slit $\Omega$ along c and then pull it
on the cylinder $\Gamma\times[0,T_*]$. The part $\Theta^T := \Theta\cap(\Gamma\times[0,T])$
of the pattern consists of the semigeodesic
coordinates of the points $x\in\langle\Gamma\rangle^T\setminus c$ (Figure 1).
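A concrete case may help fix the definitions. For the flat unit disc (an illustrative choice made here, not an example taken from the article), Γ is the unit circle, d(x, Γ) = 1 − |x|, the cut locus c is the centre, $\tau_*(\gamma) = 1$ for every γ, and $T_* = 1$, so the pattern is $\Gamma\times[0,1)$. The sketch below, with hypothetical function names, maps a point to its semigeodesic coordinates and back.

```python
import math


def semigeodesic_coordinates(x, y):
    """Return (gamma, tau) for a point of the flat unit disc, or None on the cut locus.

    gamma is the polar angle of the nearest boundary point and tau = d(x, Gamma).
    """
    r = math.hypot(x, y)
    if r > 1.0:
        raise ValueError("point lies outside the unit disc")
    if r == 0.0:
        return None  # the centre is equidistant from every boundary point
    return math.atan2(y, x), 1.0 - r


def point_from_coordinates(gamma, tau):
    """Inverse map: move a distance tau from the boundary along the inward normal."""
    return (1.0 - tau) * math.cos(gamma), (1.0 - tau) * math.sin(gamma)


if __name__ == "__main__":
    gamma, tau = semigeodesic_coordinates(-0.3, 0.4)
    print(gamma, tau, point_from_coordinates(gamma, tau))
```

The map fails to be defined only at the centre, which is the slit along c described above.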


Dynamical System 


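Propagation of waves in the manifold is described by
a dynamical system $\alpha^T$ of the form

$$u_{tt} - \Delta_g u = h \quad\text{in } \Omega\times(0,T)$$ [1]

$$u|_{t=0} = u_t|_{t=0} = 0 \quad\text{in } \Omega$$ [2]

$$u = f \quad\text{on } \Gamma\times[0,T]$$ [3]

where $\Delta_g$ is the Beltrami–Laplace operator, $0<T\le\infty$,
f and h are the boundary and volume sources
(controls), and $u = u^{f,h}(x,t)$ is the solution (wave).

Set $\mathcal H := L_2(\Omega)$; the spaces of the controls are

$$\mathcal F^T := L_2(\Gamma\times[0,T]),\qquad \mathcal G^T := L_2([0,T];\mathcal H)$$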
Figure 1 Manifold and pattern. (Data from Belishev (1997).)

The “input → state” map of the system $\alpha^T$ is
realized by the control operator
$W^T:\mathcal F^T\times\mathcal G^T\to\mathcal H$, $W^T\{f,h\} := u^{f,h}(\cdot,T)$,
and its parts

$$W^T_{\rm bd}:\mathcal F^T\to\mathcal H,\qquad W^T_{\rm bd}f := u^{f,0}(\cdot,T),$$

$$W^T_{\rm vol}:\mathcal G^T\to\mathcal H,\qquad W^T_{\rm vol}h := u^{0,h}(\cdot,T)$$

In the case f = 0 the evolution of the system is
governed by the operator $L := -\Delta_g$ defined on the
Sobolev class $H^2(\Omega)\cap H^1_0(\Omega)$ of functions vanishing
on $\Gamma$, and the semigroup representation

$$u^{0,h}(\cdot,r) = W^r_{\rm vol}h = \int_0^r L^{-1/2}\sin[(r-t)L^{1/2}]\,h(\cdot,t)\,dt$$ [4]

holds for all r > 0.
The “input → output” map is implemented by the
response operator $R^T:\mathcal F^T\to\mathcal F^T$,

$$R^Tf := \partial_\nu u^{f,0}\quad\text{on } \Gamma\times[0,T]$$

defined on controls $f\in H^1(\Gamma\times[0,T])$ vanishing on
$\Gamma\times\{t=0\}$; here $\nu = \nu(\gamma)$ is the outward normal to $\Gamma$.
The normal derivative $\partial_\nu u^{f,0}$ describes the forces
appearing on $\Gamma$ as a result of interaction of the wave
with the boundary.

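The map $C^T:\mathcal F^T\to\mathcal F^T$, $C^T := (W^T_{\rm bd})^*W^T_{\rm bd}$, which
is called the connecting operator, can be represented
via the response operator of the system $\alpha^{2T}$:

$$C^T = \tfrac12\,(S^T)^*R^{2T}J^{2T}S^T$$ [5]

$S^T:\mathcal F^T\to\mathcal F^{2T}$ being the extension of controls from
$\Gamma\times[0,T]$ onto $\Gamma\times[0,2T]$ as odd functions of t
with respect to t = T, and $J^{2T}:\mathcal F^{2T}\to\mathcal F^{2T}$ being the
integration

$$(J^{2T}f)(\cdot,t) = \int_0^t f(\cdot,s)\,ds$$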


Controllability 


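Open subsets $\sigma\subset\Gamma$ and $\omega\subset\Omega$ determine the
subspaces

$$\mathcal F^T_\sigma := \{f\in\mathcal F^T \mid \operatorname{supp} f\subset\bar\sigma\times[0,T]\}$$

$$\mathcal G^T_\omega := \{h\in\mathcal G^T \mid \operatorname{supp} h\subset\bar\omega\times[0,T]\}$$

of controls acting from $\sigma$ and $\omega$, respectively. In view
of hyperbolicity of the problem [1]–[3], the relation

$$\operatorname{supp} u^{f,h}(\cdot,t)\subset\langle\sigma\rangle^t\cup\langle\omega\rangle^t,\quad t\ge 0$$ [6]

holds for $f\in\mathcal F^T_\sigma$ and $h\in\mathcal G^T_\omega$. This means that the
waves propagate in $\Omega$ with unit speed.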
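The finite speed of propagation expressed by [6] is easy to observe in a toy computation. The sketch below is only an illustration under simplifying assumptions: it replaces the manifold by the interval [0, 1] (the article requires dimension at least two), applies a hypothetical control at the left end point only, sets h = 0, and uses the leapfrog scheme with equal time and space steps, for which the discrete wave advances exactly one cell per step.

```python
import math


def simulate_wave(f, T, n_cells=200):
    """Leapfrog solution of u_tt = u_xx on [0, 1] with u(0, t) = f(t), u(1, t) = 0.

    Zero initial data; dt = dx, so the scheme propagates exactly one cell per step.
    Returns (grid, u) at the time level closest to T.
    """
    dx = 1.0 / n_cells
    steps = round(T / dx)
    prev = [0.0] * (n_cells + 1)      # u at time level n - 1
    curr = [0.0] * (n_cells + 1)      # u at time level n
    curr[0] = f(0.0)
    for n in range(steps):
        nxt = [0.0] * (n_cells + 1)
        for j in range(1, n_cells):
            nxt[j] = curr[j + 1] + curr[j - 1] - prev[j]
        nxt[0] = f((n + 1) * dx)      # boundary control on the left end
        prev, curr = curr, nxt
    grid = [j * dx for j in range(n_cells + 1)]
    return grid, curr


def control(t):
    """Hypothetical smooth control that vanishes at t = 0."""
    return math.sin(3 * math.pi * t) ** 2


if __name__ == "__main__":
    grid, u = simulate_wave(control, 0.4)
    front = max(x for x, v in zip(grid, u) if v != 0.0)
    print("wave front at x =", front)   # should not exceed T = 0.4
```

Behind the front the computed profile coincides with d'Alembert's solution f(T − x), and ahead of it the solution is identically zero, in line with [6].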


The sets of waves

$$\mathcal U^T_\sigma := W^T_{\rm bd}\mathcal F^T_\sigma,\qquad \mathcal U^T_\omega := W^T_{\rm vol}\mathcal G^T_\omega$$

are said to be reachable at time t = T from $\sigma$ and $\omega$,
respectively. Denoting

$$\mathcal H\langle A\rangle := \{y\in\mathcal H \mid \operatorname{supp} y\subset A\}$$

by virtue of [6] one has the embeddings $\mathcal U^T_\sigma\subset\mathcal H\langle\sigma\rangle^T$
and $\mathcal U^T_\omega\subset\mathcal H\langle\omega\rangle^T$. The property of the system $\alpha^T$
that plays the key role in inverse problems is that
these embeddings are dense:

$$\operatorname{cl}\,\mathcal U^T_\sigma = \mathcal H\langle\sigma\rangle^T,\qquad \operatorname{cl}\,\mathcal U^T_\omega = \mathcal H\langle\omega\rangle^T$$ [7]

for any T > 0 (cl denotes the closure in $\mathcal H$).

In control theory, relations [7] are interpreted as
an approximate controllability of the system in
subdomains filled with waves; the name “BC
method” is derived from the first one (boundary
controllability). This property means that the sets
of waves are rich enough: any function supported
in the subdomain $\langle\sigma\rangle^T$ reachable for waves excited
on $\sigma$ can be approximated with any precision in
$\mathcal H$-norm by the wave $u^{f,0}(\cdot,T)$ due to appropriate
choice of the control f acting from $\sigma$. The proof of
[7] relies on the fundamental Holmgren–John–Tataru
unique continuation theorem for the wave equation
(Tataru 1993).


Laplacian on Waves 


If h = 0, so that the system is governed only by
boundary controls, its trajectory $\{u^{f,0}(\cdot,t) \mid 0\le t\le T\}$
does not leave the reachable set $\mathcal U^T_\Gamma$. In this case, the
system possesses one more intrinsic operator $L^T$
which acts in the subspace $\operatorname{cl}\,\mathcal U^T_\Gamma$ and is introduced
through its graph

$$\operatorname{gr} L^T := \operatorname{cl}\big\{\{W^T_{\rm bd}f,\,-W^T_{\rm bd}f_{tt}\} \mid f\in C_0^\infty(\Gamma\times(0,T))\big\}$$ [8]

(closure in $\mathcal H\times\mathcal H$). By virtue of the relation
$L^TW^T_{\rm bd}f = -\Delta_gW^T_{\rm bd}f$ following from the wave
equation [1] and [6], the operator $L^T$ is interpreted
as Laplacian on waves filling the subdomain $\langle\Gamma\rangle^T$.

In the case $T > T_*$, one has $\langle\Gamma\rangle^T = \Omega$, $\operatorname{cl}\,\mathcal U^T_\Gamma = \mathcal H$,
and $L^T$ is a densely defined operator in $\mathcal H$, satisfying
$L^T\subset L$. Using [7], one proves the equality $L^T = L$.
This equality and representation [4] imply that

$$W^r_{\rm vol}h = \int_0^r (L^T)^{-1/2}\sin[(r-t)(L^T)^{1/2}]\,h(\cdot,t)\,dt$$ [9]

for all r > 0 and any fixed $T > T_*$.


Spectral Problem


The Dirichlet homogeneous boundary-value problem
is to find nontrivial solutions of the system

$$-\Delta_g\varphi = \lambda\varphi\quad\text{in } \Omega$$ [10]

$$\varphi = 0\quad\text{on } \Gamma$$ [11]

This problem is equivalent to the spectral analysis
of the operator L; it has the discrete spectrum
$\{\lambda_k\}_{k=1}^\infty$, $0<\lambda_1<\lambda_2\le\cdots$, $\lambda_k\to\infty$; the eigenfunctions
$\{\varphi_k\}_{k=1}^\infty$, $L\varphi_k = \lambda_k\varphi_k$, form an orthonormal basis
in $\mathcal H$.

Expanding the solutions of the problem [1]–[3]
over the eigenfunctions of the problem [10], [11]
one derives the spectral representation of waves:

$$u^{f,0}(\cdot,T) = W^T_{\rm bd}f = \sum_{k=1}^\infty (s^T_k,f)_{\mathcal F^T}\,\varphi_k$$ [12]


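where

$$s^T_k(\gamma,t) := -\lambda_k^{-1/2}\sin[(T-t)\lambda_k^{1/2}]\,\partial_\nu\varphi_k(\gamma)$$

Thus, for a given control f, the Fourier coefficients
of the wave $u^{f,0}$ are determined by the spectrum
$\{\lambda_k\}_{k=1}^\infty$ and the derivatives $\{\partial_\nu\varphi_k\}_{k=1}^\infty$.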
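The spectral representation can be tested against an exactly solvable case. The sketch below is an illustration under simplifying assumptions, not part of the method: it takes the interval [0, 1] in place of the manifold (the article assumes dimension at least two), a hypothetical control applied at x = 0 only, and T < 1, so that d'Alembert's formula gives u(x, T) = f(T − x) behind the front. The coefficient function is taken as $s^T_k = -\lambda_k^{-1/2}\sin[(T-t)\lambda_k^{1/2}]\,\partial_\nu\varphi_k$ with ν the outward normal, which is the sign that Green's formula yields for the Dirichlet eigenfunctions.

```python
import math


def control(t):
    """Hypothetical control at x = 0; it vanishes with its slope at t = 0 and t = 2/3."""
    return math.sin(3 * math.pi * t) ** 2


def fourier_coefficient(k, T, f, n=2000):
    """(s_k^T, f) on [0, 1] with the control applied at the left end point only.

    phi_k(x) = sqrt(2) sin(k pi x), lambda_k = (k pi)^2, and the outward normal
    derivative at x = 0 is -phi_k'(0) = -sqrt(2) k pi.
    """
    root = k * math.pi                       # lambda_k ** 0.5
    d_nu_phi = -math.sqrt(2.0) * root        # outward normal derivative at x = 0
    dt = T / n
    total = 0.0
    for i in range(n + 1):                   # composite trapezoidal rule in t
        t = i * dt
        s_k = -math.sin((T - t) * root) / root * d_nu_phi
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * s_k * f(t) * dt
    return total


def series_coefficients(T, f, modes=80):
    """Fourier coefficients of the wave for k = 1, ..., modes."""
    return [fourier_coefficient(k, T, f) for k in range(1, modes + 1)]


def wave_from_series(x, coeffs):
    """Partial sum of the eigenfunction expansion at the point x."""
    return sum(c * math.sqrt(2.0) * math.sin(k * math.pi * x)
               for k, c in enumerate(coeffs, start=1))


def wave_exact(x, T, f):
    """d'Alembert solution for T < 1: the profile f(T - x) behind the front."""
    return f(T - x) if x < T else 0.0


if __name__ == "__main__":
    T = 2.0 / 3.0
    coeffs = series_coefficients(T, control)
    for x in (0.1, 0.3, 0.5, 0.8):
        print(x, wave_from_series(x, coeffs), wave_exact(x, T, control))
```

With T = 2/3 the control also vanishes at t = T, so the wave satisfies the homogeneous Dirichlet condition at the final time and the truncated series converges quickly.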


Inverse Problems
General Setup 


The set of pairs $\Sigma := \{\lambda_k;\,\partial_\nu\varphi_k\}_{k=1}^\infty$ associated with
the problem [10], [11] is said to be the Dirichlet
spectral data of the manifold $(\Omega, d)$. The spectral
(frequency domain) inverse problem is to recover the
manifold from its spectral data.

Since the speed of wave propagation is unity, the
response operator $R^T$ contains the information not
about the entire manifold but only about its part
$\langle\Gamma\rangle^{T/2}$. This fact is taken into account in the
dynamical (time domain) inverse problem which
aims to recover the manifold from the operator $R^{2T}$
given for a fixed $T > T_*$.

If the manifolds $(\Omega', d')$ and $(\Omega'', d'')$ are isometric
via an isometry $i:\Omega'\to\Omega''$, then, identifying the
boundaries by $i(\gamma)\equiv\gamma$, one gets two manifolds with
the common boundary $\Gamma = \partial\Omega' = \partial\Omega''$ which possess
identical inverse data: $\Sigma' = \Sigma''$, $R'^{\,2T} = R''^{\,2T}$. Such
manifolds are called equivalent: they are
indistinguishable for the external observer extracting $\Sigma$ or
$R^{2T}$ from the boundary measurements. Therefore,
these data do not determine the manifold uniquely
and the statements of both inverse problems need to be
made precise. The precise formulation is given in the
form of two questions:


1. Does the coincidence of the inverse data imply 
the equivalence of the manifolds? 

2. Given the inverse data of an unknown manifold,
how can one construct a manifold possessing these
data?


The BC method gives an affirmative answer to the 
first question and provides a procedure producing a 
representative of the class of equivalent manifolds 
from its inverse data. The method is based on the 
concepts of model and “coordinatization.” 


Model 


A pair consisting of an auxiliary Hilbert space $\tilde{\mathcal H}$
and an operator $\tilde W^T_{\rm bd}:\mathcal F^T\to\tilde{\mathcal H}$ is said to be a model
of the system $\alpha^T$, if $\tilde W^T_{\rm bd}$ is determined by inverse
data, and the map $U: W^T_{\rm bd}f\mapsto\tilde W^T_{\rm bd}f$ is an isometry
from $\operatorname{Ran} W^T_{\rm bd}\subset\mathcal H$ onto $\operatorname{Ran}\tilde W^T_{\rm bd}\subset\tilde{\mathcal H}$. The model is
an intermediate object in solving inverse problems. It
plays the role of an auxiliary copy of the original
dynamical system which an external observer can build
from measurements on the boundary. While the
genuine wave process inside $\Omega$, initiated by a boundary
control, remains inaccessible for direct measurements,
its $\tilde{\mathcal H}$-representation can be visualized by means of the
model control operator $\tilde W^T_{\rm bd}$. This is illustrated by the
diagram in Figure 2, where the upper part is invisible
for an external observer, whereas the lower part can be
extracted from inverse data.

Each type of data determines a corresponding
model. The spectral model is the pair

$$\tilde{\mathcal H} := l_2,\qquad \tilde W^T_{\rm bd}f := \{(s^T_k,f)_{\mathcal F^T}\}_{k=1}^\infty$$ [13]

(see [12]); the role of isometry U is played by the
Fourier transform $F:\mathcal H\to\tilde{\mathcal H}$, $Fy := \{(y,\varphi_k)_{\mathcal H}\}_{k=1}^\infty$. By
virtue of [4], the data $\Sigma$ also determine the operator
$\tilde W^r_{\rm vol}: L_2([0,r];\tilde{\mathcal H})\to\tilde{\mathcal H}$,

$$\tilde W^r_{\rm vol}\tilde h = \int_0^r \tilde L^{-1/2}\sin[(r-t)\tilde L^{1/2}]\,\tilde h(t)\,dt,\quad r>0$$ [14]

where $\tilde L := ULU^* = \operatorname{diag}\{\lambda_k\}_{k=1}^\infty$. Thus, the spectral
model allows one to see the Fourier images of
invisible waves.

Figure 2 Model of a system. (Data from Belishev (1997).)

According to [5], the response operator $R^{2T}$
determines the modulus of the control operator

$|W^T| := ((W^T)^* W^T)^{1/2} = (C^T)^{1/2}$

which enters in the polar decomposition
$W^T = \Phi^T |W^T|$. Along with it, the response operator
determines the dynamical model

$\tilde{\mathcal H} := \mathrm{cl\,Ran}\,(C^T)^{1/2}, \qquad \tilde W^T := (C^T)^{1/2}$ [15]


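The correspondence "system → model" is realized
by the isometry $U = (\Phi^T)^*: W^T f \mapsto |W^T| f$. The operator
$\tilde L^T := U L^T U^*$, dual to the Laplacian on waves, is
determined by its graph

$\mathrm{gr}\,\tilde L^T = \mathrm{cl}\,\{(\tilde W^T f, -\tilde W^T f_{tt}) \mid f \in C_0^{\infty}(\Gamma \times (0,T))\}$ [16]

(see [8]) and, therefore, $\tilde L^T$ is also determined
by $R^{2T}$. In the case $T > T_*$, the operator
$\tilde W^r_{\rm vol}: L_2([0,r]; \tilde{\mathcal H}) \to \tilde{\mathcal H}$, dual to $W^r_{\rm vol}$, is represented
in the form

$\tilde W^r_{\rm vol} h = \int_0^r \tilde L^{-1/2} \sin[(r-t)\tilde L^{1/2}]\, h(t)\, dt, \qquad r>0$ [17]

in accordance with [9]. Thus, the dynamical model
visualizes the $(\Phi^T)^*$-images of the waves propagating
inside $\Omega$.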


Wave Coordinatization 


In a general sense, a coordinatization is a correspondence
between points $x$ of the studied set $A$ and
elements $\tilde x$ of another set $\tilde A$ such that: (i) the
elements of $\tilde A$ are accessible and distinguishable; (ii)
the map $x \mapsto \tilde x$ is a bijection; and (iii) relations
between elements of $\tilde A$ determine those between
points of $A$ which are studied (H Weyl). Coordinatization
enables one to study $A$ via operations with
coordinates $\tilde x \in \tilde A$.

The external observer investigating the manifold
probes $\Omega$ with waves initiated by sources on
$\Gamma$. The relevant coordinatization of $\Omega$ described
below uses such waves and is implemented in
three steps.

Step 1 (subdomains) Let $x(\gamma,\tau)$ be the end point of
the geodesic of length $\tau > 0$ emanating from $\gamma \in \Gamma$
in the direction $-\nu(\gamma)$, and let $\sigma^{\varepsilon}_{\gamma} \subset \Gamma$ be a small
neighborhood shrinking to $\gamma$ as $\varepsilon \to 0$. If $\tau < \tau_*(\gamma)$,
then the family of subdomains





Figure 3 The subdomains. 


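$\omega^{\varepsilon}(\gamma,\tau) := [\Omega^{\tau}\langle\Gamma\rangle \setminus \Omega^{\tau-\varepsilon}\langle\Gamma\rangle] \cap \Omega^{\tau}\langle\sigma^{\varepsilon}_{\gamma}\rangle$


(shaded domain in Figure 3) shrinks to $x(\gamma,\tau)$; if
$\tau > \tau_*(\gamma)$, then the family terminates: $\omega^{\varepsilon}(\gamma,\tau) = \emptyset$ as
$\varepsilon < \varepsilon_0(\tau)$ (the case $\gamma = \gamma'$ in Figure 3). Such behavior
of subdomains implies that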


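$\lim_{\varepsilon \to 0} \Omega^{r}\langle\omega^{\varepsilon}(\gamma,\tau)\rangle = \Omega^{r}\langle x(\gamma,\tau)\rangle$ for $\tau < \tau_*(\gamma)$,
$\lim_{\varepsilon \to 0} \Omega^{r}\langle\omega^{\varepsilon}(\gamma,\tau)\rangle = \emptyset$ for $\tau > \tau_*(\gamma)$ [18]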


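Step 2 (wave subspaces) Pass from the subdomains
to the corresponding subspaces $\mathcal H\langle\Gamma\rangle^{\tau}$, $\mathcal H\langle\sigma^{\varepsilon}_{\gamma}\rangle^{\tau}$,
$\mathcal H\langle\omega^{\varepsilon}(\gamma,\tau)\rangle^{r}$, and represent them via reachable sets
by [7]:

$\mathcal H\langle\Gamma\rangle^{\tau} = \mathrm{cl}\,W^{\tau}\mathcal F^{\tau}, \qquad \mathcal H\langle\sigma^{\varepsilon}_{\gamma}\rangle^{\tau} = \mathrm{cl}\,W^{\tau}\mathcal F^{\tau}_{\sigma^{\varepsilon}_{\gamma}}$

$\mathcal H\langle\omega^{\varepsilon}(\gamma,\tau)\rangle^{r} = \mathrm{cl}\,W^{r}_{\rm vol} L_2([0,r]; \mathcal H\langle\omega^{\varepsilon}(\gamma,\tau)\rangle)$
$= \mathrm{cl}\,W^{r}_{\rm vol} L_2([0,r]; [\mathcal H\langle\Gamma\rangle^{\tau} \ominus \mathcal H\langle\Gamma\rangle^{\tau-\varepsilon}] \cap \mathcal H\langle\sigma^{\varepsilon}_{\gamma}\rangle^{\tau})$
$= \mathrm{cl}\,W^{r}_{\rm vol} L_2([0,r]; [\mathrm{cl}\,W^{\tau}\mathcal F^{\tau} \ominus \mathrm{cl}\,W^{\tau-\varepsilon}\mathcal F^{\tau-\varepsilon}] \cap \mathrm{cl}\,W^{\tau}\mathcal F^{\tau}_{\sigma^{\varepsilon}_{\gamma}})$

Define

$\mathcal W^{r}_{(\gamma,\tau)} := \lim_{\varepsilon \to 0} \mathrm{cl}\,W^{r}_{\rm vol} L_2([0,r]; [\mathrm{cl}\,W^{\tau}\mathcal F^{\tau} \ominus \mathrm{cl}\,W^{\tau-\varepsilon}\mathcal F^{\tau-\varepsilon}] \cap \mathrm{cl}\,W^{\tau}\mathcal F^{\tau}_{\sigma^{\varepsilon}_{\gamma}})$ [19]

and $\mathcal W^{r}_{(\gamma,0)} := \mathcal W^{r}_{(\gamma,+0)}$ for $\tau = 0$ (the limits in the sense of the
strong operator convergence of the projections in $\mathcal H$
on the corresponding subspaces). By the definitions,
one has $\mathcal W^{r}_{(\gamma,\tau)} = \lim_{\varepsilon \to 0} \mathcal H\langle\omega^{\varepsilon}(\gamma,\tau)\rangle^{r}$, whereas [18]
leads to the equality

$\mathcal W^{r}_{(\gamma,\tau)} = \mathcal H\langle x(\gamma,\tau)\rangle^{r}$ for $\tau < \tau_*(\gamma)$, $\qquad \mathcal W^{r}_{(\gamma,\tau)} = \{0\}$ for $\tau > \tau_*(\gamma)$ [20]

for all $\gamma \in \Gamma$, $\tau \ge 0$, $r > 0$. As a result, since any $x \in \Omega$
can be represented as $x = x(\gamma,\tau)$, one attaches to every
point of the manifold a family of expanding subspaces
$\{\mathcal W^{r}_{(\gamma,\tau)} \mid r > 0\}$ built out of waves. As is seen from [20],
the family is determined by the point $x$ (not dependent
on the representation $x = x(\gamma,\tau)$); the subspaces which
it consists of coincide with $\mathcal H\langle x\rangle^{r}$.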
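Expressing the distance as

$d(x', x'') = 2 \inf\{r > 0 \mid \mathcal H\langle x'\rangle^{r} \cap \mathcal H\langle x''\rangle^{r} \neq \{0\}\}$

in accordance with [20], one can represent

$d(x', x'') = 2 \inf\{r > 0 \mid \mathcal W^{r}_{(\gamma',\tau')} \cap \mathcal W^{r}_{(\gamma'',\tau'')} \neq \{0\}\}$ [21]

where $x' = x(\gamma',\tau')$, $x'' = x(\gamma'',\tau'')$, and hence find
the distance via the above families.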
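
The idea behind this distance formula can be illustrated by a toy computation on the interval [0, 1]. The sketch below is not part of the BC method itself: it assumes that the subspace attached to a point $x$ at level $r$ consists of the functions supported in the ball of radius $r$ around $x$, and it replaces "nonzero intersection of subspaces" by "the two balls share a grid node". The names `supports_overlap` and `wave_distance` are hypothetical.

```python
def supports_overlap(x1, x2, r, grid):
    """True if some grid node lies strictly within distance r of both points."""
    return any(abs(s - x1) < r and abs(s - x2) < r for s in grid)

def wave_distance(x1, x2, grid, radii):
    """Return 2 * (smallest listed r for which the two ball supports overlap)."""
    for r in radii:                 # radii must be sorted in ascending order
        if supports_overlap(x1, x2, r, grid):
            return 2 * r
    return None                     # the supports never meet for the listed radii

n = 2000
grid = [j / n for j in range(n + 1)]        # discretization of [0, 1]
radii = [j / n for j in range(1, n + 1)]    # candidate values of r
d = wave_distance(0.2, 0.65, grid, radii)   # close to |0.65 - 0.2|, up to grid resolution
```

Up to the grid resolution, the value returned agrees with the ordinary distance between the two points, which is the content of the formula in this simplified setting.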

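Step 3 (wave copy) Varying $\gamma \in \Gamma$ and $\tau \ge 0$,
gather all nonzero families $\{\mathcal W^{r}_{(\gamma,\tau)} \mid r > 0\} =: \tilde x$ in the
set $\tilde\Omega = \{\tilde x\}$. Redenoting $\mathcal W^{r}_{\tilde x} := \mathcal W^{r}_{(\gamma,\tau)} \in \tilde x$, endow the
set with the distance

$\tilde d(\tilde x', \tilde x'') := 2 \inf\{r > 0 \mid \mathcal W^{r}_{\tilde x'} \cap \mathcal W^{r}_{\tilde x''} \neq \{0\}\}$ [22]

In view of [21], one has $d(x', x'') = \tilde d(\tilde x', \tilde x'')$, so
that the metric space $(\tilde\Omega, \tilde d)$ is an isometric copy
of $(\Omega, d)$ by construction. Thus, the correspondence
$x \mapsto \tilde x$ ("point ↦ family") is an isometry and
satisfies the general principles (i)-(iii) of
coordinatization.

The manifold $(\tilde\Omega, \tilde d)$ is the end product of the
wave coordinatization. It represents the original
manifold as a collection of infinitesimal sources
interacting with each other via the waves which they
produce.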


Solving Inverse Problems 


The motivation for the above coordinatization is
that the wave copy can be reproduced via any
model. Namely, the external observer with the
knowledge of the spectral data or of $R^{2T}$ ($T > T_*$) can recover $(\Omega, d)$
up to isometry by the following procedure:

1. Construct the model corresponding to the given
inverse data and determine the operators $\tilde W^{\tau}$,
$0 < \tau \le T$, by [13], [15]; then determine
$\tilde L$, $\tilde L^T$, and $\tilde W^{r}_{\rm vol}$ by [14] or [16], [17].

2. Replace on the right-hand side of [19] all
operators $W$ without tildes by the ones with
tildes, and get the subspaces $\tilde{\mathcal W}^{r}_{(\gamma,\tau)} = U\mathcal W^{r}_{(\gamma,\tau)}$,
$\gamma \in \Gamma$, $\tau \ge 0$, $r > 0$.

3. Gather all nonzero families $\{\tilde{\mathcal W}^{r}_{(\gamma,\tau)} \mid r > 0\} =: \hat x$ in the
set $\hat\Omega = \{\hat x\}$ and redenote the subspaces as
$\tilde{\mathcal W}^{r}_{\hat x} := \tilde{\mathcal W}^{r}_{(\gamma,\tau)} \in \hat x$; endow the set with the metric
$\hat d(\hat x', \hat x'') = 2 \inf\{r > 0 \mid \tilde{\mathcal W}^{r}_{\hat x'} \cap \tilde{\mathcal W}^{r}_{\hat x''} \neq \{0\}\}$ (see [22]),
and get a sample $(\hat\Omega, \hat d)$ of the wave copy $(\tilde\Omega, \tilde d)$.

This sample is isometric to the original $(\Omega, d)$ by
construction. Identifying properly the boundaries $\partial\hat\Omega$
and $\Gamma$, one turns $(\hat\Omega, \hat d)$ into a canonical representative
of the class of equivalent manifolds possessing
the given inverse data.

If the response operator $R^{2T}$ is given for a fixed
$T < T_*$, the above procedure produces the wave
copy of the submanifold $(\Omega^{T}\langle\Gamma\rangle, d)$. This locality in
time is an intrinsic feature and advantage of the BC
method: longer time of observation on $\Gamma$ increases
the depth of penetration into $\Omega$.


Amplitude Formula 


Another variant of the BC method is based on
geometrical optics formulas describing the propagation
of singularities of the waves.

Let $y \in \mathcal H$, and let $\beta$ be the density of the volume
in semigeodesic coordinates: $dx = \beta\, d\Gamma\, d\tau$; the
function

$\tilde y(\gamma,\tau) := \beta^{1/2}(\gamma,\tau)\, y(x(\gamma,\tau))$ for $0 \le \tau < \tau_*(\gamma)$, $\qquad \tilde y(\gamma,\tau) := 0$ otherwise

defined on $\Gamma \times [0, T_*]$ is called the image of $y$. The
amplitude formula represents the images of waves
initiated by boundary controls in the form

$\tilde u^{f}(\cdot,T)(\gamma,\tau) = \lim_{t \to T-\tau-0} [(W^T)^*(I - P^{\tau})W^T f](\gamma,t), \qquad 0 < \tau < T$

where $I$ is the identity operator and $P^{\tau}$ is the
projection in $\mathcal H$ onto $\mathrm{cl}\,W^{\tau}\mathcal F^{\tau}$. The formula is
derived by the ray method going back to
J Hadamard; the derivation uses the controllability
[7].

Any model determines the right-hand side of the
last relation by the isometry: $(W^T)^*(I - P^{\tau})W^T
= (\tilde W^T)^*(\tilde I - \tilde P^{\tau})\tilde W^T$, where $\tilde W^T = UW^T$, $\tilde I$ is
the identity operator, and $\tilde P^{\tau} = UP^{\tau}U^*$ is the projection
in $\tilde{\mathcal H}$ onto $\mathrm{cl}\,\tilde W^{\tau}\mathcal F^{\tau}$. This leads to the
representation

$\tilde u^{f}(\cdot,T)(\gamma,\tau) = \lim_{t \to T-\tau-0} [(\tilde W^T)^*(\tilde I - \tilde P^{\tau})\tilde W^T f](\gamma,t), \qquad 0 < \tau < T$ [23]

and makes the amplitude formula a useful tool for
solving the inverse problems. The external observer
can construct a model via inverse data and then
visualize by [23] the wave images on the part $\Theta^T$ of
the pattern (see Figure 1). The collection of images
$\tilde u^{f}$ corresponding to all possible controls $f$ is rich
enough for recovering the tensor $g$ on $\Theta^T$ (i.e., the
metric tensor in semigeodesic coordinates) and
turning the pattern into an isometric copy of the
submanifold $(\Omega^{T}\langle\Gamma\rangle, d)$. This variant of the method is
more appropriate if one needs to recover unknown
coefficients of the wave equation in $\Omega$; it can be
realized in terms of numerical algorithms.


Extensions of the Method 


Electromagnetic waves are also well suited for
coordinatization and for constructing the wave copy
$(\tilde\Omega, \tilde d)$. An appropriate version of the amplitude
formula also exists for the system governed by the
Maxwell equations (see Further Reading). At present
(2004), the applicability of the BC method to three-dimensional
inverse problems of elasticity theory is
still an open question. The following hypothesis
concerns the Lamé system: the wave coordinatization
procedure (steps 1-3) using the elastic waves instead
of the above $u^f$ gives rise to the copy of $\Omega \subset \mathbb R^3$
endowed with the metric $|dx|^2/c_p^2$, where
$c_p = \sqrt{(\lambda + 2\mu)/\rho}$ is the speed of the pressure waves.

The concept of model is used for solving inverse 
problems for the heat and Schrödinger equations 
(Avdonin and Belishev, 1995-2004), as well as for 
the problem of boundary data continuation 
(Belishev 2001, Kurylev and Lassas 2002). A variant 
of the BC method allows one to recover not only the 
manifold but also the Schrödinger type operators on 
it and/or the dissipative term in the scalar wave 
equation (Kurylev and Lassas 1993-2003). 

An appropriate version of the amplitude formula 
solves the inverse problem for one-dimensional two- 
velocity dynamical system which describes the waves 
consisting of two modes propagating with different 
speeds and interacting with each other (Belishev, 
Blagoveschenski, Ivanov, 1997-2000). 

One more variant of coordinatization, going back
to the first paper on the BC method, associates with
points $x \in \Omega$ the Dirac measures $\delta_x$; then, their
images $\tilde\delta_x$ are identified via suitable models. This
variant solves inverse problems on graphs and the
two-dimensional elliptic Calderon problem. The
reader is referred to articles by the present author
listed in Further Reading.

Within the scope of the method, one derives some 
natural analogs of the classical Gelfand—Levitan— 
Krein—Marchenko equations (Belishev, 1987-2001). 
Also, an appropriate analog solves the kinematic 
inverse problem for a class of two-dimensional 
manifolds (Pestov 2004). 

There exists an abstract version of the
approach, embedding the BC method into the
framework of linear system theory (Belishev
2001). The method is also related to the problem
of triangular factorization of operators (Belishev
and Pushnitski 1996).

Numerical algorithms for solving two-dimensional
spectral and dynamical inverse problems for the wave
equation $\rho u_{tt} - \Delta u = 0$, which recover the variable
density $\rho$, have been developed and tested (Filippov,
Gotlib, Ivanov, 1994-1999).


See also: Dynamical Systems and Thermodynamics; 
Geophysical Dynamics; Inverse Problem in Classical 
Mechanics. 
Introduction


Integrable equations are a special class of nonlinear 
equations arising in the modeling of a wide variety 
of physical phenomena. It has been argued that 
integrable PDEs are in a certain, specific sense 
“universal” models for physical phenomena invol- 
ving weak nonlinearity. Indeed, integrable equations 
are obtained by a procedure involving rescaling and 
an asymptotic expansion from very large classes of 
nonlinear evolution equations, which preserves 
integrability while retaining in the limit weakly 
nonlinear effects. For this reason, integrable equa- 
tions are a very important class of PDEs. Important 
examples are the nonlinear Schrödinger (NLS) 
equation 


$iq_t + q_{xx} - 2\lambda|q|^2 q = 0, \qquad \lambda = \pm 1$ [1]


the Korteweg-de Vries (KdV) equation


$q_t + q_x \pm q_{xxx} + 6qq_x = 0$ [2]

the modified KdV (mKdV) equation


7 an A F 6àq°qx =, A= EIl [3] 


and the sine-Gordon (sG) equation in light-cone or
laboratory coordinates


$q_{xt} + \sin q = 0 \quad$ or $\quad q_{tt} - q_{xx} + \sin q = 0$ [4]

A general method for solving the initial-value
problem for integrable equations in one space
dimension was discovered in 1967, when in a
pioneering and much celebrated work (Gardner
et al. 1967), the initial-value problem for KdV
with decaying initial condition was completely
solved. Soon afterwards, it was understood that
this method, now known as the "inverse scattering
transform," is of more general applicability. Indeed,
it can be applied to those nonlinear equations that
can be written as the compatibility condition of a
pair of linear eigenvalue equations. The method of
solution for the Cauchy problem essentially relies on
the possibility of expressing the equation through
this pair, now called a Lax pair after the work of
Lax (1968), who first clarified the connection.
Zakharov and Shabat (1972) constructed such a
pair for the NLS equation, and in subsequent years
the Lax pairs associated with all important integrable
equations in one and two spatial variables were
constructed. These include the NLS, sG, mKdV,
Davey-Stewartson I and II, and Kadomtsev-Petviashvili
I and II equations.

There is no universally accepted definition of an 
integrable PDE, but on account of the above results, 
the existence of a Lax pair can be taken as the 
defining property of such equations. In the course of 
the 1970s, the inverse scattering transform was 
applied to solve the initial-value (Cauchy) problem 
for many integrable equations. In principle, there is 
no obstruction to solving analytically the initial-value 
problem by the inverse scattering transform as soon 
as a Lax pair is constructed for the equation, and 
appropriate decaying initial conditions are pre- 
scribed. The solution is then characterized in 
terms of a certain integral equation. This approach 
is equivalent to associating with the initial-value 
problem a classical problem in complex analysis, 
namely a matrix Riemann-Hilbert problem, 
defined in the complex spectral space. This point 
of view is currently taken by many authors as it 
provides a unifying and very flexible framework for 
the analysis. 

After the success of the inverse scattering trans- 
form in solving the Cauchy problem, it was natural 
to attempt to generalize the approach to boundary- 
value problems. To describe the difficulties involved 
in this generalization, consider the case of evolution 
equations in one space and one time dimensions. 
The independent variables can be denoted by $(x,t)$,
with $t > 0$ representing time. While the initial-value
problem is posed on the full real line, hence for
$x \in (-\infty, \infty)$, the simplest boundary-value problem
is posed on a half-line, for $x \in (0, \infty)$. In addition
to initial conditions at the initial time $t = 0$, it is
necessary to prescribe conditions at the boundary
$x = 0$. The number of conditions that must be
prescribed to obtain a problem which admits a
unique solution depends on the particular equation,
but for evolution equations it is roughly equal to
half the number of x-derivatives involved in the
equation. For example, for the NLS equation, a
well-posed problem is defined as soon as one
boundary condition at $x = 0$ is prescribed; hence a
typical boundary-value problem for this equation is
obtained, for example, when $q(x,0) = q_0(x)$ and
$q(0,t) = g_0(t)$ are prescribed and compatible, so that
$q_0(0) = g_0(0)$. It follows that, while $q_{xx}(0,t)$ can be
computed from the equation, $q_x(0,t)$ is not immediately
known. An even more difficult situation
arises for the KdV equation [2] (with the + sign),
for which a well-posed problem is again defined as
soon as one boundary condition is prescribed, so
that there are two unknown boundary values.
Because of this simple fact, a straightforward
application of the ideas of the inverse scattering
transform immediately encounters one crucial difficulty.
This transform method yields an integral
representation of the solution which involves not
only the given boundary condition $q(0,t) = g_0(t)$, but also the
other "unknown" boundary values: in our example
for the NLS equation, the function $q_x(0,t)$. The
problem of characterizing these unknown boundary
values has impeded progress in this direction for over
thirty years.

On account of their physical significance, various 
boundary-value problems for the KdV equation have 
been considered, and classical PDE techniques (not 
specific to integrable models) have been used to 
establish existence and uniqueness results (Bona 
et al. 2001, Colin and Ghidaglia 2001, Colliander 
and Kenig 2001). These approaches, and in particular
the approach of Colliander and Kenig, are
quite general and possibly of wide applicability, and
give global existence results in wide functional
classes. However, none of these results use the
integrable structure of the equation in any fundamental
or systematic way. Since these equations are
integrable on the full line, they have
very special properties that should be exploited in
the analysis, and it is natural to try to generalize the
inverse scattering transform approach.

Such a generalization is sometimes directly possi- 
ble. For example, it has been used for studying the 
problem on the half-line for the hyperbolic version 
of the sG equation [4a] which does not involve 
unknown boundary values (Fokas 2000, Pelloni). It 
has also been used to study some specific boundary- 
value problems for the NLS equation, for example, 
for homogeneous Dirichlet or Neumann conditions, 
when it is possible to use even or odd extensions of 
the problem to the full line (Ablowitz and Segur 
1974), or more recently in Degasperis et al. (2001). 
In the latter case, however, the unknown boundary 
values are characterized through an integral Fred- 
holm equation, which does not admit a unique 
solution. Some special cases of boundary-value 
problems for the KdV equation (Adler et al. 1997, 
Habibullin 1999) and elliptic sG (Sklyanin 1987) 
have also been studied via the inverse scattering 
transform. However all the examples considered are 
nongeneric, and it has recently been shown (Fokas, 
in press) that the boundary conditions chosen fall in 
the special class of the so-called “linearizable” 
boundary conditions, for which the problem can be 
solved as if it were posed on the full line. One 
cannot hope to use similar methods to solve the 
problem with generic boundary conditions. 


Recently, Fokas (2000) introduced a general 
methodology to extend the ideas of the inverse 
scattering transform to boundary-value problems. 
This methodology provides the tools to analyze 
boundary-value problems for integrable equations to 
a considerable degree of generality. We note as a 
side remark that linear PDEs are trivially integrable, 
in the sense of admitting a Lax pair (in this case the 
Lax pair can be found algorithmically, while the 
construction of the Lax pair associated with a 
nonlinear equation is by no means trivial). As a 
consequence of this remark, the extension of the 
inverse scattering transform also provides a method 
for solving boundary-value problems for a large 
variety of linear PDEs of mathematical physics. 
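
As a small illustration of the remark that linear PDEs admit a Lax pair, consider the linear Schrödinger equation $iq_t + q_{xx} = 0$ (formally, eqn [1] with the nonlinear term dropped). The scalar pair below is one standard choice, offered here as a sketch rather than as the construction used in the references; cross-differentiating shows that it is compatible exactly when $q$ solves the PDE.

```latex
\begin{align*}
\mu_x - ik\mu &= q, \qquad \mu_t + ik^2\mu = iq_x - kq, \\
\mu_{xt} &= ik\mu_t + q_t = k^3\mu - kq_x - ik^2 q + q_t, \\
\mu_{tx} &= -ik^2\mu_x + iq_{xx} - kq_x = k^3\mu - ik^2 q + iq_{xx} - kq_x, \\
\mu_{xt} = \mu_{tx} &\iff q_t = iq_{xx} \iff iq_t + q_{xx} = 0 .
\end{align*}
```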

What follows is a general description of the
approach of Fokas, considering, for the sake of
concreteness, the case of an integrable PDE in the
two variables $(x,t)$ which vary in the domain $D$
(typically, for an evolution problem $D = (0,\infty) \times
(0,T)$). We assume that $q(x,t)$ denotes the unique
solution of a boundary-value problem posed for
such an equation.


The method consists of the following steps. 


1. Write the PDE as the compatibility condition of a
Lax pair. This is a pair of linear ODEs for the
function $\mu = \mu(x,t,k)$ involving the solution
$q(x,t)$ of the PDE, the derivatives of this solution,
and a complex parameter $k$, called the spectral
parameter. This can be done algorithmically for
linear PDEs, and in this case $\mu(x,t,k)$ is a scalar
function. For nonlinear integrable PDEs, $\mu(x,t,k)$
is in general a matrix-valued function.

The equivalence of the PDE with a Lax pair
can be reformulated in the language of differential
forms, and in this language it is easier to
describe the methodology in general. Assume
then that $\Omega(x,t,k)$ is a differential 1-form
expressed in terms of a function $q(x,t)$ and its
derivatives, and of a complex variable $k$, and one
which is characterized by the property that
$d\Omega = 0$ if and only if $q(x,t)$ satisfies the given
PDE. The closure of the form $\Omega$ yields the two
important consequences 2(a) and 2(b) below.

2. (a) Since the domain $D$ under consideration is
simply connected, the closed form $\Omega$ is also exact;
hence, it is possible to find a particular 0-form
$\mu(x,t,k)$ solving $d\mu = \Omega$. In particular, $\mu(x,t,k)$
can be chosen to be sectionally bounded with
respect to $k$ by solving either a Riemann-Hilbert
problem or a $\bar\partial$ (d-bar) problem in the complex
spectral $k$ plane, and the solution $\mu(x,t,k)$ is
then expressed in terms of certain "spectral
functions" depending on all the boundary values
of the solution $q(x,t)$ of the PDE. The function
$q(x,t)$ can then be expressed in terms of
$\mu(x,t,k)$. (b) The integral of $\Omega$ along the
boundary of the domain $D$ vanishes. This yields
an integral constraint between all boundary
values of the solution of the PDE, which
becomes an algebraic constraint for the spectral
functions. The resulting algebraic identity is
called the "global relation."

3. The last step is the analysis of the k-invariance
properties of the global relation. This analysis
yields the characterization of the spectral functions
in terms only of the given boundary
conditions.
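
Step 2(b) can be checked numerically in the simplest linear setting. The sketch below assumes the linear Schrödinger equation $iq_t + q_{xx} = 0$ on the half-line with the scalar Lax pair $\mu_x - ik\mu = q$, $\mu_t + ik^2\mu = iq_x - kq$, for which the closed 1-form is $e^{-ikx + ik^2 t}[q\,dx + (iq_x - kq)\,dt]$. The exact solution, the parameter `p`, the truncation length `X`, and the function names are hypothetical choices made only for this illustration.

```python
import cmath

def simpson(f, a, b, n=4000):
    """Composite Simpson rule for a complex-valued integrand (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3

p = 0.5 + 1.0j   # hypothetical parameter; Im p > 0 makes q decay as x grows

def q(x, t):
    """Exact solution exp(i p x - i p^2 t) of i q_t + q_xx = 0."""
    return cmath.exp(1j * p * x - 1j * p * p * t)

def q_x(x, t):
    return 1j * p * q(x, t)

def global_relation_residual(k, T, X=40.0):
    """Integrate the closed form around the boundary of (0, X) x (0, T).

    The side x = X is dropped because the integrand is negligible there
    when Im(p - k) > 0 and X is large.
    """
    bottom = simpson(lambda x: cmath.exp(-1j * k * x) * q(x, 0.0), 0.0, X)
    top = simpson(lambda x: cmath.exp(-1j * k * x + 1j * k * k * T) * q(x, T), 0.0, X)
    left = simpson(
        lambda t: cmath.exp(1j * k * k * t) * (1j * q_x(0.0, t) - k * q(0.0, t)),
        0.0, T)
    return bottom - top - left   # counterclockwise boundary integral

res = abs(global_relation_residual(k=0.3 - 0.2j, T=1.0))
```

The residual is zero up to quadrature and truncation error, which is the linear analog of the global relation coupling the initial datum, the solution at time $T$, and both boundary values at $x = 0$.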


The crucial and most difficult step in the solution 
process is the characterization described above. The 
analysis required depends on the type of problem 
under consideration. For nonlinear integrable evolu- 
tion PDEs posed on the half-line x > 0, in general 
the characterization mentioned in step (3) involves 
solving a system of nonlinear Volterra integral 
equations. This is an important difference from the 
case of the Cauchy problem, where the solution is 
given by a single integral equation where all the 
terms are explicitly known. 

The method outlined above has been applied 
successfully to solve a variety of boundary-value 
problems for linear and integrable nonlinear PDEs. 
For concreteness, here the focus is on the important
case of integrable evolution PDEs in one space dimension, which
illustrates clearly the generalities of this method.


Integrable Evolution Equations in One 
Space Dimension 


The crucial property of integrable PDEs which is 
used in the inverse scattering transform approach to 
solve the initial-value problem is the fact that they 
can be written as the compatibility of a Lax pair. 
Many integrable evolution equations of physical 
significance (such as NLS, KdV, sG, and mKdV) 
admit a Lax pair of the form 


$\mu_x + if_1(k)\sigma_3\mu = Q(x,t,k)\mu$
$\mu_t + if_2(k)\sigma_3\mu = \tilde Q(x,t,k)\mu$ [5]

where $\mu(x,t,k)$ is a 2 × 2 matrix, $\sigma_3 = \mathrm{diag}(1, -1)$,
$f_i(k)$, $i = 1, 2$, are analytic functions of the complex
parameter $k$, and $Q$, $\tilde Q$ are analytic functions of $k$,
of the function $q(x,t)$ (and of its complex conjugate
$\bar q(x,t)$ for complex-valued problems) and of its
derivatives. For example, the NLS equation [1] is
equivalent to the compatibility condition of the pair

$\mu_x + ik\sigma_3\mu = Q\mu, \qquad Q = \begin{pmatrix} 0 & q \\ \lambda\bar q & 0 \end{pmatrix}$
$\mu_t + 2ik^2\sigma_3\mu = (2kQ - iQ_x\sigma_3 - i\lambda|q|^2\sigma_3)\mu$ [6]
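
The compatibility claim can be spot-checked numerically. The sketch below assumes the pair is written as $\mu_x = U\mu$, $\mu_t = V\mu$ with $U = -ik\sigma_3 + Q$ and $V = -2ik^2\sigma_3 + 2kQ - iQ_x\sigma_3 - i\lambda|q|^2\sigma_3$, and evaluates the zero-curvature residual $U_t - V_x + [U,V]$ on the focusing ($\lambda = -1$) one-soliton solution $q = \eta\,\mathrm{sech}(\eta x)\,e^{i\eta^2 t}$, using exact derivatives. The helper names and the parameter `omega` (which allows a deliberately wrong phase speed to be tested) are hypothetical.

```python
import cmath
import math

SIGMA3 = [[1, 0], [0, -1]]

def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def add(*Ms):
    return [[sum(M[i][j] for M in Ms) for j in range(2)] for i in range(2)]

def scale(c, A):
    return [[c * A[i][j] for j in range(2)] for i in range(2)]

def offdiag(f, lam):
    """Matrix with f in the (1,2) entry and lam * conj(f) in the (2,1) entry."""
    return [[0, f], [lam * f.conjugate(), 0]]

def zero_curvature_residual(x, t, k, eta=1.3, omega=None):
    """Largest entry of U_t - V_x + [U, V] for q = eta sech(eta x) exp(i omega t)."""
    lam = -1                                  # focusing NLS
    if omega is None:
        omega = eta ** 2                      # value for which q solves NLS
    s = 1 / math.cosh(eta * x)
    th = math.tanh(eta * x)
    phase = cmath.exp(1j * omega * t)
    q = eta * s * phase
    q_x = -eta ** 2 * s * th * phase
    q_xx = eta ** 3 * s * (1 - 2 * s * s) * phase
    q_t = 1j * omega * q
    absq2 = eta ** 2 * s * s
    absq2_x = -2 * eta ** 3 * s * s * th
    Q, Qx, Qxx, Qt = (offdiag(f, lam) for f in (q, q_x, q_xx, q_t))
    U = add(scale(-1j * k, SIGMA3), Q)
    V = add(scale(-2j * k * k, SIGMA3), scale(2 * k, Q),
            scale(-1j, matmul(Qx, SIGMA3)), scale(-1j * lam * absq2, SIGMA3))
    V_x = add(scale(2 * k, Qx), scale(-1j, matmul(Qxx, SIGMA3)),
              scale(-1j * lam * absq2_x, SIGMA3))
    comm = add(matmul(U, V), scale(-1, matmul(V, U)))
    R = add(Qt, scale(-1, V_x), comm)         # U_t equals Q_t
    return max(abs(R[i][j]) for i in range(2) for j in range(2))

on_shell = zero_curvature_residual(0.3, 0.7, 0.7 + 0.4j)
off_shell = zero_curvature_residual(0.0, 0.0, 1.0, omega=2.0)
```

The residual vanishes to rounding error when `omega` equals `eta**2`, and is visibly nonzero otherwise; it does not depend on the choice of the spectral parameter `k`.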


The first step towards a systematic new approach to
solving boundary-value problems was the work of
Fokas and Its, who associated the boundary-value 
problem for NLS on the half-line to a single 
Riemann-Hilbert problem determined by both 
equations in the Lax pair. The jump determining 
this Riemann—-Hilbert problem has an explicit 
exponential dependence on both x and t. This differs 
from the classical inverse scattering approach, in 
which the x-part of the Lax pair is used to determine 
an x-transform with t-dependent scattering data, 
and the t-part of the Lax pair is then exploited to 
find the time evolution of these data. The work of 
Fokas and Its led to the understanding that both 
equations in the Lax pair [6] must be considered in 
order to construct a spectral transform appropriate 
to solve boundary-value problems. Fokas (2000) 
reviews his systematic way to solve these problems 
by performing the simultaneous spectral analysis of 
both equations in the Lax pair. The transform thus 
obtained, which is a nonlinearization of the Fourier 
transform, precisely generalizes the inverse scatter- 
ing transform. 

This simultaneous analysis also leads naturally to 
the identification of the “global relation” which 
holds between initial and boundary data, and which 
plays an essential role in deriving an expression for 
the solution of the problem which does not involve 
unknown boundary values. 

The Riemann-Hilbert problem with explicit (x, t) 
dependence, the global relation, and the invariance 
properties of the latter with respect to the spectral 
parameter are the fundamental ingredients of this 
systematic approach to solve boundary-value pro- 
blems for integrable equations. 

The steps involved in this method are summar- 
ized in the introduction. While steps (1) and (2) 
can be described generally, and, once the Lax pair 
is identified, can be performed algorithmically (at 
least under the assumption that the solution of the 
PDE exists), the last step is the most difficult part 
of the analysis, and it needs to be considered 
separately for each given problem. However, it is 
this step that yields the effective characterization 
of the solution. 

The results obtained for the particular case of eqn
[1] are reviewed in detail in the next section, as they
provide an important example, which can be
generalized without any conceptual difficulty to
eqns [2]-[4].
The NLS Equation


As already mentioned, the initial-value problem for 
NLS was solved, for decaying initial condition, by 
Zakharov and Shabat, and studied in depth by many 
others. However, by the mid-1990s only a handful 
of papers had been written on the solution of the 
boundary-value problem posed on the half-line, all 
on a specific example or aspect of the problem, or 
attempts at solving the problem using general PDE 
techniques. 

For this equation, the approach of Fokas yields
the following results. Let the complex-valued
function $q(x,t)$ satisfy the NLS equation [1], for
$x > 0$ and $t > 0$, with one prescribed initial condition and one
prescribed boundary condition. For the sake of concreteness,
we select the specific initial and boundary
conditions

$q(x,0) = q_0(x) \in S(\mathbb R^+)$
$q(0,t) = g_0(t) \in S(\mathbb R^+)$ [7]
$q_0(0) = g_0(0)$

where $S$ denotes the space of Schwartz functions
(similar results hold for different choices of boundary
conditions, and less restrictive function classes).
The solution of this initial boundary-value (IBV)
problem can be constructed as follows (Fokas 2000,
2002; in press):


• Given $q_0(x)$, construct the spectral functions
{a(k), b(k)}. These functions are defined by

$a(k) = \varphi_2(0,k), \qquad b(k) = \varphi_1(0,k)$

where the vector $\varphi(x,k)$ with components $\varphi_1(x,k)$
and $\varphi_2(x,k)$ is the following solution of the
x-problem of the associated Lax pair evaluated
at $t = 0$:

$\varphi_x + ik\sigma_3\varphi = Q(x,0,k)\varphi, \qquad 0 < x < \infty, \quad \mathrm{Im}\,k \ge 0$
$\varphi(x,k) = e^{ikx}\left((0,1)^T + o(1)\right)$ as $x \to \infty$

($\sigma_3$ and $Q(x,t,k)$ are defined after eqns [5] and [6],
respectively).

• Given $q_0(x)$ and $g_0(t)$, characterize $g_1(t)$ by the
requirement that the spectral functions
{A(t,k), B(t,k)} satisfy the global relation

$B(t,k) - R(k)A(t,k) = e^{4ik^2 t}c(t,k), \qquad t \in [0,T], \quad k \in \bar D$

where $D$ denotes the first quadrant of the
complex k-plane:

$D = \{k \mid \mathrm{Re}\,k > 0, \ \mathrm{Im}\,k > 0\}$

$\bar D$ denotes the closure of $D$, and $c(t,k)$ is a
function of $k$ analytic in $D$ and of order $O(1/k)$
as $k \to \infty$. The spectral functions are defined by

$A(t,k) = e^{2ik^2 t}\,\overline{\Phi_2(t,\bar k)}, \qquad B(t,k) = -e^{2ik^2 t}\,\Phi_1(t,k)$

where the vector $\Phi(t,k)$ with components $\Phi_1$ and
$\Phi_2$ is the following solution of the t-problem of
the associated Lax pair evaluated at $x = 0$:

$\Phi_t + 2ik^2\sigma_3\Phi = \tilde Q(0,t,k)\Phi, \qquad 0 < t < T, \quad k \in \mathbb C$
$\Phi(0,k) = (0,1)^T$

$\tilde Q(0,t,k) = \begin{pmatrix} -i\lambda|g_0(t)|^2 & 2kg_0(t) + ig_1(t) \\ 2\lambda k\bar g_0(t) - i\lambda\bar g_1(t) & i\lambda|g_0(t)|^2 \end{pmatrix}$


• Given a(k), b(k) and A(k), B(k), define a 2 × 2
matrix Riemann-Hilbert problem. This problem
has the distinctive feature that its jump has
explicit $(x,t)$ dependence in the exponential
form of $\exp\{ikx + 2ik^2 t\}$. Determine $q(x,t)$ in
terms of the solution of this Riemann-Hilbert
problem by using the fact that these functions
are related by the Lax pair. Then the function
$q(x,t)$ solves the IBV problem [1]-[7] with
$q(x,0) = q_0(x)$, $q(0,t) = g_0(t)$, and $q_x(0,t) = g_1(t)$.


The above construction can be summarized in the 
following theorem (Fokas 2002): 


Theorem 1 Consider the boundary-value problem
for the NLS equation [1] determined by the conditions
[7]. Let a(k), b(k) be given by [8], and suppose that
there exists a function $g_1(t)$ such that if A(k), B(k) are
defined by [9], then the global relation [8] holds.

Let $M(x,t,k)$ be the solution of the 2 × 2
Riemann-Hilbert problem with jump on the real
and imaginary axes given by

• $M_-(x,t,k) = M_+(x,t,k)J(x,t,k)$ with $M = M_-$ in
the second and fourth quadrants of $\mathbb C$, $M = M_+$ in the
first and third quadrants of $\mathbb C$, and $J(x,t,k)$ is defined
in terms of a, b, A, B and the exponential $e^{ikx + 2ik^2 t}$.

• $M = I + O(1/k)$ as $k \to \infty$ and has appropriate
residue conditions if there are poles.

Then $M(x,t,k)$ exists and is unique, and

$q(x,t) = 2i \lim_{k \to \infty} (kM(x,t,k))_{12}$
The result above relies on characterizing the
unknown boundary value $g_1(t)$ a priori by requiring
that the global relation hold. Recently, substantial
progress has been made in this direction in the case of
integrable nonlinear evolution equations, in particular
of NLS. Namely, Fokas (in press) contains an
effective description of the map assigning to each
given $q(x,0) = q_0(x)$ and $g_0(t) = q(0,t)$ a unique value
for $q_x(0,t)$ (called the Dirichlet to Neumann map) for
the NLS, as well as for a version of the Korteweg-de Vries
and sG equations. We state below the
relevant theorem for the case of the NLS equation.


Theorem 2 Let $q(x,t)$ satisfy the NLS equation on
the half-line $0 < x < \infty$, $t > 0$ with the initial and
boundary conditions [7]. Then $g_1(t) := q_x(0,t)$ is
given by

gaa 20) J e kt (5 (t,k) —®(t,-k))dk 


TT 


n | ei" (kD (t, k) — 4 (t,—k)] +igo(t))dk 
n JƏD 

with $\Phi = (\Phi_1, \Phi_2)^T$ given by the solution of [10]. The
Neumann datum $g_1(t)$ is unique and exists globally
in $t$.


This result yields a rigorous proof of the global
existence of the solution of boundary-value problems
on the half-line for the NLS equation. Therefore,
the assumption in Theorem 1 that a suitable
function $g_1(t)$ exists can be dropped.


Generalizations and Summary of Results 


Results analogous to the ones presented in the 
previous section can be phrased exclusively in terms 
of integral equations rather than in terms of 
Riemann-Hilbert problems, as done for example in 
Khruslov and Kotlyarov (2003). This is the point of 
view of the school of Gelfand and Marchenko, and in 
this setting the functions ® are given in the so-called 
Gelfand—Levitan—Marchenko representation. Results 
on boundary-value problems for the NLS equation 
using this representation have been obtained only 
under additional assumptions on the unknown part 
of the boundary values. It was only after the idea that 
the x- and t-parts of the spectral equations should be 
treated simultaneously that this approach yielded 
complete results. However, the Gelfand—Levitan— 
Marchenko representation yields a crucial simplifica- 
tion for deriving the explicit form of the Dirichlet to 
Neumann map and proving Theorem 2. This
representation has now been derived for all equations
[1]-[3]; see Fokas (in press).

The analysis of the invariance properties of the 
global relation with respect to k also yields the 
characterization of all the boundary conditions for 
which the transform obtained to represent the solution 
linearizes. For these boundary conditions, called 
linearizable, the solution can be represented as 
effectively as for the Cauchy problem. For example, 
the linearizable boundary conditions for the NLS
equation are given by any boundary values that satisfy

$g_0(t)\,\bar g_1(t) - \bar g_0(t)\,g_1(t) = 0$

An example of a boundary condition satisfying
this constraint, encompassing also Dirichlet and
Neumann homogeneous conditions, is $q(0,t) -
\chi q_x(0,t) = 0$, with $\chi$ a non-negative constant.
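
A one-line check makes this example concrete. It assumes that the linearizability constraint reads $g_0\bar g_1 - \bar g_0 g_1 = 0$, with $g_0(t) = q(0,t)$ and $g_1(t) = q_x(0,t)$, and that $\chi$ is real:

```latex
\begin{align*}
g_0 = \chi g_1,\ \chi \in \mathbb{R}
\;\Longrightarrow\;
g_0\bar g_1 - \bar g_0 g_1
= \chi g_1\bar g_1 - \chi\bar g_1 g_1 = 0 .
\end{align*}
```

Equivalently, the constraint states that the product $g_0\bar g_1$ is real for all $t$.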

As mentioned at the beginning of the previous 
section, the approach described in general can be 
used to obtain results similar to those given for the 
NLS equation for many other integrable evolution 
equations, in particular, mKdV (Boutet de Monvel 
et al. 2004), sG, and KdV (Fokas 2002). The results 
obtained are essentially the same as for NLS, 
starting from the general form [5] of the Lax pair, 
and include the derivation of the solution representa- 
tion, the complete characterization of linearizable 
boundary conditions, and the analysis of the Dirichlet 
to Neumann map. 

The approach above can also be used for studying 
boundary-value problems posed on finite domains, 
for x ∈ [0,1]. This has been done for a model for 
transient stimulated Raman scattering (Fokas and 
Menyuk 1999), for the sG equation in light-cone 
coordinates (Pelloni, in press), and for the NLS 
equation (Fokas and Its 2004). In this case also the 
method yields a representation of the solution which 
is suitable for asymptotic analysis. In this respect, 
the question of soliton generation from boundary 
data is of some importance, and has been recently 
considered by various authors (Fokas and Menyuk 
1999, Boutet de Monvel and Kotlyarov 2003, 
Pelloni in press, Boutet de Monvel et al. 2004). 
The results are, however, still obtained case by case, 
and no general framework for this problem has been 
identified yet. For problems on the half-line, solitons 
may be generated, but not necessarily in correspondence 
with the singularities that generate solitons for 
the full line problem, even when the same singularities 
are present. For problems posed on finite 
domains, in some specific cases at least (for the 
stimulated Raman scattering and the sG equations), 
it appears that the dominant asymptotic behavior is 
given by a similarity solution. 


In conclusion, the extension of the inverse scattering 
transform given by Fokas provides the tool for analyzing 
boundary-value problems specific to nonlinear integr- 
able equations. This tool relies, in an essential way, on 
the integrability structure of the problem, and yields a 
full characterization of the solution as well as uniqueness 
and existence results. The solution representation thus 
obtained is not always fully explicit, but it is always 
suitable for asymptotic analysis using standard techni- 
ques such as the recent nonlinearization of the classical 
steepest descent method. 


See also: ∂̄ Approach to Integrable Systems; Integrable 
Discrete Systems; Integrable Systems and the Inverse 
Scattering Method; Integrable Systems: Overview; 
Nonlinear Schrödinger Equations; Riemann–Hilbert 
Methods in Integrable Systems; Separation of Variables 
for Differential Equations; Sine-Gordon Equation. 

Introduction 


Tensor or monoidal categories are encountered in 
various branches of modern mathematical physics. 
The first examples arose, without the name of a 
monoidal category being mentioned, as categories of 
modules over a group or a Lie algebra. The operation 
of a monoidal product in this case is the usual tensor 
product X ⊗_ℂ Y of modules (representations) X and Y. 
These categories are symmetric: the modules X ⊗ Y and 
Y ⊗ X are isomorphic; moreover, the permutation 
isomorphism (the twist) c: X ⊗ Y → Y ⊗ X, 
x ⊗ y ↦ y ⊗ x, is involutive, c² = id_{X⊗Y}. The next 
examples of monoidal categories were given by 
categories of representations of supergroups or Lie 
superalgebras. They are also symmetric: now the 
symmetry (Koszul's rule) c: X ⊗ Y → Y ⊗ X, 
x ⊗ y ↦ (−1)^{deg x · deg y} y ⊗ x, is the twist with a 
sign, which depends on the degree (or parity) deg x 
of elements x ∈ X. 

The development of the theory of exactly solvable 
models in statistical mechanics led Drinfeld (1987) 
to the notion of quantum groups: Hopf algebras H 
with additional structures (quasitriangular Hopf 
algebras). H-modules also form a monoidal category; 
however, it is not symmetric, but only braided. 
This means that a canonical braiding isomorphism 
c: X ⊗ Y → Y ⊗ X still exists, but it is not involutive 
any more, c² ≠ id. The braiding c satisfies the 
Yang–Baxter equation 

\[ (c\otimes 1)(1\otimes c)(c\otimes 1) = (1\otimes c)(c\otimes 1)(1\otimes c) : X\otimes Y\otimes Z \to Z\otimes Y\otimes X \]

for any three H-modules X, Y, Z. 
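
As a quick sanity check of the Yang–Baxter equation in the simplest nontrivial case, the sketch below verifies it for the Koszul twist on a super vector space with one even and one odd basis vector. The choice of this space, the encoding of basis vectors as tuples of parities, and the helper names koszul_twist and compose are illustrative assumptions, not taken from the text above.

```python
from itertools import product


def koszul_twist(sign, basis, pos):
    """Apply the signed flip c at tensor positions (pos, pos + 1).

    `basis` is a tuple of parities (0 = even, 1 = odd) labelling a basis
    vector of a tensor power of V; `sign` is its coefficient, +1 or -1.
    """
    i, j = basis[pos], basis[pos + 1]
    swapped = basis[:pos] + (j, i) + basis[pos + 2:]
    return sign * (-1) ** (i * j), swapped


def compose(positions, basis):
    """Apply twists at the given positions, the first listed position first."""
    state = (1, basis)
    for pos in positions:
        state = koszul_twist(*state, pos)
    return state


triples = list(product((0, 1), repeat=3))
# (c x 1)(1 x c)(c x 1) versus (1 x c)(c x 1)(1 x c) on every basis vector
yang_baxter_holds = all(
    compose([0, 1, 0], b) == compose([1, 0, 1], b) for b in triples
)
# c^2 = id on V x V: the Koszul twist is involutive (the symmetric case)
involutive = all(
    compose([0, 0], b) == (1, b) for b in product((0, 1), repeat=2)
)
print(yang_baxter_holds, involutive)
```

Because the twist acts on basis vectors as a signed permutation, checking the eight basis vectors of the triple tensor product is enough; a genuinely braided (non-involutive) example would need an R-matrix instead of a sign.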

In the above examples, we also have an obvious 
isomorphism of associativity a: X ⊗ (Y ⊗ Z) → 
(X ⊗ Y) ⊗ Z of the iterated tensor product. 
There are, however, monoidal categories of 
modules, where such an isomorphism is nontrivial, 
namely, modules over quasi-Hopf algebras. 
These were introduced by Drinfeld (1989a, b) in 
connection with the Knizhnik–Zamolodchikov 
equations. These nontrivial associativity isomorphisms 
a: X ⊗ (Y ⊗ Z) → (X ⊗ Y) ⊗ Z are required to 
satisfy the pentagon equation of Mac Lane and 
Stasheff. 

Braided monoidal categories also arise in rational 
conformal field theories (RCFTs), integrable models 
of statistical mechanics and topological quantum 
field theories (TQFTs). The common feature of 
these categories is that they are semisimple abelian 
with a finite number of simple objects. In other 
words, such a category C is equivalent to the category 
of finite-dimensional ℂⁿ = ℂ × ⋯ × ℂ-modules for 
some n. This equivalence, however, is not monoidal: 
the monoidal structure can be rather involved. For 
instance, from the Ising model one can obtain the 
monoidal category with two simple objects 1 and X, 
which obey the monoidal law 1 ⊗ 1 = 1, 1 ⊗ X = X ⊗ 1 
= X, X ⊗ X = 1 ⊕ X. Clearly, such relations cannot 
be satisfied by finite-dimensional ℂ-vector spaces 1 
and X, if ⊗ meant the usual tensor product ⊗_ℂ 
of ℂ-vector spaces. However, here ⊗ means simply a 
functor ⊗: C × C → C with certain properties. 
Categories which come from RCFT, integrable models or 
TQFT often enjoy additional properties. They are 
rigid: for each object X, there exists a dual object 
X^∨. They are ribbon (balanced): there is a canonical 
endomorphism ν_X: X → X for each object X, which 
is related to the braiding. They are modular, which is 
defined as nondegeneracy of a certain matrix. The 
meaning of modularity is that the ribbon category is 
suitable for producing a TQFT out of it. 
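
The obstruction to realizing the two-object monoidal law by vector spaces can be made concrete with a dimension count. The sketch below assumes only that a realization by finite-dimensional spaces with the usual tensor product would force d = dim X to satisfy d·d = 1 + d; the search bound 1000 and the variable names are arbitrary illustrative choices.

```python
import math

# A realization by vector spaces would need d * d == 1 + d for d = dim X,
# because dimensions multiply under the usual tensor product and add
# under direct sums.
integer_solutions = [d for d in range(0, 1000) if d * d == 1 + d]

# The positive real solution of d^2 = d + 1 is the golden ratio,
# which is not an integer.
golden = (1 + math.sqrt(5)) / 2
solves_equation = math.isclose(golden * golden, 1 + golden)
print(integer_solutions, golden, solves_equation)
```

No integer passes the test (for d ≥ 2 one already has d² > d + 1), so ⊗ in such a category cannot be the tensor product of vector spaces.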

For categories equivalent to the category of 
ℂ × ⋯ × ℂ-modules, the ribbon (braided) monoidal 
structure can be specified by a finite number of complex 
matrices. For instance, 6j-symbols or q-6j-symbols 
encode the associativity isomorphism. In this form, 
modular categories appeared in the work of Moore and 
Seiberg (1989) on RCFTs. Such categories can be 
realized as categories of modules over weak Hopf 
algebras, but we stress again that the monoidal product 
for such modules does not coincide with the tensor 
product of vector spaces. So, general features are better 
seen at the level of category theory, and we now start 
with precise definitions. 


Rigid Monoidal Categories 


We recall here the basic definitions of monoidal 
categories, monoidal functors, and dual objects. 


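Definition 1 A monoidal category (C, ⊗, a, 1, l, r) is 
a category C, a functor ⊗: C × C → C (called the 
tensor product), a functorial isomorphism a: X ⊗ 
(Y ⊗ Z) → (X ⊗ Y) ⊗ Z, the associativity isomorphism, 
a unit object 1, and two functorial isomorphisms 
l: 1 ⊗ X → X, r: X ⊗ 1 → X such that the pentagon 

\[ a_{X\otimes Y,Z,W}\circ a_{X,Y,Z\otimes W} = (a_{X,Y,Z}\otimes W)\circ a_{X,Y\otimes Z,W}\circ(X\otimes a_{Y,Z,W}) : X\otimes(Y\otimes(Z\otimes W)) \to ((X\otimes Y)\otimes Z)\otimes W \]

commutes (the pentagon equation) and 

\[ X\otimes l_Y = \bigl( X\otimes(1\otimes Y) \xrightarrow{\,a\,} (X\otimes 1)\otimes Y \xrightarrow{\,r_X\otimes Y\,} X\otimes Y \bigr) \]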


Definition 2 A monoidal functor (F, φ, f): (C, ⊗) → 
(D, ⊗) is a functor F: C → D, a functorial isomorphism 
φ = φ_{X,Y}: F(X) ⊗ F(Y) → F(X ⊗ Y) ∈ D, and an 
isomorphism f: 1 → F1 ∈ D such that the diagrams 
expressed by the equations 

\[ Fa\circ\phi_{X,Y\otimes Z}\circ(FX\otimes\phi_{Y,Z}) = \phi_{X\otimes Y,Z}\circ(\phi_{X,Y}\otimes FZ)\circ a : FX\otimes(FY\otimes FZ) \to F((X\otimes Y)\otimes Z) \]

\[ Fl\circ\phi_{1,X}\circ(f\otimes FX) = l_{FX} : 1\otimes FX \to FX, \qquad Fr\circ\phi_{X,1}\circ(FX\otimes f) = r_{FX} : FX\otimes 1 \to FX \]

commute. A morphism of monoidal functors 
λ: (F, φ, f) → (G, ψ, g) is a functorial morphism 
λ: F → G such that 

\[ \lambda_{X\otimes Y}\circ\phi_{X,Y} = \psi_{X,Y}\circ(\lambda_X\otimes\lambda_Y) : FX\otimes FY \to G(X\otimes Y), \qquad g = \bigl( 1 \xrightarrow{\,f\,} F1 \xrightarrow{\,\lambda_1\,} G1 \bigr) \]

The f datum of a monoidal functor (F, φ, f) is 
uniquely determined by the (F, φ) data, so we can 
denote a monoidal functor as (F, φ) or even F. 

The coherence theorem of Mac Lane (1963) states 
that any monoidal category C is equivalent to a 
strictly monoidal category, in which X ⊗ (Y ⊗ Z) = 
(X ⊗ Y) ⊗ Z, 1 ⊗ X = X = X ⊗ 1, and the isomorphisms 
a, l, r are identity isomorphisms. Thus, in 
theoretical constructions, one may ignore the 
associativity isomorphism. It is not always so in practice. 
For instance, working with quasi-Hopf algebras related 
to the Knizhnik–Zamolodchikov equation, one 
prefers to keep the original category, which is (a 
deformation of) the category of modules over a Lie 
algebra, rather than to replace it with a strict monoidal 
category that is not a category of modules any more. 


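Definition 3 A rigid category C is a monoidal 
category in which, to every object X ∈ C, dual 
objects X^∨ and ^∨X ∈ C are assigned together with 
morphisms of evaluation and coevaluation 

\[ \mathrm{ev}_X : X\otimes X^\vee \to 1, \qquad \mathrm{ev}'_X : {}^\vee X\otimes X \to 1, \]
\[ \mathrm{coev}_X : 1 \to X^\vee\otimes X, \qquad \mathrm{coev}'_X : 1 \to X\otimes{}^\vee X \]

The evaluations and coevaluations are chosen such 
that the compositions 

\[ X \simeq X\otimes 1 \xrightarrow{1\otimes\mathrm{coev}} X\otimes(X^\vee\otimes X) \xrightarrow{\,a\,} (X\otimes X^\vee)\otimes X \xrightarrow{\mathrm{ev}\otimes 1} 1\otimes X \simeq X \]
\[ X \simeq 1\otimes X \xrightarrow{\mathrm{coev}'\otimes 1} (X\otimes{}^\vee X)\otimes X \xrightarrow{\,a^{-1}\,} X\otimes({}^\vee X\otimes X) \xrightarrow{1\otimes\mathrm{ev}'} X\otimes 1 \simeq X \]
\[ X^\vee \simeq 1\otimes X^\vee \xrightarrow{\mathrm{coev}\otimes 1} (X^\vee\otimes X)\otimes X^\vee \xrightarrow{\,a^{-1}\,} X^\vee\otimes(X\otimes X^\vee) \xrightarrow{1\otimes\mathrm{ev}} X^\vee\otimes 1 \simeq X^\vee \]
\[ {}^\vee X \simeq {}^\vee X\otimes 1 \xrightarrow{1\otimes\mathrm{coev}'} {}^\vee X\otimes(X\otimes{}^\vee X) \xrightarrow{\,a\,} ({}^\vee X\otimes X)\otimes{}^\vee X \xrightarrow{\mathrm{ev}'\otimes 1} 1\otimes{}^\vee X \simeq {}^\vee X \]

are all identity morphisms. 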
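
For finite-dimensional vector spaces the first of these compositions can be checked by hand or by a few lines of code. The sketch below is only an illustration under stated assumptions: V = ℂⁿ with n = 3, basis vectors e_k, dual basis e^i, coev(1) = Σ_i e^i ⊗ e_i and ev(e_a ⊗ e^b) = δ_{ab}; the function name zigzag and the dictionary encoding of tensors are hypothetical.

```python
n = 3  # dimension of V (illustrative choice)


def zigzag(k):
    """Image of the basis vector e_k under
    V = V x 1 --1 x coev--> V x (V* x V) = (V x V*) x V --ev x 1--> 1 x V = V.
    Tensors are stored as {index tuple: coefficient}."""
    # 1 x coev: e_k  |->  sum_i  e_k x e^i x e_i
    after_coev = {(k, i, i): 1 for i in range(n)}
    # ev x 1: e_a x e^b x e_c  |->  delta_{ab} e_c
    result = {}
    for (a, b, c), coeff in after_coev.items():
        if a == b:
            result[c] = result.get(c, 0) + coeff
    return result


is_identity = all(zigzag(k) == {k: 1} for k in range(n))
print(is_identity)
```

In this strict setting the associativity isomorphism is the identity, so it does not appear in the code; the other three compositions can be checked in the same way.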

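In a rigid monoidal category C, there is a pairing 

\[ (X\otimes Y)\otimes(Y^\vee\otimes X^\vee) \simeq (X\otimes(Y\otimes Y^\vee))\otimes X^\vee \xrightarrow{(X\otimes\mathrm{ev}_Y)\otimes X^\vee} (X\otimes 1)\otimes X^\vee \xrightarrow{\,r\otimes X^\vee\,} X\otimes X^\vee \xrightarrow{\mathrm{ev}_X} 1 \]

which induces an isomorphism j_{X,Y}: Y^∨ ⊗ X^∨ → 
(X ⊗ Y)^∨, such that the above pairing coincides with 

\[ (X\otimes Y)\otimes(Y^\vee\otimes X^\vee) \xrightarrow{1\otimes j_{X,Y}} (X\otimes Y)\otimes(X\otimes Y)^\vee \xrightarrow{\mathrm{ev}_{X\otimes Y}} 1 \]

The equation 

\[ \mathrm{coev}_{X\otimes Y} = \bigl( 1 \xrightarrow{\mathrm{coev}_Y} Y^\vee\otimes Y \simeq Y^\vee\otimes 1\otimes Y \xrightarrow{1\otimes\mathrm{coev}_X\otimes 1} Y^\vee\otimes X^\vee\otimes X\otimes Y \xrightarrow{j_{X,Y}\otimes 1} (X\otimes Y)^\vee\otimes(X\otimes Y) \bigr) \]

also holds. Similarly, there is an isomorphism 
j_{X,Y}: ^∨Y ⊗ ^∨X → ^∨(X ⊗ Y). 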

Morphisms constructed from braidings and 
(co)evaluations are often described by tangles. The 
conventions are listed in Figure 1. The suggested 
assignment of morphisms in C to elementary pictures 
extends to a unique functor Φ from the category of 
C-colored tangles to the category C itself. With the 
above interpretation, these tangles need not be 
oriented. We shall use the same notation for framed 
tangles, and the framing will be within the plane. 

Figure 1 Conventions for notation of morphisms from 
tangles: a morphism f: X → Y; the braiding 
c_{X,Y}: X ⊗ Y → Y ⊗ X; the inverse braiding 
c^{-1}: X ⊗ Y → Y ⊗ X; the evaluation 
ev_X: X ⊗ X^∨ → 1; the coevaluation 
coev_X: 1 → X^∨ ⊗ X. 

The maps Ob C → Ob C, X ↦ X^∨, and X ↦ ^∨X 
extend to contravariant self-equivalences C → C, 
f ↦ f^t, and f ↦ ^t f. For given f: X → Y, the 
morphisms f^t: Y^∨ → X^∨ and ^t f: ^∨Y → ^∨X can be 
defined, respectively, by tangle pictures using the 
assignment from Figure 1. 

[Pictures omitted: tangle diagrams defining f^t and ^t f.] 

We have a monoidal self-equivalence of C, 

\[ (-^{\vee\vee}, j_2) : (\mathcal{C},\otimes,1) \to (\mathcal{C},\otimes,1), \qquad X \mapsto X^{\vee\vee}, \quad f \mapsto f^{tt}, \]
\[ j_{2\,X,Y} = \bigl( X^{\vee\vee}\otimes Y^{\vee\vee} \xrightarrow{\,j_{Y^\vee,X^\vee}\,} (Y^\vee\otimes X^\vee)^\vee \xrightarrow{\,(j_{X,Y}^{\,t})^{-1}\,} (X\otimes Y)^{\vee\vee} \bigr) \qquad [1] \]

It is not always true that the two duals X^∨ and ^∨X 
are isomorphic. However, there are canonical 
isomorphisms 

\[ X \to {}^\vee(X^\vee), \qquad X \to ({}^\vee X)^\vee \]

We may replace the category C with an equivalent one, 
such that the above isomorphisms become identity 
morphisms, and the functors −^∨ and ^∨− are inverse to 
each other. We shall assume this to simplify notations. 
Finally, we denote the iterated duals by X^{(n)} = X^{∨⋯∨} 
(n times) and X^{(−n)} = ^{∨⋯∨}X (n times) for n > 0. 


Braided Categories 


Here we review the definitions of the braiding 
isomorphism and further derived isomorphisms. Sev- 
eral basic relations between them are listed. Two 
important classes of examples of braided categories 
are given by the categories of modules over quasitrian- 
gular Hopf algebras and the categories of tangles. 


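Definition 4 A braided category (C, c) is a monoidal 
category C equipped with a functorial isomorphism 
c = c_{X,Y}: X ⊗ Y → Y ⊗ X (the braiding, or the 
commutativity isomorphism) such that the two 
hexagons 

\[ (c^{\pm 1}\otimes Y)\circ a_{X,Z,Y}\circ(X\otimes c^{\pm 1}) = a_{Z,X,Y}\circ c^{\pm 1}\circ a_{X,Y,Z} : X\otimes(Y\otimes Z) \to (Z\otimes X)\otimes Y \]

commute (one for c and one for c^{-1}). Here, on the 
left-hand side, c^{±1} first exchanges Y with Z and then 
X with Z; on the right-hand side, it exchanges X ⊗ Y 
with Z. 

The graphical notation for the braiding 
c = (c_{X,Y}: X ⊗ Y → Y ⊗ X) and its inverse c^{-1} is 
a crossing of the two strands X and Y. 

[Pictures omitted: crossing diagrams for c and c^{-1}.] 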


In a rigid braided category, we can define 
functorial isomorphisms using again the conventions 
from Figure 1: 


[Pictures omitted: tangle diagrams defining these isomorphisms.] 


These are isomorphisms of monoidal functors 
(see [1]) 


uy: (Id, c?) — (j2) 


u : (Id, f) — (—*”, j2) 


In particular, this implies the commutativity of the 
diagram 


XY @ YY R, xoyr 
The square of the monoidal functor (—YY,/2) is 


(= jg) : (C, 9, l) = (C, ©, l), 
Xes Kyy fr ft 


where 
rE bow g YYW B, (XY @ yoy a. (X2 ge 
The natural isomorphism u$ = u2, o u? is, in fact, an 


isomorphism of monoidal functors uj: (Id, id) — 
LY, ja). 


Ribbon Categories 


Now we define balancing and recall some properties 
of balanced (ribbon) categories. 


Definition 5 Let C be a rigid braided category. 
A balancing 6x:X—X% is an isomorphism of 
monoidal functors (3: (Id, id, id) — (YY, j2,d2) such 
that 8 =u% and 6% =A: X =X’. The cate- 
gory C equipped with a balancing is called 
balanced. 


We also use the notation u_0^2 = β. In any balanced 
category, there exists a canonical ribbon twist ν. 
A ribbon twist ν = ν_X: X → X, ν: Id → Id is a 
self-adjoint (ν_{X^∨} = ν_X^t) automorphism of the identity 
functor such that c² = (ν_X^{-1} ⊗ ν_Y^{-1}) ∘ ν_{X⊗Y}. It can be 
determined from the equations 

ue =w or =u ovn X= X 


D sn = or =n ov xX 


In particular, its square is given by the canonical 
isomorphism v?=u; ouf. Conversely, in any 
rigid braided category with a ribbon twist (called 
ribbon category) there exists a canonical balan- 
cing “4 given by the above formulas. Thus, ribbon 
categories and balanced categories are synonyms. 


In the case of X = 1, we have ν_1 = id_1. 


The following result can be used to simplify 
notations: 


Proposition 1 For any ribbon category C there exists 
a ribbon category D equivalent to C such that in it 

(i) 1^∨ = 1; 
(ii) for any object X we have ^∨X = X^∨, X^{∨∨} = X, 
and β_X = id_X: X → X^{∨∨} = X; 
(iii) for any object X we have ev_X = ev'_{X^∨}: X ⊗ 
X^∨ → 1, and coev_X = coev'_{X^∨}: 1 → X^∨ ⊗ X. 


In the category C = H-mod, where H is a ribbon 
Hopf algebra, the equation X^∨ = ^∨X is not 
necessarily satisfied. Nevertheless, X^∨ is canonically 
isomorphic to ^∨X. The same holds in any ribbon 
category. We identify these objects via β = u_0^2: 
^∨X → X^∨. This allows us to use the right dual 
objects in place of the left ones. In that role, the 
right duals are equipped with the left evaluation 
and coevaluation, called flipped evaluation and 
coevaluation, respectively: 

\[ \widetilde{\mathrm{ev}} : X^\vee\otimes X \xrightarrow{X^\vee\otimes\beta} X^\vee\otimes X^{\vee\vee} \xrightarrow{\mathrm{ev}_{X^\vee}} 1 \]
\[ \widetilde{\mathrm{coev}} : 1 \xrightarrow{\mathrm{coev}_{X^\vee}} X^{\vee\vee}\otimes X^\vee \xrightarrow{\beta^{-1}\otimes X^\vee} X\otimes X^\vee \]

They are often denoted simply ev and coev, which 
should then be read as the flipped morphisms above 
in applications. In the context of Hopf algebras, β is 
given by the action of a group-like element 
introduced by Drinfeld. 


Hopf Algebras in Braided Categories 


Let C be a braided monoidal category. A Hopf 
algebra H in C is an object H ∈ Ob C together with 
an associative multiplication m: H ⊗ H → H and a 
coassociative comultiplication Δ: H → H ⊗ H, obeying 
the bialgebra axiom 

\[ \bigl( H\otimes H \xrightarrow{\,m\,} H \xrightarrow{\,\Delta\,} H\otimes H \bigr) = \bigl( H\otimes H \xrightarrow{\Delta\otimes\Delta} H\otimes H\otimes H\otimes H \xrightarrow{H\otimes c\otimes H} H\otimes H\otimes H\otimes H \xrightarrow{m\otimes m} H\otimes H \bigr) \]

Moreover, H has a unit η: 1 → H, a counit ε: H → 1, 
an antipode γ: H → H, and the inverse antipode 
γ^{-1}: H → H. The defining relations for these are the 
same as in the classical case. Notice, in particular, 
that the unit is also a morphism. Associativity of 
multiplication, as well as coassociativity of 
comultiplication, is formulated with the use of the 
associativity isomorphism (in the nonstrict case). 
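
To see the bialgebra axiom at work in the most classical situation, the following sketch checks it on basis elements of the group algebra of the cyclic group ℤ/3, regarded as a Hopf algebra in the symmetric category of vector spaces with the plain flip as braiding. The group, the encoding of group elements as integers modulo N, and all function names are illustrative assumptions; all maps involved send basis elements to basis elements, so a check on the basis suffices.

```python
from itertools import product

N = 3  # order of the cyclic group Z/N (illustrative choice)


def m(g, h):
    """Multiplication on basis elements: g x h |-> g + h (mod N)."""
    return (g + h) % N


def delta(g):
    """Comultiplication on basis elements: g |-> g x g."""
    return (g, g)


def antipode(g):
    """Antipode on basis elements: g |-> inverse of g."""
    return (-g) % N


def lhs(g, h):
    """H x H --m--> H --Delta--> H x H."""
    return delta(m(g, h))


def rhs(g, h):
    """(m x m) o (1 x c x 1) o (Delta x Delta), with c the flip."""
    a, b = delta(g)
    c_, d = delta(h)
    a, b, c_, d = a, c_, b, d  # braiding on the two middle tensor factors
    return (m(a, b), m(c_, d))


bialgebra_ok = all(
    lhs(g, h) == rhs(g, h) for g, h in product(range(N), repeat=2)
)
# Antipode axiom m o (antipode x 1) o Delta = unit o counit; on a basis
# element the right-hand side is the neutral element 0.
antipode_ok = all(m(antipode(g), g) == 0 for g in range(N))
print(bialgebra_ok, antipode_ok)
```

In a genuinely braided category the flip in rhs would be replaced by the braiding c, which is where braided Hopf algebras differ from ordinary ones.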

Hopf algebras in braided categories have also 
been called braided groups. Their basic properties 
are very similar to those of usual Hopf algebras, for 
example, the antipode is antimultiplicative with 
respect to the braiding (see, e.g., Majid (1993)). 
For Hopf algebras in rigid braided categories, there 
exist integrals in a sense very much similar to the 
case of ordinary finite-dimensional Hopf algebras, 
as shown by Bespalov et al. (2000). 


Modular Categories 


Assume that a braided rigid monoidal category C is 
equivalent as a category (with monoidal structure 
ignored) to the category of finite-dimensional 
modules over a finite-dimensional algebra. In particular, 
C is abelian. Then there exists an object F in C, 
equipped with a morphism i_X: X ⊗ X^∨ → F for each 
X ∈ Ob C, such that the square 

\[ i_Y\circ(f\otimes Y^\vee) = i_X\circ(X\otimes f^t) : X\otimes Y^\vee \to F \]

is commutative for all morphisms f: X → Y of C, and, 
moreover, F is universal between objects with such 
properties. Here f^t: Y^∨ → X^∨ is the transpose of a 
morphism f: X → Y. In other words, F is a direct limit, 
called the coend and denoted as F = ∫^{Z∈C} Z ⊗ Z^∨. It 
can also be defined via an exact sequence 

\[ \bigoplus_{f:X\to Y\in\mathcal{C}} X\otimes Y^\vee \xrightarrow{\;f\otimes Y^\vee - X\otimes f^t\;} \bigoplus_{Z\in\mathcal{C}} Z\otimes Z^\vee \to F \to 0 \]


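It turns out that the coend F is a Hopf algebra in 
the braided category C, when it is equipped with the 
following operations. The comultiplication in F is 
uniquely determined by the equation 

\[ \bigl( X\otimes X^\vee \xrightarrow{\,i_X\,} F \xrightarrow{\,\Delta\,} F\otimes F \bigr) = \bigl( X\otimes X^\vee \simeq X\otimes 1\otimes X^\vee \xrightarrow{X\otimes\mathrm{coev}\otimes X^\vee} X\otimes X^\vee\otimes X\otimes X^\vee \xrightarrow{i_X\otimes i_X} F\otimes F \bigr) \]

The counit in F is determined by the equation 

\[ \bigl( X\otimes X^\vee \xrightarrow{\,i_X\,} F \xrightarrow{\,\varepsilon\,} 1 \bigr) = \bigl( X\otimes X^\vee \xrightarrow{\mathrm{ev}} 1 \bigr) \]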

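The multiplication m: F ⊗ F → F is defined by the 
following commutative square: the top row is 
i_X ⊗ i_Y: X ⊗ X^∨ ⊗ Y ⊗ Y^∨ → F ⊗ F, the right 
side is m, the bottom row is 
i_{X⊗Y}: X ⊗ Y ⊗ (X ⊗ Y)^∨ → F, and the left side 
X ⊗ X^∨ ⊗ Y ⊗ Y^∨ → X ⊗ Y ⊗ (X ⊗ Y)^∨ is given 
by a tangle picture in which the strand X^∨ is braided 
past Y ⊗ Y^∨, so that the strands are ordered as 
X, Y, Y^∨, X^∨, followed by the isomorphism 
j_{X,Y}: Y^∨ ⊗ X^∨ → (X ⊗ Y)^∨. 

[Picture omitted: tangle diagram for the left side of the square.] 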


356 Braided and Modular Tensor Categories 


The unit is given by the morphism 

\[ \eta : 1 = 1\otimes 1^\vee \xrightarrow{\,i_1\,} F \]

The antipode γ_F: F → F is given by a tangle diagram. 

[Picture omitted: tangle diagram for the antipode γ_F.] 

The structure of the coend F as a Hopf algebra can 
also be found directly from its universal property, as 
in Majid (1993). 

There is a pairing of Hopf algebras ω: F ⊗ F → 1 in C. 

[Picture omitted: tangle diagram for the pairing ω.] 

It induces a homomorphism of Hopf algebras F → F^∨. 


Definition 6 A ribbon category C, equivalent as 
a category to the category of finite-dimensional 
modules over a finite-dimensional algebra, is called 
modular if the pairing ω is nondegenerate, that is, 
the induced morphism F → F^∨ is invertible. 


Examples of nonsemisimple modular categories 
include C = H-mod, where H = u_q(g) is a 
finite-dimensional algebra, a quotient of the quantum 
universal enveloping algebra U_q(g), and q is a root 
of unity of odd degree. In these examples, the 
coalgebra F identifies with the dual Hopf algebra 
H*, but the multiplication in F differs from that of 
H*. An explicit formula for the multiplication in F 
uses the R-matrix for H (see, e.g., Majid (1993)). 
A definition of modularity for another type of 
categories (not necessarily abelian) was given by 
Turaev (1994). 

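When the category C is modular, the integrals for 
the Hopf algebra F have especially simple properties. 
The integral element in F is two sided. It is a 
morphism μ: 1 → F such that 

\[ \bigl( F \simeq F\otimes 1 \xrightarrow{F\otimes\mu} F\otimes F \xrightarrow{\,m\,} F \bigr) = \bigl( F \xrightarrow{\,\varepsilon\,} 1 \xrightarrow{\,\mu\,} F \bigr) = \bigl( F \simeq 1\otimes F \xrightarrow{\mu\otimes F} F\otimes F \xrightarrow{\,m\,} F \bigr) \]

and μ is universal between morphisms with such 
property. By duality, the integral functional λ: F → 1 
is also two sided. It satisfies 

\[ \bigl( F \xrightarrow{\,\Delta\,} F\otimes F \xrightarrow{F\otimes\lambda} F\otimes 1 \simeq F \bigr) = \bigl( F \xrightarrow{\,\lambda\,} 1 \xrightarrow{\,\eta\,} F \bigr) = \bigl( F \xrightarrow{\,\Delta\,} F\otimes F \xrightarrow{\lambda\otimes F} 1\otimes F \simeq F \bigr) \]

and is universal between morphisms with such property. 
The integral element and the integral functional are 
unique up to multiplication by an element of Aut_C 1. 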


Semisimple Abelian Modular Categories 


Reshetikhin and Turaev proposed to construct invari- 
ants of 3-manifolds via quantum groups. More 
precisely, they use certain abelian semisimple ribbon 
categories obtained from quantum groups at roots of 
unity as trace quotients. One can forget about the origin 
of these categories and work simply with semisimple 
modular categories. We shall describe them as input 
data for the modular functor construction. 

Let C be a ℂ-linear abelian semisimple modular 
ribbon category. Assume that the number of 
isomorphism classes of simple objects is finite. 
Assume also that 1 is simple and for each simple 
object X the endomorphism algebra End X = ℂ. We 
denote by S = {X_i}_i the list of (representatives of 
isomorphism classes of) all simple objects. 

Under these assumptions, many formulas simplify. 
The coend F ∈ C takes the form 

\[ F = \bigoplus_{X\in S} X\otimes X^\vee \]


Any morphism 1 → F is a ℂ-linear combination of the 
standard morphisms φ_X for X ∈ S, 

\[ \phi_X : 1 \xrightarrow{\mathrm{coev}'} X\otimes{}^\vee X \xrightarrow{1\otimes u_0^2} X\otimes X^\vee \xrightarrow{\,i_X\,} F \]

The morphisms φ_X form a basis of the commutative 
algebra Inv F = Hom_C(1, F). The Grothendieck 
ring of the category C determines the 
multiplication law in Inv F via the algebra 
isomorphism ℂ ⊗_ℤ K_0(C) → Inv F, [X] ↦ φ_X. 

Any morphism F → 1 can be represented as a 
linear combination of the morphisms 

\[ \psi_X : F \to X\otimes X^\vee \xrightarrow{\mathrm{ev}} 1 \]

where X ∈ S. The functional ψ_1: F → 1 satisfies the 
properties of a two-sided integral λ of the braided 
Hopf algebra F. 


The Verlinde Formula 


The number 

\[ \dim_q(X) = \bigl( 1 \xrightarrow{\mathrm{coev}'} X\otimes{}^\vee X \xrightarrow{1\otimes u_0^2} X\otimes X^\vee \xrightarrow{\mathrm{ev}} 1 \bigr) \]

is called the dimension of an object X ∈ Ob C. (The 
index q reminds us that this number coincides with 
the q-dimension in the case C = U_q(g)-mod.) We 
have dim_q(X^∨) = dim_q(X). 


Definition 7 Introduce a biadditive function of two 
variables s: Ob C × Ob C → ℂ on the class of objects of C: 

[Picture omitted: tangle diagram defining s_{XY}.] 

In particular, its restriction to S is a matrix s|_S: S × 
S → ℂ, denoted again by s = (s_{XY})_{X,Y∈S} by abuse of 
notation; here X and Y run over simple objects. 

Notice that s_{XY} = s_{YX}, so the matrix s is symmetric. 
Let us consider the ℂ-algebra Inv F = Hom_C(1, F). It has 
the basis φ_X, X ∈ S; hence, it is n-dimensional, where 
n = Card S. The form ω on F induces a bilinear form 

\[ \omega' : \mathrm{Inv}\,F\times\mathrm{Inv}\,F \to \mathrm{Hom}(1, F\otimes F) \xrightarrow{\mathrm{Hom}(1,\omega)} \mathrm{End}\,1 \]

The matrix (s_{XY}) is the matrix of the form ω′ in the 
basis (φ_X). 


Lemma 1 (The Verlinde formula) For any simple 
X ∈ S and any objects Y and Z of C, we have 

\[ s_{X1} = \dim_q(X), \qquad s_{X1}\,s_{X,Y\otimes Z} = s_{XY}\,s_{XZ} \qquad [2] \]

Proof The first formula is straightforward. Since 
an endomorphism of the simple object X [picture 
omitted] is a number, we can move it from the second 
factor to the first in the computation of 
s_{X1} s_{X,Y⊗Z} [pictures omitted], which gives 
s_{XY} s_{XZ}. This proves the second formula. □ 
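
The Verlinde formula [2] can be tested numerically on a small example. The sketch below uses the two-object monoidal law X ⊗ X = 1 ⊕ X from the Introduction together with the matrix s = ((1, φ), (φ, −1)), φ being the golden ratio, normalized so that s_{11} = 1. This matrix is an assumption brought in for illustration (it is the one commonly quoted for this fusion rule) and is not derived in the text; the dictionary encoding and the name s_on_product are likewise hypothetical.

```python
import math

phi = (1 + math.sqrt(5)) / 2  # golden ratio, solves d^2 = d + 1
simples = ["1", "X"]
# Assumed s-matrix for the rule X x X = 1 + X, normalised so that s_11 = 1.
s = {("1", "1"): 1.0, ("1", "X"): phi,
     ("X", "1"): phi, ("X", "X"): -1.0}
# Fusion rules: decomposition of Y x Z into simple objects.
fusion = {("1", "1"): ["1"], ("1", "X"): ["X"],
          ("X", "1"): ["X"], ("X", "X"): ["1", "X"]}


def s_on_product(x, y, z):
    """s_{X, Y x Z}, extended from simple objects by biadditivity."""
    return sum(s[(x, w)] for w in fusion[(y, z)])


# Check s_X1 * s_{X, Y x Z} == s_XY * s_XZ for all simple X, Y, Z.
verlinde_holds = all(
    math.isclose(s[(x, "1")] * s_on_product(x, y, z),
                 s[(x, y)] * s[(x, z)], abs_tol=1e-12)
    for x in simples for y in simples for z in simples
)
print(verlinde_holds)
```

For X = Y = Z the nontrivial case reads φ(φ − 1) = (−1)², which is just the defining relation φ² = φ + 1.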
Proposition 2 (Criterion of modularity) In the 
above assumption of semisimplicity, the following 
conditions are equivalent: 

(i) C is modular (ω is nondegenerate); 

(ii) the matrix (s_{XY})_{X,Y∈S} is nondegenerate; 

(iii) for any X ∈ S its dimension dim_q X does not 
vanish, and there exist numbers μ_Y, Y ∈ S, such 
that for all X ∈ S we have Σ_{Y∈S} s_{XY} μ_Y = δ_{X1}; and 

(iv) for each simple X ≠ 1 we have 
Σ_{Y∈S} s_{XY} dim_q Y = 0 and dim_q X ≠ 0. 


The easy implication (ii) ⇒ (iii) can be deduced 
from the Verlinde formula. If the dimension 
dim_q(X) = s_{X1} of a simple object X vanishes, then 
taking Z = Y in [2] gives (s_{XY})² = 0, so s_{XY} = 0 
for all Y ∈ Ob C. This contradicts the 
assumption of nondegeneracy of (s_{XY}). 
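
Conditions (ii), (iii) and (iv) of the criterion can be checked directly on a two-object example. The sketch below again assumes the matrix s = ((1, φ), (φ, −1)) with φ the golden ratio for the monoidal law X ⊗ X = 1 ⊕ X; this input, the choice μ_Y = dim_q Y / Σ_Z (dim_q Z)² in (iii), and the variable names are illustrative assumptions rather than data from the text.

```python
import math

phi = (1 + math.sqrt(5)) / 2
dims = [1.0, phi]                 # dim_q of the simple objects 1 and X
s = [[1.0, phi], [phi, -1.0]]     # assumed s-matrix, rows/columns: 1, X

# (ii): nondegeneracy of the 2 x 2 matrix via its determinant
det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
nondegenerate = not math.isclose(det, 0.0, abs_tol=1e-12)

# (iv): for the simple object X != 1, sum_Y s_XY dim_q(Y) = 0, dim_q(X) != 0
row_sum = sum(s[1][y] * dims[y] for y in range(2))
condition_iv = math.isclose(row_sum, 0.0, abs_tol=1e-12) and dims[1] != 0

# (iii): one admissible choice of numbers mu_Y with sum_Y s_XY mu_Y = delta_X1
total = sum(d * d for d in dims)
mu = [d / total for d in dims]
condition_iii = all(
    math.isclose(sum(s[x][y] * mu[y] for y in range(2)),
                 1.0 if x == 0 else 0.0, abs_tol=1e-12)
    for x in range(2)
)
print(nondegenerate, condition_iv, condition_iii)
```

The numbers μ_Y used for (iii) are normalized by the condition in (iii) itself and therefore differ by a scalar factor from the coefficients of the integral element discussed next.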

Let us determine the coefficients μ_Y of the integral 
element 

\[ \mu = \sum_{Y\in S}\mu_Y\phi_Y \]

of the Hopf algebra F. It also has a two-sided 
integral-functional λ: F → 1. The corresponding 
endomorphism is 

\[ \lambda_Z = \bigl( Z \xrightarrow{\,\delta_Z\,} F\otimes Z \xrightarrow{\lambda\otimes Z} 1\otimes Z \simeq Z \bigr) \]

for an arbitrary object Z of C, where δ_Z is the 
natural coaction. The equation 


X XY 


y Y“ 


X XY 
Hy by 


= dyy 


Y Y“ 


follows from the properties of the two-sided integral 
λ of the Hopf algebra F. Due to uniqueness of 
integrals, λ is proportional to ψ_1. In eqn [3], X and 
Y vary over S. The right-hand side is the identity 
morphism if X = Y, and vanishes otherwise. 
Substituting the definition of φ_Y, we rewrite the 
equation as follows: 


XYY XY Y 


XYY X“ Y 
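For X = 1, we get 

\[ \mu_Y\cdot\lambda_Y = \delta_{1Y}\cdot\mathrm{id}_Y : Y \to Y \qquad [5] \]

If Y ≠ 1, then λ_Y = 0. So [5] tells essentially that 

\[ \mu_1\lambda_1 = \mathrm{id}_1 : 1 \to 1 \qquad [6] \]

Now return to [4] with X = Y. If we compose that 
equation with coev: 1 → Y^∨ ⊗ Y, we obtain 

[Picture omitted: equation [7].] 

Multiplying both sides of [7] with μ_1, we find 

\[ \mu_Y = \mu_1\cdot\dim_q(Y) \]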


The normalization is fixed by eqn [6], which we can 
write as 

\[ 1 = \mu_1\sum_{Y\in S}\mu_Y\dim_q(Y) = \mu_1^2\sum_{Y\in S}(\dim_q(Y))^2 \]

Hence, 

\[ (\mu_1)^{-2} = \sum_{Y\in S}(\dim_q(Y))^2 \qquad [8] \]

So, we find μ_1, unique up to a sign. 
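
As a numerical illustration of eqn [8], the sketch below computes μ_1 and the coefficients μ_Y = μ_1 dim_q(Y) for a category with two simple objects of dimensions 1 and φ (the golden ratio), taking the positive sign for μ_1. The dimensions are an assumed example corresponding to the monoidal law X ⊗ X = 1 ⊕ X, and the variable names are illustrative.

```python
import math

phi = (1 + math.sqrt(5)) / 2
dims = {"1": 1.0, "X": phi}  # assumed dimensions dim_q of the simple objects

# Eqn [8]: mu_1^(-2) is the sum of the squared dimensions.
sum_of_squares = sum(d * d for d in dims.values())
mu_1 = 1 / math.sqrt(sum_of_squares)  # positive sign chosen

# Coefficients of the integral element: mu_Y = mu_1 * dim_q(Y).
mu = {name: mu_1 * d for name, d in dims.items()}

# Normalisation check: 1 = mu_1 * sum_Y mu_Y * dim_q(Y).
normalised = math.isclose(mu_1 * sum(mu[n] * dims[n] for n in dims), 1.0)
print(mu_1, mu, normalised)
```

Here the sum of squares equals 2 + φ, because φ² = φ + 1.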


Conjugation Properties 


From the Verlinde formula [2], we conclude that 
the commutative ℂ-algebra Inv F possesses 
homomorphisms 

\[ \chi_X : \mathrm{Inv}\,F \to \mathbb{C}, \qquad \phi_Y \mapsto (\dim_q(X))^{-1}s_{XY} = s_{XY}/s_{X1} \]

The matrix s is invertible, so that its rows cannot 
be proportional. Hence, all χ_X are different 
characters. Their number is n = Card S = dim_ℂ Inv F; 
hence, there is an isomorphism of ℂ-algebras 

\[ (\chi_X)_{X\in S} : \mathrm{Inv}\,F \to \mathbb{C}\times\dots\times\mathbb{C} = \mathbb{C}^n, \qquad \phi \mapsto (\chi_X(\phi))_{X\in S} \]

Now we show that the dimensions dim_q(Y) are 
real numbers, so that μ_1 is also a real number. One 
can introduce in Inv F an antilinear involution, 

\[ -^{*} : \mathrm{Inv}\,F \to \mathrm{Inv}\,F, \qquad (\phi_X)^{*} = \phi_{X^\vee}, \]

and a scalar (Hermitian) product 

\[ (\phi_X|\phi_Y) = \delta_{XY}, \qquad X, Y \in S \]

Then Inv F becomes a finite-dimensional 
commutative Hilbert algebra. Indeed, 

\[ (\phi_X\phi_Y|\phi_Z) = \dim\mathrm{Hom}(X\otimes Y, Z) = \dim\mathrm{Hom}(X, Y^\vee\otimes Z) = (\phi_X|\phi_Y^{*}\phi_Z) \]

From the theory of finite-dimensional commutative 
Hilbert algebras, we know that idempotents in the 
algebra Inv F are self-adjoint (only in that case the 
scalar product can be positive definite). Hence, χ_X is 
a *-morphism, that is, χ_X(φ*) = \overline{χ_X(φ)}. 
Therefore, s_{XY^∨}/s_{X1} = \overline{s_{XY}/s_{X1}}. In the 
particular case of X = 1, we obtain 

\[ \dim_q(Y) = \dim_q(Y^\vee) = s_{1Y^\vee} = \overline{s_{1Y}} = \overline{\dim_q(Y)} \]

since s_{11} = 1. This proves that for any Y ∈ C its 
dimension dim_q(Y) is a real number. 

It is natural to take for μ_1 the positive root 
determined by [8]. Positiveness fixes μ_1 uniquely. 


Examples of Semisimple Modular Categories 


In their original paper, Reshetikhin and Turaev 
(1991) use as algebraic input data the representation 
theory of the quantum deformation U = U_q(sl_2) of 
the Lie algebra sl(2, ℂ), where q is a root of unity. 
They construct the invariant as a trace over 
U-equivariant morphisms, and prove the necessary 
modularity condition concerning the nondegeneracy 
of the braided pairing. 

The general picture is drawn by Turaev (1994), 
where 3-manifold invariants and TQFTs are con- 
structed from semisimple modular categories. He 
shows how to obtain the latter as quotients of 
certain subcategories of representations of a modu- 
lar Hopf algebra by the ideal of trace-negligible 
morphisms. 

Finkelberg (1996), based on results of Gelfand 
and Kazhdan, establishes (via the theory of Kazhdan 
and Lusztig) an equivalence between two modular 
categories. The first is the semisimple category C of 
integrable modules over an affine Lie algebra g of 
positive integer level k. The second is a certain 
subquotient of the category of U_q(g)-modules for 
q = exp(πi m^{-1}/(k + h^∨)), where m ∈ {1, 2, 3} and h^∨ 
is the dual Coxeter number of g. Huang and 
Lepowsky (1999) describe the rigid braided 
structure of C using vertex operators. Bakalov and 
Kirillov (2001) use geometrical constructions to 
make C into a modular category, associated with 
the Wess–Zumino–Witten (WZW) model. They 
construct the corresponding WZW modular functor. 


Modular Functor and TQFT 


Modular categories give rise to a modular functor 
and a TQFT. The meanings of those differ from 
author to author, but the common features are the 
following. Such a TQFT is a functor from the 
category whose objects are smooth surfaces with 
additional structures and morphisms are three- 
dimensional manifolds with additional structures to 
the category of vector spaces. A modular functor is 
the restriction of such TQFT to the subcategory whose 
morphisms are homeomorphisms of surfaces. One of 
the constructions due to Kerler and Lyubashenko 
(2001) takes a nonsemisimple modular category as an 
input and assigns to it a double TQFT functor, that is, 
a functor between double categories. The target is the 
2-category of abelian categories. 


See also: Axiomatic Approach to Topological Quantum 
Field Theory; Hopf Algebras and q-Deformation Quantum 
Groups; The Jones Polynomial; Knot Invariants and 
Quantum Gravity; Quantum 3-Manifold Invariants; 
Symmetries in Quantum Field Theory of Lower 
Spacetime Dimensions; Topological Quantum Field 
Theory: Overview; von Neumann Algebras: Introduction, 
Modular Theory, and Classification Theory; von 
Neumann Algebras: Subfactor Theory. 

Introduction 


Branes appear in string theories and M-theory as 
extended objects which contain some nonperturba- 
tive information about the theory, and, apart from 
gravity, they can couple with gauge fields. 

At low energies, M-theory can be approximated 
by an 11-dimensional N = 1 supergravity, which in 
fact is unique and contains a graviton field (the metric 
g_{μν}), a spin-3/2 field ψ (the gravitino), and a gauge 
field consisting of a 3-form potential field c. The gauge 
field, whose field strength is a 4-form G = dc, can then 
couple electrically with two-dimensional extended 
objects, called M2 membranes. Moving in spacetime, 
an M2 membrane describes a three-dimensional world 
volume W_3 so that its coupling to the gauge field is 

\[ S_2 = k\int_{W_3} c \qquad [1] \]

k representing the charge. 

With c we can associate a dual field c̃ such that 
dc̃ = *G. It is a 6-form and can then electrically 
couple with a five-dimensional object, the M5 
membrane. However, as c is the true field, we say 
that M5 couples magnetically with c. 

In superstring theories, which are related 
to M-theory by a web of dualities, there are many 
more objects to be considered. In particular, we will 
consider type II strings, which at low energies are 
described by ten-dimensional N = 2 supergravity 
theories. They contain a Neveu–Schwarz sector 
consisting of a graviton g_{μν}, a 2-form potential 
B_{μν}, and a scalar field φ, the dilaton. The content of 
the Ramond–Ramond fields depends on the chirality 
of the supercharges. 

Type IIA strings are nonchiral (their left and right 
supercharges having opposite chiralities) and 
contain only odd-degree p-form potentials A^{(p)}, 
with p = 1, 3, 5, 7, 9. 

Type IIB strings are chiral and contain only 
even-degree p-form potentials A^{(p)}, with 
p = 0, 2, 4, 6, 8. 

Proceeding as before, we see that a (p + 1)-form 
potential can couple electrically with a p-dimensional 
object and magnetically with a (6 − p)-dimensional 
object. Such objects in fact exist in type II strings: the 
Dp branes are p-dimensional extended objects, with 
p = 0, 2, 4, 6, 8 for IIA strings and p = −1, 1, 3, 5, 7, 9 
for IIB strings. In particular, D0 and D1 branes are 
called D-particles and D-strings respectively, whereas 
D(−1) branes are instantons, that is, points in 
spacetime. Concretely, D-branes are extended regions 
in spacetime where the endpoints of open strings are 
constrained to live. Mathematically, they are defined 
by imposing Dirichlet conditions (whence the "D" of 
D-brane) on the ends of the string, along certain 
spatial directions. Excitation of these string states 
gives rise to the dynamics of the brane. They 
correspond to a ten-dimensional U(1) gauge field, 
whose components tangent to the brane 
world volume give rise to a gauge field in p + 1 
dimensions, whereas the orthogonal components 
generate deformations of the brane shape. Moreover, 
if n parallel p-branes overlap, the gauge theory on the 
world volume is enhanced to a U(n) gauge theory. 
Closed strings can generate gravitational interactions 
responsible for wrappings of the brane. However, in 
the cases when gravitational interaction is negligible, 
we can use this mechanism to construct (p + 1)- 
dimensional gauge theories, as we will see. 

Before explaining how the construction works, let 
us recall that there are two other interesting 
objects which often appear. In fact, we have not yet 
considered the Neveu–Schwarz B-field: this field can 
couple electrically with a one-dimensional object 
and magnetically with a five-dimensional object. 
These are the usual string (also called a fundamental 
or F-string) and a five-dimensional membrane called 
the NS5 brane. 
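
The dimension counting used so far follows one rule, which the sketch below spells out. It assumes the standard reasoning that an n-form potential is integrated over an n-dimensional world volume (electric coupling), while the Hodge dual of its (n+1)-form field strength in D spacetime dimensions has a (D − n − 2)-form potential (magnetic coupling); the function name and its arguments are illustrative.

```python
def coupled_brane_dimensions(form_degree, spacetime_dim):
    """Spatial dimensions of the objects coupling to an n-form potential.

    Electric: the n-form is integrated over an n-dimensional world volume,
    so the object has n - 1 spatial dimensions.
    Magnetic: the dual field strength is a (D - n - 1)-form with a
    (D - n - 2)-form potential, hence D - n - 3 spatial dimensions.
    """
    electric = form_degree - 1
    magnetic = spacetime_dim - form_degree - 3
    return electric, magnetic


m_theory = coupled_brane_dimensions(3, 11)    # 3-form c in D = 11
ns_b_field = coupled_brane_dimensions(2, 10)  # NS 2-form B in D = 10
# Ramond-Ramond potentials of type IIA with degrees 1, 3, 5, 7 in D = 10
type_iia = [coupled_brane_dimensions(n, 10) for n in (1, 3, 5, 7)]
print(m_theory, ns_b_field, type_iia)
```

With D = 10 and form degree p + 1 the rule reproduces the pair (p, 6 − p) quoted above, and for the 3-form of eleven-dimensional supergravity it gives the M2 and M5 membranes.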

We will see how supersymmetric gauge theory 
configurations can be realized geometrically, con- 
sidering more or less simple configurations of 
branes. We will also show that quantum corrections, 
be they exact or perturbative, can be described in 
this geometrical fashion. To be explicit, we will 
work with four-dimensional gauge theories, but it is 
clear that similar constructions can be done in 
different dimensions. 


Gauge Groups on the Branes 


A deeper understanding of how D-branes and 
related world-volume gauge theories work requires 
the introduction of dualities, but a quite simple 
heuristic argument can be given, giving up some 
rigor in favor of intuition. 

Figure 1 Strings moving in spacetime (closed string and open string).

To fix ideas, let us think of an open string
moving in a nearly flat (but ten-dimensional) spacetime.
Its trajectory will describe a two-dimensional
surface having a boundary traced by the ends of the
string (Figure 1). The string can then be described by
a map from a two-dimensional surface $\Sigma$, having a
boundary $\gamma = \partial\Sigma$, to spacetime, say $X^\mu(\sigma,\tau)$ with
$\mu = 0,1,\ldots,9$. Here we chose on $\Sigma$ local coordinates
$\sigma^\alpha = (\sigma,\tau)$, where $\sigma \in [0,\pi]$ is a spacelike
coordinate and $\tau$ is a timelike one. Then $\sigma = 0,\pi$
mark the ends of the string and are identified
for the closed string. Now, on a given background,
the string evolution is usually described as a
two-dimensional (supersymmetric) conformal field
theory for the fields $X^\mu(\sigma,\tau)$. The action for the
bosonic part is the same for both type IIA and IIB
strings, and reads


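$$S[X] = -\frac{1}{4\pi\alpha'}\int_\Sigma \sqrt{h}\,h^{\alpha\beta}\,g_{\mu\nu}(X)\,\frac{\partial X^\mu}{\partial\sigma^\alpha}\frac{\partial X^\nu}{\partial\sigma^\beta}\,d^2\sigma - \frac{1}{4\pi\alpha'}\int_\Sigma B_{\mu\nu}(X)\,\frac{\partial X^\mu}{\partial\sigma^\alpha}\frac{\partial X^\nu}{\partial\sigma^\beta}\,d\sigma^\alpha\wedge d\sigma^\beta \qquad [2]$$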


where $g_{\mu\nu}$ and $B$ are the metric and a 2-form
potential field for the given spacetime background,
and $h_{\alpha\beta}$ is a metric for $\Sigma$. In general, we must also
add a scalar field $\phi(X)$, but it will not play any role
here. Using conformal invariance, we can reduce $h_{\alpha\beta}$
to the flat metric. Also consider a flat background
$g_{\mu\nu}(X) = \eta_{\mu\nu}$ and concentrate for a moment on the
B-field.

Conceived as a 2-form field over the spacetime, 
the potential field B is a gauge field: its field strength 
3-form H =dB is unchanged under a shift 


$$B \to B + dA \qquad [3]$$


generated by the 1-form field A(X). Here A should be 
a totally unphysical field. However, note that if one 
considers open strings, the action for the B-field, and 
then the full action is shifted by a boundary term 


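$$-\frac{1}{2\pi\alpha'}\int_\gamma A_\mu(X)\,\frac{\partial X^\mu}{\partial\sigma^\alpha}\,d\sigma^\alpha \qquad [4]$$
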
The boundary $\gamma$ just describes the timelike world
lines of the ends of the string. Thus, the ends of
the string carry a U(1) charge and, even though
the B-field vanishes, we can have the open-string
action

$$S[X] = -\frac{1}{4\pi\alpha'}\int_\Sigma \partial_\alpha X^\mu\,\partial^\alpha X_\mu\,d^2\sigma + \int_\gamma A_\mu(X)\,\partial_\alpha X^\mu\,d\sigma^\alpha \qquad [5]$$

Here we conventionally rescaled the A field to
normalize the action. To define the equation of
motion, however, we must also specify boundary
conditions for $X^\mu(\sigma,\tau)$ on $\gamma$. Let us choose Neumann
conditions for $\mu = 0,1,\ldots,p$ and Dirichlet
conditions for the remaining directions


$$\partial_\sigma X^a(\gamma) = 0, \qquad a = 0,\ldots,p \qquad [6]$$


$$\partial_\tau X^i(\gamma) = 0, \qquad i = p+1,\ldots,9 \qquad [7]$$


This means that the ends of the string are confined
to a (p + 1)-dimensional region (including time): the
Dp brane. If for $\Sigma$ we consider the full strip
$(\sigma,\tau) \in [0,\pi]\times\mathbb{R}$, then the U(1) action reduces to


$$S_A[X] = \int_{-\infty}^{+\infty} A_a\,\partial_\tau X^a(\pi,\tau)\,d\tau - \int_{-\infty}^{+\infty} A_a\,\partial_\tau X^a(0,\tau)\,d\tau \qquad [8]$$


Thus, only the components of $A_\mu$ tangent to the
brane interact with the ends of the strings. What
about the normal components $A_i$?

To understand their meaning, let us proceed to
compute the mean momentum transferred by the
string, as if it were rigid. Imitating the Hamilton-
Jacobi procedure for particles, let us consider the
action up to a fixed time, say $\tau = 0$, so that
$\Sigma = [0,\pi]\times(-\infty,0]$. It is then a function of the
position $X^\mu(\sigma,0)$ of the string at the instant $\tau = 0$.
To compute the momentum, we must vary the
action by changing the position by a constant shift
$\delta X^\mu(\sigma) = \Lambda_0^\mu$. The variation will then contain some
boundary terms which, for reasons of consistency,
we must make vanish.

Before doing such a computation, let us make
some further comments. It is plausible to assume
that the two ends of the string could be charged under
different U(1) fields. To the states of the open string
we can in fact add two discrete labels $I, J = 1,\ldots,n$,
for some integer n, called Chan-Paton factors, and
referring, respectively, to the two ends of the string.
We will indicate the ends of the string as $X^\mu(0,\tau;I)$
and $X^\mu(\pi,\tau;J)$ when we need to specify the states. If
the string is in the excited state (I, J), then $X(0,\tau;I)$
can couple with the field $A^{(I)}$ and $X(\pi,\tau;J)$ with $A^{(J)}$.
For simplicity, we will now assume that these fields
are constant. Note however that $A^{(I)}$ must be
understood as a function of $X(0,\tau)$ only, and similarly
$A^{(J)}$ as a function of $X(\pi,\tau)$ only. Also, to realize the
variation we can vary $X^\mu(\sigma,\tau)$ by a function
$\delta X^\mu(\sigma,\tau) = \Lambda^\mu(\tau)$ which switches on sharply to the
value $\Lambda_0^\mu$ at $\tau = 0$, so that essentially


$$\partial_\tau\Lambda^\mu(\tau) = \Lambda_0^\mu\,\delta(\tau) \qquad [9]$$


where $\delta(\tau)$ is the Dirac delta function.

Using the chosen boundary conditions, the variation
of the full action contains the boundary terms


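$$\delta S_{\rm bound} = \left(A_i^{(J)} - A_i^{(I)}\right)\int \partial_\tau\Lambda^i\,d\tau + \frac{1}{2\pi\alpha'}\,\Lambda_0^i\int_0^\pi \partial_\sigma X_i(\sigma,0)\,d\sigma = \frac{\Lambda_0^i}{2\pi\alpha'}\left[X_i(\pi,0) - X_i(0,0) + 2\pi\alpha'\left(A_i^{(J)} - A_i^{(I)}\right)\right] \qquad [10]$$


Imposing the condition of its vanishing gives the
physical interpretation for the normal components
of the U(1) fields


$$X_i(\pi,0) - X_i(0,0) = -2\pi\alpha'\left(A_i^{(J)} - A_i^{(I)}\right) \qquad [11]$$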


This means that, up to a constant shift, the fields
$A_i$ measure the positions of the ends of the strings
in the transverse directions! (Figure 2). Equivalently,
we can say that the string ends on two different Dp
branes, parallel but displaced in the transverse
directions by a quantity $-2\pi\alpha'(A^{(J)} - A^{(I)})$. We are
thus also able to interpret the Chan-Paton factors.
They mean that the string is living in a background
of n parallel branes, stretched between the Ith and
the Jth brane. On every brane, a U(1) gauge group
lives so that the full gauge group is $U(1)^n$. However,
when k of the branes overlap, the corresponding set
of states become indistinguishable, so that the gauge
group can be enhanced to a U(k) group. In
conclusion, n overlapping parallel Dp branes carry
a (p+1)-dimensional U(n) gauge theory which
breaks into $U(k_i)$ block factors if the branes separate
into stacks of $k_i$ overlapping branes.
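
The counting rule just stated can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `gauge_group`, the numerical tolerance `tol` used to decide when two branes coincide, and the representation of transverse positions as tuples of floats are all hypothetical choices made for this example.

```python
def gauge_group(positions, tol=1e-9):
    """Return the unbroken gauge group of a set of parallel Dp branes.

    positions: one tuple of transverse coordinates per brane.
    tol: hypothetical threshold below which two branes count as coincident.
    """
    stacks = []  # list of (representative position, number of branes)
    for pos in positions:
        for idx, (rep, count) in enumerate(stacks):
            if all(abs(a - b) <= tol for a, b in zip(pos, rep)):
                stacks[idx] = (rep, count + 1)  # brane joins an existing stack
                break
        else:
            stacks.append((pos, 1))  # brane starts a new stack
    sizes = sorted((count for _, count in stacks), reverse=True)
    # Each stack of k coincident branes contributes a U(k) factor.
    return " x ".join(f"U({k})" for k in sizes)


if __name__ == "__main__":
    print(gauge_group([(0.0,), (0.0,), (0.0,)]))   # three coincident branes
    print(gauge_group([(0.0,), (0.0,), (1.0,)]))   # one brane moved away
```

With three coincident branes the sketch returns a single U(3) factor; moving one brane away splits it into U(2) and U(1), as described in the text.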

We can say a little bit more about this. If the
string excited states represent gauge degrees of
freedom, they must become massive to break gauge
symmetry when the branes separate. To see this, let
us conclude by computing the mean momentum
carried by the string. After elimination of the
boundary terms, the total variation of the action
due to the shift $\delta X^\mu(\sigma,0) = \Lambda_0^\mu$ becomes


$$\delta S = -\frac{1}{2\pi\alpha'}\int_\Sigma \partial_\alpha\Lambda^\mu\,\partial^\alpha X_\mu\,d^2\sigma = \frac{1}{2\pi\alpha'}\,\Lambda_0^\mu\int_0^\pi \partial_\tau X_\mu(\sigma,0)\,d\sigma \qquad [12]$$


The resulting momentum is


$$P_\mu = \frac{1}{2\pi\alpha'}\int_0^\pi \partial_\tau X_\mu(\sigma,0)\,d\sigma$$


Figure 2 Tangential components of $A_a$ appear as gauge
modes. Normal components $A_i$ appear as shift modes.

In the bulk, the fields $X^\mu$ satisfy the standard wave
equation in two dimensions, so that the general
solution is the sum of a left-moving and a right-
moving part, $X^\mu(\sigma,\tau) = X_L^\mu(\tau+\sigma) + X_R^\mu(\tau-\sigma)$.
Imposing the boundary conditions, one finds


$$X^a(\sigma,\tau) = X_L^a(\tau+\sigma) + X_L^a(\tau-\sigma) + 2\alpha' p^a\tau + X_0^a \qquad [13]$$


$$X^i(\sigma,\tau) = X_L^i(\tau+\sigma) - X_L^i(\tau-\sigma) + 2\alpha'\left(A^{(I)i} - A^{(J)i}\right)\sigma + X_0^i \qquad [14]$$


Here $X_0^\mu$ and $p^a$ are integration constants and
$X_L^i(\tau+\pi) - X_L^i(\tau-\pi) = 0$. A direct computation
then shows that $P^a = p^a$ and $P^i = 0$, which is also
what intuition suggests: the string can freely move
along the branes but is fixed between them in the
orthogonal directions. However, if it is stretched
between two separated branes (i.e., if $I \neq J$), there is
another contribution to the energy. In fact the factor
$T := 1/(2\pi\alpha')$ represents the string tension, so that if
$\Delta$ is its minimal length, its minimal contribution to
the energy will be $\delta E = T\Delta$. This energy must
equally contribute to the spectrum of the excited
modes, the gauge field bosons. Here, in fact, is where
T-duality comes into play, but we will not discuss it.

The conclusion is that the spectrum corresponding to
the stretched string must satisfy the condition $E \geq T\Delta$,
which is as if the string states acquired a mass $T\Delta$,
that is,


$$m^2 = \sum_{i=p+1}^{9}\left(A^{(I)i} - A^{(J)i}\right)^2 \qquad [15]$$


This gives us a geometric tool to construct (p + 1)-
dimensional gauge theories: on n coincident Dp
branes there exists a U(n) gauge theory which can be
broken by separating the branes, thus giving a mass
to the gauge bosons. Such a mass is proportional to
the distance between the branes (Figure 3).
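
As a small numerical illustration of the statement that the mass is the string tension times the brane separation, one may use the following sketch. The function names and the choice of units are assumptions made for the example; only the relations $T = 1/(2\pi\alpha')$ and $m = T\Delta$ are taken from the text.

```python
import math


def string_tension(alpha_prime):
    """String tension T = 1 / (2 pi alpha')."""
    return 1.0 / (2.0 * math.pi * alpha_prime)


def gauge_boson_mass(pos_i, pos_j, alpha_prime):
    """Mass m = T * Delta of a string stretched between two parallel branes.

    pos_i, pos_j: transverse positions of the two branes (same length).
    Delta is taken to be the Euclidean distance between them.
    """
    separation = math.dist(pos_i, pos_j)
    return string_tension(alpha_prime) * separation


if __name__ == "__main__":
    alpha_prime = 1.0 / (2.0 * math.pi)  # units chosen so that T = 1
    print(gauge_boson_mass((0.0, 0.0), (0.0, 0.0), alpha_prime))  # coincident branes
    print(gauge_boson_mass((0.0, 0.0), (3.0, 4.0), alpha_prime))  # separated branes
```

For coincident branes the mass vanishes and the gauge bosons stay massless; for separated branes it grows linearly with the distance.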

Before continuing with some examples, let us
make two comments. First, the theory obtained in
this way is a supersymmetric one, because the
Dirichlet conditions allow the action of supersymmetric
transformations of the form $\epsilon_L Q_L + \epsilon_R Q_R$,
where $Q_L$ and $Q_R$ are the fermionic left and right
supercharge operators and $\epsilon_L$, $\epsilon_R$ are spinors satisfying
the brane projection condition $\epsilon_L = \pm\Gamma^0\Gamma^1\cdots\Gamma^p\epsilon_R$.
Here $\Gamma^\mu$ are the ten-dimensional Dirac
matrices and one refers to “antibranes” for the
negative sign.

Figure 3 Stretched strings acquire a mass. (Panel labels: Massless, Massive.)

Second, the gauge group can be converted into an
SO(n) or an Sp(n/2) (for even n), by adding an
orientifold plane parallel to the branes. The orientifold
plane acts on the orthogonal spacetime directions
with a $\mathbb{Z}_2$-action


$$X^i \to -X^i \qquad [16]$$


if $X^i = 0$ is the position of the orientifold. It further
acts on the string world sheet as $\sigma \to \pi - \sigma$, making it
an unoriented string. The effect is to project out
some states from the spectra, thus reducing the
gauge group.


Geometric Engineering of Gauge 
Theories from Branes 


To illustrate how brane construction of gauge 
theories works, we will consider a particular con- 
figuration of branes (Witten 1997). 

We would like to obtain a four-dimensional U(n)
gauge theory. A possibility could be to take n D3
branes in a type IIB string background. However,
such a model would contain too many supersymmetries:
in ten dimensions, supersymmetries are generated
by two 16-dimensional chiral spinors $\epsilon_L$, $\epsilon_R$
($\Gamma^0\cdots\Gamma^9\epsilon_{L,R} = \epsilon_{L,R}$). From the four-dimensional
point of view, each of them represents four four-
dimensional spinors giving an N=8 supersymmetric
theory. The projection condition, due to the branes,
reduces the number of supersymmetries to four.
Supersymmetry not being manifest in nature, it is
desirable to have gauge theories with fewer
supersymmetries at hand. Because different brane projection
conditions can further reduce supersymmetry, we
can try to consider the coexistence of more kinds of
branes.

Figure 4 D4 branes ending on an NS5 brane. Gauge degrees
of freedom are frozen in four dimensions.

One way to do this is to consider n parallel 4-branes
ending on an NS5 brane in type IIA string theory
(Figure 4), and then analyze the gauge theory restricted
to the four-dimensional intersection (here the theory is
nonchiral as $\Gamma^0\cdots\Gamma^9\epsilon_{L/R} = \pm\epsilon_{L/R}$). Which kinds of
branes can end on which other kinds of branes can be
established, starting from the fact that strings can end
on a brane, and using dualities (Giveon and
Kutasov 1999).

Let us fix some conventions. We will indicate with
$x = (x^0, x^1, x^2, x^3) \in \mathbb{R}^4$ the coordinates on the intersection,
so that $(x, v) = (x; x^4, x^5) \in \mathbb{R}^6$ define the NS5
brane, and $(x, x^6)$, with $x^6 \in [0,\infty)$, the 4-branes. Also
$v_I$ will indicate the position of the Ith 4-brane on the 5-
brane, and $y = (x^7, x^8, x^9)$ will collect the remaining
coordinates. Finally, we will indicate the product of $\Gamma$-
matrices, corresponding to given directions, by indexing
a single $\Gamma$ with the respective coordinates. For
example $\Gamma^v = \Gamma^4\Gamma^5$. With these conventions, the
brane projection conditions for D4 and NS5 branes,
respectively, read


$$\epsilon_L = \Gamma^x\Gamma^6\epsilon_R \qquad [17]$$


$$\epsilon_L = \Gamma^x\Gamma^v\epsilon_L, \qquad \epsilon_R = \Gamma^x\Gamma^v\epsilon_R \qquad [18]$$


These projections reduce supersymmetry to N=2.
After a short manipulation and using for example
antichirality of $\epsilon_R$, it is easy to see that the first
condition can be substituted by


$$\epsilon_L = \Gamma^x\Gamma^y\epsilon_R \qquad [19]$$


In other words, we could add a number of 6-branes 
in the (x,y) directions, without further reducing 
supersymmetry. We will consider this possibility 
later. 

On the D4 branes there is a possibly broken
U(n) gauge theory. Here the vector fields
$A_\mu$, $\mu = 0,1,2,3,6$, and the scalar fields $v_I$ and $y$
live. The last ones are set to zero by the Dirichlet
conditions, whereas the $v_I$ measure the fluctuations of
the D4 brane positions over the NS5 brane. The O(2) group
of rotations of the $(x^4, x^5)$ coordinates acts on
them, and can be broken by an expectation
value $\langle v_I\rangle \neq 0$. The SO(3) rotations of $(x^7, x^8, x^9)$
(under which the $v_I$ are singlets) do not influence the
projection conditions and can then be identified with
the R-symmetry group $SU(2)_R$. It could be broken by a
nonvanishing expectation value $\langle y\rangle \neq 0$, but as we
said this cannot happen in the present configuration. This
highlights an unbroken supersymmetric Coulomb
branch.

What is the physics as seen by an observer living
on the four-dimensional spacetime x? The components
$A_a$, $a = 0,1,2,3$, of the vector fields transform
as vectors with respect to the four-dimensional
Lorentz group SO(1, 3). They satisfy Neumann
boundary conditions at $x^6 = 0$ and then survive as
U(n) gauge vector fields. The $A_6$ component behaves
as a scalar with respect to SO(1, 3) but is eliminated
by a Dirichlet condition at $x^6 = 0$. The $v$ scalar field
will be responsible for the possible breaking of the
gauge group.

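This seems to be quite a good scenario but
actually the situation is unsatisfactory. If a 4-brane
extends over the interval [0, L] in the $x^6$ direction, the
effective action for the gauge fields goes like this:


$$\frac{1}{g_{D4}^2}\int_0^L dx^6\int_{\mathbb{R}^4} d^4x\;{\rm tr}\,F_{\mu\nu}F^{\mu\nu} \approx \frac{L}{g_{D4}^2}\int_{\mathbb{R}^4} d^4x\;{\rm tr}\,F_{\alpha\beta}F^{\alpha\beta} \qquad [20]$$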


where $\alpha, \beta = 0,1,2,3$. Thus, the gauge coupling in
four dimensions appears to be $g_4 = g_{D4}/\sqrt{L}$. In our
case, where L goes to infinity, the gauge coupling
vanishes and the gauge degrees of freedom are
frozen. Moreover, an argument similar to the one
made for the stretched strings shows that the energy
of the D4 brane is very high and makes the
mechanism of gauge group breaking difficult. The
same is true for the NS5 brane, which also turns out
to be extremely massive and does not participate in
the dynamics. But this is what we want.
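
The dependence of the four-dimensional coupling on the length of the 4-branes can be made concrete with a few lines of code. This is only a sketch of the relation $g_4 = g_{D4}/\sqrt{L}$ quoted above; the function name and the numerical values are hypothetical.

```python
import math


def four_dim_coupling(g_d4, length):
    """Effective four-dimensional gauge coupling g_4 = g_D4 / sqrt(L)."""
    if length <= 0:
        raise ValueError("the brane length L must be positive")
    return g_d4 / math.sqrt(length)


if __name__ == "__main__":
    # The coupling decreases as the brane gets longer and vanishes for L -> infinity.
    for length in (1.0, 4.0, 1.0e6):
        print(length, four_dim_coupling(2.0, length))
```

The loop shows the coupling dropping as L grows, which is the freezing of the gauge degrees of freedom for semi-infinite 4-branes.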

To solve the problem and restore gauge dynamics
in four dimensions, one must consider a stack of
4-branes of finite length in the $x^6$ direction. This can
be achieved by placing at $x^6 = L$ a second NS5 brane
parallel to the first one and at the same point in y
(Figure 5). In this way, the D4 branes can stretch
between the NS5 branes. If L is small enough, the
gauge dynamics is restored; one also requires a small
value for $g_{D4}$, to ensure that the gravitational coupling
(and the couplings with the Kaluza-Klein and NS5
modes) are negligible. However, L must be bigger
than the $\delta x^6$ fluctuations in order to avoid quantum
corrections.


Figure 5 N=2 four-dimensional super Yang-Mills theory, with
U(n) gauge group. (Label in figure: NS5.)


What we just obtained is an N=2 supersymmetric
classical U(n) gauge theory in four dimensions,
without matter, and in the Coulomb branch.
Before considering quantization, let us briefly
discuss some possible generalizations. For example,
matter can be realized by attaching to the left-hand side
NS5 brane new D4 branes parallel to the previous
ones, but extended in the $x^6$ direction from $-\infty$ to 0
(Figure 6). Considering strings stretched between
long and short branes, we obtain states whose half-
gauge action, associated with the end connected to
the long brane, is frozen. The corresponding states
thus appear in the fundamental representation and
can be interpreted as matter states.

To consider the Higgs branch, one should be able
to break supersymmetry giving an expectation value
to y. As mentioned above, in the present configuration
this cannot happen because y is set to 0 by
Dirichlet conditions. Fortunately, as we said, one
can add 6-branes in the (x,y) directions. If we insert
such branes to stop the long D4 branes at a large but
finite value of $x^6$, say $x^6 = -M$ with $M > L$, then
the long branes have Neumann conditions in the y
directions. Thus, fluctuations of the long branes can
give an expectation value to y, breaking supersymmetry,
and subsequently the Higgs branch can be
tuned, shifting 4-branes stretched between 6-branes
(Figure 7).





Figure 6 Adding matter.


Figure 7 Permitting Higgs phases.


The details require some careful inspection, but 
we shall stop our analysis here (Giveon and Kutasov 
1999). 

More general gauge configurations can be realized
by adding more parallel NS5 branes, and thus
obtaining product groups. Adding orientifold planes,
one can change gauge groups as explained in the
previous section (Figure 8).

Finally, we can take a further step towards more
physical models, constructing N=1 gauge theories.
For example, this can be achieved from the previous
N=2 model, rotating the second NS5 brane from
the (x,v) position, to the (x,w) position, where
$w = (x^8, x^9)$ (Figure 9). Then a new brane projection
condition appears ($\epsilon_L = \Gamma^x\Gamma^w\epsilon_R$), breaking supersymmetry
down to N=1.

In this case, one could also obtain chiral matter, 
adding, for example, orientifold planes. 


Quantum Corrections from M-Theory 


Up to this point we have considered classical gauge 
configurations. Quantum corrections could be com- 
puted switching on brane fluctuations. However, it 
is an amusing fact that working with M-theory one 
can obtain exact quantum results. As an example, 
let us sketch how the exact Seiberg-Witten solution 
can be obtained for the N =2 model described in the 
previous section, in the simplest case without 
matter.

The full web of dualities suggests the existence of
a unique unifying theory called M-theory. At low
energies, M-theory appears as the strong-coupling
limit of type IIA strings. In such a limit, D0 branes
become the dominant objects and the corresponding
states can be interpreted as Kaluza-Klein modes
coming from an eleventh dimension $x^{10}$ compactified
on a circle $S^1$ (Figure 10).

Thus, M-theory manifests itself as an 11-dimensional 
supergravity. In particular, it can be shown that there 
can be only a unique 11-dimensional supergravity. As 
said, here the nonperturbative objects are two- or five- 
dimensional membranes. 

From the M-theory point of view, the D4 branes
considered in our model appear as M5 membranes
wrapped on the eleventh direction $S^1$ (Figure 11).
Because quantum corrections are no longer negligible,
we can no longer think of these branes as
stretched in the $x^6$ direction, but v must also be
considered. Thus, the M5 membranes will describe,
in $\mathbb{R}^{10}\times S^1$, a region $\mathbb{R}^4\times S$, where $\mathbb{R}^4$ are the x
coordinates, and S is a Riemann surface immersed in
$Q\times S^1$, Q being spanned by the $(v, x^6)$ coordinates.
In fact, supersymmetry constrains the surface to be a
holomorphic curve, so that to describe it, it is
convenient to collect $v = (x^4, x^5)$ and $(x^6, x^{10})$ into
complex coordinates $v = x^4 + ix^5$ and $s = x^6 + ix^{10}$.

To compute quantum fluctuations, let us note that
the end of a D4 brane over an NS5 brane is free to
move along the v directions. A fully free end of a
brane would satisfy a free wave equation. However,
as $x^6$ is constrained in all directions but the v ones, it
will simply satisfy a Laplace equation in two
dimensions: $\Delta_v x^6 = 0$. Let us solve it, for a fixed
NS5 brane. It will be (at least for large values of v)


$$x^6(v) = k\sum_{i=1}^{n_L}\log\left|v - v_{L,i}^{(a)}\right| - k\sum_{i=1}^{n_R}\log\left|v - v_{R,i}^{(a)}\right| \qquad [21]$$


where $n_L$ is the number of D4 branes ending on
the left-hand side of the NS5 brane, in the positions
$v_{L,i}^{(a)}$, and similarly for the R index, which refers to
the right-hand side. Here (a) refers to the ath NS5
brane, and k is an integration constant.

Figure 8 N=2 four-dimensional super Yang-Mills theory with $U(n_1)\times U(n_2)$ gauge group and matter. Strings crossing the central
NS5 brane give matter in the $(n_1, \bar{n}_2)$ representation.

Figure 10 In M-theory one can think as if at any ten-dimensional
spacetime point, there is attached an $S^1$ circle of radius $R_{10}$.

Figure 11 D4 branes become M5 membranes in M-theory.

Because $x^6$ is the real part of a holomorphic field,
whose imaginary part is compactified on a circle of
radius $R_{10}$, we then find


$$s(v) = R_{10}\sum_{i=1}^{n_L}\log\left(v - v_{L,i}^{(a)}\right) - R_{10}\sum_{i=1}^{n_R}\log\left(v - v_{R,i}^{(a)}\right) \qquad [22]$$


This describes the quantum fluctuations of the NS5
brane as seen in M-theory. In particular, because of
the imaginary part of s, the ends of the D4 branes
appear as vortices on the NS5 brane. In place of s, it
is now convenient to introduce a new field
$t := \exp(-s/R_{10})$ so that


$$t(v) = \frac{\prod_{i=1}^{n_R}\left(v - v_{R,i}^{(a)}\right)}{\prod_{i=1}^{n_L}\left(v - v_{L,i}^{(a)}\right)} \qquad [23]$$


Before continuing, let us look again at the
classical limit. In this case, a fixed value of v will
correspond to the position of a D4 brane, whereas a
fixed value of s will correspond to the fixed position
of an NS5 brane. The classical configuration is then


$$\left(s - s^{(1)}\right)\left(s - s^{(2)}\right)\prod_{i=1}^{n}(v - v_i) = 0 \qquad [24]$$


Here $s^{(a)}$ are the positions of the NS5 branes, and
the positions $v_i$ of the D4 branes coincide for both
the NS5 branes. Also, for large values of v, one has
$t^{(1)} \approx v^n$ and $t^{(2)} \approx v^{-n}$.

Quantum mechanically, the configuration is
determined in terms of v and t by the holomorphic
curve S, which can be described as an algebraic
curve F(v,t)=0, generalizing the classical configuration.
As there are two NS5 branes and n D4 branes,
F must be a polynomial of degree 2 in t,


$$F(v,t) = A_2(v)\,t^2 + A_1(v)\,t + A_0(v) \qquad [25]$$


where $A_a$, $a = 0,1,2$, are all polynomials of degree n.
Note that values of v such that $A_0$ vanishes give the
solution t = 0, which corresponds to sending the right-
hand side NS5 brane to $\infty$. Similarly, $A_2 = 0$ sends the
other NS5 brane to $-\infty$. To avoid these undesirable
configurations, we can set $A_0 = A_2 = 1$. For $A_1$, we
can take the most general choice, up to a possible
shift in v, giving the quantum configuration


$$t^2 + \left[v^n + a_{n-2}v^{n-2} + \cdots + a_1 v + a_0\right]t + 1 = 0 \qquad [26]$$


This realizes a quantum-mechanical correspondence 
between the M5 membrane configurations described 
by the given polynomials, and the N=2 super 
Yang-Mills vacua. But this is also the claimed 
Seiberg—Witten curve. In particular, M-theory gives 
a concrete physical meaning for the support Rie- 
mann surfaces of the Seiberg—Witten solutions. 
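
The structure of the curve [26] can be explored numerically. The following is a minimal sketch, assuming the curve has the form $t^2 + P(v)\,t + 1 = 0$ with $P(v) = v^n + a_{n-2}v^{n-2} + \cdots + a_0$; the function names and the sample coefficients are hypothetical. It checks that, at fixed v, the two roots in t multiply to 1 and that for large v one of them grows like $v^n$ in modulus, in agreement with the classical asymptotics.

```python
import cmath


def curve_polynomial(v, coeffs):
    """P(v) = v^n + a_{n-2} v^{n-2} + ... + a_0, with coeffs = [a_0, ..., a_{n-2}]."""
    n = len(coeffs) + 1
    return v ** n + sum(a * v ** k for k, a in enumerate(coeffs))


def curve_roots(v, coeffs):
    """Both solutions t of t^2 + P(v) t + 1 = 0 at a given complex v."""
    p = curve_polynomial(v, coeffs)
    disc = cmath.sqrt(p * p - 4)
    return (-p + disc) / 2, (-p - disc) / 2


if __name__ == "__main__":
    coeffs = [0.5]          # hypothetical modulus a_0 for n = 2
    v = 10.0 + 0.0j
    t1, t2 = curve_roots(v, coeffs)
    print("product of roots:", t1 * t2)
    print("larger root over v^n:", max(t1, t2, key=abs) / v ** 2)
```

The product of the roots equals the constant term of the quadratic, so the two sheets of the curve are exchanged by $t \to 1/t$.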

To conclude, let us make some further comments.
It is clear how the construction can be extended to
more general configurations, for example, with
more NS5 branes, or adding matter.

Also, we have seen that the geometrical picture
which branes give of gauge theories extends to the
quantum level.

A similar construction can be made for the N=1 
model, which also permits a full geometrical proof 
of the Seiberg duality at both classical and quantum 
levels. 

Finally, we should note that there are also 
other methods, which work in spacetimes where extra 
dimensions are compactified. There, the branes wrap 
around certain singular loci which contain information 
about gauge symmetries (Lerche 1997). 


See also: AdS/CFT Correspondence; Compactification of 
Superstring Theory; Gauge Theories from Strings; 
Noncommutative Geometry from Strings; Seiberg—Witten 
Theory; Supergravity; Superstring Theories; 
Supersymmetric Particle Models.


Introduction


At high enough energies, Einstein’s classical theory 
of general relativity breaks down, and will be 
superseded by a quantum gravity theory. The 
singularities predicted by general relativity in grav- 
itational collapse and in the hot big bang origin of 
the universe are thought to be artifacts of the 
classical nature of Einstein’s theory, which will be 
removed by a quantum theory of gravity. Develop- 
ing a quantum theory of gravity and a unified theory 
of all the forces and particles of nature are the two 
main goals of current work in fundamental physics. 
The problem is that general relativity and quantum 
field theory cannot simply be molded together. 
There is as yet no generally accepted (pre-)quantum 
gravity theory. 

The quest for a quantum gravity theory has a long 
and thus far not very successful history. Many 
different lines of attack have been developed, each 
having a different way of dealing with the classical 
singularities that arise from point particles and 
smooth spacetime geometry. String theory does 
away with zero-dimensional point particles, and 
particles are modeled as different states of new 
fundamental objects, the one-dimensional strings. It 
turns out, however, that there is a price to pay — the 
number of spacetime dimensions must be greater 
than four for a consistent theory. When fermions are 
included, which leads to superstring theory, the 
required number of dimensions is ten — one time and 
nine space dimensions. 

There are in fact five distinct (1+9)-dimensional 
superstring theories. In the mid-1990s, duality 
transformations were discovered that relate these 
superstring theories to each other and to the (1+10)- 
dimensional supergravity theory. This led to the 
conjecture that all of these theories arise as different 
limits of a single theory, which has come to be 
known as M theory. It was also discovered that
extended objects of higher dimension than strings
play a fundamental role in the theory. These objects 
are known as “branes” (from membranes), and the 
relation between them and strings leads to a new 
picture of how gravity and matter may be connected 
in the universe. Roughly speaking, open strings 
describe the particles of the nongravitational sector, 
and their ends are attached to branes, while closed 
strings, which describe the graviton and associated 
particles of the gravitational sector, can move freely 
in all dimensions. 

Thus, the observable universe could be a 
(1+3)-surface - a “brane,” embedded in a 
(1+3+d)-dimensional spacetime - the “bulk,” 
with standard-model particles and fields trapped on 
the brane, while gravity is free to access the bulk. 
Brane-world models offer a phenomenological way to 
test some of the novel predictions and corrections to 
general relativity that are implied by M theory. 


Higher-Dimensional Gravity 


Brane worlds can be seen as reviving the original
higher-dimensional ideas of Kaluza and Klein in the
1920s, but in a new context of quantum gravity. An
important consequence of extra dimensions is that
the four-dimensional Planck scale $M_p \equiv M_{(4)} = 1.2\times10^{19}$ GeV
is no longer the fundamental energy
scale of gravity. The fundamental scale is instead
$M_{(4+d)}$. This can be seen from the modification of
the gravitational potential. For an Einstein-Hilbert
gravitational action,


$$S_{\rm gravity} = \frac{1}{2\kappa^2_{(4+d)}}\int d^4x\,d^dy\,\sqrt{-{}^{(4+d)}g}\,\left[{}^{(4+d)}R - 2\Lambda_{(4+d)}\right] \qquad [1]$$

we have the higher-dimensional Einstein field
equations,


$${}^{(4+d)}G_{AB} \equiv {}^{(4+d)}R_{AB} - \frac{1}{2}\,{}^{(4+d)}R\,{}^{(4+d)}g_{AB} = -\Lambda_{(4+d)}\,{}^{(4+d)}g_{AB} + \kappa^2_{(4+d)}\,{}^{(4+d)}T_{AB} \qquad [2]$$


where $x^A = (x^\mu, y^1,\ldots,y^d)$ and $\kappa^2_{(4+d)}$ is the gravitational
coupling constant given by


$$\kappa^2_{(4+d)} = 8\pi G_{(4+d)} = \frac{8\pi}{M^{2+d}_{(4+d)}} \qquad [3]$$


The static weak field limit of the field equations
leads to the (4+d)-dimensional Poisson equation,
whose solution is the gravitational potential


$$V(r) \propto \frac{\kappa^2_{(4+d)}}{r^{1+d}} \qquad [4]$$

In the simplest scenario, we can assume a
toroidal configuration for the d extra dimensions,
with each compactified on the same length scale L.
Then on scales $r \lesssim L$, the potential is (4+d)-
dimensional, $V \sim r^{-(1+d)}$. By contrast, on scales
large relative to L, where the extra dimensions do
not contribute to variations in the potential, V behaves
like a four-dimensional potential, $V \sim L^{-d}r^{-1}$. This
means that the usual Planck scale becomes an effective
coupling constant, describing gravity on scales much
larger than the extra dimensions, and related to the
fundamental scale via the volume of the extra
dimensions:


$$M_p^2 \sim M_{(4+d)}^{2+d}\,L^d \qquad [5]$$
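
The two regimes of the potential and the relation [5] can be sketched numerically. The code below is an illustration only: it drops all numerical constants, works in units where the fundamental scale sets the unit of mass, and the function names are hypothetical.

```python
def potential_scaling(r, L, d):
    """Scaling of the gravitational potential, up to constant factors.

    For r < L the potential is (4+d)-dimensional, V ~ r^-(1+d);
    for r >= L it is effectively four-dimensional, V ~ L^-d / r.
    """
    if r <= 0 or L <= 0:
        raise ValueError("r and L must be positive")
    if r < L:
        return r ** -(1 + d)
    return L ** -d / r


def effective_planck_mass(m_fund, L, d):
    """Effective four-dimensional Planck mass from M_p^2 ~ M^(2+d) L^d."""
    return (m_fund ** (2 + d) * L ** d) ** 0.5


if __name__ == "__main__":
    # The two branches of the potential agree at r = L.
    print(potential_scaling(2.0, 2.0, 2), 2.0 ** -3)
    # A large extra-dimensional volume raises the effective Planck mass.
    print(effective_planck_mass(1.0, 100.0, 2))
```

The matching of the two branches at r = L is what fixes the factor $L^{-d}$ in the four-dimensional regime, and hence the volume factor in [5].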


Large Extra Dimensions 


If the extra-dimensional volume is significantly
above the Planck scale, then the true fundamental
scale $M_{(4+d)}$ can be much less than the effective scale
$M_p$,


$$L^d \gg M_p^{-d} \;\Rightarrow\; M_{(4+d)} \ll M_p \qquad [6]$$


In this case, we understand the weakness of gravity 
as due to the fact that it “spreads” into extra 
dimensions, and only a part of it is felt in four 
dimensions. 

A lower limit on $M_{(4+d)}$ is given by null results in
table-top experiments to test for deviations from
Newton’s law in four dimensions, $V \propto r^{-1}$. These
experiments currently probe submillimeter scales,
and find no detectable deviation, so that


$$L \lesssim 10^{-1}\,{\rm mm} \sim \left(10^{-15}\,{\rm TeV}\right)^{-1} \;\Rightarrow\; M_{(4+d)} \gtrsim 10^{(32-15d)/(d+2)}\,{\rm TeV} \qquad [7]$$
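
The bound [7] follows from [5] by inserting $M_p \sim 10^{16}$ TeV and $L^{-1} \sim 10^{-15}$ TeV. A short sketch makes the arithmetic explicit for a few values of d; the function name and default arguments are hypothetical, and only orders of magnitude are meaningful.

```python
def fundamental_scale_bound_tev(d, m_planck_tev=1.0e16, inv_length_tev=1.0e-15):
    """Lower bound on M_(4+d) in TeV from M_p^2 ~ M^(2+d) L^d.

    Solving for M gives M ~ (M_p^2 * (1/L)^d)^(1/(2+d)), which for the default
    values equals 10^((32 - 15 d)/(d + 2)) TeV.
    """
    return (m_planck_tev ** 2 * inv_length_tev ** d) ** (1.0 / (2 + d))


if __name__ == "__main__":
    for d in (1, 2, 3, 6):
        print(d, fundamental_scale_bound_tev(d))
```

For d = 2 the bound is of the order of a few TeV, which is why two large extra dimensions are the marginal case for this class of experiments.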


Stronger bounds can be derived from null results in 
particle accelerators in some brane-world models, or 
from constraints imposed by observations of super- 
novae or of light-element abundance. 

Brane worlds, arising in the framework of string 
theory, thus incorporate the possibility that the 


fundamental scale is much less than the Planck 
scale felt in four dimensions. This emerges by virtue 
of the large size of the extra dimensions. It is not 
necessary for all extra dimensions to be of equal size 
for this mechanism to operate. There are string
theory solutions (Horava-Witten solutions) with
two (1+9)-branes located at the boundaries of the
bulk, at the endpoints of an $S^1/\mathbb{Z}_2$ orbifold, that is,
a circle folded on itself across a diameter. The
orbifold extra dimension is the large one, whereas 
the other six extra dimensions on the branes are 
compactified on a very small scale, close to the 
fundamental scale, and their effect on the 
dynamics is felt through “moduli” fields, that is, 
five-dimensional scalar fields. 

These solutions can be thought of as effectively 
five dimensional, with an extra dimension that can 
be large relative to the fundamental scale. They 
provide the basis for the Randall-Sundrum 1 (RS1) 
phenomenological models of five-dimensional grav- 
ity. The single-brane Randall-Sundrum 2 (RS2) 
models with infinite extra dimension arise when 
the orbifold radius tends to infinity. The RS models 
are not the only phenomenological realizations of M
theory ideas. They were preceded by the brane-
world models of Arkani-Hamed, Dimopoulos, and
Dvali (ADD), which put forward the idea that a
large volume for the compact extra dimensions
would lower the fundamental Planck scale $M_{(4+d)}$. If
$M_{(4+d)}$ is close to the electroweak scale, $M_{\rm ew}$, then
this would address the long-standing “hierarchy”
problem, that is, why there is such a large gap
between $M_{\rm ew} \sim 1$ TeV and $M_p \sim 10^{16}$ TeV.

In the ADD models, more than one extra 
dimension is required for agreement with experi- 
ments, and there is “democracy” among the equiva- 
lent extra dimensions, which, in addition, are flat. 
By contrast, the RS models have a “preferred” extra 
dimension, with other extra dimensions treated as 
ignorable (i.e., stabilized except at energies near the 
fundamental scale). Furthermore, this extra dimension
is curved or “warped” rather than flat: the bulk
is a portion of anti-de Sitter (AdS$_5$) spacetime. The
RS branes are $\mathbb{Z}_2$-symmetric (mirror symmetry), and
have a tension, which serves to counter the influence 
on the brane of the negative bulk cosmological 
constant. This also means that the self-gravity of the 
branes is incorporated in the RS models. The novel 
feature of the RS models compared to previous 
higher-dimensional models is that the observable 
three dimensions are protected from the large extra 
dimension (at low energies) by curvature (warping), 
rather than straightforward compactification. 

The RS brane worlds provide phenomenological 
models that reflect at least some of the features of 


M theory, and that bring exciting new geometric 
and particle physics ideas into play. The RS2 
models also provide a framework for exploring 
holographic ideas that have emerged in M theory. 
Roughly speaking, holography suggests that 
higher-dimensional dynamics may be determined 
from a knowledge of the fields on a lower- 
dimensional boundary. The AdS/CFT correspon- 
dence is an example in which the classical 
dynamics of the higher-dimensional AdS gravita- 
tional field are equivalent to the quantum 
dynamics of a conformal field theory (CFT) on 
the boundary. 


Kaluza-Klein Modes


The dilution of gravity via extra dimensions not 
only weakens gravity, it also broadens the range of 
graviton modes felt on the brane. The graviton is 
more than just the four-dimensional massless mode
of four-dimensional gravity; other modes, with an
effective mass on the brane, arise from the fact
that the graviton is a (4+d)-dimensional massless
particle. These extra modes on the brane are
known as Kaluza-Klein (KK) modes of the
graviton.

For simplicity, consider a flat brane with one flat
extra dimension, compactified through the identification
$y \leftrightarrow y + 2\pi n L$, where $n = 0, 1, 2, \ldots$. The
perturbative five-dimensional graviton is defined
via

$${}^{(5)}\eta_{AB} \to {}^{(5)}\eta_{AB} + h_{AB}$$   [8]

where ${}^{(5)}\eta_{AB}$ is the five-dimensional Minkowski metric
and $h_{AB}$ is a small transverse traceless perturbation. Its
amplitude can be Fourier expanded as

$$h(x^\alpha, y) = \sum_n e^{iny/L}\,h_n(x^\alpha)$$   [9]

where $h_n$ are the amplitudes of the KK modes, that
is, the effective four-dimensional modes of the five-
dimensional graviton. To see that these KK modes
are massive from the brane viewpoint, we start from
the five-dimensional wave equation that the massless
five-dimensional field $h$ satisfies (in a suitable
gauge):

$${}^{(5)}\Box h = 0 \;\Rightarrow\; \Box h + \partial_y^2 h = 0$$   [10]

It follows that the KK modes satisfy a four-
dimensional Klein-Gordon equation with an effective
four-dimensional mass, $m_n$:

$$\Box h_n = m_n^2\,h_n, \qquad m_n = \frac{n}{L}$$   [11]

The massless mode, $h_0$, is the usual four-
dimensional graviton mode. But there is a tower
of massive modes, $L^{-1}, 2L^{-1}, \ldots$, which
imprint the effect of the five-dimensional gravita- 
tional field on the four-dimensional brane. Com- 
pactness of the extra dimension leads to 
discreteness of the spectrum. For an infinite 
extra dimension, $L \to \infty$, the separation between
the modes disappears and the tower forms a 
continuous spectrum. 
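
The discreteness of the tower can be illustrated with a minimal Python sketch. It assumes the mass formula $m_n = n/L$ of eqn [11] in units with $\hbar = c = 1$; the function names `kk_masses` and `kk_spacing` and the sample values of $L$ are hypothetical, chosen only for illustration.

```python
def kk_masses(L, n_max):
    """Return the KK masses m_n = n / L for n = 0..n_max (eqn [11])."""
    if L <= 0:
        raise ValueError("L must be positive")
    return [n / L for n in range(n_max + 1)]


def kk_spacing(L):
    """Mass gap between neighbouring KK modes, m_(n+1) - m_n = 1 / L."""
    if L <= 0:
        raise ValueError("L must be positive")
    return 1.0 / L


if __name__ == "__main__":
    # The gap shrinks as the extra dimension grows, approaching a continuum.
    for L in (1.0, 10.0, 1000.0):
        print(f"L = {L:7.1f}  masses = {kk_masses(L, 3)}  gap = {kk_spacing(L):.3e}")
```

Increasing `L` in the loop shows the gap $1/L$ tending to zero, which is the numerical counterpart of the continuous spectrum obtained for an infinite extra dimension.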


Randall-Sundrum Brane Worlds 


RS brane worlds do not rely on compactification to 
localize gravity at the brane, but on the curvature of 
the bulk. What prevents gravity from “leaking” into 
the extra dimension at low energies is a negative 
bulk cosmological constant, 


$$\Lambda_5 = -\frac{6}{\ell^2} = -6\mu^2$$   [12]

where $\ell$ is the curvature radius of AdS$_5$ and $\mu$ is the
corresponding energy scale. The bulk cosmological
constant with its repulsive gravity effect acts to
“squeeze” the gravitational potential closer to the
brane. We can see this clearly in Gaussian normal
coordinates $x^A = (x^\mu, y)$ based on the brane at $y = 0$,
for which the metric takes the form

$${}^{(5)}ds^2 = dy^2 + e^{-2|y|/\ell}\,\eta_{\mu\nu}\,dx^\mu dx^\nu$$   [13]

with $\eta_{\mu\nu}$ the Minkowski metric. The exponential
warp factor reflects the confining role of the bulk
cosmological constant. The $Z_2$-symmetry about the
brane at $y = 0$ is incorporated via the $|y|$ term. In the
bulk, this metric is a solution of the five-dimensional
Einstein equations,

$${}^{(5)}G_{AB} = -\Lambda_5\,{}^{(5)}g_{AB}$$   [14]

that is, ${}^{(5)}T_{AB} = 0$ in eqn [2]. The brane is a flat
Minkowski spacetime, $g_{AB}(x^\alpha, 0) = \eta_{\mu\nu}\,\delta^\mu{}_A\,\delta^\nu{}_B$, with
self-gravity in the form of brane tension.

The two RS models are distinguished as follows: 


RS1 There are two branes in RS1, at $y = 0$ and
$y = L$, with $Z_2$-symmetry identifications

$$y \leftrightarrow -y, \qquad y + L \leftrightarrow L - y$$   [15]

The branes have equal and opposite tensions, $\pm\lambda$,
where

$$\lambda = \frac{3 M_p^2}{4\pi\ell^2}$$   [16]

The positive-tension “TeV” brane has fundamental
scale $M_5 \sim 1$ TeV. Because of the exponential
warping factor, the effective scale on the negative-
tension “Planck” brane at $y = L$ is $M_p$. On the
positive-tension brane,



$$M_p^2 = M_5^3\,\ell\left[1 - e^{-2L/\ell}\right]$$   [17]

So RS1 gives a new approach to the hierarchy 
problem. Because of the finite separation between 
the branes, the KK spectrum is discrete. 

RS2 In RS2, there is only one, positive- 
tension, brane. This may be thought of as arising 
from sending the negative tension brane off to 
infinity, $L \to \infty$. Then the energy scales are
related via

$$M_5^3 = \frac{M_p^2}{\ell}$$   [18]

On the RS2 brane, the negative $\Lambda_5$ is offset by
the positive brane tension $\lambda$. The fine-tuning in eqn
[16] ensures that there is zero effective cosmological 
constant on the brane, so that the brane has the 
induced geometry of Minkowski spacetime. To see 
how gravity is localized at low energies, we consider 
the five-dimensional graviton perturbations of the 
metric: 


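$${}^{(5)}g_{AB} \to {}^{(5)}g_{AB} + e^{-2|y|/\ell}\,h_{AB}, \qquad h_{Ay} = 0 = h^\mu{}_\mu = \partial_\nu h^{\mu\nu}$$   [19]

We split the amplitude $h$ into three-dimensional
Fourier modes, and the linearized five-dimensional
Einstein equations lead to the wave equation ($y > 0$)

$$e^{2y/\ell}\left[\ddot{h} + k^2 h\right] = h'' - \frac{4}{\ell}\,h'$$   [20]

Separability means we can write

$$h(t, y) = \sum_m \varphi_m(t)\,h_m(y)$$   [21]

and the wave equation reduces to

$$\ddot{\varphi}_m + (m^2 + k^2)\,\varphi_m = 0$$   [22]

$$h_m'' - \frac{4}{\ell}\,h_m' + e^{2y/\ell}\,m^2 h_m = 0$$   [23]

The zero-mode solution is

$$\varphi_0(t) = A_{0+}\,e^{+ikt} + A_{0-}\,e^{-ikt}$$   [24]

$$h_0(y) = B_0 + C_0\,e^{4y/\ell}$$   [25]

and the massive KK mode ($m > 0$) solutions are

$$\varphi_m(t) = A_{m+}\exp\left(+i\sqrt{m^2 + k^2}\,t\right) + A_{m-}\exp\left(-i\sqrt{m^2 + k^2}\,t\right)$$   [26]

$$h_m(y) = e^{2y/\ell}\left[B_m J_2\left(m\ell e^{y/\ell}\right) + C_m Y_2\left(m\ell e^{y/\ell}\right)\right]$$   [27]

where $J_2$, $Y_2$ are Bessel functions.
The boundary condition for the perturbations is
$h'(t, 0) = 0$, which implies

$$C_0 = 0, \qquad C_m = -\frac{J_1(m\ell)}{Y_1(m\ell)}\,B_m$$   [28]

In the RS1 model, we have a further boundary
condition, $h'(t, L) = 0$, which leads to a discrete
eigenspectrum, namely the masses $m$ that satisfy

$$J_1\left(m\ell e^{L/\ell}\right) Y_1(m\ell) - Y_1\left(m\ell e^{L/\ell}\right) J_1(m\ell) = 0$$   [29]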


The zero mode is normalizable, since

$$\left|\int_0^\infty B_0\,e^{-2y/\ell}\,dy\right| < \infty$$   [30]

Its contribution to the gravitational potential
$V = \frac{1}{2} h_{00}$ gives the four-dimensional result, $V \propto
r^{-1}$. The contribution of the massive KK modes sums
to a correction of the four-dimensional potential.
For $r \ll \ell$, one obtains

$$V(r) \approx \frac{GM\ell}{r^2}$$   [31]

which simply reflects the fact that the potential
becomes truly five dimensional on small scales. For
$r \gg \ell$,

$$V(r) \approx \frac{GM}{r}\left(1 + \frac{2\ell^2}{3r^2}\right)$$   [32]

which gives the small correction to four-dimensional
gravity at low energies from extra-dimensional effects.
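
To get a feel for the size of this correction, the following Python sketch evaluates both limiting forms of the potential. It assumes the standard RS2 results $V \approx GM\ell/r^2$ for $r \ll \ell$ and $V \approx (GM/r)(1 + 2\ell^2/3r^2)$ for $r \gg \ell$; the function names and the choice of units with $GM = 1$ are illustrative assumptions rather than anything fixed by the text.

```python
def newtonian(r, gm=1.0):
    """Magnitude of the four-dimensional potential, GM / r."""
    return gm / r


def rs2_far(r, ell, gm=1.0):
    """Large-distance form (r >> ell): GM/r * (1 + 2 ell^2 / (3 r^2))."""
    return gm / r * (1.0 + 2.0 * ell**2 / (3.0 * r**2))


def rs2_near(r, ell, gm=1.0):
    """Short-distance form (r << ell): GM * ell / r^2, a 5D-like fall-off."""
    return gm * ell / r**2


def fractional_correction(r, ell):
    """Relative deviation of the far-field potential from GM / r."""
    return rs2_far(r, ell) / newtonian(r) - 1.0


if __name__ == "__main__":
    ell = 1.0  # curvature radius in arbitrary units (assumed value)
    for r in (10.0, 100.0, 1000.0):
        print(f"r/ell = {r:7.1f}  correction = {fractional_correction(r, ell):.3e}")
```

The fractional correction falls off as $(\ell/r)^2$, so each factor of ten in distance suppresses it by a factor of one hundred.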


Cosmological Brane Worlds 


The RS models contain vacuum (Minkowski) 
branes. In order to pursue brane-world ideas in 
cosmology, we need to generalize the RS models to 
incorporate cosmological branes with matter and 
radiation on them. The effective field equations on 
the brane are the vehicle for brane-bound observers 
to interpret cosmological dynamics. They arise from 
projecting the five-dimensional field equations onto 
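the brane, via the Gauss-Codazzi equations. These
equations involve also the extrinsic curvature $K_{\mu\nu}$ of
the brane, which determines how the brane is
embedded in the bulk.

The stress-energy on the brane (tension, matter,
radiation) means that there is a jump in $K_{\mu\nu}$ across
the brane. More precisely, the junction conditions
across the brane are

$$K^+_{\mu\nu} - K^-_{\mu\nu} = -\kappa_5^2\left[T^{\rm brane}_{\mu\nu} - \frac{1}{3}\,T^{\rm brane} g_{\mu\nu}\right]$$   [34]

where

$$T^{\rm brane}_{\mu\nu} = T_{\mu\nu} - \lambda g_{\mu\nu}$$   [35]

is the total energy-momentum tensor on the brane
and $T^{\rm brane} = T^{{\rm brane}\,\mu}{}_{\mu}$. The $Z_2$-symmetry means that
when approaching the brane from one side and
going through it, one emerges into a bulk that looks
the same, but with the normal reversed. This implies
that

$$K^-_{\mu\nu} = -K^+_{\mu\nu}$$   [36]

so that we can use the junction condition (eqn [34])
to determine the extrinsic curvature:

$$K_{\mu\nu} = -\frac{1}{2}\kappa_5^2\left[T_{\mu\nu} + \frac{1}{3}(\lambda - T)\,g_{\mu\nu}\right]$$   [37]

where $T = T^\mu{}_\mu$, we have dropped the (+), and we
evaluate quantities on the brane by taking the limit
$y \to +0$.

Together with the Gauss-Codazzi equations, eqn [37]
leads to the induced field equations on the brane:

$$G_{\mu\nu} = -\Lambda g_{\mu\nu} + \kappa^2 T_{\mu\nu} + 6\,\frac{\kappa^2}{\lambda}\,S_{\mu\nu} - \mathcal{E}_{\mu\nu}$$   [38]

where

$$\kappa^2 \equiv \kappa_4^2 = \frac{1}{6}\,\lambda\,\kappa_5^4$$   [39]

$$\Lambda \equiv \Lambda_4 = \frac{1}{2}\left[\Lambda_5 + \kappa^2\lambda\right]$$   [40]

$$S_{\mu\nu} = \frac{1}{12}\,T\,T_{\mu\nu} - \frac{1}{4}\,T_{\mu\alpha}T^\alpha{}_\nu + \frac{1}{24}\,g_{\mu\nu}\left[3\,T_{\alpha\beta}T^{\alpha\beta} - T^2\right]$$   [41]

and

$$\mathcal{E}_{\mu\nu} = {}^{(5)}C_{ACBD}\,n^C n^D\,g_\mu{}^A\,g_\nu{}^B$$   [42]

where $n^A$ is the unit normal to the brane and
${}^{(5)}C_{ACBD}$ is the Weyl tensor in the bulk.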

The induced field equations [38] show two key
modifications to the standard four-dimensional Einstein
field equations arising from extra-dimensional effects.

• $S_{\mu\nu} \sim (T_{\mu\nu})^2$ is the high-energy correction term,
which is negligible for $\rho \ll \lambda$, but dominant for
$\rho \gg \lambda$ (where $\rho$ is the energy density):

$$\frac{|\kappa^2 S_{\mu\nu}/\lambda|}{|\kappa^2 T_{\mu\nu}|} \sim \frac{|T_{\mu\nu}|}{\lambda} \sim \frac{\rho}{\lambda}$$   [43]

• $\mathcal{E}_{\mu\nu}$, the projection of the bulk Weyl tensor on the
brane, encodes corrections from KK or five-
dimensional graviton effects. From the brane-
observer viewpoint, the energy-momentum
corrections in $S_{\mu\nu}$ are local, whereas the KK
corrections in $\mathcal{E}_{\mu\nu}$ are nonlocal, since they
incorporate five-dimensional gravity wave
modes. These nonlocal corrections cannot be
determined purely from data on the brane. In
the perturbative analysis of RS2 which leads to
the corrections in the gravitational potential, eqn
[32], the KK modes that generate this correction
are responsible for a nonzero $\mathcal{E}_{\mu\nu}$; this term is
what carries the modification to the weak-field
field equations.


The effective field equations are not a closed system.
One needs to supplement them by five-dimensional
equations governing $\mathcal{E}_{\mu\nu}$, which are obtained from the
five-dimensional Einstein equations.


Cosmological Dynamics 


A (1+4)-dimensional spacetime with spatial
4-isotropy (four-dimensional spherical/plane/
hyperbolic symmetry) has a natural splitting into
hypersurfaces of symmetry, which are (1+3)-
dimensional surfaces with 3-isotropy and
3-homogeneity, that is, Friedmann-Robertson-
Walker (FRW) surfaces. In particular, the AdS$_5$
bulk of the RS2 brane world, which admits a
foliation into Minkowski surfaces, also admits an
FRW foliation since it is 4-isotropic. The general-
ization of AdS$_5$ that preserves 4-isotropy and
solves the five-dimensional Einstein equation is
Schwarzschild-AdS$_5$, and this bulk therefore
admits an FRW foliation. It follows that an
FRW cosmological brane world can be embedded
in Schwarzschild-AdS$_5$ spacetime.

The black hole in the bulk is felt on the brane
via the $\mathcal{E}_{\mu\nu}$ term. The bulk black hole gives rise to
“dark radiation” on the brane via its Coulomb 
effect. The FRW brane can be thought of as 
moving radially along the fifth dimension, with the 
junction conditions determining the velocity via 
the Friedmann equation. Thus, one can interpret 
the expansion of the universe as motion of the 
brane through the static bulk. In the special case 
of no black hole and no brane motion, the brane is 
empty and has Minkowski geometry, that is, the 
original RS2 brane world is recovered, in different 
coordinates. 

An intriguing aspect of the cosmological metric is 
that five-dimensional gravitational wave signals can 
take “shortcuts” through the bulk in traveling 
between points A and B on the brane. The travel
time for such a graviton signal is less than the time 
taken for a photon signal (which is stuck to the 
brane) from A to B. 

Cosmological dynamics on the brane are governed
by the modified Friedmann equation:

$$H^2 = \frac{\kappa^2}{3}\,\rho\left(1 + \frac{\rho}{2\lambda}\right) + \frac{m}{a^4} + \frac{\Lambda}{3} - \frac{K}{a^2}$$   [44]

where $H = \dot{a}/a$ is the Hubble expansion rate, $a(t)$ is
the scale factor, $K$ is the curvature index, and $m$ is
the mass of the bulk black hole.

The $\rho^2/\lambda$ term is the high-energy term. When $\rho \gg
\lambda$, in the early universe, then $H^2 \propto \rho^2$. This means
that a given energy density produces a greater rate of
expansion than it would in standard four-dimen-
sional gravity. As a consequence, inflation in the
early universe is modified in interesting ways, some
of which may leave a signature in cosmological
observations.
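
The enhancement of the expansion rate can be made concrete with a short Python sketch. It assumes the form of eqn [44] given by $H^2 = (\kappa^2/3)\rho(1 + \rho/2\lambda) + m/a^4 + \Lambda/3 - K/a^2$; the function names, the default parameter values, and the choice of units are hypothetical and serve only to illustrate the scaling.

```python
def hubble_sq(rho, lam, kappa2=1.0, m=0.0, a=1.0, cosmo_const=0.0, K=0.0):
    """Right-hand side of the modified Friedmann equation [44]."""
    return ((kappa2 / 3.0) * rho * (1.0 + rho / (2.0 * lam))
            + m / a**4 + cosmo_const / 3.0 - K / a**2)


def hubble_sq_standard(rho, kappa2=1.0, a=1.0, cosmo_const=0.0, K=0.0):
    """Standard four-dimensional Friedmann equation for comparison."""
    return (kappa2 / 3.0) * rho + cosmo_const / 3.0 - K / a**2


def enhancement(rho, lam):
    """Ratio of brane-world H^2 to standard H^2 for a flat, empty-bulk model.

    With m = Lambda = K = 0 this reduces to 1 + rho / (2 * lam).
    """
    return hubble_sq(rho, lam) / hubble_sq_standard(rho)


if __name__ == "__main__":
    lam = 1.0  # brane tension in arbitrary units (assumed value)
    for rho in (1e-3, 1.0, 1e3):
        print(f"rho/lambda = {rho:8.1e}  H^2 ratio = {enhancement(rho, lam):.4g}")
```

For $\rho \ll \lambda$ the ratio is close to one, recovering standard cosmology, while for $\rho \gg \lambda$ it grows linearly with $\rho/\lambda$, which is the $H^2 \propto \rho^2$ behaviour described above.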

The $m/a^4$ term in eqn [44] is the “dark
radiation,” so called because it redshifts with 
expansion like ordinary radiation. But, unlike 
ordinary radiation, it is not a form of detectable 
matter, but the imprint on the brane of the 
gravitational field in the bulk (the Coulomb effect 
of the bulk black hole). This additional effective 
relativistic degree of freedom is constrained by 
nucleosynthesis in the early universe. Any extra 
radiative energy not thermally coupled to radiation 
affects the rate of production of light elements, and 
observed abundances place tight constraints on 
such extra energy. The dark radiation can be no
more than ~3% of the radiation energy density at
nucleosynthesis:

$$\frac{3m}{\kappa^2\,a_{\rm nuc}^4\,\rho_{\rm nuc}} < 0.03$$   [45]


The other modification to the Hubble rate is via
the high-energy correction $\rho/\lambda$. In order to recover
the observational successes of general relativity, the
high-energy regime where significant deviations
occur must take place before nucleosynthesis, that
is, cosmological observations impose the lower
limit

$$\lambda > (1\ \mathrm{MeV})^4 \;\Rightarrow\; M_5 > 10^4\ \mathrm{GeV}$$   [46]

This is much weaker than the limit imposed by
table-top experiments, which limit the curvature
radius to $\ell < 0.2$ mm, leading to

$$\lambda > (100\ \mathrm{GeV})^4 \;\Rightarrow\; M_5 > 10^8\ \mathrm{GeV}$$   [47]
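
The orders of magnitude in eqns [46] and [47] can be checked with a few lines of Python. This is a rough sketch, assuming the RS relations $\lambda = 3M_p^2/(4\pi\ell^2)$ (eqn [16]) and $M_5^3 = M_p^2/\ell$ (eqn [18]); the numerical constants are approximate and the function names are hypothetical.

```python
import math

M_PLANCK_GEV = 1.22e19    # four-dimensional Planck mass in GeV (approximate)
HBAR_C_GEV_M = 1.973e-16  # hbar * c in GeV * metres (approximate)


def ell_inverse_gev(ell_metres):
    """Convert a curvature radius in metres to an inverse length in GeV."""
    return HBAR_C_GEV_M / ell_metres


def tension_from_ell(ell_metres):
    """Brane tension lambda = 3 M_p^2 / (4 pi ell^2), in GeV^4 (eqn [16])."""
    return 3.0 * M_PLANCK_GEV**2 * ell_inverse_gev(ell_metres)**2 / (4.0 * math.pi)


def m5_from_ell(ell_metres):
    """Fundamental scale M_5 = (M_p^2 / ell)^(1/3), in GeV (eqn [18])."""
    return (M_PLANCK_GEV**2 * ell_inverse_gev(ell_metres)) ** (1.0 / 3.0)


def m5_from_tension(lam_gev4):
    """Eliminate ell between eqns [16] and [18]: M_5^6 = 4 pi lambda M_p^2 / 3."""
    return (4.0 * math.pi * lam_gev4 * M_PLANCK_GEV**2 / 3.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    # Nucleosynthesis bound: lambda > (1 MeV)^4, with 1 MeV = 1e-3 GeV.
    print(f"M_5 from lambda = (1 MeV)^4 : {m5_from_tension((1e-3) ** 4):.2e} GeV")
    # Table-top bound: ell < 0.2 mm.
    print(f"M_5 from ell = 0.2 mm       : {m5_from_ell(2e-4):.2e} GeV")
    print(f"lambda^(1/4) at ell = 0.2 mm: {tension_from_ell(2e-4) ** 0.25:.2e} GeV")
```

Under these assumptions the nucleosynthesis bound corresponds to $M_5$ of a few times $10^4$ GeV and the table-top bound to a few times $10^8$ GeV, in line with the quoted limits.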


The high-energy regime during radiation domination
is short-lived. Since $\rho^2/\lambda$ decays as $a^{-8}$ during the
radiation era, it will rapidly drop below one, and the
universe will enter the low-energy four-dimensional 
regime. However, traces of the high-energy era may be 
left in the perturbation spectra that leave an imprint in 
the cosmic microwave background radiation. 

In conclusion, simple brane-world models of RS2 
type provide a rich phenomenology for exploring 
some of the ideas that are emerging from M theory. 
The higher-dimensional degrees of freedom for the 
gravitational field, and the confinement of standard 
model fields to the visible brane, lead to a complex 
but fascinating interplay between gravity, particle 
physics, and geometry, which enlarges and enriches 
general relativity in the direction of a quantum 
gravity theory. High-precision astronomical data 
mean that cosmology is a potential laboratory for 
testing and constraining these brane worlds. The 
models predict extra-dimensional signatures in the 
cosmic microwave background and other observa- 
tions, and these predictions can in principle be tested 
against data. 


See also: String Theory: Phenomenology; Supergravity; 
Superstring Theories. 
Introduction


In classical general relativity, a black hole is a 
solution of Einstein’s equations with a region of 
spacetime which is causally disconnected from the 
asymptotic region at infinity. The boundary of such 
a region is called the “event horizon.” The spacetime 
around the simplest black hole in three space
dimensions is described by the Schwarzschild metric

$$ds^2 = -\left(1 - \frac{2GM}{c^2 r}\right)c^2\,dt^2 + \frac{dr^2}{1 - \frac{2GM}{c^2 r}} + r^2\,d\Omega_2^2$$   [1]

where $G$ is Newton’s gravitational constant, $c$ is the
velocity of light, and we have used spherical
coordinates with $d\Omega_2$ the line element on an $S^2$. A
nonrotating, uncharged star which is too massive to
form a neutron star will eventually collapse, and at
late times the metric will be given by [1]. The
horizon is a null surface $S^2 \times t$ and the radius of the
$S^2$ is $r_{\rm horizon} = 2GM/c^2$. The Schwarzschild solution
has generalizations to black holes with charge and 
angular momentum and no-hair theorems guarantee 
that a black hole has no other characteristic property. 
All these solutions can be generalized to other 
theories like supergravity in various dimensions. 

In 1974, Hawking showed that due to pair 
production of particles near the horizon, black 
holes radiate thermally. Hawking’s calculation is 
valid for black holes whose masses are much larger 
than the Planck mass: for such black holes, the 
curvature at the horizon is weak and normal 
semiclassical quantization is valid. Remarkably, the 
properties of Hawking radiation are quite universal. 
A black hole can be characterized by an entropy
called the Bekenstein-Hawking entropy. The leading
result for the entropy $S_{BH}$ for all black holes in any
theory with the standard Einstein-Hilbert action is
given by

$$S_{BH} = \frac{A_H}{4G}$$   [2]

where $A_H$ denotes the area of the horizon. The
temperature $T_H$ is given by

$$T_H = \frac{\kappa}{2\pi}$$   [3]

where $\kappa$ is the surface gravity at the horizon. The
principle of detailed balance further ensures that the
radiation rate of some species of particle $i$, $\Gamma_i(k)$,
in some given momentum range $(k, k + dk)$ is related
to the corresponding absorption cross section $\sigma_i(k)$ by

$$\Gamma_i(k) = \frac{\sigma_i(k)}{e^{\omega/T_H} \pm 1}\,\frac{d^d k}{(2\pi)^d}$$   [4]

where $\omega$ is the energy and $d$ denotes the number of
spatial dimensions. The $\pm$ sign refers to fermions
(bosons), respectively. A nontrivial $k$ dependence of
$\sigma_i$ signifies a departure from black-body behavior.
Consequently, $\sigma_i(k)$ is often called a grey-body
factor. Equations [2] and [3] may be derived by
combining Hawking’s calculation of the radiation 
with standard thermodynamic relations. Alterna- 
tively, they follow from the leading semiclassical 
approximations of path-integral formulations of 
Euclidean gravity based on the standard Einstein- 
Hilbert action. For an account of black-hole 
thermodynamics, see Wald (1994). 

Unlike usual thermodynamic systems, black holes 
appear to pose a deep puzzle. In usual systems, 
thermodynamics is a coarse-grained description of a 
system which is in a highly degenerate state. 
Typically, such systems are described in terms of a 
few macroscopic parameters such as the total 
energy, the total volume, the total charge. For each 
set of values of these macroscopic parameters, there 
are a large number of microscopic states which can 
be described in terms of the constituents such as 
atoms or molecules. This degeneracy manifests itself
as an entropy $S$ which is related to the number of
microscopic states for a given set of values of the
macroscopic parameters, $\Omega$, by Boltzmann’s relation

$$S = \log(\Omega)$$   [5]


where units have been chosen such that the 
Boltzmann constant is unity. For a black hole, the 
macrostates are specified by its mass, charge, and 
angular momentum. No-hair theorems, however, 
seem to suggest that there are no other properties 
and hence no obvious candidate for microstates. In 
the absence of such a statistical basis, one would be 
inevitably led to the conclusion that there is loss of 
information in processes involving black holes. 

In a consistent quantum theory of gravity, there 
would be such a statistical basis since quantum 
mechanics is unitary. String theory is a strong 
candidate for a unified theory which contains 
gravity. Indeed, string theory provides a microscopic 
description for a class of black holes. 
Black Hole Solutions in String Theory


Perturbatively, the basic excitations of string theory
are fundamental closed and open strings character-
ized by a string tension $T_s$, and hence a length scale,
the string length $l_s = 1/\sqrt{2\pi T_s}$. Consistency requires
that the string should be able to propagate in ten 
spacetime dimensions and should be supersym- 
metric at the fundamental level. Formulated in 
this fashion, there are several consistent string 
theories: type IIA, type IIB, and heterotic string 
theory (which contain only closed strings perturba- 
tively) and type I theory (which contains both open 
and closed strings). 

At energies much smaller than $1/l_s$, only the
massless modes of the string can be excited. For all 
these string theories, the massless spectrum of closed 
strings contains the graviton and the low-energy 
dynamics is given by the appropriate supersymmetric 
generalization of general relativity, supergravity. In 
addition, the closed-string spectrum contains a 
neutral scalar field, the dilaton $\phi$, whose expectation
value gives rise to a dimensionless parameter govern-
ing interactions, called the string coupling $g_s$:

$$g_s = \exp\langle\phi\rangle$$   [6]

The ten-dimensional gravitational constant is given
by

$$G_{10} = 8\pi^6\,g_s^2\,l_s^8$$   [7]


Ten-dimensional supergravity has a wide variety of 
black hole solutions, the simplest of which is the 
straightforward generalization of the Schwarzschild 
solution. 


Black p-Brane Solutions 


More significantly, there are solutions which are 
charged with respect to the various gauge fields that 
appear in the supergravity spectrum. Generically, 
these charged solutions represent extended objects. 
For accounts of such solutions, see Maldacena 
(1996). 

Consider, for example, the supergravity which 
follows from type IIB string theory. This theory has 
a pair of 2-form gauge fields $B_{MN}$ and $B'_{MN}$, and a
4-form gauge field $A_{MNPQ}$ with a self-dual field
strength. Just as an ordinary point electric charge
produces a 1-form gauge field, a $(p+1)$-form gauge
field may be sourced by an electrically charged
$p$-dimensional extended object. The corresponding
field strength is a $(p+2)$-form, whose Hodge dual in
$d$ spacetime dimensions is a $(d-p-2)$-form. This
shows that there should be magnetically charged
$(d-p-4)$-dimensional extended objects as well.
These extended objects are called “branes.”

In the type IIB example, there should be two
kinds of one-dimensional extended objects
which carry electric charge under $B_{MN}$ and $B'_{MN}$,
called the F-string and the D-string, respectively.
There are also two kinds of five-dimensional
branes which carry magnetic charges under
$B_{MN}$ and $B'_{MN}$, called the NS 5-brane and D5 brane,
respectively. Finally, there should be a 3-brane,
whose 5-form field strength is self-dual,
as well as a D7 brane. A similar catalog
can be prepared for other string theories, as well 
as for 11-dimensional supergravity, which is the 
low-energy limit of M-theory. 

The classical solutions for a set of $p$-branes of the
same kind generally have inner and outer horizons
which have the topology $t \times S^{8-p} \times R^p$. The outer
horizon is then associated with a Hawking tempera-
ture and a Bekenstein-Hawking entropy. Of parti-
cular interest are extremal limits. In this limit, the
inner and outer horizons coincide and the mass
density is simply proportional to the charge. Given
some charge, the extremal solution has the lowest
energy. Extremal limits are interesting because in
supergravity these correspond to solutions in which
some of the supersymmetries (in this case, half of the
supersymmetries) are retained; such solutions are
called Bogomolny-Prasad-Sommerfield (BPS) satu-
rated solutions. The charge in question appears as a
central charge in the extended supersymmetry
algebra. This fact may be used to show that such
BPS solutions are absolutely stable. Indeed, for the
particular solution considered here, the Hawking
temperature $T_H \to 0$, so that there is no Hawking
radiation, as required by stability. Furthermore, the
entropy $S_{BH} \to 0$. The horizon shrinks to a point
which appears as a naked null singularity.

All the ten dimensions of string theory need not be 
noncompact. In fact, to describe the real world, one 
must have a solution of string theory in which six of 
the dimensions are wrapped up and form a compact 
space. In principle, however, one can compactify 
any number of dimensions. In the above example 
of a $p$-brane, it is trivial to compactify the
directions along which the brane is extended to a
$p$-dimensional torus, $T^p$, which can be chosen to be
a product of $p$ circles each of radius $R$. At length
scales much larger than $R$, the theory then becomes
a $(10-p)$-dimensional theory. The $p$-brane appears
as a black hole with a spherical horizon and,
since the original $(p+1)$-form gauge field now behaves
as an ordinary 1-form gauge field with a nonzero
time component, this is an electrically charged
black hole.


D1-D5-N System and Five-Dimensional Black 
Holes 


For reasons which will become clear in the next 
section, it is useful to get extremal black holes with 
large horizon areas, so that Hawking’s semiclassical 
formulas are valid. It turns out that such solutions 
involve branes of various types which intersect each 
other and are suitably wrapped on compact internal 
spaces. Such black holes then have necessarily 
different kinds of charges. It turns out that the 
simplest case is a five-dimensional black hole with 
three kinds of charges, which is obtained by brane 
systems wrapped on a compact five-dimensional 
space. An example is a type IIB solution which has
D5 branes which are wrapped on either $T^4 \times S^1$ or
$K3 \times S^1$, together with D1 branes wrapped on the $S^1$
as well as some momentum along the $S^1$. From the
noncompact five-dimensional point of view, this is a
black hole with three kinds of gauge charges: the D5
charge $Q_5$, the D1 charge $Q_1$, and a Kaluza-Klein
charge $N$ coming from the momentum $P = N/R$
along the circle of radius $R$.

When the internal space is $T^4 \times S^1$, the five-
dimensional Einstein frame metric is given by

$$ds^2 = -\left[f_1(r) f_5(r) f_n(r)\right]^{-2/3}\left(1 - \frac{r_0^2}{r^2}\right)dt^2 + \left[f_1(r) f_5(r) f_n(r)\right]^{1/3}\left[\frac{dr^2}{1 - r_0^2/r^2} + r^2\,d\Omega_3^2\right]$$   [8]

where

$$f_1(r) = 1 + \frac{r_0^2\sinh^2\alpha_1}{r^2}, \qquad f_5(r) = 1 + \frac{r_0^2\sinh^2\alpha_5}{r^2}, \qquad f_n(r) = 1 + \frac{r_0^2\sinh^2\sigma}{r^2}$$   [9]

and the three charges are

$$Q_1 = \frac{V r_0^2\sinh 2\alpha_1}{32\pi^4 g_s l_s^6}, \qquad Q_5 = \frac{r_0^2\sinh 2\alpha_5}{2 g_s l_s^2}, \qquad N = \frac{R^2 V r_0^2\sinh 2\sigma}{32\pi^4 g_s^2 l_s^8}$$   [10]

where $V$ is the volume of the $T^4$ and $R$ is the radius
of the circle $S^1$.
The ADM mass of the black hole is

$$M_{ADM} = \frac{R V r_0^2}{32\pi^4 g_s^2 l_s^8}\left[\cosh 2\alpha_1 + \cosh 2\alpha_5 + \cosh 2\sigma\right]$$   [11]

The Bekenstein-Hawking entropy is given by

$$S_{BH} = \frac{R V r_0^3}{8\pi^3 g_s^2 l_s^8}\,\cosh\alpha_1\cosh\alpha_5\cosh\sigma$$   [12]

while the Hawking temperature is

$$T_H = \frac{1}{2\pi r_0\cosh\alpha_1\cosh\alpha_5\cosh\sigma}$$   [13]

The extremal limit of this solution is given by

$$r_0 \to 0, \qquad \alpha_1, \alpha_5, \sigma \to \infty, \qquad Q_1, Q_5, N = \text{fixed}$$   [14]

The extremal solution is a BPS saturated state and
retains four of the original supersymmetries. In this
limit, the inner and outer horizons coincide. How-
ever, the horizon is now a smooth $S^3$ with a finite
area in the Einstein frame metric. Consequently, the
extremal Bekenstein-Hawking entropy is also finite
and may be seen to be

$$S_{BH}^{\rm extremal} = 2\pi\sqrt{Q_1 Q_5 N}$$   [15]

The temperature, however, is zero in this limit,
which is consistent with the stability of a BPS
saturated state.
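
The approach to the extremal entropy can be checked numerically. The Python sketch below assumes the charge and entropy formulas $Q_1 = V r_0^2 \sinh 2\alpha_1/(32\pi^4 g_s l_s^6)$, $Q_5 = r_0^2 \sinh 2\alpha_5/(2 g_s l_s^2)$, $N = R^2 V r_0^2 \sinh 2\sigma/(32\pi^4 g_s^2 l_s^8)$ and $S_{BH} = R V r_0^3 \cosh\alpha_1 \cosh\alpha_5 \cosh\sigma/(8\pi^3 g_s^2 l_s^8)$; the function names and the default values $V = R = g_s = l_s = 1$ are hypothetical choices for illustration.

```python
import math


def charges(r0, a1, a5, sigma, V=1.0, R=1.0, g=1.0, ls=1.0):
    """Return (Q1, Q5, N) for non-extremality r0 and boosts a1, a5, sigma."""
    q1 = V * r0**2 * math.sinh(2 * a1) / (32 * math.pi**4 * g * ls**6)
    q5 = r0**2 * math.sinh(2 * a5) / (2 * g * ls**2)
    n = R**2 * V * r0**2 * math.sinh(2 * sigma) / (32 * math.pi**4 * g**2 * ls**8)
    return q1, q5, n


def entropy(r0, a1, a5, sigma, V=1.0, R=1.0, g=1.0, ls=1.0):
    """Bekenstein-Hawking entropy of the non-extremal solution."""
    prefactor = R * V * r0**3 / (8 * math.pi**3 * g**2 * ls**8)
    return prefactor * math.cosh(a1) * math.cosh(a5) * math.cosh(sigma)


def extremal_entropy(q1, q5, n):
    """Extremal formula S = 2 pi sqrt(Q1 Q5 N)."""
    return 2 * math.pi * math.sqrt(q1 * q5 * n)


def entropy_ratio(r0, a1, a5, sigma):
    """S_BH / (2 pi sqrt(Q1 Q5 N)); equals sqrt(coth a1 * coth a5 * coth sigma)."""
    return entropy(r0, a1, a5, sigma) / extremal_entropy(*charges(r0, a1, a5, sigma))


if __name__ == "__main__":
    for boost in (0.5, 2.0, 10.0):
        print(f"boost = {boost:4.1f}  S / S_extremal-formula = "
              f"{entropy_ratio(1e-3, boost, boost, boost):.9f}")
```

With these formulas the ratio is independent of $r_0$, $V$, $R$, $g_s$ and $l_s$, always exceeds one, and tends to one as the boosts grow, which is the numerical counterpart of the limit in eqn [14].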

The above five-dimensional black hole is in fact a
generalization of the Reissner-Nordström black
hole. Similar solutions with large horizon areas in
the extremal limit can be constructed in four
dimensions. One such construction is in the IIB
theory wrapped on $T^6$ in which there are four sets of
D3 branes which wrap four different $T^3$’s contained
in the $T^6$. Black holes with lower supersymmetry
may be obtained by replacing the $T^6$ by a Calabi-
Yau space.


Duality and Branes 


String theory has a rich set of symmetries called 
duality symmetries which relate different kinds of 
string theories that are suitably compactified. 
These symmetries relate different classical solutions.
For example, application of these symmetries relates
the five-dimensional black holes above to other
five-dimensional black holes with different kinds of
charges. Furthermore, at the level of supergravity,
these various theories may be derived from
an as yet unknown 11-dimensional theory called
M-theory whose low-energy limit is 11-dimensional
supergravity. 


Branes in String Theory 


For a given string theory, the perturbative spectrum 
consists of strings. However, at the nonperturbative 
level, there are, in addition, extended objects of
other dimensionalities. Duality symmetries imply 
that these extended objects are as “fundamental” 
as the strings themselves. Such extended objects are 
also called branes. For an exhaustive account of 
branes in string theory, see Johnson (2003). 

Like their counterparts in supergravity, branes in 
string theory are typically charged with respect to 
some gauge fields. While supergravity solutions are 
possible with any value of the charge, in string 
theory the brane charges have to be quantized. 
Multiple units of the minimum quantum of charge 
can appear as collections of branes each with unit 
charge or, alternatively, branes which wrap around 
compact cycles in space a multiple number of times. 


D-Branes 


The extended objects in string theory are described 
in terms of their collective excitations. These 
are best understood for the class of branes called 
D-branes in the type II theory, discovered by 
Polchinski. These are D1, D3, D5, and D7 branes
in type IIB and D0, D2, D4, and D6 branes in
type IIA theory. Dp branes are characterized by the
fact that they couple to, and act as sources for, 
(p+ 1)-form gauge fields which belong to the 
Ramond-Ramond sector of the theory. Collective 
excitations of a p-dimensional extended object in 
field theory are expected to be described by waves 
on its (p+ 1)-dimensional world volume. The 
collective coordinate action would be a quantum 
field theory which has vectors, corresponding 
to longitudinal oscillations of the brane, and 
scalars which correspond to transverse oscillations. 
For D-branes in string theory, the theory of 
collective excitations is a string field theory of open 
strings whose endpoints lie on the brane. (This is the 
origin of the nomenclature D-brane: an open string 
whose ends are constrained to lie on the brane has a 
world-sheet description in which the bosonic 
fields corresponding to transverse target space 
coordinates have Dirichlet boundary conditions.) 
The lowest-energy states of open superstrings are 
ordinary massless gauge fields and their supersym- 
metric partners so that the low-energy limit of 
the string field theory is a supersymmetric gauge 
theory. 

The fact that the underlying theory is a string 
theory has an important consequence. For a system 
of N parallel D-branes of the same type, one 
would have open strings which join different branes 
as well as the same brane. The low-energy 
theory then becomes a supersymmetric nonabelian 
gauge theory with gauge group U(N). In a suitable 


gauge, the off-diagonal gauge fields and their super- 
symmetric partners (which include scalar fields in 
the adjoint representation) are the low-energy 
degrees of freedom of open strings which connect 
different branes. 
The mass density or tension $T_p$ of a single Dp
brane is given by

$$T_p = \frac{1}{(2\pi)^p\,g_s\,l_s^{p+1}}$$   [16]

This couples to the $(p+1)$-form gauge field with a
charge

$$\mu_p = g_s T_p$$   [17]

and the Yang-Mills coupling constant for the collec-
tive theory on the brane world volume is given by

$$g^2_{YM\text{-}Dp} = (2\pi)^{p-2}\,g_s\,l_s^{p-3}$$   [18]

The ground state of a single Dp brane is a BPS state
which preserves 16 of the 32 supersymmetries of the
original theory. One consequence of this is that two or
more parallel Dp branes of the same type form a
threshold bound state preserving the same supersym-
metries, with no net force between them. As a result, the
tension of $N$ parallel Dp branes is simply $N T_p$.

Branes of different dimensionalities can also form 
bound states. Of particular interest are configura- 
tions which can form threshold bound states which 
preserve some supersymmetries. For example, a set 
of $N_1$ parallel Dp branes can form a threshold
bound state with a set of $N_2$ parallel D(p + 4)
branes with all the p branes lying entirely along the 
(4+ p)-branes. This configuration is also a BPS 
saturated state preserving eight of the original 
supersymmetries and would have charges under 
both (p + 1)-form and (p+ 5)-form gauge poten- 
tials. The BPS nature ensures that the total mass 
density is the sum of the individual mass densities. 


NS Branes 


The other extended objects in string theory are 
called NS branes since they couple to p-form 
gauge fields which arise from the Neveu-Schwarz/ 
Neveu-Schwarz sector of the world-sheet theory. 
These are present in all the five string theories and 
appear in two types. The first is a macroscopic 
fundamental string which may be wound around a 
compact direction. The second is called a solitonic 
5-brane. While the collective dynamics of a funda- 
mental string is the standard world-sheet description 
of string theory, the description for the NS 5-brane 
is rather complicated and not known in full 
detail. The rest of this article deals exclusively with 
D-branes. 


D-Branes and Black Branes 


The idea that black holes correspond to highly 
degenerate states in string theory is quite old and 
dates back to ’t Hooft (1990) and Susskind (1993). 
In the following two sections we discuss such black 
holes which are described by D-branes. For reviews 
see Maldacena (1996), Das and Mathur (2001), and 
David et al. (2002). 

We have so far discussed the string-theoretic 
branes in two different ways. In the first description, 
branes are solutions of the low-energy equations of 
motion — this is the setting in which branes provide 
conventional descriptions of black holes. In the 
second description, branes are certain states in the 
quantum theory of superstrings. More specifically, 
D-branes are described in terms of states of the 
open-string field theory which lives on the branes. 
The first description is necessarily approximate. On 
the other hand, the second description is exact in 
principle, although in practice one might not know 
how to write down and analyze the string-field 
theory in an exact fashion. 

The description in terms of open-string field 
theory should reduce to the description in terms of 
a classical solution when the charges and masses 
become large. If black-hole thermodynamics has a 
microscopic origin, D-branes should be highly 
degenerate states in this limit and the entropy 
should be given by the Boltzmann formula. Further- 
more, Hawking radiation should be understood as 
an ordinary decay process. 

For a system of $Q_p$ parallel Dp branes, the mass
is $M \sim Q_p/g_s$, while Newton's gravitational constant
is $G \sim g_s^2$. Gravitational effects are controlled by
$GM \sim g_s Q_p$. A semiclassical limit in closed-string
theory requires $g_s \to 0$, while a nontrivial gravitational
effect in this limit requires $g_s Q_p$ finite, which
implies that one must have $Q_p \gg 1$. Furthermore, when
$g_s Q_p \gg 1$ the typical curvatures are small compared
to the string scale and the semiclassical string theory
reduces to classical supergravity. This is the limit in
which branes are well described as classical
solutions.

Similar considerations apply for brane systems with
multiple charges. For example, in the D1-D5-N
system the classical solution becomes a good
description when all the quantities $g_s Q_1$, $g_s Q_5$, and
$g_s^2 N$ become large. (The relevant quantity which
comes with the momentum has $g_s^2$ rather than $g_s$
because the mass contribution from the momentum is
simply $N/R$ without any inverse power of $g_s$.)
However, $g_s$ is the square of the coupling constant
of the open-string theory living on the brane; in fact,
eqn [18] shows this relation in the low-energy limit.
It is well known that in a $U(Q_p)$ gauge theory the real
coupling constant is $g_{\rm YM}\sqrt{Q_p} \sim \sqrt{g_s Q_p}$. This means
that the semiclassical limit corresponds to a strongly
coupled string-field theory which reduces to strongly
coupled gauge theory in the low-energy limit, and the
picture of D-branes as a collection of open strings is
not very useful. In fact, known calculational methods
in gauge theory or open-string theory are not valid in
this regime.


Microscopic Entropy for Two-Charge Systems 


The prospects are much better for extremal black 
holes, which appear as BPS states in string theory. 
This is because the spectra of BPS states do not 
depend on the coupling. The degeneracy of such 
states may therefore be calculated at weak coupling, 
where techniques are well known and the result can 
be extrapolated to strong coupling without change. 

The simplest BPS state is the ground state of a set of 
parallel D-branes of the same type. This state is indeed 
128-fold degenerate, which would imply a micro- 
scopic entropy. This entropy, however, is small and 
therefore invisible in the corresponding classical 
solution. Indeed, the classical solution shows that in 
the extremal limit the horizon area is zero, leading to a
vanishing Bekenstein—Hawking entropy. 

The next interesting class of states consists of 
threshold bound states with two kinds of 
charges. Consider, for example, the D1-D5 system 
on $T^4 \times S^1$ considered above with no momentum
along the D1’s. By known duality transformations, 
this is equivalent to a fundamental IIB string which 
is wound $Q_5$ times around the $S^1$ and with a net
momentum $P = Q_1/R$ (where $R$ is the radius of
the $S^1$), with four of the transverse directions
compactified on a $T^4$. For this system, it is easy to
count the number of states for given values of $Q_1$
and $Q_5$ at weak string coupling by simply enumerating
the perturbative oscillator states of the string.
For large values of $Q_1$ and $Q_5$, we can alternatively
calculate this entropy by using a canonical ensemble
of eight massless bosons corresponding to the eight
transverse polarizations and their supersymmetric
partners, eight massless fermions, moving on the
string with some temperature $T$ and a chemical
potential $\alpha$ for the total momentum.

Consider a noninteracting gas of $f$ massless bosons
and $f$ fermions living on a circle with circumference
$L$. The average numbers of left- and right-moving
particles with some energy $e$, denoted by $\rho_L$, $\rho_R$,
respectively, are

$$\rho_i(e) = \frac{1}{e^{e/T_i} \pm 1}, \qquad i = L, R \qquad [19]$$

where the $\pm$ signs refer to fermions and bosons,
respectively, and we have introduced left- and
right-moving temperatures $T_L$, $T_R$. The physical
temperature is

$$\frac{1}{T} = \frac{1}{2}\left(\frac{1}{T_L} + \frac{1}{T_R}\right) \qquad [20]$$

The extensive quantities, such as the energy $E$,
momentum $P$, and entropy $S$, then become the sum
of left- and right-moving pieces:

$$E = E_L + E_R, \qquad P = P_L + P_R, \qquad S = S_L + S_R \qquad [21]$$

and the distribution function [19] leads to the
following thermodynamic relations:

$$T_i = \sqrt{\frac{8E_i}{L\pi f}} = \frac{4S_i}{\pi f L}, \qquad i = L, R \qquad [22]$$

Since the total momentum $P = P_R + P_L = E_R - E_L$ is
nonzero, the lowest-energy state is clearly the one in
which all the particles move in the same direction,
for example, right moving. This is a BPS state and
corresponds to the extremal solution in supergravity.
Then $E = E_R = P = P_R$. This approach to the black
hole entropy was initiated by Das and Mathur
(1996) and Callan and Maldacena (1996).

For our two-charge system, $f = 8$, $P = Q_1/R$,
and $L = 2\pi R Q_5$. Using [22] we get

$$S^{\text{2-charge}}_{\text{micro}} = 2\pi\sqrt{2Q_1Q_5} \qquad [23]$$


This is the microscopic entropy for the fundamental 
string with momentum in the type II theory. By 
duality, this is also the microscopic entropy of the 
D1-D5 system. This is a large number which should 
agree with the macroscopic entropy calculated from 
the corresponding classical solution. 
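
The step from the thermodynamic relations [22] to the entropy [23] can be checked numerically. The following is a minimal sketch, not part of the original treatment: the function names are illustrative, and it assumes the identification used for the wound string, namely a string of total length $2\pi R Q_5$ carrying right-moving energy $Q_1/R$.

```python
import math

def chiral_entropy(f, L, E):
    """Entropy of one chirality of a gas of f massless bosons and f fermions
    on a circle of circumference L with energy E, obtained from
    T_i = sqrt(8 E_i / (L pi f)) = 4 S_i / (pi f L)."""
    T = math.sqrt(8.0 * E / (L * math.pi * f))   # first equality in [22]
    return math.pi * f * L * T / 4.0             # second equality, solved for S_i

def two_charge_entropy(Q1, Q5, R=1.0):
    # Assumption: string wound Q5 times on a circle of radius R,
    # carrying Q1 units of momentum, all of it right moving (BPS state).
    L = 2.0 * math.pi * R * Q5
    E_R = Q1 / R
    return chiral_entropy(8, L, E_R)

Q1, Q5 = 30, 70
print(two_charge_entropy(Q1, Q5))                # from the gas relations
print(2.0 * math.pi * math.sqrt(2 * Q1 * Q5))    # closed form [23]
```

The two printed numbers should coincide, and the result does not depend on the radius `R`, as expected for a quantity fixed by the charges alone.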

The discussion is almost identical for the fundamental
heterotic string, except that now we have
24 right-moving bosons, eight left-moving bosons,
and eight left-moving fermions, and the BPS state
consists only of right movers. If $n_w$ denotes the
winding number and $n_p$ the quantized momentum,
the extremal heterotic string entropy is

$$S^{\text{heterotic}}_{\text{micro}} = 4\pi\sqrt{n_p n_w} \qquad [24]$$

The supergravity solution for the D1-D5
system may be obtained by substituting $\sigma = 0$ in
eqns [8]-[13]. In the extremal limit, the classical
Bekenstein-Hawking entropy vanishes, as is clear
from the expression [15], in which $N = 0$. This
appears to be in contradiction with the fact that the
state has a large microscopic entropy.


The key point, however, is that the two-charge
solution has a singular horizon where the string
frame curvature is large. Consequently, low-energy
tree-level supergravity breaks down near the horizon
and higher-derivative terms (e.g., higher powers of
curvature) become important. This issue has been
best studied for the fundamental heterotic string
compactified on $T^6$. This is dual to the D1-D5
system in type IIB theory compactified on $K3 \times T^2$.
The classical supergravity solution is then a singular
black hole in four spacetime dimensions. In one of
the first papers on the string-theoretic understanding
of black hole thermodynamics, Sen (1995) showed
that, for large $n_p$, $n_w$, string-loop effects are small
near the horizon so that the only relevant corrections
are higher-derivative terms coming from
integrating out the massive modes of the string at
tree level. Furthermore, a robust scaling argument
shows that regardless of the detailed nature of the
derivative corrections, the macroscopic entropy
defined through the horizon area must be of the
form $a\sqrt{n_p n_w}$, where $a$ is a pure number. Finally,
one can define a “stretched horizon” as the surface
where the curvature becomes of the order of the
string scale, and the area of the stretched horizon
is indeed proportional to $\sqrt{n_p n_w}$. This result gives
a strong indication that string theory provides a
microscopic basis for black hole thermodynamics,
although the coefficient $a$ cannot be determined
without more detailed knowledge of higher-derivative
terms.


Microscopic Entropy of Extremal Three-Charge 
System 


Brane bound states with three kinds of charge 
provide examples of black holes whose extremal 
limits have large horizons with curvatures much 
smaller than the string scale. In this case, a 
microscopic count of states in string theory should 
exactly account for the Bekenstein-Hawking 
formula, without corrections coming from 
higher derivatives. This is indeed true, as first found 
by Strominger and Vafa (1996). In the following, we 
will outline how this calculation can be done in the 
D1-D5-N system on $K3 \times S^1$ or $T^4 \times S^1$ following
the treatment of Dijkgraaf et al. (1996). 

D1 branes can be considered as “instanton
strings” in the six-dimensional supersymmetric
$U(Q_5)$ gauge theory of D5 branes (actually, these
should be called solitonic strings rather than
instantons, since the configurations are time
independent). The total instanton number is the
D1-brane charge $Q_1$. The moduli space of
these instantons is then a blown-up version of the
orbifold $(T^4)^{Q_1Q_5}/S(Q_1Q_5)$ or $(K3)^{Q_1Q_5}/S(Q_1Q_5)$
and is $4Q_1Q_5$ dimensional. Since any instanton
configuration is independent of time $x^0$ and the $S^1$
direction $x^5$, the collective coordinate dynamics is a
(1 + 1)-dimensional field theory which lives in the
$(x^0, x^5)$ space. At low energies, this flows to a
conformal field theory with a central charge
$c = 6Q_1Q_5$, since there are $4Q_1Q_5$ bosons each
contributing 1 to the central charge and an equal
number of fermions each contributing 1/2. The BPS
state with momentum $N/R$ is a purely right- or
left-moving state in this conformal field theory which
has a conformal weight $N$. From general principles
of conformal invariance, the degeneracy of such
states for large $N$ is given by Cardy's formula

$$d(N) \sim e^{2\pi\sqrt{cN/6}} \qquad [25]$$

so that the microscopic entropy is

$$S^{\text{3-charge}}_{\text{micro}} = \log d(N) = 2\pi\sqrt{cN/6} \qquad [26]$$

Substituting the value $c = 6Q_1Q_5$, this is in exact
agreement with the Bekenstein-Hawking entropy of
the classical solution given in [15].
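
Cardy's formula is an asymptotic statement, and it may help to see how the leading exponent emerges in a case where the exact degeneracies are known. The sketch below is an illustration chosen here, not the D1-D5 conformal field theory itself: it uses a single chiral free boson ($c = 1$), whose level-$N$ degeneracy is the number of integer partitions of $N$, and compares $\log d(N)$ with $2\pi\sqrt{cN/6}$. The function names are hypothetical.

```python
import math

def partition_numbers(n_max):
    """p[n] = number of integer partitions of n, i.e. the degeneracy of
    level n for a single chiral boson (c = 1). Standard dynamic programme."""
    p = [1] + [0] * n_max
    for part in range(1, n_max + 1):
        for n in range(part, n_max + 1):
            p[n] += p[n - part]
    return p

def cardy_entropy(c, N):
    """Leading large-N entropy 2 pi sqrt(c N / 6)."""
    return 2.0 * math.pi * math.sqrt(c * N / 6.0)

p = partition_numbers(1000)
for N in (10, 100, 1000):
    exact = math.log(p[N])
    print(N, exact, cardy_entropy(1, N), exact / cardy_entropy(1, N))

# For the three-charge system the same function with c = 6 Q1 Q5
# gives 2 pi sqrt(Q1 Q5 N).
print(cardy_entropy(6 * 3 * 5, 40), 2.0 * math.pi * math.sqrt(3 * 5 * 40))
```

The ratio in the last column approaches 1 only slowly, because the exact count also carries a power-law prefactor; the exponent is what the entropy formula captures at large $N$.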


Nonextremal Black Holes and Hawking 
Radiation 


The BPS property of ground states of D-brane 
systems enables us to compute the degeneracy of 
microstates exactly in the regime of parameters 
where the state can be reliably described as a black 
hole solution in the low-energy theory. However, 
extremal black holes have vanishing temperature 
and do not radiate. To understand the microscopic 
origins of Hawking radiation, one has to go away 
from extremality. Such states are not supersym- 
metric and an extrapolation of weak-coupling 
calculations to strong coupling is not a priori 
justified. Nevertheless, it turns out that for small 
departures from extremality, weak-coupling results 
still reproduce semiclassical answers for entropy, 
temperature, and luminosity. 


Near-Extremal Entropy 


Nonextremal properties are best understood for the
D1-D5-N system on $T^4 \times S^1$. In the orbifold limit,
the conformal field theory which describes the low-energy
dynamics is equivalent to a gas of strings
which are wound around the $S^1$ and which can
oscillate along the $T^4$. The total winding number is
$k = Q_1Q_5$ and may be achieved by sets of strings
which are multiply wound in various ways. As
argued below, entropically the most favored configuration
is a single long string wound $Q_1Q_5$
times. Thus, the thermodynamics may be analyzed
exactly along the lines of the fundamental string in
the previous section. The thermodynamic relations
are given by [22] with $f = 4$ and $L = 2\pi R Q_1Q_5$. The
extremal state consists entirely of right movers and
$E = E_R = N/R$. Substituting these values in [22]
yields the correct formula for the microscopic
entropy

$$S^{\text{3-charge}}_{\text{micro}} = 2\pi\sqrt{Q_1Q_5N} \qquad [27]$$

The same expression follows if $f = 4Q_1Q_5$ and
$L = 2\pi R$, corresponding to $Q_1Q_5$ singly wound
strings. However, for statistical methods to hold,
the entropy must be much larger than the number of
flavors. The ratio of the entropy to the number of
flavors is $S/f \sim \sqrt{N/Q_1Q_5}$ for multiple singly
wound strings and is not guaranteed to be large
when all of $Q_1$, $Q_5$, $N$ are large. On the other hand,
this ratio is $S/f \sim \sqrt{Q_1Q_5N}$ for the long string.
This shows that the long string is always entropically
favored.

A departure from the extremal state is achieved by
adding a left-moving momentum $2\pi n/L$ as well as a
right-moving momentum $2\pi n/L$ to the extremal
state, thus adding energy to the system but maintaining
the total momentum. For the long string, this
yields

$$S_R = 2\pi\sqrt{Q_1Q_5N + n}, \qquad S_L = 2\pi\sqrt{n} \qquad [28]$$

For small departures from extremality, $n \ll N$, the
expressions for the total entropy and temperature as
a function of the excess energy $\Delta E = 2n/(RQ_1Q_5)$
agree exactly with the near-extremal Bekenstein-Hawking
entropy and the Hawking temperature of
the classical solution, as shown by Callan and
Maldacena (1996) and by Horowitz and
Strominger.

The necessity of the long string appears in another
important physical consideration. For statistical
mechanics to be valid, the specific heat of the system
has to be larger than unity. This implies that for
the case considered here the energy gap $\Delta E$ must be
larger than $1/(RQ_1Q_5)$, which is precisely what the
long string yields.
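
The near-extremal entropies follow from the gas relations [22] once the long-string values are inserted. As a rough check under stated assumptions (a long string with $f = 4$ and $L = 2\pi R Q_1Q_5$, right-moving energy $N/R + 2\pi n/L$ and left-moving energy $2\pi n/L$; the function name is illustrative), one can evaluate the left and right entropies and temperatures directly:

```python
import math

def near_extremal(Q1, Q5, N, n, R=1.0, f=4):
    """Left/right entropies and temperatures of the long string,
    using S_i = sqrt(pi f L E_i / 2) and T_i = 4 S_i / (pi f L) from [22].
    Requires n > 0 so that the left-moving temperature is nonzero."""
    L = 2.0 * math.pi * R * Q1 * Q5           # long string wound Q1*Q5 times
    E_R = N / R + 2.0 * math.pi * n / L       # extremal energy plus excitation
    E_L = 2.0 * math.pi * n / L               # left movers carry only the excitation
    S_R = math.sqrt(math.pi * f * L * E_R / 2.0)
    S_L = math.sqrt(math.pi * f * L * E_L / 2.0)
    T_R = 4.0 * S_R / (math.pi * f * L)
    T_L = 4.0 * S_L / (math.pi * f * L)
    T = 2.0 / (1.0 / T_L + 1.0 / T_R)         # physical temperature, eqn [20]
    return S_R, S_L, T_R, T_L, T

S_R, S_L, T_R, T_L, T = near_extremal(Q1=10, Q5=12, N=500, n=3)
print(S_R, 2.0 * math.pi * math.sqrt(10 * 12 * 500 + 3))
print(S_L, 2.0 * math.pi * math.sqrt(3))
print(T_R, T_L, T)
```

For small `n` the left-moving temperature is much smaller than the right-moving one, so the physical temperature is close to twice the left-moving temperature.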


Hawking Radiation 


A nonextremal state described above is unstable, 
since a left mover can annihilate a right mover into a 
closed-string mode which may leave the brane 
system and propagate to the asymptotic region. 
The resulting closed-string state will be in a thermal 
state whose temperature is the physical temperature 
of the initial state. This process is the microscopic 
description of Hawking radiation. The decay rate is
related to the absorption cross section of the 
corresponding mode by the principle of detailed 
balance, encoded in eqn [4]. 

From the point of view of the classical solution, 
the absorption cross section can be calculated by 
solving the linearized wave equation in the 
background geometry and calculating the ratio of 
the incident and reflected waves. It follows from 
these calculations that at low energies, absorption 
(and hence emission) are dominated by massless 
minimally coupled scalars. In fact, for any spheri- 
cally symmetric black hole in any number of 
dimensions, there is a general theorem which 
ensures that the low-energy limit of this absorption 
cross section is exactly equal to the horizon area. 

In the microscopic model for the three-charge
black hole, this absorption cross section may be
calculated by the usual rules of quantum mechanics.
In the long-string limit and in the approximation
that the modes on the long string form a dilute gas,
the result has been derived by Das and Mathur
(1996):

$$\sigma(\omega) = \frac{2\pi G_{10}\,\omega\,Q_1Q_5}{V}\,\frac{e^{\omega/T} - 1}{(e^{\omega/2T_L} - 1)(e^{\omega/2T_R} - 1)} \qquad [29]$$

where $V$ is the volume of the $T^4$ and $T$ is the
physical temperature given by [20]. For a near-extremal
hole $T_R \gg T_L$, so that $T \sim 2T_L$. In the
extreme low-energy limit, $\omega \ll T_R$, so that
the corresponding Bose factor may be approximated
as $1/(e^{\omega/2T_R} - 1) \sim 2T_R/\omega$. The cross
section [29] becomes

$$\sigma = \frac{4\pi Q_1Q_5G_{10}T_R}{V} = \frac{4G_{10}S_R}{(2\pi R)V} = 4G_5 S_{\text{extremal}} = A_H \qquad [30]$$

where $G_5$ is the five-dimensional Newton's gravitational
constant. We have used the relation [22] with
$L = 2\pi R Q_1Q_5$ and $f = 4$. The fact that in the
near-extremal limit $S_R$ is simply the extremal entropy and
the fact that the extremal entropy reproduces the
Bekenstein-Hawking formula have been used as well.
Thus, the microscopic cross section exactly reproduces
the semiclassical result at low energies. Even more
remarkably, the full cross section [29] agrees with the
semiclassical answer for the gray-body factor for
parameters which correspond to the dilute-gas regime,
as shown by Maldacena and Strominger.
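
The low-energy limit of the cross section can be illustrated numerically. The sketch below assumes the thermal form of the cross section with left and right Bose factors and the physical temperature of [20]; the overall constant $2\pi G_{10}Q_1Q_5/V$ is lumped into a single hypothetical `prefactor` argument, since only the frequency dependence is being tested.

```python
import math

def sigma(omega, T_L, T_R, prefactor=1.0):
    """Cross section prefactor * omega * (e^{w/T} - 1)
    / ((e^{w/2T_L} - 1)(e^{w/2T_R} - 1)), with 1/T = (1/T_L + 1/T_R)/2.
    `prefactor` stands in for 2 pi G_10 Q1 Q5 / V (an assumption of this sketch)."""
    inv_T = 0.5 * (1.0 / T_L + 1.0 / T_R)
    num = math.expm1(omega * inv_T)            # expm1 keeps precision at small omega
    den = math.expm1(omega / (2.0 * T_L)) * math.expm1(omega / (2.0 * T_R))
    return prefactor * omega * num / den

def sigma_low_energy(T_R, prefactor=1.0):
    """Limit for T_R >> T_L and omega << T_R: the Bose factor 2 T_R / omega
    cancels the explicit omega, leaving 2 * prefactor * T_R."""
    return 2.0 * prefactor * T_R

T_L, T_R = 1e-3, 1.0
for omega in (1e-2, 1e-4, 1e-6):
    print(omega, sigma(omega, T_L, T_R) / sigma_low_energy(T_R))
```

The printed ratio tends to a value within about $T_L/T_R$ of unity as the frequency is lowered, which is the sense in which the limit holds for a near-extremal hole.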

It is rather surprising that the results for micro- 
scopic absorption cross section calculated at weak 
coupling agree with the semiclassical answers, since 
the relevant process involves states which are not 


supersymmetric and therefore a naive extrapolation 
to strong coupling is not a priori justified. There 
are strong indications, however, that low-energy 
nonrenormalization theorems are at work. This 
agreement has been established not only for black 
holes with finite-horizon areas, but also for other 
systems with no horizons — most significantly, a set 
of parallel 3-branes — and forms the basis for 
Maldacena’s conjecture about AdS/CFT Correspon- 
dence (see AdS/CFT Correspondence). 


Effects of Higher-Derivative Terms 


The classical low-energy limit of string theory is 
supergravity. The effect of the massive modes of the
string, as well as of string loops, is to add terms to
the supergravity action which involve a higher number
of spacetime derivatives, for example, terms containing
higher powers of the curvature. In the presence of such
terms, the Bekenstein—Hawking formula for black hole 
entropy [2] receives corrections which can be calcu- 
lated in a systematic fashion. It turns out that for a 
class of extremal black holes, this corrected entropy as 
computed in the modified supergravity is also in exact 
agreement with a microscopic calculation. 

One example of this agreement is provided by four-dimensional
extremal black holes in type IIA string
theory compactified on a Calabi-Yau manifold. These
are obtained by wrapping D4 branes on three different
4-cycles on the Calabi-Yau and having in addition a
number of D0 branes. Let $p^A$, $A = 1, \ldots, 3$ denote the
three D4 charges and $q_0$ denote the D0 charge. The
microscopic entropy of the BPS state can be computed
by embedding this in M-theory:

$$S^{\text{CY-Black hole}}_{\text{micro}} = 2\pi\sqrt{\frac{1}{6}|q_0|\left(C_{ABC}\,p^A p^B p^C + c_{2A}\,p^A\right)} \qquad [31]$$

where $C_{ABC}$ is the intersection number of the
4-cycles and $c_2$ denotes the second Chern class of
the Calabi-Yau space. When all the charges $p^A$ are
large, the term involving $c_2$ is subdominant. In this
case, the result agrees with the Bekenstein-Hawking
entropy of the corresponding classical solution.
When the charges are not all large (so that the
second term is appreciable), the curvatures of the
supergravity solution become large at the horizon
and higher-derivative corrections to the action
cannot be ignored. In this particular case, it turns
out that these higher-derivative corrections are
string-loop corrections and can be computed using
general properties of $N = 2$ supersymmetry, so that
one can compute corrections to the near-horizon
geometry. Furthermore, one now has to modify the
expression for macroscopic entropy using the
formalism of Wald. Putting these together, it is
found that the macroscopic entropy following from
the modified supergravity is in exact agreement with
[31]. This subject is reviewed in Mohaupt (2000).

These methods have also been applied to the
problem of two-charge black holes in heterotic
string theory on $T^6$ or, equivalently, type IIA on
$K3 \times T^2$ (Dabholkar 2004). Recall that in this case
the horizon of the usual supergravity solution is
singular. It has been found that leading-order
higher-derivative corrections smooth out the
horizon into an $AdS_2 \times S^2$ spacetime and the
modified expression for the macroscopic entropy is
again in exact agreement with the microscopic
answer [23].


Geometry of Microstates 


A satisfactory solution of the information-loss 
paradox requires a much more detailed understand- 
ing of black holes in string theory. The discussion 
above shows that black holes have microstates 
which may be described well in the weak-coupling 
regime. It is interesting to ask whether there is a 
description of these microstates in the strong- 
coupling regime in terms of the effective geometry 
perceived by suitable probes. This question has been 
answered for the two-charge system in great detail 
(see Mathur (2004)). It turns out that the D1-D5 
microstates can be described by perfectly smooth 
metrics with no horizons, and they asymptote to 
the standard two-charge metric discussed above. 
The location of the erstwhile stretched horizon 
marks the point where the different microstates 
start differing from each other significantly. Since 
each such geometry does not have a horizon, neither 
does it have any entropy — this is consistent with 
their identification with nondegenerate microstates. 
Indeed, the number of such microstates correctly 
accounts for the microscopic entropy. Whether a 
similar picture holds for the three-charge system 
remains to be seen in detail, although there are some 
indications that this may be true. In this approach, it 
is not yet fully understood how a horizon emerges 
and why the entropy scales as the horizon area. 


Outlook 


One key feature of the understanding of black hole 
statistical mechanics from the dynamics of branes is 
the fact that a problem in gravity is mapped to a 
problem in a theory without gravity, for example, 
open-string field theory. In fact, the closed strings in 
the bulk are already contained in the spectrum of the 
open strings. This is a consequence of the basic
duality between open strings and closed strings.
Furthermore, the open-string theory lives in a
lower-dimensional spacetime. This is a manifestation of
the holographic principle. As argued by Maldacena,
the presence of a horizon implies that the low-energy
limit retains all the modes of the closed
strings near the horizon, while it truncates the
open-string theory to a gauge theory. Open-closed duality
then reduces to gauge-string duality. This provides
strong evidence that black holes obey the normal
laws of quantum mechanics and hence their time
evolution is unitary.

One of the most outstanding problems in the 
subject is a proper understanding of neutral black 
holes. Most of the quantitative results described 
above depend on supersymmetry, which allows 
extrapolation of weak-coupling answers to the 
strong-coupling domain. Some of these results can 
be extended to situations which have small departures
from supersymmetry, for example,
near-extremal black holes. States corresponding to
neutral black holes are, however, far from super- 
symmetry and known calculational techniques fail. 
There are good reasons to expect, however, that the 
general philosophy — in particular the holographic 
principle — is still valid. Finally, so far string theory 
has been able to attack problems of eternal black 
holes. A satisfactory understanding of the informa- 
tion-loss problem requires an understanding of the 
dynamics of black hole formation and subsequent 
evaporation. Unfortunately, very little is known 
about this at the moment. 


See also: AdS/CFT Correspondence; Black Hole 
Mechanics; Supergravity; Superstring Theories. 


Glossary 


ADM (Arnowitt—Deser—Misner) mass — Mass of a gravita- 
tional background which is asymptotically flat. 

$AdS_n$ (anti-de Sitter space) - A space (or spacetime) with
constant negative curvature in n dimensions. 

BPS state (Bogomolny—Prasad-Sommerfeld state) — In a 
theory of extended supersymmetry, a state that is 
invariant under a nontrivial subalgebra of the full 
supersymmetry algebra. These states always carry 
conserved charges, and supersymmetry determines the 
mass exactly in terms of the charges. 

Calabi-Yau space - Complex Kähler manifold with
vanishing first Chern class. 

Compactify (n. compactification) — To consider a field or 
string theory in a spacetime some of whose spatial 
dimensions are compact. 

Dirichlet boundary condition — The boundary condition 
which fixes the value of a field on the boundary. 
Duality - Equivalence of systems which appear to be
distinct. For string theories, such equivalences relate 
string theories on different spacetimes as well as 
theories with different coupling constants. 

Einstein-Hilbert action - The standard action for gravity
which leads to Einstein's equation,
$S = \frac{1}{16\pi G}\int d^dx\,\sqrt{-g}\,R$, where $R$ is the Ricci scalar,
$g$ denotes the determinant of the metric, and $G$ is
Newton's gravitational constant.

Instanton — A classical solution of Euclidean field theory 
with finite action. 

Kaluza—Klein gauge field — In a compactified theory, the 
gauge field which arises from the metric of the higher- 
dimensional theory. 

K3 — The unique Calabi-Yau manifold in four dimensions 
having an SU(2) holonomy. 

Loop levels — In a Feynman diagram expansion of a field 
theory, terms which contribute in higher orders of the 
Planck constant $\hbar$.

Macroscopic entropy — Entropy associated with gravita- 
tional backgrounds via the Bekenstein-Hawking for- 
mula or its generalization. 

Microscopic entropy — Entropy which follows from the 
degeneracy of states of a system via Boltzmann’s 
relation. 

Minimally coupled scalar — A scalar field whose equation 
of motion is the standard Klein—Gordon equation 
where the derivatives are covariant derivatives. 

Neveu-Schwarz/Neveu-Schwarz states — In type I and II 
string theories, bosonic closed-string states whose left- 
and right-moving parts are bosonic. 

No-hair theorem — A theorem in general relativity which 
states that black holes with nonsingular horizons are 
uniquely characterized by their mass, angular 
momenta, and charges which can couple to long- 
range gauge fields. 

Orbifold — A coset space M/G where G is a group of 
discrete symmetries of a manifold M. If G has a fixed 
point, the space is singular. 

p-Form - A fully antisymmetric p-index tensor. 

Ramond-Ramond states - In type I and II string theories,
bosonic closed-string states whose left- and right- 
moving parts are fermionic. 

Reissner-Nordström black hole - Black hole solution of
general relativity with electric Maxwell charge. 

$S^n$ - n-Dimensional sphere.


Supergravity - Supersymmetric extension of general
relativity.

Supersymmetry - A symmetry between bosons and
fermions.


Threshold bound state — A bound state which is margin- 
ally bound, that is, the binding energy is zero. 

Tree level — In a Feynman diagram expansion of a field 
theory, terms which contribute to lowest order of the 
Planck constant $\hbar$.

U(N) - The group of $N \times N$ unitary matrices. If the
determinant is unity, the subgroup is called SU(N). 
Introduction


Watching the sea or a lake it is often possible to 
trace a wave as it propagates on the water’s surface. 
One can roughly distinguish two types of breaking 
waves. All waves break while reaching the shore but 
certain waves break far from the shore. In the first 
case, the change in water depth or the presence of an 
obstacle (e.g., a rock) seems to cause wave breaking, 
while for certain waves within the second category, 
these factors appear not to be essential. It is a matter 
of observation that for many waves that break in the 
open water a drastic increase in their slope near 
breaking is noticeable. This leads us to the following 
mathematical definition: the wave profile gradually 
steepens as it propagates until it develops a point 
where the slope is vertical and the wave is said to 
have broken (Whitham 1980). Throughout this 
article, we are concerned with wave breaking that 
is not caused by a drastic change of the topography 
of the bottom; for a discussion of wave breaking at 
the beach we refer to Johnson (1997). The governing 
equations for water waves (see the next section) are 
too difficult to be dealt with in their full generality. 
Therefore, to gain some insight, one has to find 
simpler models that are more tractable mathemati- 
cally. Investigating the properties of the model, 
certain predictions can be made. The conclusions 
reached will reflect reality only to some limited 
extent. The value of a model depends on the number 
and the degree of accuracy of physically useful 
deductions that can be made from it — the “truth” of 
the model is meaningless as all experiments contain 
inaccuracies and effects other than those accounted 
for (while deriving the model) cannot be totally 
excluded. We intend to discuss the way in which a 
recent model due to Camassa and Holm (1993) can 
lead to a better understanding of breaking water 
waves. Firstly we survey a few classical nonlinear 
partial differential equations that model the propa- 
gation of water waves over a flat bed (within the 
confines of the linear theory one cannot cope with 
the wave breaking phenomenon) and discuss their 
relevance to the study of breaking waves. We then 
analyze the breaking of waves within the context of 
the Camassa—Holm equation: existence of breaking 
waves, criteria that guarantee that a certain initial 
shape develops into a breaking wave, specific 
features of wave breaking (blow-up rate and blow-up
set for certain types of breaking waves). We
conclude the presentation with a discussion of the 
way in which solutions to the Camassa—Holm 
equation can be continued after wave breaking. 


The Governing Equations 


The water waves that one typically sees propagating 
on the surface of the sea or on a lake are, as a matter 
of common experience, approximately two dimen- 
sional. That is, the motion is identical in any direction 
parallel to the crest line. To describe these waves, it 
suffices to consider a cross section of the flow that is 
perpendicular to the crest line. Choose Cartesian 
coordinates (x, y) with the y-axis pointing vertically 
upwards and the x-axis being the direction of wave 
propagation, while the origin lies at the mean water 
level. Let $(u(t,x,y), v(t,x,y))$ be the velocity field of
the flow, let $y = -d$ be the flat bed (for some fixed
$d > 0$), and let $y = \eta(t,x)$ be the water's free surface.
Homogeneity (constant density) is a physically reason- 
able assumption for gravity waves (Johnson 1997), 
and it implies the equation of mass conservation 


$$u_x + v_y = 0 \qquad [1]$$


The inviscid setting is realistic since experimental 
evidence confirms that the length scales associated 
with an adjustment of the velocity distribution due to 
laminar viscosity or turbulent mixing are long com- 
pared to typical wavelengths. Under the assumption of 
inviscid flow the equation of motion is Euler’s equation 

$$u_t + u\,u_x + v\,u_y = -P_x, \qquad v_t + u\,v_x + v\,v_y = -P_y - g \qquad [2]$$

where P(t,x,y) denotes the pressure and g is the 
gravitational constant of acceleration. The free 
surface decouples the motion of the water from 
that of the air so that (Johnson 1997) the dynamic 
boundary condition 


$$P = P_0 \quad \text{on } y = \eta(t,x) \qquad [3]$$


must hold if we neglect surface tension, where $P_0$ is
the (constant) atmospheric pressure. Moreover,
since the same particles always form the free surface,
we have the kinematic boundary condition

$$v = \eta_t + u\,\eta_x \quad \text{on } y = \eta(t,x) \qquad [4]$$

On the flat bed we have the kinematic boundary
condition

$$v = 0 \quad \text{on } y = -d \qquad [5]$$

expressing the fact that the flow is tangent to the
horizontal bed (or, equivalently, that water cannot 
penetrate the rigid bed). The governing equations 
for water waves are [1]-[5]. Other than the fact that 
they are highly nonlinear, a main difficulty in 
analyzing the governing equations lies in the fact 
that we deal with a free boundary problem: the free 
surface $y = \eta(t,x)$ is not specified a priori. In our
discussion, we suppose that initially (at time t=0), a 
disturbance of the flat surface of still water was 
created and we analyze the subsequent motion of 
the water. The balance between the restoring gravity 
force and the inertia of the system governs the 
evolution of the mass of water and our primary 
objective is the behavior of the free surface. 

An important category of flows are those of zero 
vorticity, characterized by the additional assumption 


$u_y = v_x$ [6] 


The vorticity of a flow, $\omega = u_y - v_x$, measures the local 
spin or rotation of a fluid element. In flows for which 
[6] holds the local whirl is completely absent and for 
this reason such flows are called irrotational. Relation 
[6] ensures the existence of a velocity potential, namely 
a function $\phi(t,x,y)$ defined up to a constant via 


$\phi_x = u, \quad \phi_y = v$ 


Notice that [1] ensures that $\phi$ is a harmonic 
function, that is, $(\partial_x^2 + \partial_y^2)\phi = 0$. In this way, the 
powerful methods of complex analysis become 
available for the study of irrotational flows. Thus, 
while most water flows are with vorticity, the study 
of irrotational flows can be defended mathemati- 
cally on grounds of beauty. Concerning the physical 
relevance of irrotational water flows, experimental 
evidence indicates that for waves entering a region 
of still water the assumption of irrotational flow is 
realistic (Johnson 1997). Moreover, as a conse- 
quence of Kelvin’s circulation theorem (Acheson 
1990), a water flow that is irrotational initially has 
to be irrotational at all later times. It is thus 
reasonable to consider that water motions starting 
from rest will remain irrotational at later times. 
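
As a short worked check of the two facts used in this paragraph (a sketch that relies only on [1], [6] and the definition of $\phi$ given there):

```latex
\begin{aligned}
\phi_x = u,\quad \phi_y = v
\;&\Longrightarrow\;
(\partial_x^2 + \partial_y^2)\phi = u_x + v_y = 0 \quad\text{by [1]},\\
\phi_{xy} = \phi_{yx}
\;&\Longrightarrow\;
u_y = v_x \quad\text{(the irrotationality condition [6])}.
\end{aligned}
```

The second line shows that [6] is necessary for a potential to exist; on the simply connected fluid domain it is also sufficient.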


Nonlinear Model Equations 


Starting from the governing equations [1]-[6] one can 
derive a variety of model equations using the non- 
dimensionalization and scaling approach: a suitable 
set of nondimensional variables is introduced, which, 
after scaling, leads to the appearance of parameters. 
The sizes and relative sizes of these parameters then 
govern the type of phenomenon that is of interest. An 
asymptotic expansion in one or several parameters 


yields an equation that is usually of significance in 
some region of space/time. The aim of this process is to 
obtain a simpler model that can be used to gain some 
understanding and to make some predictions for 
specific physical processes. This scaling method yields 
the Korteweg-de Vries (KdV) equation 


$\eta_t + \eta\,\eta_x + \eta_{xxx} = 0, \quad t > 0,\ x \in \mathbb{R}$ [7] 


as a model for the unidirectional propagation of 
shallow water waves over a flat bed (Johnson 1997). 
In [7] the function $\eta(t,x)$ represents the height of the 
water’s free surface above the flat bed. We would 
like to emphasize that the “shallow water” regime 
does not refer to water of insignificant depth — it 
indicates that the typical wavelength is much larger 
than the typical depth (e.g., tidal waves are 
considered to be shallow water waves although 
they affect the motion of the deep sea). The KdV 
model admits the solitary wave solutions 


$\eta_c(t,x) = 3c\,\operatorname{sech}^2\!\left(\frac{\sqrt{c}}{2}\,(x - ct)\right), \quad x \in \mathbb{R}$ [8] 


For any fixed $c > 0$, the profile $\eta_c$ propagates without 
change of form at constant speed $c$ on the surface of 
the water, that is, it represents a traveling wave. Since 
the profiles [8] of the traveling waves drop rapidly to 
the undisturbed water level $\eta = 0$ ahead of and behind the 
crest of the wave, the $\eta_c$ are called solitary waves. Notice 
that [8] shows that taller solitary waves travel faster. 
They have other special properties: an initial profile 
consisting of two solitary waves, with the taller 
preceding the smaller one, evolves in such a way that 
the taller wave catches up with the other; there is a period of 
complicated nonlinear interaction but eventually both 
solitary waves emerge completely unscathed! This 
special type of nonlinear interaction (the superposition 
principle is not valid since KdV is a nonlinear 
equation) in which solitary waves regain their form 
upon collision occurs only for special equations, in 
which case the solitary waves are called solitons. A 
further interesting property of the KdV model, relevant 
for the understanding of the interaction of solitons, is 
the fact that it is completely integrable (McKean 
1998): there is a transformation which converts the 
equation into an infinite sequence of linear ordinary 
differential equations which can be trivially integrated. 
Moreover, the KdV solitons $\eta_c$ are stable: an initial 
profile that is close to the form of a soliton will evolve 
into a wave that at any later times has a form close to 
that of a soliton (Benjamin 1972). Despite all these 
intriguing features of the KdV-model, for all initial 
profiles $x \mapsto \eta(0,x)$ within the Sobolev space $H^1(\mathbb{R})$ of 
square-integrable functions with a square-integrable 
distributional derivative, eqn [7] has a unique solution 


defined for all times t > 0 (cf. Kenig et al. (1996)) so 
that the KdV model cannot be used to shed light on the 
wave breaking phenomenon. 
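
The claim that the profiles [8] solve [7] can be checked numerically. The following is a minimal sketch, assuming the forms $\eta_t + \eta\eta_x + \eta_{xxx} = 0$ for [7] and $\eta_c = 3c\,\mathrm{sech}^2(\tfrac{\sqrt{c}}{2}(x - ct))$ for [8]; the function names `eta` and `kdv_residual` and the step size `h` are illustrative choices, not part of the source.

```python
import math

def eta(c, t, x):
    """Solitary wave [8]: 3c sech^2( sqrt(c)/2 * (x - c t) ), for c > 0."""
    return 3.0 * c / math.cosh(0.5 * math.sqrt(c) * (x - c * t)) ** 2

def kdv_residual(c, t, x, h=1e-3):
    """Central-difference estimate of eta_t + eta*eta_x + eta_xxx at (t, x)."""
    eta_t = (eta(c, t + h, x) - eta(c, t - h, x)) / (2 * h)
    eta_x = (eta(c, t, x + h) - eta(c, t, x - h)) / (2 * h)
    # Standard second-order stencil for the third derivative.
    eta_xxx = (eta(c, t, x + 2 * h) - 2 * eta(c, t, x + h)
               + 2 * eta(c, t, x - h) - eta(c, t, x - 2 * h)) / (2 * h ** 3)
    return eta_t + eta(c, t, x) * eta_x + eta_xxx

if __name__ == "__main__":
    for c in (0.5, 1.0, 2.0):
        # The crest height is 3c, so taller waves are exactly the faster ones.
        print(c, eta(c, 0.0, 0.0), kdv_residual(c, 0.3, 0.1))
```

The residual is zero only up to discretization and rounding error, so it should be compared against a small tolerance rather than tested for exact equality.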

Whitham (1980) suggested the equation 


$\eta_t + \eta\,\eta_x + \int_{\mathbb{R}} k(x-y)\,\eta_y(t,y)\,dy = 0$ [9] 


for the free surface profile $x \mapsto \eta(t,x)$, with the 
singular kernel 


$k(x) = \frac{1}{2\pi}\int_{\mathbb{R}} \left(\frac{\tanh \xi}{\xi}\right)^{1/2} e^{i\xi x}\,d\xi$ 


to model wave breaking. It can be shown 
(see Constantin and Escher (1998) and references 
therein) that [9] describes wave breaking: there are 
smooth initial profiles $x \mapsto \eta(0,x)$ such that the 
resulting unique solution of [9] exists on a maximal 
time interval [0, T) with 


$\sup_{(t,x) \in [0,T) \times \mathbb{R}} \{|\eta(t,x)|\} < \infty$ 


$\inf_{x \in \mathbb{R}} \{\eta_x(t,x)\} \to -\infty$ as $t \uparrow T$ 


(the solution remains bounded but its slope becomes 
infinite in finite time). However, in contrast to the KdV 
model, eqn [9] is not integrable and does not possess 
soliton solutions. As emphasized by Whitham (1980), 
it is intriguing to find models for water waves which 
exhibit both soliton interaction and wave breaking. 
The Camassa—Holm equation 


$\eta_t - \eta_{txx} + 3\eta\,\eta_x = 2\eta_x\,\eta_{xx} + \eta\,\eta_{xxx}$ [10] 


was first obtained by Fokas and Fuchssteiner (1981/ 
82) as a nonlinear partial differential equation with 
infinitely many conservation laws. Camassa and Holm 
(1993) derived [10] as a model for shallow water 
waves, established that the equation possesses soliton 
solutions and found that it is formally integrable (for 
a discussion of the integrability issues we refer 
to Constantin (2001), and Lenells (2002)). Moreover, 
the solitons of [10] are stable (Constantin and Strauss 
2003). An astonishing plentitude of structures is 
tied into the Camassa-Holm equation: [10] is a re- 
expression of geodesic flow on the diffeomorphism 
group (Constantin 2000, Kouranbaeva 1999), a 
property that can be used to show that the least action 
principle holds in the sense that there is a unique flow 
transforming a wave profile into a nearby profile 
within the class of flows that minimize the kinetic 
energy (see the discussion in Constantin (2000) and 
Constantin and Kolev (2003)). Interestingly, the 
Camassa-Holm equation also models wave breaking. 
More precisely (see the discussion in Constantin 
(2000)), for any initial data $x \mapsto \eta_0(x) = \eta(0,x)$ in 
$H^3(\mathbb{R})$ there is a unique solution of [10] defined on 
some maximal time interval [0, T) and the solution 
stays uniformly bounded on [0, T) with 


$\lim_{t \uparrow T} \left( \inf_{x \in \mathbb{R}} \{\eta_x(t,x)\}\,(T - t) \right) = -2$ if $T < \infty$ 


In addition to this, for a large class of initial data, there 
is precisely one point where the slope of the wave 
becomes infinite at breaking time (Constantin 2000): if 
$\eta_0 \not\equiv 0$ is odd and such that $\eta_0(x) - \eta_0''(x) > 0$ for all 
$x < 0$, then the corresponding wave $t \mapsto [x \mapsto \eta(t,x)]$ 
will break in finite time $T < \infty$ and 


$\lim_{t \uparrow T} \eta_x(t,0) = -\infty$ 


whereas 
cosh(x) 
< binnan a A 
$t \in [0,T),\ x \neq 0$ 


for some constant K > 0. Thus, the Camassa-Holm 
model is an integrable infinite-dimensional Hamil- 
tonian system with stable solitons and eqn [10] 
admits also breaking waves as local solutions (see 
Constantin and Escher (1998) and McKean (1998) 
and references therein for further results on wave 
breaking for the Camassa-Holm equation). 

We conclude our discussion by pointing out that it 
is possible to continue solutions of the Camassa- 
Holm equation past the breaking time. For this 
purpose it is convenient to rewrite [10] as the 
nonlinear nonlocal conservation law 


$\eta_t + \eta\,\eta_x + \frac{1}{2}\,\partial_x \int_{\mathbb{R}} e^{-|x-y|} \left(\eta^2 + \tfrac{1}{2}\eta_y^2\right) dy = 0$ [11] 


reminiscent to some extent of the form of [7] and [9] 
and obtained by formally applying the operator 
$(1 - \partial_x^2)^{-1}$ to [10] in view of the fact that 


$(1 - \partial_x^2)^{-1} f = P * f$ for $f \in L^2(\mathbb{R})$ 


the kernel of the convolution being 


$P(x) = \frac{1}{2}\,e^{-|x|}, \quad x \in \mathbb{R}$ 


By introducing a new set of independent and depen- 
dent variables it is possible to resolve all singularities 
due to wave breaking in the sense that [11] is 
transformed into a semilinear system, the unique 
solution of which can be obtained as a fixed point of 
a contractive operator (Bressan and Constantin 2005). 
In terms of [11], a semigroup of global conservative 
solutions (in the sense that the total energy 


$\frac{1}{2}\int_{\mathbb{R}} \left(\eta^2 + \eta_x^2\right) dx$ 


equals a constant, for almost every time), depending 
continuously on the initial data $\eta(0,\cdot) \in H^1(\mathbb{R})$, is 
thus constructed. 
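
The passage from [10] to the nonlocal form [11] can be made explicit. The following worked computation is a sketch, assuming the standard Camassa-Holm form of [10] and the kernel $P(x) = \tfrac{1}{2}e^{-|x|}$, the Green's function of $1 - \partial_x^2$ on $\mathbb{R}$:

```latex
\begin{aligned}
(1 - \partial_x^2)(\eta_t + \eta\,\eta_x)
  &= \eta_t - \eta_{txx} + \eta\,\eta_x - 3\,\eta_x\eta_{xx} - \eta\,\eta_{xxx},\\
\text{[10]}\;\Longrightarrow\;
\eta_t - \eta_{txx}
  &= -3\,\eta\,\eta_x + 2\,\eta_x\eta_{xx} + \eta\,\eta_{xxx},\\
\text{hence}\quad
(1 - \partial_x^2)(\eta_t + \eta\,\eta_x)
  &= -2\,\eta\,\eta_x - \eta_x\eta_{xx}
   = -\partial_x\!\left(\eta^2 + \tfrac{1}{2}\eta_x^2\right),\\
\eta_t + \eta\,\eta_x
  + \partial_x\, P * \left(\eta^2 + \tfrac{1}{2}\eta_x^2\right) &= 0 .
\end{aligned}
```

The last line is [11] once the convolution with $P$ is written out as an integral over $\mathbb{R}$.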


See also: Compressible Flows: Mathematical Theory; 
Dynamical Systems in Mathematical Physics: 
An Illustration from Water Waves; Integrable Systems: 
Overview; Interfaces and Multicomponent Fluids. 


Further Reading 


Acheson DJ (1990) Elementary Fluid Dynamics. New York: 
Oxford University Press. 

Benjamin TB (1972) The stability of solitary waves. Proceedings 
of the Royal Society of London Series A 328: 153-183. 

Bressan A and Constantin A (2005) Global conservative 
solutions of the Camassa—Holm equation, Preprints on 
Conservation Laws 2005-016 (www.math.ntnu.no/conserva- 
tion/2005/016) . 

Camassa R and Holm DD (1993) A new integrable shallow water 
equation with peaked solitons. Physical Review Letters 
71: 1661-1664. 

Constantin A (2000) Existence of permanent and breaking waves 
for a shallow water equation: a geometric approach. Annales 
de l'Institut Fourier (Grenoble) 50: 321-362. 

Constantin A (2001) On the scattering problem for the Camassa- 
Holm equation. Proceedings of the Royal Society of London 
Series A 457: 953-970. 

Constantin A and Escher J (1998) Wave breaking for nonlinear 
nonlocal shallow water equations. Acta Mathematica 
181: 229-243. 


BRST Quantization 


M Henneaux, Université Libre de Bruxelles, Bruxelles, 
Belgium 


© 2006 Elsevier Ltd. All rights reserved. 


Introduction 


The BRST symmetry was originally introduced in the 
seminal papers by Becchi et al. (1976) and Tyutin (1975) 
for Yang-Mills gauge theories as a tool for controlling 
the renormalization of the models in a consistent (gauge- 
independent) way. This symmetry was discovered as a 
residual symmetry of the gauge-fixed action. It was 
realized later that, in fact, the BRST construction is quite 
general, in the sense that it covers arbitrary gauge 
theories and not just Yang-Mills gauge models. 
Furthermore, it is intrinsic, in that no gauge choice is 
actually necessary to define it. 

The purpose of this review is to explain the general, 
intrinsic features of the BRST formalism applicable to 
“any” gauge theory. The proper setting for discussing 
these issues is that of homological algebra (Stasheff 
(1998), and references therein). This article first explains 
the necessary algebraic material underlying the construction 
and then illustrates it in the cases of the 
Hamiltonian BRST formalism and the Lagrangian 
BRST formalism. 


A Result from Homological Algebra 


The main result of homological algebra needed in 
the BRST construction deals with a differential 
complex C with two gradings. The first grading is 
an N-degree and is called the “resolution degree,” or 
“r-degree.” The second grading is a Z-degree and is 
called the total ghost number. It is denoted by gh. 
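We assume that there are two odd derivations $\delta$ and 
$s_0$ that have the following properties: 


$r(\delta) = -1, \quad \mathrm{gh}(\delta) = 1$ 
$r(s_0) = 0, \quad \mathrm{gh}(s_0) = 1$ [1] 

and 

$\delta^2 = 0, \quad s_0\delta + \delta s_0 = 0, \quad s_0^2 = -[\delta, s_1]$ [2] 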


for some derivation $s_1$ of r-degree 1 and ghost 
number 1. The bracket $[\cdot,\cdot]$ is the graded commutator; 
in this specific case, the anticommutator. We 
also assume that the homology of $\delta$ vanishes at 
nonzero value of the r-degree, both in the original 
complex $C$, 

$H_k(\delta, C) = 0, \quad k > 0$ [3] 

(which is equivalent to $\delta a = 0,\ r(a) > 0 \Rightarrow a = \delta b$) 
and in the space of derivations, 


$[\delta, \alpha] = 0,\ r(\alpha) \neq 0 \Rightarrow \alpha = [\delta, \beta]$ [4] 


where $\alpha$ and $\beta$ are both derivations in $C$. The 
r-degree of a homogeneous linear operator $\alpha$ 
is defined through $r(\alpha(x)) = r(\alpha) + r(x)$ for any 
element $x \in C$ and is negative when $\alpha$ decreases the 
r-degree. 

In $H_0(\delta, C)$, the (odd) derivation $s_0$ defines a 
differential. The cohomology of $s_0$ modulo $\delta$, 
denoted $H^k(s_0, H_0(\delta, C))$, is the cohomology of $s_0$ in 
$H_0(\delta, C)$. It is explicitly defined through the cocycle 
condition 


$s_0 a = \delta m$ [5] 

with coboundaries of the form 

$s_0 b + \delta n$ [6] 

The central result underlying the BRST construction is: 


Theorem 1 Given the above setting, there exists 
an odd derivation s in C with the following 
properties: 


$s = \delta + s_0 + s_1 + \cdots$ [7] 
$r(s_k) = k, \quad \mathrm{gh}(s_k) = 1$ [8] 
$s^2 = 0$ [9] 


Furthermore, one has 

$H^k(s, C) = H^k(s_0, H_0(\delta, C))$ [10] 


The proof is straightforward (see, e.g., 
Henneaux and Teitelboim (1992)). In particular, 
the proof of [10] is a standard spectral sequence 
argument with a sequence that collapses after the 
second step. It is interesting to note that, contrary 
to $s_0$, which is only a differential modulo $\delta$, $s$ is a 
true differential. The construction of $s$ provides a 
model for $H^k(s_0, H_0(\delta, C))$. The differential $s$ is not 
unique, but this does not affect the subsequent 
discussion. 
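
To see how the higher terms of $s$ arise, it helps to expand the nilpotency condition by r-degree. The following is a sketch of the standard bookkeeping, assuming an expansion of $s$ in derivations $s_k$ of r-degree $k$ with $\delta$ of r-degree $-1$:

```latex
\begin{aligned}
s^2 &= (\delta + s_0 + s_1 + s_2 + \cdots)^2 = 0,\\
r = -2:&\quad \delta^2 = 0,\\
r = -1:&\quad \delta s_0 + s_0 \delta = 0,\\
r = 0:&\quad s_0^2 + \delta s_1 + s_1 \delta = 0,\\
r = 1:&\quad s_0 s_1 + s_1 s_0 + \delta s_2 + s_2 \delta = 0,\quad \ldots
\end{aligned}
```

The first three lines are exactly the assumptions [2]. Each later line is an equation for the next unknown $s_{k+1}$, and the acyclicity of $\delta$ in the space of derivations, [4], is what guarantees that it can be solved.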

In physical applications, the total ghost number is 
a derived quantity. The primary gradings are the 
resolution degree and the “filtration degree” called 
the pure ghost number and denoted pgh. It is an 
N-degree and one has 


gh = pgh -r [11] 


The r-degree is known as the antighost or antifield 
number, depending on the context (see below). 
When r(x)=0, one has gh(x)=pgh(x). Since the 
pure ghost number is non-negative, this implies that 


$H^k(s, C) = 0, \quad k < 0$ [12] 


A Geometric Application 


Geometric Setting 


Theorem 1 is relevant to the following situation. 
Consider a surface $\Sigma$ in a manifold $M$, defined by 
equations 


$f_a = 0$ [13] 


which may or may not be independent. (We assume 
for definiteness that the variables in M are bosonic, 
that is, that M is an ordinary manifold — as opposed 
to a supermanifold. The graded case can be covered 
without difficulty by including appropriate sign 
factors at the relevant places.) Assume that $\Sigma$ is 
partitioned by orbits generated by vector fields $X_\alpha$ 
defined everywhere in $M$, tangent to $\Sigma$ and closing 
on $\Sigma$ in the Lie bracket, 


$[X_\alpha, X_\beta] = C^\gamma{}_{\alpha\beta}\,X_\gamma$ + “more” [14] 


where “more” denotes terms that vanish on $\Sigma$. We 
assume, for simplicity, that the vector fields $X_\alpha$ are 
linearly independent on $\Sigma$, although this is not 
necessary. The formalism can be developed in the 
nonindependent case, but it then requires more variables. 
We are interested in the quotient space $\Sigma/\mathcal{O}$ of 
the surface $\Sigma$ by the orbits. To guide the geometrical 
intuition, we shall assume that this quotient space is a 
smooth manifold (the fiber of the orbits, etc.), and we 
shall suggestively adopt notations adapted to this best 
possible case. The approach, being purely algebraic, is 
in fact more general. (Accordingly, the notations 
should be understood with a liberal mind.) 

The aim here is to describe the algebra of 
“observables,” that is, the algebra $C^\infty(\Sigma/\mathcal{O})$ of 
functions on the quotient space $\Sigma/\mathcal{O}$. The terminology 
“observables” anticipates the physical situation dis- 
cussed below, where the orbits are the “gauge orbits.” 
In order to describe algebraically the algebra of 
observables, one observes that this algebra is obtained 
through a two-step procedure. First, one restricts the 
functions from $M$ to $\Sigma$. Second, one imposes the 
invariance condition along the orbits. To each of these 
steps corresponds a separate differential. 


Longitudinal Complex 


The longitudinal complex is associated with the 
second step. One can consider on $\Sigma$ an “exterior 
derivative operator $D$ along the gauge orbits.” This 
operator is defined on functions on $\Sigma$ as 


$Df = X_\alpha(f)\,C^\alpha$ [15] 


where the 1-forms $C^\alpha$ dual to the $X$'s are called 
ghosts. In the physical context, the form-degree is 
the pgh described earlier, and so $\mathrm{pgh}(C^\alpha) = 1$. The 
action of $D$ on the ghosts is given by 


$DC^\gamma = -\frac{1}{2}\,C^\gamma{}_{\alpha\beta}\,C^\alpha C^\beta$ [16] 


The longitudinal complex $L_\Sigma$ is the complex of 
exterior forms along the gauge orbits. In the 
representation used here, it is given by the space 
of polynomials in the ghosts $C^\alpha$ with coefficients 
that are functions on $\Sigma$. The exterior derivative $D$ 
is defined on this space by extending the formulas 
[15] and [16] so that it is an odd derivation. One 
clearly has (on $\Sigma$) 


$D^2 = 0$ [17] 
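
The nilpotency [17] can be checked by hand on the ghosts in the simplest case. The following is a minimal sketch, assuming constant structure constants (a Lie algebra, so only the ghost part [16] of $D$ is modelled, with no function coefficients). The helper names `wedge`, `make_D` and `antisymmetrize`, and the two sets of structure constants `so3` and `bad`, are illustrative and not taken from the source. Ghost polynomials are stored as dictionaries mapping a sorted tuple of ghost indices to a coefficient.

```python
from fractions import Fraction

def wedge(a, b):
    """Product of two ghost polynomials, each a dict {sorted index tuple: coeff}."""
    out = {}
    for ia, ca in a.items():
        for ib, cb in b.items():
            if set(ia) & set(ib):
                continue  # a repeated anticommuting ghost gives zero
            idx = ia + ib
            # Sign of the permutation that sorts the indices (count inversions).
            inv = sum(1 for p in range(len(idx))
                      for q in range(p + 1, len(idx)) if idx[p] > idx[q])
            key = tuple(sorted(idx))
            out[key] = out.get(key, 0) + (-1) ** inv * ca * cb
    return {k: v for k, v in out.items() if v != 0}

def make_D(f, n):
    """Return D on ghost polynomials for structure constants f[(gamma, alpha, beta)]."""
    def D_ghost(g):
        # D C^g = -1/2 * f^g_{ab} C^a C^b, summed over a and b.
        out = {}
        for a in range(n):
            for b in range(n):
                c = f.get((g, a, b), 0)
                for k, v in wedge({(a,): 1}, {(b,): 1}).items():
                    out[k] = out.get(k, 0) - Fraction(1, 2) * c * v
        return {k: v for k, v in out.items() if v != 0}

    def D(elem):
        out = {}
        for idx, coeff in elem.items():
            for pos, g in enumerate(idx):
                # Graded Leibniz rule: sign (-1)^pos from passing pos odd ghosts.
                term = wedge(wedge({idx[:pos]: 1}, D_ghost(g)), {idx[pos + 1:]: 1})
                for k, v in term.items():
                    out[k] = out.get(k, 0) + (-1) ** pos * coeff * v
        return {k: v for k, v in out.items() if v != 0}
    return D

def antisymmetrize(basic):
    """Complete f^g_{ab} to be antisymmetric in its lower indices."""
    f = {}
    for (g, a, b), c in basic.items():
        f[(g, a, b)] = c
        f[(g, b, a)] = -c
    return f

# so(3): [e1,e2] = e0, [e2,e0] = e1, [e0,e1] = e2 (satisfies the Jacobi identity).
so3 = antisymmetrize({(0, 1, 2): 1, (1, 2, 0): 1, (2, 0, 1): 1})
# A bracket that violates Jacobi: [e0,e1] = e2, [e1,e2] = e1.
bad = antisymmetrize({(2, 0, 1): 1, (1, 1, 2): 1})
D_so3 = make_D(so3, 3)
D_bad = make_D(bad, 3)

if __name__ == "__main__":
    print([D_so3(D_so3({(g,): 1})) for g in range(3)])  # all empty: D^2 = 0
    print(D_bad(D_bad({(2,): 1})))                      # nonzero 3-form
```

The comparison with `bad` illustrates that $D^2 = 0$ on the ghosts is the Jacobi identity in disguise: when the bracket fails Jacobi, $D^2 C^\gamma$ is a nonvanishing 3-form.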


The functions on the quotient space $\Sigma/\mathcal{O}$ are just the 
elements of the zeroth cohomological group 
$H^0(D, L_\Sigma)$, 


$H^0(D, L_\Sigma) = C^\infty(\Sigma/\mathcal{O})$ [18] 

In general, $H^k(D, L_\Sigma) \neq 0$. 


Koszul-Tate Differential $\delta$ 


The Koszul-Tate differential $\delta$ implements the first 
step in the reduction procedure. More precisely, it 
provides an algebraic resolution of the algebra 
$C^\infty(\Sigma)$ of the smooth functions on the surface $\Sigma$. 

That algebra can be identified with the quotient 
algebra 


$C^\infty(\Sigma) = C^\infty(M)/\mathcal{N}$ [19] 


where $\mathcal{N}$ is the ideal of functions that vanish on $\Sigma$. 
The Koszul-Tate complex $K$ is defined by adding 
one new generator for each equation $f_a = 0$ defining 
$\Sigma$, denoted $t_a$ and assigned r-degree 1. In the algebra 
$C^\infty(M) \otimes \Lambda(t_a)$ (where $\Lambda(t_a)$ is the exterior algebra 
on $t_a$), one defines $\delta$ through 

$\delta f = 0 \ \ \forall f \in C^\infty(M), \qquad \delta t_a = f_a$ [20] 


and extends it as an odd derivation. It is clear 
that $r(\delta) = -1$ and that $\delta^2 = 0$. Because the 
functions on $M$ are annihilated by $\delta$, they are 
clearly cycles at r-degree zero. Because the left-hand 
sides $f_a$ of the equations $f_a = 0$ are exact 
(equal to $\delta t_a$), the ideal $\mathcal{N}$ coincides with the set 
of boundaries in degree zero. 

Thus, 


$H_0(\delta, C) = C^\infty(\Sigma)$ [21] 


We see accordingly that $\delta$ successfully enforces the 
restriction to the surface $\Sigma$ through its homology in 
degree zero. 
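
A one-constraint example may make this concrete. It is a sketch under illustrative assumptions not taken from the source: $M = \mathbb{R}^2$ with coordinates $(x, y)$, a single equation $f_1 = x = 0$, and one generator written $t_1$.

```latex
\begin{aligned}
K &= C^\infty(\mathbb{R}^2) \otimes \Lambda(t_1)
   = \{\, g + h\,t_1 \;:\; g, h \in C^\infty(\mathbb{R}^2) \,\},
\qquad \delta(g + h\,t_1) = h\,x,\\
H_0(\delta, K) &= C^\infty(\mathbb{R}^2)\,/\,\{\, h\,x \,\}
   \;\cong\; C^\infty(\mathbb{R}) \quad\text{(functions of $y$ on the line $x = 0$)},\\
H_1(\delta, K) &= \{\, h\,t_1 : h\,x = 0 \,\} = 0 .
\end{aligned}
```

The identification in the second line uses the fact that every smooth function vanishing on $x = 0$ is of the form $h\,x$ with $h$ smooth, so the boundaries in degree zero are exactly the ideal of functions vanishing on the surface.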

However, if the equations $f_a = 0$ are not independent, 
this is not the end of the story. Indeed, any 
identity $Z_A^a f_a = 0$ on the functions $f_a$ leads to a 
nontrivial cycle $Z_A^a t_a$ in r-degree 1, $\delta(Z_A^a t_a) = 0$. This 
is undesirable. To cure this drawback, one introduces 
further generators $t_A$ in r-degree 2, one for 
each identity $Z_A^a f_a = 0$, and defines 

$\delta t_A = Z_A^a t_a, \qquad r(t_A) = 2$ [22] 

in order to “kill” the unwanted cycles $Z_A^a t_a$. The 
Koszul complex $K$ is thus enlarged to contain these 
new (even) variables and redefined as 


$K = C^\infty(M) \otimes \Lambda(t_a) \otimes S(t_A)$ [23] 


where $S(t_A)$ is the symmetric algebra in $t_A$. The 
operator $\delta$ is extended to $K$ as an odd derivation. 
One has $\delta^2 = 0$ and the property [21] is unaffected 
by the inclusion of the new generators. Furthermore, 
by construction, 


$H_1(\delta, K) = 0$ [24] 


If there is no “identity on the identities,” we shall 
assume that the process stops. Otherwise, one needs 
to introduce further generators in r-degree 3 and 
possibly higher. When all the appropriate variables 
are included, there is no homology at higher 
r-degree. Thus, 


$H_k(\delta, K) = 0, \quad k > 0$ [25] 


Combining $\delta$ with $D$ 


We now turn to the problem of combining the 
Koszul-Tate complex with the longitudinal com- 
plex, so as to implement the full reduction. To that 
end, we define C by adding the ghosts to K, 


$C = K \otimes \Lambda(C^\alpha)$ [26] 


We then extend the action of the Koszul-Tate 
differential in the simplest way which preserves all 
gradings, namely 


$\delta C^\alpha = 0$ [27] 


It is clear that the homology of $\delta$ in $C$ is given by 

$H_0(\delta, C) = L_\Sigma, \qquad H_k(\delta, C) = 0 \ \ (k > 0)$ [28] 


One can also extend the longitudinal derivative 
$D$ to the whole complex $C$ because the vector fields 
$X_\alpha$ are defined throughout $M$, and so the definitions 
[15] and [16] make sense in $C$. One defines 
the action of $D$ on the generators $t_a$ by requiring 
that 


$D\delta + \delta D = 0$ [29] 


This is easily verified to be possible. However, the 
(odd) derivation so obtained fails to be a differential 
in $C$ when the vector fields $X_\alpha$ do not close off the 
surface $\Sigma$. In that case, the gauge transformations 
are not integrable off $\Sigma$; one says that they form an 
“open algebra.” One has then $D^2 = 0$ only on $\Sigma$, or, 
more precisely, 


$D^2 = -\delta s_1 - s_1 \delta$ [30] 


for some (odd) derivation $s_1$ (that vanishes in the 
“closed algebra” case). But this situation is precisely 
the one discussed earlier, with the Koszul-Tate 
differential being indeed $\delta$, as anticipated by the 
notation, and the longitudinal differential $D$ playing 
the role of $s_0$ (the degrees also match). Applying the 
theorem discussed there, we can conclude: 


Theorem 2 There exists a differential s in C, 


$s = \delta + D + s_1 + \cdots$ [31] 


such that 


$H^0(s, C) = C^\infty(\Sigma/\mathcal{O})$ [32] 


This is an immediate consequence of Theorem 1 
and eqns [18] and [28]. The differential s is known 
in the physical applications described below as the 
BRST differential. 


Hamiltonian BRST Construction 


As a first application of the above setting, we 
consider the Hamiltonian description of gauge 
systems. As already known, gauge systems are 
characterized in the Hamiltonian description by 
constraints and, for this reason, are called “con- 
strained Hamiltonian systems.” Furthermore, the 
gauge transformations generate gauge orbits on the 
constraint surface and the physical observables are 
the functions on the quotient space of the constraint 
surface by the gauge orbits. 

A further important feature arises in the Hamilto- 
nian formalism: the gauge transformations are 
canonical transformations that are generated by the 
first-class constraints. Assuming that all the second-class 
constraints have been eliminated and that the 
bracket being used is the Dirac bracket, one sees 
that there is a vector field $X_\alpha$ for each constraint 
function $f_a$, $a = \alpha$. (The functions $f_a$ are thus 
assumed to be independent since the vector fields 
$X_\alpha$ are assumed to be so. If not, further variables are 
needed, but the analysis proceeds along the same 
ideas.) 

This implies, in turn, that there is a pairing between 
the ghosts $C^\alpha$ associated with the longitudinal exterior 
derivative and the generators $t_\alpha$ of the Koszul-Tate 
complex. This pairing enables one to extend the 
bracket structure defined on the phase space to the 
pairs $(C^\alpha, t_\alpha)$ by declaring that these are canonically 
conjugate. The variables $t_\alpha$ are the momenta conjugate 
to the ghosts, $[t_\alpha, C^\beta] = \delta_\alpha^\beta$. Accordingly, the complex $C$ 
relevant to the Hamiltonian situation, 


$C = C^\infty(P) \otimes \Lambda(C^\alpha) \otimes \Lambda(t_\alpha)$ [33] 


has a phase-space structure (here, P= M is the 
manifold obtained after eliminating the second-class 
constraints, equipped with the Dirac bracket). The 
space C is known as the “extended phase space.” 
The r-degree is called “antighost number” in the 
Hamiltonian context. 

By the general theorem described in the previous 
section, one knows that the cohomology at gh=0 of 
the BRST differential is isomorphic to the algebra of 
the observables. Thus, there are two alternative 
ways to describe this physical algebra, either 
through reduction, by eliminating the redundant 
(gauge) variables, or cohomologically in an extended 
space containing additional variables, the ghosts, 
and their momenta. 

There is an additional interesting feature of the 
BRST construction in the Hamiltonian case: the 
BRST transformation is a canonical transformation 
in the extended phase space, in the sense that 


$sF = [Q, F]$ [34] 


for some “BRST generator” $Q$ of ghost number 1 
($F, Q \in C$). The nilpotency $s^2 = 0$ of the BRST differential 
is equivalent to 


$[Q, Q] = 0$ [35] 


That $s$ is canonically generated implies that the 
cohomological BRST groups come with a natural 
bracket structure: the Poisson bracket of the extended 
phase space passes on to the BRST cohomological 
groups. In particular, $H^0(s, C)$, equipped with this 
bracket structure, is isomorphic (as Poisson algebra) 
to the algebra of physical observables. 
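
The equivalence between nilpotency of $s$ and the vanishing of the bracket of the BRST generator with itself follows from the graded Jacobi identity. As a sketch, writing the generator as $Q$ and assuming only that it is odd and that the bracket is an even graded Poisson bracket:

```latex
\begin{aligned}
s^2 F &= [Q, [Q, F]],\\
[Q, [Q, F]] &= [[Q, Q], F] - [Q, [Q, F]]
  \quad\text{(graded Jacobi identity, $Q$ odd)},\\
\Longrightarrow\quad s^2 F &= \tfrac{1}{2}\,[[Q, Q], F].
\end{aligned}
```

Hence $[Q,Q] = 0$ implies $s^2 = 0$. Conversely, $s^2 = 0$ forces $[Q,Q]$ to have vanishing bracket with every $F$, so it is a constant, and a constant of ghost number 2 must vanish.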


Lagrangian BRST Construction 


The analysis of the Lagrangian BRST construc- 
tion, due to Batalin and Vilkovisky (1981) (“anti- 
field formalism”), proceeds in the same way because 
the covariant description of the space of observables 
involves also the same geometric ingredients. The 
surface $\Sigma$ is now the “stationary surface,” that is, 
the space of solutions to the equations of motion. 
The space $M$ in which it is embedded is the space of 
all field histories. The gauge symmetry acts on this 
space. Furthermore, the gauge vector fields are 
tangent to $\Sigma$ since a solution is mapped on a 
solution by a gauge transformation. The integral 
submanifolds are the gauge orbits. The observables 
are the functions on the quotient space. 

Since the equations of motion follow from an 
action principle, there are as many equations as 
there are fields $\varphi^i$. The corresponding generators $t_i$ 
in the Koszul-Tate complex (at degree 1) are called 
“antifields conjugate to the fields” and are denoted 
$\varphi^*_i$. The r-degree is known as “antifield” (or also 
“antighost”) number. The gauge symmetry of the 
action implies Noether identities on the equations of 
motion. These are, therefore, not independent. 
According to the above general discussion, there 
are further generators in the Koszul-Tate complex, 
at degree 2. More precisely, there are as many new 
generators in degree 2 as there are Noether identities 
or independent gauge symmetries. These are called 
antifields conjugate to the ghosts and denoted $C^*_\alpha$. 

In the longitudinal complex, one has the ghosts $C^\alpha$, 
with as many ghosts as there are gauge symmetries. 
Thus, the BRST complex is the space 


$C = C^\infty(M) \otimes \Lambda(C^\alpha) \otimes \Lambda(\varphi^*_i) \otimes S(C^*_\alpha)$ [36] 


where $M$ is the space of all field histories. There is 
now a natural pairing between the original field 
variables $\varphi^i$ and the antifields $\varphi^*_i$, as well as between 
the ghosts $C^\alpha$ and the antifields $C^*_\alpha$. One thus defines 
a bracket in which the fields $\varphi^i$ and the ghosts $C^\alpha$ on 
the one hand, and the antifields $\varphi^*_i$ and $C^*_\alpha$ on the 
other, are declared to be conjugate. This bracket is 
denoted by parentheses, 

$(\varphi^i, \varphi^*_j) = \delta^i_j, \qquad (C^\alpha, C^*_\beta) = \delta^\alpha_\beta$ [37] 

However, since the bracket pairs variables with 
degrees that add up to —1, it is in fact an “odd 
bracket,” called the “antibracket.” 
The BRST differential is again canonically gener- 
ated, but this time in the antibracket, 


$sF = (S, F), \quad F \in C$ [38] 


where the generator $S$ is an even function of the 
fields, the ghosts and the antifields, with $\mathrm{gh} = 0$ (the 
ghost number is carried by the odd antibracket). 
The nilpotency $s^2 = 0$ of the BRST differential is 
equivalent to the crucial “master equation,” 


$(S, S) = 0$ [39] 


Because the BRST differential is canonically 
generated, there is a natural bracket in cohomology. 
This bracket is not the Poisson bracket of observa- 
bles (at gh = 0) because it changes the ghost number 
by one unit. One can, however, relate it to the 
Poisson bracket of observables (Barnich and Hen- 
neaux 1996); furthermore, it plays an important role 
in the study of the consistent deformations of the 
action. 


Spacetime Locality 


In the context of local field theory, one is often 
interested in a particular class of functions of the 
field histories, namely the so-called space of local 
functionals. A local functional is, by definition, the 
integral of a local n-form (where n is the spacetime 
dimension). A local n-form reads, in local 
coordinates, 


$\omega = f(x)\,d^n x$ [40] 


where f(x) depends on the fields at x as well as on a 
finite number of their derivatives. When the ghosts 
and the antifields are included, the local functions 
depend on them in the same way. 

The previous general cohomological result was 
derived in the space of all function(al)s, without locality 
restriction. When changing the space of cochains, one 
may change the cohomology. For instance, a local 
functional which is BRST-trivial in the space of all 
functionals may become nontrivial in the space of local 
functionals. This indeed happens here because the 
homology of the Koszul-Tate differential usually no 
longer vanishes at strictly positive r-degree in the space 
of local functionals, where it is related to local 
conservation laws. As a result, the analysis of the 
BRST cohomology in the space of local functionals is 
an interesting and nontrivial problem. In particular, the 
cohomological groups $H^k(s)$ in the space of local 
functionals may not vanish at negative ghost numbers. 


BRST Quantization 


The quantization of a dynamical system can proceed 
along different lines. For gauge models, the path- 
integral approach is most efficiently pursued in the 
context of the antifield formalism. We shall briefly 
outline here the general principles underlying the 


operator approach, which is based on the Hamiltonian 
formalism. 

In the operator approach, all the variables, 
including the ghosts and the conjugate momenta, 
are realized as operators in a space endowed with a 
nonpositive-definite inner product (because of the 
ghosts and the gauge modes). Real dynamical 
variables become formally Hermitian operators. 
Ignoring anomalies, the BRST generator Q becomes 
an operator that fulfills the conditions 


$Q^* = Q, \qquad Q^2 = 0$ [41] 


(which allows for nontrivial solutions $Q \neq 0$ because 
the inner product is not positive definite). The 
second relation is a consequence of the classical 
Poisson bracket relation $[Q,Q] = 0$ and the fact that 
the graded Poisson bracket of two odd objects 
becomes the anticommutator. 

To remove the ghost and gauge redundancy, which 
has no physical content, one must impose a condition 
that selects physical states. The appropriate condition 
is motivated by the general cohomological result 
connecting the BRST cohomology with the algebra of 
physical observables. One imposes the condition 


$Q|\psi\rangle = 0$ [42] 


Because of [41], states of the form $Q|\chi\rangle$ are solutions 
of [42], but they have a vanishing inner product with 
any other physical states, including themselves. They 
are called null states. The physical states are given by 
the BRST state cohomology. The physical operators 
are given by the BRST operator cohomology at 
gh=0 and induce a well-defined action in the state 
cohomology. In particular, the Hamiltonian, being 
gauge invariant in the original theory, is represented 
by a BRST cohomological class, so that the time 
evolution maps physical states on physical states. 
The whole scheme is (formally) consistent because 
exact BRST operators have vanishing matrix elements 
between states annihilated by the BRST operator Q, 
while null states $|\phi\rangle$ are such that $\langle\phi|A|\psi\rangle = 0$ whenever 
$A$ is a BRST-closed operator, $[A,Q] = 0$, and $|\psi\rangle$ a 
physical state. Problems may arise, however, if the 
classical relations $[Q,Q] = 0$ and $[H,Q] = 0$ are not 
satisfied in the presence of extra terms of order $\hbar$, that is, 


$Q^2 \neq 0$ or $[H, Q] \neq 0$ [43] 


In such cases, one says that there are anomalies. These 
are usually fatal to the consistency of the theory. 
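
A finite-dimensional toy model shows how the physical-state condition and the null states work together. The following is a sketch under illustrative assumptions that are not in the source: a three-dimensional state space with basis $e_0, e_1, e_2$, an indefinite metric `g`, and a nilpotent operator `Q` with $Q e_2 = e_1$. The helper names `matmul`, `rank` and `inner` are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def rank(A):
    """Rank by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in A]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def inner(u, v, g):
    """Indefinite inner product <u, v> = u^T g v (real vectors)."""
    return sum(u[i] * g[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

# Toy BRST operator: Q e2 = e1, Q e0 = Q e1 = 0.
Q = [[0, 0, 0],
     [0, 0, 1],
     [0, 0, 0]]
# Non-positive-definite metric pairing e1 with e2.
g = [[1, 0, 0],
     [0, 0, 1],
     [0, 1, 0]]

e0, e1, e2 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
n = len(Q)
# dim(ker Q / im Q) = (n - rank Q) - rank Q
dim_cohomology = n - 2 * rank(Q)

if __name__ == "__main__":
    print(matmul(Q, Q))          # zero matrix: Q^2 = 0
    print(matmul(g, Q))          # symmetric: Q is self-adjoint for g
    print(dim_cohomology)        # one physical state class
    print(inner(e1, e1, g), inner(e1, e0, g), inner(e0, e0, g))
```

Here $e_0$ and $e_1$ satisfy the physical-state condition, $e_1 = Q e_2$ is a null state with vanishing inner product with both, and the state cohomology is one-dimensional, represented by $e_0$.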


Some Applications 


The number of applications of the BRST formalism 
is so large that it would be out of place to try being 
exhaustive here. Some of its main successes are 
outlined here, with suggestions for “Further reading.” 


Renormalization of Gauge Theories 


First, there is the original context of perturbative 
renormalization and anomalies for gauge theories of 
the Yang-Mills type. The relevant cohomology here 
is the BRST cohomology in the space of local 
functionals involving the fields, the ghosts, and the 
antifields. The antifields are also known in this 
context as Zinn-Justin sources for the BRST varia- 
tions of the fields and ghosts, since Zinn-Justin was 
the first to introduce them (with that meaning). 
Many authors have contributed to the full computa- 
tion of the local BRST cohomology. A review is 
given in Barnich et al. (2000), where extensions to 
other theories are also indicated. 


String Theory 


Modern string theory would be inconceivable with- 
out the BRST formalism. This started with the 
pioneering paper by Kato and Ogawa (1983), where 
the critical dimension of the bosonic string was 
derived from the condition that $Q^2$ should vanish
(quantum mechanically), and where it was shown 
that the string physical states could be identified 
with the state BRST cohomology. The reader is 
referred to excellent monographs on modern string 
theory (see “Further reading”). 


Deformations of Gauge Models 


The study of consistent deformations of a given 
gauge theory (i.e. the problem of introducing 
consistent couplings) is also efficiently dealt with in 
the BRST context. References to applications may 
be found in Henneaux (1998). 


See also: Anomalies; Batalin-Vilkovisky Quantization;
BF Theories; Constrained Systems; Functional 
Integration in Quantum Physics; Graded Poisson 
Algebras; Indefinite Metric; Perturbative Renormalization 
Theory and BRST; Quantum Chromodynamics; Quantum 
Field Theory: A Brief Introduction; Renormalization: 
General Theory; String Field Theory; Supermanifolds; 
Topological Sigma Models. 
The study of algebras of Hilbert space operators, closed
under the adjoint operation and in the weak operator 
topology, was begun by John von Neumann shortly 
after the discovery of quantum mechanics, and partly 
with the aim of understanding the monolithic ideas 
proposed by Heisenberg and Schrödinger. 

Seventy-five years later, the theory of these 
algebras has become a monolith in its own right 
(see von Neumann Algebras: Introduction, Modular 
Theory and Classification Theory; von Neumann 
Algebras: Subfactor Theory), with more internal 
structure and with more external reference to physics 
and, as it turns out, to other areas of mathematics 
than could possibly have been imagined at the outset. 
(The most striking example of an application to 
mathematics is perhaps the discovery of the Jones 
knot polynomial (see The Jones Polynomial); note 
that this has also had repercussions for physics.) 

Twenty-five years after the beginning of the 
theory of von Neumann algebras, as these algebras 
are now called, Gelfand and Naimark noticed that a 
second class of algebras of operators on a Hilbert 
space, closed under the adjoint operation, was 
worthy of study, namely those closed in the norm 
topology. Gelfand and Naimark made two impor- 
tant discoveries concerning this class of operator 
algebras, now called C*-algebras. 

First, Gelfand and Naimark showed that, in the 
commutative case, at least when the C*-algebra is 
considered only up to isomorphism — with its 
identity as a concrete algebra of operators sup- 
pressed — the information contained in a C*-algebra 
is purely topological. More precisely, Gelfand and 
Naimark showed that the category of unital
commutative C*-algebras, with unit-preserving
algebra homomorphisms (these necessarily preserve
the adjoint operation), is equivalent in a contravariant
way (i.e., with reversal of arrows) to the
category of compact Hausdorff spaces, with continuous
maps. The compact space associated with a
unital commutative C*-algebra under the Gelfand-Naimark
correspondence may be viewed as the
space of maximal proper ideals, with a natural 
topology (the hull-kernel, or Jacobson, topology), 
and is called the spectrum. This space may also be 
viewed as the set of (unital, linear, multiplicative) 
maps from the algebra into the complex numbers, 
in which case the topology is that of pointwise 
convergence. 

Second, using this result, Gelfand and Naimark 
proved that arbitrary C*-algebras could be axioma- 
tized in a simple way abstractly, as *-algebras — that 
is, as algebras over the complex numbers with a 
conjugate linear anti-automorphism of order 2 — with 
certain special properties. It is now known that the 
only property that needs to be assumed is the 
existence of a (necessarily unique) Banach space 
norm related to the *-algebra structure by means of 
the so-called C*-algebra identity: 


$$\|x^* x\| = \|x^*\|\,\|x\| \qquad [1]$$


This is clearly related to — and in fact implies — the 
normed algebra inequality 


$$\|xy\| \le \|x\|\,\|y\| \qquad [2]$$
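
As a numerical sanity check (not a proof), the C*-algebra identity ‖x*x‖ = ‖x*‖ ‖x‖ and the normed algebra inequality ‖xy‖ ≤ ‖x‖ ‖y‖ can be tested in the C*-algebra of 2 x 2 complex matrices with the operator norm. The sketch below assumes that concrete algebra; the helper names and the two sample matrices are illustrative choices only.

```python
import math


def adjoint(x):
    """Conjugate transpose of a 2x2 matrix."""
    return [[x[j][i].conjugate() for j in range(2)] for i in range(2)]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def op_norm(x):
    """Operator norm: square root of the largest eigenvalue of x* x."""
    h = matmul(adjoint(x), x)               # Hermitian, positive semidefinite
    t = (h[0][0] + h[1][1]).real            # trace of h
    d = (h[0][0] * h[1][1] - h[0][1] * h[1][0]).real  # determinant of h
    disc = max(t * t - 4 * d, 0.0)          # guard against rounding below zero
    return math.sqrt((t + math.sqrt(disc)) / 2)


x = [[1 + 2j, 3 + 0j], [0j, -1j]]
y = [[0j, 1 + 0j], [2j, 1 + 0j]]

lhs = op_norm(matmul(adjoint(x), x))        # ||x* x||
rhs = op_norm(adjoint(x)) * op_norm(x)      # ||x*|| ||x||
identity_holds = abs(lhs - rhs) < 1e-9
inequality_holds = op_norm(matmul(x, y)) <= op_norm(x) * op_norm(y) + 1e-9
```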


One reason that the Gelfand-Naimark axiomati- 
zation of C*-algebras is important is that it under- 
lines how natural it is to consider a C*-algebra 
abstractly, i.e., independently of any particular 
representation. Indeed, while one of the fundamen- 
tal phenomena of von Neumann algebra theory 
(discovered by Murray and von Neumann) is that, 
essentially — in rather a strong sense — there is only 
one way to represent a given von Neumann algebra 
on a Hilbert space (and there is even a canonical 
way, called the standard representation!), it is an 
equally fundamental phenomenon of C*-algebra 
theory that, except in extremely special cases, this 
is no longer true. 

For instance, although the C*-algebra of compact 
operators on a given Hilbert space has, up to unitary 
equivalence, only a single irreducible representation — 
this is what underlies the fact, proved by von 
Neumann, referred to as the uniqueness of the 
Heisenberg commutation relations for a quantum-
mechanical system with finitely many degrees of 
freedom — as soon as one considers a physical system 
with infinitely many degrees of freedom, one finds that 
the naturally associated C*-algebra has infinitely 
many — indeed, uncountably many — unitary equiva- 
lence classes of irreducible representations, and it is 
impossible to parametrize these in any reasonable way. 

This striking dichotomy presents itself also in 
other contexts, more elementary perhaps than the 
physics of infinitely many degrees of freedom. 
Consider the dynamical system consisting of a circle 
and a fixed rotation acting on it. If the rotation is of
finite order (i.e., if the angle is a rational multiple
of $2\pi$), then the naturally associated C*-algebra is
relatively easy to study. In the case of angle zero, it
is the unital commutative C*-algebra with Gelfand- 
Naimark spectrum the torus. In the general case of a 
rational angle, the space of unitary equivalence 
classes of irreducible representations is still naturally 
parametrized by the torus. (And this is the same as 
the space of primitive ideals — the kernels of the 
irreducible representations — with the Jacobson 
topology.) 
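
The finite-order condition on the rotation is easy to make concrete. In the sketch below, angles are measured as fractions of a full turn, so a rotation by $2\pi p/q$ is represented by the rational number p/q; this representation and the function name are assumptions made for the illustration. Exact arithmetic can only represent the rational case, in which every orbit closes up after finitely many steps.

```python
from fractions import Fraction


def orbit(theta, start=Fraction(0)):
    """Orbit of `start` under rotation by `theta` (both in turns, taken mod 1).

    Terminates only because `theta` is rational; for an irrational angle
    the orbit never returns to its starting point.
    """
    points = [start % 1]
    x = (start + theta) % 1
    while x != points[0]:
        points.append(x)
        x = (x + theta) % 1
    return points


orbit_length = len(orbit(Fraction(3, 8)))      # rotation by 2*pi*3/8
reduced_length = len(orbit(Fraction(2, 6)))    # 2/6 reduces to 1/3
trivial_length = len(orbit(Fraction(0)))       # angle zero
```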

In the irrational case, the case of a rotation by an
irrational multiple of $2\pi$ (still elementary from a
geometrical point of view; note that the calendar is
based on such a system!), the irreducible representations
are no longer parametrized up to unitary
equivalence by the torus, and the space of primitive
ideals consists of a single point: the C*-algebra is
simple. (But it is decidedly not simple to study!)

This fundamental dichotomy in the classification 
of C*-algebras — conjectured by Gaarding and 
Wightman in the quantum-mechanical setting and 
by Mackey in the geometrical one — was established 
by Glimm. Glimm proved (in the setting of separ- 
ability; most of his results were generalized later 
to the nonseparable case) that a large number of 
a priori different ways that a C*-algebra could 
behave well were in fact one and the same behavior: 
either all present for a given C*-algebra, or all 
catastrophically absent! 

Some of the properties considered by Glimm, and 
shown to be equivalent (for a separable C*-algebra) 
were as follows. First of all, every representation of 
the C*-algebra on a Hilbert space should be of type 
I, i.e., should generate a von Neumann algebra of 
type I. (A von Neumann algebra was said by Murray 
and von Neumann to be of type I if it contained a 
minimal projection of central support one, i.e., a 
projection not contained in a proper direct sum- 
mand and minimal with this property.) Second, in 
every irreducible representation (not necessarily 
injective) on a Hilbert space, the image of the 


C*-algebra should contain the compact operators. 
Third, any two irreducible representations with the 
same kernel should be unitarily equivalent. Fourth, 
it should be possible to parametrize the unitary 
equivalence classes of irreducible representations by 
a real number in a natural way (respecting the 
natural Borel structure introduced by Mackey). 

The first of the equivalent properties listed above, 
that all representations of a C*-algebra should be of 
type I, suggested a name for the property — that the 
C*-algebra itself should be of type I. This property 
of a C*-algebra, identified by Glimm - or, rather, its 
opposite, which as mentioned above is much more 
common (just as irrational numbers are more 
common than rationals, or systems with infinitely 
many degrees of freedom are, at least in theory, 
much more common than those with finitely many 
degrees of freedom) — is a fundamental unifying 
principle of nature. 

Besides commutative C*-algebras — as mentioned 
above, just another way of looking at topological 
spaces (compact Hausdorff spaces, that is) — and 
besides the C*-algebra associated to a rotation or to 
a physical system with infinitely many degrees of 
freedom, what are some of the naturally occurring
examples of C*-algebras, of type I or not?

First, let us take a closer look at what arises from 
a system with infinitely many degrees of freedom — 
in the fermion case. As shown by Jordan and 
Wigner, one obtains what, as a C*-algebra, is very 
easy to describe, namely, just the infinite tensor 
product in the category of unital C*-algebras of 
copies of the algebra of 2 x 2 matrices over the 
complex numbers. As it happens, in work earlier 
than that referred to above, Glimm had considered 
such infinite tensor product C*-algebras, also allow- 
ing the components to be matrix algebras of order 
different from two. This raised a problem of 
classification — for those C*-algebras, all of which 
were simple and not of type I. (The only simple 
unital C*-algebra of type I is a single matrix algebra, 
or a finite tensor product of matrix algebras!) 

In a pioneering classification paper (the first paper 
on the classification of C*-algebras being perhaps 
that of Gelfand and Naimark, in which the commu- 
tative case was described), Glimm obtained the 
classification of infinite tensor products of matrix 
algebras, showing that it was a direct extension of
the classification of finite tensor products, i.e., just
of the matrix algebras themselves. As described later
by Dixmier, Glimm’s classification was as follows.
Given a sequence $n_1, n_2, \ldots$ of natural numbers
(equal to one or more), form the infinite product in
a natural way, just by keeping track of the total
number of times each prime number appears in the
finite products $n_1 \cdots n_k$ (a multiplicity which may be
either finite or infinite). Call such a formal infinite
product a generalized integer, or, perhaps, a
supernatural number! Two (countably) infinite
tensor products of matrix algebras are isomorphic 
(just as in the finite tensor product case) if and only 
if the corresponding supernatural numbers are 
equal. 
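
The bookkeeping behind a supernatural number can be sketched in a few lines. The sketch assumes, purely for the sake of a finite description, that the sequence of matrix orders is eventually periodic: a finite list of initial terms followed by a block repeated forever. The function names and this input format are hypothetical. A supernatural number is stored as a dictionary from primes to multiplicities, with `math.inf` for primes occurring infinitely often.

```python
import math


def prime_factors(n):
    """Prime factorization of n >= 1 as a dict {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def supernatural(initial, repeating):
    """Supernatural number of the sequence initial + repeating + repeating + ...

    Every prime dividing a term of the repeating block occurs infinitely often.
    """
    result = {}
    for n in initial:
        for p, e in prime_factors(n).items():
            result[p] = result.get(p, 0) + e
    for n in repeating:
        for p in prime_factors(n):
            result[p] = math.inf
    return result


# Tensor products of 2x2 matrices, and of 4x4, 8x8, then 2x2 matrices forever:
same_a = supernatural([], [2]) == supernatural([4, 8], [2])
# Orders 6, 6, 6, ... versus 2, 3, 2, 3, ...:
same_b = supernatural([], [6]) == supernatural([], [2, 3])
# One extra factor of 3 changes the supernatural number:
different = supernatural([3], [2]) != supernatural([], [2])
```

By the classification just stated, the first two comparisons correspond to isomorphic infinite tensor products and the third to non-isomorphic ones.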

In formulating Glimm’s classification of infinite 
tensor products of matrix algebras in this way, 
Dixmier pointed out that each supernatural number 
determines a subgroup of the rational numbers 
(those with denominator dividing the supernatural 
number) and that every subgroup of the rational 
numbers containing the integers arises in this way. 
He then gave an alternative derivation of Glimm’s 
theorem by recovering this subgroup of the rational 
numbers as a natural invariant of the algebra, 
namely, as the subgroup generated by the values 
on projections of the unique normalized trace. (By a 
trace is meant here a unitarily invariant positive 
linear functional.) This could even be interpreted as 
an alternative statement of Glimm’s theorem. 
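
Dixmier's subgroup of the rational numbers can be tested for membership directly. In this sketch a supernatural number is again represented as a dictionary from primes to multiplicities, with `math.inf` for infinite multiplicity; the representation and function names are assumptions of the illustration, not notation from the text.

```python
import math
from fractions import Fraction


def divides_supernatural(d, sn):
    """True if the positive integer d divides the supernatural number sn."""
    for p, mult in sn.items():
        # Remove the prime p from d at most `mult` times (mult may be infinite).
        while mult > 0 and d % p == 0:
            d //= p
            mult -= 1
    return d == 1


def in_subgroup(q, sn):
    """True if the rational q has denominator dividing sn."""
    return divides_supernatural(Fraction(q).denominator, sn)


two_infinity = {2: math.inf}     # infinite tensor product of 2x2 matrix algebras
mixed = {2: 1, 3: math.inf}      # one factor of 2, infinitely many factors of 3

dyadic_ok = in_subgroup(Fraction(3, 8), two_infinity)
third_ok = in_subgroup(Fraction(1, 3), two_infinity)
eighteenth_ok = in_subgroup(Fraction(1, 18), mixed)
quarter_ok = in_subgroup(Fraction(1, 4), mixed)
```

For the supernatural number with the prime 2 of infinite multiplicity, the subgroup consists of the dyadic rationals, so 3/8 belongs to it and 1/3 does not.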

Soon afterwards, Bratteli considered an extension 
of Glimm’s class of C*-algebras, namely, the 
inductive limits of arbitrary sequences of finite- 
dimensional C*-algebras, and gave a classification of 
these algebras in terms of the embedding multiplicity 
data in the sequences. This was exactly analogous to 
the original classification of Glimm, but now vastly 
more complex, with the multiplicity data of the 
sequence encoded in what is now called a Bratteli 
diagram. (Note that a finite-dimensional C*-algebra 
is just a direct sum of matrix algebras over the 
complex numbers.) Bratteli diagrams have proved to 
be very important, and in particular have been shown 
by Putnam and others to be useful for the study of 
minimal homeomorphisms of the Cantor set. 
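
The multiplicity data of a Bratteli diagram can be sketched as a list of nonnegative integer matrices, one per step of the sequence. In the sketch below, entry `a[i][j]` is taken to be the number of times summand j of one stage is embedded in summand i of the next stage, and the embeddings are assumed to be unital, so that the matrix sizes at the next stage are obtained by applying the multiplicity matrix to the current sizes. These conventions, and both sample diagrams, are assumptions of the illustration.

```python
def level_sizes(initial_sizes, multiplicities):
    """Matrix sizes of the direct summands at each stage of a Bratteli diagram.

    Assumes unital embeddings, so that sizes at stage k+1 equal the
    multiplicity matrix of step k applied to the sizes at stage k.
    """
    levels = [list(initial_sizes)]
    for a in multiplicities:
        prev = levels[-1]
        levels.append([sum(row[j] * prev[j] for j in range(len(prev)))
                       for row in a])
    return levels


# Each stage is a single matrix algebra embedded with multiplicity 2:
# this is the infinite tensor product of 2x2 matrix algebras.
doubling = level_sizes([1], [[[2]]] * 3)

# Two summands at each stage, with the same multiplicity matrix at every step.
two_summands = level_sizes([1, 1], [[[1, 1], [1, 0]]] * 4)
```

In the second example the sizes grow as (1,1), (2,1), (3,2), (5,3), (8,5), so the algebra at the last stage shown is the direct sum of the 8 x 8 and 5 x 5 matrix algebras.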

Bratteli’s extension of Glimm’s tensor product
classification was followed by a corresponding 
extension by the present author of Dixmier’s 
approach to Glimm’s result. It was no longer 
possible to express the appropriate data in terms of 
traces (even in the case of a unique normalized 
trace). Instead, the present author recalled the 
concept of equivalence of projections introduced 
by Murray and von Neumann forty years earlier, 
together with the fact, proved by Murray and von 
Neumann, that equivalence is compatible with 
addition of orthogonal projections. (Two projec- 
tions in a *-algebra are equivalent if they are equal 
to x*x and xx* for some element x.) The resulting
elementary invariant is the set of equivalence classes
of projections with the operation of addition
whenever defined (whenever the equivalence classes
to be added have orthogonal representatives); one
might refer to this as a local abelian semigroup.
This invariant, which was used by Murray and von Neumann to
divide von Neumann algebras into what they called
types I, II, and III, was shown by the author to
determine Bratteli’s algebras up to isomorphism.
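
Murray-von Neumann equivalence is easy to see in the smallest non-commutative example, the algebra of 2 x 2 matrices. The sketch below takes x to be a matrix unit (an illustrative choice) and checks that x*x and xx* are two different projections, that they are orthogonal (so that their sum is defined), and that the trace takes the same value on both.

```python
def adjoint(x):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(x)
    return [[x[j][i].conjugate() for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(x):
    return sum(x[i][i] for i in range(len(x)))


def is_projection(p):
    """Self-adjoint idempotent."""
    return p == adjoint(p) and matmul(p, p) == p


x = [[0, 1], [0, 0]]           # the matrix unit sending e2 to e1
p = matmul(adjoint(x), x)      # x* x, the projection onto e2
q = matmul(x, adjoint(x))      # x x*, the projection onto e1

equivalent_projections = is_projection(p) and is_projection(q)
orthogonal = matmul(p, q) == [[0, 0], [0, 0]]
same_trace = trace(p) == trace(q)
```

Here p and q are distinct but equivalent, and their sum is the unit, so the class of the unit is twice the class of p in the dimension range.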

Bratteli called his algebras approximately finite- 
dimensional C*-algebras, or AF algebras. The author 
referred to his invariant simply as the range of the 
(abstract) dimension, and pointed out that this 
structure determined an enveloping ordered abelian 
group, which he called the dimension group. It was 
soon noticed that the dimension group was related 
to the K-group introduced by Grothendieck in 
algebraic geometry (see K-Theory), and by Atiyah 
and Hirzebruch (see K-Theory) in topology. 

Grothendieck’s K-group was defined for an arbi- 
trary ring with unit, and Atiyah and Hirzebruch in 
effect considered the special case of the ring of 
continuous functions on a compact Hausdorff space — 
in other words, a commutative C*-algebra — in the 
process showing that the deep phenomenon of Bott 
periodicity could be expressed in terms of this 
invariant. The invariant itself (see below) is essen- 
tially the same as that of Murray and von Neumann. 
In the special case that the ring is an AF algebra, the 
K-group coincides with the dimension group. (The 
K-group has a natural ordered, or pre-ordered, 
structure, although this was often suppressed.) 

Let us consider the definition of the K-group of a 
not necessarily unital C*-algebra; it is in this setting 
that the statement of Bott periodicity attains its 
simplest form. 

First, in the unital case, one constructs the abelian 
local semigroup (addition just partially defined) of 
Murray-von Neumann equivalence classes of projections,
as described above in the case of an AF
algebra. Let us call this the dimension range. As 
stated above, for AF algebras this is all that needs to 
be done — the enveloping group of the dimension 
range is already the K-group. In the general case, 
one must repeat the construction for the algebra of 
2 x 2 matrices over the given algebra, with the given 
algebra considered as embedded as the upper left- 
hand corner of the matrix algebra. The dimension 
range of the given algebra then maps naturally into 
(but not necessarily onto) the dimension range of the 
matrix algebra. One should then repeat this con- 
struction, doubling the order of the matrix algebra 
at every stage (or, alternatively, increasing it just by 
one). The enveloping group of the (algebraic) 
inductive limit of this sequence of local semigroups 
is then the K-group of the given algebra. (Alterna- 
tively, one may just consider immediately the 
*-algebra of all infinite matrices over the given 


396 C’*-Algebras and their Classification 


C*-algebra with only finitely many nonzero entries, 
and form the dimension range of this *-algebra — and 
the enveloping group of this abelian local semi- 
group, now in fact a semigroup.) 

In the case of a nonunital C*-algebra, one adjoins 
a unit (as may be done, for instance, by representing 
the C*-algebra faithfully on a Hilbert space, and 
showing that the C*-algebra obtained by adjoining 
the identity operator is independent of the representa- 
tion — actually, one need only check that the *-algebra 
structure is unique, as the C*-algebra norm on a 
C*-algebra is always determined by the *-algebra 
structure). The K-group of the resulting unital 
C*-algebra then maps naturally into the K-group of 
the natural one-dimensional quotient, and the kernel 
of this map is, for reasons that will become clearer 
later, defined to be the K-group of the nonunital 
algebra. 

Atiyah and Hirzebruch in fact referred to the
K-group of the C*-algebra as $K_0$, the reason being
that there is another very natural group to consider,
namely, the K-group of the suspension of the
C*-algebra. (The suspension, SA, of a C*-algebra A
is defined as the C*-algebra of all continuous
functions from the real line $\mathbb{R}$ into A which converge
to zero at $\pm\infty$, with the pointwise *-algebra
operations and the supremum norm. It may also be
defined as the (unique) C*-algebra tensor product
$A \otimes C_0(\mathbb{R})$, where $C_0(\mathbb{R})$ denotes the suspension of
the C*-algebra $\mathbb{C}$ of complex numbers.) Denoting
the $K_0$-group of the suspension of a given C*-algebra
by $K_1$, one might expect this process to continue,
but in fact it is periodic ($K_0, K_1, K_0, K_1, \ldots$). Bott
periodicity states that there is a natural isomorphism
of $K_2$ with $K_0$. (C*-algebras can also be defined with
the field of real numbers as scalars, and in this case
the period of Bott periodicity is eight.)

Another way of stating Bott periodicity, or, more 
precisely, of embedding it into the K-theory of 
C*-algebras, is as follows. Given a short exact 
sequence of C*-algebras, 


$$0 \to J \to A \to A/J \to 0 \qquad [3]$$


i.e., given a C*-algebra A and a closed two-sided 
ideal J (the quotient *-algebra is then a C*-algebra 
with the quotient norm) — A is sometimes referred to 
as an extension of J by A/J — consider the natural 
short (not necessarily exact) sequences 


$$K_0(J) \to K_0(A) \to K_0(A/J) \qquad [4]$$

and

$$K_1(J) \to K_1(A) \to K_1(A/J) \qquad [5]$$


($K_0$ and $K_1$ are functors!). There exist natural connecting
maps $K_1(A/J) \to K_0(J)$ and $K_0(A/J) \to K_1(J)$, the
first referred to as the index map, and the second
(sometimes referred to as the odd-order index map)
obtained from this immediately from Bott periodicity
(as stated above), such that the periodic six-term
sequence


$$\begin{array}{ccccc}
K_0(J) & \longrightarrow & K_0(A) & \longrightarrow & K_0(A/J) \\
\uparrow & & & & \downarrow \\
K_1(A/J) & \longleftarrow & K_1(A) & \longleftarrow & K_1(J)
\end{array}$$


is exact. (The periodicity stated above can also be 
recovered from this.) 

Given that the functor $K_0$ classifies AF algebras,
one might expect the functor $K_1$ to be useful for
classification purposes also. In fact, this is the case.
(Indeed, as shown by Brown, the $K_1$-functor is
already important for the theory of AF algebras, in
spite of, or even because of (!), the fact that the
$K_1$-group of an AF algebra is zero.) Using the six-term
exact sequence of Bott periodicity described
above, corresponding to an extension of C*-algebras, 
together with results of the present author, Brown 
showed that any extension of one AF algebra by 
another is again an AF algebra. 

A rather large class of simple unital C*-algebras
has by now been classified by means of the
invariants $K_0$ and $K_1$ (together with the class of
the unit in $K_0$, and the order, or pre-order, structure
on $K_0$), also taking into account the compact
convex set of tracial states on the C*-algebra.
(A positive linear functional on a C*-algebra is called
a trace if it has the same value on x*x and xx* for
every element x, and a tracial state if it is a state,
that is, has norm 1, or has value 1 on the unit in the
case the algebra has a unit.) In addition to the set of
tracial states, together with its natural topology and
convex structure, one should also keep track of the
natural pairing between traces and $K_0$ (any trace on
a unital C*-algebra has the same value on two
equivalent projections, equal to x*x and xx* for
some element x, and hence gives rise to an additive
real-valued functional on $K_0$).

In terms of these invariants (which might, broadly
speaking, be called K-theoretical), it has been
possible to classify the simple unital C*-algebras
(not of type I) arising as inductive limits (i.e., as the
completions of increasing unions) of sequences of
finite direct sums of matrix algebras over separable
commutative C*-algebras, these assumed to have
spectra of dimension at most three, on the one hand
(work of the present author together with Guihua
Gong and Liangqing Li, a culmination of earlier
work of these authors together with a number of
others), and, on the other hand, it has been possible
(work of Kirchberg and Phillips, also based on
earlier work by a number of authors) to classify the
C*-algebra tensor products (in a natural sense) of
these C*-algebras with what is called the Cuntz
C*-algebra $O_\infty$ (see below). In the first of these two
cases, the compact convex set of tracial states,
always a Choquet simplex, is an arbitrary (metrizable)
such space.

In the second case, this space is empty (as it is for
$O_\infty$ in particular). In both cases, $K_0$ and $K_1$ are
arbitrary countable abelian groups, with the proviso
that $K_0$ is not the sum of a torsion group and a
cyclic group. In the first case, the order structure on
$K_0$, the class of the unit element, and the pairing of
$K_0$ with the space of traces have certain special
properties; as it turns out, these can be expressed in
a simple way. (The class of the unit need only be
positive and nonzero.) In the second case, the order
structure on $K_0$ is degenerate (every element is
positive) and the class of the unit can be arbitrary
(including zero!).

Let us just note that the Cuntz C*-algebra $O_\infty$ is
the unital C*-algebra generated by an infinite
sequence $s_1, s_2, \ldots$ of isometries with orthogonal
ranges (in other words, elements $s_i$ such that $s_i^* s_i$ is
the unit and $s_i^* s_j = 0$ if $i \neq j$). One need not require
the C*-algebra to have the universal property with
respect to these generators and relations as it is in
fact unique (up to an isomorphism preserving these
generators). In particular, this C*-algebra is simple.
(If one considers a finite sequence of isometries with
orthogonal ranges, and assumes in addition that the
sum of these is the unit, one also obtains a simple
C*-algebra, the Cuntz C*-algebra $O_n$, $n = 2, 3, \ldots$.)
The $K_0$-group and $K_1$-group of $O_\infty$ are, respectively,
$\mathbb{Z}$ and 0. (The $K_0$-group and $K_1$-group of $O_n$ for
$n = 2, 3, \ldots$ are, respectively, $\mathbb{Z}/(n-1)\mathbb{Z}$ and 0.)

Both classes of C*-algebras considered in the 
classification result stated above, although des- 
cribed in rather a concrete way (in terms of 
inductive limits and tensor products), can also be 
characterized axiomatically, in a way that makes it 
clear that they are, in fact, much more general than 
they seem. (These axiomatizations are due to 
Lin and to Kirchberg and Phillips. Typically, the 
abstract axioms are easier to establish in a 
given case than the inductive limit form described 
above.) 

In view of this, and the fact that one of the axioms 
is a notion of amenability (the analogous property 
for C*-algebras of a notion that has also been 
considered for von Neumann algebras) and since 
amenable von Neumann algebras (on a separable 
Hilbert space) have been classified completely (in 
remarkable work of Connes, together with many 
others, starting with Murray and von Neumann — 
and, one must also mention, ending with Haagerup,
who settled a particularly stubborn case), it is
natural to ask whether the K-theoretical invariants 
described above might be sufficient to classify all 
amenable separable C*-algebras, say, those which 
are simple and unital. 

The work of Villadsen has shown that additional 
invariants must in fact be considered, if one is to 
deal with arbitrary amenable simple C*-algebras, 
and this has been confirmed in subsequent work of 
Rørdam and of Toms. (Villadsen’s examples were 
obtained by removing the condition of low dimen- 
sion on the spectra of the commutative C*-algebras 
appearing in the inductive limit decomposition 
considered above.) The very nature of these authors’ 
work, however, has been to introduce additional 
invariants, all of which it seems natural to consider 
as, broadly speaking, K-theoretical. (And all of 
which, as it happens, are already familiar.) 

The question of the classifiability, in terms of 
simple invariants (K-theoretical in nature, at least in 
the broad sense, and including the spectrum which is 
indispensable in the nonsimple case), of all (separ- 
able) amenable C*-algebras would therefore still 
appear to be on the agenda. 

Already, in any case, just like the analogous 
question for von Neumann algebras (now settled), 
this question would appear to have had a noticeable 
influence on the development of the subject — not 
least in underlining the importance of K-theoretical 
methods, which have proved to be pertinent both in 
connection with the index theory of differential 
operators on geometrical structures (from foliations
to fractals) and in connection with questions in
physics, related to quantum statistical mechanics 
(see e.g., Quantum Hall Effect), to quantum field 
theory (e.g., the standard model), and even to string 
theory and M-theory. 


See also: Axiomatic Quantum Field Theory; Bosons and 
Fermions in External Fields; The Jones Polynomial; 
K-Theory; Positive Maps on C*-Algebras; Quantum Hall
Effect; von Neumann Algebras: Introduction, Modular 
Theory, and Classification Theory; von Neumann 
Algebras: Subfactor Theory. 
Calibrated Geometry


“Calibrated geometry,” introduced by Harvey and
Lawson (1982), is the study of special classes of
“minimal submanifolds” N of a Riemannian manifold
(M,g), defined using a closed form $\varphi$ on M
called a calibration. For example, if (M,J,g) is a
Kähler manifold with Kähler form $\omega$, then complex
k-submanifolds of M are calibrated with respect to
$\varphi = \omega^k/k!$. Another important class of calibrated
submanifolds are special Lagrangian submanifolds
in Calabi-Yau manifolds, which are the focus of the
section “Special Lagrangian geometry.”


Calibrations and Calibrated Submanifolds 


We begin by defining “calibrations” and “calibrated 
submanifolds.” 


Definition 1 Let (M,g) be a Riemannian manifold.
An “oriented tangent k-plane” V on M is a vector
subspace V of some tangent space $T_xM$ to M with
$\dim V = k$, equipped with an orientation. If V is an
oriented tangent k-plane on M then $g|_V$ is a
Euclidean metric on V; so, combining $g|_V$ with the
orientation on V gives a natural volume form $\mathrm{vol}_V$
on V, which is a k-form on V.


Now let $\varphi$ be a closed k-form on M. We say that $\varphi$
is a calibration on M if for every oriented k-plane
V on M, $\varphi|_V \le \mathrm{vol}_V$. Here, $\varphi|_V = \alpha \cdot \mathrm{vol}_V$ for some
$\alpha \in \mathbb{R}$, and $\varphi|_V \le \mathrm{vol}_V$ if $\alpha \le 1$. Let N be an
oriented submanifold of M with dimension k. Then
each tangent space $T_xN$ for $x \in N$ is an oriented
tangent k-plane. We say that N is a calibrated
submanifold if $\varphi|_{T_xN} = \mathrm{vol}_{T_xN}$ for all $x \in N$.

It is easy to show that calibrated submanifolds 
are automatically “minimal submanifolds.” We 
prove this in the compact case, but noncompact 
calibrated submanifolds are locally volume-minimizing 
as well. 
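
The calibration inequality can be checked numerically in the simplest Kähler example. The sketch below assumes the flat model $\mathbb{R}^4 = \mathbb{C}^2$ with coordinates $(x_1, y_1, x_2, y_2)$ and the standard 2-form $\omega = dx_1 \wedge dy_1 + dx_2 \wedge dy_2$; the sampling scheme and helper names are illustrative choices. For an oriented 2-plane with oriented orthonormal basis (u, v), the number $\alpha$ in the definition is $\omega(u, v)$.

```python
import math
import random


def omega(u, v):
    """Standard Kaehler form on R^4 = C^2 evaluated on two vectors."""
    return u[0] * v[1] - u[1] * v[0] + u[2] * v[3] - u[3] * v[2]


def J(u):
    """Multiplication by i in the coordinates (x1, y1, x2, y2)."""
    return [-u[1], u[0], -u[3], u[2]]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(u):
    length = math.sqrt(dot(u, u))
    return [a / length for a in u]


def random_orthonormal_pair(rng):
    """Gram-Schmidt applied to two Gaussian random vectors in R^4."""
    u = normalize([rng.gauss(0, 1) for _ in range(4)])
    w = [rng.gauss(0, 1) for _ in range(4)]
    c = dot(w, u)
    v = normalize([wi - c * ui for wi, ui in zip(w, u)])
    return u, v


rng = random.Random(0)
alphas = [omega(*random_orthonormal_pair(rng)) for _ in range(1000)]
never_exceeds_one = all(a <= 1 + 1e-9 for a in alphas)

# On a complex line, spanned by u and J(u), the bound is attained.
u = normalize([1.0, 2.0, -1.0, 0.5])
alpha_complex_line = omega(u, J(u))
```

The sampled values of $\alpha$ stay at or below 1, and the value 1 is reached on the complex line, which is therefore a calibrated plane.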


Proposition 2 Let (M,g) be a Riemannian manifold,
$\varphi$ a calibration on M, and N a compact
$\varphi$-submanifold in M. Then N is volume-minimizing
in its homology class.


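Proof Let $\dim N = k$, and let $[N] \in H_k(M,\mathbb{R})$ and
$[\varphi] \in H^k(M,\mathbb{R})$ be the homology and cohomology
classes of N and $\varphi$. Then


$$[\varphi]\cdot[N] = \int_{x\in N} \varphi|_{T_xN} = \int_{x\in N} \mathrm{vol}_{T_xN} = \mathrm{Vol}(N)$$


since $\varphi|_{T_xN} = \mathrm{vol}_{T_xN}$ for each $x \in N$, as N is a
calibrated submanifold. If N’ is any other compact
k-submanifold of M with [N’]=[N] in $H_k(M,\mathbb{R})$,
then


$$[\varphi]\cdot[N] = [\varphi]\cdot[N'] = \int_{x\in N'} \varphi|_{T_xN'} \le \int_{x\in N'} \mathrm{vol}_{T_xN'} = \mathrm{Vol}(N')$$


since $\varphi|_{T_xN'} \le \mathrm{vol}_{T_xN'}$ because $\varphi$ is a calibration. The
last two equations give $\mathrm{Vol}(N) \le \mathrm{Vol}(N')$. Thus, N
is volume-minimizing in its homology class. $\square$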


Now let (M, g) be a Riemannian manifold with a
calibration $\varphi$, and let $\iota: N \to M$ be an immersed
submanifold. Whether N is a $\varphi$-submanifold
depends upon the tangent spaces of N. That is, it
depends on $\iota$ and its first derivative. So, for N to be
calibrated with respect to $\varphi$ is a first-order partial
differential equation on $\iota$. But if N is calibrated then
N is minimal, and for N to be minimal is a second-order
partial differential equation on $\iota$.

One moral is that the calibrated equations, being 
first order, are often easier to solve than the minimal 
submanifold equations, which are second order. So 
calibrated geometry is a fertile source of examples of 
minimal submanifolds. 


Calibrated Submanifolds and Special Holonomy 


A calibration $\varphi$ on (M,g) is only interesting if there
exist plenty of $\varphi$-submanifolds N in M, locally
or globally. Since $\varphi|_{T_xN} = \mathrm{vol}_{T_xN}$ for each $x \in N$,
$\varphi$-submanifolds will be abundant only if the family
$\mathcal{F}_\varphi$ of calibrated tangent k-planes V with $\varphi|_V = \mathrm{vol}_V$
is “reasonably large”, say, if $\mathcal{F}_\varphi$ has small
codimension in the family of all tangent k-planes V
on M. A maximally boring example is the k-form
$\varphi = 0$, which is a calibration but has no calibrated
tangent k-planes, so no $\varphi$-submanifolds.

Thus, most calibrations $\varphi$ will have few or no
$\varphi$-submanifolds, and only special calibrations $\varphi$ with
$\mathcal{F}_\varphi$ large will have interesting calibrated geometries.
Now the field of Riemannian holonomy groups is a
natural companion for calibrated geometry, because
it gives a simple way to generate interesting
calibrations $\varphi$ which automatically have $\mathcal{F}_\varphi$ large.

Let $G \subset O(n)$ be a possible holonomy group of a
Riemannian metric. In particular, we can take G to be
one of the holonomy groups U(m), SU(m), Sp(m), $G_2$,
or Spin(7) from Berger’s classification. Then G acts
on the k-forms $\Lambda^k(\mathbb{R}^n)^*$ on $\mathbb{R}^n$, so we can look for
G-invariant k-forms on $\mathbb{R}^n$. Suppose $\varphi_0$ is a nonzero,
G-invariant k-form on $\mathbb{R}^n$.

By rescaling $\varphi_0$ we can arrange that for each
oriented k-plane $U \subset \mathbb{R}^n$, we have $\varphi_0|_U \le \mathrm{vol}_U$, and
that $\varphi_0|_U = \mathrm{vol}_U$ for at least one such U. Let H be the
stabilizer subgroup of this U in G. Then $\varphi_0|_{\gamma\cdot U} = \mathrm{vol}_{\gamma\cdot U}$
by G-invariance, so $\gamma\cdot U$ is a calibrated
k-plane for all $\gamma \in G$. Thus, the family $\mathcal{F}_0$ of
$\varphi_0$-calibrated k-planes in $\mathbb{R}^n$ contains G/H, so it is
“reasonably large,” and it is likely that the calibrated
submanifolds will have an interesting geometry.

Now let M be a manifold of dimension n, and g
a metric on M with Levi-Civita connection $\nabla$ and
holonomy group G. Then there is a k-form $\varphi$ on M
with $\nabla\varphi = 0$, corresponding to $\varphi_0$. Hence $d\varphi = 0$,
and $\varphi$ is closed. Also, the condition $\varphi_0|_U \le \mathrm{vol}_U$ for
all oriented k-planes U in $\mathbb{R}^n$ implies that $\varphi|_V \le \mathrm{vol}_V$
for all oriented tangent k-planes V in M. Thus,
$\varphi$ is a calibration on M. The family $\mathcal{F}_\varphi$ of calibrated
tangent k-planes on M fibers over M with fiber $\mathcal{F}_0$;
so, it is “reasonably large.”

This gives a general method for finding interesting 
calibrations on manifolds with reduced holonomy. 
Here are the most significant examples. 


• Let $G = U(m) \subset O(2m)$. Then G preserves a
2-form $\omega_0$ on $\mathbb{R}^{2m}$. If g is a metric on M with
holonomy U(m), then g is Kähler with complex
structure J, and the 2-form $\omega$ on M associated to
$\omega_0$ is the Kähler form of g.

One can show that $\omega$ is a calibration on (M, g),
and the calibrated submanifolds are exactly the
“holomorphic curves” in (M, J). More generally,
$\omega^k/k!$ is a calibration on M for $1 \le k \le m$, and
the corresponding calibrated submanifolds are the
complex k-dimensional submanifolds of (M, J).

• Let $G = SU(m) \subset O(2m)$. Then G preserves a
complex volume form $\Omega_0 = dz_1 \wedge \cdots \wedge dz_m$ on
$\mathbb{C}^m$. Thus, a Calabi-Yau m-fold (M, g) with
Hol(g) = SU(m) has a holomorphic volume form
$\Omega$. The real part $\mathrm{Re}\,\Omega$ is a calibration on M, and
the corresponding calibrated submanifolds are
called special Lagrangian submanifolds.

• The group $G_2 \subset O(7)$ preserves a 3-form $\varphi_0$ and a
4-form $*\varphi_0$ on $\mathbb{R}^7$. Thus, a Riemannian 7-manifold
(M, g) with holonomy $G_2$ comes with a 3-form $\varphi$
and 4-form $*\varphi$, which are both calibrations. The
corresponding calibrated submanifolds are called
associative 3-folds and coassociative 4-folds.

• The group $\mathrm{Spin}(7) \subset O(8)$ preserves a 4-form $\Omega_0$
on $\mathbb{R}^8$. Thus a Riemannian 8-manifold (M, g) with
holonomy Spin(7) has a 4-form $\Omega$, which is a
calibration. The $\Omega$-submanifolds are called Cayley
4-folds.
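
The inequality $\omega_0|_U \le \mathrm{vol}_U$ in the U(m) example (the case k = 1 of Wirtinger's inequality) can be spot-checked numerically. The following is a minimal sketch for m = 2, assuming the coordinates $(x_1, y_1, x_2, y_2)$ on $\mathbb{R}^4 = \mathbb{C}^2$ and $\omega_0 = dx_1 \wedge dy_1 + dx_2 \wedge dy_2$; the function names are illustrative and not taken from the article. It evaluates $\omega_0$ on random oriented orthonormal pairs (u, v), and on the complex lines spanned by (u, Ju), where equality is attained.

```python
import math
import random


def omega0(u, v):
    """Standard Kaehler form dx1^dy1 + dx2^dy2 on R^4, coordinates (x1, y1, x2, y2)."""
    return (u[0] * v[1] - u[1] * v[0]) + (u[2] * v[3] - u[3] * v[2])


def J(u):
    """Complex structure on R^4 = C^2: multiplication by i in each (x, y) pair."""
    return [-u[1], u[0], -u[3], u[2]]


def orthonormal_pair(rng):
    """Random oriented orthonormal basis (u, v) of a 2-plane, by Gram-Schmidt."""
    u = [rng.gauss(0.0, 1.0) for _ in range(4)]
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm_u = math.sqrt(sum(a * a for a in u))
    u = [a / norm_u for a in u]
    overlap = sum(a * b for a, b in zip(u, v))
    v = [b - overlap * a for a, b in zip(u, v)]
    norm_v = math.sqrt(sum(b * b for b in v))
    v = [b / norm_v for b in v]
    return u, v


def largest_value(samples=1000, seed=0):
    """Largest omega0(u, v) seen over random oriented 2-planes."""
    rng = random.Random(seed)
    return max(omega0(*orthonormal_pair(rng)) for _ in range(samples))
```

Since the volume form of an oriented 2-plane takes the value 1 on an oriented orthonormal basis, `omega0(u, v) <= 1` is exactly $\omega_0|_U \le \mathrm{vol}_U$, and `omega0(u, J(u)) == 1` for unit u shows that complex lines are calibrated.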


It is an important general principle that to each
calibration $\varphi$ on an n-manifold (M, g) with special
holonomy constructed in this way, there corresponds
a constant calibration $\varphi_0$ on $\mathbb{R}^n$. Locally,
$\varphi$-submanifolds in M resemble the $\varphi_0$-submanifolds in
$\mathbb{R}^n$, and have many of the same properties. Thus, to
understand the calibrated submanifolds in a manifold
with special holonomy, it is often a good idea to
start by studying the corresponding calibrated
submanifolds of $\mathbb{R}^n$.

In particular, singularities of $\varphi$-submanifolds in M
will be locally modeled on singularities of
$\varphi_0$-submanifolds in $\mathbb{R}^n$. (In the sense of geometric
measure theory, the tangent cone at a singular point
of a $\varphi$-submanifold in M is a conical $\varphi_0$-submanifold
in $\mathbb{R}^n$.) So by studying singular $\varphi_0$-submanifolds in
$\mathbb{R}^n$, we may understand the singular behavior of
$\varphi$-submanifolds in M.


Special Lagrangian Geometry 


We now focus on one class of calibrated submani- 
folds, special Lagrangian submanifolds in Calabi- 
Yau manifolds. Calabi-Yau 3-folds are used to 
make the spacetime vacuum in string theory, and 
special Lagrangian 3-folds are the classical versions 
of A-branes, or supersymmetric 3-cycles, in Calabi- 
Yau 3-folds. Special Lagrangian geometry aroused 
great interest amongst string theorists because of its 
role in the SYZ conjecture, providing a geometric 
basis for “mirror symmetry” of Calabi-Yau 3-folds. 


Calabi-Yau Manifolds 


Here is our definition of Calabi-Yau manifold. 
Readers are warned that there are several different 
definitions of Calabi-Yau manifolds in use in the 
literature. Ours is unusual in regarding $\Omega$ as part of
the given structure.

Definition 3 Let $m \ge 2$. A Calabi-Yau m-fold is a
quadruple $(M, J, g, \Omega)$ such that (M, J) is a compact
m-dimensional complex manifold, g a Kähler metric
on (M, J) with Kähler form $\omega$, and $\Omega$ a holomorphic
(m,0)-form on M called the holomorphic volume
form, which satisfies


$$\omega^m/m! = (-1)^{m(m-1)/2}\,(i/2)^m\,\Omega \wedge \bar\Omega$$   [1]


The constant factor in [1] is chosen to make $\mathrm{Re}\,\Omega$ a
calibration. It follows from [1] that g is Ricci-flat, $\Omega$
is constant under the Levi-Civita connection, and
the holonomy group of g has $\mathrm{Hol}(g) \subseteq SU(m)$.


Let (M, J) be a compact, complex manifold, and g
a Kähler metric on M, with Ricci curvature $R_{ab}$. Define
the Ricci form $\rho$ of g by $\rho_{ac} = J_a^b R_{bc}$. Then $\rho$ is a closed
real (1,1)-form on M, with de Rham cohomology class
$[\rho] = 2\pi c_1(M) \in H^2(M, \mathbb{R})$, where $c_1(M)$ is the first
Chern class of M in $H^2(M, \mathbb{Z})$. The Calabi conjecture
specifies which closed (1,1)-forms can be the Ricci
forms of a Kähler metric on M.


The Calabi conjecture Let (M, J) be a compact,
complex manifold, and $g'$ a Kähler metric on M,
with Kähler form $\omega'$. Suppose that $\rho$ is a real, closed
(1,1)-form on M with $[\rho] = 2\pi c_1(M)$. Then there
exists a unique Kähler metric g on M with Kähler
form $\omega$, such that $[\omega] = [\omega'] \in H^2(M, \mathbb{R})$, and the
Ricci form of g is $\rho$.


Note that $[\omega] = [\omega']$ says that g and $g'$ are in the
same Kähler class. The conjecture was posed by Calabi
in 1954, and was eventually proved by Yau in 1976.
Its importance to us is that when the canonical bundle
$K_M$ is trivial, so that $c_1(M) = 0$, we can take $\rho = 0$, and
then g is Ricci-flat. Since $K_M$ is trivial, it has a nonzero
holomorphic section, a holomorphic (m,0)-form $\Omega$. As
g is Ricci-flat, it follows that $\nabla\Omega = 0$, where $\nabla$ is the
Levi-Civita connection of g. Rescaling $\Omega$ by a complex
constant makes [1] hold, and then $(M, J, g, \Omega)$ is a
Calabi-Yau m-fold. This proves:


Theorem 4 Let (M, J) be a compact complex
m-manifold with $K_M$ trivial. Then every Kähler class
on M contains a unique Ricci-flat Kähler metric g.
There exists a holomorphic (m,0)-form $\Omega$, unique
up to change of phase $\Omega \mapsto e^{i\theta}\Omega$, such that
$(M, J, g, \Omega)$ is a Calabi-Yau m-fold.


Using algebraic geometry, one can produce many
examples of complex m-folds (M, J) satisfying these
conditions, such as the Fermat (m + 2)-tic


$$\bigl\{[z_0, \ldots, z_{m+1}] \in \mathbb{CP}^{m+1} : z_0^{m+2} + \cdots + z_{m+1}^{m+2} = 0\bigr\}$$


Therefore, Calabi-Yau m-folds are very abundant. 


Special Lagrangian Submanifolds 


Definition 5 Let $(M, J, g, \Omega)$ be a Calabi-Yau m-fold.
Then $\mathrm{Re}\,\Omega$ is a calibration on the Riemannian
manifold (M, g). An oriented real m-dimensional
submanifold N in M is called a special Lagrangian
submanifold (SL m-fold) if it is calibrated with respect
to $\mathrm{Re}\,\Omega$.


Here is an alternative definition of SL m-folds. It 
is often more useful than Definition 5. 


Proposition 6 Let $(M, J, g, \Omega)$ be a Calabi-Yau
m-fold, with Kähler form $\omega$, and N a real m-dimensional
submanifold in M. Then N admits an
orientation making it into an SL m-fold in M if
and only if $\omega|_N \equiv 0$ and $\mathrm{Im}\,\Omega|_N \equiv 0$.
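
As an illustration of the two conditions in Proposition 6 in the flat model $\mathbb{C}^3$ (a sketch under stated assumptions, not a construction from the article), consider the real 3-planes spanned by $e^{i\theta_j} e_j$, j = 1, 2, 3, with $\omega_0(u, v) = \mathrm{Im} \sum_j \bar u_j v_j$ and $\Omega_0 = dz_1 \wedge dz_2 \wedge dz_3$. All such planes are Lagrangian, and $\Omega_0$ evaluated on this basis equals $e^{i(\theta_1 + \theta_2 + \theta_3)}$, so the plane is special Lagrangian with this orientation exactly when the phases sum to a multiple of $2\pi$. The helper names below are hypothetical.

```python
import cmath


def omega0(u, v):
    """Kaehler form on C^3: omega0(u, v) = Im(sum_j conj(u_j) * v_j)."""
    return sum((a.conjugate() * b).imag for a, b in zip(u, v))


def det3(a, b, c):
    """Determinant of the 3x3 complex matrix with rows a, b, c.

    This is the value of dz1^dz2^dz3 on the vectors (a, b, c).
    """
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def phase_plane(thetas):
    """Orthonormal basis of the real 3-plane spanned by exp(i*theta_j) e_j."""
    basis = []
    for j, theta in enumerate(thetas):
        vec = [0j, 0j, 0j]
        vec[j] = cmath.exp(1j * theta)
        basis.append(vec)
    return basis


def restrictions(basis):
    """Return (max |omega0| on basis pairs, Re Omega0(basis), Im Omega0(basis))."""
    lagrangian_defect = max(abs(omega0(u, v)) for u in basis for v in basis)
    volume = det3(*basis)
    return lagrangian_defect, volume.real, volume.imag
```

For example, `restrictions(phase_plane([0.3, 0.5, -0.8]))` has vanishing first and third entries and second entry 1 up to rounding, while for phases `[0.3, 0.5, 0.0]` the plane is still Lagrangian but $\mathrm{Im}\,\Omega_0$ does not vanish on it.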


Regard N as an immersed submanifold, with
immersion $\iota: N \to M$. Then $[\omega|_N]$ and $[\mathrm{Im}\,\Omega|_N]$ are
unchanged under continuous variations of the
immersion $\iota$. Thus, $[\omega|_N] = [\mathrm{Im}\,\Omega|_N] = 0$ is a necessary
condition not just for N to be special
Lagrangian, but also for any isotopic submanifold
$N'$ in M to be special Lagrangian. This proves:


Corollary 7 Let $(M, J, g, \Omega)$ be a Calabi-Yau
m-fold, and N a compact real m-submanifold in M.
Then a necessary condition for N to be isotopic
to a special Lagrangian submanifold $N'$ in M
is that $[\omega|_N] = 0$ in $H^2(N, \mathbb{R})$ and $[\mathrm{Im}\,\Omega|_N] = 0$ in
$H^m(N, \mathbb{R})$.


Deformations of Compact SL m-Folds 


The deformation theory of compact special Lagran- 
gian manifolds was studied by McLean (1998), who 
proved the following result: 


Theorem 8 Let $(M, J, g, \Omega)$ be a Calabi-Yau
m-fold, and N a compact special Lagrangian
m-fold in M. Then the moduli space $\mathcal{M}_N$ of special
Lagrangian deformations of N is a smooth manifold
of dimension $b^1(N)$, the first Betti number of N.


Sketch proof. Suppose for simplicity that N is an
embedded submanifold. There is a natural orthogonal
decomposition $TM|_N = TN \oplus \nu$, where $\nu \to N$ is
the normal bundle of N in M. As N is Lagrangian,
the complex structure $J: TM \to TM$ gives an
isomorphism $J: \nu \to TN$. But the metric g gives an
isomorphism $TN \cong T^*N$. Composing these two
gives an isomorphism $\nu \cong T^*N$.

Let T be a small tubular neighborhood of N in M.
Then we can identify T with a neighborhood of the
zero section in $\nu$. Using the isomorphism $\nu \cong T^*N$, we
have an identification between T and a neighborhood of
the zero section in $T^*N$. This can be chosen to identify
the Kähler form $\omega$ on T with the natural symplectic
structure on $T^*N$. Let $\pi: T \to N$ be the obvious
projection.

Under this identification, submanifolds $N'$ in $T \subset
M$ which are $C^1$ close to N are identified with the
graphs of small smooth sections $\alpha$ of $T^*N$. That is,
submanifolds $N'$ of M close to N are identified with
1-forms $\alpha$ on N. We need to know: which 1-forms $\alpha$
are identified with SL m-folds $N'$?

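Now, $N'$ is special Lagrangian if $\omega|_{N'} \equiv \mathrm{Im}\,\Omega|_{N'} \equiv 0$.
But $\pi|_{N'}: N' \to N$ is a diffeomorphism, so we can
push $\omega|_{N'}$ and $\mathrm{Im}\,\Omega|_{N'}$ down to N, and regard them
as functions of $\alpha$. Calculation shows that


$$\pi_*(\omega|_{N'}) = d\alpha \quad\text{and}\quad \pi_*(\mathrm{Im}\,\Omega|_{N'}) = F(\alpha, \nabla\alpha)$$


where F is a nonlinear function of its arguments.
Thus, the moduli space $\mathcal{M}_N$ is locally isomorphic to
the set of small 1-forms $\alpha$ on N such that $d\alpha \equiv 0$
and $F(\alpha, \nabla\alpha) \equiv 0$.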

Now it turns out that F satisfies $F(\alpha, \nabla\alpha) \approx
d(*\alpha)$ when $\alpha$ is small. Therefore, $\mathcal{M}_N$ is locally
approximately isomorphic to the vector space of
1-forms $\alpha$ with $d\alpha = d(*\alpha) = 0$. But by Hodge theory,
this is isomorphic to the de Rham cohomology
group $H^1(N, \mathbb{R})$, and is a manifold with dimension
$b^1(N)$.

To carry out this last step rigorously requires
some technical machinery: one must work with
certain Banach spaces of sections of $T^*N$, $\Lambda^2 T^*N$
and $\Lambda^m T^*N$, use elliptic regularity results to prove
that the map $\alpha \mapsto (d\alpha, F(\alpha, \nabla\alpha))$ has closed image in
these Banach spaces, and then use the implicit
function theorem for Banach spaces to show that
the kernel of the map is what is expected.
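
To make the dimension count concrete, here is a short worked example of the linearized equations $d\alpha = d(*\alpha) = 0$. It assumes, purely for illustration, that N is a torus $T^m = \mathbb{R}^m/\Lambda$ carrying a flat metric with coordinates $x_1, \ldots, x_m$; the induced metric on an actual SL m-fold need not be flat, but $b^1(N)$ is topological, so the count is unaffected.

```latex
% Constant-coefficient 1-forms on a flat torus T^m = R^m / Lambda:
\alpha = \sum_{j=1}^{m} a_j \, dx_j, \qquad a_j \in \mathbb{R} \ \text{constant}
\quad\Longrightarrow\quad d\alpha = 0, \qquad d(*\alpha) = 0 .

% By Hodge theory these are all the harmonic 1-forms, hence
\dim \{\alpha : d\alpha = d(*\alpha) = 0\} = m = b^1(T^m) .

% By contrast, for a sphere S^m with m \ge 2 one has b^1(S^m) = 0,
% so the only solution is \alpha = 0.
```

In particular, Theorem 8 gives a 3-dimensional moduli space for a special Lagrangian 3-torus in a Calabi-Yau 3-fold, while a special Lagrangian sphere of dimension at least 2 is rigid.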


Obstructions to Existence of Compact SL m-Folds 


Let $\{(M, J_t, g_t, \Omega_t) : t \in (-\epsilon, \epsilon)\}$ be a smooth
one-parameter family of Calabi-Yau m-folds. Suppose
$N_0$ is an SL m-fold in $(M, J_0, g_0, \Omega_0)$. When can we
extend $N_0$ to a smooth family of SL m-folds $N_t$ in
$(M, J_t, g_t, \Omega_t)$ for $t \in (-\epsilon, \epsilon)$?

By Corollary 7, a necessary condition is that
$[\omega_t|_{N_0}] = [\mathrm{Im}\,\Omega_t|_{N_0}] = 0$ for all t. Our next result
shows that locally, this is also a sufficient condition.


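Theorem 9 Let $\{(M, J_t, g_t, \Omega_t) : t \in (-\epsilon, \epsilon)\}$ be a
smooth one-parameter family of Calabi-Yau m-folds,
with Kähler forms $\omega_t$. Let $N_0$ be a compact SL m-fold
in $(M, J_0, g_0, \Omega_0)$, and suppose that $[\omega_t|_{N_0}] = 0$
in $H^2(N_0, \mathbb{R})$ and $[\mathrm{Im}\,\Omega_t|_{N_0}] = 0$ in $H^m(N_0, \mathbb{R})$ for all
$t \in (-\epsilon, \epsilon)$. Then $N_0$ extends to a smooth
one-parameter family $\{N_t : t \in (-\delta, \delta)\}$, where $0 < \delta \le \epsilon$
and $N_t$ is a compact SL m-fold in $(M, J_t, g_t, \Omega_t)$.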


This can be proved using similar techniques to
Theorem 8. Note that the condition $[\mathrm{Im}\,\Omega_t|_{N_0}] = 0$
for all t can be satisfied by choosing the phases of
the $\Omega_t$ appropriately, and if the image of $H_2(N, \mathbb{Z})$ in
$H_2(M, \mathbb{R})$ is zero, then the condition $[\omega|_N] = 0$ holds
automatically.

Thus, the obstructions $[\omega_t|_{N_0}] = [\mathrm{Im}\,\Omega_t|_{N_0}] = 0$ in
Theorem 9 are actually fairly mild restrictions, and
SL m-folds should be considered as pretty stable
under small deformations of the Calabi-Yau
structure.


Remark The deformation and obstruction theory 
of compact SL m-folds are extremely well behaved 
compared to many other moduli space problems in 
differential geometry. In other geometric problems 
(such as the deformations of complex structures on a 
complex manifold, or pseudoholomorphic curves in 
an almost-complex manifold, or instantons on a 
Riemannian 4-manifold), the deformation theory 
often has the following general structure. 


There are vector bundles E, F over a compact
manifold M, and an elliptic operator $P: C^\infty(E) \to
C^\infty(F)$, usually first order. The kernel Ker P is the
set of infinitesimal deformations, and the cokernel
Coker P the set of obstructions. The actual moduli
space $\mathcal{M}$ is locally the zeros of a nonlinear map
$\Psi: \mathrm{Ker}\,P \to \mathrm{Coker}\,P$.

In a generic case, Coker P = 0, and then the
moduli space $\mathcal{M}$ is locally isomorphic to Ker P,
and so is locally a manifold with dimension ind(P).
However, in nongeneric situations Coker P may be
nonzero, and then the moduli space $\mathcal{M}$ may be
singular, or have an unexpected dimension.

However, SL m-folds do not follow this pattern. 
Instead, the obstructions are topologically determined, 
and the moduli space is always smooth, with dimen- 
sion given by a topological formula. This should be 
regarded as a minor mathematical miracle. 


Mirror Symmetry and the SYZ Conjecture 


Mirror symmetry is a mysterious relationship
between pairs of Calabi-Yau 3-folds $M, \hat M$, arising
from a branch of physics known as string theory,
and leading to some very strange and exciting
conjectures about Calabi-Yau 3-folds, many of
which have been proved in special cases.

In the beginning (the 1980s), mirror symmetry 
seemed mathematically completely mysterious. But 
there are now two complementary conjectural 
theories, due to Kontsevich and Strominger-Yau- 
Zaslow, which explain mirror symmetry in a fairly 
mathematical way. Probably both are true, at some 
level. The second proposal, due to Strominger, Yau, 
and Zaslow (1996), is known as the SYZ conjecture. 
Here is an attempt to state it. 

The SYZ conjecture Suppose M and $\hat M$ are mirror
Calabi-Yau 3-folds. Then (under some additional
conditions), there should exist a compact topological
3-manifold B and surjective, continuous maps
$f: M \to B$ and $\hat f: \hat M \to B$, such that


(i) There exists a dense open set $B_0 \subset B$, such that
for each $b \in B_0$, the fibers $f^{-1}(b)$ and $\hat f^{-1}(b)$ are
nonsingular special Lagrangian 3-tori $T^3$ in M
and $\hat M$. Furthermore, $f^{-1}(b)$ and $\hat f^{-1}(b)$ are in
some sense dual to one another.

(ii) For each $b \in \Delta = B \setminus B_0$, the fibers $f^{-1}(b)$ and
$\hat f^{-1}(b)$ are expected to be singular special
Lagrangian 3-folds in M and $\hat M$.


The fibrations f and $\hat f$ are called special Lagrangian
fibrations, and the set of singular fibers $\Delta$ is
called the discriminant. In part (i), the nonsingular
fibers of f and $\hat f$ are supposed to be dual tori. What
does this mean?

On the topological level, we can define duality
between two tori $T, \hat T$ to be a choice of isomorphism
$H^1(T, \mathbb{Z}) \cong H_1(\hat T, \mathbb{Z})$. We can also define
duality between tori equipped with flat Riemannian
metrics. Write $T = V/\Lambda$, where V is a Euclidean
vector space and $\Lambda$ a lattice in V. Then the dual
torus $\hat T$ is defined to be $V^*/\Lambda^*$, where $V^*$ is the
dual vector space and $\Lambda^*$ the dual lattice. However,
there is no notion of duality between nonflat
metrics on dual tori.
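
The flat dual torus $V^*/\Lambda^*$ can be computed explicitly. The sketch below is illustrative only: it assumes $V = \mathbb{R}^3$ with its standard inner product (used to identify $V^*$ with V), takes a lattice basis $b_1, b_2, b_3$ with rational entries, and returns the dual basis $b_1^*, b_2^*, b_3^*$ of $\Lambda^*$ characterized by $\langle b_i^*, b_j \rangle = \delta_{ij}$. The function names are not from the article.

```python
from fractions import Fraction


def dot(u, v):
    """Standard inner product on R^3."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross product on R^3."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def dual_basis(b1, b2, b3):
    """Basis of the dual lattice: vectors d_i with dot(d_i, b_j) = delta_ij."""
    volume = dot(b1, cross(b2, b3))  # signed volume of the fundamental cell
    if volume == 0:
        raise ValueError("b1, b2, b3 do not span a lattice of full rank")
    return [[Fraction(x) / volume for x in cross(b2, b3)],
            [Fraction(x) / volume for x in cross(b3, b1)],
            [Fraction(x) / volume for x in cross(b1, b2)]]
```

For instance, stretching one circle factor of the torus by 2, as in `dual_basis([2, 0, 0], [0, 1, 0], [0, 0, 1])`, shrinks the corresponding circle of the dual torus by the factor 1/2, so large fibers are dual to small ones.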

Strominger, Yau, and Zaslow argue only that
their conjecture holds when $M, \hat M$ are close to the
“large complex structure limit.” In this case, the
diameters of the fibers $f^{-1}(b)$, $\hat f^{-1}(b)$ are expected to
be small compared to the diameter of the base space
B, and away from singularities of $f, \hat f$, the metrics on
the nonsingular fibers are expected to be approximately
flat. So, part (i) of the SYZ conjecture says
that for $b \in B_0$, $f^{-1}(b)$ is approximately a flat
Riemannian 3-torus, and $\hat f^{-1}(b)$ is approximately the
dual flat Riemannian torus.

Mathematical research on the SYZ conjecture has
followed two broad approaches. The first could be
described as symplectic topological. For this, we
treat $M, \hat M$ just as symplectic manifolds and $f, \hat f$ just
as Lagrangian fibrations. We also suppose B is a
smooth 3-manifold and $f, \hat f$ are smooth maps. Under
these simplifying assumptions, Mark Gross, Wei-Dong
Ruan, and others have built up a beautiful,
detailed picture of how dual SYZ fibrations work at
the global topological level.

The second approach could be described as local 
geometric. Here, we try to take the special Lagran- 
gian condition seriously from the outset, and focus 
on the local behavior of special Lagrangian 


submanifolds, and especially their singularities, 
rather than on global topological questions. In 
addition, we are interested in what fibrations of
generic Calabi-Yau 3-folds might look like. 

There is now a well-developed theory of SL
m-folds with isolated singularities modeled on
cones (Joyce 2003a). This is applied to SL
fibrations and the SYZ conjecture in Joyce
(2003a, b), leading to the tentative conclusions
that for generic Calabi-Yau 3-folds M, special
Lagrangian fibrations $f: M \to B$ will be only piecewise
smooth, and have discriminants $\Delta$ of real
codimension 1 in B, in contrast to smooth fibrations
which have $\Delta$ of codimension 2. We also
argue that for generic mirrors $M, \hat M$ and $f, \hat f$,
the discriminants $\Delta, \hat\Delta$ cannot be homeomorphic
and so do not coincide. This contradicts part (ii)
above.

A better way to formulate the SYZ conjecture
may be in terms of families of mirror Calabi-Yau
3-folds $M_t, \hat M_t$ and fibrations $f_t: M_t \to B$, $\hat f_t: \hat M_t \to
B$ for $t \in (0, \epsilon)$ which approach the “large complex
structure limit” as $t \to 0$. Then we could require the
discriminants $\Delta_t, \hat\Delta_t$ of $f_t, \hat f_t$ to converge to some
common, codimension 2 limit $\Delta_0$ as $t \to 0$.

It is an important, and difficult, open problem to 
construct examples of special Lagrangian fibrations 
of compact, holonomy SU(3) Calabi-Yau 3-folds. 
None are currently known. 


See also: Minimal submanifolds; Mirror Symmetry: 
A Geometric Survey; Moduli Spaces: An Introduction; 
Riemannian Holonomy Groups and Exceptional Holonomy. 
Introduction


Systems of Calogero-Moser-Sutherland (CMS) type
form a class of finite-dimensional dynamical systems 
that are integrable both at the classical and at the 
quantum level. The CMS systems describe N point 
particles moving on a line or on a ring, interacting 
via pair potentials that are specific functions of four 
types, namely rational (I), hyperbolic (II), trigono- 
metric (III), and elliptic (IV). They occur not only in 
a nonrelativistic (Galilei-invariant), but also in a 
relativistic (Poincaré-invariant) setting. Thus, one 
can distinguish a hierarchy of 16 physically distinct 
versions (classical/quantum, nonrelativistic/relativis- 
tic, type I-IV), the most general one being the 
quantum relativistic type IV system. 

The nonrelativistic systems date back to pioneering
work by Calogero, Sutherland, and Moser in the
early 1970s. The pair potential structure of the
interaction can be encoded in the root system $A_{N-1}$,
and there also exist integrable versions for all of the
remaining root systems. The classical systems are
given by N Poisson commuting Hamiltonians with a
polynomial dependence on the particle momenta
$p_1, \ldots, p_N$. Accordingly, the quantum versions are
described by N commuting Hamiltonians that are
partial differential operators.

The relativistic systems were introduced in the
mid-1980s, at the classical level by Ruijsenaars and
Schneider, and at the quantum level by Ruijsenaars.
They converge to the nonrelativistic systems in the
limit $c \to \infty$, where c is the speed of light. Again, the
systems can be related to the root system $A_{N-1}$, and
they admit integrable versions for other root
systems. All of the commuting classical Hamiltonians
depend exponentially on generalized momenta
$p_1, \ldots, p_N$. Hence, the associated commuting quantum
Hamiltonians are analytic difference operators.

The above integrable systems can be further 
generalized by allowing supersymmetry or internal 
degrees of freedom (“spins”), coupled in quite 
special ways to retain integrability. In this article, 
however, the focus is on the 16 versions of the
$A_{N-1}$-symmetric CMS systems without internal
degrees of freedom. The primary aim is to acquaint 
the reader with their definition and integrability, 


and with their most prominent features and inter- 
relationships. Second, we intend to give a rough 
sketch of the state of the art concerning explicit 
solutions for the various versions. This involves a 
concretization of the action-angle maps and eigen- 
function transforms that simultaneously diagonalize 
the commuting dynamics, paying special attention to 
their remarkable duality properties. 

It is beyond the scope of this article to review the 
hundreds of papers specifically dealing with CMS 
type systems, let alone the much larger literature 
where they play some role. Indeed, the systems have 
been encountered in a great many different contexts 
and they are related to a host of other integrable 
systems in various ways. Accordingly, they can be 
studied from the perspective of various subfields of 
mathematics and theoretical physics. First some of 
these perspectives and relations to seemingly quite 
different topics will be mentioned before embarking 
on the far more focused survey. 

Staying first within the confines of the CMS type 
systems, some nonobvious limits yielding other 
familiar finite-dimensional integrable systems will 
be mentioned. To begin with, all of the $A_{N-1}$ type
systems give rise to systems with a Toda type 
(exponential “nearest neighbor”) interaction via a 
suitable limiting transition (basically a strong- 
coupling limit). This leads to integrable N-particle 
systems with a classical/quantum, nonrelativistic/ 
relativistic, nonperiodic/periodic version; starting 
from the quantum relativistic periodic Toda system, 
the remaining seven versions can be obtained by 
suitable limits. 

Next, we recall that the quantum system of N 
nonrelativistic bosons on the line or ring interacting 
via a pair potential of $\delta$-function type is soluble via a
Bethe ansatz, with the “line version” exhibiting 
quantum soliton behavior (factorized scattering). It 
has been shown that there exist scaling limits of 
eigenfunctions for suitable CMS systems that give 
rise to the latter Bethe type eigenfunctions for N = 2, 
while convergence for N > 2 is plausible, but has 
not been demonstrated thus far. 

Via suitable analytic continuations preserving 
reality/formal self-adjointness, one can arrive at 
CMS systems with more than one species of particle 
(particles and “antiparticles”). Likewise, analytic 
continuations and appropriate limits of CMS systems
associated with root systems other than $A_{N-1}$
lead to a further proliferation of N-dimensional
integrable systems. Typically, such limits refer either
to the commuting Hamiltonians (the Toda limit
being a case in point) or to the joint eigenfunctions
(as exemplified by the $\delta$-function system limit); it
seems difficult to control both sets of quantities at
once.

Starting from the spin type CMS systems, another 
kind of limit can be taken. Specifically, by “freez- 
ing” the particles at equilibrium positions, it is 
possible to arrive at integrable spin chains of 
Haldane-Shastry and Inozemtsev type. 

At this point, it is expedient to insert a brief 
remark on finite-dimensional integrable systems. As 
the term suggests, one may expect that, with due 
effort, such systems can be “integrated,” or, equiva- 
lently, “solved.” But it should be noted that the 
latter terms (let alone the qualifier “due effort’’) 
have no unambiguous mathematical meaning. Cer- 
tainly, “solving” involves obtaining explicit infor- 
mation on the action-angle map and joint 
eigenfunction transform at the classical and quan- 
tum level, resp., but a priori it is not at all clear how 
far one can proceed. 

Focusing again on the CMS systems and their 
relatives, it should be stressed that, in many cases, 
one is still far removed from a complete solution, 
especially for the elliptic CMS systems. In this 
regard the previous remark serves not only as a 
caveat, but also to make clear why the various 
vantage points provided by different subfields in 
mathematics and physics are crucial: typically, they 
yield complementary insights and distinct represen- 
tations for solutions, serving different purposes. 

To be sure, in first approximation the mathe- 
matics involved at the classical and quantum level is 
symplectic geometry and Hilbert space theory, resp. 
In point of fact, however, far more ingredients have 
turned out to be quite natural and useful. On the 
classical level, these include the theory of groups, Lie 
algebras and symmetric spaces, linear algebra and 
spectral theory, Riemann surface theory, and more 
generally algebraic geometry. 

On the quantum level, the viewpoint of harmonic 
analysis on symmetric spaces is particularly natural 
and fruitful for the nonrelativistic CMS systems and 
their arbitrary root-system versions, whereas quan- 
tum groups/algebras/symmetric spaces can be tied in 
with the relativistic systems and their versions for 
other root systems. (The $c \to \infty$ limit amounts to the
$q \to 1$ limit in the quantum group picture.) As a
matter of fact, the whole area of special functions
and their q-analogs is intimately related to the
quantum CMS type systems (cf. also the last section
of this article). Finally, the occurrence of commuting
analytic difference operators in the relativistic
($q \ne 1$) systems leads to largely uncharted territory


in the intersection of the theory of Hilbert space 
eigenfunction expansions and the theory of linear 
analytic difference equations. 

The study of the thermodynamics ($N \to \infty$ limit
with temperature > 0 and density > 0 fixed) associated
with the trigonometric and elliptic CMS
systems and their spin cousins yields its own circle
of problems. It was initiated by Sutherland three
decades ago, and even though a host of results on
partition functions, correlation functions, fractional
statistics, strong-weak coupling duality, relations to
Yangians, etc., have meanwhile been obtained,
many questions are still open. This area also has 
links with random-matrix theory, but the input from 
this field is thus far limited to certain discrete 
couplings. 

The above N-dimensional integrable systems are 
related to a great many infinite-dimensional integr- 
able systems, both at the classical and at the 
quantum level. On the one hand, there are structural 
analogs that have been used to advantage in the 
study of CMS systems, including Lax pair and R- 
matrix formulations, zero-curvature representations, 
bi-Hamiltonian formalism, Bäcklund transformations,
time discretizations, and tools such as Baker-Akhiezer
functions, Bethe ansatz, separation of
variables, and Baxter-type Q-operators.

On the other hand, there are striking physical 
similarities between various soliton field theories 
(a prominent one being the sine-Gordon field 
theory) and infinite soliton lattices (in particular 
several Toda type lattices), and the CMS systems for 
special parameter values. Particularly conspicuous 
are the ties between the classical CMS systems and 
the KP and two-dimensional Toda hierarchies. The 
latter relations actually extend beyond the solitons, 
including rational and theta function solutions. 

CMS systems are relevant in various other 
contexts not yet mentioned. A prominent one 
among these is a class of supersymmetric gauge 
field theories. In this quantum context, the classical 
CMS systems have surfaced in the description 
of moduli spaces encoding the vacuum structure 
(Seiberg-Witten theory). Equally surprising, certain 
classical CMS systems (with internal degrees 
of freedom) have found a second application in a 
quantum context, namely in the description of 
quantum chaos (level repulsion). 

We conclude this introduction by listing addi- 
tional disparate subjects where connections with 
CMS type systems have been found. These include 
the theory of Sklyanin, affine Hecke, Kac-Moody,
Virasoro and W-algebras, equations of Knizhnik-Zamolodchikov,
Yang-Baxter, Witten-Dijkgraaf-Verlinde-Verlinde,
and Painlevé type, Gaudin,
Hitchin, Wess-Zumino, matrix and quasi-exactly
solvable models, Dunkl-Cherednik and Polychronakos
operators, the quantum Hall effect and quantum
transport, two-dimensional Yang-Mills theory, 
functional equations, integrable mappings, Huygens’ 
principle, and the bispectral problem. 


Classical Nonrelativistic CMS Systems 


A system of N nonrelativistic equal-mass m particles
on the line interacting via pair potentials can be
described by a Hamiltonian


$$H = \frac{1}{2m}\sum_{j=1}^{N} p_j^2 + \sum_{1 \le j < k \le N} V(x_j - x_k), \qquad m > 0$$   [1]


The CMS systems are defined by four distinct
choices of pair potential. The simplest choice reads


$$V(x) = g^2/(m x^2), \qquad g > 0 \quad \text{(I)}$$   [2]


Hence, the coupling constant g has dimension
[action] (the product of [position] and [momentum]).
This potential is clearly repulsive. Thus, each
initial state in the phase space


$$\Omega = \{(x, p) \in \mathbb{R}^{2N} \mid x \in G\}$$   [3]

where G is the configuration space

$$G = \{x \in \mathbb{R}^{N} \mid x_N < \cdots < x_1\}$$   [4]


is a scattering state.

The next level is given by the hyperbolic choice


$$V(x) = g^2\nu^2/(m \sinh^2(\nu x)), \qquad \nu > 0 \quad \text{(II)}$$   [5]


Hence, $\nu$ has dimension [position]$^{-1}$, and the
previous system arises by taking $\nu$ to 0. It is clear
that [5] yields again a repulsive particle system, so
that each state in $\Omega$ given by [3] is a scattering state.
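
The statement that the rational system arises from the hyperbolic one as $\nu \to 0$ can be checked numerically. The sketch below assumes the pair potentials $V(x) = g^2/(m x^2)$ for type I and $V(x) = g^2\nu^2/(m\sinh^2(\nu x))$ for type II; the function names and the sample parameter values are illustrative.

```python
import math


def rational_potential(x, g, m):
    """Type I pair potential g^2 / (m x^2)."""
    return g * g / (m * x * x)


def hyperbolic_potential(x, g, m, nu):
    """Type II pair potential g^2 nu^2 / (m sinh^2(nu x))."""
    return g * g * nu * nu / (m * math.sinh(nu * x) ** 2)


def relative_gap(x, g, m, nu):
    """Relative difference between the type II and type I potentials at x."""
    return hyperbolic_potential(x, g, m, nu) / rational_potential(x, g, m) - 1.0
```

Since $\sinh^2(\nu x) = \nu^2 x^2 (1 + \nu^2 x^2/3 + \cdots)$, the relative gap behaves like $-\nu^2 x^2/3$ for small $\nu x$, so it vanishes quadratically in $\nu$. Both potentials are positive and decreasing in $|x|$, which is the repulsive behavior referred to in the text.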

The highest level in the hierarchy is the elliptic
level, where


$$V(x) = g^2 \wp(x; \omega, \omega')/m, \qquad \omega, -i\omega' > 0 \quad \text{(IV)}$$   [6]


and $\wp(x; \omega, \omega')$ denotes the Weierstrass $\wp$-function
with periods $2\omega$ and $2\omega'$. It is beyond the scope of
this article to elaborate on the elliptic regime, even
though it is of considerable interest. It reappears in
later sections as the most general regime in which
integrability holds true. Indeed, a prominent feature
of the elliptic case [6] is that it can be specialized
both to the hyperbolic case [5] and to the trigonometric
case, given by


$$V(x) = g^2\nu^2/(m \sin^2(\nu x)) \quad \text{(III)}$$   [7]


To obtain the hyperbolic specialization, one
should take $\omega' = i\pi/2\nu$ and send $\omega$ to $\infty$; then [6]
reduces to [5] (up to an additive constant). Likewise,
[7] results from [6] by choosing $\omega = \pi/2\nu$ and
taking $-i\omega'$ to $\infty$.

The physical picture associated with the trigonometric
and elliptic systems is quite different from
that of the rational and hyperbolic ones. Of course,
the potentials [7] and [6] are again repulsive, but
now the internal motion is confined and oscillatory.
More specifically, due to energy conservation the
phase spaces


$$\Omega_{\mathrm{III}} = G_{\mathrm{III}} \times \mathbb{R}^N, \qquad
G_{\mathrm{III}} = \{x_N < \cdots < x_1,\ x_1 - x_N < \pi/\nu\}$$   [8]


$$\Omega_{\mathrm{IV}} = G_{\mathrm{IV}} \times \mathbb{R}^N, \qquad
G_{\mathrm{IV}} = \{x_N < \cdots < x_1,\ x_1 - x_N < 2\omega\}$$   [9]


are left invariant by the flow generated by the
trigonometric and elliptic N-particle Hamiltonian, resp.

Alternatively, one may interpret the trigonometric
Hamiltonian as describing particles constrained to
move on a circle and interacting via the inverse
square potential [2]. In this picture, the quantities
$2\nu x_1, \ldots, 2\nu x_N$ are viewed as angular positions on
the circle, and one needs a suitable quotient of the
phase space [8] by a discrete group action to
describe a state of the system.

Turning to integrability aspects, we begin by
noting that the total momentum Hamiltonian


$$P = \sum_{j=1}^{N} p_j$$   [10]


obviously Poisson commutes with the above defining
Hamiltonians of the systems. For N = 2, therefore,
integrability is plain. It is possible to write
down explicitly the higher commuting Hamiltonians
for N > 2 as well but, in the nonrelativistic setting,
it is more illuminating to characterize them as the
power traces or (equivalently) the symmetric functions
of a so-called Lax matrix.

The Lax matrix is an N x N matrix-valued
function on the phase space of the system. It plays
a pivotal role not only for understanding integrability,
but also for setting up an action-angle
transformation. The latter issue is discussed again
later. Here the more conspicuous features of the Lax
matrix will be explained, focusing on the type II
system for expository ease. Then one can choose


$$L_{jj} = p_j, \qquad L_{jk} = \frac{i g \nu}{\sinh \nu(x_j - x_k)}, \qquad
j, k = 1, \ldots, N, \quad j \ne k$$   [11]

Thus, L is Hermitean and we have


$$\mathrm{tr}\, L = P, \qquad \mathrm{tr}\, L^2 = 2mH$$   [12]

(The rational Lax matrix results from [11] by taking
$\nu \to 0$, and the trigonometric one by taking $\nu \to i\nu$.
The elliptic Lax matrix has a similar structure, but it
involves an extra “spectral” parameter.)
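
The trace identities [12] are easy to confirm numerically. The following sketch assumes the type II Lax matrix with diagonal entries $p_j$ and off-diagonal entries $ig\nu/\sinh\nu(x_j - x_k)$, together with the Hamiltonian [1] and the hyperbolic potential $g^2\nu^2/(m\sinh^2(\nu x))$; the function names and the sample phase-space point are illustrative.

```python
import math


def hamiltonian(xs, ps, g, m, nu):
    """H = sum p_j^2 / 2m + sum_{j<k} g^2 nu^2 / (m sinh^2(nu (x_j - x_k)))."""
    n = len(xs)
    kinetic = sum(p * p for p in ps) / (2.0 * m)
    potential = sum(g * g * nu * nu / (m * math.sinh(nu * (xs[j] - xs[k])) ** 2)
                    for j in range(n) for k in range(j + 1, n))
    return kinetic + potential


def lax_matrix(xs, ps, g, nu):
    """Type II Lax matrix as a nested list of complex numbers."""
    n = len(xs)
    return [[complex(ps[j]) if j == k
             else 1j * g * nu / math.sinh(nu * (xs[j] - xs[k]))
             for k in range(n)] for j in range(n)]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[j][l] * b[l][k] for l in range(n)) for k in range(n)]
            for j in range(n)]


def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[j][j] for j in range(len(a)))
```

With, say, `xs = [1.2, 0.3, -0.9]` (so that $x_3 < x_2 < x_1$) and arbitrary momenta, `trace(L)` reproduces the total momentum and `trace(matmul(L, L))` reproduces `2 * m * hamiltonian(...)` up to rounding. The cross terms $L_{jk}L_{kj} = g^2\nu^2/\sinh^2\nu(x_j - x_k)$ are what generate the pair potential.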

Although not obvious, it is true that all of the
power traces


$$H_k = \frac{1}{k}\,\mathrm{tr}\, L^k, \qquad k = 1, \ldots, N$$   [13]

are in involution (i.e., Poisson commute). One way to
understand this involves the so-called Lax pair
equation associated with the Hamiltonian flow generated
by $H = H_2/m$. This involves a second N x N
matrix function given by


$$M_{jj} = \sum_{l \ne j} \frac{-i g \nu^2}{m \sinh^2 \nu(x_j - x_l)}, \qquad
M_{jk} = \frac{i g \nu^2 \cosh \nu(x_j - x_k)}{m \sinh^2 \nu(x_j - x_k)}, \quad j \ne k$$   [14]

When the positions and momenta in L and M evolve 
according to the H-flow, one has 

$$\dot{L}_t = [M_t, L_t]$$ [15]

where $[\cdot,\cdot]$ is the matrix commutator. (Indeed, [15] 
amounts to the Hamilton equations, as is readily 
checked.) Since M is anti-Hermitean, it is not 
difficult to derive from this Lax pair equation that 
the flow is isospectral: $L_t$ is related to $L_0$ by a 
unitary transformation $L_t = U_t L_0 U_t^*$ obtained from 
$M_t$, so that the spectrum of $L_t$ is time independent. 
This argument already shows the existence of N 
conserved quantities under the H-flow, namely the 
N eigenvalues of L. It is, however, simpler to work 
with either the power traces $H_k$ given by [13] or 
with the symmetric functions $S_k$ of L, given by 


$$\det(\mathbf{1}_N + \lambda L) = \sum_{k=0}^{N} \lambda^k S_k$$ [16]


These Hamiltonians depend only on the eigenvalues 
of L, so they are also conserved under the flow. 
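
The statement that the Lax pair equation [15] amounts to the Hamilton equations can be probed numerically. The following sketch builds L and M for the type II system, computes dL/dt from Hamilton's equations by the chain rule, and compares it with the commutator [M, L]. It assumes the defining Hamiltonian H = Σ p_j²/(2m) + (g²/m) Σ_{j<k} ν²/sinh² ν(x_j − x_k) and the form of M written in the comments; function names and sample values are hypothetical.

```python
import math

def hamilton_rates(x, p, g, nu, m):
    """Hamilton's equations for the ASSUMED type II Hamiltonian
    H = sum p_j^2/(2m) + (g^2/m) sum_{j<k} nu^2 / sinh^2(nu*(x_j - x_k))."""
    n = len(x)
    xdot = [pj / m for pj in p]
    pdot = []
    for j in range(n):
        force = 0.0
        for k in range(n):
            if k != j:
                d = nu * (x[j] - x[k])
                # -dH/dx_j contribution of the pair (j, k)
                force += 2.0 * g * g * nu ** 3 * math.cosh(d) / (m * math.sinh(d) ** 3)
        pdot.append(force)
    return xdot, pdot

def lax_pair(x, p, g, nu, m):
    """L as in [11]; M with M_jk = i g nu^2 cosh/(m sinh^2) off the diagonal and
    M_jj = -(i g nu^2/m) sum_{l != j} 1/sinh^2 (assumed form of [14])."""
    n = len(x)
    L = [[0j] * n for _ in range(n)]
    M = [[0j] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = complex(p[j])
        for k in range(n):
            if k == j:
                continue
            d = nu * (x[j] - x[k])
            s, c = math.sinh(d), math.cosh(d)
            L[j][k] = 1j * g * nu / s
            M[j][k] = 1j * g * nu ** 2 * c / (m * s * s)
            M[j][j] -= 1j * g * nu ** 2 / (m * s * s)
    return L, M

def lax_time_derivative(x, p, g, nu, m):
    """dL/dt along the H-flow, obtained from the chain rule."""
    xdot, pdot = hamilton_rates(x, p, g, nu, m)
    n = len(x)
    Ldot = [[0j] * n for _ in range(n)]
    for j in range(n):
        Ldot[j][j] = complex(pdot[j])
        for k in range(n):
            if k != j:
                d = nu * (x[j] - x[k])
                Ldot[j][k] = (-1j * g * nu ** 2 * math.cosh(d)
                              * (xdot[j] - xdot[k]) / math.sinh(d) ** 2)
    return Ldot

def commutator(a, b):
    n = len(a)
    return [[sum(a[i][l] * b[l][j] - b[i][l] * a[l][j] for l in range(n))
             for j in range(n)] for i in range(n)]

# Illustrative sample point (hypothetical values).
x = [1.3, 0.2, -0.9]
p = [-0.4, 0.1, 0.7]
g, nu, m = 0.8, 1.1, 1.5

L, M = lax_pair(x, p, g, nu, m)
Ldot = lax_time_derivative(x, p, g, nu, m)
ML = commutator(M, L)
residual = max(abs(Ldot[j][k] - ML[j][k]) for j in range(3) for k in range(3))
print("max |dL/dt - [M, L]| =", residual)
```

Since the trace of a commutator vanishes, the same computation makes it plausible that every power trace of L is conserved: d(tr L^k)/dt = k tr(L^{k-1}[M, L]) = 0.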


Note that 

$$S_1 = P, \qquad S_2 = \frac{P^2}{2} - mH$$ [17]


To see why these Hamiltonians are in involution, 
one can invoke the long-time asymptotics of the 
H-flow. It reads 

$$p_j(t) \sim \hat{p}_j, \quad \hat{p}_N < \cdots < \hat{p}_1, \qquad x_j(t) \sim x_j^{+} + t\hat{p}_j/m, \qquad j = 1, \ldots, N, \quad t \to \infty$$ [18]

Accordingly, one gets 

$$L_t \sim \mathrm{diag}(\hat{p}_1, \ldots, \hat{p}_N) = L_\infty, \qquad t \to \infty$$ [19]

Since the time evolution is a canonical transformation 
and the Poisson brackets $\{H_k, H_l\}$ are time 
independent (by the Jacobi identity), it now readily 
follows from [19] that they vanish. (Indeed, $H_k$ and 
$H_l$ reduce to power traces of $L_\infty$, and the asymptotic 
momenta $\hat{p}_1, \ldots, \hat{p}_N$ Poisson commute.) 


Quantum Nonrelativistic CMS Systems 
The canonical quantization prescription 

$$p_j \to -i\hbar\,\partial_{x_j}, \qquad j = 1, \ldots, N$$ [20]

($\hbar$ being the Planck constant) gives rise to an 
unambiguous quantum Hamiltonian 

$$H = -\frac{\hbar^2}{2m} \sum_{j=1}^{N} \partial_{x_j}^2 + \frac{g^2}{m} \sum_{1 \le j < k \le N} V(x_j - x_k)$$ [21]

for any classical Hamiltonian [1]. Thus, the defining 
Hamiltonians of the above systems give rise to 
well-defined partial differential operators (PDOs), 
which act on suitable dense subspaces of the 
Hilbert space $L^2(G_\kappa, dx)$, $\kappa$ = I, ..., IV, with $G_{I}$ and 
$G_{II}$ given by G in [4], and $G_{III}$, $G_{IV}$ by [8] and [9], 
respectively. 

We recall that there is no general result ensuring that 
a classically integrable system admits an integrable 
quantum version. More precisely, when one substi- 
tutes [20] in N Poisson commuting Hamiltonians, it 
need not be true that they commute as quantum 
operators, even when no ordering ambiguities are 
present. For the power trace Hamiltonians such 
ambiguities do occur. (For example, [11] gives rise 
to a term in $H_3$ proportional to $p_1/\sinh^2 \nu(x_1 - x_2)$.) 
On the other hand, no noncommuting factors occur 
in the quantization of $S_1, \ldots, S_N$. To verify this, one 
need only note that $S_k$ equals the sum of all k × k 
principal minors of L, cf. [16]; choosing a diagonal 
element $p_j$ in a summand, one therefore has no 
dependence on $x_j$ in the remaining factors, hence no 
ordering ambiguity. 

As a result, the prescription [20] yields N 
unambiguous operators $S_k(x, -i\hbar\nabla)$, which are 
moreover formally self-adjoint on $L^2(G_\kappa, dx)$ for 
each of the four cases $\kappa$ = I, ..., IV. Although by no 
means obvious, it is true that these operators do 
commute. Thus, integrability is preserved under 
quantization of the above systems. Now the power 
traces of a matrix can be expressed as polynomials 
in the symmetric functions (via the Newton 
identities), so this yields an ordering ensuring that 
the quantized power traces commute as well. 
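
The Newton identities mentioned above can be written out as a short recursion. The sketch below works with commuting numbers only, so it illustrates the algebraic relation between the symmetric functions of a matrix and its power traces, not the operator ordering question itself; the function name is hypothetical.

```python
def power_sums_from_elementary(e):
    """Given elementary symmetric functions e = [e_1, ..., e_N] of N commuting
    variables, return the power sums [p_1, ..., p_N] via Newton's identities:
        p_k = sum_{i=1}^{k-1} (-1)^(i-1) e_i p_{k-i} + (-1)^(k-1) k e_k.
    """
    n = len(e)
    p = []
    for k in range(1, n + 1):
        total = (-1) ** (k - 1) * k * e[k - 1]
        for i in range(1, k):
            total += (-1) ** (i - 1) * e[i - 1] * p[k - i - 1]
        p.append(total)
    return p

# Eigenvalues 1, 2, 3: e_1 = 6, e_2 = 11, e_3 = 6.
eigenvalues = [1, 2, 3]
e = [6, 11, 6]
p = power_sums_from_elementary(e)
direct = [sum(v ** k for v in eigenvalues) for k in (1, 2, 3)]
print(p, direct)
```

For k = 2 the recursion gives tr L² = S_1² − 2S_2, which is the relation between the trace identities for tr L and tr L² and the first two symmetric functions.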

Just as the action-angle transformation for a 
classically integrable system “diagonalizes” all of 
the Poisson commuting Hamiltonians at once (in the 
sense that the transformed Hamiltonians depend 
only on the action variables), one expects that there 
exists a unitary operator that transforms all of the 
commuting Hamiltonians to diagonal form. In the 
classical setting, the existence of this diagonalizing 
map follows (under suitable technical restrictions) 
from the Liouville-Arnold theorem, whereas in the 
quantum context the existence of such a joint 
eigenfunction transformation is a far more delicate 
issue. This problem is briefly discussed later again, 
noting here that the solutions obtained to date vary 
considerably in completeness and “explicitness” for 
the four regimes. 


Classical Relativistic CMS Systems 


The nonrelativistic spacetime symmetry group is the 
Galilei group. Its Lie algebra is represented by the 
time translation generator H given by [1], space 
translation generator P given by [10], and the Galilei 
boost generator 


$$B = -m \sum_{j=1}^{N} x_j$$ [22]


More precisely, the Poisson brackets are given by 

$$\{H, P\} = 0, \qquad \{H, B\} = P, \qquad \{P, B\} = Nm$$ [23]

so that the last bracket does not vanish (as is 
the case for the Galilei Lie algebra). This deviation 
is inconsequential, however, since the constant 
Nm (central extension) yields trivial Hamilton 
equations. 

The relativistic spacetime symmetry group (Poincaré 
group) yields a Lie algebra that differs from 
[23] only in Nm being replaced by $H/c^2$, where c is 
the speed of light. Clearly, the functions 

$$H = mc^2 \sum_{j=1}^{N} \cosh\Big(\frac{p_j}{mc}\Big), \qquad P = mc \sum_{j=1}^{N} \sinh\Big(\frac{p_j}{mc}\Big)$$ [24]

together with B given by [22] give rise to these 
altered Poisson brackets. Physically, these three 
generators describe a system of N relativistic free 
mass-m particles in terms of their rapidities $p_j/mc$. 


A natural ansatz to take interaction into account 
now reads 

$$H = mc^2 \sum_{j=1}^{N} \cosh\Big(\frac{p_j}{mc}\Big) V_j(x), \qquad P = mc \sum_{j=1}^{N} \sinh\Big(\frac{p_j}{mc}\Big) V_j(x), \qquad V_j(x) = \prod_{k \ne j} f(x_j - x_k)$$ [25]

Indeed, it is plain that this still entails 


$$\{H, B\} = P, \qquad \{P, B\} = H/c^2$$ [26]


But to obtain a relativistic particle system, the time 
and space translations must also commute. The 
corresponding requirement {H, P}=0 yields a severe 
constraint on the “pair potential” function f(x) in 
[25] whenever N>2. (For N=2, one gets 
{H,P}=0 irrespective of the choice of f.) 

As it turns out, the vanishing requirement is 
satisfied when 


$$f^2(x) = a + b\,\wp(x)$$ [27]

where a, b are constants and $\wp(x)$ is the Weierstrass 
function already encountered. Taking, for example, 
a, b > 0, one can take the positive square root of the 
right-hand side of [27]. This choice of f(x) yields the 
defining Hamiltonian of the relativistic elliptic 
system (type IV). In the three degenerate cases, it is 
convenient to choose 

$$f(x) = \begin{cases} (1 + g^2/m^2c^2x^2)^{1/2} & \text{(I)} \\ (1 + \sin^2(\nu g/mc)/\sinh^2(\nu x))^{1/2} & \text{(II)} \\ (1 + \sinh^2(\nu g/mc)/\sin^2(\nu x))^{1/2} & \text{(III)} \end{cases}$$ [28]


It is an elementary exercise to check that this 
implies 


$$\lim_{c \to \infty} (H - Nmc^2) = H_{nr}, \qquad \lim_{c \to \infty} P = P_{nr}$$ [29]


where $H_{nr}$ and $P_{nr}$ are the above nonrelativistic time 
and space translation generators. Hence, the defining 
Hamiltonians of the relativistic systems reduce 
to their nonrelativistic counterparts in the limit 
$c \to \infty$. 
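
The nonrelativistic limit can also be observed numerically for the type II system. The sketch below evaluates the relativistic H and P with f(x) = (1 + sin²(νg/mc)/sinh²(νx))^{1/2} at a large value of c and compares H − Nmc² and P with their nonrelativistic counterparts. It assumes the nonrelativistic Hamiltonian H_nr = Σ p_j²/(2m) + (g²/m) Σ_{j<k} ν²/sinh² ν(x_j − x_k); the names and sample values are hypothetical.

```python
import math

def f_type2(x, g, nu, m, c):
    """Type II pair function f(x) = (1 + sin^2(nu g/(m c)) / sinh^2(nu x))^(1/2)."""
    return math.sqrt(1.0 + math.sin(nu * g / (m * c)) ** 2 / math.sinh(nu * x) ** 2)

def relativistic_generators(x, p, g, nu, m, c):
    """H = m c^2 sum_j cosh(p_j/mc) V_j and P = m c sum_j sinh(p_j/mc) V_j,
    with V_j = prod_{k != j} f(x_j - x_k)."""
    n = len(x)
    h_sum, p_sum = 0.0, 0.0
    for j in range(n):
        v_j = 1.0
        for k in range(n):
            if k != j:
                v_j *= f_type2(x[j] - x[k], g, nu, m, c)
        h_sum += math.cosh(p[j] / (m * c)) * v_j
        p_sum += math.sinh(p[j] / (m * c)) * v_j
    return m * c * c * h_sum, m * c * p_sum

def nonrelativistic_hamiltonian(x, p, g, nu, m):
    """ASSUMED nonrelativistic type II Hamiltonian (see the lead-in)."""
    n = len(x)
    kinetic = sum(pj * pj for pj in p) / (2.0 * m)
    pair_sum = sum(nu ** 2 / math.sinh(nu * (x[j] - x[k])) ** 2
                   for j in range(n) for k in range(j + 1, n))
    return kinetic + g * g * pair_sum / m

# Illustrative sample point (hypothetical values).
x = [1.3, 0.2, -0.9]
p = [-0.4, 0.1, 0.7]
g, nu, m = 0.8, 1.1, 1.5
c = 1.0e3  # "large" speed of light in these units

H_rel, P_rel = relativistic_generators(x, p, g, nu, m, c)
H_nr = nonrelativistic_hamiltonian(x, p, g, nu, m)
P_nr = sum(p)

err_H = abs(H_rel - len(x) * m * c * c - H_nr)
err_P = abs(P_rel - P_nr)
print("H - N m c^2 - H_nr:", err_H)
print("P - P_nr:          ", err_P)
```

Both differences are of order 1/c², so they shrink further when c is increased, until rounding in the subtraction of Nmc² takes over.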

The special character of the function [27] makes 
itself felt not only in ensuring Poincaré invariance, 
but also in entailing integrability. To begin with, 
note that the functions 


$$S_{\pm N} = \exp\Big(\pm\beta \sum_{j=1}^{N} p_j\Big), \qquad \beta = 1/mc$$ [30]

commute with H and P, so that integrability for 
N = 3 is plain. More generally, the Hamiltonians 

$$S_{\pm l} = \sum_{I \subset \{1,\ldots,N\},\ |I| = l} \exp\Big(\pm\beta \sum_{j \in I} p_j\Big) \prod_{j \in I,\ k \notin I} f(x_j - x_k), \qquad l = 1, \ldots, N$$ [31]

can be shown to mutually commute. Clearly, one has 

$$S_{-l} = S_{-N} S_{N-l}, \qquad l = 1, \ldots, N-1$$ [32]

and 

$$H = (S_1 + S_{-1})/2m\beta^2, \qquad P = (S_1 - S_{-1})/2\beta$$ [33]


As anticipated by the notation, the functions 
$S_1, \ldots, S_N$ may be viewed as the symmetric functions 
of a Lax matrix. More precisely, in the elliptic case 
this is true up to multiplicative constants that 
depend on a spectral parameter occurring in the 
Lax matrix. As before, only the Lax matrix for the 
type II system is specified here. In this case, one can 
dispense with the spectral parameter and choose 

$$L_{jk} = e_j C_{jk} e_k, \qquad j,k = 1, \ldots, N$$ [34]

where 

$$e_j = \exp(\nu x_j + \beta p_j/2) \prod_{l \ne j} f(x_j - x_l)^{1/2}$$ [35]

$$C_{jk} = \exp(-\nu(x_j + x_k))\, \frac{\sinh(i\beta\nu g)}{\sinh \nu(x_j - x_k + i\beta g)}$$ [36]

In [35], f(x) is the type II function given by [28]. The 
matrix C arises from Cauchy’s matrix $1/(w_j - z_k)$ 
via a suitable substitution, and Cauchy’s identity 

$$\det\Big(\frac{1}{w_j - z_k}\Big)_{j,k=1}^{N} = \prod_{j,k=1}^{N} \frac{1}{w_j - z_k} \prod_{1 \le j < k \le N} (w_j - w_k)(z_k - z_j)$$ [37]

ensures that [34] yields the Hamiltonians $S_l$ of [31]. 
To conclude this section, we point out that the 
relation 

$$L = \mathbf{1}_N + \beta L_{nr} + O(\beta^2), \qquad \beta \to 0$$ [38]

where $L_{nr}$ denotes the nonrelativistic Lax matrix 
[11], can be used to deduce the involutivity of the 
nonrelativistic Hamiltonians from that of their 
relativistic counterparts. 
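
Cauchy's determinant identity, in the form det(1/(w_j − z_k)) = Π_{j<k} (w_j − w_k)(z_k − z_j) / Π_{j,k} (w_j − z_k), is easy to confirm in exact rational arithmetic for small N. The sketch below does this for N = 3 with arbitrarily chosen sample points; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def determinant(a):
    """Determinant via the Leibniz formula (adequate for small matrices)."""
    n = len(a)
    total = Fraction(0)
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = Fraction(1)
        for i in range(n):
            term *= a[i][perm[i]]
        total += -term if inversions % 2 else term
    return total

def cauchy_product(w, z):
    """Right-hand side of Cauchy's identity."""
    n = len(w)
    numerator = Fraction(1)
    for j in range(n):
        for k in range(j + 1, n):
            numerator *= (w[j] - w[k]) * (z[k] - z[j])
    denominator = Fraction(1)
    for j in range(n):
        for k in range(n):
            denominator *= (w[j] - z[k])
    return numerator / denominator

# Arbitrary sample points with w_j != z_k for all j, k.
w = [Fraction(1), Fraction(3), Fraction(7)]
z = [Fraction(2), Fraction(5), Fraction(11)]

cauchy_matrix = [[1 / (wj - zk) for zk in z] for wj in w]
lhs = determinant(cauchy_matrix)
rhs = cauchy_product(w, z)
print(lhs, rhs)
```

Because the arithmetic is exact, the two sides agree identically rather than up to rounding.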


Quantum Relativistic CMS Systems 


When the canonical quantization prescription [20] is 
applied to the classical Hamiltonians [31] with 
f(x) = 1, one obtains commuting quantum operators 
whose action is exemplified by 

$$\exp\Big(-\frac{i\hbar}{mc}\frac{d}{dx}\Big) F(x) = F\Big(x - \frac{i\hbar}{mc}\Big)$$ [39]

That is, the operators act on functions that have an 
analytic continuation in $x_1, \ldots, x_N$ from the real line 
R to a strip around R in the complex plane C, 
whose width is at least $2\hbar/mc$. 
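
The shift action of an exponentiated derivative is an instance of Taylor's theorem, and for polynomials the series terminates, so it can be checked exactly. The sketch below applies exp(a d/dx) with the imaginary step a = −iħ/mc to a polynomial and compares the result with the shifted polynomial; the polynomial, the values of ħ, m, c, and the function names are hypothetical.

```python
def poly_eval(coeffs, x):
    """Evaluate sum_n coeffs[n] * x**n by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def poly_derivative(coeffs):
    """Coefficients of the derivative polynomial."""
    return [n * c for n, c in enumerate(coeffs)][1:]

def exp_derivative(coeffs, a):
    """Apply exp(a d/dx) = sum_n (a^n / n!) d^n/dx^n to a polynomial.
    The sum is finite because high derivatives of a polynomial vanish."""
    result = [0] * len(coeffs)
    term = list(coeffs)   # current n-th derivative
    factor = 1            # a^n / n!
    n = 0
    while term:
        for i, c in enumerate(term):
            result[i] += factor * c
        term = poly_derivative(term)
        n += 1
        factor = factor * a / n
    return result

# Illustrative values (hypothetical): units with hbar = m = 1, c = 2.
hbar, m, c = 1.0, 1.0, 2.0
a = -1j * hbar / (m * c)

F = [1.0, -2.0, 0.5, 3.0]       # F(x) = 1 - 2x + 0.5x^2 + 3x^3
shifted = exp_derivative(F, a)  # coefficients of (exp(a d/dx) F)(x)

x0 = 0.7
lhs = poly_eval(shifted, x0)
rhs = poly_eval(F, x0 + a)      # F evaluated off the real line
print(abs(lhs - rhs))
```

For non-polynomial F the same identity requires F to extend analytically to the shifted argument, which is the origin of the strip condition stated above.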

Operators of this type are called analytic difference 
operators (henceforth AΔOs). The choice 
f(x) = 1 amounts to the free case g = 0 in [28]. 
For g ≠ 0, however, the canonical quantization 
exemplified by [39] yields noncommuting AΔOs. 
Thus, the factor ordering following from [31] 
would entail that integrability breaks down at the 
quantum level. 

As mentioned before, there is no general result 
guaranteeing that a different ordering that preserves 
integrability exists. Even so, this is true in the 
present case. Specifically, the function f(x) can be 
factorized as $f_+(x)f_-(x)$, and then the AΔOs 

$$S_{\pm l} = \sum_{I \subset \{1,\ldots,N\},\ |I| = l}\ \prod_{j \in I,\ k \notin I} f_{\mp}(x_j - x_k)\ \exp\Big(\mp i\hbar\beta \sum_{j \in I} \partial_{x_j}\Big) \prod_{j \in I,\ k \notin I} f_{\pm}(x_j - x_k)$$ [40]


do commute. In the elliptic case [27], this factorization 
involves the Weierstrass σ-function, and commutativity 
can be encoded in a sequence of 
functional equations satisfied by the σ-function. 
For the type I-III systems the pertinent factorization 
of [28] is given by 

$$f_{\pm}(x) = \begin{cases} (1 \pm i\beta g/x)^{1/2} & \text{(I)} \\ (\sinh \nu(x \pm i\beta g)/\sinh \nu x)^{1/2} & \text{(II)} \\ (\sin \nu(x \pm i\beta g)/\sin \nu x)^{1/2} & \text{(III)} \end{cases}$$ [41]

(Here one has g > 0, and the choice of square root is 
such that $f_{\pm}(x) \to 1$ for $g \downarrow 0$.) 

The nonrelativistic limit $c \to \infty$ of the quantum 
Hamiltonians [33] can be determined by expanding 
$S_1$ and $S_{-1}$ in a power series in $\beta = 1/mc$. In this 
way, one obtains once more [29], except for a small, 
but crucial change in $H_{nr}$: instead of the coupling 
constant dependence $g^2$ in the potential energy, one 
gets $g(g - \hbar)$. The extra term arises from the action 
of the term linear in β in the expansion of the 
exponential on the term linear in β in the expansion 
of the functions $f_{\pm}(x)$. 

From the perspective of the nonrelativistic quantum 
CMS systems, the change $g^2 \to g(g - \hbar)$ appears 
ad hoc. As it transpires, however, the different 
dependence on g ensures that the eigenfunctions of 
$H_{nr}$ depend on g in a far simpler way. This will 
become clear shortly. 


Action-Angle Transforms and Duality 


Under certain technical assumptions, any integrable 
system given by N independent Poisson commuting 
Hamiltonians $S_1(x,p), \ldots, S_N(x,p)$ on a 2N-dimensional 
phase space admits local canonical 
transformations to action-angle variables. Like the 
spectral theorem on the quantum level, this 
structural result is of limited practical value. Indeed, 
just as the spectral theorem yields no concrete 
information concerning eigenfunctions, bound-state 
energies, scattering, etc., associated with a given 
self-adjoint Hamiltonian, the  Liouville—Arnold 
theorem only yields general insight in the type of 
motion that can occur and the geometric character 
of the local maps (in terms of invariant tori). 

To fully comprehend (“solve”) a given integrable 
system, one should render the associated action- 
angle map as concrete as possible. For the CMS type 
systems, a complete solution to this problem has 
only been achieved for the systems of type I-III. The 
motion in the trigonometric systems is oscillatory, so 
that a closeup via the action-angle transform 
involves extensive geometric constructions. By con- 
trast, the type I and II systems are scattering systems, 
and here the action-angle map can be tied in with 
the classical wave maps (Møller transformations). 

We now sketch some salient features of the 
action-angle maps for systems of type I and II. In 
all cases the map (denoted Φ) is a canonical 
transformation from the phase space Ω (eqn [3]) 
with 2-form $dx \wedge dp$ to the phase space 

$$\hat{\Omega} = \{(\hat{x}, \hat{p}) \in \mathbb{R}^{2N} \mid \hat{p} \in G\}$$ [42]

with 2-form $d\hat{x} \wedge d\hat{p}$. Thus, the actions $\hat{p}_1, \ldots, \hat{p}_N$ 
vary over G given by [4] and the “angles” $\hat{x}_1, \ldots, \hat{x}_N$ 
over R. Consequently, $\hat{\Omega}$ amounts to Ω with x and p 
interchanged. 

As should be the case, the transformed commuting 
Hamiltonians 

$$\hat{S}_k = S_k \circ \Phi^{-1}, \qquad k = 1, \ldots, N$$ [43]

depend only on the action vector $\hat{p}$. To be specific, 
they arise from $S_k(x, p)$ by taking g = 0 (no interaction, 
hence no x dependence) and substituting $p \to \hat{p}$. 
Indeed, the actions $\hat{p}_j$ are the $t \to \infty$ limits of the 
momenta $p_j(t)$, where the t dependence refers to the 
defining Hamiltonian of the system. 

As it happens, the Lax matrix L is of decisive 
importance to concretize the action-angle map Φ, 
and in particular to reveal its hidden duality 
properties. The starting point is a commutation 
relation of L(x, p) with a diagonal matrix A(x) 
given by 

$$A(x) = \mathrm{diag}(a(x_1), \ldots, a(x_N)), \qquad a(y) = \begin{cases} y & \text{(I)} \\ \exp(2\nu y) & \text{(II)} \end{cases}$$ [44]

Obviously, the symmetric functions $D_k(x)$ of A(x) 
yield an integrable system on Ω, so the Hamiltonians 

$$\hat{D}_k(\hat{x}, \hat{p}) = (D_k \circ \Phi^{-1})(\hat{x}, \hat{p}), \qquad k = 1, \ldots, N$$ [45]

yield an integrable system on the action-angle phase 
space $\hat{\Omega}$. The crux of the matter is now that these 
systems are familiar: they are also systems of type I 
and II! 

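To be specific, let us denote the dual systems just 
described by a caret, and the nonrelativistic/relativistic 
systems by a suffix nr/rel, resp. Then the 
duality properties alluded to are given by 

$$\widehat{\mathrm{I}}_{nr} \simeq \mathrm{I}_{nr}, \qquad \widehat{\mathrm{II}}_{nr} \simeq \mathrm{I}_{rel}, \qquad \widehat{\mathrm{I}}_{rel} \simeq \mathrm{II}_{nr}, \qquad \widehat{\mathrm{II}}_{rel} \simeq \mathrm{II}_{rel}$$ [46]

and $\Phi^{-1}$ serves as the action-angle map for the dual 
systems. 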

In order to sketch why this state of affairs holds 
true for the $\mathrm{II}_{rel}$ system, recall that its Lax matrix is 
given by [34]. From this, one readily checks the 
commutation relation 

$$\coth(i\beta\nu g)[A, L] = 2e \otimes e - (AL + LA)$$ [47]

Since L is Hermitean, there exists a unitary U 
diagonalizing L. It can now be shown that the 
spectrum of L is positive and nondegenerate, and 
that $U^*e$ has nonzero components. The gauge 
ambiguity in U (given by a permutation matrix and 
diagonal phase matrix) can, therefore, be fixed by 
requiring 

$$U^*LU = \mathrm{diag}(\exp(\beta\hat{p}_1), \ldots, \exp(\beta\hat{p}_N)), \qquad \hat{p}_N < \cdots < \hat{p}_1$$ [48]

$$(U^*e)_j > 0, \qquad j = 1, \ldots, N$$ [49]

A suitable reparametrization of $U^*e$ then yields the 
“angle” vector $\hat{x}$. 

As a consequence, $U^*AU$ becomes a function of $\hat{x}$ 
and $\hat{p}$. In detail, one finds 

$$(U^*AU)(\hat{x}, \hat{p}) = L(\beta/2, 2\nu; \hat{p}, \hat{x})^{T}$$ [50]

where $L(\nu, \beta; x, p)$ is given by [34] and T denotes the 
transpose. Therefore, the “dual Lax matrix” 
$\hat{A} = U^*AU$ is essentially equal to L, explaining the 
self-duality $\widehat{\mathrm{II}}_{rel} \simeq \mathrm{II}_{rel}$ announced above. 
With the action-angle transform under explicit 
control, much more can be said about the solutions 
to Hamilton’s equations for each of the commuting 
Hamiltonians, both as regards finite times and as 
regards long-time asymptotics (scattering). It is 
beyond the scope of this article to enlarge on this, 
but it is worth mentioning that the scattering reveals 
the solitonic character of the particles. Indeed, the 
set of asymptotic momenta $\hat{p}_1, \ldots, \hat{p}_N$ is conserved 
under the scattering and the asymptotic position 
shifts are factorized in terms of pair shifts. A quite 
remarkable feature of the type I systems is that the 
shifts actually vanish (“billiard ball” scattering). 


Eigenfunction Transforms and Duality 


Both at the relativistic and at the nonrelativistic level 
the commuting quantum Hamiltonians $S_1, \ldots, S_N$ 
are formally self-adjoint on the Hilbert space 
$L^2(G_\kappa, dx)$, $\kappa$ = I, ..., IV. Thus, it may be expected 
that it is possible to construct a unitary eigenfunction 
transform 

$$\Phi_\kappa : L^2(G_\kappa, dx) \to L^2(\hat{G}_\kappa, d\hat{\mu}_\kappa(p)), \qquad \kappa = \mathrm{I}, \ldots, \mathrm{IV}$$ [51]

diagonalizing $S_k$ as multiplication by a real-valued 
function $M_k(p)$. Here $\hat{G}_\kappa$ encodes the joint spectrum 
and $d\hat{\mu}_\kappa(p)$ is a suitable measure on $\hat{G}_\kappa$. 

Obviously, this expectation is borne out in the 
free case g = 0. Then, $\Phi_\kappa$ is basically Fourier 
transformation, its kernel consisting of a sum of 
joint eigenfunctions 

$$\exp(-ix \cdot \sigma(p)/\hbar), \qquad \sigma \in S_N$$ [52]

with σ ranging over the permutation group $S_N$. For 
$\kappa$ = I, II, one can take $\hat{G}_\kappa = G_\kappa = G$ (eqn [4]) and 
$d\hat{\mu}_\kappa(p) = dp$. Here one gets 

$$M_k(p) = \sum_{1 \le i_1 < \cdots < i_k \le N} \begin{cases} p_{i_1} \cdots p_{i_k} \\ \exp(\beta p_{i_1}) \cdots \exp(\beta p_{i_k}) \end{cases}$$ [53]

in the nonrelativistic and relativistic case, resp. For 
$\kappa$ = III, IV, one needs to take into account periodic 
boundary conditions on the walls of $G_\kappa$, yielding a 
discrete joint spectrum after the center-of-mass 
motion is omitted. (With the above choices of $G_{III}$ 
and $G_{IV}$, cf. [8] and [9], the center-of-mass motion is 
a free motion along the line, so the total momentum 
still varies continuously.) Of course, the diagonalized 
$S_k$ are once more given by [53], since the kernel 
of $\Phi_\kappa$ consists of free boson states. 

Taking next g > 0, the above expectation has not 
been confirmed for all of the eight regimes involved. 
This is not only because in some cases not even the 
existence of joint eigenfunctions has been shown, 
but also because in the relativistic case the unitarity 
of $\Phi_{II}$ and $\Phi_{IV}$ already breaks down for N = 2 when 
g increases beyond a critical value, cf. [57] below. It 
is quite likely that this happens for N > 2 as well, 
but this is not readily apparent from the current 
fragmentary knowledge on joint eigenfunctions for 
N > 2. 

The only two cases where the g > 0 joint 
eigenfunction transform is of an elementary nature 
are the $\mathrm{III}_{nr}$ and $\mathrm{III}_{rel}$ cases. Indeed, the joint 
eigenfunctions describing the internal motion are of 
the form 

$$\Psi_n(x) = W(x)^{1/2} P_n(x), \qquad n \in \mathbb{N}^{N-1}$$ [54]

Here, 

$$W(x) = \prod_{1 \le j < k \le N} w(x_j - x_k)$$ [55]

is a positive weight function on $G_{III}$ and the $P_n(x)$ 
are multivariable orthogonal polynomials. Thus, 
$P_n(x)$ is a finite linear combination of the above 
free boson states, with p in [52] a linear function of 
n. For the $\mathrm{III}_{nr}$ case, these eigenfunctions were 
already found by Sutherland. (Here, the functions 
$P_n(x)$ amount to polynomials, often called the Jack 
polynomials, which arose in a statistics context.) 
The $\mathrm{III}_{rel}$ polynomials may be viewed as the special 
$A_{N-1}$ case of Macdonald’s orthogonal q-polynomials 
for arbitrary root systems, with 

$$q = \exp(-2\hbar\beta\nu)$$ [56]

(Note that q converges to 1 both in the nonrelativistic 
limit $c \to \infty$ and in the classical limit $\hbar \to 0$.) 

For the $\mathrm{II}_{nr}$ case, the joint eigenfunctions were 
found and studied a couple of decades ago by 
Heckman and Opdam, yielding a multivariable 
hypergeometric transform. Indeed, for N = 2, the 
eigenfunctions can be expressed in terms of the 
hypergeometric function ${}_2F_1$, as has been known 
since the early days of quantum mechanics. Likewise, 
the arbitrary-N $\mathrm{I}_{nr}$ joint eigenfunction transform 
(studied in detail by de Jeu) can be viewed as a 
multivariable Hankel transform, the N = 2 kernel 
being essentially a Hankel function. 

Much less is known concerning $\mathrm{IV}_{nr}$ eigenfunctions, 
and a fortiori for the associated transform 
$\Phi_{IV}$. For N = 2 the time-independent Schrödinger 
equation amounts to the Lamé equation. Hence, 
solutions are Lamé functions that can be studied in 
particular via Fuchs theory (regular singularities). A 
far more explicit form of the eigenfunctions dates 
back to work by Hermite in the nineteenth century. 
More precisely, provided the g dependence of the 
defining Hamiltonian is changed from $g^2$ to $g(g - \hbar)$ 
(a change already encountered above), Hermite’s 
results apply to couplings $g = l\hbar$, $l = 2, 3, 4, \ldots$. His 
eigenfunctions have a structure that is nowadays 
referred to as the Bethe ansatz. For the same g values 
and arbitrary N, $\mathrm{IV}_{nr}$ eigenfunctions of Bethe ansatz 
type were found and studied by Felder and 
Varchenko, but even for these g values much 
remains to be done to achieve a complete understanding 
of the $\Phi_{IV}$ transform. 

A quite different approach, due to Komori and 
Takemura, does yield rather detailed information on 
$\Phi_{IV}$ for arbitrary g > 0. The key feature of their 
strategy is to view the $\mathrm{IV}_{nr}$ case as a perturbation of 
the $\mathrm{III}_{nr}$ case. This entails, however, that the validity 
of their results is restricted to large imaginary period 
of the ℘-function. 

For the $\mathrm{IV}_{rel}$ system, there are only rather 
complete results on $\Phi_{IV}$ for N = 2. More specifically, 
the eigenfunction transform is known to be unitary 
for 

$$g \in [0, \hbar + \pi/\beta\nu]$$ [57]

and a dense set in a corresponding parameter space. 
(For g outside this interval, unitarity is violated.) 
The kernel of $\Phi_{IV}$ involves eigenfunctions of Bethe 
ansatz structure. For $g = l\hbar$, $l = 2, 3, \ldots$ and arbitrary 
N, Bethe ansatz type $\mathrm{IV}_{rel}$ eigenfunctions were found 
by Billey, generalizing the Felder-Varchenko results 
mentioned above. 

It remains to discuss the $\mathrm{I}_{rel}$ and $\mathrm{II}_{rel}$ systems. To 
this end, we first recall the classical dualities [46]. It 
is natural to expect that these dualities are still 
present at the quantum level. For the $\mathrm{I}_{nr}$ case, this is 
readily confirmed: the transform is indeed invariant 
under interchange of x and p. In fact, the N = 2 
center-of-mass Hankel transform even depends only 
on $(x_1 - x_2)(p_1 - p_2)$, so that self-duality is manifest 
in this case. 

More generally, for N = 2 the expected dualities 
[46] are indeed present. The $\mathrm{II}_{nr}$ ${}_2F_1$ transform 
satisfies the $\mathrm{I}_{rel}$ analytic difference equation in $p_1 - p_2$ 
due to the contiguous relations obeyed by ${}_2F_1$. The 
$\mathrm{II}_{rel}$ transform is only unitary when g is restricted by 
[57], and it is indeed self-dual in the same sense as the 
action-angle map (Ruijsenaars). 

Turning finally to the case N > 2, the multivariable 
hypergeometric transform $\Phi_{II}$ does have the expected 
duality property. More specifically, its inverse 
diagonalizes the commuting $\mathrm{I}_{rel}$ AΔOs (Chalykh). For $\mathrm{II}_{rel}$ 
with N > 2 and $g = l\hbar$, $l = 2, 3, \ldots$, Chalykh also 
finds elementary joint eigenfunctions with the 
expected self-duality. To date, no Hilbert space results 
for the N > 2 $\mathrm{II}_{rel}$ case have been obtained. 


To conclude, we mention that the soliton scatter- 
ing behavior at the classical level is preserved under 
quantization in all cases where this can be checked. 
That is, no new momenta are created in the 
scattering process and the S-matrix is factorized as 
a product of pair S-matrices. Moreover, for the type 
I cases, the S-matrix is a momentum-independent 
(but g-dependent) phase, as a quantum analog of the 
classical billiard ball scattering. 


See also: Bethe Ansatz; Classical r-Matrices, Lie 
Bialgebras, and Poisson Lie Groups; Functional 
Equations and Integrable Systems; Integrable Discrete 
Systems; Integrable Systems and Algebraic Geometry; 
Integrable Systems in Random Matrix Theory; Integrable 
Systems: Overview; Isochronous Systems; Ordinary 
Special Functions; q-Special Functions; Quantum 
Calogero–Moser Systems; Seiberg–Witten Theory; 
Separation of Variables for Differential Equations; 
Sine-Gordon Equation; Toda Lattices. 
Introduction 


Lagrangian formulations of general relativity (GR) 
were found by Hilbert and by Einstein himself, 
almost immediately after the discovery of the theory. 
The construction of Hamiltonian formulations of 
GR, on the other hand, has taken much longer, and 
has required decades of theoretical research. 

The first such formulations were developed by 
Dirac and by Bergmann and his collaborators, in the 
1950s. Their cumbersome formalism was simplified 
by the introduction of new variables: first by 
Arnowitt, Deser, and Misner in the 1960s and then 
by Ashtekar in the 1980s. A large number of 
variants and improvements of these formalisms 
have been developed by many other authors. Most 
likely the process is not over, and there is still much 
to learn about the canonical formulation of GR. 

A number of reasons motivate the study of 
canonical GR. In general, the canonical formalism 
can be an important step towards quantum theory; 
it allows the identification of the physical degrees of 
freedom, and the gauge-invariant states and observables 
of the theory; and it is an important tool for 
analyzing formal aspects of the theory such as its 
Cauchy problem. All these issues are highly non- 
trivial, and present open problems, in GR. 

In turn, the structural peculiarity and the con- 
ceptual novelty of GR have motivated re-analyses 
and extensions of the canonical formalism itself. 

The following sections discuss the source of the 
peculiar difficulty of canonical GR, and summarize 
the formulations of the theory that are most 
commonly used. 


The Origin of the Difficulties 


The reason for the complexity of the Hamiltonian 
formulation of GR is not so much in the intricacy of 
its nonlinear field equations; rather, it must be found 
in the conceptual novelty introduced by GR at the 
very foundation of the structure of mechanics. 

The dynamical systems considered before GR can 
be formulated in terms of states evolving in time. One 
assumes that a time variable t can be measured by a 
physical clock, and that certain observable quantities 
A of the system can be measured at every instant of 
time. If we know the state s of the system at some 
initial time, the theory predicts the value A(t) of 
these quantities for any given later instant of time t. 
The space of the possible initial states s is the phase 
space $\Gamma_0$. Observables are real functions on $\Gamma_0$. 
Infinitesimal time evolution can be represented as a 
vector field in $\Gamma_0$. This vector field is determined by 
the Hamiltonian, which is also a function on $\Gamma_0$. The 
integral lines s(t) of this vector field determine 
the time evolution A(t) = A(s(t)) of the observables. 

This conceptual structure is very general. It can be 
easily adapted to special-relativistic systems. How- 
ever, it is not general enough for general-relativistic 
systems. GR is not formulated as the evolution of 
states and observables in a preferred time variable 
which can be measured by a physical clock. Rather, 
it is formulated as the relative (common) evolution 
of many observable quantities. Accordingly, in GR 
there is no quantity playing the same role as the 
conventional Hamiltonian. In fact, the canonical 
Hamiltonian density that one obtains from a 
Legendre transformation from a Lagrangian 
vanishes identically in GR. 

The origin of this peculiar behavior of the theory is 
the following. The field equations are written as 
evolution equations in a time coordinate t. However, 
they are invariant under arbitrary changes of t. That is, 
if we replace t with an arbitrary function t' = t'(t) in a 
solution of the field equations, we obtain another 
solution. This underdetermination does not lead to a 
lack of predictivity in GR, because we do not interpret 
the variable t as the measurable reading of a physical 
clock, as we do in non-general-relativistic theories. 
Rather, we interpret t as a nonobservable mathematical 
parameter, void of physical significance. Accordingly, 
the notions of “state at a given time” and “value of 
an observable at a given time” are very unnatural in GR. 
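
A finite-dimensional toy analogue may help to see why invariance under arbitrary changes of the time parameter forces the canonical Hamiltonian to vanish. The sketch below is not GR: it takes the free relativistic particle with an arbitrary evolution parameter, whose Lagrangian L = −m sqrt(v_t² − v_x²) (units with c = 1) is homogeneous of degree one in the velocities, and checks numerically that H = p_i v^i − L is zero. The Lagrangian, names, and sample velocities are assumptions chosen for illustration.

```python
import math

def lagrangian(v, m=1.0):
    """Reparametrization-invariant particle Lagrangian (toy example, c = 1).
    v = (dt/dtau, dx/dtau) with respect to an arbitrary parameter tau."""
    vt, vx = v
    return -m * math.sqrt(vt * vt - vx * vx)

def momenta(v, h=1e-6):
    """p_i = dL/dv^i, approximated by central finite differences."""
    p = []
    for i in range(len(v)):
        up, down = list(v), list(v)
        up[i] += h
        down[i] -= h
        p.append((lagrangian(up) - lagrangian(down)) / (2.0 * h))
    return p

def canonical_hamiltonian(v):
    """H = p_i v^i - L, evaluated at the given velocities."""
    p = momenta(v)
    return sum(pi * vi for pi, vi in zip(p, v)) - lagrangian(v)

samples = [(1.0, 0.3), (2.5, -1.0), (0.7, 0.1)]
values = [canonical_hamiltonian(v) for v in samples]
print(values)

# Homogeneity of degree one in the velocities, which is what makes H vanish.
scale = 3.0
homogeneity_gap = abs(lagrangian((scale * 1.0, scale * 0.3)) - scale * lagrangian((1.0, 0.3)))
print(homogeneity_gap)
```

By Euler's theorem for homogeneous functions, v^i ∂L/∂v^i = L whenever L is homogeneous of degree one, so the vanishing holds identically and not only at the sampled velocities.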

A Hamiltonian formulation of GR requires a 
version of the canonical formalism sufficiently 
general to deal with this broader notion of evolu- 
tion. Generalizations of the Hamiltonian formalism 
have been developed by many authors, such as Dirac 
(see below), Souriau, Arnold, Witten, and many 
others. The first step in this direction was taken by 
Lagrange himself: Lagrange gave a time-independent 
interpretation of the phase space as the space Γ of 
the solutions of the equations of motion (modulo 
gauges). As we shall see, however, consensus is still 
lacking on a fully satisfactory formalism. 


Dirac Theory of Constrained Systems 


Dirac has developed a Hamiltonian theory for 
mechanical systems with constraints, precisely in 
view of its application to GR. Dirac’s theory is 
beautiful, finds vast applications, and it is still 
commonly taken as the basis to discuss Hamiltonian 
GR, although GR does not fit very naturally into 
Dirac’s scheme. In the following, only the part of 
Dirac’s theory relevant for GR is summarized. 

Consider a Lagrangian system with Lagrangian 
variables $q^i$, with i = 1, ..., n. Call $v^i$ the corresponding 
velocities. Let the system be defined by the Lagrangian 
$L(q^i, v^i)$. The momenta are defined as functions of $q^i$ 
and $v^i$ by $p_i(q^i, v^i) = \partial L(q^i, v^i)/\partial v^i$. The canonical 
Hamiltonian $H(q^i, p_i) = v^i(q^i, p_i)\,p_i - L(q^i, v^i(q^i, p_i))$ 
(summation over repeated indices is understood) is 
obtained by inverting the function $p_i(q^i, v^i)$ and expressing 
the velocities as functions of the momenta, $v^i(q^i, p_i)$. 
The phase space $\Gamma_0$ is the space of the variables $(q^i, p_i)$. 
Infinitesimal time evolution is given by the vector field 
$V = v^i(q^i, p_i)\,\partial/\partial q^i + f_i(q^i, p_i)\,\partial/\partial p_i$, where velocities 
and forces are given by the Hamilton equations 
$v^i = \partial H/\partial p_i$ and $f_i = -\partial H/\partial q^i$. 

More formally, the 2-form $\omega = dp_i \wedge dq^i$ endows 
$\Gamma_0$ with a symplectic structure. In the presence of 
such a structure, every function A determines a 
vector field $V_A$, defined by $i_{V_A}\omega = -dA$. By integrating 
this field, we have a flow in $\Gamma_0$, called the 
flow generated by A. Time evolution is the flow 
generated by the Hamiltonian. Given two functions 
A and B, their Poisson brackets are defined by the 
function $\{A, B\} = -V_A(B) = V_B(A)$. Therefore, the 
time evolution of an observable A satisfies 
$dA/dt = \{A, H\}$. A dynamical system is completely 
characterized by the set $(\Gamma_0, \omega, \mathcal{A}, H)$, where 
$\mathcal{A} = (A_1, \ldots, A_n)$ is the ensemble of the observables. 
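
In canonical coordinates these definitions reduce to the familiar expression {A, B} = ∂A/∂q ∂B/∂p − ∂A/∂p ∂B/∂q, which is consistent with dA/dt = {A, H}. The sketch below evaluates this bracket by finite differences for one degree of freedom and checks it on a harmonic oscillator; the choice of system, the step size, and the function names are assumptions made for illustration.

```python
def poisson_bracket(A, B, q, p, h=1e-5):
    """{A, B} = dA/dq dB/dp - dA/dp dB/dq at the point (q, p),
    with partial derivatives approximated by central differences."""
    dA_dq = (A(q + h, p) - A(q - h, p)) / (2.0 * h)
    dA_dp = (A(q, p + h) - A(q, p - h)) / (2.0 * h)
    dB_dq = (B(q + h, p) - B(q - h, p)) / (2.0 * h)
    dB_dp = (B(q, p + h) - B(q, p - h)) / (2.0 * h)
    return dA_dq * dB_dp - dA_dp * dB_dq

# Toy system: harmonic oscillator with mass m and spring constant k (hypothetical values).
m, k = 2.0, 3.0

def H(q, p):
    return p * p / (2.0 * m) + 0.5 * k * q * q

def position(q, p):
    return q

def momentum(q, p):
    return p

q0, p0 = 0.4, -1.1
qp = poisson_bracket(position, momentum, q0, p0)   # canonical bracket, equals 1
q_dot = poisson_bracket(position, H, q0, p0)       # dq/dt = {q, H} = p/m
p_dot = poisson_bracket(momentum, H, q0, p0)       # dp/dt = {p, H} = -k q
h_dot = poisson_bracket(H, H, q0, p0)              # energy conservation
print(qp, q_dot, p_dot, h_dot)
```

The last value illustrates that the Hamiltonian is conserved along its own flow, since the bracket is antisymmetric.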

A constrained system, in the sense of Dirac, is 
a system for which the image of the function 
$v^i \to p_i(q^i, v^i)$ is smaller than $\mathbb{R}^n$. We can characterize 
the image Σ of the map $(q^i, v^i) \to (q^i, p_i)$ with a set 
of equations on $\Gamma_0$ 


$$C_a(q^i, p_i) = 0$$ [1]


where a = 1, ..., m'. These are called the primary 
constraints. 

The “constraint surface” C is the largest subspace 
of Σ which is preserved by time evolution. It can be 
characterized by adding additional constraints, still 
of the form [1], with a = m'+1, ..., m. These 
additional constraints, called secondary constraints, 
can be computed as the Poisson brackets of the 
primary constraints with the Hamiltonian (plus the 
Poisson brackets of these secondary constraints with 
the Hamiltonian, and so on, until the Poisson 
brackets of all the constraints with the Hamiltonian 
vanish on C). We say that an equation holds 
weakly if it holds on C. 
A constrained system is “first class” if the Poisson 
brackets of the constraints among themselves 
vanish weakly. Maxwell theory and GR are first-class 
constrained systems. In a first-class constrained 
system, the constraints generate flows that preserve 
C and foliate it into “orbits.” The space of these 
orbits is called the physical phase space (see 
Figure 1). 

This flow is interpreted as a “gauge” transforma- 
tion, namely as a change of mathematical descrip- 
tion of the same physical state. As first observed by 
Dirac, such interpretation is necessary if we demand 
a deterministic physical evolution, for the following 
reason. A first-class constrained system is a system 
in which the time evolution q'(t) of the Lagrangian 
variables is not completely determined by the 
equations of motion. (The relation between con- 
straints and underdetermination of the evolution is 
simple to understand. In a Lagrangian system, the 
number of equations of motion is equal to the 
number of Lagrangian variables. If one of these 
equations is a constraint (between the initial 
velocities and initial coordinates), then one evolu- 
tion equation is missing.) To recover a deterministic 
physical evolution, we must interpret two “mathe- 
matical” states that can evolve from the same initial 
data, as describing the same “physical” state. As 
shown by Dirac, the transformations generated by 
the constraints are precisely the ones that implement 
such an identification. 

It follows that the physical states must be identified 
with the equivalence classes of the points of C under 
the gauge transformations generated by the con- 
straints, namely with the orbits of their flow. It is 
easy to show that (locally) there is a unique 
symplectic 2-form $\omega_{ph}$ on $\Gamma_{ph}$ such that its pullback 
to C is equal to the pullback of $\omega$ to C 
($i^*\omega = \pi^*\omega_{ph}$, see Figure 1). Physical observables 
$A_{ph}$ are functions on C that are gauge invariant, namely 
constant on the orbits. That is, they are functions on $\Gamma_{ph}$. 
The Hamiltonian is a physical observable. The dynamical 
system $(\Gamma_{ph}, \omega_{ph}, A_{ph}, H)$, where $A_{ph}$ is the ensemble 
of the physical observables, is a complete description 
of the physical system, called the gauge-invariant 
formulation, with no more constraints or gauges. 

Figure 1 The structure of a first-class constrained system. 
$\Gamma_0$: phase space; C: constraint surface; $\Gamma_{ph}$: physical phase 
space; i: imbedding of C in $\Gamma_0$; $\pi$: projection to orbit space 
(sending each point into its orbit). 

For instance, the phase space of Maxwell theory is 
coordinatized by the Maxwell potential 
$A_\mu(x)$, $\mu = 0,1,2,3$, and its conjugate momentum 
$E^\mu(x)$. Since the time derivative of $A_0$ does not 
appear in the Maxwell action, the primary constraint is 


$$E^0(x) = 0$$ [2] 


The secondary constraint turns out to be the Gauss 
law, 


$$\partial_a E^a(x) = 0$$ [3] 


where a = 1,2,3. The first generates arbitrary 
transformations of $A_0$, while the second generates 
the time-independent gauge transformations 
$\delta A_a(x) = \partial_a \lambda(x)$. The pair $(A_0, E^0)$ can be dropped 
altogether, since it is formed by a pure gauge 
variable and a variable constrained to vanish. 
The (gauge-invariant) Hamiltonian is 
$H = \frac{1}{8\pi}\int d^3x\,(E^a E_a + B^a B_a)$, where 
$B^a = \epsilon^{abc}\partial_b A_c$ is the magnetic field and $E^a$ is 
easily recognized as the electric field. $E^a$ and $B^a$ 
are the physical observables. 


General Structure of GR Constraints 


GR fits into Dirac theory with a certain difficulty. 
Since the constraints are the generators of the gauge 
invariances, it is easy to determine their structure in 
GR. The gauge invariances of GR are given by the 
coordinate transformations $x^\mu \to x'^\mu = f^\mu(x)$, where 
$x = (\vec{x}, t)$. Accordingly, we have four primary 
constraints $\pi_\mu = 0$, analogous to [2], and four secondary 
constraints $C_\mu(x) = 0$, analogous to [3]. These are 
usually separated into the three “momentum” 
constraints 


$$C_a(x) = 0$$ [4] 


which generate fixed-time spatial coordinate 
transformations, and the “Hamiltonian” constraint 


$$C(x) = 0$$ [5] 


which generates changes in the t coordinate. 

The metric $g_{\mu\nu}(x)$ that represents the gravitational 
field in Einstein’s original formulation has ten 
independent components per point. Each first-class 
constraint indicates that one Lagrangian variable is 
a gauge degree of freedom. The physical degrees of 
freedom of GR are therefore (10 − 4 − 4) = 2 per 
point. In the linearized theory, these are the two 
degrees of freedom that describe the two polarizations 
of a gravitational wave of given momentum. 
Formulations of GR in which there are additional 
gauge invariances (such as Cartan’s tetrad formula- 
tion, see below) have, accordingly, more constraints. 
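
The counting rule used in this paragraph can be written out as a small sketch. The function name `physical_dof` and its arguments are illustrative choices, not notation from the text; the rule itself (each first-class constraint, primary or secondary, removes one Lagrangian variable) is the one stated above. The Maxwell line applies the same rule to the four components of the potential and the constraints [2] and [3].

```python
def physical_dof(lagrangian_variables: int, primary: int, secondary: int) -> int:
    """Degrees of freedom per point of a first-class constrained system.

    Follows the counting in the text: every first-class constraint
    (primary or secondary) marks one Lagrangian variable as pure gauge.
    """
    dof = lagrangian_variables - primary - secondary
    if dof < 0:
        raise ValueError("more first-class constraints than variables")
    return dof


# Metric GR: 10 components of g_mu_nu, 4 primary and 4 secondary constraints.
gr_dof = physical_dof(10, primary=4, secondary=4)

# Maxwell theory: 4 components of A_mu, one primary [2] and one secondary [3].
maxwell_dof = physical_dof(4, primary=1, secondary=1)

print("GR:", gr_dof, "Maxwell:", maxwell_dof)
```

A formulation with extra gauge invariances, such as the tetrad formulation, would enter this count with more variables and correspondingly more constraints.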

Since the Hamiltonian generates evolution in the 
Lagrangian evolution parameter t, and since such 
evolution can be obtained as a gauge transforma- 
tion, it follows that the Hamiltonian is a constraint 
in GR. The vanishing of the Hamiltonian is a 
characteristic feature of general-relativistic systems. 
The Hamiltonian structure of GR is therefore 
determined by its phase space and its constraints. 
The gauge-invariant formulation of the theory is 
given just by the set $(\Gamma_{ph}, \omega_{ph}, A_{ph})$ and no Hamiltonian. 
The physical interpretation of this structure is 
discussed in the last section. 


ADM Formalism 


In Einstein’s formulation, the Lagrangian variable of 
GR is the metric field $g_{\mu\nu}(x,t)$ (here we use the 
signature [−, +, +, +]). Arnowitt, Deser, and 
Misner have introduced the following change of 
variables: 


$$q_{ab} = g_{ab}, \qquad N^a = q^{ab} g_{b0}, \qquad N = 1/\sqrt{-g^{00}}$$ [6] 


where $q^{ab}$ is the inverse of the three-dimensional 
metric $q_{ab}$, used henceforth to raise and lower space 
indices a, b = 1, 2, 3. This is equivalent to writing the 
invariant interval in the form 


$$ds^2 = -N^2\,dt^2 + q_{ab}\,(dx^a + N^a\,dt)(dx^b + N^b\,dt)$$ 


These variables have an interesting geometric 
interpretation. Consider a family of spacelike (“ADM”) 
surfaces $\Sigma_t$ defined by t = constant. $q_{ab}$ is the 3-metric 
induced on the surface. N is called the “lapse” function 
and $N^a$ is called the “shift” function. Their geometrical 
interpretation is illustrated in Figure 2. 
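
As a numerical illustration of the change of variables [6], the following sketch builds a 4-metric from a lapse, a shift, and a 3-metric using the invariant interval above, and then recovers the ADM variables from it. The helper names (`invert`, `metric_from_adm`, `adm_from_metric`) and the sample values are assumptions made for this example only; index 0 is taken to be the time coordinate.

```python
import math


def invert(m):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def metric_from_adm(lapse, shift, q):
    """4-metric g_{mu nu} from lapse N, shift N^a and 3-metric q_ab."""
    shift_low = [sum(q[a][b] * shift[b] for b in range(3)) for a in range(3)]
    g = [[0.0] * 4 for _ in range(4)]
    g[0][0] = -lapse ** 2 + sum(shift_low[a] * shift[a] for a in range(3))
    for a in range(3):
        g[0][a + 1] = g[a + 1][0] = shift_low[a]
        for b in range(3):
            g[a + 1][b + 1] = q[a][b]
    return g


def adm_from_metric(g):
    """Recover (N, N^a, q_ab) via q_ab = g_ab, N^a = q^{ab} g_{b0}, N = 1/sqrt(-g^{00})."""
    q = [[g[a + 1][b + 1] for b in range(3)] for a in range(3)]
    q_inv = invert(q)
    shift = [sum(q_inv[a][b] * g[b + 1][0] for b in range(3)) for a in range(3)]
    lapse = 1.0 / math.sqrt(-invert(g)[0][0])
    return lapse, shift, q


# Arbitrary illustrative values, not taken from the text.
N = 1.5
Na = [0.2, -0.1, 0.3]
q = [[2.0, 0.1, 0.0], [0.1, 1.0, 0.2], [0.0, 0.2, 3.0]]

N_back, Na_back, q_back = adm_from_metric(metric_from_adm(N, Na, q))
print(N_back, Na_back)
```

The round trip returns the original lapse and shift up to floating-point error, which is a quick consistency check between [6] and the form of the interval.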

When written in terms of these variables, the 
action of GR takes the form 


$$S[q_{ab}, N, N^a] = \int d^4x\,\sqrt{q}\,N\,\big[R + k_{ab}k^{ab} - k^2\big]$$ 


where $q = \det q_{ab}$ and R are the determinant and the 
Ricci scalar of the metric $q_{ab}$; 


$$k_{ab} = \frac{1}{2N}\big(\partial_t q_{ab} - D_a N_b - D_b N_a\big)$$ 


is the extrinsic curvature of the constant time 
surface; and $D_a$ is the covariant derivative of $q_{ab}$. 
This action is independent of the time derivatives of 

Figure 2 The geometrical interpretation of the lapse N(x, t) 
and shift $N^a(x,t)$ fields. Two ADM surfaces, defined by the 
values t and t + dt, are displayed. N(x, t) dt is the proper length 
of the vector joining the two surfaces, normal to the first surface 
at (x, t). This is the proper time lapsed between the two surfaces 
for an observer at rest on the first surface at (x, t). The quantity 
$dx^a = N^a(x,t)\,dt$ is the shift (the displacement) between the 
endpoint of this vector and the point (x, t + dt) having the same 
spatial coordinates as (x, t). 


N and $N^a$. The conjugate momenta $\pi$ and $\pi_a$ of these 
quantities are therefore the primary constraints, and 
the pairs $(\pi, N)$ and $(\pi_a, N^a)$ can be taken out of the 
phase space as for the pair $(E^0, A_0)$ in the Maxwell 
example. We can therefore take the 3-metric $q_{ab}(x)$ 
and its conjugate momentum $p^{ab}(x)$ as the canonical 
variables of GR. The momentum is related to the 
“velocity” $\dot{q}_{ab}$ by 


$$p^{ab} = \sqrt{q}\,(k^{ab} - k\,q^{ab})$$ 


where $k = k_{ab}q^{ab}$. 
The secondary constraints [4] and [5] turn out to be 


$$C^a = \sqrt{q}\,D_b\!\left(\frac{p^{ab}}{\sqrt{q}}\right) = 0$$ [7] 


and 


$$C = \frac{1}{\sqrt{q}}\left(p^{ab}p_{ab} - \frac{1}{2}p^2\right) - \sqrt{q}\,R = 0$$ [8] 

where $p = p^{ab}q_{ab}$. 

If the two fields $q_{ab}(x,t)$ and $p^{ab}(x,t)$ satisfy the 
Hamilton equations 


$$\frac{\partial q_{ab}(x,t)}{\partial t} = \{q_{ab}(x,t), H(t)\}$$ [9] 

$$\frac{\partial p^{ab}(x,t)}{\partial t} = \{p^{ab}(x,t), H(t)\}$$ [10] 


where 


$$H(t) = \int d^3x\,\Big[N(x,t)\,C[q_{ab}(x,t), p^{ab}(x,t)] + N^a(x,t)\,C_a[q_{ab}(x,t), p^{ab}(x,t)]\Big]$$ 


with arbitrary functions N(x, t), $N^a(x,t)$, then the 
metric $g_{\mu\nu}(x,t)$, defined from $q_{ab}$, N, $N^a$ by eqn [6], is 
the general solution of the vacuum Einstein equation 
Ricci[g] = 0. Therefore, these equations provide a 
Hamiltonian form of the Einstein field equation. 
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Tetrad Formalism 


The tetrad formalism, developed by Cartan, Weyl, 
and Schwinger, has definite advantages with respect 
to the metric formalism. It allows the coupling of 
fermion fields to GR and is, therefore, needed to 
couple the standard model to GR. In the tetrad 
formalism, the gravitational field is represented by 
four covariant fields $e^I_\mu(x)$, where I, J, ... = 0, 1, 2, 3 
are flat Lorentz indices raised and lowered with the 
Minkowski metric $\eta_{IJ}$ = diag[−1, +1, +1, +1]. The 
relation with the metric formalism is given by 


$$g_{\mu\nu} = \eta_{IJ}\,e^I_\mu\,e^J_\nu$$ 


In this formulation, GR has an additional local 
SO(3,1) gauge invariance, given by local Lorentz 
transformations on the I indices. The corresponding 
canonical formalism is usually defined in a gauge 
in which $e^0_a = 0$, where i, j, ... = 1, 2, 3 are flat 
three-dimensional indices raised and lowered with 
$\delta_{ij}$ = diag[+1, +1, +1]. In this gauge, the 
Lorentz group is reduced to the local SO(3) group 
of spatial transformations, and the ADM variables 
are defined by 


$$e^I_\mu(x) = \begin{pmatrix} N & N^i \\ 0 & e^i_a \end{pmatrix}$$ [11] 


where $N^i = e^i_a N^a$. This is equivalent to writing the 
invariant interval in the form 


$$ds^2 = -N^2\,dt^2 + (e_{ia}\,dx^a + N_i\,dt)(e^i_b\,dx^b + N^i\,dt)$$ 


The reduced canonical variables can be taken to be 
the field $e^i_a(x)$ that represents the “triad” of the 
ADM surface, and its conjugate momentum $p^a_i(x)$. 
Their relation with the three-dimensional metric 
variables is given by transforming internal indices 
into tangent indices with the triad field $e^i_a$ and its 
inverse $e^a_i$. In particular, 


$$q_{ab} = \delta_{ij}\,e^i_a\,e^j_b$$ [12] 
p” = ep; [13] 
Also, for later reference, 
hi = ib} — 2 i Ti 14 
A ab = org Pa T 28a?) [14] 


where $p = e^i_a p^a_i$. 

The momentum and Hamiltonian constraints are 
the same as in the ADM formulation, with $q_{ab}$ and 
$p^{ab}$ expressed in terms of the triad variables. The 
additional constraint that generates the internal 
rotations is 


$$G_i = \epsilon_{ijk}\,e^j_a\,p^{ak} = 0$$ [15] 
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Ashtekar Formalism 


The Ashtekar formalism simplifies the form of the 
constraints and casts GR in a form having the same 
kinematics as Yang-Mills theory. With its variants, it 
is widely used in nonperturbative quantum gravity, in 
particular in the loop formulation (see Loop Quan- 
tum Gravity). It can be obtained from the tetrad 
canonical formalism by the canonical transformation 
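$$A^i_a = \tfrac{1}{2}\epsilon^i{}_{jk}\,\omega^{jk}_a + i\,k^i_a$$ [16] 


$$E^a_i = \det(e)\,e^a_i$$ [17] 


where $\omega^{ij} = \omega^{ij}_a\,dx^a$ is the (torsion-free) spin 
connection of the triad 1-form field $e^i = e^i_a\,dx^a$, determined 
by the Cartan equation 


$$de^i + \omega^i{}_j \wedge e^j = 0$$ 


The “electric” field E is real, while the Sen-Ashtekar 
connection $A^i = A^i_a\,dx^a$ is complex and satisfies the 
reality condition 


$$A^i_a + \bar{A}^i_a = 2\,\Gamma^i_a[e]$$ [18] 

where $\Gamma^i_a[e] = \tfrac{1}{2}\epsilon^i{}_{jk}\,\omega^{jk}_a$ is the real part of [16]. 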


The connection A’ has a simple geometrical inter- 
pretation. It is the pullback Ag =u) on the t=0 
ADM surface of the self-dual part 


$$\omega^{+IJ} = \frac{1}{2}\left(\omega^{IJ} - \frac{i}{2}\,\epsilon^{IJ}{}_{KL}\,\omega^{KL}\right)$$ 


of the four-dimensional torsion-free spin connection 
$\omega^{IJ}$ determined by the tetrad field. 

In terms of these fields, the constraint equations 
can be written in the form 


$$G_i = D_a E^a_i = 0$$ [19] 
$$C_a = F^i_{ab}\,E^b_i = 0$$ [20] 
$$C = \epsilon_{ijk}\,F^i_{ab}\,E^{aj}\,E^{bk} = 0$$ [21] 


where $D_a$ is the covariant derivative and $F^i_{ab}$ is the 
curvature defined by the connection A. The first of these 
constraints is the nonabelian version of the Gauss law 
[3]: it is the gauge constraint of Yang-Mills theory. The 
constraints are polynomial in the canonical variables. 

These equations are often written using a basis $\tau_i$ 
in the su(2) Lie algebra, and defining the su(2) 
connection $A = A^i\tau_i$ and the su(2)-valued vector 
field $E^a = E^{ai}\tau_i$. In terms of these fields the 
constraints can be written in the form 


$$G = D_a E^a = 0$$ 
$$C_a = \mathrm{tr}[F_{ab}E^b] = 0$$ 
$$C = \mathrm{tr}[F_{ab}E^a E^b] = 0$$ 


where the trace is on su(2). 


A variant of this formalism commonly used in 
quantum gravity is obtained by replacing [16] with 
the Barbero connection 

$$A^i_a = \tfrac{1}{2}\epsilon^i{}_{jk}\,\omega^{jk}_a + \gamma\,k^i_a$$ [22] 
where $\gamma$ is an arbitrary complex number, called the 
Immirzi parameter. In terms of this connection, [21] 
is replaced by 


T 1 2 
C = ep Fi pE E + 7 det efkapk® — K?) = 0 


where $e^i_a$ and $k_{ab}$ are given as functions of E and A by 
[22] and [17]. The choice $\gamma = 1$, with the constraints 
[19]-[21], gives the canonical formulation of Euclidean 
GR. 

All the formulations described extend readily to 
matter couplings. The structure of the constraints 
remains the same — with additional constraints corre- 
sponding to matter gauge invariances, if any. The GR 
constraints are modified by the addition of matter terms. 
In particular, the Hamiltonian constraint C and the 
momentum constraint C, are modified by the addition 
of terms determined by the energy density and the 
momentum density of the matter, respectively. In the 
Ashtekar formulation, a fermion field modifies the 
Gauss law constraint by the addition of a torsion term. 


Evolution 


In the gauge-invariant canonical structure of GR, there 
is no explicit time flow generated by a Hamiltonian. If 
the formalism is utilized just in order to express the 
Einstein equation in first-order canonical form, this is 
not a difficulty, because evolution in the coordinate 
time is generated by the constraints. On the other 
hand, if we are interested in understanding the 
structure of states, observables, and evolution of GR, 
the situation appears to be puzzling. An additional 
complication arises from the fact that virtually no 
gauge-invariant observable $A_{ph}$ is known explicitly as 
a function on the phase space. These issues become 
especially relevant when the canonical formalism is 
taken as a starting point for quantization. How is 
physical evolution represented in canonical GR? 

The first relevant observation is that the gauge-invariant 
phase space $\Gamma_{ph}$ is better understood as a 
phase space in the sense of Lagrange: namely as the 
space $\Gamma$ of the solutions of the equations of motion 
modulo gauges, rather than a space of instantaneous 
states. Recall that in GR the notion of “instanta- 
neous state” is rather unnatural. 

In the ADM formulation, for instance, an orbit on 
the constraint surface of GR can be understood as 
the ensemble of all possible values that the variables 


$(q_{ab}(x), p^{ab}(x))$ can take on arbitrary spacelike ADM 
surfaces embedded in a given solution of the 
Einstein equation. Motion along the orbit (which 
has dimension $4 \times \infty^3$) corresponds to arbitrary 
deformations of the surface. 

Physical applications of classical GR deal with 
relations between “partial observables.” A partial 
observable is any variable physical quantity that can 
be measured, even if its value cannot be determined 
from the knowledge of the physical state. An example 
of partial observable in nonrelativistic mechanics is 
given precisely by the nonrelativistic time t. Partial 
observables are represented in GR as functions on $\Gamma_0$. 
A physical state in $\Gamma_{ph}$ determines an orbit in C, and 
therefore a set of relations between partial observables 
(see Figure 1). That is, it determines the possible values 
that the partial observables can take “when” and 
“where” other partial observables have given values. 
All physical predictions of classical GR can be 
expressed in this form. 

One of the partial observables can be selected to 
play the role of a physical clock time, and evolution 
can be expressed in terms of such clock time. In 
general, it is difficult — if not impossible — to find a 
clock time observable in terms of which evolution is 
a proper conventional Hamiltonian evolution. Mat- 
ter couplings partially simplify the task. For 
instance, if the motion of planet Earth is coupled 
to GR, then proper time along this motion from a 
significant event on Earth, which is a partial 
observable, can be a convenient clock time. In pure 
gravity, the “York time” defined as the trace of the 
extrinsic curvature $T_Y = k$, on ADM surfaces where 
k is spatially constant, has been extensively and 
effectively used as a clock time in formal analysis of 
the theory. A Hamiltonian that generates evolution 
in a given clock time T can be formally obtained by 
solving the Hamiltonian constraint with respect to a 
momentum $P_T$ conjugate to T. Such “reparametrizations” 
of the relative evolution of the partial 
observables can be useful to analyze equations and 
to help intuition, but they are by no means necessary 
to have a well-defined interpretation of the theory. 

Another possibility to introduce a preferred time 
flow is to consider asymptotically flat solutions of 
the field equations. In this case, one can define a 
nonvanishing Hamiltonian, given by a boundary 
integral at spatial infinity. This Hamiltonian generates 
evolution in an asymptotic Minkowski time. 
This choice is convenient for describing observations 
performed from a large distance on isolated gravita- 
tional systems. Many general-relativistic physical 
observations do not belong to this category. 

Various other techniques to define a fully gen- 
erally covariant canonical formalism have been 
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explored. Among these: definitions of the physical 
symplectic structure directly on the space of the 
solutions of the field equations; generalization of the 
initial and final surfaces to boundaries of compact 
spacetime regions; construction of “evolving con- 
stants of motion,” namely families of gauge-invar- 
iant observables depending on a clock time 
parameter; multisymplectic formalisms that treat 
space and time derivatives on a more equal footing; 
and others. Many of these techniques are attempts 
to overcome the unequal way in which time and 
space dependence are treated in the conventional 
Hamiltonian formalism. 

GR has deeply modified our understanding of 
space and time. An extension of the canonical 
formalism of mechanics, compatible with such a 
modification, is needed, but consensus on the way 
(or even the possibility) of formulating a fully 
satisfactory general-relativistic extension of Hamil- 
tonian mechanics is still lacking. 


See also: Asymptotic Structure and Conformal Infinity; 
Constrained Systems; General Relativity: Overview; 
Loop Quantum Gravity; Quantum Cosmology; Quantum 
Geometry and its Applications; Spin Foams; 
Wheeler—De Witt Theory. 
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Introduction 


Shared entanglement between a sender and receiver 
can significantly improve the usefulness of a 
quantum channel for the communication of either 
classical or quantum data. Superdense coding and 
teleportation provide the most well-known examples 
of this improvement; free entanglement doubles the 
classical capacity of a noiseless quantum channel 
and makes it possible for a noiseless classical channel 
to send quantum data. In fact, the entanglement- 
assisted classical and quantum capacities of a 
quantum channel are in many senses simpler and 
better behaved than their unassisted counterparts 
(Holevo 1998, Schumacher and Westmoreland 
1997, Devetak 2005). Most importantly, these 
capacities can be calculated using simple formulas 
and finite optimization procedures (Bennett et al. 
1999, 2002). No such finite procedure is known for 
either of the unassisted capacities. Moreover, the 
entanglement-assisted classical and quantum capa- 
cities are related by a simple factor of 2. The 
unassisted capacities, in contrast, have completely 
different formulas. In fact, the simple factor of 2 
generalizes to a statement known as the quantum 
reverse Shannon theorem, which governs the rate at 
which one quantum channel can simulate another 
(Bennett et al. 2005). The answer is given by the 
ratio of the entanglement-assisted capacities. 


Notation 


Quantum systems will be denoted by A, B, and so 
on, as well as their variants such as $A'$ and $\tilde{A}$. The 
choice of letter will generally indicate which party 
holds a given system, with A reserved for the sender, 
Alice, and B for the receiver, Bob. Given a quantum 
system C, $C^{\otimes n}$ will often be written as $C^n$. These 
symbols will be used to denote both the Hilbert 
space of the quantum system and the set of density 
operators on that system. Thus, a quantum channel 
$\mathcal{N} : A' \to B$ refers to a trace-preserving, completely 
positive (TPCP) map from the operators on the 
Hilbert space of $A'$ to those of B. $\mathrm{id}^C$ refers to the 
identity channel on C. The map $\mathcal{N} \otimes \mathrm{id}^C$ will 
frequently be abbreviated to $\mathcal{N}$ in order to simplify 
long expressions. Likewise, the density operator 
$|\varphi\rangle\langle\varphi|$ of a pure quantum state $|\varphi\rangle$ will be 
abbreviated to $\varphi$. $\pi^C$ will refer to the maximally 
mixed state on C and $\pi_d$ to the maximally mixed 
state on a specified d-dimensional quantum system. 

For a given quantum state $\varphi^{AB}$ on the composite 
system AB, $\varphi^A = \mathrm{tr}_B\,\varphi^{AB}$ and 


$$H(A)_\varphi = H(\varphi^A) = -\mathrm{tr}(\varphi^A \log_2 \varphi^A)$$ [1] 

is the von Neumann entropy of $\varphi^A$, while 

$$H(A|B)_\varphi = -I(A\rangle B)_\varphi = H(AB)_\varphi - H(B)_\varphi$$ [2] 

is its conditional entropy and 

$$I(A;B)_\varphi = H(A)_\varphi + H(B)_\varphi - H(AB)_\varphi$$ [3] 


its mutual information. 


Entanglement-Assisted Classical 
and Quantum Capacities 


The entanglement-assisted classical capacity of a 
quantum channel $\mathcal{N} : A' \to B$ is the optimal rate at 
which classical information can be communicated 
through the channel while in addition making use of 
an unlimited number of maximally entangled states. 

The formal definition proceeds as follows. Alice 
and Bob are assumed to share nS ebits in the form of 
a maximally entangled state $|\Phi\rangle^{\tilde{A}\tilde{B}}$ of Schmidt rank 
$2^{nS}$. Conditioned on her message $m \in \{1, 2, \ldots, 2^{nR}\}$, 
Alice will apply an encoding operation $\mathcal{E}_m : \tilde{A} \to A'^n$. 
Bob’s decoding is given by a POVM $\{\Lambda_m\}$ on the 
composite system $B^n\tilde{B}$. The procedure is said to have 
maximum probability of error $\epsilon$ if every message is 
decoded correctly with probability at least $1 - \epsilon$: 


$$\min_m\, \mathrm{tr}\big[\Lambda_m\,(\mathcal{N}^{\otimes n} \circ \mathcal{E}_m)(\Phi)\big] \geq 1 - \epsilon$$ [4] 


These elements, illustrated in Figure 1, consisting of 
the shared entanglement, as well as the encoding and 
decoding operations meeting the criterion of eqn [4], 
are called a $(2^{nR}, 2^{nS}, n, \epsilon)$ entanglement-assisted 
classical code for the channel $\mathcal{N}$. A rate R is said to be 
achievable if there exists a choice of $S \geq 0$ and a 
sequence of entanglement-assisted classical codes 
$(2^{nR}, 2^{nS}, n, \epsilon_n)$ with $\epsilon_n \to 0$. The entanglement-assisted 
classical capacity $C_E(\mathcal{N})$ of $\mathcal{N}$ is defined to be the 
supremum over all achievable rates. 

Figure 1 Circuit representation of the elements of an 
entanglement-assisted classical code for the channel $\mathcal{N}$. Alice 
encodes message m by applying the operation $\mathcal{E}_m$ to her half 
of the shared entanglement. Bob decodes by applying the 
POVM $\{\Lambda_m\}$ on the output of the channel and his half of the 
shared entanglement. 


Theorem 1 (Bennett et al. 1999, 2002). The 
entanglement-assisted classical capacity $C_E$ of a 
quantum channel $\mathcal{N} : A' \to B$ is given by 


$$C_E(\mathcal{N}) = \max_\varphi I(A;B)_\sigma$$ [5] 


where the maximization is over states $\sigma^{AB} = \mathcal{N}(\varphi^{AA'})$ 
arising from the channel by acting on the $A'$ half of 
any pure state $|\varphi\rangle^{AA'}$. 


The theorem bears a strong formal resemblance to 
Shannon’s noisy coding theorem for the classical 
capacity of a classical noisy channel. There the 
capacity formula is also given by an optimization of 
the mutual information, but over joint distributions 
between the input and output alphabets arising from 
the action of the channel. Such a joint distribution 
cannot exist in general for a quantum channel 
because the no-cloning theorem excludes the possi- 
bility of the input and output existing simulta- 
neously. Equation [5] instead refers to a natural 
substitute for the joint input-output distribution: a 
quantum state arising from the quantum channel 
acting on half of an entangled pure state. 

Another point worth stressing is that, unlike the 
known formulas for the unassisted classical and 
quantum capacities of a quantum channel, eqn [5] 
refers to only a single use of $\mathcal{N}$ instead of the limit 
of many uses, $\mathcal{N}^{\otimes n}$. The formula can therefore 
readily be used to evaluate $C_E$ for any channel of 
interest. 
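
As a minimal sketch of such an evaluation, the following code computes the mutual information in eqn [5] for a d-dimensional depolarizing channel acting on half of a maximally entangled state. It relies on two assumptions stated here rather than derived: that the maximally entangled input attains the maximum for this channel, and that the output state $(1-p)\Phi + p\,\pi_d \otimes \pi_d$ has one eigenvalue $1 - p + p/d^2$ and $d^2 - 1$ eigenvalues $p/d^2$, with maximally mixed marginals. The function names are illustrative.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits, using the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def depolarizing_ce(p: float, d: int) -> float:
    """I(A;B) for the depolarizing channel on a maximally entangled input.

    The joint output spectrum is {1 - p + p/d^2, p/d^2 (d^2 - 1 times)},
    and both marginals are maximally mixed, so H(A) = H(B) = log2(d).
    """
    if not 0.0 <= p <= 1.0 or d < 2:
        raise ValueError("need 0 <= p <= 1 and d >= 2")
    joint_spectrum = [1.0 - p + p / d**2] + [p / d**2] * (d**2 - 1)
    h_a = h_b = math.log2(d)
    return h_a + h_b - shannon_entropy(joint_spectrum)


for p in (0.0, 0.25, 0.5, 1.0):
    print(p, round(depolarizing_ce(p, 2), 4))
```

At p = 0 the channel is noiseless and the value is $2\log_2 d$, the superdense coding rate; at p = 1 the output is independent of the input and the value drops to zero.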

Consider, for example, the d-dimensional depolarizing 
channel 


$$\mathcal{D}_p(\rho) = (1 - p)\,\rho + p\,\pi_d$$ [6] 


that with probability p completely randomizes the 
input but otherwise leaves the input invariant. For 
such channels, the maximum is achieved by choosing 
a maximally entangled state for $|\varphi\rangle^{AA'}$, yielding 


$$C_E(\mathcal{D}_p) = 2\log_2 d - h_{d^2}\!\left(1 - p\,\frac{d^2-1}{d^2}\right)$$ [7] 


where for any $0 \leq q \leq 1$ and integer r > 1, 


$$h_r(q) = -q\log_2 q - (1-q)\log_2\!\left(\frac{1-q}{r-1}\right)$$ [8] 


is the Shannon entropy of the distribution 
$(q, (1-q)/(r-1), \ldots, (1-q)/(r-1))$. 

Entanglement assistance also simplifies the 
relationship between the classical and quantum 
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capacities of a channel. Proceeding as before to 
formally define the quantum capacity, Alice and Bob 
are again assumed to share a maximally entangled 
state $|\Phi\rangle^{\tilde{A}\tilde{B}}$ of Schmidt rank $2^{nS}$. Alice’s encoding 
operation will be a TPCP map $\mathcal{E} : A\tilde{A} \to A'^n$ acting 
on an input system A and her half of the shared 
entanglement, $\tilde{A}$. Bob’s decoding will likewise be a 
TPCP map $\mathcal{D} : B^n\tilde{B} \to \hat{B}$ acting on the output of the 
channel, $B^n$, and his half of the shared entanglement, 
$\tilde{B}$. A and $\hat{B}$ are assumed to be isomorphic 
quantum systems of some fixed dimension $2^{nQ}$. The 
procedure is said to have subspace fidelity $1 - \epsilon$ if 


$$\langle\varphi|\,(\mathcal{D} \circ \mathcal{N}^{\otimes n} \circ \mathcal{E})(\varphi^A \otimes \Phi^{\tilde{A}\tilde{B}})\,|\varphi\rangle \geq 1 - \epsilon$$ [9] 


for all $|\varphi\rangle^A \in A$. These elements, illustrated in 
Figure 2, are together called a $(2^{nQ}, 2^{nS}, n, \epsilon)$ 
entanglement-assisted quantum code for the channel 
$\mathcal{N}$. A rate Q is said to be achievable if there exists a 
choice of $S \geq 0$ and a sequence of entanglement-assisted 
quantum codes $(2^{nQ}, 2^{nS}, n, \epsilon_n)$ with $\epsilon_n \to 0$. 
The entanglement-assisted quantum capacity $Q_E(\mathcal{N})$ 
of $\mathcal{N}$ is defined to be the supremum over all 
achievable rates. 

There is considerable freedom in the definition of 
the entanglement-assisted quantum capacity. It 
could, for example, be defined as the largest amount 
of maximal entanglement that can be generated 
using the channel, minus the entanglement con- 
sumed during the protocol itself. Alternatively, the 
fidelity criterion eqn [9] could be strengthened to 
require that $\mathcal{D} \circ \mathcal{N}^{\otimes n} \circ \mathcal{E}$ preserve not only pure 
states on A but any entanglement between A and a 
reference system. All of these variants yield the same 
capacity formula: 


$$Q_E(\mathcal{N}) = \frac{1}{2}\,C_E(\mathcal{N})$$ [10] 


This equivalence is a direct consequence of the 
existence of the teleportation and superdense coding 
protocols. When maximal entanglement is available, 
teleportation converts the ability to send classical 
data into the ability to send quantum data at half 
the classical rate. Conversely, by consuming 
maximal entanglement, superdense coding converts 
the ability to send quantum data into the ability to 
send classical data at double the quantum rate. 

Figure 2 Circuit representation of the elements of an 
entanglement-assisted quantum code for the channel $\mathcal{N}$. $\mathcal{E}$ is 
Alice’s encoding operation, which acts on both her input state 
and her half of the shared entanglement. Bob decodes using a 
quantum operation $\mathcal{D}$ acting on the output of the channel and his 
half of the shared entanglement. 


Sketch of Proof 


The proof of a capacity theorem can usually be 
broken into two parts, achievability and optimality. 
The achievability part demonstrates the existence of 
a sequence of codes reaching the prescribed rate 
while the optimality part shows that it is impossible 
to do better. 

The main idea in the achievability proof can be 
understood by studying the special case where 
$\varphi^{A'} = \pi^{A'}$. Let $d_{A'} = \dim A'$ and $\{U_j\}$ be a set of 
Weyl operators for $A'$. The relevant property of 
these operators is that averaging over them implements 
the constant map: for all density operators $\rho$, 


$$\frac{1}{d_{A'}^2}\sum_{j=1}^{d_{A'}^2} U_j\,\rho\,U_j^\dagger = \pi^{A'}$$ [11] 


Consider the state $\sigma_j$ that arises if Alice acts with $U_j$ 
on the $A'$ half of a rank-$d_{A'}$ maximally entangled 
state $|\varphi\rangle^{AA'}$ and then sends the $A'$ half of the 
resulting state through $\mathcal{N}$. (Note that here $A'$ also 
plays the role of $\tilde{A}$.) The entropy of the resulting 
state is 


$$H(\sigma_j) = H\big(\mathcal{N}\big((U_j \otimes I_A)\,\varphi\,(U_j^\dagger \otimes I_A)\big)\big)$$ [12] 


$$= H(\mathcal{N}(\varphi))$$ [13] 


since $U_j$ does not change the local density operator 
on $A'$. 

On the other hand, if Alice selects a value of j 
from the uniform distribution, then the resulting 
average input state to the channel will be 


$$\pi^{A'} \otimes \pi^A = \varphi^{A'} \otimes \varphi^A$$ [14] 

and the corresponding average output state will be 
$\mathcal{N}(\varphi^{A'}) \otimes \varphi^A$, which has entropy 

$$H(\mathcal{N}(\varphi^{A'})) + H(\varphi^A)$$ [15] 


Therefore, the Holevo quantity of the ensemble of 
output states, defined as the entropy of the average 
state minus the average of the entropies of the 
individual output states, will be equal to 


$$H(\varphi^A) + H(\mathcal{N}(\varphi^{A'})) - H(\mathcal{N}(\varphi^{AA'}))$$ [16] 


This is precisely the quantity $I(A;B)_\sigma$ for the state 
$\sigma = \mathcal{N}(\varphi^{AA'})$ since the channel $\mathcal{N}$ transforms the $A'$ 
system into B. Moreover, if Bob is given the A part of 
the maximally entangled state, then this is the Holevo 
quantity of an ensemble of states that can be produced 
by Alice acting on half of a shared entangled state and 
then sending her half through the channel. Invoking 
the Holevo-Schumacher-Westmoreland (HSW) 
theorem for the classical capacity (Holevo 1998, 
Schumacher and Westmoreland 1997) therefore 
completes the proof; using coding, the Holevo quantity is 
an achievable communication rate. 

The proof that eqn [5] is optimal involves a series 
of entropy manipulations similar to the optimality 
proofs for the unassisted classical and quantum 
capacities. From the point of view of quantum 
information, the truly unusual part of the proof is 
the demonstration that it is unnecessary to consider 
multiple copies of $\mathcal{N}$ (Cerf and Adami 1997). 
Specifically, let 


$$f(\mathcal{N}) = \max_\varphi I(A;B)_\sigma$$ [17] 


where the maximization is defined as in Theorem 1. 
Techniques analogous to those used for the unassisted 
capacities yield the upper bound 


$$C_E(\mathcal{N}) \leq \lim_{n\to\infty}\frac{1}{n}\,f(\mathcal{N}^{\otimes n})$$ [18] 

Unlike the unassisted case, however, a relatively easy 
argument shows that 


$$f(\mathcal{N}_1 \otimes \mathcal{N}_2) = f(\mathcal{N}_1) + f(\mathcal{N}_2)$$ [19] 


(The analogous statement is an important conjecture 
for the classical capacity and is known to be false for 
the quantum capacity (DiVincenzo et al. 1998).) As 
a result, $C_E(\mathcal{N}) \leq f(\mathcal{N})$, which is the optimality part 
of Theorem 1. 

To see the origin of eqn [19], it will be helpful to 
invoke Stinespring’s theorem to write 
$\mathcal{N}_j = \mathrm{tr}_{E_j} \circ\, \mathcal{U}_j$, where $\mathcal{U}_j : A'_j \to B_jE_j$ is an 
isometry. Fix a state $\varphi^{AA'_1A'_2}$ and let 
$\sigma = (\mathcal{U}_1 \otimes \mathcal{U}_2)(\varphi)$. Equation [19] 
follows from the fact that 


$$I(A;B_1B_2)_\sigma \leq I(AB_2E_2;B_1)_\sigma + I(AB_1E_1;B_2)_\sigma$$ [20] 


Simply redefining A to be $AB_2E_2$ shows that the first 
term of the right-hand side is upper bounded by 
$f(\mathcal{N}_1)$. The second term, likewise, is upper bounded 
by $f(\mathcal{N}_2)$. Equation [20] is itself equivalent to the 
inequality 


$$H(B_1B_2|E_1E_2)_\sigma + H(B_1B_2)_\sigma \leq H(B_1|E_1)_\sigma + H(B_2|E_2)_\sigma + H(B_1)_\sigma + H(B_2)_\sigma$$ [21] 


The inequality $H(B_1B_2)_\sigma \leq H(B_1)_\sigma + H(B_2)_\sigma$ holds 
by the subadditivity of the von Neumann entropy. 
Repeated applications of the strong subadditivity 
inequality, moreover, lead to the inequality 


$$H(B_1B_2|E_1E_2)_\sigma \leq H(B_1|E_1)_\sigma + H(B_2|E_2)_\sigma$$ [22] 

Together, they prove eqn [20] and, thence, eqn [19]. 
The intuitive meaning of this “single-letterization” is 
unclear, but regardless, it is interesting to note that 
the proof involved invoking a pair of purifying 
environment systems, $E_1$ and $E_2$, and studying the 
entropy relationships between the true outputs of 
the channel and the environment’s share. 
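
The equivalence between eqns [20] and [21] can be checked by a short calculation. The sketch below assumes that the input state is pure, so that $\sigma$ is a pure state on $AB_1B_2E_1E_2$ and the entropy of any subsystem equals the entropy of its complement; that purity assumption is made explicit here and is what allows each entropy involving A to be rewritten in terms of the B and E systems.

```latex
\begin{align*}
I(A;B_1B_2)_\sigma
  &= H(A)_\sigma + H(B_1B_2)_\sigma - H(AB_1B_2)_\sigma \\
  &= H(B_1B_2E_1E_2)_\sigma + H(B_1B_2)_\sigma - H(E_1E_2)_\sigma \\
  &= H(B_1B_2|E_1E_2)_\sigma + H(B_1B_2)_\sigma , \\[4pt]
I(AB_2E_2;B_1)_\sigma
  &= H(AB_2E_2)_\sigma + H(B_1)_\sigma - H(AB_1B_2E_2)_\sigma \\
  &= H(B_1E_1)_\sigma + H(B_1)_\sigma - H(E_1)_\sigma \\
  &= H(B_1|E_1)_\sigma + H(B_1)_\sigma , \\[4pt]
I(AB_1E_1;B_2)_\sigma
  &= H(B_2|E_2)_\sigma + H(B_2)_\sigma .
\end{align*}
```

Substituting these three identities into eqn [20] gives eqn [21] term by term, with the last identity following from the second by exchanging the labels 1 and 2.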


The Quantum Reverse Shannon Theorem 


A strong argument can be made that the entanglement- 
assisted capacity of a quantum channel is the most 
important capacity of that channel and that all the 
other capacities are, in some sense, of less significance. 
The fact that it is unnecessary to distinguish between 
the classical and quantum entanglement-assisted capa- 
cities because they are related by a factor of 2 is a hint 
in that direction, as is the simple, single-letter formula 
for $C_E(\mathcal{N})$. 

A more general argument can be made by 
considering the problem of having one channel 
simulate another. Indeed, the quantum capacity of 
a quantum channel is simply the optimal rate at 
which that channel can simulate the noiseless 
channel $\mathrm{id}_2$ on a single qubit. Likewise, the classical 
capacity of a quantum channel is its optimal rate for 
simulation of a qubit dephasing channel 


$$\rho \mapsto |0\rangle\langle 0|\,\rho\,|0\rangle\langle 0| + |1\rangle\langle 1|\,\rho\,|1\rangle\langle 1|$$ [23] 


In this spirit, the fact that $C_E(\mathcal{N}) = 2\,Q_E(\mathcal{N})$ can be 
re-expressed in the form 

$$Q_E(\mathcal{N}) = \frac{C_E(\mathcal{N})}{C_E(\mathrm{id}_2)}$$ [24] 

Equivalently, when entanglement is free, the optimal 
rate at which $\mathcal{N}$ can simulate a noiseless qubit channel 
is given by the ratio between the entanglement-assisted 
classical capacities of $\mathcal{N}$ and $\mathrm{id}_2$. The 
quantum reverse Shannon theorem generalizes this 
statement to the simulation of arbitrary channels in 
the presence of free entanglement. 

Suppose that Alice and Bob would like to use 
$\mathcal{N}_1 : A' \to B$ to simulate another channel $\mathcal{N}_2 : A' \to B$. 
Fix an input state $\varphi^{A'}$ and let $|\varphi\rangle^{AA'^n}$ be a purification 
of $(\varphi^{A'})^{\otimes n}$. As always, assume that Alice and Bob share 
a maximally entangled state $|\Phi\rangle^{\tilde{A}\tilde{B}}$ of Schmidt rank 
$2^{nS}$. Alice’s encoding operation will be a TPCP map 
$\mathcal{E} : A'^n\tilde{A} \to A'^m$ acting on n copies of the input system 
$A'$ and her half of the shared entanglement, $\tilde{A}$. Bob’s 
decoding will likewise be a TPCP map $\mathcal{D} : B^m\tilde{B} \to B^n$ 
acting on m copies of the output of the channel, and his 
half of the shared entanglement, $\tilde{B}$. This procedure is 
said to $\epsilon$-simulate $\mathcal{N}_2^{\otimes n}$ on $(\varphi^{A'})^{\otimes n}$ if 


$$F\big(\mathcal{N}_2^{\otimes n}(\varphi^{AA'^n}),\ (\mathcal{D} \circ \mathcal{N}_1^{\otimes m} \circ \mathcal{E})(\varphi^{AA'^n} \otimes \Phi^{\tilde{A}\tilde{B}})\big) \geq 1 - \epsilon$$ [25] 


where F is the mixed state fidelity 
$F(\rho, \sigma) = (\mathrm{tr}\sqrt{\rho^{1/2}\sigma\rho^{1/2}})^2$. The entire procedure, 
illustrated in Figure 3, is said to be a $(2^{nS}, m, n, \epsilon)$ 
entanglement-assisted simulation of $\mathcal{N}_2$ by $\mathcal{N}_1$. A rate R, 
measured in copies of $\mathcal{N}_2$ per copy of $\mathcal{N}_1$, is said to be 
achievable for $\varphi^{A'}$ if there exists a choice of $S \geq 0$ and 
a sequence of $(2^{nS}, m_n, n, \epsilon_n)$ entanglement-assisted 
simulations with $n/m_n \to R$ while $\epsilon_n \to 0$. 

The quantum reverse Shannon theorem states 
that the entanglement-assisted capacity completely 
governs the achievable simulation rates. 


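Theorem 2 (Winter 2004, Bennett et al.). Given 
two channels $\mathcal{N}_1 : A' \to B$ and $\mathcal{N}_2 : A' \to B$, R is an 
achievable simulation rate for $\mathcal{N}_2$ by $\mathcal{N}_1$ and all 
input states $\varphi^{A'}$ if and only if 

$$R \le \frac{C_E(\mathcal{N}_1)}{C_E(\mathcal{N}_2)} \qquad [26]$$

Note that the form of eqn [26] ensures that the 
simulation is asymptotically reversible: if a channel 
$\mathcal{N}_1$ is used to simulate $\mathcal{N}_2$ and the simulation is then 
used to simulate $\mathcal{N}_1$ again, then the overall rate 
becomes 

$$\frac{C_E(\mathcal{N}_1)}{C_E(\mathcal{N}_2)}\cdot\frac{C_E(\mathcal{N}_2)}{C_E(\mathcal{N}_1)} = 1 \qquad [27]$$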

Thus, in the presence of free entanglement and for a 
known input density operator of the form $(\varphi^{A'})^{\otimes n}$, a 
single parameter, the entanglement-assisted classical 
capacity, suffices to completely characterize the 
asymptotic properties of a quantum channel. 
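
As a rough numerical illustration of eqns [26] and [27], the sketch below evaluates the optimal simulation rate between two qubit depolarizing channels. The channel family, the parameter name `p`, and the function names are assumptions made for this example only. The entanglement-assisted capacity is computed as the mutual information for a maximally entangled input, which is taken here to be the optimal input for this symmetric channel.

```python
import math

def depolarizing_ce(p: float) -> float:
    """Mutual information I(A;B), in bits, for the qubit depolarizing channel
    rho -> (1 - p) rho + p I/2 with a maximally entangled input.
    Assumption: this input attains the entanglement-assisted capacity C_E."""
    # Eigenvalues of the two-qubit output state (1 - p)|Phi><Phi| + p I/4.
    eigs = [1 - 3 * p / 4] + [p / 4] * 3
    joint_entropy = -sum(x * math.log2(x) for x in eigs if x > 0)
    # Both marginals are maximally mixed, so H(A) = H(B) = 1 bit.
    return 2.0 - joint_entropy

def simulation_rate(ce_resource: float, ce_target: float) -> float:
    """Optimal rate from eqn [26]: copies of the target per copy of the resource."""
    return ce_resource / ce_target

ce1 = depolarizing_ce(0.1)
ce2 = depolarizing_ce(0.3)
forward = simulation_rate(ce1, ce2)   # N_1 simulating N_2
backward = simulation_rate(ce2, ce1)  # N_2 simulating N_1
print(forward, backward, forward * backward)  # the product illustrates eqn [27]
```

Setting `ce_target = 2.0`, the value for the noiseless qubit channel, recovers eqn [24].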





Figure 3 Circuit representation of an entanglement-assisted 
simulation of $\mathcal{N}_2$ by $\mathcal{N}_1$. (a) The simulation circuit, with Alice's 
encoding operation $\mathcal{E}$ acting on n copies of A' and Bob's 
decoding operation producing n copies of B. (b) The circuit that 
the protocol is intended to simulate. As stated, the quantum 
reverse Shannon theorem allows the simulation circuit to depend 
on the density operator of the input state restricted to $A'^n$. 

Moreover, since two channels that are asymptotically 
equivalent without free entanglement will 
surely remain equivalent if free entanglement is 
permitted, eqn [26] gives essentially the only 
possible nontrivial, single-parameter asymptotic 
characterization of quantum channels. This is the 
sense in which the entanglement-assisted capacity 
should be regarded as the most important capacity 
of a quantum channel. 

The proof of the quantum reverse Shannon 
theorem is quite involved, but some of its features 
can be understood without much work. First, note 
that by the optimality statement of the entanglement- 
assisted classical capacity, the desired simulation can 
exist only if eqn [26] holds. Otherwise, composing 
the simulation of $\mathcal{N}_2$ by $\mathcal{N}_1$ with a sequence of codes 
achieving $C_E(\mathcal{N}_2)$ would result in a sequence of codes 
beating the capacity formula for $\mathcal{N}_1$. 

Similarly, note that one method to simulate a 
channel $\mathcal{N}_2$ using $\mathcal{N}_1$ is to first use $\mathcal{N}_1$ to simulate 
the noiseless channel and then use the simulated 
noiseless channel to simulate $\mathcal{N}_2$. Since the achievable 
rates for the first step are characterized by the 
entanglement-assisted capacity theorem, proving the 
achievability part of Theorem 2 reduces to finding 
protocols for simulating a general noisy quantum 
channel $\mathcal{N}_2$ by a noiseless one. That perhaps sounds 
like a strange goal, but it is nonetheless the difficult 
part of the quantum reverse Shannon theorem. 

It is likely that the quantum reverse Shannon 
theorem can be extended to cover other types of 
inputs than the known tensor power states $(\varphi^{A'})^{\otimes n}$. 
The most desirable form of the theorem would be 
one valid for all possible input density operators on 
$A'^{\otimes n}$, providing a single simulation procedure 
dependent only on the channels and not the input 
state. It is known that without modifying the form 
of the free entanglement, this most ambitious form 
of the theorem fails, but it is conjectured that the 
full-strength theorem does hold provided very large 
amounts of entanglement are supplied in the form of 
the so-called embezzling states (van Dam and 
Hayden 2003). 


Relationships between Protocols 


There is another sense in which the entanglement- 
assisted capacity can be viewed as the fundamental 
capacity of a quantum channel: an efficient protocol 
for achieving the entanglement-assisted capacity can 
be converted into protocols achieving the unassisted 
quantum and classical capacities, or at least very 
close variants thereof. 

An efficient protocol in this case refers to one that 
does not waste entanglement. Suppose that $\mathcal{N} : A' \to B$ 
can be written $\mathrm{tr}_E \circ U^{A'\to BE}$ for some isometry $U^{A'\to BE}$. Let 
$|\varphi\rangle^{AA'}$ be a pure state and $|\sigma\rangle^{ABE} = U^{A'\to BE}|\varphi\rangle^{AA'}$ the 
corresponding purified channel output state. Careful 
analysis of the entanglement-assisted classical 
communication protocol achieving the rate $I(A;B)_\sigma$ leads to 
an entanglement-assisted quantum communication 
protocol consuming entanglement at the rate 
$\tfrac12 I(A;E)_\sigma$ ebits per use of $\mathcal{N}$ and yielding 
communication at the rate of $\tfrac12 I(A;B)_\sigma$ qubits per use of $\mathcal{N}$. 
The protocol achieving this goal is known as the 
“father” (Devetak et al. 2004). 

If the entanglement consumed in the father were 
actually supplied by quantum communication from 
Alice to Bob, then the net rate of quantum 
communication produced by the resulting protocol 
would be $\tfrac12 I(A;B)_\sigma - \tfrac12 I(A;E)_\sigma$ qubits from 
Alice to Bob, that is, the total produced minus the 
total consumed. 

This quantity, how much more information B has 
about A than E does, can be simplified using an 
interesting identity. Since $|\sigma\rangle^{ABE}$ is pure, 

$$I(A;E)_\sigma = H(A)_\sigma + H(E)_\sigma - H(AE)_\sigma \qquad [28]$$

$$\phantom{I(A;E)_\sigma} = H(A)_\sigma + H(AB)_\sigma - H(B)_\sigma \qquad [29]$$

Expanding $I(A;B)_\sigma$ and canceling terms then reveals 
that 

$$\tfrac12 I(A;B)_\sigma - \tfrac12 I(A;E)_\sigma = -H(A|B)_\sigma = I_c(A\rangle B)_\sigma \qquad [30]$$

where the function $I_c$ is known as the coherent 
information. After optimizing over input states and 
multiple channel uses, this is precisely the formula for 
the unassisted quantum capacity of a quantum channel 
(Devetak 2005). Thus, the net rate of qubit commu- 
nication for the protocol derived from the father 
exactly matches the rates necessary to achieve the 
unassisted quantum capacity. The only caveat is that 
the protocol derived from the father uses quantum 
communication catalytically, meaning that some com- 
munication needs to be invested in order to get a gain 
of $I_c(A\rangle B)_\sigma$. For the unassisted quantum capacity, no 
investment is necessary. Nonetheless, detailed analysis 
of the situation reveals that the amount of catalytic 
communication required can be reduced to an amount 
sublinear in the number of channel uses, meaning the 
rate of required investment can be made arbitrarily 
small. In this sense, the father protocol essentially 
generates the optimal protocols for the unassisted 
quantum capacity. 
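
The cancellation behind eqn [30] can be written out explicitly. The following lines use only the definition of the mutual information and the purity of the state on ABE, which gives H(E) = H(AB) and H(AE) = H(B); all entropies are evaluated on that state.

```latex
\begin{align*}
\tfrac12 I(A;B) - \tfrac12 I(A;E)
  &= \tfrac12\big[H(A) + H(B) - H(AB)\big]
   - \tfrac12\big[H(A) + H(AB) - H(B)\big] \\
  &= H(B) - H(AB) \\
  &= -H(A|B).
\end{align*}
```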

Protocols achieving the unassisted classical capacity 
can be constructed in a similar way. In this case, 
one starts from an ensemble $\mathcal{E} = \{p_i, \mathcal{N}(\psi_i^{A'})\}$ of 
states generated by the channel. Achievability of 
the unassisted classical capacity formula follows 
from achievability of rates of the form 

$$\chi(\mathcal{E}) = H\Big(\sum_i p_i\,\mathcal{N}(\psi_i)\Big) - \sum_i p_i\,H\big(\mathcal{N}(\psi_i)\big) \qquad [31]$$

for arbitrary ensembles of output states. Consider 
the channel 

$$\tilde{\mathcal{N}}(\rho) = \sum_i \langle i|\rho|i\rangle\,\mathcal{N}(\psi_i) \qquad [32]$$

and input state $|\varphi\rangle^{AA'} = \sum_i \sqrt{p_i}\,|i\rangle^{A}|i\rangle^{A'}$. If $\sigma = \tilde{\mathcal{N}}(\varphi)$, 
then $I(A;B)_\sigma$ is equal to $\chi(\mathcal{E})$. Thus, there are protocols 
consuming entanglement that achieve the classical 
communications rate $\chi(\mathcal{E})$ for the modified channel 
$\tilde{\mathcal{N}}$. Because the channel $\tilde{\mathcal{N}}$ includes an orthonormal 
measurement which destroys all entanglement between 
A and B, however, it can be argued that any 
entanglement used in such a protocol could be replaced 
by shared randomness, which could then in turn be 
eliminated by a standard derandomization argument. 
The net result is a procedure for choosing rate $\chi(\mathcal{E})$ 
codes for the channel $\mathcal{N}$ consisting of tensor products 
of the states $\psi_i$, which is the essence of the achievability 
proof for the unassisted classical capacity. 

This may seem like an unnecessarily cumbersome 
and even circular approach to the unassisted 
classical capacity given that the proof sketched 
above for the entanglement-assisted classical capa- 
city itself invokes the unassisted result in the form of 
the HSW theorem. The approach becomes more 
satisfying when one learns that simple and direct 
proofs of the father protocol exist that completely 
bypass the HSW theorem (Abeyesinghe et al. 2005). 

Thus, the entanglement-assisted communication 
protocols can be easily transformed into their 
unassisted analogs, confirming the central place of 
entanglement-assisted communication in quantum 
information theory. 


Introduction 


Any processing of quantum information, be it 
storage or transfer, can be represented as a quantum 
channel: a completely positive and trace-preserving 
map that transforms states (density matrices) on the 
sender’s end of the channel into states on the 
receiver’s end. Very often, the channel S that sender 
and receiver (conventionally called Alice and Bob, 
respectively) would like to implement is not readily 
available, typically due to detrimental noise effects, 
limited technology, or insufficient funding. They 
may then try to simulate S with some other channel 
T, which they happen to have at their disposal. The 
quantum channel capacity $Q(T,S)$ of T with respect 
to S quantifies how well this simulation can be 
performed, in the limit of long input strings, so that 
Alice and Bob can take advantage of collective pre- 
and post-processing (cf. Figure 1). Higher capacities 
may result if Alice and Bob are allowed to use 
additional resources in the process, such as classical 
side channels or a bunch of maximally entangled 
pairs shared between them. 

Quantum capacity thus gives the ultimate benchmarks 
for the simulation of one quantum channel by 
another and for the optimal use of auxiliary 
resources. Together with the compression rate of a 
quantum source (see Source Coding in Quantum 
Information Theory), it lies at the heart of quantum 
information theory. 

Figure 1 Equipped with collective encoding and decoding 
operations (and perhaps some auxiliary resources), n = 3 
instances of the channel T simulate m = 2 instances of the 
channel S. The transmission rate of the above scheme is 2/3. 
Capacity is the largest such rate, in the limit of long messages 
and optimal encoding and decoding. 

In a very typical scenario, Alice and Bob would 
like to implement the ideal (noiseless) quantum 
channel S=id: they are interested in sending 
quantum states undistorted over some distance, or 
want to store them safely for some period of time, so 
that all the precious quantum correlations are 
preserved. The capacity $Q(T) = Q(T, \mathrm{id})$ is then the 
maximal number of qubit transmissions per use of 
the channel, taken in the limit of long messages and 
using collective encoding and decoding schemes 
asymptotically eliminating all transmission errors. 
This is what is generally called the quantum capacity 
of the channel T, and it is our main focus in this 
article. Little is known so far about the quantum 
capacity for the simulation of other (nonideal) 
channels (cf. the section “Related capacities”). 

In remarkable contrast to the classical setting, 
quantum channel capacities are very much affected 
by additional resources. This leads to unexpected 
and fascinating applications such as teleportation 
and dense coding. But it also results in a bewildering 
variety of inequivalent channel capacities, which still 
hold many challenges for future research. 


Notation 


A quantum channel which transforms input systems 
on a Hilbert space $\mathcal{H}_A$ into output systems on a 
(possibly different) Hilbert space $\mathcal{H}_B$ is represented 
(in Schrödinger picture) by a completely positive and 
trace-preserving linear map $T : \mathcal{B}_*(\mathcal{H}_A) \to \mathcal{B}_*(\mathcal{H}_B)$, 
where $\mathcal{B}_*(\mathcal{H})$ denotes the space of trace class 
operators on the Hilbert space $\mathcal{H}$ (see Channels in 
Quantum Information Theory). We write $\mathcal{A}$ instead 
of $\mathcal{B}_*(\mathcal{H}_A)$ to streamline the presentation, and $\mathcal{A}^{\otimes n}$ for 
the n-fold tensor product $\mathcal{B}_*(\mathcal{H}_A)^{\otimes n}$. 

It is evident that the definition of channel capacity 
requires the comparison of different quantum 
channels. A suitable distance measure is the norm 
of complete boundedness (or cb-norm, for short), 
denoted by $\|\cdot\|_{cb}$. For two channels T and S, the 
distance $\tfrac12\|T - S\|_{cb}$ can be defined as the largest 
difference between the overall probabilities in two 
statistical quantum experiments differing only by 
exchanging one use of S by one use of T. These 
experiments may involve entangling the systems on 
which the channels act with arbitrary further 
systems; hence the cb-norm remains a valid distance 
measure if the given channel is only part of a larger 
system. Equivalently, we may set 
$\|T\|_{cb} := \sup_n \|T \otimes \mathrm{id}_n\|$, where 
$\|R\| := \sup_{\|\rho\|_1 \le 1} \|R(\rho)\|_1$ 
denotes the norm of linear operators, and 
$\|\rho\|_1 := \mathrm{tr}\sqrt{\rho^*\rho}$ is the trace norm on the space of 
trace-class operators $\mathcal{B}_*(\mathcal{H})$. 

We use base two logarithms throughout, and we 
write $\mathrm{ld}\,x := \log_2 x$ and $\exp_2 x := 2^x$. 


Quantum Channel Capacity 


The intuitive concept underlying quantum channel 
capacity is made rigorous in the following 
definition: 


Definition 1 A positive number R is called an achievable 
rate for the quantum channel $T : \mathcal{A} \to \mathcal{B}$ with 
respect to the quantum channel $S : \mathcal{A}' \to \mathcal{B}'$ iff for any 
pair of integer sequences $(n_\nu)_{\nu\in\mathbb{N}}$ and $(m_\nu)_{\nu\in\mathbb{N}}$ with 
$\lim_{\nu\to\infty} n_\nu = \infty$ and $\limsup_{\nu\to\infty} m_\nu/n_\nu < R$ we have 

$$\lim_{\nu\to\infty}\ \inf_{D,E}\ \big\|D\,T^{\otimes n_\nu}E - S^{\otimes m_\nu}\big\|_{cb} = 0 \qquad [1]$$

the infimum taken over all encoding channels E and 
decoding channels D with suitable domain and 
range. The channel capacity $Q(T,S)$ of T with 
respect to S is defined to be the supremum of all 
achievable rates. The quantum capacity is the special 
case $Q(T) := Q(T, \mathrm{id}_2)$, with $\mathrm{id}_2$ being the ideal 
qubit channel. 


In this article, we mainly concentrate on 
channels between finite-dimensional systems. This 
is enough to bring out the basic ideas. Many of the 
concepts and results discussed here can be general- 
ized to Gaussian channels, which play a central 
role as building blocks for quantum optical 
communication lines (Holevo and Werner 2001, 
Eisert and Wolf). 

There is considerable freedom in the definition 
of quantum channel capacity, at least for ideal 
reference channels (Kretschmann and Werner 
2004). In particular, the encoding channels E in 
eqn [1] may always be restricted to isometric 
embeddings. 

In addition, it is not necessary to check an infinite 
number of pairs of sequences $(n_\nu)_{\nu\in\mathbb{N}}$ and $(m_\nu)_{\nu\in\mathbb{N}}$ 
when testing a given rate R, as Definition 1 would 
suggest. Instead, it is enough to find one such pair 
which achieves the rate R infinitely often, 
$\lim_{\nu\to\infty} m_\nu/n_\nu = R$. 

Without affecting the capacity, the cb-norm $\|\cdot\|_{cb}$ 
may be replaced by the unstabilized operator norm 
$\|\cdot\|$ or by fidelity measures, which are in general 
much easier to compute. In particular, one might 
choose the minimum fidelity, 

$$F(T) := \min_{\|\psi\|=1} \langle\psi|\,T(|\psi\rangle\langle\psi|)\,|\psi\rangle \qquad [2]$$

or even the average fidelity, 

$$\bar{F}(T) := \int \langle\psi|\,T(|\psi\rangle\langle\psi|)\,|\psi\rangle\, d\psi \qquad [3]$$


Unfortunately, this equivalence is restricted to 
capacities with noiseless reference channel S= id. 
In the vicinity of other (nonideal) channels, equiva- 
lence of the stabilized and unstabilized error criteria 
may be lost. Of course, the comparison of channels 
is ultimately based on the comparison of a state to 
its image, and here the pure states are the worst 
case. Hence, the remarkable insensitivity of the 
quantum capacity to the choice of the error criterion 
stems from the observation that the comparison 
between an arbitrary state and a pure state is rather 
insensitive to the criterion used. 
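
To make the two fidelity measures concrete, the following sketch estimates them by sampling random pure states for a qubit dephasing channel. The choice of channel, the dephasing probability `p`, the sample size, and the function names are assumptions made for illustration. For this channel the pure-state fidelity works out to 1 - p + p⟨ψ|Z|ψ⟩², so the minimum is 1 - p and the Haar average is 1 - 2p/3.

```python
import numpy as np

def pure_state_fidelity(kraus, psi):
    """<psi| T(|psi><psi|) |psi> = sum_k |<psi|K_k|psi>|^2 for Kraus operators K_k."""
    return sum(abs(np.vdot(psi, K @ psi)) ** 2 for K in kraus)

def sample_fidelities(kraus, dim, n_samples=5000, seed=0):
    """Fidelities for Haar-random pure states (normalized complex Gaussian vectors)."""
    rng = np.random.default_rng(seed)
    vecs = rng.normal(size=(n_samples, dim)) + 1j * rng.normal(size=(n_samples, dim))
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    return np.array([pure_state_fidelity(kraus, v) for v in vecs])

p = 0.3
Z = np.diag([1.0, -1.0])
dephasing = [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * Z]  # T(rho) = (1-p) rho + p Z rho Z

f = sample_fidelities(dephasing, dim=2)
print("sampled minimum fidelity:", f.min())   # compare with 1 - p
print("sampled average fidelity:", f.mean())  # compare with 1 - 2p/3
```

The sampled minimum only bounds the true minimum from above; an exact evaluation of eqn [2] requires an optimization over all pure states.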

Instead of requiring the error quantity in eqn [1] to 
approach zero in the large block limit $\nu \to \infty$, one 
might feel tempted to impose that the errors vanish 
completely for some sufficiently large block length, 
since this is the standard setup in the theory of 
quantum error correction (see Quantum Error Correc- 
tion and Fault Tolerance). While it is true that errors 
can always be assumed to vanish exponentially in eqn 
[1], requiring perfect correction may completely change 
the picture: if a channel has some small positive 
probability for depolarization, the same also holds for 
its tensor powers, and no such channel allows the 
perfect transmission of even one qubit. Hence, the 
capacity for perfect correction will vanish for such 
channels, while the standard capacity (in accordance 
with Definition 1) will be close to maximal, $Q(T) \approx 1$. 
The existence of perfect error-correcting codes thus 
gives lower bounds on the channel capacity, but is not 
required for a positive transfer rate. 

In the other extreme, one might sometimes feel 
inclined to tolerate (small) finite errors in the 
transmission. For some $\epsilon > 0$, we define $Q_\epsilon(T)$ 
exactly like the quantum capacity in Definition 1, 
but require only that the error quantity in eqn [1] 
falls below $\epsilon$ for some sufficiently large $\nu$. 
Obviously, $Q_\epsilon(T) \ge Q(T)$ for any quantum 
channel T. We also have $\lim_{\epsilon\to 0} Q_\epsilon(T) = Q(T)$ 
(Kretschmann and Werner 2004). In the classical 
setting, even a strong converse is known: if $\epsilon > 0$ is 
small enough, one cannot achieve bigger rates by 
allowing small errors, that is, $C_\epsilon(T) = C(T)$. It is still 
undecided whether an analogous property holds for 
the quantum capacity $Q(T)$. 


Related Capacities 


This article is chiefly concerned with the quantum 
capacity of a quantum channel. A variety of other 
capacities have been derived from Definition 1 by 
either amending the channel S to be simulated, or 
allowing Alice and Bob to make use of additional 
resources. Their interrelations are reviewed in Bennett 
et al. (2004). 

Much interest has been devoted to the hybrid 
problem of transmitting classical information undis- 
torted over noisy quantum channels. The classical 
capacity C(T) of a quantum channel T is discussed in 
the article Quantum Channels: Classical Capacity of 
this Encyclopedia. It is obtained by choosing the ideal 
one-bit channel rather than the one-qubit channel as 
the standard of reference in Definition 1. Encoding 
channels E and decoding channels D are then 
restricted to preparations and measurements, respec- 
tively. Since a quantum channel can also be employed 
to send classical information, we have $C(T) \ge Q(T)$. 
There are, obviously, examples in which this 
inequality is strict: the entanglement-breaking channel 
$T(\rho) = \sum_j \langle j|\rho|j\rangle\,|j\rangle\langle j|$ is composed of a measurement 
in the orthonormal basis $\{|j\rangle\}_j$, followed by a preparation 
of the corresponding basis states. It destroys all 
the entanglement between the sender and a reference 
system, implying $Q(T) = 0$. Yet all the basis states $|j\rangle$ 
are transmitted undistorted, which is enough to 
guarantee that $C(T) = 1$. 
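
The behaviour of this measure-and-prepare channel can be checked directly on a qubit. The sketch below, with function names chosen for this example, applies the channel to one half of a maximally entangled pair and shows that the output is a classical mixture of |00⟩ and |11⟩, while a basis state passes through unchanged.

```python
import numpy as np

def measure_and_prepare(rho):
    """T(rho) = sum_j <j|rho|j> |j><j|: keep only the diagonal of rho."""
    return np.diag(np.diag(rho))

def apply_to_first_qubit(two_qubit_rho):
    """Apply T (x) id to a two-qubit density matrix."""
    out = np.zeros((4, 4), dtype=complex)
    for j in range(2):
        # Projector |j><j| on the first qubit, identity on the second.
        proj = np.kron(np.diag([1.0 - j, float(j)]), np.eye(2))
        out += proj @ two_qubit_rho @ proj
    return out

omega = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
bell = np.outer(omega, omega)
out = apply_to_first_qubit(bell)

print(np.round(out.real, 3))                        # diagonal: no coherences survive
print(measure_and_prepare(np.diag([1.0, 0.0])))     # basis state |0><0| is unchanged
```

A diagonal two-qubit state is separable, which is the sense in which the entanglement with the reference system is destroyed.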

Definition 1 also applies to purely classical 
channels, and thus to the setting of Shannon’s 
information theory. A classical channel T between 
two d-level systems is completely specified by the 
d × d matrix $(T_{xy})_{x,y=1}^{d}$ of transition probabilities. 
For these channels the cb-norm difference is just 
(twice) the maximal error probability: 

$$\|\mathrm{id} - T\|_{cb} = 2\,\sup_x\,\{1 - T_{xx}\}$$

which is the standard error criterion for classical 
information transfer. 
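
For a concrete classical channel the error criterion above is a one-line computation. The sketch assumes the convention that `T[x][y]` is the probability of receiving y when x was sent; the binary symmetric channel is an example chosen for this illustration.

```python
def cb_distance_from_ideal(T):
    """||id - T||_cb = 2 * sup_x (1 - T[x][x]) for a classical channel given by
    its transition matrix (assumed convention: rows are inputs, columns outputs)."""
    return 2 * max(1 - T[x][x] for x in range(len(T)))

# Binary symmetric channel with flip probability 0.1.
bsc = [[0.9, 0.1],
       [0.1, 0.9]]
print(cb_distance_from_ideal(bsc))  # twice the maximal error probability
```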

Dense coding and teleportation suggest that 
entanglement is a powerful resource for information 
transfer. It doubles the classical channel capacity of 
a noiseless channel, and it allows quantum information 
to be sent over purely classical channels. Surprisingly, 
the entanglement-assisted capacities are often 
simpler and better behaved than their unassisted 
counterparts. Unlike the classical and quantum 
capacities proper, they are relatively easy to calcu- 
late using finite optimization procedures, and there 
has recently been significant progress in under- 
standing the simulation rates for nonideal channels 
in this scenario (see Capacities Enhanced by 
Entanglement). 

The quantum channel capacity is unaffected by 
entanglement-breaking side channels. In particular, 
classical forward communication alone cannot 


enhance it. However, unlike in the purely classical 
case, both the quantum and classical channel 
capacity (but not the entanglement-assisted capacity) 
may increase under classical feedback. 


Elementary Properties 


The capacity of a composite channel $T_1 \circ T_2$ cannot 
be bigger than the capacity of the channel with the 
smallest bandwidth. This in turn suggests that 
simulating a concatenated channel is in general easier 
than simulating any of the individual channels. These 
relations are known as bottleneck inequalities: 

$$Q(T_1 \circ T_2, S) \le \min\{Q(T_1, S),\ Q(T_2, S)\} \qquad [4]$$

$$Q(T, S_1 \circ S_2) \ge \max\{Q(T, S_1),\ Q(T, S_2)\} \qquad [5]$$


Instead of running $T_1$ and $T_2$ in succession, we may 
also run them in parallel. In this case, the capacity 
can be shown to be superadditive, 

$$Q(T_1 \otimes T_2, S) \ge Q(T_1, S) + Q(T_2, S) \qquad [6]$$

For the standard ideal channels, we even have 
additivity. The same holds true if both S and one 
of the channels $T_1$, $T_2$ are noiseless, the third 
channel being arbitrary. However, results on the 
activation of bound-entangled states seem to suggest 
that the inequality in eqn [6] may be strict for some 
channels (see Entanglement). 

Finally, the two-step coding inequality tells us that 
by using an intermediate channel in the coding 
process we cannot increase the transmission rate: 

$$Q(T_1, T_3) \ge Q(T_1, T_2)\,Q(T_2, T_3) \qquad [7]$$


Applying eqn [7] twice with $T_2 = \mathrm{id}$ and $T_3 = \mathrm{id}$ 
immediately yields upper and lower bounds on the 
channel capacity with nonideal reference channel, 

$$\frac{Q(T_1)}{Q(T_2)} \ge Q(T_1, T_2) \ge Q(T_1)\,Q(\mathrm{id}_2, T_2) \qquad [8]$$

The evaluation of the lower bound in eqn [8] then 
requires efficient protocols for simulating a noisy 
channel $T_2$ with a noiseless resource. 

There are special cases in which the quantum 
channel capacity can be evaluated relatively easily, 
the most relevant one being the noiseless channel $\mathrm{id}_n$, 
where by the subscript n we denote the dimension of 
the underlying Hilbert space. In this case, we have 

$$Q(\mathrm{id}_n, \mathrm{id}_m) = \frac{\mathrm{ld}\,n}{\mathrm{ld}\,m} \qquad [9]$$

The lower bound $Q(\mathrm{id}_n, \mathrm{id}_m) \ge \mathrm{ld}\,n/\mathrm{ld}\,m$ is immediate 
from counting dimensions. To establish the 
upper bound, we use the fact that a noiseless 
quantum channel cannot simulate itself with a rate 
exceeding unity: $Q(\mathrm{id}_m, \mathrm{id}_m) \le 1$. This is just the 
upper bound we want to prove for the special case 
n = m, and it can be extended to the general case 
with the help of the two-step coding inequality [7]: 
$Q(\mathrm{id}_m, \mathrm{id}_n)\,Q(\mathrm{id}_n, \mathrm{id}_m) \le Q(\mathrm{id}_m, \mathrm{id}_m) \le 1$, implying 
$Q(\mathrm{id}_n, \mathrm{id}_m) \le 1/Q(\mathrm{id}_m, \mathrm{id}_n) \le \mathrm{ld}\,n/\mathrm{ld}\,m$, where in the 
last step we have applied the lower bound with the 
roles of n and m interchanged. 
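
The formula in eqn [9] and its interplay with the two-step coding inequality [7] are easy to tabulate. The helper name below is an assumption made for this small sketch.

```python
import math

def ideal_capacity(n: int, m: int) -> float:
    """Q(id_n, id_m) = ld n / ld m, as in eqn [9]."""
    return math.log2(n) / math.log2(m)

# One four-level system carries two qubits, and a qubit carries "half" of it.
print(ideal_capacity(4, 2), ideal_capacity(2, 4))

# Simulating id_m by id_n and back again cannot exceed rate 1 (here it equals 1).
print(ideal_capacity(2, 4) * ideal_capacity(4, 2))

# Two-step coding inequality [7] with intermediate channel id_4.
direct = ideal_capacity(8, 2)
via_id4 = ideal_capacity(8, 4) * ideal_capacity(4, 2)
print(direct, via_id4)
```

For ideal channels the two-step inequality holds with equality, which is why the upper bound argument above closes.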

Combining eqn [9] with the two-step coding 
inequality [7], we see that for any channel T 

$$Q(T, \mathrm{id}_n) = \frac{\mathrm{ld}\,m}{\mathrm{ld}\,n}\,Q(T, \mathrm{id}_m) \qquad [10]$$

which shows that quantum channel capacities relative 
to noiseless channels of different dimensionality only 
differ by a constant factor. Fixing the dimensionality 
of the reference channel then only corresponds to a 
choice of units. Conventionally, the ideal qubit 
channel $\mathrm{id}_2$ is chosen as a standard of reference, as 
in Definition 1 above, thereby fixing the unit “bit.” 

The upper bound on the capacity of ideal channels 
can also be obtained from a general upper bound on 
quantum capacities (Holevo and Werner 2001), 
which has the virtue of being easily calculated in 
many situations. It involves the transposition map, 
which we denote by $\Theta$, defined as matrix transposition 
with respect to some fixed orthonormal basis. 
The transposition is positive but not completely 
positive, and thus does not describe a physical 
channel (see Channels in Quantum Information 
Theory). We have $\|\Theta\|_{cb} = d$ for a d-level system. 
For any channel T and small $\epsilon > 0$, 

$$Q(T) \le Q_\epsilon(T) \le \mathrm{ld}\,\|T\Theta\|_{cb} =: Q_\Theta(T) \qquad [11]$$

where $Q_\epsilon$ is the finite error capacity introduced in 
the section “Quantum channel capacity.” 

The upper bound $Q_\Theta(T)$ has some remarkable 
properties, which make it a capacity-like quantity in 
its own right. For example, it is exactly additive, 

$$Q_\Theta(S \otimes T) = Q_\Theta(S) + Q_\Theta(T) \qquad [12]$$

for any pair S, T of channels, and it satisfies 
the bottleneck inequality: 

$$Q_\Theta(ST) \le \min\{Q_\Theta(S),\ Q_\Theta(T)\}$$

Moreover, it coincides with the quantum capacity on 
ideal channels, $Q_\Theta(\mathrm{id}_n) = Q(\mathrm{id}_n) = \mathrm{ld}\,n$, and it vanishes 
whenever $T\Theta$ is completely positive. In particular, if 
$\mathrm{id} \otimes T$ maps any entangled state to a state with positive 
partial transpose, we have $Q_\Theta(T) = 0$. 
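
The value of the cb-norm of the transposition map on a d-level system can be made plausible numerically: transposition preserves the trace norm of a single-system state, but applied to one half of a maximally entangled state it produces an operator of trace norm d. The sketch below checks this lower bound on the stabilized norm; the function names are assumptions for this example, and the code does not attempt to compute a cb-norm in general.

```python
import numpy as np

def trace_norm(M):
    """Trace norm, computed as the sum of singular values."""
    return float(np.linalg.svd(M, compute_uv=False).sum())

def partial_transpose_first(rho, d):
    """Apply (transposition (x) id_d) to an operator on C^d (x) C^d."""
    r = rho.reshape(d, d, d, d)                  # entries <i k| rho |j l> as r[i, k, j, l]
    return r.transpose(2, 1, 0, 3).reshape(d * d, d * d)  # swap i <-> j

def maximally_entangled(d):
    """Density matrix of (1/sqrt(d)) sum_i |i, i>."""
    omega = np.eye(d).reshape(d * d) / np.sqrt(d)
    return np.outer(omega, omega)

for d in (2, 3, 4):
    state = maximally_entangled(d)
    print(d, trace_norm(state), trace_norm(partial_transpose_first(state, d)))
```

Since the input has trace norm 1 and the output has trace norm d, the norm of the transposition tensored with the identity on a d-level system is at least d.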


State-Channel Duality 


Quantum capacity is closely related to the distillable 
entanglement, which is the optimal rate m/n at 
which n copies of a given bipartite quantum state $\rho$ 
shared between Alice and Bob can be asymptotically 
converted into m maximally entangled qubit pairs 
(see Entanglement). Similar to the quantum capacity, 
the definition involves the large block limit 
$n, m \to \infty$ and an optimization over all conceivable 
distillation protocols. These may consist of several 
rounds of local quantum operations and (forward or 
two-way) classical communication. The one-way 
and two-way distillable entanglement of $\rho$ will be 
denoted by $D_1(\rho)$ and $D_2(\rho)$, respectively. 

Suppose that Alice and Bob are connected by a 
quantum channel T and run such a one-way distillation 
protocol on (many copies of) the state 
$\rho_T := (T \otimes \mathrm{id})\,|\Omega\rangle\langle\Omega|$, where $|\Omega\rangle := (1/\sqrt{d_A}) \sum_i |i,i\rangle$ 
is maximally entangled on $\mathcal{H}_A \otimes \mathcal{H}_A$. If the distillation 
yields maximally entangled qubits at positive rate R, 
Alice may apply the standard teleportation scheme to 
send arbitrary quantum states to Bob undistorted at 
that same rate R. Like the distillation protocol itself, 
teleportation requires classical forward communication, 
which however does not affect the channel 
capacity (cf. the section “Related capacities”). Thus, 
$Q(T) \ge D_1(\rho_T)$. If two-way distillation is allowed, we 
have $Q_2(T) \ge D_2(\rho_T)$ for the capacity $Q_2(T)$ assisted 
by two-way classical side communication. 

Conversely, if Alice and Bob use a bipartite 
quantum state $\rho$ shared between them as a substitute 
for the maximally entangled state $|\Omega\rangle$ in the 
standard teleportation protocol, they will implement 
some noisy quantum channel $T_\rho$. If this channel 
allows quantum information to be transferred at 
nonvanishing rate R, Alice may share maximally entangled 
states with Bob at that same rate R. Consequently, 
$D_1(\rho) \ge Q(T_\rho)$ and $D_2(\rho) \ge Q_2(T_\rho)$. 

These relations (Bennett et al. 1996) allow one to 
bound channel capacities in terms of distillable 
entanglement and vice versa. If the two maps 
$T \mapsto \rho_T$ and $\rho \mapsto T_\rho$ are mutually inverse, we even 
have $D_1(\rho) = Q(T_\rho)$ and $D_2(\rho) = Q_2(T_\rho)$. In this 
case, the duality $\rho \leftrightarrow T_\rho$ is the physical implementation 
of Jamiołkowski's isomorphism between bipartite 
states and channels (see Channels in Quantum 
Information Theory). This has been shown 
(Horodecki et al. 1999) to hold for isotropic states, 
which are invariant under the group of all $U \otimes \bar{U}$ 
transformations, where $\bar{U}$ is the complex conjugate 
of the unitary U. The corresponding channels are 
partly depolarizing. 

In general, $T_{\rho_T} \ne T$. However, the so-called conclusive 
teleportation allows us to implement T at 
least probabilistically, resulting in the relation 

$$\frac{1}{d_A^2}\,Q(T) \le D_1(\rho_T) \le Q(T) \qquad [13]$$

The duality [13] can be applied to show that both 
the unassisted and the two-way quantum capacities 
are continuous in any open set of channels 
having nonvanishing capacities (Horodecki and 
Nowakowski 2005). 


Coding Theorems 


Computing channel capacities straight from Defini- 
tion 1 is a tricky business. It involves optimization in 
systems of asymptotically many tensor factors, and 
can only be performed in special cases, like the 
noiseless channels in the section “Elementary prop- 
erties.” Coding theorems aspire to reduce this 
problem to an optimization over a low-dimensional 
space. They usually come in two parts: the converse 
provides an upper bound on the channel capacity 
(typically in terms of some entropic expression), 
while the direct part consists of a coding scheme 
that attains this bound. By Shannon’s celebrated 
coding theorem, the classical capacity of a classical 
noisy channel can be obtained from a maximization 
of the mutual information over all joint input- 
output distributions. 

For the quantum channel capacity, the relevant 
entropic quantity is the coherent information, 

$$I_c(T, \rho) = H(T(\rho)) - H\big((T \otimes \mathrm{id})(|\psi_\rho\rangle\langle\psi_\rho|)\big) \qquad [14]$$

where H denotes the von Neumann entropy, 
$H(\rho) = -\mathrm{tr}\,\rho\,\mathrm{ld}\,\rho$, and $\psi_\rho \in \mathcal{H}_A \otimes \mathcal{H}_{A'}$ is a purification 
of the density operator $\rho \in \mathcal{A}$. The coherent 
information does not increase under quantum 
operations, $I_c(S \circ T, \rho) \le I_c(T, \rho)$ for any quantum 
channel S and state $\rho \in \mathcal{A}$. This is the data 
processing inequality (Barnum et al. 1998), which 
shows that the regularized coherent information 
provides an upper bound on the quantum channel 
capacity: if Alice and Bob have a coding scheme for 
the channel T with capacity $Q(T)$, n channel uses 
allow them to share a maximally entangled state of 
size $\approx \exp_2 nQ(T)$. The coherent information of this 
state equals $\approx nQ(T)$, and was no larger prior to 
Bob's decoding. 
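
As a small worked instance of eqn [14], the sketch below evaluates the coherent information of a qubit dephasing channel for a maximally mixed input, purified by a maximally entangled state. The Kraus representation, the parameter `p`, and the function names are assumptions for this example. For this input the result reduces to 1 - h(p), with h the binary entropy.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-(ev * np.log2(ev)).sum())

def coherent_information(kraus, psi):
    """I_c(T, rho) = H(T(rho)) - H((T (x) id)(|psi><psi|)), where psi purifies rho
    and the channel T, given by Kraus operators, acts on the first tensor factor."""
    d_out, d_in = kraus[0].shape
    d_ref = psi.size // d_in
    proj = np.outer(psi, psi.conj())
    lifted = [np.kron(K, np.eye(d_ref)) for K in kraus]
    joint = sum(L @ proj @ L.conj().T for L in lifted)
    # Trace out the reference system to obtain the channel output T(rho).
    out = np.trace(joint.reshape(d_out, d_ref, d_out, d_ref), axis1=1, axis2=3)
    return entropy(out) - entropy(joint)

p = 0.1
Z = np.diag([1.0, -1.0])
dephasing = [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * Z]
omega = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # purifies rho = I/2

binary_entropy = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
print(coherent_information(dephasing, omega), 1 - binary_entropy)
```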

Recently, Devetak (2005) developed a coding 
scheme to show that this bound is in fact attainable. 
Different proofs were outlined by Lloyd and Shor. 


Theorem 1 For every quantum channel T, 

$$Q(T) = \lim_{n\to\infty} \frac{1}{n} \max_\rho I_c(T^{\otimes n}, \rho) \qquad [15]$$

Unlike the classical or quantum mutual information, 
coherent information is strictly superadditive for 
some channels (DiVincenzo et al. 1998). Hence, 
taking the limit $n \to \infty$ in eqn [15] is indeed required, 
and in general the evaluation of the capacity formula 
[15] still demands the solution of asymptotically large 
variational problems. This should be contrasted with 
the entanglement-assisted capacities $C_E(T) = 2Q_E(T)$ 
(where a simple nonregularized coding theorem is 
known to hold, see Capacities Enhanced by 
Entanglement) and the capacity for classical information 
$C(T)$ (where additivity is conjectured but not proved, 
see Quantum Channels: Classical Capacity). Even a 
maximization of the single-shot coherent information 
$I_c(T, \rho)$ appears to be a difficult optimization 
problem, since this quantity is neither convex nor 
concave and may have multiple local maxima (Shor 
2003). Thus, even for simple-looking systems like the 
qubit depolarizing channel, so far we only have upper 
and lower bounds on the quantum channel capacity, 
but do not yet know how to compute its exact value. 

We now sketch Devetak’s proof of Theorem 1, 
assuming only some familiarity with 
Holevo-Schumacher-Westmoreland (HSW) random codes 
for the classical channel capacity (see Quantum 
Channels: Classical Capacity). It is easily seen from 
Stinespring’s dilation theorem (see Channels in 
Quantum Information Theory) that a noiseless 
quantum channel provides perfect security against 
eavesdropping. This is one of the characteristic traits 
of quantum mechanics and lies at the heart of 
quantum cryptography. In his proof, Devetak 
showed a way to turn this around and upgrade 
coding schemes for private classical information to 
quantum channel codes. 

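The relation between quantum information transfer 
over a channel $T : \mathcal{A} \to \mathcal{B}$ and privacy against 
eavesdropping is best understood in terms of the 
companion channel $T_E : \mathcal{A} \to \mathcal{E}$. $T_E$ arises from a 
given Stinespring isometry $V : \mathcal{H}_A \to \mathcal{H}_B \otimes \mathcal{H}_E$ of 
$T \equiv T_B$ by interchanging the roles of the output 
system $\mathcal{B}$ and the environment $\mathcal{E}$: 

$$T_B(\rho) = \mathrm{tr}_E\, V\rho V^* \;\Longrightarrow\; T_E(\rho) = \mathrm{tr}_B\, V\rho V^* \qquad [16]$$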


The channel $T_E$ describes the information flow into 
the environment $\mathcal{E}$, a system we assume to be under 
complete control of a potential eavesdropper, Eve 
say. The setup for private classical information 
transfer (including the definition of rates and capacity) 
is then exactly the same as for the classical 
channel capacity (see Quantum Channels: Classical 
Capacity), but the protocols now have to satisfy the 
additional requirement that $T_E$ releases (almost) no 
information to the environment. This can be achieved 
by randomizing over $\nu_E \approx \exp_2 n\chi(T_E, \{p_i, \rho_i\})$ code 
words of a standard HSW code of total size 
$\approx \exp_2 n\chi(T_B, \{p_i, \rho_i\})$, where $\{p_i, \rho_i\}$ is the quantum 
ensemble from which a set of random code words 
$\{\rho_{jk}\}$, with $j = 1, \ldots, \nu_E$ and $k = 1, \ldots, \nu_B$, is 
generated. The appearance of the Holevo bound 

$$\chi(T, \{p_i, \rho_i\}) := H\Big(\sum_i p_i\,T(\rho_i)\Big) - \sum_i p_i\,H\big(T(\rho_i)\big) \qquad [17]$$

in the dimension of both these code spaces can be 
understood from the size of the relevant typical 
subspaces (Devetak and Winter 2004). 

The randomization guarantees that the remaining 
$\nu_B \approx \exp_2 n(\chi(T_B) - \chi(T_E))$ code words are almost 
indistinguishable to Eve: 

$$\Big\|\frac{1}{\nu_E}\sum_{j=1}^{\nu_E}\big[T_E^{\otimes n}(\rho_{jk}) - T_E^{\otimes n}(\rho_{jl})\big]\Big\|_1 \le \epsilon \quad \forall\, k, l = 1, \ldots, \nu_B \qquad [18]$$

The net transfer rate for private classical information
is then $R \approx \chi(T_B) - \chi(T_E)$, which is just the total
transfer rate for the channel Alice → Bob reduced by
the transfer rate Alice → Eve.

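Remarkably, if $\varrho = \sum_i p_i\, |\psi_i\rangle\langle\psi_i|$ is a decomposition
of $\varrho \in A$ into pure states, the private transfer
rate exactly equals the coherent information,


$$I_c(T_B, \varrho) = H(T_B(\varrho)) - H(T_E(\varrho)) = \chi(T_B) - \chi(T_E) \qquad [19]$$


The so-called entropy exchange

$$H(T_E(\varrho)) = H\big( T_B \otimes \mathrm{id}\,(|\psi_\varrho\rangle\langle\psi_\varrho|) \big)$$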


quantifies the extent to which a formerly pure 
ancilla state becomes mixed via interaction with 
the signal states. Equation [19] then nicely reflects
the intuition that for high-rate quantum information 
transfer the signal states should not entangle too 
much with the environment. In fact, for an almost 
noiseless channel the entropy exchange nearly 
vanishes, and the optimized coherent information 
almost attains the maximal value 1, while for nearly 
depolarizing channels we have $I_c(T_B, \varrho) \approx -H(\varrho) < 0$.
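
As a rough numerical illustration of eqn [19], the following sketch evaluates the coherent information of the qubit depolarizing channel for the maximally mixed input. It assumes the parametrization T(rho) = (1 - p) rho + p 1/2, with Kraus operators proportional to the identity and the three Pauli matrices; this parametrization is not fixed by the text above. For this particular input the environment state is diagonal, with the Kraus weights as eigenvalues, so that only Shannon entropies are needed. All function names are illustrative.

```python
import math

def shannon_entropy(probs):
    """Entropy in bits of a probability vector, with 0 log 0 taken as 0."""
    return -sum(q * math.log2(q) for q in probs if q > 0.0)

def depolarizing_weights(p):
    """Kraus weights of T(rho) = (1 - p) rho + p 1/2 (assumed parametrization):
    identity with weight 1 - 3p/4, each Pauli matrix with weight p/4."""
    return [1.0 - 0.75 * p, 0.25 * p, 0.25 * p, 0.25 * p]

def coherent_information_mixed_input(p):
    """I_c(T_B, rho) = H(T_B(rho)) - H(T_E(rho)) for rho = 1/2."""
    output_entropy = shannon_entropy([0.5, 0.5])   # T_B(1/2) = 1/2
    entropy_exchange = shannon_entropy(depolarizing_weights(p))
    return output_entropy - entropy_exchange

if __name__ == "__main__":
    for p in (0.0, 0.1, 0.5, 1.0):
        print(p, coherent_information_mixed_input(p))
```

For p = 0 the entropy exchange vanishes and the coherent information attains 1, while for p = 1 it equals -H(rho) = -1, in line with the two limiting cases described above.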

So far, we have sketched a protocol for private 
classical information transfer. Devetak’s coherentification
allows one to pass from the transmission of
classical messages to the transmission of coherent
superpositions. This technique has also been applied 
to obtain entanglement distillation protocols from 
secret key distillation, and offers a unified view on 
the secret classical resources and their quantum 
counterparts (Devetak and Winter 2004, Devetak 
et al. 2004). 

In order to transfer quantum information, Alice 
will only need to send one half of a maximally 
entangled state of dimensionality $\approx \exp_2 n\, I_c(T_B, \varrho)$.
As described in the previous section, teleportation 
then allows her to transfer arbitrary quantum states 
from a subspace of that size.

Given a set of pure state code words
$\{|\varphi_{kl}\rangle\}_{k,l=1}^{\nu_B,\nu_E}$ of a private classical information
protocol, for entanglement transfer Alice prepares
the input state


$$|\chi\rangle_{A'A} = \frac{1}{\sqrt{\nu_B}} \sum_{k=1}^{\nu_B} |k\rangle_{A'} \otimes \frac{1}{\sqrt{\nu_E}} \sum_{l=1}^{\nu_E} |\varphi_{kl}\rangle_A \qquad [20]$$


where $A'$ denotes a reference system that Alice keeps
in her lab. On his share of the resulting output state
$|\Phi'\rangle_{A'BE}$ Bob will then employ the corresponding
measurement operators $\{M_{kl}\}_{k,l=1}^{\nu_B,\nu_E}$ to implement the
coherent measurement


$$V_M\, |\psi\rangle_B = \sum_{k,l} \sqrt{M_{kl}}\; |\psi\rangle_B \otimes |kl\rangle_{B_1 B_2}$$


which places the measurement outcomes into some
reference system $B_1 \otimes B_2$. Any measurement which
identifies the output with high probability only
slightly disturbs the output state, and thus Bob’s
coherent measurement leaves the total system in an
approximation of the state


$$|\Phi''\rangle_{A'BEB_1B_2} = \frac{1}{\sqrt{\nu_B \nu_E}} \sum_{k,l=1}^{\nu_B,\nu_E} |k\rangle_{A'} \otimes |\varphi_{kl}\rangle_{BE} \otimes |kl\rangle_{B_1 B_2} \qquad [21]$$


in which Eve and Bob are still entangled. A
completely depolarizing channel $T_E$ would directly
yield a factorized output state on $B \otimes E$ here. Although
the randomization in eqn [18] does not necessarily
result in complete depolarization, there is a controlled
unitary operation which Bob may apply to effectively
decouple Eve’s system, resulting in the output state
$\approx (1/\sqrt{\nu_B}) \sum_k |kk\rangle_{A'B_1}$, which is the maximally
entangled state of size $\nu_B \approx \exp_2 n\, I_c(T_B, \varrho)$ required
for teleportation. The direct part of the capacity 
theorem then follows by applying the above coding 
scheme to large blocks and maximizing over (pure) 
input ensembles, concluding the proof. 

Devetak’s proof of the coding theorem seems to
indicate that the private classical capacity $C_p(T)$
equals the quantum capacity $Q(T)$ for every
quantum channel $T$. However, for the coherentification
protocol, we have restricted the private coding
schemes to pure state input ensembles, and thus we
can only conclude that $Q(T) \le C_p(T)$. The existence
of bound-entangled states with positive one-way 
distillable secret key rate (Horodecki et al. 2005) 
implies that this inequality can be strict. A general 
procedure does exist to retrieve (almost) all the 
information from the output of a noisy quantum 
channel that releases (almost) no information to the 
environment. But this requires a stronger form of 
privacy than eqn [18].


Quantum Channels with Memory


This article has so far been restricted to memory- 
less quantum channels, in which successive chan- 
nel inputs are acted on independently. Messages of 
n symbols are then processed by the tensor 
product channel $T^{\otimes n}$, as in Definition 1 and
illustrated in Figure 1. In many real-world applica- 
tions, the assumption of having uncorrelated noise 
cannot be justified, and memory effects need to be 
taken into account. For a quantum channel T with 
register input A and register output B, such effects 
are conveniently modeled (Bowen and Mancini 
2004) by introducing an additional memory 
system $M$, so that now $T: M \otimes A \to B \otimes M$ is a
completely positive and trace-preserving map with
two input systems and two output systems. Long
messages with $n$ signal states will then be
processed by the concatenated channel
$T_n: M \otimes A^{\otimes n} \to B^{\otimes n} \otimes M$. In such a concatenation,
the memory system is passed on from one channel 
application to the next, and thus introduces 
(classical or quantum) correlations between con- 
secutive register inputs. 

Remarkably, this relatively simple model can be 
shown (Kretschmann and Werner 2005) to encompass
every reasonable physical process: every stationary
channel $S: A^{\mathbb{Z}} \to B^{\mathbb{Z}}$ which turns an infinite
string of input states (on the quasilocal algebra $A^{\mathbb{Z}}$)
into an infinite string of output states on $B^{\mathbb{Z}}$ and
satisfies the causality constraint is in fact a concatenated
memory channel. Causality here means
that the outputs of the stationary channel $S$ at a given
time $t_0$ do not depend on inputs at times $t > t_0$.
Figure 2 illustrates the structure theorem for causal 
stationary quantum channels. In general, it produces 
not only the memory channel T with memory 
algebra M, but also a map R describing the 
influence of input states in the remote past. 
Intuitively, such a map is often not needed, because 
memory effects decrease in time: the memory 
channel T is called forgetful if outputs at a large 
time $t$ depend only weakly on the memory initialization
at time zero. In fact, memory effects can then even be
shown to die out exponentially. The set of
forgetful channels is open and dense in the set of
quantum memory channels. Hence, generic memory
channels are forgetful.

Figure 2 By the structure theorem, a causal automaton S can
be decomposed into a chain of concatenated memory channels
T plus some input initializer R. Evaluation with the partial trace tr
means that the corresponding output is ignored.

The capacity of memory channels is defined in 
complete analogy to the memoryless case, replacing 
the n-fold tensor product $T^{\otimes n}$ in Definition 1 by
the n-fold concatenation $T_n$. The coding theorems
for (private) classical and quantum information 
can then be extended from the memoryless case 
to the very important class of forgetful channels 
(Kretschmann and Werner 2005). 
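
A purely classical toy analogue may help to picture forgetfulness: if the memory is a two-state Markov chain, the influence of its initialization decays geometrically in the number of channel uses. The sketch below is only an illustration under that simplifying assumption and is not the construction of Kretschmann and Werner; the transition probabilities a and b are hypothetical parameters.

```python
def step(dist, a, b):
    """One update of a two-state memory: 0 -> 1 with probability a,
    1 -> 0 with probability b."""
    p0, p1 = dist
    return (p0 * (1.0 - a) + p1 * b, p0 * a + p1 * (1.0 - b))

def memory_distance(t, a, b):
    """Total variation distance, after t steps, between the memory
    distributions obtained from the two extreme initializations."""
    d0, d1 = (1.0, 0.0), (0.0, 1.0)
    for _ in range(t):
        d0, d1 = step(d0, a, b), step(d1, a, b)
    return 0.5 * (abs(d0[0] - d1[0]) + abs(d0[1] - d1[1]))

if __name__ == "__main__":
    for t in (0, 1, 5, 10):
        print(t, memory_distance(t, a=0.3, b=0.2))
```

In this toy model the distance equals |1 - a - b|^t, so the memory of the initialization is lost exponentially fast unless a + b is 0 or 2.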

Nonforgetful channels call for universal coding 
schemes, which apply irrespective of the initializa- 
tion of the input memory. Such schemes are 
presently known only for very special cases.


Historical and Conceptual Background


A capillary surface is the interface separating two 
fluids that lie adjacent to each other and do not mix. 
Examples of such surfaces are the upper surface of 
liquid partially filling a vertical cylinder (capillary 
tube), the surface of a liquid drop resting in 
equilibrium on a tabletop (sessile drop) and the 
surface of a liquid drop hanging from a ceiling 
(pendent drop); further instances are the surface of a 
falling raindrop, the bounding surface of the liquid 
in the fuel tank of a spaceship, and the interface 
formed by a fluid mass rotating within another fluid. 
This last example extends to the problem of rotating 
stars. 

Interfaces separating fluids and solids share some 
of the physical attributes of capillary surfaces, and 
the study of wetted portions of rigid “support 
surfaces” becomes essential for describing global 
behavior of capillary configurations. However, some 
significant distinctions appear that change the 
formal structure of the problems, and must be 
accounted for in the theory. 

Phenomena governed by capillarity pervade all of 
daily life, and most are so familiar as to escape 
special notice. By contrast, throughout the eigh- 
teenth century and presumably earlier, great atten- 
tion centered on the rise of liquid in a narrow glass 
circular-cylindrical tube dipped vertically into a 
liquid reservoir (Figure 1); this striking event had a 
dramatic impact that confounded intuition. Clarifi- 
cation of the behavior became one of the major 
problems challenging the scientific world of the 
time, and was not achieved during that period. The 
term “capillary,” adapted from the Latin “capillus” 
for hair, was applied to the phenomenon since it was 
observed only for tubes with very fine openings; the
more general usage adopted in the definition above
derives from the recognition of a class of phenomena
with a common physical basis.

Figure 1 Capillary tube in infinite reservoir, in downward
gravity field.

The first recorded observations concerning 
capillarity seem due to Aristotle c. 350 BC. He
wrote that “a broad flat body, even of heavy 
material, will float on water, however a narrow 
thin one such as a needle will always sink.” Any 
reader with access to a needle and a glass of water 
will have little difficulty refuting the assertion. 
Remarkably, the error in reasoning seems not to 
have been pointed out for almost 2000 years, 
when Galileo addressed the problem in his 
Discorsi, about 1600. The only substantive studies 
till that time are apparently those of Leonardo da 
Vinci a hundred years earlier. Leonardo intro- 
duced reasoning close in spirit to that of current 
literature; however, the Calculus was not available 
to him, and he was not in a position to develop his 
ideas in quantitative ways. 


Young’s Contribution 


The later discovery of the Calculus provided a 
driving impetus guiding many new studies during 
the eighteenth century. But despite the enormity of 
that weapon, it did not on its own suffice, and initial 
quantitative success had to await two initiatives
taken by Thomas Young in 1805. Young based his
studies on the concept of surface tension that had
been introduced by von Segner half a century earlier.
Segner hypothesized that every curve on a fluid/fluid
interface $S$ experiences on both its sides an orthogonal
force $\sigma$ per unit length, which (for given
temperature) depends only on the materials and is
directed into the tangent planes on the respective
sides. The presence of such forces can be indicated
by simple experiments. They become clearly evident
in the case of thin (soap) films spanning a frame, in
which case there is an easily observed orthogonal
pull on the frame; see the section “Dual interpretation
of $\sigma$: distinction between fluids and solids.”


Young made two basic conceptual contributions 
(Y1, Y2): 


Y1. Relation of pressure jump across a free interface 
to mean curvature and surface tension. 


Consider a piece of surface $S$ in the shape of a
spherical bowl of radius $R$, separating two immiscible
fluid media, as in Figure 2. In equilibrium, any
pressure difference $\delta p$ across $S$ must be balanced by
a tension $\sigma$ on its rim $\Gamma$. If $S$ projects to a disk of
(small) radius $r$ on the plane tangent to $S$ at the
symmetry point, we are led to


$$\pi r^2\, \delta p \approx 2\pi r\, \sigma \sin\psi \qquad [1]$$


where $\psi$ is the inclination of $S$ at the rim, relative to the
plane. We thus find at the base point

$$\delta p = 2\sigma\, \frac{d \sin\psi}{dr} = 2\sigma\, \frac{1}{R} \qquad [2]$$

Young then went on to consider a general $S$, without
symmetry hypothesis. Letting $1/R_1$, $1/R_2$ denote the
planar curvatures at a point in $S$ of two normal
sections in orthogonal directions, he asserted that


$$\delta p = 2\sigma\, \frac{1}{2}\left( \frac{1}{R_1} + \frac{1}{R_2} \right) = 2\sigma H \qquad [3]$$

where $H$ is the mean curvature of $S$ at the point.
Although Young provided no formal justification for
this step, we can establish it with the aid of a general
formula from differential geometry that was not
known in his lifetime:


$$\int_S 2H N \, dS = \oint_\Gamma n \, ds \qquad [4]$$

where $N$ is a unit normal on $S$, and $n$ is the unit
conormal (as indicated in Figure 2) on $\Gamma$. Multiplying
both sides of [4] by $\sigma$, the right-hand side
becomes the net surface tension force on $S$. Since
that must equal the net balancing pressure force, we
obtain


$$\int_S (\delta p - 2\sigma H)\, N \, dS = 0 \qquad [5]$$


Letting the diameter of $S$ tend to zero, the assertion
follows.

Figure 2 Pressure change across fluid element, balanced by
surface tension.
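
To give relation [3] some numerical content, the following minimal sketch evaluates the pressure jump for given principal radii of curvature. The surface tension value used for water is an assumed round figure (about 0.072 N/m at room temperature), and the function name is illustrative.

```python
def pressure_jump(sigma, r1, r2):
    """Pressure jump delta p = 2 sigma H, with mean curvature
    H = (1/r1 + 1/r2) / 2 formed from the two principal radii, cf. [3]."""
    mean_curvature = 0.5 * (1.0 / r1 + 1.0 / r2)
    return 2.0 * sigma * mean_curvature

if __name__ == "__main__":
    sigma_water = 0.072  # N/m, assumed illustrative value
    # Spherical interface of radius 1 mm: both principal radii coincide.
    print(pressure_jump(sigma_water, 1.0e-3, 1.0e-3), "Pa")
    # Cylindrical interface of radius 1 mm: one principal radius is infinite.
    print(pressure_jump(sigma_water, 1.0e-3, float("inf")), "Pa")
```

The spherical case reproduces [2], and the cylindrical interface carries half the pressure jump of the sphere of equal radius.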

We emphasize here the implicit assumption above,
that $\sigma$ is a constant depending only on the particular
materials, and not on the shape of $S$. This author
knows of no source in which that is clearly
established, although experiments and experience
provide some a posteriori justification. See the
further comments under Y2, and later in sections
“Gauss’ contribution: the energy method” and
“Dual interpretation of $\sigma$: distinction between fluids
and solids.”


Y2. The capillary contact angle. 


Young asserted that there are surface tensions for 
solid/fluid interfaces analogous to those just intro- 
duced, and again depending only on the materials. 
This assertion is erroneous, as was suggested in 
writings of Bikerman and of others, and more 
recently established in a definitive example by Finn. 
Using his premise, Young attempted to characterize
the contact angle $\gamma$ made by the fluid surface with a
rigid boundary, by requiring that the net tangential
component of the three surface tension vectors
vanish at the triple interface; this leads to the often
employed but incorrect “Young diagram,” see
Figure 3, and the relation

$$\cos\gamma = \frac{\sigma_1 - \sigma_2}{\sigma_0} \qquad [6]$$

for $\cos\gamma$ in terms of the magnitudes of the three
“surface tensions.” Young concluded that the
contact angle depends only on the materials, and
in no other way on the conditions of the problem.
This basic assertion is by a fortuitous accident
correct, as follows from the contribution by
Gauss described below; it underlies all modern
theory.

Figure 3 Young diagram; balance of tangential forces.
Residual normal force remains.

Using Y1 and Y2, Young produced the first
verifiable prediction for the rise height $u_0$ in
the circular capillary tube of Figure 1. He
assumed the interface to be spherical, so that $H$
is constant and $a = \cos\gamma / H$, with $a$ the tube
radius. He assumed vanishing
outside pressure. According to classic laws of
hydrostatics, $\delta p = \rho g u_0 = 2\sigma H$ by Y1, where $\rho$ is
fluid density; there follows the celebrated relation,
presented entirely in words in his 1805
article:


$$u_0 \approx \frac{2\cos\gamma}{\kappa a}, \qquad \kappa = \frac{\rho g}{\sigma} \qquad [7]$$

Young scorned the mathematical method, and 
made a point of deriving and publishing his 
results on capillarity without use of any mathe- 
matical symbols. This personal idiosyncrasy 
causes his publications to be something of a 
challenge to read. 


The Laplace Contribution 


In 1806, Laplace published the first analytical expres- 
sion for the mean curvature of a surface u(x, y), and 
showed that the expression can be written as a 
divergence. He obtained the equation 

$$\operatorname{div} Tu = 2H, \qquad Tu = \frac{\nabla u}{\sqrt{1 + |\nabla u|^2}} \qquad [8]$$


Thus, if H is known from geometrical or physical 
considerations, as it is for the capillary tube in 
the example just considered, one finds a second- 
order (nonlinear) equation for the surface height 
of any solution as a graph. The equation is 
elliptic for any function u(x,y) inserted into the 
coefficients, however not uniformly so; the parti- 
cular nonuniformity leads to some striking and 
unusual behavior of its solutions, as we shall see. 
With the aid of [8], Laplace improved the Young
estimate [7] to


$$u_0 \approx \frac{2\cos\gamma}{\kappa a} - \left( \frac{1}{\cos\gamma} - \frac{2}{3}\, \frac{1 - \sin^3\gamma}{\cos^3\gamma} \right) a \qquad [9]$$

Both Young and Laplace proposed their formulas
for “narrow tubes”, but neither gave any
quantitative indication of what “narrow” should
signify. Note that whenever $0 \le \gamma < \pi/2$, [9]
becomes negative when the nondimensional Bond
number $B = \kappa a^2$ exceeds 8; since $u$ is known to be
positive in the indicated range for $\gamma$, [9] provides
no information in that case, whereas [7] is still of
some value. Nevertheless, [9] is asymptotically
exact and consists of the first two terms of the
formal expansion in powers of $a$; that was first
proved by D Siegel in 1980, almost 200 years 
following the discovery of the formulas. In 1968, 
P Concus extended the formal expansion for the 
height to the entire traverse 0 <r <a. F Brulois 
(1981) and independently E Miersemann (1994) 
proved the expansion to be asymptotic to every 
order. Explicit bounds for the rise height above 
and below, making quantitative the notion of 
“narrow,” were obtained by Finn. 
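
The two estimates [7] and [9] are easily compared numerically. The sketch below evaluates both, together with the Bond number at which [9] changes sign; the closed form 6(1 + sin γ)²/(1 + 2 sin γ) used for that threshold is obtained here by elementary algebra from [9] and is not quoted from the sources cited above. It increases from 6 at γ = 0 towards 8 as γ approaches π/2, which is consistent with the remark on Bond numbers exceeding 8. The material constants for water are assumed round values, and the function names are illustrative.

```python
import math

def young_rise(kappa, a, gamma):
    """Young estimate [7]: u0 ~ 2 cos(gamma) / (kappa a)."""
    return 2.0 * math.cos(gamma) / (kappa * a)

def laplace_rise(kappa, a, gamma):
    """Laplace estimate [9]: Young term minus a correction linear in a."""
    c, s = math.cos(gamma), math.sin(gamma)
    correction = 1.0 / c - (2.0 / 3.0) * (1.0 - s ** 3) / c ** 3
    return young_rise(kappa, a, gamma) - correction * a

def bond_threshold(gamma):
    """Bond number kappa a^2 at which [9] vanishes (derived here, not quoted)."""
    s = math.sin(gamma)
    return 6.0 * (1.0 + s) ** 2 / (1.0 + 2.0 * s)

if __name__ == "__main__":
    # Assumed illustrative values for water in a clean glass tube (SI units).
    sigma, rho, g = 0.072, 1000.0, 9.81
    kappa = rho * g / sigma      # capillarity constant, 1/m^2
    a, gamma = 0.5e-3, 0.0       # tube radius 0.5 mm, perfect wetting
    print("Young   u0 =", young_rise(kappa, a, gamma), "m")
    print("Laplace u0 =", laplace_rise(kappa, a, gamma), "m")
    print("Bond number =", kappa * a ** 2, "threshold =", bond_threshold(gamma))
```

For these assumed values the rise height is roughly 3 cm and the Bond number is far below the threshold, so that the tube counts as narrow in the sense of both formulas.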

Laplace supplied the first detailed mathematical 
investigations into the behavior of capillary surfaces, 
applying his ideas to many specific examples. His 
underlying motivation apparently derived at least 
partly from astronomical problems, and he pub- 
lished his contributions in two “Suppléments” to the 
tenth volume of his Mécanique Céleste. 


Gauss’ Contribution: The Energy Method 


Young and Laplace both based their reasonings 
on force-balance arguments, which at best were 
unclear and at worst conceptually wrong. In 
1830, Gauss took up the problem anew from a 
variational point of view, using the Johann 
Bernoulli principle of virtual work. To do so, he 
attempted to characterize both surface energies 
and bulk fluid energies in terms of postulated 
particle attractions and repulsions. In an aston- 
ishing 30 pages, he essentially introduced founda- 
tions of modern potential theory, of measure 
theory, and of thermodynamics. He ended up 
with elaborate expressions that could not readily 
be applied, and which at least to some extent he 
did not use. He asserted, for example, that the 
bulk internal energy would be proportional to 
volume, which for an incompressible fluid is 
constant under admissible deformations, and on 
that basis he ignored the bulk energy term 
completely. His procedures then led him, in an 
independent and more convincing way, to the 
identical equation and boundary condition that 
had been produced by his predecessors. It must, 
of course, be remarked that his justification for 
ignoring the bulk energy term would not be 
correct for a compressible liquid (see the section 
“Compressibility”), and it is open to some
question for the central motivating problem of a
capillary tube dipped into an infinite liquid bath, 
in which event there is no volume constraint. 

The material that follows is guided by the ideas of 
Gauss; however, I have found it advantageous to 
replace his elaborate hypotheses on particle attrac- 
tions and repulsions by a simpler phenomenological 
reasoning as to the nature of the energy terms to be 
expected. 

To fix ideas, we consider a semi-infinite cylinder
of general section $\Omega$ and of homogeneous material,
closed at the bottom, situated vertically in a downward
gravity field $g$ per unit mass, and partly filled
with an incompressible liquid of density $\rho$ covering
the bottom (a more exact discussion, taking account
of compressibility, is indicated below in the section
“Compressibility”). We assume an equilibrium fluid
configuration with the liquid bounded above by an
ideally thin interface $S: u(x,y)$ (see Figure 4). We
distinguish the energy terms that occur:


1. Surface energy. This is the energy required to
create the surface interface $S$. We can characterize it
by noting that fluid particles within or exterior to the
liquid are attracted equally to neighboring particles in
all directions; however, at the surface $S$ there is a
differential attraction, to particles of the exterior
medium (such as air) above, or to the liquid below
(see Figure 5). Thus, particles in the interface are
pulled orthogonally to $S$. In general, for a liquid-gas
interface, significant work will be done only on the
liquid and those particles will be pulled toward the
liquid; otherwise, the liquid would evaporate across
the interface and disappear. The work done in that
(infinitesimal) motion is proportional to the area of $S$,
so that for the surface energy $E_S$ we obtain


$$E_S = \sigma \int_\Omega \sqrt{1 + |\nabla u|^2}\; dx \qquad [10]$$

Figure 4 Liquid in cylindrical capillary tube, of general section $\Omega$.
Reproduced with permission from the American Institute of
Aeronautics and Astronautics.





Figure 5 Attractions on a fluid element: (1) interior to the fluid; 
(2) on the surface interface. 


The constant $\sigma$ has the dimensions of force per unit
length, and turns out to be the surface tension of the
interface. We note from [10] its dual interpretation
as areal energy density on $S$, arising from formation
of that surface. This alternative interpretation lends
conceptual support to the supposition that $\sigma$ is
constant on $S$. See the section “Dual interpretation
of $\sigma$: distinction between fluids and solids.”

Implicit in the above discussion are deep 
premises about the nature of the forces acting 
within the fluid. Essentially these forces must be 
perceptible only at infinitesimal distances, and 
grow rapidly with decreasing distance. Forces 
both of attraction and of repulsion must be 
present. The recognition of the need for such 
forces can be traced back to Newton. Quantita- 
tive postulates as to their precise nature were 
introduced by van der Waals in the late nine- 
teenth century, and the topic remains still in 
active study. Since these forces appear at mole- 
cular distance levels, their introduction leads 
inevitably to questions of statistical mechanics. 
Additionally, our discussion of work done in 
forming the surface implicitly assumes a compres- 
sible transition layer there, in conflict with our 
treatment of S as an ideally thin interface 
bounding an incompressible fluid. In these senses, 
it is striking that [10] — which is in accord with 
classical constructions — could be obtained via 
global qualitative postulates concerning a con- 
tinuum in static equilibrium, in which the specific 
nature of the forces is not introduced. 

Rayleigh measured the thickness of the surface 
interface between water and air to be of mole- 
cular size, thus providing experimental justifica- 
tion for the procedure adopted. 

2. Wetting energy. A similar discussion applies at
the interface separating the liquid and solid at the
cylinder walls; however, this time the net attraction
can be in either direction, as particles from neither
medium can migrate significantly into the other. For
the wetting energy $E_W$, we write, with $\Sigma$ the
boundary of $\Omega$,


$$E_W = -\beta\sigma \oint_\Sigma u \, ds \qquad [11]$$


We designate $\beta$ as the relative adhesion coefficient of
the liquid-gas-solid configuration. We assume that
the cylinder walls are of homogeneous material, so
that $\beta$ will be constant. In general, $\beta$ is a difference of
factors that apply on the walls at the two interfaces,
with the liquid and with the external medium.

3. Gravitational energy. The work done in
lifting an amount of liquid $\rho\,\delta h\,\delta\Omega$ against the
gravity field from the base level to a height $h$ in a
vertical tube of small section $\delta\Omega$ is $\rho g h\,\delta h\,\delta\Omega$. Thus,
the work done in filling that tube up to the
surface height $u$ is $(\rho g u^2/2)\,\delta\Omega$, and the total
gravitational energy is


$$E_G = \frac{\rho g}{2} \int_\Omega u^2 \, dx \qquad [12]$$


4. Volume constraint. In the configuration considered
the volume is to be unvaried during
admissible deformations; we take account of the
constraint by introducing a Lagrange parameter $\lambda$,
and an additional “energy” term


$$E_V = \lambda\sigma \int_\Omega u \, dx \qquad [13]$$


According to the principle of virtual work, the
sum $E$ of the above energies must remain unvaried
in any deformation that respects all mechanical
constraints other than the volume constraint. We
choose a deformation $u \to u + \varepsilon\eta$, with $\eta$ smooth in
the closure of $\Omega$, which determines a functional $E(\varepsilon)$.
From $E'(0) = 0$ follows


$$\int_\Omega \left\{ \frac{\nabla u \cdot \nabla\eta}{\sqrt{1 + |\nabla u|^2}} + (\kappa u + \lambda)\,\eta \right\} dx - \beta \oint_\Sigma \eta \, ds = 0 \qquad [14]$$


from which


$$\int_\Omega \eta\, \{ -\operatorname{div} Tu + (\kappa u + \lambda) \}\, dx + \oint_\Sigma \eta\, (\nu \cdot Tu - \beta)\, ds = 0 \qquad [15]$$


with $Tu = \nabla u / \sqrt{1 + |\nabla u|^2}$, and with $\nu$ the unit
exterior normal on $\Sigma$. Choosing first $\eta$ to have
compact support in $\Omega$, the boundary term vanishes,
and the “fundamental lemma” of the calculus of
variations yields


$$\operatorname{div} Tu = \kappa u + \lambda, \qquad \kappa = \rho g/\sigma \qquad [16]$$


throughout $\Omega$. Thus, the area integral in [15]
vanishes for any $\eta$. We are therefore free to choose
$\eta$ as we wish on the boundary, and the fundamental
lemma now yields $\nu \cdot Tu = \beta$ on $\Sigma$. We now note
that for any liquid surface $u(x, y)$ there holds


$$\nu \cdot Tu = \cos\gamma \qquad [17]$$


on $\Sigma$, where $\gamma$ is the angle between the cylinder wall
and the surface $S$, measured within the liquid. Since
$\beta$ is assumed to be constant, that is so also for $\gamma$. It is
a physical constant: the contact angle, that must be
measured in an independent experiment, and cannot
be prescribed in advance or calculated within the
scope of the theory.
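
The passage from [14] to [15] is an integration by parts. As a sketch, assuming $u$ and $\eta$ smooth up to the boundary and $\Sigma$ regular enough for the divergence theorem to apply, the step reads:

```latex
\int_\Omega Tu \cdot \nabla\eta \, dx
  = \oint_\Sigma \eta \, \nu \cdot Tu \, ds
  - \int_\Omega \eta \, \operatorname{div} Tu \, dx ,
\qquad
Tu = \frac{\nabla u}{\sqrt{1 + |\nabla u|^2}} .
```

Substituting this identity for the gradient term of the first variation and collecting the area and boundary integrals separately gives [15].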

The constant $\beta$, originally introduced as a general
proportionality constant, is now characterized as
$\beta = \cos\gamma$. We thus see that a physical surface of the
form envisaged is possible only if $-1 \le \beta \le 1$.
Physically, one expects that if $\beta < -1$ the liquid
will separate from the walls, while, if $\beta > 1$, the
liquid will spread over the walls as a thin film.

Equation [16] and boundary condition [17] 
provide a nonlinear second-order equation that is 
elliptic for any function u(x,y), and also a non- 
linear transversality condition on the boundary, for 
determining the surface interface S. The expression 
div Tu is exactly twice the mean curvature of the 
surface $S$. If $\kappa \ne 0$ then $\lambda$ can be eliminated by
addition of a constant to $u$. The problem [16]-[17]
for the fluid in a vertical cylindrical capillary tube
of general section becomes thus a geometrical one:
to find a surface whose mean curvature is a
prescribed function of position in space, and
which meets the cylindrical boundary walls in a
prescribed angle $\gamma$.

In the absence of gravity, [16] takes the form


$$\operatorname{div} Tu = 2H \qquad [18]$$


for a surface of constant mean curvature $H$. The
constant $H$ is determined by integrating [18] over $\Omega$,
and using [17]:


$$2H = \frac{|\Sigma| \cos\gamma}{|\Omega|} \qquad [19]$$

where $|\Sigma|$ and $|\Omega|$ denote the respective perimeter
and area, and thus $H$ is independent of volume.
From the known uniqueness up to an additive
constant of the solutions of [18], [17] it follows
that the shape of the solution surface is independent
of volume. That result holds also for [16], [17]
in view of the possibility to eliminate $\lambda$ from the
equation by addition of a constant, and the
uniqueness of the solutions of the resulting
equation.
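
For a circular section of radius a, relation [19] can be checked against the spherical cap that Young assumed. The sketch below computes H from perimeter and area, and then verifies by a central difference that the cap u(r) = -sqrt(R² - r²) with R = 1/H meets the wall with ν·Tu = cos γ, as required by [17]. The choice of a unit radius and of a 30° contact angle is arbitrary, and the function names are illustrative.

```python
import math

def mean_curvature_zero_gravity(perimeter, area, gamma):
    """H from [19]: 2H = |Sigma| cos(gamma) / |Omega|."""
    return perimeter * math.cos(gamma) / (2.0 * area)

def cap_flux(r, big_r, h=1.0e-6):
    """Radial component of Tu = grad u / sqrt(1 + |grad u|^2) for the
    spherical cap u(r) = -sqrt(R^2 - r^2), using a central difference."""
    u = lambda s: -math.sqrt(big_r ** 2 - s ** 2)
    du = (u(r + h) - u(r - h)) / (2.0 * h)
    return du / math.sqrt(1.0 + du ** 2)

a = 1.0                       # tube radius (arbitrary units)
gamma = math.radians(30.0)    # assumed contact angle
H = mean_curvature_zero_gravity(2.0 * math.pi * a, math.pi * a ** 2, gamma)
R = 1.0 / H                   # radius of the spherical cap

if __name__ == "__main__":
    print("H =", H, " cos(gamma)/a =", math.cos(gamma) / a)
    print("nu.Tu at the wall =", cap_flux(a, R), " cos(gamma) =", math.cos(gamma))
```

Since the radial component of Tu for the cap is r/R, its divergence is the constant 2/R = 2H, so the cap solves [18] as well.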

Equations [16]-[17] or [18]-[17] are appropriate 
for determining capillary surfaces that are graphs
$u(x,y)$ over a base domain $\Omega$. More generally, any
surface $S$ in 3-space satisfies the equation


$$\Delta x = 2HN \qquad [20]$$


where $H$ is its scalar mean curvature and $N$ is a unit
normal vector on $S$. Here $\Delta$ is the “intrinsic
Laplacian” in the metric of S. This is the appropriate 
relation to be applied in situations for which the 
physical surface folds over itself and cannot be 
expressed globally as a graph. The formal simplicity 
of [20] is deceptive; the challenges arising from the 
nonlinearity in the equation can be formidable, and 
very little general theory is as yet available. 


Dual Interpretation of $\sigma$: Distinction between
Fluids and Solids 


We have already remarked the duality in connection
with eqn [10] above. It can be made explicit with a
simple experiment proposed by Dupré. One makes a
rigid frame with a sliding bar of length $l$, as in
Figure 6, and dips the frame into soap solution. On
lifting the frame from the solution the opening will
be filled with a soap film, and one finds a force
$F = 2\sigma l$ on the bar, directed orthogonal to the bar
(the factor 2 appears since the film has two sides).
The work done in sliding the bar a distance $\delta x$ is
$\delta\mathcal{F} = 2\sigma l\, \delta x$, which can also be written $\delta\mathcal{F} = 2\sigma\, \delta A$
with $\delta A$ an element of area. In this sense, the two
interpretations of $\sigma$ are formally equivalent, for
fluid/fluid interfaces.

The equivalence cannot be extended to solid/fluid
interfaces. Consider a rigid spherical ball of generic
material and radius $R$, freely floating in an infinite
liquid bath in a gravity-free environment, see
Figure 7a. It can be shown that the unique
symmetric solution to the problem is a horizontal
surface, as in the figure. A variational procedure as
above shows that if $e_0, e_1, e_2$ are the interfacial
energy densities associated with the three interfaces,
then

$$\cos\gamma = \frac{e_1 - e_2}{e_0} \qquad [21]$$

in formal analogy with the Young relation [6]. But
$e_1, e_2$ cannot be interpreted as interfacial forces
whose net tangential component cancels that of $e_0$.

Figure 6 Dupré apparatus for exhibiting surface tension.

Figure 7 (a) Floating spherical ball; presumed “Young” forces.
(b) Normal and vertical components of Young forces; contra- 
diction to presumed equilibrium. 


To do so would lead to a net downward force o, on 
the ball (see Figure 7b), contradicting the supposed 
equilibrium state. 


Mathematical and Physical Predictions: 
Experiments 


In the following sections, we study the kinds of 
behavior imposed on a surface S by the requirement 
that it appear as solution of one of the indicated 
equations and boundary conditions. Some of these 
properties are quite surprising in the context of 
classically expected behavior of solutions of equa- 
tions of mathematical physics. The mathematical 
predictions were, however, corroborated in certain 
cases experimentally, as we discuss below. 


Uniqueness and Nonuniqueness 


We begin by considering uniqueness questions. We 
start with a semi-infinite capillary tube, closed at the 
bottom, to be partially filled with a prescribed 
volume of (incompressible) liquid making contact
angle $\gamma$ on the container walls (Figure 8a). If $\kappa > 0$,
any solution is uniquely determined. That is a quite
general theorem, valid for a wide class of domains $\Omega$
including all piecewise smooth domains (at the 
corners of which data of the form [17] cannot be 
prescribed); formally, data can be omitted on any 
boundary set of linear Hausdorff measure zero. In 
this result, no growth conditions need be imposed 
near the boundary (note that such a statement 
would be false for solutions of the Laplace equation 
under Dirichlet boundary conditions). 

Next we consider a sessile liquid drop on a 
horizontal plate (Figure 8b). Again the solution is 
uniquely determined by the volume and by $\gamma$,
although the known proof differs greatly from that 
of the other case. 

We now consider a smooth deformation of the 
base plane, depending on a parameter t, which 
carries it into the cylinder; that can be done in such 
a way that the supporting surface is at all times 
“bowl-shaped,” as in Figure 8c. Since the bowl 
formation tends to restrict the possible deformations 







Figure 8 Support configurations: (a) capillary tube, general
section; (b) horizontal plate; (c) convex surface appearing during
deformation of horizontal plate to capillary tube; and (d)
nonuniqueness of configuration appearing during convex deformation.
Reproduced from Mathematical Intelligencer 24(3) 2002
21-33 with permission from Springer-Verlag Heidelberg.


of the fluid consistent with smooth contact with the 
supporting rigid surface, one might expect that 
the corresponding capillary surface S(t), arising 
from the identical fluid mass, will for each t be 
uniquely determined. 

That is however not true, even for symmetric 
configurations. We can see that from the configuration 
of Figure 8d, consisting of a vertical circular cylinder 
whose base is a 45° cone. We assume a contact angle 
$\gamma$ = 45° and adjust the radius so that a horizontal
surface lying just below the cylinder/cone juncture 
provides the prescribed volume. This is a formal 
solution surface. Now fill the configuration with a 
larger volume, so that the contact line will lie above the 
juncture. The upper surface will no longer be flat, in 
view of the 45° contact angle, and takes an appearance 
as indicated in the figure. Finally, we decrease the fluid 
volume, keeping all other parameters unchanged. As 
noted above, the upper surface moves rigidly down- 
ward, and it is clear that if the original surface is close 
enough to the juncture line, then the prescribed volume 
will be attained before the contact line reaches the 
juncture. Thus, uniqueness fails. 

In this construction as just described, the bounding 
surface is not smooth; however, one sees easily that 
the procedure continues to work if the edge and 
vertex are smoothed locally. In fact, one can carry the 
procedure to a striking conclusion; by appropriate 
smoothing, one can construct a bounding surface 
admitting an entire continuum of distinct solution
interfaces, all with the same contact angle and 
enclosing the same fluid volume (Gulliver and 
Hildebrandt; Finn). This can be done for any gravity 
field. Figure 9 illustrates seven members of the family 
of interfaces, in the particular case $\kappa = 0$.

The question immediately arises as to which if 
any of the continuum of surfaces will be seen in 
an experiment. In fact, it can be proved that none 
of the indicated surfaces is mechanically stable 
(Finn, Concus and Finn, Wente). Since the indicated 
family includes all symmetric surfaces that are 
stationary for the energy functional, we find that 
any stable stationary configuration must be asym- 
metric. Thus, we have obtained an example of 
symmetry breaking, in which all conditions of the 
problem are symmetric, but for which all physically 
acceptable solutions are asymmetric. 

These results were subjected to computational test 
by M Callahan using the Surface Evolver software, 
to experimental test by M Weislogel in a drop 
tower, and to experimental test by S Lucid in the 
Mir Space Station. The results of the latter experi- 
ment are compared in Figure 10 with the computer 
calculations. In both cases, both a local minimizer 
(potato chip) and a presumed global minimizer 
(spoon) were observed. 

The seven surface interfaces indicated in Figure 9 
all provide the same sum of surface and wetting 
energy, and bound the same volume of fluid. They 
all satisfy an eqn [18] with constant H, in 
accordance with hypotheses of incompressibility 
and vanishing gravity. Thus, formally, all configura- 
tions have identical mechanical energy. The surfaces 





Figure 9 Seven spherical capillary interfaces in an “exotic” 
container of homogeneous material in zero gravity. All interfaces 
bound the same volume and have the same sum of free surface 
and wetting energies. If all pressures above the interfaces are the 
same, then the pressures below them successively increase as the 
curvature vectors of the vertical sections change from upwardly to 
downwardly directed. Reproduced from Mathematical Intelligencer
24(3) 2002 21-33 with permission from Springer-Verlag Heidelberg.


Figure 10 Symmetry breaking in exotic container, g = 0. Below: 
calculated presumed global minimizer (spoon) and local minimizer 
(potato chip). Above: experiment on Mir: symmetric insertion of fluid 
(center); spoon (left); potato chip (right). This is a grayscale version 
of a color figure reproduced from Journal of Fluid Mechanics, 224: 
383-94, (1991) with permission of Cambridge University Press. 


are all spherical caps; however, the radii R of the 
caps vary considerably. According to Y1 above, the 
pressure change across each interface is $\Delta p = 2\sigma/R$.
Since one may assume the outer region to be a 
vacuum with zero pressure for all caps, we find that 
the pressures within the fluids vary greatly among 
the configurations. One would thus expect that 
work is done within the fluid in passing from one 
configuration to another, a circumstance we have 
excluded by hypothesis when determining the 
family. From this point of view, the (customary) 
hypothesis of incompressibility that was used in 
determining the family is put into significant ques- 
tion; we examine this point in some detail in the 
section “Compressibility.” 
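The variation of pressure among the caps can be made concrete with a short sketch of the relation $\Delta p = 2\sigma/R$ quoted above. The surface tension value and the cap radii below are illustrative assumptions, not data from the text or from Figure 9.

```python
def laplace_pressure_jump(sigma, radius):
    """Pressure jump (Pa) across a spherical cap of radius `radius` (m)
    for surface tension `sigma` (N/m): delta_p = 2 * sigma / R."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    return 2.0 * sigma / radius


# Assumed illustrative values, not taken from the text.
SIGMA = 0.072                   # N/m, roughly water against air
cap_radii = [0.01, 0.02, 0.05]  # m, hypothetical cap radii

pressure_jumps = [laplace_pressure_jump(SIGMA, R) for R in cap_radii]
for R, dp in zip(cap_radii, pressure_jumps):
    print(f"R = {R:.3f} m  ->  delta_p = {dp:.2f} Pa")
```

With the outer pressure taken as zero, the printed values are the fluid pressures themselves, so caps of different radii enclosing the same volume sit at different internal pressures.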


Discontinuous Dependence I


Capillary surfaces can exhibit striking discontinuous 
dependence on the defining data. As initial example, 
we consider the behavior of a solution of [18]-[17] 
at a protruding corner point P of the domain $\Omega$ of
definition. For simplicity, we assume the corner
bounded locally by straight segments, meeting in an
opening angle $2\alpha < \pi$, thus forming locally a wedge
domain. In anticipation of material to follow, we
assume contact angles $\gamma_1$ and $\gamma_2$ on the respective
sides, $0 < \gamma_1, \gamma_2 < \pi$. One can show that a necessary
condition for a solution surface over a domain $\Omega_\delta$ as
in Figure 11 to have a continuous normal vector up
to P is that the data point $(\gamma_1, \gamma_2)$ lie in the closure of
the rectangle R of Figure 12. (This figure includes





Figure 11 Wedge domain. Reproduced from Finn R “Capillary 
Surface Interfaces” in Notices of AMS 46 No.7 (1999) with 
permission of the American Mathematical Society. 
Figure 12 Domain R of data yielding continuous normal to
capillary surface in wedge of opening $2\alpha < \pi$. The symbols D
and I are clarified in the section “Behavior at a corner point.”
Reproduced from “Capillary Wedges Revisited” in SIAM J. Math.
Anal. 27 No.1 (1996) 56-69 with permission from SIAM.


also additional material anticipating the section 
“Drops in wedges”). 

For data points interior to R, this criterion also 
suffices for the existence of at least one such solution 
surface, for any prescribed H; such surfaces can in 
fact be produced explicitly as spherical caps (planes 
if $H = 0$). It remains to discuss what can occur with
data arising from the remaining four subregions of 
the square. 

If $(\gamma_1, \gamma_2) \in D_1^{\pm}$, then there is no solution to
[18]-[17] in any neighborhood of the corner point
P. On the other hand, an explicit solution for any
$H > 0$ can be found as a lower spherical cap on
the segment $\gamma_1 + \gamma_2 = \pi - 2\alpha$ that separates $D_1^{+}$
from R (see Figure 13, which indicates the
equatorial circle). Correspondingly, if $H < 0$ then
an explicit solution can be found on the separation
line between $D_1^{-}$ and R. Thus, there is a





Figure 13 Construction of solution as lower hemisphere;
$\gamma_1 + \gamma_2 = \pi - 2\alpha$, $H > 0$. Reproduced from “Capillary Wedges
Revisited” in SIAM J. Math. Anal. 27 No.1 (1996) 56-69 with
permission from SIAM.


discontinuous change in behavior in crossing from 
R to either of the $D_1^{\pm}$ regions.
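The partition of the data square can be sketched as a small classifier. The text gives only the segment $\gamma_1 + \gamma_2 = \pi - 2\alpha$ explicitly; the full description used below, with R defined by $|\gamma_1 + \gamma_2 - \pi| \le 2\alpha$ and $|\gamma_1 - \gamma_2| \le \pi - 2\alpha$, is an assumption taken from the standard presentation of the wedge problem, and the function name is hypothetical. The sketch does not distinguish the two signs within each D family.

```python
import math


def classify_wedge_data(gamma1, gamma2, alpha):
    """Classify contact-angle data (radians) for a wedge of half-angle alpha.

    Returns "R", "D1" or "D2". The region boundaries are an assumed
    reading of Figure 12, as stated in the lead-in.
    """
    if not 0 < alpha < math.pi / 2:
        raise ValueError("expected a protruding corner, 0 < 2*alpha < pi")
    if not (0 <= gamma1 <= math.pi and 0 <= gamma2 <= math.pi):
        raise ValueError("contact angles must lie in [0, pi]")
    if abs(gamma1 + gamma2 - math.pi) > 2 * alpha:
        return "D1"   # no solution near the corner in zero gravity
    if abs(gamma1 - gamma2) > math.pi - 2 * alpha:
        return "D2"
    return "R"        # continuous normal up to the vertex is possible


alpha = math.radians(25)
for g1, g2 in [(90, 90), (50, 50), (10, 170)]:
    region = classify_wedge_data(math.radians(g1), math.radians(g2), alpha)
    print(f"gamma1 = {g1}, gamma2 = {g2}: {region}")
```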

This behavior was put to experimental test by 
W Masica, who considered the case $0 < \gamma_1 = \gamma_2 = \gamma < \pi/2$
near the crossing point $\gamma = \gamma_{cr}$ with $D_1^{+}$, for
which $\alpha + \gamma_{cr} = \pi/2$. He partially filled a regular
hexagonal cylinder of acrylic plastic, successively
with two different liquids, making respective contact
angles greater or less than $\gamma_{cr}$ with the plastic. For
each liquid, Masica then allowed the cylinder to fall
in a 132 m drop tower. Figure 14 compares the two
configurations after about 5 s of free fall. In the case
$\gamma > \gamma_{cr}$, he obtained the spherical-cap solution,
which in this case covers the entire base domain $\Omega$
and appears as an explicit solution of [18]-[17].
When $\gamma < \gamma_{cr}$, the liquid rose to the top of the
cylinder near the edges, filling out the edges over the
corner points. The surface interface S does not cover
$\Omega$, but instead folds back over itself, doubly covering
a portion of $\Omega$. Thus, a physical surface appears as it
must, but it is not a solution of [18] over $\Omega$.
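For a regular polygonal section the critical angle in this experiment follows from elementary geometry. The sketch below assumes only that the corner half-angle of a regular n-gon is half its interior angle and that the critical contact angle satisfies $\alpha + \gamma_{cr} = \pi/2$; the function name is hypothetical.

```python
def critical_contact_angle_deg(n_sides):
    """Critical contact angle (degrees) for a regular n-gon section,
    from alpha + gamma_cr = 90 degrees with alpha the corner half-angle."""
    if n_sides < 3:
        raise ValueError("a polygon needs at least 3 sides")
    interior_angle = (n_sides - 2) * 180.0 / n_sides
    alpha = interior_angle / 2.0
    return 90.0 - alpha


# Hexagonal cylinder as in the drop-tower experiment: alpha = 60 degrees.
print(critical_contact_angle_deg(6))
```

Under these assumptions a liquid wetting the hexagonal cylinder at less than 30° would be expected to run up the edges, while one with a larger contact angle would form the spherical cap.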


Discontinuous Dependence II


About 1970, M Miranda raised informally the 
question, whether a capillary tube $Z_0$, whose section


Figure 14 Liquid in hexagonal cylinder, during free fall in drop 
tower: (a) $\alpha + \gamma > \pi/2$; (b) $\alpha + \gamma < \pi/2$.
$\Omega_0$ lies strictly interior to a section $\Omega_1$ of a tube $Z_1$,
will raise liquid from an infinite reservoir in a
downward directed gravity field to a higher level
over $\Omega_0$ than will $Z_1$ over that subdomain of its
section. That is true if both cylinders are circular,
and in the intervening years its correctness was 
established in a number of other cases of particular 
interest. 

Finn and Kosmodem’yanskii, Jr. showed, however,
by example that the assertion fails in a large
range of cases, and in fact can fail with arbitrarily
large height differences, uniformly over $\Omega_0$. Beyond
that, the construction exhibits a strikingly discontin- 
uous change of behavior, under perturbations of a 
disk as inner domain. Perhaps more remarkably, the 
assertion can hold with the inner domain a disk, but 
with discontinuous reversal of behavior as the disk is 
perturbed to neighboring disks. That was shown in a 
form of the example given later by Finn, and 
illustrated in Figure 15. Here the outer domain $\Omega_1$
is polygonal, with sides that extend to be tangent to
a unit disk $\Omega_0$, as indicated. The angle $\gamma$ is to be
chosen so that $0 < \pi/2 - \gamma < \alpha_{\min}$, where $\alpha_{\min}$ is the
smallest of the interior vertex half-angles of $\Omega_1$. In
view of the assumed infinite fluid reservoir, there is
no volume constraint, and the governing equation
[16] takes the form


$$\operatorname{div} Tu = \kappa u, \qquad \kappa = \rho g/\sigma > 0 \tag{22}$$
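To give a sense of scale for the constant $\kappa = \rho g/\sigma$, the following sketch evaluates it and the associated length $1/\sqrt{\kappa}$. The material values are assumed round numbers for water in terrestrial gravity and are not taken from the text.

```python
import math


def capillary_constant(rho, g, sigma):
    """kappa = rho * g / sigma, in 1/m^2 for SI inputs."""
    if sigma <= 0:
        raise ValueError("surface tension must be positive")
    return rho * g / sigma


# Assumed illustrative values (water, Earth gravity).
RHO = 1000.0    # kg/m^3
G = 9.81        # m/s^2
SIGMA = 0.072   # N/m

kappa = capillary_constant(RHO, G, SIGMA)
capillary_length = 1.0 / math.sqrt(kappa)
print(f"kappa = {kappa:.0f} 1/m^2, capillary length = {capillary_length * 1e3:.2f} mm")
```

The limit $\kappa \to 0$ considered in the text corresponds to lengths $1/\sqrt{\kappa}$ that are large compared with the tube section, as in reduced gravity.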


Taking at first the inner domain to be $\Omega_0$, it can
be shown that for the corresponding solutions $u^0$
and $u^1$ of [22], there holds $u^0 > u^1$ over $\Omega_0$ for





Figure 15 Discontinuous reversal of limiting height behavior. All
sides of the polygonal domain $\Omega_1$ are tangent to the unit disk $\Omega_0$.
For the corresponding solution heights $u^0$ in $\Omega_0$, $u^\varepsilon$ in the disk
$\Omega_\varepsilon$ of radius $1 + \varepsilon$, and $u^1$ in $\Omega_1$, there holds $u^1 - u^0 < 0$,
for any downward gravity. But $\lim_{\kappa \to 0}(u^1 - u^\varepsilon) = +\infty$, for any $\varepsilon > 0$.
any $\kappa > 0$, and thus the Miranda question has a
positive answer for that configuration. But if we
replace $\Omega_0$ by a concentric disk $\Omega_\varepsilon \subset \Omega_1$ of radius
$1 + \varepsilon$, we find


2 
ita o K) — sup Ho} e 
E C) 


1+e k 














1 — sin 1 — sin 
iea 


where $\omega = \arccos(\cos\gamma / \sin\alpha)$, and $u^\varepsilon$ is the solution
of [22], [17] in $\Omega_\varepsilon$. Since $\kappa$ does not appear on the
right side of [23], there follows in particular that for
any $\varepsilon > 0$, there holds


|23] 


cos y cos y 


$$\lim_{\kappa \to 0} \Big\{ \inf_{\Omega_\varepsilon} u^1 - \sup_{\Omega_\varepsilon} u^\varepsilon \Big\} = \infty \tag{24}$$


In particular, a negative answer to Miranda’s 
question appears for all gravity sufficiently small. 
But as observed above, a positive answer occurs in 
$\Omega_0$, for any positive gravity. Thus, the limiting
behavior as $\kappa \to 0$ changes discontinuously, as $\varepsilon \to 0$.
We find that the two limiting procedures cannot be
interchanged: for any $x \in \Omega_0$, we obtain


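$$\begin{aligned}
\lim_{\varepsilon \to 0} \lim_{\kappa \to 0} \{u^1(x;\kappa) - u^\varepsilon(x;\kappa)\} &= +\infty, \\
\lim_{\kappa \to 0} \lim_{\varepsilon \to 0} \{u^1(x;\kappa) - u^\varepsilon(x;\kappa)\} &= \text{const.} < 0
\end{aligned} \tag{25}$$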


Existence Questions I


For the general equation [20] there is an established 
literature on existence of surfaces containing a 
prescribed space curve. There is very little literature 
relating to the capillarity boundary condition that 
the solution surface S meet a prescribed “support” 
surface W in a prescribed angle $\gamma$. The existence of
at least one such surface interior to a prescribed 
sufficiently smooth closed space domain was proved 
by Almgren, and then Taylor proved smoothness at 
the contact curve. These are abstract theorems that 
are basic for the theory but in general do not 
provide specific information in particular cases of 
interest. 

Special interest attaches to the nonparametric 
cases [16] or [18] with boundary condition [17], 
especially in view of the discontinuous behavior 
properties described above. These cases were studied 
in depth by a number of authors, with results that 
put the above examples into some perspective. 

M Emmer proved the existence of a unique 
solution of [16]-[17] for any compact $\Omega$ having
Lipschitz boundary with Lipschitz constant L such
that $\sqrt{1 + L^2}\,\cos\gamma < 1 - \varepsilon$ for some $\varepsilon > 0$. Finn and


Gerhardt (F and G) extended this condition, and 
showed in particular that solutions exist in general 
in piecewise smooth $\Omega$. This result contrasts with the
zero-gravity case [18] discussed in the section
“Existence questions II,” for which solutions fail to
exist when $\sqrt{1 + L^2}\,\cos\gamma > 1$ at a protruding corner
(see the section “Discontinuous dependence I”).
However, in the cases $\sqrt{1 + L^2}\,\cos\gamma > 1$ studied
by F and G the solution u(x) is necessarily
unbounded in the corner. This condition is equivalent
to $\alpha < |\gamma - \pi/2|$ at the corner. Concus and Finn
showed that if $\alpha > |\gamma - \pi/2|$ in a neighborhood $\Omega_\delta$
of a corner with rectilinear sides, as indicated in
Figure 11, then the solution u(x) satisfies


$$|u(x;\kappa)| < \frac{2}{\kappa\delta} + \delta \tag{26}$$
independent of $\alpha$, $\gamma$ in the range considered. Here it
is assumed that [16] is normalized so that $\lambda = 0$;
when $\kappa \neq 0$ this can always be achieved by adding a
constant to u. On the other hand, if $\alpha < |\gamma - \pi/2|$,
then


$$u(x;\kappa) \approx \frac{\cos\theta - \sqrt{k^2 - \sin^2\theta}}{k\kappa r} \tag{27}$$

where $k = \sin\alpha/\cos\gamma$ and $\theta$ is polar angle relative
to a bisector at the vertex; hence u becomes
unbounded as O(1/r). Thus, the behavior changes
discontinuously as the configuration for which
$\alpha = |\gamma - \pi/2|$ is crossed.
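The two regimes can be compared numerically. The sketch below assumes the readings $|u| < 2/(\kappa\delta) + \delta$ for the bound [26] and $u \approx (\cos\theta - \sqrt{k^2 - \sin^2\theta})/(k\kappa r)$ with $k = \sin\alpha/\cos\gamma$ for the leading term [27]; the function names and the sample values are hypothetical.

```python
import math


def corner_regime(alpha, gamma):
    """Return "bounded", "unbounded" or "borderline" by comparing the
    half-angle alpha with |gamma - pi/2| (all angles in radians)."""
    threshold = abs(gamma - math.pi / 2)
    if alpha > threshold:
        return "bounded"
    if alpha < threshold:
        return "unbounded"
    return "borderline"


def height_bound_26(kappa, delta):
    """Assumed form of the bound [26]: 2 / (kappa * delta) + delta."""
    return 2.0 / (kappa * delta) + delta


def leading_term_27(r, theta, alpha, gamma, kappa):
    """Assumed form of the leading term [27], valid in the unbounded regime
    for directions |theta| <= alpha measured from the bisector."""
    if corner_regime(alpha, gamma) != "unbounded":
        raise ValueError("leading term applies only when alpha < |gamma - pi/2|")
    if abs(theta) > alpha or r <= 0:
        raise ValueError("point must lie inside the wedge, r > 0")
    k = math.sin(alpha) / math.cos(gamma)
    return (math.cos(theta) - math.sqrt(k * k - math.sin(theta) ** 2)) / (k * kappa * r)


kappa = 136250.0                      # 1/m^2, assumed water-like value
alpha = math.radians(10)
print(corner_regime(alpha, math.radians(85)), height_bound_26(kappa, 0.01))
print(corner_regime(alpha, math.radians(70)),
      leading_term_27(1e-3, 0.0, alpha, math.radians(70), kappa))
```

Halving r doubles the leading term, which is the O(1/r) growth noted above, whereas the bound in the other regime does not depend on the distance to the vertex.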

This prediction was corroborated by T Coburn in 
a “kitchen sink” experiment in the Medical School 
at Stanford University. Coburn formed a wedge 
using two sheets of acrylic plastic, resting on a glass 
plate, and inserted a drop of distilled water at the 
base of the wedge. Initially, the wedge was opened 
sufficiently that $\alpha + \gamma > \pi/2$, and he obtained the
configuration of Figure 16a, with the maximum 
height slightly lower than that indicated by [26]. By 
closing down the angle slightly, the liquid rose to 
over ten times that height, as shown in Figure 16b. 
This experiment was later repeated by Weislogel 
under laboratory conditions; it incidentally establishes
the contact angle of water and acrylic plastic
in the Earth’s atmosphere as 80° ± 2°.

The indicated procedure provides in general a 
very accurate way to measure contact angles, when 
the angle is not far from $\pi/2$. For $\gamma$ near zero or $\pi$ in
the Earth’s gravity field, the discontinuity is con- 
fined to a microscopic neighborhood of the vertex, 
and can be difficult to observe. This technical 
difficulty was addressed by Fischer and Finn, who 
introduced “canonical proboscis” domains, the 
theory of which was further developed by Finn and 





Figure 16 Distilled water in wedges formed by acrylic plastic
plates; g > 0. (a) $\alpha + \gamma > \pi/2$; (b) $\alpha + \gamma < \pi/2$. Reproduced
from P Concus and R Finn, “On Capillary Free Surfaces in a
Gravitational Field” in Acta Math 132 (1974) 207-223 with
permission of Institut Mittag-Leffler.


Leise and by Finn and Marek. For such domains the 
change in behavior is not strictly discontinuous, but 
it is nearly so, and it extends over large portions of 
the cylinder section, so that it is easily observable. 
Concus, Finn, and Weislogel conducted space 
experiments, demonstrating the feasibility of the 
method as a means for measuring contact angles in 
general ranges. 

In [26]-[27] no growth conditions at the corner 
are imposed; the estimates hold for every solution 
defined in $\Omega_\delta$ and assuming the prescribed data on
the side walls, with no data prescribed at the vertex. 
The formula [27] is the initial term of a formal 
asymptotic expansion of the solution, in powers of r. 
Miersemann obtained the complete expansion, 
asymptotic to every order, when $\alpha < |\gamma - \pi/2|$. He
obtained somewhat less complete information in the 
bounded case [26]. 

Chen, Finn, and Miersemann provided a form of 
[27] that is applicable for any data $(\gamma_1, \gamma_2)$ on the
respective sides of the wedge, that arise from the $D_1^{\pm}$
regions of Figure 12. Lancaster and Siegel and
independently Chen, Finn, and Miersemann showed
that if $-2\alpha < \gamma_1 + \gamma_2 - \pi < 2\alpha$, then every solution
is bounded at the vertex. This result holds also for 
the zero gravity eqn [18]. 

In the case of [18], Concus and Finn showed that 
in the $D_1^{\pm}$ regions no solution exists, regardless of H.
Again, this result holds without growth conditions. 

From these considerations and from remarks in
the section “Discontinuous dependence I” it follows
that for data in $D_2^{\pm}$, all solutions either of [18] or of
[16] are bounded but have discontinuous derivatives
at the vertex P. Extrapolating from the behavior of
particular computed solutions, Concus and Finn
conjectured that all solutions of [18] or of [16] that
arise from data in $D_2^{\pm}$ are discontinuous at P. A
number of attempts to prove or to disprove this 
conjecture have till now been unsuccessful. 

An existence theorem for [16]-[17] alternative to 
that of Emmer was obtained independently by 
Ural’tseva, using a very different approach. This 
procedure yielded smoothness estimates up to the 
boundary, but required a hypothesis of boundary 
smoothness, so that the result does not mesh with the 
discontinuous dependence behavior as does that of 
Emmer. Later versions of the existence result, again 
under boundary smoothness requirements, were given 
by Gerhardt, Spruck, and Simon and Spruck. In the 
procedure introduced by Emmer, the boundary trace is 
shown to exist only in a very weak sense (which, 
however, suffices for a uniqueness proof). The later 
work can be adapted to show that the Emmer 
solutions are smooth on the smooth parts of $\partial\Omega$.

None of the above procedures provides existence for 
the zero gravity case [18]. As we shall see in the 
following section, that is not an accident of the 
methods, but reflects subtle properties of the equations. 


Existence Questions II


We consider here the zero-gravity case [18], over a 
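domain $\Omega$ bounded by a piecewise smooth curve $\Sigma$,
under the boundary condition [17]. Integrating [18]
over $\Omega$ and using [17], we find $2H|\Omega| = |\Sigma| \cos\gamma$. Let
$\Omega^* \subset \Omega$, $\Sigma^* = \Sigma \cap \partial\Omega^*$, $\Gamma = \Omega \cap \partial\Omega^*$. The same
procedure over $\Omega^*$, using that $|Tu| < 1$ for any u(x, y),
leads to the bound

$$\Phi[\Gamma; \gamma] > 0 \tag{28}$$

where $\Phi$ is defined by

$$\Phi[\Gamma; \gamma] = |\Gamma| - |\Sigma^*| \cos\gamma + 2H|\Omega^*| \tag{29}$$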


The inequality [28] must hold for any choice of 
$\Omega^* \subset \Omega$. This provides a necessary condition for
existence of a solution to [18]-[17] in $\Omega$. E Giusti
showed that when $\Omega^*$ is interpreted in a generalized
sense as a Caccioppoli set, the condition [28] 
becomes also sufficient for existence. 
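The two integrations behind the necessary condition can be written out as follows. This is a sketch under the assumption that [18] has the form $\operatorname{div} Tu = 2H$ in the domain and [17] the form $\nu \cdot Tu = \cos\gamma$ on its boundary, with $\nu$ the unit exterior normal; the subdomain, its wetted boundary part and its interior boundary part are written $\Omega^*$, $\Sigma^*$ and $\Gamma$.

```latex
% Assumed forms: [18] div Tu = 2H in Omega, [17] nu . Tu = cos(gamma) on Sigma.
\begin{align*}
2H|\Omega| &= \int_\Omega \operatorname{div} Tu \, dx
            = \oint_\Sigma \nu \cdot Tu \, ds
            = |\Sigma| \cos\gamma, \\
2H|\Omega^*| &= \int_{\Sigma^*} \nu \cdot Tu \, ds + \int_\Gamma \nu \cdot Tu \, ds
             = |\Sigma^*| \cos\gamma + \int_\Gamma \nu \cdot Tu \, ds \\
            &> |\Sigma^*| \cos\gamma - |\Gamma|
             \qquad \text{since } |\nu \cdot Tu| \le |Tu| < 1 \text{ on } \Gamma .
\end{align*}
```

Rearranging the last inequality gives $|\Gamma| - |\Sigma^*| \cos\gamma + 2H|\Omega^*| > 0$, which is the stated bound on the functional.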

It is easy to give specific examples of convex 
analytic domains $\Omega$, in which subdomains $\Omega^*$ can be
found such that [28] fails. Thus, the general
existence results for [16] do not carry over to [18],
regardless of local domain smoothness. Nevertheless,
in many cases of interest (e.g., a circular disk or
an ellipse that is not too eccentric), solutions of
[18]-[17] do exist for any $\gamma$ and are well behaved.
Finn investigated the condition [28] in general by
showing the existence of a system of arcs $\{\Gamma\} \subset \Omega$




Figure 17 Extremal configuration for the functional $\Phi$.


that minimize $\Phi$. All such arcs are circular of radius
1/(2H), and meet $\Sigma$ either at smooth points in an
angle $\gamma$, or else at a reentrant corner point in an
angle $\gamma^* > \gamma$, measured on the side of $\Gamma$ opposite to
that into which the curvature vector points
(Figure 17). All minimizing configurations are
bounded by arcs of that form, although not all
such configurations minimize. In a typical situation
one will encounter only a finite number of such arcs,
in which case only a finite number of cases need be
examined. If $\Phi > 0$ in each such case, then a
solution of [18]-[17] exists for the given $\Omega$ and $\gamma$.
It may occur that no such arcs exist; we then observe
that since $\Phi[\emptyset; \gamma] = \Phi[\Omega; \gamma] = 0$, $\Phi$ cannot become
nonpositive for any $\Omega^* \subset \Omega$ unless a minimizing $\Gamma$
can be found in $\Omega$, contradicting the assumed
nonexistence of minimizers. Thus, the criterion is
then vacuously satisfied, and we conclude that a
solution of [18]-[17] exists.
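The necessary condition can be probed numerically with a simple test subdomain. The sketch below assumes the functional has the form (arc length) minus (wetted boundary length) times $\cos\gamma$ plus $2H$ times (subdomain area), with $2H$ fixed by the whole domain through $2H \cdot \text{area} = \text{perimeter} \cdot \cos\gamma$. As test subdomain it cuts off a corner of a regular hexagon by a straight chord rather than by the extremal circular arc, so it only detects failure of the condition; all names and numbers are illustrative.

```python
import math


def mean_curvature_H(perimeter, area, gamma):
    """H from the whole-domain balance 2 * H * area = perimeter * cos(gamma)."""
    return perimeter * math.cos(gamma) / (2.0 * area)


def phi(arc_len, wetted_len, sub_area, H, gamma):
    """Assumed form of the functional: |Gamma| - |Sigma*| cos(gamma) + 2H |Omega*|."""
    return arc_len - wetted_len * math.cos(gamma) + 2.0 * H * sub_area


def phi_corner_chord(s, alpha, gamma, perimeter, area):
    """Evaluate phi on the triangle cut off at a corner of half-angle alpha
    by a chord joining the points at distance s from the vertex on each side."""
    H = mean_curvature_H(perimeter, area, gamma)
    chord = 2.0 * s * math.sin(alpha)
    wetted = 2.0 * s
    triangle_area = s * s * math.sin(alpha) * math.cos(alpha)
    return phi(chord, wetted, triangle_area, H, gamma)


# Regular hexagon circumscribed about the unit circle (illustrative choice).
perimeter = 12.0 * math.tan(math.pi / 6)
area = perimeter / 2.0
alpha = math.pi / 3

for gamma_deg in (20, 40):
    value = phi_corner_chord(0.05, alpha, math.radians(gamma_deg), perimeter, area)
    print(f"gamma = {gamma_deg} deg: phi = {value:+.5f}")
```

For small s the sign is governed by $\sin\alpha - \cos\gamma$, so the test value turns negative exactly when $\alpha + \gamma < \pi/2$, in agreement with the corner criterion discussed earlier.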

One has, of course, to ask what happens 
physically in cases for which $\Phi[\Gamma; \gamma] < 0$ for some
$\Gamma$ as above. The possible modes of behavior were
studied in particular cases by Tam and later by
deLazzer, Langbein, Dreyer, and Rath; Finn and
Neel characterized the general case. Formally, the
fluid rises to infinity throughout domains $\Omega^*$ of the
form indicated, but with H replaced by a value
$H^{-} < H$; on the opposite side of the circular arcs $\Gamma$,
the fluid is asymptotic to the vertical cylinders over
$\Gamma$. In a physical situation, the fluid will rise to the
top of the container in a nearly cylindrical region 
adjacent to a portion of the container walls, 
approximating the indicated behavior and partially 
wetting the top of the container. One sees that 
behavior in Figure 14b, in which the fluid fills out 
regions adjacent to the corners. An analogous 
configuration would still be observed if the corners 
were smoothed locally. If insufficient fluid is 
available, a portion of the base $\Omega$ could become
unwetted. 


Behavior at a Corner Point 


Lancaster and Siegel (L and S) studied the behavior of 
the limits (which they designate by Ru) of bounded 
solutions of [16] or of [18] along radial segments 


tending to a corner point P of a domain $\Omega$. These limits
can exhibit remarkable idiosyncratic behavior. For
simplicity of exposition, we restrict ourselves here to
rectilinear boundary segments at P, and assume
constant boundary angles $\gamma_1, \gamma_2 \neq 0, \pi$ on the two
sides. L and S prove first that the limits Ru exist and
vary continuously with direction of approach; then
they show the existence of “fan” regions of directions
adjacent to those of the sides, in which the limits are
constant independent of direction, see Figure 18. They
obtain that if the opening angle $2\alpha$ at P satisfies $2\alpha < \pi$,
then for data in the rectangle R of Figure 12 the fans
overlap (see Figure 18a), so that the solution is
necessarily continuous at P. For data in $D_2^{+}$, the
solution decreases from the $\gamma_1$ side $\Sigma_1$ to the $\gamma_2$ side $\Sigma_2$
(“D” behavior), subject to the Concus-Finn conjecture
(see the section “Existence questions I”), with the
reverse behavior (“I”) in $D_2^{-}$. Concus and Finn showed
that if $2\alpha < \pi$ then in $D_1^{\pm}$ there is no bounded solution
of [16]-[17] or [18]-[17] as a graph. For [16]-[17],
unbounded solutions do however exist for such data 
(see the section “Existence questions I”). 










Figure 18 (a) Fan domains APA’ and BPB’ of constant limiting 
values; $2\alpha < \pi$ so that the fans overlap when data are in R. (b)
$2\alpha > \pi$, case 1. Fans APA’ and BPB’ of constant radial limits
appear. Limiting value changes strictly monotonically as
approach direction changes from A’P to B’P. (c) $2\alpha > \pi$; case 2.
In addition to the two fans adjacent to the sides of the 
wedge, a half plane of constant radial limits appears. 


If $2\alpha > \pi$, then the fans do not overlap, and
in fact continuity at P cannot in general be 
expected. Outside the indicated fan regions adja- 
cent to the wedge sides, the limit values either 
change strictly monotonically with angle of 
approach, as in Figure 18b, or else they do so 
except for approaches within a third, central fan, 
which covers a full half-space, and interior to 
which the limiting values again remain constant, 
see Figure 18c. L and S give an example under 
which that behavior actually occurs. Remarkably, 
in the example the prescribed data are the same on 
both boundary segments. The solution is never- 
theless discontinuous at P, with an interval in 
which the radial limit increases, another interval in 
which it decreases, two fans of constant limit 
adjacent to the sides, and a fan of breadth $\pi$ in
between.

General conditions for continuity at a reentrant 
corner ($2\alpha > \pi$) have not yet been established. L and
S give a sufficient condition, depending on a
hypothesis of symmetry. Since no such hypothesis
is needed when $2\alpha < \pi$, one might at first expect it
to be superfluous. However, Shi and Finn showed 
that by introducing an asymmetric domain perturba- 
tion that in an asymptotic sense can be arbitrarily 
small, the solution can be made discontinuous at P. 
That can be done without affecting any other 
hypotheses of the L and S theorem. 

In as yet unpublished work, D Shi characterized 
all possible behaviors at a reentrant corner, subject 
to the validity of the Concus—Finn conjecture at a 
protruding corner. If $\kappa \ge 0$ then all solutions of [16]
or of [18] in a neighborhood of P in $\Omega$ are bounded
at P. The further behavior depends on the particular
data, and is indicated in Figure 19. Note the analogy
with Figure 12, although the interpretations in the
figures differ in detail. Here the symbol I denotes
strictly increasing from the side $\Sigma_1$ to $\Sigma_2$, except on
the fan regions of constant limits; ID denotes
constancy on a fan adjacent to $\Sigma_1$, then strictly
increasing, then constancy on a fan of opening $\pi$,
then strictly decreasing, then constancy on a fan
adjacent to $\Sigma_2$. D and DI are defined analogously.
All cases can be realized in particular configurations. 


Drops in Wedges 


Closely related to the material just discussed is the 
question of the possible configurations of a con- 
nected drop of liquid placed into a wedge formed by 
intersecting plates of possibly differing materials, in 
the absence of gravity. Thus, one has distinct 
contact angles $\gamma_1, \gamma_2$ on the two plates. Finn and
McCuan showed that if $(\gamma_1, \gamma_2) \in R$ then the only

Figure 19 $\pi < 2\alpha < 2\pi$. Possible modes of behavior. Reproduced
with permission from the Pacific Journal of Mathematics.


possibility is that the drop surface S is part of a 
sphere. For data in $D_1^{\pm}$, no such drop can exist,
barring exotically singular behavior at the vertex 
points where the edge of the wedge meets S. 

For data in $D_2^{\pm}$ the situation is less clear. Concus,
Finn, and McCuan (CFM) showed that local 
behavior exhibiting such data is indeed possible; 
however, they conjectured that such behavior 
cannot occur for simple drops. In conjunction with 
the above results, they were led to the conjecture 
that the free surface S of any liquid drop in a planar 
wedge, that meets the wedge in exactly two vertices 
and the wedge faces in constant contact angles 
$\gamma_1, \gamma_2$, is necessarily spherical. Here it is supposed
only that $0 < \gamma_1, \gamma_2 < \pi$.

The behavior of a drop of prescribed volume, as 
the data move from the midpoint of R to the D 
regions along parallels to the sides of R, is displayed 
in Figure 20. As one moves into the $D_2^{\pm}$ regions, the
drop detaches from one side of the wedge and
becomes a spherical cap resting on a single planar
surface, in accord with the above conjecture. As $D_1^{+}$
is approached, the liquid becomes a drop of very
large radius that fills out a long thin region in the
wedge, and disappears to infinity as the boundary of
R is crossed. However, as $D_1^{-}$ is entered, the
configuration transforms smoothly into a spherical 
liquid bridge, connecting the two faces of the wedge 
without contacting the wedge line. 
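For equal contact angles on both faces, the transitions just described can be illustrated with elementary sphere geometry. The sketch below is an assumption-laden illustration, not the construction of the cited authors: it places the sphere center on the bisecting plane at distance d from the edge, uses $|\cos\gamma| = d \sin\alpha / R$ for the contact angle on each face, and asks whether the sphere reaches the edge (d < R). The function name and the labels it returns are hypothetical.

```python
import math


def symmetric_wedge_drop(alpha, gamma):
    """Qualitative configuration of a spherical drop in a wedge of
    half-angle alpha with equal contact angle gamma on both faces."""
    if not 0 < alpha < math.pi / 2 or not 0 < gamma < math.pi:
        raise ValueError("expected 0 < alpha < pi/2 and 0 < gamma < pi")
    # Ratio d / R of center-to-edge distance to sphere radius.
    ratio = abs(math.cos(gamma)) / math.sin(alpha)
    if ratio < 1.0:
        return "edge blob"            # sphere cuts the edge line
    if gamma > math.pi / 2:
        return "bridge"               # sphere meets both faces, not the edge
    return "no spherical drop"        # wetting side: liquid spreads along the edge


alpha = math.radians(25)
for gamma_deg in (50, 70, 120):
    print(gamma_deg, symmetric_wedge_drop(alpha, math.radians(gamma_deg)))
```

The condition ratio < 1 is equivalent to $|\gamma - \pi/2| < \alpha$, which on the diagonal of the data square is the condition for lying inside the rectangle R.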


Stability Questions 


A number of authors, for example, Langbein, Vogel, 
Finn and Vogel, Steen, and Zhou, have studied the 


Figure 20 (A) Drop configurations in wedge with opening
angle $2\alpha$ = 50°, for three data positions on the line $\gamma_1 = \gamma_2 = \gamma$:
(a) $\gamma$ = 70° (in R, near $D_1^{\pm}$); (b) $\gamma$ = 90° (in R, near $D_1^{\pm}$); (c)
$\gamma$ = 110° (in $D_1^{\pm}$). The first two cases yield edge blobs, the third a
spherical tube that does not contact the edge line. (B) Drop
configurations in a wedge of opening angle $2\alpha$ = 50°, for three
data choices in R, on the line $\gamma_1 = \pi - \gamma_2 = \gamma$: (a) $\gamma$ = 70° (near
$D_2^{\pm}$); (b) $\gamma$ = 90° (center of R); (c) $\gamma$ = 35° (near $D_2^{\pm}$). As $D_2^{\pm}$ is
entered, original boundary conditions can no longer be satisfied
by spherical drop, but configuration changes smoothly into drop
on single plane, with prescribed data for that plane. Reproduced
with permission from Concus P, Finn R and McCuan J (2001)
Liquid bridges, edge blobs, and Scherk-type capillary surfaces.
Indiana University Mathematics Journal 50: 411-441.


stability of liquid drops trapped between parallel 
plates, forming an annular liquid bridge joining the 
plates under the capillarity boundary condition of 
prescribed contact angles $\gamma_1, \gamma_2$ on the respective
plates. These studies consider the effects of dis- 
turbances within the fluid, assuming the plates are 
rigid and perfectly parallel. CFM show that from the 
point of view of physical prediction, the results of 
these studies may be open to some question. 
Specifically, they show that unless the drop is 
initially of spherical form, then infinitesimal tilting 
of one of the plates always results in a discontinuous 
transition of the drop form. Depending on the 
particular data, the transition can be to a spherical 
drop; however, it can also occur that the tilting 


causes the entire fluid to disappear to infinity in the 
wedge. 

CFM proved that if a connected liquid mass with 
spherical outer surface S cuts off areas $|W_1|$, $|W_2|$
from plates $\Pi_1$, $\Pi_2$ which it meets in angles $\gamma_1, \gamma_2$, as
in Figure 20, then 


$$-\sum_{j=1}^{2} |W_j| \cos\gamma_j + |S| = \frac{3|V|}{R} \tag{30}$$


where |S| denotes area of the spherical free surface 
interface, |V| the enclosed volume, and R the radius. 
An immediate consequence is that the mechanical 
energy E of the configuration is 


$$E = \frac{3\sigma|V|}{R} \tag{31}$$


where $\sigma$ is surface tension. Using this result, they
show that if a spherical liquid mass meets two
wedge faces in angles $\gamma_1, \gamma_2$ in the absence of
gravity, then the configuration has smaller mechanical
energy than does any connected liquid mass of
the same volume that meets only one of the faces in
the contact angle for that face. In turn, the drop on a
single face has smaller energy than does a spherical
ball of the same volume that meets no face. Note
that in all zero-gravity cases for which stability
relative to plate tilting can be expected, the liquid
mass must be spherical.
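The area-volume identity and the energy comparison can be checked in the simplest case of a spherical cap resting on a single plane. The sketch assumes the identity in the form $|S| - |W| \cos\gamma = 3|V|/R$ for one wetted face and the energy $E = \sigma(|S| - |W| \cos\gamma)$; the cap formulas are standard geometry, and the function names and sample values are hypothetical.

```python
import math


def cap_geometry(R, gamma):
    """Free area, wetted area and volume of a spherical cap of radius R
    meeting a plane at contact angle gamma (radians)."""
    c = math.cos(gamma)
    free_area = 2.0 * math.pi * R * R * (1.0 - c)
    wetted_area = math.pi * R * R * (1.0 - c * c)
    volume = math.pi * R ** 3 * (1.0 - c) ** 2 * (2.0 + c) / 3.0
    return free_area, wetted_area, volume


def radius_for_volume(volume, gamma):
    """Cap radius giving the prescribed volume at contact angle gamma."""
    c = math.cos(gamma)
    return (3.0 * volume / (math.pi * (1.0 - c) ** 2 * (2.0 + c))) ** (1.0 / 3.0)


def energy(volume, gamma, sigma):
    """Mechanical energy sigma * (|S| - |W| cos(gamma)) of the cap."""
    R = radius_for_volume(volume, gamma)
    S, W, _ = cap_geometry(R, gamma)
    return sigma * (S - W * math.cos(gamma))


# Check the identity |S| - |W| cos(gamma) = 3 |V| / R for one sample cap.
R, gamma = 2.0, math.radians(60)
S, W, V = cap_geometry(R, gamma)
print(S - W * math.cos(gamma), 3.0 * V / R)

# Same volume: drop on a plane versus a free ball (gamma = pi means no wetting).
sigma, volume = 0.072, 1.0e-6
print(energy(volume, math.radians(60), sigma), energy(volume, math.pi, sigma))
```

At fixed volume the cap on the plane has the larger radius and hence the smaller energy, consistent with the ordering stated above.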


Compressibility 


Until very recently, all literature on capillarity was 
based on a hypothesis that the body of the fluid 
is incompressible. Indeed, from the point of view 
of macroscopic mechanical measurements, most 
liquids are nearly incompressible. But all liquids are 
also to some extent compressible, and this property 
was even conceptually essential in our characteriza- 
tion in the section “Gauss’ contribution: the energy 
method” of the surface energy, even for the nomin- 
ally incompressible case. It is as yet unclear to what 
extent the compressibility properties of the bulk 
liquid will influence the physical predictions of the 
theory. In this connection, see the remarks at the end 
of the section “Uniqueness and nonuniqueness.” 


The Equations I 


Finn derived two possible equations extending [16] 
and [17], arising from different modelings. Both 
characterize equilibrium points as stationary points 
for the mechanical energy, and both are based on a 
hypothesized pressure-density relation $\rho = \rho_0 + \chi(p - p_0)$. 
The first equation takes account of 
the change in density with height, arising from 
the gravity field. For a container consisting of a 
semi-infinite vertical cylinder, closed at the bottom, 
one obtains 


div Tu = Eu yg(1 — cos w) +A [32] 
where w is the angle between the upward directed 
surface normal and the vertical axis, and A is to be 
determined by a volume constraint. Athanassenas 
and Finn proved that for a general smooth domain 
$\Omega$, prescribed $\gamma$, and prescribed fluid mass $M$ subject 
to the restriction 

\[ M < \rho_0|\Omega|/\chi g \qquad [33] \]

there exists exactly one solution of [32] achieving 
the boundary data $\gamma$. 

The condition [33] is necessary for existence with 
the prescribed mass. 

The methods used for this theorem do not permit 
regularity conditions to be relaxed to allow domains 
with corner points. An approximation procedure 
yields an existence theorem for such cases, however 
the uniqueness proof then fails; it can be replaced by 
a weaker result, estimating the difference between 
two eventual solutions: Let u, v, be solutions of [32] 
in a piecewise smooth domain Q, and suppose v- 
Tu <v-Tv on S=0OQ except at the corner points, 
where no data are prescribed. Then 


u<v+xoa/po [34] 


throughout Q. 

Note that in this result, no growth condition is 
imposed at the corner points. It can happen that 
both u and v are unbounded at a corner point; 
nevertheless, [34] holds uniformly over Q. 

The solutions of [32] emulate many of the 
characteristics of solutions of [16]. Notably, there is 
again a dichotomy of behavior, depending on open- 
ing angle 2a at a corner point, with all solutions 
either bounded, or unbounded with growth like 1/r. 


The Equations II 


If in addition to taking account of the change of density 
with height, one accounts for the energy change due to 
expansion or contraction of volume elements with 
changing density, one is led to the equation 


div Tu — 20 —XPo (eX — 1) 
Ox 


+ xg(1 — cosw) + A [35] 


Here the changes from the incompressible case are 
much more significant than for [32]. In order to 
ensure stable behavior of solutions, it seems appropriate 
to impose the condition $\rho_0 > \chi p_0$. The general 
existence theorem above can no longer be expected; 
it is possible to give explicit examples of analytic 
domains, and constant data $\gamma$, for which no solution 
of the problem exists. Thus, even in a large downward 
gravity field, the solutions can emulate the 
behavior of solutions of [18]. That can happen, 
however, only for data $\gamma$ exceeding $\pi/2$. The 
condition [33] is again necessary for existence. 

For eqn [34], A cannot be eliminated by addition 
of a constant to the solution, and its determination 
creates a new level of difficulty toward solution of 
the physical existence question. Athanassenas and 
Finn proved unique existence of solutions of [35], 
[17] for a capillary tube of general smooth section $\Omega$ 
dipped into an infinite liquid bath (which corresponds 
to A = 0), when $0 < \gamma < \pi/2$. If $\gamma > \pi/2$ then 
solutions do not always exist; it can happen that the 
surface moves down to the bottom of the tube, 
regardless of the depth of immersion. Under a 
hypothesis of radial symmetry, Finn and Luli were 
able to prove the existence of solutions with 
prescribed mass in a semi-infinite cylinder closed at 
the bottom, in the range $0 < \gamma < \pi$, and uniqueness 
if $0 < \gamma < \pi/2$. Note that in this case, values 
$\gamma > \pi/2$ are not excluded. For large enough mass, the 
surface will always cover the base of the tube. 


Closing Remarks 


This brief survey is intended only as a general 
indication of the current state of the theory; much 
material of interest could not be included. Nor have 
we addressed hysteresis effects on contact angle. 
Detailed references to the material discussed and also 
to further information can be found in the articles 
listed below. More recent publications can be located 
by following links in MathSciNet or Zentralblatt. 
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Burgers Type Equations 


We consider here two types of equations: the scalar 
partial differential equations (PDEs) of the form 

\[ \frac{\partial f}{\partial t} + \varphi(f)\frac{\partial f}{\partial x} = \varepsilon\frac{\partial^2 f}{\partial x^2}, \quad \varepsilon > 0 \qquad [1] \]

$f = f(x,t)$, $x \in \mathbb{R}$, $t \in \mathbb{R}_+$, and the scalar 
difference-differential equations of the form 

\[ \frac{\partial F}{\partial t} + \varphi(F)\,\frac{F(x,t) - F(x-\varepsilon,t)}{\varepsilon} = 0, \quad \varepsilon > 0 \qquad [2] \]

$F = F(x,t)$, $x \in \mathbb{R}$, $t \in \mathbb{R}_+$. 

Equation [1] for the case of linear $\varphi(f)$ 
was called the Burgers equation by Hopf (1950), 
who justified this by the remark: “equation was 
first introduced by J. M. Burgers (1940) as a simplest 
model to the differential equations of fluid flow”. In 
fact, eqn [1] for linear $\varphi(f)$ was introduced earlier, in 
1915, by Bateman. Equation [1] for general $\varphi(f)$ 
appeared later in very different models, for example, 
in the model for displacement of oil by water, in a 
model of road traffic, etc. 

For y(f) =a + b- f, Hopf and Cole have studied 
[1] basing on the substitution 


reducing [1] to the heat equation 


dy _ Og 
Ot  — Ox? 


This transformation (often called the Hopf-Cole 
transform) appeared for the first time in 1906 
in the book of Forsyth “Theory of differential 
equations.” 
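The reduction to the heat equation can be checked numerically. The substitution itself is not legible in the text above, so the sketch below assumes the standard form $a + b f = -2\varepsilon\,\psi_x/\psi$ with $\psi_t = \varepsilon\psi_{xx}$; the symbol $\psi$, the particular heat solution, and all function names are illustrative assumptions, not taken from the article.

```python
import math

def hopf_cole_f(x, t, a, b, eps, k):
    """f built from psi = 1 + exp(k*x + eps*k**2*t), a positive heat solution.

    Assumed substitution (hypothetical form): a + b*f = -2*eps*psi_x/psi.
    """
    theta = k * x + eps * k * k * t
    ratio = k / (1.0 + math.exp(-theta))  # psi_x / psi for this psi
    return (-2.0 * eps * ratio - a) / b

def burgers_residual(x, t, a, b, eps, k, h=1e-3):
    """Central-difference residual of f_t + (a + b*f)*f_x - eps*f_xx."""
    def f(xx, tt):
        return hopf_cole_f(xx, tt, a, b, eps, k)
    f0 = f(x, t)
    f_t = (f(x, t + h) - f(x, t - h)) / (2 * h)
    f_x = (f(x + h, t) - f(x - h, t)) / (2 * h)
    f_xx = (f(x + h, t) - 2 * f0 + f(x - h, t)) / (h * h)
    return f_t + (a + b * f0) * f_x - eps * f_xx

if __name__ == "__main__":
    for x in (-2.0, 0.0, 1.5):
        print(x, burgers_residual(x, 0.7, a=1.0, b=-2.0, eps=0.5, k=1.3))
```

The residual stays at the level of the finite-difference error, which is what one expects if the assumed substitution maps heat solutions to solutions of [1] with linear $\varphi$.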


Equation [2] first appeared for $\varphi(F) = a + bF$, 
$\varepsilon = 1$, $x = n \in \mathbb{Z}$, in Levi, Ragnisco, Bruschi (1983) as 
a semidiscrete equation reducible to a linear 
equation for $G_n(t)$ by the substitution 

\[ F(n,t) = -\frac{a}{b}\,\frac{G_n(t) - G_{n-1}(t)}{G_n(t)} \]
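A small numerical check of this linearization may help. The linear equation is not legible in the text above; the sketch assumes $dG_n/dt = a\,(G_{n-1} - G_n)$ together with the substitution $a + bF(n,t) = a\,G_{n-1}(t)/G_n(t)$, which is one choice for which the reduction works. The function names and the particular solution $G_n$ are hypothetical.

```python
import math

def G(n, t, a, lam):
    """A positive solution of the assumed linear equation dG_n/dt = a*(G_{n-1} - G_n)."""
    return 1.0 + lam ** n * math.exp(a * (1.0 / lam - 1.0) * t)

def F(n, t, a, b, lam):
    """Assumed substitution: F = -(a/b)*(G_n - G_{n-1})/G_n."""
    g_n = G(n, t, a, lam)
    g_prev = G(n - 1, t, a, lam)
    return -(a / b) * (g_n - g_prev) / g_n

def residual(n, t, a, b, lam, h=1e-4):
    """dF_n/dt + (a + b*F_n)*(F_n - F_{n-1}), i.e. eqn [2] with eps = 1, x = n."""
    dF = (F(n, t + h, a, b, lam) - F(n, t - h, a, b, lam)) / (2 * h)
    f_n = F(n, t, a, b, lam)
    f_prev = F(n - 1, t, a, b, lam)
    return dF + (a + b * f_n) * (f_n - f_prev)

if __name__ == "__main__":
    for n in range(-2, 4):
        print(n, residual(n, 0.3, a=1.0, b=-0.5, lam=0.5))
```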


Equation [2] for general $\varphi(F)$ was introduced by 
Henkin, Polterovich (1991) for the description of a 
Schumpeterian evolution of industry. For any $\varepsilon > 0$, 
one can consider [2] as the family of difference-differential 
equations, depending on the parameter 
$\theta = \{x/\varepsilon\} \in [0, 1)$, where $\{x/\varepsilon\}$ denotes the fractional 
part of $x/\varepsilon$. For physical applications of [1] 
(see Gelfand (1959), Landau and Lifschitz (1968), 
Lax (1973)), the inviscid case ($\varepsilon = +0$) is the most 
interesting. But, for some special physical models 
and for some social and biological applications (see 
Henkin, Polterovich (1991), Serre (1999)), the 
interesting case concerns eqn [2] with $\varepsilon = 1$ and 
$x \in \mathbb{Z}$. 

The results considered in this article concern 
mainly the Cauchy problem for eqns [1] and [2] 
with initial data $f(x,0)$, $F(x,0)$ satisfying the 
conditions 

\[ f(x,0) \to \alpha^{\pm}, \quad x \to \pm\infty; \qquad \int_{-\infty}^{0} |f(x,0) - \alpha^-|\,dx + \int_{0}^{\infty} |\alpha^+ - f(x,0)|\,dx < \infty \qquad [3] \]

and correspondingly 

\[ F(k\varepsilon + \theta\varepsilon, 0) \to \alpha^{\pm}, \quad k \to \pm\infty; \qquad \sum_{k=-\infty}^{0} |F(k\varepsilon + \theta\varepsilon, 0) - \alpha^-| + \sum_{k=0}^{\infty} |\alpha^+ - F(k\varepsilon + \theta\varepsilon, 0)| < \infty \qquad [4] \]

where $\alpha^- < \alpha^+$, $\theta \in [0,1)$, and the mapping 
$\theta \mapsto \{F(k\varepsilon + \theta\varepsilon, 0) - \alpha^{\pm},\ k \in \mathbb{Z}_{\pm}\} \in l^1$ is smooth. 

The standard classical questions concerning 
Cauchy problems [1], [3] and [2], [4], namely 
those relating to existence, uniqueness, regularity, and 
conservation laws, are well established (see Oleinik 
(1959), and Serre (1999)). This section formulates 
only those which are essential for the study 
of asymptotic behavior of solutions $f(x,t)$ and 
$F(x,t)$, when $t \to \infty$ or $\varepsilon \to 0$, and of the relation 
between vanishing viscosity and difference scheme 
approximations for inviscid Burgers type 
equations. 

One can see that the asymptotic behavior of solutions 
of [2], [4] when $\varepsilon \to +0$ is not the same as the 
asymptotic behavior of [1], [3] when $\varepsilon \to +0$, in 
spite of the fact that in the limiting case $\varepsilon = +0$ both [1] 
and [2] look identical. This can be explained by the 
fact that eqn [2] can be interpreted as a semidiscrete 
approximation of the nonconservative (nonphysical) 
equation 

\[ \frac{\partial F}{\partial t} + \varphi(F)\frac{\partial F}{\partial x} = \frac{\varepsilon}{2}\,\varphi(F)\frac{\partial^2 F}{\partial x^2} \]

However, the problem [2], [4] can be naturally 
transformed into a conservative (physical) initial problem. 
Indeed, the substitution 

\[ f = \int_0^F \frac{dy}{\varphi(y)} \]

(under the condition of integrability of $1/\varphi(y)$) transforms 
[2] into the equation 

\[ \frac{\partial f(x,t)}{\partial t} + \frac{\psi(f(x,t)) - \psi(f(x-\varepsilon,t))}{\varepsilon} = 0 \qquad [5] \]


where w’(f)=y(F). Equation [5] is the so-called 
monotone one-sided semidiscrete approximation of 
conservative viscous equation, 


trent- e 





[5] 


where 





x — Ło 


* a 
f(x,0) > / P 


The results of finite-difference approximations 
for nonlinear conservation laws (see A. Harten, 
J. Hyman, P. Lax (1976)) explain both the similarity 
of behavior of [6] and [5] as well as some difference 
in the behavior of [1] and [2]. 

For further exposition the following assumption is 
useful: 


Assumption 1 Let $\varphi$ in [1], [2] be a positive and 
continuously differentiable function on the interval 
$[\alpha^-, \alpha^+]$. Let $\varphi'$ have only isolated zeros. 


From references one can deduce the following gene- 
ral properties of Cauchy problems [1], [3] and [2], [4]. 


Theorem 0 Under Assumption 1, we have: 


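(i) There exists a unique (weak) solution $f(x,t)$, $x \in \mathbb{R}$, 
$t \in \mathbb{R}_+$, of the problem [1], [3]; this solution is 
necessarily smooth for $t > 0$; besides, it satisfies 
the following conservation laws for $t > 0$: 

\[ f(x,t) \to \alpha^-, \quad x \to -\infty; \qquad f(x,t) \to \alpha^+, \quad x \to +\infty \]

\[ \frac{d}{dt}\left[\int_0^{\infty} (\alpha^+ - f(x,t))\,dx - \int_{-\infty}^{0} (f(x,t) - \alpha^-)\,dx\right] = \int_{\alpha^-}^{\alpha^+} \varphi(y)\,dy \]
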


Moreover, if the initial value f(x,0) is nonde- 
creasing as a function of x, then solution f(x, t) 
is nondecreasing as a function of x for all t > 0. 
(ii) There exists a unique solution F(x,t) x € R, t € 
R+ of the problem [2], [4]; this solution is 
smooth for t > 0; besides, it satisfies the follow- 
ing conservation laws for t > 0 and @ € [0, 1): 


k — =œ 


k = +00 


F(ke + 0e,t) > a, 
F(ke + be,t) > a”, 


d S> [ dy 3 fo dy 
dt p—1 Y F(ke+8e,t) p(y) EE p(y) 
=at- a 
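Moreover, if for some $\theta \in [0, 1)$ the initial value $F(k\varepsilon + \theta\varepsilon, 0)$ is 
nondecreasing as a function of $k \in \mathbb{Z}$, then the solution 
$F(k\varepsilon + \theta\varepsilon, t)$ is also nondecreasing as a function of 
$k \in \mathbb{Z}$ for all $t > 0$ and the same $\theta$. 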


Gelfand’s Problem and Iljin-—Oleinik 
Theorem 


The main results considered in this article are related 
to the following problem, formulated explicitly by 
Gelfand (1959): to find the asymptotics ($t \to \infty$) of the 
solution $f(x,t)$ of eqn [1] with the initial condition 

\[ f(x,0) = \begin{cases} \alpha^{\pm}, & \text{if } \pm x > \pm x^{\pm} \\ f^0(x), & \text{if } x \in [x^-, x^+] \end{cases} \qquad [7] \]

where $\alpha^- < \alpha^+$. 

Gelfand found a solution to this problem for the 
inviscid case $\varepsilon = +0$ with initial conditions 
$f(x,0) = \alpha^-$ if $x < 0$, and $f(x,0) = \alpha^+$ if $x > 0$ (see 
below), and remarked that it would be interesting to 
prove that the main term of the asymptotics ($t \to \infty$) 
of $f(x,t)$ satisfying [1], [7] coincides with the 
solution of [1], [7] for $\varepsilon = +0$. 
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Gelfand’s problem admits natural extension for 
eqn [2] with the initial conditions 


F(x,0) = a* if +x > +x* 


F(x, 0) = F°(x), [8] 


if x € [x a 
, at], the a 
_ fie (y)dy. Let the W úlu), u 


a at], be sie bound of the convex set 


{(u,v):v <u), u € [a ,a"]} 


By Assumption 1, the set s={u ela, at]: 
w(u) <d(u)} is the finite union of intervals, 
s=(a~, bo) U (a1, 61) U =- (ar, a"), where a~ =q < 
O ET E O 

Let us define the function f(x, t) by 


a. us introduce, for u € [a7 


ifx< Ola) =t 


f (x,t) = ate (x/t), if p(av)-t<x<y(at)-t 


txpl ar 


where in the case bu j= AA Psl 
1,..., L; also, by definition, TAa (¢)=[ay, Al. 


Theorem 1 (Gelfand) The solution $f(x,t)$ of the 
problem [1], [7] for the case $\varepsilon = +0$ and initial 
conditions $f(x,0) = \alpha^{\pm}$, if $\pm x > 0$, has the explicit 
form $f(x,t) = \hat f(x,t)$. 


The analogous statement is valid also for the 
problem [2], [8] if, in the construction above, one 


takes 
V(u) = — 
"i I p(y) 


instead of y(u), u € [a7, at]. 

The Gelfand problem for [1], [3] and [1], [7] with 
monotonic $\varphi(f)$ was solved by Iljin and Oleinik 
(1960). In the case $\alpha^- = \alpha^+$, the solution of this 
problem follows from an earlier work of Lax (1957). 
For the case of linear $\varphi(f)$, the solution of this problem 
follows from an earlier work of Hopf (1950). 

For semidiscrete initial problems [2], [4] and [2], 
[8], the analog of the asymptotic results of Hopf and 
Iljin—Oleinik have been obtained and applied by 
Henkin and Polterovich (1991). 

The case of increasing $\varphi(f)$ has been studied in 
detail. In this case, for both initial problems [1], [3] 
and [2], [4], there is uniform convergence of solutions 
$f(x,t)$ and $F(x,t)$ to the so-called rarefaction profile 

\[ g(x/t) = \varphi^{(-1)}(x/t), \quad x \in [\varphi(\alpha^-)t,\ \varphi(\alpha^+)t], \quad t \to \infty \]

(see Iljin and Oleinik (1960) and Henkin 
and Polterovich (1991)). A more precise result in 
this case, about convergence to the so-called 
N-wave, has been obtained by Dafermos (1977) 
and Liu (1978). 

For the case of a general $\varphi(f)$, in particular, for 
the case of nonincreasing $\varphi(f)$, we need the notion 
of shock profile. Following Serre (1999), three 
definitions can be introduced. 


Definition The initial problem [1], [3] (correspondingly, 
[2], [4]) admits $(\alpha^-, \alpha^+)$-shock profile ($\alpha^- < \alpha^+$) 
if there exists a traveling-wave solution of this equation, 
that is, of the form $f = \tilde f(x - ct)$ (correspondingly, 
$F = \tilde F(x - Ct)$), such that $\tilde f(x) \to \alpha^{\pm}$ when $x \to \pm\infty$ 
(correspondingly, $\tilde F(x) \to \alpha^{\pm}$ when $x \to \pm\infty$). 


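From the results of Gelfand (1959) and Oleinik 
(1959), it follows that initial problem [1], [3] admits 
$(\alpha^-, \alpha^+)$-shock profile iff 

\[ c = \frac{1}{\alpha^+ - \alpha^-}\int_{\alpha^-}^{\alpha^+} \varphi(y)\,dy < \frac{1}{u - \alpha^-}\int_{\alpha^-}^{u} \varphi(y)\,dy, \quad \forall u \in (\alpha^-, \alpha^+) \qquad [9] \]

From the results of Henkin and Polterovich 
(1991) and Belenky (1990), it follows that initial 
problem [2], [4] admits $(\alpha^-, \alpha^+)$-shock profile iff 

\[ \frac{1}{C} = \frac{1}{\alpha^+ - \alpha^-}\int_{\alpha^-}^{\alpha^+} \frac{dy}{\varphi(y)} > \frac{1}{u - \alpha^-}\int_{\alpha^-}^{u} \frac{dy}{\varphi(y)}, \quad \forall u \in (\alpha^-, \alpha^+) \qquad [10] \]

In the case $\varepsilon = +0$, the equality in [9] and [10] is 
called the Rankine-Hugoniot condition, the inequality 
in [9] and [10] is called the entropy condition (or 
the Gelfand-Oleinik condition). 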
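As an illustration of condition [9], the following sketch computes the Rankine-Hugoniot speed $c$ by quadrature and tests the entropy inequality on a grid of values $u$. It assumes [9] is read as the strict inequality $c < \frac{1}{u-\alpha^-}\int_{\alpha^-}^{u}\varphi(y)\,dy$; the function names, the Simpson rule, and the sample functions $\varphi$ are illustrative choices, not part of the article.

```python
def simpson(g, lo, hi, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0

def shock_speed(phi, a_minus, a_plus):
    """Rankine-Hugoniot speed: mean value of phi over [a_minus, a_plus]."""
    return simpson(phi, a_minus, a_plus) / (a_plus - a_minus)

def admits_shock_profile(phi, a_minus, a_plus, samples=200):
    """Grid test of the entropy inequality in [9] (assumed strict)."""
    c = shock_speed(phi, a_minus, a_plus)
    for j in range(1, samples):
        u = a_minus + (a_plus - a_minus) * j / samples
        mean_to_u = simpson(phi, a_minus, u) / (u - a_minus)
        if mean_to_u <= c:
            return False
    return True

if __name__ == "__main__":
    decreasing = lambda u: 2.0 - u   # positive, nonincreasing on [0, 1]
    increasing = lambda u: 1.0 + u   # positive, increasing on [0, 1]
    print(shock_speed(decreasing, 0.0, 1.0))
    print(admits_shock_profile(decreasing, 0.0, 1.0))
    print(admits_shock_profile(increasing, 0.0, 1.0))
```

The decreasing sample passes the test and the increasing one fails it, in line with the statements that nonincreasing $\varphi$ leads to shock profiles while increasing $\varphi$ leads to rarefaction.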





Definition For initial problem [1], [3] (correspondingly, 
[2], [4]) admitting $(\alpha^-, \alpha^+)$-shock profile and 
for $\varepsilon = +0$, we will call shock waves the weak 
solutions of [1], [3] (correspondingly, [2], [5], [4]) of 
the form 

\[ f^0(x - ct) = \alpha^{\pm}, \quad \text{if } \pm x > \pm ct \]

\[ F^0(x - Ct) = \alpha^{\pm}, \quad \text{if } \pm x > \pm Ct \]

where $c$, $C$ satisfy Rankine-Hugoniot and entropy 
conditions [9], [10]. 


Definition The $(\alpha^-, \alpha^+)$-shock profile for [1] (correspondingly, 
for [2]) is called strict if in addition to 
[9], [10] we have the Lax (1954) condition: 

\[ \varphi(\alpha^+) < c < \varphi(\alpha^-) \qquad [11] \]

and correspondingly 

\[ \varphi(\alpha^+) < C < \varphi(\alpha^-) \qquad [12] \]

The $(\alpha^-, \alpha^+)$-shock profile for [1] or [2] is called 
semicharacteristic if one of the inequalities in [11] or 
[12] is strict and the other is an equality. This profile 
is called characteristic if both inequalities in [11] or 
[12] are equalities. 


One can check (Iljin and Oleinik 1960, Henkin and 
Polterovich 1991) that if in addition to Assumption 1 
the function $\varphi$ on $[\alpha^-, \alpha^+]$ is nonconstant and 
nonincreasing then eqn [1] (correspondingly, [2]) 
admits a strict $(\alpha^-, \alpha^+)$-shock profile. 

The main result of Iljin-Oleinik (1960) for eqn [1] 
and the analogous statement of Henkin and Polterovich 
(1991) for eqn [2] can be presented as follows. 


Theorem 2 

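(i) Let the initial problem [1], [3] admit a strict 
$(\alpha^-, \alpha^+)$-shock profile $\tilde f$. Let $f(x,t)$, $x \in \mathbb{R}$, 
$t \in \mathbb{R}_+$, be a solution of [1], [3]. Then there exists 
$d_0 \in \mathbb{R}$ such that 

\[ \sup_{x \in \mathbb{R}} |f(x,t) - \tilde f(x - ct - d_0)| \to 0, \quad t \to \infty \qquad [13] \]

The value of $d_0$ is determined uniquely by the relation 

\[ \int_{-\infty}^{\infty} \{f(x,0) - \tilde f(x - d_0)\}\,dx = 0 \]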


(ii) Let the initial problem |2], [4] admit a strict 
(a7, a*)-shock profile F. Let F(x, t) x € R, t € 
R+ be a solution of [2], [4]. Then there exists 
continuous function D (0), 6 € [0, 1), such that 


~ 


ey =e = Re 


t — œ 


The function Dọo(0), 0 € [0, 1], is determined 
uniquely from relation 


3 {®(F(n, 0)) — &(F(n — Do))} = 0 


k=—co 


where 


A 
F)= | ay. fed. A 
r p(y) 


xr 


If in conditions (i) and (ii), we take € = +0 then 
there exist do, Do such that Y > 0, we have 


(iii 


sup |a" — f(x, t)| 


x>ct+do+6é 
+ sup ja —f(x,t)| +90, t= 00 
sek 15] 
sup ja’ — F(x,t)| 
x>Ct+Do+6 
+ sup ja —F(x,t)| -~0, t—oo 
x<Ci+Do—6 
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The values of dọ and Do are determined by 


m (f(x,0) — a)dx+ f (f(x,0) _6*\de=0 


CoO do 


[eo —a)dx + [ eo) za jdy=0 


Oo Do 
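For the inviscid case, the relation fixing $d_0$ can be made explicit. The sketch below assumes the relation reads $\int_{-\infty}^{d_0}(f(x,0)-\alpha^-)\,dx + \int_{d_0}^{\infty}(f(x,0)-\alpha^+)\,dx = 0$ (the displayed formula is only partly legible in the source) and that the initial datum equals $\alpha^-$ to the left of a window and $\alpha^+$ to the right of it, as in [7]. Since the left-hand side is linear in $d_0$ with slope $\alpha^+ - \alpha^-$, the shift is $d_0 = x_{lo} + \frac{1}{\alpha^+-\alpha^-}\int_{x_{lo}}^{x_{hi}}(\alpha^+ - f(x,0))\,dx$. All names are hypothetical.

```python
def shock_shift(f0, alpha_minus, alpha_plus, x_lo, x_hi, n=2000):
    """Shift d0 of the limiting shock position under the assumed mass-balance relation.

    Assumes f0(x) == alpha_minus for x < x_lo and f0(x) == alpha_plus for x > x_hi,
    so only the window [x_lo, x_hi] contributes to the integrals.
    """
    h = (x_hi - x_lo) / n
    deficit = 0.0
    for i in range(n):
        x_mid = x_lo + (i + 0.5) * h          # midpoint rule
        deficit += (alpha_plus - f0(x_mid)) * h
    return x_lo + deficit / (alpha_plus - alpha_minus)

if __name__ == "__main__":
    step = lambda x: 0.0 if x < 0.5 else 1.0               # jump located at x = 0.5
    ramp = lambda x: min(1.0, max(0.0, 0.5 * (x + 1.0)))   # linear ramp on [-1, 1]
    print(shock_shift(step, 0.0, 1.0, -2.0, 2.0))
    print(shock_shift(ramp, 0.0, 1.0, -2.0, 2.0))
```

For a pure step the computed shift is the jump location, and for a symmetric ramp it is the center of the ramp, which matches the mass-balance interpretation of $d_0$.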


Remarks 


(i) The statements of Theorem 2 give a positive 
answer to Gelfand’s question for the case of 
initial problem [1], [3] and [2], [4], admitting 
strict shock profiles. 

(ii) For linear y(f)=a + bf, a> 0, a+ bat > 0, 
b < 0, the traveling waves f, F for [1], [3] and 
[2], [4] can be found explicitly: 


f= =i Q >Q 
"T+ exp{—p(x — ct)} 
_ Pe Ae E =a" )2 
c=at z(a +a‘), pan o 
at — a` 


EEEa a 


-+ 
C=b]/ n , P= "In 
a + ba- E 


a + ba” 
a + bat 











b= fat bary(1~ a 


For initial problems [1], [7] and [2], [8], a* > 
a, the asymptotic convergence statements 
[13]-[15] admit the precise asymptotic esti- 
mates (see Iljin and Oleinik (1960) for [1], [7]: 


xr 


(iii 


~ 


sup [f(x,t) — f(x — et — do)| = O (e=) 


y >0,£>0 


[16] 


~ 


a Fest) a a *) 


y>0,£>0 [17] 


f(x, t) =a" for 
fot; gs4 
F(x,t)=a* for 
t > tọ, € = +0 


+x >+(ct+ do) 


|18] 
EX > +(Ct = Do) 


Theorem 2(i) is proved basing on the following 
idea. Let f satisfy the initial problem [1], [3] and let 
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f(x — ct + do) be (a7, at)-shock profile for [1], 
satisfying condition [13]. Put 


S(t) = | {f0.2)-7 


The function 6(x, t) satisfies the nonlinear parabolic 
equation 


— ct — do)}dy 


oô 06 O76 
a t+ vluf + (1 — Ka = Eo} 


where «K(x, t) is some smooth function of (x, t) with 
values in [0, 1]. 

Besides, by conservation law of Theorem O(i), we 
have 6(x, t) — 0, x — oo, Vt > 0. 

Estimates basing on maximum principle and 
appropriate comparison statements give that 
d(x, t) > 0, x € R, t > œ. It implies that 


f (x,t) 


Theorem 2(ii) is proved in a similar way. Let F(x, 
t) satisfy the initial problem [2], [4] with x=n € 
Z,, €=1, 0={x}=0, and let F(n — Ct — Do) be 
(a~, a*)-shock profile for [2], satisfying condition 
[14]. Put 


—f(x—ct-—dy) +0, xER, tx 


= Hr 


Then function A(n, t) satisfies the semidiscrete 

parabolic equation 

dA(n, t) 
dt 


@(F(n — Ct — Do))} 





=(P" (K0(F) 
+ (1 — «)®(F)))(A(# — 1,2) — A(n, t)) 
where «x(n, t) is some function with values in [0, 1]. 
Besides, by conservation law of Theorem O(11), we 
have 


A(n,t) +0, n—-+co, Vt>0 


Estimates, basing on generalized maximum prin- 
ciple and comparison statements, give that 
A(n, t) > 0, n E€ Z, t — œ. It implies that 


F(n,t)— F(n— Ct + Do) > 0, neZ, t— 00 


Remark For the cases of nonstrict shock profiles 
(characteristic or semicharacteristic) the statements 
of Theorem 2 are not valid. The reason is that, 
under initial conditions [3], [4], for any $d_0$ and $D_0$, 
we have 

\[ \int_{-\infty}^{\infty} \{f(x,0) - \tilde f(x - d_0)\}\,dx = \infty \]

and, correspondingly, 

\[ \sum_{k=-\infty}^{\infty} \{\Phi(\tilde F(k\varepsilon + \theta\varepsilon - D_0)) - \Phi(F(k\varepsilon + \theta\varepsilon, 0))\} = \infty \]

So, the crucial argument, related to the conservation 
law, does not hold. 


One can extend the important Theorems 2(i), 2(ii) 
for the case of nonstrict shock profiles in two different 
ways: by changing conditions of these theorems or by 
changing conclusions of these theorems. 

The first method (started by Mei, Matsumura, and 
Nishihara in 1994) was completed by the following 
$L^1$-asymptotic stability result (Serre 2004). 


Theorem 3 (Freistühler-Serre). Let eqns [1], [2] 
admit $(\alpha^-, \alpha^+)$-shock profiles and let $\tilde f$, $\tilde F$ be the 
corresponding traveling-wave solutions of [1], [2]. Let 
$f(x,t)$, $F(n,t)$, $x \in \mathbb{R}$, $n \in \mathbb{Z}$, $t \in \mathbb{R}_+$, be solutions of 
eqns [1], [2] with such initial conditions that 

\[ \int_{-\infty}^{\infty} |f(x,0) - \tilde f(x)|\,dx < \infty \]

and, correspondingly, 

\[ \sum_{n=-\infty}^{\infty} |F(n,0) - \tilde F(n)| < \infty \]

Then 

\[ \int_{-\infty}^{\infty} |f(x,t) - \tilde f(x - ct - d_0)|\,dx \to 0, \qquad \sum_{n=-\infty}^{\infty} |F(n,t) - \tilde F(n - Ct - D_0)| \to 0, \quad t \to \infty \]

where constants $d_0$ and $D_0$ are calculated from the 
same relations as in Theorem 2. 


Remark For the inviscid case $\varepsilon = +0$, the statement 
of Theorem 3 is still valid for equations 
admitting strict shock profiles, but generally is not 
valid for equations admitting only nonstrict shock 
profiles (see Serre (2004)). 


The second method permits, keeping initial con- 
ditions [3], [4], to localize the positions of viscous 
shock waves for generalized Burgers equations 
(see the next section). 


Asymptotic Behavior of Solutions of 
Generalized Burgers Equations 


The main current interest and the main difficulty in 
the study of Gelfand’s problem for generalized 
Burgers equations consist in the following question 
formulated explicitly for initial problem [1], [3] by 
Liu et al. (1998): “In the Cauchy problem there is 
the question of determining the location of viscous 
shock waves”. A similar question and related 
conjecture were formulated by Henkin and Polterovich 
(1999) for the initial problem [2], [4]. 

For solving this problem, it is important to solve it 
first for the Burgers type equations admitting 
nonstrict shock profiles. 


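Theorem 4 (Henkin-Shananin-Tumanov). 

(i) Let the initial problem [1], [3] admit the nonstrict 
$(\alpha^-, \alpha^+)$-shock profile [9] and $\tilde f(x - ct)$ be a 
corresponding traveling-wave solution. Let 

\[ \varphi'(\alpha^-) \neq 0 \ \text{ if } \varphi(\alpha^-) = c; \qquad \varphi'(\alpha^+) \neq 0 \ \text{ if } \varphi(\alpha^+) = c \]

Let $f(x,t)$ be a solution of [1], [3]. Then there 
exist constants $\gamma_0$ and $d_0$ such that 

\[ \sup_{x \in \mathbb{R}} |f(x,t) - \tilde f(x - ct - \varepsilon\gamma_0 \ln t - d_0)| \to 0, \quad t \to \infty \]

where 

\[ (\alpha^+ - \alpha^-)\,\gamma_0 = \begin{cases} -1/\varphi'(\alpha^+), & \text{if } \varphi(\alpha^-) > c = \varphi(\alpha^+) \\ 1/\varphi'(\alpha^-), & \text{if } \varphi(\alpha^-) = c > \varphi(\alpha^+) \\ 1/\varphi'(\alpha^-) - 1/\varphi'(\alpha^+), & \text{if } \varphi(\alpha^-) = c = \varphi(\alpha^+) \end{cases} \]

(ii) Let the initial problem [2], [4] with $\varepsilon = 1$ admit the 
nonstrict $(\alpha^-, \alpha^+)$-shock profile [10] and $\tilde F(n - Ct)$ 
be a corresponding traveling-wave solution. Let 

\[ \varphi'(\alpha^-) \neq 0 \ \text{ if } \varphi(\alpha^-) = C; \qquad \varphi'(\alpha^+) \neq 0 \ \text{ if } \varphi(\alpha^+) = C \]

Let $F(n,t)$ be a solution of [2], [4]. Let 

\[ \Delta F(n,0) := F(n,0) - F(n-1,0) > 0 \]

Then there exist constants $\Gamma_0$ and $D_0$ such that 

\[ \sup_{n \in \mathbb{Z}} |F(n,t) - \tilde F(n - Ct - \Gamma_0 \ln t - D_0)| \to 0, \quad t \to \infty \]

where 

\[ (\alpha^+ - \alpha^-)\,\Gamma_0 = \begin{cases} -C/(2\varphi'(\alpha^+)), & \text{if } \varphi(\alpha^-) > C = \varphi(\alpha^+) \\ C/(2\varphi'(\alpha^-)), & \text{if } \varphi(\alpha^-) = C > \varphi(\alpha^+) \\ (C/2)\,[-1/\varphi'(\alpha^+) + 1/\varphi'(\alpha^-)], & \text{if } \varphi(\alpha^-) = C = \varphi(\alpha^+) \end{cases} \]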
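The case distinction for the logarithmic shift in Theorem 4(i) can be written out as a small helper. This is only a sketch of the formula as read from the partly garbled source, namely $(\alpha^+-\alpha^-)\gamma_0$ equal to $-1/\varphi'(\alpha^+)$, $1/\varphi'(\alpha^-)$, or their sum according to which end state is characteristic; the function names and the tolerance are assumptions.

```python
import math

def gamma0(phi_minus, phi_plus, dphi_minus, dphi_plus, c,
           alpha_minus, alpha_plus, tol=1e-12):
    """Coefficient gamma_0 of the ln t shift for a nonstrict shock profile of [1].

    phi_minus, phi_plus: values phi(alpha^-), phi(alpha^+);
    dphi_minus, dphi_plus: derivatives phi'(alpha^-), phi'(alpha^+).
    """
    left_char = math.isclose(phi_minus, c, abs_tol=tol)
    right_char = math.isclose(phi_plus, c, abs_tol=tol)
    if left_char and right_char:
        value = 1.0 / dphi_minus - 1.0 / dphi_plus
    elif right_char:
        value = -1.0 / dphi_plus
    elif left_char:
        value = 1.0 / dphi_minus
    else:
        raise ValueError("strict shock profile: no logarithmic shift in this formula")
    return value / (alpha_plus - alpha_minus)

def viscous_shock_position(t, c, eps, g0, d0):
    """Point x_eps(t) = c*t + eps*gamma_0*ln(t) + d0 near which the wave concentrates."""
    return c * t + eps * g0 * math.log(t) + d0

if __name__ == "__main__":
    g0 = gamma0(2.0, 1.0, -0.5, -4.0, c=1.0, alpha_minus=0.0, alpha_plus=1.0)
    print(g0, viscous_shock_position(10.0, 1.0, 0.1, g0, 0.0))
```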


Remarks 


(i) One could think that nonstrict shock profiles 
as in Theorem 4 can appear only in exceptional 
cases. But Proposition 2 and Theorem 5 below 


show, on the contrary, that characteristic shock 
profiles and, as a consequence, the behavior of 
initial problems [1], [3] and [2], [4] as in Theorem 
4 are rather a rule than an exception. 
(ii) The statement of Theorem 4(i) (and also of 
Theorem 5(i) below) disproves the Gelfand hope 
that the main term of the asymptotics ($t \to \infty$) of 
$f(x,t)$, satisfying [1], [7], coincides with the 
solution of [1], [7] for $\varepsilon = +0$ with the same 
initial condition. Indeed, in the conditions of Theorem 
4, we have $\varphi(\alpha^-) = c$ or $\varphi(\alpha^+) = c$, but 
$\varphi'(\alpha^-) \neq \varphi'(\alpha^+)$; then for any $\varepsilon > 0$ the traveling wave 
$\tilde f(x - ct - \varepsilon\gamma_0 \ln t - d_0)$ for [1], [3], concentrated 
near the point $x_\varepsilon(t) = ct + \varepsilon\gamma_0 \ln t + d_0$, moves 
away ($t \to \infty$) from the shock wave for [1], [7] for 
$\varepsilon = +0$, concentrated near the point $x_0(t) = ct + o(\ln t)$, 
where $o(\ln t)/\ln t \to 0$, $t \to \infty$. 
(iii) Theorem 4 (and also Theorem 5 below) also 
illustrates another interesting phenomenon: for 
the case $\varphi'(\alpha^-) \neq \varphi'(\alpha^+)$, one has asymptotic 
convergence of the solution of [1], [3] (correspondingly 
of [2], [4]) to the traveling 
wave $\tilde f(x - ct - \varepsilon\gamma_0 \ln t - d_0)$ (correspondingly 
$\tilde F(x - Ct - \Gamma_0 \ln t - D_0)$), which does not 
satisfy eqn [1] or correspondingly eqn [2]. Such 
a phenomenon was first discovered by Liu and 
Yu (1997) in the special boundary-value problem 
for the classical Burgers equation: if 
$u(x,t)$ satisfies the conditions 

\[ u_t + u\,u_x = u_{xx}, \quad u(0,t) = 1, \quad u(\infty,t) = -1, \quad u(x,0) = -\operatorname{th}\frac{x}{2} \]

then 

\[ u(x,t) + \operatorname{th}\frac{x - \ln(1+t)}{2} \to 0, \quad t \to \infty, \quad x > 0 \]


Theorem 4 is proved in basing on the following 
idea. Let f(x, t) satisfy [1], [3] and F(x, t) satisfy [2], 
[4]. Let f(x — ct) be the traveling wave for [1], [3] 
and F(n — Ct) be the traveling wave for [2], [4]. 
Suppose that g(a”) >c=C=y(a™). Let da(t) and 
D,(t), A > 0 be functions such that 


ct+Ay/t 7 
J {f (x,t) —Fe—ct—da()}Jdx =0 [19] 
ci=As/t 


and, correspondingly, 


[Ct+A vt 


| - 
>, {ETRO 


k=[Ct-A vt] 
+ (Ct + Avt — [Ct + AvV#])(®(F(Ct+ Avt] + 1,¢)) 
— 6(F([(Ct + Avt] +1- Ct+D4(t))=0 
|20] 
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The relations [9], [20] can be called “localized 
conservation law.” The proof contains two difficult 
parts. 

The first part consists in proving that for A > 2,/c 
(correspondingly, A > 2VC) the following asymp- 
totics are valid: 


e-Int 


da(t) “reer a t —> œO 
Da(t) = a t D +l), E 
[21] 


where d?, D? are independent of A. 
The second part gives the following convergence 
statements: 


— f(y — ct — da (t))} 


aip | {f(y, t) 
x€[ct-AVi,ct+AVi] Y ct-Avt 
dy| —>0, t> œ 


n 


sup | XO {&(F(R,t)) 


xE[Ct—-AvVt,Ct+Avt] k=[Ct-A v^] 
O(F (k — Ct — Da(t )))}| — 0, 


t — œ 


The precise a priori estimates of local solutions of 
[1], [2] play an important role in the proof. An 
example of such an estimate, also useful for further 
results, is given below. 


Proposition 1 Let, in eqn |2], C=y(0) > 0, e= 
1, 0 < '(0) < 70, KT (x — Ct)/v Ct. Let the ae. 
tion F(x, t), defined in the domain Qo = {(x, t): a1 < 
X < ay}, a > 0, satisfy eqn [2], 


AF (x, t)=F(x,t) — F(x —1,t) > 0 
ii 
F(x,t)| < ——, (x,t)€20,t>t 
| ( )| JCt ( ) 0 0 
Then 
B-T 
AF (x,t) < -or > (x,t) € Qo, t> to 
where 


B = Bo a2 + G +) (1 + In(1 + a) 


d= min(x =.) = x) 
and Bo is an absolute constant. 


It is interesting to compare a priori estimate of 
Proposition 1 with some similar (but less precise) 
estimates in the theory of classical quasilinear 
parabolic equations (Ladyzhenskaya et al. 1968). 

We will formulate now the general conjecture 
concerning asymptotic behavior of solutions of 


initial problems [1], [3] and [2], [4] and some 
partial results which confirm this conjecture. To 
simplify formulation we admit the following. 


Assumption 2 Let ¢)(u) and (u) be upper bounds of 
the convex hulls for the graphs of 


and 


_ [dy 
ua) = | ply) 


a*]. We suppose that 


< Y(u)} 
(a1, 81) JEE (az, a”) 


respectively, with u € [a’, 
s = {u € la , a+]: y(u) 
= (a`, bo) U 
where 

a = ao < Bo < ai < 81 <- < aL < BL =at 

or, correspondingly, 
S= {u € ja, a7]: 
a (a, bo) U 


U(u) < U(u)} 
(41,61) U-+- (am, a") 
where 


+ 


a =a <bo<a<b<::-<ay<by=a 


In addition, we suppose that y’(a;) 40, y'(3;))F 
0,/=0, 1,...,L or, correspondingly, a E 
0, (mn) 70; MA. 1,...,M. 


Proposition 2 (Weinberger 1990, Henkin and 
Polterovich 1999). Under Assumptions 1, 2, one has: 


i) If ueļ[a, a]\s and, correspondingly, u € 
[a~, at] \ S, then following functions are well 
defined: 

bl, if x < p(6i) t 
x T if p(B) -t<x 
gi (>) = < pla): t 
Q+1; fx > p(a41) t, 
a O 2 


and, correspondingly, 


a | POM), if lbp) tsa 
Gi(=) = < (4m41) +t 
t i 
ER ih > plani) h 
m= 0,1,...,M 


(ii) For any interval (aj, Bi) Cs and, correspond- 
ingly, (Am, bm) CS there exist traveling waves 
fi(x — cıt) for [1] with overfall (œi, 6i) and, 


correspondingly, Fy,(x — Cmt) for [2] with over- 
fall (am, bm), where 


1 Gy J 
CI = 
5ga, vom 


cı = (ĝi), E een Oe 
cı = p(ai), keed 





and, correspondingly, 





Cn = (bm), m=0,...,M—-1 
Ca =U Ge) Wa lM 
Conjecture (Henkin and Polterovich 1994, 1999, 


Henkin and Shananin 2004). Let 


Core rere © 


, E ed 
=X he- et —enlt)) + 0 ai(=) - 24 
I=0 I=0 I=0 
L 
= ` aj, L>1 
[=i 
F(ne, LT ets Tm) 
M _ M-1 ne 
=y fue ai= EY Ca (=) 
t 
m=0 m=0 
M-1 M 
-X bm- am, M21 
m=0 m=1 


Then under Assumptions 1, 2, the following state- 
ments are valid: 


(i) For any solution f(x, t), x € R,t€ Ry, of ini- 
tial problem [1], [3], there exist shift-functions y(t): 


yy In t+ O(1) < y(t) < yr ln t+ O(1) 


Caa a aco, deal 
such that 
sup If (x, £) — f(x, t, Y0, Y1,- -3 YL) m 0, 
xER 
t — œ 


(ii) Moreover, in (i) one can take 








y = 

= E 

(Go) 
E fli=0 2 
eB) l 1 < L,y(ao) £ (8o) 

= if l 
SCOR OS 

a ifl=L>0,p(ar) # (8L) 
p' (ay) 
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(iii) For any solution F(ne, t), n € Z, t € R4, of initial 
problem [2], [4], there exist shift-functions T(t): 
T- Int+O(1) < T(t) <T} Int + O(1) 
027-20 2. 127-1..5,0 


such that 


~ 


sup |F(ne, t) — F(ne,t,U,T1,...,Um)| — 0, 
neZ 


t — œ 


(iv) Moreover, in (iii) one can take 





eo 
7 (Di —An) 
1 
Fb)’ if m=0< M, (ao) £ (bo) 
1 1 
= , if0<m<M 
a p'am) p'(bm) 
1 : 
E ifm=M >0,y(lam)#ọ(bm) 


The main result confirming formulated conjec- 
tures is the following. 


Theorem 5 (Henkin and Shananin). Conjecture 
(i) for L=1 and corresponding conjecture (iii) for 
M =1 are true, that is for solution of initial problem 
[1], [3] there exist shift functions y(t) = O (ln t) such 
that for t + œ we have 


fol% — cot — eyo(t)), if x < cot 
f(x, t)—> To Far sse 
fi(x — cit — ey (t)), ifx>cit 


and for solution of initial problem |2], [4] there exist 
shift functions Tm(t)=O(ln t) such that for t + œ 
we have 


~ 


Fo(ne — Cot — Ero (t)), if ne < Cot 
heen yp) (ne/t), if Cot < ne 
i i < Cit 

Fi (ne — Cyt — Eeri (t)), if ne > Cit 


The proof of Theorem 5 is of the same nature as 
the proof of Theorem 4. 


Remarks 


(i) The proof of stronger Conjectures (ii) and (iv) 
for L=1 or M=1 are in preparation. 
(ii) The numerical results, Rykova and Spivak (pre- 
print, 2004), confirm conjecture (iii) for M =2. 
(iii) The results of Weinberger (1990) and Henkin 
and Polterovich (1999) confirm the convergence 
statements of Conjectures (i), (iii) for all L and 
M, but only on the intervals of rarefaction 
profiles: $x \in [\varphi(\beta_l)t, \varphi(\alpha_{l+1})t]$ or, correspondingly, 
$x \in [\varphi(b_m)t, \varphi(a_{m+1})t]$, $t > 0$. 


The problem of finding asymptotics (t — oo) of 
solutions of (viscous) conservation laws has been 
posed originally not only for generalized Burgers 
equations but also for systems of conservation laws in 
one spatial variable (see Gelfand (1959)). In this 
direction many important results on existence and 
asymptotic stability of viscous shock profiles (con- 
tinuous and discrete) have been obtained and applied 
(see Benzoni-Gavage (2004), Lax (1973), Serre 
(1999), Zumbrun and Howard (1998) and references 
therein). The results of type of Theorems 4,5 have not 
yet been obtained for systems of conservation laws. 

It is also very interesting to study asymptotic 
behavior of scalar (viscous) conservation laws in 
several spatial variables (continuous or discrete), 
basing on the asymptotic properties of Burgers type 
equations. In this direction there have been several 
important results and problems (see Bauman and 
Phillips (1986), Henkin and Polterovich (1991), 
Hoff and Zumbrun (2000), Serre (1999), 
Weinberger (1990), and references therein). 
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What is a Cellular Automaton? 


Cellular automata (CAs) were first introduced by 
J von Neumann in his investigation of “complexity,” 
following an inspired suggestion by S Ulam. In the 
last 50 years they have been investigated and used in a 
number of fields; such widely different terminologies have 
been used by researchers that it is now difficult even 
to give a precise general definition of a CA. Thus, 
some definitions and approximations are in order. 
First a broad definition: 


1. we have a number of cells (boxes); 

2. at any (discrete) time step, any cell can present 
itself in a certain “state” among a finite number 
of different states; 

3. the state of any cell can change (evolve) from a 
time step to the subsequent time step; and 

4. there is a rule (evolution law, EL) which 
determines this transition. 


Note that the number of cells can be finite or infinite; 
the cells can be arranged on a line, on a surface, in the 
ordinary three-dimensional (3D) space, or possibly in a 
hyperspace (in any case, the cells can be numbered); the 
different states of a cell can be denoted by integer 
numbers but, in different contexts of application of 
CAs, different imaginative pictures have also been used 
(e.g., different colors, dead and living cells, number of 
balls in a box, etc.); the evolution of a CA proceeds in 
finite time steps (time is also discrete); the EL, provided 
that it is effective on any possible configuration of a 
given CA (computability), is otherwise completely 
arbitrary (indeed, there are not only deterministic and 
probabilistic ELs, but also those that “evolve” in time — 
following a meta-EL, which in turn can be determinis- 
tic or probabilistic). 

Consider some examples of CAs. 


Example 1 (CA1) Consider a linear array of seven 
boxes (cells; one can number them c(i), i=1,2,..., 7). 
Each box can be empty or it can contain a ball (so 
there are just two states for each cell). Given a 
configuration of this CA at time t, what happens at 
time t + 1 (EL)? 


(i) the state of the first box c(1) never changes; 
(ii) for each other box c(i), i=2,3,...,7: 

(iia) if the box is empty and the box on its left is 
empty then put a ball in the box; 

(iib) if there is a ball in the box and also there is a ball 
in the box on its left then empty the box. 


An example of the evolution of such a rather trivial 
CA is given in Figure 1. 
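
The EL of CA1 can be sketched in a few lines of Python. This is only an illustration: the function names `ca1_step` and `ca1_run` are hypothetical, and the all-empty ID used in the demo is an assumption (the ID of Figure 1 is not recoverable from the text).

```python
def ca1_step(cells):
    """One time step of CA1; cells[0] is c(1), 1 = ball, 0 = empty."""
    new = list(cells)                    # rule (i): c(1) is copied unchanged
    for i in range(1, len(cells)):       # rule (ii): boxes c(2)..c(7)
        here, left = cells[i], cells[i - 1]
        if here == 0 and left == 0:      # (iia) both empty: put a ball
            new[i] = 1
        elif here == 1 and left == 1:    # (iib) both full: empty the box
            new[i] = 0
    return new


def ca1_run(cells, steps):
    """Return the configurations at t = 0, 1, ..., steps."""
    history = [list(cells)]
    for _ in range(steps):
        history.append(ca1_step(history[-1]))
    return history


if __name__ == "__main__":
    # Assumed ID: seven empty boxes.
    for t, row in enumerate(ca1_run([0] * 7, 7)):
        print(t, "".join("o" if s else "." for s in row))
```

Since both rules compare a box with its left neighbor at time t, every box is updated from the old configuration, not from the partially updated one.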


A more precise notation can now be established. 

First, let us denote the state of a cell at time t by a 
“state function,” say S. According to point 2 of the 
broad definition above, the number of possible states is 
arbitrary but finite: denote this number by the positive 
integer M (M > 1). Then S takes values in a finite ring, 
say Z_M = Z/MZ = {0,1,2,...,M-1} (in plain words, 
we have denoted the M states of the CA by the 
first M non-negative integers). Different cells can be 
labeled with a progressive number: c(n), n = n_1, n_1+1, 
..., n_2-1, n_2; in the case of an infinite number of 
cells, one has n_1 → -∞ and/or n_2 → +∞. In the case 
n_1 = -∞, n_2 = +∞, one speaks of a unidimensional CA. 
Of course, the field S depends on n as well as on time 
(remember that, for a CA, “time” is a discrete variable: 
t=0,1,2,...). The field S(n,t) describes the CA completely. 
If the EL is deterministic, then one can determine 
(compute) S(n,t) step by step for t > 0 from the initial 
configuration S(n,0) (initial datum, ID). Consider 
only static ELs, namely those that do not change in 
time. A further distinction can be made: there are 
ELs such that the future state of the generic cell, 
S(n,t+1), depends on the whole current configuration 
of the CA (these are called nonlocal ELs) and 
there are ELs for which S(n,t+1) depends only on 
the current state of a finite number, say N, of cells 
(local ELs): 


\[ \{S(n+k_i,t)\},\ i=1,2,\dots,N,\ k_i\in\mathbb{Z} \;\Longrightarrow\; S(n,t+1) \qquad [1] \]


Figure 1 A seven time-step evolution of CA1 (cells c(1), 
c(2), ..., c(7)) starting from a given ID (t=0). Note that 
a stable configuration has been reached at t=6. 


Note that, in principle, the set of cells that 
determine, according to the EL, the future state of the 
generic cell n could depend on n, namely one can have 
N=N(n), as well as k_i=k_i(n), i=1,2,...,N(n) (see 
CA2 below). In any case, such a set of cells is called 
the interaction set (IS). Moreover, the distance from 
the cell n of the farthest cell in the IS is called 
the range R (of the interaction): R = max(|k_i|). If 
IS = {c(n-R), c(n-R+1), ..., c(n), ..., c(n+R-1), 
c(n+R)}, then this IS is called a neighborhood of 
range R. It is, moreover, clear that, for a unidimensional 
CA, there exists at least one infinite subset of cells that 
have the same state. If there is only one such subset, 
then it is called the vacuum set and the state of its 
cells is called the vacuum state: let V denote the value of 
this state (0 ≤ V < M, S(n,t) → V as n → ±∞). 

Example 2 (CA2) An example of CA with 
n-dependent IS (M=2, R=3, V=0). This is the 
EL: the cell c(n) changes its state (0 → 1, 1 → 0) iff 


(i) n is even and at least one of the two cells on its 
left is not in the vacuum state; 

(ii) n is odd and one or three of the three cells on its 
right are not in the vacuum state. 


An example of the evolution of such a CA is given 
in Figure 2. 
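
A minimal sketch of the EL of CA2 follows. It assumes that the configuration is stored as the set of cell labels n that are not in the vacuum state, and that a cell of the vacuum set "changes its state" by leaving the vacuum (the name `ca2_step` is hypothetical).

```python
def ca2_step(alive):
    """One step of CA2; `alive` is the set of integers n with S(n,t) = 1."""
    if not alive:
        return set()
    new = set(alive)
    # Only cells within range R = 3 of a non-vacuum cell can change state.
    for n in range(min(alive) - 3, max(alive) + 4):
        if n % 2 == 0:
            # (i) n even: at least one of the two cells on its left is 1
            flip = (n - 1 in alive) or (n - 2 in alive)
        else:
            # (ii) n odd: one or three of the three cells on its right are 1
            count = sum((n + k) in alive for k in (1, 2, 3))
            flip = count in (1, 3)
        if flip:
            new ^= {n}          # toggle the state 0 <-> 1
    return new
```

Storing only the non-vacuum cells keeps the infinite line representable; the asymmetric IS (left cells for even n, right cells for odd n) is what produces the left-right asymmetry of the evolution.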


Usually, only a subclass of ELs is considered for 
which the phenomenon of vacuum excitation 
cannot occur. Namely, during the evolution of 
the CA, an infinite subset of the vacuum set 
cannot change its state in just one time step. In 
other words: if the set of cells starting from the 
first cell and ending with the last one for which 
S(n,t) ≠ V is called the population set (PS), then the PS is 
a finite set at each time. 


Figure 2 ... from a randomly chosen initial configuration. Note the left-right 
asymmetry due to the asymmetry of its IS and EL. 

Of course, one can easily devise an EL for which 
this is not true; nevertheless, the EL itself is still 
valid (computable), for instance, 


Example 3 (CA3) This is a unidimensional CA, 
namely there are infinitely many cells on a line (n ∈ Z). The 
cells have M states and V=0; the EL reads: 


the state of each cell cycles in the set of available states 
(0 → 1, 1 → 2, ..., M-2 → M-1, M-1 → 0) 


Note that the range R is zero and there is a vacuum 
excitation; nevertheless, the EL is effective. 


Deterministic, static, and local ELs that do not give 
rise to vacuum excitation are called normal ELs (NELs). 

Since M, N are finite for an NEL, one can give the 
NEL itself as a table, considering every possible 
configuration of the IS and specifying the outcome 
for each configuration (note that there are M^N 
possible configurations). 


Example 4 (CA4) n ∈ Z, M=2, V=0, IS = {c(n), 
c(n-1), c(n+2)}, N=3, R=2. The EL is: 

S(n,t)      0 0 0 0 1 1 1 1 
S(n-1,t)    0 0 1 1 0 0 1 1 
S(n+2,t)    0 1 0 1 0 1 0 1      [2] 
S(n,t+1)    0 1 1 0 1 1 0 1 

An example of the evolution of such a CA is given 
in Figure 3. 


However, these NELs can also be given in an 
alternative representation (more useful in view of the 
extensions of the concept of CA itself, see below). 
Namely, an NEL can be given as a discrete-time 
EL for the state function S(n,t) in the finite ring 
Z_M = {0,1,2,...,M-1}. 





Figure 3 Four hundred and sixty-one time steps of CA4, 
starting from a randomly chosen PS of 50 cells. 


For example, the NEL above for CA4 can be 
expressed as follows: 


\[ S(n,t+1) \equiv S(n-1,t) + S(n,t) + S(n+2,t) + S(n,t)\,S(n+2,t) + S(n-1,t)\,S(n,t)\,S(n+2,t) \pmod{2} \qquad [3] \]


Here and in the following, the symbol \equiv denotes a 
congruence mod M. 
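
The equivalence between the table form and the polynomial form of the EL of CA4 can be checked by brute force over all configurations of the IS. The sketch below is illustrative only; the names `CA4_TABLE`, `ca4_poly`, and `tables_agree` are hypothetical.

```python
from itertools import product

# Lookup table [2]: key = (S(n,t), S(n-1,t), S(n+2,t)), value = S(n,t+1).
CA4_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 0,
    (1, 0, 0): 1, (1, 0, 1): 1, (1, 1, 0): 0, (1, 1, 1): 1,
}


def ca4_poly(center, left, right2):
    """Polynomial form [3] of the CA4 rule, reduced mod M = 2."""
    return (left + center + right2
            + center * right2
            + left * center * right2) % 2


def tables_agree():
    """Compare [2] and [3] on all M**N = 2**3 = 8 configurations of the IS."""
    return all(CA4_TABLE[cfg] == ca4_poly(*cfg)
               for cfg in product((0, 1), repeat=3))
```

Calling `tables_agree()` confirms that the two representations define the same NEL.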
Another example is the following. 


Example 5 (CA5) n ∈ Z, M=3, N=3, V=0, R=1, 
IS = {c(n-1), c(n), c(n+1)}. The NEL is: 


\[ S(n,t+1) \equiv S(n-1,t) + S(n,t) + S(n+1,t) + 2\,S(n-1,t)\,S(n+1,t) \pmod{3} \qquad [4] \]


An example of the evolution of such a CA is given 
in Figure 4. 


Classification of ELs 


Considering a CA with given M > 1,N > 1, the 
number L of possible deterministic, static ELs is 


\[ L(M,N) = M^{(M^N)} \qquad [5] \]


Of course, this number can be very large even for 
relatively small values of M and N. Nevertheless, 
it is a finite positive integer, so that, for 
given M, N, one could denote every EL by an 
integer number and investigate the typical behavior 
of each EL. A considerable reduction of this 
number is obtained if one limits attention to 
totalistic ELs, namely to those whose outcome 
depends only on the global configuration of the 
IS, often just on 


\[ \sigma(n,t) = \sum_{i=1}^{N} S(n+k_i,t), \qquad k_i \in \mathbb{Z} \qquad [6] \]
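
To give a feeling for the sizes involved, the following sketch evaluates the number of deterministic, static ELs for small M and N, together with the number of ELs whose outcome depends only on the sum of the N states in the IS. The second count is not stated in the text; it follows from the observation that such a sum can take N(M-1)+1 values. Both function names are hypothetical.

```python
def count_els(M, N):
    """Number of deterministic, static ELs: one of M outcomes
    for each of the M**N configurations of the IS."""
    return M ** (M ** N)


def count_totalistic_els(M, N):
    """Number of ELs whose outcome depends only on the sum of the N states.

    The sum ranges over 0 .. N*(M-1), i.e. N*(M-1)+1 values,
    each of which is mapped to one of M outcomes.
    """
    return M ** (N * (M - 1) + 1)


if __name__ == "__main__":
    for M, N in [(2, 3), (2, 5), (3, 3)]:
        print(M, N, count_els(M, N), count_totalistic_els(M, N))
```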





Figure 4 Four hundred and sixty-one time steps of CA5, 
starting from a randomly chosen PS of 50 cells. 

Figure 5 A class-1 CA: every ID rapidly evolves to 
periodic structures; M=3, V=0, R=2, EL: S(n,t+1) ≡ S(n,t) + 
S(n-1,t)S(n+2,t) (mod 3). 


Extensive computer investigations have 
been carried out for unidimensional CAs with small 
values of M, N. Surprisingly enough, it seems that 
the typical behavior of all these CAs can be (roughly 
and heuristically) classified in just four classes 
(Wolfram 2002): 


• Class 1 (simple): possibly after a complicated 
transient, simple patterns emerge. 

• Class 2 (fractal): possibly after a transient, 
overall regular nested structures are obtained. 

• Class 3 (chaotic): complicated but seemingly 
random behavior. 

• Class 4 (complex): possibly after a transient, 
localized structures emerge that interact in complex 
ways. 


Due to the looseness of the above definitions, 
perhaps a better way to distinguish between classes 
is to train one’s eye. Consider some examples of 
CAs for each class: the typical behavior of class-1 
CA is shown in Figures 5 and 6, of class-2 CA in 
Figures 7 and 8, of class-3 CA in Figures 4 and 9, 
of class-4 CA in Figures 10 and 11. Note, however, 
that often one has “mixed type” CA: for example, 
CA4 is of class 1 on the right and of class 2 on 
the left (see Figure 3); Figure 12 exhibits a CA 
where the typical behaviors of classes 2 and 3 are 
superimposed. 


Extensions 


The concept of a CA is so simple that many 
extensions of the above-sketched definition of a 
CA can be easily devised. A (nonexhaustive) survey 
of such extensions follows. 
Figure 6 A class-1 CA: a random ID vanishes after 337 
time steps; M=5, V=0, R=2, EL: S(n,t+1) ≡ S(n-1,t)S(n-2,t) 
+ S(n+1,t)S(n+2,t) + S(n-1,t)S(n+1,t) + S(n-2,t)S(n+2,t) 
(mod 5). 


Figure 7 A class-2 CA: Sierpinski triangles appear; M=2, 
V=0, R=1, EL: S(n,t+1) ≡ S(n-1,t) + S(n+1,t) (mod 2). 


Figure 8 A class-2 CA: a double fractal structure appears; M=4, 
V=0, R=2, EL: S(n,t+1) ≡ S(n-2,t) + S(n,t) + S(n+2,t) (mod 4). 


Figure 9 A class-3 CA: M=5, V=0, R=2, EL: S(n,t+1) 
2S(n-1,t) + S(n+1,t) + S(n,t)(S(n+1,t) + S(n+2,t)) 
S(n-1,t)S(n+1,t). 


Vector CA 


In this extension, the state function S(n,t) is 
considered as a “vector,” namely S(n,t) = 
(S_1(n,t), S_2(n,t), ..., S_L(n,t)), L being a positive 
integer. Each component S_l(n,t) (l=1,2,...,L) takes 
values in a finite ring, say Z_{M_l} = {0,1,2,...,M_l-1}, 
and evolves, according to some EL, interacting 
with the other components. Of course, one can give 
separately the time evolution for each component; 
however, it is also possible to give a global 
representation of a vector CA, introducing a global 
function S(n,t) that takes values in the finite ring 
Z_M, M = M_1 M_2 ... M_L; for example, 


\[ S(n,t) = S_1(n,t) + \sum_{k=2}^{L} S_k(n,t) \prod_{l=1}^{k-1} M_l \qquad [7] \]


Thus, in a sense, vector CAs are still usual CAs 
with a complicated EL. 
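
The global representation [7] is a mixed-radix encoding of the components. A small sketch of the encoding and of its inverse follows; the helper names `encode` and `decode` and the moduli used in the demo are assumptions made for illustration.

```python
def encode(components, moduli):
    """Global state S = S_1 + sum_{k>=2} S_k * (M_1 * ... * M_{k-1})."""
    value, weight = 0, 1
    for s, m in zip(components, moduli):
        if not 0 <= s < m:
            raise ValueError("component out of range")
        value += s * weight
        weight *= m              # weight of the next component
    return value


def decode(value, moduli):
    """Recover the components (S_1, ..., S_L) from the global state."""
    components = []
    for m in moduli:
        value, s = divmod(value, m)
        components.append(s)
    return components


if __name__ == "__main__":
    moduli = [2, 3, 5]           # assumed example: M = 2 * 3 * 5 = 30
    print(encode([1, 2, 4], moduli), decode(29, moduli))
```

Because the encoding is one-to-one between the component vectors and {0,1,...,M-1}, any EL on the components induces an EL on the single global state function.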


Example 6 (CA6) A two-component vector CA: 


\[ S_1(n,t+1) \equiv S_1(n,t)\,S_1(n+1,t) + (M_1-1)\,S_2(n-1,t)\,S_2(n,t) + c_1 \pmod{M_1} \qquad [8] \]

\[ S_2(n,t+1) \equiv S_1(n-1,t)\,S_2(n,t) + S_1(n,t)\,S_2(n+1,t) + c_2 \pmod{M_2} \qquad [9] \]
Figure 10 A class-4 CA (Wolfram CA 110): M=2, V 

such CAs can still be very rich). Usually, one 
considers an infinite number of cells tessellating 
the whole s-space, s=2,3,... (e.g., squares or 
hexagons on the plane, cubic cells in 3-space). The 
changes in the previous notation and definitions are 
plain: for example, for a bidimensional CA, the state 
function depends now on two discrete “space” 
variables (S(n_1,n_2,t), n_1 ∈ Z, n_2 ∈ Z); furthermore, 
there is a greater freedom in choosing a neighborhood 
of range R. The two most-used neighborhoods of 
range 1 are shown below: 


[Diagrams: the von Neumann neighborhood (a cell together with 
its north, east, south, and west neighbors) and the 
Moore-Conway neighborhood (a cell together with its eight 
surrounding neighbors).] 
The most famous (and interesting) bidimensional 
CA is “Life”, introduced by J H Conway, which is 
discussed next. 


Example 7 (CA “Life”; Moore-Conway neighborhood, 
V=0, M=2). A cell in the vacuum state 0 is 
called “dead”; a cell in the state 1 is called “alive.” 
The EL is as follows: 


(i) If a cell is dead at time t, it comes alive at time 
t+1 if and only if exactly three of its eight 
neighbors are alive at time t (reproduction). 

(ii) If a cell is alive at time t, it dies at time t+1 if and 
only if fewer than two (loneliness) or more than 
three (overcrowding) neighbors are alive at time t. 


Clearly, this is a totalistic NEL. Now considering 
the explicit form of σ (see [6]): 


\[ \sigma(n_1,n_2,t) = -S(n_1,n_2,t) + \sum_{k_1=-1}^{1} \sum_{k_2=-1}^{1} S(n_1+k_1,n_2+k_2,t) \qquad [12] \]


the above EL can be simply expressed as: 

\[ S(n_1,n_2,t+1) = \delta_{3,\sigma} + \delta_{2,\sigma}\,S(n_1,n_2,t) \qquad [13] \]


where δ_{i,j} is the Kronecker symbol. 
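
The EL of Life in the form [12]-[13] translates almost literally into code. The sketch below assumes that a configuration is stored as the set of coordinates (n1, n2) of the alive cells, so that an unbounded plane can be handled; the name `life_step` is hypothetical.

```python
def life_step(alive):
    """One step of Life; `alive` is a set of (n1, n2) pairs with S = 1."""
    # sigma [12]: number of alive cells among the eight Moore-Conway neighbors.
    sigma = {}
    for (x, y) in alive:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) != (0, 0):
                    cell = (x + dx, y + dy)
                    sigma[cell] = sigma.get(cell, 0) + 1
    # EL [13]: S(t+1) = delta(3, sigma) + delta(2, sigma) * S(t).
    return {cell for cell, s in sigma.items()
            if s == 3 or (s == 2 and cell in alive)}


if __name__ == "__main__":
    glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
    state = glider
    for _ in range(4):
        state = life_step(state)
    print(sorted(state))     # the glider, displaced by one cell diagonally
```

Cells that do not appear in `sigma` have no alive neighbor, so they are dead at the next step whatever their current state; this is why only the neighbors of alive cells need to be visited.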
Life is a class-4 CA; it exhibits a rich variety of 
interesting structures: stable structures, oscillators 
(periodic structures), gliders and ships (moving 
structures), emitters and absorbers (namely, structures 
that, after a time period, reconstitute themselves, 
but meanwhile have emitted or absorbed 
moving structures). These structures are essential to 
prove that Life can be used to construct a universal 
Turing machine (see below). One can get a rough 
idea of such “richness” from Figure 14. 

As in the previous case of vector CA, one could 
object that also multidimensional CAs are not true 
extensions of the unidimensional CAs. Indeed, since 
the whole set of cells is still a countable set, one 
could number the cells with just a discrete “space” 
variable (say n ∈ Z). For example, in the case of a 
square tessellation of the plane, we could enumerate 
the cells in the plane starting from the origin as 
follows: 


                  22 
                  21   20   19   18 
   -13  -12  -11   4    5    6   17 
   -14   -9  -10   3    2    7   16 
   -15   -8   -1   0    1    8   15      [14] 
   -16   -7   -2  -3   10    9   14 
   -17   -6   -5  -4   11   12   13 
   -18  -19 


Thus, any multidimensional CA could in principle 
be viewed as a unidimensional one. Of course, one 
has to pay a price for this: ISs and ELs that are 
simple for a multidimensional CA become cumber- 
some for its unidimensional version and vice versa. 


Higher Time Derivatives 


Up to now, we have considered CAs whose evolved 
state S(t + 1) depends only on the state S(t), namely 
the state of the CA itself at the previous time step. In 
other words the EL involves just the first (discrete) 
time derivative (1_CA). One can easily extend all the 
previous definitions to consider higher-order discrete 
time derivatives (K_CA). Of course, the ID and the IS 
for such a CA involve the state of the CA at K 
subsequent time steps. 

An example of a unidimensional 2_CA is given 
below. 


Example 8 (CA7) M=3, V=0, R=1. The EL is: 


\[ S(n,t+1) \equiv S(n-1,t) + S(n,t-1) + S(n+1,t) \pmod{3} \qquad [15] \]


An example of the evolution of such a CA is given in 
Figure 15. 
Figure 14 CA “Life”: (a) Time 0. Near the lower border, five 
stable structures (from left to right: a “block”, a “boat”, a 
“ship”, a “loaf”, a “beehive”); near the left border, three “blinkers” 
(period-2 oscillators); near the right corner, a symmetric structure 
that, in one time step, evolves into a “pulsar” (a period-3 
oscillator); in the upper-left corner, a “glider” (a moving structure); 
in the upper-right corner, a “medium weight spaceship” (another 
moving structure); in the center, a configuration that vanishes in a 
few time steps. (b) Time 1. The structures on the lower border are 
unchanged; the blinkers, the glider, and the spaceship are in an 
intermediate state; on the right border, the pulsar starts to pulse. 
(c) Time 2. The three blinkers on the left border are again in their 
original configurations (periodic structure with period 2); the 
pulsar, the glider, and the spaceship are in another intermediate 
state. (d) Time 3. The pulsar is in its second state, the glider and 
the spaceship in their third; the structure in the center is going to 
vanish. (e) Time 4. The pulsar has completed its pulsation (period-3 
oscillator, see Figure 14b); the structure in the center has 
vanished; the glider and the spaceship have recovered their 
original configurations (see Figure 14a) but meanwhile they have 
moved by one cell in four time steps (1/4 of the highest velocity 
attainable by a moving structure in a CA of range 1). The glider is 
moving downward and to the right, the spaceship horizontally to 
the left. (f) Time 60. The spaceship has almost completed its 
crossing; the glider has reached the center and is on a collision 
route with the pulsar. 


It is plain that taking a suitable continuum limit 
of a K_CA one gets a partial differential equation of 
order K for the evolution. However, there are also 
special and interesting CAs, called “filter” CAs, 
that in a suitable continuum limit end up in integral 
evolution equations. For a filter unidimensional 
CA, the evolved state at the cell n, S(n,t+ 1), 
depends also on the (already) evolved states of the 
cells on its left (or right): for example, an NEL of 
the type 


\[ S(n,t+1) = F\big(S(n+k_i,t),\, S(n-\tilde k_j,t+1)\big), \quad i=1,2,\dots,N;\ k_i\in\mathbb{Z};\quad j=1,2,\dots,\tilde N;\ \tilde k_j\in\mathbb{N} \qquad [16] \]


is still valid (computable). Extensions to K_CAs or 
vector CAs or multidimensional CA are plain. Very 
often filter CAs exhibit a class-4 behavior with 
particle-like structures moving and interacting in a 
complex way; see the following example and 
examples in the next section. 


Example 9 (CA8) M=2, V=0, R=2. The EL is: 


\[ S(n,t+1) \equiv S(n-1,t+1)\,S(n-2,t+1) + S(n,t) + S(n+1,t)\,S(n+2,t) \pmod{2} \qquad [17] \]


An example of the evolution of such a CA is given 
in Figure 16. 


Invertible CA 


For most of the ELs there is a loss of information 
in the course of the evolution (see, e.g., Figures 5 
and 6). Indeed, different definitions of “CA 
entropy” have been introduced to measure the 
“randomness” in the behavior of a given CA. 
However, since CAs are important in physical 
modeling as well as in cryptography and data 
compression, there is great interest in a special 
subclass of CAs which are “invertible” (time 
reversible). Namely, for an “invertible” CA following 
a given EL and starting from an arbitrary 
ID, there exists an “inverse” EL such that one 
can recover the ID from the evolved states. 
Invertible CAs can be easily devised in the case of 
K_CA (K > 1). For example, if K=2,3,..., one can 
consider ELs of the form 


\[ S(n,t+1) \equiv S(n,t-K+1) + F\big(S(n+k_i,t-j)\big) \pmod{M} \qquad [18a] \]


where 


\[ i = 1,2,\dots,N;\quad k_i \in \mathbb{Z};\quad j = 0,1,2,\dots,K-2 \qquad [18b] \]

and F is an arbitrary polynomial function. 


Figure 16 CA8, a “filter” CA. Note the emergence of particle-like 
structures moving to the left and to the right and interacting in 
complex ways. 

It is then clear that the inverse EL reads 


\[ S(n,\tilde t+1) \equiv S(n,\tilde t-K+1) + (M-1)\,F\big(S(n+k_i,\tilde t+j-K+2)\big) \pmod{M} \qquad [19] \]


Indeed, if an arbitrary ID evolves according to 
the EL [18], then applying the inverse EL [19] to K 
subsequent evolved states (taken in reversed order), 
eventually the original ID is recovered (in reversed 
order) (see the following example). 
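
The recovery of the ID can be sketched for the simplest case K=2 of [18a] and [19]. Everything specific in the code is an assumption made for illustration: the finite ring of cells with periodic boundary, the IS {c(n-1), c(n), c(n+1)}, the particular polynomial F used in the demo, and the function names.

```python
def step_forward(prev, curr, M, F):
    """[18a] with K = 2: S(n,t+1) = S(n,t-1) + F(S(n-1,t), S(n,t), S(n+1,t)) mod M."""
    W = len(curr)
    return [(prev[n] + F(curr[(n - 1) % W], curr[n], curr[(n + 1) % W])) % M
            for n in range(W)]


def step_inverse(prev, curr, M, F):
    """[19] with K = 2: same form, with F multiplied by M - 1 (that is, -1 mod M)."""
    W = len(curr)
    return [(prev[n] + (M - 1) * F(curr[(n - 1) % W], curr[n], curr[(n + 1) % W])) % M
            for n in range(W)]


def round_trip(s0, s1, M, F, steps):
    """Evolve the ID (s0, s1) forward, then apply the inverse EL as many times."""
    a, b = list(s0), list(s1)
    for _ in range(steps):
        a, b = b, step_forward(a, b, M, F)
    a, b = b, a                  # take the last K = 2 states in reversed order
    for _ in range(steps):
        a, b = b, step_inverse(a, b, M, F)
    return b, a                  # the ID comes back in reversed order


if __name__ == "__main__":
    F = lambda l, c, r: l * r + 2 * c + r      # assumed polynomial
    s0 = [0, 1, 2, 0, 0, 1, 2, 2]
    s1 = [1, 0, 0, 2, 1, 0, 0, 0]
    print(round_trip(s0, s1, 3, F, 25) == (s0, s1))
```

The inversion works for any F, however lossy, because the term S(n,t-K+1) enters the EL additively and can always be solved for.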





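Example 10 (CA9) A 6_CA: M=2, V=0, R=1. 
The EL is: 


\[ S(n,t+1) \equiv S(n,t-5) + S(n,t-3) + S(n+1,t-2) + S(n-1,t-1) + S(n,t-2)\,S(n+1,t-2) + S(n,t)\,S(n-1,t) \pmod{2} \qquad [20] \]


The inverse EL, according to [19], reads (Figure 17) 


\[ S(n,\tilde t+1) \equiv S(n,\tilde t-5) + S(n,\tilde t-1) + S(n+1,\tilde t-2) + S(n-1,\tilde t-3) + S(n,\tilde t-2)\,S(n+1,\tilde t-2) + S(n,\tilde t-4)\,S(n-1,\tilde t-4) \pmod{2} \qquad [21] \]
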


Figure 17 CA9, a 6_CA: (a) a 50 time-step evolution from a 
peculiar ID; (b) a 50 time-step evolution of the inverse EL, starting 
from the last six configurations of Figure 17a (taken in inverse 
order); the ID of Figure 17a is recovered (in inverse order). 


Of course, more complicated invertible ELs can be 
devised. Invertible ELs can also be easily devised for 
“filter” CAs. For example, if an NEL for a “filter” CA 
reads 


\[ S(n,t+1) \equiv S(n,t) + F\big(S(n+k_i,t),\, S(n-\tilde k_j,t+1)\big) \pmod{M} \qquad [22] \]


where k_i and \tilde k_j are positive integers 
(i=1,2,...,N; j=1,2,...,\tilde N) and F is an arbitrary 
(polynomial) function, then it is invertible and 
the inverse NEL reads 


\[ S(n,\tilde t+1) \equiv S(n,\tilde t) + (M-1)\,F\big(S(n+k_i,\tilde t+1),\, S(n-\tilde k_j,\tilde t)\big) \pmod{M} \qquad [23] \]

Note that [22] is computable starting from 
n = -∞, whereas [23] is computable starting from 
n = +∞. 








Example 11 (CA10) A 1.5_CA, M=2, V=0, R=3. 
The EL is: 


\[ S(n,t+1) \equiv S(n,t) + S(n-3,t+1)\,S(n-2,t+1) + S(n+2,t)\,S(n+3,t) + S(n-2,t+1)\,S(n-1,t+1) + S(n+1,t)\,S(n+2,t) \pmod{2} \qquad [24] \]

Note that this EL is of the form [22]; therefore, it 
is invertible (see Figure 18a). According to [23], the 
inverse EL reads: 


\[ S(n,\tilde t+1) \equiv S(n,\tilde t) + S(n+3,\tilde t+1)\,S(n+2,\tilde t+1) + S(n-2,\tilde t)\,S(n-3,\tilde t) + S(n+2,\tilde t+1)\,S(n+1,\tilde t+1) + S(n-1,\tilde t)\,S(n-2,\tilde t) \pmod{2} \qquad [25] \]

This CA exhibits a very rich dynamics: any 
complex ID rapidly decays into a great variety of coherent 
particle-like structures, steady or moving to the right or 
to the left with different velocities. The interactions 
between different particles may be solitonic (the 
particles emerge unchanged but shifted), or 
annihilation-creation phenomena can occur (see Figures 18a-d). 


Figure 18 CA10: (a) 230 time-step evolution, then the inverse EL is applied for 230 further time steps in order to recover the initial 
configuration. (b) Collisions between different kinds of particle-like coherent moving structures. The last collision (on the right) is 
a solitonic one: the interaction produces just a phase shift, preserving number, shape, and velocities of the involved “particles.” 
(c) “Particles” moving with different velocities and interacting in complex ways (solitonic collisions, particle creations and annihilations). 
(d) A particle goes through a nonhomogeneous medium and undergoes refraction by the medium itself. 
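
The order of computation of a “filter” CA (forward EL from the left, inverse EL from the right) can be made concrete with the ELs [24] and [25] of CA10. The sketch below is not the infinite CA of the text: it assumes a finite window of cells, with every cell outside the window held in the vacuum state 0 at all times, and the function names are hypothetical.

```python
def get(state, n):
    """State of cell n, with the vacuum state 0 outside the finite window."""
    return state[n] if 0 <= n < len(state) else 0


def ca10_forward(old):
    """EL [24], computed from the left so that S(n-k,t+1) is already known."""
    new = [0] * len(old)
    for n in range(len(old)):
        new[n] = (old[n]
                  + get(new, n - 3) * get(new, n - 2)
                  + get(old, n + 2) * get(old, n + 3)
                  + get(new, n - 2) * get(new, n - 1)
                  + get(old, n + 1) * get(old, n + 2)) % 2
    return new


def ca10_inverse(new):
    """Inverse EL [25], computed from the right."""
    old = [0] * len(new)
    for n in reversed(range(len(new))):
        old[n] = (new[n]
                  + get(old, n + 3) * get(old, n + 2)
                  + get(new, n - 2) * get(new, n - 3)
                  + get(old, n + 2) * get(old, n + 1)
                  + get(new, n - 1) * get(new, n - 2)) % 2
    return old


if __name__ == "__main__":
    start = [0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    x = start
    for _ in range(30):
        x = ca10_forward(x)
    for _ in range(30):
        x = ca10_inverse(x)
    print(x == start)
```

With M=2 the factor M-1 in [23] equals 1, so the inverse EL differs from the forward one only in which time level each product is taken from.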


Applications of CAs 


CAs as Universal Constructors and 
Turing Machines 


In the 1950s, von Neumann, who contributed to the 
development of the first computer (ENIAC), decided 
to work out a mathematical theory of automata. 
Such a theory was intended to answer the 
following question: is it possible to build an 
automaton such that it allows universal computation 
(i.e., it embodies a universal Turing machine) 
and, moreover, it is able to build (in order of 
decreasing generality) 


1. an arbitrary automaton (universal constructor); 

2. a copy of itself (self-reproducing); and 

3. an automaton that is itself a universal Turing 
machine (constructor)? 


The last question von Neumann intended to 
address was whether, in the process of automata 
self-reproduction (if possible), a process of evolution 
could take place, that is, whether a simpler automaton 
could generate a more complex one. 

In the beginning, the idea of von Neumann was to 
describe, using mathematical axioms, an automaton 
moving inside a warehouse and selecting various 
elementary spare parts (e.g., “muscles,” switches, rigid 
girders) and then assembling them into a new auto- 
maton. While this original idea was very realistic, it was 
also very difficult to pursue, so that von Neumann, 
following a suggestion by Ulam, decided to consider his 
questions in the more abstract framework of CAs. 

The particular CA he considered is an infinite 
square CA with 29 possible states. The transition rule 
is dependent upon the cell to update and its north, 
east, south, and west neighbor cell (the von Neumann 
neighborhood). Among the 29 possible states there is 
one state that is “quiescent” (the vacuum state). 

von Neumann proved the existence of a configura- 
tion of ~ 50000 cells immersed in a sea of quiescent 
states that embodies a universal Turing machine and 
that is a universal constructor. An infinite one- 
dimensional “tape” is used to store a description of 
the automaton to build. The universal constructor 
reads the description on the tape, develops a 
“constructing arm” that builds the configuration 
described on the tape in an unoccupied part of the 
cellular space, makes a copy of the tape and finally 
attaches it to the newly built automaton and retracts 


the constructing arm. When the tape stores a 
description of the universal constructor itself, the 
automaton self-reproduces. The total size of the self-reproducing 
automaton amounts to ~200000 cells. (Some computer 
simulations of von Neumann's self-reproducing 
automaton are available on the web.) 

Since von Neumann’s CA is a very complex one, 
it led researchers to think that a CA able to simulate 
a universal Turing machine should also be quite 
complex. The perspective changed completely after 
the introduction of CA Life. Conway was looking 
for a simple CA with a possibly rich dynamics; 
however, it was subsequently realized that Life was 
much more complicated than anyone could have 
thought. Finally, thanks to the development of faster 
computers that allowed visualization of the evolu- 
tion of quite large populations and through the 
contribution of a large number of researchers, it was 
proved that a universal Turing machine could be 
embedded in Life. 

The discovery that even a simple CA such as Life 
could incorporate a universal Turing machine led to 
the question whether it could be possible to build a 
universal Turing machine inside a simple one- 
dimensional CA. This is indeed the case: up to 
now, the simplest CA capable of universal computa- 
tion is the W110 CA (see Figure 10), as proved 
recently by Cook after a conjecture formulated by 
Wolfram in 1985. 


CAs for Computer Simulations 


One of the major applications of CAs is the 
computer simulation of various dynamical pro- 
cesses. Even if CAs were not invented for this 
purpose, they possess peculiarities that make them 
particularly suitable for this task. The main advan- 
tage of using a CA for a dynamical simulation is due 
to their completely discrete nature that allows exact 
simulations on a computer. Thus, any spurious 
effect due to rounding errors is ruled out. Another 
advantage is that the EL of a CA can be seen as a 
function between finite sets. For this reason, one can 
specify the EL through a “lookup table” (see [2]): 
then when running the simulations, the computer 
has only to access the table instead of computing the 
function every time, shortening considerably the 
computation time. Another great advantage of CAs 
in computer simulations is that, by their very nature 
(at least for local ELs), they can be implemented on 
parallel machines. These two concepts are at the 
basis of dedicated computers for CA simulations 
developed by Toffoli, Margolus, and co-workers 
(CAM series). The possibility of using parallel 
computers efficiently for CA simulation could prove 





Figure 19 A CA that “computes” the 3n+1 Collatz-Ulam 
map. The ID for the CA is the initial number for the iterated map 
(binary notation, order 2°°°, randomly chosen, displayed on the 
left vertical axis). The CA, according to the Collatz conjecture, 
ends up in the final stable configuration (horizontal line on the 
right for the CA, 1 → 4 → 2 → 1 for the map). 


to be fundamental when computer speeds approach 
saturation. Moreover, CAs themselves can mimic 
parallel computations, see, for example, Figure 19, 
where a nonlocal CA “computes” very efficiently the 
celebrated Collatz-Ulam 3n+1 map. 
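
The lookup-table idea can be illustrated on the unidimensional CAs with M=2 and a neighborhood of range 1, such as W110. The sketch below assumes Wolfram's numbering convention (the k-th binary digit of the rule number is the outcome for the neighborhood whose three states, read as a binary number, give k) and a finite ring of cells with periodic boundary; the function names are hypothetical.

```python
def rule_table(number):
    """Lookup table of a CA with M = 2, R = 1 from its Wolfram number.

    Entry 4*l + 2*c + r holds the next state of a cell whose left
    neighbor, own state, and right neighbor are l, c, r.
    """
    return [(number >> k) & 1 for k in range(8)]


def step(cells, table):
    """One synchronous update on a ring of cells (periodic boundary)."""
    W = len(cells)
    return [table[4 * cells[(i - 1) % W] + 2 * cells[i] + cells[(i + 1) % W]]
            for i in range(W)]


if __name__ == "__main__":
    table = rule_table(110)
    cells = [0] * 31 + [1]
    for _ in range(16):
        print("".join("#" if s else "." for s in cells))
        cells = step(cells, table)
```

Once the table is built, each update is a single indexed read per cell, and all cells can be updated independently of one another, which is what makes a parallel implementation natural.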


CAs in Physics 


Since Newton, physics has been described through 
differential equations and continuous functions. 
However, such a mathematical description is not 
fit for simulation on a computer, and some 
discretizations must be considered. First, one has to 
discretize space and time passing from differential 
equations to (finite systems of) finite difference 
equations; second, one has to round off the values 
of the functions to store them in the memory of the 
computer. The main drawback of this procedure is 
that in chaotic systems such approximations can 
rapidly lead to great differences between the real 
and the simulated behavior. As already noticed, this 
problem does not appear in CA. Thus, one would 
like to use this good characteristic of CAs in physical 
modeling taking due account of the continuous 
nature of the physics involved. This requires atten- 
tion and ingenuity in constructing reliable CA 
models for physical processes. For example, this 
goal has been achieved in the so-called lattice gas 
automata (LGAs). 

LGAs are CA models for the microscopic 
dynamics of fluids and gases. The thermodynamic 
limit of these CAs yields the correct continuous 
functions for the macroscopic quantities (density, 
pressure, viscosity, etc.). 

The first step toward LGAs was the discovery that 
the HPP model developed in the 1970s by Hardy, 
Pomeau and De Pazzis was in fact a CA. The HPP 
model describes the behavior of a fluid (or a gas) in 
a plane. The configuration space is given by a 
bidimensional square lattice and the particles are 
described by arrows lying on the edges of the lattices 
and pointing to some vertex (see Figure 20a). 

The particles are assumed to be all identical and 
with the same velocity, and particles on the same 
edge with the same direction are not allowed 
(exclusion principle). The EL prescribes that parti- 
cles move with unitary velocity along the edges in 
the direction pointed by the arrow (free flight) 
unless there are exactly two particles on the edges 
connected to a given vertex and they point in 
opposite directions (collision); in this case they are 
replaced by two arrows pointing outward on the 
previously empty edges (see Figure 20b). Clearly, 
the EL conserves the number and the momentum of 
the particles. 

The HPP model can be described algebraically. 
The admissible particle velocities are just 


\[ c_1 = +\hat{x}, \quad c_2 = +\hat{y}, \quad c_3 = -\hat{x}, \quad c_4 = -\hat{y} \qquad [26] \]







Figure 20 (a) An example of configuration for the HPP model. 
(b) Head on collisions and three particle collisions in the HPP 
model. 
Accordingly, only four bits n_j(x,t), j=1,2,3,4, are 
required to denote the presence (1) or the absence 
(0) of a particle with velocity c_j pointing to vertex x at 
time t. The dynamical rule for HPP can be written in 
the form 


\[ n_j(x+c_j,t+1) = n_j(x,t) + \omega_j(x,t) \qquad [27] \]


where the term n_j(x,t) on the right-hand side accounts 
for the free flight of particles, while ω_j(x,t) modifies 
the trajectories in the case of collisions. The ω_j are 
determined by the state of the system according to 
the following rules: 


\[ \omega_1 = -n_1(1-n_2)\,n_3(1-n_4) + (1-n_1)\,n_2(1-n_3)\,n_4 \qquad [28a] \]

\[ \omega_2 = -n_2(1-n_3)\,n_4(1-n_1) + (1-n_2)\,n_3(1-n_4)\,n_1 \qquad [28b] \]

\[ \omega_3 = -n_3(1-n_4)\,n_1(1-n_2) + (1-n_3)\,n_4(1-n_1)\,n_2 \qquad [28c] \]

\[ \omega_4 = -n_4(1-n_1)\,n_2(1-n_3) + (1-n_4)\,n_1(1-n_2)\,n_3 \qquad [28d] \]


It is plain that eqns [27] and [28] can be 
interpreted as the EL for a CA. 
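
The collision terms [28a]-[28d] can be checked exhaustively on the 16 possible states of a vertex. The sketch below covers only the collision part of [27] (the free flight along the lattice is not included), and its function names are hypothetical.

```python
from itertools import product

# Velocities [26]: c1 = +x, c2 = +y, c3 = -x, c4 = -y.
VELOCITIES = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def collide(n):
    """Occupation numbers n_j + omega_j at one vertex, eqs. [28a]-[28d]."""
    out = []
    for j in range(4):
        # The four rules are cyclic permutations of the first one.
        a, b, c, d = (n[(j + k) % 4] for k in range(4))
        omega = -a * (1 - b) * c * (1 - d) + (1 - a) * b * (1 - c) * d
        out.append(n[j] + omega)
    return tuple(out)


def momentum(n):
    """Total momentum of the particles at a vertex (unit mass assumed)."""
    return (sum(nj * cx for nj, (cx, _) in zip(n, VELOCITIES)),
            sum(nj * cy for nj, (_, cy) in zip(n, VELOCITIES)))


def conservation_holds():
    """Check number and momentum conservation on all 2**4 vertex states."""
    for n in product((0, 1), repeat=4):
        m = collide(n)
        if any(v not in (0, 1) for v in m):
            return False
        if sum(m) != sum(n) or momentum(m) != momentum(n):
            return False
    return True
```

Only the two head-on configurations (1,0,1,0) and (0,1,0,1) are modified by the collision terms; every other vertex state passes through unchanged.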

In the thermodynamic limit, the equations govern- 
ing the dynamics of the macroscopic quantities of 
the fluid are given by the continuity equation and by 
anisotropic Navier-Stokes equations. The aniso- 
tropy in the Navier-Stokes equations is due to the 
fact that the invariance group of the square lattice is 
too small. This problem was solved by Frisch, 
Hasslacher, and Pomeau in 1986, with the introduction 
of the FHP model. It turns out that a hexagonal 
lattice has enough symmetries to recover the 
isotropic Navier-Stokes equations in the thermodynamic 
limit. So, the FHP model is an example of a 
model where even if the microscopic dynamics is 
almost a caricature of the real dynamics, the 
thermodynamic limit gives rise to the correct 
physical equations. 

CAs have been used to simulate many other 
physical processes (unfortunately, there is no space 
here for a sufficiently elaborate description). The 
principal fields of application are: percolation 
theory, magnetism, diffusion phenomena, sandpiles, 
models of earthquakes, crystal growth, etc. 

The more intriguing aspect of some even simple CAs 
(e.g., CA9, CA10: see Figures 16 and 18) is their very 
rich particle-like dynamics. For instance, the existence 
of solitonic collisions suggested that the techniques 
recently developed to find and treat “integrable” 


nonlinear dynamical systems (nonlinear continuous 
and discrete evolution equations, many-body pro- 
blems) could profitably be extended to find “integr- 
able” CAs. Indeed, many such CAs have been found 
that exhibit “solitons” and are endowed with non- 
trivial conservation laws (of course, this is very 
important in physical modeling). Moreover, the 
above-cited similarity between certain CA behaviors 
and elementary particle physics phenomena suggests 
that the fundamental structure of reality (at the Planck 
level) could indeed be that of a CA (cells of Planck
length, discrete time flow): attempts to construct this 
underlying CA physics have been pursued. 


Other Applications 


CAs exhibit a great plasticity, which makes them 
well suited to model systems in a wide range of 
fields. This is mainly due to the fact that CAs with 
very simple rules can also simulate universal Turing 
machines, so that they can exhibit a very rich and 
complicated overall dynamics (in principle, one 
could simulate any dynamical system using a simple 
CA). There is another reason for the wide applic- 
ability of CA modeling even outside of physics: 
namely, it is well known that algorithms, not 
differential equations, are better instruments to 
schematize dynamical processes for complex and 
organized systems. Since simple algorithms can be 
naturally implemented on CAs, the latter are very 
useful for realizing simple models and simulations in 
many fields: biology, economics, ecology, neural 
networks, traffic models, etc. 

Moreover, applications of CAs in informatics and 
specifically in cryptography and data compression 
have been investigated. 


See also: Dynamical Systems in Mathematical Physics: 
An Illustration from Water Waves; Generic Properties of 
Dynamical Systems; Integrable Systems: Overview. 
Introduction


We consider differentiable dynamical systems gen- 
erated by a diffeomorphism or a vector field on a 
manifold. We restrict to the finite-dimensional case, 
although some of the ideas can also be developed in 
the general case (Vanderbauwhede and Iooss 1992). 
We also restrict to the behavior near a stationary 
point or a periodic orbit of a flow. 

Let the origin 0 of $\mathbb{R}^n$ be a stationary point of a $C^1$
vector field $X$, that is, $X(0) = 0$. We consider the
linear approximation $A = dX(0)$ of $X$ at 0 and its
spectrum $\sigma(A)$, which we decompose as $\sigma(A) = \sigma_s \cup
\sigma_c \cup \sigma_u$, where $\sigma_s$ resp. $\sigma_c$ resp. $\sigma_u$ consists of those
eigenvalues with real part $< 0$ resp. $= 0$ resp. $> 0$. If
$\sigma_c = \emptyset$ then there is no central manifold, and the
stationary point 0 is called hyperbolic. Let $E_s$, $E_c$,
and $E_u$ be the linear $A$-invariant subspaces corresponding
to $\sigma_s$ resp. $\sigma_c$ resp. $\sigma_u$. Then $\mathbb{R}^n = E_s \oplus
E_c \oplus E_u$. We look for corresponding $X$-invariant
manifolds in the neighborhood of 0, in the form of
graphs of maps. More precisely:


Theorem 1 Let the vector field $X$ above be of class
$C^r$ ($1 \le r < \infty$). There exist map germs $\phi_{ss}: (E_s, 0) \to
E_c \oplus E_u$, $\phi_{sc}: (E_s \oplus E_c, 0) \to E_u$, $\phi_{uu}: (E_u, 0) \to E_s \oplus E_c$,
$\phi_{cu}: (E_c \oplus E_u, 0) \to E_s$, and $\phi_c: (E_c, 0) \to E_s \oplus E_u$ of
class $C^1$ such that the graphs of these maps are
invariant for the flow of $X$. Moreover, these maps
are of class $C^r$, and their linear approximation at 0
is zero, that is, their graphs are tangent to,
respectively, $E_s$, $E_s \oplus E_c$, $E_u$, $E_c \oplus E_u$, and $E_c$. If $X$ is
of class $C^\infty$ then $\phi_{ss}$ and $\phi_{uu}$ are also of class $C^\infty$. If
$X$ is analytic then $\phi_{ss}$ and $\phi_{uu}$ are also analytic.








The graph of $\phi_c$ is called the (local) central
(or, center) manifold of $X$ at 0 and it is often
denoted by $W^c$. Thus, it is an invariant manifold
of $X$ tangent at 0 to the generalized eigenspace of
$dX(0)$ corresponding to the eigenvalues having zero
real part.


(Non) uniqueness, Smoothness 


Most proofs in the literature (Vanderbauwhede
1989) use a cutoff in order to construct globally
defined objects, and then obtain the invariant graph
as the solution of some fixed-point problem of a
contraction in an appropriate function space.
Although this solution is unique for the globalized
problem, this is not the case at the germ level:
another cutoff may produce a different germ of
a central manifold. In other words, locally a
central manifold might not be unique, as is
easily seen on the planar example $x^2\,\partial/\partial x -
y\,\partial/\partial y$. On the other hand, the $\infty$-jet of the map
$\phi_c$, in case of a $C^\infty$ vector field, is unique, so if
an analytic central manifold exists then
it is unique; in the foregoing example,
it is the $x$-axis. But for the (polynomial) example
$(x - y^2)\,\partial/\partial x + y^2\,\partial/\partial y$ one can calculate that the
$\infty$-jet of $x = \phi_c(y)$ is given by $j_\infty \phi_c(y) = \sum_{n \ge 1} n!\,y^{n+1}$,
which has a vanishing radius of convergence, so
there is no analytic central manifold. On the other
hand, by the Borel theorem we can choose a
$C^\infty$ representative for $\phi_c$. This can be generalized
in the planar case:


Proposition 1 If $n = 2$ and if $X$ is $C^\infty$ and if the
$\infty$-jet of $X$ in the direction of the central manifold
is nonzero, then this central manifold is $C^\infty$.
In particular, if $X$ is analytic then the central
manifold is either an analytic curve of stationary
points or is a $C^\infty$ curve along which $X$ has a
nonzero jet.
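
The divergent series in the polynomial example preceding Proposition 1 can be checked by hand or by machine. For $\dot x = x - y^2$, $\dot y = y^2$, a graph $x = \phi(y)$ is invariant when $\phi'(y)\,y^2 = \phi(y) - y^2$; comparing coefficients of $\phi(y) = \sum a_n y^n$ gives $a_0 = a_1 = 0$, $a_2 = 1$ and $a_{n+1} = n\,a_n$. The following Python sketch (function names are illustrative assumptions) computes the coefficients and verifies the invariance equation order by order.

```python
def center_manifold_coefficients(order):
    """Coefficients a_0..a_order of the formal series phi(y) = sum a_n y**n
    solving phi'(y) * y**2 = phi(y) - y**2."""
    a = [0] * (order + 1)
    # Degrees 0 and 1 give a_0 = a_1 = 0, degree 2 gives a_2 = a_1 + 1,
    # and degree n + 1 gives a_{n+1} = n * a_n.
    if order >= 2:
        a[2] = 1
    for n in range(2, order):
        a[n + 1] = n * a[n]
    return a


def invariance_residual(a):
    """Coefficients of phi'(y) * y**2 - phi(y) + y**2 up to degree len(a) - 1.

    Expects len(a) >= 3 so that the y**2 term fits.
    """
    order = len(a) - 1
    res = [0] * (order + 1)
    for n, coeff in enumerate(a):
        res[n] -= coeff                  # -phi(y)
        if n + 1 <= order:
            res[n + 1] += n * coeff      # phi'(y) * y**2 gives n a_n y**(n+1)
    res[2] += 1                          # + y**2
    return res
```

The coefficients are $a_n = (n-1)!$ for $n \ge 2$, that is, $\sum_{n \ge 1} n!\,y^{n+1}$, and the ratio $a_{n+1}/a_n = n$ is unbounded, which is why the radius of convergence vanishes.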


For proofs and additional reading, the reader is
referred to Aulbach (1992). In general, a central
manifold is not necessarily $C^\infty$ (van Strien 1979,
Arrowsmith and Place 1990): for the system in
$\mathbb{R}^3$ given by


one can find a $C^k$ central manifold for every $k$ but
there is no $C^\infty$ central manifold. Indeed, in this case
the domain of definition of $\phi_c$ shrinks to zero when
$k$ tends to infinity.
Central Manifold Reduction


The importance of a central manifold lies in the
principle of central manifold reduction, which
roughly says that for local bifurcation phenomena
it is enough to study the behavior on the central
manifold, that is, if two vector fields, restricted to
their central manifolds, have homeomorphic integral
curve portraits, and if the dimensions of $E_s$ and of $E_u$
are the same for both vector fields, then the two vector
fields have homeomorphic integral curve portraits in
$\mathbb{R}^n$, at least locally near 0. Let us be more precise:


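Theorem 2 Let $m$ be the dimension of $E_c$. There
exists $p$, $0 \le p \le n - m$, such that $X$ is locally
$C^0$-conjugate to

$$\sum_{i=1}^{m} X_i(z_1, \ldots, z_m)\,\frac{\partial}{\partial z_i} + \sum_{i=m+1}^{m+p} z_i\,\frac{\partial}{\partial z_i} - \sum_{i=m+p+1}^{n} z_i\,\frac{\partial}{\partial z_i}$$

where $(z_1, \ldots, z_m)$ is a coordinate system on a
central manifold, $(z_1, \ldots, z_n)$ is a coordinate system
on $\mathbb{R}^n$ extending $(z_1, \ldots, z_m)$ and $\sum_{i=1}^{m} X_i\,\partial/\partial z_i$
is the restriction of $X$ to a central manifold.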
Moreover, if 


and if $\sum_{i=1}^{m} Y_i\,\partial/\partial z_i$ is $C^0$-equivalent (resp. $C^0$-conjugate)
to $\sum_{i=1}^{m} X_i\,\partial/\partial z_i$ then $X$ is $C^0$-equivalent
(resp. $C^0$-conjugate) to $Y$.


For a proof and further reading (a generalization) 
see Palis and Takens (1977). 

In case more smoothness than just $C^0$ is
needed, we have the principle of normal linearization
along the central manifold. More concretely, let
$x$ denote a coordinate in the central manifold and
let $y$ be a complementary variable, that is, let
$X = X_c\,\partial/\partial x + X_h\,\partial/\partial y$. We define the normally
linear part along the central manifold by

$$NX = X_c(x, 0)\,\frac{\partial}{\partial x} + \frac{\partial X_h}{\partial y}(x, 0)\,y\,\frac{\partial}{\partial y}$$

Under certain nonresonance conditions (Takens
1971, Bonckaert 1997) on the real parts of the
eigenvalues of $dX(0)$, there exists a $C^r$ local
conjugacy between $X$ and $NX$ for each $r \in \mathbb{N}$
(assuming $X$ to be of class $C^\infty$). If there are
resonances, then one can conjugate with the
so-called seminormal or renormal form containing
higher-order terms (see Bonckaert (1997, 2000) and
references therein; here one can also find results for
cases where extra constraints should be respected,
like symmetry, reversibility, or invariance of some
given foliation etc.).


Parameters 


Having an eigenvalue with zero real part is
ungeneric, so in bifurcation problems we consider
$p$-parameter families $X_\lambda$ near, say, $\lambda = 0$. With
respect to the results above, we remark that such a
family can be considered as a vector field near
$(0, 0) \in \mathbb{R}^n \times \mathbb{R}^p$ tangent to the leaves $\mathbb{R}^n \times \{\lambda\}$. In
fact, the parameter direction $\mathbb{R}^p$ is contained in $E_c$.
In all the results mentioned, this structure “of being
a family” is respected. For example, in Theorem 2
we replace $X_i(z_1, \ldots, z_m)$ by $X_i(z_1, \ldots, z_m, \lambda)$. Hence,
if the restriction of $X_\lambda$ to the central manifold is a
versal unfolding of the restriction of $X_0$, then $X_\lambda$ is a
versal unfolding of $X_0$. By this, the search for versal
unfoldings is reduced to the unfolding of singularities
whose linear approximation at 0 has a purely
imaginary spectrum.


Diffeomorphisms, Periodic Orbits 


A completely analogous theory can be developed for
fixed points of diffeomorphisms $f: (\mathbb{R}^n, 0) \to \mathbb{R}^n$.
Here we split up the spectrum of the linear part
$L = df(0)$ at 0 as $\sigma(L) = \sigma_s \cup \sigma_c \cup \sigma_u$, where $\sigma_s$ resp.
$\sigma_c$ resp. $\sigma_u$ consists of those eigenvalues with
modulus $< 1$ resp. $= 1$ resp. $> 1$. This theory can be
applied to the time-$t$ map of a vector field (and will
give the same invariant manifolds) and to the
Poincaré map of a transversal section of a periodic
orbit of a vector field (Chow et al. 1994).
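
The two splittings of the spectrum are compatible: for $t > 0$ the time-$t$ map of a linear vector field has multipliers $e^{\lambda t}$ with $|e^{\lambda t}| = e^{t\,\mathrm{Re}\,\lambda}$, so the sign of the real part and the position of the modulus relative to 1 select the same eigenvalues. A small Python sketch of this bookkeeping follows; the function names and the numerical tolerance are assumptions made for illustration.

```python
import cmath


def split_flow(eigenvalues, tol=1e-9):
    """Indices of eigenvalues with real part < 0 (s), = 0 (c), > 0 (u)."""
    parts = {"s": [], "c": [], "u": []}
    for k, lam in enumerate(eigenvalues):
        re = complex(lam).real
        key = "c" if abs(re) <= tol else ("s" if re < 0 else "u")
        parts[key].append(k)
    return parts


def split_map(multipliers, tol=1e-9):
    """Indices of multipliers with modulus < 1 (s), = 1 (c), > 1 (u)."""
    parts = {"s": [], "c": [], "u": []}
    for k, mu in enumerate(multipliers):
        r = abs(mu)
        key = "c" if abs(r - 1) <= tol else ("s" if r < 1 else "u")
        parts[key].append(k)
    return parts


def time_t_multipliers(eigenvalues, t):
    """Eigenvalues exp(lambda * t) of the time-t map of the linear flow."""
    return [cmath.exp(complex(lam) * t) for lam in eigenvalues]
```

For the spectrum `[-1, 2j, -2j, 0.5]`, both classifications put index 0 in the stable part, indices 1 and 2 in the central part, and index 3 in the unstable part.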


Normal Forms 


The general idea of a normal form is to put a 
(complicated) system into a form “as simple as 
possible” by means of a change of coordinates. This 
idea was already developed to a great extent by 
H Poincaré. Simple examples are: (1) putting a square 
matrix into Jordan form, (2) the flow box theorem 
(Arrowsmith and Place 1990) near a nonsingular 
point. Depending on the context and on the purpose 
of the simplification, this concept may vary greatly. It 
depends on the kind of changes of coordinates that are 
tolerated (linear, polynomial, formal series, smooth, 
analytic) and on the possible structures that must be 
preserved (e.g., symplectic, volume-preserving, sym- 
metric, reversible etc.). Let us restrict to local normal 
forms, that is, in the vicinity of a stationary point of a 
vector field or a diffeomorphism (the latter can be 


applied to the Poincaré map of a periodic orbit). We 
concentrate on the simplification of the Taylor series. 
The general idea is to apply consecutive polynomial 
changes of variables; at each step we simplify terms of 
a degree higher than in the step before. The ideal 
simplification would be to put all higher-order terms 
to zero, which would (at least at the level of formal 
series) linearize the system. But as soon as there are 
resonances (see below), this is impossible: the planar
system $x\,\partial/\partial x + (2y + x^2)\,\partial/\partial y$ cannot be formally
linearized, because the term $x^2\,\partial/\partial y$ is resonant.


Setting 


Let $X$ be a $C^{r+1}$ vector field defined on a neighborhood
of $0 \in \mathbb{R}^n$, and denote $A = dX(0)$ (its linear
approximation at 0). The Taylor expansion of $X$ at
0 takes the form

$$X(x) = A \cdot x + \sum_{k=2}^{r} X_k(x) + O(|x|^{r+1})$$

where $X_k \in H^k$, the space of vector fields whose
components are homogeneous polynomials of
degree $k$. The classical formal normal-form theorem
is as follows. We define the operator $L_A$ on $H^k$ by
putting $L_A h(x) = dh(x) \cdot A \cdot x - A \cdot h(x)$; one calls $L_A$
the homological operator. One checks that
$L_A(H^k) \subset H^k$. One also denotes this by $\mathrm{ad}\,A(h)(x)$:
see further in the Lie algebra setting. Let $R^k$ be the
range of $L_A$, that is, $R^k = L_A(H^k)$. Let $G^k$ denote any
complementary subspace to $R^k$ in $H^k$. The formal
normal-form theorem states, under the above
settings:
Theorem 3 (Chow et al. 1994, Dumortier 1991)
There exists a composition of near identity changes
of variables of the form

$$x = y + \xi_k(y)$$ [1]

where the components of $\xi_k$ are homogeneous
polynomials of degree $k$, such that the vector field
$X$ is transformed into

$$Y(y) = A \cdot y + \sum_{k=2}^{r} g_k(y) + O(|y|^{r+1})$$

where $g_k \in G^k$, $k = 2, \ldots, r$.


Sometimes this theorem is applied to the restric- 
tion of a vector field to its central manifold, for 
reasons explained in the last section. This is the 
reason why we did not assume $X$ to be $C^\infty$; in the
latter case one can let $r \to \infty$ and obtain a normal
form on the level of formal Taylor series (also called
$\infty$-jets). Using a theorem of Borel, we infer the
existence of a $C^\infty$ change of variables $\Phi$ such that
the Taylor series of $\Phi_*(X)$ is $A \cdot y + \sum_{k \ge 2} g_k(y)$. For
practical computations, it is often appropriate to
first simplify the linear part $A$ and to diagonalize it
whenever possible. Hence, it is convenient to use a
complexified setting and to use complex polynomials
or power series. One can show that all
involved changes of variables preserve the property
of “being a complex system coming from a real
system,” that is, at the final stage we can return to a
real system (see, e.g., Arrowsmith and Place (1990)
for a more precise mathematical description).
Hence, we can assume that $A$ is an upper
triangular matrix. Let the eigenvalues be $\lambda_1, \ldots, \lambda_n$.
It can be calculated that the eigenvalues of $L_A$, as an
operator $H^k \to H^k$, are then the numbers $\langle\lambda, \alpha\rangle - \lambda_j$
where $\alpha \in \mathbb{N}^n$, $\sum_{i=1}^{n} \alpha_i = k$ and $1 \le j \le n$. Hence, if
these are all nonzero then $R^k = H^k$, and then
we have an ideal simplification, that is, all $g_k$ equal
to zero. However, if such a number is zero, that is,

$$\langle\lambda, \alpha\rangle - \lambda_j = 0$$ [2]


it is called a resonance between the eigenvalues. In
such a case, we have to choose a complementary
space $G^k$. From linear algebra it follows that one
can always choose

$$G^k = \ker(L_{A^*})$$ [3]

where $A^*$ is the adjoint operator. But this choice [3] is
not unique and is, from the computational point of
view, not always optimal, especially if there are
nilpotent blocks. This fact has been exploited by
many authors. A typical example is the case where
$A = y\,\partial/\partial x$. On the other hand, if $A$ is semisimple we
can choose the complementary space to be $\ker(L_A)$, so
$L_A g_k = 0$; we can assume $A$ to be the (complex)
matrix $\mathrm{diag}[\lambda_1, \ldots, \lambda_n]$. In that case we can be more
explicit as follows. Let $e_j = \partial/\partial x_j$ denote the standard
basis on $\mathbb{C}^n$. For a monomial one can calculate that

$$L_A(x^\alpha e_j) = (\langle\lambda, \alpha\rangle - \lambda_j)\,x^\alpha e_j$$ [4]

If the latter is zero, then the monomial is called
resonant. This implies that the normal form can be
chosen so that it only contains resonant monomials.
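
For a diagonal linear part, finding the resonant monomials is a finite search in each degree. The following Python sketch lists all pairs $(\alpha, j)$ with $2 \le |\alpha| \le$ `max_degree` for which the factor in [4] vanishes. The function name, the 1-based index $j$, and the tolerance are illustrative assumptions.

```python
from itertools import product


def resonances(eigenvalues, max_degree, tol=1e-9):
    """Pairs (alpha, j) with 2 <= |alpha| <= max_degree and
    <lambda, alpha> - lambda_j = 0; j is 1-based as in the text."""
    n = len(eigenvalues)
    found = []
    for alpha in product(range(max_degree + 1), repeat=n):
        if not 2 <= sum(alpha) <= max_degree:
            continue
        inner = sum(lam * a for lam, a in zip(eigenvalues, alpha))
        for j, lam_j in enumerate(eigenvalues, start=1):
            if abs(inner - lam_j) < tol:
                found.append((alpha, j))
    return found
```

For eigenvalues $(1, 2)$ the only resonance up to degree 3 is $\alpha = (2, 0)$, $j = 2$, that is, the monomial $x_1^2\,e_2$. For eigenvalues $(i, -i)$ the search up to degree 3 returns $\alpha = (2, 1)$ with $j = 1$ and $\alpha = (1, 2)$ with $j = 2$.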

Putting a system into normal form not only
simplifies the original system, it also gives more
geometric insight on the Taylor series. To be more
precise, suppose (for simplicity, this can be generalized
(Dumortier 1997)) that $A$ is semisimple. One
can calculate that the condition $L_A g_k = 0$ implies:
$\exp(-At)\,g_k(\exp(At)x) = g_k(x)$ for all $t \in \mathbb{R}$. This
means that $g_k$ is invariant for the one-parameter
group $\exp(At)$. A typical example in the plane
is: $A$ has eigenvalues $i\lambda$, $-i\lambda$. Note that the (only)
resonances are $\langle(i\lambda, -i\lambda), (p + 1, p)\rangle - i\lambda = 0$ and
$\langle(i\lambda, -i\lambda), (p, p + 1)\rangle + i\lambda = 0$ for all $p \in \mathbb{N}$. We
suppose that the original system was real, that is,
on $\mathbb{R}^2$; we can choose linear coordinates such that
for $z = x + iy$, $\bar z = x - iy$ the linear part is
$A = \mathrm{diag}[i\lambda, -i\lambda]$. Applying the remarks above,
we conclude that the normal form only contains the
monomials $(z\bar z)^p z\,\partial/\partial z$ and $(z\bar z)^p \bar z\,\partial/\partial \bar z$. The geometric
interpretation here is that these monomials
are invariant for rotations around (0,0). This can
also be seen on the real variant of this: the Taylor
series of the (real) normalized system has the
form $(\lambda + f(x^2 + y^2))(x\,\partial/\partial y - y\,\partial/\partial x) + g(x^2 + y^2)
(x\,\partial/\partial x + y\,\partial/\partial y)$ and is invariant for rotations.
Warning: the dynamic behavior of a formal normal
form in the central manifold can be very different
from that of the original vector field, since we are
only looking at the formal level. A trivial example is
(take $f = g = 0$ in the foregoing example) $X(x, y) =
\lambda(x\,\partial/\partial y - y\,\partial/\partial x) - \exp(-1/x^2)\,x\,\partial/\partial x$, where orbits
near (0,0) spiral to (0,0), whereas the normal form
is just a linear rotation. This difference is due to the
so-called flat terms, that is, the difference between
the transformed vector field and a $C^\infty$ realization of
its normalized Taylor series (or polynomial). In case
of analyticity of $X$, one can ask for analyticity of the
normalizing transformation $\Phi$. Generically, this is
not the case in many situations. The precise meaning
of this “genericity condition” is too elaborate to
explain in this brief review article. We provide some
suggestions for further reading in the next section.
One could roughly say that, in the central manifold,
the normal form has too much symmetry and is too
poor to model more complicated dynamics of the
system, which can be “hidden in the flat terms.” To
quote Il’yashenko (1981): “In the theory of normal
forms of analytic differential equations, divergence
is the rule and convergence the exception ....”

In many applications, we want to preserve some 
extra structure, such as a symplectic structure, a 
volume form, some symmetry, reversibility, some 
projection etc.; the case of a projection is important 
since it includes vector fields depending on a para- 
meter. Sometimes a superposition of these structures 
appears (e.g., a family of volume-preserving systems). 
We would like that the normal-form procedure 
respects this structure at each step. One can often 
formulate this in terms of vector fields belonging to 
some Lie subalgebra $\mathcal{L}_0$. The idea is then to use
changes of variables like [1], where $\xi_k$ is then generated
by a vector field in $\mathcal{L}_0$. This will guarantee that all
changes of variables are “compatible” with the extra
structure. Unlike the general case where we could
work with monomials as in [4], we will have to
consider vector fields $h_k$ in $\mathcal{L}_0$ whose components are
homogeneous polynomials of degree $k$. If this can be
done, one says that $\mathcal{L}_0$ respects the grading by the
homogeneous polynomials. In order to fix ideas,
suppose that $\mathcal{L}_0$ consists of the divergence-free planar
vector fields. Note that a monomial $x^i y^j\,\partial/\partial x$ is in
general not divergence free. We can instead use time mappings of
homogeneous vector fields of the form
$a(q + 1)x^{p+1} y^q\,\partial/\partial x - a(p + 1)x^p y^{q+1}\,\partial/\partial y$. Up to terms
of higher order we can use the time-one map of $h_k$
instead of $x + h_k(x)$. In case that one asks for a $C^\infty$
realization of the normalizing transformation, we need
an extra assumption on the extra structure, that is, on
$\mathcal{L}_0$, called the Borel property: denote by $J_{\infty,0}$ the set of
formal series such that each truncation is the Taylor
polynomial of an element of $\mathcal{L}_0$. The extra assumption
is: each element of $J_{\infty,0}$ must be the Taylor series of a
$C^\infty$ vector field in $\mathcal{L}_0$. It can be proved (Broer 1981)
that the following structures respect the grading and 
satisfy the Borel property: being an r-parameter family, 
respecting a volume form on R”, being a Hamiltonian 
vector field (n even), and being reversible for a linear 
involution. 
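
The divergence-free example above is easy to verify mechanically. In the following Python sketch a planar polynomial vector field $P\,\partial/\partial x + Q\,\partial/\partial y$ is stored as two dictionaries mapping exponent pairs $(i, j)$ to the coefficient of $x^i y^j$; this representation and the function names are assumptions chosen for illustration.

```python
def divergence(P, Q):
    """Divergence dP/dx + dQ/dy of the planar field P d/dx + Q d/dy.

    P and Q map exponent pairs (i, j) to coefficients of x**i * y**j.
    The result uses the same representation, with zero terms dropped.
    """
    div = {}
    for (i, j), c in P.items():
        if i > 0:
            div[(i - 1, j)] = div.get((i - 1, j), 0) + i * c
    for (i, j), c in Q.items():
        if j > 0:
            div[(i, j - 1)] = div.get((i, j - 1), 0) + j * c
    return {k: v for k, v in div.items() if v != 0}


def divergence_free_field(a, p, q):
    """The field a(q+1) x^(p+1) y^q d/dx - a(p+1) x^p y^(q+1) d/dy."""
    P = {(p + 1, q): a * (q + 1)}
    Q = {(p, q + 1): -a * (p + 1)}
    return P, Q
```

The two contributions to the divergence are $\pm a(p + 1)(q + 1)\,x^p y^q$ and cancel, whereas a single monomial such as $x^2 y\,\partial/\partial x$ has divergence $2xy$.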

One could consider other types of grading of the 
Lie-algebras involved. 

This method, using the framework of the so-called 
filtered Lie algebras, is explained and developed 
systematically in a more general and abstract 
context in Broer (1981). 

In nonlocal bifurcations, such as near a homo- 
clinic loop, for example, it is not enough to perform 
central manifold reduction near the singularity: a 
simplified smooth model in a full neighborhood of 
the singularity is often needed, for example, in order 
to compute Poincaré maps. 

Let us start with the “purely” hyperbolic case (i.e.,
$\dim E_c = 0$). First we compute the formal normal
form such as the above. If there are no resonances
[2] then we can formally linearize the vector field $X$.
If $X$ is $C^\infty$ then a classical theorem of Sternberg
(1958) states that this linearization can be realized
by a $C^\infty$ change of variables (i.e., no more flat terms
remaining). In case there are resonances, we must
allow nonlinear terms: the resonant monomials. In
this case we can also reduce to this normal form by a
$C^\infty$ change of variables.
Using the same methods, it is also possible to reduce
to a polynomial normal form, but this time using
$C^k$ ($k < \infty$) changes of variables. More precisely, if $k$
is a given number and if we write the vector field as
$X = X_N + R_N$, where $X_N$ is the Taylor polynomial
up to order $N$ (which can be assumed to be in
normal form) and where $R_N(x) = O(|x|^{N+1})$, then for
$N$ sufficiently large there is a $C^k$ change of variables
conjugating $X$ to $X_N$ near 0. The number $N$ depends
on the spectrum of $A = dX(0)$. An elegant proof of
these facts can be found in Il’yashenko and Yakovenko
(1991). For the case when extra structure must be
preserved, see Bonckaert (1997), which also deals with
the partially hyperbolic case ($\dim E_c \ge 1$). As already
remarked above, the case of a parameter-dependent 
family can be regarded as a partially hyperbolic 
stationary point preserving this extra structure. 

The question of an analytic normal form, also in 
the hyperbolic case, leads to convergence questions 
and calls upon the so-called small-divisor problems. 
The classical results are due to Poincaré and Siegel. 
Let us summarize them; they are formulated in the 
complex analytic setting: 


Theorem 4 


(i) If the convex hull of the spectrum of $A$ does not
contain $0 \in \mathbb{C}$ then $X$ can locally be put into
normal form by an analytic change of variables.
Moreover, this normal form is polynomial.

(ii) If the spectrum $\{\lambda_1, \ldots, \lambda_n\}$ of $A$ satisfies the
condition that there exist $C > 0$ and $\mu > 0$ such
that for any $m \in \mathbb{N}^n$ with $|m| = \sum_j m_j \ge 2$:

$$|\langle\lambda, m\rangle - \lambda_j| \ge \frac{C}{|m|^{\mu}}$$ [5]

for $1 \le j \le n$ then $X$ can be locally linearized by
an analytic change of variables.


Note that case (i) contains the case where 0 is a
hyperbolic source or sink. This case (i) in Theorem 4
can be extended if there are parameters: if $X$
depends analytically on a parameter $\varepsilon \in \mathbb{C}^p$ near
$\varepsilon = 0$ then the change of variables is also analytic in
$\varepsilon$; moreover, the normal form is then a polynomial
in the space variables whose coefficients are analytically
dependent on the parameter $\varepsilon$.

For case (ii) this is surely not the case, since the
condition [5] is fragile: a small distortion of the
parameter generically causes resonances, albeit of a
high order. To fix ideas, consider $n = 2$ and suppose
$\lambda_1 < 0 < \lambda_2$. By a generic but arbitrarily small
perturbation, we can have that the ratio of these
eigenvalues becomes a negative rational number
$-p/q$, which gives a resonance of the form [2]
with $j = 1$ and $\alpha = (q + 1, p)$, so [5] is violated.
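
The resonance produced by a rational eigenvalue ratio can be written down explicitly: if $\lambda_1/\lambda_2 = -p/q$ then $q\lambda_1 + p\lambda_2 = 0$, hence $\langle\lambda, (q + 1, p)\rangle - \lambda_1 = 0$. A short Python check with exact rational arithmetic follows; the function name is an illustrative assumption.

```python
from fractions import Fraction


def saddle_resonance(lambda1, lambda2):
    """For rational lambda1 < 0 < lambda2 with lambda1 / lambda2 = -p/q
    (lowest terms), return alpha = (q + 1, p) and the value of
    <lambda, alpha> - lambda1, which should be exactly zero."""
    l1, l2 = Fraction(lambda1), Fraction(lambda2)
    if not l1 < 0 < l2:
        raise ValueError("expected lambda1 < 0 < lambda2")
    ratio = l1 / l2
    p, q = -ratio.numerator, ratio.denominator
    alpha = (q + 1, p)
    defect = alpha[0] * l1 + alpha[1] * l2 - l1
    return alpha, defect
```

For $\lambda_1 = -3$, $\lambda_2 = 2$ this gives $\alpha = (3, 3)$, a resonance of order 6. Ratios with large $p + q$ give resonances of correspondingly high order, which is the sense in which condition [5] is fragile.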

So analytic linearization, or even a polynomial 
analytic normal form, is ungeneric for families of 
such hyperbolic stationary points. The search for 
analytic normal forms, that is, simplified models, for 
families is still under investigation. A first simplifica- 
tion is obtained via the stable and unstable manifold 
from Theorem 1, that is, the graphs of $\phi_{ss}$ and $\phi_{uu}$.
When $X$ is analytic near 0 then these manifolds are
also analytic. So, up to an analytic change of variables,
we can assume that $E_s$ and $E_u$ are invariant, which
gives a simplification of the expression of X. More- 
over, there is analytic dependence on parameters. 
For local diffeomorphisms there are completely
similar theorems pertaining to all the cases consid- 
ered above. 


Concluding Remarks 


The concept of central manifold can be extended to 
more general invariant sets (see Chow et al. (2000) 
and references therein). It can also be extended to 
the infinite-dimensional case and can be applied to 
partial differential equations (Vanderbauwhede and 
Iooss 1992). 

Concerning the generic divergence of normalizing 
transformations, the reader is referred to Broer and 
Takens (1989), Bruno (1989), Il’yashenko (1981), and 
Il’yashenko and Pyartli (1991). Although the power
series giving the normalizing transformation generally 
diverges, the study of the dynamics is often performed 
by truncating the normal form at a certain order. 
Recently, Iooss and Lombardi (2005) considered the 
question as to what an optimal truncation is. It is 
shown, in case dX(0) is semisimple, that the order of 
the normal form can be optimized so that the remainder 
satisfies some estimate shrinking exponentially fast to 
zero as a function of the radius of the domain. 

Concerning normal forms preserving the 
Hamiltonian structure, see Birkhoff (1966) and 
Siegel and Moser (1995) for a starting point; this is 
an extended subject on its own, sometimes called 
Birkhoff normal form, and it would require another 
review article. 

Further simplifications of the normal form can 
sometimes be obtained by taking into account 
nonlinear terms (instead of just A) in order to obtain 
reductions of higher-order terms (see Gaeta (2002) 
and especially the references therein). 

Applications of normal forms and central mani- 
folds to bifurcation theory have been explained in 
Dumortier (1991). 


See also: Averaging Methods; Bifurcation Theory; 
Dynamical Systems and Thermodynamics; Dynamical 
Systems in Mathematical Physics: An Illustration from 
Water Waves; Finite Group Symmetry Breaking; 
Korteweg—de Vries Equation and Other Modulation 
Equations; Multiscale Approaches; Normal Forms and 
Semiclassical Approximation; Symmetry and Symmetry 
Breaking in Dynamical Systems. 
Introduction


Consider a typical quantum system such as a string 
of ions in a trap. To “process” the quantum 
information the ions carry, we have to perform in 
general many steps of a quite different nature. 
Typical examples are: free time evolution (including 
unwanted but unavoidable interactions with the 
environment), controlled time evolution (e.g., the 
application of a “quantum gate” in a quantum 
computer), preparations and measurements. Each 
processing step can be described by a channel which 
transforms input systems into output systems of a
possibly different type (e.g., a measurement trans- 
forms quantum systems into classical information). 


Systems, States, and Algebras 


To get a unified mathematical description of systems
of different physical nature, it is useful to consider
$C^*$-algebras (which are, in our case, always finite
dimensional): quantum systems can be represented
in terms of the algebra $\mathcal{B}(\mathcal{H})$ of (bounded) operators
on the Hilbert space $\mathcal{H} = \mathbb{C}^d$; for classical information
we have to choose the set $\mathcal{C}(X)$ of (continuous),
complex-valued functions on the finite alphabet $X$;
and the tensor product of both, $\mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X)$,
describes hybrid systems which are half-classical
and half-quantum. Assume now that $\mathcal{A}$ is one of
these algebras. Effects (i.e., yes/no measurements on
the system in question) are then described by $A \in \mathcal{A}$
satisfying $0 \le A \le 1$, states are positive, normalized
linear functionals $\omega: \mathcal{A} \to \mathbb{C}$, and the probability to
get the result “yes” during an $A$ measurement on a
system in the state $\omega$ is given by $\omega(A)$. Since $\mathcal{A}$ is
assumed to be finite dimensional, each state $\omega$ on
$\mathcal{B}(\mathcal{H})$ is represented by a density operator $\rho$, that is,
$\omega(A) = \mathrm{tr}(\rho A)$. Likewise, a state $\omega$ on $\mathcal{C}(X)$ has the
form $\omega(A) = \sum_x A(x)\,p_x$, where $(p_x)_{x \in X}$ denotes a
probability distribution on $X$, and a state $\omega$ on
$\mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X)$ is described by a sequence $(\rho_x)_{x \in X}$ of
positive (trace-class) operators in $\mathcal{B}(\mathcal{H})$ with
$\sum_x \mathrm{tr}(\rho_x) = 1$ such that $\omega(A) = \sum_x \mathrm{tr}(\rho_x A_x)$. Here
we have used the fact that an element $A \in \mathcal{B}(\mathcal{H}) \otimes
\mathcal{C}(X)$ can be represented in a canonical way by a
sequence $(A_x)_{x \in X}$ of operators on $\mathcal{H}$. The set of
states will be denoted in the following by $\mathcal{S}(\mathcal{A})$ and
the set of effects by $\mathcal{E}(\mathcal{A})$.
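
The three probability formulas above can be evaluated directly for small systems. The following Python sketch does so for a two-letter alphabet and a qubit; the data layout (dictionaries indexed by letters, matrices as nested lists) and the function names are assumptions chosen for illustration.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def trace(X):
    """Trace of a square matrix."""
    return sum(X[i][i] for i in range(len(X)))


def classical_expectation(p, A):
    """omega(A) = sum_x A(x) p_x for a probability distribution p on X."""
    return sum(A[x] * p[x] for x in p)


def quantum_expectation(rho, A):
    """omega(A) = tr(rho A) for a density operator rho."""
    return trace(matmul(rho, A))


def hybrid_expectation(rhos, As):
    """omega(A) = sum_x tr(rho_x A_x) for positive rho_x with total trace 1."""
    return sum(trace(matmul(rhos[x], As[x])) for x in rhos)
```

For example, with $p = (1/4, 3/4)$ and the effect that asks "is the letter a?", the classical formula returns $1/4$; for the maximally mixed qubit state and the projector onto the first basis vector, the quantum formula returns $1/2$.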


Completely Positive Maps 


Our aim is now to get a mathematical object which
can be used to describe a channel. To this end,
consider two $C^*$-algebras, $\mathcal{A}$, $\mathcal{B}$, describing the input
and output system, respectively, and an effect $A \in \mathcal{B}$
of the output system. If we invoke first a channel
which transforms $\mathcal{A}$ systems into $\mathcal{B}$ systems, and
measure $A$ afterwards on the output systems, we end
up with a measurement of an effect $T(A)$ on the
input systems. Hence, we get a map $T: \mathcal{E}(\mathcal{B}) \to \mathcal{E}(\mathcal{A})$
which completely describes the channel (note that
the direction of the mapping arrow is reversed
compared to the natural ordering of processing).
Alternatively, we can look at the states and interpret
a channel as a map $T^*: \mathcal{S}(\mathcal{A}) \to \mathcal{S}(\mathcal{B})$ which transforms
$\mathcal{A}$ systems in the state $\rho \in \mathcal{S}(\mathcal{A})$ into $\mathcal{B}$ systems
in the state $T^*(\rho)$. To distinguish between both
maps, we can say that $T$ describes the channel in the
Heisenberg picture and $T^*$ in the Schrödinger
picture. On the level of the statistical interpretation,
both points of view should coincide of course, that
is, the probabilities $(T^*\rho)(A)$ and $\rho(TA)$ to get the
result “yes” during an $A$ measurement on $\mathcal{B}$ systems
in the state $T^*\rho$, respectively a $TA$ measurement on
$\mathcal{A}$ systems in the state $\rho$, should be the same. Since
$(T^*\rho)(A)$ is linear in $A$, we see immediately that $T$
must be an affine map, that is, $T(\lambda_1 A_1 + \lambda_2 A_2) =
\lambda_1 T(A_1) + \lambda_2 T(A_2)$ for each convex linear combination
$\lambda_1 A_1 + \lambda_2 A_2$ of effects in $\mathcal{B}$, and this in turn
implies that $T$ can be extended naturally to a linear
map, which we will identify in the following with
the channel itself, that is, we say that $T$ is the
channel.

Let us now change slightly our point of view and
start with a linear operator $T: \mathcal{A} \to \mathcal{B}$. To be a
channel, $T$ must map effects to effects, that is, $T$ has
to be positive: $T(A) \ge 0$ for all $A \ge 0$, and bounded from
above by 1, that is, $T(1) \le 1$. In addition, it is natural
to require that two channels in parallel are again a
channel. More precisely, if two channels $T: \mathcal{A}_1 \to \mathcal{B}_1$
and $S: \mathcal{A}_2 \to \mathcal{B}_2$ are given, we can consider the map
$T \otimes S$ which associates to each $A \otimes B \in \mathcal{A}_1 \otimes \mathcal{A}_2$ the
tensor product $T(A) \otimes S(B) \in \mathcal{B}_1 \otimes \mathcal{B}_2$. It is natural to
assume that $T \otimes S$ is a channel which converts
composite systems of type $\mathcal{A}_1 \otimes \mathcal{A}_2$ into $\mathcal{B}_1 \otimes \mathcal{B}_2$
systems. Hence, $T \otimes S$ should be positive as well.


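Definition 1 Consider two observable algebras
$\mathcal{A}$, $\mathcal{B}$ and a linear map $T: \mathcal{A} \to \mathcal{B} \subset \mathcal{B}(\mathcal{H})$.

(i) $T$ is called positive if $T(A) \ge 0$ holds for all
positive $A \in \mathcal{A}$.

(ii) $T$ is called completely positive (CP) if $T \otimes
\mathrm{Id}: \mathcal{A} \otimes \mathcal{B}(\mathbb{C}^n) \to \mathcal{B}(\mathcal{H}) \otimes \mathcal{B}(\mathbb{C}^n)$ is positive for
all $n \in \mathbb{N}$. Here Id denotes the identity map
on $\mathcal{B}(\mathbb{C}^n)$.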

(iii) T is called unital if T(1)=1 holds. 


Consider now the map $T^*: \mathcal{B}^* \to \mathcal{A}^*$ which is dual
to $T$, that is, $T^*\rho(A) = \rho(TA)$ for all $\rho \in \mathcal{B}^*$ and $A \in \mathcal{A}$.
It is called the Schrödinger-picture representation of
the channel $T$, since it maps states to states provided $T$
is unital. (Complete) positivity can be defined in the
Schrödinger picture as in the Heisenberg picture, and
we immediately see that $T$ is (completely) positive iff
$T^*$ is.

It is natural to ask whether the distinction 
between positivity and complete positivity is 
really necessary, that is, whether there are 
positive maps which are not CP. If at least one 
of the algebras A or B is classical, the answer is 
no: each positive map is CP in this case. If both 
algebras are quantum however, complete positiv- 
ity is not implied by positivity alone. The most 
prominent example for this fact is the transposi- 
tion map. 
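
The failure of complete positivity for the transposition can be seen on two qubits. Applying (transposition $\otimes$ Id) to the projector onto the maximally entangled vector $(|00\rangle + |11\rangle)/\sqrt{2}$ gives one half of the swap operator, which takes a negative value on the antisymmetric vector $(|01\rangle - |10\rangle)/\sqrt{2}$. The Python sketch below checks this with plain lists; the index convention $|i\,k\rangle \mapsto i\,d + k$ and the function names are assumptions made for illustration.

```python
import math


def partial_transpose_first(M, d):
    """Apply (transposition x Id) to a (d*d) x (d*d) matrix M.

    Basis vector |i k> has index i * d + k; the first factor is transposed.
    """
    R = [[0.0] * (d * d) for _ in range(d * d)]
    for i in range(d):
        for k in range(d):
            for j in range(d):
                for l in range(d):
                    R[i * d + k][j * d + l] = M[j * d + k][i * d + l]
    return R


def quadratic_form(M, v):
    """<v, M v> for a real vector v and a real matrix M."""
    size = len(v)
    return sum(v[r] * M[r][c] * v[c] for r in range(size) for c in range(size))


s = 1 / math.sqrt(2)
omega = [s, 0.0, 0.0, s]                                # (|00> + |11>)/sqrt(2)
projector = [[a * b for b in omega] for a in omega]     # a positive operator
antisymmetric = [0.0, s, -s, 0.0]                       # (|01> - |10>)/sqrt(2)
image = partial_transpose_first(projector, 2)
```

Here `quadratic_form(projector, antisymmetric)` is nonnegative, as it must be for a positive operator, while `quadratic_form(image, antisymmetric)` equals $-1/2$, so the image is not positive.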

If item (ii) holds only for a fixed $n \in \mathbb{N}$,
the map $T$ is called $n$-positive. This is obviously
a weaker condition than complete positivity.
However, $n$-positivity implies $m$-positivity for
all $m \le n$, and for $\mathcal{A} = \mathcal{B}(\mathbb{C}^d)$ complete positivity
is implied by $n$-positivity, provided $n \ge d$ holds.

Let us consider now the question whether a 
channel should be unital or not. We have already 
mentioned that $T(1) \le 1$ must hold since effects
should be mapped to effects. If $T(1)$ is not equal to 1,
we get $\rho(T1) = T^*\rho(1) < 1$ for the probability to
measure the effect 1 on systems in the state $T^*\rho$,
but this is impossible for channels which produce an 
output with certainty, because 1 is the effect which 
is always true. In other words, if a CP map is not 
unital, it describes a channel which sometimes 
produces no output at all and T(1) is the effect 
which measures whether we have got an output. We 
will assume henceforth that channels are unital if 
nothing else is explicitly stated. 


Quantum Channels 


In this section we will discuss some basic properties 
of CP maps which transform quantum systems into 
quantum systems, in particular the Stinespring 
theorem, which constitutes the most important 
structural result. For a more detailed presentation, 
including generalizations to more general input/output
algebras, the reader should consult the
textbook by Paulsen (2002).


The Stinespring Theorem 


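Hence consider channels between quantum systems,
i.e., $\mathcal{A} = \mathcal{B}(\mathcal{H}_1)$ and $\mathcal{B} = \mathcal{B}(\mathcal{H}_2)$. A fairly simple
example (not necessarily unital) is given in terms of
an operator $V: \mathcal{H}_1 \to \mathcal{H}_2$ by $\mathcal{B}(\mathcal{H}_1) \ni A \mapsto VAV^* \in \mathcal{B}(\mathcal{H}_2)$.
A second example is the restriction to a
subsystem, which is given in the Heisenberg picture
by $\mathcal{B}(\mathcal{H}) \ni A \mapsto A \otimes \mathbb{1}_{\mathcal{K}} \in \mathcal{B}(\mathcal{H} \otimes \mathcal{K})$. Finally the
composition $S \circ T = ST$ of two channels is again a
channel. The following theorem says that each
channel can be represented as a composition of
these two examples [7].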


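Theorem 2 (Stinespring dilation theorem). Every
completely positive map $T: \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$ has the
form

$$T(A) = V^*(A \otimes \mathbb{1}_{\mathcal{K}})V \qquad [1]$$

with an additional Hilbert space $\mathcal{K}$ and an operator
$V: \mathcal{H}_2 \to \mathcal{H}_1 \otimes \mathcal{K}$. Both (i.e., $\mathcal{K}$ and $V$) can be
chosen such that the span of all $(A \otimes \mathbb{1})V\phi$ with $A \in \mathcal{B}(\mathcal{H}_1)$
and $\phi \in \mathcal{H}_2$ is dense in $\mathcal{H}_1 \otimes \mathcal{K}$. This
particular decomposition is unique (up to unitary
equivalence) and is called the minimal
decomposition.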


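By introducing a family $|\chi_j\rangle\langle\chi_j|$ of one-dimensional
projectors with $\sum_j |\chi_j\rangle\langle\chi_j| = \mathbb{1}$, we can define
the "Kraus operators" $\langle \psi, V_j \phi \rangle = \langle \psi \otimes \chi_j, V\phi \rangle$.
In terms of these, we can rewrite eqn [1] in
the following form (Kraus 1983):


Corollary 3 (Kraus form). Every CP map
$T: \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$ can be written in the form

$$T(A) = \sum_{j} V_j^* A V_j \qquad [2]$$

with operators $V_j: \mathcal{H}_2 \to \mathcal{H}_1$.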
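The passage from the Stinespring form [1] to the Kraus form [2] can be checked numerically. The following is a minimal sketch, assuming finite dimensions, the standard basis of $\mathcal{K}$ as the vectors $\chi_j$, and NumPy's Kronecker-product index ordering; the function names and the random isometry are hypothetical and serve only as an illustration.

```python
import numpy as np

def kraus_from_stinespring(V, d1, dK):
    """Split V: H2 -> H1 (x) K into Kraus operators V_j: H2 -> H1.

    V is a (d1*dK, d2) array whose row index is a*dK + j for e_a (x) chi_j
    (np.kron ordering); chi_j is the standard basis of K (an assumption).
    """
    d2 = V.shape[1]
    blocks = V.reshape(d1, dK, d2)      # blocks[a, j, b] = <e_a (x) chi_j, V f_b>
    return [blocks[:, j, :] for j in range(dK)]

def apply_stinespring(V, A, dK):
    """Heisenberg-picture map T(A) = V^* (A (x) 1_K) V."""
    return V.conj().T @ np.kron(A, np.eye(dK)) @ V

def apply_kraus(kraus, A):
    """Heisenberg-picture map T(A) = sum_j V_j^* A V_j."""
    return sum(Vj.conj().T @ A @ Vj for Vj in kraus)

# Hypothetical test data: a random isometry V (V^* V = 1), so that T is unital.
rng = np.random.default_rng(0)
d1, d2, dK = 2, 3, 4
M = rng.normal(size=(d1 * dK, d2)) + 1j * rng.normal(size=(d1 * dK, d2))
V, _ = np.linalg.qr(M)                  # reduced QR: orthonormal columns
kraus = kraus_from_stinespring(V, d1, dK)
A = rng.normal(size=(d1, d1)) + 1j * rng.normal(size=(d1, d1))

print(np.allclose(apply_stinespring(V, A, dK), apply_kraus(kraus, A)))
print(np.allclose(apply_kraus(kraus, np.eye(d1)), np.eye(d2)))
```

Both forms agree on an arbitrary input, and unitality of $T$ is equivalent to $V$ being an isometry.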


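To get a third representation of channels, consider
the Stinespring form [1] of $T$ and a vector $\psi \in \mathcal{K}$
such that $U(\phi \otimes \psi) = V(\phi)$ can be extended to a
unitary map $U: \mathcal{H} \otimes \mathcal{K} \to \mathcal{H} \otimes \mathcal{K}$. It is then easy to
see that the dual $T^*$ of $T$ can be written as:


Corollary 4 (Ancilla form). Assume that $T: \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$
is a channel. Then there is a Hilbert space $\mathcal{K}$, a
pure state $\rho_0$, and a unitary map $U: \mathcal{H} \otimes \mathcal{K} \to \mathcal{H} \otimes \mathcal{K}$
such that

$$T^*(\rho) = \mathrm{tr}_{\mathcal{K}}\big(U(\rho \otimes \rho_0)U^*\big) \qquad [3]$$

holds.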


This representation of a channel has a (seemingly) 
very nice physical interpretation, because we can 
look at eqn [3] as the unitary interaction of the 
system with an unobservable environment, which is 
initially in the state $\rho_0$. The problem, however, is
that there is a great arbitrariness in the choice of $U$
and $\rho_0$. This is the weakness of the ancilla form
compared to the Stinespring representation. 

Finally, let us state a related result. It characterizes 
all decompositions of a given completely positive 
map into completely positive summands. By analogy 
with results for states on abelian algebras (i.e., 
probability measures), we will call it a Radon- 
Nikodym theorem (see Arveson (1969) for a proof). 


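Theorem 5 (Radon-Nikodym theorem). Let
$T_x: \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$, $x \in X$, be a family of CP
maps and let $V: \mathcal{H}_2 \to \mathcal{H}_1 \otimes \mathcal{K}$ be the Stinespring
operator of $T = \sum_x T_x$; then there are uniquely
determined positive operators $F_x$ in $\mathcal{B}(\mathcal{K})$ with
$\sum_x F_x = \mathbb{1}$ and

$$T_x(A) = V^*(A \otimes F_x)V \qquad [4]$$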


The Jamiołkowski Isomorphism 


The subject of this section is a relation between CP 
maps and states of bipartite systems, first discovered 
by Jamiołkowski (1972), and which is very useful in 
translating properties of bipartite systems into 
properties of positive maps and vice versa. 

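The idea is based on the following setup. Alice
and Bob share a bipartite system in a maximally
entangled state

$$\chi = \frac{1}{\sqrt{d}} \sum_{\alpha=1}^{d} e_\alpha \otimes e_\alpha \in \mathcal{H} \otimes \mathcal{H} \qquad [5]$$

(where $e_1, \ldots, e_d$ denote an orthonormal basis of $\mathcal{H}$).
Alice applies to her subsystem a channel $T: \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H}')$
while Bob does nothing. At the end of the
processing, the overall system ends up in a state

$$R_T = (T \otimes \mathrm{Id}) |\chi\rangle\langle\chi| \qquad [6]$$

Mathematically, eqn [6] makes sense if $T$ is only
linear but not necessarily positive or CP (but then
$R_T$ is not positive either). If we denote the space of
all linear maps from $\mathcal{B}(\mathcal{H})$ into $\mathcal{B}(\mathcal{H}')$ by $\mathcal{L}$, we get a
map

$$\mathcal{L} \ni T \mapsto R_T \in \mathcal{B}(\mathcal{H}' \otimes \mathcal{H}) \qquad [7]$$

which is easily shown to be linear (i.e.,
$R_{\mu T + \lambda S} = \mu R_T + \lambda R_S$ for all $\lambda, \mu \in \mathbb{C}$ and all
$T, S \in \mathcal{L}$). Furthermore, this map is bijective, hence
a linear isomorphism.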


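Theorem 6 The map defined in eqns [7] and [6] is
a linear isomorphism. The inverse map is given by

$$\mathcal{B}(\mathcal{H}' \otimes \mathcal{H}) \ni \rho \mapsto T_\rho \in \mathcal{L} \qquad [8]$$

with

$$\langle e'_\mu, T_\rho(\sigma) e'_\nu \rangle = d\, \mathrm{tr}\big(\rho\, (|e'_\nu\rangle\langle e'_\mu| \otimes \sigma^T)\big) \qquad [9]$$

where $e'_1, \ldots, e'_{d'} \in \mathcal{H}'$ denote an (arbitrary) orthonormal
basis of $\mathcal{H}'$ and the transposition of $\sigma$ is
defined with respect to the basis $e_\alpha$, $\alpha = 1, \ldots, d$, used
to define $\chi$ in [5].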


From the definition of $R_T$ in eqn [6], it is obvious
that $R_T$ is positive, if $T$ is CP. To see that the
converse is also true is not as trivial (because a
transposition is involved), but it requires only a
short calculation, which is omitted here. Hence, we
get:


Corollary 7 The operator $R_T$ is positive, iff the
map $T$ is CP.
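Corollary 7 turns complete positivity into an eigenvalue test. The sketch below, a hedged illustration rather than part of the original argument, builds $R_T$ from the matrix units $|e_\alpha\rangle\langle e_\beta|$ and applies the test to the identity map and to the transposition map, the standard example of a positive map that is not CP. The function names are hypothetical.

```python
import numpy as np

def choi_operator(T, d):
    """R_T = (T (x) Id)|chi><chi| for a linear map T acting on d x d matrices.

    Uses |chi><chi| = (1/d) sum_{a,b} |e_a><e_b| (x) |e_a><e_b|.
    """
    R = 0
    for a in range(d):
        for b in range(d):
            E = np.zeros((d, d), dtype=complex)
            E[a, b] = 1.0                     # matrix unit |e_a><e_b|
            R = R + np.kron(T(E), E) / d
    return R

def identity_map(A):
    return A

def transpose_map(A):
    return A.T

d = 3
min_eig_identity = np.linalg.eigvalsh(choi_operator(identity_map, d)).min()
min_eig_transpose = np.linalg.eigvalsh(choi_operator(transpose_map, d)).min()
print(min_eig_identity)    # zero up to rounding: the identity map is CP
print(min_eig_transpose)   # -1/d: the transposition is positive but not CP
```

For the transposition, $R_T$ is $1/d$ times the flip operator on $\mathcal{H} \otimes \mathcal{H}$, whose eigenvalues are $\pm 1$, which explains the negative eigenvalue $-1/d$.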


Examples 


Let us return now to the general case (i.e., arbitrary 
input and output algebras) and discuss several 
examples. 


Channels Under Symmetry 


It is often useful to consider channels with special
symmetry properties. To be more precise, consider
a group $G$ and two unitary representations $\pi_1, \pi_2$
on the Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$, respectively.
A channel $T: \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$ is called covariant
(with respect to $\pi_1$ and $\pi_2$) if

$$T[\pi_1(U) A \pi_1(U)^*] = \pi_2(U) T[A] \pi_2(U)^* \quad \forall A \in \mathcal{B}(\mathcal{H}_1)\ \forall U \in G \qquad [10]$$

holds. The general structure of covariant channels
is governed by a fairly powerful variant of
Stinespring's theorem (Keyl and Werner 1999).


Theorem 8 Let $G$ be a group with finite-dimensional
unitary representations $\pi_j: G \to \mathcal{U}(\mathcal{H}_j)$ and
$T: \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$ a $\pi_1, \pi_2$-covariant channel.

(i) Then there is a finite-dimensional unitary
representation $\tilde{\pi}: G \to \mathcal{U}(\mathcal{K})$ and an operator
$V: \mathcal{H}_2 \to \mathcal{H}_1 \otimes \mathcal{K}$ with $V\pi_2(U) = (\pi_1(U) \otimes \tilde{\pi}(U))V$
and $T(A) = V^*(A \otimes \mathbb{1})V$.

(ii) If $T = \sum_\alpha T^\alpha$ is a decomposition of $T$ in CP and
covariant summands, there is a decomposition
$\mathbb{1} = \sum_\alpha F^\alpha$ of the identity operator on $\mathcal{K}$ into
positive operators $F^\alpha \in \mathcal{B}(\mathcal{K})$ with $[F^\alpha, \tilde{\pi}(g)] = 0$
such that $T^\alpha(X) = V^*(X \otimes F^\alpha)V$.

The most prominent examples of covariant
channels arise with $\mathcal{H}_1 = \mathcal{H}_2 = \mathbb{C}^d$, $G = U(d)$ and
$\pi_1(U) = \pi_2(U) = U$. All channels of this type are of
the form

$$T(A) = (1 - \vartheta)A + \vartheta d^{-1} \mathrm{tr}(A)\mathbb{1} \quad \text{with } \vartheta \in [0, d^2/(d^2 - 1)] \qquad [11]$$

and are known as "depolarizing channels." They
often serve as a standard model for noise. Two
particular cases are the ideal channel arising with
$\vartheta = 0$, and the completely depolarizing channel
($\vartheta = 1$) which erases all information. If we choose


$\pi_2(U) = \bar{U}$ (where the bar denotes complex conjugate)
instead of $\pi_2(U) = U$, we get

$$T(A) = \frac{\vartheta}{d+1}\big[\mathrm{tr}(A)\mathbb{1} + A^T\big] + \frac{1-\vartheta}{d-1}\big[\mathrm{tr}(A)\mathbb{1} - A^T\big], \quad \vartheta \in [0,1] \qquad [12]$$

If we map these channels to states of bipartite
systems (using the Jamiołkowski isomorphism from
the last section), we get "isotropic states" from
eqn [11] and "Werner states" from [12].
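As an illustration, the admissible parameter range of the depolarizing channels can be probed through the Jamiołkowski isomorphism. The sketch below assumes the form $T(A) = (1-\vartheta)A + \vartheta d^{-1}\mathrm{tr}(A)\mathbb{1}$, for which $R_T = (1-\vartheta)|\chi\rangle\langle\chi| + \vartheta\,\mathbb{1}/d^2$ (an isotropic state), and tests positivity numerically; the function names and the tolerance are hypothetical choices.

```python
import numpy as np

def depolarizing_choi(theta, d):
    """R_T for T(A) = (1 - theta) A + theta tr(A) 1 / d."""
    chi = np.eye(d).reshape(d * d) / np.sqrt(d)   # sum_a e_a (x) e_a / sqrt(d)
    return (1 - theta) * np.outer(chi, chi) + theta * np.eye(d * d) / d**2

def is_cp(theta, d, tol=1e-12):
    """Corollary 7: T is CP iff R_T is positive (up to a numerical tolerance)."""
    return bool(np.linalg.eigvalsh(depolarizing_choi(theta, d)).min() >= -tol)

d = 2
upper = d**2 / (d**2 - 1)                          # 4/3 for a qubit
for theta in (-0.01, 0.0, 1.0, upper, upper + 0.01):
    print(theta, is_cp(theta, d))
```

The eigenvalues of $R_T$ are $\vartheta/d^2$ (with multiplicity $d^2 - 1$) and $1 - \vartheta + \vartheta/d^2$, so the test succeeds exactly for $0 \leq \vartheta \leq d^2/(d^2-1)$.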


Classical Channels 


The classical analog to a quantum operation is a 
channel T:C(X) — C(Y) which describes the trans- 
mission or manipulation of classical information. As 
already mentioned in the subsection “Completely 
positive maps,” positivity and complete positivity 
are equivalent in this case. Hence, we have to 
assume only that T is positive and unital. Obviously, 
$T$ is characterized by its matrix elements
$T_{xy} = \delta_y(Te_x)$, where $\delta_y \in \mathcal{C}^*(X)$ denotes the Dirac
measure at $y \in Y$ and $e_x \in \mathcal{C}(X)$ is the canonical
basis in $\mathcal{C}(X)$. More precisely, $\delta_y$ and $e_x$ denote,
respectively, the probability distribution and the
function on $X$, given by

$$\delta_y = (\delta_{xy})_{x \in X} \quad \text{and} \quad e_x(y) = \delta_{xy} \qquad [13]$$

We will keep this notation up to the end of this
article. Positivity and normalization of $T$ imply that
$0 \leq T_{xy} \leq 1$ and

$$1 = \delta_y(\mathbb{1}) = \delta_y(T(\mathbb{1})) = \delta_y\Big[T\Big(\sum_x e_x\Big)\Big] = \sum_x T_{xy} \qquad [14]$$

holds. Hence the family $(T_{xy})_{x \in X}$ is a probability
distribution on $X$ and $T_{xy}$ is, therefore, the transition
probability to get the information $x \in X$ at the
output side of the channel if $y \in Y$ was sent.


Observables


Let us consider now a channel which transforms
quantum information $\mathcal{B}(\mathcal{H})$ into classical information
$\mathcal{C}(X)$. Since positivity and complete positivity are
again equivalent, we just have to look at a positive
and unital map $E: \mathcal{C}(X) \to \mathcal{B}(\mathcal{H})$. With the canonical
basis $e_x$, $x \in X$, of $\mathcal{C}(X)$, we get a family
$E_x = E(e_x)$, $x \in X$, of positive operators $E_x \in \mathcal{B}(\mathcal{H})$
with $\sum_{x \in X} E_x = \mathbb{1}$. Hence, the $E_x$ form a positive
operator valued (POV) measure, i.e., an observable.
If, on the other hand, a POV measure $E_x \in \mathcal{B}(\mathcal{H})$, $x \in X$,
is given, we can define a quantum-to-classical
channel $E: \mathcal{C}(X) \to \mathcal{B}(\mathcal{H})$ by

$$E(f) = \sum_{x \in X} f(x) E_x \qquad [15]$$

This shows that the observable $E_x$, $x \in X$, and the
channel $E$ can be identified.
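The identification of a POV measure with the channel $E$ of eqn [15] can be made concrete with a small example. The following is a sketch under stated assumptions: the three-outcome qubit observable (a so-called trine measurement) and the input state are hypothetical choices made for illustration, and the helper names are not taken from the text.

```python
import numpy as np

def channel_E(f, povm):
    """E(f) = sum_x f(x) E_x, with the function f given as a list of values."""
    return sum(fx * Ex for fx, Ex in zip(f, povm))

def outcome_probabilities(rho, povm):
    """p_x = tr(rho E_x), the statistics of the observable in the state rho."""
    return np.array([np.trace(rho @ Ex).real for Ex in povm])

# Hypothetical example: E_x = (2/3)|v_x><v_x| with three real unit vectors
# whose Bloch vectors are 120 degrees apart.
angles = [0.0, 2 * np.pi / 3, 4 * np.pi / 3]
vectors = [np.array([np.cos(t / 2), np.sin(t / 2)]) for t in angles]
povm = [(2 / 3) * np.outer(v, v) for v in vectors]

rho = np.array([[1.0, 0.0], [0.0, 0.0]])          # pure state |0><0|
probs = outcome_probabilities(rho, povm)

print(np.allclose(channel_E([1, 1, 1], povm), np.eye(2)))   # E is unital
print(probs)                                                # 2/3, 1/6, 1/6
```

Unitality of $E$ is the normalization $\sum_x E_x = \mathbb{1}$, and it guarantees that the outcome probabilities sum to one for every state.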


Preparations 


Let us now exchange the role of $\mathcal{C}(X)$ and $\mathcal{B}(\mathcal{H})$; in
other words, let us consider a channel $R: \mathcal{B}(\mathcal{H}) \to \mathcal{C}(X)$
with a classical input and a quantum output
algebra. In the Schrödinger picture, we get a family of
density matrices $\rho_x := R^*(\delta_x) \in \mathcal{B}^*(\mathcal{H})$, $x \in X$, where
$\delta_x \in \mathcal{C}^*(X)$ denotes again the Dirac measure on $X$.
Hence, we get a parameter-dependent preparation
that can be used to encode the classical information
$x \in X$ into the quantum information $\rho_x \in \mathcal{B}^*(\mathcal{H})$.


Instruments 


An observable describes only the statistics of 
measuring results, but does not contain information 
about the state of the system after the measurement. 
To get a description which fills this gap, we have 
to consider channels which operate on quantum 
systems and produce hybrid systems as output, that is, 
$T: \mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X) \to \mathcal{B}(\mathcal{K})$. Following Davies (1976),
we will call such an object an instrument. From $T$ we
can derive the subchannel

$$\mathcal{C}(X) \ni f \mapsto T(\mathbb{1} \otimes f) \in \mathcal{B}(\mathcal{K}) \qquad [16]$$

which is the observable measured by $T$, that is,
$\mathrm{tr}(T(\mathbb{1} \otimes e_x)\rho)$ is the probability to measure $x \in X$ on
systems in the state $\rho$. On the other hand, we get for
each $x \in X$ a quantum channel (which is not unital)

$$\mathcal{B}(\mathcal{H}) \ni A \mapsto T_x(A) = T(A \otimes e_x) \in \mathcal{B}(\mathcal{K}) \qquad [17]$$

It describes the operation performed by the instrument
$T$ if $x \in X$ was measured. More precisely, if a
measurement on systems in the state $\rho$ gives the
result $x \in X$, we get (up to normalization) the state
$T_x^*(\rho)$ after the measurement, while

$$\mathrm{tr}(T_x^*(\rho)) = \mathrm{tr}(T_x^*(\rho)\mathbb{1}) = \mathrm{tr}(\rho\, T(\mathbb{1} \otimes e_x)) \qquad [18]$$

is (again) the probability to measure $x \in X$ on $\rho$.
The instrument $T$ can be expressed in terms of the
operations $T_x$ by

$$T(A \otimes f) = \sum_x f(x) T_x(A) \qquad [19]$$

Hence, we can identify $T$ with the family $T_x$, $x \in X$.
Finally, we can consider the second marginal of $T$

$$\mathcal{B}(\mathcal{H}) \ni A \mapsto T(A \otimes \mathbb{1}) = \sum_{x \in X} T_x(A) \qquad [20]$$

It describes the operation we get if the outcome of
the measurement is ignored.

The best-known example of an instrument is a von
Neumann-Lüders measurement associated with a PV
measure given by a family of projections $E_x$, $x = 1, \ldots, d$;
for example, the eigenprojections of a self-adjoint
operator $A \in \mathcal{B}(\mathcal{H})$. It is defined as the channel

$$T: \mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X) \to \mathcal{B}(\mathcal{H}) \quad \text{with } X = \{1, \ldots, d\} \text{ and } T_x(A) = E_x A E_x \qquad [21]$$

Hence, we get the final state $\mathrm{tr}(E_x \rho)^{-1} E_x \rho E_x$ if we
measure the value $x \in X$ on systems initially in the
state $\rho$; this is well known from quantum mechanics.
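The von Neumann-Lüders instrument is easy to evaluate explicitly. The sketch below works in the Schrödinger picture, where $T_x^*(\rho) = E_x \rho E_x$; the choice of the Pauli operator $\sigma_x$ as the measured self-adjoint operator, the input state, and the function name are hypothetical and serve only as an example.

```python
import numpy as np

def lueders_instrument(projections, rho):
    """Return [(p_x, rho_x)] with p_x = tr(E_x rho) and rho_x = E_x rho E_x / p_x."""
    results = []
    for E in projections:
        unnormalized = E @ rho @ E                 # T_x^*(rho)
        p = np.trace(unnormalized).real            # probability of outcome x
        post = unnormalized / p if p > 1e-15 else None
        results.append((p, post))
    return results

# Eigenprojections of sigma_x (hypothetical example operator).
plus = np.array([1.0, 1.0]) / np.sqrt(2)
minus = np.array([1.0, -1.0]) / np.sqrt(2)
projections = [np.outer(plus, plus), np.outer(minus, minus)]

rho = np.array([[0.75, 0.25], [0.25, 0.25]])       # a valid density matrix
outcomes = lueders_instrument(projections, rho)

# Second marginal: the state obtained when the outcome is ignored.
ignored = sum(p * post for p, post in outcomes)
print([p for p, _ in outcomes])                    # 0.75 and 0.25
print(ignored)
```

Ignoring the outcome removes the off-diagonal terms of $\rho$ in the eigenbasis of the measured operator, which is the Schrödinger-picture counterpart of the second marginal [20].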


Parameter-Dependent Operations 


Let us change now the role of $\mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X)$ and
$\mathcal{B}(\mathcal{K})$; in other words, consider a channel
$T: \mathcal{B}(\mathcal{K}) \to \mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X)$ with hybrid input and quantum output.
It describes a device which changes the state of a
system depending on the additional classical information.
As for an instrument, $T$ decomposes into a
family of (unital!) channels $T_x: \mathcal{B}(\mathcal{K}) \to \mathcal{B}(\mathcal{H})$ such
that we get $T^*(\rho \otimes p) = \sum_x p_x T_x^*(\rho)$ in the Schrödinger
picture. Physically, $T$ describes a parameter-dependent
operation: depending on the classical
information $x \in X$, the quantum information
$\rho \in \mathcal{B}^*(\mathcal{K})$ is transformed by the operation $T_x$.

Finally, we can consider a channel
$T: \mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X) \to \mathcal{B}(\mathcal{K}) \otimes \mathcal{C}(Y)$ with hybrid input and output
to get a parameter-dependent instrument: similarly
to the above discussion, we can define a family of
instruments $T_y: \mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X) \to \mathcal{B}(\mathcal{K})$, $y \in Y$, by the
equation $T^*(\rho \otimes p) = \sum_y p_y T_y^*(\rho)$. Physically, $T$
describes the following device: it receives the
classical information $y \in Y$ and a quantum system
in the state $\rho \in \mathcal{B}^*(\mathcal{K})$ as input. Depending on $y$, a
measurement with the instrument $T_y$ is performed,
which in turn produces the measuring value $x \in X$
and leaves the quantum system in the state (up to
normalization) $T_{y,x}^*(\rho)$, with $T_{y,x}$ given as in eqn
[17] by $\mathcal{B}(\mathcal{H}) \ni A \mapsto T_y(A \otimes e_x) \in \mathcal{B}(\mathcal{K})$.


See also: Capacities Enhanced by Entanglement; 
Capacity for Quantum Information; Entanglement; 
Optimal Cloning of Quantum States; Positive Maps on 
C*-Algebras; Quantum Channels: Classical Capacity; 
Quantum Dynamical Semigroups; Quantum Entropy; 
Quantum Spin Systems; Source Coding in Quantum 
Information Theory.


Introduction


Chaos is a type of behavior that can be exhibited by 
a large class of physical systems and their mathe- 
matical models. These systems are deterministic. 
They are modeled by sets of coupled nonlinear 
ordinary differential equations (ODEs): 


$$\dot{x}_i = \frac{dx_i}{dt} = f_i(x; c) \qquad [1]$$

called dynamical systems. The coordinates $x$ designate
points in a state space or phase space.
Typically, $x \in \mathbb{R}^n$ or some n-dimensional manifold
for some $n \geq 3$, and $c \in \mathbb{R}^k$ are called control
parameters. They describe parameters that can be
controlled in physical systems, such as pumping
rates in lasers or flow rates in chemical mixing
reactions. The most important mathematical property
of dynamical systems is the uniqueness theorem,
which states that there is a unique trajectory through
every point at which $f(x; c)$ is continuous and
Lipschitz and $f(x; c) \neq 0$. In particular, two distinct
periodic orbits cannot have any points in common.

The properties of dynamical systems are governed,
in lowest order, by the number, stability, and
distribution of their fixed points, defined by
$\dot{x}_i = f_i(x; c) = 0$. It can happen that a dynamical
system has no stable fixed points and no stable
limit cycles ($x(t) = x(t + T)$, some $T > 0$, all $t$). In
such cases, if the solution is bounded and recurrent
but not periodic, it represents an unfamiliar type of
attractor. If the system exhibits "sensitivity to initial
conditions" ($|x(t) - y(t)| \sim e^{\lambda t}|x(0) - y(0)|$ for
$|x(0) - y(0)| = \epsilon$ and $\lambda > 0$ for most $x(0)$), the
solution set is called a "chaotic attractor." If the
attractor has fractal structure, it is called a "strange
attractor."

Tools to study strange attractors have been 
developed that depend on three types of mathe- 
matics: geometry, dynamics, and topology. 

Geometric tools attempt to study the metric 
relations among points in a strange attractor. 
These include a spectrum of fractal dimensions. 
These real numbers are difficult to compute, require 
very long, very clean data sets, provide a number 
without error estimates for which there is no 
underlying statistical theory, and provide very little 
information about the attractor. 

Dynamical tools include estimation of Lyapunov
exponents and a Lyapunov dimension. They include
globally averaged exponents and local Lyapunov
exponents. These are eigenvalues related to the
different stretching ($\lambda > 0$) and squeezing ($\lambda < 0$)
eigendirections in the phase space. To each globally
averaged Lyapunov exponent $\lambda_i$, $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$,
there corresponds a "partial dimension" $\epsilon_i$, $0 \leq \epsilon_i \leq 1$,
with $\epsilon_i = 1$ if $\lambda_i \geq 0$. The Lyapunov dimension is
the sum of the partial dimensions $d_L = \sum_{i=1}^{n} \epsilon_i$.
That the partial dimension $\epsilon_i = 1$ for $\lambda_i \geq 0$ indicates
that the flow is smooth in the stretching ($\lambda_i > 0$) and
flow directions and fractal in the squeezing ($\lambda_i < 0$)
directions with $\epsilon_i < 1$. Dynamical indices provide
some useful information about a strange attractor.
In particular, they can be used to estimate some
fractal properties of a strange attractor, but not vice
versa.

Topological tools are very powerful for a
restricted class of dynamical systems. These are
dynamical systems in three dimensions ($n = 3$). For
such systems there are three Lyapunov exponents
$\lambda_1 > \lambda_2 > \lambda_3$, with $\lambda_1 > 0$ describing the stretching
direction and responsible for "sensitivity to initial
conditions," $\lambda_2 = 0$ describing the direction of the
flow, and $\lambda_3 < 0$ describing the squeezing direction
and responsible for "recurrence." Strange attractors
are generated by dissipative dynamical systems,
which satisfy the additional condition
$\lambda_1 + \lambda_2 + \lambda_3 < 0$. For such attractors, $\epsilon_1 = \epsilon_2 = 1$ and
$\epsilon_3 = \lambda_1/|\lambda_3|$ by the Kaplan-Yorke conjecture, so
that $d_L = 2 + \epsilon_3 = 2 + \lambda_1/|\lambda_3|$.
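The three-dimensional formula $d_L = 2 + \lambda_1/|\lambda_3|$ can be written as a short function. This is a minimal sketch: the function name is hypothetical, and the exponents in the usage line are made-up illustrative numbers, not measured values for any particular system.

```python
def lyapunov_dimension_3d(l1, l2, l3, tol=1e-12):
    """Lyapunov dimension d_L = 2 + l1/|l3| of a dissipative flow in three dimensions.

    Requires l1 > 0 (stretching), l2 = 0 (flow direction), l3 < 0 (squeezing)
    and l1 + l2 + l3 < 0 (dissipation), as stated in the text.
    """
    if not (l1 > 0 and abs(l2) <= tol and l3 < 0 and l1 + l2 + l3 < 0):
        raise ValueError("exponents do not describe a dissipative 3D strange attractor")
    eps3 = l1 / abs(l3)        # partial dimension of the squeezing direction
    return 2.0 + eps3          # eps1 = eps2 = 1

# Illustrative (hypothetical) exponents.
print(lyapunov_dimension_3d(0.1, 0.0, -0.5))
```

Dissipativity, $\lambda_1 < |\lambda_3|$, is what keeps $\epsilon_3 < 1$ and therefore $d_L < 3$.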

A number of tools from classical topology have 
been exploited to probe the structure of strange 
attractors in three dimensions. These include the 
Gauss linking number, the Euler characteristic, the 
Poincaré-Hopf index theorem, and braid theory. 
More recent topological contributions include sev- 
eral definitions for entropy, the development of a 
theory for knot holders or braid holders (also called 
branched manifolds), the Birman—Williams theorem 
for these objects, and relative rotation rates, a 
topological index for individual periodic orbits and 
orbit pairs. 

Three-dimensional strange attractors are 
remarkably well understood; those in higher 
dimensions are not. As a result, the description 
that follows is largely restricted to strange attractors
with $d_L < 3$ that exist in $\mathbb{R}^3$ or other
three-dimensional manifolds (e.g., $\mathbb{R}^2 \times S^1$). The obstacle
to progress in higher dimensions is the lack of a
higher-dimensional analog of the Gauss linking
number for orbit pairs in $\mathbb{R}^3$.


Overview 
The program described below has two objectives: 


1. classify the global topological structure of strange
attractors in $\mathbb{R}^3$; and

2. determine the “perestroikas” (changes) that such 
attractors can undergo as experimental condi- 
tions or control parameters change. 


Four levels of structure are required to complete 
this program. Each is topological and discretely 
quantifiable. This provides a beautiful interaction 
between a rigidity of structure, demanded by 
topological constraints, and freedom within this 
rigidity. These four levels of structure are: 


1. basis sets of orbits, 

2. branched manifolds or knot holders, 
3. bounding tori, and 

4. embeddings of bounding tori. 


Branched Manifolds: Stretching 
and Squeezing 


A strange attractor is generated by the repetition of 
two mechanisms: stretching and squeezing. Stretch- 
ing occurs in the directions identified by the positive 


Lyapunov exponents and squeezing occurs in the 
directions identified by the negative Lyapunov 
exponents. In $\mathbb{R}^3$ there is one stretching direction
and one squeezing direction. 

A simple stretch-and-squeeze mechanism that 
nature appears to be very fond of is illustrated in 
Figure 1. In this illustration, a cube of initial 
conditions at (a) is advected by the flow in a short 
time to (b). During this process, the cube is 
deformed by being stretched ($\lambda_1 > 0$). It also shrinks
in a transverse direction ($\lambda_3 < 0$). During the initial
phase of this deformation, two nearby points 
typically separate exponentially in time. If they 
were to continue to separate exponentially for all 
times, the invariant set would not be bounded. 
Therefore, this separation cannot continue indefi- 
nitely, and in fact it must somehow reverse itself 
after some time because the motion is recurrent. The 
mechanism shown in Figure 1 involves folding, 
which begins between (b) and (c) and continues 
through to (d). Squeezing occurs where points from 
distant parts of the attractor approach each other 
exponentially, as at (d). Finally, the cube, shown 
deformed at (d), returns to the neighborhood of 
initial conditions (a). This process repeats itself and 
builds up the strange attractor. As can be inferred
from this figure, the strange attractor constructed by
the repetitive process is smooth in the expanding
($\lambda_1$) and flow ($\lambda_2 = 0$) directions but fractal in the
squeezing ($\lambda_3$) direction. The attractor's fractal
dimension is $\epsilon_1 + \epsilon_2 + \epsilon_3 = 2 + \epsilon_3 = 2 + \lambda_1/|\lambda_3|$.

Figure 1 summarizes the boundedness and recurrence
conditions that were introduced to define
strange attractors, and illustrates one stretching and
squeezing mechanism that occurs repetitively to
build up the fractal structure of the strange attractor
and to organize all the (unstable) periodic orbits in it
in a unique way. The particular mechanism shown
in Figure 1 is called a stretch-and-fold mechanism.
Other mechanisms involve stretch and roll, and tear
and squeeze.

Figure 1 A common stretch-and-fold mechanism generates
many experimentally observed strange attractors. The Topology
of Chaos; R Gilmore and M Lefranc; Copyright © 2002, Wiley.
This material is used by permission of John Wiley & Sons, Inc.

The stretch-and-squeeze mechanisms are well 
summarized by the cartoons shown in Figure 2. On 
the left, a cube of initial conditions (top) is deformed 
under the flow. The flow is downward. Stretching 
occurs in one direction (horizontal) and shrinking 
occurs in a transverse direction (perpendicular to the 
page). In the limit of extreme shrinking ($\lambda_3 \to -\infty$),
the dynamics of the stretching part of the
flow is represented by the two-dimensional surface
shown on the bottom left. This surface fails to be a
manifold because of the singularity, called a splitting
point. This singularity represents an initial condition
that flows to an unstable fixed point with at least
one stable direction. On the right (squeezing), two
distant cubes of initial conditions (top) in the flow
are deformed and brought to each other's proximity
under the flow (middle). In the limit of extreme
dissipation, two two-dimensional surfaces representing
inflows are joined at a branch line to a single
surface representing an outflow. This surface fails to
be a manifold because of the branch line, which is a
singularity of a different kind. Points below the
branch line in this representation of the flow (on the
outflow side of the branch line) have two preimages
above the branch line, one in each inflow sheet. This
structure generates positive entropy.

Figure 2 Left: The stretch mechanism is modeled by a two-
dimensional surface with a splitting point singularity. Right: The
squeeze mechanism is modeled by a two-dimensional surface
with a branch line singularity. The Topology of Chaos; R Gilmore
and M Lefranc; Copyright © 2002, Wiley. This material is used
by permission of John Wiley & Sons, Inc.

A beautiful theorem of Birman and Williams 
justifies the use of the two cartoons shown at the 
bottom of Figure 2 to characterize strange attractors 
in $\mathbb{R}^3$. As preparation for the theorem, Birman and
Williams introduced an important identification for
the nongeneric or atypical points that "are not
sensitive to initial conditions"

$$x \sim y \quad \text{if} \quad |x(t) - y(t)| \to 0 \ \text{as} \ t \to \infty \qquad [2]$$

That is, two points in a strange attractor are
identified if they have asymptotically the same
future. In practice, this amounts to projecting the
flow down along the stable ($\lambda_3 < 0$) direction onto a
two-dimensional surface described by the stretching
($\lambda_1 > 0$) and the flow ($\lambda_2 = 0$) directions. This
surface is not a manifold because of lower- 
dimensional singularities: splitting points and branch 
lines. The two-dimensional surface has many names, 
for example, knot holder (because it holds the 
periodic orbits that exist in abundance in strange 
attractors), braid holders, templates, branched mani- 
folds. The flow, restricted to this surface, is called a 
semiflow. Under the semiflow, points in the branched 
manifold have a unique future but do not have a 
unique past. The degree of nonuniqueness is mea- 
sured by the topological entropy of the dynamical 
system. The Birman—Williams theorem is: 


Theorem Assume that a flow $\Phi_t$

(i) on $\mathbb{R}^3$ is dissipative ($\lambda_1 > 0$, $\lambda_2 = 0$, $\lambda_3 < 0$ and
$\lambda_1 + \lambda_2 + \lambda_3 < 0$);

(ii) generates a hyperbolic strange attractor (the
eigenvectors of the local Lyapunov exponents
$\lambda_1, \lambda_2, \lambda_3$ span everywhere on the attractor).

Then the projection [2] maps the strange attractor
$\mathcal{SA}$ to a branched manifold $\mathcal{BM}$ and the flow $\Phi_t$ on
$\mathcal{SA}$ to a semiflow $\hat{\Phi}_t$ on $\mathcal{BM}$ in $\mathbb{R}^3$. The periodic
orbits in $\mathcal{SA}$ under $\Phi_t$ correspond 1:1 with the
periodic orbits in $\mathcal{BM}$ under $\hat{\Phi}_t$ with perhaps one or
two specified exceptions. On any finite subset of
orbits the correspondence can be taken via isotopy.


The beauty of this theorem is that it guarantees
that a flow $\Phi_t$ that generates a (fractal) strange
attractor $\mathcal{SA}$ can be continuously deformed to a new
flow $\hat{\Phi}_t$ on a simple two-dimensional structure $\mathcal{BM}$.
During this deformation, periodic orbits are neither
created nor destroyed. The uniqueness theorem for
ODEs is satisfied during the deformation, so orbit
segments do not pass through each other. As a
result, the topological organization of all the
unstable periodic orbits in the strange attractor is
the same as the topological organization of all the
unstable periodic orbits in the branched manifold. In
fact, the branched manifold (knot holder) defines
the topological organization of all the unstable
periodic orbits that it supports. Topological organization
is defined by the Gauss linking number and
the relative rotation rates, another braid index.

The significance of this theorem is that strange 
attractors can be characterized — in fact classified — 
by their branched manifolds. Figure 3 shows a 
branched manifold “for a figure-8 knot” as well as 
the figure-8 knot itself (dark curve). If a constant 
current is sent through a conducting wire tied into 
the shape of a figure-8 knot, a discrete countable set 
of magnetic field lines will be closed. These closed 
field lines can be deformed onto the two-dimen- 
sional surface shown in Figure 3. Each of the eight 
branches of this branched manifold can be named. 
One way to do this specifies the two branch lines 
that are joined by the branch in the sense of the flow 
(e.g., (aa) and (Ga) (but not (a3)). Every closed field 
line can be labeled by a symbol sequence that is
unique up to cyclic permutation. This symbol
sequence provides a symbolic name for the orbit.
For example, (aa)(aβ)(βb)(ba) is a period-4 orbit.
The structure of a branched manifold is determined
in part by a transition matrix $T$. The matrix element
$T_{ij}$ is 1 if the transition from branch $i$ to branch $j$ is
allowed, 0 otherwise. The transition matrix for the
figure-8 branched manifold is shown in Figure 3.

Figure 3 Figure-8 knot (dark curve) and the figure-8 branched
manifold. Transition matrix for the eight branches of the figure-8
branched manifold is also shown. Flow direction is shown by
arrows. The Topology of Chaos; R Gilmore and M Lefranc;
Copyright © 2002, Wiley. This material is used by permission of
John Wiley & Sons, Inc.

The Birman-Williams theorem is stronger than its 
statement suggests. More systems satisfy the state- 
ment of the theorem than do the assumptions of the 
theorem. The figure-8 knot, and its attendant 
magnetic field, is not dissipative — in fact, it is not 
even a dynamical system, yet the closed loops can be 
isotoped to the figure-8 knot holder. There are other 
ways in which the Birman-Williams theorem is 
stronger than its statement suggests. 

It is apparent from Figure 3 that the figure-8
branched manifold can be built up Lego® fashion
from the two basic building blocks shown in
Figure 2. This is more generally true. Every
branched manifold can be built up, Lego® fashion,
from the stretch (with a splitting point singularity) 
and the squeeze (with a branch line singularity) 
building blocks, subject to the following two 
conditions: 


1. outputs flow to inputs and 
2. there are no free ends. 


The figure-8 branched manifold is built up from 
four stretch and four squeeze building blocks. As a 
result, there are eight branches and four branch 
lines. 

Two often-studied strange attractors are shown in 
Figures 4 and 5. Figure 4 shows the details of the 
Rössler dynamical system. A similar spectrum of
features is shown in Figure 5 for the Lorenz equations.
The knot holder in Figure 5e is obtained from the
caricature in Figure 5d by twisting the right-hand lobe
by $\pi$ radians.

Branched manifolds can be used to characterize 
all three-dimensional strange attractors. Branched 
manifolds that classify the strange attractors gener- 
ated by four familiar sets of equations (for some 
control parameter values) are shown in Figure 6. 
The sets of equations, and one set of parameter 
values that generate strange attractors, are presented 
in Table 1. 

The beauty of this topological classification of 
strange attractors is that it is apparent, just by 
inspection, that there is no smooth change of 
variables that will map any of these systems to any 
of the others for the parameter values shown. 

Figure 4 The Rössler dynamical system. (a) Rössler equations. (b) Time series z(t) and x(t) generated by these equations, and
(c) projection of the strange attractor onto the x-y plane. (d) Caricature of the flow and (e) knot holder derived directly from the
caricature. Control parameter values (a, b, c) = (2.0, 4.0, 0.398). The Topology of Chaos; R Gilmore and M Lefranc; Copyright © 2002,
Wiley. This material is used by permission of John Wiley & Sons, Inc.

Figure 5 (a) Lorenz equations. (b) Time series x(t) and z(t) generated by these equations, and (c) projection of the strange attractor
onto the x-y plane. (d) Caricature of the flow and (e) knot holder derived directly from the caricature by rotating the right-hand lobe by $\pi$
radians. Control parameter values $(R, \sigma, b) = (26.0, 10.0, 8/3)$. The Topology of Chaos; R Gilmore and M Lefranc; Copyright © 2002,
Wiley. This material is used by permission of John Wiley & Sons, Inc.

Branched manifolds can be described algebraically.
In Figure 7 we provide the algebraic
description of two branched manifolds. Figure 7a
shows the branched manifold that describes experi- 
mental data generated by many physical systems. 
The mechanism is a simple stretch-and-fold defor- 
mation with zero global torsion that generates a 
typical Smale horseshoe. There are two branches. 
The diagonal elements of the matrix identify the 
local torsion of the flow through the corresponding 
branch, measured in units of $\pi$. Branch 0 has no
local torsion, and branch 1 shows a half-twist and 
has local torsion +1. The off-diagonal matrix 


elements are twice the linking number of the 
period-1 orbits in the corresponding pair of branches. 
Since the period-1 orbits in these two branches do not 
link, the off-diagonal matrix elements are 0. The 
period-1 orbits in the branches labeled 1 and 2 in 
Figure 7b have linking number +1, so the off-diagonal 
matrix elements are $T(1,2) = T(2,1) = 2 \times (+1)$. The
array identifies the order (above, below) that the two 
branches are joined at the branch line, the smaller the 
value, the closer to the viewer. These two pieces of 
information, four integers in Figure 7a and eight in 
Figure 6 Branched manifolds for four standard sets of
equations: (a) Rössler equations, (b) periodically driven Duffing
equations, (c) periodically driven van der Pol equations, and
(d) Lorenz equations. The Topology of Chaos; R Gilmore and
M Lefranc; Copyright © 2002, Wiley. This material is used by
permission of John Wiley & Sons, Inc.





Table 1 Four sets of equations that generate strange attractors

Dynamical system   ODEs                                Parameter values

Rössler            dx/dt = -y - z                      (a, b, c) = (2.0, 4.0, 0.398)
                   dy/dt = x + a y
                   dz/dt = b + z (x - c)

Duffing            dx/dt = y                           (δ, A, ω) = (0.4, 0.4, 1.0)
                   dy/dt = -δ y - x^3 + x + A sin(ωt)

van der Pol        dx/dt = b y + (c - d y^2) x         (b, c, d, A, ω) = (0.7, 1.0, 10.0, 0.25, π/2)
                   dy/dt = -x + A sin(ωt)

Lorenz             dx/dt = -σ x + σ y                  (R, σ, b) = (26.0, 10.0, 8/3)
                   dy/dt = R x - y - x z
                   dz/dt = -b z + x y


Figure 7b, serve to determine the topological organi- 
zation of all the unstable periodic orbits in any 
strange attractor with either branched manifold. 

The periodic orbits are identified by a repeating 
symbol sequence of least period p, which is unique 
up to cyclic permutation. The symbol sequence 
consists of a string of integers, sequentially identify- 
ing the branches through which the orbit passes. For 
a branched manifold with two branches, there are 
two symbols. The number of orbits of period 
p, N(p), obeys the recursion relation 


$$p\,N(p) = 2^p - \sum_{\substack{k \mid p \\ 1 \le k \le p/2}} k\,N(k) \qquad [3]$$



Figure 7 Branched manifolds are described algebraically. The 
diagonal matrix elements describe the twist of each branch. 
The off-diagonal matrix elements are twice the linking number of 
the period-1 orbits in each of the two branches. The array 
describes the order in which the branches are connected at the 
branch line. (a) Smale horseshoe branched manifold. (b) Beginning 
of a “gateau roulé” (jelly roll) branched manifold. 




Table 2 shows the number of orbits of period
$p \le 20$ for the branched manifolds with two and
three branches shown in Figure 7. The number of
orbits of period p grows exponentially with p, and
the limit $h_T = \lim_{p \to \infty} \log(N(p))/p$ defines the
topological entropy $h_T$ for the branched manifold. The
limits are ln 2 and ln 3 for the branched manifolds
with two and three branches, respectively. The
linking numbers of orbits up to period 5 in the
Smale horseshoe branched manifold are shown in
Table 3, which identifies each of the orbits by its
symbol sequence (e.g., 00111).
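
The recursion in eqn [3] is easy to evaluate directly. The following is a minimal sketch, not code from the original article; the function name `orbit_counts` and the parameter `n_symbols` are illustrative choices. It builds N(p) from the divisors k of p with k ≤ p/2, and compares the finite-p estimate log(N(p))/p with ln 2.

```python
import math


def orbit_counts(max_period, n_symbols=2):
    """Return {p: N(p)} using p*N(p) = n^p - sum of k*N(k) over divisors k <= p/2."""
    counts = {}
    for p in range(1, max_period + 1):
        # Points already accounted for by orbits whose period k properly divides p.
        shorter = sum(k * counts[k] for k in range(1, p // 2 + 1) if p % k == 0)
        counts[p] = (n_symbols ** p - shorter) // p
    return counts


two_branches = orbit_counts(20, n_symbols=2)
three_branches = orbit_counts(20, n_symbols=3)

print(two_branches[10], three_branches[10])
# The finite-p estimate approaches ln 2 slowly, because N(p) is roughly 2^p / p.
print(math.log(two_branches[20]) / 20, math.log(2))
```

Replacing `n_symbols=2` by `n_symbols=3` corresponds to replacing $2^p$ by $3^p$ in eqn [3], which is how the three-branch column of Table 2 is described.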


Table 2 Number of orbits of period p on the branched manifolds
with two and three branches, shown in Figure 7. The integers
N3(p) are constructed by replacing 2^p by 3^p in eqn [3]

Period   Two branches   Three branches     Period   Two branches   Three branches
p        N2(p)          N3(p)              p        N2(p)          N3(p)

1        2              3                  11       186            16104
2        1              3                  12       335            44220
3        2              8                  13       630            122640
4        3              18                 14       1161           341484
5        6              48                 15       2182           956576
6        9              116                16       4080           2690010
7        18             312                17       7710           7596480
8        30             810                18       14532          21522228
9        56             2184               19       27594          61171656
10       99             5880               20       52377          174336264
Table 3 Linking numbers of orbits to period 5 in the Smale horseshoe branched manifold with zero global torsion

Orbit Symbols      0     1    01   011   001  0111  0011  0001 01111 01101 00111 00101 00011 00001

      0            0     0     0     0     0     0     0     0     0     0     0     0     0     0
      1            0     0     1     1     1     2     1     1     2     2     2     2     1     1
2_1   01           0     1     1     2     2     3     2     2     4     4     3     3     2     2
3_1   011          0     1     2     2     3     4     3     3     5     5     5     5     3     3
3_1   001          0     1     2     3     2     4     3     3     5     5     4     4     3     3
4_1   0111         0     2     3     4     4     5     4     4     8     8     7     7     4     4
4_2   0011         0     1     2     3     3     4     3     4     5     5     5     5     4     4
4_2   0001         0     1     2     3     3     4     4     3     5     5     5     5     4     4
5_1   01111        0     2     4     5     5     8     5     5     8    10     9     9     5     5
5_1   01101        0     2     4     5     5     8     5     5    10     8     8     8     5     5
5_2   00111        0     2     3     5     4     7     5     5     9     8     6     7     5     5
5_2   00101        0     2     3     5     4     7     5     5     9     8     7     6     5     5
5_3   00011        0     1     2     3     3     4     4     4     5     5     5     5     4     5
5_3   00001        0     1     2     3     3     4     4     4     5     5     5     5     5     4

Tables of linking numbers have been used
successfully to identify mechanisms that nature uses
to generate chaotic data. This analysis procedure is
called topological analysis. Segments of data are
identified that closely approximate unstable periodic
orbits existing in the strange attractor. These data
segments are then embedded in $\mathbb{R}^3$. Each orbit is
given a trial identification (symbol sequence). Their
pairwise linking numbers are computed either by
counting signed crossings or by using the
time-parametrized data segments and estimating the
integers numerically using the Gauss linking integral

$$L(A,B) = \frac{1}{4\pi} \oint\!\oint \frac{(\mathbf{r}_A(t_1) - \mathbf{r}_B(t_2)) \cdot (d\mathbf{r}_A(t_1) \times d\mathbf{r}_B(t_2))}{|\mathbf{r}_A(t_1) - \mathbf{r}_B(t_2)|^3} \qquad [4]$$

This table of experimental integers is compared with
the table of linking numbers for orbits with the same
symbolic name on a trial branched manifold. This
procedure serves to identify the branched manifold
and refine the symbolic identifications of the
experimental orbits, if necessary. The procedure is
vastly overdetermined. For example, the linking
numbers of only three low-period orbits serve to
identify the four pieces of information required to
specify a branched manifold with two branches.
Since six or more surrogate periodic orbits can
typically be extracted from experimental data,
providing $\binom{6}{2} = 15$ or more linking numbers, this
topological analysis procedure has built-in
self-consistency checks, unlike analysis procedures
based on geometric and dynamical tools.


Basis Sets of Orbits


A branched manifold determines the topological
organization of all the periodic orbits that it
supports. Whenever a low-dimensional strange
attractor is subjected to topological analysis, it is
always the case that fewer periodic orbits are
present and identified than are allowed by the
branched manifold that classifies it. This is the case
for strange attractors generated by experimental
data as well as strange attractors generated by
ODEs. The full spectrum occurs only in the
hyperbolic limit, which has never been seen.
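
Returning briefly to the Gauss linking integral used in topological analysis: for two closed curves sampled at discrete times it can be estimated by a double sum. The sketch below is an illustration under stated assumptions, not the procedure of any particular experiment. The function name `gauss_linking_number`, the midpoint discretization, and the two test circles are all illustrative choices. The overall sign depends on the orientations chosen for the curves, so only the magnitude is checked.

```python
import numpy as np


def gauss_linking_number(curve_a, curve_b):
    """Estimate the Gauss linking integral for two closed polygonal curves.

    Each curve is an (N, 3) array of points; the last point connects back to the first.
    """
    a = np.asarray(curve_a, dtype=float)
    b = np.asarray(curve_b, dtype=float)
    da = np.roll(a, -1, axis=0) - a          # segment vectors, stand-ins for dr_A
    db = np.roll(b, -1, axis=0) - b          # segment vectors, stand-ins for dr_B
    mid_a = a + 0.5 * da                     # evaluate the integrand at segment midpoints
    mid_b = b + 0.5 * db
    sep = mid_a[:, None, :] - mid_b[None, :, :]            # r_A(t1) - r_B(t2)
    cross = np.cross(da[:, None, :], db[None, :, :])       # dr_A x dr_B
    integrand = np.sum(sep * cross, axis=2) / np.linalg.norm(sep, axis=2) ** 3
    return integrand.sum() / (4.0 * np.pi)


t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle_a = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
# A second unit circle in the x-z plane that threads through the first one.
circle_b = np.stack([1.0 + np.cos(t), np.zeros_like(t), np.sin(t)], axis=1)
# The same circle moved far away, so that the two curves no longer link.
circle_far = circle_b + np.array([5.0, 0.0, 0.0])

linked = gauss_linking_number(circle_a, circle_b)
unlinked = gauss_linking_number(circle_a, circle_far)
print(round(abs(linked)), round(abs(unlinked)))
```

With noisy experimental segments the estimate is not exactly an integer, which is one reason to compare it with the signed-crossing count.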

The orbits that are present are organized exactly 
as in the hyperbolic limit — that is, as determined by 
the underlying branched manifold. As control para- 
meters change, the strange attractor undergoes 
perestroikas. New orbits are created and/or old 
orbits are annihilated in direct or inverse period- 
doubling and saddle-node bifurcations. The orbits 
that are present are always organized as determined 
by the branched manifold. Orbits are not created or 
annihilated independently of each other. Rather, 
there is a partial order (“forcing order”) involved in 
orbit creation and annihilation. This partial order is 
poorly understood for general branched manifolds. 
It is much better understood for the two-branch 
Smale horseshoe branched manifold. 

The forcing diagram for this branched manifold
is shown in Figure 8 for orbits up to period 8. It is
typically the case that the existence of one orbit in
a strange attractor forces the presence of a
spectrum of additional orbits. Forcing is transitive:
if orbit A forces orbit B ($A \Rightarrow B$) and B forces C
($B \Rightarrow C$), then A forces C ($A \Rightarrow C$).
For this reason, it is sufficient to show only the
first-order forcing in this figure. The orbits shown
are labeled by their period and the order in which
they are created in a particular highly dissipative
limit of the dynamics: the logistic map (U-sequence
order in Figure 8). For example, 5_2 describes the
second (pair) of period-5 orbits created in the
logistic map in the transition from simple,
nonchaotic behavior to fully chaotic (hyperbolic)
behavior.

Figure 8 (a) Forcing diagram for orbits up to period 8 in the Smale horseshoe branched manifold. (b) The sequence (“universal
order”) in which orbits are created in the highly dissipative limit, which is the logistic map. The Topology of Chaos; R Gilmore and
M Lefranc; Copyright © 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.

The orbits in the forcing diagram are organized 
according to their one-dimensional entropy 
(horizontal axis, U-sequence order) and their two- 
dimensional entropy (vertical axis). Nonchaotic 
(“laminar”) behavior occurs at the lower left of 
this figure, where both entropies are zero. Fully 
chaotic behavior occurs at the upper right, where
both entropies are ln 2. As control parameters
change, a dynamical system that can exhibit chaos 
generated by a stretch-and-fold mechanism follows a 
path in the forcing diagram from the lower left to 
the upper right. Each such path is a “route to 
chaos.” The Smale horseshoe mechanism exhibits 
many different routes to chaos: each follows a 
different path in the forcing diagram. 

The state of a strange attractor at any stage in its 
route to chaos can be specified by a “basis set of 
orbits.” This is a set of orbits whose presence forces 
the existence of all other orbits that can concur- 
rently be found in the attractor, up to any finite 


period. The basis set of orbits can be constructed 
algorithmically. The algorithm is as follows: 


1. Write down all the orbits that are present in 
order of increasing two-dimensional entropy 
from left to right. 

2. For orbits with the same two-dimensional entropy, 
order by increasing one-dimensional entropy. 

3. Remove the “highest” (rightmost) orbit from this 
list, together with all the orbits that it forces. 
This is the first basis orbit. 

4. Of the orbits remaining, again remove the right- 
most and all the orbits that it forces. This is the 
second basis orbit. 

5. Continue until all orbits have been removed. 
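
The five steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the orbits are assumed to be already sorted as in steps 1 and 2 (lowest first), and the first-order forcing relation is supplied as a dictionary. The function name `basis_set` and the toy orbits `A` to `D` are hypothetical; they are not taken from the forcing diagram of Figure 8.

```python
def basis_set(present, forces):
    """Return the basis orbits for a sorted list of present orbits.

    present: orbit names sorted as in steps 1-2, lowest first.
    forces:  dict mapping an orbit to the set of orbits it forces directly.
    """

    def all_forced(orbit):
        # Forcing is transitive, so follow first-order forcing to its closure.
        seen, stack = set(), [orbit]
        while stack:
            for nxt in forces.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    remaining = list(present)
    basis = []
    while remaining:                       # step 5: repeat until nothing is left
        top = remaining.pop()              # steps 3-4: take the rightmost orbit
        basis.append(top)
        implied = all_forced(top)
        remaining = [o for o in remaining if o not in implied]
    return basis


# Toy example: D forces B, and B forces A; C is forced by nothing.
toy_order = ["A", "B", "C", "D"]
toy_forcing = {"D": {"B"}, "B": {"A"}}
toy_basis = basis_set(toy_order, toy_forcing)
print(toy_basis)
```

In the toy example the orbit `A` never appears in the basis because it is removed through transitivity when `D` is taken, even though `D` does not force it directly.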


For any finite period, the above algorithm 
terminates because there is only a finite number of 
orbits. For example, if the orbit 5_2 is present as well
as all orbits with lower one-dimensional entropy, 
the basis set is 87R, 76, 74F, 8¢6F, 83,52. As control 
parameters change, a strange attractor undergoes 
perestroikas that are quantitatively determined by 
changes in the basis sets of orbits. 


Bounding Tori 


As experimental conditions or control parameters 
change, strange attractors can undergo “grosser” 
perestroikas than those that can be described by a 
change in the basis set of orbits. This occurs when new 
orbits are created that cannot be contained on the initial 
branched manifold — for example, when orbits are 
created that must be described by a new symbol. This is 
seen experimentally in the transition from horseshoe 
type dynamics to gateau roulé type dynamics. This 
involves the addition of a third branch to the branched 
manifold with two branches, as shown in Figures 7a 
and 7b. Strange attractors can undergo perestroikas 
described by the addition of new branches to, or 
deletion of old branches from, a branched manifold. 
These perestroikas are in a very real sense “grosser” 
than the perestroikas that can be described by changes 
in the basis sets of orbits on a fixed branched manifold. 

There is a structure that provides constraints on 
the allowed bifurcations of branched manifolds 
(creation/annihilation of branches), which is analo- 
gous to the constraints that a branched manifold 
provides on the bifurcations and topological organi- 
zation of the periodic orbits that can exist on it. This 
structure is called a bounding torus. 

Bounding tori are constructed as follows. The semiflow
on a branched manifold is “inflated” or “blown
up” to a flow on a thin open set in $\mathbb{R}^3$ containing this
branched manifold. The boundary of this open set is a
two-dimensional surface. Such surfaces have been
classified: each is a torus of genus g, with g = 0
(sphere), g = 1 (tire tube), g = 2, 3, .... The torus of
genus g has Euler characteristic $\chi = 2 - 2g$. The flow is
into this surface. The flow, restricted to the surface,
exhibits a singularity wherever it is normal to the
surface. At such singularities the stability is determined
by the local Lyapunov exponents: $\lambda_1 > 0$ and $\lambda_3 < 0$,
since the flow direction ($\lambda_2 = 0$) is normal to the
surface. As a result, all singularities are saddles, each of
index $-1$; so, by the Poincaré-Hopf theorem, the indices sum
to $\chi = 2 - 2g$ and the number of singularities is fixed by
the genus. The number is $2(g - 1)$.

The flow, restricted to the genus-g surface, can be 
put into canonical form and these canonical forms can 
be classified. The classification involves projection of 
the genus-g torus onto a two-dimensional surface. The 
planar projection consists of a disk with outer 
boundary and g interior holes. All singularities can be 
placed on the interior holes. The flow on the interior 
holes without singularities is in the same direction as 
the flow on the exterior boundary. Interior holes with 
singularities have an even number, 4,6,.... Some 
canonical forms are shown in Figure 9. 

Poincaré sections have been used to simplify the 
study of flows in low-dimensional spaces by effec- 
tively reducing the dimension of the dynamics. In 
three dimensions, a Poincaré surface of section for a 
strange attractor is a minimal two-dimensional sur- 
face with the property that all points in the attractor 
intersect this surface transversally an infinite number 
of times under the flow. The Poincaré surface need 
not be connected and in fact is often not connected. 

The Poincaré section for the flow in a genus-g torus
consists of the union of $g - 1$ disjoint disks ($g \ge 3$) or
is a single disk (g = 1). The locations of the disks are
determined algorithmically, as shown in Figure 9. The 
interior circles without singularities are labeled by 
capital letters A, B,C,... and those with singularities 
are labeled with lowercase letters a,b,c,... The 
components of the global Poincaré surface of section 
are numbered sequentially 1,2,...,g — 1, in the order 
they are encountered when traversing the outer 
boundary in the direction of the flow, starting from 
any point on that boundary. Each component of the 
global Poincaré surface of section connects (in the 
projection) an interior circle without singularities to 
the exterior boundary. There is one component 
between each successive encounter of the flow with 





[Figure 9 panel labels: (a) ABCBDED, abbacca. The remaining two panels carry the labels ABCDCBE, abccbaa and ABCBDBE, abbccaa.]


Figure 9 Three inequivalent canonical forms of genus 8 are shown. Each is identified by a “period-7 orbit” and its dual. Reprinted 
figure with permission from Physical Review E, 69, 056206, 2004. Copyright (2004) by the American Physical Society. 
holes that have singularities. Heavy lines are used to
show the location of the seven components of the
global Poincaré surface of section for each of the three
inequivalent genus-8 canonical forms shown in
Figure 9. The structure of the flow is summarized by
a transition matrix. For the canonical form shown in
Figure 9c the transition matrix is

$$T = \begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}$$

where $T_{ij} = 1$ if the flow can proceed directly from
component i to component j, 0 otherwise.
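
A transition matrix of this kind can be read as a directed graph on the components of the Poincaré surface of section. The short sketch below is illustrative only: the entries are transcribed from the matrix quoted for Figure 9c, and the helper name `successors` is a hypothetical choice. It lists, for each component, the components the flow can reach directly.

```python
# Rows and columns are the seven components of the section, numbered 1..7.
T = [
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
]


def successors(matrix, i):
    """Components j (1-based) with T_ij = 1, i.e. reachable directly from component i."""
    return [j + 1 for j, allowed in enumerate(matrix[i - 1]) if allowed]


for i in range(1, len(T) + 1):
    print(i, "->", successors(T, i))
```

In this transcription every component has exactly two possible successors and two possible predecessors.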
Bounding tori, dressed with flows, can be labeled. In 
fact, two dual labeling schemes are possible. Following 
the outer boundary in the direction of the flow, one 
encounters the g — 1 components of the global Poin- 
caré surface of section sequentially, the interior holes 
without singularities at least once each, and the interior
holes with singularities at least twice each. The
canonical form (genus-g torus dressed with a flow) on 
the genus-8 bounding torus shown in Figure 9a can be 
labeled by the sequence in which the holes without 
singularities are encountered (ABCBDED) or the order 
in which the holes with singularities are encountered 
(abbacca). Both sequences contain g— 1 symbols. 
These labels are unique up to cyclic permutation. 
Symbol sequences for canonical forms for bounding 
tori act in many ways like symbol sequences for 
periodic orbits on branched manifolds. Although there 
is a 1:1 correspondence between bounded closed
two-dimensional surfaces in $\mathbb{R}^3$ and genus g, the number of


Table 4 Number of canonical bounding tori as a function of
genus g

g    N(g)      g    N(g)      g    N(g)

3    1         9    15        15   2211
4    1         10   28        16   5549
5    2         11   67        17   14290
6    2         12   145       18   36824
7    5         13   368       19   96347
8    6         14   870       20   252927


canonical forms grows rapidly with g, as shown in
Table 4. In fact, the number, N(g), grows exponentially
and can even be assigned an entropy:

$$\lim_{g \to \infty} \frac{\ln(N(g))}{g - 1} = \ln 3 \qquad [5]$$


In some sense, canonical forms that constrain 
branched manifolds within them behave like branched 
manifolds that constrain periodic orbits on them. 
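
The growth of N(g) can be inspected directly from the values in Table 4. The sketch below is a small worked calculation, not part of the original article. It prints the successive ratios N(g+1)/N(g) and the finite-g estimate ln(N(g))/(g − 1); normalizing by g − 1, the number of symbols in a label, is an assumption made here for illustration.

```python
import math

# Values of N(g) copied from Table 4.
N = {
    3: 1, 4: 1, 5: 2, 6: 2, 7: 5, 8: 6,
    9: 15, 10: 28, 11: 67, 12: 145, 13: 368, 14: 870,
    15: 2211, 16: 5549, 17: 14290, 18: 36824, 19: 96347, 20: 252927,
}

ratios = {g: N[g + 1] / N[g] for g in range(3, 20)}
estimates = {g: math.log(N[g]) / (g - 1) for g in N}

for g in (10, 15, 19):
    print(g, round(ratios[g], 3), round(estimates[g], 3))
```

The ratios settle down much faster than the logarithmic estimate, which is the usual behavior when an exponential is multiplied by a slowly varying prefactor.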

Every strange attractor that has been studied in $\mathbb{R}^3$
has been described by a canonical bounding torus that 
contains it. This classification is shown in Table 5. 

Branched manifold perestroikas are constrained 
by bounding tori as follows. Each branch line of any 
branched manifold can be moved into one of the 
g—1 components of the global Poincaré surface of 
section. Any branched manifold contained in a 
genus-g bounding torus ($g \ge 3$) must have at least
one branch between each pair of components of the 
global Poincaré surface of section between which the 
flow is allowed, as summarized by the canonical 
form’s transition matrix. New branches can only be 
added in a way that is consistent with the canonical 
form’s transition matrix, continuity requirements, 
and the no intersection condition. 


Table 5 All known strange attractors of dimension d_L < 3 are bounded by one of the standard dressed tori. Dual labels for the
bounding tori depend on g - 1 symbols describing holes with or without singularities

Strange attractor                            Holes w/o singularities    Holes with singularities     Genus

Rössler, Duffing, Burke, and Shaw            A                                                       1
Various lasers, gateau roulé                 A                                                       1
Neuron with subthreshold oscillations        A                                                       1
Shaw-van der Pol                             A                                                       1
Lorenz, Shimizu-Morioka, Rikitake            AB                         aa                           3
C_2 covers of Rössler                        AB                         a^2                          3
C_2 cover of Lorenz (a)                      ABCD                       a^4                          5
C_2 cover of Lorenz (b)                      ABCB                       abba                         5
2 → 1 image of figure-8 branched manifold    ABCB                       ab(ab)'                      5
Figure-8 branched manifold                   AEBECEDE                   abcd                         9
C_n covers of Rössler                        AB···N                     a^n                          n+1
C_n cover of Lorenz (a)                      AB···(2N)                  a^{2n}                       2n+1
C_n cover of Lorenz (b)                      (AZ)(BZ)···(NZ)            ab:n                         2n+1
Multispiral attractors                       A(B···M)N(B···M)^{-1}      (ab···m)(ab···m)^{-1}        2m+1

(a) Rotation axis through origin.
(b) Rotation axis through one focus.


In the simplest case, g = 1, a third branch can be
added to a branched manifold with two branches only
if its local torsion differs by $\pm 1$ from the adjacent
branch. In addition, the ordering of the new branch
must be consistent with the continuity and
no-intersection (ODE uniqueness theorem) requirements.


Embeddings of Bounding Tori 


The last level of topological structure needed for the
classification of strange attractors in $\mathbb{R}^3$ describes
their embeddings in $\mathbb{R}^3$. The classification using
genus-g bounding tori is intrinsic; that is, the
canonical form shows how the flow looks from
inside the torus. Strange attractors, and the tori that
bound them, are actually embedded in $\mathbb{R}^3$. For a
complete classification, we must specify not only the
canonical form but also how this form sits in $\mathbb{R}^3$.
This program has not yet been completed, but we
illustrate it with the genus-1 bounding torus in
Figure 10. Figure 10a shows the canonical form, and
Figures 10b and 10c show two different embeddings of it in $\mathbb{R}^3$.
The embedding in Figure 10b is unknotted. The embedding in
Figure 10c is knotted like a figure-8 knot. Extrinsic embeddings of
genus-1 tori are described by tame knots in $\mathbb{R}^3$, and
tame knots can be used as “centerlines” for extrinsically
embedded genus-1 tori. Higher-genus ($g \ge 3$)
canonical forms (intrinsic genus-g tori dressed with a
canonical flow) have a larger (but discrete) variety of
extrinsic embeddings in $\mathbb{R}^3$.

Figure 10 (a) Canonical form for genus-1 bounding torus.
Extrinsic embeddings of the torus into $\mathbb{R}^3$ that are (b) unknotted
and (c) knotted like the figure-8 knot.


The Embedding Question 


The mechanism that nature uses to generate chaotic
behavior in physical systems is not directly observable,
and must be deduced by examining the data that are
generated. Typically, the data consist of a single scalar
time series that is discretely recorded: $x_i$, $i = 1, 2, \ldots$.
In order to exhibit a strange attractor, a mapping of the
data into $\mathbb{R}^N$ must also be constructed. If the attractor
is low dimensional ($d_L < 3$), one can hope that a
mapping into $\mathbb{R}^3$ can be constructed that exhibits no
self-intersections or other degeneracies. Such a map is
called an embedding. Once an embedding in $\mathbb{R}^3$ is
available, a topological analysis can be carried out. The
analysis reveals the mechanism that underlies the
creation of the embedded strange attractor.
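
The text does not say how the mapping of a scalar series into $\mathbb{R}^N$ is built. One common construction, used here purely as an illustrative assumption, is a time-delay map that sends $x_i$ to $(x_i, x_{i+\mathrm{lag}}, x_{i+2\,\mathrm{lag}})$. The function name `delay_embed`, the parameters `dim` and `lag`, and the sine-wave test series are hypothetical; whether the resulting map is free of self-intersections has to be checked for each data set.

```python
import numpy as np


def delay_embed(x, dim=3, lag=1):
    """Map a scalar series x_i to points (x_i, x_{i+lag}, ..., x_{i+(dim-1)*lag})."""
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("series too short for this dim and lag")
    # Column i holds the series shifted by i*lag samples.
    return np.stack([x[i * lag: i * lag + n_points] for i in range(dim)], axis=1)


series = np.sin(0.1 * np.arange(500))      # stand-in for a recorded scalar time series
points = delay_embed(series, dim=3, lag=8)
print(points.shape)
```

Each row of `points` is one point of the reconstructed trajectory in $\mathbb{R}^3$, ready for the linking-number calculations of a topological analysis.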

But how do you know that the mechanism that 
generates the observed, embedded strange attractor 
has anything to do with the mechanism nature used 
to generate the experimental data? 

If the embedding is contained in a genus-1 bounding
torus, then the topological mechanism that generates
the data, as defined by some unknown branched
manifold $BM_{\mathrm{exp}}$, and the topological mechanism that
is identified from the embedded strange attractor,
$BM_{\mathrm{emb}}$, are identical up to three degrees of freedom:
parity, global torsion, and the knot type. As a result, in
this case (genus-1) a topological analysis of embedded
data does reveal nature’s hidden secrets.


See also: Ergodic theory; Fractal dimensions in 
dynamics; Generic Properties of Dynamical Systems; 
Gravitational N-body Problem (Classical); 
Homeomorphisms and Diffeomorphisms of the Circle; 
Homoclinic phenomena; Inviscid Flows; Lyapunov 
Exponents and Strange Attractors; Nonequilibrium 
Statistical Mechanics (Stationary): Overview; Random 
Algebraic Geometry, Attractors and Flux Vacua; Random 
Matrix Theory in Physics; Regularization for Dynamical 
Zeta Functions; Singularity and Bifurcation Theory; 
Symmetry and Symmetry Breaking in Dynamical 
Systems; Synchronization of Chaos. 
Vector Bundles


Let $\mathrm{Vect}_k(M, F)$ be the set of isomorphism classes of
real ($F = \mathbb{R}$) or complex ($F = \mathbb{C}$) vector bundles of
rank k over a smooth connected m-dimensional
manifold M. Let

$$\mathrm{Vect}(M, F) = \bigcup_k \mathrm{Vect}_k(M, F)$$


Principal Bundles - Examples

Let H be a Lie group. A fiber bundle

$$p : P \to M$$

with fiber H is said to be a principal bundle if there
is a right action of H on P which acts transitively on
the fibers, that is, if P/H = M. If H is a closed
subgroup of a Lie group G, then the natural
projection $G \to G/H$ is a principal H bundle over
the homogeneous space G/H. Let O(k) and U(k)
denote the orthogonal and unitary groups, respectively.
Let $S^k$ denote the unit sphere in $\mathbb{R}^{k+1}$. Then
we have natural principal bundles:

$$O(k) \subset O(k+1) \to S^k$$
$$U(k) \subset U(k+1) \to S^{2k+1}$$

Let $\mathbb{RP}^k$ and $\mathbb{CP}^k$ denote the real and complex
projective spaces of lines through the origin in $\mathbb{R}^{k+1}$
and $\mathbb{C}^{k+1}$, respectively. Let

$$\mathbb{Z}_2 = \{\pm \mathrm{Id}\} \subset O(k)$$
$$S^1 = \{\lambda \cdot \mathrm{Id} : |\lambda| = 1\} \subset U(k)$$

One has $\mathbb{Z}_2$ and $S^1$ principal bundles:

$$\mathbb{Z}_2 \to S^{k-1} \to \mathbb{RP}^{k-1}$$
$$S^1 \to S^{2k-1} \to \mathbb{CP}^{k-1}$$



Frames 


A frame $s := (s_1, \ldots, s_k)$ for $V \in \mathrm{Vect}_k(M, F)$ over an
open set $O \subset M$ is a collection of k smooth sections
to $V|_O$, so that $\{s_1(P), \ldots, s_k(P)\}$ is a basis for the
fiber $V_P$ of V over any point $P \in O$. Given such a
frame s, we can construct a local trivialization which
identifies $O \times F^k$ with $V|_O$ by the mapping

$$(P; \lambda_1, \ldots, \lambda_k) \to \lambda_1 s_1(P) + \cdots + \lambda_k s_k(P)$$

Conversely, given a local trivialization of V, we can
take the coordinate frame

$$s_i(P) = P \times (0, \ldots, 0, 1, 0, \ldots, 0)$$

Thus, frames and local trivializations of V are
equivalent notions.


Simple Covers 


An open cover $\{O_\alpha\}$ of M, where $\alpha$ ranges over some
indexing set A, is said to be a simple cover if any
finite intersection $O_{\alpha_1} \cap \cdots \cap O_{\alpha_k}$ is either empty or
contractible.

Simple covers always exist. Put a Riemannian
metric on M. If M is compact, then there exists a
uniform $\delta > 0$ so that any geodesic ball of radius $\delta$ is
geodesically convex. The intersection of geodesically
convex sets is either geodesically convex (and hence
contractible) or empty. Thus, covering M by a finite
number of balls of radius $\delta$ yields a simple cover.
The argument is similar even if M is not compact,
where an infinite number of geodesic balls is used
and the radii are allowed to shrink near $\infty$.


Transition Cocycles 


Let Hom(F, k) be the set of linear transformations of
$F^k$ and let $GL(F, k) \subset \mathrm{Hom}(F, k)$ be the group of all
invertible linear transformations.

Let $\{s_\alpha\}$ be frames for a vector bundle V over some
open cover $\{O_\alpha\}$ of M. On the intersection $O_\alpha \cap O_\beta$,
one may express $s_\alpha = \psi_{\alpha\beta} s_\beta$, that is

$$s_{\alpha,i}(P) = \sum_{1 \le j \le k} \psi_{\alpha\beta,i}^{\,j}(P)\, s_{\beta,j}(P)$$


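The maps $\psi_{\alpha\beta} : O_\alpha \cap O_\beta \to GL(F, k)$ satisfy

$$\psi_{\alpha\alpha} = \mathrm{Id} \ \text{on } O_\alpha, \qquad
\psi_{\alpha\beta} = \psi_{\alpha\gamma}\psi_{\gamma\beta} \ \text{on } O_\alpha \cap O_\beta \cap O_\gamma \qquad [1]$$

Let G be a Lie group. Maps belonging to a
collection $\{\psi_{\alpha\beta}\}$ of smooth maps from $O_\alpha \cap O_\beta$ to G
which satisfy eqn [1] are said to be transition
cocycles with values in G; if $G \subset GL(F, k)$, they
can be used to define a vector bundle by making
appropriate identifications.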
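
The cocycle conditions can be checked numerically at a single point P. The sketch below is illustrative and rests on a simplifying assumption: the transition matrices are taken of the form $\psi_{\alpha\beta} = g_\alpha g_\beta^{-1}$, which is the easiest family to write down (such cocycles describe a trivial bundle, since the frames $g_\alpha^{-1} s_\alpha$ agree on overlaps). The names `unit_determinant_matrix`, `g`, and `psi` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3


def unit_determinant_matrix(size):
    """Product of unit lower- and unit upper-triangular matrices, so always invertible."""
    lower = np.eye(size) + np.tril(rng.normal(size=(size, size)), k=-1)
    upper = np.eye(size) + np.triu(rng.normal(size=(size, size)), k=1)
    return lower @ upper


# One invertible matrix per open set, standing for a change of frame at the point P.
g = {name: unit_determinant_matrix(k) for name in ("alpha", "beta", "gamma")}


def psi(a, b):
    """Transition matrix from the frame over O_b to the frame over O_a at P."""
    return g[a] @ np.linalg.inv(g[b])


identity_ok = np.allclose(psi("alpha", "alpha"), np.eye(k))
cocycle_ok = np.allclose(psi("alpha", "beta"), psi("alpha", "gamma") @ psi("gamma", "beta"))
print(identity_ok, cocycle_ok)
```

A nontrivial bundle is described by cocycles that cannot all be written in this product form on the given cover; the check of the two conditions is the same.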


Reducing the Structure Group 


If G is a subgroup of GL(F, k), then V is said to have
a G-structure if we can choose frames so the
transition cocycles belong to G; that is, we can
reduce the structure group to G.

Denote the subgroup of orientation-preserving
linear maps by

$$GL^+(\mathbb{R}, k) := \{\psi \in GL(\mathbb{R}, k) : \det(\psi) > 0\}$$

If $V \in \mathrm{Vect}_k(M, \mathbb{R})$, then V is said to be orientable if
we can choose the frames so that

$$\psi_{\alpha\beta} \in GL^+(\mathbb{R}, k)$$

Not every real vector bundle is orientable; the first
Stiefel-Whitney class $sw_1(V) \in H^1(M; \mathbb{Z}_2)$, which is
defined later, vanishes if and only if V is orientable.
In particular, the Möbius line bundle over the circle
is not orientable.

Similarly, a real (resp. complex) bundle V is
said to be Riemannian (resp. Hermitian) if we can
reduce the structure group to the orthogonal group
$O(k) \subset GL(\mathbb{R}, k)$ (resp. to the unitary group
$U(k) \subset GL(\mathbb{C}, k)$).

We can use a partition of unity to put a
positive-definite symmetric (resp. Hermitian symmetric) fiber
metric on V. Applying the Gram-Schmidt process
then constructs orthonormal frames and shows that
the structure group can always be reduced to O(k)
(resp. to U(k)); if V is a real vector bundle, then the
structure group can be reduced to the special
orthogonal group SO(k) if and only if V is
orientable.


Lifting the Structure Group 


Let $\tau$ be a representation of a Lie group H to
GL(F, k). One says that the structure group of V can
be lifted to H if there exist frames $\{s_\alpha\}$ for V and
smooth maps $\phi_{\alpha\beta} : O_\alpha \cap O_\beta \to H$, so $\tau\phi_{\alpha\beta} = \psi_{\alpha\beta}$,
where eqn [1] holds for $\phi$.
Spin Structures


For $k \ge 3$, the fundamental group of SO(k) is $\mathbb{Z}_2$.
Let Spin(k) be the universal cover of SO(k) and let

$$\tau : \mathrm{Spin}(k) \to SO(k)$$

be the associated double cover; set $\mathrm{Spin}(2) = S^1$ and
let $\tau(\lambda) = \lambda^2$. An oriented bundle V is said to be spin
if the transition functions can be lifted from SO(k)
to Spin(k); this is possible if and only if the second
Stiefel-Whitney class of V, which is defined later,
vanishes. There can be inequivalent spin structures,
which are parametrized by the cohomology group
$H^1(M; \mathbb{Z}_2)$.


The Tangent Bundle of Projective Space 


The tangent bundle $T\mathbb{RP}^m$ of real projective space is
orientable if and only if m is odd; $T\mathbb{RP}^m$ is spin if
and only if $m \equiv 3 \bmod 4$. If $m \equiv 3 \bmod 4$, there are
two inequivalent spin structures on this bundle as
$H^1(\mathbb{RP}^m; \mathbb{Z}_2) = \mathbb{Z}_2$.

The tangent bundle $T\mathbb{CP}^m$ of complex projective
space is always orientable; $T\mathbb{CP}^m$ is spin if and only
if m is odd.
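
The statements about $T\mathbb{RP}^m$ can be checked with a short calculation. The sketch below relies on a standard fact that is not stated in this section: the total Stiefel-Whitney class of $T\mathbb{RP}^m$ is $(1 + x)^{m+1}$, where x generates $H^1(\mathbb{RP}^m; \mathbb{Z}_2)$ and $x^j = 0$ for j > m. The function names are illustrative. The check is run for m ≥ 2, since Spin(k) is introduced above only for k ≥ 2.

```python
from math import comb


def sw_coefficient(m, j):
    """Coefficient mod 2 of x^j in (1 + x)^(m + 1), with x^j = 0 when j > m."""
    return comb(m + 1, j) % 2 if j <= m else 0


def is_orientable(m):
    # Orientable if and only if the first Stiefel-Whitney class vanishes.
    return sw_coefficient(m, 1) == 0


def is_spin(m):
    # Spin if and only if the bundle is orientable and the second class vanishes.
    return is_orientable(m) and sw_coefficient(m, 2) == 0


for m in range(2, 12):
    print(m, is_orientable(m), is_spin(m))
```

The printed pattern repeats with period 4 in m, in agreement with the conditions "m odd" and "m ≡ 3 mod 4" quoted above.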


Principal and Associated Bundles 


Let H be a Lie group and let

$$\phi_{\alpha\beta} : O_\alpha \cap O_\beta \to H$$

be a collection of smooth functions satisfying the
compatibility conditions given in eqn [1]. We define
a principal bundle P by gluing $O_\alpha \times H$ to $O_\beta \times H$
using $\phi$:

$$(P, h)_\alpha \sim (P, \phi_{\alpha\beta}(P) h)_\beta \quad \text{for } P \in O_\alpha \cap O_\beta$$

Because right multiplication and left multiplication
commute, right multiplication gives a natural action
of H on P:

$$(P, h)_\alpha \cdot \tilde{h} := (P, h \cdot \tilde{h})_\alpha$$

The natural projection $P \to P/H = M$ is an H fiber
bundle.

Let $\tau$ be a representation of H to GL(F, k). For
$\xi \in P$, $\lambda \in F^k$, and $h \in H$, define a gluing

$$(\xi, \lambda) \sim (\xi \cdot h^{-1}, \tau(h)\lambda)$$

The associated vector bundle is then given by

$$P \times_\tau F^k := P \times F^k / \sim$$

Clearly, $\{\tau\phi_{\alpha\beta}\}$ are the transition cocycles of the
vector bundle $P \times_\tau F^k$.
Frame Bundles


If V is a vector bundle, the associated principal 
GL(F, k) bundle is the bundle of all frames; if V is 
given an inner product on each fiber, then the 
associated principal O(k) or U(k) bundle is the bundle 
of orthonormal frames. If V is an oriented Riemannian 
vector bundle, the associated principal SO(k) bundle is 
the bundle of oriented orthonormal frames. 


Direct Sum and Tensor Product 


Fiber-wise direct sum (resp. tensor product) defines the
direct sum (resp. tensor product) of vector bundles:

$$\oplus : \mathrm{Vect}_k(M, F) \times \mathrm{Vect}_n(M, F) \to \mathrm{Vect}_{k+n}(M, F)$$
$$\otimes : \mathrm{Vect}_k(M, F) \times \mathrm{Vect}_n(M, F) \to \mathrm{Vect}_{kn}(M, F)$$

The transition cocycles of the direct sum (resp.
tensor product) of two vector bundles are the direct
sum (resp. tensor product) of the transition cocycles
of the respective bundles.

The set of line bundles $\mathrm{Vect}_1(M, F)$ is a group
under $\otimes$. The unit in the group is the trivial line
bundle $1 := M \times F$; the inverse of a line bundle L is
the dual line bundle $L^* := \mathrm{Hom}(L, F)$ since

$$L \otimes L^* = 1$$


Pullback Bundle 


Let $p : V \to M$ be the projection associated with
$V \in \mathrm{Vect}_k(M, F)$. If f is a smooth map from N to M,
then the pullback bundle $f^*V$ is the vector bundle
over N which is defined by setting

$$f^*V := \{(P, v) \in N \times V : f(P) = p(v)\}$$

The fiber of $f^*V$ over P is the fiber of V over f(P).
Let $\{s_\alpha\}$ be local frames for V over an open cover
$\{O_\alpha\}$ of M. For $P \in f^{-1}(O_\alpha)$, define

$$\{f^*s_\alpha\}(P) := (P, s_\alpha(f(P)))$$

This gives a collection of frames for $f^*V$ over the
open cover $\{f^{-1}(O_\alpha)\}$ of N. Let

$$f^*\phi_{\alpha\beta} := \phi_{\alpha\beta} \circ f$$

be the pullback of the transition functions. Then

$$\{f^*s_\alpha\}(P) = (P, \phi_{\alpha\beta}(f(P))\,s_\beta(f(P))) = \{(f^*\phi_{\alpha\beta})(f^*s_\beta)\}(P)$$

This shows that the pullbacks of the transition
functions for V are the transition functions of the
pullback $f^*(V)$.


Homotopy 


Two smooth maps $f_0$ and $f_1$ from N to M are
said to be homotopic if there exists a smooth map
$F : N \times I \to M$ so that $f_0(P) = F(P, 0)$ and so that
$f_1(P) = F(P, 1)$. If $f_0$ and $f_1$ are homotopic maps from
N to M, then $f_1^*V$ is isomorphic to $f_0^*V$.

Let [N, M] be the set of all homotopy classes
of smooth maps from N to M. The association
$V \to f^*V$ induces a natural map

$$[N, M] \times \mathrm{Vect}_k(M, F) \to \mathrm{Vect}_k(N, F)$$

If M is contractible, then the identity map is
homotopic to the constant map c. Consequently,
$V = \mathrm{Id}^*V$ is isomorphic to $c^*V = M \times F^k$. Thus, any
vector bundle over a contractible manifold is trivial.
In particular, if $\{O_\alpha\}$ is a simple cover of M and if
$V \in \mathrm{Vect}(M, F)$, then $V|_{O_\alpha}$ is trivial for each $\alpha$. This
shows that a simple cover is a trivializing cover for
every $V \in \mathrm{Vect}(M, F)$.


Stabilization 


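Let $1 \in \mathrm{Vect}_1(M, F)$ denote the isomorphism class of
the trivial line bundle $M \times F$ over an m-dimensional
manifold M. The map $V \to V \oplus 1$ induces a stabilization map

$$s : \mathrm{Vect}_k(M, F) \to \mathrm{Vect}_{k+1}(M, F)$$

which induces an isomorphism

$$\mathrm{Vect}_k(M, \mathbb{R}) = \mathrm{Vect}_{k+1}(M, \mathbb{R}) \quad \text{for } k > m$$
$$\mathrm{Vect}_k(M, \mathbb{C}) = \mathrm{Vect}_{k+1}(M, \mathbb{C}) \quad \text{for } 2k > m \qquad [2]$$

These values of k comprise the stable range.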


The K-Theory 





The direct sum $\oplus$ and tensor product $\otimes$ make
Vect(M, F) into a semiring; we denote the associated
ring defined by the Grothendieck construction by
KF(M). If $V \in \mathrm{Vect}(M, F)$, let $[V] \in KF(M)$ be the
corresponding element of K-theory; KF(M) is generated
by formal differences $[V_1] - [V_2]$; such formal
differences are called virtual bundles.

The Grothendieck construction (see K-theory)
introduces nontrivial relations. Let $S^m$ denote the
standard sphere in $\mathbb{R}^{m+1}$. Since

$$T(S^m) \oplus 1 = (m + 1)1$$

we can easily see that $[TS^m] = m[1]$ in $K\mathbb{R}(S^m)$,
despite the fact that $T(S^m)$ is not isomorphic to $m1$
for $m \ne 1, 3, 7$.

Let L denote the nontrivial real line bundle over
$\mathbb{RP}^k$. Then $T\mathbb{RP}^k \oplus 1 = (k + 1)L$, so

$$[T\mathbb{RP}^k] = (k + 1)[L] - [1]$$

The map $V \to \mathrm{Rank}(V)$ extends to a surjective
map from KF(M) to $\mathbb{Z}$. We denote the associated
ideal of virtual bundles of virtual rank 0 by

$$\widetilde{KF}(M) := \ker(\mathrm{Rank})$$

In the stable range, $V \to [V] - k[1]$ identifies

$$\mathrm{Vect}_k(M, \mathbb{R}) = \widetilde{K\mathbb{R}}(M) \quad \text{if } k > m$$
$$\mathrm{Vect}_k(M, \mathbb{C}) = \widetilde{K\mathbb{C}}(M) \quad \text{if } 2k > m \qquad [3]$$

These groups contain nontrivial torsion. Let L be the
nontrivial real line bundle over $\mathbb{RP}^k$. Then

$$\widetilde{K\mathbb{R}}(\mathbb{RP}^k) = \mathbb{Z} \cdot \{[L] - [1]\} / 2^{\nu(k)}\mathbb{Z}\{[L] - [1]\}$$

where $\nu(k)$ is the Adams number.


Classifying Spaces 


Let $\mathrm{Gr}_k(F, n)$ be the Grassmannian of k-dimensional
subspaces of $F^n$. By mapping a k-plane $\pi$ in $F^n$ to the
corresponding orthogonal projection on $\pi$, we can
identify $\mathrm{Gr}_k(F, n)$ with the set of orthogonal projections
of rank k:

$$\{\xi \in \mathrm{Hom}(F^n) : \xi^2 = \xi,\ \xi^* = \xi,\ \mathrm{tr}(\xi) = k\}$$

There is a natural associated tautological k-plane bundle

$$V_k(F, n) \in \mathrm{Vect}_k(\mathrm{Gr}_k(F, n), F)$$

whose fiber over a k-plane $\pi$ is the k-plane itself:

$$V_k(F, n) := \{(\xi, x) \in \mathrm{Hom}(F^n) \times F^n : \xi x = x\}$$

Let $[M, \mathrm{Gr}_k(F, n)]$ denote the set of homotopy
equivalence classes of smooth maps f from M to
$\mathrm{Gr}_k(F, n)$. Since $[f_1] = [f_2]$ implies that $f_1^*V$ is
isomorphic to $f_2^*V$, the association

$$f \to f^*V_k(F, n) \in \mathrm{Vect}_k(M, F)$$

induces a map

$$[M, \mathrm{Gr}_k(F, n)] \to \mathrm{Vect}_k(M, F)$$

This map defines a natural equivalence of functors
in the stable range:

$$[M, \mathrm{Gr}_k(\mathbb{R}, \nu + k)] = \mathrm{Vect}_k(M, \mathbb{R}) \quad \text{for } \nu > m$$
$$[M, \mathrm{Gr}_k(\mathbb{C}, \nu + k)] = \mathrm{Vect}_k(M, \mathbb{C}) \quad \text{for } 2\nu > m \qquad [4]$$

The natural inclusion of $F^n$ in $F^{n+1}$ induces natural
inclusions

$$\mathrm{Gr}_k(F, n) \subset \mathrm{Gr}_k(F, n + 1)$$
$$V_k(F, n) \subset V_k(F, n + 1) \qquad [5]$$

Let $\mathrm{Gr}_k(F, \infty)$ and $V_k(F, \infty)$ be the direct limit
spaces under these inclusions; these are the infinite-dimensional
Grassmannians and classifying bundles,
respectively. The topology on these spaces is the
weak or inductive topology. The Grassmannians are
called classifying spaces. The isomorphisms of
eqn [4] are compatible with the inclusions of eqn [5]
and we have

$$[M, \mathrm{Gr}_k(F, \infty)] = \mathrm{Vect}_k(M, F) \qquad [6]$$


Spaces with Finite Covering Dimension 


A metric space X is said to have a covering
dimension at most m if, given any open cover $\{U_\alpha\}$
of X, there exists a refinement $\{O_\beta\}$ of the cover so
that any intersection of more than m + 1 of the $\{O_\beta\}$
is empty. For example, any manifold of dimension
m has covering dimension at most m. More 
generally, any m-dimensional cell complex has 
covering dimension at most m. 

The isomorphisms of [2]-[4], and [6] continue to 
hold under the weaker assumption that M is a metric 
space with covering dimension at most m. 


Characteristic Classes of Vector 
Bundles 


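The Cohomology of $\mathrm{Gr}_k(F, \infty)$


The cohomology algebras of the Grassmannians are
polynomial algebras on suitably chosen generators:

$$H^*(\mathrm{Gr}_k(\mathbb{R}, \infty); \mathbb{Z}_2) = \mathbb{Z}_2[sw_1, \ldots, sw_k]$$
$$H^*(\mathrm{Gr}_k(\mathbb{C}, \infty); \mathbb{Z}) = \mathbb{Z}[c_1, \ldots, c_k] \qquad [7]$$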


The Stiefel-Whitney Classes 


Let $V \in \mathrm{Vect}_k(M, \mathbb{R})$. We use eqn [6] to find
$\Psi : M \to \mathrm{Gr}_k(\mathbb{R}, \infty)$ which classifies V; the map $\Psi$
is uniquely determined up to homotopy and, using
eqn [7], one sets

$$sw_i(V) := \Psi^*sw_i \in H^i(M; \mathbb{Z}_2)$$

The total Stiefel-Whitney class is then defined by

$$sw(V) = 1 + sw_1(V) + \cdots + sw_k(V)$$

The Stiefel-Whitney class has the properties:

1. If $f : X_1 \to X_2$, then $f^*(sw(V)) = sw(f^*V)$.

2. $sw(V \oplus W) = sw(V)\,sw(W)$.

3. If L is the Möbius bundle over $S^1$, then $sw_1(L)$
generates $H^1(S^1; \mathbb{Z}_2) = \mathbb{Z}_2$.

The cohomology algebra of real projective space
is a truncated polynomial algebra:

$$H^*(\mathbb{RP}^k; \mathbb{Z}_2) = \mathbb{Z}_2[x]/(x^{k+1} = 0)$$

Since $T\mathbb{RP}^k \oplus 1 = (k + 1)L$, one has

$$sw(T\mathbb{RP}^k) = (1 + x)^{k+1} = 1 + (k + 1)x + \frac{(k + 1)k}{2}x^2 + \cdots \qquad [8]$$

Orientability and Spin Structures 


The Stiefel-Whitney classes have real geometric
meaning. For example, $sw_1(V) = 0$ if and only if V
is orientable; if $sw_1(V) = 0$, then $sw_2(V) = 0$ if and
only if V admits a spin structure. With reference to
the discussion on the tangent bundle of projective
space, eqn [8] yields

$$sw_1(T\mathbb{RP}^k) = \begin{cases} x & \text{if } k \equiv 0 \bmod 2 \\ 0 & \text{if } k \equiv 1 \bmod 2 \end{cases}$$

Thus, $\mathbb{RP}^k$ is orientable if and only if k is odd.
Furthermore,

$$sw_2(T\mathbb{RP}^k) = \begin{cases} 0 & \text{if } k \equiv 3 \bmod 4 \\ x^2 & \text{if } k \equiv 1 \bmod 4 \end{cases}$$

Thus, $T\mathbb{RP}^k$ is spin if and only if $k \equiv 3 \bmod 4$.
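The case analysis above can be checked mechanically. The following is a minimal sketch, assuming only eqn [8]; the function names are illustrative and not part of the article. It reads the coefficients of $(1 + x)^{k+1}$ modulo 2 and truncates at $x^{k+1} = 0$.

```python
from math import comb

def sw_classes_rp(k):
    """Coefficients of sw(TRP^k) = (1 + x)^(k+1) in Z_2[x]/(x^(k+1)).

    Returns a list c with c[i] in {0, 1} such that sw_i(TRP^k) = c[i] * x^i
    for i = 0..k; higher powers of x vanish in the cohomology of RP^k.
    """
    return [comb(k + 1, i) % 2 for i in range(k + 1)]

def is_orientable(k):
    # sw_1 = 0 is the orientability criterion quoted in the text.
    return sw_classes_rp(k)[1] == 0

def is_spin(k):
    # sw_1 = 0 and sw_2 = 0 is the spin criterion quoted in the text.
    c = sw_classes_rp(k)
    w2 = c[2] if k >= 2 else 0  # x^2 = 0 already in H^*(RP^1; Z_2)
    return c[1] == 0 and w2 == 0

for k in range(1, 9):
    print(k, sw_classes_rp(k), is_orientable(k), is_spin(k))
```

For $k \ge 2$ the output reproduces the two case distinctions above. The value $k = 1$ is degenerate, since $x^2$ already vanishes on the circle $\mathbb{RP}^1$.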


Chern Classes 


Let $V \in \mathrm{Vect}_k(M, \mathbb{C})$. We use eqn [6] to find
$\Psi : M \to \mathrm{Gr}_k(\mathbb{C}, \infty)$ which classifies V; the map $\Psi$
is uniquely determined up to homotopy and, using
eqn [7], one sets

$$c_i(V) := \Psi^*c_i \in H^{2i}(M; \mathbb{Z})$$

The total Chern class is then defined by

$$c(V) := 1 + c_1(V) + \cdots + c_k(V)$$

The Chern class has the properties:

1. If $f : X_1 \to X_2$, then $f^*(c(V)) = c(f^*V)$.

2. $c(V \oplus W) = c(V)\,c(W)$.

3. Let L be the classifying line bundle over
$S^2 = \mathbb{CP}^1$. Then $\int_{S^2} c_1(L) = -1$.

The cohomology algebra of complex projective
space also is a truncated polynomial algebra

$$H^*(\mathbb{CP}^k; \mathbb{Z}) = \mathbb{Z}[x]/x^{k+1}$$

where $x = c_1(L)$ and L is the complex classifying line
bundle over $\mathbb{CP}^k = \mathrm{Gr}_1(\mathbb{C}, k + 1)$. If $T_c\mathbb{CP}^k$ is the
complex tangent bundle, then

$$c(T_c\mathbb{CP}^k) = (1 + x)^{k+1}$$
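As a small illustration, the individual Chern classes can be read off from this formula as binomial coefficients. The sketch below assumes only the displayed expression and the relation $x^{k+1} = 0$; the function name is hypothetical.

```python
from math import comb

def chern_classes_cp(k):
    """Integer coefficients a_i with c_i(T_c CP^k) = a_i * x^i for i = 0..k,
    obtained by expanding (1 + x)^(k+1) and discarding x^(k+1)."""
    return [comb(k + 1, i) for i in range(k + 1)]

for k in range(1, 5):
    print(k, chern_classes_cp(k))
```

The top coefficient is $k + 1$, which agrees in absolute value with the Euler characteristic of $\mathbb{CP}^k$; the sign of the corresponding characteristic number depends on the orientation convention chosen for x.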


The Pontrjagin Classes 


Let V be a real vector bundle over a topological
space X of rank r = 2k or r = 2k + 1. The Pontrjagin
classes $p_i(V) \in H^{4i}(X; \mathbb{Z})$ are characterized by the
properties:

1. $p(V) = 1 + p_1(V) + \cdots + p_k(V)$.

2. If $f : X_1 \to X_2$, then $f^*(p(V)) = p(f^*V)$.

3. $p(V \oplus W) = p(V)\,p(W)$ mod elements of order 2.

4. $\int_{\mathbb{CP}^2} p_1(T\mathbb{CP}^2) = 3$.

We can complexify a real vector bundle V to
construct an associated complex vector bundle $V_{\mathbb{C}}$.
We have

$$p_i(V) := (-1)^i c_{2i}(V_{\mathbb{C}})$$

Conversely, if V is a complex vector bundle, we can
construct an underlying real vector bundle $V_{\mathbb{R}}$ by
forgetting the underlying complex structure. Modulo
elements of order 2, we have

$$p(V_{\mathbb{R}}) = c(V)\,c(V^*)$$


Let TCP* be the real tangent bundle of complex 
projective space. Then 


p(TCP*) = (1 —x?)**# 


Line Bundles 


Tensor product makes $\mathrm{Vect}_1(M, F)$ into an abelian
group. One has natural equivalences of functors
which are group homomorphisms:

$$sw_1 : \mathrm{Vect}_1(M, \mathbb{R}) \to H^1(M; \mathbb{Z}_2)$$
$$c_1 : \mathrm{Vect}_1(M, \mathbb{C}) \to H^2(M; \mathbb{Z})$$

A real line bundle L is trivial if and only if it is
orientable or, equivalently, if $sw_1(L)$ vanishes. A
complex line bundle L is trivial if and only if
$c_1(L) = 0$. There are nontrivial vector bundles with
vanishing Stiefel-Whitney classes of rank k > 1. For
example, $sw_i(TS^k) = 0$ for i > 0 despite the fact that
$TS^k$ is trivial if and only if k = 1, 3, 7.


Curvature and Characteristic Classes 
de Rham Cohomology 


We can replace the coefficient group $\mathbb{Z}$ by $\mathbb{C}$ at the cost
of losing information concerning torsion. Thus, we
may regard $p_i(V) \in H^{4i}(M; \mathbb{C})$ if V is real or
$c_i(V) \in H^{2i}(M; \mathbb{C})$ if V is complex. Let M be a smooth
manifold. Let $C^\infty\Lambda^pM$ be the space of smooth
p-forms and let

$$d : C^\infty\Lambda^pM \to C^\infty\Lambda^{p+1}M$$

be the exterior derivative. The de Rham cohomology
groups are then defined by

$$H^p_{deR}(M) := \frac{\ker(d : C^\infty\Lambda^pM \to C^\infty\Lambda^{p+1}M)}{\mathrm{im}(d : C^\infty\Lambda^{p-1}M \to C^\infty\Lambda^pM)}$$

The de Rham theorem identifies the topological
cohomology groups $H^p(M; \mathbb{C})$ with the de Rham
cohomology groups $H^p_{deR}(M)$, which are given
differential geometrically.

Given a connection on V, the Chern-Weil theory
enables us to compute Pontrjagin and Chern classes in
de Rham cohomology in terms of curvature.


Connections 
Let V be a vector bundle over M. A connection

$$\nabla : C^\infty(V) \to C^\infty(T^*M \otimes V)$$

on V is a first-order partial differential operator
which satisfies the Leibniz rule, that is, if s is a
smooth section to V and if f is a smooth function
on M,

$$\nabla(fs) = df \otimes s + f\nabla s$$

If X is a tangent vector field, we define

$$\nabla_X s = \langle X, \nabla s \rangle$$

where $\langle \cdot, \cdot \rangle$ denotes the natural pairing between the
tangent and cotangent spaces. This generalizes to the
bundle setting the notion of a directional derivative
and has the properties:

1. $\nabla_{fX}s = f\nabla_X s$.

2. $\nabla_X(fs) = X(f)s + f\nabla_X s$.

3. $\nabla_{X_1 + X_2}s = \nabla_{X_1}s + \nabla_{X_2}s$.

4. $\nabla_X(s_1 + s_2) = \nabla_X s_1 + \nabla_X s_2$.


The Curvature 2-Form 
Let $\omega_p$ be a smooth p-form. Then

$$\nabla : C^\infty(\Lambda^pM \otimes V) \to C^\infty(\Lambda^{p+1}M \otimes V)$$

can be extended by defining

$$\nabla(\omega_p \otimes s) = d\omega_p \otimes s + (-1)^p\omega_p \wedge \nabla s$$

In contrast to ordinary exterior differentiation, $\nabla^2$
need not vanish. We set

$$\Omega(s) := \nabla^2 s$$

This is not a second-order partial differential
operator; it is a zeroth-order operator, that is,

$$\Omega(fs) = ddf \otimes s - df \wedge \nabla s + df \wedge \nabla s + f\nabla^2 s = f\Omega(s)$$

The curvature operator $\Omega$ can also be computed
locally. Let $(s_i)$ be a local frame. Expand

$$\nabla s_i = \sum_j \omega_i^j \otimes s_j$$

to define the connection 1-form $\omega$. One then has

$$\nabla^2 s_i = \sum_j \Big(d\omega_i^j - \sum_k \omega_i^k \wedge \omega_k^j\Big) \otimes s_j$$

and so

$$\Omega_i^j = d\omega_i^j - \sum_k \omega_i^k \wedge \omega_k^j$$

If $\tilde s_i = g_i^j s_j$ is another local frame, we compute

$$\tilde\omega = dg\,g^{-1} + g\omega g^{-1} \quad \text{and} \quad \tilde\Omega = g\Omega g^{-1}$$

Although the connection 1-form $\omega$ is not tensorial, the
curvature is an invariantly defined 2-form-valued
endomorphism of V.
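The rank-one case makes the contrast between $\omega$ and $\Omega$ concrete. The following worked computation is a sketch for a line bundle with local frame $s$ and a nowhere-vanishing function $g$ (so that $\tilde s = g s$), using only the transformation rule stated above; since scalar-valued 1-forms anticommute, the quadratic term in the curvature drops out.

```latex
\begin{align*}
\tilde\omega &= \omega + g^{-1}\,dg, \\
\Omega &= d\omega - \omega\wedge\omega = d\omega, \\
\tilde\Omega &= d\tilde\omega
  = d\omega + d(g^{-1})\wedge dg
  = d\omega - g^{-2}\,dg\wedge dg
  = d\omega = \Omega .
\end{align*}
```

The connection form picks up the inhomogeneous term $g^{-1}dg$, whereas the curvature is unchanged, in agreement with $\tilde\Omega = g\Omega g^{-1}$ for commuting scalars.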


Unitary Connections 


Let $(\cdot, \cdot)$ be a nondegenerate Hermitian inner product
on V. We say that $\nabla$ is a unitary connection if

$$(\nabla s_1, s_2) + (s_1, \nabla s_2) = d(s_1, s_2)$$

Such connections always exist and, relative to a
local orthonormal frame, the curvature is skew-symmetric,
that is,

$$\Omega + \Omega^* = 0$$


Thus, Q can be regarded as a 2-form-valued element 
of the Lie algebra of the structure group, O(V) in the 
real setting or U(V) in the complex setting. 


Projections 


We can always embed V in a trivial bundle $1^\nu$ of
dimension $\nu$; let $\pi_V$ be the orthogonal projection on
V. We project the flat connection to V to define a
natural connection on V. For example, if M is
embedded isometrically in the Euclidean space $\mathbb{R}^\nu$,
this construction gives the Levi-Civita connection on
the tangent bundle TM. The curvature of this
connection is then given by

$$\Omega = \pi_V\,d\pi_V\,d\pi_V$$

Let $V_P$ be the fiber of V over a point $P \in M$. The
inclusion $i : V \subset \mathbb{R}^\nu$ defines the classifying map
$f : P \to \mathrm{Gr}_k(\mathbb{R}, \nu)$ where we set

$$f(P) = i(V_P)$$
Chern-Weil Theory


Let $\nabla$ be a Riemannian connection on a real vector
bundle V of rank k. We set

$$p(\Omega) := \det\Big(I + \frac{1}{2\pi}\Omega\Big)$$

Let $\Omega^T$ denote the transpose matrix of differential
forms. Since $\Omega + \Omega^T = 0$, the polynomials of odd
degree in $\Omega$ vanish and we may expand

$$p(\Omega) = 1 + p_1(\Omega) + \cdots + p_r(\Omega)$$

where k = 2r or k = 2r + 1 and the differential forms
$p_i(\Omega) \in C^\infty\Lambda^{4i}(M)$ are forms of degree 4i.

Changing the gauge (i.e., the local frame) replaces
$\Omega$ by $g\Omega g^{-1}$ and hence $p(\Omega)$ is independent of the
local frame chosen. One can show that $dp_i(\Omega) = 0$;
let $[p_i(\Omega)]$ denote the corresponding element of de
Rham cohomology. This is independent of the
particular connection chosen and $[p_i(\Omega)]$ represents
$p_i(V)$ in $H^{4i}(M; \mathbb{C})$.

Similarly, let V be a complex vector bundle of
rank k with a Hermitian connection $\nabla$. Set

$$c(\Omega) := \det\Big(I + \frac{\sqrt{-1}}{2\pi}\Omega\Big) = 1 + c_1(\Omega) + \cdots + c_k(\Omega)$$

Again $c_i(\Omega)$ is independent of the local gauge and
$dc_i(\Omega) = 0$. The de Rham cohomology class $[c_i(\Omega)]$
represents $c_i(V)$ in $H^{2i}(M; \mathbb{C})$.


The Chern Character 
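The total Chern character is defined by the formal
sum

$$\mathrm{ch}(\Omega) := \mathrm{tr}\big(e^{\sqrt{-1}\,\Omega/2\pi}\big) = \sum_\nu \frac{(\sqrt{-1})^\nu}{(2\pi)^\nu\,\nu!}\,\mathrm{tr}(\Omega^\nu) = \mathrm{ch}_0(\Omega) + \mathrm{ch}_1(\Omega) + \cdots$$

Let $\mathrm{ch}(V) = [\mathrm{ch}(\Omega)]$ denote the associated de Rham
cohomology class; it is independent of the particular
connection chosen. We then have the relations

$$\mathrm{ch}(V \oplus W) = \mathrm{ch}(V) + \mathrm{ch}(W)$$
$$\mathrm{ch}(V \otimes W) = \mathrm{ch}(V)\,\mathrm{ch}(W)$$

The Chern character extends to a ring isomorphism
from $KU(M) \otimes \mathbb{Q}$ to $H^*(M; \mathbb{Q})$, which is a
natural equivalence of functors; modulo torsion,
K-theory and cohomology are the same functors.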


Other Characteristic Classes 


The Chern character is defined by the exponential
function. There are other characteristic classes
which appear in the index theorem that are defined
using other generating functions that appear in
index theory. Let $x := (x_1, \ldots)$ be a collection of
indeterminates. Let $s_\nu(x)$ be the $\nu$th elementary
symmetric function;

$$\prod_\nu (1 + x_\nu) = 1 + s_1(x) + s_2(x) + \cdots$$

For a diagonal matrix $A := \mathrm{diag}(\lambda_1, \ldots)$, denote the
normalized eigenvalues by $x_j := \sqrt{-1}\,\lambda_j/2\pi$. Then

$$c(A) = \det\Big(1 + \frac{\sqrt{-1}}{2\pi}A\Big) = 1 + s_1(x) + \cdots$$

Thus, the Chern class corresponds in a certain sense
to the elementary symmetric functions.

Let f(x) be a symmetric polynomial or more
generally a formal power series which is symmetric.
We can express $f(x) = F(s_1(x), \ldots)$ in terms of the
elementary symmetric functions and define
$f(\Omega) = F(c_1(\Omega), \ldots)$ by substitution. For example,
the Chern character is defined by the generating
function

$$f(x) := \sum_\nu e^{x_\nu}$$

The Todd class is defined using a different
generating function:

$$\mathrm{td}(x) := \prod_\nu x_\nu(1 - e^{-x_\nu})^{-1} = 1 + \mathrm{td}_1(x) + \cdots$$

If V is a real vector bundle, we can define
some additional characteristic classes similarly. Let
$\{\pm\sqrt{-1}\,\lambda_1, \ldots\}$ be the nonzero eigenvalues of a
skew-symmetric matrix A. We set $x_j = -\lambda_j/2\pi$
and define the Hirzebruch polynomial L and the $\hat A$
genus by

$$L(x) := \prod_\nu \frac{x_\nu}{\tanh(x_\nu)} = 1 + L_1(x) + L_2(x) + \cdots$$

$$\hat A(x) := \prod_\nu \frac{x_\nu}{2\sinh(x_\nu/2)} = 1 + \hat A_1(x) + \hat A_2(x) + \cdots$$

The generating functions

$$\frac{x}{\tanh(x)} \quad \text{and} \quad \frac{x}{2\sinh(x/2)}$$

are even functions of x, so the ambiguity in the
choice of sign in the eigenvalues plays no role. This
defines characteristic classes

$$L_i(V) \in H^{4i}(M; \mathbb{C}) \quad \text{and} \quad \hat A_i(V) \in H^{4i}(M; \mathbb{C})$$
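The leading terms of these generating functions can be computed with exact rational arithmetic. The sketch below is illustrative only: it works in a single variable x, truncates at degree `N` (an arbitrary choice), and uses helper names that are not from the article. It expands $x/(1 - e^{-x})$, $x/\tanh x$ and $x/(2\sinh(x/2))$ by dividing truncated power series.

```python
from fractions import Fraction
from math import factorial

N = 6  # truncation order: all series are computed modulo x^(N+1)

def mul(a, b):
    """Product of two truncated power series given as coefficient lists."""
    out = [Fraction(0)] * (N + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= N:
                out[i + j] += ai * bj
    return out

def inv(a):
    """Multiplicative inverse of a truncated power series with a[0] != 0."""
    out = [Fraction(0)] * (N + 1)
    out[0] = 1 / a[0]
    for n in range(1, N + 1):
        out[n] = -sum(a[i] * out[n - i] for i in range(1, n + 1)) / a[0]
    return out

even = lambda n: n % 2 == 0
cosh = [Fraction(1, factorial(n)) if even(n) else Fraction(0) for n in range(N + 1)]
# sinh(x)/x = sum over even n of x^n / (n+1)!
sinh_over_x = [Fraction(1, factorial(n + 1)) if even(n) else Fraction(0) for n in range(N + 1)]
# sinh(x/2)/(x/2) = sum over even n of x^n / (2^n (n+1)!)
sinh_half = [Fraction(1, 2 ** n * factorial(n + 1)) if even(n) else Fraction(0) for n in range(N + 1)]
# (1 - e^{-x})/x = sum over n of (-1)^n x^n / (n+1)!
one_minus_exp = [Fraction((-1) ** n, factorial(n + 1)) for n in range(N + 1)]

todd = inv(one_minus_exp)          # x / (1 - e^{-x})
L_series = mul(cosh, inv(sinh_over_x))  # x / tanh(x)
A_hat = inv(sinh_half)             # x / (2 sinh(x/2))

print("td   :", [str(c) for c in todd])
print("L    :", [str(c) for c in L_series])
print("A-hat:", [str(c) for c in A_hat])
```

The odd coefficients of the L and $\hat A$ series vanish, which is the evenness noted in the text; the Todd series is not even because of its term $x/2$.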


Summary of Formulas 


We summarize below some of the formulas in terms 
of characteristic classes: 


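1. $c_1(\Omega) = \dfrac{\sqrt{-1}}{2\pi}\,\mathrm{tr}(\Omega)$,

2. $c_2(\Omega) = \dfrac{1}{8\pi^2}\{\mathrm{tr}(\Omega^2) - \mathrm{tr}(\Omega)^2\}$,

3. $p_1(\Omega) = -\dfrac{1}{8\pi^2}\,\mathrm{tr}(\Omega^2)$,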

4. $\mathrm{ch}(V) = k + c_1 + \dfrac{c_1^2 - 2c_2}{2} + \cdots$,

10. $L(V \oplus W) = L(V)\,L(W)$


The Euler Form 


So far, this article has dealt with the structure groups
O(k) in the real setting and U(k) in the complex
setting. There is one final characteristic class which
arises from the structure group SO(k). Suppose k = 2n
is even. While a real antisymmetric matrix A of shape
2n x 2n cannot be diagonalized, it can be put in block
diagonal form with 2 x 2 blocks,

$$\begin{pmatrix} 0 & \lambda_\nu \\ -\lambda_\nu & 0 \end{pmatrix}$$

The top Pontrjagin class $p_n(A) = x_1^2 \cdots x_n^2$ is a perfect
square. The Euler class

$$e_{2n}(A) := x_1 \cdots x_n$$

is the square root of $p_n$. If V is an oriented vector
bundle of dimension 2n, then

$$e_{2n}(V) \in H^{2n}(M; \mathbb{C})$$

is a well-defined characteristic class satisfying
$e_{2n}(V)^2 = p_n(V)$.

If V is the underlying real oriented vector bundle
of a complex vector bundle W,

$$e_{2n}(V) = c_n(W)$$
If M is an even-dimensional manifold, let $e_m(M) := e_m(TM)$.
If we reverse the local orientation of M,
then $e_m(M)$ changes sign. Consequently, $e_m(M)$ is a
measure rather than an m-form; we can use the
Riemannian measure on M to regard $e_m(M)$ as a
scalar. Let $R_{ijkl}$ be the components of the curvature of
the Levi-Civita connection with respect to some local
orthonormal frame field; we adopt the convention
that $R_{1221} = 1$ on the standard sphere $S^2$ in $\mathbb{R}^3$. If
$\varepsilon^I := \varepsilon^{i_1 \ldots i_{2n}}$ is the totally antisymmetric tensor, then

$$e_{2n} := \sum_{I,J} \frac{\varepsilon^I\varepsilon^J\,R_{i_1i_2j_2j_1} \cdots R_{i_{2n-1}i_{2n}j_{2n}j_{2n-1}}}{(8\pi)^n\,n!}$$

Let $R := R_{ijji}$ and $\rho_{ij} := R_{ikkj}$ be the scalar curvature
and the Ricci tensor, respectively. Then

$$e_2 = \frac{1}{4\pi}R$$
$$e_4 = \frac{1}{32\pi^2}\big(R^2 - 4|\rho|^2 + |R|^2\big)$$
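As a sanity check of the normalization, the general formula can be evaluated by brute force on the unit sphere $S^{2n}$, where integrating the Euler form should give the Euler characteristic 2. The sketch below assumes the constant-curvature tensor $R_{ijkl} = \delta_{il}\delta_{jk} - \delta_{ik}\delta_{jl}$, which satisfies the convention $R_{1221} = 1$; the function names are illustrative.

```python
from itertools import permutations
from math import pi, gamma, factorial

def sign(p):
    """Sign of a permutation given as a tuple of 0..m-1 (count transpositions)."""
    s, p = 1, list(p)
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]
            s = -s
    return s

def R(i, j, k, l):
    """Curvature of the unit sphere, normalized so that R_{1221} = +1."""
    return (i == l) * (j == k) - (i == k) * (j == l)

def euler_density_sphere(n):
    """Evaluate the double sum over index permutations I, J for S^(2n)."""
    m = 2 * n
    perms = [(p, sign(p)) for p in permutations(range(m))]
    total = 0
    for I, sI in perms:
        for J, sJ in perms:
            term = sI * sJ
            for a in range(0, m, 2):
                term *= R(I[a], I[a + 1], J[a + 1], J[a])
                if term == 0:
                    break
            total += term
    return total / ((8 * pi) ** n * factorial(n))

def sphere_volume(m):
    """Volume of the unit sphere S^m."""
    return 2 * pi ** ((m + 1) / 2) / gamma((m + 1) / 2)

for n in (1, 2):
    print(2 * n, euler_density_sphere(n) * sphere_volume(2 * n))
```

For n = 1 and n = 2 the product of the constant density with the volume of the sphere is 2 up to rounding, which is consistent with the closed forms for $e_2$ and $e_4$ above (on $S^4$, $R = 12$, $|\rho|^2 = 36$ and $|R|^2 = 24$).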


Characteristic Classes of Principal 
Bundles 


Let $\mathfrak{g}$ be the Lie algebra of a compact Lie group G.
Let $\pi : P \to M$ be a principal G bundle over M. For
$\xi \in P$, let

$$\mathcal{V}_\xi := \ker \pi_* : T_\xi P \to T_{\pi\xi}M \quad \text{and} \quad \mathcal{H}_\xi := \mathcal{V}_\xi^\perp$$

be the vertical and horizontal distributions of the
projection $\pi$, respectively. We assume that the metric
on P is chosen to be G-invariant and such that
$\pi_* : \mathcal{H}_\xi \to T_{\pi\xi}M$ is an isometry; thus, $\pi$ is a Riemannian
submersion. If F is a tangent vector field on M,
let $\mathcal{H}F$ be the corresponding horizontal lift. Let $\rho_{\mathcal{V}}$ be
orthogonal projection on the distribution $\mathcal{V}$. The
curvature is defined by

$$\Omega(F_1, F_2) = \rho_{\mathcal{V}}[\mathcal{H}(F_1), \mathcal{H}(F_2)]$$

The horizontal distribution $\mathcal{H}$ is integrable if and only if
the curvature vanishes. Since the metric is G-invariant,
$\Omega(F_1, F_2)$ is invariant under the group action. We may
use a local section s to P over a contractible coordinate
chart O to split $\pi^{-1}O = O \times G$. This permits us to
identify $\mathcal{V}$ with TG and to regard $\Omega$ as a $\mathfrak{g}$-valued
2-form. If we replace the section s by a section $\tilde s$, then
$\tilde\Omega = g\Omega g^{-1}$ changes by the adjoint action of G on $\mathfrak{g}$.

If V is a real or complex vector bundle over M,
we can put a fiber metric on V to reduce the
structure group to the orthogonal group O(r) in the
real setting or the unitary group U(r) in the complex
setting. Let $P_V$ be the associated frame bundle. A
Riemannian connection $\nabla$ on V induces an invariant
splitting of $TP_V = \mathcal{V} \oplus \mathcal{H}$ and defines a natural
metric on $P_V$; the curvature $\Omega$ of the connection $\nabla$
defined here agrees with the definition given previously.

Let $\mathcal{Q}(G)$ be the algebra of all polynomials on
$\mathfrak{g}$ which are invariant under the adjoint action. If
$Q \in \mathcal{Q}(G)$, then $Q(\Omega)$ is well defined. One has
$dQ(\Omega) = 0$. Furthermore, the de Rham cohomology
class $Q(P) := [Q(\Omega)]$ is independent of the particular
connection chosen. We have

$$\mathcal{Q}(U(k)) = \mathbb{C}[c_1, \ldots, c_k]$$
$$\mathcal{Q}(SU(k)) = \mathbb{C}[c_2, \ldots, c_k]$$
$$\mathcal{Q}(O(2k)) = \mathbb{C}[p_1, \ldots, p_k]$$
$$\mathcal{Q}(O(2k + 1)) = \mathbb{C}[p_1, \ldots, p_k]$$
$$\mathcal{Q}(SO(2k)) = \mathbb{C}[p_1, \ldots, p_k, e_{2k}]/(e_{2k}^2 = p_k)$$
$$\mathcal{Q}(SO(2k + 1)) = \mathbb{C}[p_1, \ldots, p_k]$$

Thus, for this category of groups, no new characteristic
classes ensue. Since the invariants are Lie-algebra
theoretic in nature,

$$\mathcal{Q}(\mathrm{Spin}(k)) = \mathcal{Q}(SO(k))$$

Other groups, of course, give rise to different
characteristic rings of invariants.
Introduction


The relationship between topological invariants and
functional integrals from quantum Chern-Simons
theory discovered by Witten (1989) raised several
challenges for mathematicians. Most of the tremendous
amount of mathematical activity generated by
Witten's discovery has been concerned primarily with
issues that arise after one has accepted the functional
integral as a formal object. This has left, as an
important challenge, the task of giving rigorous
meaning to the functional integrals themselves and of
rigorously deriving their relation to topological invariants.
The present article will discuss efforts to put the
functional integral itself on a rigorous basis.


Chern-Simons Functional Integrals 


We shall describe here the typical Chern-Simons
functional integral. For the purposes of this article,
we will confine ourselves to a simpler setting rather
than the most general possible one. In fact, we shall
work with fields over three-dimensional Euclidean
space $\mathbb{R}^3$ (instead of a general 3-manifold).

The typical Chern-Simons functional integral is of
the form

$$\int_{\mathcal{A}} e^{(ik/4\pi)S_{CS}(A)}\,W_{C_1,R_1}(A) \cdots W_{C_n,R_n}(A)\,DA \qquad [1]$$

Our objective in this section will be to specify what
the terms in this formal integral mean. Very briefly,
the integration is with respect to a formal "Lebesgue
measure" on $\mathcal{A}$, an infinite-dimensional space of
geometric objects A called connections over $\mathbb{R}^3$ with
values in the Lie algebra LG of a group G. In the
first term in the integrand, in the exponent, k is a
real number, and $S_{CS}(A)$ is the Chern-Simons action
for the connection A. Each term $W_{C_j,R_j}(A)$ is a
Wilson loop observable, the trace in some representation
$R_j$ of the holonomy of the connection A
around the loop $C_j$. The entire integral, formal
though it may be, provides an invariant associated
with the system of loops $C_1, \ldots, C_n$.

Let G be a compact Lie group; for ease of
exposition, let us take G to be a closed, connected
subgroup of U(n). Thus, each element of G is an
n x n complex matrix g with $g^*g = I$, the identity.
The Lie algebra LG consists of all n x n matrices A
which are skew-Hermitian, that is, satisfy $A^* = -A$,
and for which $e^{tA} \in G$ for all real numbers t. On LG
there is a convenient inner product given by

$$\langle A, B \rangle = \mathrm{tr}(AB^*)$$

This inner product is invariant under the conjugation
action of the group G on its Lie algebra LG.

By a connection over $\mathbb{R}^3$ we shall mean a $C^\infty$
1-form with values in LG. The set of all connections
is an affine (in our case, actually a linear) space $\mathcal{A}$. If
$A \in \mathcal{A}$, then define

$$S_{CS}(A) = \int_{\mathbb{R}^3} \mathrm{tr}\Big(A \wedge dA + \frac{2}{3}A \wedge A \wedge A\Big) \qquad [2]$$

This is, up to constant multiple, the Chern-Simons
action functional.

Let A be a connection and consider a piecewise
smooth path

$$C : [0, 1] \to \mathbb{R}^3$$

With this one can associate a G-valued path
$[0, 1] \to G : t \mapsto g(t) \in G$ satisfying the differential equation

$$g'(t)\,g(t)^{-1} = -A(C'(t))$$

subject to the initial condition g(0) = I, the identity.
The path $t \mapsto g(t)$ describes parallel transport along C
by the connection A. If C is a loop then the final value
g(1) is the holonomy of A around C. If R is a representation
of G on some finite-dimensional vector space
then the trace of R(g(1)) is the Wilson loop observable:

$$W_{C,R}(A) = \mathrm{tr}(R(g(1))) \qquad [3]$$
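For a concrete feel for eqn [3], the parallel transport equation can be integrated numerically in the simplest abelian case G = U(1). The sketch below is purely illustrative: the connection $A = i\,a_\mu dx^\mu$ with $a = \tfrac{B}{2}(-x^1, x^0, 0)$, the constant `B`, the unit-circle loop and the step count are all assumptions chosen so that the answer is known in closed form, namely $e^{-iB\pi}$ by Stokes' theorem.

```python
import cmath
from math import cos, sin, pi

B = 0.7  # hypothetical constant "field strength", chosen for illustration

def a(x):
    """Real coefficients (a_0, a_1, a_2) of the u(1)-valued 1-form A = i * a_mu dx^mu."""
    return (-0.5 * B * x[1], 0.5 * B * x[0], 0.0)

def loop(t):
    """The loop C: unit circle in the (x^0, x^1)-plane."""
    return (cos(2 * pi * t), sin(2 * pi * t), 0.0)

def loop_velocity(t):
    return (-2 * pi * sin(2 * pi * t), 2 * pi * cos(2 * pi * t), 0.0)

def A_along(t):
    """A(C'(t)), an element of u(1) = i R."""
    return 1j * sum(am * vm for am, vm in zip(a(loop(t)), loop_velocity(t)))

def holonomy(steps=2000):
    """Integrate g'(t) = -A(C'(t)) g(t), g(0) = 1, with the classical RK4 scheme."""
    f = lambda t, g: -A_along(t) * g
    g, h = 1 + 0j, 1.0 / steps
    for n in range(steps):
        t = n * h
        k1 = f(t, g)
        k2 = f(t + h / 2, g + h * k1 / 2)
        k3 = f(t + h / 2, g + h * k2 / 2)
        k4 = f(t + h, g + h * k3)
        g += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return g

W = holonomy()  # Wilson loop in the defining (one-dimensional) representation
print(W, cmath.exp(-1j * B * pi))
```

In the nonabelian case g(t) is matrix-valued and the same scheme applies with matrix products, but no closed form of this kind is available in general.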


Thus, we have specified the meaning of the terms
appearing in the formal integral [1], where
$C_1, \ldots, C_n$ of eqn [1] form a link (a family of
nonintersecting, imbedded loops) in $\mathbb{R}^3$ and
$R_1, \ldots, R_n$ are finite-dimensional representations of
G. Witten showed that, at least for suitable values of
k, integrals of this form ought to produce topological
invariants, which he identified, for the link.

The integral [1] is problematic for several reasons. 
First, there is no reasonable and useful analog of 
Lebesgue measure on an infinite-dimensional space. 
Even if one were to regularize this measure in some 
simple way, one would run into the problem that the 
measure would not live on the space of smooth 
connections, and so the integrand would become 
meaningless. 

There are several different approaches to a 
mathematical interpretation of [1]. The approach 
that is often taken in practice is to simply ignore the 
analytical problem and define the value of the 
integral [1] to be what Witten’s calculations have 
given. One approach, used, for instance, by Bar- 
Natan (1995) is to expand the integrand in a series 
and relate each individual integral in this expansion 
separately to topological invariants. Discrete 
approximation procedures to the continuum integral 
have also been explored. In the abelian case, infinite- 
dimensional oscillatory integral techniques have 
been used to understand the functional integral. 
Frohlich and King (1999) showed the possibility of 
interpreting parallel transport using ideas from 
stochastic differential equations. Such an approach 
has been used successfully in the case of two- 
dimensional Yang-Mills theory, where the func- 
tional integral actually corresponds to integration 
with respect to a measure. In this article, we focus 
on a method of understanding the normalized 
Chern-Simons functional integral in terms 
of infinite-dimensional distribution theory and 
examining some ideas for understanding Wilson 
loop expectation values in this setting. 


Infinite Dimensional Distributions 


Let $(x^0, x^1, x^2)$ denote the usual coordinates on $\mathbb{R}^3$.
Gauge symmetry, an issue which will not be
examined here, may be used to simplify the problem
of the Chern-Simons integral. In particular, one
need only focus on connections which vanish in the
$x^2$-direction, that is, connections of the form
$A = A_0\,dx^0 + A_1\,dx^1$. For such A, the triple wedge-product
term in the Chern-Simons action disappears,
and we are left with the quadratic expression:

$$S_{CS}(A) = \int_{\mathbb{R}^3} \mathrm{tr}(A \wedge dA) \qquad [4]$$


This is good, since the functional integral now 
involves a quadratic exponent and so stands a good 
chance of rigorous realization, just as Gaussian 
measure can be given rigorous meaning in infinite 
dimensions. However, in the Chern—Simons situa- 
tion, there is no hope of actually getting a measure, 
not even a complex measure. 
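The reduction to a quadratic expression can be made explicit. The following worked computation is a sketch under the assumption that $A_0$ and $A_1$ decay fast enough for the boundary terms in the integration by parts to vanish; dvol denotes $dx^0 \wedge dx^1 \wedge dx^2$.

```latex
\begin{align*}
A \wedge A \wedge A &= 0
  \quad\text{(a 3-form built only from } dx^0 \text{ and } dx^1 \text{ vanishes)}, \\
dA &= \partial_2 A_0\, dx^2\wedge dx^0 + \partial_2 A_1\, dx^2\wedge dx^1
      + (\partial_0 A_1 - \partial_1 A_0)\, dx^0\wedge dx^1, \\
\operatorname{tr}(A\wedge dA)
  &= \operatorname{tr}(A_0\,\partial_2 A_1)\, dx^0\wedge dx^2\wedge dx^1
   + \operatorname{tr}(A_1\,\partial_2 A_0)\, dx^1\wedge dx^2\wedge dx^0 \\
  &= \bigl[-\operatorname{tr}(A_0\,\partial_2 A_1)
           + \operatorname{tr}(A_1\,\partial_2 A_0)\bigr]\,\mathrm{dvol}, \\
\int_{\mathbb{R}^3}\operatorname{tr}(A_1\,\partial_2 A_0)\,\mathrm{dvol}
  &= -\int_{\mathbb{R}^3}\operatorname{tr}(A_0\,\partial_2 A_1)\,\mathrm{dvol}, \\
S_{CS}(A) &= -2\int_{\mathbb{R}^3}\operatorname{tr}(A_0\,\partial_2 A_1)\,\mathrm{dvol}.
\end{align*}
```

Multiplying by $k/4\pi$ gives the coefficient $-k/2\pi$ in front of the integral, so the exponent of the functional integral is a bilinear pairing of $A_0$ with $\partial_2 A_1$.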

The next best thing to a measure is a distribution 
or “generalized function.” A distribution over a space 
Y is a continuous linear functional on a topological 
vector space of functions on Y. Thus, the objective is 
to realize the Chern—Simons functional integral as a 
continuous linear functional on some space of test 
functions over A (more precisely, on an extension of 
A). Before turning to the specific case of the Chern- 
Simons integral, let us examine some elements of the 
theory of infinite-dimensional distributions, in as 
much as they are relevant to our needs. 

Let us consider a Hilbert space $E_0$, and a positive
Hilbert-Schmidt operator T on $E_0$. For each integer
$p \ge 0$, let $E_p = T^p(E_0)$, which is a Hilbert space with
the inner product $\langle x, y \rangle_p = \langle T^{-p}x, T^{-p}y \rangle$. Then we
have the chain of inclusions

$$\mathcal{E} := \bigcap_{p \ge 1} E_p \subset \cdots \subset E_2 \subset E_1 \subset E_0 \qquad [5]$$

with each inclusion $E_{p+1} \to E_p$ being Hilbert-Schmidt.
Let $E_{-p} = E_p'$ be the topological dual of $E_p$,
the space of continuous linear functionals on $E_p$, and
let $\mathcal{E}'$ be the topological dual of $\mathcal{E}$, where the latter is
given the topology generated by all the norms $\|\cdot\|_p$.
Then we have the inclusions

$$E_0 \simeq E_0' \subset E_{-1} \subset E_{-2} \subset \cdots \subset \mathcal{E}' = \bigcup_{p \ge 0} E_{-p} \qquad [6]$$

For each $x \in \mathcal{E}$ there is the evaluation map
$\hat x : \mathcal{E}' \to \mathbb{R} : \phi \mapsto \phi(x)$. A very special case of a general
theorem of Minlos guarantees that on the dual $\mathcal{E}'$ there
is a measure $\mu$ on the sigma algebra generated by all the
functions $\hat x$ such that each $\hat x$ is a Gaussian random
variable of mean zero and variance $|x|_0^2$, that is,

$$\int_{\mathcal{E}'} e^{it\hat x}\,d\mu = e^{-t^2|x|_0^2/2}$$

for all $x \in \mathcal{E}$ and $t \in \mathbb{R}$. This measure $\mu$ is the
standard Gaussian measure on $\mathcal{E}'$ for the infinite-dimensional
nuclear space $\mathcal{E}$.

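The inner products $\langle \cdot, \cdot \rangle_p$ give rise to a nuclear space
structure on function spaces over $\mathcal{E}'$. Let $\mathcal{U}$ be the
algebra of functions on $\mathcal{E}'$ generated by the exponentials
$e^{\lambda\hat x}$, with x running over $\mathcal{E}$ and $\lambda$ over $\mathbb{C}$. For each
$p \ge 0$, there is an inner product $\langle\langle \cdot, \cdot \rangle\rangle_p$ on $\mathcal{U}$ such that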


Cac e82 ) ) — eWilxy), 7 
p 

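For p = 0 the left-hand side coincides with the $L^2(\mu)$
inner product. Let $[\mathcal{E}]_p$ be the Hilbert space
completion of $\mathcal{U}$ in the $\langle\langle \cdot, \cdot \rangle\rangle_p$ inner product. Then

$$\cdots \subset [\mathcal{E}]_2 \subset [\mathcal{E}]_1 \subset [\mathcal{E}]_0 = L^2(\mathcal{E}', \mu) \qquad [8]$$

Let $[\mathcal{E}] = \bigcap_{p \ge 0} [\mathcal{E}]_p$, equipped with the topology from all
the norms $\|\cdot\|_p$, and $[\mathcal{E}]'$ its topological dual.
Elements of $[\mathcal{E}]'$, being continuous linear functionals
on the "test function space" $[\mathcal{E}]$, are called distributions
over $\mathcal{E}'$, in the language of white-noise analysis.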

A fundamental tool in the study of infinite-dimensional
distributions is the S-transform. This
generalizes the traditional Segal-Bargmann transform
from the $L^2$-setting to the context of distributions.
Let $\mathcal{E}_c$ be the complexification of $\mathcal{E}$. The inner
product $\langle \cdot, \cdot \rangle_0$ on $\mathcal{E}$ extends to a complex-bilinear
pairing $\mathcal{E}_c \times \mathcal{E}_c \to \mathbb{C} : (z, w) \mapsto z \cdot w$. The evaluation
pairing $\mathcal{E}' \times \mathcal{E} \to \mathbb{R}$ also extends naturally to the
complexifications. For $\Phi$ a distribution belonging to
$[\mathcal{E}]'$, define a function $S\Phi$ on $\mathcal{E}_c$ by

$$S\Phi(z) = \Phi(c_z)$$

for all $z \in \mathcal{E}_c$. Here $c_z$ is the coherent state function on
$\mathcal{E}'$ given by $c_z(\phi) = e^{\phi(z) - z \cdot z/2}$. A fundamental and
useful result in white-noise analysis, due originally to
Potthoff and Streit, specifies the range of the transform
S and allows reconstruction of a distribution $\Phi$ from
the function $S\Phi$. Briefly, the range of S consists of
functions which are holomorphic, in an appropriate
sense, and have at most quadratic exponential growth.
In particular, this theorem implies that a function of the
form $z \mapsto e^{a z \cdot z}$, for any constant a, is in the range of S.


Rigorous Realization of Chern-Simons 
Integrals 


We return to the Chern-Simons context. As mentioned
earlier, gauge symmetry may be invoked to
reduce the space of connections to the smaller space:

$$\mathcal{E} = X \oplus X \qquad [9]$$

where $X = \mathcal{S}(\mathbb{R}^3) \otimes LG$ is the space of rapidly
decreasing functions with values in the Lie algebra
LG. Let





as a linear operator on $L^2(\mathbb{R}^3)$, $T_1 = T_0 \otimes I$ the
induced operator on $L^2(\mathbb{R}^3) \otimes LG$, and $T = T_1 \oplus T_1$.
Then, as described in the preceding section, we have
the space $\mathcal{E}$ and its dual $\mathcal{E}'$. There is then the
standard Gaussian measure $\mu$ on $\mathcal{E}'$, and the space
$[\mathcal{E}]'$ of distributions over $\mathcal{E}'$.

The normalized Chern-Simons integral may be
viewed as a linear functional

$$\Phi_{CS}(F) = \frac{1}{N}\int_{\mathcal{E}} e^{(ik/4\pi)S_{CS}(A)}\,F(A)\,DA \qquad [10]$$

where N is a "normalizing" factor. Rigorous meaning
can be given to this by first formally working out
what the S-transform of $\Phi_{CS}$ ought to be. Calculation
shows that $S\Phi_{CS}$ is indeed a holomorphic function
on $\mathcal{E}_c$ of quadratic growth. The Potthoff-Streit
theorem then implies that $\Phi_{CS}$ does exist as a
distribution in the space $[\mathcal{E}]'$. Let us examine this
in some more detail.

As before, we take A to be of the form
$A = A_0\,dx^0 + A_1\,dx^1$, with the component $A_2$ equal
to 0. Integration by parts shows that

$$\frac{k}{4\pi}S_{CS}(A) = -\frac{k}{2\pi}\int \mathrm{tr}(A_0\,\partial_2 A_1)\,d\mathrm{vol} \qquad [11]$$

A formal computation reveals that $S(\Phi_{CS})(j)$ should
be given by


r E TE P 
sofun) ua 
where j= (fo, j1), and 


arfa) =F | dsla) — Leo) (9)] Fas) 


The Potthoff-Streit criterion implies the existence of
a distribution $\Phi_{CS}$, whose S-transform is given by the
above expression.

The distribution $\Phi_{CS}$ is, however, not a sufficiently
powerful object to allow determination of
the Wilson loop expectations that one would really
like to have. For instance, $\Phi_{CS}$ does not live on the
space of smooth connections and so the meaning of
parallel transport needs to be defined. The state of
knowledge, at the rigorous level, at this point is still
evolving, with progress reported by A. Hahn. We
describe some ideas for the Wilson loop expectations
in the following.

The strategy for defining parallel transport along
a path is to smear out the path by means of bump
functions and essentially replace the path by a path
of test functions in $\mathcal{E}$. The description given here is
mainly for the case of abelian G. Choose first a $C^\infty$
nonnegative bump function $\psi$ on $\mathbb{R}^3$, vanishing
outside the unit ball and having $L^1$ norm equal to 1.
For $\varepsilon > 0$, let $\psi^\varepsilon$ be the scaled bump function given
by $\psi^\varepsilon(x) = \varepsilon^{-3}\psi(x/\varepsilon)$. Next, for a smooth loop
$l\colon [0,1] \to \mathbb{R}^3$, $t \mapsto l(t) = (l_0(t), l_1(t), l_2(t))$, let
$l^\varepsilon(t) = \psi^\varepsilon(\cdot - l(t))$,
the scaled bump function centered now at the path
point l(t). Now consider a generalized connection
$A = (A_0, A_1) \in \mathcal{E}'$. Set


$B^\varepsilon_A(t) = A_0(l^\varepsilon(t))\, l_0'(t) + A_1(l^\varepsilon(t))\, l_1'(t)$ [13]


The equation of parallel transport can be reformulated
as a differential equation for a matrix-valued
path $t \mapsto P^\varepsilon_A(t)$ satisfying

$\frac{d}{dt} P^\varepsilon_A(t) + B^\varepsilon_A(t)\, P^\varepsilon_A(t) = 0$ [14]

and the initial condition $P^\varepsilon_A(0) = I$. With this smearing,
one can consider functions of the form


W.(L; A) = Jee! (A)) [15] 
1=1 


for a link L consisting of loops $l_1, \dots, l_n$, instead of
the classical Wilson loop variable.

At this stage, it would be natural to consider
taking $\varepsilon \downarrow 0$ in $\Phi_{CS}(W_\varepsilon(L))$. However, this is still
problematic. A further regularization is needed,
roughly corresponding to the geometric notion of
framing. In the definition of $\Phi_{CS}$, an alteration is made
to the quadratic form Q(j, j) in the exponent which
appears in the expression for $S(\Phi_{CS})$, replacing it
with $Q(j, \phi_s j)$, where $\{\phi_s\}_{s \ge 0}$ is a family of suitable
diffeomorphisms of $\mathbb{R}^3$, with $\phi_0$ being the identity.
In a sense, this splits a single loop l into l and a
neighboring loop $\phi_s l$. At the end, one has to take
$s \downarrow 0$. The resulting limiting value is the expected
link invariant. We shall not go into the case of
nonabelian G, which is more complex and on which
work is still in progress.

Infinite-dimensional distributions can be used to
formulate a rigorous theory for normalized Chern-Simons
functional integrals. The more specific questions
raised by the Wilson-loop integrals in this setting
open up new problems for further developments in
the distribution theory, connecting geometry, topology,
and infinite-dimensional analysis.


Classical groups are Lie groups corresponding to
three classical geometries: linear, metric, and
symplectic. Let us start with the complex field C.
We consider the linear space $\mathbb{C}^n$ and the group
GL(n; C) of its automorphisms, the nondegenerate
(invertible) linear transformations. The complex
linear metric space is the space $\mathbb{C}^n$ endowed with a
nondegenerate symmetric bilinear form; the orthogonal
group O(n; C) is the subgroup in GL(n; C) of
automorphisms of this structure. If, for n = 2l, we
replace the symmetric form by a nondegenerate skew-symmetric
form, we obtain the linear symplectic
space and the group Sp(l; C) of its automorphisms,
the symplectic group.

A fundamental observation of nineteenth-century
geometry was that the transfer from the complex
field to the real one gives not only three corresponding
groups for R but a much richer collection
of real forms of complex classical groups: unitary,
pseudounitary, pseudoorthogonal, etc. (see below).
Classical geometries correspond to homogeneous
manifolds with classical groups of transformations.
Geometers understood that this produces a very
rich world of non-Euclidean geometries, including
the first example of non-Euclidean geometry,
hyperbolic geometry. Through such an approach, some
classical algebraic theories obtain a geometrical
interpretation (see below the consideration of the
cone of symmetric positive forms). Among the classical
manifolds are Minkowski space, Grassmannians,
and multidimensional analogs of the disk and
the half-plane. A substantial part of this theory is a
matrix geometry, which serves as a background for
matrix analysis. A rich geometry on classical
manifolds with many symmetries is a background
for a rich multidimensional analysis with many
explicit formulas. Classical geometries, starting with
Minkowski geometry, have appeared in some
problems of mathematical physics.

A crucial technical fact is the embedding of the 
classical groups in the class of semisimple Lie groups; 
it gives a very strong unified method to work with 
semisimple groups and corresponding geometries — the 
method of roots. Nevertheless, some special realiza- 
tions and constructions for classical groups can also be 
very useful. A very impressive example is the twistors 
of Penrose, where an initial construction is the 
realization of points of four-dimensional Minkowski 
space as lines in three-dimensional complex projective 
space. We mention below some general facts about 
semisimple groups and homogeneous manifolds, but 
the focus will be on special possibilities for the classical 
groups. The class of simple Lie groups contains, 
besides the classical groups, only a finite number of 
exceptional groups which are also very interesting and 
are connected, in particular, with noncommutative 
and nonassociative geometries; they have applications 
to mathematical physics. 


Complex Groups and Homogeneous 
Manifolds 


Complex Classical Groups 


The complete linear group GL(n; C) is the group of
nondegenerate matrices g of order n ($\det g \ne 0$) and the
special linear group SL(n; C) is its subgroup of
matrices with determinant equal to 1 (the unimodular
condition). The unimodular condition kills the
one-dimensional center, leaving only a finite
center. We realize the direct products of several copies
of complete linear groups with different dimensions,
for example, GL(k; C) × GL(l; C), as the groups of
block-diagonal nondegenerate matrices. The letter S
always means that we take matrices with determinant
1. So the notation S(GL(k; C) × GL(l; C)) means that we
take block-diagonal matrices with blocks of sizes k, l
and with determinant 1.

Let I be a nondegenerate symmetric matrix of
order n; then the orthogonal group O(n; C) is the
subgroup in GL(n; C) of matrices preserving the
corresponding symmetric form so that


$g^{T} I g = I$


These matrices can have the determinant ±1. The
special orthogonal group SO(n; C) is the subgroup
of orthogonal matrices with determinant 1. Different
I's give isomorphic orthogonal groups since they
are all linearly equivalent. If we take as I the unit
matrix $E = E_n$, then we obtain the group of
orthogonal matrices in the classical sense: $g^{T} g = E$.

If n = 2l and we replace in this definition the
symmetric matrix I by a nondegenerate skew-symmetric
matrix J, we obtain the symplectic
group Sp(l; C). Again, different J's give isomorphic
groups. The typical example of J is


$J = \begin{pmatrix} 0 & E_l \\ -E_l & 0 \end{pmatrix}$


It is convenient then to represent matrices g as


$g = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$


where the blocks A, B, C, D are matrices of order l.
Then the symplectic condition $g^{T} J g = J$ is that
$A^{T} D - C^{T} B = E$ and the matrices $A^{T} C$ and $D^{T} B$ are symmetric.
If C = 0 then $D = (A^{T})^{-1}$ and $A^{-1}B$ is a symmetric
matrix. In this way, we have in Sp(l; C) a subgroup
P of block-triangular matrices of a very simple
structure; it is an example of subgroups which are
called parabolic.
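
The block conditions can be checked mechanically. The following is a minimal sketch, assuming the standard J with blocks (0, E; -E, 0) and l = 2; the helper names (`block`, `standard_J`, `is_symplectic`) and the sample matrices are illustrative choices, not part of the text.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def block(a, b, c, d):
    """Assemble the 2l x 2l matrix [[A, B], [C, D]] from l x l blocks."""
    return [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]

def standard_J(l):
    """J = [[0, E], [-E, 0]] with E the l x l unit matrix (assumed convention)."""
    E = [[int(i == j) for j in range(l)] for i in range(l)]
    Z = [[0] * l for _ in range(l)]
    minus_E = [[-x for x in row] for row in E]
    return block(Z, E, minus_E, Z)

def is_symplectic(g):
    """Check the defining condition g^T J g = J."""
    J = standard_J(len(g) // 2)
    return matmul(matmul(transpose(g), J), g) == J

# A block-triangular (C = 0) element: D = (A^T)^{-1} and A^{-1} B symmetric.
A = [[1, 1], [0, 1]]
D = [[1, 0], [-1, 1]]      # inverse of A^T, computed by hand
S = [[2, 3], [3, 5]]       # an arbitrary symmetric matrix
B = matmul(A, S)           # chosen so that A^{-1} B = S
C = [[0, 0], [0, 0]]
g_parabolic = block(A, B, C, D)

# Same A and D, but A^{-1} B is no longer symmetric.
N = [[0, 1], [0, 0]]
g_bad = block(A, matmul(A, N), C, D)
```

Here `is_symplectic(g_parabolic)` holds while `is_symplectic(g_bad)` fails, which illustrates that the symmetry of $A^{-1}B$ is exactly what is needed once C = 0.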

There are two principal classes of homogeneous
spaces with complex semisimple Lie groups: flag
manifolds and Stein manifolds.


Flag Manifolds


These homogeneous spaces F = G/P with semisimple
(in our case, classical) groups G have
parabolic subgroups P as the isotropy subgroups.
The group G = GL(n; C) acts transitively on the
flag manifolds $F(n_1,\dots,n_r)$, $0 < n_1 < \cdots < n_r < n$,
whose elements are $(n_1,\dots,n_r)$-flags: sequences of
embedded subspaces in $\mathbb{C}^n$ of the dimensions
$(n_1,\dots,n_r)$. The isotropy subgroup $P = P(n_1,\dots,n_r)$
is the subgroup of block-triangular matrices with the
diagonal blocks of sizes $k_1,\dots,k_{r+1}$, $k_j = n_j - n_{j-1}$,
$n_0 = 0$, $n_{r+1} = n$. The flag manifolds are compact
complex manifolds. The matrices proportional
to the unit matrix $E_n$ act trivially and we can
consider instead of the action of G = GL(n; C) the
transitive action of G = SL(n; C).

Let us pay particular attention to two extremal
cases. The first one is the case of the maximal
flag manifold when we have the sequence of
all integers (1, 2, 3, ..., n − 1), the complete flags; the
subgroup P in this case is called a Borel subgroup. Another
case is minimal flag manifolds with r = 1 (for them
the unipotent radical of the parabolic subgroups is
commutative). Then in the case of SL(n; C) the
sequence has only one element $n_1 = k < n$ and we
have Grassmannian manifolds $\mathrm{Gr}_{\mathbb{C}}(k;n) = F(k)$ of
k-dimensional subspaces in $\mathbb{C}^n$. If k = 1 or k = n − 1,
we obtain the dual realizations of the complex
projective space $\mathbb{CP}^{n-1}$. We can interpret points
of $\mathrm{Gr}_{\mathbb{C}}(k;n)$ also as (k − 1)-dimensional planes in
$\mathbb{CP}^{n-1}$.

We can define points of the projective space
$\mathbb{CP}^{n-1}$ by homogeneous coordinates, as the
equivalency classes ($z \sim cz$, $z \in \mathbb{C}^n \setminus \{0\}$, $c \in \mathbb{C} \setminus 0$).
For the Grassmannians we can similarly use matrix
homogeneous coordinates (Stiefel's coordinates):
classes of (k × n)-matrices $Z \in \mathrm{Mat}(k,n)$ of the
maximal rank k relative to the equivalency


$Z \sim uZ, \quad u \in GL(k;\mathbb{C})$


The rows of a matrix Z correspond to a basis of the
subspace with the homogeneous coordinate Z; the
left multiplication by a matrix u replaces this basis,
but does not change the subspace. The group
GL(n; C) acts by right multiplications:


$Z \mapsto Zg$


and this action preserves the equivalency classes.
Suppose $k \le n - k$ and the left k-minor of Z is not
zero. Such matrices give the dense coordinate chart
$\mathbb{C}^{k(n-k)}$: we can pick in the equivalency classes the
representatives $(E_k, z)$, $z \in \mathrm{Mat}(k, n-k)$, and consider
the matrices z as (inhomogeneous) local
coordinates. In the inhomogeneous coordinates the
action of the group has a matrix fractional linear
form: let


$g = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$,
$A \in \mathrm{Mat}(k)$, $D \in \mathrm{Mat}(n-k)$,
$B \in \mathrm{Mat}(k, n-k)$, $C \in \mathrm{Mat}(n-k, k)$


Then we have the transformation in inhomogeneous
coordinates:


$z \mapsto (A + zC)^{-1}(B + zD)$


The condition C = 0 defines the parabolic subgroup
which has affine action in inhomogeneous
coordinates which is transitive in the coordinate
chart. In such a way the Grassmannian is a
compactification of $\mathbb{C}^{k(n-k)}$ (realized as a space of
k × (n − k) matrices). If n = 2k, we can consider it as
the compactification of the space of square matrices
z of order k with the flat generalized conformal
structure defined by translations of the isotropy cone
{det z = 0}.
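
As an illustration, the fractional linear formula $z \mapsto (A + zC)^{-1}(B + zD)$ can be tested for compatibility with the right action $Z \mapsto Zg$ on homogeneous coordinates. The sketch below assumes k = 2, n = 4 and exact rational arithmetic; the function names and the sample matrices are hypothetical choices made only for this check.

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inv2(m):
    """Inverse of a 2 x 2 matrix with exact rational entries."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def act(z, g, k=2):
    """Inhomogeneous action z -> (A + zC)^{-1} (B + zD) for a 2k x 2k matrix g."""
    A = [row[:k] for row in g[:k]]
    B = [row[k:] for row in g[:k]]
    C = [row[:k] for row in g[k:]]
    D = [row[k:] for row in g[k:]]
    return matmul(inv2(matadd(A, matmul(z, C))), matadd(B, matmul(z, D)))

z0 = [[0, 1], [1, 0]]

# C = 0: an element of the parabolic subgroup, acting as the translation z -> z + E.
g = [[1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]

# C = E: a genuinely fractional linear map z -> (E + z)^{-1} z.
h = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1]]

step_by_step = act(act(z0, g), h)
all_at_once = act(z0, matmul(g, h))
```

Both `step_by_step` and `all_at_once` come out as the matrix with all entries 1/3, as expected for a right action: applying g and then h agrees with applying the product gh.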

There are similar constructions of flag manifolds
for other classical groups. We will consider only the
minimal flag manifolds. For O(2k; C) we consider
the isotropic Grassmannian $\mathrm{Gr}_I(2k;\mathbb{C})$ of isotropic
k-subspaces relative to the symmetric form I. We
take the matrix realization of $\mathrm{Gr}_{\mathbb{C}}(k;2k)$, using
Stiefel's homogeneous coordinates, and add the
matrix equation


$Z I Z^{T} = 0$


which is well defined in the homogeneous coordinates
(compatible with the equivalency classes) and
defines isotropic subspaces relative to I. This matrix
cone is preserved by the subgroup $O(2k;\mathbb{C}) \subset GL(2k;\mathbb{C})$
corresponding to the matrix I. If we
take the symmetric matrix


$I = \begin{pmatrix} 0 & E_k \\ E_k & 0 \end{pmatrix}$


then in inhomogeneous coordinates (z is a square
k-matrix) this equation is transformed into the
condition that the matrix z is skew-symmetric. So,
in a natural sense, the isotropic Grassmannian is
the compactification of the linear space of skew-symmetric
matrices $\mathrm{Alt}(k) \cong \mathbb{C}^N$, N = k(k − 1)/2.

A similar construction makes sense for the
symplectic group: if we replace the symmetric form
I with the skew-symmetric form J, we obtain the
equation of the matrix cone representing the
Lagrangian Grassmannian $\mathrm{Gr}_L(k;2k)$ of Lagrangian
subspaces in 2k-dimensional linear symplectic space.
If we were to choose J as above, then in the
(inhomogeneous) coordinate chart we obtain the
condition that the matrix z is symmetric. Thus, we
have the (dense) coordinate chart on the Lagrangian
Grassmannian $\mathbb{C}^N = \mathrm{Sym}(k)$, N = k(k + 1)/2, the
linear space of symmetric matrices.
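
A small numerical check makes the two chart conditions concrete. This sketch assumes the block forms I = (0, E; E, 0) and J = (0, E; -E, 0) with k = 2, and evaluates $Z F Z^{T}$ on Stiefel representatives Z = (E, z); all helper names are illustrative.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def block_form(sign, k):
    """[[0, E], [sign * E, 0]]: sign = +1 gives the symmetric I, sign = -1 gives J."""
    n = 2 * k
    F = [[0] * n for _ in range(n)]
    for i in range(k):
        F[i][k + i] = 1
        F[k + i][i] = sign
    return F

def stiefel(z):
    """Homogeneous representative Z = (E_k, z) of the point with chart coordinate z."""
    k = len(z)
    return [[int(i == j) for j in range(k)] + list(z[i]) for i in range(k)]

def gram(z, F):
    """The k x k matrix Z F Z^T; it vanishes exactly on the matrix cone."""
    Z = stiefel(z)
    return matmul(matmul(Z, F), transpose(Z))

ZERO = [[0, 0], [0, 0]]
I_form = block_form(+1, 2)
J_form = block_form(-1, 2)

z_skew = [[0, 3], [-3, 0]]
z_sym = [[1, 2], [2, 5]]
```

With these conventions `gram(z, I_form)` equals $z + z^{T}$ and `gram(z, J_form)` equals $z^{T} - z$, so the first vanishes for skew-symmetric z and the second for symmetric z.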

There is one more type of minimal flag manifolds
for the orthogonal group SO(n; C), the quadric Q
in the projective space:


$z I z^{T} = 0$


where rows $z \in \mathbb{C}^n \setminus \{0\}$ represent, in homogeneous
coordinates, points in $\mathbb{CP}^{n-1}$. If I = E, we have the
equation $z_1^2 + \cdots + z_n^2 = 0$. This quadric is the
complex compact conformal flat manifold
$CC^N$, N = n − 2; it is the compactification of $\mathbb{C}^N$
endowed with the flat conformal structure corresponding
to the quadratic isotropic cone. The
parabolic group is generated by linear conformal
transformations and translations. On the quadric Q
the conformal structure is defined by intersections of
tangent spaces with Q. Evidently, this structure is
invariant relative to the natural action of SO(n; C).


Classical Stein Manifolds 


Such homogeneous complex manifolds X = G/H have
complex reductive isotropy subgroups H. Contrary to
the flag manifolds, which are compact, these manifolds
are Stein ones and there are many holomorphic
functions on them. The typical examples for
G = GL(n; C) are homogeneous spaces $S(k_1,\dots,k_{r+1})$,
$n = k_1 + \cdots + k_{r+1}$, for which the isotropy subgroups
are block-diagonal matrices with the blocks of
sizes $k_1,\dots,k_{r+1}$. Then points of the manifold can be
realized as generic sets of subspaces $L_j \subset \mathbb{C}^n$,
$\dim L_j = k_j$, $1 \le j \le r+1$, or, what is equivalent, generic
sets of $(k_j - 1)$-dimensional planes in $\mathbb{CP}^{n-1}$. Since
the isotropy subgroup of such a homogeneous space is a
subgroup of the parabolic subgroup $P(n_1,\dots,n_r)$,
$k_j = n_j - n_{j-1}$, we have the natural fibering
$S(k_1,\dots,k_{r+1}) \to F(n_1,\dots,n_r)$ (it is simple to see this
geometrically: the ith subspace of a flag in the base is the
direct sum of the first i subspaces representing a point in
the fiber). This is a convenient tool to apply
complex analysis on S to the compact manifold F
where there are no nontrivial holomorphic functions.
Let us emphasize that such a connection exists only
for special classes of classical Stein manifolds.

Let us pay special attention to the subclass of 
symmetric Stein manifolds. For such manifolds X, the 
isotropy subgroup H is fixed relative to a holomorphic 
involutive automorphism of G. Complex semisimple 
Lie groups G (including classical ones) are symmetric 
Stein manifolds relative to the action of their square 
G x G by left and right multiplications. 


Classical Stein manifolds for SL(n; C) considered
above are symmetric if r = 1 and we have the
manifold of pairs of subspaces of complementary
dimensions intersecting only in {0}. The simplest
example is the manifold of pairs of different points
of the projective line $\mathbb{CP}^1$. Let us point out again
that the transition to the generic pairs of points
transforms the compact complex manifold without
nonconstant holomorphic functions into a Stein
manifold with a large collection of holomorphic
functions.

Some other examples of symmetric Stein manifolds
are connected with classical geometry and
linear algebra. The affine hyperboloid in $\mathbb{C}^n$,


$Q(z) = 1$


is a symmetric space for G = O(n; C), H = O(n − 1; C).
We can compare it with the projective quadric
Q(z) = 0 which is a minimal flag manifold. Let us
remark that there is a duality here: it is possible to
interpret points of the hyperboloid of dimension n
as generic hyperplane sections of the projective
quadric of dimension n − 1.

The space X of complex symmetric matrices of
order n with determinant 1 is symmetric for the
group SL(n; C) which acts by the changes of
variables in the corresponding quadratic forms:


$z \mapsto g^{T} z g, \quad g \in SL(n;\mathbb{C})$


The transitive action reflects the possibility of 
transforming such a form into a sum of squares. 
The isotropy subgroup is SO(n; C). 

The Stein symmetric manifold X = SO(n; C)/ 
S(O(k; C) x O(n — k; C)) is realized as the manifold 
of k-dimensional subspaces in C” on which the 
restriction of the principal symmetric form I is 
nondegenerate. 


Isomorphisms in Small Dimensions 


Isomorphisms of classical groups in small dimensions
produce isomorphisms of some classical
homogeneous manifolds. Such isomorphisms were
very important in the history of geometry; below are
a few examples. We will consider local isomorphisms
(up to a finite center). We have $SL(2;\mathbb{C}) \cong SO(3;\mathbb{C})$.
Let us realize $\mathbb{C}^3$ as the space of symmetric
matrices z of order 2. Then, as we remarked above,
the two-dimensional submanifold X of matrices
with determinant 1 is the symmetric Stein manifold
for the group SL(2; C). On the other hand, we can
take det z as the quadratic symmetric form I in $\mathbb{C}^3$;
then X is the hyperboloid for this form and the
action of SL(2; C) on symmetric matrices gives the
orthogonal transformations relative to this form I.
Similarly, we can interpret the local isomorphism
$SO(4;\mathbb{C}) \cong SL(2;\mathbb{C}) \times SL(2;\mathbb{C})$. We realize $\mathbb{C}^4$ as the
space of square matrices z of order 2 with the
symmetric quadratic form I(z, z) = det(z). Then left
and right multiplications of z by unimodular
matrices ($z \mapsto uzv$, $u, v \in SL(2;\mathbb{C})$) induce orthogonal
transforms for the form I and any orthogonal
transform can be represented in such a form (one
can see it by the calculation of dimensions).
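
Both constructions rest on the fact that the determinant, viewed as a quadratic form on 2 × 2 matrices, is preserved by unimodular changes of variables. A minimal sketch with integer matrices (the particular g, u, v and z are arbitrary sample values chosen for illustration):

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# SL(2) acting on symmetric 2 x 2 matrices (a copy of C^3) by z -> g^T z g.
g = [[2, 1], [1, 1]]                    # det g = 1
z_sym = [[1, 2], [2, 3]]                # symmetric, det = -1
w_sym = matmul(matmul(transpose(g), z_sym), g)

# SL(2) x SL(2) acting on all 2 x 2 matrices (a copy of C^4) by z -> u z v.
u = [[1, 1], [0, 1]]                    # det u = 1
v = [[1, 0], [2, 1]]                    # det v = 1
z_full = [[1, 2], [3, 4]]               # det = -2
w_full = matmul(matmul(u, z_full), v)
```

In both cases the image has the same determinant as the original matrix, and `w_sym` is again symmetric, so the maps are linear transformations preserving the quadratic form det.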

The local isomorphism $SL(4;\mathbb{C}) \cong SO(6;\mathbb{C})$ has a
slightly more complicated nature. Let us consider the
Grassmannian $\mathrm{Gr}_{\mathbb{C}}(2;4)$ of lines in the projective
space $\mathbb{CP}^3$ with 2 × 4 matrices Z as matrix homogeneous
coordinates. Let $p_{ij}$, i < j, be the minors of Z
with ith and jth columns. They are called Plücker
coordinates on $\mathrm{Gr}_{\mathbb{C}}(2;4)$: the equivalency class of
Z is defined by the sequence of six numbers
$p = (p_{ij},\ 1 \le i < j \le 4) \ne (0,\dots,0)$ up to a constant
factor. Thus, we have an imbedding of $\mathrm{Gr}_{\mathbb{C}}(2;4)$ in the
projective space $\mathbb{CP}^5$. The image will be the quadric


$p_{12}p_{34} - p_{13}p_{24} + p_{14}p_{23} = 0$


Thus, we have the isomorphism of two flag manifolds
and the action of SL(4; C) on the Grassmannian
turns into orthogonal transformations of the
four-dimensional quadric in $\mathbb{CP}^5$. The Plücker coordinates
can be defined for any Grassmannian, but in other
cases they do not produce isomorphisms with other
flag manifolds; nevertheless, they realize them as
intersections of quadrics in projective spaces.
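
The quadric equation is easy to confirm on examples. The following sketch computes the six 2 × 2 minors of a 2 × 4 integer matrix and evaluates the relation; the function names and the sample matrices Z and u are illustrative assumptions.

```python
from itertools import combinations

def pluecker(Z):
    """Minors p_ij (1 <= i < j <= 4) of a 2 x 4 matrix, keyed by 1-based column pairs."""
    return {(i + 1, j + 1): Z[0][i] * Z[1][j] - Z[0][j] * Z[1][i]
            for i, j in combinations(range(4), 2)}

def quadric(p):
    """Left-hand side of the relation p12 p34 - p13 p24 + p14 p23 = 0."""
    return p[1, 2] * p[3, 4] - p[1, 3] * p[2, 4] + p[1, 4] * p[2, 3]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

Z = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
p = pluecker(Z)

# Changing the basis of the subspace (Z -> uZ) rescales all minors by det u.
u = [[2, 0], [1, 3]]                    # det u = 6
p_rescaled = pluecker(matmul(u, Z))
```

For this Z the minors are (−4, −8, −12, −4, −8, −4) in the order 12, 13, 14, 23, 24, 34, the relation evaluates to 16 − 64 + 48 = 0, and replacing Z by uZ multiplies every minor by 6, so the point of $\mathbb{CP}^5$ does not change.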


Compact Classical 
Homogeneous Manifolds 


Compact classical groups U(n), SU(n), O(n), SO(n),
Sp(l) are maximal compact subgroups in the corresponding
classical complex groups GL(n; C), SL(n; C),
O(n; C), SO(n; C), Sp(l; C). This condition defines
them up to an isomorphism. They are fixed subgroups
of some antiholomorphic involutive automorphisms.
The unitary groups U(n) and SU(n) are the groups
of unitary matrices ($g^{*}g = E_n$) and, correspondingly, of
unitary matrices with determinant 1. As the compact
orthogonal group we can take the intersection
$U(n) \cap O(n;\mathbb{C})$. For the standard form I, it will be the group of
real orthogonal matrices: $g^{T} g = E$ (so the involution in
O(n; C) is the conjugation $g \mapsto \bar{g}$). Similarly, we can
take $Sp(l) = SU(2l) \cap Sp(l;\mathbb{C})$ (then the involution is
$g \mapsto -J\bar{g}J$).

Compact classical groups act on compact homogeneous
Riemannian manifolds. There are two mechanisms
connecting compact and complex
homogeneous manifolds. We observe the first
possibility in the case of flag manifolds which are
compact. We considered them so far relative to the
action of complex (noncompact) groups. It turns out
that on the flag manifold F=G/P the maximal 
compact subgroup U C G continues to be transitive: 
so we can consider flag manifolds also as being 
homogeneous with compact groups. Then F = U/C,
where C is the centralizer of a torus in U. There is a
Kähler metric on F, invariant relative to U. Thus, G
is the group of all automorphisms of F as the
complex manifold, but U is the group of its
automorphisms as the Kähler manifold. It defines
two sides of geometry of flag manifolds: complex
and Kähler. Flag manifolds are the only compact
homogeneous Kähler manifolds with semisimple Lie
groups (the class of all compact Kähler manifolds
also contains locally flat compact manifolds,
the tori). In the example considered above we have
$F(n_1,\dots,n_r) = SU(n)/S(U(k_1) \times \cdots \times U(k_{r+1}))$. In the
language of Stiefel (homogeneous) coordinates, we fix a
positive Hermitian form in $\mathbb{C}^n$ and characterize
subspaces by orthonormal bases. For r = 1 we have
Grassmannians $\mathrm{Gr}_{\mathbb{C}}(k;n)$, in particular the projective
space $\mathbb{CP}^{n-1}$ which we consider relative to the
action of the unitary groups. Relative to this action
they are Hermitian symmetric spaces. In the case of
minimal flag manifolds for other groups the action 
of maximal compact subgroups also defines on them 
the structure of compact Hermitian symmetric 
spaces. Let us emphasize that relative to noncom- 
pact groups of biholomorphic automorphisms G, 
the minimal flag manifolds (including the Grass- 
mannians) are not symmetric. 

In the case of homogeneous Stein manifolds 
X=G/H, the picture is different: the maximal 
compact subgroups have no open orbits. There are 
totally real orbits which are the compact forms of
X: $X_{\mathbb{R}} = G_{\mathbb{R}}/H_{\mathbb{R}}$, where $G_{\mathbb{R}}$ and $H_{\mathbb{R}}$ are compact
forms of G and H, respectively. It is the canonical
embedding of compact homogeneous manifolds 
in their complexifications. The important special 
case is the embedding of compact symmetric 
manifolds in the Stein symmetric manifolds — their 
complexifications. 

For compact symmetric manifolds X =U/K the 
groups U,K are compact Lie groups and elements 
of K are fixed for an involutive automorphism o 
such that K contains the connected component of 
the subgroup of all fixed elements of o. This 
possibility to connect several symmetric manifolds 
with one involution is illustrated by the next 
example. The sphere $S^{n-1} \subset \mathbb{R}^n$ is the symmetric
space SO(n)/SO(n − 1); the real projective space
$\mathbb{RP}^{n-1}$ is SO(n)/O(n − 1). Here SO(n − 1) is the
connected component of O(n − 1) and $S^{n-1}$ is a
double covering of $\mathbb{RP}^{n-1}$. A few more examples: the
real Grassmannian $\mathrm{Gr}_{\mathbb{R}}(k;n)$ of k-subspaces in $\mathbb{R}^n$
can be defined as SO(n)/S(O(k) × O(n − k)). This
representation corresponds to the characterization
of subspaces by orthonormal bases. The consideration
of arbitrary bases defines the action of the
larger group GL(n; R) on $\mathrm{Gr}_{\mathbb{R}}(k;n)$. Relative to this
action, the real Grassmannian is not symmetric since 
the isotropy subgroup is parabolic and is not 
involutive. Such a possibility to extend the group is 
typical for a class of compact symmetric manifolds 
called symmetric R-spaces. They are real forms of 
Hermitian compact symmetric manifolds (minimal 
flag manifolds). Let us also mention the compact
symmetric space SU(n)/SO(n), which is the compact
form of the space of unimodular symmetric matrices
and can be presented by the submanifold of unitary
matrices in it. Also, all compact Lie groups G are
symmetric spaces relative to the action of G × G.


Noncompact Riemannian 
Symmetric Manifolds 


This class of symmetric manifolds has the strongest 
connections with classical mathematics. Let us 
consider noncompact real semisimple Lie groups — 
real forms of complex semisimple Lie groups. They 
correspond to antiholomorphic involutions in com- 
plex groups. 

Among the real forms of SL(n; C) are the real and
quaternionic unimodular groups SL(n; R), SL(n; H)
and the pseudounitary groups SU(p, q) of complex
matrices preserving a Hermitian form H of the
signature (p, q). The complex orthogonal group has
as real forms, in particular, pseudoorthogonal
groups SO(p, q) of real matrices preserving a
quadratic form of the signature (p, q).

Let G be a real simple Lie group and K be its 
maximal compact subgroup. Then X=G/K is a 
Riemann symmetric manifold of noncompact type; 
K is defined by an involutive automorphism of G. 
Therefore, in the irreducible situation there is a correspondence
between noncompact Riemann symmetric
manifolds and real simple noncompact Lie
groups. K-orbits on X are parametrized by points of
the orbit on X of a maximal abelian subgroup A,
the Cartan subgroup of the symmetric space X. Its
dimension l is the important invariant of X, its
rank. The algebraic base for geometry of X is the
Iwasawa decomposition 


G=KAN 





where N is a maximal unipotent subgroup (in a 
natural sense compatible with A). Then the para- 
bolic subgroup P = AN is transitive on X. 


Symmetric Cones 


Let us start with X = GL(n; R)/O(n). This manifold
corresponds to the classical theory of quadratic
forms: X can be realized as the manifold $\mathrm{Sym}_+(n)$ of
symmetric positive matrices $x \gg 0$ of order n
(corresponding to positive quadratic forms). Then
the transitivity of GL(n; R) on X corresponds to the
possibility to transform positive forms to a sum of
squares. The sufficiency of triangular matrices for such
transformations corresponds to the transitivity on
$X = \mathrm{Sym}_+(n)$ of the parabolic subgroup P of (upper)
triangular matrices with positive diagonal elements. So
A is the group of diagonal matrices with positive
elements and the submanifold of diagonal matrices
in X parametrizes K-orbits. The general fact about
A-parametrization in this example is the classical
fact about the reduction of quadratic forms to
diagonal form by orthogonal transformations.
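
The transitivity of the triangular subgroup amounts to the familiar triangular (Cholesky-type) factorization of a positive matrix. The sketch below assumes the action $x \mapsto g^{T} x g$ and finds an upper triangular t with positive diagonal such that $t^{T} t = x$, so that t carries the unit matrix to x; the function names are illustrative.

```python
import math

def cholesky_upper(x):
    """Upper triangular t with positive diagonal and t^T t = x, for x symmetric positive."""
    n = len(x)
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # x[i][j] = sum over m <= i of t[m][i] * t[m][j]; solve for t[i][j].
            s = x[i][j] - sum(t[m][i] * t[m][j] for m in range(i))
            if i == j:
                t[i][i] = math.sqrt(s)
            else:
                t[i][j] = s / t[i][i]
    return t

def gram(t):
    """The product t^T t, i.e. the image of the unit matrix under x -> t^T x t."""
    n = len(t)
    return [[sum(t[m][i] * t[m][j] for m in range(n)) for j in range(n)] for i in range(n)]

x = [[4.0, 2.0],
     [2.0, 3.0]]
t = cholesky_upper(x)
x_back = gram(t)
```

For this x the factor is t = ((2, 1), (0, √2)), and `x_back` reproduces x up to rounding. Since every positive x is reached from the unit matrix, any two points of the cone are connected by triangular matrices.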

There are complex and quaternionic versions 
of this picture. The symmetric manifold
X = GL(n; C)/U(n) is realized as the manifold
$\mathrm{Herm}_+(n)$ of positive complex Hermitian matrices
(forms) and similarly classical facts of linear algebra
on Hermitian quadratic forms are transformed into
geometrical statements on symmetric spaces. Let us
emphasize that we consider here the group GL(n; C)
as the real group. The same situation exists with the
manifold $\mathrm{Herm}_+(n;\mathbb{H})$ of positive quaternionic
Hermitian matrices, which is the symmetric manifold
for the real group GL(n; H).

These three manifolds can be included in an
impressive geometrical structure. They are all convex
homogeneous cones V in linear spaces $\mathbb{R}^N$ which
are self-dual ($V = V^{*}$) relative to a bilinear form
$\langle \cdot, \cdot \rangle$. Let us recall that


$V^{*} = \{x : \langle x, y \rangle > 0,\ y \in \bar{V} \setminus 0\}$


Here $\bar{V}$ is the closure of V. So these three symmetric
manifolds are linear homogeneous self-dual cones.

There is only one more type of classical homogeneous
self-dual cones, the quadratic (Lorentzian)
cones


$L_n = \{x \in \mathbb{R}^n : x_1^2 - x_2^2 - \cdots - x_n^2 > 0,\ x_1 > 0\}$


which is also called the future light cone (the
condition $x_1 < 0$ defines the past light cone). The
group of linear automorphisms of this cone is
$SO(1, n-1) \times \mathbb{R}^{+}$; the first factor is the Lorentz group.

There is also one exceptional 27-dimensional 
cone; it is possible to interpret this cone as the 
cone of positive Hermitian matrices of third order 
over Cayley numbers. There is a very nice structural
theory of homogeneous self-dual cones; it is convenient
to develop this theory in the language of
Jordan algebras (Faraut and Koranyi 1994). Such
cones participate as elements of explicit constructions
of other classes of symmetric spaces (see
below).

Following Siegel, it is possible to connect with
homogeneous self-dual cones multidimensional versions
of Euler integrals (Γ- and B-functions) (Faraut
and Koranyi 1994). They have many applications,
including those to integral formulas for complex
symmetric domains.


Riemann Symmetric Manifolds of Rank 1 


The first example of non-Euclidean geometry is
connected with the Riemann symmetric manifolds of
rank 1, the hyperbolic spaces; X = SO(1, n)/O(n) is the
hyperbolic space of dimension n. It can be realized
as the upper sheet of the two-sheeted hyperboloid:


$x_0^2 - x_1^2 - \cdots - x_n^2 = 1, \quad x_0 > 0$

Pseudoorthogonal linear transformations from
SO(1, n) preserve this surface; they play the role of
hyperbolic motions. The equivalent realization is in
the real ball $x_1^2 + \cdots + x_n^2 < 1$ relative to the
projective transformations preserving this ball.
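
The passage between the two realizations can be illustrated numerically. The sketch below assumes n = 2, uses a boost in the $(x_0, x_1)$-plane as a sample hyperbolic motion, and passes to the ball by the central projection $x \mapsto (x_1/x_0, \dots, x_n/x_0)$; the function names and sample point are illustrative.

```python
import math

def form(x):
    """The quadratic form x_0^2 - x_1^2 - ... - x_n^2."""
    return x[0] ** 2 - sum(c * c for c in x[1:])

def boost(x, t):
    """A pseudoorthogonal transformation mixing x_0 and x_1 (hyperbolic rotation)."""
    ch, sh = math.cosh(t), math.sinh(t)
    return [ch * x[0] + sh * x[1], sh * x[0] + ch * x[1]] + list(x[2:])

def to_ball(x):
    """Central projection of a point of the upper sheet into the unit ball."""
    return [c / x[0] for c in x[1:]]

# A point on the upper sheet: x_0 = cosh 1, spatial part of length sinh 1.
x = [math.cosh(1.0), 0.6 * math.sinh(1.0), 0.8 * math.sinh(1.0)]
y = boost(x, -2.5)
ball_point = to_ball(y)
radius_sq = sum(c * c for c in ball_point)
```

The boosted point y still satisfies form(y) = 1 with $y_0 > 0$, and its projection has squared radius $(y_0^2 - 1)/y_0^2 < 1$, so hyperbolic motions of the hyperboloid induce maps of the ball to itself.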

Another example of a Riemann symmetric mani- 
fold of rank 1 is the complex hyperbolic symmetric 
space X = SU(1; n)/U(n). It can similarly be realized
either as the hyperboloid


$|z_0|^2 - |z_1|^2 - \cdots - |z_n|^2 = 1$


in $\mathbb{C}^{n+1}$ relative to pseudounitary linear transformations
or as the complex ball $|z_1|^2 + \cdots + |z_n|^2 < 1$
relative to complex projective transformations
serving it. There are also quaternionic hyperbolic 
spaces which are realized as the quaternionic balls in 
the quaternionic projective spaces. These three series 
exhaust all classical Riemann symmetric manifolds 
of rank 1 of noncompact type. There is only one 
exceptional symmetric manifold of rank 1: it has the 
dimension 16 and can be interpreted as a two- 
dimensional ball for Cayley numbers. 


Classical Symmetric Domains in C” 
(Cartan Domains) 


Riemann symmetric manifolds of noncompact type 
which admit an invariant complex structure also 
have an invariant Hermitian form corresponding to 
the Riemann metrics. For this reason, we will call 
them noncompact Hermitian symmetric manifolds 
(we considered above the compact Hermitian sym- 
metric manifolds). They are Stein manifolds, but as 
opposed to symmetric Stein manifolds, which we 
considered above, they are homogeneous relative to 
real groups. The condition for a Riemann symmetric
manifold X = G/K to be Hermitian is that K has a
one-dimensional center. All Hermitian symmetric
manifolds of noncompact type can be realized as
bounded domains in $\mathbb{C}^n$ (but, of course, not all their
holomorphic automorphisms extend to $\mathbb{C}^n$). In the
case of classical manifolds, these domains are called
Cartan's domains: Cartan gave their explicit matrix
realizations.

The nature of groups of holomorphic automorphisms
of symmetric domains $X = G/K \subset \mathbb{C}^n$ is
explained by Cartan's duality. Each such domain
(Hermitian symmetric manifold of noncompact
type) admits an embedding in a Hermitian symmetric
manifold of compact type $X_c$ such that the
complexification $G_c$ of G is the group of holomorphic
automorphisms of $X_c$ (correspondingly,
X is an open G-orbit on $X_c$). Moreover, X lies
inside a (Zariski open) coordinate chart $\mathbb{C}^n$, which
is an orbit of a parabolic subgroup.

The simplest example is the complex ball $\mathbb{CB}^n$
(complex hyperbolic space) imbedded in the complex
projective space $\mathbb{CP}^n$. The affine chart $\mathbb{C}^n$ is the
orbit of the parabolic subgroup of affine transformations.
Let us consider more complicated
examples.

Let $X_c$ be the Grassmannian $\mathrm{Gr}_{\mathbb{C}}(k;n)$, $q = n - k \ge k$;
we will use matrix homogeneous coordinates
Z (k × n matrices) for the description of the
symmetric domain. Then $G_c = SL(n;\mathbb{C})$. Let us take
its real form G = SU(k, q), k + q = n. We fix a
Hermitian form H of the signature (k, q) and realize
G as the group of matrices preserving H:


$g^{*} H g = H$

Then $X = X_{k,q} = SU(k,q)/S(U(k) \times U(q))$ can be realized
as the domain in the Grassmannian


$Z H Z^{*} > 0$


so that this Hermitian matrix of order k must be
positive. It is essential that this condition is invariant
relative to multiplications of Z by nondegenerate
matrices u on the left and, therefore, it is a
well-defined condition in homogeneous coordinates.

Let us specify the choice of H:


$H_1 = \begin{pmatrix} E_k & 0 \\ 0 & -E_q \end{pmatrix}$


Then the corresponding domain $X_1$ is defined in
inhomogeneous coordinates $Z = (E_k, z)$, $z \in \mathrm{Mat}(k, q)$,
by the condition


$E_k - zz^{*} > 0$


This matrix ball lies completely in the coordinate
chart $\mathbb{C}^{kq}$. Its rank is equal to min(k, q). Thus, we
have the realization of this Hermitian symmetric
space as a bounded domain in $\mathbb{C}^N$, N = kq. In the
case k = 1, we have the usual (scalar) complex ball.
Let us remark that the edge of the boundary 
(Shilov’s boundary) is the compact symmetric space 


with the group of automorphisms S(U(k) x U(q)) 
(the isotropy subgroup of X). For k=q the edge 
coincides with the set of unitary matrix U(k). 
Different forms $H$ of signature $(k,q)$ are linearly
equivalent, and they correspond to different (biholomorphically
equivalent) realizations of this Hermitian symmetric space. Let
us first set $k = q$; the inhomogeneous matrix coordinates are
then square matrices of order $k$. Let us take the form

\[ H_2 = \begin{pmatrix} 0 & iE_k \\ -iE_k & 0 \end{pmatrix} \]

Then, in inhomogeneous matrix coordinates, we have the domain
$X_2$:

\[ \frac{1}{i}(z - z^*) > 0 \]


(complex matrices with positive skew-Hermitian parts). This
domain (but not its boundary) lies in the chart. It has the
structure of the tube domain $T = \mathbb{R}^n + iV$, $n = k^2$,
corresponding to the symmetric cone $V$ of positive Hermitian
matrices (we take the space of Hermitian matrices as a real form
of $\mathbb{C}^n$). The group of affine transformations of the
tube domain:

\[ z \mapsto u z u^* + a, \qquad u \in GL(k;\mathbb C),\ a \in \mathrm{Herm}(k) \]

is transitive on $X_2$; it is the parabolic subgroup in
$SU(k,q)$.

The biholomorphic equivalence of the realizations of $X$
corresponding to different $H$ is induced by the equivalence of
these forms. We have

\[ H_2 = A H_1 A^*, \qquad A = \frac{1}{\sqrt 2}\begin{pmatrix} E_k & -iE_k \\ -iE_k & E_k \end{pmatrix} \]

Then the transform $Z \mapsto ZA$ transforms $X_2$ into $X_1$. In
inhomogeneous coordinates it is the fractional linear matrix
transform

\[ z \mapsto i(z + iE_k)^{-1}(z - iE_k) \]

It is the matrix version of the classical Cayley transform.
Similarly, we can write down the inverse transform.
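
The matrix Cayley transform can be checked numerically. The following is only a sketch under stated assumptions: it takes $H_1 = \mathrm{diag}(E_k,-E_k)$, $H_2$ with off-diagonal blocks $iE_k$ and $-iE_k$, and the matrix $A = \frac{1}{\sqrt 2}\begin{pmatrix} E_k & -iE_k \\ -iE_k & E_k\end{pmatrix}$ (a choice of signs made here so that $H_2 = AH_1A^*$ holds). The helper names `is_positive` and `cayley` are hypothetical. A random point of the tube realization should land in the matrix ball.

```python
import numpy as np

def is_positive(h):
    """True if the Hermitian matrix h is positive definite."""
    return bool(np.all(np.linalg.eigvalsh(h) > 0))

def cayley(z):
    """Matrix Cayley transform z -> i (z + iE)^{-1} (z - iE)."""
    E = np.eye(z.shape[0])
    return 1j * np.linalg.solve(z + 1j * E, z - 1j * E)

k = 3
E, O = np.eye(k), np.zeros((k, k))
H1 = np.block([[E, O], [O, -E]])
H2 = np.block([[O, 1j * E], [-1j * E, O]])
# Assumed form of A (signs chosen so that H2 = A H1 A^*).
A = np.block([[E, -1j * E], [-1j * E, E]]) / np.sqrt(2)
forms_match = np.allclose(A @ H1 @ A.conj().T, H2)

# Random point of the tube domain: z = a + i p, a Hermitian, p > 0,
# so that (1/i)(z - z^*) = 2p is positive definite.
rng = np.random.default_rng(0)
a = rng.normal(size=(k, k)) + 1j * rng.normal(size=(k, k))
a = (a + a.conj().T) / 2
b = rng.normal(size=(k, k)) + 1j * rng.normal(size=(k, k))
p = b @ b.conj().T + E
z = a + 1j * p

in_tube = is_positive((z - z.conj().T) / 1j)
w = cayley(z)
in_ball = is_positive(E - w @ w.conj().T)   # E - w w^* > 0
```

For $k = 1$ this reduces to the scalar map $z \mapsto i(z - i)/(z + i)$ from the upper half-plane to the unit disk.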

If $q \ne k$, then there is also an analog of the tube
realization. Let $r = q - k > 0$ and

\[ H_3 = \begin{pmatrix} 0 & iE_k & 0 \\ -iE_k & 0 & 0 \\ 0 & 0 & -E_r \end{pmatrix} \]

Let us represent the inhomogeneous coordinates as
$z = (E_k, w, u)$, $w \in \mathrm{Mat}(k)$, $u \in \mathrm{Mat}(k,r)$.
Then the domain $X_3$ is defined by the condition

\[ \frac{1}{i}(w - w^*) - uu^* > 0 \]

This is an example of Siegel domains of the second kind
(Pyatetskii-Shapiro 1969). This domain has a transitive group of
affine transformations:

\[ (w,u) \mapsto \left(w + a + iub^* + \tfrac{i}{2}bb^*,\ u + b\right), \qquad a \in \mathrm{Herm}(k),\ b \in \mathrm{Mat}(k,r) \]
\[ (w,u) \mapsto (cwc^*,\ cu), \qquad c \in GL(k;\mathbb C) \]
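
That these affine maps preserve the Siegel domain can be verified directly. The sketch below is an illustration only: it assumes the defining form $S(w,u) = \frac{1}{i}(w - w^*) - uu^*$ and the normalization of the translation $(w,u) \mapsto (w + a + iub^* + \tfrac{i}{2}bb^*, u + b)$, with coefficients chosen here so that $S$ is preserved exactly. The function names are hypothetical.

```python
import numpy as np

def siegel_form(w, u):
    """Hermitian matrix S(w, u) = (1/i)(w - w^*) - u u^*."""
    return (w - w.conj().T) / 1j - u @ u.conj().T

def translate(w, u, a, b):
    """(w, u) -> (w + a + i u b^* + (i/2) b b^*, u + b), a Hermitian."""
    return w + a + 1j * u @ b.conj().T + 0.5j * b @ b.conj().T, u + b

def dilate(w, u, c):
    """(w, u) -> (c w c^*, c u), c invertible."""
    return c @ w @ c.conj().T, c @ u

def crandn(rng, *shape):
    """Random complex matrix with standard normal real and imaginary parts."""
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

rng = np.random.default_rng(0)
k, r = 2, 3
w, u = crandn(rng, k, k), crandn(rng, k, r)
a = crandn(rng, k, k)
a = (a + a.conj().T) / 2          # Hermitian shift
b, c = crandn(rng, k, r), crandn(rng, k, k)

s0 = siegel_form(w, u)
s_translated = siegel_form(*translate(w, u, a, b))   # equals s0
s_dilated = siegel_form(*dilate(w, u, c))            # equals c s0 c^*
```

The translations leave $S$ unchanged and the dilations replace it by $cSc^*$, so in both cases the positivity condition $S > 0$ is preserved.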


This class of symmetric domains in Grassman- 
nians is called Cartan’s domains of the first class. 
There are similar constructions for minimal flag 
domains (compact Hermitian symmetric spaces) 
with other groups. Let us consider the Lagrangian 
Grassmannian Grě(k; 2k) corresponding to the 
form J above. Here Gc=Sp(k,C). Its real form 
G=Sp(k;R) can be realized as the subgroup 
of complex symplectic matrices preserving a 
Hermitian form H of the signature (k, k). In other 
words, we intersect the domains from the last 
example with the Lagrangian Grassmannians. We 
consider the coordinate chart with inhomogeneous coordinates
given by symmetric matrices $z \in \mathrm{Sym}(k)$. For $H_1$ we
have the domain of symmetric matrices $z$ with the condition

\[ E_k - z\bar z > 0, \qquad z = z^{T} \]


This bounded realization is called Siegel's disk. For $H_2$ the
real form is the group of real symplectic matrices and $X_2$ is
the domain

\[ \Im z = \frac{1}{2i}(z - \bar z) > 0, \qquad z = z^{T} \]

of complex symmetric matrices with positive imaginary parts; it
is called Siegel's half-plane. This is the third class of
Cartan's domains. There are Siegel domains of the second kind
connected with the cones of positive symmetric matrices; some of
them are homogeneous, but they are never symmetric.

There are two more series of classical minimal flag 
manifolds: the isotropic Grassmannians and quadrics. 
They both contain the dual bounded symmetric 
domains (Cartan’s domains of second and fourth 
classes correspondingly). Some of these domains in 
the isotropic Grassmannians admit the realizations as 
tubes with the cone of positive Hermitian quaternionic 
matrices and others as Siegel domains of the second 
kind corresponding to the same cones. 

Symmetric domains in quadrics can be realized as 
tube domains with the Lorentzian (light) cones. 
The corresponding tubes are called the future (past) tube,
depending on which light cone was taken. Let us consider this
construction. The group of holomorphic automorphisms of these
domains is $G = SO(2;n)$, the conformal extension of the Lorentz
group. To realize this group, let us fix a real symmetric matrix
$Q$ of signature $(2,n)$; the group is the group of linear
transformations preserving simultaneously the quadratic symmetric
and Hermitian forms with this matrix $Q$:

\[ g^{T} Q g = Q, \qquad g^{*} Q g = Q \]

The standard realization corresponds to the diagonal matrix $Q$
with the diagonal $(1,1,-1,\ldots,-1)$. Cartan's domains of the
fourth class are connected components of the manifold

\[ Z Q Z^{T} = 0, \qquad Z Q Z^{*} > 0 \]


where rows $Z$ are homogeneous coordinates in the projective
space $\mathbb{C}P^{n+1}$. In other words, we consider a domain
on the quadric in the projective space (which is the complex flat
conformal space $CC^n$). For the standard $Q$ the domain will lie
in the coordinate chart; thus it is the bounded realization. For
the tube realization, we take

\[ Q = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & I_{1,n-1} \end{pmatrix} \]

where $I_{1,n-1} = \mathrm{diag}(1,-1,\ldots,-1)$ is the matrix of
the form $q$ defined next. Let $Z = (z_0, z_1, w_1, \ldots, w_n)$,
$w = u + iv$, $q(s,t) = s_1t_1 - s_2t_2 - \cdots - s_nt_n$, and
consider the affine chart $\mathbb{C}^{n+1} = \{z_0 = 1\}$. We have

\[ ZQZ^{T} = 2z_1 + q(w,w) = 0 \]
\[ ZQZ^{*} = 2\Re z_1 + q(w,\bar w) > 0 \]

The first condition gives $2\Re z_1 = q(v,v) - q(u,u)$ and then,
since $q(w,\bar w) = q(u,u) + q(v,v)$, the second condition gives
the final description of the considered set in $\mathbb{C}^n_w$:

\[ q(v,v) = v_1^2 - v_2^2 - \cdots - v_n^2 > 0 \]

as the union of the future and the past tubes
($T_\pm = \{v_1 \gtrless 0\}$). The edge $\mathbb{R}^n$ of these
tubes ($v = 0$)
has the structure of the Minkowski space corresponding to the
form $q$. The parabolic subgroup is the affine conformal group of
the Minkowski space. It includes the Poincaré group and is
transitive on the tubes. The complete group of holomorphic
automorphisms of the tubes, $G = SO(2,n)$, is the group of all
(not only affine) conformal transformations of the Minkowski
space. The complete edge of these symmetric domains in the
quadric $CC^n$ is the conformal compactification of the Minkowski
space (a compact symmetric R-space with the compact group
$S(O(2)\times O(n))$ on which the noncompact group $SO(2,n)$ also
acts).
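
The passage from the quadric to the tube can be illustrated numerically. The sketch below is an illustration under assumptions: it builds $Q$ with the block $\begin{pmatrix}0&1\\1&0\end{pmatrix}$ followed by the matrix $\mathrm{diag}(1,-1,\ldots,-1)$ of the form $q$ (an assumed reading of the third block), solves the quadric equation for $z_1$ in the chart $z_0 = 1$, and evaluates the Hermitian form, which should equal $2q(v,v)$.

```python
import numpy as np

def q(s, t):
    """Lorentzian bilinear form q(s, t) = s1 t1 - s2 t2 - ... - sn tn."""
    return s[0] * t[0] - np.dot(s[1:], t[1:])

n = 4
# Assumed matrix Q of signature (2, n): hyperbolic block plus the matrix of q.
Q = np.zeros((n + 2, n + 2))
Q[0, 1] = Q[1, 0] = 1.0
Q[2, 2] = 1.0
Q[3:, 3:] = -np.eye(n - 1)

rng = np.random.default_rng(1)
u = rng.normal(size=n)
v = rng.normal(size=n)
v[0] = np.linalg.norm(v[1:]) + 1.0   # forces q(v, v) > 0 and v1 > 0
w = u + 1j * v

z1 = -q(w, w) / 2                    # solves 2 z1 + q(w, w) = 0
Z = np.concatenate(([1.0, z1], w))   # point of the chart z0 = 1

quad = Z @ Q @ Z                     # Z Q Z^T, vanishes on the quadric
herm = Z @ Q @ Z.conj()              # Z Q Z^*, should equal 2 q(v, v)
```

The sample point has $v_1 > 0$, so it lies in the future tube $T_+$; replacing $v$ by $-v$ gives a point of the past tube.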


In addition to the four Cartan classes of classical domains,
there are two exceptional symmetric domains, in dimensions 27 and
16 (dual to the two exceptional compact Hermitian symmetric
spaces of these dimensions). The first of them can be realized as
the tube domain corresponding to the exceptional cone of positive
Hermitian matrices of order 3 over the Cayley numbers (dimension
27), and the other can be realized as a Siegel domain of the
second kind associated with the eight-dimensional future tube. It
is possible, using the $\Gamma$-function of self-dual homogeneous
cones, to write explicit Bergman and Cauchy-Szegő integral
formulas.


Noncompact Symmetric R-Spaces 


There are several other interesting noncompact symmetric
manifolds. Let us mention the noncompact symmetric R-spaces which
are real forms of complex symmetric domains. The typical example
is the domain of real square matrices
$x \in \mathrm{Mat}_{\mathbb R}(k)$:

\[ E_k - xx^{T} > 0 \]


The condition is that this symmetric matrix is positive. It is
the Riemann symmetric space with the group $G = SO(k,k)$. It can
be embedded in the real Grassmannian
$\mathrm{Gr}_{\mathbb R}(k;2k)$ with the matrix homogeneous
coordinates $X \in \mathrm{Mat}_{\mathbb R}(k,2k)$ and the group
$SL(2k;\mathbb R)$ acting on $X$ by right multiplications. Let

\[ I_1 = \begin{pmatrix} E_k & 0 \\ 0 & -E_k \end{pmatrix} \]

and let $SO(k,k)$ be the subgroup of matrices preserving the
quadratic form $I_1$: $gI_1g^{T} = I_1$. This group will preserve
the domain $XI_1X^{T} > 0$ and, in the inhomogeneous coordinates
$X = (E_k, x)$, $x \in \mathrm{Mat}_{\mathbb R}(k)$, it will be
exactly the same as the domain above. The group $SO(k,k)$ acts by
matrix fractional linear transformations. This domain is a real
form of Siegel's ball. If we replace the form by

\[ I_2 = \begin{pmatrix} 0 & E_k \\ E_k & 0 \end{pmatrix} \]

then we realize our symmetric manifold as the domain

\[ x + x^{T} > 0 \]

So, the symmetric part of the matrix $x$ must be positive. This
realization is homogeneous relative to the affine automorphisms
$x \mapsto axa^{T} + b$, $a \in GL(k;\mathbb R)$, $b = -b^{T}$. A
similar construction exists for rectangular matrices.


Geometry of Isomorphisms in Small Dimensions 


We connected above several local isomorphisms of complex
classical groups with some geometrical facts. Let us mention now
several similar examples for real groups. We start from
isomorphisms of symmetric cones. The cone $\mathrm{Sym}_+(2)$ of
symmetric positive matrices of second order is (linearly)
isomorphic to the future light cone $L(2)$. The comparison of the
groups of automorphisms gives the local isomorphism

\[ SL(2;\mathbb R) \cong SO(1;2) \]


This isomorphism corresponds also to the isomorphism of two
classical realizations of the hyperbolic plane, those of Poincaré
and Klein. Let us also mention that the isomorphism
$SL(2,\mathbb R) \cong SU(1,1)$ corresponds to the holomorphic
equivalence of the disk and the upper half-plane. The isomorphism
$\mathrm{Herm}_+(2) \cong L(3)$ corresponds to the presentation of
any Hermitian matrix of order 2 in Pauli's coordinates,

\[ z = \begin{pmatrix} t - x_1 & x_2 + ix_3 \\ x_2 - ix_3 & t + x_1 \end{pmatrix} \]

Then $\det z = t^2 - x_1^2 - x_2^2 - x_3^2$. Comparing the groups
of automorphisms, we obtain

\[ SL(2,\mathbb C) \cong SO(1,3) \]
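
The correspondence between $SL(2,\mathbb C)$ and Lorentz transformations can be made concrete. In the following sketch (an illustration, with hypothetical helper names) a Hermitian matrix is written in Pauli's coordinates as $\begin{pmatrix} t - x_1 & x_2 + ix_3 \\ x_2 - ix_3 & t + x_1\end{pmatrix}$, and a matrix $g$ with $\det g = 1$ acts by $z \mapsto gzg^*$. Since the determinant is preserved, the induced linear map on $(t, x_1, x_2, x_3)$ preserves the Minkowski form.

```python
import numpy as np

def pauli_matrix(t, x1, x2, x3):
    """Hermitian 2x2 matrix with Pauli coordinates (t, x1, x2, x3)."""
    return np.array([[t - x1, x2 + 1j * x3],
                     [x2 - 1j * x3, t + x1]])

def pauli_coords(m):
    """Inverse of pauli_matrix for a Hermitian 2x2 matrix m."""
    t = (m[0, 0] + m[1, 1]).real / 2
    x1 = (m[1, 1] - m[0, 0]).real / 2
    return np.array([t, x1, m[0, 1].real, m[0, 1].imag])

def minkowski(p):
    """Quadratic form t^2 - x1^2 - x2^2 - x3^2."""
    return p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2

rng = np.random.default_rng(2)
p = rng.normal(size=4)
m = pauli_matrix(*p)

g = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
g = g / np.sqrt(np.linalg.det(g))    # normalise so that det g = 1

m2 = g @ m @ g.conj().T              # action of SL(2, C) on Hermitian matrices
p2 = pauli_coords(m2)
```

Here `minkowski(p2)` agrees with `minkowski(p)` up to rounding, and the pair $\pm g$ induces the same transformation, which is why the isomorphism is only local.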


Similarly, in the quaternionic case, the isomorphism of the
cones $\mathrm{Herm}_+(2,\mathbb H) \cong L(5)$ gives the
isomorphism

\[ SL(2,\mathbb H) \cong SO(1,5) \]








The linear isomorphism of cones produces the holomorphic
isomorphism of the corresponding tubes and of their groups of
holomorphic automorphisms. So each of these three isomorphisms
automatically gives one more isomorphism. Let us give it for the
first two cones:

\[ Sp(2,\mathbb R) \cong SO(2,3), \qquad SU(2,2) \cong SO(2,4) \]

We just compared the descriptions of automorphisms of classical
tubes from above.

Considering $\det(x)$ as the quadratic form of signature $(2,2)$
on $\mathrm{Mat}(2) \simeq \mathbb{R}^4$, we obtain

\[ SO(2,2) \cong SL(2,\mathbb R) \times SL(2,\mathbb R) \]
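
This last isomorphism is easy to test by hand or by machine. The sketch below (an illustration; the action $x \mapsto axb^{-1}$ and the helper names are assumptions made here) writes $\det x = x_{11}x_{22} - x_{12}x_{21}$ as a symmetric form $B$ on $\mathbb{R}^4$, reads off its signature, and checks that a pair $(a,b) \in SL(2,\mathbb R)\times SL(2,\mathbb R)$ acts by a linear map preserving $B$.

```python
import numpy as np

# det [[x0, x1], [x2, x3]] = x0 x3 - x1 x2 as a quadratic form x^T B x on R^4.
B = np.zeros((4, 4))
B[0, 3] = B[3, 0] = 0.5
B[1, 2] = B[2, 1] = -0.5
eig = np.linalg.eigvalsh(B)
signature = (int(np.sum(eig > 0)), int(np.sum(eig < 0)))

def random_sl2(rng):
    """Random real 2x2 matrix rescaled to determinant 1."""
    m = rng.normal(size=(2, 2))
    d = np.linalg.det(m)
    if d < 0:
        m[0] = -m[0]          # flip a row to make the determinant positive
        d = -d
    return m / np.sqrt(d)

def action_matrix(a, b):
    """4x4 matrix of the map x -> a x b^{-1} in the basis of matrix units."""
    b_inv = np.linalg.inv(b)
    cols = []
    for i in range(4):
        unit = np.zeros(4)
        unit[i] = 1.0
        cols.append((a @ unit.reshape(2, 2) @ b_inv).reshape(4))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
a, b = random_sl2(rng), random_sl2(rng)
L = action_matrix(a, b)
preserved = np.allclose(L.T @ B @ L, B)
same_for_minus = np.allclose(action_matrix(-a, -b), L)
```

The pairs $(a,b)$ and $(-a,-b)$ give the same map, so the homomorphism to $SO(2,2)$ has a kernel of order two.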


Each of the local isomorphisms in the complex case has different
real forms which admit geometrical interpretations. We mentioned
above two real forms of the isomorphism
$SL(4;\mathbb C) \cong SO(6;\mathbb C)$. The isomorphism for
$SU(2,2)$ admits another interpretation in the language of
Plücker's coordinates: points of the quadric in $\mathbb{R}P^5$
of signature $(2,4)$ can be interpreted as (complex) lines in
$\mathbb{C}P^3$ which lie on a Hermitian quadric of signature
$(2,2)$ (Gindikin 1983). The isomorphism above for the group
$SL(2,\mathbb H)$ also corresponds to Hopf's fibering of
$\mathbb{C}P^3$ by complex lines over the sphere $S^4$, or to the
isomorphism of $S^4$ and the quaternionic projective line
$\mathbb{H}P^1$. In all these cases, isomorphisms of homogeneous
manifolds intertwine the actions of locally isomorphic groups.








Pseudo-Riemann Symmetric Manifolds 


We obtain the next broad class of homogeneous 
manifolds if we preserve conditions that the group G 
is a real semisimple one, the isotropy subgroup H is 
involutive, but we remove the restriction that H 
must be (maximal) compact. Such symmetric mani- 
folds are often called semisimple pseudo-Riemann 
symmetric manifolds (since there are also pseudo- 
Riemann symmetric manifolds whose groups are not 
semisimple). This class of spaces contains symmetric 
Stein manifolds Xc=Gc/Hc. Each semisimple 
symmetric manifold X = G/H admits complexifica- 
tion as a symmetric Stein manifold. Each real 
semisimple Lie group G is symmetric relative to 
the group G x G. 

The simplest family of semisimple symmetric manifolds is the
family of all hyperboloids of all signatures

\[ H_{p,q}:\quad x_1^2 + \cdots + x_p^2 - x_{p+1}^2 - \cdots - x_{p+q}^2 = 1 \]

with the groups $SO(p,q)$. Their complexifications are complex
hyperboloids. There are two types of Riemann manifolds in these
families: the compact ones (spheres) and the noncompact ones
(two-sheeted hyperboloids); all others are pseudo-Riemann.

The Cartan duality holds for pseudo-Hermitian symmetric
manifolds: they are domains in compact Hermitian symmetric
manifolds (minimal flag manifolds) $Z = G_{\mathbb C}/P_{\mathbb C}$.
They are open orbits of real forms $G$ of the groups of
holomorphic automorphisms $G_{\mathbb C}$. We construct examples
of such manifolds if we consider one of the above-described
realizations of noncompact Hermitian symmetric manifolds (through
matrix homogeneous coordinates) and replace the condition of
positivity with the condition that the symmetric (Hermitian)
matrix in the definition has a fixed nondegenerate signature
$(i, k-i)$. We can call such pseudo-Hermitian symmetric manifolds
satellites of Hermitian ones. Correspondingly, we can consider
nonconvex tubes, for example, the set $T$ of symmetric matrices
whose imaginary parts have the signature $(i, n-i)$. This domain
is linear homogeneous, but it is not symmetric; to obtain the
symmetric manifold we need to extend the nonconvex tube by a
manifold of smaller dimension (which plays the role of infinity).

There are pseudo-Hermitian symmetric manifolds which are not
satellites of Hermitian ones. Let us give an interesting example.
The group $SL(2p,\mathbb R)$ has two open orbits on the
Grassmannian $\mathrm{Gr}_{\mathbb C}(p;2p)$ which are both
pseudo-Hermitian symmetric spaces. Let us consider as above the
Stiefel coordinates $Z \in \mathrm{Mat}_{\mathbb C}(p,2p)$ and let
$Z = X + iY$. Then the orbits are defined by the conditions

\[ \det\begin{pmatrix} X \\ Y \end{pmatrix} \gtrless 0 \]

In the intersection with the coordinate chart $Z = (E, z)$,
$z \in \mathrm{Mat}_{\mathbb C}(p)$, $z = x + iy$, we have the
conditions

\[ \det y \gtrless 0 \]

Therefore, we obtain (nonconvex) tube domains in
$\mathbb{C}^N = \mathrm{Mat}_{\mathbb C}(p)$, $N = p^2$,
corresponding to the nonconvex homogeneous cones $V_\pm$ of real
matrices with positive (negative) determinants. These tubes do
not coincide with the symmetric manifolds, which include also
some sets of small dimensions outside of the coordinate chart (at
“infinity”). There are other homogeneous nonconvex cones such
that the corresponding tube domains are Zariski open parts of
pseudo-Hermitian symmetric spaces (D'Atri and Gindikin 1993).
Among these cones are the cones of nondegenerate skew-symmetric
matrices and of skew-Hermitian quaternionic matrices. We again
observe strong connections with classical mathematics. Not all
pseudo-Hermitian symmetric manifolds admit such tube realizations
of dense parts. Analysis in pseudo-Hermitian symmetric manifolds
is very interesting: instead of holomorphic functions, we consider
there $\bar\partial$-cohomology of some degree.
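
The orbit invariant in this example can be checked numerically. The sketch below is an illustration under assumptions: the invariant is taken to be the sign of the determinant of the real $2p \times 2p$ matrix with rows $X$ and $Y$, where $Z = X + iY$, and the helper names are hypothetical. It verifies that this determinant reduces to $\det y$ in the chart $Z = (E, z)$, is unchanged under right multiplication by $g \in SL(2p,\mathbb R)$, and is multiplied by $|\det u|^2 > 0$ under the change of homogeneous coordinates $Z \mapsto uZ$.

```python
import numpy as np

def real_det(Z):
    """Determinant of the real 2p x 2p matrix with rows (X; Y), Z = X + iY."""
    return np.linalg.det(np.vstack([Z.real, Z.imag]))

def with_positive_det(m):
    """Copy of m with one row flipped, if needed, so that det > 0."""
    m = m.copy()
    if np.linalg.det(m) < 0:
        m[0] = -m[0]
    return m

rng = np.random.default_rng(3)
p = 2
z = rng.normal(size=(p, p)) + 1j * rng.normal(size=(p, p))
Z = np.hstack([np.eye(p), z])                 # chart Z = (E, z)
chart_ok = np.isclose(real_det(Z), np.linalg.det(z.imag))

g = with_positive_det(rng.normal(size=(2 * p, 2 * p)))
g = g / np.linalg.det(g) ** (1 / (2 * p))     # rescale to det g = 1
u = rng.normal(size=(p, p)) + 1j * rng.normal(size=(p, p))

d0 = real_det(Z)
d_right = real_det(Z @ g)                     # action of SL(2p, R)
d_left = real_det(u @ Z)                      # change of homogeneous coordinates
```

Since neither operation changes the sign, the two sign conditions define $SL(2p,\mathbb R)$-invariant open subsets of the Grassmannian.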

Geometric relations between different symmetric 
manifolds are usually important for analytic applica- 
tions since they can produce some nontrivial integral 
transformations. In a broad sense, such transforms are 
considered in integral geometry (Gelfand et al. 2003). 
An important example is duality between some 
compact Hermitian symmetric manifolds (when points 
in one of them are interpreted as submanifolds in 
another one). The simplest example is the projective 
duality between dual copies of projective spaces or, 
more generally, the realization of points of Grass- 
mannians as projective planes. Such a duality can 
induce a duality between orbits of real forms of groups. 
In a special case, it can be a duality between Hermitian 
and pseudo-Hermitian symmetric manifolds. 

Here is one important example. Let us consider in the projective
space $\mathbb{C}P^{2k-1}$ the domain $D$ which in homogeneous
coordinates, rows $z = (z_0, z_1, \ldots, z_n)$ with $n = 2k - 1$,
is defined by the inequality $zHz^* > 0$, where $H$ is a Hermitian
form of signature $(k,k)$, for example,

\[ |z_0|^2 + \cdots + |z_{k-1}|^2 - |z_k|^2 - \cdots - |z_n|^2 > 0 \]

This domain is $(k-1)$-pseudoconcave and it contains
$(k-1)$-dimensional complex compact cycles, namely
$(k-1)$-dimensional planes. The manifold of these planes is
exactly the domain $X$ in the Grassmannian
$\mathrm{Gr}_{\mathbb C}(k;2k)$ (of projective $(k-1)$-planes)
which is the noncompact Hermitian symmetric space, the orbit of
the group $SU(k,k)$ (see above). This picture is the geometrical
basis for a deep analytic construction. In the domain $D$ the
spaces of $(k-1)$-dimensional $\bar\partial$-cohomology are
infinite dimensional for some coefficients. Their integration on
$(k-1)$-planes (the Penrose transform) gives sections of the
corresponding vector bundles on $X$. The images are described by
differential equations, the generalized massless equations. The
basic twistor theory corresponds to $k = 2$, when $X$ is
isomorphic to the four-dimensional future tube (see above).

Similar dual realizations of Hermitian symmetric manifolds exist
only in special cases. The twistor realization of the
four-dimensional future tube was possible since the Grassmannian
$\mathrm{Gr}_{\mathbb C}(2;4)$ is isomorphic to the quadric in
$\mathbb{C}P^5$. This does not work for the future tubes of bigger
dimensions, but there is another possibility (Gindikin 1998). Let
the quadric $Q_{n-1} \subset \mathbb{C}P^n$ be defined in the
homogeneous coordinates by the equation

\[ Q(z) = (z_0)^2 - (z_1)^2 - \cdots - (z_n)^2 = 0 \]

and let $z\cdot\zeta$ be the corresponding bilinear form. As
already mentioned, the set of (nondegenerate) hyperplane sections

\[ \zeta\cdot z = 0, \qquad \zeta \in \mathbb{C}^{n+1},\ Q(\zeta) = 1 \]

of $Q_{n-1}$ is the corresponding hyperboloid $H_n$. Thus, we have
the duality between a flag manifold (the quadric $Q_{n-1}$) and a
symmetric Stein manifold (the hyperboloid $H_n$) with the same
group $SO(n+1,\mathbb C)$; they have different dimensions.

The group $SO(1,n)$ has two orbits on $Q_{n-1}$: the real quadric
$Q_{\mathbb R} = \{z \in Q_{n-1};\ \Im(z) = 0\}$ and its complement
$X = Q_{n-1}\setminus Q_{\mathbb R}$. Hyperplane sections which do
not intersect $Q_{\mathbb R}$ (lie in $X$) correspond to those
$\zeta \in H_n$ for which

\[ Q(\Re(\zeta)) > 0 \]

This set has two connected components $D_\pm$ which are
biholomorphically equivalent to the future and past tubes $T_\pm$
of dimension $n$. Let us emphasize that their group of
automorphisms is $SO(2,n)$ in spite of the fact that this group
acts neither on $X$ nor on $H_n$. Such an extension of the
symmetry group is a very interesting phenomenon. It happens for
several other symmetric manifolds, but is not a general fact. This
geometrical construction makes it possible to construct a
multidimensional version of the Penrose transform from
$(n-2)$-dimensional $\bar\partial$-cohomology with different
coefficients into solutions of massless equations on the future
(past) tubes.

The last duality is connected with some general 
geometrical construction. We mentioned that each of 
the Riemann symmetric manifolds X = G/K admits a 
canonical embedding in the symmetric Stein manifold 
Xc = Gc/Kc. It turns out that X has in Xc a canonical 
Stein neighborhood — the complex crown Q(X) such 
that many analytic objects on X can be holomorphi- 
cally extended on the crown (Gindikin 2002). For 
example, all solutions of all invariant differential 
equations on X (which are elliptic) admit such 
holomorphic extension. In the last example, $D$ is the crown of
the Riemann symmetric space which is defined, in $H_n$, by the
condition $\Im(\zeta) = 0$, $\Re(\zeta_0) > 0$.

Symmetric manifolds are distinguished from most 
other homogeneous manifolds by a very rich 
geometry which is a background for deep analytic 
considerations. There are several important nonsym- 
metric homogeneous manifolds. We already men- 
tioned flag manifolds and Stein homogeneous 
manifolds with complex semisimple Lie groups 
which can be nonsymmetric. Pseudo-Riemann sym- 
metric manifolds are open orbits of real groups on 
compact Hermitian symmetric spaces. It turns out 
that open orbits on other flag manifolds also 
produce interesting homogeneous manifolds. Let 
F=Gc/Pc be a flag manifold. Flag domains are 
open orbits of a real form G on F. Of course, 
pseudo-Hermitian symmetric manifolds are a special 
case of this construction. Let us consider a simple example with
$G_{\mathbb C} = SL(3;\mathbb C)$ and $P$ the triangular group.
Then points of $F$ are pairs {a point $z$ and a line $l$ passing
through it}. Let $G = SU(2;1)$; it has two open orbits on
$\mathbb{C}P^2$: the complex ball $D$ and its complement $D^c$. On
$F$, the group $G$ has three open orbits (flag domains): in the
first, $z \in D$ and $l$ is arbitrary; in the second,
$l \subset D^c$; in the third, $z \in D^c$ and $l$ intersects $D$.
They are all 1-pseudoconcave. All three discrete series of unitary
representations of $SU(2,1)$ are realized in the one-dimensional
$\bar\partial$-cohomology of these flag domains with coefficients
in line bundles. For arbitrary semisimple Lie groups, all discrete
series of representations can also be realized in
$\bar\partial$-cohomology of flag domains. Crowns of Riemann
symmetric spaces, which we just mentioned, parametrize cycles
(complex compact submanifolds) in flag domains. Some general
version of the Penrose transform connects, through integration
along cycles, cohomology in flag domains with holomorphic
solutions of some differential equations in crowns (generalized
massless equations).


See also: Combinatorics: Overview; Compact Groups 
and their Representations; Lie Groups: General Theory; 
Pseudo-Riemannian Nilpotent Lie Groups; Several 
Complex Variables: Compact Manifolds; Stability of 
Minkowski Space; Symmetry Classes in Random Matrix 
Theory; Twistor Theory: Some Applications; Twistors. 


Introduction


The notion of “classical r-matrices” has emerged as 
a by-product of the quantum inverse scattering 
method (which was developed mainly by L D 
Faddeev and his team in their work at the Steklov 
Mathematical Institute in Leningrad); it has given a 
new insight into the study of Hamiltonian structures 
associated with classical integrable systems solvable 
by the classical inverse scattering method and its 
generalizations. Important classification results for 
classical r-matrices are due to Belavin and Drinfeld. 
Based on the initial results of Sklyanin, Drinfeld 
introduced the important concepts of “Poisson Lie 
groups” and “Lie bialgebras” which arise as a 
semiclassical approximation in the study of quan- 
tum groups. 

A Poisson group is a Lie group $G$ equipped with a Poisson
bracket such that the multiplication $m: G \times G \to G$ is a
Poisson mapping. A Poisson bracket on $G$ with this property is
called multiplicative. More explicitly, let $\lambda_x, \rho_x$ be
the left and right translation operators in $C^\infty(G)$ by an
element $x \in G$, $\lambda_x\varphi(y) = \varphi(xy)$,
$\rho_x\varphi(y) = \varphi(yx)$. Multiplication in $G$ is a
Poisson mapping if for any $\varphi, \psi \in C^\infty(G)$ we have

\[ \{\varphi,\psi\}(xy) = \{\lambda_x\varphi, \lambda_x\psi\}(y) + \{\rho_y\varphi, \rho_y\psi\}(x) \qquad [1] \]

Note that in general, multiplicative brackets are neither left
nor right invariant; in other words, for fixed $x$ the
translation operators $\lambda_x, \rho_x$ do not preserve Poisson
brackets.

Multiplicative Poisson brackets naturally arise in the 
study of integrable systems which admit the so-called 
“zero-curvature representation.” The study of zero- 
curvature equations, and in particular, of the Poisson 
properties of the associated monodromy map, was the 
main source of nontrivial examples (associated with 
classical r-matrices, classical Yang—Baxter equations, 
and factorizable Lie bialgebras). The special class of 
multiplicative Poisson brackets encountered in this 
context is closely related to factorization problems in 
Lie groups (in particular, the matrix Riemann pro- 
blem); these problems represent the key tools in 
constructing solutions of zero-curvature equations. 

The equivalent definition of Poisson Lie groups uses the dual
language of “Hopf algebras.” Let $A = F(G)$ be the commutative
algebra of (smooth) functions on a Lie group $G$ equipped with the
standard coproduct $\Delta: A \to A \otimes A$,

\[ \Delta\varphi(x,y) = \varphi(xy), \qquad \varphi \in F(G),\ x,y \in G \]

(as usual, we identify the (topological) tensor product
$F(G)\otimes F(G)$ with $F(G\times G)$). The multiplicative
Poisson bracket on $G$ equips $F(G)$ with the structure of a
Poisson-Hopf algebra, that is

\[ \Delta\{\varphi,\psi\} = \{\Delta\varphi, \Delta\psi\} \qquad [2] \]


Equation [2] is the starting point for the study of relations
between Poisson groups and quantum groups. Following the general
philosophy of deformation quantization, we can look for a
deformation $A_h$ of the commutative Hopf algebra $A$ with the
deformation germ determined by the Poisson structure on $A$
satisfying eqn [2]. The fundamental theorem (conjectured by
Drinfeld and proved by Etingof and Kazhdan) asserts that any
Poisson algebra associated with a Poisson group admits a formal
quantization (in the category of Hopf algebras).


Poisson Groups and Lie Bialgebras 


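Let $G$ be a Lie group with Lie algebra $\mathfrak g$ equipped
with a multiplicative Poisson bracket. Any Poisson bracket is
bilinear in differentials of functions; it is convenient to
express it by means of right- or left-invariant differentials. For
$\varphi \in F(G)$ set

\[ \langle \nabla\varphi(x), X\rangle = (d/dt)_{t=0}\,\varphi(e^{tX}x), \]
\[ \langle \nabla'\varphi(x), X\rangle = (d/dt)_{t=0}\,\varphi(xe^{tX}), \]
\[ X \in \mathfrak g, \qquad \nabla\varphi(x),\ \nabla'\varphi(x) \in \mathfrak g^* \]

Let us define the Poisson operator
$\eta: G \to \mathrm{Hom}(\mathfrak g^*, \mathfrak g)$ by setting

\[ \{\varphi,\psi\}(x) = \langle \eta(x)\nabla\varphi(x), \nabla\psi(x)\rangle \qquad [3] \]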


For a finite-dimensional Lie algebra, we can identify
$\mathrm{Hom}(\mathfrak g^*, \mathfrak g)$ with
$\mathfrak g\otimes\mathfrak g$; the skew symmetry of the Poisson
bracket implies that $\eta(x) \in \mathfrak g\wedge\mathfrak g$. By
an abuse of language, the same identification is traditionally
used for infinite-dimensional algebras (e.g., for loop algebras)
as well. Of course, in the latter case, the corresponding Poisson
tensors are represented by singular kernels which do not lie in
the algebraic tensor product and should be regarded as
distributions.

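Multiplicativity of the Poisson bracket on $G$ implies a
functional equation for $\eta$,

\[ \eta(xy) = (\mathrm{Ad}\,x \otimes \mathrm{Ad}\,x)\cdot\eta(y) + \eta(x) \qquad [4] \]

which means that $\eta$ is a 1-cocycle on $G$ (with values in
$\mathfrak g\wedge\mathfrak g$). By setting

\[ \delta(X) = \left(\frac{d}{dt}\right)_{t=0}\eta(e^{tX}), \qquad X \in \mathfrak g \]

we conclude from eqn [4] that
$\delta: \mathfrak g \to \mathfrak g\wedge\mathfrak g$ is a
1-cocycle on $\mathfrak g$, that is,

\[ \delta([X,Y]) = [X\otimes I + I\otimes X, \delta(Y)] - [Y\otimes I + I\otimes Y, \delta(X)] \]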

Equation [4] implies that $\eta(e) = 0$, that is, a
multiplicative Poisson structure is identically zero at the unit
element. Its linearization at this point induces the structure of
a Lie algebra on the cotangent space
$T_e^*G \simeq \mathfrak g^*$; namely, for any
$\xi, \xi' \in \mathfrak g^*$, choose $\varphi, \varphi' \in F(G)$
in such a way that $\nabla\varphi(e) = \xi$,
$\nabla\varphi'(e) = \xi'$, and set

\[ [\xi,\xi']_* = \nabla\{\varphi,\varphi'\}(e) \qquad [5] \]

It is easy to see that
$\langle[\xi,\xi']_*, X\rangle = \langle\xi\wedge\xi', \delta(X)\rangle$,
which proves that the bracket is well defined, while eqn [5]
implies the Jacobi identity.


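Definition 1 Let $\mathfrak g, \mathfrak g^*$ be a pair of linear
spaces set in duality; $(\mathfrak g, \mathfrak g^*)$ is called a
Lie bialgebra if both $\mathfrak g$ and $\mathfrak g^*$ are Lie
algebras and the mapping
$\delta: \mathfrak g \to \mathfrak g\wedge\mathfrak g$ which is
dual to the commutator map
$[\,,\,]_*: \mathfrak g^*\wedge\mathfrak g^* \to \mathfrak g^*$ is
a 1-cocycle on $\mathfrak g$.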


Thus if $G$ is a Poisson-Lie group, the pair
$(\mathfrak g, \mathfrak g^*)$ is a Lie bialgebra (called the
“tangent Lie bialgebra” of $G$). Poisson-Lie groups form a
category in which the morphisms are Lie group homomorphisms which
are also Poisson mappings. A morphism
$(\mathfrak g, \mathfrak g^*) \to (\mathfrak h, \mathfrak h^*)$ in
the category of Lie bialgebras is a Lie algebra homomorphism
$\mathfrak g \to \mathfrak h$ such that the dual map
$\mathfrak h^* \to \mathfrak g^*$ is again a Lie algebra
homomorphism. It is easy to see that morphisms of Poisson groups
induce morphisms of their tangent bialgebras. The converse is
also true.


Theorem 1 


(i) Let $(\mathfrak g, \mathfrak g^*)$ be a Lie bialgebra, $G$ a
connected, simply connected Lie group with Lie algebra
$\mathfrak g$. There is a unique multiplicative Poisson bracket
on $G$ such that $(\mathfrak g, \mathfrak g^*)$ is its tangent
Lie bialgebra.

(ii) Morphisms of Lie bialgebras induce Poisson mappings of the
corresponding Poisson groups.


Basically, the theorem asserts that a Poisson 
tensor is uniquely restored from the infinitesimal 
cocycle on the corresponding Lie algebra; moreover, 
the obstruction for the Jacobi identity vanishes 
globally if this is true for its infinitesimal part at 
the unit element of the group. 

It is important to observe that Lie bialgebras possess a
remarkable symmetry: if $(\mathfrak g, \mathfrak g^*)$ is a Lie
bialgebra, the same is true for $(\mathfrak g^*, \mathfrak g)$.
Hence, the dual group $G^*$ (which corresponds to
$\mathfrak g^*$) also carries a multiplicative Poisson bracket.
The duality theory for Lie bialgebras, based on the key notion of
the Drinfeld double, is discussed in the next section.


Classical r-Matrices and Special
Classes of Lie Bialgebras


The general classification problem for Lie bialgebras 
is unfeasible (e.g., classification of abelian Lie 
bialgebras includes classification of all Lie algebras). 
In applications, one mainly deals with important 
special classes of Lie bialgebras, of which factoriz- 
able Lie bialgebras are probably the most important. 
In a sense, this class may be regarded as exhaustive, 
since (as explained below) any Lie bialgebra is 
canonically embedded into a factorizable one. 
Various other special classes discussed in literature 
are “coboundary bialgebras,” “triangular bialge- 
bras,” and “quasitriangular bialgebras.” 

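The Lie bialgebra $(\mathfrak g, \mathfrak g^*, \delta)$ is
called a coboundary bialgebra if the cobracket $\delta$ is a
trivial 1-cocycle on $\mathfrak g$, that is,

\[ \delta(X) = [X\otimes I + I\otimes X, r] \quad \text{for all } X \in \mathfrak g \qquad [6] \]

the constant element $r \in \mathfrak g\wedge\mathfrak g$ is
called the “classical r-matrix.” If $\mathfrak g$ is semisimple,
$H^1(\mathfrak g, V) = 0$ for any $\mathfrak g$-module $V$ by the
classical Whitehead theorem, and hence all Lie bialgebra
structures on $\mathfrak g$ are of coboundary type. The
associated Lie bracket on $\mathfrak g^*$ is given by the formula

\[ [\xi,\xi']_* = \mathrm{ad}^*_{\mathfrak g}\, r\xi\cdot\xi' - \mathrm{ad}^*_{\mathfrak g}\, r\xi'\cdot\xi \qquad [7] \]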


where we identified $r \in \mathfrak g\wedge\mathfrak g$ with a
skew-symmetric linear operator $r: \mathfrak g^* \to \mathfrak g$.
The restrictions imposed on $r$ by the Jacobi identity are
formulated in terms of the so-called “Yang-Baxter tensor”
$[[r,r]] \in \mathfrak g\wedge\mathfrak g\wedge\mathfrak g$, which
is a quadratic expression in $r$. To define it, let us mark
different factors in tensor products, for example,
$\mathfrak g\otimes\mathfrak g\otimes\mathfrak g$, by fixed
numbers 1, 2, 3, ... which indicate their place; for simplicity,
we assume that $\mathfrak g$ is embedded in an associative
algebra $A$ with a unit. The embeddings are defined as

\[ i_{12}, i_{23}, i_{13}: \mathfrak g\otimes\mathfrak g \to A\otimes A\otimes A \]

by setting $i_{12}(X\otimes Y) = X\otimes Y\otimes I$, and
similarly in other cases. For
$a \in \mathfrak g\otimes\mathfrak g$, we put $i_{12}(a) = a_{12}$,
etc. Set

\[ [[r,r]] = [r_{12},r_{13}] + [r_{12},r_{23}] + [r_{13},r_{23}] \qquad [8] \]

The commutators in the RHS are computed in the associative
algebra $A\otimes A\otimes A$; it is easy to check that the
result does not depend on the choice of the embedding
$\mathfrak g \hookrightarrow A$.
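
The Yang-Baxter tensor can be computed explicitly in a matrix representation. The following sketch is an illustration only and not taken from the text: it assumes $\mathfrak g = sl(2)$ embedded in the algebra $A$ of $2\times 2$ matrices, with the standard basis $e, f, h$, and uses the well-known element $r_+ = e\otimes f + \tfrac14 h\otimes h$, its skew part $r = \tfrac12(e\otimes f - f\otimes e)$, and the tensor Casimir $t = e\otimes f + f\otimes e + \tfrac12 h\otimes h$ of the trace form. The function names are hypothetical.

```python
import numpy as np

e = np.array([[0.0, 1.0], [0.0, 0.0]])
f = np.array([[0.0, 0.0], [1.0, 0.0]])
h = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)

def embed(terms, slots):
    """Image of sum_k X_k (x) Y_k under i_{slots} in A (x) A (x) A."""
    total = np.zeros((8, 8))
    for x, y in terms:
        factors = [I2, I2, I2]
        factors[slots[0]] = x
        factors[slots[1]] = y
        total += np.kron(np.kron(factors[0], factors[1]), factors[2])
    return total

def comm(a, b):
    return a @ b - b @ a

def yang_baxter(terms):
    """[[r, r]] = [r12, r13] + [r12, r23] + [r13, r23]."""
    r12, r13, r23 = (embed(terms, s) for s in [(0, 1), (0, 2), (1, 2)])
    return comm(r12, r13) + comm(r12, r23) + comm(r13, r23)

r_plus = [(e, f), (0.25 * h, h)]          # r_+ = e(x)f + (1/4) h(x)h
r_skew = [(0.5 * e, f), (-0.5 * f, e)]    # skew-symmetric part of r_+
casimir = [(e, f), (f, e), (0.5 * h, h)]  # tensor Casimir t

yb_plus = yang_baxter(r_plus)             # vanishes: classical Yang-Baxter equation
yb_skew = yang_baxter(r_skew)             # nonzero invariant tensor
tt = comm(embed(casimir, (0, 1)), embed(casimir, (1, 2)))   # [t12, t23]

# Ratio of [[r, r]] to [t12, t23]; its value and sign depend on conventions.
c = np.sum(yb_skew * tt) / np.sum(tt * tt)
proportional = np.allclose(yb_skew, c * tt)
```

In this example the non-skew element $r_+$ satisfies the classical Yang-Baxter equation, while its skew part has a nonzero Yang-Baxter tensor proportional to $[t_{12}, t_{23}]$.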


Proposition 1 The Jacobi identity for $[\,,\,]_*$ is valid if and
only if $[[r,r]]$ is ad $\mathfrak g$-invariant, that is, if

\[ [X\otimes I\otimes I + I\otimes X\otimes I + I\otimes I\otimes X,\ [[r,r]]] = 0 \quad \text{for all } X \in \mathfrak g \]

A coboundary Lie bialgebra with
$[[r,r]] \in (\wedge^3\mathfrak g)^{\mathfrak g}$ is called
“quasitriangular”; it is called “triangular” if $r$ satisfies the
classical Yang-Baxter equation $[[r,r]] = 0$. (Both terms come
from another name of the classical Yang-Baxter equation, the
“classical triangle equation.”)

When a Lie algebra $\mathfrak g$ admits a nondegenerate invariant
inner product, the class of quasitriangular Lie bialgebra
structures on $\mathfrak g$ admits an important specialization.
Let $\mathfrak g\otimes\mathfrak g^* \simeq \mathfrak g\otimes\mathfrak g$
be the natural isomorphism induced by the inner product. Let
$I \in \mathfrak g\otimes\mathfrak g^*$ be the canonical element;
its image $t \in \mathfrak g\otimes\mathfrak g$ under this
isomorphism is called the “tensor Casimir element.” Clearly,
$t \in (S^2\mathfrak g)^{\mathfrak g}$ and, moreover,
$[t_{12},t_{23}] \in (\wedge^3\mathfrak g)^{\mathfrak g}$. When
$\mathfrak g$ is semisimple, the mapping
$(S^2\mathfrak g)^{\mathfrak g} \to (\wedge^3\mathfrak g)^{\mathfrak g}: s \mapsto [s_{12}, s_{23}]$
is an isomorphism; in particular, if $\mathfrak g$ is simple, both
spaces are one dimensional and generated by a tensor Casimir
(which is unique up to a scalar multiple). A Lie bialgebra
$(\mathfrak g, r)$ is called factorizable if
$r \in \mathfrak g\wedge\mathfrak g$ satisfies the modified
classical Yang-Baxter equation

\[ [[r,r]] = c\,[t_{12},t_{23}], \qquad c = \mathrm{const} \ne 0 \qquad [9] \]

The convenient normalization is $c = -1/4$ (it can be achieved by
an appropriate normalization of $r$). Instead of dealing with the
modified Yang-Baxter equation, we may relax the antisymmetry
condition imposed on $r$. Set
$r_\pm = r \pm (1/2)t \in \mathfrak g\otimes\mathfrak g$. Since
$t$ is ad $\mathfrak g$-invariant, the symmetric part of $r_\pm$
drops out from the cobracket; on the other hand, one has
$[[r_\pm, r_\pm]] = 0$. Regarding $r_\pm$ as a linear operator,
$r_\pm \in \mathrm{Hom}(\mathfrak g^*, \mathfrak g)$, we get the
following important result:

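Proposition 2 Let $(\mathfrak g, \mathfrak g^*)$ be a
factorizable Lie bialgebra.


(i) The mappings
$r_\pm \in \mathrm{Hom}(\mathfrak g^*, \mathfrak g)$ are Lie
algebra homomorphisms; moreover, $r_+^* = -r_-$.
(ii) The combined mapping

\[ i_r: \mathfrak g^* \to \mathfrak g\oplus\mathfrak g: X \mapsto (r_+X, r_-X) \]

is a Lie algebra embedding.
(iii) Any $X \in \mathfrak g$ admits a unique decomposition
$X = X_+ - X_-$ with $(X_+, X_-) \in \mathrm{Im}\, i_r$.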


The additive decomposition in a factorizable Lie bialgebra gives
rise to a multiplicative factorization problem in the associated
Lie group. Namely, $i_r$ may be extended to a Lie group embedding
$i_r: G^* \to G \times G$, and any $x \in G$ which is sufficiently
close to the unit element admits a decomposition
$x = x_+x_-^{-1}$ with $(x_+, x_-) \in \mathrm{Im}\, i_r$.
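
The additive decomposition is transparent in the simplest case. The sketch below is an illustration under assumptions not made in the text: $\mathfrak g = sl(2)$ with the trace form $\langle a, b\rangle = \mathrm{tr}(ab)$ used to identify $\mathfrak g^*$ with $\mathfrak g$, and $r_\pm = r \pm \tfrac12 t$ taken as $r_+ = e\otimes f + \tfrac14 h\otimes h$ and $r_- = -f\otimes e - \tfrac14 h\otimes h$, turned into operators by contracting the second tensor factor. The names are hypothetical.

```python
import numpy as np

e = np.array([[0.0, 1.0], [0.0, 0.0]])
f = np.array([[0.0, 0.0], [1.0, 0.0]])
h = np.array([[1.0, 0.0], [0.0, -1.0]])

def inner(a, b):
    """Invariant inner product <a, b> = tr(ab) on sl(2)."""
    return np.trace(a @ b)

def as_operator(terms, X):
    """Contract the second tensor factor with X: sum_k a_k <b_k, X>."""
    return sum(a * inner(b, X) for a, b in terms)

r_plus = [(e, f), (0.25 * h, h)]       # r_+ = r + t/2
r_minus = [(-f, e), (-0.25 * h, h)]    # r_- = r - t/2

rng = np.random.default_rng(0)
m = rng.normal(size=(2, 2))
X = m - np.trace(m) / 2 * np.eye(2)    # random traceless matrix

X_plus = as_operator(r_plus, X)        # upper triangular part, half of the diagonal
X_minus = as_operator(r_minus, X)      # minus (lower part plus half of the diagonal)
```

Here $X = X_+ - X_-$, with $X_+$ upper triangular, $X_-$ lower triangular, and opposite diagonal parts; on the group level this corresponds to a triangular factorization $x = x_+x_-^{-1}$ with mutually inverse diagonal parts.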

Any Lie bialgebra $(\mathfrak g, \mathfrak g^*)$ admits a
canonical embedding into a larger Lie bialgebra (called its
“double”) which is already factorizable. Namely, set
$\mathfrak d = \mathfrak g\oplus\mathfrak g^*$ as a linear space
and equip it with the natural inner product,

\[ \langle\langle (X,F), (X',F') \rangle\rangle = \langle F, X'\rangle + \langle F', X\rangle \qquad [10] \]

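Theorem 2


(i) There exists a unique structure of the Lie algebra
on $\mathfrak{d}$ such that: (a) $\mathfrak{g}, \mathfrak{g}^* \subset \mathfrak{d}$ are Lie subalgebras;
(b) the inner product [10] is invariant.

(ii) Let $P_{\mathfrak{g}}, P_{\mathfrak{g}^*}$ be the projection operators onto
$\mathfrak{g}, \mathfrak{g}^* \subset \mathfrak{d}$ parallel to the complementary subalgebra.
Set $r_{\mathfrak{d}} = P_{\mathfrak{g}^*} - P_{\mathfrak{g}}$; then $(\mathfrak{d}, r_{\mathfrak{d}})$ is a
factorizable Lie bialgebra.

(iii) The inclusion map $(\mathfrak{g}, \mathfrak{g}^*) \to (\mathfrak{d}, \mathfrak{d}^*)$ is a homomorphism
of Lie bialgebras and the dual inclusion
map $(\mathfrak{g}^*, \mathfrak{g}) \to (\mathfrak{d}, \mathfrak{d}^*)$ is an antihomomorphism.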


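Conversely, let $\mathfrak{a}$ be a Lie algebra equipped with a
nondegenerate invariant inner product, and $\mathfrak{a}_\pm \subset \mathfrak{a}$ its Lie
subalgebras such that (i) $\mathfrak{a}_\pm$ are isotropic with respect
to the inner product, (ii) $\mathfrak{a} = \mathfrak{a}_+ \dotplus \mathfrak{a}_-$ as a linear space.
The triple $(\mathfrak{a}, \mathfrak{a}_+, \mathfrak{a}_-)$ is called a "Manin triple." Let
$P_\pm$ be the projection operators onto $\mathfrak{a}_\pm$ in this
decomposition. Set $r_\pm = \pm P_\pm$. Then $(\mathfrak{a}, r_\pm)$ is a
factorizable Lie bialgebra; moreover, $\mathfrak{a}_+$ and $\mathfrak{a}_-$ are
set into duality by the inner product in $\mathfrak{a}$ and inherit
the structure of a Lie bialgebra, and $\mathfrak{a}$ is their double.

If $(\mathfrak{g}, \mathfrak{g}^*)$ is itself a factorizable Lie bialgebra, its
double admits a simple explicit description. Set
$\mathfrak{d} = \mathfrak{g} \oplus \mathfrak{g}$ (direct sum of Lie algebras); let us equip
$\mathfrak{d}$ with the inner product

$$\langle\langle (X, X'), (Y, Y') \rangle\rangle = \langle X, Y \rangle - \langle X', Y' \rangle$$

Let $\mathfrak{g}^\delta \subset \mathfrak{d}$ be the diagonal subalgebra; we identify
$\mathfrak{g}^*$ with the embedded subalgebra $i_r(\mathfrak{g}^*) \subset \mathfrak{d}$.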


Proposition 3


(i) $(\mathfrak{d}, \mathfrak{g}^\delta, i_r(\mathfrak{g}^*))$ is a Manin triple.
(ii) As a Lie algebra, $\mathfrak{d} = \mathfrak{g} \oplus \mathfrak{g}$ is isomorphic to the
double of $\mathfrak{g}$.


Key examples of factorizable Lie bialgebras are 
associated with semisimple Lie algebras and their 
loop algebras. 


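1. Let $\mathfrak{k}$ be a compact semisimple Lie algebra, $\mathfrak{g} = \mathfrak{k}_{\mathbb{C}}$
its complexification regarded as a real Lie algebra,
$\sigma \in \mathrm{Aut}\, \mathfrak{g}$ the Cartan involution which fixes $\mathfrak{k}$, and
$\mathfrak{g} = \mathfrak{k} \oplus \mathfrak{p}$ the associated Cartan decomposition.
Fix a real split Cartan subalgebra $\mathfrak{a} \subset \mathfrak{p}$ and the
associated Iwasawa decomposition $\mathfrak{g} = \mathfrak{k} \dotplus \mathfrak{a} \dotplus \mathfrak{n}$;
put $\mathfrak{s} = \mathfrak{a} \dotplus \mathfrak{n}$. Let $B$ be the complex Killing form
on $\mathfrak{g}$; let us equip $\mathfrak{g}$ with the real inner product
$(X, Y) = \mathrm{Im}\, B(X, Y)$; then $(\mathfrak{g}, \mathfrak{k}, \mathfrak{s})$ is a Manin
triple. Hence, any compact semisimple Lie group
$K$ carries a natural Poisson structure; its double
$G = D(K)$ is the complex group $G = K_{\mathbb{C}}$ (regarded
as a real Lie group). The associated factorization
problem in $G$ is the Iwasawa decomposition
$G = KAN$, which exists globally.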


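2. Let $\mathfrak{g}$ be a real split semisimple Lie algebra, $\mathfrak{h}$ its
Cartan subalgebra, and $\Delta_+$ a system of positive
roots. Fix an invariant inner product on $\mathfrak{g}$ which
is positive on $\mathfrak{h}$, and let $\{e_\alpha;\ \alpha \in \pm\Delta_+\}$ be the root
vectors normalized in such a way that
$\langle e_\alpha, e_{-\alpha} \rangle = 1$. Let

$$\mathfrak{n}_\pm = \bigoplus_{\alpha \in \Delta_+} \mathbb{R} \cdot e_{\pm\alpha}$$

Fix an orthonormal basis $\{H_i\}$ in $\mathfrak{h}$; let $P_\pm, P_0$
be the projection operators onto $\mathfrak{n}_\pm, \mathfrak{h}$ in the
Bruhat decomposition $\mathfrak{g} = \mathfrak{n}_- \dotplus \mathfrak{h} \dotplus \mathfrak{n}_+$. The
standard Lie bialgebra structure on $\mathfrak{g}$ is given
by the r-matrices $r_\pm = \pm P_\pm \pm \tfrac12 P_0$. In tensor
notation,

$$r_\pm = \pm \sum_{\alpha \in \Delta_+} e_{\pm\alpha} \otimes e_{\mp\alpha} \pm \frac12 \sum_i H_i \otimes H_i \qquad [11]$$

Let $\mathfrak{b}_\pm = \mathfrak{h} \dotplus \mathfrak{n}_\pm$ be the opposite Borel subalgebras;
the inner product in $\mathfrak{g}$ sets them into
duality, and $(\mathfrak{b}_+, \mathfrak{b}_-)$ is a Lie sub-bialgebra
in $(\mathfrak{g}, \mathfrak{g}^*)$. Let $G$ be the connected, simply
connected Lie group associated with $\mathfrak{g}$, and $B_\pm = HN_\pm$
its opposite Borel subgroups which correspond
to $\mathfrak{b}_\pm$. Let $p : B_\pm \to B_\pm/N_\pm \simeq H$ be the
canonical projection. The associated factorization
problem in $G$, $g = b_+ b_-^{-1}$, $(b_+, b_-) \in B_+ \times B_-$,
$p(b_+) = p(b_-)^{-1}$, is closely related to the
Bruhat decomposition; it is solvable for all $g$ in
the open Bruhat cell $B_+ N_- \subset G$.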


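3. Let $L\mathfrak{g} = \mathfrak{g} \otimes \mathbb{C}((z))$ be the loop algebra of a
finite-dimensional semisimple Lie algebra $\mathfrak{g}$; as usual, we
denote the ring of formal Laurent series by $\mathbb{C}((z))$.
Put $L\mathfrak{g}_+ = \mathfrak{g} \otimes \mathbb{C}[[z]]$, $L\mathfrak{g}_- = \mathfrak{g} \otimes z^{-1}\mathbb{C}[z^{-1}]$. Fix an
invariant inner product on $\mathfrak{g}$ and equip $L\mathfrak{g}$ with
the inner product

$$\langle\langle X, Y \rangle\rangle = \mathrm{Res}_{z=0}\, \langle X(z), Y(z) \rangle\, dz$$

Then $(L\mathfrak{g}, L\mathfrak{g}_+, L\mathfrak{g}_-)$ is a Manin triple. The associated
classical r-matrix is called the "rational r-matrix";
in tensor notation, it is represented by a singular kernel,
essentially the Cauchy kernel,

$$r(z, z') = \frac{t}{z - z'}$$

where $t \in \mathfrak{g} \otimes \mathfrak{g}$ is the tensor Casimir.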


4. Let us assume that $\mathfrak{g} = sl(n)$; in this case, the loop
algebra $L\mathfrak{g}$ admits a nontrivial decomposition
associated with the so-called "elliptic r-matrix."
Set

$$I_1 = \mathrm{diag}(1, \epsilon, \ldots, \epsilon^{n-1}), \qquad
I_2 = \begin{pmatrix} 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ 1 & 0 & \cdots & 0 \end{pmatrix}, \qquad
\epsilon = e^{2\pi i/n} \qquad [12]$$

Put $\mathbb{Z}_n^2 = \mathbb{Z}/n\mathbb{Z} \times \mathbb{Z}/n\mathbb{Z}$; for $a = (a_1, a_2) \in \mathbb{Z}_n^2$,
set $I_a = I_1^{a_1} I_2^{a_2}$; the matrices $I_a$ define an irreducible
projective representation of $\mathbb{Z}_n^2$ (they form the
so-called "finite Heisenberg group"). Let us denote
the elliptic curve of modulus $\tau$ by $E = \mathbb{C}/(\mathbb{Z} + \tau\mathbb{Z})$
and let $P \to E$ be the n-dimensional holomorphic
vector bundle with flat connection and with
monodromies given by

$$z \mapsto z + 1 : h_1 = \mathrm{Ad}\, I_1, \qquad z \mapsto z + \tau : h_2 = \mathrm{Ad}\, I_2$$

Let $\mathfrak{g}_E \subset L\mathfrak{g}$ be the subspace of Laurent expansions
at zero of the global meromorphic sections of $P$
with a unique pole at $0 \in E$. Then $(L\mathfrak{g}, L\mathfrak{g}_+, \mathfrak{g}_E)$ is
again a Manin triple. The associated classical
r-matrix is the kernel of a singular integral operator
which associates a meromorphic section of $P$ to its
principal part at 0. Explicitly, it is given by


ez) ES (E-a or) [13] 


a,b=0 
x (Adl p @ I) -t 





where $\zeta$ is the Weierstrass zeta function.

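5. Let $\mathfrak{g}$ be an arbitrary semisimple Lie algebra
again. Let us equip the loop algebra $L\mathfrak{g}$ with the
inner product

$$\langle\langle X, Y \rangle\rangle_0 = \mathrm{Res}_{z=0}\, \langle X(z), Y(z) \rangle\, z^{-1}\, dz$$

Set $\mathcal{N}_+ = \mathfrak{n}_+ \dotplus \mathfrak{g} \otimes z\mathbb{C}[[z]]$, $\mathcal{N}_- = \mathfrak{n}_- \dotplus \mathfrak{g} \otimes z^{-1}\mathbb{C}[z^{-1}]$.
We have $L\mathfrak{g} = \mathcal{N}_+ \dotplus \mathfrak{h} \dotplus \mathcal{N}_-$, where
we identify $\mathfrak{h}, \mathfrak{n}_\pm \subset \mathfrak{g}$ with the corresponding
subalgebras of constant loops in $L\mathfrak{g}$. Let $P_\pm, P_0$
be the projection operators onto $\mathcal{N}_\pm, \mathfrak{h}$ in this
decomposition and $r_\pm = \pm P_\pm \pm \tfrac12 P_0$. The
classical r-matrices $r_\pm$ define on $L\mathfrak{g}$ the structure
of a factorizable Lie bialgebra. The associated
tensor kernels are called the trigonometric classical
r-matrices.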
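
The factorization problem of example 2 can be made fully explicit in the smallest case. The sketch below assumes $G = SL(2, \mathbb{R})$ with the standard r-matrices, so that $B_\pm$ are the upper and lower triangular subgroups, and it treats only group elements whose lower right entry is positive (a simplifying assumption made here so that a real square root exists). The function names are hypothetical.

```python
import math

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_inv(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def split_sl2(x):
    """Additive decomposition X = X_plus - X_minus in sl(2).

    X_plus is the strictly upper part plus half of the diagonal part,
    X_minus is minus (strictly lower part plus half of the diagonal part).
    """
    (a, b), (c, d) = x
    x_plus = [[a / 2, b], [0, d / 2]]
    x_minus = [[-a / 2, 0], [-c, -d / 2]]
    return x_plus, x_minus

def factorize_sl2(g):
    """Solve g = b_plus * inverse(b_minus) for g = [[a, b], [c, d]], det g = 1.

    b_plus is upper triangular with diagonal (h, 1/h) and b_minus is lower
    triangular with diagonal (1/h, h), so their Cartan parts are inverse to
    each other.  Only the case d > 0 is handled in this sketch.
    """
    (a, b), (c, d) = g
    if d <= 0:
        raise ValueError("this sketch only treats the case d > 0")
    h = 1.0 / math.sqrt(d)
    b_plus = [[h, b * h], [0.0, 1.0 / h]]
    b_minus = [[1.0 / h, 0.0], [-c * h, h]]
    return b_plus, b_minus

g = [[2.0, 3.0], [1.0, 2.0]]           # determinant 1, lower right entry > 0
b_plus, b_minus = factorize_sl2(g)
print(mat_mul(b_plus, mat_inv(b_minus)))  # reproduces g up to rounding
```

The lower right entry of $g$ alone fixes the Cartan part $h$, and the off-diagonal entries of the two factors then follow linearly.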


The classical r-matrices described above are associated
with factorization problems in the infinite-dimensional
loop groups: matrix Riemann problems or matrix
Cousin problems (in the elliptic case). Belavin and
Drinfeld have given a complete classification of
factorizable Lie bialgebra structures for semisimple
Lie algebras; in the loop algebra case, the problem they
solved consists of the classification of all meromorphic
solutions of the classical Yang-Baxter equation. In
other words, we assume that the distribution kernel
associated with the classical r-matrix is represented by
a meromorphic function (of two complex variables).
Up to an equivalence, any such solution depends
only on one variable and belongs to the rational,
trigonometric, or elliptic type (in the latter case, the
underlying Lie algebra is necessarily sl(n)). Classification
of solutions in the elliptic case is completely
rigid; in the trigonometric case, the moduli space is
finite dimensional and admits an explicit description.
In the rational case, the classification is
somewhat less explicit (it has been completed by A.
Stolin under some nondegeneracy condition). Contrary
to the popular belief, there are many other
structures of a factorizable Lie bialgebra on loop
algebras, for which the associated r-matrices are
given by more singular distribution kernels.


Poisson Lie Groups 


If the tangent Lie bialgebra of a Poisson Lie group is
of coboundary type, the cocycle $\eta$ is also trivial,
$\eta(g) = r - \mathrm{Ad}\, g \otimes \mathrm{Ad}\, g \cdot r$. Hence, the Poisson
bracket on $G$ is given by

$$\{\varphi, \psi\} = \langle r, \nabla\varphi \wedge \nabla\psi \rangle - \langle r, \nabla'\varphi \wedge \nabla'\psi \rangle, \qquad r \in \mathfrak{g} \wedge \mathfrak{g}$$

where $\nabla\varphi, \nabla'\varphi \in \mathfrak{g}^*$ are left and right differentials of
$\varphi \in C^\infty(G)$. This is the so-called "Sklyanin bracket".
Let us assume that $G$ is a matrix group; its affine
ring is generated by evaluation functions $\phi_{ij}$ which
assign to $L \in G$ its matrix coefficients, $\phi_{ij}(L) = L_{ij}$.
The Poisson bracket on $G$ is completely determined
by its values on $\phi_{ij}$. Explicitly, we get

$$\{\phi_{ij}, \phi_{km}\}(L) = [r, L \otimes L]_{ik,jm} \qquad [14]$$

the commutator in the RHS is in $\mathrm{Mat}(n^2)$. By a
variation of language, evaluation functions and their
values on a generic element $L \in G$ are denoted by
the same letter; using tensor notation to suppress
matrix indices, we get

$$\{L_1, L_2\} = [r, L_1 L_2], \qquad L_1 = L \otimes I, \quad L_2 = I \otimes L \qquad [15]$$

In the case of loop algebras, these Poisson bracket
relations take the form

$$\{L_1(\lambda), L_2(\mu)\} = [r(\lambda, \mu), L_1(\lambda) L_2(\mu)]$$
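
The matrix form of the Sklyanin bracket can be evaluated directly for $2 \times 2$ matrices. The sketch below assumes the antisymmetric r-matrix $r = \tfrac12 (e \otimes f - f \otimes e)$ of $sl(2)$ and the index convention that the bracket $\{L_{ij}, L_{km}\}$ is the entry of $[r, L \otimes L]$ in row $(i,k)$ and column $(j,m)$; both are assumptions made for illustration. For $L$ with rows $(a, b)$ and $(c, d)$ it reproduces the familiar quadratic relations $\{a, b\} = \tfrac12 ab$, $\{a, d\} = bc$, $\{b, c\} = 0$.

```python
from fractions import Fraction

def kron(a, b):
    """Kronecker product of two square matrices stored as lists of lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

E = [[0, 1], [0, 0]]
F = [[0, 0], [1, 0]]
EF, FE = kron(E, F), kron(F, E)
HALF = Fraction(1, 2)
# r = (1/2)(e (x) f - f (x) e) as a 4 x 4 matrix
R = [[HALF * (EF[i][j] - FE[i][j]) for j in range(4)] for i in range(4)]

def sklyanin(L):
    """Return a function bracket(i, j, k, m) = {L_ij, L_km}, zero-based indices."""
    M = kron(L, L)
    rm, mr = mul(R, M), mul(M, R)
    C = [[rm[i][j] - mr[i][j] for j in range(4)] for i in range(4)]
    return lambda i, j, k, m: C[2 * i + k][2 * j + m]

a, b, c, d = 2, 3, 1, 2                 # sample point with ad - bc = 1
bracket = sklyanin([[a, b], [c, d]])
print(bracket(0, 0, 0, 1), bracket(0, 0, 1, 1), bracket(0, 1, 1, 0))
```

Rational arithmetic keeps the factor 1/2 exact, so the printed values can be compared with $ab/2$, $bc$, and $0$ without rounding issues.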


Let us assume that $G$ is factorizable and the
associated factorization problem is globally solvable.
The Poisson bracket on the dual group $G^* \simeq i_r(G^*) \subset G \times G$
may be characterized in terms of the
matrix coefficients of $(h_+, h_-) = i_r(h)$, or of their
quotient $h = h_+ h_-^{-1}$. Explicitly, we get

$$\{h_1^\pm, h_2^\pm\} = [r, h_1^\pm h_2^\pm], \qquad \{h_1^+, h_2^-\} = [r_+, h_1^+ h_2^-] \qquad [16]$$

$$\{h_1, h_2\} = r h_1 h_2 + h_1 h_2 r - h_2 r_+ h_1 - h_1 r_- h_2, \qquad r = \tfrac12 (r_+ + r_-) \qquad [17]$$


The key question in the geometry of Poisson
groups is the description of symplectic leaves in
$G$, $G^*$. This question is already nontrivial when $G^*$ is
abelian (and hence may be identified with the dual of
the Lie algebra $\mathfrak{g} = \mathrm{Lie}(G)$). The Poisson bracket on
$\mathfrak{g}^*$ is linear; this is the well-known Lie-Poisson (alias
Berezin-Kirillov-Kostant) bracket. Its symplectic
leaves coincide with the orbits of the coadjoint
representation of $G$ in $\mathfrak{g}^*$. The natural way to prove
this fundamental result (which goes back to Lie) is to
consider first the natural action of $G$ on the
cotangent bundle $T^*G \simeq G \times \mathfrak{g}^*$; this action is
Hamiltonian, and the coadjoint orbits arise as a
result of Hamiltonian reduction associated with this
action. The generalization of the theory of coadjoint
orbits to the case of arbitrary Poisson groups starts
with the notion of symplectic double, which is the
nonlinear analog of the cotangent bundle.

Let $D$ be the double of $(G, G^*)$; assume for
simplicity that $D = G \cdot G^*$ globally and hence the
associated factorization problem is always solvable.
Let $r_{\mathfrak{d}} = \tfrac12 (P_{\mathfrak{g}} - P_{\mathfrak{g}^*})$. Set

$$\{\varphi, \psi\}_\pm = \langle r_{\mathfrak{d}} \nabla\varphi, \nabla\psi \rangle \pm \langle r_{\mathfrak{d}} \nabla'\varphi, \nabla'\psi \rangle \qquad [18]$$

The bracket $\{\,,\}_-$ is the usual Sklyanin bracket which
defines the structure of a Poisson group on $D$, while
$\{\,,\}_+$ is nondegenerate and defines a symplectic
structure on $D$. Let us denote the copies of $D$ equipped
with the bracket $\{\,,\}_\pm$ by $D_\pm$. The bracket on $D_+$ is not
multiplicative, but it is covariant with respect to the
action of $D_-$ by left and right translations; in other
words, the natural mappings $D_- \times D_+ \to D_+$ and
$D_+ \times D_- \to D_+$, associated with multiplication in $D$,
preserve Poisson brackets. Since $G, G^* \subset D_-$ are
Poisson subgroups, the natural actions $G \times D_+ \to D_+$
and $G^* \times D_+ \to D_+$ by left and right translations are
Poisson mappings. Consider the natural projections
$\pi, \pi'$ and $p, p'$ of $D_+$,

$$D_+ \to G \backslash D_+ \simeq G^*, \qquad D_+ \to D_+ / G^* \simeq G, \qquad D_+ \to G^* \backslash D_+ \simeq G, \qquad D_+ \to D_+ / G \simeq G^*$$

onto the spaces of left and right coset classes. It is easy
to see that functions on $D_+$ which are constant on each
projection fiber are closed with respect to the Poisson
bracket. This means that the quotient spaces inherit
the Poisson structure. Moreover, the maps $\pi, \pi'$ and
$p, p'$ form the so-called "dual pairs", that is, the
algebras of functions which are constant on the fibers
of $\pi$ and $\pi'$ (or of $p$ and $p'$) are mutual centralizers of
one another in the big Poisson algebra $F(D_+)$.
Since $D = G \cdot G^* = G^* \cdot G$, we have $G^* \backslash D \simeq G$,
$G \backslash D \simeq G^*$; it is easy to check that the quotient
Poisson structure induced on $G$, $G^*$ coincides with
the original one. Applying the fundamental theorem
on dual pairs of Poisson mappings (going back to S.
Lie), we conclude that symplectic leaves in $G$ and $G^*$,
respectively, coincide with the orbits of $G^*$ (respectively,
$G$) in these quotient spaces. The actions
$G \times G^* \to G^*$, $G^* \times G \to G$ are called "dressing
transformations". Unit elements in $G$ and $G^*$ are fixed points
of dressing transformations; their linearizations at the
tangent spaces $T_e G^* \simeq \mathfrak{g}^*$, $T_e G \simeq \mathfrak{g}$ coincide with the
coadjoint actions of $G$ and $G^*$, respectively.

When $D \neq G \cdot G^*$ (i.e., the factorization problem in
$D$ is not always solvable), dressing actions are still well
defined as global transformations of the quotient
spaces; in this case $G$, $G^*$ may be identified with open
cells in $D/G^*$, $D/G$, respectively, which means that
the dressing action on $G$, $G^*$ is, in general, incomplete.

If the group G is factorizable, symplectic leaves in the 
dual group G* admit a nice uniform description: since 
in this case D = G × G and G ⊂ D is the diagonal
subgroup, the quotient D/G may be modeled on G 
itself. The quotient Poisson bracket in this realization 
coincides with [17], while the dressing action coin- 
cides with conjugation in G (and is independent of 
r). Hence, symplectic leaves in D/G coincide with 
conjugacy classes in G; the equivalence of this model 
with G* (equipped with the bracket [16]) is provided 
by the factorization map. The description of sym- 
plectic leaves in G is more subtle (and already 
crucially depends on the choice of r!); for semisimple 
Lie groups with the standard Poisson structure, it is 
related to the geometry of double Bruhat cells. 

For loop groups with rational, trigonometric, or 
elliptic r-matrices, dressing action is associated with 
auxiliary factorization problems in the loop group. 
Roughly speaking, symplectic leaves correspond to 
rational loops with prescribed singularities. Many 
important examples have been described in connection 
with integrable lattice systems, although a complete 
classification theorem is still not available. For 
g = sl(2), the elliptic Manin triple described earlier
leads to the Poisson structure on the group of “elliptic 
loops” with values in SL(2); its simplest symplectic 
leaves (corresponding to loops with simple poles) are 
associated with a remarkable Poisson algebra, the 
Sklyanin algebra (with four generators and two 
Casimir functions), which admits an interesting 
explicit quantization.

Dressing action is a nontrivial example of a
Poisson group action. In general, such actions are
not Hamiltonian in the usual sense; the appropriate
generalization is provided by the notion of the
nonabelian moment map. Let $G \times M \to M$ be an
action of a Poisson group $G$ on a Poisson manifold
$M$, and $\mathfrak{g} \to \mathrm{Vect}\, M$ the associated homomorphism of
Lie algebras. A mapping $\mu : M \to G^*$ is called the
nonabelian moment map associated with this action
if, for any $X \in \mathfrak{g}$ and $\varphi \in F(M)$, we have

$$X \cdot \varphi = \langle \mu^{-1} \{\mu, \varphi\}_M, X \rangle$$

In this case, $G \times M \to M$ is a fortiori a Poisson
map. Both dressing actions $G^* \times G \to G$ and
$G \times G^* \to G^*$ admit nonabelian moment maps, which are
just the identity maps $\mu = \mathrm{id}_G$ and $\mu^* = \mathrm{id}_{G^*}$. For
compact Poisson groups, the nonabelian moment
map has good convexity properties, which generalize
the convexity properties of the ordinary moment
map for Hamiltonian group actions.

The general theory of homogeneous Poisson spaces
has some peculiarities. Typically, the G-covariant
Poisson structure on a given homogeneous space is
not unique (when it exists); this is true already for
principal homogeneous spaces (a simple example is
provided by the symplectic double $D_+$). Let $G$ be a
Poisson Lie group, $(\mathfrak{g}, \mathfrak{g}^*)$ its tangent Lie bialgebra, $\mathfrak{d}$
its double, $U$ its Lie subgroup, $\mathfrak{u} = \mathrm{Lie}\, U$. A subalgebra
$\mathfrak{l} \subset \mathfrak{d}$ is called Lagrangian if it is maximal isotropic with respect
to the canonical inner product in $\mathfrak{d}$. The general
classification result, according to Drinfeld, asserts that
there is a bijection between G-covariant Poisson
structures on $G/U$ and the set of all Lagrangian
subalgebras $\mathfrak{l} \subset \mathfrak{d}$ such that $\mathfrak{l} \cap \mathfrak{g} = \mathfrak{u}$. Various nontrivial
examples arise, notably in the study of integrable
systems. For instance, the geometric proof of the
factorization theorem for lattice zero-curvature equations,
which is stated in the following section, uses a
different Poisson structure on the double (the so-called
"twisted symplectic double").


Applications to Integrable Systems 


The definition of Poisson-Lie groups was motivated
by key examples which arise in the theory of
integrable systems. In applications, one often deals
with nonlinear differential equations which may be
written in the form of the so-called "lattice
zero-curvature equations"

$$\frac{dL_m}{dt} = L_m M_m - M_{m+1} L_m, \qquad m \in \mathbb{Z} \qquad [19]$$

where $L_m$, $M_m$ are matrices, possibly depending on
an additional parameter (or, more generally, abstract
linear operators). Equations [19] give the compatibility
conditions for the auxiliary linear system

$$\psi_{m+1} = L_m \psi_m, \qquad \frac{d\psi_m}{dt} = -M_m \psi_m, \qquad m \in \mathbb{Z} \qquad [20]$$

The use of finite-difference operators associated with
a one-dimensional lattice, as in [20], is particularly
well suited for the study of "multiparticle" lattice
models. Let us assume that the "potential" $L_m$ in [20]
is periodic, $L_{m+N} = L_m$; the period $N$ may be
interpreted as the number of copies of an "elementary"
system. It is natural to presume that the "Lax
matrices" $L_m$ in [19] are elements of a matrix Lie
group $G$ (or of a loop group, if they depend on an
extra parameter). The auxiliary linear problem [20]
leads to a family of dynamical systems on $G^N$ which
remain integrable for any $N$. Let $T : G^N \to G$ be the
"monodromy map" which assigns to the set
$L_1, \ldots, L_N$ of local Lax matrices their ordered
product $T_N = L_N L_{N-1} \cdots L_1$. Let us assume that $G$ is
equipped with the Sklyanin bracket associated with a
factorizable r-matrix $r$. Then $T$ is a Poisson map. Let
$I(G)$ be the algebra of central functions on $G$; for
$\varphi \in I(G)$, set $H_\varphi = \varphi \circ T$. All functions $H_\varphi$, $\varphi \in I(G)$, are
in involution with respect to the product Poisson
bracket on $G^N$ and give rise to lattice zero-curvature
equations of the same form as [19]; for a given $\varphi$, we
may choose the M-matrix in either of the two forms:

$$M_m^\pm = r_\pm\bigl(\psi_m \nabla\varphi(T)\, \psi_m^{-1}\bigr), \qquad \psi_m = \prod_{1 \le k < m} L_k$$

(the product is ordered, $\psi_m = L_{m-1} \cdots L_1$).

Let $L_m(t)$, $m = 1, \ldots, N$, be the integral curve of
this equation which starts at $L_m^0$. The construction of
this curve reduces to the factorization problem associated
with the chosen r-matrix. Explicitly, we get

$$L_m(t) = g_{m+1}(t)_+^{-1}\, L_m^0\, g_m(t)_+ = g_{m+1}(t)_-^{-1}\, L_m^0\, g_m(t)_-$$

where $(g_m(t)_+, g_m(t)_-)$ is the curve in $G^*$ which
solves the factorization problem

$$g_m(t)_+\, g_m(t)_-^{-1} = \psi_m^0 \exp\bigl(t \nabla\varphi(T(L^0))\bigr) (\psi_m^0)^{-1}, \qquad \psi_m^0 = \psi_m(L^0)$$

This result exhibits the double role of the r-matrix.
On the one hand, it serves to define the Poisson
structure on $G$ which is adapted to the study of
lattice zero-curvature equations; in particular, the
dynamical flow associated with these equations is
automatically confined to symplectic leaves in $G^N$.
(In applications, $G$ is usually a loop group equipped
with a factorizable r-matrix; despite the fact that
$\dim G = \infty$, it admits plenty of finite-dimensional
symplectic leaves.) In its second incarnation, the r-matrix
serves to define the factorization problem which
solves these zero-curvature equations. In the loop
group case, this is a matrix Riemann problem; its
explicit solution is based on the study of the spectral
curve associated with the "monodromy matrix" $T_N$
and uses the technique of algebraic geometry.

The monodromy map $T : G^N \to G$ may be regarded
as a nonabelian moment map associated with an
action of the dual Lie algebra $\mathfrak{g}^*$ on the phase space.
This action actually extends to an action of the (local)
Lie group $G^*$ which transforms solutions into solutions
again. This is the prototype "dressing" action
(originally defined by Zakharov and Shabat in their
study of zero-curvature equations related to
Riemann-Hilbert problems). Dressing provides an effective tool
to produce new solutions of zero-curvature equations
from the "trivial" ones; it was also the first nontrivial
example of a Poisson group action.

Clifford Algebras and Their Representations

Introduction
Introductory and Historical Remarks 


Clifford (1878) introduced his “geometric algebras” 
as a generalization of Grassmann algebras, complex 
numbers, and quaternions. Lipschitz (1886) was the 
first to define groups constructed from “Clifford 
numbers” and use them to represent rotations in a 


Euclidean space. Cartan discovered representations of
the Lie algebras $so_n(\mathbb{C})$ and $so_n(\mathbb{R})$, $n > 2$, that do
not lift to representations of the orthogonal groups.
In physics, Clifford algebras and spinors appear for 
the first time in Pauli’s nonrelativistic theory of the 
“magnetic electron.” Dirac (1928), in his work on the 
relativistic wave equation of the electron, introduced 
matrices that provide a representation of the Clifford 
algebra of Minkowski space. Brauer and Weyl (1935) 
connected the Clifford and Dirac ideas with Cartan’s 
spinorial representations of Lie algebras; they found, 
in any number of dimensions, the spinorial, projective 
representations of the orthogonal groups. 


Clifford algebras and spinors are implicit in
Euclid's solution of the Pythagorean equation
$x^2 - y^2 + z^2 = 0$, which is equivalent to

$$\begin{pmatrix} y - x & z \\ z & y + x \end{pmatrix} = 2 \begin{pmatrix} p \\ q \end{pmatrix} \begin{pmatrix} p & q \end{pmatrix} \qquad [1]$$

and gives $x = q^2 - p^2$, $y = p^2 + q^2$, $z = 2pq$. If the
numbers appearing in [1] are real, then this equation
can be interpreted as providing a representation of a
vector $(x, y, z) \in \mathbb{R}^3$, null with respect to a quadratic
form of signature (1,2), as the "square" of a spinor
$(p, q) \in \mathbb{R}^2$. The pure spinors of Cartan (1938)
provide a generalization of this observation to
higher dimensions.
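
Euclid's parametrization is easy to test on integers. The sketch below computes $(x, y, z)$ from a "spinor" $(p, q)$, checks that the vector is null for the form $x^2 - y^2 + z^2$, and checks that a symmetric matrix with entries $y - x$, $z$, $y + x$ equals twice the outer square of $(p, q)$; this particular arrangement of the matrix, and the function names, are assumptions made for the example.

```python
def null_vector(p, q):
    """Euclid's parametrization of solutions of x^2 - y^2 + z^2 = 0."""
    return q * q - p * p, p * p + q * q, 2 * p * q

def quadratic_form(x, y, z):
    return x * x - y * y + z * z

def spinor_square(p, q):
    """Twice the outer product of the column (p, q) with the row (p, q)."""
    return [[2 * p * p, 2 * p * q], [2 * q * p, 2 * q * q]]

for p in range(-3, 4):
    for q in range(-3, 4):
        x, y, z = null_vector(p, q)
        assert quadratic_form(x, y, z) == 0
        assert [[y - x, z], [z, y + x]] == spinor_square(p, q)

print(null_vector(1, 2))  # the triple (3, 5, 4), since 3^2 + 4^2 = 5^2
```

Note that $(p, q)$ and $(-p, -q)$ give the same vector, which is the elementary shadow of the two-to-one covering of the orthogonal group by the spin group.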

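Multiplying the square matrix in [1] on the left by
a real, 2 × 2 unimodular matrix, on the right by its
transpose, and taking the determinant, one arrives at
the exact sequence of group homomorphisms:

$$1 \to \mathbb{Z}_2 \to SL_2(\mathbb{R}) = \mathrm{Spin}^0_{1,2} \to SO^0_{1,2} \to 1$$

Multiplying the same matrix by

$$\varepsilon = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \qquad [2]$$

on the left and computing the square of the product,
one obtains

$$\begin{pmatrix} z & x + y \\ x - y & -z \end{pmatrix}^2 = (x^2 - y^2 + z^2) \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

This equation is an illustration of the idea of
representing a quadratic form as the square of a
linear form in a Clifford algebra. Replacing $y$ by $iy$,
one arrives at complex spinors, the Pauli matrices,

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

$\mathrm{Spin}_3 = SU_2$, etc.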
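
The idea of a quadratic form as the square of a linear form can be verified numerically in both the real and the complex setting. The sketch below assumes the standard Pauli matrices and, for the real case, a traceless real matrix with entries $z$, $x + y$, $x - y$, $-z$; the helper names are illustrative.

```python
def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def times_identity(c):
    return [[c, 0], [0, c]]

def real_linear_form(x, y, z):
    """Real 2 x 2 matrix whose square is (x^2 - y^2 + z^2) times the identity."""
    return [[z, x + y], [x - y, -z]]

def pauli_linear_form(x, y, z):
    """x sigma_x + y sigma_y + z sigma_z for the standard Pauli matrices."""
    return [[z, x - 1j * y], [x + 1j * y, -z]]

x, y, z = 2, 3, 5

K = real_linear_form(x, y, z)
print(mul(K, K) == times_identity(x * x - y * y + z * z))

P = pauli_linear_form(x, y, z)
print(mul(P, P) == times_identity(x * x + y * y + z * z))
```

Passing from the real to the complex matrix changes the signature of the quadratic form from (1,2) to the Euclidean one, which is the step the text describes as replacing $y$ by $iy$.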

This article reviews Clifford algebras, the asso- 
ciated groups, and their representations, for quad- 
ratic spaces over complex or real numbers. These 
notions have been generalized by Chevalley (1954) 
to quadratic spaces over arbitrary number fields. 


Notation 


If $S$ is a vector space over $K = \mathbb{R}$ or $\mathbb{C}$, then $S^*$
denotes its dual, that is, the vector space over $K$
of all $K$-linear maps from $S$ to $K$. The value of
$\omega \in S^*$ on $s \in S$ is sometimes written as $\langle s, \omega \rangle$.
The transpose of a linear map $f : S_1 \to S_2$ is the
map $f^* : S_2^* \to S_1^*$ defined by $\langle s, f^*(\omega) \rangle = \langle f(s), \omega \rangle$ for
every $s \in S_1$ and $\omega \in S_2^*$. If $S_1$ and $S_2$ are complex
vector spaces, then a map $f : S_1 \to S_2$ is said to be
semilinear if it is $\mathbb{R}$-linear and $f(is) = -if(s)$. The
complex conjugate of a finite-dimensional complex
vector space $S$ is the complex vector space $\bar{S}$ of all
semilinear maps from $S^*$ to $\mathbb{C}$. There is a natural
semilinear isomorphism (complex conjugation) $S \to \bar{S}$,
$s \mapsto \bar{s}$, such that $\langle \omega, \bar{s} \rangle = \overline{\langle s, \omega \rangle}$ for every $\omega \in S^*$.
The space $\bar{\bar{S}}$ can be identified with $S$ and then $\bar{\bar{s}} = s$.
The spaces $(\bar{S})^*$ and $\overline{S^*}$ are identified. If $f : S_1 \to S_2$
is a complex-linear map, then there is the
complex-conjugate map $\bar{f} : \bar{S}_1 \to \bar{S}_2$ given by $\bar{f}(\bar{s}) = \overline{f(s)}$ and
the Hermitian conjugate map $f^\dagger = \bar{f}^* : \bar{S}_2^* \to \bar{S}_1^*$.
A linear map $A : S \to \bar{S}^*$ such that $A^\dagger = A$ is said to
be Hermitian. $K(N)$ denotes, for $K = \mathbb{R}$, $\mathbb{C}$ or $\mathbb{H}$, the
set of all $N$ by $N$ matrices with elements in $K$.








Real, Complex, and Quaternionic Structures 


A real structure on a complex vector space $S$ is a
complex-linear map $C : S \to \bar{S}$ such that $\bar{C} C = \mathrm{id}_S$.
A vector $s \in S$ is said to be real if $\bar{s} = C(s)$. The set of
all real vectors is a real vector space; its real
dimension is the same as the complex dimension of $S$.

A complex-linear map $C : S \to \bar{S}$ such that
$\bar{C} C = -\mathrm{id}_S$ defines on $S$ a quaternionic structure; a
necessary condition for such a structure to exist is
that the complex dimension $m$ of $S$ be even, $m = 2n$,
$n \in \mathbb{N}$. The space $S$ with a quaternionic structure
can be made into a right vector space over the field
$\mathbb{H}$ of quaternions. In the context of quaternions, it is
convenient to represent the imaginary unit of $\mathbb{C}$ as
$\sqrt{-1}$. Multiplication on the right by the quaternion
unit $i$ is realized as the multiplication (on the left) by
$\sqrt{-1}$. If $j$ and $k = ij$ are the other two quaternion
units and $s \in S$, then one puts $sj = \overline{C(s)}$ and $sk = (si)j$.

A real vector space $S$ can be complexified by
forming the tensor product $\mathbb{C} \otimes_{\mathbb{R}} S = S \oplus iS$.

The realification of a complex vector space $S$ is the
real vector space having $S$ as its set of vectors so that
$\dim_{\mathbb{R}} S = 2 \dim_{\mathbb{C}} S$. The complexification of a realification
of $S$ is the "double" $S \oplus \bar{S}$ of the original space.








Inner-Product Spaces and Their Groups 


Definitions: quadratic and symplectic spaces A
bilinear map $B : S \times S \to K$ on a vector space $S$ over
$K$ is said to make $S$ into an inner-product space. To
save on notation, one also writes $B : S \to S^*$ so that
$\langle s, B(t) \rangle = B(s, t)$ for all $s, t \in S$. The group of
automorphisms of an inner-product space,

$$\mathrm{Aut}(S, B) = \{R \in GL(S) \mid R^* \circ B \circ R = B\}$$

is a Lie subgroup of the general linear group $GL(S)$.
An inner-product space $(S, B)$ is said here to be
quadratic (resp., symplectic) if $B$ is symmetric (resp.,
antisymmetric and nonsingular). A quadratic space is
characterized by its quadratic form $s \mapsto B(s, s)$. For
$K = \mathbb{C}$, a Hermitian map $A : S \to \bar{S}^*$ defines a
Hermitian scalar product $A(s, t) = \langle \bar{s}, A(t) \rangle$.
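
In a basis, the defining condition of the automorphism group reads $R^{T} B R = B$, where $B$ is the Gram matrix of the inner product. The sketch below tests this condition for one quadratic and one symplectic example in dimension 2; the sample matrices and the function name are chosen for illustration only.

```python
from fractions import Fraction

def mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def preserves(R, B):
    """True if R is an automorphism of the inner product with Gram matrix B."""
    return mul(transpose(R), mul(B, R)) == B

# Quadratic space of signature (1,1): a hyperbolic rotation with rational entries.
B_quad = [[1, 0], [0, -1]]
boost = [[Fraction(5, 3), Fraction(4, 3)], [Fraction(4, 3), Fraction(5, 3)]]

# Symplectic plane: every 2 x 2 matrix of determinant 1 is symplectic.
B_symp = [[0, 1], [-1, 0]]
shear = [[2, 3], [1, 2]]

print(preserves(boost, B_quad), preserves(shear, B_symp))
print(preserves(shear, B_quad))  # the same matrix is not orthogonal for B_quad
```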

An orthogonal space is defined here as a quadratic
space $(S, B)$ such that $B : S \to S^*$ is an isomorphism.
The group of automorphisms of an orthogonal space
is the orthogonal group $O(S, B)$. The group of
automorphisms of a symplectic space is the symplectic
group $Sp(S, B)$. The dimension of a symplectic
space is even. If $S = K^{2n}$ is a symplectic space
over $K = \mathbb{R}$ or $\mathbb{C}$, then its symplectic group is
denoted by $Sp_{2n}(K)$. Two quaternionic symplectic
groups appear in the list of spin groups of
low-dimensional spaces:

$$Sp_2(\mathbb{H}) = \{a \in \mathbb{H}(2) \mid a^\dagger a = I\}$$

and

$$Sp_{1,1}(\mathbb{H}) = \{a \in \mathbb{H}(2) \mid a^\dagger \sigma_z a = \sigma_z\}$$

Here $a^\dagger$ denotes the matrix obtained from $a$ by
transposition and quaternionic conjugation.


Contractions, frames, and orthogonality From now
on, unless otherwise specified, $(V, g)$ is a quadratic
space of dimension $m$. Let $\wedge V = \bigoplus_{p=0}^{m} \wedge^p V$ be its
exterior (Grassmann) algebra. For every $v \in V$ and
$w \in \wedge V$ there is the contraction $g(v) \rfloor w$ characterized
as follows. The map $V \times \wedge V \to \wedge V$, $(v, w) \mapsto g(v) \rfloor w$,
is bilinear; if $x \in \wedge^p V$, then $g(v) \rfloor (x \wedge w) =
(g(v) \rfloor x) \wedge w + (-1)^p x \wedge (g(v) \rfloor w)$ and $g(v) \rfloor v = g(v, v)$.

A frame $(e_\mu)$ in a quadratic space $(V, g)$ is said to
be a quadratic frame if $\mu \neq \nu$ implies $g(e_\mu, e_\nu) = 0$.

For every subset $W$ of $V$ there is the orthogonal
subspace $W^\perp$ containing all vectors that are orthogonal
to every element of $W$.

If $(V, g)$ is a real orthogonal space, then there is an
orthonormal frame $(e_\mu)$, $\mu = 1, \ldots, m$, in $V$ such that
$k$ frame vectors have squares equal to $-1$, $l$ frame
vectors have squares equal to 1 and $k + l = m$. The
pair $(k, l)$ is the signature of $g$. The quadratic form $g$
is said to be neutral if the orthogonal space $(V, g)$
admits two maximal totally null subspaces $W$ and
$W'$ such that $V = W \oplus W'$. Such a space $V$ is
2n-dimensional, either complex or real with $g$ of
signature $(n, n)$. A Lorentzian space has maximal
totally null subspaces of dimension 1 and a
Euclidean space, characterized by a definite quadratic
form, has no null subspaces. The Minkowski
space is a Lorentzian space of dimension 4.

If $(V, g)$ is a complex orthogonal space, then an
orthonormal frame $(e_\mu)$, $\mu = 1, \ldots, m$, can be
chosen in $V$ so that, defining $g_{\mu\nu} = g(e_\mu, e_\nu)$, one
has $g_{\mu\mu} = (-1)^{\mu+1}$ and, if $\mu \neq \nu$, then $g_{\mu\nu} = 0$.

If $A : S \to \bar{S}^*$ is a Hermitian isomorphism, then
there is a (pseudo)unitary frame $(e_\alpha)$ in $S$ such that
the matrix $A_{\alpha\beta} = A(e_\alpha, e_\beta)$ is diagonal, has $p$ 1's
and $q$ $-1$'s on the diagonal, $p + q = \dim S$. If $p = q$,
then $A$ is said to be neutral. $A$ is definite if either $p$
or $q = 0$.


Algebras 


Definitions An algebra over $K$ is a vector space $A$
over $K$ with a bilinear map $A \times A \to A$, $(a, b) \mapsto ab$,
which is distributive with respect to addition.
The algebra is associative if $(ab)c = a(bc)$ holds for
all $a, b, c \in A$. It is commutative if $ab = ba$ for all
$a, b \in A$. An element $1_A$ is the unit of $A$ if
$1_A a = a 1_A = a$ holds for every $a \in A$.

From now on, unless otherwise specified, the bare
word algebra denotes a finite-dimensional, associative
algebra over $K = \mathbb{R}$ or $\mathbb{C}$, with a unit element.
If $S$ is an N-dimensional vector space over $K$, then the
set $\mathrm{End}\, S$ of all endomorphisms of $S$ is an
$N^2$-dimensional algebra over $K$, the product being
defined by composition; if $f, g \in \mathrm{End}\, S$, then one
writes $fg$ instead of $f \circ g$; the unit of $\mathrm{End}\, S$ is
the identity map $I$. By definition, homomorphisms
of algebras map units into units. The map $K \to A$,
$a \mapsto a 1_A$ is injective and one identifies $K$ with its
image in $A$ by this map so that the unit can be
represented by $1 \in K \subset A$. A set $B \subset A$ is said to
generate $A$ if every element of $A$ can be represented
as a linear combination of products of elements of $B$.
For example, if $V$ is a vector space over $K$, then its
tensor algebra

$$T(V) = \bigoplus_{p=0}^{\infty} \otimes^p V$$

is an (infinite-dimensional) algebra over $K$ generated
by $K \oplus V$. The algebra of all $N \times N$ matrices
with entries in an algebra $A$ is denoted by $A(N)$.
Its unit element is the unit matrix $I$. In particular,
$\mathbb{R}(N)$, $\mathbb{C}(N)$, and $\mathbb{H}(N)$ are algebras over $\mathbb{R}$. The
algebra $\mathbb{R}(2)$ is generated by the set $\{\sigma_x, \sigma_z\}$. As a
vector space, the algebra $\mathbb{R}(2)$ is spanned by the set
$\{I, \sigma_x, \varepsilon, \sigma_z\}$.

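The direct sum $A \oplus B$ of the algebras $A$ and $B$
over $K$ is an algebra over $K$ such that its underlying
vector space is $A \times B$ and the product is defined by
$(a, b) \cdot (a', b') = (aa', bb')$ for every $a, a' \in A$ and
$b, b' \in B$. Similarly, the product in the tensor
product algebra $A \otimes_K B$ is defined by

$$(a \otimes b) \cdot (a' \otimes b') = aa' \otimes bb' \qquad [3]$$

For example, if $A$ is an algebra over $\mathbb{R}$, then the
tensor product algebra $\mathbb{R}(N) \otimes_{\mathbb{R}} A$ is isomorphic to
$A(N)$ and

$$K(N) \otimes_K K(N') = K(NN') \qquad [4]$$

for $K = \mathbb{R}$ or $\mathbb{C}$ and $N, N' \in \mathbb{N}$. There are isomorphisms
of algebras over $\mathbb{R}$:

$$\mathbb{C} \otimes_{\mathbb{R}} \mathbb{C} = \mathbb{C} \oplus \mathbb{C}, \qquad \mathbb{C} \otimes_{\mathbb{R}} \mathbb{H} = \mathbb{C}(2), \qquad \mathbb{H} \otimes_{\mathbb{R}} \mathbb{H} = \mathbb{R}(4) \qquad [5]$$

An algebra over $\mathbb{R}$ can be complexified by complexifying
its underlying vector space; it follows from [5]
that $\mathbb{C}(2)$ is the complex algebra obtained by
complexification of the real algebra $\mathbb{H}$.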

The center of an algebra A is the set 


$$Z(A) = \{a \in A \mid ab = ba \ \ \forall\, b \in A\}$$


The center is a commutative subalgebra containing 
K. An algebra over K is said to be central if its center 
coincides with K. The algebras R(N) and H(N) are 
central over R. The algebra C(N) is central over C, 
but not over R. 








Simplicity and representations Let $B_1$ and $B_2$
be subsets of the algebra $A$. Define $B_1 B_2 = \{b_1 b_2 \mid
b_1 \in B_1, b_2 \in B_2\}$. A vector subspace $B$ of $A$ is said
to be a left (resp., right) ideal of $A$ if $AB \subset B$ (resp.,
$BA \subset B$). A two-sided ideal, or simply an ideal, is
a left and right ideal. An algebra $A \neq \{0\}$ is said to
be simple if its only two-sided ideals are $\{0\}$ and $A$.

For example, the algebras R(N) and H(N) are 
simple over R; the algebra C(N) is simple when 
considered as an algebra over both R and C; every 
associative, finite-dimensional simple algebra over R 
or C is isomorphic to one of them. 

A representation of an algebra $A$ over $K$ in a vector
space $S$ over $K$ is a homomorphism of algebras $\rho : A \to
\mathrm{End}\, S$. If $\rho$ is injective, then the representation is said to
be faithful. For example, the regular representation
$\rho : A \to \mathrm{End}\, A$ of an algebra $A$, defined by $\rho(a) b = ab$
for all $a, b \in A$, is faithful. A vector subspace $T$ of
the vector space $S$ carrying a representation $\rho$ of $A$
is said to be invariant for $\rho$ if $\rho(a) T \subset T$ for every
$a \in A$; it is proper if distinct from both $\{0\}$ and $S$.
For example, a left ideal of $A$ is invariant for the
regular representation. Given an invariant subspace
$T$ of $\rho$ one can reduce $\rho$ to $T$ by forming the
representation $\rho_T : A \to \mathrm{End}\, T$, where $\rho_T(a) s = \rho(a) s$
for every $a \in A$ and $s \in T$. A representation is
irreducible if it has no proper invariant subspaces.

A linear map $F : S_1 \to S_2$ is said to intertwine the
representations $\rho_1 : A \to \mathrm{End}\, S_1$ and $\rho_2 : A \to \mathrm{End}\, S_2$ if
$F \rho_1(a) = \rho_2(a) F$ holds for every $a \in A$. If $F$ is an
isomorphism, then the representations $\rho_1$ and $\rho_2$ are
said to be equivalent, $\rho_1 \sim \rho_2$. The following two
propositions are classical:


Proposition (A) 


(i) An algebra over K is simple if and only if it 
admits a faithful irreducible representation in a 
vector space over K. Such a representation is 
unique, up to equivalence. 

(ii) The complexification of a central simple algebra
over $\mathbb{R}$ is a central simple algebra over $\mathbb{C}$.


For real algebras, one often considers complex
representations, that is, representations in complex
vector spaces. Two such representations $\rho_1: A \to
\mathrm{End}\, S_1$ and $\rho_2: A \to \mathrm{End}\, S_2$ are said to be complex
equivalent if there is a complex isomorphism $F: S_1 \to
S_2$ intertwining the representations; they are real
equivalent if there is an isomorphism among the
realifications of $S_1$ and $S_2$, intertwining the
representations. For example, $\mathbb{C}$, considered as an
algebra over $\mathbb{R}$, has two complex-inequivalent
representations in $\mathbb{C}$: the identity representation
and its complex conjugate. The realifications of
these representations, given by $i \mapsto \varepsilon$ and $i \mapsto -\varepsilon$,
respectively, are real equivalent: they are intertwined
by $\sigma_z$. The real algebra $\mathbb{H}$, being central simple, has
only one, up to complex equivalence, representation
in $\mathbb{C}^2$: every such representation is equivalent to the
one given by


$$ i \mapsto \sigma_x/\sqrt{-1}, \quad j \mapsto \sigma_y/\sqrt{-1}, \quad k \mapsto \sigma_z/\sqrt{-1} $$


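This representation extends to an injective homomorphism
of algebras $i: \mathbb{H}(N) \to \mathbb{C}(2N)$ which is used
to define the quaternionic determinant of a matrix $a \in
\mathbb{H}(N)$ as $\det_{\mathbb{H}}(a) = \det i(a)$, so that $\det_{\mathbb{H}}(a) \geq 0$ and
$\det_{\mathbb{H}}(ab) = \det_{\mathbb{H}}(a)\det_{\mathbb{H}}(b)$ for every $a, b \in \mathbb{H}(N)$. In
particular, if $q \in \mathbb{H}$ and $\lambda, \mu \in \mathbb{R}$, then $\det_{\mathbb{H}}(q) = \bar q q$ and


$$ \det\nolimits_{\mathbb{H}} \begin{pmatrix} \lambda & q \\ -\bar q & \mu \end{pmatrix} = (\lambda\mu + \bar q q)^2 $$   [6]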


There are quaternionic unimodular groups
$\mathrm{SL}_N(\mathbb{H}) = \{ a \in \mathbb{H}(N) \mid \det_{\mathbb{H}}(a) = 1 \}$. For example,
the group $\mathrm{SL}_1(\mathbb{H})$ is isomorphic to $\mathrm{SU}_2$ and $\mathrm{SL}_2(\mathbb{H})$
is a noncompact, 15-dimensional Lie group, one of
the spin groups in six dimensions.
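
As a small numerical illustration of the quaternionic determinant, the sketch below realizes a quaternion as a 2 x 2 complex matrix through i -> sigma_x/sqrt(-1), j -> sigma_y/sqrt(-1), k -> sigma_z/sqrt(-1) and evaluates det_H on a 1 x 1 and a 2 x 2 quaternionic matrix. The helper names (quat_to_matrix, embed, det, det_H) and the sample values are illustrative assumptions, not notation from the article; the standard Pauli matrices are assumed.

```python
# Illustrative sketch: all helper names are assumptions, not the article's notation.
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_Z = [[1, 0], [0, -1]]


def quat_to_matrix(a, b, c, d):
    """2x2 complex image of q = a + b*i + c*j + d*k, using 1/sqrt(-1) = -1j."""
    return [
        [
            (a if r == s else 0)
            - 1j * (b * SIGMA_X[r][s] + c * SIGMA_Y[r][s] + d * SIGMA_Z[r][s])
            for s in range(2)
        ]
        for r in range(2)
    ]


def embed(qmatrix):
    """Replace each quaternion entry (a 4-tuple) of an N x N matrix by its 2x2 block."""
    n = len(qmatrix)
    big = [[0j] * (2 * n) for _ in range(2 * n)]
    for r in range(n):
        for s in range(n):
            blk = quat_to_matrix(*qmatrix[r][s])
            for x in range(2):
                for y in range(2):
                    big[2 * r + x][2 * s + y] = blk[x][y]
    return big


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for col in range(len(m)):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * m[0][col] * det(minor)
    return total


def det_H(qmatrix):
    """Quaternionic determinant: ordinary determinant of the embedded complex matrix."""
    return det(embed(qmatrix))


q = (1, 2, 3, 4)               # q = 1 + 2i + 3j + 4k, so conj(q) q = 30
minus_conj_q = (-1, 2, 3, 4)   # -conj(q)
lam, mu = 2, 5
single = det_H([[q]])
two_by_two = det_H([[(lam, 0, 0, 0), q], [minus_conj_q, (mu, 0, 0, 0)]])
print(single, two_by_two)
```

With these sample values the two results can be compared with conj(q) q = 30 and with (lambda mu + conj(q) q)^2 = 1600.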

















Antiautomorphisms and inner products An automorphism
of an algebra $A$ is a linear isomorphism $\alpha:
A \to A$ such that $\alpha(ab) = \alpha(a)\alpha(b)$. An invertible
element $c \in A$ defines an inner automorphism $\mathrm{Ad}(c) \in
\mathrm{GL}(A)$, $\mathrm{Ad}(c)a = cac^{-1}$. Complex conjugation in $\mathbb{C}$,
considered as an algebra over $\mathbb{R}$, is an automorphism
that is not inner. An antiautomorphism of an
algebra $A$ is a linear isomorphism $\beta: A \to A$ such that
$\beta(ab) = \beta(b)\beta(a)$ for all $a, b \in A$. An (anti)automorphism
$\beta$ is involutive if $\beta^2 = \mathrm{id}$. For example,
conjugation of quaternions defines an involutive
antiautomorphism of $\mathbb{H}$.

Let $\rho: A \to \mathrm{End}\, S$ be a representation of an algebra
with an involutive antiautomorphism $\beta$. There is then
the contragredient representation $\check\rho: A \to \mathrm{End}\, S^*$ given
by $\check\rho(a) = (\rho(\beta(a)))^*$. If, moreover, $A$ is central simple
and $\rho$ is faithful irreducible, then there is an isomorphism
$B: S \to S^*$ intertwining $\rho$ and $\check\rho$ which is either
symmetric, $B^* = B$, or antisymmetric, $B^* = -B$. It
defines on $S$ the structure of an inner-product space.
This structure extends to $\mathrm{End}\, S$: there is a symmetric
isomorphism $B \otimes B^{-1}: \mathrm{End}\, S \to (\mathrm{End}\, S)^* = \mathrm{End}\, S^*$
given, for every $f \in \mathrm{End}\, S$, by $(B \otimes B^{-1})(f) = BfB^{-1}$.

Let $K^\times = K \setminus \{0\}$ be the multiplicative group of the
field $K$. Given a simple algebra $A$ with an involutive
antiautomorphism $\beta$, one defines $N(a) = \beta(a)a$ and
the group


$$ G(\beta) = \{ a \in A \mid N(a) \in K^\times \} $$


Let $\rho: A \to \mathrm{End}\, S$ be the faithful irreducible representation
as above; then, for $a \in A$ and $s, t \in S$, one has


$$ B(\rho(a)s, \rho(a)t) = N(a)B(s, t) $$


If $a \in G(\beta)$ and $\lambda \in K^\times$, then $\lambda a \in G(\beta)$ and the norm
$N$ satisfies $N(\lambda a) = \lambda^2 N(a)$. The inner product $B$ is
invariant with respect to the action of the group


$$ G_1(\beta) = \{ a \in G(\beta) \mid N(a) = 1 \} $$


Proposition (B) Let $A$ be a central simple algebra
over $K$ with an involutive antiautomorphism $\beta$ and a
faithful irreducible representation $\rho$ so that


$$ \check\rho(a) = B\rho(a)B^{-1} $$


The map $h: A \times A \to K$ defined by


$$ h(a, b) = \mathrm{tr}\, \rho(\beta(a)b) $$


is bilinear, symmetric, and nondegenerate. The map
$\rho$ is an isometry of the quadratic space $(A, h)$ on its
image in the quadratic space $(\mathrm{End}\, S, B \otimes B^{-1})$.


Graded Algebras 


Definitions An algebra $A$ is said to be $\mathbb{Z}$-graded
(resp., $\mathbb{Z}_2$-graded) if there is a decomposition of the
underlying vector space $A = \bigoplus_{p \in \mathbb{Z}} A^p$ (resp.,
$A = A^0 \oplus A^1$) such that $A^p A^q \subset A^{p+q}$. In a $\mathbb{Z}_2$-graded
algebra, it is understood that $p + q$ is reduced mod 2. If
$a \in A^p$, then $a$ is said to be homogeneous of degree $p$.
The exterior algebra $\wedge V$ of a vector space $V$ is
$\mathbb{Z}$-graded. Every $\mathbb{Z}$-graded algebra becomes $\mathbb{Z}_2$-graded
when one reduces the degree of every element
mod 2. A graded isomorphism of graded algebras
is an isomorphism that preserves the grading.

A $\mathbb{Z}_2$-grading of $A$ is characterized by the
involutive automorphism $\alpha$ such that, if $a \in A^p$,
then $\alpha(a) = (-1)^p a$. From now on, grading means
$\mathbb{Z}_2$-grading unless otherwise specified. The elements
of $A^0$ (resp., $A^1$) are said to be even (resp., odd). It
is often convenient to denote the graded algebra as


$$ A^0 \to A $$   [7]


Given such an algebra over $K$ and $N \in \mathbb{N}$, one
constructs the graded algebra $A^0(N) \to A(N)$. Two
graded algebras over $K$, $A^0 \to A$ and $A'^0 \to A'$, are
said to be of the same type if there are integers $N$
and $N'$ such that the algebras $A^0(N) \to A(N)$ and
$A'^0(N') \to A'(N')$ are graded isomorphic. The property
of being of the same type is an equivalence
relation in the set of all graded algebras over $K$.

Given an algebra $A$, one constructs two “canonical”
graded algebras as follows:


1. the double algebra

$$ A \to A \oplus A $$

graded by the “swap” automorphism, $\alpha(a_1, a_2) =
(a_2, a_1)$ for $a_1, a_2 \in A$;

2. the algebra

$$ A \oplus A \to A(2) $$

defined by declaring the diagonal (resp., antidiagonal)
elements of $A(2)$ to be even (resp., odd).


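The real algebra $\mathbb{R}(2)$ has also another grading,
given by the involutive automorphism $\alpha$ such that
$\alpha(a) = \varepsilon a \varepsilon^{-1}$, where $a \in \mathbb{R}(2)$ and $\varepsilon$ is as in [2]. In
this case, [7] reads

$$ \mathbb{C} \to \mathbb{R}(2) $$


There are also graded algebras over $\mathbb{R}$:


$$ \mathbb{R} \to \mathbb{C}, \quad \mathbb{C} \to \mathbb{H}, \quad \text{and} \quad \mathbb{H} \to \mathbb{C}(2) $$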








The grading of the last algebra can be defined by 
declaring the Pauli matrices and i to be odd. 


Super Lie algebras A super Lie algebra is a graded
algebra $A$ such that the product $(a, b) \mapsto [a, b]$ is
super anticommutative, $[a, b] = -(-1)^{pq}[b, a]$, and
satisfies the super Jacobi identity,


$$ [a, [b, c]] = [[a, b], c] + (-1)^{pq}[b, [a, c]] $$


for every $a \in A^p$, $b \in A^q$ and $c \in A$. To every graded
associative algebra $A$ there corresponds a super Lie
algebra $GLA$: its underlying vector space and
grading are as in $A$ and the product, for $a \in A^p$
and $b \in A^q$, is given as the supercommutator $[a, b] =
ab - (-1)^{pq}ba$.


Supercentrality and graded simplicity A graded
algebra $A$ over $K$ is supercentral if $Z(A) \cap A^0 = K$.
The algebra $\mathbb{R} \to \mathbb{C}$ is supercentral, but the real
ungraded algebra $\mathbb{C}$ is not central.

A subalgebra $B$ of a graded algebra $A$ is said to be
a graded subalgebra if $B = (B \cap A^0) \oplus (B \cap A^1)$. A
graded ideal of $A$ is an ideal that is a graded
subalgebra. A graded algebra $A \neq \{0\}$ is said to be
graded simple if it has no graded ideals other than
$\{0\}$ and $A$. The double algebra of a simple algebra is
graded simple, but not simple.


The graded tensor product Let $A$ and $B$ be graded
algebras; the tensor product of their underlying
vector spaces admits a natural grading, $(A \otimes B)^p =
\bigoplus_q A^q \otimes B^{p-q}$. The product defined in [3] makes
$A \otimes B$ into a graded algebra. There is another “super”
product in the same graded vector space given by


$$ (a \otimes b) \cdot (a' \otimes b') = (-1)^{pq} aa' \otimes bb' $$


for $a' \in A^p$ and $b \in B^q$. The resulting graded algebra
is referred to as the graded tensor product and
denoted by $A \hat\otimes B$. For example, if $V$ and $W$ are
vector spaces, then the Grassmann algebra $\wedge(V \oplus
W)$ is isomorphic to $\wedge V \hat\otimes \wedge W$.


Clifford Algebras 
Definitions: The Universal Property and Grading 


The Clifford algebra associated with a quadratic
space $(V, g)$ is the quotient algebra


$$ \mathrm{C}\ell(V, g) = \mathcal{T}(V)/\mathcal{J}(V, g) $$   [8]


where $\mathcal{J}(V, g)$ is the ideal in the tensor algebra $\mathcal{T}(V)$
generated by all elements of the form $v \otimes v -
g(v, v)1_{\mathcal{T}(V)}$, $v \in V$.

The Clifford algebra is associative with a unit
element denoted by 1. One denotes by $\kappa$ the
canonical map of $\mathcal{T}(V)$ onto $\mathrm{C}\ell(V, g)$ and by $ab$
the product of two elements $a, b \in \mathrm{C}\ell(V, g)$ so that
$\kappa(P \otimes Q) = \kappa(P)\kappa(Q)$ for $P, Q \in \mathcal{T}(V)$. The map $\kappa$ is
injective on $K \oplus V$, and one identifies this subspace of
$\mathcal{T}(V)$ with its image under $\kappa$. With this identification,
for all $u, v \in V$, one has


uv + vu =2g(u,v) 


Clifford algebras are characterized by their universal 
property described in the following proposition. 


Proposition (C) Let $A$ be an algebra with a unit $1_A$
and let $f: V \to A$ be a Clifford map, that is, a linear
map such that $f(v)^2 = g(v, v)1_A$ for every $v \in V$. There
then exists a homomorphism $\hat f: \mathrm{C}\ell(V, g) \to A$ of
algebras with units, an extension of $f$, so that $\hat f(v) = f(v)$
for every $v \in V$.


As a corollary, one obtains 


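Proposition (D) If $f$ is an isometry of $(V, g)$ into
$(W, h)$, then there is a homomorphism of algebras
$\mathrm{C}\ell(f): \mathrm{C}\ell(V, g) \to \mathrm{C}\ell(W, h)$ extending $f$ so that there
is the commutative diagram


$$ \begin{array}{ccc}
\mathrm{C}\ell(V, g) & \xrightarrow{\ \mathrm{C}\ell(f)\ } & \mathrm{C}\ell(W, h) \\
\uparrow & & \uparrow \\
V & \xrightarrow{\quad f \quad} & W
\end{array} $$


For example, the isometry $v \mapsto -v$ extends to the
involutive main automorphism $\alpha$ of $\mathrm{C}\ell(V, g)$, defining
its $\mathbb{Z}_2$-grading:


$$ \mathrm{C}\ell(V, g) = \mathrm{C}\ell^0(V, g) \oplus \mathrm{C}\ell^1(V, g) $$


The algebra $\mathrm{C}\ell(V, g)$ admits also an involutive canonical
antiautomorphism $\beta$ characterized by $\beta(1) = 1$
and $\beta(v) = v$ for every $v \in V$.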


The Vector Space Structure of Clifford Algebras 


Referring to proposition (D), let $A = \mathrm{End}(\wedge V)$ and, for
every $v \in V$ and $w \in \wedge V$, put $f(v)w = v \wedge w + g(v) \lrcorner\, w$;
then $f: V \to \mathrm{End}(\wedge V)$ is a Clifford map and the map


$$ i: \mathrm{C}\ell(V, g) \to \wedge V $$   [9]


given by $i(a) = \hat f(a)1_{\wedge V}$ is an isomorphism of vector
spaces. This proves


Proposition (E) As a vector space, the algebra
$\mathrm{C}\ell(V, g)$ is isomorphic to the exterior algebra $\wedge V$.


If $V$ is $m$-dimensional, then $\mathrm{C}\ell(V, g)$ is
$2^m$-dimensional. The linear isomorphism [9] defines a
$\mathbb{Z}$-grading of the vector space underlying the Clifford
algebra: if $i(a_k) \in \wedge^k V$, then $a_k$ is said to be of
Grassmann degree $k$. Every element $a \in \mathrm{C}\ell(V, g)$
decomposes into its Grassmann components,
$a = \sum_k a_k$. The Clifford product of two elements of
Grassmann degrees $k$ and $l$ decomposes as follows:
$a_k b_l = \sum_p (a_k b_l)_p$, and $(a_k b_l)_p = 0$ if $p < |k - l|$ or
$p \equiv k - l + 1 \bmod 2$ or $p > m - |m - k - l|$.

One often uses [9] to identify the vector spaces $\wedge V$
and $\mathrm{C}\ell(V, g)$; this having been done, one can write,
for every $v \in V$ and $a \in \mathrm{C}\ell(V, g)$,


$$ va = v \wedge a + g(v) \lrcorner\, a $$   [10]


so that $[v, a] = 2g(v) \lrcorner\, a$, where $[\ ,\ ]$ is the supercommutator.
It defines a super Lie algebra structure in the
vector space $K \oplus V$. The quadratic form defined by $g$
need not be nondegenerate; for example, if it is the
0-form, then [10] shows that the Clifford and exterior
multiplications coincide and $\mathrm{C}\ell(V, 0)$ is isomorphic, as
an algebra, to the Grassmann algebra.


Complexification of Real Clifford Algebras 


Proposition (F) If $(V, g)$ is a real quadratic space,
then the algebras $\mathbb{C} \otimes \mathrm{C}\ell(V, g)$ and $\mathrm{C}\ell(\mathbb{C} \otimes V, \mathbb{C} \otimes g)$
are isomorphic, as graded algebras over $\mathbb{C}$.


From now on, through the end of the article, one 
assumes that (V,g) is an orthogonal space over 
K=R or C. 

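The Clifford algebra associated with the orthogonal
space $\mathbb{C}^m$ is denoted by $\mathrm{C}\ell_m$. The Clifford
algebra associated with the orthogonal space
$(\mathbb{R}^{k+l}, g)$, where $g$ is of signature $(k, l)$, is denoted
by $\mathrm{C}\ell_{k,l}$ so that $\mathbb{C} \otimes \mathrm{C}\ell_{k,l} = \mathrm{C}\ell_{k+l}$.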


Relations between Clifford Algebras in Spaces of 
Adjacent Dimensions 


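Consider an orthogonal space $(V, g)$ over $K$ and the
one-dimensional orthogonal space $(K, h_1)$, having a
unit vector $w \in K$, $h_1(w, w) = \varepsilon$, where $\varepsilon = 1$ or $-1$.
The map $V \ni v \mapsto vw \in \mathrm{C}\ell^0(V \oplus K, g \oplus h_1)$ satisfies
$(vw)^2 = -\varepsilon g(v, v)$ and extends to the isomorphism
of algebras $\mathrm{C}\ell(V, -\varepsilon g) \to \mathrm{C}\ell^0(V \oplus K, g \oplus h_1)$. This
proves


Proposition (G) There are isomorphisms of algebras:
$\mathrm{C}\ell_{k,l} \to \mathrm{C}\ell^0_{k+1,l}$ and $\mathrm{C}\ell_{k,l} \to \mathrm{C}\ell^0_{l,k+1}$.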


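Consider the orthogonal space $(K^2, h)$ with a
neutral $h$ such that, for $\lambda, \mu \in K$, one has
$\langle (\lambda, \mu), h(\lambda, \mu) \rangle = \lambda\mu$. The map


$$ K^2 \to K(2), \qquad (\lambda, \mu) \mapsto \begin{pmatrix} 0 & \lambda \\ \mu & 0 \end{pmatrix} $$


has the Clifford property and establishes the
isomorphisms represented by the horizontal arrows
in the diagram


$$ \begin{array}{ccc}
\mathrm{C}\ell(K^2, h) & \to & K(2) \\
\uparrow & & \uparrow \\
\mathrm{C}\ell^0(K^2, h) & \to & K \oplus K
\end{array} $$   [11]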


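Proposition (H) If $(K^2, h)$ is neutral and $(V, g)$ is
over $K$, then the algebra $\mathrm{C}\ell(V \oplus K^2, g \oplus h)$ is
isomorphic to the algebra $\mathrm{C}\ell(V, g) \otimes K(2)$. Specifically,
there are isomorphisms


$$ \mathrm{C}\ell_{k+1,l+1} = \mathrm{C}\ell_{k,l} \otimes \mathbb{R}(2) $$   [12]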


The Chevalley Theorem and the Brauer-Wall 
Group 


If $(V, g)$ and $(W, h)$ are quadratic spaces over $K$, then
their sum is the quadratic space $(V \oplus W, g \oplus h)$
characterized by $g \oplus h: V \oplus W \to V^* \oplus W^*$ so that
$(g \oplus h)(v, w) = (g(v), h(w))$. By noting that the map
$V \oplus W \ni (v, w) \mapsto v \otimes 1 + 1 \otimes w \in \mathrm{C}\ell(V, g) \hat\otimes \mathrm{C}\ell(W, h)$
has the Clifford property, Chevalley proved


Proposition (I) The algebra $\mathrm{C}\ell(V \oplus W, g \oplus h)$ is
isomorphic to the algebra $\mathrm{C}\ell(V, g) \hat\otimes \mathrm{C}\ell(W, h)$.


The type of the (graded) algebra $\mathrm{C}\ell(V \oplus W, g \oplus h)$
depends only on the types of $\mathrm{C}\ell(V, g)$ and $\mathrm{C}\ell(W, h)$.
The Chevalley theorem (I) shows that the set of types
of Clifford algebras over $K$ forms an abelian group for
a multiplication induced by the graded tensor product.
The unit of this Brauer-Wall group of $K$ is the type of
the algebra $\mathrm{C}\ell(K^2, h)$ described in [11]; for a full
account with proofs, see Wall (1963).








The Volume Element and the Centers 


Let $e = (e_\mu)$ be an orthonormal frame in $(V, g)$. The
volume element associated with $e$ is


$$ \eta = e_1 e_2 \cdots e_m $$


If $\eta'$ is the volume element associated with another
orthonormal frame $e'$ in the same orthogonal space,
then either $\eta' = \eta$ ($e$ and $e'$ are of the same
orientation) or $\eta' = -\eta$ ($e$ and $e'$ are of opposite
orientation). For $K = \mathbb{C}$, one has $\eta^2 = 1$; for $K = \mathbb{R}$
and $g$ of signature $(k, l)$ one has


$$ \eta^2 = (-1)^{(1/2)(k-l)(k-l+1)} $$   [13]


It is convenient to define $\iota \in \{1, i\}$ so that $\eta^2 = \iota^2$. For
every $v \in V$ one has $v\eta = (-1)^{m+1}\eta v$. The structure of
the centers of Clifford algebras is as follows:


Proposition (J) If $m$ is even, then $Z(\mathrm{C}\ell(V, g)) = K$
and $Z(\mathrm{C}\ell^0(V, g)) = K \oplus K\eta$. If $m$ is odd, then
$Z(\mathrm{C}\ell(V, g)) = K \oplus K\eta$ and $Z(\mathrm{C}\ell^0(V, g)) = K$.

The graded algebra $\mathrm{C}\ell(V, g)$ is supercentral for
every $m$.
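
A short worked check may help with the sign of the squared volume element. Reading [13] as $\eta^2 = (-1)^{(1/2)(k-l)(k-l+1)}$, which is the reading consistent with $\mathrm{C}\ell_{1,0} = \mathbb{C}$ and with $\eta^2 = 1$ in $\mathrm{C}\ell_{4,0}$ as used in this article, three sample signatures give the values below. The choice of examples is illustrative only.

```latex
\begin{align*}
\mathrm{C}\ell_{1,0}: &\quad k - l = 1, & \eta^2 &= (-1)^{\frac{1}{2}\cdot 1\cdot 2} = -1, & \iota &= i, \\
\mathrm{C}\ell_{3,1}: &\quad k - l = 2, & \eta^2 &= (-1)^{\frac{1}{2}\cdot 2\cdot 3} = -1, & \iota &= i, \\
\mathrm{C}\ell_{4,0}: &\quad k - l = 4, & \eta^2 &= (-1)^{\frac{1}{2}\cdot 4\cdot 5} = +1, & \iota &= 1.
\end{align*}
```

In the first case $m = 1$ is odd and $\eta = e_1$, so the center $K \oplus K\eta$ of proposition (J) is a copy of $\mathbb{C}$.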








The Structure of Clifford Algebras 


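The complex case Using [4] one obtains from [11]
and [12] the isomorphisms of algebras


$$ \mathrm{C}\ell^0_{2n+1} = \mathrm{C}\ell_{2n} = \mathbb{C}(2^n) $$   [14]

$$ \mathrm{C}\ell^0_{2n+2} = \mathrm{C}\ell_{2n+1} = \mathbb{C}(2^n) \oplus \mathbb{C}(2^n) $$   [15]

for $n = 0, 1, 2, \ldots$. Therefore, there are only two types
of complex Clifford algebras, represented by
$\mathbb{C} \to \mathbb{C} \oplus \mathbb{C}$ and $\mathbb{C} \oplus \mathbb{C} \to \mathbb{C}(2)$: the Brauer-Wall
group of $\mathbb{C}$ is $\mathbb{Z}_2$.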


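The real case In view of proposition (I) and
$\mathrm{C}\ell_{1,1} = \mathbb{R}(2)$, the algebra $\mathrm{C}\ell_{k,l}$ is of the same type as
$\mathrm{C}\ell_{k-l,0}$ if $k > l$ and of the same type as $\mathrm{C}\ell_{0,l-k}$
if $k < l$. Since $\mathrm{C}\ell_{k,l} \hat\otimes \mathrm{C}\ell_{l,k} = \mathrm{C}\ell_{k+l,k+l}$, the type
of $\mathrm{C}\ell_{l,k}$ is the inverse of the type of $\mathrm{C}\ell_{k,l}$. The algebra
$\mathrm{C}\ell^0_{4,0} \to \mathrm{C}\ell_{4,0}$ is isomorphic to $\mathbb{H} \oplus \mathbb{H} \to \mathbb{H}(2)$: if
$x = (x_1, x_2, x_3, x_4) \in \mathbb{R}^4 \subset \mathrm{C}\ell_{4,0}$ and $q = ix_1 + jx_2 +
kx_3 + x_4 \in \mathbb{H}$, then an isomorphism is obtained from
the Clifford map $f$,


$$ f(x) = \begin{pmatrix} 0 & q \\ -\bar q & 0 \end{pmatrix} $$   [16]


In view of [13], the volume element $\eta$ satisfies $\eta^2 = 1$.
By replacing $-\bar q$ with $\bar q$ in [16], one shows that $\mathrm{C}\ell_{0,4}$
is also isomorphic to $\mathbb{H}(2)$. The map $\mathbb{R}^4 \times \mathbb{R}^{k+l} \to
\mathbb{H}(2) \otimes \mathrm{C}\ell_{k,l}$ given by $(x, y) \mapsto f(x) \otimes 1 + \eta \otimes y$ has
the Clifford property and establishes the isomorphism
of algebras $\mathrm{C}\ell_{k+4,l} = \mathbb{H}(2) \otimes \mathrm{C}\ell_{k,l}$. Since, similarly,
$\mathrm{C}\ell_{k,l+4} = \mathbb{H}(2) \otimes \mathrm{C}\ell_{k,l}$, one obtains the isomorphism


$$ \mathrm{C}\ell_{k+4,l} = \mathrm{C}\ell_{k,l+4} $$

Therefore,

$$ \mathrm{C}\ell_{k+8,l} = \mathrm{C}\ell_{k+4,l+4} = \mathrm{C}\ell_{k,l+8} = \mathrm{C}\ell_{k,l} \otimes \mathbb{R}(16) $$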


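and the algebras $\mathrm{C}\ell_{k,l}$, $\mathrm{C}\ell_{k+8,l}$, and $\mathrm{C}\ell_{k,l+8}$ are all of the
same type. This double periodicity of period 8 is
subsumed by saying that real Clifford algebras can be
arranged on a “spinorial chessboard.” The type of
$\mathrm{C}\ell^0_{k,l} \to \mathrm{C}\ell_{k,l}$ depends only on $k - l \bmod 8$; the eight
types have the following low-dimensional algebras as
representatives: $\mathrm{C}\ell_{1,0}$, $\mathrm{C}\ell_{2,0}$, $\mathrm{C}\ell_{3,0}$, $\mathrm{C}\ell_{4,0} = \mathrm{C}\ell_{0,4}$, $\mathrm{C}\ell_{0,3}$,
$\mathrm{C}\ell_{0,2}$, and $\mathrm{C}\ell_{0,1}$. The Brauer-Wall group of $\mathbb{R}$ is $\mathbb{Z}_8$,
generated by the type of $\mathrm{C}\ell^0_{1,0} \to \mathrm{C}\ell_{1,0}$, that is, by $\mathbb{R} \to
\mathbb{C}$. Bearing in mind the isomorphism $\mathrm{C}\ell_{k,l} = \mathrm{C}\ell^0_{k+1,l}$
and abbreviating $\mathbb{C} \to \mathbb{R}(2)$ to $\mathbb{C} \to \mathbb{R}$, etc., one can
arrange the types of real Clifford algebras in the form
of a “spinorial clock”:


$$ \begin{array}{ccccc}
\mathbb{R} & \xrightarrow{\ 7\ } & \mathbb{R} \oplus \mathbb{R} & \xrightarrow{\ 0\ } & \mathbb{R} \\
{\scriptstyle 6}\big\uparrow & & & & \big\downarrow{\scriptstyle 1} \\
\mathbb{C} & & & & \mathbb{C} \\
{\scriptstyle 5}\big\uparrow & & & & \big\downarrow{\scriptstyle 2} \\
\mathbb{H} & \xleftarrow{\ 4\ } & \mathbb{H} \oplus \mathbb{H} & \xleftarrow{\ 3\ } & \mathbb{H}
\end{array} $$   [17]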

Proposition (K) Recipe for determining $\mathrm{C}\ell^0_{k,l} \to \mathrm{C}\ell_{k,l}$:


(i) find the integers $\mu$ and $\nu$ such that
$k - l = 8\mu + \nu$ and $0 \leq \nu \leq 7$;

(ii) from the spinorial clock, read off $A^0_\nu \to A_\nu$, and
compute the real dimensions, $\dim A^0_\nu = 2^{\nu_0}$ and
$\dim A_\nu = 2^{\nu_1}$; and

(iii) form $\mathrm{C}\ell^0_{k,l} = A^0_\nu(2^{(1/2)(k+l-1-\nu_0)})$ and $\mathrm{C}\ell_{k,l} =
A_\nu(2^{(1/2)(k+l-\nu_1)})$.


The spinorial clock is symmetric with respect to
the reflection in the vertical line through its center;
this is a consequence of the isomorphism of algebras
$\mathrm{C}\ell_{k,l+2} = \mathrm{C}\ell_{l,k} \otimes \mathbb{R}(2)$.

Note that the “abstract” algebra $\mathrm{C}\ell_{k,l}$ carries, in
general, less information than the Clifford algebra
defined in [8], which contains $V$ as a distinguished
vector subspace with the quadratic form
$v \mapsto v^2 = g(v, v)$. For example, the algebras $\mathrm{C}\ell_{8,0}$,
$\mathrm{C}\ell_{4,4}$, and $\mathrm{C}\ell_{0,8}$ are all graded isomorphic.
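
The recipe of proposition (K) is mechanical enough to be sketched in code. The following is a minimal illustration and not part of the original article: the table CLOCK transcribes the spinorial clock [17] as read here (hour nu gives the pair A^0_nu -> A_nu), k is taken to count the basis vectors of square -1 (so that Cl_{1,0} = C), and the names CLOCK, LOG2_DIM, matrix_algebra and clifford are hypothetical.

```python
# Hour nu of the spinorial clock: (even subalgebra A^0_nu, full algebra A_nu).
# "R+R" and "H+H" stand for the direct sums R (+) R and H (+) H.
CLOCK = {
    0: ("R+R", "R"),
    1: ("R", "C"),
    2: ("C", "H"),
    3: ("H", "H+H"),
    4: ("H+H", "H"),
    5: ("H", "C"),
    6: ("C", "R"),
    7: ("R", "R+R"),
}

# log2 of the real dimension of each algebra appearing on the clock.
LOG2_DIM = {"R": 0, "C": 1, "H": 2, "R+R": 1, "H+H": 3}


def matrix_algebra(name, log2_size):
    """Name of the algebra of 2**log2_size square matrices over `name`."""
    size = 2 ** log2_size
    if "+" in name:
        base = name[0]
        return f"{base}({size}) + {base}({size})"
    return f"{name}({size})"


def clifford(k, l):
    """Return (Cl^0_{k,l}, Cl_{k,l}) as strings, following steps (i)-(iii)."""
    if k < 0 or l < 0 or k + l < 1:
        raise ValueError("need k, l >= 0 and k + l >= 1")
    nu = (k - l) % 8                          # step (i)
    even, full = CLOCK[nu]                    # step (ii)
    nu0, nu1 = LOG2_DIM[even], LOG2_DIM[full]
    n = k + l
    return (                                  # step (iii)
        matrix_algebra(even, (n - 1 - nu0) // 2),
        matrix_algebra(full, (n - nu1) // 2),
    )


print(clifford(3, 1))   # ('C(2)', 'H(2)')
print(clifford(8, 0))   # ('R(8) + R(8)', 'R(16)')
```

Here R(1), C(1) and H(1) simply mean R, C and H themselves.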


Theorem on Simplicity 


From general theory (Chevalley 1954) or by inspec- 
tion of [14], [15], and [17], one has 


Proposition (L) Let $m$ be the dimension of the
orthogonal space $(V, g)$ over $K$.


(i) If $m$ is even (resp., odd), then the algebra
$\mathrm{C}\ell(V, g)$ (resp., $\mathrm{C}\ell^0(V, g)$) over $K$ is central simple.

(ii) If $K = \mathbb{C}$ and $m$ is odd (resp., even), then the
algebra $\mathrm{C}\ell(V, g)$ (resp., $\mathrm{C}\ell^0(V, g)$) is the direct
sum of two isomorphic complex central simple
algebras.

(iii) If $K = \mathbb{R}$ and $m$ is odd (resp., even), then the
algebra $\mathrm{C}\ell(V, g)$ (resp., $\mathrm{C}\ell^0(V, g)$) when $\eta^2 = 1$ is
the direct sum of two isomorphic central simple
algebras and when $\eta^2 = -1$ is simple with a
center isomorphic to $\mathbb{C}$.


Representations 


The Pauli, Cartan, Dirac, and Weyl 
Representations 


Odd dimensions Let $(V, g)$ be of dimension
$m = 2n + 1$ over $K$. From propositions (A) and (L) it
follows that the central simple algebra $\mathrm{C}\ell^0(V, g)$ has a
unique, up to equivalence, faithful, and irreducible
representation in the complex $2^n$-dimensional vector
space $S$ of Pauli spinors. By putting $\sigma(\eta) = \iota I$ it is
extended to a Pauli representation $\sigma: \mathrm{C}\ell(V, g) \to
\mathrm{End}\, S$. Given an orthonormal frame $(e_\mu)$ in $V$, Pauli
endomorphisms (matrices if $S$ is identified with $\mathbb{C}^{2^n}$)
are defined as $\sigma_\mu = \sigma(e_\mu) \in \mathrm{End}\, S$. The representations
$\sigma$ and $\sigma \circ \alpha$ are complex inequivalent. For $K = \mathbb{C}$
none of them is faithful; their direct sum is the faithful
Cartan representation of $\mathrm{C}\ell(V, g)$ in $S \oplus S$. For $K = \mathbb{R}$
and $(1/2)(k - l - 1)$ even, the representations $\sigma$ and
$\sigma \circ \alpha$ are real equivalent and faithful. On computing
$\beta(\eta)$ one finds that the contragredient representation $\check\sigma$
is equivalent to $\sigma$ for $n$ even and to $\sigma \circ \alpha$ for $n$ odd.





Even dimensions Similarly, for $(V, g)$ of dimension
$m = 2n$ over $K$, the central simple algebra $\mathrm{C}\ell(V, g)$
has a unique, up to equivalence, faithful, and
irreducible representation $\gamma: \mathrm{C}\ell(V, g) \to \mathrm{End}\, S$ in the
$2^n$-dimensional complex vector space $S$ of Dirac
spinors. The Dirac endomorphisms (matrices) are
$\gamma_\mu = \gamma(e_\mu)$. Put $\Gamma = \iota\gamma(\eta)$ so that $\Gamma^2 = I$: the matrix $\Gamma$
generalizes the familiar $\gamma_5$. The Dirac representation $\gamma$
restricted to $\mathrm{C}\ell^0(V, g)$ decomposes into the sum $\gamma_+ \oplus \gamma_-$
of two irreducible representations in the vector spaces


$$ S_\pm = \{ s \in S \mid \Gamma s = \pm s \} $$


of Weyl (chiral) spinors. The elements of $S_+$ are said
to be of opposite chirality with respect to those of
$S_-$. The transpose $\Gamma^*$ defines a similar split of $S^*$.
The representations $\gamma_+$ and $\gamma_-$ are never complex-equivalent,
but they are real equivalent and
faithful for $K = \mathbb{R}$ and $(1/2)(k - l)$ odd.

The representations $\gamma \circ \alpha$ and $\check\gamma$ are both equivalent
to $\gamma$. It is convenient to describe simultaneously
the properties of the transpositions of the Pauli and
Dirac matrices; let $\rho_\mu$ be either the Pauli matrices
for $V$ of dimension $2n + 1$ or the Dirac matrices for
$V$ of dimension $2n$. There is a complex isomorphism
$B: S \to S^*$ such that


$$ \rho_\mu^* = (-1)^n B\rho_\mu B^{-1} $$   [18]


In the case of the Dirac matrices, the factor $(-1)^n$ in
[18] implies that this equation also holds for $\Gamma$ in
place of $\rho_\mu$. The isomorphism $B$ preserves (resp.,
changes) the chirality of Weyl spinors for $n$ even
(resp., odd). Every matrix of the form $B\gamma_{\mu_1} \cdots \gamma_{\mu_p}$,
where


$$ 1 \leq \mu_1 < \cdots < \mu_p \leq 2n $$   [19]


is either symmetric or antisymmetric, depending on
$p$ and the symmetry of $B$. A simple argument, based
on counting the number of such products of one
symmetry, leads to the equation


$$ B^* = (-1)^{(1/2)n(n+1)} B $$


valid in dimensions $2n$ and $2n + 1$.


Inner products on spinor spaces Let $S$ be the
complex vector space of Dirac or Pauli spinors
associated with $(V, g)$ over $K$. The isomorphism $B:
S \to S^*$ defines on $S$ an inner product
$B(s, t) = \langle s, B(t) \rangle$, $s, t \in S$, which is orthogonal for
$m \equiv 0, 1, 6,$ or $7 \bmod 8$ and symplectic for $m \equiv
2, 3, 4,$ or $5 \bmod 8$. For $m \equiv 0 \bmod 4$, this product
restricts to an inner product on the space of Weyl
spinors that is orthogonal for $m \equiv 0 \bmod 8$ and
symplectic for $m \equiv 4 \bmod 8$. For $m \equiv 2 \bmod 4$, the
map $B$ defines the isomorphisms $B_\pm: S_\pm \to S_\mp^*$.


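Example One of the most used representations $\gamma:
\mathrm{C}\ell_{3,1} \to \mathbb{C}(4)$ is given by the Dirac matrices


$$ \gamma_1 = \begin{pmatrix} 0 & \sigma_x \\ -\sigma_x & 0 \end{pmatrix}, \quad
\gamma_2 = \begin{pmatrix} 0 & \sigma_y \\ -\sigma_y & 0 \end{pmatrix}, \quad
\gamma_3 = \begin{pmatrix} 0 & \sigma_z \\ -\sigma_z & 0 \end{pmatrix}, \quad
\gamma_4 = \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} $$   [20]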


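Charge Conjugation and Majorana Spinors


Throughout this section and the next, one assumes
$K = \mathbb{R}$ so that, given a representation $\rho: \mathrm{C}\ell(V, g) \to
\mathrm{End}\, S$, one can form the complex (“charge”) conjugate
representation $\bar\rho: \mathrm{C}\ell(V, g) \to \mathrm{End}\, \bar S$ defined by
$\bar\rho(a) = \overline{\rho(a)}$ and the Hermitian conjugate representation
$\rho^\dagger: \mathrm{C}\ell(V, g) \to \mathrm{End}\, \bar S^*$, where $\rho^\dagger(a) = \bar{\check\rho}(a)$.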


Even dimensions The representations $\gamma$ and $\bar\gamma$ are
equivalent: there is an isomorphism $C: S \to \bar S$ such
that


$$ \bar\gamma_\mu = C\gamma_\mu C^{-1} $$   [21]


The automorphism $\bar C C$ is in the commutant of $\gamma$; it
is, therefore, proportional to $I$ and, by a change of
scale, one can achieve $\bar C C = I$ for $k - l \equiv 0$ or
$6 \bmod 8$ and $\bar C C = -I$ for $k - l \equiv 2$ or $4 \bmod 8$.

The spinor $s_c = C^{-1}\bar s \in S$ is the charge conjugate of
$s \in S$. If $\psi: V \to S$ is a solution of the Dirac equation


$$ (\gamma^\mu(\partial_\mu - iqA_\mu) - \kappa)\psi = 0 $$


for a particle of electric charge $q$, then $\psi_c$ is a
solution of the same equation with the opposite
charge. Since


$$ \bar\Gamma = (-1)^{(1/2)(k-l)} C\Gamma C^{-1} $$


charge conjugation preserves (resp., changes) the
chirality of Weyl spinors for $(1/2)(k - l)$ even (resp.,
odd).

If $\bar C C = I$, then


$$ \mathrm{Re}\, S = \{ s \in S \mid s_c = s \} $$


is a real vector space of dimension $2^n$, the space of
Dirac-Majorana spinors. The representation $\gamma$ is
real: restricted to $\mathrm{Re}\, S$ and expressed with respect to
a frame in this space, it is given by real $2^n \times 2^n$
matrices. For $k - l \equiv 0 \bmod 8$ the representations $\gamma_+$
and $\gamma_-$ are both real: in this case there are
Weyl-Majorana spinors.

Odd dimensions On computing $\bar\sigma(\eta)$ one finds that
the conjugate representation $\bar\sigma$ is equivalent to $\sigma$
(resp., $\sigma \circ \alpha$) if $\eta^2 = 1$ (resp., $\eta^2 = -1$). There is an
isomorphism $C: S \to \bar S$ such that


$$ \bar\sigma_\mu = (-1)^{(1/2)(k-l+1)} C\sigma_\mu C^{-1} $$   [22]


and $\bar C C = I$ (resp., $\bar C C = -I$) for $k - l \equiv 1$ or $7 \bmod 8$
(resp., $k - l \equiv 3$ or $5 \bmod 8$). For $k - l \equiv 1 \bmod 8$, the
restriction of the Pauli representation to $\mathrm{C}\ell^0_{k,l}$ is real
and the Pauli matrices are pure imaginary; for $k - l \equiv
7 \bmod 8$, the Pauli representations of $\mathrm{C}\ell_{k,l}$ are both real
and so are the Pauli matrices. In both these cases there
are Pauli-Majorana spinors.


Hermitian Scalar Products and Multivectors 


For $m = k + l$ odd and $C$ as in [22], the map
$A = \bar B C: S \to \bar S^*$ intertwines the representations $\sigma^\dagger$
and $\sigma$ (resp., $\sigma \circ \alpha$) for $k$ even (resp., odd),


$$ \sigma_\mu^\dagger = (-1)^k A\sigma_\mu A^{-1} $$


By rescaling of $B$, the map $A$ can be made
Hermitian. The corresponding Hermitian form
$s \mapsto A(s, s)$ is definite if and only if $k$ or $l = 0$;
otherwise, it is neutral.

For $m = k + l$ even, the representations $\gamma^\dagger$ and $\gamma$
are equivalent and one can define a Hermitian
isomorphism $A: S \to \bar S^*$ so that


$$ \gamma_\mu^\dagger = A\gamma_\mu A^{-1} $$   [23]


The isomorphism $A' = A\Gamma$ intertwines the representations
$\gamma^\dagger$ and $\gamma \circ \alpha$; it can also be made Hermitian
by rescaling. The Hermitian form $A(s, s)$ is definite
for $k = 0$ and $A'(s, s)$ is definite for $l = 0$; otherwise,
these forms are neutral. For example, in the familiar
representation [20], one has $A = \gamma_4$, a neutral form.

For $p = 0, 1, \ldots, m = 2n$, two spinors $s$ and $t \in S$
define the $p$-vector with components


$$ A_{\mu_1 \ldots \mu_p}(s, t) = \langle \bar s, A\gamma_{\mu_1} \cdots \gamma_{\mu_p} t \rangle $$   [24]


where the indices are as in [19]. The Hermiticity of
$A$ and [23] imply


$$ \overline{A_{\mu_1 \ldots \mu_p}(s, t)} = (-1)^{(1/2)p(p-1)} A_{\mu_1 \ldots \mu_p}(t, s) $$


In view of $\Gamma^\dagger = (-1)^k A\Gamma A^{-1}$, the map $A$ defines,
for $k$ even, a nondegenerate Hermitian scalar
product on the spaces $S_\pm$, whereas $A(s, t) = 0$ if $s$
and $t$ are Weyl spinors of opposite chiralities. For $k$
odd, $A$ changes the chirality.


The Radon-Hurwitz Numbers 


Proposition (M) For every integer $m > 0$, the
algebra $\mathrm{C}\ell_{m,0}$ has an irreducible real representation
$\rho$ of dimension $2^{\chi(m)}$, where $\chi(m)$ is the $m$th Radon-Hurwitz
number given by


$$ \begin{array}{c|cccccccc}
m & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline
\chi(m) & 1 & 2 & 2 & 3 & 3 & 3 & 3 & 4
\end{array} $$


and $\chi(m + 8) = \chi(m) + 4$. The matrices $\rho_\mu \in \mathbb{R}(2^{\chi(m)})$,
$\mu = 1, \ldots, m$, defining these representations satisfy


$$ \rho_\mu\rho_\nu + \rho_\nu\rho_\mu = -2\delta_{\mu\nu}I $$


and can be chosen so as to be antisymmetric. In all
dimensions other than $m \equiv 3 \bmod 4$ the representations
are faithful.

For $m \equiv 2$ and $4 \bmod 8$ (resp., $m \equiv 1, 3,$ and
$5 \bmod 8$) the representations $\rho$ are the realifications of
the corresponding Dirac (resp., Pauli) representations.
In dimensions $m \equiv 0$ and $6 \bmod 8$ (resp.,
$m \equiv 7 \bmod 8$) the Dirac (resp., Pauli) representations
themselves are real.


Inductive Construction 
of Representations 


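An inductive construction of the Pauli
representations


$$ \sigma: \mathrm{C}\ell_{n-1,n} \to \mathbb{R}(2^{n-1}), \quad n = 1, 2, \ldots $$


and of the Dirac representations

$$ \gamma: \mathrm{C}\ell_{n,n} \to \mathbb{R}(2^n), \quad n = 1, 2, \ldots $$

is as follows.


1. In dimension 1, put $\sigma_1 = 1$.
2. Given $\sigma_\mu \in \mathbb{R}(2^{n-1})$, $\mu = 1, \ldots, 2n - 1$, define

$$ \gamma_\mu = \begin{pmatrix} 0 & \sigma_\mu \\ \sigma_\mu & 0 \end{pmatrix} \quad \text{for } \mu = 1, \ldots, 2n - 1, \qquad
\gamma_{2n} = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix} $$

3. Given $\gamma_\mu \in \mathbb{R}(2^n)$, $\mu = 1, \ldots, 2n$, define $\sigma_\mu = \gamma_\mu$
for $\mu = 1, \ldots, 2n$, and $\sigma_{2n+1} = \gamma_1 \cdots \gamma_{2n}$.


All entries of these matrices are either 0, 1, or $-1$;
therefore, they can be used to construct representations
of Clifford algebras of orthogonal spaces over
any commutative field of characteristic $\neq 2$.

By induction, one has $\sigma_\mu^* = (-1)^{\mu+1}\sigma_\mu$. Therefore,
the isomorphisms appearing in [18] are
$B = \gamma_2\gamma_4 \cdots \gamma_{2n}$ for both $m = 2n$ and $2n + 1$.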
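
Steps 1 to 3 of the inductive construction can be sketched in a few lines of code. This is a minimal illustration under the reading of the construction adopted here (gamma_mu built from sigma_mu in off-diagonal blocks, gamma_2n with blocks -I and I, and sigma_{2n+1} as the ordered product); the function names are hypothetical and matrices are plain lists of integer rows.

```python
def matmul(a, b):
    """Product of two square integer matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def zeros(n):
    return [[0] * n for _ in range(n)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def block(top_left, top_right, bottom_left, bottom_right):
    """Assemble a 2N x 2N matrix from four N x N blocks."""
    top = [r1 + r2 for r1, r2 in zip(top_left, top_right)]
    bottom = [r1 + r2 for r1, r2 in zip(bottom_left, bottom_right)]
    return top + bottom


def dirac_from_pauli(sigmas):
    """Step 2: 2n-1 matrices of size N give 2n matrices of size 2N."""
    n = len(sigmas[0])
    gammas = [block(zeros(n), s, s, zeros(n)) for s in sigmas]
    gammas.append(block(zeros(n), scale(-1, identity(n)), identity(n), zeros(n)))
    return gammas


def pauli_from_dirac(gammas):
    """Step 3: keep the 2n matrices and append their ordered product."""
    product = gammas[0]
    for g in gammas[1:]:
        product = matmul(product, g)
    return gammas + [product]


def generators(m):
    """Matrices for an m-dimensional space, m >= 1, starting from sigma_1 = 1."""
    mats = [[[1]]]
    while len(mats) < m:
        mats = dirac_from_pauli(mats) if len(mats) % 2 == 1 else pauli_from_dirac(mats)
    return mats


def satisfies_clifford_relations(mats):
    """Check rho_mu rho_nu + rho_nu rho_mu = 2 (-1)^(mu+1) delta_{mu nu} I."""
    n = len(mats[0])
    for mu, a in enumerate(mats, start=1):
        for nu, b in enumerate(mats, start=1):
            ab, ba = matmul(a, b), matmul(b, a)
            anti = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]
            expected = scale(2 * (-1) ** (mu + 1), identity(n)) if mu == nu else zeros(n)
            if anti != expected:
                return False
    return True


for m in range(1, 7):
    mats = generators(m)
    print(m, len(mats[0]), satisfies_clifford_relations(mats))
```

In this sketch the matrices with odd index square to I and those with even index square to -I, and in dimension 3 the three matrices are sigma_x, the real matrix with rows (0, -1) and (1, 0), and sigma_z.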

By multiplying some of the matrices $\sigma_\mu$ or $\gamma_\mu$ by the
imaginary unit, one obtains complex representations
of the Clifford algebras associated with the quadratic
forms of other signatures. For example, in dimension
3, $(\sigma_1, i\sigma_2, \sigma_3)$ are the Pauli matrices. In dimension 4,
multiplying $\gamma_2$ by $i$ one obtains the Dirac matrices for $g$
of signature (1,3), in the “chiral representation”:


$$ \gamma_1 = \begin{pmatrix} 0 & \sigma_x \\ \sigma_x & 0 \end{pmatrix}, \quad
\gamma_2 = \begin{pmatrix} 0 & \sigma_y \\ \sigma_y & 0 \end{pmatrix}, \quad
\gamma_3 = \begin{pmatrix} 0 & \sigma_z \\ \sigma_z & 0 \end{pmatrix}, \quad
\gamma_4 = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix} $$   [25]


To obtain the real Majorana representation one uses
the following fact:


Proposition (N) If the matrix $C \in \mathbb{R}(2^n)$ is such
that $C^2 = I$ and [21] holds, then the matrices
$(I + iC)\gamma_\mu(I + iC)^{-1}$, $\mu = 1, \ldots, 2n$, are real.


For the matrices [25], one can take $C = \gamma_1\gamma_3\gamma_4$ to
obtain


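The real representations described in proposition
(M) can be obtained by the following direct inductive
construction. Consider the following seven real antisymmetric
and anticommuting $8 \times 8$ matrices:


$$ \begin{aligned}
\rho_1 &= \sigma_z \otimes I \otimes \varepsilon, & \rho_2 &= \sigma_z \otimes \varepsilon \otimes \sigma_x, \\
\rho_3 &= \sigma_z \otimes \varepsilon \otimes \sigma_z, & \rho_4 &= \sigma_x \otimes \varepsilon \otimes I, \\
\rho_5 &= \sigma_x \otimes \sigma_x \otimes \varepsilon, & \rho_6 &= \sigma_x \otimes \sigma_z \otimes \varepsilon, \\
\rho_7 &= \varepsilon \otimes I \otimes I
\end{aligned} $$   [26]


For $m = 4, 5, 6,$ and $7$ the matrices $\rho_1, \ldots, \rho_m$ generate
the representations of $\mathrm{C}\ell_{m,0}$ in $\mathbb{R}^8$. The eight
matrices $\theta_\mu = \sigma_x \otimes \rho_\mu$, $\mu = 1, \ldots, 7$, and $\theta_8 = \varepsilon \otimes I \otimes
I \otimes I$ give the required representation of $\mathrm{C}\ell_{8,0}$ in
$\mathbb{R}^{16}$. By dropping the first factor in $\rho_1, \rho_2, \rho_3$, one
obtains the matrices generating a representation of
$\mathrm{C}\ell_{3,0}$ in $\mathbb{R}^4$, etc. The symmetric matrix
$\Theta = \theta_1 \cdots \theta_8 = \sigma_z \otimes I \otimes I \otimes I$ anticommutes with all
the $\theta$s and $\Theta^2 = I$. If the matrices $\rho_\mu \in \mathbb{R}(2^{\chi(m)})$
correspond to a representation of $\mathrm{C}\ell_{m,0}$, then the
$m + 8$ matrices $\Theta \otimes \rho_1, \ldots, \Theta \otimes \rho_m$, $\theta_1 \otimes I, \ldots, \theta_8 \otimes I$
generate the required representation of $\mathrm{C}\ell_{m+8,0}$.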
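
The stated properties of the seven matrices in [26] can be checked directly. The sketch below builds them as Kronecker products and tests antisymmetry and the relations rho_mu rho_nu + rho_nu rho_mu = -2 delta I. It assumes the usual real matrices sigma_x and sigma_z and takes epsilon to have rows (0, -1) and (1, 0); the article's definition [2] lies outside this section, and the checks do not depend on the sign of epsilon. The helper names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SZ = [[1, 0], [0, -1]]
EPS = [[0, -1], [1, 0]]   # assumed form of epsilon (see the lead-in)


def kron3(a, b, c):
    return kron(kron(a, b), c)


# The seven matrices of [26], in order rho_1, ..., rho_7.
RHO = [
    kron3(SZ, I2, EPS),
    kron3(SZ, EPS, SX),
    kron3(SZ, EPS, SZ),
    kron3(SX, EPS, I2),
    kron3(SX, SX, EPS),
    kron3(SX, SZ, EPS),
    kron3(EPS, I2, I2),
]

# Dropping the first factor of rho_1, rho_2, rho_3 gives 4 x 4 matrices.
RHO_3 = [kron(I2, EPS), kron(EPS, SX), kron(EPS, SZ)]


def is_antisymmetric(a):
    transposed = [list(col) for col in zip(*a)]
    return transposed == [[-x for x in row] for row in a]


def generates_negative_definite_clifford(mats):
    """True if all matrices are antisymmetric and anticommute to -2 delta I."""
    n = len(mats[0])
    minus_two_identity = [[-2 if i == j else 0 for j in range(n)] for i in range(n)]
    zero = [[0] * n for _ in range(n)]
    for i, a in enumerate(mats):
        if not is_antisymmetric(a):
            return False
        for j, b in enumerate(mats):
            ab, ba = matmul(a, b), matmul(b, a)
            anti = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]
            if anti != (minus_two_identity if i == j else zero):
                return False
    return True


print(generates_negative_definite_clifford(RHO))
print(generates_negative_definite_clifford(RHO_3))
```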


Vector Fields on Spheres 
and Division Algebras 


It is known that even-dimensional spheres have no
nowhere-vanishing tangent vector fields. All such
fields on odd-dimensional spheres can be constructed
with the help of the representation $\rho$ described in
proposition (M). Given a positive even integer $N$, let
$m$ be the largest integer such that $N = 2^{\chi(m)} p$, where
$p$ is an odd integer. Consider the unit sphere
$S_{N-1} = \{ x \in \mathbb{R}^N \mid \|x\| = 1 \}$ of dimension $N - 1$. For
$v \in \mathbb{R}^m$, put $\rho'(v) = \rho(v) \otimes I$, where $I \in \mathbb{R}(p)$ is the
unit matrix. Since $\rho(v)$ is antisymmetric, so is the
matrix $\rho'(v) \in \mathbb{R}(N)$. Therefore, for every $x \in S_{N-1}$,
the vector $\rho'(v)x$ is orthogonal to $x$. The map
$x \mapsto \rho'(v)x$ defines a vector field on $S_{N-1}$ that
vanishes nowhere unless $v = 0$: the $(N-1)$-sphere
admits a set of $m$ tangent vector fields which are
linearly independent at every point. Using methods of
algebraic topology, it has been shown that this
method gives the maximum number of linearly
independent tangent vector fields on spheres.
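
The count of vector fields described above is easy to tabulate. The following sketch computes the Radon-Hurwitz numbers from the values chi(1), ..., chi(8) = 1, 2, 2, 3, 3, 3, 3, 4 together with the periodicity chi(m + 8) = chi(m) + 4, and then finds the largest m with N = 2^chi(m) p, p odd. The function names chi and max_vector_fields are hypothetical.

```python
CHI_BASE = {1: 1, 2: 2, 3: 2, 4: 3, 5: 3, 6: 3, 7: 3, 8: 4}


def chi(m):
    """m-th Radon-Hurwitz number, using chi(m + 8) = chi(m) + 4."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    periods, rest = divmod(m - 1, 8)
    return CHI_BASE[rest + 1] + 4 * periods


def max_vector_fields(n_dim):
    """Largest m such that n_dim = 2**chi(m) * (odd number); 0 if n_dim is odd."""
    if n_dim < 1:
        raise ValueError("n_dim must be a positive integer")
    exponent = 0
    while n_dim % 2 == 0:
        n_dim //= 2
        exponent += 1
    best, candidate = 0, 1
    # chi is nondecreasing, so the scan can stop once chi exceeds the exponent.
    while chi(candidate) <= exponent:
        if chi(candidate) == exponent:
            best = candidate
        candidate += 1
    return best


for n_dim in (2, 4, 8, 16, 32):
    print(n_dim, max_vector_fields(n_dim))
```

For example, N = 4 gives m = 3 and N = 8 gives m = 7, the two parallelizable cases S_3 and S_7 beyond the circle.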

If $m = 1, 3,$ or $7$, then $m + 1 = 2^{\chi(m)}$ and, for these
values of $m$, the sphere $S_m$ is parallelizable. Moreover,
one can then introduce in $\mathbb{R}^{m+1}$ the structure
of an algebra $A_m$ as follows. Put $\rho_0 = I$. If $e_0 \in \mathbb{R}^{m+1}$
is a unit vector and $e_\mu = \rho_\mu(e_0)$, then $(e_0, e_1, \ldots, e_m)$
is an orthonormal frame in $\mathbb{R}^{m+1}$. The product of
$x = \sum_{\mu=0}^m x_\mu e_\mu$ and $y = \sum_{\nu=0}^m y_\nu e_\nu$ is defined to be


$$ x \cdot y = \sum_{\mu,\nu=0}^m x_\mu y_\nu \rho_\mu(e_\nu) $$


so that $e_0$ is the unit element for this product.
Defining $\mathrm{Re}\, x = x_0 e_0$, $\mathrm{Im}\, x = x - \mathrm{Re}\, x$, $\bar x = \mathrm{Re}\, x - \mathrm{Im}\, x$,
one has $\bar x \cdot x = e_0\|x\|^2$ and $\bar x \cdot (x \cdot y) = (\bar x \cdot x) \cdot y$, so that
$x \cdot y = 0$ implies $x = 0$ or $y = 0$: $A_m$ is a normed
algebra without zero divisors. The algebras $A_1$ and
$A_3$ are isomorphic to $\mathbb{C}$ and $\mathbb{H}$, respectively, and $A_7$
is, by definition, the algebra $\mathbb{O}$ of octonions
discovered by Graves and Cayley. The algebra $\mathbb{O}$ is
nonassociative; its multiplication table is obtained
with the help of [26].





Spinor Groups 


Let $(V, g)$ be a quadratic space over $K$. If $u \in V$ is
not null, then it is invertible as an element of
$\mathrm{C}\ell(V, g)$ and the map $v \mapsto -uvu^{-1}$ is a reflection in
the hyperplane orthogonal to $u$. The orthogonal
group $O(V, g) = O(V, -g) = \{ R \in \mathrm{GL}(V) \mid R^* \circ g \circ
R = g \}$ is generated by the set of all such reflections.
A spinor group $G$ is a subset of $\mathrm{C}\ell(V, g)$ that is a
group with respect to multiplication induced by the
product in the algebra, with a homomorphism
$\rho: G \to \mathrm{GL}(V)$ whose image contains the connected
component $\mathrm{SO}^0(V, g)$ of the group of rotations of
$(V, g)$. In the case of real quadratic spaces, one
considers also spinor groups that are subsets of $\mathbb{C} \otimes
\mathrm{C}\ell(V, g)$ with similar properties. By restriction, every
representation of $\mathrm{C}\ell(V, g)$ or $\mathbb{C} \otimes \mathrm{C}\ell(V, g)$ gives
spinor representations of the spinor groups it
contains.


Pin Groups 


It is convenient to define a unit vector $v \in V \subset
\mathrm{C}\ell(V, g)$ to be such that $v^2 = 1$ for $V$ complex and
$v^2 = 1$ or $-1$ for $V$ real. The group $\mathrm{Pin}(V, g)$ is
defined as the subgroup of $\mathrm{C}\ell(V, g)$ consisting of
products of all finite sequences of unit vectors.
Defining now the twisted adjoint representation $\widetilde{\mathrm{Ad}}$
by $\widetilde{\mathrm{Ad}}(a)v = \alpha(a)va^{-1}$, one obtains the exact sequence


$$ 1 \to \mathbb{Z}_2 \to \mathrm{Pin}(V, g) \xrightarrow{\ \widetilde{\mathrm{Ad}}\ } O(V, g) \to 1 $$   [27]


If $\dim V$ is even, then the adjoint representation
$\mathrm{Ad}(a)v = ava^{-1}$ also yields an exact sequence like
[27]; if it is odd, then the image of $\mathrm{Ad}$ is $\mathrm{SO}(V, g)$ and
the kernel is the four-element group $\{1, -1, \eta, -\eta\}$.

Given an orthonormal frame $(e_\mu)$ in $(V, g)$ and
$a \in \mathrm{Pin}(V, g)$, one defines the orthogonal matrix
$R(a) = (R^\nu_\mu(a))$ by


$$ \widetilde{\mathrm{Ad}}(a)e_\mu = e_\nu R^\nu_\mu(a) $$   [28]


If $(V, g)$ is complex, then the algebras $\mathrm{C}\ell(V, g)$ and
$\mathrm{C}\ell(V, -g)$ are isomorphic; this induces an isomorphism
of the groups $\mathrm{Pin}(V, g)$ and $\mathrm{Pin}(V, -g)$.
If $V = \mathbb{C}^m$, then this group is denoted by $\mathrm{Pin}_m(\mathbb{C})$. If
$V = \mathbb{R}^{k+l}$ and $g$ is of signature $(k, l)$, then one writes
$\mathrm{Pin}(V, g) = \mathrm{Pin}_{k,l}$. A similar notation is used for the
spin groups, see below.


Spin Groups 


The spin group $\mathrm{Spin}(V, g) = \mathrm{Pin}(V, g) \cap \mathrm{C}\ell^0(V, g)$ is
generated by products of all sequences of an even
number of unit vectors. Since the algebras $\mathrm{C}\ell^0(V, g)$
and $\mathrm{C}\ell^0(V, -g)$ are isomorphic, so are the groups
$\mathrm{Spin}(V, g)$ and $\mathrm{Spin}(V, -g)$. Since $\alpha(a) = a$ for $a \in
\mathrm{Spin}(V, g)$, the twisted adjoint representation
reduces to the adjoint representation and yields the
exact sequence


$$ 1 \to \mathbb{Z}_2 \to \mathrm{Spin}(V, g) \xrightarrow{\ \mathrm{Ad}\ } \mathrm{SO}(V, g) \to 1 $$   [29]

For $V = \mathbb{C}^m$, the spin group is denoted by $\mathrm{Spin}_m(\mathbb{C})$.
Since $\mathrm{Spin}_m(\mathbb{C}) \subset G_1(\beta)$, the bilinear form $B$ is
invariant with respect to the action of this group.


Spin? Groups 


The connected component $\mathrm{Spin}^0(V,g)$ of the group Spin(V,g) coincides with Spin(V,g) if the quadratic space (V,g) is either complex, or real with kl = 0. In signature (k,l), the connected group $\mathrm{Spin}^0_{k,l}$ is generated in $\mathrm{C}\ell_{k,l}$ by all products of the form $u_1 \cdots u_{2p} v_1 \cdots v_{2q}$ such that $u_i^2 = -1$ and $v_j^2 = 1$. The connected groups $\mathrm{Spin}^0_{n,0}$ and $\mathrm{Spin}^0_{0,n}$ are isomorphic and denoted by $\mathrm{Spin}_n$. Since $\mathrm{Spin}^0_{k,l} \subset G_1(\beta)$, the Hermitian form A and the bilinear form B are invariant with respect to the action of this group. Moreover, for k + l even, from [24] and [28] there follows the transformation law of multivectors formed from pairs of spinors,


$$A_{\mu_1 \ldots \mu_p}(\gamma(a)s, \gamma(a)t) = A_{\nu_1 \ldots \nu_p}(s,t)\, R^{\nu_1}{}_{\mu_1}(a^{-1}) \cdots R^{\nu_p}{}_{\mu_p}(a^{-1})$$


Consider $\mathrm{Spin}^0(V,g)$ and assume that either V is complex of dimension $\ge 2$ or real with k or l $\ge 2$. Then there are two unit orthogonal vectors $e_1, e_2 \in V$ such that $(e_1 e_2)^2 = -1$. The vector $u(t) = e_1 \cos t + e_2 \sin t$ is obtained from $e_1$ by rotation in the plane span$\{e_1, e_2\}$ by the angle $t \in \mathbb{R}$. The curve $t \mapsto e_1 u(t)$, $0 \le t \le \pi$, connects the elements 1 and -1 of $\mathrm{Spin}^0(V,g)$. Its image in $\mathrm{SO}^0(V,g)$, that is, the curve $t \mapsto \mathrm{Ad}(e_1 u(t))$, $0 \le t \le \pi$, is closed: Ad(1) = Ad(-1). This fact is often expressed by saying that "a spinor undergoing a rotation by $2\pi$ changes sign." There is no homomorphism (not even a continuous map) $f: \mathrm{SO}^0(V,g) \to \mathrm{Spin}^0(V,g)$ such that $\mathrm{Ad} \circ f = \mathrm{id}$.
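The sign change can be checked numerically in a small case. The sketch below is only an illustration under stated assumptions: it takes V to be Euclidean $\mathbb{R}^3$ and represents the unit vectors $e_1, e_2, e_3$ by the Pauli matrices (one convenient matrix realization, not something fixed by the text); the helper names `mul`, `lin`, `curve`, and `rotation` are hypothetical.

```python
import math

# Assumed realization: e1, e2, e3 are the Pauli matrices, so e_i^2 = 1.
I2 = ((1, 0), (0, 1))
E1 = ((0, 1), (1, 0))
E2 = ((0, -1j), (1j, 0))
E3 = ((1, 0), (0, -1))
FRAME = (E1, E2, E3)

def mul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def lin(ca, a, cb, b):
    """Linear combination ca*a + cb*b of 2x2 matrices."""
    return tuple(tuple(ca * a[i][j] + cb * b[i][j] for j in range(2))
                 for i in range(2))

def inverse(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return ((a[1][1] / det, -a[0][1] / det),
            (-a[1][0] / det, a[0][0] / det))

def curve(t):
    """The spin-group element e1 u(t), with u(t) = e1 cos t + e2 sin t."""
    return mul(E1, lin(math.cos(t), E1, math.sin(t), E2))

def rotation(a):
    """Matrix R with a e_nu a^{-1} = sum_mu e_mu R[mu][nu]."""
    a_inv = inverse(a)
    rows = []
    for e_mu in FRAME:
        row = []
        for e_nu in FRAME:
            m = mul(e_mu, mul(a, mul(e_nu, a_inv)))
            # tr(e_mu e_rho) = 2 delta, so half the trace picks out R[mu][nu].
            row.append(((m[0][0] + m[1][1]) / 2).real)
        rows.append(row)
    return rows

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j])
               for i in range(len(a)) for j in range(len(a[0])))

# The curve runs from +1 to -1 in the spin group ...
start, end = curve(0.0), curve(math.pi)
# ... while its image under Ad starts and ends at the same rotation.
closed_gap = max_abs_diff(rotation(start), rotation(end))
```

In this realization `curve(t)` equals $\cos t + e_1 e_2 \sin t$, and its image is a rotation by angle 2t in the plane of $e_1, e_2$, which is why the loop in the rotation group closes at $t = \pi$ while the lift does not.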


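Spinᶜ Groups


For the purposes of physics, to describe charged fermions, and in the theory of the Seiberg-Witten invariants, one needs the $\mathrm{Spin}^c$ groups, which are spinorial extensions of the real orthogonal groups by the group $\mathrm{U}_1$ of "phase factors." Assume V to be real and g of signature (k,l) so that the sequence [29] can be written as


$$1 \to \mathbb{Z}_2 \to \mathrm{Spin}_{k,l} \to \mathrm{SO}_{k,l} \to 1$$


Define the action of $\mathbb{Z}_2 = \{1, -1\}$ in $\mathrm{Spin}_{k,l} \times \mathrm{U}_1$ so that $(-1)(a,z) = (-a,-z)$. The quotient $(\mathrm{Spin}_{k,l} \times \mathrm{U}_1)/\mathbb{Z}_2 = \mathrm{Spin}^c_{k,l}$ yields the extensions


$$1 \to \mathrm{U}_1 \to \mathrm{Spin}^c_{k,l} \to \mathrm{SO}_{k,l} \to 1$$

and

$$1 \to \mathrm{Spin}_{k,l} \to \mathrm{Spin}^c_{k,l} \to \mathrm{U}_1 \to 1$$


For example, $\mathrm{Spin}_3 = \mathrm{SU}_2$ and $\mathrm{Spin}^c_3 = \mathrm{U}_2$.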


Spin Groups in Dimensions ≤ 6


The connected components of spin groups associated with orthogonal spaces of dimension $\le 6$ are isomorphic to classical groups. They can be explicitly described starting from the following observations.

Consider the four-dimensional vector space (of twistors) T over K, with a volume element $\mathrm{vol} \in \Lambda^4 T$. The six-dimensional vector space $V = \Lambda^2 T$ has a scalar product g defined by $g(u,v)\,\mathrm{vol} = 2u \wedge v$ for $u, v \in V$. The quadratic form g(u,u) is the Pfaffian, Pf(u). If $u \in V$ is represented by the corresponding isomorphism $T^* \to T$ and $a \in \mathrm{End}\,T$, then $\mathrm{Pf}(a u a^*) = \det a\,\mathrm{Pf}(u)$. The last formula shows $\mathrm{Spin}^0(V,g) = \mathrm{SL}(T)$, so that $\mathrm{Spin}_6(\mathbb{C}) = \mathrm{SL}_4(\mathbb{C})$. For $K = \mathbb{R}$, the Pfaffian is of signature (3,3), so that $\mathrm{Spin}^0_{3,3} = \mathrm{SL}_4(\mathbb{R})$. A non-null vector $v \in V$ defines a symplectic form on $T^*$. The five-dimensional vector space $v^\perp \subset V$ is invariant with respect to the symplectic group $\mathrm{Sp}(T^*, v) = \mathrm{Spin}^0(v^\perp, \mathrm{Pf}|_{v^\perp})$. This shows that $\mathrm{Spin}_5(\mathbb{C}) = \mathrm{Sp}_4(\mathbb{C})$ and $\mathrm{Spin}^0_{2,3} = \mathrm{Sp}_4(\mathbb{R})$. Spin groups for other signatures in real dimensions 6 and 5 are obtained by considering appropriate real subspaces of $\mathbb{C}^6$ and $\mathbb{C}^5$, respectively. For example, [6] is used to show that $\mathrm{Spin}^0_{1,5} = \mathrm{SL}_2(\mathbb{H})$.
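The identity $\mathrm{Pf}(a u a^*) = \det a\,\mathrm{Pf}(u)$ that drives this argument is easy to test with exact integer arithmetic. The following is a minimal sketch, assuming a basis of T in which u is an antisymmetric 4×4 matrix, $a^*$ is the transpose of a, and the volume element is normalized so that $\mathrm{Pf}(u) = u_{12}u_{34} - u_{13}u_{24} + u_{14}u_{23}$; the matrices `A` and `U` are arbitrary illustrative choices.

```python
from itertools import permutations

def pfaffian4(u):
    """Pfaffian of an antisymmetric 4x4 matrix (0-based indices)."""
    return u[0][1] * u[2][3] - u[0][2] * u[1][3] + u[0][3] * u[1][2]

def det(m):
    """Determinant by the Leibniz formula (fine for 4x4 integer matrices)."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        prod = 1
        for i in range(n):
            prod *= m[i][perm[i]]
        total += sign * prod
    return total

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

# Arbitrary endomorphism a of T and arbitrary bivector u (antisymmetric).
A = [[1, 2, 0, 1], [0, 1, 3, 0], [2, 0, 1, 1], [1, 1, 0, 2]]
U = [[0, 1, 2, 3], [-1, 0, 4, 5], [-2, -4, 0, 6], [-3, -5, -6, 0]]

lhs = pfaffian4(matmul(matmul(A, U), transpose(A)))  # Pf(a u a*)
rhs = det(A) * pfaffian4(U)                          # det(a) Pf(u)
```

Here `lhs == rhs` holds exactly, so maps with det a = 1 preserve the quadratic form Pf, which is the mechanism behind $\mathrm{Spin}^0(V,g) = \mathrm{SL}(T)$.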

Spin groups in dimensions 4 and lower are similarly obtained from the observation that det is a quadratic form on the four-dimensional space K(2) and $\mathrm{C}\ell^0(K(2), \det) = K(2) \oplus K(2)$.

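Several spin groups are listed below.


The complex spin groups

$\mathrm{Spin}_2(\mathbb{C}) = \mathbb{C}^{\times}$, $\mathrm{Spin}_3(\mathbb{C}) = \mathrm{SL}_2(\mathbb{C})$
$\mathrm{Spin}_4(\mathbb{C}) = \mathrm{SL}_2(\mathbb{C}) \times \mathrm{SL}_2(\mathbb{C})$
$\mathrm{Spin}_5(\mathbb{C}) = \mathrm{Sp}_4(\mathbb{C})$
$\mathrm{Spin}_6(\mathbb{C}) = \mathrm{SL}_4(\mathbb{C})$


The real, compact spin groups

$\mathrm{Spin}_2 = \mathrm{U}_1$, $\mathrm{Spin}_3 = \mathrm{SU}_2$
$\mathrm{Spin}_4 = \mathrm{SU}_2 \times \mathrm{SU}_2$, $\mathrm{Spin}_5 = \mathrm{Sp}_2(\mathbb{H})$
$\mathrm{Spin}_6 = \mathrm{SU}_4$


The groups $\mathrm{Spin}^0_{k,l}$ for $1 \le k \le l$ and $k + l \le 6$

$\mathrm{Spin}^0_{1,1} = \mathbb{R}^{\times}$, $\mathrm{Spin}^0_{1,2} = \mathrm{SL}_2(\mathbb{R})$
$\mathrm{Spin}^0_{1,3} = \mathrm{SL}_2(\mathbb{C})$
$\mathrm{Spin}^0_{2,2} = \mathrm{SL}_2(\mathbb{R}) \times \mathrm{SL}_2(\mathbb{R})$
$\mathrm{Spin}^0_{1,4} = \mathrm{Sp}_{1,1}(\mathbb{H})$
$\mathrm{Spin}^0_{2,3} = \mathrm{Sp}_4(\mathbb{R})$
$\mathrm{Spin}^0_{2,4} = \mathrm{SU}_{2,2}$
$\mathrm{Spin}^0_{3,3} = \mathrm{SL}_4(\mathbb{R})$
$\mathrm{Spin}^0_{1,5} = \mathrm{SL}_2(\mathbb{H})$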


See also: Dirac Operator and Dirac Field; Index 
Theorems; Relativistic Wave Equations Including Higher 
Spin Fields; Spinors and Spin Coefficients; Twistors. 


Further Reading 


Adams JF (1981) Spin(8), triality, F4 and all that. In: Hawking
SW and Roček M (eds.) Superspace and Supergravity. 
Cambridge: Cambridge University Press. 

Atiyah MF, Bott R, and Shapiro A (1964) Clifford modules. 
Topology 3(suppl. 1): 3-38. 

Baez JC (2002) The octonions. Bulletin of the American 
Mathematical Society 39: 145-205. 

Brauer R and Weyl H (1935) Spinors in n dimensions. American 
Journal of Mathematics 57: 425-449. 

Budinich P and Trautman A (1988) The Spinorial Chessboard,
Trieste Notes in Physics. Berlin: Springer. 

Cartan E (1938) Théorie des spineurs. Actualités Scientifiques et 
Industrielles, No. 643 et 701. Paris: Hermann (English 
transl.:The Theory of Spinors. Paris: Hermann, 1966). 

Chevalley C (1954) The Algebraic Theory of Spinors. New York: 
Columbia University Press. 

Clifford WK (1878) Applications of Grassmann’s extensive 
algebra. American Journal of Mathematics 1: 350-358. 

Clifford WK (1882) On the classification of geometric algebras. 
In: Tucker R (ed.) Mathematical Papers by William Kingdon 
Clifford, pp. 397-401. London: Macmillan. 

Dirac PAM (1928) The quantum theory of the electron. 
Proceedings of the Royal Society of London A 117: 610-624. 

Eckmann B (1942) Gruppentheoretischer Beweis des Satzes von
Hurwitz-Radon über die Komposition quadratischer Formen.
Commentarii Mathematici Helvetici 15: 358-366.

Karoubi M (1968) Algèbres de Clifford et K-théorie. Annales
Scientifiques de l'École Normale Supérieure 4ème sér 1: 161-270.

Lipschitz RO (1886) Untersuchungen über die Summen von
Quadraten. Berlin: Max Cohen und Sohn.

Lounesto P (2001) Clifford Algebras and Spinors, 2nd edn. 
London Math. Soc. Lecture Note Series, vol. 286. Cambridge: 
Cambridge University Press. 

Pauli W (1927) Zur Quantenmechanik des magnetischen 
Elektrons. Z. Physik 43: 601-623. 

Penrose R and MacCallum MAH (1973) Twistor theory: an 
approach to the quantisation of fields and space-time. Physics 
Report 6C(4): 241-316. 

Porteous IR (1995) Clifford Algebras and the Classical Groups, 
Cambridge Studies in Advanced Mathematics, vol. 50. Cam- 
bridge: Cambridge University Press. 

Postnikov MM (1986) Lie groups and Lie algebras. Mir: Moscow. 

Sudbery A (1987) Division algebras (pseudo)orthogonal groups 
and spinors. Journal of Physics A17: 939-955. 

Trautman A (1997) Clifford and the “square root” ideas. 
Contemporary Mathematics 203: 3-24. 

Trautman A and Trautman K (1994) Generalized pure spinors. 
Journal of Geometry and Physics 15: 1-22. 

Wall CTC (1963) Graded Brauer groups. Journal für die Reine 
und Angewandte Mathematik 213: 187-199. 


Cluster Expansion 


R Kotecký, Charles University, Prague,
Czech Republic, and the University of Warwick, UK 


© 2006 Elsevier Ltd. All rights reserved. 


Introduction 


The method of cluster expansions in statistical physics provides a systematic way of computing power series for thermodynamic potentials (logarithms of partition functions) as well as correlations. It originated from the works of Mayer and others devoted to expansions for a dilute gas.


Mayer Expansion 


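Consider a system of interacting particles with Hamiltonian


$$H_N(p_1,\ldots,p_N,r_1,\ldots,r_N) = \sum_{i=1}^{N} \frac{p_i^2}{2m} + \sum_{\substack{i,j=1 \\ i<j}}^{N} \Phi(r_i - r_j)$$ [1]


where $\Phi$ is a stable and regular pair potential. Namely, we assume that there exists $B \ge 0$ such that


$$\sum_{\substack{i,j=1 \\ i<j}}^{N} \Phi(r_i - r_j) \ge -BN$$ [2]

for all N = 2, 3, ... and all $(r_1,\ldots,r_N) \in \mathbb{R}^{3N}$, and that

$$C(\beta) = \int \bigl| e^{-\beta\Phi(r)} - 1 \bigr|\, d^3r < \infty$$ [3]


for some $\beta > 0$ (and hence all $\beta > 0$). Basic thermodynamic quantities are given in terms of the grand-canonical partition function


$$Z(\beta,\lambda,V) = \sum_{N=0}^{\infty} \frac{z^N}{N!\,h^{3N}} \int e^{-\beta H_N} \prod_{i=1}^{N} d^3p_i\, d^3r_i = \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} \int_{V^N} e^{-\beta \sum_{i<j} \Phi(r_i - r_j)} \prod_{i=1}^{N} d^3r_i$$ [4]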


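In the second expression we absorbed the factor resulting from the integration over momenta into the (configurational) activity $\lambda = (2\pi m/\beta h^2)^{3/2} z$. In particular, the pressure p and the density $\rho$ are defined by the thermodynamic limits (with $V \to \infty$ in the sense of Van Hove)


$$p(\beta,\lambda) = \frac{1}{\beta} \lim_{V\to\infty} \frac{1}{|V|} \log Z(\beta,\lambda,V)$$ [5]


and


$$\rho(\beta,\lambda) = \lim_{V\to\infty} \frac{1}{|V|}\, \lambda \frac{\partial}{\partial\lambda} \log Z(\beta,\lambda,V)$$ [6]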


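Mayer series are the expansions of p and $\rho$ in powers of $\lambda$:


$$\beta p(\beta,\lambda) = \sum_{n=1}^{\infty} b_n \lambda^n$$ [7]


and


$$\rho(\beta,\lambda) = \sum_{n=1}^{\infty} n\, b_n \lambda^n$$ [8]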


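Mayer's idea for a systematic computation of the coefficients $b_n$ was based on a reformulation of the partition function $Z(\beta,\lambda,V)$ in terms of cluster integrals. Introducing the function


$$f(r) = e^{-\beta\Phi(r)} - 1$$ [9]


and using $\mathcal{G}[N]$ to denote the set of all graphs on N vertices {1,..., N}, we get

$$Z(\beta,\lambda,V) = \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} \int_{V^N} \prod_{i<j} \bigl(1 + f(r_i - r_j)\bigr) \prod_{i=1}^{N} d^3r_i = \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} \sum_{g \in \mathcal{G}[N]} w(g)$$ [10]


where


$$w(g) = \int_{V^N} \prod_{\{i,j\} \in g} f(r_i - r_j) \prod_{i=1}^{N} d^3r_i$$ [11]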
Observing that the weight w is multiplicative in the connected components (clusters) $g_1,\ldots,g_k$ of the graph g,


$$w(g) = \prod_{\ell=1}^{k} w(g_\ell)$$ [12]


we can rewrite


$$Z(\beta,\lambda,V) = \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} \sum_{\{g_\ell\}} \prod_{\ell} w(g_\ell)$$ [13]


with the sum running over all disjoint collections $\{g_\ell\}$ of connected graphs with vertices in {1,..., N}. A straightforward exponential expansion can be used to show that, at least in the sense of formal power series,


$$\log Z(\beta,\lambda,V) = \sum_{n=1}^{\infty} \frac{\lambda^n}{n!} \sum_{g \in \mathcal{C}[n]} w(g)$$ [14]


where $\mathcal{C}[n]$ is the set of all connected graphs on n vertices. Using $b_n^{(V)}$ to denote the coefficients


$$b_n^{(V)} = \frac{1}{|V|} \frac{1}{n!} \sum_{g \in \mathcal{C}[n]} w(g)$$ [15]

and observing that the limits $\lim_{V\to\infty} (1/|V|)\,w(g)$ of cluster integrals exist, we get $b_n = \lim_{V\to\infty} b_n^{(V)}$. The convergence of Mayer series can be controlled directly by combinatorial estimates on the coefficients $b_n^{(V)}$. As a result, the radius of convergence of the series [7] and [8] can be proved to be at least $(C(\beta)\,e^{2\beta B+1})^{-1}$. A less direct proof is based on the use of the linear integral Kirkwood-Salsburg equations in a suitable Banach space of correlation functions.
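As a short worked illustration of [15], the two lowest coefficients can be written out directly, since the only connected graph on one vertex is the vertex itself and the only connected graph on two vertices is a single edge. The hard-sphere case at the end is an assumed example (a potential that is $+\infty$ for $|r| < \sigma$ and 0 otherwise, with $\sigma$ a hypothetical sphere diameter), chosen because it is stable with B = 0 and makes every integral elementary.

```latex
% Lowest Mayer coefficients from [15].
b_1^{(V)} = \frac{1}{|V|}\int_V d^3r = 1,
\qquad
b_2^{(V)} = \frac{1}{2|V|}\int_{V^2} f(r_1-r_2)\,d^3r_1\,d^3r_2
\;\xrightarrow[V\to\infty]{}\;
b_2 = \frac{1}{2}\int_{\mathbb{R}^3} f(r)\,d^3r .

% Assumed example: hard spheres of diameter \sigma, so f(r) = -1 for |r| < \sigma and 0 otherwise.
b_2 = -\frac{2\pi}{3}\,\sigma^3,
\qquad
C(\beta) = \frac{4\pi}{3}\,\sigma^3,
\qquad
B = 0,
\qquad
\bigl(C(\beta)\,e^{2\beta B+1}\bigr)^{-1} = \frac{3}{4\pi e\,\sigma^3}.
```

In this example the lower bound on the radius of convergence does not depend on $\beta$, as expected for a purely hard-core interaction.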

Similar combinatorial methods are also available for the evaluation of the coefficients of the virial expansion of the pressure in powers of the gas density,


$$\beta p(\beta,\rho) = \sum_{n=1}^{\infty} \beta_n \rho^n$$ [16]


obtained by inverting [8] (notice that $b_1 = 1$) and inserting it into [7]. One gets $\beta_n = \lim_{V\to\infty} \beta_n^{(V)}$ with


$$\beta_n^{(V)} = \frac{1}{|V|} \frac{1}{n!} \sum_{g \in \mathcal{B}[n]} w(g)$$ [17]


where $\mathcal{B}[n] \subset \mathcal{C}[n]$ is the set of all 2-connected graphs on {1,..., n}; namely, those graphs that cannot be split into disjoint subgraphs by erasing one vertex (and all adjacent edges). The radius of convergence of the virial expansion turns out to be no less than $(C(\beta)\,e\,(e^{2\beta B} + 1))^{-1}$.


Abstract Polymer Models 


An application of the ideas of Mayer expansions to lattice models is based on a reformulation of the partition function in terms of a polymer model, a formulation akin to [13] above. Namely, the partition function is rewritten as a sum over collections of pairwise compatible geometric objects called polymers. Most often, compatibility simply means disjointness.

While the reformulation of a "physical partition function" in terms of a polymer model (including the definition of compatibility) depends on the particularities of a given lattice model and on the considered region of parameters (high temperature, low temperature, large external fields, etc.), the essence and results of the cluster expansion may be conveniently formulated in terms of an abstract polymer model.

Let G = (V, E) be any (possibly infinite) countable graph and suppose that a map $w: V \to \mathbb{C}$ is given. Vertices $v \in V$ are called abstract polymers, and two abstract polymers connected by an edge in the graph G are called incompatible. We shall refer to w(v) as the weight of the abstract polymer v. For any finite $W \subset V$, we consider the induced subgraph G[W] of G spanned by W and define


$$Z_W(w) = \sum_{I \subset W} \prod_{v \in I} w(v)$$ [18]


Here the sum runs over all collections I of compatible abstract polymers or, in other words, over all independent sets I of vertices in W (no two vertices in I are connected by an edge). The partition function $Z_W(w)$ is an entire function in $w = \{w(v)\}_{v \in W} \in \mathbb{C}^{|W|}$ and $Z_W(0) = 1$. Hence, it is nonvanishing in some neighborhood of the origin w = 0 and its logarithm is, on this neighborhood, an analytic function yielding a convergent Taylor series


$$\log Z_W(w) = \sum_{X \in \mathcal{X}(W)} a_W(X)\, w^X$$ [19]


Here, $\mathcal{X}(W)$ is the set of all multi-indices $X: W \to \{0, 1, \ldots\}$ and $w^X = \prod_v w(v)^{X(v)}$. Inspecting the formula for $a_W(X)$ in terms of the corresponding derivatives of $\log Z_W(w)$, it is easy to show that the Taylor coefficients $a_W(X)$ actually do not depend on W: $a_W(X) = a_{\mathrm{supp}\,X}(X)$, where $\mathrm{supp}\,X = \{v \in V: X(v) \ne 0\}$. As a result, one gets the existence of coefficients a(X) such that


$$\log Z_W(w) = \sum_{X \in \mathcal{X}(W)} a(X)\, w^X$$ [20]


for every finite $W \subset V$.

The coefficients a(X) can be obtained explicitly. One can pass from [18] to [20] in a similar way as passing from [10] to [13]. The starting point is to replace the restriction to compatible collections of abstract polymers in the sum [18] by the factor $\prod_{v,v' \in I} (1 + F(v,v'))$ with


$$F(v,v') = \begin{cases} 0 & \text{if } v \text{ and } v' \text{ are compatible} \\ -1 & \text{otherwise } (v \text{ and } v' \text{ connected by an edge from } G) \end{cases}$$ [21]


and to expand the product afterwards. The resulting formula is


$$a(X) = (X!)^{-1} \sum_{H \subset G(X)} (-1)^{|E(H)|}$$ [22]


Here, G(X) is the graph with $|X| = \sum_v |X(v)|$ vertices induced from G[supp X] by replacing each of its vertices v by the complete graph on |X(v)| vertices, and X! is the multifactorial $X! = \prod_{v \in \mathrm{supp}\,X} X(v)!$. The sum is over all connected graphs $H \subset G(X)$ spanned by the set of vertices of G(X), and |E(H)| is the number of edges of the graph H.
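For very small multi-indices, formula [22] can be evaluated by brute force. The sketch below is an illustration only: the function names `cluster_coefficient` and `is_connected` are hypothetical, a multi-index X is passed as a dictionary from polymer labels to multiplicities, and the incompatibility graph G as a set of unordered pairs. It enumerates all edge subsets of G(X), so it is usable only when |X| is tiny.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def is_connected(n, edges):
    """True if the graph on vertices 0..n-1 with the given edges is connected."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(i) for i in range(n)}) == 1

def cluster_coefficient(X, incompatible):
    """a(X) = (1/X!) * sum over connected spanning H in G(X) of (-1)^{|E(H)|}."""
    # Vertices of G(X): X(v) copies of every polymer v in supp X.
    copies = [(v, i) for v, mult in X.items() for i in range(mult)]
    n = len(copies)
    # Edges of G(X): between copies of the same polymer (complete graph on
    # the copies) and between copies of two incompatible polymers.
    edges = [
        (i, j)
        for i, j in combinations(range(n), 2)
        if copies[i][0] == copies[j][0]
        or frozenset((copies[i][0], copies[j][0])) in incompatible
    ]
    total = 0
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            if is_connected(n, subset):
                total += (-1) ** k
    x_factorial = 1
    for mult in X.values():
        x_factorial *= factorial(mult)
    return Fraction(total, x_factorial)

# Two polymers "u" and "v" joined by an edge of G (incompatible).
edge = {frozenset(("u", "v"))}
a_uv = cluster_coefficient({"u": 1, "v": 1}, edge)
a_uuv = cluster_coefficient({"u": 2, "v": 1}, edge)
```

For two mutually incompatible polymers the partition function is $Z = 1 + w(u) + w(v)$, so the values can be compared with the Taylor expansion of $\log(1 + w(u) + w(v))$; if the two polymers are compatible, the mixed coefficient vanishes because G(X) is then disconnected.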


A useful property of the coefficients a(X) is their alternating sign,


$$(-1)^{|X|+1} a(X) \ge 0$$ [23]


More important than an explicit form of the coefficients a(X) are the convergence criteria for the series [20]. One way to proceed is to find direct combinatorial bounds on the coefficients as expressed by [22]. While doing so, one has to take into account the cancellations arising from the presence of terms of opposite signs in [22]. Indeed, disregarding them would lead to a failure since, as is easy to verify, the number of connected graphs on |X| vertices is bounded from below by $2^{(|X|-1)(|X|-2)/2}$. An alternative approach is to prove the convergence of [20] on polydisks $D_{W,R} = \{w: |w(v)| \le R(v) \text{ for } v \in W\}$ by induction in |W|, once a proper condition on the set of radii $R = \{R(v); v \in V\}$ is formulated. The most natural condition for the inductive proof (leading at the same time to the strongest claim) turns out to be the Dobrushin condition:

There exists a function $r: V \to [0,1)$ such that, for each $v \in V$,


$$R(v) \le r(v) \prod_{v' \in N(v)} \bigl(1 - r(v')\bigr)$$ [24]


Here N(v) is the set of vertices $v' \in V$ adjacent in the graph G to the vertex v.

Using $\mathcal{X}$ to denote the set of all multi-indices $X: V \to \{0, 1, \ldots\}$ with finite $|X| = \sum_v |X(v)|$ and saying that $X \in \mathcal{X}$ is a cluster if the graph G(supp X) is connected, we can summarize the cluster expansion claim for an abstract polymer model in the following way:


Theorem (Cluster expansion). There exists a function $a: \mathcal{X} \to \mathbb{R}$ that is nonvanishing only on clusters, so that for any sequence of radii R satisfying the condition [24] with a sequence {r(v)}, the following holds true:


(i) For every finite $W \subset V$, and any polymer weight $w \in D_{W,R}$, one has $Z_W(w) \ne 0$ and


$$\log Z_W(w) = \sum_{X \in \mathcal{X}(W)} a(X)\, w^X$$


(ii) $\sum_{X \in \mathcal{X}:\, \mathrm{supp}\,X \ni v} |a(X)|\, |w|^X \le -\log(1 - r(v))$.


Notice that we have obtained not only the absolute convergence of the Taylor series of $\log Z_W$ in the closed polydisk $D_{W,R}$, but also the bound (ii) (uniform in W) on the sum over all terms containing a fixed vertex v. Such a bound turns out to be very useful in applications of cluster expansions. It yields, eventually, bounds on various error terms, avoiding the need for an explicit evaluation of the number of clusters of "given size."

The restriction to compatible collections of polymers can actually be relaxed. Namely, replacing [18] by


$$Z_W(w) = \sum_{W' \subset W} \prod_{v \in W'} w(v) \prod_{v,v' \in W'} U(v,v')$$ [25]


with $U(v,v') \in [0,1]$ (soft repulsive interaction), and the condition [24] by


$$R(v) \le r(v) \prod_{v' \ne v} \Bigl(1 - \bigl(1 - U(v,v')\bigr)\, r(v')\Bigr)$$ [26]


one can prove that the partition function $Z_W(w)$ does not vanish on the polydisk $D_{W,R}$, implying that the power series of $\log Z_W(w)$ converges absolutely on $D_{W,R}$.

Polymers that arise in typical applications are geometric objects endowed with a "support" in the considered lattice, say $\mathbb{Z}^d$, $d \ge 1$, and their weights satisfy the condition of translation invariance. Cluster expansions then yield an explicit power series for the pressure (resp. free energy) in the thermodynamic limit as well as its finite-volume approximation.

To formulate it for an abstract polymer model, we assume that for each $x \in \mathbb{Z}^d$, an isomorphism $\tau_x: G \to G$ is given and that with each abstract polymer $v \in V$ a finite set $\Lambda(v) \subset \mathbb{Z}^d$ is associated so that $\Lambda(\tau_x(v)) = \Lambda(v) + x$ for every $v \in V$ and every $x \in \mathbb{Z}^d$. For any finite $W \subset V$ and any multi-index X, let $\Lambda(W) = \bigcup_{v \in W} \Lambda(v)$ and $\Lambda(X) = \Lambda(\mathrm{supp}\,X)$. On the other hand, for any finite $\Lambda \subset \mathbb{Z}^d$, let $W(\Lambda) = \{v \in V: \Lambda(v) \subset \Lambda\}$. Assuming also that the weight $w: V \to \mathbb{C}$ is translation invariant, that is, $w(v) = w(\tau_x(v))$ for every $v \in V$ and every $x \in \mathbb{Z}^d$, we get an explicit expression for the "pressure" of the abstract polymer model in the thermodynamic limit


$$p = \lim_{\Lambda \to \infty} \frac{1}{|\Lambda|} \log Z_{W(\Lambda)}(w) = \sum_{X:\, \Lambda(X) \ni 0} \frac{a(X)\, w^X}{|\Lambda(X)|}$$ [27]


In addition, the finite-volume approximation can be explicitly evaluated, yielding


$$\log Z_{W(\Lambda)}(w) = p|\Lambda| - \sum_{X:\, \Lambda(X) \cap \Lambda^c \ne \emptyset} a(X)\, w^X\, \frac{|\Lambda(X) \cap \Lambda|}{|\Lambda(X)|}$$ [28]


Using the claim (ii), the second term can be bounded by const $\cdot\, |\partial\Lambda|$.


Cluster Expansions for Lattice Models 


There is a variety of applications of cluster expansions to lattice models. As noticed above, the first step is always to rewrite the model in terms of a polymer representation.


High-Temperature Expansions


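Let us illustrate this point in the simplest case of the Ising model. Its partition function in volume $\Lambda \subset \mathbb{Z}^d$, with free boundary conditions and vanishing external field, is


$$Z_\Lambda(\beta) = \sum_{\sigma_\Lambda} \exp\Bigl\{ \beta \sum_{\langle x,y \rangle \in \mathbb{B}(\Lambda)} \sigma_x \sigma_y \Bigr\}$$ [29]


Using the identity

$$e^{\beta \sigma_x \sigma_y} = \cosh\beta + \sigma_x \sigma_y \sinh\beta$$ [30]


it can be rewritten in the form


$$Z_\Lambda(\beta) = 2^{|\Lambda|} (\cosh\beta)^{|\mathbb{B}(\Lambda)|} \sum_{B} (\tanh\beta)^{|B|}$$ [31]


Here, the sum runs over all subsets B of the set $\mathbb{B}(\Lambda)$ of all bonds in $\Lambda$ (pairs of nearest-neighbor sites from $\Lambda$) such that each site is contained in an even number of bonds from B. Using $\Lambda(B)$ to denote the set of sites contained in bonds from B, we say that $B_1, B_2 \subset \mathbb{B}(\Lambda)$ are disjoint if $\Lambda(B_1) \cap \Lambda(B_2) = \emptyset$. Splitting now B into a collection $\mathscr{B} = \{B_1,\ldots,B_k\}$ of its connected components, called (high-temperature) polymers, and using $\mathcal{B}(\Lambda)$ to denote the set of all polymers in $\Lambda$, we get


$$Z_\Lambda(\beta) = 2^{|\Lambda|} (\cosh\beta)^{|\mathbb{B}(\Lambda)|} \sum_{\mathscr{B} \subset \mathcal{B}(\Lambda)} \prod_{B \in \mathscr{B}} (\tanh\beta)^{|B|}$$ [32]


with the sum running over all collections $\mathscr{B}$ of mutually disjoint polymers. This expression is exactly of the form [18], once we define compatibility of polymers by their disjointness. Introducing the weights


$$w(B) = (\tanh\beta)^{|B|}$$ [33]

and taking the set $\mathcal{B}(\Lambda)$ of all polymers in $\Lambda$ for W, we get the polymer representation $Z_\Lambda(\beta) = 2^{|\Lambda|} (\cosh\beta)^{|\mathbb{B}(\Lambda)|} Z_{\mathcal{B}(\Lambda)}(w)$.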
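The rewriting of the spin sum as a sum over bond sets with even degree at every site can be confirmed by brute force on a tiny lattice. The following sketch assumes a small rectangular piece of $\mathbb{Z}^2$ with free boundary conditions; the function names and the grid size are illustrative choices, and both sums are enumerated exhaustively, which is feasible only for a handful of sites.

```python
import math
from itertools import combinations, product

def grid_bonds(width, height):
    """Sites and nearest-neighbor bonds of a width x height grid."""
    sites = [(x, y) for x in range(width) for y in range(height)]
    bonds = []
    for (x, y) in sites:
        if x + 1 < width:
            bonds.append(((x, y), (x + 1, y)))
        if y + 1 < height:
            bonds.append(((x, y), (x, y + 1)))
    return sites, bonds

def z_spins(sites, bonds, beta):
    """Direct sum over all spin configurations of exp(beta * sum s_x s_y)."""
    total = 0.0
    for spins in product((-1, 1), repeat=len(sites)):
        sigma = dict(zip(sites, spins))
        total += math.exp(beta * sum(sigma[x] * sigma[y] for x, y in bonds))
    return total

def z_even_subgraphs(sites, bonds, beta):
    """High-temperature form: 2^|sites| cosh^|bonds| * sum of tanh^|B|."""
    t = math.tanh(beta)
    series = 0.0
    for k in range(len(bonds) + 1):
        for subset in combinations(bonds, k):
            degree = dict.fromkeys(sites, 0)
            for x, y in subset:
                degree[x] += 1
                degree[y] += 1
            # Keep only bond sets in which every site has even degree.
            if all(d % 2 == 0 for d in degree.values()):
                series += t ** k
    return 2 ** len(sites) * math.cosh(beta) ** len(bonds) * series

sites, bonds = grid_bonds(3, 2)
direct = z_spins(sites, bonds, 0.3)
expanded = z_even_subgraphs(sites, bonds, 0.3)
```

The two numbers agree up to floating-point rounding. Grouping each admissible bond set into its connected components then gives the polymer form, with one factor $(\tanh\beta)^{|B|}$ per component.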

To apply the cluster expansion theorem, we have to find a function r such that the right-hand side of [24] is positive and thus yields the radius of a polydisk of convergence. Taking $r(B) = \epsilon^{|B|}$ with a suitable $\epsilon$, we get


$$\prod_{B' \in N(B)} \bigl(1 - r(B')\bigr) \ge e^{-2|B|}$$ [34]


allowing us to choose $R(B) = r(B)\, e^{-2|B|} = (\epsilon e^{-2})^{|B|}$. Indeed, to verify [34] we just notice that the number of polymers of size n containing a fixed site is bounded by $\kappa^n$ with a suitable constant $\kappa$. Thus,


$$\sum_{B':\, \Lambda(B') \ni x} \epsilon^{|B'|} \le \sum_{n=1}^{\infty} \kappa^n \epsilon^n \le 1$$ [35]


once $\epsilon$ is sufficiently small, and thus


$$\sum_{B' \in N(B)} \epsilon^{|B'|} \le |\Lambda(B)| \le |B|$$ [36]


yielding [34] (using $1 - t \ge e^{-2t}$ for $t \le 1/2$). To have $w \in D_{W,R}$ (for any W) it is, for $R(B) = (\epsilon e^{-2})^{|B|}$, sufficient to take $\beta \le \beta_0$ with $\tanh\beta_0 = \epsilon e^{-2}$.

As a consequence, for $\beta \le \beta_0$ we can use the cluster expansion theorem to obtain a convergent power series in powers of $\tanh\beta$. In particular, using $\Lambda(X) = \bigcup_{B \in \mathrm{supp}\,X} \Lambda(B)$, we get the pressure by the explicit formula


$$\beta p(\beta) = \log 2 + d \log(\cosh\beta) + \sum_{X:\, \Lambda(X) \ni x} \frac{a(X)\, w^X}{|\Lambda(X)|}$$ [37]


for any fixed $x \in \mathbb{Z}^d$ (by translation invariance of the contributing terms, the choice of x is irrelevant). The function $\beta p(\beta)$ is analytic on the region $\beta < \beta_0$ since it is obtained as a uniformly absolutely convergent series of analytic terms $(\tanh\beta)^{|B|}$.

This type of high-temperature cluster expansion can be extended to a large class of models with the Boltzmann factor in the form $\exp\{-\beta \sum_A U_A(\phi)\}$, where $\phi = (\phi_x; x \in \mathbb{Z}^d)$ is the configuration with an a priori on-site probability distribution $\nu(d\phi_x)$ and $U_A$, for any finite $A \subset \mathbb{Z}^d$, are the multi-site interactions (depending only on $(\phi_x; x \in A)$). Using the Mayer trick we can rewrite


$$\exp\Bigl\{ -\beta \sum_{A \subset \Lambda} U_A(\phi) \Bigr\} = \prod_{A \subset \Lambda} \bigl(1 + f_A(\phi)\bigr)$$ [38]

with $f_A(\phi) = \exp\{-\beta U_A(\phi)\} - 1$. Expanding the product we get a polymer representation with polymers $\mathcal{A}$ consisting of connected collections $\mathcal{A} = (A_1,\ldots,A_k)$ with weights


$$w(\mathcal{A}) = \int \prod_{A \in \mathcal{A}} f_A(\phi) \prod_{x \in \bigcup_{A \in \mathcal{A}} A} \nu(d\phi_x)$$ [39]


Under appropriate bounds on the interactions $U_A$ and for $\beta$ small enough, using $\Lambda(\mathcal{A})$ to denote the set $\bigcup_{A \in \mathcal{A}} A$, we get


$$\sum_{\mathcal{A}:\, \Lambda(\mathcal{A}) \ni x} |w(\mathcal{A})| \le 1$$ [40]


This bound allows us, as before in the case of the high-temperature Ising model, to apply the cluster expansion theorem, yielding an explicit series expansion for the pressure.


Correlations 


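Cluster expansions can be applied to evaluate the decay of correlations. Let us consider, for the class of models discussed above, the expectation


$$\langle \Psi \rangle_\Lambda = \frac{1}{Z_\Lambda} \int \Psi(\phi)\, e^{-\beta H_\Lambda(\phi)} \prod_{x \in \Lambda} \nu(d\phi_x)$$ [41]


with $H_\Lambda(\phi) = \sum_{A \subset \Lambda} U_A(\phi)$ and a function $\Psi$ depending only on the variables $\phi_x$ on sites x from a finite set $S \subset \Lambda \subset \mathbb{Z}^d$.

A convenient way of evaluating the expectation starts with the introduction of the modified partition function


$$Z_{\Lambda,\Psi}(\alpha) = Z_\Lambda + \alpha Z_\Lambda \langle \Psi \rangle_\Lambda = Z_\Lambda \bigl(1 + \alpha \langle \Psi \rangle_\Lambda\bigr)$$ [42]

Clearly,


$$\langle \Psi \rangle_\Lambda = \frac{d \log Z_{\Lambda,\Psi}(\alpha)}{d\alpha} \Big|_{\alpha = 0}$$ [43]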
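Thus, one may get an expression for the expectation $\langle \Psi \rangle_\Lambda$ by forming a polymer representation of $Z_{\Lambda,\Psi}(\alpha)$ and isolating terms linear in $\alpha$ in the corresponding cluster expansion. For the first step, in the just cited high-temperature case with general multi-site interactions, we first enlarge the original set $\mathfrak{A}(\Lambda)$ of all polymers in $\Lambda$ (consisting of connected collections $\mathcal{A} = (A_1,\ldots,A_k)$) to $W_S(\Lambda) = \mathfrak{A}(\Lambda) \cup \mathfrak{A}_S(\Lambda)$, where $\mathfrak{A}_S(\Lambda)$ is the set of all collections $(\mathcal{A}_1,\ldots,\mathcal{A}_k)$ of polymers such that each of them intersects the set S (the polymers $(\mathcal{A}_1,\ldots,\mathcal{A}_k)$ are "glued" by S into a single entity). Compatibility is defined as before by disjointness; in addition, any two collections from $\mathfrak{A}_S(\Lambda)$ are declared to be incompatible, and any polymer $\mathcal{A}$ from $\mathfrak{A}(\Lambda)$ intersecting S is considered to be incompatible with any collection from $\mathfrak{A}_S(\Lambda)$. Defining now


$w_\alpha(\mathcal{A}) = w(\mathcal{A})$ for $\mathcal{A} \in \mathfrak{A}(\Lambda)$ and


$$w_\alpha(\boldsymbol{\mathcal{A}}) = \alpha \int \Psi(\phi) \prod_{i=1}^{k} \prod_{A \in \mathcal{A}_i} f_A(\phi) \prod_{x \in \Lambda(\mathcal{A}_1) \cup \cdots \cup \Lambda(\mathcal{A}_k) \cup S} \nu(d\phi_x)$$ [44]

for $\boldsymbol{\mathcal{A}} = (\mathcal{A}_1,\ldots,\mathcal{A}_k) \in \mathfrak{A}_S(\Lambda)$, we get $Z_{\Lambda,\Psi}(\alpha)$ exactly in the form [18],


$$Z_{\Lambda,\Psi}(\alpha) = \sum_{\mathcal{T} \subset W_S(\Lambda)} \prod_{\mathcal{A} \in \mathcal{T}} w_\alpha(\mathcal{A})$$ [45]


As a result, we have


$$\log Z_{\Lambda,\Psi}(\alpha) = \sum_{X \in \mathcal{X}(W_S(\Lambda))} a(X)\, w_\alpha^X$$ [46]


which makes it easy to isolate the terms linear in $\alpha$: namely, the terms with multi-indices X with $\mathrm{supp}\,X \cap \mathfrak{A}_S(\Lambda)$ consisting of a single collection, say $\mathcal{A}_0$, that occurs with multiplicity one, $X(\mathcal{A}_0) = 1$. Explicitly, using

$$\mathcal{X}_{S,\mathcal{A}_0}(\Lambda) = \{X \in \mathcal{X}(W_S(\Lambda)) : \mathrm{supp}\,X \cap \mathfrak{A}_S(\Lambda) = \{\mathcal{A}_0\},\ X(\mathcal{A}_0) = 1\}$$ [47]


we get


$$\langle \Psi \rangle_\Lambda = \sum_{\mathcal{A}_0 \in \mathfrak{A}_S(\Lambda)} \sum_{X \in \mathcal{X}_{S,\mathcal{A}_0}(\Lambda)} a(X)\, w^X$$ [48]

where w denotes the weight $w_\alpha$ evaluated at $\alpha = 1$.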

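It is easy to show that, for sufficiently small $\beta$, the series on the right-hand side is absolutely convergent even if we extend $\mathfrak{A}_S(\Lambda)$ to $\mathfrak{A}_S = \bigcup_\Lambda \mathfrak{A}_S(\Lambda)$ and $\mathcal{X}_{S,\mathcal{A}_0}(\Lambda)$ to $\mathcal{X}_{S,\mathcal{A}_0} = \bigcup_\Lambda \mathcal{X}_{S,\mathcal{A}_0}(\Lambda)$. As a result, we have an explicit expression for the limiting expectation $\langle \Psi \rangle$ in terms of an absolutely convergent power series. This can be immediately applied to show that $|\langle \Psi \rangle - \langle \Psi \rangle_\Lambda|$ decays exponentially in the distance between S and the complement of $\Lambda$. Indeed, it suffices to find a suitable bound on $\sum_X |a(X)|\,|w|^X$ with the sum running over all clusters X reaching from the set S to $\Lambda^c$. To this end one does not need to evaluate explicitly the number of clusters of given "diameter" $\mathrm{diam}(X) = \sum_{\mathcal{A}} X(\mathcal{A})\, \mathrm{diam}(\Lambda(\mathcal{A})) = m$ with $m \ge \mathrm{dist}(S, \Lambda^c)$. The needed estimate is actually already contained in the condition (ii) from the cluster expansion theorem. It just suffices to choose a suitable K and assume that $\beta$ is small enough to assure validity of [40] in a stronger form, $\sum_{\mathcal{A}:\, \Lambda(\mathcal{A}) \ni x} |w(\mathcal{A})|\, K^{|\Lambda(\mathcal{A})|} \le 1$, yielding eventually


$$\sum_{X:\, \mathrm{diam}(X) \ge \mathrm{dist}(S,\Lambda^c)} |a(X)|\,|w|^X \le K^{-\mathrm{dist}(S,\Lambda^c)} \sum_{x \in S}\ \sum_{X:\, \bigcup_{\mathcal{A} \in \mathrm{supp}\,X} \Lambda(\mathcal{A}) \ni x} |a(X)|\,|w|^X K^{\sum_{\mathcal{A}} X(\mathcal{A}) |\Lambda(\mathcal{A})|} \le |S|\, K^{-\mathrm{dist}(S,\Lambda^c)}$$ [49]


Exponential decay of the correlations $\langle \Psi_1; \Psi_2 \rangle_\Lambda = \langle \Psi_1 \Psi_2 \rangle_\Lambda - \langle \Psi_1 \rangle_\Lambda \langle \Psi_2 \rangle_\Lambda$ (and of the limiting $\langle \Psi_1; \Psi_2 \rangle$) in the distance between the supports of $\Psi_1$ and $\Psi_2$ can be established in a similar way by isolating terms proportional to $\alpha_1 \alpha_2$ in the cluster expansion of $\log Z_{\Lambda,\Psi_1,\Psi_2}(\alpha_1,\alpha_2)$ with


$$Z_{\Lambda,\Psi_1,\Psi_2}(\alpha_1,\alpha_2) = Z_\Lambda \bigl(1 + \alpha_1 \langle \Psi_1 \rangle_\Lambda + \alpha_2 \langle \Psi_2 \rangle_\Lambda + \alpha_1 \alpha_2 \langle \Psi_1 \Psi_2 \rangle_\Lambda\bigr)$$ [50]


The resulting claim can be readily generalized to one about the decay of the correlation $\langle \Psi_1; \ldots; \Psi_k \rangle$ in terms of the shortest tree connecting the supports $S_1,\ldots,S_k$ of the functions $\Psi_1,\ldots,\Psi_k$.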


Low-Temperature Expansions 


Finally, in some models with symmetries, we can apply cluster expansions also at low temperatures. Let us illustrate it again in the case of the Ising model. This time, we take the partition function $Z_\Lambda^+(\beta)$ with plus boundary conditions. First, let us define for each nearest-neighbor bond (x,y) its dual as the (d - 1)-dimensional closed unit hypercube orthogonal to the segment from x to y and bisecting it at its center. For a given configuration $\sigma_\Lambda$, we consider the boundary of the regions of constant spins consisting of the union $\partial(\sigma_\Lambda)$ of all hypercubes that are dual to nearest-neighbor bonds (x,y) for which $\sigma_x \ne \sigma_y$. The contours corresponding to $\sigma_\Lambda$ are now defined as the connected components of $\partial(\sigma_\Lambda)$. Notice that, under the fixed boundary condition, there is a one-to-one correspondence between configurations $\sigma_\Lambda$ and sets $\Gamma$ of mutually compatible (disconnected) contours in $\Lambda$. Observing that the number of faces in $\partial(\sigma_\Lambda)$ is just the sum of the areas $|\gamma|$ of the contours $\gamma \in \Gamma$, we get the polymer representation


$$Z_\Lambda^+(\beta) = e^{\beta |E(\Lambda)|} \sum_{\Gamma} \prod_{\gamma \in \Gamma} e^{-2\beta|\gamma|}$$ [51]


where the sum is over all collections $\Gamma$ of disjoint contours in $\Lambda$. Here $E(\Lambda)$ is the set of all bonds (x,y) with at least one endpoint x, y in $\Lambda$.

The condition [24] with $r(\gamma) = \epsilon^{|\gamma|}$ yields a similar bound on the weights $w(\gamma) = e^{-2\beta|\gamma|}$ as in the high-temperature expansion. To verify it, for $\beta$ sufficiently large, boils down to estimating the number of contours of size n that contain a fixed site.

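As a result, we can employ the cluster expansion theorem to get


$$\log Z_\Lambda^+(\beta) = \beta |E(\Lambda)| + \sum_{X \in \mathcal{X}(\mathcal{C}(\Lambda))} a(X)\, w^X$$ [52]


where $\mathcal{C}(\Lambda)$ is the set of all contours in $\Lambda$, with an explicit formula for the limit


$$\beta p(\beta) = \beta d + \sum_{X:\, \Lambda(X) \ni x} \frac{a(X)\, w^X}{|\Lambda(X)|}$$ [53]


Here, $\Lambda(X)$ is the set of sites attached to contours from supp X,


$$\Lambda(X) = \bigcup_{\gamma \in \mathrm{supp}\,X} \Lambda(\gamma)$$ [54]

with

$$\Lambda(\gamma) = \{x \in \mathbb{Z}^d : \mathrm{dist}(x,\gamma) \le 1/2\}$$ [55]


As a consequence of the fact that [53] is, for large $\beta$, an absolutely convergent sum of analytic terms $a(X)\, w^X = a(X)\, e^{-2\beta \sum_\gamma |\gamma| X(\gamma)}$ (considered as functions of $\beta$), the function $\beta p(\beta)$ is, for large $\beta$, analytic in $\beta$.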

The fact that one can explicitly express the difference $\log Z_\Lambda^+(\beta) - |\Lambda|\,\beta p(\beta)$ (cf. [28]) has found numerous applications in situations where one needs an accurate evaluation of the influence of the boundary of the region $\Lambda$ on the partition function. One such example is the study of the microscopic behavior of interfaces. The main idea is to use the explicit expression in the form


$$Z_\Lambda^+(\beta) = \exp\Bigl\{ \beta p(\beta)|\Lambda| - \sum_{X:\, \Lambda(X) \cap \Lambda^c \ne \emptyset} a(X)\, w^X\, \frac{|\Lambda(X) \cap \Lambda|}{|\Lambda(X)|} \Bigr\} = \exp\{\beta p(\beta)|\Lambda|\} \prod_{X:\, \Lambda(X) \cap \Lambda^c \ne \emptyset} (1 + f_X)$$ [56]

Noticing that

$$f_X = \exp\Bigl\{ -a(X)\, w^X\, \frac{|\Lambda(X) \cap \Lambda|}{|\Lambda(X)|} \Bigr\} - 1$$


does not vanish only if $\Lambda(X) \cap \Lambda^c \ne \emptyset$, we can expand the product to obtain "decorations" of the boundary $\partial\Lambda$ by clusters $f_X$. In the case of an interface these clusters can be incorporated into the weight of the interface, while on a fixed boundary they yield a "wall free energy."

The possibility of the (low-temperature) polymer representation of the partition function in terms of contours is based on the $+ \leftrightarrow -$ symmetry of the Ising model. In the absence of such a symmetry, cluster expansions can still be used, but in the framework of Pirogov-Sinai theory (see Pirogov-Sinai Theory).


Introduction


Very generally, a family of coherent states is a set of 
continuously labeled quantum states, with specific 
mathematical and physical properties, in terms 
of which arbitrary quantum states can be expressed 
as linear superpositions. Since coherent states are 
continuously labeled, they form overcomplete 
sets of vectors in the Hilbert space of states. 
Originally these states were introduced into physics 
by Schrödinger (1926), as a family of quantum
states in terms of which the transition from quantum 
to classical mechanics could be conveniently studied. 
These states have the minimal uncertainty property, 
in the sense that they saturate the Heisenberg 
uncertainty relations. The name coherent state was 
applied when these states were rediscovered in the 
context of quantum optical radiation by Glauber, 
Klauder, and Sudarshan. It was demonstrated that in 
these states the correlation functions of the quantum 
optical field factorize as they do in classical optics, 
so that the optical field has a near-classical behavior, 
with the optical beam being coherent. In this article, 
we shall refer to these originally studied coherent 
states as canonical coherent states (CCS). 

The canonical coherent states, apart from their 
use in quantum optics, have also been found to be 
extremely useful in computations in atomic and 
molecular physics, in quantum statistical mechanics, 
and in certain areas of mathematics and mathema- 
tical physics, including harmonic analysis, symplec- 
tic geometry, and quantization theory. Their wide 
applicability has prompted the search for other 
families of states sharing similar mathematical and 
physical properties. These other families of states are 
usually called generalized coherent states, even when 
there is no link to optical coherence in such studies. 


Some Properties of CCS 


In addition to the minimal uncertainty property, the canonical coherent states have a number of analytical and group-theoretical properties which are taken as starting points in looking for generalizations. We now define the canonical coherent states mathematically and enumerate a few of these properties. Suppose that the vectors $|0\rangle, |1\rangle, \ldots, |n\rangle, \ldots$ correspond to quantum states of 0, 1, ..., n, ... excitons, respectively. The Hilbert space of these states, in which they form an orthonormal basis, is often known as Fock space. The canonical coherent states are then defined in terms of this basis, for each complex number z, by the analytic expansion:


$$|z\rangle = e^{-|z|^2/2} \sum_{n=0}^{\infty} \frac{z^n}{\sqrt{n!}}\, |n\rangle$$ [1]


The states $|z\rangle$ are normalized to unity: $\langle z|z\rangle = 1$. They satisfy the formal eigenvalue equation


$$a|z\rangle = z|z\rangle$$ [2]


where a is the annihilation operator for excitons, which acts on the basis vectors (Fock states) $|n\rangle$ as follows:


$$a|n\rangle = \sqrt{n}\, |n-1\rangle$$ [3]

Its adjoint $a^\dagger$ has the action

$$a^\dagger|n\rangle = \sqrt{n+1}\, |n+1\rangle$$ [4]

and

$$[a, a^\dagger] = a a^\dagger - a^\dagger a = I$$ [5]


I being the identity operator on Fock space.
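The normalization and the eigenvalue equation can be checked numerically in a truncated Fock space. The following is a minimal sketch, assuming a cutoff `dim` on the number of basis vectors and an arbitrarily chosen value of z (both illustrative); the function names are hypothetical. Truncation makes the normalization only approximate, and the eigenvalue equation is compared on all levels below the cutoff.

```python
import math

def coherent_amplitudes(z, dim):
    """Components <n|z>, n = 0..dim-1, of the expansion of |z> in Fock states."""
    amps = []
    term = complex(math.exp(-abs(z) ** 2 / 2))  # n = 0 amplitude
    for n in range(dim):
        amps.append(term)
        term = term * z / math.sqrt(n + 1)      # builds z^(n+1)/sqrt((n+1)!)
    return amps

def annihilate(amps):
    """Action of a: (a psi)_n = sqrt(n+1) psi_(n+1); top level set to zero."""
    lowered = [math.sqrt(n + 1) * amps[n + 1] for n in range(len(amps) - 1)]
    return lowered + [0j]

z = 0.7 - 1.1j          # arbitrary illustrative label
dim = 60                # assumed truncation of Fock space
psi = coherent_amplitudes(z, dim)

norm_sq = sum(abs(c) ** 2 for c in psi)
a_psi = annihilate(psi)
# Compare a|z> with z|z> on every level that the truncation does not affect.
residual = max(abs(a_psi[n] - z * psi[n]) for n in range(dim - 1))
```

With this cutoff the squared norm differs from 1 only by the negligible tail of the Poisson weights $e^{-|z|^2}|z|^{2n}/n!$, and `residual` is at the level of floating-point rounding.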
Introducing the self-adjoint operators Q and P, of position and momentum, respectively,


$$Q = \frac{a + a^\dagger}{\sqrt{2}}, \qquad P = \frac{a - a^\dagger}{i\sqrt{2}}$$ [6]


it is possible to demonstrate the minimal uncertainty property referred to above (we take $\hbar = 1$):


$$\langle \Delta Q \rangle \langle \Delta P \rangle = \frac{1}{2}$$ [7]

where for any observable A,


$$\langle \Delta A \rangle = \bigl[ \langle z|A^2|z\rangle - \langle z|A|z\rangle^2 \bigr]^{1/2}$$


is its dispersion in the state $|z\rangle$.


One can also prove the resolution of the identity,


$$\int_{\mathbb{C}} |z\rangle\langle z|\, \frac{dq\, dp}{2\pi} = I$$ [8]


where $z = (1/\sqrt{2})(q + ip)$ has been written in terms of its real and imaginary parts $(1/\sqrt{2})q$ and $(1/\sqrt{2})p$, respectively. The above operator integral is to be understood in the weak sense, as will be explained later. Equation [8] incorporates the mathematical fact that the set of vectors $|z\rangle$ is overcomplete in the Hilbert space. Indeed, using [8] any vector $|\phi\rangle$ in the Hilbert space can be written as a linear (integral) superposition of these states:


$$|\phi\rangle = \int_{\mathbb{C}} \overline{\Psi(z)}\, |z\rangle\, \frac{dq\, dp}{2\pi}$$


where $\Psi$ is the component function, $\Psi(z) = \langle \phi|z\rangle$. Thus, the coherent states $|z\rangle$ form a continuously labeled total set of vectors in the Hilbert space and, since this space is separable, they are an overcomplete set.

Analytic properties of the vectors $|z\rangle$ emerge when
the scalar product $\langle\phi|z\rangle$ is taken with respect to an
arbitrary vector $|\phi\rangle$ in Fock space. From [1] it is
clear that


$$F(z) = \langle\phi|z\rangle = e^{-|z|^2/2} f(z)$$


where $f$ is an entire analytic function in the complex
variable $z$. Moreover, the mapping $\phi \mapsto f$ is an
isometric embedding of the Fock space onto the
Hilbert space of analytic functions, with respect to
the norm


$$\|f\| = \left[\int_{\mathbb{C}} |f(z)|^2\, e^{-|z|^2}\, d\mu(z, \bar{z})\right]^{1/2}$$ [9]


defined by the measure $d\mu(z, \bar{z}) = (1/2\pi)\, dq\, dp$.
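Group-theoretical properties of the CCS can be
demonstrated by noting that


$$|n\rangle = \frac{(a^\dagger)^n}{\sqrt{n!}}\, |0\rangle \quad \text{and} \quad a|0\rangle = 0$$


using which [1] can be recast into the form


$$|z\rangle = e^{-|z|^2/2}\, e^{z a^\dagger}\, |0\rangle = U(z)|0\rangle, \qquad U(z) = e^{z a^\dagger - \bar{z} a}$$ [10]


The vectors $|z\rangle$ and the unitary operator $U(z)$ can be
reexpressed in terms of the real variables $q$, $p$ and the
operators $Q$, $P$ as


$$|z\rangle = |q,p\rangle = U(q,p)|0\rangle, \qquad U(q,p) = e^{i(pQ - qP)}$$ [11]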


The operators $U(q,p)$ realize a (projective) unitary,
irreducible representation of the Weyl-Heisenberg
group, which is the group whose Lie algebra has the
generators $Q$, $P$, and $I$, obeying the commutation
relations $[Q, P] = iI$. The existence of the resolution
of the identity [8] is the statement of the fact that
this representation is square integrable (a notion
which will be elaborated upon in the section “Some
examples”), which gives us the next paradigm for
building coherent states, namely by the action, on a
fixed vector, of the unitary operators of a square-integrable
representation of a locally compact
group.

The above range of properties, which are enjoyed 
by the CCS, cannot all be expected to hold when 
looking for generalizations. It then becomes neces- 
sary to adopt one or other of these properties as the 
starting point and to proceed from there. In so 
doing, it is best first to set down a general definition 
of coherent states, involving a minimal mathema- 
tical structure. Motivated more by possible applica- 
tions to physics, we do this in the following section. 


General Definition 


Let $\mathfrak{H}$ be an abstract, separable Hilbert space over
the complexes, $X$ a locally compact space and $d\nu$ a
measure on $X$. Let $|x,i\rangle$ be a family of vectors in $\mathfrak{H}$,
defined for each $x$ in $X$ and $i = 1, 2, 3, \ldots, N$, where
$N$ is usually a finite integer, although it could also
be infinite. We assume that this set of vectors
possesses the following properties:


1. For each $i$, the mapping $x \mapsto |x,i\rangle$ is weakly
continuous, that is, for each vector $|\phi\rangle$ in $\mathfrak{H}$, the
function $\Psi_i(x) = \langle x,i|\phi\rangle$ is continuous (in the
topology of $X$).

2. For each $x$ in $X$, the vectors $|x,i\rangle$, $i = 1, 2, \ldots, N$,
are linearly independent.

3. The resolution of the identity


$$\sum_{i=1}^{N} \int_X |x,i\rangle\langle x,i|\; d\nu(x) = I_{\mathfrak{H}}$$ [12]


holds in the weak sense on the Hilbert space $\mathfrak{H}$,
that is, for any two vectors $|\phi\rangle$, $|\psi\rangle$ in $\mathfrak{H}$, the
following equality holds:


$$\sum_{i=1}^{N} \int_X \langle\phi|x,i\rangle\langle x,i|\psi\rangle\; d\nu(x) = \langle\phi|\psi\rangle$$


A set of vectors $|x,i\rangle$ satisfying the above three
properties is called a family of generalized vector
coherent states. In case $N = 1$, the set is called a family
of generalized coherent states. Sometimes the resolution
of the identity condition is replaced by a weaker
condition, with the vectors $|x,i\rangle$ simply forming a total
set in $\mathfrak{H}$ and the functions $\Psi_i(x) = \langle x,i|\phi\rangle$, as $|\phi\rangle$ runs
through $\mathfrak{H}$, forming a reproducing kernel Hilbert
space. Alternatively, the identity on the right-hand
side of [12] could also be replaced by a bounded,
positive operator $T$ with bounded inverse. In this case,
the term frame is also used for the family of generalized
coherent states. For physical applications, however,
the resolution of the identity condition is always
assumed to hold, although the measure $d\nu$ could be of
a very general nature (possibly also singular). The
objective in all these cases is to ensure that an arbitrary
vector $|\phi\rangle$ be expressible as a linear (integral)
combination of these vectors. Indeed, [12] is immediately
seen to imply that


$$|\phi\rangle = \sum_{i=1}^{N} \int_X \Psi_i(x)\, |x,i\rangle\; d\nu(x)$$ [13]


where $\Psi_i(x) = \langle x,i|\phi\rangle$.

Associated to a family of generalized coherent
states on a Hilbert space $\mathfrak{H}$, there is an intrinsic
isomorphism between this space and a Hilbert space
of (in general, vector-valued) continuous functions
over $X$. Using this isomorphism, it is always possible
to look upon coherent states as a family of
continuous functions which are square integrable
with respect to the measure $d\nu$. To demonstrate this,
we note that, in view of [12], for each vector $|\phi\rangle$ in
$\mathfrak{H}$, the vector-valued function $\Psi(x)$ on $X$, with
components $\Psi_i(x) = \langle x,i|\phi\rangle$, $i = 1, 2, \ldots, N$, satisfies
the norm condition


$$\sum_{i=1}^{N} \int_X |\Psi_i(x)|^2\; d\nu(x) = \|\phi\|^2$$


This means that the set of vectors $\Psi$, as $|\phi\rangle$ runs
through $\mathfrak{H}$, is a closed subspace of the Hilbert space
$L^2_{\mathbb{C}^N}(X, d\nu)$ of $N$-vector-valued functions on $X$. Let us
denote this subspace by $\mathfrak{H}_K$ and note that this space
is a reproducing kernel Hilbert space with a matrix-valued
kernel $K(x, y)$ having matrix elements


$$K(x, y)_{ij} = \langle x,i|y,j\rangle, \qquad i, j = 1, 2, \ldots, N$$ [14]

and enjoying the properties

$$K(x, y)_{ij} = \overline{K(y, x)_{ji}}, \qquad K(x, x)_{ii} > 0$$ [15]


and


$$\sum_{k=1}^{N} \int_X K(x, z)_{ik}\, K(z, y)_{kj}\; d\nu(z) = K(x, y)_{ij}$$ [16]


If $e^i$, $i = 1, 2, \ldots, N$, are the vectors constituting the
canonical basis of $\mathbb{C}^N$, then for each $x$ in $X$ and
$i = 1, 2, \ldots, N$, the vector-valued function $\xi_x^i$ on $X$,
defined by $\xi_x^i(y) = K(y, x)\, e^i$, is the image in $\mathfrak{H}_K$ of
the generalized coherent state $|x,i\rangle$, under the
above-mentioned isometry. The vectors $\xi_x^i$ span
the space $\mathfrak{H}_K$ and for an arbitrary element $\Psi$ of this
Hilbert space, the reproducing property [16] of the
kernel implies the relation


$$\int_X K(x, y)\, \Psi(y)\; d\nu(y) = \Psi(x)$$ [17]


Conversely, given any reproducing kernel Hilbert
space, with a kernel satisfying the relations [15] and
[16], generalized coherent states can be constructed
as above in terms of this kernel. Mathematically,
therefore, generalized coherent states are just the set
of vectors naturally defined by the kernel in a
reproducing kernel Hilbert space.


Some Examples 


We present in this section some of the more 
commonly used types of coherent states, as illustra- 
tions of the general structure given above. 

A large class of generalizations of the canonical
coherent states [1] is obtained by a simple modification
of their analytic structure. Let $x_1 < x_2 < \cdots < x_n < \cdots$
be an infinite sequence of positive numbers
($x_1 \neq 0$). Define $x_n! = x_1 x_2 \cdots x_n$ and by convention
set $x_0! = 1$. In the same Fock space in which the CCS
were described, we now define the related deformed
or nonlinear coherent states via the analytic
expansion


$$|z\rangle = \mathcal{N}(|z|^2)^{-1/2} \sum_{n=0}^{\infty} \frac{z^n}{\sqrt{x_n!}}\, |n\rangle$$ [18]


The normalization factor $\mathcal{N}(|z|^2)$ is chosen so that
$\langle z|z\rangle = 1$. These generalized coherent states are
overcomplete in the Fock space and satisfy a
resolution of the identity of the type


$$\int_{\mathcal{D}} |z\rangle\langle z|\; \mathcal{N}(|z|^2)\; d\nu(z, \bar{z}) = I$$ [19]


$\mathcal{D}$ being an open disk in the complex plane of radius
$L$, the radius of convergence of the series
$\sum_{n=0}^{\infty} z^n/\sqrt{x_n!}$. (In the case of the CCS, $L = \infty$.)
The measure $d\nu$ is generically of the form $d\theta\, d\lambda(r)$
(for $z = re^{i\theta}$), where $d\lambda$ is related to the $x_n!$ through
the moment condition


$$\frac{x_n!}{2\pi} = \int_0^L r^{2n}\; d\lambda(r), \qquad n = 0, 1, 2, \ldots$$ [20]


This means that once the quantities $x_n!$ are specified,
the measure $d\lambda$ is to be determined by solving the
moment problem [20], which of course may not
always have a solution. This puts a constraint on the
type of sequences $\{x_n\}$ which may be used in the
construction.

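Once again, we see that for an arbitrary vector $|\phi\rangle$
in the Fock space, the function $F(z) = \langle\phi|z\rangle$, of the
complex variable $z$, is of the form
$F(z) = \mathcal{N}(|z|^2)^{-1/2} f(z)$, where $f$ is an analytic function on
the domain $\mathcal{D}$. The reproducing kernel associated to
these coherent states is


$$K(\bar{z}, z') = \langle z|z'\rangle = \left[\mathcal{N}(|z|^2)\, \mathcal{N}(|z'|^2)\right]^{-1/2} \sum_{n=0}^{\infty} \frac{(\bar{z} z')^n}{x_n!}$$ [21]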





By analogy with [2], one can define a generalized
annihilation operator $A$ by its action on the vectors $|z\rangle$,


$$A|z\rangle = z|z\rangle$$ [22]


and its adjoint operator $A^\dagger$. These act on the Fock
states $|n\rangle$ as follows:


$$A|n\rangle = \sqrt{x_n}\, |n-1\rangle, \qquad A^\dagger|n\rangle = \sqrt{x_{n+1}}\, |n+1\rangle$$ [23]


Depending on the exact values of the quantities $x_n$,
these two operators, together with the identity $I$ and
all their commutators, could generate a wide range
of algebras including various deformed quantum
algebras. The term nonlinear, as often applied to
these generalized coherent states, comes again from
quantum optics, where many such families of states
are used in studying the interaction between the
radiation field and atoms, and the strength of the
interaction itself depends on the frequency of
radiation. Of course, these coherent states will not
in general have either the group-theoretical or the
minimal uncertainty properties of the CCS.

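The following is an example of generalized
coherent states of the above type, built over the
unit disk, $\mathcal{D} = \{z \in \mathbb{C} \mid |z| < 1\}$: on the Fock space,
we define the states


$$|z\rangle = (1 - r^2)^{\kappa} \sum_{n=0}^{\infty} \left[\frac{(2\kappa)_n}{n!}\right]^{1/2} z^n\, |n\rangle, \qquad r = |z|$$ [24]


where $\kappa = 1, 3/2, 2, 5/2, \ldots$ and

$$(a)_m = \frac{\Gamma(a + m)}{\Gamma(a)} = a(a+1)(a+2)\cdots(a+m-1)$$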

Comparing [24] with [18] we see that $x_n = n/(2\kappa + n - 1)$,
so that $\lim_{n\to\infty} x_n = 1$. Thus, the infinite sum
is convergent for any $z$ lying in the unit disk. These
generalized coherent states arise from representations
of the group SU(1,1) belonging to the discrete
series, each irreducible representation being labeled
by a specific value of the index $\kappa$. The associated
Hilbert space of functions, analytic on the unit disk,
is a subspace of $L^2(\mathcal{D}, d\mu_\kappa)$, with


$$d\mu_\kappa(z, \bar{z}) = (2\kappa - 1)\, \frac{(1 - r^2)^{2\kappa - 2}}{\pi}\; r\, dr\, d\theta$$


which can be obtained by solving the moment
problem [20]. The resolution of the identity satisfied
by these states is


$$\frac{2\kappa - 1}{\pi} \int_{\mathcal{D}} |z\rangle\langle z|\; \frac{r\, dr\, d\theta}{(1 - r^2)^2} = I$$ [25]


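The associated generalized creation and annihilation
operators are


$$A|n\rangle = \sqrt{\frac{n}{2\kappa + n - 1}}\, |n-1\rangle, \qquad A^\dagger|n\rangle = \sqrt{\frac{n+1}{2\kappa + n}}\, |n+1\rangle$$ [26]


so that, clearly, $[A, A^\dagger] \neq I$.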
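As a numerical cross-check of this example, the sketch below (plain Python; the function names, the sample values of $\kappa$ and $r$, and the number of series terms are illustrative assumptions) uses $x_n = n/(2\kappa + n - 1)$ to confirm that the normalization series $\sum_n r^{2n}/x_n!$ sums to $(1 - r^2)^{-2\kappa}$, and that $[A, A^\dagger]$ acts on $|n\rangle$ as multiplication by $x_{n+1} - x_n$, which differs from 1.

```python
def x(n, kappa):
    """Deformation sequence x_n = n / (2*kappa + n - 1) of the unit-disk example."""
    return n / (2 * kappa + n - 1)

def normalization(r, kappa, terms=400):
    """Partial sum of N(r^2) = sum_n r^(2n) / x_n!, built term by term."""
    term, total = 1.0, 1.0               # n = 0 term, using x_0! = 1
    for n in range(1, terms):
        term *= r * r / x(n, kappa)      # multiplies in the factor r^2 / x_n
        total += term
    return total

def commutator_eigenvalue(n, kappa):
    """Eigenvalue of [A, A^dagger] on |n>, namely x_{n+1} - x_n."""
    return x(n + 1, kappa) - x(n, kappa)

kappa, r = 1.5, 0.5                      # sample values (assumptions)
series = normalization(r, kappa)
closed_form = (1 - r * r) ** (-2 * kappa)
```

The agreement of `series` with `closed_form` is just the binomial series $\sum_n (2\kappa)_n\, r^{2n}/n! = (1 - r^2)^{-2\kappa}$, which is where the prefactor $(1 - r^2)^{\kappa}$ in [24] comes from.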

Operators $A$ and $A^\dagger$ of the general type defined in
[23] are also known as ladder operators. When such
operators appear as generators of representations of
Lie algebras, their eigenvectors (see [22]) are usually
called Barut-Girardello coherent states. As an example,
the representation of the Lie algebra of SU(1,1) on the
Fock space is generated by the three operators $K_+$, $K_-$,
and $K_3$, which satisfy the commutation relations


$$[K_3, K_\pm] = \pm K_\pm, \qquad [K_-, K_+] = 2K_3$$ [27]


They act on the vectors $|n\rangle$ as follows:


$$K_-|n\rangle = \sqrt{n(2\kappa + n - 1)}\, |n-1\rangle, \qquad K_+|n\rangle = \sqrt{(n+1)(2\kappa + n)}\, |n+1\rangle, \qquad K_3|n\rangle = (\kappa + n)\, |n\rangle$$ [28]


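Thus, $K_-|0\rangle = 0$ and


$$|n\rangle = \frac{(K_+)^n}{\sqrt{n!\,(2\kappa)_n}}\, |0\rangle$$


The Barut-Girardello coherent states $|z\rangle$ are now
defined as the formal eigenvectors of the ladder
operator $K_-$:


$$K_-|z\rangle = z|z\rangle, \qquad z \in \mathbb{C}$$ [29]

They have the analytic form

$$|z\rangle = \frac{|z|^{\kappa - 1/2}}{\sqrt{I_{2\kappa - 1}(2|z|)}} \sum_{n=0}^{\infty} \frac{z^n}{\sqrt{n!\,(2\kappa + n - 1)!}}\, |n\rangle$$ [30]


where $I_\nu(x)$ is the order-$\nu$ modified Bessel function
of the first kind. These coherent states satisfy the
resolution of the identity,


$$\frac{2}{\pi} \int_{\mathbb{C}} |z\rangle\langle z|\; K_{2\kappa - 1}(2r)\, I_{2\kappa - 1}(2r)\; r\, dr\, d\theta = I, \qquad z = re^{i\theta}$$ [31]

where again, $K_\nu(x)$ is the order-$\nu$ modified Bessel
function of the second kind.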

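A nonanalytic extension of the expression [18] is
often used to define generalized coherent states
associated to physical Hamiltonians having pure
point spectra. These coherent states, known as
Gazeau-Klauder coherent states, are labeled by
action-angle variables. Suppose that we are given
the physical Hamiltonian $H = \sum_{n=0}^{\infty} E_n |n\rangle\langle n|$, with
$E_0 = 0$, that is, it has the energy eigenvalues $E_n$ and
eigenvectors $|n\rangle$, which we assume to form an
orthonormal basis for the Hilbert space of states $\mathfrak{H}$.
Let us write the eigenvalues as $E_n = \omega\varepsilon_n$ by introducing
a sequence of dimensionless quantities $\{\varepsilon_n\}$
ordered as: $0 = \varepsilon_0 < \varepsilon_1 < \varepsilon_2 < \cdots$. Then, for all $J \geq 0$
and $\gamma \in \mathbb{R}$, the Gazeau-Klauder coherent states are
defined as


$$|J, \gamma\rangle = \mathcal{N}(J)^{-1/2} \sum_{n=0}^{\infty} \frac{J^{n/2}\, e^{-i\varepsilon_n \gamma}}{\sqrt{\varepsilon_n!}}\, |n\rangle$$ [32]


where $\varepsilon_n! = \varepsilon_1 \varepsilon_2 \cdots \varepsilon_n$ (with $\varepsilon_0! = 1$, as for the $x_n!$
above) and $\mathcal{N}$ is again a normalization factor, which
turns out to be dependent on $J$ only. These coherent
states satisfy the temporal stability condition


$$e^{-iHt}|J, \gamma\rangle = |J, \gamma + \omega t\rangle$$ [33]

and the action identity

$$\langle J, \gamma|H|J, \gamma\rangle = \omega J$$ [34]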


While these generalized coherent states do form an
overcomplete set in $\mathfrak{H}$, the resolution of the identity
is generally not given by an integral relation of the
type [12].
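The temporal stability condition [33] and the action identity [34] are easy to test numerically. The sketch below assumes, purely for illustration, the spectrum $\varepsilon_n = n(n+2)$ together with sample values of $\omega$, $J$, $\gamma$ and $t$; these choices, the truncation, and the function names are not taken from the text.

```python
import cmath
import math

def epsilon(n):
    """Illustrative dimensionless spectrum with epsilon_0 = 0 (an assumption)."""
    return n * (n + 2)

def gk_state(J, gamma, terms=80):
    """Normalized coefficients of |J, gamma> in the energy eigenbasis (truncated)."""
    weights = [1.0]                      # J^n / eps_n!, starting from eps_0! = 1
    for n in range(1, terms):
        weights.append(weights[-1] * J / epsilon(n))
    norm = math.sqrt(sum(weights))       # square root of the normalization N(J)
    return [math.sqrt(w) * cmath.exp(-1j * epsilon(n) * gamma) / norm
            for n, w in enumerate(weights)]

def mean_energy(state, omega):
    """Expectation value of H = omega * sum_n eps_n |n><n|."""
    return sum(omega * epsilon(n) * abs(c) ** 2 for n, c in enumerate(state))

def evolve(state, omega, t):
    """Apply exp(-i H t), which is diagonal in the energy eigenbasis."""
    return [cmath.exp(-1j * omega * epsilon(n) * t) * c for n, c in enumerate(state)]

omega, J, gamma, t = 1.3, 2.5, 0.4, 0.7  # sample values (assumptions)
state = gk_state(J, gamma)
energy = mean_energy(state, omega)
shift_error = max(abs(a - b) for a, b in zip(evolve(state, omega, t),
                                             gk_state(J, gamma + omega * t)))
```

With these inputs `energy` should agree with $\omega J$ and `shift_error` should vanish up to rounding. Both properties hold for any admissible spectrum, because the factor $J^n/\varepsilon_n!$ is built precisely so that multiplying by $\varepsilon_n$ shifts the sum by one term.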

For the second set of examples of generalized
coherent states, we take the group-theoretical structure
of the CCS as the point of departure. Let $G$ be a
locally compact group and suppose that it has a
continuous, irreducible representation on a Hilbert
space $\mathfrak{H}$ by unitary operators $U(g)$, $g \in G$. This
representation is called square integrable if there exists
a nonzero vector $|\psi\rangle$ in $\mathfrak{H}$ for which the integral


$$c(\psi) = \int_G |\langle\psi|U(g)\psi\rangle|^2\; d\mu(g)$$ [35]


converges. Here $d\mu$ is a Haar measure of $G$, which
for definiteness, we take to be the left-invariant
measure. (The value of the above integral is
independent of whether the left- or the right-invariant
measure is used, so we could just as well have used
the right-invariant measure.) A vector $|\psi\rangle$, satisfying
[35], is said to be admissible, and it can be shown
that the existence of one such vector guarantees the
existence of an entire dense set of such vectors in $\mathfrak{H}$.
Moreover, if the group $G$ is unimodular, that is, if the
left- and the right-invariant measures coincide, then
the existence of one admissible vector implies that
every vector in $\mathfrak{H}$ is admissible. Given a square-integrable
representation and an admissible vector
$|\psi\rangle$, let us define the vectors


$$|g\rangle = \frac{1}{\sqrt{c(\psi)}}\, U(g)|\psi\rangle$$ [36]


for all $g$ in the group $G$. These vectors are to be seen
as the analogs of the canonical coherent states [11],
written there in terms of the representation of the
Weyl-Heisenberg group. Next, it can be shown that
the resolution of the identity


$$\int_G |g\rangle\langle g|\; d\mu(g) = I_{\mathfrak{H}}$$ [37]


holds on $\mathfrak{H}$. Thus, the vectors $|g\rangle$ constitute a family
of generalized coherent states. The functions
$F(g) = \langle g|\phi\rangle$ for all vectors $|\phi\rangle$ in $\mathfrak{H}$ are square
integrable with respect to the measure $d\mu$ and the
set of such functions, which in fact are continuous in
the topology of $G$, forms a closed subspace of
$L^2(G, d\mu)$. Furthermore, the mapping $\phi \mapsto F$ is a
linear isometry between $\mathfrak{H}$ and $L^2(G, d\mu)$ and under
this isometry the representation $U$ gets mapped to a
subrepresentation of the left regular representation
of $G$ on $L^2(G, d\mu)$.

A typical example of the above construction is
provided by the affine group, $G_{\mathrm{aff}}$. This is the group
of all $2 \times 2$ matrices of the type


$$g = \begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix}$$


$a$ and $b$ being real numbers with $a \neq 0$. We shall
also write $g = (b, a)$. This group is nonunimodular,
with the left-invariant measure being given by
$d\mu_\ell(b, a) = (1/a^2)\, db\, da$. (The right-invariant measure
is $d\mu_r(b, a) = (1/|a|)\, db\, da$.) The affine group has a unitary
irreducible representation on the Hilbert space
$L^2(\mathbb{R}, dx)$. Vectors in $L^2(\mathbb{R}, dx)$ are measurable
functions $\phi(x)$ of the real variable $x$ and the
(unitary) operators $U(b, a)$ of this representation
act on them in the manner


$$(U(b, a)\phi)(x) = \frac{1}{\sqrt{|a|}}\, \phi\!\left(\frac{x - b}{a}\right)$$
If $\psi$ is a function in $L^2(\mathbb{R}, dx)$ such that its Fourier
transform $\hat{\psi}$ satisfies the condition


$$\int_{\mathbb{R}} \frac{|\hat{\psi}(k)|^2}{|k|}\; dk < \infty$$ [40]


then it can be shown to be an admissible vector, that is,


$$c(\psi) = \int_{G_{\mathrm{aff}}} |\langle\psi|U(b, a)\psi\rangle|^2\; \frac{db\, da}{a^2} < \infty$$


Thus, following the general construction outlined
above, the vectors


$$|b, a\rangle = \frac{1}{\sqrt{c(\psi)}}\, U(b, a)\psi, \qquad (b, a) \in G_{\mathrm{aff}}$$ [41]


define a family of generalized coherent states and
one has the resolution of the identity


$$\int_{G_{\mathrm{aff}}} |b, a\rangle\langle b, a|\; \frac{db\, da}{a^2} = I$$ [42]


on $L^2(\mathbb{R}, dx)$.

In the signal-analysis literature a vector satisfying
the admissibility condition [40] is called a mother
wavelet and the generalized coherent states [41] are
called wavelets. Signals are then identified with
vectors $|\phi\rangle$ in $L^2(\mathbb{R}, dx)$ and the function


$$F(b, a) = \langle b, a|\phi\rangle$$ [43]


is called the continuous wavelet transform of the
signal $\phi$.
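To make the construction concrete, the following plain-Python sketch implements the action $(U(b,a)\psi)(x) = |a|^{-1/2}\psi((x-b)/a)$ and evaluates the wavelet transform by a Riemann sum. It assumes the real "Mexican hat" function $\psi(x) = (1 - x^2)e^{-x^2/2}$ as mother wavelet and omits the normalization $1/\sqrt{c(\psi)}$ of [41]; the function names, the integration window, and the sample grid are illustrative choices.

```python
import math

def mexican_hat(x):
    """Illustrative mother wavelet (an assumption, not prescribed by the text)."""
    return (1.0 - x * x) * math.exp(-x * x / 2.0)

def U(b, a, psi):
    """Return the function U(b, a) psi : x -> |a|^(-1/2) psi((x - b) / a)."""
    scale = 1.0 / math.sqrt(abs(a))
    return lambda x: scale * psi((x - b) / a)

def inner(f, g, lo=-30.0, hi=30.0, steps=4000):
    """Riemann-sum approximation of the L^2 inner product of real functions."""
    h = (hi - lo) / steps
    return h * sum(f(lo + k * h) * g(lo + k * h) for k in range(steps))

def wavelet_transform(signal, b, a, psi=mexican_hat):
    """Unnormalized transform <U(b, a) psi | signal>."""
    return inner(U(b, a, psi), signal)

# A signal that is itself a translated and dilated copy of the wavelet.
signal = U(1.5, 2.0, mexican_hat)
grid = [(b, a) for b in (0.5, 1.0, 1.5, 2.0, 2.5) for a in (1.0, 1.5, 2.0, 2.5, 3.0)]
values = {g: wavelet_transform(signal, *g) for g in grid}
best = max(values, key=lambda g: abs(values[g]))
peak = values[best]

# Group law: U(b1, a1) U(b2, a2) = U(b1 + a1 * b2, a1 * a2).
composed = U(0.3, 2.0, U(-1.0, 0.5, mexican_hat))
direct = U(0.3 + 2.0 * (-1.0), 2.0 * 0.5, mexican_hat)
composition_error = max(abs(composed(x) - direct(x)) for x in (-2.0, 0.0, 0.7, 3.1))
```

By the Cauchy-Schwarz inequality and the unitarity of $U(b,a)$, the transform of this particular signal is largest in modulus at the parameters that generated it, so `best` should come out as `(1.5, 2.0)`. The last three lines check that the operators compose according to the matrix product of the affine group.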

There exist alternative ways of constructing
generalized coherent states using group representations.
For example, the Perelomov method is based
on the observation that the vector $|0\rangle$, appearing in
the construction of the canonical coherent states in
[10] and [11] using the representation of the
Weyl-Heisenberg group, is invariant up to a phase, under
the action of its center. Consequently, the coherent
states $|z\rangle$, as written in [10], are labeled, not by
elements of the group itself, but only by the points in
the quotient space of the group by its (central) phase
subgroup. Generally, let $G$ be a locally compact
group and $U$ a unitary irreducible representation of
it on the Hilbert space $\mathfrak{H}$. We do not assume $U$ to be
square integrable. We fix a vector $|\psi\rangle$ in $\mathfrak{H}$, of unit
norm and denote by $H$ the subgroup of $G$ consisting
of all elements $h$ for which


$$U(h)|\psi\rangle = e^{i\omega(h)}|\psi\rangle$$ [44]


where $\omega$ is a real-valued function of $h$. Let $X = G/H$
be the left-coset space and $x$ an arbitrary element in $X$.
Choosing a coset representative $g(x) \in G$, for each
coset $x$, we define the vectors


$$|x\rangle = U(g(x))|\psi\rangle$$ [45]


in $\mathfrak{H}$. The dependence of these vectors on the specific
choice of the coset representative $g(x)$ is only
through a phase. Thus, if instead of $g(x)$ we took a
different representative $g(x)' \in G$ for the same coset
$x$, then since $g(x)' = g(x)h$ for some $h \in H$, in view of
[44] we would have $U(g(x)')|\psi\rangle = e^{i\omega(h)}|x\rangle$. Hence,
quantum mechanically, both $|x\rangle$ and $U(g(x)')|\psi\rangle$
represent the same physical state and in particular,
the projection operator $|x\rangle\langle x|$ depends only on the
coset. Vectors $|x\rangle$, defined in this manner, are called
Gilmore-Perelomov coherent states. Since $U$ is
assumed to be irreducible, the set of all these vectors
as $x$ runs through $G/H$ is total in $\mathfrak{H}$ (its linear span
is dense). In this definition of generalized coherent
states, no resolution of the identity is postulated.
However, if $X$ carries an invariant measure $d\mu$, under the natural
action of $G$, and if the formal operator $B$ defined as


$$B = \int_X |x\rangle\langle x|\; d\mu(x)$$


is bounded, then it is necessarily a multiple of the
identity and a resolution of the identity is again
retrieved.

The Perelomov construction can be used to define
coherent states for any locally compact group. On
the other hand, there exist other constructions of
generalized coherent states, using group representations,
which generalize the notion of square integrability
to homogeneous spaces of the group. Briefly,
in this approach one starts with a unitary irreducible
representation $U$ and attempts to find a vector $|\psi\rangle$, a
subgroup $H$ and a section $\sigma: G/H \to G$ such that


$$\int_{G/H} |x\rangle\langle x|\; d\mu(x) = T$$ [46]


where $|x\rangle = U(\sigma(x))|\psi\rangle$, $T$ is a bounded, positive
operator with bounded inverse and $d\mu$ is a quasi-invariant
measure on $X = G/H$. It is not assumed
that $|\psi\rangle$ be invariant up to a phase under the action
of $H$ and clearly, the best situation is when $T$ is a
multiple of the identity. Although somewhat technical,
this general construction is of enormous
versatility for semidirect product groups of the type
$\mathbb{R}^n \rtimes K$, where $K$ is a closed subgroup of $GL(n, \mathbb{R})$.
Thus, it is useful for many physically important
groups, such as the Poincaré or the Euclidean group,
which do not have square-integrable representations
in the sense of the earlier definition (see eqn [35]).
The integral condition [46] ensures that any vector
$|\phi\rangle$ in $\mathfrak{H}$ can be written in terms of the $|x\rangle$. Indeed, it
is easy to see that one has the integral representation
of a vector,


$$|\phi\rangle = \int_X \Psi(x)\, |x\rangle\; d\mu(x), \qquad \Psi(x) = \langle x|T^{-1}\phi\rangle$$


in terms of the generalized coherent states.

The canonical coherent states satisfy the minimal
uncertainty relation [7]. It is possible to build
families of coherent states by generalizing from this
condition. To do this, one typically starts with two
self-adjoint generators in the Lie algebra of a
particular group representation and then looks for
appropriate eigenvectors of a complex combination
of these two generators. For two self-adjoint
operators $B$ and $C$ on a Hilbert space $\mathfrak{H}$, satisfying
the commutation relation $[B, C] = iD$ and any
normalized vector $\phi$ in $\mathfrak{H}$, one can prove the
Heisenberg uncertainty relation

$$(\Delta B)^2 (\Delta C)^2 \geq \frac{1}{4}\langle D\rangle^2$$ [47]

where $\langle X\rangle = \langle\phi|X\phi\rangle$ and $(\Delta X)^2 = \langle X^2\rangle - \langle X\rangle^2$, for
any operator $X$ on $\mathfrak{H}$. More generally, one can prove
the Schrödinger-Robertson uncertainty relation


$$(\Delta B)^2 (\Delta C)^2 \geq \frac{1}{4}\left[\langle D\rangle^2 + \langle F\rangle^2\right]$$ [48]


where $\langle F\rangle = \langle BC + CB\rangle - 2\langle B\rangle\langle C\rangle$ measures the
correlation between $B$ and $C$ in the state $\phi$.
If $\langle F\rangle = 0$, the above relation reduces to the
Heisenberg uncertainty relation. On the other
hand, if $\langle D\rangle = 0$, the Heisenberg uncertainty relation
becomes redundant. Suppose now that $B$ and
$C$ are two self-adjoint elements of the Lie algebra in
the unitary irreducible representation of a Lie group
and we look for states $|\phi\rangle$ which minimize the
uncertainty relation [48], that is, for which
the equality holds. It turns out that such states
can be found by considering the linear combination
$B + i\lambda C$, for a fixed complex number $\lambda$, and solving
the formal eigenvalue equation


$$[B + i\lambda C]\,|z, \lambda\rangle = z\,|z, \lambda\rangle$$ [49]

with $z = \langle B\rangle + i\lambda\langle C\rangle$.
Solutions to this equation for which $|\lambda| \neq 1$ are
called squeezed states, since in this case $\Delta B \neq \Delta C$.
Generally, the states $|z, \lambda\rangle$ are known as intelligent
states. As an example, for the operators $Q$ and $P$ in
[6], for which one has


$$(\Delta Q)^2 (\Delta P)^2 \geq \frac{1}{4}\left[1 + \langle F\rangle^2\right]$$


taking the combination $Q + i\lambda P$, one obtains the
minimal uncertainty states,


$$|z, \lambda\rangle = \mathcal{N}(z, \lambda)\; e^{-(w/2)(a^\dagger)^2}\, e^{(z/\sqrt{2})(1 + w)\, a^\dagger}\, |0\rangle$$ [50]


$\mathcal{N}(z, \lambda)$ being a normalization constant and
$w = (1 - \lambda)/(1 + \lambda)$. The case $\lambda = -1$ does not lead
to any solutions, while $\lambda = 1$ gives the canonical
coherent states [10]. For real $\lambda \neq 1$ the above states
are the well-known squeezed states of quantum
optics.

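Our final example is that of a family of vector
coherent states, which will be obtained essentially
by replacing the complex variable $z$ in [18] by a
matrix variable. We choose the domain $\Omega = \mathbb{C}^{2\times 2}$
(all $2 \times 2$ complex matrices), equipped with the
measure


$$d\nu(\mathfrak{z}, \mathfrak{z}^\dagger) = \frac{e^{-\mathrm{tr}[\mathfrak{z}\mathfrak{z}^\dagger]}}{\pi^4} \prod_{k,j=1}^{2} dx_{kj}\, dy_{kj}$$


where $\mathfrak{z}$ is an element of $\Omega$ and $z_{kj} = x_{kj} + i y_{kj}$ are its
entries. One can then prove the matrix orthogonality
relation


$$\int_\Omega \mathfrak{z}^k\, (\mathfrak{z}^\dagger)^\ell\; d\nu(\mathfrak{z}, \mathfrak{z}^\dagger) = \frac{1}{2} \int_\Omega \mathrm{tr}\!\left[\mathfrak{z}^k (\mathfrak{z}^\dagger)^\ell\right] d\nu(\mathfrak{z}, \mathfrak{z}^\dagger)\; \mathbb{I}_2 = b(k)\, \delta_{k\ell}\, \mathbb{I}_2, \qquad k, \ell = 0, 1, 2, \ldots$$ [51]


$\mathbb{I}_2$ being the $2 \times 2$ identity matrix and


$$b(k) = \frac{(k + 3)!}{2(k + 1)(k + 2)}, \qquad k = 1, 2, 3, \ldots, \qquad b(0) = 1$$ [52]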


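Consider the Hilbert space $\widetilde{\mathfrak{H}} = L^2_{\mathbb{C}^2}(\Omega, d\nu)$ of square
integrable, two-component vector-valued functions
on $\Omega$ and in it consider the vectors $|\Psi_k^i\rangle$, $i = 1, 2$,
$k = 0, 1, 2, \ldots$, defined by the $\mathbb{C}^2$-valued
functions,


$$\Psi_k^i(\mathfrak{z}^\dagger) = \frac{1}{\sqrt{b(k)}}\, (\mathfrak{z}^\dagger)^k\, \chi^i$$ [53]


where the vectors $\chi^i$, $i = 1, 2$, form an orthonormal
basis of $\mathbb{C}^2$. By virtue of [51], the vectors $|\Psi_k^i\rangle$
constitute an orthonormal set in $\widetilde{\mathfrak{H}}$, that is,


$$\langle\Psi_k^i|\Psi_\ell^j\rangle = \delta_{k\ell}\, \delta_{ij}$$


Denote by $\mathfrak{H}_K$ the Hilbert subspace of $\widetilde{\mathfrak{H}}$ generated
by this set of vectors. This can be shown to be a
reproducing kernel Hilbert space of analytic
functions in the variable $\mathfrak{z}^\dagger$, with the matrix-valued
kernel $K: \Omega \times \Omega \to \mathbb{C}^{2\times 2}$:


$$K(\mathfrak{z}'^\dagger, \mathfrak{z}) = \sum_{k=0}^{\infty} \frac{(\mathfrak{z}'^\dagger)^k\, \mathfrak{z}^k}{b(k)}$$ [54]

Vector coherent states in $\mathfrak{H}_K$ are then naturally
associated to this kernel and are given by


$$|\mathfrak{z}, i\rangle = \sum_{k=0}^{\infty} \sum_{j=1}^{2} \frac{\chi^{j\dagger}\, \mathfrak{z}^k\, \chi^i}{\sqrt{b(k)}}\, |\Psi_k^j\rangle, \quad \text{that is,} \quad |\mathfrak{z}, i\rangle(\mathfrak{z}'^\dagger) = K(\mathfrak{z}'^\dagger, \mathfrak{z})\, \chi^i$$ [55]


for $i = 1, 2$ and all $\mathfrak{z}$ in $\Omega$. They satisfy the resolution
of the identity


$$\sum_{i=1}^{2} \int_\Omega |\mathfrak{z}, i\rangle\langle\mathfrak{z}, i|\; d\nu(\mathfrak{z}, \mathfrak{z}^\dagger) = I_{\mathfrak{H}_K}$$ [56]


The expression for the $|\mathfrak{z}, i\rangle$ in [55], involving the
sum, should be compared to [18], of which it is a
direct analog.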


Some Applications of Coherent States 


Generalized coherent states have many applications
in physics, signal analysis, and mathematics, of
which we mention a few here. As an example of
an application of deformed coherent states, we take


$$x_n = [n]_q = \frac{q^n - q^{-n}}{q - q^{-1}}, \qquad q > 0$$ [57]

in the definition of these states in [18]. It is then easy
to see that the operators $A$ and $A^\dagger$, defined in [23],
satisfy the $q$-deformed commutation relation


$$A A^\dagger - q A^\dagger A = q^{-N}$$ [58]


where $N$ is the usual number operator, which acts
on the Fock states as $N|n\rangle = n|n\rangle$. Clearly, in the
limit as $q \to 1$, these $q$-deformed coherent states go
over to the canonical coherent states, with the
operators $A$ and $A^\dagger$ becoming the usual annihilation
and creation operators $a$ and $a^\dagger$, respectively.
The operators $A$ and $A^\dagger$ and the commutation
relation [58] describe a system of $q$-deformed
oscillators, which have been used to describe, for
example, the vibrations of polyatomic molecules.
The potential energy between the atoms of such
a molecule has anharmonic terms, leading to
a deformation of the usual oscillator algebra,
generated by the operators $a$ and $a^\dagger$.
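On the Fock state $|n\rangle$ the relation [58] reduces to the scalar identity $x_{n+1} - q\,x_n = q^{-n}$, since $AA^\dagger|n\rangle = x_{n+1}|n\rangle$ and $A^\dagger A|n\rangle = x_n|n\rangle$. A short check, assuming $x_n = (q^n - q^{-n})/(q - q^{-1})$ as in [57]; the function names and the sample value of $q$ are illustrative:

```python
def q_number(n, q):
    """q-deformed integer [n]_q = (q^n - q^-n) / (q - q^-1), with [n]_1 = n."""
    if q == 1:
        return float(n)
    return (q ** n - q ** (-n)) / (q - 1.0 / q)

def deformed_commutator(n, q):
    """Eigenvalue of A A^dagger - q A^dagger A on the Fock state |n>."""
    return q_number(n + 1, q) - q * q_number(n, q)

q = 1.3                                   # sample deformation parameter (assumption)
residuals = [abs(deformed_commutator(n, q) - q ** (-n)) for n in range(8)]
near_classical = q_number(5, 1.000001)    # should be close to 5 as q -> 1
```

The list `residuals` should contain only rounding-level numbers, and `near_classical` illustrates the limit $q \to 1$ in which $[n]_q \to n$ and the usual oscillator algebra is recovered.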


As already mentioned, generalized coherent states
are widely used in signal analysis. The wavelet
transform $F(b, a) = \langle b, a|\phi\rangle$, introduced in [43], is a
time-frequency transform, in which the parameter $b$
is identified with time and $1/a$ with frequency.
Wavelet transforms are used extensively to analyze,
encode, and reconstruct signals arising in many
different branches of physics, engineering, seismography,
electronic data processing, etc. Similarly, the
canonical coherent states, as written in [11], give
rise to the transform $F(q, p) = \langle q, p|\phi\rangle$. Again, if $q$ is
interpreted as time and $p$ as frequency, then this is
just the windowed Fourier transform, also used
extensively in signal processing. More general
wavelets, from higher-dimensional affine groups,
are used to analyze higher-dimensional signals,
while wavelet-like transforms from other groups
have been used to study signals exhibiting different
geometries. In particular, wavelet transforms from
spherical geometries have been applied to the study
of brain signals and to astrophysical data.

Our final example is taken from quantization
theory. A quantization technique is a method for
performing the transition from a given classical
mechanical system to its quantum counterpart.
Many methods have been developed to accomplish
this and the use of coherent states is one of them.
Suppose that we are given a family of coherent
states $|x\rangle$ in a Hilbert space $\mathfrak{H}$, where the set $X$ from
which $x$ is taken is a classical phase space. This
means that $X$ is a symplectic manifold with an
associated 2-form $\omega$, which defines a Poisson
bracket on the set of observables of the classical
system, which are real-valued functions on $X$. There
is a natural measure $d\omega$, defined on $X$ by the 2-form
$\omega$. Let us assume that the coherent states $|x\rangle$ satisfy a
resolution of the identity with respect to this
measure:


$$\int_X |x\rangle\langle x|\; d\omega(x) = I_{\mathfrak{H}}$$


In this case, the coherent states may be used to
quantize the observables of the classical system in
the following way: let $f$ be a real-valued function on
$X$, representing a classical observable and suppose
that the formal operator


$$\hat{f} = \int_X f(x)\, |x\rangle\langle x|\; d\omega(x)$$ [59]


is well defined as a self-adjoint operator on $\mathfrak{H}$. Then
we may take the operator $\hat{f}$ to be the quantized
observable corresponding to the classical observable
$f$. Suppose that we have two such operators, $\hat{f}$ and $\hat{g}$,
corresponding to the two classical observables $f$ and
$g$, which have the Poisson bracket $\{f, g\}$, defined via
the 2-form $\omega$. We then check if the quantization
condition


$$[\hat{f}, \hat{g}] = i\hbar\, \widehat{\{f, g\}}$$ [60]


where $\hbar$ is Planck's constant, is satisfied. Generally
this will be the case for a certain number of classical
observables. This method of quantization has been
most successfully used for manifolds $X$ which have a
(complex) Kähler structure. Over such a manifold,
one can define a Hilbert space of analytic functions,
which has a reproducing kernel and hence a
naturally associated set of coherent states. As a
specific example, we take the case of canonical
coherent states [11]. We can identify the complex
plane $\mathbb{C}$ with the phase space $\mathbb{R}^2$ of a free classical
particle having a single degree of freedom. The
measure $d\omega$ in this case is just $(1/2\pi)\, dq\, dp$. If we
now quantize the classical observables $f(q, p) = q$
and $f(q, p) = p$, of position and momentum, respectively,
using the canonical coherent states, we obtain
the two operators


$$\hat{Q} = \int_{\mathbb{R}^2} q\, |q, p\rangle\langle q, p|\; \frac{dq\, dp}{2\pi}, \qquad \hat{P} = \int_{\mathbb{R}^2} p\, |q, p\rangle\langle q, p|\; \frac{dq\, dp}{2\pi}$$


It can be verified that these two operators satisfy the
canonical commutation relations $[\hat{Q}, \hat{P}] = iI_{\mathfrak{H}}$, as
required.


See also: Solitons and Kac-Moody Lie Algebras;
Wavelets: Mathematical Theory.


Introduction


The origins of cohomology theory are found in
topology and algebra at the beginning of the last
century but since then it has become a tool of nearly
every branch of mathematics. It's a way of life!
Naturally, this article can only give a glimpse at the
rich subject. We take here the point of view of
algebraic topology and discuss only the cohomology
of spaces.

Cohomology reflects the global properties of a
manifold, or more generally of a topological space.
It has two crucial properties: it only depends on the
homotopy type of the space and is determined by
local data. The latter property makes it in general
computable.

To illustrate the interplay between the local and
global structure, consider the Euler characteristic of
a compact manifold; as will be explained below,
cohomology is a refinement of the Euler characteristic.
For simplicity, assume that the manifold $M$ is a
surface and that we have chosen a way of dividing
the surface into triangles. The Euler characteristic is
then defined to be


$$\chi(M) = F - E + V$$


where $F$ denotes the number of faces, $E$ the number
of edges, and $V$ the number of vertices in the
triangulation. Remarkably, this number does not
depend on the triangulation. Yet, this simple, easy to
compute number can already distinguish the different
types of closed, oriented surfaces: for the sphere
we have $\chi = 2$, the torus $\chi = 0$, and in general for
any surface $M_g$ of genus $g$


$$\chi(M_g) = 2 - 2g$$
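The count $F - E + V$ can be carried out mechanically from a list of triangles. The sketch below is a minimal illustration in plain Python; the two triangulations (the boundary of a tetrahedron for the sphere, and a periodic $m \times n$ grid with each square cut along a diagonal for the torus) and all function names are choices made here for illustration.

```python
from itertools import combinations

def euler_characteristic(triangles):
    """F - E + V for a triangulated surface given as a list of vertex triples."""
    faces = {frozenset(t) for t in triangles}
    edges = {frozenset(e) for t in faces for e in combinations(sorted(t), 2)}
    vertices = {v for t in faces for v in t}
    return len(faces) - len(edges) + len(vertices)

def tetrahedron_boundary():
    """Triangulated sphere: the four faces of a tetrahedron."""
    return list(combinations(range(4), 3))

def torus_triangulation(m, n):
    """Periodic m x n grid (m, n >= 3), each square split into two triangles."""
    triangles = []
    for i in range(m):
        for j in range(n):
            a, b = (i, j), ((i + 1) % m, j)
            c, d = (i, (j + 1) % n), ((i + 1) % m, (j + 1) % n)
            triangles.append((a, b, d))
            triangles.append((a, d, c))
    return triangles

chi_sphere = euler_characteristic(tetrahedron_boundary())
chi_torus_small = euler_characteristic(torus_triangulation(3, 3))
chi_torus_large = euler_characteristic(torus_triangulation(4, 5))
```

The two torus grids have different numbers of faces, edges, and vertices but give the same value, which illustrates the independence of the triangulation. The restriction $m, n \geq 3$ keeps distinct triangles from sharing all three vertices.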
The Euler characteristic also tells us something
about the geometry and analysis of the manifold. For
example, the total curvature of a surface is equal, up
to a factor of $2\pi$, to its Euler characteristic. This is the
Gauss-Bonnet theorem and an analogous result holds
in higher dimensions. Another striking result is the Poincaré-Hopf
theorem which equates the Euler characteristic with
the total index of a vector field and thus gives strong
restrictions on what kind of vector fields can exist on
a manifold. This interplay between global analysis
and topology has been one of the most exciting and
fruitful research areas and is most powerfully
expressed in the celebrated Atiyah-Singer index
theorem, which determines the analytic index of an
elliptic operator, such as the Dirac operator on a spin
manifold, in terms of cohomology classes.


Chain Complexes and Homology 


There are several different geometric definitions of 
the cohomology of a topological space. All share 
some basic algebraic structure which we will explain 
first. 

A “chain complex” $(C_*, \partial_*)$


$$\cdots \longrightarrow C_{i+1} \xrightarrow{\;\partial_{i+1}\;} C_i \xrightarrow{\;\partial_i\;} C_{i-1} \longrightarrow \cdots \longrightarrow C_0$$ [1]


is a collection of vector spaces (or $R$-modules more
generally) $C_i$, $i \geq 0$, and linear maps ($R$-module
maps) $\partial_i: C_i \to C_{i-1}$ with the property that for all $i$


$$\partial_i \circ \partial_{i+1} = 0$$ [2]


The scalar fields one tends to consider are the
rationals $\mathbb{Q}$, reals $\mathbb{R}$, complex numbers $\mathbb{C}$, or a
prime field $\mathbb{Z}_p$, while the most important ring $R$ is
the ring of integers $\mathbb{Z}$ though we will also consider
localizations such as $\mathbb{Z}[1/p]$, which has the effect of
suppressing any $p$-primary torsion information.

Of particular interest are the elements in $C_i$ that are
mapped to zero by $\partial_i$, the $i$-dimensional “cycles,” and
those that are in the image of $\partial_{i+1}$, the $i$-dimensional
“boundaries.” Because of [2], every boundary is a
cycle, and we may define the quotient vector space
($R$-module), the $i$th-dimensional homology,

$$H_i(C_*; \partial_*) = \frac{\ker \partial_i}{\operatorname{im} \partial_{i+1}}$$ [3]


$(C_*, \partial_*)$ is “exact” if all its cycles are boundaries.
Homology thus measures to what extent the
sequence [1] fails to be exact.


Simplicial Homology 


A triangulation of a surface gives rise to its
“simplicial” chain complex: Taking coefficients in
$\mathbb{Z}$, $C_2, C_1, C_0$ are the free abelian groups generated
by the set of faces, edges, and vertices, respectively;
$C_i = \{0\}$ for $i \ge 3$. The map $\partial_2$ assigns to a triangle
the sum of its edges; $\partial_1$ maps an edge to the sum of
its endpoints. If we are working with $\mathbb{Z}_2$ coefficients,
this defines for us a chain complex as [2] is
clearly satisfied; in general, one needs to keep track
of the orientations of the triangles and edges and
take sums with appropriate signs (cf. [6] below). An
easy calculation shows that for an oriented, closed
surface $M_g$ of genus g, we have

\[ \begin{aligned}
H_0(M_g; \mathbb{Z}) &= \mathbb{Z} \\
H_1(M_g; \mathbb{Z}) &= \mathbb{Z}^{2g} \\
H_2(M_g; \mathbb{Z}) &= \mathbb{Z} \\
H_i(M_g; \mathbb{Z}) &= 0 \quad \text{for } i \ge 3
\end{aligned} \qquad [4] \]


Note that the Euler characteristic can be recov- 
ered as the alternating sum of the rank of the 
homology groups: 


\[ \chi(M) = \sum_{i=0}^{\dim M} (-1)^i \operatorname{rk} H_i(M; \mathbb{Z}) \qquad [5] \]


Every smooth manifold M has a triangulation, so 
that its simplicial homology can be defined just as 
above. More generally, simplicial homology can be 
defined for any simplicial space, that is, a space that 
is built up out of points, edges, triangles, tetrahedra, 
etc. Formula [5] remains valid for any compact 
manifold or simplicial space. 
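The computation behind [4] and [5] can be sketched in a few lines. The following is a minimal illustration, not a general-purpose implementation: it works with $\mathbb{Z}_2$ coefficients only (so no orientation signs are needed, as remarked above), and the function names `rank_gf2` and `betti_numbers_mod2` as well as the two example triangulations (the boundary of a tetrahedron and the classical 7-vertex triangulation of the torus) are choices made here for illustration.

```python
from itertools import combinations


def rank_gf2(rows):
    """Rank over Z_2 of a matrix whose rows are given as integer bitmasks."""
    pivots = {}  # leading bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]  # cancel the leading bit and keep reducing
    return len(pivots)


def betti_numbers_mod2(top_simplices):
    """Dimensions of H_k(-; Z_2) for the complex spanned by the given simplices."""
    dim = max(len(s) for s in top_simplices) - 1
    # simplices[k] lists all k-simplices as sorted vertex tuples.
    simplices = []
    for k in range(dim + 1):
        faces = {f for s in top_simplices for f in combinations(sorted(s), k + 1)}
        simplices.append(sorted(faces))
    # ranks[k] is the rank of the boundary map C_k -> C_{k-1}.
    ranks = [0] * (dim + 2)
    for k in range(1, dim + 1):
        index = {f: i for i, f in enumerate(simplices[k - 1])}
        rows = []
        for s in simplices[k]:
            mask = 0
            for f in combinations(s, k):  # the codimension-one faces of s
                mask |= 1 << index[f]
            rows.append(mask)
        ranks[k] = rank_gf2(rows)
    # dim H_k = dim ker(boundary_k) - rank(boundary_{k+1})
    return [len(simplices[k]) - ranks[k] - ranks[k + 1] for k in range(dim + 1)]


# Boundary of a tetrahedron: a triangulated 2-sphere (genus 0).
sphere = list(combinations(range(4), 3))
# 7-vertex triangulation of the torus (genus 1), vertices taken mod 7.
torus = [(i, (i + 1) % 7, (i + 3) % 7) for i in range(7)] + \
        [(i, (i + 2) % 7, (i + 3) % 7) for i in range(7)]

print(betti_numbers_mod2(sphere))  # [1, 0, 1]
print(betti_numbers_mod2(torus))   # [1, 2, 1]
```

For these orientable surfaces the $\mathbb{Z}_2$ dimensions agree with the ranks in [4], and their alternating sums 2 and 0 are the Euler characteristics given by [5].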


Singular Homology 


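Let X be any topological space, and let $\Delta^n$ be the
oriented n-simplex $[v_0, \dots, v_n]$ spanned by the
standard basis vectors $v_i$ in $\mathbb{R}^{n+1}$. The set of singular
n-chains $S_n(X)$ is the free abelian group on the set of
continuous maps $\sigma : \Delta^n \to X$. The boundary of $\sigma$ is
defined by the alternating sum of the restriction of $\sigma$
to the faces of $\Delta^n$:

\[ \partial_n(\sigma) := \sum_{i=0}^{n} (-1)^i \, \sigma|_{[v_0, \dots, \hat{v}_i, \dots, v_n]} \qquad [6] \]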

One easily checks that the boundary of a boundary is 
zero, and hence $(S_*(X), \partial_*)$ defines a chain complex.
Its homology is by definition the singular homology
$H_*(X; \mathbb{Z})$ of X. For any simplicial space, the inclusion
of the simplicial chains into the singular chains
induces an isomorphism of homology groups. In
particular, this implies that the simplicial homology
of a manifold, and hence its Euler characteristic, does
not depend on its triangulation.

If in the definition of simplicial and singular 
homology we take free R-modules (where R may 


also be a field) instead of free abelian groups, we get
the homology $H_*(X; R)$ of X with coefficients in R.
The “universal-coefficient theorem” describes the
homology with arbitrary coefficients in terms of the
homology with integer coefficients. In particular, if R
is a field of characteristic zero,

\[ \dim H_n(X; R) = \operatorname{rk} H_n(X; \mathbb{Z}) \]


Basic Properties of Singular Homology 


While simplicial homology (and the more efficient 
cellular homology which we will not discuss) is 
easier to compute and easier to understand geome- 
trically, singular homology lends itself more easily to 
theoretical treatment. 


1. Homotopy invariance. Any continuous map
$f : X \to Y$ induces a map on homology
$f_* : H_*(X; R) \to H_*(Y; R)$ which only depends on
the homotopy class of f.

In particular, a homotopy equivalence $f : X \to Y$
induces an isomorphism in homology. So, for example,
the inclusion of the circle $S^1$ into the punctured
plane $\mathbb{C} \setminus \{0\}$ is a homotopy equivalence, and thus

\[ H_i(\mathbb{C} \setminus \{0\}; R) \cong H_i(S^1; R) = \begin{cases} R & \text{for } i = 0, 1 \\ 0 & \text{for } i \ge 2 \end{cases} \]

For the one point space we have $H_0(\mathrm{pt}; R) = R$. Define
reduced homology by $\tilde H_*(X; R) := \ker(H_*(X; R) \to H_*(\mathrm{pt}; R))$.


2. Dimension axiom. $H_i(\mathrm{pt}; R) = 0$ for all $i \ne 0$.


More generally, it follows immediately from the 
definition of simplicial homology that the homology 
of any n-dimensional manifold is zero in dimensions 
larger than n. 

We mentioned in the introduction that homology
depends only on local data. This is made precise
by the


3. Mayer-Vietoris theorem. Let $X = A \cup B$ be the
union of two open subspaces. Then the following
sequence is exact:

\[ \cdots \to H_n(A \cap B; R) \to H_n(A; R) \oplus H_n(B; R) \to H_n(X; R) \xrightarrow{\;\partial\;} H_{n-1}(A \cap B; R) \to \cdots \to H_0(X; R) \to 0 \]


On the level of chains, the first map is induced by the
diagonal inclusion, while the second map takes the
difference between the first and second summands.
Finally, $\partial$ takes a cycle c = a + b in the chains of X
that can be expressed as the sum of a chain a in A
and b in B to $\partial c := \partial_n a = -\partial_n b$. For example,
consider two cones, A and B, on a space X and
identify them at the base X to define the suspension
$\Sigma X$ of X. Then $\Sigma X = A \cup B$ with $A, B \simeq \mathrm{pt}$ and
$A \cap B \simeq X$. The boundary map $\partial$ is then an isomorphism
of reduced homology groups:

\[ \tilde H_n(X; R) \cong \tilde H_{n+1}(\Sigma X; R) \quad \text{for all } n \ge 0 \qquad [7] \]

From this one can easily compute the homology of a 
sphere. First note that 


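\[ \tilde H_0(X; \mathbb{Z}) = \mathbb{Z}^{k-1} \]

where k is the number of connected components in
X. Also, $S^n \simeq \Sigma S^{n-1} \simeq \cdots \simeq \Sigma^n S^0$. Thus, by [7],

\[ \tilde H_n(S^n; \mathbb{Z}) \cong \mathbb{Z} \quad \text{and} \quad \tilde H_*(S^n; \mathbb{Z}) = 0 \ \text{ for } * \ne n \qquad [8] \]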


If Y is a subspace of X, relative homology groups
$H_*(X, Y; R)$ can be defined as the homology of the
quotient complex $S_*(X)/S_*(Y)$. When Y has a good
neighborhood in X (i.e., it is a neighborhood
deformation retract in X), then, by the “excision
theorem,”

\[ H_*(X, Y; R) \cong \tilde H_*(X/Y; R) \]

where X/Y denotes the quotient space of X with Y
identified to a point. There is a long exact sequence

\[ \cdots \to H_n(Y; R) \to H_n(X; R) \to H_n(X, Y; R) \xrightarrow{\;\partial\;} H_{n-1}(Y; R) \to \cdots \to H_0(X, Y; R) \to 0 \]


This and the Mayer—Vietoris sequence give two ways of 
breaking up the problem of computing the homology of 
a space into computing the homology of related spaces. 
An iteration of this process leads to the powerful tool of 
spectral sequences (see Spectral Sequences). 


Relation to Homotopy Groups 


Let $\pi_1(X, x_0)$ denote the fundamental group of X
relative to the base point $x_0$. These are the based
homotopy classes of based maps from a circle to X.

If X is connected, then $H_1(X; \mathbb{Z})$ is the abelianization of $\pi_1(X, x_0)$.   [9]

Indeed, every map from a (triangulated) sphere to
X defines a cycle and hence gives rise to a homology
class. This defines the Hurewicz map $h : \pi_*(X; x_0) \to
H_*(X; \mathbb{Z})$. In general there is no good description of
its image. However, if X is k-connected with $k \ge 1$,
then h induces an isomorphism in dimension k + 1
and an epimorphism in dimension k + 2.

Though [9] indicates that homology cannot distinguish
between all homotopy types, the fundamental
group is in a sense the only obstruction to this.
A simple form of the “Whitehead theorem” states:

Theorem If a map $f : X \to Y$ between two simplicial
complexes with trivial fundamental groups
induces an isomorphism on all homology groups,
then it is a homotopy equivalence.


Warning: This does not imply that two simply
connected spaces with isomorphic homology groups
are homotopy equivalent! The existence of the map f inducing
this isomorphism is crucial and counterexamples can
easily be constructed.


Dual Chain Complexes and Cohomology 


The process of dualizing itself cannot be expected to 
yield any new information. Nevertheless, the coho- 
mology of a space, which is obtained by dualizing its 
simplicial chain complex, carries important addi- 
tional structure: it possesses a product, and more- 
over, when the coefficients are a primary field, it is 
an algebra over the rich Steenrod algebra. As with 
homology we start with the algebraic setup. 

Every chain complex $(C_*, \partial_*)$ gives rise to a dual
chain complex $(C^*, \partial^*)$ where $C^i = \hom_R(C_i, R)$ is
the dual R-module of $C_i$; because of [2], the
composition of two dual boundary morphisms
$\partial^{i+1} : C^i \to C^{i+1}$ is trivial. Hence we may define the
ith dimensional cohomology group as

\[ H^i(C^*, \partial^*) := \frac{\ker \partial^{i+1}}{\operatorname{im} \partial^{i}} \qquad [10] \]

Evaluation $(\sigma, \phi) \mapsto \phi(\sigma)$ descends to a dual pairing

\[ H_n(C_*, \partial_*) \otimes_R H^n(C^*, \partial^*) \to R \]

and when R is a field, this identifies the cohomology
groups as the duals of the homology groups. More
generally, the universal-coefficient theorem relates
the two. A simple version states: let $(C_*, \partial_*)$ be a
chain complex of free abelian groups (such as the
simplicial or singular chain complexes) with finitely
generated homology groups. Then,

\[ H^n(C^*, \partial^*) \cong H_n^{\mathrm{free}}(C_*, \partial_*) \oplus H_{n-1}^{\mathrm{tor}}(C_*, \partial_*) \qquad [11] \]

where $H_*^{\mathrm{tor}}$ denotes the torsion subgroup of $H_*$, and
$H_*^{\mathrm{free}}$ denotes the quotient group $H_*/H_*^{\mathrm{tor}}$.


Singular Cohomology 


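The dual $S^*(X)$ of the singular chain complex of a
space X carries a natural pairing, the cup product,
$\cup : S^p(X) \otimes S^q(X) \to S^{p+q}(X)$ defined by

\[ (\phi_1 \cup \phi_2)(\sigma) = \phi_1(\sigma|_{[v_0, \dots, v_p]}) \, \phi_2(\sigma|_{[v_p, \dots, v_{p+q}]}) \]

This descends to a multiplication on cohomology
groups and makes $H^*(X; R) := \bigoplus_{n \ge 0} H^n(X; R)$ into
an associative, graded commutative ring:
$u \cup v = (-1)^{\deg u \, \deg v} \, v \cup u$.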

The “Künneth theorem” gives some geometric 
intuition for the cup product. A simple version 
states: for spaces X and Y with H*(Y;R) a finitely 
generated free R-module, the cup product defines an 
isomorphism of graded rings 


\[ H^*(X; R) \otimes_R H^*(Y; R) \to H^*(X \times Y; R) \]


For example, for a sphere, all products are trivial for
dimension reasons. Hence,

\[ H^*(S^n; \mathbb{Z}) = \Lambda^*_{\mathbb{Z}}(x) \qquad [12] \]

is an exterior algebra on one generator x of degree
n. On the other hand, the cohomology of the
n-dimensional torus $T^n$ is an exterior algebra on
n degree-1 generators,

\[ H^*(T^n; \mathbb{Z}) = \Lambda^*_{\mathbb{Z}}(x_1, \dots, x_n) \qquad [13] \]
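On the level of Betti numbers, the Künneth isomorphism says that Poincaré polynomials multiply. The sketch below is only an illustration under that assumption (free, finitely generated cohomology); the helper names `poincare_product` and `betti_torus` are invented here. It recovers the binomial ranks of the exterior algebra in [13] by repeatedly multiplying with the Poincaré polynomial 1 + t of the circle.

```python
def poincare_product(p, q):
    """Multiply two Poincare polynomials given as lists of Betti numbers."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result


def betti_torus(n):
    """Betti numbers of the n-torus T^n = S^1 x ... x S^1 (n factors)."""
    betti = [1]              # a point
    for _ in range(n):
        betti = poincare_product(betti, [1, 1])  # times the circle
    return betti


print(betti_torus(3))                          # [1, 3, 3, 1]
print(poincare_product([1, 0, 1], [1, 1]))     # S^2 x S^1: [1, 1, 1, 1]
```

The rank of $H^k(T^n; \mathbb{Z})$ comes out as the binomial coefficient "n choose k", which is the number of degree-k monomials in an exterior algebra on n generators.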


The dual pairing can be generalized to the slant or
cap product

\[ \cap : H_n(X; R) \otimes_R H^i(X; R) \to H_{n-i}(X; R) \]

defined on the chain level by the formula

\[ (\sigma, \phi) \mapsto \phi(\sigma|_{[v_0, \dots, v_i]}) \, \sigma|_{[v_i, \dots, v_n]} \]


Steenrod Algebra 


The cup product on the chain level is homotopy
commutative, but not commutative. Steenrod used
this defect to define operations

\[ Sq^i : H^n(X; \mathbb{Z}_2) \to H^{n+i}(X; \mathbb{Z}_2) \]

for all $i \ge 0$ which refine the cup-squaring operation:
when n = i, then $Sq^n(x) = x \cup x$. These are
natural group homomorphisms which commute
with suspension. Furthermore, they satisfy the
Cartan and Adem relations

\[ Sq^n(x \cup y) = \sum_{i+j=n} Sq^i(x) \cup Sq^j(y) \]

\[ Sq^i Sq^j = \sum_{k=0}^{[i/2]} \binom{j-k-1}{i-2k} Sq^{i+j-k} Sq^k \quad \text{for } i < 2j \]


The mod-2 Steenrod algebra A is then the free
$\mathbb{Z}_2$-algebra generated by the Steenrod squares
$Sq^i$, i > 0, subject only to the Adem relations. With
the help of Adem’s relations, Serre and Cartan found
a $\mathbb{Z}_2$-basis for A:

\[ \{ Sq^I := Sq^{i_1} \cdots Sq^{i_n} \mid i_j \ge 2 i_{j+1} \text{ for all } j \} \]
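The Adem relation above is easy to evaluate mechanically. The following sketch (the function name `adem` and the output format are choices made here, not notation from the text) returns the pairs (a, b) for which $Sq^a Sq^b$ occurs in the expansion of $Sq^i Sq^j$ with 0 < i < 2j, with coefficients reduced mod 2. A pair (a, 0) stands for $Sq^a$, since $Sq^0$ is the identity.

```python
from math import comb


def adem(i, j):
    """Expand Sq^i Sq^j (0 < i < 2j) via the Adem relation, mod 2.

    Returns a list of pairs (a, b), each standing for the monomial Sq^a Sq^b.
    """
    if not 0 < i < 2 * j:
        raise ValueError("the Adem relation applies only for 0 < i < 2j")
    terms = []
    for k in range(i // 2 + 1):
        # i < 2j guarantees j - k - 1 >= 0, so comb() is well defined.
        if comb(j - k - 1, i - 2 * k) % 2 == 1:
            terms.append((i + j - k, k))
    return terms


print(adem(1, 1))  # []                  i.e. Sq^1 Sq^1 = 0
print(adem(2, 2))  # [(3, 1)]            i.e. Sq^2 Sq^2 = Sq^3 Sq^1
print(adem(2, 3))  # [(5, 0), (4, 1)]    i.e. Sq^2 Sq^3 = Sq^5 + Sq^4 Sq^1
```

Every monomial produced this way satisfies $a \ge 2b$, so it already belongs to the basis just described.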


The Steenrod algebra is also a Hopf algebra with
a commutative comultiplication $\Delta : A \to A \otimes A$
induced by

\[ \Delta(Sq^n) := \sum_{i+j=n} Sq^i \otimes Sq^j \]

The Cartan relation implies that the mod-2
cohomology of a space is compatible with the
comultiplication, that is, $H^*(X; \mathbb{Z}_2)$ is an algebra
over the Hopf algebra A. There are odd primary
analogs of the Steenrod algebra based on the
reduced pth power operations

\[ P^i : H^n(X; \mathbb{Z}_p) \to H^{n+2i(p-1)}(X; \mathbb{Z}_p) \]

with similar properties to A.

One of the most striking applications of the
Steenrod algebra can be found in the work of
Adams on the “vector fields on spheres problem”:
for each n, find the greatest number k, denoted K(n),
such that there is a k-field on the (n − 1)-sphere $S^{n-1}$.
Recall that a k-field is an ordered set of k pointwise
linearly independent tangent vector fields. If we write n
in the form $n = 2^{4a+b}(2s + 1)$ with $0 \le b < 4$, Adams
proved that $K(n) = 2^b + 8a - 1$. In particular, when n
is odd, K(n) = 0. We give an outline of the proof for
this special case in the next section.
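Adams's formula is straightforward to evaluate. The sketch below (the function name `adams_k` is chosen here for illustration) extracts the exponent of 2 in n, splits it as 4a + b with 0 ≤ b < 4, and returns $2^b + 8a - 1$.

```python
def adams_k(n):
    """Maximal number K(n) of independent vector fields on S^(n-1)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    exponent = 0
    while n % 2 == 0:        # write n = 2^exponent * (odd number)
        n //= 2
        exponent += 1
    a, b = divmod(exponent, 4)   # exponent = 4a + b with 0 <= b < 4
    return 2 ** b + 8 * a - 1


print([adams_k(n) for n in (2, 4, 8, 16)])  # [1, 3, 7, 8]
```

The values 1, 3, and 7 for n = 2, 4, 8 reflect that the spheres $S^1$, $S^3$, and $S^7$ are parallelizable, while every odd n gives 0.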


• The failure of associativity of the cup product at
the chain level gives rise to secondary operations,
the so-called “Massey products.”


Cohomology of Smooth Manifolds 


A smooth manifold M of dimension n can be
triangulated by smooth simplices $\sigma : \Delta^n \to M$. If M
is compact, oriented, without boundary, the sum of
these simplices defines a homology cycle [M], the
fundamental class of M. The most remarkable
property of the cohomology of manifolds is that
they satisfy “Poincaré duality”: taking cap product
with [M] defines an isomorphism:

\[ D := [M] \cap : H^k(M; \mathbb{Z}) \to H_{n-k}(M; \mathbb{Z}) \quad \text{for all } k \qquad [14] \]


In particular, for connected manifolds, $H^n(M; \mathbb{Z}) \cong \mathbb{Z}$;
and every map $f : M' \to M$ between oriented, compact
closed manifolds of the same dimension has a degree:
$f^* : H^n(M; \mathbb{Z}) \to H^n(M'; \mathbb{Z})$ is multiplication by an
integer deg(f), the degree of f. For smooth maps, the
degree is the number of points in the inverse image of
a generic point $p \in M$ counted with signs:

\[ \deg(f) = \sum_{p' \in f^{-1}(p)} \operatorname{sign}(p') \]

where sign(p') is +1 or −1 depending on whether f is
orientation preserving or reversing in a neighborhood
of p'. For example, a complex polynomial of
degree d defines a map of the two-dimensional
sphere to itself of degree d: a generic point has d
points in its inverse image and the map is locally
orientation preserving. On the other hand, a map of
$S^{n-1}$ induced by a reflection of $\mathbb{R}^n$ reverses orientation
and has degree −1. Thus, as degrees multiply on
composing maps, the antipodal map $x \mapsto -x$ has
degree $(-1)^n$. As an application we prove:


Every tangent vector field on an even-dimensional
sphere $S^{n-1}$ has a zero.


Proof Assume v(x) is a vector field which is nonzero
for all $x \in S^{n-1}$. Then x is perpendicular to v(x), and
after rescaling, we may assume that v(x) has length 1.
The function $F(x, t) = \cos(t)\,x + \sin(t)\,v(x)$ is a well-
defined homotopy from the identity map (t = 0) to
the antipodal map (t = π). But this is impossible as
homotopic maps induce the same map in (co)homology
and we have already seen that the degree of the
identity map is 1 while the degree of the antipodal
map is $(-1)^n = -1$ when n is odd.
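In the simplest case of the circle, the degree can be checked numerically as a winding number. The sketch below is an illustration only: the function name `degree_on_circle` and the sampling scheme are assumptions made here, and the method presumes that f has no zero on the unit circle and that the sample is fine enough for consecutive values to differ in argument by less than π. It measures how often f(z)/|f(z)| winds around the circle as z runs once around it.

```python
import cmath
import math


def degree_on_circle(f, samples=2000):
    """Degree of z -> f(z)/|f(z)| on the unit circle, as a winding number."""
    total = 0.0
    previous = f(complex(1.0, 0.0))
    for k in range(1, samples + 1):
        z = cmath.exp(2j * math.pi * k / samples)
        current = f(z)
        # Signed angle between consecutive values, in (-pi, pi].
        total += cmath.phase(current / previous)
        previous = current
    return round(total / (2 * math.pi))


print(degree_on_circle(lambda z: z ** 3))          # 3
print(degree_on_circle(lambda z: z.conjugate()))   # -1
```

The power map $z \mapsto z^3$ has three preimages of a generic point, each counted with sign +1, while complex conjugation is a reflection and has degree −1, in line with the examples above.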


• It is well known that two self-maps of a sphere of
any dimension are homotopic if and only if they
have the same degree, that is, $\pi_n(S^n) \cong \mathbb{Z}$ for $n \ge 1$.

• When M is not orientable, [M] still defines a cycle
in homology with $\mathbb{Z}_2$-coefficients, and $[M] \cap$
defines an isomorphism between the cohomology
and homology with $\mathbb{Z}_2$ coefficients.

• As [M] represents a homology class, so does every
other closed (orientable) submanifold of M. It is
however not the case that every homology class
can be represented by a submanifold or linear
combinations of such.


Cohomology is a contravariant functor. Poincaré
duality however allows us to define, for any $f : M' \to M$
between oriented, compact, closed manifolds of arbitrary
dimensions, a “transfer” or “Umkehr map,”

\[ f^! := D^{-1} f_* D' : H^*(M'; \mathbb{Z}) \to H^{*-c}(M; \mathbb{Z}) \]

which lowers the degree by $c = \dim M' - \dim M$. It
satisfies the formula

\[ f^!(f^*(x) \cup y) = x \cup f^!(y) \]

for all $x \in H^*(M; \mathbb{Z})$ and $y \in H^*(M'; \mathbb{Z})$. When f is a
covering map then $f^!$ can be defined on the chain
level by

\[ f^!(x)(\sigma) = \sum_{f(\tilde\sigma) = \sigma} x(\tilde\sigma) \]

where $x \in C^*(M')$ and $\sigma \in C_*(M)$.
de Rham Cohomology


If $x_1, \dots, x_n$ are the local coordinates of $\mathbb{R}^n$, define an
algebra $\Omega^*$ to be the algebra generated by symbols
$dx_1, \dots, dx_n$ subject to the relations $dx_i dx_j = -dx_j dx_i$
for all i, j. We say $dx_{i_1} \cdots dx_{i_q}$ has degree q. The
differential forms on $\mathbb{R}^n$ are the algebra

\[ \Omega^*(\mathbb{R}^n) := \{ C^\infty \text{ functions on } \mathbb{R}^n \} \otimes_{\mathbb{R}} \Omega^* \]

The algebra $\Omega^*(\mathbb{R}^n) = \bigoplus_{q=0}^{n} \Omega^q(\mathbb{R}^n)$ is naturally
graded by degree. There is a differential operator
$d : \Omega^q(\mathbb{R}^n) \to \Omega^{q+1}(\mathbb{R}^n)$ defined by

1. if $f \in \Omega^0(\mathbb{R}^n)$, then $df = \sum_i (\partial f / \partial x_i) \, dx_i$
2. if $\omega = \sum_I f_I \, dx_I$, then $d\omega = \sum_I df_I \, dx_I$

I stands here for a multi-index. For example, in $\mathbb{R}^3$
the differential assigns to 0-forms (= functions) the
gradient, to 1-forms the curl, and to 2-forms the
divergence. An easy exercise shows that $d^2 = 0$ and
the qth de Rham cohomology of $\mathbb{R}^n$ is the vector space

\[ H^q_{\mathrm{de\,R}}(\mathbb{R}^n) := \frac{\ker d : \Omega^q(\mathbb{R}^n) \to \Omega^{q+1}(\mathbb{R}^n)}{\operatorname{im} d : \Omega^{q-1}(\mathbb{R}^n) \to \Omega^q(\mathbb{R}^n)} \]


More generally, the de Rham complex $\Omega^*(M)$ and
its cohomology $H^*_{\mathrm{de\,R}}(M)$ can be defined for any
smooth manifold M.
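The "easy exercise" that d squares to zero can be sketched for a function f, using only the two defining rules of d, the anticommutation of the symbols $dx_i$, and the symmetry of second partial derivatives:

```latex
\begin{aligned}
d(df) &= d\Big( \sum_i \frac{\partial f}{\partial x_i} \, dx_i \Big)
       = \sum_{i,j} \frac{\partial^2 f}{\partial x_j \, \partial x_i} \, dx_j \, dx_i \\
      &= \sum_{i<j} \Big( \frac{\partial^2 f}{\partial x_i \, \partial x_j}
         - \frac{\partial^2 f}{\partial x_j \, \partial x_i} \Big) \, dx_i \, dx_j = 0 .
\end{aligned}
```

The diagonal terms vanish because $dx_i \, dx_i = 0$. In $\mathbb{R}^3$ this is the familiar statement that the curl of a gradient vanishes; the general case of a q-form follows from this one and the product rule.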

Let $\sigma$ be a smooth, singular, real (q + 1)-chain on
M, and let $\omega \in \Omega^q(M)$. Stokes theorem then says

\[ \int_{\partial\sigma} \omega = \int_{\sigma} d\omega \]

and therefore integration defines a pairing between
the qth singular homology and the qth de Rham
cohomology of M. This pairing is exact and thus de
Rham cohomology is isomorphic to singular cohomology
with real coefficients:

\[ H^*_{\mathrm{de\,R}}(M) \cong (H_*(M; \mathbb{R}))^* \cong H^*(M; \mathbb{R}) \]


Let $\Omega_c^*(M)$ denote the subcomplex of compactly
supported forms and $H_c^*(M)$ its cohomology. Integration
with respect to the first i coordinates defines a map

\[ \Omega_c^q(\mathbb{R}^n) \to \Omega_c^{q-i}(\mathbb{R}^{n-i}) \]

which induces an isomorphism in cohomology; note in
particular $H_c^n(\mathbb{R}^n) = \mathbb{R}$. More generally, when $E \to M$
is an i-dimensional orientable, real vector bundle over
a compact, orientable manifold M, integration over
the fiber gives the “Thom isomorphism”:

\[ H_c^{*+i}(E) \cong H_c^*(M) \cong H^*_{\mathrm{de\,R}}(M) \]

For orientable fiber bundles $F \to M' \xrightarrow{\;f\;} M$ with
compact, orientable fiber F, integration over the
fiber provides another definition of the transfer map

\[ f^! : H^*_{\mathrm{de\,R}}(M') \to H^{*-c}_{\mathrm{de\,R}}(M) \]


Hodge Decomposition 


Let M be a compact oriented Riemannian manifold of
dimension n. The Hodge star operator, $*$, associates to
every q-form an (n − q)-form. For $\mathbb{R}^n$ and any
orthonormal basis $\{e_1, \dots, e_n\}$, it is defined by setting

\[ *(e_1 \wedge \cdots \wedge e_q) := \pm \, e_{q+1} \wedge \cdots \wedge e_n \]

where one takes + if the orientation defined by
$\{e_1, \dots, e_n\}$ is the same as the given one, and −
otherwise. Using local coordinate charts this definition
can be extended to M. Clearly, $*$ depends on the
chosen metric and orientation of M. If M is
compact, we may define an inner product on the
q-forms by

\[ \langle \omega, \omega' \rangle := \int_M \omega \wedge *\omega' \]


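With respect to this inner product $*$ is an isometry.
Define the codifferential via

\[ \delta := (-1)^{n(q+1)+1} * d \, * : \Omega^q(M) \to \Omega^{q-1}(M) \]

and the Laplace-Beltrami operator via

\[ \Delta := \delta d + d \delta \]

The codifferential satisfies $\delta^2 = 0$ and is the adjoint
of the differential. Indeed, for q-forms $\omega$ and (q + 1)-
forms $\omega'$:

\[ \langle d\omega, \omega' \rangle = \langle \omega, \delta\omega' \rangle \qquad [15] \]

It follows easily that $\Delta$ is self-adjoint, and
furthermore,

\[ \Delta\omega = 0 \quad \text{if and only if} \quad d\omega = 0 \ \text{and} \ \delta\omega = 0 \qquad [16] \]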


A form $\omega$ satisfying $\Delta\omega = 0$ is called “harmonic.” Let
$\mathcal{H}^q$ denote the subspace of all harmonic q-forms. It is
not hard to prove the “Hodge decomposition theorem”:

\[ \Omega^q = \mathcal{H}^q \oplus \operatorname{im} d \oplus \operatorname{im} \delta \]

Furthermore, by adjointness [15], a form $\omega$ is closed
only if it is orthogonal to $\operatorname{im} \delta$. On calculating the
de Rham cohomology we can also ignore the
summand $\operatorname{im} d$ and find that:

Each de Rham cohomology class on a compact
oriented Riemannian manifold M contains a unique
harmonic representative, that is, $H^q_{\mathrm{de\,R}}(M) \cong \mathcal{H}^q$.

Warning: This is an isomorphism of vector spaces
and in general does not extend to an isomorphism of
algebras.


Examples

We list the cohomology of some important examples.


Projective Spaces 


Let $\mathbb{RP}^n$ be real projective space of dimension n. Then,

\[ H^*(\mathbb{RP}^n; \mathbb{Z}_2) = \mathbb{Z}_2[x]/(x^{n+1}) \]

is a stunted polynomial ring on a generator x of
degree 1.

Similarly, let $\mathbb{CP}^n$ and $\mathbb{HP}^n$ denote complex and
quaternionic projective space of real dimensions 2n
and 4n, respectively. Then,

\[ H^*(\mathbb{CP}^n; \mathbb{Z}) = \mathbb{Z}[y]/(y^{n+1}), \qquad H^*(\mathbb{HP}^n; \mathbb{Z}) = \mathbb{Z}[z]/(z^{n+1}) \]

are stunted polynomial rings with deg(y) = 2 and
deg(z) = 4.


Lie Groups 


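Let G be a compact, connected Lie group of rank l,
that is, the dimension of the maximal torus of G is l.
Then,

\[ H^*(G; \mathbb{Q}) \cong \Lambda^*_{\mathbb{Q}}[a_{2d_1 - 1}, a_{2d_2 - 1}, \dots, a_{2d_l - 1}] \]

where $|a_i| = i$ and $d_1, \dots, d_l$ are the fundamental
degrees of G which are known for all G. Often this
structure lifts to the integral cohomology. In
particular we have:

\[ \begin{aligned}
H^*_{\mathrm{free}}(SO(2k+1); \mathbb{Z}) &\cong \Lambda^*_{\mathbb{Z}}[a_3, a_7, \dots, a_{4k-1}] \\
H^*_{\mathrm{free}}(SO(2k); \mathbb{Z}) &\cong \Lambda^*_{\mathbb{Z}}[a_3, a_7, \dots, a_{4k-5}, a_{2k-1}] \\
H^*(U(k); \mathbb{Z}) &\cong \Lambda^*_{\mathbb{Z}}[a_1, a_3, \dots, a_{2k-1}]
\end{aligned} \]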
Z 


Classifying Spaces 


For any group G there exists a classifying space BG, 
well defined up to homotopy. Classifying spaces 
are of central interest to geometers and topologists 
for the set of isomorphism classes of principal 
G-bundles over a space X is in one-to-one corre- 
spondence with the set of homotopy classes of maps 
from X to BG. In particular, every cohomology class 
$c \in H^*(BG; R)$ defines a characteristic class of
principal G-bundles E over X: if E corresponds to
the map $f_E : X \to BG$, then $c(E) := f_E^*(c)$.

BG can be constructed as the space of G-orbits of
a contractible space EG on which G acts freely.
Thus, for example,

\[ \begin{aligned}
B\mathbb{Z} &= \mathbb{R}/\mathbb{Z} \simeq S^1 \\
B\mathbb{Z}_2 &= (\lim_{n \to \infty} S^n)/\mathbb{Z}_2 \simeq \mathbb{RP}^\infty \\
BS^1 &= (\lim_{n \to \infty} S^{2n+1})/S^1 \simeq \mathbb{CP}^\infty
\end{aligned} \]

and more generally, infinite Grassmannian manifolds
are classifying spaces for linear groups. When
G is a compact connected Lie group,

\[ H^*(BG; \mathbb{Q}) \cong \mathbb{Q}[x_{2d_1}, \dots, x_{2d_l}] \]

with $d_i$ as above and $|x_i| = i$. In particular,

\[ \begin{aligned}
H^*(BSO(2k+1); \mathbb{Z}[1/2]) &\cong \mathbb{Z}[1/2][p_1, \dots, p_k] \\
H^*(BSO(2k); \mathbb{Z}[1/2]) &\cong \mathbb{Z}[1/2][p_1, \dots, p_{k-1}, e_k] \\
H^*(BU(k); \mathbb{Z}) &\cong \mathbb{Z}[c_1, c_2, \dots, c_k]
\end{aligned} \]

where the Pontryagin, Euler, and Chern classes have
degree $|p_i| = 4i$, $|e_k| = 2k$, and $|c_i| = 2i$, respectively.


Moduli Spaces 


Let $\mathcal{M}_g^n$ be the space of Riemann surfaces of genus g
with n ordered, marked points. There are naturally
defined classes $\kappa_i$ and $e_1, \dots, e_n$ of degree 2i and 2,
respectively. By Harer-Ivanov stability and the
recent proof of the Mumford conjecture (Madsen-Weiss,
preprint 2004), there is an isomorphism up to
degree $* < 3g/2$ of the rational cohomology of $\mathcal{M}_g^n$
with

\[ \mathbb{Q}[\kappa_1, \kappa_2, \dots] \otimes \mathbb{Q}[e_1, \dots, e_n] \]

The rational cohomology vanishes in degrees $* >
4g - 5$ if n = 0, and $* > 4g - 4 + n$ if n > 0. Though
the stable part of the cohomology is now well understood,
the structure of the unstable part, as proposed by
Faber (Viehweg 1999), remains conjectural.


Generalized Cohomology Theories 


The three basic properties of singular homology,
appropriately dualized, hold of course also for
cohomology. Furthermore, they (essentially) deter-
mine (co)homology uniquely as a functor from the 
category of simplicial spaces and continuous func- 
tions to the category of abelian groups. If we drop 
the dimension axiom (2), we are left with homotopy 
invariance (1), and the Mayer—Vietoris sequence (3). 
Abelian group valued functors satisfying (1) and (3) 
are so called “generalized (co)homology theories.” 




K-theory and cobordism theory are two well-known 
examples but there are many more. 


K-Theory 


The geometric objects representing elements in complex
K-theory $K^0(X)$ are isomorphism classes of finite
dimensional complex vector bundles E over X. Vector
bundles E, E' can be added to form a new bundle
$E \oplus E'$ over X, and $K^0(X)$ is just the group completion
of the arising monoid. Thus, for example, for the point
space we have $K^0(\mathrm{pt}) = \mathbb{Z}$. Tensor product of vector
bundles $E \otimes E'$ induces a multiplication on K-theory
making $K^*(X)$ into a graded commutative ring.

In many ways K-theory is easier than cohomology.
In particular, the groups are 2-periodic: all even
degree groups are isomorphic to the reduced
K-theory group $\tilde K^0(X) := \operatorname{coker}(K^0(\mathrm{pt}) = \mathbb{Z} \to K^0(X))$,
and all odd degree groups are isomorphic to
$\tilde K^1(X) := \tilde K^0(\Sigma X)$.

The theory of characteristic classes gives a close
relation between the two cohomology theories. The
Chern character map, a rational polynomial in the
Chern classes, defines

\[ \mathrm{ch} : K^0(X) \otimes_{\mathbb{Z}} \mathbb{Q} \to H^{\mathrm{even}}(X; \mathbb{Q}) = \bigoplus_{k \ge 0} H^{2k}(X; \mathbb{Q}) \]

an isomorphism of rings. Thus, the K-theory and
cohomology of a space carry the same rational
information. But they may have different torsion
parts. This became an issue in string theory when
D-brane charges which had formerly been thought
of as differential forms (and hence cohomology
classes) were later reinterpreted more naturally as
K-theory classes (Witten 1998).


• There are real and quaternionic K-theory groups
which are 8-periodic.


Cobordism Theory 


The geometric objects representing an element in the
oriented cobordism group $\Omega^{SO}_n(X)$ are pairs (M, f)
where M is a smooth, orientable n-dimensional
manifold and $f : M \to X$ is a continuous map. Two
pairs (M, f) and (M', f') represent the same cobordism
class if there exists a pair (W, F) where W is an
(n + 1)-dimensional, smooth, oriented manifold
with boundary $\partial W = M \sqcup -M'$ such that $F : W \to X$
restricts to f and f' on the boundary $\partial W$. Disjoint
union and Cartesian product of manifolds define an
addition and multiplication so that $\Omega^{SO}_*(X)$ is a
graded, commutative ring.


• Similarly, unoriented, complex, or spin cobordism
groups can be defined.


Elliptic Cohomology 


Quillen proved that complex cobordism theory is 
universal for all complex oriented cohomology 
theories, that is, those cohomology theories that 
allow a theory of Chern classes. In a complex 
oriented theory, the first Chern class of the tensor
product of two line bundles can be expressed in
terms of the first Chern class of each of them via a
two-variable power series: $c_1(E \otimes E') = F(c_1(E),
c_1(E'))$. F defines a formal group law and Quillen’s
theorem asserts that the one arising from complex 
cobordism theory is the universal one. 

Vice versa, given a formal group law, one may try to 
construct a complex oriented cohomology theory from 
it. In particular, an elliptic curve gives rise to a formal 
group law and an elliptic cohomology theory. Hopkins 
et al. have described and studied an inverse limit of 
these elliptic theories, which they call the theory of 
topological modular forms, tmf, as the theory is closely 
related to modular forms. In particular, there is a
natural map from the groups $\mathrm{tmf}_{2n}(\mathrm{pt})$ to the group of
modular forms of weight n over $\mathbb{Z}$. After inverting a
certain element (related to the discriminant), the
theory becomes periodic with period $24^2 = 576$.

Witten (1998) showed that the purely theoreti- 
cally constructed elliptic cohomology theories 
should play an important role in string theory: the 
index of the Dirac operator on the free loop space of 
certain manifolds should be interpreted as an 
element of it. But unlike for ordinary cohomology, 
K-theory, and cobordism theory we do not (yet) 
know a good geometric object representing elements 
in this theory without which its use for geometry 
and analysis remains limited. Segal speculated some 
20 years ago that conformal field theories should 
define such geometric objects. Though progress has 
been made, the search for a good geometric 
interpretation of elliptic cohomology (and tmf) 
remains an active and important research area. 


Infinite Loop Spaces 


Brown’s representability theorem implies that for
each (reduced) generalized cohomology theory $h^*$ we
can find a sequence of spaces $E_n$ such that $h^n(X)$ is
the set of homotopy classes $[X, E_n]$ from the space X
to $E_n$ for all n. Recall that the Mayer-Vietoris
sequence implies that $h^n(X) \cong h^{n+1}(\Sigma X)$. The suspension
functor $\Sigma$ is adjoint to the based loop space
functor $\Omega$ which takes a space X to the space of
based maps from the circle to X. Hence,

\[ h^n(X) = [X, E_n] \cong [\Sigma X, E_{n+1}] = [X, \Omega E_{n+1}] \]

and it follows that every generalized cohomology
theory is represented by an infinite loop space

\[ E_0 \simeq \Omega E_1 \simeq \cdots \simeq \Omega^n E_n \simeq \cdots \]

Vice versa, any such infinite loop space gives rise to
a generalized cohomology theory.

One may think of infinite loop spaces as the
abelian groups up to homotopy in the strongest
sense. Indeed, ordinary cohomology with integer
coefficients is represented by

\[ \mathbb{Z} \simeq \Omega(S^1) \simeq \Omega^2(\mathbb{CP}^\infty) \simeq \cdots \simeq \Omega^n K(n, \mathbb{Z}) \simeq \cdots \]

where by definition the Eilenberg-MacLane space
$K(n, \mathbb{Z})$ has trivial homotopy groups for all dimensions
not equal to n and $\pi_n K(n, \mathbb{Z}) = \mathbb{Z}$. Complex
K-theory is represented by

\[ \mathbb{Z} \times BU \simeq \Omega(U) \simeq \Omega^2(BU) \simeq \Omega^3(U) \simeq \cdots \]

This is Bott’s celebrated “periodicity theorem.”
Finally, oriented cobordism theory is represented by

\[ \Omega^\infty MSO := \lim_{n \to \infty} \Omega^n \mathrm{Th}(\gamma_n) \]

where $\gamma_n \to BSO_n$ is the universal n-dimensional
vector bundle over the Grassmannian manifold of
oriented n-planes in $\mathbb{R}^\infty$, and $\mathrm{Th}(\gamma_n)$ denotes its
Thom space.

A good source of infinite loop spaces are 
symmetric monoidal categories. Indeed every infinite 
loop space can be constructed from such a category: 
the symmetric monoidal structure gives the corre- 
sponding homotopy abelian group structure. For 
example, the category of finite-dimensional,
complex vector spaces and their isomorphisms
gives rise to Z x BU. To give another example, in
quantum field theory, one considers the (d + 1)-
dimensional cobordism category with objects the
compact, oriented d-dimensional manifolds, and
their (d + 1)-dimensional cobordisms as morphisms.
Disjoint union of manifolds makes this category
into a symmetric monoidal category. The associated
infinite loop space and hence generalized cohomology
theory has recently been identified as a (d + 1)-
dimensional slice of oriented cobordism theory
(Galatius et al. preprint 2005).


See also: Characteristic Classes; Equivariant
Cohomology and the Cartan Model; Functional Equations
and Integrable Systems; Index Theorems; Intersection
Theory; K-Theory; Moduli Spaces: An Introduction;
Riemann Surfaces; Spectral Sequences.


Introduction


Combinatorics is a vast field which enters particularly
in a crucial way in statistical physics. There, it is
particularly the enumerative problems that are of
importance. Therefore, in this article, we shall mainly
concentrate on the enumerative aspects of combinatorics.
We first recall the basic terminology, in
particular the basic combinatorial objects and numbers,
together with the simplest facts about them. We
then provide introductions into the most important
techniques of enumeration: the generating function
technique, Redfield-Polya theory, methods of solving
functional equations of combinatorial origin, methods
of asymptotic enumeration, the theory of heaps,
and the transfer matrix method. The subsequent
sections then discuss specific problem circles with
relation to statistical physics more closely. We discuss
lattice path problems, explain Kasteleyn’s method of
enumerating perfect matchings and tilings, present
the fundamental theorems on nonintersecting paths,
and provide an introduction into the research field
involving vicious walkers, plane partitions, rhombus
tilings, alternating sign matrices, six-vertex configurations,
and fully packed loop configurations.
Finally, we explain how one should treat binomial
and hypergeometric series, which frequently arise in
enumeration problems.

Basic Combinatorial Terminology


In this section we review basic combinatorial 
notions and facts. The reader can find a more 
detailed treatment and further results, for example, 
in chapter 1 of Stanley (1986). 

The basic combinatorial choice problems and
their solutions are: there are $2^n$ subsets of an
n-element set. There are $\binom{n}{k}$ k-element subsets of an
n-element set. Given an alphabet $A = \{a_1, a_2, \dots\}$, a
word is a (finite or infinite) sequence of elements of
A. Usually, a finite word is written in the form
$w_1 w_2 \dots w_n$ (with $w_i \in A$). Out of the letters
{1, 2, ..., k}, one can build $k^n$ words of length n.
Out of the letters {1, 2, ..., k}, one can build $\binom{n+k-1}{n}$
weakly increasing sequences of length n. The number of
permutations of an n-element set is n!. The set of
permutations of {1, 2, ..., n} is denoted by $\mathfrak{S}_n$. The
number of permutations of an n-element set with
exactly k cycles is the Stirling number of the first
kind, s(n, k). These numbers are given as the
expansion coefficients of falling factorials,

\[ x(x-1)\cdots(x-n+1) = \sum_{k=0}^{n} (-1)^{n-k} s(n,k) \, x^k \]

or in form of the double (formal) power series

\[ \sum_{n,k \ge 0} (-1)^{n-k} s(n,k) \, x^k \frac{y^n}{n!} = (1+y)^x \]

A partition of a set is a collection of pairwise
disjoint subsets the union of which is the complete
set. The subsets in the collection are called the
blocks of the partition. The total number of
partitions of an n-element set is the Bell number
$B_n$. These numbers are given by

\[ \sum_{n \ge 0} B_n \frac{x^n}{n!} = \exp(e^x - 1) \]

The number of partitions of an n-element set into
exactly k blocks is the Stirling number of the second
kind, S(n, k). These numbers are given by

\[ \sum_{n,k \ge 0} S(n,k) \, \frac{x^n}{n!} \, y^k = \exp\big(y(e^x - 1)\big) \]

or, explicitly, by

\[ S(n,k) = \frac{1}{k!} \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} j^n \]
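As a quick consistency check, the explicit formula can be compared with the standard recurrence S(n, k) = k·S(n − 1, k) + S(n − 1, k − 1), which is not stated in the text but follows by considering whether the element n forms a block on its own. The function names below are chosen here for illustration.

```python
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def stirling2(n, k):
    """S(n, k) by the recurrence S(n,k) = k*S(n-1,k) + S(n-1,k-1)."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def stirling2_explicit(n, k):
    """S(n, k) by the explicit alternating sum (Python evaluates 0**0 as 1)."""
    total = sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1))
    return total // factorial(k)   # the sum is always divisible by k!


def bell(n):
    """Bell number B_n as the sum of S(n, k) over all k."""
    return sum(stirling2(n, k) for k in range(n + 1))


print(stirling2(4, 2), stirling2_explicit(4, 2))  # 7 7
print([bell(n) for n in range(6)])                # [1, 1, 2, 5, 15, 52]
```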


A composition of a positive integer n is a representation
of n as a sum $n=s_1+s_2+\cdots+s_k$ of other
positive integers $s_i$, where the order of the summands
matters. The total number of compositions of
n is $2^{n-1}$. The number of compositions of n with
exactly k summands is $\binom{n-1}{k-1}$. A partition of a
positive integer n is a representation of n as a sum
$n=\lambda_1+\lambda_2+\cdots+\lambda_\ell$ of other positive integers $\lambda_i$,
where the order of the summands does not matter. Thus,
we may assume that the summands are ordered,
$\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_\ell>0$. This is the motivation
to write partitions most often in the form of
tuples $(\lambda_1,\lambda_2,\ldots,\lambda_\ell)$ the entries of which are
weakly decreasing. The summands of a partition
are called the parts of the partition. Let p(n) denote
the number of partitions of n. These numbers are
given by

$$\sum_{n\ge 0}p(n)\,x^n=\frac{1}{\prod_{i\ge 1}(1-x^i)}$$

If p(n,k) denotes the number of partitions of n into
at most k parts, then we have

$$\sum_{n\ge 0}p(n,k)\,x^n=\frac{1}{(1-x)(1-x^2)\cdots(1-x^k)}$$

Finally, if p(n,k,m) denotes the number of partitions
of n into at most k parts, all of which are at
most m, then

$$\sum_{n\ge 0}p(n,k,m)\,x^n=\frac{(1-x^{m+k})(1-x^{m+k-1})\cdots(1-x^{m+1})}{(1-x^k)(1-x^{k-1})\cdots(1-x)}$$

The expression on the right-hand side is called a
q-binomial coefficient, and is denoted by $\begin{bmatrix}m+k\\k\end{bmatrix}_x$.
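As an illustration of the last two generating functions, the sketch below expands the products as truncated power series and compares the q-binomial coefficients with a brute-force count of partitions fitting in a k × m box. The helper names are hypothetical, and the in-place series arithmetic is merely one convenient way to carry out the expansion.

```python
def partition_numbers(order):
    """Coefficients p(0..order) of 1 / prod_{i>=1} (1 - x^i)."""
    c = [1] + [0] * order
    for i in range(1, order + 1):
        for d in range(i, order + 1):      # multiply by 1/(1 - x^i)
            c[d] += c[d - i]
    return c

def q_binomial_coeffs(m, k):
    """Coefficients of the q-binomial [m+k, k]_x, a polynomial of degree m*k."""
    top = m * k
    c = [1] + [0] * top
    for i in range(1, k + 1):
        j = m + i
        for d in range(top, j - 1, -1):    # multiply by (1 - x^(m+i))
            c[d] -= c[d - j]
    for i in range(1, k + 1):
        for d in range(i, top + 1):        # multiply by 1/(1 - x^i)
            c[d] += c[d - i]
    return c

def partitions_in_box(n, k, m):
    """Brute-force p(n,k,m): partitions of n into at most k parts, each at most m."""
    def count(remaining, parts_left, max_part):
        if remaining == 0:
            return 1
        if parts_left == 0:
            return 0
        return sum(count(remaining - p, parts_left - 1, p)
                   for p in range(1, min(max_part, remaining) + 1))
    return count(n, k, m)
```

With m = 2 and k = 2 the coefficient list is `[1, 1, 2, 1, 1]`, matching the six partitions that fit inside a 2 × 2 box.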

Partitions are frequently encoded in terms of their
Ferrers diagrams. The Ferrers diagram of a partition
$\lambda=(\lambda_1,\lambda_2,\ldots,\lambda_\ell)$ is an array of cells with $\ell$ left-justified
rows and $\lambda_i$ cells in row i. For example, the
diagram in Figure 1 is the Ferrers diagram of the
partition (3,3,2).

A lattice path P in $\mathbb{Z}^d$ (where $\mathbb{Z}$ denotes the set of
integers) is a path in the d-dimensional integer
lattice $\mathbb{Z}^d$ which uses only points of the lattice, that
is, it is a sequence $(P_0,P_1,\ldots,P_l)$, where $P_i\in\mathbb{Z}^d$ for
all i. The vectors $\overrightarrow{P_0P_1},\overrightarrow{P_1P_2},\ldots,\overrightarrow{P_{l-1}P_l}$ are called
the steps of P. The number of steps, l, is called the
length of P. Figure 2 shows a lattice path in $\mathbb{Z}^2$ of
length 11.


Figure 1 A Ferrers diagram. 





Figure 2 A Motzkin path. 


A Dyck path is a lattice path in the integer
plane $\mathbb{Z}^2$ consisting of up-steps (1,1) and down-steps
(1,−1), which starts at the origin, never passes below
the x-axis, and ends on the x-axis. See Figure 3 for an
example.

The number of Dyck paths of length 2n is the
Catalan number

$$C_n=\frac{1}{n+1}\binom{2n}{n}$$

The generating function (see the next section for an
introduction to the theory of generating functions)
for these numbers is

$$\sum_{n\ge 0}C_nx^n=\frac{1-\sqrt{1-4x}}{2x} \qquad [2]$$





The reader is referred to exercise 6.19 in Stanley 
(1999) for countless occurrences of the Catalan 
numbers. 

A Motzkin path is a lattice path in the integer
plane $\mathbb{Z}^2$ consisting of up-steps (1,1), level steps
(1,0), and down-steps (1,−1), which starts at the
origin, never passes below the x-axis, and ends on
the x-axis. The path in Figure 2 is in fact a Motzkin
path. The number of Motzkin paths of length n is
the Motzkin number

$$M_n=\sum_{k=0}^{\lfloor n/2\rfloor}\frac{1}{k+1}\binom{2k}{k}\binom{n}{2k}$$

The generating function for these numbers is

$$\sum_{n\ge 0}M_nx^n=\frac{1-x-\sqrt{1-2x-3x^2}}{2x^2}$$


The reader is referred to exercise 6.38 in Stanley (1999) 
for numerous occurrences of the Motzkin numbers. 

A Schröder path is a lattice path in the integer
plane $\mathbb{Z}^2$ consisting of horizontal steps (1,0),
vertical steps (0,1), and diagonal steps (1,1), which starts
at the origin, never passes below the diagonal x=y, and ends
on the diagonal x=y. See Figure 4 for an example.

Figure 3 A Dyck path.

Figure 4 A Schröder path.

The number of Schröder paths of length n (that is, ending
at the point (n,n)) is the (large) Schröder number

$$r_n=\sum_{k\ge 0}\frac{1}{k+1}\binom{2k}{k}\binom{n+k}{2k}$$

The generating function for these numbers is

$$\sum_{n\ge 0}r_nx^n=\frac{1-x-\sqrt{1-6x+x^2}}{2x} \qquad [3]$$


The reader is referred to exercise 6.39 in Stanley 
(1999) for numerous occurrences of the Schröder 
numbers. 
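The three families of paths can be counted by direct dynamic programming and compared with the closed formulas. The sketch below is illustrative only: the function names are ours, and for Schröder paths it assumes the usual step set (1,0), (0,1), (1,1) with paths running from (0,0) to (n,n) and staying weakly above the diagonal.

```python
from functools import lru_cache
from math import comb

def count_height_paths(length, height_changes):
    """Paths with `length` steps that start and end at height 0 and never go
    below 0; `height_changes` lists the allowed changes of height per step."""
    heights = {0: 1}
    for _ in range(length):
        nxt = {}
        for h, ways in heights.items():
            for s in height_changes:
                if h + s >= 0:
                    nxt[h + s] = nxt.get(h + s, 0) + ways
        heights = nxt
    return heights.get(0, 0)

def count_schroeder_paths(n):
    """Paths from (0,0) to (n,n) with steps (1,0), (0,1), (1,1) and x <= y."""
    @lru_cache(maxsize=None)
    def ways(x, y):
        if x > y or y > n:
            return 0
        if (x, y) == (n, n):
            return 1
        return ways(x + 1, y) + ways(x, y + 1) + ways(x + 1, y + 1)
    return ways(0, 0)

def catalan(n):
    return comb(2 * n, n) // (n + 1)

def motzkin(n):
    return sum(catalan(k) * comb(n, 2 * k) for k in range(n // 2 + 1))

def schroeder(n):
    return sum(catalan(k) * comb(n + k, 2 * k) for k in range(n + 1))
```

Dyck paths correspond to `count_height_paths(2 * n, (1, -1))` and Motzkin paths to `count_height_paths(n, (1, 0, -1))`.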

There is another famous sequence of numbers
which we have not touched upon yet, the Fibonacci numbers
$F_n$. They are given by

$$F_n=\frac{1}{\sqrt5}\left(\left(\frac{1+\sqrt5}{2}\right)^{n+1}-\left(\frac{1-\sqrt5}{2}\right)^{n+1}\right)$$

with generating function

$$\sum_{n\ge 0}F_nx^n=\frac{1}{1-x-x^2} \qquad [4]$$

They also occur in numerous places. For example,
the number $F_n$ counts all paths on the integers $\mathbb{Z}$
from 0 to n with steps (1,0) and (2,0).

An undirected graph G consists of vertices and
edges. An edge is a two-element subset of the
vertices, which, however, is thought of as a line or
curve connecting the two vertices. See Figure 5a
for an example. The usual notation for a graph G
is G=(V,E), where V is the set of vertices and E
is the set of edges of G. A graph is planar if it is
embedded in the plane (sphere) in such a way that
the curves which mark the edges do not intersect
in their interiors. There can be several different
ways to embed the same graph in the plane (or in
another surface). When we speak of a planar
graph, we assume the graph to be already
embedded in a given way. For example, the graph
in Figure 5a is not a planar graph, by its drawing.
However, there is a different embedding which is
planar (namely, all embeddings which put the
vertex $v_3$ above the vertex $v_5$ and leave the other
vertices as they are). A tree is a connected graph
without any cycles.

Figure 5 (a) An undirected graph. (b) A directed graph.

A directed graph (or digraph) G consists of 
vertices and arcs (which are sometimes also called 
directed edges). An arc is a pair of vertices, which,
however, is thought of as an arrow pointing from the
first vertex of the pair to the second. See Figure 5b
for an example. The usual notation for a directed 
graph G is again G= (V, E), where V is the set of 
vertices and E is the set of arcs of G. All other 
notions explained for undirected graphs have analo- 
gous meanings for directed graphs. 

Graphs can be labeled, in which case each vertex 
is assigned a label, or unlabeled. The (undirected) 
graph in Figure 5a is labeled, whereas the (directed) 
graph in Figure 5b is unlabeled. 


Generating Functions 


Generating functions are the very basic tools of 
enumeration. For introductions to this technique, 
from different points of view, the reader is referred 
to Bergeron et al. (1998), Flajolet and Sedgewick 
(chapter 1 in the reference listed in “Further read- 
ing” section), and Stanley (1998, chapter 1; 1999, 
chapter 4). 

Let A be a set of (unlabeled) objects. Each object
a in A has a certain size, |a|, which is a non-negative
integer. Let us also assume that there is only a finite
number of objects from A of a given size. Let $a_n$ be
the number of objects from A of size n. The
(ordinary) generating function for A is the formal
power series

$$F_A(x)=\sum_{a\in A}x^{|a|}=\sum_{n\ge 0}a_nx^n$$

("formal" means that x is just an indeterminate, not
a real or complex number. One can compute with
formal power series in the same way as with analytic
series, except that convergence issues do not arise, or
rather that "convergence" has a different
meaning; cf. Stanley (1998, section 1.1)). Typical
examples are Sets (the collection containing all
"unlabeled sets," that is, all objects of the form
{∘,∘,...,∘}, including the empty set, where the size
of {∘,∘,...,∘} is the number of ∘'s), Sequences
(the collection containing all "unlabeled sequences,"
that is, all objects of the form (∘,∘,...,∘), including
the empty sequence), Cycles (unlabeled cycles),
with respective generating functions

$$F_{\mathrm{Sets}}(x)=F_{\mathrm{Sequences}}(x)=\frac{1}{1-x},\qquad F_{\mathrm{Cycles}}(x)=\frac{x}{1-x} \qquad [5]$$

or Trees (unlabeled trees).

If A and B are two sets of objects, one can define
several other sets of objects using them. The union
of A and B, written A ∪ B, has as a groundset the
disjoint union of A and B, and the size of an element
from A is its size in A, while the size of an element
from B is its size in B. We have

$$F_{A\cup B}(x)=F_A(x)+F_B(x) \qquad [6]$$

The product of A and B, written A × B, has as a
groundset the set of pairs A × B, and the size of an
element (a,b) from A × B is the sum of the sizes of a
(in A) and of b (in B). We have

$$F_{A\times B}(x)=F_A(x)\cdot F_B(x) \qquad [7]$$


The substitution of two sets A and B of objects
can only be defined in certain circumstances, and
only under certain more restrictive circumstances can the
generating function for the substitution be
computed by substituting the generating functions
for A and B. Let us assume that any object a from
A of size n, by its structure, has n atoms (nodes). For
example, if A is a certain set of trees, where the size
of a tree is the number of leaves in the tree, then we
may take, as the atoms, the leaves of the tree. In this
situation, the substitution of B in A, denoted by
A(B), is the set of objects which arises by replacing
the atoms of objects from A by objects from B in all
possible ways. The size of an object from A(B) is the
sum of the sizes of the objects from B that it
contains. In order that A(B) contains only a finite
number of objects of a given size, we must assume
that B contains no elements of size 0. If, in addition,
the atoms of any element a from A inherit an order
(e.g., if A is a set of binary trees, then the leaves of a
binary tree are ordered in a natural way from "left"
to "right"), then we have

$$F_{A(B)}(x)=F_A(F_B(x)) \qquad [8]$$


However, this equation is not true in general. The
general formula comes out of Redfield–Pólya theory
(see [21] and [24]) and requires the notion of cycle
index series. For example, if B is the set of connected
(unlabeled) graphs and A is Sets, so that A(B) is the
set of all (connected and disconnected) graphs, then
[8] is not true, but what is true is

$$F_{\mathrm{Sets}(B)}(x)=\exp\left(F_B(x)+\tfrac12F_B(x^2)+\tfrac13F_B(x^3)+\cdots\right) \qquad [9]$$

This holds, in fact, for any set B of unlabeled objects.
(This is seen by combining [24], [17], and [21].)
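Formula [9] can be tested on a small example of our own choosing (it is not taken from the text): let B contain exactly one object of each size n ≥ 1, so that F_B(x) = x/(1−x). A "set" of such objects is a multiset of positive integers, that is, an integer partition, so the right-hand side of [9] should reproduce the partition numbers p(n). The sketch uses exact rational arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction

def series_exp(a, order):
    """Coefficients of exp(a(x)) up to x^order, for a series with a[0] == 0.
    Uses the relation e'(x) = a'(x) e(x) coefficient by coefficient."""
    e = [Fraction(0)] * (order + 1)
    e[0] = Fraction(1)
    for n in range(1, order + 1):
        e[n] = sum((k * a[k] * e[n - k] for k in range(1, n + 1)), Fraction(0)) / n
    return e

def multiset_series(b, order):
    """Right-hand side of [9]: exp(F_B(x) + F_B(x^2)/2 + F_B(x^3)/3 + ...),
    where b[n] is the number of objects of size n in B (b[0] must be 0)."""
    exponent = [Fraction(0)] * (order + 1)
    for k in range(1, order + 1):
        for n in range(1, order // k + 1):
            exponent[n * k] += Fraction(b[n], k)
    return [int(c) for c in series_exp(exponent, order)]

# Hypothetical example: B has exactly one object of each size n >= 1.
one_per_size = [0] + [1] * 10
partition_counts = multiset_series(one_per_size, 10)
```

By contrast, the naive substitution [8] would give 1/(1 − x/(1−x)) = (1−x)/(1−2x), whose coefficients 1, 1, 2, 4, 8, ... count compositions rather than partitions.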

Next we deal with the enumeration of labeled
objects. Let A be a set of labeled objects, again, each
object a with a certain size |a| which is a non-negative
integer. "Labeled" means that each object
of size n, by its structure, comes with n atoms
(nodes) which are labeled 1,2,...,n. For example,
A may be the set of all labeled graphs, where the
size of a graph is the number of its vertices, and
where the vertices are labeled 1,2,...,n. Again, we
assume that there is only a finite number of objects
from A of a given size. Let $a_n$ be the number of
objects from A of size n. The exponential generating
function for A is the formal power series

$$E_A(x)=\sum_{a\in A}\frac{x^{|a|}}{|a|!}=\sum_{n\ge 0}a_n\frac{x^n}{n!}$$

Typical examples are Sets (the collection containing
all "labeled sets," that is, all objects of the form
{1,2,...,n}, including the empty set), Permutations,
Cycles (labeled cycles), with respective
generating functions

$$E_{\mathrm{Sets}}(x)=\exp(x) \qquad [10]$$

$$E_{\mathrm{Permutations}}(x)=\frac{1}{1-x} \qquad [11]$$

$$E_{\mathrm{Cycles}}(x)=\log\frac{1}{1-x} \qquad [12]$$

or Trees (labeled trees). The explicit form of the
generating function for Trees is discussed in the
section "Solving equations for generating functions:
the Lagrange inversion formula and the kernel
method."
If A and B are two sets of objects, one defines
again several other sets of objects using them. The
union of A and B, written A ∪ B, has as a groundset
the disjoint union of A and B, and the size of an
element from A is its size in A, while the size of an
element from B is its size in B. We have

$$E_{A\cup B}(x)=E_A(x)+E_B(x) \qquad [13]$$

To define the product of A and B, written A × B,
we cannot simply take A × B as a groundset; we
must also say something about the labeling of the
objects. So, as a groundset we take all pairs (a,b)
with a ∈ A and b ∈ B, but labeled in all possible
ways by 1,2,...,|a| + |b| such that the order of
labels assigned to a respects the original order of
labels of a, and the same for b. The size of such an
element (a,b) is again the sum of the sizes of a (in A)
and of b (in B). We have

$$E_{A\times B}(x)=E_A(x)\cdot E_B(x) \qquad [14]$$


Since, in the labeled world, objects come automatically
with atoms, the substitution of two sets A and
B of objects can now always be defined. The
substitution of B in A, denoted by A(B), is the set
of objects which arises by replacing the atoms of
objects from A by objects from B in all possible
ways, and labeling the substituted objects in all
possible ways by $1,2,\ldots,\sum_b|b|$ (the sum being
over the objects from B which were put in the places
of the atoms) that are consistent with the original
labelings of the objects from B. The size of an object
from A(B) is the sum of the sizes of the objects from
B that it contains. In order that A(B) contains only a
finite number of objects of a given size, we must
assume that B contains no elements of size 0. Then
we have

$$E_{A(B)}(x)=E_A(E_B(x)) \qquad [15]$$

An example of a composition is

Permutations = Sets(Cycles)

Thus, from [15] we have

$$E_{\mathrm{Permutations}}(x)=E_{\mathrm{Sets}}(E_{\mathrm{Cycles}}(x))$$

corresponding to the identity

$$\frac{1}{1-x}=\exp\left(\log\frac{1}{1-x}\right)$$

Another manifestation of the composition rule is, for 
example, the fact (which is sometimes called the 
“exponential principle”) that, if one takes the log of 
the partition function for some maps, the result is 
the partition function for the connected maps among 
them. 
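The exponential principle can be illustrated on labeled graphs, an example chosen here for convenience (the text speaks of maps): every labeled graph is a set of connected labeled graphs, so by [15] and [10] the exponential generating function of all graphs is the exponential of that of the connected ones. Taking the logarithm of the series with coefficients $2^{\binom n2}/n!$ should therefore yield the numbers of connected labeled graphs. The helper names below are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def series_log(f, order):
    """Coefficients of log(f(x)) up to x^order, for a series with f[0] == 1.
    Uses f'(x) = g'(x) f(x) for g = log f, solved coefficient by coefficient."""
    g = [Fraction(0)] * (order + 1)
    for n in range(1, order + 1):
        correction = sum((k * g[k] * f[n - k] for k in range(1, n)), Fraction(0))
        g[n] = f[n] - correction / n
    return g

def connected_labeled_graphs(order):
    """Numbers of connected labeled graphs on 0..order vertices."""
    # EGF of all labeled graphs: 2^C(n,2) graphs on n labeled vertices.
    all_graphs = [Fraction(2 ** comb(n, 2), factorial(n)) for n in range(order + 1)]
    log_series = series_log(all_graphs, order)
    return [int(log_series[n] * factorial(n)) for n in range(order + 1)]
```

For small sizes this gives 1, 1, 4, 38, 728 connected labeled graphs on 1 to 5 vertices.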
All of the above can be generalized to a weighted
setting. Namely, if A is a set of objects (labeled or
unlabeled), and if $w: A\to R$ is a weight function
from A into some ring R, then all of the above
remains true, if we replace the definitions of $F_A(x)$
and $E_A(x)$ above by the weighted sums

$$F_A(x)=\sum_{a\in A}w(a)\,x^{|a|}$$

and

$$E_A(x)=\sum_{a\in A}w(a)\,\frac{x^{|a|}}{|a|!}$$

respectively, if in the definition of the union of A
and B we define the weight of an object to be its
weight in A, respectively B, if in the definition of the
product of A and B we define the weight of an
object (a,b) to be the product of the weights of a
and b, and if in the definition of the substitution we
define the weight of an object in A(B) as the product
of the weights of the objects from B that were put in
place of the atoms.


Redfield-Polya Theory of Colored 
Enumeration 


The natural and uniform environment for the
separate treatment of generating functions for
unlabeled and labeled objects in the last section is
the theory for counting colored objects founded by
Redfield and Pólya, in the modern treatment
through cycle index series due to Joyal. We refer
the reader to Bergeron et al. (1998, appendix 1),
de Bruijn (1981), and Stanley (1999, chapter 7) for
further reading.

Let A be a set of labeled objects with the
constraint that there is only a finite number of
objects of a given size. The cycle index series for A is
the formal multivariable series

$$Z_A(x_1,x_2,\ldots)=\sum_{n\ge 0}\frac{1}{n!}\sum_{\sigma\in\mathfrak{S}_n}\mathrm{fix}_\sigma(A)\,x_1^{c_1(\sigma)}x_2^{c_2(\sigma)}x_3^{c_3(\sigma)}\cdots \qquad [16]$$

where $\mathrm{fix}_\sigma(A)$ is the number of objects a from A that
remain invariant when the labels are permuted
according to the permutation $\sigma$ (in particular, if $\sigma\in\mathfrak{S}_n$,
the size of a must be n in order that $\sigma$ can be
applied to the labels), and where $c_i(\sigma)$ denotes the
number of cycles of length i of $\sigma$.

In most cases, it is difficult to obtain compact
expressions for the cycle index series. However, for
our familiar families of objects, compact expressions
are available:

$$Z_{\mathrm{Sets}}(x_1,x_2,\ldots)=\exp\left(x_1+\frac{x_2}{2}+\frac{x_3}{3}+\cdots\right) \qquad [17]$$

$$Z_{\mathrm{Permutations}}(x_1,x_2,\ldots)=\prod_{i\ge 1}\frac{1}{1-x_i} \qquad [18]$$

$$Z_{\mathrm{Cycles}}(x_1,x_2,\ldots)=\sum_{i\ge 1}\frac{\phi(i)}{i}\log\frac{1}{1-x_i} \qquad [19]$$

where $\phi(i)$ is the Euler totient function (the number
of positive integers $j\le i$ relatively prime to i).
What makes the cycle index series so fundamental
is the fact that the generating functions from the last
section are specializations of it. Namely, the
exponential generating function for A is equal to

$$E_A(x)=Z_A(x,0,0,\ldots) \qquad [20]$$

If, given the set of labeled objects A, we produce a
set of unlabeled objects $\tilde A$ by taking all the objects
from A but forgetting the labels, then the ordinary
generating function for $\tilde A$ is another specialization
of the cycle index series,

$$F_{\tilde A}(x)=Z_A(x,x^2,x^3,\ldots) \qquad [21]$$


The cycle index series satisfies the following
properties with respect to union, product and
composition of sets of objects:

$$Z_{A\cup B}(x_1,x_2,\ldots)=Z_A(x_1,x_2,\ldots)+Z_B(x_1,x_2,\ldots) \qquad [22]$$

$$Z_{A\times B}(x_1,x_2,\ldots)=Z_A(x_1,x_2,\ldots)\cdot Z_B(x_1,x_2,\ldots) \qquad [23]$$

$$Z_{A(B)}(x_1,x_2,\ldots)=Z_A\bigl(Z_B(x_1,x_2,x_3,\ldots),\,Z_B(x_2,x_4,x_6,\ldots),\,Z_B(x_3,x_6,x_9,\ldots),\ldots\bigr) \qquad [24]$$

Similar to the theory of generating functions
surveyed in the last section, one can also develop a
weighted version of the cycle index series. Given a set
of labeled objects A, where each object a is assigned a
weight w(a), one changes the definition [16] insofar as
$\mathrm{fix}_\sigma(A)$ gets replaced by the weighted sum
$\sum_{a\in A,\ \sigma(a)=a}w(a)$, where $\sigma(a)$ means the object arising
from a by permuting the labels according to $\sigma$. Then all
the above formulas remain true in this weighted setting.

Cycle index series are instrumental in the enumeration
of colored objects. The basic situation is
that we are given a set $\tilde A$ of unlabeled objects such
that every object of size n comes with n atoms
(nodes). For example, we may think of $\tilde A$ as the set
of cycles. We are now going to color each atom by a
color from the set of colors C. The question that we
pose is: how many different colored objects of a
given size are there? In our example, if C consists of
the two colors "black" and "white," then we are
asking the question of how many necklaces one can
make out of n pearls that can be black or white. In
terms of generating functions, we want to compute

$$F^{C}_{\tilde A}(x)=\sum_{c}x^{|c|}$$

where the sum is over all colored objects c that one
can obtain by coloring the objects from $\tilde A$.

The central result of Redfield–Pólya theory is that,
if A is the set of labeled objects that one obtains
from $\tilde A$ by labeling the objects of $\tilde A$ in all possible
ways, then

$$F^{C}_{\tilde A}(x)=Z_A(|C|x,\,|C|x^2,\,|C|x^3,\ldots)$$
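For the necklace example, substituting $x_i=|C|x^i$ into the cycle index series of Cycles, $\sum_{i\ge1}\frac{\phi(i)}{i}\log\frac{1}{1-x_i}$, and extracting the coefficient of $x^n$ gives the count $\frac1n\sum_{d\mid n}\phi(d)\,|C|^{n/d}$ of necklaces with n pearls (up to rotation). The following sketch, with hypothetical function names, compares this with a brute-force enumeration of colorings up to rotation.

```python
from itertools import product
from math import gcd

def totient(i):
    """Euler totient: number of 1 <= j <= i with gcd(i, j) == 1."""
    return sum(1 for j in range(1, i + 1) if gcd(i, j) == 1)

def necklaces_formula(n, colors):
    """Coefficient of x^n in Z_Cycles(|C| x, |C| x^2, ...): (1/n) sum_{d | n} phi(d) |C|^(n/d)."""
    total = sum(totient(d) * colors ** (n // d)
                for d in range(1, n + 1) if n % d == 0)
    return total // n

def necklaces_brute_force(n, colors):
    """Count colorings of a cycle of n atoms, identifying rotations."""
    seen = set()
    for word in product(range(colors), repeat=n):
        # The lexicographically smallest rotation serves as canonical form.
        seen.add(min(word[i:] + word[:i] for i in range(n)))
    return len(seen)
```

For example, there are 14 necklaces with 6 pearls in two colors.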


There is again a weighted version. One allows the
objects a from $\tilde A$ to have weight $w(a)\in R$. Moreover,
one assumes a weight function $f: C\to R$ on
the colors with values in the ring R. One defines the
weight of a colored object obtained by coloring
the atoms of a to be w(a) multiplied by the product
of all f(c), where c ranges over all the colors of the
atoms (including repetitions of colors). Let $F_{\tilde A}(w,f)$
denote the sum of all the weights of all colored
objects obtained from $\tilde A$. Then

$$F_{\tilde A}(w,f)=Z_A\Bigl(\sum_{c\in C}f(c),\ \sum_{c\in C}f(c)^2,\ \sum_{c\in C}f(c)^3,\ldots\Bigr)$$

We remark that these results cover also the case of 
enumeration of objects under a group action. This 
includes the enumeration of objects on which we 
impose certain symmetries. See Bergeron et al. 
(1998, appendix 1), de Bruijn (1981), and Stanley 
(1999, chapter 7) for more details. The enumeration 
of asymmetric objects is the subject of an ongoing 
research program (cf. Labelle and Lamathe (2004)). 


Solving Equations for Generating 
Functions: The Lagrange Inversion 
Formula and the Kernel Method 


In this section, we describe two methods to solve
functional equations for generating functions. The
Lagrange inversion makes it possible (in some situations)
to find explicit expressions for the coefficients of
an implicitly given series. The kernel method (and its
extensions), on the other hand, is a powerful method
to obtain an explicit expression for an implicitly given
function. We refer the reader to Flajolet and
Sedgewick (section VII.5 of the reference in "Further
reading" section) for further reading.

In many situations it will happen that, when we
apply the methods from the last section, we end up
with a functional equation for the generating function
$f(x)=\sum_{n\ge 0}f_nx^n$ that we wanted to compute. For
example, if $t_n$ denotes the number of labeled rooted
trees with n nodes, and if we write $T(x)=\sum_{n\ge 0}t_nx^n/n!$,
then, by applying a straightforward
decomposition of a tree into its root and its set of
subtrees attached to the root, we obtain the equation

$$T(x)=x\exp(T(x)) \qquad [25]$$

How does one solve such an equation? As a matter
of fact, for T(x), there is no expression in terms of
known functions. However, the Lagrange inversion
formula enables one to find the coefficients $t_n/n!$ of
T(x) explicitly. The theorem reads as follows.


Theorem Let g(x) be a formal Laurent series
containing only a finite number of negative powers
of x, and let f(x) be a formal power series without
constant term. If we expand g(x) in powers of f(x),

$$g(x)=\sum_{k}c_kf(x)^k \qquad [26]$$

then the coefficients $c_n$ are given by

$$c_n=\frac1n[x^{-1}]\,g'(x)\,f(x)^{-n}\quad\text{for } n\ne 0 \qquad [27]$$

or, alternatively, by

$$c_n=[x^{-1}]\,g(x)\,f'(x)\,f(x)^{-n-1} \qquad [28]$$

Here, $[x^n]h(x)$ denotes the coefficient of $x^n$ in the
power series h(x).


With this theorem in hand, eqn [25] is easy to
solve. We write it in the form

$$T(x)\exp(-T(x))=x \qquad [29]$$

We want to know the coefficients in the expansion
$T(x)=\sum_{n\ge 0}\frac{t_n}{n!}x^n$. Since, by [29], T(x) is the
compositional inverse of x exp(−x), substitution of
x exp(−x) instead of x gives

$$x=\sum_{n\ge 0}\frac{t_n}{n!}\,\bigl(x\exp(-x)\bigr)^n$$

This equation is in the form [26] with f(x) =
x exp(−x) and g(x) = x. Hence, by [27], we obtain

$$\frac{t_n}{n!}=\frac1n[x^{-1}]\bigl(x\exp(-x)\bigr)^{-n}=\frac1n[x^{n-1}]\exp(nx)=\frac{n^{n-1}}{n!}$$

and, thus, $t_n=n^{n-1}$.
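The result of the Lagrange inversion can be cross-checked without the theorem by solving T(x) = x exp(T(x)) iteratively as a truncated power series: starting from T = 0, each substitution into the right-hand side fixes one more coefficient. This fixed-point iteration is our own illustrative device (it is not the method of the text), and the helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def series_exp(a, order):
    """Coefficients of exp(a(x)) up to x^order, for a series with a[0] == 0."""
    e = [Fraction(0)] * (order + 1)
    e[0] = Fraction(1)
    for n in range(1, order + 1):
        e[n] = sum((k * a[k] * e[n - k] for k in range(1, n + 1)), Fraction(0)) / n
    return e

def rooted_tree_counts(order):
    """Numbers t_0..t_order of labeled rooted trees, from T(x) = x exp(T(x))."""
    T = [Fraction(0)] * (order + 1)
    for _ in range(order):
        # One more coefficient of T becomes correct in each round.
        shifted = series_exp(T, order)[:order]
        T = [Fraction(0)] + shifted          # multiplication by x
    return [int(T[n] * factorial(n)) for n in range(order + 1)]
```

The computed values 1, 2, 9, 64, 625, ... agree with $n^{n-1}$.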
The second method to solve functional equations
which we explain in this section is the kernel
method. We illustrate the method by an example.
Let us consider the problem of counting Dyck paths
of length 2n (see the section "Basic combinatorial
terminology"). Rather than attempting to arrive at a
solution of the problem directly, we consider the
more general problem of counting the number $a_{n,k}$
of paths consisting of steps (1,1) and (1,−1), which
start at the origin, never drop below y=0, have
length n, and end at height k. We then form the
bivariate generating function $F(u,x)=\sum_{n,k\ge 0}a_{n,k}x^nu^k$.
We then have the functional equation

$$F(u,x)=1+xuF(u,x)+\frac{x}{u}\bigl(F(u,x)-F(0,x)\bigr) \qquad [30]$$

since a path can be empty (this explains the term 1),
it can end by a step (1,1) (this explains the term
xuF(u,x)), or it can end by a step (1,−1). The latter
can only happen if the path before that last step did
not end at height 0. The generating function for
these paths is F(u,x) − F(0,x), and this explains the
third term in the eqn [30]. In fact, we may replace
[30] by

$$F(u,x)=1+xuF(u,x)+\frac{x}{u}\bigl(F(u,x)-F_1(x)\bigr) \qquad [31]$$

because [31] implies that $F_1(x)=F(0,x)$.

The idea of the kernel method is to get rid of the
unknown series F(u,x). This is possible because F(u,x)
occurs linearly in [31], which can be rewritten as

$$F(u,x)\left(1-xu-\frac{x}{u}\right)=1-\frac{x}{u}F_1(x) \qquad [32]$$

We simply equate the coefficient of F(u,x) in this
equation to zero,

$$1-xu-\frac{x}{u}=0$$

solve this for u,

$$u=\frac{1-\sqrt{1-4x^2}}{2x}$$

(the other solution for u makes no sense in [31]),
and substitute this back in [32], to obtain

$$F_1(x)=\frac{1-\sqrt{1-4x^2}}{2x^2}$$

the familiar generating function [2] for the Catalan
numbers (with x replaced by $x^2$, since here x marks
single steps and a Dyck path has an even number of them).
Now, by substituting this result in [31], we
can even compute the full series F(u,x).
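The outcome of the kernel method can be confirmed by computing the numbers $a_{n,k}$ directly from the last-step decomposition that underlies the functional equation. The sketch below (names are ours) checks that the paths ending at height 0 are counted by Catalan numbers, and that the series $u(x)=\sum_m C_m x^{2m+1}$, which equals $(1-\sqrt{1-4x^2})/(2x)$, satisfies the kernel equation in the equivalent form $u=x(1+u^2)$.

```python
from math import comb

def path_counts(max_length):
    """a[n][k] = number of paths with n steps (1,1)/(1,-1) from the origin
    that never drop below y = 0 and end at height k."""
    a = [[0] * (max_length + 2) for _ in range(max_length + 1)]
    a[0][0] = 1  # the empty path
    for n in range(1, max_length + 1):
        for k in range(n + 1):
            ended_with_up = a[n - 1][k - 1] if k >= 1 else 0
            ended_with_down = a[n - 1][k + 1]
            a[n][k] = ended_with_up + ended_with_down
    return a

def catalan(m):
    return comb(2 * m, m) // (m + 1)

def kernel_root_coefficients(max_degree):
    """Coefficients of u(x) = sum_m C_m x^(2m+1), truncated at x^max_degree."""
    u = [0] * (max_degree + 1)
    for m in range((max_degree - 1) // 2 + 1):
        u[2 * m + 1] = catalan(m)
    return u

def satisfies_kernel_equation(u):
    """Check u = x (1 + u^2) up to the truncation order of the list u."""
    size = len(u)
    square = [sum(u[i] * u[d - i] for i in range(d + 1)) for d in range(size)]
    rhs = [0] + [(1 if d == 0 else 0) + square[d] for d in range(size - 1)]
    return rhs == u
```

Multiplying $1-xu-x/u=0$ by u shows that it is indeed equivalent to $u=x(1+u^2)$.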

While this was certainly a complicated, and
unusual, way to compute the Catalan numbers,
this approach generalizes when one considers
paths with different step sets (see section VII.5 of
the Flajolet and Sedgewick reference in "Further
reading" section). In a more general situation, one
has a functional equation

$$P\bigl(F(u,x),F_1(x),\ldots,F_k(x),x,u\bigr)=0 \qquad [33]$$

where F(u,x) appears linearly, as do the
unknown series $F_1(x),\ldots,F_k(x)$, whereas x and u
appear rationally. It is clear that one can apply the
same technique, namely collecting all the terms
involving F(u,x), equating the coefficient of F(u,x)
to zero, solving for u and substituting back in [33]. If
there is more than one function $F_i(x)$, then this will
only give one equation for the $F_i(x)$. However, since
equating the coefficient of F(u,x) to zero yields a
polynomial equation, there can be more solutions.
(That was actually also the case in our example,
although only one solution could be used.) All these
solutions can be substituted in [33] to give many
more equations for the $F_i(x)$. The kernel method will
work if we have enough equations to determine the
unknown functions $F_i(x)$ (see the Flajolet and
Sedgewick reference, section VII.5 for further details).
In the variant of the “obstinate kernel method,” 
more equations are produced in more sophisticated 
ways. The method has been largely extended by 
Bousquet-Mélou and co-workers to cover equations 
of the form [33], where P is a polynomial such that 
eqn [33] determines all involved series uniquely. This 
extension covers in particular the so-called quadratic 
method due to Brown, which is of great significance 
in the work of Tutte on the enumeration of maps. 
We refer the reader to Bousquet-Mélou and Jehanne 
(2005) and the references given there for these 
extensions. 


Extracting Asymptotic Information 
from Generating Functions 


There is powerful machinery available to extract the 
asymptotic behavior of the coefficients of a power 
series out of analytic properties of the power series. 
We describe the corresponding methods, singularity 
analysis and the saddle point method in this section. 
The survey by Odlyzko (1995) and the Flajolet and 
Sedgewick reference in “Further reading” are excel- 
lent sources for further reading, which, in particular, 
contain several other methods which we cannot 
cover here for reasons of limited space. 

Let us suppose that we are interested in the
asymptotic behavior of the sequence $(f_n)_{n\ge 0}$ of real
(or complex) numbers as n tends to infinity. Let us
suppose that the power series $f(z)=\sum_{n\ge 0}f_nz^n$
converges in some neighborhood of the origin. (If
this series converges only at z=0, then either one
has to try to scale, that is, for example, look at the
power series $f(z)=\sum_{n\ge 0}f_nz^n/n!$ instead, or one
must apply methods other than singularity analysis
or the saddle point method. In the latter case,
depending on the nature of the coefficients $f_n$, this
may be the Euler–Maclaurin or the Poisson summation
formulas, the Mellin transform technique, or
other direct methods. The reader is referred to
Odlyzko (1995) and the Flajolet and Sedgewick
reference.) The idea is then to consider f(z) as a
complex function in z (and extend the range of f
beyond the disk of convergence about the origin),
and to study the singularities of f(z). (The point at
infinity can also be a singularity.) The upshot is that
the singularities of f(z) with smallest modulus
dictate the asymptotic behavior of the coefficients
$f_n$. These singularities of smallest modulus are called
the dominant singularities.

If there is an infinite number of dominant 
singularities, then one has to try the circle method. 
We refer the reader to Andrews (1976) and Ayoub 
(1963) for details of this method. 

If there is a finite number of dominant singularities,
then there can be again two different situations,
depending on whether these are "small" or
"large" singularities. Roughly speaking, a singularity
is small if the function f(z) grows at most
polynomially when z approaches the singularity,
otherwise it is "large." A typical example of a small
singularity is z=1/4 in $(1-4z)^{1/2}$, whereas a
typical example of a large singularity is $z=\infty$ in
exp(z) or z=1 in exp(1/(1−z)).

The method to apply for small singularities is the 
method of singularity analysis as developed by 
Flajolet and Odlyzko. (Singularity analysis implies 
Darboux’s method, which occurs frequently in the 
literature, and, thus, supersedes it.) For the sake of 
simplicity, we consider first only the case of a 
unique dominant singularity. We shall address the 
issue of several dominant singularities shortly. 
Furthermore, we assume the singularity to be 
z=1, again for the sake of simplicity of presenta- 
tion. The general result can then be obtained by 
rescaling z. 

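The basic idea is the transfer principle:

$$\text{If } f(z)=\sigma(z)+O(\tau(z)) \text{ then } f_n=\sigma_n+O(\tau_n) \qquad [34]$$

where $\sigma(z)=\sum_{n\ge 0}\sigma_nz^n$ is a linear combination of
standard functions of the form $(1-z)^{-\alpha}$, or logarithmic
variants, and $\tau(z)=\sum_{n\ge 0}\tau_nz^n$ also lies in
the scale (see sections VI.3,4 of the Flajolet and
Sedgewick reference for the exact statement). The
expansion for f(z) in [34] is called the singular
expansion of f(z). For the above-mentioned standard
functions, we have

$$[z^n](1-z)^{-\alpha}\sim\frac{n^{\alpha-1}}{\Gamma(\alpha)}\left(1+\frac{\alpha(\alpha-1)}{2n}+\frac{\alpha(\alpha-1)(\alpha-2)(3\alpha-1)}{24n^2}+\cdots\right)$$

$$[z^n](1-z)^{-\alpha}\left(\frac1z\log\frac{1}{1-z}\right)^{\beta}\sim\frac{n^{\alpha-1}}{\Gamma(\alpha)}(\log n)^{\beta}\left(1+\frac{C_1}{1!}\frac{\beta}{\log n}+\frac{C_2}{2!}\frac{\beta(\beta-1)}{(\log n)^2}+\cdots\right) \qquad [35]$$

where $[z^n]g(z)$ denotes the coefficient of $z^n$ in g(z),
and where

$$C_k=\Gamma(\alpha)\,\frac{d^k}{ds^k}\frac{1}{\Gamma(s)}\bigg|_{s=\alpha}$$

If $\alpha$ is a nonpositive integer, then this expansion has
to be taken with care (cf. section VI.2 of the Flajolet
and Sedgewick reference).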

To see how this works, consider the example
$f_n=\sum_{m=0}^{n}\binom{2m}{m}$. We have

$$f(z)=\sum_{n\ge 0}f_nz^n=\frac{1}{(1-z)\sqrt{1-4z}}$$

The function on the right-hand side is meromorphic
in all of $\mathbb{C}$ (where $\mathbb{C}$ denotes the complex numbers),
with singularities at z=1 and z=1/4. The dominant
singularity is z=1/4. We determine the
singular expansion of f(z) about z=1/4,

$$f(z)=\frac43(1-4z)^{-1/2}-\frac49(1-4z)^{1/2}+\frac{4}{27}(1-4z)^{3/2}+O\bigl((1-4z)^{5/2}\bigr)$$

(We stopped the expansion after three terms. The
farther we go, the more terms we can compute
of the asymptotic expansion for $f_n$.) Hence, we
obtain

$$f_n=4^n\left(\frac{4\,n^{-1/2}}{3\,\Gamma(1/2)}\left(1-\frac{1}{8n}+\frac{1}{128n^2}\right)-\frac{4\,n^{-3/2}}{9\,\Gamma(-1/2)}\left(1+\frac{3}{8n}\right)+\frac{4\,n^{-5/2}}{27\,\Gamma(-3/2)}+O\bigl(n^{-7/2}\bigr)\right)$$
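A numerical comparison makes the quality of such an expansion tangible. The sketch below evaluates, for the example $f_n=\sum_{m=0}^n\binom{2m}{m}$, the three-term approximation $4^n\bigl(\frac{4n^{-1/2}}{3\Gamma(1/2)}(1-\frac{1}{8n}+\frac{1}{128n^2})-\frac{4n^{-3/2}}{9\Gamma(-1/2)}(1+\frac{3}{8n})+\frac{4n^{-5/2}}{27\Gamma(-3/2)}\bigr)$ obtained by transferring the singular expansion at z = 1/4 term by term, and compares it with the exact value. The function names are ours, and the coefficients are those derived from the standard expansion of $[z^n](1-z)^{-\alpha}$.

```python
from math import comb, gamma

def exact_value(n):
    """f_n = sum_{m=0}^{n} binomial(2m, m), computed exactly."""
    return sum(comb(2 * m, m) for m in range(n + 1))

def asymptotic_value(n):
    """Three-term singularity-analysis approximation of f_n."""
    first = 4 / (3 * gamma(0.5)) * n ** -0.5 * (1 - 1 / (8 * n) + 1 / (128 * n ** 2))
    second = -4 / (9 * gamma(-0.5)) * n ** -1.5 * (1 + 3 / (8 * n))
    third = 4 / (27 * gamma(-1.5)) * n ** -2.5
    return 4 ** n * (first + second + third)

def relative_error(n):
    return abs(asymptotic_value(n) / exact_value(n) - 1)
```

Since the neglected terms are of relative order $n^{-3}$, the relative error should shrink roughly by a factor of 8 each time n is doubled.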
If there are several small dominant singularities
(but only a finite number of them), then one simply 
applies the above procedure for all of them and, to 
obtain the desired asymptotic expansion, one adds 
up the corresponding contributions. 










The method to apply for large singularities is the
saddle point method. For the following considerations,
we assume that f(z) is analytic in $|z|<R\le\infty$.
At the heart of the saddle point method lies
Cauchy's formula

$$f_n=[z^n]f(z)=\frac{1}{2\pi i}\oint_{C}\frac{f(z)}{z^{n+1}}\,dz \qquad [36]$$

for writing the nth coefficient in the power series
expansion of f(z). Here, C is some simple closed
contour around the origin that stays in the range 
|z| < R. The idea is to exploit the fact that we are 
free to deform the contour. The aim is to choose a 
contour such that the main contribution to the 
integral in [36] comes from a very tiny part of the 
contour, whereas the contribution of the rest is 
negligible. This will be possible if we put the 
contour through a saddle point of the integrand 
f(z)/z"*!. Under suitable conditions, the main 
contribution will then come from the small passage 
of the path through the saddle point, and the 
contribution of the rest will be negligible. 

In practice, the saddle point method is not always 
straightforward to apply, but has to be adapted to the 
specific properties of the function f(z) that we are 
encountering. We refer the reader to the correspond- 
ing chapters in the Flajolet and Sedgewick reference 
and Odlyzko (1995) for more details. There is one 
important exception though, namely the Hayman 
admissible functions. We will not reproduce the 
definition of Hayman admissibility because it is 
cumbersome (cf. section VIII.5 in the Flajolet and 
Sedgewick reference and definition 12.4 of Odlyzko 
(1995)). However, in many applications, it is not 
even necessary to go back to it because of the closure 
properties of Hayman admissible functions. Namely, 
it is known (cf. Odlyzko (1995), theorem 12.8) that
exp(p(z)) is Hayman admissible in $|z|<\infty$ for any
polynomial p(z) with real coefficients as long as the
coefficients $a_n$ of the Taylor series of exp(p(z)) are
positive for all sufficiently large n (thus, e.g., exp(z)
is Hayman admissible), and it is known that, if f(z)
and g(z) are Hayman admissible in $|z|<R\le\infty$, then
exp(f(z)) and f(z)g(z) are also (thus, e.g.,
exp(exp(z) − 1) is Hayman admissible).

The central result of Hayman's theory is the
following: if $f(z)=\sum_{n\ge 0}f_nz^n$ is Hayman admissible
in |z| < R, then

$$f_n\sim\frac{f(r_n)}{r_n^{\,n}\sqrt{2\pi\,b(r_n)}}\quad\text{as } n\to\infty \qquad [37]$$

where $r_n$ is the unique solution for large n of the
equation a(r)=n in $(R_0,R)$, with a(r)=rf'(r)/f(r),
b(r)=ra'(r), and a suitably chosen constant $R_0>0$.
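The simplest test case for [37] is f(z) = exp(z), for which a(r) = r, b(r) = r and $f_n=1/n!$, so that [37] reduces to Stirling's formula. The sketch below evaluates the right-hand side of [37] for user-supplied functions f, a and b. The bisection used to solve a(r) = n and all names are our own illustrative choices; a is assumed to be increasing on the search interval.

```python
from math import exp, factorial, pi, sqrt

def hayman_estimate(f, a, b, n, lower, upper):
    """Right-hand side of [37]: f(r_n) / (r_n^n * sqrt(2 pi b(r_n))),
    where r_n solves a(r) = n (found by bisection on [lower, upper])."""
    for _ in range(200):
        middle = (lower + upper) / 2
        if a(middle) < n:
            lower = middle
        else:
            upper = middle
    r = (lower + upper) / 2
    return f(r) / (r ** n * sqrt(2 * pi * b(r)))

def stirling_ratio(n):
    """Ratio of the Hayman estimate for exp(z) to the exact coefficient 1/n!."""
    estimate = hayman_estimate(exp, lambda r: r, lambda r: r, n, 0.0, 10.0 * n)
    return estimate * factorial(n)
```

For n = 20 the ratio is close to 1 + 1/(12n), as expected from the next term in Stirling's series.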


This result covers only the first term in the 
asymptotic expansion. There is an even more 
sophisticated theory due to Harris and Schoenfeld, 
which allows one to also find a complete asymptotic 
expansion. We refer the reader to section VIII.5 of 
the Flajolet and Sedgewick reference and Odlyzko 
(1995) for more details. 

Methods for the asymptotic analysis of multi- 
variable generating functions are also available 
(see the corresponding chapters in Flajolet and 
Sedgewick, Odlyzko (1995) and the recent impor- 
tant development surveyed in the Pemantle and 
Wilson reference listed in “Further reading”). We 
add that both the method of singularity analysis and 
Hayman’s theory of admissible functions have been 
made largely automatic, and that this has been 
implemented in the Maple program gdev (see 
“Further reading”). 


The Theory of Heaps 


The theory of heaps, developed by Viennot, is a 
geometric rendering of the theory of the partial 
commutation monoid of Cartier and Foata, which 
is now most often called the Cartier—Foata monoid. 
Its importance stems from the fact that several 
objects which appear in statistical physics, such as 
Motzkin paths, animals, respectively polyominoes, 
or Lorentzian triangulations (see the Viennot and 
James reference in “Further reading” and the 
references therein), are in bijection with heaps. 

Informally, a heap is what we would imagine. We 
take a collection of “pieces,” say $B_1, B_2, \ldots$, and put 
them one upon the other, sometimes also sideways, 
to form a “heap,” see Figure 6. 

There, we imagine that pieces can only move 
vertically, so that the heap in Figure 6 would indeed 
form a stable arrangement. Note that we allow 
several copies of a piece to appear in a heap. (This 
means that they differ only by a vertical translation.) 
For example, in Figure 6 there appear two copies of 
$B_2$. Under these assumptions, there are pieces which 
can move past each other, and others which cannot. 
For example, in Figure 6, we can move the piece $B_6$ 
higher up, thus moving it higher than $B_1$ if we wish. 
However, we cannot move $B_7$ higher than $B_6$, 





Figure 6 A heap of pieces. 


because $B_6$ blocks the way. On the other hand, we 
can move $B_7$ past $B_1$ (thus taking $B_6$ with us). Thus, 
a rigorous way to introduce heaps is by beginning 
with a set $\mathcal{B}$ of pieces (in our example, $\mathcal{B} = 
\{B_1, B_2, \ldots, B_7\}$), and we declare which pieces can 
be moved past another and which cannot. We 
indicate this by a symmetric relation $R$: we write 
$aRb$ to indicate that $a$ cannot move past $b$ (and vice 
versa). When we consider a word $a_1 a_2 \ldots a_n$ of 
pieces, $a_i \in \mathcal{B}$, we think of it as putting first $a_1$, then 
putting $a_2$ on top of it (and, possibly, moving it past 
$a_1$), then putting $a_3$ on top of what we already have, 
etc. We declare two words to be equivalent if one 
arises from the other by commuting adjacent letters 
which are not in relation. A heap is then an 
equivalence class of words under this equivalence 
relation. What we have described just now is indeed 
the original definition of Cartier and Foata. 
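
The definition can be made concrete by computing a canonical form for the equivalence class of a word. The following Python sketch lets every piece drop to the lowest level at which it is blocked by an earlier related piece; two words are equivalent exactly when they produce the same set of (piece, level) pairs. The encoding of pieces as tuples and the helper names are assumptions made for illustration, and the relation is assumed to be reflexive (two copies of the same piece cannot pass each other). The sample relation is the one for monomers and dimers introduced in the next paragraph: two pieces are related when their sets of x-coordinates overlap.

```python
from typing import Callable, FrozenSet, Hashable, Sequence, Tuple

Piece = Hashable


def heap_of_word(
    word: Sequence[Piece], related: Callable[[Piece, Piece], bool]
) -> FrozenSet[Tuple[Piece, int]]:
    """Canonical form of a word: each piece with the level at which it comes to rest."""
    placed = []  # list of (piece, level) in order of insertion
    for piece in word:
        level = 0
        for other, other_level in placed:
            if related(piece, other):
                # The new piece cannot move past `other`, so it rests above it.
                level = max(level, other_level + 1)
        placed.append((piece, level))
    return frozenset(placed)


def support(piece: Tuple[str, int]) -> set:
    """x-coordinates covered by a monomer ('m', i) or a dimer ('d', i)."""
    kind, i = piece
    return {i} if kind == "m" else {i - 1, i}


def related(a: Tuple[str, int], b: Tuple[str, int]) -> bool:
    """Hypothetical encoding of the relation R: pieces with overlapping support."""
    return bool(support(a) & support(b))


if __name__ == "__main__":
    d1, d2, d3 = ("d", 1), ("d", 2), ("d", 3)
    print(heap_of_word([d1, d3, d2], related) == heap_of_word([d3, d1, d2], related))
    print(heap_of_word([d1, d2], related) == heap_of_word([d2, d1], related))
```

In this sketch the dimers d1 and d3 commute, so the first comparison holds, whereas d1 and d2 overlap and the order in which they are placed matters.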

The class of heaps which occurs most frequently 
in applications is the class of heaps of monomers 
and dimers, which we now introduce. Let $\mathcal{B} = M \cup D$, 
where $M = \{m_0, m_1, \ldots\}$ is the set of monomers and 
$D = \{d_1, d_2, \ldots\}$ is the set of dimers. We think of a 
monomer $m_i$ as a point, symbolized by a circle, 
with x-coordinate $i$, see Figure 7. We think 
of a dimer $d_i$ as two points, symbolized by circles, 
with x-coordinates $i - 1$ and $i$ which are connected 
by an edge, see Figure 7. We impose the relations 
$m_i R m_i$, $m_i R d_i$, $m_i R d_{i+1}$, $i = 0, 1, \ldots$, and $d_i R d_j$, 
$i - 1 \le j \le i$, and extend $R$ to a symmetric relation. Figure 8 
shows two heaps of monomers and dimers. 

For example, Motzkin paths are in bijection with 
heaps of monomers and dimers. To see this, given a 
Motzkin path, we read the steps of the path from 


Figure 7 Monomers and dimers. 


Figure 8 Two heaps of monomers and dimers. 

Figure 9 Bijection between Motzkin paths and heaps of 
monomers and dimers. 


the beginning to the end. Whenever we read a level- 
step at height $h$, we make it into a monomer with 
x-coordinate $h$; whenever we read a down-step from 
height $h$ to height $h - 1$, we make it into a dimer 
whose endpoints have x-coordinates $h - 1$ and $h$. 
Up-steps are ignored. Figure 9 shows an example. In 
the figure, the heap is not in “standard” fashion, in 
the sense that the x-axis is not shown as a horizontal 
line but as a vertical line (cf. Figure 7). But it could 
be easily transformed into “standard” fashion by a 
simple reflection with respect to a line of slope 1. 

Lattice animals on the triangular lattice and on the 
quadratic lattice are also in bijection with heaps, this 
time with heaps consisting entirely of dimers. 
Given an animal, one simply replaces each vertex of 
the animal by a dimer, see Figures 10 and 11. While 
in the case of animals on the triangular lattice this 
gives a constraintless bijection (see Figure 10), in the 
case of the quadratic lattice this sets up a bijection 
with heaps of dimers in which two dimers of the 
same type can never be placed directly one over the 
other (see Figure 11). For example, two dimers $d_5$, 
one placed directly over the other (as they occur in 
Figure 10), are forbidden under this rule. 

Next we make heaps into a monoid by introdu- 
cing a composition of heaps. (A monoid is a set with 
a binary operation which is associative and which 
has a unit element.) Intuitively, 
given two heaps $H_1$ and $H_2$, the composition of $H_1$ 
and $H_2$, the heap $H_1 \circ H_2$, is the heap which results 




Figure 10 Bijection between animals and heaps of dimers. 







Figure 11  Bijection between animals and heaps of dimers. 
Figure 12 The composition of the heaps in Figure 8. 


by putting $H_2$ on top of $H_1$. In terms of words, the 
composition of two heaps is the equivalence class of 
the concatenation $uw$, where $u$ is a word from the 
equivalence class of $H_1$, and $w$ is a word from the 
equivalence class of $H_2$. 

The composition of the two heaps in Figure 8 is 
shown in Figure 12. 

Given pieces $\mathcal{B}$ with relation $R$, let $\mathcal{H}(\mathcal{B}, R)$ be the 
set of all heaps consisting of pieces from $\mathcal{B}$, 
including the empty heap, the latter denoted by $\emptyset$. 
It is easy to see that the composition makes 
$(\mathcal{H}(\mathcal{B},R), \circ)$ into a monoid with unit $\emptyset$. 

For the statement of the main theorem in the 
theory of heaps, we need two more terms. A trivial 
heap is a heap consisting of pieces all of which are 
pairwise unrelated. Figure 13a shows a trivial heap 
consisting of monomers and dimers. A pyramid is a 
heap with exactly one maximal (= topmost) element. 
Figure 13b shows a pyramid consisting of 
monomers and dimers. Finally, if H is a heap, then 
we write |H| for the number of pieces in H. 

In applications, heaps will have weights, which are 
defined by introducing a weight w(B) for each piece B 
in B, and by extending the weight w to all heaps H by 
letting w(H) denote the product of all weights of the 
pieces in H (multiplicities of pieces included). 

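Let $M$ be a subset of the pieces $\mathcal{B}$. Then, the 
generating function for all heaps with maximal 
pieces contained in $M$ is given by 

$$\sum_{\substack{H \in \mathcal{H}(\mathcal{B},R) \\ \text{maximal pieces} \subseteq M}} w(H) = \frac{\sum_{T \in \mathcal{T}(\mathcal{B} \setminus M, R)} (-1)^{|T|} w(T)}{\sum_{T \in \mathcal{T}(\mathcal{B}, R)} (-1)^{|T|} w(T)}$$
[38] 

where $\mathcal{T}(\mathcal{B},R)$ denotes the set of all trivial heaps 
with pieces from $\mathcal{B}$. In particular, the generating 
function for all heaps is given by 

$$\sum_{H \in \mathcal{H}(\mathcal{B},R)} w(H) = \frac{1}{\sum_{T \in \mathcal{T}(\mathcal{B},R)} (-1)^{|T|} w(T)}$$
[39] 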

Figure 13 (a) A trivial heap. (b) A pyramid. 


Furthermore, if $\mathcal{P}(\mathcal{B},R)$ denotes the set of all 
pyramids with pieces from $\mathcal{B}$, then 

$$\sum_{P \in \mathcal{P}(\mathcal{B},R)} \frac{w(P)}{|P|} = \log \sum_{H \in \mathcal{H}(\mathcal{B},R)} w(H)$$
[40] 

where $|P|$ is the number of pieces of $P$. (As the 
reader will have guessed, this is a consequence of the 
“exponential principle” mentioned in the section 
“generating functions.”) 
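
The inversion formula for the generating function of all heaps can be checked by brute force on a small example. The sketch below is an illustration under stated assumptions: it takes three dimers $d_1, d_2, d_3$ (encoded simply by their indices), gives every piece the weight $x$, enumerates heaps with $k$ pieces as equivalence classes of words of length $k$, and compares the counts with the coefficients of the reciprocal of the alternating trivial-heap polynomial, here $1/(1 - 3x + x^2)$. All function names are hypothetical.

```python
from itertools import combinations, product


def related(i: int, j: int) -> bool:
    """Dimers d_i and d_j (x-coordinates i-1 and i) block each other iff |i - j| <= 1."""
    return abs(i - j) <= 1


def heap_of_word(word):
    """Canonical form of a word: set of (piece, level) pairs after letting pieces drop."""
    placed = []
    for piece in word:
        level = 0
        for other, other_level in placed:
            if related(piece, other):
                level = max(level, other_level + 1)
        placed.append((piece, level))
    return frozenset(placed)


def count_heaps(pieces, size: int) -> int:
    """Number of heaps with `size` pieces, found by enumerating all words."""
    return len({heap_of_word(word) for word in product(pieces, repeat=size)})


def trivial_heap_polynomial(pieces):
    """Coefficient list of sum over trivial heaps T of (-x)^{|T|}."""
    coeffs = [0] * (len(pieces) + 1)
    for k in range(len(pieces) + 1):
        for subset in combinations(pieces, k):
            if all(not related(a, b) for a, b in combinations(subset, 2)):
                coeffs[k] += (-1) ** k
    return coeffs


def reciprocal_series(coeffs, terms: int):
    """First `terms` coefficients of 1 / (coeffs[0] + coeffs[1] x + ...), coeffs[0] = 1."""
    inverse = [1]
    for n in range(1, terms):
        acc = sum(coeffs[k] * inverse[n - k]
                  for k in range(1, min(n, len(coeffs) - 1) + 1))
        inverse.append(-acc)
    return inverse


if __name__ == "__main__":
    pieces = [1, 2, 3]
    print([count_heaps(pieces, k) for k in range(5)])
    print(reciprocal_series(trivial_heap_polynomial(pieces), 5))
```

Both lists agree term by term, which is what the formula predicts for this choice of pieces and weights.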


The Transfer Matrix Method 


The transfer matrix method (cf. Stanley (1986), 
chapter 4 for further reading) applies whenever we 
are able to build the combinatorial objects that we 
are interested in by moving on a finite number of 
states in a step-by-step fashion, where the current 
step does not depend on the previous ones. (In 
statistical language, we are considering a finite-state 
Markov chain.) For example, Motzkin paths which 
are constrained to stay between two parallel lines, 
say between y=0 and y=K, can be described in 
such a way: the states are the heights 0,1,...,K, 
and, if we are in state $h$, then in the next step we are 
allowed to move to states $h + 1$, $h$, or $h - 1$, except 
that from state 0 we cannot move to $-1$ (there is no 
state $-1$), and when we are in state $K$ we cannot 
move to $K + 1$ (there is no state $K + 1$). 

For describing the general situation, let $G = (V, E)$ 
be a directed graph with vertex set $V$ and edge set $E$. Let 
$w_n(u,v)$ denote the number of walks of length $n$ from vertex $u$ to 
vertex $v$ along edges of $G$. To compute these numbers, 
we consider the adjacency matrix of $G$, $A(G)$. By 
definition, using our notation, $A(G) = (w_1(u,v))_{u,v \in V}$. 
Obviously, $(w_n(u,v))_{u,v \in V} = (A(G))^n$. Thus, 

$$\left( \sum_{n=0}^{\infty} w_n(u,v)\, x^n \right)_{u,v \in V} = \sum_{n=0}^{\infty} A(G)^n x^n = (I - A(G)x)^{-1}$$

where $I$ is the identity matrix of the same size as $A(G)$. In other words, 
the generating functions $\sum_{n \ge 0} w_n(u,v) x^n$ for the 
walk numbers between $u$ and $v$ form the entries of a 
matrix which is the inverse matrix of $I - A(G)x$. By 
elementary linear algebra, 

$$\sum_{n=0}^{\infty} w_n(u,v)\, x^n = \frac{(-1)^{\#u + \#v} \det (I - A(G)x)_{v,u}}{\det (I - A(G)x)}$$
[41] 

where $\det (I - A(G)x)_{v,u}$ is the minor of $I - A(G)x$ 
with the row indexed by $v$ and the column indexed 
by $u$ omitted, and where $\#u$ denotes the row 
number of $u$ and similarly for $\#v$. A weighted 
version could also be developed in the same way, 
where we put a weight $w(e)$ on each edge, and the 
weight of a walk is the product of the weights of all 
its edges. 
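
For the bounded Motzkin paths mentioned at the beginning of this section, the method amounts to taking powers of a tridiagonal 0/1 matrix. The following Python sketch (plain lists, no external libraries; the function names are illustrative) builds the adjacency matrix on the states $0, 1, \ldots, K$ and reads off the number of walks of length $n$ from one height to another.

```python
def bounded_motzkin_adjacency(K: int):
    """Adjacency matrix on states 0..K with moves h -> h-1, h, h+1 inside the strip."""
    size = K + 1
    A = [[0] * size for _ in range(size)]
    for h in range(size):
        for nxt in (h - 1, h, h + 1):
            if 0 <= nxt <= K:
                A[h][nxt] = 1
    return A


def mat_mul(X, Y):
    """Product of two square integer matrices given as lists of lists."""
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(A, n: int):
    """n-th power of a square matrix by repeated multiplication."""
    size = len(A)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, A)
    return result


def count_walks(K: int, n: int, start: int, end: int) -> int:
    """Number of n-step walks from height `start` to height `end` staying in 0..K."""
    return mat_pow(bounded_motzkin_adjacency(K), n)[start][end]


if __name__ == "__main__":
    # Motzkin paths of length 4 that stay between y = 0 and y = K.
    print(count_walks(1, 4, 0, 0), count_walks(2, 4, 0, 0))
```

For paths of length 4 the bound $K = 2$ is not restrictive, so the second value is the Motzkin number 9, whereas $K = 1$ excludes one path.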

In particular, the expression [41] is a rational 
function in $x$. Then, by the basic theorem on 
rational generating functions (cf. Stanley (1986), 
section 4.1), the number $w_n(u,v)$ can be expressed as 
a sum $\sum_i P_i(n)\gamma_i^n$, where the $\gamma_i$’s are the different 
roots of the polynomial $\det (xI - A(G))$, $I$ being the 
identity matrix, and $P_i(n)$ 
is a polynomial of degree less than the multiplicity 
of the root $\gamma_i$. (The $P_i(n)$’s depend on $u$ and $v$, 
whereas the $\gamma_i$’s do not.) If there exists a unique root 
$\gamma_1$ with maximal modulus, then this implies that, 
asymptotically as $n \to \infty$, $w_n(u,v) \sim P_1(n)\gamma_1^n$. 


Lattice Paths 


Recall from the section on basic combinatorial 
terminology that a lattice path $P$ in $\mathbb{Z}^d$ is a path in 
the $d$-dimensional integer lattice $\mathbb{Z}^d$ which uses only 
points of the lattice, that is, it is a sequence 
$(P_0, P_1, \ldots, P_l)$, where $P_i \in \mathbb{Z}^d$ for all $i$. The vectors 
$\overrightarrow{P_0P_1}, \overrightarrow{P_1P_2}, \ldots, \overrightarrow{P_{l-1}P_l}$ are called the steps of $P$. The 
number of steps, $l$, is called the length of $P$. 

The enumeration of lattice paths has always 
been an intensively studied topic in statistics, 
because of their importance in the study of 
random walks, of rank order statistics for non- 
parametric testing, and of queueing processes. The 
reader is referred to Feller (1957) and particularly 
Mohanty’s (1979) book, which is a rich source for 
enumerative results on lattice paths, albeit in a 
statistical language. We review the most important 
results in this section. Most of these concern two- 
dimensional lattice paths, that is, the case d=2. 

To begin with, we consider paths in the integer 
plane $\mathbb{Z}^2$ consisting of horizontal and vertical unit 
steps in the positive direction. Clearly, the number 
of all (unrestricted) paths from the origin to $(n, m)$ is 
the binomial coefficient $\binom{n+m}{n}$. By the reflection 
principle, which is commonly attributed to D. André 
(see, e.g., Comtet (1974) p. 22), it follows that the 
number of paths from the origin to $(n,m)$ which do 
not pass above the line $y = x + t$, where $m \le n + t$, is 
given by 

$$\binom{n+m}{n} - \binom{n+m}{n+t+1}$$
[42] 

Roughly, the reflection principle sets up a bijection 
between the paths from the origin to $(n,m)$ 
which do pass above the line $y = x + t$ and all paths 
from $(-t-1, t+1)$ to $(n,m)$, by reflecting the path 
portion between the origin and the last touching 
point on $y = x + t + 1$ in this latter line. Thus, the 
result of the enumeration problem is the number of 
all paths from $(0, 0)$ to $(n,m)$, which is given by the 
binomial coefficient $\binom{n+m}{n}$, minus the number of all 
paths from $(-t-1, t+1)$ to $(n,m)$, which is given 
by the binomial coefficient $\binom{n+m}{n+t+1}$. Whence the 
formula [42]. 
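
The difference of two binomial coefficients produced by the reflection principle is easy to confirm numerically. The Python sketch below counts the restricted paths by dynamic programming over lattice points, skipping every point above the line $y = x + t$, and compares the result with $\binom{n+m}{n} - \binom{n+m}{n+t+1}$. The function names are chosen for illustration, and $t \ge 0$ is assumed.

```python
from math import comb


def count_paths_not_above(n: int, m: int, t: int) -> int:
    """Paths from (0,0) to (n,m) with unit east/north steps and y <= x + t throughout."""
    ways = [[0] * (m + 1) for _ in range(n + 1)]
    for x in range(n + 1):
        for y in range(m + 1):
            if y > x + t:
                continue  # the point lies above the line and is forbidden
            if x == 0 and y == 0:
                ways[x][y] = 1
                continue
            from_left = ways[x - 1][y] if x > 0 else 0
            from_below = ways[x][y - 1] if y > 0 else 0
            ways[x][y] = from_left + from_below
    return ways[n][m]


def reflection_formula(n: int, m: int, t: int) -> int:
    """All paths minus the reflected paths; math.comb returns 0 when k > n."""
    return comb(n + m, n) - comb(n + m, n + t + 1)


if __name__ == "__main__":
    for n, m, t in [(3, 3, 0), (4, 5, 1), (5, 2, 0)]:
        print((n, m, t), count_paths_not_above(n, m, t), reflection_formula(n, m, t))
```

For $t = 0$ and $m = n$ the count is a Catalan number, as expected for paths that never rise above the diagonal.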

If one considers more generally paths bounded by 
the line my =nx + t, no compact formula is known. 
It seems that the most conceptual way to approach 
this problem is through the so-called kernel method 
(see the section on solving equations for generating 
functions), which, in combination with the saddle 
point method, allows one also to obtain strong 
asymptotic results. There is one special instance, 
however, which has a “nice” formula. The number 
of all lattice paths from the origin to $(n,m)$ which 
never pass above the line $x = \mu y$, where $\mu$ is a positive 
integer, is given by 

$$\frac{n - \mu m + 1}{n + m + 1} \binom{n+m+1}{m}$$
[43] 



The most elegant way to prove this formula is by 
means of the cycle lemma of Dvoretzky and 
Motzkin (see Mohanty (1979), p. 9 where the cycle 
lemma occurs under the name of “penetrating 
analysis”). 
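
A numerical check of the count for the boundary $x = \mu y$ is again a matter of a few lines. The sketch below assumes that "never pass above" means $x \ge \mu y$ at every lattice point of the path, and compares a dynamic-programming count with the closed form $\frac{n - \mu m + 1}{n + m + 1}\binom{n+m+1}{m}$ for endpoints with $n \ge \mu m$. Function names are illustrative.

```python
from math import comb


def count_paths_below_slope(n: int, m: int, mu: int) -> int:
    """Paths from (0,0) to (n,m) with unit east/north steps and x >= mu*y throughout."""
    ways = [[0] * (m + 1) for _ in range(n + 1)]
    for x in range(n + 1):
        for y in range(m + 1):
            if x < mu * y:
                continue  # the point lies above the line x = mu*y
            if x == 0 and y == 0:
                ways[x][y] = 1
                continue
            from_left = ways[x - 1][y] if x > 0 else 0
            from_below = ways[x][y - 1] if y > 0 else 0
            ways[x][y] = from_left + from_below
    return ways[n][m]


def slope_formula(n: int, m: int, mu: int) -> int:
    """Closed form for n >= mu*m; the division is exact because the value is a count."""
    return (n - mu * m + 1) * comb(n + m + 1, m) // (n + m + 1)


if __name__ == "__main__":
    for n, m, mu in [(3, 3, 1), (4, 2, 2), (5, 2, 2), (7, 2, 3)]:
        print((n, m, mu), count_paths_below_slope(n, m, mu), slope_formula(n, m, mu))
```

With $\mu = 1$ and $m = n$ the formula specializes to the Catalan numbers.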

Iteration of the reflection principle shows that the 
number of paths from the origin to $(n,m)$ which stay 
between the lines $y = x + t$ and $y = x + s$ (being 
allowed to touch them), where $t \ge 0 \ge s$ and $n + t \ge 
m \ge n + s$, is given by the finite (!) sum (see, e.g., 
Mohanty (1979), p. 6) 

$$\sum_{k \in \mathbb{Z}} \left( \binom{n+m}{n - k(t-s+2)} - \binom{n+m}{n - k(t-s+2) + t + 1} \right)$$
[44] 


The enumeration of lattice paths restricted to 
regions bounded by hyperplanes has also been 
considered for other regions, such as quadrants, 
octants, and rectangles, as well as in higher dimen- 
sions. A general result due to Gessel and Zeilberger, 
and Biane, independently, on the number of lattice 
paths in a chamber (alcove) of an (affine) reflection 
group (see Krattenthaler (2003) for the correspond- 
ing references and pointers to further results) shows 
how far one can go when one uses the reflection 
principle. In particular, this result covers [42] and 
[44], the enumeration of lattice paths in quadrants, 
octants, rectangles, and many other results that have 
appeared (before and after) in the literature. We 
present a particularly elegant (and frequently occurring) 
special case. (In reflection group language, it 
corresponds to the reflection group of “type $A_{d-1}$.” 
See Humphreys (1990) for terminology and information 
on reflection groups.) 

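Let $A = (a_1, a_2, \ldots, a_d)$ and $E = (e_1, e_2, \ldots, e_d)$ be 
points in $\mathbb{Z}^d$ with $a_1 \ge a_2 \ge \cdots \ge a_d$ and $e_1 \ge 
e_2 \ge \cdots \ge e_d$. The number of all paths from $A$ to $E$ in 
the integer lattice $\mathbb{Z}^d$, which consist of positive unit 
steps and which stay in the region $x_1 \ge x_2 \ge \cdots \ge x_d$, 
equals 

$$\left( \sum_{i=1}^{d} (e_i - a_i) \right)! \; \det_{1 \le i,j \le d} \left( \frac{1}{(e_i - a_j - i + j)!} \right)$$
[45] 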

The counting problem of the theorem is equivalent 
to numerous other counting problems. It was 
originally formulated as an $n$-candidate ballot 
problem, but it is as well equivalent to counting the 
number of standard Young tableaux of a given 
shape. In the case that all $a_i$’s are equal, the 
determinant does in fact evaluate into a closed- 
form product. In Young tableaux theory, a particular 
way to write the result is known as the 
hook-length formula (see, e.g., Stanley (1999), 
corollary 7.21.6). 
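
The determinantal count can be compared with a direct enumeration in small cases. The following Python sketch assumes the formula in the form $\big(\sum_i (e_i - a_i)\big)! \cdot \det\big(1/(e_i - a_j - i + j)!\big)$, with $1/k!$ read as 0 for negative $k$, and evaluates the determinant exactly with rational arithmetic. The brute-force counter walks through the region $x_1 \ge x_2 \ge \cdots \ge x_d$ one positive unit step at a time. All names are illustrative.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import permutations
from math import factorial


def inverse_factorial(k: int) -> Fraction:
    """1/k!, interpreted as 0 for negative k."""
    return Fraction(1, factorial(k)) if k >= 0 else Fraction(0)


def determinant(M) -> Fraction:
    """Determinant by the permutation expansion (fine for small matrices)."""
    n = len(M)
    total = Fraction(0)
    for p in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        term = Fraction(-1 if inversions % 2 else 1)
        for i in range(n):
            term *= M[i][p[i]]
        total += term
    return total


def chamber_formula(a, e) -> Fraction:
    """Determinantal count of paths from a to e inside x_1 >= x_2 >= ... >= x_d."""
    d = len(a)
    M = [[inverse_factorial(e[i] - a[j] - i + j) for j in range(d)] for i in range(d)]
    return factorial(sum(e) - sum(a)) * determinant(M)


def chamber_brute_force(a, e) -> int:
    """Direct count of the same paths by recursion over lattice points."""
    d, target = len(a), tuple(e)

    @lru_cache(maxsize=None)
    def ways(point):
        if point == target:
            return 1
        total = 0
        for k in range(d):
            nxt = point[:k] + (point[k] + 1,) + point[k + 1:]
            if nxt[k] <= target[k] and all(nxt[i] >= nxt[i + 1] for i in range(d - 1)):
                total += ways(nxt)
        return total

    return ways(tuple(a))


if __name__ == "__main__":
    print(chamber_formula((0, 0, 0), (2, 2, 2)), chamber_brute_force((0, 0, 0), (2, 2, 2)))
```

For $A = (0,0,0)$ and $E = (2,2,2)$ both computations give the number of standard Young tableaux of shape $(2,2,2)$.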

We return to lattice paths in the plane, mentioning 
some more closely related results. The first is a 
result of Mohanty (1979, section 4.2), which 
expresses the number of all lattice paths from the 
origin to $(n,m)$ which touch the line $y = x + t$ 
exactly $r$ times, never crossing it, as the difference 

$$\binom{n+m-r}{n+t-1} - \binom{n+m-r}{n+t}$$
[46] 

Not forbidding that the paths cross the bounding 
line, we arrive at the problem of counting the lattice 
paths from the origin to $(n,m)$, which cross the main 
diagonal $y = x$ exactly $r$ times, the answer being 

$$\begin{cases} \dfrac{m-n+2r+1}{m+n+1} \dbinom{m+n+1}{n-r} & \text{if } m > n \\[2ex] \dfrac{2r+2}{n} \dbinom{2n}{n-r-1} & \text{if } m = n \end{cases}$$
[47] 

Next, we give the number of lattice paths of length $2n$ 
which start at the origin and have $2r$ steps on one side of 
the line $y = x$, as 

$$\binom{2r}{r} \binom{2n-2r}{n-r}$$
[48] 

a result due to Sparre Andersen. We refer the reader 
to Mohanty (1979, chapter 3) for further results in 
this direction. 



Enumerating lattice paths with a fixed number 
of maximal straight pieces (which correspond to 
runs), is intimately connected to another basic 
enumeration problem concerning lattice paths: the 
enumeration of lattice paths having a fixed number 
of turns. An effective way to attack the latter problem 
is by means of two-rowed arrays (see the survey 
article by Krattenthaler (1997), where in particular 
analogs of the reflection principle for two-rowed 
arrays are developed). These imply formulas for the 
number of lattice paths with fixed starting points and 
endpoints and a fixed number of north-east (respectively 
east-north) turns, for unrestricted paths, as 
well as for paths bounded by lines. (A north-east turn 
in a lattice path is a point where the direction changes 
from “north” to “east.” An east-north turn is defined 
analogously.) In particular, analogs of [42]-[44] are 
known when the number of north-east (respectively 
east-north) turns is fixed. 

These formulas imply for example (see again 
Krattenthaler (1997, section 3.5)) that the number 
of lattice paths from the origin to $(n,n)$ which 
never pass above the line $y = x + t$ and have 
exactly $2r$ maximal straight pieces is given by 

$$2\binom{n-1}{r-1}^2 - \binom{n+t-1}{r-2}\binom{n-t-1}{r} - \binom{n+t-1}{r-1}\binom{n-t-1}{r-1}$$
[49] 

with a similar result for the case of $2r + 1$ maximal 
straight pieces. (If $t = 0$, the numbers in [49] become 

$$\frac{1}{n} \binom{n}{r} \binom{n}{r-1}$$

and they are known as the Narayana numbers.) 
Furthermore, they imply that the number of lattice 
paths from the origin to $(n,n)$ which never pass 
above the line $y = x + t$ and never below the line 
$y = x - t$ and have exactly $2r$ maximal straight 
pieces is given by 

$$\sum_{k=-\infty}^{\infty} \left( 2\binom{n-2kt-1}{r+k-1}\binom{n+2kt-1}{r-k-1} - \binom{n-2kt+t-1}{r+k-2}\binom{n+2kt-t-1}{r-k} - \binom{n-2kt+t-1}{r+k-1}\binom{n+2kt-t-1}{r-k-1} \right)$$
[50] 

with a similar result for the case of $2r + 1$ maximal 
straight pieces. 
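
Counts of paths by their number of maximal straight pieces can be checked by exhaustive enumeration for small $n$. The sketch below is an illustration under two assumptions that are not spelled out in the text: the one-sided count is taken in the form $2\binom{n-1}{r-1}^2 - \binom{n+t-1}{r-2}\binom{n-t-1}{r} - \binom{n+t-1}{r-1}\binom{n-t-1}{r-1}$, and a binomial coefficient $\binom{a}{b}$ is read as 0 unless $0 \le b \le a$. The function names are hypothetical.

```python
from math import comb


def binom(a: int, b: int) -> int:
    """Binomial coefficient, read as 0 outside 0 <= b <= a (assumed convention)."""
    return comb(a, b) if 0 <= b <= a else 0


def straight_pieces_formula(n: int, t: int, r: int) -> int:
    """Assumed closed form for paths to (n,n) below y = x + t with 2r straight pieces."""
    return (2 * binom(n - 1, r - 1) ** 2
            - binom(n + t - 1, r - 2) * binom(n - t - 1, r)
            - binom(n + t - 1, r - 1) * binom(n - t - 1, r - 1))


def straight_pieces_brute_force(n: int, t: int, pieces: int) -> int:
    """Enumerate all paths to (n,n) with y <= x + t and count those with `pieces` runs."""
    count = 0

    def extend(x: int, y: int, last: str, runs: int) -> None:
        nonlocal count
        if y > x + t or runs > pieces:
            return
        if (x, y) == (n, n):
            if runs == pieces:
                count += 1
            return
        if x < n:
            extend(x + 1, y, "E", runs + (last != "E"))
        if y < n:
            extend(x, y + 1, "N", runs + (last != "N"))

    extend(0, 0, "", 0)
    return count


def narayana(n: int, r: int) -> int:
    """Narayana number (1/n) * C(n, r) * C(n, r-1)."""
    return comb(n, r) * comb(n, r - 1) // n


if __name__ == "__main__":
    for n, t, r in [(3, 0, 2), (3, 1, 2), (4, 1, 2), (4, 2, 2)]:
        print((n, t, r), straight_pieces_formula(n, t, r),
              straight_pieces_brute_force(n, t, 2 * r))
```

For $t = 0$ the closed form agrees with the Narayana numbers, which the helper `narayana` computes directly.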


The most general boundary for lattice paths that 
one can imagine is the restriction that the path stays 
between two given (fixed) paths. Let us assume that 
the horizontal steps of the upper (fixed) path are at 
heights $a_1 \le a_2 \le \cdots \le a_n$, whereas the horizontal 
steps of the lower (fixed) path are at heights $b_1 \le 
b_2 \le \cdots \le b_n$, $a_i \ge b_i$, $i = 1, 2, \ldots, n$. Then the number 
of all paths from $(0, b_1)$ to $(n, a_n)$ satisfying the 
property that for all $i = 1, 2, \ldots, n$ the height of the 
$i$th horizontal step is between $b_i$ and $a_i$ is given by 
the determinant 

$$\det_{1 \le i,j \le n} \left( \binom{a_i - b_j + 1}{j - i + 1} \right)$$
[51] 


In the statistical literature, this formula is often 
known as “Steck’s formula,” but it is actually a 
special case of a much more general theorem due 
to Kreweras. A generalization of [51] to higher- 
dimensional paths was given by Handa and 
Mohanty (see Mohanty (1979, section 2.4)). 

Next, we consider three-step lattice paths in the 
integer plane $\mathbb{Z}^2$, that is, paths consisting of up-steps 
$(1,1)$, level steps $(1,0)$, and down-steps $(1,-1)$. The 
particular problem that we are interested in is to 
count such three-step paths starting at $(0,r)$ and 
ending at $(\ell,s)$, which do not pass below the x-axis 
and do not pass above the horizontal line $y = K$. 
Furthermore, we assign the weight 1 to an up-step, 
the weight $b_h$ to a level-step at height $h$, and the 
weight $\lambda_h$ to a down-step from height $h$ to $h - 1$. 
The weight w(P) of a path P is defined as the 
product of the weights of all its steps. Then we have 
the following result, which can be obtained by the 
transfer matrix method described in the last section. 

Define the sequence $(p_n(x))_{n \ge 0}$ of polynomials by 

$$x\,p_n(x) = p_{n+1}(x) + b_n p_n(x) + \lambda_n p_{n-1}(x), \qquad \text{for } n \ge 1$$
[52] 

with initial conditions $p_0(x) = 1$ and $p_1(x) = x - b_0$. 
Furthermore, define $(Sp_n(x))_{n \ge 0}$ to be the sequence of 
polynomials which arises from the sequence $(p_n(x))$ 
by replacing $\lambda_i$ by $\lambda_{i+1}$ and $b_i$ by $b_{i+1}$, $i = 0, 1, 2, \ldots$, 
everywhere in the three-term recurrence [52] and in 
the initial conditions. Finally, given a polynomial $p(x)$ 
of degree $n$, we denote the corresponding reciprocal 
polynomial $x^n p(1/x)$ by $p^*(x)$. 

With the weight $w$ defined as before, the generating 
function $\sum_P w(P) x^{\ell(P)}$, where $\ell(P)$ is the length of $P$ 
and the sum is over all 
three-step paths which start at $(0, r)$, terminate at 
height $s$, do not pass below the x-axis, and do not 
pass above the line $y = K$, is given by 

$$\begin{cases} \dfrac{x^{s-r}\, p_r^*(x)\, S^{s+1}p_{K-s}^*(x)}{p_{K+1}^*(x)} & r \le s \\[2ex] \lambda_r \cdots \lambda_{s+1}\, \dfrac{x^{r-s}\, p_s^*(x)\, S^{r+1}p_{K-r}^*(x)}{p_{K+1}^*(x)} & r \ge s \end{cases}$$
[53] 

where $S^k$ denotes the $k$-fold application of the shift $S$. 
The sequence of polynomials $(p_n(x))_{n \ge 0}$ is in fact a 
sequence of orthogonal polynomials (cf. Koekoek 
and Swarttouw (1998) and Szegő (1959)). 

We remark that in the case that $r = s = 0$ there is 
also an elegant expression for the generating func- 
tion due to Flajolet (see section V.2 of the Flajolet 
and Sedgewick reference in “Further reading”) in 
terms of a continued fraction. 

In order to solve our problem, we just have to 
extract the coefficient of $x^{\ell}$ in [53]. By a partial 
fraction expansion, a formula of the type 

$$\sum_m c_m \rho_m^{\ell}$$
[54] 

results, where the $\rho_m$’s are the zeroes of $p_{K+1}(x)$, and 
the $c_m$’s are some coefficients, only a finite number 
of them being nonzero. 

It should be noted that, because of the many 
available parameters (the $b_h$’s and $\lambda_h$’s), by appropriate 
specializations one can also obtain numerous 
results about enumerating three-step paths accord- 
ing to various statistics, such as the number of 
touchings on the bounding lines, etc. 

There are two important special cases in which a 
completely explicit solution in terms of elementary 
functions can be given. 

The first case occurs for $b_i = 0$ and $\lambda_i = 1$ for all $i$. 
In this case, the polynomials $p_n(x)$ defined by 
the three-term recurrence [52] are Chebyshev polynomials 
of the second kind, $p_n(x) = U_n(x/2)$. 
(The Chebyshev polynomial of the second kind 
$U_n(x)$ is defined by $U_n(\cos t) = \sin((n+1)t)/\sin t$; 
see Koekoek and Swarttouw (1998) for almost 
exhaustive information on these polynomials and, 
more generally, on hypergeometric orthogonal polynomials.) 
The result which is then obtained from the 
general theorem (clearly, the zeros of $U_n(x)$ are 
$x = \cos(k\pi/(n+1))$, $k = 1, 2, \ldots, n$, and therefore 
the partial fraction expansion of [53] is easily 
determined) is that the number of lattice paths from 
$(0,r)$ to $(\ell,s)$ with only up- and down-steps, which 
always stay between the x-axis and the line $y = K$, is 
given by (see also Feller (1957, chapter XIV, eqn [5.7])) 

$$\frac{2}{K+2} \sum_{k=1}^{K+1} \left( 2\cos\frac{\pi k}{K+2} \right)^{\ell} \sin\frac{\pi k (r+1)}{K+2}\, \sin\frac{\pi k (s+1)}{K+2}$$
[55] 

a formula which goes back to Lagrange. 
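
The trigonometric sum can be compared with a direct step-by-step count, which is the transfer matrix method specialized to up- and down-steps. The Python sketch below assumes the sum in the form $\frac{2}{K+2}\sum_{k=1}^{K+1}\big(2\cos\frac{\pi k}{K+2}\big)^{\ell}\sin\frac{\pi k(r+1)}{K+2}\sin\frac{\pi k(s+1)}{K+2}$; the function names are illustrative.

```python
import math


def lagrange_formula(K: int, length: int, r: int, s: int) -> float:
    """Trigonometric sum for paths of `length` up/down steps from height r to s in 0..K."""
    total = 0.0
    for k in range(1, K + 2):
        theta = math.pi * k / (K + 2)
        total += ((2 * math.cos(theta)) ** length
                  * math.sin(theta * (r + 1)) * math.sin(theta * (s + 1)))
    return 2 * total / (K + 2)


def count_by_steps(K: int, length: int, r: int, s: int) -> int:
    """Same count by propagating the numbers of partial paths one step at a time."""
    ways = [0] * (K + 1)
    ways[r] = 1
    for _ in range(length):
        new_ways = [0] * (K + 1)
        for h in range(K + 1):
            if h > 0:
                new_ways[h] += ways[h - 1]  # arrive by an up-step from h - 1
            if h < K:
                new_ways[h] += ways[h + 1]  # arrive by a down-step from h + 1
        ways = new_ways
    return ways[s]


if __name__ == "__main__":
    for K, length, r, s in [(3, 6, 0, 0), (2, 5, 0, 1), (4, 8, 1, 3)]:
        print((K, length, r, s), round(lagrange_formula(K, length, r, s)),
              count_by_steps(K, length, r, s))
```

The floating-point sum is rounded before printing; the integer count from the step-by-step propagation is exact.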

The second case occurs for $b_i = 1$ and $\lambda_i = 1$ for 
all $i$. In this case, the polynomials $p_n(x)$ defined 
by the three-term recurrence [52] are again 
Chebyshev polynomials of the second kind, 
$p_n(x) = U_n((x-1)/2)$. The result which is then 
obtained from the general theorem is that the 
number of three-step lattice paths from $(0,r)$ to 
$(\ell,s)$, which always stay between the x-axis and the 
line $y = K$, is given by 

$$\frac{2}{K+2} \sum_{k=1}^{K+1} \left( 2\cos\frac{\pi k}{K+2} + 1 \right)^{\ell} \sin\frac{\pi k (r+1)}{K+2}\, \sin\frac{\pi k (s+1)}{K+2}$$
[56] 

Perfect Matchings and Tilings 


In this section we consider the problem of counting 
the perfect matchings of a graph. For an introduc- 
tion into the problem, and into methods to solve it, 
as well as for a report on recent developments, we 
refer the reader to Propp (1999). 

Let $G = (V,E)$ be a finite loopless graph with 
vertex set $V$ and edge set $E$. A matching (also called 
1-factor in graph theory) is a subset of the edges 
with the property that no two edges share a vertex. 
A matching is perfect if it covers all the vertices. 
Let $M(G)$ denote the number of perfect matchings of 
the graph $G$. More generally, we could assign a 
weight $w(e)$ to each edge $e$ of the graph and define the 
weight of a matching to be the product of 
the weights of all its edges. Let $M_w(G)$ denote 
the sum of all weights of all perfect matchings of the 
graph $G$. 

Kasteleyn’s method for determining $M(G)$, respectively 
$M_w(G)$, makes use of determinants and 
Pfaffians. Recall that the Pfaffian $\mathrm{Pf}(A)$ of a 
triangular array $A = (a_{i,j})_{1 \le i < j \le 2n}$ is defined by 

$$\mathrm{Pf}(A) = \sum_{\pi} (\operatorname{sgn} \pi) \prod_{\{i,j\} \in \pi} a_{i,j}$$
[57] 

where the sum is over all perfect matchings $\pi$ of the 
complete graph on vertices $\{1, 2, \ldots, 2n\}$, and where 
the product is over all edges $\{i,j\}$, $i < j$, of $\pi$. The 
sign $\operatorname{sgn} \pi$ of $\pi$ is $(-1)^{\#\text{crossings of } \pi}$, where a crossing 
is a pair $(\{i,j\}, \{k,l\})$ of edges such that $i < k < j < l$. 
Usually, one extends the triangular array $A$ to a 
matrix by setting $a_{j,i} = -a_{i,j}$ for $i < j$ and $a_{i,i} = 0$ for 
all $i$. Then, abusing notation, we identify the 
triangular array with the skew-symmetric matrix 
$A = (a_{i,j})_{1 \le i,j \le 2n}$. The Pfaffian satisfies the following 
useful properties: 

$$\mathrm{Pf}(B^t A B) = \det(B)\, \mathrm{Pf}(A)$$

and 

$$\mathrm{Pf}(A)^2 = \det(A)$$
[58] 

The latter equality shows in particular that Pfaffians 
are very close to determinants. They do, in fact, 
generalize determinants since 

$$\mathrm{Pf} \begin{pmatrix} 0 & B \\ -B^t & 0 \end{pmatrix} = (-1)^{\binom{n}{2}} \det B$$
[59] 

for any square $n \times n$ matrix $B$. 
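
The definition of the Pfaffian as a signed sum over perfect matchings translates directly into code. The Python sketch below enumerates the perfect matchings of the complete graph on the index set, obtains the sign from the number of crossings, and checks $\mathrm{Pf}(A)^2 = \det(A)$ on a small skew-symmetric matrix. It is meant only as an illustration for small matrices (the number of matchings grows very quickly), and all function names are hypothetical.

```python
from itertools import permutations


def perfect_matchings(vertices):
    """Yield all perfect matchings of the complete graph on a sorted list of vertices."""
    if not vertices:
        yield []
        return
    first = vertices[0]
    for idx in range(1, len(vertices)):
        rest = vertices[1:idx] + vertices[idx + 1:]
        for matching in perfect_matchings(rest):
            yield [(first, vertices[idx])] + matching


def crossings(matching) -> int:
    """Number of pairs of edges {i,j}, {k,l} with i < k < j < l."""
    return sum(1 for (i, j) in matching for (k, l) in matching if i < k < j < l)


def pfaffian(A):
    """Pfaffian of a skew-symmetric matrix via the sum over perfect matchings."""
    total = 0
    for matching in perfect_matchings(list(range(len(A)))):
        term = (-1) ** crossings(matching)
        for i, j in matching:
            term *= A[i][j]
        total += term
    return total


def determinant(M):
    """Determinant by the permutation expansion (fine for small matrices)."""
    n = len(M)
    total = 0
    for p in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= M[i][p[i]]
        total += term
    return total


def skew_from_upper(upper, size):
    """Build a skew-symmetric matrix from a dict {(i, j): value} with i < j."""
    A = [[0] * size for _ in range(size)]
    for (i, j), value in upper.items():
        A[i][j], A[j][i] = value, -value
    return A


if __name__ == "__main__":
    A = skew_from_upper({(0, 1): 1, (0, 2): 2, (0, 3): 3,
                         (1, 2): 4, (1, 3): 5, (2, 3): 6}, 4)
    print(pfaffian(A), determinant(A))
```

For a 4 by 4 matrix the sum has the three terms $a_{1,2}a_{3,4} - a_{1,3}a_{2,4} + a_{1,4}a_{2,3}$, which makes the example easy to verify by hand.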

Thus, given a graph with vertices $v_1, v_2, \ldots, v_{2n}$, 
specializing $a_{i,j}$ to the weight of the edge between $v_i$ 
and $v_j$, if it exists, and setting $a_{i,j} = 0$ otherwise in 
the definition of the Pfaffian, we obtain almost 
$M_w(G)$; the only difference is that there could be 
signs in front of the individual terms of the sum, 
whereas in $M_w(G)$ the sign in front of each term 
must be $+$. (The object obtained by omitting the sign 
in [57] is called the Hafnian. Unfortunately, in contrast 
to the Pfaffian, it does not have any nice properties 
and it is therefore extremely difficult to compute.) 
Kasteleyn’s idea is to circumvent this problem by 
orienting the edges of the graph, defining signed 
weights of the edges, in such a way that the Pfaffian 
of the array with signed weights produces exactly 
$M_w(G)$. 

More precisely, given a (weighted) graph $G$ with 
vertices $v_1, v_2, \ldots, v_{2n}$, we make it into an oriented 
(weighted) graph $\vec{G}$. That is, if there is an edge 
between $v_i$ and $v_j$, $e_{i,j}$ say, we orient it either from $v_i$ 
to $v_j$ or the other way. Now we define the signed 
adjacency matrix $A(\vec{G})$ of $\vec{G}$ by letting its $(i,j)$-entry 
be $+w(e_{i,j})$ if there is an edge from $v_i$ to $v_j$ 
oriented that way, $-w(e_{i,j})$ if there is an edge from 
$v_j$ to $v_i$ oriented that way, and 0 if there is no edge 
between $v_i$ and $v_j$. Such an orientation is called 
Pfaffian if 

$$\mathrm{Pf}(A(\vec{G})) = \pm M_w(G)$$


Clearly, the question remains whether a Pfaffian 
orientation can be found for a given graph. In 
general, this is an open question. However, Kaste- 
leyn shows that for planar graphs such a Pfaffian 
orientation can always be found. Moreover, he 
shows that any orientation of a planar graph 
which has the property that around any face 
bounded by 4k edges an odd number of edges is 
oriented in either direction and that around any face 
bounded by 4k + 2 edges an even number of edges is 
oriented in either direction is Pfaffian. 

For bipartite graphs (i.e., for graphs in which the set 
of vertices can be split into two disjoint sets such that 
all the edges connect a vertex of one of these sets to a 
vertex of the other), the situation is even nicer. This is 
because for a bipartite graph $G$ in which both parts of 
the bipartition of the vertices are of the same size 
(otherwise, there is no perfect matching), any signed 
adjacency matrix $A(\vec{G})$ has the block form of the 
matrix on the left-hand side of [59] and, hence, the 
Pfaffian reduces to a determinant. More precisely, let 
$G$ be a bipartite graph with vertex set $V = U \cup W$, 
$U = \{u_1, u_2, \ldots, u_n\}$ and $W = \{w_1, w_2, \ldots, w_n\}$, with 
edges connecting some $u_i$ to some $w_j$. Given a 
Pfaffian orientation $\vec{G}$, we build the signed bipartite 
adjacency matrix $B(\vec{G}) = (b_{i,j})_{1 \le i,j \le n}$ of $\vec{G}$ by setting 
$b_{i,j} = +w(e_{i,j})$ if there is an edge from $u_i$ to $w_j$ oriented 
that way, $-w(e_{i,j})$ if there is an edge from $w_j$ to $u_i$ 
oriented that way, and 0 if there is no edge between $u_i$ 
and $w_j$. Then we have 

$$\det(B(\vec{G})) = \pm M_w(G)$$


In particular, this holds for any bipartite planar 
graph. See Robertson et al. (1999) for a structural 
description about which (not necessarily planar) 
bipartite graphs admit a Pfaffian orientation. 

Kasteleyn’s construction in the planar case has 
been generalized to graphs on surfaces of any genus 
g in Dolbilin et al. (1996), Galluccio and Loebl 
(1999), and Tesler (2000), independently. As predicted 
by Kasteleyn, the solution is in terms of a 
linear combination of $4^g$ Pfaffians. 

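With the help of his method, Kasteleyn computed 
the number of dimer coverings of an $m \times n$ 
rectangle. (A dimer is a $2 \times 1$ rectangle. Thus, this 
is equivalent to counting the number of perfect 
matchings on the $m \times n$ grid graph. The formula 
was independently found by Temperley and Fisher.) 
The result is 

$$\prod_{j=1}^{m} \prod_{k=1}^{n} \left| 2\cos\frac{\pi j}{m+1} + 2\sqrt{-1}\, \cos\frac{\pi k}{n+1} \right|^{1/2} = \prod_{j=1}^{m/2} \prod_{k=1}^{n/2} \left( 4\cos^2\frac{\pi j}{m+1} + 4\cos^2\frac{\pi k}{n+1} \right)$$

where the product on the right applies if $m$ and $n$ are both even. 
There is a similar rewriting if one of $m$ or $n$ is odd. 
(If both $m$ and $n$ are odd, there is no dimer 
covering.) 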
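
The product formula is straightforward to evaluate numerically. The Python sketch below assumes the standard form of the result, $\prod_{j=1}^{m}\prod_{k=1}^{n} \big|2\cos\frac{\pi j}{m+1} + 2\sqrt{-1}\cos\frac{\pi k}{n+1}\big|^{1/2}$, and rounds the floating-point value to the nearest integer; the function name is illustrative.

```python
import math


def dimer_coverings(m: int, n: int) -> int:
    """Number of dimer coverings of an m x n rectangle from the double product."""
    product = 1.0
    for j in range(1, m + 1):
        for k in range(1, n + 1):
            factor = complex(2 * math.cos(math.pi * j / (m + 1)),
                             2 * math.cos(math.pi * k / (n + 1)))
            product *= abs(factor)
    # Each factor enters with exponent 1/2, hence the square root of the product.
    return round(math.sqrt(product))


if __name__ == "__main__":
    for m, n in [(2, 2), (2, 3), (4, 4), (8, 8), (3, 3)]:
        print((m, n), dimer_coverings(m, n))
```

For an 8 by 8 board this reproduces the well-known count of domino tilings of the chessboard, and for odd $m$ and $n$ one factor vanishes (up to rounding error), so that the result is 0.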

For further reading and references see Dimer 
Problems and Kuperberg (1998). 


Nonintersecting Paths 


Let $G = (V,E)$ be a directed acyclic graph with 
vertices $V$ and directed edges $E$. Furthermore, we are 
given a function $w$ which assigns a weight $w(x)$ to 
every vertex or edge $x$. Let us define the weight $w(P)$ 
of a walk $P$ in the graph by $\prod_e w(e) \prod_v w(v)$, where 
the first product is over all edges $e$ of the walk $P$ and 
the second product is over all vertices $v$ of $P$. We 
denote the set of all walks in $G$ from $u$ to $v$ by 
$\mathcal{P}(u \to v)$, and the set of all families $(P_1, P_2, \ldots, P_n)$ 
of walks, where $P_i$ runs from $u_i$ to $v_i$, $i = 1, 2, \ldots, n$, 
by $\mathcal{P}(\mathbf{u} \to \mathbf{v})$, with $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, 
v_2, \ldots, v_n)$. The symbol $\mathcal{P}^+(\mathbf{u} \to \mathbf{v})$ stands for the set 
of all families $(P_1, P_2, \ldots, P_n)$ in $\mathcal{P}(\mathbf{u} \to \mathbf{v})$ with the 
additional property that no two walks share a 
vertex. We call such families of walk(er)s “vicious 
walkers” or, alternatively, “nonintersecting paths.” 
The weight $w(\mathbf{P})$ of a family $\mathbf{P} = (P_1, P_2, \ldots, P_n)$ of 
walks is defined as the product $\prod_{i=1}^{n} w(P_i)$ of all the 
weights of the walks in the family. Finally, given a 
set $M$ with weight function $w$, we write $\mathrm{GF}(M; w)$ 
for the generating function $\sum_{x \in M} w(x)$. 

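We need two further notations before we are able 
to state the Lindström-Gessel-Viennot theorem. 
(For references and historical remarks, we refer the 
reader to footnote 5 in Krattenthaler (2005a).) As 
earlier, the symbol $\mathfrak{S}_n$ denotes the symmetric group 
of order $n$. Given a permutation $\sigma \in \mathfrak{S}_n$, we write $\mathbf{u}_\sigma$ 
for $(u_{\sigma(1)}, u_{\sigma(2)}, \ldots, u_{\sigma(n)})$. Then 

$$\sum_{\sigma \in \mathfrak{S}_n} (\operatorname{sgn} \sigma) \cdot \mathrm{GF}(\mathcal{P}^+(\mathbf{u}_\sigma \to \mathbf{v}); w) = \det_{1 \le i,j \le n} \big( \mathrm{GF}(\mathcal{P}(u_j \to v_i); w) \big)$$
[60] 
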

Most often, this theorem is applied in the case 
where the only permutation $\sigma$ for which vicious 
walks exist is the identity permutation, so that the 
sum on the left-hand side reduces to a single term 
that counts all families $(P_1, P_2, \ldots, P_n)$ of vicious 
walks, the $i$th walk $P_i$ running from $u_i$ to 
$v_i$, $i = 1, 2, \ldots, n$. This case occurs, for example, if 
for any pair of walks $(P, Q)$ with $P$ running from $u_a$ 
to $v_d$ and $Q$ running from $u_b$ to $v_c$, $a < b$ and $c < d$, 
it is true that $P$ and $Q$ must have a common vertex. 
Explicitly, in that case we have 

$$\mathrm{GF}(\mathcal{P}^+(\mathbf{u} \to \mathbf{v}); w) = \det_{1 \le i,j \le n} \big( \mathrm{GF}(\mathcal{P}(u_j \to v_i); w) \big)$$
[61] 

If the starting points or/and the endpoints are not 
fixed, then the corresponding number is given by a 
Pfaffian, a result obtained by Okada and Stembridge 
(see Bressoud (1999) for references). For a set $A$ of 
starting points, let $\mathcal{P}^+(A \to \mathbf{v})$ denote the set of all 
families $(P_1, P_2, \ldots, P_{2n})$ of nonintersecting lattice 
paths, where $P_i$ runs from some point of $A$ to 
$v_i$, $i = 1, 2, \ldots, 2n$. Furthermore, let us suppose that 
the elements of $A = \{u_1, u_2, \ldots\}$ are ordered in such a 
way that for any pair of walks $(P, Q)$ with $P$ running 
from $u_a$ to $v_d$ and $Q$ running from $u_b$ to $v_c$, $a < b$ and 
$c < d$, it is true that $P$ and $Q$ must have a common 
vertex. (This is the same condition as the one which 
makes [61] valid, with the only difference that, here, 
the number of $u_i$’s could be larger than the number of 
$v_i$’s.) Then, 

$$\mathrm{GF}(\mathcal{P}^+(A \to \mathbf{v}); w) = \mathop{\mathrm{Pf}}_{1 \le i,j \le 2n} \left( \sum_{a < b} \Big( \mathrm{GF}(\mathcal{P}(u_a \to v_i); w)\, \mathrm{GF}(\mathcal{P}(u_b \to v_j); w) - \mathrm{GF}(\mathcal{P}(u_b \to v_i); w)\, \mathrm{GF}(\mathcal{P}(u_a \to v_j); w) \Big) \right)$$
[62] 


If the number of paths is odd, then one can use the 
same formula by adding an artificial point to the 
endpoints and to the set of starting points A. There 
is also a theorem by Okada and Stembridge which 
covers the case that starting points and endpoints 
vary. Refinements when the number of turns is fixed 
can be found in Krattenthaler (1997). 


Vicious Walkers, Plane Partitions, 
Rhombus Tilings, and Fully Packed 
Loop Configurations 


In this section we describe the interrelations between 
four frequently appearing objects in statistical 
mechanics and combinatorics: vicious walkers, 
plane partitions, rhombus tilings, and fully packed 
loop configurations. 

Given a lattice, vicious walkers, as introduced by
Fisher (1984), are particles which move on lattice
sites in such a way that two particles never occupy
the same lattice site. Models of vicious walkers have
been the object of numerous studies from various
points of view. Rather than attempting the
impossible task of providing a complete overview
of references, we refer the reader to the basic
reference Fisher (1984) and to Krattenthaler (2005a)
for further pointers to the literature.

Most of the known results apply for vicious 
walkers on the line. There are in fact two different 
models: in the random turns vicious walker model, n 
walkers move on the integral points of the real line 
in such a way that at each tick of the clock exactly 
one walker moves to the right or to the left, whereas 
in the lock step vicious walker model n walkers 
move on the integral points of the real line in such a 
way that at each tick of the clock each walker moves 
to the right or to the left. 

The first model is equivalent to a model of one
walker in $\mathbb{Z}^n$ ($\mathbb{Z}$ denoting the set of integers) which
at each tick of the clock moves a positive or negative
unit step in the direction of one of the coordinate
axes, always staying in the wedge $x_1 > x_2 > \cdots > x_n$.
This point of view was already put forward by
Fisher (1984). In this form, the problem is an instance of the
problem of counting paths in chambers of reflection
groups discussed in the section “Lattice paths.”


The second model could also be realized as a
single walker model (cf. Krattenthaler (2003)).
However, most often it is realized as a model of $n$
paths in the plane consisting of steps $(1,1)$ and
$(1,-1)$ with the property that no two paths have a
point in common. In this picture, the $x$-axis becomes
the time line: the $k$th path doing an up-step $(1,1)$
from $(t-1,y)$ to $(t,y+1)$ means that the $k$th
particle moves to the left at time $t$, whereas the $k$th
path doing a down-step $(1,-1)$ from $(t-1,y)$ to
$(t,y-1)$ means that the $k$th particle moves to the
right at time $t$.

The reader should consult Figure 14a for an 
example. (The labelings should be ignored at this 
point.) Clearly, what we encounter here is a 
particular instance of the nonintersecting paths of 
the last section. Therefore, for fixed starting points 
and endpoints, formula [61] applies, whereas if the 
starting points vary and the endpoints are fixed, it is 
formula [62] that applies. 
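
As an illustration of how formula [61] is used for lock step vicious walkers, the following minimal Python sketch compares the determinant with a brute-force enumeration, with all weights set to 1. The function names and the encoding of a walker by its height (all starting heights of the same parity) are assumptions made for this example, not notation from the references.

```python
from itertools import product
from math import comb


def single_paths(start, end, steps):
    """Number of +1/-1 paths of the given length from height start to height end."""
    diff = end - start
    if abs(diff) > steps or (steps + diff) % 2:
        return 0
    return comb(steps, (steps + diff) // 2)


def det(matrix):
    """Exact integer determinant by Laplace expansion (adequate for small n)."""
    if not matrix:
        return 1
    return sum(
        (-1) ** col * entry * det([row[:col] + row[col + 1:] for row in matrix[1:]])
        for col, entry in enumerate(matrix[0])
    )


def lgv_count(starts, ends, steps):
    """Right-hand side of [61]: entry (i, j) counts single paths u_j -> v_i."""
    return det([[single_paths(u, v, steps) for u in starts] for v in ends])


def brute_force_count(starts, ends, steps):
    """Enumerate all step sequences and keep the families that never share a site."""
    n = len(starts)
    count = 0
    for moves in product((1, -1), repeat=n * steps):
        pos = list(starts)
        vicious = True
        for t in range(steps):
            pos = [p + moves[t * n + k] for k, p in enumerate(pos)]
            if len(set(pos)) < n:
                vicious = False
                break
        if vicious and pos == list(ends):
            count += 1
    return count


starts, ends, steps = [0, 2, 4], [1, 3, 5], 3
print(lgv_count(starts, ends, steps), brute_force_count(starts, ends, steps))
```

Because all walkers start at heights of the same parity, two paths cannot cross without meeting at a lattice point, which is the situation in which only the identity permutation contributes.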

At this point, the links to the other objects,
semistandard tableaux and plane partitions
(cf. Bressoud (1999)), emerge. A filling of the cells
of the Ferrers diagram of $\lambda$ with elements of the set
$\{1, 2, \dots\}$, which is weakly increasing along rows
and strictly increasing along columns, is called a
(semistandard) tableau of shape $\lambda$. Figure 14b shows
such a semistandard tableau of shape $(4,3,2)$. In
fact, vicious walkers and semistandard tableaux are
equivalent objects. To see this, first label down-steps
by the $x$-coordinate of their endpoint, so that a step
from $(a-1,b)$ to $(a,b-1)$ is labeled by $a$, see
Figure 14a. Then, out of the labels of the $j$th path,
form the $j$th column of the corresponding tableau,

















Figure 14 (a) Vicious walkers. (b) A tableau. 


see Figure 14b. The resulting array of numbers is 
indeed a semistandard tableau. This can be readily 
seen, since the entries are trivially strictly increasing 
along columns, and they are weakly increasing along 
rows because the paths do not touch each other. 
Thus, problems of enumerating vicious walkers can 
be translated into tableau enumeration problems, 
and vice versa. 

The significance of semistandard tableaux lies
particularly in the representation theory for classical
groups, see Classical Groups and Homogeneous
Spaces and Compact Groups and Their Representations.
Namely, the irreducible characters for
$\mathrm{GL}(n,\mathbb{C})$ and $\mathrm{SL}(n,\mathbb{C})$, the Schur functions, are
generating functions for semistandard tableaux of
a given shape. If the entries of the $i$th row of
a semistandard tableau are required to be at least
$2i-1$, then one speaks of symplectic tableaux, and
the irreducible characters for $\mathrm{Sp}(2n,\mathbb{C})$ are generating
functions for symplectic tableaux of a given
shape. We refer the reader to Krattenthaler et al.
(2000) for more information on these topics.

Objects which are very close to semistandard
tableaux are plane partitions. According to MacMahon,
a plane partition of shape $\lambda$ is a filling of the
Ferrers diagram of $\lambda$ with non-negative integers which
is weakly decreasing along rows and columns. See
Figure 15b for an example of a plane partition of shape
$(3,3,3)$. In particular, semistandard tableaux and
plane partitions of rectangular shape are actually
equivalent. To see this, let $T$ be a semistandard tableau of
rectangular shape. First, from each element of the $i$th
row we subtract $i$. Then the obtained array is rotated
by 180°. As a result, we obtain a plane partition. See
Figure 15 for a semistandard tableau and a plane
partition which correspond to each other under these
transformations.
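
The two transformations can be sketched in a few lines of Python. The sample tableau below is a $3 \times 3$ example chosen for illustration, and the function names are hypothetical; the sketch only covers rectangular shapes, as in the text.

```python
def is_semistandard(tableau):
    """Rows weakly increasing, columns strictly increasing (rectangular shape)."""
    rows_ok = all(r[j] <= r[j + 1] for r in tableau for j in range(len(r) - 1))
    cols_ok = all(
        tableau[i][j] < tableau[i + 1][j]
        for i in range(len(tableau) - 1)
        for j in range(len(tableau[0]))
    )
    return rows_ok and cols_ok


def is_plane_partition(array):
    """Non-negative entries, weakly decreasing along rows and columns."""
    nonneg = all(x >= 0 for r in array for x in r)
    rows_ok = all(r[j] >= r[j + 1] for r in array for j in range(len(r) - 1))
    cols_ok = all(
        array[i][j] >= array[i + 1][j]
        for i in range(len(array) - 1)
        for j in range(len(array[0]))
    )
    return nonneg and rows_ok and cols_ok


def tableau_to_plane_partition(tableau):
    """Subtract i from every entry of row i (1-based), then rotate by 180 degrees."""
    shifted = [[x - (i + 1) for x in row] for i, row in enumerate(tableau)]
    return [row[::-1] for row in shifted[::-1]]


sample = [[1, 1, 2], [3, 3, 3], [4, 5, 5]]
print(tableau_to_plane_partition(sample))
```

Subtracting $i$ from row $i$ turns strict increase along columns into weak increase, and the rotation turns weak increase into weak decrease, which is why the result is a plane partition.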

On the other hand, plane partitions can also be 
realized as three-dimensional objects, by interpreting 
each entry in the array as a pile of unit cubes of the 
size of the entry. For example, the plane partition in 
Figure 15 corresponds to the pile of cubes in 
Figure 16a. But then, forgetting the three-dimensional 
view, by embedding the picture in a minimally 
bounding hexagon, and by filling the emerging empty 
regions by rhombi of unit length in the unique way this 
is possible, we obtain a rhombus tiling of a hexagon in 


1 1 2        2 2 1
3 3 3        1 1 1
4 5 5        1 0 0


(a) (b) 


Figure 15 (a) A semistandard tableau. (b) A plane partition. 
Figure 16 (a) A plane partition; three-dimensional view.
(b) A rhombus tiling.


which opposite sides have the same length, see 
Figure 16b. 

From the rhombus tiling, there is then again an 
elegant way to go to nonintersecting paths: we mark 
the mid-points of the edges along two opposite sides, 
see Figure 17a. Now we draw lattice paths which 
connect points on different sides, by “following” 
along the other lozenges, as indicated in Figure 17a 
by the dashed lines. Clearly, the resulting paths are 
nonintersecting, that is, no two paths have a 
common vertex. If we slightly distort the underlying 
lattice, we get orthogonal paths with horizontal and 
vertical steps in the positive direction, see 
Figure 17b. 

Rhombus tilings, on their part, are equivalent to 
perfect matchings of hexagonal graphs. To see this, 
one places the tiling on the underlying triangular 
grid, see Figure 18a. Then one places a bond into 
each rhombus, so that it connects the mid-points of 
the two triangles out of which the rhombus is 
composed, see Figure 18b. Finally, one forgets the 
contour of the tiling, but instead one introduces all 
the other edges which connect mid-points of 
adjacent triangles of the triangular grid, see 
Figure 18c. Thus, one arrives at a perfect matching 
of the hexagonal graph consisting of the edges 
connecting mid-points of triangles. 

Because of these various connections, enumeration
problems for vicious walkers, plane partitions,
tableaux, and rhombus tilings can be approached by
the different methods which are available for the
various objects: the determinant theorem from
the section “Nonintersecting paths,” together
with determinant evaluation techniques (cf. the
survey Krattenthaler (2005b)), apply, as well as the
“Kasteleyn method” from the section “Perfect










Figure 17 (a) A rhombus tiling. (b) A family of nonintersecting
paths.

Figure 18 (a) A rhombus tiling. (b) Bonds in rhombi.
(c) A perfect matching of a hexagonal graph.


matchings and tilings,” and also methods from
character theory for the classical groups. All 
of these methods have been applied extensively (see 
the surveys by Kenyon (2003), Propp (1999), and 
Krattenthaler et al. (2000)), the first and third more 
frequently for exact enumeration, while the second 
particularly for asymptotic studies. It should be 
noted that methods from random matrix theory also 
apply in certain situations, see Johansson (2002). See 
Growth Processes in Random Matrix Theory and 
Random Matrix Theory in Physics. 

In fact, we missed mentioning a further object, from 
statistical physics, which in some cases is equivalent to 
vicious walkers, etc.: fully packed loop configurations. 
(Fully packed loop configurations are in bijection with 
six-vertex configurations, see the next section.) If one 
imposes certain “connectivity constraints” on fully 
packed loop configurations, then one can construct 
bijections with rhombus tilings and, hence, with 
nonintersecting paths and with the other objects 
discussed in this section. The reader is referred to 
Di Francesco et al. (2004) and references therein. 

Having explained the various connections, we cite 
some fundamental results in the area. (We refer the 
reader to Bressoud (1999) and Stanley (1999, 
chapter 7).) MacMahon proved that the number of 
all plane partitions contained in an a x b x c box 
(when viewed in three dimensions) is equal to 


$$\prod_{i=1}^{a} \prod_{j=1}^{b} \prod_{k=1}^{c} \frac{i+j+k-1}{i+j+k-2} \qquad [63]$$


Thus, the number of rhombus tilings of a hexagon
with side lengths $a, b, c, a, b, c$ is given by the same
number, as well as the number of all vicious walkers
$(P_1, P_2, \dots, P_a)$, where $P_i$ runs from $(0,2i)$ to
$(b+c, b-c+2i)$, $i = 1, 2, \dots, a$. More generally, the number
of semistandard tableaux of shape $\lambda$ with entries
at most $m$ is given by the hook-content formula

$$\prod_{u \in \lambda} \frac{m + c(u)}{h(u)} \qquad [64]$$

where $u$ ranges over all the cells of the Ferrers
diagram of $\lambda$, with $c(u)$ being the content of $u$,
defined as the difference of the column number and
the row number of $u$, and with $h(u)$ being the hook
length of $u$, defined as the number of cells in the
hook of $u$, the latter consisting of the cells to the
right of $u$ in the same row and below $u$ in the
same column, including $u$. Thus, this also gives a
formula for the number of all vicious walkers 
(P,,P2,...,P2), where P; runs from (0,27) to 
(N,4;). See Krattenthaler et al. (2000, section 2) 
for details. There it is also explained that a Schur
function summation formula, together with an
analog of the hook-content formula for special
orthogonal characters, proves that the number of
all vicious walkers $(P_1, P_2, \dots, P_n)$, where $P_i$ runs
from $(0,2i)$ for $N$ steps, is given by

$$\prod_{1 \le i \le j \le N} \frac{n+i+j-1}{i+j-1} \qquad [65]$$


The reader is referred to the references given in 
this section for many more results, in particular, on 
the enumeration of plane partitions with symmetry, 
the enumeration of rhombus tilings of regions other 
than hexagons, and the enumeration of vicious 
walkers with various starting points and endpoints, 
under various constraints. 
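
The product formulas above are easy to test on small cases. The following Python sketch, under the assumption that [63] and [65] read as the triple product of $(i+j+k-1)/(i+j+k-2)$ and the product of $(n+i+j-1)/(i+j-1)$ over $1 \le i \le j \le N$ respectively, compares [63] with the determinant of [61] for walkers running from $(0,2i)$ to $(b+c, b-c+2i)$, and [65] with a direct enumeration. All function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb


def macmahon(a, b, c):
    """Formula [63]: number of plane partitions inside an a x b x c box."""
    value = Fraction(1)
    for i, j, k in product(range(1, a + 1), range(1, b + 1), range(1, c + 1)):
        value *= Fraction(i + j + k - 1, i + j + k - 2)
    return value


def binom(n, k):
    """Binomial coefficient that vanishes outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def det(matrix):
    """Exact integer determinant by Laplace expansion (adequate for small sizes)."""
    if not matrix:
        return 1
    return sum(
        (-1) ** col * entry * det([row[:col] + row[col + 1:] for row in matrix[1:]])
        for col, entry in enumerate(matrix[0])
    )


def walkers_in_box(a, b, c):
    """Determinant [61] for a walkers from (0, 2j) to (b + c, b - c + 2i)."""
    size = range(1, a + 1)
    return det([[binom(b + c, b + i - j) for j in size] for i in size])


def free_endpoint_formula(n, steps):
    """Formula [65] with n walkers and N = steps."""
    value = Fraction(1)
    for i in range(1, steps + 1):
        for j in range(i, steps + 1):
            value *= Fraction(n + i + j - 1, i + j - 1)
    return value


def free_endpoint_brute_force(n, steps):
    """Count lock step families starting at heights 2, 4, ..., 2n with free endpoints."""
    count = 0
    for moves in product((1, -1), repeat=n * steps):
        pos = [2 * i for i in range(1, n + 1)]
        for t in range(steps):
            pos = [p + moves[t * n + k] for k, p in enumerate(pos)]
            if len(set(pos)) < n:
                break
        else:
            count += 1
    return count


print(macmahon(2, 2, 2), walkers_in_box(2, 2, 2))
print(free_endpoint_formula(2, 2), free_endpoint_brute_force(2, 2))
```

Exact rational arithmetic is used so that the products are compared with the integer counts without rounding.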


Six-Vertex Model and Alternating-Sign 
Matrices 


An alternating-sign matrix is a square matrix of 0’s, 
1’s and —1’s for which the sum of entries in each 
row and in each column is 1 and the nonzero entries 
of each row and of each column alternate in sign. 
For instance, 


COCR SO 
COrRCSO 
© 
| 
= 
SODO 


is a $5 \times 5$ alternating-sign matrix. Zeilberger proved
that the number of $n \times n$ alternating-sign matrices is
given by

$$\prod_{i=0}^{n-1} \frac{(3i+1)!}{(n+i)!} \qquad [66]$$

and he went on to prove the finer version that the
number of $n \times n$ alternating-sign matrices with the
(unique) 1 in the first row in position $j$ is given by

$$\frac{\binom{n+j-2}{j-1} \binom{2n-j-1}{n-j}}{\binom{3n-2}{n-1}} \prod_{i=0}^{n-1} \frac{(3i+1)!}{(n+i)!} \qquad [67]$$

The first number is also equal to the number of
totally symmetric self-complementary plane partitions
contained in the $(2n) \times (2n) \times (2n)$ box, but
there is no intrinsic explanation why this is so. We
refer the reader to Bressoud (1999) for an exposition
of these results, and for pointers to the
literature containing further unexplained connections
between alternating-sign matrices and plane
partitions.
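
For small $n$, both enumeration formulas can be confirmed by exhaustive search. The sketch below is a minimal illustration: it builds alternating-sign matrices row by row from the definition and compares the counts with [66] and [67], assuming these read as the product of $(3i+1)!/(n+i)!$ over $0 \le i \le n-1$ and its refinement by binomial coefficients. The sample $5 \times 5$ matrix and all names are chosen for this example only.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def is_alternating(line):
    """Entries sum to 1 and the nonzero entries alternate in sign."""
    nonzero = [x for x in line if x != 0]
    return sum(line) == 1 and all(x != y for x, y in zip(nonzero, nonzero[1:]))


def is_asm(matrix):
    """Check the defining conditions on every row and every column."""
    return all(is_alternating(r) for r in matrix) and all(
        is_alternating(c) for c in zip(*matrix)
    )


def all_asms(n):
    """All n x n alternating-sign matrices, built from admissible rows."""
    rows = [r for r in product((0, 1, -1), repeat=n) if is_alternating(r)]
    return [m for m in product(rows, repeat=n) if all(is_alternating(c) for c in zip(*m))]


def asm_count(n):
    """Formula [66]."""
    numerator = denominator = 1
    for i in range(n):
        numerator *= factorial(3 * i + 1)
        denominator *= factorial(n + i)
    return numerator // denominator


def refined_asm_count(n, j):
    """Formula [67]: the 1 of the first row sits in position j (1-based)."""
    weight = Fraction(comb(n + j - 2, j - 1) * comb(2 * n - j - 1, n - j), comb(3 * n - 2, n - 1))
    return weight * asm_count(n)


sample = [
    [0, 1, 0, 0, 0],
    [1, -1, 1, 0, 0],
    [0, 1, -1, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
]
print(is_asm(sample), [asm_count(n) for n in range(1, 5)])
```

Restricting the search to admissible rows keeps the enumeration small: for $n = 4$ there are 8 admissible rows, hence 4096 candidate matrices.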

While the first result was achieved by a brute-force 
constant-term approach, the second result is based on 
the observation that alternating-sign matrices are in 
bijection with configurations in the six-vertex model 
on the square grid under domain-wall boundary 
conditions. This then allowed one to use a formula 
due to Izergin for the partition function for these six- 
vertex configurations. Similar formulas for variations 
of the model have been found by Kuperberg, and by 
Razumov and Stroganov (see Razumov and Stroga- 
nov (2005) and references therein). 

A configuration in the six-vertex model is an 
orientation of edges of a 4-regular graph (i.e., at 
each vertex there meet exactly four edges) such that 
at each vertex two edges are oriented towards the 
vertex and two are oriented away from the vertex. 
Thus, there are six possible vertex configurations, 
giving the name of the model, see Figure 19. To go 
from one object to the other, one uses the transla- 
tion between local configurations at a vertex and 
entries in alternating-sign matrices indicated in the 
figure. An example of the correspondence can be 
found in Figure 20. 

Another manifestation of alternating-sign matrices 
and six-vertex configurations are fully packed loop 
configurations. A fully packed loop configuration on a 
graph is a collection of edges such that each vertex is 


Figure 19 The six vertex configurations.

Figure 20 (a) An alternating-sign matrix. (b) A six-vertex
configuration.


incident to exactly two edges. One obtains a fully 
packed loop configuration out of a six-vertex config- 
uration by dividing the square lattice into its even and 
odd sublattice denoted by A and B, respectively. 
Instead of arrows, only those edges are drawn that, 
on sublattice A, point inward and, on sublattice B, 
point outward. The reader is referred to de Gier 
(2005) and Di Francesco et al. (2004) for further 
reading. 

The story of alternating-sign matrices and their 
connection to the six-vertex model is given a vivid 
account in Bressoud (1999), with further important 
results by Kuperberg, Okada, Razumov and 
Stroganov, referenced in Razumov and Stroganov 
(2005). 

Fully packed loop configurations seem to play an 
important role in the explicit form of the ground- 
state vectors of certain Hamiltonians in the dense 
O(1) loop model. The corresponding conjectures are 
surveyed in de Gier (2005). There is important 
progress on these conjectures by Di Francesco and 
Zinn—Justin (2005, and references therein). 


Binomial Sums and Hypergeometric Series 


When dealing with enumerative problems, it is 
inevitable to deal with binomial sums, that is, sums 
in which the summands are products/quotients of 
binomial coefficients and factorials, such as, for 


example, 
S$) (> 2. 
E k n— k 
In most cases, the right environment in which one
should work is the theory of (generalized) hypergeometric
series. These are defined as follows:

$${}_rF_s\!\left[\begin{matrix} a_1, \dots, a_r \\ b_1, \dots, b_s \end{matrix}; z\right] = \sum_{k=0}^{\infty} \frac{(a_1)_k \cdots (a_r)_k}{k!\,(b_1)_k \cdots (b_s)_k}\, z^k$$

where $(a)_k = a(a+1)(a+2)\cdots(a+k-1)$ for $k > 0$,
and $(a)_0 = 1$. The symbol $(a)_k$ is called the
Pochhammer symbol or shifted factorial. For in-depth
treatments of the subject, we refer the reader
to Andrews et al. (1999), Gasper and Rahman
(2004), and Slater (1966).

Hypergeometric series can be characterized as
series in which the quotient of the $(k+1)$st by the
$k$th summand is a rational function in $k$. This is also
the way to convert binomial sums into their
hypergeometric form (or to see whether this is
possible; in most cases it is): form the quotient of the
$(k+1)$st by the $k$th summand, read off the
parameters $a_1, \dots, a_r$, $b_1, \dots, b_s$, and the argument $z$
from the factorization of the numerator and the
denominator polynomials of the rational function,
out of these form the corresponding hypergeometric
series, and multiply the series by the summand for
$k = 0$. This is, in fact, a completely routine task, and,
indeed, computer algebra programs such as Maple
and Mathematica do this automatically.
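
As a worked instance of this recipe, take the sum of $\binom{n}{k}^2$ over $k$, an example chosen here for illustration. The quotient of consecutive summands is $(k-n)^2/(k+1)^2$, so the upper parameters are $-n, -n$, the lower parameter is $1$, the argument is $z = 1$, and the summand for $k = 0$ is $1$. The Python sketch below checks this with exact rational arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def pochhammer(a, k):
    """Shifted factorial (a)_k = a (a+1) ... (a+k-1), with (a)_0 = 1."""
    result = Fraction(1)
    for i in range(k):
        result *= a + i
    return result


def hyper(upper, lower, z, terms):
    """Partial sum over k = 0, ..., terms - 1 of the series rFs[upper; lower; z]."""
    total = Fraction(0)
    for k in range(terms):
        term = Fraction(z) ** k / pochhammer(1, k)  # (1)_k equals k!
        for a in upper:
            term *= pochhammer(a, k)
        for b in lower:
            term /= pochhammer(b, k)
        total += term
    return total


def binomial_sum(n):
    """The binomial sum in its original form."""
    return sum(comb(n, k) ** 2 for k in range(n + 1))


n = 6
# Term ratio of the binomial summand, compared with (k - n)^2 / (k + 1)^2.
ratios_match = all(
    Fraction(comb(n, k + 1) ** 2, comb(n, k) ** 2) == Fraction((k - n) ** 2, (k + 1) ** 2)
    for k in range(n)
)
print(ratios_match, binomial_sum(n), hyper([-n, -n], [1], 1, n + 1))
```

Since $(-n)_k$ vanishes for $k > n$, the series terminates and $n + 1$ terms give its exact value.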

The reason why hypergeometric series are much 
more fundamental than the binomial sums them- 
selves is that there are hundreds of ways to write the 
same sum using binomial coefficients and factorials, 
whereas there is just one hypergeometric form, that 
is, hypergeometric series are a kind of normal form 
for binomial sums. In particular, given a specific 
binomial sum, it is a hopeless enterprise to scan 
through all the identities available in the literature 
for this sum. There may be an identity for it, but 
perhaps written differently. On the contrary, given a 
specific hypergeometric series, the list of available 
identities which apply to this series is usually not 
large, and tables of such identities can be set up in 
a systematic way. This has been done (cf. Slater 
(1966); the most comprehensive table available to 
this date is contained in the manual of 
the Mathematica package HYP - see “Further 
reading”), and scanning through these tables is 
largely facilitated by the use of the Mathematica 
package HYP. 

We give here some of the most important
identities for hypergeometric series. Aside from the
binomial theorem, the most important summation
formulas are: the Gauss ${}_2F_1$-summation formula

$${}_2F_1\!\left[\begin{matrix} a, b \\ c \end{matrix}; 1\right] = \frac{\Gamma(c)\,\Gamma(c-a-b)}{\Gamma(c-a)\,\Gamma(c-b)}$$

provided $\Re(c-a-b) > 0$,
the Pfaff–Saalschütz summation formula

$${}_3F_2\!\left[\begin{matrix} a, b, -n \\ c, 1+a+b-c-n \end{matrix}; 1\right] = \frac{(c-a)_n\,(c-b)_n}{(c)_n\,(c-a-b)_n}$$

provided $n$ is a non-negative integer, and
the Dougall summation formula

$${}_7F_6\!\left[\begin{matrix} a, a/2+1, b, c, d, 1+2a-b-c-d+n, -n \\ a/2, 1+a-b, 1+a-c, 1+a-d, -a+b+c+d-n, a+1+n \end{matrix}; 1\right] = \frac{(1+a)_n\,(1+a-b-c)_n\,(1+a-b-d)_n\,(1+a-c-d)_n}{(1+a-b)_n\,(1+a-c)_n\,(1+a-d)_n\,(1+a-b-c-d)_n}$$

provided $n$ is a non-negative integer.

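Some of the most important transformation
formulas are
the Euler transformation formula

$${}_2F_1\!\left[\begin{matrix} a, b \\ c \end{matrix}; z\right] = (1-z)^{c-a-b}\, {}_2F_1\!\left[\begin{matrix} c-a, c-b \\ c \end{matrix}; z\right]$$

provided $|z| < 1$,
the Kummer transformation formula

$${}_3F_2\!\left[\begin{matrix} a, b, c \\ d, e \end{matrix}; 1\right] = \frac{\Gamma(e)\,\Gamma(d+e-a-b-c)}{\Gamma(e-a)\,\Gamma(d+e-b-c)}\, {}_3F_2\!\left[\begin{matrix} a, d-b, d-c \\ d, d+e-b-c \end{matrix}; 1\right]$$

provided both series converge,
and the Whipple transformation formulas

$${}_4F_3\!\left[\begin{matrix} a, b, c, -n \\ e, f, 1+a+b+c-e-f-n \end{matrix}; 1\right] = \frac{(e-a)_n\,(f-a)_n}{(e)_n\,(f)_n}\, {}_4F_3\!\left[\begin{matrix} -n, a, 1+a+c-e-f-n, 1+a+b-e-f-n \\ 1+a+b+c-e-f-n, 1+a-e-n, 1+a-f-n \end{matrix}; 1\right] \qquad [68]$$

where $n$ is a non-negative integer, and

$${}_7F_6\!\left[\begin{matrix} a, a/2+1, b, c, d, e, -n \\ a/2, 1+a-b, 1+a-c, 1+a-d, 1+a-e, 1+a+n \end{matrix}; 1\right] = \frac{(1+a)_n\,(1+a-d-e)_n}{(1+a-d)_n\,(1+a-e)_n}\, {}_4F_3\!\left[\begin{matrix} 1+a-b-c, d, e, -n \\ 1+a-b, 1+a-c, -a+d+e-n \end{matrix}; 1\right] \qquad [69]$$

provided $n$ is a non-negative integer.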
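
Terminating identities of this kind can be spot-checked with exact rational arithmetic. The sketch below tests the Pfaff–Saalschütz summation and the Whipple transformation [68], in the forms ${}_3F_2[a,b,-n;\,c,\,1+a+b-c-n;\,1] = (c-a)_n (c-b)_n / ((c)_n (c-a-b)_n)$ and the balanced ${}_4F_3$ transformation, for sample rational parameters. The parameter values and function names are arbitrary choices for this example, and such a check is of course no substitute for a proof.

```python
from fractions import Fraction


def pochhammer(a, k):
    """Shifted factorial (a)_k."""
    result = Fraction(1)
    for i in range(k):
        result *= a + i
    return result


def hyper_terminating(upper, lower, n):
    """Sum of the terms k = 0, ..., n of rFs[upper; lower; 1]."""
    total = Fraction(0)
    for k in range(n + 1):
        term = Fraction(1) / pochhammer(1, k)
        for a in upper:
            term *= pochhammer(a, k)
        for b in lower:
            term /= pochhammer(b, k)
        total += term
    return total


def check_pfaff_saalschuetz(a, b, c, n):
    """Compare both sides of the Pfaff-Saalschuetz summation formula."""
    lhs = hyper_terminating([a, b, -n], [c, 1 + a + b - c - n], n)
    rhs = pochhammer(c - a, n) * pochhammer(c - b, n)
    rhs /= pochhammer(c, n) * pochhammer(c - a - b, n)
    return lhs == rhs


def check_whipple(a, b, c, e, f, n):
    """Compare both sides of the Whipple transformation [68]."""
    d = 1 + a + b + c - e - f - n
    lhs = hyper_terminating([a, b, c, -n], [e, f, d], n)
    factor = pochhammer(e - a, n) * pochhammer(f - a, n)
    factor /= pochhammer(e, n) * pochhammer(f, n)
    rhs = factor * hyper_terminating(
        [-n, a, 1 + a + c - e - f - n, 1 + a + b - e - f - n],
        [d, 1 + a - e - n, 1 + a - f - n],
        n,
    )
    return lhs == rhs


a, b, c = Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)
e, f = Fraction(2, 7), Fraction(3, 11)
print(check_pfaff_saalschuetz(a, b, c, 3), check_whipple(a, b, c, e, f, 3))
```

The non-integer parameters are chosen so that no Pochhammer symbol in a denominator vanishes.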


Since about 1990, automatic tools have been
available for the verification of binomial
and hypergeometric series identities. The book by Petkovšek et al. (1996) is an
excellent introduction to these aspects. The philosophy
is as follows. Suppose we are given a binomial
or hypergeometric series $S(n) = \sum_k F(n,k)$. The
Gosper–Zeilberger algorithm (see “Further reading”)
(cf. Petkovšek et al. (1996); a simplified
version was presented in the reference Zeilberger in
“Further reading”) will find a linear recurrence

$$A_0(n)S(n) + A_1(n)S(n+1) + \cdots + A_d(n)S(n+d) = C(n) \qquad [70]$$

for some $d$, where the coefficients $A_i(n)$ are
polynomials in $n$, and where $C(n)$ is a certain
function in $n$, with proof!

If, for example, we suspected that $S(n) = \mathrm{RHS}(n)$,
where $\mathrm{RHS}(n)$ is some closed-form expression, then
we just have to verify that $\mathrm{RHS}(n)$ satisfies the
recurrence [70] and check $S(n) = \mathrm{RHS}(n)$ for sufficiently
many initial values of $n$ to have a proof for
the identity $S(n) = \mathrm{RHS}(n)$ for all $n$. On the other
hand, if $\mathrm{RHS}(n)$ was a different sum, then we would
apply the algorithm to find a recurrence for $\mathrm{RHS}(n)$.
If it turns out to be the same recurrence then, again,
a check of $S(n) = \mathrm{RHS}(n)$ for a few initial values will
provide a full proof of $S(n) = \mathrm{RHS}(n)$ for all $n$.

Even in the case that we do not have a conjectured
expression $\mathrm{RHS}(n)$, this is not the end of the story.
Given a recurrence of the type [70], the Petkovšek
algorithm (see “Further reading”) (cf. Petkovšek et al.
(1996)) is able to find a closed-form solution (where
“closed form” has a precise meaning), or to tell
that there is no closed-form solution.

The fascinating point about both algorithms is
that neither do we have to know what the algorithm
does internally nor do we have to check that. For
the Petkovšek algorithm, this is obvious anyway
because, once the computer says that a certain
expression is a solution of [70], it is a routine matter
to check that. This is less obvious for the Gosper–Zeilberger
algorithm. However, what the Gosper–Zeilberger
algorithm does is, for a given sum
$S(n) = \sum_k F(n,k)$, it finds polynomials $A_0(n)$,
$A_1(n), \dots, A_d(n)$ and an expression $G(n,k)$ (which
is, in fact, a rational multiple of $F(n,k)$), such that

$$A_0(n)F(n,k) + A_1(n)F(n+1,k) + \cdots + A_d(n)F(n+d,k) = G(n,k+1) - G(n,k) \qquad [71]$$

for some $d$. Because of the properties of $F(n,k)$ and
$G(n,k)$, which are part of the theory, this is an
identity which can be directly verified by clearing all
common factors and checking the remaining identity
between rational functions in $n$ and $k$. However, we
may now sum both sides of [71] over $k$ to obtain a
recurrence of the form [70].
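
A very small instance shows the mechanism. For $F(n,k) = \binom{n}{k}$ one may take $A_0(n) = -2$, $A_1(n) = 1$ and $G(n,k) = -\binom{n}{k-1}$, which equals $-\frac{k}{n-k+1} F(n,k)$ for $0 \le k \le n$; this certificate is written down by hand for the illustration and is not the output of any particular implementation. The Python sketch below checks identity [71] for this choice and the recurrence obtained by summing over $k$.

```python
from math import comb


def F(n, k):
    """Summand F(n, k) = binomial(n, k), extended by zero outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def G(n, k):
    """Certificate G(n, k) = -binomial(n, k - 1)."""
    return -F(n, k - 1)


def certificate_holds(n, k):
    """Identity [71] with A_0(n) = -2 and A_1(n) = 1."""
    return -2 * F(n, k) + F(n + 1, k) == G(n, k + 1) - G(n, k)


def S(n):
    """The sum S(n) over all k of F(n, k)."""
    return sum(F(n, k) for k in range(n + 1))


all_ok = all(certificate_holds(n, k) for n in range(8) for k in range(-2, n + 4))
print(all_ok, [S(n + 1) - 2 * S(n) for n in range(6)])
```

Summing [71] over all $k$ makes the right-hand side telescope to zero, which gives $S(n+1) - 2S(n) = 0$; together with $S(0) = 1$ this proves $S(n) = 2^n$.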

Algorithms for multiple sums are also available
(see “Further reading”). They follow ideas by Wilf
and Zeilberger (1992) (of which a simplified
version is presented in a Mohammed and Zeilberger
preprint (see “Further reading”)); however, they
run into capacity problems more quickly. Schneider
(2005) is currently developing a very promising
new algorithmic approach to the automatic treatment
of multisums. See q-Special Functions and
Statistical Mechanics and Combinatorial Problems.


See also: Classical Groups and Homogeneous Spaces; 
Compact Groups and Their Representations; Dimer 
Problems; Growth Processes in Random Matrix Theory; 
Ordinary Special Functions; q-Special Functions; Saddle
Point Problems; Statistical Mechanics and Combinatorial 
Problems. 
In this article, we describe the structure and
representation theory of compact Lie groups.
Throughout the article, $G$ is a compact real Lie
group with Lie algebra $\mathfrak{g}$. Unless otherwise stated,
$G$ is assumed to be connected. The word “group”
will always mean a “Lie group” and the word
“subgroup” will mean a closed Lie subgroup. The
notation $\mathrm{Lie}(H)$ stands for the Lie algebra of a Lie
group $H$. We assume that the reader is familiar
with the basic facts of the theory of Lie groups and
Lie algebras, which can be found in Lie Groups:
General Theory, or in the books listed in the
bibliography.


Examples of Compact Lie Groups 
Examples of compact groups include 


• finite groups,

• quotient groups $T^n = \mathbb{R}^n/\mathbb{Z}^n$, or more generally,
$V/L$, where $V$ is a finite-dimensional real vector
space and $L$ is a lattice in $V$, that is, a discrete
subgroup generated by some basis in $V$; groups
of this type are called “tori”; it is known that
every commutative connected compact group is a
torus;

• unitary groups $\mathrm{U}(n)$ and special unitary groups
$\mathrm{SU}(n)$, $n \ge 2$;

• orthogonal groups $\mathrm{O}(n)$ and $\mathrm{SO}(n)$, $n \ge 3$; and

• the groups $\mathrm{U}(n, \mathbb{H})$, $n \ge 1$, of unitary quaternionic
transformations, which are isomorphic to
$\mathrm{Sp}(n) := \mathrm{Sp}(n, \mathbb{C}) \cap \mathrm{SU}(2n)$.





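The groups $\mathrm{O}(n)$ have two connected components,
one of which is $\mathrm{SO}(n)$. The groups $\mathrm{SU}(n)$ and $\mathrm{Sp}(n)$
are connected and simply connected.

The groups $\mathrm{SO}(n)$ are connected but not simply
connected: for $n \ge 3$, the fundamental group of
$\mathrm{SO}(n)$ is $\mathbb{Z}_2$. The universal cover of $\mathrm{SO}(n)$ is a
simply connected compact Lie group denoted by
$\mathrm{Spin}(n)$. For small $n$, we have isomorphisms:
$\mathrm{Spin}(3) \simeq \mathrm{SU}(2)$, $\mathrm{Spin}(4) \simeq \mathrm{SU}(2) \times \mathrm{SU}(2)$,
$\mathrm{Spin}(5) \simeq \mathrm{Sp}(2)$, and $\mathrm{Spin}(6) \simeq \mathrm{SU}(4)$.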


Relation to Semisimple Lie Algebras 
and Lie Groups 


Reductive Groups 
A Lie algebra $\mathfrak{g}$ is called

• “simple” if it is nonabelian and has no ideals
different from $\{0\}$ and $\mathfrak{g}$ itself;

• “semisimple” if it is a direct sum of simple ideals;
and

• “reductive” if it is a direct sum of semisimple and
commutative ideals.


We call a connected Lie group $G$ “simple” or
“semisimple” if $\mathrm{Lie}(G)$ has this property.


Theorem 1 Let $G$ be a connected compact Lie
group and $\mathfrak{g} = \mathrm{Lie}(G)$. Then

(i) The Lie algebra $\mathfrak{g} = \mathrm{Lie}(G)$ is reductive:
$\mathfrak{g} = \mathfrak{a} \oplus \mathfrak{g}'$, where $\mathfrak{a}$ is abelian and
$\mathfrak{g}' = [\mathfrak{g}, \mathfrak{g}]$ is semisimple.

(ii) The group $G$ can be written in the form
$G = (A \times K)/Z$, where $A$ is a torus, $K$ is a connected, simply
connected compact semisimple Lie group, and $Z$
is a finite central subgroup in $A \times K$.

(iii) If $G$ is simply connected, it is a product of
simple compact Lie groups.

The proof of these results is based on the fact that
the Killing form of $\mathfrak{g}$ is negative semidefinite.


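Example 1 The group $\mathrm{U}(n)$ contains as the center
the subgroup $C$ of scalar matrices. The quotient
group $\mathrm{U}(n)/C$ is simple and isomorphic to
$\mathrm{SU}(n)/\mathbb{Z}_n$. The presentation of Theorem 1 in this
case is

$$\mathrm{U}(n) = (T^1 \times \mathrm{SU}(n))/\mathbb{Z}_n = (C \times \mathrm{SU}(n))/(C \cap \mathrm{SU}(n))$$

For the group $\mathrm{SO}(4)$ the presentation is
$(\mathrm{SU}(2) \times \mathrm{SU}(2))/\{\pm(1 \times 1)\}$.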


This theorem effectively reduces the study of the 
structure of connected compact groups to the study 
of simply connected compact simple Lie groups. 


Complexification of a Compact Lie Group 


Recall that for a real Lie algebra $\mathfrak{g}$, its complexification
is $\mathfrak{g}_{\mathbb{C}} = \mathfrak{g} \otimes \mathbb{C}$ with obvious commutator. It
is also well known that $\mathfrak{g}_{\mathbb{C}}$ is semisimple or
reductive iff $\mathfrak{g}$ is semisimple or reductive, respectively.
There is a subtlety in the case of simple
algebras: it is possible that a real Lie algebra is
simple, but its complexification $\mathfrak{g}_{\mathbb{C}}$ is only semisimple.
However, this problem never arises for Lie
algebras of compact groups: if $\mathfrak{g}$ is a Lie algebra of a
real compact Lie group, then $\mathfrak{g}$ is simple if and only if
$\mathfrak{g}_{\mathbb{C}}$ is simple.

The notion of complexification for Lie groups is 
more delicate. 


Definition 1 Let $G$ be a connected real Lie group
with Lie algebra $\mathfrak{g}$. A complexification of $G$ is a
connected complex Lie group $G_{\mathbb{C}}$ (i.e., a complex
manifold with a structure of a Lie group such that
group multiplication is given by a complex analytic
map $G_{\mathbb{C}} \times G_{\mathbb{C}} \to G_{\mathbb{C}}$), which contains $G$ as a closed
subgroup, and such that $\mathrm{Lie}(G_{\mathbb{C}}) = \mathfrak{g}_{\mathbb{C}}$. In this case,
we will also say that $G$ is a real form of $G_{\mathbb{C}}$.


It is not obvious why such a complexification
exists at all; in fact, for an arbitrary real group it may
not exist. However, for compact groups we do have
the following theorem.


Theorem 2 Let $G$ be a connected compact Lie
group. Then it has a unique complexification $G_{\mathbb{C}} \supset G$.
Moreover, the following properties hold:

(i) The inclusion $G \subset G_{\mathbb{C}}$ is a homotopy equivalence.
In particular, $\pi_1(G) = \pi_1(G_{\mathbb{C}})$ and the
quotient space $G_{\mathbb{C}}/G$ is contractible.

(ii) Every complex finite-dimensional representation
of $G$ can be uniquely extended to a complex
analytic representation of $G_{\mathbb{C}}$.

Since the Lie algebra of a compact Lie group $G$ is
reductive, we see that $G_{\mathbb{C}}$ must be reductive; if $G$ is
semisimple or simple, then so is $G_{\mathbb{C}}$. The natural
question is whether every complex reductive group
can be obtained in this way. The following theorem
gives a partial answer.


Theorem 3 Every connected complex semisimple
Lie group $H$ has a compact real form: there is a
compact real subgroup $K \subset H$ such that $H = K_{\mathbb{C}}$.
Moreover, such a compact real form is unique up to
conjugation.


Example 2

(i) The unitary group $\mathrm{U}(n)$ is a compact real form
of the group $\mathrm{GL}(n, \mathbb{C})$.

(ii) The orthogonal group $\mathrm{SO}(n)$ is a compact real
form of the group $\mathrm{SO}(n, \mathbb{C})$.

(iii) The group $\mathrm{Sp}(n)$ is a compact real form of the
group $\mathrm{Sp}(n, \mathbb{C})$.

(iv) The universal cover of $\mathrm{GL}(n, \mathbb{C})$ has no compact
real form.


These results have a number of important applications.
For example, they show that the study of
representations of a semisimple complex group $H$
can be replaced by the study of representations of its
compact form; in particular, every representation is
completely reducible (this argument is known as
Weyl’s unitary trick).


Classification of Simple Compact Lie Groups 


Theorem 1 essentially reduces such classification to 
classification of simply connected simple compact 
groups, and Theorems 2 and 3 reduce it to the 
classification of simple complex Lie algebras. Since 
the latter is well known, we get the following result. 


Theorem 4 Let $G$ be a connected, simply connected
simple compact Lie group. Then $\mathfrak{g}_{\mathbb{C}}$ must be
a simple complex Lie algebra and thus can be
described by a Dynkin diagram of one of the following
types: $A_n$, $B_n$, $C_n$, $D_n$, $E_6$, $E_7$, $E_8$, $F_4$, $G_2$.

Conversely, for each Dynkin diagram in the above
list, there exists a unique, up to isomorphism, simply
connected simple compact Lie group whose Lie
algebra is described by this Dynkin diagram.


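For types $A_n, \dots, D_n$, the corresponding compact
Lie groups are well-known classical groups shown in
the table below:

$A_n$, $n \ge 1$      $\mathrm{SU}(n+1)$
$B_n$, $n \ge 2$      $\mathrm{Spin}(2n+1)$
$C_n$, $n \ge 3$      $\mathrm{Sp}(n)$
$D_n$, $n \ge 4$      $\mathrm{Spin}(2n)$

The restrictions on $n$ in this table are
made to avoid repetitions which appear for
small values of $n$. Namely, $A_1 = B_1 = C_1$, which
gives $\mathrm{SU}(2) = \mathrm{Spin}(3) = \mathrm{Sp}(1)$; $D_2 = A_1 \cup A_1$, which
gives $\mathrm{Spin}(4) = \mathrm{SU}(2) \times \mathrm{SU}(2)$; $B_2 = C_2$, which gives
$\mathrm{Spin}(5) = \mathrm{Sp}(2)$; and $A_3 = D_3$, which gives $\mathrm{SU}(4) =
\mathrm{Spin}(6)$. Other than that, all entries are distinct.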
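
A quick consistency check on these coincidences is to compare dimensions. The sketch below uses the standard dimension formulas $n(n+2)$, $n(2n+1)$, $n(2n+1)$ and $n(2n-1)$ for types $A_n$, $B_n$, $C_n$ and $D_n$; the function name is chosen for this example. Equal dimension is only a necessary condition for an isomorphism (types $B_n$ and $C_n$ always have the same dimension), so this is a sanity check rather than a proof.

```python
def dimension(series, n):
    """Dimension of the compact group of the given classical type and rank."""
    formulas = {
        "A": n * (n + 2),      # SU(n + 1)
        "B": n * (2 * n + 1),  # Spin(2n + 1)
        "C": n * (2 * n + 1),  # Sp(n)
        "D": n * (2 * n - 1),  # Spin(2n)
    }
    return formulas[series]


coincidences = {
    "A1 = B1 = C1": dimension("A", 1) == dimension("B", 1) == dimension("C", 1),
    "D2 = A1 + A1": dimension("D", 2) == 2 * dimension("A", 1),
    "B2 = C2": dimension("B", 2) == dimension("C", 2),
    "A3 = D3": dimension("A", 3) == dimension("D", 3),
}
print(coincidences)
```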

Exceptional groups $E_6, \dots, G_2$ also admit explicit
geometric and algebraic descriptions which are
related to the exceptional nonassociative algebra $\mathbb{O}$
of the so-called octonions (or Cayley numbers). For
example, the compact group of type $G_2$ can be
defined as a subgroup of $\mathrm{SO}(7)$ which preserves an
almost-complex structure on $S^6$. It can also be
described as the subgroup of $\mathrm{GL}(7, \mathbb{R})$ which
preserves one quadratic and one cubic form, or,
finally, as a group of all automorphisms of $\mathbb{O}$.


Maximal Tori 
Main Properties 
In this section, G is a compact connected Lie group. 


Definition 2 A “maximal torus” in $G$ is a maximal
connected commutative subgroup $T \subset G$.


The following theorem lists the main properties of 
maximal tori. 


Theorem 5

(i) For every element $g \in G$, there exists a maximal
torus $T \ni g$.

(ii) Any two maximal tori in $G$ are conjugate.

(iii) If $g \in G$ commutes with all elements of a
maximal torus $T$, then $g \in T$.

(iv) A connected subgroup $H \subset G$ is a maximal
torus iff the Lie algebra $\mathrm{Lie}(H)$ is a maximal
abelian subalgebra in $\mathrm{Lie}(G)$.


Example 3 Let G=U(n). Then the set T of 
diagonal unitary matrices is a maximal torus in G; 
moreover, every maximal torus is of this form after 
a suitable unitary change of basis. In particular, this 
implies that every element in G is conjugate to a 
diagonal matrix. 


Example 4 Let G=SO(3). Then the set D of 
diagonal matrices is a maximal commutative sub- 
group in G, but not a torus. Here D consists of four 
elements and is not connected. 


Maximal Tori and Cartan Subalgebras 


The study of maximal tori in compact Lie groups is
closely related to the study of Cartan subalgebras in
reductive complex Lie algebras. For the convenience of
readers, we briefly recall the appropriate definitions
here; details can be found in Serre (2001) or in Lie
Groups: General Theory.


Definition 3 Let a be a complex reductive Lie
algebra. A Cartan subalgebra h ⊂ a is a maximal
commutative subalgebra consisting of semisimple
elements.


Note that for general Lie algebras a Cartan subalgebra
is defined in a different way; however, for
reductive algebras the definition given above is
equivalent to the standard one.

A choice of a Cartan subalgebra gives rise to the
so-called root decomposition: if h ⊂ a is a Cartan
subalgebra in a complex reductive Lie algebra, then
we can write

\[ \mathfrak{a} = \mathfrak{h} \oplus \Bigl( \bigoplus_{\alpha \in R} \mathfrak{a}_\alpha \Bigr) \qquad [1] \]

where

\[ \mathfrak{a}_\alpha = \{ x \in \mathfrak{a} \mid \operatorname{ad} h.x = \langle \alpha, h \rangle x \ \ \forall h \in \mathfrak{h} \} \]
\[ R = \{ \alpha \in \mathfrak{h}^* - \{0\} \mid \mathfrak{a}_\alpha \neq 0 \} \subset \mathfrak{h}^* \]

The set R is called the “root system” of a with
respect to the Cartan subalgebra h; elements α ∈ R are
called “roots.” We will also frequently use the elements
α^∨ ∈ h defined by ⟨α^∨, β⟩ = 2(α, β)/(α, α), where ( , )
is a nondegenerate invariant bilinear form on a* and
⟨ , ⟩ is the pairing between a and a*. It can be shown
that α^∨ so defined does not depend on the choice of
the form ( , ).


Theorem 6 Let G be a connected compact Lie
group with Lie algebra g, and let T ⊂ G be a
maximal torus in G, t = Lie(T) ⊂ g. Let g_C, G_C be
the complexifications of g, G as in Theorem 2.

Let h = t_C ⊂ g_C. Then h is a Cartan subalgebra in
g_C, and the corresponding root system R ⊂ it*.
Conversely, every Cartan subalgebra in g_C can be
obtained as t_C for some maximal torus T ⊂ G.


Weights and Roots 


Let G be semisimple. Recall that the root lattice
Q ⊂ it* is the abelian group generated by the roots α ∈
R, and let the coroot lattice Q^∨ ⊂ it be the abelian
group generated by the coroots α^∨, α ∈ R. Define also
the weight and coweight lattices by

\[ P = \{ \lambda \in i\mathfrak{t}^* \mid \langle \alpha^\vee, \lambda \rangle \in \mathbb{Z} \ \ \forall \alpha \in R \} \subset i\mathfrak{t}^* \]
\[ P^\vee = \{ t \in i\mathfrak{t} \mid \langle t, \alpha \rangle \in \mathbb{Z} \ \ \forall \alpha \in R \} \subset i\mathfrak{t}, \]

where ⟨·,·⟩ is the pairing between t and the dual
vector space t*.

It follows from the definition of a root system that
we have inclusions

\[ Q \subset P \subset i\mathfrak{t}^*, \qquad Q^\vee \subset P^\vee \subset i\mathfrak{t} \qquad [2] \]

Both P and Q are lattices in it*; thus, the index (P : Q)
is finite. It can be computed explicitly: if α_i is a basis
of the root system, then the fundamental weights ω_i
defined by

\[ \langle \alpha_i^\vee, \omega_j \rangle = \delta_{ij} \]

form a basis of P. The simple roots α_i are related
to the fundamental weights ω_j by the Cartan matrix A:
α_i = Σ_j A_ij ω_j. Therefore, (P : Q) = (P^∨ : Q^∨) = |det A|.
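The index formula (P : Q) = |det A| can be checked mechanically. The following is a minimal sketch, not part of the original article: the helper names (`cartan_matrix_A`, `determinant`, `weight_root_index`) are hypothetical, and the Cartan matrix of type A_r (2 on the diagonal, −1 on the neighbouring diagonals) is assumed in its standard form.

```python
from fractions import Fraction


def cartan_matrix_A(rank):
    """Cartan matrix of type A_rank (assumed standard form)."""
    return [[2 if i == j else -1 if abs(i - j) == 1 else 0
             for j in range(rank)] for i in range(rank)]


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def weight_root_index(cartan):
    """Index (P : Q) = |det A| of the root lattice in the weight lattice."""
    return abs(determinant(cartan))
```

For type A_{n-1}, the Lie algebra of SU(n), this gives the index n, in agreement with the value of (P : Q) for SU(n).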

The definitions of P, Q, P^∨, Q^∨ also make sense when
g is reductive but not semisimple. However, in this
case they are no longer lattices: rk Q < dim t*, and P
is not discrete.

We can now give more precise information about 
the structure of the maximal torus. 


Lemma 1 Let T be a compact connected commutative
Lie group, and t = Lie(T) its Lie algebra. Then
the exponential map is surjective and the preimage
of the unit is a lattice L ⊂ t. There is an isomorphism
of Lie groups

\[ \exp\colon \mathfrak{t}/L \to T \]

In particular, T ≅ R^r/Z^r = T^r, r = dim T.
Let X(T) ⊂ it* be the lattice dual to (2πi)^{-1} L:

\[ X(T) = \{ \lambda \in i\mathfrak{t}^* \mid \langle \lambda, l \rangle \in 2\pi i \mathbb{Z} \ \ \forall l \in L \} \qquad [3] \]

It is called the “character lattice” for T (see the
subsection “Examples of representations”).


Theorem 7 Let G be a compact connected Lie
group, and let T ⊂ G be a maximal torus in G.

Then Q ⊂ X(T) ⊂ P. Moreover, the group G is
uniquely determined by the Lie algebra g and the
lattice X(T) ⊂ it*, which can be any lattice between
Q and P.


Corollary For a given complex semisimple Lie
algebra a, there are only finitely many (up to
isomorphism) compact connected Lie groups G
with g_C = a.

The largest of them is the simply connected group,
for which T = t/2πiQ^∨, X(T) = P; the smallest is the
so-called “adjoint group,” for which T = t/2πiP^∨,
X(T) = Q.


Example 5 Let G = U(n). Then it = {real diagonal
matrices}. Choosing the standard basis of matrix
units in it, we identify it ≅ R^n, which also allows us
to identify it* ≅ R^n. Under this identification,

\[ Q = \{ (\lambda_1, \dots, \lambda_n) \mid \lambda_i \in \mathbb{Z}, \ \textstyle\sum_i \lambda_i = 0 \} \]
\[ P = \{ (\lambda_1, \dots, \lambda_n) \mid \lambda_i \in \mathbb{R}, \ \lambda_i - \lambda_j \in \mathbb{Z} \} \]
\[ X(T) = \mathbb{Z}^n \]

Note that Q, P are not lattices: Q ≅ Z^{n-1},
P ≅ R × Z^{n-1}.


Now let G = SU(n). Then it* = R^n/R·(1, ..., 1),
and Q, P are the images of Q, P for G = U(n) in this
quotient. In this quotient they are lattices, and
(P : Q) = n. The character lattice in this case is
X(T) = P, since SU(n) is simply connected. The
adjoint group is PSU(n) = SU(n)/C, where C =
{λ·id | λ^n = 1} is the center of SU(n).


Weyl Group 


Let us fix a maximal torus T ⊂ G. Let N(T) ⊂ G be
the normalizer of T in G: N(T) = {g ∈ G | gTg^{-1} = T}.
For any g ∈ N(T) the transformation A(g): t ↦ gtg^{-1} is
an automorphism of T. According to Theorem 5, this
automorphism is trivial iff g ∈ T. So in fact, it is the
quotient group N(T)/T which acts on T.


Definition 4 The group W=N(T)/T is called the 
“Weyl group” of G. 


Since the Weyl group acts faithfully on t and t*, it 
is common to consider W as a subgroup in GL(t*). It 
is known that W is finite. 

The Weyl group can also be defined in terms of 
Lie algebra g and its complexification gc. 


Theorem 8 The Weyl group coincides with the
subgroup in GL(it*) generated by the reflections
s_α: x ↦ x − (2(α, x)/(α, α)) α, α ∈ R, where, as
before, ( , ) is a nondegenerate invariant bilinear
form on g*.
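The description of W by reflections lends itself to direct enumeration. The sketch below is an illustration under stated assumptions rather than the article's own construction: it uses only the reflections in the simple roots (which are known to generate the same group as all the s_α), writes them as integer matrices in the basis of simple roots via s_i(α_j) = α_j − A_ji α_i, and closes the set under multiplication. The function names are hypothetical, and the Cartan matrix is taken in the convention α_i = Σ_j A_ij ω_j.

```python
def simple_reflection(cartan, i):
    """Matrix of s_i in the basis of simple roots.

    Column j holds the image s_i(alpha_j) = alpha_j - cartan[j][i] * alpha_i.
    """
    r = len(cartan)
    m = [[1 if a == b else 0 for b in range(r)] for a in range(r)]
    for j in range(r):
        m[i][j] -= cartan[j][i]
    return tuple(tuple(row) for row in m)


def matmul(x, y):
    """Product of two square integer matrices stored as tuples of tuples."""
    r = len(x)
    return tuple(
        tuple(sum(x[a][k] * y[k][b] for k in range(r)) for b in range(r))
        for a in range(r)
    )


def weyl_group(cartan):
    """All elements of the group generated by the simple reflections."""
    r = len(cartan)
    identity = tuple(tuple(1 if a == b else 0 for b in range(r)) for a in range(r))
    generators = [simple_reflection(cartan, i) for i in range(r)]
    elements = {identity}
    frontier = [identity]
    while frontier:  # breadth-first closure; terminates because W is finite
        new = []
        for w in frontier:
            for s in generators:
                ws = matmul(w, s)
                if ws not in elements:
                    elements.add(ws)
                    new.append(ws)
        frontier = new
    return elements
```

For the Cartan matrix of type A2 this produces six elements, matching the symmetric group S_3 that appears as the Weyl group of U(3).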


Theorem 9 


(i) Two elements t1, t2 ∈ T are conjugate in G iff
t2 = w(t1) for some w ∈ W.

(ii) There exists a natural homeomorphism of
quotient spaces G/Ad G ≅ T/W, where Ad G
stands for the action of G on itself by conjugation.
(Note, however, that these quotient spaces are
not manifolds: they have singularities.)

(iii) Let us call a function f on G central if
f(hgh^{-1}) = f(g) for any g, h ∈ G. Then the
restriction map gives an isomorphism

{continuous central functions on G}
≅ {W-invariant continuous functions on T}


Example 6 Let G = U(n). The set of diagonal unitary
matrices is a maximal torus, and the Weyl group is the
symmetric group S_n acting on diagonal matrices by
permutations of entries. In this case, Theorem 9 shows
that if f(U) is a central function of a unitary matrix,
then f(U) = f̃(λ_1, ..., λ_n), where λ_i are the eigenvalues of
U and f̃ is a symmetric function in n variables.


Representations of Compact Groups 
Basic Notions 


By a representation of G we understand a pair
(π, V), where V is a complex vector space and π is
a continuous homomorphism G → Aut(V). This
notation is often shortened to π or V. In this article,
we only consider finite-dimensional (f.d.) representations;
in this case, the homomorphism π is
automatically smooth and even real-analytic.

We associate to any f.d. representation (π, V) of G
the representation (π_*, V) of the Lie algebra g = Lie(G)
which is just the derivative of the map π: G → Aut V at
the unit point e ∈ G. In terms of the exponential map,
we have the following commutative diagram:

\[ \begin{array}{ccc}
G & \xrightarrow{\ \pi\ } & \operatorname{Aut} V \\
{\scriptstyle \exp} \uparrow & & \uparrow {\scriptstyle \exp} \\
\mathfrak{g} & \xrightarrow{\ \pi_*\ } & \operatorname{End} V
\end{array} \]

Choosing a basis in V, we can write the operators
π(g) and π_*(X) in matrix form and consider π and π_*
as matrix-valued functions on G and g. The diagram
above means that

\[ \pi(\exp X) = e^{\pi_*(X)} \qquad [4] \]


Recall that if G is connected, simply connected, then 
every representation of g can be uniquely lifted to a 
representation of G. Thus, classification of repre- 
sentations of connected simply connected Lie groups 
is equivalent to the classification of representations 
of Lie algebras. 

Let (π_1, V_1) and (π_2, V_2) be two representations of
the same group G. An operator A ∈ Hom(V_1, V_2) is
called an “intertwining operator,” or simply an
“intertwiner,” if A ∘ π_1(g) = π_2(g) ∘ A for all g ∈ G.
Two representations are called “equivalent” if they
admit an invertible intertwiner. In this case, using an
appropriate choice of bases, we can write π_1 and π_2
by the same matrix-valued function.

Let (π, V) be a representation of G. If all operators
π(g), g ∈ G, preserve a subspace V_1 ⊂ V, then the
restrictions π_1(g) = π(g)|_{V_1} define a “subrepresentation”
(π_1, V_1) of (π, V). In this case, the quotient
space V_2 = V/V_1 also has a canonical structure of a
representation, called the “quotient representation.”


A representation (π, V) is called “reducible” if it
has a nontrivial (different from V and {0}) subrepresentation.
Otherwise it is called “irreducible.”

We call a representation (π, V) “unitary” if V is a
Hilbert space and all operators π(g), g ∈ G, are
unitary, that is, given by unitary matrices in any
orthonormal basis. We use the short term “unirrep”
for a “unitary irreducible representation.”


Main Theorems 


The following simple but important result was one 
of the first discoveries in representation theory. It 
holds for representations of any group, not necessa- 
rily compact. 


Theorem 10 (Schur lemma). Let (π_i, V_i), i = 1, 2, be
any two irreducible finite-dimensional representations
of the same group G. Then any intertwiner
A: V_1 → V_2 is either invertible or zero.


Corollary 1 If V is an irreducible f.d. representation,
then any intertwiner A: V → V is scalar: A = c·id, c ∈ C.


Corollary 2 Every irreducible representation of a 
commutative group is one dimensional. 


The following theorem is one of the fundamental 
results of the representation theory of compact 
groups. Its proof is based on the technique of 
invariant integrals on a compact group, which will 
be discussed in the next section. 


Theorem 11 


(i) Any f.d. representation of a compact group is
equivalent to a unitary representation.

(ii) Any f.d. representation is completely reducible:
it can be decomposed into a direct sum

\[ V = \bigoplus_i n_i V_i \]

where V_i are pairwise nonequivalent unirreps.
The numbers n_i ∈ Z_+ are called “multiplicities.”


Examples of Representations 


The representation theory looks rather different for
abelian (i.e., commutative) and nonabelian groups.
Here we consider the simplest example of each kind.

Our first example is a one-dimensional compact
connected Lie group. Topologically, it is a circle
which we realize as the set T^1 ≅ U(1) of all complex
numbers t with absolute value 1.

Every unirrep of T^1 is one dimensional; thus, it is
just a continuous multiplicative map π of T^1 to itself.
It is well known that every such map has the form

\[ \pi_k(t) = t^k \quad \text{for some } k \in \mathbb{Z} \]

The collection of all unirreps of T^1 is itself a group,
called the “Pontrjagin dual” of T^1 and denoted by
T̂^1. This group is isomorphic to Z.

By Theorem 11, any f.d. representation π of T^1 is
equivalent to a direct sum of one-dimensional
unirreps. So, the equivalence class of π is defined by
the multiplicity function μ on T̂^1 = Z taking nonnegative
values:

\[ \pi \simeq \bigoplus_{k \in \mathbb{Z}} \mu(k) \cdot \pi_k \]

The many-dimensional case of a compact connected
abelian Lie group can be treated in a similar way.
Let T be a torus, that is, a connected abelian compact
Lie group, t = Lie(T). Then every irreducible representation
of T is one dimensional and thus is defined by a
group homomorphism χ: T → T^1 = U(1). Such
homomorphisms are called “characters” of T. One
easily sees that such characters themselves form a
group (the Pontrjagin dual of T). If we denote by L the
kernel of the exponential map t → T (see Lemma 1),
one easily sees that every character has the form

\[ \chi(\exp(t)) = e^{\langle t, \lambda \rangle}, \quad t \in \mathfrak{t}, \ \lambda \in X(T) \]

where X(T) ⊂ it* is the lattice defined by [3]. Thus,
we can identify the group of characters T̂ with X(T).
In particular, this shows that T̂ ≅ Z^{dim T}.

The second example is the group G = SU(2), the
simplest connected, simply connected nonabelian
compact Lie group. Topologically, G is a three-dimensional
sphere since the general element of G is
a matrix of the form

\[ g = \begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix}, \qquad a, b \in \mathbb{C}, \ |a|^2 + |b|^2 = 1 \]

Let V be a two-dimensional complex vector space,
realized by column vectors (u, v)^T. The group G acts
naturally on V. This action induces the representation
Π of G in the space S(V) of all polynomials in
u, v. It is infinite dimensional, but has many f.d.
subrepresentations. In particular, let S^k(V), or
simply S^k, be the space of all homogeneous
polynomials of degree k. Clearly, dim S^k = k + 1.

It turns out that the corresponding f.d. representations
(Π_k, S^k), k ≥ 0, are irreducible, pairwise nonequivalent,
and exhaust the set Ĝ of all unirreps.

Some particular cases are of special interest:

1. k = 0. The space S^0 consists of constant functions
and Π_0 is the trivial one-dimensional representation:
Π_0(g) = 1.

2. k = 1. The space S^1 is identical to V and Π_1 is
just the tautological representation π(g) = g.

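3. k = 2. The space S^2 is spanned by the monomials
u^2, uv, v^2. The remarkable fact is that this
representation is equivalent to a real one. Namely,
letting g act on the basis vectors u, v of V by
g·u = au − b̄v, g·v = bu + āv, in the new basis

\[ x = u^2 + v^2, \qquad y = 2iuv, \qquad z = i(u^2 - v^2) \]

we have

\[ \Pi_2 \begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix} = \begin{pmatrix}
\operatorname{Re}(a^2 + b^2) & -2\operatorname{Im}(ab) & \operatorname{Im}(b^2 - a^2) \\
2\operatorname{Im}(\bar{a}b) & |a|^2 - |b|^2 & -2\operatorname{Re}(\bar{a}b) \\
\operatorname{Im}(a^2 + b^2) & 2\operatorname{Re}(ab) & \operatorname{Re}(a^2 - b^2)
\end{pmatrix} \]

This formula defines a homomorphism Π_2: SU(2) →
SO(3). It can be shown that this homomorphism is
surjective, and its kernel is the subgroup
{±1} ⊂ SU(2):

\[ 1 \to \{\pm 1\} \to SU(2) \xrightarrow{\ \Pi_2\ } SO(3) \to 1 \]

The simplest way to see it is to establish the
equivalence of Π_2 with the adjoint representation
of G in g. The corresponding intertwiner is

\[ \begin{pmatrix} -i\beta & \alpha + i\gamma \\ -\alpha + i\gamma & i\beta \end{pmatrix} \mapsto (\alpha + i\gamma) u^2 + 2i\beta uv + (\alpha - i\gamma) v^2 \]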

Note that SU(2) and SO(3) are the only compact 
groups associated with the Lie algebra sl(2, C). 

The group G contains the subgroup H of diagonal
matrices, isomorphic to T^1. Consider the restriction
of Π_n to T^1. It splits into the sum of unirreps π_k as
follows:

\[ \operatorname{Res}^G_{T^1} \Pi_n = \bigoplus_{s=0}^{n} \pi_{n-2s} \]

The characters π_k which enter this decomposition
are called the weights of Π_n. The collection of all
weights (together with multiplicities) forms a multiset
in T̂^1 denoted by P(Π_n) or P(S^n).

Note the following features of this multiset: 


1. P(Π_n) is invariant under the reflection k ↦ −k.

2. All weights of Π_n are congruent modulo 2.

3. Nonequivalent unirreps have different multisets
of weights.


Below we show how these features are generalized 
to all compact connected Lie groups. 
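The three features listed above can be verified directly for small n. This is a minimal sketch with a hypothetical helper `weights_of_pi`; it assumes only that the monomial u^{n−s} v^s is scaled by t^{n−2s} under the diagonal matrix diag(t, t^{-1}), so that the weights of Π_n are n, n − 2, ..., −n, each with multiplicity one.

```python
from collections import Counter


def weights_of_pi(n):
    """Multiset of weights of Pi_n restricted to the diagonal torus.

    The monomial u^(n-s) v^s contributes the weight n - 2s, s = 0..n.
    """
    return Counter(n - 2 * s for s in range(n + 1))


def is_reflection_invariant(weights):
    """Feature 1: the multiset is unchanged under k -> -k."""
    return all(weights[k] == weights[-k] for k in weights)


def same_parity(weights):
    """Feature 2: all weights are congruent modulo 2."""
    return len({k % 2 for k in weights}) == 1
```

The total multiplicity `sum(weights_of_pi(n).values())` equals n + 1, the dimension of S^n.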


Fourier Transform 
Haar Measure and Invariant Integral 


The important feature of compact groups is the 
existence of the so-called “invariant integral,” or 
“average.” 


Theorem 12 For every compact Lie group G, there
exists a unique measure dg on G, called the “Haar
measure,” which is invariant under left shifts
L_g: h ↦ gh and satisfies ∫_G dg = 1.

In addition, this measure is also invariant under
right shifts h ↦ hg and under the involution h ↦ h^{-1}.


Invariance of the Haar measure implies that for
every integrable function f(g), we have

\[ \int_G f(g)\,dg = \int_G f(hg)\,dg = \int_G f(gh)\,dg = \int_G f(g^{-1})\,dg \]

For a finite group G, the integral with respect to
the Haar measure is just averaging over the group:

\[ \int_G f(g)\,dg = \frac{1}{|G|} \sum_{g \in G} f(g) \]
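For a finite group the invariance properties of the averaging integral can be tested exhaustively. The sketch below is an illustration only, using the symmetric group S_3 as an example group; the names `compose`, `inverse` and `haar_integral` are hypothetical helpers, with permutations stored as tuples.

```python
from fractions import Fraction
from itertools import permutations


def compose(p, q):
    """Composition (p o q)(i) = p[q[i]] of permutations stored as tuples."""
    return tuple(p[i] for i in q)


def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def haar_integral(f, group):
    """Integral of f over a finite group: the plain average of its values."""
    return Fraction(sum(f(g) for g in group), len(group))


S3 = list(permutations(range(3)))
```

With any integer-valued test function f on S_3, the averages of f(hg), f(gh) and f(g^{-1}) over g all coincide with the average of f(g).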


For compact connected Lie groups, the Haar
measure is given by a differential form of top degree
which is invariant under right and left translations.

For a torus T^n = R^n/Z^n with real coordinates θ_k ∈
R/Z or complex coordinates t_k = e^{2πiθ_k}, the Haar
measure is d^nθ := dθ_1 dθ_2 ··· dθ_n, or

\[ d^n t := \prod_{k=1}^{n} \frac{dt_k}{2\pi i\, t_k} \]



In particular, consider a central function f (see
Theorem 9). Since every conjugacy class contains
elements of the maximal torus T (see Theorem 5),
such a function is determined by its values on T, and
the integral of a central function can be reduced to
integration over T. The resulting formula is called the
“Weyl integration formula.” For G = U(n) it looks
as follows:

\[ \int_{U(n)} f(g)\,dg = \frac{1}{n!} \int_T f(t) \prod_{i<j} |t_i - t_j|^2 \, d^n t \]

where T is the maximal torus consisting of diagonal
matrices

\[ t = \operatorname{diag}(t_1, \dots, t_n), \qquad t_k = e^{2\pi i \theta_k} \]

and d^n t is defined above.
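The right-hand side of the Weyl integration formula for U(n) is an ordinary integral over angles and can be evaluated numerically. The following sketch is an assumption-laden illustration, not the article's method: the function name `weyl_integral` is hypothetical, the central function is supplied through its values on the eigenvalues (t_1, ..., t_n), and the torus integral is replaced by a uniform Riemann sum, which is exact for trigonometric polynomials of degree below the number of grid points.

```python
import cmath
from itertools import product
from math import factorial


def weyl_integral(f, n, points=8):
    """Approximate the integral over U(n) of a central function.

    f takes the list of eigenvalues [t_1, ..., t_n] on the unit circle.
    The torus integral is a uniform Riemann sum with `points` samples
    per angle; the weight is prod_{i<j} |t_i - t_j|^2, divided by n!.
    """
    total = 0.0
    for idx in product(range(points), repeat=n):
        t = [cmath.exp(2j * cmath.pi * k / points) for k in idx]
        density = 1.0
        for i in range(n):
            for j in range(i + 1, n):
                density *= abs(t[i] - t[j]) ** 2
        total += f(t) * density
    return total / (points ** n * factorial(n))
```

Taking f = 1 recovers the normalization of the Haar measure, and f = |tr g|^2 gives the squared norm of the character of the defining representation, which should equal 1 since that representation is irreducible.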

The Weyl integration formula for an arbitrary compact
group G can be found in Simon (1996) or Bump
(2004, section 18).

The main applications of the Haar measure are the
proof of the complete reducibility theorem (Theorem 11)
and the orthogonality relations (see below).


Orthogonality Relations and Peter—Weyl Theorem 


Let V_1, V_2 be unirreps of a compact group G.
Taking any linear operator A: V_1 → V_2 and averaging
the expression A(g) := π_2(g^{-1}) ∘ A ∘ π_1(g) over
G, we get an intertwining operator ⟨A⟩ = ∫_G A(g) dg.
Comparing this fact with the Schur lemma, one
obtains the following fundamental results.

Let (π, V) be any unirrep of a compact group G.
Choose any orthonormal basis {v_k, 1 ≤ k ≤ dim V}
in V and denote by t^V_{kl} or t^π_{kl} the function on G
defined by

\[ t^V_{kl}(g) = (\pi(g) v_l, v_k) \]

The functions t^V_{kl} are called “matrix elements” of the
unirrep (π, V).


Theorem 13 (Orthogonality relations) 


(i) The matrix elements t^V_{kl} are pairwise orthogonal
and have norm (dim V)^{-1/2} in L^2(G, dg).

(ii) The matrix elements corresponding to equivalent
unirreps span the same subspace in
L^2(G, dg).

(iii) The matrix elements of two nonequivalent
unirreps are orthogonal.

(iv) The linear span of all matrix elements of all
unirreps is dense in C(G), C^∞(G), and in
L^2(G, dg) (generalized Peter–Weyl theorem).


In particular, this theorem implies that the set Ĝ of
equivalence classes of unirreps is countable. For an
f.d. representation (π, V) we introduce the character
of π as the function

\[ \chi_\pi(g) = \operatorname{tr} \pi(g) = \sum_{k=1}^{\dim V} t^V_{kk}(g) \qquad [5] \]

It is obviously a central function on G.


Remark Traditionally, in representation theory 
the word “character” has two different meanings: 
(1) a multiplicative map from a group to U(1), and 
(2) the trace of a representation operator 7(g). For 
one-dimensional representations both notions 
coincide. 


From the orthogonality relations we get the 
following result. 


Corollary The characters of unirreps of G form an
orthonormal basis in the subspace of central functions
in L^2(G, dg).
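Orthonormality of characters is easy to confirm for a small finite group, which is a compact group with the averaging Haar measure. The sketch below is an illustration under assumptions not drawn from the text: it uses S_3 with its three standard irreducible characters (trivial, sign, and the two-dimensional "standard" character, computed as the number of fixed points minus one), and the names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

S3 = list(permutations(range(3)))


def sign(p):
    """Sign of a permutation, computed from its number of inversions."""
    inversions = sum(
        1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j]
    )
    return -1 if inversions % 2 else 1


def fixed_points(p):
    """Number of points left in place by the permutation."""
    return sum(1 for i, image in enumerate(p) if i == image)


# Assumed character table of S_3 (all characters are real-valued).
CHARACTERS = {
    "trivial": lambda p: 1,
    "sign": sign,
    "standard": lambda p: fixed_points(p) - 1,
}


def inner_product(chi1, chi2, group):
    """L^2 inner product with respect to the normalized Haar measure.

    No complex conjugation is needed because these characters are real.
    """
    return Fraction(sum(chi1(g) * chi2(g) for g in group), len(group))
```

The Gram matrix of the three characters with respect to `inner_product` is the identity matrix, as the corollary predicts.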


Noncommutative Fourier Transform 


The noncommutative Fourier transform on a compact
group G is defined as follows. Let Ĝ denote the
set of equivalence classes of unirreps of G. Choose
for any λ ∈ Ĝ a representation (π_λ, V_λ) of class λ
and an orthonormal basis in V_λ. Denote by d(λ) the
dimension of V_λ.

We introduce the Hilbert space L^2(Ĝ) as the space
of matrix-valued functions on Ĝ whose value at a point
λ ∈ Ĝ belongs to Mat_{d(λ)}(C). The norm is defined as

\[ \|F\|^2_{L^2(\hat{G})} = \sum_{\lambda \in \hat{G}} d(\lambda) \cdot \operatorname{tr}\bigl( F(\lambda) F(\lambda)^* \bigr) \]

For a function f on G define its Fourier transform f̂
as a matrix-valued function on Ĝ:

\[ \hat{f}(\lambda) = \int_G f(g^{-1}) \pi_\lambda(g)\,dg \]

Note that in the case G = T^1 this transform
associates to a function f the set of its Fourier
coefficients. In general this transform keeps some
important features of Fourier coefficients.

Theorem 14 


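(i) For a function f ∈ L^1(G, dg) the Fourier transform
f̂ is a well-defined and bounded (in matrix norm)
function on Ĝ.

(ii) For a function f ∈ L^1(G, dg) ∩ L^2(G, dg) the
following analog of the Plancherel formula holds:

\[ \|f\|^2_{L^2(G, dg)} = \sum_{\lambda \in \hat{G}} d(\lambda) \cdot \operatorname{tr}\bigl( \hat{f}(\lambda) \hat{f}(\lambda)^* \bigr) = \|\hat{f}\|^2_{L^2(\hat{G})} \]

(iii) The following inversion formula expresses f in
terms of f̂:

\[ f(g) = \sum_{\lambda \in \hat{G}} d(\lambda) \cdot \operatorname{tr}\bigl( \hat{f}(\lambda) \pi_\lambda(g) \bigr) \]

(iv) The Fourier transform sends the convolution to
matrix multiplication:

\[ \widehat{f_1 * f_2} = \hat{f}_2 \cdot \hat{f}_1 \]

where the convolution product * is defined by

\[ (f_1 * f_2)(h) = \int_G f_1(hg) f_2(g^{-1})\,dg \]

Note the special case of the inversion formula for
g = e:

\[ f(e) = \sum_{\lambda \in \hat{G}} d(\lambda) \cdot \operatorname{tr} \hat{f}(\lambda), \qquad \text{or} \qquad \delta(g) = \sum_{\lambda \in \hat{G}} d(\lambda) \cdot \chi_\lambda(g) \]

where δ(g) is Dirac’s delta-function: ∫_G f(g)
δ(g) dg = f(e). Thus, we get a presentation of Dirac’s
delta-function as a linear combination of characters.


Classification of Finite-Dimensional
Representations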


In this section, we give a classification of unirreps of 
a connected compact Lie group G. 


Weight Decomposition 


Let G be a connected compact group with maximal
torus T, and let (π, V) be a f.d. representation of G.
Restricting it to T and using complete reducibility,
we get the following result.


Theorem 15 The vector space V can be written in
the form

\[ V = \bigoplus_{\lambda \in X(T)} V_\lambda, \qquad V_\lambda = \{ v \in V \mid \pi_*(t) v = \langle \lambda, t \rangle v \ \ \forall t \in \mathfrak{t} \} \qquad [6] \]

where X(T) is the character group of T defined by [3].
The spaces V_λ are called “weight subspaces,” and
vectors v ∈ V_λ are called “weight vectors” of weight λ. The set

\[ P(V) = \{ \lambda \in X(T) \mid V_\lambda \neq 0 \} \qquad [7] \]

is called the “set of weights” of π, or the “spectrum”
of Res^G_T V, and

\[ \operatorname{mult}_{(\pi, V)}(\lambda) = \dim V_\lambda \]

is called the “multiplicity” of λ in V.


The next theorem easily follows from the defini- 
tion of the Weyl group. 


Theorem 16 For any f.d. representation V of G,
the set of weights with multiplicities is invariant
under the action of the Weyl group:

\[ \operatorname{mult}_{(\pi, V)}(\lambda) = \operatorname{mult}_{(\pi, V)}(w(\lambda)) \]

for any w ∈ W.


Classification of Unirreps 


Recall that R is the root system of g_C. Assume that
we have chosen a basis of simple roots α_1, ..., α_r ∈
R. Then R = R_+ ∪ R_−; roots α ∈ R_+ can be written
as a linear combination of simple roots with nonnegative
integer coefficients, and R_− = −R_+.

A (not necessarily f.d.) representation of g_C is
called a “highest-weight representation” if it is
generated by a single vector v ∈ V_λ (the highest-weight
vector) such that g_α v = 0 for all positive
roots α ∈ R_+.

It can be shown that for every λ ∈ X(T), there is a
unique irreducible highest-weight representation of
g_C with highest weight λ, which is denoted L(λ).
However, this representation can be infinite dimensional;
moreover, it may not be possible to lift it to a
representation of G.


Definition 5 A weight λ ∈ X(T) is called “dominant”
if ⟨λ, α_i^∨⟩ ∈ Z_+ for any simple root α_i. The set
of all dominant weights is denoted by X_+(T).


Theorem 17 


(i) All weights of L(λ) are of the form μ = λ − Σ_i n_i α_i,
n_i ∈ Z_+.

(ii) Let λ ∈ X_+. Then the irreducible highest-weight
representation L(λ) is f.d. and lifts to a
representation of G.

(iii) Every irreducible f.d. representation of G is of
the form L(λ) for some λ ∈ X_+.


Thus, we have a bijection {unirreps of G} ↔ X_+.


Example 7 Let G = SU(2). There is a unique simple
root α and a unique fundamental weight ω, related
by α = 2ω. Therefore, X_+ = Z_+·ω and unirreps are
indexed by non-negative integers. The representation
with highest weight k·ω is precisely the
representation Π_k constructed in the subsection
“Examples of representations.”


Example 8 Let G = U(n). Then X = Z^n, and X_+ =
{(λ_1, ..., λ_n) ∈ Z^n | λ_1 ≥ ··· ≥ λ_n}. Such objects are
well known in combinatorics: if we additionally
assume that λ_n ≥ 0, then such dominant weights are
in bijection with partitions with n parts. They can
also be described by “Young diagrams” with n rows
(see Fulton and Harris (1991)).


Explicit Construction of Representations 


In addition to the description of unirreps as highest-weight
representations, they can also be constructed
in other ways. In particular, they can be defined
analytically as follows. Let B = HN_+ be the
Borel subgroup in G_C; here H = exp h and
N_+ = exp ⊕_{α ∈ R_+} (g_C)_α. For λ ∈ h*, let χ_λ: B → C^×
be the multiplicative map defined by

\[ \chi_\lambda(\exp(h)\, n) = e^{\langle h, \lambda \rangle}, \qquad h \in \mathfrak{h}, \ n \in N_+ \qquad [8] \]


Theorem 18 (Cartan–Borel–Weil). Let λ ∈ X(T).
Denote by V(λ) the space of complex-analytic
functions on G_C which satisfy the following transformation
property:

\[ f(gb) = \chi_\lambda^{-1}(b) f(g), \qquad g \in G_C, \ b \in B \]

The group G_C acts on V(λ) by left shifts:

\[ (\pi(g) f)(h) = f(g^{-1} h) \qquad [9] \]

Then

(i) V(λ) ≠ {0} iff −λ ∈ X_+.

(ii) If −λ ∈ X_+, the representation of G in V(λ) is
equivalent to L(w_0(λ)), where w_0 ∈ W is the
unique element of the Weyl group which sends
R_+ to R_−.


This theorem can also be reformulated in more
geometric terms: the spaces V(λ) are naturally
interpreted as spaces of global sections of appropriate
line bundles on the “flag variety”
ℬ = G_C/B = G/T.

For classical groups, irreducible representations
can also be constructed explicitly as subspaces in
tensor powers (C^n)^{⊗k}, transforming in a certain way
under the action of the symmetric group S_k.


Characters and Multiplicities 
Characters 


Let (π, V) be a f.d. representation of G and let χ_π be
its character as defined by [5]. Since χ_π is central,
and every element in G is conjugate to an element of
T, χ_π is completely determined by its restriction to
T, which can be computed from the weight decomposition
[6]:

\[ \chi_\pi|_T = \sum_{\lambda \in X(T)} \dim V_\lambda \cdot e^\lambda = \sum_{\lambda \in X(T)} \operatorname{mult}_\pi \lambda \cdot e^\lambda \qquad [10] \]

where e^λ is the function on T defined by
e^λ(exp(t)) = e^{⟨t, λ⟩}, t ∈ t. Note that e^{λ+μ} = e^λ e^μ and
that e^0 = 1.


Weyl Character Formula 





Theorem 19 (Weyl character formula). Let λ ∈ X_+.
Then

\[ \chi_{L(\lambda)} = \frac{A_{\lambda + \rho}}{A_\rho}, \qquad A_\mu = \sum_{w \in W} \varepsilon(w) e^{w(\mu)} \]

where, for w ∈ W, we denote ε(w) = det w, considered
as a linear map t* → t*, and ρ = (1/2) Σ_{α ∈ R_+} α.


In particular, computing the value of the character
at the point t = 0 by L’Hôpital’s rule, it is possible to
deduce the following formula for the dimension of
irreducible representations:

\[ \dim L(\lambda) = \prod_{\alpha \in R_+} \frac{(\alpha, \lambda + \rho)}{(\alpha, \rho)} \qquad [11] \]

Example 9 Let G = SU(2). Then the Weyl character
formula gives, for the irreducible representation Π_k with
highest weight k·ω,

\[ \chi_{\Pi_k} = \frac{x^{k+1} - x^{-(k+1)}}{x - x^{-1}} = x^k + x^{k-2} + \dots + x^{-k}, \qquad x = e^\omega \]

which implies dim Π_k = k + 1.
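The dimension formula [11] is straightforward to evaluate once the positive roots are listed. The sketch below is a hedged illustration with hypothetical names: it uses the equivalent form of the ratio in terms of coroots, ⟨α^∨, λ + ρ⟩/⟨α^∨, ρ⟩, writes the highest weight in the basis of fundamental weights (so that ρ has all coordinates equal to 1), and hard-codes the positive coroots of types A1 and A2 as coefficient vectors over the simple coroots.

```python
from fractions import Fraction

# Positive coroots as coefficient vectors over the simple coroots
# (assumed data for types A1 and A2 only).
POSITIVE_COROOTS = {
    "A1": [(1,)],
    "A2": [(1, 0), (0, 1), (1, 1)],
}


def weyl_dimension(highest_weight, positive_coroots):
    """Dimension of L(lambda) by the Weyl dimension formula.

    highest_weight: coordinates (m_1, ..., m_r) of lambda in the basis of
    fundamental weights, so <alpha_i^vee, lambda + rho> = m_i + 1.
    """
    dim = Fraction(1)
    for coroot in positive_coroots:
        numerator = sum(c * (m + 1) for c, m in zip(coroot, highest_weight))
        denominator = sum(coroot)  # <alpha^vee, rho>
        dim *= Fraction(numerator, denominator)
    return dim
```

For type A1 and highest weight k·ω this reproduces dim Π_k = k + 1, and for type A2 it gives the familiar dimensions 3, 6, 8, 10 of small SU(3) representations.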


The Weyl character formula is equivalent to the following
formula for weight multiplicities, due to Kostant:

\[ \operatorname{mult}_{L(\lambda)} \mu = \sum_{w \in W} \varepsilon(w) K\bigl( w(\lambda + \rho) - \rho - \mu \bigr) \]

where K is Kostant’s partition function: K(τ) is the
number of ways of writing τ as a sum of positive
roots (with repetitions).

For classical Lie groups such as G = U(n), there are
more explicit combinatorial formulas for weight multiplicities;
for U(n), the answer can be written in terms of
the number of “Young tableaux” of a given shape.
Details can be found in Fulton and Harris (1991).


Tensor Product Multiplicities 


Let (π, V) be a f.d. representation of G. By complete
reducibility, one can write V = ⊕_λ n_λ L(λ). The coefficients
n_λ are called multiplicities; finding them is an
important problem in many applications. In particular,
a special case of this is finding the multiplicities
in the tensor product of two unirreps:

\[ L(\lambda) \otimes L(\mu) = \bigoplus_\nu N^\nu_{\lambda\mu} L(\nu) \]

Characters provide a practical tool for computing
multiplicities: since characters of unirreps are linearly
independent, multiplicities can be found from
the condition that χ_V = Σ_λ n_λ χ_{L(λ)}. In particular,

\[ \chi_{L(\lambda)} \chi_{L(\mu)} = \sum_\nu N^\nu_{\lambda\mu} \chi_{L(\nu)} \]


Example 10 For G = SU(2), tensor product multiplicities
are given by

\[ \Pi_n \otimes \Pi_m = \bigoplus_l \Pi_l \]

where the sum is taken over all l such that |m − n| ≤
l ≤ m + n and m + n + l is even.


For G = U(n), there is an algorithm for finding the 
tensor product multiplicities, formulated in the 
language of Young tableaux (Littlewood—Richardson 
rule). There are also tables and computer programs 
for computing these multiplicities; some of them are 
listed in the bibliography. 
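For SU(2) the character method of this subsection can be carried out in a few lines. The sketch below is an illustration with hypothetical function names, not one of the programs referred to above: characters of Π_k are stored as Laurent polynomials in x (a mapping from exponent to coefficient), the product of two characters is computed, and irreducible characters are peeled off starting from the highest exponent.

```python
from collections import Counter


def character(k):
    """Character of Pi_k: x^k + x^(k-2) + ... + x^(-k), as exponent -> coefficient."""
    return Counter({k - 2 * s: 1 for s in range(k + 1)})


def multiply(p, q):
    """Product of two Laurent polynomials."""
    result = Counter()
    for a, ca in p.items():
        for b, cb in q.items():
            result[a + b] += ca * cb
    return result


def decompose(chi):
    """Multiplicities {l: n_l} with chi = sum_l n_l * character(l)."""
    chi = Counter(chi)
    multiplicities = {}
    while any(chi.values()):
        top = max(e for e, c in chi.items() if c != 0)
        if top < 0:
            raise ValueError("input is not a character of SU(2)")
        mult = chi[top]
        multiplicities[top] = multiplicities.get(top, 0) + mult
        # Subtract mult copies of the irreducible character with this top weight.
        for e in character(top):
            chi[e] -= mult
    return multiplicities
```

Calling `decompose(multiply(character(n), character(m)))` returns multiplicity one for each l between |m − n| and m + n of the same parity as m + n, in agreement with Example 10.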


See also: Classical Groups and Homogeneous Spaces;
Combinatorics: Overview; Equivariant Cohomology and
the Cartan Model; Finite Group Symmetry Breaking; Lie
Groups: General Theory; Ljusternik–Schnirelman Theory;
Noncommutative Geometry and the Standard Model;
Optimal Cloning of Quantum States; Ordinary Special
Functions; Quasiperiodic Systems; Symmetry Classes in
Random Matrix Theory.


Introduction


Superstring theories and M-theory, at present the best 
candidate quantum theories which unify gravity, 
Yang-Mills fields, and matter, are directly formu- 
lated in ten and eleven spacetime dimensions. To 
obtain a candidate theory of our four-dimensional 
universe, one must find a solution of one of 
these theories whose low-energy physics is well 
described by a four-dimensional effective field theory 
(EFT), containing the well-established standard 
model (SM) of particle physics coupled to Einstein’s 
general relativity (GR). The standard paradigm for 
finding such solutions is compactification, along the 
lines originally proposed by Kaluza and Klein in the 
context of higher-dimensional general relativity. One 
postulates that the underlying D-dimensional space- 
time is a product of four-dimensional Minkowski 
spacetime, with a (D — 4)-dimensional compact and 
small Riemannian manifold K. One then finds 
that low-energy physics effectively averages over K, 
leading to a four-dimensional EFT whose field 
content and Lagrangian are determined in terms of 
the topology and geometry of K. 

Of the huge body of prior work on this subject, the 
part most relevant for string/M-theory is supergravity 
compactification, as in the limit of low energies, small 
curvatures and weak coupling, the various string 
theories and M-theory reduce to ten- and eleven- 
dimensional supergravity theories. Many of the quali- 
tative features of string/M-theory compactification, and 
a good deal of what is known quantitatively, can be 


understood simply in terms of compactification of these 
field theories, with the addition of a few crucial 
ingredients from string/M-theory. Thus, most of this 
article will restrict attention to this case, leaving many 
“stringy” topics to the articles on conformal field 
theory, topological string theory, and so on. We also 
largely restrict attention to compactifications based on 
Ricci-flat compact spaces. There is an equally important 
class in which K has positive curvature; these lead to 
anti-de Sitter (AdS) spacetimes and are discussed in the 
article on AdS/CFT (see AdS/CFT Correspondence). 
After a general review, we begin with compacti- 
fication of the heterotic string on a three complex 
dimensional Calabi-Yau manifold. This was the first 
construction which led convincingly to the SM, and 
remains one of the most important examples. We 
then survey the various families of compactifications 
to higher dimensions, with an eye on the relations 
between these compactifications which follow from 
superstring duality. We then discuss some of the 
phenomena which arise in the regimes of large 
curvature and strong coupling. In the final section, 
we bring these ideas together in a survey of the 
various known four-dimensional constructions. 


General Framework 


Let us assume we are given a D- (= d + k) dimensional
field theory 𝒯. A compactification is then a
D-dimensional spacetime which is topologically
the product of a d-dimensional spacetime with a
k-dimensional manifold K, the compactification or
“internal” manifold, carrying a Riemannian metric
and with definite expectation values for all other
fields in 𝒯. These must solve the equations of motion,
and preserve d-dimensional Poincaré invariance (or,
perhaps, another d-dimensional symmetry group).


The most general metric ansatz for a Poincaré
invariant compactification is

\[ G_{IJ} = \begin{pmatrix} f\, \eta_{\mu\nu} & 0 \\ 0 & G_{ij} \end{pmatrix} \]

where the tangent space indices are 0 ≤ I < d +
k = D, 0 ≤ μ < d, and 1 ≤ i ≤ k. Here η_{μν} is the
Minkowski metric, G_ij is a metric on K, and f is a
real-valued function on K called the “warp factor.”

As the simplest example, consider pure
D-dimensional GR. In this case, Einstein’s equations
reduce to Ricci flatness of G_IJ. Given our metric
ansatz, this requires f to be constant, and the metric
G_ij on K to be Ricci flat. Thus, any K which admits
such a metric, for example, the k-dimensional torus,
will lead to a compactification.

Typically, if a manifold admits a Ricci-flat metric,
it will not be unique; rather there will be a moduli
space of such metrics. Physically, one then expects
to find solutions in which the choice of Ricci-flat
metric is slowly varying in d-dimensional spacetime.
General arguments imply that such variations
must be described by variations of d-dimensional
fields, governed by an EFT. Given an explicit
parametrization of the family of metrics, say
G_ij(φ^α) for some parameters φ^α, in principle the
EFT could be computed explicitly by promoting
the parameters to d-dimensional fields, substituting
this parametrization into the D-dimensional action,
and expanding in powers of the d-dimensional
derivatives. In pure GR, we would find the four-dimensional
effective Lagrangian

\[ \mathcal{L}_{\mathrm{EFT}} = \int_K d^k y \, \sqrt{\det G(\phi)} \, R^{(4)} + \sqrt{\det G} \, G^{ik}(\phi) G^{jl}(\phi) \frac{\partial G_{ij}}{\partial \phi^\alpha} \frac{\partial G_{kl}}{\partial \phi^\beta} \partial_\mu \phi^\alpha \partial^\mu \phi^\beta + \cdots \qquad [1] \]

While this is easily evaluated for K a symmetric space
or torus, in general a direct computation of L_EFT is
impossible. This becomes especially clear when one
learns that the Ricci-flat metrics G_ij are not explicitly
known for the examples of interest. Nevertheless,
clever indirect methods have been found that give a
great deal of information about L_EFT; this is much of
the art of superstring compactification. However, in
this section, let us ignore this point and continue as if
we could do such computations explicitly.

Given a solution, one proceeds to consider its 
small perturbations, which satisfy the linearized 
equations of motion. If these include exponentially 
growing modes (often called “tachyons”), the solution
is unstable. (Note that this criterion is modified
for AdS compactifications.) The remaining perturbations
can be divided into massless fields, corresponding
to zero modes of the linearized equations of
motion on K, and massive fields, the others. General 
results on eigenvalues of Laplacians imply that the 
masses of massive fields depend on the diameter of 
K as m ~ 1/diam(K), so at energies far smaller than 
m, they cannot be excited (this is not universal; 
given strong negative curvature on K, or a rapidly 
varying warp factor, one can have perturbations of 
small nonzero mass). Thus, the massive fields can be 
“integrated out,” to leave an EFT with a finite 
number of fields. In the classical approximation, this 
simply means solving their equations of motion in 
terms of the massless fields, and using these 
solutions to eliminate them from the action. At 
leading order in an expansion around a solution, 
these fields are zero and this step is trivial; never- 
theless, it is useful in making a systematic definition 
of the interaction terms in the EFT. 
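
As a minimal illustration of the scaling $m \sim 1/\mathrm{diam}(K)$, consider an assumed toy setting that is not one of the examples treated here: a free massless scalar $\varphi$ in $D = d+1$ dimensions with $K = S^1$ of circumference $2\pi R$, in mostly-plus signature. The symbols $\varphi$, $\varphi_n$, $y$ and $R$ are introduced only for this sketch.

```latex
\[
\varphi(x,y) = \sum_{n \in \mathbb{Z}} \varphi_n(x)\, e^{i n y / R},
\qquad
\left( \partial_\mu \partial^\mu + \partial_y^2 \right) \varphi = 0
\;\Longrightarrow\;
\left( \partial_\mu \partial^\mu - \frac{n^2}{R^2} \right) \varphi_n = 0 ,
\]
\[
m_n = \frac{|n|}{R}, \qquad n \in \mathbb{Z}.
\]
```

The zero mode $n = 0$ is the single massless field, while every other mode has mass of order $1/R$, so at energies $E \ll 1/R$ only $\varphi_0$ remains in the EFT.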

As we saw in pure GR, the configuration space 
parametrized by the massless fields in the EFT is the
moduli space of compactifications obtained by 
deforming the original solution. Thus, from a 
mathematical point of view, low-energy EFT can 
be thought of as a sort of enhancement of the 
concept of moduli space, and a dictionary set up 
between mathematical and physical languages. To 
give its next entry, there is a natural physical metric 
on moduli space, defined by restriction from the 
metric on the configuration space of the theory $\mathcal{T}$;
this becomes the sigma-model metric for the scalars
in the EFT. Because the theories $\mathcal{T}$ arising from
string theory are geometrically natural, this metric is 
also natural from a mathematical point of view, and 
one often finds that much is already known about it. 
For example, the somewhat fearsome two-derivative
terms in eqn [1] are (perhaps) less so when one
realizes that they are an explicit expression for the
Weil-Petersson metric on the moduli space of Ricci-flat
metrics. In any case, knowing this dictionary is
essential for taking advantage of the literature. 

Another important entry in this dictionary is that 
the automorphism group of a solution translates 
into the gauge group in the EFT. This can be either 
continuous, leading to the gauge symmetry of 
Maxwell and Yang-Mills theories, or discrete, 
leading to discrete gauge symmetry. For example, if 
the metric on K has continuous isometry group G, 
the resulting EFT will have gauge symmetry G, as in 
the original example of Kaluza and Klein with $K = S^1$
and G = U(1). Mathematically, these phenomena
of “enhanced symmetry” are often treated using the 
languages of equivariant theories (cohomology, 
K-theory, etc.), stacks, and so on. 

To give another example, obstructed deformations
(solutions of the linearized equations which do not
correspond to elements of the tangent space of the
true moduli space) correspond to scalar fields which,
while massless, appear in the effective potential in a
way which prevents giving them expectation values.
Since the quadratic terms $V''$ are masses, this
dependence must be at cubic or higher order.

While the preceding concepts are general and apply 
to compactification of all local field theories, string 
and M-theory add some particular ingredients to this 
general recipe. In the limits of small curvatures and 
weak coupling, string and M-theory are well described 
by the ten- and 11-dimensional supergravity theories, 
and thus the string/M-theory discussion usually starts 
with Kaluza-Klein compactification of these theories,
which we denote I, IIa, IIb, HE, HO and M. Let us
now discuss a particular example. 


Calabi-Yau Compactification 
of the Heterotic String 


Contact with the SM requires finding compactifications 
to d = 4 either without supersymmetry, or with at most 
N = 1 supersymmetry, because the SM includes chiral
fermions, which are incompatible with N > 1. Let us
start with the $E_8 \times E_8$ heterotic string or “HE” theory.
This choice is made rather than HO because only in this 
case can we find the SM fermion representations as 
subrepresentations of the adjoint of the gauge group. 

Besides the metric, the other bosonic fields of the HE
supergravity theory are a scalar $\Phi$ called the dilaton,
Yang-Mills gauge potentials for the group $G = E_8 \times E_8$,
and a 2-form gauge potential B (often called the
“Neveu-Schwarz” or “NS” 2-form) whose defining
characteristic is that it minimally couples to the
heterotic string world-sheet. We will need their gauge
field strengths below: for Yang-Mills, this is a 2-form
$F^a_{IJ}$, with a indexing the adjoint of Lie G, and for the NS
2-form this is a 3-form $H_{IJK}$. Denoting the two
Majorana-Weyl spinor representations of SO(1, 9) as
S and C, and the vector representation as V, the fermions
are the gravitino $\psi_I \in S \otimes V$, a spin 1/2 “dilatino”
$\lambda \in C$, and the adjoint gauginos $\chi^a \in S$. We use
$\Gamma_I$ to denote Dirac matrices contracted with a
“zehnbein,” satisfying $\{\Gamma_I, \Gamma_J\} = 2 G_{IJ}$, and
$\Gamma_{IJ} = \frac{1}{2}[\Gamma_I, \Gamma_J]$, etc.

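A local supersymmetry transformation with parameter
$\epsilon$ is then

\[
\delta\psi_I = D_I \epsilon + \tfrac{1}{8} H_{IJK} \Gamma^{JK} \epsilon \qquad [2]
\]
\[
\delta\lambda = \partial_I \Phi\, \Gamma^I \epsilon - \tfrac{1}{12} H_{IJK} \Gamma^{IJK} \epsilon \qquad [3]
\]
\[
\delta\chi^a = F^a_{IJ} \Gamma^{IJ} \epsilon \qquad [4]
\]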


We now assume N = 1 supersymmetry. An unbroken
supersymmetry is a spinor $\epsilon$ for which the left-hand
sides of eqns [2]-[4] vanish, so we seek compactifications
with a unique solution of these equations.

We first discuss the case H = 0. Setting $\delta\psi_\mu$ in
eqn [2] to zero, we find that the warp factor f must
be constant. The vanishing of $\delta\psi_i$ requires $\epsilon$ to be a
covariantly constant spinor. For a six-dimensional
K to have a unique such spinor, it must have SU(3)
holonomy; in other words, K must be a Calabi-Yau
manifold. In the following, we use basic facts about
the geometry of such manifolds.

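The vanishing of $\delta\lambda$ then requires constant dilaton
$\Phi$, while the vanishing of $\delta\chi^a$ requires the gauge field
strength F to solve the hermitian Yang-Mills
equations, written in complex coordinates on K as


\[
F^{2,0} = F^{0,2} = 0, \qquad G^{i\bar\jmath} F_{i\bar\jmath} = 0
\]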


By the theorem of Donaldson and Uhlenbeck-Yau, 
such solutions are in one-to-one correspondence 
with $\mu$-stable holomorphic vector bundles with
structure group H contained in the complexification 
of G. Choose such a bundle E; by the general 
discussion above, the commutant of H in G will be 
the automorphism group of the connection on E and 
thus the low-energy gauge group of the resulting 
EFT. For example, since $E_8$ has a maximal $E_6 \times SU(3)$
subgroup, if E has structure group H = SL(3),
there is an embedding such that the unbroken gauge
symmetry is $E_6 \times E_8$, realizing one of the standard
grand unified groups, $E_6$, as a factor.

The choice of E is constrained by anomaly 
cancellation. This discussion (Green et al. 1987) 
modifies the Bianchi identity for H to 


\[
dH = \operatorname{tr} R \wedge R - \frac{1}{30} \sum_a F^a \wedge F^a \qquad [5]
\]


where R is the matrix of curvature 2-forms. The
normalization of the $F \wedge F$ term is such that if we
take E = TK, the holomorphic tangent bundle of K,
with isomorphic connection, then using the embedding
we just discussed, we obtain a solution of eqn
[5] with H = 0.

Thus, we have a complete solution of the 
equations of motion. General arguments imply that 
supersymmetric Minkowski solutions are stable, so 
the small fluctuations consist of massless and 
massive fields. Let us now discuss a few of the 
massless fields. Since the EFT has N=1 supersymmetry,
the massless scalars live in chiral multiplets,
which are local coordinates on a complex
Kähler manifold.

First, the moduli of Ricci-flat metrics on K will
lead to massless scalar fields: the complex structure
moduli, which are naturally complex, and Kähler
moduli, which are not. However, in string compactification
the latter are complexified to the periods of
the 2-form B + iJ integrated over a basis of $H_2(K, \mathbb{Z})$,
where J is the Kähler form and B is the NS 2-form. In
addition, there is a complex field pairing the dilaton
(actually, $\exp(-\Phi)$) and the “model-independent
axion,” the scalar dual in d=4 to the 2-form $B_{\mu\nu}$.
Finally, each complex modulus of the holomorphic
bundle E will lead to a chiral multiplet. Thus, the
total number of massless uncharged chiral multiplets
is $1 + h^{1,1}(K) + h^{2,1}(K) + \dim H^1(K, \mathrm{End}(E))$.

Massless charged matter will arise from zero
modes of the gauge field and its supersymmetric
partner spinor $\chi^a$. It is slightly easier to discuss the
spinor, and then appeal to supersymmetry to get the
bosons. Decomposing the spinors of SO(6) under
SU(3), one obtains (0,p) forms, and the Dirac
equation becomes the condition that these forms
are harmonic. By the Hodge theorem, these are in
one-to-one correspondence with classes in Dolbeault
cohomology $H^{0,p}(K, V)$, for some bundle V. The
bundle V is obtained by decomposing the spinor into
representations of the holonomy group of E. For
H = SU(3), the decomposition of the adjoint under
the embedding of $SU(3) \times E_6$ in $E_8$,

\[
248 = (8,1) + (1,78) + (3,27) + (\bar{3},\overline{27}) \qquad [6]
\]

implies that charged matter will form “generations”
in the 27, of number $\dim H^{0,1}(K, E)$, and “antigenerations”
in the $\overline{27}$, of number $\dim H^{0,1}(K, \bar{E}) =
\dim H^{0,2}(K, E)$. The difference in these numbers is
determined by the Atiyah-Singer index theorem to be

\[
N_{gen} \equiv N_{27} - N_{\overline{27}} = \frac{1}{2} c_3(E)
\]

In the special case of E = TK, these numbers are
separately determined to be $N_{27} = h^{1,1}$ and
$N_{\overline{27}} = h^{2,1}$, so their difference is $\chi(K)/2$, half the
Euler number of K. In the real world, this number is
$N_{gen} = 3$, and matching this under our assumptions
so far is very constraining.
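
To see how constraining this is, a minimal sketch of the counting for E = TK can be written in Python, using the identification of the generation numbers with Hodge numbers stated above. The function names are illustrative, and the quintic hypersurface with Hodge numbers (1, 101) is supplied here as a standard example rather than taken from the text.

```python
def euler_number(h11: int, h21: int) -> int:
    """Euler number of a Calabi-Yau 3-fold: chi = 2 * (h^{1,1} - h^{2,1})."""
    return 2 * (h11 - h21)


def net_generations(h11: int, h21: int) -> int:
    """N_gen = N_27 - N_27bar = chi(K) / 2 for the choice E = TK."""
    return h11 - h21


# Quintic hypersurface in P^4 (assumed example): h^{1,1} = 1, h^{2,1} = 101.
quintic = (1, 101)
print(euler_number(*quintic))           # -200
print(abs(net_generations(*quintic)))   # 100
```

With one hundred net generations (up to the sign convention for which chirality counts as a generation), the quintic with E = TK is far from $N_{gen} = 3$, which is what motivates the more general bundles discussed next.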

Substituting these zero modes into the ten- 
dimensional Yang-Mills action and integrating, one 
can derive the d=4 EFT. For example, the cubic 
terms in the superpotential, usually called Yukawa 
couplings after the corresponding fermion—boson 
interactions in the component Lagrangian, are 
obtained from the cubic product of zero modes 


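\[
\int_K \Omega \wedge \operatorname{tr}\left( \phi_1 \wedge \phi_2 \wedge \phi_3 \right)
\]


where $\Omega$ is the holomorphic 3-form, $\phi_i \in H^{0,1}(K, \mathrm{Rep}\,E)$ are
the zero modes, and tr arises from decomposing the
$E_8$ cubic group invariant.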

Note the very important fact that this expression
only depends on the cohomology classes of the $\phi_i$
(and $\Omega$). This means the Yukawa couplings can be
computed without finding the explicit harmonic
representatives, which cannot be found in practice (we do not
even know the explicit metric). More generally, one
expects to be able to explicitly compute the super- 
potential and all other holomorphic quantities in 
the effective Lagrangian solely from “topological” 
information (the Dolbeault cohomology ring, and 
its generalizations within topological string theory). 
On the other hand, computing the Kähler metric in
an N=1 EFT is usually out of reach as it would 
require having explicit normalized zero modes. 
Most results for this metric come from considering 
closely related compactifications with extended 
supersymmetry, and arguing that the breaking 
to N=1 supersymmetry makes small corrections 
to this. 

There are several generalizations of this construction.
First, the necessary condition to solve eqn [5] is
that the right-hand side be exact, which requires


\[
c_2(E) = c_2(TK) \qquad [7]
\]


This allows for a wide variety of E’s to be used, so
that $N_{gen} = 3$ can be attained with many more K’s.
This class of models is often called “(0,2) compactification”
to denote the world-sheet supersymmetry
of the heterotic string in these backgrounds. One can
also use bundles with larger structure group; for
example, H = SL(4) leads to unbroken $SO(10) \times E_8$,
and H = SL(5) leads to unbroken $SU(5) \times E_8$.
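
A quick consistency check on these embeddings is that the adjoint of $E_8$ must decompose into representations whose dimensions add up to 248. The Python sketch below does this bookkeeping. The first decomposition is eqn [6]; the SU(4) and SU(5) branching rules are standard results supplied here as assumptions rather than taken from the text, and conjugate representations enter only through their dimensions.

```python
E8_ADJOINT_DIM = 248

# Each term is (dim of H-representation, dim of G-representation),
# where G is the commutant of H in one E8 factor.
DECOMPOSITIONS = {
    # H = SU(3), G = E6: eqn [6]
    "SU(3) x E6": [(8, 1), (1, 78), (3, 27), (3, 27)],
    # H = SU(4), G = SO(10) (assumed standard branching rule)
    "SU(4) x SO(10)": [(15, 1), (1, 45), (4, 16), (4, 16), (6, 10)],
    # H = SU(5), G = SU(5) (assumed standard branching rule)
    "SU(5) x SU(5)": [(24, 1), (1, 24), (5, 10), (5, 10), (10, 5), (10, 5)],
}


def total_dimension(terms):
    """Sum of dim(H-rep) * dim(G-rep) over all terms of a decomposition."""
    return sum(h_dim * g_dim for h_dim, g_dim in terms)


for name, terms in DECOMPOSITIONS.items():
    print(name, total_dimension(terms), total_dimension(terms) == E8_ADJOINT_DIM)
```

In each case the terms charged under both factors are the ones that give rise to charged matter, for example the 16 of SO(10) for H = SL(4).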

The subsequent breaking of the grand unified
group to the SM gauge group is typically done by
choosing K with nontrivial $\pi_1$, so that it admits a
flat line bundle W with nontrivial holonomy
(usually called a “Wilson line”). One then uses the
bundle $E \otimes W$ in the above discussion, to obtain the
commutant of $H \otimes W$ as gauge group. For example,
if $\pi_1(K) \cong \mathbb{Z}_5$, one can use W whose holonomy is an
element of order 5 in SU(5), to obtain as commutant
the SM gauge group $SU(3) \times SU(2) \times U(1)$.

Another generalization is to take the 3-form $H \neq 0$.
This discussion begins by noting that, for supersymmetry,
we still require the existence of a unique
spinor $\epsilon$; however, it will no longer be covariantly
constant in the Levi-Civita connection. One way to
structure the problem is to note that the right-hand
side of eqn [2] takes the form of a connection with
torsion; the resulting equations have been discussed
mathematically in Li and Yau (2004).

Another recent approach to these compactifications
(Gauntlett 2004) starts out by arguing that $\epsilon$
cannot vanish on K, so it defines a weak SU(3)
structure, a local reduction of the structure group of
TK to SU(3) which need not be integrable. This
structure must be present in all N=1, d=4 supersymmetric
compactifications and there are hopes
that it will lead to a useful classification of the 
possible local structures and corresponding partial 
differential equations (PDEs) on K. 


Higher-Dimensional and Extended 
Supersymmetric Compactifications 


While there are similar quasirealistic constructions 
which start from the other string theories and 
M-theory, before we discuss these, let us give an 
overview of compactifications with $N \ge 2$ supersymmetry
in four dimensions, and in higher dimensions.
These are simpler analog models which can be
understood in more depth; their study led to one of 
the most important discoveries in string/M-theory, 
the theory of superstring duality. 

As before, we require a covariantly constant 
spinor. For Ricci-flat K with other background 
fields zero, this requires the holonomy of K to be 
one of trivial, SU(n), Sp(n), or the exceptional
holonomies $G_2$ or Spin(7). In Table 1 we tabulate
the possibilities with spacetime dimension d greater
than or equal to 3, listing the supergravity theory, the
holonomy type of K, and the type of the resulting
EFT: dimension d, total number of real supersymmetry
parameters Ns, and the number of spinor
supercharges N (in d=6, since left- and right-chirality
Majorana spinors are inequivalent, there
are two numbers).

The structure of the resulting supergravity EFTs is 
heavily constrained by Ns. We now discuss the 
various possibilities. 


Table 1 String/M-theories, holonomy groups and the resulting
supersymmetry

Theory       Holonomy   d     Ns    N
M, II        Torus      Any   32    Max
M            SU(2)      7     16    1
             SU(3)      5     8     1
             G2         4     4     1
             Sp(4)      3     6     3
             SU(4)      3     4     2
             Spin(7)    3     2     1
IIa          SU(2)      6     16    (1,1)
             SU(3)      4     8     2
             G2         3     4     2
IIb          SU(2)      6     16    (0,2)
             SU(3)      4     8     2
             G2         3     4     2
HE, HO, I    Torus      Any   16    Max/2
             SU(2)      6     8     1
             SU(3)      4     4     1
             G2         3     2     1
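
The d and Ns columns of Table 1 can be reproduced by a short calculation: the fraction of supercharges that survives is the number of covariantly constant spinors on K divided by the dimension of the spinor representation in k dimensions. The Python sketch below encodes those fractions. They are standard facts about special holonomy supplied here as assumptions, and the dictionary and function names are illustrative.

```python
from fractions import Fraction

# holonomy -> (internal dimension k, preserved fraction of supercharges)
HOLONOMY = {
    "SU(2)":   (4, Fraction(1, 2)),    # 2 of 4 spinor components
    "SU(3)":   (6, Fraction(1, 4)),    # 2 of 8
    "G2":      (7, Fraction(1, 8)),    # 1 of 8
    "Sp(4)":   (8, Fraction(3, 16)),   # 3 of 16
    "SU(4)":   (8, Fraction(1, 8)),    # 2 of 16
    "Spin(7)": (8, Fraction(1, 16)),   # 1 of 16
}

# theory -> (spacetime dimension D, number of real supercharges)
THEORY = {"M": (11, 32), "IIa": (10, 32), "IIb": (10, 32), "HE": (10, 16)}


def compactify(theory: str, holonomy: str):
    """Return (d, Ns) for the given theory on a manifold of the given holonomy."""
    big_d, supercharges = THEORY[theory]
    k, fraction = HOLONOMY[holonomy]
    ns = supercharges * fraction
    assert ns.denominator == 1
    return big_d - k, int(ns)


for theory, holonomy in [("M", "SU(2)"), ("M", "Sp(4)"), ("IIa", "SU(3)"), ("HE", "G2")]:
    print(theory, holonomy, compactify(theory, holonomy))
```

The N column additionally depends on the size of the minimal spinor in dimension d, which this sketch does not attempt to track.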


Ns = 32 


Given the supersymmetry algebra, if such a supergravity
exists, it is unique. Thus, toroidal compactifications
of d=11 supergravity, IIa and IIb
supergravity lead to the same series of maximally
supersymmetric theories. Their structure is governed
by the exceptional Lie algebra $E_{11-d}$; the
gauge charges transform in a fundamental representation
of this algebra, while the scalar fields
parametrize a coset space G/H, where G is the
maximally split real form of the Lie group $E_{11-d}$,
and H is a maximal compact subgroup of G.
Nonperturbative duality symmetries lead to a
further identification by a maximal discrete subgroup
of G.


Ns = 16 


This supergravity can be coupled to maximally 
supersymmetric super Yang-Mills theory, which 
given a choice of gauge group G is unique. Thus, 
these theories (with zero cosmological constant and 
thus allowing super-Poincaré symmetry) are 
uniquely determined by the choice of G. 

In d=10, the choices $E_8 \times E_8$ and $Spin(32)/\mathbb{Z}_2$,
which arise in string theory, are almost uniquely
determined by the Green-Schwarz anomaly cancellation
analysis. Compactification of these HE, HO
and type I theories on $T^n$ produces a unique theory
with moduli space


\[
\mathbb{R}^+ \times SO(n, n+16; \mathbb{Z}) \backslash SO(n, n+16; \mathbb{R}) / SO(n, \mathbb{R}) \times SO(n+16, \mathbb{R}) \qquad [8]
\]


In Kaluza-Klein (KK) reduction, this arises from the
choice of metric $g_{ij}$, the antisymmetric tensor $B_{ij}$ and
the choice of a flat $E_8 \times E_8$ or $Spin(32)/\mathbb{Z}_2$ connection
on $T^n$, while a more unified description follows
from the heterotic string world-sheet analysis. Here
the group SO(n, n+16) is defined to preserve an even
self-dual quadratic form $\eta$ of signature (n, n+16);
for example, $\eta = (-E_8) \oplus (-E_8) \oplus I \oplus \cdots \oplus I$, where I
is the form of signature (1,1) and $E_8$ is the Cartan
matrix. In fact, all such forms are equivalent under
orthogonal integer similarity transformation; so,
the resulting EFT is unique. It has a rank 16 + 2n
gauge group, which at generic points in moduli
space is $U(1)^{16+2n}$, but is enhanced to a nonabelian
group G at special points. To describe G, we first
note that a point p in moduli space determines an
n-dimensional subspace $V_p$ of $\mathbb{R}^{16+2n}$, and
an orthogonal subspace $V_p^\perp$ (of varying dimension).
Lattice points of length squared −2 contained
in $V_p^\perp$ then correspond to roots of the Lie
algebra of $G_p$.
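
To make the lattice data concrete, the following Python sketch builds the example quadratic form for a given n and checks that it is even (all diagonal entries even) and self-dual (determinant ±1), with rank 16 + 2n. The labelling of the $E_8$ Dynkin diagram and the helper names are illustrative choices, not taken from the text, and the use of n copies of I is an assumption fixed by the stated signature (n, n+16).

```python
from fractions import Fraction


def e8_cartan():
    """E8 Cartan matrix: chain 0-1-2-3-4-5-6 with node 7 attached to node 2."""
    a = [[0] * 8 for _ in range(8)]
    for i in range(8):
        a[i][i] = 2
    for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 7)]:
        a[i][j] = a[j][i] = -1
    return a


def block_diag(blocks):
    """Direct sum of square integer matrices."""
    size = sum(len(b) for b in blocks)
    m = [[0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, x in enumerate(row):
                m[offset + i][offset + j] = x
        offset += len(b)
    return m


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    size = len(m)
    result = Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if m[r][c] != 0), None)
        if pivot is None:
            return 0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, size):
            factor = m[r][c] / m[c][c]
            for k in range(c, size):
                m[r][k] -= factor * m[c][k]
    return int(result)


def eta(n):
    """Gram matrix of (-E8) + (-E8) + n copies of the signature-(1,1) form I."""
    minus_e8 = [[-x for x in row] for row in e8_cartan()]
    hyperbolic = [[0, 1], [1, 0]]
    return block_diag([minus_e8, minus_e8] + [hyperbolic] * n)


form = eta(3)
print(len(form))                                               # 22
print(abs(det(form)))                                          # 1
print(all(form[i][i] % 2 == 0 for i in range(len(form))))      # True
```

For n = 3 the form has rank 22, matching the rank 16 + 2n gauge group in d = 7.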














The other compactifications with Ns=16 are
M-theory on K3 and its further toroidal reductions,
and IIb on K3. M-theory compactification to d=7
is dual to heterotic on $T^3$, with the same moduli
space and enhanced gauge symmetry. As we discuss
at the end of the section “Stringy and quantum
corrections,” the extra massless gauge bosons of
enhanced gauge symmetry are M2 branes wrapped
on 2-cycles with topology $S^2$. For such a cycle to
have zero volume, the integral of the Kähler form
and holomorphic 2-form over the cycle must vanish;
expressing this in a basis for $H^2(K3, \mathbb{R})$ leads to
exactly the same condition we discussed for
enhanced gauge symmetry above. The final result is
that all such K3 degenerations lead to one of the
two-dimensional canonical singularities, of types A,
D or E, and the corresponding EFT phenomenon is
the enhanced gauge symmetry of corresponding
Dynkin type A, D, or E.

IIb on K3 is similar, but reducing the self-dual 
Ramond—Ramond (RR) 4-form potential on the 2- 
cycles leads to self-dual tensor multiplets instead of 
Maxwell theory. The moduli space is eqn [8] but 
with n=5, not n=4, incorporating periods of RR 
potentials and the SL(2,Z) duality symmetry of IIb 
theory. 

One may ask if the Ns=16 I/HE/HO theories in 
d=8 and d=9 have similar duals. For d= 8, these 
are obtained by a pretty construction known as 
“F-theory.” Geometrically, the simplest definition of 
F-theory is to consider the special case of M-theory 
on an elliptically fibered Calabi-Yau, in the limit 
that the Kähler modulus of the fiber becomes small.
One check of this claim for d=8 is that the moduli
space of elliptically fibered K3s agrees with eqn [8]
with n=2.
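
The dimension counts behind such checks follow from eqn [8] by elementary group theory. As a sketch, using only $\dim SO(m) = m(m-1)/2$ and the fact that the discrete quotient does not change the dimension:

```latex
\[
\dim \frac{SO(n, n+16; \mathbb{R})}{SO(n, \mathbb{R}) \times SO(n+16, \mathbb{R})}
= \frac{(2n+16)(2n+15)}{2} - \frac{n(n-1)}{2} - \frac{(n+16)(n+15)}{2}
= n(n+16).
\]
\[
n = 2:\; 36 = 2 \times 18, \qquad n = 3:\; 57 .
\]
```

For n = 3, adding the $\mathbb{R}^+$ factor gives 58, the dimension of the moduli space of Ricci-flat metrics on K3 including the overall volume. For n = 2, the 36 real parameters correspond to 18 complex ones. Both of these geometric counts are standard results quoted here as assumptions rather than derived in the text.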

Another definition of F-theory is the particular 
case of IIb compactification using Dirichlet 
7-branes, and orientifold 7-planes. This construction 
is T-dual to the type I theory on $T^2$, which provides
its simplest string theory definition. As discussed in
Polchinski (1999), one can think of the open strings
giving rise to type I gauge symmetry as living on 32
Dirichlet 9-branes (or D9-branes) and an orientifold
9-plane. T-duality converts Dirichlet and orientifold
p-branes to (p − 1)-branes; thus this relation
follows by applying two T-dualities. 

These compactifications can also be parametrized 
by elliptically fibered Calabi-Yaus, where K is the 
base, and the branes correspond to singularities of 
the fibration. The relation between these two 
definitions follows fairly simply from the duality 
between M-theory on $T^2$ and IIb string on $S^1$. There
is a partially understood generalization of this
to d = 9.

Finally, these constructions admit further discrete
choices, which break some of the gauge symmetry.
The simplest to explain is in the toroidal compactification
of I/HE/HO. The moduli space of theories
we discussed uses flat connections on the torus
which are continuously connected to the trivial
connection, but in general the moduli space of flat
connections has other components. The simplest
example is the moduli space of flat $E_8 \times E_8$
connections on $S^1$, which has a second component
in which the holonomy exchanges the two $E_8$’s. On
$T^3$, there are connections for which the holonomies
cannot be simultaneously diagonalized. This structure
and the M-theory dual of these choices are
discussed in de Boer et al. (2001).


Ns = 8, d < 6


Again, the gravity multiplet is uniquely determined, 
so the most basic classification is by the gauge group 
G. The full low-energy EFT is determined by the 
matter content and action, and there are two types 
of matter multiplets. First, vector multiplets contain 
the Yang-Mills fields, fermions and 6 − d scalars;
their action is determined by a prepotential which is
a G-invariant function of the fields. Since the vector
multiplets contain massless adjoint scalars, a generic
vacuum in which these take nonzero distinct
vacuum expectation values (VEVs) will have $U(1)^r$
gauge symmetry (r being the rank of G), the commutant
of G with a generic matrix (for d < 5, while there are
several real scalars, the potential forces these to
commute in a supersymmetric vacuum). Vacua with this
type of gauge symmetry breaking, which does not reduce
the rank of the gauge group, are usually referred to
as on a “Coulomb branch” of the moduli space. To
summarize, this sector can be specified by $n_V$, the
number of vector multiplets, and the prepotential F,
a function of the $n_V$ VEVs which is cubic in d=5,
and holomorphic in d=4.

Hypermultiplets contain scalars which parametrize
a quaternionic Kähler manifold, and partner
fermions. Thus, this sector is specified by a $4n_H$ real
dimensional quaternionic Kähler manifold. The G
action comes with triholomorphic moment maps; if 
nontrivial, VEVs in this sector can break gauge 
symmetry and reduce it in rank. Such vacua are 
usually referred to as on a “Higgs branch.” 

The basic example of these compactifications is 
M-theory on a Calabi-Yau 3-fold (CY3). Reduction 
of the 3-form leads to $h^{1,1}(K)$ vector multiplets,
whose scalar components are the CY Kähler moduli.
The CY complex structure moduli pair with periods
of the 3-form to produce $h^{2,1}(K)$ hypermultiplets.
Enhanced gauge symmetry then appears when the
CY3 contains ADE singularities fibered over a curve,
from the same mechanism involving wrapped M2
branes we discussed under Ns = 16. If degenerating
curves lead to other singularities (e.g., the ODP or
“conifold”), it is possible to obtain extremal transitions
which translate physically into Coulomb-Higgs
transitions. Finally, singularities in which surfaces
degenerate lead to nontrivial fixed-point theories. 

Reduction on $S^1$ leads to IIa on CY3, with the
spectrum above plus a “universal hypermultiplet” 
which includes the dilaton. Perhaps the most 
interesting new feature is the presence of world- 
sheet instantons, which correct the metric on vector 
multiplet moduli space. This metric satisfies the 
restrictions of special geometry and thus can be 
derived from a prepotential. 

The same theory can be obtained by compactifi- 
cation of IIb theory on the mirror CY3. Now vector 
multiplets are related to the complex structure 
moduli space, while hypermultiplets are related to 
Kähler moduli space. In this case, the prepotential
derived from variation of complex structure receives 
no instanton corrections, as we discuss in the next 
section. 

Finally, one can compactify the heterotic string on
$K3 \times T^{6-d}$, but this theory follows from toroidal
reduction of the d=6 case we discuss next.


Ns = 8, d=6 


These supergravities are similar to d < 6, but there 
is a new type of matter multiplet, the self-dual 
tensor (in d < 6 this is dual to a vector multiplet). 
Since fermions in d=6 are chiral, there is an 
anomaly cancellation condition relating the numbers 
of the three types of multiplets (Aspinwall 1996, 
section 6.6), 


\[
n_H - n_V + 29\, n_T = 273 \qquad [9]
\]


One class of examples is the heterotic string 
compactified on K3. In the original perturbative 
constructions, to satisfy eqn [7], we need to choose a
vector bundle with $c_2(V) = \chi(K3) = 24$. The resulting
degrees of freedom are a single self-dual tensor
multiplet and a rank-16 gauge group. More generally,
one can introduce $N_{5B}$ heterotic 5-branes,
which generalize eqn [7] to $c_2(E) + N_{5B} = c_2(TK)$.
Since this brane carries a self-dual tensor multiplet,
this series of models is parametrized by $n_T$. They are
connected by transitions in which an $E_8$ instanton
shrinks to zero size and becomes a 5-brane; the
resulting decrease in the dimension of the moduli
space of $E_8$ bundles on K3 agrees with eqn [9].
Another class of examples is F-theory on an
elliptically fibered CY3. These are related to
M-theory on an elliptically fibered CY3 in the same
general way we discussed under Ns=16. The 
relation between F-theory and the heterotic string 
on K3 can be seen by lifting M-theory-heterotic 
duality; this suggests that the two constructions are 
dual only if the CY3 is a K3 fibration as well. Since 
not all elliptically fibered CY3s are K3 fibered, the 
F-theory construction is more general. 
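
The anomaly condition [9] is easy to check on heterotic K3 spectra. The Python sketch below does so for two spectra that are commonly quoted in the literature and are supplied here as assumptions, not taken from the text: the standard embedding, with gauge group $E_7 \times E_8$ and 625 hypermultiplets, and a generic bundle that breaks the gauge group completely, leaving 244 hypermultiplets. The function names are illustrative.

```python
def anomaly_free(n_h: int, n_v: int, n_t: int) -> bool:
    """Check the d=6 condition n_H - n_V + 29 n_T = 273 of eqn [9]."""
    return n_h - n_v + 29 * n_t == 273


def shrink_instanton(n_h: int, n_v: int, n_t: int):
    """One E8 instanton becomes a 5-brane: n_T rises by 1, so by eqn [9]
    n_H must drop by 29 at fixed n_V."""
    return n_h - 29, n_v, n_t + 1


# Standard embedding (assumed spectrum): E7 x E8, so n_V = 133 + 248;
# 10 hypermultiplets in the 56 plus 65 neutral ones.
standard = (10 * 56 + 65, 133 + 248, 1)
# Generic bundle with the gauge group completely broken (assumed spectrum).
generic = (244, 0, 1)

print(anomaly_free(*standard))                   # True
print(anomaly_free(*generic))                    # True
print(anomaly_free(*shrink_instanton(*generic))) # True
```

The last line only restates the arithmetic of eqn [9]: trading an instanton for a 5-brane is consistent precisely when 29 hypermultiplets are lost for each new tensor multiplet.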

We return to d=4 and Ns =4 in the final section. 
The cases of Ns < 4, which exist in d ≤ 3, are far less
studied. 


Stringy and Quantum Corrections 


The D-dimensional low-energy effective supergrav- 
ity actions on which we based our discussion so far 
are only approximations to the general story of 
string/M-theory compactification. However, if 
Planck’s constant is small, K is sufficiently large, 
and its curvature is small, then they are controlled 
approximations. 

In M-theory, as in any theory of quantum gravity,
corrections are controlled by the Planck scale
parameter $M_P^{D-2}$, which sits in front of the Einstein
term of the D-dimensional effective Lagrangian, and
plays the role of $\hbar$. In general, this is different from
the four-dimensional Planck scale, which satisfies
$M_{P,4}^2 = \mathrm{Vol}(K)\, M_P^{D-2}$. After taking the low-energy
limit $E \ll M_P$, the remaining corrections are controlled
by the dimensionless parameters $l_P/R$, where
R can be any characteristic length scale of the solution:
a curvature radius, the length of a nontrivial cycle,
and so on.
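
The relation between the two Planck scales can be seen in one line. As a sketch, assume a constant warp factor and a metric on K that does not depend on the four-dimensional coordinates x, and drop numerical normalization constants; the names $S$, $g$ and $R_{\mathrm{torus}}$ are introduced only for this illustration.

```latex
\[
S \supset M_P^{D-2} \int d^4x \, d^k y \, \sqrt{-G}\; R^{(D)}
\;\longrightarrow\;
M_P^{D-2}\, \mathrm{Vol}(K) \int d^4x \, \sqrt{-g}\; R^{(4)},
\qquad
\mathrm{Vol}(K) = \int_K d^k y \, \sqrt{\det G_{ij}} ,
\]
\[
K = T^k \text{ with all radii } R_{\mathrm{torus}}: \qquad
M_{P,4}^2 = (2\pi R_{\mathrm{torus}})^k \, M_P^{k+2}, \qquad D = 4 + k .
\]
```

A large internal volume therefore lowers the fundamental scale $M_P$ relative to the observed four-dimensional one.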

In string theory, one usually thinks of the
corrections as a double series expansion in $g_s$, the
dimensionless (closed) string coupling constant, and
$\alpha'$, the inverse string tension parameter, of dimensions
(length)$^2$. The ten-dimensional Planck scale is
related to these parameters as $M_P^8 = 1/\left(g_s^2 (\alpha')^4\right)$, up to
a constant factor that depends on conventions.

Besides perturbative corrections, which have power- 
like dependence on these parameters, there can be 
world sheet and “brane” instanton corrections. For 
example, a string world sheet can wrap around a 
topologically nontrivial spacelike 2-cycle $\Sigma$ in K,
leading to an instanton correction to the effective
action which is suppressed as $\exp(-\mathrm{Vol}(\Sigma)/2\pi\alpha')$.
More generally, any p-brane wrapping a p-cycle 
can produce a similar effect. As for which terms in 
the effective Lagrangian receive corrections, this 
depends largely on the number and symmetries of 
the fermion zero modes on the instanton world 
volumes. 

Let us start by discussing some cases in which one 
can argue that these corrections are not present. 


First, extended supersymmetry can serve to elim- 
inate many corrections. This is analogous to the 
familiar result that the superpotential in d=4,N=1 
supersymmetric field theory does not receive (or “is 
protected from”) perturbative corrections, and in 
many cases follows from similar formal arguments. 
In particular, supersymmetry forbids corrections to 
the potential and two derivative terms in the 
Ns = 32 and Ns = 16 Lagrangians. 

In Ns = 8, the superpotential is protected, but the 
two derivative terms can receive corrections. How- 
ever, there is a simple argument which precludes 
many such corrections — since vector multiplet and 
hypermultiplet moduli spaces are decoupled, a 
correction whose control parameter sits in (say) a 
vector multiplet, cannot affect hypermultiplet mod- 
uli space. This fact allows for many exact computa- 
tions in these theories. 

As an example, in IIb on CY3, the metric on 
vector multiplet moduli space is precisely eqn [1] as 
obtained from supergravity (in other words, the 
Weil-Petersson metric on complex structure moduli
space). First, while in principle it could have been
corrected by world-sheet instantons, since these
depend on Kähler moduli which sit in hypermultiplets,
it is not. The only other instantons with the
requisite zero modes to modify this metric are
wrapped Dirichlet branes. Since in IIb theory these
wrap even-dimensional cycles, they also depend on
Kähler moduli and thus leave vector moduli space
unaffected. 

As previously discussed, for K3-fibered CY3, this 
theory is dual to the heterotic string on $K3 \times T^2$.
There, the vector multiplets arise from Wilson lines
on $T^2$, and reduce to an adjoint multiplet of N=2
supersymmetric Yang-Mills theory. Of course, in 
the quantum theory, the metric on this moduli space 
receives instanton corrections. Thus, the duality 
allows deriving the exact moduli space metric, and 
many other results of the Seiberg-Witten theory of 
N=2 gauge theory, as aspects of the geometry of 
Calabi-Yau moduli space. 

In Ns=4, only the superpotential is protected, 
and that only in perturbation theory; it can receive 
nonperturbative corrections. Indeed, it appears that 
this is fairly generic, suggesting that the effective 
potentials in these theories are often sufficiently 
complicated to exhibit the structure required for 
supersymmetry breaking and the other symmetry 
breakings of the SM. Understanding this is an active 
subject of research. 

We now turn from corrections to novel physical 
phenomena which arise in these regimes. While this 
is too large a subject to survey here, one of the basic 
principles which governs this subject is the idea that 
string/M-theory compactification on a singular
manifold K is typically consistent, but has new 
light degrees of freedom in the EFT, not predicted 
by KK arguments. We implicitly touched on one 
example of this in the discussion of M-theory 
compactification on K3 above, as the space of 
Ricci-flat K3 metrics has degeneration limits in 
which curvatures grow without bound, while the 
volumes of 2-cycles vanish. On the other hand, the 
structure of Ns=16 supersymmetry essentially 
forces the d=7 EFT in these limits to be non- 
singular. Its only noteworthy feature is that a 
nonabelian gauge symmetry is restored, and thus 
certain charged vector bosons and their superpart- 
ners become massless. 

To see what is happening microscopically, we 
must consider an M-theory membrane (or 2-brane), 
wrapped on a degenerating 2-cycle. This appears as 
a particle in d=7, charged under the vector 
potential obtained by reduction of the D=11 
3-form potential. The mass of this particle is the 
volume of the 2-cycle multiplied by the membrane 
tension, so as this volume shrinks to zero, the 
particle becomes massless. Thus, the physics is also 
well defined in 11 dimensions, though not literally 
described by 11-dimensional supergravity. 
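
The mass formula just described can be written out explicitly. The following is a minimal sketch: the symbols $T_{M2}$ (membrane tension), $\Sigma$ (the degenerating 2-cycle), and $\ell_{11}$ (the 11-dimensional Planck length) are notation introduced here for illustration, and the normalization of $T_{M2}$ shown is one common convention rather than something fixed by the text.

```latex
% Mass of the d=7 particle obtained by wrapping a membrane on the 2-cycle \Sigma
m_{\text{wrapped}} = T_{M2}\,\mathrm{Vol}(\Sigma),
\qquad
T_{M2} = \frac{1}{(2\pi)^{2}\,\ell_{11}^{3}}
\quad\Longrightarrow\quad
m_{\text{wrapped}} \to 0 \ \text{ as } \ \mathrm{Vol}(\Sigma) \to 0 .
```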

This phenomenon has numerous generalizations. 
Their common point is that, since the essential 
physics involves new light degrees of freedom, they 
can be understood in terms of a lower-dimensional 
quantum theory associated with the region around 
the singularity. Depending on the geometry of the 
singularity, this is sometimes a weakly coupled field 
theory, and sometimes a nontrivial conformal field 
theory. Occasionally, as in IIb on K3, the lightest 
wrapped brane is a string, leading to a “little string 
theory” (Aharony 2000). 


N = 1 Supersymmetry in Four Dimensions 


Having described the general framework, we con- 
clude by discussing the various constructions which 
lead to N=1 supersymmetry. Besides the heterotic 
string on a CY3, these compactifications include 
type IIa and IIb on orientifolds of CY3, the related 
F-theory on elliptically fibered Calabi-Yau 4-folds 
(CY4), and M-theory on G2 manifolds. Let us briefly 
spell out their ingredients, the known nonperturbative 
corrections to the superpotential, and the duality 
relations between these constructions. 

To start, we recap the heterotic string construction. 
We must specify a CY3 K, and a bundle E over 
K which admits a Hermitian Yang-Mills connection. 
The gauge group G is the commutant of the 
structure group of E in $E_8 \times E_8$ or $\mathrm{Spin}(32)/\mathbb{Z}_2$, 
while the chiral matter consists of metric moduli of 
K, and fields corresponding to a basis for the 
Dolbeault cohomology group $H^{0,1}(K, \mathrm{Rep}\,E)$, where 
Rep E is the bundle E embedded into an $E_8$ bundle 
and decomposed into G-reps. 

There is a general (though somewhat formal) 
expression for the superpotential, 


\[ W = \int \Omega \wedge \mathrm{Tr}\left( A\,\bar\partial A + \tfrac{2}{3} A^3 \right) + \int \Omega \wedge H^{(3)} + W_{NP} \]   [10] 


The first term is the holomorphic Chern-Simons 
action, whose variation enforces the $F^{0,2} = 0$ condition. 
The second is the “flux superpotential,” while 
the third term comprises the nonperturbative corrections. 
The best understood of these arise from super- 
symmetric gauge theory sectors. In some, but not all, 
cases, these can be understood as arising from gauge 
theoretic instantons, which can be shown to be dual 
to heterotic 5-branes wrapped on K. Heterotic 
world-sheet instantons can also contribute. 

The HO theory is S-dual to the type I string, with 
the same gauge group, realized by open strings on 
Dirichlet 9-branes. This construction involves essen- 
tially the same data. The two classes of heterotic 
instantons are dual to D1- and D5-brane instantons, 
whose world-sheet theories are somewhat simpler. 

If the CY3 K has a fibration by tori, by applying 
T-duality to the fibers along the lines discussed for 
tori under $N_s = 16$ above, one obtains various type II 
orientifold compactifications. On an elliptic fibration, 
double T-duality produces a IIb compactification 
with D7s and O7s. Using the relation between 
IIb theory on $T^2$ and F-theory on K3 fiberwise, one 
can also think of this as an F-theory compactification 
on a K3-fibered CY4. More generally, one 
can compactify F-theory on any elliptically fibered 
4-fold to obtain N=1. These theories have 
D3-instantons, the T-duals of both the type I 
D1- and D5-brane instantons. 

The theory of mirror symmetry predicts that all 
CY3s have $T^3$ fibration structures. Applying the 
corresponding triple T-duality, one obtains a IIa 
compactification on the mirror CY3 K, with D6-branes 
and O6-planes. Supersymmetry requires 
these to wrap special Lagrangian cycles in K. As in 
all Dirichlet brane constructions, enhanced gauge 
symmetry arises from coincident branes wrapping 
the same cycle, and only the classical groups are 
visible in perturbation theory. Exceptional gauge 
symmetry arises as a strong coupling phenomenon 
of the sort described in the previous section. The 
superpotential can also be thought of as mirror to 
eqn [10], but now the first term is the sum of a real 


Chern-Simons action on the special Lagrangian 
cycles, with disk world-sheet instanton corrections, 
as studied in open string mirror symmetry. The 
gauge theory instantons are now D2-branes. 

Using the duality relation between the IIa string and 
11-dimensional M-theory, this construction can be 
lifted to a compactification of M-theory on a seven-dimensional 
manifold L, which is an $S^1$ fibration over 
K. The D6 and O6 planes arise from singularities in the 
$S^1$ fibration. Generically, L can be smooth, and the 
only candidate in Table 1 for such an N=1 
compactification is a manifold with $G_2$ holonomy; 
therefore, L must have such holonomy. Finally, both 
the IIa world-sheet instantons and the D2-brane 
instantons lift to membrane instantons in M-theory. 

This construction implicitly demonstrates the existence 
of a large number of $G_2$ holonomy manifolds. 
Another way to arrive at these is to go back to the 
heterotic string on K, and apply the duality (again 
under $N_s = 16$) between heterotic on $T^3$ and M-theory 
on K3 to the $T^3$ fibration structure on K, to arrive at 
M-theory on a K3-fibered manifold of $G_2$ holonomy. 
Wrapping membranes on 2-cycles in these fibers, we 
can see enhanced gauge symmetry in this picture fairly 
directly. It is an illuminating exercise to work through 
its dual realizations in all of these constructions. 

Our final construction uses the interpretation of the 
strong coupling limit of the HE theory as M-theory on 
a one-dimensional interval I, in which the two $E_8$ 
factors live on the two boundaries. Thus, our original 
starting point can also be interpreted as the heterotic 
string on K x I. This construction is believed to be 
important physically as it allows generalizing a 
heterotic string tree-level relation between the gauge 
and gravitational couplings which is phenomenologically 
disfavored. One can relate it to a IIa orientifold as 
well, now with D8- and O8-branes. 

These multiple relations are often referred to as the 
“web” of dualities. They lead to numerous relations 
between compactification manifolds, moduli spaces, 
superpotentials, and other properties of the EFTs, 
whose full power has only begun to be appreciated. 


Suggestions for further reading 


Original references for all but the most recent of 
these topics can be found in the following textbooks 
and proceedings. We have also referenced a few 
research articles which are good starting points for 
the more recent literature. There are far more 
reviews than we could reference here, and a partial 
listing of these appears at http://www.slac.stanford.edu/spires/reviews/ 


See also: Brane Construction of Gauge Theories; 
Random Algebraic Geometry, Attractors and Flux Vacua; 


String Theory: Phenomenology; Superstring Theories; 
Two-Dimensional Conformal Field Theory and Vertex 
Operator Algebras; Viscous Incompressible Fluids: 
Mathematical Theory. 


Introduction 


The Euler equations for compressible fluids consist of 
the conservation laws of mass, momentum, and energy: 





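\[ \partial_t \rho + \nabla_x \cdot m = 0, \qquad x \in \mathbb{R}^d \]   [1] 
\[ \partial_t m + \nabla_x \cdot \left( \frac{m \otimes m}{\rho} \right) + \nabla_x p = 0 \]   [2] 
\[ \partial_t E + \nabla_x \cdot \left( \frac{m}{\rho} (E + p) \right) = 0 \]   [3] 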


Equivalently, these correspond to the general form of 
nonlinear hyperbolic systems of conservation laws: 


\[ \partial_t u + \nabla_x \cdot f(u) = 0, \qquad x \in \mathbb{R}^d, \quad u \in \mathbb{R}^n \]   [4] 


System [1]-[3] is closed by the following constitutive 
relations: 


\[ p = p(\rho, e), \qquad E = \frac{1}{2} \frac{|m|^2}{\rho} + \rho e \]   [5] 


In [1]-[3] and [5], $\tau = 1/\rho$ is the deformation 
gradient (specific volume for fluids, strain for 
solids), $v = (v_1, \ldots, v_d)^\top$ is the fluid velocity with 
$\rho v = m$ the momentum vector, $p$ is the scalar 
pressure, and $E$ is the total energy with $e$ the 
internal energy, which is a given function of $(\tau, p)$ or 
$(\rho, p)$ defined through thermodynamical relations. 
The other two thermodynamic variables are temperature 
$\theta$ and entropy $S$. If $(\rho, S)$ are chosen as 


independent variables, then the constitutive relations 
can be written as 


\[ (e, p, \theta) = (e(\rho, S),\, p(\rho, S),\, \theta(\rho, S)) \]   [6] 


governed by $\theta\, dS = de + p\, d\tau = de - \frac{p}{\rho^2}\, d\rho$. For 
polytropic gases, 


\[ p = p(\rho, S) = \kappa \rho^{\gamma} e^{S/c_v}, \qquad e = \frac{p}{(\gamma - 1)\rho} = \frac{R\theta}{\gamma - 1} \]   [7] 


where $R > 0$ may be taken to be the universal gas 
constant divided by the effective molecular weight of 
the particular gas, $c_v > 0$ is the specific heat at constant 
volume, $\gamma = 1 + R/c_v > 1$ is the adiabatic exponent, 
and $\kappa$ can be any positive constant under scaling. 
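
As a concrete illustration of the polytropic relations, the sketch below evaluates the thermodynamic state from $(\rho, S)$. It assumes the standard ideal-gas closure $p = \kappa\rho^{\gamma}e^{S/c_v}$, $e = c_v\theta$, $p = R\rho\theta$ with $R = (\gamma - 1)c_v$; the function name `polytropic_state` and the default parameter values are hypothetical choices made for this example.

```python
import math


def polytropic_state(rho, S, gamma=1.4, kappa=1.0, c_v=1.0):
    """Return (p, e, theta, c) for a polytropic gas at density rho and entropy S.

    Assumed closure (standard ideal-gas relations, stated here as an assumption):
        p = kappa * rho**gamma * exp(S / c_v)
        e = p / ((gamma - 1) * rho) = c_v * theta
    The sound speed is c = sqrt(dp/drho at fixed S) = sqrt(gamma * p / rho).
    """
    if rho <= 0.0:
        raise ValueError("density must be positive away from vacuum")
    p = kappa * rho ** gamma * math.exp(S / c_v)
    e = p / ((gamma - 1.0) * rho)      # specific internal energy
    theta = e / c_v                    # temperature
    c = math.sqrt(gamma * p / rho)     # sound speed
    return p, e, theta, c


# Consistency check of p = R * rho * theta with R = (gamma - 1) * c_v.
p, e, theta, c = polytropic_state(rho=1.2, S=0.3)
R = (1.4 - 1.0) * 1.0
print(abs(p - R * 1.2 * theta) < 1e-12)
```

The check at the end confirms that the three relations are mutually consistent, which is the reason only two of $(\rho, S, p, e, \theta)$ can be prescribed independently.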

The most important criterion of applicability of 
any mathematical model is its well-posedness: 
existence, uniqueness, and stability. The well-posedness 
theory for compressible fluid flows is far from being 
complete, and many further issues are still unexplored. 
In particular, the global existence and uniqueness of 
solutions in $\mathbb{R}^d$, $d \ge 2$, is still a major open problem, and 
only partial results shed some light on the amazing 
complexity of the problem. Below, we will mainly focus 
on the well-posedness issues with emphasis on the 
Cauchy problem, the initial value problem: 


\[ u|_{t=0} = u_0 \]   [8] 


first for inviscid compressible fluid flows and then 
for viscous compressible fluid flows. 

Throughout this article, where a cited reference is 
not shown in the “Further reading” section, it may 
usually be found by consulting Bressan (2000), 
Chen (2005), Dafermos (2005), Feireisl (2004), 
Lions (1986, 1988) or Liu (2000). 


Inviscid Compressible Fluid Flows: 
Euler Equations 


Solutions to the Euler equations [1]-[3] are generically 
discontinuous functions obeying the Clausius-Duhem 
inequality, the second law of thermodynamics: 


\[ \partial_t (\rho S) + \nabla_x \cdot (m S) \ge 0 \]   [9] 


in the sense of distributions. Such discontinuous 
solutions are called entropy solutions. 

When a flow is isentropic, that is, entropy S is a 
uniform constant $S_0$ in the flow, then the Euler 
equations for the flow take the simpler form: 


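\[ \partial_t \rho + \nabla_x \cdot m = 0, \qquad \partial_t m + \nabla_x \cdot \left( \frac{m \otimes m}{\rho} \right) + \nabla_x p = 0 \]   [10] 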


where the pressure is a function of the density, 
$p = p(\rho, S_0)$, with constant $S_0$. For a polytropic gas, 


\[ p(\rho) = \kappa \rho^{\gamma}, \qquad \gamma > 1 \]   [11] 


where $\kappa$ can be any positive constant by scaling. This 
system can be derived from [1] to [3] as follows: for 
smooth solutions of [1]-[3], entropy $S(\rho, m, E)$ is 
conserved along fluid particle trajectories, that is, 


\[ \partial_t (\rho S) + \nabla_x \cdot (m S) = 0 \] 


If the entropy is initially a uniform constant and 
the solution remains smooth, then the energy 
equation can be eliminated and entropy S keeps the 
same constant in later time. Thus, under constant 
initial entropy, a smooth solution of [1]-[3] satisfies 
the equations in [10]. Furthermore, solutions of 
system [10] are also a good approximation to 
solutions of system [1]-[3] even after shocks form, 
since the entropy increases across a shock to the 
third order in wave strength for solutions of [1]-[3], 
while in [10] the entropy is constant. Moreover, 
system [10] is an excellent model for the isothermal 
fluid flow with $\gamma = 1$ and for the shallow-water flow 
with $\gamma = 2$. For such barotropic flows (i.e., $p = p(\rho)$), 
the energy equation [3] serves as an entropy 
inequality (see Lax (1973)): 


\[ \partial_t E + \nabla_x \cdot \left( \frac{m}{\rho} (E + p(\rho)) \right) \le 0 \] 

in the sense of distributions. 


In the one-dimensional case, system [1]-[3] in 
Eulerian coordinates is 


\[ \partial_t \rho + \partial_x m = 0, \qquad \partial_t m + \partial_x (m^2/\rho + p) = 0, \qquad \partial_t E + \partial_x (m (E + p)/\rho) = 0 \]   [12] 


The system above can be rewritten in Lagrangian 
coordinates: 


\[ \partial_t \tau - \partial_x v = 0, \qquad \partial_t v + \partial_x p = 0, \qquad \partial_t (e + v^2/2) + \partial_x (p v) = 0 \]   [13] 


with $v = m/\rho$, where the coordinates $(t, x)$ are 
the Lagrangian coordinates, which are different 
from the Eulerian coordinates for [12]; for simplicity 
of notation, we do not distinguish them. 
For the barotropic case, systems [12] and [13] 
reduce to 


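\[ \partial_t \rho + \partial_x m = 0, \qquad \partial_t m + \partial_x (m^2/\rho + p) = 0 \]   [14] 


and 


\[ \partial_t \tau - \partial_x v = 0, \qquad \partial_t v + \partial_x p = 0 \]   [15] 


respectively, where pressure $p = p(\rho) = \tilde p(\tau)$, $\tau = 1/\rho$. 
The solutions of [12] and [13], as well as [14] and 
[15], are equivalent even for entropy solutions with 
vacuum where $\rho = 0$. 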

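The potential flow is well known in transonic 
aerodynamics, beyond the isentropic approximation 
[10] from [1] to [3]. Denote by 
$D_t = \partial_t + \sum_{k=1}^{d} v_k \partial_{x_k}$ the convective derivative along fluid 
particle trajectories. From [1] to [3], we have 


\[ D_t S = 0 \]   [16] 


and, by taking the curl of the momentum equations, 


\[ D_t \left( \frac{\omega}{\rho} \right) = \frac{\omega}{\rho} \cdot \nabla_x v + \frac{1}{\rho^3} \nabla_x \rho \times \nabla_x p \]   [17] 
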
The identities [16] and [17] imply that a smooth 
solution of [1]-[3] which is both isentropic and 
irrotational at time t=0 remains isentropic and 
irrotational for all later times, as long as this 
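solution stays smooth. Then, the conditions 
$S = S_0 = \text{const.}$ and $\omega = \operatorname{curl}_x v = 0$ are reasonable for 
smooth solutions. For a smooth irrotational solution, 
we integrate the $d$ momentum equations in 
[10] through Bernoulli's law: 


\[ \partial_t v + \nabla_x (|v|^2/2) + \nabla_x h(\rho) = 0 \] 


where $h'(\rho) = p_\rho(\rho, S_0)/\rho$. On a simply connected 
space region, the condition $\operatorname{curl}_x v = 0$ implies that 
there exists $\Phi$ such that $v = \nabla_x \Phi$. Then, 


\[ \partial_t \rho + \nabla_x \cdot (\rho \nabla_x \Phi) = 0, \qquad \partial_t \Phi + \tfrac{1}{2} |\nabla_x \Phi|^2 + h(\rho) = K \]   [18] 


for some constant $K$. From the second equation in 
[18], we have 


\[ \rho(D\Phi) = h^{-1}\left( K - \left( \partial_t \Phi + \tfrac{1}{2} |\nabla_x \Phi|^2 \right) \right) \] 


Then, system [18] can be rewritten as the following 
time-dependent potential flow equation of second 
order: 


\[ \partial_t \rho(D\Phi) + \nabla_x \cdot \left( \rho(D\Phi) \nabla_x \Phi \right) = 0 \]   [19] 


For a steady solution $\Phi = \Phi(x)$, that is, $\partial_t \Phi = 0$, 
we obtain the celebrated steady potential flow 
equation of aerodynamics: 


\[ \nabla_x \cdot \left( \rho(\nabla_x \Phi) \nabla_x \Phi \right) = 0 \]   [20] 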
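
For a polytropic gas the density in the potential flow equation can be written in closed form. The sketch below assumes $p(\rho) = \kappa\rho^{\gamma}$ with $\gamma > 1$, so that $h'(\rho) = p'(\rho)/\rho$ integrates to $h(\rho) = \frac{\kappa\gamma}{\gamma - 1}\rho^{\gamma - 1}$ under the normalization $h(0) = 0$ (an assumption made here). The function names are hypothetical.

```python
def enthalpy(rho, gamma=1.4, kappa=1.0):
    """h(rho) with h'(rho) = p'(rho)/rho for p = kappa * rho**gamma, h(0) = 0."""
    return kappa * gamma / (gamma - 1.0) * rho ** (gamma - 1.0)


def enthalpy_inverse(h, gamma=1.4, kappa=1.0):
    """Inverse of enthalpy(); defined only for h >= 0 (h = 0 is vacuum)."""
    if h < 0.0:
        raise ValueError("Bernoulli's law gives no admissible density here")
    return ((gamma - 1.0) * h / (kappa * gamma)) ** (1.0 / (gamma - 1.0))


def bernoulli_density(phi_t, grad_phi, K, gamma=1.4, kappa=1.0):
    """Density rho(D Phi) = h^{-1}(K - (Phi_t + |grad Phi|^2 / 2))."""
    speed_sq = sum(g * g for g in grad_phi)
    return enthalpy_inverse(K - (phi_t + 0.5 * speed_sq), gamma, kappa)


# Fix K so that the fluid at rest has unit density, then speed the flow up.
K = enthalpy(1.0)
print(bernoulli_density(0.0, (0.0, 0.0, 0.0), K))
print(bernoulli_density(0.0, (0.5, 0.0, 0.0), K))
```

In a steady flow the density falls as the speed rises, and the formula stops being defined once $|\nabla_x \Phi|^2/2$ exceeds $K$, which corresponds to cavitation.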


In applications in aerodynamics, [18] or [19] is 
used for discontinuous solutions, and the empirical 
evidence is that entropy solutions of [18] or [19] are 
fairly good approximations to entropy solutions for 
[1]-[3] provided that (1) the shock strengths are 
small, (2) the curvature of shock fronts is not too 
large, and (3) there is a small amount of vorticity in 
the region of interest. Model [19] or [18] is an 
excellent model to capture multidimensional shock 
waves by ignoring vorticity waves, while the 
incompressible Euler equations are an excellent 
model to capture multidimensional vorticity waves 
by ignoring shock waves. 


Local Well-Posedness for Classical Solutions 


Consider the Cauchy problem for the Euler equations 
[1]-[3] with Cauchy data [8]: 


Assume that $u_0 : \mathbb{R}^d \to D$ is in $H^s \cap L^\infty$ with $s > d/2 + 1$. 
Then, for the Cauchy problem [1]-[3] and [8], there 
exists a finite time $T = T(\|u_0\|_s, \|u_0\|_{L^\infty}) \in (0, \infty)$ such 
that there is a unique, stable bounded classical solution 
$u \in C^1([0, T] \times \mathbb{R}^d)$ with $u(t, x) \in D$ for $(t, x) \in [0, T] \times \mathbb{R}^d$ 
and $u \in C([0, T]; H^s) \cap C^1([0, T]; H^{s-1})$. Moreover, 
the interval $[0, T)$ with $T < \infty$ is the maximal interval 
of the classical $H^s$ existence for [1]-[3] if and only if 
either $\|(u_t, \nabla_x u)\|_{L^\infty} \to \infty$ or $u(t, x)$ escapes every 
compact subset $K \Subset D$ as $t \to T$. 


This local existence can be established by relying 
solely on the elementary linear existence theory for 
symmetric hyperbolic systems with smooth coeffi- 
cients (cf. Majda (1984)), or by the abstract 
semigroup theory (Kato 1975). 


Formation of Singularities 


For the one-dimensional case, singularities include 
the development of shock waves and formation of 
vacuum states. For the multidimensional case, the 
situation is much more complicated: besides shock 
waves and vacuum states, singularities can also be 
generated from vortex sheets, focusing and breaking 
of waves, among others. 

Consider the Cauchy problem of the Euler 
equations [1]-[3] in $\mathbb{R}^3$ for polytropic gases with 
smooth initial data: 


\[ (\rho, v, S)|_{t=0} = (\rho_0, v_0, S_0)(x), \qquad \rho_0(x) > 0, \quad x \in \mathbb{R}^3 \]   [21] 


satisfying $(\rho_0, v_0, S_0)(x) = (\bar\rho, 0, \bar S)$ for $|x| \ge R$, where 
$\bar\rho > 0$, $\bar S$, and $R$ are given constants. The equations 
possess a unique local $C^1$ solution $(\rho, v, S)(t, x)$ with 
$\rho(t, x) > 0$ provided that the initial data [21] is 
sufficiently regular. The support of the smooth 
disturbance $(\rho_0(x) - \bar\rho, v_0(x), S_0(x) - \bar S)$ propagates 
with speed at most $\sigma = \sqrt{p_\rho(\bar\rho, \bar S)}$ (the sound speed), 
that is, 


\[ (\rho, v, S)(t, x) = (\bar\rho, 0, \bar S) \quad \text{if } |x| \ge R + \sigma t \]   [22] 


Define 


\[ P(t) = \int_{\mathbb{R}^3} \left( p(\rho(t, x), S(t, x))^{1/\gamma} - p(\bar\rho, \bar S)^{1/\gamma} \right) dx \] 
\[ F(t) = \int_{\mathbb{R}^3} (\rho v)(t, x) \cdot x \, dx \] 


which, roughly speaking, measure the entropy and the 
radial component of momentum. Then, if $(\rho, v, S)(t, x)$ 
is a $C^1$ solution of [1]-[3] and [21] for $0 < t < T$, and 


\[ P(0) \ge 0, \qquad F(0) > \alpha \sigma R^4 \max_x \rho_0(x), \quad \text{with } \alpha = 16\pi/3 \]   [23] 


then the lifespan $T$ of the $C^1$ solution is finite 
(Sideris 1985). 

To illustrate a way in which the conditions in 
[23] may be satisfied, consider the initial data: 
$\rho_0 = \bar\rho$, $S_0 = \bar S$. Then $P(0) = 0$, and [23] holds if 


\[ \int_{|x| < R} v_0(x) \cdot x \, dx > \alpha \sigma R^4 \] 


Comparing both sides, one finds that the initial 
velocity must be supersonic in some region relative 
to the sound speed at infinity. The formation of a 
singularity (presumably a shock wave) is detected as 
the disturbance overtakes the wave front forcing the 
front to propagate with supersonic speed. 
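
The comparison of the two sides can be made explicit. The estimate below is a sketch added for illustration; it uses only the value $\alpha = 16\pi/3$, the sound speed $\sigma$ at infinity, and the elementary integral of $|x|$ over a ball of radius $R$ in $\mathbb{R}^3$.

```latex
\int_{|x|<R} v_0(x)\cdot x \,dx
  \;\le\; \max_{|x|<R}|v_0| \int_{|x|<R} |x|\,dx
  \;=\; \max_{|x|<R}|v_0| \cdot 4\pi\int_0^R r^{3}\,dr
  \;=\; \pi R^{4}\,\max_{|x|<R}|v_0| ,
\qquad\text{so}\qquad
\alpha\sigma R^{4} < \pi R^{4}\max_{|x|<R}|v_0|
\;\Longrightarrow\;
\max_{|x|<R}|v_0| > \tfrac{16}{3}\,\sigma .
```

The largeness condition can therefore hold only if the initial speed exceeds the sound speed $\sigma$ somewhere in $|x| < R$.
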
Singularities are formed even without the condition 
of largeness, such as [23], being satisfied. For 
example, if $S_0(x) \ge \bar S$ and, for some $0 < R_0 < R$, 


\[ \int_{|x| > r} |x|^{-1} (|x| - r)^2 (\rho_0(x) - \bar\rho) \, dx > 0 \] 
\[ \int_{|x| > r} |x|^{-3} (|x|^2 - r^2) \, \rho_0(x) v_0(x) \cdot x \, dx \ge 0 \]   [24] 


for $R_0 < r < R$, then the lifespan $T$ of the $C^1$ 
solution of [1]-[3] and [21] is finite. The 
assumptions in [24] mean that, in an average sense, 
the gas must be slightly compressed and outgoing 
directly behind the wave front. 


Local Well-Posedness for Shock-Front Solutions 


For a general hyperbolic system of conservation laws 
[4], shock-front solutions are discontinuous, piecewise 
smooth entropy solutions with the following structure: 


1. There exists a $C^2$ spacetime hypersurface $S(t)$ 
defined in $(t, x)$ for $0 \le t \le T$ with spacetime 
normal $(\nu_t, \nu_x) = (\nu_t, \nu_1, \ldots, \nu_d)$ as well as two 
$C^1$ vector-valued functions $u^+(t, x)$ and $u^-(t, x)$, 
defined on respective domains $D^+$ and $D^-$ on 
either side of the hypersurface $S(t)$ and satisfying 
$\partial_t u^\pm + \nabla_x \cdot f(u^\pm) = 0$ in $D^\pm$; 

2. The jump across the hypersurface $S(t)$ satisfies the 
Rankine-Hugoniot condition: 


\[ \left. \left( \nu_t (u^+ - u^-) + \nu_x \cdot (f(u^+) - f(u^-)) \right) \right|_S = 0 \] 
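
The Rankine-Hugoniot condition can be checked numerically in the simplest setting. The sketch below specializes to the one-dimensional isentropic system with $u = (\rho, m)$, $f(u) = (m, m^2/\rho + p(\rho))$ and $p = \kappa\rho^{\gamma}$, for a shock $x = st$ whose spacetime normal is proportional to $(-s, 1)$. The function names and sample states are hypothetical.

```python
import math


def pressure(rho, gamma=1.4, kappa=1.0):
    return kappa * rho ** gamma


def flux(rho, m, gamma=1.4, kappa=1.0):
    """Flux f(u) = (m, m^2/rho + p(rho)) for u = (rho, m), rho > 0."""
    return (m, m * m / rho + pressure(rho, gamma, kappa))


def shock_from_densities(rho_l, v_l, rho_r, gamma=1.4, kappa=1.0):
    """Right state (rho_r, m_r) and speed s joined to (rho_l, v_l) by a shock.

    Eliminating s from the two jump relations gives
        (v_r - v_l)^2 = (p_r - p_l) (rho_r - rho_l) / (rho_l rho_r).
    The root with v_r < v_l is chosen here (velocity drops across the jump).
    """
    dp = pressure(rho_r, gamma, kappa) - pressure(rho_l, gamma, kappa)
    v_r = v_l - math.sqrt(dp * (rho_r - rho_l) / (rho_l * rho_r))
    m_l, m_r = rho_l * v_l, rho_r * v_r
    s = (m_r - m_l) / (rho_r - rho_l)   # from the mass equation
    return (rho_r, m_r), s


def rh_residual(u_l, u_r, s, gamma=1.4, kappa=1.0):
    """Components of -s (u_r - u_l) + (f(u_r) - f(u_l)); zero on a shock."""
    f_l = flux(u_l[0], u_l[1], gamma, kappa)
    f_r = flux(u_r[0], u_r[1], gamma, kappa)
    return tuple(-s * (b - a) + (fb - fa)
                 for a, b, fa, fb in zip(u_l, u_r, f_l, f_r))


u_left = (1.0, 1.0 * 0.3)                       # (rho, m) with v = 0.3
u_right, speed = shock_from_densities(1.0, 0.3, 2.0)
print(rh_residual(u_left, u_right, speed))      # both entries near zero
```

Only the jump relation is verified here; whether a given jump is admissible is decided separately by an entropy condition.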


For [4], the surface S is not known in advance 
and must be determined as part of the solution of 
the problem; thus, the two equations in (1)-(2) 
describe a multidimensional, highly nonlinear, free- 
boundary-value problem. The initial data yielding 
shock-front solutions is defined as follows. Let $S_0$ be 
a smooth hypersurface parametrized by $\alpha$, and let 
$\nu(\alpha) = (\nu_1, \ldots, \nu_d)(\alpha)$ be a unit normal to $S_0$. Define 
the piecewise smooth initial values for respective 
domains $D_0^+$ and $D_0^-$ on either side of the hypersurface 
$S_0$ as 


\[ u_0(x) = \begin{cases} u_0^+(x), & x \in D_0^+ \\ u_0^-(x), & x \in D_0^- \end{cases} \]   [25] 


It is assumed that the initial jump in [25] satisfies the 
Rankine-Hugoniot condition, that is, there is a 
smooth scalar function $\sigma(\alpha)$ so that 


\[ -\sigma(\alpha) \left( u_0^+(\alpha) - u_0^-(\alpha) \right) + \nu(\alpha) \cdot \left( f(u_0^+(\alpha)) - f(u_0^-(\alpha)) \right) = 0 \]   [26] 


and that $\sigma(\alpha)$ does not define a characteristic 
direction, that is, 


\[ \sigma(\alpha) \ne \lambda_i(u_0^\pm), \qquad \alpha \in S_0, \quad 1 \le i \le n \]   [27] 


where $\lambda_i$, $i = 1, \ldots, n$, are the eigenvalues of [4]. It is 
natural to require that $S(0) = S_0$. 

Consider the Euler equations [1]-[3] in $\mathbb{R}^3$ for 
polytropic gases with piecewise smooth initial data: 


\[ (\rho, v, E)|_{t=0} = \begin{cases} (\rho_0^+, v_0^+, E_0^+)(x), & x \in D_0^+ \\ (\rho_0^-, v_0^-, E_0^-)(x), & x \in D_0^- \end{cases} \]   [28] 


Assume that $S_0$ is a smooth compact surface in $\mathbb{R}^3$ 
and that $(\rho_0^+, v_0^+, E_0^+)(x)$ belongs to the uniform local 
Sobolev space $H^s_{ul}(D_0^+)$, while $(\rho_0^-, v_0^-, E_0^-)(x)$ belongs 
to the Sobolev space $H^s(D_0^-)$, for some fixed $s \ge 10$. 
Assume also that there is a function $\sigma(\alpha) \in H^s(S_0)$ 
so that [26] and [27] hold, and the compatibility 
conditions up to order $s - 1$ are satisfied on $S_0$ by 
the initial data, together with the entropy condition: 


\[ v_0^+ \cdot \nu(\alpha) + \sqrt{p_\rho(\rho_0^+, S_0^+)} < \sigma(\alpha) < v_0^- \cdot \nu(\alpha) + \sqrt{p_\rho(\rho_0^-, S_0^-)} \]   [29] 


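Then, there are a $C^2$ hypersurface $S(t)$ and $C^1$ 
functions $(\rho^\pm, v^\pm, E^\pm)(t, x)$ defined for $t \in [0, T]$, 
with $T$ sufficiently small, so that 


\[ (\rho, v, E)(t, x) = \begin{cases} (\rho^+, v^+, E^+)(t, x), & (t, x) \in D^+ \\ (\rho^-, v^-, E^-)(t, x), & (t, x) \in D^- \end{cases} \]   [30] 


is the discontinuous shock-front solution of the 
Cauchy problem [1]-[3] and [28]. Here a vector 
function $u$ is in $H^s_{ul}$, provided that there exists 
some $r > 0$ so that $\max_{y \in \mathbb{R}^d} \|w_{r,y} u\|_{H^s} < \infty$ with 
$w_{r,y}(x) = w((x - y)/r)$, where $w \in C_0^\infty(\mathbb{R}^d)$ is a 
function so that $w(x) \ge 0$, $w(x) = 1$ when $|x| \le 1/2$, 
and $w(x) = 0$ when $|x| > 1$. 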

The compatibility conditions are needed in order 
to avoid the formation of discontinuities in higher 
derivatives along other characteristic surfaces emanating 
from $S_0$. Once the main condition [26] is 
satisfied, the compatibility conditions are automati- 
cally guaranteed for a wide class of initial data. The 
idea of the proof is to use the existence of a strictly 
convex entropy and the symmetrization of [4]; the 
shock-front solutions are defined as the limit of a 
convergent classical iteration scheme based on 
a linearization by using the theory of linearized 
stability for shock fronts (Majda 1984). An existence 
time for shock-front solutions that is uniform in the 
shock strength can also be achieved (Métivier 1990). 


Global Theory in $L^\infty$ for the Isentropic Euler 
Equations for $x \in \mathbb{R}$ 


Consider the Cauchy problem for [14] with initial 
data: 


\[ (\rho, m)|_{t=0} = (\rho_0, m_0)(x) \]   [31] 


where $\rho_0$ and $m_0$ are in the physical region 
$\{(\rho, m) : \rho \ge 0, |m| \le C_0 \rho\}$ for some $C_0 > 0$. System 
[14] is strictly hyperbolic at the states with $\rho > 0$, 
and strict hyperbolicity fails at the vacuum states 
$V := \{(\rho, m/\rho) : \rho = 0, |m/\rho| < \infty\}$. Then, we have: 


1. There exists a global solution $(\rho, m)(t, x)$ of the 
Cauchy problem [14] and [31] satisfying 


\[ 0 \le \rho(t, x) \le C, \qquad |m(t, x)| \le C \rho(t, x) \]   [32] 


for some $C > 0$ depending only on $C_0$ and $\gamma$, and 
the entropy inequality 


\[ \partial_t \eta(\rho, m) + \partial_x q(\rho, m) \le 0 \]   [33] 


in the sense of distributions for any convex weak 
entropy-entropy flux pair $(\eta, q)$, that is, 


\[ \nabla q(\rho, m) = \nabla \eta(\rho, m) \nabla f(\rho, m), \quad \text{with } \nabla^2 \eta(\rho, m) \ge 0 \text{ and } \eta|_V = 0 \] 


2. The solution operator $(\rho, m)(t, \cdot) = S_t(\rho_0, m_0)(\cdot)$, 
determined by (1), is compact in $L^1_{loc}(\mathbb{R})$ for $t > 0$; 

3. Furthermore, if $(\rho_0, m_0)(x)$ is periodic with period 
$P$, then there exists a global periodic solution 
$(\rho, m)(t, x)$ with [32] such that $(\rho, m)(t, x)$ asymptotically 
decays to 


\[ \frac{1}{|P|} \int_P (\rho_0, m_0)(x) \, dx \] 

in $L^1$. 


The convergence of the Lax-Friedrichs scheme, 
the Godunov scheme, and the vanishing viscosity 
method for system [14] have also been established. 
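
To make the first of these schemes concrete, here is a minimal sketch of the Lax-Friedrichs scheme applied to system [14] with $p = \kappa\rho^{\gamma}$ on a periodic grid. It illustrates the scheme only, not the convergence proof. The grid size, initial data, CFL number, and function names are all hypothetical choices.

```python
import math


def flux(rho, m, gamma, kappa):
    """Flux (m, m^2/rho + p(rho)) of system [14]; taken as zero at vacuum."""
    if rho <= 0.0:
        return (0.0, 0.0)
    return (m, m * m / rho + kappa * rho ** gamma)


def max_wave_speed(rho, m, gamma, kappa):
    """Largest |v| + c over the grid, with c = sqrt(p'(rho))."""
    speed = 1e-12  # floor avoids division by zero on an all-vacuum grid
    for r, mom in zip(rho, m):
        if r > 0.0:
            c = math.sqrt(kappa * gamma * r ** (gamma - 1.0))
            speed = max(speed, abs(mom / r) + c)
    return speed


def lax_friedrichs_step(rho, m, dx, dt, gamma, kappa):
    """One step: u_i <- (u_{i-1} + u_{i+1})/2 - dt/(2 dx) (f_{i+1} - f_{i-1})."""
    n = len(rho)
    f = [flux(rho[i], m[i], gamma, kappa) for i in range(n)]
    lam = dt / dx
    new_rho, new_m = [], []
    for i in range(n):
        l, r = (i - 1) % n, (i + 1) % n          # periodic neighbours
        new_rho.append(0.5 * (rho[l] + rho[r]) - 0.5 * lam * (f[r][0] - f[l][0]))
        new_m.append(0.5 * (m[l] + m[r]) - 0.5 * lam * (f[r][1] - f[l][1]))
    return new_rho, new_m


def solve(n=100, t_end=0.2, cfl=0.9, gamma=1.4, kappa=1.0):
    """Evolve rho_0 = 1 + 0.5 sin(2 pi x), m_0 = 0 on [0, 1) up to t_end."""
    dx = 1.0 / n
    rho = [1.0 + 0.5 * math.sin(2.0 * math.pi * (i + 0.5) * dx) for i in range(n)]
    m = [0.0] * n
    t = 0.0
    while t < t_end - 1e-14:
        dt = min(cfl * dx / max_wave_speed(rho, m, gamma, kappa), t_end - t)
        rho, m = lax_friedrichs_step(rho, m, dx, dt, gamma, kappa)
        t += dt
    return rho, m


rho, m = solve()
print(sum(rho) / len(rho))   # mean density, conserved by the scheme
print(min(rho) > 0.0)
```

With periodic boundaries the flux differences telescope, so total mass is conserved up to rounding. Under the CFL restriction the updated density is a combination of neighbouring densities with non-negative weights, so it stays non-negative, consistent with the bound [32].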

The results are based on a compensated compact- 
ness framework to replace the BV compactness 
framework. For a gas obeying the $\gamma$-law, the case 
$\gamma = (N+2)/N$, $N \ge 5$ odd, was first studied by 
DiPerna (1983), and the case $1 < \gamma \le 5/3$ for 
usual gases was first solved by Chen (1986) and 
Ding-Chen-Luo (1985). The cases $\gamma \ge 3$ and 
$5/3 < \gamma < 3$ were treated by Lions-Perthame-Tadmor 
(1994) and Lions-Perthame-Souganidis (1996), 
respectively. The case of general pressure laws was 
solved by Chen-LeFloch (2000, 2003). All the 
results for entropy solutions to [14] in Eulerian 
coordinates can equivalently be presented as the 
corresponding results for entropy solutions to [15] 
in Lagrangian coordinates. The isothermal case 
$\gamma = 1$ was treated by Huang-Wang (2002). 


Global Theory in BV for the Adiabatic Euler 
Equations for $x \in \mathbb{R}$ 


Consider the Euler equations [13] for polytropic 
gases with the Cauchy data: 


\[ (\tau, v, S)|_{t=0} = (\tau_0, v_0, S_0)(x) \]   [34] 


Then we have (Liu 1977, Temple 1981, Chen and 
Wagner 2003): 


Let $K \subset \{(\tau, v, S) : \tau > 0\}$ be a compact set in $\mathbb{R}_+ \times \mathbb{R}^2$, 
and let $N \ge 1$ be any constant. Then there exists a 
constant $C_0 = C_0(K, N)$, independent of $\gamma \in (1, 5/3]$, 
such that, for every initial data $(\tau_0, v_0, S_0) \in K$ with 
$TV_{\mathbb{R}}(\tau_0, v_0, S_0) \le N$, when 


\[ (\gamma - 1) \, TV_{\mathbb{R}}(\tau_0, v_0, S_0) \le C_0 \quad \text{for any } \gamma \in (1, 5/3] \] 


the Cauchy problem [13] and [34] has a global 
entropy solution $(\tau, v, S)(t, x)$ which is bounded and 
satisfies 


\[ TV_{\mathbb{R}}(\tau, v, S)(t, \cdot) \le C \, TV_{\mathbb{R}}(\tau_0, v_0, S_0) \] 

for some constant $C > 0$ independent of $\gamma$. 
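
The smallness condition in this result is stated in terms of the total variation of the data. As a rough illustration, the sketch below computes the total variation of grid samples and tests the inequality $(\gamma - 1)\,TV \le C_0$. The theorem only asserts that some $C_0$ exists, so the value passed in is a hypothetical input. Summing the variation of the three components is one possible convention for vector-valued data and is an assumption of this sketch.

```python
def total_variation(samples):
    """Total variation of a piecewise-constant grid function: sum of |jumps|."""
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))


def total_variation_state(tau, v, S):
    """Total variation of (tau, v, S), summed over components (a convention)."""
    return total_variation(tau) + total_variation(v) + total_variation(S)


def satisfies_smallness(tau, v, S, gamma, C0):
    """Check (gamma - 1) * TV(tau_0, v_0, S_0) <= C0 for a user-supplied C0."""
    return (gamma - 1.0) * total_variation_state(tau, v, S) <= C0


# Large-variation data can still satisfy the condition when gamma is near 1.
tau0 = [1.0, 3.0, 1.0, 3.0]
v0 = [0.0, 2.0, 0.0, 2.0]
S0 = [0.0, 0.0, 1.0, 1.0]
print(total_variation_state(tau0, v0, S0))
print(satisfies_smallness(tau0, v0, S0, gamma=1.01, C0=0.5))
print(satisfies_smallness(tau0, v0, S0, gamma=5.0 / 3.0, C0=0.5))
```

The example reflects the structure of the condition: the admissible total variation grows as $\gamma \to 1$.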


In particular, this result covers the barotropic 
case (Nishida 1968, Nishida-Smoller 1973, 
DiPerna 1973). Some efforts in the direction of 
relaxing the requirement of small total variation 
have been made. Some extensions to the initial-boundary 
value problems have also been made. In 
addition, an entropy solution in BV with periodic 
data or compact support decays when $t \to \infty$. 
Furthermore, even for a general hyperbolic system 
[4] for x € R, we have: 


If the initial data functions $u_0(x)$ and $v_0(x)$ have 
sufficiently small total variation and $u_0 - v_0 \in L^1(\mathbb{R})$, 
then, for the corresponding exact Glimm, or wave-front 
tracking, or vanishing viscosity solutions $u(t, x)$ 
and $v(t, x)$ of the Cauchy problem [4] and [8], there 
exists a constant $C > 0$ such that 


\[ \|u(t, \cdot) - v(t, \cdot)\|_{L^1(\mathbb{R})} \le C \|u_0 - v_0\|_{L^1(\mathbb{R})} \quad \text{for all } t > 0 \]   [35] 


An immediate consequence is that the whole 
sequence of the approximate solutions constructed 
by the Glimm (1965) scheme, as well as the wave- 
front tracking method and the vanishing viscosity 
method, converges to a unique entropy solution of 
[4] and [8] when the mesh size or the viscosity 
coefficient tends to zero. More detailed discussions 
and extensive references about the $L^1$-stability of BV 
entropy solutions and related topics can be found in 
Bressan (2000) and Dafermos (2000); also see Chen 
and Wang (2002). Furthermore, the Riemann solu- 
tion is unique and asymptotically stable in the class 
of entropy solutions to [13] with large variation 
satisfying only one physical entropy inequality 
(Chen-Frid-Li 2002). 


Multidimensional Steady Theory 


The mathematical study of two-dimensional steady 
supersonic flows past wedges, whose vertex angles 
are less than the critical angle, can date back to the 
1940s, since the stability of such flows is fundamental 
in applications (cf. Courant-Friedrichs (1948)). Local 
solutions around the wedge vertex were first 


constructed (Gu 1962, Schaeffer 1976, Li 1980). 
Global potential solutions were then constructed 
when the wedge has some convexity, or is a small 
perturbation of the straight wedge with fast decay in 
the flow direction (Chen 2001, Chen-Xin-Yin 2002), 
or is piecewise smooth and a small perturbation 
of the straight wedge (Zhang 2003). For 
two-dimensional steady supersonic flows governed 
by the full Euler equations past Lipschitz 
wedges, it has been shown (Chen-Zhang-Zhu 2005a) 
that, when the wedge vertex angle is less than 
the critical angle, the strong shock front 
emanating from the wedge vertex is nonlinearly 
stable in structure globally, although there may be 
many weak shocks and vortex sheets between the 
wedge boundary and the strong shock front, under 
a BV perturbation of the wedge such that the total 
variation of the tangent function along the wedge 
boundary is suitably small. This asserts that any 
supersonic shock for the wedge problem is nonlinearly 
stable. 

A self-similar gas flow past an infinite cone in $\mathbb{R}^3$ 
with small vertex angle is also nonlinearly stable 
under a BV perturbation of the obstacle (Lien-Liu 
1999). The nonlinear stability remains open when 
the infinite cone in $\mathbb{R}^3$ has arbitrary vertex angle. 
The stability issues of supersonic vortex sheets have 
been studied by classical linearized stability analysis, 
large-scale numerical simulations, and asymptotic 
analysis. In particular, the nonlinear development of 
instabilities of supersonic vortex sheets at high 
Mach number was predicted as time evolves 
(Woodward 1985, Artola-Majda 1989). In contrast 
with the prediction of evolution instability, steady 
supersonic vortex sheets, as time-asymptotics, are 
stable globally in structure, even under the BV 
perturbation of the Lipschitz walls, although there 
may be many weak shocks and supersonic vortex 
sheets away from the strong vortex sheet (Chen- 
Zhang-Zhu 2005b). 

Transonic shock problems for steady fluid flows 
are important in applications (cf. Courant and 
Friedrichs (1948)). A program on the existence and 
stability of multidimensional transonic shocks has 
been initiated and three new analytical approaches 
have been developed (Chen-Feldman 2003, 2004). 
The transonic problems include the existence and 
stability of transonic shocks in the whole $\mathbb{R}^d$, the 
existence and stability of transonic flows past finite 
or infinite nozzles, the stability of transonic flows 
past infinite nonsmooth wedges, and the existence of 
regular shock reflection solutions. The first 
approach is an iteration scheme based on the 
nondegeneracy of the free boundary condition: the 
jump of the normal derivative of a solution across 


the free boundary has a strictly positive lower bound 
(Chen-Feldman 2003, 2004), which works for the 
nonlinear equations whose coefficients may depend 
on not only the solution itself but also the gradients 
of the solution. The second approach is a partial 
hodograph procedure, with which the existence and 
stability of multidimensional transonic shocks that 
are not nearly orthogonal to the flow direction can 
be handled (Chen-Feldman 2004): one of the main 
ingredients in this approach is to employ a partial 
hodograph transform to reduce the free boundary 
problem into a conormal boundary value problem 
for the corresponding nonlinear equations of diver- 
gence form and then develop techniques to solve the 
conormal boundary value problem. When the reg- 
ularity of the steady perturbation is C*° or higher, 
the third approach is to employ the implicit function 
theorem to deal with the existence and stability 
problem. Another iteration approach, which works 
well for the two-dimensional equations whose coeffi- 
cients depend only on the solution itself, has also 
been developed (Canic-Keyfitz-Lieberman 2000). 

Further longstanding open problems include the 
existence of global transonic flows past an airfoil or 
a smooth obstacle (Morawetz 1956-58, 1985). 


Multidimensional Unsteady Problems 


Now we present some multidimensional time- 
dependent problems with a simplifying feature that 
the data (domain and/or the initial data) coupled 
with the structure of the underlying equations 
obey certain geometric structure so that the multi- 
dimensional problems can be reduced to lower- 
dimensional problems with more complicated 
couplings. Different types of geometric structure 
call for different techniques. 

The Euler equations for compressible fluids 
with geometric structure describe many important 
fluid flows, including spherically symmetric flows 
and self-similar flows. Such geometric flows 
are motivated by many physical problems such as 
shock diffraction, supernova formation in stellar 
dynamics, inertial confinement fusion, and under- 
water explosions. For the initial data with large 
amplitude having geometric structure, the requi- 
red physical insight is: (1) whether the solution 
has the same geometric structure globally and 
(2) whether the solution blows up to infinity in a 
finite time. These questions are not easily under- 
stood in physical experiments and numerical simula- 
tions, especially for the blow-up, because of the 
limited capacity of available instruments and 
computers. 


The first type of geometric structure is spherical 
symmetry. A criterion for $L^\infty$ Cauchy data functions 
of arbitrarily large amplitude was observed to 
guarantee the existence of spherically symmetric 
solutions in $L^\infty$ in the large for the isentropic flows, 
which model outgoing blast waves and large-time 
asymptotic solutions (Chen 1997). On the other hand, 
it is evident that the density blows up as $|x| \to 0$ in 
general, especially for the focusing case; the singularity 
at the origin makes the problem truly multidimensional 
due to the reflection of waves from 
infinity and their strengthening as they move radially 
inwards. One of the important open questions is to 
understand the order of singularity, $\rho(t, |x|) \sim |x|^{-\alpha}$, 
at the origin for bounded Cauchy data. 

The second type of geometric structure is self- 
similarity, that is, the solutions with initial data 
functions that give rise to self-similar solutions, 
especially including Riemann solutions. Compressible 
flow equations in $\mathbb{R}^d$, $d \ge 2$, with one or more 
linearly degenerate modes of wave propagation have 
additional difficulties. In that case, the global flow is 
governed by a reduced (self-similar) system which is 
of composite (hyperbolic—elliptic) type in the sub- 
sonic region. The linearly degenerate waves give rise 
to one or more families of degenerate characteristics 
which remain real in the subsonic region. In some 
cases, the reduced equations couple an elliptic 
(degenerate elliptic) problem for the density with a 
hyperbolic (transport) equation for the vorticity. 

An important prototype for both practical 
applications and the theory of multidimensional 
complex wave patterns is the problem of diffraction 
of a shock wave which is incident along an inclined 
ramp (see Glimm and Majda (1991)). When a 
plane shock hits a wedge head-on, a self-similar 
reflected shock moves outward as the original 
shock moves forward. The computational and 
asymptotic analysis shows that various patterns of 
reflected shocks may occur, including regular 
reflection and (simple, double, and complex) 
Mach reflections. The main part or whole reflected 
shock is a transonic shock in the self-similar 
coordinates, for which the corresponding equation 
changes the type from hyperbolic to elliptic across 
the shock. There are few rigorous mathematical 
results on the global existence and stability of 
shock reflection solutions and the transition among 
regular, simple Mach, double Mach, and complex 
Mach reflections for the potential flow equa- 
tion [19] and the full Euler equations [1]-[3]. 
Some results were recently obtained for simplified 
models including the transonic small-disturbance 
equation near the reflection point and the pressure
gradient equation when the wedge is close to a flat
wall. 

For the potential flow equation [19], a self-
similar solution is a solution of the form
$\Phi = t\psi(y)$, $y = x/t$. Letting $\varphi(y) = -|y|^2/2 + \psi(y)$,
the system can then be rewritten, by scaling, as a
second-order equation of mixed hyperbolic-elliptic
type in $y \in \mathbb{R}^d$:


$\nabla_y \cdot \big(\rho(|\nabla_y\varphi|^2, \varphi)\,\nabla_y\varphi\big) + d\,\rho(|\nabla_y\varphi|^2, \varphi) = 0$ [36]


with $\rho(q^2, z) = \big(1 - (\gamma - 1)(q^2 + 2z)/2\big)^{1/(\gamma - 1)}$. Equation [36]
at $|\nabla_y\varphi| = q$ is hyperbolic (pseudosupersonic) if
$\rho(q^2, z) + q\rho_q(q^2, z) < 0$ and elliptic (pseudosubsonic)
if $\rho(q^2, z) + q\rho_q(q^2, z) > 0$. Under this framework,
the nature of the shock reflection pattern has been 
explored for weak incident shocks (strength b) and 
small wedge angles $2\theta$, by a number of different
scalings, a study of mixed equations, and matching
asymptotics for the different scalings, where the
parameter $\beta = c_1\theta^2/(b(\gamma + 1))$ ranges from 0 to $\infty$
and $c_1$ is the speed of sound behind the incident
shock (Morawetz 1994). For $\beta > 2$, a regular
reflection of both strong and weak kinds is
possible as well as a Mach reflection; for $\beta < 1/2$,
a Mach reflection occurs and the flow behind
the reflection is subsonic and can be constructed in
principle (with an elliptic problem) and matched;
and for $1/2 < \beta < 2$, the flow behind a Mach
reflection may be transonic which is a solution of 
a nonlinear boundary-value problem of mixed 
type. The basic pattern of reflection has been 
shown to be an almost semicircular shock issuing, 
for a regular reflection, from the reflection point 
on the wedge and, for a Mach reflection, matched 
with a local interaction flow. Some related 
observations were also made (Keller-Blank 1951, 
Hunter-Keller 1984, Hunter 1988). It is important 
to establish rigorous proofs. Recently, a rigorous 
existence proof was established for global solutions 
to shock reflection by large-angle wedges in Chen 
and Feldman (2005). 
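
The three parameter ranges quoted above from Morawetz (1994) can be tabulated with a short sketch. The function name, the argument names, and the handling of the boundary values $\beta = 1/2$ and $\beta = 2$ (which the text does not assign to any regime) are assumptions made purely for illustration.

```python
def reflection_regime(c1, theta, b, gamma):
    """Classify weak-shock reflection by beta = c1*theta**2 / (b*(gamma + 1)).

    c1    : speed of sound behind the incident shock
    theta : half of the (small) wedge angle 2*theta
    b     : strength of the (weak) incident shock, b > 0
    gamma : adiabatic exponent
    """
    if b <= 0.0 or gamma <= -1.0:
        raise ValueError("expected b > 0 and gamma > -1")
    beta = c1 * theta**2 / (b * (gamma + 1.0))
    if beta > 2.0:
        regime = "regular reflection (strong or weak) or Mach reflection"
    elif beta < 0.5:
        regime = "Mach reflection, subsonic flow behind the reflection"
    elif 0.5 < beta < 2.0:
        regime = "Mach reflection, possibly transonic flow behind it"
    else:
        # beta equal to 1/2 or 2: not assigned to a regime in the text
        regime = "boundary value, not classified"
    return beta, regime
```

For a fixed wedge angle, weakening the incident shock (smaller b) increases $\beta$ and moves the configuration toward the range in which regular reflection is possible.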


Analytical Frameworks for Entropy Solutions 


The recent great progress for entropy solutions for 
one-dimensional time-dependent Euler equations 
and two-dimensional steady Euler equations, based 
on BV, $L^1$, or even $L^\infty$ estimates, naturally raises the
expectation that a similar approach may also be
effective for the multidimensional Euler equations, 
or more generally, hyperbolic systems of conserva- 
tion laws, especially, 


$\|u(t, \cdot)\|_{BV} \le C\,\|u_0\|_{BV}$ [37]

Unfortunately, this is not the case. The necessary
condition for [37] to hold for $p \ne 2$ (Rauch
1986) is


$\nabla f_k(u)\,\nabla f_l(u) = \nabla f_l(u)\,\nabla f_k(u)$
for all $k, l = 1, 2, \ldots, d$ [38]


The analysis suggests that only systems in which the 
commutativity relation [38] holds offer any hope for 
treatment in the framework of BV. This special case 
includes the scalar case n=1 and the case of one 
space dimension d = 1. Beyond that, it contains very 
few systems of physical interest. 
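
As an illustration of how restrictive the commutativity relation is, the following sketch evaluates the commutator of the two flux Jacobians of the two-dimensional isentropic Euler equations at a state of rest. The choice of conservative variables $(\rho, m_1, m_2)$, the parametrization by velocity $(u, v)$ and sound speed $c$ with $c^2 = p'(\rho)$, and all function names are assumptions of this example, not notation taken from the article.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(a, b):
    """Return the matrix AB - BA."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


def commute(a, b, tol=1e-12):
    """True if every entry of AB - BA is at most tol in absolute value."""
    return all(abs(x) <= tol for row in commutator(a, b) for x in row)


def euler_jacobians(u, v, c):
    """Flux Jacobians of the 2D isentropic Euler equations in the
    conservative variables (rho, m1, m2), for velocity (u, v) and
    sound speed c, where c**2 = p'(rho)."""
    a1 = [[0.0, 1.0, 0.0],
          [c * c - u * u, 2.0 * u, 0.0],
          [-u * v, v, u]]
    a2 = [[0.0, 0.0, 1.0],
          [-u * v, v, u],
          [c * c - v * v, 0.0, 2.0 * v]]
    return a1, a2


a1, a2 = euler_jacobians(0.0, 0.0, 1.0)
euler_commutes = commute(a1, a2)        # False: entries +c**2 and -c**2 survive
scalar_commutes = commute([[2.0]], [[3.0]])  # True: the scalar case n = 1
```

Even at rest the commutator has entries $\pm c^2$, so the Euler system falls outside the class singled out by [38], whereas any scalar law satisfies it trivially.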

In this regard, it is important to identify effective 
analytical frameworks for studying entropy solu- 
tions of the multidimensional Euler equations [1]- 
[3], which are not in BV. Naturally, we want to 
approach the questions of existence, stability, 
uniqueness, and long-time behavior of entropy 
solutions with as much generality as possible. For 
this purpose, a theory of divergence-measure fields 
to construct such a global framework has been 
developed for studying entropy solutions (Chen-Frid 
1999, 2000, Chen-Torres 2005, Chen-Torres-Ziemer 
2005). For more details, see Chen (2005). 


Viscous Compressible Fluid Flows: 
Navier-Stokes Equations 


Compressible fluid flows that are viscous and 
conduct heat are governed by the following 
Navier-Stokes equations: 


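$\partial_t\rho + \nabla_x \cdot m = 0, \quad x \in \mathbb{R}^d$ [39]


$\partial_t m + \nabla_x \cdot \left(\frac{m \otimes m}{\rho}\right) + \nabla_x p = \nabla_x \cdot \Sigma$ [40]


$\partial_t E + \nabla_x \cdot \left(\frac{m}{\rho}(E + p)\right) = \nabla_x \cdot \left(\frac{m}{\rho} \cdot \Sigma\right) - \nabla_x \cdot q$ [41]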


Here, $\Sigma = \Sigma(\nabla_x v, \rho, \theta)$ is the viscous stress tensor,
which is symmetric from the conservation of angular
momentum, and q is the heat flux. If the fluid is
isotropic and the viscous tensor $\Sigma$ is a linear function
of $\nabla_x v$ and invariant under a change of reference
frame (translation and rotation), then we deduce
from elementary algebraic manipulations that
necessarily


$\Sigma = \lambda(\rho, \theta)\,(\nabla_x \cdot v)\,I + 2\mu(\rho, \theta)\,D$ [42]


which corresponds to the Newtonian fluids, where
$I$ is the identity matrix,
$D = (\nabla_x v + (\nabla_x v)^{\top})/2$ is the deformation tensor, and
$\lambda$ and $\mu$ are the Lamé viscosity coefficients.


Furthermore, since the fluid is isotropic, we are led 
to the Fourier law: 


$q = -k(\rho, \theta, |\nabla_x\theta|)\,\nabla_x\theta$


for a scalar function k which, in most cases, is taken
to be simply a function of $\rho$ and $\theta$, or even a
constant called the thermal conduction coefficient.
Again, system [39]-[41] is closed by the constitutive
relations in [5]. The equation for entropy S is





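$\partial_t(\rho S) + \nabla_x \cdot \left(mS + \frac{q}{\theta}\right) = \frac{\Sigma(\nabla_x v) : \nabla_x v}{\theta} - \frac{q \cdot \nabla_x\theta}{\theta^2}$ [43]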


The second law of thermodynamics indicates that
the right-hand side of [43] should be non-negative,
which yields the restriction:

$k(\rho, \theta, |\nabla_x\theta|) \ge 0, \quad \mu \ge 0, \quad \lambda + 2\mu/d \ge 0$

The case $\mu > 0$ and $\lambda + \mu > 0$ is the viscous case
with heat conductivity $k > 0$. In particular, the
kinetic theory indicates that the Stokes relationship
should hold, namely $\lambda = -2\mu/d$, and the adiabatic
exponent $\gamma = 5/3$ for monatomic gases.
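
A minimal sketch of these sign conditions follows, assuming the restriction reads $k \ge 0$, $\mu \ge 0$, $\lambda + 2\mu/d \ge 0$ for constant coefficients. The function names are hypothetical, and the tolerance guarding against rounding is an implementation choice.

```python
def stokes_lambda(mu, d):
    """Second viscosity coefficient given by the Stokes relationship."""
    return -2.0 * mu / d


def satisfies_second_law(k, mu, lam, d, tol=1e-14):
    """Check k >= 0, mu >= 0 and lam + 2*mu/d >= 0 (up to rounding)."""
    return k >= 0.0 and mu >= 0.0 and lam + 2.0 * mu / d >= -tol


mu, d = 1.8e-5, 3                      # illustrative values only
lam = stokes_lambda(mu, d)
stokes_ok = satisfies_second_law(k=0.026, mu=mu, lam=lam, d=d)   # True
too_negative = satisfies_second_law(k=0.026, mu=mu, lam=-mu, d=d)  # False
```

With the Stokes relationship the combination $\lambda + 2\mu/d$ vanishes, so the third condition holds with equality while $\lambda + \mu = \mu(1 - 2/d)$ remains positive for $d = 3$.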

In mathematical viscous fluid dynamics, an 
important model is the barotropic model for 
viscous fluids, that is, $p = p(\rho)$. Then, the specific
energy E can be taken in the form
$E = (1/2)\rho|v|^2 + \rho e(\rho)$ with $e'(\rho) = p(\rho)/\rho^2$. For classical
solutions, the energy of a barotropic flow
satisfies the equality:


$\partial_t E + \nabla_x \cdot ((E + p)v) = \nabla_x \cdot (\Sigma v) - \Sigma : \nabla_x v$


which is now a direct consequence of [39] and [40]. 

The question of local existence of classical 
solutions to [39]-[41] for regular initial data was 
addressed by Nash (1962), where there is no 
indication whether or not these solutions exist for 
all times. 

In the case of one space dimension, the well- 
posedness is largely settled. The basic result for the 
existence of classical solutions is that of Kazhikhov 
(1976); see Lions (1998) and Feireisl (2004) for 
extensive references. The discontinuous solutions 
have been constructed (Shelukhin 1979, Serre 1986, 
Hoff 1987, Chen-Hoff-Trivisa 2000). 

For the Navier-Stokes equations in $\mathbb{R}^d$ with
general equation of state, the global classical 
solutions for the Cauchy problem and various 
initial-boundary value problems whose initial data 
is small around a constant state have been
constructed (Matsumura-Nishida 1980, 1983). The
approach is to obtain a priori estimates via energy 
methods for extending the local solution or for a 
difference method globally. These results have been 
extended to the Cauchy problem or the initial- 
boundary value problems with small discontinuous 
initial data (Hoff 1997). 

For the Navier-Stokes equations in $\mathbb{R}^d$ for
barotropic flows with [11] and large initial data,
the global existence of solutions containing vacuum
for the Cauchy problem or various initial-boundary
value problems was first established by Lions
(1998) for $\gamma \ge 3/2$ if $d = 2$, $\gamma \ge 9/5$ if $d = 3$, and
$\gamma > d/2$ if $d \ge 4$. The gap was closed by Feireisl-
Novotny-Petzeltova (2001) for the full range
$\gamma > d/2$. These results have been extended to the
full Navier-Stokes equations describing the motion
of a general compressible, viscous, and heat-conducting
fluid (see Feireisl (2004)). The physically
relevant isothermal case, $\gamma = 1$, is completely open
even if $d = 2$. The only large data existence result is
that for radially symmetric data (Hoff 1992). The
general case $\gamma > 1$ and $d = 3$ for radially symmetric
data was solved only recently (Jiang-Zhang 2001).
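
The ranges of the adiabatic exponent covered by the two existence results can be compared with a small sketch. It assumes the thresholds $\gamma \ge 3/2$ ($d = 2$), $\gamma \ge 9/5$ ($d = 3$), $\gamma > d/2$ ($d \ge 4$) for Lions (1998) and $\gamma > d/2$ for Feireisl-Novotny-Petzeltova (2001); the function names are hypothetical.

```python
def covered_by_lions(gamma, d):
    """Range of gamma treated by Lions (1998) in space dimension d >= 2."""
    if d == 2:
        return gamma >= 1.5
    if d == 3:
        return gamma >= 1.8
    if d >= 4:
        return gamma > d / 2.0
    raise ValueError("expected d >= 2")


def covered_by_fnp(gamma, d):
    """Range of gamma treated by Feireisl-Novotny-Petzeltova (2001)."""
    if d < 2:
        raise ValueError("expected d >= 2")
    return gamma > d / 2.0


monatomic = 5.0 / 3.0
in_lions = covered_by_lions(monatomic, 3)   # False: 5/3 < 9/5
in_fnp = covered_by_fnp(monatomic, 3)       # True: 5/3 > 3/2
isothermal_2d = covered_by_fnp(1.0, 2)      # False: gamma = 1 stays open
```

The monatomic value $\gamma = 5/3$ in three dimensions lies exactly in the gap between the two results, while the isothermal case is excluded by both.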

The lower-bound estimate on the density is a 
delicate issue. Weak solutions containing vacuum 
for the isentropic viscous flows with constant 
viscosity are unstable in general (Hoff-Serre 
1991). Hence, it is important to see whether 
vacuum will never develop if the initial data is 
away from vacuum; this has been shown for the 
one-dimensional case for large initial data and 
for the multidimensional case with small data. On 
the other hand, from the kinetic theory, if 
solutions contain vacuum, then the viscosity 
coefficients in the Navier-Stokes equations should 
depend on the density near vacuum; this indeed 
stabilizes the solutions for the one-dimensional 
case. 

The stability of viscous shock waves has been 
studied for the one-dimensional case (see Liu (2000) 
and the references therein). The compressible- 
incompressible limits from the isentropic compres- 
sible to incompressible Navier-Stokes equations 
when the Mach number tends to zero have been 
established for arbitrarily weak solutions (Lions- 
Masmoudi 1998) and for smooth solutions and a 
class of initial data functions (Hoff 1998).

The inviscid limits from the Navier-Stokes equations
to the Euler equations have been established as
long as the solutions of the Euler equations are 
smooth, when the viscosity and heat conductivity 
coefficients tend to zero (Klainerman-Majda 1982). 
It is completely open for general entropy solutions, 
even in the one-dimensional case. 


See also: Breaking Water Waves; Capillary Surfaces; 
Fluid Mechanics: Numerical Methods; Geophysical 
Dynamics; Incompressible Euler Equations: 
Mathematical Theory; Inviscid Flows; 
Magnetohydrodynamics; Newtonian Fluids and 
Thermohydraulics; Non-Newtonian Fluids; Partial 
Differential Equations: Some Examples; Stability of 
Flows; Viscous Incompressible Fluids: Mathematical 
Theory.


Conventions and Units


This article adopts many of the conventions and 
notations of Misner, Thorne, and Wheeler (1973),
hereafter denoted MTW, including metric signature
$(-\,+\,+\,+)$; definitions of Christoffel symbols and
curvature tensors (up to index permutations permitted
by standard symmetries of the tensors in a
coordinate basis); the use of Greek indices
$\alpha, \beta, \gamma, \ldots$, ranging over the spacetime coordinate
values $(0, 1, 2, 3) \to (t, x^1, x^2, x^3)$, to denote the components
of spacetime tensors such as $g_{\mu\nu}$; the similar
use of Latin indices $i, j, k, \ldots$, ranging over the
spatial coordinate values $(1, 2, 3) \to (x^1, x^2, x^3)$, for
spatial tensors such as $\gamma_{ij}$; the use of the Einstein
summation convention for both types of indices; the
use of standard Kronecker delta symbols (tensors),
$\delta^\mu{}_\nu$ and $\delta^i{}_j$; the choice of geometric units, $G = c = 1$;
and, finally, the normalization of the matter fields
implicit in the choice of the constant $8\pi$ in [1].

The majority of the equations that appear in this 
article are tensor equations, or specific components 
of tensor equations, written in traditional index (not 
abstract index) form. Thus, these equations are 
generally valid in any coordinate system, $(t, x^i)$,
but, of course, do require the introduction of a
coordinate basis and its dual. This approach is also 
largely a matter of convention, since all of what 
follows can be derived in a variety of fashions, some 
of them purely geometrical, and there are also 
approaches to numerical relativity based, for exam- 
ple, on frames rather than coordinate bases. 

This article departs from MTW in its use of $\alpha$, $\beta^i$,
and $\gamma_{ij}$ to denote the lapse, shift, and spatial metric,
respectively, rather than MTW’s $N$, $N^i$, and ${}^{(3)}g_{ij}$.

Finally, the operations of partial differentiation
with respect to coordinates $x^\mu$, $t$, and $x^i$ are denoted
$\partial_\mu$, $\partial_t$, and $\partial_i$, respectively.


Introduction 


The numerical analysis of general relativity, or 
numerical relativity, is concerned with the use of 
computational methods to derive approximate solu- 
tions to the Einstein field equations 


$G_{\mu\nu} = 8\pi T_{\mu\nu}$ [1]


Here, $G_{\mu\nu}$ is the Einstein tensor (that contracted
piece of the Riemann curvature tensor that has
vanishing divergence) and $T_{\mu\nu}$ is the stress tensor of
the matter content of the spacetime. $T_{\mu\nu}$ likewise has
vanishing divergence, an expression of the principle 
of local conservation of stress-energy that general 
relativity embodies. 

The elegant tensor formulation [1] belies the fact 
that, ultimately, the field equations are generically a 
complicated and nonlinear set of partial differential 
equations (PDEs) for the components of the spacetime
metric tensor, $g_{\mu\nu}(x^\alpha)$, in some coordinate
system $x^\mu$. Moreover, implicit in a numerical
solution of [1] is the numerical solution of the
equations of motion for any matter fields that
couple to the gravitational field, that is, that
contribute to $T_{\mu\nu}$. The reader is reminded that it is a
hallmark of general relativity that, in principle, all
matter fields, including massless ones such as the
electromagnetic field, contribute to $T_{\mu\nu}$.

Now, in the 3+ 1 approach to general relativity 
that is described below, the task of solving the field 
equations [1] is formulated as an initial-value or 
Cauchy problem. Specifically, the spacetime metric, 
$g_{\mu\nu}(x^\alpha) = g_{\mu\nu}(t, x^k)$, which encodes all geometric
information concerning the spacetime, $\mathcal{M}$, is
viewed as the time history, or dynamical evolution,
of the spatial metric, $\gamma_{ij}(0, x^k)$, of an initial spacelike
hypersurface, $\Sigma(0)$. In any practical calculation,
the degree to which the matter fields “back-react”
on the gravitational field, that is, contribute to $T_{\mu\nu}$
substantially enough to cause perturbations in $g_{\mu\nu}$
at or above the desired accuracy threshold, will 
thus depend on the specifics of the initial 
configuration. 

In astrophysics, there are relatively few well- 
identified environments in which it is generally 
thought to be crucial to the faithful emulation of 
the physics that the matter fields be fully coupled to 
the gravitational field. However, both observation- 
ally and theoretically, the existence of gravitation- 
ally compact objects is quite clear. Gravitationally 
compact means that a star with mass, M, has a 
radius, R, comparable to its Schwarzschild radius, 
$R_M$, which is defined by


$R_M = \frac{2G}{c^2} M \approx 10^{-27}\, M \ \mathrm{kg^{-1}\,m}$ [2]

Here, and only here, G and c (Newton’s gravitational
constant and the speed of light, respectively)
have been explicitly reintroduced. The fact that
$R_M/R$ is about $10^{-6}$ and $10^{-9}$ at the surfaces of the
Sun and Earth, respectively, is a reminder of just how
weak gravity is in the locality of Earth. However, as
befits anything of Einsteinian nature, the weakness 
of gravity is relative, so that at the surface of a 
neutron star, one would find 


$\frac{R_M}{R} \sim 0.4$ [3]

while for black holes, one has

$\frac{R_M}{R} \sim 1$ [4]
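
The orders of magnitude quoted here can be checked with a few lines of arithmetic. In the sketch below the physical constants and the solar and terrestrial masses and radii are rounded reference values, and the neutron-star mass and radius (1.4 solar masses, 10 km) are illustrative assumptions rather than figures taken from the article; the function names are hypothetical.

```python
G_NEWTON = 6.674e-11      # m^3 kg^-1 s^-2 (rounded)
C_LIGHT = 2.998e8         # m s^-1 (rounded)


def schwarzschild_radius(mass_kg):
    """R_M = 2 G M / c^2, in metres."""
    return 2.0 * G_NEWTON * mass_kg / C_LIGHT**2


def compactness(mass_kg, radius_m):
    """Dimensionless ratio R_M / R."""
    return schwarzschild_radius(mass_kg) / radius_m


M_SUN, R_SUN = 1.989e30, 6.957e8        # kg, m
M_EARTH, R_EARTH = 5.972e24, 6.371e6    # kg, m

sun = compactness(M_SUN, R_SUN)                 # a few times 1e-6
earth = compactness(M_EARTH, R_EARTH)           # about 1e-9
neutron_star = compactness(1.4 * M_SUN, 1.0e4)  # a few tenths (assumed M, R)
```

The prefactor `2 * G_NEWTON / C_LIGHT**2` evaluates to roughly $1.5 \times 10^{-27}$ m/kg, in line with the estimate in [2].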


In such circumstances, gravity is anything but 
weak! Furthermore, in situations where the matter-energy
distribution has a highly time-dependent
quadrupole moment — such as occurs naturally with 
a compact-binary system (i.e. a gravitationally 
bound two-body system, in which each of the 
bodies is either a black hole or a neutron star) — the 
dynamics of the gravitational field, including, 
crucially, the dynamics of the radiative components 
of the gravitational field, can be expected to 
dominate the dynamics of the overall system, 
matter included. For scenarios such as these, it 
should come as no surprise that the solution of the 
combined gravitohydrodynamical system begs for 
numerical analysis. 

In addition, both from the physical and mathe- 
matical perspectives, it is also natural to study the 
strong-field, dynamic regimes ($R \to R_M$ and/or $v \to c$,
where v is the typical speed characterizing internal 
bulk motion of the matter) of general relativity 
within the context of a variety of matter models. 
Typical processes addressed by these theoretical 
studies include the process of black hole formation, 
end-of-life events for various types of model stars, 
and, again, the interaction, including collisions, of 
gravitationally compact objects. Note that it is 
another hallmark of general relativity that highly 
dynamical spacetimes need not contain any matter; 
indeed, the interaction of two black holes — the 
natural analog of the Kepler problem in relativity — 
is a vacuum problem; that is, it is described by a 
solution of [1] with $T_{\mu\nu} = 0$.

Motivated in significant part by the large-scale 
efforts currently underway to directly detect gravita- 
tional radiation (gravitational waves), much of the 
contemporary work in numerical relativity is 
focused on precisely the problem of the late phases 
of compact-binary inspiral and merger. Such bin- 
aries are expected to be the most likely candidates 
for early detection by existing instruments such as 
TAMA, GEO, VIRGO, LIGO, and, more likely, by 
planned detectors including LIGO II and LISA (see, 
e.g. Hough and Rowan (2000)). Detailed and 
accurate predictions of expected waveforms from 


these events — using the techniques of numerical 
relativity — have the potential to substantially hasten 
the discovery process, on the basis of the general 
principle that if one knows what signal to look for, 
it is much easier to extract that signal from the 
experimental noise. 

The computational task facing numerical relati- 
vists who study problems such as binary inspiral is 
formidable. In particular, such problems are intrin- 
sically “3D,” to use the CFD (computational fluid 
dynamics) nomenclature in which time dependence 
is always assumed. That is, the PDEs that must be 
solved govern functions, $F(t, x^k)$, that depend on all
three spatial coordinates, $x^k$, as well as on time, t.
Unfortunately, even a cursory description of 3D 
work in numerical relativity as it stands at this time 
is far beyond the scope of this article. 

What follows, then, is an outline of a traditional 
approach to numerical relativity that underpins 
many of the calculations from the early years of 
the field (1970s and 1980s), most of which were 
carried out with simplifying restrictions to 
either spherical symmetry or axisymmetry. The 
mathematical development, which will hereafter be 
called the 3+ 1 approach to general relativity, has 
the advantage of using tensors and an associated 
tensor calculus that are reasonably intuitive for the 
physicist. This “standard” 3+ 1 approach is also 
sufficient in many instances (particularly those 
with symmetry) in the sense that it leads to well- 
posed sets of PDEs that can be discretized and 
then solved computationally in a convergent 
(stable) fashion. In addition, a thorough under- 
standing of the 3+1 approach will be of sig- 
nificant help to the reader wishing to study any of 
the current literature in numerical relativity, 
including the 3D work. 

However, the reader is strongly cautioned that 
the blind application of any of the equations that 
follow, especially in a 3D context, may well lead 
to “ill-posed systems,” numerical analysis of which 
is useless. Anyone specifically interested in using 
the methods of numerical relativity to generate 
discrete, approximate solutions to [1], particularly 
in the generic 3D case, is thus urged to first 
consult one of the comprehensive reviews of 
numerical relativity that continue to appear at 
fairly regular intervals (see, e.g., Lehner (2001), or 
Baumgarte and Shapiro (2003)). Most such refer- 
ences will also provide a useful overview of many 
of the most popular numerical techniques that are 
currently being used to discretize (convert to 
algebraic form) the Einstein equations, as well as 
the main algorithms that are used to solve the 
resulting discrete equations. These subjects are not
described below, not least since discussion of the
available discretization techniques only makes 
sense in the context of PDEs of specific systems 
with specific boundary conditions, while there is 
only space here to describe the general mathema- 
tical setting for 3+ 1 numerical relativity. 


The 3+ 1 Spacetime Split 


At least at the current time, computations in 
numerical relativity are restricted to the case of 
globally hyperbolic spacetimes. A spacetime (four- 
dimensional pseudo-Riemannian manifold), $\mathcal{M}$,
endowed with a metric, $g_{\mu\nu}$, is globally hyperbolic
if there is at least one edgeless, spacelike hypersurface,
$\Sigma(0)$, that serves as a Cauchy surface. That is,
provided that the initial data for the gravitational
field are set consistently on $\Sigma(0)$, so that the four
constraint equations are satisfied (see below), the
entire metric $g_{\mu\nu}(t, x^k)$ can be determined from the
field equations [1] (with appropriate boundary 
conditions), and thus, so can the complete geometric 
structure of the spacetime manifold. 

To be sure, global hyperbolicity is restrictive. It 
excludes, for example, the highly interesting Gödel
universe. However, particularly from the point 
of view of studying asymptotically flat solutions 
(or solutions asymptotic to any of the currently 
popular cosmologies), as is usually the case in 
astrophysics, the requirement of global hyperbolicity 
is natural. 

The 3 + 1 split is based on the complete foliation
of $\mathcal{M}$ by level surfaces of a scalar function,
t, the time function. That is, the t = const. slices, $\Sigma(t)$,
are three-dimensional spacelike (Riemannian) hypersurfaces,
and, as t ranges from $-\infty$ to $+\infty$,
completely fill the spacetime manifold, $\mathcal{M}$. In
order for the $\Sigma(t)$ to be everywhere spacelike,
the gradient of t must be everywhere timelike:


$g^{\mu\nu}\,\nabla_\mu t\,\nabla_\nu t < 0$ [5]


Here $\nabla_\mu$ is the spacetime covariant derivative
operator compatible with the four metric, $g_{\mu\nu}$, thus
satisfying $\nabla_\alpha g_{\mu\nu} = 0$, and $g^{\mu\nu}$ is the inverse metric
tensor, which satisfies $g^{\mu\alpha}g_{\alpha\nu} = \delta^\mu{}_\nu$. The reader is
reminded that $\delta^\mu{}_\nu$ is a Kronecker delta symbol; that
is, $\delta^\mu{}_\nu$ has the value 1 if $\mu = \nu$, and the value 0
otherwise.

Furthermore, the scalar function t is now adopted 
as the temporal coordinate, so that $x^\mu = (t, x^i)$,
where the $x^i$ are the three spatial coordinates. As
noted implicitly above, since the problem under 
consideration is a pure Cauchy evolution, the range 


of t should nominally be infinite, both to the future 
as well as to the past; that is, the solution domain is 


$-\infty < t < \infty$ [6]


$|x| \equiv (\delta_{ij}\,x^i x^j)^{1/2} < \infty$ [7]


However, this assumes that one has global 
existence for arbitrarily strong initial data, which 
is decidedly not always the case in general 
relativity. Indeed, “continued” or “catastrophic”
gravitational collapse — that is, the process of black 
hole formation — signaled, in modern language, by 
the appearance of a trapped surface, inexorably 
leads to a physical singularity, which - the 
somewhat vague nature of the singularity theorems 
of Penrose, Hawking, and others notwithstanding — 
in actual numerical computations invariably turns 
out to be “catastrophic” in terms of Cauchy 
evolution. 

Such behavior in time-dependent nonlinear PDEs 
is quite familiar in the mathematical community at 
large, where it is frequently known as finite-time 
blow-up (or finite-time singularity). However, 
despite the fact that such behavior is one of the 
most fascinating aspects of solutions of the Einstein 
equations, the following discussion will be, impli- 
citly at least, restricted to the case of weak initial 
data, that is, to initial data for which there is global 
existence. 

With the manifold $\mathcal{M}$ sliced into an infinite
stack of spacelike hypersurfaces, $\Sigma(t)$, attention
shifts to any single surface, as well as to the 
manner in which such a generic surface is 
embedded in the spacetime. 

First, each spacelike hypersurface, $\Sigma(t)$, is itself a
three-dimensional Riemannian differential manifold
with a metric $\gamma_{ij}(t, x^k)$. (Note that in this discussion,
the symbol t is to be understood to represent any
specific value of coordinate time.) From this metric,
one can construct an inverse metric, $\gamma^{ij}(t, x^k)$,
defined, as usual, so that


$\gamma^{ik}\gamma_{kj} = \delta^i{}_j$ [8]


Associated with the spatial metric, $\gamma_{ij}$, is a natural
spatial covariant derivative operator, $D_i$, that is
compatible with $\gamma_{ij}$:


$D_k\gamma_{ij} = 0$ [9]


With the spatial metric, $\gamma_{ij}$, and its inverse, $\gamma^{ij}$, in
hand, the standard formulas of tensor analysis can
be applied to compute the usual suite of geometrical
tensors. All tensors thus computed, and
indeed, all tensors defined intrinsically to the
hypersurfaces $\Sigma(t)$ are called “spatial” tensors, and
have their indices (if any) raised and lowered with
$\gamma^{ij}$ and $\gamma_{ij}$, respectively.

Thus, the Christoffel symbols of the second kind,
$\Gamma^i{}_{jk}$, are given by


$\Gamma^i{}_{jk} = \tfrac{1}{2}\gamma^{il}\left(\partial_k\gamma_{lj} + \partial_j\gamma_{lk} - \partial_l\gamma_{jk}\right)$ [10]


Note that these quantities are symmetric in their last
two indices


$\Gamma^i{}_{jk} = \Gamma^i{}_{kj}$ [11]


and that they can be used, as usual, in explicit
calculation of the action of the spatial covariant
derivative operator on an arbitrary tensor. In
particular, for the special cases of a spatial vector,
$V^i$, and a covector (1-form), $W_i$, one has


$D_iV^j = \partial_iV^j + \Gamma^j{}_{ik}V^k$ [12]
and
$D_iW_j = \partial_iW_j - \Gamma^k{}_{ij}W_k$ [13]


respectively. 
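
The Christoffel formula can be exercised numerically on a simple example. The sketch below assumes the standard expression $\Gamma^i{}_{jk} = \frac{1}{2}\gamma^{il}(\partial_k\gamma_{lj} + \partial_j\gamma_{lk} - \partial_l\gamma_{jk})$, restricts attention to diagonal metrics so that the inverse is trivial, and approximates derivatives by central differences; the function names and the flat metric in spherical coordinates used as a test case are choices made for illustration.

```python
import math


def christoffel_diagonal(metric_diag, x, h=1e-5):
    """Christoffel symbols gamma[i][j][k] of a diagonal metric.

    metric_diag(x) returns the diagonal entries of the metric at the
    point x; derivatives are taken by central differences of step h.
    """
    n = len(x)
    g = metric_diag(x)

    def d_gamma(l, a, b):
        # partial derivative of gamma_ab with respect to x^l
        if a != b:
            return 0.0
        xp, xm = list(x), list(x)
        xp[l] += h
        xm[l] -= h
        return (metric_diag(xp)[a] - metric_diag(xm)[a]) / (2.0 * h)

    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # the inverse metric is diagonal, so only l = i contributes
                gamma[i][j][k] = 0.5 / g[i] * (
                    d_gamma(k, i, j) + d_gamma(j, i, k) - d_gamma(i, j, k))
    return gamma


def flat_spherical(x):
    """Flat 3-metric in spherical coordinates (r, theta, phi)."""
    r, theta, _phi = x
    return [1.0, r * r, (r * math.sin(theta)) ** 2]


gam = christoffel_diagonal(flat_spherical, [2.0, 1.0, 0.3])
# gam[0][1][1] approximates -r and gam[1][0][1] approximates 1/r
```

A general (non-diagonal) 3-metric would require an explicit matrix inverse for $\gamma^{il}$, which is omitted here for brevity.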

Given the Christoffel symbols, the components of
the spatial Riemann tensor, denoted here $R_{ijk}{}^l$, are
computed using


$R_{ijk}{}^l = \partial_j\Gamma^l{}_{ik} - \partial_i\Gamma^l{}_{jk} + \Gamma^m{}_{ik}\Gamma^l{}_{mj} - \Gamma^m{}_{jk}\Gamma^l{}_{mi}$ [14]


Finally, the Ricci tensor, $R_{ij}$, and Ricci scalar, R, are
defined in the usual fashion


$R_{ij} = R_{ikj}{}^k$ [15]
$R = R^i{}_i$ [16]


The reader should again note that all of the 
tensors just defined “live” on each and every single 
spacelike hypersurface, $\Sigma(t)$, and are thus known as
hypersurface-intrinsic quantities. In particular, the
spatial Riemann tensor, $R_{ijk}{}^l$, which encodes all
intrinsic geometric information about $\Sigma(t)$, in no
way depends on how the slice is embedded in the
spacetime $\mathcal{M}$.

The next step in the 3 + 1 approach involves
rewriting the fundamental spacetime line element for
the squared proper distance, $ds^2$, between two
spacetime events, P and Q, having coordinates $x^\mu$
and $x^\mu + dx^\mu$, respectively,


$ds^2 = g_{\mu\nu}\,dx^\mu dx^\nu$ [17]





Figure 1 Spacetime displacement in the 3+1 approach, 
following Misner, Thorne, and Wheeler (1973). Solid lines represent 
surfaces of constant time, t; that is, each solid line represents a 
single spacelike hypersurface, $\Sigma(t)$. Dotted lines denote trajectories
of constant spatial coordinate, that is, trajectories with $x^k$ = const.
The lapse function, $\alpha(t, x^k)$, encodes the (local) ratio between
elapsed coordinate time, dt, and elapsed proper time, $d\tau = \alpha\,dt$, for
an observer moving normal to the slices (i.e., for an observer with a
4-velocity, $u^\mu$, identical to the hypersurface normal, $n^\mu$). Similarly,
the shift vector, $\beta^i(t, x^k)$, describes the shift, $\beta^i(t, x^k)\,dt$, in
trajectories of constant spatial coordinate (the dotted lines in the
figure) relative to motion perpendicular to the slices. The 3 + 1
form of the line element [18] then follows immediately from an 
application of the spacetime version of the Pythagorean theorem. 


As Figure 1 illustrates, a quick route to the 3+ 1 
decomposition of the above expression, and thus of 
the tensor $g_{\mu\nu}$ itself, is based on an application of
the “four-dimensional Pythagorean theorem.” In
setting up the calculation, one naturally identifies
four functions, the scalar lapse, $\alpha(t, x^k)$, and the
vector shift, $\beta^i(t, x^k)$, that encode the full coordinate
(gauge) freedom of the theory. That is,
complete specification of the lapse and shift is 
equivalent to completely fixing the spacetime 
coordinate system. 

In light of the above discussion, and again 
referring to Figure 1, one readily deduces the 3 + 1 
decomposition of the spacetime line element: 


$ds^2 = -\alpha^2\,dt^2 + \gamma_{ij}\,(dx^i + \beta^i dt)(dx^j + \beta^j dt)$ [18]


A rearranged form of this last expression is also 
often seen in the literature: 


$ds^2 = (-\alpha^2 + \beta_k\beta^k)\,dt^2 + 2\beta_k\,dx^k dt + \gamma_{ij}\,dx^i dx^j$ [19]


The following useful identifications of the “time- 
time,” “time-space,” and “space-space” pieces of 
the spacetime metric, $g_{\mu\nu}$, follow immediately from
[19]:


$g_{00} = -\alpha^2 + \beta_k\beta^k$ [20]
$g_{0i} = g_{i0} = \beta_i = \gamma_{ik}\beta^k$ [21]
$g_{ij} = \gamma_{ij}$ [22]


This last relation is an example of a useful general 
result; the purely spatial components, $Q_{ij\ldots}$, of a
completely covariant, but otherwise arbitrary, spacetime
tensor, $Q_{\alpha\beta\ldots}$, constitute the components of a
completely covariant spatial tensor.

A straightforward calculation, which provides a
good exercise in the use of the 3 + 1 calculus,
yields the following equally useful identifications for
various pieces of the inverse spacetime metric, $g^{\mu\nu}$:


$g^{00} = -\alpha^{-2}$ [23]
$g^{0j} = g^{j0} = \alpha^{-2}\beta^j$ [24]
$g^{ij} = \gamma^{ij} - \alpha^{-2}\beta^i\beta^j$ [25]

Since the Einstein field equations are equations 
with, loosely speaking, geometry on one side and 
matter on the other, tensors built from matter fields 
must also be decomposed. In particular, it is 
conventional to define tensors, $\rho$, $j_i$, and $S_{ij}$, that
result from various projections of the spacetime
stress energy tensor, $T_{\mu\nu}$, onto the hypersurface:


$\rho \equiv n_\mu n_\nu T^{\mu\nu}$ [26]
$j_i \equiv -n_\mu T^\mu{}_i$ [27]
$S_{ij} \equiv T_{ij}$ [28]


For observers with 4-velocities $u^\mu$ equal to $n^\mu$, and
only for those observers with $u^\mu = n^\mu$, the above
quantities have the interpretation of the locally and
instantaneously measured energy density, momentum
density, and spatial stresses, respectively. As
with the geometric quantities, all of the matter
variables, $\rho$, $j_i$, and $S_{ij}$ defined in [26]-[28] are
spatial tensors and thus have their indices (if any)
raised and lowered with the 3-metric. Note that the
identification $S_{ij} = T_{ij}$ is another illustration of
the general result mentioned in the context of the
previous identification of $\gamma_{ij}$ and $g_{ij}$.

Finally, observing that time parameters are natu- 
rally defined in terms of level surfaces (equipotential 
surfaces), it should be no surprise that the covariant 
components, $n_\mu$, of the hypersurface normal field,


$n_\mu = (-\alpha, 0, 0, 0)$ [29]


are simpler than the components, $n^\mu$, of the normal
itself,

$n^\mu = (\alpha^{-1}, -\alpha^{-1}\beta^i)$ [30]


and, in fact, eqn [29] can also be deduced from a
quick study of Figure 1. 

In the 3 + 1 approach, in addition to the 3-metric, 
$\gamma_{ij}(t, x^k)$, and coordinate functions, $\alpha(t, x^k)$ and
$\beta^i(t, x^k)$, it is convenient to introduce an additional
rank-2 symmetric spatial tensor, $K_{ij}(t, x^k)$, known as
the extrinsic curvature (or second fundamental
form). This additional tensor is analogous to a
time derivative of $\gamma_{ij}(t, x^k)$, or, from a Hamiltonian
perspective, to a variable that is dynamically
conjugate to $\gamma_{ij}(t, x^k)$.

As the name suggests, the extrinsic curvature
describes the manner in which the slice $\Sigma(t)$ is
embedded in the manifold (to be contrasted with
$R_{ijk}{}^l$ defined by [14] which is, as mentioned
previously, completely insensitive to the manner in
which the hypersurface is embedded in $\mathcal{M}$).

Geometrically, $K_{ij}$ is computed by calculating the 
spacetime gradient of the normal covector field, $n_\mu$, 
and projecting the result on to the hypersurface, 


$K_{ij} = -\nabla_i n_j$ [31] 


where it must be stressed that $\nabla_\mu$ is the spacetime 
covariant derivative operator compatible with the 
4-metric, $g_{\alpha\beta}$; that is, $\nabla_\mu g_{\alpha\beta} = 0$. A straightforward 
tensor calculus calculation then yields the following, 
which can be viewed as a definition of the $K_{ij}$: 

$K_{ij} = \frac{1}{2\alpha}\left(-\partial_t \gamma_{ij} + D_i \beta_j + D_j \beta_i\right)$ [32] 

Here, $D_i$ is the spatial covariant derivative, compatible 
with $\gamma_{ij}$ ($D_i \gamma_{jk} = 0$), that was defined previously. 
Observe that this equation can be easily solved for 
$\partial_t \gamma_{ij}$ (this will be done below), and thus, in the 3 + 1 
approach it is [32] that is the origin of the evolution 
equations for the 3-metric components, $\gamma_{ij}$. 
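
For orientation, solving [32] for the time derivative of the 3-metric is a one-line rearrangement. The second line below uses the standard identity that, for a metric-compatible $D_i$, the symmetrized derivative of the shift equals the Lie derivative of $\gamma_{ij}$ along $\beta^i$; that identity is quoted here as an assumption rather than derived.

```latex
\begin{align}
\partial_t \gamma_{ij} &= -2\alpha K_{ij} + D_i \beta_j + D_j \beta_i \\
D_i \beta_j + D_j \beta_i &= \beta^k \partial_k \gamma_{ij}
  + \gamma_{ik}\, \partial_j \beta^k + \gamma_{kj}\, \partial_i \beta^k
\end{align}
```

Writing $K_{ij} = \gamma_{ik} K^k{}_j$ in the first line and substituting the second gives the evolution equation for $\gamma_{ij}$ in terms of partial derivatives only.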


Einstein’s Equations in 3 + 1 Form 
The Constraint Equations 


As is well known, as a result of the coordinate (gauge) 
invariance of the theory, general relativity is overdetermined 
in a sense completely analogous to the situation 
in electrodynamics with the Maxwell equations. One 
of the ways that this situation is manifested is via the 
existence of the constraint equations of general 
relativity. Briefly, starting from the naive view that 
the ten metric functions, $g_{\mu\nu}(t, x^i)$, that completely 
determine the spacetime geometry are all dynamical 
(that is, that they satisfy second-order-in-time equations 
of motion), one finds that the Einstein equations do not 
provide dynamical equations of motion for the lapse, 
$\alpha$, or the shift, $\beta^i$. Rather, four of the field equations [1] 
are equations of constraint for the “true” dynamical 
variables of the theory, $\{\gamma_{ij}, \partial_t \gamma_{ij}\}$, or, equivalently, 
$\{\gamma_{ij}, K^i{}_j\}$. Note that in the following, the mixed 
form, $K^i{}_j$, is at times used, again by convention, as 
the principal representation of the extrinsic curvature 
tensor (instead of $K_{ij}$ as previously, or $K^{ij}$). 

Thus, four of the components of [1] can be 
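written in the form 

$C^\mu(\gamma_{ij}, K^i{}_j, \partial_k \gamma_{ij}, \partial_l \partial_k \gamma_{ij}, \partial_k K^i{}_j) = T^\mu$ [33] 

where $T^\mu$ depends only on the matter content in the 
spacetime. Note that in addition to having no 
dependence on $\partial_t K^i{}_j$, the constraints are also 
independent of $\alpha$ and $\beta^i$. 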

If the Einstein equations [1] are to hold throughout 
the spacetime, then the constraints [33] must hold on 
each and every spacelike hypersurface, $\Sigma(t)$, including, 
crucially, the initial hypersurface, $\Sigma(0)$. From the point 
of view of Cauchy evolution, this means that the 12 
functions, $\{\gamma_{ij}(0, x^k), K^i{}_j(0, x^k)\}$, constituting the 
gravitational part of the initial data, are not completely 
freely specifiable, but must satisfy the four constraints 

$C^\mu(\gamma_{ij}(0, x^k), K^i{}_j(0, x^k), \ldots) = T^\mu(0, x^k)$ [34] 


However, provided that initial data satisfying the 
constraints are chosen, then, as consistency of the 
theory demands, the dynamical equations of 
motion for the $\{\gamma_{ij}, K^i{}_j\}$ (eqns [37] and [38] below) 
guarantee that the constraints will be satisfied on all 
future (or past) hypersurfaces, $\Sigma(t)$. In this internal 
self-consistency, the geometrical Bianchi identities, 
$\nabla_\mu G^{\mu\nu} = 0$, and the local conservation of stress 
energy, $\nabla_\mu T^{\mu\nu} = 0$, play crucial roles. 

In the 3 + 1 approach, as one would expect, the 
constraint equations further naturally subdivide into 
a scalar equation 

$R - K^i{}_j K^j{}_i + K^2 = 16\pi\rho$ [35] 

and a (spatial) vector equation 

$D_j K^{ij} - D^i K = 8\pi j^i$ [36] 

where the energy and momentum densities, $\rho$ and $j^i = 
\gamma^{ij} j_j$, are given by [26]-[28]. Equations [35] and [36] 
are often known as the Hamiltonian and momentum 
constraints, respectively, not least since the behavior of 
their solutions as $r \equiv \sqrt{\sum_i x^i x^i} \to \infty$ encodes the 
conserved mass and linear momentum (four numbers) 
that can be defined in asymptotically flat spacetimes. 

In a general 3 + 1 coordinate system, and with an 
appropriate choice of variables, the constraints can 
be written as a set of quasilinear elliptic equations 
for four of the $\{\gamma_{ij}, K^i{}_j\}$ (or, more properly, for 
certain algebraic combinations of the $\{\gamma_{ij}, K^i{}_j\}$). 
Thus, especially for 2D and 3D calculations, the 
setting of initial data for the Cauchy problem in 
general relativity is itself a highly nontrivial mathe- 
matical and computational exercise. Readers 
wishing more details on this subject are directed to 
the comprehensive review by Cook (2000). 


The Evolution Equations 


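As discussed above, in the 3 + 1 form of the Einstein 
equations [1], the spatial metric, $\gamma_{ij}$, and the 
extrinsic curvature, $K^i{}_j$, are viewed as the dynamical 
variables for the gravitational field. The remainder 
of the 3 + 1 equations are thus two sets of six 
first-order-in-time evolution equations; one set for $\gamma_{ij}$, 

$\partial_t \gamma_{ij} = -2\alpha \gamma_{ik} K^k{}_j + \beta^k \partial_k \gamma_{ij} + \gamma_{ik} \partial_j \beta^k + \gamma_{kj} \partial_i \beta^k$ [37] 

and the other set for $K^i{}_j$, 

$\partial_t K^i{}_j = \beta^k \partial_k K^i{}_j - \partial_k \beta^i K^k{}_j + \partial_j \beta^k K^i{}_k - D^i D_j \alpha + \alpha\left(R^i{}_j + K K^i{}_j + 8\pi\left(\tfrac{1}{2}\delta^i{}_j (S - \rho) - S^i{}_j\right)\right)$ [38] 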


As also noted previously, the evolution equations 
[37] for the spatial metric components, $\gamma_{ij}$, follow 
from the definition of the extrinsic curvature [31]. 
The derivation of the equations for the extrinsic 
curvature, on the other hand, requires lengthy, but 
well-documented, manipulations of the spatial 
components of the field equations [1]. 


The (Naive) Cauchy Problem 


A naive statement of the Cauchy problem for 3 + 1 
numerical relativity is thus as follows: fix a specified 
number, $N$, of matter fields $\phi^A(t, x^k)$, $A = 
1, 2, \ldots, N$, all minimally coupled to the gravitational 
field, with a total stress tensor, $T_{\mu\nu}$, given by 

$T_{\mu\nu} = \sum_{A=1}^{N} T^A_{\mu\nu}$ [39] 

where $T^A_{\mu\nu}$ is the stress tensor corresponding to the 
matter field $\phi^A$. Choose a topology for $\Sigma(0)$ (e.g., $\mathbb{R}^3$ 
with asymptotically flat boundary conditions; $T^3$, 
with no boundaries, etc.). This also fixes the 
topology of the spacetime manifold to be $\mathbb{R}$ × the topology of $\Sigma(0)$. 

Next, freely specify eight of the 12 $\{\gamma_{ij}(0, x^k), 
K^i{}_j(0, x^k)\}$, as well as initial values, $\phi^A(0, x^k)$, for the 
matter fields. Then determine the remaining four 
dynamical gravitational fields from the constraints 
[35] and [36]. This completes the initial data 
specification. 

One must now choose a prescription for the 
kinematical (coordinate) functions, $\alpha$ and $\beta^i$, so that, 
either explicitly or implicitly, they are completely fixed; 
for the case of implicit specification, this may well 
mean that the coordinate functions themselves will 
satisfy PDEs, which, furthermore, can be of essentially 
any type in practice (i.e., elliptic, hyperbolic, 
parabolic, ...). Finally, with consistent initial data, 
$\{\gamma_{ij}(0, x^k), K^i{}_j(0, x^k); \phi^A(0, x^k)\}$, in hand, and with a 
prescription for the coordinate functions, the evolution 
equations [37] and [38] can be used to advance the 
dynamical variables forward or backward in time. 

The above description is naive since, apart from a 
consistent mathematical specification, the most crucial 
issue in the solution of a time-dependent PDE as a 
Cauchy problem is that the problem be “well posed.” 
Roughly speaking, this means that solutions do not 
grow without bound (“blow-up”) without physical 
cause, and that small, smooth changes to initial data 
yield correspondingly small, smooth changes to the 
evolved data. In short, the Cauchy problem must be 
stable, and whether or not a particular subset of 
the equations displayed in this section yields a well- 
posed problem is a complicated and delicate issue, 
especially in the generic 3D case. The reader is thus 
again cautioned against blind application of any of the 
equations displayed in this article. 


Boundary Conditions 


In principle, because all spacelike hypersurfaces, $\Sigma(t)$, 
in a pure Cauchy evolution are edgeless, and provided 
that the initial data $\{\gamma_{ij}(0, x^k), K^i{}_j(0, x^k); \phi^A(0, x^k)\}$ are 
consistent with asymptotic flatness, or whatever other 
condition is appropriate given the topology of the 
$\Sigma(t)$, there are essentially no boundary conditions to 
be imposed on the dynamical variables, $\{\gamma_{ij}(t, x^k), 
K^i{}_j(t, x^k)\}$, during Cauchy evolution. Note that 
asymptotic flatness generally requires that 

$\lim_{r \to \infty} \gamma_{ij} = f_{ij} + O\left(\frac{1}{r}\right)$ [40] 

and 

$\lim_{r \to \infty} K^i{}_j = O\left(\frac{1}{r^2}\right)$ [41] 

where $r$ is defined by 

$r \equiv \sqrt{\sum_i x^i x^i}$ [42] 

as previously, and $f_{ij}$ is the flat 3-metric. Similarly, 
should the lapse, $\alpha$, and shift, $\beta^i$, be constrained by 
elliptic PDEs, as is frequently the case in practice, 
then the only natural place to set boundary conditions 
is at spatial infinity, and then, provided that 
the frame at spatial infinity is inertial, with 
coordinate time $t$ measuring proper time, one should 
have 

$\lim_{r \to \infty} \alpha = 1 + O\left(\frac{1}{r}\right)$ [43] 

and 

$\lim_{r \to \infty} \beta^i = O\left(\frac{1}{r}\right)$ [44] 


It is critical to note at this point, however, that in 
the vast bulk of past and current work in numerical 
relativity, including most of the ongoing work in 
3D, the Einstein equations [1] have been solved, not 
as a pure Cauchy problem, but as a mixed 
initial-value/boundary-value problem (IBVP). That is, in 
the discretization process in which the continuum 
equations [1] are replaced with algebraic equations, 
the continuum domain [6]-[7] is typically replaced 
with a truncated spatial domain 

$|x^i| \le X^i_{\max}$ [45] 

where the $X^i_{\max}$ are a priori specified constants 
(parameters of the computational solution) that 
define the extremities of the “computational box.” 
As one might expect, the theory underlying stability 
and well-posedness of IBVPs, especially 
for differential systems as complicated as [1], is 
even more involved than for the pure initial-value 
case, and is another very active area of research in 
both mathematical and numerical relativity 
(see, e.g., Friedrich and Nagy (1999)). 


See also: Critical Phenomena in Gravitational Collapse; 
Einstein Equations: Initial Value Formulation; Fluid 
Mechanics: Numerical Methods; General Relativity: 
Overview; Geometric Analysis and General Relativity; 
Gravitational Waves; Hamiltonian Reduction of Einstein’s 
Equations; Magnetohydrodynamics; Spacetime 
Topology, Causal Structure and Singularities; Symmetric 
Hyperbolic Systems and Shock Waves. 

Introduction 


Consider a dynamical system with coordinates 
$q^i$ ($i = 1, \ldots, n$) and Lagrangian $L(q^i, \dot{q}^i)$ (field theory 
is formally covered by regarding the spatial coordinates 
as a continuous index). When going to the 
Hamiltonian formulation, it is usually assumed that 
the Legendre transformation between the velocities 
$\dot{q}^i$ and the momenta 

$p_i = \frac{\partial L}{\partial \dot{q}^i}$ [1] 

can be inverted to yield the velocities as functions of 
the $q$’s and the $p$’s. This “regular” situation occurs 
for most systems appearing in standard classical 
mechanics and enables one to proceed to the 
Hamiltonian formulation of the theory without 
difficulty. 

In field theory, however, the regular case is the 
exception rather than the rule. This is due to gauge 
invariance and first-order Lagrangians. 


• Gauge invariance. A system possesses gauge symmetries 
if it is invariant under transformations that 
involve arbitrary functions of time (gauge 
transformations). In that case, the solution of the 
equations of motion with given initial data is not 
unique, since it is always possible to perform a 
gauge transformation in the course of the evolution 
without changing the initial data. It is then clear 
that the Legendre transformation cannot be invertible, 
for if it were, one could rewrite the equations 
of motion in the standard canonical form 
$\dot{q}^i = \partial H/\partial p_i$, $\dot{p}_i = -\partial H/\partial q^i$. These canonical 
equations are in normal form and have a unique 
solution for given initial data, which would 
contradict the presence of a gauge symmetry. 

A simple example that illustrates this phenomenon 
is given by the following model for three 
variables $q^1$, $q^2$, and $\lambda$, the Lagrangian of which 
reads 

$L = \frac{1}{2}\left((\dot{q}^1 - \lambda)^2 + (\dot{q}^2 - \lambda)^2\right)$ [2] 

This model is inspired by electromagnetism: the 
variables $q^1$ and $q^2$ play a role somewhat similar 
to that of the spatial components of the vector 
potential, while $\lambda$ corresponds to the temporal 
component. The Lagrangian is invariant under the 
gauge transformations 

$q^1 \to q^1 + \epsilon$, $q^2 \to q^2 + \epsilon$, $\lambda \to \lambda + \dot{\epsilon}$ [3] 

where $\epsilon$ is an arbitrary function of time. The 
conjugate momenta are 

$p_1 = \dot{q}^1 - \lambda$, $p_2 = \dot{q}^2 - \lambda$, $\pi_\lambda = 0$ 

One cannot invert the Legendre transformation 
since one cannot express the velocity $\dot{\lambda}$ in terms of 
the momenta. 

• First-order Lagrangians. Fermionic fields obey 
first-order equations. Their Lagrangian is linear 
in the derivatives, so that the conjugate momenta 
$p_i$ depend on the coordinates $q^i$ only. It is then 
clearly impossible to express the velocities in 
terms of the momenta through the Legendre 
transformation. More generally, any first-order 
Lagrangian with or without gauge symmetry leads 
to a noninvertible Legendre transformation. 

A simple system that exhibits this feature is 
described by the Lagrangian 

$L = z^2 \dot{z}^1 - \frac{1}{2}(z^2)^2$ [4] 

for two bosonic degrees of freedom $(z^1, z^2)$. This 
is in fact the canonical form of the Lagrangian for 
a free particle in one dimension ($z^2$ is the 
momentum conjugate to the position $z^1$): the 
system is already in Hamiltonian form. There is 
no gauge invariance, but because the Lagrangian 
is first order, the Legendre transformation with 
[4] as starting point, 

$p_1 = z^2$, $p_2 = 0$ [5] 

is noninvertible for the velocities (which do not 
even appear in the formulas for the momenta). 
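
The gauge invariance claimed for the first model (eqns [2] and [3]) can be spot-checked numerically. The sketch below evaluates the Lagrangian before and after the transformation at a few sample times; the trajectory, the gauge function and all function names are hypothetical choices made only for this illustration.

```python
import math


def lagrangian(q1_dot, q2_dot, lam):
    """L = (1/2)((q1' - lambda)^2 + (q2' - lambda)^2), the model of eqn [2]."""
    return 0.5 * ((q1_dot - lam) ** 2 + (q2_dot - lam) ** 2)


def gauge_transform(q1_dot, q2_dot, lam, eps_dot):
    """Velocities and lambda after q^i -> q^i + eps, lambda -> lambda + eps'.

    Only the derivative of eps enters, because L depends on q^1 and q^2
    through their velocities alone.
    """
    return q1_dot + eps_dot, q2_dot + eps_dot, lam + eps_dot


# Hypothetical trajectory and gauge function, chosen only for illustration.
def q1_dot(t):
    return math.cos(t)


def q2_dot(t):
    return 2.0 * t


def lam(t):
    return 0.3 * t ** 2


def eps_dot(t):
    return math.exp(-t) - 1.5


times = [0.0, 0.4, 1.1, 2.5]
before = [lagrangian(q1_dot(t), q2_dot(t), lam(t)) for t in times]
after = [lagrangian(*gauge_transform(q1_dot(t), q2_dot(t), lam(t), eps_dot(t)))
         for t in times]
max_change = max(abs(a - b) for a, b in zip(before, after))
print(max_change)
```

The change is zero up to floating-point rounding, because the shift by $\dot{\epsilon}$ cancels in each difference $\dot{q}^i - \lambda$. The same cancellation shows that the momenta $p_1$ and $p_2$ are gauge invariant.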


Dirac showed how to develop the Hamiltonian 
formalism in the case when the Legendre transfor- 
mation is not invertible. One can still reformulate 
the equations in phase space and write them in terms 
of brackets with the Hamiltonian, but a new major 
feature emerges, namely the canonical variables are 
no longer free. Rather, the permissible phase-space 
points are constrained to lie on the so-called 
“constraint surface.” For this reason, systems for 
which the Legendre transformation is not invertible 
are also called “constrained Hamiltonian systems.” 
We shall adopt this terminology here. 

The purpose of this article is to explain the main 
ideas underlying the Dirac method. To simplify the 
discussions and to focus on the features peculiar to 
the Dirac construction, we shall assume as a rule 
that all necessary smoothness conditions are fulfilled 
by the functions, surfaces, etc., appearing in the 
formalism. How to develop the analysis when some 
of the smoothness conditions are not fulfilled is of 
definite interest but goes beyond the scope of this 
review. We shall also assume, for definiteness, that 
all the variables are bosonic in order to avoid 
straightforward but somewhat cumbersome sign 
factors in the formulas. 


General Theory 
Dirac Algorithm 


Primary constraints. When the Legendre transformation 
[1] cannot be inverted, the momenta $p_i$ do 
not span an $n$-dimensional space but are constrained 
by relations 

$\phi_m(q, p) = 0$, $m = 1, \ldots, M$ [6] 

which follow from their definition. These equations 
reduce to identities when the momenta are replaced 
by their expression [1] in terms of the coordinates 
and the velocities. They are called primary 
constraints. We shall assume that the matrix 

$\frac{\partial(\phi_m)}{\partial(p_i, q^i)}$ 

is everywhere of constant (maximum) rank $M$ on the 
phase-space surface defined by eqns [6], which is 
assumed to be smooth. This surface is of dimension 
$2n - M$. 


Canonical Hamiltonian. The next step in the Dirac 
procedure is to define the canonical Hamiltonian $H$ 
through 

$H = \dot{q}^i p_i - L$ [7] 

As shown by Dirac, $H$ can be re-expressed as a 
function $H(q, p)$ of the momenta and the coordinates, 
even when the Legendre transformation is not 
invertible: the canonical Hamiltonian $H$ depends on 
the velocities only through the $p_i$’s. Furthermore, the 
original equations of motion in Lagrangian form are 
equivalent to the Hamiltonian equations 

$\dot{q}^i = \frac{\partial H}{\partial p_i} + u^m \frac{\partial \phi_m}{\partial p_i}$ [8] 

$\dot{p}_i = -\frac{\partial H}{\partial q^i} - u^m \frac{\partial \phi_m}{\partial q^i}$ [9] 

$\phi_m(q, p) = 0$ [10] 

where the $u^m$’s are parameters, some of which will 
be determined through the consistency algorithm to 
be discussed shortly. (In [7]-[9] and everywhere 
below, there is a summation over the repeated 
indices.) 


Secondary constraints. The equations of motion [8] 
and [9] can be rewritten as 

$\dot{F} = [F, H] + u^m [F, \phi_m]$ [11] 

where $F = F(q, p)$ is any function of the canonical 
variables. Here, the Poisson bracket is defined as 
usual by 

$[F, G] = \frac{\partial F}{\partial q^i}\frac{\partial G}{\partial p_i} - \frac{\partial F}{\partial p_i}\frac{\partial G}{\partial q^i}$ [12] 

If one takes for $F$ one of the primary constraints 
$\phi_m$, one should get zero, $\dot{\phi}_m = 0$. This yields the 
consistency conditions 

$[\phi_m, H] + u^{m'} [\phi_m, \phi_{m'}] = 0$ [13] 

These conditions can imply further restrictions on the 
canonical variables and/or impose conditions on the 
variables $u^m$. Any new relation $X(q, p) = 0$ on the 
canonical variables leads, in turn, to a further 
consistency condition $\dot{X} = [X, H] + u^m [X, \phi_m] = 0$, which 
can bring in either further restrictions on the constraint 
surface or fix more variables $u^m$. Constraints that 
follow from the consistency algorithm are called 
“secondary constraints.” Finally, one is left with a 
certain number of secondary constraints, which are 
denoted by $\phi_k = 0$, $k = M + 1, \ldots, M + K$. We assume 
again that all the constraints (primary and secondary) 
define a smooth surface, called the “constraint surface,” 
and fulfill the condition that $\partial(\phi_j)/\partial(q^i, p_i)$ is of 
maximum rank $J = M + K$ on the constraint surface. 
(We also assume for simplicity that there is no 
branching in the consistency algorithm.) 


Restrictions on the $u$’s. Having a complete set of 
constraints 

$\phi_j = 0$, $j = 1, \ldots, M + K = J$ [14] 

we can now investigate more precisely the restrictions 
on the variables $u^m$. These read 

$[\phi_j, H] + u^m [\phi_j, \phi_m] \approx 0$, $j = 1, \ldots, J$ [15] 

where the notation $\approx$ means “equal modulo the 
constraints.” In [15], $m$ is summed from 1 to $M$. 
Equations [15] are a set of $J$ linear, inhomogeneous 
equations for the $u^m$’s, with coefficients that are 
functions of the canonical variables $q^i$, $p_i$. The 
general solution of this system is of the form 

$u^m = U^m + u^a V_a^m$ [16] 

where $U^m$ is a particular solution and where the $V_a^m$ 
($a = 1, \ldots, A$) provide a complete set of independent 
solutions of the homogeneous system 

$V_a^m [\phi_j, \phi_m] \approx 0$ [17] 

The coefficients $u^a$ ($a = 1, \ldots, A$) are completely 
arbitrary. 

We thus see the emergence of another new feature 
in the theory, in addition to the appearance of 
constraints. It is that the general solution of the 
equations of motion may contain arbitrary functions 
of time (when $A \neq 0$), in agreement with the 
possible presence of a gauge symmetry. 


First- and Second-Class Constraints 


First- and second-class functions. A function $F(q, p)$ 
is called a first-class function if it generates a 
canonical transformation that maps the constraint 
surface on itself. Thus, $F(q, p)$ is first class if its 
Poisson brackets with all the constraints vanish 
weakly (i.e., are zero on the constraint surface), 

$[F, \phi_j] \approx 0$, $j = 1, \ldots, J$ [18] 

A function is second class otherwise, that is, if there 
is at least one constraint $\phi_j$ such that $[F, \phi_j] \neq 0$ 
(not even weakly). Second-class functions generate 
canonical transformations that do not leave the 
constraint surface invariant. Since canonical 
transformations that map the constraint surface on itself 
form a group, the Poisson bracket of two first-class 
functions is itself a first-class function. 

Because the system is constrained to lie on the 
constraint surface, the only allowed canonical 
transformations are those that are generated by 
first-class functions. The importance of the distinction 
between first-class and second-class functions 
stems from this elementary fact. Note, in particular, 
that the time evolution is generated, as it should be, 
by a first-class generator since the equations of 
motion [11] can be rewritten as 

$\dot{F} \approx [F, H'] + u^a [F, V_a^m \phi_m]$ [19] 

with 

$H' = H + U^m \phi_m$ [20] 

One has both $[H', \phi_j] \approx 0$ and $[V_a^m \phi_m, \phi_j] \approx 0$. 


Splitting of the constraints. One can separate 
the constraints between first-class and second-class 
constraints. This can be achieved by considering the 
matrix $C_{jj'}$ of the Poisson brackets of the constraints, 

$C_{jj'} = [\phi_j, \phi_{j'}]$, $j, j' = 1, \ldots, J$ [21] 

One has the following theorem due to Dirac. 

Theorem 1 If $\det C_{jj'} \approx 0$, there exists at least one 
first-class constraint among the $\phi_j$’s. 

Proof Straightforward: if $\det C_{jj'} \approx 0$, one can find 
a nontrivial solution $X^j$ of $X^j C_{jj'} \approx 0$. The 
corresponding constraint $X^j \phi_j$ is easily verified to be first 
class. 

By redefining the constraints as $\phi_j \to \phi'_j = a_j{}^{j'} \phi_{j'}$ 
with $a_j{}^{j'}(q, p)$ invertible, one can bring the Poisson 
brackets of the constraints to the form 

$[\gamma_a, \gamma_{a'}] \approx 0$, $[\gamma_a, \chi_\alpha] \approx 0$, $[\chi_\alpha, \chi_\beta] \approx C_{\alpha\beta}$ [22] 

with $(\phi'_j) = (\gamma_a, \chi_\alpha)$ and where the matrix $C_{\alpha\beta}$ is 
invertible. (We assume, for simplicity, throughout 
that the rank of the matrix $C_{jj'}$ is constant on the 
constraint surface (“regular case”).) In this 
representation, the constraints are completely split into 
first-class constraints ($\gamma_a$) and second-class 
constraints ($\chi_\alpha$): there is no first-class constraint left 
among the $\chi_\alpha$’s, and the set $\{\gamma_a\}$ exhausts all the 
first-class constraints. Note that now the index 
$a = 1, \ldots, A, A + 1, \ldots, \bar{A}$ runs over all (primary and 
secondary) first-class constraints. 

This separation of the constraints into first-class 
and second-class constraints is quite important 
because, as already seen above, the first-class 
constraints generate admissible canonical transfor- 
mations, while the second-class constraints do not. 

For a bosonic system, the matrix $C_{\alpha\beta}$ is antisymmetric. 
As $C_{\alpha\beta}$ is invertible, this implies that the 
number of second-class constraints is even. In the 
fermionic case, $C_{\alpha\beta}$ is symmetric (in the fermionic 
sector) and, therefore, the number of second-class 
constraints can be even or odd. 


First-class constraints and gauge symmetries. The 
first-class constraints not only map the constraint 
surface on itself, but generate, in fact, transformations 
that do not change the physical state of the 
system, that is, gauge transformations. Indeed, the 
presence of arbitrary functions in the solutions of 
the equations of motion indicates that the $q$’s and 
the $p$’s involve some redundancy and are not all 
physically distinct. Only those phase-space functions 
whose time evolution does not depend on the 
arbitrary functions $u^a$ are observables. 

That the first-class constraints generate gauge 
transformations is rather clear in the case of the 
first-class primary constraints, since these appear 
explicitly in the generator of the time evolution 
multiplied by arbitrary functions. That it also holds 
for the first-class secondary constraints is known as 
the “Dirac conjecture.” This conjecture can be 
proved under reasonable assumptions (see, e.g., 
Henneaux et al. 1990). The reason that the 
secondary first-class constraints also correspond to 
gauge transformations is that they appear in the 
brackets of the Hamiltonian with the primary 
first-class constraints. Thus, different choices of arbitrary 
functions $u^a$ in the dynamical equations of motion 
will lead to phase-space points that differ by a 
canonical transformation whose generator involves 
the secondary first-class constraints as well. 

In any case, as noted below, one must identify the 
phase-space points in the same orbit generated by all 
the first-class constraints (primary and secondary) in 
order to get a reduced space with a symplectic 
structure (“reduced phase space”). For this reason, 
one postulates that the first-class constraints always 
generate gauge transformations, even for systems 
which are counterexamples to the Dirac conjecture 
(i.e., in that case, one defines the gauge 
transformations as being the transformations generated 
by the first-class constraints). 

The extended Hamiltonian $H_E$ is defined to be the 
sum of the first-class Hamiltonian [20] and of all the 
first-class constraints $\gamma_a$, each multiplied by an arbitrary 
Lagrange multiplier, 

$H_E = H' + u^a \gamma_a$ [23] 

(with $a$ summed over all the first-class constraints, 
$a = 1, \ldots, \bar{A}$). It is the generator of 
the time evolution in which the complete gauge 
symmetry is fully displayed. 


Elimination of second-class constraints: Dirac 
brackets. Second-class constraints do not generate 
permissible canonical transformations, since they do 
not map the constraint surface on itself. For this 
reason, it is convenient to eliminate them. This can 
consistently be done by using the Dirac brackets 
instead of the Poisson brackets. By definition, the 
Dirac bracket $[F, G]_D$ of two phase-space functions 
$F$ and $G$ is given by 

$[F, G]_D = [F, G] - [F, \chi_\alpha] C^{\alpha\beta} [\chi_\beta, G]$ [24] 

where $C^{\alpha\beta}$ is the inverse of $C_{\alpha\beta}$, 

$C^{\alpha\beta} C_{\beta\gamma} = \delta^\alpha{}_\gamma$ 

(which exists since the $\chi_\alpha$’s are second class). As 
shown by Dirac, the bracket [24] is indeed a bracket 
(antisymmetry, derivation property, and Jacobi 
identity). Furthermore, it fulfills the crucial property 
that the Dirac bracket of anything with any 
second-class constraint is zero, 

$[F, \chi_\alpha]_D = 0$ ($F$ arbitrary) [25] 

Thus, one can consistently eliminate the second-class 
constraints and replace the Poisson bracket by the 
Dirac bracket. Once this is done, one has fewer 
canonical variables and only first-class constraints 
remain (if any). It also follows from the definition 
that the Dirac bracket of two first-class functions is 
equal to their Poisson bracket. 


Gauge conditions. One can push the reduction 
procedure further and eliminate the first-class 
constraints by means of gauge conditions. Gauge 
conditions $C_a = 0$ are conditions on the phase-space 
variables which do not follow from the Lagrangian 
and which have the property that they cut each gauge 
orbit once and only once. Since the gauge 
transformations are generated by the first-class constraints, 
this requirement is (locally) equivalent to 

$[C_a, \gamma_b]\, \epsilon^b \approx 0 \Rightarrow \epsilon^b \approx 0$ [26] 

That is, the constraints $(\gamma_a, C_b)$ form together a 
second-class system: there is no first-class constraint 
left once the conditions $C_a = 0$ are included. One 
can then eliminate all the constraints and gauge 
conditions and introduce the corresponding Dirac 
bracket. For gauge-invariant functions, this Dirac 
bracket coincides with the original Poisson bracket. 

The reduced phase space is the unconstrained 
space obtained after this reduction, equipped with 
the Dirac bracket. It has dimension $2n - s - 2\bar{A}$, 
where $2n$ is the dimension of the original phase 
space, $s$ is the number of second-class constraints, 
and $\bar{A}$ is the number of first-class constraints. In the 
bosonic case, this number is even (as it should be) 
because $s$ is even. One sees that “first-class 
constraints strike twice” since they need gauge 
conditions. 
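
The counting rule can be illustrated with a few lines of Python. The helper below is a hypothetical convenience function (its name and arguments are not part of the formalism); it simply evaluates the dimension formula quoted above, with the bosonic requirement that the number of second-class constraints be even, and applies it to the two models of eqns [2] and [4].

```python
def reduced_phase_space_dim(n, num_second_class, num_first_class):
    """Return 2n - s - 2 * (number of first-class constraints).

    n is the number of coordinates q^i, so the original phase space
    has dimension 2n. Bosonic variables are assumed throughout.
    """
    if num_second_class % 2 != 0:
        raise ValueError("a bosonic system has an even number of "
                         "second-class constraints")
    dim = 2 * n - num_second_class - 2 * num_first_class
    if dim < 0:
        raise ValueError("more conditions than phase-space dimensions")
    return dim


# Model [2]: coordinates (q^1, q^2, lambda), two first-class constraints.
dim_gauge_model = reduced_phase_space_dim(
    n=3, num_second_class=0, num_first_class=2)

# Model [4]: coordinates (z^1, z^2), two second-class constraints.
dim_first_order_model = reduced_phase_space_dim(
    n=2, num_second_class=2, num_first_class=0)

print(dim_gauge_model, dim_first_order_model)
```

Both models end up with a two-dimensional reduced phase space, although they reach it by different routes: the first loses four dimensions to two first-class constraints and their gauge conditions, the second loses two dimensions to two second-class constraints.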

The observables of the theory are the reduced 
phase-space functions. They form a Poisson algebra, 
the relevant reduced phase-space bracket being the 
Dirac bracket associated with all the constraints and 
gauge conditions. The symplectic structure defined 
in the reduced phase space is nondegenerate because 
one has removed all the first-class constraints. 

The definition of reduced phase space given above 
is useful in practice but has the conceptual 
drawback of relying on gauge conditions. This 
approach does not display clearly its intrinsic 
significance and, furthermore, in the case of the 
so-called Gribov problems (global obstructions to 
cutting each gauge orbit once and only once), may 
yield the incorrect expectation that the reduced 
phase space does not exist. We shall provide a more 
intrinsic definition below, which does not involve 
gauge conditions. 


Examples 

First example (see eqn [2]). There is here one 
primary constraint, namely $\pi_\lambda = 0$. The canonical 
Hamiltonian is $(1/2)((p_1)^2 + (p_2)^2) + \lambda(p_1 + p_2)$. 
The consistency algorithm yields the secondary 
constraint $p_1 + p_2 = 0$ and no condition on the $u$’s. 
The constraints are first class. They generate the 
gauge transformations $q^1 \to q^1 + \epsilon$, $q^2 \to q^2 + \epsilon$, 
and $\lambda \to \lambda + \eta$, which coincide with the Lagrangian 
gauge transformations if one identifies $\eta$ with $\dot{\epsilon}$ 
($\epsilon$ and $\dot{\epsilon}$ are, of course, independent at any given 
time). One can fix the gauge by means of the gauge 
conditions $\lambda = 0$, $q^1 + q^2 = 0$. The reduced phase 
space is two-dimensional and the observables can 
be identified with the functions of the 
gauge-invariant variables $(1/2)(q^1 - q^2)$ and $p_1 - p_2$, 
which are conjugate. Any other gauge condition 
leads to the same reduced phase space. 
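
The consistency algorithm for this first example can be written out explicitly. The sketch below assumes the bracket convention $[q^i, p_j] = \delta^i{}_j$ and writes $u$ for the single multiplier of the primary constraint $\pi_\lambda$ and $H_T$ for the canonical Hamiltonian plus that term; both symbols are notational choices made here.

```latex
\begin{align}
H_T &= \tfrac{1}{2}\left((p_1)^2 + (p_2)^2\right)
       + \lambda\,(p_1 + p_2) + u\,\pi_\lambda \\
\dot{\pi}_\lambda &= [\pi_\lambda, H_T] = -(p_1 + p_2) \approx 0 \\
\frac{d}{dt}(p_1 + p_2) &= [p_1 + p_2, H_T] = 0 \\
[\pi_\lambda,\, p_1 + p_2] &= 0
\end{align}
```

The second line gives the secondary constraint, the third shows that no further constraint and no condition on $u$ arise, and the last confirms that both constraints are first class.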
Second example (see eqn [4]). The primary 
constraints are $p_1 - z^2 = 0$ and $p_2 = 0$ and define a 
two-dimensional plane in the four-dimensional 
phase space $(z^1, z^2, p_1, p_2)$. The consistency 
algorithm forces $u^1 = z^2$ and $u^2 = 0$ and does not bring 
any further constraint. The constraints are second 
class since $[p_2, p_1 - z^2] = 1$. One can eliminate $p_1$ 
and $p_2$ through the constraints. The Dirac brackets 
of the remaining variables vanish, except 
$[z^1, z^2]_D = 1$. The reduced phase space is the space of the 
$z$’s, with $z^2$ conjugate to $z^1$. The Hamiltonian is the 
free-particle Hamiltonian, $H = (1/2)(z^2)^2$. Thus, one 
recovers the original description which was already 
in Hamiltonian form. (The recognition that a system 
is already in first-order form often enables one to 
shortcut some aspects of the Dirac procedure by not 
introducing the unnecessary momenta which would 
in any case be eliminated in the end.) 
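
The Dirac bracket of this second example can be checked by hand or with a short script. The sketch below is restricted to functions that are linear in the phase-space variables, for which Poisson brackets are constants and the whole computation reduces to exact linear algebra. The coefficient-list representation and all function names are choices made for this illustration, and the bracket convention assumed is $[z^i, p_j] = \delta^i{}_j$.

```python
from fractions import Fraction

# Phase-space ordering: (z1, z2, p1, p2). A linear function
# a.z + b.p is stored as its coefficient list [a1, a2, b1, b2].
Z1 = [1, 0, 0, 0]
Z2 = [0, 1, 0, 0]
P1 = [0, 0, 1, 0]
P2 = [0, 0, 0, 1]


def poisson(f, g):
    """[f, g] = df/dz^i dg/dp_i - df/dp_i dg/dz^i for linear f and g."""
    n = len(f) // 2
    return sum(Fraction(f[i] * g[n + i] - f[n + i] * g[i]) for i in range(n))


def inverse_2x2(m):
    """Exact inverse of a 2x2 matrix of Fractions."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("constraints are not second class")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def dirac(f, g, chi):
    """Dirac bracket [f, g]_D built from two second-class constraints chi."""
    c = [[poisson(a, b) for b in chi] for a in chi]
    c_inv = inverse_2x2(c)
    correction = sum(poisson(f, chi[a]) * c_inv[a][b] * poisson(chi[b], g)
                     for a in range(2) for b in range(2))
    return poisson(f, g) - correction


# Second-class constraints of model [4]: p1 - z2 = 0 and p2 = 0.
CHI = [[0, -1, 1, 0], P2]

print(dirac(Z1, Z2, CHI))   # bracket of the surviving variables
print(dirac(Z1, P1, CHI))   # consistent with p1 = z2 on the surface
print(dirac(Z2, P2, CHI))   # p2 is a constraint
```

The script prints 1, 1 and 0: $z^2$ is conjugate to $z^1$ in the Dirac bracket, $p_1$ can be traded for $z^2$, and brackets with the constraint $p_2$ vanish.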


Quantization 


The phase space of physical interest is the reduced 
phase space and the physical algebra is the algebra 
of the observables. The quantization of the theory 
then amounts to quantizing the algebra of the 
observables. This can be achieved along two 
different lines: 


1. Reduce then quantize: In this direct approach, 
one represents as quantum operators only the 
reduced phase-space functions. There is no 
operator associated with non-gauge-invariant 
functions. 

2. Quantize then reduce: In this approach, one 
represents as quantum operators the bigger 
algebra of functions of all the phase-space variables. 
One must then take into account the constraints. 
The second-class constraints are enforced as 
operator equations, which is consistent with the 
correspondence rule that the commutator in the 
quantum theory is $i\hbar$ times the Dirac bracket, 

$AB - BA = i\hbar [A, B]_D$ [27] 

(plus higher-order terms in $\hbar$). The first-class 
constraints are implemented in a more subtle 
way. It would be inconsistent to impose them as 
operator equations since in general $[\gamma_a, F]_D \neq 0$ 
(even in the Dirac bracket). What one does is to 
impose them as conditions on the physical states: 
these are defined as the states annihilated by the 
first-class constraints, 

$\gamma_a |\psi\rangle = 0$ [28] 

For simple systems, it is easy to verify that the two 
procedures are equivalent. There is yet another 
approach, in which one extends the system rather 
than reducing it. This is the Becchi-Rouet-Stora-Tyutin 
(BRST) approach, in which the new variables 
are called ghosts. 


Geometric Description 


We defined above first-class and second-class 
constraints through algebraic means. It turns out 
that these definitions also have a geometrical 
interpretation, which sheds considerable insight 
into their nature. 

The phase-space symplectic 2-form o induces, by 
pullback, a 2-form oy on the constraint surface X. 
While o is of maximal rank, this may not be the case 
for the induced oy, which may be degenerate. In 
fact, the rank of oy fails to be equal to the 
maximum rank 2n — J (where J is the total number 
of constraints) by precisely the number A of first- 
class constraints. 

Indeed, the Hamiltonian vector fields Xj, associated 
with the first-class constraints are tangent to the 
constraint surface X and are null eigenvectors of oy, 


Oy(Xyq,Y) =O VY tangent to X [29] 


as an immediate consequence of the first-class
property. Here, all first-class constraints (primary
and secondary) yield a null eigenvector. The integral
surfaces of the vector fields $X_{\gamma_a}$ are the gauge orbits.
The reduced phase space is nothing else but the
quotient space of the constraint surface by the gauge
orbits. The 2-form induced in the quotient space is
invertible because one has removed all degeneracy
directions (including the ones associated with secondary
first-class constraints). Reaching the reduced
phase space falls under the scope of Hamiltonian
reduction. The observables are the functions on the
reduced phase space.
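
As a minimal worked illustration of the rank statement, consider a toy system that is an assumption made here for simplicity and is not taken from the text: $n = 2$ canonical pairs $(q^1, p_1, q^2, p_2)$ with the single constraint $\gamma = p_1 \approx 0$, which is automatically first class ($J = 1$, $A = 1$). The overall sign of $\sigma$ depends on convention and does not affect the rank count.

```latex
\begin{align*}
\sigma &= dq^1 \wedge dp_1 + dq^2 \wedge dp_2 ,
  \qquad \Sigma = \{\, p_1 = 0 \,\}, \qquad \dim \Sigma = 2n - J = 3, \\
\sigma_\Sigma &= dq^2 \wedge dp_2
  \qquad\Longrightarrow\qquad
  \operatorname{rank} \sigma_\Sigma = 2 = (2n - J) - A , \\
X_\gamma &\propto \frac{\partial}{\partial q^1},
  \qquad \sigma_\Sigma(X_\gamma, Y) = 0 \quad \forall\, Y \text{ tangent to } \Sigma .
\end{align*}
```

In this sketch the gauge orbits are the lines of varying $q^1$, and the quotient of $\Sigma$ by them has coordinates $(q^2, p_2)$ with the nondegenerate 2-form $dq^2 \wedge dp_2$.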

Thus, the reduced phase space is obtained through
a two-step procedure. First, one restricts the functions
to functions on the constraint surface $\Sigma$. One may
view the algebra $C^\infty(\Sigma)$ of smooth functions on $\Sigma$ as
the quotient algebra $C^\infty(P)/\mathcal{N}$ of the algebra $C^\infty(P)$
of smooth phase-space functions by the ideal $\mathcal{N}$ of
phase-space functions that vanish on the constraint
surface $\Sigma$. The second step in the reduction procedure
is to impose the gauge-invariance condition on the
functions in $C^\infty(\Sigma)$, that is, to impose that they are
constant along the gauge orbits $\mathcal{O}$. Assuming all
necessary smoothness and regularity conditions to be
fulfilled (i.e., that the orbits fiber $\Sigma$, which is, for
instance, the case if the gauge orbits are the orbits
of a free and proper group action), one may denote
the algebra of observables as $C^\infty(\Sigma/\mathcal{O})$. This algebra
is a Poisson algebra because the induced 2-form on
the quotient space $\Sigma/\mathcal{O}$ is nondegenerate. The
algebraic description of the observables underlies the
BRST construction.

It is interesting to note that in the covariant
approach to phase space, a similar two-step reduction
procedure occurs. What plays the role of the
constraint surface is the stationary surface in the
space of all histories $q^i(t)$ of the dynamical variables.
The gauge symmetry acts on this space and the
reduced phase space is just the quotient space. One
can establish the equivalence of the two descriptions
(Barnich et al. 1991).


See also: Batalin—Vilkovisky Quantization; BRST 
Quantization; Canonical General Relativity; Operads; 
Perturbative Renormalization Theory and BRST; 
Quantum Dynamics in Loop Quantum Gravity; Quantum 
Field Theory: A Brief Introduction.
Euclidean Quantum Fields


The construction of a relativistic quantum field is
still an open problem for fields in spacetime
dimension $d \ge 4$. The conceptual difficulty that
sometimes led to fear an incompatibility between
nontrivial quantum systems and special relativity
has however been solved in the case of dimension
$d = 2, 3$ although, so far, this has not influenced the
corresponding debate on the foundations of quantum
mechanics, still much alive.

It began in the early 1960s with Wightman’s work
on the axioms and the attempts at understanding the
mathematical aspects of renormalization theory and
with Hepp’s renormalization theory for scalar fields.
The breakthrough idea was, perhaps, Nelson’s
realization that the problem could really be studied
in Euclidean form. A solution in dimensions $d = 2, 3$
was obtained in the 1960s and 1970s through a
remarkable series of papers by Nelson, Glimm,
Jaffe, and Guerra. While the works of Nelson and
Guerra relied on the “Euclidean approach” (see
below) and on $d = 2$, the early works of Glimm and
Jaffe dealt with $d = 3$ making use of the “Minkowskian
approach” (based on second quantization) but
already making use of a “multiscale analysis”
technique. The latter received great impetus and
systematization by the adoption of Wilson’s views
and methods on renormalization (in physics terminology,
renormalization group methods), a point of
view taken here following the Euclidean approach.
The solution dealt initially with scalar fields but it
has subsequently been considerably extended.

The Euclidean approach studies quantum fields 
through the following problems: 


1. existence of the functional integrals defining the 
generating functions (see below) of the probabil- 
ity distribution of the interacting fields in finite 
volume: the “ultraviolet stability problem,” 

2. existence of the infinite-volume limit of the 
generating functions: the “infrared problem,” 
and 

3. check that the infinite-volume generating
functions satisfy the axioms needed to pass
from the Euclidean, probabilistic, formulation
to a Minkowskian formulation guaranteeing
the existence of the Hamiltonian operator,
relativistic covariance, and Ruelle-Haag scattering
theory: the “reconstruction problem.”


The characteristic problem for the construction of
quantum fields is (1) and here attention will be
confined to it with the further restriction to the
paradigmatic case of massive scalar fields. The dimension
$d$ of the spacetime will be $d = 2, 3$ unless
specified otherwise.

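Given a cube $\Lambda$ of side $L$, $\Lambda \subset R^d$, consider the
following functional integral on the space of the fields on
$\Lambda$, that is, on functions $\varphi^{(\le N)}_\xi$ defined for $\xi \in \Lambda$,

$$Z_N(\Lambda, f) = \int \exp\Big(-\int_\Lambda \big(\lambda_N \varphi^{(\le N)4}_\xi + \mu_N \varphi^{(\le N)2}_\xi + \nu_N + f_\xi \varphi^{(\le N)}_\xi\big)\, d\xi\Big)\, P_N(d\varphi^{(\le N)})$$ [1]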


The fields $\varphi^{(\le N)}_\xi$ are called “Euclidean” fields with
ultraviolet cutoff $N > 0$, $f_\xi$ is a smooth function with
compact support bounded by $|f_\xi| \le 1$ (for definiteness),
the constants $\lambda_N > 0, \mu_N, \nu_N$ are called “bare couplings,”
and $P_N$ is a Gaussian probability distribution
defining the free-field distribution with mass $m$ and
ultraviolet cutoff $N$; the probability distribution $P_N$
is determined by its “covariance”
$C^{(\le N)}_{\xi,\eta} \stackrel{def}{=} \int \varphi^{(\le N)}_\xi \varphi^{(\le N)}_\eta\, dP_N$,
which in the physics literature is called a
“propagator,” given by

$$C^{(\le N)}_{\xi,\eta} = \frac{1}{(2\pi)^d} \sum_{n \in Z^d} \int \frac{e^{i p \cdot (\xi - \eta + nL)}}{p^2 + m^2}\, \chi_N(|p|)\, d^dp$$ [2]

The sum over the integers $n \in Z^d$ is introduced so that
the field $\varphi^{(\le N)}_\xi$ is periodic over the box $\Lambda$: this is not
really necessary as in the limit $L \to \infty$ either translation
invariance would be recovered or lack of it properly
understood, but it makes the problem more symmetric
and generates a few technical simplifications; here
$\chi_N(z)$ is a “regularizer” and a standard choice is

$$\chi_N(|p|) = \frac{m^2(\gamma^{2N} - 1)}{p^2 + \gamma^{2N} m^2}$$

with $\gamma > 1$, which is such that

$$\frac{\chi_N(|p|)}{p^2 + m^2} \equiv \frac{1}{p^2 + m^2} - \frac{1}{p^2 + \gamma^{2N} m^2} = \sum_{h=1}^{N} \Big(\frac{1}{p^2 + \gamma^{2(h-1)} m^2} - \frac{1}{p^2 + \gamma^{2h} m^2}\Big)$$ [3]


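here $\gamma > 1$ can be chosen arbitrarily: so $\gamma = 2$. If
$d > 3$, the above regularization will not be sufficient
and a $\chi_N$ decaying faster than $p^{-2}$ would be needed.
A simple estimate yields, if $\varepsilon \in (0,1)$ is fixed and $c$
is suitably chosen,

$$|C^{(\le N)}_{\xi,\eta}| \le c\, \gamma^{(d-2)N} e^{-m|\xi - \eta|}, \qquad |C^{(\le N)}_{\xi,\eta} - C^{(\le N)}_{\xi,\eta'}| \le c\, \gamma^{(d-2)N} (\gamma^N m |\eta - \eta'|)^{\varepsilon}$$ [4]

with $\gamma^{(d-2)N}$ interpreted as $N$ if $d = 2$.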
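
The telescoping identity behind the regularizer in [3] can be checked numerically. The sketch below is only an illustration: the function names (`chi`, `regularized`, `two_term`, `scale_sum`) and the sample values of $p^2$, $N$, $m$, $\gamma$ are assumptions made here, not part of the text.

```python
def chi(p2, N, m=1.0, gamma=2.0):
    """Regularizer chi_N(|p|) = m^2 (gamma^{2N} - 1) / (p^2 + gamma^{2N} m^2)."""
    g = gamma ** (2 * N)
    return m * m * (g - 1.0) / (p2 + g * m * m)


def regularized(p2, N, m=1.0, gamma=2.0):
    """Left-hand side of [3]: chi_N(|p|) / (p^2 + m^2)."""
    return chi(p2, N, m, gamma) / (p2 + m * m)


def two_term(p2, N, m=1.0, gamma=2.0):
    """Middle expression of [3]: 1/(p^2 + m^2) - 1/(p^2 + gamma^{2N} m^2)."""
    return 1.0 / (p2 + m * m) - 1.0 / (p2 + gamma ** (2 * N) * m * m)


def scale_sum(p2, N, m=1.0, gamma=2.0):
    """Right-hand side of [3]: sum over scales h = 1..N of single-scale terms."""
    total = 0.0
    for h in range(1, N + 1):
        low = gamma ** (2 * (h - 1)) * m * m
        high = gamma ** (2 * h) * m * m
        total += 1.0 / (p2 + low) - 1.0 / (p2 + high)
    return total


if __name__ == "__main__":
    for N in (1, 5, 10):
        for p2 in (0.0, 0.5, 3.0, 100.0):
            print(N, p2, regularized(p2, N), two_term(p2, N), scale_sum(p2, N))
```

Each single-scale term in the sum has the same functional form up to a rescaling of $p$, which is what the multiscale decomposition of the field exploits.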
The 
— 1g ZNC, f) 
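defines a “generating function” of a probability
distribution $P_{int}$ over the fields on $\Lambda$ which will be
called the “distribution with $\varphi^4$-interaction” regularized
on $\Lambda$ and at length scale $m^{-1}\gamma^{-N}$: the
integral, in [1],

$$V_N(\varphi^{(\le N)}) \stackrel{def}{=} -\int_\Lambda \big(\lambda_N \varphi^{(\le N)4}_\xi + \mu_N \varphi^{(\le N)2}_\xi + \nu_N + f_\xi \varphi^{(\le N)}_\xi\big)\, d\xi$$ [5]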

will be called the “interaction potential” with
external field $f$. The cutoff $N$ is introduced to
guarantee that the integral [1], $\int e^{V_N}\, dP_N$, is well
defined if $\lambda_N > 0$. The moments of $P_{int}$ are the
functional derivatives of the generating function: they are called
“Schwinger functions.”

The problem (1) can now be made precise: it is to
show the existence of $\lambda_N, \mu_N, \nu_N$ so that the limit

$$\lim_{N \to \infty} \frac{Z_N(\Lambda, f)}{Z_N(\Lambda, 0)}$$

exists for all $f$ and is not Gaussian, that is, it is not
the exponential of a quadratic form in $f$, which
would be the case if $\lambda_N, \mu_N \to 0$ fast enough. The last
requirement is of course essential because the
Gaussian case describes, in the physical interpretation,
free fields and noninteracting particles, that is,
it is trivial. Note that $\nu_N$ does not play a role: its
introduction is useful to be able to study separately
the numerator and the denominator of the fraction

$$\frac{Z_N(\Lambda, f)}{Z_N(\Lambda, 0)}$$


For more details, the reader is referred to Wightman 
and Garding (1965), Streater and Wightman (1964), 
Nelson (1966), Guerra (1972), Osterwalder and 
Schrader (1973), and Simon (1974). 


The Regularized Free Field 


Since the propagator, see [4], decays exponentially
over a scale $m^{-1}$ and is smooth over a scale $m^{-1}\gamma^{-N}$,
the fields $\varphi^{(\le N)}_\xi$ sampled with distribution $P_N$
are rather singular objects. Their properties cannot be
described by a single length scale: they are extremely
large for large $N$, take independent values only beyond
distances of order $m^{-1}$ but, at the same time, they look
smooth only on the much smaller scale $m^{-1}\gamma^{-N}$. Their
essential feature is that, fixed $\varepsilon < 1$, for example,
$\varepsilon = 1/2$, with $P_N$-probability 1 there is $B > 0$ such
that (interpreting $\gamma^{(d-2)N/2}$ as $N$ if $d = 2$)

$$|\varphi^{(\le N)}_\xi| \le B\, \gamma^{N(d-2)/2}, \qquad |\varphi^{(\le N)}_\xi - \varphi^{(\le N)}_\eta| \le B\, \gamma^{N(d-2)/2} (\gamma^N m |\xi - \eta|)^{\varepsilon}$$ [6]

and furthermore the probability of the relations in
[6] will be $N$-independent, that is, $\varphi^{(\le N)}_\xi$ are
bounded and roughly of size $\gamma^{N(d-2)/2}$ as $N \to \infty$
and, on a very small length scale $m^{-1}\gamma^{-N}$, almost
constant.

Substantial control on the field $\varphi^{(\le N)}_\xi$, statistically
sampled with distribution $P_N$, can be obtained
by decomposing it, through [3], into “components
of various scales”: that is, as a sum of statistically
mutually independent fields whose properties
are entirely characterized by a single scale of length.
This means that they have size of order 1 and
are independent and smooth on the same length
scale.

Assuming the side of $\Lambda$ to be an integer multiple
of $m^{-1}$, let $Q_h$ be a pavement of $\Lambda$ into boxes of
side $m^{-1}\gamma^{-h}$, imagined hierarchically arranged so
that the boxes of $Q_h$ are exactly paved by those of
$Q_{h+1}$.

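Define $z^{(h)}$ to be the random field with propagator
$\bar C^{(h)}$ given by

$$\bar C^{(h)}_{\xi,\eta} = \frac{1}{(2\pi)^d} \sum_{n \in Z^d} \int e^{i p \cdot (\xi - \eta + n \gamma^h L)} \Big(\frac{1}{p^2 + \gamma^{-2} m^2} - \frac{1}{p^2 + m^2}\Big)\, d^dp$$

so that $\varphi^{(\le N)}$ and its propagator $C^{(\le N)}$ can be
represented, see [2], [3], as

$$\varphi^{(\le N)}_\xi = \sum_{h=1}^{N} \gamma^{\frac{d-2}{2} h}\, z^{(h)}_{\gamma^h \xi}, \qquad C^{(\le N)}_{\xi,\eta} = \sum_{h=1}^{N} \gamma^{h(d-2)}\, \bar C^{(h)}_{\gamma^h \xi, \gamma^h \eta}$$ [7]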

where the fields $z^{(h)}$ are independently distributed
Gaussian fields. Note that the fields $z^{(h)}$ are also
almost identically distributed because their propagator
is obtained by periodizing over the period $\gamma^h L$
the same function

$$\bar C^{(0)}_{\xi,\eta} = \frac{1}{(2\pi)^d} \int e^{i p \cdot (\xi - \eta)} \Big(\frac{1}{p^2 + \gamma^{-2} m^2} - \frac{1}{p^2 + m^2}\Big)\, d^dp$$

that is, their propagator is

$$\bar C^{(h)}_{\xi,\eta} = \sum_{n \in Z^d} \bar C^{(0)}_{\xi, \eta + n \gamma^h L}$$

The reason why they are not exactly equally
distributed is that the field $z^{(h)}$ is periodic with
period $\gamma^h L$ rather than $L$. But proceeding with care
the sum over $n$ in the above expressions can be
essentially ignored: this is a small price to pay if one
wants translation invariance built into the analysis
from the beginning.
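
The scaling structure of [7] can be made concrete in $d = 3$, where the infinite-volume propagator of mass $M$ has the closed form $e^{-Mr}/(4\pi r)$. The sketch below drops the sum over $n$ (that is, it works in infinite volume, an assumption made here for simplicity) and checks that the rescaled single-scale propagators sum to the cutoff propagator obtained from [3]. The function names are hypothetical.

```python
import math


def yukawa(r, M):
    """d = 3 infinite-volume propagator of mass M: exp(-M r) / (4 pi r)."""
    return math.exp(-M * r) / (4.0 * math.pi * r)


def c_cutoff(r, N, m=1.0, gamma=2.0):
    """C^{(<=N)} at distance r, from the first identity in [3] (no sum over n)."""
    return yukawa(r, m) - yukawa(r, gamma ** N * m)


def c_bar0(r, m=1.0, gamma=2.0):
    """Single-scale function: masses gamma^{-1} m and m, as in the text."""
    return yukawa(r, m / gamma) - yukawa(r, m)


def c_multiscale(r, N, m=1.0, gamma=2.0, d=3):
    """Second relation in [7]: sum over h of gamma^{h(d-2)} * C_bar0(gamma^h r)."""
    return sum(
        gamma ** (h * (d - 2)) * c_bar0(gamma ** h * r, m, gamma)
        for h in range(1, N + 1)
    )


if __name__ == "__main__":
    for N in (1, 4, 8):
        for r in (0.01, 0.3, 1.0, 2.5):
            print(N, r, c_cutoff(r, N), c_multiscale(r, N))
```

Each term of the sum is the same function evaluated at the rescaled distance $\gamma^h r$ and multiplied by $\gamma^{h(d-2)}$, which is the sense in which every component is characterized by a single length scale.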

The representation [7] defines a “multiscale
representation” of the field $\varphi^{(\le N)}$. Smoothness
properties for the field $\varphi^{(\le N)}$ can be read from
those of its “components” $z^{(h)}$. Define, for $\Delta \in Q_0$,

$$\|z\|_\Delta \stackrel{def}{=} \max_{\xi \in \Delta} |z_\xi| + \tau \max_{\xi, \eta \in \Delta} \frac{|z_\xi - z_\eta|}{|\xi - \eta|^{1/4}}$$ [8]

and $\tau$ will be chosen $\tau = 0$ or $\tau = 1$ as needed (in
practice $\tau = 0$ if $d = 2$ and $\tau = 1$ if $d = 3$); $\tau = 1$ will
allow us to discuss some smoothness properties of
the fields which will be necessary (e.g., if $d = 3$).
Then the size $\|z\|_\Delta$ of any field $z^{(h)}$, for all $h \ge 1$, is
estimated by

$$P\Big(\max_{\Delta \subset \gamma^h \Lambda} \|z\|_\Delta \le B\Big) \ge e^{-c\, e^{-c' B^2} \gamma^{dh} |\Lambda|}, \qquad P\big(\|z\|_\Delta \ge B_\Delta,\ \forall \Delta \in D\big) \le \prod_{\Delta \in D} c\, e^{-c' B_\Delta^2}$$ [9]

where $P$ is the Gaussian probability distribution of
$z^{(h)}$, $D$ is any collection of boxes $\Delta \in Q_0$ and $c, c' > 0$
are suitable constants. The estimates [9] imply in particular
[6]. They follow from the Markovian
nature of the Gaussian field $z^{(h)}$, that is, from the
fact that the propagator is the Green’s function of an
elliptic operator (of fourth order, see the first of [3]),
with constant coefficients, which implies also the
inequalities (fixing $\varepsilon \in (0, 1)$)


$$|\bar C^{(h)}_{\xi,\eta}| = \Big|\int z_\xi z_\eta\, P(dz)\Big| \le c\, e^{-c' m |\xi - \eta|}, \qquad |\bar C^{(h)}_{\xi,\eta} - \bar C^{(h)}_{\xi,\eta'}| \le c\, (m |\eta - \eta'|)^{\varepsilon}$$ [10]

where $|\xi - \eta|$ is reinterpreted as the distance
between $\xi, \eta$ measured over the periodic box $\gamma^h \Lambda$
(hence $|\xi - \eta|$ differs from the ordinary distance
only if the latter is of the order of $\gamma^h L$). The
interpretation of [10] is that $z^{(h)}$ are essentially
bounded variables which, on scale $\sim m^{-1}$, are
essentially constant and furthermore beyond length
$\sim m^{-1}$ are essentially independently distributed.
For more details, the reader is referred to Wilson
(1970, 1972) and Gallavotti (1981, 1985).


Constructive Quantum Field Theory 619 


Perturbation Theory 


The naive approach to the problem is to fix $\lambda_N =
\lambda > 0$ and to develop $Z_N(\Lambda, f)$ or, more conveniently
and equivalently, $(1/|\Lambda|) \log Z_N(\Lambda, f)$ in powers of $\lambda$.
If one fixes a priori $\mu_N, \nu_N$ independent of $N$,
however, even a formal power series is not possible:
this is trivially due to the divergence of the
coefficients of the power series, already to second
order, for generic $f$ in the limit $N \to \infty$. Nevertheless
it is possible to determine $\mu_N(\lambda), \nu_N(\lambda)$ as functions
of $N$ and $\lambda$ so that a formal power series exists (to
all orders in $\lambda$): this is the key result of renormalization
theory.

To find the perturbative expansion, the simplest is
to use a graphical representation of the coefficients of
the power expansion in $\lambda, \mu_N, \nu_N, f$ and the Gaussian
integration rules which yield (after a classical
computation) that the coefficient of $\lambda^n \mu_N^p f_{\xi_1} \cdots f_{\xi_r}$ is
obtained by considering the graph elements shown in
Figure 1, where the segments will be called half-lines
and the graph elements will be called, respectively,
“coupling” or “$\varphi^4$-vertex,” “mass vertex,” “vacuum
vertex,” and “external vertex.”

The half-lines of the graph elements are considered
distinct (i.e., imagine a label attached to
distinguish them). Then consider all possible connected
graphs $G$ obtained by first drawing, respectively,
$n, p, r$ graph elements in Figure 1, which are
not vacuum vertices, with their nodes marked by
points in $\Lambda$ named $\xi_1, \ldots, \xi_n$, $\xi_{n+1}, \ldots, \xi_{n+p}$,
$\xi_{n+p+1}, \ldots, \xi_{n+p+r}$, and then
form all possible graphs obtained by attaching pairs
of half-lines emerging from the vertices of the graph
elements. These are the “nontrivial graphs.”
Furthermore, consider also the single “trivial”
graph formed just by the third graph element and
consisting of a single point. All graphs obtained in
this way are particular Feynman graphs.

Given a nontrivial graph $G$ (there are many of
them) we define its value to be the product

$$W(G; \xi_1, \ldots, \xi_{n+p+r}) = (-1)^{n+p+r}\, \frac{\lambda^n \mu_N^p}{n!\, p!\, r!} \prod_{j=n+p+1}^{n+p+r} f_{\xi_j} \prod_{\ell} C^{(\le N)}_{\xi_\ell \eta_\ell}$$ [11]

where the last product runs over all pairs $\ell = (\xi_\ell, \eta_\ell)$
of half-lines of $G$ that are joined and connect two
vertices labeled by points $\xi_\ell, \eta_\ell$: call “line of $G$” any
such pair. If the graph consists of the single vacuum
vertex its value will be $\nu_N$. The series for
$(1/|\Lambda|) \log Z_N(\Lambda, f)$ is then

$$\frac{1}{|\Lambda|} \sum_G \int W(G; \xi_1, \ldots, \xi_{n+p+r}) \prod_{j=1}^{n+p+r} d\xi_j$$ [12]

and the integral will be called the integrated graph
value.

Figure 1 The graph elements representing $\varphi^{(\le N)4}_\xi$, $\varphi^{(\le N)2}_\xi$,
a constant, and $\varphi^{(\le N)}_\xi$.

Suppose first that $\mu_N = \nu_N = 0$. Then if a graph $G$
contains subgraphs like those in Figure 2, the corresponding
respective contribution to the integral in [12]
(considering only the integrals over $\eta$ and suitably
taking care of the combinatorial factors) is a factor
obtained by integrating over $\xi$ the quantities

$$-6\lambda\, C^{(\le N)}_{\alpha\xi} C^{(\le N)}_{\xi\xi} C^{(\le N)}_{\xi\beta}, \qquad \frac{4^2 \cdot 3!}{2!}\, \lambda^2 \int C^{(\le N)}_{\alpha\xi}\, C^{(\le N)3}_{\xi\eta}\, C^{(\le N)}_{\eta\beta}\, d\eta$$ [13]

which if $d = 3$ diverge as $N \to \infty$ as $\gamma^N$ or, respectively,
as $N$; the second factor does not diverge in
dimension $d = 2$ while the first still diverges as $N$. The
divergences arise from the fact that as $\xi - \eta \to 0$ the
propagator behaves as $|\xi - \eta|^{-1}$ if $d = 3$ or as
$-\log|\xi - \eta|$ if $d = 2$, all the way until saturation
occurs at distance $|\xi - \eta| \simeq m^{-1}\gamma^{-N}$: for this reason
the latter divergences are called “ultraviolet
divergences.”
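
The two divergence rates quoted for $d = 3$ can be reproduced in closed form. The sketch below makes simplifying assumptions that are not in the text: it works in infinite volume (no sum over $n$), uses the closed form $C^{(\le N)}(r) = (e^{-mr} - e^{-\gamma^N m r})/(4\pi r)$, and evaluates $\int C^{(\le N)3}\, d^3\eta$ with the Frullani formula $\int_0^\infty \sum_i c_i e^{-k_i r}\, dr/r = -\sum_i c_i \log k_i$ (valid when $\sum_i c_i = 0$). The function names are hypothetical.

```python
import math


def c_at_zero(N, m=1.0, gamma=2.0):
    """C^{(<=N)} at coinciding points in d = 3: (gamma^N - 1) m / (4 pi)."""
    return m * (gamma ** N - 1.0) / (4.0 * math.pi)


def cube_integral(N, m=1.0, gamma=2.0):
    """Integral over R^3 of C^{(<=N)}(r)^3, evaluated via the Frullani formula."""
    a, b = m, gamma ** N * m
    # (e^{-ar} - e^{-br})^3 expanded into four exponentials (coefficients sum to 0)
    terms = [(1.0, 3 * a), (-3.0, 2 * a + b), (3.0, a + 2 * b), (-1.0, 3 * b)]
    radial = -sum(coeff * math.log(rate) for coeff, rate in terms)
    # 4 pi r^2 dr measure times three factors 1/(4 pi r) leaves 1/(4 pi)^2 * dr / r
    return radial / (4.0 * math.pi) ** 2


if __name__ == "__main__":
    for N in (5, 10, 20, 30):
        print(N, c_at_zero(N), cube_integral(N))
```

In this sketch `c_at_zero` grows like $\gamma^N$, while `cube_integral` grows linearly in $N$, with increment $\log\gamma/(4\pi)^2$ per scale, matching the "as $\gamma^N$ or, respectively, as $N$" statement above.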

However, if we set $\mu_N \ne 0$, then for every graph
containing a subgraph like those in Figure 2 there
is another one identical except that the points
$\alpha, \beta$ are connected via a mass vertex, see Figure 1,
with the vertex in $\xi$, by a line $\alpha\xi$ and a line $\xi\beta$;
the new graph value receives a contribution from
the mass vertex inserted in $\xi$ between $\alpha$ and $\beta$
simply given by a factor $-\mu_N$. Therefore if we fix,
for $d = 3$,

$$\mu_N = -6\lambda\, C^{(\le N)}_{\xi\xi} + \delta\mu_N, \qquad \delta\mu_N \stackrel{def}{=} \frac{4^2 \cdot 3!}{2!}\, \lambda^2 \int C^{(\le N)3}_{\xi\eta}\, d\eta$$ [14]

we can simply consider graphs which do not contain
any mass graph element and in which there are no
subgraphs like the first in Figure 2 while the subgraphs
like the second in Figure 2 do not contribute a factor
$\frac{4^2 \cdot 3!}{2!} \lambda^2 \int C^{(\le N)}_{\alpha\xi} C^{(\le N)3}_{\xi\eta} C^{(\le N)}_{\eta\beta}\, d\eta$ but a renormalized factor
$\frac{4^2 \cdot 3!}{2!} \lambda^2 \int C^{(\le N)}_{\alpha\xi} C^{(\le N)3}_{\xi\eta} \big(C^{(\le N)}_{\eta\beta} - C^{(\le N)}_{\xi\beta}\big)\, d\eta$. If $d = 2$, we only
need to define $\mu_N$ as the first term on the right-hand
side (RHS) of [14] and we can leave the subgraphs like
the second in Figure 2 as they are (without any
renormalization).

Figure 2 Divergent subgraphs, if $d = 3$. If $d = 2$ only the first
diverges.

Graphs without external lines are called vacuum
graphs and there are a few such graphs which are
divergent. Namely, if $d = 3$, they are the first three
drawn in Figure 3; furthermore, if $\mu_N$ is set to the
above nonzero value a new vacuum graph, the
fourth in Figure 3, can be formed. Such graphs
contribute to the graph value, respectively, the terms
in the sum





<N)2 7A(<N)2 N 
i fe ay Can, Cee, Wodés—nnCzie? [15] 


and diverge, respectively, as $\gamma^{2N}, \gamma^N, N, \gamma^{2N}$ if $d = 3$
while, if $d = 2$, only the first and the last (see [14])
diverge, like $N^2$.

Therefore, if we fix $\nu_N$ as minus the quantity in
[15] we can disregard graphs like those in Figure 3;
if $d = 2$, $\nu_N$ can be defined to be the sum of the first
and last terms in [15].

The formal series in $\lambda$ and $f$ thus obtained is called
the “renormalized series” for the field $\varphi^4$ in
dimension $d = 2$ or, respectively, $d = 3$. Note that
with the given definitions and choices of $\mu_N, \nu_N$ the
only graphs $G$ that need to be considered to
construct the expansion in $\lambda$ and $f$ are formed by
the first and last graph elements in Figure 1, paying
attention that the graphs in Figure 3 do not
contribute and, if $d = 3$, the graphs with subgraphs
like the second in Figure 2 have to be computed with
the modification described.

In the next section, it will be shown that the
above are the only sources of divergences as $N \to \infty$
and therefore the problem of studying [1] is solved
at the level of formal power series by the subtraction
in [14]. This also shows that giving a meaning to the
series thus obtained is likely to be much easier if
$d = 2$ than if $d = 3$.

The coefficients of order $k$ of the expansion in $\lambda$
of $(1/|\Lambda|) \log Z_N(\Lambda, f)$ can be ordered by the number
$2n$ of vertices representing external fields, and have
the form $\int S^{(k)}_{2n}(\xi_1, \ldots, \xi_{2n}) \prod_{i=1}^{2n} (f_{\xi_i}\, d\xi_i)$: the kernels
$S^{(k)}_{2n}$ are the Schwinger functions of order $2n$, see the
section “Euclidean quantum fields.”


Figure 3 Divergent vacuum graphs.


Remark If $d = 4$, the regularization at cutoff $N$ in
[2] is not sufficient as in the subtraction procedure
smoothness of the first derivatives of the field
$\varphi^{(\le N)}$ is necessary, while the regularization [2] does
not even imply [6], that is, not even Hölder
continuity. A higher regularization is needed (i.e., using a
$\chi_N$ like the square of the $\chi_N$ in [3]). Furthermore,
the subtractions discussed in the case $d = 3$ are not
sufficient to generate a formal power series and
many more subtractions are needed: for instance,
graphs with a subgraph like the one in Figure 4
would give a contribution to the graph value which
is a factor





$$\lambda^2 \varepsilon_N \stackrel{def}{=} \lambda^2\, 6^2 \int C^{(\le N)2}_{\xi\eta}\, d\eta$$

also divergent as $N \to \infty$ proportionally to $N$.
Although this divergence could be canceled by
changing $\lambda$ into $\lambda_N = \lambda + \lambda^2 \varepsilon_N$ the previously discussed
cancellations would be affected and a change
in the value of $\mu_N$ would become necessary;
furthermore, the subtraction in [14] will not be
sufficient to make finite the graphs, not even to
second order in $\lambda$, unless a new term
$\alpha_N \int (\partial_\xi \varphi^{(\le N)}_\xi)^2\, d\xi$ with $\alpha_N = (1/2)\lambda^2 \int C^{(\le N)3}_{\xi\eta}
(\xi - \eta)^2\, d\eta$ is added in the exponential in [1].

But all this will not be enough and still new
divergences, proportional to $\lambda^3$, will appear.

And so on indefinitely, the consequence being that
it will be necessary to define $\lambda_N, \mu_N, \alpha_N, \nu_N$ as
formal power series in $\lambda$ (with coefficients diverging
as $N \to \infty$) in order to obtain a formal power series
in $\lambda$ for [1] in which all coefficients have a finite
limit as $N \to \infty$. Thus, the interpretation of the
formal renormalized series in the case $d = 4$ is
substantially different and naturally harder than
the cases $d = 2, 3$. Beyond formal perturbation
expansions, the case $d = 4$ is still an open problem:
the most widespread conjecture is that the series
cannot be given a meaning other than setting to 0 all
coefficients of $\lambda^j$, $j > 0$. In other words, the conjecture
claims, there should be no nontrivial solution
to the ultraviolet problem for scalar $\varphi^4$ fields in
$d = 4$. But this is far from being proved, even at a
heuristic level. The situation is simpler if $d \ge 5$: in
such cases, it is impossible to find formal power
series in $\lambda$ for $(1/|\Lambda|) \log Z_N(\Lambda, f)$, even allowing
$\lambda_N, \mu_N, \alpha_N, \nu_N$ to be formal power series in $\lambda$ with
divergent coefficients.


Figure 4 The simplest new divergent subgraph in $d = 4$.

The distinctions between the cases $d = 2, 3, 4, > 4$
explain the terminology given to the $\varphi^4$-scalar field
theories calling them super-renormalizable if
$d = 2, 3$, renormalizable if $d = 4$ and nonrenormalizable
if $d > 4$. Since the (divergent) coefficients in the
formal power series defining $\lambda_N, \mu_N, \alpha_N, \nu_N$ are
called counter-terms, the $\varphi^4$-scalar fields require
finitely many counter-terms (see [14]) in the super-renormalizable
cases and infinitely many in the
renormalizable case. The nonrenormalizable cases
($d > 4$) cannot be treated in a way analogous to the
renormalizable ones.

For more details, the reader is referred
to Gallavotti (1985), Aizenman (1982), and
Fröhlich (1982).


Finiteness of the Renormalized Series, 
d=2,3: “Power Counting” 


Checking that the renormalized series is well defined
to all orders is a simple dimensional estimate
characteristic of many multiscale arguments that in
physics have become familiar under the name of
“renormalization group arguments.”

Consider a graph $G$ with $n + r$ vertices built over $n$
graph elements with vertices $\xi_1, \ldots, \xi_n$ each with four
half-lines and $r$ graph elements with vertices
$\xi_{n+1}, \ldots, \xi_{n+r}$ representing the external fields: as
remarked in the previous section, these are the only
graphs to be considered to form the renormalized series.

Develop each propagator into a sum of propagators
as in [7]. The value of the graph $G$ will, as a
consequence, be represented as a sum of values of
new graphs obtained from $G$ by adding scale labels
on its lines and the value of the graph will
be computed as a product of factors in which a
line joining $\xi\eta$ and bearing a scale label $h$
will contribute with $C^{(h)}_{\xi\eta}$, the scale-$h$ term of the sum
in [7], replacing $C^{(\le N)}_{\xi\eta}$. To avoid
proliferation of symbols, we shall call the
graphs obtained in this way, i.e., with the scale
labels attached to each line, still $G$: no confusion
should arise as we shall, henceforth, only consider
graphs $G$ with each line carrying also a scale label.

The scale labels added on the lines of the graph $G$
allow us to organize the vertices of $G$ into
“clusters”: a cluster of scale $h$ consists of a maximal
set of vertices (of the graph elements in the graph)
connected by lines of scale $h' \ge h$ among which one
at least has scale $h$.

It is convenient to consider the vertices of the
graph elements as “trivial” clusters of highest scale:
conventionally call them clusters of scale $N + 1$.

The clusters can be of “first generation” if they
contain only trivial clusters, of “second generation”
if they contain only clusters which are trivial or of
the first generation, and so on.

Imagine enclosing in a box the vertices of graph
elements inside a cluster of the first generation and
then in a larger box the vertices of the clusters of
the second generation and so on: the set of boxes
ordered by inclusion can then be represented by a
rooted tree graph whose nodes correspond to the
clusters and whose “top points” are nodes representing
the trivial clusters (i.e., the vertices of the graph).

If the maximum number of nodes that have to be
crossed to reach a top point of the tree starting from
a node $v$ is $n_v$ ($v$ included and the top nodes
included), then the node $v$ represents a cluster of the
$n_v$-th generation. The first node before the root is a
cluster containing all vertices of $G$ and the root of
the tree will not be considered a node and it can
conventionally bear the scale label 0: it represents
symbolically the value of the graph.
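
The cluster definition can be sketched as a small algorithm: for each scale $h$ present on the lines, keep the lines of scale $\ge h$, take connected components, and retain those components that contain at least one line of scale exactly $h$. The code below is only an illustration; the input format (a list of `(vertex, vertex, scale)` triples) and the function name `clusters` are assumptions made here, and trivial clusters (single vertices) are not listed.

```python
def clusters(num_vertices, lines):
    """Return the nontrivial clusters as a set of (scale, frozenset_of_vertices).

    lines: iterable of (u, v, h), a graph line joining vertices u and v with
    scale label h. A cluster of scale h is a connected component of the lines
    of scale >= h that contains at least one line of scale exactly h.
    """
    lines = list(lines)
    found = set()
    for h in sorted({scale for _, _, scale in lines}):
        parent = list(range(num_vertices))

        def find(x):
            # union-find root lookup with path halving
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        kept = [(u, v, s) for u, v, s in lines if s >= h]
        for u, v, _ in kept:
            parent[find(u)] = find(v)

        members = {}
        has_scale_h = set()
        for u, v, s in kept:
            root = find(u)
            members.setdefault(root, set()).update((u, v))
            if s == h:
                has_scale_h.add(root)
        for root in has_scale_h:
            found.add((h, frozenset(members[root])))
    return found


if __name__ == "__main__":
    example = [(0, 1, 4), (2, 3, 5), (1, 2, 2)]
    for scale, verts in sorted(clusters(4, example), key=lambda c: c[0]):
        print(scale, sorted(verts))
```

Ordering the resulting vertex sets by inclusion gives the rooted tree described above: in the example, the scale-2 cluster containing all four vertices sits just above the root, with the scale-4 and scale-5 clusters as its subclusters.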

For instance, in Figure 5 a tree $\theta$ is drawn: its
nodes correspond to clusters whose scale is indicated
next to them; in the second part of the drawing, the
trivial clusters as well as the clusters of the first
generation are enclosed in boxes.

Then consider the next generation clusters, that is,
the clusters which only contain clusters of the first
generation or trivial ones, and draw boxes enclosing
all the graph vertices that can be reached from each
of them by descending the tree, etc. Figure 6
represents all boxes (of any generation) corresponding
to the nodes of the tree in Figure 5. The
representations of the clusters of a graph $G$ by a tree
or by hierarchically ordered boxes (see Figures 5 and
6) are completely equivalent provided inside each
box not representing a top point of the tree the scale
$h_v$ of the corresponding cluster $v$ is marked. For
instance, in the case of Figure 6 one gets Figure 7.
By construction, if two top points $\xi$ and $\eta$ are inside
the same box $b_v$ of scale $h_v$ but not in inner boxes,
then there is a path of graph lines joining $\xi$ and $\eta$
all of which have scales $\ge h_v$ and one at least has
scale $h_v$.

Figure 5 A tree and its clusters of generation 1 and 2.

Figure 6 All clusters of any generation for the tree in Figure 5.

Figure 7 The clusters in Figure 6 after affixing the scale labels.

Given a graph $G$, fix one of its points $\xi_1$ (say) and
integrate the absolute value of the graph over the
positions of the remaining points. The exponential
decay of the propagators implies that if a point $\xi$ is
linked to a point $\eta$ by a line of scale $h$ the
integration over the position of $\eta$ is essentially
constrained to extend only over a distance $\gamma^{-h} m^{-1}$.
Furthermore, the maximum size of the propagator
associated with a line of scale $h$ is bounded
proportionally to $\gamma^{(d-2)h}$. Therefore, recalling that
$|f_\xi|$ is supposed bounded by 1, the mentioned integral
can be immediately bounded by

$$I \stackrel{def}{=} C^{n+r}\, \frac{\lambda^n}{n!\, r!} \prod_{\ell} \gamma^{\frac{d-2}{2} h_\ell} \prod_{v} \gamma^{-d\, h_v (s_v - 1)}$$ [16]

where, $C$ being a suitable constant, the first product
is over the half-lines $\ell$ composing the graph lines and
the second is over the tree nodes (i.e., over the
clusters of the graph $G$), $s_v$ is the number of
subclusters contained in the cluster $v$ but not in
inner clusters; and in [16] the scale of a half-line $\ell$ is
$h_\ell$ if $\ell$ is paired with another half-line to form a line
(in the graph $G$) of scale label $h_\ell$.

Denoting by $v'$ the cluster immediately containing
$v$ in $G$, by $n^{inner}_v$ the number of half-lines in the
cluster $v$, by $n_v, r_v$ the numbers of graph elements of
the first type or of the fourth type in Figure 1 with
vertices in the cluster $v$, and denoting by $n^e_v$ the
number of lines which are not in the cluster $v$ but
have one extreme on a vertex in $v$ (“lines external to
$v$”), the identities ($k = 0$)

$$\sum_{v > root} (h_v - k)(s_v - 1) = \sum_{v > root} (h_v - h_{v'})(n_v + r_v - 1),$$
$$\sum_{v > root} (h_v - k)\, \tilde n^{inner}_v = \sum_{v > root} (h_v - h_{v'})\, n^{inner}_v$$ [17]

with $\tilde n^{inner}_v$ the number of half-lines in $v$ that form lines of
scale exactly $h_v$, and

$$n^{inner}_v \stackrel{def}{=} 4 n_v + r_v - n^e_v$$

hold, so that the estimate [16] can be elaborated into

$$I \le C^{n+r}\, \frac{\lambda^n}{n!\, r!} \prod_{v} \gamma^{-\rho_v (h_v - h_{v'})}, \qquad \rho_v \stackrel{def}{=} -d + (4 - d) n_v + r_v \frac{d+2}{2} + n^e_v \frac{d-2}{2}$$ [18]

where $h_{v'} = k = 0$ if $v$ is the first nontrivial node (i.e.,
$v'$ = root), and an estimate of the integral of the
absolute value of the graphs $G$ with given tree
structure but different scale labels is proportional to
$\sum_{\{h_v\}} I < \infty$ if (and only if) $\rho_v > 0$, $\forall v$.
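
The exponent $\rho_v$ in [18] can be tabulated to see which clusters are marginal. The sketch below is an illustration under stated assumptions: it takes $\rho_v = -d + (4-d) n_v + \frac{d+2}{2} r_v + \frac{d-2}{2} n^e_v$ and $n^{inner}_v = 4 n_v + r_v - n^e_v$ as written above, enumerates only clusters with at least two vertices and at least one external line, and imposes only the parity constraint that inner half-lines pair up (connectivity is not checked). The function names are hypothetical.

```python
from fractions import Fraction


def rho(d, n_v, r_v, n_e):
    """Power-counting exponent rho_v of [18], as an exact rational number."""
    return -d + (4 - d) * n_v + Fraction(d + 2, 2) * r_v + Fraction(d - 2, 2) * n_e


def exponent_per_cluster(d, n_v, r_v, n_e):
    """Exponent of gamma^{(h_v - h_v')} obtained from [16] through [17]."""
    n_inner = 4 * n_v + r_v - n_e
    return Fraction(d - 2, 2) * n_inner - d * (n_v + r_v - 1)


def non_convergent_clusters(d, max_vertices=5):
    """List (n_v, r_v, n_e) with rho_v <= 0 among the enumerated clusters."""
    bad = []
    for n_v in range(max_vertices + 1):
        for r_v in range(max_vertices + 1):
            if n_v + r_v < 2:
                continue  # trivial clusters are single vertices
            total_half_lines = 4 * n_v + r_v
            for n_e in range(1, total_half_lines + 1):
                if (total_half_lines - n_e) % 2:
                    continue  # inner half-lines must pair into lines
                if rho(d, n_v, r_v, n_e) <= 0:
                    bad.append((n_v, r_v, n_e))
    return bad


if __name__ == "__main__":
    for d in (2, 3, 4):
        print(d, non_convergent_clusters(d))
```

Within this enumeration, $d = 2$ gives no cluster with $\rho_v \le 0$, $d = 3$ gives only two $\varphi^4$-vertices with two external lines (where $\rho_v = 0$), and $d = 4$ gives marginal or divergent clusters for every number of vertices, in line with the need for many more subtractions there.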

But there may be clusters $v$ with only two
external lines, $n^e_v = 2$, and two graph vertices inside,
for which $\rho_v = 0$. However, this can happen only if
$d = 3$ and in only one case: namely if the graph $G$
contains a subgraph of the second type in Figure 2
and the three intermediate lines form a cluster $v$ of
scale $h_v$ while the other two lines are external to it,
hence on scale $h' < h_v$. In this case, one has to
remember that the subtraction in the previous section
has led to a modification of the contribution of such a
subgraph to the value of the graph (integrated over
the position labels of the vertices). As discussed in the
previous section, the change amounts to replacing the
propagator $C^{(h')}_{\eta\beta}$ by $C^{(h')}_{\eta\beta} - C^{(h')}_{\xi\beta}$.

This improves, in [18], the estimate of the contribution
of the line joining $\eta$ to $\beta$ from being proportional
to $\int C^{(h_v)3}_{\xi\eta} C^{(h')}_{\eta\beta}\, d\eta$ to being proportional to
$\int C^{(h_v)3}_{\xi\eta} \big(C^{(h')}_{\eta\beta} - C^{(h')}_{\xi\beta}\big)\, d\eta$; and this changes the contribution
of the line $\eta\beta$ from $\gamma^{(d-2)h'}$ to
$\gamma^{(d-2)h'} \int e^{-\gamma^{h_v} m |\xi - \eta|} (\gamma^{h'} m |\xi - \eta|)^{1/2}\, d\eta$
because $C^{(h')}$ is regular on scale
$\gamma^{-h'} m^{-1}$, see [10] with $\varepsilon = 1/2$.

Since $\xi, \eta$ are in a cluster of higher scale $h_v$, this
means that the estimate is improved by $\gamma^{-\frac{1}{2}(h_v - h')}$.
In terms of the final estimate, this means that $\rho_v$ in
[18] can be improved to $\bar\rho_v = \rho_v + 1/2$ for the
clusters for which $\rho_v = 0$. Hence, the integrated
value of the graph $G$ (after taking also into account
the integration over the initially selected vertex $\xi_1$,
trivially giving a further factor $|\Lambda|$ by translation
invariance), and summed over the possible scale
labels is bounded proportionally to $|\Lambda| \sum_{\{h_v\}} I < \infty$
once the estimate of $I$ is improved as described.

Note that the graphs contributing to the perturbation
series for $(1/|\Lambda|) \log Z_N(\Lambda, f)$ to order $\lambda^n$ are finitely
many because the number $r$ of external vertices is $r \le
2n + 2$ (since graphs must be connected). Hence, the
perturbation series is finite to all orders in $\lambda$.

The above is the renormalizability proof of the
scalar $\varphi^4$-fields in dimension $d = 2, 3$. The theory is
renormalizable even if $d = 4$ as mentioned in the
remark at the end of the previous section. The
analysis would be very similar to the above: it is just
a slightly more involved power-counting argument.




For more details, the reader is referred to Hepp 
(1966), Gallavotti (1985), sections 8 and 16. 


Asymptotic Freedom (d = 2, 3). 
Heuristic Analysis 


Finiteness to all orders of the perturbation expansions
is by no means sufficient to prove the existence
of the ultraviolet limit for $Z_N(\Lambda, f)$ or for $(1/|\Lambda|)
\log Z_N(\Lambda, f)$: and a priori it might not even be
necessary. For this purpose, the first step is to check
uniform (upper and lower) boundedness of $Z_N(\Lambda, f)$
as $N \to \infty$.

The reason behind the validity of a bound
$e^{|\Lambda| E_-(\lambda, f)} \le Z_N(\Lambda, f) \le e^{|\Lambda| E_+(\lambda, f)}$ with $E_\pm(\lambda, f)$ cutoff
independent has been made very clear after the
introduction of the renormalization group methods
in field theory. The approach studies the integral
$Z_N(\Lambda, f)$, [1], recursively, decomposing the field $\varphi^{(\le N)}$
into its regular components $z^{(h)}$, see [7], and
integrating first over $z^{(N)}$, then over $z^{(N-1)}$ and so on.

The idea emerges naturally if the potential $V_N$ in
[1] and [5] is written in terms of the “normalized”
variables $X^{(N)}_\xi \stackrel{def}{=} \gamma^{-N(d-2)/2} \varphi^{(\le N)}_\xi$, see [6]; here if $d = 2$
the factor $\gamma^{(d-2)N/2}$ is interpreted as $N^{1/2}$.

The key remark is that as far as the integration
over the small-scale component $z^{(N)}$ is concerned the
field $X^{(N)}_\xi$ is a sum of two fields of size of order 1
(statistically),

$$X^{(N)}_\xi = z^{(N)}_{\gamma^N \xi} + \gamma^{-(d-2)/2} X^{(N-1)}_\xi;$$

if $d = 2$ this becomes

$$X^{(N)}_\xi = \frac{z^{(N)}_{\gamma^N \xi}}{N^{1/2}} + \frac{(N-1)^{1/2}}{N^{1/2}}\, X^{(N-1)}_\xi,$$

and it can be considered to be smooth on scale $m^{-1}\gamma^{-N}$
(also statistically). Hence, it is approximately constant
and of size of order $O(1)$ on the small cubes $\Delta$ of
volume $\gamma^{-dN} m^{-d}$ of the pavement $Q_N$ introduced
before [7]; at the same time it can be considered to
take (statistically) independent values on different cubes
of $Q_N$. This is suggested by the inequalities [8]-[10].
Therefore, it is natural to decompose the potential 
Vn, see [5], as a sum over the small cubes A of volume 
y-4Nm~4 of the pavement Qn as (see [14] for the 
definition of un, vn), taking henceforth m= 1, 


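$$V_N(X^{(N)}) \stackrel{\rm def}{=} -\sum_{\Delta\in Q_N} \gamma^{-Nd}\int_\Delta \Big(\lambda\,\gamma^{2(d-2)N} X^{(N)4}_\xi + \mu_N\,\gamma^{(d-2)N} X^{(N)2}_\xi + \nu_N + f_\xi\,\gamma^{\frac{(d-2)}{2}N} X^{(N)}_\xi\Big)\frac{d\xi}{|\Delta|}$$

where $\gamma^{(d-2)N}$ is interpreted as $N$ if $d=2$. Hence, if
$d=3$ it is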
Vu (2) 
=. Yr | (ax, age 
Ae On 
d 

HIN =F fey NX | 3 [20] 

where 


Fix = (-6dcn + ANY” ch), 
Dy 3Ac2, + 2y~Nbn + BN Y2NDL 


and $c_N$, $c'_N$, $b_N$, $b'_N$, computable from [15] and [14],
admit a limit as $N\to\infty$. While if $d=2$ it is


Vu (2) 
Sr NAN ii (AXP + aux? 
AEQN A 
d 


where [iy a ew and Dy =o, and cn, compu- 
table from [13], admits a limit as N — oo. 

The fields $z^{(N)}$ and $X^{(N-1)}$ can be considered
constant over boxes $\Delta\in Q_N$: $z^{(N)}_\xi = s_\Delta$, $X^{(N-1)}_\xi = x_\Delta$
for $\xi\in\Delta$ and the $s_\Delta$ can be considered statistically
independent on the scale of the lattice $Q_N$.

Therefore, [20] and [21] show that integration over
$z^{(N)}$ in the integral defining $Z_N(\Lambda,f)$ is not too
different from the computation of a partition function
of a lattice continuous spin model in which the
“spins” are $s_\Delta$ and, most important, interact extremely
weakly if $N$ is large. In fact, the coupling
constants are of order of a power of $|X^{(N-1)}|$ times
$O(\gamma^{-N})$ if $d=3$ ($O(N^2\gamma^{-2N})$ if $d=2$), or of order
$O(\gamma^{-N(d+2)/2}\max|f_\xi|)$, no matter how large $\lambda$ and $f$.

This says that the smallest scale fields are
extremely weakly coupled. The fields $X^{(N-1)}$ can be
regarded as external fields of size that will be called
$B_{N-1}$, of order 1 or even allowed to grow with a
power of $N$, see [6]. Their presence in $V_N$ does not
affect the size of the couplings, as far as the analysis
of the integral over $z^{(N)}$ is concerned, because the
couplings remain exponentially small in $N$, see [20]
and [21], being at worst multiplied by a power of
$B_{N-1}$, i.e., changed by a factor which is a power of $N$.

The smallness of the coupling at small scale is a
property called “asymptotic freedom.” Once fields
and coordinates are “correctly scaled,” the real size
of the coupling becomes manifest, that is, it is
extremely small and the addends in $V_N$ proportional
to the “counter-terms” $\mu_N,\nu_N$, which looked
divergent when the fields were not properly scaled,
are in fact of the same order or much smaller than
the main $\varphi^4$-term.

Therefore, the integration over $z^{(N)}$ can be,
heuristically, performed by techniques well established
in statistical mechanics (i.e., by straightforward
perturbation expansions): at least if the field
$X^{(N-1)}$ is smooth and bounded, as prescribed
by [6], with $B=B_{N-1}$ growing as a power of $N$.
In such cases, denoting symbolically the integration
over $z^{(N)}$ by $P$ or by $\langle\ldots\rangle$, it can be expected that it
should give


$$\int e^{V_N}\,dP(z^{(N)}) = e^{V_{j;N-1} + R(j,N)|\Lambda|} \qquad [22]$$


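where $V_{j;N-1}$ is the Taylor expansion of
$\log\int e^{V_N}\,dP(z^{(N)})$ in powers of $\lambda$ (hence essentially
in the very small parameter $\lambda\gamma^{-(4-d)N}$) truncated at
order $j$, that is,

$$V_{2;N-1} = \Big[\langle V_N\rangle + \frac{\langle V_N^2\rangle - \langle V_N\rangle^2}{2!}\Big]^{\le 2}$$

$$V_{3;N-1} = \Big[\langle V_N\rangle + \frac{\langle V_N^2\rangle - \langle V_N\rangle^2}{2!} + \frac{\langle V_N^3\rangle - 3\langle V_N^2\rangle\langle V_N\rangle + 2\langle V_N\rangle^3}{3!}\Big]^{\le 3} \qquad [23]$$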


where $[\,\cdot\,]^{\le j}$ denotes truncation to order $j$ in $\lambda$,
and $R(j,N)$ is a remainder (depending on $\varphi^{(\le N-1)}$)
which can be expected to be estimated, for $d=2,3$, by
RG, N)| < RG, N) 

def c B(A N2 yay LAN [24] 


for suitable constants $C_j$, that is, a remainder
estimated by the $(j+1)$th power of the coupling
times the number of boxes of scale $N$ in $\Lambda$. The
relations [22]-[24] result from a naive Taylor
expansion in $\lambda$ of $\log\int e^{V_N}\,dP(z^{(N)})$, taking into
account that, in $V_N$ as a function of $z^{(N)}$, the $z^{(N)}$
appear multiplied by quantities at most of size
$\lambda\gamma^{-(4-d)N}N^2B^3_{N-1}$ (by [20] and [21], if $|X^{(N-1)}_\xi|\le B_{N-1}$).
In a statistical mechanics model for a lattice spin
system, such a calculation of $Z_N$ would lead to a
mean-field equation of state once the remainder was
neglected.
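
The truncation in [23] is the cumulant expansion of the logarithm of a Gaussian average. The following minimal sketch illustrates it in a toy setting that is an assumption made here and not the field-theoretic computation: a single standard Gaussian variable $z$ and the hypothetical potential $V=-\lambda z^4$. The function names `log_partition` and `truncated_cumulants` are illustrative only. The sketch compares the truncations to orders 1, 2, 3 with a numerical evaluation of $\log\langle e^{V}\rangle$.

```python
import math

def log_partition(lam, half_width=12.0, steps=48000):
    """log of E[exp(-lam * z**4)] for z ~ N(0, 1), by the trapezoid rule."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-lam * z**4 - 0.5 * z * z)
    return math.log(total * h / math.sqrt(2.0 * math.pi))

def truncated_cumulants(lam, order):
    """Cumulant expansion of log E[exp(V)], V = -lam * z**4, for order 1, 2 or 3."""
    # Gaussian moments of Y = z**4: E[Y] = 3, E[Y^2] = 105, E[Y^3] = 10395.
    m1, m2, m3 = 3.0, 105.0, 10395.0
    terms = [
        -lam * m1,                                           # <V>
        lam**2 * (m2 - m1**2) / 2.0,                         # (<V^2> - <V>^2) / 2!
        -lam**3 * (m3 - 3.0 * m2 * m1 + 2.0 * m1**3) / 6.0,  # third cumulant / 3!
    ]
    return sum(terms[:order])

lam = 1e-3
exact = log_partition(lam)
errors = [abs(exact - truncated_cumulants(lam, j)) for j in (1, 2, 3)]
print(errors)  # the error shrinks as the truncation order grows
```

For small $\lambda$ each additional order reduces the error roughly by a factor proportional to $\lambda$, which is the single-variable analog of a remainder controlled by the $(j+1)$th power of the coupling.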

The peculiarity of field theory is that a relation like
[22] and [24] has to be applied again to $V_{j;N-1}$ to
perform the integration over $z^{(N-1)}$ and define $V_{j;N-2}$
and, then, again to $V_{j;N-2}$, .... Therefore, it will be
essential to perform the integral in [22] to an order
(in $\lambda$) high enough so that the bound $\bar R(j,N)$ can be
summed over $N$: this requires (see [24]) an explicit
calculation of [23] pushed at least to order $j=1$ if
$d=2$ or to order $j=3$ if $d=3$; furthermore it is also
necessary to check that the resulting $V_{j;N-1}$ can still
be interpreted as low-coupling spin model so that
[22] can be iterated with $N-1$ replacing $N$ and then
with $N-2$ replacing $N-1$, ....
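
The orders $j=1$ ($d=2$) and $j=3$ ($d=3$) quoted above can be recovered by simple exponent counting. The sketch below assumes, as the text describes, that the remainder bound at scale $N$ behaves like the number of boxes per unit volume, $\gamma^{dN}$, times the $(j+1)$th power of a coupling of size $\gamma^{-(4-d)N}$, with polynomial factors in $N$ ignored. The function names are hypothetical.

```python
def remainder_exponent(d, j):
    """Exponent e such that the remainder bound at scale N behaves like gamma**(e * N)."""
    # gamma**(d * N) boxes per unit volume, coupling of size gamma**(-(4 - d) * N)
    # raised to the power j + 1; powers of N are ignored in this counting.
    return d - (4 - d) * (j + 1)

def minimal_order(d):
    """Smallest j >= 0 for which the remainder bounds decay geometrically in N."""
    if d >= 4:
        # For d = 4 the coupling does not decay with N, so no order is sufficient.
        raise ValueError("exponent counting gives no summable order for d >= 4")
    j = 0
    while remainder_exponent(d, j) >= 0:
        j += 1
    return j

for d in (2, 3):
    print(d, minimal_order(d))
```

The counting fails for $d=4$, where the coupling no longer decays with the scale; this is consistent with the remark later in the article that the $d=4$ theory is renormalizable but not asymptotically free.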

The first necessary check towards a proof of the
discussed heuristic “expectations” is that, defining
recursively $V_{j;h}$ from $V_{j;h+1}$ for $h=N-1,\ldots,1,0$
by [23] with $V_N$ replaced by $V_{j;h+1}$ and $V_{j;N-1}$
replaced by $V_{j;h}$, the couplings between the variables
$z^{(h)}$ do not become “worse” than those discussed in
the case $h=N$. Furthermore, the field $\varphi^{(\le N)}$ has a
high probability of satisfying [6], but fluctuations
are possible: hence the $R$-estimate has to be
combined with another one dealing with the large
fluctuations of $\varphi^{(\le N)}$ which has to be shown to be
“not worse.”

For more details, the reader is referred to Gallavotti 
(1978, 1985) and Benfatto and Gallavotti (1995). 


Effective Potentials and Their 
Scale (In)Dependence 


To analyze the first problem mentioned at the end of
the previous section, define $V_{j;h}$ by [23] with $V_N$
replaced by $V_{j;h+1}$ for $h=N-1,N-2,\ldots,0$. The
quantities $V_{j;h}$, which are called “effective potentials”
on scale $h$ (and order $j$), turn out to be in a
natural sense scale independent: this is a consequence
of renormalizability, realized by Wilson as a
much more general property which can be checked,
in the very special cases considered here with
$d=2,3$, at fixed $j$ by induction, and in the
super-renormalizable models considered here it requires
only an elementary computation of a few Gaussian
integrals as the case $j=3$ (or even $j=1$ if $d=2$) is
already sufficient for our purposes.

It can also be (more easily) proved for general $j$ by
a dimensional argument parallel to the one presented
earlier to check finiteness of the renormalized
series. The derivation is elementary but it should be
stressed that, again, it is possible only because of the
special choice of the counter-terms $\mu_N,\nu_N$. If $d=3$,
the boundedness and smoothness of the fields $\varphi^{(\le h)}$
and $z^{(h)}$ expressed by the second of [6] and of [10] is
essential; while if $d=2$ the smoothness is not
necessary.

The structure of $V_{j;h}$ is conveniently expressed
in terms of the fields $X$ as a sum of three terms
$V^{(\mathrm{rel})}_{j;h}$ (standing for “relevant” part), $V^{(\mathrm{irr})}_{j;h}$ (standing
for “irrelevant” part), and a “field independent”
part $E(j,h)|\Lambda|$.

The relevant part in $d=2$ is simply of the form
[21] with $h$ replacing $N$: call it $V^{(\mathrm{rel},1)}_h$. If $d=3$, it is
given by [20] with $h$ replacing $N$ plus, for $h<N$, a
second “nonlocal” term


2 2) 
(rel,2) def 4° 3! 9 ['/ A(<h)3 _ AL<N)3 
vp rar | (Cir? R) 


2 
x (g — T dndn’ 


which is conveniently expressed in terms of a 
“nonlocal” field 


<h <h 

(h) def Pn” — On | 
ny 1 
(Pin — n|) 


ae ve — U de vor with 


rel,2) def =h 
y" Ay 2 





AACO, AxA’ 
, n dnd 
=c” n=n' 35 
i AIA" 25] 


where 


Ay) 
(oin — n|) ` 


with $a,a',c'>0$ and the subscript $N$ means that the
expression in parenthesis “saturates at scale $N$”, i.e.,
its denominator becomes y2—"1/2)"-N) as $|\eta-\eta'|\to 0$.

The expression [25] is not the full part of the
potential $V_{j;h}$ which is of second order in the fields:
there are several other contributions which are
collected below as “irrelevant.”

It should be stressed that “irrelevant” is a
traditional technical term: by no means should it
suggest “negligibility.” On the contrary, it could be
maintained that the whole purpose of the theory is
to study the irrelevant terms. The irrelevant part of
the potential can be better designated as the “driven
part,” as its behavior is “controlled” by the relevant
part: although initially $V_{j;h}$, $h=N$, contains
no irrelevant terms, it eventually contains them for
$h<N$ and they keep getting generated as $h$
diminishes. Furthermore, the part of the irrelevant
terms generated at scale $h_0<N$ becomes very small
at scales $h<h_0$ so that the irrelevant part of $V_{j;h}$ at
small $h$ (e.g., at $h=0$, i.e., on the “physical scale” of
the observer) only depends on the relevant terms in a
few scales near $h$.

It also turns out that the Schwinger functions are 
simply related to the irrelevant terms. 

The irrelevant part of the effective potential can
be expressed as a finite sum of integrals of
monomials in the fields $X^{(h)}$ if $d=2$, or in the fields
$X^{(h)}$ and $Y^{(h)}$ if $d=3$, which can be written as


dé, r dnd 
X Wilii sista) ME AL] |A2,] |26] 


with the integral extended to products Aj, x --- x 
ma {Al x x) of boxes A€Q, and 

dé... . om!) is the length of the shortest tree 
graph that connects all the p+2q>0 points, the 
exponents n,t are >2, and £t is >3 if q>0; 
the kernel W depends on all coordinates €,,.. si 
and it is bounded above by C; [J$ 4 Anum, for some 

Cj; the sums X ng +) on, canoe exceed 4j. The 
test functions f do not appear in [26] because by 
assumption they are bounded by 1: but W depends 
on the f’s as well. 

The field-independent part is simply the value
of $\log Z_N(\Lambda,f)$ computed by the perturbation
analysis in the section “Perturbation theory” up to
order $j$ in $\lambda$ but using as propagator $(C^{(\le N)} - C^{(\le h)})$;
thus, $E(j,h)$ is a constant depending on $N$ but
uniformly bounded as $N\to\infty$ (because of the
renormalizability proved in the section “Perturbation
theory”).

If $d=2$, there is no need to introduce the nonlocal
fields $Y^{(h)}$ and in [26] one can simply take $q=0$,
and the relevant part can be expressed by
omitting the term $V^{(\mathrm{rel},2)}_h$ in [25]: unlike the $d=3$
case, the estimate on the kernels $W$ by an
$N$-independent $C_j$ holds uniformly in $h$ without
having to introduce $Y$. For $d=2$, it will therefore be
supposed that $V^{(\mathrm{rel},2)}_h = 0$ in [25] and $q=0$ in [26].

It is not necessary to have more information on
the structure of $V_{j;h}$, even though one can find simple
graphical rules, closely related to the ones in the
section “Perturbation theory,” to construct the
coefficients $W$ in full detail. The $W$ depend, of
course, on $h$ but the uniformity of the bound on $W$
is the only relevant property and in this sense the
effective potentials are said to be (almost) “scale
independent.”

The above bounds on the irrelevant part can
be checked by an elementary direct computation if
$j\le 3$: in spite of its “elementary character,” the
uniformity in $h\le N$ is a result ultimately playing an
essential role in the theory together with the
dominance of the relevant part over the irrelevant
one which, once the fields are properly scaled, is
“much smaller” (by a factor of order $\gamma^{-h}$, see [26]),
at least if $h$ is large.


Remarks 


(i) Checking scale independence for $j=1$ is just
checking that $\int P(dz^{(h)})\,V_{1;h} = V_{1;h-1}$. Note that

$$V_{1;h} = -\lambda\int \big(\varphi^{(\le h)4}_\xi - 6\,C^{(\le h)}_{\xi\xi}\,\varphi^{(\le h)2}_\xi + 3\,C^{(\le h)2}_{\xi\xi}\big)\,d\xi$$

hence, calling $:\varphi^{(\le h)4}_\xi:$ the polynomial in the integral
(Wick’s monomial of order 4), we have here an
elementary Gaussian integral (“martingale property
of Wick monomials”). Note the essential role of the
counter-terms. For $j>1$, the computation is similar
but it involves higher-order polynomials (up to $4j$)
and the distinction between $d=2$ and $d=3$
becomes important.

(ii) $V_{j;0}$ contains only the field-independent part
$E(j,0)|\Lambda|$ (see above) which is just a number (as
there are no fields of scale 0): by the above
definitions, it is identical to the perturbative
expansion truncated to $j$th order in $\lambda$ of
$\log Z_N(\Lambda,f)$, well defined as discussed earlier.
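
The “martingale property of Wick monomials” invoked in remark (i) can be checked exactly in a one-variable toy model. In the sketch below (an illustration under stated assumptions, not the computation in the text) a field value is split as $x+z$, where $x$ plays the role of the lower-scale field with variance `c` and $z$ is an independent centered Gaussian of variance `s`; the names `wick4` and `averaged_wick4` are hypothetical. Averaging the fourth-order Wick monomial of total variance `c + s` over $z$ returns the Wick monomial of variance `c`.

```python
from fractions import Fraction

def wick4(x, c):
    """Wick monomial of order 4 for a centered Gaussian variable of variance c."""
    return x**4 - 6 * c * x**2 + 3 * c**2

def averaged_wick4(x, c, s):
    """Exact average over z ~ N(0, s) of wick4(x + z, c + s)."""
    # Gaussian moments used: E[z] = E[z^3] = 0, E[z^2] = s, E[z^4] = 3 s^2.
    mean_pow4 = x**4 + 6 * x**2 * s + 3 * s**2  # E[(x + z)^4]
    mean_pow2 = x**2 + s                        # E[(x + z)^2]
    total = c + s
    return mean_pow4 - 6 * total * mean_pow2 + 3 * total**2

x, c, s = Fraction(2, 3), Fraction(5, 4), Fraction(7, 2)
print(averaged_wick4(x, c, s) == wick4(x, c))  # Wick-ordered quartic is preserved
print(x**4 + 6 * x**2 * s + 3 * s**2 == x**4)  # the bare quartic is not
```

The second comparison shows why the counter-terms are essential: without the subtractions, the average of the bare fourth power acquires terms growing with the integrated variance `s`.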


Nonperturbative Renormalization: 
Small Fields 


Having introduced the notion of effective potential
$V_{j;h}$, of order $j$ and scale $h$, satisfying the bounds
(described after [26]) on the kernels $W$ representing
it, the problem is to estimate the remainder in [22]
and find its relation with the value [24] given by the
heuristic Taylor expansion. Assume $\lambda<1$ to avoid
distinguishing this case from that with $\lambda>1$ which
would lead to very similar estimates but to different
$\lambda$-dependence of some constants.

Define $\chi_B(z^{(h)}) = 1$ if $\|z^{(h)}\|_\Delta \le Bh^2$ for all $\Delta\in Q_h$,
see [8], and 0 otherwise; then the following lemma
holds:


Lemma 1 Let $\|X^{(h)}\|_\Delta$ be defined as [8] with $z$
replaced by $X$ and suppose $\|X^{(h)}\|_\Delta \le Bh^4$ for all $\Delta$;
then, for all $j\ge 1$, it is


$$\int e^{V_{j;h+1}}\,\chi_B(z^{(h+1)})\,dP(z^{(h+1)}) = e^{V_{j;h} + R_-(j,h+1)|\Lambda|} \qquad [27]$$


with, for suitable constants c_,c_, 
IR_G;b+1)| < R_G;b +1) 
ER RG; bh +1)+c- g (b+1)* 


and $\bar R(j,h+1)$ given by [24] with $h+1$ in place
of $N$.

Since $Z_N(\Lambda,f) \ge \int e^{V_N}\prod_{h=1}^{N}\chi_B(z^{(h)})\,P(dz^{(h)})$ this
immediately gives a lower bound on $(1/|\Lambda|)\log Z_N(\Lambda,f)$:
in fact if $\chi_B(z^{(h')}) = 1$ for $h'=1,\ldots,h$, then
$\|X^{(h)}\|_\Delta \le cBh^4$ for some $c$ so
that, by recursive application of Lemma 1,
$Z_N(\Lambda,f) \ge e^{V_{j;0} - \sum_{h=1}^{N}\bar R_-(j,h)|\Lambda|}$. By the remark at the
end of the previous section, given $j$ the lower bound
on $E$ just described agrees with the perturbation
expansion of $E=(1/|\Lambda|)\log Z_N(\Lambda,f)$ truncated to
order $j$ (in $\lambda$) up to an error bounded by

$$\sum_{h=1}^{N}\bar R_-(j,h).$$


Remark The problem solved by Lemma 1 is
usually referred to as the small-field problem, to
contrast it with the large-field problem discussed
later. The proof of the lemma is a simple Taylor
expansion in $\lambda\gamma^{-h}$ if $d=3$ or in $\lambda h^2\gamma^{-2h}$ if $d=2$ to
order $j$ (in $\lambda$). The constraint on $z^{(h+1)}$ makes the
integrations over $z^{(h+1)}$, necessary to compute $V_{j;h}$
from $V_{j;h+1}$, not Gaussian. But the tail estimates [9],
together with the Markov property of the distribution
of $z^{(h)}$, can be used to estimate the difference
with respect to the Gaussian unconstrained integrations
of $z^{(h+1)}$: and the result is the addition of the
small “tail error” changing $\bar R$ into $\bar R_-$ in [27]. The
estimate of the main part of the remainder $\bar R$ would
be obvious if the fields $z^{(h)}$ were independent on
boxes of scale $\gamma^{-h}$: they are not independent but
they are Markovian and the estimate can be done by
taking into account the Markov property.


For more details, the reader is referred to Wilson 
(1970, 1972), Gallavotti (1978, 1981, 1985), and 
Benfatto et al. (1978). 


Nonperturbative Renormalization: Large 
Fields, Ultraviolet Stability 


The small-field estimates are not sufficient to obtain
ultraviolet stability: to control the cases in which
$|X^{(h)}_\xi| > Bh^4$ for some $\xi$ or some $h$, or $|Y^{(h)}_{\xi\eta}| > Bh^4$ for
some $|\xi-\eta| < \gamma^{-h}$, a further idea is necessary and it
rests on making use of the assumption that $\lambda>0$
which, in a sense to be determined, should suppress
the contribution to the integral defining $Z_N(\Lambda,f)$
coming from very large values of the field. Assume
also $\lambda<1$ for the same reasons advanced in the
section “Effective potentials and their scale
(in)dependence.”

Consider first $d=2$. Let $D_N$ be the “large-field
region” where $|X^{(N)}_\xi| > BN^4$ and let $V_N(\Lambda/D_N)$ be
the integral defining the potential in [21] extended
to the region $\Lambda/D_N$, complement of $D_N$. This region
is typically very irregular (and random as $X$ itself is
random with distribution $P_N$).

An upper bound on the integral defining $Z_N(\Lambda,f)$
is obtained by simply replacing $e^{V_N}$ by $e^{V_N(\Lambda/D_N)}$
because in $D_N$ the first term in the integrand in [21]
is $\le -\lambda N^2\gamma^{-2N}(BN^4)^4 < 0$ and it overwhelmingly
dominates the remaining terms whose value is
bounded by a similar expression with a smaller
power of $N$. Then if $E^c \stackrel{\rm def}{=} \Lambda/E$ denotes the complement
in $\Lambda$ of a set $E\subset\Lambda$:


Lemma 2 Let $d=2$. Define $V_{j;h}(D_h^c)$ to be given by
the expression [22] with the integrals extending over
$\Delta_i/D_h$ and define $\bar R(j,h+1)$ by [24]. Then


where 
cue? 


Rth +1 < Roth +1 E Rijsh +1) + 


"h+? with suitable oa oe 


Remark Lemma 2 is genuinely nonperturbative
and makes essential use of the positivity of $\lambda$.
Below, the analysis of the proof of the lemma, which
consists essentially in its reduction to Lemma 1, is
described in detail. It is perhaps the most interesting
part and the core of the theory: the proof that
truncating the expansion in $\lambda$ of $(1/|\Lambda|)\log Z_N(\Lambda,f)$
to order $j$ gives as a result an estimate exact to order
$\lambda^{j+1}$ of $(1/|\Lambda|)\log Z_N(\Lambda,f)$.


Let $R_N$ be the cubes $\Delta\in Q_N$ in which there is at
least one point $\xi$ where $|z^{(N)}_\xi| > BN^2$. By definition,
the region $D_N/D_{N-1}$ is covered by $R_N$.

Remark that in the region $D_{N-1}/R_N$ the field
$X^{(N-1)}$ is large but $z^{(N)}$ is not large so that $X^{(N)}$ is still
very large: this is so because the bounds set to define
the regions $D$ and $R$ are quite different being $BN^4$
and $BN^2$, respectively. Hence, if a point is in $D_{N-1}$
and not in $R_N$, then the field $X^{(N)}$ must be of the
order $\gg BN^2$. Therefore, by positivity of the $\lambda\varphi^4$
term (which dominates all other terms so that
$V_N(\varphi^{(\le N)}_\xi) < 0$ for $\xi\in D_N\cup(D_{N-1}/R_N)$) we can
replace $V_N(D_N^c)$ by $V((D_N\cup(D_{N-1}/R_N))^c)$, for the
purpose of obtaining an upper bound.

Furthermore, modulo a suitable correction, it is
possible to replace $V((D_N\cup(D_{N-1}/R_N))^c)$ by
$V((D_{N-1}\cup R_N)^c)$: because the integrand in $V_N$ is
bounded below by $-b\lambda\gamma^{-2N}N^2$
if $d=2$ (by $-b\lambda\gamma^{-N}$ if $d=3$), for some $b$, so that the
points in $R_N$ can at most lower
$V((D_N\cup(D_{N-1}/R_N))^c)$ by $-b\lambda N^2\gamma^{-(4-d)N}\#(R_N)$ if $\#R_N$ is
the number of boxes of $Q_N$ in $R_N$ and $V(\varphi)$ is
bounded below by its minimum: thus,


$$V((D_{N-1}\cup R_N)^c) + b\lambda N^2\gamma^{-(4-d)N}\,\#(R_N)$$


is an upper bound to $V((D_N\cup(D_{N-1}/R_N))^c)$.

In the complement of $D_{N-1}\cup R_N$, all fields are
“small”; if $X^{(N-1)}$ and $R_N$ are fixed this region is not
random (as a function of $z^{(N)}$) any more. Therefore,
if $X^{(N-1)}$, $R_N$ are fixed the integration over $z^{(N)}$,
conditioned to having $z^{(N)}$ fixed (and large) in the
region $R_N$, is performed by means of the same
argument necessary to prove Lemma 1 (essentially a
Taylor expansion in $\lambda\gamma^{-(4-d)N}$). The large size of
$z^{(N)}$ in $R_N$ does not affect too much the result
because on the boundary of $R_N$ the field $z^{(N)}$ is
$\le BN^2$ (recalling that $z^{(N)}$ is continuous) and since
the variable $z^{(N)}$ is Markovian, the boundary effect
decays exponentially from the boundary $\partial R_N$: it
adds a quantity that can be shown to be bounded by
the number of boxes in $R_N$ on the boundary of $R_N$,
hence by $\#R_N$, times $b'\lambda(N-1)^2\gamma^{-(4-d)N}(B(N-1)^4)^4$
for some $b'$.

The result of the integration over $z^{(N)}$ of
$e^{V_N((D_N\cup(D_{N-1}/R_N))^c)}$ conditioned to the large-field
values of $z^{(N)}$ in $R_N$ leads to an upper bound on
$\int e^{V_N}\,P(dz^{(N)})$ as

$$e^{V_{j;N-1}(D^c_{N-1}) + \bar R(j,N)|\Lambda|}\sum_{R_N}\prod_{\Delta\in R_N}\big(c\,e^{-c'(BN^2)^2}\big)\; e^{c''\lambda\gamma^{-(4-d)N}N^2(BN^4)^4\,\#R_N} \qquad [29]$$

where $c,c',c''$ are suitable constants: this is
explained as follows.


1. Taylor expansion (in $\lambda$) of the integral of
$e^{V_N((D_{N-1}\cup R_N)^c) + b\lambda N^2\gamma^{-(4-d)N}\#(R_N)}$ (which, by
construction, is an upper bound on $e^{V_N(X^{(N)})}$) with
respect to the field $z^{(N)}$, conditioned to be fixed
and large in $R_N$, would lead to an upper bound as
$e^{V_{j;N-1}((D_{N-1}\cup R_N)^c) + R'(j,N)|\Lambda| + b''\lambda(BN^4)^4\gamma^{-(4-d)N}N^2\#(R_N)}$
with $R'$ equal to [24] possibly with some $C'_j$
replacing $C_j$. The second exponential on the RHS
of [29] arises partly from the above correction
$b''\lambda(BN^4)^4\gamma^{-(4-d)N}N^2\#(R_N)$ and partly from a
contribution of similar form explained in (3)
below.

2. Integration over the large conditioning fields
fixed in $R_N$ is controlled by the second estimate
in [9] (the tail estimate): the first factor in
parentheses in [29] is the tail estimate just
mentioned, i.e., the probability that $z^{(N)}$ is large
in the region $R_N$. The second factor is only partly
explained in (1) above.

3. Without further estimates, the bound [29] would
contain $V_{j;N-1}((D_{N-1}\cup R_N)^c)$ rather than
$V_{j;N-1}(D^c_{N-1})$. Hence, there is the need to change
the potential $V_{j;N-1}((D_{N-1}\cup R_N)^c)$ by “reintroducing”
the contribution due to the fields in
$R_N/D_{N-1}$ in order to reconstruct $V_{j;N-1}(D^c_{N-1})$.
Reintroducing this part of the potential costs a
quantity like $b'\lambda N^2\gamma^{-(4-d)N}(BN^4)^4\#(R_N)$ (because
the reintroduction occurs in the region $R_N/D_{N-1}$
which is covered by $R_N$ and in such points the field
$X^{(N-1)}$ is not large, being bounded by $B(N-1)^4$);
so that their contribution to the effective potential
is still dominated by the $\varphi^4$-term and therefore by
$\gamma^{-(4-d)N}$ times a power of $BN^4$ times the volume of
$R_N$ (in units of $\gamma^{-dN}$, i.e., $\#R_N$). All this is taken care
of by suitably fixing $c''$.


Note that the sum over $R_N$ in [29] is


$$\Big(1 + c\,e^{-c'B^2N^4}\,e^{c''\lambda\gamma^{-(4-d)N}N^2(BN^4)^4}\Big)^{\gamma^{dN}|\Lambda|}$$


(because $\Lambda$ contains $|\Lambda|\gamma^{dN}$ cubes of $Q_N$); hence, it is
—' B 
CLE T 


bounded above by e for suitably defined 


Ci ox 
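
The resummation over the random region used above rests on an elementary identity: when every box carries the same weight $x$, the sum over all collections $R$ of boxes of $\prod_{\Delta\in R}x$ equals $(1+x)^n$, with $n$ the total number of boxes, and this is at most $e^{nx}$. A brute-force sketch on a small hypothetical set of boxes (all names are illustrative) confirms it:

```python
import math
from itertools import combinations

def sum_over_subsets(n_boxes, x):
    """Sum over all subsets R of n_boxes boxes of the product of x over R."""
    total = 0.0
    for size in range(n_boxes + 1):
        for subset in combinations(range(n_boxes), size):
            total += x ** len(subset)
    return total

def closed_form(n_boxes, x):
    """Closed form (1 + x)**n of the same sum."""
    return (1.0 + x) ** n_boxes

def exponential_bound(n_boxes, x):
    """Upper bound exp(n * x), valid for x >= 0."""
    return math.exp(n_boxes * x)

n, x = 8, 0.25
print(sum_over_subsets(n, x), closed_form(n, x), exponential_bound(n, x))
```

In the estimate of the text $n=\gamma^{dN}|\Lambda|$ and $x$ is the per-box weight appearing inside the parentheses, so the exponential bound is proportional to the volume $|\Lambda|$.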

The same argument can be repeated for $V_{j;h}(D_h^c)$
with any $h$ if $V_{j;h}(D_h^c)$ is defined by the sum over $\Delta$’s
in $Q_h$ of the same integrals as those in [25] and [26]
with $\Delta_i/D_h$ replacing $\Delta_i$ in the integration domains.

Applying Lemma 1 recursively with $j\ge 1$ (if
$d=3$ it would become necessary to take $j\ge 3$), it
follows that there exist $N$-independent upper and
lower bounds $E_\pm|\Lambda|$ on $\log Z(\Lambda,f)$ of the form
Vio +X y1 (RUA) + cee BTA] for Cist > 0 
suitably chosen and $\lambda$-independent for $\lambda<1$.
By the remark at the end of Sec.6, given $j$, the
bounds just described agree with the perturbation
expansion $E(j,0)|\Lambda| = V_{j;0}$ of $\log Z(\Lambda,f)$ truncated
to order $j$ (in $\lambda$) up to the remainders
$\pm\sum_{h=1}^{\infty}\bar R(j,h)$. Hence, if $B$ is chosen proportional
to $\log_+\lambda^{-1} \stackrel{\rm def}{=} \log(e+\lambda^{-1})$, the upper and lower
bounds coincide to order $j$ in $\lambda$ with the value
obtained by truncating to order $j$ the perturbative
series.

The latter remark is important as it implies
not only that the bounds are finite (by the
section “Perturbation theory”) but also that the
function $(1/|\Lambda|)\log Z(\Lambda,f)$ is not quadratic in $f$:
already to order 1 in $\lambda$ it is quartic in $f$ (containing a
term equal to $-\lambda(\int C_{\xi\eta}f_\eta\,d\eta)^4$).

The latter property is important as it excludes
that the result is a “Gaussian” generating function.
Thus, the outline of the proof of Lemma 2, which
together with Lemma 1 forms the core of the
analysis of the ultraviolet stability for $d=2$, is
completed.

If $d=3$, more care is needed because (very mild)
smoothness, like the considered Hölder continuity
with exponent 1/4, of $z$, $X$ is necessary to obtain the
key scale independence property discussed earlier:
therefore, the natural measure of the size of $z^{(h)}$ and
$X^{(h)}$ in a box $\Delta\in Q_h$ is no longer the maximum of
$|z^{(h)}_\xi|$ or of $|X^{(h)}_\xi|$. The region $D_h$ becomes more
involved as it has to consist of the points $\xi$
where $|X^{(h)}_\xi| > Bh^4$ and of the pairs $\eta,\eta'$ where


W) _ ylh) 
xp — xt 


Qrin — mI) 
i.e., it is not just a subset of $\Lambda$.

However, if $d=3$, the relevant part also contains
the negative term $V^{(\mathrm{rel},2)}_h$, see [25]: and since it
dominates over all other terms which contain a
$Y$-field (because their couplings [25] are smaller by
about $\gamma^{-h}$), the argument given for $d=2$ can be
adapted to the new situation. Two regions $D^1_h$, $D^2_h$
will be defined: the first consists of all the points $\xi$
where $|X^{(h)}_\xi| > Bh^4$ and the second of all the pairs
$\eta,\eta'$ where $|Y^{(h)}_{\eta\eta'}| > Bh^4$. The region $R_h$ will be
the collection of all $\Delta\in Q_h$, where $\|z^{(h)}\|_\Delta > Bh^2$,
see [8] with $r=0$. Then $V(D_h^c)$ will be defined as the
sum of the integrals in [25] and [26] with the integrals
over $\xi_i$ further restricted to $\xi_i\notin D^1_h$ and those over the
pairs $\eta_i,\eta'_i$ further restricted to $(\eta_i,\eta'_i)\notin D^2_h$. With
the new settings, Lemma 2 can be proved also for
$d=3$ along the same lines as in the $d=2$ case.

For more details, the reader is referred to Wilson 
(1970, 1972), Benfatto et al. (1978), and Gallavotti 
(1981). 





Ultraviolet Limit, Infrared Behavior, and 
Other Applications 


The results on the ultraviolet stability are nonperturbative,
as no assumption is made on the size of $\lambda$
(the assumption $\lambda<1$ has been imposed in the last
two sections only to obtain simpler expressions for
the $\lambda$-dependence of various constants): nevertheless
the multiscale analysis has allowed us to use
perturbative techniques (i.e., the Taylor expansion
in Lemmata 1, 2) to find the solution. The latter
procedure is the essence of the renormalization
group methods: they aim at reducing a difficult
multiscale problem to a sequence of simple
single-scale problems. Of course, in most cases, it is
difficult to implement the approach and the scalar
quantum fields in dimensions 2,3 are among the
simplest examples. The analysis of the beta function
and of the running couplings, which appear in
essentially all renormalization group applications,
does not play a role here (or, better, their role is so
inessential that it has even been possible to avoid
mentioning them). This makes the models somewhat
special from the renormalization group viewpoint:
the running couplings at length scale $h$, if introduced,
would tend exponentially to 0 as $h\to\infty$;
unlike what happens in the most interesting
renormalization group applications in which they
either tend to zero only as powers of $h$ or do not
tend to zero at all.

The multiscale analysis method, i.e., the renorma- 
lization group method, in a form close to the one 
discussed here has been applied very often since its 
introduction in physics and it has led to the solution 
of several important problems. The following is not 
an exhaustive list and includes a few open questions. 


1. The arguments just discussed imply, with minor
extra work, that $Z_N(\Lambda,f)$ as $N\to\infty$ not only admits
uniform upper and lower bounds but also that the
limit as $N\to\infty$ actually exists and it is a $C^\infty$ function
of $\lambda$, $f$. Its $\lambda$ and $f$-derivatives at $\lambda=0$ and $f=0$ are
given by the formal perturbation calculation. In some
cases, it is even possible to show that the formal series
for $Z_N(\Lambda,f)$ in powers of $\lambda$ is Borel summable.

2. The problem of removing the infrared cutoff (i.e.,
$\Lambda\to\infty$) is in a sense more a problem of statistical
mechanics. In fact, it can be solved for $d=2,3$ by a
typical technique used in statistical mechanics, the
“cluster expansion.” This is not intended to mean 
that it is technically an easy task: understanding its 
connection with the low-density expansions and 
the possibility of using such techniques has been a 
major achievement that is not discussed here. 

3. The third problem mentioned in the introduction,
that is, checking the axioms so that the theory could
be interpreted as a quantum field theory, is a difficult
problem which required important efforts to control
and which is not analyzed here. A good introduction
to it is its analysis in the $d=2$ case.

4. Also the problem of keeping the ultraviolet cutoff
and removing the infrared cutoff while the parameter
$m^2$ in the propagator approaches 0 is a very
interesting problem related to many questions in
statistical mechanics at the critical point.

5. Field theory methods can be applied to various 
statistical mechanics problems away from criti- 
cality: particularly interesting is the theory of the 
neutral Coulomb gas and of the dipole gas in two 
dimensions. 

6. The methods can be applied to Fermi systems in 
field theory as well as in equilibrium statistical 
mechanics. The understanding of the ground state 
in not exactly soluble models of spinless fermions 
in one dimension at small coupling is one of the 
results. And via the transfer matrix theory it has 
led to the understanding of nontrivial critical 
behavior in two-dimensional models that are not 
exactly soluble (like Ising next-nearest-neighbor or 
Ashkin-Teller model). Fermi systems are of 
particular interest also because in their analysis 
the large-fields problem is absent, but this great
technical advantage is somewhat offset by the
anticommutation properties of the fermionic
fields, which do not allow us to employ
probabilistic techniques in the estimates.

7. An outstanding open problem is whether the scalar
$\varphi^4$-theory is possible and nontrivial in dimension
$d=4$: this is a case of a renormalizable but not
asymptotically free theory. The conjecture that
many support is that the theory is necessarily trivial
(i.e., the function $Z_N(\Lambda,f)$ necessarily becomes a
Gaussian in the limit $N\to\infty$). One of the main
problems is the choice of the ultraviolet cut-off;
unlike the $d=2,3$ cases, in which the choice is a
matter of convenience, it does not seem that the
issue of triviality can be settled without a careful
analysis of the choice and of the role of the
ultraviolet cut-off.


8. Very interesting problems can be found in the
study of highly symmetric quantum fields: gauge
invariance presents serious difficulties to be
studied (rigorously or even heuristically) because
in its naive forms it is incompatible with
regularizations. Rigorous treatments have been
in some cases possible and in few cases it has been
shown that the naive treatment is not only not
rigorous but it leads to incorrect results.

9. In connection with item (8) an outstanding problem
is to understand relativistic pure gauge Higgs fields
in dimension d = 4: the latter have been shown to be
ultraviolet stable but the result has not been
followed by the study of the infrared limit.

10. The classical gauge theory problem is quantum
electrodynamics, QED, in dimension 4: it is a 
renormalizable theory (taking into account gauge 
invariance) and its perturbative series truncated 
after the first few orders give results that can be 
directly confronted with experience, giving very 
accurate predictions. Nevertheless, the model is 
widely believed to be incomplete: in the sense that, 
if treated rigorously, the result would be a field 
describing free noninteracting assemblies of 
photons and electrons. It is believed that QED 
can make sense only if embedded in a model with 
more fields, representing other particles (e.g., the 
standard model), which would influence the 
behavior of the electromagnetic field by providing 
an effective ultraviolet cutoff high enough not to
alter the predictions for observations on the
time and energy scales on which present (and,
possibly, future over a long time span) experiments
are performed. In dimension d = 3, QED is
super-renormalizable, once the gauge symmetry is 
properly taken into account, and it can be studied 
with the techniques described above for the scalar 
fields in the corresponding dimension. 


In general, constructive quantum field theory
seems to be in a deep crisis: the few solutions that
have been found concern very special problems and
are very demanding technically; the results obtained
have often not been considered to contribute
appreciably to any “progress.” And many consider
that the work dedicated to the subject is not worth
the results that one can even hope to obtain.
Therefore, in recent years, attempts have been
made to follow other paths: an attitude that in the
past did not, in general, lead to great
achievements but that is always tempting and
worth pursuing because the rare major advances
made in physics resulted precisely from such changes
of attitude, leaving aside developments requiring
work which was too technical and possibly hopeless:
just to mention an important case, one can recall
quantum mechanics, which disposed of all attempts
at understanding the quantization of the observed
atomic levels on the basis of refined developments of
classical electromagnetism.

For more details, the reader is referred to Nelson 
(1966), Guerra (1972), Glimm et al. (1973), Glimm 
and Jaffe (1981), Simon (1974), Benfatto et al. 
(1978, 2003), Aizenman (1982), Gawedzki and
Kupiainen (1983, 1985a, b), Balaban (1983), and 
Giuliani and Mastropietro (2005). 


See also: Algebraic Approach to Quantum Field Theory; 
Axiomatic Quantum Field Theory; Euclidean Field 
Theory; Integrability and Quantum Field Theory; 
Perturbation Theory and its Techniques; Quantum Field 
Theory: A Brief Introduction; Scattering, Asymptotic 
Completeness and Bound States.


Introduction


Contact geometry has been seen to underlie many
physical phenomena and is related to many other 
mathematical structures. Contact structures first 
appeared in the work of Sophus Lie on partial 
differential equations. They reappeared in Gibbs’ 
work on thermodynamics, Huygens’ work on 
geometric optics, and in Hamiltonian dynamics. 
More recently, contact structures have been seen to 
have relations with fluid mechanics, Riemannian 
geometry, and low-dimensional topology, and these 
structures provide an interesting class of subelliptic 
operators. 

After summarizing the basic definitions, exam- 
ples, and facts concerning contact geometry, this 
article discusses the connections between contact 
geometry and symplectic geometry, Riemannian 
geometry, complex geometry, analysis, and 
dynamics. The article ends by discussing two of 
the most-studied connections with physics: Hamil- 
tonian dynamics and geometric optics. References 
for other important topics in contact geometry 
(e.g., thermodynamics, fluid dynamics, holomorphic 
curves, and open book decompositions) 
are provided in the “Further reading” section. 


Basic Definitions and Examples 


A hyperplane field ξ on a manifold M is a codimension-1 
sub-bundle of the tangent bundle TM. Locally, 
a hyperplane field can always be described as the 
kernel of a 1-form. In other words, for every point in 
M there is a neighborhood U and a 1-form α defined 
on U such that the kernel of the linear map 
α_x : T_xM → R is ξ_x for all x in U. The form α is called 
a local defining form for ξ. A contact structure on a 
(2n + 1)-dimensional manifold M is a “maximally 
nonintegrable hyperplane field” ξ. The hyperplane 
field ξ is maximally nonintegrable if for any (and hence 
every) locally defining 1-form α for ξ the following 
equation holds: 


    \alpha \wedge (d\alpha)^n \neq 0 \qquad [1] 


(this means that the form is, pointwise, never equal 
to 0). Geometrically, the nonintegrability of ξ means 
that no hypersurface in M can be tangent to ξ along 
an open subset of the hypersurface. Intuitively, this 
means that the hyperplanes “twist too much” to be 
tangent to hypersurfaces (Figure 1). The pair (M, ξ) 
is called a contact manifold and any locally defining 
form α for ξ is called a contact form for ξ. 


Figure 1 The standard contact structure on R^3 given as the 
kernel of dz − y dx. Courtesy of Stephan Schönenberger. 


Example 1 The most basic example of a contact 
structure can be seen on R^{2n+1} as the kernel of the 
1-form α = dz − Σ_{i=1}^{n} y_i dx_i, where the coordinates 
on R^{2n+1} are (x_1, y_1, ..., x_n, y_n, z). This example is 
shown in Figure 1 when n = 1. 
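
The contact condition [1] for this example can be checked numerically when n = 1. The sketch below is only an illustration and relies on one assumption not stated in the text: on R^3, for a 1-form with coefficient vector A, the 3-form α ∧ dα equals (A · curl A) dx ∧ dy ∧ dz. The function names are hypothetical, and the curl is approximated by central differences.

```python
# Numerical check of the contact condition for alpha = dz - y dx on R^3.
# Assumption: alpha ^ d(alpha) = (A . curl A) dx^dy^dz, where A is the
# coefficient vector of alpha.

def alpha(x, y, z):
    """Coefficients (a, b, c) of alpha = a dx + b dy + c dz = dz - y dx."""
    return (-y, 0.0, 1.0)

def flat(x, y, z):
    """Coefficients of the integrable form dz, used for comparison."""
    return (0.0, 0.0, 1.0)

def curl(field, point, h=1e-5):
    """Central-difference curl of a vector field on R^3 at a point."""
    def partial(component, axis):
        plus, minus = list(point), list(point)
        plus[axis] += h
        minus[axis] -= h
        return (field(*plus)[component] - field(*minus)[component]) / (2 * h)
    return (
        partial(2, 1) - partial(1, 2),
        partial(0, 2) - partial(2, 0),
        partial(1, 0) - partial(0, 1),
    )

def contact_density(field, point):
    """Coefficient of dx^dy^dz in alpha ^ d(alpha) at the given point."""
    coeffs = field(*point)
    rot = curl(field, point)
    return sum(a * r for a, r in zip(coeffs, rot))

samples = [(0.0, 0.0, 0.0), (1.0, -2.0, 3.0), (-0.5, 4.0, 0.25)]
contact_values = [contact_density(alpha, p) for p in samples]
flat_values = [contact_density(flat, p) for p in samples]
```

For dz − y dx the density is the same nonzero constant at every sample point, whereas for dz (whose kernel is tangent to the planes z = const) it vanishes.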


Example 2 Recall that on the cotangent space of 
any n-manifold M, there is a canonical 1-form λ, 
called the Liouville form. If (q_1, ..., q_n) are local 
coordinates on M, then any 1-form can be expressed 
as Σ_i p_i dq_i, so (q_1, p_1, ..., q_n, p_n) are local 
coordinates on T*M. In these coordinates, 


    \lambda = \sum_{i=1}^{n} p_i \, \pi^* dq_i \qquad [2] 


where π: T*M → M is the natural projection 
map. The 1-jet space of M is the manifold 
J^1(M) = T*M × R and can be considered as a bundle 
over M. The 1-jet space has a natural contact 
structure given as the kernel of α = dz − λ, where z 
is the coordinate on R. Note that if M = R^n then we 
recover the previous example. 


Example 3 The (oriented) projectivized cotangent 
space of a manifold M is the set P*M of nonzero 
covectors in T*M where two covectors are identified 
if they differ by a positive real number, that is, 


    P^*M = (T^*M \setminus \{0\}) / \mathbb{R}_+ \qquad [3] 


where {0} is the zero section of T*M and R_+ denotes 
the positive real numbers. If M has a metric then P*M 
can be easily identified with the space of unit 
covectors. Considering P*M as unit covectors, we can 
restrict the canonical 1-form λ to P*M to get a 1-form 
α whose kernel defines a contact structure ξ on P*M. 
(Although there is no canonical contact form on P*M, 
the contact structure ξ is still well defined.) Note that if 
M is compact then so is P*M; so this gives examples of 
contact structures on compact manifolds. 


If α and α′ are two locally defining 1-forms for ξ, then 
there is a nonzero function f such that α′ = fα. Thus, 
α′ ∧ (dα′)^n = f^{n+1} α ∧ (dα)^n is a nonzero top-dimensional 
form on M and if n is odd then the orientation 
defined by the local defining form is independent of the 
actual form. Hence, when n is odd, a contact structure 
defines an orientation on M (this is independent of 
whether or not ξ is orientable!). If M had a preassigned 
orientation (and n is odd), then the contact structure is 
called “positive” if it induces the given orientation and 
“negative” otherwise. One should be careful when 
reading the literature, as some authors build 
positivity into their definition of contact structure, 
especially when n = 1. If there is a globally defined 
1-form α whose kernel defines ξ, then ξ is called 
transversely orientable or co-orientable. This is 
equivalent to the bundle ξ being orientable when n 
is odd or when n is even and M is orientable. In 
this article the discussion is restricted to transversely 
orientable contact structures. 
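
The scaling identity used above can be verified in two lines. The following is a sketch of the computation, written with α and α′ = fα for the two defining forms; the only fact used beyond the text is that α ∧ α = 0.

```latex
\begin{aligned}
d\alpha' &= df \wedge \alpha + f\, d\alpha, \\
\alpha' \wedge (d\alpha')^n
  &= f\alpha \wedge \bigl(f\, d\alpha + df \wedge \alpha\bigr)^n
   = f^{\,n+1}\, \alpha \wedge (d\alpha)^n .
\end{aligned}
```

Every term of the expansion that contains a factor df ∧ α is annihilated by the leading α, which leaves only the term f^n (dα)^n. Since f^{n+1} > 0 whenever n is odd, the sign of the top-dimensional form does not depend on the choice of defining form in that case.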

Suppose that α is a contact form for ξ; then eqn [1] 
implies that dα|_ξ is a symplectic form on ξ. This 
is one sense in which a contact structure is like an 
odd-dimensional analog of a symplectic structure. 

A submanifold L of a contact manifold (M, ξ) is 
called Legendrian if dim M = 2 dim L + 1 and 
T_pL ⊂ ξ_p for every p in L. 


Example 4 A fiber in the unit cotangent bundle 
with the contact structure from Example 3 is a 
Legendrian sphere. 


Example 5 Let f: M → R be a function. Then 
j^1(f)(q) = (q, df_q, f(q)) is a section of the 1-jet space 
J^1(M) of M; it is called the 1-jet of f. If s is any 
section of the 1-jet space, then it is Legendrian if and 
only if it is the 1-jet of a function. 
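
The last claim can be made explicit in local coordinates. The computation below is a sketch under the assumption that (q_i, p_i, z) are the coordinates of Example 2 on the 1-jet space, so that the contact form is α = dz − Σ p_i dq_i; the names σ_i and g for the components of a section are introduced here for illustration only.

```latex
\begin{aligned}
s(q) &= \bigl(q,\ \sigma(q),\ g(q)\bigr), \\
s^*\alpha &= dg - \sum_{i=1}^{n} \sigma_i\, dq_i
           = \sum_{i=1}^{n} \Bigl(\frac{\partial g}{\partial q_i} - \sigma_i\Bigr)\, dq_i .
\end{aligned}
```

The image of s has dimension n, so it is Legendrian exactly when s^*α = 0, that is, when σ_i = ∂g/∂q_i for all i. This is the statement that s is the 1-jet of the function g.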


This observation is the basis for Lie’s study of 
partial differential equations. More specifically, a 
first-order partial differential equation on M can be 
considered as giving an algebraic equation on J^1(M). 
Then, a section of J^1(M) satisfying this algebraic 
equation corresponds to the 1-jet of a solution to the 
original partial differential equation if and only if it 
is Legendrian. 

Recently, Legendrian submanifolds have been 
much studied. There are various classification results 
in three dimensions and several striking existence 
results in higher dimensions. 


Local Theory 


The natural equivalence between contact structures 
is contactomorphism. Two contact structures ξ_0 and 
ξ_1 on manifolds M_0 and M_1, respectively, are 
contactomorphic if there is a diffeomorphism 
f: M_0 → M_1 such that f_*(ξ_0) = ξ_1. All contact 
structures are locally contactomorphic. In particular, we 
have the following theorem. 


Theorem 1 (Darboux’s Theorem). Suppose ξ_i is a 
contact structure on the manifold M_i, i = 0, 1, and 
M_0 and M_1 have the same dimension. Given any 
points p_0 and p_1 in M_0 and M_1, respectively, there 
are neighborhoods N_i of p_i in M_i and a 
contactomorphism from (N_0, ξ_0|_{N_0}) to (N_1, ξ_1|_{N_1}). 
Moreover, if α_i is a contact form for ξ_i near p_i, then the 
contactomorphism can be chosen to pull α_1 back to α_0. 


Thus, locally all contact structures (and contact 
forms!) look like the one given in Example 1 above. 

Furthermore, contact structures are “local in 
time.” That is, compact deformations of contact 
structures do not produce new contact structures. 


Theorem 2 (Gray’s theorem). Let M be an oriented 
(2n + 1)-dimensional manifold and ξ_t, t ∈ [0, 1], a 
family of contact structures on M that agree off of 
some compact subset of M. Then there is a family of 
diffeomorphisms φ_t: M → M such that (φ_t)_* ξ_t = ξ_0. 


In particular, on a compact manifold, all 
deformations of contact structures come from 
diffeomorphisms of the underlying manifold. The 
theorem is not true if the contact structures do not 
agree off of a compact set. For example, there is a 
one-parameter family of noncontactomorphic 
contact structures on S^1 × R^2. 


Existence and Classification 


Establishing the existence of contact structures on 
closed odd-dimensional manifolds is quite difficult. 
However, Gromov has shown that contact structures on 
open manifolds obey an h-principle. To explain 
this, we note that if (M^{2n+1}, ξ) is a co-oriented 
contact manifold then the tangent bundle of M can 
be written as ξ ⊕ R and thus the structure group 
of TM can be reduced to U(n) (since ξ has 
a conformal symplectic structure on it). Such 
a reduction of the structure group is called an 
almost contact structure on M. Clearly, a contact 
structure on M induces an almost contact structure. 
If M is an open manifold, Gromov proved 
that the inclusion of the space of co-oriented 
contact structures on M into the space of almost 
contact structures on M is a weak homotopy 
equivalence. In particular, if an open manifold 
meets the necessary algebraic condition for the 
existence of an almost contact structure, then the 
manifold has a co-oriented contact structure. 

Lutz and Martinet proved a similar, but weaker, 
result for oriented closed 3-manifolds. More 
specifically, every closed oriented 3-manifold admits 
a co-oriented contact structure and in fact has at least 
one for every homotopy class of plane field. There has 
been much progress on classifying contact structures 
on 3-manifolds and here an interesting dichotomy has 
appeared. Contact structures break into one of two 
types: tight or overtwisted. Overtwisted contact 
structures obey an h-principle and are in general easy 
to understand. Tight contact structures have a more 
subtle, geometric nature. In higher dimensions there is 
much less known about the existence (or classification) 
of contact structures. 


Relations with Symplectic Geometry 


Let (X, ω) be a symplectic manifold. A vector field v 
satisfying 


    L_v \omega = \omega \qquad [4] 


(where L_v ω is the Lie derivative of ω in the direction 
of v) is called a symplectic dilation. A compact 
hypersurface M in (X, ω) is said to have “contact 
type” if there exists a symplectic dilation v in a 
neighborhood of M that is transverse to M. Given a 
hypersurface M in (X, ω), the characteristic line field 
LM in the tangent bundle of M is the symplectic 
complement of TM in TX. (Since M is codimension 1, 
it is coisotropic; thus, the symplectic complement lies 
in TM and is one dimensional.) 


Theorem 3 Let M be a compact hypersurface in a 
symplectic manifold (X, ω) and denote the inclusion 
map i: M → X. Then M has contact type if and only 
if there exists a 1-form α on M such that dα = i^*ω 
and the form α is never zero on the characteristic 
line field. 


If M is a hypersurface of contact type, then the 
1-form α is obtained by contracting the symplectic 
dilation v into the symplectic form: α = ι_v ω. It is 
easy to verify that the 1-form α is a contact form 
on M. Thus, a hypersurface of contact type in a 
symplectic manifold inherits a co-oriented contact 
structure. 

Given a co-orientable contact manifold (M, ξ), its 
symplectization Symp(M, ξ) = (X, ω) is constructed 
as follows. The manifold X = M × (0, ∞), and given 
a global contact form α for ξ the symplectic 
form is ω = d(tα), where t is the coordinate on (0, ∞). 
(The symplectization is also equivalently defined as 
(M × R, d(e^t α)).) 


Example 6 The symplectization of the standard 
contact structure on the unit cotangent bundle 
(see Example 3) is the standard symplectic structure 
on the complement of the zero section in the 
cotangent bundle. 


The symplectization is independent of the choice 
of contact form α. To see this, fix a co-orientation 
for ξ and note that the manifold X can be 
identified (in many ways) with the sub-bundle of 
T*M whose fiber over x ∈ M is 


    \{ \beta \in T^*_x M : \beta(\xi_x) = 0 \text{ and } \beta > 0 \text{ on vectors positively transverse to } \xi_x \} \qquad [5] 


and restricting dλ to this subspace yields a symplectic 
form ω, where λ is the Liouville form on T*M 
defined in Example 2. A choice of contact form α 
fixes an identification of X with the sub-bundle of 
T*M under which d(tα) is taken to dλ. 

The vector field v = t ∂/∂t on (X, ω) is a symplectic 
dilation that is transverse to M × {1} ⊂ X. Clearly, 
ι_v ω|_{M×{1}} = α. Thus, we see that any co-orientable 
contact manifold can be realized as a hypersurface 
of contact type in a symplectic manifold. In 
summary, we have the following theorem. 


Theorem 4 If (M, ξ) is a co-oriented contact 
manifold, then there is a symplectic manifold 
Symp(M, ξ) in which M sits as a hypersurface of 
contact type. Moreover, any contact form α for ξ 
gives an embedding of M into Symp(M, ξ) that 
realizes M as a hypersurface of contact type. 
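
The dilation property behind this theorem can be checked directly. The following sketch assumes the model X = M × (0, ∞) with ω = d(tα) and takes v = t ∂/∂t; in the equivalent model (M × R, d(e^s α)) the same role is played by ∂/∂s. Cartan's formula L_v = d ι_v + ι_v d is the only tool used.

```latex
\begin{aligned}
\omega &= d(t\alpha) = dt \wedge \alpha + t\, d\alpha, \\
\iota_v \omega &= t\,\alpha \qquad \text{for } v = t\,\partial/\partial t, \\
L_v \omega &= d(\iota_v \omega) + \iota_v\, d\omega = d(t\alpha) = \omega, \\
\omega^{n+1} &= (n+1)\, t^{n}\, dt \wedge \alpha \wedge (d\alpha)^n \neq 0 .
\end{aligned}
```

The last line shows that nondegeneracy of ω is equivalent to the contact condition [1] for α, and the second line restricted to t = 1 recovers the contact form α on M × {1}.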


We also note that all the hypersurfaces of contact 
type in (X, ω) look locally, in X, like a contact 
manifold sitting inside its symplectization. 


Theorem 5 Given a compact hypersurface M of 
contact type in a symplectic manifold (X, ω) with the 
symplectic dilation given by v, there is a neighborhood 
of M in X symplectomorphic to a neighborhood 
of M × {1} in Symp(M, ξ), where the 
symplectization is identified with M × (0, ∞) using 
the contact form α = ι_v ω|_M and ξ = ker α. 


The Reeb Vector Field and Riemannian 
Geometry 


Let (M, ξ) be a contact manifold. Associated to a 
contact form α for ξ is the Reeb vector field v_α. 
This is the unique vector field satisfying 


    \iota_{v_\alpha} d\alpha = 0 \quad \text{and} \quad \alpha(v_\alpha) = 1 \qquad [6] 
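
As a worked instance of this definition, consider the standard contact form of Example 1 with n = 1. The short computation below is a sketch; it uses only the two defining conditions of the Reeb vector field.

```latex
\begin{aligned}
\alpha &= dz - y\,dx, \qquad d\alpha = dx \wedge dy, \\
\iota_{\partial_z}\, d\alpha &= 0, \qquad \alpha(\partial_z) = 1 .
\end{aligned}
```

Hence the Reeb vector field of dz − y dx is ∂/∂z: its flow translates along the z-axis and visibly preserves the form. Rescaling the contact form by a nonconstant function generally changes the Reeb field, which is why it is attached to the form and not to the contact structure alone.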


One may readily check that v_α is transverse to the 
contact hyperplanes and the flow of v_α preserves ξ 
(in fact, it preserves α). These two conditions 
characterize Reeb vector fields; that is, a vector 
field v is the Reeb vector field for some contact form 
for ξ if and only if it is transverse to ξ and its flow 
preserves ξ. 

The fundamental question concerning a Reeb vector 
field asks whether its flow has a (contractible) periodic 
orbit. A paraphrase of the Weinstein conjecture 
asserts a positive answer to this question. Most 
progress on this conjecture has been made in 
dimension 3, where H. Hofer has proved the 
existence of periodic orbits for all Reeb fields on S^3 
and on 3-manifolds with essential spheres 
(i.e., embedded S^2’s that do not bound a 3-ball in 
the manifold). Relations with Hamiltonian dynamics 
are discussed below. 

Recall, from Example 3, that a Riemannian metric 
g on a manifold M provides an identification of the 
(oriented) projectivized cotangent bundle P*M with 
the unit cotangent bundle. Considered as a subset of 
T*M, P*M inherits not only a contact structure but 
also a contact form α (by restricting the Liouville 
form). Let v_α be the associated Reeb vector field. 
The metric g also provides an identification of the 
tangent and cotangent bundles of M. Thus, P*M 
may be considered as the unit tangent bundle of M. 
Let w_g be the vector field on the unit tangent bundle 
generating the geodesic flow on M. 


Theorem 6 The Reeb vector field v_α is identified 
with the geodesic flow field w_g when P*M is identified 
with the unit tangent bundle using the metric g. 


Relations with Complex Geometry 
and Analysis 


Let X be a complex manifold with boundary and 
denote the induced complex structure on TX by J. 
The complex tangencies ξ to M = ∂X are described 
by the equation dφ ∘ J = 0, where φ is a function 
defined in a neighborhood of the boundary such that 
0 is a regular value and φ^{-1}(0) = M. The form 
L(v, w) = −d(dφ ∘ J)(v, Jw), for v, w ∈ ξ, is called 
the Levi form, and when L(v, w) is positive 
(negative) definite, then X is said to have strictly 
pseudoconvex (pseudoconcave) boundary. The 
hyperplane field ξ will be a contact structure if and 
only if d(dφ ∘ J) is a nondegenerate 2-form on ξ (if 
and only if L(v, w) is definite). A well-studied source 
of examples comes from Stein manifolds. 


Example 7 Let X be a complex manifold and 
again let J denote the induced complex structure 
on TX. From a function φ: X → R, we can define a 
2-form ω = −d(dφ ∘ J) and a symmetric form 
g(v, w) = ω(v, Jw). If this symmetric form is positive 
definite, the function φ is called “strictly 
plurisubharmonic.” The manifold X is a Stein manifold if X 
admits a proper strictly plurisubharmonic function 
φ: X → R. An important result says that X is Stein 
if and only if it can be realized as a closed complex 
submanifold of C^n. Clearly any noncritical level set 
of φ gives a contact manifold. 


Contact manifolds also give rise to an interesting 
class of differential operators. Specifically, a contact 
structure ξ on M defines a symbol-filtered algebra of 
pseudodifferential operators Ψ*_ξ(M), called the 
“Heisenberg calculus.” Operators in this algebra 
are modeled on smooth families of convolution 
operators on the Heisenberg group. An important 
class of operators of this type are the 
“sum-of-squares” operators. Locally, the highest-order part 
of such an operator takes the form 


    L = \sum_{j=1}^{2n} v_j^2 + i a\, v_\alpha \qquad [7] 


where {v_1, ..., v_{2n}} is a local framing for the contact 
field and v_α is a Reeb vector field. This operator 
belongs to Ψ*_ξ(M) and is subelliptic for a outside a 
discrete set. 


Hamiltonian Dynamics 


Given a symplectic manifold (X, ω), a function 
H: X → R will be called a Hamiltonian. (Only 
autonomous Hamiltonians are discussed here.) The 
unique vector field v_H satisfying 


    \iota_{v_H} \omega = -dH 


is called the Hamiltonian vector field associated to 
H. Many problems in classical mechanics can be 
formulated in terms of studying the flow of v_H for 
various H. 


Example 8 If (X, ω) = (R^{2n}, dλ), where λ is from 
Example 2, then the flow of the Hamiltonian vector 
field is given by 


    \dot q_i = \frac{\partial H}{\partial p_i}, \qquad \dot p_i = -\frac{\partial H}{\partial q_i} 


A standard fact says that the flow of v_H preserves 
the level sets of H. 
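
The conservation of H along the flow can be observed numerically. The sketch below is illustrative only: it assumes the standard Hamilton equations dq/dt = ∂H/∂p, dp/dt = −∂H/∂q on R^2, picks the pendulum Hamiltonian H = p²/2 − cos q as an example that is not taken from the text, and integrates with a classical fourth-order Runge-Kutta step. All function names are hypothetical.

```python
import math

def hamiltonian(q, p):
    """Pendulum Hamiltonian H(q, p) = p^2/2 - cos(q) (illustrative choice)."""
    return 0.5 * p * p - math.cos(q)

def vector_field(q, p):
    """Hamiltonian vector field: (dH/dp, -dH/dq)."""
    return p, -math.sin(q)

def rk4_step(q, p, dt):
    """One classical Runge-Kutta step for the Hamiltonian flow."""
    k1q, k1p = vector_field(q, p)
    k2q, k2p = vector_field(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = vector_field(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = vector_field(q + dt * k3q, p + dt * k3p)
    q_next = q + dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
    p_next = p + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return q_next, p_next

def max_energy_drift(q0, p0, dt=0.01, steps=1000):
    """Largest deviation of H from its initial value along the trajectory."""
    h0 = hamiltonian(q0, p0)
    q, p = q0, p0
    drift = 0.0
    for _ in range(steps):
        q, p = rk4_step(q, p, dt)
        drift = max(drift, abs(hamiltonian(q, p) - h0))
    return drift

drift = max_energy_drift(1.0, 0.0)
```

The computed drift is tiny and reflects only the discretization error of the integrator. For the exact flow, H is constant because dH(v_H) = −ω(v_H, v_H) = 0.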


Theorem 7 If M is a level set of H corresponding 
to a regular value and M is a hypersurface of contact 
type, then the trajectories of v_H and of the Reeb 
vector field (associated to M in Theorem 3) agree. 


Thus, under suitable hypotheses, Hamiltonian 
dynamics is a reparametrization of Reeb dynamics. 
In particular, searching for periodic orbits in such a 
Hamiltonian system is equivalent to searching for 
periodic orbits in a Reeb flow. Thus, in this context, 
Weinstein’s conjecture asserts a positive answer to 
the question: Does the Hamiltonian flow along a 
regular level set of contact type have a periodic 
orbit? Viterbo proved that the answer is yes if the 
hypersurface is compact and in (R^{2n}, ω = dλ). Other 
progress has been made by studying Reeb dynamics. 


Geometric Optics 


In this section, we study the propagation of light (or 
various other disturbances) in a medium (for the 
moment, we do not specify the properties of this 
medium). The medium will be given by a 
three-dimensional manifold M. Given a point p in M and 
t > 0, let I_p(t) be the set of all points to which light 
can travel in time ≤ t. The wave front of p at time t 
is the boundary of this set and is denoted as 
Φ_p(t) = ∂I_p(t). 


Theorem 8 (Huygens’ principle). Φ_p(t + t′) is the 
envelope of the wave fronts Φ_q(t′) for all q ∈ Φ_p(t). 


This is best understood in terms of contact 
geometry. Let π: (T*M \ {0}) → P*M be the natural 
projection (see Example 3) and let S be any smooth 
sub-bundle of T*M \ {0} that is transverse to the radial 
vector field in each fiber and for which π|_S: S → P*M 
is a diffeomorphism. The restriction of the Liouville 
form to S gives a contact form α and a corresponding 
Reeb vector field v. Given a subset F of M with a 
well-defined tangent space at every point, set 


    L_F = \{ p \in S : \pi(p) \in F \text{ and } p(w) = 0 \text{ for all } w \in T_{\pi(p)}F \} \qquad [8] 


The set L_F is a Legendrian submanifold of S and is 
called the “Legendrian lift” of F. If L is a generic 
Legendrian submanifold in S, then π(L) is called the 
front projection of L and L_{π(L)} = L. Given a Legendrian 
submanifold L, let Ψ_t(L) be the Legendrian submanifold 
obtained from L by flowing along v for time t. 


Example 9 Given a metric g on M, Fermat’s 
principle says that light travels along geodesics. 
Thus, if S is the unit cotangent bundle, then using g 
to identify the geodesic flow with the Reeb flow 
one sees that light will travel along trajectories 
of the Reeb vector field. Given a point p in M, 
the Legendrian submanifold L_p is a sphere sitting 
in T*_pM. The Huygens principle follows from the 
observation that Φ_p(t) = π(Ψ_t(L_p)). 


Using the more general S discussed above, one can 
generalize this example to light traveling in a medium 
that is nonhomogeneous (i.e., the speed differs from 
point to point in M) and anisotropic (i.e., the speed 
differs depending on the direction of travel). 

See also: Hamiltonian Fluid Dynamics; Integrable Systems 
and Recursion Operators on Symplectic and Jacobi 
Manifolds; Minimax Principle in the Calculus of Variations. 

Introduction 


Control Theory is an interdisciplinary research area, 
bridging mathematics and engineering, dealing with 
physical systems which can be “controlled,” that is, 
whose evolution can be influenced by some external 
agent. A general model can be written as 


    y(t) = A(t, y(0), u(\cdot)) \qquad [1] 


where y describes the state variables, y(0) the initial 
condition, and u(-) the control function. Thus, eqn 
[1] means that the state at time t depends on the 
initial condition but also on some parameters u 
which can be chosen as function of time. To be 
precise, there are some control problems which are 
not of evolutionary type; however, in this presenta- 
tion we restrict ourselves to this case. 

One has to distinguish between the control set U, where 
the control function can take values, u(t) ∈ U, and the 
space of control functions 𝒰, to which each control 
function should belong, u(·) ∈ 𝒰. Thus, for example, 
we may have U = R^m and 𝒰 = L^∞([0, T], R^m). 


There are various problems one can formulate 
regarding systems of type [1], among which: 


Controllability Given any two states y_0 and y_1, 
determine a control function u(·) such that for 
some time t > 0 we have y_1 = A(t, y_0, u(·)). 

Optimal control Consider a cost function J(y(·), 
u(·)) depending both on the evolutions of y and u 
and determine a control function ũ(·) and a 
trajectory ỹ(t) = A(t, y_0, ũ(·)) such that ỹ(·) steers 
the system from y_0 to y_1, as before, and the cost J 
is minimized (or maximized). 

Stabilization We say that ȳ is an equilibrium if 
there exists ū ∈ U such that A(t, ȳ, ū) = ȳ for every 
t > 0 (here ū also denotes the control function that is 
constant in time). Determine the control u as a 
function of the state y so that ȳ is a (Lyapunov) 
stable equilibrium for the uncontrolled dynamical 
system y(t) = A(t, y(0), u(y(·))). 

Observability Assume that we can observe not the 
state y, but a function φ(y) of the state. Determine 
conditions on φ so that the state y can be 
reconstructed from the evolution of φ(y) choosing 
u(·) suitably. 


For the sake of simplicity, we restrict ourselves 
mainly to the first two problems and just mention 


some facts about the others. Also, we focus on two 
cases: 


Control of ordinary differential equations (ODEs) In 
this case t ∈ R, y ∈ R^n, U is a set, typically 
U ⊂ R^m, and A is determined by a controlled ODE 


    \dot y = f(t, y, u) \qquad [2] 


A typical example in mathematical physics is the 
control of mechanical systems (Bloch 2003, Bullo 
and Lewis 2005). 


Control of partial differential equations (PDEs) In 
this case t € R,x € R”, y(x) belongs to a Banach 
functional space, for example, H’(R”*', R), U is a 
functional space, and A is determined by a 
controlled PDE, 


Cha ya eee eee | eee | ee [3] 


A typical example in mathematical physics is the 
control of wave equation using boundary condi- 
tions, see below. 


There are various other possible situations we do 
not treat here: “stochastic control,” when y is a random 
variable and A is defined by a (controlled) stochastic 
differential equation; “discrete time control,” 
where t ∈ N; “hybrid control,” where t and y may have 
both discrete and continuous components, and so on. 

As shown above, the control law can be assigned 
in (at least) two basically different ways. In 
open-loop form, as a function of time: t ↦ u(t), and in 
closed-loop form or feedback, as a function of the 
state: y ↦ u(y). For example, in optimal control we 
look for a control u(t) in open-loop form, while in 
stabilization we search for a feedback control u(y). 
The open-loop control depends on y(0), while a 
feedback control can stabilize regardless of the 
initial condition. 


Example 1 A point with unit mass moves along a 
straight line; if a controller is able to apply an 
external force u, then, calling y_1(t), y_2(t), respectively, 
the position and the velocity of the point at 
time t, the motion is described by the control system 


    (\dot y_1, \dot y_2) = (y_2, u) \qquad [4] 

It is easy to check that the feedback control 
u(y_1, y_2) = −y_1 − y_2 stabilizes the system asymptotically 
to the origin, that is, for every initial data 
(ȳ_1, ȳ_2), the solution of the corresponding Cauchy 
problem satisfies lim_{t→∞} (y_1, y_2)(t) = (0, 0). 
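
The stabilizing effect of this feedback can be observed in a short simulation. The sketch below is an illustration under stated assumptions: it integrates the closed-loop system dy_1/dt = y_2, dy_2/dt = −y_1 − y_2 with the explicit Euler method, and the step size, time horizon, and initial data are arbitrary choices that are not taken from the text.

```python
def closed_loop(y1, y2):
    """Right-hand side of the double integrator under u = -y1 - y2."""
    u = -y1 - y2
    return y2, u

def simulate(y1, y2, t_final=30.0, dt=1e-3):
    """Explicit Euler integration of the closed-loop system."""
    steps = int(round(t_final / dt))
    for _ in range(steps):
        d1, d2 = closed_loop(y1, y2)
        y1, y2 = y1 + dt * d1, y2 + dt * d2
    return y1, y2

initial_data = [(1.0, 0.0), (-3.0, 2.0), (0.5, -4.0)]
final_states = [simulate(y1, y2) for (y1, y2) in initial_data]
final_norms = [(a * a + b * b) ** 0.5 for (a, b) in final_states]
```

All three trajectories end very close to the origin. This matches the linear analysis: the closed-loop matrix has eigenvalues (−1 ± i√3)/2, whose real part −1/2 is negative, so solutions decay like e^{−t/2}.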

Another simple problem consists in driving the 
point to the origin with zero velocity in minimum 
time from given initial data. It is quite easy to see 
that the optimal strategy is to accelerate towards the 
origin with maximum force (normalized here to 
|u| ≤ 1) on some interval [0, t] 
and then to decelerate with maximum force to reach 
the origin at velocity zero. The set of optimal 
trajectories is depicted in Figure 1a: they can 
be obtained using the following discontinuous 
feedback, see Figure 1b. Define the curves 
ζ^± = {(y_1, y_2): ∓y_2 > 0, y_1 = ±y_2^2/2} and let ζ be 
defined as the union ζ^+ ∪ ζ^- ∪ {0}. We define A^+ to be 
the region below ζ and A^- the one above. Then the 
feedback is given by 


    u(y) = \begin{cases} +1 & \text{if } (y_1, y_2) \in A^+ \cup \zeta^+ \\ -1 & \text{if } (y_1, y_2) \in A^- \cup \zeta^- \\ 0 & \text{if } (y_1, y_2) = (0, 0) \end{cases} 


Figure 1 Example 1. The simplest example of (a) optimal 
synthesis and (b) corresponding feedback. 
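
The discontinuous feedback can be sketched in code. The block below is an illustration only and rests on an assumption: the force is bounded by |u| ≤ 1, in which case the switching curve through the origin is y_1 = −y_2|y_2|/2, and the sign of s = y_1 + y_2|y_2|/2 tells whether a point lies above or below it. The Euler step size and the initial point are arbitrary choices, and the function names are hypothetical.

```python
def time_optimal_feedback(y1, y2):
    """Bang-bang feedback for the double integrator, assuming |u| <= 1."""
    s = y1 + 0.5 * y2 * abs(y2)   # s = 0 exactly on the switching curve
    if s < 0.0:
        return 1.0                # region below the curve: push right
    if s > 0.0:
        return -1.0               # region above the curve: push left
    if y2 < 0.0:
        return 1.0                # branch of the curve with y2 < 0
    if y2 > 0.0:
        return -1.0               # branch of the curve with y2 > 0
    return 0.0                    # at the origin: stay at rest

def closest_approach(y1, y2, t_final=3.0, dt=1e-3):
    """Smallest distance to the origin along an Euler-discretized run."""
    best = (y1 * y1 + y2 * y2) ** 0.5
    for _ in range(int(round(t_final / dt))):
        u = time_optimal_feedback(y1, y2)
        y1, y2 = y1 + dt * y2, y2 + dt * u
        best = min(best, (y1 * y1 + y2 * y2) ** 0.5)
    return best

miss_distance = closest_approach(1.0, 0.0)
```

Starting at rest from y_1 = 1, the exact optimal trajectory decelerates until it meets the switching curve at (1/2, −1) and then follows the curve to the origin, for a total time of 2. The discretized run passes within a small distance of the origin at about that time; it does not stop there exactly, since a time-stepped bang-bang law chatters around the switching curve.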


Example 2 Consider a (one-dimensional) vibrating 
string of unit length with a fixed endpoint. The 
model for the motion of the displacement of the 
string with respect to the rest position is given by 


    y_{tt} - \Delta y = 0, \qquad y(t, 0) = 0 \qquad [5] 

with initial data 

    y(0, \cdot) = y_0, \qquad y_t(0, \cdot) = y_1 \qquad [6] 


Assume that we can control the position of the 
second endpoint; then, 


    y(t, 1) = u(t) \qquad [7] 

for some control function u(·) ∈ 𝒰. 


Let us introduce another key concept: the reachable 
set at time t from ȳ is the set 


    R(t; \bar y) = \{ A(t, \bar y, u(\cdot)) : u(\cdot) \in \mathcal{U} \} 


Various problems can be formulated in terms of 
reachable sets, for example, controllability requires 
that for every ȳ the union of all R(t; ȳ) as t → ∞ 
includes the entire space. The dependence of R(t; ȳ) 
on time t and on the set of controls 𝒰 is also a 
subject of investigation: one may ask whether the 
same points in R(t; ȳ) can be reached by using 
controls which are piecewise constant, or take 
values within some subsets of U. 


638 Control Problems in Mathematical Physics 


Control of ODEs 


For most proofs we refer to Agrachev and Sachkov 
(2004) and Sontag (1998). 


Controllability 
Consider first the case of a linear system: 


    \dot y = Ay + Bu, \qquad u \in U, \qquad y(0) = y_0 \qquad [8] 


where y, y_0 ∈ R^n, U ⊂ R^m, A is an n × n matrix and 
B an n × m matrix. We have the following property 
of reachable sets: 


Theorem 1 If U is compact convex then the 
reachable set R(t) for [8] is compact and convex. 


A control system [8] is controllable if taking 
U = R^m we have R(t) = R^n for every t > 0. By 
linearity, this is equivalent to requiring the reachable 
set to be a neighborhood of the origin in case of 
bounded controls. Define the controllability matrix 
to be the n × nm matrix 


    C(A, B) = (B, AB, \ldots, A^{n-1}B) 

Controllability is characterized by the following: 


Theorem 2 (Kalman controllability theorem). The 
linear system [8] is controllable if and only if 
rank(C(A, B)) = n. 
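
The rank test is easy to carry out for small systems. The following sketch is an illustration, not a numerically robust implementation: it builds C(A, B) = (B, AB, ..., A^{n-1}B) with plain lists and computes the rank by exact Gaussian elimination over the rationals. The helper names and the two test systems (the double integrator of Example 1 and a system with A equal to the identity) are choices made here for illustration.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def controllability_matrix(a, b):
    """Horizontal concatenation (B, AB, ..., A^(n-1) B)."""
    n = len(a)
    blocks = [b]
    for _ in range(n - 1):
        blocks.append(mat_mul(a, blocks[-1]))
    return [sum((block[i] for block in blocks), []) for i in range(n)]

def rank(matrix):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

# Double integrator of Example 1: y1' = y2, y2' = u.
double_integrator_rank = rank(controllability_matrix([[0, 1], [0, 0]], [[0], [1]]))
# A = identity, B = e1: the second state is never influenced by u.
decoupled_rank = rank(controllability_matrix([[1, 0], [0, 1]], [[1], [0]]))
```

The double integrator has full rank 2 and is therefore controllable, while the second system has rank 1: its second coordinate evolves independently of the control.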


For linear systems, there exists a duality between 
controllability and observability in the sense of the 
following theorem: 


Theorem 3 Consider the linear control system [8] 
and assume that we observe the variable z(y) = Cy for 
some p × n matrix C. Then, observability holds if 
and only if the linear system ẏ = A^T y + C^T v is 
controllable. 


There exists no characterization of controllability 
for nonlinear systems as for linear ones, but we have 
the linearization result: 


Theorem 4 A nonlinear system is locally control- 
lable if its linearization is. The converse is false. 


There are many results for the important class of 
control-affine systems 


    \dot y = f_0(y) + \sum_{i=1}^{m} f_i(y)\, u_i \qquad [9] 


where f_0, ..., f_m are smooth vector fields on R^n and 
U = R^m. In general, there exists no explicit 
representation for the trajectories of [9] in terms of integrals 
of the control, as happens for linear systems. Still, a 
rich mathematical theory has been developed by applying 
techniques and ideas from differential geometry: 


the so-called geometric control theory. The main idea 
is that controllability (and properties of optimal 
trajectories) is determined by the Lie algebra gener- 
ated by vector fields f;. For example: 


Theorem 5 (Lie-algebraic rank condition). Let 𝓛 
be the Lie algebra generated by the vector fields 
f_i, i = 1, ..., m, and assume f_0 = 0. If 𝓛(y) is of 
dimension n at every point y then the system is 
controllable. 


We refer to Agrachev and Sachkov (2004) 
and Jurdjevic (1997) for general presentation of 
geometric control theory and give a simple example 
to show how Lie brackets characterize reachable 
directions. 


Example 3 Consider the Brockett integrator 


    \dot y_1 = u_1, \qquad \dot y_2 = u_2, \qquad \dot y_3 = u_1 y_2 - u_2 y_1 


Starting from the origin, using constant controls, we 
can move along curves tangent to the y_1 y_2 plane. 
However, let f_1 = (1, 0, y_2) and f_2 = (0, 1, −y_1) (fields 
corresponding to constant controls); then their Lie 
bracket is given by 


    [f_1, f_2](0) = (Df_2 \cdot f_1 - Df_1 \cdot f_2)(0) = (0, 0, -2) 


Moving for time t first along the integral curve of f_1, 
then of f_2, then of −f_1, and finally of −f_2, we reach 
a point t^2 [f_1, f_2](0) + o(t^2) along the vertical 
direction y_3. This corresponds to saying that the system 
satisfies the Lie-algebraic rank condition (LARC). 
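
The four-step maneuver can be reproduced exactly, because for constant controls the Brockett integrator has a closed-form flow: y_1 and y_2 are affine in time and the quadratic contributions to y_3 cancel. The sketch below is an illustration; the function names are hypothetical and the time t = 0.5 is an arbitrary choice.

```python
def flow(y, u, t):
    """Exact flow of the Brockett integrator for a constant control u = (u1, u2)."""
    y1, y2, y3 = y
    u1, u2 = u
    # dy3/dt = u1*(y2 + u2*s) - u2*(y1 + u1*s) = u1*y2 - u2*y1, constant in s.
    return (y1 + u1 * t, y2 + u2 * t, y3 + (u1 * y2 - u2 * y1) * t)

def commutator_loop(t):
    """Follow f1, f2, -f1, -f2 for time t each, starting at the origin."""
    y = (0.0, 0.0, 0.0)
    for u in [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]:
        y = flow(y, u, t)
    return y

end_point = commutator_loop(0.5)
```

The loop returns to y_1 = y_2 = 0 but ends at height y_3 = −2t², that is, at t² times the bracket (0, 0, −2) computed in the example. For this particular system the remainder term vanishes identically.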


Optimal Control 


The theory of optimal control has developed in three 
main directions: 

Existence of optimal controls, under various 
assumptions on L,f,U. When the sets F(t, y) are 
convex, optimal solutions can be constructed follow- 
ing the direct method of Tonelli for the calculus of 
variations, that is, as limits of minimizing sequences: 
the two main ingredients are compactness and lower 
semicontinuity. If convexity does not hold, existence 
is not guaranteed in general, but only in special cases. 

Necessary conditions for the optimality of a 
control u(-). The major result in this direction is 
the celebrated “Pontryagin maximum principle” 
(PMP) which extends the Euler-Lagrange equation 
to control systems, and the Weierstrass necessary 
conditions for a strong local minimum in the 
calculus of variations. Various extensions and other 
necessary conditions are now available (Agrachev 
and Sachkov 2004). 

Sufficient conditions for optimality. The standard 
procedure resorts to embedding the optimal control 
problem in a family of problems, obtained by 


varying the initial conditions. One defines the value 
function V by 


    V(t, y) = \inf J(y(\cdot), u(\cdot)) 

where the inf is taken over the set of trajectories and 
controls satisfying y(t) = y. Under suitable assumptions, 
V is the solution to a first-order Hamilton–Jacobi 
PDE. The lack of regularity of the value function V has 
long provided a major obstacle to a rigorous mathematical 
analysis, solved by the theory of viscosity solutions 
(Bardi and Capuzzo Dolcetta 1997). Another method 
consists in building an optimal synthesis, that is, a 
collection of trajectory–control pairs. 


Pontryagin maximum principle Consider a general 
autonomous control system: 


    \dot y = f(y, u) \qquad [10] 

where y ∈ R^n and u ∈ U, a compact subset of R^m. We 
assume enough regularity of f to guarantee existence 
and uniqueness of trajectories for every u(·) ∈ 𝒰. For 
a fixed T > 0, an optimal control problem in Mayer 
form is given by 


    \min_{u(\cdot) \in \mathcal{U}} \psi(y(T, u)), \qquad y(0) = \bar y \qquad [11] 


where ψ is the final cost and ȳ the initial condition. 
More generally, one can consider also the Lagrangian 
cost ∫ L(y, u) dt and reduce to this case by 
adding a variable y_0 with y_0(0) = 0 and ẏ_0 = L. 

The well-known PMP provides, under suitable 
assumptions, a necessary condition for optimality in 
terms of a lift of the candidate optimal trajectory to 
the cotangent bundle. For problems as [11], PMP 
can be stated as follows: 


Theorem 6 Let $u^*(\cdot)$ be a (bounded) admissible 
control whose corresponding trajectory $y^*(\cdot) = y(\cdot, u^*)$ 
is optimal. Call $p : [0, T] \to \mathbb{R}^n$ the solution of the 
adjoint linear equation 

\[ \dot p(t) = -p(t) \cdot D_y f(y^*(t), u^*(t)), \qquad p(T) = \nabla\psi(y^*(T)) \]  [12] 

Then the maximality condition 

\[ p(t) \cdot f(y^*(t), u^*(t)) = \max_{\omega \in U} p(t) \cdot f(y^*(t), \omega) \]  [13] 

holds for almost every time $t \in [0, T]$. 


Notice that the conclusion of the theorem can be 
interpreted by saying that the pair $(y^*, p)$ satisfies the 
system: 

\[ \dot y = \frac{\partial H(y^*, p, u^*)}{\partial p}, \qquad \dot p = -\frac{\partial H(y^*, p, u^*)}{\partial y} \]

where $H(y, p, u) = \langle p, f(y, u) \rangle$. This is a pseudo-Hamiltonian 
system, since H also depends on $u^*$. 
Alternatively, one can define the maximized 
Hamiltonian 

\[ \mathcal{H}(y, p) = \max_{u \in U} \langle p, f(y, u) \rangle \]

but $\mathcal{H}$ may fail to be smooth. Another difficulty lies 
in the fact that an initial condition is given for y and 
a final condition is given for p. 

The proof of PMP relies on a special type of 
variations, called needle variations, of a reference 
trajectory. Given a candidate optimal control $u^*$ and 
corresponding trajectory $y^*$, a time $\tau$ of approximate 
continuity for $f(y^*(\cdot), u^*(\cdot))$ and $\omega \in U$, a needle 
variation is a family of controls $u_\varepsilon$ obtained 
by replacing $u^*$ with $\omega$ on the interval $[\tau - \varepsilon, \tau]$. 
A needle variation gives rise to a variation v of the 
trajectory satisfying the variational equation 

\[ \dot v(t) = D_y f(y^*(t), u^*(t)) \cdot v(t) \]  [14] 

in the classical sense only after time $\tau$. Recently Piccoli 
and Sussmann (2000) introduced a setting in which 
needle and other variations happen to be 
differentiable. 

One may also consider some final (or initial) 
constraint: 


\[ (T, y(T)) \in S \]  [15] 

where $S \subset \mathbb{R} \times \mathbb{R}^n$ (and T is not fixed). In this case, both 
the final condition for p and the proof of PMP are more 
complicated. It is interesting to note the many 
connections between PMP and the classical mechanics 
framework, well illustrated by Bloch (2003) and 
Jurdjevic (1997). 


Value function and HJB equation In this section 
we consider the minimization problem 

\[ \inf \psi(T, y(T, u)) \]  [16] 

for the control system 

\[ \dot y = f(t, y, u), \qquad u(t) \in U \ \text{a.e.} \]  [17] 

subject to the terminal constraints [15], where 
$S \subset \mathbb{R}^{n+1}$ is a closed target set. 


Theorem 7 (PDE of dynamic programming). 
Assume that the value function V, for [15]-[17], 
is $C^1$ on some open set $\Omega \subset \mathbb{R} \times \mathbb{R}^n$, not intersecting 
the target set S. Then V satisfies the Hamilton-Jacobi 
equation 

\[ V_s(s, y) + \min_{\omega \in U} \{ V_y(s, y) \cdot f(s, y, \omega) \} = 0 \qquad \forall (s, y) \in \Omega \]  [18] 


Equation [18] is called the Hamilton-Jacobi-Bellman 
(HJB) equation, after Richard Bellman. In general, 
however, V fails to be differentiable: this is the case for 
Example 1 along the lines ¢*. To isolate V as the 
unique solution of the HJB equation, one has to resort 
to the concept of viscosity solution. The dynamic 
programming and HJB equation apparatus applies 
also to stochastic problems for which the equation 
happens to be parabolic, because of the Ito formula. 


Optimal syntheses Roughly speaking, an optimal 
synthesis is a collection of optimal trajectories, one 
for each initial condition y. Geometric techniques 
provide a systematic method to construct syntheses: 


Step 1 Study the properties of optimal trajectories 
via PMP and other necessary conditions. 

Step 2 Determine a (finite-dimensional) sufficient 
family for optimality, that is, a class of trajectories 
(satisfying PMP) containing all possible optimal ones. 

Step 3 Construct a synthesis selecting one trajec- 
tory for every initial condition in such a way as to 
cover the state space in a regular fashion. 

Step 4 Prove that the synthesis of Step 3 is indeed 
optimal. 


One of the main problems in Step 2 is the possible 
presence of optimal controls with an infinite number 
of discontinuities, known as the Fuller phenomenon. The 
key concept of regular synthesis, of Step 3, was 
introduced by Boltianskii and recently refined by 
Piccoli and Sussmann (2000) to include Fuller phenomena. 
The above strategy works only in some 
special cases, for example for two-dimensional 
minimum-time problems (Boscain and Piccoli 2004): 
we report an example below. 


Example 4 Consider the problem of orienting in 
minimum time a satellite with two orthogonal rotors: 
the speed of one rotor is controlled, while the second 
rotor has constant speed. This problem is modelled by 
a left-invariant control system on SO(3): 

\[ \dot y = y(F + uG), \qquad y \in SO(3), \quad |u| \le 1 \]

where F and G are two matrices of so(3), the Lie 
algebra of SO(3). Using the isomorphism of Lie 
algebras $(so(3), [\cdot,\cdot]) \cong (\mathbb{R}^3, \times)$, the condition that 
the rotors are orthogonal reads: trace(F·G) = 0. 
If we are interested in orienting only a fixed semi-axis, 
then we project the system on the sphere $S^2$: 

\[ \dot y = y(F + uG), \qquad y \in S^2, \quad |u| \le 1 \]

In this case, F + G and F − G are rotations around 
two fixed axes and, if the angle between these two 
axes is less than π/2, every optimal trajectory is a 
finite concatenation of arcs corresponding to constant 
control +1 or −1. The "optimal synthesis" can 
be obtained by the feedback shown in Figure 2. 





Figure 2 Optimal feedback for Example 4. 


Control of PDEs 


The theory for control of models governed by PDEs 
is, as expected, much more ramified and much less 
complete. An exhaustive summary of the available 
results is not possible in a short space, thus we focus 
on Example 2 and a few others to illustrate some 
techniques for treating control problems, and give 
various references (see also Fursikov and Imanuvilov 
(1996), Komornik (1994), and Lasiecka and Triggiani 
(2000), and references therein). 

Besides the variety of control problems illustrated 
in the Introduction, for PDE models one can consider 
different ways of applying the control, for example: 

Boundary control One considers the system [3] 
(with F independent of u) and imposes the condition 
y(t, x) = u(t, x) for every time t and every x in 
some region. Usually, we assume y(t) to be defined 
on a bounded region Ω and the control acts on some set 
$\Gamma \subset \partial\Omega$. Neumann conditions are also natural: 
$\partial_\nu y = u$, where ν is the exterior normal to Ω. 

Internal control One considers the system [3] 
with F depending on u. Thus, the control acts on the 
equation directly. 

Other controls There are various other control 
problems one may consider, such as Galerkin-type 
approximation and control of some finite family of 
modes. An interesting example is given by Coron 
(2002), where the position of a tank is controlled to 
regulate the water level inside. 


Control of a Vibrating String 


We consider Example 2, but various results hold for 
hyperbolic linear systems in general. First consider 
the uncontrolled system 

\[ z_{tt} = \Delta z, \qquad z(0, t) = z(1, t) = 0 \]  [19] 

A first integral is the energy given by 

\[ E(t) = \frac{1}{2} \int_0^1 \left[ |z_x|^2 + |z_t|^2 \right] dx \]

Then we say that the system [19] is observable at 
time T if there exists C(T) such that 

\[ E(0) \le C(T) \int_0^T |z_x(1, t)|^2 \, dt \]

which means that if we observe zero displacement 
on the right end for time T then the solution has 
zero energy and hence vanishes. In this case, the 
system is observable for every time T ≥ 2: this is 
precisely the time taken by a wave to travel from the 
right end point to the left one and back. 
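
The observability inequality can be checked numerically on a small family of explicit solutions. The sketch below is only an illustration under stated assumptions: it takes the one-dimensional string on [0, 1], restricts to standing waves z(x,t) = Σ a_k sin(kπx) cos(kπt) (so that z_t(x,0) = 0), and uses a midpoint quadrature rule; the function name and the coefficients are hypothetical. For this family and T = 2, the boundary integral equals 4 E(0), so the inequality holds with C(2) = 1/4.

```python
import math

def energy_and_boundary_integral(coeffs, n_quad=2000, horizon=2.0):
    """For z(x,t) = sum_k a_k sin(k pi x) cos(k pi t), with coeffs[k-1] = a_k,
    return (E(0), integral over [0, horizon] of |z_x(1,t)|^2 dt)."""
    def z_x(x, t):
        return sum(a * k * math.pi * math.cos(k * math.pi * x) * math.cos(k * math.pi * t)
                   for k, a in enumerate(coeffs, start=1))

    # E(0) = 1/2 * int_0^1 |z_x(x,0)|^2 dx, since z_t(x,0) = 0 for this family.
    hx = 1.0 / n_quad
    energy0 = 0.5 * hx * sum(z_x((i + 0.5) * hx, 0.0) ** 2 for i in range(n_quad))

    # Boundary observation at the right end x = 1 (midpoint rule in time).
    ht = horizon / n_quad
    boundary = ht * sum(z_x(1.0, (i + 0.5) * ht) ** 2 for i in range(n_quad))
    return energy0, boundary

# Hypothetical mode amplitudes a_1, a_2, a_3.
energy0, boundary = energy_and_boundary_integral([1.0, -0.5, 0.25])
ratio = boundary / energy0
```

The ratio is 4 up to rounding, independently of the chosen amplitudes, which is consistent with the critical time T = 2 quoted above.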

Thanks to a duality as in the finite-dimensional 
case, observability of [19] is equivalent to null 
controllability for [5]-[7], that is, to the property 
that for all initial conditions $y_0, y_1$ there exists a 
control u(·) such that the corresponding solution 
satisfies $y(x, T) = y_t(x, T) = 0$. More precisely, the 
desired control is given by $u(t) = \tilde z_x(1, t)$, where $\tilde z$ is 
the solution of [19] minimizing the functional (over 
$L^2 \times H^{-1}$) 

\[ J(z(0), z_t(0)) = \frac{1}{2} \int_0^T |z_x(1, t)|^2 \, dt + \int_0^1 y_0 \, z_t(\cdot, 0) \, dx - \int_0^1 y_1 \, z(\cdot, 0) \, dx \]

One can check that this functional is continuous and 
convex, and the coercivity is granted by the 
observability of [19]; thus, a minimum exists by 
the direct method of Tonelli. This is an example of 
the method known as Hilbert's uniqueness method 
introduced by Lions (1988). 

In the multidimensional case, controllability can 
be characterized by imposing a condition on the 
region $\Gamma \subset \partial\Omega$ on which the control acts. More 
precisely, rays of geometric optics in Ω should 
intersect Γ (Zuazua 2005). 

If we consider infinite-time horizon T= +00 and 
introduce the functional 


+00 
j=] [l?d +N | x? dedx 
0 


then the optimal control is determined as follows. 
If (y,p) is a solution of the optimality system: 
[S|-[6] with y=0 outside T and 
De — Ap +y =0, ð&p+Ny=0 onl 
p=0 ondQ 


then u=y on I (Lions 1988, Zuazua 2005). 


Controllability via Return Method of Coron 


As we saw in Theorem 4, a nonlinear system may be 
controllable even if its linearization is not. In this 
case, controllability can be proved by the return 
method of Coron, which consists in finding a 
trajectory $\bar y$ such that the following hold: 


1. $\bar y(0) = \bar y(T) = 0$; 
2. the linearized system around $\bar y$ is controllable. 


Then, by the implicit function theorem, local 
controllability is granted, that is, there exists ε > 0 such that 
for all data $y_0, y_1$ of norm less than ε, there exists 
a control steering the system from $y_0$ to $y_1$ in time T. 
This method does not give many advantages in the 
finite-dimensional case, but yields excellent 
results for PDE systems such as Euler, Navier-Stokes, 
Saint-Venant, and others (Coron 2002). 


Control of Schrödinger Equation 


Consider the issue of designing an efficient transfer of 
population between different atomic or molecular 
levels using laser pulses. The mathematical description 
consists in controlling the Schrödinger equation. 
Many results are available in the finite-dimensional 
case. Finite-dimensional closed quantum systems are 
in fact left-invariant control systems on SU(n), or on 
the corresponding Hilbert sphere $S^{2n-1} \subset \mathbb{C}^n$, where 
n is the number of atomic or molecular levels, and 
powerful techniques of geometric control are available 
both for controllability and for 
optimal control (Agrachev and Sachkov 2004, 
Boscain and Piccoli 2004, Jurdjevic 1997). 

Recent papers consider the minimum-time pro- 
blem with unbounded controls as well as minimiza- 
tion of the energy of transition. Boscain et al. (2002) 
have applied the techniques of sub-Riemannian geo- 
metry on Lie groups and of optimal synthesis on two- 
dimensional manifolds to the population transfer 
problem in a three-level quantum system driven by 
two external fields of arbitrary shape and frequency. 

Although many results are available for 
finite-dimensional systems, only a few controllability 
properties have been proved for the Schrödinger equation 
as a PDE, and in particular no satisfactory global 
controllability results are available at the moment. 
Introduction 


Convexity is an important notion in nonlinear 
optimization theory as well as in infinite- 
dimensional functional analysis. As will be seen 
below, very simple and powerful tools will be 
derived from elementary duality arguments (which 
are by-products of the Moreau-Fenchel transform 
and the Hahn-Banach theorem). We will emphasize 
applications to a large range of variational problems. 
Some arguments of measure theory will be 
skipped. 


Basic Convex Analysis 


In the following, we denote by X a normed vector 
space, and by X* the topological dual of X. If 
a topology different from the normed topology is 
used on X, we will denote it by τ. For every $x \in X$ 
and $A \subset X$, $\mathcal{V}_x$ denotes the open neighborhoods of x 
and int A, cl A, respectively, the interior and the 
closure of A. We deal with extended real-valued 
functions $f : X \to \mathbb{R} \cup \{+\infty\}$. We denote by dom $f = 
f^{-1}(\mathbb{R})$ and by epi $f = \{(x, \alpha) \in X \times \mathbb{R} : f(x) \le \alpha\}$ 
the domain and the epigraph of f, respectively. We 
say that f is proper if dom $f \ne \emptyset$. Recall that f is 
convex if for every $(x, y) \in X^2$ and $t \in [0, 1]$, there 
holds 

\[ f(tx + (1 - t)y) \le t f(x) + (1 - t) f(y) \]
(by convention $\infty + a = +\infty$) 

The notion of convexity for a subset $A \subset X$ 
is recovered by saying that $\chi_A$ is convex, where its 
indicator function $\chi_A$ is defined by setting 

\[ \chi_A(x) = \begin{cases} 0 & \text{if } x \in A \\ +\infty & \text{otherwise} \end{cases} \]


Continuity and Lower-Semicontinuity 


A first consequence of the convexity is the continuity 
on the topological interior of the domain. We refer for 
instance to Borwein and Lewis (2000) for a proof of 


Theorem 1 Let $f : X \to \mathbb{R} \cup \{+\infty\}$ be convex and 
proper. Assume that $\sup_U f < +\infty$, where U is a 
suitable open subset of X. Then f is continuous and 
locally Lipschitzian on all int(dom f). 


As an immediate corollary, a convex function on 
a normed space is continuous provided it is 
majorized by a locally bounded function. In the 
finite-dimensional case, it is easily deduced that a 
finite-valued convex function $f : \mathbb{R}^d \to \mathbb{R}$ is locally 
Lipschitz. Furthermore, by Aleksandrov's theorem, 
f is almost everywhere twice differentiable and the 
non-negative Hessian matrix $\nabla^2 f$ coincides with the 
absolutely continuous part of the distributional 
Hessian matrix $D^2 f$ (it is a Radon measure taking 
values in the non-negative symmetric matrices). 

However, in infinite-dimensional spaces, for 
ensuring compactness properties (as, e.g., in condi- 
tion (ii) of Theorem 4 below), we need to use weak 
topologies and the situation is not so simple. 
A major idea consists in substituting the continuity 
property with lower-semicontinuity. 


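Definition 2 A function $f : X \to \mathbb{R} \cup \{+\infty\}$ is τ-l.s.c. 
at $x_0 \in X$ if for all $\alpha \in \mathbb{R}$ with $\alpha < f(x_0)$, there exists $U \in \mathcal{V}_{x_0}$ 
such that f > α on U. In particular, f will be l.s.c. on 
all X provided $f^{-1}((r, +\infty))$ is open for every $r \in \mathbb{R}$. 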


Remark 3 


(i) The following sequential notion can also be 
used: f is τ-sequentially l.s.c. at $x_0$ if 

\[ \forall (x_n) \subset X, \quad x_n \xrightarrow{\tau} x_0 \implies \liminf_{n \to \infty} f(x_n) \ge f(x_0) \]

It turns out that this notion (weaker in general) 
is equivalent to the previous one provided $x_0$ 
admits a countable basis of neighborhoods. 

(ii) A well-known consequence of Hahn-—Banach 
theorem is that, for convex functions, the lower- 
semicontinuity property with respect to the 
normed topology of X is equivalent to the weak 
(or weak sequential) lower-semicontinuity. 


Theorem 4 (Existence). Let $f : X \to \mathbb{R} \cup \{+\infty\}$ be 
proper, such that 


(i) f is τ-l.s.c., 
(ii) $\forall r \in \mathbb{R}$, $f^{-1}((-\infty, r])$ is τ-relatively compact. 


Then there is $\bar x \in X$ such that $f(\bar x) = \inf f$ and 
argmin $f := \{x \in X \mid f(x) = \inf f\}$ is τ-compact. 


In practice, the choice of the topology τ is governed 
by condition (ii) above. For example, if X is a 
reflexive infinite-dimensional Banach space and if f 
is coercive (i.e., $\lim_{\|x\| \to \infty} f(x) = +\infty$), we may take 
for τ the weak topology (but never the normed 
topology). This restriction implies in practice that 
the first condition in Theorem 4 may fail. In this 
case, it is often useful to substitute f with its 
lower-semicontinuous (l.s.c.) envelope. 


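Definition 5 Given a topology τ, the relaxed function 
$\bar f$ ($= \bar f^\tau$) is defined as 

\[ \bar f(x) = \sup\{ g(x) \mid g : X \to \mathbb{R} \cup \{+\infty\},\ g \text{ is } \tau\text{-l.s.c.},\ g \le f \} \]

It is easy to check that f is τ-l.s.c. at $x_0$ if and only 
if $\bar f(x_0) = f(x_0)$. Furthermore, 

\[ \bar f(x) = \sup_{U \in \mathcal{V}_x} \inf_U f, \qquad \operatorname{epi} \bar f = \operatorname{cl}_{(X \times \mathbb{R})}(\operatorname{epi} f) \]

We can now state the relaxed version of Theorem 4. 


Theorem 6 (Relaxation). Let $f : X \to \mathbb{R} \cup \{+\infty\}$; 
then $\inf f = \inf \bar f$. Assume further that, for all 
real r, $f^{-1}((-\infty, r])$ is τ-relatively compact; then $\bar f$ 
attains its minimum and 

\[ \operatorname{argmin} f = \operatorname{argmin} \bar f \cap \{x \in X \mid f(x) = \bar f(x)\} \]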


Moreau-Fenchel Conjugate 


The duality between X and X* will be denoted by the 
symbol $\langle \cdot | \cdot \rangle$. If X is a Euclidean space, we identify X* 
with X via the scalar product denoted $(\cdot | \cdot)$. 
Definition 7 Let $f : X \to \mathbb{R} \cup \{+\infty\}$. The Moreau-Fenchel 
conjugate $f^* : X^* \to \mathbb{R} \cup \{+\infty\}$ of f is defined 
by setting, for every $x^* \in X^*$: 

\[ f^*(x^*) = \sup\{ \langle x | x^* \rangle - f(x) \mid x \in X \} \]

In a symmetric way, if f* is proper on X*, we define 
the biconjugate $f^{**} : X \to \mathbb{R} \cup \{+\infty\}$ by setting 

\[ f^{**}(x) = \sup\{ \langle x | x^* \rangle - f^*(x^*) \mid x^* \in X^* \} \]

As a consequence, the so-called Fenchel inequality 
holds: 

\[ \langle x | x^* \rangle \le f(x) + f^*(x^*), \qquad (x, x^*) \in X \times X^* \]

Notice that f does not need to be convex. However, 
if f is convex, then f* agrees with the Legendre-Fenchel 
transform. 


Definition 8 Let $f : X \to \mathbb{R} \cup \{+\infty\}$. The 
subdifferential of f at x is the possibly void subset 
$\partial f(x) \subset X^*$ defined by 

\[ \partial f(x) := \{ x^* \in X^* : f(x) + f^*(x^*) = \langle x, x^* \rangle \} \]

It is easy to check that $\partial f(x)$ is convex and weak-star 
closed. Moreover, if f is convex and has a 
differential (or Gateaux derivative) f'(x) at x, then 
$\partial f(x) = \{f'(x)\}$. After summarizing some elementary 
properties of the Fenchel transform, we give 
examples in $\mathbb{R}^d$ or in infinite-dimensional spaces. 


Lemma 9 


(i) f* is convex, l.s.c. with respect to the weak-star 
topology of X*. 
(ii) $f^*(0) = -\inf f$ and $f \ge g \Rightarrow f^* \le g^*$. 
(iii) $(\inf_i f_i)^* = \sup_i f_i^*$, for every family $\{f_i\}$. 
(iv) $f^{**}(x) = \sup\{g(x) : g$ affine continuous on X and 
$g \le f\}$ (by convention, the supremum is identically 
$-\infty$ if no such g exists). 


Proof (i) This assertion is a direct consequence of the 
fact that f* can be written as the supremum 
of functions $g_x$, where $g_x := \langle x | \cdot \rangle - f(x)$. Clearly, 
these functions are affine and weakly star-continuous 
on X*. The assertions (ii), (iii) are trivial. To obtain (iv), 
it is enough to observe that an affine function g of 
the form $g(x) = \langle x, x^* \rangle - \beta$ satisfies $g \le f$ iff 
$f^*(x^*) \le \beta$. □ 
Example 1 Let $f : X \to \mathbb{R}$ be defined by 

\[ f(x) = \frac{1}{p} \|x\|^p, \qquad 1 < p < +\infty \]

then, 

\[ f^*(x^*) = \frac{1}{p'} \|x^*\|_{X^*}^{p'}, \qquad \text{with } \frac{1}{p} + \frac{1}{p'} = 1 \]

whereas, for p = 1, we find $f^* = \chi_{B^*}$, where 
$B^* = \{ \|x^*\| \le 1 \}$. 


Example 2 Let $A \in \mathbb{R}^{d \times d}$ be a symmetric positive-definite 
matrix and let $f(x) := (1/2)(Ax \mid x)$ ($x \in \mathbb{R}^d$). 
Then, for all $y \in \mathbb{R}^d$, we have $f^*(y) = (1/2)(A^{-1}y \mid y)$. 
Notice that if A has a negative eigenvalue, then 
$f^* \equiv +\infty$. 


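Particular examples on $\mathbb{R}^d$ are also very popular. 
For instance: 


Minimal surfaces 

\[ f(x) = \sqrt{1 + |x|^2}, \qquad f^*(y) = \begin{cases} -\sqrt{1 - |y|^2} & \text{if } |y| \le 1 \\ +\infty & \text{otherwise} \end{cases} \]

Entropy 

\[ f(x) = \begin{cases} x \log x & \text{if } x \in \mathbb{R}_+ \\ +\infty & \text{otherwise} \end{cases} \qquad f^*(y) = \exp(y - 1) \]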
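
The two closed forms above can be sanity-checked by replacing the supremum in Definition 7 with a maximum over a fine grid. The following is a minimal sketch in one dimension; the helper name `conjugate_on_grid`, the grids, and the sample slopes are illustrative assumptions, and the grid maximum only approximates the true supremum from below.

```python
import math

def conjugate_on_grid(f, xs, y):
    """Discrete stand-in for f*(y) = sup_x (x*y - f(x)): max over the sample points xs."""
    return max(x * y - f(x) for x in xs)

def surface(x):
    # Minimal-surface integrand f(x) = sqrt(1 + x^2), here with d = 1.
    return math.sqrt(1.0 + x * x)

def entropy(x):
    # Entropy integrand f(x) = x log x, sampled on x > 0 only.
    return x * math.log(x)

xs_surface = [i / 1000.0 for i in range(-50000, 50001)]   # grid on [-50, 50]
xs_entropy = [i / 1000.0 for i in range(1, 20001)]        # grid on (0, 20]

surface_conj = conjugate_on_grid(surface, xs_surface, 0.6)   # compare with -sqrt(1 - 0.6^2)
entropy_conj = conjugate_on_grid(entropy, xs_entropy, 2.0)   # compare with exp(2 - 1)
```

For the slope y = 0.6 the maximizer x = 0.75 lies on the grid; for the entropy the maximizer x = e falls between grid points, so agreement with exp(1) holds only up to the grid resolution.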


Example 3 Let $C \subset X$ be convex, and let $f = \chi_C$. 
Then, 

\[ f^*(x^*) = \sigma_C(x^*) = \sup_{x \in C} \langle x | x^* \rangle \qquad \text{(support function of C)} \]

Notice that if M is a subspace of X, then 
$(\chi_M)^* = \chi_{M^\perp}$. We specify now a particular case of 
interest. 

Let Ω be a bounded open subset of $\mathbb{R}^n$. Take 
$X = C^0(\bar\Omega; \mathbb{R}^d)$ to be the Banach space of continuous 
functions on the compact $\bar\Omega$ with values in $\mathbb{R}^d$. 
As usual, we identify the dual X* with the space 
$M_b(\bar\Omega; \mathbb{R}^d)$ of $\mathbb{R}^d$-valued Borel measures on $\bar\Omega$ with 
finite total variation. Let K be a closed convex subset of 
$\mathbb{R}^d$ such that $0 \in K$. Then $\rho_K(\xi) := \sup\{(\xi | z) : z \in K\}$ 
is a non-negative convex l.s.c. and positively 
1-homogeneous function on $\mathbb{R}^d$ (e.g., $\rho_K$ is the 
Euclidean norm if K is the unit ball of $\mathbb{R}^d$). Let us 
define $C := \{\varphi \in X : \varphi(x) \in K, \ \forall x \in \bar\Omega\}$. Then, we 
have 

\[ (\chi_C)^*(\lambda) = \int_{\bar\Omega} \rho_K(\lambda) := \int_{\bar\Omega} \rho_K\!\left( \frac{d\lambda}{d\theta} \right) \theta(dx) \]

where θ is any non-negative Radon measure such 
that $\lambda \ll \theta$ (the choice of θ is indifferent). In the case 
where K is the unit ball, we recover the total 
variation of λ. 


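Example 4 (Integral functionals). Given $1 \le p \le 
+\infty$, $(\Omega, \mu, \mathcal{T})$ a measured space and $\varphi : \Omega \times 
\mathbb{R}^d \to [0, +\infty]$ a $\mathcal{T} \otimes \mathcal{B}_{\mathbb{R}^d}$-measurable integrand. 
Then the partial conjugate $\varphi^*(x, z^*) := \sup\{(z | z^*) - 
\varphi(x, z) : z \in \mathbb{R}^d\}$ is a convex measurable integrand. 
Let us define 

\[ I_\varphi : u \in (L^p_\mu)^d \mapsto \int_\Omega \varphi(x, u(x)) \, d\mu \in \mathbb{R} \cup \{+\infty\} \]

and assume that $I_\varphi$ is proper. Then there holds 
$(I_\varphi)^* = I_{\varphi^*}$, where 

\[ I_{\varphi^*} : v \in (L^{p'}_\mu)^d \mapsto \int_\Omega \varphi^*(x, v(x)) \, d\mu \]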


Duality Arguments 
Two Key Results 


The first result related to the biconjugate f** is 
a consequence of the Hahn-Banach theorem. 
Recalling the assertion (iv) of Lemma 9, we notice 
that the existence of an affine minorant for f is 
equivalent to the properness of f* (i.e., 
$\exists x_0^* \in X^* : f^*(x_0^*) < +\infty$). 


Theorem 10 Let $f : X \to \mathbb{R} \cup \{+\infty\}$ be convex and 
proper. Then 


(i) f is l.s.c. at $x_0$ if and only if f* is proper 
and $f^{**}(x_0) = f(x_0)$. In particular, the 
lower-semicontinuity of f on all X is equivalent to the 
identity $f = f^{**}$. 

(ii) If f* is proper, then $f^{**} = \bar f$. 

Proof We notice that by Lemma 9, $f^{**} \le f$ and f** 
is l.s.c. (even for the weak topology). Therefore, 
$f^{**} \le \bar f \le f$ and, moreover, f is l.s.c. at $x_0$ if $f^{**}(x_0) \ge 
f(x_0)$. Conversely, if f is l.s.c. at $x_0$, for every $\alpha_0 < 
f(x_0)$, there exists a neighborhood V of $x_0$ such 
that $V \times (-\infty, \alpha_0) \cap \operatorname{epi} f = \emptyset$. It follows that 
$\operatorname{epi} \bar f$ is a proper closed convex subset of $X \times \mathbb{R}$ 
which does not intersect the compact singleton 
$\{(x_0, \alpha_0)\}$. By applying the Hahn-Banach strict 
separation theorem, there exists $(x_0^*, \beta_0) \in X^* \times \mathbb{R}$ 
such that 

\[ \langle x_0, x_0^* \rangle + \alpha_0 \beta_0 < \langle x, x_0^* \rangle + \alpha \beta_0 \qquad \text{for all } (x, \alpha) \in \operatorname{epi} f \]

Taking $\alpha \to \infty$ and $x \in \operatorname{dom} f$, we find $\beta_0 \ge 0$. In 
fact, $\beta_0 > 0$ as the strict inequality above would be 
violated for $x = x_0$. Eventually, we obtain that f is 
minorized by the affine continuous function 
$g(x) = -\langle x - x_0, x_0^*/\beta_0 \rangle + \alpha_0$. Thus, we conclude 
that f* is proper and that $f^{**}(x_0) \ge \alpha_0$. 


The assertion (ii) is a direct consequence of the 
equivalence in (i). □ 
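
Theorem 10 (ii) can be illustrated on a nonconvex function, for which the biconjugate is strictly below f on part of the domain. The sketch below is an assumption-laden toy: it uses the double well f(x) = (x² − 1)² on a grid of [−2, 2], replaces both suprema by maxima over finite grids, and the helper name `discrete_conjugate` is hypothetical. On this example the discrete biconjugate vanishes at x = 0 (where f = 1) and coincides with f at x = 1.5, where f is locally convex.

```python
def discrete_conjugate(values, points, dual_points):
    """Return [max_i (points[i] * y - values[i]) for y in dual_points]."""
    return [max(x * y - v for x, v in zip(points, values)) for y in dual_points]

xs = [i / 100.0 for i in range(-200, 201)]     # primal grid on [-2, 2]; contains -1, 0, 1, 1.5
ys = [j / 10.0 for j in range(-200, 201)]      # dual (slope) grid on [-20, 20]; contains 0 and 7.5

f_vals = [(x * x - 1.0) ** 2 for x in xs]      # nonconvex double-well potential
f_star = discrete_conjugate(f_vals, xs, ys)    # grid version of f*
f_bistar = discrete_conjugate(f_star, ys, xs)  # grid version of f**, evaluated back on xs

i_zero = xs.index(0.0)
i_convex = xs.index(1.5)
gap_at_zero = f_vals[i_zero] - f_bistar[i_zero]          # f(0) - f**(0)
gap_at_convex = f_vals[i_convex] - f_bistar[i_convex]    # f(1.5) - f**(1.5)
```

The slope grid must contain the derivative f'(1.5) = 7.5 for the second gap to close; with a coarser or shorter slope grid the discrete biconjugate would only be a lower bound there.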


Theorem 11 Let X be a normed space and let 
$f : X \to [0, +\infty]$ be a convex and proper function; 
assume that f is continuous at 0, then 


(i) f* achieves its minimum on X*. 
(ii) $f(0) = f^{**}(0) = -\inf f^*$. 

Proof 


(i) Let M be an upper bound of f on the ball $\{\|x\| \le 
R\}$. Then 

\[ f^*(x^*) \ge \sup\{ \langle x, x^* \rangle - f(x) : \|x\| \le R \} \ge R \|x^*\|_{X^*} - M \]

Hence, for every r, the set $\{x^* \in X^* : f^*(x^*) \le r\}$ 
is bounded, thus τ-relatively compact, where τ is 
the weak-star topology on X*. By assertion (i) of 
Lemma 9, f* is τ-l.s.c. and Theorem 4 applies. 
(ii) By Theorem 10, since f is convex proper and 
l.s.c. at $x_0 = 0$, we have $f(0) = f^{**}(0) = -\inf f^*$. 
□ 
Some Useful Consequences 
Proposition 12 (Conjugate of a sum). Let $f, g : X \to 
\mathbb{R} \cup \{+\infty\}$ be convex such that 

\[ \exists x_0 \in X : f \text{ is continuous at } x_0 \text{ and } g(x_0) < +\infty \]  [2] 

Then 


(i) \[ (f + g)^*(x^*) = \inf_{x_1^* + x_2^* = x^*} \{ f^*(x_1^*) + g^*(x_2^*) \} \]
(the equality holds in $\bar{\mathbb{R}}$). 
(ii) If both sides of the equality in (i) are finite, then 
the infimum in the right-hand side is achieved. 


Proof Without any loss of generality, we may 
assume that $x^* = 0$ (we reduce to this case by 
substituting g with $g - \langle \cdot, x^* \rangle$). We let 

\[ h(p) = \inf\{ f(x + p) + g(x) \mid x \in X \} \]

Noticing that $(p, x) \mapsto f(x + p) + g(x)$ is convex, we 
infer that h(p) is convex as well. As h is majorized 
by the function $p \mapsto f(x_0 + p) + g(x_0)$, which by [2] is 
continuous at 0, we deduce from Theorems 1 and 11 
that $h(0) = h^{**}(0)$ and that h* achieves its infimum. 
Now $h(0) = \inf(f + g) = -(f + g)^*(0)$ and 

\[ h^*(p^*) = \sup\{ \langle p, p^* \rangle - h(p) : p \in X \} \]
\[ \phantom{h^*(p^*)} = \sup\{ \langle p, p^* \rangle - f(x + p) - g(x) : x \in X, \ p \in X \} \]
\[ \phantom{h^*(p^*)} = g^*(-p^*) + f^*(p^*) \]

The assertions (i), (ii) follow since $-h^{**}(0) = 
\min h^* = \min\{ g^*(-p^*) + f^*(p^*) \}$. □ 


Proposition 13 Let X, Y be two 
Banach spaces and $A : X \to Y$ a linear operator with 
dense domain D(A). Let $\Psi : Y \to \mathbb{R} \cup \{+\infty\}$ be a 
convex l.s.c. function and let $F : X \to \mathbb{R} \cup \{+\infty\}$ be the convex 
functional defined by 

\[ F(u) = \begin{cases} \Psi(Au) & \text{if } u \in D(A) \\ +\infty & \text{otherwise} \end{cases} \]

Assume that there exists $u_0 \in D(A)$ such that Ψ is 
continuous at $Au_0$. Then 


(i) The Fenchel conjugate of F is given by 

\[ \forall f \in X^*, \qquad F^*(f) = \inf\{ \Psi^*(\sigma) : \sigma \in Y^*, \ A^*\sigma = f \} \]

where, if both sides of the equality are finite, the 
infimum on the right-hand side is achieved. 

(ii) If, in addition, Y is reflexive and Ψ is l.s.c. 
coercive, we have 

\[ \bar F(u) = \inf\{ \Psi(p) \mid (u, p) \in \overline{G(A)} \} \]  [3] 

where G(A) denotes the graph of A. 

Proof 


(i) Define $H, K : X \times Y \to \mathbb{R} \cup \{+\infty\}$ by 

\[ H(u, p) = \chi_{G(A)}(u, p), \qquad K(u, p) = \Psi(p) \]

Then we have the identity $F^*(f) = (H + K)^*(f, 0)$, 
where the conjugate of H + K is taken with 
respect to the duality $(X \times Y, X^* \times Y^*)$. From the 
assumption, K is continuous at $(u_0, Au_0) \in 
\operatorname{dom} H$. By Proposition 12, we obtain 

\[ (H + K)^*(f, 0) = \inf_{(g, \sigma) \in X^* \times Y^*} \{ K^*(f - g, \sigma) + H^*(g, -\sigma) \} \]

After a simple computation, it is easy to check 
that 

\[ H^*(g, \sigma) = \begin{cases} 0 & \text{if } A^*\sigma = -g \\ +\infty & \text{otherwise} \end{cases} \qquad K^*(f - g, \sigma) = \begin{cases} \Psi^*(\sigma) & \text{if } f = g \\ +\infty & \text{otherwise} \end{cases} \]

(ii) Let $J(u) := \inf\{ \Psi(p) : (u, p) \in \overline{G(A)} \}$. As observed 
for F* in the proof of (i), we have the identity 
$J^*(f) = (H + K)^*(f, 0)$. Therefore, in view of 
Theorem 10, $\bar F = F^{**} = J^{**}$ and it is enough to 
prove that J is convex l.s.c. proper. Let us 
consider a sequence $(u_n)$ in X converging to 
some $u \in X$. Without any loss of generality, we 
may assume that $\liminf J(u_n) = \lim J(u_n) < +\infty$. 
Then there is a sequence $(p_n)$ such that, for every 
n, $(u_n, p_n) \in \overline{G(A)}$ and $J(u_n) \ge \Psi(p_n) - 1/n$. As Ψ 
is coercive, $\{p_n\}$ is bounded in the reflexive 
space Y and, possibly passing to a subsequence, 
we may assume that $p_n$ converges weakly to 
some p. Since $\overline{G(A)}$ is a (weakly) closed subspace 
of $X \times Y$, we infer that (u, p) as the limit of 
$(u_n, p_n)$ still belongs to $\overline{G(A)}$. Thus, we conclude, 
thanks to the (weak) lower-semicontinuity of Ψ 

\[ \liminf_n J(u_n) = \lim_n \Psi(p_n) \ge \Psi(p) \ge J(u) \]


An immediate consequence of Propositions 12 and 
13 is the following variant: 


Proposition 14 Under the same notation as in 
Proposition 13, let $\Phi : X \to \mathbb{R} \cup \{+\infty\}$ be a convex 
function and assume that there exists $u_0 \in D(A)$ 
such that $\Phi(u_0) < +\infty$ and Ψ is continuous at $Au_0$. 
Then we have 

\[ \inf_{u \in X} \{ \Phi(u) + \Psi(Au) \} = \sup_{\sigma \in Y^*} \{ -\Phi^*(-A^*\sigma) - \Psi^*(\sigma) \} \]

where the supremum on the right-hand side is 
achieved. Furthermore, a pair $(\bar u, \bar\sigma)$ is optimal if 
and only if it satisfies the relations: $\bar\sigma \in \partial\Psi(A\bar u)$ and 
$-A^*\bar\sigma \in \partial\Phi(\bar u)$. 
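
A scalar toy case makes the duality formula and the optimality relations concrete. The sketch below assumes X = Y = ℝ, A = multiplication by a constant k, Φ(u) = ½(u − a)² and Ψ(v) = ½(v − b)², for which Φ*(w) = ½w² + aw and Ψ*(σ) = ½σ² + bσ; the function names and the numerical values of a, b, k are hypothetical choices made for illustration only.

```python
def primal(u, a, b, k):
    """Phi(u) + Psi(A u) with Phi(u) = (u - a)^2 / 2, Psi(v) = (v - b)^2 / 2, A u = k u."""
    return 0.5 * (u - a) ** 2 + 0.5 * (k * u - b) ** 2

def dual(sigma, a, b, k):
    """-Phi*(-A* sigma) - Psi*(sigma), using the closed-form conjugates of the quadratics."""
    w = -k * sigma                               # -A* sigma (A* is again multiplication by k)
    phi_star = 0.5 * w ** 2 + a * w              # Phi*(w)
    psi_star = 0.5 * sigma ** 2 + b * sigma      # Psi*(sigma)
    return -phi_star - psi_star

a, b, k = 1.0, 3.0, 2.0
u_opt = (a + k * b) / (1.0 + k * k)        # stationary point of the primal objective
sigma_opt = (k * a - b) / (1.0 + k * k)    # stationary point of the dual objective

primal_value = primal(u_opt, a, b, k)
dual_value = dual(sigma_opt, a, b, k)
```

With these data both values equal 0.1, and the optimality relations reduce to σ̄ = kū − b (that is, σ̄ ∈ ∂Ψ(Aū)) and −kσ̄ = ū − a (that is, −A*σ̄ ∈ ∂Φ(ū)).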


Remark 15 From the assertion (ii) of Proposition 
13, we may conclude that F is l.s.c. whenever the 
operator A is closed. If now A is merely closable 
(with closure denoted by $\bar A$), we obtain 

\[ \bar F(u) = \begin{cases} \Psi(\bar A u) & \text{if } u \in \operatorname{dom} \bar A \\ +\infty & \text{otherwise} \end{cases} \]

This is the typical situation when F is an integral 
functional defined on smooth functions of the kind 

\[ F(u) = \int_\Omega f(x, \nabla u) \, dx \]

where Ω is a bounded open subset of $\mathbb{R}^n$, $f : \Omega \times 
\mathbb{R}^n \to \mathbb{R}$ is a convex integrand with quadratic growth 
(i.e., $c|z|^2 \le f(x, z) \le C(1 + |z|^2)$ for suitable $C \ge 
c > 0$). Then $X = L^2(\Omega)$, $Y = L^2(\Omega; \mathbb{R}^n)$, 

\[ \Psi(v) = \int_\Omega f(x, v) \, dx \]

and $A : u \in C^1(\bar\Omega) \mapsto \nabla u \in L^2(\Omega; \mathbb{R}^n)$. It turns out 
that A is closable and that the domain of $\bar A$ 
characterizes the Sobolev space $W^{1,2}(\Omega)$ on which 
$\bar A$ coincides with the distributional gradient 
operator. 


The situation is more involved if we consider 

\[ F(u) = \int f(x, \nabla u) \, d\mu \]

where μ is a possibly concentrated Radon measure 
supported on $\bar\Omega$. In general, the operator $A : u \in 
C^1(\bar\Omega) \subset L^2_\mu(\Omega) \mapsto \nabla u \in L^2_\mu(\Omega; \mathbb{R}^n)$ is not closable 
and we need to come back to the general formula 
[3]. The general structure of $\overline{G(A)}$ has been given in 
Bouchitté et al. (1997) and Bouchitté and Fragalà 
(2002, 2003), namely 

\[ (u, \xi) \in \overline{G(A)} \iff u \in W^{1,2}_\mu, \ \exists \eta \in L^2_\mu(\Omega; \mathbb{R}^n) : \ \xi = \nabla_\mu u + \eta, \ \eta(x) \perp T_\mu(x) \]

where $T_\mu(x)$, $\nabla_\mu$ are suitable notions of tangent 
space and tangential gradient with respect to μ, and 
$W^{1,2}_\mu$ denotes the domain of the extended tangential 
gradient operator. 


Remark 16 The assertion (ii) of Proposition 13 
is not valid in the nonreflexive case. In 
particular, for 

\[ F(u) = \int_\Omega f(x, \nabla u) \, dx \]

where f(x, ·) has a linear growth at infinity, 
we need to take Y as the space of $\mathbb{R}^n$-valued 
vector measures on Ω and the relaxed functional 
F** needs to be identified on the space BV(Ω) 
of integrable functions with bounded variation. 
The computation of F** is a delicate problem for 
which we refer to Bouchitté and Dal Maso (1993) 
and Bouchitté and Valadier (1998). 


Remark 17 By duality techniques, it is also possible 
to handle variational integrals of the kind 

\[ F(u) = \int_\Omega f(x, u(x), \nabla u(x)) \, dx \]

even if the dependence of f(x, u, z) with respect to u 
is nonconvex. The idea consists in embedding the 
space BV(Ω) in the larger space BV(Ω × ℝ) through 
the map $u \mapsto 1_u$, where $1_u$ is the characteristic 
function defined on Ω × ℝ by setting 

\[ 1_u(x, t) = \begin{cases} 1 & \text{if } u(x) > t \\ 0 & \text{otherwise} \end{cases} \]

Then it is possible to show, under suitable 
conditions on the integrand f, that there exists 
a convex l.s.c., 1-homogeneous functional 
$G : BV(\Omega \times \mathbb{R}) \to \mathbb{R} \cup \{+\infty\}$ such that $F(u) = G(1_u)$. 
This functional G is constructed as in Example 
3, taking C to be a suitable convex subset of 
$C^0(\Omega \times \mathbb{R})$. This idea has been the key 
tool of the calibration method developed recently 
(Alberti et al. 2003). 


Convex Variational Problems in Duality 
Finite-Dimensional Case 


We sketch the duality scheme in two cases. 


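Linear programming Let $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$ and A an 
m × n matrix. We denote by $A^T$ the transpose 
matrix. We consider the linear program 

\[ (P) \qquad \inf\{ (c | x) : x \ge 0, \ Ax \le b \} \]

and its perturbed version ($p \in \mathbb{R}^m$) 

\[ h(p) := \inf\{ (c | x) : x \ge 0, \ Ax + p \le b \} \]

An easy computation gives 

\[ \forall y \in \mathbb{R}^m, \qquad h^*(y) = \begin{cases} (b | y) & \text{if } A^T y + c \ge 0, \ y \ge 0 \\ +\infty & \text{otherwise} \end{cases} \]  [4] 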

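Lemma 18 Assume that inf (P) is finite. Then: 


(i) h is convex proper and l.s.c. at 0. 
(ii) (P) has at least one solution. 


Proof We introduce the (m + 1) × (n + m) matrix 
B defined by 

\[ B = \begin{pmatrix} c^T & 0 \\ A & I_m \end{pmatrix} \]

($I_m$ is the m-dimensional identity matrix). Denote by 
$\{b_1, b_2, \dots, b_{n+m}\} \subset \mathbb{R}^{m+1}$ the columns of B and by K 
the convex cone $\{ \sum_{j=1}^{n+m} \lambda_j b_j : \lambda_j \ge 0 \}$. By 
Farkas lemma, this cone K is closed. 


(i) Let $a := \liminf\{h(p) : p \to 0\}$. We have to prove 
that $a \ge h(0) = \inf(P)$. Let $\{p_\varepsilon\}$ be a sequence in 
$\mathbb{R}^m$ such that $p_\varepsilon \to 0$ and $h(p_\varepsilon) \to a$. By the 
definition of h, we may choose $x_\varepsilon \ge 0$ such that 
$Ax_\varepsilon + p_\varepsilon \le b$ and $(c | x_\varepsilon) \to a$. Then we see that the 
column vector $\tilde x_\varepsilon$ associated with $(x_\varepsilon, b - p_\varepsilon - Ax_\varepsilon) \in 
\mathbb{R}^{n+m}$ satisfies: $B \tilde x_\varepsilon \in K$ and 

\[ B \tilde x_\varepsilon = \begin{pmatrix} (c | x_\varepsilon) \\ b - p_\varepsilon \end{pmatrix} \to \begin{pmatrix} a \\ b \end{pmatrix} \]

Therefore, 

\[ \begin{pmatrix} a \\ b \end{pmatrix} \in K \]

and there exists $\tilde x = (x, x')$ such that $x \ge 0$, $x' \ge 0$, 
$(c | x) = a$ and $Ax + x' = b$. It follows that x is 
admissible for (P) and then $(c | x) = a \ge h(0)$. 

(ii) We repeat the proof of (i) choosing $p_\varepsilon = 0$ so 
that $a = \inf(P)$. □ 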


Thanks to the assertion (i) in Lemma 18, we deduce 
from Theorem 10 that $\inf(P) = h(0) = h^{**}(0) = 
\sup -h^*$. Recalling [4], we therefore consider the dual 
problem: 

\[ (P^*) \qquad \sup\{ -b \cdot y : y \ge 0, \ A^T y + c \ge 0 \} \]

Theorem 19 The following assertions are equivalent: 


(i) (P) has a solution. 
(ii) (P*) has a solution. 
(iii) There exists $(x_0, y_0) \in \mathbb{R}^n_+ \times \mathbb{R}^m_+$ such that 
$Ax_0 \le b$, $A^T y_0 + c \ge 0$. 


In this case, we have min(P) = max(P*) and 
an admissible pair (x, y) is optimal if and 
only if $c \cdot x = -b \cdot y$ or, equivalently, satisfies 
the complementarity relations: $(Ax - b) \cdot y = 
(A^T y + c) \cdot x = 0$. 
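
Theorem 19 can be verified by hand on a very small instance. The sketch below uses exact rational arithmetic and a hypothetical two-variable program (the data c, A, b and the candidate pair were chosen for illustration and solved by inspecting the vertices); it checks admissibility, equality of the primal and dual values, and the complementarity relations.

```python
from fractions import Fraction as Fr

# Hypothetical data for (P): inf{ (c|x) : x >= 0, A x <= b }.
c = [Fr(-1), Fr(-1)]
A = [[Fr(1), Fr(2)], [Fr(3), Fr(1)]]
b = [Fr(4), Fr(6)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def mat_vec(M, v):
    return [dot(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def primal_admissible(x):
    return all(xi >= 0 for xi in x) and all(ax <= bi for ax, bi in zip(mat_vec(A, x), b))

def dual_admissible(y):
    reduced = [aty + cj for aty, cj in zip(mat_vec(transpose(A), y), c)]
    return all(yi >= 0 for yi in y) and all(r >= 0 for r in reduced)

# Candidate optimal pair (found by hand for this instance).
x_opt = [Fr(8, 5), Fr(6, 5)]
y_opt = [Fr(2, 5), Fr(1, 5)]

primal_value = dot(c, x_opt)        # (c | x)
dual_value = -dot(b, y_opt)         # -(b | y)
primal_slack = dot([ax - bi for ax, bi in zip(mat_vec(A, x_opt), b)], y_opt)                 # (Ax - b).y
dual_slack = dot([aty + cj for aty, cj in zip(mat_vec(transpose(A), y_opt), c)], x_opt)      # (A^T y + c).x
```

Both values equal −14/5 and both complementarity products vanish, so the pair is optimal by Theorem 19; any other admissible pair satisfies (c|x) ≥ −(b|y).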


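Convex programming Let $f, g_1, \dots, g_m : X \to \mathbb{R}$ be 
convex l.s.c. functions and consider the optimization problem 

\[ (P) \qquad \inf\{ f(x) : g_j(x) \le 0, \ j = 1, 2, \dots, m \} \]

Here $X = \mathbb{R}^n$ or any Banach space. As before, we 
introduce the value function 

\[ p \in \mathbb{R}^m, \qquad h(p) := \inf\{ f(x) : g_j(x) + p_j \le 0, \ j = 1, 2, \dots, m \} \]

and compute its Fenchel conjugate: 

\[ \lambda \in \mathbb{R}^m, \qquad h^*(\lambda) = \begin{cases} -\inf_{x \in X} \{ L(x, \lambda) \} & \text{if } \lambda \ge 0 \\ +\infty & \text{otherwise} \end{cases} \]

where $L(x, \lambda) := f(x) + \sum_j \lambda_j g_j(x)$ is the so-called 
Lagrangian. We notice that h is convex and that 
the equality $h(0) = h^{**}(0)$ is equivalent to the 
zero-duality gap relation 

\[ \inf_x \sup_\lambda L(x, \lambda) = \sup_\lambda \inf_x L(x, \lambda) \]

This condition is fulfilled, in particular, if we make 
the following qualification assumption (ensuring 
that h is continuous at 0 and Theorem 11 applies): 

\[ \exists x_0 \in X : f \text{ continuous at } x_0, \ g_j(x_0) < 0, \ \forall j \]  [5] 

Theorem 20 Assume that [5] holds. Then $\bar x$ is 
optimal for (P) if and only if there exist Lagrangian 
multipliers $\lambda_1, \lambda_2, \dots, \lambda_m$ in $\mathbb{R}_+$ such that 

\[ \bar x \in \operatorname{argmin}_X \Big( f + \sum_j \lambda_j g_j \Big), \qquad \lambda_j g_j(\bar x) = 0, \ \forall j \]

Notice that the existence of such a solution $\bar x$ 
is ensured if, for example, $X = \mathbb{R}^n$ and if, for some 
k > 0, the function $f + k \sum_j g_j^+$ is coercive. 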
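
A one-dimensional instance shows how the multiplier in Theorem 20 and the zero-duality gap relation fit together. The sketch below assumes the hypothetical problem inf{(x − 2)² : x − 1 ≤ 0} on X = ℝ, for which the minimizer of the Lagrangian in x is available in closed form; the function names are illustrative.

```python
def objective(x):
    return (x - 2.0) ** 2

def constraint(x):
    # Single constraint g(x) = x - 1 <= 0; x0 = 0 satisfies g(x0) < 0, so [5] holds.
    return x - 1.0

def lagrangian(x, lam):
    return objective(x) + lam * constraint(x)

def dual_function(lam):
    """inf_x L(x, lam) for lam >= 0, attained at the stationary point x = 2 - lam / 2."""
    x_min = 2.0 - lam / 2.0
    return lagrangian(x_min, lam)

x_bar, lam_bar = 1.0, 2.0                                 # candidate solution and multiplier
primal_value = objective(x_bar)                           # value of (P) at x_bar
dual_value = dual_function(lam_bar)                       # sup_lam inf_x L is attained at lam_bar
complementary_slackness = lam_bar * constraint(x_bar)     # lam * g(x_bar)
```

Here the dual function is λ − λ²/4, maximal at λ = 2 with value 1 = f(x̄), and x̄ = 1 minimizes L(·, 2) over ℝ while the constraint is active.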




Primal—Dual Formulations in Mechanics 


We present here the example of elasticity which 
motivated the pioneering work by J J Moreau on 
convex duality techniques. Further examples can be 
found in Ekeland and Temam (1976). An elastic body is 
placed in a bounded domain $\Omega \subset \mathbb{R}^n$ whose boundary 
Γ consists of two disjoint parts $\Gamma = \Gamma_0 \cup \Gamma_1$. The 
unknown $u : \Omega \to \mathbb{R}^n$ (deformation) satisfies a Dirichlet 
condition u = 0 on $\Gamma_0$, where the body is clamped. The 
system is subjected to a surface load $g \in L^2(\Gamma_1; \mathbb{R}^n)$ and 
to a volumic load $f \in L^2(\Omega; \mathbb{R}^n)$. The static equilibrium 
problem has the following variational formulation: 

\[ (P) \qquad \inf\Big\{ \int_\Omega j(x, e(u)) \, dx - \int_\Omega f \cdot u \, dx - \int_{\Gamma_1} g \cdot u \, d\mathcal{H}^{n-1} : u = 0 \text{ on } \Gamma_0 \Big\} \]

where $e(u) := (1/2)(u_{i,j} + u_{j,i})$ denotes the symmetric 
strain tensor and $j : (x, z) \in \Omega \times \mathbb{R}^{n^2}_{\mathrm{sym}} \to \mathbb{R}_+$ is a 
convex integrand representing the local elastic 
behavior of the material. We assume a quadratic 
growth as in Remark 15 (in the case of linear 
elasticity, an isotropic homogeneous material is 
characterized by the quadratic form 

\[ j(x, z) = \frac{\lambda}{2} |\operatorname{tr}(z)|^2 + \mu |z|^2 \]

λ, μ being the Lamé constants). 

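We apply Proposition 14 with $X = W^{1,2}(\Omega; \mathbb{R}^n)$, $Y = L^2(\Omega; \mathbb{R}^{n^2}_{\rm sym})$, $Au = e(u)$ and where we set

\[ \Phi(u) = \begin{cases} -\int_\Omega f \cdot u\,dx - \int_{\Gamma_1} g \cdot u\,d\mathcal{H}^{n-1} & \text{if } u = 0 \text{ on } \Gamma_0 \\ +\infty & \text{otherwise} \end{cases} \qquad \Psi(v) = \int_\Omega j(x, v)\,dx \]

After some computations, we may write the supremum appearing in Proposition 14 as our dual problem

\[ (P^*) \qquad \sup\Big\{ -\int_\Omega j^*(x, \sigma)\,dx : \sigma \in L^2(\Omega; \mathbb{R}^{n^2}_{\rm sym}),\ -{\rm div}\,\sigma = f \text{ on } \Omega,\ \sigma \cdot n = g \text{ on } \Gamma_1 \Big\} \]

where $j^*$ is the Moreau-Fenchel conjugate with respect to the second argument and $n(x)$ denotes the exterior unit normal on $\Gamma$. The matrix-valued map $\sigma$ is called the stress tensor and $j^*$ the stress potential. Note that the boundary conditions for $\sigma \cdot n$ have to be understood in the sense of traces.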


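Theorem 21 The problems (P) and (P*) have solutions and we have the equality: inf(P) = sup(P*). Furthermore, a pair $(\bar u, \bar\sigma)$ is optimal if and only if it satisfies the following system:

\[ \begin{cases} -{\rm div}\,\bar\sigma = f & \text{on } \Omega \quad \text{(equilibrium)} \\ \bar\sigma(x) \in \partial j(x, e(\bar u)) & \text{a.e. on } \Omega \quad \text{(constitutive law)} \\ \bar u = 0 & \text{a.e. on } \Gamma_0 \\ \bar\sigma \cdot n = g & \text{on } \Gamma_1 \end{cases} \]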


Duality in Mass Transport Problems 
General Cost Functions 


Let $X$, $Y$ be compact metric spaces and $c : X \times Y \to [0, +\infty)$ a continuous cost function. We denote by $\mathcal{P}(X)$, $\mathcal{P}(Y)$ and $\mathcal{P}(X \times Y)$ the sets of probability measures on $X$, $Y$ and $X \times Y$, respectively. Given two elements $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}(Y)$, we denote by $\Gamma(\mu, \nu)$ the subset of probability measures in $\mathcal{P}(X \times Y)$ whose marginals are, respectively, $\mu$ and $\nu$. Identified as a subset of $(C^0(X \times Y))^*$ (the space of signed Radon measures on $X \times Y$), it is convex and weakly-star compact. The Monge-Kantorovich formulation of the mass transport problem reads as follows:

\[ T_c(\mu, \nu) := \inf\Big\{ \int_{X \times Y} c(x, y)\,\gamma(dx\,dy) : \gamma \in \Gamma(\mu, \nu) \Big\} \qquad [6] \]

This formulation, where the infimum is achieved (as we minimize an l.s.c. functional on a compact set for the weak star topology), is already a relaxation of the initial Monge mass transport problem,

\[ \inf\Big\{ \int_X c(x, Tx)\,\mu(dx) : T^\sharp \mu = \nu \Big\} \]

where the infimum is searched among all transport maps $T : X \to Y$ pushing forward $\mu$ on $\nu$ (i.e., such that $\mu(T^{-1}(B)) = \nu(B)$ for all Borel subsets $B \subset Y$). This is equivalent to restricting the infimum in [6] to the subclass $\{\gamma_T\} \subset \Gamma(\mu, \nu)$, where

\[ \langle \gamma_T, \varphi \rangle := \int_X \varphi(x, Tx)\,\mu(dx) \]


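In order to find a dual problem for [6], we fix $\nu \in \mathcal{P}(Y)$ and consider the functional $F : \mathcal{M}(X) \to [0, +\infty]$ defined by

\[ F(\mu) = \begin{cases} T_c(\mu, \nu) & \text{if } \mu \ge 0,\ \mu(X) = 1 \\ +\infty & \text{otherwise} \end{cases} \]

($\mathcal{M}(X)$ denotes the Banach space of (bounded) signed Radon measures on $X$).

Lemma 22 $F$ is convex, weakly-star l.s.c. and proper. Its Moreau-Fenchel conjugate is given by

\[ \forall \varphi \in C^0(X), \quad F^*(\varphi) = -\int_Y \varphi^c(y)\,\nu(dy) \]

where

\[ \varphi^c(y) := \inf\{ c(x, y) - \varphi(x) : x \in X \} \]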


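Proof The convexity property is obvious and the properness follows from the fact that

\[ F(\mu) \le \int_{X \times Y} c(x, y)\,\mu \otimes \nu(dx\,dy) \]

Let $\mu_n$ be such that $\mu_n \to \mu$ (weakly star). We may assume that $\liminf_n F(\mu_n) = \lim_n F(\mu_n) := \alpha$ is finite. Then $\mu_n$ and the associated optimal $\gamma_n$ are probability measures on $X$ and on $X \times Y$, respectively. As $X$ and $Y$ are compact, possibly passing to a subsequence, we may assume that $\gamma_n \to \gamma$, and clearly we have $\gamma \in \Gamma(\mu, \nu)$. Since $c(x, y)$ is l.s.c. non-negative, we conclude that

\[ \liminf_n F(\mu_n) = \liminf_n \int_{X \times Y} c(x, y)\,\gamma_n(dx\,dy) \ge \int_{X \times Y} c(x, y)\,\gamma(dx\,dy) \ge F(\mu) \]

Let us compute now $F^*(\varphi)$. We have

\[ -F^*(\varphi) = \inf\Big\{ \int_{X \times Y} c(x, y)\,\gamma(dx\,dy) - \int_X \varphi\,d\mu : \mu \in \mathcal{P}(X),\ \gamma \in \Gamma(\mu, \nu) \Big\} = \inf_{\gamma \in \Gamma(\mu, \nu)} \int_{X \times Y} (c(x, y) - \varphi(x))\,\gamma(dx\,dy) \ge \int_Y \varphi^c(y)\,\nu(dy) \]

To prove that the last inequality is actually an equality, we observe that, for every $y \in Y$ and $\varphi \in C^0(X)$, the minimum of the l.s.c. function $c(\cdot, y) - \varphi$ is attained on the compact set $X$ and there exists a Borel selection map $S(y)$ such that $\varphi^c(y) = c(S(y), y) - \varphi(S(y))$ for all $y \in Y$. We obtain the desired equality by choosing $\gamma$ defined, for every test $\psi$, by

\[ \langle \gamma, \psi \rangle := \int_{X \times Y} \psi(x, y)\,\gamma(dx\,dy) = \int_Y \psi(S(y), y)\,\nu(dy) \]

□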


We observe that, for every $\varphi \in C^0(X)$, the function $\varphi^c$ introduced in Lemma 22 is continuous (use the uniform continuity of $c$) and therefore the pair $(\varphi, \varphi^c)$ belongs to the class

\[ \mathcal{F}_c := \{ (\varphi, \psi) \in C^0(X) \times C^0(Y) : \varphi(x) + \psi(y) \le c(x, y) \} \]

Let us introduce the dual problem of [6]:

\[ \sup\Big\{ \int_X \varphi\,d\mu + \int_Y \psi\,d\nu : (\varphi, \psi) \in \mathcal{F}_c \Big\} \qquad [7] \]

We will say that $(\varphi, \psi) \in \mathcal{F}_c$ is a pair of c-concave conjugate functions if $\psi = \varphi^c$ and $\varphi = \psi^c$ (where symmetrically $\psi^c(x) := \inf\{ c(x, y) - \psi(y) : y \in Y \}$). Checking the latter condition amounts to verifying that $\varphi$ enjoys the so-called c-concavity property $\varphi^{cc} = \varphi$ (in general, we have only $\varphi^{cc} \ge \varphi$, whereas $\varphi^{ccc} = \varphi^c$). We refer for instance to Villani (2003) for further details about this c-duality.

Now, by exploiting Theorem 10 and Lemma 22, 
we obtain a very simple proof of Kantorovich 
duality theorem: 


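Theorem 23 The following duality formula holds:

\[ T_c(\mu, \nu) = \sup\Big\{ \int_X \varphi\,d\mu + \int_Y \psi\,d\nu : (\varphi, \psi) \in \mathcal{F}_c \Big\} \]

Moreover, the supremum in the right-hand side member is achieved by a pair $(\bar\varphi, \bar\psi)$ of conjugate c-concave functions such that, for any optimal $\bar\gamma$ in [6], there holds $\bar\varphi(x) + \bar\psi(y) = c(x, y)$, $\bar\gamma$-a.e.

Proof By Theorem 10 and Lemma 22, we have

\[ T_c(\mu, \nu) = F^{**}(\mu) = \sup\Big\{ \int_X \varphi\,d\mu + \int_Y \varphi^c\,d\nu : \varphi \in C^0(X) \Big\} \le \sup\Big\{ \int_X \varphi\,d\mu + \int_Y \psi\,d\nu : (\varphi, \psi) \in \mathcal{F}_c \Big\} \le T_c(\mu, \nu) \]

where the last inequality follows from the definition of $\mathcal{F}_c$. Therefore, inf [6] = sup [7]. Furthermore, on the right-hand side of the first equality, we increase the supremum by substituting $\varphi$ with $\varphi^{cc}$ (recall that $\varphi^{ccc} = \varphi^c$). Thus,

\[ \sup\,[7] = \sup\Big\{ \int_X \varphi\,d\mu + \int_Y \varphi^c\,d\nu : \varphi \in C^0(X),\ \varphi \text{ c-concave} \Big\} \]

Take a maximizing sequence $(\varphi_n, \varphi_n^c)$ of c-concave conjugate functions. It is easy to check that $\{\varphi_n\}$ is equicontinuous on $X$: this follows from the c-concavity property and from the uniform continuity of $c$ (observe that $\varphi_n(x_1) - \varphi_n(x_2) = \varphi_n^{cc}(x_1) - \varphi_n^{cc}(x_2) \le \sup_y \{ c(x_1, y) - c(x_2, y) \}$). Then, by Ascoli's theorem, possibly passing to subsequences, we may assume that $\varphi_n - c_n$ converges uniformly to some continuous function $\bar\varphi$ where $\{c_n\}$ is a suitable sequence of reals. Then, one checks that $\bar\varphi$ is still c-concave and that $(\varphi_n - c_n)^c = \varphi_n^c + c_n$ converges uniformly to $\bar\varphi^c$. Thus, recalling that $\mu(X) = \nu(Y) = 1$, we deduce that

\[ T_c(\mu, \nu) = \lim_n \Big( \int_X \varphi_n\,d\mu + \int_Y \varphi_n^c\,d\nu \Big) = \lim_n \Big( \int_X (\varphi_n - c_n)\,d\mu + \int_Y (\varphi_n^c + c_n)\,d\nu \Big) = \int_X \bar\varphi\,d\mu + \int_Y \bar\varphi^c\,d\nu \]

The last assertion is a consequence of the extremality relation:

\[ 0 = \int_{X \times Y} c(x, y)\,\bar\gamma(dx\,dy) - \int_X \bar\varphi\,d\mu - \int_Y \bar\psi\,d\nu = \int_{X \times Y} \big( c(x, y) - \bar\varphi(x) - \bar\psi(y) \big)\,\bar\gamma(dx\,dy) \]

in which the integrand is non-negative.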


Remark 24 


(i) In their discrete version (i.e., $\mu$, $\nu$ are atomic measures), problems [6] and [7] can be seen as particular linear programming problems (see the section “Finite-dimensional case”).

(ii) The case $X = Y \subset \mathbb{R}^n$ and $c(x, y) = (1/2)|x - y|^2$ is important. In this case, the notion of c-concavity is linked to convexity and the Fenchel transform since, for every $\varphi \in C^0(X)$, one has

\[ \frac{|y|^2}{2} - \varphi^c(y) = \Big( \frac{|\cdot|^2}{2} - \varphi \Big)^*(y) \]

Then if $(\bar\varphi, \bar\varphi^c)$ is a solution of [7], we find that

\[ \varphi_0(x) := \frac{|x|^2}{2} - \bar\varphi(x) \]

is convex continuous and that the extremality condition $\bar\varphi(x) + \bar\varphi^c(y) = c(x, y)$ is equivalent to the Fenchel equality $\varphi_0(x) + \varphi_0^*(y) = (x|y)$. Therefore, any optimal $\bar\gamma$ is supported in the graph of the subdifferential map $\partial\varphi_0$. In the case where $\mu$ is absolutely continuous with respect to the Lebesgue measure, it is then easy to deduce that the optimal $\bar\gamma$ is unique and that $\bar\gamma = \gamma_{T_0}$, where $T_0 = \nabla\varphi_0$ is the unique gradient (a.e. defined) of a convex function such that $T_0^\sharp(\mu) = \nu$. This is a celebrated result by Y Brenier (see, e.g., the monographs by Evans (1997) and Villani (2003)).
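The link between the c-transform for the quadratic cost and the Fenchel transform can be checked numerically on a finite grid. The following is a minimal sketch under stated assumptions: X = Y is replaced by a hypothetical finite set of rational points, the function `phi` is an arbitrary choice, and infima and suprema are taken over the grid only, so that the identities hold exactly in rational arithmetic.

```python
from fractions import Fraction as F

def cost(x, y):
    """Quadratic cost c(x, y) = |x - y|^2 / 2 (symmetric in x and y)."""
    return (x - y) ** 2 / 2

def c_transform(values, points, targets):
    """Discrete c-transform: t -> min over points of c(p, t) - values(p)."""
    return [min(cost(p, t) - v for p, v in zip(points, values)) for t in targets]

def fenchel(values, points, slopes):
    """Discrete Fenchel conjugate: s -> max over points of p*s - values(p)."""
    return [max(p * s - v for p, v in zip(points, values)) for s in slopes]

grid = [F(k, 4) for k in range(-8, 9)]            # hypothetical grid on [-2, 2]
phi = [-abs(x) + x / 2 for x in grid]             # arbitrary test function

phi_c = c_transform(phi, grid, grid)
phi_cc = c_transform(phi_c, grid, grid)
phi_ccc = c_transform(phi_cc, grid, grid)

# |y|^2/2 - phi^c(y) compared with the conjugate of |x|^2/2 - phi(x)
lhs = [y * y / 2 - pc for y, pc in zip(grid, phi_c)]
phi0 = [x * x / 2 - p for x, p in zip(grid, phi)]
rhs = fenchel(phi0, grid, grid)

print(lhs == rhs)                                     # identity of Remark 24 (ii)
print(all(a >= b for a, b in zip(phi_cc, phi)))       # phi^cc >= phi
print(phi_ccc == phi_c)                               # phi^ccc = phi^c
```

Because the cost is symmetric and the same grid is used for both variables, the same `c_transform` routine serves for both directions of the conjugation.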


The Distance Case 


In the following, we assume that $X = Y$ and that $c(x, y)$ is a semidistance. As an immediate consequence of the triangular inequality, we have the following equivalence:

\[ \varphi \text{ c-concave} \iff \varphi(x) - \varphi(y) \le c(x, y),\ \forall (x, y) \iff \varphi^c = -\varphi \]

Let us denote ${\rm Lip}_1(X) := \{ u \in C^0(X) : u(x) - u(y) \le c(x, y) \}$. The first assertion of Theorem 23 becomes the Kantorovich-Rubinstein duality formula:

\[ T_c(\mu, \nu) = \max\Big\{ \int_X u\,d(\mu - \nu) : u \in {\rm Lip}_1(X) \Big\} \qquad [8] \]


As it appears, $T_c(\mu, \nu)$ depends only on the difference $f = \mu - \nu$, which belongs to the space $\mathcal{M}_0(X)$ of signed measures on $X$ with zero average. Defining $N(f) := T_c(f^+, f^-)$ provides a seminorm (Kantorovich norm) on $\mathcal{M}_0(X)$ (it turns out that $\mathcal{M}_0(X)$ is not complete and that in general its completion is a strict subspace of the dual of Lip(X)).
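The Kantorovich-Rubinstein formula can be illustrated on a tiny discrete example on the real line with cost |x - y|. The sketch below rests on assumptions that are not part of the article: both measures are uniform on three hypothetical atoms, and the primal minimum is searched over permutations only, which is legitimate for equal-weight atoms because an optimal plan can then be chosen among permutation couplings (Birkhoff-von Neumann theorem). The potentials `u_opt` and `u_other` are hand-picked.

```python
from fractions import Fraction as F
from itertools import permutations

def cost(x, y):
    return abs(x - y)          # distance cost on the real line

xs = [0, 2, 5]                 # atoms of mu, each of mass 1/3 (hypothetical)
ys = [1, 3, 4]                 # atoms of nu, each of mass 1/3 (hypothetical)
n = len(xs)

def plan_cost(perm):
    """Cost of the plan sending the atom xs[i] to ys[perm[i]]."""
    return sum(F(cost(x, ys[j]), n) for x, j in zip(xs, perm))

# Primal value, restricted to permutation couplings
primal = min(plan_cost(perm) for perm in permutations(range(n)))

def is_1_lipschitz(u, points):
    return all(u[p] - u[q] <= cost(p, q) for p in points for q in points)

def dual_value(u):
    """Integral of u against mu - nu."""
    return sum(F(u[x], n) for x in xs) - sum(F(u[y], n) for y in ys)

u_opt = {0: 0, 1: -1, 2: -2, 3: -3, 4: -4, 5: -3}   # hand-picked maximiser
u_other = {p: -p for p in range(6)}                  # admissible, not optimal

print(primal, dual_value(u_opt), dual_value(u_other))
```

The potential `u_opt` decreases with slope one along each transported segment, which is exactly the extremality condition u(x) - u(y) = c(x, y) on the support of the optimal plan; `u_other` is admissible and only gives a lower bound.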

We will now specialize to the case where $X$ is a compact manifold equipped with a geodesic distance. This will allow us to link the original problem to another primal-dual formulation closer to that considered in the section “Primal-dual formulations in mechanics” and leading to a connection with partial differential equations. As a model example, let us assume that $X = \overline\Omega$, where $\Omega$ is a bounded connected open subset of $\mathbb{R}^n$ with a Lipschitz boundary. Let $\Sigma \subset \overline\Omega$ be a compact subset (on which the transport will have zero cost) and define

\[ c(x, y) := \inf\{ \mathcal{H}^1(S \setminus \Sigma) : S \text{ Lipschitz curve joining } x \text{ to } y,\ S \subset \overline\Omega \} \qquad [9] \]

where $\mathcal{H}^1$ denotes the one-dimensional Hausdorff measure (length). It is easy to check that

\[ c(x, y) = \min\{ \delta_\Omega(x, y),\ \delta_\Omega(x, \Sigma) + \delta_\Omega(y, \Sigma) \} \]

where $\delta_\Omega(x, y)$ is the geodesic distance on $\Omega$ (induced by the Euclidean norm). Furthermore, the following characterization holds:

\[ u \in {\rm Lip}_1(X) \iff u \in W^{1,\infty}(\Omega),\ |\nabla u| \le 1 \text{ a.e. in } \Omega,\ u = \text{cte on } \Sigma \qquad [10] \]

Since $f := \mu - \nu$ is balanced, the value of the constant on $\Sigma$ in [10] is irrelevant and can be set to 0. Thus we may rewrite the right-hand side member of [8] in an equivalent way as

\[ \max\Big\{ \int_{\overline\Omega} u\,df : u \in W^{1,\infty}(\Omega),\ |\nabla u| \le 1 \text{ a.e. on } \Omega,\ u = 0 \text{ on } \Sigma \Big\} \qquad [11] \]


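We will now derive a new dual problem for [11] by using Proposition 14. To this aim, we consider $X = C^1(\overline\Omega)$ (as a closed subspace of $W^{1,\infty}(\Omega)$), $Y = C^0(\overline\Omega; \mathbb{R}^n)$, $Y^* = \mathcal{M}_b(\overline\Omega; \mathbb{R}^n)$ and the operator $A : u \in X \mapsto \nabla u \in Y$.

Theorem 25 Let $\mu, \nu \in \mathcal{P}(\overline\Omega)$, $f = \mu - \nu$ and $c$ defined by [9]. Then,

\[ T_c(\mu, \nu) = \min\Big\{ \int_{\overline\Omega} |\lambda| : \lambda \in \mathcal{M}_b(\overline\Omega; \mathbb{R}^n),\ -{\rm div}\,\lambda = f \text{ on } \overline\Omega \setminus \Sigma \Big\} \qquad [12] \]

where the divergence condition is intended in the sense that

\[ \int_{\overline\Omega} \lambda \cdot \nabla\varphi = \int_{\overline\Omega} \varphi\,df \]

for all $\varphi \in C^\infty$ compactly supported in $\mathbb{R}^n \setminus \Sigma$.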


Proof (sketch) We apply Proposition 14 with $\Phi(u) = -\int_{\overline\Omega} u\,df$ if $u = 0$ on $\Sigma$ ($+\infty$ otherwise), $A = \nabla$, and $\Psi(v) = 0$ if $|v| \le 1$ on $\overline\Omega$ ($+\infty$ otherwise). We obtain that the minimum $\alpha$ in [12] is reached and that $\alpha = \beta$, where

\[ -\beta := \inf\Big\{ -\int_{\overline\Omega} u\,df : u \in C^1(\overline\Omega),\ |\nabla u| \le 1 \text{ on } \overline\Omega,\ u = 0 \text{ on } \Sigma \Big\} \]

To prove that $\beta = T_c(\mu, \nu) = \sup\,[11]$, we consider a maximizer $\bar u$ in [11] and prove that it can be approximated uniformly by a sequence $\{u_n\}$ of functions in $C^1(\overline\Omega)$ which satisfy the same constraints. This technical part is done by truncation and convolution arguments (we refer to Bouchitté et al. (2003) for details). □


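Remark 26 By localizing the integral identity associated with [12], it is possible to deduce the optimality conditions which characterize optimal pairs $(\bar u, \bar\lambda)$ for [11], [12] (without requiring any regularity). This is done by using a weak notion of tangential gradient with respect to a measure (see Bouchitté et al. (1997) and Bouchitté and Fragalà (2002)). If $\bar\lambda = \sigma\,dx$ where $\sigma \in L^1(\Omega; \mathbb{R}^n)$ and if $\Sigma \subset \partial\Omega$, then we find that $\sigma = a \nabla\bar u$, where the pair $(\bar u, a)$ solves the following system:

\[ \begin{cases} -{\rm div}(a \nabla\bar u) = f & \text{on } \Omega \quad \text{(diffusion equation)} \\ |\nabla\bar u| = 1 & \text{a.e. on } \{a > 0\} \quad \text{(eikonal equation)} \\ \bar u = 0 & \text{a.e. on } \Sigma \\ a\,\dfrac{\partial\bar u}{\partial n} = 0 & \text{on } \partial\Omega \setminus \Sigma \end{cases} \]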
Remark 27 Given a solution $\bar\gamma$ for [6], we can construct a solution for [12] by selecting for every $(x, y) \in {\rm spt}(\bar\gamma)$ a geodesic curve $S_{xy}$ joining $x$ and $y$ (possibly passing through the free-cost zone $\Sigma$) and by setting, for every test $\phi$:

\[ \langle \bar\lambda, \phi \rangle := \int_{\overline\Omega \times \overline\Omega} \Big( \int_{S_{xy}} \phi \cdot \tau_{S_{xy}}\,d\mathcal{H}^1 \Big)\,\bar\gamma(dx\,dy) \]

where $\tau_{S_{xy}}$ denotes the unit oriented tangent vector (see Bouchitté and Buttazzo (2001)). It is also possible to show (see Ambrosio (2003)) that any solution $\bar\lambda$ can be represented as before through a particular solution $\bar\gamma$. As a consequence, any solution $\bar\lambda$ of [12] is supported in the geodesic envelope of the set ${\rm spt}(\mu) \cup {\rm spt}(\nu) \cup \Sigma$. However, we stress the fact that, in general, there is no uniqueness at all of the optimal triple $(\bar\gamma, \bar u, \bar\lambda)$ for [6], [11] and [12].


Remark 28 An approximation procedure for particular solutions of problems [11], [12] can be obtained by solving a p-Laplace equation and then by sending $p$ to infinity. Precisely, consider the solution $u_p \in W^{1,p}(\Omega)$ of

\[ -{\rm div}(|\nabla u|^{p-2} \nabla u) = f \text{ on } \overline\Omega \setminus \Sigma, \qquad u = 0 \text{ on } \Sigma \]

which, for $p > n$, exists (due to the compact embedding $W^{1,p}(\Omega) \subset C^0(\overline\Omega)$) and is unique. In Bouchitté et al. (2003) it is proved that the sequence $\{(u_p, \sigma_p)\}$, where $\sigma_p = |\nabla u_p|^{p-2} \nabla u_p$, is relatively compact in $C^0(\overline\Omega) \times \mathcal{M}_b(\overline\Omega; \mathbb{R}^n)$ (weakly star with respect to the second component) and that every cluster point $(\bar u, \bar\lambda)$ solves [11], [12]. It is an open problem to know whether or not such a cluster point is unique. If the answer is “yes,” the process described above would select one optimal pair among all possible solutions. As far as problem [11] is concerned, this problem is connected with the theory of viscosity solutions for the infinite Laplacian (see Evans (1997)) although this theory does not provide an answer as it erases the role of the source term $f$. On the other hand, a new entropy selection principle should be found for the solutions of dual problem [12]. In fact, the following partial result holds: let $E : \mathcal{M}_b(\overline\Omega; \mathbb{R}^n) \to \mathbb{R} \cup \{+\infty\}$ be the functional defined by

\[ E(\lambda) = \begin{cases} \int_\Omega |\sigma| \log(|\sigma|)\,dx & \text{if } \lambda \ll dx,\ \sigma = \dfrac{d\lambda}{dx} \\ +\infty & \text{otherwise} \end{cases} \]

Assume that [12] admits at least one solution $\lambda_0$ such that $E(\lambda_0) < +\infty$. Then it can be shown that the sequence $\{\sigma_p\}$ does converge weakly-star to $\bar\lambda$, the unique minimizer of the problem

\[ \inf\{ E(\lambda) : \lambda \text{ solution of } [12] \} \]

The general case, in particular when all optimal measures are singular, is open.


Remark 29 Variational problems [11], [12] have important counterparts in the theory of elasticity and in optimal design problems (see Bouchitté and Buttazzo (2001)). They read, respectively, as

\[ \max\Big\{ \int_{\overline\Omega} u \cdot df : u \in \bigcap_{p > 1} W^{1,p}(\Omega; \mathbb{R}^n),\ e(u)(x) \in K \text{ a.e. on } \Omega,\ u = 0 \text{ on } \Sigma \Big\} \]

\[ \min\Big\{ \int_{\overline\Omega} \rho_K^0(\lambda) : \lambda \in \mathcal{M}_b(\overline\Omega; \mathbb{R}^{n^2}_{\rm sym}),\ -{\rm div}\,\lambda = f \text{ on } \overline\Omega \setminus \Sigma \Big\} \]

where $K \subset \mathbb{R}^{n^2}_{\rm sym}$ is a convex compact subset of symmetric second-order tensors associated with the elastic material, $\rho_K^0(\xi) = \sup\{ \xi \cdot z : z \in K \}$ is convex positively 1-homogeneous and the functional on measures $\int_{\overline\Omega} \rho_K^0(\lambda)$ is intended in the sense given in [1]. A celebrated example is given by Michell's problem (Michell 1904) where $n = 2$ and $K := \{ z \in \mathbb{R}^{n^2}_{\rm sym} : \rho(z) \le 1 \}$, $\rho(z)$ being the largest singular value of $z$. The potential $\rho_K^0$ is given by the nondifferentiable convex function $\rho_K^0(\xi) = \tau_1(\xi) + \tau_2(\xi)$, where the $\tau_i(\xi)$'s are the singular values of $\xi$.

Unfortunately, it is not known if the vector variational problem above can be linked to an optimal transportation problem of the type [6], even if the analogue of equivalence [10] does exist in Michell's case, namely (for $\Omega$ convex):

\[ \rho(e(u)) \le 1 \text{ on } \Omega \iff |(u(x) - u(y)\,|\,x - y)| \le |x - y|^2,\ \forall (x, y) \]
Introduction


Mathematical cosmology focuses on the geometrical 
and mathematical aspects of the study of the 
universe as a whole. Because the structure of 
spacetime (with metric tensor $g_{ab}(x^i)$) is governed
by gravity, with matter and energy causing space- 
time curvature according to the nonlinear gravita- 
tional field equations of the theory of general 
relativity, it has its roots in differential geometry. It 
is to be distinguished from the three other major 
aspects of modern cosmology, namely astrophysical 
cosmology, high-energy physics cosmology, and 
observational cosmology; see Peacock (1999) for 
these aspects. 
The Einstein field equations (EFEs) are

\[ R_{ab} - \tfrac{1}{2} R\,g_{ab} + \Lambda g_{ab} = \kappa T_{ab} \qquad [1] \]

where $R_{ab}$ is the Ricci tensor, $R$ the Ricci scalar, $T_{ab}$ the matter tensor, $\Lambda$ the cosmological constant, and $\kappa$ the gravitational constant. Cosmological models differ from generic solutions of these equations in that they have preferred world lines in spacetime associated with the motion of matter and distribution of radiation (Ellis 1971). This is a classic case of a broken symmetry: the underlying equations [1] are locally Lorentz invariant but their solutions are not. These preferred world lines, characterized by a unit 4-velocity vector $u^a$, are associated at late times with “fundamental observers,” and a key aspect of cosmological modeling is deriving the observational relations such observers would find through astronomical observations.

The dynamics of cosmological models is deter- 
mined by their matter content. This is usually 
represented in simplified form, often using the 
“perfect-fluid” approximation to represent the effect 
of matter or radiation; that is, 


\[ T_{ab} = (\rho + p) u_a u_b + p\,g_{ab} \qquad [2] \]

where $\rho$ is the energy density and $p$ the pressure, and the matter 4-velocity $u^a$ is the preferred cosmological 4-velocity. This description can include a scalar field $\phi$ with dynamics governed by the Klein-Gordon equation, provided $u^a$ is normal to spacelike surfaces $\{\phi = \text{const.}\}$. Suitable equations of state describe the nature of the matter envisaged (e.g., $p = 0$ for baryons, whereas $p = \rho/3$ for radiation); in the case of a scalar field with potential $V(\phi)$ and spacelike surfaces $\{\phi = \text{const.}\}$, on choosing $u^a$ orthogonal to these surfaces, the stress tensor has a perfect-fluid form with $\rho = \tfrac{1}{2}\dot\phi^2 + V(\phi)$, $p = \tfrac{1}{2}\dot\phi^2 - V(\phi)$. A cosmological constant $\Lambda$ can be represented as a perfect fluid with $\rho + p = 0$, $\Lambda = \kappa\rho$. More general matter may involve a momentum flux density $q_a$ and anisotropic pressures $\pi_{ab}$ (Ehlers 1961). Whatever the nature of the matter, it
will usually be required to satisfy energy conditions 
(Hawking and Ellis 1973). All realistic matter has a 
positive inertial mass density: 


\[ \rho + p > 0 \qquad [3] \]


(note that realistic cosmological models are non- 
empty), whereas all ordinary matter has a positive 
gravitational mass density: 


\[ \rho + 3p > 0 \qquad [4] \]


but this is not necessarily true for a scalar field or 
effective cosmological constant. 

Mathematical cosmology (Ellis and van Elst 1999) 
studies (1) generic properties of solutions with a 
preferred 4-velocity field and matter content as 
indicated above, (2) the standard FLRW models, 
(3) approximate FLRW solutions, and (4) other 
exact and approximate cosmological solutions. The 
ultimate underlying issue is (5) the origin of the 
universe. We look at these in turn. We aim to use 
covariant methods as far as possible, to avoid being 
misled by coordinate effects, and to obtain exact 
solutions and exact results as far as possible, because 
approximate methods can be misleading in the case 
of these nonlinear field equations. 


Exact Properties 


We can split the equations into spacelike and timelike parts relative to the 4-velocity $u^a$, obtaining the (1+3) covariant dynamical equations and identities in terms of the fluid shear $\sigma_{ab}$, vorticity $\omega_{ab}$, expansion $\Theta = u^a{}_{;a}$, and acceleration $a^a = u^a{}_{;b} u^b$ (Ehlers 1961, Ellis 1971, Ellis and van Elst 1999). The energy density of a perfect fluid obeys the conservation equation

\[ \dot\rho = -3(\rho + p)\frac{\dot S}{S} \qquad [5] \]

with extra terms occurring in the case of more complex matter. From the momentum equations, pressure-free solutions are geodesic ($a^a = 0$). The crucial Raychaudhuri-Ehlers equation for the time derivative of the expansion (Ehlers 1961) can be written as

\[ 3\frac{\ddot S}{S} = 2(\omega^2 - \sigma^2) + a^a{}_{;a} - \frac{\kappa}{2}(\rho + 3p) + \Lambda \qquad [6] \]

where the representative length scale $S$ is defined by $\Theta = 3\dot S/S$. This is the basis of the “fundamental singularity theorem”: if in an expanding universe $\omega = 0 = a^a$ and the combined matter present satisfies [4], with $\Lambda \le 0$, then there was a singularity where $S \to 0$ a finite time $t_0 < 1/H_0$ ago, $H_0 = (\dot S/S)_0$ being the present value of the Hubble constant. The energy density will diverge there, so this is a spacetime singularity: an origin of physics, matter, and spacetime itself. However, the deduction does not follow if there is rotation or acceleration, which could conceivably avoid the singularity, so this result is by itself inconclusive for realistic cosmologies.

The vorticity obeys conservation laws analogous 
to those in Newtonian theory (Ehlers 1961). 
Vorticity-free solutions ($\omega = 0$) occur whenever the
fluid flow lines are hypersurface-orthogonal in 
spacetime, that is, there exists a cosmic time 
function for the comoving observers, which will 
measure proper time along the flow lines if 
additionally the fluid flow is geodesic. The rate of 
change of shear is related to the conformal curvature 
(Weyl) tensor, which represents the free gravitational field, and which splits into an electric part $E_{ab}$ and a magnetic part $H_{ab}$ in close analogy with electromagnetic theory. Shear-free solutions ($\sigma = 0$) are very special because they strongly constrain the Weyl tensor; indeed if the flow is shear free and geodesic, then it either does not expand ($\Theta = 0$), or does not rotate ($\omega = 0$) (Ellis 1967). The set of
cosmological observations associated with generic 
cosmological models has been characterized in 
power series form by Kristian and Sachs (1966), 
and that result has been extended to general models 
by Ellis et al. (1985). 

The local regularity of the theory is expressed in 
existence and uniqueness theorems for the EFEs, 
provided the matter behavior is well defined through 
prescription of suitable equations of state (Hawking 
and Ellis 1973). However, in general the theory 
breaks down in the large, and this feature is 
specified by the Hawking—Penrose singularity theo- 
rems, predicting the existence of a geodesic incom- 
pleteness of spacetime under conditions applicable 
to realistic cosmological models satisfying the energy 
conditions given by eqns [3] and [4] (Hawking and 
Ellis 1973, Tipler et al. 1980). However, the 
conclusion does not follow if the energy conditions 
are not satisfied. Furthermore, the deduction follows 


only if the gravitational field equations remain valid 
to arbitrarily early times; but we would in fact 
expect that, at high enough energy densities, 
quantum gravity would take over from classical 
gravity, so whether or not there was indeed a 
singularity would depend on the nature of the as 
yet unknown theory of quantum gravity. The cash 
value of the singularity theorems then is the 
implication that, when the energy conditions are 
satisfied, one would indeed be involved in such a 
quantum gravity realm in the very early universe. 


The Standard Friedmann-Lemaître Models


The standard models of cosmology are the Friedmann-Lemaître (FL) models with Robertson-Walker (RW) geometry: that is, they are exactly spatially homogeneous and locally isotropic, invariant under a $G_6$ of isometries (Robertson 1933, Ehlers 1961). They have a unique cosmic time function $t$, with space sections $\{t = \text{const.}\}$ of constant spatial curvature orthogonal to the uniquely preferred 4-velocity $u^a$. The fluid acceleration, vorticity, and shear all vanish, and all physical quantities depend only on the time coordinate $t$. They can be represented by a metric with scale factor $S(t)$:

\[ ds^2 = g_{ab}\,dx^a dx^b = -dt^2 + S^2(t)\{ dr^2 + f^2(r)(d\theta^2 + \sin^2\theta\,d\phi^2) \} \qquad [7] \]

in comoving coordinates $(x^a) = (t, r, \theta, \phi)$, where $f(r) = \{\sin r, r, \sinh r\}$ if $\{k = +1, 0, -1\}$, and the matter is a perfect fluid with 4-velocity vector $u^a = dx^a/ds = \delta^a_0$. The curvature of the space sections $\{t = \text{const.}\}$ is $K = k/S^2$; these 3-spaces are necessarily closed (compact) if they are positively curved ($k = +1$), but may be open or closed in the flat ($k = 0$) and negatively curved ($k = -1$) cases, depending on their topology (Lachièze-Rey and Luminet 1995).

Matter obeys the conservation equation [5], whose outcome depends on the equation of state; for baryons $\rho = M/S^3$, whereas for radiation $\rho = M/S^4$, where $M$ is a constant. The dynamics of the models is governed by the Raychaudhuri equation

\[ 3\frac{\ddot S}{S} = -\frac{\kappa}{2}(\rho + 3p) + \Lambda \qquad [8] \]

which has the Friedmann equation

\[ 3\frac{\dot S^2}{S^2} = \kappa\rho + \Lambda - \frac{3k}{S^2} \qquad [9] \]

as a first integral whenever $\dot S \ne 0$. Depending on the matter components present, one can qualitatively characterize the dynamical behavior of these models (Robertson 1933) and find exact and approximate solutions to these equations as well as phase planes representing the relation of the different models to each other; for example, Ehlers and Rindler (1989) give the phase planes for models with noninteracting matter and radiation and an arbitrary cosmological constant. Universes with maxima or minima in $S(t)$ can only occur if $k = +1$; when $\Lambda = 0$, the universe recollapses in the future iff $k = +1$. Static solutions are possible only if $k = +1$ and (assuming [4]) $\Lambda > 0$. The simplest expanding solutions are the Einstein-de Sitter universes with $k = 0 = \Lambda$.
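As a consistency check, one can verify numerically that the Einstein-de Sitter model satisfies the conservation, Raychaudhuri and Friedmann equations simultaneously. The sketch below is an illustration under stated assumptions: pressure-free matter (p = 0), k = 0 and Lambda = 0, the normalisation S(t) = t^(2/3) (so that kappa times rho equals (4/3)/S^3), and the hypothetical helper names `einstein_de_sitter` and `residuals`.

```python
def einstein_de_sitter(t):
    """Scale factor S = t^(2/3), its derivatives, and kappa*rho for dust.

    ASSUMPTION: normalisation S(1) = 1, i.e. kappa*M = 4/3 in rho = M / S^3.
    """
    S = t ** (2.0 / 3.0)
    S_dot = (2.0 / 3.0) * t ** (-1.0 / 3.0)
    S_ddot = -(2.0 / 9.0) * t ** (-4.0 / 3.0)
    kappa_rho = (4.0 / 3.0) / S ** 3
    kappa_rho_dot = -(8.0 / 3.0) / t ** 3      # time derivative of (4/3) t^-2
    return S, S_dot, S_ddot, kappa_rho, kappa_rho_dot

def residuals(t):
    """Left minus right sides of the three equations; all should vanish."""
    S, S_dot, S_ddot, kr, kr_dot = einstein_de_sitter(t)
    raychaudhuri = 3.0 * S_ddot / S + 0.5 * kr        # 3 S''/S = -(kappa/2) rho
    friedmann = 3.0 * S_dot ** 2 / S ** 2 - kr        # 3 S'^2/S^2 = kappa rho
    conservation = kr_dot + 3.0 * kr * S_dot / S      # rho' = -3 rho S'/S
    return raychaudhuri, friedmann, conservation

for t in (0.5, 1.0, 3.0):
    print(t, residuals(t))

# In this model H = S'/S = 2/(3t), so the age is t0 = 2/(3 H0) < 1/H0.
t0 = 3.0
H0 = einstein_de_sitter(t0)[1] / einstein_de_sitter(t0)[0]
print(abs(t0 - 2.0 / (3.0 * H0)))
```

The age relation t0 = 2/(3 H0) is consistent with the bound t0 < 1/H0 that holds when the gravitational mass density is positive and the cosmological constant is not positive.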

Equation [8] is a special case of [6], with 
corresponding implications: if the combined matter 
present satisfies [4], with $\Lambda \le 0$, then there must have
been an initial singularity, or at least the universe 
must have emerged from a quantum gravity domain. 
The temperature would have been arbitrarily high in 
the past, so there was a hot big bang era in the early 
universe where matter and radiation were in equili- 
brium with each other at very high temperatures that 
rapidly fell as the universe expanded. Many physical 
processes took place then, in particular nucleosynth- 
esis of light elements took place at ~10? K. Decou- 
pling of matter and radiation took place at a 
temperature of ~4000K, followed by formation of 
stars and galaxies (see Peacock (1999) for a discus- 
sion of these physical processes). The black-body 
radiation emitted by the surface of last scattering at 
4000 K is observed by us today as cosmic black-body 
radiation (CBR) at a temperature of 2.75 K. 

One can determine observational relations for 
these models such as the magnitude-redshift relation 
for “standard candles” at recent times from the EFEs 
(Sandage 1961). The aim of observations is to 
determine the Hubble constant $H_0$, dimensionless deceleration parameter $q_0 = -(1/H_0^2)(\ddot S/S)_0$, and normalized density parameters $\Omega_{0i} = \kappa\rho_{0i}/3H_0^2$ for each component of matter present. The spatial curvature and the cosmological constant then follow from [6] and [9]; also the present scale factor $S_0$ is determined if $k \ne 0$. The universe is of positive spatial curvature ($k = +1$) iff $\Omega_0 := \Omega_m + \Omega_\Lambda > 1$, where $\Omega_m = \sum_i \Omega_{0i}$, $\Omega_\Lambda = \Lambda/3H_0^2$. Current observations indicate $\Omega_m \simeq 0.3$, $\Omega_\Lambda \simeq 0.7$, $\Omega_0 \simeq 1.02 \pm 0.02$. Because the nucleosynthesis results limit the baryon density to a very low value ($\Omega_{0b} \simeq 0.02$),
which is about the same as the density of luminous 
matter, this indicates the dominant presence of both 
nonbaryonic dark matter and a repulsive force 
corresponding to either a cosmological constant or 
varying scalar field (dark energy). 
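The way the deceleration parameter and the spatial curvature follow from the density parameters can be made explicit. The sketch below assumes pressure-free matter at the present time, for which the Raychaudhuri equation gives q_0 = Omega_m/2 - Omega_Lambda and the Friedmann equation gives k/S_0^2 = H_0^2 (Omega_0 - 1); the function names, the tolerance and the units (speed of light set to one) are choices made for this illustration only.

```python
import math

def deceleration(omega_m, omega_lambda):
    """q_0 = Omega_m / 2 - Omega_Lambda (ASSUMES pressure-free matter today)."""
    return 0.5 * omega_m - omega_lambda

def curvature_index(omega_m, omega_lambda, tol=1e-9):
    """Sign k of the spatial curvature, from Omega_0 - 1 = k / (S_0 H_0)^2."""
    excess = omega_m + omega_lambda - 1.0
    if abs(excess) <= tol:
        return 0
    return 1 if excess > 0 else -1

def scale_factor_today(omega_m, omega_lambda, hubble):
    """S_0 = 1 / (H_0 sqrt|Omega_0 - 1|); undetermined (None) when k = 0."""
    if curvature_index(omega_m, omega_lambda) == 0:
        return None
    return 1.0 / (hubble * math.sqrt(abs(omega_m + omega_lambda - 1.0)))

# Values quoted in the text: Omega_m ~ 0.3, Omega_Lambda ~ 0.7
print(deceleration(0.3, 0.7))          # negative: accelerating expansion
print(curvature_index(0.3, 0.7))       # flat within the tolerance
# Central value Omega_0 ~ 1.02, obtained here by raising Omega_Lambda
print(curvature_index(0.3, 0.72))      # positive curvature
print(scale_factor_today(0.3, 0.72, 1.0))   # in units of 1 / H_0
# Einstein-de Sitter limit
print(deceleration(1.0, 0.0))
```

A negative deceleration parameter for the quoted values reflects the dominance of the repulsive term, while the curvature index is very sensitive to small changes in the total density parameter near one.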

Crucial causal limitations occur because of the 
existence of particle horizons (Rindler 1956), the 
nature of which is most clear when represented in
conformal diagrams (Hawking and Ellis 1973, Tipler 
et al. 1980). These result from the fact that light 
can only proceed a finite distance in the finite time 
since the origin of the universe, and imply that for 
a standard radiation-dominated hot-big-bang early 
universe, regions of larger than ~1° angular size on 
the surface of last scattering, which emits the CBR, 
are causally disconnected: hence, no causal process 
since the start of the universe can account for the 
extreme isotropy of the CBR ($\Delta T/T \sim 10^{-5}$ over
the whole sky, once a dipole anisotropy $\Delta T/T \sim 10^{-3}$
due to our local velocity relative to the
cosmological rest frame is allowed for). This is the 
“horizon problem,” one of the driving forces 
behind the theory of “inflation” (Guth 1981): the 
idea that, in the very early universe, a slow-rolling 
scalar field led to a brief exponential expansion 
through at least 50 e-folds (during which time the 
spacetime was approximately de Sitter), thus 
smoothing the universe and solving the horizon 
problem (Guth 1981, Peacock 1999). This is 
possible because a scalar field can violate the energy
condition [4] and so allows acceleration: $\ddot S > 0$.
Consequently, there are now many studies of the 
dynamics of FLRW solutions driven by scalar fields 
and the subsequent decay of these scalar fields into 
radiation. One interesting point is that one can 
obtain exact solutions of this kind for arbitrarily 
chosen evolutions S(t), provided they satisfy a 
restriction on the magnitude of $°, by running the 
field equations backwards to determine the needed 
potential $V(\phi)$ (Ellis and Madsen 1991). The
inflationary paradigm is dominant in present-day 
theoretical cosmology, but suffers from the problem 
that it is not in fact a well-defined theory, for there 
is no single accepted proposal for the physical 
nature of the effective scalar field underlying the 
supposed exponential expansion; rather there are 
numerous competing proposals. As the inflaton has 
not yet been identified, this theory is not yet 
soundly linked to well-established physics. 


Approximate FL Solutions 


The real universe is, of course, not exactly FL, and 
studies of structure formation depend on studies of 
solutions that are approximately FL models — they 
are realistic (“lumpy”) universe models. These 
enable detailed studies of observable properties 
such as CBR anisotropies and gravitational lensing 
induced by matter inhomogeneities, and of the 
development of those inhomogeneities from quan- 
tum fluctuations in the very early universe that then 
get expanded to very large scales by inflation. 
The key problem here is that apart from the standard
coordinate freedom allowed in general relativity, there 
is a serious gauge issue: the background FL model is not 
uniquely determined by the realistic universe model; 
however, the magnitudes of many perturbed quantities 
depend on how it is fitted into the lumpy model. For 
example, the density perturbation $\delta\rho$ is determined pointwise by the equation

\[ \delta\rho(x^i) = \rho(x^i) - \bar\rho(x^i) \]

where $\bar\rho(x^i)$ is the background density. But by altering the correspondence between the background and realistic models (specifically, by the choice of surfaces $\bar\rho(x^i) = \text{const.}$ in the realistic model) one can assign that quantity any value, including zero (if one chooses $\bar\rho(x^i) = \rho(x^i)$). This is the “gauge problem.”

One can handle it by using standard variables and 
keeping close track of the gauge freedom at all 
times. However, one then ends up with higher-order 
equations than necessary because some of the 
perturbation modes present are pure gauge modes 
with no physical significance. Alternatively, one can 
fix the gauge by some unique specification of how 
the background model is fitted into the realistic 
model, but there is no agreement on a unique way to 
do this, and different choices give different answers. 
The preferable resolution is to use gauge-invariant 
variables, either coordinate based (Bardeen 1980) or 
covariant, based on the (1+3) covariant decomposi- 
tion of spacetime quantities mentioned above (Ellis 
and Bruni 1989), in either case resulting in pertur- 
bation equations without gauge freedom and of 
order corresponding to the physical degrees of 
freedom. The key point in the latter approach is to 
choose covariant variables that vanish in the back- 
ground spacetime; they are then automatically gauge 
invariant. Realistic structure formation studies carry 
out this process for a mixture of matter components 
with different average velocities, and extend to a 
kinetic theory description of the background radia- 
tion (see Ellis and van Elst (1999) and references 
therein). The outcome is a prediction of the CBR 
anisotropy power spectrum, determined by the 
inhomogeneities in the gravitational field and the 
motions of the matter components at decoupling 
(Sachs and Wolfe 1967). This spectrum can then be 
compared with observations and used in determin- 
ing the values of the cosmological parameters 
mentioned above (see Peacock 1999). 
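
The gauge dependence of the density perturbation can be made concrete with a short first-order calculation. This is only a sketch under stated assumptions: the background density $\bar\rho$ is taken to depend on a time label $t$ alone, and the change of correspondence is modeled as assigning the background time $t + T(x^i)$, instead of $t$, to each point of the realistic model, with $T$ small. The function $T$ and the projected gradient $X_a$ are illustrative names, not notation from the text.

```latex
\begin{aligned}
\delta\rho(x^i) &= \rho(x^i) - \bar\rho(t), \\
\widetilde{\delta\rho}(x^i) &= \rho(x^i) - \bar\rho\bigl(t + T(x^i)\bigr)
   \simeq \delta\rho(x^i) - \dot{\bar\rho}(t)\, T(x^i), \\
T = \frac{\delta\rho}{\dot{\bar\rho}} \quad &\Longrightarrow \quad \widetilde{\delta\rho} = 0, \\
X_a := h_a{}^{b}\,\nabla_b \rho , \qquad \bar X_a &= 0 \ \text{in an FL background}
   \quad \Longrightarrow \quad X_a \ \text{is gauge invariant to first order.}
\end{aligned}
```

The third line shows how a suitable choice of correspondence removes the density perturbation entirely, while the last line illustrates the covariant strategy: a variable that vanishes in the background cannot be changed at first order by altering the correspondence.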

One crucial issue is why it is reasonable to use a 
perturbed FL model for the observable region of the 
universe. The key argument is that this is plausible 
because of the high isotropy of all observations 
around us when averaged on a sufficiently large 
spatial scale, and particularly the very low anisotropy
of the CBR. The Ehlers-Geren-Sachs (EGS) theorem
(Ehlers et al. 1968) provides a sound basis for this
argument: it shows that if freely propagating CBR
(obeying the Liouville equation) is exactly isotropic in
an expanding universe domain U, then the universe is
exactly FL in that domain (i.e., it has exactly the RW
spatially homogeneous and isotropic geometry there),
the point being that any inhomogeneities in the 
matter distribution between us and the surface of last 
scattering will produce anisotropies in the CBR 
temperature we measure. But that result does not 
apply to the real universe, because the CBR is not 
exactly isotropic. The “almost EGS” theorem 
(Stoeger et al. 1995) shows that this result is stable: 
almost isotropic CBR in the domain U implies that 
the universe is almost-FL in that domain. The 
application to the real universe comes by making a 
weak Copernican assumption: “we assume we are 
not special, so all observers in U (taken to be the
visible part of the universe) will also see almost 
isotropic CBR, just as we do.” The result then 
follows. A further argument for homogeneity of the 
universe comes from postulating “uniform thermal 
histories” (Bonnor and Ellis 1986), but that argument 
is yet to be completed and applied in a practical way. 


Anisotropic and Inhomogeneous Models 


The FL universes are geometrically extremely special. 
We wish further to understand the full range of 
possible universe models, their dynamical behaviors, 
and which of them might, at some epoch, realistically 
represent the real universe. This enables us to see how 
the approximate FL models fit into this wider set of 
possibilities, and under what circumstances they are 
attractors in this set of cosmologies. 

Exact solutions are characterized by their spacetime
symmetries. Symmetries are characterized by
the dimension $s$ of the surfaces of homogeneity and
the dimension $q$ of the isotropy group at a general
point, together giving the dimension $r = s + q$ of the
group of isometries $G_r$ (at special points, such as a
center of symmetry, $s$ can decrease and $q$ increase
but always so that $r$ stays unchanged). In the case of
a cosmological model, because the 4-velocity $u^a$ is
invariant under isotropies, the only possible dimensions
for the isotropy group are $q = 3, 1, 0$; whereas
the dimension $s$ of the surfaces of homogeneity can
take any value from 4 to 0. This gives the basis for a
classification of cosmological spacetimes (Ellis 1967,
Ellis and van Elst 1999).

When $q = 3$, we have isotropic solutions (there
are no preferred spatial directions), and it is then
a theorem that they must be spatially homogeneous
FL universes (Ehlers 1961). When $q = 1$, we
have locally rotationally symmetric (LRS) solutions,
with precisely one preferred spacelike direction
at a generic point (Ellis 1967). When $q = 0$, the
solutions are anisotropic in that there can be no
continuous group of rotations leaving the solution
invariant; however, there can be discrete isotropies
in some special cases.

When $s = 4$, we have spacetime homogeneous solutions,
with all physical quantities constant; they cannot
expand (by [5] and [3]). Nevertheless, two cases are of
interest. For $q = 1$ ($r = 5$) we find the Gödel universe,
rotating everywhere with constant vorticity, which
illustrates important causal anomalies (Gödel 1949,
Hawking and Ellis 1973). For $q = 3$ ($r = 7$), we find
the Einstein “static universe” (Einstein 1917), the
unique nonexpanding FL model with $k = 1$ and $\Lambda > 0$.
It is of interest because it could possibly represent the
asymptotic initial state of nonsingular inflationary
universe models (Ellis et al. 2003). The higher-symmetry
models (de Sitter and anti-de Sitter
universes with higher-dimensional isotropy groups)
are not included here because they do not obey the
energy condition [3]: they are empty universes,
which can be interesting asymptotic states but are
not by themselves good cosmological models.

When $s = 3$, we have spatially homogeneous
evolving universe models. For $q = 0$ ($r = 3$), there
is a large family of Bianchi universes, spatially
homogeneous but anisotropic, classified into
nine types according to the structure constants of
the Lie algebra of the three-dimensional symmetry
group $G_3$. These can be “orthogonal”: the fluid flow
is orthogonal to the surfaces of homogeneity, or 
“tilted”; the latter case can have fluid rotation or 
acceleration, but the former cannot. They exhibit a 
large variety of behaviors, including power-law, 
oscillatory, and nonscalar singularities (Tipler et al. 
1980). A vexed question is whether truly chaotic 
behavior occurs in Bianchi IX models. The behavior 
of large families of these models has been character- 
ized in dynamical systems terms (Wainwright and 
Ellis 1996), showing the intriguing way that higher- 
symmetry solutions provide a “skeleton” that guides 
the behavior of lower-symmetry solutions in the 
space of spacetimes. Many Bianchi models can be 
shown to isotropize at late times, particularly if 
viscosity is present; thus, they are asymptotic to the 
FL universes in the far future. In some cases, Bianchi 
models exhibit intermediate isotropization: they are 
much like FL models for a large part of their life, but 
are very different from it both at very early and very 
late stages of their evolution. These could be good 
models of the real universe. An important theorem 
by Wald (1983) shows that a cosmological constant 
will tend to isotropize Bianchi solutions at late
times. This is an indication that inflation can
succeed in making anisotropic early states resemble 
FL models at later times. Observational properties 
like element abundances and CBR anisotropy 
patterns can be worked out in these models (some 
of them develop a characteristic isolated “hot spot” 
in the CBR sky). For $q = 1$ ($r = 4$), we have spatially
homogeneous LRS models, either Kantowski-Sachs
or Bianchi universes, and again observations can be 
worked out in detail and phase planes developed 
showing their dynamical behavior, often isotropiz- 
ing at late times. There are orthogonal and tilted 
cases, the latter possibly involving nonscalar singularities.
For $q = 3$ ($r = 6$), we have the isotropic FL
models, discussed above. Both the LRS and isotropic 
cases could be good models of the real universe. 

When $s = 2$, we have inhomogeneous evolving
models. This is a very large family, but the LRS
($q = 1$, $r = 3$) cases have been examined in detail; in
the case of pressure-free matter, these are the 
Tolman-Bondi inhomogeneous models (Bondi 
1947) that can be integrated exactly, and have 
been used for many interesting astrophysical and 
cosmological studies. Krasinski (1997) gives a very 
complete catalog of these and lower-symmetry 
inhomogeneous models and their uses in cosmology. 
A considerable challenge is the dynamical systems 
analysis for generic inhomogeneous models, needed 
to properly understand the early evolution of generic 
universe models (Uggla et al. 2003), and hence to 
determine what is generic behavior. 
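
The classification above can be summarized as a small lookup table. The following is a minimal sketch, not part of the original classification papers: the names `MODELS`, `isometry_dimension`, and `describe` are hypothetical, and the table lists only the families explicitly named in the text, keyed by the pair (s, q) with r = s + q.

```python
# Hypothetical lookup table for the (s, q) classification described above.
# s = dimension of the surfaces of homogeneity (0..4),
# q = dimension of the isotropy group (3, 1 or 0),
# r = s + q = dimension of the isometry group.
MODELS = {
    (4, 3): "Einstein static universe",
    (4, 1): "Goedel universe",
    (3, 3): "Friedmann-Lemaitre (FL) universes",
    (3, 1): "spatially homogeneous LRS models (Kantowski-Sachs or Bianchi)",
    (3, 0): "Bianchi universes",
    (2, 1): "LRS inhomogeneous models (Tolman-Bondi for pressure-free matter)",
}

ALLOWED_Q = (3, 1, 0)


def isometry_dimension(s: int, q: int) -> int:
    """Return r = s + q after checking the ranges stated in the text."""
    if q not in ALLOWED_Q:
        raise ValueError("isotropy group dimension must be 3, 1 or 0")
    if not 0 <= s <= 4:
        raise ValueError("homogeneity dimension must lie between 0 and 4")
    return s + q


def describe(s: int, q: int) -> str:
    """One-line summary of the symmetry class (s, q)."""
    r = isometry_dimension(s, q)
    name = MODELS.get((s, q), "no named family listed in the text")
    return f"s={s}, q={q}, r={r}: {name}"


if __name__ == "__main__":
    for key in sorted(MODELS, reverse=True):
        print(describe(*key))
```

Classes that are absent from the table are not claimed to be empty; they are simply not named in this section.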


The Origin of the Universe 


The issue underlying all this is what led to the initial 
conditions for the universe, for example, providing 
the starting conditions for inflation. There are many 
approaches to studying the quantum gravity phase 
of cosmology, including the Wheeler-DeWitt equation,
the path-integral approach, string cosmology,
pre-big bang theory, brane cosmology, the ekpyrotic 
universe, the cyclic universe, and loop quantum 
gravity approaches. These lie beyond the purview of 
the present article, except to say that they are all 
based on unproven extrapolations of known physics. 
The physically possible paths will become clearer as 
the nature of quantum gravity is elucidated. 

It is pertinent to note that there exist nonsingular 
realistic cosmological solutions, possible in the light 
of the violations of the energy condition enabled by 
the supposed scalar fields that underlie inflationary 
universe theory. These nonsingular solutions can even 
avoid the quantum gravity era (Ellis et al. 2003). 
However, they have very fine-tuned initial conditions,
which is nowadays considered a disadvantage; but
there is no proof that whatever processes led to the
existence of the universe preferred generic rather than
fine-tuned conditions; this is a philosophical rather
than physical assumption. It may well be that, as
regards the start of the universe, the options are that 
either an initial singularity occurred, or the initial 
conditions were very finely tuned and allowed an 
infinitely existing universe. Whether this conjecture
is in fact valid, and if so which option is the
better one, are intriguing open questions.


See also: Einstein Equations: Exact Solutions;
Einstein-Cartan Theory; General Relativity: Experimental
Tests; General Relativity: Overview; Gravitational
Lensing; Lie Groups: General Theory; Newtonian Limit of
General Relativity; Quantum Cosmology; Shock Wave
Refinement of the Friedman-Robertson-Walker Metric;
Spacetime Topology, Causal Structure and Singularities;
String Theory: Phenomenology.


Cotangent Bundle Reduction


Introduction


The general symplectic reduction theory (see
Symmetry and Symplectic Reduction) becomes
much richer and has many applications if the
symplectic manifold is the cotangent bundle
$(T^*Q, \Omega_Q = -d\Theta_Q)$ of a manifold $Q$. The canonical
1-form $\Theta_Q$ on $T^*Q$ is given by
$\Theta_Q(\alpha_q)(V_{\alpha_q}) = \alpha_q(T_{\alpha_q}\pi_Q(V_{\alpha_q}))$,
for any $q \in Q$, $\alpha_q \in T^*_qQ$, and
tangent vector $V_{\alpha_q} \in T_{\alpha_q}(T^*Q)$, where $\pi_Q : T^*Q \to Q$
is the cotangent bundle projection and
$T_{\alpha_q}\pi_Q : T_{\alpha_q}(T^*Q) \to T_qQ$ is its tangent map (or derivative)
at $\alpha_q$. In natural cotangent bundle coordinates $(q^i, p_i)$,
we have $\Theta_Q = p_i\,dq^i$ and $\Omega_Q = dq^i \wedge dp_i$.

Let $\Phi : G \times Q \to Q$ be a left smooth action of the Lie
group $G$ on the manifold $Q$. Denote by
$g \cdot q = \Phi(g, q)$ the action of $g \in G$ on the point $q \in Q$
and by $\Phi_g : Q \to Q$ the diffeomorphism of $Q$ induced
by $g$. The lifted left action $G \times T^*Q \to T^*Q$, given by
$g \cdot \alpha_q = T^*_{g \cdot q}\Phi_{g^{-1}}(\alpha_q)$ for $g \in G$ and $\alpha_q \in T^*_qQ$,
preserves $\Theta_Q$, and admits the equivariant momentum
map $J : T^*Q \to \mathfrak{g}^*$ whose expression is
$\langle J(\alpha_q), \xi \rangle = \alpha_q(\xi_Q(q))$, where $\xi \in \mathfrak{g}$, the Lie algebra of $G$,
$\langle\,,\rangle : \mathfrak{g}^* \times \mathfrak{g} \to \mathbb{R}$ is the duality pairing between the dual
$\mathfrak{g}^*$ and $\mathfrak{g}$, and $\xi_Q(q) = d\Phi(\exp t\xi, q)/dt|_{t=0}$ is the value of the
infinitesimal generator vector field $\xi_Q$ of the $G$-action
at $q \in Q$ (see Hamiltonian Group Actions and
Symmetries and Conservation Laws). Throughout
this article, it is assumed that the $G$-action on $Q$,
and hence on $T^*Q$, is free and proper. Recall also
that $((T^*Q)_\mu, (\Omega_Q)_\mu)$ denotes the reduced manifold
at $\mu \in \mathfrak{g}^*$ (see Symmetry and Symplectic Reduction),
where $(T^*Q)_\mu := J^{-1}(\mu)/G_\mu$ is the orbit space of the
$G_\mu$-action on the momentum level manifold $J^{-1}(\mu)$
and $G_\mu := \{g \in G \mid \mathrm{Ad}^*_g\mu = \mu\}$ is the isotropy subgroup
of the coadjoint representation of $G$ on $\mathfrak{g}^*$.
The left-coadjoint representation of $g \in G$ on $\mu \in \mathfrak{g}^*$
is denoted by $\mathrm{Ad}^*_{g^{-1}}\mu$.

Cotangent bundle reduction at zero is already quite
interesting and has many applications. Let $\rho : Q \to Q/G$
be the $G$-principal bundle projection defined by the
proper free action of $G$ on $Q$, usually referred to as the
shape space bundle. Zero is a regular value of $J$ and the
map $\varphi_0 : ((T^*Q)_0, (\Omega_Q)_0) \to (T^*(Q/G), \Omega_{Q/G})$ given
by $\varphi_0([\alpha_q])(T_q\rho(v_q)) := \alpha_q(v_q)$, where $\alpha_q \in J^{-1}(0)$,
$[\alpha_q] \in (T^*Q)_0$, and $v_q \in T_qQ$, is a well-defined
symplectic diffeomorphism.

This theorem generalizes in two nontrivial ways 
when one reduces at a nonzero value of J: an 
embedding and a fibration theorem. 


Embedding Version of Cotangent 
Bundle Reduction 


Let $\mu \in \mathfrak{g}^*$, $Q_\mu := Q/G_\mu$, $\rho_\mu : Q \to Q_\mu$ the projection
onto the $G_\mu$-orbit space, $\mathfrak{g}_\mu := \{\xi \in \mathfrak{g} \mid \mathrm{ad}^*_\xi\mu = 0\}$ the
Lie algebra of the coadjoint isotropy subgroup $G_\mu$,
where $\mathrm{ad}_\xi\eta := [\xi, \eta]$ for any $\xi, \eta \in \mathfrak{g}$, $\mathrm{ad}^*_\xi : \mathfrak{g}^* \to \mathfrak{g}^*$ the
dual map, $\mu' := \mu|_{\mathfrak{g}_\mu} \in \mathfrak{g}^*_\mu$ the restriction of $\mu$ to $\mathfrak{g}_\mu$,
and $((T^*Q)_\mu, (\Omega_Q)_\mu)$ the reduced space at $\mu$. The
induced $G_\mu$-action on $T^*Q$ admits the equivariant
momentum map $J^\mu : T^*Q \to \mathfrak{g}^*_\mu$ given by
$J^\mu(\alpha_q) = J(\alpha_q)|_{\mathfrak{g}_\mu}$. Assume there is a $G_\mu$-invariant 1-form $\alpha_\mu$
on $Q$ with values in $(J^\mu)^{-1}(\mu')$. Then there is a unique
closed 2-form $\beta_\mu$ on $Q_\mu$ such that $\rho^*_\mu\beta_\mu = d\alpha_\mu$. Define
the magnetic term $B_\mu := \pi^*_{Q_\mu}\beta_\mu$, where
$\pi_{Q_\mu} : T^*Q_\mu \to Q_\mu$ is the cotangent bundle projection,
which is a closed 2-form on $T^*Q_\mu$. Then the map
$\varphi_\mu : ((T^*Q)_\mu, (\Omega_Q)_\mu) \to (T^*Q_\mu, \Omega_{Q_\mu} - B_\mu)$ given by
$\varphi_\mu([\alpha_q])(T_q\rho_\mu(v_q)) := (\alpha_q - \alpha_\mu(q))(v_q)$, for $\alpha_q \in J^{-1}(\mu)$,
$[\alpha_q] \in (T^*Q)_\mu$, and $v_q \in T_qQ$, is a symplectic embedding
onto a submanifold of $T^*Q_\mu$ covering the base
$Q_\mu$. The embedding $\varphi_\mu$ is a diffeomorphism onto
$T^*Q_\mu$ if and only if $\mathfrak{g} = \mathfrak{g}_\mu$. If the 1-form $\alpha_\mu$ takes
values in the smaller set $J^{-1}(\mu)$ then the image of $\varphi_\mu$ is
the vector sub-bundle $[T\rho_\mu(VQ)]^\circ$ of $T^*Q_\mu$, where
$VQ \subset TQ$ is the vertical vector sub-bundle consisting
of vectors tangent to the $G$-orbits, that is, its fiber at
$q \in Q$ equals $V_qQ = \{\xi_Q(q) \mid \xi \in \mathfrak{g}\}$, and ${}^\circ$ denotes the
annihilator relative to the natural duality pairing
between $TQ_\mu$ and $T^*Q_\mu$. Note that if $\mathfrak{g}$ is abelian or
$\mu = 0$, the embedding $\varphi_\mu$ is always onto and thus the
reduced space is again, topologically, a cotangent
bundle.

It should be noted that there is a choice in this
theorem, namely the 1-form $\alpha_\mu$. Whereas the
reduced symplectic space $((T^*Q)_\mu, (\Omega_Q)_\mu)$ is intrinsic,
the symplectic structure on the space $T^*Q_\mu$
depends on $\alpha_\mu$. The theorem above states that no
matter how $\alpha_\mu$ is chosen, there is a symplectic
diffeomorphism, which also depends on $\alpha_\mu$, of the
reduced space onto a submanifold of $T^*Q_\mu$.


Connections 


The 1-form $\alpha_\mu$ is usually obtained from a left
connection on the principal bundle $\rho_\mu : Q \to Q/G_\mu$ or
$\rho : Q \to Q/G$. A left connection 1-form $A \in \Omega^1(Q; \mathfrak{g})$
on the left principal $G$-bundle $\rho : Q \to Q/G$ is a Lie
algebra-valued 1-form $A : TQ \to \mathfrak{g}$, where $\mathfrak{g}$ denotes
the Lie algebra of $G$, satisfying the conditions $A(\xi_Q) = \xi$
for all $\xi \in \mathfrak{g}$ and $A(T_q\Phi_g(v)) = \mathrm{Ad}_g(A(v))$ for all $g \in G$
and $v \in T_qQ$, where $\mathrm{Ad}_g$ denotes the adjoint action of
$G$ on $\mathfrak{g}$. The horizontal vector sub-bundle $HQ$ of the
connection $A$ is defined as the kernel of $A$, that is, its
fiber at $q \in Q$ is the subspace $H_qQ := \ker A(q)$. The map
$v_q \mapsto \mathrm{ver}_q(v_q) := [A(q)(v_q)]_Q(q)$ is called the vertical
projection, while the map $v_q \mapsto \mathrm{hor}_q(v_q) := v_q - \mathrm{ver}_q(v_q)$
is called the horizontal projection. Since for
any vector $v_q \in T_qQ$ we have $v_q = \mathrm{ver}_q(v_q) + \mathrm{hor}_q(v_q)$,
it follows that $TQ = HQ \oplus VQ$ and the maps
$\mathrm{hor}_q : T_qQ \to H_qQ$ and $\mathrm{ver}_q : T_qQ \to V_qQ$ are projections
onto the horizontal and vertical subspaces at every
$q \in Q$.

Connections can be equivalently defined by the
choice of a sub-bundle $HQ \subset TQ$ complementary to
the vertical sub-bundle $VQ$ satisfying the following
$G$-invariance property: $H_{g \cdot q}Q = T_q\Phi_g(H_qQ)$ for
every $g \in G$ and $q \in Q$. The sub-bundle $HQ$ is called,
as before, the horizontal sub-bundle and a connection
1-form $A$ is defined by setting $A(q)(\xi_Q(q) + u_q) = \xi$,
for any $\xi \in \mathfrak{g}$ and $u_q \in H_qQ$.

The curvature of the connection $A$ is the Lie
algebra-valued 2-form on $Q$ defined by
$B(u_q, v_q) = dA(\mathrm{hor}_q(u_q), \mathrm{hor}_q(v_q))$. When one replaces vectors in
the exterior derivative with their horizontal projections,
then the result is called the exterior covariant
derivative and the preceding formula for $B$ is often
written as $B = DA$. Curvature measures the lack of
integrability of the horizontal distribution, namely
$B(u, v) = -A([\mathrm{hor}(u), \mathrm{hor}(v)])$ for any two vector
fields $u$ and $v$ on $Q$. The Cartan structure equations
state that $B(u, v) = dA(u, v) - [A(u), A(v)]$, where
the bracket on the right-hand side is the Lie
bracket in $\mathfrak{g}$.

Since the connection $A$ is a Lie algebra-valued
1-form, for each $\mu \in \mathfrak{g}^*$ the formula
$\alpha_\mu(q) := A(q)^*(\mu)$, where $A(q)^* : \mathfrak{g}^* \to T^*_qQ$ is the dual of the
linear map $A(q) : T_qQ \to \mathfrak{g}$, defines a usual 1-form on
$Q$. This 1-form $\alpha_\mu$ takes values in $J^{-1}(\mu)$ and is
equivariant in the following sense: $\Phi^*_g\alpha_\mu = \alpha_{\mathrm{Ad}^*_g\mu}$ for
any $g \in G$.


Magnetic Terms and Curvature 


There are two methods to construct the 1-form $\alpha_\mu$
from a connection. The first is to start with a
connection 1-form $A^\mu \in \Omega^1(Q; \mathfrak{g}_\mu)$ on the principal
$G_\mu$-bundle $\rho_\mu : Q \to Q/G_\mu$. Then the 1-form
$\alpha_\mu := \langle \mu|_{\mathfrak{g}_\mu}, A^\mu \rangle \in \Omega^1(Q)$ is $G_\mu$-invariant and has values in
$(J^\mu)^{-1}(\mu|_{\mathfrak{g}_\mu})$. The magnetic term $B_\mu$ is the pullback to
$T^*(Q/G_\mu)$ of the $\mu|_{\mathfrak{g}_\mu}$-component $d\alpha_\mu$ of the
curvature of $A^\mu$ thought of as a 2-form on the
base $Q/G_\mu$.

The second method is to start with a connection
$A \in \Omega^1(Q; \mathfrak{g})$ on the principal bundle $\rho : Q \to Q/G$,
to define $\alpha_\mu := \langle \mu, A \rangle \in \Omega^1(Q)$, and to observe that
this 1-form is $G_\mu$-invariant and has values in $J^{-1}(\mu)$.
The magnetic term $B_\mu$ is in this case the pullback to
$T^*(Q/G_\mu)$ of the $\mu$-component $d\alpha_\mu$ of the curvature
of $A$ thought of as a 2-form on the base $Q/G_\mu$.


The Mechanical Connection 


If $(Q, \langle\langle\cdot,\cdot\rangle\rangle)$ is a Riemannian manifold and $G$ acts by
isometries, there is a natural connection on the
bundle $\rho : Q \to Q/G$, namely, define the horizontal
space at a point to be the metric orthogonal to the
vertical space. This connection is called the mechanical
connection and its horizontal bundle consists of
all vectors $v_q \in TQ$ such that $J(\langle\langle v_q, \cdot\rangle\rangle) = 0$.

To determine the Lie algebra-valued 1-form $A$ of
this connection, the notion of locked inertia tensor
needs to be introduced. This is the linear map
$\mathbb{I}(q) : \mathfrak{g} \to \mathfrak{g}^*$ depending smoothly on $q \in Q$ defined by
the identity $\langle \mathbb{I}(q)\xi, \eta \rangle = \langle\langle \xi_Q(q), \eta_Q(q) \rangle\rangle$ for any
$\xi, \eta \in \mathfrak{g}$. Since the $G$-action is free, each $\mathbb{I}(q)$ is
invertible. The connection 1-form whose horizontal
space was defined above is given by
$A(q)(v_q) = \mathbb{I}(q)^{-1}(J(\langle\langle v_q, \cdot\rangle\rangle))$.

Denote by $K : T^*Q \to \mathbb{R}$ the kinetic energy of the
metric $\langle\langle\cdot,\cdot\rangle\rangle$ on the cotangent bundle, that is,
$K(\langle\langle v_q, \cdot\rangle\rangle) := (1/2)\|v_q\|^2$. The 1-form $\alpha_\mu = A(\cdot)^*\mu$ is
characterized for the mechanical connection $A$ by the
condition $K(\alpha_\mu(q)) = \inf\{K(\beta_q) \mid \beta_q \in J^{-1}(\mu) \cap T^*_qQ\}$.
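
The definitions of the locked inertia tensor and the mechanical connection can be checked on the simplest example. The sketch below assumes a particle of mass `M` in the punctured plane with the Euclidean kinetic energy metric and the circle acting by rotations; this example, the value of `M`, and all function names are illustrative choices, not taken from the article.

```python
import numpy as np

M = 2.0  # particle mass (assumed value, for illustration only)


def generator(q, xi):
    """Infinitesimal generator xi_Q(q) of the rotation action on the plane."""
    x, y = q
    return xi * np.array([-y, x])


def locked_inertia(q):
    """I(q), defined by <I(q) xi, eta> = <<xi_Q(q), eta_Q(q)>>; a scalar here."""
    return M * float(np.dot(q, q))


def momentum_map(q, p):
    """J(p_q) paired with xi = 1: the angular momentum x*p_y - y*p_x."""
    x, y = q
    px, py = p
    return x * py - y * px


def mechanical_connection(q, v):
    """A(q)(v) = I(q)^{-1} J(<<v, .>>), where <<v, .>> is the covector M*v."""
    return momentum_map(q, M * np.asarray(v, dtype=float)) / locked_inertia(q)


def alpha_mu(q, mu):
    """Components (p_x, p_y) of the covector alpha_mu(q) = A(q)^* mu."""
    x, y = q
    return mu * np.array([-y, x]) / float(np.dot(q, q))


def kinetic_energy(p):
    """Kinetic energy K on covectors for the metric M * (Euclidean metric)."""
    return float(np.dot(p, p)) / (2.0 * M)


if __name__ == "__main__":
    q = np.array([1.0, 2.0])
    print(mechanical_connection(q, generator(q, 0.7)))   # reproduces xi
    print(momentum_map(q, alpha_mu(q, 1.5)))              # reproduces mu
```

In this example any other covector in $J^{-1}(\mu) \cap T^*_qQ$ differs from $\alpha_\mu(q)$ by a radial covector, which is orthogonal to it, so the kinetic energy can only increase; this is the minimizing property stated above.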


The Amended Potential 


A simple mechanical system is a Hamiltonian system
on a cotangent bundle $T^*Q$ whose Hamiltonian
function is the sum of the kinetic energy of a
Riemannian metric on $Q$ and a potential function
$V : Q \to \mathbb{R}$. If there is a Lie group $G$ acting on $Q$ by
isometries and leaving the potential invariant, then
we have a simple mechanical system with symmetry.
The amended or effective potential $V_\mu : Q \to \mathbb{R}$ at
$\mu \in \mathfrak{g}^*$ is defined by $V_\mu := H \circ \alpha_\mu$, where $\alpha_\mu$ is the
1-form associated to the mechanical connection. Its
expression in terms of the locked moment of inertia
tensor is given by $V_\mu(q) := V(q) + (1/2)\langle \mu, \mathbb{I}(q)^{-1}\mu \rangle$.
The amended potential naturally induces a smooth
function $\widehat{V}_\mu \in C^\infty(Q/G_\mu)$.
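
As a worked illustration, consider the familiar central force problem. This is a sketch under assumptions not made in the text: $Q = \mathbb{R}^2 \setminus \{0\}$ with polar coordinates $(r, \theta)$, kinetic energy metric $m(dr^2 + r^2 d\theta^2)$, the circle $G = S^1$ acting by rotations, and a rotationally invariant potential $V(r)$.

```latex
\begin{aligned}
\mathbb{I}(q) &= m r^2, \qquad \alpha_\mu = \mu \, d\theta, \\
V_\mu(r) &= V(r) + \tfrac{1}{2} \langle \mu, \mathbb{I}(q)^{-1} \mu \rangle
          = V(r) + \frac{\mu^2}{2 m r^2}, \\
d\alpha_\mu &= 0 \quad \Longrightarrow \quad B_\mu = 0, \\
H_\mu(r, p_r) &= \frac{p_r^2}{2m} + V(r) + \frac{\mu^2}{2 m r^2}
   \quad \text{on } T^*(Q/S^1) = T^*(0, \infty).
\end{aligned}
```

The second term of $V_\mu$ is the usual centrifugal potential, and in this example the magnetic term vanishes, so the reduced dynamics is canonical.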

The fundamental result about simple mechanical
systems with symmetry is the following. The push-forward
by the embedding $\varphi_\mu : ((T^*Q)_\mu, (\Omega_Q)_\mu) \to
(T^*Q_\mu, \Omega_{Q_\mu} - B_\mu)$ of the reduced Hamiltonian
$H_\mu \in C^\infty((T^*Q)_\mu)$ of a simple mechanical system
$H = K + V \circ \pi_Q \in C^\infty(T^*Q)$ is the restriction to the
vector sub-bundle $\varphi_\mu((T^*Q)_\mu) \subset T^*(Q/G_\mu)$, which
is also a symplectic submanifold of
$(T^*(Q/G_\mu), \Omega_{Q/G_\mu} - B_\mu)$, of the simple mechanical system on
$T^*(Q/G_\mu)$ whose kinetic energy is given by the
quotient Riemannian metric on $Q/G_\mu$ and whose
potential is the function $\widehat{V}_\mu$ induced on $Q/G_\mu$ by the
amended potential. However, Hamilton’s equations on
$T^*(Q/G_\mu)$ for this simple mechanical system are
computed relative to the magnetic symplectic form
$\Omega_{Q/G_\mu} - B_\mu$.

There is a wealth of applications of this classical
theorem to mechanical systems, spanning such diverse
areas as the topological characterization
of the level sets of the energy-momentum map
and methods of proving nonlinear stability of relative
equilibria (block-diagonalization of the stability
form in the application of the energy-momentum
method).


Fibration Version of Cotangent Bundle Reduction 


There is a second theorem that realizes the reduced
space of a cotangent bundle as a locally trivial
bundle over shape space $Q/G$. This version is
particularly well suited to the study of quantization
problems and to control theory. The result is the
following. Assume that $G$ acts freely and properly
on $Q$. Then the reduced symplectic manifold $(T^*Q)_\mu$
is a fiber bundle over $T^*(Q/G)$ with fiber the
coadjoint orbit $\mathcal{O}_\mu$. How this is related to the
Poisson structure of the quotient $(T^*Q)/G$ will be
discussed later.


The Kaluza-Klein Construction


The extra term in the symplectic form of the reduced
space is called a magnetic term because it has this
interpretation in electromagnetism. To understand
why $B_\mu$ is called a magnetic term, consider the
problem of a particle of mass $m$ and charge $e$
moving in $\mathbb{R}^3$ under the influence of a given
magnetic field $\mathbf{B} = B_x\mathbf{i} + B_y\mathbf{j} + B_z\mathbf{k}$, $\operatorname{div}\mathbf{B} = 0$. The
Lorentz force law (written in the International
System) gives the equations of motion

$$m\,\dot{\mathbf{v}} = e\,\mathbf{v} \times \mathbf{B}$$   [1]

where $e$ is the charge and $\mathbf{v} = (\dot x, \dot y, \dot z) = \dot{\mathbf{q}}$ is the
velocity of the particle. What is the Hamiltonian
description of these equations?

There are two possible answers to this question.
To formulate them, associate to the divergence-free
vector field $\mathbf{B}$ the closed 2-form
$B = B_x\,dy \wedge dz - B_y\,dx \wedge dz + B_z\,dx \wedge dy$.
Also, write $\mathbf{B} = \operatorname{curl}\mathbf{A}$ for
some other vector field $\mathbf{A} = (A_x, A_y, A_z)$ on $\mathbb{R}^3$,
called the magnetic potential.

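Answer 1 Take on $T^*\mathbb{R}^3$ the symplectic form
$\Omega_B = dx \wedge dp_x + dy \wedge dp_y + dz \wedge dp_z - eB$, where
$(p_x, p_y, p_z) = \mathbf{p} := m\mathbf{v}$ is the momentum of the
particle, and $h = m\|\mathbf{v}\|^2/2 = m(\dot x^2 + \dot y^2 + \dot z^2)/2$ is the
Hamiltonian, the kinetic energy of the particle. A
direct verification shows that $dh = \Omega_B(X_h, \cdot)$, where

$$X_h = \dot x \frac{\partial}{\partial x} + \dot y \frac{\partial}{\partial y} + \dot z \frac{\partial}{\partial z}
 + e(B_z \dot y - B_y \dot z)\frac{\partial}{\partial p_x}
 + e(B_x \dot z - B_z \dot x)\frac{\partial}{\partial p_y}
 + e(B_y \dot x - B_x \dot y)\frac{\partial}{\partial p_z}$$   [2]

which gives the equations of motion [1].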

Answer 2 Take on $T^*\mathbb{R}^3$ the canonical symplectic
form $\Omega = dx \wedge dp_x + dy \wedge dp_y + dz \wedge dp_z$ and the
Hamiltonian $h_A = \|\mathbf{p} - e\mathbf{A}\|^2/2m$. A direct verification
shows that $dh_A = \Omega(X_{h_A}, \cdot)$, where $X_{h_A}$ has the
same expression [2].
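
The claim in Answer 2 can be checked numerically. The following is a minimal sketch under assumptions that are not in the text: a uniform field $\mathbf{B} = (0, 0, B_0)$, the symmetric-gauge potential $\mathbf{A} = (B_0/2)(-y, x, 0)$, and hypothetical function names. It evaluates Hamilton's equations for $h_A$ with the canonical symplectic form and compares $m\dot{\mathbf{v}}$ with the Lorentz force $e\,\mathbf{v} \times \mathbf{B}$.

```python
import numpy as np


def vector_potential(q, B0):
    """Symmetric-gauge potential A with curl A = (0, 0, B0) (assumed example)."""
    x, y, _ = q
    return 0.5 * B0 * np.array([-y, x, 0.0])


def jacobian_A(B0):
    """Constant Jacobian matrix dA_i/dq_j of the potential above."""
    return 0.5 * B0 * np.array([[0.0, -1.0, 0.0],
                                [1.0, 0.0, 0.0],
                                [0.0, 0.0, 0.0]])


def hamilton_rhs(q, p, m, e, B0):
    """Canonical Hamilton equations for h_A = |p - e A(q)|^2 / (2 m)."""
    v = (p - e * vector_potential(q, B0)) / m   # dq/dt = dh_A/dp
    p_dot = e * jacobian_A(B0).T @ v            # dp/dt = -dh_A/dq
    return v, p_dot


def mass_times_acceleration(q, p, m, e, B0):
    """m dv/dt, using p = m v + e A(q), so m dv/dt = dp/dt - e (dA/dq) v."""
    v, p_dot = hamilton_rhs(q, p, m, e, B0)
    return p_dot - e * jacobian_A(B0) @ v


def lorentz_force(q, p, m, e, B0):
    """Right-hand side e v x B of the equations of motion [1]."""
    v, _ = hamilton_rhs(q, p, m, e, B0)
    return e * np.cross(v, np.array([0.0, 0.0, B0]))


if __name__ == "__main__":
    q = np.array([0.3, -1.2, 0.5])
    p = np.array([0.7, 0.1, -0.4])
    print(mass_times_acceleration(q, p, 2.0, 1.5, 0.8))
    print(lorentz_force(q, p, 2.0, 1.5, 0.8))
```

Note that in Answer 2 the canonical momentum is $\mathbf{p} = m\mathbf{v} + e\mathbf{A}$ rather than $m\mathbf{v}$; the magnetic field enters through the Hamiltonian instead of the symplectic form.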

Next we show how the magnetic term in the
symplectic form $\Omega_B$ is obtained by reduction from
the Kaluza-Klein system. Let $Q = \mathbb{R}^3 \times S^1$ with
the circle $G = S^1$ acting on $Q$, only on the second
factor. Identify the Lie algebra $\mathfrak{g}$ of $S^1$ with $\mathbb{R}$. Since
the infinitesimal generator of this action defined
by $\xi \in \mathfrak{g} = \mathbb{R}$ has the expression $\xi_Q(\mathbf{q}, \theta) = (\mathbf{q}, \theta; \mathbf{0}, \xi)$,
if $TS^1$ is trivialized as $S^1 \times \mathbb{R}$, a momentum
map $J : T^*Q = \mathbb{R}^3 \times S^1 \times \mathbb{R}^3 \times \mathbb{R} \to \mathfrak{g}^* = \mathbb{R}$ is given by
$J(\mathbf{q}, \theta; \mathbf{p}, p)\xi = (\mathbf{p}, p) \cdot (\mathbf{0}, \xi) = p\xi$, that is, $J(\mathbf{q}, \theta; \mathbf{p}, p) = p$.
In this case, the coadjoint action is trivial, so for any
$\mu \in \mathfrak{g}^* = \mathbb{R}$, we have $G_\mu = S^1$, $\mathfrak{g}_\mu = \mathbb{R}$, and $\mu' = \mu$. The
1-form $\alpha_\mu = \mu(A_x\,dx + A_y\,dy + A_z\,dz + d\theta) \in \Omega^1(Q)$,
where $d\theta$ denotes the length 1-form on $S^1$, is clearly
$G_\mu = S^1$-invariant, has values in
$J^{-1}(\mu) = \{(\mathbf{q}, \theta; \mathbf{p}, \mu) \mid \mathbf{q}, \mathbf{p} \in \mathbb{R}^3, \theta \in S^1\}$,
and its exterior differential equals
$d\alpha_\mu = \mu B$. Thus, the closed 2-form $\beta_\mu$ on the base
$Q_\mu = Q/G_\mu = Q/S^1 = \mathbb{R}^3$ equals $\mu B$ and hence
the magnetic term, that is, the closed 2-form
$B_\mu = \pi^*_{Q_\mu}\beta_\mu$ on $T^*Q_\mu = T^*\mathbb{R}^3$, is also $\mu B$ since
$\rho_\mu : Q = \mathbb{R}^3 \times S^1 \to Q/G_\mu = \mathbb{R}^3$ is the projection.
Therefore, the reduced space $(T^*Q)_\mu$ is
symplectically diffeomorphic to
$(T^*\mathbb{R}^3, dx \wedge dp_x + dy \wedge dp_y + dz \wedge dp_z - \mu B)$, which coincides with the
phase space in Answer 1 if we put $\mu = e$. This also
gives the physical interpretation of the momentum
map $J : T^*Q = \mathbb{R}^3 \times S^1 \times \mathbb{R}^3 \times \mathbb{R} \to \mathfrak{g}^* = \mathbb{R}$,
$J(\mathbf{q}, \theta; \mathbf{p}, p) = p$, and hence of the variable conjugate to
the circle variable $\theta$: $p$ represents the charge.
Moreover, the magnetic term in the symplectic
form is, up to a charge factor, the magnetic field.
The kinetic energy Hamiltonian

$$h(\mathbf{q}, \theta; \mathbf{p}, p) = \frac{1}{2m}\|\mathbf{p}\|^2 + \frac{1}{2}p^2$$

of the Kaluza-Klein metric, that is, the Riemannian
metric obtained by keeping the standard metrics on
each factor and declaring $\mathbb{R}^3$ and $S^1$ orthogonal,
induces the reduced Hamiltonian

$$h_\mu(\mathbf{q}, \mathbf{p}) = \frac{1}{2m}\|\mathbf{p}\|^2 + \frac{1}{2}\mu^2$$

which, up to the constant $\mu^2/2$, equals the kinetic
energy Hamiltonian in Answer 1. Note that this
reduced system is not the geodesic flow of the
Euclidean metric because of the presence of the
magnetic term in the symplectic form. However,
the equations of motion of a charged particle in a
magnetic field are obtained by reducing the geodesic
flow of the Kaluza-Klein metric.

A similar construction is carried out in Yang- 
Mills theory where A is a connection on a principal 
bundle and B is its curvature. Magnetic terms also 
appear in classical mechanics. For example, in 
rotating systems the Coriolis force (up to a dimen- 
sional factor) plays the role of the magnetic term. 


Reconstruction of Dynamics 
for Cotangent Bundles 


A general method for reconstructing the dynamics
from the reduced dynamics is given in
Symmetry and Symplectic Reduction. For cotangent
bundles, using the mechanical connection, this
method simplifies considerably.

Start with the following general situation. Let $G$ act
freely on the configuration manifold $Q$; let $h : T^*Q \to \mathbb{R}$
be a $G$-invariant Hamiltonian, $\mu \in \mathfrak{g}^*$, $\alpha_q \in J^{-1}(\mu)$,
and $c_\mu(t)$ the integral curve of the reduced system with
initial condition $[\alpha_q] \in (T^*Q)_\mu$ given by the reduced
Hamiltonian function $h_\mu : (T^*Q)_\mu \to \mathbb{R}$. In terms of a
connection $\mathcal{A} \in \Omega^1(J^{-1}(\mu); \mathfrak{g}_\mu)$ on the left $G_\mu$-principal
bundle $J^{-1}(\mu) \to (T^*Q)_\mu$ the reconstruction procedure
proceeds in four steps:

• Step 1: Horizontally lift the curve $c_\mu(t) \in (T^*Q)_\mu$
to a curve $d(t) \in J^{-1}(\mu)$ with $d(0) = \alpha_q$.

• Step 2: Set $\xi(t) = \mathcal{A}(d(t))(X_h(d(t))) \in \mathfrak{g}_\mu$.

• Step 3: With $\xi(t) \in \mathfrak{g}_\mu$ determined in step 2, solve
the nonautonomous differential equation
$\dot g(t) = T_eL_{g(t)}\xi(t)$ with initial condition $g(0) = e$, where $L_g$
denotes left translation on $G$; this is the step that
involves “quadratures” and is the main obstacle
to finding explicit formulas.

• Step 4: The curve $c(t) = g(t) \cdot d(t)$, with $d(t)$ found
in step 1 and $g(t)$ found in step 3, is the integral
curve of $X_h$ with initial condition $c(0) = \alpha_q$.
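
Step 3 rarely has a closed-form solution, but it can be approximated numerically. The following is a minimal sketch, assuming for illustration that the isotropy group is the matrix group SO(3), that its Lie algebra is identified with $\mathbb{R}^3$ through the usual hat map, and that $T_eL_g\xi = g\xi$ for matrices. The midpoint Lie-group scheme and all function names are choices made here, not the article's method.

```python
import numpy as np


def hat(w):
    """Hat map R^3 -> so(3): the skew matrix with hat(w) @ u = w x u."""
    x, y, z = w
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])


def exp_so3(w):
    """Matrix exponential of hat(w) via the Rodrigues formula."""
    theta = float(np.linalg.norm(w))
    W = hat(w)
    if theta < 1e-12:
        return np.eye(3) + W  # first-order expansion near the identity
    return (np.eye(3)
            + (np.sin(theta) / theta) * W
            + ((1.0 - np.cos(theta)) / theta**2) * (W @ W))


def reconstruct_group_curve(xi, t_final, n_steps):
    """Approximate g(t_final) for g'(t) = g(t) hat(xi(t)), g(0) = identity.

    Each step multiplies on the right by exp(h * xi(t_mid)), so the
    iterates stay in SO(3) up to rounding error.
    """
    g = np.eye(3)
    h = t_final / n_steps
    for k in range(n_steps):
        t_mid = (k + 0.5) * h
        g = g @ exp_so3(h * np.asarray(xi(t_mid), dtype=float))
    return g


if __name__ == "__main__":
    # Constant xi along the z-axis: a quarter turn about z after t = pi/2.
    print(reconstruct_group_curve(lambda t: (0.0, 0.0, 1.0), np.pi / 2, 10))
```

For a constant $\xi$ the scheme reproduces $g(t) = \exp(t\xi)$ exactly; for a genuinely time-dependent, noncommuting $\xi(t)$ it is only an approximation whose accuracy depends on the step size.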


This method depends on the choice of the connection
$\mathcal{A} \in \Omega^1(J^{-1}(\mu); \mathfrak{g}_\mu)$. Here are several particular
cases when this procedure simplifies.

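(a) One-dimensional coadjoint isotropy group. If
$G_\mu = S^1$ or $G_\mu = \mathbb{R}$, identify $\mathfrak{g}_\mu$ with $\mathbb{R}$ via the map
$a \in \mathbb{R} \mapsto a\zeta \in \mathfrak{g}_\mu$, where $\zeta \in \mathfrak{g}_\mu$, $\zeta \neq 0$, is a generator of
$\mathfrak{g}_\mu$. Then a connection 1-form on the $S^1$ (or $\mathbb{R}$)
principal bundle $J^{-1}(\mu) \to (T^*Q)_\mu$ is the 1-form $\mathcal{A}$ on
$J^{-1}(\mu)$ given by $\mathcal{A} = (1/\langle \mu, \zeta \rangle)\theta_\mu$, where $\theta_\mu$ is the
pullback of the canonical 1-form $\theta \in \Omega^1(T^*Q)$ to
the submanifold $J^{-1}(\mu)$. The curvature of this
connection is the 2-form on $(T^*Q)_\mu$ given by
$\operatorname{curv}(\mathcal{A}) = -(1/\langle \mu, \zeta \rangle)\omega_\mu$, where $\omega_\mu$ is the reduced
symplectic form on $(T^*Q)_\mu$. In this case, the curve
$\xi(t) \in \mathfrak{g}_\mu$ in step 2 is given by $\xi(t) = \Lambda[h](d(t))$, where
$\Lambda \in \mathfrak{X}(T^*Q)$ is the Liouville vector field characterized
by the property of being the unique vector field
on $T^*Q$ that satisfies the relation $d\theta(\Lambda, \cdot) = \theta$. In
canonical coordinates $(q^i, p_i)$ on $T^*Q$, $\Lambda = p_i\,\partial/\partial p_i$.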

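(b) Induced connection. Any connection
$A \in \Omega^1(Q; \mathfrak{g}_\mu)$ on the left principal bundle $Q \to Q/G_\mu$
induces a connection $\mathcal{A} \in \Omega^1(J^{-1}(\mu); \mathfrak{g}_\mu)$ by
$\mathcal{A}(\alpha_q)(V_{\alpha_q}) := A(q)(T_{\alpha_q}\pi_Q(V_{\alpha_q}))$, where $q \in Q$, $\alpha_q \in T^*_qQ$,
$V_{\alpha_q} \in T_{\alpha_q}(T^*Q)$, and $\pi_Q : T^*Q \to Q$ is the cotangent
bundle projection. In this case, the curve $\xi(t) \in \mathfrak{g}_\mu$ in
step 2 is given by $\xi(t) = A(q(t))(\mathbb{F}h(d(t)))$, where
$q(t) := \pi_Q(d(t))$ is the base integral curve and the
vector bundle morphism $\mathbb{F}h : T^*Q \to TQ$ is the fiber
derivative of $h$ given by

$$\mathbb{F}h(\alpha_q)(\beta_q) = \left.\frac{d}{dt}\right|_{t=0} h(\alpha_q + t\beta_q)$$

for any $\alpha_q, \beta_q \in T^*_qQ$. Two particular instances of
this situation are noteworthy.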


(b1) Assume that the Hamiltonian $h$ is that of a
simple mechanical system with symmetry.
Choosing $A$ to be the mechanical connection
$A_{\mathrm{mech}}$, the curve $\xi(t) \in \mathfrak{g}_\mu$ in step 2 is given by
$\xi(t) = A_{\mathrm{mech}}(q(t))(\langle\langle d(t), \cdot \rangle\rangle)$.

(b2) If $Q = G$ is a Lie group, $\dim G_\mu = 1$, and $\zeta$ is a
generator of $\mathfrak{g}_\mu$, then the connection $A \in \Omega^1(G)$
can be chosen to equal $A(g) := (1/\langle \mu, \zeta \rangle)\,T^*_gR_{g^{-1}}(\mu)$,
where $R_g$ is right translation on $G$.


(c) Reconstruction of dynamics for simple
mechanical systems with symmetry. The case of
simple mechanical systems with symmetry deserves
special attention since several steps in the reconstruction
method can be simplified. For simple
mechanical systems, the knowledge of the base
integral curve $q(t)$ suffices to determine the entire
integral curve on $T^*Q$. Indeed, if $h = K + V \circ \pi_Q$ is
the Hamiltonian, the Legendre transformation
$\mathbb{F}h : T^*Q \to TQ$ determines the Lagrangian system
on $TQ$ given by $\ell(u_q) = (1/2)\|u_q\|^2 - V(q)$, for
$u_q \in T_qQ$. Lagrange’s equations are second-order
and thus the evolution of the velocities is given by
the time derivative $\dot q(t)$ of the base integral curve.
Since $\mathbb{F}h = (\mathbb{F}\ell)^{-1}$, the solution of the Hamiltonian
system is given by $\mathbb{F}\ell(\dot q(t))$. Using the explicit
expression of the mechanical connection and the
notation given in the general procedure, the method
of reconstruction simplifies to the following steps.
To find the integral curve $c(t)$ of the simple mechanical
system with $G$-symmetry $h = K + V \circ \pi_Q$ on
$T^*Q$ with initial condition $c(0) = \alpha_q \in T^*_qQ$, knowing
the integral curve $c_\mu(t)$ of the reduced Hamiltonian
system on $(T^*Q)_\mu$ given by the reduced
Hamiltonian function $h_\mu : (T^*Q)_\mu \to \mathbb{R}$ with initial
condition $c_\mu(0) = [\alpha_q]$, one proceeds in the following
manner. Recall the symplectic embedding
$\varphi_\mu : ((T^*Q)_\mu, (\Omega_Q)_\mu) \to (T^*(Q/G_\mu), \Omega_{Q/G_\mu} - B_\mu)$. The
curve $\varphi_\mu(c_\mu(t)) \in T^*(Q/G_\mu)$ is an integral curve of
the Hamiltonian system on $(T^*(Q/G_\mu), \Omega_{Q/G_\mu} - B_\mu)$
given by the function that is the sum of the kinetic
energy of the quotient Riemannian metric and the
quotient amended potential $\widehat{V}_\mu$. Let
$q_\mu(t) := \pi_{Q/G_\mu}(\varphi_\mu(c_\mu(t)))$ be the base integral curve of this system,
where $\pi_{Q/G_\mu} : T^*(Q/G_\mu) \to Q/G_\mu$ is the cotangent
bundle projection.


• Step 1: Relative to the mechanical connection
$A_{\mathrm{mech}} \in \Omega^1(Q; \mathfrak{g}_\mu)$, horizontally lift $q_\mu(t) \in Q/G_\mu$
to a curve $q_h(t) \in Q$ passing through $q_h(0) = q$.

• Step 2: Determine $\xi(t) \in \mathfrak{g}_\mu$ from the algebraic system
$\langle\langle \xi(t)_Q(q_h(t)), \eta_Q(q_h(t)) \rangle\rangle = \langle \mu, \eta \rangle$ for all $\eta \in \mathfrak{g}_\mu$,
where $\langle\langle\cdot,\cdot\rangle\rangle$ is the $G$-invariant kinetic energy
Riemannian metric on $Q$. This implies that $\dot q_h(0)$
and $\xi(0)_Q(q)$ are the horizontal and vertical components
of the vector $\alpha_q^\sharp \in T_qQ$ which is associated by
the metric $\langle\langle\cdot,\cdot\rangle\rangle$ to the initial condition $\alpha_q$.

• Step 3: Solve $\dot g(t) = T_eL_{g(t)}\xi(t)$ in $G_\mu$ with initial
condition $g(0) = e$.

• Step 4: The curve $q(t) := g(t) \cdot q_h(t)$, with $q_h(t)$
and $g(t)$ determined in steps 1 and 3, respectively,
is the base integral curve of the simple mechanical
system with symmetry defined by the function $h$
satisfying $q(0) = q$. The curve $(\mathbb{F}h)^{-1}(\dot q(t)) \in T^*Q$
is the integral curve of this system with initial
condition $c(0) = \alpha_q$. In addition,
$\dot q(t) = g(t) \cdot (\dot q_h(t) + \xi(t)_Q(q_h(t)))$ is the horizontal plus vertical
decomposition relative to the connection induced
on $J^{-1}(\mu) \to (T^*Q)_\mu$ by the mechanical connection
$A_{\mathrm{mech}} \in \Omega^1(Q; \mathfrak{g}_\mu)$.
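
The four steps can be traced in a small numerical experiment. The sketch below assumes a planar particle of mass `M` in the rotationally invariant potential $V(q) = K\|q\|^2/2$ with $G = S^1$ acting by rotations; the example, the constants, the fourth-order Runge-Kutta integrator, and all function names are illustrative assumptions. The reduced system is the radial motion in the amended potential, the horizontal lift keeps the initial polar angle, step 2 gives $\xi = \mu/(M r^2)$, and step 3 is a quadrature for the rotation angle.

```python
import numpy as np

M, K = 1.0, 1.0  # assumed mass and spring constant, V(q) = K |q|^2 / 2


def rk4_step(f, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(y)."""
    k1 = f(y)
    k2 = f(y + 0.5 * h * k1)
    k3 = f(y + 0.5 * h * k2)
    k4 = f(y + h * k3)
    return y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def full_rhs(state):
    """Unreduced Hamilton equations in Cartesian coordinates (x, y, px, py)."""
    x, y, px, py = state
    return np.array([px / M, py / M, -K * x, -K * y])


def reduced_rhs(mu):
    """Radial motion in V_mu(r) = K r^2/2 + mu^2/(2 M r^2), plus the angle phi."""
    def f(state):
        r, pr, _phi = state
        xi = mu / (M * r**2)                      # step 2: algebraic equation
        return np.array([pr / M,
                         -K * r + mu**2 / (M * r**3),
                         xi])                     # step 3: phi' = xi
    return f


def reconstruct(q0, p0, t_final, n_steps):
    """Base integral curve at t_final, rebuilt from the reduced dynamics."""
    x0, y0 = q0
    r0 = np.hypot(x0, y0)
    theta0 = np.arctan2(y0, x0)
    mu = x0 * p0[1] - y0 * p0[0]                  # value of the momentum map
    pr0 = (x0 * p0[0] + y0 * p0[1]) / r0
    state = np.array([r0, pr0, 0.0])
    f = reduced_rhs(mu)
    h = t_final / n_steps
    for _ in range(n_steps):
        state = rk4_step(f, state, h)
    r, _, phi = state
    # Step 1: the horizontal lift is (r(t), theta0); step 4: rotate it by phi.
    return np.array([r * np.cos(theta0 + phi), r * np.sin(theta0 + phi)])


def integrate_full(q0, p0, t_final, n_steps):
    """Direct integration of the unreduced system, for comparison."""
    state = np.array([q0[0], q0[1], p0[0], p0[1]], dtype=float)
    h = t_final / n_steps
    for _ in range(n_steps):
        state = rk4_step(full_rhs, state, h)
    return state[:2]


if __name__ == "__main__":
    q0, p0 = (1.0, 0.0), (0.3, 0.8)
    print(reconstruct(q0, p0, 2.0, 2000))
    print(integrate_full(q0, p0, 2.0, 2000))
```

The two printed positions should agree up to integration error, which is the content of step 4 in this example; the sketch requires $\mu \neq 0$ so that the radial coordinate stays away from the origin.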


There are several important situations when 
step 3, the main obstruction to an explicit solution 
of the reconstruction problem, can be carried out. 
We shall review some of them below. 


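(c1) The case $G_\mu = S^1$. If $G_\mu$ is abelian, the equation in
step 3 has the solution $g(t) = \exp \int_0^t \xi(s)\,ds$. If, in
addition, $G_\mu = S^1$, then $\xi(s)$ can be explicitly
determined by step 2. Indeed, if $\zeta \in \mathfrak{g}_\mu$ is a
generator of $\mathfrak{g}_\mu$, writing $\xi(s) = a(s)\zeta$ for some
smooth real-valued function $a$ defined on some
open interval around the origin, the algebraic
equation in step 2 implies that
$\langle\langle a(s)\zeta_Q(q_h(s)), \zeta_Q(q_h(s)) \rangle\rangle = \langle \mu, \zeta \rangle$, which gives
$a(s) = \langle \mu, \zeta \rangle / \|\zeta_Q(q_h(s))\|^2$. Therefore, the base integral curve of
the solution of the simple mechanical system with
symmetry on $T^*Q$ passing through $q$ is

$$q(t) = \exp\left( \langle \mu, \zeta \rangle \int_0^t \frac{ds}{\|\zeta_Q(q_h(s))\|^2}\, \zeta \right) \cdot q_h(t)$$

and

$$\dot q(t) = \exp\left( \langle \mu, \zeta \rangle \int_0^t \frac{ds}{\|\zeta_Q(q_h(s))\|^2}\, \zeta \right) \cdot
\left( \dot q_h(t) + \frac{\langle \mu, \zeta \rangle}{\|\zeta_Q(q_h(t))\|^2}\, \zeta_Q(q_h(t)) \right).$$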

(c2) The case of compact Lie groups. An obvious
situation when the differential equation in step 3
can be solved is if $\xi(t) = \xi$ for all $t$, where $\xi$ is a
given element of $\mathfrak{g}_\mu$. Then the solution is
$g(t) = \exp(t\xi)$. However, step 2 puts certain
restrictions under this hypothesis, because it
requires that $\langle\langle \xi_Q(q_h(t)), \eta_Q(q_h(t)) \rangle\rangle = \langle \mu, \eta \rangle$
for any $\eta \in \mathfrak{g}_\mu$. This is satisfied if there is a
bilinear nondegenerate form $(\cdot,\cdot)$ on $\mathfrak{g}$ satisfying
$(\zeta, \eta) = \langle\langle \zeta_Q(q), \eta_Q(q) \rangle\rangle$ for all $q \in Q$ and
$\zeta, \eta \in \mathfrak{g}$. This implies that $(\cdot,\cdot)$ is positive
definite and invariant under the adjoint action
of $G$ on $\mathfrak{g}$, so semisimple Lie algebras of
noncompact type are excluded. If $G$ is compact,
which ensures the existence of a positive
adjoint invariant inner product on $\mathfrak{g}$, and
$Q = G$, this condition implies that the kinetic
energy metric is invariant under the adjoint
action. There are examples in which such
conditions are natural, such as in Kaluza-Klein
theories. Thus, if $G$ is a compact Lie
group and $(\cdot,\cdot)$ is a positive-definite metric
invariant under the adjoint action of $G$ on $\mathfrak{g}$
satisfying $(\zeta, \eta) = \langle\langle \zeta_Q(q), \eta_Q(q) \rangle\rangle$ for all $q \in Q$
and $\zeta, \eta \in \mathfrak{g}$, then the element $\xi(t)$ in step 2 can
be chosen to be constant and is determined by
the identity $(\xi, \cdot) = \mu|_{\mathfrak{g}_\mu}$ on $\mathfrak{g}_\mu$. The solution of
the equation in step 3 is then $g(t) = \exp(t\xi)$.

(c3) The case when $\dot\xi(t)$ is proportional to $\xi(t)$. Try
to find a real-valued function $f(t)$ such that
$g(t) = \exp(f(t)\xi(t))$ is a solution of the equation
$\dot g(t) = T_e L_{g(t)} \xi(t)$ with $f(0) = 0$. This gives, for
small $t$, the equation $\dot f(t)\xi(t) + f(t)\dot\xi(t) = \xi(t)$,
that is, it is necessary that $\dot\xi(t)$ and $\xi(t)$ be
proportional. So, if $\dot\xi(t) = \alpha(t)\xi(t)$ for some
smooth function $\alpha(t)$, then this gives

$f(t) = \int_0^t \exp\left( \int_t^s \alpha(r)\,dr \right) ds.$

(c4) The case of $G_\mu$ solvable. Write
$g(t) = \exp(f_1(t)\xi_1) \exp(f_2(t)\xi_2) \cdots \exp(f_n(t)\xi_n)$, for some basis
$\{\xi_1, \xi_2, \ldots, \xi_n\}$ of $\mathfrak{g}_\mu$ and some smooth real-valued
functions $f_i$, $i = 1, 2, \ldots, n$, defined around zero. It
is known that if $G_\mu$ is solvable, the equation in
step 3 can be solved by quadratures for the $f_i$.

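
The quadrature in case (c3) can be checked numerically. The sketch below assumes that $f$ solves $\dot f + \alpha f = 1$ with $f(0) = 0$, and evaluates $f(t) = e^{-A(t)} \int_0^t e^{A(s)}\,ds$ with $A(t) = \int_0^t \alpha(r)\,dr$, which is the same expression written with a single antiderivative. The helper names and the grid size are illustrative choices, not part of the text.

```python
import math


def cumulative_trapezoid(values, dt):
    """Running trapezoidal integral of equally spaced samples, starting at 0."""
    out = [0.0]
    for a, b in zip(values, values[1:]):
        out.append(out[-1] + 0.5 * (a + b) * dt)
    return out


def f_from_alpha(alpha, t_max, n=2000):
    """Return grid points t_i and f(t_i) = int_0^t exp(int_t^s alpha(r) dr) ds."""
    dt = t_max / n
    ts = [i * dt for i in range(n + 1)]
    A = cumulative_trapezoid([alpha(t) for t in ts], dt)      # A(t_i)
    E = cumulative_trapezoid([math.exp(a) for a in A], dt)    # int_0^{t_i} exp(A)
    return ts, [math.exp(-A[i]) * E[i] for i in range(n + 1)]


# Hypothetical choice alpha = 1/2, for which f(t) = 2 (1 - exp(-t/2)).
ts, fs = f_from_alpha(lambda t: 0.5, 1.0)
print(fs[-1], 2.0 * (1.0 - math.exp(-0.5)))
```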
Reconstruction Phases for Simple Mechanical
Systems with $S^1$ Symmetry


Consider a simple mechanical system with symmetry
$G$ on the Riemannian manifold $(Q, \langle\langle \cdot, \cdot \rangle\rangle)$ with
G-invariant potential $V \in C^\infty(Q)$. If $\mu \in \mathfrak{g}^*$, let $V_\mu$
be the amended potential and $\hat V_\mu \in C^\infty(Q/G_\mu)$ the
induced function on the base. Let $c: [0, T] \to J^{-1}(\mu)$ be
an integral curve of the system with Hamiltonian
$h = K + V \circ \pi_Q$ and suppose that its projection
$c_\mu: [0, T] \to (T^*Q)_\mu$ to the reduced space is a closed
integral curve of the reduced system with Hamiltonian
$h_\mu$. The reconstruction phase associated to
the loop $c_\mu(t)$ is the group element $g \in G_\mu$ satisfying
the identity $c(T) = g \cdot c(0)$. We shall present two
explicit formulas of the reconstruction phase for the
case when $G_\mu = S^1$. Let $\zeta \in \mathfrak{g}_\mu = \mathbb{R}$ be a generator of
the coadjoint isotropy algebra and write
$c(T) = \exp(\varphi\zeta) \cdot c(0)$; in this case, $\varphi$ is identified with the
reconstruction phase and, as we shall see in concrete
mechanical examples, it truly represents an angle.

If $G_\mu = S^1$, the $G_\mu$-principal bundle
$\pi_\mu: J^{-1}(\mu) \to (T^*Q)_\mu := J^{-1}(\mu)/G_\mu$ admits two natural connections:
$A = \frac{1}{\langle \mu, \zeta \rangle}\,\theta_\mu\,\zeta \in \Omega^1(J^{-1}(\mu); \mathfrak{g}_\mu)$, where $\theta_\mu$ is the
pullback of the canonical 1-form on the cotangent
bundle to the momentum level submanifold $J^{-1}(\mu)$,
and $\pi_Q^* A_{mech} \in \Omega^1(J^{-1}(\mu); \mathfrak{g}_\mu)$. There is no reason to
choose one connection over the other and thus there
are two natural formulas for the reconstruction
phase in this case. Let $c_\mu(t)$ be a periodic orbit of
period $T$ of the reduced system and denote also by
$h_\mu$ the value of the Hamiltonian function on it.
Assume that $D$ is a two-dimensional surface in
$(T^*Q)_\mu$ whose boundary is the loop $c_\mu(t)$. Since the
manifolds $(T^*Q)_\mu$ and $T^*(Q/S^1)$ are diffeomorphic
(but not symplectomorphic), it makes sense to
consider the base integral curve $q_\mu(t)$ obtained by
projecting $c_\mu(t)$ to the base $Q/S^1$, which is a closed
curve of period $T$. Denote by

$\langle \hat V_\mu \rangle := \frac{1}{T} \int_0^T \hat V_\mu(q_\mu(t))\,dt$

the average of $\hat V_\mu$ over the loop $q_\mu(t)$. Let $q_h(t) \in Q$
be the $A_{mech}$-horizontal lift of $q_\mu(t)$ to $Q$ and let $\chi$ be
the $A_{mech}$-holonomy of the loop $q_\mu(t)$ measured from
$q(0)$, the base point of $c(0)$; its expression is given by
$\exp \chi = \exp(-\iint_D B)$, where $B$ is the curvature of the
mechanical connection. Denote by $\omega_\mu$ the reduced
symplectic form on $(T^*Q)_\mu$. With these notations the
phase $\varphi$ is given by


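$\varphi = \frac{1}{\langle \mu, \zeta \rangle} \left( \iint_D \omega_\mu + 2\,(h_\mu - \langle \hat V_\mu \rangle)\,T \right) = \chi + \langle \mu, \zeta \rangle \int_0^T \frac{ds}{\|\zeta_Q(q_h(s))\|^2}$   [3]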


The first terms in both formulas are the so-called 
geometric phases because they carry only geometric 
information given by the connection, whereas the 
second terms are called the dynamic phases since 
they encapsulate information directly linked to the 
Hamiltonian. The expression of the total phase as a 
sum of a geometric and a dynamic phase is not 
intrinsic and is connection dependent. It can even 
happen that one of these summands vanishes. We 
shall consider now two concrete examples: the free 
rigid body and the heavy top. 


Reconstruction Phases for the Free Rigid Body 


The motion of the free rigid body is a geodesic with
respect to a left-invariant Riemannian metric on
SO(3) given by the moment of inertia of the body.
The phase space of the free rigid body motion is
$T^*SO(3)$ and a momentum map $J: T^*SO(3) \to \mathbb{R}^3$ of
the lift of left translation to the cotangent bundle is
given by right translation to the identity element.
We have identified here $\mathfrak{so}(3)$ with $\mathbb{R}^3$ by the
Lie algebra isomorphism $x \in (\mathbb{R}^3, \times) \mapsto \hat x \in (\mathfrak{so}(3), [\cdot, \cdot])$,
where $\hat x(y) = x \times y$, and $\mathfrak{so}(3)^*$ with $\mathbb{R}^3$ by
the inner product on $\mathbb{R}^3$. The reduced manifold
$J^{-1}(\mu)/G_\mu$ is identified with the sphere $S^2_{\|\mu\|}$ in $\mathbb{R}^3$ of
radius $\|\mu\|$ with the symplectic form $\omega_\mu = -dS/\|\mu\|$,
where $dS$ is the standard area form on $S^2_{\|\mu\|}$ and $G_\mu \cong S^1$
is the group of rotations around the axis $\mu$. These
concentric spheres are the coadjoint orbits of the
Lie-Poisson space $\mathfrak{so}(3)^*$ and represent the level sets of the
Casimir functions that are all smooth functions of
$\|\Pi\|^2$, where $\Pi \in \mathbb{R}^3$ denotes the body angular
momentum.

The Hamiltonian of the rigid body on the
Lie-Poisson space $T^*SO(3)/SO(3) \cong \mathbb{R}^3$ is given by

$h(\Pi) = \frac{1}{2} \left( \frac{\Pi_1^2}{I_1} + \frac{\Pi_2^2}{I_2} + \frac{\Pi_3^2}{I_3} \right)$

where $I_1, I_2, I_3 > 0$ are the principal moments of
inertia of the body. Let $I := \mathrm{diag}(I_1, I_2, I_3)$ denote the
moment of inertia tensor diagonalized in a principal-axis
body frame. The Lie-Poisson bracket on $\mathbb{R}^3$ is
given by $\{f, g\}(\Pi) = -\Pi \cdot (\nabla f(\Pi) \times \nabla g(\Pi))$ and the
equations of motion are $\dot\Pi = \Pi \times \Omega$, where $\Omega \in \mathbb{R}^3$ is
the body angular velocity given in terms of $\Pi$ by
$\Omega_i = \Pi_i / I_i$ for $i = 1, 2, 3$, that is, $\Omega = I^{-1}\Pi$. The
trajectories of these equations are found by
intersecting a family of homothetic energy ellipsoids
with the angular momentum concentric spheres. If
$I_1 > I_2 > I_3$, one immediately sees that all orbits are
periodic with the exception of four centers (the two
possible rotations about the long and the short
moment of inertia axis of the body), two saddles
(the two rotations about the middle moment of
inertia axis of the body), and four heteroclinic orbits
connecting the two saddles.
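
The geometric picture above (orbits as intersections of energy ellipsoids with momentum spheres) can be checked numerically: along a solution of $\dot\Pi = \Pi \times I^{-1}\Pi$ both $h(\Pi)$ and $\|\Pi\|^2$ should stay constant. The following is a minimal sketch using a classical fourth-order Runge-Kutta step; the moments of inertia, initial condition, and step size are arbitrary illustrative values.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def euler_rhs(Pi, inertia):
    """Right-hand side of dPi/dt = Pi x Omega with Omega_i = Pi_i / I_i."""
    omega = tuple(p / i for p, i in zip(Pi, inertia))
    return cross(Pi, omega)


def rk4_step(Pi, inertia, dt):
    def shifted(k, h):
        return tuple(p + h * d for p, d in zip(Pi, k))
    k1 = euler_rhs(Pi, inertia)
    k2 = euler_rhs(shifted(k1, dt / 2), inertia)
    k3 = euler_rhs(shifted(k2, dt / 2), inertia)
    k4 = euler_rhs(shifted(k3, dt), inertia)
    return tuple(p + dt / 6 * (a + 2 * b + 2 * c + d)
                 for p, a, b, c, d in zip(Pi, k1, k2, k3, k4))


def energy(Pi, inertia):
    return 0.5 * sum(p * p / i for p, i in zip(Pi, inertia))


def casimir(Pi):
    return sum(p * p for p in Pi)


inertia = (3.0, 2.0, 1.0)      # hypothetical I_1 > I_2 > I_3
Pi0 = (1.0, 0.5, 0.2)          # hypothetical initial body angular momentum
Pi = Pi0
for _ in range(1000):
    Pi = rk4_step(Pi, inertia, 0.01)
energy_drift = abs(energy(Pi, inertia) - energy(Pi0, inertia))
casimir_drift = abs(casimir(Pi) - casimir(Pi0))
print(energy_drift, casimir_drift)
```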

Suppose that $\Pi(t)$ is a periodic orbit on the sphere
$S^2_{\|\mu\|}$ with period $T$. After time $T$, by how much has
the rigid body rotated in space? The answer to this
question follows directly from [3]. Taking $\zeta = \mu/\|\mu\|$
and the potential $V \equiv 0$ we get

$g = \exp\left( \left( -\Lambda + \frac{2 h_\mu T}{\|\mu\|} \right) \zeta \right)$

where $D$ is one of the two spherical caps on $S^2_{\|\mu\|}$
whose boundary is the periodic orbit $\Pi(t)$, $h_\mu$ is the
value of the total energy on the solution $\Pi(t)$, and $\Lambda$
is the oriented solid angle, that is,

$\Lambda := -\frac{1}{\|\mu\|} \iint_D \omega_\mu, \qquad |\Lambda| = \frac{\mathrm{area}\, D}{\|\mu\|^2}$


Reconstruction Phases for the Heavy Top 





The heavy top is a simple mechanical system with
symmetry $S^1$ on $T^*SO(3)$ whose Hamiltonian function
is given by $h(\alpha_h) := \frac{1}{2}\|\alpha_h\|^2 + Mg\ell\, \mathbf{k} \cdot h\chi$, where
$h \in SO(3)$, $\alpha_h \in T_h^*SO(3)$, $\mathbf{k}$ is the unit vector of the
spatial Oz axis (pointing in the direction opposite to
that of the gravity force), $M \in \mathbb{R}$ is the total mass of the
body, $g \in \mathbb{R}$ is the value of the gravitational acceleration,
the fixed point about which the body moves is the
origin, and $\chi$ is the unit vector of the straight line
segment of length $\ell$ connecting the origin to the center
of mass of the body. This Hamiltonian is left invariant
under rotations about the spatial Oz axis. A momentum
map induced by this $S^1$-action is given by
$J: T^*SO(3) \to \mathbb{R}$, $J(\alpha_h) = T_e^*R_h(\alpha_h) \cdot \mathbf{k}$; recall that
$T_e^*L_h(\alpha_h) =: \Pi \in \mathbb{R}^3$ is the body angular momentum.
The reduced space $J^{-1}(\mu)/S^1$ is generically the cotangent
bundle of the unit sphere endowed with the
symplectic structure given by the sum of the canonical
form plus a magnetic term; equivalently, this is the
coadjoint orbit in the dual of the Euclidean Lie algebra
$\mathfrak{se}(3)^* = \mathbb{R}^3 \times \mathbb{R}^3$ given by
$\mathcal{O}_\mu = \{(\Pi, \Gamma) \mid \Pi \cdot \Gamma = \mu, \|\Gamma\|^2 = 1\}$.
The projection map $J^{-1}(\mu) \to \mathcal{O}_\mu$ implementing
the symplectic diffeomorphism between the
reduced space and the coadjoint orbit in $\mathfrak{se}(3)^*$ is
given by $\alpha_h \mapsto (\Pi, \Gamma) := (T_e^*L_h(\alpha_h), h^{-1}\mathbf{k})$. The orbit
symplectic form $\omega_\mu$ on $\mathcal{O}_\mu$ has the expression
$\omega_\mu(\Pi, \Gamma)((\Pi \times x + \Gamma \times y, \Gamma \times x), (\Pi \times x' + \Gamma \times y', \Gamma \times x'))
= -\Pi \cdot (x \times x') - \Gamma \cdot (x \times y' - x' \times y)$ for any
$x, x', y, y' \in \mathbb{R}^3$. The heavy-top equations
$\dot\Pi = \Pi \times \Omega + Mg\ell\, \Gamma \times \chi$, $\dot\Gamma = \Gamma \times \Omega$
are Lie-Poisson equations on
$\mathfrak{se}(3)^*$ for the Hamiltonian $h(\Pi, \Gamma) = \frac{1}{2}\Pi \cdot \Omega + Mg\ell\, \Gamma \cdot \chi$
and the Lie-Poisson bracket
$\{f, g\}(\Pi, \Gamma) = -\Pi \cdot (\nabla_\Pi f \times \nabla_\Pi g) - \Gamma \cdot (\nabla_\Pi f \times \nabla_\Gamma g - \nabla_\Pi g \times \nabla_\Gamma f)$,
where $\nabla_\Pi$ and $\nabla_\Gamma$ denote the partial gradients.
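
A quick numerical check of the heavy-top equations, assuming the form $\dot\Pi = \Pi \times \Omega + Mg\ell\, \Gamma \times \chi$, $\dot\Gamma = \Gamma \times \Omega$ with $\Omega = I^{-1}\Pi$: the functions $\Pi \cdot \Gamma$ and $\|\Gamma\|^2$ that cut out the orbit $\mathcal{O}_\mu$, and the Hamiltonian, should all be conserved. The parameter values and the Runge-Kutta integrator in this sketch are illustrative assumptions.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def heavy_top_rhs(state, inertia, mgl, chi):
    """state = (Pi_1, Pi_2, Pi_3, Gamma_1, Gamma_2, Gamma_3)."""
    Pi, Gamma = state[:3], state[3:]
    omega = tuple(p / i for p, i in zip(Pi, inertia))
    torque = tuple(mgl * c for c in cross(Gamma, chi))
    dPi = tuple(a + b for a, b in zip(cross(Pi, omega), torque))
    return dPi + cross(Gamma, omega)


def rk4_step(state, dt, inertia, mgl, chi):
    def shifted(k, h):
        return tuple(s + h * d for s, d in zip(state, k))
    k1 = heavy_top_rhs(state, inertia, mgl, chi)
    k2 = heavy_top_rhs(shifted(k1, dt / 2), inertia, mgl, chi)
    k3 = heavy_top_rhs(shifted(k2, dt / 2), inertia, mgl, chi)
    k4 = heavy_top_rhs(shifted(k3, dt), inertia, mgl, chi)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def invariants(state, inertia, mgl, chi):
    Pi, Gamma = state[:3], state[3:]
    omega = tuple(p / i for p, i in zip(Pi, inertia))
    return (dot(Pi, Gamma), dot(Gamma, Gamma),
            0.5 * dot(Pi, omega) + mgl * dot(Gamma, chi))


# Hypothetical parameters: I = diag(2, 1.5, 1), M g l = 1, chi along the third body axis.
params = ((2.0, 1.5, 1.0), 1.0, (0.0, 0.0, 1.0))
state0 = (0.3, 0.2, 1.0, 0.0, 0.6, 0.8)
state = state0
for _ in range(2000):
    state = rk4_step(state, 0.002, *params)
drifts = [abs(a - b) for a, b in zip(invariants(state, *params),
                                     invariants(state0, *params))]
print(drifts)
```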

Let $(\Pi(t), \Gamma(t))$ be a periodic orbit of period $T$ of
the heavy-top equations. After time $T$, by how much
has the heavy top rotated in space? The answer is
provided by [3]:


par | f ont nF E (2r, T- amet f T) xds) 
= f [ATOR - Co moe y 


ds 
+f P(s) - ITs) 


where D is the spherical cap on the unit sphere 
whose boundary is the closed curve I(t) and D is a 
two-dimensional submanifold of the orbit O,, 
bounded by the closed integral curve (I(t), r(t)). 
The first terms in each summand represent the 
geometric phase and the second terms the dynamic 
phase. 


Gauged Poisson Structures 


If the Lie group $G$ acts freely and properly on a
smooth manifold $Q$, then $(T^*Q)/G$ is a quotient
Poisson manifold (see Poisson Reduction), where the
quotient is taken relative to the (left) lifted cotangent
action. The leaves of this Poisson manifold are the
orbit reduced spaces $J^{-1}(\mathcal{O}_\mu)/G$, where $\mathcal{O}_\mu \subset \mathfrak{g}^*$ is
the coadjoint G-orbit through $\mu \in \mathfrak{g}^*$ (see Symmetry
and Symplectic Reduction). Is there an explicit
formula for this reduced Poisson bracket on a
manifold diffeomorphic to $(T^*Q)/G$? It turns out
that this question has two answers, once a
connection on the principal bundle $\pi: Q \to Q/G$ is
introduced. The discussion below will also link to
the fibration version of cotangent bundle reduction.

In order to present these answers, we review two
bundle constructions. Let $G$ act freely and properly
on the manifold $P$ and consider the (left) principal
G-bundle $\rho: P \to P/G =: M$. Let $\tau: N \to M$ be a
surjective submersion. Then the pullback bundle
$\tilde\rho: (n, p) \in \tilde P := \{(n, p) \in N \times P \mid \rho(p) = \tau(n)\} \mapsto n \in N$
over $N$ is also a principal (left) G-bundle relative to
the action $g \cdot (n, p) := (n, g \cdot p)$.

If there is a (left) G-action on a manifold $V$, then the
diagonal G-action $g \cdot (p, v) = (g \cdot p, g \cdot v)$ on $P \times V$ is
also free and proper and one can form the associated
bundle $P \times_G V := (P \times V)/G$ which is a
locally trivial fiber bundle
$\rho_E: [p, v] \in E := P \times_G V \mapsto \rho(p) \in M$ over $M$ with fibers diffeomorphic to
$V$. Analogously, one can form the associated fiber
bundle $\rho_{\tilde E}: \tilde E := \tilde P \times_G V \to N$. Summarizing, the
associated bundle $\tilde E = \tilde P \times_G V \to N$ is obtained
from the principal bundle $\rho: P \to M$, the surjective
submersion $\tau: N \to M$, and the G-manifold $V$ by
pullback and association, in this order.

These operations can be reversed. First, form the
associated bundle $\rho_E: E = P \times_G V \to M$ and then
pull it back by the surjective submersion $\tau: N \to M$
to $N$ to get the pullback bundle $\tilde\rho_E: \tau^*E \to N$. The map
$\Phi: \tilde P \times_G V \to \tau^*E$ defined by $\Phi([(n, p), v]) := (n, [p, v])$
is an isomorphism of locally trivial fiber bundles.

These general considerations will be used now to
realize the quotient Poisson manifold $(T^*Q)/G$ in
two different ways. Let $Q$ be a manifold and $G$ a Lie
group (with Lie algebra $\mathfrak{g}$) acting freely and properly
on it. Let $A \in \Omega^1(Q; \mathfrak{g})$ be a connection 1-form on
the left G-principal bundle $\pi: Q \to Q/G$. Pull back
the G-bundle $\pi: Q \to Q/G$ by the cotangent bundle
projection $\pi_{Q/G}: T^*(Q/G) \to Q/G$ to $T^*(Q/G)$ to
obtain the G-principal bundle
$\tilde\pi_{Q/G}: (\alpha_{[q]}, q) \in \tilde Q := \{(\alpha_{[q]}, q) \mid [q] = \pi(q), q \in Q\} \mapsto \alpha_{[q]} \in T^*(Q/G)$. This
bundle is isomorphic to the annihilator $(VQ)^\circ \subset T^*Q$
of the vertical bundle $VQ := \ker T\pi \subset TQ$.
Next, form the coadjoint bundle
$\rho_S: S := \tilde Q \times_G \mathfrak{g}^* \to T^*(Q/G)$ of $\tilde Q$, $\rho_S([(\alpha_{[q]}, q), \mu]) = \alpha_{[q]}$, that is,
the associated vector bundle to the G-principal
bundle $\tilde Q \to T^*(Q/G)$ given by the coadjoint
representation of $G$ on $\mathfrak{g}^*$. The connection-dependent map
$\Phi_A: S \to (T^*Q)/G$ defined by
$\Phi_A([(\alpha_{[q]}, q), \mu]) := [T_q^*\pi(\alpha_{[q]}) + A(q)^*\mu]$, where $q \in Q$, $\alpha_{[q]} \in T^*_{[q]}(Q/G)$, and
$\mu \in \mathfrak{g}^*$, is a vector bundle isomorphism over $Q/G$.
The Sternberg space is the Poisson manifold $(S, \{\cdot, \cdot\}_S)$,
where $\{\cdot, \cdot\}_S$ is the pullback to $S$ by $\Phi_A$ of the quotient
Poisson bracket on $(T^*Q)/G$.

Next, we proceed in the opposite order. Construct
first the coadjoint bundle
$\rho_{\tilde{\mathfrak{g}}^*}: [q, \mu] \in \tilde{\mathfrak{g}}^* := Q \times_G \mathfrak{g}^* \mapsto [q] \in Q/G$
associated to the principal bundle
$\pi: Q \to Q/G$ and then pull it back by the cotangent
bundle projection $\pi_{Q/G}: T^*(Q/G) \to Q/G$ to
$T^*(Q/G)$ to obtain the vector bundle
$\rho_W: W := \{(\alpha_{[q]}, [q, \mu]) \mid \pi_{Q/G}(\alpha_{[q]}) = \rho_{\tilde{\mathfrak{g}}^*}([q, \mu]) = [q]\}$,
$\rho_W(\alpha_{[q]}, [q, \mu]) = \alpha_{[q]}$, over $T^*(Q/G)$. Note that
$W = T^*(Q/G) \oplus \tilde{\mathfrak{g}}^*$ and hence $W$ is also a vector bundle over
$Q/G$. Let $HQ$ be the horizontal sub-bundle defined by
the connection $A$; thus, $TQ = HQ \oplus VQ$, where
$H_qQ := \ker A(q)$. For each $q \in Q$, the linear map
$T_q\pi|_{H_qQ}: H_qQ \to T_{[q]}(Q/G)$ is an isomorphism. Let
$\mathrm{hor}_q := (T_q\pi|_{H_qQ})^{-1}: T_{[q]}(Q/G) \to H_qQ \subset T_qQ$ be
the horizontal lift operator induced by the connection
$A$. Thus, $\mathrm{hor}_q^*: T_q^*Q \to T^*_{[q]}(Q/G)$ is a linear surjective
map whose kernel is the annihilator $(H_qQ)^\circ$ of the
horizontal space. The connection-dependent map
$\Psi_A: (T^*Q)/G \to W$ defined by
$\Psi_A([\alpha_q]) := (\mathrm{hor}_q^*(\alpha_q), [q, J(\alpha_q)])$, where $q \in Q$, $\alpha_q \in T_q^*Q$, and
$J: T^*Q \to \mathfrak{g}^*$ is the momentum map of the lifted action,
$\langle J(\alpha_q), \xi \rangle = \alpha_q(\xi_Q(q))$ for $\xi \in \mathfrak{g}$, is a vector bundle
isomorphism over $Q/G$ and $\Psi_A \circ \Phi_A = \Phi$. The Weinstein
space is the Poisson manifold $(W, \{\cdot, \cdot\}_W)$, where
$\{\cdot, \cdot\}_W$ is the push-forward by $\Psi_A$ of the Poisson
bracket of $(T^*Q)/G$. In particular, $\Phi: S \to W$ is a
connection-independent Poisson diffeomorphism. The
Poisson brackets on $S$ and on $W$ are called gauged
Poisson brackets. They are expressed explicitly in terms
of various covariant derivatives induced on $S$ and on
$W$ by the connection $A \in \Omega^1(Q; \mathfrak{g})$.

Recall that the connection A on the principal 
bundle 7:O—O/G naturally induces connections 
on pullback bundles and affine connections on 
associated vector bundles. Thus, both S and W 
carry covariant derivatives induced by A. They are 
given, according to general definitions, in the cases 
under consideration, by: 





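• If $f \in C^\infty(S)$, $s = [(\alpha_{[q]}, q), \mu] \in S$, and
$v_{\alpha_{[q]}} \in T_{\alpha_{[q]}}T^*(Q/G)$, then
$d^S_{\tilde A} f(s) \in T^*_{\alpha_{[q]}}T^*(Q/G)$ is defined by
$d^S_{\tilde A} f(s)(v_{\alpha_{[q]}}) = df(s)\left( T_{((\alpha_{[q]}, q), \mu)}\pi_{\tilde Q \times \mathfrak{g}^*} \left( (v_{\alpha_{[q]}}, \mathrm{hor}_q(T_{\alpha_{[q]}}\pi_{Q/G}(v_{\alpha_{[q]}}))), 0 \right) \right)$,
where $\pi_{\tilde Q \times \mathfrak{g}^*}: \tilde Q \times \mathfrak{g}^* \to \tilde Q \times_G \mathfrak{g}^* = S$
is the orbit map. The symbol $d^S_{\tilde A}$ signifies
that this is a covariant derivative on the
associated bundle $S$ induced by the connection
$\tilde A$ on the principal G-pullback bundle
$\tilde Q \to T^*(Q/G)$. This connection $\tilde A$ is the pullback
connection defined by $A$.

• If $f \in C^\infty(W)$, $w = (\alpha_{[q]}, [q, \mu]) \in W$, and
$v_{\alpha_{[q]}} \in T_{\alpha_{[q]}}T^*(Q/G)$, then
$\nabla^W_{\tilde A} f(w) \in T^*_{\alpha_{[q]}}T^*(Q/G)$ is defined
by $\nabla^W_{\tilde A} f(w)(v_{\alpha_{[q]}}) = df(w)\left( v_{\alpha_{[q]}}, T_{(q, \mu)}\pi_{Q \times \mathfrak{g}^*}\left( \mathrm{hor}_q(T_{\alpha_{[q]}}\pi_{Q/G}(v_{\alpha_{[q]}})), 0 \right) \right)$,
where $\pi_{Q \times \mathfrak{g}^*}: Q \times \mathfrak{g}^* \to Q \times_G \mathfrak{g}^* = \tilde{\mathfrak{g}}^*$
is the orbit map. The symbol $\nabla^W_{\tilde A}$
signifies that this is a covariant derivative on the
pullback bundle $W$ induced by the covariant
derivative $\nabla_A$ on the coadjoint bundle $\tilde{\mathfrak{g}}^*$. This
covariant derivative $\nabla_A$ is induced on $\tilde{\mathfrak{g}}^*$ by the
connection $A$.

• For $f \in C^\infty(W)$, we have $d^S_{\tilde A}(f \circ \Phi) = (\nabla^W_{\tilde A} f) \circ \Phi$.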

To write the two gauged Poisson brackets on $S$ and
on $W$ explicitly, we denote by $\tilde{\mathfrak{g}} = Q \times_G \mathfrak{g}$ the
adjoint bundle of $\pi: Q \to Q/G$, by $\Omega_{Q/G}$ the
canonical symplectic structure on $T^*(Q/G)$, by
$B \in \Omega^2(Q; \mathfrak{g})$ the curvature of $A$, and by $\mathcal{B}$ the
$\tilde{\mathfrak{g}}$-valued 2-form $\mathcal{B} \in \Omega^2(Q/G; \tilde{\mathfrak{g}})$ on the base $Q/G$
defined by $\mathcal{B}([q])(u_{[q]}, v_{[q]}) = [q, B(q)(u_q, v_q)]$, for any
$u_q, v_q \in T_qQ$ that satisfy $T_q\pi(u_q) = u_{[q]}$ and
$T_q\pi(v_q) = v_{[q]}$. Note that both $S^*$ and $W^*$ are Lie
algebra bundles, that is, their fibers are Lie algebras
and the fiberwise Lie bracket operation depends
smoothly on the base point. If $f \in C^\infty(S)$, denote by
$\delta f/\delta s \in S^* = \tilde Q \times_G \mathfrak{g}$ the usual fiber derivative of $f$.
Similarly, if $f \in C^\infty(W)$ denote by $\delta f/\delta w \in W^*$ the
usual fiber derivative of $f$. Finally,
$\sharp: T^*(T^*(Q/G)) \to T(T^*(Q/G))$ is the vector bundle
isomorphism induced by $\Omega_{Q/G}$. The Poisson bracket of


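$f, g \in C^\infty(S)$ is given by

$\{f, g\}_S(s) = \Omega_{Q/G}(\alpha_{[q]})\left( d^S_{\tilde A} f(s)^\sharp, d^S_{\tilde A} g(s)^\sharp \right) - \left\langle s, \left[ \frac{\delta f}{\delta s}, \frac{\delta g}{\delta s} \right] \right\rangle + \left\langle v, (\pi_{Q/G}^*\mathcal{B})(\alpha_{[q]})\left( d^S_{\tilde A} f(s)^\sharp, d^S_{\tilde A} g(s)^\sharp \right) \right\rangle$

where $v = [q, \mu] \in \tilde{\mathfrak{g}}^*$. The Poisson bracket of
$f, g \in C^\infty(W)$ is given by

$\{f, g\}_W(w) = \Omega_{Q/G}(\alpha_{[q]})\left( \nabla^W_{\tilde A} f(w)^\sharp, \nabla^W_{\tilde A} g(w)^\sharp \right) - \left\langle w, \left[ \frac{\delta f}{\delta w}, \frac{\delta g}{\delta w} \right] \right\rangle + \left\langle v, (\pi_{Q/G}^*\mathcal{B})(\alpha_{[q]})\left( \nabla^W_{\tilde A} f(w)^\sharp, \nabla^W_{\tilde A} g(w)^\sharp \right) \right\rangle$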

Note that their structure is of the form: "canonical"
bracket plus a (left) "Lie-Poisson" bracket plus a
curvature coupling term.


The Symplectic Leaves of the Sternberg 
and Weinstein Spaces 


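The map $\varphi_A: \tilde Q \times \mathfrak{g}^* \to T^*Q$ given by
$\varphi_A((\alpha_{[q]}, q), \mu) := T_q^*\pi(\alpha_{[q]}) + A(q)^*\mu$, where $((\alpha_{[q]}, q), \mu) \in \tilde Q \times \mathfrak{g}^*$,
is a G-equivariant diffeomorphism; the G-action
on $T^*Q$ is by cotangent lift and on $\tilde Q \times \mathfrak{g}^*$ is
$g \cdot ((\alpha_{[q]}, q), \mu) = ((\alpha_{[q]}, g \cdot q), \mathrm{Ad}^*_{g^{-1}}\mu)$. The pullback $J_A$
of the momentum map to $\tilde Q \times \mathfrak{g}^*$ has the expression
$J_A((\alpha_{[q]}, q), \mu) = \mu$, so if $\mathcal{O} \subset \mathfrak{g}^*$ is a coadjoint orbit we
have $J_A^{-1}(\mathcal{O}) = \tilde Q \times \mathcal{O}$, and hence the orbit reduced
manifold $J_A^{-1}(\mathcal{O})/G$, whose connected components
are the symplectic leaves of $S$, equals $\tilde Q \times_G \mathcal{O}$. Its
symplectic form is the Sternberg minimal coupling
form $\omega_{\mathcal{O}} + \rho_S^*\Omega_{Q/G}$.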

In this formula, the 2-form wo has not been 
defined yet. It is uniquely defined by the identity 

T xg YO =dA + Iowo, where wọ is the minus orbit 
symplectic form on O (see Symmetry and Symplectic 
Reduction), Ho:O x O— Os the projection on the 
second factor, and AE? Ọ x O) is the 2-form 


given by — Allaahu) (layta) V) = 
“eat q)(vq)) for ((aiq 4), H) E Q x O, (ua “eva S 
40. and veg". 

“ae symplectic leaves of the Weinstein space 
W are obtained by pushing forward by © the 
symplectic leaves of the Sternberg space. They are 
the connected components of the symplectic 
ean (T*(Q/G) @ (Q xg O), He (0/6) RoG -- 

Io xs 0V0 xch ), where O is a coadjoint orbit in g*, 
Qo/g is the canonical symplectic form on T*(Q/G), 
WOx,0 18 a closed 2-form on er xg O to be defined 
below, and II g/g): T*(Q/G) @(Q xg O)— 
T*(Q/G), Tlox,0: T*(Q/G) 6 (Q xg O) - Q xg O 
are the projections. The closed 2-form wo, .6€ 
Q?(Ọ xg ©) is uniquely determined by the identity 
TO x OWO xg O = "Oxo? where 79x0:Q0 x eee xGO 
is the orbit space projection, Wo x0 EY (O xO) is 
closed and given by wWoxolq, H) (iq, =äd ii); 
(vq, ~ad u)):= —d(A x ido)(q, 1) (liq, —ad?y1), Wwa, 
ad ihe + wol Hl \(adžu, adju), and A x ido €Q'(Ox 
g*) is given by (Ax ido)(q, p)(ttq, —ades) = 
(ps A(q)(uq)), for gE O, WE g*, Ug, Vg € TzO, € Eg. 

Thus, on the Sternberg and Weinstein spaces, 
both the Poisson bracket as well as the symplectic 
form on the leaves have explicit connection 
dependent formulas (see Gauge Theory: Mathema- 
tical Applications for a general treatment of gauge 
theories). 


See also: Gauge Theory: Mathematical Applications; 
Hamiltonian Group Actions; Poisson Reduction; 
Symmetries and Conservation Laws; Symmetry and 
Symplectic Reduction. 
Critical Phenomena in Gravitational Collapse


Introduction


Sufficiently dense concentrations of mass-energy in 
general relativity collapse irreversibly and form black 
holes. More precisely, the singularity theorems state 
that once a closed trapped surface has developed, some 
world lines will only extend to a finite length in the 
future — they end in a spacetime singularity. Further- 
more, the cosmic censorship hypothesis states that this 
singularity is hidden away inside a black hole. One 
can, therefore, classify initial data in general relativity 
which describe an isolated system with no black hole 
present into those which remain regular, and those 
which form a black hole during their evolution. 

Theorems on the stability of Minkowski spacetime, 
and similar results for some types of matter coupled to 
gravity, imply that sufficiently weak (in some technical 
sense) initial data will remain regular. On the other 
hand, no necessary or sufficient criterion for black hole 
formation is known. For very strong data the existence 
of a closed trapped surface implies black hole 
formation, but although the data themselves may be 
regular, the trapped surface must already be inside the 
black hole. Between the very weak and very strong 
regime, there is a middle regime of initial data for 
which one cannot decide if they will or will not form a 
black hole, other than evolving them in time. 

The threshold between collapse and dispersion was
first explored systematically by Choptuik (1992). He
concentrated on the simple model of a spherically
symmetric massless scalar (matter) field $\phi(r, t)$. In this
model, the scalar-field matter must either form a black
hole, or disperse to infinity; it cannot form stable
stars. Choptuik explored the space of initial data by
means of one-parameter families of initial data which
interpolate between strong data (say with large
parameter $p$) that form a black hole and weak data
(with small $p$) that disperse. The critical value $p_*$ of the
parameter $p$ can be found for each family by evolving
many data sets from that family. Near the black hole
threshold, Choptuik found the following phenomena:


1. Mass scaling. By fine-tuning the initial data to
the threshold along any one-parameter family,
one can make arbitrarily small black holes. Near
the threshold, the black hole mass scales as

$M \simeq C\,(p - p_*)^\gamma$ for $p > p_*$   [1]

for the black hole mass $M$ in the limit $p \to p_*$
from above.

2. Universality. While $p_*$ and $C$ depend on the
particular one-parameter family of data, the critical
exponent $\gamma$ has a universal value, $\gamma \simeq 0.374$, for all
one-parameter families of scalar-field data. Furthermore,
for a finite time in a finite region of space, the
solutions generated by all near-critical data
approach one and the same solution $\phi_*$, called the
critical solution:

$\phi(r, t) \simeq \phi_*\left( \frac{r}{L}, \frac{t - t_*}{L} \right)$   [2]

The constants $t_*$ and $L$ depend again on the
family of initial data, but $\phi_*(r, t)$ is universal. This
universal phase ends when the evolution decides
between black hole formation and dispersion.
The universal critical solution is approached by
any initial data that are sufficiently close to the
black hole threshold, on either side, and from any
one-parameter family.

3. Scale-echoing. The critical solution $\phi_*(r, t)$ is
unchanged when one rescales space and time by
a factor $e^\Delta$:

$\phi_*(r, t) = \phi_*(e^\Delta r, e^\Delta t)$   [3]

where $\Delta \simeq 3.44$ for the scalar field.
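
The fine-tuning of $p$ to the threshold described above is, in practice, a bisection between a value of $p$ known to disperse and one known to collapse. The sketch below illustrates only the search loop: the predicate `forms_black_hole` is a hypothetical stand-in for a full numerical time evolution of the initial data, and the threshold value used in the example is made up.

```python
def find_threshold(forms_black_hole, p_weak, p_strong, tol=1e-12):
    """Bisect between a dispersing value p_weak and a collapsing value p_strong
    until the bracket around the critical value p_* is narrower than tol."""
    if forms_black_hole(p_weak) or not forms_black_hole(p_strong):
        raise ValueError("p_weak must disperse and p_strong must collapse")
    while p_strong - p_weak > tol:
        mid = 0.5 * (p_weak + p_strong)
        if forms_black_hole(mid):
            p_strong = mid      # mid collapses: threshold lies below it
        else:
            p_weak = mid        # mid disperses: threshold lies above it
    return 0.5 * (p_weak + p_strong)


# Toy stand-in for an evolution code, with a made-up threshold p_* = 0.3141.
p_star = find_threshold(lambda p: p > 0.3141, 0.0, 1.0)
print(p_star)
```

Each bisection step costs one full evolution, so reaching a relative precision of $10^{-12}$ in $p - p_*$ takes about 40 evolutions per family.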


The same phenomena were quickly discovered in
many other types of matter coupled to gravity, and
even in vacuum gravity (where gravitational waves can
form black holes). The echoing period $\Delta$ and critical
exponent $\gamma$ depend on the type of matter, but the
existence of the phenomena appears to be generic. For
some types of matter (e.g., perfect fluid matter), the
critical solution is continuously scale invariant (or
continuously self-similar, CSS) in the sense that

$\phi_*(r, t) = \phi_*(r/t)$   [4]

rather than scale-periodic (or discretely self-similar,
DSS) as in [3]. (We use the notation $\phi_*(x)$ for the
function of one variable $r/t$.) We have described
scale invariance and scale-echoing here in terms
of coordinates, but these do admit geometric,
coordinate-invariant definitions, which are not
restricted to spherical symmetry.

There is also another kind of critical behavior at the 
black hole threshold. Here, too, the evolution goes 
through a universal critical solution, but it is static, 
rather than scale invariant. As a consequence, the mass 
of black holes near the threshold takes a universal 
finite value (some fixed fraction of the mass of the 
critical solution), instead of showing power-law
scaling. In an analogy with first- and second-order
phase transitions in statistical mechanics, the critical 
phenomena with a finite mass at the black hole 
threshold are called type I, and the critical phenomena 
with power-law scaling of the mass are called type II. 

At this point, we characterize the degree of rigor 
of the various parts of the theory that is summarized 
in this article. Critical phenomena were discovered 
in the numerical time evolution of generic asympto- 
tically flat initial data. Numerical evolution of many 
elements of a specific one-parameter family, and 
fine-tuning to the black hole threshold along that 
family showed self-similarity and mass scaling near 
the threshold. Doing this for a number of randomly 
chosen one-parameter families suggests that these 
phenomena, and in particular the echoing scale $\Delta$
and mass-scaling exponent $\gamma$, are universal between
initial data within one model (e.g., the spherical 
scalar field). Numerical experiments, however, can 
only explore a finite-dimensional subspace of the 
infinite-dimensional space of initial data (phase 
space) of the field theory, and so cannot prove 
universality. 

We go further by applying the theory of dynami- 
cal systems to general relativity. The arguments 
summarized in the next section would be difficult to 
make rigorous, as the dynamical system under 
consideration is infinite dimensional, but they 
suggest a focus on fixed points of the dynamical 
system and their linear perturbations. Even though 
the dynamical systems motivation is not mathema- 
tically rigorous, the linearized analysis itself is a 
well-defined problem that can be solved numerically 
to essentially arbitrary precision. This proves
universality on a perturbative level, and provides
numerical values of $\Delta$ and $\gamma$. A combination of the
global dynamical systems analysis and perturbative 
analysis even predicts further critical exponents for 
black hole charge and angular momentum. Finally, 
critical phenomena have been discovered in a 
number of systems (different types of matter and 
symmetry restrictions), and this suggests that they 
may be generic for some large class of field theories 
(although details such as the numerical values of
$\gamma$ and $\Delta$ do depend on the system), but there is no
conclusive evidence for this at present. 


The Dynamical Systems Picture 


When we consider general relativity as an infinite- 
dimensional dynamical system, a solution curve is a 
spacetime. Points along the curve are Cauchy 
surfaces in the spacetime, which can be thought of 
as moments of time. An important difference
between general relativity and other field theories
is that the same spacetime can be sliced in many
different ways, none of which is preferred. There- 
fore, to turn general relativity into a dynamical 
system, one has to fix a slicing (and in practice also 
coordinates on each slice). In the example of the 
spherically symmetric massless scalar field, using 
polar slicing and an area radial coordinate r, a point 
in phase space can be characterized by the two 
functions

$Z = \{\phi(r),\ r\,\Pi(r)\}$   [5]

In spherical symmetry, there are no degrees of
freedom in the gravitational field, and Cauchy data for
the metric can be reconstructed from $Z$ using the
Einstein constraints.

The phase space consists of two halves: initial 
data whose time evolution always remains regular, 
and data which contain a black hole or form one 
during time evolution. The numerical evidence 
collected from individual one-parameter families of 
data suggests that the black hole threshold that 
separates the two is a smooth hypersurface. The 
mass-scaling law [1] can, therefore, be restated 
without explicit reference to one-parameter families. 
Let P be any function on phase space such that data 
sets with P > 0 form black holes, and data with P < 0 
do not, and which is analytic in a neighborhood of 
the black hole threshold $P = 0$. The black hole mass
as a function on phase space is then given by

$M \simeq F(P)\,P^\gamma$   [6]

for $P > 0$, where $F(P) > 0$ is an analytic function.
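
In numerical work the exponent $\gamma$ in the scaling law is usually read off as the slope of $\ln M$ against $\ln(p - p_*)$. The following least-squares sketch shows the idea on synthetic data; the function name and the data (generated from an exact power law with $\gamma = 0.374$ and a made-up constant $C = 2$) are assumptions for illustration, not results of any simulation.

```python
import math


def fit_critical_exponent(ps, masses, p_star):
    """Least-squares fit of ln M = ln C + gamma * ln(p - p_star).
    Returns (gamma, C)."""
    xs = [math.log(p - p_star) for p in ps]
    ys = [math.log(m) for m in masses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    return gamma, math.exp(mean_y - gamma * mean_x)


# Synthetic supercritical data following M = C (p - p_*)^gamma exactly.
p_star = 0.3141
ps = [p_star + 10.0 ** (-k) for k in range(3, 9)]
masses = [2.0 * (p - p_star) ** 0.374 for p in ps]
gamma, C = fit_critical_exponent(ps, masses, p_star)
print(gamma, C)
```

For a DSS critical solution, real data show a small periodic modulation superposed on the straight line, so a fit of this kind would only capture the mean slope.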

Consider now the time evolution in this dynami- 
cal system, near the threshold (“critical surface”) 
between black hole formation and dispersion. A 
phase-space trajectory that starts out in a critical 
surface by definition never leaves it. A critical 
surface is, therefore, a dynamical system in its own 
right, with one dimension fewer. If it has an 
attracting fixed point, such a point is called a 
critical point. It is an attractor of codimension 1, 
and the critical surface is its basin of attraction. The 
fact that the critical solution is an attractor of 
codimension 1 is visible in its linear perturbations: it 
has an infinite number of decaying perturbation 
modes tangential to (and spanning) the critical 
surface, and a single growing mode not tangential 
to the critical surface. 

Any trajectory beginning near the critical surface,
but not necessarily near the critical point, moves
almost parallel to the critical surface toward the
critical point. As the phase point approaches the
critical point, its movement parallel to the surface
slows down, while its distance and velocity out of
the critical surface are still small. The phase point
spends some time moving slowly near the critical
point. Eventually, it moves away from the critical
point in the direction of the growing mode, and ends
up on an attracting fixed point.

Figure 1 The phase-space picture for the black hole threshold
in the presence of a critical point. The arrow lines are time
evolutions, corresponding to spacetimes. The line without an
arrow is not a time evolution, but a one-parameter family of initial
data that crosses the black hole threshold at $p = p_*$. (Reproduced
with permission from Gundlach C (2003) Critical phenomena in
gravitational collapse. Physics Reports 376: 339-405.)

This is the origin of universality: any initial data 
set that is close to the black hole threshold (on either 
side) evolves to a spacetime that approximates the 
critical spacetime for some time. When it finally
approaches either the dispersion fixed point or the 
black hole fixed point, it does so on a trajectory that 
appears to be coming from the critical point itself. 
All near-critical solutions are passing through one of 
these two funnels. All details of the initial data have 
been forgotten, except for the distance from the 
black hole threshold: the closer the initial phase 
point is to the critical surface, the more the solution 
curve approaches the critical point, and the longer it 
will remain close to it. 
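
The statement that closer initial data linger longer near the critical point can be made quantitative in a toy model. Assume, purely for illustration, that the distance $y$ from the critical surface obeys the linearized law $\dot y = \lambda y$ with a single growing mode of rate $\lambda > 0$; the time needed for $y$ to grow from $\epsilon$ to order one is then $\lambda^{-1} \ln(1/\epsilon)$. This is a sketch of the mechanism only, not a model derived from the Einstein equations.

```python
import math


def linger_time(eps, lam, dt=1e-3):
    """Time for the unstable amplitude y, with dy/dt = lam * y and y(0) = eps,
    to reach |y| = 1, stepping with the exact flow over intervals of length dt."""
    y, t = eps, 0.0
    growth = math.exp(lam * dt)
    while abs(y) < 1.0:
        y *= growth
        t += dt
    return t


lam = 2.0                       # hypothetical growth rate of the unstable mode
t_far = linger_time(1e-6, lam)
t_near = linger_time(1e-8, lam)
print(t_far, t_near, t_near - t_far)  # difference is close to ln(100) / lam
```

The logarithmic dependence on $\epsilon$ means that every additional factor of 100 in fine-tuning buys only a fixed extra amount of time near the critical point.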

In all systems that have been examined, the black 
hole threshold contains at least one critical point. A 
fixed point of the dynamical system represents a 
spacetime with an additional continuous symmetry 
that generic solutions do not have. If the critical 
spacetime is time independent in the usual sense, we 
have type I critical phenomena; if the symmetry is 
scale invariance, we have type II critical phenomena. 
The attractor within the critical surface may also be 
a limit cycle, rather than a fixed point. In spacetime
terms this corresponds to a discrete symmetry (DSS
rather than CSS in type II, or a pulsating critical 
solution, rather than a stationary one, in type I). 


Self-Similarity and Mass Scaling 


Type II critical phenomena occur where the critical 
solution is scale invariant (self-similar, CSS or DSS). 
Using suitable spacetime coordinates, a CSS solution 
can be characterized as independent of a time 
coordinate $\tau$ which is also a logarithmic scale. 
Similarly, a DSS solution can be characterized as 
periodic in $\tau$. For example, starting from the scale 
periodicity [3] in polar-radial coordinates, we 
replace $r$ and $t$ by new coordinates 

$$x \equiv \frac{r}{t_* - t}, \qquad \tau \equiv -\ln\left(\frac{t_* - t}{L}\right)$$ [7] 


where the accumulation time $t_*$ and scale $L$ must be 
matched to the one-parameter family under consideration. 
$\tau$ has been defined so that it increases as 
$t$ increases and approaches $t_*$ from below. It is useful 
to think of $r$, $t$, and $L$ as having dimension length in 
units $c = G = 1$, and of $x$ and $\tau$ as dimensionless. 
Choptuik's observation, expressed in these coordinates, 
is that in any near-critical solution there is 
a spacetime region where the fields $Z$ are well 
approximated by the critical solution, or 

$$Z(x, \tau) \simeq Z_*(x, \tau)$$ [8] 

with 

$$Z_*(x, \tau + \Delta) = Z_*(x, \tau)$$ [9] 


Note that the time parameter of the dynamical 
system must be chosen as $\tau$ if a CSS solution is to be 
a fixed point, or a DSS solution a cycle. More 
generally (going beyond spherical symmetry), on any 
self-similar spacetime one can introduce coordinates 
$x^\mu = (\tau, x^1, x^2, x^3)$ in which the metric is of the form 

$$g_{\mu\nu} = e^{-2\tau}\, \tilde g_{\mu\nu}$$ [10] 

and where $\tilde g_{\mu\nu}$ is independent of $\tau$ for a CSS 
spacetime, and periodic in $\tau$ for a DSS spacetime. 
These coordinates are not unique. 

The critical exponent $\gamma$ can be calculated from the 
linear perturbations of the critical solution. In order 
to keep the notation simple, the discussion will be 
restricted to a critical solution that is spherically 
symmetric and CSS, which is correct, for example, 
for perfect-fluid matter. 

Let us assume that we have fine-tuned initial data 
close to the black hole threshold so that in a region 
the resulting spacetime is well approximated by the 
CSS critical solution. This part of the spacetime 
corresponds to the section of the phase-space 
trajectory that lingers near the critical point. In this 
region, we can linearize around $Z_*$. As $Z_*$ does not 
depend on $\tau$, its linear perturbations can depend 
on $\tau$ only exponentially. Labeling the perturbation 
modes by $i$, a single mode perturbation is of 
the form 

$$\delta Z = C_i\, e^{\lambda_i \tau} Z_i(x)$$ [11] 


In the near-critical regime, we can therefore 
approximate the solution as 

$$Z(x, \tau) \simeq Z_*(x) + \sum_{i=0}^{\infty} C_i(p)\, e^{\lambda_i \tau} Z_i(x)$$ [12] 


The notation $C_i(p)$ is used because the perturbation 
amplitudes $C_i$ depend on the initial data, and hence 
on the parameter $p$ that controls the initial data. 

If $Z_*$ is a critical solution, by definition there is 
exactly one $\lambda_i$ with positive real part (in fact, it is 
purely real), say $\lambda_0$. As $t \to t_*$ from below, which 
corresponds to $\tau \to \infty$, all other perturbations decay 
and can be neglected. By definition, the critical 
solution corresponds to $p = p_*$, and so we must have 
$C_0(p_*) = 0$. Linearizing around $p_*$, we obtain 

$$Z(x, \tau) \simeq Z_*(x) + \left.\frac{dC_0}{dp}\right|_{p_*} (p - p_*)\, e^{\lambda_0 \tau} Z_0(x)$$ [13] 


in a region of the spacetime. 
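Now we extract Cauchy data at one particular 
value of $\tau$ within that region, namely at $\tau_p$ 
defined by 

$$\left.\frac{dC_0}{dp}\right|_{p_*} |p - p_*|\, e^{\lambda_0 \tau_p} \equiv \epsilon$$ [14] 

where $\epsilon$ is an arbitrary small constant, so that 

$$Z(x, \tau_p) \simeq Z_*(x) \pm \epsilon\, Z_0(x)$$ [15] 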


where $\pm$ is the sign of $p - p_*$, left behind because by 
definition $\epsilon$ is positive. As $\tau$ increases from $\tau_p$, the 
growing perturbation becomes nonlinear and the 
approximation [13] breaks down. Then either a 
black hole forms (say for the positive sign), or the 
solution disperses (for the negative sign). We need 
not follow this nonlinear evolution in detail to find 
the black hole mass scaling in the former case: 
dimensional analysis is sufficient. Going back to 
coordinates $t$ and $r$, we have 

$$Z(r, t_p) \simeq Z_*\!\left(\frac{r}{L_p}\right) \pm \epsilon\, Z_0\!\left(\frac{r}{L_p}\right)$$ [16] 

where 

$$L_p \equiv L\, e^{-\tau_p}$$ [17] 




These Cauchy data at $t = t_p$ depend on the initial 
data at $t = 0$ only through the overall scale $L_p$, and 
through the sign in front of $\epsilon$. If the field equations 
themselves are scale invariant, or asymptotically 
scale invariant at scales $L_p$ and smaller, the black 
hole mass, which has dimensions of length in 
gravitational units, must be proportional to the 
initial data scale $L_p$, the only length scale that is 
present. Therefore, 

$$M \propto L_p \propto (p - p_*)^{1/\lambda_0}$$ [18] 

and we have found the critical exponent to be $\gamma = 1/\lambda_0$. 
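
The scaling argument can be checked numerically. The sketch below solves eq. [14] for $\tau_p$, inserts it into eq. [17], and fits the slope of $\ln L_p$ against $\ln|p - p_*|$. The growth rate `lam0 = 2.5`, the amplitude derivative `dC0_dp`, and the constants `eps` and `L` are illustrative assumptions, not values taken from the text.

```python
import math

def tau_p(dp, lam0, dC0_dp=1.0, eps=1e-3):
    """Solve eq. [14], dC0_dp * |dp| * exp(lam0 * tau_p) = eps, for tau_p.
    dp stands for p - p_*."""
    return math.log(eps / (dC0_dp * abs(dp))) / lam0

def length_scale(dp, lam0, L=1.0, dC0_dp=1.0, eps=1e-3):
    """Eq. [17]: L_p = L * exp(-tau_p)."""
    return L * math.exp(-tau_p(dp, lam0, dC0_dp, eps))

def fitted_exponent(lam0):
    """Least-squares slope of ln L_p against ln |p - p_*|."""
    offsets = [10.0 ** (-k) for k in range(4, 12)]
    xs = [math.log(dp) for dp in offsets]
    ys = [math.log(length_scale(dp, lam0)) for dp in offsets]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

if __name__ == "__main__":
    lam0 = 2.5  # hypothetical growth rate of the single unstable mode
    print(fitted_exponent(lam0), 1.0 / lam0)
```

The fitted slope agrees with $1/\lambda_0$ whatever values are chosen for `eps`, `dC0_dp`, and `L`; these only shift the intercept, which is why the exponent is universal while the prefactor in eq. [18] is not.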


The Analogy with Statistical Mechanics 


The existence of a threshold where a qualitative 
change takes place, universality, scale invariance, 
and critical exponents suggest that there is a 
mathematical analogy between type II critical 
phenomena and critical phase transitions in statis- 
tical mechanics. 

In equilibrium statistical mechanics, observable 
macroscopic quantities, such as the magnetization of 
a ferromagnetic material, are derived as statistical 
averages over microstates of the system. The 
expected value of an observable is 


$$\langle A \rangle = \sum_{\text{microstates}} A(\text{microstate})\, e^{-H(\text{microstate},\, \mu)}$$ [19] 

The Hamiltonian $H$ depends on the parameters $\mu$, 
which comprise the temperature, parameters characterizing 
the system such as interaction energies of 
the constituent molecules, and macroscopic forces 
such as the external magnetic field. The objective of 
statistical mechanics is to derive relations between 
the macroscopic quantities $A$ and parameters $\mu$. 

Phase transitions in thermodynamics are thresholds 
in the space of external forces $\mu$ at which the 
macroscopic observables $A$, or one of their derivatives, 
change discontinuously. In a ferromagnetic material 
at high temperatures, the magnetization $m$ of the 
material (alignment of atomic spins) is determined by 
the external magnetic field $B$. At low temperatures, the 
material shows a spontaneous magnetization even at 
zero external field, which breaks rotational symmetry. 
With increasing temperature, the spontaneous 
magnetization $m$ decreases and vanishes at the Curie 
temperature $T_*$ as 

$$|m| \sim (T_* - T)^{\gamma}$$ [20] 

In the presence of a very weak external field, the 
spontaneous magnetization aligns itself with the 
external field $B$, while its strength is, to leading 
order, independent of $B$. The function $m(B, T)$, 
therefore, changes discontinuously at $B = 0$. The line 
$B = 0$ for $T < T_*$ is, therefore, a line of first-order 
phase transitions between the possible directions of 
the spontaneous magnetization (in a one-dimensional 
system, between $m$ up and $m$ down). This line 
ends at the critical point ($B = 0$, $T = T_*$) where the 
order parameter $|m|$ vanishes. The role of $B = 0$ as 
the critical value of $B$ is obscured by the fact that 
$B = 0$ is singled out by symmetry. 

A critical phase transition involves scale-invariant 
physics. One sign of this is that fluctuations appear 
on a large range of length scales between the 
underlying atomic scale and the scale of the sample. 
In particular, the atomic scale, and any dimensionful 
parameters associated with that scale, must become 
irrelevant at the critical point. This can be taken as 
the starting point for obtaining properties of the 
system at the critical point. 

One first defines a semigroup acting on micro- 
states: the renormalization group. Its action is to 
group together a small number of particles as a 
single particle of a fictitious new system, using some 
averaging procedure. Alternatively, this can also be 
done in Fourier space. One then defines a dual 
action of the renormalization group on the space of 
Hamiltonians by demanding that the partition 
function is invariant under the renormalization 
group action: 


$$\sum_{\text{microstates}} e^{-H} = \sum_{\text{microstates}'} e^{-H'}$$ [21] 

The renormalized Hamiltonian $H'$ is in general 
more complicated than the original one, but it can 
be approximated by a fixed expression where only 
a finite number of parameters $\mu$ are adjusted. Fixed 
points of the renormalization group correspond to 
Hamiltonians with the parameters $\mu$ at their critical 
values. The critical value of any dimensional 
parameter $\mu$ must be zero (or infinity). Only 
dimensionless combinations can have nontrivial 
critical values. 
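
A standard textbook illustration of this construction, which is not taken from the article, is decimation of the one-dimensional Ising chain. Summing out every other spin maps the dimensionless coupling $K$ to $K' = \tfrac{1}{2}\ln\cosh 2K$ and produces a constant factor $A = 2\sqrt{\cosh 2K}$ per removed spin, which can be absorbed into $H'$ as a constant term. The sketch below checks by brute force that the partition function is preserved; the chain lengths and the value of `K` are arbitrary choices.

```python
import itertools
import math

def partition_function(n_spins, K):
    """Brute-force partition function of a periodic Ising chain,
    with weight exp(K * sum_i s_i s_{i+1}) per configuration."""
    total = 0.0
    for spins in itertools.product((-1, 1), repeat=n_spins):
        bond_sum = sum(spins[i] * spins[(i + 1) % n_spins]
                       for i in range(n_spins))
        total += math.exp(K * bond_sum)
    return total

def decimate(K):
    """Sum out every other spin. Returns the renormalized coupling K'
    and the constant factor A produced per removed spin."""
    K_new = 0.5 * math.log(math.cosh(2.0 * K))
    A = 2.0 * math.sqrt(math.cosh(2.0 * K))
    return K_new, A

if __name__ == "__main__":
    K = 0.7
    K_new, A = decimate(K)
    # Eight spins with coupling K versus four spins with coupling K'.
    print(partition_function(8, K), A ** 4 * partition_function(4, K_new))
    print(K, K_new)  # the coupling flows toward the trivial fixed point K = 0
```

Because $K' < K$ for every finite $K > 0$, the only fixed points of this map are $K = 0$ and $K = \infty$, consistent with the absence of a finite-temperature phase transition in one dimension.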

The behavior of thermodynamical quantities at 
the critical point is in general not trivial to calculate. 
But the action of the renormalization group on 
length scales is given by its definition. The blowup 
of the correlation length $\xi$ at the critical point is, 
therefore, the easiest critical exponent to calculate. 
We make contact with critical phenomena in 
gravitational collapse by considering the time evolution 
in coordinates $(\tau, x)$ as a renormalization group 
action. The calculation of the critical exponent for 
the black hole mass $M$ is the precise analog of the 
calculation of the critical exponent for the correlation 
length $\xi$, substituting $T_* - T$ for $p - p_*$, and 
taking into account that the $\tau$-evolution in critical 
collapse is toward smaller scales, while the 
renormalization group flow goes toward larger scales: 
therefore, $\xi$ diverges at the critical point, while $M$ 
vanishes. 

We have shown above that the black hole mass is 
controlled by one global function P on phase space. 
Clearly, P is the gravity equivalent of $T - T_*$ in 
the ferromagnet. But it is tempting to speculate 
(Gundlach 2002) that there is also a gravity equivalent 
of the external magnetic field $B$, which gives rise 
to a second independent critical exponent. At least 
in some situations, the angular momentum of the 
initial data can play this role. Note that, like B, 
angular momentum is a vector, with a critical value 
that is zero because all other values break rotational 
symmetry. Furthermore, the final black hole can 
have nonvanishing angular momentum, which must 
depend on the angular momentum of the initial 
data. The former is analogous to the magnetization 
m, the latter to the external field B. It can be shown 
that this analogy holds perturbatively for small 
angular momentum. Future numerical simulations 
will show if it goes further. 


Universality and Cosmic Censorship 


Critical phenomena in gravitational collapse first 
generated interest because a complicated self-similar 
structure and dimensionless numbers $\gamma$ and $\Delta$ arise 
from generic initial data evolved by quite simple 
field equations. Another point of interest is the 
rather detailed analogy of phenomena in a determi- 
nistic field theory with critical phase transitions in 
statistical mechanics. But critical phenomena are 
important for general relativity mostly for a differ- 
ent reason. 

Black holes are among the most important 
solutions of general relativity because of their 
universality: the black hole uniqueness theorems 
state that stable black holes are completely deter- 
mined by their mass, angular momentum, and 
electric charge — the Kerr-Newman family of black 
holes. Perturbation theory shows that any perturba- 
tions of black holes from the Kerr-Newman solu- 
tions must be radiated away. 

Critical solutions have a similar importance 
because they are generic intermediate states of 
the evolution that are also independent of the 
initial data. An important distinction is that 
critical solutions depend on the matter model, 
and are therefore less universal than black holes. 
However, critical phenomena in gravitational 
collapse seem to arise in axisymmetric vacuum 
spacetimes, and so are apparently not linked to the 


presence of matter. Furthermore, they also arise in 
perfect-fluid matter with the equation of state 
$p = \rho/3$, which is that of an ultrarelativistic gas. 
This is a good approximation for matter at very 
high density, such as in the big bang. This is 
important because critical phenomena probe 
arbitrarily large matter densities or spacetime 
curvatures as the initial data are fine-tuned to the 
black hole threshold. At even higher densities, 
presumably on the Planck scale, scale invariance is 
again broken by quantum-gravity effects, and 
so critical phenomena will end there. 

The cosmic censorship conjecture states that 
naked singularities do not arise from suitably 
generic initial data for suitably well-behaved mat- 
ter. Critical phenomena in gravitational collapse 
have forced a tightening of this conjecture. Type II 
(self-similar) critical solutions contain a naked 
singularity, that is, a point of infinite spacetime 
curvature from which information can reach a 
distant observer. (By contrast, the singularity inside 
a black hole is hidden from distant observers.) On a 
kinematical level, this could be seen already from 
the form [10] of the metric. Because the critical 
solution is the end state for all initial data that are 
exactly on the black hole threshold, all initial data 
on the black hole threshold form a naked singular- 
ity. As type II critical phenomena appear to be 
generic at least in spherical symmetry, this means 
that in generic self-gravitating systems, the space of 
regular initial data that form naked singularities is 
larger than expected, namely of codimension 1. 
Excluding naked singularities from generic initial 
data may be the sharpest version of cosmic censor- 
ship one can now hope to prove. 

Another point of interest in critical collapse is that 
it allows one to make a small region of arbitrarily 
high curvature from finite-curvature initial data. 
This may be a route for probing quantum-gravity 
effects. Similarly, one can make black holes that are 
much smaller than any length scale present in the 
initial data or the matter equation of state. An 
application has been suggested for this in cosmol- 
ogy, where primordial black holes could have 
masses much smaller than the Hubble scale at 
which they are created, rather than of the order of 
this scale. 
Outlook 


Critical phenomena in gravitational collapse are now 
well understood in spherical symmetry, both theoreti- 
cally and in numerical simulations. In some matter 
models, the phenomenology is quite complicated, but 
it still fits into the basic picture outlined here. 

The crucial question as to what happens beyond 
spherical symmetry remains largely unanswered at 
the time of writing. Perturbation theory around 
spherical symmetry suggests that critical phenom- 
ena are not restricted to exactly spherical situa- 
tions. This is also supported by simulations in 
axisymmetric (highly nonspherical) vacuum grav- 
ity. Other simulations of nonspherical gravitational 
collapse which cover the necessary range of space- 
time scales required to see critical phenomena are 
only just becoming available, and the results are 
not yet clear-cut. For collapse with angular 
momentum, no high-resolution calculations have 
yet been carried out. As the necessary techniques 
become available, one should be prepared for 
numerical simulations to make dramatic extensions 
or corrections to the picture of critical collapse 
drawn up here. 


See also: Computational Methods in General Relativity: 
The Theory; Spacetime Topology, Causal Structure and 
Singularities; Stability of Minkowski Space; Stationary 
Black Holes. 
Introduction 


Certain commutation relations among the current 
density operators in quantum field theories define 
an infinite-dimensional Lie algebra. The original 
current algebra of Gell-Mann described weak and 
electromagnetic currents of the strongly interacting 
particles (hadrons), leading to the Adler-Weisberger 
formula and other important physical results. This 
helped inspire mathematical and quantum-theoretic 
developments such as the Sugawara model, light 
cone currents, Virasoro algebra, the mathematical 
theory of affine Kac-Moody algebras, and non- 
relativistic current algebra in quantum and statis- 
tical physics. Lie algebras of local currents may be 
the infinitesimal representations of loop groups, 
local current groups or gauge groups, diffeomorph- 
ism groups, and their semidirect products or other 
extensions. Broadly construed, current algebra thus 
leads directly into the representation theory of 
infinite-dimensional groups and algebras. Applica- 
tions have ranged across conformally invariant 
field theory, vertex operator algebras, exactly 
solvable lattice and continuum models in statistical 
physics, exotic particle statistics and q-commutation 
relations, hydrodynamics and quantized vortex 
motion. This brief survey describes but a few 
highlights. 


Relativistic Local Current Algebra 
for Hadrons 


To model superfluidity, Landau had proposed in 
1941 a quantum hydrodynamics fundamentally 
based on local fluid densities and currents as 
(operator) dynamical variables. However, current 
algebra came into its own in theoretical physics with 
the ideas of Gell-Mann in the early 1960s. The basic 
concept, in the era just preceding quantum chromo- 
dynamics (QCD), was that even without knowing 
the Lagrangian governing hadron dynamics in 
detail, exact kinematical information — the local 
symmetry — could still be encoded in an algebra of 
currents. The local (vector and axial vector) current 
density operators, expressed where possible in terms 
of underlying quantized field operators in Hilbert 
space, were to form two octets of Lorentz 4-vectors, 
with each octet corresponding to the eight genera- 
tors of the compact Lie group SU(3). 


More specifically (Adler and Dashen 1968), let 
$F_a^\mu(x)$, $a = 1, \ldots, 8$, $\mu = 0, 1, 2, 3$, be an octet of 
hadronic vector currents, where as usual 
$x = (x^\mu) = (x^0, \mathbf{x})$ denotes a point in four-dimensional 
spacetime. Likewise, introduce an axial vector octet 
$F_a^{5\mu}(x)$. Unless otherwise specified, we use natural 
units, where $\hbar = 1$ and $c = 1$. Define the corresponding 
charges $F_a$ and $F_a^5$ to be the space integrals of the 
time components of these currents, that is, 

$$F_a(x^0) = \int d^3x\, F_a^0(x^0, \mathbf{x}), \qquad F_a^5(x^0) = \int d^3x\, F_a^{50}(x^0, \mathbf{x})$$ [1] 


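where $d^3x = dx^1\, dx^2\, dx^3$. Then $F_1, F_2, F_3$ are the 
three components $I_1, I_2, I_3$ of the isotopic spin, and 
$Y = (2/\sqrt{3})F_8$ is the hypercharge. The usual 
electromagnetic current $J^\mu_{\mathrm{em}}(x^0, \mathbf{x})$ is given by 

$$J^\mu_{\mathrm{em}} = q\left(F_3^\mu + \frac{1}{\sqrt{3}} F_8^\mu\right)$$ [2] 

where $q$ is the unit elementary charge, and the total 
charge is given by $Q = \int d^3x\, J^0_{\mathrm{em}}(x^0, \mathbf{x}) = q(I_3 + Y/2)$. 
The hadronic part of the weak current entering an 
effective Lagrangian can be written as 

$$J^\mu_{\mathrm{w}} = \left[(F_1^\mu - F_1^{5\mu}) + i(F_2^\mu - F_2^{5\mu})\right]\cos\theta_C + \left[(F_4^\mu - F_4^{5\mu}) + i(F_5^\mu - F_5^{5\mu})\right]\sin\theta_C$$ [3] 

where $\theta_C$ is the Cabibbo angle (determined experimentally 
to be ~0.27 rad). The terms with $F_1^\mu - F_1^{5\mu}$ 
and $F_2^\mu - F_2^{5\mu}$ are strangeness conserving, those with 
$F_4^\mu - F_4^{5\mu}$ and $F_5^\mu - F_5^{5\mu}$ are not. 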
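The main current algebra hypothesis is that the 
time components $F_a^0$ and $F_a^{50}$ of these octets satisfy 
the equal-time commutation relations: 

$$[F_a^0(x^0, \mathbf{x}), F_b^0(y^0, \mathbf{y})]_{x^0 = y^0} = i\, \delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, F_d^0(x^0, \mathbf{x})$$ 
$$[F_a^0(x^0, \mathbf{x}), F_b^{50}(y^0, \mathbf{y})]_{x^0 = y^0} = i\, \delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, F_d^{50}(x^0, \mathbf{x})$$ 
$$[F_a^{50}(x^0, \mathbf{x}), F_b^{50}(y^0, \mathbf{y})]_{x^0 = y^0} = i\, \delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, F_d^0(x^0, \mathbf{x})$$ [4] 

where the $c_{abd}$ are structure constants of the Lie 
algebra of SU(3), antisymmetric in the indices. Since 
current commutators relate bilinear expressions to 
linear ones, they fix the normalizations of the 
currents. The chiral currents $(1/2)(F_a^\mu - F_a^{5\mu})$ 
and $(1/2)(F_a^\mu + F_a^{5\mu})$ commute with each 
other, so that the local current algebra decomposes 
into two independent pieces. 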

The Dirac $\delta$-functions in eqns [4] require that $F_a^0$ and 
$F_a^{50}$ be interpreted as (unbounded) operator-valued 
distributions; while the fixed-time condition suggests 
these should make mathematical sense as 
three-dimensional distributions, with $x^0$ held constant. 
Such distributions may be modeled on the test-function 
space $\mathcal{D}$ of real-valued, compactly supported, $C^\infty$ 
functions on the spacelike hyperplane $\mathbb{R}^3$. For functions 
$f_a, f_a^5 \in \mathcal{D}$, one has formally the "smeared currents" 
that are expected to be bona fide (unbounded) 
operators in Hilbert space; suppressing $x^0$, 

$$F_a^0(f_a) = \int_{\mathbb{R}^3} d^3x\, f_a(\mathbf{x})\, F_a^0(\mathbf{x}), \qquad F_a^{50}(f_a^5) = \int_{\mathbb{R}^3} d^3x\, f_a^5(\mathbf{x})\, F_a^{50}(\mathbf{x})$$ [5] 

Equations [4] then become 

$$[F_a^0(f_a), F_b^0(f_b)] = [F_a^{50}(f_a), F_b^{50}(f_b)] = i \sum_d F_d^0(c_{abd}\, f_a f_b)$$ 
$$[F_a^0(f_a), F_b^{50}(f_b)] = i \sum_d F_d^{50}(c_{abd}\, f_a f_b)$$ [6] 

Let $g(\mathbf{x})$ be a $C^\infty$ map from $\mathbb{R}^3$ to the Lie algebra $\mathcal{G}$ of 
chiral SU(3) x SU(3), equal to zero outside a compact 
set. The set of all such $\mathcal{G}$-valued functions forms an 
infinite-dimensional Lie algebra under the pointwise 
bracket, $[g, g'](\mathbf{x}) = [g(\mathbf{x}), g'(\mathbf{x})]$. Let us call this Lie 
algebra $\mathrm{map}_0(\mathbb{R}^3, \mathcal{G})$, where the subscript 0 indicates 
the condition of compact support when that is 
applicable (on compact manifolds, we omit the 
subscript). Expanding $g(\mathbf{x})$ with respect to a fixed basis of 
$\mathcal{G}$, we straightforwardly identify the map $g$ with the 
sets of test functions $f_a$ and $f_a^5$. Then, defining 
$F(g) = \sum_a F_a^0(f_a) + \sum_a F_a^{50}(f_a^5)$, eqns [6] are 
interpreted (for fixed $x^0$) as a representation $F$ of 
$\mathrm{map}_0(\mathbb{R}^3, \mathcal{G})$. 

Integrating out the spatial variables entirely using 
eqns [1] leads to a representation at $x^0$ of $\mathcal{G}$ by the 
charges $F_a$ and $F_a^5$. The Adler-Weisberger sum rule 
was first derived (in 1965) from the commutation 
relations of these charges, together with the assumption 
of a partially conserved axial-vector current 
(PCAC). It connected nucleon $\beta$-decay coupling with 
pion-nucleon scattering cross sections, agreeing well 
with experiment. Various low-energy theorems 
followed, also in accord with experiment. Shortly 
thereafter, Adler was able to eliminate the PCAC 
assumption, and derived a further sum rule going 
beyond an experimental test of the algebra of 
charges to test the actual local current algebra. 
Here, the prediction pertained to structure functions 
in the deep inelastic scattering of neutrinos. This 
was elaborated by Bjorken to inelastic electron 
scattering. On the theoretical side, the study of the 
chiral current in perturbation theory led into the 
theory of anomalies. All these ideas were highly 
influential in subsequent theoretical work (Treiman 
et al. 1985, Mickelsson 1989). 

It is a natural idea to try to extend eqns [4] or [6], 
which elegantly express the combined ideas of 
locality and symmetry, to an equal-time commutator 
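algebra that would also include the space components 
of the local currents $F_a^k$ and $F_a^{5k}$, $k = 1, 2, 3$. One may 
write without difficulty the commutators of the 
charges in [1] with these space components: 

$$[F_a(x^0), F_b^k(x^0, \mathbf{x})] = [F_a^5(x^0), F_b^{5k}(x^0, \mathbf{x})] = i \sum_d c_{abd}\, F_d^k(x^0, \mathbf{x})$$ 
$$[F_a(x^0), F_b^{5k}(x^0, \mathbf{x})] = [F_a^5(x^0), F_b^k(x^0, \mathbf{x})] = i \sum_d c_{abd}\, F_d^{5k}(x^0, \mathbf{x})$$ [7] 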

But the commutator of the local time component 
with the local space component of the current 
cannot be merely the obvious extrapolation from 
eqns [4] and [7], that is, it cannot be 


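$$[F_a^0(x^0, \mathbf{x}), F_b^k(y^0, \mathbf{y})]_{x^0 = y^0} = i\, \delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, F_d^k(x^0, \mathbf{x})$$ 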


and so forth. Under very general conditions, for a 
relativistic theory based on local quantum fields or 
local observables, additional “Schwinger terms” are 
required on the right-hand sides of such commu- 
tators (Renner 1968). 

Well-known difficulties in specifying the Schwinger 
terms are associated with the fact that operator- 
valued distributions are singular when regarded as if 
they were functions of spacetime points. Thus, the 
product of two distributions at a point is often 
singular or undefined. When the currents forming a 
local current algebra are written as normal-ordered 
products of field operator distributions and their 
derivatives, the Schwinger terms in their commuta- 
tion relations may be calculated, for example, by 
“splitting points” in the arguments of the underlying 
fields, and subsequently letting the separation tend 
toward zero. The general form of a Schwinger term 
typically involves the derivative of a $\delta$-function times 
an operator. This may be a multiple of the identity 
(i.e., a c-number) or not, depending on the underlying 
field-theoretic model. Furthermore, when the number 
of spacetime dimensions is greater than 1 + 1, the 
c-number Schwinger terms turn out to be infinite. 
Hence, we do not obtain this way a bona fide 
infinite-dimensional, equal-time commutator algebra 
comprising all the components of the local currents. 


Sugawara, Kac-Moody, and 
Virasoro Algebras 


Since equations such as [4] and [6] are not explicitly 
dependent on how the currents are constructed from 
underlying canonical fields, one has the possibility 
of writing a theory entirely in terms of self-adjoint 
currents as the dynamical variables, bypassing the 
field operators entirely, and expressing a Hamilto- 
nian operator directly in terms of such local 
currents. This is in the spirit of approaches to 
quantum field theory based on local algebras of 
observables. It suggests consideration of relativistic 
current algebras with finite c-number or operator 
Schwinger terms in $s + 1$ dimensions, $s \geq 1$. 

The Sugawara model, which is of this type, turned 
out to be one of the most influential of those 
proposed in the late 1960s and early 1970s. 
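Henceforth, let $G$ be a compact Lie group, and $\mathcal{G}$ 
its Lie algebra; let $F_a$, $a = 1, \ldots, \dim \mathcal{G}$, be a basis for 
$\mathcal{G}$, with $[F_a, F_b] = i \sum_d c_{abd} F_d$. The Sugawara current 
algebra, at the fixed time $x^0 = y^0$ (which, from here 
on, we suppress in the notation) is given by 

$$[J_a^0(\mathbf{x}), J_b^0(\mathbf{y})] = i\, \delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, J_d^0(\mathbf{x})$$ 
$$[J_a^0(\mathbf{x}), J_b^k(\mathbf{y})] = i\, \delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, J_d^k(\mathbf{x}) + i c\, \delta_{ab}\, \frac{\partial}{\partial x^k} \delta^{(3)}(\mathbf{x} - \mathbf{y})\, I$$ 
$$[J_a^k(\mathbf{x}), J_b^l(\mathbf{y})] = 0 \qquad (k, l = 1, 2, 3)$$ [8] 

where $J_a^\mu = (J_a^0, J_a^k)$, $k = 1, 2, 3$, is again a 4-vector, $c$ is a 
finite constant, and $I$ is the identity operator. The time 
components in eqns [8] behave like the local currents in 
eqns [4]. The Schwinger term is a c-number, while 
setting the commutators of the space components to 
zero is the simplest choice consistent with the Jacobi 
identity. The Sugawara Hamiltonian is given in terms of 
the local currents by the formal expression: 

$$H = \frac{1}{2c} \sum_a \int d^3x \left[ J_a^0(\mathbf{x}) J_a^0(\mathbf{x}) + \sum_{k=1}^{3} J_a^k(\mathbf{x}) J_a^k(\mathbf{x}) \right]$$ [9] 

where the pointwise products of the currents require 
interpretation in the particular representation. This 
Hamiltonian leads to current conservation equations 
for the $J_a^\mu$. 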


Related to the Sugawara current algebra, with $s = 1$ 
and the spatial dimension compactified, are affine 
Kac-Moody and Virasoro algebras (Goddard and 
Olive 1986, Kac 1990). Consider the infinite-dimensional 
Lie algebra $\mathrm{map}(S^1, \mathcal{G})$ of smooth functions 
from the circle to $\mathcal{G}$ under the pointwise bracket. This 
is also called a loop algebra. Referring to the basis $F_a$, 
define $T_a^{(m)}$ for integer $m$ to be the Fourier function 
$\theta \to F_a \exp[-im\theta]$. The pointwise bracket in 
$\mathrm{map}(S^1, \mathcal{G})$ gives $[T_a^{(m)}, T_b^{(n)}] = i \sum_d c_{abd} T_d^{(m+n)}$ for these 
generators. The corresponding (untwisted) affine 
Kac-Moody algebra is a (uniquely defined, nontrivial) 
one-dimensional central extension of this loop 
algebra; that is, the new generator commutes with all 
elements of the Lie algebra and, in an irreducible 
representation, must be a multiple of the identity. 
In such a representation, the new bracket can be 
written as 

$$[T_a^{(m)}, T_b^{(n)}] = i \sum_d c_{abd}\, T_d^{(m+n)} + k\, m\, \delta_{ab}\, \delta_{m,-n}\, I$$ [10] 

where $k$ is a constant. Here, $T_a^{(m=0)}$ is again a 
representation of $\mathcal{G}$. Self-adjointness of the local 
currents in the representation imposes the condition 
$T_a^{(m)*} = T_a^{(-m)}$. 

Now the compactly supported $C^\infty$ (tangent) 
vector fields on a $C^\infty$ manifold $M$ form a natural 
Lie algebra under the Lie bracket, denoted by 
$\mathrm{vect}_0(M)$. In local Euclidean coordinates, for $\mathbf{g}_1, \mathbf{g}_2 \in 
\mathrm{vect}_0(M)$, one can write this bracket as 

$$[\mathbf{g}_1, \mathbf{g}_2] = \mathbf{g}_1 \cdot \nabla \mathbf{g}_2 - \mathbf{g}_2 \cdot \nabla \mathbf{g}_1$$ [11] 

As the affine Kac-Moody algebras are central 
extensions of the algebra of $\mathcal{G}$-valued functions on 
$S^1$, so are Virasoro algebras central extensions of the 
algebra of vector fields on $S^1$. Let $L^{(m)}$ denote 
the (complexified) vector field described by 
$\exp[-im\theta]\,(1/i)\,\partial/\partial\theta$, for integer $m$. These generators 
then satisfy $[L^{(m)}, L^{(n)}] = (m - n) L^{(m+n)}$. 
Adjoining to the Lie algebra of vector fields a 
new central element (commuting with all the 
$L^{(m)}$), the Virasoro bracket in an irreducible 
representation is given by the formula 

$$[L^{(m)}, L^{(n)}] = (m - n)\, L^{(m+n)} + \frac{c}{12}\, m(m+1)(m-1)\, \delta_{m,-n}\, I$$ [12] 

where the numerical coefficient $c$ is called the 
Virasoro central charge; self-adjointness of the 
currents imposes $L^{(m)*} = L^{(-m)}$. It is straightforward 
to verify that eqn [12] satisfies the Jacobi identity. 
The special form of the central term in the Virasoro 
current algebra results from the Gelfand-Fuks 
cohomology on the algebra of vector fields. 
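
The Jacobi identity for the Virasoro bracket can be verified mechanically. The sketch below assumes the standard central term $(c/12)\, m(m^2 - 1)\, \delta_{m,-n}$ and represents an algebra element as a pair (mode coefficients, central coefficient); the helper names `bracket`, `gen`, `add`, and `jacobi_defect` are illustrative only. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from itertools import product

def bracket(a, b, c):
    """Virasoro bracket of a and b, each a pair (modes, central), where
    modes maps m to the coefficient of L^(m). The central element commutes
    with everything, so only the mode parts contribute."""
    modes, central = {}, Fraction(0)
    for (m, am), (n, bn) in product(a[0].items(), b[0].items()):
        coeff = am * bn
        if m != n:
            modes[m + n] = modes.get(m + n, Fraction(0)) + (m - n) * coeff
        if m + n == 0:
            central += coeff * c * Fraction(m * (m * m - 1), 12)
    return modes, central

def gen(m):
    """The generator L^(m) with unit coefficient."""
    return {m: Fraction(1)}, Fraction(0)

def add(*elements):
    """Sum of algebra elements, dropping modes whose coefficient cancels."""
    modes, central = {}, Fraction(0)
    for e_modes, e_central in elements:
        for m, v in e_modes.items():
            modes[m] = modes.get(m, Fraction(0)) + v
        central += e_central
    return {m: v for m, v in modes.items() if v != 0}, central

def jacobi_defect(l, m, n, c):
    """[L_l,[L_m,L_n]] + [L_m,[L_n,L_l]] + [L_n,[L_l,L_m]]."""
    return add(
        bracket(gen(l), bracket(gen(m), gen(n), c), c),
        bracket(gen(m), bracket(gen(n), gen(l), c), c),
        bracket(gen(n), bracket(gen(l), gen(m), c), c),
    )

if __name__ == "__main__":
    c = Fraction(1, 2)
    indices = range(-4, 5)
    print(all(jacobi_defect(l, m, n, c) == ({}, 0)
              for l in indices for m in indices for n in indices))
```

The check covers only a finite window of mode indices, so it is a sanity test rather than a proof; the general statement follows from the cubic identity $(m-n)(l^3 - l) + (n-l)(m^3 - m) + (l-m)(n^3 - n) = 0$ whenever $l + m + n = 0$.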


The Kac-Moody and Virasoro algebras, both 
modeled on $S^1$, may be combined to form a natural 
semidirect sum of Lie algebras, with the additional 
bracket 

$$[T_a^{(m)}, L^{(n)}] = m\, T_a^{(m+n)}$$ [13] 

Roughly speaking, the Kac-Moody generators 
correspond to Fourier transforms of charge densities on 
$S^1$, whereas the Virasoro generators correspond to 
Fourier transforms of infinitesimal motions in $S^1$. 
The central extensions provide the finite, c-number 
Schwinger terms. These structures have important 
application to light cone current algebra, conformally 
invariant quantum field theories in (1 + 1)-dimensional 
spacetime, the quantum theory of 
strings, exactly solvable models in statistical 
mechanics, and many other domains. 

Of greatest physical importance, both in quantum 
field theory and statistical mechanics, are those 
irreducible, self-adjoint representations of the Virasoro 
algebra known as highest weight representations, 
where the spectrum of the operator $L^{(m=0)}$ is bounded 
below. In these applications, one represents a pair of 
Virasoro algebras by mutually commuting sets of 
operators $L^{(m)}$ and $\bar L^{(m)}$. In the quantum theory, for 
example, one takes the total energy $H \propto L^{(0)} + \bar L^{(0)}$, 
and the total momentum $P \propto L^{(0)} - \bar L^{(0)}$. In a highest 
weight representation, there is a unique eigenstate of 
$L^{(0)}$ having the lowest eigenvalue $h$; for this "vacuum" 
$|h\rangle$, $L^{(m)} |h\rangle = 0$, $m > 0$. 

Friedan, Qiu, and Shenker showed in 1984 that 
highest weight representations are characterized by a 
class of specific, non-negative values of the central 
charge $c$ and, correspondingly, of $h$: either $c \geq 1$ (and 
$h \geq 0$) or $c = 1 - 6(\ell + 2)^{-1}(\ell + 3)^{-1}$, $\ell = 1, 2, 3, \ldots$ 
(and $h$ assumes a corresponding, specified set of values 
for each value of $\ell$). In a beautiful application to the 
study of the critical behavior of well-known statistical 
systems, in which the generator of dilations is 
proportional to $L^{(0)} + \bar L^{(0)}$, they discovered a direct 
correspondence with permitted values of the central 
charge; thus, $c = 1/2$ for the Ising model, $c = 7/10$ for 
the tricritical Ising model, $c = 4/5$ for the three-state 
Potts model, and $c = 6/7$ for the tricritical three-state 
Potts model. 
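
The discrete series of central charges can be tabulated directly from the formula $c = 1 - 6/[(\ell+2)(\ell+3)]$. The short sketch below uses exact fractions; the function name `minimal_central_charge` is an illustrative choice.

```python
from fractions import Fraction

def minimal_central_charge(ell):
    """Central charge c = 1 - 6 / ((ell + 2)(ell + 3)) for ell = 1, 2, 3, ..."""
    if ell < 1:
        raise ValueError("ell must be a positive integer")
    return 1 - Fraction(6, (ell + 2) * (ell + 3))

if __name__ == "__main__":
    models = ["Ising", "tricritical Ising",
              "three-state Potts", "tricritical three-state Potts"]
    for ell, name in enumerate(models, start=1):
        print(ell, minimal_central_charge(ell), name)
```

The first four values reproduce the central charges quoted above for the four models, and the sequence increases toward the limiting value $c = 1$ as $\ell$ grows.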


Current Algebras and Groups 


Local current algebras may be exponentiated to 
obtain corresponding infinite-dimensional topologi- 
cal groups (Pressley and Segal 1986, Mickelsson 
1989, Kac 1990). Let $G$ be a Lie group whose Lie 
algebra is $\mathcal{G}$. The algebra $\mathrm{map}_0(M, \mathcal{G})$, consisting of 
smooth, compactly supported $\mathcal{G}$-valued functions on 
$M$ under the pointwise bracket, exponentiates to the 
local current group $\mathrm{Map}_0(M, G)$, consisting of 
smooth maps from $M$ to $G$ that are the identity 
outside a compact set in $M$, under the pointwise 
group operation. When $M$ is taken to be the 
four-dimensional spacetime manifold (rather than a 
spacelike hyperplane), the local current group 
modeled on $M$ is mathematically a gauge group for 
nonabelian gauge field theory. 

Likewise, the algebra $\mathrm{vect}_0(M)$ exponentiates to 
the group $\mathrm{Diff}_0(M)$ of compactly supported $C^\infty$ 
diffeomorphisms of $M$ (under composition). The 
Kac-Moody and Virasoro algebras exponentiate to 
central extensions of the loop group $\mathrm{Map}(S^1, G)$ and 
the diffeomorphism group $\mathrm{Diff}(S^1)$, respectively. The 
semidirect sums of the Lie algebras are the infinite- 
simal generators of semidirect products of the 
groups. 

Under appropriate technical conditions, self- 
adjoint representations of current algebras generate 
(and may be obtained from) continuous unitary 
representations of the corresponding groups. The 
needed technical conditions have to do with the 
existence of a dense set of analytic vectors belonging 
to a common, dense invariant domain of essential 
self-adjointness for the currents. 


Nonrelativistic Current Algebra 


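In nonrelativistic local current algebra, Schwinger 
terms do not appear. In 1968, Dashen and Sharp 
defined (at fixed time $t$, suppressed in the present 
notation) a mass density $\rho(\mathbf{x}) = m\, \psi^*(\mathbf{x}) \psi(\mathbf{x})$ and a 
momentum density $\mathbf{J}(\mathbf{x}) = (\hbar/2i)\{\psi^*(\mathbf{x}) \nabla\psi(\mathbf{x}) - 
[\nabla\psi^*(\mathbf{x})]\psi(\mathbf{x})\}$, where $\psi$ is a second-quantized 
canonical field; here we keep $\hbar$ in the notation. The 
resulting equal-time algebra is the semidirect sum: 

$$[\rho(\mathbf{x}), \rho(\mathbf{y})] = 0$$ 
$$[\rho(\mathbf{x}), J^k(\mathbf{y})] = -i\hbar\, \frac{\partial}{\partial x^k} \left[ \delta^{(3)}(\mathbf{x} - \mathbf{y})\, \rho(\mathbf{x}) \right]$$ 
$$[J^k(\mathbf{x}), J^l(\mathbf{y})] = i\hbar \left\{ \frac{\partial}{\partial y^k} \left[ \delta^{(3)}(\mathbf{x} - \mathbf{y})\, J^l(\mathbf{y}) \right] - \frac{\partial}{\partial x^l} \left[ \delta^{(3)}(\mathbf{x} - \mathbf{y})\, J^k(\mathbf{x}) \right] \right\}$$ [14] 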
Since this current algebra is independent of whether 
$\psi$ obeys commutation or anticommutation relations, 
the information as to particle statistics (Bose or 
Fermi) is not encoded in the Lie algebra itself but in 
the choice of its representation (up to unitary 
equivalence). Again interpreting $\rho$ and $\mathbf{J}$ as 
operator-valued distributions, define $\rho(f) = \int_{\mathbb{R}^3} d^3x\, f(\mathbf{x}) \rho(\mathbf{x})$ 
and $J(\mathbf{g}) = \int_{\mathbb{R}^3} d^3x \sum_{k=1}^{3} g^k(\mathbf{x}) J^k(\mathbf{x})$, where $f$ and the 
components $g^k$ of the vector field $\mathbf{g}$ belong to the 
function-space $\mathcal{D}$. Then the Lie algebra becomes 

$$[\rho(f_1), \rho(f_2)] = 0$$ 
$$[\rho(f), J(\mathbf{g})] = i\hbar\, \rho(\mathbf{g} \cdot \nabla f)$$ 
$$[J(\mathbf{g}_1), J(\mathbf{g}_2)] = -i\hbar\, J([\mathbf{g}_1, \mathbf{g}_2])$$ [15] 

Equations [15] are a representation by self-adjoint 
operators of the semidirect sum of the abelian Lie 
algebra D with $\mathrm{vect}_0(\mathbb{R}^3)$. The corresponding group 
is the natural semidirect product of the space D 
(regarded as an abelian topological group under 
addition) with $\mathrm{Diff}_0(\mathbb{R}^3)$. 

The construction generalizes to a general manifold 
M or manifold with boundary (in place of $\mathbb{R}^3$), and 
to a general set of charge densities that generate the 
local Lie algebra $\mathrm{map}_0(M, G)$. When $M = S^1$, we have 
the Kac-Moody and Virasoro algebras with central 
charge zero. However, $L_0$ in the nonrelativistic 
(1+1)-dimensional quantum theories is proportional 
to the total momentum P, and thus is 
unbounded above and below. 
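
The centerless Virasoro (Witt) relations mentioned above can be checked directly on vector fields. The sketch below is illustrative only: it assumes the standard complexified generators $L_n = -z^{n+1}\, d/dz$ on the circle, stores Laurent polynomials as plain dictionaries, and uses helper names (`bracket`, `L`, `witt_relation_holds`) that are not taken from the article. These fields are not compactly supported, so the example illustrates the bracket structure rather than $\mathrm{vect}_0$ itself.

```python
# Laurent polynomials in z are stored as {power: coefficient}.

def deriv(p):
    """d/dz of a Laurent polynomial."""
    return {k - 1: k * c for k, c in p.items() if k != 0}

def mul(p, q):
    """Product of two Laurent polynomials."""
    out = {}
    for i, a in p.items():
        for j, b in q.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return {k: c for k, c in out.items() if c != 0}

def sub(p, q):
    """Difference p - q."""
    out = dict(p)
    for k, c in q.items():
        out[k] = out.get(k, 0) - c
    return {k: c for k, c in out.items() if c != 0}

def scale(c, p):
    """Scalar multiple c * p."""
    return {k: c * v for k, v in p.items() if c * v != 0}

def bracket(g1, g2):
    """Lie bracket of the vector fields g1 d/dz and g2 d/dz: g1 g2' - g2 g1'."""
    return sub(mul(g1, deriv(g2)), mul(g2, deriv(g1)))

def L(n):
    """Assumed generator L_n = -z**(n+1) d/dz."""
    return {n + 1: -1}

def witt_relation_holds(m, n):
    """Check [L_m, L_n] = (m - n) L_{m+n}, i.e. central charge zero."""
    return bracket(L(m), L(n)) == scale(m - n, L(m + n))

if __name__ == "__main__":
    ok = all(witt_relation_holds(m, n)
             for m in range(-3, 4) for n in range(-3, 4))
    print("Witt relations hold:", ok)
```

No central term appears in the bracket of vector fields; the central extension only arises at the level of projective representations.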

The continuous unitary representations of 
$\mathrm{Diff}_0(M)$, or its semidirect product with a local 
current group at fixed time, thus describe nonrelativistic 
quantum systems (Albeverio et al. 1999, 
Goldin 2004). The unitary representation $V(\phi)$, $\phi \in 
\mathrm{Diff}_0(M)$, satisfies $V(\phi_r^g) = \exp[i(r/\hbar)J(g)]$, where 
$r \in \mathbb{R}$ and $\phi_r^g$ is the one-parameter flow in $\mathrm{Diff}_0(M)$ 
generated by the vector field g. Such a representation 
may be described very generally by means of a 
measure $\mu$ on a configuration space $\Delta$, quasi-invariant 
under a group action of $\mathrm{Diff}_0(M)$ on $\Delta$, together 
with a unitary 1-cocycle $\chi$ on $\mathrm{Diff}_0(M) \times \Delta$. The 
Hilbert space for the representation is 
$\mathcal{H} = L^2_\mu(\Delta, W)$, which is the space of measurable 
functions $\Psi(\gamma)$, $\gamma \in \Delta$, taking values in an inner 
product space W, and square integrable with respect 
to $\mu$. The unitary representation V is given by 


$$ [V(\phi)\Psi](\gamma) = \chi_\phi(\gamma)\, \Psi(\phi\gamma) \sqrt{\frac{d\mu_\phi}{d\mu}(\gamma)} \qquad [16] $$


where $\phi\gamma$ denotes the group action $\mathrm{Diff}_0(M) \times \Delta \to 
\Delta$; $\mu_\phi$ is the measure on $\Delta$ transformed by $\phi$ (which, 
by the quasi-invariance of $\mu$, is absolutely continuous 
with respect to $\mu$); $d\mu_\phi/d\mu$ is the Radon-Nikodym 
derivative; and $\chi_\phi(\gamma): W \to W$ is a system 
of unitary operators in W obeying the cocycle 
equation 

$$ \chi_{\phi_1\phi_2}(\gamma) = \chi_{\phi_1}(\gamma)\, \chi_{\phi_2}(\phi_1\gamma) \qquad [17] $$


Equations [16] and [17] hold outside sets of 
$\mu$-measure zero in $\Delta$. Given the quasi-invariant 
measure $\mu$ on $\Delta$, one may always choose $W = \mathbb{C}$ 
and $\chi_\phi(\gamma) \equiv 1$ to obtain a unitary group representation 
on complex-valued wave functions; but inequivalent 
cocycles describe unitarily inequivalent 
representations. 

The configuration space $\Delta^{(N)}$, N = 1, 2, 3, ..., 
consists of N-point subsets of $\mathbb{R}^s$, and $\mu^{(N)}$ is the 
(local) Lebesgue measure on $\Delta^{(N)}$. The corresponding 
diffeomorphism group and local current algebra 
representations describe N identical quantum particles 
in s-dimensional space. When $\chi \equiv 1$, we have 
bosonic exchange symmetry. Inequivalent cocycles 
on $\Delta^{(N)}$ are obtained (for $s \geq 2$) by inducing 
(generalizing Mackey’s method) from inequivalent 
unitary representations of the fundamental group 
$\pi_1[\Delta^{(N)}]$. For $s \geq 3$, this fundamental group is the 
symmetric group $S_N$ of particle permutations; the 
odd representation of $S_N$, $N \geq 2$, gives fermionic 
exchange symmetry, while the higher-dimensional 
representations are associated with particles satisfying 
the parastatistics of Greenberg and Messiah. 

When s = 2, however, $\pi_1[\Delta^{(N)}]$ is the braid group 
$B_N$. Goldin, Menikoff, and Sharp obtained induced 
representations of the current algebra describing the 
intermediate statistics proposed by Leinaas and 
Myrheim for identical particles in 2-space. Such 
excitations, subsequently termed “anyons” by Wilczek 
and characterized as charge-flux tube composites, 
are important constructs in the theory of 
surface phenomena such as the quantum Hall effect, 
and anyonic statistics has also been applied to the 
study of high-$T_c$ superconductivity. Current algebra 
representations induced by higher-dimensional 
representations of $B_N$ describe the statistics of 
“plektons.” Similarly, current algebra in nonsimply 
connected space describes the Aharonov-Bohm 
effect. 
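
The difference between the symmetric group and the braid group can be seen already in one-dimensional unitary representations. The following is a minimal sketch under a stated assumption: every braid generator is sent to the same phase $e^{i\theta}$ (as it must be in a one-dimensional representation, since the generators are conjugate to one another). The function names are illustrative and do not come from the article.

```python
import cmath
import math

def exchange_phase(theta, braid_word):
    """Phase assigned to a braid word by sigma_i -> exp(i*theta).

    braid_word is a list of +1 (counterclockwise exchange) and
    -1 (clockwise exchange) entries.
    """
    return cmath.exp(1j * theta * sum(braid_word))

def factors_through_symmetric_group(theta, tol=1e-12):
    """True when sigma_i**2 -> 1, so the phase also represents S_N."""
    return abs(exchange_phase(theta, [1, 1]) - 1) < tol

def statistics(theta):
    """Classify the exchange phase as bosonic, fermionic, or anyonic."""
    if not factors_through_symmetric_group(theta):
        return "anyon"
    if abs(exchange_phase(theta, [1]) - 1) < 1e-12:
        return "boson"
    return "fermion"

if __name__ == "__main__":
    for theta in (0.0, math.pi, math.pi / 3):
        print(theta, statistics(theta))
```

For an anyonic phase, a counterclockwise and a clockwise exchange give different results, which is exactly the information lost when the braid group is replaced by the symmetric group.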

Let $\psi^*(h) = \int_{\mathbb{R}^s} \psi^*(x) h(x)\, d^s x$ denote the smeared 
creation field. Let the indexed set of representations 
$\rho_N, J_N$, N = 0, 1, 2, ..., satisfying the current algebra 
[15], act in Hilbert spaces $\mathcal{H}_N$, where $\psi^*(h): \mathcal{H}_N \to 
\mathcal{H}_{N+1}$, $\psi(h): \mathcal{H}_{N+1} \to \mathcal{H}_N$, and $\psi(h)\mathcal{H}_0 = 0$, so that $\psi^*$ 
and $\psi$ intertwine the N-particle diffeomorphism 
group representations. Let $\rho(f)$ and $J(g)$ act on 
$\bigoplus_{N=0}^{\infty} \mathcal{H}_N$, so that $\rho(f)\Psi_N = \rho_N(f)\Psi_N$, $J(g)\Psi_N = 
J_N(g)\Psi_N$. Then conditions for a Fock space hierarchy 
are specified by commutator brackets of the 
fields with the currents: 


$$ [\rho(f), \psi^*(h)] = \psi^*(\rho_{N=1}(f)h) $$
$$ [J(g), \psi^*(h)] = \psi^*(J_{N=1}(g)h) \qquad [18] $$


The local creation and annihilation fields for anyons 
in $\mathbb{R}^2$, obeying [18], satisfy q-commutation relations, 
where q is the relative phase change associated with 
a single counterclockwise exchange of two anyons, 
and the q-commutator $[A,B]_q = AB - qBA$. These 
relations generalize the canonical commutation 
(q = 1) and anticommutation (q = -1) relations of 
quantum field theory. 
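
As a small illustration of how the q-commutator interpolates between the two familiar cases, the sketch below evaluates $AB - qBA$ on 2 x 2 matrices. The matrices `a` and `a_dag` are an assumed toy model (a single fermionic mode), chosen only because the result can be checked by hand; they are not the anyon fields of the text.

```python
def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def q_commutator(A, B, q):
    """[A, B]_q = AB - q BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    n = len(A)
    return [[AB[i][j] - q * BA[i][j] for j in range(n)] for i in range(n)]

# Toy model: annihilation and creation operators of one fermionic mode.
a = [[0, 1], [0, 0]]
a_dag = [[0, 0], [1, 0]]

if __name__ == "__main__":
    print("q = -1:", q_commutator(a, a_dag, -1))  # anticommutator
    print("q = +1:", q_commutator(a, a_dag, 1))   # commutator
```

With q = -1 the toy operators satisfy the canonical anticommutation relation, while q = 1 gives the ordinary commutator; a complex phase q of unit modulus can be passed in the same way.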

When $\Delta$ is the configuration space of infinite but 
locally finite subsets of $\mathbb{R}^s$, nonrelativistic current 
algebra describes the physics of infinite gases in 
continuum classical or quantum statistical 
mechanics. Here, the most important kinds of 
measures $\mu$ are Poisson measures (associated with 
gases of noninteracting particles at fixed average 
density) or Gibbsian measures (associated with 
translation-invariant two-body interactions). These 
measures describe equilibrium states and correlation 
functions in the classical case, and specify the 
current algebra representations in the quantum 
theory. 

The group of volume-preserving diffeomorphisms 
was taken by Arnold as the symmetry group of an 
ideal, classical, incompressible fluid, and Marsden 
and Weinstein described the hydrodynamics of such 
a fluid using the Lie-Poisson bracket associated with 
the nonrelativistic current algebra of divergenceless 
vector fields. The idea of using this algebra to study 
quantized fluid motion, included in the program 
proposed by Rasetti and Regge, formed the basis of 
the subsequent study of quantized vortex structures 
in superfluids from the point of view of geometric 
quantization on coadjoint orbits of the diffeomorphism 
group. This leads to quantum configuration 
spaces whose elements are no longer sets of points: 
for example, spaces of vortex filaments in $\mathbb{R}^2$, or 
ribbons and tubes in $\mathbb{R}^3$. 


See also: Algebraic Approach to Quantum Field Theory; 
Electroweak Theory; Quantum Chromodynamics; 
Solitons and Kac-Moody Lie Algebras; Symmetries in 
Quantum Field Theory: Algebraic Aspects; Toda Lattices; 
Two-Dimensional Conformal Field Theory and Vertex 
Operator Algebras. 
Introduction 


Deformation quantization is an alternative way 
of looking at quantum mechanics. Some of its 
techniques were introduced by the pioneers of 
quantum mechanics, but it was first proposed as 
an autonomous theory in a paper in Annals of 
Physics (Bayen et al. 1978). More recent reviews 
treat modern developments (HH I 2001, Dito and 
Sternheimer 2002, Zachos 2002). 

Deformation quantization concentrates on the cen- 
tral physical concepts of quantum theory: the algebra 
of observables and their dynamical evolution. Because 
it deals exclusively with functions of phase-space 
variables, its conceptual break with classical mechanics 
is less severe than in other approaches. It gives a 
precise formulation of the correspondence principle, which played 
such an important role in the historical development. 

Although this article deals mainly with nonrelati- 
vistic bosonic systems, deformation quantization is 
much more general. For inclusion of fermions and 
the Dirac equation see (Hirshfeld et al. 2002b). The 
fermionic degrees of freedom may, in special cases, be 
obtained from the bosonic ones by supersymmetric 
extension (Hirshfeld et al. 2004). For applications to 
field theory, see Hirshfeld et al. (2002). For the 
relation to Hopf algebras see Hirshfeld et al. (2003), 
and to geometric algebra, see Hirshfeld et al. (2005). 

The observables of a physical system, such as the 
Hamilton function, are smooth real-valued functions 
on phase space. Physical quantities of the system at 
some time, such as the energy, are calculated by 
evaluating the Hamilton function at the point 
xo = (qo, po) in phase space that characterizes the 
state of the system at this time (we assume for the 
moment, a one-particle system). The mathematical 
expression for this operation is 


$$ E = \int H(q,p)\, \delta^{(2)}(q - q_0, p - p_0)\, dq\, dp \qquad [1] $$


where $\delta^{(2)}$ is the two-dimensional Dirac delta 
function. The observables of the dynamical system 
are functions on the phase space, the states of the 
system are positive functionals on the observables 
(here the Dirac delta functions), and we obtain the 
value of the observable in a definite state by the 
operation shown in eqn [1]. 

In general, functions on a manifold are multiplied 
by each other in a pointwise manner, that is, given 
two functions f and g, their product fg is the 
function 


$$ (fg)(x) = f(x)\, g(x) \qquad [2] $$


In the context of classical mechanics, the observables 
form a commutative algebra, called the 
“classical algebra of observables.” 

In Hamiltonian mechanics there is another way to 
combine two functions on phase space in such a way 
that the result is again a function on the phase space, 
namely by using the Poisson bracket 


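$$ \{f,g\}(q,p) = \sum_{i=1}^{n} \left. \left( \frac{\partial f}{\partial q_i} \frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i} \frac{\partial g}{\partial q_i} \right) \right|_{q,p} = f \left( \overleftarrow{\partial}_q \overrightarrow{\partial}_p - \overleftarrow{\partial}_p \overrightarrow{\partial}_q \right) g \qquad [3] $$


in an abbreviated notation. 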

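The notation can be further abbreviated by using x 
to represent points of the phase-space manifold, 
$x = (x_1, \ldots, x_{2n})$, and introducing the Poisson tensor 
$\alpha^{ij}$, where the indices i, j run from 1 to 2n. In 
canonical coordinates, ordered as $x = (q_1, \ldots, q_n, p_1, \ldots, p_n)$, 
$\alpha^{ij}$ is represented by the matrix 

$$ \alpha = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix} \qquad [4] $$

where $I_n$ is the $n \times n$ identity matrix. Then eqn [3] 
becomes 

$$ \{f, g\}(x) = \alpha^{ij}\, \partial_i f(x)\, \partial_j g(x) \qquad [5] $$

where $\partial_i = \partial/\partial x_i$. 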


For a general observable, 

$$ \dot{f} = \{f, H\} \qquad [6] $$


2 Deformation Quantization 


Because $\alpha^{ij}$ transforms like a tensor with respect 
to coordinate transformations, eqn [5] may also be 
written in noncanonical coordinates. In this case 
the components of $\alpha^{ij}$ need not be constants, and 
may depend on the point of the manifold at which 
they are evaluated. But in Hamiltonian mechanics, 
$\alpha^{ij}$ is still required to be invertible. A manifold 
equipped with a Poisson tensor of this kind is 
called a symplectic manifold. In general, the tensor 
$\alpha^{ij}$ is no longer required to be invertible, but it 
nevertheless suffices to define Poisson brackets via 
eqn [5], and these brackets are required to have 
the properties 


1. $\{f, g\} = -\{g, f\}$, 
2. $\{f, gh\} = \{f, g\}h + g\{f, h\}$, and 
3. $\{f, \{g, h\}\} + \{g, \{h, f\}\} + \{h, \{f, g\}\} = 0$. 

Property (1) implies that the Poisson bracket is 
antisymmetric, property (2) is referred to as the Leibniz 
rule, and property (3) is called the Jacobi identity. The 
Poisson bracket used in Hamiltonian mechanics satisfies 
all these properties, but we now abstract these 
properties from the concrete prescription of eqn [3], and 
a Poisson manifold $(M, \alpha)$ is defined as a smooth 
manifold M equipped with a Poisson tensor $\alpha$, whose 
components are no longer necessarily constant, such 
that the bracket defined by eqn [5] has the above 
properties. It turns out that such manifolds provide a 
better context for treating dynamical systems with 
symmetries. In fact, they are essential for treating gauge- 
field theories, which govern the fundamental interac- 
tions of elementary particles. 
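
The three bracket properties can be verified mechanically for the canonical bracket of eqn [3] on polynomial observables. The sketch below assumes one degree of freedom and stores a polynomial in (q, p) as a dictionary mapping exponent pairs to coefficients; the representation and the helper names are choices made for this illustration, not part of the article.

```python
from fractions import Fraction

# A polynomial in (q, p) is stored as {(i, j): c}, meaning c * q**i * p**j.

def add(f, g, c=1):
    """Return f + c*g."""
    out = dict(f)
    for k, v in g.items():
        out[k] = out.get(k, 0) + c * v
    return {k: v for k, v in out.items() if v != 0}

def mul(f, g):
    """Pointwise product fg."""
    out = {}
    for (i, j), a in f.items():
        for (k, l), b in g.items():
            key = (i + k, j + l)
            out[key] = out.get(key, 0) + a * b
    return {k: v for k, v in out.items() if v != 0}

def dq(f):
    """Partial derivative with respect to q."""
    return {(i - 1, j): i * c for (i, j), c in f.items() if i > 0}

def dp(f):
    """Partial derivative with respect to p."""
    return {(i, j - 1): j * c for (i, j), c in f.items() if j > 0}

def poisson(f, g):
    """Canonical Poisson bracket {f, g} = f_q g_p - f_p g_q."""
    return add(mul(dq(f), dp(g)), mul(dp(f), dq(g)), -1)

q = {(1, 0): 1}
p = {(0, 1): 1}
# Oscillator Hamilton function with m = omega = 1 (an assumed example).
H = {(2, 0): Fraction(1, 2), (0, 2): Fraction(1, 2)}

if __name__ == "__main__":
    print("{q, p} =", poisson(q, p))
    print("dq/dt = {q, H} =", poisson(q, H))
    print("dp/dt = {p, H} =", poisson(p, H))
```

For the assumed oscillator Hamilton function, the brackets with H reproduce Hamilton's equations in the form of eqn [6].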


Quantum Mechanics and Star Products 


The essential difference between classical and 
quantum mechanics is Heisenberg’s uncertainty 
relation, which implies that in the latter, states can 
no longer be represented as points in phase space. 
The uncertainty is a consequence of the noncommu- 
tativity of the quantum mechanical observables. 
That is, the commutative classical algebra of 
observables must be replaced by a noncommutative 
quantum algebra of observables. 

In the conventional approach to quantum 
mechanics, this noncommutativity is implemented 
by representing the quantum mechanical observables 
by linear operators in Hilbert space. Physical 
quantities are then represented by eigenvalues of 
these operators, and physical states are related to the 
operator eigenfunctions. Although these entities are 
somehow related to their classical counterparts, to 
which they are supposed to reduce in an appropriate 
limit, the precise relationship has remained obscure, 
one hundred years after the beginnings of quantum 
mechanics. Textbooks refer to the correspondence 
principle, which guided the pioneers of the subject. 
Attempts to give this idea a precise formulation by 
postulating a specific relation between the classical 
Poisson brackets of observables and the commu- 
tators of the corresponding quantum mechanical 
operators, as undertaken, for example, by Dirac and 
von Neumann, encountered insurmountable diffi- 
culties, as pointed out by Groenewold in 1946 in an 
unjustly neglected paper (Groenewold 1948). In the 
same paper Groenewold also wrote down the first 
explicit representation of a “star product” (see eqn 
[11]), without however realizing the potential of this 
concept for overcoming the difficulties that he 
wanted to resolve. 

In the deformation quantization approach, there 
is no such break when going from the classical 
system to the corresponding quantum system; we 
describe the quantum system by using the same 
entities that are used to describe the classical 
system. The observables of the system are described 
by the same functions on phase space as their 
classical counterparts. Uncertainty is realized by 
describing physical states as distributions on phase 
space that are not sharply localized, in contrast to 
the Dirac delta functions which occur in the 
classical case. When we evaluate an observable in 
some definite state according to the quantum 
analog of eqn [1] (see eqn [24]), values of the 
observable in a whole region contribute to the 
number that is obtained, which is thus an average 
value of the observable in the given state. Non- 
commutativity is incorporated by introducing a 
noncommutative product for functions on phase 
space, so that we get a new noncommutative 
quantum algebra of observables. The systematic 
work on deformation quantization stems from 
Gerstenhaber’s seminal paper, where he introduced 
the concept of a star product of smooth functions 
on a manifold (Gerstenhaber 1964). 

For applications to quantum mechanics, we 
consider smooth complex-valued functions on a 
Poisson manifold. A star product $f \star g$ of two such 
functions is a new smooth function, which, in 
general, is described by an infinite power series: 


$$ f \star g = fg + (i\hbar)\, C_1(f,g) + O(\hbar^2) = \sum_{n=0}^{\infty} (i\hbar)^n\, C_n(f,g) \qquad [7] $$


The first term in the series is the pointwise product 
given in eqn [2], and $(i\hbar)$ is the deformation 
parameter, which is assumed to be varying continuously. 
If $\hbar$ is identified with Planck’s constant, 
then what varies is really the magnitude of the 
action of the dynamical system considered in units 
of $\hbar$: the classical limit holds for systems with large 
action. In this limit, which we express here as $\hbar \to 0$, 
the star product reduces to the usual product. In 
general, the coefficients $C_n$ will be such that the new 
product is noncommutative, and we consider the 
noncommutative algebra formed from the functions 
with this new multiplication law as a deformation of 
the original commutative algebra, which uses pointwise 
multiplication of the functions. 

The expressions $C_n(f, g)$ denote functions made 
up of the derivatives of the functions f and g. It is 
obvious that without further restrictions of these 
coefficients, the star product is too arbitrary to be of 
any use. Gerstenhaber’s discovery was that the 
simple requirement that the new product be associative 
imposes such strong requirements on the 
coefficients $C_n$ that they are essentially unique in 
the most important cases (up to an equivalence 
relation, as discussed below). Formally, Gerstenhaber 
required that the coefficients satisfy the following 
properties: 


1. $\sum_{j+k=n} C_j(C_k(f, g), h) = \sum_{j+k=n} C_j(f, C_k(g, h))$, 
2. $C_0(f, g) = fg$, and 
3. $C_1(f, g) - C_1(g, f) = \{f, g\}$. 


Property (1) guarantees that the star product is 
associative: $(f \star g) \star h = f \star (g \star h)$. Property (2) means 
that in the limit $\hbar \to 0$, the star product $f \star g$ agrees 
with the pointwise product fg. Property (3) has at least 
two aspects: (i) mathematically, it anchors the new 
product to the given structure of the Poisson manifold 
and (ii) physically, it provides the connection between 
the classical and quantum behavior of the dynamical 
system. Define a commutator by using the new 
product: 


$$ [f, g]_\star = f \star g - g \star f \qquad [8] $$


Property (3) may then be written as 


$$ \lim_{\hbar \to 0} \frac{1}{i\hbar} [f, g]_\star = \{f, g\} \qquad [9] $$

Equation [9] is the correct form of the correspon- 
dence principle. In general, the quantity on the left- 
hand side of eqn [9] reduces to the Poisson bracket 
only in the classical limit. The source of the 
mathematical difficulties that previous attempts to 
formulate the correspondence principle encoun- 
tered was related to trying to enforce equality 
between the Poisson bracket and the corresponding 
expression involving the quantum mechanical com- 
mutator. Equation [9] shows that such a relation in 
general only holds up to corrections of higher order 
in $\hbar$. 

For physical applications we usually require the 
star product to be Hermitian: $\overline{f \star g} = \bar{g} \star \bar{f}$, where $\bar{f}$ 
denotes the complex conjugate of f. The star 
products considered in this article have this 
property. 

For a given Poisson manifold, it is not clear a 
priori if a star product for the smooth functions on 
the manifold actually exists, that is, whether it is at 
all possible to find coefficients $C_n$ that satisfy the 
above list of properties. Even if we find such 
coefficients, it is still not clear that the series they 
define through eqn [7] yields a smooth function. 
Mathematicians have worked hard to answer these 
questions in the general case. For flat Euclidean 
spaces, $M = \mathbb{R}^{2n}$, a specific star product has long 
been known. In this case, the components of the 
Poisson tensor $\alpha^{ij}$ can be taken to be constants. The 
coefficient $C_1$ can then be chosen antisymmetric, 
so that 


$$ C_1(f,g) = \tfrac{1}{2}\, \alpha^{ij} (\partial_i f)(\partial_j g) = \tfrac{1}{2} \{f, g\} \qquad [10] $$


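by property (3) above. The higher-order coefficients 
may be obtained by exponentiation of $C_1$. This 
procedure yields the Moyal star product (Moyal 1949): 


$$ f \star_M g = f \exp\left( \frac{i\hbar}{2}\, \alpha^{ij}\, \overleftarrow{\partial}_i \overrightarrow{\partial}_j \right) g \qquad [11] $$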


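In canonical coordinates, eqn [11] becomes 

$$ (f \star_M g)(q,p) = f(q,p) \exp\left[ \frac{i\hbar}{2} \left( \overleftarrow{\partial}_q \overrightarrow{\partial}_p - \overleftarrow{\partial}_p \overrightarrow{\partial}_q \right) \right] g(q,p) \qquad [12] $$

$$ = \sum_{m,n=0}^{\infty} \left( \frac{i\hbar}{2} \right)^{m+n} \frac{(-1)^m}{m!\, n!}\, (\partial_p^m \partial_q^n f)(\partial_p^n \partial_q^m g) \qquad [13] $$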
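
For polynomial observables the Moyal series terminates, so the product can be evaluated exactly term by term. The sketch below is an illustration under stated assumptions: one degree of freedom, a numerical value for $\hbar$ (default 1), and the double-sum form $\sum_{m,n} (i\hbar/2)^{m+n} \frac{(-1)^m}{m!\,n!} (\partial_p^m \partial_q^n f)(\partial_p^n \partial_q^m g)$ of the canonical-coordinate expression. The dictionary representation and all function names are hypothetical helpers.

```python
from math import factorial

# A polynomial in (q, p) is stored as {(i, j): c}, meaning c * q**i * p**j.

def add(f, g, c=1):
    """Return f + c*g."""
    out = dict(f)
    for k, v in g.items():
        out[k] = out.get(k, 0) + c * v
    return {k: v for k, v in out.items() if v != 0}

def mul(f, g):
    """Pointwise product fg."""
    out = {}
    for (i, j), a in f.items():
        for (k, l), b in g.items():
            key = (i + k, j + l)
            out[key] = out.get(key, 0) + a * b
    return {k: v for k, v in out.items() if v != 0}

def d(f, var, order=1):
    """order-th partial derivative; var = 0 for q, var = 1 for p."""
    for _ in range(order):
        g = {}
        for key, c in f.items():
            if key[var] > 0:
                new = (key[0] - 1, key[1]) if var == 0 else (key[0], key[1] - 1)
                g[new] = g.get(new, 0) + key[var] * c
        f = g
    return f

def clean(f, tol=1e-12):
    """Drop coefficients that vanish up to rounding."""
    return {k: v for k, v in f.items() if abs(v) > tol}

def close(f, g, tol=1e-9):
    """True when two polynomials agree up to rounding."""
    return not clean(add(f, g, -1), tol)

def moyal(f, g, hbar=1.0):
    """Moyal product of two polynomials via the terminating double sum."""
    if not f or not g:
        return {}
    # Terms with m + n above the smaller total degree vanish identically.
    top = min(max(i + j for i, j in f), max(i + j for i, j in g))
    out = {}
    for m in range(top + 1):
        for n in range(top + 1 - m):
            coeff = (0.5j * hbar) ** (m + n) * (-1) ** m / (factorial(m) * factorial(n))
            term = mul(d(d(f, 1, m), 0, n), d(d(g, 0, m), 1, n))
            out = add(out, term, coeff)
    return clean(out)

q = {(1, 0): 1}
p = {(0, 1): 1}

if __name__ == "__main__":
    print("q * p =", moyal(q, p))
    print("[q, p]_star =", add(moyal(q, p), moyal(p, q), -1))
```

The star commutator of q and p comes out as $i\hbar$, in line with the correspondence principle of eqn [9], and associativity can be spot-checked on a few polynomials in the same way.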


We now come to the question of uniqueness of the 
star product on a given Poisson manifold. Two star 
products $\star$ and $\star'$ are said to be “c-equivalent” if 
there exists an invertible transition operator 


$$ T = 1 + \hbar T_1 + \cdots = \sum_{n=0}^{\infty} \hbar^n T_n \qquad [14] $$

where the $T_n$ are differential operators that satisfy 

$$ f \star' g = T^{-1}\big( (Tf) \star (Tg) \big) \qquad [15] $$


It is known that for $M = \mathbb{R}^{2n}$ all admissible star 
products are c-equivalent to the Moyal product. The 
concept of c-equivalence is a mathematical one 
(c stands for cohomology (Gerstenhaber 1964)); it 
does not by itself imply any kind of physical 
equivalence, as shown below. 




Another expression for the Moyal product is a 
kind of Fourier representation: 


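$$ (f \star_M g)(q,p) = \frac{1}{\hbar^2 \pi^2} \int dq_1\, dq_2\, dp_1\, dp_2\, f(q_1,p_1)\, g(q_2,p_2) \exp\left[ \frac{2}{i\hbar} \big( p(q_1 - q_2) + q(p_2 - p_1) + (q_2 p_1 - q_1 p_2) \big) \right] \qquad [16] $$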


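Equation [16] has an interesting geometrical interpretation. 
Denote points in phase space by vectors, 
for example, in two dimensions: 


$$ r = \begin{pmatrix} q \\ p \end{pmatrix}, \quad r_1 = \begin{pmatrix} q_1 \\ p_1 \end{pmatrix}, \quad r_2 = \begin{pmatrix} q_2 \\ p_2 \end{pmatrix} \qquad [17] $$


Now, consider the triangle in phase space spanned 
by the vectors $r - r_1$ and $r - r_2$. Its area (symplectic 
volume) is 


$$ A(r, r_1, r_2) = \tfrac{1}{2} (r - r_1) \wedge (r - r_2) = \tfrac{1}{2} \left[ p(q_2 - q_1) + q(p_1 - p_2) + (q_1 p_2 - q_2 p_1) \right] \qquad [18] $$


which is proportional to the exponent in eqn [16]. 
Hence, we may rewrite eqn [16] as 


$$ (f \star g)(r) = \frac{1}{\hbar^2 \pi^2} \int dr_1\, dr_2\, f(r_1)\, g(r_2) \exp\left[ \frac{4i}{\hbar} A(r, r_1, r_2) \right], \quad dr_i = dq_i\, dp_i \qquad [19] $$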
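
The relation between the triangle area and the exponent of the integral representation is a short piece of algebra that is easy to check numerically. The sketch below compares the expanded area formula $\tfrac{1}{2}[p(q_2 - q_1) + q(p_1 - p_2) + (q_1 p_2 - q_2 p_1)]$ with the wedge product of the two edge vectors, and with the bracket $p(q_1 - q_2) + q(p_2 - p_1) + (q_2 p_1 - q_1 p_2)$ appearing in the exponent. The function names are illustrative.

```python
def area_expanded(r, r1, r2):
    """Signed triangle area in the expanded form of eqn [18]."""
    (q, p), (q1, p1), (q2, p2) = r, r1, r2
    return 0.5 * (p * (q2 - q1) + q * (p1 - p2) + (q1 * p2 - q2 * p1))

def area_wedge(r, r1, r2):
    """Signed area as half the wedge product (r - r1) ^ (r - r2)."""
    (q, p), (q1, p1), (q2, p2) = r, r1, r2
    return 0.5 * ((q - q1) * (p - p2) - (p - p1) * (q - q2))

def exponent_bracket(r, r1, r2):
    """Bracket multiplying 2/(i*hbar) in the Fourier form of the product."""
    (q, p), (q1, p1), (q2, p2) = r, r1, r2
    return p * (q1 - q2) + q * (p2 - p1) + (q2 * p1 - q1 * p2)

if __name__ == "__main__":
    pts = ((1, 2), (4, 6), (-3, 5))
    print(area_expanded(*pts), area_wedge(*pts), exponent_bracket(*pts))
```

Since the bracket equals minus twice the area, the factor $2/(i\hbar)$ multiplying it becomes $4i/\hbar$ multiplying the area.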
The properties of the star product are well adapted 
for describing the noncommutative quantum algebra 
of observables. We have already discussed the 
associativity and the incorporation of the classical 
and semiclassical limits. Note that the characteristic 
nonlocality feature of quantum mechanics is also 
explicit. In the expression for the Moyal product 
given in eqn [13], the star product of the functions f 
and g at the point x =(q,p) involves not only the 
values of the functions f and g at this point, but also 
all higher derivatives of these functions at x. But for 
a smooth function, knowledge of all the derivatives 
at a given point is equivalent to the knowledge of 
the function on the entire space. In the integral 
expression of eqn [16], we also see that knowledge 
of the functions f and g on the whole phase space is 
necessary to determine the value of the star product 
at the point x. 


The c-equivalent star products correspond to differ- 
ent quantization schemes. Having chosen a quantiza- 
tion scheme, the quantities of interest for the quantum 
system may be calculated. It turns out that different 
quantization schemes lead to different spectra for the 
observables. The choice of a specific quantization 
scheme can only be motivated by further physical 
requirements. In the simple example we discuss below, 
the classical system is completely specified by its 
Hamilton function. In more general cases, one may 
have to decide what constitutes a sufficiently large set 
of good observables for a complete specification of the 
system (Bayen et al. 1978). 

A state is characterized by its energy E; the set 
of all possible values for the energy is called the 
spectrum of the system. The states are described 
by distributions on phase space called projectors. 
The state corresponding to the energy E is 
denoted by $\pi_E(q,p)$. These distributions are 
normalized: 


$$ \frac{1}{2\pi\hbar} \int \pi_E(q,p)\, dq\, dp = 1 \qquad [20] $$

and idempotent: 


$$ (\pi_E \star \pi_{E'})(q,p) = \delta_{E,E'}\, \pi_E(q,p) \qquad [21] $$


The fact that the Hamilton function takes the value 
E when the system is in the state corresponding to 
this energy is expressed by the equation 


$$ (H \star \pi_E)(q,p) = E\, \pi_E(q,p) \qquad [22] $$


Equation [22] corresponds to the time-independent 
Schrödinger equation, and is sometimes called the 
“$\star$-genvalue equation.” The spectral decomposition 
of the Hamilton function is given by 


$$ H(q,p) = \sum_E E\, \pi_E(q,p) \qquad [23] $$


where the summation sign may indicate an integration 
if the spectrum is continuous. The quantum 
mechanical version of eqn [1] is 


$$ E = \frac{1}{2\pi\hbar} \int (H \star \pi_E)(q,p)\, dq\, dp = \frac{1}{2\pi\hbar} \int H(q,p)\, \pi_E(q,p)\, dq\, dp \qquad [24] $$


where the last expression may be obtained by using 
eqn [16] for the star product. 

The time-evolution function for a time-indepen- 
dent Hamilton function is denoted by Exp(Ht), and 
the fact that the Hamilton function is the generator 
of the time evolution of the system is expressed by 


$$ i\hbar \frac{d}{dt} \mathrm{Exp}(Ht) = H \star \mathrm{Exp}(Ht) \qquad [25] $$


This equation corresponds to the time-dependent 
Schrödinger equation. It is solved by the star 
exponential: 


$$ \mathrm{Exp}(Ht) = \sum_{n=0}^{\infty} \frac{1}{n!} \left( \frac{-it}{\hbar} \right)^n (H\star)^n \qquad [26] $$


where $(H\star)^n = H \star H \star \cdots \star H$ (n times). Because each state 
of definite energy E has a time evolution $\exp(-iEt/\hbar)$, 
the complete time-evolution function may be written 
in the form 


$$ \mathrm{Exp}(Ht) = \sum_E \pi_E\, e^{-iEt/\hbar} \qquad [27] $$


This expression is called the “Fourier-Dirichlet 
expansion” for the time-evolution function. 

Questions concerning the existence and uniqueness 
of the star exponential as a $C^\infty$ function and the 
nature of the spectrum and the projectors again 
require careful mathematical analysis. The problem 
of finding general conditions on the Hamilton 
function H which ensure a reasonable physical 
spectrum is analogous to the problem of showing, 
in the conventional approach, that the symmetric 
operator H is self-adjoint and finding its spectral 
projections. 


The Simple Harmonic Oscillator 


As an example of the above procedure, we treat the 
simple one-dimensional harmonic oscillator charac- 
terized by the classical Hamilton function 


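$$ H(q,p) = \frac{p^2}{2m} + \frac{m\omega^2}{2} q^2 \qquad [28] $$


In terms of the holomorphic variables 

$$ a = \sqrt{\frac{m\omega}{2}} \left( q + i \frac{p}{m\omega} \right), \quad \bar{a} = \sqrt{\frac{m\omega}{2}} \left( q - i \frac{p}{m\omega} \right) \qquad [29] $$

the Hamilton function becomes 

$$ H = \omega a \bar{a} \qquad [30] $$


Our aim is to calculate the time-evolution function. 
We first choose a quantization scheme characterized 
by the normal star product 


$$ f \star_N g = f\, e^{\hbar \overleftarrow{\partial}_a \overrightarrow{\partial}_{\bar{a}}}\, g \qquad [31] $$

we then have 


$$ \bar{a} \star_N a = a\bar{a}, \quad a \star_N \bar{a} = a\bar{a} + \hbar \qquad [32] $$

so that 

$$ [a, \bar{a}]_{\star_N} = \hbar \qquad [33] $$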


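Equation [25] for this case is 


$$ i\hbar \frac{d}{dt} \mathrm{Exp}_N(Ht) = (H + \hbar\omega\, \bar{a}\, \partial_{\bar{a}})\, \mathrm{Exp}_N(Ht) \qquad [34] $$


with the solution 


$$ \mathrm{Exp}_N(Ht) = e^{-a\bar{a}/\hbar} \exp\left( e^{-i\omega t} a\bar{a}/\hbar \right) \qquad [35] $$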


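By expanding the last exponential in eqn [35], we 
obtain the Fourier-Dirichlet expansion 


$$ \mathrm{Exp}_N(Ht) = e^{-a\bar{a}/\hbar} \sum_{n=0}^{\infty} \frac{1}{\hbar^n n!}\, \bar{a}^n a^n e^{-in\omega t} \qquad [36] $$


From here, we can read off the energy eigenvalues 
and the projectors describing the states by comparing 
coefficients in eqns [27] and [36]: 


$$ \pi_0^{(N)} = e^{-a\bar{a}/\hbar} \qquad [37] $$

$$ \pi_n^{(N)} = \frac{1}{\hbar^n n!}\, \pi_0^{(N)}\, \bar{a}^n a^n = \frac{1}{\hbar^n n!}\, \bar{a}^n \star_N \pi_0^{(N)} \star_N a^n \qquad [38] $$

$$ E_n = n\hbar\omega \qquad [39] $$


Note that the spectrum obtained in eqn [39] does 
not include the zero-point energy. The projector 
onto the ground state $\pi_0^{(N)}$ satisfies 


$$ a \star_N \pi_0^{(N)} = 0 \qquad [40] $$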


The spectral decomposition of the Hamilton function 
(eqn [23]) is in this case 


$$ H = \sum_{n=0}^{\infty} n\hbar\omega \left( \frac{1}{\hbar^n n!}\, \bar{a}^n a^n e^{-a\bar{a}/\hbar} \right) = \omega a \bar{a} \qquad [41] $$

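
The normal-ordered results can be checked numerically, because the projectors depend only on the combination $x = a\bar{a}/\hbar$ and reduce to Poisson weights $x^n e^{-x}/n!$. The sketch below is a plausibility check under that assumption: it confirms that the weights sum to one, that the weighted sum of the energies $n\hbar\omega$ reproduces $H = \omega a \bar{a} = \hbar\omega x$, and that the Fourier-Dirichlet series agrees with the closed form $e^{-x} \exp(e^{-i\omega t} x)$. The truncation at 80 terms and the function names are choices made for this illustration.

```python
import cmath
import math

def pi_normal(n, x):
    """Normal-ordered projector as a function of x = a*abar/hbar."""
    return x ** n / math.factorial(n) * math.exp(-x)

def spectral_sum(x, hbar=1.0, omega=1.0, terms=80):
    """Truncated sum over n of E_n * pi_n with E_n = n*hbar*omega."""
    return sum(n * hbar * omega * pi_normal(n, x) for n in range(terms))

def star_exponential_series(x, t, omega=1.0, terms=80):
    """Truncated Fourier-Dirichlet sum of pi_n * exp(-i*n*omega*t)."""
    return sum(pi_normal(n, x) * cmath.exp(-1j * n * omega * t)
               for n in range(terms))

def star_exponential_closed(x, t, omega=1.0):
    """Closed form exp(-x) * exp(exp(-i*omega*t) * x)."""
    return math.exp(-x) * cmath.exp(cmath.exp(-1j * omega * t) * x)

if __name__ == "__main__":
    x, t = 1.7, 0.9
    print("sum of projectors:", sum(pi_normal(n, x) for n in range(80)))
    print("spectral sum vs hbar*omega*x:", spectral_sum(x), x)
    print("series vs closed form:",
          star_exponential_series(x, t), star_exponential_closed(x, t))
```

The truncation is adequate only for moderate x; for large x more terms would be needed before the Poisson weights become negligible.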
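
We now consider the Moyal quantization scheme. 
If we write eqn [12] in terms of holomorphic 
coordinates, we obtain 


$$ f \star_M g = f \exp\left[ \frac{\hbar}{2} \left( \overleftarrow{\partial}_a \overrightarrow{\partial}_{\bar{a}} - \overleftarrow{\partial}_{\bar{a}} \overrightarrow{\partial}_a \right) \right] g \qquad [42] $$


Here, we have 


$$ [a, \bar{a}]_{\star_M} = \hbar \qquad [44] $$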


The value of the commutator of two phase-space 
variables is fixed by property (3) of the star product, 
and cannot change when one goes to a c-equivalent 
star product. The Moyal star product is c-equivalent to 
the normal star product with the transition operator 


$$ T = e^{-(\hbar/2)\, \partial_a \partial_{\bar{a}}} \qquad [45] $$


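We can use this operator to transform the normal 
product version of the $\star$-genvalue equation, eqn [22], 
into the corresponding Moyal product version 
according to eqn [15]. The result is 


$$ H \star_M \pi_n^{(M)} = \omega \left( \bar{a} \star_M a + \frac{\hbar}{2} \right) \star_M \pi_n^{(M)} = \hbar\omega \left( n + \tfrac{1}{2} \right) \pi_n^{(M)} \qquad [46] $$

with 

$$ \pi_0^{(M)} = T \pi_0^{(N)} = 2 e^{-2a\bar{a}/\hbar} \qquad [47] $$

$$ \pi_n^{(M)} = T \pi_n^{(N)} = \frac{1}{\hbar^n n!}\, \bar{a}^n \star_M \pi_0^{(M)} \star_M a^n \qquad [48] $$

The projector onto the ground state $\pi_0^{(M)}$ satisfies 

$$ a \star_M \pi_0^{(M)} = 0 \qquad [49] $$

We now have, for the spectrum, 

$$ E_n = \left( n + \tfrac{1}{2} \right) \hbar\omega \qquad [50] $$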


which is the textbook result. We conclude that for 
this problem, the Moyal quantization scheme is the 
correct one. 

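The use of the Moyal product in eqn [25] for the 
star exponential of the harmonic oscillator leads to 
the following differential equation for the time 
evolution function: 


$$ i\hbar \frac{d}{dt} \mathrm{Exp}_M(Ht) = \left( H - \frac{(\hbar\omega)^2}{4}\, \partial_H - \frac{(\hbar\omega)^2}{4}\, H\, \partial_H^2 \right) \mathrm{Exp}_M(Ht) \qquad [51] $$


The solution is 


$$ \mathrm{Exp}_M(Ht) = \frac{1}{\cos(\omega t/2)} \exp\left[ \frac{2H}{i\hbar\omega} \tan\left( \frac{\omega t}{2} \right) \right] \qquad [52] $$


This expression can be brought into the form of the 
Fourier-Dirichlet expansion of eqn [27] by using 
the generating function for the Laguerre 
polynomials: 


$$ \frac{1}{1+s} \exp\left[ \frac{zs}{1+s} \right] = \sum_{n=0}^{\infty} s^n (-1)^n L_n(z) \qquad [53] $$


with $s = e^{-i\omega t}$. The projectors then become 


$$ \pi_n^{(M)} = 2(-1)^n e^{-2H/\hbar\omega} L_n\left( \frac{4H}{\hbar\omega} \right) \qquad [54] $$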


which is equivalent to the expression already found 
in eqn [48]. 
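
The step from the closed-form star exponential to the Laguerre projectors can be tested numerically. The sketch below assumes the closed form $\mathrm{Exp}_M(Ht) = \exp[(2H/i\hbar\omega)\tan(\omega t/2)]/\cos(\omega t/2)$ and the projectors $\pi_n^{(M)} = 2(-1)^n e^{-2H/\hbar\omega} L_n(4H/\hbar\omega)$ with energies $(n + \tfrac{1}{2})\hbar\omega$. Because the generating function of the Laguerre polynomials converges only for $|s| < 1$, the comparison is made at a complex time with negative imaginary part; this choice, the truncation at 400 terms, and the function names are assumptions of the illustration.

```python
import cmath
import math

def laguerre(n, z):
    """L_n(z) via (k+1) L_{k+1} = (2k+1-z) L_k - k L_{k-1}."""
    prev, cur = 1.0, 1.0 - z
    if n == 0:
        return prev
    for k in range(1, n):
        prev, cur = cur, ((2 * k + 1 - z) * cur - k * prev) / (k + 1)
    return cur

def generating_series(s, z, terms=200):
    """Truncated sum over n of s**n * (-1)**n * L_n(z), valid for |s| < 1."""
    return sum((-s) ** n * laguerre(n, z) for n in range(terms))

def pi_moyal(n, H, hbar=1.0, omega=1.0):
    """Moyal projector 2 (-1)^n exp(-2H/(hbar omega)) L_n(4H/(hbar omega))."""
    z = 4 * H / (hbar * omega)
    return 2 * (-1) ** n * math.exp(-z / 2) * laguerre(n, z)

def exp_moyal_closed(H, t, hbar=1.0, omega=1.0):
    """Closed-form star exponential; t may be complex."""
    half = omega * t / 2
    return cmath.exp(2 * H / (1j * hbar * omega) * cmath.tan(half)) / cmath.cos(half)

def exp_moyal_series(H, t, hbar=1.0, omega=1.0, terms=400):
    """Sum over n of pi_n * exp(-i E_n t / hbar), E_n = (n + 1/2) hbar omega."""
    return sum(pi_moyal(n, H, hbar, omega)
               * cmath.exp(-1j * (n + 0.5) * omega * t)
               for n in range(terms))

if __name__ == "__main__":
    s, z = 0.4, 2.5
    print(generating_series(s, z), math.exp(z * s / (1 + s)) / (1 + s))
    H, t = 0.8, 0.7 - 0.5j
    print(exp_moyal_series(H, t), exp_moyal_closed(H, t))
```

On the real time axis the series has to be understood as a limit from the lower half-plane, which is why the check is not performed at real t.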


Conventional Quantization 


One usually finds the observables characterizing 
some quantum mechanical system by starting from 
the corresponding classical system and then, either 
by guessing or by using some more or less systematic 
method, finding the corresponding representations 
of the classical quantities in the quantum 
system. The guiding principle is the correspondence 
principle: the quantum mechanical relations are 
supposed to reduce somehow to the classical 
relations in an appropriate limit. Early attempts to 
systematize this procedure involved finding an 
assignment rule $\Theta$ that associates to each phase-space 
function f a linear operator in Hilbert space 
$\hat{f} = \Theta(f)$ in such a way that in the limit $\hbar \to 0$, the 
quantum mechanical equations of motion go over to 
the classical equations. Such an assignment cannot 
be unique, because even though an operator that is a 
function of the basic operators $\hat{Q}$ and $\hat{P}$ reduces to a 
unique phase-space function in the limit $\hbar \to 0$, 
there are many ways to assign an operator to a given 
phase-space function, due to the different orderings 
of the operators $\hat{Q}$ and $\hat{P}$ that all reduce to the 
original phase-space function. Different ordering 
procedures correspond to different quantization 
schemes. It turns out that there is no quantization 
scheme for systems with observables that depend on 
the coordinates or the momenta to a higher power 
than quadratic which leads to a correspondence 
between the quantum mechanical and the classical 
equations of motion, and which simultaneously 
strictly maintains the Dirac-von Neumann requirement 
that $(1/i\hbar)[\hat{f}, \hat{g}] \leftrightarrow \{f, g\}$. Only within the 
framework of deformation quantization does the 
correspondence principle acquire a precise meaning. 

A general scheme for associating phase-space 
functions and Hilbert space operators, which 
includes all of the usual orderings, is given as 
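follows: the operator $\Theta_\lambda(f)$ corresponding to a 
given phase-space function f is 


$$ \Theta_\lambda(f) = \int \tilde{f}(\xi, \eta)\, e^{-i(\xi \hat{Q} + \eta \hat{P})}\, e^{\lambda(\xi, \eta)}\, d\xi\, d\eta \qquad [55] $$


where $\tilde{f}$ is the Fourier transform of f, and $(\hat{Q}, \hat{P})$ are the 
Schrödinger operators that correspond to the phase-space 
variables (q, p); $\lambda(\xi, \eta)$ is a quadratic form: 


$$ \lambda(\xi, \eta) = \frac{\hbar}{4} \left( \alpha\eta^2 + \beta\xi^2 + 2i\gamma\xi\eta \right) \qquad [56] $$


Different choices for the constants $(\alpha, \beta, \gamma)$ yield 
different operator ordering schemes. 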


The relation between operator algebras and star 
products is given by 


$$ \Theta(f)\, \Theta(g) = \Theta(f \star g) \qquad [57] $$


where $\Theta$ is a linear assignment of the kind discussed 
above. Different assignments, which correspond to 
different operator orderings, correspond to c-equivalent 
star products. Equation [57] demonstrates that the quantum 
mechanical algebra of observables is a representa- 
tion of the star product algebra. Because in the 
algebraic approach to quantum theory all the 
information concerning the quantum system may 
be extracted from the algebra of observables, 
specifying the star product completely determines 
the quantum system. 
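
The statement that operator products represent star products can be illustrated on the simplest case, the product of a and $\bar{a}$. The sketch below assumes truncated matrices for the oscillator operators, normalized so that $[\hat{a}, \hat{a}^\dagger] = \hbar$, and assumes that normal ordering maps $a\bar{a}$ to $\hat{a}^\dagger \hat{a}$ and Weyl ordering maps it to the symmetrized product. Truncation spoils the commutator in the last diagonal entry, so only the upper-left block is compared. All names are hypothetical helpers.

```python
import math

def lowering(dim, hbar=1.0):
    """Truncated annihilation matrix with [a, a_dag] = hbar off the corner."""
    a = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        a[n - 1][n] = math.sqrt(hbar * n)
    return a

def transpose(A):
    """Adjoint of a real matrix."""
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    """Matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lincomb(A, B, ca=1.0, cb=1.0):
    """Return ca*A + cb*B."""
    n = len(A)
    return [[ca * A[i][j] + cb * B[i][j] for j in range(n)] for i in range(n)]

def identity(dim):
    """Identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]

def block_close(A, B, size, tol=1e-12):
    """Compare the upper-left size x size blocks of A and B."""
    return all(abs(A[i][j] - B[i][j]) < tol
               for i in range(size) for j in range(size))

def check_orderings(dim=6, hbar=1.0):
    """Check Theta(a) Theta(abar) = Theta(a star abar) for both orderings."""
    a = lowering(dim, hbar)
    ad = transpose(a)
    lhs = matmul(a, ad)  # operator product Theta(a) Theta(abar)
    # Normal ordering: a star_N abar = a abar + hbar -> a_dag a + hbar.
    normal = lincomb(matmul(ad, a), identity(dim), 1.0, hbar)
    # Weyl ordering: a star_M abar = a abar + hbar/2
    # -> (a a_dag + a_dag a)/2 + hbar/2.
    weyl = lincomb(lincomb(matmul(a, ad), matmul(ad, a), 0.5, 0.5),
                   identity(dim), 1.0, hbar / 2)
    return (block_close(lhs, normal, dim - 1),
            block_close(lhs, weyl, dim - 1))

if __name__ == "__main__":
    print(check_orderings())
```

Both orderings reproduce the same operator product, while assigning different phase-space functions to it; this is the sense in which different orderings correspond to c-equivalent star products.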

The inverse procedure of finding the phase-space 
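function that corresponds to a given operator $\hat{f}$ is, 
for the special case of Weyl ordering, given by 


$$ f(q,p) = \int \langle q + \tfrac{1}{2}\xi |\, \hat{f}\, | q - \tfrac{1}{2}\xi \rangle\, e^{-i\xi p/\hbar}\, d\xi \qquad [58] $$


When using holomorphic coordinates, it is convenient 
to work with the coherent states 

$$ \hat{a} |a\rangle = a |a\rangle, \quad \langle a| \hat{a}^\dagger = \langle a| \bar{a} \qquad [59] $$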


These states are related to the energy eigenstates of 
the harmonic oscillator 


=â! 10) (60) 


by 





[61] 


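In normal ordering, we obtain the phase-space function 
$f(a, \bar{a})$ corresponding to the operator $\hat{f}$ by just taking 
the matrix element between coherent states: 


$$ f(a, \bar{a}) = \langle a |\, \hat{f}(\hat{a}, \hat{a}^\dagger)\, | a \rangle \qquad [62] $$

For holomorphic coordinates, it is easy to show 


$$ \pi_n^{(N)}(a, \bar{a}) = \langle a | n \rangle \langle n | a \rangle = \frac{1}{\hbar^n n!}\, (a\bar{a})^n e^{-a\bar{a}/\hbar} \qquad [63] $$


in agreement with eqn [38] for the normal star 
product projectors. 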

The star exponential Exp(Ht) and the projectors 
$\pi_n$ are the phase-space representations of the time-evolution 
operator $\exp(-i\hat{H}t/\hbar)$ and the projection 
operators $\hat{\rho}_n = |n\rangle\langle n|$, respectively. Weyl ordering 
corresponds to the use of the Moyal star product for 
quantization and normal ordering to the use of the 
normal star product. In the density matrix formalism, 
we say that the projection operator is that of a 
pure state, which is characterized by the property of 
being idempotent: $\hat{\rho}_n^2 = \hat{\rho}_n$ (compare eqn [21]). The 
integral of the projector over the momentum gives 
the probability distribution in position space: 


$$ \frac{1}{2\pi\hbar} \int \pi_n^{(M)}(q,p)\, dp = \frac{1}{2\pi\hbar} \int \langle q + \xi/2 | n \rangle \langle n | q - \xi/2 \rangle\, e^{-i\xi p/\hbar}\, d\xi\, dp = \langle q | n \rangle \langle n | q \rangle = |\psi_n(q)|^2 \qquad [64] $$


and the integral over the position gives the probability 
distribution in momentum space: 


$$ \frac{1}{2\pi\hbar} \int \pi_n^{(M)}(q,p)\, dq = \langle p | n \rangle \langle n | p \rangle = |\tilde{\psi}_n(p)|^2 \qquad [65] $$

The normalization is 

$$ \frac{1}{2\pi\hbar} \int \pi_n^{(M)}(q,p)\, dq\, dp = 1 \qquad [66] $$

which is the same as eqn [20]. Applying these relations to the ground-state projector of the harmonic oscillator, eqn [47] shows that this is a minimum-uncertainty state. In the classical limit $\hbar \to 0$, it goes to a Dirac $\delta$-function. The expectation value of the Hamiltonian operator is


$$\frac{1}{2\pi\hbar}\int \bigl(H \star_M \pi_n^{(M)}\bigr)(q,p)\,dq\,dp = \int \langle q|\,\hat H \hat\pi_n\,|q\rangle\,dq = \mathrm{tr}(\hat H \hat\rho_n)$$ [67]


which should be compared to eqn [24]. 


Quantum Field Theory 


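A real scalar field is given in terms of the coefficients $a(\mathbf{k}), \bar a(\mathbf{k})$ by


$$\phi(x) = \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2\omega_k}} \left[ a(\mathbf{k})\,e^{-ikx} + \bar a(\mathbf{k})\,e^{ikx} \right]$$ [68]


where $\hbar\omega_k = \sqrt{\hbar^2\mathbf{k}^2 + m^2}$ is the energy of a single quantum of the field. The corresponding quantum field operator is


$$\hat\Phi(x) = \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2\omega_k}} \left[ \hat a(\mathbf{k})\,e^{-ikx} + \hat a^\dagger(\mathbf{k})\,e^{ikx} \right]$$ [69]


where $\hat a(\mathbf{k}), \hat a^\dagger(\mathbf{k})$ are the annihilation and creation operators for a quantum of the field with momentum $\hbar\mathbf{k}$. The Hamiltonian is


$$H = \int d^3k\, \hbar\omega_k\, \hat a^\dagger(\mathbf{k})\,\hat a(\mathbf{k})$$ [70]

$\hat N(\mathbf{k}) = \hat a^\dagger(\mathbf{k})\hat a(\mathbf{k})$ is interpreted as the number operator, and eqn [70] is then just the generalization of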
eqn [39], the expression for the energy of the harmonic 
oscillator in the normal ordering scheme, for an infinite 
number of degrees of freedom. Had we chosen the 
Weyl-ordering scheme, it would have resulted in (by 
the generalization of eqn [50]) an infinite vacuum 
energy. Hence, requiring the vacuum energy to vanish 
implies the choice of the normal ordering scheme in 
free field theory. In the framework of deformation 
quantization, this requirement leads to the choice of 
the normal star product for treating free scalar fields: 
only for this choice is the star product well defined. 

Currently, in realistic physical field theories 
involving interacting relativistic fields we are limited 
to perturbative calculations. The objects of interest 
are products of the fields. The analog of the Moyal 
product of eqn [11] for systems with an infinite 
number of degrees of freedom is 


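$$\phi(x_1) \star \phi(x_2) \star \cdots \star \phi(x_n) = \exp\left[ \frac{1}{2} \sum_{i<j} \int d^4x\, d^4y\, \frac{\delta}{\delta\phi_i(x)}\, \Delta(x-y)\, \frac{\delta}{\delta\phi_j(y)} \right] \phi_1(x_1), \ldots, \phi_n(x_n) \Big|_{\phi_i = \phi}$$ [71]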


where the expressions $\delta/\delta\phi(x)$ indicate functional derivatives. Here, we have used the antisymmetric Schwinger function:


$$\Delta(x - y) = [\hat\Phi(x), \hat\Phi(y)]$$ [72]


The Schwinger function is uniquely determined by 
relativistic invariance and causality from the equal- 
time commutator 


$$[\hat\Phi(x), \partial_0\hat\Phi(y)]\big|_{x^0 = y^0} = i\hbar\,\delta^{3}(\mathbf{x} - \mathbf{y})$$ [73]





which is the characterization of the canonical 
structure in the field theoretic framework. 

The Moyal product is, however, not the suitable 
star product to use in this context. In relativistic 
quantum field theory, it is necessary to incorporate 
causality in the form advocated by Feynman: 
positive frequencies propagate forward in time, 
whereas negative frequencies propagate backwards 
in time. This property is achieved by using the 
Feynman propagator: 


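$$\Delta_F(x) = \begin{cases} \Delta^{+}(x) & \text{for } x^0 > 0 \\ -\Delta^{-}(x) & \text{for } x^0 < 0 \end{cases}$$ [74]


where $\Delta^{+}(x), \Delta^{-}(x)$ are the propagators for the positive and negative frequency components of the field, respectively. In operator language


$$\Delta_F(x - y) = T(\hat\Phi(x)\hat\Phi(y)) - N(\hat\Phi(x)\hat\Phi(y))$$ [75]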


where $T$ indicates the time-ordered product of the fields and $N$ the normal-ordered product. Because the second term in eqn [75] is a normal-ordered product with vanishing vacuum expectation value, the Feynman propagator may be simply characterized as the vacuum expectation value of the time-ordered product of the fields. The antisymmetric part of the positive frequency propagator is the Schwinger function:


$$\Delta^{+}(x) - \Delta^{+}(-x) = \Delta^{+}(x) + \Delta^{-}(x) = \Delta(x)$$ [76]


The fact that going over to a c-equivalent product leaves the antisymmetric part of the differential operator in the exponent of eqn [71] invariant suggests that the use of the positive frequency propagator instead of the Schwinger function merely involves the passage to a c-equivalent star product. This is indeed easy to verify. The time-ordered product of the operators is obtained by replacing the Schwinger function $\Delta(x - y)$ in eqn [72] by the c-equivalent positive frequency propagator $\Delta^{+}(x - y)$, restricting the time integration to $x^0 > y^0$, as in eqn [74], and symmetrizing the integral in the variables $x$ and $y$, which brings in the negative frequency propagator $\Delta^{-}(x - y)$ for times $x^0 < y^0$. Then eqn [71] becomes Wick's theorem, which is the basic tool of relativistic perturbation theory. In operator language


$$T(\hat\Phi(x_1), \ldots, \hat\Phi(x_n)) = \exp\left[ \frac{1}{2} \int d^4x\, d^4y\, \frac{\delta}{\delta\hat\Phi(x)}\, \Delta_F(x - y)\, \frac{\delta}{\delta\hat\Phi(y)} \right] N(\hat\Phi(x_1), \ldots, \hat\Phi(x_n))$$ [77]


Another interesting relation between deformation quantization and quantum field theory has been uncovered by studies of the Poisson-sigma model. This model involves a set of scalar fields $X^i$ which map a two-dimensional manifold $\Sigma_2$ onto a Poisson space $M$, as well as generalized gauge fields $A_i$, which are 1-forms on $\Sigma_2$ mapping to 1-forms on $M$. The action is given by


$$S_{PS} = \int_{\Sigma_2} \left( A_i\, dX^i + \alpha^{ij} A_i A_j \right)$$ [78]


where $\alpha^{ij}$ is the Poisson structure of $M$. A remarkable formula was found (Cattaneo and Felder 2000):


$$(f \star g)(x) = \int DX\, DA\, f(X(1))\, g(X(2))\, e^{iS_{PS}/\hbar}$$ [79]


where $f, g$ are functions on $M$, $\star$ is Kontsevich's star product (Kontsevich 1997), and the functional integration is over all fields $X$ that satisfy the boundary condition $X(\infty) = x$. Here $\Sigma_2$ is taken to be a disk in $\mathbb{R}^2$; 1, 2, and $\infty$ are three points on its circumference. By expanding the functional integral in eqn [79] according to the usual rules of perturbation theory, one finds that the coefficients of the powers of $\hbar$ reproduce the graphs and weights that characterize Kontsevich's star product. For the case in which the Poisson tensor is invertible, we can perform the Gaussian integration in eqn [79] involving the fields $A_i$. The result is


$$(f \star g)(x) = \int DX\, f(X(1))\, g(X(2))\, \exp\left[ \frac{i}{\hbar} \int \Omega_{ij}\, dX^i\, dX^j \right]$$ [80]


Equation [80] is formally similar to eqn [16] for the Moyal product, to which the Kontsevich product reduces in the symplectic case. Here $\Omega_{ij} = (\alpha^{ij})^{-1}$ is the symplectic 2-form, and $\int \Omega_{ij}\, dX^i\, dX^j$ is the symplectic volume of the manifold $M$. To make this relationship exact, one must integrate out the gauge degrees of freedom in the functional integral in eqn [79]. Since the Poisson-sigma model represents a topological field theory, there remains only a finite-dimensional integral, which coincides with the integral in eqn [80].


See also: Deformations of the Poisson Bracket on a 
Symplectic Manifold; Deformation Quantization and 
Representation Theory; Deformation Theory; Fedosov 
Quantization; Noncommutative Geometry from Strings; 
Operads; Quantum Field Theory: A Brief Introduction; 
Schrödinger Operators.


The Quantization Problem


Though quantum theory for the classical phase space $\mathbb{R}^{2n}$ is well established by means of what is usually called canonical quantization, physics demands that one go beyond $\mathbb{R}^{2n}$. On the one hand, systems with constraints lead by phase-space reduction to classical phase spaces different from $\mathbb{R}^{2n}$; in general one ends up with a symplectic or even Poisson manifold. Thus, one needs to quantize geometrically nontrivial phase spaces. On the other hand, field theories and thermodynamical systems require one to pass from $\mathbb{R}^{2n}$ to infinitely many degrees of freedom, where one faces additional analytical difficulties. Both types of difficulties combine for gauge field theories and gravity, whence it is clear that quantization is still one of the most important issues in mathematical physics.

One possibility (among many others) is to use the 
structural similarity between the classical and 
quantum observable algebras. In both cases the 
observables constitute a complex *-algebra: in the
classical case it is commutative with the additional 
structure of a Poisson bracket, whereas in the 
quantum case the algebra is noncommutative. In 
deformation quantization, one tries to pass from the 
classical observables to the quantum observables by 
a deformation of the algebraic structures. 


From Canonical Quantization to Star 
Products 


Let us briefly recall canonical quantization and the ordering problem. In order to "quantize" classical observables like the polynomials on $\mathbb{R}^{2n}$, one assigns to $q^k, p_l$ the operators


$$q^k \mapsto \varrho(q^k) = Q^k = \bigl( q \mapsto q^k \psi(q) \bigr)$$ [1]


$$p_l \mapsto \varrho(p_l) = P_l = \Bigl( q \mapsto -i\hbar\, \frac{\partial\psi}{\partial q^l}(q) \Bigr)$$ [2]


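for $k, l = 1, \ldots, n$, defined on a suitable domain in $L^2(\mathbb{R}^n, d^nq)$. For simplicity, we choose $C_0^\infty(\mathbb{R}^n)$ as domain. The well-known ordering problem is encountered if one wants to also quantize higher polynomials. One convenient (although not the only) possibility is Weyl's total symmetrization rule, that is, for a monomial like $q^2p$ we take the quantization


$$\varrho_{\mathrm{Weyl}}(q^2p) = \tfrac{1}{3}\bigl( Q^2P + QPQ + PQ^2 \bigr) = -i\hbar\, q^2 \frac{\partial}{\partial q} - i\hbar\, q$$ [3]

This can be written in the more explicit form:

$$\varrho_{\mathrm{Weyl}}(f) = \sum_{r=0}^{\infty} \frac{1}{r!} \left( \frac{\hbar}{i} \right)^{r} \frac{\partial^r (Nf)}{\partial p_{i_1} \cdots \partial p_{i_r}} \bigg|_{p=0} \frac{\partial^r}{\partial q^{i_1} \cdots \partial q^{i_r}}$$ [4]

with

$$N = \exp\left( \frac{\hbar}{2i} \Delta \right) \quad \text{and} \quad \Delta = \frac{\partial^2}{\partial q^k\, \partial p_k}$$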

Using [4] one can easily extend $\varrho_{\mathrm{Weyl}}$ to all functions $f \in C^\infty(\mathbb{R}^{2n})$ which are polynomial in the momentum variables only and have an arbitrary smooth dependence on the position variables. This Poisson subalgebra of $C^\infty(\mathbb{R}^{2n})$ certainly covers all classical observables of physical interest. Denoting these observables by $\mathrm{Pol}(T^*\mathbb{R}^n)$, one obtains a linear isomorphism


$$\varrho_{\mathrm{Weyl}} : \mathrm{Pol}(T^*\mathbb{R}^n) \to \mathrm{DiffOp}(\mathbb{R}^n)$$ [5]


into the differential operators with smooth coefficients, called Weyl symbol calculus. Other orderings would result in a different linear isomorphism like [5]; for example, the standard ordering is obtained by simply omitting the operator $N$ in [4].
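
The symmetrization rule [3] can be checked mechanically on polynomial wave functions. The following is a minimal sketch, assuming one degree of freedom and units with hbar = 1; the helper names (`Q`, `P`, `weyl_q2p`, `explicit_q2p`) and the coefficient-list representation are illustrative choices, not part of the text.

```python
# A polynomial in q is a coefficient list: [c0, c1, c2] means c0 + c1*q + c2*q**2.
HBAR = 1.0  # assumption: units with hbar = 1


def Q(psi):
    """Position operator: multiplication by q (shifts every coefficient up one degree)."""
    return [0j] + list(psi)


def D(psi):
    """Derivative d/dq of a polynomial."""
    return [k * c for k, c in enumerate(psi)][1:] or [0j]


def P(psi):
    """Momentum operator P = -i*hbar*d/dq."""
    return [-1j * HBAR * c for c in D(psi)]


def add(*polys):
    n = max(len(p) for p in polys)
    return [sum(p[k] if k < len(p) else 0j for p in polys) for k in range(n)]


def scale(s, psi):
    return [s * c for c in psi]


def weyl_q2p(psi):
    """Totally symmetrized quantization (Q^2 P + Q P Q + P Q^2)/3 applied to psi."""
    return scale(1 / 3, add(Q(Q(P(psi))), Q(P(Q(psi))), P(Q(Q(psi)))))


def explicit_q2p(psi):
    """Right-hand side of [3]: (-i*hbar*q^2 d/dq - i*hbar*q) applied to psi."""
    return add(scale(-1j * HBAR, Q(Q(D(psi)))), scale(-1j * HBAR, Q(psi)))


def close(a, b, tol=1e-12):
    n = max(len(a), len(b))
    pad = lambda p: list(p) + [0j] * (n - len(p))
    return all(abs(x - y) < tol for x, y in zip(pad(a), pad(b)))


psi = [1 + 0j, 2j, -3 + 0j, 0.5 + 0j]  # arbitrary test polynomial
assert close(weyl_q2p(psi), explicit_q2p(psi))
```

The extra term -i*hbar*q is exactly what the operator N contributes in [4]: applying N to q^2 p gives q^2 p - i*hbar*q, and standard ordering of that function reproduces [3].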

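Using [5], one can pull back the operator product of $\mathrm{DiffOp}(\mathbb{R}^n)$ to obtain a new product $\star_{\mathrm{Weyl}}$ for $\mathrm{Pol}(T^*\mathbb{R}^n)$, that is


$$f \star_{\mathrm{Weyl}} g = \varrho_{\mathrm{Weyl}}^{-1}\bigl( \varrho_{\mathrm{Weyl}}(f)\, \varrho_{\mathrm{Weyl}}(g) \bigr)$$ [6]


which is called the Weyl-Moyal star product. Explicitly, one has


$$f \star_{\mathrm{Weyl}} g = \mu \circ \exp\left[ \frac{i\hbar}{2} \sum_{k=1}^{n} \left( \frac{\partial}{\partial q^k} \otimes \frac{\partial}{\partial p_k} - \frac{\partial}{\partial p_k} \otimes \frac{\partial}{\partial q^k} \right) \right] (f \otimes g)$$ [7]


where $\mu(f \otimes g) = fg$ is the commutative product. Clearly, for $f, g \in \mathrm{Pol}(T^*\mathbb{R}^n)$ the exponential series terminates after finitely many terms. If one now wants to extend further to all smooth functions, then [7] is only a formal power series in $\hbar$. Since on a manifold one does not have a priori a nice distinguished class of functions like $\mathrm{Pol}(T^*\mathbb{R}^n)$, one indeed has to generalize in this direction if a geometric framework is desired. This observation and the simple fact that $\star_{\mathrm{Weyl}}$ satisfies all the following properties lead to the definition of a formal star product by Bayen et al. (1978):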


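Definition 1 A formal star product on a Poisson manifold $(M, \pi)$ is an associative $\mathbb{C}[[\lambda]]$-bilinear product


$$f \star g = \sum_{r=0}^{\infty} \lambda^r C_r(f, g)$$ [8]


for $f, g \in C^\infty(M)[[\lambda]]$ such that


1. $C_0(f, g) = fg$ and $C_1(f, g) - C_1(g, f) = i\{f, g\}$,
2. $1 \star f = f = f \star 1$, and
3. $C_r$ is a bidifferential operator.


If in addition $\overline{f \star g} = \bar g \star \bar f$, then $\star$ is called Hermitian.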


Clearly, $\star_{\mathrm{Weyl}}$ defines a Hermitian star product for $\mathbb{R}^{2n}$. The first condition is called the correspondence principle in deformation quantization and the formal parameter $\lambda$ corresponds to Planck's constant $\hbar$ once a convergence scheme is established.
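
For polynomials the series [7] terminates, so the conditions of Definition 1 can be tested directly. The sketch below assumes one degree of freedom and a numerical value for the deformation parameter; the dictionary representation and the function names (`star`, `d_q`, `d_p`, `close`) are illustrative assumptions rather than notation from the text.

```python
from math import comb, factorial

HBAR = 1.0  # assumption: numerical stand-in for the deformation parameter

# A polynomial in (q, p) is a dict {(a, b): c}, meaning the sum of c * q**a * p**b.


def d_q(f):
    return {(a - 1, b): a * c for (a, b), c in f.items() if a > 0}


def d_p(f):
    return {(a, b - 1): b * c for (a, b), c in f.items() if b > 0}


def deriv(f, nq, n_p):
    for _ in range(nq):
        f = d_q(f)
    for _ in range(n_p):
        f = d_p(f)
    return f


def mul(f, g):
    h = {}
    for (a, b), c in f.items():
        for (a2, b2), c2 in g.items():
            h[(a + a2, b + b2)] = h.get((a + a2, b + b2), 0) + c * c2
    return h


def add(f, g, s=1):
    """Return f + s*g, dropping coefficients that are numerically zero."""
    h = dict(f)
    for k, c in g.items():
        h[k] = h.get(k, 0) + s * c
    return {k: c for k, c in h.items() if abs(c) > 1e-12}


def star(f, g):
    """Weyl-Moyal product [7] for n = 1, expanded with the binomial theorem."""
    deg = max((a + b for (a, b) in f), default=0)  # higher derivatives of f vanish
    result = {}
    for r in range(deg + 1):
        for k in range(r + 1):
            # (r-k) factors of d/dq (x) d/dp and k factors of -(d/dp (x) d/dq)
            coeff = (1j * HBAR / 2) ** r / factorial(r) * comb(r, k) * (-1) ** k
            result = add(result, mul(deriv(f, r - k, k), deriv(g, k, r - k)), coeff)
    return result


def close(f, g, tol=1e-9):
    return all(abs(f.get(k, 0) - g.get(k, 0)) < tol for k in set(f) | set(g))


q, p, one = {(1, 0): 1.0}, {(0, 1): 1.0}, {(0, 0): 1.0}
f, g, h = {(2, 1): 1.0}, {(0, 2): 1.0, (1, 0): 1.0}, {(1, 1): 1.0}

assert close(add(star(q, p), star(p, q), -1), {(0, 0): 1j * HBAR})  # [q, p] = i*hbar
assert close(star(one, f), f) and close(star(f, one), f)             # unit
assert close(star(star(f, g), h), star(f, star(g, h)))               # associativity
```

The first assertion is the correspondence principle in its simplest instance: the star commutator of q and p equals i times their Poisson bracket, multiplied by the deformation parameter.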

If $S = \mathrm{id} + \sum_{r=1}^{\infty} \lambda^r S_r$ is a formal series of differential operators with $S_r 1 = 0$ for $r \ge 1$, then it is easy to see that


$$f \star' g = S^{-1}(Sf \star Sg)$$ [9]


defines again a star product which is Hermitian if $\star$ is Hermitian and if in addition $\overline{Sf} = S\bar f$. In particular, the operator $N$, as before, serves for the transition from $\star_{\mathrm{Weyl}}$ to the standard-ordered star product $\star_{\mathrm{Std}}$ obtained the same way from the standard-ordered quantization. Thus, [9] can be seen as the abstract notion of changing the ordering prescription, even if no operator representation has been specified. Two star products related by such an equivalence transformation are called equivalent and *-equivalent in the Hermitian case.

One main advantage of formal deformation 
quantization is that one has very strong existence 
and classification results: 


Theorem 2 On every Poisson manifold there exists 
a star product. 


The above theorem was first shown by deWilde and Lecomte (1983) for the symplectic case and independently by Fedosov (1985) and Omori, Maeda, and Yoshioka (1991). In 1997, Kontsevich was able to prove the general Poisson case by showing his profound formality theorem. The full classification of star products up to equivalence was first obtained for the symplectic case by Nest and Tsygan (1995) and independently by Deligne (1995), Bertelson, Cahen, and Gutt (1997), and Weinstein and Xu (1997). The general Poisson case again follows from Kontsevich's formality. In particular, in the symplectic case, star products are classified by their characteristic class


$$c(\star) \in \frac{[\omega]}{i\lambda} + H^2_{\mathrm{deRham}}(M, \mathbb{C})[[\lambda]]$$ [10]

In conclusion, one can state that for the price of formal power series in $\hbar$ one obtains in formal deformation quantization a very general and well-understood picture of the observable algebra for the quantum version of any classical system described kinematically by a Poisson manifold. It turns out that already in this framework one can discuss dynamics as well by use of a Heisenberg equation formulated with $\star$. Moreover, the quantization of
symmetries described by Hamiltonian Lie group or 
Lie algebra actions has been extensively studied. 
For a physical theory of quantization, however, 
there are still at least two ingredients missing. On 
the one hand, one has to overcome the formal 
power series expansion in $\hbar$. This problem is, in
principle, on the same footing as any perturbative 
approach to quantum theory and thus no easy 
answer can be expected to hold in general. In 
particular examples, however, such as the Weyl- 
Moyal star product, it can easily be solved. These 
issues together with the corresponding questions 
about a spectral calculus are best studied in the 
framework of Rieffel’s strict deformation quantiza- 
tion based on a more C*-algebraic formulation of 
the deformation problem. On the other hand, the 
observable algebra is not enough to describe a 
quantum system: one also needs to have a notion 
for the states. It turns out that already in the formal 
framework one has a physically reasonable notion 
of states as discussed by Bordemann and Waldmann (1998).


States and Representations 


The notion of states in deformation quantization is adapted from the C*-algebraic world and based on the notion of positive functionals. Recall that for a *-algebra $\mathcal{A}$ over $\mathbb{C}$ a linear functional $\omega : \mathcal{A} \to \mathbb{C}$ is called positive if $\omega(a^*a) \ge 0$. For formal deformation quantization, things are slightly more subtle as now one has to consider $\mathbb{C}[[\lambda]]$-linear functionals


$$\omega : (C^\infty(M)[[\lambda]], \star) \to \mathbb{C}[[\lambda]]$$ [11]


where $\star$ is assumed to be a Hermitian star product in the following. Then the positivity is understood in the sense of formal power series where $a \in \mathbb{R}[[\lambda]]$ is called positive if $a = \sum_{r=r_0}^{\infty} \lambda^r a_r$ with $a_{r_0} > 0$. Thus, we can make sense out of the following requirement:


Definition 3 Let $\star$ be a Hermitian star product on $M$. A $\mathbb{C}[[\lambda]]$-linear functional $\omega : C^\infty(M)[[\lambda]] \to \mathbb{C}[[\lambda]]$ is called positive with respect to $\star$ if


$$\omega(\bar f \star f) \ge 0$$ [12]

and it is called a state if, in addition, $\omega(1) = 1$.


In fact, $\omega(f)$ is interpreted as the expectation value of the observable $f$ in the state $\omega$. The positivity [12] ensures that the usual uncertainty relations between expectation values hold.

Sometimes it is convenient to consider positive functionals only defined on a (proper) *-ideal in $C^\infty(M)[[\lambda]]$, for instance, $C_0^\infty(M)[[\lambda]]$.

Since in some situations one wants more general formal series than just power series, it is convenient to embed the above definition of states into a larger and more algebraic context: consider an ordered ring $\mathsf{R}$, that is, a commutative, associative, unital ring $\mathsf{R}$ together with a distinguished subset $\mathsf{P} \subset \mathsf{R}$ (the positive elements) such that $\mathsf{R}$ is the disjoint union $-\mathsf{P} \cup \{0\} \cup \mathsf{P}$, and we have $\mathsf{P} \cdot \mathsf{P} \subset \mathsf{P}$ and $\mathsf{P} + \mathsf{P} \subset \mathsf{P}$. Then $\mathsf{C} = \mathsf{R}(i)$ denotes the ring extension by a square root $i$ of $-1$, and one considers *-algebras $\mathcal{A}$ over $\mathsf{C}$. Clearly, this generalizes the cases $\mathsf{R} = \mathbb{R}$, where $\mathsf{C} = \mathbb{C}$, as well as $\mathsf{R} = \mathbb{R}[[\lambda]]$, where $\mathsf{C} = \mathbb{C}[[\lambda]]$. In this way, one provides a framework where C*-algebras, *-algebras over $\mathbb{C}$, and formal Hermitian star products can be treated on the same footing. It is clear that the definition of a positive functional immediately extends to $\omega : \mathcal{A} \to \mathsf{C}$ for such a ring $\mathsf{C}$.
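
The ordering on real formal power series is decided by the lowest-order nonvanishing coefficient alone. A small sketch of this rule follows, assuming that a series is represented by a finite list of exact coefficients (so only polynomials in the formal parameter are covered); the function names are hypothetical.

```python
from fractions import Fraction


def is_positive(series):
    """series[r] is the coefficient of lambda**r.
    The element is positive iff its lowest-order nonzero coefficient is > 0."""
    for coeff in series:
        if coeff != 0:
            return coeff > 0
    return False  # zero is neither positive nor negative


def add(a, b):
    n = max(len(a), len(b))
    return [(a[r] if r < len(a) else 0) + (b[r] if r < len(b) else 0) for r in range(n)]


def neg(a):
    return [-x for x in a]


def mul(a, b):
    """Cauchy product of two finite coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for r, x in enumerate(a):
        for s, y in enumerate(b):
            out[r + s] += x * y
    return out


lam = [0, 1]                 # the formal parameter itself
a = [0, 1, -1000]            # lambda - 1000*lambda**2
b = [1, -5]                  # 1 - 5*lambda

assert is_positive(a) and not is_positive(neg(a))        # trichotomy
assert is_positive(add(a, b)) and is_positive(mul(a, b))  # P + P and P * P stay in P
assert is_positive(add([Fraction(1, 1000)], neg(lam)))    # lambda < 1/1000
```

The last line illustrates why this ordered ring is non-Archimedean: the formal parameter is positive but smaller than every positive real number.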


Example 4 


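(i) For the Wick star product on $\mathbb{R}^{2n} = \mathbb{C}^n$, defined by


$$f \star_{\mathrm{Wick}} g = \sum_{r=0}^{\infty} \frac{(2\lambda)^r}{r!}\, \frac{\partial^r f}{\partial z^{i_1} \cdots \partial z^{i_r}}\, \frac{\partial^r g}{\partial \bar z^{i_1} \cdots \partial \bar z^{i_r}}$$ [13]


the $\delta$-functional $\delta : f \mapsto f(0)$ is positive. Note, however, that $\delta$ is not positive for $\star_{\mathrm{Weyl}}$.

(ii) For the Weyl-Moyal star product $\star_{\mathrm{Weyl}}$ the Schrödinger functional


$$\omega(f) = \int_{\mathbb{R}^n} f(q, p = 0)\, d^nq$$ [14]


defined on the *-ideal $C_0^\infty(\mathbb{R}^{2n})[[\lambda]]$, is positive.

(iii) For any connected symplectic manifold $(M, \omega)$ and any Hermitian star product $\star$, there exists a unique normalized trace functional


$$\mathrm{tr} : C_0^\infty(M)[[\lambda]] \to \mathbb{C}[[\lambda]], \qquad \mathrm{tr}(f \star g) = \mathrm{tr}(g \star f)$$ [15]


with zeroth order equal to the integration over $M$ with respect to the Liouville measure $\Omega = \omega^n$. Then this trace is positive as well, $\mathrm{tr}(\bar f \star f) \ge 0$.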
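
The contrast in part (i) between the Wick and the Weyl-Moyal product can be made explicit in the simplest case. The following worked computation is a sketch under two assumptions not stated in the text: $n = 1$, and the identification $z = q + ip$, which is the choice for which [7] (with $\hbar$ replaced by $\lambda$) and [13] give the same star commutator $z \star \bar z - \bar z \star z = 2\lambda$. The function $f$ in the last line is taken in $C^\infty(M)$, without dependence on $\lambda$.

```latex
% Assumptions: n = 1 and z = q + i p.
\begin{align*}
\bar z \star_{\mathrm{Weyl}} z
  &= \bar z z + \frac{i\lambda}{2}\Bigl(\partial_q \bar z\,\partial_p z
       - \partial_p \bar z\,\partial_q z\Bigr)
   = \bar z z + \frac{i\lambda}{2}\,(i + i)
   = \bar z z - \lambda, \\
\delta\bigl(\bar z \star_{\mathrm{Weyl}} z\bigr) &= -\lambda < 0, \\
\delta\bigl(\bar f \star_{\mathrm{Wick}} f\bigr)
  &= \sum_{r=0}^{\infty} \frac{(2\lambda)^r}{r!}\,
     \Bigl|\frac{\partial^r f}{\partial \bar z^r}(0)\Bigr|^2 \;\ge\; 0 .
\end{align*}
```

The second line shows that $f = z$ already violates [12] for the Weyl-Moyal product, while the third line uses $\partial_z^r \bar f = \overline{\partial_{\bar z}^r f}$ to write every term of the Wick expression as a nonnegative coefficient times a power of $\lambda$.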


Having a notion for states as expectation-value functionals is still not enough to formulate quantum theory. One main feature of quantum states, the superposition principle, is not yet implemented. In particular, forming convex combinations like $\omega = c_1\omega_1 + c_2\omega_2$, with $c_1, c_2 \ge 0$ and $c_1 + c_2 = 1$, does not give a superposition of $\omega_1$ and $\omega_2$ but a mixed state. Hence, one needs an additional linear structure on the states, whence we look for a *-representation $\pi$ of the observable algebra $\mathcal{A}$ on a pre-Hilbert space $\mathcal{H}$ over $\mathsf{C}$ such that the states $\omega_1, \omega_2$ can be written as vector states $\omega_i(a) = \langle \phi_i, \pi(a)\phi_i \rangle$ for some unit vectors $\phi_1, \phi_2 \in \mathcal{H}$. Then one can build superpositions of the vectors $\phi_1, \phi_2$ in the usual way. While this is the well-known argument in any quantum theory based on the observable algebras, for deformation quantization one first has to make sense out of the above notions, since now $\mathsf{R} = \mathbb{R}[[\lambda]]$ is only an ordered ring. This can actually be done in a consistent way as demonstrated and exemplified by Bordemann, Bursztyn, Waldmann, and others.

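We recall the basic results: A pre-Hilbert space $\mathcal{H}$ over $\mathsf{C}$ is a $\mathsf{C}$-module with a $\mathsf{C}$-sesquilinear inner product $\langle \cdot, \cdot \rangle : \mathcal{H} \times \mathcal{H} \to \mathsf{C}$ such that $\langle \phi, \psi \rangle = \overline{\langle \psi, \phi \rangle}$ and $\langle \phi, \phi \rangle > 0$ for $\phi \ne 0$. This makes sense since $\mathsf{R}$ is ordered. An operator $A : \mathcal{H}_1 \to \mathcal{H}_2$ is called adjointable if there exists an operator $A^* : \mathcal{H}_2 \to \mathcal{H}_1$ such that $\langle A\phi, \psi \rangle_2 = \langle \phi, A^*\psi \rangle_1$ for all $\phi \in \mathcal{H}_1$, $\psi \in \mathcal{H}_2$. The set of adjointable operators is denoted by $\mathfrak{B}(\mathcal{H}_1, \mathcal{H}_2)$, and $\mathfrak{B}(\mathcal{H}) = \mathfrak{B}(\mathcal{H}, \mathcal{H})$ turns out to be a *-algebra over $\mathsf{C}$. This allows one to define a *-representation $\pi$ of $\mathcal{A}$ on $\mathcal{H}$ to be a *-homomorphism $\pi : \mathcal{A} \to \mathfrak{B}(\mathcal{H})$. An intertwiner $T$ between two *-representations $(\mathcal{H}_1, \pi_1)$ and $(\mathcal{H}_2, \pi_2)$ is an operator $T \in \mathfrak{B}(\mathcal{H}_1, \mathcal{H}_2)$ with $T\pi_1(a) = \pi_2(a)T$ for all $a \in \mathcal{A}$. This defines the category *-Rep($\mathcal{A}$) of *-representations of $\mathcal{A}$.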

Let us now recall that a positive linear functional $\omega$ can be written as an expectation value for a vector state in some representation. This is the well-known Gel'fand-Naimark-Segal (GNS) construction from operator algebra theory which can be transferred to this purely algebraic context (Bordemann and Waldmann 1998). First recall that any positive linear functional $\omega : \mathcal{A} \to \mathsf{C}$ satisfies the Cauchy-Schwarz inequality


$$\omega(a^*b)\, \overline{\omega(a^*b)} \le \omega(a^*a)\, \omega(b^*b)$$ [16]


and $\omega(a^*b) = \overline{\omega(b^*a)}$. If $\mathcal{A}$ is unital, which will always be assumed for simplicity, then $\omega(a^*) = \overline{\omega(a)}$ follows. Then


$$\mathcal{J}_\omega = \{ a \in \mathcal{A} \mid \omega(a^*a) = 0 \}$$ [17]


is a left ideal in $\mathcal{A}$, the so-called Gel'fand ideal, and hence $\mathcal{H}_\omega = \mathcal{A}/\mathcal{J}_\omega$ is a left $\mathcal{A}$-module with module structure denoted by $\pi_\omega(a)\psi_b = \psi_{ab}$, where $\psi_b \in \mathcal{H}_\omega$ denotes the equivalence class of $b \in \mathcal{A}$. Finally, $\langle \psi_b, \psi_c \rangle = \omega(b^*c)$ turns $\mathcal{H}_\omega$ into a pre-Hilbert space and $\pi_\omega$ becomes a *-representation, the GNS representation with respect to $\omega$. Moreover, $\psi_1 \in \mathcal{H}_\omega$ is a cyclic vector, $\psi_b = \pi_\omega(b)\psi_1$, with the property


$$\omega(a) = \langle \psi_1, \pi_\omega(a)\psi_1 \rangle$$ [18]


These properties characterize the GNS representation $(\mathcal{H}_\omega, \pi_\omega, \psi_1)$ up to unitary equivalence.


Example 5 We can now apply this construction to 
the three basic examples and obtain the following 
well-known representations as GNS representations: 


(i) The GNS representation corresponding to the $\delta$-functional and the Wick star product is (unitarily equivalent to) the formal Bargmann-Fock representation. Here $\mathcal{H}_\delta = \mathbb{C}[[\bar y^1, \ldots, \bar y^n]][[\lambda]]$ with inner product


$$\langle \phi, \psi \rangle = \sum_{r=0}^{\infty} \frac{(2\lambda)^r}{r!}\, \overline{\frac{\partial^r \phi}{\partial \bar y^{i_1} \cdots \partial \bar y^{i_r}}(0)}\, \frac{\partial^r \psi}{\partial \bar y^{i_1} \cdots \partial \bar y^{i_r}}(0)$$ [19]

and $\pi_\delta$ is explicitly given by

$$\pi_\delta(f) = \sum_{r,s=0}^{\infty} \frac{(2\lambda)^r}{r!\,s!}\, \frac{\partial^{r+s} f}{\partial z^{i_1} \cdots \partial z^{i_r}\, \partial \bar z^{j_1} \cdots \partial \bar z^{j_s}}(0)\; \bar y^{j_1} \cdots \bar y^{j_s}\, \frac{\partial^r}{\partial \bar y^{i_1} \cdots \partial \bar y^{i_r}}$$ [20]

In particular, $\pi_\delta(z^i) = 2\lambda\, \partial/\partial \bar y^i$ and $\pi_\delta(\bar z^i) = \bar y^i$ are the annihilation and creation operators and [20] gives the Wick (or normal) ordering. This basic example has been extended to arbitrary Kähler manifolds by Bordemann and Waldmann (1998).

(ii) The Weyl-Moyal star product $\star_{\mathrm{Weyl}}$ and the Schrödinger functional $\omega$ as in [14] give the usual Schrödinger representation as GNS representation. We obtain $\mathcal{H}_\omega = C_0^\infty(\mathbb{R}^n)[[\lambda]]$ with inner product


$$\langle \phi, \psi \rangle = \int_{\mathbb{R}^n} \overline{\phi(q)}\, \psi(q)\, d^nq$$ [21]


and $\pi_\omega(f) = \varrho_{\mathrm{Weyl}}(f)$ as in [4] with $\hbar$ replaced by $\lambda$. The Schrödinger representation as a particular case of a GNS representation has been generalized to arbitrary cotangent bundles including representations on sections of line bundles over the configuration space (Dirac's representation for magnetic monopoles) by Bordemann, Neumaier, Pflaum, and Waldmann (1999, 2003). In this context, the WKB expansion can also be formulated.

(iii) For the positive trace tr, the GNS pre-Hilbert space is simply the space $\mathcal{H}_{\mathrm{tr}} = C_0^\infty(M)[[\lambda]]$ with inner product $\langle f, g \rangle = \mathrm{tr}(\bar f \star g)$. The corresponding GNS representation is the left regular representation $\pi_{\mathrm{tr}}(f)g = f \star g$. Note that in this case the commutant of the representation is (anti-)isomorphic to the observable algebra and given by all the right multiplications. Thus, $\pi_{\mathrm{tr}}$ is highly reducible and the size of the commutant indicates a "thermodynamical" interpretation of this representation. Indeed, one can take this GNS representation, and more generally those for arbitrary KMS functionals, as a starting point of a preliminary version of a Tomita-Takesaki theory for deformation quantization as shown by Waldmann (1999).


After these fundamental examples, we now reconsider the question of superpositions: in general, two (pure) states $\omega_1, \omega_2$ cannot be realized as vector states inside a single irreducible representation. One encounters superselection rules. Usually, for instance, in algebraic quantum field theory, the existence of superselection rules indicates the presence of charges. In particular, it is not sufficient to consider one single representation of the observable algebra $\mathcal{A}$. Instead, one has to investigate (as well as possible) all superselection sectors of the representation theory *-Rep($\mathcal{A}$) of $\mathcal{A}$ and find physically motivated criteria to select distinguished representations. In usual quantum mechanics on $\mathbb{R}^{2n}$, this turns out to be rather simple, thanks to the (nontrivial) uniqueness theorem of von Neumann: one has a unique irreducible representation of the Weyl algebra up to unitary equivalence. In infinite dimensions or in topologically nontrivial situations, however, von Neumann's theorem does not apply and one indeed has superselection rules.


In deformation quantization, some parts of these superselection rules have been understood well: again, for cotangent bundles $T^*Q$, one can classify the unitary equivalence classes of Schrödinger-like representations on $C_0^\infty(Q)[[\lambda]]$ by topological classes of nontrivial vector potentials. Thus, one arrives at the interpretation of the Aharonov-Bohm effect as a superselection rule where the classification is essentially given by $H^1_{\mathrm{deRham}}(Q, \mathbb{C}) / 2\pi i\, H^1_{\mathrm{deRham}}(Q, \mathbb{Z})$.


General Representation Theory 


Although it is very much desirable to determine the structure and the superselection sectors in *-Rep($\mathcal{A}$) completely, this is only achievable in the very simplest examples. Moreover, for formal star products, many artifacts due to the purely algebraic nature have to be expected: the Bargmann-Fock and Schrödinger representations in Example 5 are unitarily inequivalent and thus define a superselection rule; even the pre-Hilbert spaces are nonisomorphic. However, these artifacts vanish immediately when one imposes suitable convergence conditions together with appropriate topological completions (von Neumann's theorem). Given such problems, it is very difficult to find "hard" superselection rules which indeed have physical significance already at the formal level. Nevertheless, the example of the Aharonov-Bohm effect shows that this is possible. In any case, new techniques for investigating *-Rep($\mathcal{A}$) have to be developed. It turns out that comparing *-Rep($\mathcal{A}$) with some other *-Rep($\mathcal{B}$) is much simpler but still gives some nontrivial insight into the structure of the representation theory. Here the Morita theory provides a highly sophisticated tool.

The classical notion of Morita equivalence as well as Rieffel's more specialized strong Morita equivalence for C*-algebras have been transferred to deformation quantization and, more generally, to *-algebras $\mathcal{A}$ over $\mathsf{C} = \mathsf{R}(i)$ by Bursztyn and Waldmann (2001). The aim is to construct functors


$$F : {}^*\text{-Rep}(\mathcal{A}) \to {}^*\text{-Rep}(\mathcal{B})$$ [22]


which allow us to compare these categories and determine whether they are equivalent. But even if they are not equivalent, functors such as [22] are interesting. As an example, one considers the situation of classical phase space reduction $M \rightsquigarrow M_{\mathrm{red}}$ as it is present in every constraint system or gauge theory. Suppose one succeeded with the (highly nontrivial) problem of quantizing both classical phase spaces in a reasonable way, whence one has quantum observable algebras $\mathcal{A}$ and $\mathcal{A}_{\mathrm{red}}$. Then, of course, a relation between *-Rep($\mathcal{A}$) and *-Rep($\mathcal{A}_{\mathrm{red}}$) is of particular physical interest although one cannot expect both representation theories to be equivalent: $\mathcal{A}$ contains additional but physically irrelevant structure leading to possibly "more" representations.

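To get a clear picture of the Morita theory, one has to extend the notion of *-representations to the following framework: for an auxiliary *-algebra $\mathcal{D}$ over $\mathsf{C}$, one defines a pre-Hilbert right $\mathcal{D}$-module to be a right $\mathcal{D}$-module $\mathcal{H}$ together with a $\mathsf{C}$-sesquilinear $\mathcal{D}$-valued inner product $\langle \cdot, \cdot \rangle : \mathcal{H} \times \mathcal{H} \to \mathcal{D}$ such that $\langle \phi, \psi \rangle = \langle \psi, \phi \rangle^*$ and $\langle \phi, \psi \cdot d \rangle = \langle \phi, \psi \rangle d$ for $d \in \mathcal{D}$ and such that $\langle \cdot, \cdot \rangle$ is completely positive. This means $(\langle \phi_i, \phi_j \rangle) \in M_n(\mathcal{D})^+$ for all $\phi_1, \ldots, \phi_n$, where, in general, an algebra element $a \in \mathcal{A}$ is called positive, $a \in \mathcal{A}^+$, if $\omega(a) \ge 0$ for all positive linear functionals $\omega : \mathcal{A} \to \mathsf{C}$.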

Then one defines $\mathfrak{B}(\mathcal{H})$ analogously as for pre-Hilbert spaces, leading to a definition of a *-representation $\pi$ of $\mathcal{A}$ on a pre-Hilbert right $\mathcal{D}$-module $\mathcal{H}$. The corresponding category of *-representations is denoted by *-Rep$_{\mathcal{D}}$($\mathcal{A}$). Clearly, elements in *-Rep$_{\mathcal{D}}$($\mathcal{A}$) are in particular $(\mathcal{A}, \mathcal{D})$-bimodules.

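The advantage is that now one has a tensor product $\widehat\otimes$ taking care of the inner products as well. For *-algebras $\mathcal{A}, \mathcal{B}, \mathcal{C}$, one has a functor


$$\widehat\otimes : {}^*\text{-Rep}_{\mathcal{B}}(\mathcal{C}) \times {}^*\text{-Rep}_{\mathcal{A}}(\mathcal{B}) \to {}^*\text{-Rep}_{\mathcal{A}}(\mathcal{C})$$ [23]


which, on objects, is essentially given by $\otimes_{\mathcal{B}}$. In fact, for $\mathcal{F} \in {}^*\text{-Rep}_{\mathcal{B}}(\mathcal{C})$ and $\mathcal{E} \in {}^*\text{-Rep}_{\mathcal{A}}(\mathcal{B})$, one defines on the $(\mathcal{C}, \mathcal{A})$-bimodule $\mathcal{F} \otimes_{\mathcal{B}} \mathcal{E}$ an $\mathcal{A}$-valued inner product by $\langle x \otimes \phi, y \otimes \psi \rangle = \langle \phi, \langle x, y \rangle \cdot \psi \rangle$, which turns out to be well defined and completely positive again. Then $\mathcal{F} \,\widehat\otimes\, \mathcal{E}$ is $\mathcal{F} \otimes_{\mathcal{B}} \mathcal{E}$ equipped with this inner product modulo its possibly nonempty degeneracy space.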

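By fixing one of the arguments of $\widehat\otimes$, one obtains the functor of Rieffel induction of *-representations


$$R_{\mathcal{E}} : {}^*\text{-Rep}_{\mathcal{D}}(\mathcal{A}) \to {}^*\text{-Rep}_{\mathcal{D}}(\mathcal{B})$$ [24]


where $\mathcal{E} \in {}^*\text{-Rep}_{\mathcal{A}}(\mathcal{B})$ is fixed and $R_{\mathcal{E}}(\mathcal{H}) = \mathcal{E} \,\widehat\otimes\, \mathcal{H}$ for $\mathcal{H} \in {}^*\text{-Rep}_{\mathcal{D}}(\mathcal{A})$.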

The idea of strong Morita equivalence is then to search for such bimodules $\mathcal{E}$ where $R_{\mathcal{E}}$ gives an equivalence of categories. In detail, this is accomplished by the following definition, where, for simplicity, only unital *-algebras are considered.


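Definition 6 A $(\mathcal{B}, \mathcal{A})$-bimodule $\mathcal{E}$ is called a strong Morita equivalence bimodule if it is equipped with completely positive inner products $\langle \cdot, \cdot \rangle_{\mathcal{A}}$ and $\langle \cdot, \cdot \rangle_{\mathcal{B}}$ such that both inner products are full, in the sense that


$$\mathsf{C}\text{-span}\{ \langle x, y \rangle_{\mathcal{A}} \mid x, y \in \mathcal{E} \} = \mathcal{A}$$ [25]


and analogously for $\langle \cdot, \cdot \rangle_{\mathcal{B}}$, and compatible, in the sense that


$$\langle b \cdot x, y \rangle_{\mathcal{A}} = \langle x, b^* \cdot y \rangle_{\mathcal{A}}, \qquad \langle x \cdot a, y \rangle_{\mathcal{B}} = \langle x, y \cdot a^* \rangle_{\mathcal{B}}$$ [26]


$$\langle x, y \rangle_{\mathcal{B}} \cdot z = x \cdot \langle y, z \rangle_{\mathcal{A}}$$ [27]


In this case, $\mathcal{A}$ and $\mathcal{B}$ are called strongly Morita equivalent.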


It turns out that this is indeed an equivalence 
relation and that strong Morita equivalence implies 
the equivalence of the representation theories: 


Theorem 7 For unital *-algebras over $\mathsf{C}$, strong Morita equivalence is an equivalence relation.


Theorem 8 If $\mathcal{E}$ is a strong Morita equivalence bimodule, then $R_{\mathcal{E}}$ as in [24] is an equivalence of categories.


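Example 9 The fundamental example in Morita theory is that a unital *-algebra $\mathcal{A}$ is strongly Morita equivalent to the matrices $M_n(\mathcal{A})$ via the $(M_n(\mathcal{A}), \mathcal{A})$-bimodule $\mathcal{A}^n$ where the inner product is $\langle x, y \rangle_{\mathcal{A}} = \sum_{i=1}^{n} x_i^* y_i$ and $\langle \cdot, \cdot \rangle_{M_n(\mathcal{A})}$ is uniquely determined by the compatibility condition [27].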


An efficient way to encode the whole Morita theory of unital *-algebras over $\mathsf{C}$ is to collect all strong Morita equivalence bimodules modulo isometric isomorphisms of bimodules. Then the tensor product $\widehat\otimes$ makes this into a "large" groupoid whose units are the *-algebras themselves. This so-called Picard groupoid Pic then encodes everything one can say about strong Morita equivalence. In particular, the orbits of this groupoid are precisely the strong Morita equivalence classes of *-algebras. The isotropy groups are the Picard groups Pic($\mathcal{A}$) which generalize the (outer) automorphism groups.


Strong Morita Equivalence of Star 
Products 


This section considers star products from the viewpoint of Morita equivalence. Here one can show that for $\mathcal{A} = (C^\infty(M)[[\lambda]], \star)$, the possible candidates for equivalence bimodules are formal power series of sections $\Gamma^\infty(E)[[\lambda]]$ of vector bundles $E \to M$. This follows as, on the one hand, strong Morita equivalence is compatible with the classical limit $\lambda = 0$ in the sense that it implies strong Morita equivalence of the classical limits. On the other hand, any (classical or quantum) equivalence bimodule is finitely generated and projective as right $\mathcal{A}$-module. Thus, by the Serre-Swan theorem one obtains the sections of a vector bundle in the classical limit. Now one can show that every vector bundle can uniquely (up to equivalence) be deformed such that $\Gamma^\infty(E)[[\lambda]]$ becomes a right $\mathcal{A}$-module. Thus, the only thing to be computed is which deformation $\star'$ is induced by this deformation of $E$ for the endomorphisms $\Gamma^\infty(\mathrm{End}(E))[[\lambda]]$, since one can show that then the result will always be a strong Morita equivalence bimodule. The inner products come from deformations of a Hermitian fiber metric on $E$.

Since every vector bundle $E \to M$ can be deformed in this manner in an essentially unique way, we arrive at a general global construction of a noncommutative field theory where the fields are sections of $E$ endowed with a deformed bimodule structure. In the case where $M$ is even a symplectic manifold, a simple extension of Fedosov's construction of a star product $\star$ gives a rather explicit formula for the deformed bimodule structure of $\Gamma^\infty(E)[[\lambda]]$ including a construction of the deformation $(\Gamma^\infty(\mathrm{End}(E))[[\lambda]], \star')$ which acts from the left. As usual in Fedosov's approach, the construction depends functorially on the choice of a connection $\nabla^E$ for $E$.

Returning to the question of strong Morita equivalence of star products, we see that the vector bundle $E$ has to be a line bundle $L$ since only in this case do we have $\Gamma^\infty(\mathrm{End}(E)) = C^\infty(M)$. Since the deformation of the Hermitian fiber metric is always possible and since two equivalent Hermitian star products are always *-equivalent, one can show that strong Morita equivalence is already implied by ring-theoretic Morita equivalence (the converse is true in general).


Theorem 10 Star products are strongly Morita 
equivalent if and only if they are Morita equivalent. 


An analogous statement holds for C*-algebras, 
known as Beer’s theorem (1982). 

In the symplectic case, the characteristic class $c(\star')$ of the
induced star product $\star'$ can be computed explicitly, leading to the
following classification by Bursztyn and Waldmann (2002):


Theorem 11 Let $\star, \star'$ be star products on a symplectic
manifold $M$. Then $\star'$ is (strongly) Morita equivalent to $\star$
if and only if there exists a symplectomorphism $\psi$ such that

$$\psi^* c(\star') - c(\star) \in 2\pi i\, H^2_{\mathrm{deRham}}(M, \mathbb{Z})$$   [28]


A similar result in the general Poisson case was 
given by Jurčo, Schupp, and Wess (2002) based on 
Kontsevich’s formality theorem. This approach is 
motivated by a careful investigation of noncommu- 
tative (scalar) field theories. 


Finally, it is worth mentioning that [28] has a very simple physical
interpretation. Consider again a cotangent bundle $T^*Q$ with a
topologically nontrivial configuration space $Q$, for example,
$\mathbb{R}^3 \setminus \{0\}$. Then there is a canonical Weyl-type star
product $\star_{\mathrm{Weyl}}$ depending on the choice of a connection
$\nabla$ and an integration density $\mu > 0$, generalizing [7] to a
curved situation. Now let $B$ be a magnetic field, modeled as a closed
2-form on $Q$. Minimal coupling leads to a new star product
$\star^B_{\mathrm{Weyl}}$ describing an electrically charged particle
moving in $Q$ in the external field $B$. Then the two star products
$\star_{\mathrm{Weyl}}$ and $\star^B_{\mathrm{Weyl}}$ are (strongly)
Morita equivalent if and only if the magnetic field satisfies Dirac’s
integrality condition for the (possibly nontrivial) magnetic charges
described by $B$. Thus, Dirac’s condition is responsible for the very
strong statement that the quantizations with and without magnetic field
are Morita equivalent. In particular, the *-representation theories of
$\star_{\mathrm{Weyl}}$ and $\star^B_{\mathrm{Weyl}}$ are equivalent.
Even more specifically, using $B$ to construct a line bundle $L \to Q$,
one obtains the result that Dirac’s *-representation of
$\star^B_{\mathrm{Weyl}}$ on $\Gamma^\infty(L)[[\lambda]]$ is precisely
the Rieffel induction of the Schrödinger representation of
$\star_{\mathrm{Weyl}}$ on $C^\infty(Q)[[\lambda]]$.


See also: Aharonov-Bohm Effect; Algebraic Approach to 
Quantum Field Theory; Deformation Quantization; 
Deformation Theory; Deformations of the Poisson 
Bracket on a Symplectic Manifold; Fedosov Quantization.


Introduction and Historical Remarks


In mathematical deformation theory one studies how 
an object in a certain category of spaces can be varied 
as a function of the points of a parameter space. In 
other words, deformation theory thus deals with the 
structure of families of objects like varieties, singula- 
rities, vector bundles, coherent sheaves, algebras, or 
differentiable maps. Deformation problems appear in 
various areas of mathematics, in particular in algebra, 
algebraic and analytic geometry, and mathematical 
physics. According to Deligne, there is a common 
philosophy behind all deformation problems in 
characteristic zero. It is the goal of this survey to 
explain this point of view. Moreover, we will provide 
several examples with relevance for mathematical 
physics. 

Historically, modern deformation theory has its roots in the work of
Grothendieck, Artin, Quillen, Schlessinger, Kodaira–Spencer, Kuranishi,
Deligne, Grauert, Gerstenhaber, and Arnol’d. The application of
deformation methods to quantization theory goes back to
Bayen–Flato–Fronsdal–Lichnerowicz–Sternheimer, and has led to the
concept of a star product on symplectic and Poisson manifolds. The
existence of such star products has been proved by de Wilde–Lecomte
and Fedosov for symplectic and by Kontsevich for Poisson manifolds.

Recently, Fukaya and Kontsevich have found a 
far-reaching connection between general deforma- 
tion theory, the theory of moduli, and mirror 
symmetry. Thus, deformation theory comes back to 
its origins, which lie in the desire to construct 
moduli spaces. Briefly, a moduli problem can be 
described as the attempt to collect all isomorphism 
classes of spaces of a certain type into one single 
object, the moduli space, and then to study its 
geometric and analytic properties. The observations 
by Fukaya and Kontsevich have led to new insight 
into the algebraic geometry of mirror varieties and 
their application to string theory. 


Basic Definitions and Examples 


Deformation theory is based on the notion of a 
ringed space, so we briefly recall its definition. 


Definition 1 Let $k$ be a field. By a $k$-ringed space one understands
a topological space $X$ together with a sheaf $\mathcal{A}$ of unital
$k$-algebras on $X$. The sheaf $\mathcal{A}$ will be called the
structure sheaf of the ringed space. In case each of the stalks
$\mathcal{A}_x$, $x \in X$, is a local algebra, that is, has a unique
maximal ideal $\mathfrak{m}_x$, one calls $(X, \mathcal{A})$ a locally
$k$-ringed space. Likewise, one defines a commutative $k$-ringed space
as a ringed space such that the stalks of the structure sheaf are all
commutative.


Given two $k$-ringed spaces $(X, \mathcal{A})$ and $(Y, \mathcal{B})$, a
morphism from $(X, \mathcal{A})$ to $(Y, \mathcal{B})$ is a pair
$(f, \varphi)$, where $f: X \to Y$ is a continuous mapping and
$\varphi: f^{-1}\mathcal{B} \to \mathcal{A}$ a morphism of sheaves of
algebras. This means in particular that for every point $x \in X$ there
is a homomorphism of algebras
$\varphi_x: \mathcal{B}_{f(x)} \to \mathcal{A}_x$ induced by $\varphi$.
Under the assumption that both ringed spaces are local, $(f, \varphi)$
is called a morphism of locally ringed spaces, if each $\varphi_x$ is a
homomorphism of local $k$-algebras, that is, maps the maximal ideal of
$\mathcal{B}_{f(x)}$ to the one of $\mathcal{A}_x$.

Clearly, k-ringed spaces (resp. locally or commu- 
tative k-ringed spaces) together with their morphisms 
form a category. The following is a list of examples of 
ringed spaces, in particular of those which will be 
needed later. 


Example 2

(i) Denote by $\mathcal{C}^\infty$ the sheaf of smooth functions on
$\mathbb{R}^n$, by $\mathcal{C}^\omega$ the sheaf of real analytic
functions, and let $\mathcal{O}$ be the sheaf of holomorphic functions
on $\mathbb{C}^n$. Then $(\mathbb{R}^n, \mathcal{C}^\infty)$,
$(\mathbb{R}^n, \mathcal{C}^\omega)$, and $(\mathbb{C}^n, \mathcal{O})$
are ringed spaces over $\mathbb{R}$ resp. $\mathbb{C}$.

(ii) A differentiable manifold of dimension $n$ can be understood as a
locally $\mathbb{R}$-ringed space $(M, \mathcal{C}^\infty_M)$ which
locally is isomorphic to $(\mathbb{R}^n, \mathcal{C}^\infty)$. Likewise,
a real analytic manifold is a ringed space $(M, \mathcal{C}^\omega_M)$
which locally can be modeled by $(\mathbb{R}^n, \mathcal{C}^\omega)$,
and a complex manifold is an $(M, \mathcal{O}_M)$ which locally looks
like $(\mathbb{C}^n, \mathcal{O})$.

(iii) Let $D$ be a domain in $\mathbb{C}^n$, and $\mathcal{J}$ an ideal
sheaf in $\mathcal{O}_D$ of finite type, which means that $\mathcal{J}$
is locally finitely generated over $\mathcal{O}_D$. Let $Y$ be the
support of the quotient sheaf $\mathcal{O}_D/\mathcal{J}$. The pair
$(Y, \mathcal{O}_Y)$, where $\mathcal{O}_Y$ denotes the restriction of
$\mathcal{O}_D/\mathcal{J}$ to $Y$, then is a ringed space, called a
complex model space. A complex space now is a ringed space
$(X, \mathcal{O}_X)$ which locally looks like a complex model space
(cf. Grauert and Remmert 1984).

(iv) Let $k$ be an algebraically closed field, and $\mathbb{A}^n$ the
affine space over $k$ of dimension $n$. Then $\mathbb{A}^n$, together
with the sheaf of regular functions, is a ringed space.

(v) Given a ring $A$, its spectrum $\mathrm{Spec}\, A$ together with the
sheaf of regular functions $\mathcal{O}_A$ forms a ringed space (cf.
Hartshorne (1977), section II.2). One calls
$(\mathrm{Spec}\, A, \mathcal{O}_A)$ an affine scheme. More generally, a
scheme is a ringed space $(X, \mathcal{O}_X)$ which locally can be
modeled by affine schemes.

(vi) Finally, if $A$ is a local $k$-algebra, the pair $(\ast, A)$ can be
understood as a locally ringed space. With $A$ the algebra of formal
power series $k[[t]]$ over one variable $t$, this example plays an
important role in the theory of formal deformations of algebras.


Figure 1 A fibered space.


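Definition 3 A morphism
$(f, \varphi): (Y, \mathcal{B}) \to (P, \mathcal{S})$ of ringed spaces
is called fibered, if the following conditions are fulfilled:


(i) $(P, \mathcal{S})$ is a commutative locally ringed space;
(ii) $f: Y \to P$ is surjective; and
(iii) $\varphi_y: \mathcal{S}_{f(y)} \to \mathcal{B}_y$ maps
$\mathcal{S}_{f(y)}$ into the center of $\mathcal{B}_y$ for each
$y \in Y$.


The fiber of $(f, \varphi)$ over a point $p \in P$ then is the ringed
space $(Y_p, \mathcal{B}_p)$ defined by

$$Y_p = f^{-1}(p), \qquad \mathcal{B}_p = \mathcal{B}|_{f^{-1}(p)} \big/ \mathfrak{m}_p\, \mathcal{B}|_{f^{-1}(p)},$$

where $\mathfrak{m}_p$ is the maximal ideal of $\mathcal{S}_p$, which
acts on $\mathcal{B}|_{f^{-1}(p)}$ via $\varphi$.

A fibered morphism of ringed spaces can be pictured as in Figure 1.

In addition to this intuitive picture, conditions (i)–(iii) imply that
the stalks $\mathcal{B}_y$ are central extensions of
$\mathcal{B}_y / \mathfrak{m}_{f(y)} \mathcal{B}_y$ by
$\mathcal{S}_{f(y)}$.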


Definition 4 Let $(P, \mathcal{S})$ be a commutative locally ringed
space over a field $k$ with $P$ connected, let $\ast$ be a fixed point
in $P$, and $(X, \mathcal{A})$ a $k$-ringed space. A deformation of
$(X, \mathcal{A})$ over the parameter space $(P, \mathcal{S})$ with
distinguished point $\ast$ then is a fibered morphism
$(f, \varphi): (Y, \mathcal{B}) \to (P, \mathcal{S})$ over $k$ together
with an isomorphism
$(i, \iota): (X, \mathcal{A}) \to (Y_\ast, \mathcal{B}_\ast)$ such that
for all $p \in P$ and $y \in f^{-1}(p)$ the homomorphism
$\varphi_y: \mathcal{S}_p \to \mathcal{B}_y$ is flat.


The condition of flatness in the definition of a deformation serves as
a substitute for “local triviality” and works also in the presence of
singularities. See Palamodov (1990, section 3) for a discussion of this
point.

In the remainder of this section, we provide a list 
of some of the most important deformation pro- 
blems in mathematics, and show how these can be 
formulated within the above language. 


Products of k-Ringed Spaces


Let $(X, \mathcal{A})$ be any $k$-ringed space and $(P, \mathcal{S})$ a
$k$-scheme. For any closed point $\ast \in P$, the product
$(X \times P, \mathcal{B}) = (X, \mathcal{A}) \times_k (P, \mathcal{S})$
then is a flat deformation of $(X, \mathcal{A})$ with distinguished
point $\ast$. This can be seen easily from the fact that
$\mathcal{B}_{(x,p)} = \mathcal{A}_x \otimes_k \mathcal{S}_p$ for every
$x \in X$ and $p \in P$.


Families of Matrices as Deformations 


Let $(P, \mathcal{O}_P)$ be a complex space with distinguished point
$\ast$ and $A_P: P \to \mathrm{Mat}(n \times n, \mathbb{C})$ a
holomorphic family of complex $n \times n$ matrices over $P$. By the
following construction, $A_P$ can be understood as a deformation, more
precisely as a deformation of the matrix $A := A_P(\ast)$. Let $Y$ be
the graph of $A_P$ in the product space
$P \times \mathrm{Mat}(n \times n, \mathbb{C})$ and $f: Y \to P$ be the
restriction of the projection onto the first coordinate. Define the
sheaf $\mathcal{B}$ as the inverse image sheaf $f^{-1}\mathcal{S}$, where
$\mathcal{S} = \mathcal{O}_P$, and let $\varphi$ be the sheaf morphism
which for every $y \in Y$ is induced by the identity map
$\varphi_y: \mathcal{S}_{f(y)} \to \mathcal{B}_y := \mathcal{S}_{f(y)}$.
It is then immediately clear that $(f, \varphi)$ is a deformation of
the fiber $f^{-1}(\ast)$ and that this fiber coincides with the matrix
$A$.

Now let $A$ be an arbitrary complex $n \times n$ matrix, and choose a
$\mathrm{GL}(n, \mathbb{C})$-slice through $A$, that is, a submanifold
$P$ containing $A$ which is transversal to the
$\mathrm{GL}(n, \mathbb{C})$-orbit through $A$. Hereby, it is assumed
that $\mathrm{GL}(n, \mathbb{C})$ acts by the adjoint action on
$\mathrm{Mat}(n \times n, \mathbb{C})$. The family $A_P$ given by the
canonical embedding $P \to \mathrm{Mat}(n \times n, \mathbb{C})$ now is
a deformation of $A$. The germ of this deformation at $\ast$ is versal
in the sense defined in the next section.


Deformation of a Scheme a la Grothendieck 


Assume that $(P, \mathcal{S})$ is a connected scheme over $k$. A
deformation of a scheme $(X, \mathcal{A})$ then is a deformation
$(f, \varphi): (Y, \mathcal{B}) \to (P, \mathcal{S})$ in the sense
defined above, together with the requirement that $f: Y \to P$ is a
proper map, that is, $f^{-1}(K)$ is compact for every compact
$K \subset P$. As a particular example, consider the $k$-scheme
$Y = \mathrm{Spec}\, k[x, y, t]/(xy - t)$. It gives rise to a fibration
$Y \to \mathrm{Spec}\, k[t]$, whose fibers $Y_a$ with $a \in k$ are
hyperbolas $xy = a$ when $a \neq 0$, and consist of the two axes $x = 0$
and $y = 0$ when $a = 0$. For $k = \mathbb{R}$, this deformation can be
illustrated as in Figure 2.

For further information on this and similar examples, see Hartshorne
(1977), in particular example 3.3.2.


Deformation of a Complex Space 


According to Grothendieck, one understands by a deformation of a
complex space $(X, \mathcal{A})$ a morphism of complex spaces
$(f, \varphi): (Y, \mathcal{B}) \to (P, \mathcal{S})$ which is both a
proper flat morphism of complex spaces and a deformation of
$(X, \mathcal{A})$ as a ringed space. In case $(X, \mathcal{A})$ and
$(P, \mathcal{S})$ are complex manifolds and if $P$ is connected, each
of the fibers $Y_p$ is a compact complex manifold. Moreover, the family
$(Y_p)_{p \in P}$ then is a family of compact complex manifolds in the
sense of Kodaira–Spencer (cf. Palamodov (1990)).


Figure 2 Deformation of the coordinate axes.


Deformation of Singularities 


Let $x$ be a point of some $\mathbb{C}^n$. Two complex spaces
$(X, \mathcal{O}_X) \subset (\mathbb{C}^n, \mathcal{O})$ and
$(X', \mathcal{O}_{X'}) \subset (\mathbb{C}^n, \mathcal{O})$ with
$x \in X \cap X'$ are then called germ equivalent at $x$ if there exists
an open neighborhood $U \subset \mathbb{C}^n$ of $x$ such that
$X \cap U = X' \cap U$. Obviously, germ equivalence at $x$ is an
equivalence relation. We denote the equivalence class of $X$ by
$[X]_x$. Clearly, if $[X]_x = [X']_x$, then one has
$\mathcal{O}_{X,x} = \mathcal{O}_{X',x}$ for the stalks at $x$. By a
singularity one understands a pair $([X]_x, \mathcal{O}_{X,x})$. In the
literature, such a singularity is often denoted by $(X, x)$. The
singularity $(X, x)$ is called nonsingular or regular if
$\mathcal{O}_{X,x}$ is isomorphic to an algebra of convergent power
series $\mathbb{C}\{z_1, \ldots, z_d\}$. A deformation of a complex
singularity $(X, x)$ over a complex germ $(P, \ast)$ is a morphism of
ringed spaces
$([Y]_x, \mathcal{O}_{Y,x}) \to ([P]_\ast, \mathcal{O}_{P,\ast})$ which
is induced by a holomorphic map and which is a deformation of
$([X]_x, \mathcal{O}_{X,x})$ as a ringed space. See Artin (1976) and the
overview article by Greuel (1992) for further details and a variety of
examples.


First-Order Deformation of Algebras 


Consider a $k$-algebra $A$ and the truncated polynomial algebra
$S = k[\varepsilon]/\varepsilon^2 k[\varepsilon]$. Furthermore, let
$\alpha: A \times A \to A$ be a Hochschild 2-cocycle of $A$; in other
words, assume that the relation

$$a_1 \alpha(a_2, a_3) - \alpha(a_1 a_2, a_3) + \alpha(a_1, a_2 a_3) - \alpha(a_1, a_2)\, a_3 = 0$$   [1]

holds for all $a_1, a_2, a_3 \in A$. Then one can define a new
$k$-algebra $B$, whose underlying linear structure is isomorphic to
$A \otimes_k S$ and whose product is given by the following
construction: any element $b \in B$ can be written uniquely in the form
$b = a_0 + a_1 \varepsilon$, with $a_0, a_1 \in A$. Then the product of
$b = a_0 + a_1 \varepsilon \in B$ and
$b' = a_0' + a_1' \varepsilon \in B$ is given by

$$b \cdot b' = a_0 a_0' + \big( \alpha(a_0, a_0') + a_0 a_1' + a_1 a_0' \big)\, \varepsilon$$   [2]

By condition [1], this product is associative. One thus obtains a flat
deformation $S \to B$ of the algebra $A$ and calls it the first-order or
infinitesimal deformation of $A$ along the Hochschild cocycle $\alpha$.
For further information on this and the connection between deformation
theory and Hochschild cohomology, see the overview article by
Gerstenhaber and Schack (1986).
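The construction can be checked numerically in a small case. The following is a minimal sketch, not taken from the article: it assumes $A = \mathbb{Z}[x]$ with polynomials stored as coefficient tuples, and takes as cocycle the illustrative choice $\alpha(a, b) = a'b'$ (product of derivatives), which satisfies [1] by the Leibniz rule. All function names are hypothetical.

```python
from itertools import product

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial is ()."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return tuple(p)

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def neg(p):
    return tuple(-c for c in p)

def mul(p, q):
    if not p or not q:
        return ()
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def deriv(p):
    return trim([i * c for i, c in enumerate(p)][1:])

def alpha(a, b):
    """Assumed example cocycle: alpha(a, b) = a' * b'."""
    return mul(deriv(a), deriv(b))

def cocycle_defect(a1, a2, a3):
    """Left-hand side of relation [1]; () means the relation holds."""
    t = mul(a1, alpha(a2, a3))
    t = add(t, neg(alpha(mul(a1, a2), a3)))
    t = add(t, alpha(a1, mul(a2, a3)))
    return add(t, neg(mul(alpha(a1, a2), a3)))

def bmul(b, c):
    """Product [2] on pairs (a0, a1) representing a0 + a1*eps."""
    (a0, a1), (c0, c1) = b, c
    return (mul(a0, c0), add(alpha(a0, c0), add(mul(a0, c1), mul(a1, c0))))

def is_associative(samples):
    return all(bmul(bmul(b, c), d) == bmul(b, bmul(c, d))
               for b, c, d in product(samples, repeat=3))
```

For instance, `bmul(((0, 1), ()), ((0, 1), ()))` computes $x \cdot x$ in $B$ and returns the pair representing $x^2 + \varepsilon$, so the deformed product differs from the original one already on the generator.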


Formal Deformation of an Algebra 


Let us generalize the preceding example and explain the concept of a
formal deformation of an algebra by Gerstenhaber. Assume again $A$ to
be an arbitrary $k$-algebra and choose bilinear maps
$\alpha_n: A \times A \to A$ for $n \in \mathbb{N}$ such that $\alpha_0$
is the product on $A$ and $\alpha_1$ is a Hochschild cocycle.
Furthermore, let $S$ be the algebra $k[[t]]$ of formal power series in
one variable over $k$. Then define on the linear space $B = A[[t]]$ of
formal power series in one variable with coefficients in $A$ the
following bilinear map:

$$\star: B \times B \to B, \qquad \Big( \sum_{n \in \mathbb{N}} a_n t^n, \sum_{n \in \mathbb{N}} b_n t^n \Big) \mapsto \sum_{n \in \mathbb{N}} \sum_{k + l + m = n} \alpha_m(a_k, b_l)\, t^n$$   [3]

If $B$ together with $\star$ becomes a $k$-algebra or, in other words,
if $\star$ is associative, one can easily see that it gives a flat
deformation of $A$ over $S = k[[t]]$. In that case, one says that $B$ is
a formal deformation of $A$ by the family
$(\alpha_n)_{n \in \mathbb{N}}$. In contrast to the preceding example,
there might not exist for every Hochschild cocycle $\alpha$ on $A$ a
formal deformation $B$ of $A$ defined by a family
$(\alpha_n)_{n \in \mathbb{N}}$ such that $\alpha_1 = \alpha$. In case
it exists, we will say that the deformation $B$ of $A$ is in the
direction of $\alpha$. If the third Hochschild cohomology group
$H^3(A, A)$ vanishes, there exists for every Hochschild cocycle $\alpha$
on $A$ a deformation $B$ of $A$ in the direction of $\alpha$ (see again
Gerstenhaber and Schack (1986) for further details).
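To see where the cocycle condition and the group $H^3(A, A)$ enter, it may help to expand associativity of the product order by order in $t$. The following is a standard computation written out as a sketch; the symbol $\delta$ for the Hochschild coboundary on bilinear maps is introduced here for illustration only.

```latex
% Associativity of the formal product, compared at order t^n:
\sum_{i+j=n} \alpha_i\big(\alpha_j(a,b),c\big)
  \;=\; \sum_{i+j=n} \alpha_i\big(a,\alpha_j(b,c)\big)
  \qquad \text{for all } n \in \mathbb{N}.

% n = 0: associativity of the original product \alpha_0(a,b) = ab.
% n = 1: the Hochschild cocycle condition for \alpha_1:
a\,\alpha_1(b,c) - \alpha_1(ab,c) + \alpha_1(a,bc) - \alpha_1(a,b)\,c = 0.

% n = 2: with (\delta\beta)(a,b,c) := a\,\beta(b,c) - \beta(ab,c)
%        + \beta(a,bc) - \beta(a,b)\,c one finds
(\delta\alpha_2)(a,b,c)
  \;=\; \alpha_1\big(\alpha_1(a,b),c\big) - \alpha_1\big(a,\alpha_1(b,c)\big).
```

The right-hand side of the last line is a Hochschild 3-cocycle, so a suitable $\alpha_2$ exists precisely when its class in $H^3(A, A)$ vanishes; the higher orders lead to obstructions of the same kind.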


Formal Deformation Quantization of Symplectic 
and Poisson Manifolds 


Let us consider the last two examples for the case where $A$ is the
algebra $C^\infty(M)$ of smooth functions on a symplectic or Poisson
manifold $M$. Then the Poisson bracket $\{\,,\}$ gives a Hochschild
cocycle on $C^\infty(M)$. There exists a first-order deformation of
$C^\infty(M)$ along $(1/2i)\{\,,\}$ and, even though $H^3(A, A)$ might
not always vanish, a deformation quantization of $M$, that means a
formal deformation of $C^\infty(M)$ in the direction of the Poisson
bracket $(1/2i)\{\,,\}$. For the symplectic case, this fact has been
proved first by de Wilde–Lecomte using methods from Hochschild
cohomology theory. A more geometric and intuitive proof has been given
by Fedosov (1996). The Poisson case has been settled in the work of
Kontsevich (2003) (see also the section “Deformation quantization of
Poisson manifolds”).


Quantized Universal Enveloping Algebras 
According to Drinfeld 


A quantized universal enveloping algebra for a complex Lie algebra
$\mathfrak{g}$ is a Hopf algebra $A$ over $\mathbb{C}[[t]]$ such that
$A$ is a topologically free $\mathbb{C}[[t]]$-module (i.e.,
$A = (A/tA)[[t]]$ as left $\mathbb{C}[[t]]$-module) and $A/tA$ is the
universal enveloping algebra $U\mathfrak{g}$ of $\mathfrak{g}$. Because
$A$ is a topologically free $\mathbb{C}[[t]]$-module, $A$ is a flat
$\mathbb{C}[[t]]$-module and thus a deformation of $U\mathfrak{g}$ over
$\mathbb{C}[[t]]$. See Drinfel’d (1986) and the monograph by Kassel
(1995) for further details and examples of quantized universal
enveloping algebras.


Quantum Plane 


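Consider the tensor algebra
$T = \bigoplus_{n \in \mathbb{N}} (\mathbb{R}^2)^{\otimes n}$ of the
two-dimensional real vector space $\mathbb{R}^2$, and let $(x, y)$ be
the canonical basis of $\mathbb{R}^2$. Then form the tensor product
sheaf
$\mathcal{T}_{\mathbb{C}^*} = T \otimes_{\mathbb{R}} \mathcal{O}_{\mathbb{C}^*}$
and let $\mathcal{I}_{\mathbb{C}^*}$ be the ideal sheaf in
$\mathcal{T}_{\mathbb{C}^*}$ generated by the relation

$$x \otimes y - z\, y \otimes x = 0$$   [4]

where $z: \mathbb{C}^* \to \mathbb{C}$ is the identity function. The
quotient sheaf
$\mathcal{B} = \mathcal{B}_{\mathbb{C}^*} = \mathcal{T}_{\mathbb{C}^*} / \mathcal{I}_{\mathbb{C}^*}$
then is a sheaf of $\mathbb{C}$-algebras and an
$\mathcal{O}_{\mathbb{C}^*}$-module. Using eqn [4], now move all
occurrences of $x$ in an element of $\mathcal{B}_{\mathbb{C}^*}$ to the
right of all $y$’s. Since $1/z$ is an element of
$\mathcal{O}(\mathbb{C}^*)$, one can thus show that
$\mathcal{B}_{\mathbb{C}^*}$ is a free
$\mathcal{O}_{\mathbb{C}^*}$-module. Hence, $\mathcal{B}_{\mathbb{C}^*}$
is flat over $\mathcal{O}_{\mathbb{C}^*}$. Further, it is easy to see
that for every $q \in \mathbb{C}^*$ the $\mathbb{C}$-algebra
$A_q = \mathcal{B}_q / \mathfrak{m}_q \mathcal{B}_q$ is generated by
elements $x, y$ subject to the relation

$$x \otimes y - q\, y \otimes x = 0$$   [5]

We call $A_q$ the $q$-deformed quantum plane and
$B = \mathcal{B}(\mathbb{C}^*)$ the universally deformed quantum plane
over $\mathbb{C}^*$. Altogether, one can interpret $\mathcal{B}$ as a
deformation of $A_q$ over $\mathbb{C}^*$, in particular as a
deformation of $A_1 \cong \mathbb{C}[x, y]$, the algebra of complex
polynomials in two generators.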
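The reordering argument above can be made concrete. The sketch below is an illustration under stated assumptions rather than part of the article: elements of $A_q$ are stored as dictionaries mapping exponent pairs $(a, b)$ to the coefficient of the ordered monomial $y^a x^b$, the parameter $q$ is taken to be a nonzero rational number, and the function and variable names are hypothetical.

```python
from fractions import Fraction

def qmul(p, r, q):
    """Multiply two elements of A_q written in the ordered basis y^a x^b.

    Moving x^b to the right of y^c uses x y = q y x exactly b*c times,
    so (y^a x^b)(y^c x^d) = q^(b*c) y^(a+c) x^(b+d).
    """
    out = {}
    for (a, b), c1 in p.items():
        for (c, d), c2 in r.items():
            key = (a + c, b + d)
            out[key] = out.get(key, 0) + c1 * c2 * q ** (b * c)
    return {k: v for k, v in out.items() if v != 0}

# Generators in the ordered basis: x = y^0 x^1, y = y^1 x^0.
X = {(0, 1): Fraction(1)}
Y = {(1, 0): Fraction(1)}
```

With `q = Fraction(3, 2)`, the call `qmul(X, Y, q)` returns `{(1, 1): Fraction(3, 2)}` while `qmul(Y, X, q)` returns `{(1, 1): Fraction(1, 1)}`, which is relation [5]; for `q = 1` the product is commutative, in line with $A_1 \cong \mathbb{C}[x, y]$.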

In the same way, one can deform function algebras on
higher-dimensional vector spaces as well as function algebras on
certain Lie groups. In this manner, one obtains the quantum group
$SU_q(2)$ as a deformation of a Hopf algebra of functions on $SU(2)$.
See, for example, the work of Faddeev–Reshetikhin–Takhtajan (1990),
Manin (1988) and Wess–Zumino (1990) for more information on
$q$-deformations of vector spaces, Lie groups, differential calculi,
etc.


Versal Deformations 


In this section, and the ones that follow, we consider only germs of
deformations, that is, deformations over parameter spaces of the form
$(\ast, S)$. This means in particular that the structure sheaf only
consists of its stalk $S$ at $\ast$, a commutative local $k$-algebra.
Let us now suppose that the sheaf morphism
$\varphi: (Y, \mathcal{B}) \to (\ast, S)$ (over the canonical map
$Y \to \ast$) is a deformation of the ringed space $(X, \mathcal{A})$
and that $\tau: T \to S$ is a homomorphism of commutative local
$k$-algebras. Then the sheaf morphism
$\tau^*\varphi: \mathcal{B} \otimes_S T \to T$ with
$(\tau^*\varphi)_y(t) = 1 \otimes t$ for $y \in Y$ and $t \in T$ is a
deformation of $(X, \mathcal{A})$ over the parameter space $(\ast, T)$.
One says that the deformation $\tau^*\varphi$ is induced by the
homomorphism $\tau$.


Definition 5 A deformation $\varphi: (Y, \mathcal{B}) \to (\ast, S)$ of
$(X, \mathcal{A})$ is called versal if every (germ of a) deformation of
$(X, \mathcal{A})$ is isomorphic to a deformation germ induced by a
homomorphism of $k$-algebras $\tau: T \to S$. A versal deformation is
called universal if the inducing homomorphism $\tau: T \to S$ is
unique, and miniversal if $S$ is of minimal dimension.


Example 6


(i) In the section “Families of matrices as deformations,” the
construction of a versal deformation of a complex matrix $A$ has been
sketched.

(ii) According to Kuranishi, every compact complex manifold has a
versal deformation by an analytic germ. See Kuranishi (1971) for a
detailed exposition and the section “The Kodaira–Spencer algebra
controlling deformations of compact complex manifolds” for a
description of the principal ideas.

(iii) Grauert has shown that for isolated singularities there exists a
versal analytic deformation.

(iv) By the work of Douady–Verdier, Grauert, and Palamodov one knows
that for every compact complex space there exists a miniversal analytic
deformation. One of the essential methods in the existence proof hereby
is Palamodov’s construction of the cotangent complex (see Palamodov
(1990)).

(v) Bingener (1987) has further developed Palamodov’s approach and thus
has provided a unified and quite general method for constructing versal
deformations in analytic geometry.

(vi) Fialowski–Fuchs have constructed miniversal deformations of Lie
algebras.


Schlessinger’s Theorem 


According to Grothendieck, spaces in algebraic geometry are represented
by functors from a category of commutative rings to the category of
sets. In this picture, an affine algebraic variety $X$ over the base
field $k$ and with coordinate ring $A$ is equivalently described by the
functor $\mathrm{Hom}_{\mathrm{alg}}(A, -)$ defined on the category of
commutative $k$-algebras. As will be shown by examples in the next
section, versal deformations are often encoded by functors representing
spaces. More precisely, a deformation problem leads to a so-called
functor of Artin rings, which means a covariant functor $F$ from the
category of (local) Artinian $k$-algebras to the category of sets such
that the set $F(k)$ has exactly one element. The question now arises as
to under which conditions the functor $F$ is representable, that is,
there exists a commutative $k$-algebra $A$ such that
$F \cong \mathrm{Hom}_{\mathrm{alg}}(A, -)$. In the work of
Schlessinger (1968), the structure of functors of Artin rings has been
studied in detail. Moreover, criteria have been established for when
such a functor is pro-representable, which means that it can be
represented by a complete local algebra $A$, where “completeness” is
understood with respect to the $\mathfrak{m}$-adic topology. Because of
its importance for deformation theory, we will state Schlessinger’s
theorem in this section. Before we come to its details, let us recall
some notation.


Definition 7 By an Artinian $k$-algebra over a field $k$ one
understands a commutative $k$-algebra $R$ which satisfies the following
descending chain condition:


(Dcc) Every descending chain
$I_1 \supset \cdots \supset I_k \supset I_{k+1} \supset \cdots$ of
ideals in $R$ becomes stationary.


Among others, an Artinian algebra $R$ has the following properties:


1. $R$ is Noetherian, that is, it satisfies the ascending chain
condition.

2. Every prime ideal in $R$ is maximal.

3. (Chinese remainder theorem) $R$ is isomorphic to a finite product
$\prod_{i=1}^{n} R_i$, where each $R_i$ is a local Artinian algebra.

4. If $R$ is local, its maximal ideal $\mathfrak{m}$ is nilpotent, that
is, $\mathfrak{m}^k = 0$ for some $k \in \mathbb{N}$.

5. Every quotient $R/\mathfrak{m}^k$ with $\mathfrak{m}$ maximal is
finite dimensional.


Definition 8 Assume that $f: B \to A$ is a surjective homomorphism in
the category $k\text{-}\mathcal{A}lg_{l,\mathrm{Art}}$ of local Artinian
$k$-algebras. Then $f$ is called a small extension if $\ker f$ is a
nonzero principal ideal $(b)$ in $B$ such that $\mathfrak{m} b = (0)$,
where $\mathfrak{m}$ is the maximal ideal of $B$.


Theorem 9 (Schlessinger (1968, theorem 2.11)). Let $F$ be a functor of
Artin rings (over the base field $k$). Assume that $A' \to A$ and
$A'' \to A$ are morphisms in $k\text{-}\mathcal{A}lg_{l,\mathrm{Art}}$,
and consider the map

$$F(A' \times_A A'') \to F(A') \times_{F(A)} F(A'')$$   [6]

Then $F$ is pro-representable if and only if $F$ has the following
properties:


(H1) The map [6] is a surjection whenever $A'' \to A$ is a small
extension.
(H2) The map [6] is a bijection when $A = k$ and
$A'' = k[\varepsilon]$.
(H3) One has $\dim_k(t_F) < \infty$ for the tangent space
$t_F := F(k[\varepsilon])$.
(H4) For every small extension $A' \to A$, the map
$F(A' \times_A A') \to F(A') \times_{F(A)} F(A')$ is an isomorphism.


Suppose that the functor $F$ satisfies conditions (H1)–(H4), and let
$A$ be an arbitrary complete local $k$-algebra. By Yoneda’s lemma,
every element

$$\xi = \varprojlim_{n \in \mathbb{N}} \xi_n \in \hat{F}(A) := \varprojlim_{n \in \mathbb{N}} F(A/\mathfrak{m}^n)$$

induces a natural transformation

$$\mathrm{Hom}_{\mathrm{alg}}(A, -) \to F, \qquad (u: A \to R) \mapsto F(u_n)(\xi_n)$$   [7]

where $n \in \mathbb{N}$ is chosen large enough such that the
homomorphism $u: A \to R$ factors through some
$u_n: A/\mathfrak{m}^n \to R$. This is indeed possible, since $R$ is
Artinian. In the course of the proof of Schlessinger’s theorem, $A$ and
the element $\xi \in \hat{F}(A)$ are now constructed in such a way that
[7] is an isomorphism.


Differential Graded Lie Algebras 
and Deformation Problems 


According to a philosophy going back to Deligne, “every deformation
problem in characteristic zero is controlled by a differential graded
Lie algebra, with quasi-isomorphic differential graded Lie algebras
giving the same deformation theory” (cf. Goldman and Millson (1988),
p. 48). In the following, we will explain the main idea of this concept
and apply it to two particular examples.


Differential Graded Lie Algebras


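Definition 10 By a graded algebra over a field $k$ one understands a
graded $k$-vector space
$A^\bullet = \bigoplus_{k \in \mathbb{Z}} A^k$ together with a bilinear
map

$$\mu: A^\bullet \times A^\bullet \to A^\bullet, \qquad (a, b) \mapsto a \cdot b = \mu(a, b)$$

such that $A^k \cdot A^l \subset A^{k+l}$ for all $k, l \in \mathbb{Z}$.
The graded algebra $A^\bullet$ is called associative if
$(ab)c = a(bc)$ for all $a, b, c \in A^\bullet$.

A graded subalgebra of $A^\bullet$ is a graded subspace
$B^\bullet = \bigoplus_{k \in \mathbb{Z}} B^k \subset A^\bullet$ which
is closed under $\mu$; a graded ideal is a graded subalgebra
$I^\bullet \subset A^\bullet$ such that
$I^\bullet \cdot A^\bullet \subset I^\bullet$ and
$A^\bullet \cdot I^\bullet \subset I^\bullet$.

A homomorphism between graded algebras $A^\bullet$ and $B^\bullet$ is a
homogeneous map $f: A^\bullet \to B^\bullet$ of degree 0 such that
$f(a \cdot b) = f(a) \cdot f(b)$ for all $a, b \in A^\bullet$.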


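From now on, assume that $k$ has characteristic $\neq 2, 3$. A graded
Lie algebra then is a graded $k$-vector space
$\mathfrak{g}^\bullet = \bigoplus_{k \in \mathbb{Z}} \mathfrak{g}^k$
together with a bilinear map

$$[\cdot, \cdot]: \mathfrak{g}^\bullet \times \mathfrak{g}^\bullet \to \mathfrak{g}^\bullet, \qquad (a, b) \mapsto [a, b]$$

such that the following axioms hold true:


(1) $[\mathfrak{g}^k, \mathfrak{g}^l] \subset \mathfrak{g}^{k+l}$ for
all $k, l \in \mathbb{Z}$.
(2) $[\xi, \zeta] = -(-1)^{kl} [\zeta, \xi]$ for all
$\xi \in \mathfrak{g}^k$, $\zeta \in \mathfrak{g}^l$.
(3) $(-1)^{k_1 k_3} [[\xi_1, \xi_2], \xi_3] + (-1)^{k_2 k_1} [[\xi_2, \xi_3], \xi_1] + (-1)^{k_3 k_2} [[\xi_3, \xi_1], \xi_2] = 0$
for all $\xi_i \in \mathfrak{g}^{k_i}$ with $i = 1, 2, 3$.


By axiom (1), it is clear that a graded Lie algebra is in particular a
graded algebra. So the above-defined notions of a graded ideal,
homomorphism, etc., apply as well to graded Lie algebras.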


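Example 11 Let $A^\bullet = \bigoplus_{k \in \mathbb{Z}} A^k$ be a
graded associative algebra. Then $A^\bullet$ becomes a graded Lie
algebra with the bracket

$$[a, b] = ab - (-1)^{kl}\, ba \quad \text{for } a \in A^k \text{ and } b \in A^l$$

The space $A^\bullet$ regarded as a graded Lie algebra is often denoted
by $\mathrm{lie}^\bullet(A^\bullet)$.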
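That the graded commutator satisfies axioms (2) and (3) can be tested mechanically on a small example. The sketch below rests on assumptions made only for illustration: the graded associative algebra is a free algebra on four hypothetical generators `a`, `b`, `c`, `d` of degrees 0, 1, 1, 2, a homogeneous element is a pair (degree, dictionary from words to integer coefficients), and all names are invented for this example.

```python
from itertools import product

# Hypothetical generators with their degrees.
DEG = {"a": 0, "b": 1, "c": 1, "d": 2}

def gen(name):
    """Generator as a homogeneous element (degree, {word: coefficient})."""
    return (DEG[name], {(name,): 1})

def mul(p, q):
    """Concatenation product of noncommutative polynomials."""
    out = {}
    for w1, c1 in p.items():
        for w2, c2 in q.items():
            out[w1 + w2] = out.get(w1 + w2, 0) + c1 * c2
    return {w: c for w, c in out.items() if c != 0}

def lincomb(p, s, q):
    """Return p + s*q."""
    out = dict(p)
    for w, c in q.items():
        out[w] = out.get(w, 0) + s * c
    return {w: c for w, c in out.items() if c != 0}

def bracket(x, y):
    """Graded commutator [x, y] = xy - (-1)^(kl) yx."""
    (k, p), (l, q) = x, y
    return (k + l, lincomb(mul(p, q), -((-1) ** (k * l)), mul(q, p)))

def jacobi_defect(x1, x2, x3):
    """Left-hand side of axiom (3); {} means the identity holds."""
    k1, k2, k3 = x1[0], x2[0], x3[0]
    total = lincomb({}, (-1) ** (k1 * k3), bracket(bracket(x1, x2), x3)[1])
    total = lincomb(total, (-1) ** (k2 * k1), bracket(bracket(x2, x3), x1)[1])
    return lincomb(total, (-1) ** (k3 * k2), bracket(bracket(x3, x1), x2)[1])
```

Looping `jacobi_defect` over all triples of generators is a quick sanity check; note that for the odd generator `b` the bracket `bracket(gen("b"), gen("b"))` is twice the word `bb` rather than zero, as expected for elements of odd degree.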


Definition 12 A linear map $D: A^\bullet \to A^\bullet$ defined on a
graded algebra $A^\bullet$ is called a derivation of degree $l$ if

$$D(ab) = (Da)\, b + (-1)^{kl}\, a\, (Db) \quad \text{for all } a \in A^k \text{ and } b \in A^\bullet$$

A graded (Lie) algebra $A^\bullet$ together with a derivation $d$ of
degree 1 is called a differential graded (Lie) algebra if
$d \circ d = 0$. Then $(A^\bullet, d)$ becomes a cochain complex. Since
$\ker d$ is a graded subalgebra of $A^\bullet$ and $\mathrm{im}\, d$ a
graded ideal in $\ker d$, the cohomology space

$$H^\bullet(A^\bullet, d) = \ker d / \mathrm{im}\, d$$

inherits the structure of a graded (Lie) algebra from $A^\bullet$.


Let $f: A^\bullet \to B^\bullet$ be a homomorphism of differential
graded (Lie) algebras $(A^\bullet, d)$ and $(B^\bullet, \partial)$.
Assume further that $f$ is a cochain map, that is, that
$f \circ d = \partial \circ f$. Then one says that $f$ is a
quasi-isomorphism or that the differential graded (Lie) algebras
$A^\bullet$ and $B^\bullet$ are quasi-isomorphic if the induced
homomorphism on the cohomology level
$\bar{f}: H^\bullet(A^\bullet, d) \to H^\bullet(B^\bullet, \partial)$ is
an isomorphism. Finally, a differential graded (Lie) algebra
$(A^\bullet, d)$ is called formal if it is quasi-isomorphic to its
cohomology $(H^\bullet(A^\bullet, d), 0)$.


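Maurer–Cartan Equation


Assume that $(\mathfrak{g}^\bullet, [\cdot, \cdot], d)$ is a
differential graded Lie algebra over $\mathbb{C}$. Define the space
$\mathrm{MC}(\mathfrak{g}^\bullet)$ of solutions of the Maurer–Cartan
equation by

$$\mathrm{MC}(\mathfrak{g}^\bullet) := \{ \omega \in \mathfrak{g}^1 \mid d\omega - \tfrac{1}{2} [\omega, \omega] = 0 \}$$   [8]

In case the differential graded Lie algebra $\mathfrak{g}^\bullet$ is
nilpotent, this space naturally possesses a groupoid structure or, in
other words, a set of arrows which are all invertible. The reason for
this is that, under the assumption of nilpotency, the space
$\mathfrak{g}^0$ is equipped with the Campbell–Hausdorff multiplication

$$\mathfrak{g}^0 \times \mathfrak{g}^0 \to \mathfrak{g}^0, \qquad (X, Y) \mapsto \log(\exp X \exp Y)$$

and the group $\mathfrak{g}^0$ acts on $\mathfrak{g}^1$ by the
exponential function. More precisely, in this situation one can define
for two objects $\alpha, \beta \in \mathrm{MC}(\mathfrak{g}^\bullet)$
the space of arrows $\alpha \to \beta$ as the set of all
$\lambda \in \mathfrak{g}^0$ such that
$\exp \lambda \cdot \alpha = \beta$.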

We now have the means to define for every complex differential graded
Lie algebra $\mathfrak{g}^\bullet$ its deformation functor
$\mathrm{Def}_{\mathfrak{g}^\bullet}$. This functor maps the category
of local Artinian $\mathbb{C}$-algebras to the category of groupoids
and is defined on objects as follows:

$$\mathrm{Def}_{\mathfrak{g}^\bullet}(R) := \mathrm{MC}(\mathfrak{g}^\bullet \otimes \mathfrak{m})$$   [9]

Hereby, $R$ is a complex local Artinian algebra, and $\mathfrak{m}$ its
maximal ideal. Note that since $R$ is Artinian,
$\mathfrak{g}^\bullet \otimes \mathfrak{m}$ is a nilpotent differential
graded Lie algebra, hence $\mathrm{Def}_{\mathfrak{g}^\bullet}(R)$
carries a groupoid structure as constructed above. Clearly,
$\mathrm{Def}_{\mathfrak{g}^\bullet}$ is also a functor of Artin rings
as defined in the previous section.

With appropriate choices of the differential graded Lie algebra
$\mathfrak{g}^\bullet$, essentially all deformation problems from the
section “Basic definitions and examples” can be recovered via a functor
of the form $\mathrm{Def}_{\mathfrak{g}^\bullet}$. Below, we will show
in some detail how this works for two examples, namely the deformation
theory of complex manifolds and the deformation quantization of Poisson
manifolds. But before we come to this, let us state a result which
shows how the deformation functor behaves under quasi-isomorphisms of
the underlying differential graded Lie algebra. This result is crucial
in the sense that it allows one to describe a deformation problem
controlled by $\mathfrak{g}^\bullet$ equivalently by any other
differential graded Lie algebra within the quasi-isomorphism class of
$\mathfrak{g}^\bullet$. So, in particular in the case where the
differential graded Lie algebra is formal, one often obtains a direct
solution of the deformation problem.


Theorem 13 (Deligne, Goldman–Millson). Assume that
$f: \mathfrak{g}^\bullet \to \mathfrak{h}^\bullet$ is a
quasi-isomorphism of differential graded Lie algebras. For every local
Artinian $\mathbb{C}$-algebra $R$ the induced functor
$f_*: \mathrm{Def}_{\mathfrak{g}^\bullet}(R) \to \mathrm{Def}_{\mathfrak{h}^\bullet}(R)$
then is an equivalence of groupoids.


The Kodaira-Spencer Algebra Controlling 
Deformations of Compact Complex Manifolds 


Let $M$ be a compact complex $n$-dimensional manifold. Recall that then
the complexified tangent bundle $T_{\mathbb{C}}M$ has a decomposition
into a holomorphic tangent bundle $T^{1,0}M$ and an antiholomorphic
tangent bundle $T^{0,1}M$. This leads to a decomposition of the space
of complex $k$-forms into the spaces $\Omega^{p,q}M$ of forms on $M$ of
type $(p, q)$ with $p + q = k$. More generally, a smooth subbundle
$J^{0,1} \subset T_{\mathbb{C}}M$ which induces a decomposition of the
form $T_{\mathbb{C}}M = J^{1,0} \oplus J^{0,1}$, where
$J^{1,0} = \overline{J^{0,1}}$, is called an almost complex structure
on $M$. Clearly, the decomposition of $T_{\mathbb{C}}M$ into the
holomorphic and antiholomorphic part is an almost complex structure,
and an almost complex structure which is induced by a complex structure
is called integrable. Assume that an almost complex structure $J^{0,1}$
is given on $M$ and that it has finite distance to the complex
structure on $M$. The latter means that the restriction $\varrho_J$ of
the projection $\varrho: T_{\mathbb{C}}M \to T^{0,1}M$ along $T^{1,0}M$
to the subbundle $J^{0,1}$ is an isomorphism. Denote by $\beta$ the
inverse of $\varrho_J$ and let $\omega \in \Omega^{0,1}(M, T^{1,0}M)$ be
the composition $\bar{\varrho} \circ \beta$, where $\bar{\varrho}$
denotes the complementary projection onto $T^{1,0}M$. One checks
immediately that every almost complex structure with finite distance to
the complex structure on $M$ is uniquely characterized by a section
$\omega \in \Omega^{0,1}(M, T^{1,0}M)$ and that every element of
$\Omega^{0,1}(M, T^{1,0}M)$ comes from an almost complex structure on
$M$.

As a consequence of the Newlander–Nirenberg theorem, one can now show
that the almost complex structure $J^{0,1}$ resp. $\omega$ is
integrable if and only if the equation

$$\bar{\partial}\omega - \tfrac{1}{2} [\omega, \omega] = 0$$   [10]

is fulfilled. But this is nothing else than the Maurer–Cartan equation
in the Kodaira–Spencer differential graded Lie algebra

$$(L^\bullet, \bar{\partial}, [\cdot, \cdot]) = \Big( \bigoplus_{p \in \mathbb{N}} \Omega^{0,p}(M, T^{1,0}M),\ \bar{\partial},\ [\cdot, \cdot] \Big)$$

Hereby, $\Omega^{0,p}(M, T^{1,0}M)$ denotes the $T^{1,0}M$-valued
differential forms on $M$ of type $(0, p)$,
$\bar{\partial}: \Omega^{0,p}(M, T^{1,0}M) \to \Omega^{0,p+1}(M, T^{1,0}M)$
the Dolbeault operator, and $[\cdot, \cdot]$ is induced by the Lie
bracket of holomorphic vector fields. As a consequence of these
considerations, deformations of the complex manifold $M$ can
equivalently be described by families
$(\omega_p)_{p \in P} \subset L^1$ which satisfy eqn [10] and
$\omega_\ast = 0$. Thus, it remains to determine the associated
deformation functor $\mathrm{Def}_{L^\bullet}$.

According to Schlessinger’s theorem, the functor $\mathrm{Def}_{\mathfrak L^\bullet}$ is
pro-representable. Hence, there exists a local $\mathbb C$-algebra
$\hat R_{\mathfrak L^\bullet}$ complete with respect to the $\mathfrak m$-adic topology such that


$\mathrm{Def}_{\mathfrak L^\bullet}(R) = \mathrm{Hom}_{\mathrm{alg}}(\hat R_{\mathfrak L^\bullet}, R)$   [11]


for every local Artinian $\mathbb C$-algebra R. Moreover, by Artin’s theorem, there exists a
“convergent” solution of the Maurer–Cartan equation, that is, $\hat R_{\mathfrak L^\bullet}$
can be replaced in eqn [11] by a ring $R_{\mathfrak L^\bullet}$ representing an analytic germ.


Theorem 14 (Kodaira–Spencer, Kuranishi). The ringed space $(R_{\mathfrak L^\bullet}, 0)$ is
a miniversal deformation of the complex structure on M.


Deformation Quantization of Poisson Manifolds 


Let A be an associative $\Bbbk$-algebra with $\operatorname{char}\Bbbk = 0$.
Put for every integer $k \geq -1$


$\mathfrak g^k := \mathrm{Hom}_{\Bbbk}(A^{\otimes(k+1)}, A)$


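Then $\mathfrak g^\bullet$ becomes a graded vector space. Let us impose a differential and a
bracket on $\mathfrak g^\bullet$. The differential is the usual Hochschild coboundary
$b: \mathfrak g^k \to \mathfrak g^{k+1}$,


$bf(a_0 \otimes \cdots \otimes a_{k+1}) = a_0 f(a_1 \otimes \cdots \otimes a_{k+1}) + \sum_{i=0}^{k} (-1)^{i+1} f(a_0 \otimes \cdots \otimes a_i a_{i+1} \otimes \cdots \otimes a_{k+1}) + (-1)^{k} f(a_0 \otimes \cdots \otimes a_k)\, a_{k+1}$


The bracket is the Gerstenhaber bracket

$[\cdot,\cdot]: \mathfrak g^{k_1} \times \mathfrak g^{k_2} \to \mathfrak g^{k_1+k_2}$,
$[f_1, f_2] = f_1 \circ f_2 - (-1)^{k_1 k_2} f_2 \circ f_1$


where


$f_1 \circ f_2 (a_0 \otimes \cdots \otimes a_{k_1+k_2}) = \sum_{i=0}^{k_1} (-1)^{i k_2} f_1\big(a_0 \otimes \cdots \otimes a_{i-1} \otimes f_2(a_i \otimes \cdots \otimes a_{i+k_2}) \otimes a_{i+k_2+1} \otimes \cdots \otimes a_{k_1+k_2}\big)$


The triple $(\mathfrak g^\bullet, b, [\cdot,\cdot])$ then is a differential graded
Lie algebra.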

Consider the Maurer–Cartan equation $b\gamma - (1/2)[\gamma,\gamma] = 0$ in
$\mathfrak g^1$. Obviously, it is equivalent to the equality


$a_0\gamma(a_1,a_2) - \gamma(a_0a_1,a_2) + \gamma(a_0,a_1a_2) - \gamma(a_0,a_1)a_2 = \gamma(\gamma(a_0,a_1),a_2) - \gamma(a_0,\gamma(a_1,a_2))$ for $a_0,a_1,a_2 \in A$   [12]


If one defines now for some $\gamma \in \mathfrak g^1$ the bilinear map
$m: A \times A \to A$ by $m(a,b) = ab + \gamma(a,b)$, then [12]
implies that m is associative if and only if $\gamma$ satisfies
the Maurer–Cartan equation.
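
As a worked check of this equivalence, one can expand the associator of m directly. The computation below is a sketch that uses only the definitions of b, the Gerstenhaber bracket, and m given above; the symbol $\mathrm{Ass}_m$ is introduced here purely for illustration and is not notation from the text.

```latex
\begin{aligned}
\mathrm{Ass}_m(a_0,a_1,a_2) &:= m(m(a_0,a_1),a_2) - m(a_0,m(a_1,a_2)) \\
&= (a_0a_1)a_2 + \gamma(a_0,a_1)\,a_2 + \gamma(a_0a_1,a_2) + \gamma(\gamma(a_0,a_1),a_2) \\
&\quad - a_0(a_1a_2) - a_0\,\gamma(a_1,a_2) - \gamma(a_0,a_1a_2) - \gamma(a_0,\gamma(a_1,a_2)) \\
&= -(b\gamma)(a_0,a_1,a_2) + \tfrac12[\gamma,\gamma](a_0,a_1,a_2).
\end{aligned}
```

Since A is associative, the terms $(a_0a_1)a_2$ and $a_0(a_1a_2)$ cancel, so the associator of m vanishes exactly when $b\gamma = \tfrac12[\gamma,\gamma]$.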

Let us apply these observations to the case where A is the algebra $C^\infty(M)[[t]]$ of
formal power series in one variable with coefficients in the space of smooth functions on a
Poisson manifold M. By (a variant of) the theorem of Hochschild–Kostant–Rosenberg and
Connes, one knows that in this case the cohomology of $(\mathfrak g^\bullet, b)$ is given by
formal power series with coefficients in the space $\Gamma^\infty(\Lambda^\bullet TM)$ of
antisymmetric multivector fields. Now, $\Gamma^\infty(\Lambda^\bullet TM)$ carries a natural
Lie algebra bracket as well, namely the Schouten bracket. Thus, one obtains a second
differential graded Lie algebra $(\Gamma^\infty(\Lambda^\bullet TM)[[t]], 0, [\cdot,\cdot])$.
Unfortunately, the projection onto cohomology
$(\mathfrak g^\bullet, b) \to \Gamma^\infty(\Lambda^\bullet TM)[[t]]$ does not preserve the
natural brackets, hence is not a quasi-isomorphism in the category of differential graded Lie
algebras. It has been the fundamental observation by Kontsevich that this defect can be cured
as follows.


Theorem 15 (Kontsevich 2003). For every Poisson manifold M the differential graded Lie
algebra $(\mathfrak g^\bullet, b, [\cdot,\cdot])$ is formal in the sense that there exists
a quasi-isomorphism
$(\mathfrak g^\bullet, b, [\cdot,\cdot]) \to (\Gamma^\infty(\Lambda^\bullet TM)[[t]], 0, [\cdot,\cdot])$
in the category of $L_\infty$-algebras.


Note that the theorem only claims the existence of a quasi-isomorphism in the category of
$L_\infty$-algebras or, in other words, in the category of homotopy Lie algebras. This is a
notion somewhat weaker than a differential graded Lie algebra, but Theorem 13 also holds in
the context of $L_\infty$-algebras.

Since the solutions of the Maurer–Cartan equation in
$(\Gamma^\infty(\Lambda^\bullet TM)[[t]], 0, [\cdot,\cdot])$ are exactly the formal paths of
Poisson bivector fields on M, Kontsevich’s formality theorem entails:


Corollary 16 Every Poisson manifold has a formal 
deformation quantization. 


See also: Deformation Quantization; Deformation 
Quantization and Representation Theory; Deformations of 
the Poisson Bracket on a Symplectic Manifold; Fedosov 
Quantization; Holonomic Quantum Fields; Operads. 
Introduction to Deformation Quantization


The framework of classical mechanics, in its Hamiltonian formulation on the motion space,
employs a symplectic manifold (or more generally a Poisson manifold). Observables are
families of smooth functions on that manifold M. The dynamics is defined in terms of a
Hamiltonian $H \in C^\infty(M)$ and the time evolution of an observable
$f_t \in C^\infty(M \times \mathbb R)$ is governed by the equation
$(d/dt) f_t = -\{H, f_t\}$.

The quantum-mechanical framework, in its usual Heisenberg formulation, employs a Hilbert
space (states are rays in that space). Observables are families of self-adjoint operators on
the Hilbert space. The dynamics is defined in terms of a Hamiltonian H, which is a
self-adjoint operator, and the time evolution of an observable $A_t$ is governed by the
equation $dA_t/dt = (i/\hbar)[H, A_t]$.

Quantization of a classical system is a way to pass from classical to quantum results. A
first idea for quantization is to define a correspondence $Q: f \mapsto Q(f)$ mapping a
function f to a self-adjoint operator Q(f) on a Hilbert space $\mathcal H$ in such a way
that $Q(1) = \mathrm{Id}$ and $[Q(f), Q(g)] = i\hbar\, Q(\{f,g\})$. Unfortunately, there is
no such correspondence defined on all smooth functions on M when one puts an irreducibility
requirement (which is necessary not to violate Heisenberg’s principle).

Different mathematical treatments of quantization 
have appeared: 


• Geometric quantization of Kostant and Souriau: first, prequantization of a symplectic
manifold $(M,\omega)$ where one builds a Hilbert space and a correspondence Q defined on all
smooth functions on M but with no irreducibility; second, polarization to “cut down the
number of variables.”

• Berezin’s quantization where one builds on a particular class of symplectic manifolds
(some Kähler manifolds) a family of associative algebras using a symbolic calculus, that is,
a dequantization procedure.

• Deformation quantization introduced by Flato, Lichnerowicz, and Sternheimer in 1976 where
they “suggest that quantization be understood as a deformation of the structure of the
algebra of classical observables rather than a radical change in the nature of the
observables.”


This deformation approach to quantization is part 
of a general deformation approach to physics 
(a seminal idea stressed by Flato): one looks at 
some level of a theory in physics as a deformation of 
another level. 

Deformation quantization is defined in terms of a 
star product which is a formal deformation of the 
algebraic structure of the space of smooth functions 
on a Poisson manifold. The associative structure 
given by the usual product of functions and the Lie 
structure given by the Poisson bracket are simulta- 
neously deformed. 

In this article we concentrate on some mathematical results concerning deformations of the
Poisson bracket on a symplectic manifold, classification of star products on symplectic
manifolds, group actions on star products, convergence properties of some star products, and
star products on cotangent bundles.


Deformations of the Poisson Bracket 
on a Symplectic Manifold 


Definition 1 A Poisson bracket defined on the space of smooth functions on a manifold M is
an $\mathbb R$-bilinear map on $C^\infty(M)$, $(u,v) \mapsto \{u,v\}$, such that
for any $u,v,w \in C^\infty(M)$:


(i) $\{u,v\} = -\{v,u\}$;
(ii) $\{\{u,v\},w\} + \{\{v,w\},u\} + \{\{w,u\},v\} = 0$;
(iii) $\{u,vw\} = \{u,v\}w + \{u,w\}v$.


A Poisson bracket is given in terms of a contravariant skew-symmetric 2-tensor P on M
(called the Poisson tensor) by $\{u,v\} = P(du \wedge dv)$. The Jacobi identity for the
Poisson bracket is equivalent to the vanishing of the Schouten bracket $[P,P] = 0$. (The
Schouten bracket is the extension, as a graded derivation for the exterior product, of the
bracket of vector fields to skew-symmetric contravariant tensor fields.) A Poisson manifold
(M, P) is a manifold M with a Poisson bracket defined by P.


A particular class of Poisson manifolds, essential in classical mechanics, is the class of
“symplectic manifolds.” If $(M,\omega)$ is a symplectic manifold (i.e., $\omega$ is a closed
nondegenerate 2-form on M) and if $u,v \in C^\infty(M)$, the Poisson bracket of u and v is


$\{u,v\} := X_u(v) = \omega(X_v, X_u)$


where $X_u$ denotes the Hamiltonian vector field corresponding to the function u, that is,
such that $i(X_u)\omega = du$. In coordinates the components $P^{ij}$ of the Poisson tensor
form the inverse matrix of the components $\omega_{ij}$ of $\omega$.

Duals of Lie algebras form the class of linear Poisson manifolds. If $\mathfrak g$ is a Lie
algebra, then its dual $\mathfrak g^*$ is endowed with the Poisson tensor P defined by
$P_\xi(X,Y) := \xi([X,Y])$ for $X, Y \in \mathfrak g = (\mathfrak g^*)^* \simeq (T_\xi \mathfrak g^*)^*$.
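
To make the linear Poisson structure concrete, here is a minimal sketch for the illustrative choice of the Lie algebra so(3), which is an assumption made for this example and is not discussed in the text. Polynomials on the dual are stored as dictionaries from exponent triples to coefficients, and all helper names (`padd`, `pmul`, `pdiff`, `coord`, `eps`, `bracket`, `jacobiator`) are hypothetical.

```python
from fractions import Fraction
from itertools import product

# A polynomial on g* = R^3 is a dict {(e0, e1, e2): coefficient}.

def padd(a, b, scale=1):
    """Return a + scale * b, dropping zero coefficients."""
    out = dict(a)
    for mono, c in b.items():
        out[mono] = out.get(mono, 0) + scale * c
    return {m: c for m, c in out.items() if c != 0}

def pmul(a, b):
    """Pointwise product of two polynomials."""
    out = {}
    for (ma, ca), (mb, cb) in product(a.items(), b.items()):
        mono = tuple(x + y for x, y in zip(ma, mb))
        out[mono] = out.get(mono, 0) + ca * cb
    return {m: c for m, c in out.items() if c != 0}

def pdiff(a, i):
    """Partial derivative with respect to the i-th coordinate."""
    out = {}
    for mono, c in a.items():
        if mono[i] > 0:
            new = mono[:i] + (mono[i] - 1,) + mono[i + 1:]
            out[new] = c * mono[i]
    return out

def coord(i):
    """The linear coordinate function x_i, i.e. the basis element e_i of g."""
    return {tuple(1 if j == i else 0 for j in range(3)): Fraction(1)}

def eps(i, j, k):
    """Structure constants of so(3): [e_i, e_j] = sum_k eps(i, j, k) e_k."""
    return ((i - j) * (j - k) * (k - i)) // 2

def bracket(u, v):
    """Linear Poisson bracket {u, v}(x) = sum_ijk eps_ijk x_k (d_i u)(d_j v)."""
    out = {}
    for i, j, k in product(range(3), repeat=3):
        e = eps(i, j, k)
        if e:
            term = pmul(coord(k), pmul(pdiff(u, i), pdiff(v, j)))
            out = padd(out, term, e)
    return out

def jacobiator(u, v, w):
    """Cyclic sum {{u,v},w} + {{v,w},u} + {{w,u},v}; zero for a Poisson bracket."""
    total = bracket(bracket(u, v), w)
    total = padd(total, bracket(bracket(v, w), u))
    return padd(total, bracket(bracket(w, u), v))

x0, x1, x2 = coord(0), coord(1), coord(2)
casimir = padd(padd(pmul(x0, x0), pmul(x1, x1)), pmul(x2, x2))
print(bracket(x0, x1) == x2)      # bracket of coordinates reproduces [e_0, e_1] = e_2
print(bracket(casimir, x0) == {})  # the quadratic Casimir Poisson-commutes with x_0
```

On linear functions the bracket reduces to the Lie bracket of so(3), which is exactly the content of the formula for $P_\xi$; the Jacobi identity for arbitrary polynomials then follows from that of the Lie algebra.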


Definition 2 A Poisson deformation of the Poisson bracket on a Poisson manifold (M, P) is a
Lie algebra deformation of $(C^\infty(M), \{\,,\})$ which is a derivation in each argument,
that is, of the form $\{u,v\}_\nu = P_\nu(du,dv)$, where
$P_\nu = P + \sum_{k\geq1} \nu^k P_k$ is a series of skew-symmetric contravariant 2-tensors
on M (such that $[P_\nu, P_\nu] = 0$).


Two Poisson deformations $P_\nu$ and $P'_\nu$ of the Poisson bracket P on a Poisson
manifold (M, P) are equivalent if there exists a formal path in the diffeomorphism group of
M, starting at the identity, that is, a series
$T = \exp D = \mathrm{Id} + \sum_{j\geq1} (1/j!) D^j$ for $D = \sum_{r\geq1} \nu^r D_r$,
where the $D_r$ are vector fields on M, such that


$T\{u,v\}_\nu = \{Tu, Tv\}'_\nu$
where $\{u,v\}_\nu = P_\nu(du,dv)$ and $\{u,v\}'_\nu = P'_\nu(du,dv)$.


Proposition 3 (Flato et al. 1975, Lecomte 1987). On a symplectic manifold $(M,\omega)$, any
Poisson deformation of the Poisson bracket corresponds to a series of closed 2-forms on M,
$\Omega_\nu = \omega + \sum_{k\geq1} \nu^k \omega_k$, and is given by


$\{u,v\}_\nu = P_\nu(du,dv) = \Omega_\nu(X^\nu_v, X^\nu_u)$


with $i(X^\nu_u)\Omega_\nu = du$. The equivalence classes of Poisson deformations of the
Poisson bracket P are parametrized by $H^2(M;\mathbb R)[[\nu]]$.

Poisson deformations are used in classical mechanics to express some constraints on the
system. To deal with quantum mechanics, Flato et al. (1976) introduced star products. These
give, by skew-symmetrization, Lie deformations of the Poisson bracket.


Definition 4 A “star product” on (M, P) is an $\mathbb R[[\nu]]$-bilinear associative
product $*$ on $C^\infty(M)[[\nu]]$ given by


$u * v = u *_\nu v := \sum_{r\geq0} \nu^r C_r(u,v)$


for $u,v \in C^\infty(M)$ (we consider here real-valued functions; the results for
complex-valued functions are similar), such that $C_0(u,v) = uv$,
$C_1(u,v) - C_1(v,u) = \{u,v\}$, and $1 * u = u * 1 = u$.


When the $C_r$’s are bidifferential operators on M, one speaks of a differential star
product. When each $C_r$ is a bidifferential operator of order at most r in each argument,
one speaks of a natural star product.

One finds in the literature other normalizations for the skew-symmetric part of $C_1$ such
as $(i/2)\{\,,\}$; these amount to a rescaling of the parameter $\nu$. For physical
applications, in the above convention for the formal parameter, $\nu$ corresponds to
$i\hbar$, where $\hbar$ is Planck’s constant.

In the case of complex-valued functions, one can add the further requirement that the
complex conjugation is a *-involution for $*$, that is, $\overline{f * g} = \bar g * \bar f$.
According to the interpretation of $\nu$ as being $i\hbar$, we have to require
$\bar\nu = -\nu$. Star products satisfying this additional property are called symmetric or
Hermitian.

A star product can also be defined not on the whole of $C^\infty(M)$ but on a subspace N
which is stable under pointwise multiplication and Poisson bracket.

The simplest example of a deformation quantization is the Moyal product for the Poisson
structure P on a vector space $V = \mathbb R^m$ with constant coefficients:


$P = \sum_{i,j} P^{ij}\, \partial_i \wedge \partial_j, \quad P^{ij} = -P^{ji} \in \mathbb R$

where $\partial_i = \partial/\partial y^i$ is the partial derivative in the direction of the
coordinate $y^i$, $i = 1, \ldots, m$. The formula for the Moyal product is


$(u *_M v)(z) = \exp\Big(\frac{\nu}{2} P^{rs} \partial_{x^r} \partial_{y^s}\Big)\big(u(x) v(y)\big)\Big|_{x=y=z}$   [1]


When P is nondegenerate (so $V = \mathbb R^{2n}$), the space of formal power series of
polynomials on V with Moyal product is called the formal Weyl algebra
$W = (S(V^*)[[\nu]], *_M)$.
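
The Moyal formula can be evaluated exactly on polynomials, where the exponential series terminates. The sketch below assumes the simplest nondegenerate case $V = \mathbb R^2$ with coordinates (q, p) and $P^{12} = 1 = -P^{21}$; the dictionary representation and the helper names (`add`, `mul`, `deriv`, `moyal`) are choices made for this illustration only.

```python
from fractions import Fraction
from math import comb, factorial

# A polynomial in (q, p, nu) is a dict {(i, j, k): coeff} for coeff * q^i p^j nu^k.

def add(a, b, scale=1):
    """Return a + scale * b, dropping zero coefficients."""
    out = dict(a)
    for mono, c in b.items():
        out[mono] = out.get(mono, 0) + scale * c
    return {m: c for m, c in out.items() if c != 0}

def mul(a, b):
    """Pointwise (commutative) product of two polynomials."""
    out = {}
    for (i1, j1, k1), c1 in a.items():
        for (i2, j2, k2), c2 in b.items():
            key = (i1 + i2, j1 + j2, k1 + k2)
            out[key] = out.get(key, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def deriv(a, var, times):
    """Differentiate `times` times with respect to q (var=0) or p (var=1)."""
    out = {}
    for mono, c in a.items():
        e = mono[var]
        if e < times:
            continue
        factor = 1
        for s in range(times):
            factor *= e - s
        new = list(mono)
        new[var] = e - times
        out[tuple(new)] = c * factor
    return out

def moyal(u, v):
    """Moyal product: sum_n (nu/2)^n / n! * (d_q^x d_p^y - d_p^x d_q^y)^n (u(x) v(y))."""
    # Derivatives of order above the (q, p)-degree of u vanish, so the sum is finite.
    top = max((i + j for (i, j, _k) in u), default=0)
    result = {}
    for n in range(top + 1):
        for m in range(n + 1):  # binomial expansion of the n-th power
            du = deriv(deriv(u, 0, n - m), 1, m)
            dv = deriv(deriv(v, 1, n - m), 0, m)
            coeff = Fraction((-1) ** m * comb(n, m), 2 ** n * factorial(n))
            shifted = {(i, j, k + n): c for (i, j, k), c in mul(du, dv).items()}
            result = add(result, shifted, coeff)
    return result

q = {(1, 0, 0): Fraction(1)}
p = {(0, 1, 0): Fraction(1)}
commutator = add(moyal(q, p), moyal(p, q), -1)
print(commutator)  # the single monomial nu, i.e. q * p - p * q = nu {q, p}
```

With this normalization the star commutator of q and p equals $\nu\{q,p\} = \nu$, matching the requirement $C_1(u,v) - C_1(v,u) = \{u,v\}$ in Definition 4.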

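Let $\mathfrak g^*$ be the dual of a Lie algebra $\mathfrak g$. The algebra of polynomials on
$\mathfrak g^*$ is identified with the symmetric algebra $S(\mathfrak g)$. One defines a new
associative law on this algebra by a transfer of the product $\circ$ in the universal
enveloping algebra $U(\mathfrak g)$, via the bijection between $S(\mathfrak g)$ and
$U(\mathfrak g)$ given by the total symmetrization $\sigma$:


$\sigma: S(\mathfrak g) \to U(\mathfrak g): X_1 \cdots X_k \mapsto \frac{1}{k!} \sum_{\rho \in S_k} X_{\rho(1)} \circ \cdots \circ X_{\rho(k)}$


Then $U(\mathfrak g) = \bigoplus_{n\geq0} U_n$, where $U_n := \sigma(S^n(\mathfrak g))$, and
we decompose an element $u \in U(\mathfrak g)$ accordingly as $u = \sum_n u_n$. We define,
for $P \in S^p(\mathfrak g)$ and $Q \in S^q(\mathfrak g)$,


$P * Q = \sum_{n\geq0} \nu^n\, \sigma^{-1}\big((\sigma(P) \circ \sigma(Q))_{p+q-n}\big)$   [2]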


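This yields a differential star product on $\mathfrak g^*$ (Gutt 1983). This star product can
be written with an integral formula (for $\nu = 2\pi i$) (Drinfeld 1987):


$u * v(\xi) = \int_{\mathfrak g \times \mathfrak g} \hat u(X)\, \hat v(Y)\, e^{2i\pi \langle \xi, \mathrm{CBH}(X,Y) \rangle}\, dX\, dY$


where $\hat u(X) = \int_{\mathfrak g^*} u(\eta)\, e^{-2i\pi \langle \eta, X \rangle}\, d\eta$
and CBH denotes the Campbell–Baker–Hausdorff formula for the product of elements in the group
in a logarithmic chart ($\exp X \exp Y = \exp \mathrm{CBH}(X,Y)$ for all
$X, Y \in \mathfrak g$). We call this the standard (or CBH) star product on the dual of a Lie
algebra.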

De Wilde and Lecomte (1983) proved that on 
any symplectic manifold there exists a differential 
star product. Fedosov (1994) gave a recursive 
construction of a star product on a symplectic 
manifold (M, w) constructing flat connections on 
the Weyl bundle. Omori et al. (1991) gave an 
alternative proof of existence of a differential star 
product on a symplectic manifold, gluing local 
Moyal star products. In 1997, Kontsevich gave a
proof of the existence of a star product on any
Poisson manifold and gave an explicit formula for a
star product for any Poisson structure on $V = \mathbb R^m$.
This appeared as a consequence of the proof of his
formality theorem.


Fedosov’s Construction of Star Products 


Fedosov’s construction gives a star product on a symplectic manifold $(M,\omega)$, when one
has chosen a symplectic connection and a sequence of closed 2-forms on M. The star product is
obtained by identifying the space $C^\infty(M)[[\nu]]$ with an algebra of flat sections of
the so-called Weyl bundle endowed with a flat connection whose construction is related to the
choice of the sequence of closed 2-forms on M.


Definition 5 The symplectic group $Sp(n,\mathbb R)$ acts by automorphisms on the formal Weyl
algebra W. If $(M,\omega)$ is a symplectic manifold, we can form its bundle F(M) of
symplectic frames which is a principal $Sp(n,\mathbb R)$-bundle over M. The associated bundle
$\mathcal W = F(M) \times_{Sp(n,\mathbb R)} W$ is a bundle of associative algebras on M
called the Weyl bundle. Sections of the Weyl bundle have the form of formal series

$a(x,y,\nu) = \sum_{2k+l\geq0} \nu^k\, a_{k,i_1 \ldots i_l}(x)\, y^{i_1} \cdots y^{i_l}$


where the coefficients $a_k$ are symmetric covariant l-tensor fields on M. The product of two
sections taken pointwise makes the space of sections into an algebra, and in terms of the
above representation of sections the multiplication has the form

$(a \circ b)(x,y,\nu) = \Big(\exp\Big(\frac{\nu}{2} P^{ij} \frac{\partial}{\partial y^i} \frac{\partial}{\partial z^j}\Big)\, a(x,y,\nu)\, b(x,z,\nu)\Big)\Big|_{y=z}$


Note that the center of this algebra coincides with $C^\infty(M)[[\nu]]$.


A symplectic connection on M is a linear torsion-free connection $\nabla$ such that
$\nabla\omega = 0$.


Remark 6 It is well known that such connections always exist but, unlike the Riemannian
case, are not unique. To see the existence, take any torsion-free connection $\nabla'$ and
set $T(X,Y,Z) = (\nabla'_X \omega)(Y,Z)$. Define S by
$\omega(S(X,Y),Z) = (1/3)(T(X,Y,Z) + T(Y,X,Z))$; then
$\nabla_X Y = \nabla'_X Y + S(X,Y)$ defines a symplectic connection.
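
The claim in Remark 6 can be verified in a few lines. The computation below is a sketch that assumes only what the remark states: $\nabla'$ is torsion-free, $\omega$ is closed, T is skew-symmetric in its last two arguments (because $\omega$ is), and S is defined by the formula above.

```latex
\begin{aligned}
(\nabla_X\omega)(Y,Z)
&= (\nabla'_X\omega)(Y,Z) - \omega(S(X,Y),Z) - \omega(Y,S(X,Z)) \\
&= T(X,Y,Z) - \tfrac13\big(T(X,Y,Z) + T(Y,X,Z)\big) + \tfrac13\big(T(X,Z,Y) + T(Z,X,Y)\big) \\
&= \tfrac13\big(T(X,Y,Z) + T(Y,Z,X) + T(Z,X,Y)\big) \\
&= \tfrac13\, d\omega(X,Y,Z) = 0 .
\end{aligned}
```

The last step uses that, for a torsion-free connection, $d\omega$ is the cyclic sum of $\nabla'\omega$. Since S(X,Y) is symmetric in X and Y, the new connection $\nabla$ is again torsion-free.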


The connection $\nabla$ induces a covariant derivative on sections of the Weyl bundle,
denoted $\partial$. The idea is to try to modify it to have zero curvature. Consider
$Da = \partial a - \delta(a) - (1/\nu)[r,a]$, where r is a 1-form with values in
$\mathcal W$, with $[a, a'\,dx] = (a \circ a' - a' \circ a)\,dx$ and
$\delta(a) = (1/\nu)\big[\sum_{ij} \omega_{ij}\, y^i\, dx^j, a\big]$.


Theorem 7 (Fedosov 1994). For a given series $\Omega = \sum_{i\geq1} \nu^i \omega_i$ of
closed 2-forms on M, there is a unique $r \in \Gamma(\mathcal W \otimes \Lambda^1)$
satisfying some normalization condition, so that
$Da = \partial a - \delta(a) - (1/\nu)[r,a]$ is flat. For any
$a_\circ \in C^\infty(M)[[\nu]]$, there is a unique a in the subspace $\mathcal W_D$ of flat
sections of $\mathcal W$, such that $a(x,0,\nu) = a_\circ(x,\nu)$. The use of this linear
isomorphism to transport the algebra structure of $\mathcal W_D$ to $C^\infty(M)[[\nu]]$
defines the star product of Fedosov $*_{\nabla,\Omega}$.


Writing $u *_{\nabla,\Omega} v = \sum_{r\geq0} \nu^r C_r(u,v)$, $C_r$ only depends on
$\omega_i$ for $i < r$, and
$C_{r+1}(u,v) = c\, \omega_r(X_u, X_v) + \tilde C_{r+1}(u,v)$,
where $c \in \mathbb R$ and the last term does not depend on $\omega_r$.


Classification of Star Products 
on a Symplectic Manifold 


Star products on a manifold M are examples of deformations of associative algebras (in the
sense of Gerstenhaber). Their study uses the Hochschild cohomology of the algebra (here
$C^\infty(M)$ with values in $C^\infty(M)$) where p-cochains are p-linear maps from
$(C^\infty(M))^p$ to $C^\infty(M)$ and where the Hochschild coboundary operator maps the
p-cochain C to the (p + 1)-cochain


$(\partial C)(u_0, \ldots, u_p) = u_0\, C(u_1, \ldots, u_p) + \sum_{r=1}^{p} (-1)^r C(u_0, \ldots, u_{r-1}u_r, \ldots, u_p) + (-1)^{p+1} C(u_0, \ldots, u_{p-1})\, u_p$


For differential star products, we consider differential cochains given by differential
operators on each argument. The associativity condition for a star product at order k in the
parameter $\nu$ reads


$(\partial C_k)(u,v,w) = \sum_{r+s=k,\ r,s>0} \big( C_r(C_s(u,v),w) - C_r(u,C_s(v,w)) \big)$


If one has cochains $C_j$, $j < k$, such that the star product they define is associative to
order k − 1, then the right-hand side above is a cocycle ($\partial(\mathrm{RHS}) = 0$) and
one can extend the star product to order k if it is a coboundary
($\mathrm{RHS} = \partial(C_k)$).
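
To make the recursion concrete, the first two instances of the order-k associativity condition can be written out. This is a direct specialization to k = 1 and k = 2 with $C_0(u,v) = uv$, and it introduces no additional assumptions.

```latex
\begin{aligned}
k = 1:&\quad (\partial C_1)(u,v,w) = u\,C_1(v,w) - C_1(uv,w) + C_1(u,vw) - C_1(u,v)\,w = 0, \\
k = 2:&\quad (\partial C_2)(u,v,w) = C_1(C_1(u,v),w) - C_1(u,C_1(v,w)).
\end{aligned}
```

So $C_1$ must be a Hochschild 2-cocycle, and a second-order term $C_2$ exists precisely when the associator of $C_1$ is a Hochschild coboundary.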

Denoting by m the usual multiplication of functions, and writing $* = m + C$, where C is a
formal series of multidifferential operators, the associativity also reads
$\partial C = [C, C]$, where the bracket on the right-hand side is the graded Lie algebra
bracket on $D_{\mathrm{poly}}(M)[[\nu]] = \{\text{multidifferential operators}\}$.


Theorem 8 (Vey 1975). Every differential p-cocycle C on a manifold M is the sum of the
coboundary of a differential (p − 1)-cochain and a 1-differential skew-symmetric p-cocycle
A: $C = \partial B + A$. In particular, a cocycle is a coboundary if and only if its total
skew-symmetrization, which is automatically 1-differential in each argument, vanishes. Given
a connection $\nabla$ on M, B can be defined from C by universal formulas (Cahen and Gutt
1982). Also


$H^p_{\mathrm{diff}}(C^\infty(M), C^\infty(M)) = \Gamma(\Lambda^p TM)$


The similar result about continuous cochains is 
due to Connes (1985). In the somewhat pathological 
case of completely general cochains, the full coho- 
mology is not known. 


Definition 9 Two star products $*$ and $*'$ on (M, P) are said to be equivalent if there is
a series of linear operators on $C^\infty(M)$,
$T = \mathrm{Id} + \sum_{r=1}^{\infty} \nu^r T_r$, such that


$T(u * v) = Tu *' Tv$   [3]


Remark that the $T_r$ automatically vanish on constants since 1 is a unit for $*$ and for
$*'$.


If $*$ and $*'$ are equivalent differential star products, then the equivalence is given by
differential operators $T_r$; if they are natural, the equivalence is given by
$T = \mathrm{Exp}\,E$ with $E = \sum_{r\geq1} \nu^r E_r$, where the $E_r$ are differential
operators of order at most r + 1.

Nest and Tsygan (1995), then Deligne (1995) and Bertelson et al. (1995, 1997) proved that
any differential star product on a symplectic manifold $(M,\omega)$ is equivalent to a
Fedosov star product and that its equivalence class is parametrized by the corresponding
element in $H^2(M;\mathbb R)[[\nu]]$.

Kontsevich (IHES preprint 97) proved that the coincidence of the set of equivalence classes
of star and Poisson deformations is true for general Poisson manifolds:


Theorem 10 (Kontsevich). The set of equivalence classes of differential star products on a
Poisson manifold (M, P) can be naturally identified with the set of equivalence classes of
Poisson deformations of P:
$P_\nu = P\nu + P_2\nu^2 + \cdots \in \nu\,\Gamma(M, \Lambda^2 TM)[[\nu]]$,
$[P_\nu, P_\nu] = 0$.


Deligne (1995) defines cohomological classes associated to differential star products on a
symplectic manifold; this leads to an intrinsic way to parametrize the equivalence class of
such a differential star product. The characteristic class $c(*)$ is given in terms of the
skew-symmetric part of the term of order 2 in $\nu$ in the star product and in terms of local
(“$\nu$-Euler”) derivations of the form
$D = \nu(\partial/\partial\nu) + X + \sum_{r\geq1} \nu^r D'_r$. This characteristic class has
the following properties:


• The map C from equivalence classes of star products on $(M,\omega)$ to the affine space
$-[\omega]/\nu + H^2(M;\mathbb R)[[\nu]]$ mapping $[*]$ to $c(*)$ is a bijection.

• The characteristic class is natural relative to diffeomorphisms and is equivariant under a
change of parameter (Gutt and Rawnsley 1995).

• The characteristic class $c(*)$ coincides (cf. Deligne (1995) and Neumaier (1999)) for
Fedosov-type star products with their characteristic class introduced by Fedosov as the
de Rham class of the curvature of the generalized connection used to build them (up to a sign
and factors of 2).


Index theory has been introduced in the frame- 
work of deformation quantization by Fedosov 
(1996) and by Nest and Tsygan (1995, 1996). We 
refer to the papers of Bressler, Nest, and Tsygan for 
further developments in that subject. A first tool in 
that theory is the existence of a trace for the 
deformed algebra; this trace is essentially unique in 
the framework of symplectic manifolds (an elemen- 
tary proof is given in Karabegov (1998) and Gutt 
and Rawnsley (2003)); the trace is not unique for 
more general Poisson manifolds. 


Definition 11 A homomorphism from a differential star product $*$ on (M, P) to a
differential star product $*'$ on (M′, P′) is an $\mathbb R$-linear map
$A: C^\infty(M)[[\nu]] \to C^\infty(M')[[\nu]]$, continuous in the $\nu$-adic topology, such
that


$A(u * v) = Au *' Av$


It is an isomorphism if the map is bijective. 


Any isomorphism between two differential star products on symplectic manifolds is the
combination of a change of parameter and a $\nu$-linear isomorphism. Any $\nu$-linear
isomorphism between two star products $*$ on $(M,\omega)$ and $*'$ on $(M',\omega')$ is the
combination of the action on functions of a symplectomorphism $\psi: M' \to M$ and an
equivalence between $*$ and the pullback via $\psi$ of $*'$. It exists if and only if those
two star products are equivalent, that is, if and only if $(\psi^{-1})^* c(*') = c(*)$, where
$(\psi^{-1})^*$ denotes the action of $\psi^{-1}$ on the second de Rham cohomology space. In
particular, a symplectomorphism $\psi$ of a symplectic manifold can be extended to a
$\nu$-linear automorphism of a given differential star product on $(M,\omega)$ if and only if
$\psi^* c(*) = c(*)$ (Gutt and Rawnsley 1999).

The notion of homomorphism and its relation to 
modules has been studied by Bordemann (2004). 

The link between the notion of star product on a 
symplectic manifold and symplectic connections 
already appears in the seminal paper of Bayen 
et al. (1978), and was further developed by 
Lichnerowicz (1982), who showed that any Vey 
star product (i.e. a star product defined by 
bidifferential operators whose principal symbols at 
each order coincide with those of the Moyal star 
product) determines a unique symplectic connection. 
Fedosov’s construction yields a Vey star product on 
any symplectic manifold starting from a symplectic 
connection and a formal series of closed 2-forms on 
the manifold. Furthermore, any star product is 
equivalent to a Fedosov star product and the 
de Rham class of the formal 2-form determines the 
equivalence class of the star product. On the other 
hand, many star products which appear in natural
contexts (e.g., cotangent bundles or Kähler manifolds)
are not Vey star products but are natural star
products.


Theorem 12 (Gutt and Rawnsley 2004). Any natural star product on a symplectic manifold
$(M,\omega)$ determines uniquely


(i) A symplectic connection $\nabla = \nabla(*)$.
(ii) A formal series of closed 2-forms $\Omega = \Omega(*) \in \nu\Lambda^2(M)[[\nu]]$.

(iii) A formal series $E = \sum_{r\geq1} \nu^r E_r$ of differential operators of order
$\leq r + 1$ ($E_2$ of order $\leq 2$), with
$E_r u = \sum_{k=2}^{r+1} (E_r^{(k)})^{i_1 \ldots i_k} \nabla^k_{i_1 \ldots i_k} u$, where
the $E_r^{(k)}$ are symmetric contravariant k-tensor fields


such that


$u * v = \exp(-E)\big((\exp E\, u) *_{\nabla,\Omega} (\exp E\, v)\big)$   [4]


We denote $* = *_{\nabla,\Omega,E}$. If $\tau$ is a diffeomorphism of M then the data for
$\tau \cdot *$ is $\tau \cdot \nabla$, $\tau \cdot \Omega$, and $\tau \cdot E$. In
particular, a vector field X is a derivation of a natural star product $*$ if and only if
$\mathcal L_X \omega = 0$, $\mathcal L_X \Omega = 0$, $\mathcal L_X \nabla = 0$, and
$\mathcal L_X E = 0$.


Group Actions on Star Products 


Symmetries in quantum theories are automorphisms of an algebra of observables. In the
framework where quantization is defined in terms of a star product, a symmetry $\sigma$ of a
star product $*$ is an automorphism of the $\mathbb R[[\nu]]$-algebra $C^\infty(M)[[\nu]]$
with multiplication given by $*$:


$\sigma(1) = 1$,


$\sigma(u * v) = \sigma(u) * \sigma(v)$,


where $\sigma$, being determined by what it does on $C^\infty(M)$, will be a formal series
$\sigma(u) = \sum_{r\geq0} \nu^r \sigma_r(u)$ of linear maps
$\sigma_r: C^\infty(M) \to C^\infty(M)$. We denote by
$\mathrm{Aut}_{\mathbb R[[\nu]]}(M, *)$ the set of those symmetries.

Any such automorphism $\sigma$ of $*$ then can be written as $\sigma(u) = T(u \circ \tau)$,
where $\tau$ is a Poisson diffeomorphism of (M, P) and
$T = \mathrm{Id} + \sum_{r\geq1} \nu^r T_r$ is a formal series of linear maps. If $*$ is
differential, then the $T_r$ are differential operators; if $*$ is natural, then
$T = \mathrm{Exp}\,E$ with $E = \sum_{r\geq1} \nu^r E_r$ and $E_r$ is a differential operator
of order at most r + 1.

If $\sigma_t$ is a one-parameter group of symmetries of the star product $*$, then its
generator D will be a derivation of $*$. Denote the Lie algebra of $\nu$-linear derivations
of $*$ by $\mathrm{Der}_{\mathbb R[[\nu]]}(M, *)$.

An action of a Lie group G on a star product $*$ on a Poisson manifold (M, P) is a
homomorphism $\sigma: G \to \mathrm{Aut}_{\mathbb R[[\nu]]}(M, *)$; then
$\sigma_g = (\tau_g^{-1})^* + O(\nu)$ and there is an induced Poisson action $\tau$ of G on
(M, P).

Given a Poisson action $\tau$ of G on (M, P), a star product is said to be “invariant” under
G if all the $\tau_g^*$ are automorphisms of $*$.

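An action of a Lie group G on $*$ induces a homomorphism of Lie algebras
$D: \mathfrak g \to \mathrm{Der}_{\mathbb R[[\nu]]}(M, *)$. For each $\xi \in \mathfrak g$,
$D_\xi = \xi^* + \sum_{r\geq1} \nu^r D_\xi^r$, where $\xi^*$ is the fundamental vector field
on M defined by $\tau$; hence,


$\xi^*(x) = \frac{d}{dt}\Big|_{t=0} \tau(\exp(-t\xi))\,x$


Such a homomorphism $D: \mathfrak g \to \mathrm{Der}_{\mathbb R[[\nu]]}(M, *)$ is called an
action of the Lie algebra $\mathfrak g$ on $*$.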


Proposition 13 (Arnal et al. 1983). Given
$D: \mathfrak g \to \mathrm{Der}_{\mathbb R[[\nu]]}(M, *)$ a homomorphism so that for each
$\xi \in \mathfrak g$, $D_\xi = \xi^* + \sum_{r\geq1} \nu^r D_\xi^r$, where the $\xi^*$ are
the fundamental vector fields on M defined by an action $\tau$ of G on M and the $D_\xi^r$
are differential operators, then there exists a local homomorphism
$\sigma: U \subset G \to \mathrm{Aut}_{\mathbb R[[\nu]]}(M, *)$ so that $\sigma_* = D$.


If we want the analog in our framework to the 
requirement that operators should correspond to the 
infinitesimal actions of a Lie algebra, we should ask 
the derivations to be inner so that functions are 
associated to the elements of the Lie algebra. 

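A derivation $D \in \mathrm{Der}_{\mathbb R[[\nu]]}(M, *)$ is said to be essentially inner
or Hamiltonian if $D = (1/\nu)\,\mathrm{ad}_*\, u$ for some $u \in C^\infty(M)[[\nu]]$. We
call an action of a Lie group almost *-Hamiltonian if each $D_\xi$ is essentially inner; this
is equivalent to the knowledge of a linear map
$\lambda: \mathfrak g \to C^\infty(M)[[\nu]]$, $\xi \mapsto \lambda_\xi$, so that
$\mathrm{ad}_*\,(1/\nu)[\lambda_\xi, \lambda_\eta]_* = \mathrm{ad}_*\, \lambda_{[\xi,\eta]}$.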

We say the action is *-Hamiltonian if $\lambda_\xi$ can be chosen to make

$\mathfrak g \to C^\infty(M)[[\nu]]$, $\xi \mapsto \lambda_\xi$
a homomorphism of Lie algebras, where $C^\infty(M)[[\nu]]$ is endowed with the bracket
$(1/\nu)[\,,]_*$. Such a homomorphism is called a quantization in Arnal et al. (1983) and is
called a generalized moment map in Bordemann et al. (1998).

When a map $\mu: \mathfrak g \to C^\infty(M)$, $\xi \mapsto \mu_\xi$, is a generalized
moment map, that is,


$\frac{1}{\nu}\big(\mu_\xi * \mu_\eta - \mu_\eta * \mu_\xi\big) = \mu_{[\xi,\eta]}$


the star product is said to be covariant under $\mathfrak g$.

When a map $\mu: \mathfrak g \to C^\infty(M)[[\nu]]$ is a generalized moment map, so that
$D_\xi$ has no terms in $\nu$ of degree > 0, thus $D_\xi = \xi^*$, this map is called a
quantum moment map (Xu 1998). Clearly in that situation, the star product is invariant under
the action of $\mathfrak g$ on M.

Covariant star products have been considered to study the representation theory of some
classes of Lie groups in terms of star products. In particular, an autonomous star
formulation of the theory of representations of nilpotent Lie groups has been given by Arnal
and Cortet (1984, 1985).

Consider a differential star product $*$ on a symplectic manifold, admitting an algebra
$\mathfrak g$ of vector fields on M consisting of derivations of $*$, and assume there is a
symplectic connection $\nabla$ which is invariant under $\mathfrak g$; then $*$ is
equivalent, through an equivariant equivalence (T with $\mathcal L_X T = 0$), to a Fedosov
star product $*_{\nabla,\Omega}$; this yields a classification of such invariant star
products (Bertelson et al. 1998).


Proposition 14 (Kravchenko, Gutt and Rawnsley, Müller-Bahns, Neumaier, and Hamachi).
Consider a Fedosov star product $*_{\nabla,\Omega}$ on a symplectic manifold. A vector field
X is a derivation of $*_{\nabla,\Omega}$ if and only if $\mathcal L_X \omega = 0$,
$\mathcal L_X \Omega = 0$, and $\mathcal L_X \nabla = 0$. A vector field X is an inner
derivation of $* = *_{\nabla,\Omega}$ if and only if $\mathcal L_X \nabla = 0$ and there
exists a series of functions $\lambda_X$ such that


$i(X)\omega - i(X)\Omega = d\lambda_X$


In this case $X(u) = (1/\nu)(\mathrm{ad}_*\, \lambda_X)(u)$.


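On a symplectic manifold $(M,\omega)$, a vector field X is an inner derivation of the
natural star product $* = *_{\nabla,\Omega,E}$ if and only if $\mathcal L_X \nabla = 0$,
$\mathcal L_X E = 0$, and there exists a series of functions $\lambda_X$ such that


$i(X)\omega - i(X)\Omega = d\lambda_X$


Then $X = (1/\nu)\,\mathrm{ad}_*\, \mu_X$ with $\mu_X = \mathrm{Exp}(E^{-1})\lambda_X$.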

Let G be a compact Lie group of symplectomorphisms of $(M,\omega)$ and $\mathfrak g$ the
corresponding Lie algebra of symplectic vector fields on M. Consider a star product $*$ on M
which is invariant under G. The Lie algebra $\mathfrak g$ consists of inner derivations for
$*$ if and only if there exists a series of functions $\lambda_X$ and a representative
$(1/\nu)(\omega - \Omega)$ of the characteristic class of $*$ such that
$i(X)\omega - i(X)\Omega = d\lambda_X$.

Star products which are invariant and covariant 
are used in the problem of reduction: this is a 
device in symplectic geometry which allows one to 
reduce the number of variables. An important 
issue in quantization is to know if and how 
quantization commutes with reduction. This problem
has been studied by Fedosov for the action of
a compact group on the particular star products
constructed by him with trivial characteristic class
($*_{\nabla,0}$). Here, one indeed obtains some
“quantization commutes with reduction” statements. More
generally, Bordemann, Herbig, and Waldmann 
considered covariant star products. In this case, 
one can construct a classical and quantum BRST 
complex whose cohomology describes the algebra of 
observables for the reduced system. While this is 
well known classically — at least under some 
regularity assumptions on the group action — for 
the quantized situation, the nontrivial question is 
whether the quantum BRST cohomology is “as large 
as” the classical one. Clearly, from the physical 
point of view, this is crucial. It turns out that 
whereas for strongly invariant star products one 
indeed obtains a quantization of the reduced phase 
space, in general the quantum BRST cohomology 
might be too small. More general situations of 
reduction have also been discussed by, for example, 
Bordemann as well as Cattaneo and Felder, when a 
coisotropic (i.e., first class) constraint manifold is 
given. 


Convergence of Some Star Products 
on a Subclass of Functions 


Let (M, P) be a Poisson manifold and let $*$ be a differential star product on it with 1
acting as the identity. Observe that if there exists a value k of $\nu$ such that


$u *_k v = \sum_{r=0}^{\infty} k^r C_r(u,v)$


converges (for the pointwise convergence of functions), for all $u,v \in C^\infty(M)$, to
$F_k(u,v)$ in such a way that $F_k$ is associative, then $F_k(u,v) = uv$. This is easy to see
as the order of differentiation in the $C_r$ necessarily is at least r in each argument and
thus the Borel lemma immediately gives the result. So assuming “too much” convergence kills
all deformations. On the other hand, in any physical situation, one needs some convergence
properties to be able to compute the spectrum of quantum observables in terms of a star
product (as in Bayen et al. 1978).

In the example of Moyal star product on the 
symplectic vector space (R*”,w), the formal formula 


(u xm v) (2) = exp (5 P” Ax de) (uwo) 





obviously converges when u and v are polynomials. 
On the other hand, there is an integral formula for
the Moyal star product given by

$$(u * v)(\xi) = (\pi\hbar)^{-2n} \int u(\xi') v(\xi'') \exp\left(\frac{2i}{\hbar}\left(\omega(\xi, \xi'') + \omega(\xi'', \xi') + \omega(\xi', \xi)\right)\right) d\xi' \, d\xi''$$

and this product $*$ gives a structure of associative
algebra on the space of rapidly decreasing functions
$\mathcal{S}(\mathbb{R}^{2n})$. The formal formula converges (for $\nu = i\hbar$) in
the topology of $\mathcal{S}'$ for $u$ and $v$ with compactly
supported Fourier transform.
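The termination of the formal Moyal series on polynomials can be checked by hand in the simplest case $\mathbb{R}^2$ with coordinates $(q, p)$. The following is a minimal sketch, not taken from the source: it assumes the sign convention $\{q, p\} = 1$ for the Poisson bracket, stores a polynomial in $(q, p, \nu)$ as a dictionary of exponents, and uses the hypothetical helper names `diff`, `mul`, `degree`, and `moyal`.

```python
from fractions import Fraction
from math import comb, factorial

# A polynomial in (q, p, nu) is stored as {(a, b, c): coeff},
# meaning coeff * q^a * p^b * nu^c.

def diff(poly, var, times):
    """Differentiate `times` times with respect to q (var=0) or p (var=1)."""
    out = {}
    for exps, coeff in poly.items():
        e = exps[var]
        if e < times:
            continue  # this monomial is killed by the derivative
        factor = 1
        for j in range(times):
            factor *= e - j
        new = list(exps)
        new[var] = e - times
        key = tuple(new)
        out[key] = out.get(key, 0) + coeff * factor
    return out

def mul(u, v):
    """Pointwise (commutative) product of two polynomials."""
    out = {}
    for (a1, b1, c1), x in u.items():
        for (a2, b2, c2), y in v.items():
            key = (a1 + a2, b1 + b2, c1 + c2)
            out[key] = out.get(key, 0) + x * y
    return out

def degree(poly):
    """Total degree in (q, p); powers of nu are not counted."""
    return max((a + b for (a, b, _c) in poly), default=0)

def moyal(u, v):
    """Moyal product: sum over r of (nu/2)^r / r! * P^r(u, v), assuming {q, p} = 1.

    The r-th term differentiates each argument r times, so the sum stops
    at the smaller of the two total degrees.
    """
    result = {}
    for r in range(min(degree(u), degree(v)) + 1):
        prefactor = Fraction(1, 2**r * factorial(r))
        for k in range(r + 1):
            du = diff(diff(u, 0, r - k), 1, k)
            dv = diff(diff(v, 1, r - k), 0, k)
            weight = prefactor * comb(r, k) * (-1) ** k
            for (a, b, c), coeff in mul(du, dv).items():
                key = (a, b, c + r)
                result[key] = result.get(key, 0) + weight * coeff
    return {key: c for key, c in result.items() if c != 0}

Q = {(1, 0, 0): 1}  # the coordinate function q
P = {(0, 1, 0): 1}  # the coordinate function p

print(moyal(Q, P))
print(moyal(P, Q))
```

With this convention the two printed products differ exactly by the term $\nu$, that is, $q *_M p - p *_M q = \nu$, and every product of polynomials is again a polynomial in $q$, $p$, and $\nu$.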

Several works address the convergence of star
products:


• The method of quantization of Kähler manifolds
due to Berezin, as the inverse of taking symbols of
operators, has been used to construct on Hermitian
symmetric spaces star products which are convergent
on a large class of functions on the manifold
(Moreno, Cahen, Gutt, and Rawnsley, Karabegov,
Schlichenmaier).

• The constructions of operator representations of
star products (Fedosov, Bordemann, Neumaier,
and Waldmann).

• The work of Rieffel and the notion of strict
deformation quantization. Examples of strict (Fréchet)
quantization have been given by Omori,
Maeda, Miyazaki, and Yoshioka, and by Bieliavsky.


Convergence of Berezin-Type Star Products 
on Hermitian Symmetric Spaces 


The method to construct a star product involves 
making a correspondence between operators and 
functions using coherent states, transferring the 
operator composition to the symbols, introducing a
suitable parameter into this Berezin composition of 
symbols, taking the asymptotic expansion in this 
parameter on a large algebra of functions, and then 
showing that the coefficients of this expansion 
satisfy the cocycle conditions to define a star 
product on the smooth functions (Cahen et al. 
1995). The idea of an asymptotic expansion appears 
in Berezin (1975) and in Moreno and Ortega- 
Navarro (1983, 1986). 

This asymptotic expansion exists for compact M, 
and defines an associative multiplication on formal 
power series in $k^{-1}$ with coefficients in $C^\infty(M)$ for
compact coadjoint orbits. For M a Hermitian 
symmetric space of compact type and more gener- 
ally for compact coadjoint orbits (i.e., flag mani- 
folds), this formal power series converges on the 
space of symbols (Karabegov 1998). 

For general Hermitian symmetric spaces of non- 
compact type, using their realization as bounded 
domains, one defines an analogous algebra of 
symbols of polynomial differential operators. 

Reshetikhin and Takhtajan have constructed an 
associative formal star product given by an asymptotic
expansion on any Kähler manifold. This they
do in two steps, first building an associative product 
for which 1 is not a unit element, then passing to a 
star product. 

We denote by $(L, \nabla, h)$ a quantization bundle for
the Kähler manifold $(M, \omega, J)$ (i.e., a holomorphic
line bundle $L$ with connection $\nabla$ admitting an
invariant Hermitian structure $h$, such that the
curvature is $\mathrm{curv}(\nabla) = -2i\pi\omega$). We denote by $\mathcal{H}$ the
Hilbert space of square-integrable holomorphic
sections of $L$, which we assume to be nontrivial.
The coherent states are vectors $e_q \in \mathcal{H}$ such that

$$s(x) = \langle s, e_q \rangle q, \qquad \forall q \in \mathcal{L}_x,\ x \in M,\ s \in \mathcal{H}$$

($\mathcal{L}$ is the complement of the zero section in $L$). The
function $\epsilon(x) = |q|^2 \|e_q\|^2$, $q \in \mathcal{L}_x$, is well defined and
real analytic.


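Let $A: \mathcal{H} \to \mathcal{H}$ be a bounded linear operator
and let

$$\hat{A}(x) = \frac{\langle A e_q, e_q \rangle}{\langle e_q, e_q \rangle}, \qquad q \in \mathcal{L}_x,\ x \in M$$

be its symbol. The function $\hat{A}$ has an analytic
continuation to an open neighborhood of the
diagonal in $M \times M$ given by

$$\hat{A}(x, y) = \frac{\langle A e_{q'}, e_q \rangle}{\langle e_{q'}, e_q \rangle}, \qquad q \in \mathcal{L}_x,\ q' \in \mathcal{L}_y,$$

which is holomorphic in $x$ and antiholomorphic in $y$.
We denote by $\hat{E}(L)$ the space of symbols of bounded
operators on $\mathcal{H}$. We can extend this definition of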
symbols to some unbounded operators provided 
everything is well defined. 

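The composition of operators on $\mathcal{H}$ gives rise to an
associative product $*$ for the corresponding symbols:

$$(\hat{A} * \hat{B})(x) = \int_M \hat{A}(x, y) \hat{B}(y, x) \psi(x, y) \epsilon(y) \frac{\omega^n(y)}{n!}$$

where

$$\psi(x, y) = \frac{|\langle e_{q'}, e_q \rangle|^2}{\|e_{q'}\|^2 \|e_q\|^2}, \qquad q \in \mathcal{L}_x,\ q' \in \mathcal{L}_y,$$

is a globally defined real analytic function on
$M \times M$ provided $\epsilon$ has no zeros ($\psi(x, y) \le 1$ everywhere,
with equality where the lines spanned by $e_q$
and $e_{q'}$ coincide).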

Let $k$ be a positive integer. The bundle $(L^k = \otimes^k L,
\nabla^k, h^k)$ is a quantization bundle for $(M, k\omega, J)$ and
we denote by $\mathcal{H}^k$ the corresponding space of
holomorphic sections and by $\hat{E}(L^k)$ the space of
symbols of linear operators on $\mathcal{H}^k$. We let $\epsilon^{(k)}$ be the
corresponding function. We say that the quantization
is regular if $\epsilon^{(k)}$ is a nonzero constant for all
non-negative $k$ and if $\psi(x, y) = 1$ implies $x = y$.
(Remark that if the quantization is homogeneous,
all $\epsilon^{(k)}$ are constants.)


Theorem 15 (Cahen et al.). Let $(M, \omega, J)$ be a
Kähler manifold and $(L, \nabla, h)$ be a regular quantization
bundle over $M$. Let $A, B$ be in $\mathcal{B}$, where
$\mathcal{B} \subset C^\infty(M)$ consists of functions $f$ which have an
analytic continuation in $M \times M$ so that $f(x, y)\psi(x, y)^l$
is globally defined, smooth and bounded on
$K \times M$ and $M \times K$ for each compact subset $K$ of $M$
for some positive power $l$. Then

$$(A *_k B)(x) = \int_M A(x, y) B(y, x) \psi^k(x, y) \epsilon^{(k)} \frac{k^n \omega^n(y)}{n!},$$

defined for $k$ sufficiently large, admits an asymptotic
expansion in $k^{-1}$ as $k \to \infty$

$$(A *_k B)(x) \sim \sum_{r \ge 0} k^{-r} C_r(A, B)(x)$$

and the cochains $C_r$ are smooth bidifferential
operators, invariant under the automorphisms of
the quantization and determined by the geometry
alone. Furthermore, $C_0(A, B) = AB$ and $C_1(A, B) -
C_1(B, A) = (i/\pi)\{A, B\}$.

If $M$ is a flag manifold, this defines a star product
on $C^\infty(M)$ and the $*_k$ product of two symbols is
convergent (it is a rational function of $k$ without
pole at infinity) (cf. Karabegov in that generality).

If $\mathcal{D}$ is a bounded symmetric domain and $\mathcal{E}$ the
algebra of symbols of polynomial differential operators
on a homogeneous holomorphic line bundle $L$
over $\mathcal{D}$ which gives a realization of a holomorphic
discrete series representation of $G_0$, then for $f$ and $g$
in $\mathcal{E}$ the Berezin product $f *_k g$ has an asymptotic
expansion in powers of $k^{-1}$ which converges to a
rational function of $k$. The coefficients of the
asymptotic expansion are bidifferential operators
which define an invariant and covariant star product
on $C^\infty(\mathcal{D})$.


Star Products on Cotangent Bundles 


Since from the physical point of view cotangent
bundles $\pi: T^*Q \to Q$ over some configuration space
$Q$, endowed with their canonical symplectic structure
$\omega_0$, are one of the most important phase spaces,
any quantization scheme should be tested and
exemplified for this class of classical mechanical
systems.

We first recall that on $T^*Q$ there is a canonical
vector field $\xi$, the Euler or Liouville vector field,
which is locally given by $\xi = p_k(\partial/\partial p_k)$. Here and in
the following, we use local bundle coordinates
$(q^k, p_k)$ induced by local coordinates $x^k$ on $Q$.
Using $\xi$ we can characterize those functions
$f \in C^\infty(T^*Q)$ which are polynomial in the fibers of
degree $k$ by $\mathcal{L}_\xi f = kf$. They are denoted by $\mathrm{Pol}^k(T^*Q)$,
whereas $\mathrm{Pol}^\bullet(T^*Q)$ denotes the subalgebra of all
functions which are polynomial in the fibers.
Clearly, most of the physically relevant observables
such as the kinetic energy, potentials, and generators
of point transformations are in $\mathrm{Pol}^\bullet(T^*Q)$. Moreover,
$\mathrm{Pol}^\bullet(T^*Q)$ is a Poisson subalgebra with

$$\{\mathrm{Pol}^k(T^*Q), \mathrm{Pol}^l(T^*Q)\} \subseteq \mathrm{Pol}^{k+l-1}(T^*Q) \qquad [5]$$

since $\mathcal{L}_\xi \omega_0 = \omega_0$, that is, $\xi$ is conformally symplectic.


All this suggests that for a quantization of $T^*Q$,
the polynomials $\mathrm{Pol}^\bullet(T^*Q)$ should play a crucial
role. In deformation quantization this is accomplished
by the notion of a homogeneous star product
(De Wilde and Lecomte 1983). If the operator

$$\mathcal{H} = \mathcal{L}_\xi + \nu \frac{\partial}{\partial \nu}$$

is a derivation of a formal star product $*$, then $*$ is
called homogeneous. It immediately follows that
$\mathrm{Pol}^\bullet(T^*Q)[\nu] \subseteq C^\infty(T^*Q)[[\nu]]$ is a subalgebra over
the ring $\mathbb{C}[\nu]$ of polynomials in $\nu$. Hence for
homogeneous star products, the question of convergence
(in general quite delicate) has a simple answer.

Let us now describe a simple construction of a
homogeneous star product (following Bordemann
et al. (1998)). We choose a torsion-free connection
$\nabla$ on $Q$ and consider the operator of the symmetrized
covariant derivative, locally given by

$$D = dx^k \vee \nabla_{\partial/\partial x^k}: \Gamma^\infty(S^k T^*Q) \to \Gamma^\infty(S^{k+1} T^*Q) \qquad [7]$$

Clearly, $D$ is a global object and a derivation of the
symmetric algebra $\bigoplus_{k=0}^{\infty} \Gamma^\infty(S^k T^*Q)$. Let now
$f \in \mathrm{Pol}^\bullet(T^*Q)$ and $\psi \in C^\infty(Q)$ be given. Then one
defines the standard-ordered quantization $\varrho_{\mathrm{Std}}(f)$ of $f$
with respect to $\nabla$ to be the differential operator
$\varrho_{\mathrm{Std}}(f): C^\infty(Q) \to C^\infty(Q)$ locally given by








(0 EE ENP 
af) G) o [8] 


where $i_s$ denotes the symmetric insertion of vector
fields in symmetric forms. Again, this is independent
of the coordinate system $x^k$. The infinite sum is
actually finite as long as $f \in \mathrm{Pol}^\bullet(T^*Q)$, whence we
can safely set $\nu = i\hbar$ in this case. Indeed, [8] is the
well-known symbol calculus for differential operators
and it establishes a linear bijection

$$\varrho_{\mathrm{Std}}: \mathrm{Pol}^\bullet(T^*Q) \to \mathrm{DiffOp}(C^\infty(Q)) \qquad [9]$$

which generalizes the usual canonical quantization
in the flat case of $T^*Q = T^*\mathbb{R}^n = \mathbb{R}^{2n}$. Using this
linear bijection, we can define a new product $*_{\mathrm{Std}}$ for
$\mathrm{Pol}^\bullet(T^*Q)$ by

$$f *_{\mathrm{Std}} g = \varrho_{\mathrm{Std}}^{-1}\left(\varrho_{\mathrm{Std}}(f)\varrho_{\mathrm{Std}}(g)\right) = \sum_{r=0}^{\infty} \nu^r C_r(f, g) \qquad [10]$$

It is now easy to see that $*_{\mathrm{Std}}$ fulfills all requirements
of a homogeneous star product except for the fact
that the $C_r(\cdot, \cdot)$ are bidifferential. In this approach
this is far from being obvious as we only worked
with functions polynomial in the fibers so far.
Nevertheless, it is true, whence $*_{\mathrm{Std}}$ indeed defines a
star product for $C^\infty(T^*Q)[[\nu]]$.
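As a hedged illustration of how a product is read off from operator composition in [10], consider the flat case $Q = \mathbb{R}$ with coordinates $(q, p)$. The constant $c$ below is a placeholder for the $\nu$-dependent factor by which $\varrho_{\mathrm{Std}}(p)$ multiplies $\partial/\partial q$; its precise value is fixed by [8] and is not assumed here. The only assumption is the standard-ordering rule that functions of $q$ stand to the left of all derivatives.

```latex
\begin{align*}
\varrho_{\mathrm{Std}}(q)\psi &= q\psi, \qquad
\varrho_{\mathrm{Std}}(p)\psi = c\,\partial_q\psi, \qquad
\varrho_{\mathrm{Std}}(qp)\psi = c\,q\,\partial_q\psi, \\
\varrho_{\mathrm{Std}}(q)\varrho_{\mathrm{Std}}(p)\psi &= c\,q\,\partial_q\psi
  = \varrho_{\mathrm{Std}}(qp)\psi
  \quad\Longrightarrow\quad q *_{\mathrm{Std}} p = qp, \\
\varrho_{\mathrm{Std}}(p)\varrho_{\mathrm{Std}}(q)\psi &= c\,\partial_q(q\psi)
  = c\,\psi + c\,q\,\partial_q\psi
  = \varrho_{\mathrm{Std}}(qp + c)\psi
  \quad\Longrightarrow\quad p *_{\mathrm{Std}} q = qp + c.
\end{align*}
```

The sums terminate after finitely many terms, as stated above for functions polynomial in the fibers, and the commutator $p *_{\mathrm{Std}} q - q *_{\mathrm{Std}} p = c$ is proportional to the Poisson bracket of $p$ and $q$.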

In fact, there is a different characterization of $*_{\mathrm{Std}}$
using a slightly modified Fedosov construction: first
one uses $\nabla$ to define a torsion-free symplectic
connection on $T^*Q$ by a fairly standard lifting.
Moreover, using $\nabla$ one can define a standard-ordered
fiberwise product $\circ_{\mathrm{Std}}$ for the formal Weyl
algebra bundle over $T^*Q$, being the starting point of
the Fedosov construction of star products. With
these two ingredients one finally obtains $*_{\mathrm{Std}}$ from
the Fedosov construction with the big advantage
that now the order of differentiation in the $C_r$ can
easily be determined to be $r$ in each argument,
whence $*_{\mathrm{Std}}$ is even a natural star product. Moreover,
$C_r$ differentiates the first argument only in
momentum directions, which reflects the standard
ordering.

Already in the flat situation the standard ordering
is not an appropriate quantization scheme from the
physical point of view as it maps real-valued
functions to differential operators which are not
symmetric in general. To pose this question in a
geometric framework, we have to specify a positive
density $\mu \in \Gamma^\infty(|\Lambda^n| T^*Q)$ on the configuration space
$Q$ first, as for functions there is no invariant
meaning of integration. Specifying $\mu$ we can consider
the pre-Hilbert space $C_0^\infty(Q)$ with inner
product

$$\langle \phi, \psi \rangle = \int_Q \bar{\phi} \psi \, \mu \qquad [11]$$


Now the adjoint with respect to [11] of $\varrho_{\mathrm{Std}}(f)$ can
be computed explicitly. We first consider the
second-order differential operator

$$\Delta = \frac{\partial^2}{\partial q^k \partial p_k} + p_k \Gamma^k_{lm} \frac{\partial^2}{\partial p_l \partial p_m} + \Gamma^k_{kl} \frac{\partial}{\partial p_l} \qquad [12]$$

where $\Gamma^k_{lm}$ are the Christoffel symbols of $\nabla$. In fact,
$\Delta$ is defined independently of the coordinates and
coincides with the Laplacian of the pseudo-Riemannian
metric on $T^*Q$ which is obtained from
the natural pairing of vertical and horizontal spaces
defined by using $\nabla$. Moreover, we need the 1-form
$\alpha$ defined by $\nabla_X \mu = \alpha(X)\mu$ and the corresponding
vertical vector field $\alpha^v \in \Gamma^\infty(T(T^*Q))$ locally given
by $\alpha^v = \alpha_k(\partial/\partial p_k)$. Then


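$$\varrho_{\mathrm{Std}}(f)^\dagger = \varrho_{\mathrm{Std}}(N^2 \bar{f}), \qquad N = \exp\left(\frac{\nu}{2}(\Delta + \alpha^v)\right) \qquad [13]$$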


Note that due to the curvature contributions, this
statement is a highly nontrivial partial integration
compared to the flat case. Note also that for
$f \in \mathrm{Pol}^\bullet(T^*Q)[\nu]$, we have $Nf \in \mathrm{Pol}^\bullet(T^*Q)[\nu]$ as
well, and $N$ commutes with $\mathcal{H}$. As in the flat case
this allows one to define a Weyl-ordered quantization
by

$$\varrho_{\mathrm{Weyl}}(f) = \varrho_{\mathrm{Std}}(Nf) \qquad [14]$$

together with a so-called Weyl-ordered star product

$$f *_{\mathrm{Weyl}} g = N^{-1}(Nf *_{\mathrm{Std}} Ng) \qquad [15]$$

which is now a Hermitian and homogeneous star
product such that $\varrho_{\mathrm{Weyl}}$ becomes a $*$-representation of
$*_{\mathrm{Weyl}}$, that is, we have $\varrho_{\mathrm{Weyl}}(f *_{\mathrm{Weyl}} g) = \varrho_{\mathrm{Weyl}}(f)
\varrho_{\mathrm{Weyl}}(g)$ and $\varrho_{\mathrm{Weyl}}(f)^\dagger = \varrho_{\mathrm{Weyl}}(\bar{f})$. Note that in the
flat case this is precisely the Moyal star product $*_M$
from [1].

The star products $*_{\mathrm{Std}}$ and $*_{\mathrm{Weyl}}$ have been
extensively studied by Bordemann, Neumaier,
Pflaum, and Waldmann and now provide a well-understood
quantization on cotangent bundles. We
summarize a few highlights of this theory:


1. In the particular case of a Levi-Civita connection
$\nabla$ for some Riemannian metric $g$ and the
corresponding volume density $\mu_g$, the 1-form
$\alpha$ vanishes. This simplifies the operator $N$ and
describes the physically most interesting situation.

2. If the configuration space is a Lie group $G$, then its
cotangent bundle $T^*G \cong G \times \mathfrak{g}^*$ is trivial by using,
for example, left-invariant 1-forms. In this case the
star products $*_{\mathrm{Weyl}}$ and $*_{\mathrm{Std}}$ restrict to the CBH star
product on $\mathfrak{g}^*$. Moreover, $*_{\mathrm{Weyl}}$ coincides with the
star product found by Gutt (1983) on $T^*G$.

3. Using the operator $N$ one can interpolate between
the two different ordering descriptions $\varrho_{\mathrm{Std}}$ and
$\varrho_{\mathrm{Weyl}}$ by inserting an additional ordering parameter
$\kappa$ in the exponent, that is, $N_\kappa = \exp(\nu\kappa(\Delta + \alpha^v))$.
Thus, one obtains $\kappa$-ordered representations $\varrho_\kappa$
together with corresponding $\kappa$-ordered star products
$*_\kappa$, where $\kappa = 0$ corresponds to standard
ordering and $\kappa = 1/2$ corresponds to Weyl ordering.
For $\kappa = 1$, one obtains antistandard ordering
and in general one has the relation $\overline{f *_\kappa g} = \bar{g} *_{1-\kappa} \bar{f}$
as well as $\varrho_\kappa(f)^\dagger = \varrho_{1-\kappa}(\bar{f})$.

4. One can also describe the quantization of an
electrically charged particle moving in a magnetic
background field $B$. This is modeled by a closed
2-form $B \in \Gamma^\infty(\Lambda^2 T^*Q)$ on $Q$. Using local vector
potentials $A \in \Gamma^\infty(T^*Q)$ with $B = dA$ locally, and
by minimal coupling, one obtains a star product
$*_B$ which depends only on $B$ and not on the local
potentials $A$. It will be equivalent to $*_{\mathrm{Weyl}}$ if and
only if $B$ is exact. In general, its characteristic
class is, up to a factor, given by the class $[B]$ of
the magnetic field $B$. While the observable
algebra always exists, a Schrödinger-like representation
of $*_B$ only exists if $B$ satisfies the usual
integrality condition. In this case, there exists a
representation on sections of a line bundle whose
first Chern class is given by $[B]$. This manifests
Dirac’s quantization condition for magnetic
charges in deformation quantization. Another
equivalent interpretation of this result is obtained
by Morita theory: the star products $*_{\mathrm{Weyl}}$ and $*_B$
are Morita equivalent if and only if $B$ satisfies
Dirac’s integrality condition.

5. Analogously, one can determine the unitary 
equivalence classes of representations for a fixed, 
exact magnetic field B. It turns out that the 
representations depend on the choice of the global 
vector potential A and are unitarily equivalent if 
the difference between the two vector potentials 
satisfies an integrality condition known from the 
Aharonov-Bohm effect. This way, the Aharonov— 
Bohm effect can be formulated within the repre- 
sentation theory of deformation quantization. 

6. There are several variations of the representations
$\varrho_{\mathrm{Std}}$ and $\varrho_{\mathrm{Weyl}}$. In particular, one can
construct a representation on half-forms instead
of functions, thereby avoiding the choice of the
integration density $\mu$. Moreover, all the Weyl-ordered
representations can be understood as
GNS representations coming from a particular
positive functional, the Schrödinger functional.
For $\varrho_{\mathrm{Weyl}}$, this functional is just the integration
over the configuration space $Q$.

7. All the (formal) star products and their representations
can be understood as coming from formal
asymptotic expansions of integral formulas. From
this point of view, the formal representations and
star products are a particular kind of global
symbol calculus.

8. At least for a projectible Lagrangian submanifold
$L$ of $T^*Q$, one finds representations of the star
product algebras on the functions on L. This 
leads to explicit formulas for the WKB expansion 
corresponding to this Lagrangian submanifold. 

9. The relation between configuration space symme- 
tries, the corresponding phase-space reduction, 
and the reduced star products has been analyzed 
extensively by Kowalzig, Neumaier, and Pflaum. 


See also: Classical Matrices, Lie Bialgebras, and 
Poisson Lie Groups; Deformation Quantization; 
Deformation Quantization and Representation Theory; 
Deformation Theory; Fedosov Quantization; Operads. 
$\bar\partial$-Approach to Integrable Systems


Introduction


The $\bar\partial$-approach is one of the most generic methods
for constructing solutions of completely integrable
systems. Taking into account that most soliton
systems are represented as the compatibility condition
for a set of linear differential operators (Lax pairs,
zero-curvature representations, L-A-B Manakov
triples), it is sufficient to construct these operators.
Such compatible families can be defined by presenting
their common eigenfunctions. If it is possible to
show that some analytic constraints imply that a
function is a common eigenfunction of a family of
operators, solutions of the original nonlinear system are
also generated.

The main idea of the $\bar\partial$-method is to impose the
following analytic constraints: if $\lambda$ denotes the
spectral parameter and $x$ the physical variables,
then, for arbitrary fixed values $x$, the $\partial_{\bar\lambda}$ derivative
of the wave function is expressed as a linear
combination of the wave functions at other values
of $\lambda$ with $x$-independent coefficients. In specific
examples, this property is either derived from the
direct spectral transform or imposed a priori. Of
course, the specific realization of this scheme
depends critically on the nonlinear system.

The origin of the $\bar\partial$-method came from
the following observation. A solution of the
one-dimensional inverse-scattering problem (the
problem of reconstructing the potential from the discrete
spectrum and the scattering amplitude at positive
energies) for the one-dimensional time-independent
Schrödinger operator

$$L = -\partial_x^2 + u(x) \qquad [1]$$

was obtained by Gelfand, Levitan, and Marchenko
in the 1950s. It essentially used analytic continuation
of the wave function from real momenta to
complex ones. If the potential $u(x)$ decays
sufficiently fast as $|x| \to \infty$, then the eigenfunction
equation

$$L\psi(k, x) = k^2 \psi(k, x) \qquad [2]$$

has two solutions $\psi_+(k, x)$ and $\psi_-(k, x)$ such that

1. $\psi_\pm(k, x) = (1 + o(1))e^{ikx}$ as $x \to \pm\infty$.

2. The functions $\psi_+(k, x)$, $\psi_-(k, x)$ are holomorphic
in $k$ in the upper half-plane and the lower half-plane,
respectively.


Existence of analytic continuation to complex 
momenta is typical for one-dimensional systems. But 
in the multidimensional case the situation is different.
For example, wave functions for the multidimensional
Schrödinger operator constructed by
Faddeev are well defined for all complex momenta 
k, but they are nonholomorphic in k, and they 
become holomorphic only after restriction to some 
special one-dimensional subspaces. The last property 
was one of the key points in the Faddeev approach. 

Beals and Coifman (1981-82) and Ablowitz et al.
(1983) discovered that departure from holomorphicity
for multidimensional wave functions can be
interpreted as spectral data. Such spectral transforms
proved to be very natural and suit perfectly
the purposes of the soliton theory. Some other
famous methods, including the Riemann-Hilbert
problem, can be interpreted as special reductions of
the $\bar\partial$-method.


Nonlocal $\bar\partial$-Problem and Local $\bar\partial$-Problem


The most generic formulation of the $\bar\partial$-method is the
nonlocal $\bar\partial$-problem. Assume that the following data
are given:


1. A rational $n \times n$ matrix-valued normalization
function $\eta(\lambda)$.

2. An $n \times n$ matrix-valued function $g(\lambda, x)$ (it
describes the dynamics) such that

• $g(\lambda, x)$ depends on the spectral parameter $\lambda \in \mathbb{C}$
and “physical” variables $x = (x_1, \ldots, x_N)$;
the physical variables $x_k$ are either continuous
($x_k$ belongs to a domain in $\mathbb{R}$ or in $\mathbb{C}$) or
discrete ($x_k$ takes integer values);

• $g(\lambda, x)$ is analytic in $\lambda$, defined for all $\lambda \in \mathbb{C}$,
except for a finite number of singular points,
and is single valued; and

• $\det g(\lambda, x)$ has only a finite number of zeros.


For problems with continuous physical variables the
typical form of $g(\lambda, x)$ is $g(\lambda, x) = \exp\left(\sum_j x_j K_j(\lambda)\right)$,
where $K_j(\lambda)$ are meromorphic matrices, mutually
commuting for all $\lambda$. The discrete variables are
usually encoded in orders of poles and zeros.

3. An $n \times n$ matrix-valued function $R(\lambda, \mu)$, the
“generalized spectral data.” Usually, it is a regular
function of four real variables $\Re\lambda$, $\Im\lambda$, $\Re\mu$, $\Im\mu$. (We
write this as a function of two complex variables,
but we do not assume it to be holomorphic. It
would be more precise to write it as $R(\lambda, \bar\lambda, \mu, \bar\mu)$,
but to avoid long notations we omit the $\bar\lambda$, $\bar\mu$
dependence.) To avoid analytical complications,
the function $R(\lambda, \mu)$ is usually assumed to vanish
as $\lambda$ or $\mu$ tend to singular points of $\eta(\lambda)$, $g(\lambda, x)$.


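Then the wave function $\Psi$ is defined by the data
using the following properties:


1. $\Psi = \Psi(\lambda, x)$ takes values in complex $n \times n$
matrices:

$$\Psi(\lambda, x) = \begin{pmatrix} \Psi_{11}(\lambda, x) & \cdots & \Psi_{1n}(\lambda, x) \\ \vdots & \ddots & \vdots \\ \Psi_{n1}(\lambda, x) & \cdots & \Psi_{nn}(\lambda, x) \end{pmatrix} \qquad [3]$$

2. For all $\lambda \in \mathbb{C}$ outside the singular points of
$\eta(\lambda)$, $g(\lambda, x)$, the wave function $\Psi$ satisfies the
$\bar\partial$-equation of the inverse-spectral problem,

$$\frac{\partial \Psi(\lambda, x)}{\partial \bar\lambda} = \iint_{\mathbb{C}} d\mu \wedge d\bar\mu \, \Psi(\mu, x) R(\mu, \lambda) \qquad [4]$$

It is important that condition [4] is $x$-independent.

3. The function $\chi(\lambda, x) - \eta(\lambda)$, where $\chi(\lambda, x) =
\Psi(\lambda, x) g^{-1}(\lambda, x)$, is regular for all $\lambda \in \mathbb{C}$ and

$$\chi(\lambda, x) - \eta(\lambda) \to 0 \quad \text{as } |\lambda| \to \infty \qquad [5]$$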
The wave function $\Psi(\lambda, x)$ is calculated by
employing the data $\eta(\lambda)$, $g(\lambda, x)$, $R(\lambda, \mu)$ using the
following procedure. Taking into account that the
functions $\eta(\lambda)$, $g(\lambda, x)$ are holomorphic in $\lambda$, eqn [4]
can be rewritten in terms of $\chi(\lambda, x)$:

$$\frac{\partial}{\partial \bar\lambda}\left[\chi(\lambda, x) - \eta(\lambda)\right] = \iint_{\mathbb{C}} d\mu \wedge d\bar\mu \, \chi(\mu, x) g(\mu, x) R(\mu, \lambda) g^{-1}(\lambda, x) \qquad [6]$$

The right-hand side of [6] is regular; therefore, this
relation is valid for all complex $\lambda$ values.

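Equation [6] with the boundary condition [5] is
equivalent to the following integral equation:

$$\chi(\lambda, x) = \eta(\lambda) + \frac{1}{2\pi i} \iint_{\mathbb{C}} \frac{d\nu \wedge d\bar\nu}{\nu - \lambda} \iint_{\mathbb{C}} d\mu \wedge d\bar\mu \, \chi(\mu, x) g(\mu, x) R(\mu, \nu) g^{-1}(\nu, x) \qquad [7]$$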

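This equation can be derived using the generalized
Cauchy formula. Let $f(z)$ be a smooth (not necessarily
holomorphic) function in a bounded domain $D$
in the complex plane. Then

$$f(z) = \frac{1}{2\pi i} \oint_{\partial D} \frac{f(w)}{w - z} \, dw + \frac{1}{2\pi i} \iint_D \frac{\partial f(w)}{\partial \bar{w}} \frac{dw \wedge d\bar{w}}{w - z} \qquad [8]$$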


If the kernel $g(\lambda, x) R(\lambda, \mu) g^{-1}(\mu, x)$ is
sufficiently good (e.g., it is sufficient to assume that
$(1 + |\lambda|)^{\epsilon} g(\lambda, x) R(\lambda, \mu) g^{-1}(\mu, x) (1 + |\mu|)^{\epsilon}$, $\epsilon > 0$, is
a continuous function at both finite and infinite
points), then we have a Fredholm equation (the
operator on the right-hand side of [7] is compact).
If it has no unit eigenvalues, eqn [7] is uniquely 
solvable. But, for some values of x, one of the 
eigenvalues may become equal to 1, and it results 
in singularities of solutions. 
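As a hedged sanity check of the generalized Cauchy formula, written here in its standard form (boundary integral plus area integral, both with prefactor $1/2\pi i$), take the simplest nonholomorphic function $f(z) = \bar{z}$ on the unit disk $D$. The worked example is not from the source; it uses $dw \wedge d\bar{w} = -2i\,dA$ and the known value of the Cauchy transform of the disk, $\iint_D dA(w)/(w - z) = -\pi\bar{z}$ for $|z| < 1$.

```latex
\begin{align*}
\text{boundary term:}\quad
  \frac{1}{2\pi i}\oint_{|w|=1}\frac{\bar{w}}{w-z}\,dw
  &= \frac{1}{2\pi i}\oint_{|w|=1}\frac{dw}{w\,(w-z)}
   = -\frac{1}{z} + \frac{1}{z} = 0, \\
\text{area term:}\quad
  \frac{1}{2\pi i}\iint_{D}\frac{\partial \bar{w}}{\partial\bar{w}}\,
  \frac{dw\wedge d\bar{w}}{w-z}
  &= \frac{-2i}{2\pi i}\iint_{D}\frac{dA(w)}{w-z}
   = -\frac{1}{\pi}\,(-\pi\bar{z}) = \bar{z}.
\end{align*}
```

The boundary term uses $\bar{w} = 1/w$ on the unit circle and the residues at $w = 0$ and $w = z$ (for $0 < |z| < 1$). The sum of the two terms reproduces $f(z) = \bar{z}$: all of the function is carried by its $\bar\partial$-derivative, which is the mechanism that turns a $\bar\partial$-equation with decay at infinity into an integral equation.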

If the norm of the integral operator is smaller than
1, eqn [7] is uniquely solvable. To generate solutions
that are regular for all values of the physical variables, it
is natural to restrict the class of admissible spectral
data by assuming the kernel $g(\lambda, x) R(\lambda, \mu) g^{-1}(\mu, x)$
to be bounded in $x$ for all $\lambda$, $\mu$. In the scalar case
$n = 1$, this restriction implies:

$$R(\lambda, \mu) = 0 \quad \text{for all } \lambda, \mu \text{ such that } g(\lambda, x) g^{-1}(\mu, x) \text{ is unbounded in } x \qquad [9]$$
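The statement that a small operator norm gives unique solvability is the usual contraction argument: the solution is the limit of successive approximations. The sketch below is a finite-dimensional stand-in, not the integral equation itself; the matrix `K`, the vector `b`, and the function name `solve_fixed_point` are illustrative assumptions playing the roles of the discretized integral operator and of the normalization term.

```python
def solve_fixed_point(K, b, tol=1e-12, max_iter=1000):
    """Solve x = b + K x by the iteration x_{m+1} = b + K x_m.

    The iteration converges whenever some operator norm of K is below 1.
    """
    n = len(b)
    x = list(b)  # start from the inhomogeneous term
    for _ in range(max_iter):
        new = [b[i] + sum(K[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(new[i] - x[i]) for i in range(n)) < tol:
            return new
        x = new
    raise RuntimeError("iteration did not converge; is the norm of K below 1?")

# Toy data: the maximal absolute row sum of K is 0.3, so the iteration contracts.
K = [[0.2, 0.1],
     [0.0, 0.3]]
b = [1.0, 1.0]

x = solve_fixed_point(K, b)
print(x)  # both components approach 10/7
```

When the norm is not small the iteration may fail, which mirrors the remark above that an eigenvalue equal to 1 leads to singular solutions.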


For specific examples like the Kadomtsev—Petviashvili-II 
(KP-II), direct scattering transform automatically 
generates spectral data satisfying [9]. In KP-II, [9] 
implies 


R(A, u) = A) = u) + TASA- a) [10] 


The coefficient $A(\lambda)$ can be eliminated by multiplying
the wave function by an appropriate function
of $\lambda$; therefore, in standard texts, $A(\lambda) = 0$.

If for every $\lambda$ the kernel $R(\lambda, \mu)$ is equal to 0
everywhere except at a finite number of points
$\mu_1(\lambda), \ldots, \mu_p(\lambda)$, one has the so-called local
$\bar\partial$-problem. Such kernels are rather typical.


Examples of Soliton Systems Integrable
by the $\bar\partial$-Problem Method


Let us discuss some important examples. 


The KP-II Hierarchy 


The first nontrivial equation from the KP hierarchy
has the following form:

$$(u_t + 6uu_x - u_{xxx})_x = 3\alpha^2 u_{yy} \qquad [11]$$

From a physical point of view, the case of real $\alpha^2$ is
the most interesting one. Equation [11] is called
KP-I if $\alpha^2 = -1$ and KP-II if $\alpha^2 = +1$. The Lax pair
for KP-II reads:

$$[L, A] = 0$$

where

$$L = \partial_y - \partial_x^2 + u(x, y, t), \qquad A = \partial_t - 4\partial_x^3 + 6u(x, y, t)\partial_x + 3u_x(x, y, t) + 3w(x, y, t) \qquad [12]$$

The Cauchy problem for initial data $u(x, y, 0)$
decaying at infinity is solved by using the nonlocal
Riemann problem for KP-I and the local $\bar\partial$-problem for
KP-II. The wave function is assumed to be scalar
valued ($n = 1$), and the $\bar\partial$-equation [4] takes the following
form:

OW (A, X, Y, t) 


SOE = TAWA x,y, t) 13] 


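The wave function $\Psi(\lambda, x, y, t)$ is assumed to be
regular for finite $\lambda$'s and to have the following
essential singularity as $\lambda \to \infty$:

$$\Psi(\lambda, x, y, t) = \exp(\lambda x + \lambda^2 y + 4\lambda^3 t)(1 + o(1)) \qquad [14]$$

Equivalently, $\eta(\lambda) = 1$ and the function $g(\lambda, x, y, t)$ has
one essential singularity at $\lambda = \infty$,

$$g(\lambda, x, y, t) = \exp(\lambda x + \lambda^2 y + 4\lambda^3 t) \qquad [15]$$

Higher times $t_k$ from the KP hierarchy are
incorporated into this scheme by assuming that

$$g(\lambda, t) = \exp\left(\sum_{k=1}^{\infty} \lambda^k t_k\right) \qquad [16]$$

Here $x = t_1$, $y = t_2$, $t = t_3/4$.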
Equation [13] was originally derived (Ablowitz
et al. 1983) from the direct spectral transform. If the
potential $u(x, y)$ is sufficiently small and
$u(x, y) = O(1/(x^2 + y^2)^{1+\epsilon})$ for $x^2 + y^2 \to \infty$, then
the wave function $\Psi(\lambda, x, y)$ for the $L$-operator [12]

$$L\Psi(\lambda, x, y) = 0, \qquad \Psi(\lambda, x, y) = \exp(\lambda x + \lambda^2 y)[1 + o(1)] \quad \text{for } x^2 + y^2 \to \infty \qquad [17]$$

can be constructed by solving the following integral
equation for the pre-exponent $\chi(\lambda, x, y) = \Psi(\lambda, x, y)
\exp(-\lambda x - \lambda^2 y)$:

$$\chi(\lambda, x, y) = 1 - \iint G(\lambda, x - x', y - y') u(x', y') \chi(\lambda, x', y') \, dx' \, dy' \qquad [18]$$

where the Green function $G(\lambda, x, y)$ is given by


DxX+Pyy) 
G(A,x ,y)= 42 af dpx dpy [19] 


It is not holomorphic in A, but 


OG(A,x,y) ıl ~2iRAx—4iKASAY 

os 5, Sen(RA) ë [20] 
The nonholomorphicity of $G(\lambda, x, y)$ results in
the special nonholomorphicity of $\Psi(\lambda, x, y)$ of the
form [13].


Remark We see that one function of two real 
variables T(A) is sufficient to solve the Cauchy 
problem in the plane. But it is also possible to 
construct solutions of KP-II starting from generic 
nonlocal kernels R(A, u) (to guarantee at least local 
existence of solutions, it is enough to assume that 
R(A, u) is small and has finite support). It looks 
like a paradox, but the situation is exactly the 
same in the linear case. In the standard Fourier 
method, only exponents with real momenta are 
used, but local solutions can be constructed as 
combinations of exponents with arbitrary complex 
momenta. 


Novikov-Veselov Hierarchy and Two-Dimensional
Schrödinger Operator


Equations from this hierarchy admit representation
in terms of Manakov L-A-B triples,

$$\frac{\partial L}{\partial t_n} = [A_n, L] + B_n L \qquad [21]$$

where

$$L = -4\partial_z \partial_{\bar z} + u(z, t), \qquad A_n = 2^{2n+1}\left(\partial_z^{2n+1} + \partial_{\bar z}^{2n+1}\right) + \cdots \qquad [22]$$

The order of $B_n$ is smaller than $2n + 1$. In particular,
for $n = 1$,

$$A_1 = 8(\partial_z^3 + \partial_{\bar z}^3) + 2(w\partial_z + \bar{w}\partial_{\bar z}), \qquad B_1 = 2(w_z + \bar{w}_{\bar z}) \qquad [23]$$

$$u_t = 8\partial_z^3 u + 8\partial_{\bar z}^3 u + 2\partial_z(uw) + 2\partial_{\bar z}(u\bar{w}) \qquad [24]$$

where

$$\bar{u}(z, t) = u(z, t), \qquad \partial_{\bar z} w(z, t) = -3\partial_z u(z, t) \qquad [25]$$

This hierarchy is integrated using the scattering
transform at zero energy for the two-dimensional
Schrödinger operator $L$. If Cauchy data with
asymptotic

$$u(z) \to -E_0, \quad w(z) \to 0, \quad \text{for } |z| \to \infty \qquad [26]$$

is considered, the scattering transform for the
operator $\tilde{L} = L + E_0$ with the potential $\tilde{u}(z) = u(z) +
E_0$ at fixed energy $E_0$ and decaying at infinity is used.
In fact, the fixed-energy scattering problem is one of
the basic problems of mathematical physics, and the
Novikov-Veselov hierarchy can be treated as an
infinite-dimensional Abelian symmetry algebra for this
problem. The scattering transform essentially depends
on the sign of $E_0$. The case $E_0 = 0$, studied by Boiti,
Leon, Manna, and Pempinelli, is the most complicated
from the analytic point of view, and we do not
discuss it now.

If $E_0 < 0$, the wave function satisfies a pure local
$\bar\partial$-relation:


eee) = Tou (5, ) [27] 
OX À 
with n(à) = 1, and 
g(À,2) — eee Ay Ke — — Eo [28] 


Starting from generic spectral data $T(\lambda)$, one obtains
a fixed-energy eigenfunction for a second-order
operator,

$$\tilde{L}\Psi(\lambda, z) = E_0 \Psi(\lambda, z), \qquad \tilde{L} = -4\partial_z \partial_{\bar z} + V(z)\partial_{\bar z} + \tilde{u}(z) \qquad [29]$$

To generate pure potential operators ($V(z) = 0$), it is
necessary to impose additional symmetry constraints
on the spectral data (see the section “Reductions on
the $\bar\partial$-Data”).
If $E_0 > 0$, there are two types of generalized
spectral data: $\bar\partial$-data and nonlocal Riemann
problem data. The wave function satisfies the
$\bar\partial$-relation:


— = TO)Y(—5.2), JA| #1 [30] 


and has a jump at the unit circle $|\lambda| = 1$. The
boundary values $\Psi_\pm(\lambda, z) = \Psi(\lambda(1 \mp 0), z)$ are
connected by the following relation:


UW. (4.2) = U_(A,2) + f ROY- alde B1 


g(A, z) — een). k2 = Eo [32] 


Constraints on the spectral data associated with
pure potential operators were found by Novikov
and Grinevich for $R(\lambda, \mu)$, and by Manakov and
Grinevich for $T(\lambda)$. Existence of two different types
of generalized scattering data has a very transparent
physical meaning: there is a one-to-one correspondence
between the classical scattering amplitude at
energy $E_0$ and the nonlocal Riemann problem data
$R(\lambda, \mu)$. The $\bar\partial$-data $T(\lambda)$ can be treated as a
complete set of additional parameters enumerating
all potentials with a given scattering amplitude at
one energy.


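Davey-Stewartson-II and Ishimori-I Equations


The Davey-Stewartson-II (DS-II) equation

$$i\partial_t q + 2(\partial_z^2 + \partial_{\bar z}^2)q + (g + \bar{g})q = 0 \qquad [33]$$

$$\partial_{\bar z} g = -\kappa \partial_z |q|^2 \qquad [34]$$

$$q = q(z, t), \qquad g = g(z, t), \qquad z = x + iy \qquad [35]$$

can be treated as an integrable (2 + 1)-dimensional
extension of the nonlinear Schrödinger equation. The
Ishimori-I equation

$$\partial_t S + S \times (\partial_x^2 S + \partial_y^2 S) + \partial_x w \, \partial_y S + \partial_y w \, \partial_x S = 0 \qquad [36]$$

$$\partial_x^2 w - \partial_y^2 w + 2S \cdot (\partial_x S \times \partial_y S) = 0 \qquad [37]$$

$$S = S(x, y, t), \qquad S = (S_1, S_2, S_3), \qquad S^2 = 1 \qquad [38]$$

is an integrable (2 + 1)-dimensional extension of the
Heisenberg magnetic equation. Both systems are
integrated by using the following zero-curvature
representation: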


(i „Jeza a er [39] 
0 o; 2 \ Kq(z,t) 0 


210? + ig idz — iqdz 
—ikĝz +ikqð; —2i02 — ig 


The wave function satisfies the following “scatter- 
ing” equation: 


aA 0 0 
y! = y! [41] 
O Ok b(k) 0 


Here Y" denotes the transposed matrix. Let us point 
out the amazing symmetry between the direct and 
inverse transforms. 


Discrete Systems 


In the examples discussed above, continuous variables
are “encoded” in essential singularities of
$g(\lambda, x)$. Discrete variables correspond to orders of
zeros and poles. For example, assuming that the
function $g(\lambda, t)$ in the KP integration scheme
depends on extra continuous variables $t_{-1}, t_{-2}, \ldots,
t_{-n}, \ldots$ and a discrete variable $t_0 = n$,

$$g(\lambda, t) = \lambda^n \exp\left(\sum_{k \ne 0} \lambda^k t_k\right) \qquad [42]$$

one obtains solutions of the so-called two-dimensional
Toda-KP hierarchy.

Assume that we have a nonlocal $\bar\partial$-problem for a
scalar function with $\eta = 1$ and


k = Np 
g(A,m1,...,Mp) ===) [43] 


j=1 





The wave function defines a map Z£ > CN, 


(14,..., 7%) E (WA, 21,---5 Ak) -3 
WAN, 11, see ,Np)) [44] 


where $\lambda_1, \ldots, \lambda_N$ are some points in $\mathbb{C}$. This
construction generates the so-called quadrilateral
lattices (each two-dimensional face is planar).


Multidimensional Problems 


The $\bar\partial$-approach can also be applied to multidimensional
inverse-scattering problems, but typically the
scattering data are overdetermined and satisfy
additional nonlinear compatibility conditions. For
example, the Faddeev wave functions for the
$n$-dimensional stationary Schrödinger operator

$$\left(-\partial_{x_1}^2 - \cdots - \partial_{x_n}^2 + u(x)\right)\Psi(k, x) = (k \cdot k)\Psi(k, x) \qquad [45]$$

$$\Psi(k, x) = e^{ik \cdot x}(1 + o(1)) \qquad [46]$$

in the nonphysical domain $k_I \ne 0$ ($k_R$ and $k_I$ denote
the real and imaginary parts of $k$, respectively)
satisfy the following $\bar\partial$-equation:

OW(k, x) _ -21 | GPCR Re + EUR +E, 2) 


Ok; 
x 65-9 +2R-¢)déi -+ + dg, [47] 


The characterization of admissible spectral data
$h(k, l)$, $k \in \mathbb{C}^n$, $l \in \mathbb{R}^n$, is based on the following
compatibility equation:


Oh(k,l) 1əb(k, 1) 
Ok; 2 ol; 
= —2r ; &h(k, kr +6)b(R+ 6, !) 
ER” 
x S(E -E+ 2k- Ede +- dE, 48) 


More details can be found in Novikov and Henkin 
(1987). 


Reductions on the $\bar\partial$-Data


The most generic $\bar\partial$-data usually result in solutions from the wrong functional class (they may, e.g., be complex or singular), or extra constraints on the auxiliary linear operators are necessary to obtain solutions of the zero-curvature representation. For example, to obtain real KP-II solutions using the local $\bar\partial$-problem [13], the following reduction on the $\bar\partial$-data should be imposed:


TÀ) = -T(A) 49] 


It can be easily derived from the direct transform, but this is not always the case. For example, the selection of pure potential two-dimensional Schrödinger operators was originally not so evident. To formulate the answer, it is convenient to introduce a new function $b(\lambda)$, $T(\lambda) = b(\lambda)\,\pi\,\operatorname{sgn}(\lambda\bar\lambda - 1)/\bar\lambda$.

For $E_0 < 0$, the following constraints select real potential operators:


b(- =) =O. (>) -BA (50) 


In some situations, the problem of finding appropriate reductions is the most difficult part of the integration procedure. This is true not only for the $\bar\partial$-approach, but also for other techniques, including the finite-gap method. For example, the inverse-spectral transform for the two-dimensional Schrödinger operator was first developed for finite-gap (quasiperiodic) potentials and only later for the decaying ones. For operators with finite gap at one energy, the first-order terms were constructed by Dubrovin, Krichever, and Novikov in 1976, but the potentiality reduction was found only in 1984 by Novikov and Veselov.


Nonsingular Solutions 


As mentioned above, one can construct regular solutions by choosing sufficiently small (in an appropriate norm) scattering data. But for some special systems the regularity follows automatically from reality reductions. For example, for arbitrarily large $\bar\partial$-data, real KP-II solutions constructed by the local $\bar\partial$-problem [13] with reduction [49] are regular. The proof is based on the theory of generalized analytic functions (in the Vekua sense). Another example is the two-dimensional Schrödinger operator at a fixed negative energy $E_0 < 0$. The potentiality and reality constraints imply that the potential is nonsingular for arbitrarily large $T(\lambda)$. Unfortunately, the $\bar\partial$-problem with regular data covers only a part of the space of potentials. In fact, each such operator possesses a strictly positive real eigenfunction at the level $E_0$, exponentially growing in all directions (this also follows from the theory of generalized analytic functions). The existence of such a function implies that the whole discrete spectrum is located above the energy $E_0$, which gives a restriction on the potential. (For more details, see the review by Grinevich (2000).)


Some Explicit Solutions 


The generic $\bar\partial$-problem results in potentials that cannot be expressed in terms of elementary or standard special functions. But for degenerate kernels, a solution of the inverse-spectral problem can be written explicitly. For example, if

$$R(\lambda,\mu) = \sum_{k=1}^{N} r_k(\lambda)\,s_k(\mu) \qquad [51]$$

the wave function and solutions can be expressed in quadratures.

In particular, if all $r_k(\lambda)$ and $s_k(\mu)$ are $\delta$-functions, $r_k(\lambda)=R_k\,\delta(\lambda-\lambda_k)$ and $s_k(\mu)=S_k\,\delta(\mu-\mu_k)$, the wave function is rational in $\lambda$ and can be expressed as a rational combination of exponents of $x_k$. If $\lambda_k=\mu_l$ for some $k$ and $l$, this procedure needs some regularization. For example, it is possible to assume that $\delta(\lambda-\lambda_0)/(\lambda-\lambda_0)=0$.

If $\lambda_k=\mu_k$ for all $k$, the $\bar\partial$-problem generates solutions rational in $x$ (lumps). It is possible to show that the Novikov-Veselov real rational solitons for $E_0>0$ are always nonsingular, decay at $\infty$ as $1/(x^2+y^2)$, and the potential $u(z)$ has zero scattering in all directions for waves with energy $E_0$.


The $\bar\partial$-Problem on Riemann Surfaces


In all examples discussed above, the spectral variable is defined on the Riemann sphere. It is natural to generalize this by considering wave functions depending on a spectral parameter defined on a Riemann surface of higher genus. Spectral transforms of this type arise in the theory of localized perturbations of periodic solutions. Assume that the KP-II potential $u(x,y)$ has the form

$$u(x,y) = u_0(x,y) + u_1(x,y) \qquad [52]$$


where $u_0(x,y)$ is a real nonsingular finite-gap potential and $u_1(x,y)$ decays sufficiently fast at infinity. Denote by $\Psi_0(\gamma,x,y)$ the wave function of the operator $L_0 = \partial_y - \partial_x^2 + u_0(x,y)$, where $\gamma\in\Gamma$ and the spectral curve $\Gamma$ is a compact Riemann surface of genus $g$ with a distinguished point $\infty$. In addition to an essential singularity at the point $\infty$, the wave function $\Psi_0(\gamma,x,y)$ has $g$ simple poles at points $\gamma_1,\ldots,\gamma_g$ and is holomorphic in $\gamma$ outside these singular points. For a real nonsingular potential, $\Gamma$ is an M-curve, that is, there exists an antiholomorphic involution $\tau:\Gamma\to\Gamma$, $\tau\infty=\infty$, whose set of fixed points forms $g+1$ ovals $a_0,\ldots,a_g$, with $\infty\in a_0$ and $\gamma_k\in a_k$. The wave function $\Psi(\gamma,x,y)$ of the perturbed operator $L=\partial_y-\partial_x^2+u(x,y)$ is defined on the same spectral curve $\Gamma$, but it is no longer holomorphic. It has the following properties:


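1. At the point $\infty$, the wave function $\Psi(\gamma,x,y)$ has an essential singularity: $\Psi(\gamma,x,y) = \Psi_0(\gamma,x,y)(1+o(1))$.

2. In the neighborhoods of the points $\gamma_k$, $\Psi(\gamma,x,y)$ can be written as a product of a continuous function by a simple pole at $\gamma_k$.

3. The wave function $\Psi(\gamma,x,y)$ satisfies the $\bar\partial$-equation

$$\frac{\partial\Psi(\gamma,x,y,t)}{\partial\bar\gamma}\,d\bar\gamma = T(\gamma)\,\Psi(\tau\gamma,x,y,t) \qquad [53]$$

where the (0,1)-form $T(\gamma)=t(\gamma)\,d\bar\gamma$ is regular outside the divisor points $\gamma_k$, and in the neighborhood of $\gamma_k$ it is possible to define a local coordinate such that $t(\gamma) = \operatorname{sgn}(\operatorname{Im}\gamma)\,t_1(\gamma)\,(\bar\gamma-\bar\gamma_k)/(\gamma-\gamma_k)$, where $t_1(\gamma)$ is regular.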


A solution of the inverse problem can be obtained 
by using appropriate analogs of Cauchy kernels on 
Riemann surfaces. 


Quasiclassical Limit 


Systems integrable by the $\bar\partial$-method are usually integrable systems with high-order derivatives. It is well known that by applying some limiting procedures to integrable systems one can construct new completely integrable equations, but integration methods for these equations are based on completely different analytic tools. One of the most important examples is the theory of dispersionless hierarchies. The limiting procedure for the $\bar\partial$-problem (the quasiclassical $\bar\partial$-problem) was developed by Konopelchenko and collaborators. In the KP case, the quasiclassical limit of the wave function $\Psi(\lambda,t)$ is assumed to have the following form:


Tha, -) BEE exp (322) [54] 


It is possible to show that the function S(A, t) 
satisfies a Beltrami-type equation: 


a wag) [55] 





Or Or 


which is treated as a dispersionless limit of [13]. 
Higher-order corrections were also discussed in the 
literature (see Konopelchenko and Moro (2003)). 


See also: Boundary-Value Problems for Integrable 
Equations; Integrable Systems and Algebraic Geometry; 
Integrable Systems and the Inverse Scattering Method; 
Integrable Systems: Overview; Integrable Systems and 
Discrete Geometry; Riemann-Hilbert Methods in
Integrable Systems.
Introduction


In this article we shall briefly outline derived 
categories and their relevance for physics. Derived 
categories (and their enhancements) classify off-shell 
states in a two-dimensional topological field theory 
on Riemann surfaces with boundary known as the 
open-string B model. We briefly review pertinent 
aspects of that topological field theory and its 
relation to derived categories, the Bondal-Kapranov
enhancement and its relation to the open-string B
model, as well as B model twists of two-dimensional 
theories known as Landau-Ginzburg models, and 
how information concerning stability of D-branes is 
encoded in this language. We concentrate on more 
physical aspects of derived categories; for a very 
readable short review concentrating on the mathe- 
matics, see, for example, Thomas (2000). 


Sheaves and Derived Categories 
in the Open-String B Model 


Derived categories are mathematical constructions 
which are believed to be related to D-branes in the 
open-string B model. We shall begin by briefly 
reviewing the B model, as well as D-branes. 

The A and B models are two-dimensional topological field theories, closely related to nonlinear sigma models, which are supersymmetrizations of theories summing over maps from a Riemann surface (the world sheet of the string) into some “target space” $X$. In both the A and B models, one considers only certain special correlation functions, involving correlators closed under the action of a nilpotent scalar operator known as the “BRST operator,” $Q$, which is part of the original supersymmetry transformations. In considering the pertinent correlation functions, only certain types of maps contribute. The A model has the properties of
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Nachman AJ and Ablowitz MJ (1984) A multidimensional 
inverse-scattering method. Studies in Applied Mathematics 
71(3): 243-250. 

Novikov RG and Henkin GM (1987) The 0-equation in the 
multidimensional inverse scattering problem. Russian Mathe- 
being invariant under complex structure deformations of the target space $X$, and its pertinent correlation functions are computed by summing over holomorphic maps into the target $X$. The A model will not be relevant for us here. The B model has the property of being invariant under changes of the Kähler moduli of the target $X$, and its pertinent correlation functions are computed by summing over constant maps into the target $X$. In the closed-string B model, the states of the theory are counted by the cohomology groups $H^{*}(X,\Lambda^{*}TX)$, where $X$ is constrained to be Calabi-Yau. The BRST operator $Q$ in the B model can be identified with $\bar\partial$ for many purposes. The open-string B model is the same topological field theory, but now defined on a Riemann surface with boundaries. As with all open-string theories, we specify boundary conditions on the fields, which force the ends of the string to live on some submanifold of the target, and we associate to the boundaries degrees of freedom (known as the Chan-Paton factors) which describe a (possibly twisted) vector bundle over the submanifold. In the case of the B model, the submanifold is a complex submanifold, and the vector bundle is forced to be a holomorphic vector bundle over that submanifold.

To lowest order, that combination of a submanifold $S$ of $X$ together with a (possibly twisted) holomorphic vector bundle over $S$ is a “D-brane” in the open-string B model. We shall denote the twisted bundle by $\mathcal{E}\otimes\sqrt{K_S}$, where $K_S$ is the canonical bundle of $S$, and the $\sqrt{K_S}$ factor is an explicit incorporation of something known as the Freed-Witten anomaly. Now, if $i: S\hookrightarrow X$ is the inclusion map, then to this D-brane we can associate a sheaf $i_*\mathcal{E}$.

Technically, a sheaf is defined by associating sets, or modules, or rings, to each open set on the underlying space, together with restriction maps saying how data associated to larger open sets restricts to smaller open sets, obeying the obvious consistency conditions, together with some gluing conditions that say how local sections can be patched back together. A vector bundle defines a sheaf by associating to any open set sections of the bundle over that open set. Sheaves of the form “$i_*\mathcal{E}$” look like, intuitively, vector bundles over submanifolds, with vanishing fibers off the submanifold. A more detailed discussion of sheaves is beyond the scope of this article; see instead, for example, Sharpe (2003).

To “associate a sheaf” means finding a sheaf such that physical properties of the D-brane system are well modeled by mathematics of the sheaf. (In particular, the physical definition of D-brane has, on the face of it, nothing at all to do with the mathematical definition of a sheaf, so one cannot directly argue that they are the same, but one can still use one to give a mathematical model of the other.) For example, the spectrum of open-string states in the B model stretched between two D-branes, associated to sheaves $i_*\mathcal{E}$ and $j_*\mathcal{F}$, turns out to be calculated by a cohomology group known as $\mathrm{Ext}^{*}_{X}(i_*\mathcal{E}, j_*\mathcal{F})$.

There are many more sheaves not of the form $i_*\mathcal{E}$, that is, sheaves that do not look like vector bundles over submanifolds. It is not known in general whether they also correspond to (on-shell) D-branes, but in some special cases the answer has been worked out. For example, structure sheaves of nonreduced schemes turn out to correspond to D-branes with nonzero nilpotent Higgs vevs.

For a set of ordinary D-branes, the description above suffices. However, more generally one would like to describe collections of D-branes and anti-D-branes, and tachyons. An anti-D-brane has all the same physical properties as an ordinary D-brane, except that a D-brane and an anti-D-brane try to annihilate each other. The open-string spectrum between coincident D-branes and anti-D-branes contains tachyons. One can give an (off-shell) vacuum expectation value to such tachyons, and then the unstable brane-antibrane-tachyon system will evolve to some other, usually simpler, configuration. For example, given a single D-brane wrapped on a curve, with trivial line bundle, and an anti-D-brane wrapped on the same curve, with line bundle $\mathcal{O}(-1)$, and a nonzero tachyon $\mathcal{O}(-1)\to\mathcal{O}$, then one expects that the system will dynamically evolve to a smaller D-brane sitting at a point on the curve.

Now, one would like to find some mathematics 
that describes such systems, and gives information 
about the endpoints of their evolution. Techni- 
cally, one would like to classify universality classes 
of world-sheet boundary renormalization group 
flow. 

It has been conjectured that derived categories of sheaves provide such a classification. To properly explain derived categories is well beyond the scope of this article (see instead the “Further reading” section at the end), but we shall give a short outline below.

Mathematically, derived categories of sheaves concern complexes of sheaves, that is, sets of sheaves $\mathcal{E}_i$ together with maps $d_i:\mathcal{E}_i\to\mathcal{E}_{i+1}$,

$$\cdots\xrightarrow{d_{i-1}}\mathcal{E}_i\xrightarrow{d_i}\mathcal{E}_{i+1}\xrightarrow{d_{i+1}}\mathcal{E}_{i+2}\xrightarrow{d_{i+2}}\cdots$$

such that $d_{i+1}\circ d_i=0$. A category is defined by a collection of “objects” together with maps between the objects, known as morphisms. In a derived category of coherent sheaves, the objects are complexes of sheaves, and the maps are equivalence classes of maps between complexes.
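
The defining condition of a complex can be made concrete in a toy setting. The following is a minimal sketch, assuming that the sheaves are replaced by finite-dimensional vector spaces over the rationals and the maps by matrices acting on column vectors; the helper names (`rank`, `matmul`, `is_complex`, `cohomology_dims`) are illustrative and do not come from the article. It checks $d_{i+1}\circ d_i=0$ and computes the dimensions of the cohomology groups $\ker d_i/\operatorname{im} d_{i-1}$.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows) over the rationals, by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def matmul(a, b):
    """Matrix product a*b for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def is_complex(diffs):
    """True if every composition d_{i+1} o d_i of consecutive maps vanishes."""
    return all(
        all(x == 0 for row in matmul(d_next, d) for x in row)
        for d, d_next in zip(diffs, diffs[1:])
    )


def cohomology_dims(dims, diffs):
    """dims[i] = dim V_i; diffs[i] is the matrix of d_i: V_i -> V_{i+1}.

    Returns dim H^i = dim ker d_i - rank d_{i-1} for each i.
    """
    ranks = [rank(d) for d in diffs]
    result = []
    for i, n in enumerate(dims):
        rank_out = ranks[i] if i < len(ranks) else 0
        rank_in = ranks[i - 1] if i > 0 else 0
        result.append(n - rank_out - rank_in)
    return result


# Toy complex k -> k^2 -> k (an assumption chosen for illustration).
d0 = [[1], [1]]   # d_0: k -> k^2
d1 = [[1, -1]]    # d_1: k^2 -> k, so d_1 o d_0 = 0
print(is_complex([d0, d1]), cohomology_dims([1, 2, 1], [d0, d1]))
```

In this example the complex is exact, so all cohomology dimensions vanish; replacing `d1` by the zero map keeps $d_1\circ d_0=0$ but leaves nonzero cohomology in degrees 1 and 2.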

Physically, if the complex consists of locally free sheaves (equivalently, vector bundles), then we can associate a brane/antibrane/tachyon system, by identifying the $\mathcal{E}_i$ for $i$ even, say, with D-branes, and the $\mathcal{E}_i$ for $i$ odd with anti-D-branes. If the $\mathcal{E}_i$ are all locally free sheaves, then there are tachyons between the branes and antibranes, and we can identify the $d_i$'s with those tachyons. In the open-string world-sheet theory, giving a tachyon a vacuum expectation value modifies the BRST operator $Q$, and a necessary condition for the new theory to still be a topological field theory is that $Q^2=0$, a condition which turns out to imply that $d_{i+1}\circ d_i=0$.

To re-create the structure of a derived category, 
we need to impose some equivalence relations. To 
see what sorts of equivalence relations one would 
like to impose, note the following. Physically, we 
would like to identify, for example, a configuration 
consisting of a brane, an antibrane, and a tachyon, 
which we can describe as a complex

$$\mathcal{O}(-D) \longrightarrow \mathcal{O}$$

with a one-element complex

$$\mathcal{O}_D$$

corresponding to the D-brane which we believe is the endpoint of the evolution of the brane/antibrane configuration.

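One natural mathematical way to create identifications of this form is to identify complexes that differ by “quasi-isomorphisms,” meaning a set of maps $(f^n: C^n\to D^n)$ compatible with the $d$'s and inducing an isomorphism $\tilde f^n: H^n(C^\bullet)\xrightarrow{\sim}H^n(D^\bullet)$ on the cohomologies of the complexes. In particular, in the example above, there is a natural set of maps

$$\begin{array}{ccc}
\mathcal{O}(-D) & \longrightarrow & \mathcal{O} \\
\downarrow & & \downarrow \\
0 & \longrightarrow & \mathcal{O}_D
\end{array}$$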


that define a quasi-isomorphism. More generally, in 
homological algebra, one typically does computations 
by replacing ordinary objects with projective or 
injective resolutions, that is, complexes with special 
properties, in which the desired computation 
becomes trivial, and defining the result for the 
original object to be the same as the result for the 
resolution. To formalize this procedure, one would 
like a mathematical setup in which objects and their 
projective and injective resolutions are isomorphic. 
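
A quasi-isomorphism of the kind just described can be tested mechanically in a linear-algebra toy model. The sketch below is an illustration under stated assumptions, not the sheaf-theoretic statement itself: the two-term complex $V^{-1}\xrightarrow{d}V^{0}$ stands in for $\mathcal{O}(-D)\to\mathcal{O}$, the one-term complex $W$ in degree 0 stands in for $\mathcal{O}_D$, and all spaces are finite-dimensional rational vector spaces. It uses the standard fact that a chain map is a quasi-isomorphism exactly when its mapping cone has vanishing cohomology; the function names are hypothetical.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows) over the rationals, by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def matmul(a, b):
    """Matrix product a*b for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def is_quasi_isomorphism(d, f0):
    """Test whether f0 defines a quasi-isomorphism [V^-1 --d--> V^0] -> [W].

    d  : matrix of d: V^-1 -> V^0   (dim V^0 rows, dim V^-1 columns)
    f0 : matrix of f^0: V^0 -> W    (dim W rows, dim V^0 columns)
    The component f^-1 is necessarily zero because the target vanishes there.
    """
    # Chain-map condition: f^0 o d must equal (zero differential) o f^-1 = 0.
    if any(x != 0 for row in matmul(f0, d) for x in row):
        return False
    # Mapping cone: V^-1 --(-d)--> V^0 --f0--> W.  It is a complex by the
    # check above; f is a quasi-isomorphism iff the cone is exact.
    minus_d = [[-x for x in row] for row in d]
    r1, r2 = rank(minus_d), rank(f0)
    dim_vm1, dim_v0, dim_w = len(d[0]), len(d), len(f0)
    cone_cohomology = [dim_vm1 - r1, dim_v0 - r1 - r2, dim_w - r2]
    return cone_cohomology == [0, 0, 0]


d = [[1], [0]]            # injective map k -> k^2
projection = [[0, 1]]     # k^2 -> k, kills the image of d
print(is_quasi_isomorphism(d, projection))
```

Here the projection onto the cokernel of `d` is a quasi-isomorphism, while the zero map `[[0, 0]]` is a chain map that is not one, since it fails to identify the cohomology in degree 0.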

However, to define an equivalence relation, one 
usually needs an isomorphism, and the quasi- 
isomorphisms above are not, in general, isomorphisms. 
Creating an equivalence from nonisomorphisms, 
to resolve this problem, can be done through a 
process known as “localization” (generalizing the 
notion of localization in commutative algebra). 
The resulting equivalence relations on maps 
between complexes define the derived category. 

The derived category is a category whose objects are complexes, and whose morphisms $C^\bullet\to D^\bullet$ are equivalence classes of pairs $(s,t)$, where $s: G^\bullet\to C^\bullet$ is a quasi-isomorphism between $C^\bullet$ and another complex $G^\bullet$, and $t: G^\bullet\to D^\bullet$ is a map of complexes. We take two such pairs $(s,t)$, $(s',t')$ to be equivalent if there exists another pair $(r,h)$ between the auxiliary complexes $G^\bullet$, $G'^\bullet$, making the obvious diagram commute. This is, in a nutshell, what is meant by localization: working with such equivalence classes allows us to formally invert maps that are otherwise noninvertible. (We encourage the reader to consult the “Further reading” section for more details.)

Mathematically, this technology gives a very 
elegant way to rethink, for example, homological 
algebra. There is a notion of a derived functor, a 
special kind of functor between derived categories, 
and notions from homological algebra such as Ext 
and Tor can be re-expressed as cohomologies of the 
image complexes under the action of a derived 
functor, thus replacing cohomologies with 
complexes. 

Physically, looking back at the physical realization of complexes, we see a basic problem: different representatives of (isomorphic) objects in the derived category are described by very different physical theories. For example, the sheaf $\mathcal{O}_D$ corresponds to a single D-brane, defined by a two-dimensional boundary conformal field theory (CFT), whereas the brane/antibrane/tachyon collection $\mathcal{O}(-D)\to\mathcal{O}$ is defined by a massive nonconformal two-dimensional theory. These are very different physical theories. If we want “localization on quasi-isomorphisms” to happen in physics, we have to explain which properties of the physical theories we are interested in, because clearly the entire physical theories are not and cannot be isomorphic.

Although the entire physical theories are not isomorphic, we can hope that under renormalization group flow, the theories will become isomorphic. That is certainly the physical content of the statement that the brane/antibrane system $\mathcal{O}(-D)\to\mathcal{O}$ should describe the D-brane corresponding to $\mathcal{O}_D$: after world-sheet boundary renormalization group flow, the nonconformal two-dimensional theory describing the brane/antibrane system becomes a CFT describing a single D-brane.

More globally, this is the general prescription for 
finding physical meanings of many categories: we 
can associate physical theories to particular types 
of representatives of isomorphism classes of 
objects, and then although distinct representatives 
of the same object may give rise to very different 
physical theories, those physical theories at least lie 
in the same universality class of world-sheet 
renormalization group flow. In other words, 
(equivalence classes of) objects are in one-to-one 
correspondence with universality classes of physical 
theories. 

Showing such a statement directly is usually not possible: it is usually technically impractical to follow renormalization group flow explicitly. There is no symmetry reason or other basic physics reason why renormalization group flow must respect quasi-isomorphism. The strongest constraint that is clearly applied by physics is that renormalization group flow must preserve D-brane charges (Chern characters, or more properly, K-theory), but objects in a derived category contain much more information than that.

However, indirect tests can be performed, and 
because many indirect tests are satisfied, the result is 
generally believed. 

The reader might ask why it is not more efficient to just work with the cohomology complexes $H^\bullet(C^\bullet)$ themselves, rather than the original complexes. One reason is that the original complexes contain more information than the cohomology; passing to cohomology loses information. For example, there exist complexes that have the same cohomology, yet are not quasi-isomorphic, and so are not identified within the derived category, and so physically are believed to lie in different universality classes of boundary renormalization group flow.

Another motivation for relating physics to derived categories is Kontsevich’s approach to mirror symmetry. Mirror symmetry relates pairs of Calabi-Yau manifolds, of the same dimension, in a fashion such that easy classical computations in one Calabi-Yau are mapped to difficult “quantum” computations involving sums over holomorphic curves in the other Calabi-Yau. Because of this property, mirror symmetry has proved a fertile ground for algebraic geometers to study. Kontsevich proposed that mirror symmetry should be understood as a relation between derived categories of coherent sheaves on one Calabi-Yau and derived Fukaya categories on the other Calabi-Yau. At the time he made this proposal, no one had any idea how either could be realized in physics, but since that time, physicists have come to believe that Kontsevich was secretly talking about D-branes in the A and B models.


Bondal-Kapranov Enhancements


Mathematically, derived categories are not quite as ideal as one would like. For example, the cone construction used in triangulated categories does not behave functorially: the cone depends upon the representative of the equivalence class defining an object in a derived category, and not just the object itself.

Physically, our discussion of brane/antibrane 
systems was not the most general possible. One 
can give vacuum expectation values to more general 
vertex operators, not just the tachyons. 

Curiously, these two issues solve each other. By incorporating a more general class of boundary vertex operators, one realizes a more general mathematical structure, due to Bondal and Kapranov, which repairs many of the technical deficiencies of ordinary derived categories. Ordinary complexes are replaced by generalized complexes in which arrows can map between non-neighboring elements of the complex. Schematically, the BRST operator is deformed by boundary vacuum expectation values to the form

$$Q = \bar\partial + \sum_a \phi_a$$

and demanding that the BRST operator square to zero implies that

$$\sum_a \bar\partial\phi_a + \sum_{a,b}\phi_b\circ\phi_a = 0$$

which is the same as the condition for a generalized complex. Note that for ordinary complexes, the condition above factors into

$$\bar\partial\phi_n = 0, \qquad \phi_{n+1}\circ\phi_n = 0$$

which yields an ordinary complex of sheaves (Figure 1).

Figure 1. Example of a generalized complex. Each arrow is labeled by the degree of the corresponding vertex operator.


Landau-Ginzburg Models 


So far we have described how derived categories are relevant to geometric compactifications, that is, sigma models on Calabi-Yau manifolds. However, there are also “nongeometric” theories, CFTs that do not come from sigma models on manifolds, of which Landau-Ginzburg models and their orbifolds are prominent examples. Landau-Ginzburg models can also be twisted into topological field theories, and the B-type topological twist of (an orbifold of) a Landau-Ginzburg model is believed to be isomorphic, as a topological field theory, to the B model obtained from a nonlinear sigma model, of the form we outlined earlier. Landau-Ginzburg models have a very different form than nonlinear sigma models, and so sometimes there can be practical computational advantages to working with one rather than the other.

A Landau-Ginzburg model is an ungauged sigma model with a nonzero superpotential (a holomorphic function over the target space that defines a bosonic potential and Yukawa couplings). (In “typical” cases, the target space is a vector space.) Because of the superpotential, a Landau-Ginzburg model is a massive theory, not itself a CFT, but many Landau-Ginzburg models are believed to flow to CFTs under the renormalization group.

In formulating open strings based on Landau- 
Ginzburg models, naive attempts fail because of 
something known as the Warner problem: if the 
superpotential is nonzero, then the obvious ways to 
try to define the theory on a Riemann surface with 
boundary have the undesirable property that the 
supersymmetry transformations only close up to a 
nonzero boundary term, proportional to derivatives 
of the superpotential. In order to find a description 
of open strings in which the supersymmetry trans- 
formations close, one must take a very nonobvious 
formulation of the boundary data. Specifically, to 
solve the Warner problem, one is led to work with 
pairs of matrices whose product is proportional to 
the superpotential. 

This method of solving the Warner problem is known as matrix factorization, and D-branes in this theory are defined by the factorization chosen, that is, the choice of pairs of matrices. In simple cases, we can be more explicit as follows. Choose a set of polynomials $F_\alpha$, $G_\alpha$ such that the Landau-Ginzburg superpotential $W$ is given by

$$W = \sum_\alpha F_\alpha G_\alpha + \text{constant}.$$

The $F_\alpha$ and $G_\alpha$ are used to define the boundary action: the $F$'s appear as part of the boundary superpotential and the $G$'s appear as part of the supersymmetry transformations of boundary Fermi multiplets. The $F_\alpha$ and $G_\alpha$, that is, the factorization of $W$, determine the D-brane in the Landau-Ginzburg theory. We can also think of having a pair of holomorphic vector bundles $\mathcal{E}_1$, $\mathcal{E}_2$ of the same rank, and interpret $F$ and $G$ as holomorphic sections of $\mathcal{E}_1^\vee\otimes\mathcal{E}_2$ and $\mathcal{E}_2^\vee\otimes\mathcal{E}_1$, respectively, obeying $FG\propto W\cdot\mathrm{Id}$ and $GF\propto W\cdot\mathrm{Id}$, up to additive constants.
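
The conditions $FG = W\cdot\mathrm{Id}$ and $GF = W\cdot\mathrm{Id}$ can be checked by direct polynomial arithmetic. The following is a minimal sketch, assuming a hypothetical superpotential $W = xy - zt$ in four variables (chosen for illustration; it is not an example taken from the article) with vanishing additive constant, and representing polynomials as dictionaries from exponent tuples to integer coefficients. All function names are illustrative.

```python
from collections import defaultdict


def poly_add(p, q):
    """Sum of two polynomials stored as {exponent tuple: coefficient}."""
    out = defaultdict(int)
    for mono, coeff in list(p.items()) + list(q.items()):
        out[mono] += coeff
    return {m: c for m, c in out.items() if c != 0}


def poly_mul(p, q):
    """Product of two polynomials stored as {exponent tuple: coefficient}."""
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[tuple(a + b for a, b in zip(m1, m2))] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}


def poly_neg(p):
    """Negative of a polynomial."""
    return {m: -c for m, c in p.items()}


def mat_mul(a, b):
    """Product of two matrices whose entries are polynomials."""
    result = []
    for i in range(len(a)):
        row = []
        for j in range(len(b[0])):
            entry = {}
            for k in range(len(b)):
                entry = poly_add(entry, poly_mul(a[i][k], b[k][j]))
            row.append(entry)
        result.append(row)
    return result


def is_matrix_factorization(f, g, w):
    """True if f*g and g*f both equal w times the identity matrix."""
    n = len(f)
    target = [[w if i == j else {} for j in range(n)] for i in range(n)]
    return mat_mul(f, g) == target and mat_mul(g, f) == target


# Exponent tuples are ordered as (x, y, z, t).
x = {(1, 0, 0, 0): 1}
y = {(0, 1, 0, 0): 1}
z = {(0, 0, 1, 0): 1}
t = {(0, 0, 0, 1): 1}

W = poly_add(poly_mul(x, y), poly_neg(poly_mul(z, t)))   # W = xy - zt
F = [[x, z], [t, y]]
G = [[y, poly_neg(z)], [poly_neg(t), x]]
print(is_matrix_factorization(F, G, W))
```

The same check works for rank-one factorizations, for instance $W = x^5$ in a single variable with $F = x^2$ and $G = x^3$, by using exponent tuples of length one.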

Although a Landau-Ginzburg model is not the same thing as a sigma model on a Calabi-Yau, orbifolds of Landau-Ginzburg models are often on the same Kähler moduli space. Perhaps the most famous example of this relates sigma models on quintic hypersurfaces in $\mathbb{P}^4$ to a $\mathbb{Z}_5$ orbifold of a Landau-Ginzburg model over $\mathbb{C}^5$ with five chiral superfields $x_1,x_2,x_3,x_4,x_5$, and a superpotential of the form

$$W = x_1^5 + x_2^5 + x_3^5 + x_4^5 + x_5^5 - \psi\,x_1x_2x_3x_4x_5$$

for $\psi$ a complex number, corresponding to the equation of the degree-5 hypersurface in $\mathbb{P}^4$. The (complexified) Kähler moduli space in this example is a $\mathbb{P}^1$, with the sigma model on the quintic at one pole, the zero-volume limit of the sigma model along the equator, and the Landau-Ginzburg orbifold at the opposite pole.

Since the closed-string topological B model is independent of Kähler moduli, and the sigma model on the quintic and the Landau-Ginzburg orbifold above lie on the same Kähler moduli space, one would expect them both to have the same spectrum of D-branes, and indeed this is believed to be true.


Pi-Stability 


So far we have discussed D-branes in the topological 
B model, a topological twist of a physical sigma 
model. If we untwist back to a physical sigma 
model, then the stability of those D-branes becomes 
an issue. 

To begin to understand what we mean by stability in this context, consider a set of $N$ D-branes wrapped on, say, a K3 surface, at large radius (so that world-sheet instanton corrections are small). On the world volume of the D-branes, we have a rank-$N$ vector bundle, and in the physical theory on that world volume we have a consistency condition for supersymmetric vacua, that the vector bundle be “Mumford-Takemoto stable.” To understand what is meant by this condition on a Kähler manifold, let $\omega$ denote the Kähler form, and define the “slope” $\mu$ of a vector bundle $\mathcal{E}$ on a manifold $X$ of complex dimension $n$ to be given by

$$\mu(\mathcal{E}) = \frac{\int_X \omega^{n-1}\wedge c_1(\mathcal{E})}{\mathrm{rank}\,\mathcal{E}}$$

Then, we say that $\mathcal{E}$ is (semi-)stable if for all subsheaves $\mathcal{F}$ satisfying certain consistency conditions, $\mu(\mathcal{F}) \,(\le)\, < \mu(\mathcal{E})$.

Since the slope of a bundle depends upon the Kähler form, whether a given bundle is Mumford-Takemoto stable depends upon the metric. In general, on a Kähler manifold, the Kähler cone breaks up into subcones, with a different moduli space of (stable) holomorphic vector bundles in each subcone.
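
The dependence of the slope on the Kähler class can be seen in a small worked example. The sketch below makes several assumptions that are not in the article: the manifold is taken to be $\mathbb{P}^1\times\mathbb{P}^1$ (so $n=2$), with hyperplane classes $h_1$, $h_2$ satisfying $h_1^2=h_2^2=0$ and $h_1\cdot h_2=1$, the Kähler class is $\omega = a\,h_1 + b\,h_2$ with $a,b>0$, and a first Chern class $c_1 = p\,h_1 + q\,h_2$ is stored as the pair $(p,q)$. The code only compares the slope of one given candidate subsheaf with the slope of the total bundle; it does not test all subsheaves, so it detects instability but cannot prove stability. The function names are hypothetical.

```python
from fractions import Fraction


def slope(c1, rank, kahler):
    """Slope on P^1 x P^1: (integral of omega ^ c1) / rank.

    With omega = a*h1 + b*h2 and c1 = p*h1 + q*h2, the intersection
    numbers h1*h1 = h2*h2 = 0 and h1*h2 = 1 give omega . c1 = a*q + b*p.
    """
    p, q = c1
    a, b = kahler
    return Fraction(a * q + b * p, rank)


def destabilizes(sub_c1, sub_rank, c1, rank, kahler):
    """True if the candidate subsheaf has slope >= that of the full bundle,
    i.e. if it violates the strict slope inequality required for stability."""
    return slope(sub_c1, sub_rank, kahler) >= slope(c1, rank, kahler)


# Hypothetical rank-2 bundle with c1 = 0 containing the line bundle O(1, -1).
sub_c1, bundle_c1 = (1, -1), (0, 0)

for kahler in [(1, 2), (2, 1)]:
    print(kahler, slope(sub_c1, 1, kahler), destabilizes(sub_c1, 1, bundle_c1, 2, kahler))
```

The slope of the sub-line-bundle is $b - a$, so it violates the stability inequality when $b \ge a$ and does not when $b < a$: the wall $a = b$ divides the Kähler cone into two subcones, in the manner described above.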

This is a mathematical notion of stability, but it also corresponds to physical stability, at least in a regime in which quantum corrections are small. If a given bundle is only stable in a proper subset of the Kähler cone, then when it reaches the boundary of the subcone in which it is stable, the gauge field configuration that satisfies the Donaldson-Uhlenbeck-Yau partial differential equation splits into a sum of two separate bundles. In a heterotic string compactification, this leads to a low-energy enhanced U(1) gauge symmetry and D-terms which realize the change in moduli space. In D-branes, this means the formerly bound state of D-branes (described by an irreducible holomorphic vector bundle) becomes only marginally bound; a decay becomes possible.

Pi-stability is a proposal for generalizing the 
considerations above to D-branes no longer wrap- 
ping the entire Calabi-Yau, and including quantum 
corrections. 

In order to define pi-stability, we must first introduce a notion of grading $\varphi$ of a D-brane. Specifically, for a D-brane wrapped on the entire Calabi-Yau $X$ with holomorphic vector bundle $\mathcal{E}$, the grading is defined as the mirror to the expression $\int_X \mathrm{ch}(\mathcal{E})\wedge\Pi$, where $\Pi$ encodes the periods. Close to the large-radius limit, this has the form:

$$\varphi(\mathcal{E}) = \frac{1}{\pi}\,\mathrm{Im}\,\log\int_X \exp(B+i\omega)\wedge\mathrm{ch}(\mathcal{E})\wedge\sqrt{\mathrm{td}(TX)} + \cdots$$

where $B$ is a 2-form, the “B field.” As defined, $\varphi$ is clearly $S^1$-valued; however, we must choose a particular sheet of the log Riemann surface to obtain an $\mathbb{R}$-valued function.

This notion of grading of D-branes is an ansatz, introduced as part of the definition of pi-stability. Physically, it is believed that the difference in grading between two D-branes corresponds to the fractional charge of the boundary-condition-changing vacuum between the two D-branes, though we know of no convincing first-principles derivation of that statement. In particular, unlike closed-string computations, the degree of the Ext group element corresponding to a particular boundary R-sector state is not always the same as the $U(1)_R$ charge; for example, it is often determined by the $U(1)_R$ charge minus the charge of the vacuum. The grading gives us the mathematical significance of that vacuum charge. This mismatch between Ext degrees and $U(1)_R$ charges is necessary for the grading to make sense: Ext group degrees are integral, after all, yet we want the grading to be able to vary continuously, so the grading had better not be the same as an Ext group degree.

Given an $\mathbb{R}$-valued function from a particular definition of log in the definition of $\varphi$ above, the statement of pi-stability is then that for all subsheaves $\mathcal{F}$, as in the statement of Mumford-Takemoto stability,

$$\varphi(\mathcal{F}) < \varphi(\mathcal{E})$$

Before trying to understand the physical meaning of $\varphi$, or the extension of these ideas to derived categories, let us try to confirm that Mumford-Takemoto stability emerges as a limit of pi-stability.

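For simplicity, suppose that $X$ is a Calabi-Yau
3-fold. Then, for large Kähler form $\omega$, we can
expand $\varphi(\mathcal{E})$ as

$$\varphi(\mathcal{E}) \approx \frac{1}{\pi}\, \mathrm{Im} \log\left[ -\frac{i}{3!} \int_X \omega^3\, (\mathrm{rk}\, \mathcal{E}) \right] + \frac{3}{\pi}\, \frac{\int_X \omega^2 \wedge c_1(\mathcal{E})}{\int_X \omega^3\, (\mathrm{rk}\, \mathcal{E})} + \cdots$$

Thus, we see that to leading order in the Kähler
form $\omega$, $\varphi(\mathcal{F}) < \varphi(\mathcal{E})$ if and only if

$$\frac{\int_X \omega^2 \wedge c_1(\mathcal{F})}{\mathrm{rk}\, \mathcal{F}} < \frac{\int_X \omega^2 \wedge c_1(\mathcal{E})}{\mathrm{rk}\, \mathcal{E}}$$

which is precisely the statement of
Mumford-Takemoto stability on a 3-fold $X$.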

One can define a notion of (classical) stability for 
more general sheaves, but what one wants is to 
apply pi-stability to derived categories, not just 
sheaves. 


However, there is a technical problem that limits 
such an extension. Specifically, in a derived cate- 
gory, there is no meaningful notion of “subobject.” 
Thus, a notion of stability formulated in terms of 
subobjects cannot be immediately applied to derived 
categories. There are two (equivalent) workarounds 
to this issue that have been discussed in the math 
and physics literatures, which can be briefly sum- 
marized as follows: 


1. One workaround involves picking a subcategory
of the derived category that does allow you to
make sense of subobjects. Such a structure is
known, loosely, as a “T-structure,” and so one
can imagine formulating stability by first picking
a T-structure, then specifying a slope function on
the elements of the subcategory picked out by the
T-structure.

2. Another (equivalent) workaround is to work with 
a notion of “relative stability.” Instead of speak- 
ing about whether a D-brane is stable against 
decay into any other object, one only speaks 
about whether it is stable against decay into pairs 
of specified objects. 


In this fashion, one can make sense of pi-stability for 
derived categories. 


See also: Fourier-Mukai Transform in String Theory;
Mirror Symmetry: A Geometric Survey; Spectral
Sequences; Superstring Theories; Topological Quantum
Field Theory: Overview.
Introduction


The theory of random point fields has its origins in 
such diverse areas of science as life tables, particle 
physics, population processes, and communication 
engineering. A standard reference to the subject is 
the monograph by Daley and Vere-Jones (1988). 

This article is concerned with a special class of 
random point fields, introduced by Macchi in the mid- 
1970s. The model that Macchi considered describes 
the statistical distribution of a fermion system in 
thermal equilibrium. Macchi proposed to call the new 
class of random point processes the fermion random 
point processes. The characteristic property of this 
family of random point processes is the condition that 
k-point correlation functions have the form of deter- 
minants built from a correlation kernel. This implies 
that the particles obey the Pauli exclusion principle. 
Until the mid-1990s, fermion random point processes
attracted only limited interest in the mathematics and
physics communities, with the exception of two
important works by Spohn (1987) and Costin and
Lebowitz (1995). This situation changed dramatically
at the end of the last century, as the subject greatly
benefited from the newly discovered connections to
random matrix theory, representation theory, random
growth models, combinatorics, and number theory.
Things are rapidly developing at the moment. Even the
terminology is not yet set in stone. Many experts
currently use the term “determinantal random point
fields” instead of “fermion random point fields.” We
follow this trend in our article.

This article is intended as a short introduction to the 
subject. The next section builds a mathematical 
framework and gives a formal mathematical definition 
of the determinantal random point fields. Then we 
discuss examples of determinantal random point fields 
from quantum mechanics, random matrix theory, 
random growth models, combinatorics, and represen- 
tation theory. This is followed by a discussion of the 
ergodic properties of translation-invariant determi- 
nantal random point fields. We discuss the Gibbsian 
property of determinantal random point fields. 
Central-limit theorem type results for the counting 
functions and similar linear statistics are also dis- 
cussed. The final section is devoted to some general- 
izations of determinantal point fields, namely 
immanantal and Pfaffian random point fields. 
Mathematical Framework


We start by building a standard mathematical
framework for the theory of random point processes.
Let $E$ be a one-particle space and $X$ a space
of finite or countable configurations of particles in $E$.
In general, $E$ can be a separable Hausdorff space.
However, for our purposes it suffices to consider
$E = \mathbb{R}^d$ or $E = \mathbb{Z}^d$. We usually assume in this section
that $E = \mathbb{R}^d$, with the understanding that all
constructions can be easily extended to the discrete case.
We assume that each configuration $\xi = (x_i)$, $x_i \in E$,
$i \in \mathbb{Z}$ (or $i \in \mathbb{Z}_+$ for $d > 1$), is locally finite. In other
words, for every compact $K \subset E$, the number of
particles in $K$, $\#_K(\xi) = \#(x_i \in K)$, is finite.

In order to introduce a $\sigma$-algebra of measurable
subsets of $X$, we first define the cylinder sets.
Let $B \subset E$ be a bounded Borel set and let $n \ge 0$. We
call $C_n^B = \{\xi \in X : \#_B(\xi) = n\}$ a cylinder set. We define
$\mathcal{B}$ as the $\sigma$-algebra generated by all cylinder sets (i.e., $\mathcal{B}$
is the minimal $\sigma$-algebra that contains all $C_n^B$).


Definition 1 A random point field is a triplet
$(X, \mathcal{B}, \Pr)$, where $\Pr$ is a probability measure on $(X, \mathcal{B})$.


It was observed in the 1960s and 1970s (see, e.g., Lenard
(1973, 1975)) that in many cases the most convenient
way to define a probability measure on $(X, \mathcal{B})$ is via the
point correlation functions. Let $E = \mathbb{R}^d$, equipped with
the underlying Lebesgue measure.


Definition 2 A locally integrable function $\rho_k : E^k \to \mathbb{R}_+$
is called a k-point correlation function of the
random point field $(X, \mathcal{B}, \Pr)$ if, for any disjoint
bounded Borel subsets $A_1, \ldots, A_m$ of $E$ and for any
$k_i \in \mathbb{Z}_+$, $i = 1, \ldots, m$, $\sum_{i=1}^m k_i = k$, the following
formula holds:

$$\mathbb{E} \prod_{i=1}^m \frac{(\#_{A_i})!}{(\#_{A_i} - k_i)!} = \int_{A_1^{k_1} \times \cdots \times A_m^{k_m}} \rho_k(x_1, \ldots, x_k)\, dx_1 \cdots dx_k \qquad [1]$$

where by $\mathbb{E}$ we denote the mathematical expectation
with respect to $\Pr$. In particular, $\rho_1(x)$ is the particle
density, since

$$\mathbb{E}\, \#_A = \int_A \rho_1(x)\, dx$$

for any bounded Borel $A \subset E$. In general,
$\rho_k(x_1, \ldots, x_k)$ has the following probabilistic
interpretation. Let $[x_i, x_i + dx_i]$, $i = 1, \ldots, k$, be
infinitesimally small boxes around $x_i$; then
$\rho_k(x_1, x_2, \ldots, x_k)\, dx_1 \cdots dx_k$ is the probability to find a
particle in each of these boxes.
In the discrete case $E = \mathbb{Z}^d$, the construction of a
random point field is very similar. The probability
space $X$ and the $\sigma$-algebra $\mathcal{B}$ are constructed
essentially in the same way as before. Moreover, in
the discrete case, the set of the countable configurations
of particles can be identified with the set of all
subsets of $E$. Therefore, $X = \{0, 1\}^E$, and $\mathcal{B}$ is generated
by the events $\{C_x, x \in E\}$, where $C_x = \{x \in \xi\}$. The
k-point correlation function $\rho_k(x_1, \ldots, x_k)$ is then just
the probability that a configuration $\xi$ contains the
sites $x_1, \ldots, x_k$. In other words,
$\rho_k(x_1, \ldots, x_k) = \Pr\bigl(\bigcap_{i=1}^k C_{x_i}\bigr)$. In particular, the one-point correlation
function $\rho_1(x)$, $x \in \mathbb{Z}^d$, is the probability that a
configuration contains the site $x$, that is,
$\rho_1(x) = \Pr(C_x)$.

The problem of the existence and the unique- 
ness of a random point field defined by its 
correlation functions was studied by Lenard 
(1973-1975). It is not surprising that Lenard’s 
papers revealed many parallels to the classical 
moment problem. In particular, the random point
field is uniquely defined by its correlation functions
if the distribution of random variables $\{\#_A\}$
for bounded Borel sets $A$ is uniquely determined
by its moments.

In this article we study a special class of random
point fields introduced by Macchi (1975). To
shorten the exposition, we give the definitions only
in the continuous case $E = \mathbb{R}^d$. In the discrete case,
the definitions are essentially the same.

Let $K : L^2(\mathbb{R}^d) \to L^2(\mathbb{R}^d)$ be an integral locally
trace-class operator. The last condition means that
for any compact $B \subset \mathbb{R}^d$ the operator $K\chi_B$ is trace
class, where $\chi_B(x)$ is the indicator of $B$. The kernel of
$K$ is defined up to a set of measure zero in $\mathbb{R}^d \times \mathbb{R}^d$.
For our purposes, it is convenient to choose it in
such a way that for any bounded measurable $B$ and
any positive integer $n$

$$\mathrm{Tr}\bigl(\chi_B K^n \chi_B\bigr) = \int_B K^n(x, x)\, dx \qquad [2]$$

where $K^n(x, y)$ denotes the kernel of the operator $K^n$.
We refer the reader to Soshnikov (2000, p. 927) for
the discussion. We are now ready to define a
determinantal (fermion) random point field on $\mathbb{R}^d$.


Definition 3 A random point field on $E$ is said to
be determinantal (or fermion) if its n-point correlation
functions are of the form

$$\rho_n(x_1, \ldots, x_n) = \det\bigl(K(x_i, x_j)\bigr)_{1 \le i,j \le n} \qquad [3]$$


Remark 1 If the kernel is Hermitian-symmetric,
then the non-negativity of n-point correlation
functions implies that the kernel $K(x, y)$ is
non-negative definite and, therefore, $K$ must be a
non-negative operator. It should be noted, however,
that there exist determinantal random point
fields corresponding to non-Hermitian kernels (see,
e.g., [18] later). The kernel $K(x, y)$ is usually called
a correlation kernel of the determinantal random
point process.
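
As a small illustration of Definition 3 in the discrete setting, the following sketch evaluates correlation functions as determinants of a kernel matrix. The kernel $K(x, y) = \tfrac{1}{3}\, 2^{-|x-y|}$ on $\mathbb{Z}$ and the helper names `det`, `kernel`, and `correlation` are assumptions chosen for this example only; they are not taken from the literature cited here.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant by Gaussian elimination (intended for small matrices)."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Find a row with a non-zero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def kernel(x, y):
    """Illustrative translation-invariant kernel on Z (an assumption of this sketch)."""
    return Fraction(1, 3) * Fraction(1, 2) ** abs(x - y)


def correlation(kernel_function, points):
    """n-point correlation function: det of the matrix K(x_i, x_j)."""
    return det([[kernel_function(x, y) for y in points] for x in points])
```

With this kernel the one-point function is the constant density 1/3, a repeated point gives a vanishing determinant (the exclusion property), and the two-point function never exceeds the product of the densities, as expected for a Hermitian kernel.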


In the Hermitian case, the necessary and sufficient
conditions on the operator $K$ to define a
determinantal random point field were established by
Soshnikov (2000); see also Macchi (1975).


Theorem 1 A Hermitian locally trace-class operator
$K$ on $L^2(E)$ determines a determinantal random
point field if and only if $0 \le K \le 1$ (in other words,
both $K$ and $1 - K$ are non-negative operators). If
the corresponding random point field exists, it is
unique.


The main technical part of the proof is the 
following proposition. 


Proposition 1 Let $(X, \mathcal{B}, P)$ be a determinantal
random point field with the Hermitian-symmetric
correlation kernel $K$. Let $f$ be a non-negative
continuous function with compact support. Then

$$\mathbb{E}\, e^{-\langle \xi, f \rangle} = \det\Bigl(\mathrm{Id} - (1 - e^{-f})^{1/2} K (1 - e^{-f})^{1/2}\Bigr) \qquad [4]$$

where $\langle \xi, f \rangle$ is the value of the linear statistic
defined by the test function $f$ on the configuration
$\xi = (x_i)$; in other words, $\langle \xi, f \rangle = \sum_i f(x_i)$.


Remark 2 The right-hand side (RHS) of [4] is well
defined as the Fredholm determinant of a
trace-class operator. Letting $f = \sum_{i=1}^k s_i \chi_{I_i}$, one obtains
$\mathbb{E}\, e^{-\langle \xi, f \rangle} = \mathbb{E} \prod_{i=1}^k z_i^{\#_{I_i}}$, with $z_i = e^{-s_i}$. In this case, the
left-hand side (LHS) of [4] becomes the generating
function of the joint distribution of the counting
random variables $\#_{I_i}$, $i = 1, \ldots, k$.
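
On a finite discrete one-particle space, identity [4] can be checked by brute force. The sketch below recovers the probability of each configuration from the determinantal correlation functions by inclusion-exclusion, averages $e^{-\langle \xi, f \rangle}$ over all configurations, and compares the result with the determinant on the RHS of [4]. The 3 x 3 kernel `K`, the test function `F`, and all function names are assumptions made for the illustration.

```python
import math
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if not m:
        return 1.0
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def submatrix(k, idx):
    return [[k[i][j] for j in idx] for i in idx]


def subset_probabilities(k):
    """P(configuration == S), via inclusion-exclusion over P(T subset of xi) = det(K_T)."""
    sites = range(len(k))
    probs = {}
    for r in range(len(k) + 1):
        for s in combinations(sites, r):
            rest = [x for x in sites if x not in s]
            p = 0.0
            for q in range(len(rest) + 1):
                for extra in combinations(rest, q):
                    p += (-1) ** q * det(submatrix(k, sorted(s + extra)))
            probs[s] = p
    return probs


def laplace_brute_force(k, f):
    """Left-hand side of [4]: average of exp(-sum of f over the configuration)."""
    return sum(p * math.exp(-sum(f[x] for x in s))
               for s, p in subset_probabilities(k).items())


def laplace_determinant(k, f):
    """Right-hand side of [4] for a finite matrix kernel."""
    w = [math.sqrt(1.0 - math.exp(-fx)) for fx in f]
    n = len(k)
    return det([[(1.0 if i == j else 0.0) - w[i] * k[i][j] * w[j]
                 for j in range(n)] for i in range(n)])


# Illustrative symmetric kernel with spectrum inside [0, 1], and a test function.
K = [[(1.0 / 3.0) * 0.5 ** abs(i - j) for j in range(3)] for i in range(3)]
F = [0.3, 1.0, 2.5]
```

In this finite setting the probability of the empty configuration equals $\det(\mathrm{Id} - K)$, which is the discrete counterpart of the gap probability.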


Unfortunately, there are very few known results 
in the non-Hermitian case. In particular, the 
necessary and sufficient condition on K for the 
existence of the determinantal random point field 
with the non-Hermitian correlation kernel is not 
known. 

We end this section with the introduction of the 
Janossy densities (a.k.a. density distributions, exclu- 
sion probability densities, etc.) of a random point 
field. 

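The term Janossy densities in the theory of
random point processes was introduced by Srinivasan
in 1969, who referred to the 1950 paper by
Janossy on particle showers. Let us assume that all
point correlation functions exist and are locally
integrable, and let $I$ be a bounded Borel subset of
$\mathbb{R}^d$. Intuitively, one can think of the Janossy density
$\mathcal{J}_{k,I}(x_1, \ldots, x_k)$, $x_1, \ldots, x_k \in I$, as

$$\mathcal{J}_{k,I}(x_1, \ldots, x_k) \prod_{i=1}^k dx_i = \Pr\{\text{there are exactly } k \text{ particles in } I \text{ and there is a particle in each of the } k \text{ infinitesimal boxes } (x_i, x_i + dx_i),\ i = 1, \ldots, k\} \qquad [5]$$

To give a formal definition, we express point
correlation functions in terms of Janossy densities
and vice versa:

$$\rho_k(x_1, \ldots, x_k) = \sum_{j=0}^{\infty} \frac{1}{j!} \int_{I^j} \mathcal{J}_{k+j,I}(x_1, \ldots, x_k, x_{k+1}, \ldots, x_{k+j})\, dx_{k+1} \cdots dx_{k+j} \qquad [6]$$

$$\mathcal{J}_{k,I}(x_1, \ldots, x_k) = \sum_{j=0}^{\infty} \frac{(-1)^j}{j!} \int_{I^j} \rho_{k+j}(x_1, \ldots, x_k, x_{k+1}, \ldots, x_{k+j})\, dx_{k+1} \cdots dx_{k+j} \qquad [7]$$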


A very useful property of the Janossy densities is
that

$$\Pr\{\text{there are exactly } k \text{ particles in } I\} = \frac{1}{k!} \int_{I^k} \mathcal{J}_{k,I}(x_1, \ldots, x_k)\, dx_1 \cdots dx_k \qquad [8]$$

In the case of determinantal random point fields,
Janossy densities also have a determinantal form,
namely

$$\mathcal{J}_{k,I}(x_1, \ldots, x_k) = \det(\mathrm{Id} - K_I) \cdot \det\bigl(L_I(x_i, x_j)\bigr)_{1 \le i,j \le k} \qquad [9]$$

In the last equation, $K_I$ is the restriction of the operator
$K$ to $L^2(I)$. In other words,
$K_I(x, y) = \chi_I(x) K(x, y) \chi_I(y)$, where $\chi_I$ is the indicator of $I$. The
operator $L_I$ is expressed in terms of $K_I$ as
$L_I = (\mathrm{Id} - K_I)^{-1} K_I$. For further results on the Janossy densities of
determinantal random point processes we refer the
reader to Soshnikov (2004) and references therein.


Examples of Determinantal Random 
Point Fields 


Fermion Gas 


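Let $H = -d^2/dx^2 + V(x)$ be a Schrödinger operator
with discrete spectrum on $L^2(E)$. We denote by
$\{\varphi_\ell\}_{\ell=0}^{\infty}$ an orthonormal basis of the eigenfunctions,
$H\varphi_\ell = \lambda_\ell \varphi_\ell$, $\lambda_0 < \lambda_1 < \lambda_2 < \cdots$. To define a Fermi
gas, we consider the nth exterior power of $H$,
$\wedge^n(H) : \wedge^n(L^2(E)) \to \wedge^n(L^2(E))$, where $\wedge^n(L^2(E))$ is
the space of square-integrable antisymmetric functions
of $n$ variables and $\wedge^n(H) = \sum_{i=1}^n \bigl(-d^2/dx_i^2 + V(x_i)\bigr)$.
The eigenstates of the Fermi gas are given by
the normalized Slater determinants

$$\psi_{k_1, \ldots, k_n}(x_1, \ldots, x_n) = \frac{1}{\sqrt{n!}} \sum_{\sigma \in S_n} (-1)^{\sigma} \prod_{i=1}^n \varphi_{k_i}(x_{\sigma(i)}) = \frac{1}{\sqrt{n!}} \det\bigl(\varphi_{k_i}(x_j)\bigr)_{1 \le i,j \le n} \qquad [10]$$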


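where $0 \le k_1 < k_2 < \cdots < k_n$. A probability distribution
of $n$ particles in the Fermi gas is given by the
squared absolute value of the eigenstate:

$$p_n(x_1, \ldots, x_n) = |\psi_{k_1, \ldots, k_n}(x_1, \ldots, x_n)|^2 = \frac{1}{n!} \det\bigl(\varphi_{k_i}(x_j)\bigr)_{1 \le i,j \le n} \det\bigl(\overline{\varphi_{k_i}(x_j)}\bigr)_{1 \le i,j \le n} = \frac{1}{n!} \det\bigl(K_n(x_i, x_j)\bigr)_{1 \le i,j \le n} \qquad [11]$$

where $K_n(x, y) = \sum_{i=1}^n \varphi_{k_i}(x) \overline{\varphi_{k_i}(y)}$ is the kernel
of the orthogonal projector onto the subspace
spanned by the $n$ eigenfunctions $\{\varphi_{k_i}\}$ of $H$. The
n-dimensional probability distribution [11]
defines a determinantal random point field with
$n$ particles. The k-point correlation functions are
given by

$$\rho_k^{(n)}(x_1, \ldots, x_k) = \frac{n!}{(n-k)!} \int p_n(x_1, \ldots, x_n)\, dx_{k+1} \cdots dx_n = \det\bigl(K_n(x_i, x_j)\bigr)_{1 \le i,j \le k} \qquad [12]$$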


Random Matrix Models 


Some of the most important ensembles of random 
matrices fall into the class of determinantal random 
point processes. 

The archetypal ensemble of Hermitian random
matrices is the so-called Gaussian unitary ensemble
(GUE). Let us consider the space of $n \times n$ Hermitian
matrices $\{A = (A_{ij})_{1 \le i,j \le n},\ \mathrm{Re}(A_{ij}) = \mathrm{Re}(A_{ji}),\ \mathrm{Im}(A_{ij}) = -\mathrm{Im}(A_{ji})\}$.
A GUE random matrix is defined by its
probability distribution

$$P(dA) = \mathrm{const}_n \cdot \exp(-\mathrm{Tr}\, A^2)\, dA \qquad [13]$$

where $dA$ is the Lebesgue measure, that is,
$dA = \prod_{i<j} d\mathrm{Re}(A_{ij})\, d\mathrm{Im}(A_{ij}) \prod_{k=1}^n dA_{kk}$. The eigenvalues
of a random Hermitian matrix are real random
variables, whose joint probability distribution is a
determinantal random point process of $n$ particles
on the real line. The correlation kernel has the
Christoffel-Darboux form built from the Hermite
polynomials.
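
To make the Hermite kernel concrete, the sketch below evaluates $K_n(x, y) = \sum_{k=0}^{n-1} \psi_k(x)\psi_k(y)$, where $\psi_k$ are the Hermite functions orthonormal in $L^2(\mathbb{R})$ (Hermite polynomials times $e^{-x^2/2}$, matching the weight $e^{-x^2}$ of [13]). The direct sum is used instead of the Christoffel-Darboux closed form; this normalization and the function names are assumptions of the example.

```python
import math


def hermite_functions(n, x):
    """Return [psi_0(x), ..., psi_{n-1}(x)] using the three-term recurrence."""
    values = []
    prev, cur = 0.0, math.pi ** -0.25 * math.exp(-x * x / 2.0)
    for k in range(n):
        values.append(cur)
        # psi_{k+1} = sqrt(2/(k+1)) x psi_k - sqrt(k/(k+1)) psi_{k-1}
        prev, cur = cur, (math.sqrt(2.0 / (k + 1)) * x * cur
                          - math.sqrt(k / (k + 1)) * prev)
    return values


def hermite_kernel(n, x, y):
    """Kernel of the projector onto the span of the first n Hermite functions."""
    return sum(a * b for a, b in zip(hermite_functions(n, x), hermite_functions(n, y)))


def two_point(n, x, y):
    """Two-point correlation function det(K_n(x_i, x_j)) for the points x, y."""
    return (hermite_kernel(n, x, x) * hermite_kernel(n, y, y)
            - hermite_kernel(n, x, y) ** 2)


def integrate(func, lo, hi, steps):
    """Composite trapezoid rule."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * h)
    return total * h
```

The density $K_n(x, x)$ integrates to the number of particles $n$, and the two-point function vanishes on the diagonal, which is the eigenvalue repulsion in this model.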

The GUE ensemble of random matrices is invariant
under the unitary transformation $A \to UAU^{-1}$,
$U \in U(n)$. An important generalization of [13] that
preserves the unitary invariance is

$$P(dA) = \mathrm{const}_n \cdot \exp(-\mathrm{Tr}\, V(A))\, dA \qquad [14]$$

where, for example, $V(x)$ is a polynomial of even
degree with a positive leading coefficient. The correlation
functions of the eigenvalues in [14] are again
determinantal, and the Hermite polynomials in the
correlation kernel have to be replaced by the
orthonormal polynomials with respect to the weight
$\exp(-V(x))$. For details, we refer the reader to the
monographs by Mehta (2004) and Deift (2000).

There are many other ensembles of random 
matrices for which the joint distribution of the 
eigenvalues has determinantal point correlation 
functions: classical compact groups with respect to 
the Haar measure, complex non-Hermitian Gaus- 
sian random matrices, positive Hermitian random 
matrices of the Wishart type, and chains of 
correlated Hermitian matrices. We refer the reader 
to Soshnikov (2000) for more information. 


Discrete Translation-Invariant Determinantal 
Random Point Fields 


Let $g : \mathbb{T}^d \to [0, 1]$ be a Lebesgue-measurable function
on the d-dimensional torus $\mathbb{T}^d$. Assume that
$0 \le g \le 1$. A configuration $\xi$ in $\mathbb{Z}^d$ can be thought of
as a 0-1 function on $\mathbb{Z}^d$, that is, $\xi(x) = 1$ if $x \in \xi$ and
$\xi(x) = 0$ otherwise. We define a $\mathbb{Z}^d$-invariant probability
measure $\Pr$ on the Borel sets of $X = \{0, 1\}^{\mathbb{Z}^d}$ in
such a way that

$$\rho_k(x_1, \ldots, x_k) = \Pr(\xi(x_1) = 1, \ldots, \xi(x_k) = 1) = \det\bigl(\hat g(x_i - x_j)\bigr)_{1 \le i,j \le k} \qquad [15]$$

for $x_1, \ldots, x_k \in \mathbb{Z}^d$. In the above formula, $\{\hat g(n)\}$
are the Fourier coefficients of $g$, that is,
$g(x) = \sum_n \hat g(n)\, e^{2\pi i (n \cdot x)}$. It is clear from Definition 3 that
[15] defines a determinantal random point field on
$\mathbb{Z}^d$ with the translation-invariant kernel
$K(x, y) = \hat g(x - y)$. Below we discuss several examples that fall
into this category. For further discussion we refer the
reader to Lyons (2003) and Soshnikov (2000).


1. In the trivial case when $g$ is identically a constant
$p \in [0, 1]$, we obtain the i.i.d. Bernoulli probability
measure.


2. The edges of the uniform spanning tree in $\mathbb{Z}^2$
parallel to the horizontal axis can be viewed as
the determinantal random point field in $\mathbb{Z}^2$ with

$$g(x, y) = \frac{\sin^2 \pi x}{\sin^2 \pi x + \sin^2 \pi y}$$

Similarly, the edges of the uniform spanning
forest in $\mathbb{Z}^d$ parallel to the $x_1$-axis correspond to
the function

$$g(x_1, \ldots, x_d) = \frac{\sin^2 \pi x_1}{\sum_{i=1}^d \sin^2 \pi x_i}$$

(the uniform spanning forest on $\mathbb{Z}^d$ is a tree only
for $d \le 4$). The result is due to Burton and
Pemantle (1993).

3. Let $d = 1$ and $\gamma$ be a parameter between 0 and 1.
Consider

$$g(x) = \frac{(1 - \gamma)^2}{|e^{2\pi i x} - \gamma|^2}$$

The corresponding probability measure is a
renewal process and

$$K(n) = \hat g(n) = \frac{1 - \gamma}{1 + \gamma}\, \gamma^{|n|}$$

(see Soshnikov (2000)).

4. The process with $g(x) = \chi_I(x)$, where $I$ is an
arbitrary arc of the unit circle, appeared in
the work of Borodin and Olshanski (2000). The
corresponding correlation kernel is known as the
discrete sine kernel. The determinantal random
point process on $\mathbb{Z}^1$ with the discrete sine kernel
describes the typical form of large Young
diagrams “in the bulk” (see the next subsection).

5. The discrete sine correlation kernel also
appeared in the zig-zag process (Johansson 2002)
derived from the uniform domino tilings in the
plane. It corresponds to $g = \chi_{[0, 1/2]}$.
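
The relation between a symbol $g$ and its translation-invariant kernel can be checked numerically. The sketch below approximates Fourier coefficients of the renewal-process symbol of example 3 by a Riemann sum and compares them with the geometric closed form that follows from the Poisson-kernel expansion. The convention $\hat g(n) = \int_0^1 g(x) e^{-2\pi i n x}\, dx$ on the torus $[0, 1)$ and the function names are assumptions of this example.

```python
import cmath
import math


def g_renewal(x, gamma):
    """Symbol of example 3: (1 - gamma)^2 / |exp(2 pi i x) - gamma|^2."""
    return (1.0 - gamma) ** 2 / abs(cmath.exp(2j * math.pi * x) - gamma) ** 2


def fourier_coefficient(g, n, samples=512):
    """Riemann-sum approximation of the n-th Fourier coefficient of g on [0, 1)."""
    total = 0j
    for m in range(samples):
        x = m / samples
        total += g(x) * cmath.exp(-2j * math.pi * n * x)
    return total / samples


def closed_form(n, gamma):
    """Geometric kernel K(n) = (1 - gamma) / (1 + gamma) * gamma^|n|."""
    return (1.0 - gamma) / (1.0 + gamma) * gamma ** abs(n)
```

Because the symbol is analytic and periodic, the equally spaced Riemann sum converges very quickly, so a few hundred sample points are ample here.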


Determinantal Measures on Partitions 


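By a partition of $n = 1, 2, \ldots$ we understand a
collection of non-negative integers $\lambda = (\lambda_1, \ldots, \lambda_m)$
such that $\lambda_1 + \cdots + \lambda_m = n$ and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$.
We shall use the notation $\mathrm{Par}(n)$ for the set of all
partitions of $n$.
The Plancherel measure $M_n$ on the set $\mathrm{Par}(n)$ is
defined as

$$M_n(\lambda) = \frac{(\dim \lambda)^2}{n!} \qquad [16]$$

where $\dim \lambda$ is the dimension of the corresponding
irreducible representation of the symmetric group
$S_n$. Let $\mathrm{Par} = \bigsqcup_{n=0}^{\infty} \mathrm{Par}(n)$. Consider a probability
measure $M^\theta$ on $\mathrm{Par}$

$$M^\theta(\lambda) = e^{-\theta}\, \frac{\theta^n}{n!}\, M_n(\lambda), \quad \text{where } \lambda \in \mathrm{Par}(n),\ n = 0, 1, 2, \ldots,\ 0 \le \theta < \infty \qquad [17]$$

$M^\theta$ is called the Poissonization of the measures $M_n$.
The analysis of the asymptotic properties of $M_n$ and
$M^\theta$ has been important in connection to the famous
Ulam problem and related questions in representation
theory.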
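
The Plancherel measure [16] and its Poissonization [17] are easy to tabulate for small $n$. The sketch below computes $\dim \lambda$ with the hook length formula, which is a standard result used here as an assumption since it is not stated in this article; the function names are illustrative.

```python
import math
from fractions import Fraction


def partitions(n, largest=None):
    """Yield all partitions of n as weakly decreasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def dimension(shape):
    """dim(lambda) by the hook length formula: n! divided by the product of hooks."""
    hooks = 1
    for i, row in enumerate(shape):
        for j in range(row):
            arm = row - j - 1                                   # cells to the right
            leg = sum(1 for r in shape[i + 1:] if r > j)        # cells below
            hooks *= arm + leg + 1
    return math.factorial(sum(shape)) // hooks


def plancherel(shape):
    """Plancherel weight (dim lambda)^2 / n! as an exact fraction."""
    return Fraction(dimension(shape) ** 2, math.factorial(sum(shape)))


def poissonized(shape, theta):
    """Poissonized weight exp(-theta) theta^n / n! times the Plancherel weight."""
    n = sum(shape)
    return math.exp(-theta) * theta ** n / math.factorial(n) * float(plancherel(shape))
```

Summing `plancherel` over all partitions of a fixed $n$ gives exactly 1, and summing `poissonized` over all partitions of all sizes up to a modest cutoff gives 1 up to the Poisson tail.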

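It was shown by Borodin and Okounkov (2000),
and, independently, Johansson (2001) that $M^\theta$ is a
determinantal random point field. The corresponding
correlation kernel $K$ (in the so-called modified
Frobenius coordinates) is a so-called discrete Bessel
kernel on $\mathbb{Z}^1$,

$$K(x, y) = \begin{cases} \sqrt{\theta}\, \dfrac{J_{|x| - 1/2}(2\sqrt{\theta})\, J_{|y| + 1/2}(2\sqrt{\theta}) - J_{|x| + 1/2}(2\sqrt{\theta})\, J_{|y| - 1/2}(2\sqrt{\theta})}{|x| - |y|} & \text{if } xy > 0 \\[2ex] \sqrt{\theta}\, \dfrac{J_{|x| - 1/2}(2\sqrt{\theta})\, J_{|y| - 1/2}(2\sqrt{\theta}) - J_{|x| + 1/2}(2\sqrt{\theta})\, J_{|y| + 1/2}(2\sqrt{\theta})}{x - y} & \text{if } xy < 0 \end{cases} \qquad [18]$$

where $J_x(\cdot)$ is the Bessel function of order $x$. One
can observe that the kernel $K(x, y)$ is not Hermitian,
but the restriction of this kernel to the positive and
negative semiaxes is Hermitian.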

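$M^\theta$ is a special case of an infinite parameter family
of probability measures on $\mathrm{Par}$, called the Schur
measures, and defined as

$$M(\lambda) = \frac{1}{Z}\, s_\lambda(x)\, s_\lambda(y) \qquad [19]$$

where $s_\lambda$ are the Schur functions, $x = (x_1, x_2, \ldots)$
and $y = (y_1, y_2, \ldots)$ are parameters such that

$$Z = \sum_{\lambda \in \mathrm{Par}} s_\lambda(x)\, s_\lambda(y) = \prod_{i,j} (1 - x_i y_j)^{-1} \qquad [20]$$

is finite and $\{x_i\}_{i=1}^{\infty} = \overline{\{y_i\}}_{i=1}^{\infty}$. It was shown by
Okounkov (2001) that the Schur measures belong
to the class of the determinantal random point fields.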


Nonintersecting Paths of a Markov Process 


Let $p_{t,s}(x, y)$ be the transition probability of a
Markov process $\xi(t)$ on $\mathbb{R}$ with continuous trajectories
and let $(\xi_1(t), \xi_2(t), \ldots, \xi_n(t))$ be $n$ independent
copies of the process. A classical result of Karlin and
McGregor (1959) states that if $n$ particles start at
the positions $x_1^{(0)} < x_2^{(0)} < \cdots < x_n^{(0)}$, then the
probability density of their joint distribution at
time $t_1 > 0$, given that their paths have not
intersected for all $0 \le t \le t_1$, is equal to

$$\pi_{t_1}(x_1^{(1)}, \ldots, x_n^{(1)}) = \det\bigl(p_{0,t_1}(x_i^{(0)}, x_j^{(1)})\bigr)_{i,j=1}^{n}$$

provided the process $(\xi_1(t), \xi_2(t), \ldots, \xi_n(t))$ in $\mathbb{R}^n$ has
a strong Markovian property.
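
The determinantal structure of the Karlin-McGregor formula has a discrete analogue that can be verified by direct enumeration. The sketch below is such an analogue, not the continuous-trajectory setting of the text: two simple random walks on $\mathbb{Z}$ with steps $\pm 1$, started at sites of the same parity so that they cannot cross without meeting. The number of non-intersecting path pairs is compared with a 2 x 2 determinant of single-walk path counts. All names are illustrative.

```python
from itertools import product
from math import comb


def walk_count(start, end, steps):
    """Number of +-1 lattice paths from start to end with the given number of steps."""
    diff = end - start
    if abs(diff) > steps or (steps + diff) % 2:
        return 0
    return comb(steps, (steps + diff) // 2)


def determinant_count(starts, ends, steps):
    """2 x 2 determinant of single-walk path counts."""
    (a, b), (c, d) = starts, ends
    return (walk_count(a, c, steps) * walk_count(b, d, steps)
            - walk_count(a, d, steps) * walk_count(b, c, steps))


def brute_force_count(starts, ends, steps):
    """Enumerate all pairs of paths and keep those that never occupy the same site."""
    count = 0
    for first in product((-1, 1), repeat=steps):
        for second in product((-1, 1), repeat=steps):
            x, y = starts
            intersected = False
            for dx, dy in zip(first, second):
                x += dx
                y += dy
                if x >= y:          # the lower walker has caught the upper one
                    intersected = True
                    break
            if not intersected and (x, y) == tuple(ends):
                count += 1
    return count
```

Dividing either count by $4^{\text{steps}}$ turns it into the probability that the two walkers reach the given end points without meeting.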

Let $0 < t_1 < t_2 < \cdots < t_{m+1}$. The conditional
probability density that the particles are in the
positions $x_1^{(1)} < x_2^{(1)} < \cdots < x_n^{(1)}$ at time $t_1$, at
the positions $x_1^{(2)} < x_2^{(2)} < \cdots < x_n^{(2)}$ at time $t_2$, ...,
at the positions $x_1^{(m)} < x_2^{(m)} < \cdots < x_n^{(m)}$ at time $t_m$,
given that at time $t_{m+1}$ they are at the positions
$x_1^{(m+1)} < x_2^{(m+1)} < \cdots < x_n^{(m+1)}$ and their paths have
not intersected, is then equal to

$$\pi_{t_1, \ldots, t_m}(x_1^{(1)}, \ldots, x_n^{(m)}) = \frac{1}{Z_{n,m}} \prod_{l=0}^{m} \det\bigl(p_{t_l, t_{l+1}}(x_i^{(l)}, x_j^{(l+1)})\bigr)_{i,j=1}^{n} \qquad [21]$$

where $t_0 = 0$.

It is not difficult to show that [21] can be viewed
as a determinantal random point process (see, e.g.,
Johansson (2003)).

Formulas of a similar type also appeared in
the papers by Johansson, Prähofer, Spohn, Ferrari,
Forrester, Nagao, Katori, and Tanemura in the
analysis of polynuclear growth models, random
walks on a discrete circle, and related problems.


Ergodic Properties 


As before, let $(X, \mathcal{B}, \Pr)$ be a random point field
with a one-particle space $E$. Hence, $X$ is a space of
the locally finite configurations of particles in $E$, $\mathcal{B}$ a
Borel $\sigma$-algebra of measurable subsets of $X$, and $\Pr$ a
probability measure on $(X, \mathcal{B})$. Throughout this
section, we assume $E = \mathbb{R}^d$ or $\mathbb{Z}^d$. We define an
action $\{T^t\}_{t \in E}$ of the additive group $E$ on $X$ in the
following natural way:

$$T^t : X \to X, \quad (T^t \xi)_i = (\xi)_i + t \qquad [22]$$

We recall that a random point field $(X, \mathcal{B}, P)$ is
called translation invariant if, for any $A \in \mathcal{B}$, any
$t \in E$, $\Pr(T^{-t}A) = \Pr(A)$. The translation invariance
of the correlation kernel $K(x, y) = K(x - y, 0) =: K(x - y)$
implies the translation invariance of
k-point correlation functions

$$\rho_k(x_1 + t, \ldots, x_k + t) = \rho_k(x_1, \ldots, x_k) \quad \text{a.e.},\ k = 1, 2, \ldots,\ t \in E \qquad [23]$$

which, in turn, implies the translation invariance
of the random point field. The ergodic properties
of such point fields were studied by several
mathematicians (Soshnikov 2000, Shirai and
Takahashi 2003, Lyons and Steif 2003). The
first general result in this direction was obtained
by Soshnikov (2000).


Theorem 2 Let $(X, \mathcal{B}, P)$ be a determinantal random
point field with a translation-invariant correlation
kernel. Then the dynamical system $(X, \mathcal{B}, P, \{T^t\})$
is ergodic, has the mixing property of any multiplicity,
and its spectrum is absolutely continuous.


We refer the reader to the article on ergodic
theory for the definitions of ergodicity, mixing
property, absolutely continuous spectrum of the
dynamical system, etc.

In the discrete case [15], $E = \mathbb{Z}^d$, more is known.
Lyons and Steif (2003) proved that the shift
dynamical system is Bernoulli, that is, it is isomorphic
(in the ergodic theory sense) to an i.i.d.
process. Under the additional conditions $\mathrm{Spec}(K) \subset (0, 1)$
and $\sum_n |n|\, |K(n)|^2 < \infty$, Shirai and Takahashi
(2003a) proved the uniform mixing property.


Gibbsian Properties 


Costin and Lebowitz (1995) were the first to
question the Gibbsian nature of the determinantal
random point fields; they studied the continuous
determinantal random point process on $\mathbb{R}^1$ with a
so-called sine correlation kernel

$$K(x, y) = \frac{\sin \pi(x - y)}{\pi(x - y)}$$

The first rigorous result (in the discrete case) was
established by Shirai and Takahashi (2003b).


Theorem 3 Let $E$ be a countable discrete space
and $K$ a symmetric bounded operator on $\ell^2(E)$.
Assume that $\mathrm{Spec}(K) \subset (0, 1)$. Then $(X, \mathcal{B}, P)$ is a
Gibbs measure with the potential $U$ given by
$U(x \mid \xi) = -\log\bigl(J(x, x) - \langle J_\xi^{-1} j_\xi^x, j_\xi^x \rangle\bigr)$, where $x \in E$, $\xi \in X$,
$\{x\} \cap \xi = \emptyset$. Here $J(x, y)$ stands for the kernel of the
operator $J = (\mathrm{Id} - K)^{-1} K$, and we set $J_\xi = (J(y, z))_{y,z \in \xi}$
and $j_\xi^x = (J(x, y))_{y \in \xi}$.


We recall that the Gibbsian property of the
probability measure $P$ on $(X, \mathcal{B})$ means that

$$\mathbb{E}[F \mid \mathcal{B}_{\Lambda^c}](\xi) = \frac{1}{Z_{\Lambda,\xi}} \sum_{\eta \subset \Lambda} e^{-U(\eta \mid \xi_{\Lambda^c})}\, F(\eta \cup \xi_{\Lambda^c})$$

where $\Lambda$ is a finite subset of $E$, $\mathcal{B}_{\Lambda^c}$ is the $\sigma$-algebra
generated by the $\mathcal{B}$-measurable functions with the
support outside of $\Lambda$, and $\mathbb{E}[F \mid \mathcal{B}_{\Lambda^c}]$ is the conditional
mathematical expectation of the integrable function
$F$ on $(X, \mathcal{B}, P)$ with respect to the $\sigma$-algebra $\mathcal{B}_{\Lambda^c}$. The
potential $U$ is uniquely defined by the values of
$U(x \mid \xi)$, as follows from the following recursive
relation:

$$U(\{x_1, \ldots, x_n\} \mid \xi) = U(x_n \mid \{x_1, \ldots, x_{n-1}\} \cup \xi) + U(x_{n-1} \mid \{x_1, \ldots, x_{n-2}\} \cup \xi) + \cdots + U(x_1 \mid \xi)$$

For additional information about the Gibbsian
property, see Introductory Articles: Equilibrium
Statistical Mechanics. Much less is known in the
continuous case. Some generalized form of Gibbsianness,
under quite restrictive conditions, was recently
established by Georgii and Yoo (2004).


Central Limit Theorem for Counting 
Function 


In this section, we discuss the central-limit theorem
type results for the linear statistics. The first
important result in this direction was established
by Costin and Lebowitz in 1995, who proved the
central-limit theorem for the number of particles in
the growing box, $\#_{[-L,L]}$, $L \to \infty$, in the case of the
determinantal random point process on $\mathbb{R}^1$ with the
sine correlation kernel

$$K(x, y) = \frac{\sin \pi(x - y)}{\pi(x - y)}$$

Below we formulate the Costin-Lebowitz theorem
in its general form due to Soshnikov (1999, 2000).


Theorem 4 Let $E$ be a one-particle space,
$\{0 \le K_t \le 1\}$ a family of locally trace-class operators in
$L^2(E)$, $\{(X, \mathcal{B}, P_t)\}$ a family of the corresponding
determinantal random point fields in $E$, and $\{I_t\}$ a
family of measurable subsets in $E$ such that

$$\mathrm{Var}\, \#_{I_t} = \mathrm{Tr}\bigl(K_t \chi_{I_t} - (K_t \chi_{I_t})^2\bigr) \to \infty \quad \text{as } t \to \infty \qquad [24]$$

Then the distribution of the normalized number of
particles in $I_t$ (with respect to $P_t$) converges to the
normal law, that is,

$$\frac{\#_{I_t} - \mathbb{E}\, \#_{I_t}}{\sqrt{\mathrm{Var}\, \#_{I_t}}} \xrightarrow{\ w\ } N(0, 1)$$

An analogous result holds for the joint distribution
of the counting functions $\{\#_{I_t^1}, \ldots, \#_{I_t^k}\}$, where
$I_t^1, \ldots, I_t^k$ are disjoint measurable subsets in $E$.
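
The variance formula in [24] can be recovered from the one- and two-point correlation functions alone. The sketch below does this exactly, in rational arithmetic, for a window of consecutive sites in $\mathbb{Z}$, using the illustrative kernel $K(x, y) = \tfrac{1}{3}\, 2^{-|x-y|}$; the kernel and the function names are assumptions of the example.

```python
from fractions import Fraction


def kernel(x, y):
    """Illustrative symmetric kernel on Z (an assumption of this sketch)."""
    return Fraction(1, 3) * Fraction(1, 2) ** abs(x - y)


def variance_from_correlations(sites):
    """Var(#) = E[#(# - 1)] + E[#] - (E[#])^2 using rho_1 and rho_2."""
    mean = sum(kernel(x, x) for x in sites)
    # E[#(# - 1)] is the sum of rho_2 over ordered pairs of distinct sites.
    second_factorial_moment = sum(
        kernel(x, x) * kernel(y, y) - kernel(x, y) * kernel(y, x)
        for x in sites for y in sites if x != y
    )
    return second_factorial_moment + mean - mean ** 2


def variance_from_trace(sites):
    """Var(#) = Tr(K_I) - Tr(K_I^2), with K_I the kernel restricted to the window."""
    trace_k = sum(kernel(x, x) for x in sites)
    trace_k_squared = sum(kernel(x, y) * kernel(y, x) for x in sites for y in sites)
    return trace_k - trace_k_squared
```

For this kernel the variance grows linearly with the size of the window, so condition [24] holds along any sequence of growing intervals.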


The proof of the Costin-Lebowitz theorem uses
the k-point cluster functions. In the determinantal
case, the cluster functions have a simple form

$$r_k(x_1, \ldots, x_k) = (-1)^{k-1}\, \frac{1}{k} \sum_{\sigma \in S_k} K(x_{\sigma(1)}, x_{\sigma(2)})\, K(x_{\sigma(2)}, x_{\sigma(3)}) \cdots K(x_{\sigma(k)}, x_{\sigma(1)}) \qquad [25]$$

The importance of the cluster functions stems from
the fact that the integral of the k-point cluster
function over the k-cube with a side $I$ can be expressed
as a linear combination of the first $k$ cumulants of the
counting random variable $\#_I$. In other words,

$$\int_{I^k} r_k(x_1, \ldots, x_k)\, dx_1 \cdots dx_k = \sum_{l=1}^{k} \beta_{kl}\, C_l(\#_I) \qquad [26]$$

where $C_l(\#_I)$ denotes the lth cumulant of $\#_I$ and $\beta_{kl}$ are
numerical coefficients. It follows from [25] that the integral at the LHS of
[26] equals, up to a factor $(-1)^{k-1}(k - 1)!$, the trace
of the kth power of the restriction of $K$ to $I$. This
allows one to estimate the cumulants of the counting
random variable $\#_I$. For details, we refer the reader
to Soshnikov (2000). The central-limit theorem for a
general class of linear statistics, under some technical
assumptions on the correlation kernel, was
proved in Soshnikov (2002). Finally, we refer the
reader to Soshnikov (2000) for the functional
central-limit theorem for the empirical distribution
function of the nearest spacings.


Generalizations: Immanantal and Pfaffian 
Point Processes 


In this section, we discuss two important general- 
izations of the determinantal point processes. 


Immanantal Processes


Immanantal random point processes were introduced
by P Diaconis and S N Evans in 2000. Let $\lambda$ be a
partition of $n$. Denote by $\chi^\lambda$ the character of the
corresponding irreducible representation of the symmetric
group $S_n$. Let $K(x, y)$ be a non-negative-definite,
Hermitian kernel. An immanantal random point
process is defined through the correlation functions

$$\rho_n(x_1, \ldots, x_n) = \sum_{\sigma \in S_n} \chi^\lambda(\sigma) \prod_{i=1}^{n} K(x_i, x_{\sigma(i)}) \qquad [27]$$

In other words, the correlation functions are given by
the immanants of the matrix with the entries
$K(x_i, x_j)$. We will denote the RHS of [27] by
$K^{\chi^\lambda}[x_1, \ldots, x_n]$.
In the special case $\lambda = (1^n)$ (i.e., $\lambda$ consists of $n$
parts, all of which equal 1), one obtains that
$\chi^\lambda(\sigma) = (-1)^\sigma$, and $K^{\chi^\lambda}[x_1, \ldots, x_n] = \det(K(x_i, x_j))$.
Therefore, in the case $\lambda = (1^n)$ the random point
process with the correlation functions [27] is a
determinantal random point process. When $\lambda = (n)$
(i.e., the partition has only one part, namely $n$) we
have $\chi^\lambda = 1$ identically, and
$K^{\chi^\lambda}[x_1, \ldots, x_n] = \mathrm{per}(K(x_i, x_j))$, the permanent of the matrix $K(x_i, x_j)$.
The corresponding random point process is known as
the boson random point process.
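
The two extreme cases of [27] can be computed by summing over permutations directly. The sketch below implements a generic immanant that takes the character as a function of the permutation and specializes it to the sign character (determinant) and the trivial character (permanent). Characters of other partitions are not implemented; the function names are illustrative.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation given as a tuple of images of 0, ..., n-1."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def immanant(matrix, character):
    """Sum over permutations of character(sigma) * prod_i matrix[i][sigma(i)]."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        term = character(perm)
        for i in range(n):
            term *= matrix[i][perm[i]]
        total += term
    return total


def determinant(matrix):
    """Immanant for the sign character, i.e. the partition (1^n)."""
    return immanant(matrix, sign)


def permanent(matrix):
    """Immanant for the trivial character, i.e. the partition (n)."""
    return immanant(matrix, lambda perm: 1)
```

For a non-negative-definite matrix the permanent is at least as large as the determinant, which reflects the bunching of bosons against the repulsion of fermions.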


Pfaffian Processes 


Let

$$K(x, y) = \begin{pmatrix} K_{11}(x, y) & K_{12}(x, y) \\ K_{21}(x, y) & K_{22}(x, y) \end{pmatrix}$$

be an antisymmetric $2 \times 2$ matrix-valued kernel, that
is, $K_{ij}(x, y) = -K_{ji}(y, x)$, $i, j = 1, 2$. The kernel defines
an integral operator acting on $L^2(E) \oplus L^2(E)$, which
we assume to be locally trace class. A random point
process on $E$ is called Pfaffian if its point correlation
functions have a Pfaffian form

$$\rho_k(x_1, \ldots, x_k) = \mathrm{Pf}\bigl(K(x_i, x_j)\bigr)_{1 \le i,j \le k}, \quad k \ge 1 \qquad [28]$$

The RHS of [28] is the Pfaffian of the $2k \times 2k$
antisymmetric matrix (since each entry $K(x_i, x_j)$ is a
$2 \times 2$ block). Determinantal random point processes
are a special case of the Pfaffian processes,
corresponding to the matrix kernel of the form

$$K(x, y) = \begin{pmatrix} 0 & K(x, y) \\ -K(y, x) & 0 \end{pmatrix}$$

where the $K$ appearing in the matrix entries is a scalar kernel.
The most well-known examples of Pfaffian random point processes
that cannot be reduced to determinantal form are the
$\beta = 1$ and $\beta = 4$ polynomial ensembles of random
matrices and their limits (in the bulk and at the edge
of the spectrum), as the size of a matrix goes to
infinity.
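
The reduction of the Pfaffian form to the determinantal one can be checked on small matrices. The sketch below computes a Pfaffian by expansion along the first row and applies it to the $2k \times 2k$ matrix assembled from the $2 \times 2$ blocks with entries $0$, $k_{ij}$, $-k_{ji}$, $0$, where $k_{ij}$ stands for the scalar kernel evaluated at a pair of points. The result agrees with the determinant of $(k_{ij})$. Function names are illustrative.

```python
def pfaffian(a):
    """Pfaffian of an antisymmetric matrix by expansion along the first row."""
    n = len(a)
    if n == 0:
        return 1
    if n % 2:
        return 0
    total = 0
    for j in range(1, n):
        if a[0][j] == 0:
            continue
        keep = [i for i in range(n) if i not in (0, j)]
        minor = [[a[r][c] for c in keep] for r in keep]
        total += (-1) ** (j - 1) * a[0][j] * pfaffian(minor)
    return total


def determinant(m):
    """Determinant by Laplace expansion along the first row."""
    if not m:
        return 1
    return sum((-1) ** j * entry * determinant([row[:j] + row[j + 1:] for row in m[1:]])
               for j, entry in enumerate(m[0]))


def block_matrix(k):
    """Interleave the blocks [[0, k_ij], [-k_ji, 0]] into a 2n x 2n antisymmetric matrix."""
    n = len(k)
    a = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            a[2 * i][2 * j + 1] = k[i][j]
            a[2 * i + 1][2 * j] = -k[j][i]
    return a
```

The recursive expansion has factorial cost and is meant only for the few-point checks shown here.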
Introduction


Consider the dynamical system on $\mathbb{R}^d$ described by
the equation

$$\dot u \equiv \frac{du}{dt} = G(u) + \varepsilon F(u) \qquad [1]$$

where $F, G : S \subset \mathbb{R}^d \to \mathbb{R}^d$ are analytic functions
and $\varepsilon$ a real (small) parameter. Suppose also that for
$\varepsilon = 0$ a solution $u_0 : \mathbb{R} \to S$ (for some initial condition
$u_0(0) = \bar u$) is known.

We look for a solution of [1] which is a
perturbation of $u_0$, that is, for a solution $u$ which
can be written in the form $u = u_0 + U$, with
$U = O(\varepsilon)$ and $U(0) = \bar U = u(0) - \bar u$. Then we consider
the variational equation

$$\dot U = M(t)\, U + \Phi(t), \quad M_{ij}(t) = \partial_{u_j} G_i(u_0(t)) \qquad [2]$$

where $\Phi(t) = \Phi(u_0(t), U)$, with
$\Phi(u_0, U) = G(u_0 + U) - G(u_0) - \partial_u G(u_0)\, U + \varepsilon F(u_0 + U)$. By defining the
Wronskian matrix $W$ as the solution of the
matrix equation $\dot W = M(t) W$ such that $W(0) = \mathbb{1}$
(the columns of $W$ are given by $d$ independent
solutions of the linear equation $\dot u = M(t) u$), we can
write

$$U(t) = W(t)\, \bar U + W(t) \int_0^t d\tau\, W^{-1}(\tau)\, \Phi(\tau) \qquad [3]$$


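If we expect the solution $U$ to be of order $\varepsilon$, we can
try to write it as a Taylor series in $\varepsilon$, that is,

$$U(t) = \sum_{k=1}^{\infty} \varepsilon^k\, U^{(k)}(t) \qquad [4]$$

and, by inserting [4] into [3] and equating the
coefficients with the same Taylor order, we
obtain

$$U^{(k)}(t) = W(t)\, \bar U^{(k)} + W(t) \int_0^t d\tau\, W^{-1}(\tau)\, \Phi^{(k)}(\tau) \qquad [5]$$

where $\Phi^{(k)}(t)$ is defined as

$$\Phi^{(k)}(t) = \sum_{p=2}^{\infty} \frac{1}{p!} \frac{\partial^p G}{\partial u^p}(u_0(t)) \sum_{k_1 + \cdots + k_p = k} U^{(k_1)}(t) \cdots U^{(k_p)}(t) + \sum_{p=1}^{\infty} \frac{1}{p!} \frac{\partial^p F}{\partial u^p}(u_0(t)) \sum_{k_1 + \cdots + k_p = k-1} U^{(k_1)}(t) \cdots U^{(k_p)}(t), \quad k \ge 2 \qquad [6]$$

while for $k = 1$ one has $\Phi^{(1)}(t) = F(u_0(t))$.
Hence $\Phi^{(k)}(t)$ depends only on coefficients of orders
strictly less than $k$. In this way, we obtain an
algorithm useful for constructing the solution
recursively, so that the problem is solved, up to
(substantial) convergence problems.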
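
The recursive construction can be followed by hand in a deliberately simple scalar case, which is a hypothetical example chosen here and not one treated in this article: $G \equiv 0$ and $F(u) = u$, so that $u_0(t) = \bar u$, $M = 0$, $W = 1$, and the initial datum is left unperturbed ($\bar U^{(k)} = 0$). Then $\Phi^{(1)} = \bar u$ and $\Phi^{(k)} = U^{(k-1)}$ for $k \ge 2$, and each step of the recursion is an integration of a polynomial in $t$. All names in the sketch are illustrative.

```python
import math
from fractions import Fraction


def integrate_poly(coeffs):
    """Integrate sum_j c_j t^j from 0 to t; returns the new coefficient list."""
    return [Fraction(0)] + [c / (j + 1) for j, c in enumerate(coeffs)]


def series_coefficients(u_bar, order):
    """Return [U^(1), ..., U^(order)] as polynomial coefficient lists in t."""
    coefficients = []
    phi = [Fraction(u_bar)]           # Phi^(1)(t) = F(u_0(t)) = u_bar
    for _ in range(order):
        u_k = integrate_poly(phi)     # W = 1 and zero initial correction
        coefficients.append(u_k)
        phi = u_k                     # Phi^(k+1) = F'(u_0) U^(k) = U^(k)
    return coefficients


def evaluate(coeffs, t):
    return sum(float(c) * t ** j for j, c in enumerate(coeffs))


def truncated_solution(u_bar, eps, t, order):
    """u_0 plus the perturbation series truncated at the given order."""
    return u_bar + sum(eps ** (k + 1) * evaluate(c, t)
                       for k, c in enumerate(series_coefficients(u_bar, order)))
```

In this toy case $U^{(k)}(t) = \bar u\, t^k / k!$, so the coefficients grow in time in a k-dependent way, while the full series sums to the exact solution $\bar u\, e^{\varepsilon t}$.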


Historical Excursus 


The study of a system like [1] by following the 
strategy outlined above can be hopeless if we do not 
make some further assumptions on the types of 
motions we are looking for. 

We shall see later, in a concrete example, that the
coefficients $U^{(k)}(t)$ can increase in time, in a
k-dependent way, thus preventing the convergence of
the series for large $t$. This is a general feature of this
class of problems: if no care is taken in the choice of
the initial datum, the algorithm can provide a
reliable description of the dynamics only for a very
short time.

However, if one looks for solutions having a
special dependence on time, things can work better.
This happens, for instance, if one looks for quasiperiodic
solutions, that is, functions which depend on
time through the variable $\psi = \omega t$, with $\omega \in \mathbb{R}^N$ a
vector with rationally independent components,
that is, such that $\omega \cdot \nu \ne 0$ for all $\nu \in \mathbb{Z}^N \setminus \{0\}$
(the dot denotes the standard inner product,
$\omega \cdot \nu = \omega_1 \nu_1 + \cdots + \omega_N \nu_N$). A typical problem of
interest is: what happens to a quasiperiodic solution
$u_0(t)$ when a perturbation $\varepsilon F$ is added to the
unperturbed vector field $G$, as in [1]? Situations of
this type arise when considering perturbations of
integrable systems: a classical example is provided by
planetary motion in celestial mechanics.

Perturbation series such as [4] have been extensively 
studied by astronomers in order to obtain a more 
accurate description of the celestial motions compared 
to that following from Kepler’s theory (in which all 
interactions between planets are neglected and the 
planets themselves are considered as points). In 
particular, we recall the works of Newcomb and 
Lindstedt (series such as [4] are now known as 
Lindstedt series). At the end of the nineteenth century,
Poincaré showed that the series describing quasiperiodic
motions are well defined up to any perturbation
order $k$ (at least if the perturbation is a trigonometric
polynomial), provided that the components of $\omega$ are
assumed to be rationally independent: this means that,
under this condition, the coefficients $U^{(k)}(t)$ are
defined for all $k \in \mathbb{N}$. However, Poincaré also showed
that, in general, the series are divergent; this is due to
the fact that, as seen later, in the perturbation series
small divisors $\omega \cdot \nu$ appear, which, even if they do not
vanish, can be arbitrarily close to zero.




The convergence of the series can indeed be proved
(more generally for analytic perturbations, or
even for sufficiently smooth ones) by
assuming on $\omega$ a stronger nonresonance condition,
such as the Diophantine condition

\[ |\omega \cdot \nu| > \frac{C_0}{|\nu|^{\tau}} \qquad \forall \nu \in \mathbb{Z}^N \setminus \{0\} \]  [7]

where $|\nu| = |\nu_1| + \cdots + |\nu_N|$, and $C_0$ and $\tau$ are
positive constants. We note that the set of vectors
satisfying [7] for some positive constant $C_0$ has full
measure in $\mathbb{R}^N$ provided one takes $\tau > N - 1$.
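To get a feeling for condition [7], the sketch below evaluates the smallest value of $|\omega \cdot \nu|\,|\nu|^{\tau}$ over all integer vectors with $0 < |\nu| \le$ `max_norm`. The function name and the cutoff are assumptions made for the example, and a finite scan of this kind can of course only suggest, not prove, that a vector is Diophantine.

```python
import math
from itertools import product


def diophantine_constant(omega, tau, max_norm):
    """Minimum of |omega . nu| * |nu|**tau over 0 < |nu| <= max_norm.

    |nu| is the sum of the absolute values of the components, as in [7].
    A value of 0.0 signals a resonance omega . nu = 0 within the range.
    """
    best = math.inf
    for nu in product(range(-max_norm, max_norm + 1), repeat=len(omega)):
        norm = sum(abs(c) for c in nu)
        if norm == 0 or norm > max_norm:
            continue
        divisor = abs(sum(w * c for w, c in zip(omega, nu)))
        best = min(best, divisor * norm ** tau)
    return best


# Golden-mean frequency vector (N = 2, tau = 1) versus a resonant one.
golden = (math.sqrt(5) - 1) / 2
c_golden = diophantine_constant((1.0, golden), tau=1, max_norm=60)
c_resonant = diophantine_constant((1.0, 0.5), tau=1, max_norm=5)
```

For the golden-mean vector the minimum stays bounded away from zero as the cutoff grows, while for the resonant vector $(1, 1/2)$ it vanishes already at $\nu = (1, -2)$.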

Such a result is part of the Kolmogorov-Arnold-Moser
(KAM) theorem, and it was first proved by
Kolmogorov in 1954, following an approach quite
different from the one described here. New proofs
were given in 1962 by Arnol'd and by Moser, but
only very recently, in 1988, Eliasson gave a proof in
which a bound $C^k$ is explicitly derived for the
coefficients $U^{(k)}(t)$, again implying convergence for
$\varepsilon$ small enough.

Eliasson’s work was not immediately known widely, 
and only after publication of papers by Gallavotti and 
by Chierchia and Falcolini, in which Eliasson’s ideas 
were revisited, did his work become fully appreciated. 
The study of the perturbation series [4] employs techniques
very similar to those typical of a very different
field of mathematical physics, quantum field
theory, even if such an analogy was stressed and
used to its full extent only in subsequent papers.

The techniques have so far been applied to a wide 
class of problems of dynamical systems: a list of 
original results is given at the end. 


A Paradigmatic Example 


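Consider the case $\mathcal{S} = \mathcal{A} \times \mathbb{T}^N$, with
$\mathcal{A}$ an open subset of $\mathbb{R}^N$, and let
$H_0 : \mathcal{A} \to \mathbb{R}$ and
$f : \mathcal{A} \times \mathbb{T}^N \to \mathbb{R}$
be two analytic functions. Then consider the Hamiltonian
system with Hamiltonian
$H(A, \alpha) = H_0(A) + \varepsilon f(A, \alpha)$.
The corresponding equations describe a
dynamical system of the form [1], with $u = (A, \alpha)$,
which can be written explicitly:

\[ \dot A = -\varepsilon\,\partial_\alpha f(A, \alpha), \qquad \dot\alpha = \partial_A H_0(A) + \varepsilon\,\partial_A f(A, \alpha) \]  [8]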


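Suppose, for simplicity, $H_0(A) = A^2/2$ and
$f(A, \alpha) = f(\alpha)$, where $A^2 = A \cdot A$. Then, we obtain
for $\alpha$ the following closed equation:

\[ \ddot\alpha = -\varepsilon\,\partial_\alpha f(\alpha) \]  [9]

while $A$ can be obtained by direct integration once
[9] has been solved. For $\varepsilon = 0$, [9] gives trivially
$\alpha = \alpha_0(t) = \alpha_0 + \omega t$, where
$\omega = \partial_A H_0(A_0) = A_0$ is
called the rotation (or frequency) vector. Hence, for
$\varepsilon = 0$ all solutions are quasiperiodic. We are
interested in the preservation of quasiperiodic solutions
when $\varepsilon \neq 0$.

For $\varepsilon \neq 0$, we can write, as in [3],

\[ \alpha(t) = \alpha_0(t) + a(t), \qquad a(t) = \sum_{k=1}^{\infty} \varepsilon^k\, a^{(k)}(t) \]  [10]

where $a^{(k)}$ is determined as the solution of the
equation

\[ a^{(k)}(t) = \bar a^{(k)} + t\,\bar A^{(k)} - \int_0^t d\tau \int_0^{\tau} d\tau'\, [\partial_\alpha f(\alpha(\tau'))]^{(k-1)} \]  [11]

with $[\partial_\alpha f(\alpha(\tau))]^{(k-1)}$ expressed as in [6].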

The quasiperiodic solutions with rotation vector
$\omega$ could be written as a Fourier series, by
expanding

\[ a^{(k)}(t) = \sum_{\nu \in \mathbb{Z}^N} e^{i \nu \cdot \omega t}\, a^{(k)}_\nu \]  [12]

with $\omega$ as before. If the series [10], with the Taylor
coefficients as in [12], exists, it will describe a
quasiperiodic solution analytic in $\varepsilon$, and in such a
case we say that it is obtained by continuation of the
unperturbed one with rotation vector $\omega$, that is,
$\alpha_0(t)$.

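Suppose that the integrand $[\partial_\alpha f(\alpha(\tau'))]^{(k-1)}$ in
[11] has vanishing average. Then the integral over
$\tau'$ in [11] produces a quasiperiodic function, which
in general has a nonvanishing average, so that
the integral over $\tau$ produces a quasiperiodic
function plus a term linear in $t$. If we choose $\bar A^{(k)}$
in [11] so as to cancel out exactly the term linear
in time, we end up with a quasiperiodic function.
In Fourier space, an explicit calculation gives, for
all $\nu \neq 0$,

\[ a^{(1)}_\nu = \frac{i \nu\, f_\nu}{(\omega \cdot \nu)^2}, \qquad a^{(k)}_\nu = \frac{1}{(\omega \cdot \nu)^2} \sum_{p=1}^{\infty} \sum_{k_1+\cdots+k_p=k-1} \sum_{\nu_0+\nu_1+\cdots+\nu_p=\nu} \frac{(i \nu_0)^{p+1}}{p!}\, f_{\nu_0}\, a^{(k_1)}_{\nu_1} \cdots a^{(k_p)}_{\nu_p}, \qquad k \ge 2 \]  [13]

where $f_\nu$ denotes the Fourier coefficients of $f$,
which again is suitable for an iterative construction
of the solution. The coefficients $a^{(k)}_0$ are left
undetermined, and we can fix them (arbitrarily) as
identically vanishing.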
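The iterative construction can be sketched in code for the simplest case $N = 1$, where $\omega \cdot \nu = \omega \nu$ and tensors reduce to numbers. The sketch assumes the recursion in the form $a^{(1)}_\nu = i\nu f_\nu/(\omega\nu)^2$ and, for $k \ge 2$, $a^{(k)}_\nu = (\omega\nu)^{-2} \sum (i\nu_0)^{p+1} f_{\nu_0}\, a^{(k_1)}_{\nu_1} \cdots a^{(k_p)}_{\nu_p} / p!$, with $\nu_0 + \cdots + \nu_p = \nu$ and $k_1 + \cdots + k_p = k - 1$, and it fixes $a^{(k)}_0 = 0$. The function names and the dictionary representation of Fourier coefficients are assumptions made for the example.

```python
from itertools import product
from math import factorial


def compositions(n, p):
    """Yield all p-tuples of positive integers that sum to n."""
    if p == 1:
        if n >= 1:
            yield (n,)
        return
    for first in range(1, n - p + 2):
        for rest in compositions(n - first, p - 1):
            yield (first,) + rest


def lindstedt_coefficients(f_modes, omega, order):
    """Fourier coefficients a[k][nu] (nu != 0) for N = 1 up to `order`.

    f_modes maps nu -> f_nu. Also returns zero_mode[k], the nu = 0
    component of the right-hand side at order k, which has to vanish
    for the construction to be consistent (no secular terms).
    """
    a = {1: {nu: 1j * nu * fn / (omega * nu) ** 2
             for nu, fn in f_modes.items() if nu != 0}}
    zero_mode = {1: 0j}
    for k in range(2, order + 1):
        rhs = {}
        for p in range(1, k):
            for ks in compositions(k - 1, p):
                for nu0, f0 in f_modes.items():
                    factor = (1j * nu0) ** (p + 1) / factorial(p) * f0
                    # Choose one Fourier mode for each lower-order factor.
                    for nus in product(*(a[kj] for kj in ks)):
                        term = factor
                        for kj, nuj in zip(ks, nus):
                            term *= a[kj][nuj]
                        nu = nu0 + sum(nus)
                        rhs[nu] = rhs.get(nu, 0j) + term
        zero_mode[k] = rhs.pop(0, 0j)
        a[k] = {nu: val / (omega * nu) ** 2 for nu, val in rhs.items()}
    return a, zero_mode


# Example: f(alpha) = cos(alpha), i.e. f_{+1} = f_{-1} = 1/2, with omega = 1.
a, zero_mode = lindstedt_coefficients({1: 0.5, -1: 0.5}, 1.0, 3)
```

For $f(\alpha) = \cos\alpha$ the first two orders correspond to $a^{(1)} = -\sin\psi/\omega^2$ and $a^{(2)} = \sin 2\psi/(8\omega^4)$, which can be checked by inserting them directly into [9]; the zero-mode terms vanish, as required.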

Of course, the property that the integrand in
[11] has zero average is fundamental; otherwise,
terms increasing as powers of $t$ would appear (the
so-called secular terms). Indeed, it is easy to
realize that, if this happened, to order $k$ terms
proportional to $t^{2k}$ could be present, thus requiring,
at best, $|\varepsilon| < |t|^{-2}$ for convergence up to time $t$.
This would exclude a fortiori the possibility of
quasiperiodic solutions.

The aforementioned property of zero average can
be verified only if the rotation vector is nonresonant,
that is, if its components are rationally independent
or, more particularly, if the Diophantine condition
[7] is satisfied. Such a result was first proved by
Poincaré, and it holds irrespective of how the
parameters $\bar a^{(k)}$ appearing in [11] are fixed. This
reflects the fact that quasiperiodic motions take
place on invariant surfaces (KAM tori), which can
be parameterized in terms of the angle variables
$\alpha(t)$, so that the values $\bar a^{(k)}$ contribute to the initial
phases, and the latter can be arbitrarily fixed.

The recursive equations [13] can be suitably 
studied by introducing a diagrammatic representa- 
tion, as explained below. 


Graphs and Trees 


A (connected) graph G is a collection of points, 
called vertices, and lines connecting all of them. We 
denote with V(G) and L(G) the set of vertices and 
the set of lines, respectively. A path between two 
vertices is a minimal subset of L(G) connecting the 
two vertices. A graph is planar if it can be drawn in 
a plane without graph lines crossing. 

A tree is a planar graph $G$ containing no closed
loops (cycles); in other words, it is a connected
acyclic graph. One can consider a tree $G$ with a
single special vertex $v_0$: this introduces a natural
partial ordering on the set of lines and vertices, and
one can imagine that each line carries an arrow
pointing toward the vertex $v_0$. We can add an extra
oriented line $\ell_0$ connecting the special vertex $v_0$ to
another point which will be called the root of the
tree; the added line will be called the root line. In
this way, we obtain a rooted tree $\theta$ defined by
$V(\theta) = V(G)$ and $L(\theta) = L(G) \cup \ell_0$. A labeled tree is
a rooted tree $\theta$ together with a label function defined
on the sets $V(\theta)$ and $L(\theta)$.

Two rooted trees which can be transformed into 
each other by continuously deforming the lines in 
the plane in such a way that the latter do not cross 
each other (i.e., without destroying the graph 
structure) will be said to be equivalent. This notion 
of equivalence can also be extended to labeled trees, 
simply by considering equivalent two labeled trees if 
they can be transformed into each other in such a 
way that the labels also match. 

Given two vertices $v, w \in V(\theta)$, we say that $w \prec v$
if $v$ is on the path connecting $w$ to the root line. One
can identify a line with the vertices it connects; given
a line $\ell = (v, w)$, one says that $\ell$ enters $v$ and exits $w$.
For each vertex $v$, we define the branching number as
the number $p_v$ of lines entering $v$.

The number of unlabeled trees with $k$ vertices can
be bounded by the number of random walks with $2k$
steps, that is, by $4^k$.
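The bound can be checked numerically under the assumption that unlabeled trees are counted as rooted plane trees, that is, rooted trees in which the order of the lines entering each vertex matters (with a coarser notion of equivalence the count would only be smaller). Splitting off the leftmost subtree gives a Catalan-type recursion; the function name is chosen for the example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_plane_trees(k):
    """Number of rooted plane trees with k vertices."""
    if k == 1:
        return 1
    # Remove the leftmost subtree hanging from the root (j vertices);
    # what is left is again a rooted plane tree with k - j vertices.
    return sum(count_plane_trees(j) * count_plane_trees(k - j)
               for j in range(1, k))


counts = [count_plane_trees(k) for k in range(1, 11)]
within_bound = all(count_plane_trees(k) <= 4 ** k for k in range(1, 40))
```

The counts are the Catalan numbers $1, 1, 2, 5, 14, 42, \ldots$, which grow like $4^k$ up to a power-law correction, so the bound $4^k$ is essentially sharp at the exponential level.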

The labels are as follows: with each vertex $v$ we
associate a mode label $\nu_v \in \mathbb{Z}^N$, and with each line
we associate a momentum $\nu_\ell \in \mathbb{Z}^N$, such that the
momentum of the line leaving the vertex $v$ is given
by the sum of the mode labels of all vertices
preceding $v$ (with $v$ being included): if $\ell = (v', v)$
then $\nu_\ell = \sum_{w \preceq v} \nu_w$. Note that for a fixed unlabeled
tree the branching numbers are uniquely determined,
and, for a given assignment of the mode labels, the
momenta of the lines are also uniquely determined.
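The rule for the momenta amounts to a sum over subtrees, which the following sketch computes by a depth-first visit. The representation of the tree as a dictionary from each vertex to the list of vertices whose lines enter it, the vertex names and the restriction to integer mode labels ($N = 1$) are assumptions made for the example.

```python
def line_momenta(children, mode, root_vertex):
    """Momentum of the line exiting each vertex.

    children[v] lists the vertices w whose exiting line enters v;
    mode[v] is the mode label of v. The momentum of the line exiting v
    is the sum of the mode labels of v and of all vertices preceding it.
    """
    momenta = {}

    def visit(v):
        total = mode[v]
        for w in children.get(v, []):
            total += visit(w)
        momenta[v] = total
        return total

    visit(root_vertex)
    return momenta


# A tree with four vertices: v1 and v2 enter v0, and v3 enters v2.
children = {"v0": ["v1", "v2"], "v2": ["v3"]}
mode = {"v0": 1, "v1": -2, "v2": 3, "v3": 1}
momenta = line_momenta(children, mode, "v0")
```

The momentum attached to `"v0"` is that of the root line, and it equals the sum of all mode labels of the tree.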

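Define

\[ V_v = \frac{(i \nu_v)^{p_v+1}}{p_v!}\, f_{\nu_v}, \qquad g_\ell = \frac{1}{(\omega \cdot \nu_\ell)^2} \]  [14]

where the tensor $V_v$ is referred to as the node factor
of $v$ and the scalar $g_\ell$ as the propagator of the line $\ell$.
One has $|f_\nu| \le F e^{-\kappa |\nu|}$, for suitable positive constants
$F$ and $\kappa$, by the analyticity assumption. Then one
can check that the coefficients $a^{(k)}_\nu$, defined in [12],
for $\nu \neq 0$, can be expressed in terms of trees as

\[ a^{(k)}_\nu = \sum_{\theta \in \Theta^{(k)}_\nu} \mathrm{Val}(\theta), \qquad \mathrm{Val}(\theta) = \Big( \prod_{v \in V(\theta)} V_v \Big) \Big( \prod_{\ell \in L(\theta)} g_\ell \Big) \]  [15]

where $\Theta^{(k)}_\nu$ denotes the set of all inequivalent trees
with $k$ vertices and with momentum $\nu$ associated
with the root line, while the coefficients $a^{(k)}_0$ can be
fixed as $a^{(k)}_0 = 0$ for all $k \ge 1$, by the arbitrariness of the
initial phases previously remarked. The property
that $[\partial_\alpha f(\alpha(\tau'))]^{(k)}$ in [11] has zero average for all
$k \ge 1$ implies that for all lines $\ell \in L(\theta)$ one has
$g_\ell = (\omega \cdot \nu_\ell)^{-2}$ only for $\nu_\ell \neq 0$, whereas
$g_\ell = 1$ for $\nu_\ell = 0$, so that the numerical values
$\mathrm{Val}(\theta)$ are well defined for all trees $\theta$.
If $a^{(k)}_0 = 0$ for all $k \ge 1$, then
$\nu_\ell \neq 0$ for all $\ell \in L(\theta)$.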
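As a sketch of how a tree value is assembled, the code below multiplies node factors and propagators along a tree for $N = 1$, assuming the node factor $V_v = (i\nu_v)^{p_v+1} f_{\nu_v}/p_v!$ and the propagator $g_\ell = (\omega\nu_\ell)^{-2}$, and assuming that no line carries zero momentum. The tree representation (a dictionary from each vertex to the vertices whose lines enter it) and all names are assumptions made for the example.

```python
from math import factorial


def tree_value(children, mode, root_vertex, f_modes, omega):
    """Val(theta) for N = 1: product of node factors and propagators.

    children[v] lists the vertices whose exiting line enters v;
    mode[v] is the mode label of v; f_modes maps nu -> f_nu.
    Assumes every line has nonzero momentum.
    """
    value = 1.0 + 0.0j

    def visit(v):
        nonlocal value
        kids = children.get(v, [])
        momentum = mode[v] + sum(visit(w) for w in kids)
        p = len(kids)  # branching number of v
        node_factor = (1j * mode[v]) ** (p + 1) / factorial(p) * f_modes[mode[v]]
        propagator = 1.0 / (omega * momentum) ** 2
        value *= node_factor * propagator
        return momentum

    visit(root_vertex)
    return value


# Linear chain with two vertices, both with mode label 1, for f = cos.
val = tree_value({"v0": ["v1"]}, {"v0": 1, "v1": 1}, "v0",
                 {1: 0.5, -1: 0.5}, 1.0)
```

For $f(\alpha) = \cos\alpha$ and $\omega = 1$ this chain is the only tree with two vertices and root momentum 2, and its value $-i/16$ is the second-order coefficient $a^{(2)}_2$ obtained by solving [9] directly to order $\varepsilon^2$.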

The proof of [15] can be performed by induction 
on k. Alternatively, we can start from the recursive 
definition [13], whereby the trees naturally arise in 
the following way. 

Represent graphically the coefficient $a^{(k)}_\nu$ as in
Figure 1; to keep track of the labels $k$ and $\nu$, we
assign $k$ to the black bullet and $\nu$ to the line. For
$k = 1$, the black bullet is meant as a grey vertex (like
the ones appearing in Figure 3).

Figure 1 Graphical representation of $a^{(k)}_\nu$.

Figure 2 Graphical representation of the recursive equation [13].

Figure 3 An example of tree to be summed over in [15] for
$k = 39$. The labels are not explicitly shown. The momentum of
the root line is $\nu$, so that the mode labels satisfy the constraint
$\sum_{v \in V(\theta)} \nu_v = \nu$.


Then recursive equation [13] can be graphically
represented as the diagram in Figure 2, provided
that we associate with the (grey) vertex $v_0$ the
node factor $V_{v_0}$, with $\nu_{v_0} = \nu_0$ and $p_{v_0} = p$ denoting
the number of lines entering $v_0$, and with the lines
$\ell_i$, $i = 1, \ldots, p$, entering $v_0$ the momenta $\nu_{\ell_i}$,
respectively. Of course, the sums over $p$ and over the
possible assignments of the labels $\{k_i\}_{i=1}^{p}$ and
$\{\nu_i\}_{i=1}^{p}$ are understood. Each black bullet on the
right-hand side of Figure 2, together with its exiting line,
looks like the diagram on the left-hand side, so
that it represents $a^{(k_i)}_{\nu_i}$, $i = 1, \ldots, p$. Note that
Figure 2 has to be interpreted in the following
way: if one associates with the diagram as drawn
in the right-hand side a numerical value (as
described above) and one sums all the values over
the assignments of the labels, then the resulting
quantity is precisely $a^{(k)}_\nu$.

The (fundamental) difference between the black
bullets on the right- and left-hand sides is that the labels
$k_i$ of the former are strictly less than $k$, hence we can
iterate the diagrammatic decomposition simply by
expressing again each $a^{(k_i)}_{\nu_i}$ as $a^{(k)}_\nu$ in [13], and so on,
until one obtains a tree with $k$ grey vertices and no black
bullets; see Figure 3, where the labels are not explicitly
written. This corresponds to the tree expansion [15].

Any tree appearing in [15] is an example of what
physicists call a Feynman graph, while the diagrammatic
rules one has to follow in order to associate with
the tree $\theta$ its right numerical value $\mathrm{Val}(\theta)$ are usually
called the Feynman rules for the model under
consideration. Such a terminology is borrowed
from quantum field theory.


Multiscale Analysis and Clusters 


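Suppose we replace [9] with $\alpha = \varepsilon\,\partial_\alpha f(\alpha)$, so that
no small divisors appear (that is, $g_\ell = 1$ in [14]).
Then convergence is easily proved for $\varepsilon$ small
enough, since (by using the identity
$\sum_{v \in V(\theta)} p_v = k - 1$ and the inequality
$e^{-x} x^k / k! \le 1$ for all $x \in \mathbb{R}_+$ and all
$k \in \mathbb{N}$), one finds

\[ \prod_{v \in V(\theta)} |V_v| \le C_1^{k} \prod_{v \in V(\theta)} e^{-\kappa |\nu_v|/4} \prod_{v \in V(\theta)} e^{-\kappa |\nu_v|/4} \]  [16]

for a suitable positive constant $C_1$ depending only on $F$
and $\kappa$, and the sum over the mode labels can be performed
by using the exponential decay factors $e^{-\kappa |\nu_v|/4}$, while
the sum over all possible unlabeled trees gives $4^k$. In
particular, analyticity in $\varepsilon$ follows.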

Of course, the interesting case is when the
propagators are present. In such a case, even if
no division by zero occurs, as $\omega \cdot \nu_\ell \neq 0$ (by
the assumed Diophantine condition [7] and the
absence of secular terms discussed previously), the
quantities $\omega \cdot \nu_\ell$ in [14] can be very small.

Then we can introduce a scale $h$ characterizing the
size of each propagator: we say that a line $\ell$ has scale
$h_\ell = h \ge 0$ if $\omega \cdot \nu_\ell$ is of order $2^{-h} C_0$ and scale
$h_\ell = -1$ if $\omega \cdot \nu_\ell$ is greater than $C_0$ (of course, a
more formal definition can be easily envisaged, for which
the reader is referred to the original papers). Then, we
can bound $|\omega \cdot \nu_\ell| \ge 2^{-h_\ell} C_0$ for any
$\ell \in L(\theta)$, and write

\[ \prod_{\ell \in L(\theta)} |g_\ell| \le C_0^{-2k} \prod_{h=0}^{\infty} 2^{2 h N_h(\theta)} \le C_0^{-2k}\, 2^{2 h_0 k} \exp\Big( \sum_{h=h_0}^{\infty} 2 h \log 2\; N_h(\theta) \Big) \]  [17]

where $N_h(\theta)$ is the number of lines in $L(\theta)$ with scale
$h$ and $h_0$ is a (so far arbitrary) positive integer. The
problem is then reduced to that of finding an
estimate for $N_h(\theta)$.

To identify which kinds of tree are the source of
problems, we introduce the notion of a cluster and
a self-energy graph. A cluster $T$ with scale $h_T$ is a
connected set of nodes linked by a continuous
path of lines with the same scale label $h_T$ or a
lower one and which is maximal, namely all the
lines not belonging to $T$ but connected to it have
scales higher than $h_T$ and at least one line in $T$ has
scale $h_T$. An inclusion relation is established
between clusters, in such a way that the innermost
clusters are the clusters with lowest scale, and so
on. Each cluster $T$ can have an arbitrary number
of lines coming into it (entering lines), but only
one or zero lines coming out from it (exiting line):
lines which either enter or exit $T$ are called
external lines. A cluster $T$ with only one entering
line and with one exiting line, such that the two lines
carry the same momentum, will be called a self-energy
graph (SEG) or resonance. In such a case, the entering
line is called a resonant line. Examples of clusters and
SEGs are suggested by the bubbles in Figure 4; the
mode labels are not represented, whereas the
scales of the lines are explicitly written.

If $S_h(\theta)$ is the number of SEGs whose resonant
lines have scale $h$, then $N^*_h(\theta) = N_h(\theta) - S_h(\theta)$
will denote the number of nonresonant lines with
scale $h$.

Figure 4 Examples of clusters and SEGs. Note that the tree
itself is a cluster (with scale 6), and each of the two clusters with
one entering and one exiting line is a SEG only if the momenta
of its external lines are equal to each other.

Figure 5 Example of a tree whose value grows like a factorial.

A fundamental result, known as the Siegel-Bryuno
lemma, shows that, for some positive constant $c$,
one has

\[ N^*_h(\theta) \le 2^{-h/\tau}\, c \sum_{v \in V(\theta)} |\nu_v| \]  [18]

which, if inserted into [17] instead of $N_h(\theta)$, would
give a convergent series; then $h_0$ should be chosen in
such a way that the sum of the series in [17] is less
than, say, $\kappa \sum_{v \in V(\theta)} |\nu_v| / 8$.
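The choice of $h_0$ can be made concrete with a small numerical sketch. It assumes the bound in the form $N^*_h(\theta) \le c\, 2^{-h/\tau} \sum_v |\nu_v|$ and the exponent in [17] in the form $\sum_{h \ge h_0} 2 h \log 2\, N_h(\theta)$, so that, per unit of $\sum_v |\nu_v|$, one needs $\sum_{h \ge h_0} 2 h \log 2\; c\, 2^{-h/\tau} \le \kappa/8$. The function names, the truncation of the series and the sample values of $c$, $\tau$ and $\kappa$ are assumptions made for the example.

```python
import math


def tail_sum(h0, c, tau, terms=400):
    """Truncated sum over h >= h0 of 2 h log(2) c 2**(-h/tau)."""
    return sum(2 * h * math.log(2) * c * 2 ** (-h / tau)
               for h in range(h0, h0 + terms))


def choose_h0(c, tau, kappa):
    """Smallest positive integer h0 whose tail sum is at most kappa / 8."""
    h0 = 1
    while tail_sum(h0, c, tau) > kappa / 8:
        h0 += 1
    return h0


h0 = choose_h0(c=1.0, tau=1.0, kappa=1.0)
```

For $\tau = 1$ the tail has the closed form $2 \log 2\; c\,(h_0 + 1)\, 2^{1 - h_0}$, which gives $h_0 = 8$ for $c = \kappa = 1$; larger $\tau$ or $c$ push $h_0$ up, and the factor $2^{2 h_0 k}$ in [17] grows accordingly.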

The bound [18] is a very deep one, and was
originally proved by Siegel for a related problem
(Siegel's problem), in which, in the formalism
followed here, SEGs do not occur; such a bound
essentially shows that accumulation of small divisors
is possible only in the presence of SEGs. A possible
tree with $k$ vertices whose value can be proportional
to some power of $k!$ is represented in Figure 5,
where a chain of $(k - 1)/2$ SEGs, $k$ odd, is drawn
with external lines carrying a momentum $\nu$ such that
$\omega \cdot \nu \approx C_0 |\nu|^{-\tau}$.

In order to take into account the resonant lines,
we have to add a factor $(\omega \cdot \nu_\ell)^{-2}$ for each resonant
line $\ell$. It is a remarkable fact that, even if there are
trees whose value cannot be bounded as a constant
to the power $k$, there are compensations (that is,
partial cancellations) between the values of all trees
with the same number of vertices, such that the sum
of all such trees admits a bound of this kind.

The cancellations can be described graphically as
follows. Consider a tree $\theta$ with a SEG $T$. Then take
all trees which can be obtained by shifting the
external lines of $T$, that is, by attaching such lines to
all possible vertices internal to $T$, and sum together
the values of all such trees. An example is given in
Figure 6. The corresponding sum turns out to be
proportional to $(\omega \cdot \nu)^2$, if $\nu$ is the momentum of the
resonant line of $T$, and such a factor compensates
exactly the propagator of this line. The argument
above can be repeated for all SEGs: this requires a
little care because there are SEGs which are inside
some other SEGs. Again, for details and a more
formal discussion, the reader is referred to the original
papers.
The conclusion is that we can take into account
the resonant lines: this simply adds an extra constant
raised to the power $k$, so that an overall estimate $C^k$,
for some $C > 0$, holds for $U^{(k)}(t)$, and the convergence
of the series follows.


Other Examples and Applications 


The discussion carried out so far proves a version of 
the KAM theorem, for the system described by [9], 
and it is inspired by the original papers by Eliasson 
(1996) and, mostly, by Gallavotti (1994). 

Here we list some problems in which original 
results have been proved by means of the diagram- 
matic techniques described above, or by some 
variants of them. These are discussed in the 
following. 

The first generalization one can think of is the 
problem of conservation in quasi-integrable systems of 
resonant tori (that is, invariant tori whose frequency 
vectors have rationally dependent components). Even 
if most of such tori disappear as an effect of the 
perturbation, some of them are conserved as
lower-dimensional tori, which, generically, become of either
elliptic or hyperbolic or mixed type according to the
sign of $\varepsilon$ and the perturbation. With techniques
extending those described here (introducing also, in 
particular, a suitable resummation procedure for 
divergent series), this has been done by Gallavotti 
and Gentile; see Gallavotti et al. (2004) and Gallavotti 
and Gentile (2005) for an account. 

An expansion like the one considered so far can
be envisaged also for the motions occurring on the
stable and unstable manifolds of hyperbolic
lower-dimensional tori for perturbations of Hamiltonians
describing a system of rotators (as in the previous
case) plus $n$ pendulum-like systems. In such a case,
the function $G(u)$ has a less simple form. For $n = 1$,
one can look for solutions which depend on time
through two variables, $\psi = \omega t$ and $x = e^{-gt}$, with
$(\omega, g) \in \mathbb{R}^{N+1}$, and $\omega$ Diophantine as before and $g$
related to the timescale of the pendulum. This has
been worked out by Gallavotti (1994), and then
used by Gallavotti et al. (1999) to study a class of
three-timescale systems, in order to obtain a lower
bound on the homoclinic angles (i.e., the angles
between the stable and unstable manifolds of
hyperbolic tori which are preserved by the perturbation).
The formalism becomes a little more involved,
essentially because of the entries of the Wronskian
matrix appearing in [5]. In such a case, the
unperturbed solution $u_0(t)$ corresponds to the
rotators moving linearly with rotation vector $\omega$ and
the pendulum moving along its separatrix; a
nontrivial fact is that if $g_0$ denotes the Lyapunov
exponent of the pendulum in the absence of the
perturbation, then one has to look for an expansion
in $x = e^{-gt}$ with $g = g_0 + O(\varepsilon)$, because the
perturbation changes the value of such an exponent.

Figure 6 Example of SEGs whose values have to be summed together in order to produce the cancellation discussed in the text.
The mode labels are all fixed.

The same techniques have also been applied to
study the relation between the radius of convergence of
the perturbation series for the standard map, an
area-preserving diffeomorphism from the cylinder to itself
which has been widely studied in the literature since the
original papers by Greene and by Chirikov, both of which
appeared in 1979, and the arithmetical properties of the
rotation vector (which is, in this case, just a number). In
particular, it has been proved that the radius of
convergence is naturally interpolated through a
function of the rotation number known as the Bryuno
function (which was introduced by Yoccoz as
the solution of a suitable functional equation
completely independent of the dynamics); see
Berretti and Gentile (2001) for a review of results
on this and related problems.

Also the generalized Riccati equation
$\dot u - i u^2 - 2 i f(\omega t) + i \varepsilon^2 = 0$, where
$\omega \in \mathbb{T}^d$ is Diophantine and $f$
is an analytic periodic function of $\psi = \omega t$, has been
studied with the diagrammatic technique by Gentile
(2003). Such an equation is related to two-level
quantum systems (as first used by Barata), and
existence of quasiperiodic solutions of the generalized
Riccati equation for a large measure set $\mathcal{E}$ of
values of $\varepsilon$ can be exploited to prove that the
spectrum of the corresponding two-level system is
pure point for those values of $\varepsilon$; analogously, one
can prove that, for fixed $\varepsilon$, one can impose some
further nonresonance conditions on $\omega$, still leaving a
full measure set, in such a way that the spectrum is
pure point. (We note, in addition, that, technically,
such a problem is very similar to that of studying
conservation of elliptic lower-dimensional tori with
one normal frequency.)

Finally we mention a problem of partial differential
equations, where, of course, the scheme
described above has to be suitably adapted: this is
the study of periodic solutions for the nonlinear
wave equation $u_{tt} - u_{xx} + m u = \varphi(u)$, with Dirichlet
boundary conditions, where $m$ is a real parameter
(mass) and $\varphi(u)$ is a strictly nonlinear analytic odd
function. Gentile and Mastropietro (2004) reproduced
the result of Craig and Wayne for the
existence of periodic solutions for a large measure
set of periods, and, in a subsequent paper by the
same authors with Procesi (2005), an analogous
result was proved in the case $m = 0$, which had
previously remained an open problem in the
literature.


See also: Averaging Methods; Integrable Systems and 
Discrete Geometry; KAM Theory and Celestial 
Mechanics; Stability Theory and KAM. 
Definitions


The dimer model arose in the mid-twentieth century 
as an example of an exactly solvable statistical 
mechanical model in two dimensions with a phase 
transition. It is used to model a number of physical 
processes: free fermions in 1 dimension, the two- 
dimensional Ising model, and various other 
two-dimensional statistical-mechanical models at 
restricted parameter values, such as the 6- and 
8-vertex models and O(n) models. A number of 
observable quantities such as the “height function” 
and densities of motifs have been shown to have 
conformal invariance properties in the scaling limit 
(when the lattice spacing tends to zero). 

More recently, the model has also been used as an elementary
model of crystalline surfaces in $\mathbb{R}^3$.

A dimer covering, or perfect matching, of a graph 
is a set of edges (“dimers”) which covers every 
vertex exactly once. In other words, it is a pairing of 
adjacent vertices (see Figure 1a which is a dimer 
covering of an 8 x 8 grid). Dimer coverings of a grid 
are sometimes represented as domino tilings, that is, 
tilings with 2 x 1 rectangles (Figure 1b). The dimer 
model is the study of the set of dimer coverings of a 
graph. Typically, the underlying graph is taken to be 
a regular lattice in two dimensions, for example, the 
square grid or the honeycomb lattice, or a finite part 
of such a lattice. 
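For small grids the set of dimer coverings can simply be counted. The following sketch counts domino tilings of a rows-by-columns grid by filling the cells in row-major order and remembering which of the upcoming cells are already covered; the function name and the bitmask encoding are choices made for the example.

```python
from functools import lru_cache


def count_dimer_coverings(rows, cols):
    """Number of dimer coverings (domino tilings) of a rows x cols grid."""

    @lru_cache(maxsize=None)
    def fill(pos, mask):
        # pos: index of the current cell in row-major order.
        # mask: bit j is set if cell pos + j is already covered.
        if pos == rows * cols:
            return 1 if mask == 0 else 0
        if mask & 1:
            return fill(pos + 1, mask >> 1)
        r, c = divmod(pos, cols)
        total = 0
        # Horizontal dimer covering this cell and its right neighbor.
        if c + 1 < cols and not (mask & 2):
            total += fill(pos + 1, (mask >> 1) | 1)
        # Vertical dimer covering this cell and the one below it.
        if r + 1 < rows:
            total += fill(pos + 1, (mask >> 1) | (1 << (cols - 1)))
        return total

    return fill(0, 0)
```

For instance, `count_dimer_coverings(2, 2)` returns 2, and the 8 x 8 grid has 12988816 dimer coverings.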

Dimer coverings of the honeycomb graph are in
bijection with tilings of plane regions with 60°
rhombi, also known as lozenges (see Figure 2).
These tilings in turn are projections of piecewise-linear
surfaces in $\mathbb{R}^3$ composed of unit squares in
the 2-skeleton of $\mathbb{Z}^3$. So one can think of honeycomb
dimer coverings as modeling discrete surfaces
in $\mathbb{R}^3$. These surfaces are monotone in the sense
that the orthogonal projection to the plane
$P_{111} = \{(x, y, z) \mid x + y + z = 0\}$ is injective.

Figure 1 A dimer covering of a grid and the corresponding
domino tiling.

Figure 2 Honeycomb dimers (solid) and the corresponding
"lozenge" tilings (gray).


Other models related to the dimer model are: 


• The spanning tree model on planar graphs. The
set of spanning trees on a planar graph is in
bijection with the set of dimer coverings on an
associated bipartite planar graph. Conversely,
dimer coverings of a bipartite planar graph are
in bijection with directed spanning trees on an
associated graph.

• The Ising model on a planar graph with zero
external field can be modeled with dimers on an
associated planar graph.

• Plane partitions (three-dimensional versions of
integer partitions). Viewing a plane partition
along the (1, 1, 1)-direction, one sees a lozenge
tiling of the plane.

• Annihilating random walks in one dimension can
be modeled with dimers on an associated planar
graph.

• The monomer-dimer model, where one allows a
certain density of holes (monomers) in a dimer
covering. This model is unsolved at present,
although some partial results have been obtained.


Gibbs Measures 


The most general setting in which the dimer model 
can be solved is that of an arbitrary planar graph 
with energies on the edges. We define here the 
corresponding measure. 

Let $G = (V, E)$ be a graph and $M(G)$ the set of
dimer coverings of $G$. Let $\mathcal{E}$ be a real-valued
function on the edges of $G$, with $\mathcal{E}(e)$ representing
the energy associated to a dimer on the bond $e$. One
defines the energy of a dimer covering as the sum of
the energies of those bonds covered with dimers.
The partition function of the model on $(G, \mathcal{E})$ is then
the sum

\[ Z = \sum_{C \in M(G)} e^{-\mathcal{E}(C)/kT} \]

where the sum is over dimer coverings. In what
follows we will take $kT = 1$ for simplicity. Note that
$Z$ depends on both $G$ and $\mathcal{E}$.

The partition function is well defined for a finite
graph and defines the Gibbs measure, which is
by definition the probability measure $\mu = \mu_{\mathcal{E}}$ on
the set $M(G)$ of dimer coverings satisfying
$\mu(C) = (1/Z)\, e^{-\mathcal{E}(C)}$ for a covering $C$.
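On a small graph the partition function and the Gibbs measure can be computed by listing all dimer coverings. The sketch below does this by brute force, with $kT = 1$; the function names and the encoding of the graph as a dictionary from edges (pairs of vertices) to energies are assumptions made for the example, and the approach is only practical for a handful of vertices.

```python
import math


def dimer_coverings(vertices, edges):
    """Yield every dimer covering (perfect matching) as a tuple of edges."""
    if not vertices:
        yield ()
        return
    v = vertices[0]  # the first uncovered vertex must be matched
    for e in edges:
        if v in e:
            u = e[0] if e[1] == v else e[1]
            if u in vertices:
                rest = [x for x in vertices if x not in (u, v)]
                for cover in dimer_coverings(rest, edges):
                    yield (e,) + cover


def gibbs_measure(vertices, energy):
    """Return (Z, mu), where mu maps each covering to its probability."""
    covers = list(dimer_coverings(list(vertices), list(energy)))
    weights = [math.exp(-sum(energy[e] for e in cover)) for cover in covers]
    Z = sum(weights)
    return Z, {cover: w / Z for cover, w in zip(covers, weights)}


# A 4-cycle in which one edge carries energy log 2 and the others 0.
energy = {(0, 1): 0.0, (1, 2): math.log(2), (2, 3): 0.0, (3, 0): 0.0}
Z, mu = gibbs_measure([0, 1, 2, 3], energy)
```

The 4-cycle has two coverings, with weights 1 and 1/2, so that $Z = 3/2$ and the covering avoiding the costly edge has probability 2/3.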

For an infinite graph G with fixed energy function 
E, a Gibbs measure on M(G) is by definition any 
measure which is a limit of the Gibbs measures on a 
sequence of finite subgraphs which fill out G. There 
may be many Gibbs measures on an infinite graph, 
since this limit typically depends on the sequence of 
finite graphs. When $G$ is an infinite periodic graph
(and $\mathcal{E}$ is periodic as well), it is natural to consider
translation-invariant Gibbs measures; one can show
that in the case of a bipartite, periodic planar graph
the translation-invariant and ergodic Gibbs measures
form a two-parameter family (see Theorem 3
below).

For a translation-invariant Gibbs measure $\nu$ which
is a limit of Gibbs measures on an increasing
sequence of finite graphs $G_n$, one can define the
partition function per vertex of $\nu$ to be the limit

\[ Z = \lim_{n \to \infty} Z(G_n)^{1/|G_n|} \]

where $|G_n|$ is the number of vertices of $G_n$. The free
energy, or surface tension, of $\nu$ is $-\log Z$.


Combinatorics 
Partition Function 


One can compute the partition function for dimer
coverings on a finite planar graph $G$ as the Pfaffian
(square root of the determinant) of a certain
antisymmetric matrix, the Kasteleyn matrix. The
Kasteleyn matrix is an oriented adjacency matrix of
$G$, indexed by the vertices $V$: orient the edges of a
graph embedded in the plane so that each face has
an odd number of clockwise oriented edges. Then
define $K = (K_{vv'})$ with

\[ K_{vv'} = \pm e^{-\mathcal{E}(vv')} \]

if $G$ has an edge $vv'$, with a sign according to the
orientation of that edge, and $K_{vv'} = 0$ if $v, v'$ are not
adjacent. We then have the following result of
Kasteleyn:

Theorem 1 $Z = |\mathrm{Pf}(K)| = \sqrt{|\det K|}$.
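Kasteleyn's formula can be tried out on a rectangular grid with all energies equal to zero. The sketch below uses one standard choice of orientation, assumed here for the example: horizontal edges point to the right, and vertical edges point in a direction that alternates with the column, so that each square face has an odd number of clockwise edges. The determinant is computed exactly with rational arithmetic; all function names are illustrative.

```python
from fractions import Fraction
from math import isqrt


def kasteleyn_matrix(rows, cols):
    """Antisymmetric Kasteleyn matrix of a rows x cols grid, unit weights."""
    n = rows * cols
    K = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            u = r * cols + c
            if c + 1 < cols:  # horizontal edge, oriented left to right
                v = u + 1
                K[u][v], K[v][u] = 1, -1
            if r + 1 < rows:  # vertical edge, sign alternating with column
                v = u + cols
                s = 1 if c % 2 == 0 else -1
                K[u][v], K[v][u] = s, -s
    return K


def determinant(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    det = Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if A[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            A[i], A[pivot] = A[pivot], A[i]
            det = -det
        det *= A[i][i]
        for r in range(i + 1, n):
            factor = A[r][i] / A[i][i]
            if factor != 0:
                for c in range(i, n):
                    A[r][c] -= factor * A[i][c]
    return det


def partition_function(rows, cols):
    """Z = sqrt(|det K|), the number of dimer coverings at zero energy."""
    return isqrt(int(abs(determinant(kasteleyn_matrix(rows, cols)))))
```

With zero energies $Z$ is just the number of dimer coverings, so `partition_function(4, 4)` should agree with a direct count of the domino tilings of the 4 x 4 grid, which is 36.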


Here Pf(K) denotes the Pfaffian of K. 

Such an orientation of edges (which always exists 
for planar graphs) is called a Kasteleyn orientation; 
any two such orientations can be obtained from one 
another by a sequence of operations consisting of 
reversing the orientations of all edges at a vertex. 

If G is a bipartite graph, that is, the vertices can 
be colored black and white with no neighbors 
having the same color, then the Pfaffian of K is the 
determinant of the submatrix whose rows index the 
white vertices and columns index the black vertices. 
For bipartite graphs, instead of orienting the edges
one can alternatively multiply the edge weights by a
complex number of modulus 1, with the condition
that the alternating product around each face (the
first, divided by the second, times the third, and so on)
is real and negative.

For nonplanar graphs, one can compute the
partition function as a sum of Pfaffians; for a
graph embedded on a surface of Euler characteristic
$\chi$, this requires in general $2^{2-\chi}$ Pfaffians.


Local Statistics 


The inverse of the Kasteleyn matrix can be used to
compute the local statistics, that is, the probability that
a given set of edges occurs in a random dimer covering
(random with respect to the Gibbs measure $\mu$).

Theorem 2 Let $S = \{(v_1, v_2), \ldots, (v_{2k-1}, v_{2k})\}$ be a
set of edges of $G$. The probability that all these
edges occur in a $\mu$-random covering is

\[ \Pr(S) = \Big| \Big( \prod_{i=1}^{k} K_{v_{2i-1} v_{2i}} \Big)\, \mathrm{Pf}_{2k \times 2k} \big( (K^{-1})_{v_i v_j} \big) \Big| \]

Again, for bipartite graphs the Pfaffian can be
made into a determinant.


Heights 


Bipartite graphs Suppose $G$ is a bipartite planar
graph. A 1-form on $G$ is simply a function on the set
of oriented edges which is antisymmetric with respect
to reversing the edge orientation: $f(-e) = -f(e)$ for
an edge $e$. A 1-form can be identified with a flow:
just flow by $f(e)$ along oriented edge $e$. The
divergence of the flow $f$ is then $d^* f$. Let $\Omega$ be the
space of flows on edges of $G$, with divergence 1 at
each white vertex and divergence $-1$ at each black
vertex, and such that the flow along each edge from
white to black is in $[0, 1]$. From a dimer covering $M$
one can construct such a flow $\omega(M) \in \Omega$: just flow
one unit along each dimer, and zero on the remaining
edges. The set $\Omega$ is a convex polyhedron in $\mathbb{R}^E$ and its
vertices can be seen to be exactly the dimer coverings.

Given any two flows $\omega_1, \omega_2 \in \Omega$, their difference is
a divergence-free flow. Its dual $(\omega_1 - \omega_2)^*$ (or
conjugate flow) defined on the planar dual of $G$ is
therefore the gradient of a function $h$ on the faces of
$G$, that is, $(\omega_1 - \omega_2)^* = dh$, where $h$ is well defined
up to an additive constant.

When $\omega_1$ and $\omega_2$ come from dimer coverings, $h$ is
integer valued, and is called the height difference of
the coverings. The level sets of the function $h$ are
just the cycles formed by the union of the two
matchings. If we fix a "base point" covering $\omega_0$ and
a face $f_0$ of $G$, we can then define the height
function of any dimer covering (with flow $\omega$) to be
the function $h$ with value zero at $f_0$ and which
satisfies $dh = (\omega - \omega_0)^*$.


Nonbipartite graphs On a nonbipartite planar
graph the height function can be similarly defined
modulo 2. Fix a base covering ω_0; for any other
covering ω, the superposition of ω_0 and ω is a set of
cycles and doubled edges of G; the function h is
constant on the complementary components of these
cycles and changes by 1 mod 2 across each cycle.
We can think of the height modulo 2 as taking two
values, or spins, on the faces of G, and the dimer
chains are the spin-domain boundaries. In particular,
dimers on a nonbipartite graph can in this
way model the Ising model on an associated dual
planar graph.
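
The decomposition of a superposition into doubled edges and cycles can be sketched in a few lines. The representation below (matchings as collections of vertex pairs) and the function name `superpose` are illustrative assumptions, not notation from the text.

```python
def superpose(matching_a, matching_b):
    """Split the union of two perfect matchings into doubled edges and cycles.

    Each matching is a collection of 2-tuples of hashable vertices covering the
    same vertex set. Returns (doubled_edges, cycles); each cycle is a vertex list.
    """
    a = {frozenset(e) for e in matching_a}
    b = {frozenset(e) for e in matching_b}
    doubled = a & b
    # Partner of each vertex along the edges used by only one of the matchings.
    partner_a = {v: next(iter(e - {v})) for e in a - b for v in e}
    partner_b = {v: next(iter(e - {v})) for e in b - a for v in e}
    cycles, seen = [], set()
    for start in partner_a:
        if start in seen:
            continue
        cycle, v, use_a = [], start, True
        while v not in seen:  # alternate between the two matchings until we return
            seen.add(v)
            cycle.append(v)
            v = partner_a[v] if use_a else partner_b[v]
            use_a = not use_a
        cycles.append(cycle)
    return doubled, cycles


# The two coverings of a single square superpose to one 4-cycle.
horizontal = [((0, 0), (0, 1)), ((1, 0), (1, 1))]
vertical = [((0, 0), (1, 0)), ((0, 1), (1, 1))]
doubled, cycles = superpose(horizontal, vertical)
```

The height (or the height modulo 2 in the nonbipartite case) is constant off the returned cycles and jumps across each of them.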


Thermodynamic Limit 


By periodic planar graph we mean a graph G, with
energy function on edges, for which translations by
elements of Z^2 or some other rank-2 lattice Γ ⊂ R^2
are isomorphisms of G preserving the edge energies,
and such that the quotient G/Z^2 is a finite graph.
Without loss of generality we can take Γ = Z^2. The
standard example is G = Z^2 with ℰ = 0, which we
refer to as “dimers on the grid.” However, other
examples display different global behaviors and so it
is worthwhile to remain in this generality.

For a periodic planar graph G, an ergodic
probability measure on M(G) is one which is
translation invariant (the measure of a set is the
same as any Z^2-translate of that set) and whose
invariant subsets have measure 0 or 1.

We will be interested in probability measures
which are both ergodic and Gibbs (we refer to them
as ergodic Gibbs measures, dropping the term
“probability”). When G is bipartite, there are
multiple ergodic Gibbs measures (see Theorem 3
below). When G is nonbipartite, it is conjectured
that there is a single ergodic Gibbs measure.

In the remainder of this section we assume that G
is bipartite, and assume also that the Z^2-action
preserves the coloring of the vertices as black and
white (simply pass to an index-2 sublattice if not).

For integer n > 0 let G_n = G/nZ^2, a finite graph
on a torus (in other words, with periodic boundary
conditions). For a dimer covering M of G_n, we
define (h_x, h_y) ∈ Z^2 to be the horizontal and vertical
height change of M around the torus, that is, the net
flux of ω(M) − ω_0 across a horizontal, respectively
vertical, cut around the torus (in other words, h_x, h_y
are the horizontal and vertical periods around the
torus of the 1-form ω(M) − ω_0). The characteristic
polynomial P(z, w) of G is by definition


\[
P(z, w) = \sum_{M \in \mathcal{M}(G_1)} e^{-\mathcal{E}(M)}\, z^{h_x} w^{h_y} (-1)^{h_x h_y}
\]


here the sum is over dimer coverings M of
G_1 = G/Z^2, and h_x, h_y depend on M. The polynomial
P depends on the base point ω_0 only by a
multiplicative factor involving a power of z and w.
From this polynomial most of the large-scale
behavior of the ergodic Gibbs measures can be
extracted.

The Gibbs measure on G_n converges as n → ∞ to
the (unique) ergodic Gibbs measure μ with smallest
free energy F = −log Z. The unicity of this measure
follows from the strict convexity of the free energy
of ergodic Gibbs measures as a function of the slope,
see below. The free energy F of the minimal free
energy measure is


\[
F = -\frac{1}{(2\pi i)^2} \int_{S^1 \times S^1} \log P(z, w)\, \frac{dz}{z}\, \frac{dw}{w},
\]


that is, minus the Mahler measure of P. 

For any translation-invariant measure ν on M(G),
the average slope (s,t) of the height function for
ν-almost every tiling is by definition the expected
horizontal and vertical height change over one
fundamental domain, that is, s = E[h(f + (1,0)) − h(f)]
and t = E[h(f + (0,1)) − h(f)] where f is any
face. This quantity (s,t) lies in the Newton polygon
N(P) of P(z,w) (the convex hull in R^2 of the set of
exponents of monomials of P). In fact, the points in
the Newton polygon are in bijection with the
ergodic Gibbs measures on M(G):


Theorem 3 When G is a periodic bipartite planar
graph, any ergodic Gibbs measure has average slope
(s,t) lying in N(P). Moreover, for every point (s,t) ∈
N(P) there is a unique ergodic Gibbs measure μ(s,t)
with that average slope.




In particular, this gives a complete description of
the set of all ergodic Gibbs measures. The ergodic
Gibbs measure μ(s,t) of slope (s,t) can be obtained
as the limit of the Gibbs measures on G_n, when one
conditions the configurations to have a particular
slope approximating (s,t).


Ronkin Function and Surface Tension 


The Ronkin function of P is a map R: R^2 → R
defined for (B_x, B_y) ∈ R^2 by

\[
R(B_x, B_y) = \frac{1}{(2\pi i)^2} \int_{S^1 \times S^1} \log P(z e^{B_x}, w e^{B_y})\, \frac{dz}{z}\, \frac{dw}{w}.
\]


The Ronkin function is convex and its graph is
piecewise linear on the complement of the amoeba
A(P) of P, which is the image of the zero set {(z,w) ∈
C^2 | P(z,w) = 0} under the map (z,w) ↦ (log |z|,
log |w|) (see Figures 3 and 4 for an example).
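
The piecewise-linear behavior off the amoeba can be observed numerically. The sketch below uses the polynomial P(z,w) = 5 + z + 1/z + w + 1/w from the figure captions and approximates the Ronkin function by a midpoint rule; it assumes the integrand is read as log |P| (the real part of log P), and the helper names are hypothetical.

```python
import cmath
import math


def P(z, w):
    """Characteristic polynomial of the square-octagon example."""
    return 5 + z + 1 / z + w + 1 / w


def ronkin(bx, by, n=64):
    """Midpoint-rule average of log|P| over the torus |z| = e^bx, |w| = e^by."""
    total = 0.0
    for j in range(n):
        z = cmath.exp(complex(bx, 2 * math.pi * (j + 0.5) / n))
        for k in range(n):
            w = cmath.exp(complex(by, 2 * math.pi * (k + 0.5) / n))
            total += math.log(abs(P(z, w)))
    return total / n**2


# Near the origin the constant term 5 dominates: R is constant there (gradient (0,0)).
flat_gap = abs(ronkin(0.0, 0.0) - ronkin(0.1, 0.05))
# For large B_x the monomial z dominates: R(B_x, B_y) = B_x there (gradient (1,0)).
linear_gap = abs(ronkin(3.0, 0.0) - 3.0)
```

Both gaps should vanish up to rounding, matching the statement that R is linear on each complementary component of the amoeba, with integer gradient.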

The free energy F(μ(s,t)) of μ(s,t), as a function of
(s,t) ∈ N(P), is the Legendre dual of the Ronkin
function of P(z, w): we have

\[
F(\mu(s,t)) = -R(B_x, B_y) + s B_x + t B_y,
\]

where

\[
s = \frac{\partial R(B_x, B_y)}{\partial B_x}, \qquad t = \frac{\partial R(B_x, B_y)}{\partial B_y}.
\]


The continuous map ∇R: R^2 → N(P) which takes
(B_x, B_y) to (s,t) is injective on the interior of A(P),
collapses each bounded complementary component of
A(P) to an integer point in the interior of N(P), and
collapses each unbounded complementary component
of A(P) to an integer point on the boundary of N(P).

Under the Legendre duality, the facets in the 
graph of the Ronkin function (i.e., maximal regions 








Figure 3 The amoeba of P(z,w) = 5 + z + 1/z + w + 1/w,
which is the characteristic polynomial for dimers on the periodic
“square-octagon” lattice.




Figure 4 Minus the Ronkin function of P(z,w)=5+z+1/z 
+w+1/w. 





Figure 5 (Negative of) the free energy for dimers on the 
square-octagon lattice. 


on which R is linear) give points of nondifferentiability
of the free energy F, as defined on N(P). We
refer to these points of nondifferentiability as
“cusps.” Cusps occur only at integer slopes (s,t)
(see Figure 5 for the free energy associated to the
Ronkin function in Figure 4).

By Theorem 3, the coordinates (B_x, B_y) can also
be used to parametrize the set of Gibbs measures
μ(s,t) (but only those with slope (s,t) in the interior
of N(P) or on the corners of N(P) and boundary
integer points). This parametrization is not one-to-one
since when (B_x, B_y) varies in a complementary
component of the amoeba, the measure μ(s,t) does
not change. On the interior of the amoeba the
parametrization is one-to-one.

The remaining Gibbs measures, whose slopes are 
on the boundary of N(P), can be obtained by taking 
limits of (Bx, By) along the “tentacles” of the amoeba. 


Phases 


The Gibbs measures μ(s,t) can be partitioned into
three classes, or phases, according to the behavior of
the fluctuations of the height function. If we
measure the height at two distant points x_1 and x_2
in G, the average height difference, E[h(x_1) − h(x_2)],
is a linear function of x_1 − x_2 determined by the
average slope of the measure. The height fluctuation
is defined to be the random variable h(x_1) − h(x_2) −
E[h(x_1) − h(x_2)]. This random variable depends on
the two points and we are interested in its behavior
when x_1 and x_2 are far apart.
We say μ(s,t) is


1. “Frozen” if the height fluctuations are bounded 
almost surely. 

2. “Rough” (or “liquid”) if the covariance in the
height function E[h(x_1)h(x_2)] − E[h(x_1)]E[h(x_2)]
is unbounded as |x_1 − x_2| → ∞.

3. “Smooth” (or “gaseous”) if the covariance of the 
height function is bounded but the height 
fluctuations are unbounded. 


The height fluctuations can be related to the decay
of the entries of K^{-1}, which are in turn related to the
decay of the Fourier coefficients of 1/P. In particular,
we have


Theorem 4 The measure μ(s,t) is respectively
frozen, rough, or smooth according to whether
(B_x, B_y) = (∇R)^{-1}(s,t) is in the closure of an
unbounded complementary component of A(P), in
the interior of A(P), or in the closure of a bounded
complementary component of A(P).
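
For the example P(z,w) = 5 + z + 1/z + w + 1/w, part of this classification can be sketched with a simple sufficient test: if one monomial is larger in modulus than the sum of all the others on the torus over (B_x, B_y), then P has no zero there, so the point lies outside the amoeba, in the component whose gradient is the exponent of that monomial. This dominance test is an assumption of the sketch, not a criterion from the text; it never certifies the rough phase and may fail to detect points of a complementary component.

```python
import math

# Monomials of P(z, w) = 5 + z + 1/z + w + 1/w as {exponent: coefficient}.
MONOMIALS = {(0, 0): 5.0, (1, 0): 1.0, (-1, 0): 1.0, (0, 1): 1.0, (0, -1): 1.0}
# Integer points strictly inside the Newton polygon (here only the origin).
INTERIOR_EXPONENTS = {(0, 0)}


def classify(bx, by):
    """Return 'frozen', 'smooth' or 'undetermined' for the point (bx, by)."""
    sizes = {e: abs(c) * math.exp(e[0] * bx + e[1] * by)
             for e, c in MONOMIALS.items()}
    total = sum(sizes.values())
    for exponent, size in sizes.items():
        if size > total - size:  # this monomial dominates all the others combined
            # Interior exponent: bounded component; boundary exponent: unbounded one.
            return "smooth" if exponent in INTERIOR_EXPONENTS else "frozen"
    return "undetermined"
```

With this polynomial, `classify(0, 0)` reports the smooth phase of slope (0,0), `classify(3, 0)` a frozen phase of slope (1,0), and a point such as (1.5, 0) is left undetermined by the test.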


The characteristic polynomials P which occur in
the dimer model are not arbitrary: their algebraic
curves {P = 0} are all of a special type known as
Harnack curves, which are characterized by the fact
that the map from the zero-set of P in C^2 to its
amoeba in R^2 is at most two-to-one. In fact:


Theorem 5 By varying the edge energies all 
Harnack curves can be obtained as the characteristic 
polynomial of a planar dimer model. 


Local Statistics 


In the thermodynamic limit (on a periodic planar
graph), local statistics of dimer coverings for the Gibbs
measure of minimal free energy can be obtained from
the limit of the inverse of the Kasteleyn matrix on the
finite toroidal graphs G_n. This in turn can be
computed from the Fourier coefficients of 1/P.

As an example, let G be the square grid Z^2 and take
ℰ = 0 (which corresponds to the uniform measure on
configurations for finite graphs). An appropriate
choice of signs for the Kasteleyn matrix is to put
weights 1, −1 on alternate horizontal edges and i, −i
on alternate vertical edges in such a way that around
each white vertex the weights are cyclically 1, i, −1,
−i. For this choice of signs we have


\[
K^{-1}_{(0,0),(x,y)} = \frac{1}{(2\pi)^2} \int_0^{2\pi}\!\!\int_0^{2\pi} \frac{e^{i(x\theta - y\phi)}}{2\sin\theta + 2i\sin\phi}\, d\theta\, d\phi.
\]

This integral can be evaluated explicitly (see Figure 6
for values of K^{-1}_{(0,0),(x,y)} near the origin); by
translation invariance K^{-1}_{(x',y'),(x+x',y+y')} = K^{-1}_{(0,0),(x,y)},
and values in other quadrants can be obtained by
K^{-1}_{(0,0),(y,−x)} = −i K^{-1}_{(0,0),(x,y)}.

Figure 6 Values of K^{-1} on Z^2 with zero energies.

As a sample computation, using Theorem 2, the
probability that the dimer covering the origin points
to the right and, simultaneously, the one covering
(0,1) points upwards is

\[
K_{(0,0),(1,0)}\, K_{(0,2),(0,1)}\, \det \begin{pmatrix} K^{-1}_{(0,0),(1,0)} & K^{-1}_{(0,0),(0,1)} \\ K^{-1}_{(0,2),(1,0)} & K^{-1}_{(0,2),(0,1)} \end{pmatrix} = \frac{1}{4\pi}.
\]

Another computation which follows is the decay
of the edge covariances. If e_1, e_2 are two edges at
distance d, then Pr(e_1 & e_2) − Pr(e_1)Pr(e_2) decays
quadratically in 1/d, since K^{-1}((0,0),(x,y)) decays
like 1/(|x| + |y|).
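
The value 1/(4π) for this two-edge probability can be checked numerically. The sketch below is a hedged illustration: it assumes the integrand e^{i(xθ − yφ)} / (2 sin θ + 2i sin φ) for K^{-1}_{(0,0),(x,y)}, Kasteleyn weights 1 (rightwards from the origin) and −i (downwards from (0,2)), and a midpoint rule whose grid avoids the singular points of the integrand; the function name is hypothetical.

```python
import cmath
import math


def inverse_kasteleyn(x, y, n=200):
    """Midpoint-rule value of K^{-1}_{(0,0),(x,y)} on an n x n grid (n even)."""
    total = 0j
    step = 2 * math.pi / n
    for j in range(n):
        theta = (j + 0.5) * step
        for k in range(n):
            phi = (k + 0.5) * step
            total += cmath.exp(1j * (x * theta - y * phi)) / (
                2 * math.sin(theta) + 2j * math.sin(phi))
    return total / n**2


# Assumed Kasteleyn weights of the two edges (0,0)-(1,0) and (0,2)-(0,1).
k_right, k_down = 1, -1j
# Entries for the second row use translation invariance: (0,2) is moved to the origin.
minor = [[inverse_kasteleyn(1, 0), inverse_kasteleyn(0, 1)],
         [inverse_kasteleyn(1, -2), inverse_kasteleyn(0, -1)]]
determinant = minor[0][0] * minor[1][1] - minor[0][1] * minor[1][0]
probability = k_right * k_down * determinant
```

Under these assumptions `probability` should agree with 1/(4π) ≈ 0.0796 to several digits, slightly above the value 1/16 that two independent edges of density 1/4 would give.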


Scaling Limits 


The scaling limit of the dimer model is the limit 
when the lattice spacing tends to zero. 

Let us define the scaling limit in the following
way. Let εZ^2 be the square grid scaled by ε, so the
lattice mesh size is ε. Fix a Jordan domain U ⊂ R^2
and consider for each ε a subgraph U_ε of εZ^2,
bounded by a simple polygon, which tends to U as
ε → 0. We are interested in limiting properties of
random dimer coverings of U_ε, in the limit as ε → 0,
for example, the fluctuations of the height function
and edge densities.




The limit depends on the (sequence of) boundary
conditions, that is, on the exact choice of approximating
regions U_ε. By changing U_ε one can change
the limiting rescaled height function along the
boundary. It is conjectured that the limit of the
height function along the boundary of U_ε (scaled by
ε, and assuming this limit exists) determines
essentially all of the limiting behavior in the interior,
in particular the limiting local statistics.

Therefore, let u be a real-valued continuous
function on the boundary of U. Consider a sequence
of subgraphs U_ε of εZ^2, as ε → 0 as above, and
whose height function along the boundary, when
scaled by ε, is approximating u. We discuss the limit
of the model in this setting.


Crystalline Surfaces 


The height function allows us to view dimer coverings
as random surfaces in R^3: to a dimer covering
of G, one associates the graph of its height function,
extended in a piecewise linear fashion over the edges
and faces of the dual G*. These surfaces are then
piecewise linear random surfaces, which resemble
crystal surfaces in the sense that microscopically (on
the scale of the lattice) they are rough, whereas their
long-range behavior is smooth and facetted, as we
now describe.

In the scaling limit, boundary conditions as 
described in the last paragraph of the previous 
section are referred to as “wire-frame” boundary 
conditions, since the graph of the height function 
can be thought of as a (random) surface spanning 
the wire frame defined by its boundary values. 

In the scaling limit, there is a law of large
numbers which says that the Gibbs measure on
random surfaces (which is unique since we are
dealing with a finite graph) concentrates, for fixed
wire-frame boundary conditions, on a single surface
S_0. That is, as the lattice spacing ε tends to zero,
with probability tending to 1 the random surface lies
close to a limiting surface S_0. The surface S_0 is the
unique surface which minimizes the total surface
tension, or free energy, for its fixed boundary values,
that is, minimizes the integral over the surface of
F(μ(s,t)), where (s,t) is the slope of the surface at
the point being integrated over. Existence and
unicity of the minimizer follow from the strict
convexity of the free energy/surface tension as a
function of the slope.

At a point where the free energy has a cusp, the
crystal surface S_0 will in general have a facet, that is,
a region on which it is linear. Outside of the facets,
one expects that S_0 is analytic, since the free energy
is analytic outside the cusps.


Fluctuations 


While the scaled height function εh in the scaling
limit converges to its mean value h_0 (whose graph is
the surface S_0), the fluctuations of the unrescaled
height function h − (1/ε)h_0 will converge in law to a
random process on U.

In the simplest setting, that of honeycomb dimers
with ℰ = 0, and in the absence of facets, the height
fluctuations converge to a continuous Gaussian
process, the image of the Gaussian free field on the
unit disk D under a certain diffeomorphism Φ
(depending on h_0) of D to U.

In the particular case h_0 = 0, Φ is the Riemann map
from D to U and the law of the height fluctuations
is just the Gaussian free field on U (defined to be
the Gaussian process whose covariance kernel is
the Dirichlet Green’s function). The conformal
invariance of the Gaussian free field is the basis for
a number of conformal invariance properties of the
honeycomb dimer model.


Densities of Motifs 


Another observable of interest is the density field of a
motif. A motif is a finite collection of edges, taken up
to translation. For example, consider, for the square
grid, the “L” motif consisting of a horizontal domino
and a vertical domino aligned to form an “L,” which
we showed above to have a density 1/(4π) in the
thermodynamic limit. The probability of seeing this
motif at any given place is 1/(4π). However, in the
scaling limit one can ask about the fluctuations of the
occurrences of this motif: in a large ball around a
point x, what is the distribution of N_L − A/(4π), where
N_L is the number of occurrences of the motif, and A is
the area of the ball? These fluctuations form a
random field, since there is a long-range correlation
between occurrences of the motif.

It is known that on Z^2, for the minimal free energy
ergodic Gibbs measure, the rescaled density field

\[
\frac{1}{\sqrt{A}} \Bigl( N_L - \frac{A}{4\pi} \Bigr)
\]

converges as ε → 0 weakly to a Gaussian random field
which is a linear combination of a directional
derivative of the Gaussian free field and an independent
white noise. A similar result holds for other motifs.

The joint distribution of densities of several motifs 
can also be shown to be Gaussian. 


See also: Combinatorics: Overview; Determinantal 
Random Fields; Growth Processes in Random Matrix 
Theory; Statistical Mechanics and Combinatorial 
Problems; Statistical Mechanics of Interfaces. 
Introduction


In this article we describe some recent results (Finster
et al. 1999a,b, 2000a-c, 2002a) concerning the
existence of both particle-like and black hole
solutions of the coupled Einstein-Dirac-Yang-Mills
(EDYM) equations. We show that there are stable
globally defined static, spherically symmetric solutions.
We also show that for static black hole
solutions, the Dirac wave function must vanish
identically outside the event horizon. The latter result
indicates that the Dirac particle (fermion) must either
enter the black hole or tend to infinity.

The plan of the article is as follows. The next 
section describes the background material. It is 
followed by a discussion of the coupled EDYM 
equations for static, spherically symmetric particle- 
like and black hole solutions. The final section of 
the article is devoted to a discussion of these results. 


Background Material 
Einstein’s Equations 


We begin by describing the Einstein equation for the 
gravitational field (for more details, see, e.g., Adler 
et al. (1975)). We first note Einstein’s hypotheses of 
general relativity (GR): 


(E1) The gravitational field is the metric g_ij in 3 + 1
spacetime dimensions. The metric is assumed to
be symmetric.

(E2) At each point in spacetime, the metric can be 
diagonalized as diag(—1,1,1,1). 

(E3) The equations which describe the gravitational 
field should be covariant; that is, independent 
of the choice of coordinate system. 


The hypothesis (E1) is Einstein’s brilliant insight, 
whereby he “geometrizes” the gravitational field. 
(E2) means that there are inertial frames at each 
point (but not globally), and guarantees that special 
relativity (SR) is included in GR, while (E3) implies 


that the gravitational field equations must be tensor 
equations; that is, coordinates are an artifact, and 
physics should not depend on the choice of 
coordinates. 


Einstein’s Equations of GR 


The metric g_ij = g_ij(x), i, j = 0, 1, 2, 3, x = (x^0, x^1, x^2, x^3),
x^0 = ct (c = speed of light, t = time), is the metric tensor
defined on four-dimensional spacetime. Einstein’s
equations are ten (tensor) equations for the unknown
metric g_ij (gravitational field), and take the form

\[
R_{ij} - \tfrac{1}{2} R\, g_{ij} = \sigma T_{ij} \qquad [1]
\]


where the left-hand side G_ij = R_ij − (1/2) R g_ij is the
Einstein tensor and depends only on the geometry,
σ = 8πG/c^4, where G is Newton’s gravitational
constant, while T_ij, the energy-momentum tensor,
represents the source of the gravitational field, and
encodes the distribution of matter. (The word
“matter” in GR refers to everything which can
produce a gravitational field, including elementary
particles, electromagnetic or Yang-Mills (YM) fields.)
From the Bianchi identities in geometry (cf. Adler
et al. (1975)), the (covariant) divergence of the
Einstein tensor, G_ij, vanishes identically, namely

\[
G_{i\ ;j}^{\ j} = 0
\]

so, on solutions of Einstein’s equations,

\[
T_{i\ ;j}^{\ j} = 0
\]

and this in turn expresses the conservation of energy
and momentum. The quantities which comprise the
Einstein tensor are given as follows: first, from
the metric tensor g_ij, we form the Levi-Civita
connection Γ^k_ij defined by:

\[
\Gamma^{k}_{ij} = \frac{1}{2}\, g^{k\ell} \Bigl( \frac{\partial g_{\ell j}}{\partial x^{i}} + \frac{\partial g_{i\ell}}{\partial x^{j}} - \frac{\partial g_{ij}}{\partial x^{\ell}} \Bigr)
\]

where (4 × 4 matrix) [g^{kℓ}] = [g_{kℓ}]^{-1}, and summation
convention is employed; namely, an index which
appears as both a subscript and a superscript is to
be summed from 0 to 3. With the aid of Γ^k_ij, we can
construct the celebrated Riemann curvature tensor
R^i_{qkℓ}:

\[
R^{i}_{\,qk\ell} = \frac{\partial \Gamma^{i}_{q\ell}}{\partial x^{k}} - \frac{\partial \Gamma^{i}_{qk}}{\partial x^{\ell}} + \Gamma^{i}_{pk} \Gamma^{p}_{q\ell} - \Gamma^{i}_{p\ell} \Gamma^{p}_{qk}
\]

Finally, the terms R_ij and R which appear in the
Einstein tensor G_ij are given by

\[
R_{ij} = R^{s}_{\,isj}
\]

(the Ricci tensor), and

\[
R = g^{ij} R_{ij}
\]

is the scalar curvature.
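
The formula for the Levi-Civita connection can be tried out numerically. The sketch below is restricted, as an assumption for brevity, to diagonal metrics and approximates the derivatives by central differences; it is tested on flat spacetime written in spherical coordinates (t, r, θ, φ), and all function names are hypothetical.

```python
import math


def christoffel(metric_diag, x, k, i, j, h=1e-5):
    """Gamma^k_{ij} at the point x for a diagonal metric, by central differences.

    metric_diag(x) returns the four diagonal entries g_00, ..., g_33 at x.
    """
    def g(a, b, point):
        return metric_diag(point)[a] if a == b else 0.0

    def dg(a, b, axis):
        # Central difference of g_ab along the coordinate x^axis.
        up, down = list(x), list(x)
        up[axis] += h
        down[axis] -= h
        return (g(a, b, up) - g(a, b, down)) / (2 * h)

    inverse_kk = 1.0 / metric_diag(x)[k]  # g^{kk} for a diagonal metric
    return 0.5 * inverse_kk * (dg(k, j, i) + dg(i, k, j) - dg(i, j, k))


def flat_spherical(x):
    """Flat metric diag(-1, 1, r^2, r^2 sin^2(theta)) in coordinates (t, r, theta, phi)."""
    _, r, theta, _ = x
    return [-1.0, 1.0, r**2, (r * math.sin(theta)) ** 2]


point = (0.0, 2.0, 1.0, 0.0)
gamma_r_theta_theta = christoffel(flat_spherical, point, 1, 2, 2)  # expected -r
gamma_theta_r_theta = christoffel(flat_spherical, point, 2, 1, 2)  # expected 1/r
```

Even in flat space the connection coefficients are nonzero in curvilinear coordinates; it is the curvature tensor built from them that vanishes.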

From the above definitions, one sees at once the 
enormous complexity of the Einstein equations. For 
this reason, one usually seeks solutions which have a 
high degree of symmetry, and in what follows, in this 
section, we shall only consider static, spherically 
symmetric solutions; that is, solutions which depend 


only on r = |x| = √((x^1)^2 + (x^2)^2 + (x^3)^2). In this case,
the metric g_ij takes the form

\[
ds^2 = -T(r)^{-2}\, dt^2 + A(r)^{-1}\, dr^2 + r^2\, d\Omega^2 \qquad [2]
\]

where dΩ^2 = dθ^2 + sin^2 θ dφ^2 is the standard metric
on the unit 2-sphere, r, θ, φ are the usual spherical
coordinates, and t denotes time.


Black Hole Solutions 


Consider the problem of finding the gravitational
field outside a ball of mass M in R^3; that is, there is
no matter exterior to the ball. Solving Einstein’s
equations G_ij = 0 gives the famous Schwarzschild
solution (1916):

\[
ds^2 = -\Bigl(1 - \frac{2m}{r}\Bigr) c^2\, dt^2 + \Bigl(1 - \frac{2m}{r}\Bigr)^{-1} dr^2 + r^2\, d\Omega^2 \qquad [3]
\]

where m = GM/c^2. Since 2m has the dimensions of
length, it is called the Schwarzschild radius. Observe
that when r = 2m, the metric is singular; namely,
g_tt = 0 and g_rr = ∞. By transforming the metric [3] to
the so-called Kruskal coordinates (cf. Adler et al.
(1975)), one observes that the Schwarzschild sphere
r = 2m has the physical characteristics of a black hole:
light and nearby particles can enter the region r < 2m,
nothing can exit this region, and there is an intrinsic
(nonremovable) singularity at the center r = 0.
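
To get a feeling for the size of the Schwarzschild radius 2m = 2GM/c^2, here is a short arithmetic sketch. The rounded constants and the solar mass are illustrative inputs, not values taken from the text.

```python
G_NEWTON = 6.674e-11   # Newton's constant, m^3 kg^-1 s^-2 (rounded)
LIGHT_SPEED = 2.998e8  # speed of light, m / s (rounded)
SOLAR_MASS = 1.989e30  # kg, illustrative input


def schwarzschild_radius(mass_kg):
    """Return 2m = 2 G M / c^2 in metres."""
    return 2 * G_NEWTON * mass_kg / LIGHT_SPEED**2


def metric_factor(r, mass_kg):
    """The factor 1 - 2m/r that multiplies c^2 dt^2 in the Schwarzschild metric."""
    return 1 - schwarzschild_radius(mass_kg) / r


radius_sun = schwarzschild_radius(SOLAR_MASS)  # roughly three kilometres
```

For a body of one solar mass the horizon radius is a few kilometres, far inside the actual star, which is why the exterior solution alone is relevant for ordinary stars.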

For the general metric [2], we define a black hole
solution of Einstein’s equations to be a solution
which satisfies, for some ρ > 0,

\[
A(\rho) = 0, \qquad A(r) > 0 \ \ \text{if } r > \rho
\]

ρ is called the radius of the black hole, or the event
horizon.


Yang-Mills Equations 


The YM equations generalize Maxwell’s equations. 
To see how this comes about, we first write 
Maxwell’s equations in an invariant way. Thus, let 
A denote a scalar-valued 1-form:

\[
A = A_i\, dx^i, \qquad A_i \in \mathbb{R}
\]

which is called the electromagnetic potential (by 
physicists), or a connection (by geometers). The 
electromagnetic field (curvature) is the 2-form 


F=dA 


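In local coordinates,

\[
F = \tfrac{1}{2} F_{ij}\, dx^i \wedge dx^j, \qquad F_{ij} = \frac{\partial A_j}{\partial x^i} - \frac{\partial A_i}{\partial x^j}
\]

In this framework, Maxwell’s equations are given by

\[
d{*F} = 0, \qquad dF = 0 \qquad [4]
\]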


where * is the Hodge star operator, mapping 2-forms
to 2-forms (in R^4), and is defined by

\[
(*F)_{ij} = \tfrac{1}{2} \sqrt{|g|}\, \varepsilon_{ijk\ell}\, F^{k\ell}
\]

where g = det(g_ij) and ε_{ijkℓ} is the completely antisymmetric
symbol defined by ε_{ijkℓ} = sgn(i, j, k, ℓ). As
usual, indices are raised (or lowered) via the metric,
so that, for example,

\[
F^{k\ell} = g^{ki} g^{\ell j} F_{ij}
\]

It is important to notice that *F depends on the
metric. Note also that Maxwell’s equations are
linear equations for the A_i’s.

The YM equations generalize Maxwell’s equations 
and can be described as follows. With each YM field 
(described below) is associated a compact Lie group 
G called the gauge group. For such G, we denote its 


Lie algebra by g, defined to be the tangent space at the 
identity of G. Now let A be a g-valued 1-form 


\[
A = A_i\, dx^i
\]

where each A_i is in g. In this case, the curvature 2-form
is defined by

\[
F = dA + A \wedge A
\]

or, in local coordinates,

\[
F_{ij} = \frac{\partial A_j}{\partial x^i} - \frac{\partial A_i}{\partial x^j} + [A_i, A_j]
\]

The commutator [A_i, A_j] = 0 if G is an abelian
group, but is generally nonzero if G is a matrix
group. In this framework, the YM equations can be
written in the form d*F = 0, where now d is an
appropriately defined covariant exterior derivative.
For Maxwell’s equations, the gauge group G = U(1)
(the circle group {e^{iθ} : θ ∈ R}) so g is abelian and we
recover Maxwell’s equations from the YM equations.
Observe that if G is nonabelian, then the YM
equations d*F = 0 are nonlinear equations for the
connection coefficients A_i.
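
The difference between the abelian and nonabelian cases is easy to see on concrete matrices. The sketch below assumes the usual basis τ_a = −(i/2) σ_a of su(2) built from the Pauli matrices (a standard choice, not one fixed by the text) and compares it with u(1), represented by purely imaginary 1 × 1 matrices.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(a, b):
    """[a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


# Pauli matrices and the associated basis tau_a = -(i/2) sigma_a of su(2).
SIGMA = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
TAU = [[[-0.5j * entry for entry in row] for row in s] for s in SIGMA]

# Two elements of u(1): purely imaginary 1 x 1 matrices.
U1_A, U1_B = [[0.3j]], [[-1.2j]]

su2_bracket = commutator(TAU[0], TAU[1])  # equals TAU[2]
u1_bracket = commutator(U1_A, U1_B)       # equals the zero matrix
```

The nonvanishing bracket [τ_1, τ_2] = τ_3 is what produces the quadratic term in the curvature and hence the nonlinearity of the YM equations.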


The Dirac Equation in Curved Spacetime 


The Dirac equation is a generalization of Schrodinger’s 
equation, in a relativistic setting (Bjorken and 
Drell 1964). It thus combines quantum mechanics 
with the theory of relativity. In addition, the Dirac 
equation also describes the intrinsic “spin” of fermions 
and, for this reason, solutions of the Dirac equation are 
often called spinors. 
The Dirac equation can be written as

\[
(G - m)\Psi = 0 \qquad [5]
\]

where G is the Dirac operator, m is the mass of the
Dirac particle (fermion), and Ψ is a complex-valued
4-vector called the wave function, or spinor. The
Dirac operator G is of the form

\[
G = i\, G^{j}(x) \frac{\partial}{\partial x^{j}} + B(x) \qquad [6]
\]

where G^j as well as B are 4 × 4 matrices, m is the
(rest) mass of the fermion, and i = √−1. The Dirac
equation is thus a linear equation for the spinors.
The G^j (called Dirac matrices) and the Lorentzian
metric g_ij are related by

\[
g^{jk}\, I = \tfrac{1}{2} \{ G^{j}, G^{k} \} \qquad [7]
\]

where {G^j, G^k} is the anticommutator

\[
\{ G^{j}, G^{k} \} = G^{j} G^{k} + G^{k} G^{j}
\]

Thus, the Dirac matrices depend on the underlying
metric in four-dimensional spacetime.
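
In flat spacetime the relation between Dirac matrices and the metric can be checked directly. The sketch below uses the Dirac representation of the constant matrices γ^j and, as an explicit assumption, the signature convention diag(1, −1, −1, −1) of Bjorken and Drell, which differs by an overall sign from the convention diag(−1, 1, 1, 1) in (E2); all names are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def block(a, b, c, d):
    """Assemble a 4 x 4 matrix from four 2 x 2 blocks [[a, b], [c, d]]."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]


def negate(m):
    return [[-x for x in row] for row in m]


ONE = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]
SIGMA = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

# gamma^0 = diag(1, -1) in 2 x 2 blocks; gamma^k has off-diagonal blocks sigma_k, -sigma_k.
GAMMA = [block(ONE, ZERO, ZERO, negate(ONE))]
GAMMA += [block(ZERO, s, negate(s), ZERO) for s in SIGMA]
ETA = [1, -1, -1, -1]  # assumed flat metric, signature (+, -, -, -)


def half_anticommutator(a, b):
    """(ab + ba) / 2 for two 4 x 4 matrices."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[(ab[i][j] + ba[i][j]) / 2 for j in range(4)] for i in range(4)]


def satisfies_clifford_relation():
    """Check (1/2){gamma^j, gamma^k} = eta^{jk} I for all j, k."""
    for j in range(4):
        for k in range(4):
            half = half_anticommutator(GAMMA[j], GAMMA[k])
            for row in range(4):
                for col in range(4):
                    expected = ETA[j] if (j == k and row == col) else 0
                    if abs(half[row][col] - expected) > 1e-12:
                        return False
    return True
```

In curved spacetime the constant matrices are replaced by position-dependent G^j(x), which is how the Dirac operator comes to depend on the metric.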

Suppose that H is a spacelike hypersurface in R^4,
with future-directed normal vector ν = ν(x), and let
dμ be the invariant measure on H induced by the
metric g_ij. We define a scalar product on solutions
Ψ, Φ of the Dirac equation by

\[
\langle \Psi \mid \Phi \rangle = \int_H \bar{\Psi}\, G^{j} \Phi\, \nu_j\, d\mu \qquad [8]
\]

This scalar product is positive definite, and because
of current conservation (cf. Finster (1988))

\[
\nabla_j\, \bar{\Psi} G^{j} \Phi = 0
\]

it is also independent of H. By generalizing
the expression (due to Dirac), Ψ̄ γ^0 Ψ = |Ψ|^2, in
Minkowski space, where γ^0 and Ψ̄, the adjoint
spinor, are defined by

\[
\gamma^0 = \begin{pmatrix} \mathbf{1} & 0 \\ 0 & -\mathbf{1} \end{pmatrix}, \qquad \bar{\Psi} = \Psi^{*} \gamma^0
\]

where * denotes complex conjugation, and 1 is the
2 × 2 identity matrix, the quantity Ψ̄ G^j Ψ ν_j is
interpreted as the probability density of the Dirac
particle. We normalize solutions of the Dirac
equation by requiring

\[
\langle \Psi \mid \Psi \rangle = 1 \qquad [9]
\]


Spherically Symmetric EDYM Equations 


In the remainder of this article we assume that all
fields are spherically symmetric, so they depend
only on the variable r = |x|. In this case, the
Lorentzian metric in polar coordinates (t, r, θ, φ)
takes the form [2]. The Dirac wave function can be
(Finster et al. 2000b) described by two real
functions, (α(r), β(r)), and the potential w(r) corresponds
to the magnetic component of an SU(2) YM
field. As shown in Finster et al. (2000b), the EDYM
equations are


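\[
\sqrt{A}\, \alpha' = \frac{w}{r}\, \alpha - (m + \omega T)\, \beta \qquad [10]
\]

\[
\sqrt{A}\, \beta' = (-m + \omega T)\, \alpha - \frac{w}{r}\, \beta \qquad [11]
\]

\[
r A' = 1 - A - \frac{1}{e^2} \frac{(1 - w^2)^2}{r^2} - 2 \omega T^2 (\alpha^2 + \beta^2) - \frac{2}{e^2} A w'^2 \qquad [12]
\]

\[
2 r A \frac{T'}{T} = -1 + A + \frac{1}{e^2} \frac{(1 - w^2)^2}{r^2} + 2 m T (\alpha^2 - \beta^2) - 2 \omega T^2 (\alpha^2 + \beta^2) + 4 \frac{T}{r}\, w \alpha \beta - \frac{2}{e^2} A w'^2 \qquad [13]
\]

\[
r^2 A w'' = -(1 - w^2)\, w + e^2 r T \alpha \beta - r^2\, \frac{A' T - 2 A T'}{2T}\, w' \qquad [14]
\]

Equations [10] and [11] are the Dirac equations,
[12] and [13] are the Einstein equations, and [14] is
the YM equation. The constants m, ω, and e denote,
respectively, the rest mass of the Dirac particle, its
energy, and the YM coupling constant.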


Nonexistence of Black Hole Solutions 


Let the surface r = ρ > 0 represent a black hole event
horizon:

\[
A(\rho) = 0, \qquad A(r) > 0 \ \ \text{if } r > \rho \qquad [15]
\]

In this case, the normalization condition [9] is
replaced by

\[
\int_{r_0}^{\infty} (\alpha^2 + \beta^2)\, \frac{T}{\sqrt{A}}\, dr < \infty \quad \text{for every } r_0 > \rho \qquad [16]
\]

In addition, we assume that the following global
conditions hold:

\[
\lim_{r \to \infty} r\,(1 - A(r)) = M < \infty \qquad [17]
\]

(finite mass),

\[
\lim_{r \to \infty} T(r) = 1 \qquad [18]
\]

(gravitational field is asymptotically flat Minkowskian),
and

\[
\lim_{r \to \infty} (w(r), w'(r)) = (1, 0) \qquad [19]
\]

(the YM field is well behaved).
Concerning the event horizon r = ρ, we make the
following regularity assumptions:

1. The volume element √|det g_ij| = |sin θ| r^2 A^{-1} T^{-2}
is smooth and nonzero on the horizon; that is,

\[
T^{-2} A^{-1},\ T^{2} A \in C^{1}([\rho, \infty))
\]

2. The strength of the YM field F_ij is given by

\[
\mathrm{Tr}(F_{ij} F^{ij}) = \frac{2 A w'^2}{r^2} + \frac{(1 - w^2)^2}{r^4}
\]

(cf. Bartnik and McKinnon 1988). We assume that
this scalar is bounded near the horizon; that is,
outside the event horizon and near r = ρ, assume
that

\[
w \ \text{and} \ A w'^2 \ \text{are bounded} \qquad [20]
\]

3. The function A(r) is monotone increasing outside
of and near the event horizon.


As discussed in Finster et al. (1999a), if assumption
1 or 2 were violated, then an observer freely falling
into the black hole would feel strong forces when
crossing the horizon. Assumption 3 is considerably
weaker than the corresponding assumption in
Finster et al. (1999b), where, indeed, it was assumed
that the function A(r) obeyed a power law
A(r) = c(r − ρ)^s + O((r − ρ)^{s+1}), with positive constants
c and s, for r > ρ.

The main result in this subsection is the following 
theorem: 


Theorem 1 Every black hole solution of the
EDYM equations [10]-[14] satisfying the regularity
conditions 1-3 cannot be normalized and coincides
with a Bartnik-McKinnon (BM) black hole of the
corresponding Einstein-Yang-Mills (EYM) equations;
that is, the spinors α and β must vanish
identically outside the event horizon.


Remark Smoller and Wasserman (1998) proved
that any black hole solution of the EYM equations
that has finite mass (i.e., that satisfies [17]) must be
one of the BM black hole solutions (Bartnik and
McKinnon 1988) whose existence was first demonstrated
in Smoller et al. (1993). Thus, amending the
EYM equations by taking quantum-mechanical
effects into account, in the sense that both the
gravitational and YM fields can interact with Dirac
particles, does not yield any new types of black
hole solutions.


The present strategy in proving this theorem is to
assume that we have a black hole solution of the
EDYM equations [10]-[18] satisfying assumptions
1-3, where the spinors do not vanish identically
outside of the black hole. We shall show that this
leads to a contradiction. The proof is broken up
into two cases: either A^{-1/2} is integrable or
nonintegrable near the event horizon. We shall
only discuss the proof for the case when A^{-1/2} is
integrable near the event horizon, leaving the
alternate case for the reader to view in Finster
et al. (2000a).

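If A^{-1/2} is integrable, then one shows that there
are positive constants c, ε such that

\[
c \le \alpha^2(r) + \beta^2(r) \le \frac{1}{c} \quad \text{if } \rho < r < \rho + \varepsilon \qquad [21]
\]

Indeed, multiplying [10] by α, and [11] by β and
adding gives an estimate of the form

\[
\bigl| \sqrt{A}\, (\alpha^2 + \beta^2)' \bigr| \le \text{const} \cdot (\alpha^2 + \beta^2)
\]

Dividing by √A (α^2 + β^2) and integrating from
r > ρ to ρ + ε gives

\[
\bigl| \log(\alpha^2 + \beta^2)(\rho + \varepsilon) - \log(\alpha^2 + \beta^2)(r) \bigr| \le \text{const.}
\]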
from which the desired result follows. Next, from
[12] and [13],

\[
r (A T^2)' = -4 \omega T^4 (\alpha^2 + \beta^2) + \Bigl[ 2m (\alpha^2 - \beta^2) + \frac{4w}{r}\, \alpha \beta \Bigr] T^3 - \frac{4}{e^2}\, A w'^2\, T^2 \qquad [22]
\]
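
The passage from [12] and [13] to [22] is a short algebraic step: since (AT^2)' = A'T^2 + 2ATT', one has r(AT^2)' = T^2 (rA') + T^2 (2rAT'/T), and the terms 1 − A and (1 − w^2)^2/(e^2 r^2) cancel in the sum. The sketch below checks this identity numerically at arbitrary sample values; it assumes the forms of [12], [13] and [22] written in the lead-in comments, and the function names are hypothetical.

```python
def einstein_rhs(r, A, T, alpha, beta, w, dw, m, omega, e):
    """Right-hand sides of [12] (for r A') and of [13] (for 2 r A T'/T)."""
    ym = (1 - w**2) ** 2 / (e**2 * r**2)        # (1/e^2)(1 - w^2)^2 / r^2
    kinetic = 2 * A * dw**2 / e**2              # (2/e^2) A w'^2
    density = 2 * omega * T**2 * (alpha**2 + beta**2)
    r_a_prime = 1 - A - ym - density - kinetic
    two_r_a_t_prime_over_t = (-1 + A + ym + 2 * m * T * (alpha**2 - beta**2)
                              - density + 4 * T * w * alpha * beta / r - kinetic)
    return r_a_prime, two_r_a_t_prime_over_t


def rhs_22(r, A, T, alpha, beta, w, dw, m, omega, e):
    """Right-hand side of [22], grouped by powers of T."""
    return (-4 * omega * T**4 * (alpha**2 + beta**2)
            + (2 * m * (alpha**2 - beta**2) + 4 * w * alpha * beta / r) * T**3
            - 4 * A * dw**2 * T**2 / e**2)


# Arbitrary sample values (not a solution of the equations, just a test point).
sample = dict(r=1.7, A=0.4, T=2.3, alpha=0.6, beta=-0.2, w=0.8, dw=0.35,
              m=1.1, omega=0.9, e=1.5)
r_a_prime, two_r_a_t_prime_over_t = einstein_rhs(**sample)
lhs = sample["T"] ** 2 * (r_a_prime + two_r_a_t_prime_over_t)
difference = abs(lhs - rhs_22(**sample))
```

The grouping by powers of T is what the subsequent argument uses: only the coefficient of T^4 involves the energy ω.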


Using assumption 2 together with the estimate [21],
we see that the coefficients of T^4, T^3, and T^2 on the
right-hand side of [22] are bounded near ρ, and from
assumption 1 the left-hand side of [22] is bounded
near ρ. Since assumption 1 implies T(r) → ∞ as
r ↘ ρ, we see that ω = 0. Since ω = 0, the Dirac
equations simplify and we can show that αβ is a
positive decreasing function which tends to 0 as
r → ∞. Then the YM equation can be written in the
form

\[
r^2 (A w')' = -w (1 - w^2) + e^2 r\, \frac{(T \sqrt{A})\, \alpha \beta}{\sqrt{A}} + r^2\, \frac{(A T^2)'}{2 A T^2}\, (A w') \qquad [23]
\]

From assumption 2, A w'^2 is bounded so A^2 w'^2 → 0 as
r ↘ ρ. Thus, from [23] we can write, for r near ρ,

\[
(A w')'(r) > c_1 + \frac{c_2}{\sqrt{A(r)}}
\]

where c_1 and c_2 are positive constants. Using this
inequality, we can show that for r near ρ,


\[
A(r) = (r - \rho)\, B(r)
\]

where 0 < lim_{r↘ρ} B(r) < ∞. It follows that A(ρ) = 0
and A′(ρ) > 0. Thus, the Einstein metric has the
same qualitative features as the Schwarzschild
metric near the event horizon. Hence, the metric
singularity can be removed via a Kruskal transformation
(Adler et al. 1975). In these Kruskal
coordinates, the YM potential is continuous and
bounded (as is easily verified). As a consequence, the
arguments in Finster et al. (2000c) go through and
show that the spinors must vanish identically outside
the horizon. For this, one must note that continuous
zero-order terms in the Dirac operator are irrelevant
for the derivation of the matching conditions in
(Finster et al. 2000c, section 2.4). Thus, the matching
conditions (equations (2.31), (2.34) of Finster et al.
(2000c)) are valid without changes in the presence
of our YM field. Using conservation of the (electromagnetic)
Dirac current and its positivity in timelike
directions, the arguments in Finster et al. (2000c,
section 4) all carry over. This completes the proof.
We have thus proved that the only black hole 
solutions of our EDYM equations are the BM black 


holes; that is, the spinors must vanish identically. In 
other words, the EDYM equations do not admit 
normalizable black hole solutions. Thus, in the 
presence of quantum-mechanical Dirac particles, static 
and spherically symmetric black hole solutions do not 
exist. Another interpretation of our result is that
Dirac particles can only either disappear into the black 
hole or escape to infinity. These results were proved 
under very weak regularity assumptions on the form of 
the event horizon (see assumptions 1-3). 


Particle-Like Solutions 


By a particle-like (bound state) solution of the (SU(2)) 
EDYM equations, we mean a smooth solution of 
eqns [10]-[14], which is defined for all r > 0, and 
satisfies condition [9], which explicitly becomes 


$$ \int_0^\infty (\alpha^2 + \beta^2)\, \frac{T}{\sqrt{A}}\, dr = 1 $$   [24]


In addition, we demand that [17]-[19] also hold. It 
is easily shown that, near r=0, we must have 


$$ w(r) = 1 - \frac{\lambda}{2}\, r^2 + O(r^3) $$   [25]


where λ is a real parameter. From this, via a Taylor
expansion, one finds that


$$ \alpha(r) = \alpha_1 r + O(r^3), \qquad \beta(r) = \frac{1}{3}\,(\omega T_0 - m)\,\alpha_1 r^2 + O(r^3) $$   [26]

$$ A(r) = 1 + O(r^2), \qquad T(r) = T_0 + O(r^2) $$   [27]

with two parameters α₁ and T₀ > 0. Using linearity of
the Dirac equation, we can always assume that α₁ > 0.

Under all realistic conditions, the coupling of 
Dirac particles to the YM field (describing the weak 
or strong interactions) is much stronger than the 
coupling to the gravitational field. Thus, we are 
particularly interested in the case of weak gravitational
coupling. As shown in Finster et al. (2000b),
the gravitational field is essential for the formation 
of bound states. However, for arbitrarily weak 
gravitational coupling, we can hope to find bound 
states. It is even conceivable that these bound-state 
solutions might have a well-defined limit when the 
gravitational coupling tends to zero, if we let the 
YM coupling go to infinity at the same time. Our 
idea is that this limiting case might yield a system of 
equations which is simpler than the full EDYM 
system, and can thus serve as a physically interesting 
starting point for the analysis of the coupled 
interactions described by the EDYM equations.
Expressed in dimensionless quantities, we shall thus
consider the limits


$$ m^2 \kappa \to 0 \quad \text{and} \quad e^2 \to \infty $$   [28]


That is, we ask whether weak gravitational coupling 
can give rise to bound states. Using numerical methods, 
we find particle-like solutions which are stable, even 
for arbitrarily weak gravitational coupling. 

Now assuming that [28] holds (weak gravitational
coupling), so that (A, T) ≈ (1, 1), we find that
the Dirac equations have a meaningful limit only
under the assumptions that α converges and that


mB(r) = A), nT -1> y 

|29] 
m(w-—- m) > E 

with two real functions β, φ and a real parameter E.
Multiplying [29] with m and taking the limits [28]
as well as A, T → 1, the Dirac equations become


$$ \alpha' = \frac{w}{r}\,\alpha - 2\beta $$   [30]

$$ \beta' = (E + \varphi)\,\alpha - \frac{w}{r}\,\beta $$   [31]


We next consider the YM equation [14]. The last
term in [14] drops out in the limit of weak
gravitational coupling [28]. The second summand
converges only under the assumption that


$$ \frac{e^2}{m} \to q $$   [32]

with q a real parameter, playing the role of an
“effective” coupling constant. Together with [28],
this implies that m → ∞. The YM equations thus
have the limit


$$ r^2 w'' = -(1 - w^2)\,w + q\, r\, \alpha\beta $$   [33]


In order to get a well-defined and nontrivial limit of
the Einstein equations [13] and [14], we need to
assume that the parameter m³κ has a finite, nonzero
limit. Since this parameter has the dimension of
inverse length, we can arrange by a scaling of our
coordinates that


$$ m^3 \kappa \to 1 $$   [34]

We differentiate the T-equation [13] with respect to r
and substitute [12]. Taking the limits [28] and [34], a
straightforward calculation yields the equation

Ay = —a [35] 


where Δ = r⁻² ∂_r (r² ∂_r) is the radial Laplacian in
Euclidean R³. Indeed, this equation can be
regarded as Newton’s equation with the Newtonian
potential φ. Thus, the limiting case [34] for
the gravitational field corresponds to taking the
Newtonian limit. Finally, the normalization
condition [16] reduces to


$$ \int_0^\infty \alpha(r)^2\, dr = 1 $$   [36]


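The boundary conditions [17]-[19], [24]-[26] are
transformed into


$$ w(r) = 1 - \frac{\lambda}{2}\, r^2 + O(r^3), \qquad \lim_{r \to \infty} w(r) = \pm 1 $$   [37]

$$ \alpha(r) = \alpha_1 r + O(r^3), \qquad \beta(r) = O(r^2) $$   [38]

$$ \varphi(r) = \varphi_0 + O(r^2), \qquad \lim_{r \to \infty} \varphi(r) < \infty $$   [39]


with the three parameters λ, α₁, and φ₀. We point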
out that the limiting system contains only one
coupling constant q. According to [32] and [34],
q is in dimensionless form given by


$$ e^2 m^2 \kappa \to q $$   [40]

Hence, in dimensionless quantities, the limit [17]
describes the situation where the gravitational coupling
goes to zero, while the YM coupling constant
goes to infinity like e² ~ (m²κ)⁻¹. Therefore, this
limiting case is called the reciprocal coupling limit
(RCL). The reciprocal coupling system is given by
eqns [30], [31], [33], and [35] together with the
normalization condition [36] and the boundary
conditions [37]-[39]. According to [29], the parameter
E coincides up to a scaling factor with ω − m,
and thus has the interpretation as the (properly
scaled) energy of the Dirac particle. As in Newtonian
mechanics, the potential φ is determined only up to a
constant μ ∈ R; namely, the reciprocal limit equations
are invariant under the transformation


$$ \varphi \to \varphi + \mu, \qquad E \to E - \mu $$   [41]

To simplify the connection between the EDYM
equations and the RCL equations, we introduce a
parameter ε in such a way that as ε → 0, EDYM →
RCL; namely,


Notice that ε describes the relative strength of gravity
versus the YM interaction. For realistic physical
situations, the gravitational coupling is weak,
namely m²κ ≪ 1, but the YM coupling constant is
of order 1: e² ~ 1. So we investigate the parameter
range ε ≪ 1, q ~ 0. These form the starting points for
the numerics below.
We seek stable bound states for weak gravitational
coupling. For this purpose, we consider the
total binding energy


B = M − m   [42]


where M is the ADM mass defined by [17] and m is the
rest mass of the Dirac particle. B is thus the amount of
energy set free when the binding is broken. If B < 0,
then energy is needed to break up the binding.
According to Lee (1987), a solution is stable if B < 0.
In order to find solutions of the RCL equations with 
B <0, Lee’s treatment and a new two-parameter 
shooting method (Finster et al. 2000b) can be used. 
Stable solutions of these RCL equations then follow 
(see Finster et al. (2000b) for details). 

We now turn to the full EDYM equations. Here 
are the key steps of our method: 


1. Find solutions which are small perturbations of 
the limiting (RCL) solutions. 

2. Trace these solutions by gradually changing the 
coupling constants. 

3. This should yield a one-parameter family of 
solutions which are “far” from the known limit- 
ing solutions. 


The point is that we use the RCL solutions as a 
starting point for numerics, and we “continue” these 
solutions to solutions of the full EDYM equations. 
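
The continuation idea in steps 1-3 can be illustrated on a toy problem. The sketch below is not the EDYM system: it assumes a hypothetical scalar equation F(x, ε) = x + εx³ − 1 = 0, whose solution x = 1 at ε = 0 plays the role of the known limiting solution, and traces that solution to ε = 1 by Newton iteration, reusing each converged solution as the initial guess for the next parameter value. All function names, the step count, and the tolerance are illustrative assumptions.

```python
def residual(x, eps):
    """Toy equation F(x, eps) = 0 standing in for the full system (an assumption)."""
    return x + eps * x**3 - 1.0


def residual_dx(x, eps):
    """Derivative of the toy residual with respect to x."""
    return 1.0 + 3.0 * eps * x**2


def newton(x0, eps, tol=1e-12, max_iter=50):
    """Solve residual(x, eps) = 0 by Newton iteration starting from x0."""
    x = x0
    for _ in range(max_iter):
        step = residual(x, eps) / residual_dx(x, eps)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")


def continue_solution(x_limit, eps_target, n_steps=20):
    """Trace the solution branch from eps = 0 to eps_target in small steps."""
    branch = [(0.0, x_limit)]
    x = x_limit
    for k in range(1, n_steps + 1):
        eps = eps_target * k / n_steps
        x = newton(x, eps)  # the previous solution serves as initial guess
        branch.append((eps, x))
    return branch
```

Calling `continue_solution(1.0, 1.0)` returns the list of (ε, x) pairs along the branch. For a system of ODEs with shooting parameters, the scalar Newton step would be replaced by a multi-parameter shooting step, but the reuse of the previous solution as starting point is the same.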

To be somewhat more specific, we see that if we
fix ε and q, we have two parameters:


$$ \alpha_1 = \alpha'(0) \quad \text{and} \quad E = \omega - m $$

and two conditions at ∞:

$$ \alpha^2 + \beta^2 \to 0, \qquad w^2 \to 1 $$
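We consider the EDYM equations with weaker
side conditions

$$ 0 < \nu = \int_0^\infty (\alpha^2 + \beta^2)\, \frac{T}{\sqrt{A}}\, dr < \infty $$

$$ 0 < \lim_{r \to \infty} T(r) < \infty $$

$$ \lim_{r \to \infty} w^2(r) = 1 $$

$$ \rho = \lim_{r \to \infty} r\,(1 - A(r)) < \infty $$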


Then we rescale these solutions to obtain the true 
side conditions via the transformations 


Discussion 


In this article we have considered the SU(2) EDYM 
equations. Our first result shows that the only black 
hole solutions of these equations are the BM black 
holes; that is, the spinors must vanish identically outside 
of the black hole. In other words, the EDYM equations 
do not admit normalizable black hole solutions. Thus, 
as mentioned earlier, this result indicates that the Dirac 
particle either enters the black hole or escapes to 
infinity. In two recent publications (Finster et al. 2002a,b),
we consider the Cauchy problem for a massive Dirac
equation in a charged, rotating black hole geometry
(the non-extreme Kerr-Newman black hole), with
compactly supported initial data outside the black
hole. We prove that, in this case, the probability that the
Dirac particle lies in any compact set tends to zero as
t → ∞. This means that the Dirac particle indeed either
enters the black hole or tends to infinity. We also show
that the wave function decays at a rate t^(−5/6) on any
compact set outside of the event horizon.

For particle-like solutions of the SU(2) EDYM 
equations, we find stable bound states for arbitrarily 
weak gravitational coupling. This shows that as weak 
as the gravitational interaction is, it has a regularizing 
effect on the equations. The stability of particle-like 
solutions of the EDYM equations is in sharp contrast 
to the EYM equations, where the particle-like solu- 
tions are all unstable (Straumann and Zhou 1990). 
Introduction


The Dirac equation arose in the early days of 
quantum mechanics, inspired by the problem of 
taking special relativity into account in the quantum 
mechanical description of a freely moving electron. 
From the outset, however, Dirac looked for an
equation that also accommodated the electron spin
and that could be modified to include interaction
with an external electromagnetic field. The equation
he discovered satisfies all of these requirements. On
the other hand, when it is rewritten in Hamiltonian
form, the spectrum of the resulting Dirac operator
includes not only the desired interval [mc², ∞)
(where m is the electron mass and c the speed of
light), but also an interval (−∞, −mc²].

Dirac himself already considered this negative 
part of the spectrum as unphysical, since no such 
negative energies had been observed and their 
presence would entail instability of the electron. 
This physical flaw of the “first-quantized” descrip- 
tion of a relativistic electron led to the introduction 
of “second quantization,” as encoded in quantum 
field theory. In the field-theoretic version of the 
Dirac theory, the unphysical negative energies are 
obviated by a prescription that originated in Dirac’s 
hole theory. 

Specifically, Dirac postulated that the negative- 
energy states of his equation were occupied by a sea 
of unobservable particles, the Pauli principle 
forbidding an occupancy greater than one. In this
heuristic picture, the annihilation of a negative- 
energy electron yields a hole in the sea, observable 
as a new type of positive-energy particle with the 
same mass, but opposite charge. This led Dirac to 
predict that the electron should have an oppositely 
charged partner. 

His prediction was soon confirmed experimen- 
tally, the partner of the negatively charged electron 
showing up as the positively charged positron. More 
generally, all electrically charged particles (not only 
spin-1/2 particles described by the Dirac equation) 
have turned out to have oppositely charged anti- 
particles. Furthermore, some electrically neutral 
particles also have distinct antiparticles. 

Returning to the second-quantized Dirac theory, 
this involves a Dirac quantum field in which the 
creation/annihilation operators of negative-energy 
states are replaced by annihilation/creation operators
of positive-energy holes, respectively. The hole theory
substitution therefore leads to a Hilbert space (called
Fock space) that accommodates an arbitrary number
of particles and antiparticles with the same mass and 
opposite charge. 

Soon after the introduction of the Dirac equation 
(which dates from 1928), it turned out that the 
number of particles and antiparticles is not con- 
served in a high-energy collision. Such creation and 
annihilation processes admit a natural description in 
the Fock spaces associated with relativistic quantum 
field theories. The very comprehensive mathematical 
description of real-world elementary particle phe- 
nomena that is now called the standard model arose 
some 30 years ago, and has been abundantly 
confirmed by experiment ever since. It involves 


various relativistic quantum fields with nonlinear 
interactions. The Dirac quantum field is an essential 
ingredient, inasmuch as it is used to describe all 
spin-1/2 particles and antiparticles in the model 
(including quarks, electrons, neutrinos etc.). 

After this survey (which is not only very brief, 
but also biased toward the physical concepts at 
issue), the contents of this article will be sketched. 
The free Dirac equation associated with the 
physical Minkowski spacetime R⁴ is first detailed.
The exposition and notation are slightly unconven- 
tional in some respects. This is because we are partly 
preparing the ground for a mathematically precise 
account of the second-quantized version of the free 
Dirac theory. For example, momentum space (as 
opposed to position space) is emphasized, since the 
variable x in the Dirac equation does not have a 
clear physical significance and should be discarded 
in the Hilbert space formulation of the second- 
quantized Dirac field. The latter acts on a Fock 
space of multi-particle and -antiparticle wave 
functions depending on momentum and spin variables,
and the spacetime dependence of the Dirac
field is solely a consequence of relativistic covariance.
(In particular, the variable x in the Dirac
field Ψ(t, x) should not be viewed as the position of
particles and antiparticles created and annihilated 
by the field.) 

To be sure, there is much more to the Dirac 
theory than its free first- and second-quantized 
versions for Minkowski spacetime R⁴. The primary
purpose here is, however, to present these founda- 
tional versions in some detail. A much more 
sketchy account of further developments can be 
found in subsequent sections. First, the one-particle 
theory is reconsidered. Generalizations of the free 
theory to arbitrary dimensions and Euclidean 
settings are sketched and interactions with external 
fields are described, touching on various aspects 
and applications. 

The next focus is on relations with index theory 
that arise when the massless Euclidean Dirac 
operator is generalized to geometric settings, namely
l-dimensional Riemannian manifolds allowing a spin
structure. We illustrate the general Atiyah-Singer
index theory for the Dirac framework with some
simple examples for l = 1 (Toeplitz operators) and
l = 2 (the manifold S¹ × S¹).

More information on the many-particle Dirac 
theory appears in the final section. Brief remarks 
on the Dirac field in interaction with other 
quantized fields are followed by an elaboration 
of the far simpler situation of the Dirac field 
interacting with external fields. Among the
S-operators corresponding to such fields there is a
special class of unitary matrix multipliers; the
external field then vanishes for t < 0 and equals 
the pure gauge field corresponding to the unitary 
matrix for t > 0. Specializing to an even spacetime 
dimension and choosing special “kink” type 
unitaries, the associated Fock-space quadratic 
forms can be made to converge to the free Dirac 
field. 

As mentioned already, Dirac’s second quantization 
procedure was invented to get rid of the unphysical 
negative energies of the first-quantized (one-particle) 
theory. It is an amazing fact that the resulting 
formalism for the simplest case (namely the massless 
Dirac operator in a two-dimensional spacetime) can 
be exploited for quite different purposes. In particu- 
lar, this setting can be tied in with various soliton 
equations and the representation theory of certain 
infinite-dimensional groups and Lie algebras. In 
conclusion, some of these applications are briefly 
sketched, namely the construction of special solutions 
to the Kadomtsev-Petviashvili (KP) equation (including
the KP solitons and finite-gap solutions) and
special representations of Kac-Moody and Virasoro 
algebras. 


The Free One-Particle Dirac Equation in R⁴


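The free time-dependent Dirac equation is a linear
hyperbolic evolution equation for a function Ψ(t, x)
on spacetime R⁴ with values in C⁴. It involves four
4 × 4 matrices γ^μ, μ = 0, 1, 2, 3, satisfying the
γ-algebra


$$ \gamma^\mu \gamma^\nu + \gamma^\nu \gamma^\mu = 2 g^{\mu\nu} 1_4, \qquad g = \mathrm{diag}(1, -1, -1, -1) $$   [1]

Using the Pauli matrices

$$ \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} $$   [2]

one can choose for example

$$ \gamma^0 = \begin{pmatrix} 0 & 1_2 \\ 1_2 & 0 \end{pmatrix}, \quad \gamma^k = \begin{pmatrix} 0 & -\sigma_k \\ \sigma_k & 0 \end{pmatrix}, \quad k = 1, 2, 3 $$   [3]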

Now the free Dirac equation reads


$$ (i\hbar \gamma^0 \partial_t + i\hbar c\, \gamma \cdot \nabla - m c^2 1_4)\, \Psi(t, x) = 0 $$   [4]

where ħ is Planck’s constant, c the speed of light,
and m the particle mass. Using from now on units so
that ħ = c = 1, this can be abbreviated as


$$ \Bigl( i \sum_{\mu=0}^{3} \gamma^\mu \partial_\mu - m \Bigr) \Psi = 0, \qquad x = (x^0, x^1, x^2, x^3) $$

$$ \partial_\mu = \partial / \partial x^\mu, \qquad \mu = 0, 1, 2, 3 $$   [5]


The relativistic invariance of this equation can be 
understood as follows. First, since the equation 
does not explicitly involve the spacetime coordi- 
nates, it is invariant under spacetime translations. 
(If Ψ(t, x) solves [5], then also Ψ(t − a₀, x − a) is a
solution for all (a₀, a) ∈ R⁴.) Second, it is invariant
under Lorentz transformations (rotations and
boosts). Indeed, if Ψ(x) is a solution and L ∈
SO(1,3), then S(L)Ψ(Lx) solves [5] too, where


S(L) denotes a (suitably normalized) matrix 
satisfying 
3 
S(L)*yS(L) = S~ LA,” (6) 
v=0 


(The matrices y“ on the right-hand side of [6] satisfy 
the y-algebra [1]. From this, the existence of a 
representation S(L) of SO(1, 3) satisfying [6] is 
readily deduced.) 

As a consequence, the Poincaré group (inhomo- 
geneous Lorentz group) acts in a natural way on the 
space of solutions to the time-dependent Dirac 
equation, expressing its independence of the choice 
of inertial frame. For quantum mechanical purposes, 
however, one needs to choose a frame and use the 
associated time variable to rewrite the equation as a 
Hilbert space evolution equation. 

The relevant Hilbert space H is the space of four- 
component functions that are square integrable over 
space, 


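$$ H = L^2(\mathbb{R}^3, dx) \otimes \mathbb{C}^4 $$   [7]


To obtain a self-adjoint Hamiltonian on H, one multiplies
[5] by γ⁰ and introduces the Hermitian matrices


$$ \beta = \gamma^0, \qquad \alpha^k = \gamma^0 \gamma^k, \quad k = 1, 2, 3 $$   [8]


Then, one obtains the Schrödinger type equation

$$ i \frac{d}{dt} \psi = H \psi $$   [9]

where H is the Dirac operator,


$$ H = -i \alpha \cdot \nabla + \beta m $$   [10]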
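
The step from [5] to [9]-[10] can be spelled out as follows. This is a short worked derivation, assuming units with ħ = c = 1, the relation (γ⁰)² = 1₄ contained in [1], and the definitions β = γ⁰, α^k = γ⁰γ^k.

```latex
\begin{align*}
\Bigl(i\gamma^0\partial_t + i\sum_{k=1}^{3}\gamma^k\partial_k - m\Bigr)\Psi &= 0
  && \text{(eqn [5] with } x^0 = t\text{)} \\
i(\gamma^0)^2\partial_t\Psi
  &= \Bigl(-i\sum_{k=1}^{3}\gamma^0\gamma^k\partial_k + \gamma^0 m\Bigr)\Psi
  && \text{(multiply by } \gamma^0 \text{ and rearrange)} \\
i\partial_t\Psi &= \bigl(-i\alpha\cdot\nabla + \beta m\bigr)\Psi
  && \text{(use } (\gamma^0)^2 = 1_4\text{)}
\end{align*}
```

In a representation where γ⁰ is Hermitian and the γ^k are anti-Hermitian, the matrices α^k are Hermitian because (γ⁰γ^k)* = (γ^k)*(γ⁰)* = −γ^kγ⁰ = γ⁰γ^k, where the last step uses the anticommutation of γ⁰ and γ^k.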


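Under Fourier transformation,

$$ F : H \to L^2(\mathbb{R}^3, dp) \otimes \mathbb{C}^4, \qquad \hat\psi(p) \equiv (F\psi)(p) = (2\pi)^{-3/2} \int_{\mathbb{R}^3} dx\, \exp(-i x \cdot p)\, \psi(x) $$   [11]


eqn [9] turns into


$$ i \frac{d}{dt} \hat\psi = D(p)\, \hat\psi, \qquad D(p) = \alpha \cdot p + \beta m $$   [12]


The matrix D(p) is Hermitian and has square E_p² 1₄,
where E_p is the relativistic energy,


$$ E_p = (p \cdot p + m^2)^{1/2} $$   [13]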


corresponding to a momentum p. Now, we have 


UcD(—p) = —D(p) Uc [14] 
where Uc is the charge conjugation matrix, 
Uc = iy [1 5] 


Hence, the four eigenvalues of D(p) are given by
E_p, E_p, −E_p, and −E_p. Therefore, the matrices


$$ P_\pm(p) = \frac{1}{2} \left( 1 \pm \frac{D(p)}{E_p} \right) $$   [16]


are projections on the positive and negative spectral 
subspaces of D(p). 
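
The properties of D(p) and of the spectral projections can be verified numerically. The sketch below assumes α^k = diag(σ_k, −σ_k) and β = [[0, 1₂], [1₂, 0]], which is what α^k = γ⁰γ^k and β = γ⁰ give for the Weyl-type matrices γ⁰ = [[0, 1₂], [1₂, 0]], γ^k = [[0, −σ_k], [σ_k, 0]]; the momentum, mass, and function names are illustrative.

```python
import math

SIGMA = (
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
)


def dirac_matrix(p, m):
    """Return D(p) = alpha.p + beta*m as a 4x4 nested list (assumed representation)."""
    sp = [[sum(p[k] * SIGMA[k][i][j] for k in range(3)) for j in range(2)]
          for i in range(2)]
    d = [[0j] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            d[i][j] = sp[i][j]            # upper-left block: sigma.p
            d[i + 2][j + 2] = -sp[i][j]   # lower-right block: -sigma.p
        d[i][i + 2] = m                   # off-diagonal blocks: m * 1_2
        d[i + 2][i] = m
    return d


def matmul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def energy(p, m):
    """Relativistic energy E_p = sqrt(p.p + m^2)."""
    return math.sqrt(sum(c * c for c in p) + m * m)


def projector(p, m, sign):
    """Spectral projection (1 + sign * D(p) / E_p) / 2 with sign = +1 or -1."""
    d, e = dirac_matrix(p, m), energy(p, m)
    return [[0.5 * ((1.0 if i == j else 0.0) + sign * d[i][j] / e)
             for j in range(4)] for i in range(4)]


def max_abs_diff(a, b):
    """Largest entrywise deviation between two 4x4 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(4) for j in range(4))
```

With these helpers one can confirm, for any sample momentum, that D(p)² equals E_p² times the identity, that each projection is idempotent, that the two projections annihilate each other, and that each has trace 2 (a two-dimensional range).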

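As orthonormal base for the positive-energy subspace,
we can now choose


$$ w_{+,j}(p) = \left( \frac{2E_p}{E_p + m} \right)^{1/2} P_+(p)\, b_j, \quad j = 1, 2 $$   [17]

where

$$ b_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \qquad b_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 1 \\ 0 \\ 1 \end{pmatrix} $$   [18]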
Next, setting 
w_j(p) = Ucwz;(p), j= 1,2 [19] 


an orthonormal base w-1(—p), w_2(—p) for the 
negative-energy subspace of D(p) is obtained; cf. [14]. 

The upshot is that the time-independent Dirac
equation


$$ H\psi = E\psi $$   [20]


gives rise to bounded eigenfunctions


$$ e_{+,j}(x, p) = (2\pi)^{-3/2} \exp(i x \cdot p)\, w_{+,j}(p), \quad j = 1, 2 $$

$$ e_{-,j}(x, p) = (2\pi)^{-3/2} \exp(-i x \cdot p)\, w_{-,j}(p), \quad j = 1, 2 $$   [21]


with eigenvalues E = E_p and E = −E_p, respectively. Clearly,
they are not square-integrable, but they can be used 
as the kernel of a unitary transformation between 


v 


H (7) and the Hilbert space 


H=Hs, 6H. =P,H@P_H 


22 
H}, H- = L?(R?, dp) 9 C? 22 


Specifically, we have 
W:H-H 
f (P) = (f(b), -()) 
0) = D D | deyip) 23 


§=+,— j=1,2 


which entails 


(WP) = | deao) o) DA 
R 


(Here and throughout this article, a bar denotes 
complex conjugation.) 

From the above, it is clear that the Dirac 
Hamiltonian H acting on the Hilbert space H is 
unitarily equivalent to the multiplication operator on 
H [22] given by 


(Hf) s(b) = SEpfs(b), 


Indeed, W is a diagonalizing transformation for H, 
the relation 


§=+,- [25] 


H = WAW [26] 


yielding an explicit realization of the spectral 
theorem. 

Using the same notational convention, the 
momentum, charge conjugation, parity, and time- 
reversal operators on H, given by 


(Ppa) (x) = iðka) [27] 

(Cw) (x) = Ucth(x) [28] 
(Py)(x) = Ury(—x), Up = 7° [29] 
(Tp)(x) = Ury(x), Ur = 7'7 [30] 
transform into the operators


(Pef)s(B) = Spefs(P), ê= +,- [31] 
(CHP) = f-s(0), as a [32] 
(PH) = êfs(-p), ê= +,- [33] 
(TA) (p) =ioxfs(—p), ê= +, — [34] 


Note that P_k, P, and T leave the positive- and
negative-energy subspaces H₊ and H₋ invariant,
whereas C interchanges them.

To conclude this section, we describe some salient
features of the unitary representation of the (identity
component of the) Poincaré group on H, which
follows from the representation on solutions to [5]
already sketched. The spacetime translations over
a ∈ R⁴ are represented by the unitary operator
exp(−ia₀H + ia·P); explicitly,


$$ (\exp(-i a_0 H + i a \cdot P) f)_\delta(p) = \exp(-i \delta (a_0 E_p - a \cdot p))\, f_\delta(p), \quad \delta = +, - $$   [35]


The representation of the Lorentz group involves
unitary 2 × 2 matrices U(k, A), where k is an
arbitrary 4-vector satisfying k^μ k_μ = 1 and A the
matrix in SL(2, C) representing L ∈ SO(1, 3). (Recall
that SL(2, C) can be viewed as a 2-fold cover of
SO(1, 3).) In particular, U(k, A) does not depend on
k for rotations,


$$ U(k, A) = A^*, \quad \forall A \in SU(2) $$   [36]


(Here and henceforth, we use * to denote the 
Hermitian adjoint of matrices and operators.) For 
boosts, however, there is dependence on the vector 
k, which is the image of the vector (1, 0) under the 
boost. We refrain from a more detailed description 
of U(k, A), as this would carry us too far afield. 
The unitary SO(1, 3) representation leaves the 
decomposition H=P H ®P_H invariant. On the 
positive-energy subspace H4, it is given by 





(U(L)f),.(p) = (E) veaje 37 


0 


where 


p=(Epp), pH Lp [38] 


On H_, it is given by the complex-conjugate 
representation, 


(U(L)f)_(p) = (E) uE Ao 39) 


po m 
just as for the spacetime translations, cf. [35]. (The
superscript t is used to denote the transpose matrix.) 
This feature is crucial for the second-quantized 
Dirac theory, which is discussed next. 


The Free Dirac Field in R⁴


The free Dirac field is an operator-valued distribution
on a Fock space that describes an arbitrary number of
spin-1/2 particles and antiparticles in terms of momentum
space wave functions. Since spin-1/2 particles are
fermions (which encodes the Pauli exclusion principle),
an M-particle wave function F^+_{j_1,...,j_M}(p_1, ..., p_M) (where
j_i ∈ {1, 2} is the spin index) is antisymmetric under any
interchange of a pair (j_i, p_i) and (j_k, p_k). Likewise,
N-antiparticle wave functions F^-_{k_1,...,k_N}(q_1, ..., q_N)
are antisymmetric. But a wave function F^{+,-}_{j,k}(p, q)
describing a particle-antiparticle pair need not have
any symmetry property, since a particle and an
antiparticle can be distinguished by their charge.

The relevant Fock space is therefore the tensor
product of two antisymmetric Fock spaces built over
the one-particle and one-antiparticle spaces
L²(R³, dp) ⊗ C². For later purposes, it is important
to view these spaces as the summands H₊ and H₋ of
the space H from the previous section. Thus, the
arena for the free Dirac field is the Hilbert space


$$ F_a(H) \simeq F_a(H_+) \otimes F_a(H_-) $$   [40]


where, for example,


$$ F_a(H) = \overline{\mathbb{C} \oplus H \oplus (H \otimes H)_a \oplus \cdots} $$   [41]


where the bar denotes the completion of the infinite
direct sum in the obvious inner product. The tensor
(1, 0, 0, ...) is viewed as the vacuum (the “filled
Dirac sea”) and denoted by Ω.

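To get around in Fock space, one uses the
creation and annihilation operators c^(*)(f), f ∈ H.
The creation operator c*(f), f ∈ H, is defined by
linear and continuous extension of its action on the
vacuum Ω and on elementary antisymmetric tensors,
recursively given by


$$ c^*(f)\Omega = f, \quad c^*(f) f_1 = f \wedge f_1, \quad \ldots, \quad c^*(f)\, f_1 \wedge \cdots \wedge f_N = f \wedge f_1 \wedge \cdots \wedge f_N, \ \ldots $$   [42]


Its adjoint, the annihilation operator c(f), satisfies


$$ c(f)\Omega = 0, \quad c(f) f_1 = (f, f_1)\,\Omega, \quad c(f)\, f_1 \wedge \cdots \wedge f_N = \sum_{j=1}^{N} (-1)^{j-1} (f, f_j)\, f_1 \wedge \cdots \wedge f_{j-1} \wedge f_{j+1} \wedge \cdots \wedge f_N, \ \ldots $$   [43]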


Accordingly, the operators c^(*)(f) satisfy the canonical
anticommutation relations (CARs) over H,


$$ \{c(f), c(g)\} = 0, \qquad \{c(f), c^*(g)\} = (f, g), \quad \forall f, g \in H $$   [44]


where {A, B} denotes the anticommutator AB + BA.
(From this, one readily deduces that c^(*)(f) is
bounded with norm ||f||.)
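
The CARs can be checked explicitly in a finite-dimensional toy model. The sketch below takes the one-particle space to be C² (so the antisymmetric Fock space is four-dimensional) and builds the two mode operators by a Jordan-Wigner type construction; this construction, the helper names, and the convention that the inner product (f, g) is antilinear in f are assumptions made for illustration only.

```python
def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    """Hermitian adjoint of a square matrix."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


A = [[0, 1], [0, 0]]    # single-mode annihilator: occupied -> empty
Z = [[1, 0], [0, -1]]   # sign factor enforcing anticommutation between modes
I2 = [[1, 0], [0, 1]]
MODES = [kron(A, I2), kron(Z, A)]  # annihilators c_1, c_2 on the 4-dim Fock space


def annihilator(f):
    """c(f) = sum_j conj(f_j) c_j, antilinear in f (assumed convention)."""
    n = len(MODES[0])
    return [[sum(f[j].conjugate() * MODES[j][r][c] for j in range(2))
             for c in range(n)] for r in range(n)]


def creator(f):
    """c*(f), the adjoint of c(f)."""
    return dagger(annihilator(f))


def anticommutator(a, b):
    """{a, b} = ab + ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + ba[i][j] for j in range(len(a))] for i in range(len(a))]


def inner(f, g):
    """Inner product (f, g) on C^2, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(f, g))
```

For sample vectors f and g, `anticommutator(annihilator(f), creator(g))` is `inner(f, g)` times the 4 × 4 identity, while `anticommutator(annihilator(f), annihilator(g))` vanishes.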

Next, recalling the direct sum decomposition [22], 
a notation change 


Pa ea a e Ta (25) 


is made, thus indicating that a^(*) and b^(*) should be
viewed as the creation/annihilation operators of
particles and antiparticles, respectively. Since H₊ and H₋
are copies of L²(R³, dp) ⊗ C², a given function
(f₁(p), f₂(p)) in the latter space can occur both as an
argument of a^(*)(·) and of b^(*)(·); it can also be
viewed as a test function for unsmeared
quantities a_j^(*)(p) and b_j^(*)(p), j = 1, 2, that are often
referred to as operators as well (even though they
are only quadratic forms). Thus, one has, for
example,


As explained shortly, the smeared time-zero Dirac
field takes the form


$$ \Phi(f) = a(P_+ f) + b^*(K P_- f), \quad f \in H $$   [47]


Here and below, K denotes complex conjugation on
H, H₊, and H₋. Just as the operators c^(*)(f), the
operators Φ^(*)(f) satisfy the CARs over H,


$$ \{\Phi(f), \Phi(g)\} = 0, \qquad \{\Phi(f), \Phi^*(g)\} = (f, g), \quad \forall f, g \in H $$   [48]

as is readily verified using [44]-[45]. But this
Φ-representation is not unitarily equivalent to the
c-representation [44]. This becomes clear in particular
from the consideration of a crucial type of
CAR automorphism that is considered next.

To this end, we fix a unitary operator U on H.
Then it is plain that the operators


$$ \tilde{c}(f) = c(Uf) $$   [49]

$$ \tilde{\Phi}(f) = \Phi(Uf) $$   [50]


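also satisfy the CARs. The CAR-algebra automorphism
c^(*)(f) → c̃^(*)(f) can be unitarily implemented in
F_a(H), since one has


$$ \tilde{c}^{(*)}(f) = \Gamma(U)\, c^{(*)}(f)\, \Gamma(U)^* $$   [51]


where Γ(U) denotes the Fock-space product operator
corresponding to U. Thus, for example,


$$ \Gamma(U)\Omega = \Omega, \quad \Gamma(U) f_1 = U f_1, \quad \ldots, \quad \Gamma(U)\, f_1 \wedge \cdots \wedge f_N = U f_1 \wedge \cdots \wedge U f_N, \ \ldots $$   [52]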
For the CAR automorphism Φ^(*)(f) → Φ̃^(*)(f) this is
not true, however. Rewriting it in terms of the
annihilation and creation operators a^(*) and b^(*) via
[47], it amounts to a linear transformation (Bogoliubov
transformation), whose unitary implementability was
clarified several decades ago. To be specific, the
necessary and sufficient condition for unitary implementability
is that the off-diagonal parts


$$ U_{+-} = P_+ U P_-, \qquad U_{-+} = P_- U P_+ $$   [53]


in the 2 × 2 matrix decomposition of operators on
H be Hilbert-Schmidt operators. Therefore, no
problem arises when U is diagonal with respect to 
this decomposition. Indeed, in that case one can
choose as unitary implementer the product operator


$$ \tilde{\Gamma}(U) = \Gamma(U_{++}) \otimes \Gamma(K U_{--} K) $$   [54]


(cf. the tensor product structure [40] of F_a(H)).
In particular, the automorphism


$$ \Phi(f) \mapsto \Phi(e^{itH} f) $$   [55]


where H is the free diagonalized Dirac Hamiltonian
[25], is implemented by the operator


$$ \tilde{\Gamma}(e^{itH}) = \Gamma(e^{itE}) \otimes \Gamma(e^{itE}) $$   [56]


where E denotes multiplication by E_p on H₊ and H₋.
The change of CAR representation, therefore, entails
that the unphysical negative energies of the one-particle
theory are replaced by positive energies of
antiparticles. Hence, we obtain a mathematically
precise version of Dirac’s hole theory substitution
b_j(p) → b_j*(p), b_j*(p) → b_j(p).

More generally, if one chooses for U the Poincaré 
group representation (given by [35] and [37]-[39]), 
then the Fock-space implementer [54] is the tensor 
product of two product operators with the same 
action on F_a(L²(R³, dp) ⊗ C²). Observe that this is
also true for the Fock-space version I'(T)=I°(T) of 
the time-reversal operator [34]. By contrast, the 
Fock-space parity operator I'(P) =I'(P) gives rise to 
two product operators with slightly different 
actions, cf. [33]. Accordingly, particles and anti- 
particles have opposite parity. 
The map

$$ \Phi(f) \mapsto \Phi(Cf)^* $$   [57]


also yields a CAR automorphism. It is unitarily 
implemented by the Fock-space charge-conjugation 


operator 
(G) 


which interchanges particles and antiparticles. 
Notice that the Fock-space charge-conjugation operator is
unitary, whereas the one-particle operator C is antiunitary.

It remains to establish the precise relation of the
above to the customary free Dirac field Ψ(t, x). This
is a quadratic form on F_a(H) given by


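$$ \Psi(t, x) = (2\pi)^{-3/2} \int_{\mathbb{R}^3} dp \sum_{j=1,2} \Bigl( a_j(p)\, w_{+,j}(p)\, e^{-itE_p + i x \cdot p} + b_j^*(p)\, w_{-,j}(p)\, e^{itE_p - i x \cdot p} \Bigr) $$   [59]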


(Its expectation (F₁, Ψ(t, x)F₂) is, for example, well
defined for F₁, F₂ in the dense subspace of F_a(H)
that consists of vectors with finitely many particles
and antiparticles and wave functions in Schwartz
space.) It satisfies the time-dependent Dirac equation


$$ i \partial_t \Psi = (-i \alpha \cdot \nabla + \beta m)\, \Psi $$   [60]


in the sense of quadratic forms. Furthermore,
smearing it with a function ψ(x) in the Hilbert
space H [7], we obtain


/ dxy(x) - Y(t, x) 
R? 

_ P(e” W- iy) 

= D (e p(w y) (e) weH [61] 
As announced, the time evolution of the free Dirac
field is, therefore, given by the unitary one-parameter
group [56], whose generator (the second-quantized
Dirac Hamiltonian) has spectrum
{0} ∪ [m, ∞).

The Dirac field Ψ(t, x) can also be smeared with a
test function F(t, x) in the Schwartz space S(R⁴)⁴,
yielding a bounded operator


$$ \Psi(F) = \int_{\mathbb{R}^4} dx\, F(x) \cdot \Psi(x) $$   [62]


Then one obtains the relativistic covariance relation 
['(U(a, L)) (FT (U(a, L)¥ = U(F*") [63] 
where 


FA" (x) = §(L-1)'F(L t(x — a)) [64] 
and U(a, L) denotes the Poincaré group representation
on H, cf. [35] and [37]-[39]. Likewise, one gets
the inversion formulas 


~ 


T(1)U(P)T()* = U(F;), I=P,T [65] 
with 


Fp(t, x)= UbEF(t, —x), 


Fr(t,x) =U!F(—t,x) [66] 


while the Fock-space charge-conjugation operator 
[58] transforms the Dirac field as 


CWU(F)C = (Fec) [67] 
with 


Fc(x) = UŁF(x) 68 


Finally, let us consider the global U(1) gauge
transformations f ↦ e^{iθ} f, where θ ∈ R and f ∈ H.
They can be implemented by


Kensie ore *) [69] 
and one has 
D(e?)U(F) (e) = U(Fy) [70] 
with 
Fi) eo [71] 


The generator Q of the one-parameter group
θ ↦ Γ̃(e^{iθ}) is the charge operator: on wave functions
describing N₊ particles and N₋ antiparticles, it has
eigenvalue N₊ − N₋.


More on the One-Particle Dirac Theory 


Even for the free one-particle setting, the account 
given earlier is far from complete. To begin with, 
the free Dirac equation admits a specialization to 
massless particles. In the Weyl representation of 
the y-algebra adopted above, the choice m=0 
entails that the p-space equation [12] decouples 
into two 2 x 2 equations for spinors that can be 
labeled by their chirality (“handedness”). This 
refers to their eigenvalue with respect to the 
chirality matrix 


$$ \gamma^5 = i \gamma^0 \gamma^1 \gamma^2 \gamma^3 = \begin{pmatrix} 1_2 & 0 \\ 0 & -1_2 \end{pmatrix} $$   [72]


and this notion derives from the noninvariance of 
the separate 2x2 equations under parity. (A 
positive-chirality spinor is mapped to a negative- 
chirality spinor under the parity operator P [33] and
vice versa.) Since the weak interaction breaks parity 
symmetry, the two 2 x 2 equations (often called 
Weyl equations) do have physical relevance. Indeed, 


the associated quantum fields are a crucial ingredi- 
ent of the standard model. 
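
The chirality matrix and the decoupling for m = 0 can be checked numerically. The sketch below assumes the Weyl-type matrices γ⁰ = [[0, 1₂], [1₂, 0]], γ^k = [[0, −σ_k], [σ_k, 0]] (our reading of the representation used above); it computes iγ⁰γ¹γ²γ³ and tests whether it anticommutes with every γ^μ. The helper names are illustrative.

```python
S1 = [[0, 1], [1, 0]]
S2 = [[0, -1j], [1j, 0]]
S3 = [[1, 0], [0, -1]]
I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]


def block(a, b, c, d):
    """Assemble a 4x4 matrix from 2x2 blocks [[a, b], [c, d]]."""
    return [a[i] + b[i] for i in range(2)] + [c[i] + d[i] for i in range(2)]


def neg(a):
    """Entrywise negative of a matrix."""
    return [[-x for x in row] for row in a]


def matmul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


# Assumed Weyl-type representation of the gamma matrices.
GAMMA = [block(Z2, I2, I2, Z2)] + [block(Z2, neg(s), s, Z2) for s in (S1, S2, S3)]


def chirality_matrix(gammas):
    """Return i * gamma^0 gamma^1 gamma^2 gamma^3."""
    prod = gammas[0]
    for g in gammas[1:]:
        prod = matmul(prod, g)
    return [[1j * x for x in row] for row in prod]


def anticommutes(a, b, tol=1e-12):
    """True if ab + ba vanishes entrywise."""
    ab, ba = matmul(a, b), matmul(b, a)
    return all(abs(ab[i][j] + ba[i][j]) < tol for i in range(4) for j in range(4))


GAMMA5 = chirality_matrix(GAMMA)
```

Since the chirality matrix is diagonal in this representation, a massless D(p) = α·p built from α^k = γ⁰γ^k is block diagonal with respect to the same splitting, which is the decoupling into two 2 × 2 equations described above.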

Next, we point out that it is possible to switch to 
a representation in which the gamma matrices are 
real. This so-called Majorana representation is 
convenient (but not indispensable) in the description 
of neutral spin-1/2 particles. By definition, such 
particles are equal to their antiparticles, so that the 
second-quantized formalism of the previous section 
must be adapted: one needs the neutral CAR algebra 
over H (also known as self-dual CAR). 

For various purposes, it is important to formulate 
the free Dirac equation for a spacetime whose 
spatial dimension is arbitrary. Then one needs, first 
of all, gamma matrices satisfying the (Minkowski) 
Clifford algebra relations 


$$\gamma^\mu\gamma^\nu + \gamma^\nu\gamma^\mu = 2g^{\mu\nu}, \qquad g = \mathrm{diag}(1, -1_n) \qquad [73]$$

where n is the space dimension and the minimal size
$\lambda \times \lambda$ of the gamma matrices is to be determined.

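Clearly, for n=1 and n=2, one can take $\lambda = 2$,
choosing, for example,

$$\gamma^0 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \gamma^1 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad \gamma^2 = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} \qquad [74]$$

to fulfill [73]. For n=4, one can take $\lambda = 4$, just
as for n=3, supplementing [1] with the matrix $i\gamma^5$,
cf. [72].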

More generally, for n = 2N − 1 and n = 2N, one
can take $\lambda = 2^N$ in [73]. Indeed, a representation on
the $2^N$-dimensional fermion Fock space $\mathcal{F}_a(\mathbb{C}^N)$
(cf. [41]) is readily constructed using the creation
and annihilation operators described in the previous
section. Once this has been taken care of, most
of the discussion on the free one-particle Dirac
equation in $\mathbb{R}^3$ can be easily generalized. Of special
importance in this regard is the straightforward
adaptation of the formulas [7]-[26], which form
the foundation for the second-quantized version.
Indeed, the discussion of the last section applies
nearly verbatim for arbitrary spacetime dimension.
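
The Fock-space construction of the gamma matrices can be made concrete. The following is a minimal sketch, assuming a Jordan-Wigner realization of the annihilation operators on $(\mathbb{C}^2)^{\otimes N} \simeq \mathcal{F}_a(\mathbb{C}^N)$ and the signature $g = \mathrm{diag}(1, -1_n)$ in the Clifford relations [73]. The helper names (`annihilators`, `gamma_matrices`, `clifford_defect`) and the particular choice of generators are illustrative assumptions, not the article's own construction.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def add(a, b, sign=1):
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(s, a):
    return [[s * x for x in row] for row in a]

def dagger(a):
    return [[complex(x).conjugate() for x in col] for col in zip(*a)]

def identity(d):
    return [[1 if i == j else 0 for j in range(d)] for i in range(d)]

I2 = identity(2)
Z = [[1, 0], [0, -1]]   # parity factor that enforces anticommutation between modes
A = [[0, 1], [0, 0]]    # single-mode annihilation operator

def annihilators(N):
    """Jordan-Wigner annihilation operators c_1..c_N on a 2^N-dimensional space."""
    ops = []
    for j in range(N):
        op = [[1]]
        for factor in [Z] * j + [A] + [I2] * (N - j - 1):
            op = kron(op, factor)
        ops.append(op)
    return ops

def gamma_matrices(n):
    """Return gamma^0..gamma^n of size 2^N, where n = 2N - 1 or n = 2N."""
    N = (n + 1) // 2
    e = []  # Hermitian generators with e_a e_b + e_b e_a = 2 delta_ab
    for c in annihilators(N):
        cd = dagger(c)
        e.append(add(c, cd))                   # c + c*
        e.append(scale(1j, add(c, cd, -1)))    # i (c - c*)
    if n == 2 * N:
        # One more generator is needed: the normalized product of all others.
        omega = identity(2 ** N)
        for g in e:
            omega = matmul(omega, g)
        e.append(scale(1j ** N, omega))
    # gamma^0 squares to +1, the spatial gammas to -1.
    return [e[0]] + [scale(1j, g) for g in e[1:]]

def clifford_defect(gammas):
    """Largest entry of gamma^mu gamma^nu + gamma^nu gamma^mu - 2 g^{mu nu}."""
    d = len(gammas[0])
    worst = 0.0
    for mu, gm in enumerate(gammas):
        for nu, gn in enumerate(gammas):
            anti = add(matmul(gm, gn), matmul(gn, gm))
            diag = 0 if mu != nu else (2 if mu == 0 else -2)
            for i in range(d):
                for j in range(d):
                    worst = max(worst, abs(anti[i][j] - (diag if i == j else 0)))
    return worst

if __name__ == "__main__":
    for n in range(1, 7):
        g = gamma_matrices(n)
        print(n, len(g[0]), clifford_defect(g))
```

For n = 3 and n = 4 the matrices come out 4 × 4, in line with the sizes quoted in the text; the representation obtained this way is generally not the Weyl representation used earlier, only an equivalent one.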

In several applications, the so-called Euclidean
version of the free Dirac theory in spacetime
dimension n + 1 is important. Basically, this version
is obtained upon replacing $i\partial_0$ by $\partial_{n+1}$ in the Dirac
equation, a substitution that changes the character
of the equation from hyperbolic to elliptic. Provided
that the mass vanishes, the Euclidean Dirac
equation admits a reinterpretation as a time-independent
zero-eigenvalue Weyl equation in a
Minkowski spacetime of dimension n + 2. (This
equation is often called the zero-mode equation.)




Let us now turn to the description of the
interaction with an external electromagnetic potential
$A_\mu(t, x)$. This can be taken into account via the
minimal substitution,

$$\partial_\mu \to \partial_\mu + ieA_\mu \qquad [75]$$

also known as the covariant derivative, in the time-dependent
Dirac equation [5].

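For the electron in the Coulomb field of a nucleus
of charge Ze, one has

$$eA_0(x) = -\frac{Ze^2}{4\pi|x|}, \qquad A_k(x) = 0, \quad k = 1, 2, 3 \qquad [76]$$

and the time-independent equation

$$\left(-i\alpha\cdot\nabla + \beta m - \frac{Ze^2}{4\pi|x|}\right)\psi = E\psi \qquad [77]$$
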


can be solved explicitly. This leads to a bound-state 
spectrum that is more accurate than its nonrelativis- 
tic counterpart. In particular, one finds that energy 
levels that are degenerate in the nonrelativistic 
theory split up into slightly different levels. The 
resulting fine structure of the Dirac levels can be 
understood as a consequence of the coupling 
between the spin of the electron and its orbital 
motion. 
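
The size of this splitting is easy to evaluate. The sketch below assumes the standard closed-form Dirac-Coulomb levels $E_{n,j} = m\,[1 + (Z\alpha/(n - \kappa + \sqrt{\kappa^2 - Z^2\alpha^2}))^2]^{-1/2}$ with $\kappa = j + 1/2$ and $\alpha = e^2/4\pi$ (a textbook result not written out in the text). The numerical constants are approximate, and the function name `dirac_coulomb_energy` is purely illustrative.

```python
import math

ALPHA = 1 / 137.035999   # fine-structure constant (approximate value)
M_E_EV = 510998.95       # electron rest energy in eV (approximate value)

def dirac_coulomb_energy(n, j, Z=1, alpha=ALPHA, m=M_E_EV):
    """Bound-state energy (rest energy included) for principal quantum
    number n and total angular momentum j, in units where hbar = c = 1."""
    kappa = j + 0.5
    gamma = math.sqrt(kappa ** 2 - (Z * alpha) ** 2)
    return m / math.sqrt(1 + (Z * alpha / (n - kappa + gamma)) ** 2)

if __name__ == "__main__":
    ground = dirac_coulomb_energy(1, 0.5) - M_E_EV
    split = dirac_coulomb_energy(2, 1.5) - dirac_coulomb_energy(2, 0.5)
    print("hydrogen ground-state binding energy (eV):", ground)
    print("n = 2 fine-structure splitting (eV):", split)
    print("leading-order estimate m alpha^4 / 32 (eV):", M_E_EV * ALPHA ** 4 / 32)
```

The n = 2 levels with j = 1/2 and j = 3/2, which coincide in the nonrelativistic theory, differ here by roughly $m\alpha^4/32$ for Z = 1.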

In spite of this better agreement with the 
experimental levels, the physical interpretation of 
the Dirac electron in a Coulomb field is enigmatic. 
This is not only because of the persistence of the 
negative-energy states of the free theory (which 
turn into scattering states), but also because of 
unphysical properties of the position operator. 
More general time-independent external fields
(such as step potentials $A_0(x)$ with a step height
larger than 2m) can cause transitions between
positive- and negative-energy states (Klein paradox).
This phenomenon is enhanced when time
dependence is allowed. In particular, any external
field that is given by functions in $C_0^\infty(\mathbb{R}^4)$ leads to
a scattering operator S on the one-particle space $\mathcal{H}$
[22] that has nonzero off-diagonal parts $S_{\pm\mp}$.
Hence, a positive-energy wave packet scattering
at such a time- and space-localized field has a
nonzero probability to show up as a negative-energy
wave packet.

When one tensors the one-particle space $\mathcal{H}$ with
an internal symmetry space $\mathbb{C}^k$, one can also
couple external Yang-Mills fields $A_\mu$ taking values
in the k × k matrices via the substitution [75].
(From a geometric viewpoint, this can be
rephrased as tensoring the spinor bundle with a
vector bundle equipped with a connection A.) The
generalization of this external gauge field coupling
to a Minkowski spacetime or Euclidean space of
arbitrary dimension is straightforward. An adaptation
of the resulting interacting one-particle Dirac
theory in arbitrary dimension to quite general
geometric settings also yields a crucial starting
point for index theory.

Before turning to the latter area, we conclude this 
section with another striking application of the one- 
particle framework, namely the massless Dirac 
equation in two spacetime dimensions with special 
external fields. Specifically, the relevant Dirac 
operator is of the form 

$$\begin{pmatrix} i\dfrac{d}{dx} & -iq(x) \\ ir(x) & -i\dfrac{d}{dx} \end{pmatrix} \qquad [78]$$

where r(x) and q(x) are not necessarily real valued.
(Note that this operator is in general not self- 
adjoint.) With suitable restrictions on r and q, 
the direct and inverse scattering theory associa- 
ted with the Dirac operator [78] can be applied 
to various nonlinear PDEs in two spacetime 
dimensions to solve their Cauchy problems in 
considerable detail. As a crucial special case, 
initial conditions yielding vanishing reflection 
give rise to soliton solutions for the pertinent 
equation. 

The first example in this framework was found by 
Zakharov and Shabat (the nonlinear Schrödinger 
equation); with other choices of r and q several other 
soliton PDEs (including the sine-Gordon and mod- 
ified Korteweg-de Vries equations) were handled by 
Ablowitz, Kaup, Newell, and Segur, who studied a 
quite general class of external fields r and q. 


The Dirac Operator and Index Theory 


Thus far, we have considered various versions
of the Dirac operator associated with the spaces
$\mathbb{R}^l$ for some $l \ge 1$. For applications in the area
of index theory, however, one needs to generalize
this base manifold. Indeed, one can define a Dirac
operator for any l-dimensional oriented Riemannian
manifold M that admits a spin structure.
This is a lifting of the transition functions of the
tangent bundle TM (which may be assumed to
take values in SO(l)) to the simply connected
twofold cover Spin(l) (taking $l \ge 3$).

Choosing first l = 2N + 1, the spin group has a
faithful irreducible representation on $\mathbb{C}^{2^N}$. Hence,
one obtains a $\mathbb{C}^{2^N}$-bundle over M, the spinor
bundle. The Levi-Civita connection on M derived
from the metric can now be lifted to a connection
on the spinor bundle. From the covariant derivative
corresponding to the spin connection and the
Clifford algebra generators $\gamma^1, \ldots, \gamma^l$, one can
then construct a first-order elliptic differential
operator that acts on sections of the spinor
bundle. (For the case $M = \mathbb{R}^{2N+1}$ with its Euclidean
metric, this construction yields the massless
positive-chirality Dirac operator acting on wave
functions with $2^N$ components, as considered
above.)

The massless Dirac operator thus obtained is
self-adjoint as an operator on the $L^2$-space $\mathcal{H}$
associated with the spinor bundle, and it has
infinite-dimensional positive and negative spectral
subspaces $\mathcal{H}_+$ and $\mathcal{H}_-$. (In this section the check
accent on position-space quantities is omitted.)
Specializing to the case of compact M, a continuous
map from M to $\mathbb{C}^*$ gives rise to a Fredholm
operator on $\mathcal{H}_+$, and more generally a continuous
map from M to GL(k, C) yields a Fredholm
operator on $\mathcal{H}_+ \otimes \mathbb{C}^k$.

For a smooth map, the Fredholm index of this 
operator can be written in terms of an integral over 
M involving certain closed differential forms. The 
value of this integral does not change when exact 
forms are added, since M has no boundary. Hence, 
one is dealing with de Rham cohomology classes. In 
this context, the class involved (“characteristic 
class”) is determined by the Riemann curvature 
tensor of M and the topological (“winding”) 
characteristics of the map. 

The simplest example of this state of affairs arises
for l = 1 and $M = S^1$ with its obvious spin structure
(periodic boundary conditions). Writing $\psi \in \mathcal{H} = L^2(S^1)$ as

$$\psi(z) = \sum_{n \in \mathbb{Z}} a_n z^n, \qquad z \in S^1 \qquad [79]$$

the Dirac operator H on $\mathcal{H}$ reads

$$H = z\frac{d}{dz} \qquad [80]$$

It has eigenfunctions $z^n$, $n \in \mathbb{Z}$. Thus, we may
choose

$$(P_+\psi)(z) = \sum_{n \ge 0} a_n z^n, \qquad (P_-\psi)(z) = \sum_{n < 0} a_n z^n \qquad [81]$$

As a consequence, the functions in $\mathcal{H}_+$ ($\mathcal{H}_-$) are
$L^2$-boundary values of holomorphic functions in
|z| < 1 (|z| > 1). Operators of the form

$$T_\varphi = P_+ M_\varphi P_+ \qquad [82]$$

where $\varphi$ is a continuous function on $S^1$ and $M_\varphi$
denotes multiplication by $\varphi$, are called Toeplitz
operators. It is not hard to see that they are Fredholm
(viewed as operators on $\mathcal{H}_+$), provided that $\varphi$ does not
vanish on $S^1$. (Recall a bounded operator B is
Fredholm if it has finite-dimensional kernel K and
cokernel C. Its Fredholm index is given by


$$\mathrm{index}(B) = \dim K - \dim C \qquad [83]$$

and is norm continuous and invariant under addition
of a compact operator.) Assuming $\varphi(S^1) \subset \mathbb{C}^*$
from now on, the curve $\varphi(S^1)$ has a well-defined
winding number $w(\varphi)$ with respect to the origin. The
equality

$$\mathrm{index}(T_\varphi) = -w(\varphi) \qquad [84]$$

between objects from the area of analysis on the
left-hand side and from the areas of topology and
geometry on the right-hand side is the simplest
example of an Atiyah-Singer type index formula.
When $\varphi$ is not only continuous but also smooth,
the index formula can be rewritten as

$$\mathrm{index}(T_\varphi) = -\frac{1}{2\pi i}\int_{S^1} \frac{d\varphi}{\varphi} \qquad [85]$$

yielding a characteristic class version.

It should be noted that the operator $M_\varphi$ on $\mathcal{H}$
has a bounded inverse $M_{1/\varphi}$ when $0 \notin \varphi(S^1)$, hence a
trivially vanishing index. Therefore, the compression
[82] involving the spectral projection of the
Dirac operator is needed to get a nonzero index.
Observe also that the equality [84] is quite easily
verified for the case $\varphi(z) = z^n$, since $T_\varphi$ yields a
power of the right (n > 0) or left (n < 0) shift on
$P_+\mathcal{H} \simeq \ell^2(\mathbb{N})$.
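
The monomial case can be spelled out in a few lines. The sketch below is illustrative only: it assumes that $P_+$ projects onto the span of $z^m$ with $m \ge 0$, so that the Toeplitz operator with symbol $z^n$ sends $z^m$ to $z^{m+n}$ when $m + n \ge 0$ and to zero otherwise. The function names are hypothetical. The winding number is obtained by accumulating phase increments along the unit circle, which is adequate only for symbols that are sampled finely enough.

```python
import cmath
import math

def winding_number(symbol, samples=4096):
    """Winding number about the origin of the closed curve symbol(S^1)."""
    total = 0.0
    prev = symbol(1 + 0j)
    for k in range(1, samples + 1):
        cur = symbol(cmath.exp(2j * math.pi * k / samples))
        total += cmath.phase(cur / prev)   # small phase increment between samples
        prev = cur
    return round(total / (2 * math.pi))

def toeplitz_index_monomial(n):
    """Index of the Toeplitz operator with symbol z^n on span{z^m : m >= 0}.

    n > 0: right shift by n, trivial kernel, cokernel spanned by 1, ..., z^(n-1).
    n < 0: left shift by |n|, kernel spanned by 1, ..., z^(|n|-1), trivial cokernel.
    """
    dim_kernel = max(-n, 0)
    dim_cokernel = max(n, 0)
    return dim_kernel - dim_cokernel

if __name__ == "__main__":
    for n in range(-3, 4):
        w = winding_number(lambda z, n=n: z ** n)
        print(n, toeplitz_index_monomial(n), -w)
```

For a general nonvanishing symbol the index is again minus the winding number, since the index is norm continuous and therefore unchanged under deformations of the symbol within nonvanishing functions.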

We proceed to the case of even-dimensional
manifolds, l = 2N. Then the fiber $\mathbb{C}^{2^N}$ of the spinor
bundle splits into a direct sum of even and odd
spinors, corresponding to two distinct representations
of Spin(2N) on $\mathbb{C}^{2^{N-1}}$. (Here it is assumed
that N > 3; recall the Lie algebra isomorphisms
so(4) ≃ so(3) ⊕ so(3) and so(6) ≃ su(4).) With respect
to this decomposition, the Dirac operator can be


written as

$$H = \begin{pmatrix} 0 & D^* \\ D & 0 \end{pmatrix} \qquad [86]$$


where D and D* are again first-order elliptic 
differential operators expressed in terms of Clifford 
algebra generators and the spin connection. Tensor- 
ing the spinor bundle with a vector bundle equipped 
with a connection A, one can define a Dirac 
operator on the tensor product which involves A 
and takes the form 


$$H_A = \begin{pmatrix} 0 & D_A^* \\ D_A & 0 \end{pmatrix} \qquad [87]$$

with respect to the even/odd spinor decomposition.
Once more, the index of $D_A$ (viewed as a Fredholm
operator between two different Hilbert spaces) can
be expressed as an integral over M involving
characteristic classes that depend on the curvatures
of the two connections.

Probably the simplest example of the constructions
just sketched is given by the torus $M = S^1 \times S^1$
with its flat metric. Employing the above coordinate
and spin structure on $S^1$, one can take


2/el çl 2 O a. 0 

M= (S XS JOC, al Te [88] 
Since the curvature vanishes, the index theorem for
this situation implies index(D) = 0. (Note that this is
also plain from [88]: both kernel and cokernel of D
are spanned by the constant sections.) On the other
hand, when one tensors the spinor bundle with a line
bundle with connection A, the index formula reads

$$\mathrm{index}(D_A) = \frac{1}{2\pi}\int_M F \qquad [89]$$

where F is the curvature 2-form corresponding to A.

The Atiyah-Singer index theorem for Dirac 
operators has far-reaching applications. It can be 
used to derive other results in this area, such as the 
Gauss-Bonnet-Chern theorem, the Hirzebruch sig- 
nature theorem, and (when M is a Kähler manifold) 
Riemann-Roch type theorems. From this, one can 
obtain information on various questions, such as the 
existence of positive scalar curvature metrics or 
zeros of vector fields on M. Other applications 
include insights on topological invariants of mani- 
folds obtained from “simple” manifolds (such as 
spheres and tori) by glueing or covering operations. 
This hinges on the additive properties of the index 
that are clear from its being given by an integral 
over the manifold. Conversely, the integrality of 
Fredholm indices can be used to deduce that certain 
rational cohomology classes are actually integral on 
manifolds that admit the structure that is required 
for the pertinent index theorem to apply, that 
certain manifolds do not admit such structures, 
since one knows that the relevant class is not 
integral, etc. 


More on the Dirac Field 


As mentioned earlier, the free-field formalism can be 
easily generalized to an arbitrary spacetime dimen- 
sion d. For d>4, however, no renormalizable 
interacting quantum field models involving the 
Dirac field are known. For the physical case d=4 
the standard model involves various Dirac fields
interacting with quantized gauge fields and Klein-Gordon
fields. Although its perturbation theory is
renormalizable, its mathematical existence is to date 
wide open. 

It is far beyond the scope of this article to 
elaborate on the analytical difficulties of relativistic 
quantum field theories, let alone those associated 
with the standard model. Even for d=2 and 3, a 
nonperturbative construction of interacting quan- 
tum field models involving the Dirac field is an 
extremely difficult enterprise. Apart from some 
rigorous results on certain self-interacting Dirac 
field models, the only interacting model that is 
reasonably well understood from the constructive 
field theory viewpoint is the Yukawa model for 
d=2 and 3. This describes the interaction between
the Dirac field $\Psi$ and a Klein-Gordon field $\phi$, the
interaction term being formally given by $g(\Psi^*\gamma^0\Psi)\phi$.

On the other hand, the interaction of the quantized 
Dirac field with external classical fields is much more 
easily understood and analytically controlled. As a 
bonus, within this context, one can make contact 
with various issues of physical and mathematical 
relevance. We now proceed to sketch the external- 
field framework and some of its applications. 

Let us first consider the addition of an external
field term gV(t, x) to the free Dirac operator H on

$$\mathcal{H} = L^2(\mathbb{R}^n, dx) \otimes \mathbb{C}^\lambda \otimes \mathbb{C}^k \qquad [90]$$

We assume from now on that the coupling g is real
and that V is a self-adjoint $k\lambda \times k\lambda$ matrix-valued
function on spacetime $\mathbb{R}^{n+1}$ with matrix elements
that are in $C_0^\infty(\mathbb{R}^{n+1})$. Then the (interaction picture)
scattering operator S exists. It is unitary and has off-diagonal
Hilbert-Schmidt parts $S_{\pm\mp}$, so that a
unitary Fock-space S-operator $\tilde\Gamma(S)$ implementing
the Bogoliubov transformation generated by S
exists:

$$\tilde\Gamma(S)\,\tilde\Phi(f)\,\tilde\Gamma(S)^* = \tilde\Phi(Sf), \qquad \forall f \in \mathcal{H} \qquad [91]$$

The arbitrary phase in $\tilde\Gamma(S)$ can be fixed by requiring
that the vacuum expectation value of $\tilde\Gamma(S)$ be
positive. More precisely, this number is generically
nonzero and satisfies

$$|(\Omega, \tilde\Gamma(S)\Omega)| = \det(1 + T_S)^{-1/2} \qquad [92]$$

where $T_S$ is a positive trace class operator determined
by S.

The vector $\tilde\Gamma(S)\Omega$ is a superposition of wave
functions with an equal and arbitrary number of
particles and antiparticles. More generally, the
Fock-space S-operator $\tilde\Gamma(S)$ leaves the subspaces of
$\mathcal{F}_a(\mathcal{H})$ with a fixed eigenvalue $q \in \mathbb{Z}$ of the charge
operator Q invariant, and can create and
annihilate an arbitrary number of particle-antiparticle
pairs.

The unitary propagator $U(T_1, T_2)$ corresponding
to V(t, x) does not have Hilbert-Schmidt off-diagonal
parts (unless the spacetime dimension is
sufficiently small and special external fields are
chosen). Even so, the diagonal parts are Fredholm
with vanishing index, and the off-diagonal parts are
compact. Omitting the ill-defined determinantal
factor, these properties imply that one obtains a
renormalized quadratic form $\tilde\Gamma_{\mathrm{ren}}(U(T_1, T_2))$ satisfying
the implementing relation

$$\tilde\Gamma_{\mathrm{ren}}(U(T_1, T_2))\,\tilde\Phi(f) = \tilde\Phi(U(T_1, T_2)f)\,\tilde\Gamma_{\mathrm{ren}}(U(T_1, T_2)), \qquad \forall f \in \mathcal{H} \qquad [93]$$

in the quadratic form sense.

The above unitary operators on H yield Fredholm 
diagonal parts whose indices vanish. (They are norm 
continuous in g and reduce to the identity for g = 0.) 
This is why their Fock-space implementers leave the 
charge sectors invariant. Indeed, for a unitary 
operator U on H with compact off-diagonal parts 
the implementer maps the charge-q sector to the
charge-(q + q(U)) sector, where

$$q(U) = \mathrm{index}(U_{--}) \qquad [94]$$
Specializing to the case 


m=2N-iL Ke? AS? [95] 


a unitary ($k\lambda \times k\lambda$)-matrix multiplier U on $\mathcal{H}$ does
not have compact off-diagonal parts in general. But
when it is of the form

$$U = \begin{pmatrix} 1 \otimes u_+(x) & 0 \\ 0 & 1 \otimes u_-(x) \end{pmatrix} \qquad [96]$$

with respect to the chiral decomposition (the
generalization of the $\gamma^5$-decomposition [72] to even
spacetime dimension), then it suffices for compactness
of the off-diagonal parts that the matrices
$u_\pm(x) \in U(k)$ are continuous and converge to $1_k$ for
$|x| \to \infty$.
Viewing $\mathbb{R}^{2N-1}$ as arising from $S^{2N-1}$ via stereographic
projection, the latter unitaries can be viewed
as continuous maps from $S^{2N-1}$ to U(k), reducing to
$1_k$ at the north pole. As such, they yield elements of
the homotopy group $\pi_{2N-1}(U(k))$. By virtue of Bott's
periodicity theorem, the latter group equals $\mathbb{Z}$ for
$k \ge N$. Thus, the maps $u_\pm$ have a well-defined
"winding number" $w(u_\pm) \in \mathbb{Z}$ for $k \ge N$. From the
index formula

$$\mathrm{index}(U_{--}) = w(u_+) - w(u_-) \qquad [97]$$

and [94] one now deduces that one can obtain
implementers $\tilde\Gamma_{\mathrm{ren}}(U)$ effecting a nonzero charge
change from unitary maps with nonzero winding
number.

In particular, choosing kK=\=2N-'>N, there 
exist quite special “kink maps” 


ucal) €U(A), €>0, ac RN! [98] 


with winding number 1 and such that the quadratic 
form implementers of the unitary multiplication 
operators 


Ù 1) &) iteal) 0 
se — 0 1, 2 1, 


9 1, @1) 0 
U sgn 
“a 0 1) © ücal ~x) 


converge to (a linear combination of the chiral
components of) the free Dirac field $\Psi(0, a)$ as the
kink size parameter $\epsilon$ goes to 0.

For the special case N = 1, one can take 


[99] 


y= g =E 


[100] 


eala) = x—a+ie 
and the off-diagonal parts of $U_{\pm,\epsilon,a}$ are actually
Hilbert-Schmidt. Thus, the implementers can be
chosen to be unitary operators. But to get convergence
to the Dirac field components $\Psi(0, a)$ as
$\epsilon \to 0$, the unitary implementers $\Gamma(U_{\pm,\epsilon,a})$ should be
renormalized by a multiplicative factor.

For the N=1 case, the unitary multipliers [96]
give rise to loop groups. Indeed, requiring

$$\lim_{x \to \pm\infty} u_\delta(x) = 1_k, \qquad \delta = +, - \qquad [101]$$

we are dealing with continuous maps $S^1 \to U(k)$.
From the viewpoint of the Dirac theory, these 
groups are local gauge groups. The convergence to 
the Dirac field just sketched can be used to great 
advantage to clarify the structure of the correspond- 
ing Fock-space gauge groups. Their Lie algebras 
yield representations of Kac-Moody algebras, a 
topic which is considered shortly. 

Before doing so, it should be pointed out that 
under some mild smoothness assumptions all of 
the above unitary matrix multipliers can also be 
viewed as S-operators associated with very special 
external fields. Indeed, the gauge-transformed Dirac 
operator 


Hy =U HU [102] 


is of the form 


[103] 


where V(x) is a self-adjoint $k\lambda \times k\lambda$ matrix on
$\mathbb{R}^{2N-1}$ (a "pure gauge" field). If one now defines a
time-dependent external field by 


V(t,x) = M 


then U equals the S-operator for V(t, x). (Equivalently,
U is the $t \to \infty$ wave operator for the time-independent
external field V(x).)

To conclude this section we sketch some applications
of the second-quantized Dirac formalism for
the special case N=1, m=0, and positive chirality.
Even though we could stick to the massless positive-chirality
Dirac operator −i d/dx on the line, it is
simpler and more natural to start from its counterpart
on the circle already considered in the last
section, cf. [80]. (Under the Cayley transform, the
positive- and negative-energy subspaces of −i d/dx
on $L^2(\mathbb{R})$ correspond to those of z d/dz on $L^2(S^1)$,
given by [81].) Letting $z = e^{i\theta}$, we then obtain


t>0 


Hr. [104] 


H = L? ([0, 27], dé) 


H = —id/d9, 
f [105] 
H= H= FN) Horz 


P (Z), 


and a corresponding Dirac field 


U A = (2r)? B q e itin eaae 
n=0 A 


(t,0) E€ R x [0,27] [106] 


where 


Gace), 20; bc e).1 x0 [107] 


and $\{e_n\}_{n \in \mathbb{Z}}$ is the canonical basis of $\ell^2(\mathbb{Z})$.

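Consider now the group $GL(\mathcal{H})$ of bounded
operators on $\mathcal{H}$ with bounded inverses. The
transformation

$$\Phi^*(f) \mapsto \Phi^*(Gf), \qquad \Phi(f) \mapsto \Phi(G^{*-1}f), \qquad f \in \mathcal{H}, \quad G \in GL(\mathcal{H}) \qquad [108]$$

leaves the CAR [48] invariant. Provided that G
belongs to the subgroup

$$G_2(\mathcal{H}) = \{G \in GL(\mathcal{H}) \mid G_{\pm\mp}\ \text{Hilbert-Schmidt}\} \qquad [109]$$

there exists an implementer $\Gamma(G)$ on $\mathcal{F}_a(\mathcal{H})$:

$$\Gamma(G)\Phi^*(f) = \Phi^*(Gf)\Gamma(G), \qquad \Gamma(G)\Phi(f) = \Phi(G^{*-1}f)\Gamma(G), \qquad \forall f \in \mathcal{H} \qquad [110]$$
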
In particular, the multiplication operator

$$\exp(h(x)), \qquad h(x) = \sum_{k=1}^{\infty} x_k z^k, \qquad z = e^{i\theta} \qquad [111]$$

belongs to $G_2(\mathcal{H})$ provided the sequence $x_k$ vanishes
sufficiently fast as $k \to \infty$. Thus, one obtains an
implementer $\Gamma(e^{h(x)})$, the so-called KP evolution
operator. This designation is justified by the vacuum
expectation value

$$\tau(x) = (\Omega, \Gamma(e^{h(x)})\Gamma(G)\Omega), \qquad G \in G_2(\mathcal{H}) \qquad [112]$$

being a tau-function solving the hierarchy of KP
evolution equations in Hirota bilinear form, as first
shown by Sato and his Kyoto school. For example,
the KP equation itself,

$$u_{yy} = \partial_x\left(\tfrac{4}{3}u_t - 2uu_x - \tfrac{1}{3}u_{xxx}\right) \qquad [113]$$

has the bilinear form

$$(D_1^4 + 3D_2^2 - 4D_1D_3)\,\tau \cdot \tau = 0 \qquad [114]$$

the relation being given by

$$x_1 = x, \qquad x_2 = y, \qquad x_3 = t, \qquad u = 2\partial_x^2 \ln\tau \qquad [115]$$


The class of solutions to [113] thus obtained 
includes not only the rational and soliton solutions 
(which correspond to choosing G as multiplication by
a rational function of $z = e^{i\theta}$ that does not vanish on
$S^1$), but also the finite-gap solutions associated with
compact Riemann surfaces. Moreover, for suitable 
subgroups of G2(H), one obtains tau-functions for 
related soliton hierarchies, including the Korteweg-de 
Vries, Boussinesq and Hirota—Satsuma hierarchies. 
Even though the class of solutions associated with 
G2(H) via the Dirac formalism is large, it should be 
noted that from the perspective of the Cauchy problem 
for the pertinent evolution equations the solutions are 
nongeneric, inasmuch as the initial data are real- 
analytic functions. 
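
A quick consistency check of the bilinear form is possible for the simplest soliton. The sketch below assumes that the bilinear KP equation reads $(D_1^4 + 3D_2^2 - 4D_1D_3)\,\tau\cdot\tau = 0$ and takes the standard one-soliton ansatz $\tau = 1 + \exp(\xi(p) - \xi(q))$ with $\xi(k) = kx + k^2y + k^3t$. Both the ansatz and the function names are illustrative assumptions rather than formulas from the text. For such a $\tau$ the Hirota operator acts on the exponential through its polynomial symbol, so the equation holds exactly when $a^4 + 3b^2 - 4ac = 0$ for the exponent $ax + by + ct$.

```python
from fractions import Fraction

def hirota_symbol(a, b, c):
    """Polynomial a^4 + 3 b^2 - 4 a c attached to D_1^4 + 3 D_2^2 - 4 D_1 D_3."""
    return a ** 4 + 3 * b ** 2 - 4 * a * c

def soliton_exponents(p, q):
    """Coefficients (a, b, c) of x, y, t in xi(p) - xi(q), xi(k) = k x + k^2 y + k^3 t."""
    return p - q, p ** 2 - q ** 2, p ** 3 - q ** 3

def one_soliton_defect(p, q):
    """Exact value of the Hirota symbol on the one-soliton exponents."""
    return hirota_symbol(*soliton_exponents(Fraction(p), Fraction(q)))

if __name__ == "__main__":
    for p, q in [(1, -1), (2, Fraction(1, 3)), (Fraction(5, 7), -4)]:
        print(p, q, one_soliton_defect(p, q))
```

Exact rational arithmetic is used so that a vanishing defect reflects the algebraic identity itself and not a rounding accident.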

Finally, we consider Lie algebra representations
related to the above special starting point [105] for
the second-quantized Dirac framework. Assume that
exp(tA) is a one-parameter group of bounded
operators on $\mathcal{H}$ with generator A in the Lie algebra
of $G_2(\mathcal{H})$,

$$g_2(\mathcal{H}) = \{A\ \text{bounded} \mid A_{\pm\mp}\ \text{Hilbert-Schmidt}\} \qquad [116]$$

Then one can take

$$\Gamma(\exp(tA)) = \exp(t\,d\Gamma(A)) \qquad [117]$$

where $d\Gamma(A)$ is the Fock-space operator uniquely
determined up to an additive constant by its
commutation relation

$$[d\Gamma(A), \Phi^*(f)] = \Phi^*(Af), \qquad \forall f \in \mathcal{H} \qquad [118]$$

with the smeared Dirac field $\Phi^*(f)$. Fixing the
constant by requiring

$$(\Omega, d\Gamma(A)\Omega) = 0 \qquad [119]$$

the map $A \mapsto d\Gamma(A)$ satisfies the Lie algebra relations

$$[d\Gamma(A), d\Gamma(B)] = d\Gamma([A, B]) + C(A, B)\mathbf{1} \qquad [120]$$

so that the term

$$C(A, B) = \mathrm{tr}(A_{-+}B_{+-} - B_{-+}A_{+-}) \qquad [121]$$

encodes a central extension of the Lie algebra $g_2(\mathcal{H})$
[116].

The developments sketched in the previous
paragraph are in fact independent of the specific
form of the Hilbert space $\mathcal{H}$ and its $\mathcal{H}_+/\mathcal{H}_-$
decomposition. But the special feature of the choice
[105] and its $S^1 \to \mathbb{R}$ analog is that the smeared
Dirac current

$$\int_0^{2\pi} d\theta\, \psi(\theta)\, {:}\Psi^*(0, \theta)\Psi(0, \theta){:}, \qquad \psi \in C^\infty(S^1) \qquad [122]$$

(where the double dots denote normal ordering, that is, the
replacement of terms involving $b_n b_n^*$ by $-b_n^* b_n$) is of
the form $d\Gamma(A_\psi)$ with $A_\psi \in g_2(\mathcal{H})$ determined by $\psi$.
(For spacetime dimension d > 2, this is no longer
true, as the Hilbert-Schmidt condition is violated.)
Moreover, [120] reduces to

$$[d\Gamma(A_\psi), d\Gamma(A_\phi)] = C(A_\psi, A_\phi)\mathbf{1} \qquad [123]$$

with the central extension explicitly given by

$$C(A_\psi, A_\phi) = \frac{i}{2\pi}\int_0^{2\pi} d\theta\, \psi'(\theta)\phi(\theta) \qquad [124]$$


We have just sketched the details of the (simplest 
version of the) Dirac current algebra: the term [124] 
is commonly known as the Schwinger term, so that 
the central extension featuring in [120]-[121] may 
be viewed as a generalization. The above setup can 
also be slightly generalized so as to obtain repre- 
sentations of the Virasoro algebra, which is a central 
extension of the Lie algebra of polynomial vector
fields on $S^1$. The general framework has a quite
similar version for the neutral Dirac field (Majorana 
field), described in terms of the self-dual CAR 
algebra. In the neutral setting, one can construct 
the Neveu-Schwarz and Ramond representations of 
the Virasoro algebra, which are crucial in string 
theory. 
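
The Schwinger term can be checked by hand for Laurent polynomial test functions. The sketch below rests on several assumptions that are not spelled out in the text: the current smeared with a function is identified with multiplication by that function on $L^2(S^1)$ in the basis $z^n$, the projection $P_+$ selects $n \ge 0$, the cocycle is $\mathrm{tr}(A_{-+}B_{+-} - B_{-+}A_{+-})$, and the explicit formula is read as $(i/2\pi)\int_0^{2\pi}\psi'(\theta)\phi(\theta)\,d\theta$. The function names are illustrative. Test functions are given as dictionaries mapping a power m to the coefficient of $z^m$.

```python
def cocycle_trace(a, b, cutoff):
    """tr(A_{-+} B_{+-} - B_{-+} A_{+-}) for multiplication operators A, B.

    The matrix element of multiplication by sum_m a[m] z^m between z^k
    (column) and z^j (row) is a[j - k]. Rows and columns with j < 0 belong
    to H_-, those with k >= 0 to H_+. The sums are truncated at |index| <=
    cutoff, which is exact once cutoff exceeds the degrees involved.
    """
    total = 0
    for j in range(-cutoff, 0):
        for k in range(0, cutoff + 1):
            total += a.get(j - k, 0) * b.get(k - j, 0)
            total -= b.get(j - k, 0) * a.get(k - j, 0)
    return total

def cocycle_integral(a, b):
    """(i / 2 pi) * integral of psi'(theta) phi(theta) over [0, 2 pi].

    With psi = sum_m a[m] e^{i m theta} and phi = sum_m b[m] e^{i m theta}
    only the products a[m] b[-m] survive the integration, and the result
    is -sum_m m a[m] b[-m].
    """
    return -sum(m * coeff * b.get(-m, 0) for m, coeff in a.items())

if __name__ == "__main__":
    psi = {2: 1, -1: 3}
    phi = {-2: 5, 1: 7, 0: 2}
    print(cocycle_trace(psi, phi, 6), cocycle_integral(psi, phi))
    for m in range(1, 5):
        print(m, cocycle_trace({m: 1}, {-m: 1}, 8))
```

Under these assumptions the pair $z^m$, $z^{-m}$ yields the value −m, which is the familiar Heisenberg-type commutator of current Fourier modes up to the sign convention.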

Tensoring $\mathcal{H}$ with an internal symmetry space $\mathbb{C}^k$
and starting from the Lie algebra of rational maps
$S^1 \to sl(k, \mathbb{C})$, $z \mapsto M(z)$, with poles occurring solely
at z = 0 and z = ∞ (regarded as multiplication
operators on $L^2(S^1)^k$), the Fock-space counterparts
obtained via the $d\Gamma$-operation yield representations
of the Kac-Moody Lie algebra $A_{k-1}^{(1)}$. Specifically, on
the charge-0 sector of $\mathcal{F}_a(\mathcal{H})$, one obtains the so-called
basic representation, whereas the charge-q
sectors with q = 1, ..., k − 1, yield the fundamental
representations. Using the neutral version of Dirac's
second quantization, one can also obtain the
basic and a fundamental representation of the
Kac-Moody algebras $B_l^{(1)}$ (for k = 2l + 1) and $D_l^{(1)}$
(for k = 2l).


See also: Bosons and Fermions in External Fields; 
Clifford Algebras and Their Representations; Current 
Algebra; Dirac Fields in Gravitation and Nonabelian 
Gauge Theory; Gerbes in Quantum Field Theory; 
Holonomic Quantum Fields; Index Theorems; Quantum 
Field Theory in Curved Spacetime; Quantum 
Chromodynamics; Random Walks in Random 
Environments; Relativistic Wave Equations Including 
Higher Spin Fields; Solitons and Kac—Moody Lie 
Algebras; Spinors and Spin Coefficients; Symmetry 
Classes in Random Matrix Theory. 


Further Reading 


Bjorken JD and Drell SD (1964) Relativistic Quantum Mechanics. 
New York: McGraw-Hill. 

Carey AL and Ruijsenaars SNM (1987) On fermion gauge 
groups, current algebras and Kac—Moody algebras. Acta 
Applicandae Mathematicae 10: 1-86. 

Date E, Jimbo M, Kashiwara M, and Miwa T (1983) Transfor- 
mation groups for soliton equations. In: Jimbo M and Miwa T 
(eds.) Proceedings of RIMS Symposium, Nonlinear Integrable 
Systems — Classical Theory and Quantum Theory, pp. 39-119. 
Singapore: World Scientific. 

Dirac PAM (1928) The quantum theory of the electron. 
Proceedings of the Royal Society of London. Series A 117: 
610-624. 

Dirac PAM (1928) The quantum theory of the electron, II. 
Proceedings of the Royal Society of London. Series A 118: 
351-361. 

Glimm J and Jaffe A (1981) Quantum Physics. New York: 
Springer. 

Itzykson C and Zuber JB (1980) Quantum Field Theory. New York: 
McGraw-Hill. 

Pressley A and Segal G (1986) Loop Groups. Oxford: Clarendon. 

Rose ME (1961) Relativistic Electron Theory. New York: Wiley. 

Ruijsenaars SNM (1989) Index formulas for generalized Wiener- 
Hopf operators and boson-fermion correspondence in 2N 
dimensions. Communications in Mathematical Physics 124: 
553-593. 

Schweber SS (1961) An Introduction to Relativistic Quantum 
Field Theory. Evanston, IL: Row-Peterson. 

Streater RF and Wightman AS (1964) PCT, Spin and Statistics, 
and All That. New York: Benjamin. 

Thaller B (1992) The Dirac Equation. New York: Springer. 


Dispersion Relations 


J Bros, CEA/DSM/SPhT, CEA/Saclay, Gif-sur-Yvette, 
France 


© 2006 Elsevier Ltd. All rights reserved. 


Introduction 


Dispersion relations constitute a basic chapter of
mathematical physics which covers various types of
classical and quantum scattering phenomena and
illustrates in a typical way the importance of general
principles in theoretical physics, among which
causality plays a major role. Each such phenomenon
is described in terms of a scattering amplitude $F(\omega)$,
which is a complex-valued function of a frequency
variable $\omega$; in quantum physics, this variable
becomes an energy variable called E (or s in particle
physics), as follows from the fundamental de
Broglie relation $E = \hbar\omega$. The real and imaginary
parts of $F(\omega)$, which are called respectively the
dispersive part $D(\omega)$ and the absorptive part $A(\omega)$ of
F, have well-defined physical interpretations for all
these phenomena; they represent quantities which
are essentially accessible to measurements. The term
dispersion relations refers to linear integral equations
which relate the functions $D(\omega)$ and $A(\omega)$; such
integral equations are always closely related to the
Cauchy integral representation of an underlying holomorphic
function $F(\omega^{(c)})$ of the complexified frequency
(or energy) variable $\omega^{(c)}$. $F(\omega^{(c)})$ is called the
holomorphic scattering function or in short the
scattering function, and the scattering amplitude
appears as the boundary value of the latter, taken at
positive real values of $\omega$ from the upper half-plane of
$\omega^{(c)}$, namely

$$F(\omega) = \lim_{\epsilon \to 0,\ \epsilon > 0} F(\omega + i\epsilon)$$

Historically, the first relations of that type to be
obtained were the Kramers-Kronig relations (1926),
which concern the propagation of light in a
dielectric medium. In this basic example, $F(\omega)$
represents the complex refractive index of the
medium $n'(\omega) = n(\omega) + i\kappa(\omega)$ for a monochromatic
wave with frequency $\omega$. The dispersive part $D(\omega)$ is
the real refractive index $n(\omega)$, which is the inverse
ratio of the phase velocity of the wave in the
medium to its velocity c in the vacuum: the fact that
it depends on the frequency $\omega$ corresponds precisely
to the phenomenon of dispersion of light in a
dielectric medium. A slab of the latter thus appears
as a prototype of a macroscopic scatterer. The
absorptive part $A(\omega)$ is the rate of exponential
damping $\kappa(\omega)$ of the wave, caused by the absorption
of energy in the medium.

It has appeared much later that for many 
scattering phenomena, dispersion relations can be 
derived from an appropriate set of general physical 
principles. This means that inside a certain axio- 
matic framework these relations are model indepen- 
dent with respect to the detailed structure of the 
scatterer or to the detailed type of particle interac- 
tion in the quantum case. 

In a very short and oversimplifying way, the
following logical scheme holds. At first, one can say
that any mathematical formulation of a physical
principle of causality results in support-type properties
with respect to a time variable t of an
appropriate "causal structural function" R(t) of the
physical system considered: typically, such a causal
function should vanish for negative values of t. It
follows that its Fourier transform $\tilde R$ admits an
analytic continuation $\tilde R^{(c)}$ in the upper half-plane
of the corresponding conjugate variable, interpreted
as a frequency (or an energy in the quantum case):
here is the general reason for the occurrence of
complex frequencies and of holomorphic functions
of such variables. In fact, the relevant holomorphic
scattering function $F(\omega^{(c)})$ always appears as generated
by $\tilde R^{(c)}$ via some (more or less sophisticated)
procedure: in the simplest case, F coincides with $\tilde R^{(c)}$
itself, but this is not so in general. Finally, the
derivation of suitable analyticity and boundedness
properties of $F(\omega^{(c)})$ in a domain whose typical form
is the upper half-plane allows one to apply a
Cauchy-type integral representation to this function;
the dispersion relations directly follow from the
latter.
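
The logical scheme can be illustrated numerically in the simplest case where F coincides with the Fourier transform of the causal function. The sketch below assumes a toy causal response $R(t) = e^{-\gamma t}$ for $t \ge 0$ (and zero for $t < 0$), whose transform $F(\omega) = \int_0^\infty e^{i\omega t}R(t)\,dt = 1/(\gamma - i\omega)$ is holomorphic and decaying in the upper half-plane. The response, the value of `GAMMA`, and the function names are illustrative choices, not taken from the article. The code then checks the dispersion relation $\mathrm{Re}\,F(\omega) = \frac{1}{\pi}\,\mathrm{P}\!\int \frac{\mathrm{Im}\,F(\omega')}{\omega' - \omega}\,d\omega'$.

```python
import math

GAMMA = 1.0  # damping rate of the toy causal response R(t) = exp(-GAMMA * t), t >= 0

def scattering_function(omega):
    """Fourier transform of the causal response: 1 / (GAMMA - i omega)."""
    return 1.0 / (GAMMA - 1j * omega)

def dispersive_from_absorptive(omega, steps=20000):
    """(1/pi) times the principal-value integral of Im F(w') / (w' - omega).

    The singularity is removed by pairing w' = omega + s with w' = omega - s.
    The substitution s = tan(u) maps (0, infinity) to (0, pi/2), and the
    resulting smooth integrand is handled with the midpoint rule.
    """
    h = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        s = math.tan(u)
        odd_part = (scattering_function(omega + s).imag
                    - scattering_function(omega - s).imag)
        total += odd_part / s / math.cos(u) ** 2 * h
    return total / math.pi

if __name__ == "__main__":
    for omega in (0.0, 0.7, 3.0):
        print(omega,
              scattering_function(omega).real,
              dispersive_from_absorptive(omega))
```

Replacing the response by one that does not vanish for negative t destroys the analyticity in the upper half-plane, and the two printed columns then no longer need to agree.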

The first part of this article aims to describe the 
most typical dispersion relations and their link 
with the Cauchy integral. It then presents two 
basic illustrations of these relations, which are: (1) 
in classical physics, the Kramers—Kronig relations 
mentioned above, and (2) in quantum physics, the 
dispersion relations for the forward scattering of 
equal-mass particles. The aim of the subsequent 
parts is to give as complete as possible accounts of 
the derivation of the relevant analyticity domains 
inside appropriate axiomatic frameworks which, 
respectively, contain the previous two examples. 
The simplest axiomatic framework is the one 
which governs all the phenomena of linear 
response: in the latter, the proof of analyticity 
and dispersion relations most easily follows the 
logical line sketched above. It will be presented 
together with its application to the derivation of 
the Kramers-Kronig relations. The rest of the
article is devoted to the derivation of the so-called 
crossing analyticity domains which are the relevant 
background of dispersion relations for the two- 
particle scattering (or collision) amplitudes in 
particle physics. This derivation relies on the 
general axiomatic framework of relativistic quan- 
tum field theory (QFT) (see Axiomatic Quantum 
Field Theory) and more specifically on the “analy- 
tic program in complex momentum space” of the 
latter. This framework, whose rigorous mathematical
form was settled around 1960, represents
the safest conceptual approach for describing
particle collision processes in a range of energies
which covers by far all those that can be produced,
and will be produced, in accelerators for
several decades. A simple account of the field- 
theoretical axiomatic framework and of the logical 
line of the derivation of dispersion relations will 
be presented here for the simplest kinematical 
situations. A broader presentation of the analytic 
program including an extended class of analyticity 
properties for the general structure functions and 
(two-particle and multiparticle) collision ampli- 
tudes in QFT can also be found in this encyclope- 
dia (see Scattering in Relativistic Quantum Field 
Theory: The Analytic Program). For brevity, we 
shall not treat here the derivation of dispersion 
relations in the framework of nonrelativistic 
potential theory. Concerning the latter, the inter- 
ested reader can refer to the book by Nussenzweig 
(1972). A collection of old basic papers on field- 
theoretical dispersion relations can be found in the 
review book edited by Klein (1961). For a recent 
and well-documented review of the multiplicity of 
versions and applications of dispersion relations 
and their experimental checking, the reader can 
consult the article by Vernov (1996). 


Typical Dispersion Relations 


The possibility of defining the scattering function
$F(\omega^{(c)})$ in the full upper half-plane, and of
exploiting its boundary value $F(\omega)$ on the
negative part as well as on the positive part of the
real axis, depends on the framework of the phenomena
considered. For the moment, we do not consider the
more general situations which also occur in particle
physics and will be described later (“crossing
domains” and “quasi-dispersion-relations”).

In the simplest cases, the real and imaginary parts
$D$ and $A$ of $F$ are extended to negative values of
the variable $\omega$ via additional symmetry relations
resulting from appropriate “reality conditions.” A
typical and basic example is the symmetry relation
$F(\omega^{(c)}) = \overline{F(-\overline{\omega^{(c)}})}$
(with $\omega^{(c)}$ and $-\overline{\omega^{(c)}}$ in the
upper half-plane), and correspondingly
$D(\omega) = D(-\omega)$, $A(\omega) = -A(-\omega)$ on the
reals; we shall call (S) this symmetry relation.

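The simplest case of dispersion relations is then
obtained when $D$ and $A$ are linked by the reciprocal
Hilbert transformations:

$$ D(\omega) = \frac{1}{\pi}\, P\!\int_{-\infty}^{+\infty} \frac{A(\omega')}{\omega' - \omega}\, d\omega' \qquad \text{[1a]} $$

$$ A(\omega) = -\frac{1}{\pi}\, P\!\int_{-\infty}^{+\infty} \frac{D(\omega')}{\omega' - \omega}\, d\omega' \qquad \text{[1b]} $$

where $P$ denotes Cauchy’s principal value, defined
for any differentiable function $\varphi(x)$
(sufficiently regular at infinity) by

$$ P\!\int_{-\infty}^{+\infty} \frac{\varphi(x)}{x}\, dx = \lim_{\varepsilon \to 0} \left[ \int_{-\infty}^{-\varepsilon} + \int_{\varepsilon}^{+\infty} \right] \frac{\varphi(x)}{x}\, dx \qquad \text{[2]} $$

As a matter of fact, the pair of equations [1a], [1b] is
equivalent to the following relation for $F = D + iA$:

$$ F(\omega) = \frac{1}{i\pi}\, P\!\int_{-\infty}^{+\infty} \frac{F(\omega')}{\omega' - \omega}\, d\omega' \qquad \text{[3]} $$

The latter is obtained as a limiting case of the
Cauchy formula

$$ F(\omega^{(c)}) = \frac{1}{2i\pi} \int_{-\infty}^{+\infty} \frac{F(\omega')}{\omega' - \omega^{(c)}}\, d\omega' \qquad \text{[4]} $$

expressing the fact that $F$ is holomorphic and
sufficiently decreasing at infinity in the upper
half-plane $\mathcal{I}_+$ of the complex variable
$\omega^{(c)}$ and that $F(\omega)$ is the boundary value
of $F(\omega^{(c)})$ on all the reals.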
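
The limiting procedure that leads from the Cauchy formula [4] to the real-axis relation for $F = D + iA$ can be sketched as follows. This is a heuristic outline, not part of the original argument: it assumes that $F$ is regular enough on the real axis for the standard distributional limit of the Cauchy kernel to apply, and it writes $\omega^{(c)} = \omega + i\eta$ with $\eta > 0$.

```latex
% Distributional limit of the Cauchy kernel as \eta \to 0^+ :
\lim_{\eta \to 0^+} \frac{1}{\omega' - \omega - i\eta}
   = P\,\frac{1}{\omega' - \omega} + i\pi\,\delta(\omega' - \omega)

% Inserting this limit into the Cauchy formula gives
F(\omega) = \frac{1}{2i\pi}\, P\!\int_{-\infty}^{+\infty}
            \frac{F(\omega')}{\omega' - \omega}\, d\omega' + \frac{1}{2}\,F(\omega)

% and therefore, after moving F(\omega)/2 to the left-hand side,
F(\omega) = \frac{1}{i\pi}\, P\!\int_{-\infty}^{+\infty}
            \frac{F(\omega')}{\omega' - \omega}\, d\omega'
```

Taking the real and imaginary parts of the last line, with $1/i = -i$, reproduces the two Hilbert relations [1a] and [1b].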

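Finally, one checks that in view of the symmetry
relation (S), the Hilbert integral relations between
$D$ and $A$ given above reduce to the following
dispersion relations:

$$ D(\omega) = \frac{2}{\pi}\, P\!\int_{0}^{+\infty} \frac{\omega' A(\omega')}{\omega'^2 - \omega^2}\, d\omega' \qquad \text{[5a]} $$

$$ A(\omega) = -\frac{2\omega}{\pi}\, P\!\int_{0}^{+\infty} \frac{D(\omega')}{\omega'^2 - \omega^2}\, d\omega' \qquad \text{[5b]} $$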
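
The Hilbert relations [1a] and [1b] can be checked numerically on a simple example. The sketch below is only an illustration under stated assumptions: the test function $F(\omega) = i/(\omega + i)$ is our own choice (it is holomorphic and decreasing in the upper half-plane and satisfies (S)), and the helper name `hilbert_pv` and the midpoint quadrature are hypothetical, not taken from the article.

```python
import math

def D_exact(w):
    """Real part of the assumed test function F(w) = i / (w + i)."""
    return 1.0 / (1.0 + w * w)

def A_exact(w):
    """Imaginary part of the same test function."""
    return w / (1.0 + w * w)

def hilbert_pv(func, w, n=20000):
    """Approximate (1/pi) * P int func(w') / (w' - w) dw' over the real line.

    The principal value is rewritten as the regular integral
    int_0^inf [func(w + x) - func(w - x)] / x dx, mapped onto (0, 1)
    by x = u / (1 - u), and evaluated with the midpoint rule.
    """
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n
        x = u / (1.0 - u)
        jacobian = 1.0 / (1.0 - u) ** 2
        total += (func(w + x) - func(w - x)) / x * jacobian
    return total / (n * math.pi)

w = 0.7
d_from_a = hilbert_pv(A_exact, w)     # right-hand side of [1a]
a_from_d = -hilbert_pv(D_exact, w)    # right-hand side of [1b]
print(d_from_a, D_exact(w))
print(a_from_d, A_exact(w))
```

Each printed pair should agree to within the quadrature error. Because the chosen $D$ is even and $A$ is odd, the same numbers also satisfy the half-line forms of the dispersion relations.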


Two Basic Examples 


1. The Kramers-Kronig relation in classical optics
It will be shown in the next part that the complex
refractive index $n'(\omega) = n(\omega) + i\kappa(\omega)$
of a dielectric medium is the boundary value of a
holomorphic function $n'(\omega^{(c)})$ in $\mathcal{I}_+$
satisfying the symmetry relation (S), and such that the
integral $\int_{-\infty}^{+\infty} |n'(\omega + i\eta) - 1|^2\, d\omega$
is uniformly bounded for all $\eta > 0$.

It follows that all the previous relations are
satisfied by the function
$F(\omega^{(c)}) = n'(\omega^{(c)}) - 1$.
In particular, the real refractive index $n$ and the
“extinction coefficient” $\beta(\omega) = 2\omega\kappa(\omega)/c$
($c$ being the velocity of light in the vacuum) are
linked by the following Kramers-Kronig dispersion
relation (corresponding to eqn [5a]):

$$ n(\omega) - 1 = \frac{c}{\pi}\, P\!\int_{0}^{+\infty} \frac{\beta(\omega')}{\omega'^2 - \omega^2}\, d\omega' \qquad \text{[6]} $$
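
As an illustration of the Kramers-Kronig relation [6], the following sketch rebuilds the real refractive index from the extinction coefficient for a model medium. Everything specific here is an assumption made for the example: the single-resonance form chosen for $n'(\omega) - 1$, the parameter values `A0`, `W0`, `GAMMA`, the units with $c = 1$, and the helper names. The singularity is removed by using the fact that $P\int_0^{\infty} d\omega'/(\omega'^2 - \omega^2) = 0$ for $\omega > 0$.

```python
import math

C = 1.0                            # velocity of light (units with c = 1, an assumption)
A0, W0, GAMMA = 0.05, 1.0, 0.1     # hypothetical strength, resonance, damping

def n_complex(w):
    """Assumed model: n'(w) - 1 = A0 / (W0^2 - w^2 - 2i*GAMMA*w)."""
    return 1.0 + A0 / complex(W0 * W0 - w * w, -2.0 * GAMMA * w)

def beta(w):
    """Extinction coefficient beta(w) = 2 w kappa(w) / c, with kappa = Im n'."""
    return 2.0 * w * n_complex(w).imag / C

def kk_real_index(w, n=100000):
    """Right-hand side of [6]: (c/pi) * P int_0^inf beta(w') / (w'^2 - w^2) dw'.

    Subtracting beta(w) in the numerator leaves the principal value unchanged
    and makes the integrand regular at w' = w. The half-line is mapped onto
    (0, 1) by w' = u / (1 - u) and integrated with the midpoint rule.
    """
    beta_w = beta(w)
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n
        x = u / (1.0 - u)
        jacobian = 1.0 / (1.0 - u) ** 2
        total += (beta(x) - beta_w) / (x * x - w * w) * jacobian
    return C * total / (n * math.pi)

w = 1.3
print(kk_real_index(w), n_complex(w).real - 1.0)
```

The two printed numbers should coincide up to the quadrature error, since the model function is holomorphic in the upper half-plane, decreases like $1/\omega^2$, and satisfies (S).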
2. Dispersion relation for the forward two-particle
scattering amplitude in relativistic quantum physics
One considers the following collision phenomenon in
particle physics. A particle $\Pi_1$ with mass $m$,
called the target and sitting at rest in the
laboratory, is struck by an identical particle $\Pi_2$
with relativistic energy $\omega$ larger than $m$
($= mc^2$; in high-energy physics, one usually chooses
units such that $c = 1$). After the collision, the
particle $\Pi_2$ is scattered in all possible
directions, $\theta$, of space, according to a certain
quantum scattering amplitude $T_\theta(\omega)$, whose
modulus is essentially the rate of probability for
detecting $\Pi_2$ in the direction $\theta$. The forward
scattering amplitude $T_0(\omega)$ corresponds to the
detection of $\Pi_2$ in the forward longitudinal
direction with respect to its incidence direction
towards the target. Let us also assume that the
particles carry no charge of any kind, so that each
particle coincides with its “antiparticle.” In that
case, $T_0(\omega)$ is shown to be the boundary value of
a scattering function $T_0(\omega^{(c)})$ enjoying the
following properties:

1. it is a holomorphic function in $\mathcal{I}_+$
satisfying the symmetry relation (S);

2. its behavior at infinity in $\mathcal{I}_+$ is such
that the integral

$$ \int_{-\infty}^{+\infty} \left| \frac{T_0(\omega + i\eta)}{(\omega + i\eta)^2} \right|^2 d\omega $$

is uniformly bounded for all $\eta > 0$; and

3. under more specific assumptions on the mass
spectrum of the underlying theory, the “absorptive
part” $A(\omega) = \mathrm{Im}\, T_0(\omega)$ vanishes for
$|\omega| < m$.

Then by applying eqn [5a] to the function
$D(\omega) = \mathrm{Re}[(T_0(\omega) - T_0(0))/\omega^2]$
(regular at $\omega = 0$), one obtains the following
dispersion relation:

$$ \mathrm{Re}\, T_0(\omega) = T_0(0) + \frac{2\omega^2}{\pi}\, P\!\int_{m}^{+\infty} \frac{A(\omega')}{\omega'(\omega'^2 - \omega^2)}\, d\omega' \qquad \text{[7]} $$
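
The step from the unsubtracted relation [5a] to the subtracted relation for $\mathrm{Re}\, T_0$ can be written out explicitly. This is only a sketch under the properties (1)-(3) listed above; the auxiliary name $A_{\mathrm{sub}}$ is introduced here for convenience, and $T_0(0)$ is real because of (S).

```latex
% Function to which [5a] is applied, and its imaginary part:
D(\omega) = \mathrm{Re}\,\frac{T_0(\omega) - T_0(0)}{\omega^2}, \qquad
A_{\mathrm{sub}}(\omega) = \mathrm{Im}\,\frac{T_0(\omega) - T_0(0)}{\omega^2}
                         = \frac{A(\omega)}{\omega^2}
\quad (\text{zero for } |\omega| < m)

% [5a] with A replaced by A_sub; the integral starts at m by property (3):
\frac{\mathrm{Re}\,T_0(\omega) - T_0(0)}{\omega^2}
   = \frac{2}{\pi}\, P\!\int_{m}^{+\infty}
     \frac{\omega'}{\omega'^2 - \omega^2}\,\frac{A(\omega')}{\omega'^2}\, d\omega'

% Multiplying by \omega^2 :
\mathrm{Re}\,T_0(\omega) = T_0(0) + \frac{2\omega^2}{\pi}\, P\!\int_{m}^{+\infty}
     \frac{A(\omega')}{\omega'\,(\omega'^2 - \omega^2)}\, d\omega'
```

The extra factor $1/\omega'^2$ in the integrand is what makes the integral converge under the weaker boundedness property (2).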
Remark In view of (3), the scattering function
$T_0(\omega^{(c)})$ admits an analytic continuation as an
even function of $\omega^{(c)}$ (still called $T_0$) in
the cut-plane
$\mathbb{C}^{(m)}_{\mathrm{cut}} = \mathbb{C} \setminus \{\omega \in \mathbb{R};\ |\omega| \ge m\}$.
In fact, in view of (S) and (3), the boundary value
$T_0(\omega)$ of $T_0(\omega^{(c)})$ satisfies the relation
$T_0(\omega) = T_0(-\omega)$ in the real interval
$\delta_m = \{\omega \in \mathbb{R};\ -m < \omega < m\}$.
Let us then introduce the function
$\hat T_0(\omega^{(c)}) = T_0(-\omega^{(c)})$ as a
holomorphic function of $\omega^{(c)}$ in $\mathcal{I}_-$:
one sees that the boundary values of $T_0$ and
$\hat T_0$ from the respective domains $\mathcal{I}_+$
and $\mathcal{I}_-$ coincide on $\delta_m$ and therefore
admit a common analytic continuation throughout this
real interval (in view of “Painlevé’s lemma” or the
“one-dimensional edge-of-the-wedge theorem”). One
also notes that in view of (S) the extended function
$T_0$ satisfies the “reality condition”
$T_0(\overline{\omega^{(c)}}) = \overline{T_0(\omega^{(c)})}$
in $\mathbb{C}^{(m)}_{\mathrm{cut}}$. The fact that $T_0$
is well defined as an even holomorphic function in the
cut-plane $\mathbb{C}^{(m)}_{\mathrm{cut}}$ has been
established in the general framework of QFT, as
explained in the last part of this article.


Phenomena of Linear Response: 
Causality and Dispersion Relations 
in the Classical Domain 


The subsequent axiomatic framework and results
(due to J S Toll (1952, 1956)) concern any physical
system which exhibits the following type of
phenomena: whenever it receives some excitation
signal, called the input and represented by a
real-valued function of time $f_{\mathrm{in}}(t)$ with
compact support, the system emits a response signal,
called the output and represented by a corresponding
real-valued function $f_{\mathrm{out}}(t)$, in such a way
that the following postulates are satisfied:

(P1) Linearity. To every linear combination of
inputs $a_1 f_{\mathrm{in},1} + a_2 f_{\mathrm{in},2}$,
there corresponds the output
$a_1 f_{\mathrm{out},1} + a_2 f_{\mathrm{out},2}$.

(P2) Reproducibility or time-translation invariance.
Let $\tau$ be a time-translation parameter taking
arbitrary real values; to every “time-translated
input” $f^{(\tau)}_{\mathrm{in}}(t) = f_{\mathrm{in}}(t - \tau)$,
there corresponds the output
$f^{(\tau)}_{\mathrm{out}}(t) = f_{\mathrm{out}}(t - \tau)$.

(P3) Causality. The effect cannot precede the cause,
namely if $t_{\mathrm{in}}$ and $t_{\mathrm{out}}$ denote
respectively the lower bounds of the supports of
$f_{\mathrm{in}}(t)$ and $f_{\mathrm{out}}(t)$, then there
always holds the inequality
$t_{\mathrm{in}} \le t_{\mathrm{out}}$.

(P4) Continuity of the response. There exists
some continuity inequality which expresses
the fact that a certain norm of the output is
majorized by a corresponding norm of the
input. The case of an $L^2$-norm inequality of the
form $\|f_{\mathrm{out}}\| \le \|f_{\mathrm{in}}\|$ is
particularly significant: when the norm
$\|f\| = [\int |f(t)|^2\, dt]^{1/2}$ is interpretable as
an energy (for the output as well as for
the input), it acquires the meaning of a
“dissipation” property of the system.


The postulate of linear dependence (P1) of
$f_{\mathrm{out}}$ with respect to $f_{\mathrm{in}}$ is
obviously satisfied if the response is described by
any general kernel $K(t, t')$ such that the following
formula makes sense:

$$ f_{\mathrm{out}}(t) = \int_{-\infty}^{+\infty} K(t, t')\, f_{\mathrm{in}}(t')\, dt' \qquad \text{[8]} $$

Conversely, the existence of a distribution kernel $K$
can be established rigorously under the continuity
assumption postulated in (P4) by using the Schwartz
nuclear theorem. In full generality (see our comment
in the next paragraph), the kernel $K(t, t')$ appears to
be a tempered distribution in the pair of variables
$(t, t')$ and the previous integral formula holds in the
sense of distributions, which means that both sides
of eqn [8] must be considered as tempered
distributions (in $t$) acting on any smooth
test-function $g(t)$ in the Schwartz space
$\mathcal{S}$. (Note, for instance, that the trivial
linear application $f_{\mathrm{out}} = f_{\mathrm{in}}$ is
represented by the kernel $K(t, t') = \delta(t - t')$.)

From the reproducibility postulate (P2), it follows
that the distribution $K$ can be identified with a
distribution of the single variable $\tau = t - t'$,
namely $K(t, t') = R(t - t')$. Moreover, the
real-valuedness condition imposed on the pairs
$(f_{\mathrm{in}}, f_{\mathrm{out}})$ entails that $R$ is
real. Finally, the causality postulate (P3) implies
that the support of the distribution $R$ is contained
in the positive real axis, so that one can write, in
the sense of distributions,

$$ f_{\mathrm{out}}(t) = \int_{-\infty}^{t} R(t - t')\, f_{\mathrm{in}}(t')\, dt' \qquad \text{[9]} $$

The convolution kernel $R(t - t')$ is typically what
one calls in physics a “retarded kernel.”
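
A discrete analogue of the retarded convolution [9] makes the content of the postulates concrete. The sketch below is a toy model under our own assumptions: time is sampled on an integer grid, and the kernel values and the input pulse are hypothetical. The sum runs only over past and present input samples, which is the discrete form of the support condition on $R$.

```python
def causal_response(kernel, f_in):
    """Discrete analogue of eqn [9]: out[n] = sum over k <= n of kernel[n - k] * f_in[k].

    kernel[j] stands for R(j * dt) with j >= 0 only, so the output at step n
    never depends on input samples later than n.
    """
    out = []
    for n in range(len(f_in)):
        acc = 0.0
        for k in range(n + 1):
            if n - k < len(kernel):
                acc += kernel[n - k] * f_in[k]
        out.append(acc)
    return out

kernel = [0.5 ** j for j in range(8)]       # hypothetical decaying retarded kernel
pulse = [0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0]
shifted = [0.0, 0.0] + pulse[:-2]           # the same input delayed by two steps

out = causal_response(kernel, pulse)
out_shifted = causal_response(kernel, shifted)
print(out)
print(out_shifted)
```

The output vanishes before the input starts (causality), and delaying the input by two steps delays the output by the same amount (time-translation invariance). Linearity holds by construction.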

If we now introduce the frequency variable $\omega$,
which is the conjugate of the time variable $t$, by the
Fourier transformation

$$ \tilde f(\omega) = \int_{-\infty}^{+\infty} f(t)\, e^{i\omega t}\, dt $$

we see that the convolution equation [9] is
equivalent to the following one:

$$ \tilde f_{\mathrm{out}}(\omega) = \tilde R(\omega)\, \tilde f_{\mathrm{in}}(\omega) \qquad \text{[10]} $$

In the latter, the Fourier transform $\tilde R(\omega)$
of $R$ is a tempered distribution, which is the
boundary value from the upper half-plane
$\mathcal{I}_+$ of a holomorphic function
$\tilde R^{(c)}(\omega^{(c)})$, called the Fourier-Laplace
transform of $R$. $\tilde R^{(c)}$ is defined for all
$\omega^{(c)} = \omega + i\eta$, with $\eta > 0$, by the
following formula in which the exponential is a good
test-function for the distribution $R$ (since
exponentially decreasing for $t \to +\infty$):

$$ \tilde R^{(c)}(\omega^{(c)}) = \int_{0}^{+\infty} R(t)\, e^{i\omega^{(c)} t}\, dt \qquad \text{[11]} $$

More precisely, the tempered-distribution character
of $\tilde R$ is strictly equivalent to the fact that
$\tilde R^{(c)}$ is of moderate growth both at infinity
and near the reals in $\mathcal{I}_+$, namely that it
satisfies a majorization of the following form for
some real positive numbers $p$ and $q$:

$$ |\tilde R^{(c)}(\omega + i\eta)| \le C\, \frac{(1 + |\omega^{(c)}|)^p}{\eta^q} \qquad \text{[12]} $$

We thus conclude from eqn [10] that each phenomenon
of linear response is represented very simply in
the frequency variable by the multiplicative operator
$\tilde R(\omega)$, whose analytic continuation
$\tilde R^{(c)}(\omega^{(c)})$ is called the (causal)
response function.
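
The Fourier-Laplace transform [11] can be evaluated numerically for a simple causal kernel. The sketch below rests on our own assumptions: the kernel $R(t) = e^{-t}$ for $t \ge 0$ is a hypothetical example, the integral is truncated at `t_max`, and the function names are illustrative. For this kernel the closed form is $\tilde R^{(c)}(\omega^{(c)}) = i/(\omega^{(c)} + i)$, whose only singularity is a pole at $-i$, in the lower half-plane.

```python
import cmath
import math

def kernel(t):
    """Hypothetical causal kernel: R(t) = exp(-t) for t >= 0, zero for t < 0."""
    return math.exp(-t) if t >= 0.0 else 0.0

def fourier_laplace(r, w_c, t_max=60.0, n=60000):
    """Midpoint-rule estimate of int_0^t_max r(t) * exp(i * w_c * t) dt.

    This is eqn [11] truncated at t_max; for Im(w_c) > 0 the exponential
    factor decays, which is what makes the transform well defined.
    """
    dt = t_max / n
    total = 0j
    for k in range(n):
        t = (k + 0.5) * dt
        total += r(t) * cmath.exp(1j * w_c * t)
    return total * dt

w_c = complex(0.8, 0.5)          # a point of the upper half-plane
numeric = fourier_laplace(kernel, w_c)
exact = 1j / (w_c + 1j)          # closed form for this particular kernel
print(numeric, exact)
```

The agreement of the two values illustrates, on one example, how the support condition on $R$ produces a function of $\omega^{(c)}$ that is holomorphic for $\eta > 0$.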


A Typical Illustration: The Damped Harmonic 
Oscillator 


We consider the motion $x = x(t)$ of a damped
harmonic oscillator of mass $m$ subjected to an
external force $F(t)$. The force is the input
($f_{\mathrm{in}} = F$) and the resulting motion is the
output, namely $f_{\mathrm{out}}(t) = x(t)$. All the
previous general postulates (P1)-(P4) are then
satisfied, but this particular model is, of course,
governed by its dynamical equation

$$ x''(t) + 2\gamma x'(t) + \omega_0^2\, x(t) = \frac{F(t)}{m} \qquad \text{[13]} $$

where $\omega_0$ is the eigenfrequency of the oscillator
and $\gamma$ is the damping constant ($\gamma > 0$). The
relevant solution of this second-order differential
equation with constant coefficients is readily
obtained in terms of the Fourier transforms
$\tilde x(\omega)$ of $x(t)$ and $\tilde F(\omega)$ of
$F(t)$. One can in fact replace eqn [13] by the
equivalent equation

$$ (-\omega^2 - 2i\gamma\omega + \omega_0^2)\, \tilde x(\omega) = \frac{\tilde F(\omega)}{m} \qquad \text{[14]} $$

whose solution is of the form [10], namely
$\tilde x(\omega) = \tilde R(\omega)\tilde F(\omega)$, with

$$ \tilde R(\omega) = \frac{\tilde x(\omega)}{\tilde F(\omega)} = -\frac{1}{m(\omega^2 + 2i\gamma\omega - \omega_0^2)} = -\frac{1}{m(\omega - \omega_1)(\omega - \omega_2)} \qquad \text{[15a]} $$

$$ \omega_{1,2} = \pm(\omega_0^2 - \gamma^2)^{1/2} - i\gamma \qquad \text{[15b]} $$

It is clear that the rational function defined by eqns
[15] admits an analytic continuation in the full
complex plane of $\omega^{(c)}$ minus the pair of simple
poles $(\omega_1, \omega_2)$ which lie in the lower
half-plane. In particular, it is holomorphic (and
decreasing at infinity) in $\mathcal{I}_+$, as expected
from the previous general result. Moreover, this
example suggests that for any particular phenomenon of
linear response, the details of the dynamics are
encoded in the singularities of the holomorphic
scattering function $\tilde R^{(c)}(\omega^{(c)})$, which
all lie in the lower half-plane. The validity of a
dispersion relation only expresses the analyticity
(and decrease at infinity) of that function in the
upper half-plane, which is model independent.
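
The location of the poles of the oscillator response function is easy to check numerically. The following sketch uses hypothetical parameter values and helper names of our own. It evaluates the two roots for an underdamped and an overdamped case and compares the polynomial and factored forms of the response function at one complex frequency.

```python
import cmath

def oscillator_poles(w0, gamma):
    """Roots of w^2 + 2i*gamma*w - w0^2 = 0, i.e. +/- sqrt(w0^2 - gamma^2) - i*gamma."""
    root = cmath.sqrt(complex(w0 * w0 - gamma * gamma, 0.0))
    return root - 1j * gamma, -root - 1j * gamma

def response(w, m, w0, gamma):
    """Response function -1 / (m * (w^2 + 2i*gamma*w - w0^2)) at complex w."""
    return -1.0 / (m * (w * w + 2j * gamma * w - w0 * w0))

for w0, gamma in [(1.0, 0.2), (1.0, 3.0)]:   # underdamped, then overdamped
    w1, w2 = oscillator_poles(w0, gamma)
    print(w0, gamma, w1, w2)

m, w0, gamma = 2.0, 1.0, 0.2                 # hypothetical values
w1, w2 = oscillator_poles(w0, gamma)
w = complex(0.3, 0.4)                        # a point of the upper half-plane
factored = -1.0 / (m * (w - w1) * (w - w2))
print(response(w, m, w0, gamma), factored)
```

In both regimes the imaginary parts of the roots are negative as long as $\gamma > 0$, so the response function has no singularity in the upper half-plane, whatever the values of $m$ and $\omega_0$.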


Remark The same mathematical analysis applies to
any electric oscillatory circuit, in which the
capacitance, inductance, and resistance are involved
in place of the parameters $m$, $\omega_0$ and $\gamma$:
$f_{\mathrm{in}}$ and $f_{\mathrm{out}}$ correspond
respectively to an external electric potential and to
the current induced in the circuit; the response
function is the admittance of the circuit.


Application to the Kramers-Kronig Relation 


The background of the Kramers-Kronig relation [6],
namely the analyticity and boundedness properties
of the complex refractive index $n'(\omega^{(c)})$ in
$\mathcal{I}_+$, is provided by the previous axiomatic
framework. However, it is not the quantity
$n'(\omega^{(c)})$ itself but appropriate functions of
the latter which play the role of causal response
functions; two phenomena can in fact be exhibited,
which both contribute to proving the relevant
properties of $n'(\omega^{(c)})$.


1. Propagation of light in a dielectric slab with
thickness $\delta$. One considers the wave front
$f_{\mathrm{in}}(t)$ of an incoming wave normally
incident upon the slab, with Fourier decomposition

$$ f_{\mathrm{in}}(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \tilde f_{\mathrm{in}}(\omega)\, e^{-i\omega t}\, d\omega \qquad \text{[16]} $$

After having traveled through the medium, it gives
rise to an outgoing wave $f_{\mathrm{out}}(t)$ on the exit
face of the slab, whose Fourier decomposition can be
written as follows (provided the thickness $\delta$ of
the slab is very small):

$$ f_{\mathrm{out}}(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \tilde f_{\mathrm{in}}(\omega)\, e^{-i\omega t}\, e^{i\omega n'(\omega)\delta/c}\, d\omega \qquad \text{[17]} $$

In the latter, the real part of $n'(\omega)/c$ is the
inverse of the light velocity in the medium, while its
imaginary part takes into account the exponential
damping of the wave. The output $f_{\mathrm{out}}$ thus
appears as a causal linear response with respect to
$f_{\mathrm{in}}$ (since $f_{\mathrm{out}}$ “starts after”
$f_{\mathrm{in}}$). According to the general formula
[10], the corresponding response function
$\tilde R^{(c)}_\delta$ can be directly computed from
eqns [16] and [17], which yields:

$$ \tilde R^{(c)}_\delta(\omega^{(c)}) = e^{i\omega^{(c)} n'(\omega^{(c)})\delta/c} \qquad \text{[18]} $$

In view of the previous axiomatic analysis,
$\tilde R^{(c)}_\delta$ has to be holomorphic and of
moderate growth in $\mathcal{I}_+$, and since this holds
for all $\delta$’s sufficiently small, it can be shown
that the function $n'(\omega^{(c)})$ itself is
holomorphic and of moderate growth in $\mathcal{I}_+$
(no logarithmic singularity can be produced).

2. Polarization of the medium produced by an
electric field. The dielectric polarization signal
$P(t)$ produced at a point of a medium by an external
electric field $E(t)$ is also a phenomenon of linear
response which obeys the postulates (P1)-(P4); the
corresponding formula [10] reads

$$ \tilde P(\omega) = \chi'(\omega)\, \tilde E(\omega) \qquad \text{[19a]} $$

where $\chi'$ is the complex dielectric susceptibility
of the medium, which is related to $n'$ by Maxwell’s
relation

$$ \chi'(\omega) = \frac{n'^2(\omega) - 1}{4\pi} \qquad \text{[19b]} $$

One thus recovers the fact that $\chi'$ admits an
analytic continuation in $\mathcal{I}_+$; one can also
show by a physical argument that $\chi'(\omega)$, and
thereby $n'(\omega) - 1$, tends to zero as a constant
divided by $\omega^2$ when $\omega$ tends to infinity.
This behavior at infinity extends to
$n'(\omega^{(c)}) - 1$ in $\mathcal{I}_+$ in view of the
Phragmén-Lindelöf theorem, since $n'$ is known (from
(1)) to be of moderate growth. This justifies the
analytic background of the Kramers-Kronig relation.


From Relativistic QFT to the Dispersion 
Relations of Particle Physics: Historical 
Considerations and General Survey 


In the quantum domain, the derivation of dispersion 
relations for the two-particle scattering (or collision) 
amplitudes of particle physics represented, from
1956 and throughout the 1960s, important
conceptual progress in the theoretical treatment of
that branch of physics. These phenomena are
described in a quantum-theoretical framework in 
which the basic kinematical variables are the 
energies and momenta of the particles involved. 
These variables play the role of the frequency of 
light in the optical scattering phenomena. Moreover, 
since large energies and momenta are involved, 
which allow the occurrence of particle creation 
according to the conservation laws of special
relativity, it is necessary to use a relativistic
quantum-mechanical framework. Around 1950, the
success of the quantum electrodynamics formalism for
computing the electron-photon, electron-electron, and
electron-positron scattering amplitudes revealed the
importance of the concept of relativistic quantum 
field for the understanding of particle physics. 
However, the methods of perturbation theory, 
which had ensured the success of quantum electro- 
dynamics in view of the small value of the coupling 
parameter of that theory (namely the electric charge 
of the electron), were at that time inapplicable to the 
strong nuclear interaction phenomena of high-energy 
physics. This failure motivated an important school
of mathematical physicists to work out a
model-independent axiomatic approach to relativistic QFT
(e.g., Lehmann, Symanzik, Zimmermann (1954), 
Wightman (1956), and Bogoliubov (1960); see Axio- 
matic Quantum Field Theory). Their main purpose 
was to provide a conceptually satisfactory treatment 
of relativistic quantum collisions, at least for the case 
of massive particles. Among various postulates 
expressing the invariance of the theory under the 
Poincaré group in an appropriate quantum- 
mechanical Hilbert-space framework, the approach 
basically includes a certain formulation of the 
principle of causality, called microcausality or local 
commutativity. This axiomatic approach of QFT was 
followed by a conceptually important variant, namely 
the algebraic approach to QFT (Haag, Kastler, Araki 
1960), whose most important developments are 
presented in the book by Haag (1992) (see Algebraic 
Approach to Quantum Field Theory). From the 
historical viewpoint, and in view of the analyticity 
properties that they also generate, one can say that all 
these (closely related) approaches parallel the axio- 
matic approach of linear response phenomena with, 
of course, a much higher degree of complexity. In 
particular, the characterization of scattering (or 
collision) amplitudes in terms of appropriate struc- 
ture functions of the basic quantum fields of the 
theory is a nontrivial preliminary step which was 
taken at an early stage of the theory under the name 
of “asymptotic theory and reduction formulae” 
(Lehmann, Symanzik, Zimmermann 1954-57, 
Haag-Ruelle 1962, Hepp 1965). There again, in the 
field-theoretical axiomatic framework, causality gen- 
erates analyticity through Fourier—Laplace transfor- 
mation, but several complex variables now play the 
role which was played by the complex frequency in 
the axiomatics of linear response phenomena: they 
are obtained by complexifying the relativistic energy- 
momentum variables of the (Fourier transforms of 
the) quantum fields involved in the high-energy 


collision processes. In fact, the holomorphic functions 
which play the role of the causal response function 
R(w) are the QFT structure functions or “Green 
functions in energy-momentum space.” The study of 
all possible analyticity properties of these functions 
resulting from the QFT axiomatic framework is 
called the analytic program (see Scattering in 
Relativistic Quantum Field Theory: The Analytic 
Program). The primary basic scope of the latter 
concerns the derivation of analyticity properties for 
the scattering functions of two-particle collision 
processes, which appears to be a genuine challenge 
for the following reason. The basic Einstein relation 
$E = mc^2$, which applies to all the incoming and
outgoing particles of the collisions, operates as a 
geometrical constraint on the corresponding physical 
energy-momentum vectors: according to the Min- 
kowskian geometry, the latter have to belong to mass 
hyperboloids, which define the so-called “mass shell” 
of the collision considered. It is on the corresponding 
complexified mass-shell manifold that the scattering 
functions are required to be defined as holomorphic 
functions. In the analytic program of QFT, the 
derivation of such analyticity domains and of 
corresponding dispersion relations in the complex 
plane of the squared total energy variable, s, of each 
given collision process then relies on techniques of 
complex geometry in several variables. As a matter of 
fact, the scattering amplitude is a function (or
distribution) of two variables $F(s, t)$, where $t$ is
a second important variable, called the squared
momentum transfer, which plays the role of a fixed
parameter for the derivation of dispersion relations
in the variable $s$. The value $t = 0$ corresponds to
the special kinematical situation which has been
described above (for the case of equal-mass particles
$\Pi_1$ and $\Pi_2$) under the name of forward
scattering, and the variable $s$ is a simple affine
function of the energy $\omega$ of the colliding
particle $\Pi_2$ in the laboratory Lorentz frame
(namely $s = 2m^2 + 2m\omega$ in the equal-mass case).
It is for the corresponding scattering amplitude
$T_0(\omega) = F_0(s) = F(s, t)|_{t=0}$ that a dispersion
relation such as eqn [7] can be derived, although this
derivation is far from being as simple as for the 
phenomena of linear response in classical physics: 
even in that simplest case, it already necessitates the 
use of analytic completion techniques in several 
complex variables. The first proof of this dispersion 
relation was performed by K Symanzik in 1956. In 
the case of general kinematical situations of measure- 
ments, the direction of observation of the scattered 
particle includes a nonzero angle with the incidence 
direction, which always corresponds to a negative 
value of $t$. The derivation of dispersion relations
at fixed $t = t_0 < 0$, namely for the scattering
amplitude $F_{t_0}(s) = F(s, t)|_{t=t_0}$, requires
further arguments of complex geometry, and it is
subject to subtle limitations of the form
$t_1 < t_0 < 0$, where $t_1$ depends on the mass
spectrum of the particles involved in the theory. The
first rigorous proof of dispersion relations at
$t < 0$ was performed by N N Bogoliubov in 1960.

Three conceptually important features of the 
dispersion relations in particle physics deserve to 
be pointed out. 


1. In comparison with the dispersion relations of
classical optics, a feature which appears to be new is
the so-called “crossing property,” which is
characteristic of high-energy physics since it relies
basically on the relativistic kinematics. According to
that property, the boundary values of the analytic
scattering function $F_t(s)$ at positive and negative
values of $s$ from the respective half-planes
$\mathrm{Im}\, s > 0$ and $\mathrm{Im}\, s < 0$ are
interpreted, respectively, as the scattering
amplitudes of two physically different collision
processes, which are deduced from each other by
replacing the incident particle by the corresponding
antiparticle; one also says that “these two collision
processes are related by crossing.” A typical example
is provided by the proton-proton and proton-antiproton
collisions, whose scattering amplitudes are therefore
mutually related by the property of analytic
continuation. This type of relationship between the
values of the scattering function at positive and
negative values of $s$ generalizes in a nontrivial way
the symmetry relation (S) satisfied by the forward
scattering function $T_0(\omega^{(c)})$ when each
particle coincides with its antiparticle (see the
second basic example above). No nontrivial crossing
property holds in that special case, and the fact that
$T_0$ is an even function of $\omega^{(c)}$ precisely
expresses the identity of the two collision processes
related by crossing. In the general case, for $t = 0$
as well as for $t = t_0 < 0$ for any value of $t_0$,
the analyticity domain that one obtains for the
scattering function is not the full cut-plane of $s$:
in its general form, a “crossing domain” may exclude
some bounded region $B_t$ from the cut-plane, but it
always contains an infinite region which is the
exterior of a circle minus cuts along the two infinite
parts of the real $s$-axis (Bros, Epstein, Glaser
1965): these cuts are along the physical regions of
the two collision processes related by crossing. In
that general case, the scattering function $F_t(s)$
still satisfies what can be called a
quasi-dispersion-relation, in which the right-hand
side contains an additional Cauchy integral, taken
along the boundary of $B_t$.

2. A second important feature concerns the
behavior at large values of $s$ of the scattering
functions $F_t(s)$ in their analyticity domain. As
indicated in the presentation of the second basic
example, a “precise-increase” property was
expected to be satisfied by the forward scattering
amplitude $T_0(\omega)$ for $\omega$ (or $s$) tending to
infinity. This “precise-increase” property implied the
necessity of writing the corresponding dispersion
relation [7] for the function
$(T_0(\omega) - T_0(0))/\omega^2$: this is what one calls
a “dispersion relation with a subtraction.” As a
matter of fact, the existence of such restrictive
bounds on the total cross sections at high energies
had been discovered in 1961 by M Froissart: his
derivation relied basically on the use of the
unitarity of the scattering operator (expressing the
quantum principle of conservation of probabilities),
but also on a strong analyticity postulate for the
scattering function not implied by the general
field-theoretical approach (namely the Mandelstam
domain of “double dispersion relations”). In the
general framework of QFT, Froissart-type bounds
appeared to be closely linked to a further nontrivial
extension of the range of “admissible” values of $t$
for which $F_t(s)$ can be analytically continued in a
cut-plane or crossing domain. In fact, the extension
of this range to positive (i.e., “unphysical”) and
even complex values of $t$, and as a second step the
proof of Froissart-type bounds in $s(\log s)^2$ for
$F_t(s)$ at all these admissible values of $t$, were
performed in 1966 by A Martin. They rely on a subtle
conspiracy of the analyticity properties deduced from
the QFT axiomatic framework and of positivity and
unitarity properties expressing the basic Hilbertian
structure of the quantum collision theory. The
consequence of these bounds on the exact form of the
dispersion relations is that, as in formula [7] of the
case $t = 0$, it is justified to write a (so-called
“subtracted”) dispersion relation for
$(F_t(s) - F_t(0) - sF_t'(0))/s^2$: for the general
case when the crossing property replaces the symmetry
(S), such a dispersion relation involves two
subtractions (since $F_t'(0) \neq 0$). Detailed
information concerning the interplay of analyticity
and unitarity on the mass shell and the derivation of
refined forms of dispersion relations and various
boundedness properties for the scattering functions
are given in the book by Martin (1969).

3. Constraints imposed by dispersion relations 
and experimental checks. The conceptual impor- 
tance of dispersion relations incorporating the 
above features (1) and (2) is displayed by such a
spectacular application as the relationship between
the high-energy behaviors of proton-proton and
proton-antiproton cross sections. Even though the
closest forms of relationship between these cross
sections (e.g., the existence of equal high-energy
limits) require for their proof some extra
assumption concerning, for instance, the behavior
of the ratio between the dispersive and absorptive
parts of the forward scattering amplitude, one can 
speak of an actual model-independent implication 
of general QFT that imposes nontrivial constraints 
on phenomena. Otherwise stated, checking experi- 
mentally the previous type of relationship up to the 
limits of high energies imposed by the present 
technology of accelerators constitutes an indirect, 
but important test of the validity of the general 
principles of QFT. 


As a matter of fact, it has also appeared frequently 
in the literature of high-energy physics during the 
last 40 years that the Froissart bound by itself was 
considered as a key criterion to be satisfied by any 
sensible phenomenological model in particle physics. 
As already stated above, the Froissart bound is one 
of the deepest consequences of the analytic program 
of general QFT, since its derivation also incorpo- 
rates in the most subtle way the quantum principle 
of probability conservation. If only for the
previous basic results, the derivation of dispersion
relations (and, more generally, the results of the
analytic program) in QFT appears as an important
conceptual bridge between a fundamental theoretical
framework of relativistic quantum physics and
the phenomenology of high-energy particle physics. 


Basic Concepts and Main Steps in the 
QFT Derivation of Dispersion Relations 


The rest of this article outlines the derivation of the 
analytic background of dispersion relations for the 
forward scattering amplitudes in the framework of 
axiomatic QFT. After a brief introduction on 
relativistic scattering processes and the problematics 
of causality in particle physics, it gives an account of 
the Wightman axioms and the simplest reduction 
formula which relates the forward scattering ampli- 
tude to a retarded product of the field operators. 
Then it describes how the latter can be used for 
justifying a certain type of analyticity domain for the 
forward scattering functions, namely a crossing 
domain or in the best cases a cut-plane in the 
squared energy variable s. This is the basic result 
that allows one to write dispersion relations (or 
quasi-dispersion-relations) at t = 0; the exact form of 
the latter, including at most two subtractions, relies 
on the use of Hilbertian positivity and of the 
unitarity of the scattering operator. 


Relativistic Quantum Scattering as a Phenomenon 
of Linear Response 


Collisions of quantum particles may be seen as
phenomena of linear response, but in a way which
differs greatly from what has been previously
described.


Particles in Minkowskian geometry Each state of a
relativistic classical particle with mass $m$ is
characterized by its energy-momentum vector or
4-momentum $p = (p_0, \vec p)$ satisfying the mass-shell
condition $p^2 \equiv p_0^2 - \vec p^{\,2} = m^2$ (in
units such that $c = 1$). In view of the condition of
positivity of the energy, $p_0 > 0$, the “physical mass
shell” thus coincides with the positive sheet $H_m^+$
of the mass hyperboloid $H_m$ with equation
$p^2 = m^2$.

The set of all energy-momentum configurations
characterizing the collisions of two relativistic
classical particles with initial (resp. final) 4-momenta
$p_1, p_2$ (resp. $p_1', p_2'$) is the mass-shell manifold $M$
defined by the conditions

$$p_i^2 = m^2, \quad {p_i'}^2 = m^2, \quad p_{i,0} > 0, \quad p'_{i,0} > 0, \quad i = 1, 2$$

$$p_1 + p_2 = p_1' + p_2'$$

where the latter equation expresses the relativistic
law of total energy-momentum conservation. $M$ is
an eight-dimensional manifold, invariant under the
(six-dimensional) Lorentz group: the orbits of this
group that constitute a foliation of $M$ are
parametrized by two variables, namely the squared total
energy $s = (p_1 + p_2)^2 = (p_1' + p_2')^2$ and the squared
momentum transfer $t = (p_1 - p_1')^2 = (p_2 - p_2')^2$ (or
$u = (p_1 - p_2')^2 = 4m^2 - s - t$). In these variables,
called the Mandelstam variables, the “physical
region” of the collision is represented by the set
of pairs $(s, t)$ (or triplets $(s, t, u)$ with $s + t + u = 4m^2$)
such that $t \le 0$, $u \le 0$, and therefore $s \ge 4m^2$.

Correspondingly, each state of a relativistic
quantum particle with mass $m$ is characterized by a wave
packet $\hat f(p)$ on $H_m^+$, which is an element of unit norm
of $L^2(H_m^+; \mu_m(p))$, with $\mu_m(p) = d\mathbf{p}/(\mathbf{p}^2 + m^2)^{1/2}$. In
Minkowskian spacetime with coordinates $x = (x_0, \mathbf{x})$,
any such state is represented by a wave function $f(x)$
whose Fourier transform is the tempered distribution
(with support in $H_m^+$) $\hat f(p) \times \delta(p^2 - m^2)$: $f(x)$ is a
positive-energy solution of the Klein–Gordon
equation $(\partial^2/\partial x_0^2 - \Delta_{\mathbf{x}} + m^2) f(x) = 0$. A free
two-particle state is a symmetric wave packet $\hat f(p_1, p_2)$ on
$H_m^+ \times H_m^+$ in the Hilbert space $L^2(H_m^+ \times H_m^+; \mu_m \otimes \mu_m)$.
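The relation between the Mandelstam variables and the physical region can be checked numerically. The following is a minimal sketch, assuming an equal-mass elastic collision described in the centre-of-mass frame; the helper names (`cm_configuration`, `mandelstam`) and the parametrization by a momentum `k` and a scattering angle `theta` are illustrative choices, not part of the article.

```python
import math


def minkowski_sq(p):
    """Squared Minkowskian pseudonorm p0^2 - |p|^2 of a 4-vector (p0, px, py, pz)."""
    return p[0] ** 2 - (p[1] ** 2 + p[2] ** 2 + p[3] ** 2)


def add(p, q):
    return tuple(a + b for a, b in zip(p, q))


def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def cm_configuration(m, k, theta):
    """Hypothetical helper: an on-shell, momentum-conserving configuration
    (p1, p2, p1', p2') for two particles of mass m, centre-of-mass momentum k,
    scattered by the angle theta."""
    e = math.sqrt(k * k + m * m)  # common energy of all four particles
    p1 = (e, 0.0, 0.0, k)
    p2 = (e, 0.0, 0.0, -k)
    p1p = (e, k * math.sin(theta), 0.0, k * math.cos(theta))
    p2p = (e, -k * math.sin(theta), 0.0, -k * math.cos(theta))
    return p1, p2, p1p, p2p


def mandelstam(p1, p2, p1p, p2p):
    """Return (s, t, u) as defined in the text."""
    s = minkowski_sq(add(p1, p2))
    t = minkowski_sq(sub(p1, p1p))
    u = minkowski_sq(sub(p1, p2p))
    return s, t, u


if __name__ == "__main__":
    m = 1.0
    s, t, u = mandelstam(*cm_configuration(m, k=2.0, theta=0.7))
    print(s, t, u, s + t + u - 4 * m * m)
```

In this parametrization $s = 4(k^2 + m^2)$, $t = -2k^2(1 - \cos\theta)$ and $u = -2k^2(1 + \cos\theta)$, so that the constraint $s + t + u = 4m^2$ and the inequalities defining the physical region hold for every $k$ and $\theta$.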


Scattering kernels as response kernels: distribution
character While the input to be considered is a free
wave packet $\hat f_{\mathrm{in}}(p_1, p_2)$ on $H_m^+ \times H_m^+$, representing the
preparation of an initial two-particle state, the output
corresponds to the detection of a final two-particle
state also characterized by a wave packet $\hat g_{\mathrm{out}}(p_1', p_2')$
on $H_m^+ \times H_m^+$. In quantum mechanics, linearity is
linked to the “superposition principle” of states,
which allows one to state that collisions are described
by a certain bilinear form $(\hat f_{\mathrm{in}}, \hat g_{\mathrm{out}}) \to S(\hat f_{\mathrm{in}}, \hat g_{\mathrm{out}})$,
called the “scattering matrix.” This bilinear form is
bicontinuous with respect to the Hilbertian norms of
the wave packets, and it then results from the
Schwartz nuclear theorem that it is represented by a
distribution kernel $S(p_1, p_2; p_1', p_2')$, namely a
tempered distribution with support contained in $M$, in
such a way that (formally)

$$S(\hat f_{\mathrm{in}}, \hat g_{\mathrm{out}}) = \int \hat f_{\mathrm{in}}(p_1, p_2)\, \overline{\hat g_{\mathrm{out}}}(p_1', p_2')\, S(p_1, p_2; p_1', p_2')\, \mu_m(p_1)\, \mu_m(p_2)\, \mu_m(p_1')\, \mu_m(p_2') \qquad [20]$$


If there were no interaction, $S(\hat f_{\mathrm{in}}, \hat g_{\mathrm{out}})$ would reduce
to the Hilbertian scalar product $\langle \hat g_{\mathrm{out}}, \hat f_{\mathrm{in}} \rangle$ in
$L^2(H_m^+ \times H_m^+; \mu_m \otimes \mu_m)$ and the corresponding kernel $S$
would be the identity kernel

$$I(p_1, p_2; p_1', p_2') = \tfrac{1}{2}\,[\delta(p_1 - p_1')\,\delta(p_2 - p_2') + \delta(p_1 - p_2')\,\delta(p_2 - p_1')]$$

In the general case, the interaction is therefore
described by the scattering kernel
$T(p_1, p_2; p_1', p_2') = S(p_1, p_2; p_1', p_2') - I(p_1, p_2; p_1', p_2')$.
The action of $T$ as
a bilinear form (defined in the same way as the
action of $S$ in eqn [20]) may be seen as the quantum
analog of the classical response formula [10]. Note,
however, the difference in the mathematical
treatment of the output: instead of being considered as
the direct response ($f_{\mathrm{out}}$) to the input, it is now
explored by Hilbertian duality in terms of detection
wave packets $\hat g_{\mathrm{out}}$, in conformity with the principles
of quantum theory. Finally, in view of the invariance
of the collision process under the Lorentz group, the
scattering kernel $T$ is constant along the orbits of
this group in $M$ and it then defines a distribution
$F(s, t) = T(p_1, p_2; p_1', p_2')$ with support in the physical
region: this is what is called the scattering
amplitude.


What becomes of causality? One can show that the 
positive-energy solutions of the Klein—Gordon equa- 
tion cannot vanish in any open set of Minkowski 
spacetime; they necessarily spread out in the whole 
spacetime. This makes it impossible to formulate a 
causality condition comparable to eqn [9] in terms 
of the spacetime wave functions $f_{\mathrm{in}}$ and $g_{\mathrm{out}}$
corresponding to the input and output wave packets
$\hat f_{\mathrm{in}}$, $\hat g_{\mathrm{out}}$. In this connection, it is, however,
appropriate to note that (after various attempts at
formulating “weak causality conditions”) a certain
condition called
“macrocausality” (Iagolnitzer and Stapp 1969; see
the book by Iagolnitzer (1992)) has been shown to
be equivalent to some local properties of analyticity
of the scattering kernel $T$; but it is not our purpose
to develop that point here for two reasons: (1) the
interpretation of that condition is rather involved,
because it integrates a very weak form of causality
together with the spatial short-range character of the
strong nuclear interactions between the elementary
particles; (2) the domains of analyticity obtained are
far too small compared with those necessary for
writing dispersion relations. The reason for this
failure is that the scattering kernel only represents 
an asymptotic quantum observable, in the sense that 
it is intended to describe observations far apart from 
the extremely small spacetime region where the 
particles strongly interact, namely in regions where 
this interaction is asymptotically small. Although 
well adapted to what is actually observed in the 
detection experiments, the concept of scattering 
kernel is not sufficient for describing the funda- 
mental interactions of physics: it must be enriched 
by other theoretical concepts which might explicitly 
take into account the microscopic interactions in 
spacetime. This motivates the introduction of quan- 
tum fields as basic quantities in particle physics. 


Relativistic Quantum Fields: Microcausality and 
the Retarded and Advanced Kernels; Analyticity 
in Complex Energy-Momentum Space 


By an idealization of the concept of quantum
electromagnetic field and a generalization to all
types of microscopic interactions of matter, one
considers that all the phenomena involving such
interactions can be described by fields $\Phi_i(x)$, whose
amplitude can, in principle, be measured in
arbitrarily small regions of Minkowski spacetime. In the
quantum framework, one is thus led to the notion of
local observable $O$ (emphasized as a basic concept in
the axiomatic approach of Araki, Haag, and
Kastler). In the Wightman field-theoretical
framework, a local observable corresponds to the
measurement of a weighted average of a field $\Phi_i(x)$
of the form $O = \Phi_i[f] = \int \Phi_i(x) f(x)\, dx$. In the latter,
$f(x)$ denotes a smooth real-valued test-function with
(arbitrary) compact support $K$ in spacetime; the
observable $O$ is then said to be localized in $K$. Each
observable $O = \Phi_i(f)$ has to be a self-adjoint
(unbounded) operator acting in (a dense domain
of) the Hilbert space $\mathcal{H}$ generated by all the states of
the system of fundamental fields $\{\Phi_i\}$; therefore, the
correct mathematical concept of relativistic quantum
field $\Phi(x)$ is an “operator-valued tempered
distribution on Minkowski spacetime.” Here the additional
“temperateness assumption” is a convenient
technical assumption which in particular allows the
passage to the energy-momentum space by making
use of the Fourier transformation.

In this QFT framework, it is natural to express a
certain form of causality by assuming that two
observables $\Phi(f)$ and $\Phi(f')$ commute if the
supports of $f$ and $f'$ are spacelike-separated regions in
spacetime, which means that no signal with
velocity smaller than or equal to the velocity of light
can propagate from either one of these regions to
the other. This expresses the idea that these two
observables should be independent, that is,
“compatible as quantum observables.” This postulate is
equivalent to the following condition, called
microcausality or local commutativity, and
understood in the sense of operator-valued tempered
distributions:


$$[\Phi(x_1), \Phi(x_1')] = 0 \quad \text{for } (x_1 - x_1')^2 < 0 \qquad [21]$$

where $(x_1 - x_1')^2$ is the squared Minkowskian
pseudonorm of $x = x_1 - x_1' = (x_0, \mathbf{x})$, namely
$x^2 = x_0^2 - \mathbf{x}^2$. It follows that for every admissible
pair of states $\Psi, \Psi'$ in $\mathcal{H}$, the tempered distribution

$$C_{\Psi,\Psi'}(x_1, x_1') = \langle \Psi, [\Phi(x_1), \Phi(x_1')]\, \Psi' \rangle \qquad [22]$$

has its support contained in the union of the sets
$\mathcal{V}^+ : x_1 - x_1' \in \overline{V}^+$ and $\mathcal{V}^- : x_1 - x_1' \in \overline{V}^-$, where $\overline{V}^+$
and $\overline{V}^-$ are, respectively, the closures of the forward
and backward cones $V^+ = \{x = (x_0, \mathbf{x});\ x_0 > |\mathbf{x}|\}$,
$V^- = -V^+$ in Minkowski spacetime. It is always
possible to decompose the previous distribution as

$$C_{\Psi,\Psi'}(x_1, x_1') = R_{\Psi,\Psi'}(x_1, x_1') - A_{\Psi,\Psi'}(x_1, x_1') \qquad [23]$$

in such a way that the supports of the distributions
$R_{\Psi,\Psi'}(x_1, x_1')$ and $A_{\Psi,\Psi'}(x_1, x_1')$ belong, respectively,
to $\mathcal{V}^+$ and $\mathcal{V}^-$. $R_{\Psi,\Psi'}$ and $A_{\Psi,\Psi'}$ are called,
respectively, retarded and advanced kernels and
they are often formally expressed (for convenience)
as follows:

$$R_{\Psi,\Psi'}(x_1, x_1') = \theta(x_{1,0} - x'_{1,0})\, C_{\Psi,\Psi'}(x_1, x_1')$$

$$A_{\Psi,\Psi'}(x_1, x_1') = -\theta(x'_{1,0} - x_{1,0})\, C_{\Psi,\Psi'}(x_1, x_1')$$

in terms of the Heaviside step function $\theta(t)$ of the
time-coordinate difference $t = x_{1,0} - x'_{1,0}$. For every
pair $(\Psi, \Psi')$, $R_{\Psi,\Psi'}(x_1, x_1')$ appears as a relativistic
generalization of the retarded kernel $R(t - t')$ of eqn
[10]: its support property in spacetime, similar to the
support property of $R$ in time, expresses a relativistic
form of causality, or “Einstein causality.”

There exists a several-variable extension of the
theory of Fourier–Laplace transforms of tempered
distributions which is based on a formula similar to
eqn [11]. We introduce the vector variables
$X = (x_1 + x_1')/2$, $x = x_1 - x_1'$ and a complex
4-momentum $k = p + iq = (k_0, \mathbf{k})$ as the conjugate
vector variable of $x$ with respect to the Minkowskian
scalar product $k \cdot x = k_0 x_0 - \mathbf{k} \cdot \mathbf{x}$, and we define

$$\tilde R_{\Psi,\Psi'}(k, X) = \int R_{\Psi,\Psi'}(X + x/2, X - x/2)\, e^{i k \cdot x}\, dx \qquad [24]$$

Since $q \cdot x \ge 0$ for all $(q, x)$ such that $q \in V^+$,
$x \in \overline{V}^+$, it follows that $\tilde R_{\Psi,\Psi'}(k, X)$ is holomorphic
with respect to $k$ in the domain $\mathcal{T}^+$ containing all
$k = p + iq$ such that $q$ belongs to $V^+$. Moreover, in
the limit $q \to 0$ this holomorphic function tends (in
the sense of distributions) to the Fourier transform
$\tilde R_{\Psi,\Psi'}(p, X)$ of $R_{\Psi,\Psi'}(X + x/2, X - x/2)$ with respect
to $x$. The domain $\mathcal{T}^+$, which is called the “forward
tube,” is the analog of the upper half-plane of the $\omega$-plane;
bounds of moderate type comparable to those of [12]
apply to the holomorphic function $\tilde R_{\Psi,\Psi'}$ in $\mathcal{T}^+$.
Similarly, the advanced kernel $A_{\Psi,\Psi'}(X + x/2, X - x/2)$
admits a Fourier–Laplace transform $\tilde A_{\Psi,\Psi'}(k, X)$,
which is holomorphic and of moderate growth in the
“backward tube” $\mathcal{T}^-$ containing all $k = p + iq$ such
that $q$ belongs to $V^-$. In view of [23], the Fourier
transform $\tilde C_{\Psi,\Psi'}(p, X)$ of $C_{\Psi,\Psi'}(X + x/2, X - x/2)$
then appears as the difference between the boundary
values of $\tilde R_{\Psi,\Psi'}$ and $\tilde A_{\Psi,\Psi'}$ on the reals (from the
respective domains $\mathcal{T}^+$ and $\mathcal{T}^-$).
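The mechanism behind the holomorphy in the forward tube can be made explicit in two lines. The sketch below assumes that the Fourier–Laplace kernel is $e^{ik\cdot x}$ with $k \cdot x = k_0 x_0 - \mathbf{k}\cdot\mathbf{x}$ (a sign convention taken here as an assumption; with the opposite convention the roles of the two tubes are exchanged).

```latex
\begin{gather*}
e^{ik\cdot x} = e^{i(p+iq)\cdot x} = e^{ip\cdot x}\,e^{-q\cdot x},
\qquad \bigl|e^{ik\cdot x}\bigr| = e^{-q\cdot x}, \\
q\cdot x = q_0 x_0 - \mathbf{q}\cdot\mathbf{x}
\;\ge\; q_0 x_0 - |\mathbf{q}|\,|\mathbf{x}|
\;\ge\; (q_0 - |\mathbf{q}|)\,x_0 \;\ge\; 0
\qquad \text{for } q \in V^+,\ x \in \overline{V}^+ .
\end{gather*}
```

On the support of the retarded kernel the exponential factor therefore damps the integrand at a rate governed by $q_0 - |\mathbf{q}| > 0$, which is what compensates the at most polynomial growth of a tempered distribution and allows differentiation in $k$ under the integral sign.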


The Field-Theoretical Axiomatic Framework and 
the Passage from the Structure Functions of QFT 
to the Scattering Kernels (Case of Forward 
Scattering) 


The postulates (Wightman axioms) Apart from the 
causality postulate, which we have already presented 
above in view of its distinguished role for generating 
analyticity properties in complex energy-momentum 
space, the field-theoretical axiomatic approach to 
collision theory is based on the following postulates 
(for all the fundamental developments of axiomatic 
field theory, the interested reader may consult the 
books by Streater and Wightman (1980) and by Jost 
(1965); see Axiomatic Quantum Field Theory). 


1. There exists a unitary representation $g \to U(g)$ of
the Poincaré group $G$ in the Hilbert space of
states $\mathcal{H}$; in this representation, the abelian
subgroup of translations of space and time has a
Lie algebra whose generators are interpreted as
the four self-adjoint (commuting) operators $P_\mu$ of
total energy-momentum of the system.

2. The quantum field operators $\Phi(x)$ transform
covariantly under that representation; in the
simplest case of scalar fields (considered here),
$\Phi(gx) = U(g)\Phi(x)U(g)^{-1}$.

3. There exists a unique state $\Omega$, called the vacuum,
such that the action of all polynomials of field
operators on $\Omega$ generates a dense subset of $\mathcal{H}$;
moreover, $\Omega$ is assumed to be invariant under the
representation $U$ of $G$, and thereby such that
$P_\mu \Omega = 0$.

4. Spectral condition or positivity of energy in all
physical states. The joint spectrum $\Sigma$ of the
operators $P_\mu$ is contained in the closed forward
cone $\overline{V}^+$ of energy-momentum space. In order to
perform the collision theory of massive particles,
one needs a more detailed “mass-gap
assumption”: $\Sigma$ is the union of the origin $O$, of one or
several positive sheets of hyperboloid $H_{m_i}^+$ and
of a region $V_M^+$ defined by the conditions $p^2 \ge M^2$,
$p_0 > 0$, with $M$ larger than all the $m_i$.


The Hilbert space $\mathcal{H}$ is correspondingly
decomposed as the direct sum of the vacuum subspace (or
zero-particle subspace) generated by $\Omega$, of subspaces
of stable one-particle states with masses $m_i$
(isomorphic to $L^2(H_{m_i}^+, \mu_{m_i})$) and of a remaining
subspace $\mathcal{H}'$. As a result of the construction of
“asymptotic states,” $\mathcal{H}'$ can be shown to contain
two subspaces $\mathcal{H}'_{\mathrm{in}}$ and $\mathcal{H}'_{\mathrm{out}}$ generated, respectively,
by N-particle incoming states (with N arbitrary and
$\ge 2$) and by N-particle outgoing states. The collision
operator $S$ is then defined as the partially isometric
operator from $\mathcal{H}'_{\mathrm{out}}$ onto $\mathcal{H}'_{\mathrm{in}}$, which maps a
reference basis of outgoing states onto the
corresponding basis of incoming states.


An independent postulate: asymptotic completeness 
(see Scattering, Asymptotic Completeness and 
Bound States and Scattering in Relativistic 
Quantum Field Theory: Fundamental Concepts 
and Tools) The theory is said to satisfy the
property of asymptotic completeness if all the states
of $\mathcal{H}'$ can be interpreted as superpositions of various
N-particle states (either in the incoming or in the
outgoing state basis), namely if one has
$\mathcal{H}' = \mathcal{H}'_{\mathrm{in}} = \mathcal{H}'_{\mathrm{out}}$. This property is not implied by the
previous postulates on quantum fields, but its
physical interpretation and its role in the analytic
program are of primary importance (see Scattering
in Relativistic Quantum Field Theory: The Analytic
Program). Let us simply note here that asymptotic
completeness implies as a by-product the unitarity
property of the collision operator $S$ on the full
Hilbert space $\mathcal{H}'$ (i.e., $SS^* = S^*S = I$).


Connection between retarded kernel and scattering
kernel for the forward scattering case; a simple
“reduction formula” We consider the scattering of
a particle $\Pi_1$ with mass $m_1$ on a target consisting of
a particle $\Pi_2$ with mass $m_2$ and denote by
$T(p_1, p_2; p_1', p_2')$ the corresponding scattering kernel
(defined similarly as for the case of equal-mass
particles considered earlier). Equations [22]–[24] are
then applied to the case when $\Psi$ and $\Psi'$ coincide
with a one-particle state of $\Pi_2$ at rest, namely with
4-momentum $p_2 = p_2'$ along the time axis:
$p_2 = ((p_2)_0, \mathbf{0})$, $(p_2)_0 = m_2$. This describes in a simple way
the case of forward scattering, since in view of the
energy-momentum conservation law $p_1 + p_2 = p_1' + p_2'$,
the choice $p_2' = p_2$ also implies that $p_1' = p_1$.
(The possibility of restricting the distribution
$T(p_1, p_2; p_1', p_2')$ to such fixed values of the
energy-momenta can be shown to be mathematically well
justified.) The advantage of this simple case is that
the corresponding kernels [22], [23] of $(x_1, x_1')$ are
invariant under spacetime translations and therefore
depend only on $x$ (and not on $X$). We can thus
rewrite eqns [22], [23] with simplified notations as
follows:

$$C_{p_2}(x) = \langle p_2, [\Phi(x/2), \Phi(-x/2)]\, p_2 \rangle = R_{p_2}(x) - A_{p_2}(x) \qquad [25]$$

which can be shown to give correspondingly by
Fourier transformation

$$\tilde C_{p_2}(p) = \langle p_2, \tilde\Phi(p)\tilde\Phi(-p)\, p_2 \rangle - \langle p_2, \tilde\Phi(-p)\tilde\Phi(p)\, p_2 \rangle = \tilde R_{p_2}(p) - \tilde A_{p_2}(p) \qquad [26]$$

If the particle $\Pi_1$ appears in the asymptotic states of
the field $\Phi$, the scattering kernel $T(p_1, p_2; p_1', p_2')$ is
then given in the forward configurations
$p_1 = p_1' \in H_{m_1}^+$, $p_2 = p_2' \in H_{m_2}^+$ by the following reduction
formula in which $s = (p_1 + p_2)^2$:

$$F_0(s) = T(p_1, p_2; p_1, p_2) = \bigl[(p_1^2 - m_1^2)\, \tilde R_{p_2}(p_1)\bigr]\Big|_{H_{m_1}^+} \qquad [27]$$
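As a small worked step (added here for orientation, and relying only on the rest-frame choice of the target momentum made above), the invariant $s$ of the forward configuration can be expressed through the energy $(p_1)_0$ of the incident particle:

```latex
\begin{align*}
s &= (p_1 + p_2)^2 = p_1^2 + p_2^2 + 2\,p_1\cdot p_2 \\
  &= m_1^2 + m_2^2 + 2\,m_2\,(p_1)_0 ,
  \qquad p_1 \in H^+_{m_1},\quad p_2 = (m_2, \mathbf{0}), \\
(p_1)_0 &\ge m_1 \;\Longrightarrow\; s \ge m_1^2 + m_2^2 + 2\,m_1 m_2 = (m_1 + m_2)^2 .
\end{align*}
```

The physical values of $s$ for forward scattering therefore start at the threshold $(m_1 + m_2)^2$, and for fixed target mass $s$ is an affine function of the incident energy.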


Analyticity Domains in Energy-Momentum Space: 
From the “Primitive Off-Shell Domains” of QFT to 
the Crossing Manifolds on the Mass Shell 


For simplicity, we shall restrict ourselves to the 
consideration of forward scattering amplitudes, 
namely to the derivation of crossing analyticity 
domains and (quasi-)dispersion relations at t=0 
for two-particle collision processes of the form
$\Pi_1 + \Pi_2 \to \Pi_1 + \Pi_2$, $\Pi_1$ and $\Pi_2$ being given massive
particles with arbitrary spins and charges.


The holomorphic function $H_{p_2}(k)$ and its primitive
domain $D$. Nontriviality of dispersion relations for
the scattering amplitudes As suggested by eqn [24],
we can exploit the analyticity properties of the
Fourier–Laplace transforms of the retarded and
advanced kernels $R_{p_2}$ and $A_{p_2}$: in fact, $\tilde R_{p_2}(p)$ and
$\tilde A_{p_2}(p)$ are, respectively, the boundary values of the
holomorphic functions

$$\tilde R_{p_2}(k) = \int R_{p_2}(x)\, e^{i k \cdot x}\, dx, \qquad \tilde A_{p_2}(k) = \int A_{p_2}(x)\, e^{i k \cdot x}\, dx \qquad [28]$$

from the corresponding domains $\mathcal{T}^+$ and $\mathcal{T}^-$.
According to the reduction formula [27], it is appropriate to
consider correspondingly the functions
$H_{p_2}^+(k) = (k^2 - m_1^2)\tilde R_{p_2}(k)$ and $H_{p_2}^-(k) = (k^2 - m_1^2)\tilde A_{p_2}(k)$, which are
also, respectively, holomorphic in $\mathcal{T}^+$ and $\mathcal{T}^-$. Then
the forward scattering amplitude
$F_0(s) = F(s, 0) = T(p_1, p_2; p_1, p_2)$ appears as the restriction to the
hyperboloid sheet $p \in H_{m_1}^+$ of the boundary value
$H_{p_2}^+(p)$ of $H_{p_2}^+(k)$ on the reals.

Moreover, it can be seen that the two boundary
values $H_{p_2}^+(p) = (p^2 - m_1^2)\tilde R_{p_2}(p)$ and
$H_{p_2}^-(p) = (p^2 - m_1^2)\tilde A_{p_2}(p)$ coincide as distributions in the region

$$\mathcal{R} = \{p \in \mathbb{R}^4;\ (p + p_2)^2 < (m_1 + m_2)^2,\ (p - p_2)^2 < (m_1 + m_2)^2\} \qquad [29]$$

This follows from the intermediate expression in
eqn [26] and from the fact that a state of the form
$(p^2 - m_1^2)\tilde\Phi(\pm p)|p_2\rangle$ is a state of energy-momentum
$\pm p + p_2$ and therefore vanishes (in view of the
spectral condition) if $(\pm p + p_2)^2 < (m_1 + m_2)^2$ (here
we also use a simplifying assumption according to
which no one-particle bound state is present in this
channel).

The situation obtained concerning the holomorphic
functions $H_{p_2}^+(k)$ and $H_{p_2}^-(k)$ parallels (in
complex dimension four) the case of a pair of
holomorphic functions in the upper and lower
half-planes whose boundary values on the reals coincide
on a certain interval playing the role of $\mathcal{R}$. As in this
one-dimensional case there is a theorem, called the
“edge-of-the-wedge theorem” (see below), which
implies that $H_{p_2}^+(k)$ and $H_{p_2}^-(k)$ have a common
analytic continuation $H_{p_2}(k)$: this function is
holomorphic in a domain $D$ which is the union of
$\mathcal{T}^+$, $\mathcal{T}^-$ and of a complex neighborhood of $\mathcal{R}$; $D$ is
called the primitive domain of $H_{p_2}(k)$.

Moreover, it follows from the postulate of
invariance of the field $\Phi(x)$ under the action of the Poincaré
group (see postulate (2)) that the holomorphic
function $H_{p_2}(k)$ depends only on the two complex variables
$\zeta = k^2\ (= k_0^2 - \mathbf{k}^2)$ and $k \cdot p_2$, or equivalently
$s = (k + p_2)^2 = \zeta + m_2^2 + 2k \cdot p_2$; it thus defines a
corresponding holomorphic function $H_{p_2}(\zeta, s) = H_{p_2}(k)$ in the
image of $D$ in these variables.

In view of the reduction formula [27], the
scattering function $F_0(s)$ should appear as the
restriction of the holomorphic function $H_{p_2}(\zeta, s)$ to
the physical mass-shell value $\zeta = m_1^2$. However, it
turns out that the section of $D$ by the complex
mass-shell manifold $\mathcal{M}^{(c)}$ with equation $k^2 = m_1^2$ is empty:
this geometrical fact is responsible for the
nontriviality of the proof of dispersion relations for
the physical quantity $F_0(s)$ on the mass shell. In
fact, the tube $\mathcal{T}^+ \cup \mathcal{T}^-$, which constitutes the basic
part of the domain $D$ and is given by the
field-theoretical microcausality postulate, is a “purely
off-shell” complex domain, as can be easily checked: if a
complex point $k = p + iq$ is such that $q^2 > 0$, the
corresponding squared mass $\zeta = k^2 = p^2 - q^2 + 2i\, p \cdot q$
is real if and only if $p \cdot q = 0$, which implies $p^2 \le 0$
(i.e., $p$ spacelike) and therefore $\zeta = p^2 - q^2 < 0$.
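The “purely off-shell” character of the tube can also be probed numerically. The following is a minimal sketch, assuming nothing beyond the definitions above: it draws random timelike vectors $q \in V^+$, builds real vectors $p$ with $p \cdot q = 0$ by a Minkowskian projection, and evaluates $\zeta = (p + iq)^2$. The function names and the sampling ranges are illustrative choices, not taken from the article.

```python
import random


def mdot(a, b):
    """Minkowskian scalar product a0*b0 - (spatial dot product)."""
    return a[0] * b[0] - sum(x * y for x, y in zip(a[1:], b[1:]))


def squared_mass(p, q):
    """zeta = k^2 for the complex 4-vector k = p + i q."""
    k = [complex(pj, qj) for pj, qj in zip(p, q)]
    return k[0] ** 2 - sum(kj ** 2 for kj in k[1:])


def orthogonal_part(p, q):
    """Component of p that is Minkowski-orthogonal to the timelike vector q."""
    c = mdot(p, q) / mdot(q, q)
    return [pj - c * qj for pj, qj in zip(p, q)]


def scan_real_squared_masses(n_samples=1000, seed=0):
    """Return (largest Re zeta, largest |Im zeta|) over random points of the
    forward tube at which zeta is real by construction (p . q = 0)."""
    rng = random.Random(seed)
    max_real = float("-inf")
    max_abs_imag = 0.0
    for _ in range(n_samples):
        q_space = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        q_norm = sum(x * x for x in q_space) ** 0.5
        q = [q_norm + rng.uniform(0.1, 1.0)] + q_space  # q0 > |q|, so q in V+
        p = orthogonal_part([rng.uniform(-5.0, 5.0) for _ in range(4)], q)
        zeta = squared_mass(p, q)
        max_real = max(max_real, zeta.real)
        max_abs_imag = max(max_abs_imag, abs(zeta.imag))
    return max_real, max_abs_imag


if __name__ == "__main__":
    print(scan_real_squared_masses())
```

In every sample the imaginary part of $\zeta$ vanishes up to rounding and the real part is strictly negative, in line with the argument that no point of the tube can lie on the mass shell $\zeta = m_1^2 > 0$.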


“Off-shell dispersion relations” as a first step The
starting point, which is easy to obtain from the
domain $D$, is the analyticity of the holomorphic
function $H_{p_2}(\zeta, s)$ in a cut-plane of the variable $s$ for
all negative values of the squared mass variable $\zeta$.
This cut-plane $\Delta_\zeta$ is always the complement in $\mathbb{C}$
(i.e., the complex $s$-plane) of the union of the $s$-cut
($s$ real $\ge (m_1 + m_2)^2$) and of the $u$-cut
($u = 2\zeta + 2m_2^2 - s$ real $\ge (m_1 + m_2)^2$). This analyticity property
thus justifies “off-shell dispersion relations” at fixed
negative values of $\zeta$ for the field-theoretical structure
function $H_{p_2}(\zeta, s)$.

The latter property and the subsequent analysis
concerning the process of analytic continuation of $H_{p_2}$
to positive values of $\zeta$ will be more easily understood
geometrically if one reduces the complex space of $k$ to
a two-dimensional complex space, which is legitimate
in view of the equality $H_{p_2}(k) = H_{p_2}(\zeta, s)$.

Having chosen the $k_0$-axis along $p_2$, we reduce the
orthogonal space coordinates $\mathbf{k}$ of $k$ to the radial
variable $k_r$. One thus gets the following expressions
of the variables $\zeta$ and $s$ (resp. $u$):

$$\zeta = k_0^2 - k_r^2, \qquad s = \zeta + m_2^2 + 2m_2 k_0 \quad (\text{resp. } u = \zeta + m_2^2 - 2m_2 k_0)$$

Then we can write $H_{p_2}(\zeta, s) = H_{p_2}(k_0, k_r) = H_{p_2}(k_0, -k_r)$,
and describe the image $D_r$ of the
domain $D$ in the variables $k = (k_0, k_r) = p + iq$ as
$\mathcal{T}_r^+ \cup \mathcal{T}_r^- \cup \mathcal{N}(\mathcal{R}_r)$, where:

1. $\mathcal{T}_r^\pm$ is defined by the condition $q^2 = q_0^2 - q_r^2 > 0$,
$q_0 > 0$ or $q_0 < 0$,

2. $\mathcal{N}(\mathcal{R}_r)$ is a complex neighborhood of the real region
$\mathcal{R}_r$ defined as follows. Let $h_s^+$, $h_u^-$ be the two
branches of hyperbolae with respective equations:

$$h_s^+ : (p_0 + m_2)^2 - p_r^2 = (m_1 + m_2)^2, \quad p_0 + m_2 > 0$$

$$h_u^- : (p_0 - m_2)^2 - p_r^2 = (m_1 + m_2)^2, \quad p_0 - m_2 < 0$$

Then $\mathcal{R}_r$ is the intersection of the region situated
below $h_s^+$ and of the region situated above $h_u^-$.


Let us now consider any complex hyperbola
$h^{(c)}[\zeta]$ with equation $k^2 = k_0^2 - k_r^2 = \zeta$. On such a
complex curve either one of the variables $k_0$ or $s$ or
$u$ is a good parameter for holomorphic functions
which are even in $k_r$, like $H_{p_2}(k_0, k_r)$. If $\zeta$ is real, any
complex point $k = p + iq$ of $h^{(c)}[\zeta]$ is such that $p^2$
and $q^2$ have opposite signs (since $p \cdot q = 0$).
Therefore, the sign of $q^2$ is always opposite to the sign of
$\zeta\ (= p^2 - q^2)$: if $\zeta$ is negative, all the complex points
of $h^{(c)}[\zeta]$ thus belong to $\mathcal{T}_r^+ \cup \mathcal{T}_r^-$; the union of all
these points with the real points of $h^{(c)}[\zeta]$ in $\mathcal{R}_r$ is
therefore a subset of $D_r$, which is represented in the
complex plane of $s$ by the cut-plane $\Delta_\zeta$. The function
$H_{p_2}(\zeta, s)$ is therefore analytic (and univalent) in $\Delta_\zeta$
for each $\zeta < 0$. Moreover, the existence of moderate
bounds of type [12] on $H_{p_2}$ in $D$ (resulting from the
temperateness assumption) then implies the validity
of dispersion relations (with subtractions) for
$H_{p_2}(\zeta, s)$ in $\Delta_\zeta$.
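The statement that, for negative $\zeta$, every non-real value of $s$ corresponds to a point of the tube can be tested directly. The sketch below assumes the reduced variables introduced above ($\zeta = k_0^2 - k_r^2$, $s = \zeta + m_2^2 + 2 m_2 k_0$); the function names are illustrative, and the choice of square-root branch for $k_r$ is immaterial because the functions considered are even in $k_r$.

```python
import cmath


def point_on_hyperbola(zeta, s, m2):
    """Return (k0, kr) on the complex hyperbola k0^2 - kr^2 = zeta
    corresponding to the given value of s = zeta + m2^2 + 2*m2*k0."""
    k0 = (s - zeta - m2 * m2) / (2.0 * m2)
    kr = cmath.sqrt(k0 * k0 - zeta)  # the other branch, -kr, gives the same q^2
    return k0, kr


def imaginary_part_norm(k0, kr):
    """q^2 = q0^2 - qr^2 for q = Im k in the reduced (k0, kr) variables."""
    return k0.imag ** 2 - kr.imag ** 2


def in_tube(zeta, s, m2):
    """True if the point of the hyperbola labelled by s has Im k timelike."""
    k0, kr = point_on_hyperbola(zeta, complex(s), m2)
    return imaginary_part_norm(k0, kr) > 0.0


if __name__ == "__main__":
    m2 = 1.0
    for s in (complex(3.0, 0.2), complex(-7.0, -1.5), complex(0.3, 4.0)):
        print(s, in_tube(-0.5, s, m2), in_tube(0.5, s, m2))
```

For $\zeta < 0$ the test succeeds for every non-real $s$, whereas for $\zeta > 0$ it fails on part of the $s$-plane (for instance near $s = \zeta + m_2^2$), which is why the continuation to positive $\zeta$ requires the additional analytic completion discussed next.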


The problem of analytic completion to the complex
mass-shell hyperbola $h^{(c)}[m_1^2]$: what is provided by
the Jost–Lehmann–Dyson domain A basic fact in
complex geometry in $n$ variables, with $n \ge 2$, is the
existence of a distinguished class of domains, called
holomorphy domains: for each domain $U$ in this
class, there exists at least one function which is
holomorphic in $U$ and cannot be analytically
continued at any point of the boundary of $U$. In
one dimension, every domain is a holomorphy
domain. In dimension larger than one, a general
domain $U$ is not a holomorphy domain, but it
admits a holomorphy envelope $\hat U$, which is a
holomorphy domain containing $U$, such that every
function holomorphic in $U$ admits an analytic
continuation in $\hat U$.

It turns out that the domain $D_r$ (considered above
in the last subsection) is not a holomorphy domain;
its holomorphy envelope $\hat D_r$ (obtained geometrically
by Bros, Messiah, and Stora in 1961) coincides with
a domain introduced by Jost–Lehmann (1957) and
Dyson (1958) by methods of wave equations. This
domain can be characterized as the union of $D_r$ with
all the complex points of all the hyperbolae with
equations $(k_0 - a)^2 - (k_r - b)^2 = c^2$ (for all $a, b, c$
real, including the complex straight lines for which
$c = 0$) both of whose branches have a nonempty
intersection with the real region $\mathcal{R}_r$.

In particular, one easily sees that all the
hyperbolae $h^{(c)}[\zeta]$ with $0 \le \zeta < m_1^2$ belong to the previous
class. It follows that for any $\zeta$ in this positive
interval, the function $H_{p_2}(\zeta, s)$ can still be
analytically continued as a holomorphic function of
$s$ in the cut-plane $\Delta_\zeta$ and thereby satisfies the
corresponding dispersion relations.
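The membership of these hyperbolae in the Jost–Lehmann–Dyson class can be checked on the axis $p_r = 0$. The short verification below is a sketch: it writes $\mathcal{R}_r$ for the real region lying below the branch $(p_0 + m_2)^2 - p_r^2 = (m_1 + m_2)^2$, $p_0 + m_2 > 0$, and above the branch $(p_0 - m_2)^2 - p_r^2 = (m_1 + m_2)^2$, $p_0 - m_2 < 0$, and $h^{(c)}[\zeta]$ for the hyperbola $k_0^2 - k_r^2 = \zeta$ (this labelling is assumed here for readability).

```latex
\begin{gather*}
\mathcal{R}_r \cap \{p_r = 0\}
 = \{\, p_0 :\ m_2 - (m_1 + m_2) < p_0 < -m_2 + (m_1 + m_2) \,\}
 = \{\, p_0 :\ -m_1 < p_0 < m_1 \,\}, \\
h^{(c)}[\zeta] \cap \{p_r = 0\} = \{\, p_0 = \pm\sqrt{\zeta} \,\}, \qquad
\pm\sqrt{\zeta} \in\, ]-m_1, m_1[ \;\iff\; 0 \le \zeta < m_1^2 .
\end{gather*}
```

For $0 \le \zeta < m_1^2$ the vertices of both real branches (with $a = b = 0$ and $c^2 = \zeta$ in the characterization above) thus lie in $\mathcal{R}_r$, while at $\zeta = m_1^2$ they reach its boundary, which is why the mass shell itself only appears as a limiting case.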

The physical mass-shell hyperbola $h^{(c)}[m_1^2]$ thus
appears as a limiting case of the previous family (for
$\zeta$ tending to $m_1^2$ from below). The analyticity of
$H_{p_2}(m_1^2, s)$ in $\Delta_{m_1^2}$ can then be justified provided one
knows that this function is analytic at at least one
point of $\Delta_{m_1^2}$: but this additional information results
from a more thorough exploitation of the analyticity
properties resulting from the QFT postulates. This
will now be briefly outlined.


Further information coming from the four-point
function in complex momentum space It is
possible to obtain further analyticity properties of
$H_{p_2}(\zeta, s) = H_{p_2}(k)$ by considering the latter as
the restriction to the submanifold $k_1 = -k_3 = k$;
$k_2 = -k_4 = p_2$ of a master analytic function
$H_4(k_1, k_2, k_3, k_4)$, called the four-point function of
the field $\Phi$ in complex energy-momentum space (see
Scattering in Relativistic Quantum Field Theory:
The Analytic Program). This function is
holomorphic in a well-defined primitive domain $D_4$ of
the linear submanifold $k_1 + k_2 + k_3 + k_4 = 0$. It is
then possible to compute some local parts situated
near the reals of the holomorphy envelope of $D_4$,
which implies, as a by-product, that the function
$H_{p_2}(\zeta, s)$ can be analytically continued in a set $\Sigma$ of
the form

$$\Sigma = \{(\zeta, s);\ \zeta \in \delta,\ s \in \mathcal{V}_{s_1}(\zeta)\} \cup \{(\zeta, s);\ \zeta \in \delta,\ u = 2\zeta + 2m_2^2 - s \in \mathcal{V}_{u_1}(\zeta)\} \qquad [30]$$

with the following specifications:

1. $\delta$ is a domain in the $\zeta$-plane, which is a complex
neighborhood of a real interval of the form
$-a < \zeta < M_1^2$; here $M_1$ denotes a spectral mass
threshold in the theory such that $M_1 > m_1$;

2. for each $\zeta$, $\mathcal{V}_{s_1}(\zeta)$ (resp. $\mathcal{V}_{u_1}(\zeta)$) is a
cut-neighborhood in the $s$-plane of the real half-line
$s > s_1$ (resp. of the half-line $u = 2\zeta + 2m_2^2 - s$
real $> u_1$); $s_1$ and $u_1$ denote appropriate real
numbers independent of $\zeta$.


The final analytic completion: crossing domains on
$h^{(c)}[m_1^2]$. Dispersion relations for $\pi^0$–$\pi^0$ meson
scattering and “quasi-dispersion-relations” for
proton–proton scattering We now wish to describe
briefly the final step of analytic completion, which
displays the existence of a “quasi-cut-plane domain”
in $s$ for the function $H_{p_2}(m_1^2, s)$, even in the more
general case when the $s$-cut and $u$-cut are associated
with different scattering channels, whose respective
mass thresholds $s = M_{12}^2$ and $u = M_{\bar 1 2}^2$ are unequal.
This general situation may occur as soon as one
charged particle $\Pi_1$ of the $s$-channel is replaced by
the corresponding antiparticle $\bar\Pi_1$ in the $u$-channel,
in contrast with the case of neutral particles (like the
$\pi^0$ meson) which coincide with their own
antiparticles. Here it is important to note that the two real
branches $h^+[m_1^2]$ and $h^-[m_1^2]$ of the mass-shell
hyperbola $h^{(c)}[m_1^2]$ correspond, respectively, to the
physical region of the “direct scattering channel” of
the reaction $\Pi_1 + \Pi_2 \to \Pi_1 + \Pi_2$ with squared total
energy $s$, and to the physical region of the “crossed
scattering channel” of the reaction
$\bar\Pi_1 + \Pi_2 \to \bar\Pi_1 + \Pi_2$ with squared total energy $u$. A typical and
important example is the case of proton–proton
scattering in the $s$-channel, where $M_{12}$ equals twice
the mass $m\ (= m_1 = m_2)$ of the proton, while the
corresponding $u$-channel refers to proton–antiproton
scattering, whose threshold $M_{\bar 1 2}$ equals twice
the mass $\mu$ of the $\pi$ meson.

In that general case, the analysis of the subsection
“‘Off-shell dispersion relations’ as a first step” still
applies, so that the function $H_{p_2}(\zeta, s)$ is always
analytic in a set of the form

$$S_0 = \{(\zeta, s);\ \zeta \in\, ]-a, 0[,\ s \in \Delta_\zeta\} \qquad [31]$$


Then, the additional information described above in 
the last subsection allows one to use the following 
crucial property of analytic completion, which we 
call 


Crossing lemma If a function $G(\zeta, s)$ is holomorphic
in a domain which contains the union of the sets $\Sigma$
and $S_0$ (see eqns [30] and [31]), then it admits an
analytic continuation in a set of the following form:

$$\{(\zeta, s);\ \zeta \in \delta,\ s \in \Delta_\zeta,\ |s - \zeta - m_2^2| = |u - \zeta - m_2^2| > R(\zeta)\}$$

By applying this property to the function $H_{p_2}(\zeta, s)$
and restricting $\zeta$ to the mass-shell value $m_1^2$, which
belongs to $\delta$, one obtains the analyticity of the
scattering function $F_0(s) = H_{p_2}(m_1^2, s)$ in a crossing
domain of the complex mass-shell hyperbola
$h^{(c)}[m_1^2]$: the crossing between the two physical
regions $h^+[m_1^2]$ ($s \ge M_{12}^2$) and $h^-[m_1^2]$ ($u \ge M_{\bar 1 2}^2$) is
ensured by a complex domain of $h^{(c)}[m_1^2]$ whose image
in the $s$-plane is the “cut-neighborhood of infinity”
$\{s;\ s \in \Delta_{m_1^2},\ |s - m_1^2 - m_2^2| = |u - m_1^2 - m_2^2| > R(m_1^2)\}$.
Note that the relevant boundary values of $F_0$ for
obtaining the scattering amplitudes of the two
collision processes with respective physical regions
$h^+[m_1^2]$ and $h^-[m_1^2]$ have to be taken from the
respective sides $\operatorname{Im} s > 0$ and $\operatorname{Im} u = -\operatorname{Im} s > 0$ of the
corresponding $s$- and $u$-cuts.


It is only for the neutral case, where
$M_{12} = M_{\bar 1 2} = m_1 + m_2$, that a more favorable scenario
occurs, as explained earlier: in this case, the interval
$\{\zeta \in\, ]-a, 0[\}$ of the set $S_0$ is replaced by
$\{\zeta \in\, ]-a, m_1^2[\}$, so that the whole cut-plane domain
$\Delta_{m_1^2}$ is obtained in the result of the previous crossing
lemma. The scattering amplitudes of $\pi^0$–$\pi^0$ meson
scattering and of $\pi^0$ meson–proton scattering enjoy
this property and, therefore, satisfy genuine
dispersion relations in which the scattering function is
even (see the second basic example described at the
beginning of this article). In the general case of
crossing domains obtained above, corresponding
Cauchy integral relations have been written and
used under the name of “quasi-dispersion-relations.”


Complementary results Some comments can now 
be added concerning the passage from the purely 
geometrical results (i.e. analyticity domains) 
described above to the writing of precise (quasi-) 
dispersion relations with two subtractions: 

Polynomial bounds and dispersion relations with
N subtractions The previous methods of analytic
completion also allow one to control the bounds at
infinity in the relevant complex domains. As was
noticed after eqn [24], the Fourier–Laplace
transforms of the retarded and advanced kernels, and
thereby the holomorphic functions $H_{p_2}^\pm(k)$ discussed
at the start of this section, are bounded at most by a
power of a suitable norm of $k$ in their respective tubes
$\mathcal{T}^\pm$. Correspondingly, the holomorphic function
$H_{p_2}(k)$ (resp. $H_{p_2}(k_0, k_r)$) admits the same type of
bound in its primitive analyticity domain $D$ (resp.
$D_r$). These bounds are a consequence of the tempered
distribution character of the structure functions of the
fields, which is built into the Wightman
field-theoretical framework. Then it can be checked that
in the holomorphy envelope $\hat D_r$ of $D_r$, and thereby in
the cut-plane (or crossing) domains obtained in the
intersection of $\hat D_r$ and of the complex mass shell
$h^{(c)}[m_1^2]$, the same type of power bound is still valid:
$F_0(s)$ is therefore bounded by some power of $|s|$
and thus satisfies a (quasi-)dispersion relation with N
subtractions. The same type of argument holds for all
the similar cut-domains (or crossing domains) in $s$
obtained for $F_t(s)$ for all negative values of $t$.

It is also worthwhile to mention that a similar 
remarkable (since not at all predictable) result was 
also obtained in the Haag, Kastler, and Araki frame- 
work of algebraic QFT (Epstein, Glaser, Martin, 
1969; see Scattering in Relativistic Quantum Field 
Theory: The Analytic Program for further comments). 

In this connection, one can also mention a more 
recent result. In the Buchholz—Fredenhagen axio- 
matic approach of charged fields (1982), in which 
locality is replaced by the more general notion of
“stringlike locality” (see Algebraic Approach to 
Quantum Field Theory, Axiomatic Quantum Field 
Theory, and Scattering in Relativistic Quantum 
Field Theory: Fundamental Concepts and Tools), a 
proof of forward dispersion relations has again been 
obtained (Bros, Epstein, 1994). 

The extension of the analyticity domains by 
positivity and the derivation of bounds by unitarity 
(Martin 1966; see the book by Martin (1969)). The 
following ingredients have been used: 


1. Positivity conditions on the absorptive part of
$F(s,t)$, which are expressed by the infinite set of
inequalities $(d/dt)^n \, \mathrm{Im}\, F(s,t)|_{t=0} \ge 0$ (for all
integers $n \ge 0$),

2. The existence of a two-dimensional complex
neighborhood of some point $(s = s_0, t = 0)$ in the
analyticity domain resulting from QFT.


The following results have then been obtained: 


(a) It is justified to differentiate the forward (sub- 
tracted) dispersion relations with respect to t at 
any order. 

(b) $F(s,t)$ can be analytically continued in a fixed
circle $|t| < t_{\max}$ for all values of $s$. The latter
implies the extension of dispersion relations in $s$
to positive (and complex) values of $t$.

(c) In a last step, the use of unitarity conditions for
the “partial waves” $f_\ell(s)$ of $F(s,t)$ (see Scattering
in Relativistic Quantum Field Theory: The
Analytic Program) allows one to obtain
Froissart-type bounds on the scattering amplitudes
and thereby to justify the writing of
(quasi-)dispersion relations with at most two
subtractions for all the admissible values of $t$.


See also: Algebraic Approach to Quantum Field Theory;
Axiomatic Quantum Field Theory; Perturbation Theory
and its Techniques; Scattering in Relativistic Quantum
Field Theory: The Analytic Program; Scattering,
Asymptotic Completeness and Bound States; Scattering
in Relativistic Quantum Field Theory: Fundamental
Concepts and Tools.


Dissipative Dynamical Systems of Infinite Dimension


Introduction


A dynamical system (DS) is a system which evolves
with respect to time. To be more precise, a DS
$(S(t), \Phi)$ is determined by a phase space $\Phi$ which
consists of all possible values of the parameters
describing the state of the system and an evolution
map $S(t) : \Phi \to \Phi$ that allows one to find the state of
the system at time $t > 0$ if the initial state at $t = 0$ is
known. Very often, in mechanics and physics, the
evolution of the system is governed by systems of
differential equations. If the system is described by
ordinary differential equations (ODEs),


$y'(t) = F(t, y(t)), \quad y(0) = y_0, \quad y(t) := (y_1(t), \ldots, y_N(t))$   [1]


for some nonlinear function $F : \mathbb{R}_+ \times \mathbb{R}^N \to \mathbb{R}^N$, we
have a so-called finite-dimensional DS. In that case,
the phase space $\Phi$ is some (invariant) subset of $\mathbb{R}^N$
and the evolution operator $S(t)$ is defined by


$S(t) y_0 := y(t)$, where $y(t)$ solves [1]   [2]


We also recall that, in the case where eqn [1] is
autonomous (i.e., does not depend explicitly on the
time), the evolution operators $S(t)$ generate a
semigroup on the phase space $\Phi$, that is,


$S(t_1 + t_2) = S(t_1) \circ S(t_2), \quad t_1, t_2 \in \mathbb{R}_+$   [3]
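
The semigroup identity can be checked by hand on equations whose solution is known in closed form. The sketch below is only an illustration under stated assumptions: the logistic equation y' = y(1 - y) and the nonautonomous equation y' = t are examples chosen here (they do not appear in the article), and the function names are hypothetical. For the autonomous equation the defect in the semigroup identity vanishes up to rounding, while for the nonautonomous one it equals t1 * t2.

```python
import math


def flow_autonomous(t, y0):
    """Closed-form solution at time t of y' = y (1 - y), y(0) = y0 (assumed example, y0 > 0)."""
    e = math.exp(t)
    return y0 * e / (1.0 - y0 + y0 * e)


def flow_nonautonomous(t, y0):
    """Solution at time t of y' = t started from y(0) = y0 (assumed example)."""
    return y0 + 0.5 * t * t


def semigroup_defect(flow, t1, t2, y0):
    """Absolute difference between S(t1 + t2) y0 and S(t1) S(t2) y0."""
    return abs(flow(t1 + t2, y0) - flow(t1, flow(t2, y0)))


if __name__ == "__main__":
    for y0 in (0.1, 0.5, 2.0):
        print("autonomous   ", y0, semigroup_defect(flow_autonomous, 0.7, 1.3, y0))
        print("nonautonomous", y0, semigroup_defect(flow_nonautonomous, 0.7, 1.3, y0))
```

In the nonautonomous case the map "start at time 0 and evolve for time t" does not compose with itself, which is why the semigroup property is stated only for autonomous equations.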

Now, in the case of a distributed system whose
initial state is described by functions $u_0 = u_0(x)$
depending on the spatial variable $x$, the evolution
is usually governed by partial differential equations
(PDEs) and the corresponding phase space $\Phi$ is some
infinite-dimensional function space (e.g., $\Phi := L^2(\Omega)$
or $\Phi := L^\infty(\Omega)$ for some domain $\Omega \subset \mathbb{R}^N$). Such DSs
are usually called infinite dimensional.

The qualitative study of DSs of finite dimensions 
goes back to the beginning of the twentieth century, 
with the pioneering works of Poincaré on the N- 
body problem (one should also acknowledge the 
contributions of Lyapunov on the stability and of 
Birkhoff on the minimal sets and the ergodic 
theorem). One of the most surprising and significant 
facts discovered at the very beginning of the theory 
is that even relatively simple equations can generate 
very complicated chaotic behaviors. Moreover, these 
types of systems are extremely sensitive to initial 
conditions (the trajectories with close but different 
initial data diverge exponentially). Thus, in spite of 
the deterministic nature of the system (we recall that 
it is generated by a system of ODEs, for which we 
usually have the unique solvability theorem), its 
temporal evolution is unpredictable on timescales 
larger than some critical time Tp (which depends 
obviously on the error of approximation and on the 
rate of divergence of close trajectories) and can 
show typical stochastic behaviors. To the best of our 
knowledge, one of the first ODEs for which such 
types of behaviors were established is the physical 
pendulum parametrically perturbed by time-periodic 
external forces, 


$y''(t) + \sin(y(t)) \, (1 + \varepsilon \sin(\omega t)) = 0$   [4]


where $\omega$ and $\varepsilon > 0$ are physical parameters. We also
mention the more recent (and more relevant for our
topic) famous example of the Lorenz system which is
defined by the following system of ODEs in $\mathbb{R}^3$:


$x' = \sigma (y - x)$
$y' = -xz + rx - y$   [5]
$z' = xy - bz$


where $\sigma$, $r$, and $b$ are some parameters. These
equations are obtained by truncation of the 
Navier-Stokes equations and give an approximate 
description of a horizontal fluid layer heated from 
below. The warmer fluid formed at the bottom 
tends to rise, creating convection currents. This is 
similar to what happens in the Earth’s atmosphere. 
For a sufficiently intense heating, the time evolution 
has a sensitive dependence on the initial conditions, 
thus representing a very irregular and chaotic 


convection. This fact was used by Lorenz to justify 
the so-called “butterfly effect,” a metaphor for the 
imprecision of weather forecast. 
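
The sensitive dependence on initial conditions can be observed numerically. The following is a minimal sketch, not taken from the article: it integrates the standard form of the Lorenz system with a classical fourth-order Runge-Kutta scheme and records the distance between two trajectories whose initial data differ by 1e-8. The parameter values sigma = 10, r = 28, b = 8/3, the step size, and the initial point are assumptions chosen because they are commonly used for the chaotic regime.

```python
import math


def lorenz(state, sigma=10.0, r=28.0, b=8.0 / 3.0):
    """Right-hand side of the Lorenz system (parameter values are assumptions)."""
    x, y, z = state
    return (sigma * (y - x), -x * z + r * x - y, x * y - b * z)


def rk4_step(f, state, dt):
    """One classical Runge-Kutta step of size dt for the autonomous system u' = f(u)."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b_ + 2.0 * c + d)
        for s, a, b_, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_history(u0, v0, dt=0.01, steps=4000):
    """Distances between the two numerical trajectories started at u0 and v0."""
    u, v = u0, v0
    history = [math.dist(u, v)]
    for _ in range(steps):
        u = rk4_step(lorenz, u, dt)
        v = rk4_step(lorenz, v, dt)
        history.append(math.dist(u, v))
    return history


if __name__ == "__main__":
    hist = separation_history((1.0, 1.0, 1.0), (1.0, 1.0, 1.0 + 1e-8))
    for k in range(0, len(hist), 500):
        print(f"t = {k * 0.01:5.1f}   separation = {hist[k]:.3e}")
```

The printed separations grow by many orders of magnitude before saturating at the size of the attractor, which is the practical meaning of a finite predictability time.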

The theory of DSs in finite dimensions was
extensively developed during the twentieth century,
due to the efforts of many famous mathematicians 
(such as Anosov, Arnold, LaSalle, Sinai, Smale, etc.) 
and, nowadays, much is known on the chaotic 
behaviors in such systems, at least in low dimen- 
sions. In particular, it is known that, very often, the 
trajectories of a chaotic system are localized, up to a 
transient process, in some subset of the phase space 
having a very complicated fractal geometric structure
(e.g., locally homeomorphic to the Cartesian
product of $\mathbb{R}^n$ and some Cantor set) which, thus,
accumulates the nontrivial dynamics of the system
(the so-called strange attractor). The chaotic
dynamics on such sets are usually described by 
symbolic dynamics generated by Bernoulli shifts on 
the space of sequences. We also note that, nowadays,
mathematicians have a large number of
different concepts and methods for the extensive
study of concrete chaotic DSs in finite dimensions.
In particular, we mention here different types of 
bifurcation theories (including the KAM theory and 
the homoclinic bifurcation theory with related 
Shilnikov chaos), the theory of hyperbolic sets, 
stochastic description of deterministic processes, 
Lyapunov exponents and entropy theory, dynamical 
analysis of time series, etc. 

We now turn to infinite-dimensional DSs gener- 
ated by PDEs. A first important difficulty which 
arises here is related to the fact that the analytic 
structure of a PDE is essentially more complicated 
than that of an ODE and, in particular, we do not 
have in general the unique solvability theorem as for 
ODEs, so that even finding the proper phase space 
and the rigorous construction of the associated DS 
can be a highly nontrivial problem. In order to 
indicate the level of difficulties arising here, it 
suffices to recall that, for the three-dimensional 
Navier-Stokes system (which is one of the most 
important equations of mathematical physics), the 
required associated DS has not been constructed yet. 
Nevertheless, there exists a large number of equa- 
tions for which the problem of the global existence 
and uniqueness of a solution has been solved. Thus, 
the question of extending the highly developed 
finite-dimensional DS theory to infinite dimensions 
arises naturally. 

One of the first and most significant results in that 
direction was the development of the theory of 
integrable Hamiltonian systems in infinite dimen- 
sions and the explicit resolution (by inverse-scattering 
methods) of several important conservative equations 
of mathematical physics (such as the Korteweg-de
Vries (and the generalized Kadomtsev-Petviashvili
hierarchy), the sine-Gordon, and the nonlinear
Schrödinger equations). Nevertheless, it is worth
noting that integrability is a very rare phenomenon, 
even among ODEs, and this theory is clearly 
insufficient to understand the dynamics arising in 
PDEs. In particular, there exist many important 
equations which are essentially out of reach of this 
theory. 

One of the most important classes of 
such equations consists of the so-called dissipative 
PDEs which are the main subject of our study. As 
hinted by this denomination, these systems exhibit 
some energy dissipation process (in contrast to 
conservative systems for which the energy is 
preserved) and, of course, in order to have nontrivial 
dynamics, these models should also account for the 
energy income. Roughly speaking, the complicated 
chaotic behaviors in such systems usually arise from 
the interaction of the following mechanisms: 


1. energy dissipation in the higher part of the 
Fourier spectrum; 

2. external energy income in its lower part; 

3. energy flux from lower to higher Fourier modes 
provided by the nonlinear terms of the equation. 


We chose not to give a rigorous definition of a 
dissipative system here (although the concepts of 
energy dissipation and related dissipative systems 
are more or less obvious from the physical point of 
view, they seem too general to have an adequate 
mathematical definition). Instead, we only indicate 
several basic classes of equations of mathematical 
physics which usually exhibit the above behaviors. 

The first example is, of course, the Navier-Stokes
system, which describes the motion of a viscous
incompressible fluid in a bounded domain $\Omega$ (we
will only consider here the two-dimensional case
$\Omega \subset \mathbb{R}^2$, since the adequate formulation in three
dimensions is still an open problem):


$\partial_t u - (u, \nabla_x) u = \nu \Delta_x u + \nabla_x p + g(x)$
$\operatorname{div} u = 0, \quad u|_{t=0} = u_0, \quad u|_{\partial\Omega} = 0$   [6]

Here, $u(t,x) = (u_1(t,x), u_2(t,x))$ is the unknown
velocity vector, $p = p(t,x)$ is the unknown pressure,
$\Delta_x$ is the Laplacian with respect to $x$, $\nu > 0$ and $g$ are
given kinematic viscosity and external forces,
respectively, and $(u, \nabla_x) u$ is the inertial term
($[(u, \nabla_x) u]_i = \sum_{j=1}^{2} u_j \partial_{x_j} u_i$, $i = 1, 2$). The unique global
solvability of [6] has been proved by Ladyzhenskaya.
Thus, this equation generates an infinite-dimensional
DS in the phase space $\Phi$ of divergence-free
square-integrable vector fields.


The second example is the damped nonlinear
wave equation in $\Omega \subset \mathbb{R}^n$:


$\partial_t^2 u + \gamma \partial_t u - \Delta_x u + f(u) = 0$
$u|_{\partial\Omega} = 0, \quad u|_{t=0} = u_0, \quad \partial_t u|_{t=0} = u_0'$   [7]


which models, for example, the dynamics of a
Josephson junction driven by a current source
(sine-Gordon equation). It is known that, under
natural sign and growth assumptions on the nonlinear
interaction function $f$, this equation generates
a DS in the energy phase space $E$ of pairs of
functions $(u, \partial_t u)$ such that $\partial_t u$ and $\nabla_x u$ are square
integrable.

The last class of equations that we will consider
here consists of reaction-diffusion systems in a
domain $\Omega \subset \mathbb{R}^n$:


$\partial_t u = a \Delta_x u - f(u), \quad u|_{t=0} = u_0$   [8]


(endowed with Dirichlet ($u|_{\partial\Omega} = 0$) or Neumann
($\partial_n u|_{\partial\Omega} = 0$) boundary conditions), which describes
some chemical reaction in $\Omega$. Here, $u = (u^1, \ldots, u^N)$
is an unknown vector-valued function which
describes the concentrations of the reactants, $f(u)$ is
a given interaction function, and $a$ is a diffusion
matrix. It is known that, under natural assumptions
on $f$ and $a$, these equations also generate an
infinite-dimensional DS, for example, in the phase space
P= (OT. 

We emphasize once more that the phase spaces $\Phi$ in
all these examples are appropriate infinite-dimensional 
function spaces. Nevertheless, it was observed in 
experiments that, up to a transient process, the 
trajectories of the DS considered are localized inside 
a “very thin” invariant subset of the phase space 
having a complicated geometric structure which, thus, 
accumulates all the nontrivial dynamics of the system. 
It was conjectured a little later that these invariant sets 
are, in some proper sense, finite dimensional and that 
the dynamics restricted to these sets can be effectively 
described by a finite number of parameters. Thus 
(when this conjecture is true), in spite of the infinite- 
dimensional initial phase space, the effective dynamics 
(reduced to this invariant set) is finite dimensional and 
can be studied by using the algorithms and concepts of 
the classical finite-dimensional DS theory. In particu- 
lar, this means that the infinite dimensionality plays 
here only the role of (possibly essential) technical 
difficulties, which cannot, however, produce any new 
dynamical phenomena which are not observed in the 
finite-dimensional theory. 

The above finite-dimensional reduction principle 
of dissipative PDEs in bounded domains has been 
given solid mathematical grounds (based on the 
concept of the so-called global attractor) over the 
last three decades, starting from the pioneering
papers of Ladyzhenskaya. This theory is considered 
in more detail here. 

The finite-dimensional reduction theory has some 
limitations. Of course, the first and most obvious 
restriction of this principle is the effective dimension 
of the reduced finite-dimensional DS. Indeed, it is
known that, typically, this dimension grows at least
linearly with respect to the volume $\mathrm{vol}(\Omega)$ of the
spatial domain $\Omega$ of the DS considered (and the
growth of the size of $\Omega$ is the same (up to a
rescaling) as the decay of the viscosity coefficient $\nu$
or the diffusion matrix $a$, see eqns [6]-[8]). So, for
sufficiently large domains $\Omega$, the reduced DS can be
too large for reasonable investigations.

The next, less obvious, but much more essential,
restriction is the growing spatial complexity of the
DS. Indeed, as shown by Babin-Bunimovich, the
spatial complexity of the system (e.g., the number of
topologically different equilibria) grows exponentially
with respect to $\mathrm{vol}(\Omega)$. Thus, even in the case
of relatively small dimensions, the reduced system
can be beyond the reach of reasonable investigation, due to its
extremely complicated structure.

Therefore, the approach based on the finite- 
dimensional reduction does not look so attractive 
for large domains. It seems, instead, more natural, at 
least from the physical point of view, to replace large 
bounded domains by their limit unbounded ones 
(e.g., $\Omega = \mathbb{R}^n$ or cylindrical domains). Of course, this
approach requires a systematic study of dissipative 
DSs associated with PDEs in unbounded domains. 

The dynamical study of PDEs in unbounded
domains started from the pioneering paper of
Kolmogorov-Petrovskij-Piskunov, in which the
traveling wave solutions of reaction-diffusion equations
in a strip were constructed and the
convergence of the trajectories (for specific initial
data) to these traveling wave solutions was established.
Starting from this, many results on the
dynamics of PDEs in unbounded domains have been
obtained. However, for a long period, the general
features of such dynamics remained completely
unclear. The main problems arising here are:


1. the essential infinite dimensionality of the DS 
considered (absence of any finite-dimensional 
reduction), which leads to essentially new 
dynamical effects that are not observed in finite- 
dimensional theories; 

2. the additional spatial “unbounded” directions 
lead to the so-called spatial chaos and the 
interaction between spatial and temporal chaotic 
modes generates the spatio-temporal chaos, 
which also has no analog in finite dimensions. 


Nevertheless, several ideas are mentioned in
the following which (from the authors’ point of view)
were the most important for the development of
these topics. The first one is the pioneering paper of
Kirchgässner, in which dynamical methods were
applied to the study of the spatial structure of
solutions of elliptic equations in cylinders (which
can be considered as equilibria equations for
evolution PDEs in unbounded cylindrical domains).
The second is the Sinai-Bunimovich model of
spacetime chaos in discrete lattice DSs. Finally, the
third is the adaptation of the concept of a global
attractor to unbounded domains by Abergel and
Babin-Vishik.

We note, however, that the understanding
of the general features of the dynamics in
unbounded domains seems to have changed
in the last several years, due to the works of
Collet-Eckmann and Zelik. This is the reason why a
section of this review is devoted to a more detailed
discussion on this topic.

Other important questions are the object of
current studies and we only briefly mention some
of them. We mention, for instance, the study of
attractors for nonautonomous systems (i.e., systems
in which the time appears explicitly). This
situation is much more delicate and is not
completely understood; notions of attractors for
such systems have been proposed by Chepyzhov-Vishik,
Haraux, and Kloeden-Schmalfuss. We also
mention that theories of (global) attractors for
non-well-posed problems have been proposed by
Babin-Vishik, Ball, Chepyzhov-Vishik, Melnik-Valero,
and Sell.


Global Attractors and Finite-Dimensional 
Reduction 


Global Attractors: The Abstract Setting 


As already mentioned, one of the main concepts of 
the modern theory of DSs in infinite dimensions is 
that of the global attractor. We give below its 
definition for an abstract semigroup $S(t)$ acting on a
metric space $\Phi$, although, without loss of generality,
the reader may think that $(S(t), \Phi)$ is just a DS
associated with one of the PDEs ([6]-[8]) described
in the introduction.

To this end, we first recall that a subset $K$ of the
phase space $\Phi$ is an attracting set of the semigroup
$S(t)$ if it attracts the images of all the bounded subsets
of $\Phi$, that is, for every bounded set $B$ and every $\varepsilon > 0$,
there exists a time $T$ (depending in general on $B$
and $\varepsilon$) such that the image $S(t)B$ belongs to the
$\varepsilon$-neighborhood of $K$ if $t \ge T$. This property can be
rewritten in the equivalent form


$\lim_{t \to \infty} \operatorname{dist}_H(S(t)B, K) = 0$   [9]


where $\operatorname{dist}_H(X, Y) := \sup_{x \in X} \inf_{y \in Y} d(x, y)$ is the
nonsymmetric Hausdorff distance between subsets of $\Phi$.
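
For finite sets, the nonsymmetric Hausdorff distance reduces to a max-min computation, which makes its lack of symmetry easy to see. The sketch below is an illustration only: the function names, the choice of the real line with d(x, y) = |x - y|, and the contraction S(t)x = x e^{-t} (the flow of y' = -y) are assumptions made here, not objects from the article.

```python
import math


def hausdorff_semidistance(X, Y, d=lambda x, y: abs(x - y)):
    """Nonsymmetric Hausdorff distance sup_{x in X} inf_{y in Y} d(x, y) for finite sets."""
    return max(min(d(x, y) for y in Y) for x in X)


def image_under_decay(B, t):
    """Image S(t)B of a finite sample B under the flow of y' = -y (assumed toy DS)."""
    return [x * math.exp(-t) for x in B]


if __name__ == "__main__":
    # Asymmetry: every point of X is in Y, but not conversely.
    X, Y = [0.0], [0.0, 1.0]
    print(hausdorff_semidistance(X, Y), hausdorff_semidistance(Y, X))

    # Attraction of the bounded set B to K = {0}: the semidistance tends to 0.
    B = [-2.0, -1.0, 0.5, 2.0]
    for t in (0.0, 1.0, 5.0):
        print(t, hausdorff_semidistance(image_under_decay(B, t), [0.0]))
```

The first line of output shows why the order of the arguments matters: dist_H(X, Y) = 0 only says that X lies in the closure of Y, which is exactly what attraction of S(t)B to K requires.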

The following definition of a global attractor is
due to Babin-Vishik.


Definition 1 A set $\mathcal{A} \subset \Phi$ is a global attractor for
the semigroup $S(t)$ if


(i) $\mathcal{A}$ is compact in $\Phi$;
(ii) $\mathcal{A}$ is strictly invariant: $S(t)\mathcal{A} = \mathcal{A}$, for all $t \ge 0$;
(iii) $\mathcal{A}$ is an attracting set for the semigroup $S(t)$.


Thus, the second and third properties guarantee 
that a global attractor, if it exists, is unique and that 
the DS reduced to the attractor contains all the 
nontrivial dynamics of the initial system. Furthermore,
the first property indicates that the reduced
phase space $\mathcal{A}$ is indeed “thinner” than the initial
phase space $\Phi$ (we recall that, in infinite dimensions,
a compact set cannot contain, e.g., balls and should
thus be nowhere dense).

In most applications, one can use the following 
attractor’s existence theorem. 


Theorem 1 Let a DS $(S(t), \Phi)$ possess a compact
attracting set and the operators $S(t) : \Phi \to \Phi$ be
continuous for every fixed $t$. Then, this system
possesses the global attractor $\mathcal{A}$ which is generated
by all the trajectories of $S(t)$ which are defined for
all $t \in \mathbb{R}$ and are globally bounded.


The strategy for applying this theorem to concrete
equations of mathematical physics is the following.
In a first step, one verifies a so-called dissipative
estimate which usually has the form


$\|S(t)u_0\|_\Phi \le Q(\|u_0\|_\Phi) \, e^{-\alpha t} + C_*, \quad u_0 \in \Phi$   [10]


where $\|\cdot\|_\Phi$ is a norm in the function space $\Phi$ and the
positive constants $\alpha$ and $C_*$ and the monotonic
function $Q$ are independent of $t$ and $u_0 \in \Phi$ (usually,
this estimate follows from energy estimates and is
sometimes even used in order to “define” a dissipative
system). This estimate obviously gives the
existence of an attracting set for $S(t)$ (e.g., the ball
of radius $2C_*$ in $\Phi$), which is, however, noncompact
in $\Phi$. In order to overcome this problem, one usually
derives, in a second step, a smoothing property for
the solutions, which can be formulated as follows:


$\|S(1)u_0\|_{\Phi_1} \le Q_1(\|u_0\|_\Phi), \quad u_0 \in \Phi$   [11]


where $\Phi_1$ is another function space which is
compactly embedded into $\Phi$. In applications, $\Phi$ is
usually the space $L^2(\Omega)$ of square integrable functions,
$\Phi_1$ is the Sobolev space $H^1(\Omega)$ of the functions
$u$ such that $u$ and $\nabla_x u$ belong to $L^2(\Omega)$ and estimate
[11] is a classical smoothing property for solutions
of parabolic equations (for hyperbolic equations, a
slightly more complicated asymptotic smoothing
property should be used instead of [11]).

Since the continuity of the operators $S(t)$ usually
poses no difficulty (if the uniqueness is proven),
the above scheme indeed gives the existence of the
global attractor for most of the PDEs of mathematical
physics in bounded domains.
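
A one-dimensional toy problem shows what a dissipative estimate looks like in practice. The sketch below is an illustration under assumptions made here, not an example from the article: for the scalar ODE u' = u - u^3 one has d/dt u^2 = 2u^2 - 2u^4 <= -2u^2 + 2 (because 4u^2 - 2u^4 - 2 = -2(u^2 - 1)^2 <= 0), hence u(t)^2 <= u_0^2 e^{-2t} + 1 and therefore |u(t)| <= |u_0| e^{-t} + 1. This is an estimate of the dissipative form with Q(z) = z, alpha = 1, and C_* = 1. The code integrates the ODE with a Runge-Kutta scheme and checks the bound along the numerical trajectory.

```python
import math


def rhs(u):
    """Right-hand side of the assumed toy equation u' = u - u^3."""
    return u - u ** 3


def rk4_step(u, dt):
    """One classical Runge-Kutta step for u' = rhs(u)."""
    k1 = rhs(u)
    k2 = rhs(u + 0.5 * dt * k1)
    k3 = rhs(u + 0.5 * dt * k2)
    k4 = rhs(u + dt * k3)
    return u + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def trajectory(u0, dt=1e-3, steps=5000):
    """List of pairs (t, u(t)) for the numerical solution started at u0."""
    u, out = u0, [(0.0, u0)]
    for k in range(1, steps + 1):
        u = rk4_step(u, dt)
        out.append((k * dt, u))
    return out


def dissipative_bound(u0, t):
    """Right-hand side |u0| e^{-t} + 1 of the dissipative estimate derived above."""
    return abs(u0) * math.exp(-t) + 1.0


if __name__ == "__main__":
    for u0 in (-5.0, 0.3, 5.0):
        traj = trajectory(u0)
        ok = all(abs(u) <= dissipative_bound(u0, t) + 1e-9 for t, u in traj)
        print(u0, "bound holds:", ok, " final |u| =", abs(traj[-1][1]))
```

Once |u_0| e^{-t} <= 1, the trajectory lies in the ball of radius 2 C_* = 2, which plays the role of the (here compact, since the phase space is finite dimensional) attracting set.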


Dimension of the Global Attractor 


In this subsection, we start by discussing one of the 
basic questions of the theory: in which sense is 
the dynamics on the global attractor finite dimen- 
sional? As already mentioned, the global attractor 
is usually not a manifold, but has a rather 
complicated geometric structure. So, it is natural to 
use the definitions of dimensions adopted for the 
study of fractal sets here. We restrict ourselves to the 
so-called fractal (or box-counting, entropy) dimen- 
sion, although other dimensions (e.g., Hausdorff, 
Lyapunov, etc.) are also used in the theory of 
attractors. 

In order to define the fractal dimension, we first
recall the concept of Kolmogorov’s $\varepsilon$-entropy, which
comes from information theory and plays a
fundamental role in the theory of DSs in unbounded
domains considered in the next section.


Definition 2 Let $\mathcal{A}$ be a compact subset of a
metric space $\Phi$. For every $\varepsilon > 0$, we define $N_\varepsilon(\mathcal{A})$ as
the minimal number of $\varepsilon$-balls which are necessary
to cover $\mathcal{A}$. Then, Kolmogorov’s $\varepsilon$-entropy
$H_\varepsilon(\mathcal{A}) = H_\varepsilon(\mathcal{A}, \Phi)$ of $\mathcal{A}$ is the binary logarithm of
this number, $H_\varepsilon(\mathcal{A}) := \log_2 N_\varepsilon(\mathcal{A})$. We recall that
$H_\varepsilon(\mathcal{A})$ is finite for every $\varepsilon > 0$, due to the Hausdorff
criterion. The fractal dimension $d_f(\mathcal{A}) \in [0, \infty]$ of $\mathcal{A}$
is then defined by


$d_f(\mathcal{A}) := \limsup_{\varepsilon \to 0} H_\varepsilon(\mathcal{A}) / \log_2(1/\varepsilon)$   [12]

We also recall that, although this dimension
coincides with the usual dimension of the manifold
for Lipschitz manifolds, it can be noninteger for
more complicated sets. For instance, the fractal
dimension of the standard ternary Cantor set in
$[0, 1]$ is $\ln 2 / \ln 3$.
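
The value ln 2 / ln 3 for the Cantor set can be reproduced by a box count done in exact integer arithmetic. The sketch below rests on assumptions stated here: it replaces coverings by balls with coverings by grid cells of width 3^{-k} (this changes the covering number only by a bounded factor and so does not affect the limit), and it represents the Cantor set by the left endpoints of the 2^level intervals of a finite construction step. All function names are hypothetical.

```python
import math
from itertools import product


def cantor_points(level):
    """Left endpoints of the 2**level intervals of construction step `level`,
    stored as integers in units of 3**(-level) (ternary digits 0 or 2 only)."""
    return [
        sum(digit * 3 ** (level - 1 - i) for i, digit in enumerate(digits))
        for digits in product((0, 2), repeat=level)
    ]


def box_count(points, level, k):
    """Number of grid cells of width 3**(-k), k <= level, containing at least one point."""
    cell = 3 ** (level - k)
    return len({p // cell for p in points})


def dimension_estimate(level, k):
    """Entropy quotient log2 N / log2(1/eps) at the scale eps = 3**(-k)."""
    n_boxes = box_count(cantor_points(level), level, k)
    return math.log2(n_boxes) / math.log2(3 ** k)


if __name__ == "__main__":
    for k in (2, 4, 6, 8):
        print(k, box_count(cantor_points(10), 10, k), dimension_estimate(10, k))
    print("ln 2 / ln 3 =", math.log(2) / math.log(3))
```

At every scale 3^{-k} the count is exactly 2^k, so the quotient equals ln 2 / ln 3 without any need to pass to the limit; for less regular sets the quotient only stabilizes as the scale decreases.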

The so-called Mañé theorem (which can be
considered as a generalization of the classical Whitney
embedding theorem for fractal sets) plays an
important role in the finite-dimensional reduction
theory.

Theorem 2 Let $\Phi$ be a Banach space and $\mathcal{A}$ be a
compact set such that $d_f(\mathcal{A}) < N$ for some $N \in \mathbb{N}$.
Then, for “almost all” $(2N + 1)$-dimensional planes
$L$ in $\Phi$, the corresponding projector $\Pi_L : \Phi \to L$
restricted to the set $\mathcal{A}$ is a Hölder continuous
homeomorphism.


Thus, if the finite fractal dimensionality of the
attractor is established, then, fixing a plane $L$
satisfying the assumptions of the Mañé theorem
and projecting the attractor $\mathcal{A}$ and the DS $S(t)$
restricted to $\mathcal{A}$ onto this plane ($\bar{\mathcal{A}} := \Pi_L \mathcal{A}$
and $\bar{S}(t) := \Pi_L \circ S(t) \circ \Pi_L^{-1}$), we obtain, indeed, a
reduced DS $(\bar{S}(t), \bar{\mathcal{A}})$ which is defined on a
finite-dimensional set $\bar{\mathcal{A}} \subset L \sim \mathbb{R}^{2N+1}$. Moreover, this DS
will be Hölder continuous with respect to the initial
data.


Estimates on the Fractal Dimension 


Obviously, good estimates on the dimension of the 
attractors in terms of the physical parameters are 
crucial for the finite-dimensional reduction
described above, and (consequently) there exists a 
highly developed machinery for obtaining such 
estimates. The best-known upper estimates are 
usually obtained by the so-called volume contraction 
method, which is based on the study of the evolution 
of infinitesimal k-dimensional volumes in the neigh- 
borhood of the attractor (and, if the DS considered 
contracts the k-dimensional volumes, then the 
fractal dimension of the attractor is less than k). 
Lower bounds on the dimension are usually based 
on the observation that the global attractor always 
contains the unstable manifolds of the (hyperbolic) 
equilibria. Thus, the instability index of a properly 
constructed equilibrium gives a lower bound on the 
dimension of the attractor. 

In the following, several estimates for the classes
of equations given in the introduction are formulated,
beginning with the most-studied case of the
reaction-diffusion system [8]. For this system, sharp
upper and lower bounds are known, namely


$C_1 \operatorname{vol}(\Omega) \le d_f(\mathcal{A}) \le C_2 \operatorname{vol}(\Omega)$   [13]


where the constants $C_1$ and $C_2$ depend on $a$ and $f$
(and, possibly, on the shape of $\Omega$), but are independent
of its size. The same types of estimates also hold
for the hyperbolic equation [7]. Concerning the
Navier-Stokes system [6] in general two-dimensional
domains $\Omega$, the asymptotics of the fractal dimension
as $\nu \to 0$ is not known. The best-known upper bound
has the form $d_f(\mathcal{A}) \le C \nu^{-2}$ and was obtained by
Foias-Temam by using the so-called Lieb-Thirring
inequalities. Nevertheless, for periodic boundary
conditions, Constantin-Foias-Temam and Liu
obtained upper and lower bounds of the same order
(up to a logarithmic correction):


$C_1 \nu^{-4/3} \le d_f(\mathcal{A}) \le C_2 \nu^{-4/3} \ln(\nu^{-1})^{1/3}$   [14]


Global Lyapunov Functions and the Structure 
of Global Attractors 


Although the global attractor usually has a very
complicated geometric structure, there exists one
exceptional class of DS for which the global attractor
has a relatively simple structure which is completely
understood, namely the DS having a global Lyapunov
function. We recall that a continuous function
$\mathcal{L} : \Phi \to \mathbb{R}$ is a global Lyapunov function if


1. $\mathcal{L}$ is nonincreasing along the trajectories, that is,
$\mathcal{L}(S(t)u_0) \le \mathcal{L}(u_0)$, for all $t \ge 0$;

2. $\mathcal{L}$ is strictly decreasing along all nonequilibrium
solutions, that is, $\mathcal{L}(S(t)u_0) = \mathcal{L}(u_0)$ for some $t > 0$
and $u_0$ implies that $u_0$ is an equilibrium of $S(t)$.


For instance, in the scalar case $N = 1$, the
reaction-diffusion equations [8] possess the global
Lyapunov function $\mathcal{L}(u_0) := \int_\Omega [a |\nabla_x u_0(x)|^2 + 2F(u_0(x))] \, dx$,
where $F(v) := \int_0^v f(u) \, du$. Indeed, multiplying
eqn [8] by $\partial_t u$ and integrating over $\Omega$, we have


$\frac{d}{dt} \mathcal{L}(u(t)) = -2 \|\partial_t u(t)\|^2_{L^2(\Omega)} \le 0$   [15]


Analogously, in the scalar case $N = 1$, multiplying
the hyperbolic equation [7] by $\partial_t u(t)$ and integrating
over $\Omega$, we obtain the standard global Lyapunov
function for this equation.

It is well known that, if a DS possesses a global
Lyapunov function, then, at least under the generic
assumption that the set $\mathcal{R}$ of equilibria is finite, every
trajectory $u(t)$ stabilizes to one of these equilibria as
$t \to +\infty$. Moreover, every complete bounded trajectory
$u(t)$, $t \in \mathbb{R}$, belonging to the attractor is a
heteroclinic orbit joining two equilibria. Thus, the
global attractor $\mathcal{A}$ can be described as follows:


$\mathcal{A} = \bigcup_{u_0 \in \mathcal{R}} \mathcal{M}^+(u_0)$   [16]


where $\mathcal{M}^+(u_0)$ is the so-called unstable set of the
equilibrium $u_0$ (which is generated by all heteroclinic
orbits of the DS which start from the given equilibrium
$u_0 \in \mathcal{A}$). It is also known that, if the equilibrium $u_0$ is
hyperbolic (generic assumption), then the set $\mathcal{M}^+(u_0)$
is a $\kappa$-dimensional submanifold of $\Phi$, where $\kappa$ is the
instability index of $u_0$. Thus, under the generic
hyperbolicity assumption on the equilibria, the
attractor $\mathcal{A}$ of a DS having a global Lyapunov function
is a finite union of smooth finite-dimensional
submanifolds of the phase space $\Phi$. These attractors are
called regular (following Babin-Vishik).
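
The monotonicity of a global Lyapunov function along trajectories can be observed on a discretized scalar reaction-diffusion equation. The sketch below is an illustration under assumptions made here: one space dimension, Omega = (0, 1) with Dirichlet boundary conditions, a = 0.01, the nonlinearity f(u) = u^3 - u, an explicit Euler time step, and the discrete analogue of the functional int_Omega [a |u_x|^2 + 2F(u)] dx. For this discretization the Euler step is a gradient-descent step for the discrete functional, so the functional decreases provided the time step is small enough (which the chosen values satisfy).

```python
import math


def f(u):
    """Assumed nonlinearity f(u) = u^3 - u."""
    return u ** 3 - u


def F(u):
    """Antiderivative F(v) = int_0^v f(s) ds of the assumed nonlinearity."""
    return 0.25 * u ** 4 - 0.5 * u ** 2


def energy(u, a, h):
    """Discrete analogue of int [a |u_x|^2 + 2 F(u)] dx with u = 0 at both ends."""
    padded = [0.0] + u + [0.0]
    grad = sum(((padded[i + 1] - padded[i]) / h) ** 2 for i in range(len(padded) - 1))
    pot = sum(F(v) for v in u)
    return h * (a * grad + 2.0 * pot)


def euler_step(u, a, h, dt):
    """One explicit Euler step for u_t = a u_xx - f(u) on the interior grid points."""
    padded = [0.0] + u + [0.0]
    return [
        padded[i]
        + dt * (a * (padded[i - 1] - 2.0 * padded[i] + padded[i + 1]) / h ** 2 - f(padded[i]))
        for i in range(1, len(padded) - 1)
    ]


def energy_history(n=49, a=0.01, dt=1e-3, steps=2000):
    """Values of the discrete Lyapunov functional along the numerical trajectory."""
    h = 1.0 / (n + 1)
    u = [
        0.5 * math.sin(math.pi * (i + 1) * h) + 0.3 * math.sin(3.0 * math.pi * (i + 1) * h)
        for i in range(n)
    ]
    history = [energy(u, a, h)]
    for _ in range(steps):
        u = euler_step(u, a, h, dt)
        history.append(energy(u, a, h))
    return history


if __name__ == "__main__":
    hist = energy_history()
    print("initial:", hist[0], " final:", hist[-1])
    print("nonincreasing:", all(b <= c + 1e-12 for c, b in zip(hist, hist[1:])))
```

The functional only flattens out as the solution approaches an equilibrium, in agreement with the stabilization statement for systems with a global Lyapunov function.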

It is also worth emphasizing that, in contrast to 
general global attractors, regular attractors are 
robust under perturbations. Moreover, in some 
cases, it is also possible to verify the so-called 
transversality conditions (for the intersection of 
stable and unstable manifolds of the equilibria) 
and, thus, verify that the DS considered is a 
Morse-Smale system. In particular, this means that 
the dynamics restricted to the regular attractor A is 
also preserved (up to homeomorphisms) under 
perturbations. 

A disadvantage of the approach of using a regular 
attractor is the fact that, except for scalar parabolic 
equations in one space dimension, it is usually 
extremely difficult to verify the “generic” hyperbo- 
licity and transversality assumptions for concrete 
values of the physical parameters and the associated 
hyperbolicity constants, as a rule, cannot be 
expressed in terms of these parameters. 


Inertial Manifolds 


It should be noted that the scheme for the finite- 
dimensional reduction described above has essential 
drawbacks. Indeed, the reduced system $(\bar{S}(t), \bar{\mathcal{A}})$ is
only Hölder continuous and, consequently, cannot
be realized as a DS generated by a system of ODEs
(and reasonable conditions on the attractor $\mathcal{A}$ which
guarantee the Lipschitz continuity of the Mañé
projections are not known). On the other hand, the
complicated geometric structure of the attractor
$\mathcal{A}$ (or $\bar{\mathcal{A}}$) makes the use of this finite-dimensional
reduction in computations hazardous (in fact, only
the heuristic information on the number of 
unknowns which are necessary to capture all the 
dynamical effects in approximations can be 
extracted). 

In order to overcome these problems, the concept
of an inertial manifold (which allows one to embed
the global attractor into a smooth manifold) has
been suggested by Foias-Sell-Temam. To be more
precise, a Lipschitz finite-dimensional manifold $\mathcal{M} \subset \Phi$
is an inertial manifold for the DS $(S(t), \Phi)$ if


1. $\mathcal{M}$ is semi-invariant, that is, $S(t)\mathcal{M} \subset \mathcal{M}$, for all
$t \ge 0$;

2. $\mathcal{M}$ satisfies the following asymptotic completeness
property: for every $u_0 \in \Phi$, there exists $v_0 \in \mathcal{M}$
such that


$\|S(t)u_0 - S(t)v_0\|_\Phi \le Q(\|u_0\|_\Phi) \, e^{-\alpha t}$   [17]


where the positive constant $\alpha$ and the monotonic
function $Q$ are independent of $u_0$.

We can see that an inertial manifold, if it 
exists, confirms in a perfect way the heuristic 
conjecture on the finite dimensionality formulated 
in the introduction. Indeed, the dynamics of S(t) 
restricted to an inertial manifold can be, obviously, 
described by a system of ODEs (which is called the 
inertial form of the initial PDE). On the other hand, 
the asymptotic completeness gives (in a very strong 
form) the equivalence of the initial DS $(S(t), \Phi)$ with
its inertial form $(S(t), \mathcal{M})$. Moreover, in turbulence,
the existence of an inertial manifold would yield an 
exact interaction law between the small and large 
structures of the flow. 
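
A two-dimensional toy system makes the two defining properties of an inertial manifold concrete. The example below is an assumption chosen here for illustration and is not from the article: for x' = -x, y' = -10y + x^2, the curve M = {y = x^2/8} is invariant, and the deviation w = y - x^2/8 satisfies w' = -10w. Hence every trajectory is tracked, at the exponential rate e^{-10t}, by the trajectory on M starting at v_0 = (x_0, x_0^2/8), and the inertial form is the single ODE x' = -x. The manifold is Lipschitz only on bounded sets, which is enough for this sketch.

```python
import math


def flow(t, u0):
    """Exact solution at time t of x' = -x, y' = -10 y + x^2 started at u0 = (x0, y0)."""
    x0, y0 = u0
    x = x0 * math.exp(-t)
    w0 = y0 - x0 ** 2 / 8.0  # initial deviation from the manifold y = x^2 / 8
    return (x, x ** 2 / 8.0 + w0 * math.exp(-10.0 * t))


def tracking_point(u0):
    """Point v0 on the manifold whose trajectory tracks the one starting at u0."""
    return (u0[0], u0[0] ** 2 / 8.0)


def on_manifold(u, tol=1e-12):
    """True if u lies on the curve y = x^2 / 8 up to the tolerance tol."""
    return abs(u[1] - u[0] ** 2 / 8.0) <= tol


if __name__ == "__main__":
    u0 = (2.0, 3.0)
    v0 = tracking_point(u0)
    for t in (0.0, 0.5, 1.0):
        gap = math.dist(flow(t, u0), flow(t, v0))
        print(t, gap, 2.5 * math.exp(-10.0 * t), on_manifold(flow(t, v0)))
```

The printed gap coincides with |w_0| e^{-10t}, where w_0 = 2.5 for this initial point, which is the asymptotic completeness estimate with alpha = 10.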

Unfortunately, all the known constructions of 
inertial manifolds are based on a very restrictive 
condition, the so-called spectral gap condition, 
which requires arbitrarily large gaps in the spectrum 
of the linearization of the initial PDE and which can 
usually be verified only in one space dimension. So, 
the existence of an inertial manifold is still an 
open problem for many important equations of 
mathematical physics (including in particular the 
two-dimensional Navier-Stokes equations; some 
nonexistence results have also been proven by
Mallet-Paret).


Exponential Attractors 


We first recall that Definition 1 of a global 
attractor only guarantees that the images S(t)B of 
all the bounded subsets converge to the attractor, 
without saying anything on the rate of convergence 
(in contrast to inertial manifolds, for which this 
rate of convergence can be controlled). Furthermore,
as elementary examples show, this convergence
can be arbitrarily slow, so that, until now,
we have no effective way of estimating this rate of
convergence in terms of the physical parameters of
the system. An exception is given by the regular
attractors described earlier, for which the rate of
convergence can be estimated in terms of the
hyperbolicity constants of the equilibria; however,
even in this situation, it is usually very difficult to
estimate these constants for concrete equations.
Furthermore, there exist many physically relevant
systems (e.g., the so-called slightly dissipative
gradient systems) which have trivial global attractors
but very rich and physically relevant transient
dynamics, which are automatically forgotten under
the global-attractor approach. Another important
problem is the robustness of the global attractor
under perturbations. In fact, global attractors are
usually only upper semicontinuous under
perturbations (which means that they cannot
explode), and the lower semicontinuity (which
means that they cannot implode either) is much
more delicate to prove and requires some hyperbolicity
assumptions (which are usually impossible to
verify for concrete equations).

In order to overcome these difficulties,
Eden-Foias-Nicolaenko-Temam have introduced an intermediate
object (between inertial manifolds and
global attractors), namely an exponential attractor
(also called an inertial set).


Definition 3 A compact set $\mathcal{M} \subset \Phi$ is an exponential
attractor for the DS $(S(t), \Phi)$ if


(i) $\mathcal{M}$ has finite fractal dimension: $\dim_F(\mathcal{M}) < \infty$;
(ii) $\mathcal{M}$ is semi-invariant: $S(t)\mathcal{M} \subset \mathcal{M}$ for all $t \ge 0$;
(iii) $\mathcal{M}$ attracts exponentially the images of all the
bounded sets $B \subset \Phi$:


$$\operatorname{dist}(S(t)B, \mathcal{M}) \le Q(\|B\|_{\Phi})\, e^{-\alpha t}$$ [18]


where the positive constant $\alpha$ and the monotonic
function $Q$ are independent of $B$.


Thus, on the one hand, an exponential attractor 
remains finite dimensional (like the global attractor) 
and, on the other hand, estimate [18] allows one to 
control the rate of attraction (like an inertial 
manifold). We note, however, that the relaxation 
of strict invariance to semi-invariance allows this 
object to be nonunique. So, we have here the 
problem of the “best choice” of the exponential
attractor. We also mention that an exponential 
attractor, if it exists, always contains the global 
attractor. 
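
The nonuniqueness just mentioned can be seen on a toy example. The sketch below is only an illustration under stated assumptions: the map `S(u) = u/2` on the phase space $[-1, 1]$ is a hypothetical discrete DS (not one of the PDEs discussed in this article), whose global attractor is $\{0\}$. Every interval $[-a, a]$ is compact, semi-invariant, finite dimensional, and attracts the phase space exponentially, so each of them satisfies Definition 3 and contains the global attractor.

```python
def S(u: float) -> float:
    """Hypothetical one-step map: a linear contraction on [-1, 1]."""
    return 0.5 * u


def semidistance(points, a):
    """Hausdorff semidistance from a finite point set to the interval [-a, a]."""
    return max(max(abs(p) - a, 0.0) for p in points)


def orbit_distances(a, steps=10, samples=201):
    """Distances dist(S^n B, [-a, a]) for n = 0..steps, with B a grid on [-1, 1]."""
    B = [-1.0 + 2.0 * i / (samples - 1) for i in range(samples)]
    distances = []
    for _ in range(steps + 1):
        distances.append(semidistance(B, a))
        B = [S(u) for u in B]
    return distances


# a = 0 is the global attractor {0}; a = 0.25 is another exponential attractor.
for a in (0.0, 0.25):
    print(a, orbit_distances(a, steps=4))
```

In this toy case the global attractor itself attracts exponentially; the point of the sketch is only that semi-invariance leaves a whole family of admissible sets to choose from.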

Although the initial construction of exponential 
attractors is based on the so-called squeezing 
property (and requires Zorn’s lemma), we formulate 
below a simpler construction, due to Efendiev— 
Miranville-Zelik, which is similar to the method 
proposed by Ladyzhenskaya to verify the finite 
dimensionality of global attractors. This is done for 
discrete times and for a DS generated by iterations 
of some map $S: \Phi \to \Phi$, since the passage from
discrete to continuous times usually poses no
difficulty (without loss of generality, the reader
may think that $S = S(1)$ and $(S(t), \Phi)$ is one of the DS
mentioned in the introduction). 


Theorem 3 Let the phase space $\Phi_0$ be a closed
bounded subset of some Banach space $H$ and let $H_1$
be another Banach space compactly embedded into
$H$. Assume also that the map $S: \Phi_0 \to \Phi_0$ satisfies
the following “smoothing” property:


$$\|Su_1 - Su_2\|_{H_1} \le K\|u_1 - u_2\|_{H}, \quad u_1, u_2 \in \Phi_0$$ [19]


for some constant $K$ independent of $u_i$. Then, the DS
$(S, \Phi_0)$ possesses an exponential attractor.


In applications, $\Phi_0$ is usually a bounded absorbing/attracting
set whose existence is guaranteed by
the dissipative estimate [10], $H := L^2(\Omega)$ and
$H_1 := H^1(\Omega)$. Furthermore, estimate [19] simply
follows from the classical parabolic smoothing
property, but now applied to the equation of
variations (as in [11], hyperbolic equations require
a slightly more complicated analogue of [19]). These
simple arguments show that exponential attractors
are as general as global attractors and, to the
best of our knowledge, exponential attractors
indeed exist for all the equations of mathematical physics
for which the finite dimensionality of the global
attractor can be established. Moreover, since $\mathcal{A} \subset \mathcal{M}$,
this scheme can also be used to prove the finite
dimensionality of global attractors.

It is finally worth emphasizing that the control on 
the rate of convergence provided by [18] makes 
exponential attractors much more robust than global 
attractors. In particular, they are upper and lower 
semicontinuous under perturbations (of course, up to 
the “best choice,” since they are not unique), as 
shown by Efendiev—Miranville—Zelik. 


Essentially Infinite-Dimensional 
Dynamical Systems - The Case of 
Unbounded Domains 


As already mentioned in the introduction, the theory 
of dissipative DS in unbounded domains is developing
only now and the results given here are not as
complete as for bounded domains. Nevertheless, we
indicate below several of the most interesting (from
our point of view) results concerning the general
description of the dynamics generated by such
problems by considering a system of reaction-diffusion
equations [8] in $\mathbb{R}^n$ with phase space
$\Phi = L^\infty(\mathbb{R}^n)$ as a model example (although all the
results formulated below are general and depend
weakly on the choice of equation).


Generalization of the Global Attractor and
Kolmogorov’s $\varepsilon$-Entropy


We first note that Definition 1 of a global attractor 
is too strong for equations in unbounded domains. 
Indeed, as seen earlier, the compactness of the 
attractor is usually based on the compactness of 
the embedding $H^1(\Omega) \subset L^2(\Omega)$, which does not hold
in unbounded domains. Furthermore, an attractor,
in the sense of Definition 1, does not exist for most
of the interesting examples of eqns [8] in $\mathbb{R}^n$.
It is natural to use instead the concept of the
so-called locally compact global attractor, which is
well adapted to unbounded domains. This attractor
$\mathcal{A}$ is only bounded in the phase space
$\Phi = L^\infty(\mathbb{R}^n)$, but its restrictions $\mathcal{A}|_{\Omega}$ to all bounded
domains $\Omega$ are compact in $L^\infty(\Omega)$. Moreover, the
attraction property should also be understood in
the sense of a local topology in $L^\infty(\mathbb{R}^n)$. It is
known that this generalized global attractor $\mathcal{A}$
indeed exists for problem [8] in $\mathbb{R}^n$ (of course,
under some “natural” assumptions on the nonlinearity
$f$ and the diffusion matrix $a$). As for
bounded domains, its existence is based on the
dissipative estimate [10], the smoothing property
[11], and the compactness of the embedding
$H^1_{loc}(\mathbb{R}^n) \subset L^2_{loc}(\mathbb{R}^n)$ (we need to use the local
topology only to have this compactness).

The next natural question that arises here is how 
to control the “size” of the attractor A if its fractal 
dimension is infinite (which is usually the case in 
unbounded domains). One of the most natural 
ways to handle this problem (which was first 
suggested by Chepyzhov—Vishik in the different 
context of uniform attractors associated with 
nonautonomous equations in bounded domains 
and appears as extremely fruitful for the theory of 
dissipative PDE in unbounded domains) is to study 
the asymptotics of Kolmogorov’s $\varepsilon$-entropy of the
attractor. Actually, since the attractor $\mathcal{A}$ is compact
only in a local topology, it is natural to study the
entropy of its restrictions, say, to balls $B^R_{x_0}$ of $\mathbb{R}^n$ of
radius $R$ centered at $x_0$ with respect to the three
parameters $R$, $x_0$, and $\varepsilon$. A more or less complete
answer to this question is given by the following
estimate:


$$H_\varepsilon(\mathcal{A}|_{B^R_{x_0}}) \le C(R + \log_2 1/\varepsilon)^n \log_2 1/\varepsilon$$ [20]


where the constant $C$ is independent of $\varepsilon < 1$, $R$,
and $x_0$. Moreover, it can be shown that this estimate
is sharp for all $R$ and $\varepsilon$ under the very weak
additional assumption that eqn [8] possesses at least
one exponentially unstable spatially homogeneous
equilibrium.

Thus, formula [20] (whose proof is also based on 
a smoothing property for the equation of variations) 
can be interpreted as a natural generalization of the 
heuristic principle of finite dimensionality of global 
attractors to unbounded domains. It is also worth
recalling that the entropy of the embedding of a ball
$B_k$ of the space $C^k(B^R_{x_0})$ into $C(B^R_{x_0})$ has the
asymptotic $H_\varepsilon(B_k) \sim C R^n (1/\varepsilon)^{n/k}$, which is essentially
worse than [20]. So, [20] is not based on the
smoothness of the attractor $\mathcal{A}$ and, therefore,
reflects deeper properties of the equation.
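
To make the comparison concrete, the following sketch evaluates the right-hand side of [20] next to the entropy of a ball of $C^k$ functions. It rests on two assumptions: the constants `C` are placeholders set to 1, and the $C^k$ asymptotics is taken in the classical Kolmogorov-Tikhomirov form $R^n (1/\varepsilon)^{n/k}$. Only the growth rates as $\varepsilon \to 0$ are meaningful, not the absolute values.

```python
import math


def attractor_entropy_bound(R, eps, n, C=1.0):
    """Right-hand side of [20]: C (R + log2(1/eps))^n log2(1/eps)."""
    log_term = math.log2(1.0 / eps)
    return C * (R + log_term) ** n * log_term


def ck_ball_entropy(R, eps, n, k, C=1.0):
    """Assumed asymptotics C R^n (1/eps)^(n/k) for a ball of C^k functions."""
    return C * R ** n * (1.0 / eps) ** (n / k)


# Logarithmic growth for the attractor versus power growth for the C^k ball.
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, attractor_entropy_bound(10.0, eps, n=1), ck_ball_entropy(10.0, eps, n=1, k=2))
```

For fixed $R$ the attractor bound grows only like a power of $\log_2 1/\varepsilon$, whereas the smooth-function ball grows like a power of $1/\varepsilon$, which is the sense in which [20] is better than any estimate based on smoothness alone.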


Spatial Dynamics and Spatial Chaos 


The next main difference with bounded domains is 
the existence of unbounded spatial directions which 
can generate the so-called spatial chaos (in addition 
to the “usual” temporal chaos arising under the 
evolution). In order to describe this phenomenon, it 
is natural to consider the group $\{T_h, h \in \mathbb{R}^n\}$ of
spatial translations acting on the attractor $\mathcal{A}$:


$$(T_h u_0)(x) = u_0(x + h), \quad T_h: \mathcal{A} \to \mathcal{A}$$ [21]


as a DS (with multidimensional “times” if $n > 1$)
acting on the phase space $\mathcal{A}$ and to study its
dynamical properties. 

In particular, it is worth noting that the lower 
bounds on the $\varepsilon$-entropy that one can derive imply
that the topological entropy of this spatial DS is
infinite and, consequently, the classical symbolic
dynamics with a finite number of symbols is not
adequate to clarify the nature of chaos in [21].
In order to overcome this difficulty, it was suggested
by Zelik to use Bernoulli shifts with an infinite
number of symbols, belonging to the whole interval
$\omega \in [0,1]$. To be more precise, let us consider the
Cartesian product $\mathbb{M}_n := [0,1]^{\mathbb{Z}^n}$ endowed with the
Tikhonov topology. Then, this set can be interpreted
as the space of all the functions $v: \mathbb{Z}^n \to [0,1]$,
endowed with the standard local topology. We define
a DS $\{\mathcal{T}_l, l \in \mathbb{Z}^n\}$ on $\mathbb{M}_n$ by


$$(\mathcal{T}_l v)(m) := v(m + l), \quad v \in \mathbb{M}_n, \quad l, m \in \mathbb{Z}^n$$ [22]


Based on this model, the following description of 
spatial chaos was obtained. 


Theorem 4 Let eqn [8] in $\Omega = \mathbb{R}^n$ possess at least
one exponentially unstable spatially homogeneous
equilibrium. Then, there exist $\alpha > 0$ and a homeomorphic
embedding $\tau: \mathbb{M}_n \to \mathcal{A}$ such that


$$T_{\alpha l} \circ \tau(v) = \tau \circ \mathcal{T}_l(v), \quad \forall l \in \mathbb{Z}^n, \quad v \in \mathbb{M}_n$$ [23]


Thus, the spatial dynamics, restricted to the set
$\tau(\mathbb{M}_n)$, is conjugated to the symbolic dynamics on
$\mathbb{M}_n$. Moreover, there exists a dynamical invariant
(the so-called mean topological dimension) which is
always finite for the spatial DS [21] and strictly
positive for the Bernoulli scheme $\mathbb{M}_n$. So, the
embedding [23] clarifies, indeed, the nature of
chaos arising in the spatial DS [21].
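
The Bernoulli scheme used above is easy to model directly. The sketch below is a minimal illustration, assuming $n = 2$ and representing a configuration $v: \mathbb{Z}^2 \to [0,1]$ as a Python callable; `sample_config` is a hypothetical configuration chosen only for the demonstration. It implements the shift $(\mathcal{T}_l v)(m) = v(m + l)$ and lets one check the group law $\mathcal{T}_l \circ \mathcal{T}_{l'} = \mathcal{T}_{l + l'}$ pointwise.

```python
from typing import Callable, Tuple

Index = Tuple[int, ...]
Config = Callable[[Index], float]


def shift(l: Index, v: Config) -> Config:
    """Return the shifted configuration m -> v(m + l)."""
    return lambda m: v(tuple(mi + li for mi, li in zip(m, l)))


def sample_config(m: Index) -> float:
    """Hypothetical configuration on Z^2 with values in [0, 1)."""
    return ((3 * m[0] + 7 * m[1]) % 10) / 10.0


composed = shift((1, 0), shift((0, 2), sample_config))
direct = shift((1, 2), sample_config)
print(composed((4, 5)), direct((4, 5)))
```

Since each site carries a symbol from the whole interval $[0,1]$ rather than from a finite alphabet, no finite table can store a configuration, which is why a function representation is used here.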


Spatio-Temporal Chaos 


To conclude, we briefly discuss an extension of 
Theorem 4, which takes into account the temporal 
modes and, thus, gives a description of the spatio- 
temporal chaos. In order to do so, we first note
that the spatial DS [21] obviously commutes
with the temporal evolution operators $S(t)$ and,
consequently, an extended $(n + 1)$-parametric semigroup
$\{\mathbb{S}(t,h), (t,h) \in \mathbb{R}_+ \times \mathbb{R}^n\}$ acts on the attractor:


$$\mathbb{S}(t,h) := S(t) \circ T_h, \quad \mathbb{S}(t,h): \mathcal{A} \to \mathcal{A}, \quad t \in \mathbb{R}_+, \quad h \in \mathbb{R}^n$$ [24]


Then, this semigroup (interpreted as a DS with
multidimensional times) is responsible for all the
spatio-temporal dynamical phenomena in the initial
PDE [8] and, consequently, the question of finding
adequate dynamical characteristics is of great
interest. Moreover, it is also natural to consider the
subsemigroups $\mathbb{S}_{V_k}(t,h)$ associated with the $k$-dimensional
planes $V_k$ of the spacetime $\mathbb{R}_+ \times \mathbb{R}^n$, $k < n + 1$.

Although finding an adequate description of the 
dynamics of [24] seems to be an extremely difficult 
task, some particular results in this direction have 
already been obtained. Thus, it has been proved by
Zelik that the semigroup [24] has finite topological
entropy and the entropy of its subsemigroups
$\mathbb{S}_{V_k}(t,h)$ is usually infinite if $k < n + 1$. Moreover
(adding a natural transport term of the form
$(L, \nabla_x)u$ to eqn [8]), it was proved that the analog
of Theorem 4 holds for the subsemigroups $\mathbb{S}_{V_n}(t,h)$
associated with the $n$-dimensional hyperplanes $V_n$ of
the spacetime. Thus, the infinite-dimensional Bernoulli
shifts introduced in the previous subsection
can be used to describe the temporal evolution in
unbounded domains as well.

In particular, as a consequence of this embedding,
the topological entropy of the initial purely temporal
evolution semigroup $S(t)$ is also infinite, which
indicates that (even without considering the spatial
directions) we have indeed here essential new levels
of dynamical complexity which are not observed in
the classical DS theory of ODEs.


See also: Dynamical Systems in Mathematical Physics: 
An Illustration from Water Waves; Ergodic Theory; 
Evolution Equations: Linear and Nonlinear; Fractal 
Dimensions in Dynamics; Inviscid flows; Lyapunov 
Exponents and Strange Attractors. 
Introduction


Since they were introduced by Witten in 1988, 
topological quantum field theories (TQFTs) have 
had a tremendous impact in mathematical physics 
(see Birmingham et al. (1991) and Cordes et al. for a 
review). These quantum field theories are
constructed in such a way that the correlation
functions of certain operators provide topological 
invariants of the spacetime manifold where the 
theory is defined. This means that one can use the 
methods and insights of quantum field theory in 
order to obtain information about topological 
invariants of low-dimensional manifolds. 
Historically, the first TQFT was Donaldson-Witten
theory, also called topological Yang-Mills theory.
This theory was constructed by Witten (1988) starting
from $\mathcal{N}=2$ super Yang-Mills by a procedure called
“topological twisting.” The resulting model is topological,
and the famous Donaldson invariants of
4-manifolds are then recovered as certain correlation
functions in the topological theory. The analysis of
Witten (1988) did not indicate any new method to
compute the invariants, but in 1994 the progress in
understanding the nonperturbative dynamics of $\mathcal{N}=2$
theories (Seiberg and Witten 1994 a, b) led to an
alternative way of computing correlation functions in
Donaldson-Witten theory. As Witten (1994) showed,
Donaldson-Witten theory can be reduced to
another, simpler topological theory consisting of
a twisted abelian gauge theory coupled to spinor
fields. This theory leads to a different set of
4-manifold invariants, the so-called “Seiberg-Witten
invariants,” and Donaldson invariants can
be expressed in terms of these invariants through
Witten’s “magic formula.” The connection
between Seiberg-Witten and Donaldson invariants
was streamlined and extended by Moore
and Witten by using the so-called u-plane integral
(Moore and Witten 1998). This has led to a rather
complete understanding of Donaldson-Witten theory
from a physical point of view.

In this article we provide a brief review of 
Donaldson—Witten theory. First, we describe the 
construction of the model, from both a mathematical 
and a physical point of view, and state the main 
results for the Donaldson—Witten generating func- 
tional. In the next section, we present the basic results 
of the u-plane integral of Moore and Witten and 
sketch how it can be used to solve Donaldson—Witten 
theory. In the final section, we mention some 
generalizations of the basic framework. For a 
complete exposition of Donaldson—Witten theory, 
the reader is referred to the book by Labastida 
and Mariño (2005). A short review of the u-plane 
integral can be found in Mariño and Moore (1998a). 


Donaldson-Witten Theory: Basic 
Construction and Results 


Donaldson-Witten Theory According to Donaldson 


Donaldson theory as formulated in Donaldson (1990),
Donaldson and Kronheimer (1990), and Friedman and
Morgan (1991) starts with a principal $G = SO(3)$
bundle $V \to X$ over a compact, oriented, Riemannian
4-manifold $X$, with fixed instanton number $k$
and Stiefel-Whitney class $w_2(V)$ ($SO(3)$ bundles on a
4-manifold are classified up to isomorphism by these
topological data). The moduli space of anti-self-dual
(ASD) connections is then defined as


$$\mathcal{M}_{ASD} = \{A : F^+(A) = 0\}/\mathcal{G}$$ [1]
where $F^+(A)$ is the self-dual part of the curvature,
and $\mathcal{G}$ is the group of gauge transformations. To
construct the Donaldson polynomials, one considers
the universal bundle


$$\mathbb{P} = (V \times \mathcal{A}^*)/(\mathcal{G} \times G)$$ [2]


where $\mathcal{A}^*$ is the space of irreducible $G$-connections
on $V$. This is a $G$-bundle over $\mathcal{B}^* \times X$, where
$\mathcal{B}^* = \mathcal{A}^*/\mathcal{G}$ is the space of irreducible connections
modulo gauge transformations, and as such has a
Pontrjagin class


$$p_1(\mathbb{P}) \in H^*(\mathcal{B}^*) \otimes H^*(X)$$ [3]


One can then obtain differential forms on $\mathcal{B}^*$ by
taking the slant product of $p_1(\mathbb{P})$ with homology
classes in $X$. In this way we obtain the Donaldson
map:


$$\mu : H_i(X) \to H^{4-i}(\mathcal{B}^*)$$ [4]


After restriction to $\mathcal{M}_{ASD}$, we obtain the following
differential forms on the moduli space of ASD
connections:


$$x \in H_0(X) \to O(x) \in H^4(\mathcal{M}_{ASD}), \qquad S \in H_2(X) \to I_2(S) \in H^2(\mathcal{M}_{ASD})$$ [5]


If the manifold $X$ has $b_1(X) \neq 0$, there are also
cohomology classes associated to 1-cycles and
3-cycles, but we will not consider them here.

We can now formally define the Donaldson
invariants as follows. Consider the space


$$\mathbf{A}(X) = \mathrm{Sym}(H_0(X) \oplus H_2(X))$$ [6]


with a typical element written as $x^\ell S_{i_1} \cdots S_{i_p}$. The
Donaldson invariant corresponding to this element
of $\mathbf{A}(X)$ is the following intersection number:


$$\mathcal{D}_X^{w_2(V),k}(x^\ell S_{i_1} \cdots S_{i_p}) = \int_{\mathcal{M}_{ASD}(w_2(V),k)} O^\ell \wedge I_2(S_{i_1}) \wedge \cdots \wedge I_2(S_{i_p})$$ [7]


where $\mathcal{M}_{ASD}(w_2(V),k)$ is the moduli space of ASD connections
with second Stiefel-Whitney class $w_2(V)$ and
instanton number $k$. The integral in [7] will be
different from zero only if the degrees of the forms
add up to $\dim(\mathcal{M}_{ASD}(w_2(V),k))$.
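
The degree condition is a simple counting rule. By the Donaldson map [4], a point class gives a 4-form $O$ and a 2-cycle gives a 2-form $I_2(S)$, so a monomial $x^\ell S_{i_1} \cdots S_{i_p}$ can contribute only if $4\ell + 2p$ equals the dimension of the moduli space. The sketch below lists the admissible pairs $(\ell, p)$; the function name and the input dimension are illustrative assumptions, and no formula for the dimension itself is used.

```python
def admissible_monomials(dim_moduli: int):
    """All (l, p) with 4*l + 2*p == dim_moduli, i.e. monomials x^l S^p of the right degree."""
    pairs = []
    for l in range(dim_moduli // 4 + 1):
        remainder = dim_moduli - 4 * l
        if remainder % 2 == 0:
            pairs.append((l, remainder // 2))
    return pairs


print(admissible_monomials(8))  # three admissible monomial types
print(admissible_monomials(3))  # odd dimension: nothing can contribute
```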

It is very convenient to pack all Donaldson
invariants in a generating functional. Let
$\{S_i\}_{i=1,\dots,b_2}$ be a basis of 2-cycles. We introduce the
formal sum


$$S = \sum_{i=1}^{b_2} v_i S_i$$ [8]


where $v_i$ are complex numbers. We then define the
Donaldson-Witten generating functional as


$$Z_{DW}^{w_2(V)}(p, v_i) = \sum_{k=0}^{\infty} \mathcal{D}_X^{w_2(V),k}(e^{px + S})$$ [9]


where on the right-hand side we are summing over
all instanton numbers, that is, we are summing over
all topological configurations of the $SO(3)$ gauge
field with a fixed $w_2(V)$. This gives a formal power
series in $p$ and $v_i$.

For $b_2^+(X) > 1$, the generating functional [9]
is a diffeomorphism invariant of $X$; therefore, it
is potentially a powerful tool in four-dimensional
topology. When $b_2^+(X) = 1$, Donaldson invariants
are metric dependent. The metric dependence
can be described in more detail as follows. Define
the period point as the harmonic 2-form $\omega$
satisfying


$$*\omega = \omega, \qquad \omega^2 = 1$$ [10]


which depends on the conformal class of the metric.
As the conformal class of the metric varies, $\omega$
describes a curve in the cone


$$V_+ = \{\omega \in H^2(X, \mathbb{R}) : \omega^2 > 0\}$$ [11]

Let $\zeta \in H^2(X)$ satisfy


$$\zeta \equiv w_2(V) \bmod 2, \qquad \zeta^2 < 0, \qquad (\zeta, \omega) = 0$$ [12]


Such an element $\zeta$ defines a “wall” in $V_+$:

$$W_\zeta = \{\omega : (\zeta, \omega) = 0\}$$ [13]


The complements of these walls are called “chambers,”
and the cone $V_+$ is then divided in chambers
separated by walls. A class $\zeta$ satisfying [12] is the
first Chern class associated to a reducible solution of
the ASD equations, and it causes a singularity in
moduli space: the Donaldson invariants jump when
we pass through such a wall. Therefore, when
$b_2^+(X) = 1$, Donaldson invariants are metric independent
in each chamber. A basic problem in
Donaldson-Witten theory is to determine the jump
in the generating function as we cross a wall,


$$Z_+^\zeta(p, S) - Z_-^\zeta(p, S) = WC_\zeta(p, S)$$ [14]

The jump term $WC_\zeta(p, S)$ is usually called the
“wall-crossing” term.
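
The wall and chamber structure can be explored numerically in a toy setting. The sketch below assumes a hypothetical lattice with diagonal intersection form of signature $(1,1)$, so that $b_2^+ = 1$, and a trivial lift of $w_2(V)$; none of these choices come from a specific 4-manifold. It checks the two conditions on $\zeta$ that do not involve the metric and reports on which side of the wall $W_\zeta$ a given period point lies.

```python
from fractions import Fraction


def pairing(a, b, signs):
    """Intersection pairing for a diagonal form with entries `signs` (an assumption)."""
    return sum(s * x * y for s, x, y in zip(signs, a, b))


def defines_wall(zeta, w2_lift, signs):
    """Metric-independent conditions: zeta = w2(V) mod 2 and zeta^2 < 0."""
    congruent = all((z - w) % 2 == 0 for z, w in zip(zeta, w2_lift))
    return congruent and pairing(zeta, zeta, signs) < 0


def chamber_side(omega, zeta, signs):
    """Sign of (zeta, omega): 0 on the wall, +1 or -1 in the two adjacent chambers."""
    value = pairing(zeta, omega, signs)
    return (value > 0) - (value < 0)


SIGNS = (1, -1)    # hypothetical lattice of signature (1, 1)
ZETA = (0, 2)      # zeta^2 = -4
W2_LIFT = (0, 0)   # hypothetical integer lift of w2(V)

# Period points normalized so that omega^2 = 1.
for omega in [(Fraction(5, 4), Fraction(3, 4)), (1, 0), (Fraction(5, 4), Fraction(-3, 4))]:
    print(omega, chamber_side(omega, ZETA, SIGNS))
```

As the period point moves along the hyperbola $\omega^2 = 1$, the sign changes exactly when it crosses $(\zeta, \omega) = 0$, which is where the generating function picks up the wall-crossing term.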

The basic goal of Donaldson theory is to study
the properties of the generating functional [9] and
to compute it for different 4-manifolds $X$. On
the mathematical side, many results have been
obtained on $Z_{DW}$, and some of them can be found
in Donaldson and Kronheimer (1990), Friedman
and Morgan (1991), Stern (1998), and Göttsche
(1996). On the other hand, Donaldson theory can be
formulated as a topological field theory, and many
of these results can be obtained by using quantum
field theory techniques. This will be our main focus
for the rest of the article.


Donaldson-Witten Theory According to Witten 


Witten (1988) constructed a twisted version of
$\mathcal{N}=2$ super Yang-Mills theory which has a nilpotent
Becchi-Rouet-Stora-Tyutin (BRST) charge
(modulo gauge transformations)


$$\mathcal{Q} = \epsilon^{\dot\alpha A} Q_{\dot\alpha A}$$ [15]


where $Q_{\dot\alpha A}$ are the supersymmetric (SUSY) charges.
Here $\dot\alpha$ is a chiral spinor index and $A$ has its origin in
the $SU(2)$ R-symmetry. The field content of the
theory is the standard twisted $\mathcal{N}=2$ vector multiplet:


A, Wa — cies Q, D a o — Pah d, = På [16] 


where $\frac{1}{2} D_{\mu\nu}\, dx^\mu dx^\nu$ is a self-dual 2-form derived
from the auxiliary fields, etc. All fields are valued in
the adjoint representation of the gauge group. After
twisting, the theory is well defined on any Riemannian
4-manifold, since the fields are naturally
interpreted as differential forms and the $\mathcal{Q}$ charge
is a scalar (Witten 1988).

The observables of the theory are $\mathcal{Q}$-cohomology
classes of operators, and they can be constructed
from 0-form observables $O^{(0)}$ using the descent
procedure. This amounts to solving the equations


$$dO^{(i)} = \{\mathcal{Q}, O^{(i+1)}\}, \quad i = 0, \dots, 3$$ [17]


The integration over $i$-cycles $\gamma^{(i)}$ in $X$ of the
operators $O^{(i)}$ is then an observable. These descent
equations have a canonical solution: the 1-form-valued
operator $K_{\alpha\dot\alpha} = -i\delta^A_{\dot\alpha} Q_{\alpha A}/4$ verifies


$$d = \{\mathcal{Q}, K\}$$ [18]


as a consequence of the supersymmetry algebra. The
operators $O^{(i)} = K^i O^{(0)}$ solve the descent equations
[17] and are canonical representatives. When the
gauge group is $SU(2)$, the observables are obtained
by the descent procedure from the operator


$$O = \mathrm{tr}(\phi^2)$$ [19]

The topological descendant $O^{(2)}$ is given by
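$$O^{(2)} = -\frac{1}{2}\,\mathrm{tr}\left(\frac{1}{\sqrt{2}}\,\phi\,(F^-_{\mu\nu} + D^+_{\mu\nu}) - \frac{1}{4}\,\psi_\mu \psi_\nu\right) dx^\mu \wedge dx^\nu$$ [20]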


and the resulting observable is


$$I_2(S) = \int_S O^{(2)}$$ [21]


$O$ and $I_2(S)$ correspond to the cohomology classes in
[5]. One of the main results of Witten (1988) is that
the semiclassical approximation in the twisted
$\mathcal{N}=2$ Yang-Mills theory is exact. The semiclassical
evaluation of correlation functions of the observables
above leads directly to the definition of
Donaldson invariants, and the generating functional
[9] can be written as a correlation function of the
twisted theory. One then has


$$Z_{DW}^{w_2(V)}(p, S) = \langle \exp(pO + I_2(S)) \rangle$$ [22]

Results for the Donaldson-Witten 
Generating Function 


The basic results that have emerged from the 
physical approach to Donaldson—Witten theory are 
the following. 


1. The Donaldson-Witten generating functional
is in general the sum of two terms,


$$Z_{DW} = Z_u + Z_{SW}$$ [23]


(We have omitted the Stiefel-Whitney class for
convenience.) The first term, $Z_u$, is called the
“u-plane integral.” It is given by a complicated
integral over $\mathbb{C}$ which can be written, in turn, as an
integral over a fundamental domain of the congruence
subgroup $\Gamma^0(4)$ of $SL(2, \mathbb{Z})$. $Z_u$ depends
only on the cohomology ring of $X$, and therefore
does not contain any information beyond the one
provided by classical topology. Finally, $Z_u$ vanishes
if $b_2^+(X) > 1$, and it is responsible for the wall-crossing
behavior of $Z_{DW}$ when $b_2^+(X) = 1$.

2. The second term of [23], $Z_{SW}$, is called the
Seiberg-Witten contribution. This contribution
involves the Seiberg-Witten invariants of $X$, which
are obtained by considering the moduli problem
defined by the Seiberg-Witten monopole equations
(Witten 1994b):


$$F^+_{\dot\alpha\dot\beta} + 4i\,\bar{M}_{(\dot\alpha} M_{\dot\beta)} = 0, \qquad D_L^{\alpha\dot\alpha} M_{\dot\alpha} = 0$$ [24]

In these equations, $M_{\dot\alpha}$ is a section of the spinor
bundle $S^+ \otimes L^{1/2}$, $L$ is the determinant line bundle
of a Spin$_c$ structure on $X$, $F^+_{\dot\alpha\dot\beta}$ is the
self-dual part of the curvature of a $U(1)$ connection
on $L$, and $D_L$ is the Dirac operator for the bundle
$S^+ \otimes L^{1/2}$. The solutions of these equations modulo
gauge equivalence form the moduli space $\mathcal{M}_{SW}$,
and the Seiberg-Witten invariants are defined by
integrating suitable differential forms on this moduli
space. We will label Spin$_c$ structures by the class
$\lambda = c_1(L^{1/2}) \in H^2(X, \mathbb{Z}) + w_2(X)/2$. We say that $\lambda$ is
a Seiberg-Witten basic class if the corresponding
Seiberg-Witten invariants are not all zero. If $\mathcal{M}_{SW}$
is zero dimensional, the Seiberg-Witten invariant
depends only on the Spin$_c$ structure associated to
$\lambda = c_1(L^{1/2})$, and is denoted by $SW(\lambda)$.

3. A manifold $X$ is said to be of Seiberg-Witten
simple type if all the Seiberg-Witten basic
classes have a zero-dimensional moduli space. For
simply connected 4-manifolds of Seiberg-Witten
simple type and with $b_2^+(X) > 1$, Witten determined
the Seiberg-Witten contribution and proposed the
following “magic formula” for $Z_{DW}$ (Witten 1994b):


$$Z_{DW}^{w_2(V)} = 2^{1 + \frac{7\chi + 11\sigma}{4}} \sum_{\lambda} e^{2\pi i(\lambda_0 \cdot \lambda + \lambda_0^2)} \left\{ e^{2p + S^2/2}\, e^{2(S, \lambda)} + i^{\chi_h - w_2(V)^2}\, e^{-2p - S^2/2}\, e^{-2i(S, \lambda)} \right\} SW(\lambda)$$ [25]


In this equation, $\chi$, $\sigma$ are the Euler characteristic and
signature of $X$, respectively, $\chi_h = (\chi + \sigma)/4$ is the
holomorphic Euler characteristic of $X$, and $\lambda_0$
is an integer lifting of $w_2(V)$. This formula generalizes
previous results by Witten (1994a) for
Kähler manifolds. It also follows from this formula
that the Donaldson-Witten generating function of
simply connected 4-manifolds of Seiberg-Witten
simple type and with $b_2^+(X) > 1$ satisfies


$$\left(\frac{\partial^2}{\partial p^2} - 4\right) Z_{DW} = 0$$


which is the Donaldson simple type condition
introduced by Kronheimer and Mrowka (1994).

4. Using the u-plane integral, one can find explicit 
expressions for $Z_{DW}$ in more general situations (like
non-simply-connected manifolds or manifolds which 
are not of Seiberg—Witten simple type). 
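
The simple type condition stated in item 3 can be checked in one line. The following is a sketch under the assumption that, in the magic formula [25], the variable $p$ enters only through the two prefactors $e^{2p}$ and $e^{-2p}$, with coefficients $a_\lambda$ and $b_\lambda$ (names introduced here for illustration) that do not depend on $p$:

```latex
Z_{DW} = \sum_{\lambda} \left( a_\lambda\, e^{2p} + b_\lambda\, e^{-2p} \right),
\qquad
\frac{\partial^2}{\partial p^2}\, e^{\pm 2p} = 4\, e^{\pm 2p}
\quad \Longrightarrow \quad
\left( \frac{\partial^2}{\partial p^2} - 4 \right) Z_{DW} = 0 .
```

The two exponentials correspond to the two points $u = \pm 1$ of the $u$-plane, each of which contributes a term with a definite dependence on $p$.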


In the next section we explain the formalism of 
the u-plane integral introduced by Moore and 
Witten (1998), which makes possible a detailed 
derivation of the above results. 


The u-Plane Integral 
Definition of the u-Plane Integral 


The evaluation of the Donaldson-Witten generating
function can be made by using the results of Seiberg
and Witten (1994 a, b) on the low-energy dynamics
of $SU(2)$, $\mathcal{N}=2$ Yang-Mills theory. In their work,
Seiberg and Witten determined the exact low-energy
effective action of the model up to two derivatives.
From a physical point of view, there are certainly
corrections to this effective action which are difficult
to evaluate. Fortunately, the computation in the
twisted version of the theory can be done by just
considering the Seiberg-Witten effective action. This
is because the correlation functions in the twisted
theory are invariant under rescalings of the metric,
so we can evaluate them in the limit of large
distances or equivalently of very low energies. The
effective action up to two derivatives is sufficient for
that purpose.

One way of describing the main result of the work
of Seiberg and Witten is that the moduli space of
$\mathcal{Q}$-fixed points of the twisted $SO(3)$ $\mathcal{N}=2$ theory on
a compact 4-manifold has two branches, which we
refer to as the Coulomb and Seiberg-Witten
branches. On the Coulomb branch the expectation
value


$$u = \frac{\langle \mathrm{tr}\,\phi^2 \rangle}{16\pi^2}$$


breaks $SO(3) \to U(1)$ via the standard Higgs
mechanism. The Coulomb branch is simply a copy
of the complex u-plane. The low-energy effective
theory on this branch is simply the abelian $\mathcal{N}=2$
gauge theory. However, at two points, $u = \pm 1$, there
is a singularity where the moduli space meets a
second branch, the Seiberg-Witten branch. At these
points, the effective action is given by the magnetic
dual of the $U(1)$, $\mathcal{N}=2$ gauge theory coupled to a
monopole matter hypermultiplet. Therefore, this
branch consists of solutions to the Seiberg-Witten
equations [24].

Since the manifold X is compact, the partition 
function of the twisted theory is a sum over “all” 
vacuum states. Equation [23] then follows. In this 
equation, $Z_u$ comes from “integrating over the
u-plane,” while $Z_{SW}$ corresponds to the points
$u = \pm 1$. As we stated before, $Z_u$ vanishes for
manifolds of $b_2^+(X) > 1$, but once this piece has
been determined an argument originally presented
in Moore and Witten (1998) allows one to derive
the form of $Z_{SW}$ as well for arbitrary $b_2^+(X) \ge 1$.

The computation of $Z_u$ is presented in detail in
Moore and Witten (1998). The starting point of
the computation is the untwisted low-energy
theory, which has been described in detail in
Seiberg and Witten (1994 a, b) and Witten
(1995). It is an $\mathcal{N}=2$ theory characterized by a
prepotential $\mathcal{F}$ which depends on an $\mathcal{N}=2$ vector
multiplet. The effective gauge coupling is given by
$\tau(a) = \mathcal{F}''(a)$, where $a$ is the scalar component of
the vector multiplet. The Euclidean Lagrange
density for the u-plane theory can be obtained
simply by twisting the physical theory. It can be
written as
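$$\frac{i}{6\pi} K^4 \mathcal{F}(a) + \frac{1}{16\pi}\{\mathcal{Q}, \bar{\mathcal{F}}''\chi(D + F_+)\} - \frac{i\sqrt{2}}{32\pi}\{\mathcal{Q}, \bar{\mathcal{F}}'\, d * \psi\} - \frac{\sqrt{2}\, i}{3 \cdot 2^5 \pi}\{\mathcal{Q}, \bar{\mathcal{F}}'''\chi_{\mu\nu}\chi^{\nu\lambda}\chi_\lambda{}^{\mu}\}\sqrt{g}\, d^4x + A(u)\,\mathrm{tr}\,R \wedge R^* + B(u)\,\mathrm{tr}\,R \wedge R$$ [26]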





where $A(u)$, $B(u)$ describe the coupling to gravity,
and after integration of the corresponding differential
forms we obtain terms proportional to the
signature $\sigma$ and Euler characteristic $\chi$ of $X$. The
data of the low-energy effective action can be
encoded in an elliptic curve of the form


$$y^2 = x^3 - u x^2 + \tfrac{1}{4} x$$ [27]


and $\tau$ is the modulus of the curve. The monodromy
group of this curve is $\Gamma^0(4)$. All the quantities
involved in the action can be obtained by integrating
a certain meromorphic differential on the curve, and
they can be expressed in terms of modular forms.
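
The special role of the points $u = \pm 1$ can be read off from the curve itself. The sketch below assumes the curve has the form $y^2 = x^3 - u x^2 + x/4$ and computes the discriminant of the cubic on the right-hand side with exact rational arithmetic; the function name is illustrative. The curve degenerates precisely where this discriminant vanishes.

```python
from fractions import Fraction


def cubic_discriminant(u):
    """Discriminant of x^3 - u x^2 + x/4, viewed as a cubic a x^3 + b x^2 + c x + d."""
    a, b, c, d = Fraction(1), -Fraction(u), Fraction(1, 4), Fraction(0)
    return (18 * a * b * c * d - 4 * b**3 * d + b**2 * c**2
            - 4 * a * c**3 - 27 * a**2 * d**2)


# With d = 0 the discriminant reduces to (u^2 - 1) / 16.
for u in (-1, 0, 1, 3):
    print(u, cubic_discriminant(u))
```

The discriminant is proportional to $u^2 - 1$, so the only singular fibers at finite $u$ sit at $u = \pm 1$, the two points where the Coulomb branch meets the Seiberg-Witten branch.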

As for the operators, we have $u = O(P)$ by
definition. We may then obtain the 2-observables
from the descent procedure. The result is that
$I(S) \to \tilde{I}(S) = \int_S K^2 u = \int_S (du/da)(D_+ + F_-) + \cdots$. Here $D_+$
is the auxiliary field. Although one has $I(S) \to \tilde{I}(S)$ in
going from the microscopic theory to the effective
theory, it does not necessarily follow that
$I(S_1) I(S_2) \to \tilde{I}(S_1) \tilde{I}(S_2)$ because there can be contact
terms. If $S_1$ and $S_2$ intersect, then in passing to the
low-energy theory we integrate out massive modes.
This can induce delta function corrections to the
operator product expansion, modifying the mapping
to the low-energy theory as follows:


$$\exp(I(S)) \to \exp(\tilde{I}(S) + S^2 T(u))$$ [28]


where $T(u)$ is the contact term. Such contact terms
were observed in Witten (1994a) and studied in
detail in Losev et al. (1998). It can be shown that


$$T(u) = -\frac{1}{24} E_2(\tau) \left(\frac{du}{da}\right)^2 + \frac{1}{3} u$$ [29]


where $E_2(\tau)$ is Eisenstein’s series and $da/du$ is one of
the periods of the elliptic curve [27].

The final result of Moore and Witten is the 
following expression: 


du du 24 
Z,(p, 8) = J TO (30 





Here, 


[31] 


where $y = \mathrm{Im}\,\tau$ and $\Delta$ is the discriminant of the
curve [27]. The quantity $\Psi$ is essentially a Narain-Siegel
theta function associated to the lattice
$H^2(X, \mathbb{Z})$. Notice that this lattice is Lorentzian and
has signature $(1, (-1)^{b_2 - 1})$ (since $b_2^+(X) = 1$). The
self-dual projection of a 2-form $\lambda$ can be done with
the period point $\omega$ as $\lambda_+ = (\lambda, \omega)\omega$. The lattice is
shifted by half the second Stiefel-Whitney class of
the bundle, $w_2(V)$, that is,


$$\Gamma = H^2(X, \mathbb{Z}) + \tfrac{1}{2} w_2(V)$$


and 
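$$\Psi = \exp\left(-\frac{1}{8\pi y}\left(\frac{du}{da}\right)^2 S_-^2\right) e^{2\pi i \lambda_0^2} \sum_{\lambda \in \Gamma} (-1)^{(\lambda - \lambda_0) \cdot w_2(X)} \left[(\lambda, \omega) + \frac{i}{4\pi y}\frac{du}{da}(S, \omega)\right] \exp\left[-i\pi\bar{\tau}(\lambda_+)^2 - i\pi\tau(\lambda_-)^2 - i\frac{du}{da}(S, \lambda_-)\right]$$ [32]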


Here, $w_2(X)$ is the second Stiefel-Whitney class
of $X$, and $\lambda_0$ is a choice of lifting of $w_2(V)$ to
$H^2(X, \mathbb{Z})$. This expression can be extended to the
non-simply-connected case (see Mariño and
Moore (1999) and Moore and Witten (1998)).
The study of the u-plane integral leads to a
systematic derivation of many important results
in Donaldson-Witten theory. We will discuss in
detail two such applications, Göttsche’s wall-crossing
formula and Witten’s “magic formula.”


Wall-Crossing Formula 


As shown by Moore and Witten, the u-plane integral 
is well defined and does not depend on the period 
point (hence on the metric on X) except for discontin- 
uous behavior at walls. There are two kinds of walls, 
associated, respectively, to the singularities at $u = \infty$ 
(the semiclassical region of the underlying Yang-Mills 
theory) and at $u = \pm 1$, given by 


$$u = \infty:\quad \lambda_+ = 0,\quad \lambda \in H^2(X,\mathbb{Z}) + \tfrac{1}{2}\,w_2(V)$$
$$u = \pm 1:\quad \lambda_+ = 0,\quad \lambda \in H^2(X,\mathbb{Z}) + \tfrac{1}{2}\,w_2(X) \qquad [33]$$


The first type of walls is precisely the one that 
appears in Donaldson theory on manifolds of 
$b_2^+(X)=1$. The discontinuity of the u-plane inte- 
gral at these walls can be easily computed from 
eqn [33]: 


WCc-2, (p, S) 
O : (—1) Oro) 2i ag Phot) OF 
x exp{2pu., + S*Tx— i(A, S) /boo} [34] 
q 
This expression involves the modular forms 
$h_\infty$, $f_{2\infty}$, $u_\infty$, and $T_\infty$ (the subscript $\infty$ refers to the 
fact that they are computed in the “electric” frame, 
which is appropriate for the Seiberg-Witten curve at 
$u \to \infty$). They can be written in terms of Jacobi 
theta functions $\vartheta_i(q)$, with $q = e^{2\pi i\tau}$, and their 
explicit expression is 
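$$h_\infty(q) = \tfrac{1}{2}\,\vartheta_2(q)\,\vartheta_3(q), \qquad f_{2\infty}(q) = \frac{\vartheta_2(q)\,\vartheta_3(q)}{2\,\vartheta_4^{8}(q)},$$
$$u_\infty(q) = \frac{\vartheta_2^4(q) + \vartheta_3^4(q)}{2\,(\vartheta_2(q)\,\vartheta_3(q))^{2}}, \qquad T_\infty(q) = -\frac{1}{24}\,\frac{E_2(q)}{h_\infty^2(q)} + \frac{1}{3}\,u_\infty(q) \qquad [35]$$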


The subscript $q^0$ means that in the expansion in q of 
the modular forms, we pick the constant term. The 
formula [34] agrees with the formula of Göttsche 
(1996) for the wall crossing of the Donaldson- 
Witten generating functional. 
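
The theta-function expressions entering [35] can be sanity-checked numerically. The following Python sketch evaluates the Jacobi theta constants as truncated q-series and verifies Jacobi's identity $\vartheta_3^4 = \vartheta_2^4 + \vartheta_4^4$, together with its direct consequence $u_\infty^2 - 1 = \vartheta_4^8 / (4(\vartheta_2\vartheta_3)^4)$ for $u_\infty = (\vartheta_2^4+\vartheta_3^4)/(2(\vartheta_2\vartheta_3)^2)$. The series conventions, the restriction to real $0<q<1$, and the function names `theta_constants` and `u_infinity` are assumptions made for illustration only.

```python
def theta_constants(q, terms=60):
    """Truncated q-series for (theta2, theta3, theta4), with q = exp(2*pi*i*tau).

    Assumed conventions (real 0 < q < 1 only):
      theta2 = sum_n q^((n+1/2)^2/2)
      theta3 = sum_n q^(n^2/2)
      theta4 = sum_n (-1)^n q^(n^2/2)
    """
    theta2 = sum(q ** ((n + 0.5) ** 2 / 2) for n in range(-terms, terms))
    theta3 = sum(q ** (n * n / 2) for n in range(-terms, terms + 1))
    theta4 = sum((-1) ** (n % 2) * q ** (n * n / 2)
                 for n in range(-terms, terms + 1))
    return theta2, theta3, theta4


def u_infinity(q):
    """Electric-frame u = (theta2^4 + theta3^4) / (2 (theta2 theta3)^2)."""
    t2, t3, _ = theta_constants(q)
    return (t2 ** 4 + t3 ** 4) / (2 * (t2 * t3) ** 2)


if __name__ == "__main__":
    q = 0.05
    t2, t3, t4 = theta_constants(q)
    # Jacobi's identity: the printed defect should vanish up to rounding.
    print("Jacobi identity defect:", t3 ** 4 - t2 ** 4 - t4 ** 4)
    u = u_infinity(q)
    # u^2 - 1 is proportional to theta4^8, so it vanishes only where theta4 does.
    print("u^2 - 1 defect:", u * u - 1 - t4 ** 8 / (4 * (t2 * t3) ** 4))
    # As q -> 0 (weak coupling) u grows without bound: the semiclassical region.
    print("u at q = 1e-8:", u_infinity(1e-8))
```

The last line illustrates why the electric frame is the one adapted to $u \to \infty$: the leading behavior of the series gives $u_\infty \approx 1/(8 q^{1/4})$ for small q.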


The Seiberg-Witten Contribution and Witten’s 
Magic Formula 


At $u = \pm 1$, $Z_u$ jumps at the second type of 
walls [33], which are called Seiberg-Witten (SW) 
walls. In fact, these walls are labeled by classes 
$\lambda \in H^2(X;\mathbb{Z}) + (1/2)\,w_2(X)$, which correspond to 
Spin$_c$ structures on X. At these walls, the Seiberg- 
Witten invariants have wall-crossing behavior. Since 
the Donaldson polynomials do not jump at SW 
walls, it must happen that the change of $Z_u$ at $u = \pm 1$ 
is canceled by the change of $Z_{SW}$. As shown by 
Moore and Witten, this actually allows one 
to obtain a precise expression for $Z_{SW}$ for general 
4-manifolds of $b_2^+(X) > 1$. 

On general grounds, $Z_{SW}$ is given by the sum of 
the generating functionals at $u = \pm 1$. These involve 
a magnetic U(1), $\mathcal{N}=2$ vector multiplet coupled to a 
hypermultiplet (the monopole field). The twisted 
Lagrangian for such a system involves the magnetic 
prepotential $F_D(a_D)$, and it can be written as 


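$$\{Q, W\} + \frac{i}{16\pi}\,\tau_D\,F\wedge F + p(u)\,\mathrm{tr}\,R\wedge {*R} + \ell(u)\,\mathrm{tr}\,R\wedge R$$
$$\qquad - \frac{i\sqrt{2}}{32\pi}\,\frac{d\tau_D}{da_D}\,(\psi\wedge\psi)\wedge F + \frac{i}{3\cdot 2^{7}\pi}\,\frac{d^2\tau_D}{da_D^2}\,\psi\wedge\psi\wedge\psi\wedge\psi \qquad [36]$$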


where $\tau_D = F_D''(a_D)$. Using the cancellation of wall 
crossings, one can actually compute the functions 
$F_D(a_D)$, $p(u)$, $\ell(u)$ and determine the precise form of 
the Seiberg-Witten contributions. One finds that a 
Spin$_c$ structure $\lambda$ at $u = 1$ gives the following contribu- 
tion to the Donaldson-Witten generating functional: 


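$$Z_{u=1,\lambda} = \frac{SW(\lambda)}{16}\,e^{2\pi i(\lambda_0^2 - \lambda_0\cdot\lambda)} \left[ q_D^{-\lambda^2/2}\,\frac{\vartheta_2(q_D)^{8+\sigma}}{a_D\,h_M} \left(2i\,\frac{a_D}{h_M^2}\right)^{\chi_h} \exp\left(2pu_M + i(\lambda,S)/h_M + S^2 T_M\right) \right]_{q_D^0} \qquad [37]$$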





Here, $a_D$, $h_M$, $u_M$, and $T_M$ are modular forms 
that can be expressed as well in terms of Jacobi 
theta functions $\vartheta_i(q_D)$, where $q_D = \exp(2\pi i\tau_D)$. 
The subscript M refers to the monopole point, 
and they are related by an S-transformation 
to the quantities obtained in the “electric” 
frame at $u \to \infty$. Their explicit expression is 


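$$a_D(q_D) = -\frac{i}{6}\,\frac{2E_2(q_D) - \vartheta_3^4(q_D) - \vartheta_4^4(q_D)}{\vartheta_3(q_D)\,\vartheta_4(q_D)}, \qquad h_M(q_D) = \frac{1}{2i}\,\vartheta_3(q_D)\,\vartheta_4(q_D),$$
$$u_M(q_D) = \frac{\vartheta_3^4(q_D) + \vartheta_4^4(q_D)}{2\,(\vartheta_3(q_D)\,\vartheta_4(q_D))^{2}}, \qquad T_M(q_D) = -\frac{1}{24}\,\frac{E_2(q_D)}{h_M^2(q_D)} + \frac{1}{3}\,u_M(q_D) \qquad [38]$$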





The contribution at $u = -1$ is related to the 
contribution at $u = 1$ by a $u \to -u$ symmetry: 


$$Z_{u=-1}(p,S) = e^{-2\pi i\lambda_0^2}\,i^{\chi_h}\,Z_{u=1}(-p, iS) \qquad [39]$$


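If the manifold has $b_2^+(X) > 1$ and is of Seiberg- 
Witten simple type, [37] reduces to 


$$Z_{u=1,\lambda} = 2^{1+7\chi/4+11\sigma/4}\,e^{2p+S^2/2}\,e^{-2(S,\lambda)}\,e^{2\pi i(\lambda_0^2-\lambda_0\cdot\lambda)}\,SW(\lambda) \qquad [40]$$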


This leads to Witten’s “magic formula” [25] which 
expresses the Donaldson invariants in terms of 
Seiberg-Witten invariants. 
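
The exponential factors $e^{2p+S^2/2}$ and $e^{-2(S,\lambda)}$ in [40] can be traced to the behavior of the monopole-frame modular forms as $q_D \to 0$. The Python sketch below evaluates $h_M = \vartheta_3\vartheta_4/(2i)$, $u_M = (\vartheta_3^4+\vartheta_4^4)/(2(\vartheta_3\vartheta_4)^2)$ and $T_M = -E_2/(24 h_M^2) + u_M/3$ at a small value of $q_D$ and checks that $u_M \to 1$, $T_M \to 1/2$ and $i/h_M \to -2$. These expressions are our reading of [38], and the series conventions and function names are assumptions introduced for illustration; the sketch is a consistency check under those assumptions, not a derivation.

```python
def theta34(q, terms=40):
    """Truncated series for theta3 = sum q^(n^2/2), theta4 = sum (-1)^n q^(n^2/2)."""
    theta3 = sum(q ** (n * n / 2) for n in range(-terms, terms + 1))
    theta4 = sum((-1) ** (n % 2) * q ** (n * n / 2)
                 for n in range(-terms, terms + 1))
    return theta3, theta4


def eisenstein_e2(q, terms=200):
    """Eisenstein series E2 = 1 - 24 * sum_{n>=1} n q^n / (1 - q^n)."""
    return 1 - 24 * sum(n * q ** n / (1 - q ** n) for n in range(1, terms))


def monopole_forms(q):
    """Return (h_M, u_M, T_M) at real 0 < q < 1, using the assumed forms of [38]."""
    t3, t4 = theta34(q)
    h = t3 * t4 / 2j                                  # h_M = theta3 theta4 / (2i)
    u = (t3 ** 4 + t4 ** 4) / (2 * (t3 * t4) ** 2)    # u_M
    contact = -eisenstein_e2(q) / (24 * h ** 2) + u / 3   # T_M
    return h, u, contact


if __name__ == "__main__":
    h, u, contact = monopole_forms(1e-12)
    print("u_M   ->", u)          # approaches 1, giving e^{2p}
    print("T_M   ->", contact)    # approaches 1/2, giving e^{S^2/2}
    print("i/h_M ->", 1j / h)     # approaches -2, giving e^{-2(S, lambda)}
```

Only the leading behavior at the monopole point is probed here; extracting the numerical prefactor in [40] additionally requires the leading terms of $a_D$ and $\vartheta_2$, which the sketch does not attempt.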


Other Applications of the u-Plane Integral 


The u-plane integral makes it possible to derive other 
results on the Donaldson-Witten generating 
functional. 

The blow-up formula. This relates the function 
$Z_{DW}$ on X to $Z_{DW}$ on the blown-up manifold $\hat X$. 
The u-plane integral leads directly to the general 
blow-up formula of Fintushel and Stern (1996). 

Direct evaluations. The u-plane integral can be 
evaluated directly in many cases, and this leads to 
explicit formulas for the Donaldson-Witten generat- 
ing functional of certain 4-manifolds with $b_2^+(X) = 1$, 
on certain chambers, and in terms of modular forms. 
For example, there are explicit formulas for the 
Donaldson-Witten generating functional of product 
ruled surfaces of the form $S^2 \times \Sigma_g$, in the limiting 
chambers in which $S^2$ or $\Sigma_g$ are very small (Moore 
and Witten 1998, Mariño and Moore 1999). Moore 
and Witten (1998) have also derived an explicit 
formula for the Donaldson invariants of $\mathbb{CP}^2$ in terms 
of Hurwitz class numbers. 


Extensions of Donaldson-Witten Theory 


Donaldson-Witten theory is a twisted version of 
SU(2), $\mathcal{N}=2$ Yang-Mills theory. The twisting of 
more general $\mathcal{N}=2$ gauge theories, involving other 
gauge groups and/or matter content, leads to other 
topological field theories that give interesting gen- 
eralizations of Donaldson—Witten theory. We now 
briefly list some of these extensions and their most 
important properties. 

Higher-rank theories. The extension of 
Donaldson-Witten to other gauge groups has been 
studied in detail in Mariño and Moore (1998b) and 
Losev et al. (1998). One can study the higher-rank 
generalization of the u-plane integral, and as shown 
in Mariño and Moore (1998b), this leads to a fairly 
explicit formula for the Donaldson-Witten generat- 
ing function in the SU(N) case, for manifolds with 
$b_2^+ > 1$ and of Seiberg-Witten simple type. Mathe- 
matically, higher-rank generalizations of Donaldson 
theory turn out to be much more complicated, but 
they can be studied. In particular, higher-rank 
generalizations of the Donaldson invariants can be 
defined and computed (Kronheimer 2004), and the 
results so far agree with the predictions of Mariño 
and Moore (1998b). Unfortunately, it seems that 
these higher-rank generalizations do not contain 
new topological information beyond that 
encoded in the Seiberg-Witten invariants. 

Theories with matter. Twisted SU(2), $\mathcal{N}=2$ 
theories with hypermultiplets lead to generalizations 
of Donaldson-Witten theory involving nonabelian 
monopole equations (see Mariño (1997) and 
Labastida and Mariño (2005) for a review of these 
models and some of their properties). The u-plane 
integral leads to explicit formulas for the generating 
functionals of these theories, which for manifolds of 
$b_2^+ > 1$ can be written in terms of Seiberg-Witten 
invariants. Again, no new topological information 
seems to be encoded in these theories. One can, 
however, exploit new physical phenomena arising in 
the theories with hypermultiplets (in particular, the 
presence of superconformal points) to obtain new 
information about the Seiberg-Witten invariants 
(see Mariño et al. (1999) for these developments). 

Vafa-Witten theory. The so-called Vafa-Witten 
theory is a close cousin of Donaldson-Witten theory, 
and was introduced by Vafa and Witten (1994) as a 
topological twist of $\mathcal{N}=4$ Yang-Mills theory. In 
some cases, the partition function of this theory 
counts the Euler characteristic of the moduli space of 
instantons on the 4-manifold X. For a review of some 
properties of this theory, see Lozano (1999). 


See also: Duality in Topological Quantum Field Theory; 
Mathai—Quillen Formalism; Seiberg—Witten Theory; 
Topological Quantum Field Theory: Overview. 
Introduction 


There have been many exciting interactions between 
physics and mathematics in the past few decades. 
A prominent role in these interactions has been 
played by certain field theories, known as topologi- 
cal quantum field theories (TQFTs). These are 
quantum field theories whose correlation functions 
are metric independent and, in fact, compute certain 
mathematical invariants (Birmingham et al. 1991, 
Cordes et al. 1996, Labastida and Lozano 1998). 

Well-known examples of TQFTs are, in two 
dimensions, the topological sigma models (Witten 
1988a), which are related to Gromov-Witten invar- 
iants and enumerative geometry; in three dimen- 
sions, Chern-Simons theory (Witten 1989), which is 
related to knot and link invariants; and in four 
dimensions, topological Yang-Mills theory (or 
Donaldson-Witten theory) (Witten 1988b), which 
is related to the Donaldson invariants. The two- and 
four-dimensional theories above are examples of 
cohomological (also Witten-type) TQFTs. As such, 
they are related to an underlying supersymmetric 
quantum field theory (the $\mathcal{N}=2$ nonlinear sigma 
model, and the $\mathcal{N}=2$ supersymmetric Yang-Mills 
theory, respectively) and there is no difference 
between the topological and the standard version 
on flat space. However, when one considers curved 
spaces, the topological version differs from the 
supersymmetric theory on flat space in that some 
of the fields have modified Lorentz transformation 
properties (spins). This unconventional spin assign- 
ment is also known as twisting, and it comes about 
basically to preserve supersymmetry on curved 
space. In fact, the twisting gives rise to at least one 
nilpotent scalar supercharge Q, which is a certain 
linear combination of the original (spinor) super- 
symmetry generators. 

In these theories the energy-momentum tensor is 
Q-exact, that is, 


$$T_{\mu\nu} = \{Q, \Lambda_{\mu\nu}\}$$


for some $\Lambda_{\mu\nu}$, which (barring potential anomalies) 
leads to the statement that the correlation functions 
of operators in the cohomology of Q are all metric 
independent. Furthermore, the corresponding path 
integrals are localized to field configurations that are 
annihilated by Q, and this typically leads to some 


moduli problem related to the computation of 
certain mathematical invariants. 

On the other hand, in Chern-Simons theory, as a 
representative of the so-called Schwarz-type topolo- 
gical theories, the topological character is manifest: 
one starts with an action which is explicitly 
independent of the metric on the 3-manifold, and 
thus correlation functions of metric-independent 
operators are topological invariants as long as 
quantization does not introduce any undesired 
metric dependence. 

Even though the primary motivation for introdu- 
cing TQFTs may be to shed light onto awkward 
mathematical problems, they have proved to be a 
valuable tool to gain insight into many questions of 
interest in physics as well. One such question where 
TQFTs can (and in fact do) play a role is duality. In 
what follows, an overview of the manifestations of 
duality is provided in the context of TQFTs. 


Duality 


The notion of duality is at the heart of some of the 
most striking recent breakthroughs in physics and 
mathematics. In broad terms, a duality (in physics) 
is an equivalence between different (and often 
complementary) descriptions of the same physical 
system. The prototypical example is electric- 
magnetic (abelian) duality. Other, more sophisti- 
cated, examples are the various string-theory 
dualities, such as T-duality (and its more specialized 
realization, mirror symmetry) and strong/weak 
coupling S-duality, as well as field theory dualities 
such as Montonen—Olive duality and Seiberg-Witten 
effective duality. 

Also, the original 't Hooft conjecture, stating that 
SU(N) gauge theories are equivalent (or dual), at 
large N, to string theories, has recently been revived 
by Maldacena (1998) by explicitly identifying the 
string-theory duals of certain (supersymmetric) 
gauge theories. 

One could wonder whether similar duality sym- 
metries work for TQFTs as well. As noted in the 
following, this is indeed the case. 

In two dimensions, topological sigma models 
come in two different versions, known as types 
A and B, respectively, which correspond to the 
two different ways in which $\mathcal{N}=2$ supersymmetry 
can be twisted in two dimensions. Computations 
in each model localize on different moduli spaces 
and, for a given target manifold, give different 
results, but it turns out that if one considers 
mirror pairs of Calabi-Yau manifolds, 
computations in one manifold with the A-model are 
equivalent to computations in the mirror manifold 
with the B-model. 

Also, in three dimensions, a program has been 
initiated to explore the duality between large N 
Chern-Simons gauge theory and topological strings, 
thereby establishing a link between enumerative 
geometry and knot and link invariants 
(Gopakumar and Vafa 1998). 

Perhaps the most impressive consequences of the 
interplay between duality and TQFTs have come out 
in four dimensions, on which we will focus in what 
follows. 


Duality in Twisted $\mathcal{N}=2$ Theories 


As mentioned above, topological Yang-Mills theory 
(or Donaldson-Witten theory) can be constructed by 
twisting the pure $\mathcal{N}=2$ supersymmetric Yang-Mills 
theory with gauge group SU(2). This theory contains 
a gauge field A, a pair of chiral spinors $\lambda_1$, $\lambda_2$, and a 
complex scalar field B. The twisted theory contains a 
gauge field A, bosonic scalars $\lambda$, $\phi$, a Grassmann-odd 
scalar $\eta$, a Grassmann-odd vector $\psi$, and a Grassmann- 
odd self-dual 2-form $\chi$. 

On a 4-manifold X, and for gauge group G, the 
twisted action has the form 


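$$S = \int_X d^4x\,\sqrt{g}\;\mathrm{tr}\Bigl( F^{+2} - i\chi^{\mu\nu}D_\mu\psi_\nu + i\eta D_\mu\psi^\mu + \tfrac{1}{4}\phi\{\chi_{\mu\nu},\chi^{\mu\nu}\}$$
$$\qquad\qquad + \tfrac{i}{4}\lambda\{\psi_\mu,\psi^\mu\} - \lambda D_\mu D^\mu\phi + \tfrac{i}{2}\phi\{\eta,\eta\} + \tfrac{1}{8}[\lambda,\phi]^2 \Bigr) \qquad [1]$$

where $F^+$ is the self-dual part of the Yang-Mills field 
strength F. The action [1] is invariant under the 
transformations generated by the scalar supercharge Q: 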


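$$\{Q, A_\mu\} = \psi_\mu, \qquad \{Q, \chi_{\mu\nu}\} = F^+_{\mu\nu},$$
$$\{Q, \psi_\mu\} = D_\mu\phi, \qquad \{Q, \eta\} = i[\lambda,\phi], \qquad [2]$$
$$\{Q, \phi\} = 0, \qquad \{Q, \lambda\} = \eta$$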


In these transformations, $Q^2$ is a gauge transforma- 
tion with gauge parameter $\phi$, modulo field equa- 
tions. Observables are, therefore, related to the 
G-equivariant cohomology of Q (i.e., the cohomol- 
ogy of Q restricted to gauge invariant operators). 
Auxiliary fields can be introduced so that the 
action [1] is Q-exact, that is, 


$$S = \{Q, \Lambda\} \qquad [3]$$


for $\Lambda$ a certain functional of the fields of the theory 
which comes under the name of gauge fermion, a 
BRST-inspired terminology which reflects the formal 
resemblance of topological cohomological field 
theories with some aspects of the BRST approach 
to the quantization of gauge theories. Before con- 
structing the topological observables of the theory, 
we begin by pointing out that for each independent 
Casimir of the gauge group G it is possible to 
construct an operator $W_0$, from which operators $W_i$ 
can be defined recursively through the descent 
equations $\{Q, W_i\} = dW_{i-1}$. For example, for the 
quadratic Casimir, 

$$W_0 = \frac{1}{8\pi^2}\,\mathrm{tr}(\phi^2) \qquad [4]$$

which generates the following family of operators: 

$$W_1 = \frac{1}{4\pi^2}\,\mathrm{tr}(\phi\,\psi), \qquad W_2 = \frac{1}{4\pi^2}\,\mathrm{tr}\left(\tfrac{1}{2}\,\psi\wedge\psi + \phi\wedge F\right), \qquad W_3 = \frac{1}{4\pi^2}\,\mathrm{tr}(\psi\wedge F) \qquad [5]$$


Using these one defines the following observables: 


$$\mathcal{O}^{(k)} = \int_{\gamma_k} W_k \qquad [6]$$


where $\gamma_k \in H_k(X)$ is a k-cycle on the 4-manifold X. 
The descent equations imply that they are Q-closed 
and depend only on the homology class of $\gamma_k$. 

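Topological invariants are constructed by taking 
vacuum expectation values of products of the 
operators $\mathcal{O}^{(k)}$: 


$$\left\langle \mathcal{O}^{(k_1)}\,\mathcal{O}^{(k_2)} \cdots \mathcal{O}^{(k_p)} \right\rangle = \int \mathcal{D}\mu\;\mathcal{O}^{(k_1)}\,\mathcal{O}^{(k_2)} \cdots \mathcal{O}^{(k_p)}\;e^{-S/e^2} \qquad [7]$$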


where the integration has to be understood on the 
space of field configurations modulo gauge transfor- 
mations, and e is a coupling constant. Standard 
arguments show that due to the Q-exactness of the 
action S, the quantities obtained in [7] are indepen- 
dent of e. This implies that the observables of the 
theory can be obtained either in the weak-coupling 
limit $e \to 0$ (also short-distance or ultraviolet regime, 
since the $\mathcal{N}=2$ theory is asymptotically free), where 
perturbative methods apply, or in the strong-coupling 
(also long-distance or infrared) limit $e \to \infty$, where 
one is forced to consider a nonperturbative approach. 

In the weak-coupling limit one proves that 
the correlation functions [7] descend to polynomials 
in the product cohomology of the moduli space 
of anti-self-dual (ASD) instantons 
$H^{k_1}(\mathcal{M}_{ASD}) \times H^{k_2}(\mathcal{M}_{ASD}) \times \cdots \times H^{k_p}(\mathcal{M}_{ASD})$, which are precisely 
the Donaldson polynomial invariants of X. How- 
ever, the weak-coupling analysis does not add any 
new ingredient to the problem of the actual 
computation of the invariants. The difficulties that 
one has to face in the field theory representation are 
similar to those in ordinary Donaldson theory. 

Nevertheless, the field theory connection is very 
important since in this theory the strong- and weak- 
coupling limits are exact, and therefore the door is 
open to find a strong-coupling description which 
could lead to a new, simpler representation for the 
Donaldson invariants. 

This alternative strategy was pursued by Witten 
(1994a), who found the strong-coupling realization 
of the Donaldson-Witten theory after using the 
results on the strong-coupling behavior of $\mathcal{N}=2$ 
supersymmetric gauge theories which he and Seiberg 
(Seiberg and Witten 1994a-c) had discovered. The 
key ingredient in Witten's derivation was to assume 
that the strong-coupling limit of Donaldson-Witten 
theory is equivalent to the “sum” over the twisted 
effective low-energy descriptions of the correspond- 
ing $\mathcal{N}=2$ physical theory. This “sum” is not entirely 
a sum, as in general it has a part which contains a 
continuous integral. The “sum” is now known as 
integration over the u-plane after the work of Moore 
and Witten (1998). Witten’s (1994a) assumption 
can be simply stated as saying that the weak-/ 
strong-coupling limit and the twist commute. In 
other words, to study the strong-coupling limit of 
the topological theory, one first untwists, then 
works out the strong-coupling limit of the physical 
theory and, finally, one twists back. From such a 
viewpoint, the twisted effective (strong-coupling) 
theory can be regarded as a TQFT dual to the 
original one. In addition, one could ask for the dual 
moduli problem associated to this dual TQFT. It 
turns out that in many interesting situations 
($b_2^+(X) > 1$) the dual moduli space is an abelian 
system corresponding to the Seiberg-Witten or 
monopole equations (Witten 1994a). The topologi- 
cal invariants associated with this new moduli space 
are the celebrated Seiberg—Witten invariants. 

Generalizations of Donaldson-Witten theory, with 
different gauge groups and/or additional matter 
content (such as, e.g., twisted $\mathcal{N}=2$ Yang-Mills 
multiplets coupled to twisted $\mathcal{N}=2$ matter multi- 
plets) are possible, and some of the possibilities have 
in fact been explored (see Moore and Witten (1998) 
and references therein). The main conclusion that 
emerges from these analyses is that, in all known 
cases, the relevant topological information is cap- 
tured by the Seiberg-Witten invariants, irrespective 
of the gauge group and matter content of the theory 
under consideration. These cases are not reviewed 
here, but rather the attention is turned to the twisted 
theories which emerge from $\mathcal{N}=4$ supersymmetric 
gauge theories. 


Duality in Twisted $\mathcal{N}=4$ Theories 


Unlike the $\mathcal{N}=2$ supersymmetric case, the $\mathcal{N}=4$ 
supersymmetric Yang-Mills theory in four dimen- 
sions is unique once the gauge group G is fixed. 
The microscopic theory contains a gauge or gluon 
field, four chiral spinors (the gluinos) and six real 
scalars. All these fields are massless and take 
values in the adjoint representation of the gauge 
group. The theory is finite and conformally 
invariant, and is conjectured to have a duality 
symmetry exchanging strong and weak coupling 
and exchanging electric and magnetic fields, which 
extends to a full SL(2, Z) symmetry acting on the 
microscopic complexified coupling (Montonen and 
Olive 1977) 


$$\tau = \frac{\theta}{2\pi} + \frac{4\pi i}{e^2} \qquad [8]$$

As in the $\mathcal{N}=2$ case, the $\mathcal{N}=4$ theory can be 
twisted to obtain a topological model, except that, in 
this case, the topological twist can be performed in 
three inequivalent ways, giving rise to three different 
TQFTs (Vafa and Witten 1994). A natural question 
to answer is whether the duality properties of the 
$\mathcal{N}=4$ theory are shared by its twisted counterparts 
and, if so, whether one can take advantage of the 
calculability of topological theories to shed some 
light on the behavior and properties of duality. 

The answer is affirmative, but it is instructive to 
clarify a few points. First, as mentioned above, the 
topological observables in twisted $\mathcal{N}=2$ theories are 
independent of the coupling constant e, so the 
question arises as to how the twisted $\mathcal{N}=4$ theories 
come to depend on the coupling constant. As it turns 
out, twisted $\mathcal{N}=2$ supersymmetric gauge theories 
have an off-shell formulation such that the TQFT 
action can be expressed as a Q-exact expression, 
where Q is the generator of the topological 
symmetry. Actually, this is true only up to a 
topological $\theta$-term $\int_X \mathrm{tr}(F\wedge F)$, 


$$S = \frac{1}{2e^2}\int_X \sqrt{g}\,d^4x\,\{Q,\Lambda\} - 2\pi i\tau\,\frac{1}{16\pi^2}\int_X \mathrm{tr}(F\wedge F) \qquad [9]$$


for some $\Lambda$. However, the $\mathcal{N}=2$ supersymmetric 
gauge theories possess a global U(1) chiral symmetry 
which is generically anomalous, so one can actually 
get rid of the $\theta$-term with a chiral rotation. As a 
result of this, the observables in the topological 
theory are insensitive to $\theta$-terms (and hence to $\tau$ 
and e) up to a rescaling. 

On the other hand, in $\mathcal{N}=4$ supersymmetric 
gauge theories $\theta$-terms are observable. There is no 
chiral anomaly and these terms cannot be shifted 
away as in the $\mathcal{N}=2$ case. This means that in the 
twisted theories one might have a dependence on the 
coupling constant $\tau$, and that, up to anomalies, 
this dependence should be holomorphic (resp. 
antiholomorphic if one reverses the orientation of 
the 4-manifold). In fact, on general grounds, one 
would expect the partition functions of the 
twisted theories on a 4-manifold X and for gauge 
group G to take the generic form 


$$Z_X(G) = q^{-c(X,G)} \sum_k q^k\,\chi(\mathcal{M}_k) \qquad [10]$$


where $q = e^{2\pi i\tau}$, c is a universal constant (depending 
on X and G), $k = (1/16\pi^2)\int_X \mathrm{tr}(F\wedge F)$ is the 
instanton number, and $\chi(\mathcal{M}_k)$ encodes the topolo- 
gical information corresponding to a sector of the 
moduli space of the theory with instanton number k. 

Now we can be more precise as to how we expect 
to see the Montonen-Olive duality in the twisted 
$\mathcal{N}=4$ theories. First, under $\tau \to -1/\tau$ the gauge 
group G gets exchanged with its dual group 
$\hat G$. Correspondingly, the partition functions should 
behave as modular forms 


$$Z_G(-1/\tau) = \kappa(X,G)\,\tau^{w}\,Z_{\hat G}(\tau) \qquad [11]$$


where $\kappa$ is a constant (depending on X and G), and 
the modular weight w should depend on X in such a 
way that it vanishes on flat space. 
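
To make the transformation law [11] concrete, the following Python sketch checks it numerically on a standard textbook modular form, the discriminant $\Delta(\tau) = \eta(\tau)^{24}$, which satisfies $\Delta(-1/\tau) = \tau^{12}\Delta(\tau)$. In the language of [11] this corresponds to $\kappa = 1$, $w = 12$ and a function that is mapped to itself. It is an illustration of what "behaving as a modular form" means, not a partition function computed in the text; the function names and the sample value of $\tau$ are assumptions made for the example.

```python
import cmath
import math


def dedekind_eta(tau, terms=200):
    """Dedekind eta via its product formula, for Im(tau) > 0."""
    q = cmath.exp(2j * math.pi * tau)
    product = 1.0
    for n in range(1, terms):
        product *= 1 - q ** n
    return cmath.exp(2j * math.pi * tau / 24) * product


def discriminant(tau):
    """Modular discriminant Delta(tau) = eta(tau)^24, of weight 12."""
    return dedekind_eta(tau) ** 24


def modular_ratio(f, tau, weight):
    """Return f(-1/tau) / (tau^weight f(tau)); equals 1 for a weight-`weight` form."""
    return f(-1 / tau) / (tau ** weight * f(tau))


if __name__ == "__main__":
    tau = 0.3 + 1.1j  # arbitrary sample point in the upper half-plane
    print(modular_ratio(discriminant, tau, 12))
```

For a partition function obeying [11] with a nontrivial dual group, the same ratio would instead compare two different functions, $Z_G$ and $Z_{\hat G}$, and would equal the constant $\kappa(X,G)$.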

In addition to this, in the $\mathcal{N}=4$ theory all the 
fields take values in the adjoint representation of G. 
Hence, if $H^2(X,\pi_1(G)) \neq 0$, it is possible to consider 
nontrivial G/Center(G) gauge configurations with 
discrete magnetic 't Hooft flux through the 2-cycles 
of X. In fact, G/Center(G) bundles on X are 
classified by the instanton number and a character- 
istic class $v \in H^2(X,\pi_1(G))$. For example, if 
G = SU(2), we have $\hat G = SU(2)/\mathbb{Z}_2 = SO(3)$ and v is 
the second Stiefel-Whitney class $w_2(E)$ of the gauge 
bundle E. This Stiefel-Whitney class can be repre- 
sented in de Rham cohomology by a class in 
$H^2(X,\mathbb{Z})$ defined modulo 2, that is, $w_2(E)$ and 
$w_2(E) + 2\omega$, with $\omega \in H^2(X,\mathbb{Z})$, represent the same 
't Hooft flux, so if $w_2(E) = 2\lambda$ for some 
$\lambda \in H^2(X,\mathbb{Z})$, then the gauge configuration is trivial in 
SO(3) (it has no 't Hooft flux). 

Similarly, for G = SU(N) (for which $\hat G = SU(N)/\mathbb{Z}_N$), 
one can fix fluxes in $H^2(X,\mathbb{Z}_N)$ (the 
corresponding Stiefel-Whitney class is defined mod- 
ulo N). One has, therefore, a family of partition 
functions $Z_v(\tau)$, one for each magnetic flux v. The 
SU(N) partition function is obtained by considering 
the zero flux partition function (up to a constant 
factor), while the (dual) $SU(N)/\mathbb{Z}_N$ partition func- 
tion is obtained by summing over all v, and both are 
to be exchanged under $\tau \to -1/\tau$. The action of 
$SL(2,\mathbb{Z})$ on the $Z_v$ should be compatible with this 
exchange, and thus the $\tau \to -1/\tau$ operation mixes 
the $Z_v$ by a discrete Fourier transform which, for 
G = SU(N), reads 


$$Z_v(-1/\tau) = \kappa(X,G)\,\tau^{w} \sum_u e^{2\pi i\,u\cdot v/N}\,Z_u(\tau) \qquad [12]$$
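
The statement that a discrete Fourier transform exchanges the zero-flux sector with the sum over all fluxes can be illustrated with a toy computation. The Python sketch below builds the kernel $e^{2\pi i\,u\cdot v/N}$ on fluxes valued in $(\mathbb{Z}_N)^b$, with the pairing $u\cdot v$ given by a unimodular matrix `Q`, and checks two facts: applied to a vector supported on zero flux it returns the constant vector (so the zero-flux function is sent to the sum over fluxes), and applied twice it returns $N^b$ times the vector with $v \to -v$. The overall prefactor in [12] is omitted, and the choice of pairing (the hyperbolic form), the function names, and the sample data are assumptions made purely for illustration.

```python
import cmath
import itertools
import math


def flux_kernel(N, Q):
    """Return (fluxes, K) with K[v][u] = exp(2*pi*i * (u.Q.v) / N).

    Fluxes are tuples in (Z_N)^b, where b = len(Q); Q is a toy intersection form.
    """
    b = len(Q)
    fluxes = list(itertools.product(range(N), repeat=b))

    def pairing(u, v):
        return sum(u[i] * Q[i][j] * v[j] for i in range(b) for j in range(b)) % N

    K = [[cmath.exp(2j * math.pi * pairing(u, v) / N) for u in fluxes]
         for v in fluxes]
    return fluxes, K


def apply_kernel(K, Z):
    """Discrete Fourier transform over fluxes: (K Z)[v] = sum_u K[v][u] Z[u]."""
    return [sum(row[u] * Z[u] for u in range(len(Z))) for row in K]


if __name__ == "__main__":
    N, Q = 3, [[0, 1], [1, 0]]  # hypothetical example: Z_3 fluxes, hyperbolic form
    fluxes, K = flux_kernel(N, Q)

    # A function supported on zero flux is mapped to the constant function,
    # i.e. the zero-flux sector is exchanged with the sum over all fluxes.
    zero_flux = [1.0 if f == (0, 0) else 0.0 for f in fluxes]
    print(apply_kernel(K, zero_flux))

    # Applying the transform twice gives N^b times the reflection v -> -v.
    Z = [complex(i + 1) for i in range(len(fluxes))]
    twice = apply_kernel(K, apply_kernel(K, Z))
    print(twice[fluxes.index((1, 2))], 9 * Z[fluxes.index((2, 1))])
```

The second check mirrors the fact that the inversion of the coupling squares to charge conjugation up to normalization.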


We are now in a position to examine the (three) 
twisted theories in some detail. For further details 
and references, the reader is referred to Lozano 
(1999). 

The first twisted theory considered here possesses 
only one scalar supercharge (and hence comes 
under the name of “half-twisted theory”). It is a 
nonabelian generalization of the Seiberg-Witten 
abelian monopole theory, but with the monopole 
multiplets taking values in the adjoint representa- 
tion of the gauge group. The theory can be 
perturbed by giving masses to the monopole multi- 
plets while still retaining its topological character. 
The resulting theory is the twisted version of the 
mass-deformed $\mathcal{N}=4$ theory, which preserves 
$\mathcal{N}=2$ supersymmetry and whose low-energy effec- 
tive description is known. This connection with 
$\mathcal{N}=2$ theories, and its topological character, 
makes it possible to go to the long-distance limit 
and compute in terms of the twisted version of the 
low-energy effective description of the supersym- 
metric theory. Below, we review how the u-plane 
approach works for gauge group SU(2). 

The twisted theory for gauge group SU(2) has a 
U(1) global symmetry (the ghost number) which 
has an anomaly $-3(2\chi+3\sigma)/4$ on gravitational 
backgrounds (i.e., on curved manifolds). Nontrivial 
topological invariants are thus obtained by con- 
sidering the vacuum expectation value of products 
of observables with ghost numbers adding up to 
$-3(2\chi+3\sigma)/4$. The relevant observables for this 
theory and gauge group SU(2) or SO(3) are 
precisely the same as in the Donaldson—Witten 
theory (eqns [4] and [5]). In addition to this, it is 
possible to enrich the theory by including sectors 
with nontrivial nonabelian electric and magnetic ’t 
Hooft fluxes which, as pointed out above, should 
behave under SL(2, Z) duality in a well-defined 
fashion. 

The generating function for these correlation 
functions is given as an integration over the moduli 
space of vacua of the physical theory (the u-plane), 
which, for generic values of the mass parameter, 
forms a one-dimensional complex compact manifold 
(described by a complex variable customarily 
denoted by u, hence the name), which parametrizes 
a family of elliptic curves that encodes all the 
relevant information about the low-energy effective 
description of the theory. At a generic point in the 
moduli space of vacua, the only contribution to the 
topological correlation functions comes from a 
twisted $\mathcal{N}=2$ abelian vector multiplet. Additional 
contributions come from points in the moduli space 
where the low-energy effective description is singu- 
lar (i.e. where the associated elliptic curve 
degenerates). 

The total contribution to the generating 
function thus consists of an integration over the 
moduli space with the singularities removed (which 
is nonvanishing only for $b_2^+(X)=1$; Moore and Witten 
1998) plus a discrete sum over the contribu- 
tions of the twisted effective theories at each of the 
three singularities of the low-energy effective 
description (Seiberg and Witten 1994a, b, c). The 
effective theory at a given singularity contains, 
together with the appropriate dual photon multiplet, 
one charged hypermultiplet, which corresponds to 
the state becoming massless at the singularity. The 
complete effective action for these massless states 
also contains certain measure factors and contact 
terms among the observables, which reproduce the 
effect of the massive states that have been integrated 
out as well as incorporate the coupling to gravity 
(i.e., explicit nonminimal couplings to the metric of 
the 4-manifold). How to determine these a priori 
unknown functions was explained in Moore and 
Witten (1998). The idea is as follows. At points on 
the u-plane where the (imaginary part of the) 
effective coupling diverges, the integral is discontin- 
uous at anti-self-dual abelian gauge configurations. 
This is commonly referred to as “wall crossing.” 
Wall crossing can take place at the singularities of 
the moduli space (the appropriate local effective 
coupling $\tau_{eff}$ diverges there) and, in the case of the 
asymptotically free theories, at the point at infinity 
(the effective electric coupling diverges owing to 
asymptotic freedom). 

On the other hand, the final expression for the 
invariants can exhibit a wall-crossing behavior at 
most at $u \to \infty$, so the contribution to wall crossing 
from the integral at the singularities at finite values 
of u must cancel against the contributions coming 
from the effective theories there, which also dis- 
play wall-crossing discontinuities. Imposing this 


cancellation fixes almost completely the unknown 
functions in the contributions to the topological 
correlation functions from the singularities. The 
final result for the contributions from the singula- 
rities (which give the complete answer for the 
correlation functions when $b_2^+(X) > 1$) is written 
explicitly and completely in terms of the funda- 
mental periods da/du (written in the appropriate 
local variables) and the discriminant of the elliptic 
curve comprising the Seiberg-Witten solution for 
the physical theory. For simply connected spin 
4-manifolds of simple type the generating function 
is given by 
(POTS) 2 


2 (¥/2+(2x+30)/8) „= (3v+70)/8 (n(T) ) —12p 
jb [20 e2pus+S*T3 
= pe (ss): du 


(i/2)(du/da),x-S 


x Lit 1) */ n shakes [13] 


where x is a Seiberg-Witten basic class (and $n_x$ is 
the corresponding Seiberg-Witten invariant), m is 
the mass parameter of the theory, $\nu = (\chi+\sigma)/4$, 
$v \in H^2(X,\mathbb{Z}_2)$ is a 't Hooft flux, S is the formal 
sum $S = \sum_a \alpha_a \Sigma_a$; correspondingly, $I(S) = \sum_a \alpha_a I(\Sigma_a)$, 
with $I(\Sigma_a) = \int_{\Sigma_a} W_2$, where $\{\Sigma_a\}_{a=1,\ldots,b_2(X)}$ 
form a basis of $H_2(X)$ and $\alpha_a$ are constant parameters, 
while $\eta(\tau)$ is the Dedekind function, $\kappa_j = (du/dq_{eff})_{u=u_j}$ 
(with $q_{eff} = \exp(2\pi i\tau_{eff})$, and $\tau_{eff}$ is the 
ratio of the fundamental periods of the elliptic curve), 
and the contact terms $T_j$ have the form 


1 /du\* Uj 

with $E_2$ and $E_4$ the Eisenstein series of weights 2 and 
4, respectively. Evaluating the quantities in [13] 
gives the final result as a function of the physical 
parameters $\tau$ and m, and of topological data of X such as 
the Euler characteristic $\chi$, the signature $\sigma$, and the 
basic classes x. The expression [13] has to be 
understood as a formal power series in p and $\alpha_a$, 
whose coefficients give the vacuum expectation 
values of products of $\mathcal{O} = W_0$ and $I(\Sigma_a)$. 


The generating function [13] has nice properties 
under the modular group. For the partition function $Z_v$, 


Z(7 +1) = (-1)78i” Z, (7) 


Z,(—1/r) = 2 E as 


x S(-1)"*Z (7) 
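Also, with $Z_{SU(2)} = 2^{-1} Z_{v=0}$ and $Z_{SO(3)} = \sum_v Z_v$, 


$$Z_{SU(2)}(\tau+1) = (-1)^{\sigma/8}\,Z_{SU(2)}(\tau)$$
$$Z_{SO(3)}(\tau+2) = Z_{SO(3)}(\tau) \qquad [16]$$
$$Z_{SU(2)}(-1/\tau) = (-1)^{\sigma/8}\,2^{-\chi/2}\,\tau^{-\chi/2}\,Z_{SO(3)}(\tau)$$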


Notice that the last of these three equations 
corresponds precisely to the strong—-weak coupling 
duality transformation conjectured by Montonen 
and Olive (1977). 

As for the correlation functions, one finds the 
following behavior under the inversion of the coupling: 


1 2 a oS) — 1 893) 
gt? =O = ere 


T 


(za J tr(2dF+ vA v) = (1(S) PU) 
1 SO(3) [17] 


Therefore, as expected, the partition function of 
the twisted theory transforms as a modular form, 
while the topological correlation functions turn out 
to transform covariantly under SL(2, Z), following a 
pattern which can be reproduced with a much 
simpler topological abelian model. 

The second example considered next is the Vafa- 
Witten (1994) theory. This theory possesses two 
scalar supercharges, and has the unusual feature that 
the virtual dimension of its moduli space is exactly 
zero (it is an example of balanced TQFT), and 
therefore the only nontrivial topological observable 
is the partition function itself. Furthermore, the 
twisted theory does not contain spinors, so it is well 
defined on any compact, oriented 4-manifold. 

Now this theory computes, with the subtleties 
explained in Vafa and Witten (1994), the Euler 
characteristic of instanton moduli spaces. In fact, in 
this case in the generic partition function [10], 


Zx(G) = gD Sgky(My) [8 
k 
$\chi(\mathcal{M}_k)$ is the Euler characteristic of a suitable
compactification of the $k$th instanton moduli space
$\mathcal{M}_k$ of gauge group $G$ in $X$.

As in the previous example, it is possible to consider
nontrivial gauge configurations in $G/\mathrm{Center}(G)$ and
compute the partition function for a fixed value of the
’t Hooft flux $v \in H^2(X, \pi_1(G))$. In this case, however,
the Seiberg–Witten approach is not available, but, as
conjectured by Vafa and Witten, one can nevertheless
carry out computations in terms of the vacuum degrees
of freedom of the $\mathcal{N}=1$ theory which results from
giving bare masses to all the three chiral multiplets of
the $\mathcal{N}=4$ theory. It should be noted that a similar
approach was introduced by Witten (1994b) to obtain
the first explicit results for the Donaldson–Witten
theory just before the far more powerful Seiberg–Witten
approach was available.

As explained in detail by Vafa and Witten (1994),
the twisted massive theory is topological on Kähler
4-manifolds with $h^{(2,0)} \neq 0$, and the partition function
is actually invariant under the perturbation. In
the long-distance limit, the partition function is
given as a finite sum over the contributions of the
discrete massive vacua of the resulting $\mathcal{N}=1$ theory.
In the case at hand, it turns out that, for $G = SU(N)$, the
number of such vacua is given by the sum of the
positive divisors of $N$. The contribution of each
vacuum is universal (because of the mass gap), and
can be fixed by comparing with known mathematical
results (Vafa and Witten 1994). However, this
is not the end of the story. In the twisted theory, the
chiral superfields of the $\mathcal{N}=4$ theory are no longer
scalars, so the mass terms cannot be invariant under
the holonomy group of the manifold unless one of
the mass parameters is a holomorphic 2-form $\omega$.
(Incidentally, this is the origin of the constraint
$h^{(2,0)} \neq 0$ mentioned above.) This spatially dependent
mass term vanishes where $\omega$ does, and we will
assume as in Vafa and Witten (1994) and Witten
(1994b) that $\omega$ vanishes with multiplicity 1 on
a union of disjoint, smooth complex curves
$C_i$, $i = 1, \ldots, n$, of genus $g_i$ which represent the
canonical divisor $K$ of $X$. The vanishing of
$\omega$ introduces corrections involving $K$ whose precise
form is not known a priori. In the $G = SU(2)$
case, each of the $\mathcal{N}=1$ vacua bifurcates along each
of the components $C_i$ of the canonical divisor
into two strongly coupled massive vacua. This
vacuum degeneracy is believed to stem from the
spontaneous breaking of a $\mathbb{Z}_2$ chiral symmetry
which is unbroken in bulk (see, e.g., Vafa and
Witten (1994) and Witten (1994b)).

The structure of the corrections for $G = SU(N)$
(see [19] below) suggests that the mechanism at
work in this case is not chiral symmetry breaking.
Indeed, near any of the $C_i$'s, there is an $N$-fold
bifurcation of the vacuum. A plausible explanation
for this degeneracy could be found in the
spontaneous breaking of the center of the gauge
group (which for $G = SU(N)$ is precisely $\mathbb{Z}_N$). In
any case, the formula for $SU(N)$ can be computed
(at least when $N$ is prime) along the lines
explained by Vafa and Witten (1994), assuming
that the resulting partition function satisfies a
set of nontrivial constraints which are described
below.

Then, for a given ’t Hooft flux $v \in H^2(X, \mathbb{Z}_N)$, the
partition function for gauge group $SU(N)$ (with
prime $N$) is given by


n N-1 V (1—gi)6e,. 
Lx; = (x: One) I] [I (2) 
1 \=0 \
in((N—1)/N)mv2 [CCa N) - 
x e -NE [19] 
where $\alpha = \exp(2\pi i/N)$, $G(q) = \eta(q)^{24}$ (with $\eta(q)$ the
Dedekind function), $\chi_\lambda$ are the $SU(N)$ characters at
level 1 and $\chi_{m,\lambda}$ are certain linear combinations
thereof. $[C_i]_N$ is the reduction modulo $N$ of the
Poincaré dual of $C_i$, and

$$w_N(\vec{\varepsilon}) = \sum_{i=1}^{n} \varepsilon_i\, [C_i]_N$$ [20]

where $\varepsilon_i = 0, 1, \ldots, N-1$ are chosen independently.
Equation [19] has the expected properties under
the modular group:


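$$
\begin{aligned}
Z_v(\tau+1) &= e^{(i\pi/12)N(2\chi+3\sigma)}\, e^{-i\pi((N-1)/N)v^2}\, Z_v(\tau) \\
Z_v(-1/\tau) &= N^{-b_2/2}\left(\frac{\tau}{i}\right)^{-\chi/2} \sum_u e^{(2i\pi/N)\,u\cdot v}\, Z_u(\tau)
\end{aligned}
$$ [21]

and also, with $Z_{SU(N)} = N^{b_1-1} Z_0$ and $Z_{SU(N)/\mathbb{Z}_N} = \sum_v Z_v$,

$$Z_{SU(N)}(-1/\tau) = N^{-\chi/2}\left(\frac{\tau}{i}\right)^{-\chi/2} Z_{SU(N)/\mathbb{Z}_N}(\tau)$$ [22]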


which is, up to some correction factors that vanish in 
flat space, the original Montonen—Olive conjecture! 
There is a further property to be checked which
concerns the behavior of [19] under blow-ups. This
property was heavily used by Vafa and Witten
(1994) and demanding it in the present case was
essential in deriving the above formula. Blowing up
a point on a Kähler manifold $X$ replaces it with a
new Kähler manifold $\hat{X}$ whose second cohomology
lattice is $H^2(\hat{X}, \mathbb{Z}) = H^2(X, \mathbb{Z}) \oplus I^-$, where $I^-$ is the
one-dimensional lattice spanned by the Poincaré
dual of the exceptional divisor $B$ created by the
blow-up. Any allowed $\mathbb{Z}_N$ flux $\hat{v}$ on $\hat{X}$ is of the form
$\hat{v} = v \oplus r$, where $v$ is a flux in $X$ and $r = \lambda B$,
$\lambda = 0, 1, \ldots, N-1$. The main result concerning
[19] is that under blowing up a point on a Kähler
4-manifold with canonical divisor as above, the
partition functions for fixed ’t Hooft fluxes have a
factorization as





$$Z_{\hat{X},\hat{v}}(\tau_0) = Z_{X,v}(\tau_0)\,\frac{\chi_\lambda(\tau_0)}{\eta(\tau_0)}$$ [23]

Precisely the same behavior under blow-ups of the 
partition function [19] has been proved for the 
generating function of Euler characteristics of 
instanton moduli spaces on Kähler manifolds. This 
should not come as a surprise since, as mentioned 
above, on certain 4-manifolds, the partition function 
of Vafa-Witten theory computes the Euler charac- 
teristics of instanton moduli spaces. Therefore, [19] 
can be seen as a prediction for the Euler numbers of 
instanton moduli spaces on those 4-manifolds. 
Finally, the third twisted $\mathcal{N}=4$ theory also possesses
two scalar supercharges, and is believed to be a
certain deformation of the four-dimensional BF
theory, and as such it describes essentially intersection
theory on the moduli space of complexified gauge
connections. In addition to this, the theory is “amphicheiral,”
which means that it is invariant under a reversal
of the orientation of the spacetime manifold. The
terminology is borrowed from knot theory, where an
oriented knot is said to be amphicheiral if, crudely
speaking, it is equivalent to its mirror image. From this
property, it follows that the topological invariants of
the theory are completely independent of the complexified
coupling constant $\tau$.


See also: Donaldson–Witten Theory; Electric–Magnetic
Duality; Hopf Algebras and q-Deformation Quantum
Groups; Large-N and Topological Strings; Seiberg–Witten
Theory; Topological Quantum Field Theory:
Overview.
Introduction


The relations between thermodynamics and 
dynamics are dealt with by statistical mechanics. 
For a given dynamical system of Hamiltonian type 
in a classical framework, it is usually assumed that a 
dynamical foundation for equilibrium statistical 
mechanics, namely for the use of the familiar 
Gibbs ensembles, is guaranteed if one can prove 
that the system is ergodic, that is, has no integrals of 
motion apart from the Hamiltonian itself. One of 
the main consequences is then that classical 
mechanics fails in explaining thermodynamics at 
low temperatures (e.g., the specific heats of crystals 
or of polyatomic molecules at low temperatures, or 
the black body problem), because the classical 
equilibrium ensembles lead to equipartition of 
energy for a system of weakly coupled oscillators, 
against Nernst’s third principle. This is actually the 
problem that historically led to the birth of quantum 
mechanics, equipartition being replaced by Planck’s 
law. At a given temperature $T$, the mean energy of
an oscillator of angular frequency $\omega$ is not $k_B T$ ($k_B$
being the Boltzmann constant), and thus is not
independent of frequency (equipartition), but
decreases to zero exponentially fast as frequency
increases.
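
The contrast between equipartition and Planck's law can be made concrete with a few numbers. The following is a minimal sketch, assuming the standard Planck mean energy $\hbar\omega/(e^{\hbar\omega/k_B T}-1)$ with the zero-point term omitted; the constant $\hbar$, the function name `planck_mean_energy`, and the choice of units $\hbar = k_B = 1$ are illustrative assumptions not fixed by the text.

```python
import math

def planck_mean_energy(omega, T, hbar=1.0, k_B=1.0):
    """Planck mean energy of an oscillator (zero-point term omitted).

    Units with hbar = k_B = 1 are an illustrative assumption.
    """
    x = hbar * omega / (k_B * T)
    # expm1(x) = exp(x) - 1, accurate also for small x
    return hbar * omega / math.expm1(x)

if __name__ == "__main__":
    T = 1.0
    for omega in (0.01, 0.1, 1.0, 5.0, 10.0):
        planck = planck_mean_energy(omega, T)
        # The classical (equipartition) value is k_B * T for every frequency.
        print(f"omega = {omega:5.2f}  Planck = {planck:.6f}  equipartition = {T:.6f}")
```

At low frequency the two values agree, while at high frequency the Planck value is exponentially suppressed, which is the behavior described above.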

Thus, the problem of a dynamical foundation for 
classical statistical mechanics would be reduced to 
ascertaining whether the Hamiltonian systems of 
physical interest are ergodic or not. It is just in this
spirit that many mathematical works have recently been
devoted to proving ergodicity for systems of hard
spheres, or more generally for systems which are
expected to be not only ergodic but even hyperbolic.
However, a new perspective was opened in the year 
1955, with the celebrated paper of Fermi, Pasta, and 
Ulam (FPU), which constituted the last scientific 
work of Fermi. 

The FPU paper was concerned with numerical 
computations on a system of N (actually, 32 or 64) 
equal particles on a line, each interacting with the 
two adjacent ones through nonlinear springs, certain 
boundary conditions having been assigned (fixed 
ends). The model mimics a one-dimensional crystal 
(or also a string), and can be described in the 
familiar way as a perturbation of a system of N 
normal modes, which diagonalize the corresponding 
linearized system. The initial conditions corre- 
sponded to the excitation of only a few low- 
frequency modes, and it was expected that energy 
would rather quickly flow to the high-frequency 
modes, thus establishing equipartition of energy, in 
agreement with the predictions of classical equilibrium
statistical mechanics. But this did not occur
within the available computation times, and the
energy rather appeared to remain confined within a
packet of low-frequency modes having a certain
width, as if being in a state of apparent equilibrium
of a nonstandard type. This fact can be called “the
FPU paradox.” In the words of Ulam, written as a
comment in Fermi's Collected Papers, this is
described as follows: “The results of the computations
were interesting and quite surprising to Fermi.
He expressed the opinion that they really constituted
a little discovery in providing intimations that the
prevalent beliefs in the universality of mixing and
thermalization in nonlinear systems may not be
always justified.”

The FPU paper immediately had a very strong 
impact on the theory of dynamical systems, because 
it motivated all the modern theory of infinite- 
dimensional integrable systems and solitons (KdV 
equation), starting from the works of Zabusky and 
Kruskal (1965). But in this way the FPU paradox 
was somehow enhanced, because the FPU system 
turned out to be associated to the class of 
integrable systems, namely the systems having a 
number of integrals of motion equal to the number 
of degrees of freedom, which are in a sense the 
most antithermodynamic systems. The merit of 
establishing a bridge towards ergodicity goes to 
Izrailev and Chirikov (1966). Making reference to 
the most advanced results then available in the 
perturbation theory for nearly integrable systems 
(KAM theory), these authors pointed out that 
ergodicity, and thus equipartition, would be recov- 
ered if one took initial data with a sufficiently large 
energy. And this was actually found to be the case. 
Moreover, it turned out that their work, and its 
subsequent completion by Shepelyanski, was often 
interpreted as supporting the conjecture that the 
FPU paradox would disappear in the thermody- 
namic limit (infinitely many particles, with finite 
density and energy density). The opposite conjec- 
ture was advanced in the year 1970 by Bocchieri, 
Scotti, Bearzi, and Loinger, and its relevance for the 
relations between classical and quantum mechanics 
was immediately pointed out by Cercignani, 
Galgani, and Scotti. A long debate then followed. 
Possibly, some misunderstandings occurred, because 
in the discussions concerning the dynamical aspects 
of the problem reference was generally made to 
notions involving infinite times. In fact, it had not 
yet been conceived that the FPU equilibrium might 
actually be an apparent one, corresponding to some 
type of intermediate metaequilibrium state. This 
was for the first time suggested by researchers in 
Parisi’s group in the year 1982. The analogy of 
such a situation with that occurring in glasses was 
pointed out more recently. 


In the present article, the state of the art of the 
FPU problem is discussed. The thesis of the present 
authors is that the FPU phenomenon survives in the 
thermodynamic limit, in the last mentioned sense, 
namely that at sufficiently low temperatures there 
exists a kind of metaequilibrium state surviving for 
extremely long times. The corresponding thermo- 
dynamics turns out to be different from the standard 
one predicted by the equilibrium ensembles, inas- 
much as it presents qualitatively some quantum-like 
features (typically, specific heats in agreement with 
Nernst’s third principle). The key point, with respect 
to equilibrium statistical mechanics, is that the 
internal thermodynamic energy should be identified 
not with the whole mechanical energy, but only with 
a suitable fraction of it, to be identified through its 
dynamical properties, as was suggested more than a 
century ago by Boltzmann himself, and later by 
Nernst. 

Here, it is first discussed why nearly integrable 
systems can be expected to present the FPU phenom- 
enon. Then the latter is illustrated. Finally, some hints 
are given for the corresponding thermodynamics. 


Nearly Integrable versus Hyperbolic 
Systems, and the Question of the Rates 
of Thermalization 


As mentioned above, it is usually assumed that the 
problem of providing a dynamical foundation to 
classical statistical mechanics is reduced to the 
mathematical problem of ascertaining whether the 
Hamiltonian systems of physical interest are ergodic 
or not. However, there remains open a subtler 
problem. Indeed, the notion of ergodicity involves
the limit of an infinite time (time averages should
converge to ensemble averages as $t \to \infty$), while
intermediate times might be relevant. In this
connection it is convenient to distinguish between 
two classes of dynamical systems, namely the 
hyperbolic and the nearly integrable ones. 

The first class, in a sense the prototype of chaotic 
systems, should include the systems of hard spheres 
(extensively studied after the classical works of 
Sinai), or more generally the systems of mass points 
with mutual repulsive interactions. For such systems 
it can be expected that the time averages of the 
relevant dynamical quantities in an extremely short 
time converge to the corresponding ensemble 
averages, so that the classical equilibrium ensembles 
could be safely used. 

A completely different situation occurs for the
dynamical systems such as the FPU systems, which
are nearly integrable, that is, are perturbations of
systems having a number of integrals of motion
equal to the number of degrees of freedom. Indeed,
in such a case ergodicity means that the addition of 
an interaction, no matter how small, makes an 
integrable system lose all of its integrals of motion, 
apart from the Hamiltonian itself. And, in fact, this 
quite remarkable property was already proved to be 
generic by Poincaré, through a set of considerations 
which had a fundamental impact on the theory of 
dynamical systems itself. In view of its importance 
for the foundations of statistical mechanics, the 
proof given by Poincaré was reconsidered by Fermi, 
who added a subtle contribution concerning the role 
of single invariant surfaces. It is just to such a paper 
that Ulam makes reference in his comment to the 
FPU work mentioned above, when he says: “Fermi’s 
earlier interest in the ergodic theory is one motive” 
for the FPU work. 

The point is that the picture which looks at the 
ergodicity induced on an integrable system by the 
addition of a perturbation, no matter how small, 
somehow lacks continuity. One might expect that, 
in situations in which the nonlinear interaction 
which destroys the integrals of motion is very small 
(i.e., at low temperatures), the underlying integrable 
structure should somehow be still appreciable, in 
some continuous way. In fact, continuity should be 
recovered by making a question of times, namely by 
considering the rates of thermalization (to use the 
very FPU phrase), or equivalently the relaxation 
times, namely the times needed for the time averages 
of the relevant dynamical quantities to converge to 
the corresponding ensemble averages. By continuity, 
one clearly expects that the relaxation times diverge 
as the perturbation tends to zero. But more 
complicated situations might occur, as, for example, 
the existence of two (or more) relevant timescales. 
The point of view that timescales of different orders 
of magnitude might occur in dynamical systems 
(with the exhibition of an interesting example) and 
that this might be relevant for statistical mechanics, 
was discussed by Poincaré himself in the year 1906. 
Indeed, he denotes as “first-order very large time” a 
time which is sufficient for a system to reach a 
“provisional equilibrium,” whereas he denotes as 
“second-order very large time” a time which is 
necessary for the system to reach its “definitive 
equilibrium.” 


The FPU Phenomenon: Historical and 
Conceptual Developments 


We now illustrate the FPU phenomenon, following
essentially its historical development. We will make
reference to Figures 1-8, which are the results of
numerical integrations of the FPU dynamical system.
If $x_1, \ldots, x_N$ denote the positions of the particles (of
unit mass), or more precisely the displacements
from their equilibrium positions, and $p_i$ the corresponding
momenta, the Hamiltonian is

$$H = \sum_{i=1}^{N} \frac{p_i^2}{2} + \sum_{i=1}^{N+1} V(r_i)$$

where $r_i = x_i - x_{i-1}$ and one has taken a potential
$V(r) = r^2/2 + \alpha r^3/3 + \beta r^4/4$ depending on two
positive parameters $\alpha$ and $\beta$. Boundary conditions
with fixed ends, namely $x_0 = x_{N+1} = 0$, are considered.
We recall that the angular frequencies
Figure 1 The FPU paradox: normal-mode energies $E_j$ versus
time (left) and energy spectrum, namely time average of $E_j$
versus $j$ (right) for three different timescales. The energy, initially
given to the lowest-frequency mode, does not flow to the high-frequency
modes within the accessible observation time. Here,
$N = 32$ and $E = 0.05$.

Figure 2 The FPU paradox: time averages of the energies of
the modes 1, 2, ..., 8 (from top to bottom) versus time for the
same run as Figure 1. The spectrum has reached an apparent
equilibrium, different from that of equipartition predicted by
classical equilibrium statistical mechanics. An exponential
decay of the tail is clearly exhibited.


of the corresponding normal modes are $\omega_j = 2\sin[j\pi/2(N+1)]$,
with $j = 1, \ldots, N$; it is thus convenient
to take as time unit the value $\pi$, which is
essentially, for any $N$, the period of the fastest
normal mode.
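
The Hamiltonian and the normal-mode frequencies just recalled can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the sine transform used for the normal modes is the standard one for a chain with fixed ends, the mode energies are the harmonic ones $E_j = (P_j^2 + \omega_j^2 Q_j^2)/2$, and all function names (`mode_frequencies`, `mode_matrix`, `mode_energies`, `hamiltonian`) are hypothetical.

```python
import numpy as np

def mode_frequencies(N):
    """Angular frequencies omega_j = 2 sin(j pi / (2 (N + 1))), j = 1..N."""
    j = np.arange(1, N + 1)
    return 2.0 * np.sin(j * np.pi / (2.0 * (N + 1)))

def mode_matrix(N):
    """Orthogonal sine transform diagonalizing the linearized fixed-end chain."""
    j = np.arange(1, N + 1)
    return np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * np.outer(j, j) / (N + 1))

def mode_energies(x, p):
    """Harmonic energies E_j = (P_j^2 + omega_j^2 Q_j^2) / 2 of the normal modes."""
    N = len(x)
    S = mode_matrix(N)
    Q, P = S @ x, S @ p
    w = mode_frequencies(N)
    return 0.5 * (P**2 + (w * Q)**2)

def hamiltonian(x, p, alpha=0.25, beta=0.25):
    """H = sum p_i^2/2 + sum V(r_i), with r_i = x_i - x_{i-1} and x_0 = x_{N+1} = 0."""
    r = np.diff(np.concatenate(([0.0], x, [0.0])))  # N + 1 relative displacements
    V = r**2 / 2 + alpha * r**3 / 3 + beta * r**4 / 4
    return 0.5 * np.sum(p**2) + np.sum(V)

if __name__ == "__main__":
    w = mode_frequencies(32)
    # The fastest mode has omega close to 2, hence a period close to pi.
    print("largest frequency:", w[-1], " period:", 2 * np.pi / w[-1])
```

For $\alpha = \beta = 0$ the sum of the mode energies coincides with the Hamiltonian, which gives a simple consistency check of the transform.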

The original FPU result is illustrated in Figures 1
and 2. Here $N = 32$, $\alpha = \beta = 1/4$, and the total
energy is $E = 0.05$; the energy was given initially to
the first normal mode (with vanishing potential
energy). Three timescales (increasing from top to
bottom) are considered, the top one corresponding
to the timescale of the original FPU paper. In the
boxes on the left the energies $E_j(t)$ of modes $j$ are
reported versus time ($j = 1, \ldots, 8$ at top, $j = 1$ at
center and bottom). In the boxes on the right we
report the corresponding spectra, namely the time
average (up to the respective final times) of the
energy of mode $j$ versus $j$, for $1 \le j \le N$. In Figure 2
we report, for the same run of Figure 1, the time 
averages of the energies of the various modes versus 
time; this figure corresponds to the last one of the 
original FPU work. The facts to be noticed in 
connection with these two figures are the following: 
(1) the spectrum (namely the distribution of energy 
among the modes, in time average) appears to have 
relaxed very quickly to some form, which remains 
essentially unchanged up to the maximum observed 
time; (2) there is no global equipartition, but only a 
partial one, because the energy remains confined 
within a group of low-frequency modes, which form 
a small packet of a certain definite width; and (3) 
the time evolutions of the mode energies appear to 
be of quasiperiodic type, since longer and longer 
quasiperiods can be observed as the total time 
increases. 
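
A run of the kind just described can be sketched in a few lines. The following is only an illustrative sketch, not the procedure used to produce the figures: the integrator (velocity Verlet), the time step `dt`, the short default run length, and all function names are assumptions; only $N = 32$, $\alpha = \beta = 1/4$, $E = 0.05$ and the initial excitation of the first mode with vanishing potential energy are taken from the text.

```python
import numpy as np

def mode_matrix(N):
    """Orthogonal sine transform to the normal modes of the fixed-end chain."""
    j = np.arange(1, N + 1)
    return np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * np.outer(j, j) / (N + 1))

def force(x, alpha, beta):
    """Force -dH/dx_j = V'(r_{j+1}) - V'(r_j), with V'(r) = r + alpha r^2 + beta r^3."""
    r = np.diff(np.concatenate(([0.0], x, [0.0])))
    dV = r + alpha * r**2 + beta * r**3
    return np.diff(dV)

def energy(x, p, alpha, beta):
    """Total energy H of the FPU chain with fixed ends."""
    r = np.diff(np.concatenate(([0.0], x, [0.0])))
    return 0.5 * np.sum(p**2) + np.sum(r**2 / 2 + alpha * r**3 / 3 + beta * r**4 / 4)

def run_fpu(N=32, alpha=0.25, beta=0.25, E=0.05, dt=0.05, n_steps=2000):
    """Integrate with velocity Verlet; return final state and time-averaged mode energies."""
    S = mode_matrix(N)
    j = np.arange(1, N + 1)
    w = 2.0 * np.sin(j * np.pi / (2.0 * (N + 1)))
    # All the energy in the first mode, purely kinetic (vanishing potential energy).
    x = np.zeros(N)
    p = np.sqrt(2.0 * E) * S[:, 0]
    f = force(x, alpha, beta)
    avg = np.zeros(N)
    for _ in range(n_steps):
        p = p + 0.5 * dt * f
        x = x + dt * p
        f = force(x, alpha, beta)
        p = p + 0.5 * dt * f
        Q, P = S @ x, S @ p
        avg += 0.5 * (P**2 + (w * Q)**2)  # harmonic mode energies E_j
    return x, p, avg / n_steps

if __name__ == "__main__":
    x, p, avg = run_fpu()
    print("energy drift:", energy(x, p, 0.25, 0.25) - 0.05)
    print("time-averaged energies of modes 1..8:", avg[:8])
```

Reproducing the timescales of Figures 1 and 2 would require much longer runs (larger `n_steps`); the short default is chosen only so that the sketch executes quickly.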
Figure 3 The Izrailev–Chirikov contribution: for a fixed observation
time, equipartition is attained if the initial energy $E$ is high
enough. Here, from top to bottom, $E = 0.1, 1, 10$.

Figure 4 The Izrailev–Chirikov contribution: time averages of
the mode energies versus time for the same run as at bottom of
Figure 3.


After the works of Zabusky and Kruskal, by 
which the FPU system was somehow assimilated to 
an integrable system, the bridge toward ergodicity 
was made by Izrailev and Chirikov (1966), through 
the idea that there should exist a stochasticity 
threshold. Making reference to KAM theory, which 
had just been formulated in the framework of 
perturbation theory for nearly integrable systems, 
their main remark was as follows. It is known that 
KAM theory, which essentially guarantees a beha- 
vior similar to that of an integrable system, applies 
only if the perturbation is smaller than a certain 
threshold; on the other hand, in the FPU model the 
natural perturbation parameter is the energy E of 
the system. Thus, the FPU phenomenon can be
expected to disappear above a certain threshold
energy $E_c$. This is indeed the case, as illustrated in
Figures 3 and 4. The parameters $\alpha$, $\beta$ and the class of
initial data are as in Figure 1. In Figure 3 the total
time is kept fixed (at 10 000 units), whereas the 
energy E is increased in passing from top to bottom, 
actually from E=0.1 to E=1 and E=10. One sees 
that at E=10 equipartition is attained within the 
given observation time; correspondingly, the motion 
of the modes visually appears to be nonregular. The 
approach to equipartition at E=10 is clearly 
exhibited in Figure 4, where the time averages of 
the energies are reported versus time. 

There naturally arose the problem of the dependence
of the threshold $E_c$ on the number $N$ of degrees
of freedom (and also on the class of initial data).
Certain semianalytical considerations of Izrailev and 
Chirikov were generally interpreted as suggesting 
that the threshold should vanish in the thermody- 
namic limit for initial excitations of high-frequency 
modes. Recently, Shepelyanski completed the analy- 
sis by showing that the threshold should vanish also 
for initial excitations of the low-frequency modes, as 
in the original FPU work (see, however, the 
subsequent paper by Ponno mentioned below). If 
this were true, the FPU phenomenon would dis- 
appear in the thermodynamic limit. In particular, 
the equipartition principle would be dynamically 
justified at all temperatures. 

The opposite conjecture was advanced by 
Bocchieri et al. (1970). This was based on numerical 
calculations, which indicated that the energy threshold
should be proportional to $N$, namely that the FPU
phenomenon persists in the thermodynamic limit
provided the specific energy $\varepsilon = E/N$ is below a
critical value $\varepsilon_c$, which should be definitely nonvanishing.
Actually, the computations were performed
on a slightly different model, in which nearby
particles were interacting through a more physical
Lennard-Jones potential. By taking concrete values
having a physical significance, namely the values
commonly assumed for argon, for the threshold of
the specific energy they found the value $\varepsilon_c \simeq 0.04\,V_0$,
where $V_0$ is the depth of the Lennard-Jones potential
well. This corresponds to a critical temperature of the
order of a few kelvin. The relevance of such a 
conjecture (persistence of the FPU phenomenon in the 
thermodynamic limit) was soon strongly emphasized 
by Cercignani, Galgani, and Scotti, who also tried to 
establish a connection between the FPU spectrum and 
Planck’s distribution. 

Up to this point, the discussion was concerned 
with the alternative whether the FPU system is 
ergodic or not, and thus reference was made to 
properties holding in the limit $t \to \infty$. Correspondingly,
one was making reference to KAM theory,
namely to the possible existence of surfaces (N- 
dimensional tori) which should be dynamically 
invariant (for all times). The first paper in which 
attention was drawn to the problem of estimating 
the relaxation times to equilibrium was by Fucito 
et al. (1982). The model considered was actually a 
different one (the so-called $\phi^4$ model), but the results
can also be extended to the FPU model. Analytical 
and numerical indications were given for the 
existence of two timescales. In a short time the 
system was found to relax to a state characterized by 
an FPU-like spectrum, with a plateau at the low 
frequencies, followed by an exponential tail. This, 
however, appeared as being a sort of metastable 
state. In their words: “The nonequilibrium spectrum 
may persist for extremely long times, and may be 
mistaken for a stationary state if the observation 
time is not sufficiently long.” Indeed, on a second 
much larger timescale the slope of the exponential 
tail was found to increase logarithmically with time, 
with a rate which decreases to zero with the energy. 
This is an indication that the time for equipartition 
should increase as an exponential with the inverse of 
the energy. 

This is indeed the picture that the present authors 
consider to be essentially correct, being supported 
by very recent numerical computations, and by 
analytical considerations. Curiously enough, how- 
ever, such a picture was not fully appreciated until 
quite recently. Possibly, the reason is that the 
scientific community had to wait until becoming 
acquainted with two relevant aspects of the theory 
of dynamical systems, namely Nekhoroshev theory 
and the relations between KdV equation and 
resonant normal-form theory. 

The first step was the passage from KAM theory 
to Nekhoroshev theory. Let us recall that, whereas 
in KAM theory one looks for surfaces which are 
invariant (for all times), in Nekhoroshev theory one 
looks instead for a kind of weak stability involving
finite times, albeit “extremely long” ones, as they 
are found to increase as stretched exponentials with 
the inverse of the perturbative parameter. Thus, one 
meets with situations in which one can have 
instability over infinite times, while having a kind 
of practical stability up to exponentially long times. 
Notice that Nekhoroshev’s theory was formulated 
only in the year 1974, and that it started to be 
known in the West only in the early 1980s, just 
because of its interest for the FPU problem. Another 
interesting point is that just in those years one 
started to become acquainted with a related histor- 
ical fact. Indeed, the idea that equipartition might 
require extremely long times, so that one would be 
confronted with situations of a practical lack of 
equipartition, has in fact a long tradition in 
statistical mechanics, going back to Boltzmann and 
Jeans, and later (in connection with sound disper- 
sion in gases of polyatomic molecules) to Landau 
and Teller. 

In this way the idea of the existence of extremely 
long relaxation times to equipartition came to be 
accepted. The ingredient that was still lacking is the 
idea of a quick relaxation to a metastable state. The 
importance of this should not be overlooked. 
Indeed, without it one cannot at all have a 
thermodynamics different from the standard equili- 
brium one corresponding to equipartition. This was 
repeatedly emphasized, against Jeans, by Poincaré 
on general grounds and by Nernst on empirical 
grounds. The full appreciation of this latter ingre- 
dient was obtained quite recently (although it had 
been clearly stated by Fucito et al. (1982)). A first 
hint in this direction came from the realization 
(see Figure 5) of a deep analogy between the FPU 
phenomenon and the phenomenology of glasses. 
Then there came a strong numerical indication by 
Berchialla, Galgani, and Giorgilli. Finally, from the 
analytical point of view, there was a suitable 
revisitation (by Ponno) of the traditional connection 
between the FPU system and the KdV equation 
with its solitons. The relevant points are the 
following: (1) the KdV equation describes well the
solutions of the FPU problem (for initial data of FPU
type) only on a “short” timescale, which increases as
a power of $1/\varepsilon$, and so describes only a first process
of quick relaxation; (2) the corresponding spectrum
has a very definite analytical form, the energy being
spread up to a maximal frequency $\bar{\omega}(\varepsilon) \sim \varepsilon^{1/4}$ and
then decaying exponentially; and (3) the relevant
formulas contain the energy only through the
specific energy $\varepsilon$, and thus can be expected to hold
also in the thermodynamic limit. It should be 
mentioned, however, that all the results of an 
Figure 5 Analogy with glasses: the specific energy $u$ of an
FPU system is plotted versus temperature $T$ for a cooling
process (upper curve) and a heating process (lower curve).
The FPU system is kept in contact with a heat reservoir, whose
temperature is changed at a given rate. At low temperatures the
system does not have time to reach the equilibrium curve $u = T$
(with $k_B = 1$).


analytic type mentioned above have a purely formal 
character, because up to now none of them was 
proved, in the thermodynamic limit, in the sense of 
rigorous perturbation theory. This requires a suita- 
ble readaptation of the known techniques, which is 
currently being obtained both in connection with 
Nekhoroshev’s theorem (in order to explain the 
extreme slowness of a possible final approach to 
equilibrium) and in connection with the normal- 
form theory for partial differential equations (in 
order to explain the fast relaxation to the metaequi- 
librium state). 

In conclusion, for the case of initial conditions of
the FPU type (excitation of a few low-frequency
modes) the situation seems to be as follows. The first
phenomenon that occurs in a “short” time (of the
order of $(1/\varepsilon)^{3/4}$) is a quick relaxation to the
formation of what can be called a “natural packet”
of low-frequency modes extending up to a certain
maximal frequency $\bar{\omega} \sim \varepsilon^{1/4}$. This is a phenomenon
which has nothing to do with any diffusion in phase
space. In fact, it shows up also for an integrable
system such as a Toda lattice (as will be illustrated
below), and should be described by a suitable
resonant normal form related to the KdV equation.
One has then to take into account the fact that the
domain of the frequencies in the FPU model is
bounded ($\omega < 2$ in the chosen units). Now, as the
function $\bar{\omega}(\varepsilon)$ is monotonic, this fact leads to the
existence of a critical value $\varepsilon_c$ of the specific energy
$\varepsilon$, defined by $\bar{\omega}(\varepsilon_c) = 2$. Indeed, for $\varepsilon > \varepsilon_c$ the quick
relaxation process leads altogether to equipartition.
Below the threshold, instead, the same quick process
leads to the formation of an FPU-like spectrum,
Figure 6 Time needed to form a packet versus specific energy
for the FPU model (bottom) and the corresponding Toda model
(top). Different symbols refer to packets of different width. The
existence of two timescales below a critical specific energy in the
FPU model is exhibited.


involving only modes of sufficiently low frequency. 
This should, however, be a metastable state (which 
might be mistaken for a stationary one), which 
should be followed, on a second timescale, by a 
relaxation to the final equilibrium, through a sort 
of Arnol’d diffusion requiring extremely long 
Nekhoroshev-like times. This is actually the way in
which the old idea of a threshold, originally 
conceived in terms of KAM tori, is now recovered 
even for ergodic systems, in terms of timescales. 
The existence of a process of quick relaxation, 
and of a threshold in the above-mentioned sense, is 
illustrated in Figures 6 and 7. In Figure 6 the lower 
part refers to the FPU model, while the upper one 
refers to a corresponding Toda model. The latter is 
in a sense the prototype of an integrable nonlinear 
system; with respect to the FPU case, the difference 
is that the potential V(r) is now exponential. The 


Dynamical Systems and Thermodynamics 131 






Figure 7 Width of the natural packet versus specific energy, for N ranging from 8 to 1023 (legend: N = 8, 16, 32, 63, 127, 255, 511, 1023; horizontal axis: specific energy). Reproduced from Berchialla L, Galgani L, and Giorgilli A (2004) Localization of energy in FPU chains, Discr. Cont. Dyn. Systems B 11: 855-866, with permission from American Institute of Mathematical Sciences.

parameters of the exponential were chosen so that 
the two models coincide up to cubic terms in the 
potential. With the energy given to the lowest- 
frequency mode, the figure shows the time needed in 
order that energy spreads up to a mode k, for several 
values of k, as a function of $\varepsilon$. It is seen that in the
Toda model (top) a packet is formed that extends
up to a rather well-defined width, and that this occurs
within a relaxation time increasing as a power of
$1/\varepsilon$. An analogous phenomenon occurs for the FPU
model (bottom). The only difference is that, below a
critical specific energy $\varepsilon_c \simeq 0.1$, there exists a
subsequent relaxation time to equipartition, which
involves a time growing faster than any inverse
power of $\varepsilon$. Such a second phenomenon is due to the
nonintegrable character of the FPU model. In
Figure 7 the width of the natural packet for the
FPU model is exhibited, by reporting the frequency
$\bar\omega$ of its highest mode as a function of $\varepsilon$. As one sees,
the numerical results clearly indicate the existence of
a relation $\bar\omega \sim \varepsilon^{1/4}$, which holds for a number of
degrees of freedom N ranging from 8 to 1023. This 
is actually the law which is predicted by resonant 
normal-form theory. 
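
The threshold argument above can be sketched numerically. The following is a minimal illustration, not the computation behind Figures 6 and 7: it assumes a chain with fixed ends, for which the linear frequencies are $\omega_k = 2\sin(k\pi/(2(N+1)))$, and it assumes the packet edge follows $\bar\omega = c\,\varepsilon^{1/4}$ with a hypothetical constant `c` that the text does not specify.

```python
import math

def fpu_frequencies(n):
    """Linear normal-mode frequencies of a chain of n particles.

    Assumption: fixed ends, units in which the band edge is omega = 2.
    """
    return [2.0 * math.sin(k * math.pi / (2 * (n + 1))) for k in range(1, n + 1)]

def packet_cutoff(eps, c=1.0):
    """Assumed scaling law for the packet edge: omega_bar = c * eps**(1/4).

    The constant c is hypothetical; only the exponent 1/4 comes from the text.
    """
    return c * eps ** 0.25

def critical_energy(c=1.0, omega_max=2.0):
    """Specific energy at which the packet edge reaches the top of the band."""
    return (omega_max / c) ** 4

def modes_in_packet(n, eps, c=1.0):
    """Number of modes whose frequency lies below the packet edge."""
    cutoff = packet_cutoff(eps, c)
    return sum(1 for w in fpu_frequencies(n) if w <= cutoff)
```

Above `critical_energy(c)` every mode lies inside the packet, which is the sense in which the quick relaxation already amounts to equipartition; below it only a fraction of the modes is involved.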


Boltzmann and Nernst Revisited 


All the results illustrated above refer to initial data 
of FPU type, namely with an excitation of a few 
low-frequency modes. However, from the point of 
view of statistical mechanics, such initial data are 
exceptional, and one should rather consider initial 
data extracted from the Gibbs distribution at a 
certain temperature. One can then couple the FPU 
system to a heat bath at a slightly different 
temperature, and look at the spectrum of the FPU 
system after a certain time. The result, for the case 
of a heat bath at a higher temperature, is shown in 
Figure 8 A case of an FPU system initially at equilibrium and
thus in equipartition. Spectrum of the FPU system after it was 
kept in contact with a heat reservoir at a higher temperature. 


Figure 8. Clearly, here one has a situation similar to that 
occurring for initial data of FPU type, because only a 
packet of low-frequency modes exhibits a reaction, each 
of its modes actually adapting itself to the temperature 
of the bath, whereas the high-frequency modes do not 
react at all, that is, remain essentially frozen. 

This capability of reacting to external disturbances 
(which seems to pertain only to a fraction of the 
mechanical energy initially inserted into the system) 
can be characterized in a quantitative way through an 
estimate of the fluctuations of the total energy of the 
FPU system. This is indeed the sense of the fluctuation- 
dissipation theorem, the precursor of which is perhaps 
the contribution of Einstein to the first Solvay 
conference (1911). Through such a method, the 
specific heat of the FPU system is estimated (apart 
from a numerical factor) by the time average of $[E(t) - E(0)]^2$,
where $E(t)$ is the energy, at time t, of the FPU
system in dynamical contact with a heat bath (at the 
same temperature from which the initial data are 
extracted). Usually, in the spirit of ergodic theory, one 
looks at the infinite-time limit of such a quantity. But 
in the spirit of the metastable picture described above, 
one can check whether the time average presents a 
previous stabilization to some value smaller than the 
one predicted at equilibrium. Such a result, which is in 
qualitative agreement with the third principle, has 
indeed been obtained (by Carati and Galgani) recently. 

In conclusion, in situations of metaequilibrium such 
as those existing in the FPU model at low tempera- 
tures, a thermodynamics can still be formulated. 
Indeed, by virtue of the quick relaxation process 
described above, the time averages of the relevant 
quantities are found to stabilize in rather short times. 
In this way, one overcomes the critique of Poincaré to 
Jeans, namely that one cannot have a thermodynamics 
at all if reference is made only to the existence of 


extremely long relaxation times to the final equili- 
brium. A relaxation to a “provisional equilibrium” 
within a “first-order very large time” (to quote 
Poincaré) is required. The difference with respect to
standard equilibrium thermodynamics now lies
in the mechanical interpretation of the first principle.
Indeed, the internal thermodynamic energy is identi- 
fied not with the whole mechanical energy, but just 
with that fraction of it which is capable of reacting in 
short times to the external perturbations. 

This is the way in which the old idea of Boltzmann 
(and Jeans) might perhaps be presently implemented. 
For what concerns the fraction of the mechanical energy 
which is not included in the thermodynamic internal 
energy, as not being able to react in relatively short 
times, this should somehow play the role of a zero-point 
energy. This was suggested in the year 1971 by C 
Cercignani. But in fact, such a concept was put forward 
by Nernst himself in an extremely speculative work in 
1916, where he also advanced the concept that, for a 
system of oscillators of a given frequency, there should 
exist both dynamically ordered (geordnete) and dyna- 
mically chaotic (ungeordnete) motions, the latter being 
prevalent above a certain energy threshold. According to 
him, this fact should be relevant for a dynamical 
understanding of the third principle and of Planck’s law. 

It is well known that the modern theory of 
dynamical systems has led to familiarity with the 
(sometimes abused) notions of order and chaos and of 
a transition between them. One might say that the FPU 
work just forced the scientific community to take into 
account such notions in connection with the principle 
of equipartition of energy. It is really fascinating to see 
that the same notions, with the same terminology, had 
already been introduced much earlier on purely 
thermodynamic grounds, in connection with the 
relations between classical and quantum mechanics. 
Introduction


The purpose of this article is to describe some basic 
problems related to the interplay between dynamical 
systems and mathematical physics. Since it is 
impossible to be exhaustive in these topics, the 
focus here is on water-wave models. These mathe- 
matical models are described by partial differential 
equations that can be understood as dynamical 
systems in a suitable infinite-dimensional phase 
space. 

We will not address the original equations for 
two-dimensional (2D) surface water waves, even if 
we know that dynamical-system methods can help 
to exhibit some solitary waves for the equations. 
The reader is referred to relevant articles in this 
encyclopedia for details. Another approach is to 
seek these 2D surface water waves as saddle points 
for some Hamiltonians, which too is discussed 
elsewhere in this work. 

This article presents these arguments on some 
asymptotical models for the propagation of surface 
water waves. 


Asymptotical Models in Hydrodynamics 


To begin with, consider an irrotational fluid in a 
canal that is governed by the Euler equations and 


that is subject to gravitational forces. For a canal of 
finite depth, Boussinesq (1877) and Korteweg and de
Vries (KdV) (1895) obtained the following model
for unidirectional long waves:

$u_t + u_x + u_{xxx} + u u_x = 0$ [1]


Sometimes we drop the $u_x$ term on the left-hand side
of [1], thanks to a suitable change of coordinates. 
Alternatively, we can also deal with the so-called 
generalized KdV equation, which reads 


$u_t + u_{xxx} + u^k u_x = 0$ [2]


where k is a positive integer. There are also other 
models designed to represent long waves in shallow 
water. Let us introduce the regularized long-wave 
equation (also referred to as the Benjamin-Bona- 
Mahony equation) that reads 


$u_t - u_{xxt} + u_x + u u_x = 0$ [3]

or the Camassa-Holm equation

$u_t - u_{xxt} + 3 u u_x = 2 u_x u_{xx} + u u_{xxx}$ [4]


For deep water, a well-known model was introduced by Zakharov (1968),

$i u_t + u_{xx} + \varepsilon |u|^2 u = 0$ [5]

which describes the slow modulations of wave
packets. Here the unknown $u(x,t)$ takes values in
$\mathbb{C}$, and this nonlinear Schrödinger equation is in
fact a system. In these equations, $\varepsilon$ is either $1$ or
$-1$; throughout this article, we shall refer to the
former case as the focusing case and to the latter


134 Dynamical Systems in Mathematical Physics: An Illustration from Water Waves 


as the defocusing case. We may also substitute
$|u|^{2p} u$ for $|u|^2 u$ in the nonlinear term in [5] to obtain
alternate models.

The variable t represents the time and the space 
variable x belongs either to R or to a finite interval 
when we are dealing with periodic flows. 

The above models are intended to describe the 
propagation of unidirectional waves. For two-way 
waves, see Bona et al. (2002). 

Actually, these equations feature particular solu- 
tions, the so-called traveling waves. Let us recall, for 
instance, that for generalized KdV equation [2] these 
solutions are 


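$u(t,x) = Q_c(x - ct)$ [6]
$Q_c(x) = c^{1/k} Q(\sqrt{c}\, x)$ [7]
$Q(x) = \left( \frac{(k+1)(k+2)}{2} \operatorname{ch}^{-2}\!\left( \frac{k x}{2} \right) \right)^{1/k}$ [8]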


These so-called solitons (Figure 1) move to the right 
without changing their shape; c is the speed of 
propagation. In real life, this phenomenon was 
observed by Russell (1834). Riding his horse, he
was able to follow for miles the propagation of such 
a wave on the canal from Edinburgh to Glasgow. 
On the other hand, Camassa-Holm equations are 
designed to describe the propagation of peaked 
solitons as shown in Figure 2. 
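
As a sanity check on the traveling-wave ansatz, one can verify numerically that the profile satisfies the ordinary differential equation obtained by inserting $u(t,x) = Q_c(x - ct)$ into $u_t + u_{xxx} + u^k u_x = 0$ and integrating once, namely $-c\,Q_c + Q_c'' + Q_c^{k+1}/(k+1) = 0$. The sketch below assumes the standard closed form $Q(x) = [\tfrac{(k+1)(k+2)}{2}\operatorname{ch}^{-2}(kx/2)]^{1/k}$ together with the scaling $Q_c(x) = c^{1/k} Q(\sqrt{c}\,x)$; the function names and the finite-difference step are illustrative choices.

```python
import math

def soliton_profile(x, k):
    """Q(x) = [ (k+1)(k+2)/2 * sech^2(k x / 2) ]^(1/k)  (assumed closed form)."""
    amp = 0.5 * (k + 1) * (k + 2)
    return (amp / math.cosh(0.5 * k * x) ** 2) ** (1.0 / k)

def travelling_wave(x, t, c, k):
    """u(t, x) = c^(1/k) * Q(sqrt(c) * (x - c t)), a wave moving right at speed c."""
    return c ** (1.0 / k) * soliton_profile(math.sqrt(c) * (x - c * t), k)

def profile_residual(x, c, k, h=1e-3):
    """Finite-difference residual of  -c u + u'' + u^(k+1)/(k+1)  at t = 0.

    The residual should vanish up to the O(h^2) discretization error.
    """
    u = lambda y: travelling_wave(y, 0.0, c, k)
    second = (u(x + h) - 2.0 * u(x) + u(x - h)) / h ** 2
    return -c * u(x) + second + u(x) ** (k + 1) / (k + 1)
```

For $k = 1$ the profile reduces to the classical KdV soliton $3\operatorname{ch}^{-2}(x/2)$; larger speeds $c$ give taller and narrower waves.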

Focusing nonlinear Schrödinger equations also
feature solitary waves that read $u_\omega(t,x) = \exp(i\omega t) Q(x)$, where $Q$ is a solution to

$Q_{xx} - \omega Q + Q^3 = 0$ [9]


Figure 1 A soliton. 


Figure 2 Peaked soliton. 


There are numerous examples of equations or 
systems of equations that model 2D surface water 
waves. Among all these models, a first issue is to 
identify the relevant models insofar as the dynamical 
properties are concerned. Indeed, we address here 
the question of stability of solitary waves (up to the 
symmetries of the equation). For instance, the 
orbital stability for cubic Schrödinger reads: for
any $\varepsilon > 0$, there exists a neighborhood $\Omega$ of $u_\omega(x,0)$
such that any trajectory starting from $\Omega$ satisfies

$\sup_t \inf_\theta \inf_y \| u(t) - \exp(i\theta)\, u_\omega(t, \cdot - y) \|_{H^1} \le \varepsilon$ [10]


Another issue consists in the interaction of N 
solitons. Schneider and Wayne (2000) have 
addressed the issue of the validity of water-wave 
models when this interaction is concerned. 

Assume now that the validity of these models is 
granted. To consider [1] or [5] as a dynamical 
system, the next issue is then to consider the initial- 
value problem. 


The Initial-Value Problem 


Let us supplement these equations with initial
data $u_0$ in some Sobolev space. We shall consider
either

$H^s(\mathbb{R}) = \left\{ u : \int_{\mathbb{R}} (1 + |\xi|^2)^s\, |\hat u(\xi)|^2\, d\xi < +\infty \right\}$ [11]


in the case where x belongs to the whole line, or the 
corresponding Sobolev space with periodic bound- 
ary conditions. It should be examined whether these 
equations provide a continuous flow $S(t): u_0 \mapsto u(t)$
in these functional spaces (at least locally in time). 
We would like to point out that for each Sobolev 
space under consideration, we may have a different 
flow. This fact is at the heart of infinite-dimensional 
dynamical systems. 

The initial-value problem was a challenge for 
decades for low norms, that is for small s. The last 
breakthrough was performed by Bourgain (1993). 
Let us present the method for the KdV equation.
Consider $U(t)u_0$, the solution of the Airy equation

$u_t + u_{xxx} = 0, \quad u(0) = u_0$ [12]

Without going into further details, the idea is to
apply a fixed-point argument to Duhamel’s
form of the equation,

$u(t) = U(t)u_0 - \frac{1}{2} \int_0^t U(t-s)\, \partial_x \bigl( u^2(s) \bigr)\, ds$ [13]
in a suitable mixed-spacetime Banach space whose
norm reads $\|U(-t)u(t,x)\|_{H^b_t H^s_x}$. This relies on fine
properties in harmonic analysis. Thanks to this
method, we know that the Schrödinger equation
[5] and the KdV equation [1] are well-posed in,
respectively, $H^s(\mathbb{R})$, $s \ge 0$, and $H^s(\mathbb{R})$, $s > -3/4$,
locally in time. For the periodic case, the results
are slightly different. We would like to point out
that both KdV and nonlinear Schrödinger equations
provide semigroups $S(t)$ that do not feature a smoothing
effect. A trajectory that starts from $H^s$ remains
in $H^s$; indeed, we can also solve these partial
differential equations backward in time.
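
The Duhamel form can be made slightly more concrete in Fourier variables. The following worked lines are a sketch under an assumed convention, $\hat u(\xi) = \int_{\mathbb{R}} e^{-ix\xi} u(x)\, dx$ (the article does not fix one), applied to the KdV equation written as $u_t + u_{xxx} + u u_x = 0$:

```latex
\begin{aligned}
\partial_t \hat u(t,\xi) - i\xi^3\, \hat u(t,\xi) &= 0
  &&\Longrightarrow\quad \widehat{U(t)u_0}(\xi) = e^{i\xi^3 t}\, \hat u_0(\xi), \\
u_t + u_{xxx} &= -\tfrac12\, \partial_x (u^2)
  &&\Longrightarrow\quad u(t) = U(t)u_0 - \tfrac12 \int_0^t U(t-s)\, \partial_x \bigl( u^2(s) \bigr)\, ds .
\end{aligned}
```

Since $|e^{i\xi^3 t}| = 1$, the Airy group $U(t)$ preserves every $H^s$ norm and is invertible with inverse $U(-t)$, which is consistent with the absence of a smoothing effect and with solvability backward in time.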

The next issue is to determine if these flows are
defined for all times. Loosely speaking, the following
alternative holds true: either the local flow in $H^s$
extends to a global one, or some blow-up phenomenon
occurs, that is, $\|S(t)u_0\|_{H^s}$ blows up in finite
time.

To this end, let us observe that, for instance, the 
mass $\int_{\mathbb{R}} |u(x)|^2\, dx$ is conserved for both KdV and
nonlinear Schrödinger flows. Therefore, one can
prove that the solutions in $L^2$ are global in time. It is
worthwhile to observe that the Bourgain method 
also provides some global existence results below 
the energy norm. 
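
For the KdV flow, the conservation of mass can be checked by a short formal computation. The lines below are a sketch valid for smooth solutions decaying at infinity, with the equation taken in the form $u_t + u_{xxx} + u u_x = 0$; every integrand on the right is an exact derivative:

```latex
\frac{d}{dt} \int_{\mathbb{R}} u^2\, dx
  = 2 \int_{\mathbb{R}} u\, u_t\, dx
  = -2 \int_{\mathbb{R}} u\, u_{xxx}\, dx - 2 \int_{\mathbb{R}} u^2 u_x\, dx
  = \int_{\mathbb{R}} \partial_x \bigl( u_x^2 - 2 u\, u_{xx} \bigr)\, dx
    - \frac{2}{3} \int_{\mathbb{R}} \partial_x \bigl( u^3 \bigr)\, dx
  = 0 .
```

Keeping the transport term $u_x$ of [1] changes nothing, since $2 u u_x = \partial_x(u^2)$ is again an exact derivative.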

Consider now the flow of the solutions in $H^1$. The
second invariant for nonlinear Schrödinger equations
reads

$\int_{\mathbb{R}} |u_x|^2\, dx - \frac{\varepsilon}{p+1} \int_{\mathbb{R}} |u|^{2p+2}\, dx$ [14]


Therefore, the local solutions in $H^1$ extend to global
ones in the defocusing case ($\varepsilon = -1$). In the focusing
case, the situation is more contrasted. The solution
is global if the nonlinearity is less than an $H^1$-critical
value ($p = 2$ for Schrödinger, and $k = 4$ for the generalized
KdV equation). This critical value depends on
some Sobolev embeddings such as

$\int_{\mathbb{R}} |u(x)|^{2p+2}\, dx \le C_p\, \|u\|_{L^2}^{p+2}\, \|u_x\|_{L^2}^{p}$ [15]


Therefore, since the mass is constant, the second 
invariant controls the $H^1$ norm of the solution if
$p < 2$. Note that the critical power of the nonlinearity
depends also on the dimension of the space; it is
the cubic Schrödinger that is critical in $H^1(\mathbb{R}^2)$. It is
well known that, for some initial data, blow-up 
phenomena can occur for 2D cubic Schrödinger 
equations. Moreover, the behavior of blow-up 
solutions is more or less understood. This analysis 
was performed using the conformal invariance of 
the equation. For the quintic Schrödinger equation,
which is critical in 1D, this conformal invariance
states that if $u(t,x)$ is a solution, then

$v(t,x) = |t|^{-1/2} \exp\!\left( \frac{i x^2}{4t} \right) u\!\left( -\frac{1}{t}, \frac{x}{t} \right)$ [16]

is also a solution.

On the other hand, for the generalized KdV 
equation, there is no conformal invariance and the 
blow-up issue had been open for years. There was 
some numerical evidence that blow-up can occur for 
k=4. Recently, Martel and Merle (2002) have given 
a complete description of the blow-up profile for 
this equation. Their methods are quite complex and 
rely on an ejection of mass at infinity in a suitable 
coordinate system. 

In the discussion so far we have presented some 
quantities that are invariant by the flow of the 


solutions. This is related to the Hamiltonian 
structure of the dynamical systems under 
consideration. 


Hamiltonian Systems in Hydrodynamics 


The study of Hamiltonian systems has developed 
beyond celestial mechanics (the famous n-body 
problems) to other fields in mathematical physics. 
We focus here on dynamical systems that read 


$u_t = J\, \frac{\partial}{\partial u} H(u)$ [17]


where $H$ is the Hamiltonian and $J$ some skew-symmetric
operator. For instance, [1] is a Hamiltonian
system with $J = \partial_x$ (i.e., an unbounded skew-symmetric
operator) and

$H(u) = \frac{1}{2} \int (u_x^2 - u^2)\, dx - \frac{1}{6} \int u^3\, dx$ [18]


There is a subclass of Hamiltonian systems that 
are integrable by inverse-scattering methods. For 
instance, [1] belongs to this class. Indeed, these 
methods give a complete description of the asymptotics
when $t \to +\infty$. It is well known (Deift and
Zhou 1993) that, asymptotically, any solution to 
KdV equation consists of a wave train moving to the 
right in the physical space up to a dispersive part 
moving to the left. 
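
The Hamiltonian structure of the KdV equation can be verified by a direct computation. The sketch below assumes $J = \partial_x$ and $H(u) = \frac12 \int (u_x^2 - u^2)\, dx - \frac16 \int u^3\, dx$, with $\partial H/\partial u$ understood as the variational derivative (integration by parts on $u_x^2$, boundary terms dropped):

```latex
\begin{aligned}
\frac{\partial H}{\partial u}
  &= -u_{xx} - u - \tfrac12 u^2, \\
u_t = \partial_x \frac{\partial H}{\partial u}
  &= -u_{xxx} - u_x - u\, u_x
  \quad\Longleftrightarrow\quad
  u_t + u_x + u_{xxx} + u\, u_x = 0 .
\end{aligned}
```

Skew-symmetry of $\partial_x$ then gives $\frac{d}{dt} H(u) = \int \frac{\partial H}{\partial u}\, \partial_x \frac{\partial H}{\partial u}\, dx = 0$, so $H$ is itself an invariant of the flow.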

On the other hand, a generic Hamiltonian system 
is not integrable. The study of the asymptotics and 
of the dynamical properties of such a system 
deserves another analysis. We say that a system 
features asymptotic completeness if there exist $u_+$
and $u_-$ such that the solution $u(t)$ of [17] supplemented
with initial data $u_0$ satisfies

$\|u(t) - U(t)u_+\| \to 0$ [19]

$\|u(t) - U(t)u_-\| \to 0$ [20]

when, respectively, $t \to +\infty$ or $t \to -\infty$. Here
$U(t)u_0$ is the solution of the free equation, that is,
the associated linear equation, supplemented with
initial data $u_0$; for instance, the Airy equation is the
free equation related to the KdV equation. The
operators $u_- \mapsto u_0 \mapsto u_+$ are called wave operators.
This is related to Bohr’s transition in
quantum mechanics. Loosely speaking, we are able
to prove these scattering properties for high powers
in the nonlinearity for subcritical defocusing Schrödinger
equations.

The asymptotics of trajectories can be more 
complicated. Let us recall that the stability of 
traveling waves is also an important issue in under- 
standing the dynamical properties of these models. 
For instance, let us point out that Martel and Merle 
proved the asymptotic stability of the sum of N 
solitons for KdV in the subcritical case. 

Beyond these asymptotics we are interested in the 
case where the permanent regime is chaotic (or 
turbulent). A scenario is that there exist quasiperiodic
solutions of arbitrary order N for the system
under consideration. The next challenge about these 
Hamiltonian systems is to apply the Kolmogorov- 
Arnol’d—Moser theory to exhibit this type of 
solutions to systems like [17]. Here we restrict our 
discussion to the case of bounded domains, with 
either periodic or homogeneous Dirichlet conditions. 
Then, let us introduce the following definition: a
solution is quasiperiodic if there exists a finite
number $N$ of frequencies $\omega_l$ such that

$u(t,x) = \sum_{l=1}^{N} u_l(x) \exp(i\omega_l t)$ [21]


This extends the case of periodic solutions 
(N=1), which are isomorphic to the torus. To 
prove the existence of such structures, one idea is 
then to imbed N-dimensional invariant tori into the 
phase space of solutions. One may approximate the 
infinite-dimensional Hamiltonian by a sequence of 
finite ones and consider the convergence of iterated 
symplectic transformations, or one solves directly 
some nonlinear functional equation. Actually, the 
difficulty is that resonances can occur. Resonances 
occur when there are some linear combinations of 
the frequencies that vanish (or that are arbitrarily 
close to 0). This introduces a small divisor problem 


in a phase space that has infinite dimension. To 
overcome these difficulties, a Nash-Moser scheme 
can be implemented (Craig 1996). There are 
numerous such open problems. For instance, let us 
observe that known results are essentially only for 
the case where the dimension of the ambient space 
is 1. On the other hand, quasiperiodic solutions 
correspond to N-dimensional invariant tori for the 
flow of solutions; one may seek Lagrangian
invariant tori that correspond to the case where
$N = +\infty$. Current research is directed towards
extending this analysis. 
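
The small-divisor problem mentioned above can be illustrated in finite dimension. The following sketch enumerates integer combinations $k \cdot \omega$ of a given frequency vector and reports the smallest ones in absolute value; the function name and the cut-off `max_order` on $|k_1| + \dots + |k_N|$ are illustrative choices, not part of the theory described in the text.

```python
import math
from itertools import product

def smallest_divisors(freqs, max_order, count=5):
    """Return the `count` smallest values of |k . omega| with their vectors k.

    The search runs over nonzero integer vectors k with
    |k_1| + ... + |k_N| <= max_order.
    """
    results = []
    rng = range(-max_order, max_order + 1)
    for k in product(rng, repeat=len(freqs)):
        order = sum(abs(c) for c in k)
        if order == 0 or order > max_order:
            continue
        divisor = abs(sum(c * w for c, w in zip(k, freqs)))
        results.append((divisor, k))
    results.sort()
    return results[:count]

# Rationally dependent frequencies are exactly resonant (a divisor vanishes),
# while for (1, sqrt(2)) the divisors are nonzero but become small as the
# order of k grows.
resonant = smallest_divisors((1.0, 2.0), 3)
nonresonant = smallest_divisors((1.0, math.sqrt(2.0)), 12)
```

In an infinite-dimensional phase space there are infinitely many frequencies to combine, which is why controlling such divisors requires the more elaborate schemes cited above.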

Another issue is to seek invariant measures for these 
Hamiltonian dynamical systems, as in statistical 
mechanics. Bourgain was successful in performing 
this analysis for some nonlinear Schrödinger equations 
either in the case of periodic boundary conditions or in 
the whole space. This result is an important step in the 
ergodic analysis of our Hamiltonian dynamical sys- 
tems. This could explain the Poincaré recurrence 
phenomena observed numerically for these types of 
equations: some particular solutions seem to come 
back to their initial state after a transient time. This 
point will not be developed here. 

All these results are properties of conservative 
dynamical systems. We now address the case when 
some dissipation takes place. 


Dissipative Water-Wave Models 


To model the effect of viscosity on 2D surface water 
waves, we go back to a flow governed by the 
Navier-Stokes equations and we proceed to obtain 
damped equations (Ott and Sudan 1970, Kakutani 
and Matsuuchi 1975). In fact, the damping in KdV 
equations can be either a diffusion term, which leads us to
study the equation

$u_t + u_{xxx} + u u_x = \nu u_{xx}$ [22]


where $\nu$ is a positive number analogous to the
viscosity, or a zero-order term $-\nu u$ on the right-hand
side of [22]. In the first case, we obtain a
KdV-Burgers equation that has some smoothing
effect in time. In the second case, we have a zero-order
dissipation term. A nonlocal term would be
$\nu \mathcal{F}^{-1}(|\xi|^{2\alpha} \hat u(\xi))$ for $\alpha \in [0,1]$, where $\mathcal{F}(u) = \hat u$
denotes the Fourier transform of $u$.

A first issue concerning damped water-wave 
equations is to estimate the decay rate of the 
solutions towards the equilibrium (no decay) when 
$t \to +\infty$. For [22] the ultimate result is that, for
initial data $u_0 \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$, the $L^2$ norm of the
solution decays like $t^{-1/4}$ (Amick et al. 1989).
Energy methods have been developed to handle
these problems, such as Schonbek’s splitting method.
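
The exponent $-1/4$ is the one already exhibited by the linear diffusion equation. As a heuristic illustration only (it ignores the dispersive and nonlinear terms of the KdV-Burgers equation), the sketch below uses the explicit solution of $u_t = \nu u_{xx}$ with a unit-mass Gaussian initial datum; the parameter `a`, which sets the initial width, is a hypothetical choice.

```python
import math

def l2_norm_heat_gaussian(t, nu=1.0, a=1.0):
    """L^2 norm at time t of the solution of u_t = nu * u_xx whose initial
    datum is the unit-mass Gaussian exp(-x^2 / (4a)) / sqrt(4 pi a).

    The solution stays Gaussian with a replaced by a + nu * t, so that
    ||u(t)||_{L^2} = (8 pi (a + nu t))^(-1/4).
    """
    return (8.0 * math.pi * (a + nu * t)) ** -0.25

def decay_exponent(t1, t2, nu=1.0, a=1.0):
    """Slope of log ||u(t)||_{L^2} versus log t between the times t1 and t2."""
    n1 = l2_norm_heat_gaussian(t1, nu, a)
    n2 = l2_norm_heat_gaussian(t2, nu, a)
    return (math.log(n2) - math.log(n1)) / (math.log(t2) - math.log(t1))
```

For large times the slope returned by `decay_exponent` approaches $-1/4$, matching the rate quoted above for integrable, square-integrable data.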
The center manifold theory is another approach
that is employed in dynamical systems. The aim is 
to prove the existence of a finite-dimensional 
manifold that is invariant (in a neighborhood of 
the origin) by the flow of the solutions and that 
attracts the other trajectories with high speed. 
Therefore, this manifold, and the trajectories 
therein, monitor the decay rate of the solutions 
towards the origin. The construction of such a 
manifold relies on splitting properties of the 
spectrum of the associated linearized operator 
(Gallay and Wayne 2002). Using a suitable change 
of variables (that moves the continuous spectrum 
away from the origin), Gallay and Wayne were able 
to construct such a manifold in an infinite-dimen- 
sional phase space. 

Another issue is the understanding of the dynamics 
for damped-forced water-wave equations as 


The dynamical system approach is the attractor 
theory (Temam 1997). Equations such as [23] 
provide dissipative semigroups S(t) in some energy 
spaces. The theory has developed for years and we 
know that these dynamical systems feature global 
attractors. A global attractor is a compact subset in 
the energy space under consideration which is 
invariant by the flow of the solutions and that 
attracts all the trajectories when $t \to +\infty$. Moreover,
if we deal with periodic boundary conditions,
this global attractor has finite fractal (or Hausdorff)
dimension. This dimension depends on the data
concerning $\nu$ and $f$.

Actually, eqn [23] provides semigroups either in
$L^2(\mathbb{R})$, $H^1(\mathbb{R})$, or in $H^2(\mathbb{R})$. These three dynamical
systems feature global attractors $A_0$, $A_1$, $A_2$. From
the viewpoint of physics, the attractors describe the
permanent regime of the flow. One may wonder if
this permanent regime depends on the space chosen
for the mathematical study. Eventually, the latest
result on this issue establishes that $A_0 = A_1 = A_2$. This
property is equivalent to proving the asymptotic
smoothing effect for the associated semigroup: even
if $S(t)$ is not a smoothing operator for finite $t$,
all solutions converge to a smooth set when $t$ goes to
infinity.

All these results are for subcritical nonlinearities. 
As already noted, dissipation provides smoothing at 
infinity. Nevertheless, damping does not prevent 
blow-up. Let us illustrate this by the following
result due to Tsutsumi (1984). The damped Schrödinger
equation

$i u_t + i\nu u + u_{xx} + |u|^{2p} u = 0$ [24]

features blow-up solutions in $H^1(\mathbb{R})$ for $p > 2$, even
if all solutions are damped in $L^2(\mathbb{R})$ with exponential
speed.
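
The exponential damping in $L^2$ follows from a one-line formal computation. The sketch below takes the damped equation in the form $i u_t + i\nu u + u_{xx} + |u|^{2p} u = 0$, that is, $u_t = -\nu u + i u_{xx} + i |u|^{2p} u$, multiplies by $\bar u$, integrates, and keeps the real part (the dispersive and nonlinear terms contribute purely imaginary quantities):

```latex
\frac{1}{2} \frac{d}{dt} \int_{\mathbb{R}} |u|^2\, dx
  = \operatorname{Re} \int_{\mathbb{R}} u_t\, \bar u\, dx
  = -\nu \int_{\mathbb{R}} |u|^2\, dx
  \quad\Longrightarrow\quad
  \|u(t)\|_{L^2} = e^{-\nu t}\, \|u_0\|_{L^2} .
```

The identity holds for as long as the solution exists, so it is compatible with a blow-up of the $H^1$ norm in finite time.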

This completes the discussion of damped-forced 
water-wave equations. We now consider equations 
that are forced with a random forcing term. 


Stochastic Water-Wave Models 


During the modeling process that led to KdV or 
Schrödinger equations from Euler equation, we have 
neglected some low-order terms. We now model 
these terms by a noise and we are led to a new 
randomly forced dynamical system that reads 


$u_t + u_x + u_{xxx} + u u_x = \gamma \xi$ [25]

Here one may assume that $\xi(x,t)$ is a Gaussian
process with correlations

$E(\xi(x,t)\, \xi(y,s)) = \delta_{x-y}\, \delta_{t-s}$ [26]

that is, a spacetime white noise. The parameter $\gamma$
is the amplitude of the process. Unfortunately, due
to the lack of smoothing effect of KdV or
Schrödinger equations, it is more convenient to
work with a noise that is correlated in space,
satisfying

$E(\xi(x,t)\, \xi(y,s)) = c(x-y)\, \delta_{t-s}$ [27]

here $c(x-y)$ is some smooth ansatz for $\delta_{x-y}$, defined
from some Hilbert-Schmidt kernel $K$ as

$c(x-y) = \int_{\mathbb{R}} K(x,z)\, K(y,z)\, dz$
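
The construction of a spatial correlation from a kernel can be illustrated with a concrete choice. The sketch below assumes a Gaussian convolution kernel $K(x,z) = \exp(-(x-z)^2/(2w^2))$, which is only one hypothetical example of an admissible kernel, and approximates $\int K(x,z) K(y,z)\, dz$ by a Riemann sum on a truncated grid.

```python
import math

def kernel(x, z, width=0.5):
    """Hypothetical Gaussian kernel K(x, z); chosen purely for illustration."""
    return math.exp(-((x - z) ** 2) / (2.0 * width ** 2))

def covariance(x, y, z_grid):
    """Riemann-sum approximation of the integral of K(x, z) K(y, z) over z."""
    dz = z_grid[1] - z_grid[0]
    return sum(kernel(x, z) * kernel(y, z) for z in z_grid) * dz

# Uniform grid on [-10, 10]; the Gaussian tails outside are negligible
# for points x, y of moderate size.
z_grid = [-10.0 + 0.01 * j for j in range(2001)]
```

For this kernel the integral can be computed by hand and equals $w\sqrt{\pi}\, \exp(-(x-y)^2/(4w^2))$: it is symmetric, depends only on $x - y$, and concentrates near $x = y$ as the width $w$ shrinks, which is the sense in which it is a smooth substitute for $\delta_{x-y}$.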


We also consider random perturbation of focusing 
Schrödinger equation, which reads either 


$u_t + i u_{xx} + i |u|^{2p} u = i u \xi$ [28]

(which represents a multiplicative noise) or

$u_t + i u_{xx} + i |u|^{2p} u = i \gamma \xi$ [29]


(which is an additive noise). In the former case, the 
noise acts as a potential, while in the latter case it 
represents a forcing term. These equations also 
model the propagation of waves in an inhomoge- 
neous medium. 

Research is in progress to study these stochastic 
dynamical systems. To begin with, the theory of the 
initial-value problem has to be established in this
new context (see, e.g., de Bouard and Debussche 
(2003)). 

One challenge is to understand the effect of noise 
on dynamical properties of the particular solutions 
described above, for instance, the solitary waves for 
Schrödinger equation, either in the subcritical case
$p < 2$ or in the critical case $p = 2$ and beyond.

Results obtained both theoretically and numeri- 
cally on the influence of the noise on blow-up 
phenomena (random process) for generalized Schrödinger
equations are likely almost-sure results.

On the one hand, if the noise is additive and the
power supercritical, $p > 2$, there is some numerical
evidence that a spacetime white noise can delay or
even prevent the blow-up. However, if the noise is
not so irregular (as for the space-correlated noise
described above) it seems that any solution blows up
in finite time.

de Bouard and Debussche have proved that for 
either an additive or a multiplicative noise, any 
smooth and localized (in space) initial data give rise 
to a trajectory that collapses in arbitrarily small 
time with a positive probability. This contrasts 
with the deterministic case, where only particular 
initial data could lead to blow-up trajectories. 
Actually, the noise enforces that any trajectory 
must pass through this blow-up region, with a 
positive probability. 
Introduction


Effective field theories (EFTs) are the counterpart of 
the “theory of everything.” They are the field 
theoretical implementation of the quantum ladder: 
heavy degrees of freedom need not be included 
among the quantum fields of an EFT for a 
description of low-energy phenomena. For example, 
we do not need quantum gravity to understand the 
hydrogen atom nor does chemistry depend upon the 
structure of the electromagnetic interaction of 
quarks. 

EFTs are approximations by their very nature. 
Once the relevant degrees of freedom for the 
problem at hand have been established, the corre- 
sponding EFT is usually treated perturbatively. It 
does not make much sense to search for an exact 
solution of the Fermi theory of weak interactions. In 
the same spirit, convergence of the perturbative 
expansion in the mathematical sense is not an issue. 
The asymptotic nature of the expansion becomes 
apparent once the accuracy is reached where effects 
of the underlying “fundamental” theory cannot be 
neglected any longer. The range of applicability of 
the perturbative expansion depends on the separa- 
tion of energy scales that define the EFT. 

EFTs pervade much of modern physics. The 
effective nature of the description is evident in 
atomic and condensed matter physics. The present 
article will be restricted to particle physics, where 
EFTs have become important tools during the last 
25 years. 


Classification of EFTs 


A first classification of EFTs is based on the 
structure of the transition from the “fundamental”
(energies $> \Lambda$) to the “effective” level (energies $< \Lambda$).


1. Complete decoupling The fundamental theory
contains heavy and light degrees of freedom.
Under very general conditions (decoupling theorem,
Appelquist and Carazzone 1975) the effective
Lagrangian for energies $\ll \Lambda$, depending only on
light fields, takes the form

$\mathcal{L}_{\mathrm{eff}} = \mathcal{L}_{d \le 4} + \sum_{d > 4} \frac{1}{\Lambda^{d-4}} \sum_{i_d} g_{i_d} O_{i_d}$ [1]

The heavy fields with masses $> \Lambda$ have
been “integrated out” completely. $\mathcal{L}_{d \le 4}$ contains
the potentially renormalizable terms with operator
dimension $d \le 4$ (in natural mass units where Bose
and Fermi fields have $d = 1$ and $3/2$, respectively),
the $g_{i_d}$ are coupling constants and the $O_{i_d}$ are
monomials in the light fields with operator dimension
$d$. In a slightly misleading notation, $\mathcal{L}_{d \le 4}$
consists of relevant and marginal operators, whereas
the $O_{i_d}$ ($d > 4$) are denoted irrelevant operators. The
scale $\Lambda$ can be the mass of a heavy field (e.g., $M_W$ in
the Fermi theory of weak interactions) or it reflects
the short-distance structure in a more indirect way.

2. Partial decoupling In contrast to the previous 
case, the heavy fields do not disappear completely 
from the EFT but only their high-momentum modes 
are integrated out. The main area of application is 
the physics of heavy quarks. The procedure involves 
one or several field redefinitions introducing a frame 
dependence. Lorentz invariance is not manifest but 
implies relations between coupling constants of the 
EFT (reparametrization invariance). 

3. Spontaneous symmetry breaking The transi- 
tion from the fundamental to the effective level 
occurs via a phase transition due to spontaneous 
symmetry breaking generating (pseudo-)Goldstone 
bosons. A spontaneously broken symmetry relates 
processes with different numbers of Goldstone 
bosons. Therefore, the distinction between renormalizable
($d \le 4$) and nonrenormalizable ($d > 4$) parts
in the effective Lagrangian [1] becomes meaningless. 
The effective Lagrangian of type 3 is generically 
nonrenormalizable. Nevertheless, such Lagrangians 
define perfectly consistent quantum field theories at 
sufficiently low energies. Instead of the operator 
dimension as in [1], the number of derivatives of 
the fields and the number of symmetry-breaking 
insertions distinguish successive terms in the Lagrangian.
The general structure of effective Lagrangians
with spontaneously broken symmetries is largely 
independent of the specific physical realization 
(universality). There are many examples in 
condensed matter physics, but the two main 
applications in particle physics are electroweak 
symmetry breaking and chiral perturbation theory 
(both discussed later) with the spontaneously broken 
global chiral symmetry of quantum chromody- 
namics QCD. 
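
The power counting behind eqn [1] can be illustrated with a rough numerical sketch. It assumes couplings $g_{i_d}$ of order 1 and ignores logarithms and loop factors, so that an operator of dimension $d$ contributes at relative order $(E/\Lambda)^{d-4}$ at energy $E$. The function name and the sample numbers are illustrative choices, not part of the original text.

```python
def suppression_factor(energy_gev: float, scale_gev: float, dimension: int) -> float:
    """Naive relative size (E / Lambda)**(d - 4) of a dimension-d operator in eqn [1].

    Assumes a coupling of order 1 and ignores logarithms and loop factors.
    """
    if dimension < 4:
        raise ValueError("this estimate is meant for operators with d >= 4")
    return (energy_gev / scale_gev) ** (dimension - 4)


if __name__ == "__main__":
    # Hypothetical inputs: a process at E = 1 GeV and a heavy scale Lambda = 100 GeV.
    for d in (4, 5, 6, 8):
        factor = suppression_factor(1.0, 100.0, d)
        print(f"d = {d}: (E/Lambda)^(d-4) = {factor:.1e}")
```

The estimate makes explicit why operators with $d > 4$ are called irrelevant at low energies: each additional unit of operator dimension costs another power of $E/\Lambda$.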


Another classification of EFTs is related to the 
status of their coupling constants. 


A. Coupling constants can be determined by matching the EFT with the underlying theory at short distances. The underlying theory is known and Green functions can be calculated perturbatively at energies $\sim \Lambda$ both in the fundamental and in the effective theory. Identifying a minimal set of Green functions fixes the coupling constants $g_{i_d}$ in eqn [1] at the scale $\Lambda$. Renormalization group equations can then be used to run the couplings down to lower scales. The nonrenormalizable terms in the Lagrangian [1] can be fully included in the perturbative analysis.

B. Coupling constants are constrained by symmetries only.

• The underlying theory and therefore also the EFT coupling constants are unknown. This is the case of the standard model (SM) (see the next section). A perturbative analysis beyond leading order only makes sense for the known renormalizable part $\mathcal{L}_{d\le 4}$. The nonrenormalizable terms suppressed by powers of $\Lambda$ are considered at tree level only. The associated coupling constants $g_{i_d}$ serve as bookmarks for new physics. Usually, but not always (cf., e.g., the subsection “Noncommutative Spacetime”), the symmetries of $\mathcal{L}_{d\le 4}$ are assumed to constrain the couplings.

• The matching cannot be performed in perturbation theory even though the underlying theory is known. This is the generic situation for EFTs of type 3 involving spontaneous symmetry breaking. The prime example is chiral perturbation theory as the EFT of QCD at low energies.


The SM as an EFT 


With the possible exception of the scalar sector, to be discussed in the subsection “Electroweak Symmetry Breaking,” the SM is very likely the renormalizable part of an EFT of type 1B. Except for nonzero neutrino masses, the SM Lagrangian $\mathcal{L}_{d\le 4}$ in [1] accounts for physics up to energies of roughly the Fermi scale $G_F^{-1/2} \simeq 300$ GeV.

Since the SM works exceedingly well up to the Fermi scale, where the electroweak gauge symmetry is spontaneously broken, it is natural to assume that the operators $O_{i_d}$ with $d > 4$, made up from fields representing the known degrees of freedom and including a single Higgs doublet in the SM proper, should be gauge invariant with respect to the full SM gauge group $SU(3)_c \times SU(2)_L \times U(1)_Y$. An almost obvious further constraint is Lorentz invariance, which will, however, be lifted in the next subsection.

These requirements limit the Lagrangian with operator dimension $d = 5$ to a single term (except for generation multiplicity), consisting only of a left-handed lepton doublet $L_L$ and the Higgs doublet $\Phi$:

$$O_{d=5} = \epsilon_{ij}\,\epsilon_{kl}\,\bigl(L_i^{T} C^{-1} L_k\bigr)\,\Phi_j \Phi_l + \text{h.c.}$$  [2]

This term violates lepton number and generates nonzero Majorana neutrino masses. For a neutrino mass of 1 eV, the scale $\Lambda$ would have to be of the order of $10^{13}$ GeV if the associated coupling constant in the EFT Lagrangian [1] is of order 1.
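
The order of magnitude quoted for $\Lambda$ can be checked with a short estimate. The sketch below assumes the relation $m_\nu \sim g\,\langle\Phi\rangle^2/\Lambda$ with $\langle\Phi\rangle = v/\sqrt{2}$ and $v \approx 246$ GeV; the precise numerical factor depends on the normalization of the operator [2], so only the power of ten is meaningful. The function name and default values are illustrative assumptions.

```python
def majorana_scale_gev(m_nu_ev: float, coupling: float = 1.0, vev_gev: float = 246.0) -> float:
    """Scale Lambda (GeV) from m_nu ~ coupling * <Phi>^2 / Lambda with <Phi> = v / sqrt(2).

    Order-of-magnitude estimate only; normalization factors of the d = 5
    operator are not tracked.
    """
    m_nu_gev = m_nu_ev * 1.0e-9             # 1 eV = 1e-9 GeV
    higgs_vev_squared = vev_gev ** 2 / 2.0  # <Phi>^2 in GeV^2
    return coupling * higgs_vev_squared / m_nu_gev


if __name__ == "__main__":
    for mass_ev in (1.0, 0.1):
        scale = majorana_scale_gev(mass_ev)
        print(f"m_nu = {mass_ev} eV  ->  Lambda ~ {scale:.1e} GeV")
```

Lighter neutrinos push the scale up in inverse proportion to the mass, which is why this single $d = 5$ operator probes scales far beyond collider reach.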

In contrast to the simplicity for d= 5, the list of 
gauge-invariant operators with d=6 is enormous. 
Among them are operators violating baryon or 
lepton number that must be associated with a scale 
much larger than 1 TeV. To explore the territory
close to present energies, it therefore makes sense to 
impose baryon and lepton number conservation on 
the operators with d=6. Those operators have all 
been classified (Buchmüller and Wyler 1986) and the 
number of independent terms is of the order of 80. 
They can be grouped in three classes. 

The first class consists of gauge and Higgs fields 
only. The corresponding EFT Lagrangian has been 
used to parametrize new physics in the gauge sector 
constrained by precision data from LEP. The second 
class consists of operators bilinear in fermion fields, 
with additional gauge and Higgs fields to generate 
d=6. Finally, there are four-fermion operators 
without other fields or derivatives. Some of the 
operators in the last two groups are also constrained 
by precision experiments, with a certain hierarchy of 
limits. For lepton and/or quark flavor conserving terms, the best limits on $\Lambda$ are in the few TeV range, whereas the absence of neutral flavor changing processes yields lower bounds on $\Lambda$ that are several orders of magnitude larger. If there is new physics in the TeV range, flavor changing neutral transitions must be strongly suppressed, a powerful constraint on model building.

It is amazing that the most general renormalizable Lagrangian with the given particle content accounts for almost all experimental results in such an impressive manner. Finally, we recall that many of
the operators of dimension 6 are also generated in the 
SM via radiative corrections. A necessary condition 
for detecting evidence for new physics is therefore 
that the theoretical accuracy of radiative corrections 
matches or surpasses the experimental precision. 


Noncommutative Spacetime 


Noncommutative geometry arises in some string theories and may be expected on general grounds when incorporating gravity into a quantum field theory framework. The natural scale of noncommutative geometry would be the Planck scale in this case, without observable consequences at presently accessible energies. However, as in theories with large extra dimensions, the characteristic scale $\Lambda_{\rm NC}$ could be significantly smaller. In parallel to theoretical developments to define consistent noncommutative quantum field theories (short for quantum field theories on noncommutative spacetime), a number of phenomenological investigations have been performed to put lower bounds on $\Lambda_{\rm NC}$. Noncommutative geometry is a deformation of ordinary spacetime where the coordinates, represented by Hermitian operators $\hat{x}_\mu$, do not commute:

$$[\hat{x}_\mu, \hat{x}_\nu] = i\,\theta_{\mu\nu}$$  [3]

The antisymmetric real tensor $\theta_{\mu\nu}$ has dimensions length$^2$ and it can be interpreted as parametrizing the resolution with which spacetime can be probed. In practically all applications, $\theta_{\mu\nu}$ has been assumed to be a constant tensor and we may associate an energy scale $\Lambda_{\rm NC}$ with its nonzero entries:

$$\Lambda_{\rm NC}^{-2} \sim \theta_{\mu\nu}$$  [4]

There is to date no unique form for the noncommutative extension of the SM. Nevertheless, possible observable effects of noncommutative geometry have been investigated. Not unexpected from an EFT point of view, for energies $\ll \Lambda_{\rm NC}$, noncommutative field theories are equivalent to ordinary quantum field theories in the presence of nonstandard terms containing $\theta_{\mu\nu}$ (Seiberg–Witten map). Practically all applications have concentrated on effects linear in $\theta_{\mu\nu}$.

Kinetic terms in the Lagrangian are in general unaffected by the noncommutative structure. New effects arise therefore mainly from renormalizable $d = 4$ interaction terms. For example, the Yukawa coupling $g_Y \bar{\psi}\psi\phi$ generates the following interaction linear in $\theta_{\mu\nu}$:

$$\mathcal{L}_Y^{\rm NC} = g_Y\, \theta_{\mu\nu} \bigl( \partial^\mu \bar{\psi}\, \partial^\nu \psi\, \phi + \partial^\mu \bar{\psi}\, \psi\, \partial^\nu \phi + \bar{\psi}\, \partial^\mu \psi\, \partial^\nu \phi \bigr)$$  [5]

These interaction terms have operator dimension 6 and they are suppressed by $\theta_{\mu\nu} \sim \Lambda_{\rm NC}^{-2}$. The major difference to the previous discussion on physics beyond the SM is that there is an intrinsic violation of Lorentz invariance due to the constant tensor $\theta_{\mu\nu}$. In contrast to the previous analysis, the terms with dimension $d > 4$ do not respect the symmetries of the SM.

If $\theta_{\mu\nu}$ is indeed constant over macroscopic distances, many tests of Lorentz invariance can be used to put lower bounds on $\Lambda_{\rm NC}$. Among the exotic effects investigated are modified dispersion relations for particles, decay of high-energy photons, charged particles producing Čerenkov radiation in vacuum, birefringence of radiation, a variable speed of light, etc. A generic signal of noncommutativity is the violation of angular momentum conservation that can be searched for at the Large Hadron Collider (LHC) and at the next linear collider.

Lacking a unique noncommutative extension of the SM, unambiguous lower bounds on $\Lambda_{\rm NC}$ are difficult to establish. However, the range $\Lambda_{\rm NC} \lesssim 10$ TeV is almost certainly excluded. An estimate of the induced electric dipole moment of the electron (noncommutative field theories violate CP in general to first order in $\theta_{\mu\nu}$) yields $\Lambda_{\rm NC} \gtrsim 100$ TeV. On the other hand, if the SM were CP invariant, noncommutative geometry would be able to account for the observed CP violation in $K^0$–$\bar{K}^0$ mixing for $\Lambda_{\rm NC} \sim 2$ TeV.


Electroweak Symmetry Breaking 


In the SM, electroweak symmetry breaking is 
realized in the simplest possible way through 
renormalizable interactions of a scalar Higgs doub- 
let with gauge bosons and fermions, a gauged 
version of the linear $\sigma$ model.

The EFT version of electroweak symmetry 
breaking (EWEFT) uses only the experimentally 
established degrees of freedom in the SM (fermions 
and gauge bosons). Spontaneous gauge symmetry 
breaking is realized nonlinearly, without introducing 
additional scalar degrees of freedom. It is a low- 
energy expansion where energies and masses are 
assumed to be small compared to the symmetry- 
breaking scale. From both perturbative and 
nonperturbative arguments we know that this scale 
cannot be much bigger than 1 TeV. The Higgs
model can be viewed as a specific example of an 
EWEFT as long as the Higgs boson is not too light 
(heavy-Higgs scenario). 

The lowest-order effective Lagrangian takes the following form:

$$\mathcal{L}^{(2)}_{\rm EWSB} = \mathcal{L}_B + \mathcal{L}_F$$  [6]

where $\mathcal{L}_F$ contains the gauge-invariant kinetic terms for quarks and leptons including mass terms. In addition to the kinetic terms for the gauge bosons $W_\mu$, $B_\mu$, the bosonic Lagrangian $\mathcal{L}_B$ contains the characteristic lowest-order term for the would-be-Goldstone bosons:

$$\mathcal{L}_B = \mathcal{L}^{\rm kin}_{\rm gauge} + \frac{v^2}{4}\, \langle D_\mu U\, D^\mu U^\dagger \rangle$$  [7]


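with the gauge-covariant derivative

$$D_\mu U = \partial_\mu U - i g W_\mu U + i g' U \hat{B}_\mu, \qquad W_\mu = \frac{\vec{\tau}}{2}\cdot\vec{W}_\mu, \qquad \hat{B}_\mu = \frac{\tau_3}{2}\, B_\mu$$  [8]

where $\langle \dots \rangle$ denotes a (two-dimensional) trace. The matrix field $U(\phi)$ carries the nonlinear representation of the spontaneously broken gauge group and takes the value $U = 1$ in the unitary gauge. The Lagrangian [6] is invariant under local $SU(2)_L \times U(1)_Y$ transformations:

$$W_\mu \to g_L W_\mu g_L^\dagger + \frac{i}{g}\, g_L \partial_\mu g_L^\dagger, \qquad \hat{B}_\mu \to \hat{B}_\mu + \frac{i}{g'}\, g_R \partial_\mu g_R^\dagger$$
$$f_L \to g_L f_L, \qquad f_R \to g_R f_R, \qquad U \to g_L U g_R^\dagger$$  [9]

with

$$g_L(x) = \exp\bigl(i\,\vec{\alpha}_L(x)\cdot\vec{\tau}/2\bigr), \qquad g_R(x) = \exp\bigl(i\,\alpha_Y(x)\,\tau_3/2\bigr)$$

and $f_{L(R)}$ are quark and lepton fields grouped in doublets.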

As is manifest in the unitary gauge $U = 1$, the lowest-order Lagrangian of the EWEFT just implements the tree-level masses of gauge bosons ($M_W = M_Z \cos\theta_W = v g/2$, $\tan\theta_W = g'/g$) and fermions but does not carry any further information about the underlying mechanism of spontaneous gauge symmetry breaking. This information is first encoded in the couplings $a_i$ of the next-to-leading-order Lagrangian

$$\mathcal{L}^{(4)}_{\rm EWSB} = \sum_{i=0}^{14} a_i\, O_i$$  [10]

with monomials $O_i$ of $O(p^4)$ in the low-energy expansion. The Lagrangian [10] is the most general CP and $SU(2)_L \times U(1)_Y$ invariant Lagrangian of $O(p^4)$.
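
The statement that the lowest-order Lagrangian merely implements the tree-level gauge boson masses can be checked numerically. The sketch below evaluates the Goldstone term $\frac{v^2}{4}\langle D_\mu U D^\mu U^\dagger\rangle$ of [7] at $U = 1$ for a single Lorentz component (so metric signs are ignored) and compares it with $\frac{1}{2} M_W^2 (W_1^2 + W_2^2) + \frac{1}{2} M_Z^2 Z^2$, where $Z = \cos\theta_W W^3 - \sin\theta_W B$. The function names and the sample values of $g$, $g'$, $v$ and of the fields are illustrative assumptions.

```python
import math

# Pauli matrices tau_1, tau_2, tau_3 as nested lists.
TAU = (
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
)


def tree_level_masses(g, g_prime, v):
    """Return (M_W, M_Z, theta_W) from M_W = v g / 2 and tan(theta_W) = g'/g."""
    theta_w = math.atan2(g_prime, g)
    m_w = 0.5 * v * g
    m_z = m_w / math.cos(theta_w)
    return m_w, m_z, theta_w


def goldstone_term_unitary_gauge(w, b, g, g_prime, v):
    """(v^2/4) <D U (D U)^dagger> at U = 1 for a single Lorentz component.

    w = (W^1, W^2, W^3) and b = B are real field values. At U = 1,
    D U = -i X with the Hermitian matrix X = g W^a tau_a / 2 - g' B tau_3 / 2.
    """
    x = [[sum(0.5 * g * w[a] * TAU[a][r][c] for a in range(3))
          - 0.5 * g_prime * b * TAU[2][r][c]
          for c in range(2)] for r in range(2)]
    trace = sum(abs(x[r][c]) ** 2 for r in range(2) for c in range(2))  # <X X^dagger>
    return 0.25 * v ** 2 * trace


def mass_terms(w, b, g, g_prime, v):
    """(1/2) M_W^2 (W1^2 + W2^2) + (1/2) M_Z^2 Z^2 with Z = cos(th) W^3 - sin(th) B."""
    m_w, m_z, theta_w = tree_level_masses(g, g_prime, v)
    z = math.cos(theta_w) * w[2] - math.sin(theta_w) * b
    return 0.5 * m_w ** 2 * (w[0] ** 2 + w[1] ** 2) + 0.5 * m_z ** 2 * z ** 2


if __name__ == "__main__":
    # Illustrative inputs: g = 0.65, g' = 0.36, v = 246 GeV, arbitrary field values.
    fields, b_field = (0.3, -1.2, 0.7), 0.5
    print(goldstone_term_unitary_gauge(fields, b_field, 0.65, 0.36, 246.0))
    print(mass_terms(fields, b_field, 0.65, 0.36, 246.0))
```

Both expressions agree for arbitrary field values, which is the content of the relations $M_W = M_Z\cos\theta_W = vg/2$ and $\tan\theta_W = g'/g$.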

Instead of listing the full Lagrangian, we display three typical examples:

$$O_0 = \frac{v^2}{4}\, \langle T V_\mu \rangle^2, \qquad O_3 = -i g\, \langle W_{\mu\nu} [V^\mu, V^\nu] \rangle, \qquad O_5 = \langle V_\mu V^\mu \rangle^2$$  [11]

where

$$T = U \tau_3 U^\dagger, \qquad V_\mu = D_\mu U\, U^\dagger, \qquad W_{\mu\nu} = \frac{i}{g}\, [\partial_\mu - i g W_\mu,\ \partial_\nu - i g W_\nu]$$  [12]

In the unitary gauge, the monomials $O_i$ reduce to polynomials in the gauge fields. The three examples in eqn [11] start with quadratic, cubic, and quartic terms in the gauge fields, respectively. The strongest constraints exist for the coefficients of quadratic contributions from the Large Electron–Positron collider LEP1, less restrictive ones for the cubic self-couplings from LEP2, and none so far for the quartic ones.


Heavy-Quark Physics 


EFTs in this section are derived from the SM and they are of type 2A in the classification introduced previously. In a first step, one integrates out the W, Z, and top quark. Evolving down from $M_W$ to $m_b$, large logarithms $\alpha_s(m_b) \ln(M_W^2/m_b^2)$ are resummed into the Wilson coefficients. At the scale of the b-quark, QCD is still perturbative, so that at least a part of the amplitudes is calculable in perturbation theory. To separate the calculable part from the rest, the EFTs below perform an expansion in $1/m_Q$, where $m_Q$ is the mass of the heavy quark.

Heavy-quark EFTs offer several important 
advantages. 


1. Approximate symmetries that are hidden in full 
QCD appear in the expansion in $1/m_Q$.

2. Explicit calculations simplify in general, for 
example, the summing of large logarithms via 
renormalization group equations. 

3. The systematic separation of hard and soft effects 
for certain matrix elements (factorization) can be 
achieved much more easily. 


Heavy-Quark Effective Theory 


Heavy-quark effective theory (HQET) is reminiscent of the Foldy–Wouthuysen transformation (nonrelativistic expansion of the Dirac equation). It is a systematic expansion in $1/m_Q$, when $m_Q \gg \Lambda_{\rm QCD}$, the scale parameter of QCD. It can be applied to processes where the heavy quark remains essentially on shell: its velocity $v$ changes only by small amounts $\sim \Lambda_{\rm QCD}/m_Q$. In the hadron rest frame, the heavy quark is almost at rest and acts as a quasistatic source of gluons.

More quantitatively, one writes the heavy-quark momentum as $p^\mu = m_Q v^\mu + k^\mu$, where $v$ is the hadron 4-velocity ($v^2 = 1$) and $k$ is a residual momentum of $O(\Lambda_{\rm QCD})$. The heavy quark field $Q(x)$ is then decomposed with the help of energy projectors $P_v^\pm = (1 \pm \not{v})/2$ and employing a field redefinition:

$$Q(x) = e^{-i m_Q v\cdot x}\, \bigl(h_v(x) + H_v(x)\bigr)$$
$$h_v(x) = e^{i m_Q v\cdot x}\, P_v^+ Q(x)$$  [13]
$$H_v(x) = e^{i m_Q v\cdot x}\, P_v^- Q(x)$$

In the hadron rest frame, $h_v(x)$ and $H_v(x)$ correspond to the upper and lower components of $Q(x)$, respectively. With this redefinition, the heavy-quark Lagrangian is expressed in terms of a massless field $h_v$ and a “heavy” field $H_v$:

$$\mathcal{L}_Q = \bar{Q}\,(i\not{D} - m_Q)\, Q = \bar{h}_v\, i v\cdot D\, h_v - \bar{H}_v\, (i v\cdot D + 2 m_Q)\, H_v + \text{mixed terms}$$  [14]
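
The role of the energy projectors $P_v^\pm = (1 \pm \not{v})/2$ can be made concrete with a small numerical check. The sketch below, written with plain nested lists and assuming the Dirac representation of the $\gamma$ matrices and the metric $(+,-,-,-)$, verifies that $P_v^\pm$ are complementary projectors for a generic 4-velocity and that $P_v^+$ selects the upper two spinor components in the rest frame. All helper names are illustrative.

```python
import math


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def combine(a, b, ca=1.0, cb=1.0):
    """Return ca * a + cb * b element-wise."""
    n = len(a)
    return [[ca * a[i][j] + cb * b[i][j] for j in range(n)] for i in range(n)]


def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two matrices."""
    n = len(a)
    return max(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))


IDENTITY = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
ZERO = [[0] * 4 for _ in range(4)]

# gamma^0 .. gamma^3 in the Dirac representation.
GAMMA = (
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]],
    [[0, 0, 0, 1], [0, 0, 1, 0], [0, -1, 0, 0], [-1, 0, 0, 0]],
    [[0, 0, 0, -1j], [0, 0, 1j, 0], [0, 1j, 0, 0], [-1j, 0, 0, 0]],
    [[0, 0, 1, 0], [0, 0, 0, -1], [-1, 0, 0, 0], [0, 1, 0, 0]],
)


def slash(v):
    """v-slash = v^0 gamma^0 - v^1 gamma^1 - v^2 gamma^2 - v^3 gamma^3."""
    signs = (1, -1, -1, -1)
    return [[sum(signs[mu] * v[mu] * GAMMA[mu][i][j] for mu in range(4))
             for j in range(4)] for i in range(4)]


def projectors(v):
    """Return (P_plus, P_minus) = ((1 + v-slash)/2, (1 - v-slash)/2)."""
    v_slash = slash(v)
    return combine(IDENTITY, v_slash, 0.5, 0.5), combine(IDENTITY, v_slash, 0.5, -0.5)


def four_velocity(beta):
    """Four-velocity (gamma, gamma * beta) for a 3-velocity beta with |beta| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - sum(x * x for x in beta))
    return (gamma,) + tuple(gamma * x for x in beta)


if __name__ == "__main__":
    p_plus, p_minus = projectors(four_velocity((0.3, -0.2, 0.5)))
    print("P+ P+ - P+ :", max_abs_diff(matmul(p_plus, p_plus), p_plus))
    print("P+ P-      :", max_abs_diff(matmul(p_plus, p_minus), ZERO))
```

Because $\not{v}^2 = v^2 = 1$, the two matrices satisfy $P_v^\pm P_v^\pm = P_v^\pm$ and $P_v^+ P_v^- = 0$, which is what allows the clean split of $Q$ into $h_v$ and $H_v$.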


At the semiclassical level, the field $H_v$ can be eliminated by using the QCD field equation $(i\not{D} - m_Q)\, Q = 0$, yielding the nonlocal expression

$$\mathcal{L}_Q = \bar{h}_v\, i v\cdot D\, h_v + \bar{h}_v\, i\not{D}_\perp\, \frac{1}{i v\cdot D + 2 m_Q}\, i\not{D}_\perp\, h_v$$  [15]

with $D_\perp^\mu = (g^{\mu\nu} - v^\mu v^\nu) D_\nu$. The field redefinition in [13] ensures that, in the heavy-hadron rest frame, derivatives of $h_v$ give rise to small momenta of $O(\Lambda_{\rm QCD})$ only. The Lagrangian [15] is the starting point for a systematic expansion in $1/m_Q$.

To leading order in $1/m_Q$ ($Q = b, c$), the Lagrangian

$$\mathcal{L}_{b,c} = \bar{b}_v\, i v\cdot D\, b_v + \bar{c}_v\, i v\cdot D\, c_v$$  [16]


exhibits two important approximate symmetries of HQET: the flavor symmetry $SU(2)_F$ relating heavy quarks moving with the same velocity and the heavy-quark spin symmetry generating an overall SU(4) spin-flavor symmetry. The flavor symmetry is obvious and the spin symmetry is due to the absence of Dirac matrices in [16]: both spin degrees of freedom couple to gluons in the same way. The simplest spin-symmetry doublet consists of a pseudoscalar meson $H$ and the associated vector meson $H^*$. Denoting the doublet by $\mathcal{H}$, the matrix elements of the heavy-to-heavy transition current are determined to leading order in $1/m_Q$ by a single form factor, up to Clebsch–Gordan coefficients:

$$\langle \mathcal{H}(v') |\, \bar{h}_{v'}\, \Gamma\, h_v\, | \mathcal{H}(v) \rangle \sim \xi(v\cdot v')$$  [17]

$\Gamma$ is an arbitrary combination of Dirac matrices and the form factor $\xi$ is the so-called Isgur–Wise function. Moreover, since $\bar{h}_v \gamma^\mu h_v$ is the Noether current of heavy-flavor symmetry, the Isgur–Wise function is fixed in the no-recoil limit $v' = v$ to be $\xi(v\cdot v' = 1) = 1$. The semileptonic decays $B \to D\, l\, \nu_l$ and $B \to D^*\, l\, \nu_l$ are therefore governed by a single normalized form factor to leading order in $1/m_Q$, with important consequences for the determination of the Cabibbo–Kobayashi–Maskawa (CKM) matrix element $V_{cb}$.

The HQET Lagrangian is superficially frame dependent. Since the SM is Lorentz invariant, the HQET Lagrangian must be independent of the choice of the frame vector $v$. Therefore, a shift in $v$ accompanied by corresponding shifts of the fields $h_v$ and of the covariant derivatives must leave the Lagrangian invariant. This reparametrization invariance is unaffected by renormalization and it relates coefficients with different powers in $1/m_Q$.


Soft-Collinear Effective Theory 


HQET is not applicable in heavy-quark decays where some of the light particles in the final state have momenta of $O(m_Q)$, for example, for inclusive decays like $B \to X_s \gamma$ or exclusive ones like $B \to \pi\pi$. In recent years, a systematic heavy-quark expansion for heavy-to-light decays has been set up in the form of soft-collinear effective theory (SCET).

SCET is more complicated than HQET because 
now the low-energy theory involves more than one 
scale. In the SCET Lagrangian, a light quark or 
gluon field is represented by several effective fields. 
In addition to the soft fields $h_v$ in [15], the so-called
collinear fields enter that have large energy and 
carry large momentum in the direction of the light 
hadrons in the final state. 

In addition to the frame vector $v$ of HQET ($v = (1,0,0,0)$ in the heavy-hadron rest frame), SCET introduces a lightlike reference vector $n$ in the direction of the jet of energetic light particles (for inclusive decays), for example, $n = (1,0,0,1)$. All momenta $p$ are decomposed in terms of light-cone coordinates $(p_+, p_-, p_\perp)$ with

$$p^\mu = \frac{n\cdot p}{2}\, \bar{n}^\mu + \frac{\bar{n}\cdot p}{2}\, n^\mu + p_\perp^\mu = p_+^\mu + p_-^\mu + p_\perp^\mu$$  [18]

where $\bar{n} = 2v - n = (1,0,0,-1)$. For large energies, the three light-cone components are widely separated, with $p_- = O(m_Q)$ being large while $p_\perp$ and $p_+$ are small. Introducing a small parameter $\lambda \sim p_\perp/p_-$, the light-cone components of (hard-)collinear particles scale like $(p_+, p_-, p_\perp) = m_Q(\lambda^2, 1, \lambda)$. Thus, there are three different scales in the problem compared to only two in HQET. For exclusive decays, the situation is even more involved.
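
The light-cone decomposition [18] and the quoted scaling can be illustrated with a short sketch. It assumes the metric $(+,-,-,-)$, the reference vectors $n = (1,0,0,1)$ and $\bar{n} = (1,0,0,-1)$, and a massless collinear momentum with $\bar{n}\cdot p = m_Q$ and transverse size $\lambda\, m_Q$; the on-shell condition then forces $n\cdot p = \lambda^2 m_Q$. Function names and the sample values $m_Q = 4.8$ GeV, $\lambda = 0.1$ are illustrative assumptions.

```python
N = (1.0, 0.0, 0.0, 1.0)       # lightlike reference vector n
NBAR = (1.0, 0.0, 0.0, -1.0)   # nbar = 2 v - n with v = (1, 0, 0, 0)


def mdot(a, b):
    """Minkowski product with metric (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]


def light_cone_components(p):
    """Split p into (p_plus, p_minus, p_perp) as in eqn [18]."""
    n_dot_p = mdot(N, p)
    nbar_dot_p = mdot(NBAR, p)
    p_plus = tuple(0.5 * n_dot_p * x for x in NBAR)
    p_minus = tuple(0.5 * nbar_dot_p * x for x in N)
    p_perp = tuple(pi - a - b for pi, a, b in zip(p, p_plus, p_minus))
    return p_plus, p_minus, p_perp


def collinear_momentum(m_q, lam):
    """Massless momentum with nbar.p = m_q and |p_perp| = lam * m_q (along x)."""
    nbar_dot_p = m_q
    perp = lam * m_q
    n_dot_p = perp ** 2 / nbar_dot_p  # on-shell condition p^2 = 0
    energy = 0.5 * (n_dot_p + nbar_dot_p)
    p_z = 0.5 * (nbar_dot_p - n_dot_p)
    return (energy, perp, 0.0, p_z)


if __name__ == "__main__":
    m_q, lam = 4.8, 0.1  # illustrative values
    p = collinear_momentum(m_q, lam)
    print("n.p / m_Q    =", mdot(N, p) / m_q)      # scales like lambda^2
    print("nbar.p / m_Q =", mdot(NBAR, p) / m_q)   # scales like 1
    print("p_perp / m_Q =", light_cone_components(p)[2][1] / m_q)  # scales like lambda
```

The identity $p^2 = (n\cdot p)(\bar{n}\cdot p) + p_\perp^2$ is what ties the smallest component to $\lambda^2$ once the other two are fixed.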

The SCET Lagrangian is obtained from the full theory by an expansion in powers of $\lambda$. In addition to the heavy quark field $h_v$, one introduces soft as well as collinear quark and gluon fields by field redefinitions so that the various fields have momentum components that scale appropriately with $\lambda$.

Similar to HQET, the leading-order Lagrangian of 
SCET exhibits again approximate symmetries that 
can lead to a reduction of form factors describing 
heavy-to-light decays. As in HQET, reparametriza- 
tion invariance implements Lorentz invariance and 
results in stringent constraints on subleading correc- 
tions in SCET. 

An important result of SCET is the proof of factorization theorems to all orders in $\alpha_s$. For inclusive decays, the differential rate is of the form

$$d\Gamma \sim H\, J \otimes S$$  [19]

where $H$ contains the hard corrections. The so-called jet function $J$ sensitive to the collinear region is convoluted with the shape function $S$ representing the soft contributions. At leading order, the shape function drops out in the ratio of weighted decay spectra for $B \to X_u\, l\, \nu_l$ and $B \to X_s \gamma$, allowing for a determination of the CKM matrix element $V_{ub}$. Factorization theorems have become available for an increasing number of processes, most recently also for exclusive decays of $B$ into two light mesons.


Nonrelativistic QCD 


In HQET the kinetic energy of the heavy quark appears as a small correction of $O(\Lambda_{\rm QCD}^2/m_Q)$. For systems with more than one heavy quark, the kinetic energy cannot be treated as a perturbation in general. For instance, the virial theorem implies that the kinetic energy in quarkonia $\bar{Q}Q$ is of the same order as the binding energy of the bound state.

NRQCD, the EFT for heavy quarkonia, is an extension of HQET. The Lagrangian for NRQCD coincides with HQET in the bilinear sector of the heavy-quark fields but it also includes quartic interactions between quarks and antiquarks. The relevant expansion parameter in this case is the relative velocity between $Q$ and $\bar{Q}$. In contrast to HQET, there are at least three widely separate scales in heavy quarkonia: in addition to $m_Q$, the relative momentum of the bound quarks $p \sim m_Q v$ with $v \ll 1$ and the typical kinetic energy $E \sim m_Q v^2$. The main challenges are to derive the quark–antiquark potential directly from QCD and to describe quarkonium production and decay at collider experiments. In the abelian case, the corresponding EFT for quantum electrodynamics (QED) is called NRQED; it has been used to study electromagnetically bound systems like the hydrogen atom, positronium, muonium, etc.
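
The hierarchy of scales in heavy quarkonia can be made tangible with a small sketch. The heavy-quark masses and values of $v^2$ used below (roughly $m_c \approx 1.5$ GeV with $v^2 \approx 0.3$ and $m_b \approx 4.7$ GeV with $v^2 \approx 0.1$) are rough illustrative assumptions, not numbers taken from the text, and the function name is hypothetical.

```python
import math


def quarkonium_scales(m_q_gev: float, v_squared: float) -> dict:
    """Return the hard, soft and ultrasoft scales m_Q, m_Q v and m_Q v^2 in GeV."""
    if not 0.0 < v_squared < 1.0:
        raise ValueError("a nonrelativistic system needs 0 < v^2 < 1")
    v = math.sqrt(v_squared)
    return {
        "hard": m_q_gev,                    # heavy-quark mass m_Q
        "soft": m_q_gev * v,                # relative momentum p ~ m_Q v
        "ultrasoft": m_q_gev * v_squared,   # kinetic energy E ~ m_Q v^2
    }


if __name__ == "__main__":
    # Illustrative inputs only (assumed, not from the source).
    for name, mass, v2 in (("charmonium", 1.5, 0.3), ("bottomonium", 4.7, 0.1)):
        scales = quarkonium_scales(mass, v2)
        print(name, {key: round(value, 2) for key, value in scales.items()})
```

With these assumed inputs the separation of scales is visibly better for bottomonium than for charmonium, which is why the latter is the harder case for a systematic EFT treatment.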

In NRQCD only the hard degrees of freedom with momenta $\sim m_Q$ are integrated out. Therefore, NRQCD is not enough for a systematic computation of heavy-quarkonium properties. Because the nonrelativistic fluctuations of order $m_Q v$ and $m_Q v^2$ have not been separated, the power counting in NRQCD is ambiguous in higher orders.

To overcome those deficiencies, two approaches have been put forward: potential NRQCD (pNRQCD) and velocity NRQCD (vNRQCD). In pNRQCD, a two-step procedure is employed for integrating out quark and gluon degrees of freedom:

    QCD        $\Lambda > m_Q$
     ⇓
    NRQCD      $m_Q > \Lambda > m_Q v$
     ⇓
    pNRQCD     $m_Q v > \Lambda > m_Q v^2$

The resulting EFT derives its name from the fact that the four-quark interactions generated in the matching procedure are the potentials that can be used in Schrödinger perturbation theory. It is claimed that pNRQCD can also be used in the nonperturbative domain where $\alpha_s(m_Q v^2)$ is of order 1 or larger. The advantage would be that also charmonium becomes accessible to a systematic EFT analysis.

The alternative approach of vNRQCD is only applicable in the fully perturbative regime when $m_Q \gg m_Q v \gg m_Q v^2 \gg \Lambda_{\rm QCD}$ is valid. It separates the different degrees of freedom in a single step leaving only ultrasoft energies and momenta of $O(m_Q v^2)$ as continuous variables. The separation of larger scales proceeds in a similar fashion as in HQET via field redefinitions. A systematic nonrelativistic power counting in the velocity $v$ is implemented.


The Standard Model at Low Energies 


At energies below 1 GeV, hadrons — rather than quarks 
and gluons — are the relevant degrees of freedom. 
Although the strong interactions are highly nonpertur- 
bative in the confinement region, Green functions and 
amplitudes are amenable to a systematic low-energy 
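expansion. The key observation is that the QCD Lagrangian with $N_f = 2$ or 3 light quarks,

$$\mathcal{L}_{\rm QCD} = \bar{q}\,(i\not{D} - \mathcal{M}_q)\, q - \frac{1}{4}\, G^a_{\mu\nu} G^{a\,\mu\nu} + \mathcal{L}_{\rm heavy\ quarks}$$
$$\qquad = \bar{q}_L\, i\not{D}\, q_L + \bar{q}_R\, i\not{D}\, q_R - \bar{q}_L \mathcal{M}_q q_R - \bar{q}_R \mathcal{M}_q q_L + \dots$$  [20]
$$q_{R,L} = \tfrac{1}{2}(1 \pm \gamma_5)\, q, \qquad q^T = (u\ d\ [s])$$

exhibits a global symmetry

$$\underbrace{SU(N_f)_L \times SU(N_f)_R}_{\text{chiral group } G} \times U(1)_V \times U(1)_A$$  [21]

in the limit of $N_f$ massless quarks ($\mathcal{M}_q = 0$). At the hadronic level, the quark number symmetry $U(1)_V$ is realized as baryon number. The axial $U(1)_A$ is not a symmetry at the quantum level due to the abelian anomaly.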

Although not yet derived from first principles, there are compelling theoretical and phenomenological arguments that the ground state of QCD is not even approximately chirally symmetric. All evidence, such as the existence of relatively light pseudoscalar mesons, points to spontaneous chiral symmetry breaking $G \to SU(N_f)_V$, where $SU(N_f)_V$ is the diagonal subgroup of $G$. The resulting $N_f^2 - 1$ (pseudo-)Goldstone bosons interact weakly at low energies. In fact, Goldstone’s theorem ensures that purely mesonic or single-baryon amplitudes vanish in the chiral limit ($\mathcal{M}_q = 0$) when the momenta of all pseudoscalar mesons tend to zero. This is the basis for a systematic low-energy expansion of Green functions and amplitudes. The corresponding EFT (type 3B in our classification) is called chiral perturbation theory (CHPT) (Weinberg 1979, Gasser and Leutwyler 1984, 1985).

Although the construction of effective Lagran- 
gians with nonlinearly realized chiral symmetry is 
well understood, there are some subtleties involved. 
First of all, there may be terms in a chiral-invariant 
action that cannot be written as the four- 
dimensional integral of an invariant Lagrangian. 
The chiral anomaly for SU(3) × SU(3) bears witness of this fact and gives rise to the Wess–Zumino–Witten action. A general theorem to account for such exceptional cases is due to D’Hoker and Weinberg (1994). Consider the most general action for Goldstone fields with symmetry group $G$, spontaneously broken to a subgroup $H$. The only possible non-$G$-invariant terms in the Lagrangian that give rise to a $G$-invariant action are in one-to-one correspondence with the generators of the fifth cohomology group $H^5(G/H; \mathbb{R})$ of the coset manifold $G/H$. For the relevant case of chiral SU(N), the coset space $SU(N)_L \times SU(N)_R / SU(N)_V$ is itself an SU(N) manifold. For $N \ge 3$, $H^5(SU(N); \mathbb{R})$ has a single generator that corresponds precisely to the Wess–Zumino–Witten term.

At a still deeper level, one may ask whether chiral- 
invariant Lagrangians are sufficient (except for the 
anomaly) to describe the low-energy structure of 
Green functions as dictated by the chiral Ward 
identities of QCD. To be able to calculate such 
Green functions in general, the global chiral sym- 
metry of QCD is extended to a local symmetry by 
the introduction of external gauge fields. The 
following invariance theorem (Leutwyler 1994) 
provides an answer to the above question. Except for the anomaly, the most general solution of the Ward identities for a spontaneously broken symmetry in Lorentz-invariant theories can be obtained
from gauge-invariant Lagrangians to all orders in 
the low-energy expansion. The restriction to Lorentz 
invariance is crucial: the theorem does not hold in 
general in nonrelativistic effective theories. 


Chiral Perturbation Theory 


The effective chiral Lagrangian of the SM in the meson sector is displayed in Table 1. The lowest-order Lagrangian for the purely strong interactions is given by

$$\mathcal{L}_{p^2} = \frac{F^2}{4}\, \langle D_\mu U\, D^\mu U^\dagger \rangle + \frac{F^2 B}{2}\, \langle (s + ip)\, U^\dagger + (s - ip)\, U \rangle$$  [22]

with a covariant derivative $D_\mu U = \partial_\mu U - i(v_\mu + a_\mu) U + i U (v_\mu - a_\mu)$. The first term has the familiar form [7] of the gauged nonlinear $\sigma$ model, with the matrix field $U(\phi)$ transforming as $U \to g_R U g_L^\dagger$ under chiral rotations. External fields $v_\mu, a_\mu, s, p$ are introduced for constructing the generating functional of Green functions of quark currents. To implement explicit chiral symmetry breaking, the scalar field $s$ is set equal to the quark mass matrix $\mathcal{M}_q$ at the end of the calculation.
The leading-order Lagrangian has two free parameters $F$, $B$ related to the pion decay constant and to the quark condensate, respectively:

$$F_\pi = F\,[1 + O(m_q)]$$
$$\langle 0 | \bar{u} u | 0 \rangle = -F^2 B\,[1 + O(m_q)]$$  [23]

The Lagrangian [22] gives rise to $M_\pi^2 = B(m_u + m_d)$ at lowest order. From detailed studies of pion–pion scattering (Colangelo et al. 2001), we know that the leading term accounts for at least 94% of the pion mass. This supports the standard counting of CHPT, with quark masses booked as $O(p^2)$ like the two-derivative term in [22].

Table 1 The effective chiral Lagrangian of the SM in the meson sector

$\mathcal{L}_{\rm chiral\ order}$ (# of LECs)                                                                                    Loop order

$\mathcal{L}_{p^2}(2) + \mathcal{L}^{\Delta S=1}_{G_F p^2}(2) + \mathcal{L}^{\rm em}_{e^2 p^0}(1) + \mathcal{L}^{\rm emweak}_{G_8 e^2 p^0}(1)$                                   $L = 0$
$+\ \mathcal{L}_{p^4}(10) + \mathcal{L}^{\rm odd}_{p^6}(32) + \mathcal{L}^{\Delta S=1}_{G_8 p^4}(22) + \mathcal{L}^{\Delta S=1}_{G_{27} p^4}(28)$                                $L = 1$
$+\ \mathcal{L}^{\rm em}_{e^2 p^2}(14) + \mathcal{L}^{\rm emweak}_{G_8 e^2 p^2}(14) + \mathcal{L}^{\rm leptons}_{e^2 p^2}(5)$
$+\ \mathcal{L}_{p^6}(90)$                                                                                                       $L = 2$

The numbers in brackets refer to the number of independent couplings for $N_f = 3$. The parameter-free Wess–Zumino–Witten action $S_{\rm WZW}$ that cannot be written as the four-dimensional integral of an invariant Lagrangian must be added.
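
The lowest-order relations $M_\pi^2 = B(m_u + m_d)$ and $\langle 0|\bar{u}u|0\rangle = -F^2 B$ can be turned into a quick numerical estimate. The inputs below ($M_\pi \approx 135$ MeV, $m_u + m_d \approx 7$ MeV, $F \approx 92$ MeV) are rough illustrative assumptions; the light quark masses in particular are scheme and scale dependent, so only the order of magnitude of the output is meaningful. Function names are hypothetical.

```python
def low_energy_constant_b_mev(m_pi_mev: float, quark_mass_sum_mev: float) -> float:
    """B from the lowest-order relation M_pi^2 = B (m_u + m_d), in MeV."""
    return m_pi_mev ** 2 / quark_mass_sum_mev


def quark_condensate_mev3(f_mev: float, b_mev: float) -> float:
    """Lowest-order quark condensate <0|u-bar u|0> = -F^2 B, in MeV^3."""
    return -(f_mev ** 2) * b_mev


if __name__ == "__main__":
    # Assumed, illustrative inputs (not taken from the source).
    b = low_energy_constant_b_mev(135.0, 7.0)
    condensate = quark_condensate_mev3(92.0, b)
    print(f"B ~ {b:.0f} MeV")
    print(f"condensate ~ -({abs(condensate) ** (1.0 / 3.0):.0f} MeV)^3")
```

The large value of $B$ compared with the quark masses reflects how a few MeV of explicit symmetry breaking are amplified into a pion mass of more than 100 MeV.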

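The effective chiral Lagrangian in Table 1 contains the following parts:

1. strong interactions: $\mathcal{L}_{p^2}$, $\mathcal{L}_{p^4}$, $\mathcal{L}^{\rm odd}_{p^6}$, $\mathcal{L}_{p^6}$ (+ $S_{\rm WZW}$);
2. nonleptonic weak interactions to first order in the Fermi coupling constant $G_F$: $\mathcal{L}^{\Delta S=1}_{G_F p^2}$, $\mathcal{L}^{\Delta S=1}_{G_8 p^4}$, $\mathcal{L}^{\Delta S=1}_{G_{27} p^4}$;
3. radiative corrections for strong processes: $\mathcal{L}^{\rm em}_{e^2 p^0}$, $\mathcal{L}^{\rm em}_{e^2 p^2}$;
4. radiative corrections for nonleptonic weak decays: $\mathcal{L}^{\rm emweak}_{G_8 e^2 p^0}$, $\mathcal{L}^{\rm emweak}_{G_8 e^2 p^2}$; and
5. radiative corrections for semileptonic weak decays: $\mathcal{L}^{\rm leptons}_{e^2 p^2}$.
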
Beyond the leading order, unitarity and analyticity require the inclusion of loop contributions. In the purely strong sector, calculations have been performed up to next-to-next-to-leading order. Figure 1 shows the corresponding skeleton diagrams of $O(p^6)$, with full lowest-order tree structures to be attached to propagators and vertices. The coupling constants of the various Lagrangians in Table 1 absorb the divergences from loop diagrams leading to finite renormalized Green functions with scale-dependent couplings, the so-called low-energy constants (LECs). As in all EFTs, the LECs parametrize the effect of “heavy” degrees of freedom that are not represented explicitly in the EFT Lagrangian. Determination of those LECs is a major task for CHPT. In addition to phenomenological information, further theoretical input is needed. Lattice gauge theory has already furnished values for some LECs. To bridge the gap between the low-energy domain of CHPT and the perturbative domain of QCD, large-$N_c$ motivated interpolations with meson resonance exchange have been used successfully to pin down some of the LECs. Especially in cases where the knowledge of LECs is limited, renormalization group methods provide valuable information. As in renormalizable quantum field theories, the leading chiral logs $(\ln M^2/\mu^2)^L$ with a typical meson mass $M$, renormalization scale $\mu$ and loop order $L$ can in principle be determined from one-loop diagrams only. In contrast to the renormalizable situation, new derivative structures (and quark mass insertions) occur at each loop order preventing a straightforward resummation of chiral logs.

Figure 1 Skeleton diagrams of $O(p^6)$. Normal vertices are from $\mathcal{L}_{p^2}$, crossed circles and the full square denote vertices from $\mathcal{L}_{p^4}$ and $\mathcal{L}_{p^6}$, respectively.

Among the many applications of CHPT in the meson sector are the determination of quark mass ratios and the analysis of pion–pion scattering where the chiral amplitude of next-to-next-to-leading order has been combined with dispersion theory (Roy equations). Of increasing importance for precision physics (CKM matrix elements, $(g-2)_\mu$, ...) are isospin-violating corrections including radiative corrections, where CHPT provides the only reliable approach in the low-energy region. Such corrections are also essential for the analysis of hadronic atoms like pionium, a $\pi^+\pi^-$ bound state.

CHPT has also been applied extensively in the single-baryon sector. There are several differences to the purely mesonic case. For instance, the chiral expansion proceeds more slowly and the nucleon mass $m_N$ provides a new scale that does not vanish in the chiral limit. The formulation of heavy-baryon CHPT was modeled after HQET integrating out the nucleon modes of $O(m_N)$. To improve the convergence of the chiral expansion in some regions of phase space, a manifestly Lorentz-invariant formulation has been set up more recently (relativistic baryon CHPT). Many single-baryon processes have been calculated to fourth order in both approaches, for example, pion–nucleon scattering. With similar methods as in the mesonic sector, hadronic atoms like pionic or kaonic hydrogen have been investigated.


Nuclear Physics 


In contrast to the meson and single-baryon sectors, 
amplitudes with two or more nucleons do not vanish 
in the chiral limit when the momenta of Goldstone 
mesons tend to zero. Consequently, the power 
counting is different in the many-nucleon sector. 
Multinucleon processes are treated with different 
EFTs depending on whether all momenta are smaller 
or larger than the pion mass. 

In the very low energy regime |p| < M,, pions or 
other mesons do not appear as dynamical degrees of 
freedom. The resulting EFT is called “pionless EFT” 
and it describes systems like the deuteron, where the 
typical nucleon momenta are $\sim \sqrt{m_N B_d} \simeq 45$ MeV 
($B_d$ is the binding energy of the deuteron). The 
Lagrangian for the strong interactions between two 
nucleons has the form 


$$\mathcal{L}_{NN} = C_0\, (N^T P_i N)^\dagger\, N^T P_i N + \cdots \qquad [24]$$


where $P_i$ are spin-isospin projectors and higher-order 
terms contain derivatives of the nucleon fields. 
The existence of bound states implies that at least 
part of the EFT Lagrangian must be treated 
nonperturbatively. Pionless EFT is an extension of 
effective-range theory that has long been used in 
nuclear physics. It has been applied successfully 
especially to the deuteron but also to more complicated 
few-nucleon systems like the $Nd$ and $n\alpha$ 
systems. For instance, precise results for $Nd$ scattering 
have been obtained with parameters fully 
determined from NN scattering. Pionless EFT has 
also been applied to the so-called halo nuclei, where 
a tight cluster of nucleons (like $^4$He) is surrounded 
by one or more “halo” nucleons. 
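
The momentum scale quoted above for the deuteron can be checked with a few lines of arithmetic. The sketch below assumes an average nucleon mass of about 938.9 MeV, a deuteron binding energy of about 2.2246 MeV, and a charged-pion mass of about 139.57 MeV; these are standard values that are not given in the text, and the function names are illustrative only.

```python
import math

# Assumed inputs (standard values, not taken from the text), all in MeV.
NUCLEON_MASS = 938.9        # average of proton and neutron masses
DEUTERON_BINDING = 2.2246   # deuteron binding energy B_d
PION_MASS = 139.57          # charged-pion mass M_pi


def deuteron_momentum_scale(m_n: float = NUCLEON_MASS,
                            b_d: float = DEUTERON_BINDING) -> float:
    """Typical nucleon momentum sqrt(m_N * B_d) in the deuteron, in MeV."""
    return math.sqrt(m_n * b_d)


def is_pionless_regime(momentum: float, m_pi: float = PION_MASS) -> bool:
    """True when the momentum lies below the pion mass."""
    return momentum < m_pi


if __name__ == "__main__":
    p = deuteron_momentum_scale()
    print(f"sqrt(m_N * B_d) = {p:.1f} MeV; pionless regime: {is_pionless_regime(p)}")
```

With these inputs the scale comes out well below the pion mass, which is why the deuteron is a natural testing ground for the pionless theory.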

In the regime $|p| > M_\pi$, the pion must be included as 
a dynamical degree of freedom. With some modifications 
in the power counting, the corresponding EFT is 
based on the approach of Weinberg (1990, 1991), who 
applied the usual rules of the meson and single-nucleon 
sectors to the nucleon-nucleon potential (instead of 
the scattering amplitude). The potential is then to be 
inserted into a Schrödinger equation to calculate 
physical observables. The systematic power counting 
leads to a natural hierarchy of nuclear forces, with 
only two-nucleon forces appearing up to next-to- 
leading order. Three- and four-nucleon forces arise at 
third and fourth order, respectively. 

Significant progress has been achieved in the 
phenomenology of few-nucleon systems. The two- 
and $n$-nucleon ($3 \le n \le 6$) sectors have been pushed to 
fourth and third order, respectively, with encouraging 
signs of “convergence.” Compton scattering off the 
deuteron, $\pi d$ scattering, nuclear parity violation, solar 
fusion, and other processes have been investigated in 
the EFT approach. The quark mass dependence of the 
nucleon-nucleon interaction has also been studied. 


See also: Anomalies; Electroweak Theory; High $T_c$ 
Superconductor Theory; Noncommutative Geometry 
and the Standard Model; Operator Product Expansion 
in Quantum Field Theory; Perturbation Theory and its 
Techniques; Quantum Chromodynamics; Quantum 
Electrodynamics and its Precision Tests; 
Renormalization: General Theory; Seiberg-Witten 
Theory; Standard Model of Particle Physics; Symmetries 
Theory. 
Introduction 


This article is an introduction to eigenfunctions of 
quantum completely integrable (QCI) systems. For 
these systems, one can understand asymptotics of 
eigenfunctions better than for other systems, so it is 
natural to study them. It is useful to begin the 
discussion with the most important geometric example, 
given by the quantum Hamiltonian $P_1 = \sqrt{\Delta}$. 
We fix a basis of eigenfunctions, $\varphi_j$, $j = 1, 2, \ldots$, with 

$$\sqrt{\Delta}\, \varphi_j = \lambda_j \varphi_j, \qquad \langle \varphi_i, \varphi_j \rangle = \delta_{ij}$$

and assume that there exist functionally independent 
(pseudo)differential operators $P_2, \ldots, P_n$ with the 
property that 

$$[P_i, P_j] = 0, \qquad i, j = 1, \ldots, n$$

In this case, $P_1$ is said to be QCI and the operators 
$P_k$, $k = 1, \ldots, n$, can be simultaneously diagonalized. It 
is therefore natural to study the special basis of Laplace 
eigenfunctions which are joint eigenvectors of the $P_k$'s. 
From now on, the $\varphi_j$'s are always assumed to be 
joint eigenfunctions of the commuting operators 
$P_k$, $k = 1, \ldots, n$. The classical observables corresponding 
to the operators $P_k$, $k = 1, \ldots, n$, are the respective 
principal symbols, $p_k \in C^\infty(T^*M)$, $k = 1, \ldots, n$. In 
particular, the bicharacteristic flow of $p_1(x, \xi) = |\xi|_g$ 
is the classical “geodesic flow” 

$$G^t : T^*M \to T^*M$$


Examples of manifolds with QCI Laplacians include 
tori and spheres of revolution, Liouville metrics on tori 
and spheres, large families of metrics on homogeneous 
spaces, as well as hyperellipsoids with distinct axes in 
arbitrary dimension. There are also many inhomoge- 
neous QCI examples (see the next section). It is of 
interest to understand the asymptotics of both eigen- 
values and eigenfunctions. There is a large literature 
devoted to eigenvalue asymptotics, including trace 
formulas and Bohr-Sommerfeld rules (see Colin de 
Verdiere (1994a, b), Helffer and Sjoestrand (1990), 
and Colin de Verdiere and Vu Ngoc (2003)). We will 
concentrate here on the corresponding problem of 
determining eigenfunction asymptotics. The key property 
of eigenfunctions in the QCI case is localization in 
phase space, $T^*M$. This allows one to study 
concentration and blow-up properties more effectively 
than in any other setting. It is important to contrast 
this with, for example, the situation in the ergodic 
case. Moreover, in the QCI case, there is a particularly 
strong connection between the dynamics of the geodesic 
flow, $G^t : T^*M \to T^*M$, and the asymptotics of individual 
eigenfunctions. In the general case, one can 
usually only relate the dynamics to spectral averages, 
such as in the trace formula (Duistermaat and 
Guillemin 1975). 

For the most part, the literature on eigenfunction 
asymptotics addresses the following basic problems: 


1. determining sharp upper and lower bounds for $\varphi_j$ 
as $\lambda_j \to \infty$ and 

2. describing the link between the blow-up properties 
of $\varphi_j$ as $\lambda_j \to \infty$ and the dynamics of the 
geodesic flow, $G^t$. 


The starting point in the study of eigenfunction 
asymptotics in the QCI case is the fact that the joint 
eigenfunctions, $\varphi_j$, have masses that localize on the 
level sets $P^{-1}(b) := \{(x, \xi) \in T^*M;\ p_j(x, \xi) = b_j,\ j = 1, \ldots, n\}$. 
Moreover, by the Liouville-Arnol'd theorem, 
for generic levels (indexed by $b \in \mathbb{R}^n$), 

$$P^{-1}(b) = \bigcup_{k} \Lambda^{(k)}(b) \qquad [1]$$


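where the $\Lambda^{(k)}(b) \subset T^*M$ are Lagrangian tori. The 
affine symplectic coordinates in a neighborhood of 
$\Lambda^{(k)}(b)$ are called “action-angle variables” 
$(\theta_1^{(k)}, \ldots, \theta_n^{(k)}; I_1^{(k)}, \ldots, I_n^{(k)}) \in T^n \times \mathbb{R}^n$. Written in terms of 
these coordinates, the classical Hamilton equations 
defining the geodesic flow assume the form 

$$\frac{d\theta_j^{(k)}}{dt} = \frac{\partial p_1}{\partial I_j^{(k)}}, \qquad \frac{dI_j^{(k)}}{dt} = 0 \qquad [2]$$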


and this system of ordinary differential equations 
(ODEs) is solved by quadrature. This explains why 
one refers to such systems as completely integrable. 
At the quantum level, one can construct semiclassical 
Lagrangian distributions, 

$$\Phi_\lambda^{(k)}(x) := \int e^{i \lambda \phi^{(k)}(x, \tau)}\, a^{(k)}(x, \tau; \lambda)\, d\tau$$


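which microlocally concentrate on $\Lambda^{(k)}(b)$ as $\lambda \to \infty$ 
and satisfy $P_j \Phi_\lambda^{(k)} = b_j \lambda \Phi_\lambda^{(k)} + O(\lambda^{-\infty})$ in $L^2(M)$. An 
important fact is that the actual joint eigenfunctions, 
$\varphi_j$, are approximated to $O(\lambda^{-\infty})$-accuracy in $L^2(M)$ by 
suitable linear combinations of the quasimodes, $\Phi_\lambda^{(k)}$. 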
However, there are subtleties underlying this correspon- 
dence which are often neglected in the physics literature: 


3. The actual joint eigenfunctions $\varphi_j$ localize on the 
level sets $P^{-1}(b)$ which usually consist of many 
connected components. Consequently, the eigenfunctions 
are approximated by (sometimes large) 
linear combinations of Lagrangian quasimodes 
attached to the different component tori. The 
precise splitting of mass amongst these different 
components is a difficult and, in general, 
unsolved problem in microlocal tunneling. 

4. The local torus foliation given by action-angle 
variables tends to degenerate and Lagrangian 
quasimodes are no longer approximate solutions 
to the (joint) eigenvalue equations near the 
singularities of the foliation. The singularities and 
their relative configurations can be complicated 
(Colin de Verdiere and Vu Ngoc 2003) and most 
of the interesting asymptotic blow-up properties 
of eigenfunctions tend to be associated with these 
degeneracies. The main tool for studying joint 
eigenfunctions near degeneracies is the quantum 
analog of the Eliasson normal form (Eliasson 
1984, Vu Ngoc 2000). We will refer to this as the 
“quantum Birkhoff normal form” (QBNF). 
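
The statement that the Hamilton equations are solved by quadrature in action-angle variables can be made concrete on the flat torus $T^n = \mathbb{R}^n / 2\pi\mathbb{Z}^n$, where the momenta themselves can serve as action variables. The following is a minimal sketch under that assumption, with the Hamiltonian taken to be $p_1(I) = |I|$; the function names are illustrative and do not come from the literature cited here.

```python
import math
from typing import List, Tuple


def frequencies(actions: List[float]) -> List[float]:
    """Frequencies omega_j = d|I|/dI_j = I_j/|I| for the Hamiltonian p_1(I) = |I|."""
    norm = math.sqrt(sum(a * a for a in actions))
    if norm == 0.0:
        raise ValueError("p_1 = |I| is not differentiable at I = 0")
    return [a / norm for a in actions]


def geodesic_flow(angles: List[float], actions: List[float],
                  t: float) -> Tuple[List[float], List[float]]:
    """Flow for time t: actions are conserved, angles advance linearly mod 2*pi."""
    omega = frequencies(actions)
    new_angles = [(th + t * w) % (2.0 * math.pi) for th, w in zip(angles, omega)]
    return new_angles, list(actions)


if __name__ == "__main__":
    print(geodesic_flow([0.0, 0.0], [3.0, 4.0], 5.0))
```

No numerical integration is needed: the solution at time $t$ is written down directly, which is the practical content of complete integrability on a regular torus.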


Background on QCI Systems 


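Let $(M^n, g)$ be a compact, closed Riemannian 
manifold and $P_1 := Op_\hbar(p_1)$ be a formally self-adjoint, 
elliptic (in the classical sense) $\hbar$-pseudodifferential 
operator. In local coordinates, the Schwartz 
kernel of $P_1$ is of the form 

$$P_1(x, y) = (2\pi\hbar)^{-n} \int_{\mathbb{R}^n} e^{i \langle x - y, \xi \rangle / \hbar}\, p_1(x, \xi; \hbar)\, d\xi$$

where $p_1(x, \xi; \hbar) \in S^m_{cl}(T^*M)$; that is, $p_1(x, \xi; \hbar) \sim 
\sum_{j=0}^{\infty} p_{1,j}(x, \xi)\, \hbar^j$ with $\partial_x^\alpha \partial_\xi^\beta p_{1,j}(x, \xi) = O_{\alpha,\beta}(\langle \xi \rangle^{m - |\beta|})$ 
(Dimassi and Sjoestrand 1999). It is often convenient 
to work with $\hbar$-pseudodifferential operators 
rather than their classical counterparts. In the 
homogeneous case, one chooses $\hbar^{-1} \in \mathrm{Spec}\, \sqrt{\Delta}$. 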

$P_1 \in Op_\hbar(S^m_{cl})$ is said to be QCI if there exist self-adjoint 
$P_j = Op_\hbar(p_j) \in Op_\hbar(S^m_{cl})$, $j = 2, \ldots, n$, for 
some $m$ with $[P_i, P_j] = 0$, $i, j = 1, \ldots, n$, such that 
$dp_1 \wedge \cdots \wedge dp_n \neq 0$ on a dense open subset, $\Omega_{reg} \subset 
T^*M$, and $P_1^2 + \cdots + P_n^2$ is elliptic in the classical 
sense. There are many inhomogeneous QCI examples 
including quantum Euler, Lagrange, and Kowalevsky 
tops together with quantum Neumann and Rosochatius 
oscillators in arbitrary dimension. 

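Since $\{p_i, p_j\} = 0$, the joint Hamilton flow of the 
$p_j$'s induces a symplectic $\mathbb{R}^n$-action on $T^*M$: 

$$\Phi_t : T^*M \to T^*M$$
$$\Phi_t(x, \xi) = \exp t_1 X_{p_1} \circ \cdots \circ \exp t_n X_{p_n}(x, \xi)$$
$$t = (t_1, \ldots, t_n) \in \mathbb{R}^n$$

The associated moment map is just 

$$P : T^*M - 0 \to \mathbb{R}^n, \qquad P = (p_1, \ldots, p_n)$$

We denote the image $P(T^*M - 0)$ by $B$, and the regular 
values (resp. singular values) of the moment map by 
$B_{reg}$ (resp. $B_{sing}$). 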

To establish bounds for the joint eigenfunctions of 
$P_1, \ldots, P_n$, one imposes a “finite-complexity” assumption 
(Toth and Zelditch 2002) on the classical integrable 
system. This condition holds for all systems of interest in 
physics. To describe it, for each $b = (b^{(1)}, \ldots, b^{(n)}) \in B$, 
let $m_{cl}(b)$ denote the number of $\mathbb{R}^n$-orbits of the joint 
flow $\Phi_t$ on the level set $P^{-1}(b)$. Then, the finite-complexity 
condition says that for some $M_0 > 0$, 

$$m_{cl}(b) < M_0 \quad (\forall b \in B)$$

In addition, when $P$ is proper, 

$$P^{-1}(b) = \bigcup_{k=1}^{m_{cl}(b)} \Lambda^{(k)}(b) \qquad [3]$$

for any $b \in B_{reg}$, where the $\Lambda^{(k)}(b)$ are Lagrangian tori. 
The starting point for analyzing joint eigenfunctions 
is the following correspondence principle (Zelditch 
1990) which makes the eigenfunction localization 
alluded to in the introduction more precise: 


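Theorem 1 Let $Op_\hbar(a) \in Op_\hbar(S^0_{cl})(T^*M)$ and $P_j$, $j = 
1, \ldots, n$, be a QCI system of commuting operators. 
Then, for every $b \in B_{reg}$, there exists a subsequence of 
joint eigenfunctions $\varphi_\mu(x) := \varphi(x; \mu(\hbar))$ with $\hbar \in (0, \hbar_0]$ 
and joint eigenvalues $\mu(\hbar) = (\mu_1(\hbar), \ldots, \mu_n(\hbar)) \in 
\mathrm{Spec}(P_1, \ldots, P_n)$ with $|\mu(\hbar) - b| = O(\hbar)$ such that 

$$\langle Op_\hbar(a) \varphi_\mu, \varphi_\mu \rangle = |c(\hbar)|^2 \int_{\Lambda(b)} a \, d\mu_b + O(\hbar)$$

Here, $d\mu_b$ denotes Lebesgue measure on the torus, $\Lambda(b)$. 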


The proof of Theorem 1 follows from the $\hbar$-microlocal, 
regular quantum normal form construction near $\Lambda(b)$ (see 
the section “Birkhoff normal forms”). 


Blow-Up of Eigenfunctions: 
Qualitative Results 


Before discussing quantitative bounds for joint 
eigenfunctions, it is useful to prove qualitative results. 
Here, we review only the homogeneous case where 
$P_1 = \hbar \sqrt{\Delta}$, although the general case can be dealt 
with similarly (Toth and Zelditch 2002). Two well-known 
QCI examples which exhibit extremes in 
eigenfunction concentration are the round sphere and 
the flat torus. In the case of the sphere, the zonal 
harmonics blow up like $\lambda^{1/2}$ at the poles, whereas, in 
the case of the flat torus, all the joint eigenfunctions 
are uniformly bounded. The rest of the article will be 
essentially devoted to understanding these extreme 
blow-up properties (and intermediate ones) more 
systematically. When discussing blow-up of eigenfunctions, 
it is natural to start with the following: 


Question Do there exist QCI manifolds (other 
than the flat torus) for which all eigenfunctions are 
uniformly bounded in $L^\infty$? 


Toth and Zelditch (2002) have proved that, up to 
coverings, the flat torus is the only example with 
uniformly bounded eigenfunctions. Their argument 
used the correspondence principle in Theorem 1 
combined with some deep results from symplectic 
geometry. To deal with the issue of multiplicities, it 
is convenient to define 


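$$L^\infty(\lambda; g) = \sup_{\varphi \in V_\lambda} \|\varphi\|_{L^\infty}$$

where $V_\lambda = \{\varphi;\ P_1 \varphi = \lambda \varphi\}$ and it is assumed that 
$\|\varphi\|_{L^2} = 1$. 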
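
For the two model cases mentioned above, the sup norms can be written down in closed form, which gives a quick numerical illustration of the two extremes. The sketch below assumes the unit round 2-sphere, where the $L^2$-normalized zonal harmonic of degree $l$ takes the value $\sqrt{(2l+1)/4\pi}$ at the pole and $\sqrt{\Delta}$ has eigenvalue $\sqrt{l(l+1)}$, and the flat 2-torus $\mathbb{R}^2/2\pi\mathbb{Z}^2$ with joint eigenfunctions $e^{i\langle k, x\rangle}/2\pi$. These normalizations are standard facts rather than statements from the text, and the function names are illustrative.

```python
import math


def zonal_sup_norm(l: int) -> float:
    """Sup norm of the L2-normalized zonal harmonic Y_l^0 on the unit 2-sphere."""
    return math.sqrt((2 * l + 1) / (4.0 * math.pi))


def sphere_frequency(l: int) -> float:
    """Eigenvalue lambda of sqrt(Laplacian) on degree-l spherical harmonics."""
    return math.sqrt(l * (l + 1))


def zonal_growth_ratio(l: int) -> float:
    """Ratio of the sup norm to lambda^(1/2); tends to a constant for large l."""
    return zonal_sup_norm(l) / math.sqrt(sphere_frequency(l))


def torus_sup_norm() -> float:
    """Sup norm of exp(i<k,x>)/(2*pi) on the flat 2-torus, for every k."""
    return 1.0 / (2.0 * math.pi)


if __name__ == "__main__":
    for l in (10, 100, 1000):
        print(l, zonal_sup_norm(l), zonal_growth_ratio(l), torus_sup_norm())
```

The ratio printed in the third column stabilizes as $l$ grows, which is the $\lambda^{1/2}$ blow-up at the poles, while the torus value does not depend on the frequency at all.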


Theorem 2 (Toth and Zelditch 2002). Suppose 
that $P_1 = \sqrt{\Delta}$ is QCI on a compact Riemannian 
manifold $(M, g)$ and suppose that the corresponding 
moment map satisfies the finite-complexity condition. 
Then, if $L^\infty(\lambda, g) = O(1)$, $(M, g)$ is flat. 


The proof of Theorem 2 follows by contradiction: 
that is, one assumes that all eigenfunctions are 
uniformly bounded. There are two main steps in the 
proof of Theorem 2: the first is entirely analytic and 
uses the correspondence principle in Theorem 1 and 
uniform boundedness to determine the topology of 
M. The second step uses two deep results from 
symplectic topology/geometry to determine the 
metric, g, up to coverings. 

Using a local Weyl law argument and the finite-multiplicity 
assumption, it can be shown that for 
each $b \in B_{reg}$, there exists a subsequence, $\varphi_\mu$, of joint 
eigenfunctions such that Theorem 1 holds with 

$$|c(\hbar)|^2 \geq \frac{1}{C}$$

where $C > 0$ is a uniform constant not depending 
on $b \in B_{reg}$. With this subsequence, one applies Theorem 1 
with $a(x, \xi) = V(x) \in C^\infty(M)$. It then easily 
follows by the boundedness assumption that for $\hbar$ 
sufficiently small and appropriate constants $C_0, C_1 > 0$, 


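$$\left| \int_{\Lambda(b)} \pi_{\Lambda(b)}^*(V) \, d\mu_b \right| \le C_0 \left| \int_M V(x)\, |\varphi_\mu(x)|^2 \, d\mathrm{Vol}(x) \right| \le C_1 \int_M |V(x)| \, d\mathrm{Vol}(x) \qquad [4]$$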


where $\pi_{\Lambda(b)}$ denotes the restriction of the canonical 
projection $\pi : T^*M \to M$ to the Lagrangian $\Lambda(b)$. The 
estimate in [4] is equivalent to the statement, 

$$(\pi_{\Lambda(b)})_*(d\mu_b) \ll d\mathrm{Vol}(x)$$

where given two Borel measures $d\mu$ and $d\nu$, one 
writes $d\mu \ll d\nu$ if $d\mu$ is absolutely continuous with 
respect to $d\nu$. Consequently, $\pi_{\Lambda(b)} : \Lambda(b) \to M$ has no 
singularities and thus, up to coverings, $M$ is 
topologically a torus (since $\Lambda(b)$ is). 

Since there are many QCI systems on $n$-tori, it still 
remains to determine how the uniform-boundedness 
condition constrains the metric geometry of $(M, g)$. 
First, by a classical result of Mañé, if $T^*M$ possesses 
a $C^1$-foliation by Lagrangians, $(M, g)$ cannot have 
conjugate points. By the first step in the proof, it 
follows that under the uniform-boundedness 
assumption, M is a topological torus and T*M 
possesses a smooth foliation by Lagrangian tori. 
Consequently, (M,g) has no conjugate points. 
Finally, the Burago-Ivanov proof of the Hopf 
conjecture says that metric tori without conjugate 
points are flat. Therefore, (M, g) is flat. 

Consistent with Theorem 2, one can show (Toth 
and Zelditch 2003, Lerman and Shirokova 2002) that 
if (M, g) is integrable and not a flat torus, then there 
must exist a compact $\Phi_t$-orbit (i.e., an orbit of the joint 
flow of $X_{p_j}$, $j = 1, \ldots, n$) with $\dim = k < n$. In the QCI 
case, these “singular” orbits trap eigenfunction mass 
for appropriate subsequences. To understand this 
statement in detail, it is necessary to review QBNF 
constructions in the context of QCI systems. 


Birkhoff Normal Forms 


There are several excellent expositions on the topic 
of Birkhoff normal forms in the literature (see, e.g., 
Guillemin (1996), Iantchenko et al. (2002), and 
Zelditch (1998)), which discuss both the classical 
and quantum constructions. Here, we discuss the 
aspects which are most relevant for QCI systems. 

Consider the Schrödinger operator, $P(x; \hbar D_x) = 
-\hbar^2 (d^2/dx^2) + V(x)$ with $V(x + 2\pi) = V(x)$ acting 
on $C^\infty(\mathbb{R}/2\pi\mathbb{Z})$. Assume that the potential, $V(x)$, is 
Morse and that $x = 0$ is a potential minimum with 
$V(0) = V'(0) = 0$ and $\Omega \subset T^*(S^1)$ an open neighborhood 
containing $(0, 0)$. In its simplest incarnation, 
the classical Birkhoff normal-form theorem says that 
for small enough $\Omega$, there exists a symplectic 
diffeomorphism, $\kappa : (\Omega; (0, 0)) \to (\Omega; (0, 0))$; $\kappa^{-1} : 
(x, \xi) \mapsto (y, \eta)$, and a (locally defined) function $F_0 \in 
C^\infty(\mathbb{R})$ such that 

$$(p \circ \kappa)(y, \eta) = F_0(y^2 + \eta^2) \qquad [5]$$
provided $(y, \eta) \in \Omega$. At the quantum level, the 
analogous QBNF expansion says that there exist 
microlocally unitary $\hbar$-Fourier integral operators, 
$U(\hbar) : C^\infty(\Omega) \to C^\infty(\Omega)$ and a classical symbol 
$F(x; \hbar) \sim \sum_{j=0}^{\infty} F_j(x)\, \hbar^j$, such that 

$$U(\hbar)^* \circ P(\hbar) \circ U(\hbar) =_\Omega F(I_e; \hbar) \qquad [6]$$

with $I_e = \hbar^2 D_y^2 + y^2$. Given two $\hbar$-pseudos $P$ and $Q$, 
the notation $P =_\Omega Q$ means $\|\chi (P - Q)\|_{L^2 \to L^2} = 
O(\hbar^\infty)$ and $\|(P - Q) \chi\|_{L^2 \to L^2} = O(\hbar^\infty)$, for any $\chi \in 
C_0^\infty(\Omega)$. Since it can be easily shown that eigenfunctions 
$\varphi_\mu$ with $\mu(\hbar) = O(\hbar^\delta)$, $0 < \delta < 1$, localize very 
sharply near $x = 0$, from the $\hbar$-microlocal unitary 
equivalence in [6], the eigenfunction and eigenvalue 
asymptotics (including trace formulas) can all be 
determined by working with the model operator on 
the right-hand side (RHS) of [6]. Moreover, on the 
model side, the eigenfunctions and eigenvalues are 
explicitly known. 
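
The last remark can be checked directly for the harmonic-oscillator model operator. The sketch below assumes the standard facts that $I_e = -\hbar^2\, d^2/dy^2 + y^2$ has eigenvalues $\hbar(2n+1)$ with eigenfunctions $\hbar^{-1/4} \Phi_n(\hbar^{-1/2} y)$, where $\Phi_n$ is the $L^2$-normalized Hermite function; it verifies the eigenvalue equation pointwise with a central finite difference. The function names and the step size are illustrative choices, not part of the source.

```python
import math


def hermite_function(n: int, s: float) -> float:
    """L2-normalized Hermite function Phi_n(s), via the three-term recurrence."""
    phi_prev = 0.0
    phi = math.pi ** -0.25 * math.exp(-s * s / 2.0)
    for k in range(n):
        phi_next = (math.sqrt(2.0 / (k + 1)) * s * phi
                    - math.sqrt(k / (k + 1)) * phi_prev)
        phi_prev, phi = phi, phi_next
    return phi


def model_eigenfunction(n: int, y: float, hbar: float) -> float:
    """Assumed model eigenfunction u_e(y; n, hbar) = hbar^(-1/4) Phi_n(y / sqrt(hbar))."""
    return hbar ** -0.25 * hermite_function(n, y / math.sqrt(hbar))


def elliptic_residual(n: int, y: float, hbar: float, rel_step: float = 1e-3) -> float:
    """|I_e u - hbar (2n+1) u| at the point y, with u'' from a central difference."""
    d = rel_step * math.sqrt(hbar)
    u_minus = model_eigenfunction(n, y - d, hbar)
    u_zero = model_eigenfunction(n, y, hbar)
    u_plus = model_eigenfunction(n, y + d, hbar)
    second_derivative = (u_plus - 2.0 * u_zero + u_minus) / (d * d)
    applied = -hbar ** 2 * second_derivative + y * y * u_zero
    return abs(applied - hbar * (2 * n + 1) * u_zero)


if __name__ == "__main__":
    for n in range(4):
        print(n, elliptic_residual(n, 0.05, 0.01))
```

The residuals are limited only by the finite-difference error, consistent with the model spectrum being known in closed form.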

At a potential maximum, there exist classical and 
quantum normal forms analogous to [5] and [6] (see 
Helffer and Sjoestrand (1990) and Colin de Verdiere 
and Parisse (1994a)) except that the harmonic 
oscillator action operator, $I_e$, is replaced by the 
hyperbolic action operator, 

$$I_h = \hbar \left( y D_y + \frac{1}{2i} \right) \qquad [7]$$


The 1D Schrödinger operator is the simplest 
example of a QCI system where $(0, 0) \in T^*S^1$ is a 
nondegenerate critical point of the classical Hamiltonian, 
$H(x, \xi) = \xi^2 + V(x)$. Under a mild nondegeneracy 
hypothesis (Vu Ngoc 2000), there is an 
analogous normal form for arbitrary QCI systems 
which is valid near nondegenerate rank $k < n$ orbits 
of the joint flow, $\Phi_t$. At the classical level, this result 
is due to Eliasson (1984) and the quantum analog is 
due to Vu Ngoc (2000). To state the result in 
general, one has to define the appropriate model 
operators: these are $I_e$ and $I_h$ together with the 
loxodromic model operators $\Re I_{ch} = \hbar D_\theta$, $\Im I_{ch} = 
\hbar \rho D_\rho + \hbar/i$, where $(\rho, \theta)$ denote polar coordinates 
in $\mathbb{R}^2$. The local model phase space for a rank $k < n$ 
orbit, $\mathcal{O}_k$, is just $T^*(T^k) \times T^*(\mathbb{R}^{n-k})$. In this case, the 
QBNF says that, for a sufficiently small neighborhood, 
$\Omega_k$ of $\mathcal{O}_k$, there exists a family of $\hbar$-Fourier 
integral operators, $U_\kappa : C^\infty(\Omega_k) \to C^\infty(T^*(T^k \times 
\mathbb{R}^{n-k}))$ and symbols $f_j(\hbar) \sim \sum_{l=0}^{\infty} f_{jl}\, \hbar^l$, such that 


$$U_\kappa^* (P_1, \ldots, P_n) U_\kappa =_{\Omega_k} \mathcal{M} \cdot (Q_1 - f_1(\hbar), \ldots, Q_n - f_n(\hbar))$$
$$U_\kappa^* \circ U_\kappa =_{\Omega_k} \mathrm{Id} \qquad [8]$$

Here, $\mathcal{M}$ is a microlocally invertible matrix of 
$\hbar$-pseudodifferential operators commuting with the 
$Q_j$'s, and the $Q_j$'s are to be chosen from the 
model operators listed above, where $\hbar D_\theta = 
(\hbar D_{\theta_1}, \ldots, \hbar D_{\theta_k})$ denotes the regular model operator 
acting along the $k$-dimensional orbit, $\mathcal{O}_k$. Moreover, 
if $(y_1, \ldots, y_{n-k}, \eta_1, \ldots, \eta_{n-k}) \in T^*(\mathbb{R}^{n-k})$ denote the 
symplectic model coordinates, then the $Q_j$'s act in 
separate, complementary $(y_1, \ldots, y_{n-k})$-variables. 
The main point here is that [8] is actually a 
convergent normal form in $\hbar$ in the sense that 
error terms in [8] are $O(\hbar^\infty)$. In contrast (Guillemin 
1996, Iantchenko et al. 2002, Zelditch 1998), the 
general Birkhoff normal form is only formal in the 
sense that error terms vanish to successively higher 
orders along the orbit, $\mathcal{O}_k$, but are not necessarily 
small in terms of the spectral parameter, $\hbar$. 

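Using [8], it can be shown that the joint 
eigenfunctions, $\varphi_\mu$, are microlocally determined in 
terms of the $\hbar$-Fourier integral operators, $U_\kappa$, and 
certain model eigenfunctions. More precisely, 

$$U_\kappa^* \varphi_\mu(\theta, y; \hbar) =_{\Omega_k} c(\hbar)\, e^{i m \theta} \cdot [u_h \cdot u_{ch} \cdot u_e](y; \hbar) \qquad [9]$$

where $m \in \mathbb{Z} + 1/4$, $c(\hbar) \in \mathbb{C}(\hbar)$. The generalized 
eigenfunctions of the model operators, $I_h$, $I_{ch}$, $I_e$, acting 
transversally to the orbit $\mathcal{O}_k$ are $u_h(y; \lambda, \hbar) = 
c_+(\hbar)\, |y|_+^{-1/2 + i\lambda/\hbar} + c_-(\hbar)\, |y|_-^{-1/2 + i\lambda/\hbar}$, $u_{ch}(\rho, \theta; t, k, \hbar) = 
\rho^{-1 + i t/\hbar}\, e^{i k \theta}$ and $u_e(y; n, \hbar) = \hbar^{-1/4} H_n(\hbar^{-1/2} y)$, where $H_n(y)$ is 
the $n$th Hermite function. 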


Eigenfunction Lower Bounds: 
Quantitative Results 


Let $\mathcal{O}_k$ be a singular rank $k < n$ orbit as in the 
previous section. From the qualitative results of the 
first section, it follows that there must exist joint 
eigenfunctions, $\varphi_\mu$, of the commuting operators, 
$P_j$, $j = 1, \ldots, n$, which blow up along the orbit, $\mathcal{O}_k$. 
To obtain quantitative results, one could try to 
determine the $L^2 \to L^\infty$ mapping properties of the 
$\hbar$-Fourier integral operator, $U_\kappa$. However, since the 
canonical transformation $\kappa$ to normal form can be 
complicated, this method is quite cumbersome. To 
avoid this complication (Toth and Zelditch 2003), it 
suffices to compute $L^2$-masses only, but on scales of 
order $\hbar^\delta$ where $0 < \delta < 1/2$. Let $\pi(\Omega_k(\hbar^\delta))$ be the 
configuration space projection of the $\hbar^\delta$-radius tube 
$\Omega_k(\hbar^\delta) \supset \mathcal{O}_k$. Since 

$$\|\varphi_\mu\|_{L^\infty}^2 \cdot \mathrm{Vol}(\pi(\Omega_k(\hbar^\delta))) \geq \int_{\pi(\Omega_k(\hbar^\delta))} |\varphi_\mu|^2 \, d\mathrm{Vol} \qquad [10]$$

one is reduced to estimating $\int_{\pi(\Omega_k(\hbar^\delta))} |\varphi_\mu|^2 \, d\mathrm{Vol}$ from 
below. To bound this integral from below, it suffices to 


1. reduce the estimate to one involving only the 
model eigenfunctions in the Birkhoff normal 
form and 

2. estimate the normalizing h-dependent constant 


c(h) in [9]. 
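To prove (1) one introduces a cutoff function 
$\chi(x, \xi; \hbar^\delta) \in C_0^\infty(\Omega_k(\hbar^\delta))$ which is identically equal to 
one near $\mathcal{O}_k$. Then, since $\pi^{-1}(\pi(\Omega_k(\hbar^\delta))) \supset \Omega_k(\hbar^\delta)$, 
from the Gårding inequality, it follows that 

$$\int_{\pi(\Omega_k(\hbar^\delta))} |\varphi_\mu|^2 \, d\mathrm{Vol} \gg \langle Op_\hbar(\chi(x, \xi; \hbar^\delta)) \varphi_\mu, \varphi_\mu \rangle \qquad [11]$$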


In light of the QBNF result in [8], the computation 
of the matrix element on the RHS of [11] is 
reduced to a corresponding computation for the $L^2$-normalized 
model eigenfunctions. Since the $U_\kappa$'s are 
microlocally unitary, it follows that 

$$\langle Op_\hbar(\chi(x, \xi; \hbar^\delta)) \varphi_\mu, \varphi_\mu \rangle \sim_{\hbar \to 0} C(\delta)\, |c(\hbar)|^2 \qquad [12]$$

Here, the constant $C(\delta) > 0$ depends only on the 
scale of the cutoff function. It finally remains to deal 
with (2). Bounding the size of $|c(\hbar)|$ from below 
amounts to estimating the $L^2$-mass of the joint 
eigenfunction $\varphi_\mu$ which must be trapped near 
the orbit, $\mathcal{O}_k$. Using a local (singular) Weyl 
law argument, it is shown in Toth and Zelditch 
(2003) that 

$$|c(\hbar)|^2 \gg |\log \hbar|^{-\ell} \qquad [13]$$

where $\ell \geq 0$ indexes the number of hyperbolic and 
loxodromic model operators. The final result quantifies 
blow-up along a compact orbit: 


Theorem 3 (Toth and Zelditch 2003). Let $\mathcal{O}_k$ be a 
rank $k < n$ orbit of the joint flow $\Phi_t$. If this orbit is 
compact and nondegenerate, then there exists a 
subsequence of $L^2$-normalized joint eigenfunctions 
$\varphi_{j_m}$, $m = 1, 2, \ldots$, of the QCI system $P_j$, $j = 1, \ldots, n$, 
such that for any $\epsilon > 0$, 

$$\|\varphi_{j_m}\|_{L^\infty} \gg_\epsilon \lambda_{j_m}^{((n-k)/4) - \epsilon}$$

By using the semiclassical scale $\hbar^{1/2} |\log \hbar|^{\alpha}$, one 
can (slightly) improve the lower bound in Theorem 
3 to $\|\varphi_{j_m}\|_{L^\infty} \gg \lambda_{j_m}^{(n-k)/4} |\log \lambda_{j_m}|^{-\alpha}$ for some $\alpha > 0$ (see 
Sogge et al. (2005)). 
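
The exponent $(n-k)/4$ can be traced through the steps above. The following is a sketch of the bookkeeping rather than a proof: it assumes that the projection of the orbit to $M$ has dimension at most $k$, so that the projected tube has volume $O(\hbar^{\delta(n-k)})$, and that the trapped mass satisfies a bound of the form $|c(\hbar)|^2 \gg |\log \hbar|^{-\ell}$ with $\ell$ the number of hyperbolic and loxodromic model operators.

```latex
% Bookkeeping sketch under the assumptions stated in the lead-in.
\begin{align*}
\|\varphi_\mu\|_{L^\infty}^2
  &\ge \frac{1}{\mathrm{Vol}\,\pi(\Omega_k(\hbar^\delta))}
       \int_{\pi(\Omega_k(\hbar^\delta))} |\varphi_\mu|^2 \, d\mathrm{Vol}
     && \text{(sup norm controls local mass)} \\
  &\gg \hbar^{-\delta(n-k)} \, C(\delta)\, |c(\hbar)|^2
     && \text{(G\aa rding and passage to normal form)} \\
  &\gg \hbar^{-\delta(n-k)} \, |\log \hbar|^{-\ell}
     && \text{(singular Weyl law)} .
\end{align*}
% Letting \delta increase to 1/2 and setting \hbar = \lambda^{-1}:
\[
\|\varphi_\mu\|_{L^\infty} \gg_\epsilon \lambda^{\frac{n-k}{4} - \epsilon} .
\]
```

The restriction $\delta < 1/2$ is what costs the $\epsilon$ in the exponent; the logarithmic factor is absorbed into it.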

When $(M, g)$ is not flat, there must exist a 
singular, compact orbit of dimension $k$ with $1 \le k \le 
n - 1$ and so, as an immediate corollary of Theorem 
3, it follows that for some $\alpha > 0$, 

$$L^\infty(\lambda; g) \gg \lambda^{1/4} |\log \lambda|^{-\alpha} \qquad [14]$$


Since the bound in [14] is highly dependent on 
dimension, establishing the existence of high- 
codimension singular orbits would strengthen the 
estimate substantially. However, this appears to be a 
difficult and open problem. 


Maximal Blow-Up of Modes 
and Quasimodes 


We review here a number of converses to a recent 
result of Sogge and Zelditch (2002) on Riemannian 
manifolds (M,g) with maximal eigenfunction 
growth. These authors proved that if there exists a 
sequence of $L^2$-normalized eigenfunctions of the 
Laplacian $\Delta$ of $(M, g)$ whose $L^\infty$-norms are comparable 
to zonal spherical harmonics on $S^n$, then there 
must exist a point comparable to the north pole of 
$S^n$, that is, a recurrent point $z$ such that a positive 
measure of geodesics emanating from z return to it 
at a fixed time T. The most extreme kind of 
recurrent point is a “blow-down point” of period 
T, where by definition all geodesics leaving z return 
to z at time T, that is, form geodesic loops. Poles of 
surfaces of revolution are blow-down points where 
all geodesic loops at z are smoothly closed, while 
umbilic points of triaxial ellipsoids are examples of 
blow-down points where all but two geodesic loops 
are not smoothly closed. On real-analytic manifolds, 
all recurrent points are blow-down points. The 
converse question is the following: what kind of 
mode (eigenfunction) or quasimode growth must 
occur when a blow-down point exists? 

Sogge et al. (2005) proved that maximal quasi- 
mode growth (Colin de Verdiere 1977) implies the 
existence of a blow-down point. This generalizes the 
main result of Sogge and Zelditch (2002) from 
modes (which one rarely understands) to quasimodes 
(which one often understands better). Conversely, 
existence of a blow-down point ensures 
near-maximal quasimode growth, that is, 
maximal up to logarithmic factors. If one assumes 
that the geodesic flow $G^t : T^*M \to T^*M$ of $(M, g)$ is 
completely integrable and that $\dim M = 2$, then the 
results of Sogge et al. (2005) show that actual 
eigenfunctions have near-maximal blow-up. Examples 
show that, in general, blow-down points do not necessarily 
cause modes to have near-maximal blow-up. 

An important geometric invariant of a blow-down 
point is the first-return map to the cotangent fiber 
over the blow-down point: 


$$G_z^T : S_z^*M \to S_z^*M \qquad [15]$$

$G_z^T$ is also an important analytic invariant: the blow- 
up rate of modes or quasimodes, specifically the 
occurrence of the logarithmic factors, depends on 
the fixed-point structure of this map. When all 
geodesic loops at z are smoothly closed, that is, 
when the first-return map is the identity, then there 
exist quasimodes of maximal growth. When the 
first-return map has fixed points, the maximal 
growth is modified by logarithmic factors. 
To put these results in context, we first recall 
the local Weyl law of Avakumovich-Levitan 
(Duistermaat and Guillemin 1975), which states that 

$$\sum_{\lambda_j \le \lambda} |\varphi_j(x)|^2 = (2\pi)^{-n} \int_{|\xi|_g \le \lambda} d\xi + R(\lambda, x) \qquad [16]$$

with uniform remainder bounds 

$$|R(\lambda, x)| \le C \lambda^{n-1}, \qquad x \in M$$

It follows that 

$$L^\infty(\lambda, g) = O(\lambda^{(n-1)/2}) \qquad [17]$$


on any compact Riemannian manifold. Riemannian 
manifolds for which the equality 


$$L^\infty(\lambda, g) = \Omega(\lambda^{(n-1)/2}) \qquad [18]$$


is achieved for some subsequence of eigenfunctions 
are said to be of maximal eigenfunction growth. In 
addition to modes, and almost inseparable from 
them, are the quasimodes of the Laplacian (Colin 
de Verdiere 1977). As the name suggests, quasi- 
modes are approximate eigenfunctions. The crudest 
type of quasimode is a quasimode $\{\psi_k\}$ of order 0, 
namely a sequence of $L^2$-normalized functions 
which solve 

$$\|(\Delta - \mu_k) \psi_k\|_{L^2} = O(1)$$

for a sequence of quasieigenvalues $\mu_k$. By the 
spectral theorem, it follows that there must exist 
true eigenvalues in the interval $[\mu_k - \delta, \mu_k + \delta]$ for 
some $\delta > 0$. $(M, g)$ is said to have maximal 0-order 
quasimode growth if there exists a sequence of 
quasimodes of order 0 for which $\|\psi_k\|_{L^\infty} = 
\Omega(\mu_k^{(n-1)/2})$. There are analogous definitions for 
more refined quasimodes, for example, quasimodes 
of higher order or (most refined) quasimodes defined 
by oscillatory integrals. It is natural to include 
quasimodes in this study because they often reflect 
the geometry and dynamics of the geodesic flow 
more strongly than actual modes. For quasimodes, 
there is the following result: 


Theorem 4 (Sogge et al. 2005). Let $(M^n, g)$ be a 
compact Riemannian manifold with Laplacian $\Delta$. 
Then: 

(i) If there exists a quasimode sequence $\{(\psi_k, \mu_k)\}$ 
of order 0 with $\|\psi_k\|_{L^\infty} = \Omega(\mu_k^{(n-1)/2})$, then there 
exists a recurrent point $z \in M$ for the geodesic 
flow. If $(M, g)$ is real analytic, then there exists 
a blow-down point. 

(ii) Conversely, if there exists a blow-down point 
and if the map $G_z^T = \mathrm{id}$, then there exists a 
quasimode sequence $\{(\psi_k, \mu_k)\}$ of order 0 with 
$\|\psi_k\|_{L^\infty} = \Omega(\mu_k^{(n-1)/2})$. 

(iii) Let $n = 2$ and $(M^2, g)$ be real analytic. Then, 
if $G_z^T$ has a finite number of nondegenerate 
fixed points, there exists a quasimode sequence 
$\{(\psi_k, \mu_k)\}$ of order 0 with $\|\psi_k\|_{L^\infty} = \Omega(\mu_k^{1/2} 
|\log \mu_k|^{-1/2})$. 


The assumption that $G_z^T = \mathrm{id}$ is the same as 
saying that all geodesics leaving $z$ smoothly close up 
at $z$ again. As mentioned above, poles of surfaces 
of revolution have this property. On the contrary, 
the umbilic points of triaxial ellipsoids in $\mathbb{R}^3$ are 
blow-down points for which $G_z^T \neq \mathrm{id}$. That is, 
every geodesic leaving an umbilic point returns at 
the same time, but only two geodesics in 
this family are smoothly closed, and they give rise to fixed 
points of $G_z^T$. One can show (see Toth 1996) that 
there exists a sequence of eigenfunctions in this 
case for which $L^\infty(\lambda, g) \gg \lambda^{1/2} |\log \lambda|^{-1/2}$. Hence, 
the above result is sharp. Moreover, it is clear 
from the proof that the fixed points are respon- 
sible for the logarithmic correction to maximal 
eigenfunction growth: they cause a change in the 
normal form of the Laplacian near the blow-down 
point. 
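
The local Weyl law recalled earlier, and the sharpness of its remainder, can be checked by hand on the unit round 2-sphere. The sketch below relies on the addition theorem for spherical harmonics, $\sum_m |Y_l^m(x)|^2 = (2l+1)/4\pi$, and on the eigenvalue $\sqrt{l(l+1)}$ of $\sqrt{\Delta}$ in degree $l$; both are standard facts that are not stated in the text, and the function names are illustrative.

```python
import math


def spectral_function_on_diagonal(l_max: int) -> float:
    """Sum over all degrees l <= l_max of sum_m |Y_l^m(x)|^2 = (2l+1)/(4*pi)."""
    return sum((2 * l + 1) / (4.0 * math.pi) for l in range(l_max + 1))


def weyl_main_term(lam: float) -> float:
    """(2*pi)^(-2) times the area of the disc |xi| <= lam, i.e. lam^2/(4*pi)."""
    return lam * lam / (4.0 * math.pi)


def weyl_remainder(l_max: int) -> float:
    """Remainder R(lam, x) at lam = sqrt(l_max (l_max + 1)); independent of x."""
    lam = math.sqrt(l_max * (l_max + 1))
    return spectral_function_on_diagonal(l_max) - weyl_main_term(lam)


if __name__ == "__main__":
    for l_max in (10, 100, 1000):
        lam = math.sqrt(l_max * (l_max + 1))
        print(l_max, weyl_remainder(l_max), weyl_remainder(l_max) / lam)
```

Here the remainder equals $(l_{\max}+1)/4\pi$, which grows linearly in $\lambda$: the bound $O(\lambda^{n-1})$ with $n = 2$ is attained, in line with the sphere being a manifold of maximal eigenfunction growth.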

Theorem 4 illustrates the intimate connection 
between maximal blow-up of quasimodes and 
existence of blow-down points. It is natural to ask, 
however, when blow-down points cause blow-up in 
modes, that is, actual eigenfunctions. As mentioned 
above, this is not generally the case and some further 
mechanism is needed to ensure it. In the case of QCI 
surfaces, one can prove: 


Theorem 5 (Sogge et al. 2005). Let $(M, g)$ be a 
smooth, compact surface, $P_1 = \sqrt{\Delta}$, $P_2$ be an Eliasson 
nondegenerate QCI system on $M$ and $\varphi_\mu$ be an 
$L^2$-normalized joint eigenfunction of $P_1, P_2$ with 
$\sqrt{\Delta}\, \varphi_\mu = \lambda_\mu \varphi_\mu$. Suppose that there exists a blow-down 
point $z \in M$ for the geodesic flow 
$G^t := \exp t X_{p_1}$. Then, there exists a subsequence of 
(joint) Laplace eigenfunctions, $\varphi_{j_k}$, $k = 1, 2, \ldots$, such 
that for any $\epsilon > 0$, 

$$\|\varphi_{j_k}\|_{L^\infty} \gg_\epsilon \lambda_{j_k}^{(1/2) - \epsilon}$$


The role of complete integrability is to force joint 
eigenfunctions to localize on level sets of the 
moment map and thus to blow up at blow-down 
points. The proofs of Theorems 4 and 5 are similar. 
To prove the latter, by the same reasoning as in the 
orbit case (Theorem 3), one needs to bound from 
below the integral 

$$\int_{B(z; \hbar^\delta)} |\varphi_\mu|^2 \, d\mathrm{Vol} \qquad [19]$$
for an appropriate subsequence of $\varphi_\mu$'s, where $B(z; \hbar^\delta)$ 
denotes a ball of radius $\hbar^\delta$ centered at the blow-down 
point, $z \in M$. The blow-down condition implies that 
$S_z^*M \subset P^{-1}(b)$ for some $b \in B$. The relevant subsequence 
of eigenfunctions, $\varphi_\mu$, are the ones with 
joint eigenvalues satisfying $|\mu(\hbar) - b| = O(\hbar)$. Since the 
eigenfunctions $\varphi_\mu$ are microlocally concentrated on the 
set $P^{-1}(b)$, by Gårding, 

$$\int_{B(z; \hbar^\delta)} |\varphi_\mu|^2 \, d\mathrm{Vol} \gg \langle Op_\hbar(\chi(x, \xi; \hbar^\delta)) \varphi_\mu, \varphi_\mu \rangle \qquad [20]$$

where $\chi(x, \xi; \hbar^\delta)$ is a cutoff localized on an 
$\hbar^\delta$-neighborhood of $\Omega = \pi^{-1}(z) \cap P^{-1}(b)$. The matrix 
elements on the RHS of [20] are estimated by passing 
to QBNF. The subtlety here lies in the choice of scale, 
$\delta$. For $0 < \delta < 1/2$, the $\hbar$-pseudodifferential operators 
$Op_\hbar(\chi(x, \xi; \hbar^\delta))$ are contained in a standard calculus 
(Dimassi and Sjoestrand 1999) and so they automatically 
satisfy the $\hbar$-Egorov theorem. In particular, the 
passage to normal form by conjugating with the $U_\kappa$'s 
is automatic. The crucial point here is that to obtain 
the (near)-maximal blow-up near a blow-down point 
$z \in M$, one needs to be able to choose $0 < \delta < 1$. Using 
second-microlocal methods similar to the ones in 
Sjoestrand and Zworski (1999), it is shown in Sogge 
et al. (2005) that the blow-down geometry implies that 
the microlocal cutoffs are contained in an $\hbar$-pseudodifferential 
operator calculus and, in particular, the 
relevant $\hbar$-Egorov theorem needed to pass to QBNF is 
satisfied for any $0 < \delta < 1$. Then, by explicit computation 
for the model eigenfunctions, one can show that 

$$\langle Op_\hbar(\chi(x, \xi; \hbar^\delta)) \varphi_\mu, \varphi_\mu \rangle \gg \hbar^{\delta} \qquad [21]$$


for any $\delta$ with $0 < \delta < 1$. The result in Theorem 5 
then follows from the bound 

$$\|\varphi_\mu\|_{L^\infty}^2 \cdot \mathrm{Vol}(B(z; \hbar^\delta)) \gg \hbar^{\delta} \qquad [22]$$

where one takes $\delta$ arbitrarily close to 1. By analyzing 
the $U_\kappa$'s carefully (Sogge et al. 2005), the lower 
bound in Theorem 5 can be improved slightly by 
replacing the $\lambda^{-\epsilon}$ by $|\log \lambda|^{-\alpha}$ for some $\alpha > 0$, 
although the sharp constant, $\alpha > 0$, appears to be 
difficult to determine in general. In cases where the 
geometry of the first-return map, $G_z^T$, is particularly 
simple, one can sometimes get sharp $|\log \lambda|$-power 
improvements in Theorem 5 (see Theorem 4 (iii)). 
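
The exponent $1/2$ in Theorem 5 can be recovered by the same bookkeeping as in the orbit case. The sketch below assumes that $\dim M = 2$, so that a ball of radius $\hbar^\delta$ has volume comparable to $\hbar^{2\delta}$, and that the mass captured by the cutoff is bounded below by a multiple of $\hbar^{\delta}$; the second assumption is a reading of the estimate above and should be checked against Sogge et al. (2005).

```latex
% Bookkeeping sketch under the assumptions stated in the lead-in (dim M = 2).
\begin{align*}
\|\varphi_\mu\|_{L^\infty}^2
  &\ge \frac{1}{\mathrm{Vol}(B(z; \hbar^\delta))}
       \int_{B(z; \hbar^\delta)} |\varphi_\mu|^2 \, d\mathrm{Vol} \\
  &\gg \hbar^{-2\delta} \cdot \hbar^{\delta} = \hbar^{-\delta} .
\end{align*}
% Letting \delta increase to 1 and setting \hbar = \lambda^{-1}:
\[
\|\varphi_\mu\|_{L^\infty} \gg_\epsilon \lambda^{\frac{1}{2} - \epsilon} .
\]
```

This is why the full range $0 < \delta < 1$ matters: stopping at $\delta < 1/2$, as in the standard calculus, would only give the exponent $1/4$.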


Eigenfunction Upper Bounds: 
Quantitative Results 


In light of the $\Omega$-bounds in Theorem 5, it is natural 
to ask whether there are analogous upper bounds 
for $L^\infty(\lambda; g)$ in the QCI case. The following result 
holds in the case of real-analytic surfaces: 


Theorem 6 (Sogge et al. 2005). Let (M, g) be a
real-analytic Riemannian 2-manifold and let P_1 = √Δ
and P_2 be a QCI system on (M, g), where the
principal symbol p_2 of P_2 is a metric form on T*M.


(i) If M ≅ T^2,
L™(A;g) = O(A"") 


(ii) If M ≅ S^2, let M_rec be the set of completely
recurrent points for the geodesic flow,
G^t: T*M → T*M, and let Ω ⊂ M be an open
neighborhood of M_rec. Then,


L(A; 8) M-a = OO") 


An old result of Kozlov says that if the surface
(M, g) is analytic, then topologically either M ≅ S^2
or M ≅ T^2, so that the estimates in Theorem 6 cover
all possible cases in two dimensions. The assumptions
in Theorem 6 are satisfied in many examples
including surfaces of revolution, Liouville surfaces,
and ellipsoids with distinct axes in R^3.

The proof of Theorem 6 follows from a pointwise 
(joint) trace formula argument (Duistermaat and 
Guillemin 1975). Namely, in Sogge et al. (2005), it 
is shown that if there are no blow-down points for 
G,, then for appropriate p € S(R) with p> 0 and 
ô € CX(R), 


S>o( 6" [uP — b1]) -p(6* Jn?) - bo} 


x |p, (x; h) |? = O18") [23] 


where the estimate in [23] is uniform in x ∈ M and
locally uniform in b = (b_1, b_2) ∈ B. Part (ii) follows
from this. To prove part (i), one applies a simple
homological argument to show that if M ≅ T^2, there
cannot exist blow-down points for the geodesic flow
(see also Sogge and Zelditch (2002)).


Open Problems 


Most questions related to eigenfunction blow-up are 
completely open and general results are rare (Sogge 
and Zelditch 2002). Specific results/conjectures in 
the ergodic case can be found in Quantum Ergodi- 
city and Mixing of Eigenfunctions. We would like to 
point out here some specific questions related to the 
above results in the QCI case: 


1. All the known examples with blow-down points 
turn out to be integrable. Is this necessarily 
always the case? 

2. Does the maximal bound L^∞(λ; g) ~ λ^{(n−1)/2}
necessarily imply that (M, g) is QCI?


3. At the other extreme, does the minimal bound
L^∞(λ; g) ~ 1 necessarily imply that (M, g) is flat,
or do there exist nonflat manifolds (which are
necessarily not QCI) satisfying L^∞(λ; g) ~ 1?

Introduction


The goal of statistical mechanics is to calculate the 
macroscopic properties of matter from a knowledge 
of the fundamental interactions between the con- 
stituent microscopic components. For simplicity, let 
us assume discrete states. The mathematical prob- 
lem, as formulated by Gibbs, is then to calculate the 
partition function 


Z_N = \sum_{\text{states } \sigma} e^{-\beta H(\sigma)} [1]


where β = 1/k_B T is the inverse temperature, k_B is
the Boltzmann constant, and the Hamiltonian H
describes the interaction energy of the state σ of the
N constituent degrees of freedom. The formidable
nature of the problem ensues from the fact that Z_N
is needed in the limit of an arbitrarily large system
to obtain the bulk free energy ψ(T) or partition
function per site κ in the thermodynamic limit


-\beta\psi(T) = \lim_{N\to\infty} \frac{1}{N} \log Z_N = \log\kappa [2]


This limit generally exists because the free energy of a 
finite system is extensive, that is, it grows proportion- 
ally with the system size. Once the bulk free energy is 
known, the other thermodynamic potentials are 
obtained, in principle, by taking derivatives with 
respect to the temperature T and other thermodynamic 
fields such as the volume V or the external magnetic 
field h. Phase transitions and the accompanying critical 
phenomena are associated with singularities of the 
bulk free energy as a function of the thermodynamic 
fields. Up until the beginning of the 1970s, there were 
only a handful of two-dimensional lattice models that
had yielded exact solutions, most notably the Ising
model (free-fermion or dimer model), the spherical
model, the square ice, and six-vertex models. This
situation changed dramatically with Baxter’s solution 
of the eight-vertex and hard-hexagon models. The 
methods developed by Baxter make it possible to solve 
an infinite plethora of two-dimensional lattice models. 
In this article, we compare and contrast the remark- 
able properties of these two prototypical models that 
played such a pivotal role in the emergence of the 
modern theory of Yang—Baxter integrability. 


Definition of the Models 
Eight-Vertex Model 


The eight-vertex model emerged from the study of
two-dimensional ferroelectrics. The local degrees of
freedom are arrow states α, β, γ, δ = ±1 which live
on the edges of the elementary faces of the square
lattice and describe the local polarization within the
ferroelectric material. Of the 16 possible configurations
around a face, the local configurations of an
elementary square face are restricted to the eight
configurations shown in Figure 1.
The partition function is 


Z_N = \sum_{\text{arrow states}} \prod_{\text{faces}} W(\alpha, \beta, \gamma, \delta) [3]


where the Boltzmann face weights are given alter- 
native graphical representations as a face or vertex 


[Graphical representation of the weight W(α, β, γ, δ) as a face and as a vertex] [4]





Figure 1 The eight vertex configurations of the eight-vertex
model showing one of the two corresponding configurations of
the related Ising model. The model is solvable in the symmetric
case, w_1 = w_5, w_2 = w_6, w_3 = w_7, w_4 = w_8, when the Boltzmann
weights are equal in pairs under arrow reversal.


In the face representation, the arrow states are often
called bond variables. Formally, the Hamiltonian is
a sum over local energies H = Σ_faces E(α, β, γ, δ),
where W(α, β, γ, δ) = exp(−βE(α, β, γ, δ)), with β in
the prefactor denoting the inverse temperature rather
than an arrow state, but we use face weights since E
is infinite for excluded configurations. The general
eight-vertex model includes many other ferroelectric
models including the rectangular Ising model, Slater's
model of potassium dihydrogen phosphate (KDP), the
Rys F model of an antiferroelectric, the square ice
model and the six-vertex model solved by Lieb. In the
case of the six-vertex model, w_4 = w_8 = 0, so arrows
are conserved with "two in" and "two out" at each vertex.

The eight-vertex model can be formulated as an 
Ising model with spins a,b,c,d = +1 at the corners 
of the elementary faces and Boltzmann face weights 


W\begin{pmatrix} d & c \\ a & b \end{pmatrix} = R \exp(Kac + Lbd + Mabcd) [5]


The four independent vertex weights are related to 
R,K,L,M by 


w_1 = w_5 = R e^{K+L+M}, \quad w_2 = w_6 = R e^{-K-L+M}
w_3 = w_7 = R e^{K-L-M}, \quad w_4 = w_8 = R e^{-K+L-M} [6]


This is not the usual rectangular Ising model since it 
involves four-spin interactions in addition to two-spin 
interactions. The spins and arrows are related by 


\alpha = ab, \quad \beta = bc, \quad \gamma = cd, \quad \delta = da [7]


This mapping is one-to-two, since we can arbitrarily
fix one spin somewhere on the lattice. It follows that
Z_Ising = 2 Z_vertex. The eight-vertex model obviously
includes the six-vertex (w_4 = w_8 = 0) and the rectangular
Ising models (M = 0). Although it is not at all
obvious, the three-spin Ising model is also included
as a special case (K = M, L = 0).
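The counting behind the spin-arrow correspondence can be checked by brute force on a single face. The sketch below assumes the reading α = ab, β = bc, γ = cd, δ = da of relation [7] and the face weight of [5]; the function names and the numerical couplings are illustrative only and do not come from the article.

```python
import math
from collections import defaultdict
from itertools import product


def arrows_from_spins(a, b, c, d):
    """Arrow (bond) variables on the four edges of a face, following [7]."""
    return (a * b, b * c, c * d, d * a)


def face_weight(a, b, c, d, K, L, M, R=1.0):
    """Ising-form face weight of [5]."""
    return R * math.exp(K * a * c + L * b * d + M * a * b * c * d)


# Group the 16 corner-spin configurations by the arrow configuration they induce.
preimages = defaultdict(list)
for spins in product((+1, -1), repeat=4):
    preimages[arrows_from_spins(*spins)].append(spins)

n_arrow_configs = len(preimages)
preimage_sizes = sorted(len(v) for v in preimages.values())

# Every induced arrow configuration has alpha * beta * gamma * delta = +1.
all_even = all(al * be * ga * de == 1 for (al, be, ga, de) in preimages)

# Illustrative couplings (arbitrary values, not taken from the article).
K, L, M = 0.3, 0.5, 0.2
all_spins = list(product((+1, -1), repeat=4))

# Reversing both spins on either diagonal leaves the face weight unchanged.
diagonal_flip_invariant = all(
    math.isclose(face_weight(a, b, c, d, K, L, M),
                 face_weight(-a, b, -c, d, K, L, M))
    and math.isclose(face_weight(a, b, c, d, K, L, M),
                     face_weight(a, -b, c, -d, K, L, M))
    for a, b, c, d in all_spins
)

# Only four distinct weights occur, matching the four independent vertex weights.
distinct_weights = {round(face_weight(a, b, c, d, K, L, M), 12)
                    for a, b, c, d in all_spins}

print(n_arrow_configs, preimage_sizes, all_even,
      diagonal_flip_invariant, len(distinct_weights))
```

Each of the eight allowed arrow configurations arises from exactly two spin configurations related by a global spin flip, which is the local origin of the factor of 2 between the Ising and vertex partition functions.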

Notice that the eight-vertex face weights are
invariant under spin reversal of the spins on either
diagonal. This Z_2 × Z_2 symmetry, which the eight-vertex
model shares with the Ashkin-Teller model,
is peculiar because it allows the model to exhibit
continuously varying critical exponents. Because of
symmetries and duality, it is sufficient to consider
the regime w_1 > w_2 + w_3 + w_4 with w_2, w_3, w_4 > 0.
In terms of spins, this corresponds to the 


ferromagnetically ordered phase; in terms of vertices 
or arrows, this corresponds to the ferroelectric 
phase. The eight-vertex model is critical on the 
four surfaces 


w_1 = w_2 + w_3 + w_4, \quad w_2 = w_1 + w_3 + w_4
w_3 = w_1 + w_2 + w_4, \quad w_4 = w_1 + w_2 + w_3 [8]

A convenient parameter to measure the departure
from criticality t = (T − T_c)/T_c is

t = -\frac{1}{16 w_1 w_2 w_3 w_4} (w_1 - w_2 - w_3 - w_4)(w_1 - w_2 + w_3 + w_4)(w_1 + w_2 - w_3 + w_4)(w_1 + w_2 + w_3 - w_4) [9]
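As a quick numerical illustration, the parameter t can be evaluated directly from the four vertex weights. The sketch below assumes the form t = −(w1 − w2 − w3 − w4)(w1 − w2 + w3 + w4)(w1 + w2 − w3 + w4)(w1 + w2 + w3 − w4)/(16 w1 w2 w3 w4), with the overall minus sign chosen so that t is negative in the ordered regime w1 > w2 + w3 + w4; the function name and sample weights are hypothetical.

```python
def criticality_parameter(w1, w2, w3, w4):
    """Departure from criticality t for the eight-vertex model (assumed form of [9])."""
    prod = ((w1 - w2 - w3 - w4) * (w1 - w2 + w3 + w4)
            * (w1 + w2 - w3 + w4) * (w1 + w2 + w3 - w4))
    return -prod / (16.0 * w1 * w2 * w3 * w4)


# On the critical surface w1 = w2 + w3 + w4 the first factor vanishes.
t_critical = criticality_parameter(6, 1, 2, 3)

# Inside the ordered regime w1 > w2 + w3 + w4 all four factors are positive, so t < 0.
t_ordered = criticality_parameter(10, 1, 2, 3)

# Equal weights lie in the disordered region, where t > 0.
t_disordered = criticality_parameter(1, 1, 1, 1)

print(t_critical, t_ordered, t_disordered)
```

Because the expression is symmetric under permutations of the weights up to the choice of which weight dominates, the same function locates all four critical surfaces.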


Because of the unusual four-spin interaction, it is 
difficult to realize the eight-vertex model experi- 
mentally in the laboratory. 


Hard-Hexagon Model 


The hard-hexagon model is a two-dimensional 
lattice model of a gas of hard nonoverlapping 
particles. The particles are placed on the sites of a 
triangular lattice with nearest-neighbor exclusion 
so that no two particles are together or adjacent. 
Effectively, the triangular lattice is partially cov- 
ered with nonoverlapping hard tiles of hexagonal 
shape. Let us draw the triangular lattice as a 
square lattice with one set of diagonals as in 
Figure 2. The partition function for the hard- 
hexagon model is 


Z_N = \sum_{n=0}^{N} z^n g(n, N) [10]


where z > 0 is the activity and g(n, N) is the number 
of ways of placing n particles on the N sites such 
that no two particles are together or adjacent. To 
each lattice site j, assign a spin or occupation 











Figure 2 The triangular lattice drawn as a square lattice with 
one set of diagonals. The close-packed arrangement of particles 
(solid circles) fills one of the three independent sublattices. One 
of the nonoverlapping hard hexagons is shown shaded. At low 
activities, the hard hexagons are sparsely scattered on the 
lattice with no preferential occupation of a particular sublattice. 
number σ_j: if the site is empty, σ_j = 0; if the site is
full, σ_j = 1. The partition function can then be
written in terms of spins as


Z_N = \sum_{\text{spins } \sigma} \prod_{\langle ij \rangle} z^{(\sigma_i + \sigma_j)/6} (1 - \sigma_i \sigma_j) [11]


where the product is over all bonds ⟨ij⟩ of the
triangular lattice and the sum is over all configurations
of the N spins or occupation numbers σ_j = 0, 1. The
exponent of z arises because the activity is shared out
between the six bonds incident at each site. The
remaining term, (1 − σ_i σ_j) = 0, 1, ensures that neighboring
sites are not occupied simultaneously by
excluding such terms from the sum.
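For a very small lattice the two forms of the partition function can be compared by direct enumeration. The sketch below is only an illustration: it assumes an L × L periodic patch of the triangular lattice drawn as a square lattice with one set of diagonals, and all function names are hypothetical.

```python
from itertools import product


def triangular_bonds(L):
    """Bonds of an L x L periodic triangular lattice, drawn as a square
    lattice with one set of diagonals."""
    bonds = set()
    for i in range(L):
        for j in range(L):
            site = i * L + j
            for di, dj in ((1, 0), (0, 1), (1, 1)):
                nbr = ((i + di) % L) * L + (j + dj) % L
                bonds.add(frozenset((site, nbr)))
    return [tuple(sorted(b)) for b in bonds]


def count_configurations(L):
    """g(n, N): number of ways to place n particles with no two adjacent."""
    N = L * L
    bonds = triangular_bonds(L)
    g = [0] * (N + 1)
    for sigma in product((0, 1), repeat=N):
        if all(sigma[i] * sigma[j] == 0 for i, j in bonds):
            g[sum(sigma)] += 1
    return g


def partition_function_from_counts(g, z):
    """Sum over particle numbers, as in the activity expansion."""
    return sum(g_n * z ** n for n, g_n in enumerate(g))


def partition_function_from_bonds(L, z):
    """Sum over occupation numbers of a product of bond weights."""
    bonds = triangular_bonds(L)
    total = 0.0
    for sigma in product((0, 1), repeat=L * L):
        weight = 1.0
        for i, j in bonds:
            weight *= z ** ((sigma[i] + sigma[j]) / 6.0) * (1 - sigma[i] * sigma[j])
        total += weight
    return total


g = count_configurations(3)
print(g[:4], partition_function_from_counts(g, 2.0),
      partition_function_from_bonds(3, 2.0))
```

On the 3 × 3 torus every site has six distinct neighbors, so each occupied site collects six factors of z^{1/6} and the two sums agree; the three close-packed states are exactly the three sublattices.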

The activity z gives the a priori probability of
finding a particle at a given site and can be written
as z = e^{βμ}, where μ is the chemical potential. The
density of particles increases monotonically as the
activity increases but at most a third of the total lattice
sites can be occupied. At low activities, there are
only a few particles scattered randomly so the S_3
sublattice symmetry of the triangular lattice is
preserved. However, at higher activities approaching
the close-packing limit, there is a sudden change and
one of the three sublattices is preferentially occupied
so the S_3 sublattice symmetry is spontaneously
broken. This dramatic change signals an order-disorder
phase transition at some critical value z_c of
the activity. The system is disordered below the
critical activity but is ordered above it. The fundamental
problem is to obtain the statistical properties
of this model such as the bulk free energy and the
sublattice densities


\rho_k = \langle \sigma_k \rangle = \{\text{fraction of spins sitting on sublattice } k = 1, 2, 3\} [12]


in the thermodynamic limit N — oo. The mean 
density is 


\rho = (\rho_1 + \rho_2 + \rho_3)/3 \le 1/3 [13]


Assuming that sublattice k=1 is preferentially 
occupied, an order parameter is defined by 


R = \rho_1 - \rho_2 [14]


The order parameter vanishes in the disordered regime 
but is nonzero in an ordered regime. Notice that the 
symmetry between sublattices k = 2 and 3 is not broken. 

Unlike the eight-vertex model, the hard-hexagon 
model can be realized by a physical system in the 
laboratory, namely helium adsorbed on a graphite 
surface. The graphite substrate is composed of 
hexagonal cells formed by six carbon atoms with 
an interatomic distance of 2.46 Å. Energetically, the
adsorbed helium atoms prefer to sit in the potential
well at the center of the hexagonal cells. The
diameter of the helium atom, however, is 2.56 Å,
which precludes the simultaneous occupation of 
neighboring cells by excluded volume effects. Some 
beautiful experiments carried out by Bretz indicate 
that this system undergoes a phase transition. 
Indeed, Bretz took precise measurements of the 
specific heat as the temperature or, equivalently, the 
activity z, is varied, and obtained a symmetric 
power-law divergence at the critical point 


C \sim |z - z_c|^{-\alpha}, \quad \alpha \approx 0.36 [15]


with critical exponent α close to 1/3. Of course, one
does not actually see divergences experimentally.
Rather, it is the presence of dramatic peaks in the
specific heat that is the hallmark of a second-order
transition.


Yang-Baxter Equations and Commuting 
Transfer Matrices 


Yang-Baxter Equations 


The eight-vertex and hard-hexagon models were 
solved by Rodney Baxter at the beginning of the 
1970s and 1980s, respectively. Although the two
models are quite different in nature, they are
quintessential examples of exactly solvable lattice models.
The seminal work of Baxter gives a precise criterion 
to decide if a two-dimensional lattice model is 
exactly solvable: it is exactly solvable if its local 
face weights satisfy the celebrated Yang—Baxter 
equation. We present a general formulation of the 
Yang-Baxter equations and commuting transfer 
matrices and then show how Baxter implemented 
these for the eight-vertex and hard-hexagon models. 
The first important step in the exact solution of a 
two-dimensional lattice model is the parametrization 
of the Boltzmann weights in terms of a distinguished 
variable u called the spectral parameter. Typically, 
critical models involve trigonometric or hyperbolic 
functions and off-critical models involve elliptic 
functions of the spectral parameter. In terms of u, 
the local Boltzmann weights of a general two- 
dimensional lattice model take the form 


W\left(\begin{matrix} d & \gamma & c \\ \delta & & \beta \\ a & \alpha & b \end{matrix} \,\middle|\, u\right) [16]


where the allowed values of the spins a, b, c, ... and
arrows (or bond variables) α, β, γ, ... may be
restricted by certain constraints. The spins a, b, c, d
are absent for the eight-vertex model and the arrows
α, β, γ, δ are absent for the hard-hexagon model.

The general Yang-Baxter equations take the 
following algebraic and graphical forms: 


e dd dyc 
Ju | u( B 
Cg gnb 


e oa 
n h 
g ce 








f g 
swf: n 


aab 


8m,&,Ç 





Graphically, this equation can be interpreted as saying 
that the diamond-shaped face with spectral parameter 
v — u can be pushed through from the right to the left 
with the effect of interchanging the spectral para- 
meters u and v in the remaining two faces. 
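A concrete instance of such a relation can be checked numerically. The sketch below does not use the general spin-and-arrow weights of this article; as a stand-in it takes the familiar trigonometric six-vertex R-matrix in vertex language, with weights a = sin(u + η), b = sin u, c = sin η (η is an assumed name for the crossing parameter), and verifies R12(u − v) R13(u) R23(v) = R23(v) R13(u) R12(u − v) on three two-state spaces. All helper names are illustrative.

```python
import math


def matmul(A, B):
    """Plain-Python product of two dense matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def kron(A, B):
    """Kronecker product of two dense matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


I2 = [[1.0, 0.0], [0.0, 1.0]]
SWAP = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]


def r_matrix(u, eta):
    """Trigonometric six-vertex R-matrix in the basis ++, +-, -+, --."""
    a, b, c = math.sin(u + eta), math.sin(u), math.sin(eta)
    return [[a, 0.0, 0.0, 0.0],
            [0.0, b, c, 0.0],
            [0.0, c, b, 0.0],
            [0.0, 0.0, 0.0, a]]


def yang_baxter_residual(u, v, eta):
    """Largest entry of R12(u-v) R13(u) R23(v) - R23(v) R13(u) R12(u-v)."""
    P23 = kron(I2, SWAP)
    R12 = kron(r_matrix(u - v, eta), I2)
    # R13 is obtained from an operator on spaces 1,2 by swapping spaces 2 and 3.
    R13 = matmul(P23, matmul(kron(r_matrix(u, eta), I2), P23))
    R23 = kron(I2, r_matrix(v, eta))
    lhs = matmul(R12, matmul(R13, R23))
    rhs = matmul(R23, matmul(R13, R12))
    return max(abs(x - y) for row_l, row_r in zip(lhs, rhs)
               for x, y in zip(row_l, row_r))


print(yang_baxter_residual(0.37, 1.21, 0.6))
```

The residual is at the level of floating-point rounding for any choice of u, v, and η, which is the numerical counterpart of pushing one face through the other two.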


Commuting Transfer Matrices 


A square lattice is built up row-by-row using the 
row transfer matrix T(u) with matrix elements 


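\langle a, \alpha | T(u) | c, \gamma \rangle = \sum_{\beta_1, \ldots, \beta_N} \prod_{j=1}^{N} W\left(\begin{matrix} c_j & \gamma_j & c_{j+1} \\ \beta_j & & \beta_{j+1} \\ a_j & \alpha_j & a_{j+1} \end{matrix} \,\middle|\, u\right) [18]

[Graphical form: a single row of N faces with lower spins and arrows a_1, α_1, a_2, α_2, ..., upper spins and arrows c_1, γ_1, c_2, γ_2, ..., and spectral parameter u in each face] [19]


Here there are N columns, and periodic boundary
conditions are applied so that a_{N+1} = a_1, β_{N+1} = β_1,
and so on. The significance of the Yang-Baxter
equations is that they imply a one-parameter family
of commuting transfer matrices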


T(u) T(v) = T(v) T(u) [20]


Pictorially, the product on the left is represented by 
two rows, one above the other, the lower row with 
spectral parameter u and the upper row with 
spectral parameter v. The matrix product implies 


that the spins and arrows on the intervening row are 
summed out. Inserting a diamond-shaped face with 
spectral parameter v —u and then using the local 
Yang—Baxter equation to progressively push it from 
right to left around the period interchanges all of the 
spectral parameters u with the spectral parameter v. 
At the end, the diamond-shaped face is removed 
again. This heuristic argument was made rigorous 
by Baxter, who showed quite generally, and for the 
eight-vertex and hard-hexagon models in particular, 
that the diamond faces are in fact invertible: 


d d 
OZ NEUZ SJ 
Oe: 
SE ù b 
= \rho(u)\, \delta(a, c)\, \delta(\alpha, \beta)\, \delta(\gamma, \delta) [21]


independent of b, d, where the scalar function ρ(u) is
model dependent. This equation is called the
inversion relation.

Invariably, the existence of commuting transfer 
matrices leads to functional equations satisfied by 
the transfer matrices. Typically, the transfer matrices 
can be simultaneously diagonalized and so the 
functional equations can be solved for the eigen- 
values of the transfer matrices. Mathematically, this 
is where Yang—Baxter techniques derive their power. 
For example, building up the lattice row-by-row, we 
see that the partition function of an M x N lattice is 


Z_{MN} = \operatorname{Tr} T(u)^M = \sum_n T_n(u)^M [22]


where T_n(u) are the eigenvalues of T(u). Typically,
by the Perron-Frobenius theorem, the largest eigenvalue
T_0(u) is real, positive, and nondegenerate:


T_0(u) > |T_1(u)| \ge |T_2(u)| \ge \cdots [23]


Consequently,


-\beta\psi = \lim_{N\to\infty} \lim_{M\to\infty} \frac{1}{MN} \log \sum_n T_n(u)^M = \lim_{N\to\infty} \frac{1}{N} \log T_0(u) [24]

Thus the calculation of the bulk free energy is
reduced to the problem of finding the largest
eigenvalue of the transfer matrix.
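The mechanism can be seen in the simplest possible setting. The sketch below uses the one-dimensional Ising chain rather than either model of this article, purely as an illustration: the 2 × 2 transfer matrix, the reduced couplings K and h, and all function names are assumptions introduced here. It compares the trace formula with a brute-force sum over states and shows the free energy per site approaching the logarithm of the largest eigenvalue.

```python
import math
from itertools import product


def transfer_matrix(K, h):
    """T[s][s'] = exp(K s s' + h (s + s') / 2) for s, s' in (+1, -1)."""
    return [[math.exp(K + h), math.exp(-K)],
            [math.exp(-K), math.exp(K - h)]]


def eigenvalues_2x2_symmetric(T):
    """Eigenvalues (largest first) of a real symmetric 2 x 2 matrix."""
    half_trace = (T[0][0] + T[1][1]) / 2.0
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    disc = math.sqrt(half_trace * half_trace - det)
    return half_trace + disc, half_trace - disc


def z_brute_force(N, K, h):
    """Partition function of a periodic chain by summing over all 2^N states."""
    total = 0.0
    for s in product((+1, -1), repeat=N):
        exponent = sum(K * s[j] * s[(j + 1) % N]
                       + 0.5 * h * (s[j] + s[(j + 1) % N]) for j in range(N))
        total += math.exp(exponent)
    return total


def z_from_eigenvalues(N, K, h):
    """Trace of the N-th power of the transfer matrix."""
    t0, t1 = eigenvalues_2x2_symmetric(transfer_matrix(K, h))
    return t0 ** N + t1 ** N


K, h = 0.4, 0.1
t0, t1 = eigenvalues_2x2_symmetric(transfer_matrix(K, h))
print(z_brute_force(8, K, h), z_from_eigenvalues(8, K, h))
print(math.log(z_from_eigenvalues(200, K, h)) / 200, math.log(t0))
```

In the two-dimensional models the same logic applies with T(u) acting on a whole row, which is why the exact solution hinges on finding the largest eigenvalue of a matrix whose size grows exponentially with N.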


Parametrization of the Eight-Vertex Model 


Using the spin formulation of the eight-vertex 
model, Baxter showed that two transfer matrices 
T(K, L, M), T(K′, L′, M′) commute whenever


\Delta(K, L, M) = \Delta(K', L', M') [25]

where


\Delta(K, L, M) = \sinh 2K \sinh 2L + \tanh 2M \cosh 2K \cosh 2L [26]


If M and Δ are regarded as fixed, this is seen to be a
symmetric biquadratic relation between e^{2K} and e^{2L}
and is naturally parametrized in terms of elliptic
functions. Unfortunately, many different notations 
and conventions for these elliptic functions appear 
in the literature which can be confusing to the 
uninitiated. Let 














a (u) O WAU) RA) 
s= “= hay? "wa 24 
= va (u) P V4 (AÀ zm u) = v4(0) 
MO) FA)? OD) el 


where ϑ_1(u) = ϑ_1(u, q) and ϑ_4(u) = ϑ_4(u, q) are standard
elliptic theta functions of nome q. Then the
vertex weights can be parametrized as


wy = Ru tcc, w = Rn tss 


|29] 


w3 = Ry tcs, w4 = Ru te_s 


In the ferromagnetic regime u, λ, and τ are all pure
imaginary with 0 < q < 1 and 0 < Im u < Im λ <
(π/2) Im τ. The critical line occurs in the limit q → 1.
In this sense, we are using a low-temperature elliptic
parametrization. Another elliptic parametrization,
which is useful to study the critical limit, is obtained
by transforming to the conjugate nome q′. If q = e^{πiτ}
then the conjugate nome is defined by q′ = e^{−πi/τ} so
that q′ → 0 as q → 1.

We regard the crossing parameter λ as constant, u
as a variable, and write the transfer matrix as T(u).
It follows from this parametrization that M and Δ
are constants, independent of u. Furthermore, any
two transfer matrices T(u) and T(v) commute and
hence T(u) is a one-parameter family of commuting
transfer matrices. For interest, we point out that the
integrable XYZ quantum spin chain belongs to this
family. Specifically, the logarithmic derivative of the
eight-vertex transfer matrix yields

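\frac{d}{du} \left[ \log T(u) \right]_{u=0} = H_{XYZ} [30]


where


H_{XYZ} = -\frac{1}{2} \sum_{j=1}^{N} \left( J_x \sigma^x_j \sigma^x_{j+1} + J_y \sigma^y_j \sigma^y_{j+1} + J_z \sigma^z_j \sigma^z_{j+1} \right) [31]


and σ^x_j, σ^y_j, σ^z_j are the usual Pauli spin matrices.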

Parametrization of the Hard-Hexagon Model


Actually, Baxter did not solve the hard-hexagon 
model directly. Instead, he solved a generalized 
hard-hexagon model, which is a model of hard 
squares with interactions along the diagonals of the 
elementary squares as shown in Figure 3. This in
turn corresponds to the A_4 case of the more general
solvable A_L restricted solid-on-solid (RSOS) models
of Andrews, Baxter, and Forrester.

The face weights of the generalized hard-hexagon 
model are 


W\begin{pmatrix} d & c \\ a & b \end{pmatrix} = m\, z^{(a+b+c+d)/4}\, t^{-a+b-c+d} (1 - ab)(1 - bc)(1 - cd)(1 - da) \exp(Lac + Mbd) [32]


Here the activity z has been shared out between
the four faces adjacent to a site, m is a trivial
normalization constant, and t is a gauge parameter
that cancels out of the partition function and
transfer matrix. The anisotropy between L and M
introduces an additional parameter which will 
play the role of the spectral parameter u. In fact, 
using the Yang-Baxter equation, Baxter showed 
that this model is exactly solvable on the 
manifold 


z = (1 - e^{-L})(1 - e^{-M}) / (e^{L+M} - e^{L} - e^{M}) [33]


Specifically, two transfer matrices T(z, L, M) and
T(z′, L′, M′) commute whenever


\Delta(z, L, M) = \Delta(z', L', M'), \quad \Delta(z, L, M) = z^{-1/2} (1 - z e^{L+M}) [34]


The hard-hexagon model is recovered in the limit
L = 0, M = −∞, which forbids simultaneous occupation
of sites joined by one set of diagonals. In this
special limit, the activity z is unconstrained. It is
curious to note that the pure hard-square model
with L = M = 0 is not solvable.

Eliminating z between the above relations gives a
symmetric biquadratic relation between e^L and e^M,














Figure 3 Interacting hard squares showing the diagonal
interactions L and M. The hard-hexagon model corresponds to
the limit L = 0, M = −∞.


which is naturally parametrized in terms of elliptic 
functions. Choosing m and t appropriately, the 
Boltzmann weights are 


w(, o) 7 pee 


v(i J= a F 
v rG oea 8s 


w(? ') _ O(2A =u) 


1 0 OA) 
1 0\  A&A+un) 
w(, 1) = IO) 


Here the crossing parameter is λ = π/5, −λ < u <
2λ, and


= şin u [a ge) 
x (1 — q” — q”) [36] 


is a nonstandard elliptic theta function of nome q².
Despite the deceptive notation, the nome q² lies in the
range −1 < q² < 1 and is determined by the relation
A(X) ]° 2 
= =2(1—zert™ 37 
oy] meat 7 
Regarding q² as fixed and u as a variable, it follows
that T(u) is a one-parameter family of commuting
transfer matrices.
The regimes relevant to the hard-hexagon model
are:


Regime I (disordered): -1 < q^2 < 0, \quad -\pi/5 < u < 0
Regime II (triangular ordered): 0 < q^2 < 1, \quad -\pi/5 < u < 0 [38]


The borderline case q² = 0 corresponds to a line of
critical points. The original hard-hexagon model is
obtained in the limit u → −λ = −π/5, so it follows
that the critical point occurs at


z_c = \left( \frac{1 + \sqrt{5}}{2} \right)^5 = \frac{1}{2} (11 + 5\sqrt{5}) [39]
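The two closed forms of the critical activity, the fifth power of the golden ratio and (11 + 5√5)/2, can be confirmed with a few lines of arithmetic. The variable names below are illustrative only.

```python
import math

# Golden ratio and its fifth power.
golden_ratio = (1 + math.sqrt(5)) / 2
z_c_power = golden_ratio ** 5

# Equivalent closed form of the critical activity.
z_c_closed = (11 + 5 * math.sqrt(5)) / 2

print(z_c_power, z_c_closed)
```

The identity follows from repeatedly applying φ² = φ + 1, which gives φ⁵ = 5φ + 3; numerically the critical activity is approximately 11.09.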





Away from criticality the activity is related to the
nome q² by


Tig cosan + q*"]° 
Aeee wo 
se a OE. 


Functional Equations 
Baxter’s T—Q Relation 


In a tour de force Baxter showed that the transfer 
matrix of the eight-vertex model satisfies the 
functional equation 


T(u) Q(u) = \phi(u)\, Q(u - \lambda) + \phi(u - \lambda)\, Q(u + \lambda) [41]

where φ(u) = (cs)^N = [ϑ_1(u)ϑ_4(u)/ϑ_1(λ)ϑ_4(λ)]^N and
Q(u) is an auxiliary family of matrices, commuting with the
transfer matrices, satisfying [Q(u), Q(v)] = [Q(u),
T(v)] = 0. In principle, these equations, which are
intimately related to the Bethe ansatz, can be solved to
obtain all the eigenvalues of the transfer matrix.
Without entering into the intricacies of solving these
equations, we summarize the results for the partition
function per site κ, correlation length ξ, and interfacial
tension σ. As we have seen, the largest eigenvalue of
the transfer matrix yields κ. The interfacial tension σ
and correlation length ξ were obtained, respectively,
by Baxter and by Johnson, Krinsky, and McCoy by
integrating over (continuous) bands of eigenvalues. In
the ferromagnetic regime, their results are


log(k/w1) 
Naa e a a Te 
E n(1 — q”) (1 + x") 


Et = —flog k(x”), 


maA/2 —1 ariu 


where =p" z= eE 
2 


modulus of nome x^: 


o=hkpT/E J43 


, and k is the elliptic 
Detailed analysis shows that near T_c the free
energy ψ in general behaves as

\psi \sim \cot(\pi^2/2\bar\mu)\, (-t)^{\pi/\bar\mu} \sim (-t)^{2-\alpha}, \quad t \to 0 [45]

\tan(\bar\mu/2) = (w_3 w_4 / w_1 w_2)^{1/2} = e^{-2M} [46]

and α = 2 − π/μ̄ with 0 < μ̄ < π. Exceptional cases
occur, however, if π/μ̄ is an integer. This occurs, for
example, in the case of the rectangular Ising model
(M = 0, μ̄ = π/2), which exhibits a logarithmic singularity
in the specific heat (α = 0_log). Similarly,
using log k(x²) ~ (−t)^{π/2μ̄}, the other associated critical
exponents are

\xi^{-1} \sim (-t)^{\nu}, \quad \sigma \sim (-t)^{\mu}, \quad \nu = \mu = \pi/2\bar\mu [47]

Notice that, due to the special symmetries of the
eight-vertex model, these critical exponents vary
continuously as the four-spin interaction is varied.
This violates the universality hypothesis, which
asserts that the exponents should only depend on
the dimensionality and symmetries and not on the
details of the interactions. Suzuki has suggested that
it is more natural to use the inverse correlation length
ξ^{−1}, rather than the temperature difference T − T_c, to
measure the departure from criticality with the effect
that it is the renormalized critical exponents

\hat\alpha = (2 - \alpha)/\nu, \quad \hat\beta = \beta/\nu, \quad \hat\mu = \mu/\nu [48]

that are independent of the details of the
interactions.


Hard-Hexagon Functional Equation


Baxter and Pearce showed that the normalized row
transfer matrix of the generalized hard-hexagon
model,

t(u) = \left[ \frac{\theta(u + 2\lambda)\, \theta(\lambda)}{\theta(u + \lambda)\, \theta(u - 2\lambda)} \right]^N T(u) [49]

satisfies the simple functional equation

t(u)\, t(u + \lambda) = 1 + t(u - 2\lambda) [50]

where λ = π/5. Since T(u) is a commuting family of
matrices, this equation can be solved for the
eigenvalues T(u) to obtain the partition function
per site κ, correlation length ξ and interfacial
tension σ. Let p = |q²| and s = |q²|^{5/6}; then the results
are summarized as
TT! [1 _ 2g" cos(47/5) te gr (1 _ gl — p — perenne 2 
Ric - al ala aen Aa aaa chine À S &c 
a (L— 2g? cos(2nr/5) + a a 5s 
eo 2 4 2ny2 5 a 
is 4n/5) + q2(1 -qA — p” 
xe] 2 o ae aes 3 & > ke 
a — 24°" cos(2m/5) + q” P (1 — pe") 
\kappa_c = \left[ \frac{27 (25 + 11\sqrt{5})}{250} \right]^{1/2}
oo o ooo _ 
a ee 1+ 3520-1 + s4n-2' — 
iy ai racial os (= = a) 4 4n-2 . 

ee ar 1 —252"-1cos (+a) p g4n—2 » Ze 
[53] 
0, Z <e [54] 

E Ten Z> XK 


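It follows that κ(z), ξ(z), and σ(z) are analytic
functions of z, except at the critical point z = z_c.
The associated critical exponents


\psi \sim (z - z_c)^{2-\alpha}, \quad \xi^{-1} \sim (z - z_c)^{\nu}, \quad \sigma \sim (z - z_c)^{\mu}, \quad \alpha = 1/3, \quad \nu = \mu = 5/6 [55]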


agree with experiments on helium adsorbed on 
graphite. 


Corner Transfer Matrices 


The one-point functions and order parameters of the
eight-vertex and hard-hexagon models were
obtained by Baxter by using corner transfer matrices
(CTMs). The idea is to build up the square lattice
quadrant-by-quadrant as shown in Figure 4. The
partition function and one-point function are then


Z = \operatorname{tr} ABCD, \quad \langle \sigma_0 \rangle = \frac{\operatorname{tr} SABCD}{\operatorname{tr} ABCD} [56]

where S is the diagonal matrix with entries S_{σ,σ} = σ_0
and the entries A_{σ,σ′} are labeled by half-rows of
spins σ = (σ_0, σ_1, σ_2, ...) and σ′ = (σ_0, σ′_1, σ′_2, ...).


























Figure 4 The square lattice divided into four quadrants
corresponding to the CTMs A, B, C, D. The spin at the center
is σ_0. The spins on the boundaries are fixed by the boundary
conditions.


The CTMs have some remarkable properties. If the
Boltzmann weights are invariant under reflections
about the diagonals, as is the case for the eight-vertex
model, Baxter argued that, in the limit of a large lattice,


A(u) = C(u) = B(\lambda - u) = D(\lambda - u) [57]


where A(u) is a commuting family of matrices. Since
these are block matrices in the center spin σ_0, they
also commute with S. Moreover, Baxter showed that
the eigenvalues of A(u) are exponentials of the form


A(u)_{\sigma} = m_\sigma \exp(u E_\sigma) [58]


where the constants m_σ and E_σ can be evaluated in
the low-temperature limit. It follows that


\langle \sigma_0 \rangle = \frac{\sum_\sigma \sigma_0\, m_\sigma^4 e^{2\lambda E_\sigma}}{\sum_\sigma m_\sigma^4 e^{2\lambda E_\sigma}} [59]


When the Boltzmann weights do not exhibit symme- 
try about the diagonals, which is the case for hard 
hexagons, the above arguments need to be modified. 


One-Point Functions of the Eight-Vertex Model 


For the eight-vertex model, Baxter showed that 


1 ois. 
Mo = 1, Es = 51i X jH(oj-1,0), 7441) 
j=1 
H(\sigma_{j-1}, \sigma_j, \sigma_{j+1}) = 1 - \sigma_{j-1} \sigma_{j+1} [60]


subject to the boundary condition σ_N = σ_{N+1} = +1.
Introducing a new set of spins


\mu_j = \sigma_{j-1} \sigma_{j+1}, \quad j = 1, 2, \ldots, N [61]


we have o= Halas. Setting s= (xz)? = em. 
t = (x/z)'/* =e™—/2_ and taking the limit of large 
N, the diagonalized matrices are direct products of 
2 x 2 matrices: 


s-(i Jeh Jef Ajo e 


A(u) = C(u) 
(o s) (o 2) (o aler 6 
B(u) = D(u) 


-G JG el LJe s 


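It follows that the magnetization is


\langle \sigma_0 \rangle = \prod_{n=1}^{\infty} \frac{1 - x^{4n-2}}{1 + x^{4n-2}} = (k')^{1/4} = (1 - k^2)^{1/8} [65]


where k′ = k′(x²) is the conjugate elliptic modulus of
nome x² and the associated critical exponent is


\langle \sigma_0 \rangle \sim (-t)^{\beta}, \quad \beta = \pi/16\bar\mu [66]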


The polarization of the eight-vertex model is 


a) = (on) = (E) 67] 


2n n 
ee kg 








This cannot be obtained by a direct application of 
CTMs but was conjectured by Baxter and Kelland 
and subsequently derived by Jimbo, Miwa, and 
Nakayashiki using difference equations. 


One-Point Functions of the Hard-Hexagon Model 


For hard hexagons, the working is more complicated
because one must keep track of the sublattice of the
central spin σ_0, but fascinating connections emerge
with the Rogers-Ramanujan functions:


G(x) = \prod_{n=1}^{\infty} \frac{1}{(1 - x^{5n-4})(1 - x^{5n-1})}
H(x) = \prod_{n=1}^{\infty} \frac{1}{(1 - x^{5n-3})(1 - x^{5n-2})} [68]
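
The Rogers-Ramanujan identities equate these infinite products with the sums Σ x^{n²}/((1 − x)⋯(1 − x^n)) for G and Σ x^{n(n+1)}/((1 − x)⋯(1 − x^n)) for H. The sketch below checks both identities as truncated power series with integer coefficients; the function names and the truncation order are choices made here for illustration.

```python
def product_side(residues, order):
    """Coefficients of prod 1/(1 - x^m) over m >= 1 with m mod 5 in residues."""
    coeffs = [1] + [0] * order
    for m in range(1, order + 1):
        if m % 5 in residues:
            # Multiply the truncated series by 1/(1 - x^m).
            for k in range(m, order + 1):
                coeffs[k] += coeffs[k - m]
    return coeffs


def sum_side(shift, order):
    """Coefficients of sum_n x^{n(n+shift)} / ((1-x)...(1-x^n)).

    shift = 0 gives the sum form of G(x), shift = 1 the sum form of H(x).
    """
    total = [0] * (order + 1)
    denom_inv = [1] + [0] * order  # series of 1/((1-x)...(1-x^n)), starting at n = 0
    n = 0
    while n * (n + shift) <= order:
        offset = n * (n + shift)
        for k in range(order + 1 - offset):
            total[k + offset] += denom_inv[k]
        n += 1
        # Update the denominator series by multiplying with 1/(1 - x^n).
        for k in range(n, order + 1):
            denom_inv[k] += denom_inv[k - n]
    return total


ORDER = 40
G_product = product_side({1, 4}, ORDER)
H_product = product_side({2, 3}, ORDER)
G_sum = sum_side(0, ORDER)
H_sum = sum_side(1, ORDER)

print(G_product[:7], G_product == G_sum, H_product == H_sum)
```

The agreement of the two sides through the chosen order is a finite check, not a proof, but it makes concrete the kind of identity used to simplify the sublattice densities.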
For hard hexagons, Baxter showed that 
2 200, 2E 
a _ 26707) Wo" gg 


AB Dor w 


where k = 1, 2, 3 labels the sublattice of the triangular
lattice. Here the spin configurations
σ = (σ_0, σ_1, σ_2, ...) with σ_j = 0, 1 are subject to the
constraint σ_j σ_{j+1} = 0 for all j. If |q²| = e^{−ε} and
g(x) = H(x)/G(x) then


x= eT Se re = —x/ g(x), wo=—-x forz<K 
xa=e e rex 'g(x), wo=x A forz>z 
[70] 
and 
oa . 
DIGS); qs. Re 
j=l 
Doak =, 71 
DIG — Gj-19j+1 a 
J= 
SP Sis), Zoe 


For large N, σ_j → s_j, where the ground-state values
s_j determined by the boundary conditions are

z < z_c: \quad s_j = 0 [72]

z > z_c: \quad s_{3j+k} = 1, \quad s_{3j+k\pm 1} = 0, \quad k = 1, 2, 3 [73]
After applying some Rogers-Ramanujan identities
and introducing the elliptic functions


Q(x) = \prod_{n=1}^{\infty} (1 - x^n), \quad P(x) = \frac{Q(x)}{Q(x^2)} = \prod_{n=1}^{\infty} (1 - x^{2n-1}) [74]


the expressions for the sublattice densities simplify 
in the limit of large N giving 
_ xG(x)H(x°) P(x’) 


in the disordered fluid phase and 


_ A(x) Q(x)[G (x) Q(x) + x° A(x”) O(%")] 
Q(x)? 
x*H(x)H(x°)Q(x)Q(x°) 
Q(x)? 


[76] 
pr = p3 = 


[77] 
Z > xe 


in the triangular ordered phase. In principle, the 
dependence on x can be eliminated by observing that 


ar NL 
—1 5 
x“ [G(x)/H(x))’, 


oe 
os [78] 
Z > £e 


In practice, this is quite nontrivial. Although it is far 
from obvious, because x — 1 is a subtle limit, the critical 
exponent associated with the order parameter R is 


R \sim (z - z_c)^{\beta} \sim (q^2)^{\beta}, \quad \beta = 1/9 [79]


Summary 


Baxter’s exact solutions of the eight-vertex and 
hard-hexagon models have been reviewed. These 
prototypical examples clearly illustrate the mathe- 
matical power and elegance of commuting transfer 
matrices and Yang-Baxter techniques. The results 
for the principal thermodynamic quantities, includ- 
ing free energies, correlation lengths, interfacial 
tensions, and one-point functions, have been sum- 
marized. For convenience in comparison, the asso- 
ciated critical exponents are collected in Table 1. All 
these exponents confirm the hyperscaling relation 
2 − α = dν for lattice dimensionality d = 2.
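
The hyperscaling relation can be checked directly against the exponents quoted in this article. The sketch below uses exact rational arithmetic for the Ising and hard-hexagon values and a sample numerical value of the eight-vertex parameter, written mu_bar here; the function names are illustrative.

```python
import math
from fractions import Fraction


def hyperscaling_gap(alpha, nu, d=2):
    """Return (2 - alpha) - d * nu, which vanishes when hyperscaling holds."""
    return (2 - alpha) - d * nu


# Rectangular Ising model: alpha = 0 (logarithmic), nu = 1.
ising_gap = hyperscaling_gap(Fraction(0), Fraction(1))

# Hard hexagons: alpha = 1/3, nu = 5/6.
hard_hexagon_gap = hyperscaling_gap(Fraction(1, 3), Fraction(5, 6))


def eight_vertex_gap(mu_bar):
    """Eight-vertex model: alpha = 2 - pi/mu_bar, nu = pi/(2 mu_bar)."""
    alpha = 2 - math.pi / mu_bar
    nu = math.pi / (2 * mu_bar)
    return hyperscaling_gap(alpha, nu)


print(ising_gap, hard_hexagon_gap, eight_vertex_gap(1.0))
```

For the eight-vertex model the relation holds identically in the parameter, since 2 − α = π/μ̄ = 2ν, so the continuously varying exponents still respect hyperscaling.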

More recently, Yang—Baxter techniques have been 
applied to solve an infinite variety of lattice models in 
two dimensions. Commuting transfer methods have 
Table 1 Comparison of the exactly calculated critical exponents
of the rectangular Ising, eight-vertex and hard-hexagon
models. The rectangular Ising model corresponds to the special
case μ̄ = π/2 of the eight-vertex model. The eight-vertex
exponents vary continuously with 0 < μ̄ < π. The critical
exponents of the hard-hexagon model, with its S_3 symmetry,
lie in the universality class of the three-state Potts model.


Model               α           β          ν         μ

Rectangular Ising   0_log       1/8        1         1
Eight vertex        2 − π/μ̄     π/16μ̄      π/2μ̄      π/2μ̄
Hard hexagons       1/3         1/9        5/6       5/6


also been adapted to study integrable boundaries and 
associated boundary critical behavior. Lastly, it 
should be mentioned that, in the continuum scaling 
limit, there are deep connections with conformal field 
theory and integrable quantum field theory. On the 
one hand, the lattice can often provide a convenient 
way to regularize the infinities that occur in these 
continuous field theories. On the other hand, the field 
theories can predict and explain the universal proper- 
ties of lattice models such as critical exponents. 


See also: Bethe Ansatz; Boundary Conformal Field
Theory; Hopf Algebras and q-Deformation Quantum
Groups; Integrability and Quantum Field Theory;
q-Special Functions; Quantum Spin Systems;
Two-Dimensional Ising Model; Yang-Baxter Equations.

Introduction


Even in a linear theory like Maxwell’s electrody- 
namics, in which sufficiently general solutions of the 
field equations can be obtained, one needs a good 
sample, a useful kit, of explicit exact fields like the 
homogeneous field, the Coulomb monopole field, the 
dipole, and other simple solutions, in order to gain a 
physical intuition and understanding of the theory. In 
Einstein’s general relativity, with its nonlinear field 
equations, the discoveries and analyses of various 
specific explicit solutions revealed most of the 
unforeseen features of the theory. Studies of special 
solutions stimulated questions relevant to more 
general situations, and even after the formulation of 
a conjecture about a general situation, newly dis- 
covered solutions can play a significant role in 
verifying or modifying the conjecture. The cosmic
censorship conjecture, which asserts that “singularities
forming in a realistic gravitational collapse are hidden
inside horizons,” is a good illustration.

Albert Einstein presented the final version of his
gravitational field equations (Einstein's
equations, EEs) to the Prussian Academy in Berlin
on 25 November 1915:


$$R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R = \frac{8\pi G}{c^4} T_{\mu\nu} \qquad [1]$$


Here, the spacetime metric tensor $g_{\mu\nu}(x^\rho)$,
$\mu, \nu, \rho, \ldots = 0, 1, 2, 3$, determines the invariant line element
$g = g_{\mu\nu}\, dx^\mu dx^\nu$, and acts also as a dynamical variable
describing the gravitational field; the Ricci tensor
$R_{\mu\nu} = g^{\rho\sigma} R_{\rho\mu\sigma\nu}$, where
$g^{\mu\rho} g_{\rho\nu} = \delta^\mu_\nu$, is formed from the
Riemann curvature tensor $R_{\rho\mu\sigma\nu}$; both depend
nonlinearly on $g_{\alpha\beta}$ and $\partial_\gamma g_{\alpha\beta}$, and
linearly on $\partial_\gamma \partial_\delta g_{\alpha\beta}$; the
scalar curvature $R = g^{\mu\nu} R_{\mu\nu}$. $T_{\mu\nu}(x^\rho)$ is the
energy-momentum tensor of matter (“sources”); and Newton’s
gravitational constant G and the velocity of light c are
fundamental constants. If not stated otherwise, we use
the geometrized units in which G = c = 1, and the same
conventions as in Misner et al. (1973) and Wald (1984).
For example, in the case of perfect fluid with density $\rho$,
pressure $p$, and 4-velocity $U^\mu$, the energy-momentum
tensor reads $T_{\mu\nu} = (\rho + p) U_\mu U_\nu + p g_{\mu\nu}$. To obtain a
(local) solution of [1] in coordinate patch $\{x^\mu\}$
means to find “physically plausible” (i.e., complying
with one of the positive-energy conditions) functions
$\rho(x^\mu)$, $p(x^\mu)$, $U^\mu(x^\nu)$, and metric $g_{\mu\nu}(x^\rho)$ satisfying [1].
In vacuum $T_{\mu\nu} = 0$ and [1] implies $R_{\mu\nu} = 0$.
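
The vacuum statement follows from taking the trace of [1]. The short derivation below is a sketch in geometrized units (G = c = 1) for a four-dimensional spacetime; the shorthand $T = g^{\mu\nu} T_{\mu\nu}$ is introduced here for convenience and is not part of the original notation.

```latex
% Trace of [1] with G = c = 1; T := g^{\mu\nu} T_{\mu\nu} is shorthand assumed here.
% In four dimensions g^{\mu\nu} g_{\mu\nu} = 4.
\begin{align*}
g^{\mu\nu}\left(R_{\mu\nu} - \tfrac{1}{2} g_{\mu\nu} R\right) &= 8\pi\, g^{\mu\nu} T_{\mu\nu}
  \;\Longrightarrow\; R - 2R = 8\pi T
  \;\Longrightarrow\; R = -8\pi T, \\
R_{\mu\nu} &= 8\pi\left(T_{\mu\nu} - \tfrac{1}{2} g_{\mu\nu} T\right), \\
T_{\mu\nu} = 0 \;&\Longrightarrow\; R_{\mu\nu} = 0 .
\end{align*}
```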

In 1917, Einstein generalized [1] by adding a
cosmological term $\Lambda g_{\mu\nu}$ ($\Lambda$ = const.):


$$R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R + \Lambda g_{\mu\nu} = 8\pi T_{\mu\nu} \qquad [2]$$


A homogeneous and isotropic static solution of [2]
(with metric [8], k = +1, a = const.), in which the
“repulsive effect” of $\Lambda > 0$ compensates the
gravitational attraction of incoherent dust (“uniformly
distributed galaxies”) — the Einstein static universe — 
marked the birth of modern cosmology. Although it is 
unstable and lost its observational relevance after the 
discovery of the expansion of the universe in the late 
1920s, in 2004 a “fine-tuned” cosmological scenario 
was suggested according to which our universe starts 
asymptotically from an initial Einstein static state and 
later enters an inflationary era, followed by a 
standard expansion epoch (see Cosmology: Mathe- 
matical Aspects). There are many other examples of 
“old” solutions which turned out to act as asymptotic 
states of more general classes of models. 


Invariant Characterization 
and Classification of the Solutions 


Algebraic Classification 
The Riemann tensor can be decomposed as

$$R_{\alpha\beta\gamma\delta} = C_{\alpha\beta\gamma\delta} + E_{\alpha\beta\gamma\delta} + G_{\alpha\beta\gamma\delta} \qquad [3]$$

where E and G are constructed from $R_{\alpha\beta}$, R, and
$g_{\alpha\beta}$ (see, e.g., Stephani et al. (2003)); the Weyl
conformal tensor $C_{\alpha\beta\gamma\delta}$ can be considered as the
“characteristic of the pure gravitational field” since,
at a given point, it cannot be determined in terms of
the matter energy-momentum tensor $T_{\alpha\beta}$ (as E and
G can using EEs). Algebraic classification is based
on a classification of the Weyl tensor. This is best
formulated using two-component spinors
$\alpha_A$ (A = 1, 2), in terms of which any Weyl spinor
$\Psi_{ABCD}$ determining $C_{\alpha\beta\gamma\delta}$ can be factorized:

$$\Psi_{ABCD} = \alpha_{(A} \beta_B \gamma_C \delta_{D)} \qquad [4]$$

Brackets denote symmetrization; each of the spinors
determines a principal null direction, say,
$k^\alpha = \alpha^A \bar{\alpha}^{A'}$ (see Spinors and Spin Coefficients). The
Petrov-Penrose classification is based on coincidences
among these directions. A solution is of
type I (general case), II, III, and N (“null”) if all null
directions are different, or two, three, and all four
coincide, respectively. It is of type D (“degenerate”)


166 Einstein Equations: Exact Solutions 


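if there are two double null directions. The
equivalent tensor equations are simplest for type N:

$$C_{\alpha\beta\gamma\delta} k^\delta = 0, \quad C_{\alpha\beta\gamma\delta} C^{\alpha\beta\gamma\delta} = 0, \quad C^*_{\alpha\beta\gamma\delta} C^{\alpha\beta\gamma\delta} = 0 \qquad [5]$$

where $C^*_{\alpha\beta\gamma\delta} = (1/2)\, \epsilon_{\alpha\beta\rho\sigma} C^{\rho\sigma}{}_{\gamma\delta}$, and $\epsilon$ is the Levi-Civita
pseudotensor.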


Classification According to Symmetries 


Most of the available solutions have some exact 
continuous symmetries which preserve the metric. 
The corresponding group of motions is characterized
by the number and properties of its Killing vectors $\xi^\alpha$
satisfying the Killing equation $(\mathcal{L}_\xi g)_{\alpha\beta} = \xi_{\alpha;\beta} + \xi_{\beta;\alpha} = 0$
($\mathcal{L}$ is the Lie derivative) and by the nature (spacelike,
timelike, or null) of the group orbits. For example, 
axisymmetric, stationary fields possess two commuting 
Killing vectors, of which one is timelike. Orbits of the 
axial Killing vector are closed spacelike curves of finite 
length, which vanishes at the axis of symmetry. In 
cylindrical symmetry, there exist two spacelike com- 
muting Killing vectors. In both cases, the vectors 
generate a two-dimensional abelian group. The two- 
dimensional group orbits are timelike in the stationary 
case and spacelike in the cylindrical symmetry. 

If a timelike $\xi^\alpha$ is hypersurface orthogonal,
$\xi_\alpha = \lambda \Phi_{,\alpha}$ for some scalar functions $\lambda$, $\Phi$, the spacetime
is “static.” In coordinates with $\xi = \partial_t$, the metric is

$$g = -e^{2U} dt^2 + e^{-2U} \gamma_{ik}\, dx^i dx^k \qquad [6]$$

where $U$, $\gamma_{ik}$ do not depend on t. In vacuum, U satisfies
the potential equation $U^{:a}{}_{:a} = 0$; the covariant derivatives
(denoted by :) are with respect to the three-dimensional
metric $\gamma_{ik}$. A classical result of Lichnerowicz states that
if the vacuum metric is smooth everywhere and $U \to 0$
at infinity, the spacetime is flat (for refinements, see
Anderson (2000)).

In cosmology, we are interested in groups whose 
regions of transitivity (points can be carried into 
one another by symmetry operations) are three- 
dimensional spacelike hypersurfaces (homogeneous 
but anisotropic models of the universe). The three- 
dimensional simply transitive groups G3 were 
classified by Bianchi in 1897 according to the 
possible distinct sets of structure constants but 
their importance in cosmology was discovered only 
in the 1950s. There are nine types: Bianchi I to 
Bianchi IX models. The line element of the Bianchi 
universes can be expressed in the form 


$$g = -dt^2 + g_{ab}(t)\, \omega^a \omega^b \qquad [7]$$

where the time-independent 1-forms $\omega^a = E^a_\alpha\, dx^\alpha$
satisfy the relations $d\omega^a = -\frac{1}{2} C^a_{bc}\, \omega^b \wedge \omega^c$, d is
the exterior derivative and $C^a_{bc}$ are the structure
constants (see Cosmology: Mathematical Aspects for
more details).

The standard Friedmann-Lemaître-Robertson-Walker
(FLRW) models admit in addition an
isotropy group SO(3) at each point. They can be
represented by the metric

$$g = -dt^2 + [a(t)]^2 \left( \frac{dr^2}{1 - k r^2} + r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2) \right) \qquad [8]$$

in which a(t), the “expansion factor,” is determined by
matter via EEs, the curvature index k = −1, 0, +1, the
three-dimensional spaces t = const. have a constant
curvature $K = k/a^2$; $r \in [0, 1]$ for closed (k = +1)
universe, $r \in [0, \infty)$ in open (k = 0, −1) universes (for
another description, see Cosmology: Mathematical
Aspects).

There are four-dimensional spacetimes of constant
curvature solving EEs [2] with $T_{\mu\nu} = 0$: the Minkowski,
de Sitter, and anti-de Sitter spacetimes. They admit the
same number (ten) of independent Killing vectors, but
interpretations of the corresponding symmetries differ
for each spacetime.

If $\xi^\alpha$ satisfies $\mathcal{L}_\xi g_{\alpha\beta} = 2\Phi g_{\alpha\beta}$, $\Phi$ = const., it is called a
homothetic (Killing) vector. Solutions with proper
homothetic motions, $\Phi \neq 0$, are “self-similar.” They
cannot in general be asymptotically flat or spatially 
compact but can represent asymptotic states of more 
general solutions. In Stephani et al. (2003), a summary 
of solutions with proper homotheties is given; their 
role in cosmology is analyzed by Wainwright and Ellis 
(eds.) (1997); for mathematical aspects of symmetries 
in general relativity, see Hall (2004). 

There are other schemes for invariant classifica- 
tion of exact solutions (reviewed in Stephani et al. 
(2003)): the algebraic classification of the Ricci 
tensor and energy-momentum tensor of matter; the 
existence and properties of preferred vector fields 
and corresponding congruences; local isometric 
embeddings into flat pseudo-Euclidean spaces, etc. 


Minkowski (M), de Sitter (dS), 
and Anti-de Sitter (AdS) Spacetimes 


These metrics of constant (zero, positive, negative)
curvature are the simplest solutions of [2] with $T_{\mu\nu} = 0$
and $\Lambda = 0$, $\Lambda > 0$, $\Lambda < 0$, respectively. The standard
topology of M is $R^4$. The dS has the topology $R^1 \times S^3$
and is best represented as a four-dimensional hyperboloid
$-v^2 + w^2 + x^2 + y^2 + z^2 = 3/\Lambda$ in a five-dimensional
flat space with metric $g = -dv^2 + dw^2 + dx^2 + dy^2 + dz^2$.
The AdS has the topology $S^1 \times R^3$; it
is a four-dimensional hyperboloid
$-v^2 - w^2 + x^2 + y^2 + z^2 = -3/|\Lambda|$, $\Lambda < 0$,
in flat five-dimensional space
with signature (−, −, +, +, +). By unwrapping the
circle $S^1$ and considering the universal covering space,
one gets rid of closed timelike lines.
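
The hyperboloid picture of dS can be checked numerically. The sketch below assumes the standard global coordinates (tau, chi, theta, phi) on de Sitter space, which are not introduced in the text above; the function names are hypothetical and serve only to illustrate that every such point satisfies the embedding condition with right-hand side 3/Λ.

```python
import math


def de_sitter_embedding(tau, chi, theta, phi, cosmological_constant):
    """Map assumed global dS coordinates to the flat 5D coordinates (v, w, x, y, z)."""
    radius = math.sqrt(3.0 / cosmological_constant)  # hyperboloid radius squared is 3/Lambda
    cosh_tau = math.cosh(tau)
    v = radius * math.sinh(tau)
    w = radius * cosh_tau * math.cos(chi)
    x = radius * cosh_tau * math.sin(chi) * math.sin(theta) * math.cos(phi)
    y = radius * cosh_tau * math.sin(chi) * math.sin(theta) * math.sin(phi)
    z = radius * cosh_tau * math.sin(chi) * math.cos(theta)
    return v, w, x, y, z


def hyperboloid_lhs(point):
    """Left-hand side -v^2 + w^2 + x^2 + y^2 + z^2 of the dS embedding condition."""
    v, w, x, y, z = point
    return -v**2 + w**2 + x**2 + y**2 + z**2


if __name__ == "__main__":
    lam = 0.75  # arbitrary positive value chosen for the illustration
    point = de_sitter_embedding(0.4, 1.1, 0.7, 2.3, lam)
    print(hyperboloid_lhs(point), 3.0 / lam)  # the two numbers should agree
```

The timelike coordinate v carries the single minus sign, so the sections of constant tau are 3-spheres, in line with the topology $R^1 \times S^3$ quoted above.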

These spacetimes are all conformally flat and can 
be conformally mapped into portions of the Einstein 
universe (see Asymptotic Structure and Conformal 
Infinity). However, their conformal structure is 
globally different. In M, one can go to infinity
along timelike/null/spacelike geodesics and reach
five qualitatively different sets of points: future/past
timelike infinity $i^\pm$, future/past null infinity $\mathscr{I}^\pm$, and
spacelike infinity $i^0$. In dS, there are only past and
future conformal infinities $\mathscr{I}^-$, $\mathscr{I}^+$, both being
spacelike (on the Einstein cylinder, the dS spacetime is a
“horizontal strip” with $\mathscr{I}^+/\mathscr{I}^-$ as the “upper/lower
circle”). The conformal infinity in AdS is timelike.

As a consequence of spacelike $\mathscr{I}^\pm$ in dS, there
exist both particle (cosmological) and event horizons
for geodesic observers (Hawking and Ellis 1973). dS
plays a (doubly) fundamental role in present-day
cosmology: it is an approximate model for the
inflationary paradigm near the big bang and it is also the
asymptotic state (at $t \to \infty$) of cosmological models
with a positive cosmological constant. Since recent
observations indicate that $\Lambda > 0$, it appears to
describe the future state of our universe. AdS has
come recently to the fore due to the “holographic” 
conjecture (see AdS/CFT Correspondence). 

Christodoulou and Klainerman, and Friedrich
proved that M, dS, and AdS are stable with respect
to general, nonlinear (though “weak”) vacuum
perturbations, a result not known for any other
solution of EEs (see Stability of Minkowski Space).


Schwarzschild and Reissner-Nordström
Metrics


These are spherically symmetric spacetimes: the
SO(3) rotation group acts on them as an isometry
group with spacelike, two-dimensional orbits. The
metric can be brought into the form

$$g = -e^{2\nu} dt^2 + e^{2\lambda} dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2) \qquad [9]$$

where $\nu(t, r)$, $\lambda(t, r)$ must be determined from EEs. In vacuum,
we are led uniquely to the Schwarzschild metric

$$g = -\left(1 - \frac{2M}{r}\right) dt^2 + \left(1 - \frac{2M}{r}\right)^{-1} dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2) \qquad [10]$$

where M = const. has to be interpreted as mass, as
test particle orbits show. The spacetime is static at
r > 2M, that is, outside the Schwarzschild radius at
r = 2M, and asymptotically ($r \to \infty$) flat.

Metric [10] describes the exterior gravitational field
of an arbitrary (static, oscillating, collapsing, or 
expanding) spherically symmetric body (spherically 
symmetric gravitational waves do not exist). It is the 
most influential solution of EEs. The essential tests of 
general relativity — perihelion advance of Mercury, 
deflection of both optical and radio waves by the Sun, 
and signal retardation — are based on [10] or rather on 
its expansion in M/r. Space missions have been 
proposed that could lead to measurements of “post- 
post-Newtonian” effects (see General Relativity: 
Experimental Tests, and Misner et al. (1973)). The full 
Schwarzschild metric is of importance in astrophysical 
processes involving compact stars and black holes. 
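
To give a feel for the size of the leading terms in the M/r expansion, the sketch below evaluates two standard first-order results that follow from [10]: the light deflection 4M/b for a ray with impact parameter b, and the perihelion advance 6πM/(a(1 − e²)) per orbit. These formulas, the numerical constants, and the Mercury orbital elements are standard reference values assumed here; none of them is quoted in the text above.

```python
import math

GM_SUN = 1.32712440018e20    # m^3 s^-2, solar gravitational parameter (assumed value)
C = 299792458.0              # m s^-1, speed of light
R_SUN = 6.957e8              # m, nominal solar radius (assumed value)
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi

M_GEOM = GM_SUN / C**2       # solar mass expressed as a length (G = c = 1), about 1.48 km


def light_deflection_arcsec(impact_parameter_m):
    """Leading-order deflection angle 4M/b of a light ray, in arcseconds."""
    return 4.0 * M_GEOM / impact_parameter_m * ARCSEC_PER_RAD


def perihelion_advance_arcsec_per_century(semi_major_axis_m, eccentricity, period_days):
    """Leading-order advance 6*pi*M / (a (1 - e^2)) per orbit, summed over a Julian century."""
    per_orbit = 6.0 * math.pi * M_GEOM / (semi_major_axis_m * (1.0 - eccentricity**2))
    orbits_per_century = 36525.0 / period_days
    return per_orbit * orbits_per_century * ARCSEC_PER_RAD


if __name__ == "__main__":
    print("Schwarzschild radius of the Sun [m]:", 2.0 * M_GEOM)
    print("Deflection at the solar limb [arcsec]:", light_deflection_arcsec(R_SUN))
    # Mercury: a = 5.7909e10 m, e = 0.2056, P = 87.969 days (assumed orbital elements)
    print("Mercury perihelion advance [arcsec/century]:",
          perihelion_advance_arcsec_per_century(5.7909e10, 0.2056, 87.969))
```

With these inputs the deflection comes out near 1.75 arcseconds and the perihelion advance near 43 arcseconds per century, while M/r at the solar limb is only of order 10⁻⁶, which is why the expansion in M/r suffices for solar-system tests.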

Metric [10] describes the spacetime outside 
a spherical body collapsing through r=2M into 
a spherical black hole. In Figure 1, the formation 
of an event horizon and trapped surfaces is indicated 
in ingoing Eddington-Finkelstein coordinates
$(v, r, \theta, \varphi)$, where $v = t + r + 2M \log(r/2M - 1)$, so
that $(v, \theta, \varphi)$ = const. are ingoing radial null geodesics.
The interior of the star is described by another metric
(e.g., the Oppenheimer-Snyder collapsing dust solution,
see below). The Kruskal extension of the
Schwarzschild solution, its compactification, the concept
of the bifurcate Killing horizon, etc., are analyzed
in Stationary Black Holes and in Misner et al. (1973),
Hawking and Ellis (1973), and Bičák (2000).

Figure 1 Gravitational collapse of a spherical star (the interior of
the star is shaded). The light cones of three events, O, P, Q, at the
center of the star, and of three events outside the star are illustrated.
The event horizon, the trapped surfaces, and the singularity formed
during the collapse are also shown. Although the singularity appears
to lie along the direction of time, from the character of the light cone
outside the star but inside the event horizon we can see that it has a
spacelike character. Reproduced from Bičák J (2000) Selected
solutions of Einstein’s field equations: their role in general relativity
and astrophysics. In: Schmidt BG (ed.) Einstein’s Field Equations
and their Physical Implications, Lecture Notes in Physics, vol. 540, pp.
1-126. Heidelberg: Springer, with permission from Springer-Verlag.

The Reissner-Nordström solution describes the
exterior gravitational and electromagnetic fields of a
spherical body with mass M and charge Q. The
energy-momentum tensor on the right-hand side of
EE [2] is that of the electromagnetic field produced
by the charge; the field satisfies the curved-space
Maxwell equations. The metric reads

$$g = -\left(1 - \frac{2M}{r} + \frac{Q^2}{r^2}\right) dt^2 + \left(1 - \frac{2M}{r} + \frac{Q^2}{r^2}\right)^{-1} dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2) \qquad [11]$$

The analytic extension of the electrovacuum metric
[11] is qualitatively different from the Kruskal extension
of the Schwarzschild metric. In the case $Q^2 > M^2$
there is a “naked singularity” (visible from $r \to \infty$) at
r = 0 where curvature invariants diverge. If $Q^2 < M^2$,
the metric describes a (generic) static charged black
hole with two event horizons at $r = r_\pm = M \pm (M^2 - Q^2)^{1/2}$.
The Killing vector $\partial/\partial t$ is null at the horizons,
timelike at $r > r_+$ and $r < r_-$, but spacelike between
the horizons. The character of the extended spacetime
is best seen in the compactified form, Figure 2, in
which world-lines of radial light rays are 45° lines.
Again, two infinities (right and left, in regions I and III) 
arise (as in the Kruskal-Schwarzschild diagram, see 
Stationary Black Holes), however, the maximally 
extended geometry consists of an infinite chain of 
asymptotically flat regions connected by “wormholes” 
between the singularities at r=0. In contrast to 
the Schwarzschild singularity, the singularities are 
timelike — they do not block the way to the future. 
The inner horizon $r = r_-$ represents a Cauchy horizon
for a typical initial hypersurface like $\Sigma$ (Figure 2): what
is happening in regions V is in general influenced not
only by data on $\Sigma$ but also at the singularities. The
Cauchy horizon is unstable (for references, see Bičák
(2000) and recent work by Dafermos (2005)).

Figure 2 The compactified Reissner-Nordström spacetime representing a non-extreme black hole consists of an infinite chain of
asymptotic regions (“universes”) connected by “wormholes” between timelike singularities. The world-line of a shell collapsing from
“universe” I and re-emerging in “universe” I′ is indicated. The inner horizon at $r = r_-$ is the Cauchy horizon for a spacelike hypersurface
$\Sigma$. It is unstable and thus it will very likely prevent such a process. Reproduced from Bičák J (2000) Selected solutions of Einstein’s
field equations: their role in general relativity and astrophysics. In: Schmidt BG (ed.) Einstein’s Field Equations and their Physical
Implications, Lecture Notes in Physics, vol. 540, pp. 1-126. Heidelberg: Springer, with permission from Springer-Verlag.


For $M^2 = Q^2$ the two horizons coincide at $r_+ = r_- = M$,
and metric [11] describes extreme Reissner-Nordström
black holes. The horizon becomes
degenerate and its surface gravity vanishes (see
Stationary Black Holes). Extreme black holes play
a significant role in string theory (Ortín 2004).
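
The three cases distinguished above (naked singularity, two horizons, degenerate horizon) follow directly from the roots of the metric function in [11]. The following minimal sketch, with hypothetical function names and geometrized units assumed, classifies a given pair (M, Q) and checks that the metric function vanishes at the computed radii.

```python
import math


def metric_function(r, mass, charge):
    """The function 1 - 2M/r + Q^2/r^2 that multiplies -dt^2 in metric [11]."""
    return 1.0 - 2.0 * mass / r + charge**2 / r**2


def reissner_nordstrom_horizons(mass, charge):
    """Return (r_plus, r_minus) if Q^2 <= M^2, or None for a naked singularity."""
    discriminant = mass**2 - charge**2
    if discriminant < 0.0:
        return None  # Q^2 > M^2: no horizon
    root = math.sqrt(discriminant)
    return mass + root, mass - root


if __name__ == "__main__":
    for charge in (0.6, 1.0, 1.2):
        print("Q =", charge, "->", reissner_nordstrom_horizons(1.0, charge))
```

For Q² = M² the two returned radii coincide, which is the degenerate horizon of the extreme case.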


Stationary Axisymmetric Solutions 


Assume the existence of two commuting Killing
vectors, timelike $\xi^\alpha$ and axial $\eta^\alpha$ ($\xi^\alpha \xi_\alpha < 0$, $\eta^\alpha \eta_\alpha > 0$),
$\xi^\alpha$ normalized at (asymptotically flat) infinity, $\eta^\alpha$ at
the rotation axis. They generate two-dimensional orbits
of the group $G_2$. Assume there exist 2-spaces orthogonal
to these orbits. This is true in vacuum and also in
case of electromagnetic fields or perfect fluids whose
4-current or 4-velocity lies in the surfaces of transitivity
of $G_2$ (e.g., toroidal magnetic fields are excluded). The
metric can then be written in Weyl’s coordinates
$(t, \rho, \varphi, z)$:

$$g = -e^{2U} (dt + A\, d\varphi)^2 + e^{-2U} \left[ e^{2k} (d\rho^2 + dz^2) + \rho^2 d\varphi^2 \right] \qquad [12]$$

U, k, and A are functions of $\rho$, z.

The most celebrated vacuum solution of the form
[12] is the Kerr metric for which U, k, A are ratios
of simple polynomials in spheroidal coordinates
(simply related to $(\rho, z)$). The Kerr solution is
characterized by mass M and specific angular
momentum a. For $a^2 > M^2$, it describes an asymptotically
flat spacetime with a naked singularity. For
$a^2 < M^2$, it represents a rotating black hole that has
two horizons which coalesce into a degenerate
horizon for $a^2 = M^2$, an extreme Kerr black hole.
The two horizons are located at $r_\pm = M \pm (M^2 - a^2)^{1/2}$
(r being the Boyer-Lindquist coordinate (see
Stationary Black Holes)). As with the Reissner-Nordström
black hole, the singularity inside is
timelike and the inner horizon is an (unstable)
Cauchy horizon. The analytic extension of the Kerr
metric resembles Figure 2 (see Frolov and Novikov
(1998), Hawking and Ellis (1973), Misner et al.
(1973), Ortín (2004), Semerák et al. (2002),
Stephani et al. (2003), and Wald (1984) for details).

Thanks to the black hole uniqueness theorems (see 
Stationary Black Holes), the Kerr metric is the unique 
solution describing all rotating black holes in vacuum. 
If the cosmic censorship conjecture holds, Kerr black 
holes represent the end states of gravitational collapse 
of astronomical objects with supercritical masses. 
According to prevalent views, they reside in the nuclei 
of most galaxies. Unlike with a spherical collapse, 
there are no exact solutions available which would 
represent the formation of a Kerr black hole. However,
starting from metric [12] and identifying, for example,
$z = b$ = const. and $z = -b$ (with the region $-b < z < b$
being cut off), one can construct thin material disks
which are physically plausible and can be the sources
of the Kerr metric even for $a^2 > M^2$ (see Bičák (2000)
for details). 

In a general case of metric [12], EEs in vacuum
imply the “Ernst equation” for a complex function f
of $\rho$ and z:

$$(\Re f) \left[ f_{,\rho\rho} + f_{,zz} + \rho^{-1} f_{,\rho} \right] = f_{,\rho}^2 + f_{,z}^2 \qquad [13]$$

or, equivalently, $(\Re f)\, \Delta f = (\nabla f)^2$, where $f = e^{2U} + ib$,
U enters [12], and $b(\rho, z)$ is a “potential” for $A(\rho, z)$:
$A_{,\rho} = \rho e^{-4U} b_{,z}$, $A_{,z} = -\rho e^{-4U} b_{,\rho}$; $k(\rho, z)$ in [12] can
be determined from U and b by quadratures.
Tomimatsu and Sato (TS) exploited symmetries of
[13] to construct metrics generalizing the Kerr metric.
Replacing f by $\xi = (1 - f)/(1 + f)$, one finds that in
case of the Kerr metric $\xi$ is a linear function in the
prolate spheroidal coordinates, whereas for TS
solutions $\xi$ is a quotient of higher-order polynomials.
A number of other solutions of eqn [13] were found 
but they are of lower significance than the Kerr 
solution (cf. Stephani et al. (2003), Chapter 20). 

These solutions inspired “solution-generating meth- 
ods” in general relativity. The Ernst equation can be 
regarded as the integrability condition of a system of 
linear differential equations. The problem of solving 
such a system can be reformulated as the Riemann- 
Hilbert problem in complex function theory (see 
Riemann-Hilbert Problem and Integrable Systems: 
Overview). We refer to Stephani et al. (2003) and 
Belinski and Verdaguer (2001) where these techniques 
using Bäcklund transformations, inverse-scattering
method, etc., are also applied in the nonstationary 
context of two spacelike Killing vectors (waves, 
cosmology). In the stationary case, all asymptotically 
flat, stationary, axisymmetric vacuum solutions can, in 
principle, be generated. It is known how to generate 
fields with given values of multipole moments, though 
the required calculations are staggering. By solving the 
Riemann-Hilbert problem with appropriate boundary 
data, Neugebauer and Meinel constructed the exact 
solution representing a rigidly rotating thin disk of 
dust (cf. Stephani et al. (2003) and Bičák (2000)).

A subclass of metrics [12] is formed by static Weyl
solutions with A = b = 0. Equation [13] then
becomes the Laplace equation $\Delta U = 0$. The nonlinearity
of EEs enters only the equations for k:
$k_{,\rho} = \rho (U_{,\rho}^2 - U_{,z}^2)$, $k_{,z} = 2\rho\, U_{,\rho} U_{,z}$. The class contains some
explicit solutions of interest: the “linear superposition”
of collinear particles with string-like
singularities between them which keep the system
in static equilibrium; solutions representing external
fields of counter-rotating disks, for example, those
which are “inspired” by galactic Newtonian potentials;
disks around black holes and some other special
solutions (Stephani et al. 2003, Bonnor 1992, Bičák
2000, Semerák et al. 2002).
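
As an illustration of how a static Weyl solution is assembled, the sketch below takes the monopole potential U = −M/(ρ² + z²)^{1/2} together with k = −M²ρ²/(2(ρ² + z²)²), a pair usually referred to as the Curzon solution. It is not discussed in the text above and is chosen here only as a simple test case. Finite differences are used to check that U is harmonic and that k satisfies the two first-order equations for k; the step sizes and function names are assumptions of this sketch.

```python
import math

MASS = 1.0  # hypothetical mass parameter for the illustration


def potential_u(rho, z):
    """Monopole solution of the flat-space Laplace equation in Weyl coordinates."""
    return -MASS / math.hypot(rho, z)


def function_k(rho, z):
    """Candidate k(rho, z) obtained from U by quadratures."""
    r_squared = rho**2 + z**2
    return -MASS**2 * rho**2 / (2.0 * r_squared**2)


def d_rho(func, rho, z, h=1e-5):
    """Central-difference derivative with respect to rho."""
    return (func(rho + h, z) - func(rho - h, z)) / (2.0 * h)


def d_z(func, rho, z, h=1e-5):
    """Central-difference derivative with respect to z."""
    return (func(rho, z + h) - func(rho, z - h)) / (2.0 * h)


def laplacian_u(rho, z, h=1e-3):
    """Axisymmetric flat-space Laplacian U_rr + U_r / rho + U_zz by finite differences."""
    u0 = potential_u(rho, z)
    u_rr = (potential_u(rho + h, z) - 2.0 * u0 + potential_u(rho - h, z)) / h**2
    u_zz = (potential_u(rho, z + h) - 2.0 * u0 + potential_u(rho, z - h)) / h**2
    u_r = (potential_u(rho + h, z) - potential_u(rho - h, z)) / (2.0 * h)
    return u_rr + u_r / rho + u_zz


def quadrature_residuals(rho, z):
    """Residuals of k_rho = rho (U_rho^2 - U_z^2) and k_z = 2 rho U_rho U_z."""
    u_r = d_rho(potential_u, rho, z)
    u_z = d_z(potential_u, rho, z)
    res_rho = d_rho(function_k, rho, z) - rho * (u_r**2 - u_z**2)
    res_z = d_z(function_k, rho, z) - 2.0 * rho * u_r * u_z
    return res_rho, res_z


if __name__ == "__main__":
    print("Laplacian of U:", laplacian_u(1.3, 0.8))
    print("Residuals of the k equations:", quadrature_residuals(1.3, 0.8))
```

All three printed numbers should be zero up to discretization error. The linearity of the Laplace equation is what allows potentials of several such sources to be superposed, while k picks up the nonlinear cross terms.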

There are solutions of the Einstein-Maxwell equations
representing external fields of masses endowed
with electric charges, magnetic dipole moments, etc.
(Stephani et al. 2003). Best known is the Kerr-Newman
metric characterized by parameters M, a,
and charge Q. For $M^2 > a^2 + Q^2$ it describes a
charged, rotating black hole. Owing to the rotation,
the charged black hole produces also a magnetic field
of a dipole type. All the black hole solutions can be
generalized to include a nonvanishing $\Lambda$ (for various
applications, see Semerák et al. (2002)). Other generalizations
incorporate the so-called Newman-Unti-Tamburino
(NUT) parameter (corresponding to a
“gravomagnetic monopole”) or an “external”
magnetic/electric field or a parameter leading to “uniform”
acceleration (see Stephani et al. (2003) and Bičák
(2000)). Much interest has recently been paid to black
hole (and other) solutions with various types of gauge
fields and to multidimensional solutions.
Frolov and Novikov (1998) and Ortín (2004) are two
examples of good reviews.


Radiative Solutions 
Plane Waves and Their Collisions 


The best-known class are “plane-fronted gravitational
waves with parallel rays” (pp-waves) which
are defined by the condition that the spacetime
admits a covariantly constant null vector field
$k^\alpha$: $k_{\alpha;\beta} = 0$. In suitable null coordinates u, v such
that $k_\alpha = u_{,\alpha}$, $k^\alpha = (\partial/\partial v)^\alpha$, and complex coordinate $\zeta$
which spans the wave 2-surfaces u = const., v = const.
with Euclidean geometry, the metric reads

$$g = 2\, d\zeta\, d\bar{\zeta} - 2\, du\, dv - 2H(u, \zeta, \bar{\zeta})\, du^2 \qquad [14]$$

$H(u, \zeta, \bar{\zeta})$ is a real function. The vacuum EEs imply
$H_{,\zeta\bar{\zeta}} = 0$ so that $2H = f(u, \zeta) + \bar{f}(u, \bar{\zeta})$, where f is an arbitrary
function of u, analytic in $\zeta$. The Weyl tensor satisfies
eqns [5]: the field is of type N as is the field of plane
electromagnetic waves. In the null tetrad
$\{k^\alpha, l^\alpha, m^\alpha\ \text{(complex)}\}$ with $l^\alpha k_\alpha = -1$, $m^\alpha \bar{m}_\alpha = 1$, all
other products vanishing, the only nonzero projection
of the Weyl tensor, $\Psi = C_{\alpha\beta\gamma\delta}\, l^\alpha \bar{m}^\beta l^\gamma \bar{m}^\delta = H_{,\bar{\zeta}\bar{\zeta}}$,
describes the transverse component of a wave propagating
in the $k^\alpha$ direction. Writing $\Psi = A e^{i\Theta}$, the real
$A > 0$ is the amplitude of the wave, $\Theta$ describes
polarization. Waves with $\Theta$ = const. are called linearly
polarized. Considering their effect on test particles,
one finds that plane waves are transverse.


The simplest waves are homogeneous in the sense
that $\Psi$ is constant along the wave surfaces. One gets
$f(u, \zeta) = (1/2) A(u) e^{i\Theta(u)} \zeta^2$. Instructive are “sandwich
waves,” for example, waves with a “square 
profile”: A=0 for u < 0 and u > a*, A=a™ = const. 
for 0 < u < a°. This example demonstrates, within 
exact theory, that the waves travel with the speed of 
light, produce relative accelerations of test particles, 
focus astigmatically generally propagating parallel rays, 
etc. The focusing effects have a remarkable conse- 
quence: there exists no global spacelike hypersurface on 
which initial data could be specified — plane wave 
spacetimes contain no global Cauchy hypersurface. 

“Impulsive” plane waves can be generated by 
boosting a “particle” at rest to the velocity of light by 
an appropriate limiting procedure. The ultrarelativistic 
limit of, for example, the Schwarzschild metric (the so-called
Aichelburg-Sexl solution) can be employed as a
“limiting incoming state” in black hole encounters (cf.
monograph by D’Eath (1996)). Plane-fronted waves
have been used in quantum field theory. For a review 
of exact impulsive waves, see Semerák et al. (2002). 

A collision of plane waves represents an exceptional 
situation of nonlinear wave interactions which can be 
analyzed exactly. Figure 3 illustrates a typical case in 
which the collision produces a spacelike singularity. The 
initial-value problem with data given at v = 0 and u = 0 
can be formulated in terms of the equivalent matrix 
Riemann-Hilbert problem (see Riemann-Hilbert
Problem); it is related to the hyperbolic counterpart of
the Ernst equation [13]. For reviews, see Griffiths
(1991), Stephani et al. (2003), and Bičák (2000).


Cylindrical Waves 


Discovered by G Beck in 1925 and known today as the
Einstein-Rosen waves (1937), these vacuum solutions
helped to clarify a number of issues, such as energy loss
due to the waves, asymptotic structure of radiative
spacetimes, dispersion of waves, quasilocal mass-energy,
cosmic censorship conjecture, or quantum
gravity in the context of midisuperspaces (see Bičák
(2000) and Belinski and Verdaguer (2001)).

Figure 3 A spacetime diagram indicating a collision of two plane-fronted
gravitational waves which come from regions II and III, collide
in region I, and produce a spacelike singularity. Region IV is flat.
Reproduced from Bičák J (2000) Selected solutions of Einstein’s
field equations: their role in general relativity and astrophysics. In:
Schmidt BG (ed.) Einstein’s Field Equations and their Physical
Implications, Lecture Notes in Physics, vol. 540, pp. 1-126.
Heidelberg: Springer, with permission from Springer-Verlag.

In the metric

$$g = e^{2(\gamma - \psi)} (-dt^2 + d\rho^2) + e^{2\psi} dz^2 + \rho^2 e^{-2\psi} d\varphi^2 \qquad [15]$$

$\psi(t, \rho)$ satisfies the flat-space wave equation and $\gamma(\rho, t)$
is given in terms of $\psi$ by quadratures. Admitting a
“cross term” $\sim \omega(t, \rho)\, dz\, d\varphi$, one acquires a second
degree of freedom (a second polarization) which
makes all field equations nonlinear.


Boost-Rotation Symmetric Spacetimes 


These are the only explicit solutions available which 
are radiative and represent the fields of finite sources. 
Figure 4 shows two particles uniformly accelerated in 
opposite directions. In the space diagram (left), the 
“string” connecting the particles is the “cause” of the 
acceleration. In “Cartesian-type” coordinates with the
z-axis chosen as the symmetry axis, the boost Killing
vector has a flat-space form, $\zeta = z(\partial/\partial t) + t(\partial/\partial z)$; the
same is true for the axial Killing vector. The metric
contains two functions of variables $\rho^2 = x^2 + y^2$ and
$\beta^2 = z^2 - t^2$. One satisfies the flat-space wave equation,
the other is determined by quadratures.

The unique role of these solutions is exhibited by the
theorem which states that in axially symmetric, locally
asymptotically flat spacetimes, in the sense that a null
infinity (see Asymptotic Structure and Conformal
Infinity) exists but not necessarily globally, the only
additional symmetry that does not exclude gravitational
radiation is the boost symmetry. Various radiation
characteristics can be expressed explicitly in these
spacetimes. They have been used as tests in numerical
relativity and approximation methods. The best-known
example is the C-metric (representing accelerating black
holes, in general charged and rotating, and admitting $\Lambda$),
see Bonnor et al. (1994), Bičák (2000), Stephani et al.
(2003), and Semerák et al. (2002).

Figure 4 Two particles uniformly accelerated in opposite
directions. Orbits of the boost Killing vector (thinner hyperbolas)
are spacelike in the region $t^2 > z^2$. Reproduced from Bičák J
(2000) Selected solutions of Einstein’s field equations: their role
in general relativity and astrophysics. In: Schmidt BG (ed.)
Einstein’s Field Equations and their Physical Implications,
Lecture Notes in Physics, vol. 540, pp. 1-126. Heidelberg:
Springer, with permission from Springer-Verlag.


Robinson-Trautman Solutions 


These solutions are algebraically special but in general
they do not possess any symmetry. They are governed
by a function $P(u, \zeta, \bar{\zeta})$ (u is the retarded time, $\zeta$ a
complex spatial coordinate) which satisfies a fourth-order
nonlinear parabolic differential equation. Studies
by Chruściel and others have shown that RT
solutions of Petrov type II exist globally for all positive
“times” u and converge asymptotically to a Schwarzschild
metric, though the extension across the
“Schwarzschild-like” horizon can only be made with
a finite degree of smoothness. Generalization to the
cases with $\Lambda > 0$ gives explicit models supporting the
cosmic no-hair conjecture (an exponentially fast
approach to the dS spacetime) under the presence of
gravitational waves. See Bonnor et al. (1994), Bičák
(2000), and Stephani et al. (2003).


Material Sources 


Finding physically sound material sources in an 
analytic form even for some simple vacuum metrics 
remains an open problem. Nevertheless, there are 
solutions representing regions of spacetimes filled 
with matter which are of considerable interest. 

One of the simplest solutions, the spherically 
symmetric Schwarzschild interior solution with 
incompressible fluid as its source, represents “a 
star” of uniform density, p= pọ = const.: 


2 
g=- | VI AR - 5 VI AP dr 


a dr’ 
1 — Ar? 
A = 8rpo/3 = const., R is the radius of the star. 


The equation of hydrostatic equilibrium yields 
pressure inside the star: 


V1 — Ar — V1 — AR? 
3V1 — AR? — V1 — Ar? 


Solution [16] can be matched at r= R, where p=0, 
to the exterior vacuum Schwarzschild solution [10] 
if the Schwarzschild mass M = (1/2)AR?. Although 
“incompressible fluid” implies an infinite speed of 
sound, the above solution provides an instructive 


+r (d0 + sin? 0de?) [16] 


Srp = 2A [17] 
model of relativistic hydrostatics. A Newtonian star of
uniform density can have an arbitrarily large radius
$R = \sqrt{3p_c/(2\pi\rho_0^2)}$ and mass $M = (p_c/\rho_0^2)\sqrt{6p_c/\pi}$, where $p_c$ is
the central pressure. However, [17] implies that (1) $M$
and $R$ satisfy the inequality $2M/R < 8/9$, and (2) equality
is reached as $p_c$ becomes infinite and $R$ and $M$ attain
their limiting values $R_{\rm lim} = (3\pi\rho_0)^{-1/2} = (9/4)M_{\rm lim}$. For
a density typical in neutron stars, $\rho_0 = 10^{15}$ g cm$^{-3}$, we
get $M_{\rm lim} = 3.96\,M_\odot$ ($M_\odot$ is the solar mass); even this simple
model shows that in Einstein’s theory neutron stars can
only be a few solar masses. In addition, one can prove
that “Buchdahl’s inequality” $2M/R < 8/9$ is valid
for an arbitrary equation of state $p = p(\rho)$. Only a
limited mass can thus be contained within a given
radius in general relativity. The gravitational redshift
$z = (1 - 2M/R)^{-1/2} - 1$ from the surface of a static
star cannot be higher than 2.
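
The limits quoted above can be checked numerically. The following is a minimal sketch in units $G = c = 1$; the function names are illustrative and not part of the original text. It evaluates only the dimensionless radial factor of the pressure formula [17], the compactness $2M/R = AR^2$ that follows from $M = (1/2)AR^3$, and the surface redshift.

```python
import math

def pressure_factor(r, R, A):
    """Dimensionless radial factor of the pressure formula [17]:
    (sqrt(1 - A r^2) - sqrt(1 - A R^2)) / (3 sqrt(1 - A R^2) - sqrt(1 - A r^2)).
    Valid for A * R**2 < 8/9 (units G = c = 1)."""
    s_r = math.sqrt(1.0 - A * r * r)
    s_R = math.sqrt(1.0 - A * R * R)
    return (s_r - s_R) / (3.0 * s_R - s_r)

def compactness(R, A):
    """2M/R for the matched exterior mass M = A R^3 / 2, i.e. 2M/R = A R^2."""
    return A * R * R

def surface_redshift(two_m_over_r):
    """z = (1 - 2M/R)^(-1/2) - 1 for light emitted at the stellar surface."""
    return 1.0 / math.sqrt(1.0 - two_m_over_r) - 1.0

if __name__ == "__main__":
    A = 1.0  # A = 8*pi*rho0/3 sets the length scale
    for x in (0.5, 0.8, 0.88, 0.888):
        R = math.sqrt(x / A)
        # The central factor grows without bound as 2M/R approaches 8/9.
        print(f"2M/R = {compactness(R, A):.3f}  "
              f"central factor = {pressure_factor(0.0, R, A):.2f}")
    print("redshift at 2M/R = 8/9:", surface_redshift(8.0 / 9.0))
```

The central factor vanishes at the surface and blows up as the denominator $3\sqrt{1 - AR^2} - 1$ approaches zero, which is the origin of the bound $2M/R < 8/9$ and hence of the redshift bound.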

Many other explicit static perfect fluid solutions
are known (we refer to Stephani et al. (2003) for a
list); however, none of them can be considered as
really “physical.” Recently, a dynamical systems
approach to relativistic spherically symmetric static
perfect fluid models was developed by Uggla and
others, which gives qualitative characteristics of
masses and radii.

The most significant nonstatic spacetime describing 
a bounded region of matter and its external field is 
undoubtedly the Oppenheimer-Snyder model of 
“gravitational collapse of a spherical star” of uniform 
density and zero pressure (a “ball of dust”). The model 
does not represent any new (local) solution: the interior 
of the star is described by a part of a dust-filled FLRW 
universe (cf. [8]), the external region by the Schwarzs- 
child vacuum metric (cf. eqn [10], Figure 1). 

Since Vaidya’s discovery of a “radiating Schwarzschild
metric,” null dust (“pure radiation field”) has
been widely used as a simple matter source. Its
energy-momentum tensor, $T_{\alpha\beta} = \varrho\, k_\alpha k_\beta$, where
$k_\alpha k^\alpha = 0$, may be interpreted as an incoherent
superposition of waves with random phases and
polarizations moving in a single direction, or as
“lightlike particles” (photons, neutrinos, gravitons)
that move along $k^\alpha$. The “Vaidya metric” describing
spherical implosion of null dust implies that in case
of a “gentle” inflow of the dust, a naked singularity
forms. This is relevant in the context of the cosmic
censorship conjecture (cf., e.g., Joshi (1993)).


Cosmological Models 


There exist important generalizations of the standard
FLRW models other than the above-mentioned
Bianchi models, particularly those that maintain
spherical symmetry but do not require homogeneity.
The best known are the Lemaître–Tolman–Bondi
models of inhomogeneous universes of pure dust,
the density of which may vary (Krasiński 1997).
Other explicit cosmological models of principal
interest involve, for example, the Gödel universe, a
homogeneous, stationary spacetime with $\Lambda < 0$ and
incoherent rotating matter in which there exist
closed timelike curves through every point; the
Kantowski–Sachs solutions, possessing homogeneous
spacelike hypersurfaces but (in contrast to
the Bianchi models) admitting no simply transitive
$G_3$; and vacuum Gowdy models (“generalized
Einstein–Rosen waves”) admitting $G_2$ with compact
2-tori as its group orbits and representing cosmological
models closed by gravitational waves. See
Cosmology: Mathematical Aspects and references
Stephani et al. (2003), Belinski and Verdaguer
(2001), Bičák (2000), Hawking and Ellis (1973),
Krasiński (1997) and Wainwright and Ellis (1997).


See also: AdS/CFT Correspondence; Asymptotic 
Structure and Conformal Infinity; Cosmology: 
Mathematical Aspects; Dirac Fields in Gravitation and 
Nonabelian Gauge Theory; Einstein Manifolds; Einstein’s 
Equations with Matter; General Relativity: Experimental 
Tests; General Relativity: Overview; Hamiltonian 
Reduction of Einstein’s Equations; Integrable Systems: 
Overview; Newtonian Limit of General Relativity; 
Pseudo-Riemannian Nilpotent Lie Groups; 
Riemann–Hilbert Problem; Spacetime Topology, Causal
Structure and Singularities; Spinors and Spin 
Coefficients; Stability of Minkowski Space; Stationary 
Black Holes; Twistor Theory: Some Applications. 
Introduction


Einstein’s theory of gravity models a gravitating
physical system $S$ using a spacetime $(M^4, g, \psi)$ which
satisfies the Einstein field equations

$$G_{\mu\nu}(g) = \kappa\, T_{\mu\nu}(g, \psi)$$ [1]

$$\mathcal{F}(g, \psi) = 0$$ [2]

Here, $M^4$ is a four-dimensional spacetime manifold, $g$
is a Lorentz signature metric on $M^4$, $\psi$ represents the
nongravitational (“matter”) fields of interest, $G_{\mu\nu} :=
R_{\mu\nu} - (1/2)g_{\mu\nu}R$ is the Einstein curvature tensor, $\kappa$ is a
constant, $T_{\mu\nu}$ is the stress–energy tensor for the field $\psi$,
and $\mathcal{F} = 0$ represents the nongravitational field equations
(e.g., $\nabla_\mu F^{\mu\nu} = 0$ for the Einstein–Maxwell theory).

By far the most widely used way to obtain and to
study spacetime solutions $(M^4, g, \psi)$ of equations
[1]–[2] is via the initial-value (or Cauchy) formulation.
The idea is as follows:


1. One chooses a set of initial data $D$ which consists of
geometric as well as matter information on a
spacelike slice of $M^4$. This data must satisfy a system
of constraint equations, which comprise a portion of
the field equations [1]–[2], and are analogous to the
Maxwell constraint equation $\nabla \cdot E = 0$.

2. One fixes a time and coordinate choice to be used in
evolving the fields into the spacetime (e.g., maximal
time slicing and zero shift). This choice should result
in a fixed set of evolution equations for the data.

3. Using the evolution equations, one evolves the data
into the future and the past. From the evolved data,
one constructs the spacetime solution $(M^4, g, \psi)$.


Why is this procedure so popular? First, because 
we have known for over 50 years that at least for 
a short time, it works. That is, as shown by 
Choquet-Bruhat (Fourès-Bruhat 1952), the Cauchy
formulation is well posed. Second, because it fits with
the way we like to model physical systems. That is, 
we first specify what the system is like now, and we 
then use the equations to determine the behavior of 
the system as it evolves into the future (or the past). 
Third, because the formulation is eminently amenable 
to numerical treatment. Indeed, virtually all numer- 
ical simulations of colliding black hole systems as 
well as of most other relativistic astrophysical systems 
are done using some version of the initial-value 
formulation. Finally, because the initial-value formu- 
lation casts the Einstein equations into a form which 
is readily accessible to many of the tools of geometric 
analysis. Questions such as cosmic censorship are 
turned into conjectures which can be analyzed and 
proved mathematically, and the proofs of both the 
positivity of mass and the Penrose mass inequality 
rely on an initial-value interpretation. 

There are of course drawbacks to the Cauchy 
formulation. Foremost, Einstein’s theory of general 
relativity is inherently a spacetime-covariant theory; 
why break spacetime apart into space plus time when 
covariance has played such a key role in the theory’s 
success? As well, we have learned over and over again 
that null cones and null hypersurfaces play a major 
role in general relativity; the initial-value formulation 
is not especially good at handling them. These draw- 
backs show that there are analyses in general relativity 
for which the initial-value formulation may not be well 
suited. However, there is a preponderance of applica- 
tions for which this formulation is an invaluable tool, 
as evidenced by its ubiquitous use. 

A complete treatment of the initial-value formula- 
tion for Einstein’s equations would include discus- 
sion of each of the following topics: 


1. A statement and proof of well-posedness theo- 
rems, including a discussion of the regularity of 
the data needed for such results. 

2. A space + time decomposition of the fields, and a 
formal derivation of the Einstein constraint 
equations and the Einstein evolution equations. 
3. An outline of the Hamiltonian version of the
initial-value formulation. 

4. A listing of those choices of field variables and 
gauge choices for which the system is manifestly 
hyperbolic. 

5. A description of the known methods for finding 
and parametrizing solutions of the Einstein 
constraint equations. 

6. A comparison of the virtues and drawbacks of 
various choices of time foliation and coordinate 
threading. 

7. A compendium of results concerning long-time 
behavior of solutions. 

8. An account of the difficulties which arise in 
attempts to construct solutions numerically 
using the Cauchy formulation. 

9. A recounting of cases in which the initial-value 
formulation has been used to model physically 
interesting systems. 

10. A note regarding the extent to which the initial- 
value formulation (and the various aspects of it 
just enumerated) generalize to dimensions other 
than 3 + 1 (three space and one time). 

11. A determination of which nongravitational 
fields may be coupled to Einstein’s theory in 
such a way that the resulting coupled theory 
admits an initial-value formulation. 


We do not have the space here for such a 
complete treatment. So we choose to focus on 
those topics directly related to the Einstein con- 
straint equations. Generalizing a bit to the Einstein- 
Maxwell theory (thereby including representative 
nongravitational fields), we first carry out the space 
plus time “3 + 1” decomposition of the gravitational 
and electromagnetic fields. Then, applying the 
Gauss—Codazzi—Mainardi equations to the space- 
time curvature, we turn the spacetime-covariant 
Einstein—Maxwell equations into a set of constraint 
equations restricting the choice of initial data 
together with a set of evolution equations develop- 
ing the data in time. Next, we discuss the most 
widely used approach for obtaining sets of initial 
data which satisfy the constraint equations: the 
conformal method. We include in this discussion 
an account of some of what is known about the 
extent to which the equations which are produced 
by the conformal method admit solutions in various 
situations (e.g., working on a closed manifold, or 
working with asymptotically Euclidean data). We 
then discuss alternate procedures which have been 
used to obtain and analyze solutions of the 
constraints, including the conformal thin sandwich 
approach, the quasispherical method, and various 
gluing procedures. Finally, we make concluding
remarks. For more details on some of the topics
discussed here, and for treatment of some of the 
other topics listed above, see the recent review paper 
of Bartnik and Isenberg (2004). 


Space + Time Field Decomposition 
and Derivation of the Constraint 
Equations 


To understand what sort of initial data one needs to
choose in order to construct a spacetime via the
initial-value formulation, it is useful to consider a spacetime
$(M^4, g)$ which satisfies the Einstein(–Maxwell) field
equations and contains a Cauchy surface $i_0 : \Sigma^3 \to M^4$.
We note that the existence of a Cauchy surface in
$(M^4, g, A)$ is not automatic; if one exists, the spacetime
is said to be (by definition) “globally hyperbolic.”¹

Among its other properties, a Cauchy surface is a
spacelike embedded submanifold of a Lorentz
geometry. It immediately follows that the spacetime
$(M^4, g, A)$ induces on $\Sigma^3$ a Riemannian metric $\gamma$,
a timelike normal vector field $e_\perp$, an intrinsic
($\gamma$-compatible) covariant derivative $\nabla$, and a
symmetric “extrinsic curvature” tensor field $K$ (second
fundamental form). It also follows that certain
components of the spacetime curvature tensor can
be written in terms of these Cauchy surface
quantities $(\gamma, e_\perp, \nabla, K)$ along with other geometric
quantities related to them, such as the spatial
curvature $R$ corresponding to the induced covariant
derivative $\nabla$ (Gauss–Codazzi equations).

To complete the curvature 3 + 1 decomposition (i.e.,
to carry it out for all components of the spacetime
curvature), we need not just one Cauchy surface, but
rather a full local foliation $i_t : \Sigma^3 \to M^4$ of the spacetime
by such submanifolds. This foliation allows one to
define $e_\perp$ as a smooth vector field on an open
neighborhood of the Cauchy surface $i_0(\Sigma^3)$ in $M^4$. It
also results in a threading of spacetime by a congruence
of timelike paths (see Figure 1). This threading may be
viewed as a spacetime-filling family of observers. It also
defines for the spacetime a set of coordinates relative to
which one can measure and calculate the dynamics of
the spacetime geometry.

It is useful for later purposes to note that at each
spacetime point $p \in \Sigma_t \subset M^4$ (here $\Sigma_t := i_t(\Sigma^3)$) the
vector $\partial/\partial t$ tangent to the threading path through $p$
may be decomposed as

$$\frac{\partial}{\partial t} = N e_\perp + X$$ [3]


¹The Taub-NUT spacetime is an example of a spacetime which is
not globally hyperbolic.


Figure 1 3 + 1 foliation and threading of spacetime.


with the “shift vector” $X$ tangent to the surface ($X \in
T_p\Sigma_t$), and with the “lapse” $N$ a scalar (see Figure 2).
Using these quantities, we can write the spacetime
metric in the form

$$g = \gamma - \theta^\perp \otimes \theta^\perp = \gamma_{ab}\,(dx^a + X^a dt)(dx^b + X^b dt) - N^2 dt^2$$ [4]

where $\theta^\perp$ is the unit length timelike 1-form which
annihilates all vectors tangent to the hypersurfaces
of the foliation.
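
The decomposition can be verified numerically. The sketch below assumes the standard reading of [3] and [4], namely $\partial/\partial t = N e_\perp + X$ and $g = \gamma_{ab}(dx^a + X^a dt)(dx^b + X^b dt) - N^2 dt^2$; the function names and sample values are illustrative only. It assembles the spacetime metric components from the lapse, the shift and the spatial metric, and checks that $e_\perp = (\partial/\partial t - X)/N$ has unit timelike length and is orthogonal to the slice.

```python
def spacetime_metric(N, X, gamma):
    """Components g_{mu nu} in coordinates (t, x^1, x^2, x^3), built from
    the lapse N, the shift X^a and the spatial metric gamma_ab."""
    X_low = [sum(gamma[a][b] * X[b] for b in range(3)) for a in range(3)]
    g = [[0.0] * 4 for _ in range(4)]
    g[0][0] = -N * N + sum(X_low[a] * X[a] for a in range(3))
    for a in range(3):
        g[0][a + 1] = g[a + 1][0] = X_low[a]
        for b in range(3):
            g[a + 1][b + 1] = gamma[a][b]
    return g

def inner(g, u, v):
    """Inner product g(u, v) of two spacetime vectors."""
    return sum(g[m][n] * u[m] * v[n] for m in range(4) for n in range(4))

def unit_normal(N, X):
    """Components of e_perp = (d/dt - X) / N in the coordinate basis."""
    return [1.0 / N] + [-X[a] / N for a in range(3)]

if __name__ == "__main__":
    N = 2.0
    X = [0.3, -0.1, 0.5]
    gamma = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 3.0]]
    g = spacetime_metric(N, X, gamma)
    e = unit_normal(N, X)
    print("g(e_perp, e_perp) =", inner(g, e, e))  # should be close to -1
    for a in range(3):
        d_a = [0.0] * 4
        d_a[a + 1] = 1.0
        print(f"g(e_perp, d_{a + 1}) =", inner(g, e, d_a))  # close to 0
```

Any positive lapse, any shift and any positive-definite spatial metric pass the same checks, which reflects the fact that $(N, X)$ only encode the choice of foliation and threading.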

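Relying on the following 3 + 1 decomposition of
the spacetime-covariant derivative ${}^4\nabla$ (here $\{\partial_a\}$ is a
coordinate basis for the vectors tangent to the
hypersurfaces of the foliation; $\{\partial_a, e_\perp\}$ constitutes a
basis for the full set of spacetime vectors at $p$):

$${}^4\nabla_{\partial_a}\partial_b = \nabla_{\partial_a}\partial_b - K_{ab}\,e_\perp$$ [5]

$${}^4\nabla_{\partial_a}e_\perp = -K_a{}^m\,\partial_m$$ [6]

$${}^4\nabla_{e_\perp}\partial_a = -K_a{}^m\,\partial_m + \frac{1}{N}[\partial_a, X] + \frac{\partial_a N}{N}\,e_\perp$$ [7]

$${}^4\nabla_{e_\perp}e_\perp = \frac{\nabla^m N}{N}\,\partial_m$$ [8]


Figure 2 Decomposition of the time evolution vector field $\partial/\partial t$.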
one readily derives the (Gauss–Codazzi–Mainardi)
3 + 1 decomposition of the curvature:²

$${}^4R^e{}_{abc} = R^e{}_{abc} + K^e{}_b K_{ac} - K^e{}_c K_{ab}$$ [9]

$${}^4R^\perp{}_{abc} = \nabla_{\partial_c}K_{ab} - \nabla_{\partial_b}K_{ac}$$ [10]

$${}^4R_{\perp a \perp b} = \mathcal{L}_{e_\perp}K_{ab} + K_{am}K^m{}_b + \frac{\nabla_{\partial_a}\nabla_{\partial_b}N}{N}$$ [11]

where $\mathcal{L}$ denotes the surface-projected Lie derivative.

Since we are interested here in the 3 + 1 formulation
of the Einstein–Maxwell system, we need a 3 + 1
decomposition for the electromagnetic as well as the
gravitational field. The spacetime 1-form “vector
potential” ${}^4A$ pulls back on each Cauchy surface $\Sigma_t$
to a spatial 1-form $A$. One may then write

$${}^4A = A + u\,\theta^\perp = A_b\,dx^b + (N u + A_b X^b)\,dt$$ [12]

for a scalar $u$. Based on this decomposition, one has
the following 3 + 1 decomposition for the
electromagnetic 2-form ${}^4F$:

$${}^4F_{\perp a} = \gamma_{ac}E^c$$ [13]

$${}^4F_{ab} = \nabla_{\partial_a}A_b - \nabla_{\partial_b}A_a$$ [14]

where $E^a$ is the electric vector field.

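We may now use all of these decomposition
formulas to write out the 14 field equations for the
Einstein–Maxwell theory

$${}^4G_{\alpha\beta} = F_{\alpha\mu}F_\beta{}^\mu - \tfrac{1}{4}\,g_{\alpha\beta}F^{\mu\nu}F_{\mu\nu}$$ [15]

$${}^4\nabla_\alpha F^{\alpha\beta} = 0$$ [16]

in terms of the spatial fields $(\gamma, K, N, X; A, E, u)$ and
their derivatives. We obtain

$$R - K^{mn}K_{mn} + (\mathrm{tr}\,K)^2 = \tfrac{1}{2}E^mE_m + \tfrac{1}{2}B^mB_m$$ [17]

$$\nabla_{\partial_m}K^m{}_a - \nabla_{\partial_a}(\mathrm{tr}\,K) = \epsilon_{amn}E^mB^n$$ [18]

$$\mathcal{L}_{e_\perp}K_{ab} = R_{ab} - 2K_a{}^mK_{mb} + (\mathrm{tr}\,K)K_{ab} + E_aE_b + B_aB_b - \frac{\nabla_{\partial_a}\nabla_{\partial_b}N}{N}$$ [19]

$$\nabla_{\partial_m}E^m = 0$$ [20]

$$\mathcal{L}_{e_\perp}E^a = \epsilon^{amn}\nabla_{\partial_m}B_n$$ [21]

where $\epsilon_{abc}$ is the alternating Levi-Civita symbol
(component representation of the Hodge dual), and
where we have used $B^a := \epsilon^{amn}(\nabla_{\partial_m}A_n - \nabla_{\partial_n}A_m)$ as a
convenient shorthand.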


²Here and throughout this article, we use the Misner–Thorne–Wheeler
(MTW) (Misner et al. 1973) conventions for the
definition of the Riemann curvature, for the signature $-+++$
of the metric, for the index labels (Greek indices run over
{0, 1, 2, 3} while Latin indices run over {1, 2, 3}), etc.
It is immediately evident that nine of these equations
([19] and [21]) involve time derivatives of the spatial
fields, while five of them ([17], [18], and [20]) do not.
Thus, we may split the field equations of the
Einstein–Maxwell theory into two sets: (1) the constraint
equations [17], [18], and [20], which restrict our
choice of the Einstein–Maxwell initial data
$(\gamma, K, A, E)$; and (2) the evolution equations, which
describe how to evolve the data $(\gamma, K, A, E)$ in time,
presuming that one has also prescribed (freely!) the
“atlas fields” $(N, X, u)$.³ We note that the complete
system of evolution equations for the
Einstein–Maxwell field equations includes equations which are
based on the definitions of $K$ and $E$. Written in terms
of (surface-projected) Lie derivatives along $\partial/\partial t$, the
full system takes the form

$$\mathcal{L}_{\partial_t}\gamma_{ab} = -2NK_{ab} + \mathcal{L}_X\gamma_{ab}$$ [22]

$$\mathcal{L}_{\partial_t}K_{ab} = N\left(R_{ab} - 2K_a{}^mK_{mb} + (\mathrm{tr}\,K)K_{ab} + E_aE_b + B_aB_b\right) - \nabla_{\partial_a}\nabla_{\partial_b}N + \mathcal{L}_XK_{ab}$$ [23]

$$\mathcal{L}_{\partial_t}A_a = N\left(E_a + \nabla_{\partial_a}u\right) + \mathcal{L}_XA_a$$ [24]

$$\mathcal{L}_{\partial_t}E^a = N\,\epsilon^{amn}\nabla_{\partial_m}B_n + \mathcal{L}_XE^a$$ [25]


As noted earlier, well-posedness theorems⁴ guarantee
that initial data satisfying the constraint
equations [17], [18], and [20] on a manifold $\Sigma^3$
can always at least locally be evolved into a
spacetime solution $(\Sigma^3 \times I, g, {}^4A)$ (for $I$ some interval
in $\mathbb{R}^1$) of the Einstein–Maxwell equations. We
now turn our attention to the issue of finding sets of
data which do satisfy the constraints.


The Conformal Method 


We seek to find sets of data $(\gamma, K, A, E)$ on a
manifold $\Sigma^3$ which satisfy the constraint equations

$$R - K^{mn}K_{mn} + (\mathrm{tr}\,K)^2 = \tfrac{1}{2}E^mE_m + \tfrac{1}{2}B^mB_m$$ [26]

$$\nabla_mK^m{}_a - \nabla_a(\mathrm{tr}\,K) = \epsilon_{amn}E^mB^n$$ [27]

$$\nabla_mE^m = 0$$ [28]


³The collective name “atlas field” for the lapse $N$, the shift $X$, the
electric potential $u$, and other such fields which are neither
constrained by the constraint equations nor evolved by the
evolution equations, derives from their role in controlling the
evolution of coordinate charts and bundle atlases in the course of
the construction of spacetime solutions of relativistic field
equations like the Einstein–Maxwell system.

⁴While the work cited earlier (Fourès-Bruhat 1952) proves well
posedness for the vacuum Einstein equations only, the extension
to the Einstein–Maxwell system is straightforward.


(Here and below, for convenience, we replace $\nabla_{\partial_a}$
by $\nabla_a$.) This is an underdetermined problem, with
five equations to be solved for 18 functions.

The idea of the conformal method is to divide the
initial data on $\Sigma^3$ into two sets, the “free (conformal)
data” and the “determined data,” in such a way that,
for a given choice of the free data, the constraint
equations become a determined elliptic partial
differential equation (PDE) system, to be solved for the
determined data. There are a number of ways to do this;
we focus here on one of them, the “semidecoupling
split” or “method A.” After describing this version of
the conformal method, and discussing what one can do
with it, we note some of its drawbacks and then later (in
the next section) consider some alternatives. (See
Choquet-Bruhat and York (1980) and Bartnik and
Isenberg (2004) for a more complete discussion of these
alternatives.)

For the Einstein—Maxwell theory, the split of the 
initial data is as follows: 


Free (“conformal”) data

$\lambda_{ij}$ – a Riemannian metric, specified up to
conformal factor;

$\sigma_{ij}$ – a divergence-free⁵ ($\nabla^i\sigma_{ij} = 0$), tracefree
($\lambda^{ij}\sigma_{ij} = 0$), symmetric tensor;

$\tau$ – a scalar field;

$\alpha_i$ – a 1-form;

$\mathcal{E}^i$ – a divergence-free vector field;

Determined data

$\phi$ – a positive-definite scalar field;

$W^i$ – a vector field;

$\xi$ – a scalar field.
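
As a bookkeeping check, the count of “five equations for 18 functions” can be compared with the split into free and determined data. The tally below is a sketch that assumes the usual component counts in three dimensions (a symmetric tensor has 6 components, a vector or 1-form has 3), which the text does not spell out; the symbols $\lambda, \sigma, \tau, \alpha, \mathcal{E}, \phi, W, \xi$ name the fields in the list above.

```latex
\begin{align*}
\text{initial data } (\gamma, K, A, E):\quad & 6 + 6 + 3 + 3 = 18 \\
\text{constraints [26]--[28]}:\quad & 1 + 3 + 1 = 5 \\
\text{free data } (\lambda, \sigma, \tau, \alpha, \mathcal{E}):\quad
  & (6 - 1) + (6 - 1 - 3) + 1 + 3 + (3 - 1) = 13 \\
\text{determined data } (\phi, W, \xi):\quad & 1 + 3 + 1 = 5
\end{align*}
```

Here $\lambda$ loses one function because it is specified only up to a conformal factor, $\sigma$ loses one for the trace condition and three for the divergence condition, and $\mathcal{E}$ loses one for the divergence condition. The totals agree: $13 + 5 = 18$.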


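For a given choice of the free data, the five
equations to be solved for the five functions of the
determined data take the form

$$\Delta\xi = 0$$ [29]

$$\nabla_m(LW)^m{}_a = \tfrac{2}{3}\phi^6\nabla_a\tau + \epsilon_{amn}\mathcal{E}^m\mathcal{B}^n$$ [30]

$$\Delta\phi = \tfrac{1}{8}R\,\phi - \tfrac{1}{8}\left(\sigma^{mn} + LW^{mn}\right)\left(\sigma_{mn} + LW_{mn}\right)\phi^{-7} - \tfrac{1}{16}\left(\mathcal{E}^m\mathcal{E}_m + \mathcal{B}^m\mathcal{B}_m\right)\phi^{-3} + \tfrac{1}{12}\tau^2\phi^5$$ [31]

where the Laplacian $\Delta$ and the scalar curvature $R$
are based on the $\lambda_{ab}$-compatible covariant derivative
$\nabla_a$, where $L$ is the corresponding conformal Killing
operator, defined by

$$(LW)_{ab} = \nabla_aW_b + \nabla_bW_a - \tfrac{2}{3}\lambda_{ab}\nabla_mW^m$$ [32]


⁵In the free data, the divergence-free condition is defined using the
Levi-Civita covariant derivative compatible with the conformal
metric $\lambda_{ij}$.


and where $\mathcal{B}_a := \epsilon_a{}^{mn}(\nabla_m\alpha_n - \nabla_n\alpha_m)$. Presuming
that for the chosen free data one can indeed solve
equations [29]–[31] for $\xi$, $\phi$, and $W$, then the initial
data $(\gamma, K, A, E)$ constructed via the formulas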


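$$\gamma_{ab} = \phi^4\lambda_{ab}$$ [33]

$$K^{ab} = \phi^{-10}\left(\sigma^{ab} + LW^{ab}\right) + \tfrac{1}{3}\phi^{-4}\lambda^{ab}\tau$$ [34]

$$A_b = \alpha_b$$ [35]

$$E^a = \phi^{-6}\left(\mathcal{E}^a + \nabla^a\xi\right)$$ [36]


satisfy the Einstein–Maxwell constraint equations
[26]–[28].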

Before discussing the extent to which one can
solve equations [29]–[31] and consequently use the
conformal method to generate solutions, we wish to
comment on how these equations are derived. Three
formulas are key to this derivation. The first is the
formula for the scalar curvature of the metric
$\gamma_{ab} = \phi^4\lambda_{ab}$, expressed in terms of the scalar
curvature for $\lambda_{ab}$ and derivatives of $\phi$:

$$R(\gamma) = \phi^{-4}R(\lambda) - 8\phi^{-5}\Delta\phi$$ [37]

We note that if we were to use a different power of $\phi$ as
the conformal factor multiplying $\lambda_{ab}$, then this formula
would involve squares of first derivatives of $\phi$ as well.
The second key formula relates the divergence of a
traceless symmetric tensor $\rho_{ab}$ with respect to the
covariant derivatives $\nabla_{(\gamma)}$ and $\nabla_{(\lambda)}$ compatible with
conformally related metrics. One obtains

$$\nabla^m_{(\gamma)}\rho_{mb} = \phi^{-6}\nabla^m_{(\lambda)}\left(\phi^2\rho_{mb}\right)$$ [38]

The third key formula does the same thing for a
vector field $\zeta^a$:

$$\nabla^{(\gamma)}_m\zeta^m = \phi^{-6}\nabla^{(\lambda)}_m\left(\phi^6\zeta^m\right)$$ [39]

In addition to helping us derive equations [29]–[31]
from the substitution of formulas [33]–[36] into
[26]–[28], these key formulas indicate to some
extent how the choice of the explicit decomposition
of the initial data into free and determined data is
made (see Isenberg, Maxwell, and Pollack for
further elaboration).
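
The simplest illustration of these formulas is the time-symmetric vacuum case with flat conformal metric: $\sigma = 0$, $\tau = 0$, $\alpha = \mathcal{E} = 0$ and $\lambda_{ab} = \delta_{ab}$, so that $R(\lambda) = 0$ and the equation for the conformal factor reduces to $\Delta\phi = 0$. The sketch below assumes the standard conformal transformation law $R(\gamma) = \phi^{-4}R(\lambda) - 8\phi^{-5}\Delta\phi$ and uses the well-known harmonic function $\phi = 1 + m/(2r)$ (a Schwarzschild slice in isotropic coordinates); the function names, the parameter `m` and the sample point are illustrative choices.

```python
def phi(x, y, z, m=1.0):
    """Conformal factor 1 + m/(2r), harmonic away from r = 0."""
    r = (x * x + y * y + z * z) ** 0.5
    return 1.0 + m / (2.0 * r)

def flat_laplacian(f, x, y, z, h=1e-3):
    """Second-order central-difference Laplacian in Cartesian coordinates."""
    c = f(x, y, z)
    return (f(x + h, y, z) + f(x - h, y, z)
            + f(x, y + h, z) + f(x, y - h, z)
            + f(x, y, z + h) + f(x, y, z - h) - 6.0 * c) / (h * h)

def conformal_scalar_curvature(f, x, y, z):
    """R(gamma) for gamma = f^4 * (flat metric): -8 f^-5 Laplacian(f)."""
    return -8.0 * f(x, y, z) ** -5 * flat_laplacian(f, x, y, z)

if __name__ == "__main__":
    point = (1.0, 2.0, 0.5)
    print("Laplacian of phi:", flat_laplacian(phi, *point))
    print("R(gamma):        ", conformal_scalar_curvature(phi, *point))
```

Both printed values vanish up to discretization error, so $\gamma_{ab} = \phi^4\delta_{ab}$ with $K_{ab} = 0$ satisfies the vacuum constraints at the sampled point. A non-harmonic choice such as $\phi = 1 + r^2$ gives a nonzero scalar curvature and therefore fails.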

It is easy to see that there are some choices of the
free data for which [29]–[31] do not admit any
solutions. Let us choose, for example, $\Sigma^3$ to be the
3-sphere, and let us set $\lambda$ to be the round sphere
metric, $\sigma$ to be zero everywhere, $\tau$ to be unity
everywhere, and both $\alpha$ and $\mathcal{E}$ to vanish everywhere.
We then readily determine that eqn [29] requires
that $\xi$ be constant and that eqn [30] requires that
$LW_{ab}$ be zero. The remaining equation [31] now
takes the form $\Delta\phi = (1/8)R\phi + (1/12)\phi^5$. Since the
right-hand side of this equation is positive definite
(recall the requirement that $\phi > 0$), it follows from
the maximum principle on closed (compact without
boundary) manifolds that there is no solution.
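
One way to see the obstruction, equivalent in spirit to the maximum-principle argument, is to integrate the equation over the closed manifold. The sketch below assumes only that the round 3-sphere has positive scalar curvature $R$ and that $\phi > 0$; the left-hand side vanishes by the divergence theorem because $S^3$ has no boundary.

```latex
\begin{align*}
0 = \int_{S^3} \Delta\phi \,\sqrt{\det\lambda}\; d^3x
  = \int_{S^3} \Big( \tfrac{1}{8} R\,\phi + \tfrac{1}{12}\,\phi^5 \Big)
    \sqrt{\det\lambda}\; d^3x \;>\; 0 .
\end{align*}
```

The contradiction shows that no positive $\phi$ can satisfy the equation for this choice of free data.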

In light of this example, one would like to know
exactly for which sets of free data eqns [29]–[31] can
be solved, and for which sets they cannot. Since one
readily determines that every set of initial data which
satisfies the Einstein–Maxwell constraints [26]–[28]
can be obtained via the conformal method, such a
classification effectively provides a parametrization
of the space of solutions of the constraints.⁶

What we know and do not know about classifying
free data for the solubility of eqns [29]–[31] is
largely determined by whether or not the function $\tau$
is chosen to be constant on $\Sigma^3$. If $\tau$ is chosen to be
constant, then eqns [29]–[31] effectively decouple,
and the classification is essentially completely
known. Sets of initial data generated from free
data with constant $\tau$ are called “constant mean
curvature” (CMC) sets, since the mean curvature of
the initial slice embedded in its spacetime development
is given by $\tau$. We also know a considerable
amount about the classification if $|\nabla\tau|$ is sufficiently
small (“near CMC”), while virtually nothing is
known for the general non-CMC case.

A full account of the classification results known 
to date is beyond the scope of this article. Indeed, 
such an account must separately deal with a number 
of alternatives regarding manifold and asymptotic 
conditions (data on a closed manifold; asymptoti- 
cally Euclidean data; asymptotically hyperbolic 
data; data on an incomplete manifold with boundaries)
and regularity (analytic data, smooth data,
$C^k$ data, or data contained in various Hölder or
Sobolev spaces), among other things. We will,
however, now summarize some of the results; see, 
for example, Bartnik and Isenberg (2004) or 
Choquet-Bruhat for more complete surveys. 


CMC Data on Closed Manifolds 


Generalizing the $S^3$ example given above, we note
that for any set of free data $(\Sigma^3, \lambda_{ab}, \sigma_{ab}, \tau, \alpha_a, \mathcal{E}^b)$
with constant $\tau$ and with no conformal Killing
fields, eqn [29] is easily solved for $\xi$, and then eqn
[30] takes the form

$$\nabla_m(LW)^m{}_a = \epsilon_{amn}\mathcal{E}^m\mathcal{B}^n$$ [40]

which is a linear elliptic PDE for $W^a$, with
invertible operator.⁷ This equation admits a unique
solution, and then the problem of solving the
constraints reduces to the analysis of the
“Lichnerowicz equation” [31].


⁶Of course, in claiming that appropriate sets of the free data
parametrize the space of solutions of the constraints, one needs to
determine if inequivalent sets of free data are mapped to the same
set of solutions. We discuss this below.

To determine if this equation admits a solution for
the given set of free data, we use the following
classification criteria: (1) The metric is labeled
positive $\mathcal{Y}^+(\Sigma^3)$, zero $\mathcal{Y}^0(\Sigma^3)$, or negative $\mathcal{Y}^-(\Sigma^3)$
Yamabe class depending upon whether the metric
$\lambda_{ab}$ on $\Sigma^3$ can be conformally deformed so that its
scalar curvature is everywhere positive, everywhere
zero, or everywhere negative.⁸ (2) The $(\sigma_{ab}, \alpha_a, \mathcal{E}^b)$
portion of the data is labeled either $\equiv$ or $\not\equiv$,
depending upon whether the quantity $\sigma_{mn}\sigma^{mn} +
\mathcal{E}^m\mathcal{E}_m + \mathcal{B}^m\mathcal{B}_m$ is identically zero, or not. (3) The
mean curvature $\tau$ is labeled “max” or “nonmax”
depending upon whether the constant $\tau$ is zero or
not. In terms of these criteria, we have 12 classes of
free data, and one can prove (Choquet-Bruhat and
York 1980, Isenberg 1995) the following:


• Solutions exist for the classes $(\mathcal{Y}^+, \not\equiv, \text{max})$,
$(\mathcal{Y}^+, \not\equiv, \text{nonmax})$, $(\mathcal{Y}^0, \equiv, \text{max})$, $(\mathcal{Y}^0, \not\equiv, \text{nonmax})$,
$(\mathcal{Y}^-, \equiv, \text{nonmax})$, and $(\mathcal{Y}^-, \not\equiv, \text{nonmax})$.

• Solutions do not exist for the classes $(\mathcal{Y}^+, \equiv, \text{max})$,
$(\mathcal{Y}^+, \equiv, \text{nonmax})$, $(\mathcal{Y}^0, \equiv, \text{nonmax})$, $(\mathcal{Y}^0, \not\equiv, \text{max})$,
$(\mathcal{Y}^-, \equiv, \text{max})$, and $(\mathcal{Y}^-, \not\equiv, \text{max})$.


This classification is exhaustive, in the sense that
every set of CMC data on a closed manifold fits
neatly into exactly one of the classes. We note that
the proofs of existence of solutions can generally be
done using the sub-super solution technique, while
the nonexistence results follow from application of
the maximum principle.
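
The twelve classes can be encoded as a small lookup table. The sketch below is illustrative only: the function name, the string labels for the Yamabe classes and the boolean arguments are assumptions introduced here, and the table simply records the existence results listed above for CMC data on a closed manifold.

```python
# Classes (yamabe_class, fields_vanish, maximal) for which a solution exists.
#   yamabe_class : "Y+", "Y0" or "Y-"
#   fields_vanish: True if sigma^2 + E^2 + B^2 is identically zero
#   maximal      : True if the constant mean curvature tau is zero
SOLVABLE = {
    ("Y+", False, True),
    ("Y+", False, False),
    ("Y0", True, True),
    ("Y0", False, False),
    ("Y-", True, False),
    ("Y-", False, False),
}

def lichnerowicz_solvable(yamabe_class, fields_vanish, maximal):
    """Return True if the Lichnerowicz equation admits a positive solution
    for CMC free data of the given class on a closed manifold."""
    if yamabe_class not in ("Y+", "Y0", "Y-"):
        raise ValueError("yamabe_class must be 'Y+', 'Y0' or 'Y-'")
    return (yamabe_class, bool(fields_vanish), bool(maximal)) in SOLVABLE

if __name__ == "__main__":
    # Round 3-sphere with sigma = 0, no electromagnetic field, tau = 1.
    print(lichnerowicz_solvable("Y+", fields_vanish=True, maximal=False))
    # Any class with nonvanishing fields and tau != 0 is solvable.
    for y in ("Y+", "Y0", "Y-"):
        print(y, lichnerowicz_solvable(y, fields_vanish=False, maximal=False))
```

The first call corresponds to the 3-sphere example discussed earlier, for which no solution exists.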


Maximal Asymptotically Euclidean Data 


Just as is the case for data on a closed manifold,
the constraint equations [29] and [30] decouple
from the Lichnerowicz equation [31] for
asymptotically Euclidean data with constant $\tau$. We note
that $\tau \neq 0$ is inconsistent with the data being
asymptotically Euclidean, so we restrict to the
maximal case, $\tau = 0$.


⁷A metric $\lambda$ has a conformal Killing field if the equation $LY = 0$
has a nontrivial solution $Y$. Geometrically, the existence of a
conformal Killing field $Y$ indicates that the flow of $(\Sigma^3, \lambda_{ab})$ along
$Y$ is a conformal isometry. While free data with nonvanishing
conformal Killing fields can be handled, for convenience we shall
stick to data without them here.

⁸Work on the Yamabe problem (Aubin 1998) shows that every
Riemannian metric on a closed manifold is contained in one and
only one of these classes. In fact, the Yamabe theorem (Schoen
1984) shows that every metric can be conformally deformed so
that its scalar curvature is $+1$, $0$, or $-1$, but this result is not
needed for the analysis of the constraint equations.

The criterion for solubility of the constraints in
conformal form for maximal asymptotically
Euclidean free data is quite a bit simpler to state than
that for CMC data on a closed manifold. It
involves the metric $\lambda$ only; the rest of the free
data is irrelevant. Specifically, as shown by Brill
and Cantor (with a correction by Maxwell (2005)),
a solution exists if and only if, taking the infimum over
all nonvanishing, compactly supported, smooth functions $f$ on $\Sigma^3$,
we have

$$\inf_{f \not\equiv 0} \frac{\int_{\Sigma^3}\left(|\nabla f|^2 + \tfrac{1}{8}R f^2\right)\sqrt{\det\lambda}}{\left(\int_{\Sigma^3} f^6 \sqrt{\det\lambda}\right)^{1/3}} > 0$$ [41]


Alternative Methods for Finding Solutions 
to the Constraint Equations 


While the conformal method has proved to be a
very useful tool for generating and analyzing
solutions of the Einstein constraint equations, it
does have some minor drawbacks: (1) The free data
is remote from the physical data, since the
conformal factor can vastly change the physical
scale on different regions of space. (2) While
casting the constraints into a determined PDE
form has the advantage of producing PDEs of a
relatively familiar (elliptic) form, one does give up
certain flexibilities inherent in an underdetermined
set of PDEs. (We expand upon this point below in
the course of discussing gluing.) (3) In choosing a
set of free data, one does have to first project out a
divergence-free vector field ($\mathcal{E}$) and a
divergence-free tracefree tensor field ($\sigma$). (4) While the choice
of CMC free data for the conformal method is
conformally covariant in the sense that conformally
related sets of CMC free data $(\Sigma^3, \lambda_{ab}, \sigma_{ab}, \tau, \alpha_a, \mathcal{E}^b)$
and $(\Sigma^3, \theta^4\lambda_{ab}, \theta^{-2}\sigma_{ab}, \tau, \alpha_a, \theta^{-6}\mathcal{E}^b)$ produce the
same physical solution to the constraints, this is
not the case for non-CMC free data.


Conformal Thin Sandwich 


The last two of these problems can be removed by
modifying the conformal method in a way which
York (1999) has called the “conformal thin sandwich”
(CTS) approach. The basic idea of the CTS
approach is the same as that of the conformal
method. However, CTS free data sets are larger
(the divergence-free tracefree symmetric tensor field
$\sigma$ is replaced by a tracefree symmetric tensor field $U$,
and an extra scalar field $\eta$ is added), and after
solving the CTS constraint equations


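$$\Delta\xi = 0$$ [42]

$$\nabla_m\left((2\eta)^{-1}(LY)^m{}_a\right) = \tfrac{2}{3}\Phi^6\nabla_a\tau + \epsilon_{amn}\mathcal{E}^m\mathcal{B}^n + \nabla_m\left((2\eta)^{-1}U^m{}_a\right)$$ [43]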

AP =i RP —1(U”” + LY””)(Umn + LY inn) ® 
+ (E”Em + B"Bm)P I HAr D [44] 


for the vector field $Y$ and the conformal factor $\Phi$,
one obtains not just the full set of physical initial
data satisfying the constraint equations [26]–[28]


$$\gamma_{ab} = \Phi^4\lambda_{ab}$$ [45]


Ka =07* (Ua + LY) +52" rAwT [46] 


$$A_b = \alpha_b$$ [47]

$$E^a = \Phi^{-6}\left(\mathcal{E}^a + \nabla^a\xi\right)$$ [48]

but also the lapse $N$ and shift $X$

$$N = \Phi^6\eta$$ [49]

$$X^a = Y^a$$ [50]


Clearly, in using the CTS approach, one need not 
project out a divergence-free part of a symmetric 
tracefree tensor. One also readily checks that the 
CTS method is conformally covariant in the sense 
discussed above: the physical data generated from 
CTS free data (Aap, Ugh T; N, Qa, E?) and from data 
(64.550 2 U nbs T, 06N, gE”) are the same. 
Furthermore, since the mathematical form of eqns 
[42]-[44] is very similar to that of [29]-[31], the 
solvability results for the conformal method can be 
essentially carried over to the CTS approach. 

There is, however, one troubling feature of the 
CTS approach. The problem arises if we seek CMC 
initial data with the lapse function chosen so that the 
evolving data continue to have CMC (such a gauge 
choice is often used in numerical relativity). In the 
case of the conformal method, after solving [29]-[31] 
to obtain initial data $(\gamma_{ab}, K_{ab}, A_a, E^a)$ which satisfies
the constraints, one achieves this by proceeding to 
solve a linear homogeneous elliptic PDE for the lapse 
function. One easily verifies that solutions to this 
extra equation always exist. By contrast, in the CTS 
approach, the extra equation takes the form 


A(n) =} nR +3 (8n) (U — LX) 
+ (n) (E + 8°) 
+Y” Va, T — D’ [51] 
which is coupled to the system [42]-[44]. The
coupling is fairly intricate; hence little is known 
about the existence of solutions to the system, and it 
has been seen that there are problems with unique- 
ness. Such problems of course do not arise if one 
makes no attempt to preserve CMC. 


The Quasispherical Ansatz and Parabolic Methods 


Applying either the conformal method or the CTS 
approach to the constraint equations results in 
systems of elliptic equations. Another approach, 
pioneered by Bartnik (1993), produces instead 
parabolic equations. In the simplest version of this 
approach, known as the “quasispherical ansatz,” 
one works on a manifold £? = R3 \ B3, where B3 is a 
3-ball; one presumes that there exist coordinates 
(r,0,) on ©? in terms of which the metric takes the 
“quasispherical” form 


yos =u"dr + (rdO + 8dr)” 
+ (rsin 8 dọ + B°dr)* [52] 


for functions $u(r,\theta,\phi)$, $\beta^\theta(r,\theta,\phi)$, $\beta^\phi(r,\theta,\phi)$, and
then one attempts to satisfy the time-symmetric
constraint $R(\gamma_{QS}) = 0$ on $\Sigma^3$.⁹ Calculating the scalar
curvature for the metric in this form, one finds that
the equation $R(\gamma_{QS}) = 0$ can be written as


$(r\,\partial_r - \beta^\theta\,\partial_\theta - \beta^\phi\,\partial_\phi)\,u - u^2 \Delta u = O(u, \beta^\theta, \beta^\phi, r, \theta, \phi)$   [53]


where O is a polynomial in the positive function u. 
One can now show that if one specifies $\beta^\theta$ and $\beta^\phi$
everywhere on $\Sigma^3$ (subject to an upper bound on the
divergence of the vector field $(\beta^\theta, \beta^\phi)$), and if one
specifies regular initial data for u on the inner
boundary of $\Sigma^3$, then one has a well-posed initial-
value problem (in terms of the “evolution” coordi- 
nate r) for the parabolic PDE [53]. Ideally, one can 
use this approach to extend solutions of the time- 
symmetric constraints from an isolated region 
(corresponding to B3) out to spatial infinity. 
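
The "evolution in r" character of this construction can be illustrated in the simplest special case, which is an assumption of this sketch rather than the general equation [53]: take both shift functions to vanish and let u depend on r only, so that the metric is $u^2 dr^2 + r^2 d\Omega^2$. Its scalar curvature is $R = 4u'/(r u^3) + 2(1 - u^{-2})/r^2$, so R = 0 reduces to the ordinary differential equation $r\,u' = (u - u^3)/2$, whose solutions are $u = (1 - 2m/r)^{-1/2}$ (the Schwarzschild spatial metric). The function names below are hypothetical, and the integrator is a plain fourth-order Runge-Kutta scheme chosen only for brevity.

```python
import math


def rhs(r, u):
    """du/dr for R = 0 with metric u^2 dr^2 + r^2 dOmega^2 (zero shift, u = u(r))."""
    return (u - u**3) / (2.0 * r)


def integrate_u(r0, u0, r1, steps=20000):
    """March u outward from the inner boundary r0 to r1 with classical RK4."""
    h = (r1 - r0) / steps
    r, u = r0, u0
    for _ in range(steps):
        k1 = rhs(r, u)
        k2 = rhs(r + h / 2, u + h * k1 / 2)
        k3 = rhs(r + h / 2, u + h * k2 / 2)
        k4 = rhs(r + h, u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        r += h
    return u


def schwarzschild_u(r, m):
    """Closed-form solution u = (1 - 2m/r)^(-1/2) of the reduced equation."""
    return 1.0 / math.sqrt(1.0 - 2.0 * m / r)


if __name__ == "__main__":
    m = 1.0
    u_inner = schwarzschild_u(3.0, m)          # data on the inner boundary r = 3
    u_outer = integrate_u(3.0, u_inner, 50.0)  # extend outward to r = 50
    print(u_outer, schwarzschild_u(50.0, m))
```

In this reduced setting, prescribing u on the inner sphere determines it for all larger r, and u tends to 1 as r grows, which is the asymptotically Euclidean behavior one hopes for in the general case.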

The basic quasispherical ansatz approach just 
outlined can be generalized significantly (Sharples 
2001, Bartnik and Isenberg 2004) to allow for more 
general spatial metrics, and to allow nonzero 
$K_{ab}$, $A_a$, and $E^a$. It has been an especially valuable
tool for the study of mass in asymptotically 
Euclidean data sets. It does not, however, purport 
to construct general solutions of the constraint 
equations. 


⁹This version of the constraints is called “time symmetric” since
one is solving the full set of constraints with $K_{ab}$ assumed to be
zero. Data with $K_{ab} = 0$ is time symmetric.


Gluing Solutions of the Constraint Equations


Starting around the year 2000, a number of new 
“gluing” procedures have been developed for con- 
structing and studying solutions of the constraint 
equations. Unlike the conformal method, the CTS 
method, and the quasispherical ansatz, all of which 
construct solutions from scratch, the gluing proce- 
dures construct new solutions from given ones. This 
feature, and the considerable flexibility of the 
procedures, has resulted in a wealth of applications 
already in the short five-year history of gluing in 
general relativity. 

One of the gluing approaches, developed by 
Corvino (2000) and Corvino and Schoen (preprint) 
(see also Chruściel and Delay (2002)), allows one to 
choose a compact region $\Omega$ in almost any smooth,
asymptotically Euclidean vacuum solution of the
constraints, and from this produce a new smooth
solution which is completely unchanged in the
region $\Omega$ and is identical to Schwarzschild or Kerr
outside some larger region. In proving this result,
one exploits the underdetermined character of the
constraint equations: such a construction could not
be carried out if the constraints were a determined
PDE system.¹⁰

The other main gluing approach, developed first by 
Isenberg et al. (2001), and then further developed with 
Chruściel (Chruściel et al. 2005) and with Maxwell 
(Isenberg et al. 2005), starts with a pair of solutions of 
the (vacuum) constraints $(\Sigma^3_1, \gamma_1, K_1)$ and $(\Sigma^3_2, \gamma_2, K_2)$
together with a choice of a pair of points $p_1 \in \Sigma^3_1$, $p_2 \in \Sigma^3_2$,
one from each solution. From these solutions, this
gluing procedure produces a new set of initial data
$(\Sigma^3_{(1-2)}, \gamma_{(1-2)}, K_{(1-2)})$ with the following properties:
(1) $\Sigma^3_{(1-2)}$ is diffeomorphic to the connected sum
$\Sigma^3_1 \# \Sigma^3_2$; (2) $(\Sigma^3_{(1-2)}, \gamma_{(1-2)}, K_{(1-2)})$ is a solution of the
constraints everywhere on $\Sigma^3_{(1-2)}$; (3) on that portion
of $\Sigma^3_{(1-2)}$ which corresponds to $\Sigma^3_1 \setminus$ {ball around $p_1$},
the data $(\gamma_{(1-2)}, K_{(1-2)})$ is isomorphic to $(\gamma_1, K_1)$, with
a corresponding property holding on that portion of
$\Sigma^3_{(1-2)}$ which corresponds to $\Sigma^3_2 \setminus$ {ball around $p_2$} (see
Figure 3).¹¹

This connected sum gluing can be carried out for 
very general sets of initial data. The sets can be 
asymptotically Euclidean, asymptotically hyperbolic, 
specified on a closed manifold, or indeed anything 


¹⁰Hence if one tries to do Corvino-Schoen-type gluing using a
fixed conformal geometry, the gluing fails because the determined
elliptic system satisfies the unique continuation property.

¹¹The connected sum of the two manifolds (see property (1)) is
constructed as follows: first we remove a ball from each of the
manifolds $\Sigma^3_1$ and $\Sigma^3_2$. We then use a cylindrical bridge $S^2 \times I$
(where I is an interval in $\mathbb{R}^1$) to connect the resulting $S^2$
boundaries on each manifold.


Figure 3 Connected sum gluing.


else. The only condition that the data sets must 
satisfy is that, in sufficiently small neighborhoods of 
each of the points at which the gluing is to be done, 
there do not exist nontrivial solutions $\xi$ to the
equation $D\Phi^*_{(\gamma,K)}\,\xi = 0$, where $D\Phi^*_{(\gamma,K)}$ is the opera-
tor obtained by taking the adjoint of the linearized
constraint operator.¹² In work by Beig, Chruściel,
and Schoen, it is shown that this condition (some-
times referred to as “No KIDs,” meaning “no
(localized) Killing initial data”) is indeed generically
satisfied.

While a discussion of the proof that connected 
sum gluing can be carried out to this degree of 
generality is beyond the scope of this paper (see 
Chruściel et al. (2005), along with references cited 
therein for details of the proof), we note three 
features of it: first, the proof is constructive in the 
sense that it outlines a systematic, step-by-step 
mathematical procedure for doing the gluing. In 
principle, one should be able to carry out the gluing 
procedure numerically. Second, connected sum glu- 
ing relies primarily on the conformal method, but it 
also uses a nonconformal deformation at the end 
(dependent on the techniques of Corvino and 
Schoen, and of Chruściel and Delay), so as to 
guarantee that the glued data is not just very close to 
the given data on regions away from the bridge, but 
is indeed identical to it. Third, while Corvino- 
Schoen gluing has not yet been proved to work for 
solutions of the constraints with source fields, 
connected sum gluing (up to the last step, which 
relies on Corvino—Schoen) has been shown to work 
for most matter source fields of interest (Isenberg 
et al.). It has also been shown to work for general 
dimensions greater than or equal to three. 


¹²When a solution to this equation does exist on some region
$\Lambda \subset \Sigma^3$, it follows from the work of Moncrief that the spacetime
development of the data on $\Lambda$ admits a nontrivial isometry.


While gluing is not an efficient tool for studying 
the complete set of solutions to the constraints, it 
has proved to be very valuable for a number of 
applications. We note a few here. 


1. Spacetimes with regular asymptotic structure. 
Until recently, it was not known whether there 
is a large class of solutions which admit the 
conformal compactification and consequent 
asymptotically simple structure at null and space- 
like infinity characteristic of the Minkowski and 
Schwarzschild spacetimes. Using Corvino-Schoen 
gluing, together with Friedrich’s analyses of 
spacetime asymptotic structures and an argument 
of Chruściel and Delay (2002), one produces 
such a class of solutions. 

2. Multi-black hole data sets. Given an asymptoti- 
cally Euclidean solution of the constraints, con- 
nected sum gluing allows a sequence of (almost) flat 
space initial data sets to be glued to it. The bridges 
that result from this gluing each contain a minimal 
surface, and consequently an apparent horizon. 
With a bit of care, one can do this in such a way 
that indeed the event horizons which appear in the 
development of this glued data are disjoint, and 
therefore indicative of independent black holes. 

3. Adding a black hole to a cosmological space- 
time. Although there is no clear established 
definition for a black hole in a spatially compact 
solution of Einstein’s equations, one can glue an 
asymptotically Euclidean solution of the constraints 
to a solution on a compact manifold, in such a way 
that there is an apparent horizon on the bridge. 
Studying the nature of these solutions of the 
constraints, and their evolution, could be useful in 
trying to understand what one might mean by a 
black hole in a cosmological spacetime. 

4. Adding a wormhole to your spacetime. While 
we have discussed connected sum gluing as a 
procedure which builds solutions of the con- 
straints with a bridge connecting two points on 
different manifolds, it can also be used to build a 
solution with a bridge connecting a pair of points 
on the same manifold. This allows one to do the 
following: if one has a globally hyperbolic 
spacetime solution of Einstein’s equations, one 
can choose a Cauchy surface for that solution, 
choose a pair of points on that Cauchy surface, 
and glue the solution to itself via a bridge from 
one of these points to the other. If one now 
evolves this glued-together initial data into a 
spacetime, it will likely become singular very 
quickly because of the collapse of the bridge. 
Until the singularity develops, however, the 
solution is essentially as it was before the gluing, 
with the addition of an effective wormhole.
Hence, this procedure can be used to glue a 
wormhole onto a generic spacetime solution. 

5. Removing topological obstructions for constraint 
solutions. We know that every closed three-
dimensional manifold $M^3$ admits a solution of
the vacuum constraint equations. To show this,
we use the fact that $M^3$ always admits a metric $\Gamma$
of constant negative scalar curvature. One easily
verifies that the data $(\gamma = \Gamma, K = \Gamma)$ is a CMC
solution (with $\Gamma$ scaled so that its scalar curvature
is −6). Combining this result with connected
sum gluing, one can show that for every closed
$\Sigma^3$, the manifold $\Sigma^3 \setminus \{p\}$ admits both an asymp-
totically Euclidean and an asymptotically hyper-
bolic solution of the vacuum constraint equations.

6. Proving the existence of vacuum solutions 
on closed manifolds with no CMC Cauchy 
surface. Based on the work of Bartnik (1988) 
one can show that if one has a set of initial data 
on the manifold $T^3 \# T^3$ with the metric compo-
nents symmetric across a central sphere and the
components of K skew symmetric across that
same central sphere, then the spacetime develop-
ment of that data does not admit a CMC Cauchy
surface. Using connected sum gluing, one can
show that indeed initial data sets of this sort exist
(Chruściel et al. 2005).


Conclusion 


Much is known about the Einstein constraint 
equations and those sets of initial data which satisfy 
them. We know how to use the conformal method 
or the CTS approach to construct (and parametrize 
in terms of free data) the CMC and near CMC sets 
of data which solve the constraints, with or without 
matter fields present. We know how to use the 
quasispherical approach to explore extensions of 
solutions of the constraint equations from compact 
regions. We know how to use gluing techniques to 
produce new solutions of both physical and math- 
ematical interest from old ones, and we know how 
to use gluing as a tool for proving such results as the 
existence of vacuum spacetimes with no CMC 
Cauchy surfaces. 

There is much that is not yet known as well. Very 
little is known about solutions of the constraint 
equations which have neither CMC nor near CMC. It 
is not known how to systematically extend solutions of 
the constraints from a compact region to all of $\mathbb{R}^3$ in
such a way that the extension is asymptotically 
Euclidean (unless we know a priori that such an 
extension exists). Very little is known regarding how 
to control the constraints during the course of 
numerical evolution of solutions. Most importantly,
we do not yet know how to systematically find 
solutions of the constraint equations which serve as 
physically realistic model initial data sets for studying 
astrophysical and cosmological systems of interest. 

Many of these questions concerning the Einstein 
constraints and their solutions are fairly daunting. 
However, in view of the rapid progress in our 
understanding during the last few years, and in view 
of the pressing need to further develop the initial- 
value formulation as a tool for studying general 
relativity and gravitational physics, we are optimis- 
tic that this progress will continue, and we will soon 
have answers to a number of these questions. 
originally appeared in relativity, but it is of
tremendous interest from the point of view of pure 
mathematics. Demanding a metric of constant 
sectional curvature is a very strong condition, 
while metrics of constant scalar curvature always 
occur. The Einstein property, which is essentially a 
constant-Ricci-curvature condition, occupies an 
intermediate position between these conditions, and 
it is still not clear exactly how strong it is. In
dimensions higher than four, it is still unknown
whether there are obstructions to a manifold 
admitting an Einstein metric. 

The study of Einstein manifolds is a vast and 
rapidly expanding area, and this article can merely 
touch on some points of particular interest. The 
focus of the article is very much on the Riemannian 
rather than Lorentzian case (see, e.g., Hawking and 
Ellis (1973) or the articles by Christodoulou and 
Tod in LeBrun and Wang (1999) for a discussion of 
the Lorentzian case in general relativity). For further 
reading, the books of Besse (1987) and LeBrun and 
Wang (1999) are strongly recommended. 


Basic Properties 


Let (M,g) be a (pseudo)-Riemannian manifold. 
There is a unique connection $\nabla$, the Levi-Civita
connection of g, with the following properties:


1. the torsion $T(X,Y) = \nabla_X Y - \nabla_Y X - [X,Y]$
vanishes and
2. $\nabla g = 0$


We can now form the Riemann curvature tensor 
of g: 


$R(X,Y)Z = \nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X,Y]} Z$


This is a type (3,1) tensor. There is one nontrivial 
contraction we can perform to obtain a (2, 0) tensor, 
that is, the Ricci curvature 


$\mathrm{Ric}(X,Y) = \mathrm{tr}(Z \mapsto R(X,Z)Y)$


We may perform a further contraction and obtain
the scalar curvature $s = \mathrm{tr}_g \mathrm{Ric}$.

The Ricci curvature is a symmetric tensor of the 
same type as the metric, so we can make the 
following definition: 


Definition 1 A metric g is Einstein if
$\mathrm{Ric} = \lambda g$   [1]
for some constant $\lambda$.


In this article, we shall take g to be a Riemannian 
(positive-definite) metric. 


Remark 1 In dimension higher than 2, we do not
have to put in the assumption that $\lambda$ is constant by
hand. For, taking the divergence of [1] gives
$(1/2)\,ds = d\lambda$, while taking instead the trace gives
$s = n\lambda$, so if $n \neq 2$, we see $d\lambda = 0$.
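
The two steps in Remark 1 can be spelled out as follows. This is a sketch that writes $\lambda$ for the (a priori nonconstant) Einstein function and uses the contracted second Bianchi identity $\operatorname{div}\mathrm{Ric} = \tfrac12\,ds$, a standard fact that the remark uses implicitly.

```latex
\begin{align*}
\operatorname{div}(\mathrm{Ric}) = \tfrac12\,ds, \qquad
\operatorname{div}(\lambda g) = d\lambda
  \quad &\Longrightarrow \quad \tfrac12\,ds = d\lambda, \\
\operatorname{tr}_g(\mathrm{Ric}) = s, \qquad
\operatorname{tr}_g(\lambda g) = n\lambda
  \quad &\Longrightarrow \quad ds = n\,d\lambda, \\
\text{hence} \quad \tfrac{n}{2}\,d\lambda = d\lambda
  \quad &\Longrightarrow \quad (n-2)\,d\lambda = 0 .
\end{align*}
```

For $n = 2$ the last line is vacuous, which is why constancy must be imposed by hand in that dimension.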


Remark 2 In dimension 2 and 3, the Einstein
condition is equivalent to constant curvature. The
only complete Einstein manifolds in these dimen-
sions are therefore the model spaces $S^n$, $\mathbb{R}^n$ and
hyperbolic space, and quotients of these by discrete
groups of isometries.


Remark 3 As noticed by Hilbert, the Einstein 
equations admit a variational interpretation. They 
are the variational equations for the total scalar 
curvature functional 


$g \mapsto \int_M s_g \, d\mu_g$


restricted to the space of volume 1 metrics (here $d\mu_g$
denotes the volume form defined by g).


Obstructions 


The most fundamental question we can ask is:
given a smooth manifold M, does it support an
Einstein metric?

One is also interested in the question of unique- 
ness of such a metric, or more generally of 
describing the moduli space of such metrics. 

In this section we discuss obstructions to existence. 
In dimension 2, Remark 2 shows that any compact
manifold admits an Einstein metric, while in dimen-
sion 3 the only possibilities are space forms. In
particular, there is no Einstein metric on $S^1 \times S^2$.

The picture is much less clear in higher dimen-
sions. If $\lambda \geq 0$, one obtains some elementary
obstructions just by considering the sign of the
Ricci curvature:


1. If M supports a complete Einstein metric with
$\lambda > 0$, then by Myers’s theorem M is compact
and $\pi_1(M)$ is finite. Also there are obstructions
coming from the positivity of the scalar curvature
(e.g., if M is spin and 4m-dimensional, then the $\hat{A}$
genus vanishes).

2. If M supports a complete Ricci-flat metric, then
every finitely generated subgroup of $\pi_1(M)$ has
polynomial growth.


However, if $\dim M \geq 5$, there is, at the time of
writing, no known obstruction to M supporting an 
Einstein metric of negative Einstein constant. 

In the borderline dimension 4, Hitchin and Thorpe
observed that the Einstein condition puts topological
constraints on the manifold. For we have the
following expressions for the Euler characteristic $\chi$
and signature $\tau$ in terms of the curvature tensor:


$\tau = \frac{1}{12\pi^2} \int_M \left( |W_+|^2 - |W_-|^2 \right) d\mu$


$\chi = \frac{1}{8\pi^2} \int_M \left( |W_+|^2 + |W_-|^2 - \tfrac{1}{2}|\mathrm{Ric}_0|^2 + \tfrac{1}{24}s^2 \right) d\mu$
where $W_+$ and $W_-$ are the self-dual and anti-
self-dual parts of the Weyl tensor, s is the scalar
curvature, and $\mathrm{Ric}_0$ is the trace-free part of the Ricci
tensor.

The Einstein condition is just $\mathrm{Ric}_0 = 0$, so we
immediately obtain the following inequality.


Theorem (Hitchin 1974). A compact four-
dimensional Einstein manifold satisfies the inequality


$|\tau| \leq \tfrac{2}{3}\chi$
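
The step from the integral formulas to the inequality can be written out as follows. This sketch assumes the standard normalization of the four-dimensional Gauss-Bonnet and signature integrands, with $\chi$ the Euler characteristic, $\tau$ the signature, $W_\pm$ the self-dual and anti-self-dual Weyl tensors, $\mathrm{Ric}_0$ the trace-free Ricci tensor, and s the scalar curvature.

```latex
\begin{align*}
2\chi \pm 3\tau
  &= \frac{1}{4\pi^2} \int_M \Big( 2\,|W_\pm|^2
     - \tfrac12\,|\mathrm{Ric}_0|^2 + \tfrac{1}{24}\,s^2 \Big)\, d\mu \\
  &= \frac{1}{4\pi^2} \int_M \Big( 2\,|W_\pm|^2 + \tfrac{1}{24}\,s^2 \Big)\, d\mu
     \;\geq\; 0
     \qquad \text{when } \mathrm{Ric}_0 = 0 .
\end{align*}
```

Taking both signs gives $3|\tau| \leq 2\chi$, and equality forces $s = 0$ together with the vanishing of $W_+$ or $W_-$.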


Note that equality is obtained if and only if g is
Ricci-flat and (anti)-self-dual, which is equivalent to
locally hyper-Kähler for some orientation. The only
examples are the flat torus, the K3 surface with the
Yau metric (now $|\tau| = 16$ and $\chi = 24$), and two
quotients of K3.
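
As a quick numerical check of the inequality, the sketch below tests $3|\tau| \leq 2\chi$ for a few closed oriented 4-manifolds. The function names are hypothetical, and the topological invariants in the table are standard values supplied for this illustration rather than data from the article. Passing the test is only a necessary condition for the existence of an Einstein metric.

```python
def satisfies_hitchin_thorpe(chi: int, tau: int) -> bool:
    """Necessary condition 3|tau| <= 2*chi for a compact Einstein 4-manifold."""
    return 3 * abs(tau) <= 2 * chi


def is_borderline(chi: int, tau: int) -> bool:
    """Equality case 3|tau| = 2*chi (Ricci-flat and half conformally flat)."""
    return 3 * abs(tau) == 2 * chi


# (Euler characteristic, signature) for some standard examples.
EXAMPLES = {
    "S^4": (2, 0),
    "CP^2": (3, 1),
    "S^2 x S^2": (4, 0),
    "T^4": (0, 0),
    "K3": (24, -16),
    "connected sum of 5 copies of CP^2": (7, 5),
}

if __name__ == "__main__":
    for name, (chi, tau) in EXAMPLES.items():
        print(name, satisfies_hitchin_thorpe(chi, tau), is_borderline(chi, tau))
```

The connected sum of five copies of $\mathbb{CP}^2$ fails the test, so it cannot carry an Einstein metric, while K3 and the 4-torus sit exactly on the boundary.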

Since the mid-1990s, LeBrun (2003) has obtained 
a series of results which sharpen the Hitchin—Thorpe 
inequality by obtaining estimates on the Weyl and 
scalar curvature terms. These estimates are obtained 
by using Seiberg-Witten theory, the general theme 
being that nonemptiness of the Seiberg—Witten 
moduli space gives lower bounds on the curvature 
terms. LeBrun shows there are infinitely many 
compact smooth simply connected 4-manifolds 
that satisfy the Hitchin-Thorpe inequality but 
nonetheless do not admit Einstein metrics. 


Uniqueness and Moduli 


In Yang-Mills theory, there is a highly developed 
theory of moduli spaces of instantons, including 
formulas for the dimension. The situation for 
Einstein metrics is far less well understood. The 
relevant moduli space here is the set of Einstein 
metrics modulo the action of the diffeomorphism 
group, but there are very few manifolds for which 
the moduli space has been determined. In dimension 
2, of course, this is essentially the subject of
Teichmüller theory.

One example where the moduli space is under- 
stood is the K3 surface. As explained above, the 
Hitchin-Thorpe argument shows that any Einstein
metric is hyper-Kähler, and the moduli space of such
structures on K3 is understood as an open set in a 
certain noncompact symmetric space. 

Some uniqueness results have been obtained in 
four dimensions. LeBrun used Seiberg—Witten tech- 
niques to show that the Einstein metric on a 
compact quotient of the complex hyperbolic plane
$\mathbb{CH}^2$ is unique up to homotheties and diffeomorph-
isms. The analogous result for compact quotients of
real hyperbolic 4-space was obtained using entropy
methods by Besson, Courtois, and Gallot. It is still
unknown, however, whether nonstandard Einstein
metrics can exist on $S^4$.

In higher dimensions, very little is known. One can, 
by analogy with the theory of instantons, consider the 
linearization of the Einstein equations together with a
further linear equation expressing orthogonality to 
the orbits of the diffeomorphism group. This gives a 
notion of formal tangent space to the Einstein moduli 
space. However, Koiso has shown that formal 
tangent vectors need not integrate to a curve of 
Einstein metrics. The structure of the moduli space 
(dimension, possible singularities) remains quite 
mysterious in general. It is known from the Wang- 
Ziller torus bundle examples that the moduli space 
can have infinitely many components. 


Special Holonomy 


Berger classified the possible holonomy groups of 
simply connected, irreducible, nonsymmetric n- 
dimensional Riemannian manifolds. The generic 
case is that of holonomy SO(n), and there are six 
other possibilities, each of which corresponds to 
some special geometry. Interestingly, four of these
are automatically Ricci-flat, while a fifth is Einstein
with $\lambda \neq 0$. The remaining example, that of Kähler
geometry, is not automatically Einstein, but the
Einstein equations with the additional Kähler
assumption reduce to a scalar Monge-Ampère
equation and are therefore simpler than the general
Einstein system.

For further reading in this section, see the articles 
by Boyer-Galicki, Joyce, Salamon, Tian, Yau and 
the author in part I of LeBrun and Wang (1999), 
and also the book of Joyce (2000). For the Kähler
case, see also Tian (2000).


Kähler Manifolds (Holonomy U(n/2), SU(n/2))


A Kähler manifold (M,g) admits a covariant
constant complex structure I, and associated
Kähler 2-form $\omega$ defined by $\omega(X,Y) = g(IX,Y)$.
The Ricci form $\rho$ is defined by
$\rho(X,Y) = \mathrm{Ric}(IX,Y)$, so the Einstein condition for a Kähler
manifold becomes


$\rho = \lambda\omega$


On a Kähler manifold, $\rho$ is the curvature of the
canonical bundle, so $[\rho/2\pi]$ is a representative for
the cohomology class $c_1(M)$.

We see that a necessary condition for a complex
manifold (M, I) to admit a Kähler-Einstein metric is
that $c_1$ has a definite sign. We consider, in turn, the
three cases:


$c_1 < 0$
In this case, we have:


Theorem (Aubin, Yau). Let (M,I) be a compact
complex manifold with $c_1 < 0$. Then (M,I) admits
a Kähler-Einstein metric with $\lambda < 0$. The metric is
unique up to homothety.


$c_1 = 0$
This is a special case of the Calabi conjecture,
proved by Yau.


Theorem (Yau). Let M be a compact Kähler
manifold with Kähler form $\omega$. For any closed real
form $\rho$ of type (1,1) with $[\rho/2\pi] = c_1(M)$, there
exists a unique Kähler metric with Kähler form
cohomologous to $\omega$ and Ricci form equal to $\rho$.

In particular, if M is a compact Kähler manifold
with $c_1 = 0$, there exists a Ricci-flat Kähler metric
on M.


Ricci-flat Kähler metrics are called Calabi-Yau
metrics, and are exactly the metrics with holonomy
in SU(n/2). They admit two parallel spinors and are
of great interest to string theorists, because in some 
string theories spacetime is expected to be a product 
of the four-dimensional macroscopic factor with a 
compact Calabi-Yau manifold of complex dimen- 
sion 3. 

Yau’s theorem provides many examples of 
Calabi-Yau spaces. For example, we can take a 
nonsingular complex submanifold defined as a 
complete intersection by the vanishing of r poly-
nomials of degree $d_1, \ldots, d_r$ in $\mathbb{CP}^n$. Now, M has
complex dimension $n - r$ and $c_1 = 0$ if and only if
$n + 1 = \sum_i d_i$. We obtain examples of complex
dimension 2 by considering a quartic in $\mathbb{CP}^3$, the
intersection of a quadric and a cubic in $\mathbb{CP}^4$, or the
intersection of three quadrics in $\mathbb{CP}^5$; these all give
examples of K3 surfaces. A famous example of a
Calabi-Yau manifold of complex dimension 3 is
given by the quintic in $\mathbb{CP}^4$. This technique can be
extended, for example, by considering complete 
intersections in weighted projective space or con- 
structing Calabi-Yau desingularizations of singular 
spaces. 
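
The degree condition $n + 1 = \sum_i d_i$ is easy to enumerate. The sketch below lists, for a given complex dimension, all multidegrees of complete intersections in a single projective space that satisfy it. The function name is hypothetical, and requiring every degree to be at least 2 is an assumption of this sketch: a degree-1 equation is a hyperplane and only reduces the problem to a smaller projective space.

```python
from itertools import combinations_with_replacement


def calabi_yau_complete_intersections(dim):
    """List (n, degrees) with n - r = dim, all d_i >= 2, and sum(d_i) = n + 1.

    Since each d_i >= 2, the condition forces 2r <= n + 1 = dim + r + 1,
    so the number r of equations is at most dim + 1.
    """
    results = []
    for r in range(1, dim + 2):
        n = dim + r  # ambient space is CP^n
        for degrees in combinations_with_replacement(range(2, n + 2), r):
            if sum(degrees) == n + 1:
                results.append((n, degrees))
    return results


if __name__ == "__main__":
    for dim in (2, 3):
        print(dim, calabi_yau_complete_intersections(dim))
```

For complex dimension 2 this returns exactly the three K3 examples named in the text, and for complex dimension 3 it returns the quintic in $\mathbb{CP}^4$ together with four further multidegrees.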


$c_1 > 0$

This case is the most complicated and, at the time of
writing, is not yet fully understood. It is known that
not every compact manifold with $c_1 > 0$ supports a
Kähler-Einstein metric.

An early result of Matsushima was that the identity
component of the automorphism group of a Kähler-
Einstein space with $c_1 > 0$ must be reductive.
This shows, for example, that the blow-up of $\mathbb{CP}^2$ at
one or two points does not admit a Kähler-Einstein
metric, despite having $c_1 > 0$. (The one-point blow-up
does admit a Hermitian-Einstein metric due to
Page.) A second obstruction is the Futaki invariant, a
character of the Lie algebra of the automorphism
group. This character vanishes if there is a Kähler-
Einstein metric.

Both the above obstructions depend on having a 
nontrivial algebra of holomorphic automorphisms of 
M. More recently, Tian has discovered further 
obstructions (in complex dimension 3 or higher) 
which can be present even if the automorphism 
algebra is trivial. 

However, for compact complex surfaces with
$c_1 > 0$, Tian has proved that vanishing of the
Futaki invariant is sufficient. In particular, the
blow-up of $\mathbb{CP}^2$ at k points in general position,
where $3 \leq k \leq 8$, admits a Kähler-Einstein metric
(note that $c_1^2 = 9 - k$, so if $k > 8$ then $c_1$ is no longer
definite).

LeBrun-Catanese and Kotschick used these results
to give an example of a topological 4-manifold
carrying Einstein metrics of different signs. A
deformation of the Barlow surface (a surface of
general type) has $c_1 < 0$ and hence carries an
Einstein metric with $\lambda < 0$. But this space is home-
omorphic (though not diffeomorphic) to the blow-
up of $\mathbb{CP}^2$ at eight points, which carries an Einstein
metric with $\lambda > 0$. One may use this example
to construct higher-dimensional examples of diffeo- 
morphic manifolds carrying Einstein metrics of 
opposite sign. 


Hyper-Kähler Manifolds (Holonomy Sp(n/4))


These are always Ricci-flat. They have a triple
(I,J,K) of covariant constant complex structures,
satisfying the quaternionic multiplication relations
$IJ = K = -JI$, etc., and defining Kähler forms
$\omega_I, \omega_J, \omega_K$. Hyper-Kähler manifolds of dimension
n = 4N have N + 1 parallel spinors.

The most effective way of producing complete
hyper-Kähler metrics has been the hyper-Kähler
quotient construction (Hitchin et al. 1987), which
was motivated by the Marsden-Weinstein quotient
in symplectic geometry. Let G be a group acting
freely on a hyper-Kähler manifold (M, g, I, J, K)
preserving the hyper-Kähler structure. Subject to
mild assumptions, we obtain a G-equivariant
moment map $\mu : M \to \mathfrak{g}^* \otimes \mathbb{R}^3$, satisfying


$d\mu_X(Y) = (\omega_I(X,Y), \omega_J(X,Y), \omega_K(X,Y))$


Now the quotient $\mu^{-1}(0)/G$ is a hyper-Kähler
manifold of dimension $\dim M - 4 \dim G$.
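
The factor of 4 in the dimension formula comes from a simple count, sketched here under the assumption that 0 is a regular value of the moment map and that G acts freely on its zero set.

```latex
\begin{align*}
\dim \mu^{-1}(0) &= \dim M - \dim(\mathfrak{g}^* \otimes \mathbb{R}^3)
                  = \dim M - 3\dim G, \\
\dim \mu^{-1}(0)/G &= \dim \mu^{-1}(0) - \dim G
                    = \dim M - 4\dim G .
\end{align*}
```

For instance, a circle acting on flat $\mathbb{H}^2 = \mathbb{R}^8$ yields a four-dimensional quotient.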

The power of this construction comes from the
fact that even if M is just flat quaternionic space, 
one can obtain highly nontrivial quotients by 
suitable choice of group G (e.g., the asymptotically 
locally Euclidean four-dimensional examples of 
Kronheimer, which include as a subcase the multi- 
instanton metrics of Gibbons and Hawking). 

Many examples of interest in mathematical 
physics may be obtained by taking hyper-Kähler
quotients of an infinite-dimensional space of con-
nections and Higgs fields (Hitchin 1987). Examples
include moduli spaces of instantons over a hyper-
Kähler base, moduli spaces of monopoles on $\mathbb{R}^3$,
and moduli spaces of Higgs pairs over a Riemann 
surface. 

The hyper-Kähler manifolds produced so far by
the quotient construction have all been noncompact.
Examples of compact hyper-Kähler manifolds are
rarer but some are known. Beauville has produced 
examples in all dimensions as desingularizations of 
symmetric products of the basic four-dimensional 
compact examples (K3 and the 4-torus). 

Further material for this section may be found, for 
example, in Hitchin (1992) and in the chapter by the 
author on hyper-Kähler manifolds in LeBrun and
Wang (1999). 


Quaternionic Kähler Manifolds (Holonomy Sp(n/4)·Sp(1))


These are always Einstein with nonzero Einstein 
constant. Instead of globally defined parallel com- 
plex structures as in the hyper-Kähler case, we have 
a sub-bundle G of End(TM) with fiber isomorphic to 
the imaginary quaternions, parallel with respect to 
the Levi-Civita connection. Thus, we have locally 
defined almost-complex structures I, J, K, satisfying 
the quaternionic multiplication relations, such that 
covariant differentiation of one of I, J, K gives a 
linear combination of the other two. In particular, 
note that quaternionic Kähler manifolds are not 
Kähler. 

If the Einstein constant $\lambda$ is positive, the only
known complete examples are symmetric, the so-
called compact Wolf spaces, which are in one-to-one
correspondence with the compact simple Lie groups.
It is conjectured that these are the only examples
with $\lambda > 0$, and some results in this direction have
been established (e.g., it is known if dim M < 12). It
is also known that for fixed dimension, there are
only finitely many types of compact quaternionic
Kähler manifold with $\lambda > 0$.

Many orbifold examples, however, are known to 
exist, for example, via the Galicki-Lawson quater- 
nionic Kähler quotient construction. 


If $\lambda < 0$, more complete examples are known. In
addition to the noncompact duals of the Wolf 
spaces, there are homogeneous, nonsymmetric 
examples due to Alekseevski, and infinite-dimen- 
sional families of inhomogeneous examples con- 
structed via twistor methods by LeBrun (see also 
Biquard (2000)). 


Exceptional Holonomy ($G_2$ or Spin(7))


Such metrics exist in dimension 7 or 8, respectively. 
They are always Ricci-flat and admit a parallel 
spinor. Local examples were constructed by Bryant 
using Cartan-Kähler theory, and some explicit
complete noncompact examples were produced by 
Salamon and Bryant using a cohomogeneity-1 
construction. More complicated explicit noncom- 
pact examples have recently been produced by 
several authors (see Cvetic et al. (2003) for a 
survey). Compact examples were produced using 
analytical methods by Joyce, and later by Kovalev. 
Joyce starts with a flat singular metric on quotients 
of the seven- or eight-dimensional torus and con- 
structs an approximate solution to the special 
holonomy condition on a resolution of this singular 
space. Then an analytic argument is used to show 
that an exact nearby solution exists. 

For further reading, consult Joyce (2000) as well 
as the article by Joyce in LeBrun and Wang
(1999).

There are also some interesting examples of 
Einstein metrics which, although not of special 
holonomy themselves, are closely related to special 
holonomy geometries. In recent years, these have 
yielded many new examples of compact Einstein 
manifolds in the work of Boyer, Mann, Galicki, 
Kollar, Rees, Piccinni, and Nakamaye. 


Einstein-Sasaki Structures 


There are several different ways of defining these, 
but the simplest is to say that (M,g) is Einstein-
Sasaki if the cone $(\mathbb{R}^+ \times M, dt^2 + t^2 g)$ is Ricci-flat
Kähler. Also, an Einstein-Sasaki manifold has a
circle action with quotient a Kähler-Einstein orbi-
fold. Existence theorems for such orbifold metrics
have led to many examples of Einstein-Sasaki
metrics, including families on odd-dimensional 
spheres. 


3-Sasakian Structures 


Again, we can define these in terms of cones; (M, g) 
has a 3-Sasakian structure if the cone over it is
hyper-Kähler. The basic example is $S^{4n+3}$ with
associated cone $\mathbb{H}^{n+1} - \{0\}$. A 3-Sasakian manifold is
always Einstein with positive Einstein constant. 





The hyper-Kähler quotient construction induces
a 3-Sasakian quotient, and many examples of
compact 3-Sasakian manifolds have been produced
as 3-Sasakian quotients of $S^{4n+3}$. In particular, there
are examples in dimension 7 with arbitrarily large 
second Betti number, showing that one cannot, in 
general, expect compactness/finiteness results for 
Einstein moduli spaces without further assumptions. 


Homogeneous Examples 


Another strategy to study the Einstein equations is 
to reduce the difficulty of the problem by imposing 
symmetries. More precisely, we consider Einstein 
manifolds (M,g) with an isometric action of a Lie 
group G. In general, the Einstein equations with this 
symmetry will now involve r independent variables 
where r is the dimension of the stratified space 
M/G. We call r the cohomogeneity of the manifold. 

In this section, we consider the situation where 
(M, g) is homogeneous, that is, when the action of G 
is transitive so r=0. The Einstein equations now 
reduce to a system of algebraic equations. 

We may now write M=G/K, where K is the 
stabilizer of a point of M. We choose an $\mathrm{Ad}_K$-
invariant vector space complement $\mathfrak{p}$ to $\mathfrak{k}$ in $\mathfrak{g}$, and
identify $\mathfrak{p}$ with the tangent space to G/K at the
identity coset. The key point is that G-invariant
metrics on M = G/K may now be identified with
$\mathrm{Ad}_K$-invariant inner products on $\mathfrak{p}$, which may, in
turn, be studied by looking at the decomposition of
$\mathfrak{p}$ into irreducible representations of K.

In the special case when $G/K$ is isotropy irreducible
(i.e., $\mathfrak{p}$ is an irreducible representation of $K$),
the metric $g$ and its Ricci tensor are proportional
by Schur's lemma, and hence $g$ is automatically
Einstein. Isotropy-irreducible homogeneous
spaces have been classified by Kramer, Manturov,
Wolf, and Wang-Ziller.

In the general case, the Einstein equations become 
a system of polynomial equations. Determining 
whether this system has a real positive solution is, 
in general, a highly nontrivial problem. However, 
the situation of homogeneous metrics is one area in 
which the variational formulation of the Einstein 
equations has proved highly successful. 

We are now considering the scalar curvature 
functional on the finite-dimensional space of unit 
G-invariant metrics on G/K. The behavior of the 
scalar curvature functional is related to the structure 
of the lattice of intermediate subalgebras between 
the Lie algebras of K and G. 

An early result along these lines (Wang and Ziller 
1986) is that if K is maximal in G (compact), then 
G/K admits a G-invariant Einstein metric. The idea 
of the proof is to show that maximality of K forces
the scalar curvature functional on the space of 
volume-1 homogeneous metrics to be both bounded 
above and proper, and therefore to have a 
maximum. 

These ideas have been greatly extended by Böhm,
Wang, and Ziller. Given a compact connected
homogeneous space $G/K$, they define a graph
whose vertices are $\mathrm{Ad}(K)$-invariant subalgebras
strictly intermediate between $\mathfrak{k}$ and $\mathfrak{g}$. The edges
correspond to inclusions between subalgebras.
A component of the graph is called toral if all
subalgebras $\mathfrak{h}$ in this component are such that the
identity component of $H/K$ is abelian. They now
show that if the graph has at least two nontoral
components, then $G/K$ admits a $G$-invariant Einstein
metric. The Einstein metrics in the theorem are
produced by a mountain pass argument and may
have co-index 1, contrasting with the maxima of the
earlier theorem.

Further advances in this direction have recently
been made by Böhm. He associates to $G/K$ a
simplicial complex, and shows that nonzero homology
groups of the complex imply the existence of
higher co-index Einstein metrics.

One can also study homogeneous noncompact
Einstein spaces with $\lambda < 0$. It is conjectured by
Alekseevski that for all such examples $K$ is a
maximal compact subgroup of $G$. The reader is
referred to Heber (1998) for further information on
the noncompact case.

The above theorems give powerful existence
results for Einstein metrics. However, there are
known examples of homogeneous spaces $G/K$
which admit no $G$-invariant Einstein metric (Wang
and Ziller 1986). One such example is SU(4)/SU(2),
where SU(2) is a maximal subgroup of
$\mathrm{Sp}(2) \subset \mathrm{SU}(4)$.

Techniques similar to those in the homogeneous
case have been used to construct Einstein metrics on
total spaces of certain bundles, via Riemannian
submersions. Some highlights are Jensen's exotic
Einstein metrics on $(4n + 3)$-dimensional spheres,
and the Wang-Ziller metrics on total spaces of torus
bundles over products of Kähler-Einstein manifolds.
The latter construction gives examples of spaces
admitting volume-1 Einstein metrics with infinitely
many Einstein constants $\lambda$.


Examples of Higher Cohomogeneity 


One can also look for Einstein metrics of higher
cohomogeneity. Most progress has been made in the
cohomogeneity-1 case, that is, where the principal
orbit $G/K$ of the action has real codimension one in
$M$ (see Eschenburg and Wang (2000) for background
on such metrics). On the open dense set in $M$
which is the union of the principal orbits, we may
write the metric as

$$dt^2 + g_t$$

where $g_t$ is a $t$-dependent homogeneous metric on
$G/K$. The Einstein equations are now a system of
ordinary differential equations in $t$.
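
As a minimal illustration of such a reduction (a standard
special case, not taken from the article), suppose
$g_t = f(t)^2 g_N$, where $g_N$ is a fixed Einstein metric on
the $n$-dimensional principal orbit $N$ with
$\mathrm{Ric}(g_N) = (n-1)\,c\,g_N$. The symbols $f$, $g_N$,
$N$, $n$, and $c$ are assumptions introduced here. The Einstein
condition $\mathrm{Ric} = \lambda g$ then becomes a pair of
ODEs for $f$:

```latex
\begin{align*}
  \mathrm{Ric}(\partial_t,\partial_t)
    &= -\,n\,\frac{f''}{f} = \lambda, \\
  \mathrm{Ric}\big|_{TN}
    &= \Big[(n-1)\,\frac{c - (f')^2}{f^2} - \frac{f''}{f}\Big]\, f^2 g_N
     = \lambda\, f^2 g_N .
\end{align*}
% Check: f(t) = \sin t and c = 1 give f'' = -f, so both equations yield
% \lambda = n. This is the round metric on S^{n+1}, with the two special
% orbits (points) at t = 0 and t = \pi.
```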

One may also add a special orbit $G/H$ at one or
both ends of the interval over which $t$ ranges. This
will impose boundary conditions on the ODEs. For
the manifold structure to extend smoothly over the
special orbit, $H/K$ must be a sphere. Notice that if
$\lambda > 0$, then to obtain a complete metric $M$ must be
compact, so we must add two special orbits. If $\lambda < 0$
and the metric is irreducible, then a Bochner
argument tells us that $M$ is noncompact. In the
Ricci-flat case, the Cheeger-Gromoll theorem tells us that to
obtain a complete irreducible metric, we must have
exactly one special orbit, so $M$ is topologically the
total space of a vector bundle over the special orbit.
In fact, most of the known examples with $\lambda < 0$
also have a special orbit.

The system of ODEs we obtain is still highly
nonlinear and difficult to analyze in general. However,
there are certain situations in which the
equations, or a subsystem, can be solved in closed
form. If we take $G/K$ to be a principal circle bundle
over a Hermitian symmetric space, Bérard Bergery
(1982) showed that the resulting Einstein equations
are solvable. (His work was inspired by the earlier
example of Page, which corresponds to the case
when $G/K = U(2)/U(1)$, a circle bundle over $\mathbb{CP}^1$.)
In fact, Bérard Bergery's construction works in
greater generality, as we obtain the same equations
if $G/K$ is replaced by any Riemannian submersion
with circle fibers over a positive Kähler-Einstein
space. This illustrates a general principle that
systems arising as cohomogeneity-1 Einstein equations
also typically arise from certain bundle ansätze
without homogeneity assumptions.

Wang and Wang generalized this construction to
the case when the hypersurface in $M$ is a
Riemannian submersion with circle fibers over a
product of an arbitrary number of Kähler-Einstein
factors. Other solvable Einstein systems have been
studied by, for example, Wang and Dancer.

It may also be possible in certain situations to get
existence results without an explicit solution. This
observation underlies the important work of Böhm
(1998). He constructs cohomogeneity-1 Einstein
metrics on certain manifolds with dimension
between 5 and 9, including all the spheres in this
range of dimensions. The equations are not now
solved in closed form, but it is possible to get a
qualitative understanding of the flow and to show
that certain trajectories will give metrics on the
desired compact manifolds.

Böhm has also shown, in a result analogous to
the homogeneous case, that there are examples of
manifolds with a cohomogeneity-1 $G$-action which
do not support any $G$-invariant Einstein metric.

So far, not much is known about Einstein metrics
of higher cohomogeneity. An exception is the
situation of self-dual Einstein metrics in dimension
4, where the self-dual condition greatly simplifies
the resulting equations. Calderbank, Pedersen, and
Singer have achieved a good understanding of such
metrics with $T^2$ symmetry, including construction of
such metrics on Hirzebruch-Jung resolutions of
cyclic quotient singularities.


Analytical Methods 


So far there is no really general analytical method 
for proving existence of global Riemannian Einstein 
metrics (although, of course, such techniques do exist 
in more restrictive situations of special holonomy). 

Although the Einstein equations admit a variational 
formulation, this has (except for homogeneous metrics) 
not yielded general existence results. Note that the 
Wang-Ziller torus bundle examples at the end of the 
section “Homogeneous examples” show that the 
Palais-Smale condition does not hold in full generality. 

One early suggestion was to adopt a minimax 
procedure. In each conformal class [g], one looks for 
a minimizer of the volume-normalized scalar curva- 
ture. Such a minimizer always exists. One then takes 
the supremum over all conformal classes. The 
resulting supremum of the functional is called the 
Yamabe invariant Y(M) of the manifold M. If a 
maximizer g exists, and Y(M) < 0, then g is Einstein. 
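
For reference, the minimax procedure can be written out as
follows. This is the standard definition of the Yamabe
invariant; the notation $n = \dim M$, $s_g$ for the scalar
curvature, $dv_g$ for the volume form, and $\mathcal{E}$ for
the functional is assumed here rather than taken from the
article.

```latex
\[
  \mathcal{E}(g) = \frac{\int_M s_g \, dv_g}{\big(\int_M dv_g\big)^{(n-2)/n}},
  \qquad
  Y(M,[g]) = \inf_{\tilde g \in [g]} \mathcal{E}(\tilde g),
  \qquad
  Y(M) = \sup_{[g]} Y(M,[g]).
\]
% The exponent (n-2)/n makes \mathcal{E} invariant under constant rescalings of g.
```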

However, striking work of Petean shows that this
procedure must fail to produce an Einstein metric in
many cases. He proves that if $\dim M \geq 5$ and $M$ is
simply connected, then the Yamabe invariant is
non-negative. So, for such an $M$, any Einstein metric
produced will have $\lambda \geq 0$, and we know that this
puts constraints on the topology of $M$.

Another possible technique is to use the Hamilton
Ricci flow. If this converges as $t \to \infty$, the limiting
metric is Einstein. However, it seems hard in higher
dimensions to get control over the flow. In particular,
the Wang-Ziller example in the section
"Homogeneous examples" of a homogeneous space
with no invariant Einstein metric shows that the
flow may fail to converge (the Hamilton flow
preserves the property of $G$-invariance).


Graham-Lee and Biquard have used analytical 
methods to produce Einstein deformations of hyper- 
bolic space (real, complex, quaternionic, or Cayley). 
The idea is to show that a sufficiently small deforma- 
tion of the conformal infinity of hyperbolic space can 
be extended to a deformation of the hyperbolic metric. 

Recently, Anderson has shown the existence of
Einstein metrics with $\lambda < 0$ on a large class of
manifolds obtained by Dehn filling from hyperbolic
manifolds with toral ends. The strategy is to glue on
to the hyperbolic metric copies of a simple explicit
asymptotically hyperbolic metric, and to show that
the resulting metric can be perturbed to an exact
solution of the Einstein equations.


See also: Einstein Equations: Exact Solutions; Einstein 
Equations: Initial Value Formulation; Hamiltonian 
Reduction of Einstein’s Equations; Several Complex 
Variables: Compact Manifolds; Singularities of the Ricci 
Flow. 
Introduction
Notation 


Standard notation and terminology of differential
geometry and general relativity are used in this
article. All considerations are local, so that the
four-dimensional spacetime $M$ is assumed to be a smooth
manifold diffeomorphic to $\mathbb{R}^4$. It is endowed with
a metric tensor $g$ of signature (1,3) and a linear
connection defining the covariant differentiation of
tensor fields. Greek indices range from 0 to 3 and
refer to spacetime. Given a field of frames $(e_\mu)$ on $M$,
and the dual field of coframes $(\theta^\mu)$, one can write the
metric tensor as $g = g_{\mu\nu}\theta^\mu\theta^\nu$, where
$g_{\mu\nu} = g(e_\mu, e_\nu)$
and Einstein's summation convention is assumed to
hold. Tensor indices are lowered with $g_{\mu\nu}$ and raised
with its inverse $g^{\mu\nu}$. General-relativistic units are
used, so that both Newton's constant of gravitation
and the speed of light are 1. This implies $\hbar = \ell^2$,
where $\ell \approx 10^{-33}$ cm is the Planck length. Both mass
and energy are measured in centimeters.


Historical Remarks 


The Einstein-Cartan theory (ECT) of gravity is a
modification of general relativity theory (GRT),
allowing spacetime to have torsion, in addition to
curvature, and relating torsion to the density of
intrinsic angular momentum. This modification
was put forward in 1922 by Élie Cartan,
before the discovery of spin. Cartan was influenced
by the work of the Cosserat brothers (1909), who
considered, besides an (asymmetric) force stress
tensor, also a moments stress tensor in a suitably
generalized continuous medium. Work done in the
1950s by physicists (Kondo, Bilby, Kröner, and
other authors) established the role played by torsion
in the continuum theory of crystal dislocations. A
recent review (Ruggiero and Tartaglia 2003)
describes the links between ECT and the classical
theory of defects in an elastic medium.

Cartan assumed the linear connection to be metric 
and derived, from a variational principle, a set of 
gravitational field equations. He required, without 
justification, that the covariant divergence of the 
energy-momentum tensor be zero; this led to an 
algebraic constraint equation, bilinear in curvature 
and torsion, severely restricting the geometry. This
misguided requirement probably discouraged
Cartan from pursuing his theory. It is now known
that conservation laws in relativistic theories of 
gravitation follow from the Bianchi identities and, in 
the presence of torsion, the divergence of the 
energy-momentum tensor need not vanish. Torsion 
is implicit in the 1928 Einstein theory of gravitation 
with teleparallelism. For a long time, Cartan’s 
modified theory of gravity, presented in his rather 
abstruse notation, unfamiliar to physicists, did not 
attract any attention. In the late 1950s, the theory of 
gravitation with spin and torsion was independently 
rediscovered by Sciama and Kibble. The role of 
Cartan was recognized soon afterward and ECT 
became the subject of much research; see Hehl et al. 
(1976) for a review and an extensive bibliography. 
In the 1970s, it was recognized that ECT can be 
incorporated within supergravity. In fact, simple 
supergravity is equivalent to ECT with a massless, 
anticommuting Rarita-Schwinger field as the source.
Choquet-Bruhat considered a generalization of ECT
to higher dimensions and showed that the Cauchy
problem for the coupled system of Einstein-Cartan
and Dirac equations is well posed. Penrose (1982) 
has shown that torsion appears in a natural way 
when spinors are allowed to be rescaled by a 
complex conformal factor. ECT has been general- 
ized by allowing nonmetric linear connections and 
additional currents, associated with dilation and 
shear, as sources of such a “metric-affine theory of 
gravity” (Hehl et al. 1995). 


Physical Motivation 


Recall that, in special relativity theory (SRT), the
underlying Minkowski spacetime admits, as its
group of automorphisms, the full Poincaré group,
consisting of translations and Lorentz transformations.
It follows from the first Noether theorem
that classical, special-relativistic field equations,
derived from a variational principle, give rise to
conservation laws of energy-momentum and angular
momentum. Using Cartesian coordinates $(x^\mu)$,
abbreviating $\partial\varphi/\partial x^\rho$ to $\varphi_{,\rho}$, and denoting by
$t^{\mu\nu}$ and $s^{\mu\nu\rho} = -s^{\nu\mu\rho}$ the tensors of
energy-momentum and of intrinsic angular momentum (spin),
respectively, one can write the conservation laws in the form

$$t^{\mu\nu}{}_{,\nu} = 0 \qquad [1]$$

and

$$(x^\mu t^{\nu\rho} - x^\nu t^{\mu\rho} + s^{\mu\nu\rho})_{,\rho} = 0 \qquad [2]$$

In the presence of spin, the tensor $t^{\mu\nu}$ need not be
symmetric,

$$t^{\mu\nu} - t^{\nu\mu} = s^{\mu\nu\rho}{}_{,\rho}$$

Belinfante and Rosenfeld have shown that the tensor

$$T^{\mu\nu} = t^{\mu\nu} + \tfrac12 \left(s^{\nu\mu\rho} + s^{\nu\rho\mu} + s^{\mu\rho\nu}\right)_{,\rho}$$

is symmetric and its divergence vanishes.
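
The statements about $t^{\mu\nu}$ and $T^{\mu\nu}$ can be
checked directly from [1] and [2]. The following short
derivation is a sketch that assumes only the antisymmetry
$s^{\mu\nu\rho} = -s^{\nu\mu\rho}$; the abbreviation
$B^{\mu\nu\rho}$ is introduced here for convenience.

```latex
\begin{align*}
% Expand [2] using x^\mu{}_{,\rho} = \delta^\mu_\rho and [1]:
0 &= (x^\mu t^{\nu\rho} - x^\nu t^{\mu\rho} + s^{\mu\nu\rho})_{,\rho}
   = t^{\nu\mu} - t^{\mu\nu} + s^{\mu\nu\rho}{}_{,\rho}, \\
% Belinfante-Rosenfeld correction and its symmetry properties:
B^{\mu\nu\rho} &:= s^{\nu\mu\rho} + s^{\nu\rho\mu} + s^{\mu\rho\nu},
  \qquad
  \tfrac12\big(B^{\mu\nu\rho} - B^{\nu\mu\rho}\big) = -\,s^{\mu\nu\rho},
  \qquad
  B^{\mu\nu\rho} + B^{\mu\rho\nu} = 0, \\
% Symmetry and vanishing divergence of T = t + (1/2) \partial_\rho B:
T^{\mu\nu} - T^{\nu\mu}
  &= (t^{\mu\nu} - t^{\nu\mu}) - s^{\mu\nu\rho}{}_{,\rho} = 0,
  \qquad
  T^{\mu\nu}{}_{,\nu} = t^{\mu\nu}{}_{,\nu} + \tfrac12 B^{\mu\nu\rho}{}_{,\rho\nu} = 0 .
\end{align*}
```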

In quantum theory, the irreducible, unitary repre- 
sentations of the Poincaré group correspond to 
elementary systems such as stable particles; these 
representations are labeled by the mass and spin. 

In Einstein’s GRT, the spacetime M is curved; the 
Lorentz group — but not the Poincaré group — appears 
as the structure group acting on orthonormal frames 
in the tangent spaces of M. The energy-momentum 
tensor T appearing on the right-hand side of the 
Einstein equation is necessarily symmetric. In GRT 
there is no room for translations and the tensors t 
and s. 

By introducing torsion and relating it to s, Cartan 
restored the role of the Poincaré group in relativistic 
gravity: this group acts on the affine frames in the 
tangent spaces of M. Curvature and torsion are the 
surface densities of Lorentz transformations and 
translations, respectively. In a space with torsion, 
the Ricci tensor need not be symmetric so that an 
asymmetric energy-momentum tensor can appear 
on the right-hand side of the Einstein equation. 


Geometric Preliminaries 
Tensor-Valued Differential Forms 


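It is convenient to follow Cartan in describing
geometric objects as tensor-valued differential
forms. To define them, consider a homomorphism
$\sigma : GL_4(\mathbb{R}) \to GL_N(\mathbb{R})$ and an element
$A = (A^\mu{}_\nu)$ of $\mathrm{End}\,\mathbb{R}^4$, the Lie algebra of
$GL_4(\mathbb{R})$. The derived
representation of Lie algebras is given by

$$\frac{d}{dt}\,\sigma(\exp At)\Big|_{t=0} = \sigma^\nu_\mu A^\mu{}_\nu$$

If $(e_a)$ is a frame in $\mathbb{R}^N$, then
$\sigma^\nu_\mu(e_a) = \sigma^{b\nu}_{a\mu} e_b$, where
$a, b = 1, \ldots, N$.
A map $a = (a^\mu{}_\nu) : M \to GL_4(\mathbb{R})$ transforms fields
of frames so that

$$e'_\mu = e_\nu a^\nu{}_\mu \quad\text{and}\quad \theta^\mu = a^\mu{}_\nu \theta'^\nu \qquad [3]$$

A differential form $\varphi$ on $M$, with values in $\mathbb{R}^N$, is said
to be of type $\sigma$ if, under changes of frames, it
transforms so that $\varphi' = \sigma(a^{-1})\varphi$. For example,
$\theta = (\theta^\mu)$ is a 1-form of type id. If now
$A = (A^\mu{}_\nu) : M \to \mathrm{End}\,\mathbb{R}^4$,
then one puts $a(t) = \exp tA : M \to GL_4(\mathbb{R})$ and
defines the variations induced by an infinitesimal
change of frames,

$$\delta\theta = \frac{d}{dt}\big(a(t)^{-1}\theta\big)\Big|_{t=0} = -A\theta, \qquad
\delta\varphi = \frac{d}{dt}\big(\sigma(a(t)^{-1})\varphi\big)\Big|_{t=0} = -\sigma^\nu_\mu A^\mu{}_\nu \varphi \qquad [4]$$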


Hodge Duals 


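Since $M$ is diffeomorphic to $\mathbb{R}^4$, one can choose an
orientation on $M$ and restrict the frames to agree
with that orientation so that only transformations
with values in $GL^+_4(\mathbb{R})$ are allowed. The metric then
defines the Hodge dual of differential forms. Put
$\theta_\mu = g_{\mu\nu}\theta^\nu$. The forms $\eta$, $\eta_\mu$,
$\eta_{\mu\nu}$, $\eta_{\mu\nu\rho}$, and $\eta_{\mu\nu\rho\sigma}$ are
defined to be the duals of 1, $\theta_\mu$, $\theta_\mu \wedge \theta_\nu$,
$\theta_\mu \wedge \theta_\nu \wedge \theta_\rho$,
and $\theta_\mu \wedge \theta_\nu \wedge \theta_\rho \wedge \theta_\sigma$,
respectively. The 4-form $\eta$ is
the volume element; for a holonomic coframe
$\theta^\mu = dx^\mu$, it is given by
$\sqrt{-\det(g_{\mu\nu})}\, dx^0 \wedge dx^1 \wedge dx^2 \wedge dx^3$.
In SRT, in Cartesian coordinates, one
can define the tensor-valued 3-forms

$$t^\mu = t^{\mu\nu}\eta_\nu \quad\text{and}\quad s^{\mu\nu} = s^{\mu\nu\rho}\eta_\rho \qquad [5]$$

so that eqns [1] and [2] become

$$dt^\mu = 0 \quad\text{and}\quad dj^{\mu\nu} = 0$$

where

$$j^{\mu\nu} = x^\mu t^\nu - x^\nu t^\mu + s^{\mu\nu} \qquad [6]$$

For an isolated system, the 3-forms $t^\mu$ and $j^{\mu\nu}$,
integrated over the 3-space $x^0 = \text{const.}$, give the
system's total energy-momentum vector and angular
momentum bivector, respectively.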
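
The passage from [1] and [2] to closed 3-forms rests on one
identity for the dual basis. The following lines sketch it in
Cartesian coordinates, where the components of $\eta_\nu$ are
constant (an assumption valid only in SRT):

```latex
\begin{align*}
  dx^\rho \wedge \eta_\nu &= \delta^\rho_\nu\, \eta, \qquad d\eta_\nu = 0, \\
  dt^\mu &= d(t^{\mu\nu}\eta_\nu)
          = t^{\mu\nu}{}_{,\rho}\, dx^\rho \wedge \eta_\nu
          = t^{\mu\nu}{}_{,\nu}\, \eta, \\
  dj^{\mu\nu} &= d\big[(x^\mu t^{\nu\rho} - x^\nu t^{\mu\rho} + s^{\mu\nu\rho})\,\eta_\rho\big]
              = (x^\mu t^{\nu\rho} - x^\nu t^{\mu\rho} + s^{\mu\nu\rho})_{,\rho}\, \eta .
\end{align*}
```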


Linear Connection, Its Curvature and Torsion 


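A linear connection on $M$ is represented, with
respect to the field of frames, by the field of 1-forms

$$\omega^\mu{}_\nu = \Gamma^\mu_{\rho\nu}\theta^\rho$$

so that the covariant derivative of $e_\nu$ in the direction
of $e_\mu$ is $\nabla_\mu e_\nu = \Gamma^\rho_{\mu\nu} e_\rho$. Under a change
of frames [3], the connection forms transform as follows:

$$a^\mu{}_\rho\, \omega'^\rho{}_\nu = \omega^\mu{}_\rho\, a^\rho{}_\nu + da^\mu{}_\nu$$

If $\varphi = \varphi^a e_a$ is a $k$-form of type $\sigma$, then its covariant
exterior derivative

$$D\varphi^a = d\varphi^a + \sigma^{a\nu}_{b\mu}\, \omega^\mu{}_\nu \wedge \varphi^b$$

is a $(k + 1)$-form of the same type. For a 0-form one
has $D\varphi^a = \theta^\mu \nabla_\mu \varphi^a$. The infinitesimal change of $\omega$,
defined similarly as in [4], is $\delta\omega^\mu{}_\nu = DA^\mu{}_\nu$. The 2-form
of curvature $\Omega = (\Omega^\mu{}_\nu)$, where

$$\Omega^\mu{}_\nu = d\omega^\mu{}_\nu + \omega^\mu{}_\rho \wedge \omega^\rho{}_\nu$$

is of type ad: it transforms with the adjoint
representation of $GL_4(\mathbb{R})$ in $\mathrm{End}\,\mathbb{R}^4$. The 2-form of
torsion $\Theta = (\Theta^\mu)$, where

$$\Theta^\mu = d\theta^\mu + \omega^\mu{}_\nu \wedge \theta^\nu$$

is of type id. These forms satisfy the Bianchi
identities

$$D\Omega^\mu{}_\nu = 0 \quad\text{and}\quad D\Theta^\mu = \Omega^\mu{}_\nu \wedge \theta^\nu$$

For a differential form $\varphi$ of type $\sigma$, the following
identity holds:

$$D^2\varphi^a = \sigma^{a\nu}_{b\mu}\, \Omega^\mu{}_\nu \wedge \varphi^b \qquad [7]$$

The tensors of curvature and torsion are given by

$$\Omega^\mu{}_\nu = \tfrac12 R^\mu{}_{\nu\rho\sigma}\, \theta^\rho \wedge \theta^\sigma$$

and

$$\Theta^\mu = \tfrac12 Q^\mu{}_{\rho\sigma}\, \theta^\rho \wedge \theta^\sigma$$

respectively. With respect to a holonomic frame,
$d\theta^\mu = 0$, one has

$$Q^\mu{}_{\rho\sigma} = \Gamma^\mu_{\rho\sigma} - \Gamma^\mu_{\sigma\rho}$$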

In SRT, the Cartesian coordinates define a radius-vector
field $X^\mu = -x^\mu$, pointing towards the origin of the
coordinate system. The differential equation it satisfies
generalizes to a manifold with a linear connection:

$$DX^\mu + \theta^\mu = 0 \qquad [8]$$

By virtue of [7], the integrability condition of [8] is

$$\Omega^\mu{}_\nu X^\nu + \Theta^\mu = 0$$

Integration of [8] along a curve defines the Cartan
displacement of $X$; if this is done along a small
closed circuit spanned by the bivector $\Delta f$, then the
radius vector changes by about

$$\Delta X^\mu = \tfrac12 \left(R^\mu{}_{\nu\rho\sigma} X^\nu + Q^\mu{}_{\rho\sigma}\right) \Delta f^{\rho\sigma}$$

This holonomy theorem (rather imprecisely formulated
here) shows that torsion bears to translations
a relation similar to that of curvature to linear
homogeneous transformations.

In a space with torsion, it matters whether one
considers the potential of the electromagnetic field to be
a scalar-valued 1-form $\varphi$ or a covector-valued 0-form
$(\varphi_\mu)$. The first choice leads to a field $d\varphi$ that is invariant
with respect to the gauge transformation $\varphi \mapsto \varphi + d\chi$.
The second gives
$\tfrac12(\nabla_\mu \varphi_\nu - \nabla_\nu \varphi_\mu)\, \theta^\mu \wedge \theta^\nu
= (D\varphi_\mu) \wedge \theta^\mu = d\varphi - \varphi_\mu \Theta^\mu$,
a gauge-dependent field.
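
The relation between the two field strengths follows from the
Leibniz rule for $D$ and the definition of torsion. A short
derivation, identifying $\varphi = \varphi_\mu\theta^\mu$ (an
identification made here only to compare the two choices):

```latex
\begin{align*}
  d\varphi = D(\varphi_\mu \theta^\mu)
    &= (D\varphi_\mu)\wedge\theta^\mu + \varphi_\mu\, D\theta^\mu
     = (D\varphi_\mu)\wedge\theta^\mu + \varphi_\mu\, \Theta^\mu, \\
  (D\varphi_\mu)\wedge\theta^\mu
    &= \nabla_\nu\varphi_\mu\, \theta^\nu\wedge\theta^\mu
     = \tfrac12\,(\nabla_\mu\varphi_\nu - \nabla_\nu\varphi_\mu)\,\theta^\mu\wedge\theta^\nu .
\end{align*}
% The extra term \varphi_\mu \Theta^\mu is what spoils gauge invariance when torsion is present.
```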


Metric-Affine Geometry 


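A metric-affine space $(M, g, \omega)$ is defined to have a
metric and a linear connection that need not depend on
each other. The metric alone determines the torsion-free
Levi-Civita connection $\mathring{\omega}$ characterized by

$$d\theta^\mu + \mathring{\omega}^\mu{}_\nu \wedge \theta^\nu = 0 \quad\text{and}\quad \mathring{D} g_{\mu\nu} = 0$$

Its curvature is

$$\mathring{\Omega}^\mu{}_\nu = d\mathring{\omega}^\mu{}_\nu + \mathring{\omega}^\mu{}_\rho \wedge \mathring{\omega}^\rho{}_\nu$$

The 1-form of type ad,

$$\kappa^\mu{}_\nu = \omega^\mu{}_\nu - \mathring{\omega}^\mu{}_\nu \qquad [9]$$

determines the torsion of $\omega$ and the covariant
derivative of $g$,

$$\Theta^\mu = \kappa^\mu{}_\nu \wedge \theta^\nu, \qquad Dg_{\mu\nu} = -\kappa_{\mu\nu} - \kappa_{\nu\mu}$$

The curvature of $\omega$ can be written as

$$\Omega^\mu{}_\nu = \mathring{\Omega}^\mu{}_\nu + \mathring{D}\kappa^\mu{}_\nu + \kappa^\mu{}_\rho \wedge \kappa^\rho{}_\nu \qquad [10]$$

The transposed connection $\tilde{\omega}$ is defined by

$$\tilde{\omega}^\mu{}_\nu = \omega^\mu{}_\nu + Q^\mu{}_{\nu\rho}\, \theta^\rho$$

so that, with respect to a holonomic frame, one has
$\tilde{\Gamma}^\mu_{\nu\rho} = \Gamma^\mu_{\rho\nu}$. The torsion of
$\tilde{\omega}$ is opposite to that of $\omega$.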


Riemann-Cartan Geometry 


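A Riemann-Cartan space is a metric-affine space
with a connection that is metric,

$$Dg_{\mu\nu} = 0 \qquad [11]$$

The metricity condition implies that
$\kappa_{\mu\nu} + \kappa_{\nu\mu} = 0$
and $\Omega_{\mu\nu} + \Omega_{\nu\mu} = 0$. In a Riemann-Cartan space, the
connection is determined by its torsion $Q$ and the
metric tensor. Let $Q_{\rho\mu\nu} = g_{\rho\sigma} Q^\sigma{}_{\mu\nu}$; then

$$\kappa_{\mu\nu} = \tfrac12 \left(Q_{\mu\sigma\nu} + Q_{\sigma\mu\nu} + Q_{\nu\mu\sigma}\right) \theta^\sigma \qquad [12]$$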
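
One can confirm that [12] has the two required properties:
antisymmetry in its tensor indices and the correct torsion.
The component notation
$\kappa_{\mu\nu} = \kappa_{\mu\nu\sigma}\theta^\sigma$ is
introduced here only for this check.

```latex
\begin{align*}
  \kappa_{\mu\nu\sigma}
     &= \tfrac12\,(Q_{\mu\sigma\nu} + Q_{\sigma\mu\nu} + Q_{\nu\mu\sigma}),
     \qquad Q_{\rho\mu\nu} = -\,Q_{\rho\nu\mu}, \\
  \kappa_{\mu\nu\sigma} + \kappa_{\nu\mu\sigma} &= 0, \\
  \Theta_\mu = \kappa_{\mu\nu}\wedge\theta^\nu
     &= \tfrac12\,(\kappa_{\mu\nu\sigma} - \kappa_{\mu\sigma\nu})\,\theta^\sigma\wedge\theta^\nu
      = \tfrac12\, Q_{\mu\sigma\nu}\,\theta^\sigma\wedge\theta^\nu .
\end{align*}
```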


The transposed connection of a Riemann-Cartan
space is metric if and only if the tensor $Q_{\rho\mu\nu}$ is
completely antisymmetric. Let $\tilde{\nabla}$ denote the
covariant derivative with respect to $\tilde{\omega}$. By definition,
a symmetry of a Riemann-Cartan space is a
diffeomorphism of $M$ preserving both $g$ and $\omega$. The
one-parameter group of local transformations of $M$,
generated by the vector field $v$, consists of symmetries
of $(M, g, \omega)$ if and only if

$$\tilde{\nabla}_\mu v_\nu + \tilde{\nabla}_\nu v_\mu = 0 \qquad [13]$$

and

$$D\tilde{\nabla}_\nu v^\mu + R^\mu{}_{\nu\rho\sigma}\, v^\rho \theta^\sigma = 0 \qquad [14]$$

In a Riemannian space, the connections $\omega$ and $\tilde{\omega}$
coincide and [14] is a consequence of the Killing
equation [13]. The metricity condition implies

$$D\eta_{\mu\nu\rho} = \eta_{\mu\nu\rho\sigma}\, \Theta^\sigma \qquad [15]$$


The Einstein-Cartan Theory of 
Gravitation 


An Identity Resulting from Local Invariance 


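Let $(M, g, \omega)$ be a metric-affine spacetime. Consider a
Lagrangian $L$ which is an invariant 4-form on $M$; it
depends on $g$, $\theta$, $\omega$, $\varphi$, and the first derivatives of
$\varphi = \varphi^a e_a$. The general variation of the Lagrangian is

$$\delta L = L_a \wedge \delta\varphi^a + \tfrac12 \tau^{\mu\nu} \delta g_{\mu\nu} + \delta\theta^\mu \wedge t_\mu - \tfrac12 \delta\omega^\mu{}_\nu \wedge s^\nu{}_\mu + \text{an exact form} \qquad [16]$$

so that $L_a = 0$ is the Euler-Lagrange equation for $\varphi$.
If the changes of the functions $g$, $\theta$, $\omega$, and $\varphi$ are
induced by an infinitesimal change of the frames [4],
then $\delta L = 0$ and [16] gives the identity

$$g_{\mu\rho} \tau^{\rho\nu} - \theta^\nu \wedge t_\mu + \tfrac12 D s^\nu{}_\mu - \sigma^{a\nu}_{b\mu}\, L_a \wedge \varphi^b = 0$$

It follows from the identity that the two sets of
Euler-Lagrange equations obtained by varying $L$
with respect to the triples $(\varphi, \theta, \omega)$ and $(\varphi, g, \omega)$ are
equivalent. In the sequel, the first triple is chosen to
derive the field equations.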


Projective Transformations and the Metricity 
Condition 


Still under the assumption that $(M, g, \omega)$ is a
metric-affine spacetime, consider the 4-form

$$8\pi K = \tfrac12 g^{\nu\rho}\, \eta_{\mu\rho} \wedge \Omega^\mu{}_\nu \qquad [17]$$

which is equal to $\tfrac12 \eta R$, where $R = g^{\mu\nu} R_{\mu\nu}$ is the Ricci
scalar; the Ricci tensor $R_{\mu\nu} = R^\rho{}_{\mu\rho\nu}$ is, in general,
asymmetric. The form [17] is invariant with
respect to projective transformations of the
connection,

$$\omega^\mu{}_\nu \mapsto \omega^\mu{}_\nu + \delta^\mu_\nu \lambda \qquad [18]$$

where $\lambda$ is an arbitrary 1-form. Projectively related
connections have the same (unparametrized) geodesics.
If the total Lagrangian for gravitation interacting
with the matter field $\varphi$ is $K + L$, then the field
equations, obtained by varying it with respect to
$\varphi$, $\theta$, and $\omega$ are: $L_a = 0$,

$$\tfrac12 g^{\nu\rho}\, \eta_{\mu\rho\sigma} \wedge \Omega^\mu{}_\nu = -8\pi t_\sigma \qquad [19]$$

and

$$D(g^{\mu\rho} \eta_{\rho\nu}) = 8\pi s^\mu{}_\nu \qquad [20]$$

respectively. Put $s_{\mu\nu} = g_{\mu\rho} s^\rho{}_\nu$. If

$$s_{\mu\nu} + s_{\nu\mu} = 0 \qquad [21]$$

then $s^\nu{}_\nu = 0$ and $L$ is also invariant with respect to
[18]. One shows that, if [21] holds, then, among the
projectively related connections satisfying [20], there
is precisely one that is metric. To implement
properly the metricity condition in the variational
principle, one can use the Palatini approach with
constraints (Kopczyński 1975). Alternatively, following
Hehl, one can use [9] and [12] to eliminate $\omega$
and obtain a Lagrangian depending on $\varphi$, $\theta$, and the
tensor of torsion.


The Sciama-Kibble Field Equations 


From now on the metricity condition [11] is
assumed, so that [21] holds and the Cartan field
equation [20] is

$$\eta_{\mu\nu\rho} \wedge \Theta^\rho = 8\pi s_{\mu\nu} \qquad [22]$$

Introducing the asymmetric energy-momentum tensor
$t_{\mu\nu}$ and the spin density tensor
$s_{\mu\nu\rho} = g_{\rho\sigma} s_{\mu\nu}{}^\sigma$
similarly as in [5], one can write the Einstein-Cartan
equations [19] and [22] in the form given by Sciama
and Kibble,

$$R_{\mu\nu} - \tfrac12 g_{\mu\nu} R = 8\pi t_{\mu\nu} \qquad [23]$$

$$Q^\rho{}_{\mu\nu} + \delta^\rho_\mu Q^\sigma{}_{\nu\sigma} - \delta^\rho_\nu Q^\sigma{}_{\mu\sigma} = 8\pi s_{\mu\nu}{}^\rho \qquad [24]$$

Equation [24] can be solved to give

$$Q^\rho{}_{\mu\nu} = 8\pi \left(s_{\mu\nu}{}^\rho + \tfrac12 \delta^\rho_\mu s_{\nu\sigma}{}^\sigma + \tfrac12 \delta^\rho_\nu s_{\sigma\mu}{}^\sigma\right) \qquad [25]$$

Therefore, torsion vanishes in the absence of spin
and then [23] is the classical Einstein field
equation. In particular, there is no difference
between the Einstein and Einstein-Cartan theories
in empty space. Since practically all tests of
relativistic gravity are based on consideration of
Einstein's equations in empty space, there is no
difference, in this respect, between the Einstein
and the Einstein-Cartan theories: the latter is as
viable as the former.
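
That [25] indeed solves [24] can be verified by taking a
trace. The abbreviation $S_\nu = s_{\nu\sigma}{}^\sigma$ is
introduced here only for this check, and
$\delta^\sigma_\sigma = 4$ is used.

```latex
\begin{align*}
  Q^\rho{}_{\mu\nu}
     &= 8\pi\big(s_{\mu\nu}{}^\rho + \tfrac12\,\delta^\rho_\mu S_\nu
                 - \tfrac12\,\delta^\rho_\nu S_\mu\big), \\
  Q^\sigma{}_{\nu\sigma}
     &= 8\pi\big(S_\nu + \tfrac12 S_\nu - 2 S_\nu\big) = -4\pi S_\nu, \\
  Q^\rho{}_{\mu\nu} + \delta^\rho_\mu Q^\sigma{}_{\nu\sigma}
     - \delta^\rho_\nu Q^\sigma{}_{\mu\sigma}
     &= 8\pi\, s_{\mu\nu}{}^\rho .
\end{align*}
% The relation is algebraic: torsion is nonzero only where the spin density is nonzero.
```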

In any case, the consideration of torsion amounts 
to a slight change of the energy-momentum tensor 
that can be also obtained by the introduction of a new 
term in the Lagrangian. This observation was made in 
1950 by Weyl in the context of the Dirac equation. 

In Einstein’s theory, one can also satisfactorily 
describe spinning matter without introducing tor- 
sion (Bailey and Israel 1975). 


Consequences of the Bianchi Identities: 
Conservation Laws 


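Computing the covariant exterior derivatives of
both sides of the Einstein-Cartan equations, using
[15] and the Bianchi identities, one obtains, with
$\Omega^{\rho\sigma} = g^{\sigma\nu} \Omega^\rho{}_\nu$,

$$8\pi\, Dt_\mu = -\tfrac12 \eta_{\mu\nu\rho\sigma}\, \Theta^\nu \wedge \Omega^{\rho\sigma} \qquad [26]$$

and

$$8\pi\, Ds_{\mu\nu} = \eta_{\nu\sigma} \wedge \Omega^\sigma{}_\mu - \eta_{\mu\sigma} \wedge \Omega^\sigma{}_\nu \qquad [27]$$

Cartan required the right-hand side of [26] to
vanish. If, instead, one uses the field equations [19]
and [22] to evaluate the right-hand sides of [26] and
[27], one obtains

$$Dt_\mu = Q^\rho{}_{\mu\nu}\, \theta^\nu \wedge t_\rho - \tfrac12 R^{\rho\sigma}{}_{\mu\nu}\, \theta^\nu \wedge s_{\rho\sigma} \qquad [28]$$

and

$$Ds_{\mu\nu} = \theta_\nu \wedge t_\mu - \theta_\mu \wedge t_\nu \qquad [29]$$

Let $v$ be a vector field generating a group of
symmetries of the Riemann-Cartan space $(M, g, \omega)$
so that eqns [13] and [14] hold. Equations [28] and
[29] then imply that the 3-form

$$j = v^\mu t_\mu + \tfrac12 \tilde{\nabla}^\nu v^\mu\, s_{\nu\mu}$$

is closed, $dj = 0$. In particular, in the limit of SRT, in
Cartesian coordinates $x^\mu$, to a constant vector field $v$
there corresponds the projection, onto $v$, of the
energy-momentum density. If $A_{\mu\nu}$ is a constant
bivector, then $v^\mu = A^\mu{}_\nu x^\nu$ gives, up to a constant
numerical factor, $j = j^{\mu\nu} A_{\mu\nu}$, where $j^{\mu\nu}$
is as in [6].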


Spinning Fluid and the Generalized Mathisson- 
Papapetrou Equation of Motion 


As in classical general relativity, the right-hand sides
of the Einstein-Cartan equations need not necessarily
be derived from a variational principle; they
may be determined by phenomenological
considerations. For example, following Weyssenhoff,
consider a spinning fluid characterized by

$$t^{\mu\nu} = P^\mu u^\nu \quad\text{and}\quad s^{\mu\nu\rho} = S^{\mu\nu} u^\rho$$

where $S^{\mu\nu} + S^{\nu\mu} = 0$ and $u$ is the unit, timelike
velocity field. Let $U = u^\mu \eta_\mu$, so that

$$t^\mu = P^\mu U \quad\text{and}\quad s^{\mu\nu} = S^{\mu\nu} U$$

Define the particle derivative of a tensor field $\varphi^a$ in
the direction of $u$ by

$$\dot{\varphi}^a \eta = D(\varphi^a U)$$

For a scalar field $\varphi$, the equation $\dot{\varphi} = 0$ is equivalent
to the conservation law $d(\varphi U) = 0$. Define
$\rho = g_{\mu\nu} P^\mu u^\nu$; then [29] gives an equation of motion
of spin

$$\dot{S}_{\mu\nu} = u_\nu P_\mu - u_\mu P_\nu$$

so that

$$P_\mu = \rho u_\mu + \dot{S}_{\mu\nu} u^\nu$$
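
The last relation is obtained by contracting the equation of
motion of spin with $u^\nu$ and using the normalization
$u_\nu u^\nu = 1$ appropriate to signature (1,3):

```latex
\[
  \dot S_{\mu\nu}\, u^\nu
    = (u_\nu u^\nu)\, P_\mu - u_\mu\,(P_\nu u^\nu)
    = P_\mu - \rho\, u_\mu .
\]
% So the momentum density P_\mu is parallel to u_\mu only when \dot S_{\mu\nu} u^\nu = 0.
```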


From [28] one obtains the equation of translatory
motion,

$$\dot{P}_\mu = \left(Q^\rho{}_{\mu\nu} P_\rho - \tfrac12 R^{\rho\sigma}{}_{\mu\nu} S_{\rho\sigma}\right) u^\nu$$

which is a generalization to the ECT of the
Mathisson-Papapetrou equation for point particles
with an intrinsic angular momentum.


From ECT to GRT: The Effective 
Energy-Momentum Tensor 


Inside spinning matter, one can use [12] and [25] to
eliminate torsion and replace the Sciama-Kibble
system by a single Einstein equation with an
effective energy-momentum tensor on the right-hand
side. Using the split [10], one can write [23] as

$$\mathring{R}_{\mu\nu} - \tfrac12 g_{\mu\nu} \mathring{R} = 8\pi T^{\mathrm{eff}}_{\mu\nu} \qquad [30]$$

Here $\mathring{R}_{\mu\nu}$ and $\mathring{R}$ are, respectively, the Ricci tensor
and scalar formed from $g$. The term in [10] that is
quadratic in $\kappa$ contributes to $T^{\mathrm{eff}}$ an expression
quadratic in the components of the tensor $s_{\mu\nu\rho}$ so
that, neglecting indices, one can symbolically write

$$T^{\mathrm{eff}} = T + s^2 \qquad [31]$$

The symmetric tensor $T$ is the sum of $t$ and a term
coming from $\mathring{D}\kappa^\mu{}_\nu$ in [10]:

$$T^{\mu\nu} = t^{\mu\nu} + \tfrac12 \mathring{\nabla}_\rho \left(s^{\nu\mu\rho} + s^{\nu\rho\mu} + s^{\mu\rho\nu}\right) \qquad [32]$$

It is remarkable that the Belinfante-Rosenfeld
symmetrization of the canonical energy-momentum
tensor appears as a natural consequence of ECT.
From the physical point of view, the second term on
the right-hand side of [31] can be thought of as
providing a spin-spin contact interaction, reminiscent
of the one appearing in the Fermi theory of
weak interactions.

It is clear from eqns [30]-[32] that whenever
terms quadratic in spin can be neglected (in
particular, in the linear approximation) ECT is
equivalent to GRT. To obtain essentially new
effects, the density of spin squared should be
comparable to the density of mass. For example, to
achieve this, a nucleon of mass $m$ should be
squeezed so that its radius $r_{\mathrm{Cart}}$ be such that

$$\left(\frac{\ell^2}{r_{\mathrm{Cart}}^3}\right)^2 \approx \frac{m}{r_{\mathrm{Cart}}^3}$$

Introducing the Compton wavelength
$r_{\mathrm{Compt}} = \ell^2/m \approx 10^{-13}$ cm, one can write

$$r_{\mathrm{Cart}} \approx \left(\ell^2\, r_{\mathrm{Compt}}\right)^{1/3}$$

The "Cartan radius" of the nucleon,
$r_{\mathrm{Cart}} \approx 10^{-26}$ cm, so small when compared to its physical
radius under normal conditions, is much larger than
the Planck length. Curiously enough, the energy
$\ell^2/r_{\mathrm{Cart}}$ is of the order of the energy at which,
according to some estimates, the grand unification
of interactions is presumed to occur.


Cosmology with Spin and Torsion 


In the presence of spinning matter, $T^{\mathrm{eff}}$ need not
satisfy the positive-energy conditions, even if $T$ does.
Therefore, the classical singularity theorems of
Penrose and Hawking can be overcome here.
In ECT, there are simple cosmological solutions
without singularities. The simplest such solution,
found in 1973 by Kopczyński, is as follows. Consider
a universe filled with a spinning dust such that
$P^\mu = \rho u^\mu$, $u^\mu = \delta^\mu_0$, $S_{23} = \sigma$, and
$S_{\mu\nu} = 0$ for $\mu + \nu \neq 5$,
and both $\rho$ and $\sigma$ are functions of $t = x^0$ alone.
These assumptions are compatible with the
Robertson-Walker line element
$dt^2 - R(t)^2 (dx^2 + dy^2 + dz^2)$, where
$(x, y, z) = (x^1, x^2, x^3)$ and torsion is
determined from [25]. The Einstein equation [23]
reduces to the modified Friedmann equation,

$$\tfrac12 \dot{R}^2 - M R^{-1} + \tfrac32 S^2 R^{-4} = 0 \qquad [33]$$

supplemented by the conservation laws of mass
and spin,

$$M = \tfrac43 \pi \rho R^3 = \text{const.}, \qquad S = \tfrac43 \pi \sigma R^3 = \text{const.}$$

The last term on the left-hand side of [33] plays the
role of a repulsive potential, effective at small values of
$R$; it prevents the solution from vanishing. It should be
noted, however, that even a very small amount of
shear in $u$ results in a term counteracting the repulsive
potential due to spin. Neglecting shear and making the
(unrealistic) assumption that matter in the universe at
$t = 0$ consists of about $10^{80}$ nucleons of mass $m$ with
aligned spins, one obtains the estimate $R(0) \approx 1$ cm
and a density of the order of $m^2/\ell^4$, very large, but
much smaller than the Planck density $1/\ell^2$.
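
The order-of-magnitude estimate above can be reproduced
numerically. The sketch below sets $\dot{R} = 0$ in [33] to find
the minimum radius, $R^3 = 3S^2/(2M)$. The input values are
the rounded figures quoted in this article; the names
(`bounce_radius`, `NUCLEON_COUNT`, and so on) and the
identification of $S$ with the total spin of $N$ aligned
nucleons, each of spin $\hbar/2 = \ell^2/2$, are assumptions
made here for illustration.

```python
import math

# Rounded inputs in geometric units (G = c = 1, lengths in cm).
PLANCK_LENGTH = 1e-33    # l, as quoted in the Notation section
COMPTON_LENGTH = 1e-13   # nucleon Compton wavelength, l^2 / m
NUCLEON_COUNT = 1e80     # assumed number of nucleons in the universe


def nucleon_mass(planck_length=PLANCK_LENGTH, compton_length=COMPTON_LENGTH):
    """Nucleon mass in cm, from r_Compt = l^2 / m."""
    return planck_length**2 / compton_length


def bounce_radius(mass_const, spin_const):
    """Radius at which dR/dt = 0 in (1/2)R'^2 - M/R + (3/2)S^2/R^4 = 0."""
    return (1.5 * spin_const**2 / mass_const) ** (1.0 / 3.0)


m = nucleon_mass()
M = NUCLEON_COUNT * m                        # conserved mass constant
S = NUCLEON_COUNT * 0.5 * PLANCK_LENGTH**2   # assumption: spin hbar/2 each, all aligned
R_min = bounce_radius(M, S)

# Mean density inside the sphere of radius R_min, to compare with m^2 / l^4.
density_at_bounce = M / (4.0 / 3.0 * math.pi * R_min**3)
reference_density = m**2 / PLANCK_LENGTH**4
planck_density = 1.0 / PLANCK_LENGTH**2

print(f"R_min = {R_min:.2f} cm")
print(f"density / (m^2/l^4) = {density_at_bounce / reference_density:.2f}")
print(f"density / Planck density = {density_at_bounce / planck_density:.1e}")
```

With these inputs the minimum radius comes out at roughly
1.5 cm, and the density is of order $m^2/\ell^4$, some forty
orders of magnitude below the Planck density.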

Tafel (1975) found large classes of cosmological 
solutions with a spinning fluid, admitting a group of 
symmetries transitive on the hypersurfaces of constant 
time. The models corresponding to symmetries of
Bianchi types I, VII$_0$, and V are nonsingular, provided
that the influence of spin exceeds that of shear.


Summary 


ECT is a viable theory of gravitation that differs 
very slightly from the Einstein theory; the effects of 
spin and torsion can be significant only at densities 
of matter that are very high, but nevertheless much 
smaller than the Planck density at which quantum 
gravitational effects are believed to dominate. It is 
possible that ECT will prove to be a better classical 
limit of a future quantum theory of gravitation than 
the theory without torsion. 


See also: Cosmology: Mathematical Aspects; General 
Relativity: Overview. 
Introduction


Newton’s theory of gravity with absolute time and 
Euclidean 3-space connects the gravitational poten- 
tial U with its source, the density of matter r, by the 
Poisson equation 


$$\Delta U = -4\pi\kappa r$$


where $\Delta$ is the Laplace operator and $\kappa$ is the
gravitational constant. The trajectories of massive
test particles are the flow lines of the gradient of U. 
Newton’s theory has proven to be very accurate in
the laboratory as well as in the solar system (except for
a small discrepancy with the observed value of the
perihelion advance of Mercury). Newton’s theory, together with
special relativity, the equivalence principle, and the ideas
of Mach, inspired Einstein to
uncover the equations which must be satisfied by the 
geometry of spacetime. They link the curvature of the 
spacetime metric with a phenomenological symmetric 
2-tensor T, which must represent the energy, momen- 
tum, and stresses of all the sources, by the equality: 


$$S(g) \equiv \mathrm{Ricci}(g) - \tfrac{1}{2}\, g\, R(g) = 8\pi\kappa T$$


where Ricci(g) is the Ricci tensor of the spacetime 
metric g and R(g) its scalar curvature. The sym- 
metric 2-tensor S(g) is called the Einstein tensor. The 
Bianchi identities, due to the invariance of curvature
under isometries of g, imply that the divergence of the
Einstein tensor is identically zero: the Einstein 
equations imply therefore the vanishing of the 
divergence of the source tensor T. The equations so 
obtained generalize in a relativistic context the 
conservation laws of Newtonian mechanics. In 
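local spacetime coordinates $x^\alpha$, the Einstein equations
and conservation laws read

$$S_{\alpha\beta} \equiv R_{\alpha\beta} - \tfrac{1}{2} g_{\alpha\beta} R = 8\pi\kappa T_{\alpha\beta}, \qquad \nabla_\alpha T^{\alpha\beta} = 0$$

where $\nabla$ denotes the covariant derivative in the
metric g.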

The gravitational constant $\kappa$ is inspired by the
Newtonian equation relating the potential U with
the density of matter. This equation can be obtained
as an approximation of Einstein’s equations with
matter in the case of low velocities of matter and
weak gravitational fields. Newton’s equation of
motion for test particles is likewise an approximation of
Einstein’s geodesic motion of such particles, which
can be deduced from Einstein’s equations themselves.
However, if one wants to remain within the
framework of general relativity, it is
Einstein’s equations which define the mass of a
body; no comparison is possible with some
fixed given mass. Just as length already had the dimension
of time in special relativity, mass is now found
to have the dimension of length. We write the equations
in geometrical units, where $8\pi\kappa = 1$, keeping in mind
the corresponding change to usual laboratory
units only in specific applications. In geometrical
units the mass of the Earth is of the order of a
centimeter. The most precise measurements of $\kappa$ are still
made using Newton-type experiments, giving
$\kappa = 6.67259 \times 10^{-11}\ \mathrm{m^3\,kg^{-1}\,s^{-2}}$.

In the case of electromagnetic (or classical Yang- 
Mills) field sources, the stress energy tensor in 
special relativity is the well-known Maxwell tensor 
T (or its generalizations), whose divergence vanishes 
when the field satisfies the Maxwell (or Yang—Mills) 
equations in vacuum. The expression of this tensor 
in a curved spacetime can be trivially deduced from 
its Minkowskian form. Its expression can also be 
deduced from the Lagrangian, and the vanishing of 
its divergence results from the invariance of this 
Lagrangian under isometries of the metric. It is the 
natural source of Einstein equations coupled with 
these fields. In the case of matter, the construction 
of a stress energy tensor is already delicate even in 
special relativity. 

The simplest models of sources with well-understood
properties (kinetic matter and perfect fluids) are
reviewed in this article. Physical situations that are
difficult to model even in special relativity, namely
dissipative fluids and elasticity, are also mentioned.
The extension to electrically (or classical
Yang-Mills-Higgs) charged matter presents no conceptual
difficulty, but leads to interesting new situations.


Fluid Sources 


A fluid source in a domain of a spacetime (V, g) is 
such that there exists, in this domain, a unit timelike 
vector field u, satisfying $g(u,u) \equiv g_{\alpha\beta}u^\alpha u^\beta = -1$,
whose trajectories are the flow lines of matter. 
A moving Lorentzian orthonormal frame is called a 
proper frame if its timelike vector is u. Since the 
Einstein gravitational potentials reduce at a point in 
a Lorentzian orthonormal frame to Minkowskian 
values, one admits that the spacetime symmetric 
2-tensor T, which embodies the density of stress, 
energy, and momentum of a given type of matter, in 
a proper frame takes the expression it would have in 
special relativity and inertial coordinates. The 
expression of T in a general frame results from its 
tensorial character and the equivalence principle. 
The problem is to find a good expression of T in 
special relativity. 


Case of Dust (Incoherent Matter) 


In a proper frame there is neither momentum nor 
stresses. Therefore, the stress energy tensor reads in 
a general frame, with r a scalar function represent- 
ing the matter density: 


$$T = r\, u \otimes u, \quad \text{i.e.,} \quad T_{\alpha\beta} = r\, u_\alpha u_\beta$$


Using the property $g(u,u) = -1$, the conservation
laws imply the vanishing of the divergence of the
matter flow $ru$, that is, the continuity equation
(conservation of matter)


$$\nabla_\alpha(r u^\alpha) = 0$$


and the motion of the particles along geodesics of
the metric:


$$u^\alpha \nabla_\alpha u^\beta = 0$$


Similar equations are obtained for a null dust
model where $g(u,u) = 0$.
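
The way the conservation law splits for timelike dust can be made explicit. The following short derivation is a sketch that assumes only the dust tensor $T^{\alpha\beta} = r u^\alpha u^\beta$ and the normalization $g(u,u) = -1$:

```latex
\begin{align*}
0 = \nabla_\alpha T^{\alpha\beta}
  &= u^\beta\, \nabla_\alpha (r u^\alpha) + r\, u^\alpha \nabla_\alpha u^\beta, \\
u_\beta u^\beta = -1
  &\;\Longrightarrow\; u_\beta \nabla_\alpha u^\beta = 0, \\
u_\beta \nabla_\alpha T^{\alpha\beta} = 0
  &\;\Longrightarrow\; \nabla_\alpha (r u^\alpha) = 0, \\
\text{and then, where } r \neq 0, \quad
  & u^\alpha \nabla_\alpha u^\beta = 0 .
\end{align*}
```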


Perfect Fluid 


Euler equations In Newtonian mechanics, a con- 
tinuous matter flow is characterized by its mass 
density and flow velocity. The equations are a 
continuity equation (conservation of matter) and 
equations of motion resulting from Newton’s law, 
which link the acceleration vector and the space
divergence of the stress symmetric 2-tensor whose
contraction with the normal to a small 2-surface
gives the force applied to it. A fluid is called perfect
if the pressure it applies to a small surface element
with normal n is independent of n. Its stress tensor t,
a symmetric 2-tensor on Euclidean space, is then
invariant under rotations. By generalization, a relativistic
fluid is called perfect if its stress energy tensor
has the following form: 


$$T_{\alpha\beta} = \mu u_\alpha u_\beta + p\,(g_{\alpha\beta} + u_\alpha u_\beta)$$


Then in a proper frame, where g takes the 
Minkowskian values and the only nonvanishing 
component of u is along the time axis and equal to 1, 
the projection of T on space is the Newtonian 
stress tensor with pressure p, while $\mu$, the projection
of T on the time axis, is the fluid energy density.
There is no momentum density in the proper frame. 
The conservation laws, also called Euler equations, 
are shown to split, as in the case of dust, into a 
continuity equation 


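$$\nabla_\alpha[(\mu + p)u^\alpha] - u^\alpha \partial_\alpha p = 0$$


and equations of motion

$$(\mu + p)\,u^\alpha \nabla_\alpha u^\beta + (g^{\alpha\beta} + u^\alpha u^\beta)\,\partial_\alpha p = 0$$


In relativity, where mass and energy are equivalent, the
continuity equation is no longer a conservation law.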
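
The structure of the perfect fluid tensor can be illustrated numerically. The sketch below assumes flat spacetime with metric diag(-1, 1, 1, 1), consistent with $g(u,u) = -1$, and builds $T_{\alpha\beta} = \mu u_\alpha u_\beta + p(g_{\alpha\beta} + u_\alpha u_\beta)$ for a given 4-velocity; the function names are illustrative.

```python
import math

# Minkowski metric with signature (-,+,+,+); it is its own inverse.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]


def perfect_fluid_tensor(mu, p, u_upper):
    """Covariant components T_ab = mu u_a u_b + p (g_ab + u_a u_b)."""
    u_lower = [sum(ETA[a][b] * u_upper[b] for b in range(4)) for a in range(4)]
    return [[mu * u_lower[a] * u_lower[b] + p * (ETA[a][b] + u_lower[a] * u_lower[b])
             for b in range(4)] for a in range(4)]


def trace(t_lower):
    """g^{ab} T_ab, using the fact that ETA equals its own inverse."""
    return sum(ETA[a][b] * t_lower[a][b] for a in range(4) for b in range(4))


def boosted_velocity(v):
    """Unit timelike 4-velocity for a fluid moving with speed v along x."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [gamma, gamma * v, 0.0, 0.0]


if __name__ == "__main__":
    rest = perfect_fluid_tensor(3.0, 1.0, [1.0, 0.0, 0.0, 0.0])
    print([rest[a][a] for a in range(4)])   # energy density, then three pressures
    moving = perfect_fluid_tensor(3.0, 1.0, boosted_velocity(0.6))
    print(trace(moving))                    # -mu + 3p, independent of the frame
```

In the proper frame the tensor is diagonal with entries $(\mu, p, p, p)$, and its trace $-\mu + 3p$ vanishes when $p = \mu/3$.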


Equations of state As in Newtonian mechanics, the 
Euler equations must be completed by a relation, 
called equation of state, depending on the physical 
properties of the fluid. In general in addition to 
mechanics, thermodynamic properties must be con- 
sidered. In relativity, they are borrowed from 
classical thermodynamics formulated in a spacetime 
context. 

In the simplest cases one introduces a conserved 
rest mass density r (or particle number density for 
particles with rest mass zero), satisfying the equation 


$$\nabla_\alpha P^\alpha = 0 \quad \text{with} \quad P^\alpha = r u^\alpha$$


This r differs from the density of energy $\mu$. One sets
$\mu = r(1 + \varepsilon)$ and calls $\varepsilon$ the internal specific energy.
The first law of (reversible) thermodynamics is
extended to relativistic perfect fluids by the identity


$$\Theta\, dS = d\varepsilon + p\, d(r^{-1})$$


which defines both the absolute temperature $\Theta$
and the differential of the specific entropy
S. Modulo the continuity equation and the
thermodynamic identity, the matter conservation
is equivalent to the conservation of entropy along
the flow lines:


$$\nabla_\alpha(r S u^\alpha) = 0, \quad \text{hence} \quad u^\alpha \partial_\alpha S = 0$$


The scalars $p$, $\mu$, $S$, $r$ are not independent. Simple
situations can be modeled by an “equation of state” 
linking these quantities. In astrophysics, one is inspired 
by what is known from classical fluids, with additional 
relativistic considerations. General relativity plays a 
role in the case of strong gravitational field. 

Very cold matter and nuclear matter are barotropic
fluids; they obey an equation of state of the
form $p = p(\mu)$.

When the energy $\mu$ is largely dominated by the
radiation energy, the fluid is called ultrarelativistic.
The Stefan–Boltzmann laws give $\mu = KT^4$ and
$p = \tfrac{1}{3}KT^4$, hence $p = \tfrac{1}{3}\mu$; the stress energy
tensor is traceless.

In white dwarfs, the fluid is considered as
polytropic: it obeys an equation of state of the
form $p = f(S)\, r^\gamma$. If only the internal energy $\varepsilon$ and
pressure $p$ are dominated by radiation, then
$\varepsilon = K r^{-1} T^4$ and $p = \tfrac{1}{3}KT^4$, hence $p = \tfrac{1}{3} r\varepsilon$.
The use of the thermodynamic identity leads to
$\gamma = 4/3$, $p = (K/3)(3S/4K)^{4/3}\, r^{4/3}$, with $\mu = 3p + r$.

For most other stars, the physical situation is too 
complex to be modeled by a simple equation; only 
tables of numerical values may be available. 

In cosmology, there is little physical informa- 
tion about the fluid which is to represent the 
energy content of the universe. It is assumed that 
in the early universe of the big-bang models, at 
very high temperature, the fluid was ultrarelati- 
vistic. At later times, it is generally assumed, for 
simplicity, that there is an equation of state linear 
and independent of entropy, $p = (\gamma - 1)\mu$. In order
that the speed of sound waves be not greater than
the speed of light, one assumes that $1 \le \gamma \le 2$;
$\gamma = 1$ corresponds to dust, $\gamma = 2$ to a stiff (see
below) fluid.

Recent confrontations of theory and observations 
seem to imply the existence of a new, not directly 
seen, type of matter, called “dark matter.” 


Wave fronts and propagation speeds The wave 
fronts of a differential system are the submani- 
folds of spacetime whose normals n annul the 
characteristic determinant. Discontinuities propa- 
gate along wave fronts. For a hyperbolic system, 
the wave fronts determine the domain of depen- 
dence of a solution. For a perfect fluid, they are 
found to be 


1. the matter wave fronts, generated by the flow
lines, such that $u^\alpha n_\alpha = 0$ and
2. the sound wave fronts, whose normals satisfy the
equation
$$(p'_\mu - 1)(u^\alpha n_\alpha)^2 + p'_\mu\, n^\alpha n_\alpha = 0$$
where $p'_\mu$ denotes $\partial p/\partial\mu$. In a proper frame at a point of spacetime,
$u^\alpha = \delta^\alpha_0$ and $g_{\alpha\beta} = \eta_{\alpha\beta}$; this equation states that the
slope of the spacetime normal to the wave front
can be written as


$$\left(\frac{\sum_i n_i^2}{n_0^2}\right)^{1/2} = \frac{1}{\sqrt{p'_\mu}}$$


The sound propagation speed is the inverse of this
slope, that is, $v = \sqrt{p'_\mu}$. It is less than the speed of
light, as expected from a relativistic theory, if $p'_\mu < 1$.
The limiting case where these speeds are equal is
called incompressible or stiff fluid.
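
For the linear equation of state $p = (\gamma - 1)\mu$ used in cosmology, the sound speed follows directly from $v = \sqrt{\partial p/\partial\mu} = \sqrt{\gamma - 1}$. The sketch below tabulates it for the three cases named in the text; the function name and the choice to reject values outside $1 \le \gamma \le 2$ are illustrative assumptions.

```python
import math


def sound_speed(gamma: float) -> float:
    """Sound speed in units of c for the linear law p = (gamma - 1) * mu."""
    if not 1.0 <= gamma <= 2.0:
        raise ValueError("1 <= gamma <= 2 is required for a real, causal sound speed")
    return math.sqrt(gamma - 1.0)


if __name__ == "__main__":
    for name, gamma in [("dust", 1.0), ("radiation", 4.0 / 3.0), ("stiff", 2.0)]:
        print(f"{name:9s} gamma = {gamma:.3f}   v/c = {sound_speed(gamma):.4f}")
```

Dust has vanishing sound speed, the ultrarelativistic fluid ($\gamma = 4/3$) has $v = 1/\sqrt{3}$, and the stiff fluid reaches the speed of light.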








Hyperbolicity, existence, and uniqueness theorem 
The characteristics of the perfect fluid equations are 
real, but the apparent multiplicity of the matter 
wave fronts poses a problem for the hyperbolicity of 
the relativistic Euler equations, even in a given 
background metric. However, Choquet-Bruhat has 
proven that this system is a hyperbolic Leray system 
as well as its coupling with the Einstein equations, 
for instance, in wave gauge. The following theorem 
can then be proved using the general theorem on 
hyperbolic systems and an extension of the method 
used for Einstein’s equations in vacuum. 


Theorem Let (M,g,K) be an initial data set for the
Einstein equations and $(\bar{u}, \bar{\mu}, \bar{S})$ be Cauchy data in a
local Sobolev space $H_s^{\mathrm{loc}}$, $s > 3$, on the 3-manifold M
for a perfect fluid with a smooth equation of state.
Suppose $\bar{\mu} > 0$ and $p'_\mu < 1$. There exists a globally
hyperbolic spacetime of maximal extension, solution of
the Einstein equations with source such a perfect fluid,
taking these Cauchy data. Such a spacetime and fluid
flow are smooth for smooth initial data. They are
unique, up to spacetime isometries.


The Euler equations have also been written as a 
first-order symmetric hyperbolic system by Boillat, 
Ruggeri, and Strumia using general methods relying 
on the existence of a convex functional, and directly 
by Rendall, who pointed out the difficulty of 
modeling the general motion of isolated fluid bodies, 
because of the assumption $\bar{\mu} > 0$. He constructed
some solutions without this assumption where the 
boundaries are freely falling. The general problem of 
determining the evolution of boundaries appears 
everywhere in general relativity, and in classical 
mechanics. 


Global problems The spacetimes obtained above 
are, in general, incomplete: even in Minkowski space- 
time, the Euler equations do not in general have 
solutions that are global in time. Shocks appear in 
relativistic perfect fluids as in classical ones. Global 
existence results have been obtained for four-dimensional
ultrarelativistic fluids (limited data), and
in the case of one space dimension. A detailed study of the
global behavior of spherically symmetric solutions of 
the Einstein—Euler equations with equation of state 
admitting a phase transition from zero pressure to stiff 
fluid has been done by Christodoulou. 


Dissipative Fluids 


A general fluid stress energy tensor is, with u a unit
vector whose trajectories are the flow lines:


$$T^{\alpha\beta} = \mu u^\alpha u^\beta + q^\alpha u^\beta + q^\beta u^\alpha + \Theta^{\alpha\beta}$$


with $q^\alpha u_\alpha = 0$, $\Theta^{\alpha\beta} u_\alpha = 0$


$\mu = T^{\alpha\beta}u_\alpha u_\beta$ is the energy density, which must satisfy
$\mu > 0$, $\Theta$ is a space tensor representing the stresses,
orthogonal to u, and q is a space vector considered as
a heat flow. The fundamental equations are still
$\nabla_\alpha T^{\alpha\beta} = 0$, but they must be supplemented by
constitutive equations for q and $\Theta$, for which there is
no simple satisfactory answer in a relativistic
context. The transfer of results from classical 
mechanics on viscous fluids or on heat transfer 
leads to propagation speeds greater than the speed 
of light. It should be remarked that these classical 
equations are obtained as governing asymptotic 
states; thus, the parabolic character of their relativis- 
tic version does not contradict relativistic causality. 
However, it would be interesting to obtain, for 
dissipative relativistic fluids, hyperbolic dissipative 
equations. Various systems have been proposed, in 
particular, by Marle by using an approximation near 
equilibrium of a solution of the relativistic Boltzmann 
equation. A promising system, also inspired by
kinetic theory, is the “extended thermodynamics” of
Müller and Ruggeri, which takes as its 14 fundamental
unknowns the vector $P = ru$ and the tensor T,
satisfying the conservation laws. These equations are
supplemented by equations linking a totally symmetric
3-tensor A with a symmetric 2-tensor I, of the form


$$\nabla_\alpha A^{\alpha\beta\gamma} = I^{\beta\gamma}$$ [1]


A and I are functions of P and T that depend on the
model; these relations are called constitutive equations.
The system is shown to be symmetric hyperbolic provided
a convex entropy function exists, a property
which holds under appropriate physical
assumptions.


Reasonable equations have been proposed and 
studied for several constituent fluids and 
superfluids. 


Charged Fluids 


The stress energy tensor of a charged fluid with 
electric (or Yang-Mills) charge is generally the sum 
of the stress energy tensor of the fluid and of the 
Maxwell (or Yang-Mills) field. This tensor is 
conserved modulo the Maxwell (or Yang-Mills) 
equations with source the electric current, and the 
Euler equations completed by the Lorentz force. The 
corresponding Einstein—Maxwell perfect fluid sys- 
tem is well posed in the case of zero or infinite 
conductivity (magnetohydrodynamics). A subtlety 
appears in the case of finite conductivity: the system 
is still well posed, but for a restricted (Gevrey) class
of $C^\infty$ fields.


Kinetic Models 
Distribution Function and Moments 


A general relativistic kinetic theory can be formu- 
lated without appeal to classical mechanics or 
special relativity. The matter is composed of 
particles whose size is negligible in the considered 
scale: rarefied gases in the laboratory, galaxies or 
even clusters of galaxies at the cosmological scale. 
The number of particles is so great and their motion 
so chaotic that the state of the matter can be 
described by a “one-particle distribution function,”
a positive scalar function on the tangent bundle to
the spacetime, $(x, p) \mapsto f(x, p)$, which gives the mean
number of particles with momentum p present at the
point x of spacetime.

The first moment of f is a causal vector field P
defined by the integral over the space $P_x$ of
momenta at x, with $\omega_p$ a volume element in that
space:


$$P(x) := \int_{P_x} p\, f(x, p)\, \omega_p$$

Out of the first moment, one extracts a scalar $r \ge 0$
whose square, interpreted as the square of a proper mass
density, is given by $r^2 := -g(P, P)$, and, if $r > 0$, a unit vector
$u = r^{-1}P$ interpreted as the macroscopic flow
velocity.

The second moment of the distribution function f 
is the symmetric 2-tensor on spacetime given by 


$$T(x) := \int_{P_x} f(x, p)\, p \otimes p\, \omega_p$$


It is interpreted as the stress energy tensor of the 
distribution f. Higher moments are defined similarly. 
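
The first two moments can be illustrated with a discrete stand-in for the distribution function. The sketch below assumes flat spacetime with signature (-,+,+,+) and replaces the integrals over momentum space by weighted sums over a few on-shell momenta; the weights play the role of $f\,\omega_p$, and all names and sample values are hypothetical.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal of the Minkowski metric, signature (-,+,+,+)


def on_shell(mass, px, py, pz):
    """Future-directed 4-momentum with g(p, p) = -mass**2."""
    energy = math.sqrt(mass**2 + px**2 + py**2 + pz**2)
    return (energy, px, py, pz)


def moments(momenta, weights):
    """Discrete analogues of P = int p f w_p and T = int f (p x p) w_p."""
    first = [sum(w * p[a] for p, w in zip(momenta, weights)) for a in range(4)]
    second = [[sum(w * p[a] * p[b] for p, w in zip(momenta, weights))
               for b in range(4)] for a in range(4)]
    return first, second


def flow_velocity(first):
    """Return r = sqrt(-g(P, P)) and the unit vector u = P / r (requires r > 0)."""
    r = math.sqrt(-sum(ETA[a] * first[a] ** 2 for a in range(4)))
    return r, [component / r for component in first]


momenta = [on_shell(1.0, 0.3, 0.0, 0.0),
           on_shell(1.0, -0.1, 0.2, 0.0),
           on_shell(1.0, 0.0, -0.2, 0.4)]
weights = [1.0, 2.0, 0.5]
P, T = moments(momenta, weights)
r, u = flow_velocity(P)

if __name__ == "__main__":
    print("r =", r)
    print("g(u, u) =", sum(ETA[a] * u[a] ** 2 for a in range(4)))
```

Because every sampled momentum is timelike and future directed, the sum P is timelike, so $r > 0$ and $u$ is a unit timelike vector.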
Liouville-Vlasov Equation


When the gas is so rarefied that the particle
trajectories do not cross, then in the absence of
nongravitational forces, these trajectories are geodesics
of g, orbits in TV of the vector field
$X = (p^\alpha, Q^\alpha \equiv -\Gamma^\alpha_{\lambda\mu} p^\lambda p^\mu)$, with $\Gamma^\alpha_{\lambda\mu}$ the Christoffel
symbols of g.

In a collisionless model, the physical law of
conservation of particles imposes the conservation
of f along the trajectories of X, that is, the
Liouville-Vlasov equation

$$\mathcal{L}_X f \equiv p^\alpha \frac{\partial f}{\partial x^\alpha} + Q^\alpha \frac{\partial f}{\partial p^\alpha} = 0$$


Conservation laws If f satisfies the Vlasov equa- 
tion, then all moments satisfy a conservation law, in 
particular, 


$$\nabla_\alpha P^\alpha = 0 \quad \text{and} \quad \nabla_\alpha T^{\alpha\beta} = 0$$


equations which make the Einstein—Vlasov system 
consistent. 

The theory extends without problem to particles 
having the same rest mass m, because the scalar 
$g(p, p) = -m^2$ is constant on a geodesic.


Cauchy problem The Einstein–Vlasov system is an
integro-differential system for g and f on a manifold
$V = M \times \mathbb{R}$. The Cauchy data for the spacetime
metric g on $M_0 = M \times \{0\}$ is, as usual, a pair (g, K),
supplemented with gauge initial data which complete
the definition of Cauchy data for a well-posed
hyperbolic system in the chosen gauge. The Cauchy
data for f are a function f on the bundle $P_{M_0}$. It has
been proved long ago that there exists a solution,
geometrically unique, in a neighborhood of $M_0$ if
the data are in Sobolev spaces, weighted by a power
of $p^0$ in the case of f.

Since the Vlasov matter model, solution of a 
linear equation for given g, has no singularity by 
itself, the Einstein—Vlasov system is a good candi- 
date for solutions that are global in time. This global 
existence has been proved by Rein and Rendall in 
the case of small data, asymptotically flat with 
spherical symmetry or plane symmetry, or with 
hyperbolic symmetry and compact space. Global 
existence without these symmetries is an open 
problem. 


Boltzmann Equation 


When the particles undergo collisions, their trajectories
in phase space are no longer connected integral
curves of the vector field X; that is, their momentum
undergoes a jump with the crossing of another
trajectory. In the Boltzmann model, the derivative
$\mathcal{L}_X f$ is equal to the so-called collision operator, $\mathcal{I}f$:


$$(\mathcal{L}_X f)(x, p) = (\mathcal{I}f)(x, p)$$


where $\mathcal{I}f$ is an integral operator linked with the
probability that two particles of momentum, respectively,
$p'$ and $q'$, collide at x and give, after the shock,
two particles of momentum p and q. For “elastic”
shocks, the total momentum is conserved, that is,
$p'$ and $q'$ lie in the submanifold
$\Sigma_{pq} := \{p' + q' = p + q\}$, with volume element $\xi'$ and


$$(\mathcal{I}f)(x, p) = \int_{P_x} \int_{\Sigma_{pq}} \left[ f(x, p') f(x, q') - f(x, p) f(x, q) \right] A(x, p, q, p', q')\, \xi' \wedge \omega_q$$


The function $A(x, p, q, p', q')$ is called the shock
cross section; it is a phenomenological quantity. No
explicit expression is known for it in relativity.
A generally admitted property is the reversibility of
elastic shocks, $A(x, p, q, p', q') = A(x, p', q', p, q)$.
It can be proved that under this hypothesis, the 
first and second moment of f are conserved as in the 
collisionless case, making the Einstein—Boltzmann 
system consistent. Existence of solutions (that are 
local in time) of the Cauchy problem for this system 
has long been known. No global existence for the 
coupled system is known yet. 

One defines, in a relativistic context, an entropy 
flux vector H which is proved to satisfy an 
H-theorem, that is, $\nabla_\alpha H^\alpha \ge 0$. In an expanding
universe, for instance, Robertson–Walker, where H
depends only on time and an entropy density is
defined by $H^0$, one finds that a decrease in entropy
is linked with the expansion of the universe, thus 
permitting its ever-increasing organization from an 
initial anisotropy of f in momentum space. 


Other Matter Sources 
Elastic Media 


There are no solids in general relativity; in special 
relativity rigid motions are already very restricted. 
A theory of elastic deformations can only be defined 
relative to some a priori given state of matter
whose perturbations will satisfy laws analogous to 
the classical laws. Various such theories have been 
proposed through geometric considerations, extend- 
ing methods of classical elasticity; they have been 
used to predict the possible signals from bar 
detectors of gravitational waves, or the motions in 
the crust of neutron stars. A general theory 
constructed by Lagrangian formalism has recently 
been developed. 


Spinor Sources 


A symmetric stress energy tensor can be associated 
to classical spinors of spin 1/2, leading to a well- 
posed Einstein—Dirac system. The theories of super- 
gravity couple the Einstein—Cartan equations with 
anticommuting spin 3/2 sources. 


See also: Boltzmann Equation (Classical and Quantum); 
Einstein Equations: Exact Solutions; Einstein Equations: 
Initial Value Formulation; General Relativity: Overview; 
Geometric Analysis and General Relativity; Kinetic 
Equations; Spinors and Spin Coefficients. 
Introduction


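Classical electromagnetism is described by Maxwell’s
equations, which, in 3-vector notation and
corresponding respectively to the laws of Coulomb,
Ampère, Gauss, and Faraday, are given by eqns
[1a]-[1d]:


$$\operatorname{div} \mathbf{E} = \rho$$ [1a]
$$\operatorname{curl} \mathbf{B} - \frac{\partial \mathbf{E}}{\partial t} = \mathbf{J}$$ [1b]
$$\operatorname{div} \mathbf{B} = 0$$ [1c]
$$\operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = 0$$ [1d]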


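Equivalently, in covariant 4-vector notation, these
correspond to eqns [2a] and [2b]:


$$\partial_\nu F^{\mu\nu} = -j^\mu$$ [2a]


$$\partial_\nu {}^*F^{\mu\nu} = 0$$ [2b]


In eqns [1], E and B are the electric and magnetic
fields, respectively, $\rho$ is the electric charge density,
and J is the electric current. In eqns [2], $F_{\mu\nu}$ is the
field tensor, ${}^*F_{\mu\nu}$ the dual field tensor, and $j^\mu$ is the
4-current, related to the previous vector quantities
by the following relations:


$$F_{\mu\nu} = \begin{pmatrix} 0 & E_1 & E_2 & E_3 \\ -E_1 & 0 & -B_3 & B_2 \\ -E_2 & B_3 & 0 & -B_1 \\ -E_3 & -B_2 & B_1 & 0 \end{pmatrix}$$

$${}^*F_{\mu\nu} = \begin{pmatrix} 0 & B_1 & B_2 & B_3 \\ -B_1 & 0 & E_3 & -E_2 \\ -B_2 & -E_3 & 0 & E_1 \\ -B_3 & E_2 & -E_1 & 0 \end{pmatrix}$$

$$j^\mu = (\rho, \mathbf{J})$$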


Throughout this article, we shall denote the three
spatial indices by lower-case Latin letters such as $i$, $j$,
while Greek indices such as $\mu$, $\nu$ denote spacetime
indices running through 0,1,2,3. The Einstein
summation convention is used, whereby repeated
indices are summed. Spacetime indices are raised
and lowered by the (flat) Minkowski metric
$g_{\mu\nu} = \mathrm{diag}(1, -1, -1, -1)$. We also use units
conventional in particle physics, in which the reduced
Planck constant $\hbar$ and the speed of light $c$ are both
set to 1.

In terms of the totally skew symmetric symbol
$\epsilon_{\mu\nu\rho\sigma}$ (with $\epsilon_{0123} = 1$), the two field tensors are
related by eqn [3]:


$${}^*F_{\mu\nu} = -\tfrac{1}{2}\,\epsilon_{\mu\nu\rho\sigma} F^{\rho\sigma}$$ [3]


We say that ${}^*F_{\mu\nu}$ is the dual of $F_{\mu\nu}$, and eqn [3] is
indeed a duality relation because eqn [4] holds,
which means that up to a sign, $F_{\mu\nu}$ and ${}^*F_{\mu\nu}$ are
duals of each other:


$${}^*({}^*F) = -F$$ [4]


This duality is in fact the Hodge duality between 
p-forms and (n — p)-forms in an n-dimensional 
space. In our particular case, p=2 and n=4, so 
that both F and its dual are 2-forms. The minus sign 
in eqn [4] comes about because of the Lorentzian (or 
pseudo-Riemannian) signature of Minkowski 
spacetime. 

The physical significance of this duality is that 
such a symmetry interchanges electric and magnetic 
fields (again up to sign) (eqn [5]), as can be seen 
from the matrix representation of $F_{\mu\nu}$ and ${}^*F_{\mu\nu}$
above:


$${}^*:\quad \mathbf{E} \to \mathbf{B}, \qquad \mathbf{B} \to -\mathbf{E}$$ [5]


Now in the absence of electric charges and 
currents, one sees immediately that Maxwell’s
equations [1] or [2] are dual symmetric. This 
means that, in vacuo, whether we call an electro- 
magnetic field electric or magnetic is a matter of 
convention. As far as the dynamics is concerned, 
there is no distinction. 
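
The duality operation can be checked by direct computation. The sketch below assumes the conventions stated in this article, namely the metric diag(1, -1, -1, -1), $\epsilon_{0123} = 1$, a field tensor with $F_{0i} = E_i$ and $F_{12} = -B_3$ (cyclically), and the dual defined as ${}^*F_{\mu\nu} = -\tfrac{1}{2}\epsilon_{\mu\nu\rho\sigma}F^{\rho\sigma}$; the function names and sample field values are illustrative.

```python
import math

ETA = [1.0, -1.0, -1.0, -1.0]  # diagonal Minkowski metric, signature (+,-,-,-)


def levi_civita(indices):
    """Totally antisymmetric symbol with epsilon_{0123} = +1."""
    if len(set(indices)) < 4:
        return 0
    inversions = sum(1 for i in range(4) for j in range(i + 1, 4)
                     if indices[i] > indices[j])
    return -1 if inversions % 2 else 1


def field_tensor(E, B):
    """Covariant F_{mu nu} built from the 3-vectors E and B."""
    (e1, e2, e3), (b1, b2, b3) = E, B
    return [[0.0, e1, e2, e3],
            [-e1, 0.0, -b3, b2],
            [-e2, b3, 0.0, -b1],
            [-e3, -b2, b1, 0.0]]


def dual(F):
    """*F_{mu nu} = -(1/2) epsilon_{mu nu rho sigma} F^{rho sigma}."""
    # Raise both indices with the diagonal metric.
    F_up = [[ETA[r] * ETA[s] * F[r][s] for s in range(4)] for r in range(4)]
    return [[-0.5 * sum(levi_civita((m, n, r, s)) * F_up[r][s]
                        for r in range(4) for s in range(4))
             for n in range(4)] for m in range(4)]


def same(A, B, tol=1e-12):
    """Entrywise comparison of two 4x4 arrays."""
    return all(math.isclose(A[m][n], B[m][n], abs_tol=tol)
               for m in range(4) for n in range(4))


E, B = (1.0, 2.0, 3.0), (-4.0, 5.0, 0.5)
F = field_tensor(E, B)
minus_E = tuple(-x for x in E)
minus_F = [[-entry for entry in row] for row in F]

if __name__ == "__main__":
    print(same(dual(F), field_tensor(B, minus_E)))  # duality maps E -> B, B -> -E
    print(same(dual(dual(F)), minus_F))             # applying it twice gives -F
```

Both checks depend only on the stated sign conventions, so changing the metric signature or the sign of $\epsilon_{0123}$ would change the sign in the definition of the dual accordingly.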

On the other hand, eqns [1] and [2] as presented, 
that is, in the presence of matter, are manifestly not 
dual symmetric. The underlying reason for this 
asymmetry has been much studied both in physics 
and in mathematics. One of the two questions that 
this article addresses is precisely this. Following on 
this, we shall see what happens if we try somehow 
to restore this dual symmetry even in the presence of 
matter. 

The second question that we wish to discuss is a 
generalization of this duality. Electromagnetism is a 
gauge theory, in which the gauge group is the 
abelian circle group U(1), representing the phase of 
wave-functions in quantum mechanics. A physically 
relevant generalization, in which the abelian U(1) is 
replaced by a nonabelian group (e.g., SU(2), SU(3)) 
is called Yang-Mills theory (Yang and Mills 1954),
which is the theoretical basis of all modern particle 
physics. We shall show in this article how the 
concept of electric-magnetic duality can be general- 
ized in the context of Yang-Mills theory. 


Gauge Invariance, Sources, 
and Monopoles 


Electric-magnetic duality, whether in the well-known 
abelian case or in the still somewhat open nonabelian 
case, is intimately connected with gauge invariance, 
sources, and monopoles, and also the dynamics as 
embodied in the gauge action. These questions in 
turn find their natural setting in differential geometry, 
particularly the geometry of fibre bundles. 

Although classical electrodynamics can be fully
described by the field tensor $F_{\mu\nu}$, one needs to
introduce the electromagnetic (or gauge) potential
$A_\mu$ if one considers quantum mechanics, as has
been beautifully demonstrated by the
Bohm-Aharonov experiment. The two quantities are
related by eqn [6]:


$$F_{\mu\nu}(x) = \partial_\nu A_\mu(x) - \partial_\mu A_\nu(x)$$ [6]


The fact that the phase of a wave function $\psi(x)$ (e.g.,
of the electron) is not a measurable quantity
(although relative phases of course are) implies that
we are free to make the following transformation:


$$\psi(x) \mapsto e^{ie\Lambda(x)}\,\psi(x)$$ [7]


This in turn implies an unobservable transformation
[8] on the gauge potential, where $\Lambda(x)$ is a
real-valued function on spacetime:


$$A_\mu(x) \mapsto A_\mu(x) + \partial_\mu \Lambda(x)$$ [8]


This invariance is called gauge invariance. Since in 
this abelian case $F_{\mu\nu}$ is gauge invariant, so are the
Maxwell equations, for which we shall take from 
now on the covariant form [2]. Inasmuch as the 
Maxwell equations dictate the dynamics of electro- 
magnetism, gauge invariance is an intrinsic ingredi- 
ent even in the classical theory. 
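
The invariance of the field tensor can be verified in one line. The following is a sketch, assuming that eqn [6] reads $F_{\mu\nu} = \partial_\nu A_\mu - \partial_\mu A_\nu$, that the potential transforms as $A_\mu \mapsto A_\mu + \partial_\mu\Lambda$, and that $\Lambda$ is smooth enough for its partial derivatives to commute:

```latex
\begin{align*}
F'_{\mu\nu}
  &= \partial_\nu\bigl(A_\mu + \partial_\mu \Lambda\bigr)
   - \partial_\mu\bigl(A_\nu + \partial_\nu \Lambda\bigr) \\
  &= F_{\mu\nu} + \bigl(\partial_\nu \partial_\mu - \partial_\mu \partial_\nu\bigr)\Lambda
   = F_{\mu\nu}.
\end{align*}
```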

In Yang-Mills theory, the U(1) phase $e^{ie\Lambda(x)}$ is
replaced by an element $S(x)$ of a nonabelian group
G, so that eqns [7], [8], and [6] become,
respectively, eqns [9], [10], and [11]:


$$\psi(x) \mapsto S(x)\,\psi(x)$$ [9]
$$A_\mu(x) \mapsto S(x) A_\mu(x) S^{-1}(x) - \frac{i}{g}\,\partial_\mu S(x)\, S^{-1}(x)$$ [10]
$$F_{\mu\nu}(x) = \partial_\nu A_\mu(x) - \partial_\mu A_\nu(x) + ig\,[A_\mu(x), A_\nu(x)]$$ [11]


Here the electric coupling e is replaced by a general
gauge coupling g. The quantities $A_\mu$ and $F_{\mu\nu}$ now take
values in the Lie algebra of the Lie group G and the
bracket is the Lie bracket. The wave function $\psi(x)$
takes values in a vector space on which an appropriate
representation of G acts. Notice that now the field
tensor $F_{\mu\nu}$ is no longer invariant, but only covariant:


$$F_{\mu\nu}(x) \mapsto S(x)\, F_{\mu\nu}(x)\, S^{-1}(x)$$ [12]


Next we consider the charges of gauge theory. For 
the moment, we wish to distinguish between two 
types of charges: sources and monopoles. These are 
defined with respect to the gauge field, which in turn 
is derivable from the gauge potential. 

Source charges are those charges that give rise to a
nonvanishing divergence of the field. For example, the
electric current $j^\mu$ due to the presence of the electric charge
e occurs on the right-hand side of the first Maxwell
equation, and is given in the quantum case by eqn [13],
where $\gamma^\mu$ is a Dirac gamma matrix, identifiable as a basis
element of the Clifford algebra over spacetime:


$$j^\mu = e\,\bar{\psi}\gamma^\mu\psi$$ [13]

In the Yang-Mills case, the first Maxwell equation
is replaced by the Yang-Mills equation


$$D_\nu F^{\mu\nu} = -j^\mu, \qquad j^\mu = g\,\bar{\psi}\gamma^\mu\psi$$ [14]


We define the covariant derivative $D_\mu$ as in


$$D_\mu F^{\nu\rho} = \partial_\mu F^{\nu\rho} - ig\,[A_\mu, F^{\nu\rho}]$$ [15]


Monopole charges, on the other hand, are
topological obstructions specified geometrically by
nontrivial G-bundles over every 2-sphere $S^2$
surrounding the charge. They are classified by elements
of $\pi_1(G)$, the fundamental group of G. They are
typified by the (abelian) magnetic monopole as first
discussed by Dirac in 1931.

Let us go into a little more detail about the Dirac
magnetic monopole. If the field tensor $F_{\mu\nu}$ does come
from a gauge potential $A_\mu$ as in eqn [6], then simple
algebra will tell us that this implies $\partial_\nu {}^*F^{\mu\nu} = 0$ as in
eqn [2]. Hence, we conclude the following:


$$\exists\ \text{monopole} \;\Longrightarrow\; A_\mu\ \text{cannot be well defined everywhere}$$


The result is actually stronger. Suppose there exists a
magnetic monopole at a certain point in spacetime,
and, without loss of generality, we shall consider a
static monopole. If we surround this point by a
(spatial) 2-sphere $\Sigma$, then the magnetic flux out of
the sphere is given by


$$\iint_{\Sigma} \mathbf{B}\cdot d\boldsymbol{\sigma} = \iint_{\Sigma_N} \mathbf{B}\cdot d\boldsymbol{\sigma} + \iint_{\Sigma_S} \mathbf{B}\cdot d\boldsymbol{\sigma}$$ [16]


Here $\Sigma_N$ and $\Sigma_S$ are the northern and southern
hemispheres overlapping on the equator S. By
Stokes’ theorem, since $F_{\mu\nu}$ has no components
$F_{0i} = E_i$, we have


$$\iint_{\Sigma_N} \mathbf{B}\cdot d\boldsymbol{\sigma} = \oint_{S} \mathbf{A}\cdot d\mathbf{s}$$ [17a]

$$\iint_{\Sigma_S} \mathbf{B}\cdot d\boldsymbol{\sigma} = \oint_{-S} \mathbf{A}\cdot d\mathbf{s}$$ [17b]


In eqn [17b], $-S$ means the equator with
the opposite orientation. Hence, $\oint_S + \oint_{-S} = 0$.
But this contradicts the assumption that there
exists a magnetic monopole at the center of
the sphere. Hence, we see that if a monopole
exists, then $A_\mu$ will have at least a string of
singularities leading out of it. This is the famous
Dirac string.

The more mathematically elegant way to describe
this is that the principal bundle corresponding to
electromagnetism with a magnetic monopole is
nontrivial, so that the gauge potential $A_\mu$ has to be
patched (i.e., related by transition functions in the
overlap). Consider the example of a static monopole
of magnetic charge $\tilde{e}$. For any (spatial) sphere $S_r$ of
radius r surrounding the monopole, we cover it with
two patches N, S as follows:


$$(N):\quad 0 \le \theta < \pi, \quad 0 \le \phi \le 2\pi$$
$$(S):\quad 0 < \theta \le \pi, \quad 0 \le \phi \le 2\pi$$


In each patch we define the following:


$$A^{(N)}_1 = -\frac{\tilde{e}\,y}{4\pi r(r+z)}, \quad A^{(N)}_2 = \frac{\tilde{e}\,x}{4\pi r(r+z)}, \quad A^{(N)}_3 = 0$$

$$A^{(S)}_1 = \frac{\tilde{e}\,y}{4\pi r(r-z)}, \quad A^{(S)}_2 = -\frac{\tilde{e}\,x}{4\pi r(r-z)}, \quad A^{(S)}_3 = 0$$


In the overlap (containing the equator), $A^{(N)}$
and $A^{(S)}$ are related by a gauge transformation:


$$A^{(N)}_\mu - A^{(S)}_\mu = \partial_\mu \Lambda, \qquad \Lambda = \frac{\tilde{e}\,\phi}{2\pi}$$ [18]


Notice that $A^{(N)}$ has a line of singularity along
the negative z-axis (which is the Dirac string
in this case); similarly for $A^{(S)}$ along the positive
z-axis.

Furthermore, the corresponding field strength is
given by


$$\mathbf{E} = 0$$ [19a]
$$\mathbf{B} = \frac{\tilde{e}\,\mathbf{r}}{4\pi r^3}$$ [19b]


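If we now evaluate the “magnetic flux” out of $S_r$, we
have


$$\iint_{S_r} \mathbf{B}\cdot d\boldsymbol{\sigma} = \oint_{\text{equator}} \left(A^{(N)}_\mu - A^{(S)}_\mu\right) dx^\mu = \tilde{e}$$ [20]


In other words, in the presence of a magnetic
monopole, the second half of Maxwell’s equations
is modified according to eqn [21], with $\tilde{j}^\mu$ given by
eqn [22].


$$\partial_\nu {}^*F^{\mu\nu} = -\tilde{j}^\mu$$ [21]
$$\tilde{j}^\mu = \tilde{e}\,\bar{\psi}\gamma^\mu\psi$$ [22]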

Furthermore, the form of eqn [21] tells us that a 
monopole of the F, field can also be considered as a 
source of the *F,,,, field. The two descriptions are 
equivalent. 

How are the charges $e$ and $\tilde{e}$ related? The gauge
transformation $S = e^{ie\Lambda}$ relating $A^{(N)}$ and $A^{(S)}$ must
be well defined; that is, if one goes round the
equator once, $\phi = 0 \to 2\pi$, one should get the same
$S$. This gives

$e\tilde{e} = 2\pi n,\quad n \in \mathbb{Z}$ [23]

In particular, the unit electric and magnetic charges
are related by eqn [24], which is Dirac's quantization
condition,

$e\tilde{e} = 2\pi$ [24]

So, in principle, just as in the electric case, where we
could have charges $e, 2e, \ldots$, here we could also
have magnetic charges $\tilde{e}, 2\tilde{e}, \ldots$. In other words,
both charges are quantized.
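
The patching argument can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes the standard Wu-Yang patch potentials with the normalization in which the total flux equals the magnetic charge, and the function names (`a_north`, `a_south`, `equator_circulation`, `transition_mismatch`) are hypothetical. It integrates $A^{(N)} - A^{(S)}$ around the equator and tests whether the transition function $S = e^{ie\Lambda}$ is single valued.

```python
import cmath
import math


def a_north(x, y, z, g_m):
    """Northern-patch potential (assumed form); singular on the negative z-axis."""
    r = math.sqrt(x * x + y * y + z * z)
    c = g_m / (4.0 * math.pi * r * (r + z))
    return (-c * y, c * x, 0.0)


def a_south(x, y, z, g_m):
    """Southern-patch potential (assumed form); singular on the positive z-axis."""
    r = math.sqrt(x * x + y * y + z * z)
    c = g_m / (4.0 * math.pi * r * (r - z))
    return (c * y, -c * x, 0.0)


def equator_circulation(g_m, radius=1.0, steps=2000):
    """Midpoint-rule line integral of (A_N - A_S) once around the equator."""
    total = 0.0
    dphi = 2.0 * math.pi / steps
    for k in range(steps):
        phi = (k + 0.5) * dphi
        x, y = radius * math.cos(phi), radius * math.sin(phi)
        an = a_north(x, y, 0.0, g_m)
        a_s = a_south(x, y, 0.0, g_m)
        dx = -radius * math.sin(phi) * dphi
        dy = radius * math.cos(phi) * dphi
        total += (an[0] - a_s[0]) * dx + (an[1] - a_s[1]) * dy
    return total


def transition_mismatch(e, g_m):
    """|S(2*pi) - S(0)| for S = exp(i e Lambda), Lambda = g_m * phi / (2*pi)."""
    return abs(cmath.exp(1j * e * g_m) - 1.0)
```

The circulation reproduces the magnetic charge independently of the radius, and the mismatch vanishes only when the product of the two charges is an integer multiple of $2\pi$.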

Another way to look at this is to consider the
classification of principal bundles over $S^2$. The
reason for these topological 2-spheres is that we
are interested in enclosing a point charge. For a
nontrivial bundle, the patching is given by a function
$S$ defined in the overlap (the equator), in other words, a
map $S^1 \to U(1)$. What this amounts to is a closed
curve in the circle group $U(1)$. Now, curves that can be
continuously deformed into one another cannot give
distinct fibre bundles, so that one sees easily that there
exists a one-to-one correspondence:

{principal $U(1)$ bundles over $S^2$}
$\updownarrow$
{homotopy classes of closed curves in $U(1)$}

This last is $\pi_1(U(1)) \cong \mathbb{Z}$. Hence, we recover Dirac's
quantization condition.
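
The integer labelling a homotopy class of closed curves in $U(1)$ is the winding number. As an illustrative sketch (the function name `winding_number` and the sampling scheme are assumptions, not taken from the text), it can be computed by accumulating the phase increments of a sampled transition function:

```python
import cmath
import math


def winding_number(transition, samples=1000):
    """Winding number of a closed curve phi -> transition(phi) in U(1).

    `transition` must return a unit complex number and be periodic in phi
    with period 2*pi. The phase increments between neighbouring samples are
    summed, so `samples` must be large enough that each increment is < pi.
    """
    total = 0.0
    prev = transition(0.0)
    for k in range(1, samples + 1):
        cur = transition(2.0 * math.pi * k / samples)
        total += cmath.phase(cur / prev)
        prev = cur
    return round(total / (2.0 * math.pi))
```

A smooth deformation of the curve, such as adding a periodic term to the phase, leaves the result unchanged, which is the sense in which the charge is conserved.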

So, for electromagnetism, there are two equivalent 
ways of defining the magnetic charge, as a source or 
as a monopole: 


1. $\partial_\nu {}^*F^{\mu\nu} = -\tilde{j}^\mu \neq 0$.
2. An element of $\pi_1(U(1)) \cong \mathbb{Z}$.


The same goes for the electric charge. We also note 
that both definitions give us the fact that these 
charges are discrete (quantized) and conserved 
(invariant under continuous deformations). 

We now want to apply similar considerations
to the magnetic charges in the nonabelian case.
For several (subtle) reasons the obvious expression
$D_\nu {}^*F^{\mu\nu} = -\tilde{j}^\mu$ as a source (see Table 1) does not
work. The quickest way to say this is that ${}^*F^{\mu\nu}$ in
general has no corresponding potential $\tilde{A}_\mu$, and so is
not a gauge field. Moreover, in contrast to the
abelian case, the field tensor does not fully
specify the physical field configuration, as demonstrated
by Wu and Yang. We shall come back to
this later.

But we have just seen that in the abelian case 
there is another equivalent definition, which is 
that a magnetic monopole is given by the gauge 
configuration corresponding to a nontrivial U(1) 
bundle over $S^2$. This can be generalized to the
nonabelian case without any problem. Moreover, 
this definition automatically guarantees that a 
nonabelian monopole charge is quantized and 
conserved. This is the way monopoles are defined 
above. 

Arguments similar to the abelian case easily yield
the nonabelian analog of the Dirac quantization
condition, eqn [25], the difference between the two
cases being only a matter of conventional
normalization.

$g\tilde{g} = 4\pi$ [25]

Table 1 Definitions of charges

             Sources                               Monopoles
Abelian      $\partial_\nu F^{\mu\nu} = -j^\mu$    $\partial_\nu {}^*F^{\mu\nu} = -\tilde{j}^\mu$
Nonabelian   $D_\nu F^{\mu\nu} = -j^\mu$           ?

Abelian Duality and the Wu-Yang Criterion


We saw above the well-known fact that classical
Maxwell theory is invariant under the duality operator.
By this we mean that at any point in spacetime
free of electric and magnetic charges we have the two
dual symmetric Maxwell equations:

$\partial_\nu {}^*F^{\mu\nu} = 0 \quad [dF = 0]$ [26]

$\partial_\nu F^{\mu\nu} = 0 \quad [d{}^*F = 0]$ [27]


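Displayed in square brackets are the equivalent
equations in the language of differential forms. Then
by the Poincaré lemma we deduce immediately the
existence of potentials $A$ and $\tilde{A}$ such that eqns [28]
and [29] hold:

$F_{\mu\nu}(x) = \partial_\nu A_\mu(x) - \partial_\mu A_\nu(x) \quad [F = dA]$ [28]

${}^*F_{\mu\nu}(x) = \partial_\nu \tilde{A}_\mu(x) - \partial_\mu \tilde{A}_\nu(x) \quad [{}^*F = d\tilde{A}]$ [29]

The two potentials transform independently under
independent gauge transformations $\Lambda$ and $\tilde{\Lambda}$:

$A_\mu(x) \to A_\mu(x) + \partial_\mu \Lambda(x)$ [30]

$\tilde{A}_\mu(x) \to \tilde{A}_\mu(x) + \partial_\mu \tilde{\Lambda}(x)$ [31]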


This means that the full symmetry of this theory is
doubled to $U(1) \times \tilde{U}(1)$, where the tilde on the
second circle group indicates that it is the symmetry
of the dual potential $\tilde{A}$. It is important to note that
the physical degrees of freedom remain the same. 
This is clear because F and *F are related by an 
algebraic equation [3]. As a consequence, the 
physical theory is the same: the doubled gauge 
symmetry is there all the time but is just not so 
readily detected. 

As mentioned in the Introduction, this dual 
symmetry means that what we call “electric” or 
“magnetic” is entirely a matter of choice. 

In the presence of electric charges, the Maxwell
equations usually appear as

$\partial_\nu {}^*F^{\mu\nu} = 0$ [32]
$\partial_\nu F^{\mu\nu} = -j^\mu$ [33]


The apparent asymmetry in these equations comes
from the experimental fact that there is only one
type of charge observed in nature, which we choose
to regard as a source of the field $F$ (or, equivalently
but unconventionally, as a monopole of the field ${}^*F$).
But as we see by dualizing eqns [32] and [33],
that is, by interchanging the role of electricity and
magnetism in relation to $F$, we could equally have
thought of these instead as source charges of
the field ${}^*F$ (or, similarly to the above, as monopoles
of $F$):

$\partial_\nu {}^*F^{\mu\nu} = -\tilde{j}^\mu$ [34]

$\partial_\nu F^{\mu\nu} = 0$ [35]

If both electric and magnetic charges existed in
nature, then we would have the dual symmetric pair:

$\partial_\nu {}^*F^{\mu\nu} = -\tilde{j}^\mu$ [36]

$\partial_\nu F^{\mu\nu} = -j^\mu$ [37]


This duality in fact goes much deeper, as can be
seen if we use the Wu-Yang criterion to derive the
Maxwell equations. What we present here is not the
textbook derivation of the Maxwell equations from an
action, but we consider this method to be much more
intrinsic and geometric. Consider first pure electromagnetism.
The free Maxwell action is given by

$\mathcal{A}^0_F = -\dfrac{1}{4}\int F_{\mu\nu}F^{\mu\nu}$ [38]


The true variables of the (quantum) theory are the
$A_\mu$, so in eqn [38] we should put in a constraint to
say that $F_{\mu\nu}$ is the curl of $A_\mu$ [28]. This can be
viewed as a topological constraint, because it is
precisely equivalent to [26]. Using the method of
Lagrange multipliers, we form the constrained
action

$\mathcal{A} = \mathcal{A}^0_F + \int \lambda_\mu\,(\partial_\nu {}^*F^{\mu\nu})$ [39]

We can now vary this with respect to $F_{\mu\nu}$, obtaining
eqn [40], which implies [27]:

$F^{\mu\nu} = 2\,\epsilon^{\mu\nu\rho\sigma}\,\partial_\rho\lambda_\sigma$ [40]

Moreover, the Lagrange multiplier is exactly the
dual potential $\tilde{A}$.

This derivation is entirely dual symmetric, since
we can equally well use [27] as constraint for the
action $\mathcal{A}^0_F$, now considered as a functional of ${}^*F^{\mu\nu}$
(eqn [41]), and obtain [26] as the equation of motion:

$\mathcal{A}^0_F = \dfrac{1}{4}\int {}^*F_{\mu\nu}\,{}^*F^{\mu\nu}$ [41]

This method applies to the interaction of charges
and fields as well. In this case we start with the free
field plus free particle action (eqn [42]), where we
assume the free particle of mass $m$ to satisfy the Dirac
equation,

$\mathcal{A}^0 = \mathcal{A}^0_F + \int \bar\psi\,(i\partial_\mu\gamma^\mu - m)\,\psi$ [42]

To fix ideas, let us regard this particle carrying an
electric charge $e$ as a monopole of the potential $\tilde{A}_\mu$.
Then the constraint we put in is [33], giving

$\mathcal{A}' = \mathcal{A}^0 + \int \tilde{\lambda}_\mu\,(\partial_\nu F^{\mu\nu} + j^\mu)$ [43]

Variation with respect to ${}^*F$ gives eqn [32], and
varying with respect to $\bar\psi$ gives

$(i\partial_\mu\gamma^\mu - m)\,\psi = -e\tilde{A}_\mu\gamma^\mu\psi$ [44]


So, the complete set of equations for a Dirac particle 
carrying an electric charge e in an electromagnetic 
field is [32], [33], and [44]. The duals of these 
equations will describe the dynamics of a Dirac 
magnetic monopole in an electromagnetic field. 

We see from this that the Wu-Yang criterion 
actually gives us an intuitively clear picture of 
interactions. The assertion that there is a monopole 
at a certain spacetime point x means that the gauge 
field on a 2-sphere surrounding x has to have a 
certain topological configuration (e.g., giving a 
nontrivial bundle of a particular class), and if the 
monopole moves to another point then the gauge 
field will have to rearrange itself so as to maintain 
the same topological configuration around the new 
point. There is thus naturally a coupling between the 
gauge field and the position of the monopole, or, in 
physical language, a topologically induced interac- 
tion between the field and the charge (Wu and 
Yang, 1976). Furthermore, this treatment of inter- 
action between field and matter is entirely dual 
symmetric. 

As a side remark, consider that although the
action $\mathcal{A}^0_F$ is not immediately identifiable as geometric
in nature, the Wu-Yang criterion, by putting
the topological constraint and the equation of
motion on equal (or dual) footing, suggests that in
fact it is geometric in a subtle manner not yet fully
understood. Moreover, as pointed out, eqn [40] says
that the dual potential is given by the Lagrange
multiplier of the constrained action.


Nonabelian Duality Using Loop Variables 


The next natural step is to generalize this duality to
the nonabelian Yang-Mills case. Although there is
no difficulty in defining ${}^*F^{\mu\nu}$, which is again given
by [3], we immediately come to difficulties in the
relation between field and potential; for example, as
in eqn [11],

$F_{\mu\nu}(x) = \partial_\nu A_\mu(x) - \partial_\mu A_\nu(x) + ig\,[A_\mu(x), A_\nu(x)]$


First of all, despite appearances the Yang-Mills
equation [45] (in the free-field case) and the Bianchi
identity [46] are not dual-symmetric, because the
correct dual of the Yang-Mills equation ought to be
given by eqn [47], where $\tilde{D}_\nu$ is the covariant
derivative corresponding to a dual potential:

$D_\nu F^{\mu\nu} = 0$ [45]
$D_\nu {}^*F^{\mu\nu} = 0$ [46]
$\tilde{D}_\nu {}^*F^{\mu\nu} = 0$ [47]

Secondly, the Yang-Mills equation, unlike its
abelian counterpart [27], says nothing about
whether the 2-form ${}^*F$ is closed or not. Nor is the
relation [11] about exactness at all. In other words,
the Yang-Mills equation does not guarantee the
existence of a dual potential, in contrast to the
Maxwell case. In fact, Gu and Yang have constructed
a counterexample. Because the true variables
of a gauge theory are the potentials and not
the fields, this means that Yang-Mills theory is not
symmetric under the Hodge star operation [3].

Nevertheless, electric-magnetic duality is a very
useful physical concept, so one may wish to seek a
more general duality transform ($\sim$), satisfying the
following properties:

1. $(\ )^{\sim\sim} = \pm(\ )$.
2. Electric field $F_{\mu\nu}$ $\leftrightarrow$ magnetic field $\tilde{F}_{\mu\nu}$.
3. Both $A_\mu$ and $\tilde{A}_\mu$ exist as potentials (away from
charges).
4. Magnetic charges are monopoles of $A_\mu$, and
electric charges are monopoles of $\tilde{A}_\mu$.
5. $\sim$ reduces to $*$ in the abelian case.


One way to do this is to study the Wu-Yang
criterion more closely. This reveals the concept of
charges as topological constraints to be crucial
even in the pure field case, as can be seen in
Figure 1. The point to stress is that, in the above
abelian case, the condition for the absence of a
topological charge (a monopole) exactly removes
the redundancy of the variables $F_{\mu\nu}$, and hence
recovers the potential $A_\mu$.


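Figure 1 Chain of equivalences in the abelian case: $A_\mu$ exists as potential for $F$ [$F = dA$] $\Longleftrightarrow$ (Poincaré) defining constraint $\partial_\nu {}^*F^{\mu\nu} = 0$ [$dF = 0$] $\Longleftrightarrow$ (Gauss) no magnetic monopole $\tilde{e}$ $\Longleftrightarrow$ (geometry) principal $A_\mu$ bundle trivial.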


Now the nonabelian monopole charge was defined
topologically as an element of $\pi_1(G)$, and this
definition also holds in the abelian case of $U(1)$, with
$\pi_1(U(1)) = \mathbb{Z}$. So the first task is to write down a
condition for the absence of a nonabelian monopole.

To fix ideas, let us consider the group $SO(3)$,
whose monopole charges are elements of $\mathbb{Z}_2$, which
can be denoted by a sign $\pm$. The vacuum, charge $(+)$
(that is, no monopole), is represented by a closed
curve in the group manifold of even winding
number, and the monopole charge $(-)$ by a closed
curve of odd winding number. It is more convenient,
however, to work in $SU(2)$, which is the double
cover of $SO(3)$ and which has the topology of $S^3$, as
sometimes it is useful to identify the fundamental
group of $SO(3)$ with the center of $SU(2)$ and hence
consider the monopole charge as an element of this
center. There the charge $(+)$ is represented by a
closed curve, and the charge $(-)$ by a curve that
winds an odd number of "half-times" round the
sphere $S^3$. Since these charges are defined by closed
curves, it is reasonable to try to write the constraint 
in terms of loop variables. The treatment presented 
below is not as rigorous as some others, but the 
latter are not so well adapted to the problem in 
hand. Furthermore, it is important to emphasize that 
this approach aims to generalize electric-magnetic 
duality to Yang-Mills theory in direct and close 
analogy to duality in electromagnetism, without any 
further symmetries with which it may be expedient 
to enrich the theory. Other approaches are referred 
to in the next section. 

Consider the gauge-invariant Dirac phase factor
(or holonomy) $\Phi(C)$ of a loop $C$, which can be
written symbolically as a path-ordered exponential:

$\Phi[\xi] = P_s \exp\, ig \int_0^{2\pi} ds\, A_\mu(\xi(s))\,\dot\xi^\mu(s)$ [48]

In eqn [48], we parametrize the loop $C$ as in eqn [49]
and a dot denotes differentiation with respect to the
parameter $s$.

$C:\ \{\xi^\mu(s):\ s = 0 \to 2\pi,\ \xi(0) = \xi(2\pi) = \xi_0\}$ [49]


We thus regard loop variables in general as
functionals of continuous piecewise smooth functions
$\xi$ of $s$. In this way, loop derivatives and loop
integrals are just functional derivatives
and functional integrals. This means that loop
derivatives $\delta_\mu(s)$ are defined by a regularization
procedure approximating delta functions with
finite bump functions and then taking limits in a
definite order. For functional integrals, there exist
various regularization procedures, which are treated
elsewhere in this Encyclopedia.


Polyakov (1980) introduces the logarithmic loop
derivative of $\Phi[\xi]$:

$F_\mu[\xi|s] = \dfrac{i}{g}\,\Phi^{-1}[\xi]\,\delta_\mu(s)\,\Phi[\xi]$ [50]

This acts as a kind of "connection" in loop space
since it tells us how the phase of $\Phi[\xi]$ changes from
one loop to a neighbouring loop. One can go a step
further and define its "curvature" in direct analogy
with $F_{\mu\nu}(x)$ by

$G_{\mu\nu}[\xi|s] = \delta_\nu(s)F_\mu[\xi|s] - \delta_\mu(s)F_\nu[\xi|s] + ig\,[F_\mu[\xi|s], F_\nu[\xi|s]]$ [51]


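It can be shown that by using the $F_\mu[\xi|s]$ we can
rewrite the Yang-Mills action as eqn [52], where the
normalization factor $\bar{N}$ is an infinite constant:

$\mathcal{A}^0_F = -\dfrac{1}{4\pi\bar{N}} \int \delta\xi \int_0^{2\pi} ds\ \mathrm{tr}\{F_\mu[\xi|s]F^\mu[\xi|s]\}\,|\dot\xi(s)|^{-2}$ [52]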


However, the true variables of the theory are still
the $A_\mu$. They represent 4 functions of a real variable,
whereas the loop connections represent 4 functionals
of the real function $\xi(s)$. Just as in the case of the $F_{\mu\nu}$,
these $F_\mu[\xi|s]$ have to be constrained so as to recover
$A_\mu$, but this time much more severely.

It turns out that, in pure Yang-Mills theory, the
constraint that says there are no monopoles ([53])
also removes the redundancy of the loop variables,
exactly as in the abelian case,

$G_{\mu\nu}[\xi|s] = 0$ [53]


That this condition is necessary is easy to see by
simple algebra. The proof of the converse of this
"extended Poincaré lemma" is fairly lengthy. Granted
this, we can now apply the Wu-Yang criterion to the
action [52] and derive the Polyakov equation [54],
which is the loop version of the Yang-Mills equation:

$\delta_\mu(s)F^\mu[\xi|s] = 0$ [54]

In the presence of a monopole charge $(-)$, the
constraint [53] will have a nonzero right-hand side,

$G_{\mu\nu}[\xi|s] = -J_{\mu\nu}[\xi|s]$ [55]


The loop current $J_{\mu\nu}[\xi|s]$ can be written down
explicitly. However, its global form is much easier
to understand. Recall that $F_\mu[\xi|s]$ can be thought of
as a loop connection, for which we can form its
"holonomy." This is defined for a closed (spatial)
surface $\Sigma$ (enclosing the monopole), parametrized by
a family of closed curves $\xi_t(s)$, $t = 0 \to 2\pi$. The
"holonomy" $\Theta_\Sigma$ is then the total change in phase
of $\Phi[\xi_t]$ as $t \to 2\pi$, and thus equals the charge $(-)$.

To formulate an electric-magnetic duality that is
applicable to nonabelian theory, one defines yet
another set of loop variables. Instead of the Dirac
phase factor $\Phi[\xi]$ for a complete curve [48], we
consider the parallel phase transport for part of a
curve from $s_1$ to $s_2$:

$\Phi_\xi(s_2, s_1) = P_s \exp\, ig \int_{s_1}^{s_2} ds\, A_\mu(\xi(s))\,\dot\xi^\mu(s)$ [56]

Then the new variables are defined by [57].

$E_\mu[\xi|s] = \Phi_\xi(s, 0)\,F_\mu[\xi|s]\,\Phi_\xi^{-1}(s, 0)$ [57]

These are not gauge invariant like $F_\mu[\xi|s]$ and may
not be as useful in general, but seem more
convenient for dealing with duality.

Using these variables, we now define their dual
$\tilde{E}_\mu[\eta|t]$ according to

$\omega^{-1}(\eta(t))\,\tilde{E}_\mu[\eta|t]\,\omega(\eta(t)) = -\dfrac{2}{\bar{N}}\,\epsilon_{\mu\nu\rho\sigma}\,\dot\eta^\nu(t) \int \delta\xi\, ds\ E^\rho[\xi|s]\,\dot\xi^\sigma(s)\,\dot\xi^{-2}(s)\,\delta(\xi(s) - \eta(t))$ [58]

In eqn [58], $\omega(x)$ is a (local) rotation matrix
transforming from the frame in which the orientation
in internal symmetry space of the fields $E_\mu[\xi|s]$ is
measured to the frame in which the dual fields $\tilde{E}_\mu[\eta|t]$
are measured. It can be shown that this dual transform
satisfies all five of the required conditions listed earlier.

Electric-magnetic duality in Yang-Mills theory is
now fully reestablished using this generalized duality.
We have the dual pairs of equations [59]-[60]
and [61]-[62]:

$\delta_\nu E_\mu - \delta_\mu E_\nu = 0$ [59]
$\delta^\mu E_\mu = 0$ [60]
$\delta^\mu \tilde{E}_\mu = 0$ [61]
$\delta_\nu \tilde{E}_\mu - \delta_\mu \tilde{E}_\nu = 0$ [62]


Equation [59] guarantees that the potential $A$
exists, and so is equivalent to [53], and hence is the
nonabelian analog of [26]; while equation [60] is
equivalent to the Polyakov version of the Yang-Mills
equation [54], and hence is the nonabelian analog of
[27]. Equation [61] is equivalent by duality to [59]
and is the dual Yang-Mills equation. Similarly,
equation [62] is equivalent to [60], and guarantees
the existence of the dual potential $\tilde{A}$.

The treatment of charges using the Wu-Yang
criterion also follows the abelian case, and will not
be further elaborated here. For this and further
details, the reader is referred to the original papers
(Chan and Tsou 1993, 1999).

Also, just as in the abelian case, the gauge
symmetry is doubled: from the group $G$ we deduce
that the full gauge symmetry is in fact $G \times \tilde{G}$, but that
the physical degrees of freedom remain the same.

The above exposition establishes electric-magnetic
duality in Yang-Mills theory only for classical fields.
A hint that this duality persists at the quantum level
comes from the work of 't Hooft (1978) on confinement.
There he introduces two loop quantities $A(C)$
and $B(C)$ that are operators in the Hilbert space of
quantum states satisfying the commutation relation
[63] for an $SU(N)$ gauge theory, where $n$ is the linking
number between the two (spatial) loops $C$ and $C'$:

$A(C)B(C') = B(C')A(C)\exp(2\pi i n/N)$ [63]


The order or Wilson operator is given explicitly by
$A(C) = \mathrm{tr}\,\Phi(C)$. These two operators play dual roles
in the sense of electric-magnetic duality:

• $A(C)$ measures the magnetic flux through $C$ and
creates electric flux along $C$.

• $B(C)$ measures the electric flux through $C$ and
creates magnetic flux along $C$.
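
The algebraic content of relation [63] can be illustrated with a finite-dimensional toy model. The sketch below is an assumption made for illustration only: it uses the $N \times N$ "clock" and "shift" matrices, which are not the field-theoretic loop operators themselves but obey the same commutation relation, with the power of the shift matrix standing in for the linking number $n$. All function names are hypothetical.

```python
import cmath


def clock(n_colors):
    """Diagonal 'clock' matrix diag(1, w, w^2, ...) with w = exp(2*pi*i/N)."""
    w = cmath.exp(2j * cmath.pi / n_colors)
    return [[w ** j if j == k else 0j for k in range(n_colors)]
            for j in range(n_colors)]


def shift(n_colors):
    """Cyclic 'shift' matrix sending basis vector k to k+1 (mod N)."""
    return [[1 + 0j if j == (k + 1) % n_colors else 0j for k in range(n_colors)]
            for j in range(n_colors)]


def matmul(a, b):
    """Plain matrix product of two square complex matrices."""
    size = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(size)) for j in range(size)]
            for i in range(size)]


def matpow(a, power):
    """Non-negative integer power of a square matrix."""
    size = len(a)
    result = [[1 + 0j if i == j else 0j for j in range(size)] for i in range(size)]
    for _ in range(power):
        result = matmul(result, a)
    return result


def commutation_defect(n_colors, linking):
    """Largest entry of A B^n - exp(2*pi*i*n/N) B^n A; zero if [63] holds."""
    a = clock(n_colors)
    b_n = matpow(shift(n_colors), linking)
    phase = cmath.exp(2j * cmath.pi * linking / n_colors)
    lhs = matmul(a, b_n)
    rhs = matmul(b_n, a)
    return max(abs(lhs[i][j] - phase * rhs[i][j])
               for i in range(n_colors) for j in range(n_colors))
```

For example, `commutation_defect(3, 1)` vanishes to rounding error, and for linking number $n = N$ the two matrices commute, as the phase factor in [63] requires.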


By defining the disorder operator $B(C)$ as the
Wilson operator corresponding to the dual potential
$\tilde{A}$ obtained above, one can prove the commutation
relation [63], thus showing that these classical fields,
when promoted to operators, retain their duality 
relation. Furthermore, there is a remarkable relation 
between the two (abstractly identical) gauge groups, 
in that if one is confined then the dual must be 
broken (that is, in the Higgs phase). This result is 
known as ’t Hooft’s theorem. 

The doubling of gauge symmetry, together with 
’t Hooft’s theorem, has been applied to the confined 
colour group SU(3) of quantum chromodynamics 
(QCD), in the Dualized Standard Model, to solve the 
puzzle of the existence of exactly three generations of 
fermions, with good observational support, by 
identifying the (necessarily broken) dual SU(3) with 
the generation symmetry (Chan and Tsou, 2002). 


Other Treatments of Nonabelian Duality 


Since Yang-Mills theory is not symmetric under the
Hodge *-operation, there are several routes one can
take to generalize the concept of electric-magnetic
duality to the nonabelian case. What was presented
in the last section is a modification of the
*-operation so as to restore this symmetry for
Yang-Mills theory, keeping to the original gauge
structure as much as possible. However, Yang-Mills
theory as used today in particle and field theories is
usually embedded in theories with more structure.


In the simplest case we have the Standard Model of
Particle Physics, which describes all particle
interactions (except gravity) and which has the
gauge group usually written as $SU(3) \times SU(2) \times U(1)$,
corresponding to the $SU(3)$ of the strong interaction
and the $SU(2) \times U(1)$ of the electroweak interaction.
[Strictly speaking, it is $(SU(3) \times SU(2) \times U(1))/\mathbb{Z}_6$, if
we have the standard particle spectrum.] However,
the former group is confined and the latter broken.
The breaking is usually effected by introducing
scalar fields called Higgs fields into the theory.

Besides the experimentally well-tested Standard 
Model, there are many theoretically popular models 
of gauge theory in which supersymmetry is postu- 
lated, thereby introducing extra symmetries into the 
theory. Many of these are remnants of string theory, 
and are usually envisaged as gauge theories in a 
spacetime dimension higher than 4. 

Because of the extra structures and increased 
symmetries in these theories, there is quite a 
proliferation of concepts of duality, which could all 
be thought of as generalizations of abelian electric- 
magnetic duality (Schwarz, 1997). They come under 
the names of Seiberg-Witten duality, S-duality, 
T-duality, mirror symmetry, and so on. All these 
other aspects of duality have their own entries in this 
Encyclopedia. 


See also: AdS/CFT Correspondence; Duality in 
Topological Quantum Field Theory; Four-Manifold 
Invariants and Physics; Large-N Dualities; Measure on 
Loop Spaces; Mirror Symmetry: a Geometric Survey; 
Nonperturbative and Topological Aspects of Gauge 
Theory; Seiberg—Witten theory; Standard Model of 
Particle Physics.


Introduction


The discovery of the electroweak theory crowned
long years of investigation on weak interactions.
The key earlier developments included Fermi's
phenomenological four-fermion interactions for
$\beta$-decay, the discovery of parity violation and the
establishment of the $V - A$ structure of the weak currents, the
Feynman-Gell-Mann conserved vector current (CVC)
hypothesis, current algebra and its beautiful applications
in the 1960s, Cabibbo mixing and lepton-hadron
universality, and finally, the proposal of intermediate
vector bosons (IVBs) to mitigate the high-energy
behavior of the pointlike Fermi interaction theory.

It turned out that the scattering amplitudes in IVB
theory still generally violated unitarity, due to the
massive vector boson propagator,

$\dfrac{-g^{\mu\nu} + q^\mu q^\nu/M^2}{q^2 - M^2 + i\epsilon}$

The electroweak theory, known as Glashow-Weinberg-Salam
(GWS) theory (Weinberg 1967,
Salam 1968, Taylor 1976), was born of
attempts to make the IVB hypothesis for the
weak interactions consistent with
unitarity.

The GWS theory contains, and is in a sense a
generalization of, quantum electrodynamics (QED),
which was earlier successfully established as the
quantum theory of electromagnetism in interaction
with matter. GWS theory describes the weak and
electromagnetic interactions in a single, unified
gauge theory with gauge group

$SU_L(2) \times U(1)$ [1]

Part of this gauge symmetry is realized in the
so-called "spontaneously broken" mode; only a
$U_{\rm EM}(1) \subset SU_L(2) \times U(1)$ subgroup, corresponding
to the usual local gauge symmetry of
electromagnetism, remains manifest at low energies, with a
massless gauge boson (photon). The other three
gauge bosons $W^\pm$, $Z$ are massive, with masses
$\simeq 80.4$ and $91.2$ GeV, respectively.

The theory is renormalizable, as conjectured by
S Weinberg and by A Salam, and subsequently
proved by G 't Hooft (1971), and makes well-defined
predictions order by order in perturbation
theory.




Since the experimental observation of neutral
currents (a characteristic feature of the Weinberg-Salam
theory, which predicts an extra, neutral
massive vector boson, $Z$, as compared to the naive
IVB hypothesis) in the Gargamelle bubble chamber at
CERN (1973), the theory has passed a large number
of experimental tests. The first basic confirmation
also included the discovery of various new particles
required by the theory: the charm quark (SLAC,
BNL, 1974), the bottom quark (Fermilab, 1977),
and the tau ($\tau$) lepton (SLAC, 1975). The top quark,
the heaviest, with a mass about two hundred times
that of the proton, was found later (Fermilab, 1995).
The direct observation of the $W$ and $Z$ vector bosons
was first made by the UA1 and UA2 experiments at
CERN (1983).

The GWS theory is today one of the most precise
and successful theories in physics. Even more
important, perhaps, together with quantum chromodynamics
(QCD), which is an $SU(3)$ (color) gauge
theory describing the strong interactions (which
bind quarks into protons and neutrons, and the
latter two into atomic nuclei), it describes correctly,
within the present experimental and theoretical
uncertainties, all the presently known fundamental
forces in Nature, except gravity. The $SU(3)_{\rm QCD} \times (SU_L(2) \times U(1))_{\rm GWS}$
theory is known as the standard
model (SM).

Both the electroweak (GWS) theory and QCD
are gauge theories with a nonabelian (noncommutative)
gauge group. Theories of this type,
known as Yang-Mills theories, can be constructed
by generalizing the well-known gauge principle
of QED to more general group transformations.
It is a truly remarkable fact that all of the
fundamental forces known today (apart from
gravity) are described by Yang-Mills theories,
and in this sense a very nontrivial unification
can be said to underlie the basic laws of Nature
(G 't Hooft).

There are further deep and remarkable conditions
(anomaly cancellations), satisfied by the structure of
the theory and by the charges of experimentally
known spin-1/2 elementary particles (see Tables 1
and 2), which guarantee the consistency of the
theory as a quantum theory.

It should be mentioned, however, that the recent
discovery of neutrino oscillations (Super-Kamiokande
(1998), SNO, KamLAND, K2K experiments),
which proved the neutrinos to possess
nonvanishing masses, clearly indicates that the
standard GWS theory must be extended, in an as
yet unknown way.

Table 1 Quarks and their charges

Quarks                                    $SU_L(2)$   $U_Y(1)$   $U_{\rm EM}(1)$
$(u_L, d'_L)$, $(c_L, s'_L)$, $(t_L, b'_L)$   2           1/3        (2/3, -1/3)
$u_R$, $c_R$, $t_R$                           1           4/3        2/3
$d_R$, $s_R$, $b_R$                           1           -2/3       -1/3

The primes indicate that the mass eigenstates are different from
the states transforming as multiplets of $SU_L(2) \times U_Y(1)$. They
are linearly related by the CKM mixing matrix.


Table 2 Leptons and their charges 


Leptons   $SU_L(2)$

The primes indicate again that the mass eigenstates are
different from the states transforming as multiplets of $SU_L(2) \times U_Y(1)$,
as required by the observed neutrino oscillations.


The following is a brief summary of the GWS
theory, its characteristic features, its implications for
the symmetries of Nature, the status of the precision
tests, and its possible extensions.


GWS Theory 


All the presently known elementary particles (except
for the gauge bosons $W^\pm$, $Z$, $\gamma$, the gluons, the
graviton, possibly right-handed neutrinos) are listed
in Tables 1-3 together with their charges with
respect to the $SU_L(2) \times U(1)$ gauge group.

A doublet of Higgs scalar particles is included 
even though the physical component (which should 
appear as an ordinary scalar particle) has not yet 
been experimentally observed. 

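Table 3 Higgs doublet scalars and their charges

Higgs doublet          $SU_L(2)$   $U_Y(1)$
$(\phi^+, \phi^0)$     2           1

The Lagrangian is given by

$\mathcal{L} = \mathcal{L}_{\rm gauge} + \mathcal{L}_{\rm quarks} + \mathcal{L}_{\rm leptons} + \mathcal{L}_{\rm Higgs} + \mathcal{L}_{\rm Yukawa} + \mathcal{L}_{\rm gf} + \mathcal{L}_{\rm FP\,ghosts}$

The gauge kinetic terms are

$\mathcal{L}_{\rm gauge} = -\dfrac{1}{4}F^a_{\mu\nu}F^{a\,\mu\nu} - \dfrac{1}{4}G_{\mu\nu}G^{\mu\nu}$

where

$F^a_{\mu\nu} = \partial_\mu A^a_\nu - \partial_\nu A^a_\mu + g\,\epsilon^{abc}A^b_\mu A^c_\nu$
$G_{\mu\nu} = \partial_\mu B_\nu - \partial_\nu B_\mu$

are the $SU_L(2) \times U(1)$ gauge field tensors; $\mathcal{L}_{\rm gf}$ and $\mathcal{L}_{\rm FP}$
are the so-called gauge-fixing term and Faddeev-Popov
ghost term, needed to define the gauge-boson
propagators appropriately and to eliminate certain
unphysical contributions. The gauge invariance of the
theory is ensured by a set of identities (A Slavnov,
J C Taylor). The quark kinetic terms have the form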


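$\mathcal{L}_{\rm quarks} = \sum_{\rm quarks} \bar\psi\, i\gamma^\mu D_\mu \psi$

where $D_\mu$ are appropriate covariant derivatives,

$D_\mu q_L = \left(\partial_\mu - \dfrac{ig}{2}\tau^a A^a_\mu - \dfrac{ig'}{6}B_\mu\right) q_L$

for the left-handed quark doublets,

$D_\mu u_R = \left(\partial_\mu - \dfrac{2ig'}{3}B_\mu\right) u_R,\quad D_\mu d_R = \left(\partial_\mu + \dfrac{ig'}{3}B_\mu\right) d_R$

for the right-handed quarks $u_R$ and $d_R$,
and similarly for other "up" quarks $c_R$ (charm) and
$t_R$ (top), and "down" quarks, $s_R$ (strange), and $b_R$
(bottom). Analogously, the lepton kinetic terms are
given by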





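$\mathcal{L}_{\rm leptons} = \sum_{i=1}^{3} \bar\psi_i\, i\gamma^\mu D_\mu \psi_i = \sum_{i=1}^{3} \bar\psi^i_L\, i\gamma^\mu\left(\partial_\mu - \dfrac{ig}{2}\tau^a A^a_\mu + \dfrac{ig'}{2}B_\mu\right)\psi^i_L + \sum_{i=1}^{3} \bar\psi^i_R\, i\gamma^\mu\left(\partial_\mu + ig'B_\mu\right)\psi^i_R$

where $\psi^i$ ($i = 1, 2, 3$) indicate the $e$, $\mu$, $\tau$ lepton families;
finally, the parts involving the Higgs fields are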


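$\mathcal{L}_{\rm Higgs} = D_\mu\phi^\dagger D^\mu\phi + V(\phi, \phi^\dagger)$
$V(\phi, \phi^\dagger) = -\mu^2\phi^\dagger\phi - \lambda(\phi^\dagger\phi)^2$

and

$\mathcal{L}_{\rm Yukawa} = \sum_{i,j=1}^{3}\left[ g_d^{ij}\,\bar q^{\,i}_L \begin{pmatrix}\phi^+\\ \phi^0\end{pmatrix} d^j_R + g_u^{ij}\,\bar q^{\,i}_L \begin{pmatrix}\bar\phi^0\\ -\phi^-\end{pmatrix} u^j_R \right] + \sum_{i,j=1}^{3} g_e^{ij}\,\bar\psi^{\,i}_L \begin{pmatrix}\phi^+\\ \phi^0\end{pmatrix} e^j_R + \text{h.c.}$ [2]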

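For $\mu^2 < 0$, the Higgs potential has a minimum at

$\langle\phi^\dagger\phi\rangle = \langle|\phi^+|^2 + |\phi^0|^2\rangle = -\dfrac{\mu^2}{2\lambda} \equiv \dfrac{v^2}{2}$

By choosing conveniently the direction of the Higgs
field, its vacuum expectation value (VEV) is expressed as

$\langle\phi\rangle = \begin{pmatrix}0\\ v/\sqrt{2}\end{pmatrix}$ [3]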


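The physical properties of Higgs and gauge
bosons are best seen by choosing the so-called
unitary gauge,

$\phi(x) = e^{i\tau^a\xi^a(x)/v}\begin{pmatrix}0\\ (v + \eta(x))/\sqrt{2}\end{pmatrix} = U(\xi)\begin{pmatrix}0\\ (v + \eta(x))/\sqrt{2}\end{pmatrix}$
$\psi_L = U(\xi)\,\psi'_L,\quad \psi_R = \psi'_R$
$A_\mu = U(\xi)\left(A'_\mu + \dfrac{i}{g}\partial_\mu\right)U^\dagger(\xi),\quad A_\mu \equiv \dfrac{\tau^a A^a_\mu}{2}$

and expressing everything in terms of primed
variables. It is easy to see that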


1. There is one physical scalar (Higgs) particle
with mass

$M_H = \sqrt{-2\mu^2}$ [4]

2. The Higgs kinetic term $(D_\mu\phi)^\dagger(D^\mu\phi)$ produces the
gauge-boson masses

$M_W^2 = \dfrac{g^2v^2}{4},\quad M_Z^2 = \dfrac{v^2}{4}\,(g^2 + g'^2)$ [5]

3. The physical gauge bosons are the charged $W^\pm$, and
two neutral vector bosons described by the fields

$Z_\mu = \cos\theta_W A_{3\mu} - \sin\theta_W B_\mu$
$A_\mu = \sin\theta_W A_{3\mu} + \cos\theta_W B_\mu$

where the mixing angle

$\theta_W = \tan^{-1}\dfrac{g'}{g},\quad \sin\theta_W = \dfrac{g'}{\sqrt{g^2 + g'^2}}$

is known as the Weinberg angle. The massless $A_\mu$
field describes the photon.
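
As a quick numerical illustration of the tree-level relations above, the following sketch evaluates the gauge-boson masses and the mixing angle from the couplings. The input values $g = 0.65$, $g' = 0.35$ and $v = 246$ GeV are rough assumed numbers chosen for illustration, and the function names are hypothetical.

```python
import math


def gauge_boson_masses(g, g_prime, v):
    """Tree-level M_W, M_Z (same units as v) and the Weinberg angle in radians."""
    m_w = 0.5 * g * v
    m_z = 0.5 * v * math.hypot(g, g_prime)
    theta_w = math.atan2(g_prime, g)
    return m_w, m_z, theta_w


def sin2_theta_from_masses(m_w, m_z):
    """sin^2(theta_W) inferred from the tree-level relation M_W = M_Z cos(theta_W)."""
    return 1.0 - (m_w / m_z) ** 2


if __name__ == "__main__":
    m_w, m_z, theta_w = gauge_boson_masses(0.65, 0.35, 246.0)
    print(f"M_W = {m_w:.2f} GeV, M_Z = {m_z:.2f} GeV")
    print(f"sin^2(theta_W) = {math.sin(theta_w) ** 2:.4f}")
    # Using the measured masses quoted in the Introduction:
    print(f"from masses: {sin2_theta_from_masses(80.4, 91.2):.4f}")
```

With the masses quoted in the Introduction (80.4 and 91.2 GeV), the second function gives a value of $\sin^2\theta_W$ between 0.22 and 0.23; a quantitative comparison with data depends on the renormalization scheme.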


Fermi Interactions and Neutral Currents 


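The fermions interact with gauge bosons through
the charged and neutral currents

$\mathcal{L} = \dfrac{g}{\sqrt{2}}\left(J^+_\mu W^{+\mu} + J^-_\mu W^{-\mu}\right) + \mathcal{L}_{\rm n.c.}$ [6]

$\mathcal{L}_{\rm n.c.} = g J^3_\mu A^{3\mu} + \dfrac{g'}{2} J^Y_\mu B^\mu = e J^{\rm em}_\mu A^\mu + \dfrac{g}{\cos\theta_W} J^0_\mu Z^\mu$ [7]

where

$J^\pm_\mu = \sum \bar\psi_L\gamma_\mu\tau^\pm\psi_L,\quad \tau^\pm = \dfrac{\tau^1 \pm i\tau^2}{2}$ [8]

corresponds to the standard charged current, and

$J^0_\mu = J^3_\mu - \sin^2\theta_W\, J^{\rm em}_\mu$ [9]

is the neutral current to which the $Z$ boson is
coupled ($J^3_\mu = (1/2)\sum \bar\psi_L\gamma_\mu\tau^3\psi_L$ and $J^{\rm em}_\mu$ is the
electromagnetic current). The model thus predicts
the existence of neutral current processes, mediated
by the $Z$ boson, such as $\nu_\mu e \to \nu_\mu e$ or $\bar\nu_\mu e \to \bar\nu_\mu e$, with
cross section of the same order as that for the
charged current process, $\nu_e e \to \nu_e e$, but with
characteristic L-R asymmetric couplings depending
on the Weinberg angle. By eqn [9] appropriate ratios
of cross sections, such as $\sigma(\nu_\mu e \to \nu_\mu e)/\sigma(\bar\nu_\mu e \to \bar\nu_\mu e)$,
can be used to measure $\sin^2\theta_W$.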

The exchange of heavy $W$ bosons generates an
effective current-current interaction at low energies:

$\mathcal{L}_{\rm eff} = -\dfrac{g^2}{2M_W^2}\, J^+_\mu J^{-\mu}$

the well-known Fermi-Feynman-Gell-Mann Lagrangian
$-\dfrac{G_F}{\sqrt{2}}\, J^\dagger_{V-A\,\mu} J^\mu_{V-A}$, with

$\dfrac{G_F}{\sqrt{2}} = \dfrac{g^2}{8M_W^2}$

This means that the Higgs VEV must be taken to be

$v = 2^{-1/4}\,G_F^{-1/2} \simeq 246$ GeV [10]
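
The value of the Higgs VEV follows from the measured Fermi constant alone. The sketch below is illustrative: the numerical value of $G_F$ is an assumed external input (it is not given in the text), and the function names are hypothetical. It also checks that the tree-level relations $G_F/\sqrt{2} = g^2/8M_W^2$ and $M_W = gv/2$ are mutually consistent.

```python
import math

# Assumed input, not taken from the text: Fermi constant in GeV^-2.
G_FERMI = 1.1663787e-5


def higgs_vev(g_fermi=G_FERMI):
    """Higgs VEV in GeV from v = 2^(-1/4) * G_F^(-1/2)."""
    return 2.0 ** -0.25 * g_fermi ** -0.5


def fermi_constant_from(g, m_w):
    """G_F in GeV^-2 from the tree-level matching G_F / sqrt(2) = g^2 / (8 M_W^2)."""
    return math.sqrt(2.0) * g * g / (8.0 * m_w * m_w)


if __name__ == "__main__":
    print(f"v = {higgs_vev():.2f} GeV")
```

Feeding `fermi_constant_from(g, g * v / 2)` back into `higgs_vev` returns `v` for any coupling `g`, which shows that the low-energy Fermi constant fixes the VEV independently of the gauge coupling.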


Masses 


It is remarkable that “all” known masses of the elementary particles, except perhaps those of the neutrinos, are generated in GWS theory through the spontaneous breakdown of $SU_L(2) \times U(1)$ symmetry, through the Higgs VEV (eqns [3] and [10]). The boson masses are given by [4] and [5]. Note that the relation

$$\rho \equiv \frac{M_W^2}{M_Z^2 \cos^2\theta_W} = 1$$

reflects an accidental SO(3) symmetry present in the model (note the SO(4) symmetry of the Higgs potential in the limit $\alpha \to 0$, before the spontaneous breaking), called custodial symmetry. This is a characteristic, model-dependent feature of the minimal model, not necessarily required by the gauge symmetry. This relation is well met experimentally, although a quantitative discussion requires the choice of the renormalization scheme (including the definition of $\sin\theta_W$ itself) and a check of consistency with various other data.
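
As a rough illustration of the scheme dependence just mentioned, the sketch below evaluates two quantities. One is the on-shell definition $\sin^2\theta_W = 1 - M_W^2/M_Z^2$, for which the tree-level relation holds by construction. The other is the ratio $M_W^2/(M_Z^2\cos^2\theta_W)$ computed with the $\overline{\rm MS}$ value of $\sin^2\theta_W$. The function names are hypothetical; the inputs are the gauge-boson masses of Table 6 and the $\overline{\rm MS}$ value quoted in the Global Fit section.

```python
# Gauge-boson masses in GeV (Table 6) and the MS-bar value of sin^2(theta_W)
# quoted in the Global Fit section.
M_W = 80.425
M_Z = 91.1876
SIN2_MSBAR = 0.23120


def sin2_on_shell(m_w: float, m_z: float) -> float:
    """On-shell definition: sin^2(theta_W) = 1 - M_W^2 / M_Z^2."""
    return 1.0 - (m_w / m_z) ** 2


def rho_ratio(m_w: float, m_z: float, sin2_theta_w: float) -> float:
    """M_W^2 / (M_Z^2 cos^2(theta_W)) for a given definition of the angle."""
    return m_w ** 2 / (m_z ** 2 * (1.0 - sin2_theta_w))


sin2_os = sin2_on_shell(M_W, M_Z)
rho_msbar = rho_ratio(M_W, M_Z, SIN2_MSBAR)
print(f"on-shell sin^2 = {sin2_os:.4f}, ratio with MS-bar angle = {rho_msbar:.4f}")
```

The two definitions of the angle differ at the percent level, so the statement that the relation is "well met" is meaningful only once the scheme and the radiative corrections are specified.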

The fermions get mass through the Yukawa 
interactions (eqn [2]); the fermion masses are 
arbitrary parameters of the model and cannot be 
predicted within the GWS theory. An important 
feature of this mechanism is that the coupling of the 
physical Higgs particle to each fermion is propor- 
tional to the mass of the latter. This should give a 
clear, unambiguous experimental signature for the 
Higgs scalar of the minimal GWS model. 

The recent discovery of nonvanishing neutrino masses requires the theory to be extended. Actually, there is a natural way to incorporate such masses in the standard GWS model, by a minimal extension. As the right-handed neutrinos, if they exist, are entirely neutral with respect to the $SU_L(2) \times U(1)$ gauge symmetry, they do not need its breaking to have mass. In other words, $\nu_R$ may get Majorana masses, $\sim M_R\, \nu_R \nu_R$, by some yet unknown mechanism, much larger than those of other fermions (such a mechanism is quite naturally present in some grand unified models). If now the Yukawa couplings are introduced as for the quarks and for the down leptons, then the Dirac mass terms result upon condensation of the Higgs field, and the neutrino mass matrix would take the form, for one flavor (in the space of $(\nu_L, \nu_R)$):

$$\begin{pmatrix} 0 & m_D \\ m_D & M_R \end{pmatrix} \qquad [11]$$


Table 4 Quark masses

u (MeV)    c (GeV)      t (GeV)        d (MeV)    s (MeV)    b (GeV)
1.5-4      1.15-1.35    174.3 ± 5.1    4-8        80-130     4.1-4.4


Table 5 Lepton masses

ν_e (eV)                  ν_μ (MeV)                 ν_τ (MeV)
<3                        <0.19                     <18.2

e (MeV)                   μ (MeV)                   τ (MeV)
0.51099892 ± 4 × 10^-8    105.658369 ± 9 × 10^-6    1776.99 ± 0.26


Table 6 Gauge-boson masses

Photon    Gluons    W± (GeV)          Z (GeV)
0         0         80.425 ± 0.038    91.1876 ± 0.0021


If the Dirac masses are assumed to be of the same order as those of the quarks and if the right-handed Majorana masses $M_R$ are far larger, for example, of the order of the grand unified scale, $O(10^{16}\ \text{GeV})$, then diagonalization of the mass matrix would give, for the physical masses of the left-handed neutrinos, $\sim m_D^2/M_R \ll m_D$, much smaller than other fermion masses, quite naturally (“see-saw” mechanism).
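
The see-saw estimate can be made concrete with a short Python sketch that diagonalizes the one-flavor matrix with entries $0$, $m_D$, $m_D$, $M_R$ in closed form. The function name and the sample inputs ($m_D = 100$ GeV, $M_R = 10^{16}$ GeV) are illustrative assumptions, not values taken from the text.

```python
import math


def seesaw_masses(m_dirac: float, m_majorana: float) -> tuple[float, float]:
    """Absolute eigenvalues (light, heavy) of [[0, m_D], [m_D, M_R]]."""
    root = math.sqrt(m_majorana ** 2 + 4.0 * m_dirac ** 2)
    heavy = 0.5 * (m_majorana + root)
    # |M_R - root| / 2 rewritten as 2 m_D^2 / (M_R + root) to avoid
    # catastrophic cancellation when M_R >> m_D.
    light = 2.0 * m_dirac ** 2 / (m_majorana + root)
    return light, heavy


# Illustrative inputs in GeV (assumptions).
light, heavy = seesaw_masses(100.0, 1.0e16)
print(f"light ~ {light * 1e9:.2e} eV, heavy ~ {heavy:.2e} GeV")
```

The product of the two eigenvalues has magnitude $m_D^2$, so pushing $M_R$ up necessarily pushes the light mass down, which is the origin of the name.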


CKM Quark Mixing


As there is a priori no reason why the weak-interaction eigenstates should be equal to the mass eigenstates, the Yukawa couplings in eqn [2] are in general nondiagonal matrices in flavor space. Suppose that the weak basis for the quarks is given in terms of the mass eigenstates (in which the quark masses are made diagonal) by the unitary transformations

$$\tilde u_{Li} = \sum_j V^{\rm up}_{ij}\, u_{Lj}, \qquad \tilde d_{Li} = \sum_j V^{\rm down}_{ij}\, d_{Lj}$$

then the interaction terms with $W^\pm$ bosons [6] can be cast in the form (Kobayashi and Maskawa 1972)

$$\mathcal{L}^{W\text{-exc}} = \frac{g}{\sqrt{2}} \sum_{i,j} \left[\bar u_{Li}\, \gamma^\mu W^+_\mu\, U_{ij}\, d_{Lj} + \bar d_{Lj}\, \gamma^\mu W^-_\mu\, U^*_{ij}\, u_{Li}\right] \qquad [12]$$

where $U = (V^{\rm up})^\dagger V^{\rm down}$ is called the Cabibbo-Kobayashi-Maskawa (CKM) matrix. It can be parametrized in terms of three Euler angles and one phase


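$$U = \begin{pmatrix} U_{ud} & U_{us} & U_{ub} \\ U_{cd} & U_{cs} & U_{cb} \\ U_{td} & U_{ts} & U_{tb} \end{pmatrix} = \begin{pmatrix} c_{12}c_{13} & s_{12}c_{13} & s_{13}e^{-i\delta_{13}} \\ -s_{12}c_{23} - c_{12}s_{23}s_{13}e^{i\delta_{13}} & c_{12}c_{23} - s_{12}s_{23}s_{13}e^{i\delta_{13}} & s_{23}c_{13} \\ s_{12}s_{23} - c_{12}c_{23}s_{13}e^{i\delta_{13}} & -c_{12}s_{23} - s_{12}c_{23}s_{13}e^{i\delta_{13}} & c_{23}c_{13} \end{pmatrix} \qquad [13]$$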


where $c_{12} = \cos\theta_{12}$, $s_{23} = \sin\theta_{23}$, etc. The requirement that charged-current weak processes are all described by these matrix elements, satisfying the unitarity relation,

$$\sum_k U_{ki}\, U^*_{kj} = \delta_{ij} \qquad [14]$$

gives a very stringent test for the validity of the model.
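
A minimal Python sketch of this parametrization is given below. It builds the matrix from the three sines and the phase and checks numerically that it is unitary and that the $d$ and $b$ columns are orthogonal (the relation underlying the unitarity triangle). The function names and the sample inputs are illustrative assumptions of the size suggested by global fits; only the standard library is used.

```python
import cmath
import math


def ckm_matrix(s12: float, s23: float, s13: float, delta13: float):
    """CKM matrix in the standard three-angle, one-phase parametrization."""
    c12, c23, c13 = (math.sqrt(1.0 - s * s) for s in (s12, s23, s13))
    e_plus = cmath.exp(1j * delta13)
    e_minus = cmath.exp(-1j * delta13)
    return [
        [c12 * c13, s12 * c13, s13 * e_minus],
        [-s12 * c23 - c12 * s23 * s13 * e_plus,
         c12 * c23 - s12 * s23 * s13 * e_plus,
         s23 * c13],
        [s12 * s23 - c12 * c23 * s13 * e_plus,
         -c12 * s23 - s12 * c23 * s13 * e_plus,
         c23 * c13],
    ]


def unitarity_defect(u) -> float:
    """Largest modulus among the entries of U U^dagger - 1."""
    worst = 0.0
    for i in range(3):
        for k in range(3):
            entry = sum(u[i][j] * u[k][j].conjugate() for j in range(3))
            worst = max(worst, abs(entry - (1.0 if i == k else 0.0)))
    return worst


# Illustrative inputs (assumptions): three sines and a phase of 60 degrees.
U = ckm_matrix(0.2243, 0.0413, 0.0037, math.radians(60.0))
# Orthogonality of the d (index 0) and b (index 2) columns.
triangle = sum(U[k][0] * U[k][2].conjugate() for k in range(3))
print(unitarity_defect(U), abs(triangle))
```

Both printed numbers should be at the level of floating-point rounding, since the parametrization is unitary for any choice of angles and phase. The experimental test consists in checking whether independently measured matrix elements are compatible with such a matrix.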


CP Violation 


CP (product of charge conjugation and parity 
transformation) invariance is an approximate sym- 
metry of Nature. Although it is known to be broken 
by very tiny amounts only, the exact extent and the 
nature of CP violation can have far-reaching 
consequences. 


CP violation was first discovered by Cronin and Fitch (BNL, 1964) in the K-meson system; more precise information on the nature of CP violation from the neutral kaon decays has been obtained more recently (2000) in the NA48 (CERN) and KTeV (Fermilab) experiments. CP violation has been established in the B-meson systems as well, very recently (2002), by the BaBar experiment at SLAC and the Belle experiment at KEK.

Through the so-called CPT theorem, CP invariance 
(or violation) is closely related to the T (time-reversal 
invariance) symmetry. Also, CP noninvariance is one 
of the conditions needed in the cosmological baryon 
number generation (baryogenesis). 

In the GWS theory, with three families of quark flavors (six quarks), there is just one source of CP violation: the phase $\delta_{13}$ appearing in the CKM matrix (eqn [13]). For $\delta_{13} \neq 0, \pi$, W-exchange interactions [12] induce CP violation. The earlier and more recent experimental data on $K^0$-$\bar K^0$ mixing and $K_{L,S}$ decays appear to be compatible with the CKM mechanism for CP violation, but a quantitative comparison with the SM remains somewhat hindered by the difficulty of estimating certain strong interaction effects. The recent confirmation of CP violation in B systems is made in the context of a global fit with the SM predictions such as the “unitarity triangle” relations, for example,

$$\frac{U_{ud}\, U^*_{ub}}{U_{cd}\, U^*_{cb}} + \frac{U_{td}\, U^*_{tb}}{U_{cd}\, U^*_{cb}} + 1 = 0 \qquad [15]$$

(eqn [14]), and by combining data from kaon decays, charmed meson decays, B meson decays and mixings, etc., and is a part of direct tests of the GWS model, with nonvanishing CP violation CKM phase (eqn [16] and Figure 1). Recent evidence for nonzero neutrino masses and mixings opens the way to possible CP violation in the leptonic processes as well.


Figure 1 Unitarity triangle test (eqn [15]). The small ellipses represent 68% and 95% probability zones for the apex corresponding to $U_{ud} U^*_{ub}/U_{cd} U^*_{cb}$. Reproduced from M. Bona et al. (2005) The 2004 UTfit collaboration report on the status of the unitarity triangle in the standard model. Journal of High Energy Physics 0507: 028-059 (hep-ph/0501199), with permission from IoP Publishing Ltd and the UTfit collaboration.

Finally, within the SM including strong interactions, there is one more source of CP violation: the so-called $\theta$ (vacuum) parameter of QCD.


B and L Nonconservation 


Another set of approximate symmetries in Nature are 
the baryon and lepton number conservations. In the 
electroweak theory, these global symmetries are exact 
to all orders of perturbation theory. Nonperturbative 
effects (a sort of barrier penetration in gauge field 
space) however violate both B and L; the combina- 
tion B-L is conserved even nonperturbatively though. 
The nonperturbative electroweak baryon number violation is an extremely tiny effect, the amplitude being proportional to the typical tunneling factor $e^{-2\pi/\alpha}$, but the process is unsuppressed at finite temperatures, as might have been experienced by the universe at some early stage after the big bang.

B or L nonconservation can also arise naturally at high energy scales, if the electroweak theory is embedded as the low-energy approximation in a grand unified model. The experimental lower limit of the proton lifetime, $\tau_p > 10^{33}$ years, from Kamiokande experiments, however, severely restricts acceptable models of this type (the simplest SU(5) model is already ruled out).

On the other hand, cosmological baryogenesis requires a sufficient amount of baryon number violation, at least at some stage of cosmological expansion. Detailed analyses suggest that the standard electroweak transition might not in itself explain the baryon number $n_B/n_\gamma \sim 10^{-10}$ observed in the present universe. Recent observations of neutrino oscillations suggest that right-handed Majorana-type neutrino masses are present, which violate the lepton number L. In such a case it might be possible that the correct amount of baryon number excess would be generated through leptogenesis.


Global Fit 


Various relations exist at the tree level among the 
masses, scattering cross sections, decay rates, 
various asymmetries, etc., which can be read off 
or calculated from the formulas given earlier. 
These quantities receive corrections at higher orders, and the experimental checks of these modified relations provide precision tests of the model on the one hand and, on the other, possibly a hint of new physics if there is any discrepancy with the prediction. Very often the amplitudes of interest receive important contributions due to strong interactions, which are difficult to estimate.

The basic parameters of the model, apart from the Higgs mass, and fermion masses and mixing parameters, can be taken to be (1) the fine structure constant, $\alpha = 1/137.035\,999\,11(46)$; (2) the Fermi constant, $G_F = 1.166\,37 \times 10^{-5}\ \text{GeV}^{-2}$ (which can be determined from the muon lifetime); and (3) the Z-boson mass, $M_Z = 91.1876 \pm 0.0021$ GeV (observed directly at LEP). $M_W$ and $\sin^2\theta_W$ are then calculable numbers, in terms of these quantities, and depending on $m_t$ (measured independently by the CDF and DØ experiments at Fermilab) and on the unknown $M_H$.

Such precision tests of the GWS model are being made, combining the analyses of various decay rates and asymmetries in B-meson systems at B factories and in colliders, production and decays of Z and W bosons, elastic $\nu e$ or $\bar\nu e$ scatterings, elastic $\nu p$ or $\bar\nu p$ scatterings, deep inelastic lepton-nucleon (or deuteron) scatterings, the muon anomalous magnetic moment, atomic parity violation experiments, etc.

An overall fit to the data gives an excellent agreement, with the input parameters

$$M_H = 113^{+56}_{-40}\ \text{GeV}, \qquad m_t = 176.9 \pm 4.0\ \text{GeV}, \qquad \alpha_s(M_Z) = 0.1213 \pm 0.0018$$

For instance (in GeV),

$M_W = 80.390 \pm 0.018$ vs. $80.412 \pm 0.042$ (exp. value (LEP))
$\Gamma_Z = 2.4972 \pm 0.0012$ vs. $2.4952 \pm 0.0023$ (exp. value)

For $\sin^2\theta_W$ (defined in the so-called $\overline{\rm MS}$ scheme) all data give consistently the value

$$\sin^2\theta_W = 0.23120 \pm 0.00015$$

(a slightly larger value is reported by a $\nu N$ experiment at Fermilab).

The unitarity-triangle tests of the SM and the determination of the CKM matrix have already been mentioned. The results of the global fit can be summarized in Figure 1, and by the angles

$$s_{12} = 0.2243 \pm 0.0016, \quad s_{23} = 0.0413 \pm 0.0015, \quad s_{13} = 0.0037 \pm 0.0005, \quad \delta_{13} = 60^\circ \pm 14^\circ \qquad [16]$$


For the muon anomalous gyromagnetic ratio $(g - 2)$, the experimental value

$$a_\mu^{\rm exp} = \frac{g_\mu - 2}{2} = (1\,165\,920.37 \pm 0.78) \times 10^{-9}$$

is to be compared with the theoretical prediction

$$a_\mu^{\rm th} = (1\,165\,918.83 \pm 0.49) \times 10^{-9}$$

which is slightly smaller ($1.9\sigma$), where the largest theoretical uncertainty comes from the two-loop hadronic contribution $a_\mu^{\rm had} \simeq (69.63 \pm 0.72) \times 10^{-9}$
(the QED corrections to O(a”) are included). 

For further details of the analyses and the present status of experimental tests of the electroweak theory, see the reviews by J Erler and P Langacker, and by F J Gilman et al., cited in “Further reading” (most of the numbers cited here come from these two reviews).


Need for Extension of the Model 


In spite of such an impressive experimental confirmation, there are reasons to believe that the electroweak theory, in its standard minimal form, is not the complete story. As already mentioned, neutrino oscillations, predicted earlier by Pontecorvo, have recently been experimentally confirmed, giving uncontroversial evidence for nonvanishing neutrino masses and their mixing. This is a clear signal that the theory must be extended. If the mass is instead taken in the form of eqn [11] but with three neutrino families, the diagonalization in general yields a mixing for the light neutrinos, as for the quarks. Some of the experimental data on the neutrinos are summarized in Table 7.

In addition, the Higgs sector of the theory (the part of the interactions responsible for the spontaneous breaking $SU_L(2) \times U(1) \to U_{\rm EM}(1)$) is still largely untested. The theory predicts a physical scalar particle, the Higgs particle, of unknown mass. The present-day expectation for its mass, which combines the experimental lower limit and an indirect upper limit following from the analysis of various radiative corrections, is

$$114\ \text{GeV} < m_H < 250\ \text{GeV}$$

This particle should be observable either at the Tevatron at Fermilab or in the coming LHC experiments at CERN; negative results would force upon us a substantial modification of the electroweak theory.


Table 7 Neutrino mass square differences and mixing

$\nu_e \quad \nu_\mu \quad \nu_\tau$

$\Delta_{12} m^2 = (6 - 9) \times 10^{-5}\ \text{eV}^2$
$\Delta_{23} m^2 = (1 - 3) \times 10^{-3}\ \text{eV}^2$

Solar neutrino and reactor (SNO, SuperKamiokande, KamLAND) experiments give the first result. Atmospheric neutrino data and the long baseline experiment (SuperKamiokande, K2K) provide the second. The mixing angle relevant to the solar and reactor neutrino oscillation is large, $\tan^2\theta_{12} \simeq 0.40^{+0.10}_{-0.07}$, while the one related to the atmospheric neutrino data is maximal, $\sin^2 2\theta_{23} \simeq 1$. Cosmological considerations give $\sum_i m_{\nu_i} < O(1\ \text{eV})$.

Last, but not least, there are a few theoretical 
motivations for an extension of the model to be 
considered necessary. First, the structure of the GWS 
theory is not entirely determined by the gauge 
principle. The form of the Higgs self-interactions, 
as well as their number and the Yukawa couplings 
of the Higgs scalar to the fermions, are uncon- 
strained by any principle, and the particular, 
minimal form assumed by Weinberg and Salam is 
yet to be confirmed experimentally. 

In addition, the theory is not really a unified gauge theory: the $SU_L(2)$ and $U(1)$ gauge couplings are distinct. One possibility is that the $SU(3)_{\rm QCD} \times SU_L(2) \times U(1)$ theory of the SM is actually a low-energy manifestation of a truly unified gauge theory, a grand unified theory (GUT), defined at some higher mass scale. The simplest versions of GUT models based on SU(5) or SO(10) gauge groups have, however, a difficulty with the proton decay rates, and with the coupling-constant unification itself. Supersymmetric GUTs appear to be more acceptable both from the coupling-constant unification and from the proton lifetime constraints.

A more subtle, but perhaps more severe, theoretical problem is the so-called naturalness problem. At the quantum level, due to the quadratic divergences in the scalar mass, the structure of the theory turns out to be quite peculiar. If the ultraviolet cutoff of the theory is taken to be the Planck mass scale, $\Lambda_{\rm UV} \sim m_{\rm Pl} \sim 10^{19}$ GeV, at which gravity becomes strongly coupled, the theory at $\Lambda_{\rm UV}$ would have to possess parameters which are fine-tuned with an excessive precision. The problem is also known as the “hierarchy” problem.

A way to avoid such a difficulty is to introduce supersymmetry. In a supersymmetric version of the standard theory (in fact, there are phenomenologically well-acceptable models such as the minimal supersymmetric standard model (MSSM)), this problem is absent due to the cancellation of bosonic and fermionic loop contributions typical of supersymmetric theories. As a result, the properties of the theory at low energies are much less sensitive to those of the theory at the Planck mass scale. Experiments at the LHC (expected to be performed after 2008, CERN) should be able to produce a whole set of new particles associated with supersymmetry, if this is a part of the physical law beyond TeV energies.

At a deeper level, however, the hierarchy problem in a more general sense persists, even in supersymmetric models: why should masses of the order of O(100 GeV) appear at all in a theory with a natural cutoff of the order of the Planck mass? Furthermore, if the masses of the neutrinos turn out
to be of the order of O(10-°-10°) eV, we are left 
with the problem of understanding the large disparities among the quark and lepton masses, spanning a range of more than 13 orders of magnitude: another “hierarchy” problem.

It is also possible that the spacetime the physical world lives in is actually higher dimensional: the usual four-dimensional Minkowski spacetime times either compactified or uncompactified “extra dimensions.” In theories of this type, some of the difficulties mentioned above might find a natural solution. It remains to be seen whether a consistent theory of this type can be constructed that correctly accounts for the properties of the universe we inhabit.


Introduction


Motivation: A Model Problem


Many physical problems can be modeled by partial differential equations. Let us consider, for example, the case of an elastic membrane $\Omega$, with fixed boundary $\Gamma$, subject to pressure forces $f$. The vertical membrane displacement is represented by a real-valued function $u$, which solves the equation

$$-\Delta u(x) = f(x), \qquad x = (x_1, x_2) \in \Omega \qquad [1]$$

where the Laplace operator $\Delta$ is defined, in two dimensions, by

$$\Delta u = \frac{\partial^2 u}{\partial x_1^2} + \frac{\partial^2 u}{\partial x_2^2}$$

As the membrane is glued to the curve $\Gamma$, $u$ satisfies the condition

$$u(x) = 0, \qquad x \in \Gamma \qquad [2]$$

The system [1]-[2] is the homogeneous Dirichlet problem for the Laplace operator. It enters the more general framework of (linear) elliptic boundary value problems, which consist of a (linear) partial differential equation (in the example above, of order two: the highest order in the derivatives) inside an open set $\Omega$ of the whole space $\mathbb{R}^N$, satisfying some “elliptic” property, completed by (linear) conditions on the boundary $\Gamma$ of $\Omega$, called “boundary conditions.” In the sequel, we only consider the linear case.

Our aim is to answer the following questions: Does this problem admit a solution? In which space? Is this solution unique? Does it depend continuously on the given data $f$? In case of positive answers, we say that the problem is “well posed” in the Hadamard sense. But other questions can also be raised, such as the sign of the solution, for example, or its regularity. We give a full survey of linear elliptic problems in a bounded or in an exterior domain with a sufficiently smooth boundary and in the whole space. In the general theory of the elliptic problem, we consider only smooth coefficients. We survey the standard theory, which can be found in several well-known monographs of the 1960s. The new trend in the investigation of elliptic problems is to consider more general domains with nonsmooth boundaries and nonsmooth coefficients. On the other hand, the regularity results for elliptic systems have not been improved during the last 30 years. New trends also require the use of more general function spaces and a more general functional background.

The number of references (see “Further reading” section) is strictly limited here; we list only some of the most important publications. The basic facts can usually be found in several places and sometimes we do not mention the particular reference. Among the very basic references are Friedman (1969), Gilbarg and Trudinger (1977), Dautray and Lions (1988), Hörmander (1964), Ladyzhenskaya and Uraltseva (1968), Lions and Magenes (1968), Renardy and Rogers (1992), and Weinberger (1965); of course, there are many others.


The Method 


To answer the above questions, we generally use, for such elliptic problems, an approach based on what is called a “variational formulation” (see the section “Variational approach”): the boundary-value problem is first transformed into a variational problem of lower order, which is solved in a Hilbertian frame with the help of the Lax–Milgram theorem (based on the representation theorem). All questions are then solved (e.g., existence, uniqueness, continuity in terms of the data, regularity). But this variational formalism does not necessarily allow one to treat all situations, and it is limited to the Hilbertian case. Other strategies can then be developed, based on a priori estimates and duality arguments for the existence problem, or on the maximum principle for the question of uniqueness. One should also not forget the particular cases where an explicit Green kernel is computable (e.g., the Laplacian operator in the whole space case).

Moreover, the study of linear elliptic equations is directly linked to the background of function spaces. It is the reason why we first deal with Sobolev spaces, both of integer and fractional order, and we survey their basic properties, imbedding and trace theorems. We pay attention to the Riesz and Bessel potentials and we define weighted Sobolev spaces important in the context of unbounded opens. Second, we present the variational approach and the Lax–Milgram theorem as a key point to solve a large class of boundary-value problems. We give examples: the Dirichlet and Neumann problems for the Poisson equation, the Newton problem for more general second-order operators; we also investigate mixed boundary conditions and present an example of a problem of fourth order. Then, we briefly present the arguments for studying general elliptic problems and concentrate on second-order elliptic problems; we recall the weak and strong maximum principle, formulate the Fredholm alternative and tackle the regularity questions. Moreover, we are interested in the existence and uniqueness of the solution of the Laplace equation in the whole space and in exterior opens. Finally, we present some particular examples arising from physical problems, either in fluid mechanics (the Stokes system) or in elasticity.


Sobolev and Other Types of Spaces 


Throughout, $\Omega \subset \mathbb{R}^N$ will generally be an open subset of the N-dimensional Euclidean space $\mathbb{R}^N$. A domain will be an open and connected subset of $\mathbb{R}^N$. We shall use standard notations for the spaces $L^p(\Omega)$, $C^\infty(\Omega)$, etc., and their norms. Let us agree that $C^{k,r}(\Omega)$, $k \in \mathbb{N}$, $r \in (0,1)$, denotes the space of functions $f$ in $C^k(\Omega)$ whose derivatives $D^\alpha f$, $\alpha = (\alpha_1, \ldots, \alpha_N) \in \mathbb{N}^N$, of order $|\alpha| = \sum_i \alpha_i = k$ are all $r$-Hölder continuous. In the notations for some of these spaces, by $\bar\Omega$ we mean that the functions have the corresponding property on $\Omega$ and that they can be continuously extended to $\bar\Omega$.

Let us recall several fundamental concepts. The space $D(\Omega)$ of the test functions in $\Omega$ consists of all infinitely differentiable $\varphi$ with a compact support in $\Omega$. A locally convex topology can be introduced here. The elements of the dual space $D'(\Omega)$ are called the distributions. If $f \in L^1_{\rm loc}(\Omega)$ (i.e., $f \in L^1(K)$ for all compact subsets $K$ of $\Omega$), then $f$ is a regular distribution; the duality is represented by $\int_\Omega f(x)\varphi(x)\,dx$. If $f \in D'(\Omega)$, we define the distributional or the weak derivative $D^\alpha f$ of $f$ as the distribution $\varphi \mapsto (-1)^{|\alpha|} \langle f, D^\alpha\varphi \rangle$. Plainly, if $f \in L^1_{\rm loc}$ has “classical” partial derivatives in $L^1_{\rm loc}$, then they coincide with the corresponding weak derivatives.

If $\Omega = \mathbb{R}^N$, it is sometimes more suitable to work with the tempered distributions. The role of $D(\Omega)$ is played by the space $S(\mathbb{R}^N)$ of $C^\infty$-functions with finite pseudonorms $\sup_x |D^\alpha f(x)|(1 + |x|)^k$, $|\alpha|, k = 0, 1, 2, \ldots$. Recall that the Fourier transform $F$ maps $S(\mathbb{R}^N)$ into itself and the same is true for the space of the tempered distributions $S'(\mathbb{R}^N)$.


Sobolev Spaces of Positive Order 


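The Sobolev space $W^{k,p}(\Omega)$, $1 \le p \le \infty$, $k \in \mathbb{N}$, is the space of all $f \in L^p(\Omega)$ whose weak derivatives up to order $k$ are regular distributions belonging to $L^p(\Omega)$; in $W^{k,p}(\Omega)$ we introduce the norm

$$\|f\|_{W^{k,p}(\Omega)} = \Big( \sum_{|\alpha| \le k} \int_\Omega |D^\alpha f(x)|^p\, dx \Big)^{1/p} \qquad [3]$$

when $p < \infty$ and $\max_{|\alpha| \le k} \operatorname{ess\,sup}_{x \in \Omega} |D^\alpha f(x)|$ if $p = \infty$. The space $W^{k,p}(\Omega)$ is a Banach space, separable for $p < \infty$ and reflexive for $1 < p < \infty$; it is a Hilbert space for $p = 2$, more simply denoted $H^k(\Omega)$. In the following, we shall consider only the range $p \in (1, \infty)$.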

The link with the classical derivatives is given by this well-known fact: a function $f$ belongs to $W^{1,p}(\Omega)$ if and only if it is a.e. equal to a function $\tilde u$, absolutely continuous on almost all line segments in $\Omega$ parallel to the coordinate axes, whose (classical) derivatives belong to $L^p(\Omega)$ (the Beppo Levi theorem).

For $1 < p < \infty$ and noninteger $s > 0$ the Sobolev space $W^{s,p}(\Omega)$ of order $s$ is defined as the space of all $f$ with the finite norm

$$\|f\|_{W^{s,p}(\Omega)} = \Big( \|f\|^p_{W^{[s],p}(\Omega)} + \sum_{|\alpha| = [s]} \int_\Omega \int_\Omega \frac{|D^\alpha f(x) - D^\alpha f(y)|^p}{|x - y|^{N + (s - [s])p}}\, dx\, dy \Big)^{1/p}$$

where $[s]$ is the integer part of $s$ (for details, see, e.g., Adams and Fournier (2003) and Ziemer (1989)).


Imbedding Theorems 


One of the most useful and important features of the functions in Sobolev spaces is an improvement of their integrability properties and the compactness of various imbeddings. Theorems of this type were first proved by Sobolev and Kondrashev. Let us agree that the symbols $\hookrightarrow$ and $\hookrightarrow\hookrightarrow$ stand for an imbedding and for a compact imbedding, respectively.


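Theorem 1 Let $\Omega$ be a Lipschitz open. Then

(i) If $sp < N$, then $W^{s,p}(\Omega) \hookrightarrow L^{p^*}(\Omega)$ with $p^* = Np/(N - ps)$ (the Sobolev exponent). If $|\Omega| < \infty$, then the target space is any $L^r(\Omega)$ with $0 < r \le p^*$. If $\Omega$ is bounded, then $W^{s,p}(\Omega) \hookrightarrow\hookrightarrow L^q(\Omega)$ for all $1 \le q < p^*$.

(ii) If $sp > N$, then $W^{j+s,p}(\Omega) \hookrightarrow C^j(\bar\Omega)$ for $j = 0, 1, \ldots$. If $\Omega$ has the Lipschitz boundary, then $W^{j+s,p}(\Omega) \hookrightarrow C^{j,\mu}(\bar\Omega)$ for $j = 0, 1, \ldots$ and $\mu = s - N/p$. If $sp > N$, then $W^{j+s,p}(\Omega) \hookrightarrow\hookrightarrow C^j(\bar\Omega)$, $j = 0, 1, \ldots$, and $W^{j+s,p}(\Omega) \hookrightarrow\hookrightarrow W^{j,q}(\Omega)$ for all $1 \le q \le \infty$. If, moreover, $\Omega$ has the Lipschitz boundary, then the target space can be replaced by $C^{j,\mu}(\bar\Omega)$ provided $sp > N > (s-1)p$ and $0 < \mu < s - N/p$.


Note that if the imbedding $W^{s,p}(\Omega) \hookrightarrow L^q(\Omega)$ is compact for some $q \ge p$, then $|\Omega| < \infty$. Moreover, if $\limsup_{r \to \infty} |\{x \in \Omega;\ r \le |x| < r + 1\}| > 0$, then $W^{s,p}(\Omega) \hookrightarrow L^q(\Omega)$ cannot be compact.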
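
To make the role of the Sobolev exponent concrete, here is a small Python helper that evaluates $p^* = Np/(N - sp)$ with exact rational arithmetic. The function name `sobolev_exponent` is hypothetical, and returning `None` in the case $sp \ge N$ (where the formula does not apply) is merely a convention chosen for this sketch.

```python
from fractions import Fraction


def sobolev_exponent(n: int, s, p):
    """Return p* = N p / (N - s p) if s p < N, otherwise None."""
    s, p = Fraction(s), Fraction(p)
    if s * p >= n:
        return None  # the formula applies only in the subcritical case s p < N
    return n * p / (n - s * p)


# H^1 = W^{1,2} in three dimensions: the critical exponent is 6.
print(sobolev_exponent(3, 1, 2))
# Fractional smoothness s = 1/2 with p = 2 in three dimensions.
print(sobolev_exponent(3, Fraction(1, 2), 2))
```

For instance, on a bounded Lipschitz open of $\mathbb{R}^3$ the imbedding of $W^{1,2}$ into $L^q$ is compact for $q < 6$, while $q = 6$ is the borderline, continuous case.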


Traces and Sobolev Spaces of Negative Order 


Let $s > 0$ and let $\Omega$ be, for simplicity, a bounded open subset of $\mathbb{R}^N$ with boundary $\Gamma$ of class $C^{[s],1}$. Then with the help of local coordinates, we can define Sobolev spaces $W^{s,p}(\Gamma)$ (also denoted $H^s(\Gamma)$ for $p = 2$) on $\Gamma = \partial\Omega$ (see, e.g., Nečas (1967) and Adams and Fournier (2003) for details). If $f \in C(\bar\Omega)$, then $f|_\Gamma$ has sense. Introducing the space $D(\bar\Omega)$ of restrictions to $\Omega$ of functions in $D(\mathbb{R}^N)$, one can show that if $f \in D(\bar\Omega)$, we have $\|f|_\Gamma\|_{W^{1-1/p,p}(\Gamma)} \le C \|f\|_{W^{1,p}(\Omega)}$ so that, in view of the density of $D(\bar\Omega)$ in $W^{1,p}(\Omega)$, the restriction of $f$ to $\Gamma$ can be uniquely extended to the whole $W^{1,p}(\Omega)$. The result is the bounded trace operator $\gamma_0 : W^{1,p}(\Omega) \to W^{1-1/p,p}(\Gamma)$. Moreover, every $g \in W^{1-1/p,p}(\Gamma)$ can be extended to a (nonunique) function $f \in W^{1,p}(\Omega)$ and this extension operator is bounded with respect to the corresponding norms.

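More generally, let us suppose $\Gamma$ is of class $C^{k-1,1}$ and define the operator $\mathrm{Tr}_k$ for any $f \in D(\bar\Omega)$ by

$$\mathrm{Tr}_k f = (\gamma_0 f, \gamma_1 f, \ldots, \gamma_{k-1} f)$$

where

$$\gamma_j f(x) = \frac{\partial^j f}{\partial n^j}(x) = \sum_{|\alpha| = j} \frac{j!}{\alpha!}\, \frac{\partial^\alpha f(x)}{\partial x^\alpha}\, n^\alpha, \qquad x \in \Gamma$$

is the $j$th-order derivative of $f$ with respect to the outer normal $n$ at $x \in \Gamma$; by density, this operator can be uniquely extended to a continuous linear mapping defined on the space $W^{k,p}(\Omega)$; moreover, $\gamma_0(W^{k,p}(\Omega)) = W^{k-1/p,p}(\Gamma)$.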

The kernel of this mapping is the space $W_0^{k,p}(\Omega)$ (denoted by $H_0^k(\Omega)$ for $p = 2$), where $W_0^{s,p}(\Omega)$ is defined as the closure of $D(\Omega)$ in $W^{s,p}(\Omega)$ ($s > 0$). For $1 < p < \infty$, the following holds: $W_0^{s,p}(\mathbb{R}^N) = W^{s,p}(\mathbb{R}^N)$, and $W_0^{s,p}(\Omega) = W^{s,p}(\Omega)$ provided $0 < s \le 1/p$. If $s < 0$, then the space $W^{s,p}(\Omega)$ is defined as the dual to $W_0^{-s,p'}(\Omega)$, where $p' = p/(p-1)$ (see, e.g., Triebel (1978, 2001)). Observe that, for an arbitrary $\Omega$, a function $f \in W^{1,p}(\Omega)$ has the zero trace if and only if $f(x)/\mathrm{dist}(x, \Gamma)$ belongs to $L^p(\Omega)$.

For $p = 2$, we simply denote by $H^{-s}(\Omega)$ the dual space of $H_0^s(\Omega)$. In the case of bounded opens, we recall the following useful Poincaré–Friedrichs inequality (for simplicity, we state it here in the Hilbert frame):


Theorem 2 Let $\Omega$ be bounded (at least in one direction of the space). Then there exists a positive constant $C_P(\Omega)$ such that

$$\|v\|_{L^2(\Omega)} \le C_P(\Omega)\, \|\nabla v\|_{L^2(\Omega)} \quad \text{for all } v \in H_0^1(\Omega) \qquad [4]$$
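
As an illustration in the simplest setting, the following Python sketch evaluates the ratio $\|v\|_{L^2}/\|v'\|_{L^2}$ on the unit interval $(0,1)$ for two functions vanishing at both endpoints. The one-dimensional setting, the midpoint quadrature and the function names are assumptions made for this example. On this interval the sharp constant is known to be $1/\pi$, attained by $v(x) = \sin(\pi x)$.

```python
import math


def l2_norm(func, n: int = 20000) -> float:
    """Midpoint-rule approximation of the L^2(0, 1) norm of func."""
    h = 1.0 / n
    return math.sqrt(h * sum(func((i + 0.5) * h) ** 2 for i in range(n)))


def poincare_ratio(v, dv) -> float:
    """Ratio ||v|| / ||v'|| in L^2(0, 1) for v with v(0) = v(1) = 0."""
    return l2_norm(v) / l2_norm(dv)


BEST_CONSTANT = 1.0 / math.pi  # sharp constant on the unit interval

ratio_parabola = poincare_ratio(lambda x: x * (1 - x), lambda x: 1 - 2 * x)
ratio_sine = poincare_ratio(lambda x: math.sin(math.pi * x),
                            lambda x: math.pi * math.cos(math.pi * x))
print(ratio_parabola, ratio_sine, BEST_CONSTANT)
```

Any admissible $v$ gives a ratio no larger than $C_P$; the parabola comes close to the bound and the sine function reaches it.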


The Whole-Space Case: Riesz and 
Bessel Potentials 


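The Riesz potentials $I_\alpha$ naturally occur when one defines the formal powers of the Laplace operator $\Delta$. Namely, if $f \in S(\mathbb{R}^N)$ and $\alpha > 0$, then

$$F\left[(-\Delta)^{\alpha/2} f\right](\xi) = |\xi|^\alpha\, F f(\xi)$$

This can be taken formally as a definition of the Riesz potential $I_\alpha$ on $S'(\mathbb{R}^N)$,

$$I_\alpha f(\cdot) = F^{-1}\left[|\xi|^{-\alpha}\, F f(\xi)\right](\cdot)$$

for any $\alpha \in \mathbb{R}$. If $0 < \alpha < N$, then $I_\alpha f(x) = (I_\alpha * f)(x)$, where $I_\alpha$ is the inverse Fourier transform of $|\xi|^{-\alpha}$,

$$I_\alpha(x) = C_\alpha |x|^{\alpha - N}, \qquad C_\alpha = \Gamma((N - \alpha)/2)\left(\pi^{N/2}\, 2^\alpha\, \Gamma(\alpha/2)\right)^{-1}$$

where $\Gamma$ is the Gamma function and $I_\alpha$ is the Riesz kernel. The following formula is also true:

$$I_\alpha(x) = c_\alpha^{-1} \int_0^\infty t^{(\alpha - N)/2}\, e^{-\pi|x|^2/t}\, \frac{dt}{t}, \qquad c_\alpha = (4\pi)^{\alpha/2}\, \Gamma(\alpha/2)$$

Recall that every $f \in S(\mathbb{R}^N)$ can be represented as the Riesz potential $I_\alpha g$ of a suitable function $g \in S(\mathbb{R}^N)$, namely $g = (-\Delta)^{\alpha/2} f$; we get the representation formula

$$f(x) = I_\alpha g(x) = C_\alpha \int_{\mathbb{R}^N} \frac{g(y)}{|x - y|^{N - \alpha}}\, dy$$

The standard density argument then implies an appropriate statement for functions in $W^{k,p}(\mathbb{R}^N)$ with an integer $k$ and for the Bessel potential spaces $H^{s,p}(\mathbb{R}^N)$ (see below for their definition). The original Sobolev imbedding theorem comes from the combination of this representation and the basic continuity property of $I_\alpha$, $\alpha p < N$,

$$I_\alpha : L^p(\mathbb{R}^N) \to L^q(\mathbb{R}^N), \qquad \frac{1}{q} = \frac{1}{p} - \frac{\alpha}{N}$$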
To get an isomorphic representation of a Bessel potential space (of a Sobolev space with positive integer smoothness in particular) it is more convenient to consider the Bessel potentials (of order $\alpha \in \mathbb{R}$),

$$G_\alpha f(x) = (G_\alpha * f)(x) = F^{-1}\left[(1 + |\xi|^2)^{-\alpha/2}\, F f(\xi)\right](x)$$

(with a slight abuse of the notations); the following formula for the Bessel kernel $G_\alpha$ is well known:

$$G_\alpha(x) = c_\alpha^{-1} \int_0^\infty t^{(\alpha - N)/2}\, e^{-\pi|x|^2/t - t/(4\pi)}\, \frac{dt}{t}$$

(cf. the analogous formula for $I_\alpha$), where $c_\alpha = (4\pi)^{\alpha/2}\, \Gamma(\alpha/2)$. The kernels $G_\alpha$ can alternatively be expressed with the help of Bessel or Macdonald functions.

Now we can define the Bessel potential spaces. For $s \in \mathbb{R}$ and $1 < p < \infty$, let $H^{s,p}(\mathbb{R}^N)$ be the space of all $f \in S'(\mathbb{R}^N)$ with the finite norm

$$\|f\|_{H^{s,p}(\mathbb{R}^N)} = \left( \int_{\mathbb{R}^N} \left| F^{-1}\left[(1 + |\xi|^2)^{s/2}\, F f(\xi)\right](x) \right|^p dx \right)^{1/p}$$

In other words, the spaces $H^{s,p}(\mathbb{R}^N)$ are isomorphic copies of $L^p(\mathbb{R}^N)$.

For $k = 0, 1, 2, \ldots$, plainly $H^{k,2}(\mathbb{R}^N) = W^{k,2}(\mathbb{R}^N)$ by virtue of the Plancherel theorem. But it is true also for integer $s$ and general $1 < p < \infty$ (see, e.g., Triebel (1978)).


Remark 3 A much more comprehensive theory of general Besov and Lizorkin-Triebel spaces in $\mathbb{R}^N$ has been established in the last decades, relying on the Littlewood–Paley theory. Spaces on opens can be defined as restrictions of functions in the corresponding space on the whole $\mathbb{R}^N$, which allows one to derive their properties from those valid for functions on $\mathbb{R}^N$. The justification for this is provided by extension theorems. In particular, there exists a universal extension operator for the Lipschitz open, working for all the spaces mentioned up to now. We refer to Triebel (1978, 2001).


Unbounded Opens and Weighted Spaces 
The study of elliptic problems in unbounded opens is usually carried out with the use of suitable weighted Sobolev spaces. The Poisson equation

$$-\Delta u = f \quad \text{in } \mathbb{R}^N, \quad N \ge 2 \qquad [5]$$

is the typical example; the Poincaré inequality [4] is not true here and it is suitable to introduce Sobolev spaces with weights.

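Let $m \in \mathbb{N}$, $1 < p < \infty$, $\alpha \in \mathbb{R}$, $k = m - N/p - \alpha$ if $N/p + \alpha \in \{1, \ldots, m\}$ and $k = -1$ elsewhere. For an open $\Omega \subset \mathbb{R}^N$, we define

$$W^{m,p}_\alpha(\Omega) = \{ v \in D'(\Omega);\ 0 \le |\lambda| \le k,\ \rho^{\alpha - m + |\lambda|} (\log\rho)^{-1} D^\lambda v \in L^p(\Omega);\ k + 1 \le |\lambda| \le m,\ \rho^{\alpha - m + |\lambda|} D^\lambda v \in L^p(\Omega) \}$$

where $\rho(x) = (1 + |x|^2)^{1/2}$. Note that $W^{m,p}_\alpha(\Omega)$ is a reflexive Banach space for the norm $\|\cdot\|_{W^{m,p}_\alpha(\Omega)}$ defined by

$$\|u\|_{W^{m,p}_\alpha(\Omega)} = \Big( \sum_{0 \le |\lambda| \le k} \|\rho^{\alpha - m + |\lambda|} (\log\rho)^{-1} D^\lambda u\|^p_{L^p(\Omega)} + \sum_{k + 1 \le |\lambda| \le m} \|\rho^{\alpha - m + |\lambda|} D^\lambda u\|^p_{L^p(\Omega)} \Big)^{1/p}$$

We also introduce the following seminorm:

$$|u|_{W^{m,p}_\alpha(\Omega)} = \Big( \sum_{|\lambda| = m} \|\rho^{\alpha} D^\lambda u\|^p_{L^p(\Omega)} \Big)^{1/p}$$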

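Let

$$\mathring{W}^{m,p}_\alpha(\Omega) = \{ v \in W^{m,p}_\alpha(\Omega);\ \gamma_0(v) = \cdots = \gamma_{m-1}(v) = 0 \}$$

If $\Omega$ is a Lipschitz domain, then $\mathring{W}^{m,p}_\alpha(\Omega)$ is the closure of $D(\Omega)$ in $W^{m,p}_\alpha(\Omega)$, while $D(\bar\Omega)$ is dense in $W^{m,p}_\alpha(\Omega)$. We denote by $W^{-m,p'}_{-\alpha}(\Omega)$ the dual of $\mathring{W}^{m,p}_\alpha(\Omega)$ ($p' = p/(p-1)$). We note that these spaces also contain polynomials,

$$P_j \subset W^{m,p}_\alpha(\Omega), \qquad j = \left[m - \frac{N}{p} - \alpha\right] \ \text{if } \frac{N}{p} + \alpha \notin \{i \in \mathbb{Z};\ i \le 0\}, \qquad j = m - \frac{N}{p} - \alpha - 1 \ \text{elsewhere}$$

where $[s]$ is the integer part of $s$ and $P_{[s]} = \{0\}$ if $[s] < 0$. The fundamental property of functions belonging to these spaces is that they satisfy the Poincaré weighted inequality. An open $\Omega$ is an exterior domain if it is the complement of the closure of a bounded domain in $\mathbb{R}^N$.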


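Theorem 4 Suppose that $\Omega$ is an exterior domain
or $\Omega = \mathbb{R}^N_+$ or $\Omega = \mathbb{R}^N$. Then

(i) the seminorm $|\cdot|_{W^{m,p}_\alpha(\Omega)}$ is a norm on $W^{m,p}_\alpha(\Omega)/\mathcal{P}_{j'}$,
equivalent to the quotient norm, with
$j' = \min(m - 1, j)$;

(ii) the seminorm $|\cdot|_{W^{m,p}_\alpha(\Omega)}$ is equivalent to the full
norm on $\mathring{W}^{m,p}_\alpha(\Omega)$.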


Variational Approach 


Let us first describe the method on the model problem
[1]-[2], supposing $f \in L^2(\Omega)$ and $\Omega$ bounded. We first
suppose that this problem admits a sufficiently smooth
solution $u$. Let $v$ be an arbitrary (smooth) function;
we multiply eqn [1] by $v(x)$ and integrate with respect
to $x$ over $\Omega$; this gives

$$-\int_\Omega (\Delta u\, v)(x)\,dx = \int_\Omega (f v)(x)\,dx$$

Using the following Green's formula ($d\sigma(x)$ denotes
the measure on $\Gamma = \partial\Omega$ and $\partial u(x)/\partial n = \nabla u(x) \cdot n(x)$,
where $n(x)$ is the unit normal at the point $x$ of $\Gamma$
oriented towards the exterior of $\Omega$):

$$\int_\Omega (\Delta u\, v)(x)\,dx = -\int_\Omega (\nabla u \cdot \nabla v)(x)\,dx + \int_\Gamma \Big(\frac{\partial u}{\partial n}\, v\Big)(\sigma)\,d\sigma$$

we get, since $v|_\Gamma = 0$: $A(u, v) = L(v)$, where we
have set

$$A(u, v) = \int_\Omega \nabla u(x) \cdot \nabla v(x)\,dx$$ [7]

$$L(v) = \int_\Omega f(x) v(x)\,dx$$

The idea is to study this new problem instead
(showing first its equivalence with the boundary-value
problem), noting that it makes sense for far
less regular functions $u$, $v$ (and also $f$), in fact for $u, v \in H^1_0(\Omega)$
(and $f \in H^{-1}(\Omega)$).


The Lax-—Milgram Theorem 
The general form of a variational problem is 


to find u € V such that 
A(u,v) = L(v) for all v € V [8] 


where $V$ is a Hilbert space, $A$ a continuous bilinear
form defined on $V \times V$ and $L$ a continuous linear form
defined on $V$. We say, moreover, that $A$ is $V$-elliptic if
there exists a positive constant $\alpha$ such that

$$A(u, u) \ge \alpha \|u\|_V^2 \quad \text{for all } u \in V$$ [9]

The following theorem is due to Lax and Milgram.

Theorem 5 Let $V$ be a Hilbert space. We suppose
that $A$ is a continuous bilinear form on $V \times V$ which
is $V$-elliptic and that $L$ is a continuous linear form
on $V$. Then the variational problem [8] has a unique
solution $u$ in $V$. Moreover, if $A$ is symmetric, $u$ is
characterized as the unique minimizer on $V$ of the
quadratic functional $E$ defined by

$$\text{for all } v \in V, \quad E(v) = \tfrac{1}{2} A(v, v) - L(v)$$ [10]


Remark 6

(i) We have the following "energy estimate":
$\|u\|_V \le \frac{1}{\alpha} \|L\|_{V'}$, where $V'$ is the dual space of $V$. In
the particular case of our model problem, this
inequality shows the continuity of the solution $u \in H^1_0(\Omega)$
with respect to the data $f \in L^2(\Omega)$ (an assumption that can
be weakened by choosing $f \in H^{-1}(\Omega)$).

(ii) Theorem 5 can be extended to continuous sesquilinear
forms $A$ defined on $V \times V$; such a form
is called $V$-elliptic if there exists a positive constant
$\alpha$ such that

$$\operatorname{Re} A(u, u) \ge \alpha \|u\|_V^2 \quad \text{for all } u \in V$$ [11]

(iii) Denoting by $A$ the linear operator defined on
the space $V$ by $A(u, v) = \langle Au, v \rangle_{V', V}$ for all $v \in V$, the
Lax–Milgram theorem shows that $A$ is an isomorphism
from $V$ onto its dual space $V'$, and the problem [8]
is equivalent to solving the equation $Au = L$.

(iv) Let us make some remarks concerning the
numerical aspects. First, this variational formulation is
the starting point of the well-known finite element
method: the idea is to compute a solution of an
approximate variational problem stated on a finite-dimensional
subspace of $V$ (leading to the solution of a linear
system), with precise control of the error with respect to the exact
solution $u$. Second, the equivalence with a minimization
problem allows the use of other numerical algorithms.
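
As a minimal illustration of Remark 6(iv), the following sketch solves the one-dimensional model problem $-u'' = f$ on $(0, 1)$ with $u(0) = u(1) = 0$ by the Galerkin method on continuous piecewise-linear functions. The uniform mesh, the lumped quadrature used for the load vector, and the names `solve_poisson_p1`, `f` and `n` are assumptions made for this example only; they do not come from the text above.

```python
import numpy as np

def solve_poisson_p1(f, n):
    """Galerkin P1 finite elements for -u'' = f on (0, 1), u(0) = u(1) = 0."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    # Stiffness matrix A_ij = integral of phi_i' phi_j' for the interior hat functions.
    A = (np.diag(2.0 * np.ones(n - 1))
         - np.diag(np.ones(n - 2), 1)
         - np.diag(np.ones(n - 2), -1)) / h
    # Load vector L_i = integral of f phi_i, approximated here by h * f(x_i).
    b = h * f(x[1:-1])
    u = np.zeros(n + 1)
    u[1:-1] = np.linalg.solve(A, b)
    return x, u

# Manufactured example: f = pi^2 sin(pi x) has exact solution u = sin(pi x).
f = lambda x: np.pi**2 * np.sin(np.pi * x)
x, u = solve_poisson_p1(f, 64)
err = np.max(np.abs(u - np.sin(np.pi * x)))
print("maximum nodal error:", err)
```

The stiffness matrix is symmetric and positive definite, which is the discrete counterpart of the $V$-ellipticity of $A$ on the finite-dimensional subspace.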


Let us now present some classical examples of
second-order elliptic problems that can be solved
with the help of the variational theory.


The Dirichlet Problem for the Poisson Equation 


We consider the problem on a bounded Lipschitz
open $\Omega \subset \mathbb{R}^N$,

$$-\Delta u = f \quad \text{in } \Omega, \qquad u = u_0 \quad \text{on } \Gamma = \partial\Omega$$ [12]

with $u_0 \in H^{1/2}(\Gamma)$, so that there exists $U_0 \in H^1(\Omega)$
satisfying $\gamma_0(U_0) = u_0$. The variational formulation
of problem [12] is

to find $u \in U_0 + H^1_0(\Omega)$ such that
for all $v \in H^1_0(\Omega)$, $A(u, v) = L(v)$ [13]

with $A$ given by [7] and a more general $L$ with $f \in H^{-1}(\Omega)$,
defined by

$$L(v) = \langle f, v \rangle_{H^{-1}(\Omega), H^1_0(\Omega)}$$ [14]

The existence and uniqueness of a solution of [13]
follow from Theorem 5 (and the Poincaré inequality [4]).
Conversely, thanks to the density of $\mathcal{D}(\Omega)$ in $H^1_0(\Omega)$, we
can show that $u$ satisfies [12]. More precisely, we get:


Theorem 7 Let us suppose $f \in H^{-1}(\Omega)$ and $u_0 \in H^{1/2}(\Gamma)$;
let $U_0 \in H^1(\Omega)$ satisfy $\gamma_0(U_0) = u_0$. Then the
boundary-value problem [12] has a unique solution $u$
such that $u - U_0 \in H^1_0(\Omega)$. This is also the unique
solution of the variational problem [13]. Moreover,
there exists a positive constant $C = C(\Omega)$ such that

$$\|u\|_{H^1(\Omega)} \le C \big( \|f\|_{H^{-1}(\Omega)} + \|u_0\|_{H^{1/2}(\Gamma)} \big)$$ [15]

which shows that $u$ depends continuously on the
data $f$ and $u_0$.


Moreover, using the technique of Nirenberg's differential
quotients, we have the following regularity
result (see, e.g., Grisvard (1980)):

Theorem 8 Let us suppose that $\Omega$ is a bounded
open subset of $\mathbb{R}^N$ with a boundary of class $C^{1,1}$ and
let $f \in L^2(\Omega)$, $u_0 \in H^{3/2}(\Gamma)$. Then $u \in H^2(\Omega)$ and
each equation in [12] is satisfied almost everywhere
(on $\Omega$ for the first one and on $\Gamma$ for the boundary
condition). Moreover, there exists a positive constant
$C = C(\Omega)$ such that

$$\|u\|_{H^2(\Omega)} \le C \big[ \|f\|_{L^2(\Omega)} + \|u_0\|_{H^{3/2}(\Gamma)} \big]$$ [16]

By induction, if the data are more regular, that is,
$f \in H^k(\Omega)$ and $u_0 \in H^{k+3/2}(\Gamma)$ (with $k \in \mathbb{N}$), and if $\Gamma$
is of class $C^{k+1,1}$, we get $u \in H^{k+2}(\Omega)$.


Remark 9 Let us point out the importance of the
geometry of the open set. For example, if $\Omega$ is a bounded
plane polygon, one can find $u \in H^1_0(\Omega)$ with $\Delta u \in C^\infty(\bar\Omega)$
such that $u \notin H^{1+\pi/\omega}(\Omega)$, where $\omega$ is the
largest interior angle of the polygon. In
particular, if the polygon is not convex, the solution
of the Dirichlet problem [12] is in general not in $H^2(\Omega)$.


The Neumann Problem for the Poisson Equation 


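We consider the problem ($n$ is the unit outer normal
on $\Gamma$)

$$-\Delta u = f \quad \text{in } \Omega, \qquad \frac{\partial u}{\partial n} = h \quad \text{on } \Gamma$$ [17]

Setting $E(\Delta) = \{v \in H^1(\Omega);\ \Delta v \in L^2(\Omega)\}$, the space
$\mathcal{D}(\bar\Omega)$ is a dense subspace, and we have the following
Green formula for all $u \in E(\Delta)$ and $v \in H^1(\Omega)$:

$$\int_\Omega \Delta u(x)\, v(x)\,dx = -\int_\Omega \nabla u(x) \cdot \nabla v(x)\,dx + \Big\langle \frac{\partial u}{\partial n}, \gamma_0 v \Big\rangle_{H^{-1/2}(\Gamma), H^{1/2}(\Gamma)}$$

If $u \in H^1(\Omega)$ satisfies [17] with $f \in L^2(\Omega)$ and $h \in H^{-1/2}(\Gamma)$,
then for any function $v \in H^1(\Omega)$, we have,
by virtue of the above Green formula,

$$A(u, v) = L(v), \qquad L(v) = \int_\Omega f(x) v(x)\,dx + \langle h, \gamma_0 v \rangle_{H^{-1/2}(\Gamma), H^{1/2}(\Gamma)}$$

But here the form $A$ is not $H^1(\Omega)$-elliptic; in fact,
one can check that, if problem [17] has a solution,
then we necessarily have (take $v = 1$ above)

$$\int_\Omega f(x)\,dx + \langle h, 1 \rangle_{H^{-1/2}(\Gamma), H^{1/2}(\Gamma)} = 0$$ [18]

Moreover, we note that if $u$ is a solution, then
$u + C$, where $C$ is an arbitrary constant, is also a
solution. So the variational problem is not well
posed on $H^1(\Omega)$. It can, however, be solved in the
quotient space $H^1(\Omega)/\mathbb{R}$, which is a Hilbert space
for the quotient norm

$$\|\dot{v}\|_{H^1(\Omega)/\mathbb{R}} = \inf_{k \in \mathbb{R}} \|v + k\|_{H^1(\Omega)}$$ [19]

but also for the seminorm $\dot{v} \mapsto |v|_{H^1(\Omega)} = \sqrt{A(v, v)}$,
which is an equivalent norm on this quotient space
(see Nečas (1967)).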
Then, supposing that the data $f$ and $h$ satisfy the
"compatibility condition" [18], we can apply the
Lax–Milgram theorem to the variational problem

to find $\dot{u} \in V$ such that
$A(\dot{u}, \dot{v}) = L(\dot{v})$ for all $\dot{v} \in V$ [20]

with $V = H^1(\Omega)/\mathbb{R}$. We get the following result (see,
e.g., Nečas (1967)):

Theorem 10 Let us suppose that $\Omega$ is connected
and that the data $f \in L^2(\Omega)$ and $h \in H^{-1/2}(\Gamma)$ satisfy
[18]. Then the variational problem [20] has a unique
solution $\dot{u}$ in the space $H^1(\Omega)/\mathbb{R}$ and this solution is
continuous with respect to the data, that is, there
exists a positive constant $C = C(\Omega)$ such that

$$|u|_{H^1(\Omega)} \le C \big( \|f\|_{L^2(\Omega)} + \|h\|_{H^{-1/2}(\Gamma)} \big) \quad \text{for all } u \in \dot{u}$$

Moreover, if $\Gamma$ is of class $C^{1,1}$ and if the data
satisfy $f \in L^2(\Omega)$, $h \in H^{1/2}(\Gamma)$, then every $u \in \dot{u}$ is
such that $u \in H^2(\Omega)$ and it satisfies each equation in
[17] almost everywhere.
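
The role of the compatibility condition [18] can be seen on a small discrete analogue. The sketch below assembles the piecewise-linear stiffness matrix for $-u'' = f$ on $(0, 1)$ with Neumann data at both ends; constants lie in its kernel, and the linear system is solvable only when the load vector sums to zero, which is the discrete form of [18]. The one-dimensional setting, the trapezoidal quadrature and the helper names `neumann_system` and `least_squares_residual` are assumptions chosen for this illustration.

```python
import numpy as np

def neumann_system(f, h_left, h_right, n):
    """P1 stiffness matrix and load vector for -u'' = f on (0, 1), with
    outward normal derivatives h_left at x = 0 and h_right at x = 1."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    K = np.zeros((n + 1, n + 1))
    for e in range(n):  # assemble element by element
        K[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    w = np.full(n + 1, h)
    w[0] = w[-1] = h / 2.0  # trapezoidal weights for the load integral
    b = w * f(x)
    b[0] += h_left
    b[-1] += h_right
    return x, K, b

def least_squares_residual(K, b):
    """Norm of K u - b for a least-squares solution u of the singular system."""
    u = np.linalg.lstsq(K, b, rcond=None)[0]
    return np.linalg.norm(K @ u - b)

# Compatible data: f = pi^2 cos(pi x), h = 0 (solutions are cos(pi x) + C).
x, K, b_ok = neumann_system(lambda x: np.pi**2 * np.cos(np.pi * x), 0.0, 0.0, 32)
# Incompatible data: f = 1, h = 0, so the integral of f plus the boundary data is not 0.
_, _, b_bad = neumann_system(lambda x: np.ones_like(x), 0.0, 0.0, 32)

constants_in_kernel = np.allclose(K @ np.ones(33), 0.0)
res_ok = least_squares_residual(K, b_ok)
res_bad = least_squares_residual(K, b_bad)
print(constants_in_kernel, res_ok, res_bad)
```

For compatible data the residual vanishes up to rounding and the solution is determined only up to an additive constant, mirroring the passage to the quotient space $H^1(\Omega)/\mathbb{R}$.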


Problem with Mixed Boundary Conditions 


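Here we consider more general boundary conditions:
the Dirichlet conditions on a closed subset $\Gamma_1$
of $\Gamma = \partial\Omega$, and the Neumann, or more generally the
"Robin", conditions on the other part $\Gamma_2 = \Gamma - \Gamma_1$.

We seek $u$ such that ($f \in L^2(\Omega)$, $h \in L^2(\Gamma_2)$,
$a \in L^\infty(\Gamma_2)$)

$$-\Delta u = f \quad \text{in } \Omega, \qquad u = 0 \quad \text{on } \Gamma_1, \qquad a u + \frac{\partial u}{\partial n} = h \quad \text{on } \Gamma_2$$ [21]

Let $V = \{v \in H^1(\Omega);\ \gamma_0 v = 0 \text{ on } \Gamma_1\}$. Then [8] is the
variational formulation of this problem with

$$A(u, v) = \int_\Omega \nabla u(x) \cdot \nabla v(x)\,dx + \int_{\Gamma_2} (a\, \gamma_0 u\, \gamma_0 v)(\sigma)\,d\sigma$$

$$L(v) = \int_\Omega f(x) v(x)\,dx + \int_{\Gamma_2} (h\, \gamma_0 v)(\sigma)\,d\sigma$$

Supposing that $a \ge 0$ a.e., we get a unique
solution $u \in V$ for this variational problem by virtue
of the Lax–Milgram theorem. Moreover, if $u \in H^2(\Omega)$,
then $u$ is the unique solution in $H^2(\Omega) \cap V$
of the problem [21].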


The Newton Problem for More General Operators 


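Let $\Omega$ be a bounded open subset of $\mathbb{R}^N$. We now
consider more general second-order operators of the
form $Av = -\nabla \cdot (M \nabla v) + b \cdot \nabla v + c v$, where
$b \in [W^{1,\infty}(\Omega)]^N$, $c \in L^\infty(\Omega)$, $M$ is an $N \times N$ square
matrix with entries $M_{ij}$, and $\nabla \cdot (M \nabla v)$ stands for

$$\sum_{i,j=1}^N \frac{\partial}{\partial x_i} \Big[ M_{ij} \frac{\partial v}{\partial x_j} \Big]$$

We also assume that there is a positive constant $\alpha_M$
such that

$$\sum_{i,j=1}^N M_{ij}(x)\, \xi_i \xi_j \ge \alpha_M |\xi|^2 \quad \text{for a.e. } x \in \Omega \text{ and all } \xi = (\xi_1, \ldots, \xi_N) \in \mathbb{R}^N$$

For given data $f \in L^2(\Omega)$, $h \in L^2(\Gamma)$, we look for a
solution $u$ of the problem

$$-\nabla \cdot (M \nabla u) + b \cdot \nabla u + c u = f \quad \text{in } \Omega, \qquad a u + n \cdot (M \nabla u) = h \quad \text{on } \Gamma$$ [22]

We assume that $a \in L^\infty(\Gamma)$. The variational formulation
of this problem is still [8], with $V = H^1(\Omega)$
and

$$A(u, v) = \int_\Omega M \nabla u \cdot \nabla v\,dx + \int_\Omega [b \cdot \nabla u + c u]\, v\,dx + \int_\Gamma a\, \gamma_0 u\, \gamma_0 v\,d\sigma$$ [23]

$$L(v) = \int_\Omega f v\,dx + \int_\Gamma h\, \gamma_0 v\,d\sigma$$ [24]

If the conditions

$$c - \tfrac{1}{2} \nabla \cdot b \ge C_0 \ge 0 \quad \text{a.e. on } \Omega, \qquad a + \tfrac{1}{2}\, b \cdot n \ge C_1 \ge 0 \quad \text{a.e. on } \Gamma$$

are fulfilled, with $(C_0, C_1) \ne (0, 0)$, then the bilinear
form $A$ is $V$-elliptic and the Lax–Milgram theorem
applies.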


A Biharmonic Problem 


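We consider the Dirichlet problem for the following operator
of fourth order ($c \in L^\infty(\Omega)$):

$$\Delta^2 u + c u = f \quad \text{in } \Omega$$ [25]

$$u = u_0 \quad \text{on } \Gamma, \qquad \frac{\partial u}{\partial n} = h \quad \text{on } \Gamma$$ [26]

Theorem 11 Let us suppose that $\Omega$ has a boundary
of class $C^{1,1}$ and that the data satisfy $f \in H^{-2}(\Omega)$, $u_0 \in H^{3/2}(\Gamma)$,
$h \in H^{1/2}(\Gamma)$. Let $U_0 \in H^2(\Omega)$ be such that
$\gamma_0(U_0) = u_0$, $\gamma_1(U_0) = h$. Then, if $c \ge 0$ a.e. in $\Omega$, the
boundary-value problem [25]-[26] has a unique
solution $u$ such that $u - U_0 \in H^2_0(\Omega)$, and $u$ is also the
unique solution of the variational problem

to find $u \in U_0 + H^2_0(\Omega)$ such that
$A(u, v) = L(v)$ for all $v \in H^2_0(\Omega)$ [27]

where $L(v) = \langle f, v \rangle_{H^{-2}(\Omega), H^2_0(\Omega)}$ and

$$A(u, v) = \int_\Omega \Delta u(x)\, \Delta v(x)\,dx + \int_\Omega (c\, u\, v)(x)\,dx$$ [28]

Moreover, there exists a positive constant $C = C(\Omega)$
such that

$$\|u\|_{H^2(\Omega)} \le C \big[ \|f\|_{H^{-2}(\Omega)} + \|u_0\|_{H^{3/2}(\Gamma)} + \|h\|_{H^{1/2}(\Gamma)} \big]$$ [29]

which shows that $u$ depends continuously upon the
data $f$, $u_0$, and $h$.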


Remark 12 The choice of the Hilbert space $V$ is of crucial
importance for the $V$-ellipticity. Let us
consider, for example, the problem [25] with the boundary conditions

$$\Delta u = 0 \quad \text{on } \Gamma, \qquad \frac{\partial \Delta u}{\partial n} = 0 \quad \text{on } \Gamma$$ [30]

The associated bilinear form is not $V$-elliptic
for $V = H^2(\Omega)$, but it is $V$-elliptic for $V = \{v \in L^2(\Omega);\ \Delta v \in L^2(\Omega)\}$.


General Elliptic Problems 


Here $\Omega$ will be a bounded and sufficiently regular
open subset of $\mathbb{R}^N$. Let us consider a general linear
differential operator of the form

$$A(x, D)u = \sum_{|\mu| \le l} a_\mu(x)\, D^\mu u, \qquad a_\mu(x) \in \mathbb{C}$$ [31]

Setting $A_0(x, \xi) = \sum_{|\mu| = l} a_\mu(x)\, \xi^\mu$, we say that the
operator $A$ is elliptic at a point $x$ if $A_0(x, \xi) \ne 0$ for
all $\xi \in \mathbb{R}^N - \{0\}$. One can show that, if $N \ge 3$, $l$ is
even, that is, $l = 2m$; the same result holds for $N = 2$
if the coefficients $a_\mu$ are real. Moreover, for $N \ge 3$,
every elliptic operator is properly elliptic, in the
following sense: for any independent vectors $\xi$, $\xi'$ in
$\mathbb{R}^N$, the polynomial $\tau \mapsto A_0(x, \xi + \tau \xi')$ has $m$ roots
with positive imaginary part.
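
The ellipticity condition can be probed numerically for constant coefficients. Since $A_0$ is homogeneous in $\xi$, it is enough to look at unit vectors; the sketch below samples the unit circle in $\mathbb{R}^2$ and reports the smallest value of $|A_0(\xi)|$. Sampling is only a heuristic check, not a proof, and the dictionary encoding of the coefficients as well as the names `principal_symbol` and `min_abs_symbol_on_circle` are assumptions of this example.

```python
import numpy as np

def principal_symbol(coeffs, xi):
    """A_0(xi) = sum over |mu| = l of a_mu * xi^mu (constant coefficients).

    coeffs maps a multi-index tuple mu to the coefficient a_mu."""
    xi = np.asarray(xi, dtype=float)
    return sum(a * np.prod(xi ** np.asarray(mu)) for mu, a in coeffs.items())

def min_abs_symbol_on_circle(coeffs, samples=360):
    """Smallest |A_0(xi)| over equally spaced unit vectors in the plane."""
    angles = 2.0 * np.pi * np.arange(samples) / samples
    return min(abs(principal_symbol(coeffs, (np.cos(t), np.sin(t))))
               for t in angles)

# Top-order part of the Laplacian: d^2/dx1^2 + d^2/dx2^2.
laplacian = {(2, 0): 1.0, (0, 2): 1.0}
# Top-order part of the wave operator: d^2/dx1^2 - d^2/dx2^2.
wave = {(2, 0): 1.0, (0, 2): -1.0}

min_laplacian = min_abs_symbol_on_circle(laplacian)
min_wave = min_abs_symbol_on_circle(wave)
print(min_laplacian, min_wave)
```

The symbol of the Laplacian stays away from zero on the unit circle, whereas that of the wave operator vanishes in the direction $\xi = (1, 1)/\sqrt{2}$, so the latter is not elliptic.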

The aim here is to study boundary-value problems
of the following type:

$$Au = f \quad \text{in } \Omega$$ [32]

$$B_j u = g_j \quad \text{on } \Gamma, \quad j = 0, \ldots, m - 1$$ [33]

where $A$ is properly elliptic on $\bar\Omega$, with sufficiently
regular coefficients, and the operators $B_j$ are boundary
operators, of order $m_j \le 2m - 1$, that must
satisfy some compatibility conditions with respect to
the operator $A$ (see Renardy and Rogers (1992) for
details; these conditions were introduced by Agmon,
Douglis, and Nirenberg). For example, $A = (-1)^m \Delta^m$
and $B_j = \partial^j / \partial n^j$ is a convenient choice.

In order to show that problem [32]-[33] has a
solution $u \in H^{2m+r}(\Omega)$ ($r \in \mathbb{N}$), the idea is to show
that the operator $P$ defined by $u \mapsto P(u) = (Au, B_0 u, \ldots, B_{m-1} u)$
is an index operator from
$H^{2m+r}(\Omega)$ into $G = H^r(\Omega) \times \prod_{j=0}^{m-1} H^{2m+r-m_j-1/2}(\Gamma)$
and to express the compatibility conditions through
the adjoint problem.

We recall that a continuous linear operator $P$ is
an index operator if

(1) $\dim \operatorname{Ker} P < \infty$, and $\operatorname{Im} P$ is closed;
(2) $\operatorname{codim} \operatorname{Im} P < \infty$.

Then the index $\chi(P)$ is given by $\chi(P) = \dim \operatorname{Ker} P - \operatorname{codim} \operatorname{Im} P$.
We recall the following theorem of Peetre:

Theorem 13 Let $E$, $F$, and $G$ be three reflexive
Banach spaces such that $E$ is compactly embedded in $F$, and $P$ a
continuous linear operator from $E$ to $G$. Then condition
(1) is equivalent to: "there exists $C > 0$ such that
for all $u \in E$, we have $\|u\|_E \le C (\|Pu\|_G + \|u\|_F)$."

Applying this theorem to our problem [32]-[33],
condition (1) results from a priori estimates of the
following type:

$$\|u\|_{H^{2m+r}(\Omega)} \le C \big( \|Pu\|_G + \|u\|_{H^{2m+r-1}(\Omega)} \big)$$

where $G$ denotes the target space of $P$, and condition (2) follows
from similar a priori estimates for the dual problem.
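
In finite dimensions every linear map is an index operator, and the index depends only on the dimensions of the spaces. The following sketch computes $\dim \operatorname{Ker} P$, $\operatorname{codim} \operatorname{Im} P$ and $\chi(P)$ for a matrix $P$ acting from $\mathbb{R}^n$ to $\mathbb{R}^m$ by the rank-nullity theorem. The function name `index_of_matrix` and the rank tolerance are assumptions of this illustration.

```python
import numpy as np

def index_of_matrix(P, tol=1e-10):
    """Return (dim Ker P, codim Im P, index) for a real m x n matrix P."""
    m, n = P.shape
    r = np.linalg.matrix_rank(P, tol=tol)
    dim_ker = n - r    # rank-nullity theorem
    codim_im = m - r   # dimension of the quotient R^m / Im P
    return dim_ker, codim_im, dim_ker - codim_im

# A rank-one map from R^3 to R^2.
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
dim_ker, codim_im, chi = index_of_matrix(P)
print(dim_ker, codim_im, chi)
```

Here the index equals $n - m$ whatever the rank of $P$; this stability is what makes the index a useful invariant in the infinite-dimensional setting as well.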


Second-Order Elliptic Problems 


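We consider a second-order differential operator of
the "divergence form"

$$Au = -\sum_{i,j=1}^N \big( a^{ij}(x)\, u_{x_i} \big)_{x_j} + \sum_{i=1}^N b^i(x)\, u_{x_i} + c(x)\, u$$ [34]

with given coefficient functions $a^{ij}$, $b^i$, $c$ ($i, j = 1, \ldots, N$),
and where we have used the notation
$u_{x_i} = \partial u / \partial x_i$. Such operators are said to be uniformly strongly
elliptic in $\Omega$ if there exists $\alpha > 0$ such that

$$\sum_{i,j=1}^N a^{ij}(x)\, \xi_i \xi_j \ge \alpha |\xi|^2 \quad \text{for all } x \in \Omega,\ \xi \in \mathbb{R}^N$$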


Remark 14 There exist elliptic problems for which
the associated variational problem does not necessarily
satisfy the ellipticity condition. Let us consider
the following example, due to Seeley: let $\Omega = \{(r, \theta) \in (\pi, 2\pi) \times [0, 2\pi]\}$ and

$$A = -\Big( e^{i\theta} \frac{\partial}{\partial\theta} \Big)^2 - e^{2i\theta} \Big( 1 + \frac{\partial^2}{\partial r^2} \Big)$$

One can check that, for all $\lambda \in \mathbb{C}$, the homogeneous problem $Au + \lambda u = 0$
in $\Omega$ and $u = 0$ on $\Gamma$ admits nonzero solutions $u$,
which are given by (with $\mu$ such that $\mu^2 = \lambda$)
$u = \sin r \cos(\mu e^{-i\theta})$ and $u = \sin r \sin(\mu e^{-i\theta})$ for $\lambda \ne 0$;
$u = \sin r$ and $u = \sin r\, e^{-i\theta}$ for $\lambda = 0$.


Most of the results concerning existence, uniqueness,
and regularity for second-order elliptic problems can
be established thanks to a maximum principle.
There exist different types of maximum principles,
which we now present.


Maximum Principle 


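Theorem 15 (Weak maximum principle). Let $A$ be
a uniformly strongly elliptic operator of the form
[34] in a bounded open $\Omega \subset \mathbb{R}^N$, with $a^{ij}, b^i, c \in L^\infty(\Omega)$
and $c \ge 0$. Let $u \in C^2(\Omega) \cap C(\bar\Omega)$ and

$$Au \ge 0 \quad [\text{resp. } Au \le 0] \quad \text{in } \Omega$$

Then

$$\inf_\Omega u \ge \inf_\Gamma (-u^-) \quad \Big[ \text{resp. } \sup_\Omega u \le \sup_\Gamma u^+ \Big]$$

where $u^+ = \max(u, 0)$ and $u^- = -\min(u, 0)$. If $c = 0$
in $\Omega$, one can replace $-u^-$ [resp. $u^+$] by $u$.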
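
A discrete version of the weak maximum principle can be observed on a simple finite-difference model. The sketch below solves $-u'' + c u = f$ on $(0, 1)$ with $c > 0$, $f \ge 0$ and prescribed boundary values, then compares the minimum of $u$ with the minimum of the boundary values and $0$, that is, with $\inf_\Gamma(-u^-)$. The one-dimensional operator, the centered difference scheme and the name `solve_dirichlet` are assumptions made for this example.

```python
import numpy as np

def solve_dirichlet(c, f, left, right, n):
    """Centered finite differences for -u'' + c u = f on (0, 1),
    with u(0) = left and u(1) = right."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    xi = x[1:-1]  # interior nodes
    A = (np.diag(2.0 / h**2 + c(xi))
         - np.diag(np.ones(n - 2), 1) / h**2
         - np.diag(np.ones(n - 2), -1) / h**2)
    rhs = np.array(f(xi), dtype=float)
    rhs[0] += left / h**2     # boundary values moved to the right-hand side
    rhs[-1] += right / h**2
    u = np.empty(n + 1)
    u[0], u[-1] = left, right
    u[1:-1] = np.linalg.solve(A, rhs)
    return x, u

left, right = -0.5, 0.3
x, u = solve_dirichlet(lambda x: 1.0 + x, lambda x: x * (1.0 - x), left, right, 50)
lower_bound = min(left, right, 0.0)  # inf over the boundary of -u^-
print(u.min(), lower_bound)
```

The matrix of the scheme has positive diagonal and nonpositive off-diagonal entries and is strictly diagonally dominant, which is the discrete mechanism behind the inequality.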


Theorem 16 (Strong maximum principle). Under
the assumptions of the above theorem, if $u$ is a nonconstant
function in $C^2(\Omega) \cap C(\bar\Omega)$ such that $Au \ge 0$
[resp. $Au \le 0$], then $\inf_\Omega u < u(x)$ [resp. $\sup_\Omega u > u(x)$]
for all $x \in \Omega$.


Remark 17 These two maximum principles can be
adapted to elliptic operators in nondivergence form,
that is,

$$Au = -\sum_{i,j=1}^N a^{ij}(x)\, u_{x_i x_j} + \sum_{i=1}^N b^i(x)\, u_{x_i} + c(x)\, u$$ [35]


Fredholm Alternative 


We now present some existence results which are 
based on the Fredholm alternative rather than on the 
variational method. 

Let us consider two Hilbert spaces $V$ and $H$,
where $V$ is a dense subspace of $H$ and $V \hookrightarrow H$.
Denoting by $V'$ the dual space of $V$, and identifying
$H$ with its dual space, we have the following
imbeddings: $V \hookrightarrow H \hookrightarrow V'$. Let $A$ be a sesquilinear
form on $V \times V$, $V$-coercive with respect to $H$, that
is, there exist $\lambda_0 \in \mathbb{R}$ and $\alpha > 0$ such that

$$\operatorname{Re}(A(v, v)) + \lambda_0 \|v\|_H^2 \ge \alpha \|v\|_V^2 \quad \text{for all } v \in V$$

Denoting by $A$ the operator associated with the
sesquilinear form $A$ (see Remark 6(iii)), the equation
$Au = f$ is equivalent to $u - \lambda_0 T u = g$, with $T = (A + \lambda_0 \mathrm{Id})^{-1}$
and $g = Tf$. Note that $T$ is an isomorphism
from $H$ onto $D(A) = \{u \in H;\ Au \in H\}$.

The operator $T : H \to H$ is compact and, thanks to
the Fredholm alternative, there are two situations:

1. either $\operatorname{Ker} A = \{0\}$ and $A$ is an isomorphism from
$D(A)$ onto $H$;

2. or $\operatorname{Ker} A \ne \{0\}$; then $\operatorname{Ker} A$ is of finite dimension,
and the problem $Au = f$ with $f \in H$ admits a
solution if and only if $f \in \operatorname{Im} A = [\operatorname{Ker}(A^*)]^\perp$.
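
The second case of the alternative has an elementary finite-dimensional counterpart: a square system $Ax = f$ is solvable exactly when $f$ is orthogonal to the kernel of the adjoint. The sketch below checks this for a real matrix, where the adjoint is the transpose, using the singular value decomposition to obtain a basis of $\operatorname{Ker}(A^*)$. The helper name `is_solvable` and the tolerance are assumptions of this example.

```python
import numpy as np

def is_solvable(A, f, tol=1e-10):
    """Check f in [Ker(A^T)]^perp for a real square matrix A.

    The left singular vectors attached to zero singular values span Ker(A^T)."""
    U, s, _ = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    ker_adjoint = U[:, rank:]  # columns form an orthonormal basis of Ker(A^T)
    return bool(np.all(np.abs(ker_adjoint.T @ f) < tol)), ker_adjoint

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])     # singular: Ker A is spanned by (1, -1)
ok, ker_adjoint = is_solvable(A, np.array([2.0, 2.0]))
bad, _ = is_solvable(A, np.array([1.0, 0.0]))
print(ok, bad, ker_adjoint.shape[1])
```

In this example $\operatorname{Ker}(A^*)$ is one-dimensional, the right-hand side $(2, 2)$ is admissible (for instance $x = (1, 1)$ solves the system), and $(1, 0)$ is not.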


We now give another example in a non-Hilbertian
frame. Let us consider the problem (Grisvard 1980):
$Au = f$ in $\Omega$ and $Bu = g$ on $\Gamma$, where $\Gamma$ is of class
$C^{1,1}$, $A$, which is defined by [34], is uniformly
strongly elliptic with $a^{ij} = a^{ji} \in C^{0,1}(\bar\Omega)$, $b^i, c \in L^\infty(\Omega)$,
and $Bu = \gamma_0(u)$ or $Bu = \gamma_1(u)$. One can show that
the operator $u \mapsto (Au, Bu)$ is a Fredholm operator of
index zero from $W^{2,p}(\Omega)$ into $L^p(\Omega) \times W^{2-d-1/p,p}(\Gamma)$
(with $d = 0$ if $Bu = \gamma_0(u)$ and $d = 1$ if $Bu = \gamma_1(u)$).


Regularity 


Assume that $\Omega$ is a bounded open set. Suppose that $u \in H^1_0(\Omega)$
is a weak solution of the equation

$$Au = f \quad \text{in } \Omega, \qquad u = 0 \quad \text{on } \Gamma$$ [36]

where $A$ has the divergence form [34]. We now
address the question whether $u$ is in fact smooth:
this is the regularity problem for weak solutions.


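Theorem 18 ($H^2$-regularity). Let $\Omega$ be open, of
class $C^{1,1}$, $a^{ij} \in C^1(\bar\Omega)$, $b^i, c \in L^\infty(\Omega)$, $f \in L^2(\Omega)$. Suppose,
furthermore, that $u \in H^1_0(\Omega)$ is a weak solution
of [36]. Then $u \in H^2(\Omega)$ and we have the estimate

$$\|u\|_{H^2(\Omega)} \le C \big( \|f\|_{L^2(\Omega)} + \|u\|_{L^2(\Omega)} \big)$$

where the constant $C$ depends only on $\Omega$ and on the
coefficients of $A$.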


Theorem 19 (Higher regularity). Let $m$ be a nonnegative
integer, $\Omega$ be open, of class $C^{m+1,1}$, and assume
that $a^{ij}, b^i, c \in C^{m+1}(\bar\Omega)$, $f \in H^m(\Omega)$. Suppose,
furthermore, that $u \in H^1_0(\Omega)$ is a weak solution of
[36]. Then $u \in H^{m+2}(\Omega)$ and

$$\|u\|_{H^{m+2}(\Omega)} \le C \big( \|f\|_{H^m(\Omega)} + \|u\|_{L^2(\Omega)} \big)$$

where the constant $C$ depends only on $\Omega$ and on
the coefficients of $A$. In particular, if $m > N/2$, then
$u \in C^2(\bar\Omega)$. Moreover, if $\Omega$ is of class $C^\infty$ and
$f \in C^\infty(\bar\Omega)$, $a^{ij}, b^i, c \in C^\infty(\bar\Omega)$, then $u \in C^\infty(\bar\Omega)$.


Remark 20

(i) If $u \in H^1_0(\Omega)$ is the unique solution of [36], one
can omit the $L^2$-norm of $u$ in the right-hand side
of the above estimate.

(ii) Moreover, let us suppose that the coefficients $a^{ij}$, $b^i$,
and $c$ are all $C^\infty$ and $f \in C^\infty(\Omega)$; then, if $u \in H^1(\Omega)$
satisfies $Au = f$, $u \in C^\infty(\Omega)$; this is due to
the "hypoellipticity" property satisfied by the
operator $A$.

We have a similar result in the $L^p$ frame (Grisvard
1980):

Theorem 21 ($W^{2,p}$-regularity). Let $\Omega$ be open, of
class $C^{1,1}$, $a^{ij} \in C^{0,1}(\bar\Omega)$, $b^i, c \in L^\infty(\Omega)$. Suppose,
furthermore, that $b^i = 0$, $1 \le i \le N$, and $c \ge 0$ a.e.
Then for every $f \in L^p(\Omega)$ there exists a unique
solution $u \in W^{2,p}(\Omega)$ of [36].


Unbounded Open 
The Whole Space 


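Note in passing that we shall work with the weighted
Sobolev spaces $W^{m,p}_\alpha(\Omega)$ defined in the subsection
"Unbounded opens and weighted spaces."

Theorem 22 The following claims hold true:

(i) Let $f \in W^{-1,p}_0(\mathbb{R}^N)$ satisfy the compatibility
condition

$$\langle f, 1 \rangle_{W^{-1,p}_0(\mathbb{R}^N) \times W^{1,p'}_0(\mathbb{R}^N)} = 0 \quad \text{if } p' \ge N$$

Then the problem [5] has a solution $u \in W^{1,p}_0(\mathbb{R}^N)$,
which is unique up to an element in
$\mathcal{P}_{[1-N/p]}$ and satisfies the estimate

$$\|u\|_{W^{1,p}_0(\mathbb{R}^N)/\mathcal{P}_{[1-N/p]}} \le C \|f\|_{W^{-1,p}_0(\mathbb{R}^N)}$$

Moreover, if $1 < p < N$, then $u = E * f$.

(ii) If $f \in L^p(\mathbb{R}^N)$, then the problem [5] has a
solution $u \in W^{2,p}_0(\mathbb{R}^N)$, which is unique up to
an element in $\mathcal{P}_{[2-N/p]}$, and if $1 < p < N/2$, then
$u = E * f$.

The Calderón–Zygmund inequality

$$\Big\| \frac{\partial^2 \varphi}{\partial x_i \partial x_j} \Big\|_{L^p(\mathbb{R}^N)} \le C(N, p)\, \|\Delta \varphi\|_{L^p(\mathbb{R}^N)}, \quad \varphi \in \mathcal{D}(\mathbb{R}^N)$$

and Theorem 4 are crucial for establishing Theorem 22.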
Further, point (i) means that the Riesz potential of
second order satisfies

$$I_2 : W^{-1,p}_0(\mathbb{R}^N) \perp \mathcal{P}_{[1-N/p']} \longrightarrow W^{1,p}_0(\mathbb{R}^N) / \mathcal{P}_{[1-N/p]}$$

(where the initial space is the orthogonal complement
of $\mathcal{P}_{[1-N/p']}$ in $W^{-1,p}_0(\mathbb{R}^N)$) and it is an
isomorphism.

Note that here

$$W^{1,p}_0(\mathbb{R}^N) = \{ v \in L^{p^*}(\mathbb{R}^N);\ \nabla v \in L^p(\mathbb{R}^N) \}$$

for $1 < p < N$ and $1/p^* = 1/p - 1/N$. And for
$1 < r < N/2$, we also have the continuity property

$$I_2 : L^r(\mathbb{R}^N) \to L^q(\mathbb{R}^N), \qquad \frac{1}{q} = \frac{1}{r} - \frac{2}{N}$$
Remark 23 The problem

$$u - \Delta u = f \quad \text{in } \mathbb{R}^N$$ [37]

is of a completely different nature than the problem
[5]. The class of function spaces appropriate for the
problem [37] is that of the classical Sobolev spaces. With
the help of the Calderón–Zygmund theory, one can
prove that if $f \in L^p(\mathbb{R}^N)$, then the unique solution of
[37] belongs to $W^{2,p}(\mathbb{R}^N)$ and can be represented as
the Bessel potential of second order (see Stein
(1970)): $u = G * f$, where $G$ is the appropriate Bessel
kernel, that is, the kernel for which $\hat{G}(\xi) \sim (1 + |\xi|^2)^{-1}$.
Recall that in particular $G(x) \sim |x|^{-1} e^{-|x|}$ for $N = 3$.
In the Hilbert case, $f \in L^2(\mathbb{R}^N)$, we get

$$(1 + |\xi|^2)\, \hat{u} \in L^2(\mathbb{R}^N)$$

which, by Plancherel's theorem, implies that $u \in H^2(\mathbb{R}^N)$.
For $f \in W^{-1,p}(\mathbb{R}^N)$, the problem [37] has a
unique solution $u \in W^{1,p}(\mathbb{R}^N)$ satisfying the
estimate

$$\|u\|_{W^{1,p}(\mathbb{R}^N)} \le C(p, N)\, \|f\|_{W^{-1,p}(\mathbb{R}^N)}$$
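
The Fourier multiplier $(1 + |\xi|^2)^{-1}$ behind the Bessel potential is easy to try out numerically. The sketch below solves $u - u'' = f$ with the fast Fourier transform, under the assumption that the whole space is replaced by a $2\pi$-periodic interval so that the frequencies are integers; this periodic setting and the name `solve_bessel_periodic` are simplifications made for the example and are not part of the remark above.

```python
import numpy as np

def solve_bessel_periodic(f_vals):
    """Solve u - u'' = f on a 2*pi-periodic grid by dividing the Fourier
    coefficients of f by the multiplier 1 + xi^2."""
    n = f_vals.size
    xi = np.fft.fftfreq(n, d=1.0 / n)  # integer frequencies 0, 1, ..., -1
    u_hat = np.fft.fft(f_vals) / (1.0 + xi**2)
    return np.fft.ifft(u_hat).real

n = 64
x = 2.0 * np.pi * np.arange(n) / n
# Manufactured example: u = sin(3x) gives f = u - u'' = 10 sin(3x).
f_vals = 10.0 * np.sin(3.0 * x)
u = solve_bessel_periodic(f_vals)
err = np.max(np.abs(u - np.sin(3.0 * x)))
print("maximum error:", err)
```

Because the multiplier never vanishes, no compatibility condition on $f$ is needed, in contrast with the Poisson problem [5].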


Exterior Domain 


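We consider the problem in an exterior domain with
the Dirichlet boundary condition

$$-\Delta u = f \quad \text{in } \Omega, \qquad u = g \quad \text{on } \Gamma = \partial\Omega$$ [38]

where $f \in W^{-1,p}_0(\Omega)$ and $g \in W^{1-1/p,p}(\partial\Omega)$. Invoking
the results for $\mathbb{R}^N$ and bounded domains, one can
prove the existence of a solution $u \in W^{1,p}_0(\Omega)$, which
is unique up to an element of the kernel
$A^p_0(\Omega) = \{z \in \mathring{W}^{1,p}_0(\Omega);\ \Delta z = 0\}$, provided that $f$ satisfies the
compatibility condition

$$\langle f, \varphi \rangle = \Big\langle g, \frac{\partial \varphi}{\partial n} \Big\rangle \quad \text{for all } \varphi \in A^{p'}_0(\Omega)$$

The kernel can be characterized in the following way:
it is reduced to $\{0\}$ if $p = 2$ or $p < N$ and, if not, then

$$A^p_0(\Omega) = \{C(\lambda - 1);\ C \in \mathbb{R}\} \quad \text{if } p \ge N \ge 3$$

where $\lambda$ is the (unique) solution in $W^{1,2}_0(\Omega) \cap W^{1,p}_0(\Omega)$ of
the problem $\Delta\lambda = 0$ in $\Omega$ and $\lambda = 1$ on $\partial\Omega$, and

$$A^p_0(\Omega) = \{C(\mu - u_0);\ C \in \mathbb{R}\} \quad \text{if } p > N = 2$$

where $u_0(x) = (2\pi |\Gamma|)^{-1} \int_\Gamma \log|y - x|\,d\sigma_y$ and $\mu$ is the
only solution in $W^{1,2}_0(\Omega) \cap W^{1,p}_0(\Omega)$ of the problem
$\Delta\mu = 0$ in $\Omega$ and $\mu = u_0$ on $\Gamma$.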


Remark 24 Similar results exist for the Neumann
problem in an exterior domain (see Amrouche et al.
(1997)). The framework of the spaces $W^{m,p}_\alpha(\mathbb{R}^N_+)$ for
the Dirichlet problem in $\mathbb{R}^N_+$ was also considered in
the literature. For a more general theory, see Kozlov
and Maz'ya (1999).


Elliptic Systems 
The Stokes System 


The Stokes problem is a classical example in
fluid mechanics. This system models the slow
motion of a fluid with velocity field $u$ and
pressure $\pi$, satisfying

$$(S) \qquad -\nu \Delta u + \nabla \pi = f \quad \text{in } \Omega, \qquad \operatorname{div} u = h \quad \text{in } \Omega, \qquad u = g \quad \text{on } \Gamma = \partial\Omega$$

where $\nu > 0$ denotes the viscosity, $f$ is an exterior force,
$g$ is the velocity of the fluid on the domain boundary,
and $h$ measures the compressibility of the fluid (if
$h = 0$, the fluid is incompressible). The functions $h$ and
$g$ must satisfy the compatibility condition

$$\int_\Omega h(x)\,dx = \int_\Gamma g \cdot n\,d\sigma$$ [39]

Theorem 25 Let $\Omega$ be a bounded Lipschitz domain
in $\mathbb{R}^N$, $N \ge 2$. Let $f \in H^{-1}(\Omega)^N$, $h \in L^2(\Omega)$, and $g \in H^{1/2}(\Gamma)^N$
satisfy [39]. Then the problem (S) has a
unique solution $(u, \pi) \in H^1(\Omega)^N \times L^2(\Omega)/\mathbb{R}$ satisfying
the a priori estimate

$$\|u\|_{H^1(\Omega)} + \|\pi\|_{L^2(\Omega)/\mathbb{R}} \le C \big( \|f\|_{H^{-1}(\Omega)} + \|h\|_{L^2(\Omega)} + \|g\|_{H^{1/2}(\Gamma)} \big)$$

In order to prove Theorem 25, one can start with a
homogeneous problem. Finding $u$ is then a
simple application of the Lax–Milgram theorem, and an
application of de Rham's theorem gives the pressure $\pi$.
More precisely, we introduce the space

$$\mathcal{V} = \{ v \in \mathcal{D}(\Omega)^N;\ \operatorname{div} v = 0 \}$$

and consider $F \in H^{-1}(\Omega)^N$ such that

$$\langle F, v \rangle_{H^{-1} \times H^1_0} = 0 \quad \text{for all } v \in \mathcal{V}$$

Then there exists $\pi \in L^2(\Omega)$, unique up to an
additive constant, such that $F = \nabla \pi$. The
problem (S), which we transform to the homogeneous
case ($h = 0$, $g = 0$), can be formulated on an
abstract level. Let $X$ and $M$ be two real Hilbert
spaces and consider the following variational problem:
given $L \in X'$ and $\chi \in M'$, find $(u, \pi) \in X \times M$
such that

$$A(u, v) + B[v, \pi] = L(v) \quad \text{for all } v \in X, \qquad B[u, q] = \chi(q) \quad \text{for all } q \in M$$ [40]

where the bilinear forms $A$, $B$ and the linear form $L$
are defined by

$$A(u, v) = \nu \int_\Omega \nabla u : \nabla v\,dx, \qquad B[v, q] = -\int_\Omega q\, \nabla \cdot v\,dx, \qquad L(v) = \int_\Omega f \cdot v\,dx$$


Theorem 26 If the bilinear form $A$ is coercive in
the space

$$V = \{ v \in X;\ B[v, q] = 0 \text{ for all } q \in M \}$$

that is, if there exists $\alpha > 0$ such that

$$A(v, v) \ge \alpha \|v\|_X^2, \quad v \in V$$

then the problem [40] has a unique solution $(u, \pi)$
if and only if the bilinear form $B$ satisfies the
"inf-sup" condition:

there exists $\beta > 0$ such that

$$\inf_{q \in M} \sup_{v \in X} \frac{B[v, q]}{\|v\|_X \|q\|_M} \ge \beta$$
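
In finite dimensions the inf-sup constant can be computed explicitly. If $X = \mathbb{R}^n$ and $M = \mathbb{R}^m$ carry the Euclidean norms and $B[v, q] = q^T B v$ for an $m \times n$ matrix $B$, then the supremum over $v$ equals $|B^T q|$, so the inf-sup constant is the smallest singular value of $B^T$; it is positive exactly when $B$ has full row rank. The Euclidean norms and the name `inf_sup_constant` are assumptions of this sketch; with other norms the Gram matrices of the two spaces would enter the computation.

```python
import numpy as np

def inf_sup_constant(B):
    """inf over q of sup over v of (q^T B v) / (|v| |q|) for Euclidean norms.

    B has shape (m, n): v lies in R^n and q in R^m."""
    m, n = B.shape
    if m > n:
        return 0.0  # B^T has a nontrivial kernel, so the infimum is 0
    singular_values = np.linalg.svd(B, compute_uv=False)  # sorted, length m
    return float(singular_values[-1])

B_good = np.array([[1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0]])   # full row rank
B_bad = np.array([[1.0, 0.0, 0.0],
                  [2.0, 0.0, 0.0]])    # rows are linearly dependent
beta_good = inf_sup_constant(B_good)
beta_bad = inf_sup_constant(B_bad)
print(beta_good, beta_bad)
```

For discretizations of the Stokes system, the same quantity computed with the appropriate norms indicates whether a given pair of velocity and pressure spaces is stable.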

As for the Dirichlet problem, the regularity result
is the following:

Theorem 27 Let $\Omega$ be a bounded domain in $\mathbb{R}^N$, of
class $C^{m+1,1}$ if $m \in \mathbb{N}$ and $C^{1,1}$ if $m = -1$. Let
$f \in W^{m,p}(\Omega)^N$, $h \in W^{m+1,p}(\Omega)$ and $g \in W^{m+2-1/p,p}(\Gamma)^N$
satisfy condition [39]. Then the problem (S) has a
unique solution $(u, \pi) \in W^{m+2,p}(\Omega)^N \times W^{m+1,p}(\Omega)/\mathbb{R}$.

Remark 28 It is possible to solve (S) under weaker
assumptions, for instance, if $f \in W^{-1,p}(\Omega)^N$, $h = 0$ and
$g \in W^{-1/p,p}(\Gamma)^N$. One can then prove that $(u, \pi) \in L^p(\Omega)^N \times W^{-1,p}(\Omega)$.


The Linearized Elasticity 


The equations governing the displacement $u = (u_1, u_2, u_3)$
of a three-dimensional structure
subjected to an external force field $f$ are written as
($\Omega$ is a bounded open subset of $\mathbb{R}^3$ and $\Gamma = \partial\Omega$)

$$-\mu \Delta u - (\lambda + \mu) \nabla (\nabla \cdot u) = f \quad \text{in } \Omega$$

$$u = 0 \quad \text{on } \Gamma_0, \qquad \sum_{j=1}^3 \sigma_{ij}(u)\, n_j = g_i \quad \text{on } \Gamma_1 = \Gamma - \Gamma_0$$

where $\lambda > 0$ and $\mu > 0$ are two material characteristic
constants, called the Lamé coefficients, and
($v = (v_1, v_2, v_3)$)

$$\sigma_{ij}(v) = \sigma_{ji}(v) = \lambda\, \delta_{ij} \sum_{k=1}^3 \varepsilon_{kk}(v) + 2\mu\, \varepsilon_{ij}(v)$$ [41]

with

$$\varepsilon_{ij}(v) = \varepsilon_{ji}(v) = \tfrac{1}{2} (\partial_j v_i + \partial_i v_j)$$

where $\delta_{ij}$ denotes the Kronecker symbol, that is,
$\delta_{ij} = 1$ for $i = j$ and $\delta_{ij} = 0$ for $i \ne j$. These equations
describe the equilibrium of an elastic homogeneous
isotropic body that cannot move along $\Gamma_0$; along $\Gamma_1$,
surface forces of density $g = (g_1, g_2, g_3)$ are given. The
case $\Gamma_1 = \emptyset$ physically corresponds to clamped structures.
The matrix with entries $\varepsilon_{ij}(u)$ is the linearized
strain tensor, while $\sigma_{ij}(u)$ represents the linearized
stress tensor; the relationship [41] between these
tensors is known as Hooke's law. We refer, for
example, to Ciarlet and Lions (1991) and Nečas and
Hlaváček (1981) (and references therein) for most of
the results stated in this paragraph. The variational
formulation of this problem is

to find $u \in V$ such that
$A(u, v) = L(v)$ for all $v \in V$ [42]

where the bilinear form $A$ and the linear form $L$ are
given by

$$A(u, v) = \int_\Omega \Big[ \lambda\, (\nabla \cdot u)(\nabla \cdot v) + 2\mu \sum_{i,j=1}^3 \varepsilon_{ij}(u)\, \varepsilon_{ij}(v) \Big](x)\,dx$$ [43a]

$$L(v) = \int_\Omega f(x) \cdot v(x)\,dx + \int_{\Gamma_1} g(\sigma) \cdot v(\sigma)\,d\sigma$$ [43b]
The functional space $V$ is defined as

$$V = \{ v = (v_1, v_2, v_3) \in [H^1(\Omega)]^3;\ \gamma_0 v_i = 0 \text{ on } \Gamma_0,\ 1 \le i \le 3 \}$$

To prove the ellipticity of $A$, one needs the following
Korn inequality: there exists a positive constant $C(\Omega)$
such that, for all $v = (v_1, v_2, v_3) \in [H^1(\Omega)]^3$, we have

$$\|v\|_{1,\Omega} \le C(\Omega) \Big[ \sum_{i,j=1}^3 \|\varepsilon_{ij}(v)\|^2_{L^2(\Omega)} + \|v\|^2_{L^2(\Omega)} \Big]^{1/2}$$ [44]

The following result holds true:

Theorem 29 Let $\Omega$ be a bounded open in $\mathbb{R}^3$ with a
Lipschitz boundary, and let $\Gamma_0$ be a measurable
subset of $\Gamma$ whose measure (with respect to the
surface measure $d\Gamma(x)$) is positive. Then the mapping

$$v \mapsto \Big( \sum_{i,j=1}^3 \|\varepsilon_{ij}(v)\|^2_{L^2(\Omega)} \Big)^{1/2}$$

is a norm on $V$, equivalent to the usual norm $\|\cdot\|_{1,\Omega}$.
As a consequence, we get:
As a consequence, we get: 


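Theorem 30 Under the above assumptions, there
exists a unique $u \in V$ solving the variational
problem [42]-[43]. This solution is also the unique
one which minimizes the energy functional

$$E(v) = \frac{1}{2} \int_\Omega \Big[ \lambda\, (\nabla \cdot v)^2 + 2\mu \sum_{i,j=1}^3 (\varepsilon_{ij}(v))^2 \Big](x)\,dx - \Big[ \int_\Omega f(x) \cdot v(x)\,dx + \int_{\Gamma_1} g(\sigma) \cdot v(\sigma)\,d\sigma \Big]$$

over the space $V$.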
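
Hooke's law [41] is a pointwise algebraic relation and can be written in a few lines. The sketch below computes the linearized strain and stress tensors from a displacement gradient at one point and evaluates the integrand of the elastic energy, $\lambda (\nabla \cdot v)^2 + 2\mu \sum_{i,j} \varepsilon_{ij}(v)^2$. The function names `strain` and `stress`, the sample gradient and the numerical values of the Lamé coefficients are assumptions chosen for illustration.

```python
import numpy as np

def strain(grad_v):
    """Linearized strain tensor: eps_ij = (d_j v_i + d_i v_j) / 2."""
    return 0.5 * (grad_v + grad_v.T)

def stress(grad_v, lam, mu):
    """Hooke's law: sigma_ij = lam * delta_ij * trace(eps) + 2 * mu * eps_ij."""
    eps = strain(grad_v)
    return lam * np.trace(eps) * np.eye(3) + 2.0 * mu * eps

# Displacement gradient at one point, with entries (grad_v)_ij = d_j v_i.
grad_v = np.array([[1.0, 2.0, 0.0],
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 3.0]])
lam, mu = 1.0, 0.5
eps = strain(grad_v)
sigma = stress(grad_v, lam, mu)

# Integrand of the elastic energy, computed in two equivalent ways.
energy_density = lam * np.trace(eps)**2 + 2.0 * mu * np.sum(eps * eps)
contraction = np.sum(sigma * eps)  # sigma : eps
print(sigma)
print(energy_density, contraction)
```

The two values agree because the contraction of the stress with the strain reproduces the integrand of [43a] with $u = v$.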
Introduction


Entanglement is a type of correlation between 
subsystems, which cannot be explained by the action 
of a classical random generator. It is a key notion 
of quantum information theory and corresponds 
closely to the possibility of channels which transmit 
quantum information, and cannot be simulated by 
classical channels. In this article, we consider the 
development of the concept, and its qualitative 
aspects. The quantitative aspects are treated in a 
separate article (see Entanglement Measures). 


Historical Development 


The first realization that quantum mechanics comes 
with new, and perhaps rather strange, correlations 
came in the famous 1935 paper by Einstein, Podolsky, 
and Rosen (EPR) (Einstein et al. 1935), in which they 
set up a paradox showing that the statistics of certain 
quantum states could not be realized by assigning 
wave functions to subsystems. It was in response to 
this paper that Schrödinger (1935), in the same year,
coined the term “entanglement,” as well as its German
equivalent “Verschränkung.” The subject lay dormant
for a long time, since Bohr, in his reply, completely 
ignored the entanglement theme, and there was a 
widespread reluctance in the physics community to
consider problems of interpretation. The leaf turned
slowly with Bohm’s reduced model of the EPR 
paradox using spins rather than continuous variables, 
and decisively with Bell’s 1964 strengthening of the 
paradox (Bell 1964). He showed that not only wave 
functions assigned to individual systems failed to 
describe the correlations predicted by quantum 
mechanics, but any set of classical parameters assigned 
to the subsystems. This eliminated all reference to a 
possibly dubious quantum ontology and all reference 
to the quantum formalism from the argument. Bell 
derived a set of inequalities from the assumption that 
each subsystem could be described in terms of classical 
variables, and that these (possibly hidden) variables 
would not be changed by the mere choice of a 
measurement for the distant correlated system. The
only relation to quantum mechanics was then the
simple quantum calculation showing that, in certain situations,
such as the state described by EPR, quantum
mechanics predicted a violation of Bell’s inequalities.
This immediately suggested an experiment, and 
although it was difficult at first to find an efficient 
source of suitably quantum-correlated pairs of parti- 
cles, the experiments that have been made since 
then have supported the quantum-mechanical result 
beyond reasonable doubt. This came too late for 
Einstein, whose research program in quantum 
mechanics had been precisely to build a “local 
hidden-variable theory” of the type seen in contra- 
diction with Bell’s inequality. But at least the EPR 
paper had finally received the response it deserved. 

In Schrödinger’s work, entanglement was a purely
qualitative term for the strange way the subsystems
seemed to be intertwined as soon as one insisted on
discussing their individual properties. After Bell’s 
work, the favored mathematical definition of entan- 
glement would probably have been the existence of 
measurements on the subsystems, such that Bell’s 
inequality (or some generalization derived on the same 
assumptions) is violated. However, around 1983 
another notion of (the lack of) entanglement was 
independently proposed by Primas (1983) and Werner 
(1983). According to this definition, a quantum state $\rho$
is called unentangled if it can be written as

$$\rho = \sum_\alpha p_\alpha\, \rho^1_\alpha \otimes \rho^2_\alpha$$ [1]

where the $\rho^i_\alpha$ are arbitrary states of the subsystems
($i = 1, 2$), which depend on a “hidden variable” $\alpha$,
drawn by a classical random generator with probabilities
$p_\alpha$. Such states are now called separable,
which is a bit awkward, since the notion is typically 
applied to systems which are widely separated. 
However, the term is so firmly established that it is 
hopeless to try to improve on it. 
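
The definition [1] is constructive, so a separable state of two qubits can be built directly as a convex mixture of product states. The sketch below does this and, as an additional check that is not discussed in the text above, applies the partial transpose: it is a standard fact that every separable state keeps a positive semidefinite partial transpose, whereas the singlet state does not. The helper names `pure_state`, `separable_state` and `partial_transpose` and the particular states chosen are assumptions of this illustration.

```python
import numpy as np

def pure_state(v):
    """Density matrix |v><v| of a normalized vector v."""
    v = np.asarray(v, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

def separable_state(probs, states_1, states_2):
    """rho = sum over alpha of p_alpha * rho1_alpha (x) rho2_alpha, as in [1]."""
    return sum(p * np.kron(r1, r2) for p, r1, r2 in zip(probs, states_1, states_2))

def partial_transpose(rho, d1, d2):
    """Transpose the second subsystem of a (d1*d2) x (d1*d2) density matrix."""
    r = rho.reshape(d1, d2, d1, d2)
    return r.transpose(0, 3, 2, 1).reshape(d1 * d2, d1 * d2)

zero, one, plus = pure_state([1, 0]), pure_state([0, 1]), pure_state([1, 1])
rho_sep = separable_state([0.5, 0.5], [zero, one], [plus, zero])
rho_singlet = pure_state([0, 1, -1, 0])  # (|01> - |10>) / sqrt(2)

min_eig_sep = np.linalg.eigvalsh(partial_transpose(rho_sep, 2, 2)).min()
min_eig_singlet = np.linalg.eigvalsh(partial_transpose(rho_singlet, 2, 2)).min()
print(np.trace(rho_sep).real, min_eig_sep, min_eig_singlet)
```

A negative eigenvalue of the partial transpose certifies that a state cannot be written in the form [1]; for larger systems, however, a positive partial transpose does not guarantee separability.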

In any case, it was shown by Werner (1989) that 
there are nonseparable states, which nevertheless 
satisfy Bell’s inequalities and all its generalizations. 
The next step was the observation by Popescu 
(1994) that entanglement could be distilled: this is 
a process by which some number of moderately 
entangled pair states is converted to a smaller 
number of highly entangled states, using only local 
quantum operations, and classical communication 
between the parties. For some time it seemed that 
this might close the gap, that is, that the failure of 
separability might be equivalent to “distillability” 
(i.e., the existence of a distillation procedure produ- 
cing arbitrarily highly entangled states from many 
copies of the given one). However, this turned out to 
be false, as shown by the Horodecki family in 1998 
(Horodecki et al. 1998), by explicitly exhibiting 
bound entangled, that is, nonseparable, but also not 
distillable states. In 2003 Oppenheim and the 
Horodeckis introduced a further distinction, namely 
whether it is possible to extract a secret key from 
copies of a given quantum state by local quantum 
operations and public classical communication 
(Horodecki et al. 2005). This task had hitherto
been viewed as an application of entanglement
distillation, but it turned out that a secret key can be
distilled from some bound entangled (but never from
separable) states.

For the entanglement theory of multipartite states, 
that is, states on systems composed of three or more 
parts, between which no quantum interaction takes 
place, one key observation is that new entanglement 
properties must be expected with any increase of the 
number of parties. As shown by Bennett et al. 
(1999), there are states of three parties which cannot 
be written in the three-party analog of [1], but are 
nevertheless separable for all three splits of the 
system into one vs. two subsystems. 

The crucial advance of entanglement theory, 
however, lies not so much in the distinctions 
outlined above, but in the quantitative turn of the 
theory. With the discovery of the teleportation and 
dense coding processes (Bennett and Wiesner 1992, 
Bennett et al. 1993), entanglement changed its role 
from a property of counterintuitive contortedness to 
a resource, which is used up in teleportation and 
similar processes. Distillation is then seen as a 
method to upgrade a given source to a new source 
of highly entangled states suitable for this purpose, 
and it is not just the possibility of doing this, but the 
rate of this conversion, which becomes the focus of 
the investigation. All the tasks in which entangle- 
ment appears suggest quantitative measures of 
entanglement. In addition, there are many entangle- 
ment measures, which appear natural from a 
mathematical point of view, or are introduced 
simply because they can be estimated relatively 
easily and in turn give bounds on other entangle- 
ment measures of interest. The current situation is 
that there is no shortage of entanglement measures 
in the literature, but it is not yet clear which ones 
will be of interest in the long run. Some of these 
measures are described in Entanglement Measures. 

The current state of entanglement theory is marked, 
firstly, by some long-standing open problems in the 
basic bipartite theory (additivity of the entanglement 
of formation, the existence of NPT bound entangled 
states, that is, bound entangled states whose partial 
transpose is not positive, and more recently the 
existence of entangled states with vanishing key 
rate). Secondly, there is significant effort to 
compute some of the entanglement measures, at least 
for simple subclasses of states. This is difficult 
because many definitions involve an optimization 
over operations on an asymptotically large system. 
Thirdly, there is a new trend in multipartite 
entanglement theory, namely looking specifically at 
entanglement in lattice structures such as spin systems 
or harmonic-oscillator lattices. Here one can expect very 
fruitful interaction with statistical mechanics and 
solid-state physics in the near future. 


Qualitative Entanglement Theory 
Setup 


Throughout this section, we will consider density 
operators on a Hilbert space split in some fixed way 
into a tensor product of a Hilbert space $\mathcal{H}_A$ for 
Alice’s system and a Hilbert space $\mathcal{H}_B$ for Bob’s 
system, that is, $\mathcal{H} = \mathcal{H}_A \otimes \mathcal{H}_B$. For simplicity, we will 
mostly consider finite-dimensional spaces, and if a 
dimension parameter $d < \infty$ appears, it is understood 
that $d = \dim \mathcal{H}_A = \dim \mathcal{H}_B$. By $\mathcal{B}(\mathcal{H})$ we will 
denote the set of bounded operators on a Hilbert 
space, and by $\mathcal{B}_*(\mathcal{H})$ the set of trace-class operators. 
We distinguish these even in the finite-dimensional 
case, because of their different norms. By $\mathcal{S}$ we will 
denote the state space of the combined system, that 
is, the set of positive elements of $\mathcal{B}_*(\mathcal{H})$ with trace 1. 

For such a density operator $\rho = \rho^{AB}$ we denote by 
$\rho^A$ and $\rho^B$ the restrictions to the subsystems, defined 
by the partial trace over the other system, or by 
$\mathrm{tr}(\rho^A F) = \mathrm{tr}(\rho^{AB}(F \otimes \mathbb{1}))$. We denote by $\Theta$ the 
operation of matrix transposition, and by $\mathrm{id} \otimes \Theta$ the 
partial transposition, applied only to the second 
tensor factor. Since transposition is not completely 
positive (see Channels in Quantum Information 
Theory), partial transposition may take positive 
operators to non-positive operators. The relative 
entropy (see Entropy and Quantitative Transversality) 
of two density operators $\rho, \sigma$ will be used with 
the convention $S(\rho\|\sigma) = \mathrm{tr}\,\rho(\log\rho - \log\sigma)$. 
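The partial trace and the partial transposition are easy to make concrete in finite dimensions. The following is a minimal numerical sketch in Python with NumPy, not taken from the article: it assumes the basis ordering of `np.kron`, and the function names `restriction_A`, `restriction_B`, and `partial_transpose` are chosen here only for illustration.

```python
import numpy as np

def restriction_A(rho, dA, dB):
    """Restriction to Alice's system: partial trace over Bob's factor."""
    return np.trace(rho.reshape(dA, dB, dA, dB), axis1=1, axis2=3)

def restriction_B(rho, dA, dB):
    """Restriction to Bob's system: partial trace over Alice's factor."""
    return np.trace(rho.reshape(dA, dB, dA, dB), axis1=0, axis2=2)

def partial_transpose(rho, dA, dB):
    """Transpose only the second tensor factor (Bob's indices)."""
    t = rho.reshape(dA, dB, dA, dB).transpose(0, 3, 2, 1)
    return t.reshape(dA * dB, dA * dB)

# A product state, for which both operations are known in closed form.
rho_A = np.array([[0.7, 0.1], [0.1, 0.3]])
rho_B = np.array([[0.5, 0.2j], [-0.2j, 0.5]])
rho = np.kron(rho_A, rho_B)

# Defining relation of the restriction: tr(rho^A F) = tr(rho (F x 1)).
F = np.array([[1.0, 2.0], [2.0, -1.0]])
lhs = np.trace(restriction_A(rho, 2, 2) @ F)
rhs = np.trace(rho @ np.kron(F, np.eye(2)))
```

On a product state the restriction to Alice returns the first factor, and the partial transpose returns the first factor tensored with the transpose of the second.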


Witnesses and the Criterion of Positivity 
of Partial Transpose 


A state $\rho$ is called separable iff it is of the form [1], 
and entangled otherwise. The set of separable states 
$\mathcal{D}$ is a convex subset of the set $\mathcal{S}$ of all states. Its 
extreme points are obvious from the representation 
[1], namely the pure product states 
$\rho = |\phi_A \otimes \phi_B\rangle\langle\phi_A \otimes \phi_B|$. Since $\mathcal{D}$, like $\mathcal{S}$, is a convex set in 
$(d^4 - 1)$ dimensions, Carathéodory’s theorem asserts 
that the sum can be taken to be a decomposition 
into $d^4$ such terms. For a given $\rho$, deciding whether 
it is separable or entangled hence involves a 
nonlinear search problem in roughly $4d^5$ real 
parameters, namely the vector components of the 
$\phi_A, \phi_B$ appearing in the sum. 

Dually, the convex set $\mathcal{D}$ of separable states can be 
described by a set of linear inequalities. Here is a simple 
way of generating such inequalities: let 
$T: \mathcal{B}_*(\mathcal{H}_B) \to \mathcal{B}(\mathcal{H}_A)$ 
be a positive linear map, that is, a map taking 
positive matrices to positive matrices. Then for 
$\rho_A, \rho_B \ge 0$ the expression $\mathrm{tr}(\rho_A T(\rho_B))$ is positive. It 
is also bilinear, so we can find a Hermitian operator 
$T^\sharp \in \mathcal{B}(\mathcal{H}_A \otimes \mathcal{H}_B)$ such that 


$\mathrm{tr}(\rho_A T(\rho_B)) = \mathrm{tr}((\rho_A \otimes \rho_B) T^\sharp)$ 


Since the left-hand side is positive, we see by taking 
convex combinations that $\mathrm{tr}(\rho T^\sharp) \ge 0$ for all 
separable states $\rho$. Hence, if we find a state with a negative 
expectation of $T^\sharp$, we can be sure it is entangled. 
Therefore, such operators $T^\sharp$ are called entanglement 
witnesses. This is often a useful criterion, especially 
when one has some additional information about the 
state, allowing for an intelligent choice of witness. It 
is known from the theory of ordered vector spaces 
and their tensor products that the set of witnesses 
constructed above is complete. Hence, in principle, 
checking all such witnesses provides a necessary and 
sufficient criterion for entanglement. However, in 
practice this remains a difficult task, because the 
extreme points of the set of positive maps are only 
known for some low dimensions. 
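A small numerical illustration of a witness may help here. The sketch below is not from the article; it assumes the simplest positive map, the identity map $T = \mathrm{id}$ with $d = 2$, for which the associated witness operator is the flip (permutation) operator $F|ij\rangle = |ji\rangle$, since $\mathrm{tr}((\rho_A \otimes \rho_B)F) = \mathrm{tr}(\rho_A \rho_B)$. The helper names are illustrative only.

```python
import numpy as np

def flip_operator(d):
    """Permutation operator F with F|ij> = |ji> on C^d x C^d."""
    F = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            F[i * d + j, j * d + i] = 1.0
    return F

def expectation(rho, W):
    """Expectation value tr(rho W) of a Hermitian operator W."""
    return np.trace(rho @ W).real

d = 2
F = flip_operator(d)

# On product states the witness reduces to tr(rho_A rho_B) >= 0.
rho_A = np.array([[0.7, 0.1], [0.1, 0.3]])
rho_B = np.array([[0.5, 0.2], [0.2, 0.5]])
product_value = expectation(np.kron(rho_A, rho_B), F)

# The antisymmetric (singlet) vector is an eigenvector of F with eigenvalue -1.
singlet = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)
singlet_value = expectation(np.outer(singlet, singlet.conj()), F)
```

The negative expectation on the singlet certifies its entanglement, while every product state (and hence every separable state) gives a non-negative value.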

By restricting $T$ to completely positive maps, we 
get a useful necessary criterion for separability. It can 
be seen that it is equivalent to 


$(\mathrm{id} \otimes \Theta)(\rho) \ge 0$ 


that is, to the positivity of the partial transpose 
(PPT). States with this property are called “PPT 
states” in current jargon. 
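The PPT criterion can be checked numerically by computing the smallest eigenvalue of the partial transpose. The following sketch is an illustration only, under the assumption of a two-qubit system and the `np.kron` index ordering; the one-parameter family `isotropic(p)` (a mixture of a maximally entangled projector with the maximally mixed state) and the tolerance `tol` are choices made here, not taken from the article.

```python
import numpy as np

def partial_transpose(rho, dA, dB):
    """Transpose only the second tensor factor."""
    t = rho.reshape(dA, dB, dA, dB).transpose(0, 3, 2, 1)
    return t.reshape(dA * dB, dA * dB)

def is_ppt(rho, dA, dB, tol=1e-12):
    """True if the partial transpose has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(partial_transpose(rho, dA, dB)).min() >= -tol)

# Maximally entangled two-qubit vector (|00> + |11>)/sqrt(2).
omega = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell = np.outer(omega, omega)

def isotropic(p):
    """Mixture p |Omega><Omega| + (1 - p) 1/4 (illustrative family)."""
    return p * bell + (1 - p) * np.eye(4) / 4

# Smallest eigenvalue of the partial transpose is (1 - 3p)/4 for this family.
min_eig = np.linalg.eigvalsh(partial_transpose(isotropic(0.4), 2, 2)).min()
```

For this family the partial transpose turns negative once the weight of the maximally entangled component exceeds 1/3, so `is_ppt(isotropic(0.3), 2, 2)` holds while `is_ppt(isotropic(0.4), 2, 2)` fails.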


Pure States, Purification 


For pure states, that is, for the extreme points of the 
state space $\mathcal{S}$, separability is trivial to decide: since 
for pure states the sum [1] can only be a single term, a 
pure state is separable iff it factorizes. 

A useful observation is that, for pure states 
$\rho = |\Phi\rangle\langle\Phi|$, all information about entanglement is 
contained in the spectrum of the reduced states. 
Consider a vector $\Phi \in \mathcal{H}_A \otimes \mathcal{H}_B$ of the form 


$\Phi = \sum_\alpha \sqrt{r_\alpha}\; e^A_\alpha \otimes e^B_\alpha$ [2] 


where $e^A_\alpha \in \mathcal{H}_A$ and $e^B_\alpha \in \mathcal{H}_B$ are orthonormal 
systems, $r_\alpha > 0$, and $\sum_\alpha r_\alpha = 1$. Then it is easy to 
check that $\rho^A = \sum_\alpha r_\alpha |e^A_\alpha\rangle\langle e^A_\alpha|$ is the spectral 
resolution of the restriction. Conversely, by diagonalizing 
the restriction of a general unit vector $\Phi$, we find a 
biorthogonal decomposition of the form [2], also 
known as the Schmidt decomposition. The Schmidt 
spectrum $\{r_1, \ldots, r_d\}$ hence classifies vectors up to 
local basis changes in $\mathcal{H}_A$ and $\mathcal{H}_B$. 
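Numerically, the Schmidt decomposition is a singular value decomposition of the coefficient matrix of the vector. The sketch below is illustrative and not part of the article; it assumes the vector is stored with the `np.kron` index ordering, and the function name `schmidt` is hypothetical.

```python
import numpy as np

def schmidt(phi, dA, dB):
    """Return (r, U, Vh): Schmidt spectrum r and local bases.

    phi = sum_a sqrt(r[a]) * kron(U[:, a], Vh[a, :]).
    """
    M = np.asarray(phi).reshape(dA, dB)   # coefficient matrix M[a, b]
    U, s, Vh = np.linalg.svd(M)
    return s ** 2, U, Vh

# Example: a partially entangled two-qubit vector.
phi = np.array([np.sqrt(0.8), 0.0, 0.0, np.sqrt(0.2)])
r, U, Vh = schmidt(phi, 2, 2)

# Rebuild the vector from its biorthogonal decomposition.
rebuilt = sum(np.sqrt(r[a]) * np.kron(U[:, a], Vh[a, :]) for a in range(len(r)))

# The restriction to Alice has the Schmidt spectrum as its eigenvalues.
M = phi.reshape(2, 2)
rho_A_spectrum = np.sort(np.linalg.eigvalsh(M @ M.conj().T))[::-1]
```

The squared singular values are the Schmidt spectrum, and they coincide with the eigenvalues of the restriction to either subsystem.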

Since any $\rho^A$ can appear in this construction, we 
see that any mixed state can be considered as the 
restriction of a pure state, which is essentially 
unique, namely up to the choice of basis in the 
purifying system B, and up to perhaps adding or 
deleting some irrelevant dimensions in $\mathcal{H}_B$. The 
resulting vector $\Phi$ is known as the purification of $\rho^A$. 

The extreme cases of [2] are pure product states 
on the one hand and, on the other, vectors for which 
$\rho^A = \mathbb{1}/d$ is the totally chaotic state. The latter are known as 
maximally entangled and embody, in the most 
extreme way, the observation that in quantum 
mechanics, as opposed to classical probability, the 
restriction of a pure state may be mixed. 

Let us fix a maximally entangled vector $\Omega$, and 
the matching Schmidt bases, so that 


$\Omega = \frac{1}{\sqrt{d}} \sum_{k=1}^{d} |kk\rangle$ [3] 


where we have used the simplified ket notation, in 
which only the basis label is written. Then, an 
arbitrary vector can be written as 
$\Phi = (X \otimes \mathbb{1})\Omega = (\mathbb{1} \otimes X^T)\Omega$, where $X^T$ denotes the matrix 
transpose of $X$. Clearly, this vector is again maximally 
entangled iff $X$ is unitary. Hence, the set of maximally 
entangled vectors is a single orbit under unilateral 
unitary transformations, and we even have the choice 
to which side we apply the unitaries. 
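The identity that moves an operator from one side of the maximally entangled vector to the other can be verified directly. The following is a small numerical sketch under the assumption $d = 3$ and the `np.kron` ordering; the random matrix and the helper names are illustrative and not taken from the article.

```python
import numpy as np

d = 3
# Maximally entangled vector (1/sqrt d) * sum_k |kk>.
omega = np.eye(d).reshape(d * d) / np.sqrt(d)

def apply_left(X):
    """(X x 1) Omega."""
    return np.kron(X, np.eye(d)) @ omega

def apply_right(X):
    """(1 x X^T) Omega, with the plain (not conjugate) transpose."""
    return np.kron(np.eye(d), X.T) @ omega

def restriction_A(phi):
    """Reduced operator of |phi><phi| on the first factor."""
    M = phi.reshape(d, d)
    return M @ M.conj().T

rng = np.random.default_rng(0)
X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
U, _ = np.linalg.qr(X)   # QR factorization yields a unitary matrix U
```

Both ways of applying `X` give the same vector, and for a unitary `U` the restriction of the resulting vector is again the maximally mixed state, so the vector stays maximally entangled.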


Teleportation 


Suppose we have an orthonormal basis of maximally 
entangled vectors $\Phi_\alpha \in \mathcal{H}_A \otimes \mathcal{H}_B$. By the remarks 
above, this is equivalent to choosing unitaries 
$U_\alpha$, $\alpha = 1, \ldots, d^2$, such that $\Phi_\alpha = (U_\alpha \otimes \mathbb{1})\Omega$, and 
$\mathrm{tr}(U_\alpha^* U_\beta) = d\,\delta_{\alpha\beta}$. For example, a finite Weyl system 
constitutes such a system of unitaries, which shows 
that we can find realizations in any dimension $d$. 

Suppose that Alice and Bob each own part of a 
system prepared in the state $\Omega$. Then they can 
transmit perfectly the state of a $d$-dimensional 
system, using only classical communication. Classical 
communication by itself would never suffice to 
transmit quantum information, and the entangled 
resource $\Omega$ by itself does not allow the transmission 
of any signal. But the combination of these resources 
does the trick: Alice measures the observable 
associated with the basis $\Phi_\alpha$ on the combined system 
formed by the unknown input and her part of the 
entangled pair. The result $\alpha$ is then transmitted to 
Bob, who performs a $U_\alpha$-rotation on his part of the 
entangled pair, producing the output state of the 
teleportation. One can show by direct calculation 
that this is exactly equal to the input state. 
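The direct calculation can also be carried out numerically. The sketch below simulates the teleportation scheme for pure inputs; it is an illustration under stated assumptions rather than the article's own construction: the unitaries are taken to be the finite Weyl operators $X^m Z^n$ (cyclic shift and phase multiplication), the dimension is $d = 3$, and the names `weyl_unitaries` and `teleport` are hypothetical.

```python
import numpy as np

def weyl_unitaries(d):
    """The d^2 Weyl operators X^m Z^n (shift and clock matrices)."""
    w = np.exp(2j * np.pi / d)
    X = np.roll(np.eye(d), 1, axis=0)          # X|k> = |k+1 mod d>
    Z = np.diag(w ** np.arange(d))             # Z|k> = w^k |k>
    return [np.linalg.matrix_power(X, m) @ np.linalg.matrix_power(Z, n)
            for m in range(d) for n in range(d)]

def teleport(psi, d):
    """Return (probability, Bob's corrected output) for each outcome."""
    omega = np.eye(d).reshape(d * d) / np.sqrt(d)
    # Total vector on (input, Alice, Bob); rows: (input, Alice), columns: Bob.
    total = np.kron(psi, omega).reshape(d * d, d)
    results = []
    for U in weyl_unitaries(d):
        phi = np.kron(U, np.eye(d)) @ omega    # basis vector on (input, Alice)
        bob = phi.conj() @ total               # unnormalized conditional vector
        prob = np.vdot(bob, bob).real
        results.append((prob, U @ bob / np.sqrt(prob)))  # Bob applies U
    return results

d = 3
rng = np.random.default_rng(1)
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi = psi / np.linalg.norm(psi)
results = teleport(psi, d)
```

In this simulation every one of the $d^2$ outcomes occurs with probability $1/d^2$, independently of the input, and Bob's corrected vector coincides with the input for each outcome.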

Note that the resource $\Omega$ is destroyed in this 
process, so that for every transmission we need a 
fresh entangled pair. Less than maximally entangled 
states instead of $\Omega$ lead to less-than-perfect transmission, 
which can be extended to quantitative relations 
between entanglement and channel capacity. 


Special Systems 
Qubits 


For qubit pairs, there is a special basis of maximally 
entangled vectors, which has some amazing 
properties. It consists of the vectors $\Phi_0 = \Omega$, and 
$\Phi_k = i(\sigma_k \otimes \mathbb{1})\Omega$, where $\sigma_k$, $k = 1, 2, 3$, denotes the 
Pauli matrices. Then a vector is maximally entangled 
iff its components are real in this basis, up to a 
common phase. A unitary matrix of determinant 1 
factorizes into $U_A \otimes U_B$ iff its matrix elements in this 
basis are real, up to a common phase. 

For qubit pairs, and also for systems of dimensions 
$2 \otimes 3$, the partial transposition criterion for entanglement is 
necessary and sufficient, as shown by Woronowicz 
and the Horodecki family. 


Orthogonally Invariant States 


A state $\rho$ on $\mathbb{C}^d \otimes \mathbb{C}^d$ is called orthogonally invariant 
if, for any orthogonal matrix $U$ (with respect to some 
fixed product basis), $[\rho, U \otimes U] = 0$. This leaves a 
three-dimensional space of operators, spanned by the 
identity, the permutation $F = \sum_{i,j} |ij\rangle\langle ji|$, and its 
partial transpose $\hat{F} = \sum_{i,j} |ii\rangle\langle jj|$, which is $d$ times the 
projection onto the maximally entangled vector $\Omega$. 
Figure 1 shows the plane of Hermitian operators $\rho$ 
with the described symmetry and $\mathrm{tr}\,\rho = 1$. Convenient 
coordinates are $\mathrm{tr}\,\rho F$ and $\mathrm{tr}\,\rho\hat{F}$. Note that these are 
defined for any density operator, and are also invariant 
under the “twirl” operation 
$\rho \mapsto \int dU\,(U \otimes U)\rho(U \otimes U)^*$, using the Haar measure $dU$, which projects onto 
the orthogonally invariant states. Hence, the diagram 
provides a section as well as a projection of the state 
space. The intersection of the positive operators with 
those having positive partial transpose is the set of PPT 
states, which in this case coincides with the separable 
states. The thin lines correspond to states of higher 
symmetry, namely on the one hand the “isotropic 
states” commuting with $U \otimes \bar{U}$, with $\bar{U}$ the complex 
conjugate of $U$, and on the other the “Werner states” commuting 
with all unitaries $U \otimes U$. Their intersection point is the 
normalized trace. 





Figure 1 The plane of orthogonally invariant unit trace 
Hermitian operators of a $3 \otimes 3$ system. The upright triangle 
gives the positive operators, and the dashed one those with 
positive partial transpose. The shaded area gives the PPT 
states. 


Gaussians 


In general, the entanglement in systems with infinite- 
dimensional Hilbert spaces is more difficult to 
analyze. However, if the system is characterized by 
variables satisfying canonical commutation rela- 
tions, like positions and momenta, or the compo- 
nents of the free quantum electromagnetic field, 
there is a special class of states, which is again 
characterized by low-dimensional matrices. This 
allows the discussion of entanglement questions, in 
a way largely parallel to the finite-dimensional 
theory. 

Let $R_1, \ldots, R_{2f}$ denote the canonical operators, 
where $f$ is the number of degrees of freedom. The 
commutation relations can be summarized as 
$i[R_\mu, R_\nu] = \sigma_{\mu\nu}\mathbb{1}$, where $\sigma$ is the symplectic matrix. 
The operators $R_\mu$ have a common set of analytic 
vectors, and generate the unitary Weyl operators 
$W(a) = \exp(i a^\mu R_\mu)$, which describe the phase space 
displacements. Gaussian states are those making 
$a \mapsto \mathrm{tr}\,\rho W(a)$ a Gaussian function or, equivalently, 
those with Gaussian Wigner function. Up to a global 
displacement, they are completely characterized by 
the covariance matrix 


$\gamma_{\mu\nu} = \mathrm{tr}\,\rho\,(R_\mu R_\nu + R_\nu R_\mu)$ [4] 


The only constraint for a real symmetric matrix to 
be a covariance matrix of a quantum state is that 
$\gamma + i\sigma$ is a positive semidefinite matrix, which is a 
version of the uncertainty relations. 

Now for entanglement theory, we take some of 
the degrees of freedom as Alice’s and some as Bob’s. 
Separability can be characterized in terms of $\gamma$, 
namely by the condition that $\gamma \ge \gamma'$, where $\gamma'$ is the 
covariance matrix of a Gaussian product state. 
Similarly, partial transposition can be implemented 
as an operation on covariance matrices, which 
allows a simple verification of the PPT condition. 
It turns out that as long as one partner has only a 
single degree of freedom, the PPT condition is 
necessary and sufficient for separability, but this 
fails for larger systems. 
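The state condition and the PPT condition on covariance matrices can be tested with a few lines of linear algebra. The following sketch rests on several assumptions that are not spelled out in the text: the canonical operators are ordered as $(Q_1, P_1, Q_2, P_2, \ldots)$ with $[Q, P] = i$, so that the vacuum has covariance matrix $\mathbb{1}$; partial transposition is implemented as a sign flip of Bob's momenta (the standard phase-space form of transposition); and the two-mode squeezed covariance matrix is written in its usual parametrization by a squeezing parameter `r`. All function names are illustrative.

```python
import numpy as np

def symplectic_matrix(f):
    """Symplectic matrix for f modes, ordering (Q1, P1, ..., Qf, Pf)."""
    J = np.array([[0.0, -1.0], [1.0, 0.0]])
    return np.kron(np.eye(f), J)

def is_covariance_matrix(gamma, tol=1e-10):
    """State condition: gamma + i*sigma is positive semidefinite."""
    f = gamma.shape[0] // 2
    h = gamma + 1j * symplectic_matrix(f)
    return bool(np.linalg.eigvalsh(h).min() >= -tol)

def partial_transpose_cov(gamma, f_alice):
    """Flip the sign of all momenta belonging to Bob's modes."""
    lam = np.ones(gamma.shape[0])
    lam[2 * f_alice + 1::2] = -1.0
    L = np.diag(lam)
    return L @ gamma @ L

def two_mode_squeezed(r):
    """Covariance matrix of a two-mode squeezed state (assumed form)."""
    c, s = np.cosh(2 * r), np.sinh(2 * r)
    return np.array([[c, 0, s, 0],
                     [0, c, 0, -s],
                     [s, 0, c, 0],
                     [0, -s, 0, c]], dtype=float)

gamma = two_mode_squeezed(0.5)
is_state = is_covariance_matrix(gamma)
is_ppt = is_covariance_matrix(partial_transpose_cov(gamma, 1))
```

For any nonzero squeezing the matrix passes the state condition but fails it after partial transposition, so the state is entangled; with one mode on each side this PPT test is also sufficient for separability, in line with the statement above.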

The pure Gaussian states allow a normal form 
with respect to local symplectic transformations 
analogous to the Schmidt decomposition. For the 
minimal case of one degree of freedom on either 
side, one obtains a one-parameter family of 
“two mode squeezed states.” Its limit for infinite 
squeezing parameter is the state used by EPR 
(Einstein et al. 1935), which, however, makes 
rigorous mathematical sense only as a singular 
state, that is, a linear functional on B(H), which 
can no longer be represented as the trace with a 
density operator. 


Multipartite Stars 


A key feature of entanglement in a multipartite 
system is usually referred to as “monogamy”: when 
Alice shares a highly entangled state with Bob, her 
system cannot also be highly entangled with Bill. 
More formally, suppose that a multipartite state for 
systems $A, B_1, \ldots, B_n$ is given, such that the restriction 
to each pair $AB_i$ is the same bipartite state $\rho$. 
Then as $n$ becomes larger, the existence of such a 
star-shaped extension constrains $\rho$ to become less 
and less entangled. In fact, as $n \to \infty$, this condition 
is equivalent to the separability of $\rho$. 


Open Problems 


Recall from the introduction the following chain of 
inclusions: 


separable states $\subset$ states with vanishing key rate 
$\subset$ PPT states 
$\subset$ undistillable states 
$\subset$ all states 


The second and fourth inclusions are strict, but for 
the first and third one might have equality, for all 
we know. Especially for the third inclusion, this is a 
long-standing problem. 

Finally, we would like to point out that qualita- 
tive and conceptual aspects of entanglement are 
surveyed by Bub (2001), Popescu and Rohrlich 
(1998), and Horodecki et al. (2001). For quantita- 
tive aspects see Entanglement Measures. 


See also: Capacities Enhanced by Entanglement; 
Capacity for Quantum Information; Channels in Quantum 
Information Theory; Entanglement Measures; Entropy 
and Quantitative Transversality. 
Introduction 


Entanglement, or quantum correlation, is one of 
the central concepts in quantum information 
theory. Its theory can be roughly separated into 
three parts. The first is qualitative, that is, it 
addresses the question “Is this state entangled or 
not?” The second, comparative part asks “Is this 
state more entangled than that state?,” and finally 
the quantitative theory asks “How entangled is this 
state?,” and gives its answers in the form of 
entanglement measures assigning a number to 
every state. Quantitative questions come up natu- 
rally whenever entanglement is used as a resource 
for tasks of quantum information processing. For 
example, entangled states are in a way the fuel for 
the processes of teleportation and dense coding: in 
each transmission step a maximally entangled pair 
system is required, and cannot be used for a further 
transmission. The process also works with less than 
maximally entangled states, but then it also 
becomes less efficient. Since entangled states 
created in the laboratory typically have imperfec- 
tions, it becomes important to understand the rates 
at which imperfectly entangled states may be 
distilled to maximally entangled ones, and this 
rate is a direct measure of the usefulness of the 
given state for many purposes. The quantitative, 
task-related turn is a new development in the study 
of the foundations of quantum mechanics. It has 
been imported from classical information theory, 
where this way of thinking has been standard for a 
long time. This combination gives quantum 
information theory its particular flavor. 


In this article we consider the comparative and 
quantitative aspects of entanglement. The historical 
aspects and qualitative theory are treated in a separate 
article (see Entanglement), to which we refer for basic 
notions and notations. The example of teleportation 
suggests close links between quantitative entanglement 
theory and the theory of capacity (Bennett et al. 1996), 
that is, of the transfer rate of quantum information 
through a given channel. These connections are 
described in Quantum Channels: Classical Capacity. 

We follow the notations of the basic article on 
entanglement (see Entanglement). In particular, $\Theta$ 
denotes the transpose operation, and $(\mathrm{id} \otimes \Theta)$ the 
partial transpose. A state is called “PPT” if its 
partial transpose is positive. The two physicists 
operating the laboratories in which the two parts 
of a bipartite system are kept are called Alice and 
Bob, as usual. The restriction of a state $\rho$ to Alice’s 
subsystem is denoted by $\rho^A$. 


Comparative Entanglement 
and Protocols 


Protocols 


In this section we introduce relations of the kind “state 
$\rho_1$ is more entangled than $\rho_2$.” We take this to mean 
that $\rho_2$ can be obtained by applying to $\rho_1$ some 
operations which “cannot create entanglement.” The 
definition of a class of operations of which this can be 
claimed then defines the comparison. It turns out that 
there are different choices for the class of such 
operations, depending on the resources available for 
the transformation steps. The class of operations is 
usually referred to as a protocol. 

Certainly local operations performed separately 
by Alice and Bob cannot increase entanglement. 
Alice and Bob might have to make some choices, 
and even if they make these according to a 
prearranged scheme, by using a shared table of 
random numbers, entanglement will not be generated. 
In this restrictive protocol, which we abbreviate 
by LO, for “local operations,” no 
communication is allowed. It is clear that just 
discarding the initial state and preparing a new one 
based on the random instruction allows Alice and 
Bob to make any separable state, so these states 
come out as the “least entangled” ones for this and 
any richer protocol. 

Next we might allow classical communication 
from Alice to Bob. That is, Bob’s decision to 
perform some operation in his laboratory is allowed 
to depend on measuring results obtained by Alice in 
an earlier stage. Of course, Alice is not allowed to 
send quantum systems, since in this case she might 
just send a particle entangled to one of her own, and 
any state could be generated. This protocol is 
referred to as “local operations and one-way 
classical communication” (LOWC). Obviously, we 
might also allow Bob to talk back, arriving at “local 
operations and classical communication” (LOCC). 
This is the protocol underlying most of the work in 
entanglement theory. 

The drawback of the LOCC protocol is that its 
operations are extremely difficult to characterize: an 
LOCC operation can take many rounds, and there is 
no way to simplify a general operation to some kind 
of standard form. This is the main reason why other 
protocols have been considered. For example, it is 
obvious that an LOCC operation can be written as a 
sum of tensor products of local operations, in a form 
reminiscent of the definition of separability. However, 
such “separable superoperators” may fall 
outside LOCC. Another property easily checked for 
all LOCC operations is that PPT states go into PPT 
states. The protocol “PPT-preserving operations” 
(PPTP) can also be characterized as the set of 
channels $T$ for which $(\mathrm{id} \otimes \Theta)\,T\,(\mathrm{id} \otimes \Theta)$ is positive 
(although not necessarily completely positive). This 
condition is relatively easy to handle mathematically, 
so that the best way to show that some $\rho_1$ 
cannot be converted to $\rho_2$ by LOCC is often to show 
that this transition is impossible under PPTP. The 
drawback of the PPTP protocol is that it may create 
some entanglement after all, namely arbitrary PPT 
states. So it properly belongs to a modified 
entanglement theory in which separability is 
replaced by the PPT condition. 


Converting Pure States and Majorization 


The entanglement ordering is exactly known for 
pure states due to a famous theorem by Nielsen 
(1999): a pure state $\rho_1$ is more entangled than a pure 
state $\rho_2$ under the LOCC protocol iff the restriction 
$\rho_1^A$ is more mixed than the restriction $\rho_2^A$ in the sense 
of majorization of spectra (i.e., for every $k$ the sum 
of the $k$ largest eigenvalues of $\rho_1^A$ is less than or equal to the 
corresponding sum for $\rho_2^A$). Equivalently, there is a 
doubly stochastic channel (completely positive linear 
map preserving both the identity and the trace 
functional) taking $\rho_2^A$ to $\rho_1^A$. 

An interesting aspect of this theory is the 
phenomenon of catalysis: it may happen that 
although $\rho_1$ cannot be converted by LOCC to 
$\rho_2$, $\rho_1 \otimes \sigma$ can be converted to $\rho_2 \otimes \sigma$. The “catalyst” 
$\sigma$ is a resource borrowed at the beginning of the 
transformation, and is returned unchanged afterwards. 
The order relation allowing such catalysts is 
yet to be fully characterized. 
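The majorization condition is straightforward to test on Schmidt spectra. The sketch below is an illustration and not part of the article: the function names are hypothetical, and the three spectra are a frequently quoted textbook-style example of catalysis whose majorization relations are simply checked numerically here.

```python
import numpy as np

def majorized_by(x, y, tol=1e-12):
    """True if x is majorized by y: all partial sums of the decreasingly
    ordered x are at most those of y (spectra are zero-padded)."""
    n = max(len(x), len(y))
    xs = np.sort(np.pad(np.asarray(x, dtype=float), (0, n - len(x))))[::-1]
    ys = np.sort(np.pad(np.asarray(y, dtype=float), (0, n - len(y))))[::-1]
    return bool(np.all(np.cumsum(xs) <= np.cumsum(ys) + tol))

def locc_convertible(spec1, spec2):
    """Majorization test for converting pure state 1 into pure state 2."""
    return majorized_by(spec1, spec2)

# Schmidt spectra of two pure states and of a candidate catalyst.
x = [0.4, 0.4, 0.1, 0.1]
y = [0.5, 0.25, 0.25, 0.0]
c = [0.6, 0.4]

direct = locc_convertible(x, y)
reverse = locc_convertible(y, x)
# The Schmidt spectrum of a tensor product is the product of the spectra.
catalyzed = locc_convertible(np.kron(x, c), np.kron(y, c))
```

Here neither spectrum majorizes the other, so no exact LOCC conversion is possible in either direction, yet the conversion becomes possible once the catalyst is attached to both sides.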


Asymptotic Conversion 


In many applications we are not interested in exact 
conversion of one state to another, but are quite 
satisfied if the transformation can be done with a 
small controlled error. In particular, when we ask 
for the achievable conversion rate between many 
copies of the states involved, we allow small errors, 
but require the errors to go to zero. Given any 
protocol, and states $\rho_1, \rho_2$, we say that $\rho_1$ can be 
converted to $\rho_2$ with rate $r$ if, for all sufficiently 
large $n$, there is a channel of the protocol, which 
takes $n$ copies of $\rho_1$, that is, the state $\rho_1^{\otimes n}$, to a 
state $\rho'$ which approximates roughly $m \approx rn$ copies 
of $\rho_2$, in the sense that $m \ge rn$, and the trace norm 
$\|\rho' - \rho_2^{\otimes m}\|$ goes to zero. 

Of course, one is usually interested in the 
supremum of the achievable conversion rates, which 
we call simply the maximal conversion rate. In 
particular, when $\rho_2$ is the maximally entangled pure 
state of a qubit pair (usually called the “singlet”), the 
maximal rate is called the distillable entanglement 
$E_D(\rho_1)$. In the other direction, when $\rho_1$ is the singlet, 
we call the inverse of the maximal conversion rate the 
entanglement cost $E_C(\rho_2)$. These are two of the key 
entanglement measures to be discussed below. 

In general, $E_D(\rho) \le E_C(\rho)$, and the inequality can be 
strict, so the asymptotic conversion between different 
states is usually not reversible. It is reversible, however, 
for pure states, and one finds 


$E_D(\rho) = E_C(\rho) = S(\rho^A)$ [1] 


where $S(\rho) = -\mathrm{tr}\,\rho \log_2(\rho)$ denotes the von Neumann 
entropy (see Entropy and Quantitative Transversality) 
based on the binary logarithm. 

Since one can do the conversion between different 
pure states via singlets, it is clear that the maximal 
conversion rate from a pure state $\rho_1$ to a pure state $\rho_2$ 
equals $S(\rho_1^A)/S(\rho_2^A)$. Hence, in contrast to the ordering 
given by Nielsen’s theorem, all pure states are 
interconvertible, and the ordering is described by a 
single number. For this simplification, the allowance 
of small errors is crucial. Without asymptotically 
small but nonzero errors, it would also be impossible 
to obtain singlets from any generic mixed state. 
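For pure states the entropy of the restriction, and with it the asymptotic conversion rate, can be computed from the Schmidt spectrum. The following sketch is illustrative only; it assumes state vectors stored in the `np.kron` ordering, uses the binary logarithm as in eqn [1], and the function names are hypothetical.

```python
import numpy as np

def schmidt_spectrum(phi, dA, dB):
    """Eigenvalues of the restriction of |phi><phi| to Alice's system."""
    s = np.linalg.svd(np.asarray(phi).reshape(dA, dB), compute_uv=False)
    return s ** 2

def entanglement_entropy(phi, dA, dB):
    """Von Neumann entropy (base 2) of the restriction of a pure state."""
    r = schmidt_spectrum(phi, dA, dB)
    r = r[r > 1e-15]                      # 0 * log 0 is taken as 0
    return float(-np.sum(r * np.log2(r)))

def pure_conversion_rate(phi1, phi2, dA, dB):
    """Maximal asymptotic rate for converting pure state 1 into pure state 2."""
    return entanglement_entropy(phi1, dA, dB) / entanglement_entropy(phi2, dA, dB)

singlet = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)
partial = np.array([np.sqrt(0.8), 0.0, 0.0, np.sqrt(0.2)])

S_singlet = entanglement_entropy(singlet, 2, 2)
S_partial = entanglement_entropy(partial, 2, 2)
rate = pure_conversion_rate(partial, singlet, 2, 2)
```

Since the singlet has entropy one bit, the rate for distilling singlets from the partially entangled vector is just the entropy of its restriction, which is strictly less than one.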


Entanglement Measures 
Properties of Interest 


We now consider more systematically functions 
$E: \mathcal{S} \to \mathbb{R}$ defined on the state spaces of arbitrary 
bipartite quantum systems. When can we regard this 
as a measure of entanglement? The minimal requirements 
are that $E(\rho) \ge 0$ for all $\rho$, and $E(\rho) = 0$ for 
separable states. Since the choice of local bases 
should be irrelevant, we will require 
$E((U_A \otimes U_B)\rho(U_A \otimes U_B)^*) = E(\rho)$ for unitaries $U_A, U_B$. We 
also normalize all entanglement measures so that 
$E(\sigma) = 1$, when $\sigma$ is the maximally entangled state of 
a pair of qubits. Beyond that, consider the following: 


1. $\vee$ (Convexity: $E(\sum_\alpha p_\alpha \rho_\alpha) \le \sum_\alpha p_\alpha E(\rho_\alpha)$) Starting 
from any $E$, possibly defined only on a subset 
containing the pure states, we can enforce this 
property by taking the convex hull (or “roof”) 
$\mathrm{co}\,E$, defined as the largest convex function, 
which is $\le E$ wherever it is defined. 

2. M (Monotonicity) Suppose that some LOCC 
protocol applied to $\rho$ returns some classical 
parameter $\alpha$ with probability $p_\alpha$, and in that 
case a bipartite state $\rho_\alpha$. Then $\sum_\alpha p_\alpha E(\rho_\alpha) \le E(\rho)$. 

3. $A^-$ (Subadditivity: $E(\rho_1 \otimes \rho_2) \le E(\rho_1) + E(\rho_2)$) 
In this and the following, the tensor products of 
bipartite states are to be reordered from 
$A_1B_1A_2B_2$ to $(A_1A_2)(B_1B_2)$, so the separation 
into Alice’s and Bob’s subsystems is respected. 

4. $A^+$ (Superadditivity: $E(\rho_1 \otimes \rho_2) \ge E(\rho_1) + E(\rho_2)$) 

5. $A^{++}$ (Strong superadditivity: $E(\rho_{12}) \ge E(\rho_1) + E(\rho_2)$) 
Here $\rho_i$ denotes the restriction of a general 
state $\rho_{12}$ to the $i$th subsystem. 

6. $A^\infty$ (Weak additivity: $E(\rho^{\otimes n}) = nE(\rho)$) This can 
be enforced by regularization, going from $E$ to 


$E^\infty(\rho) = \lim_{n\to\infty} \frac{1}{n} E(\rho^{\otimes n})$ 


Note that this is implied by additivity, which is 
the conjunction of $A^+$ and $A^-$. 

7. C (Continuity) Here it is crucial to postulate the 
right kind of dimensional dependence. A good 
choice is to demand that 
$|E(\rho_1) - E(\rho_2)| \le \log d \; f(\|\rho_1 - \rho_2\|)$, where $f$ is some function with 
$\lim_{t\to 0} f(t) = 0$. 

8. L (Lockability) A property related to, but not 
equal to, discontinuity: a measure is called 
lockable, if the loss (i.e., the tracing out) of a 
single qubit by Alice or Bob can make $E(\rho)$ drop 
by an arbitrarily large amount. 


The Collection of Entanglement Measures 


The following are the main entanglement measures 
discussed in the literature. Note that all measures 
defined by conversion rates in principle depend on 
the protocol used. Unless otherwise stated, we will 
only consider LOCC. For every function we list in 
brackets the properties which are known. 


1. $E_F$ (Entanglement of formation [$\vee$, M, $A^-$, C, L]) 
This is defined as the convex hull of the 
entanglement of pure states given by eqn [1]. 
Closed formulas exist for qubit pairs 
(Wootters 1998), for orthogonally invariant states 
(Vollbrecht and Werner 2001) (see 00510), and for 
permutation symmetric 2-mode Gaussians. One 
of the big open questions is whether $E_F$ is 
additive. This is equivalent to $E_F$ satisfying $A^{++}$, 
and also to the additivity of Holevo’s $\chi$-capacity 
of quantum channels (see Quantum Channels: 
Classical Capacity). 

2. $E_C$ (Entanglement cost [$\vee$, M, $A^-$, $A^\infty$, C, L]) This 
was already defined in the section “Asymptotic 
conversion.” It has been shown to be equal to the 
regularization of $E_F$, that is, $E_C = E_F^\infty$. If $E_F$ 
turned out to be additive, we would thus have $E_C = E_F$. 

3. $E_D$ (Distillable entanglement [M, $A^+$, $A^\infty$]) 
Again, see the section “Asymptotic conversion.” 
This is one of the important measures from the 
practical point of view, but notoriously difficult 
to compute explicitly. Convexity of $E_D$ is an open 
problem related to the existence of bound 
entangled, but not PPT states. 

4. $E_\to$ (One-way distillable entanglement [M, $A^+$, 
$A^\infty$]) Same as $E_D$, but restricting to the LOWC 
protocol. Obviously, $E_\to(\rho) \le E_D(\rho)$. There are 
examples of proper inequality “<” (Bennett et al. 
1996). $E_\to$ is more directly linked to quantum 
capacity than $E_D$, which in turn corresponds to the 
quantum capacity, allowing classical backwards 
communication as a resource. 

5. $E_N$ (Logarithmic negativity [M, $A^-$, $A^+$, L]) This 
is a quantitative companion of the PPT criterion: one 
sets $E_N(\rho) = \log_2 \|(\mathrm{id} \otimes \Theta)(\rho)\|$, where the norm is 
the trace norm. For PPT states $\rho$, this norm is equal to the 
trace, and $E_N(\rho) = 0$. If the partial transpose has 
negative eigenvalues, the sum of the absolute 
values of its eigenvalues is $> 1$, and $E_N(\rho) > 0$. $E_N$ is an easily 
computed upper bound to $E_D$, but gives the wrong 
value for nonmaximally entangled pure states. 

6. $E_R$ (Relative entropy of entanglement 
[$\vee$, M, $A^-$]) This measure (Vedral et al. 1997) 
is motivated geometrically: it is simply the 
relative entropy distance of $\rho$ to the separable 
subset: $E_R(\rho) = \inf_\sigma S(\rho\|\sigma)$, where $\sigma$ ranges over 
all separable states. $E_R$ is an upper bound to $E_D$. 
However, it can be improved by taking the 
distance to the PPT states rather than the 
separable states, and by combining with $E_N$, in 
the following way: 

7. $E_B$ (The Rains bound [$\vee$, M, $A^-$, C]) Following 
Rains (2001), we set 


$E_B(\rho) = \inf_\sigma \left( S(\rho\|\sigma) + E_N(\sigma) \right)$ 


where the infimum is over all states $\sigma$. This is 
still an upper bound to $E_D$, although clearly 
smaller than both $E_R$ (take only separable $\sigma$) 
and $E_N$ (take $\sigma = \rho$). No example of $E_D(\rho) < 
E_B(\rho)$ is known, but any bound entangled non-PPT 
state would be such an example. 

8. $E_S$ (Squashed entanglement [$\vee$, M, $A^-$, $A^{++}$, 
C, L]) This measure, introduced by Christandl 
and Winter (2004), amazingly has all the good 
properties, but is as difficult to compute as any 
of the other measures. $E_S(\rho^{AB})$ is the infimum 
over the entropy combination 


$S(\rho^{AC}) + S(\rho^{BC}) - S(\rho^{ABC}) - S(\rho^{C})$ 


over all extensions $\rho^{ABC}$ of the given state $\rho^{AB}$ to 
a system enlarged by a part C, where the density 
operators in the above expression are the 
restrictions of $\rho^{ABC}$ to the subsystems indicated 
(with the normalization adopted above, the infimum 
is to be multiplied by $1/2$). 

9. $E_K$ (Key rate [$\vee$, M, $A^-$]) The bit rate at which 
secret key can be generated is certainly at least as large 
as $E_D$, since distillation is one way to do it. It 
is, in general, strictly larger, since there are 
undistillable states with positive key rate. 

10. $E_c$ (Concurrence [$\vee$]) This measure was 
originally only defined for qubit pairs, as a 
step in Wootters’ (1998) formula for $E_F$ in this 
case. It has an extension to arbitrary dimensions 
(Rungta et al. 2001), namely the convex hull of 
the function $c(|\psi\rangle\langle\psi|) = \sqrt{2(1 - \mathrm{tr}(\rho^2))}$, where 
$\rho = (|\psi\rangle\langle\psi|)^A$ is the reduced density operator. 
Both upper and lower bounds exist in the 
literature. The main interest in this measure 
stems from the fact that it has interesting 
extensions to the multipartite case. 


To conclude, we would like to point out that 
many of the themes discussed in this article were set 
by Bennett et al. (1996); their article is worth 
reading even today. Good review articles covering
entanglement measures, with more complete references,
are Plenio and Virmani (2005), Bruß (2002),
and Donald et al. (2002).


See also: Entanglement; Entropy and Quantitative 
Transversality; Quantum Channels: Classical Capacity. 
Introduction


A mathematical law for a physical phenomenon,
describing the variation of a value $y \in \mathbb{R}$ in terms
of parameters $x_i \in \mathbb{R}$, $i \in \{1,\ldots,n\}$, is usually
given:


1. in the simplest cases (and hence in exceptional
cases), by an explicit functional equation
$y = F(x_1,\ldots,x_n)$, or

2. by an implicit equation $G(y,x_1,\ldots,x_n) = 0$, or

3. more generally, by a partial differential
equation,


laly 
( OX, ens OXi,, Oe, ats 


— 0 + initial values 


In the first case, the exact equation $y = F(x_1,\ldots,x_n)$
fully describes the behavior of $y$ as $(x_1,\ldots,x_n)$
vary, but in practice this information is more than
is needed: using the Taylor formula, knowledge
of the value $y^0$ at some point $(x_1^0,\ldots,x_n^0)$ and of the
value of

$$\nabla F_{(x_1^0,\ldots,x_n^0)} = \Bigl(\frac{\partial F}{\partial x_1},\ldots,\frac{\partial F}{\partial x_n}\Bigr)_{(x_1^0,\ldots,x_n^0)}$$

is enough to predict, with controlled accuracy, by
linear approximation, the behavior of $y$ for parameters
$(x_1,\ldots,x_n)$ close to $(x_1^0,\ldots,x_n^0)$.
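
The linear approximation just described can be illustrated with a minimal Python sketch. The law `F`, the base point, and the step size `h` are hypothetical choices made only for illustration, and the gradient is estimated by central differences rather than computed exactly.

```python
import math


def gradient(F, x0, h=1e-6):
    """Central-difference estimate of the gradient of F at the point x0."""
    grad = []
    for i in range(len(x0)):
        xp = list(x0)
        xm = list(x0)
        xp[i] += h
        xm[i] -= h
        grad.append((F(xp) - F(xm)) / (2 * h))
    return grad


def linear_prediction(y0, grad, x0, x):
    """First-order Taylor prediction y0 + grad . (x - x0)."""
    return y0 + sum(g * (xi - x0i) for g, xi, x0i in zip(grad, x, x0))


def F(x):
    """Hypothetical explicit law y = x1^2 * sin(x2), used only as an example."""
    return x[0] ** 2 * math.sin(x[1])


x0 = [1.0, 0.5]
grad0 = gradient(F, x0)
x_near = [1.01, 0.52]
predicted = linear_prediction(F(x0), grad0, x0, x_near)
error = abs(predicted - F(x_near))  # second order in the displacement
```

Only the value `F(x0)` and the gradient at `x0` enter the prediction; the error shrinks quadratically as `x_near` approaches `x0`.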

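In case (2), both the parameters $(x_1,\ldots,x_n)$ and
the value $y$ belong to the set $M = \{(y,x_1,\ldots,x_n) \in \mathbb{R}^{n+1};\ G(y,x_1,\ldots,x_n) = 0\}$,
and we would like to
know whether or not this set may be (at least locally
around one of its points $(y^0,x_1^0,\ldots,x_n^0)$) the graph of
some function $(x_1,\ldots,x_n) \mapsto y = F(x_1,\ldots,x_n)$, as in
case (1). Using the implicit function theorem, we
may try to reduce our equation to the explicit
equation of (1), and then perform a linear approximation
involving $\nabla F_{(x_1^0,\ldots,x_n^0)}$. Assuming that a priori
we know a value $y^0$ such that, for
$(x_1^0,\ldots,x_n^0)$, $(y^0,x_1^0,\ldots,x_n^0) \in M$, this reduction is
possible, locally around $(y^0,x_1^0,\ldots,x_n^0)$, under the
condition that

$$\frac{\partial G}{\partial y}(y^0,x_1^0,\ldots,x_n^0) \neq 0$$

In this situation

$$\nabla F_{(x_1^0,\ldots,x_n^0)} = -\Bigl(\frac{\partial G}{\partial y}\Bigr)^{-1}\Bigl(\frac{\partial G}{\partial x_1},\ldots,\frac{\partial G}{\partial x_n}\Bigr)\Big|_{(y^0,x_1^0,\ldots,x_n^0)}$$

Now, as is normally the case when they come
from observation, the variables $x_1,\ldots,x_n$ are known
only up to an estimate, and one sees that the larger

$$\frac{\bigl\|\bigl(\partial G/\partial x_1,\ldots,\partial G/\partial x_n\bigr)\bigr\|}{\bigl|\partial G/\partial y\bigr|}\ \text{ at } (y^0,x_1^0,\ldots,x_n^0)$$

is, the worse the estimate on $y$ near $y^0$.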

Furthermore, assuming that $M$ is locally the graph
of a function $(x_1,\ldots,x_n) \mapsto y = F(x_1,\ldots,x_n)$, for a
given $(x_1,\ldots,x_n)$ the exact expression of
$y = F(x_1,\ldots,x_n)$, and consequently the exact value
of $\nabla F_{(x_1,\ldots,x_n)}$, is not possible to obtain; we have to
approach it using an algorithm (classically the
Newton algorithm), and the closer

$$\frac{\partial G}{\partial y}(y,x_1,\ldots,x_n)$$

is to 0, the more such an algorithm is unstable.
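
A small Python sketch can make this loss of control concrete. It assumes a hypothetical implicit law, the unit circle $G(y,x) = y^2 + x^2 - 1 = 0$, solves it for $y$ by Newton's method, and evaluates the factor $|\partial G/\partial x| / |\partial G/\partial y|$ by which an error on $x$ is amplified into an error on $y$. The function names are illustrative and not taken from the article.

```python
import math


def G(y, x):
    """Hypothetical implicit law: the unit circle y^2 + x^2 - 1 = 0."""
    return y * y + x * x - 1.0


def dG_dy(y, x):
    return 2.0 * y


def dG_dx(y, x):
    return 2.0 * x


def newton_solve_y(x, y_start, tol=1e-12, max_iter=100):
    """Solve G(y, x) = 0 for y by Newton iteration started at y_start."""
    y = y_start
    for _ in range(max_iter):
        slope = dG_dy(y, x)
        if slope == 0.0:
            raise ZeroDivisionError("dG/dy vanishes: Newton step undefined")
        step = G(y, x) / slope
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("Newton iteration did not converge")


def sensitivity(y, x):
    """|dy/dx| on the graph, from the implicit function theorem."""
    return abs(dG_dx(y, x) / dG_dy(y, x))


y_far = newton_solve_y(0.6, 1.0)     # dG/dy = 1.6 at the solution
y_near = newton_solve_y(0.999, 1.0)  # dG/dy is close to 0 at the solution
```

At `x = 0.6` the sensitivity is 0.75, while at `x = 0.999`, where $\partial G/\partial y$ is close to 0, it exceeds 20: the same uncertainty on $x$ produces a much larger uncertainty on $y$.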

Finally, in case (3), skipping technical details,
we encounter the same type of difficulties: we have
to avoid small values for some gradient functions at
a given point, in order to obtain, locally at some
point $(x_1^0,\ldots,x_n^0)$, in a stable way, reliable information
on $y$ in terms of $(x_1,\ldots,x_n)$.

To sum up, the prediction of a physical phenom- 
enon by a mathematical law greatly depends not 
only on the noncancellation of some gradient 
functions, but, as we deal with approximations and 
algorithms, on how different those gradient func- 
tions are from zero. 

This principle, of course, extends directly to 
applied problems (see the last of our examples in 
the final section): being close to singular values 
essentially means that the control (e.g., of the 
positions of some device by a manipulator) is poor. 

The geometric counterpart of this analytic phenomenon
is called "transversality": the condition
for some function $G$ to have a nonzero partial
derivative

$$\frac{\partial G}{\partial y}(y^0,x_1^0,\ldots,x_n^0)$$

is equivalent to the condition

$$\mathbb{R}\cdot\nabla G_{(y^0,x_1^0,\ldots,x_n^0)} \oplus Ox_1\cdots x_n = \mathbb{R}^{n+1}$$

or to the condition

$$T_{(y^0,x_1^0,\ldots,x_n^0)}M \oplus Oy = \mathbb{R}^{n+1}$$

where $T_aM$ is the tangent space of $M$ at $a \in M$.

Figure 1 Transversality of the manifold M and Oy.

We say that $\nabla G_{(y^0,x_1^0,\ldots,x_n^0)}$ is transverse to the
space of parameters $Ox_1\cdots x_n$ at $(y^0,x_1^0,\ldots,x_n^0)$, or
that $M$ is transverse to $Oy$ at $(y^0,x_1^0,\ldots,x_n^0)$.
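For some quantity $\varepsilon > 0$, the condition that

$$\frac{\bigl\|\bigl(\partial G/\partial x_1,\ldots,\partial G/\partial x_n\bigr)\bigr\|}{\bigl|\partial G/\partial y\bigr|} \geq \frac{1}{\varepsilon}\ \text{ at } (y^0,x_1^0,\ldots,x_n^0)$$

means that the angle $\alpha = (\nabla G_{(y^0,x_1^0,\ldots,x_n^0)}, Ox_1\cdots x_n)$
or the angle $\beta = (T_{(y^0,x_1^0,\ldots,x_n^0)}M, Oy)$ is smaller than $\varepsilon$
(see Figure 1).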
Our purpose in the sequel is to indicate how we 
can quantify the situations described above (the 
defect of transversality), in order to generically or 
almost generically avoid them with quantified 
accuracy. 


Quantifying Transversality 


Given two submanifolds $M$ and $N$ of the Euclidean
space $\mathbb{R}^n$, we can measure the transversality defect
of $(M,N)$ at $x \in \mathbb{R}^n$ with a differential criterion,
both analytical and geometric.

Let us first introduce some notation. For a given
linear map $L: \mathbb{R}^n \to \mathbb{R}^p$, the image by $L$ of the unit
ball of $\mathbb{R}^n$ is an $r$-dimensional ellipsoid in $\mathbb{R}^p$ with
semi-axes denoted by $\lambda_1(L) \geq \cdots \geq \lambda_r(L)$, where $r$ is
the rank of $L$. For $r < p$, we denote $\lambda_{r+1}(L) = \cdots = \lambda_p(L) = 0$.

Now, let $x \in M \cap N$; let $\pi: \mathbb{R}^n \to T_xN^{\perp}$ be the
projection onto the orthogonal space of $T_xN$, $p = n - \dim(N)$,
and $\pi_M$ the restriction of $\pi$ to $M$.
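
The semi-axes of the image of the unit ball under a linear map $L$ are the singular values of $L$. As a hedged illustration, the following Python sketch computes them in closed form for a $2\times 2$ matrix (an assumption made to keep the example dependency-free) and compares them with a pair of thresholds $(\varepsilon_1,\varepsilon_2)$. The function names are hypothetical.

```python
import math


def semi_axes_2x2(L):
    """Semi-axes lambda_1 >= lambda_2 >= 0 of the image of the unit disk under L.

    They are the square roots of the eigenvalues of L^T L, obtained here from
    the trace and determinant of that symmetric 2x2 matrix.
    """
    (a, b), (c, d) = L
    trace = a * a + b * b + c * c + d * d   # trace of L^T L
    det = a * d - b * c                     # det(L^T L) = det(L)^2
    disc = math.sqrt(max(trace * trace - 4.0 * det * det, 0.0))
    lam1 = math.sqrt((trace + disc) / 2.0)
    lam2 = math.sqrt(max((trace - disc) / 2.0, 0.0))
    return lam1, lam2


def within_thresholds(lams, eps):
    """True when lambda_i <= eps_i for every index i."""
    return all(lam <= e for lam, e in zip(lams, eps))


# A map that almost collapses the second direction: semi-axes close to (3, 0.01).
lams = semi_axes_2x2([[3.0, 0.0], [0.0, 0.01]])
almost_degenerate = within_thresholds(lams, (5.0, 0.05))
```

A small last semi-axis signals that $L$ is close to a map of lower rank, which is the quantity the thresholds are designed to detect.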





Definitions 


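We say that $(M,N)$ is transverse at $x$, and we denote
it by $M \pitchfork_x N$, if and only if $\pi_M$ is a submersion at $x$,
that is, $D(\pi_M)_{(x)}: T_xM \to T_xN^{\perp}$ is onto.

For a given $\Lambda = (\varepsilon_1,\ldots,\varepsilon_p)$, $\varepsilon_1 \geq \cdots \geq \varepsilon_p$, we say
that $(M,N)$ is $\Lambda$-nontransverse at $x$, and we denote it
by $M \not\pitchfork^{\Lambda}_x N$, if and only if $\lambda_i(D(\pi_M)_{(x)}) \leq \varepsilon_i$
$\forall i \in \{1,\ldots,p\}$.

With these notations, we have: $M \not\pitchfork_x N$ (i.e., $(M,N)$
nontransverse at $x$) if and only if $x \in M \cap N$ and $M \not\pitchfork^{\Lambda}_x N$
for some $\Lambda$ with $\varepsilon_p = 0$, and the more $(M,N)$ is $\Lambda$-nontransverse,
with $\Lambda$ close to $(\varepsilon_1,\ldots,\varepsilon_{p-1},0)$, the less
the manifolds $M$ and $N$ seem transverse at $x \in M \cap N$
(see Figure 2).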

The final step in our formalism, to give a convenient
quantitative approach to transversality, is the following:
let $X$, $Y$ be two (real) Riemannian manifolds, $f: X \to Y$
a (smooth) mapping, $N \subset Y$ a submanifold of $Y$ with
codimension $p$ in $Y$, $x \in X$ with $y = f(x) \in N$, and $\Phi: O \to \mathbb{R}^p$ a
submersion, where $O$ is an open neighborhood of $y$ in
$Y$, such that $\Phi^{-1}(\{0\}) = N \cap O$. Then we say that $(f,N)$
is transverse at $x$, and we denote it by $f \pitchfork_x N$, if and only
if $\Phi \circ f$ is submersive at $x$.

For a given $\Lambda = (\varepsilon_1,\ldots,\varepsilon_p)$, $\varepsilon_1 \geq \cdots \geq \varepsilon_p$, we say
that $(f,N)$ is $(\Phi,\Lambda)$-nontransverse at $x$, and we
denote it by $f \not\pitchfork^{\Phi,\Lambda}_x N$, if and only if $\lambda_i(D[\Phi \circ f]_{(x)}) \leq \varepsilon_i$,
$\forall i \in \{1,\ldots,p\}$.

Clearly, we recognize the definition of transversality
and of $\Lambda$-nontransversality of two submanifolds
$M$, $N$ of $\mathbb{R}^n$ by letting $f: M \to \mathbb{R}^n$ be the
inclusion and $\Phi = \pi_M$ (for more details on transversality
and stability, see, e.g., Golubitsky and Guillemin
(1973)).

With the definitions and notations above, our 
general problem may be posed as follows: 


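For a $C^k$-regular ($k \in \mathbb{N} \cup \{\infty\}$) mapping $f: \mathbb{R}^n \to \mathbb{R}^p$
and a given $\Lambda = (\varepsilon_1,\ldots,\varepsilon_p)$, how large is the set

$$\Delta(f,B_r,\Lambda) = f(\Sigma(f,B_r,\Lambda)),$$

where

$$\Sigma(f,B_r,\Lambda) = \{x \in B_r \subset \mathbb{R}^n;\ \lambda_i(Df_{(x)}) \leq \varepsilon_i\ \forall i \in \{1,\ldots,p\}\}$$

and $B_r$ is a ball of radius $r$ in $\mathbb{R}^n$?


Figure 2 Almost-nontransversality of M and N.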


The “bad” set $\Delta(f,B_r,\Lambda)$ is called the set of
$\Lambda$-almost critical values of $f$ (restricted to $B_r$). Our
purpose is to show that one can control its size in
terms of $k$ and $\Lambda$. However, before explicitly stating
quantitative results, let us make precise what we understand
by “big set” or by “size of a set.”


Measure and Dimensions 


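We have a very natural way to measure a subset $A$
of a metric space. To do this, we consider a real
number $\alpha > 0$ and, for $\delta > 0$, we denote

$$\mathcal{A}_\delta = \Bigl\{(D_i)_{i\in\mathbb{N}};\ A \subset \bigcup_{i\in\mathbb{N}} D_i \text{ and } |D_i| \leq \delta\Bigr\}$$

where $|D_i|$ is the diameter of $D_i$,

$$H^{\alpha}_{\delta}(A) = \inf\Bigl\{\sum_{i\in\mathbb{N}} |D_i|^{\alpha};\ (D_i)_{i\in\mathbb{N}} \in \mathcal{A}_\delta\Bigr\}$$

and

$$H^{\alpha}(A) = \lim_{\delta\to 0} H^{\alpha}_{\delta}(A) \in \mathbb{R} \cup \{\infty\}$$

$H^{\alpha}(A)$ is called the $\alpha$-dimensional Hausdorff
measure of $A$. It appears that when $H^{\alpha}(A) \neq \infty$,
$H^{\alpha'}(A) = 0$ for $\alpha' > \alpha$, and when $H^{\alpha}(A) \neq 0$,
$H^{\alpha'}(A) = \infty$ for $\alpha' < \alpha$. This gives rise to the
following definition of the Hausdorff dimension
of $A$:

$$\dim_H(A) = \inf\{\alpha;\ H^{\alpha}(A) = 0\} = \sup\{\alpha;\ H^{\alpha}(A) = \infty\}$$

The Hausdorff dimension generalizes the classical
notions of dimension: for instance, when $A$ is a
subset of $\mathbb{R}^n$, $\dim_H(A) \leq n$, a $d$-dimensional manifold
has Hausdorff dimension $d$, and $H^n(A)$ is the
same, up to a normalizing constant depending only on $n$,
as the Lebesgue measure $L_n$ of $A$ (for a very
large class of subsets $A$, which we do not describe
here). For more details on geometric measure theory,
see Falconer (1986) and Federer (1969).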

Another convenient notion of dimension is the
(metric) entropy dimension. Let us briefly define it.
For a bounded subset $A$ in some metric space and a
real number $\alpha > 0$, we denote by $M(\alpha,A)$ the minimal
number of closed balls of radius $\leq \alpha$ covering $A$.
$H_\alpha(A) = \log_2(M(\alpha,A))$ is called the $\alpha$-entropy of the
set $A$. This terminology was introduced in
Kolmogorov and Tihomirov (1961) and reflects the
fact that $H_\alpha(A)$ is the amount of information needed
to digitally memorize $A$ with accuracy $\alpha$. The
entropy dimension of $A$, $\dim_e(A)$, is the order of
$M(\alpha,A)$ as $\alpha \to 0$. Precisely,

$$\dim_e(A) = \limsup_{\alpha\to 0} \frac{\log M(\alpha,A)}{\log(1/\alpha)} = \inf\{\beta;\ M(\alpha,A) \leq (1/\alpha)^{\beta} \text{ for sufficiently small } \alpha\}$$

We clearly have

$$\dim_H(A) \leq \dim_e(A)$$

For any bounded set $A$ in $\mathbb{R}^n$, we can bound $M(\alpha,A)$
from above by a polynomial in $1/\alpha$ (see Ivanov
(1975) and Yomdin and Comte (2004)):

$$M(\alpha,A) \leq c(n)\sum_{i=0}^{n} V_i(A)\,(1/\alpha)^{i}$$

where $c(n)$ only depends on $n$ and $V_i(A)$ (the $i$th
variation of the set $A$) is the mean value, with
respect to $P$ (for a suitable measure), of the number
of connected components of $A \cap P$, with $P$ an affine
$(n-i)$-dimensional space of $\mathbb{R}^n$.

Since for $A$ contained in a $d$-dimensional manifold,
$V_i(A) = 0$ for $i > d$, we deduce from this
inequality that in this case $M(\alpha,A)$ is bounded
from above by a polynomial of degree $\leq d$ in $1/\alpha$.

Our goal is to explain that we can be more precise
than this general inequality when $A$ is a set of
critical or almost-critical values of a $C^k$ mapping.
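
To make the covering number $M(\alpha,A)$ concrete, here is a hedged Python sketch for a set that is not contained in a manifold of small dimension: a finite approximation of the middle-thirds Cantor set. Coordinates are scaled by $3^m$ so that all arithmetic is exact in integers; the choice of set, the level `m`, and the function names are assumptions made for illustration. The slope of $\log M$ against $\log(1/\alpha)$ between two scales is used as a practical stand-in for the limsup in the definition of $\dim_e$.

```python
import math
from itertools import product


def cantor_points(m):
    """Endpoints of the 2**m intervals of the level-m Cantor set, scaled by 3**m."""
    points = set()
    for digits in product((0, 2), repeat=m):
        left = 0
        for digit in digits:
            left = 3 * left + digit
        points.add(left)       # left endpoint of a level-m interval
        points.add(left + 1)   # right endpoint (length 1 in scaled units)
    return sorted(points)


def covering_number(points, diameter):
    """Minimal number of closed intervals of the given diameter covering the
    sorted points; on a line the greedy left-to-right cover is optimal."""
    count = 0
    reach = None
    for p in points:
        if reach is None or p > reach:
            count += 1
            reach = p + diameter
    return count


def entropy_dimension_slope(points, m, j1, j2):
    """Slope of log M against log(1/alpha) between alpha = 3**-j1 / 2 and 3**-j2 / 2."""
    n1 = covering_number(points, 3 ** (m - j1))  # ball of radius alpha has diameter 2*alpha
    n2 = covering_number(points, 3 ** (m - j2))
    return (math.log(n2) - math.log(n1)) / ((j2 - j1) * math.log(3))


m = 8
pts = cantor_points(m)
estimate = entropy_dimension_slope(pts, m, 2, 6)  # close to log 2 / log 3
```

At radius $\alpha = 3^{-j}/2$ the cover needs $2^j$ intervals, so the estimated growth exponent is $\log 2/\log 3$, strictly between the integer dimensions 0 and 1.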


Transversality Is a Generic Situation 


The results in this section concern critical values,
and not almost-critical values. They show that a
“generic” point of the target space is not a critical
value, and that the more regular the mapping, the
smaller the set of critical values. Such theorems
relating the regularity of a mapping and the size of
its set of critical values are called Morse-Sard type
theorems (see Sard (1942, 1958, 1965)). The
simplest theorem in this direction is the following:


Theorem 1 ($C^\infty$ Morse-Sard theorem) (Morse
1939, Sard 1942, Holm 1987). Let $f: \mathbb{R}^n \to \mathbb{R}^p$
be a $C^\infty$-regular mapping. Then $H^p(\Delta(f,B_r)) = 0$,
where $\Delta(f,B_r) = f(\Sigma(f,B_r))$ and $\Sigma(f,B_r)$ is the set of
points $x \in B_r$ at which $\mathrm{rank}(Df_{(x)}) < p$.


The set $\Delta(f,B_r)$ is the image, under $f$, of the points
of the ball $B_r$ in the source space at which $f$ is not
submersive, that is, the set of critical values of $f$.
Consequently, the Morse-Sard theorem ensures that
for almost all points $y$ in the target space, $f^{-1}(\{y\})$ is
either empty or a smooth submanifold of the source
space of dimension $n - p$.
Note that $\Delta(f,B_r) = \Delta(f,B_r,\Lambda)$ for some convenient
$\Lambda = (\varepsilon_1,\ldots,\varepsilon_p)$ with $\varepsilon_p = 0$, because
$x \mapsto \lambda_i(Df_{(x)})$ is bounded on $B_r$, for all $i \in \{1,\ldots,p\}$.

Now, we can concentrate our attention on more
singular points than the critical ones, those at which
the rank $\rho$ of $f$ is prescribed. Let us denote the set of
images of such points by $\Delta^{\rho}(f,B_r)$, for $\rho < p$. By definition,
$\Delta^{\rho}(f,B_r) = f(\Sigma^{\rho}(f,B_r))$, where $\Sigma^{\rho}(f,B_r) = \{x \in B_r \subset \mathbb{R}^n;\ \mathrm{rank}(Df_{(x)}) \leq \rho\}$.
With these notations, the result
for rank-$\rho$ critical values is the following:


Theorem 2 ($C^k$ Morse-Sard theorem for rank-$\rho$
critical values) (Federer 1969). Let $f: \mathbb{R}^n \to \mathbb{R}^p$ be
a $C^k$-regular mapping. Then $H^{\rho+(n-\rho)/k}(\Delta^{\rho}(f,B_r)) = 0$.
In particular,

$$\dim_H(\Delta^{\rho}(f,B_r)) \leq \rho + \frac{n-\rho}{k}$$

One can produce examples showing that the
bound of Theorem 2 is the sharpest one (see
Comte (1996), Whitney (1935), Grinberg (1985),
and Yomdin and Comte (2004)).

We note that Theorem 1 is a corollary of Theorem 2
(just replace $k$ by $\infty$ and $\rho$ by $p - 1$ in Theorem 2).
This result tells nothing about the entropy dimension
of $\Delta^{\rho}(f,B_r)$; in the next section, we will
bound the growth of entropy of almost-critical
values.


Almost-Transversality Is Almost Generic 


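In this section, $f: \mathbb{R}^n \to \mathbb{R}^p$ is a $C^k$ mapping. We
denote by $K$ a Lipschitz constant of $D^{k-1}f$ on $B_r$ and
by $R_k(f)$ the quantity $(K/(k-1)!)\cdot r^k$. We have:


Theorem 3 ($C^k$ quantitative Morse-Sard theorem)
(Yomdin 1983, Yomdin and Comte 2004). Let
$f: \mathbb{R}^n \to \mathbb{R}^p$ be a $C^k$ mapping, $\Lambda = (\varepsilon_1,\ldots,\varepsilon_p)$,
$\varepsilon_1 \geq \cdots \geq \varepsilon_p$, and let us denote $\varepsilon_0 = 1$. We have
(for $\alpha \leq R_k(f)$):

$$M(\alpha,\Delta(f,B_r,\Lambda)) \leq C \sum_{i=0}^{p} \varepsilon_0\varepsilon_1\cdots\varepsilon_i \Bigl(\frac{r}{\alpha}\Bigr)^{i} \Bigl(\frac{R_k(f)}{\alpha}\Bigr)^{(n-i)/k}$$

where $C$ is a constant depending only on $n$, $p$, and $k$.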


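As a corollary, one can bound the entropy
dimension of $\Delta^{\rho}(f,B_r)$ by $\rho + (n-\rho)/k$, and hence
its Hausdorff dimension, again finding Theorem 2:
we just have to put $\varepsilon_{\rho+1} = 0$ and $\varepsilon_1,\ldots,\varepsilon_\rho$ large
enough, that is, $\varepsilon_i \geq \lambda_i(Df_{(x)})$, for all $x \in B_r$, in
Theorem 3, to obtain:


Theorem 4 ($C^k$ entropy Morse-Sard theorem)
(Yomdin 1983, Yomdin and Comte 2004). Let
$f: \mathbb{R}^n \to \mathbb{R}^p$ be a $C^k$ mapping, let us denote $\varepsilon_0 = 1$
and $\varepsilon_i = \sup\{\lambda_i(Df_{(x)});\ x \in B_r\}$, for $i \in \{1,\ldots,p\}$. We
have (for $\alpha \leq R_k(f)$):

$$M(\alpha,\Delta^{\rho}(f,B_r)) \leq C \sum_{i=0}^{\rho} \varepsilon_0\varepsilon_1\cdots\varepsilon_i \Bigl(\frac{r}{\alpha}\Bigr)^{i} \Bigl(\frac{R_k(f)}{\alpha}\Bigr)^{(n-i)/k}$$

where $C$ is a constant depending only on $n$, $p$, and $k$.
In particular,

$$\dim_H(\Delta^{\rho}(f,B_r)) \leq \dim_e(\Delta^{\rho}(f,B_r)) \leq \rho + \frac{n-\rho}{k}$$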





Again we have examples showing that this bound
is sharp (see Yomdin and Comte 2004).

Furthermore, the mapping $f$ in Theorems 2-4 may
be of real differentiability class (Hölder smoothness
class $C^k$), with the same conclusions in these
theorems. That is, $k$ may be a real number written
as $k = p + \beta$ with $\beta \in [0,1]$, $p \in \mathbb{N}\setminus\{0\}$, and "$f$ is $C^k$"
means that $f$ is $p$ times differentiable and there exists
a constant $C > 0$ such that for all $x,y \in B_r$,
$\|D^pf_{(x)} - D^pf_{(y)}\| \leq C\cdot\|x-y\|^{\beta}$ (see Yomdin and Comte
(2004)).


Examples 


Let us denote by $A$ the set of real polynomial
mappings of degree $d$ and of the following type:

$$x \mapsto Q(a,x) = 1 + \sum_{j=1}^{d} a_j x^j$$

with $a = (a_1,\ldots,a_d)$ and $\|a\| \leq 1$ (where $\|\cdot\|$ is the
Euclidean norm of $\mathbb{R}^d$). We identify the set $A$ with
$B^d(0,1) = \{a \in \mathbb{R}^d;\ \|a\| \leq 1\}$.

We want to bound the $\alpha$-entropy of the set of
such polynomials for which the real roots are
multiple or almost multiple.

We denote by $V$ the set $V = \{(a,x) \in \mathbb{R}^{d+1};\ Q(a,x) = 0\}$.
At points $(a,x)$ of $V$ with $\nabla Q_{(a,x)} \neq 0$,
$V$ is a $C^\infty$ manifold of codimension 1 of $\mathbb{R}^{d+1}$.
We denote by $V^{reg} = \{(a,x) \in V;\ \nabla Q_{(a,x)} \neq 0\}$
and by $V^{sing} = \{(a,x) \in V;\ \nabla Q_{(a,x)} = 0\} = V \setminus V^{reg}$.
By Whitney (1957), $V^{sing}$ is a union of smooth
manifolds of dimension $\leq d - 1$.

A root $x$ of a polynomial $Q(a,\cdot)$ is multiple if and
only if

$$Q(a,x) = \frac{\partial Q}{\partial x}(a,x) = 0$$

Consequently, the set $A^{\Sigma}$ of polynomials of $A$
with multiple roots is $\pi(V^{sing}) \cup \Delta(\pi_{|V^{reg}})$, where
$\pi: \mathbb{R}^{d+1} \to \mathbb{R}^d$ is the standard projection $\pi(a,x) = a$,
and $\Delta(\pi_{|V^{reg}})$ is the image under $\pi$ of the set
$\{(a,x) \in V^{reg};\ Ox \subset T_{(a,x)}V^{reg}\}$, that is, the set
of critical values of $\pi_{|V^{reg}}$. By Sard's theorem
(Theorem 2), $\dim_H(\Delta(\pi_{|V^{reg}})) \leq d - 1$. Since
$\dim_H(\pi(V^{sing})) \leq d - 1$, we obtain: $\dim_H(A^{\Sigma}) \leq d - 1$;
thus, having distinct roots is a generic property.
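Let, as above, $\Lambda = (\varepsilon_1,\ldots,\varepsilon_d)$ with $\varepsilon_1 \geq \cdots \geq \varepsilon_d$
and $\varepsilon_0 = 1$. A root $x$ of a polynomial $Q(a,\cdot) \in A$ is
said to be $\Lambda$-almost multiple if and only if
$Q(a,x) = 0$ and $V \not\pitchfork^{\Lambda}_{(a,x)} Ox$, that is, $(a,x) \in V^{sing}$ or
$\sin(T_{(a,x)}V^{reg}, Ox) \leq \varepsilon_d$. This condition only concerns
$\varepsilon_d$ and we can take $\varepsilon_1 = \cdots = \varepsilon_{d-1} = 1$. We
denote by $A^{\Sigma,\Lambda}$ the set of polynomials of $A$ with
(at least) a $\Lambda$-almost multiple root. By Theorem 3,

$$M(\alpha, A^{\Sigma,\Lambda} \setminus \pi(V^{sing})) \leq C\cdot\Bigl[\sum_{i=0}^{d-1}\Bigl(\frac{1}{\alpha}\Bigr)^{i} + \varepsilon_d\Bigl(\frac{1}{\alpha}\Bigr)^{d}\Bigr]$$

But $\pi(V^{sing})$ being a finite union of manifolds of
dimension at most $d - 1$, we finally obtain

$$M(\alpha, A^{\Sigma,\Lambda}) \leq C'\cdot\Bigl[\sum_{i=0}^{d-1}\Bigl(\frac{1}{\alpha}\Bigr)^{i} + \varepsilon_d\Bigl(\frac{1}{\alpha}\Bigr)^{d}\Bigr]$$

Thus, having no $\Lambda$-almost multiple root is $\Lambda$-almost
a generic property. In Figure 3, we represent $V$ for
$d = 3$ and $a_3 = 1$.


Figure 3 The space of polynomials of type $1 + a_1x + a_2x^2 + x^3$
with almost-multiple roots.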
Figure 4 Almost-critical points of the distance function of P to
the origin.


The next example comes from robotics: let us
consider a planar robotic manipulator consisting of
two jointed bars of length $a$ and $b$, as presented in
Figure 4. We may parametrize the positions of the
endpoint $P$ of this device by the angles $\varphi$ and $\psi$ (see
Figure 4). Now the distance $r$ from the origin to $P$ is given by
$r^2 = \|P\|^2 = a^2 + b^2 + 2ab\cos(\psi)$. The critical points
of $r$ are given by

$$\frac{d(r^2)}{d\psi}(\psi) = -2ab\sin(\psi) = 0$$

and correspond to the circle $\psi = 0$. The critical value
of $r$ is $a + b$. Near these critical positions, the control
of $r$ with respect to $\psi$ is poor; we would like to avoid
those near-critical values. Given $\varepsilon > 0$, the condition

$$\Bigl|\frac{d(r^2)}{d\psi}(\psi)\Bigr| \leq \varepsilon$$

implies $|\psi| \leq \arcsin(\varepsilon/2ab)$, and the $\varepsilon$-near-critical
values of $r$ satisfy

$$r_{max}^2 - r^2 \leq 2ab\,[1 - \cos(\arcsin(\varepsilon/2ab))]$$

where $r_{max}$ is $a + b$; thus, they are contained in an
interval of length $\leq c\cdot\varepsilon^2/(4ab\cdot r_{max})$, and
$M(\alpha,\Delta(r,\varepsilon)) \leq c\cdot\varepsilon^2/(4ab\cdot r_{max}\cdot\alpha)$ (Theorem 3
gives $M(\alpha,\Delta(r,\varepsilon)) \leq C(1 + \varepsilon/\alpha)$).
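
The manipulator estimate can be checked numerically with a short Python sketch. The bar lengths and thresholds below are hypothetical, and the explicit bound $\varepsilon^2/(2ab\cdot r_{max})$ used in the comments is an elementary estimate assumed here (it follows from $1 - \cos(\arcsin s) \leq s^2$), not a constant taken from the article.

```python
import math


def near_critical_interval(a, b, eps):
    """Length r_max - r_min of the interval of eps-near-critical values of r
    around r_max = a + b, where |d(r^2)/dpsi| = 2ab|sin(psi)| <= eps."""
    s = eps / (2.0 * a * b)
    if not 0.0 <= s <= 1.0:
        raise ValueError("need 0 <= eps <= 2ab")
    psi_max = math.asin(s)
    r_max = a + b
    r_min = math.sqrt(a * a + b * b + 2.0 * a * b * math.cos(psi_max))
    return r_max - r_min


def covering_number(length, alpha):
    """Number of closed intervals of radius alpha (length 2*alpha) needed to
    cover an interval of the given length."""
    return max(1, math.ceil(length / (2.0 * alpha)))


a, b, eps = 2.0, 1.0, 0.1
length = near_critical_interval(a, b, eps)
bound = eps ** 2 / (2.0 * a * b * (a + b))  # assumed explicit bound, quadratic in eps
count = covering_number(length, alpha=1e-5)
```

Halving `eps` divides the interval length by about four, which is the quadratic behavior in $\varepsilon$ stated above.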


See also: Entanglement; Entanglement Measures; 
Quantum Entropy; Singularity and Bifurcation Theory. 
Introduction


If a compact Lie group G acts on a manifold M, the 
space M/G of orbits of the action is usually a singular 
space. Nonetheless, it is often possible to develop a 
“differential geometry” of the orbit space in terms of 
appropriately defined equivariant objects on M. This 
article is mostly concerned with “differential forms 
on M/G.” A first idea would be to work with the 
complex of “basic” forms on M, but for many 
purposes this complex turns out to be too small. 
A much more useful complex of equivariant differ- 
ential forms on M was introduced by Cartan (1950). 
In retrospect, Cartan’s approach presented a differ- 
ential form model for the equivariant cohomology of 
M, as defined by A Borel (1960). Borel's construction
replaces the quotient $M/G$ by a better-behaved (but
usually infinite-dimensional) homotopy quotient $M_G$,
and Cartan's complex should be viewed as a model
for forms on $M_G$.

Among the features of equivariant cohomology are
the localization formulas for the integrals of equivariant
cocycles. The first instance of such an integration
formula was the “exact stationary phase formula,” 
discovered by Duistermaat and Heckman. This 
formula was quickly recognized by Berline and 
Vergne (1983) and Atiyah and Bott (1984), as a 
localization principle in equivariant cohomology. 
Today, equivariant localization is a basic tool in 
mathematical physics, with numerous applications. 


This article begins with Borel’s topological defini- 
tion of equivariant cohomology, then proceeds to 
describe H Cartan’s more algebraic approach, and 
concludes with a discussion of localization principles. 

As additional references for the material covered 
here, we particularly recommend books by Berline, 
Getzler, and Vergne (1992) and Guillemin and 
Sternberg (1999). 


Borel's Model of $H_G(M)$


Let $G$ be a topological group. A $G$-space is a
topological space $M$ on which $G$ acts by transformations
$g \mapsto a_g$, in such a way that the action map

$$a: G \times M \to M$$ [1]

is continuous. An important special case of $G$-spaces
are principal $G$-bundles $E \to B$, that is, $G$-spaces
locally isomorphic to products $U \times G$.


Definition 1 A classifying bundle for $G$ is a
principal $G$-bundle $EG \to BG$, with the following
universal property: for any principal $G$-bundle
$E \to B$, there is a map $f: B \to BG$, unique up to
homotopy, such that $E$ is isomorphic to the pullback
bundle $f^*EG$. The map $f$ is known as a “classifying
map” of the principal bundle.


To be precise, the base spaces of the principal 
bundles considered here must satisfy some technical 
condition. For a careful discussion, see Husemoller 
(1994). Classifying bundles exist for all G (by a 
construction due to Milnor (1956)), and are unique 
up to G-homotopy equivalence. 

It is a basic fact that principal G-bundles with 
contractible total space are classifying bundles. 


Examples 2 


(i) The bundle $\mathbb{R} \to \mathbb{R}/\mathbb{Z} = S^1$ is a classifying
bundle for $G = \mathbb{Z}$.

(ii) Let $H$ be a separable complex Hilbert space,
$\dim H = \infty$. It is known that the unit sphere $S(H)$ is
contractible. It is thus a classifying $U(1)$-bundle,
with the projective space $P(H)$ as base. More
generally, the Stiefel manifold $St(k,H)$ of unitary
$k$-frames is a classifying $U(k)$-bundle, with base
the Grassmann manifold $Gr(k,H)$ of $k$-planes.

(iii) Any compact Lie group $G$ arises as a closed
subgroup of $U(k)$, for $k$ sufficiently large.
Hence, the Stiefel manifold $St(k,H)$ also serves
as a model for $EG$.

(iv) The based loop group $G = L_0K$ of a connected Lie
group $K$ acts by gauge transformations on the
space of connections $\mathcal{A}(S^1) = \Omega^1(S^1,\mathfrak{k})$. This is a
classifying bundle for $L_0K$, with base $K$. The
quotient map takes a connection to its holonomy.


For any commutative ring $R$ (e.g., $\mathbb{Z}$, $\mathbb{R}$, $\mathbb{Z}_2$), let
$H(\cdot\,;R)$ denote the (singular) cohomology with
coefficients in $R$. Recall that $H(\cdot\,;R)$ is a graded
commutative ring under cup product.


Definition 3 The equivariant cohomology $H_G(M) = H_G(M;R)$
of a $G$-space $M$ is the cohomology ring of
its homotopy quotient $M_G = EG \times_G M$:

$$H_G(M;R) = H(M_G;R)$$ [2]

Equivariant cohomology is a contravariant functor
from the category of $G$-spaces to the category of
$R$-modules. The $G$-map $M \to pt$ induces an algebra
homomorphism from $H_G(pt) = H(BG)$ to $H_G(M)$. In
this way, $H_G(M)$ is a module over the ring $H(BG)$.


Example 4 (Principal $G$-bundles). Suppose $E \to B$ is
a principal $G$-bundle. The homotopy quotient $E_G$ may
be viewed as a bundle $E \times_G EG$ over $B$. Since the fiber
is contractible, there is a homotopy equivalence

$$E_G \simeq B$$ [3]

and therefore $H_G(E) = H(B)$.


Example 5 (Homogeneous spaces). If $K$ is a closed
subgroup of a Lie group $G$, the space $EG$ may be
viewed as a model for $EK$, with $BK = EG/K = EG \times_G (G/K)$.
Hence,

$$H_G(G/K) = H(BK)$$ [4]

Let us briefly describe two of the main techniques
for computing $H_G(M)$.


1. Leray spectral sequences. If $R$ is a field, the
equivariant cohomology may be computed as the
$E_\infty$ term of the spectral sequence for the fibration
$M_G \to BG$. If $BG$ is simply connected (as is the
case for all compact connected Lie groups), the
$E_2$-term of the spectral sequence reads

$$E_2^{p,q} = H^p(BG) \otimes H^q(M)$$ [5]

2. Mayer-Vietoris sequences. If $M = U_1 \cup U_2$ is a
union of two $G$-invariant open subsets, there is a
long exact sequence

$$\cdots \to H_G^k(M) \to H_G^k(U_1) \oplus H_G^k(U_2) \to H_G^k(U_1 \cap U_2) \to H_G^{k+1}(M) \to \cdots$$

More generally, associated to any $G$-invariant open
cover, there is a spectral sequence converging to
$H_G(M)$.


Example 6 Consider the standard $U(1)$-action on
$S^2$ by rotations. Cover $S^2$ by two open sets $U_\pm$, given
as the complement of the south pole and north pole,
respectively. Since $U_+ \cap U_-$ retracts onto the equatorial
circle, on which $U(1)$ acts freely, its equivariant
cohomology vanishes except in degree 0. On the
other hand, $U_\pm$ retract onto the poles $p_\pm$. Hence, by
the Mayer-Vietoris sequence the map
$H^k_{U(1)}(S^2) \to H^k_{U(1)}(p_+) \oplus H^k_{U(1)}(p_-)$ given by pullback to the fixed
points is an isomorphism for $k > 0$. Since the
pullback map is a ring homomorphism, we conclude
that $H_{U(1)}(S^2;R)$ is the commutative ring generated
by two elements $x_\pm$ of degree 2, subject to a single
relation $x_+x_- = 0$.


$\mathfrak{g}$-Differential Algebras


Let $G$ be a Lie group, with Lie algebra $\mathfrak{g}$. A
$G$-manifold is a manifold $M$ together with a
$G$-action such that the action map [1] is smooth.
We would like to introduce the concept of equivariant
differential forms on $M$. This complex should
play the role of differential forms on the infinite-dimensional
space $M_G$. In Cartan's approach, the
starting point is an algebraic model for the differential
forms on the classifying bundle $EG$.

The algebraic machinery will only depend on the 
infinitesimal action of G. It is therefore convenient 
to introduce the following concept. 


Definition 7 Let $\mathfrak{g}$ be a finite-dimensional Lie
algebra. A $\mathfrak{g}$-manifold is a manifold $M$, together with
a Lie algebra homomorphism $a: \mathfrak{g} \to \mathfrak{X}(M)$, $\xi \mapsto a_\xi$,
into the Lie algebra of vector fields on $M$, such that
the map $\mathfrak{g} \times M \to TM$, $(\xi,m) \mapsto a_\xi(m)$ is smooth.


Any $G$-manifold $M$ becomes a $\mathfrak{g}$-manifold by
taking $a_\xi$ to be the generating vector field

$$a_\xi(m) = \frac{d}{dt}\Big|_{t=0} a_{\exp(-t\xi)}(m)$$ [6]

Conversely, if $G$ is simply connected, and $M$ is a
$\mathfrak{g}$-manifold for which all of the vector fields $a_\xi$ are
complete, the $\mathfrak{g}$-action integrates uniquely to an
action of the group $G$.

The de Rham algebra $(\Omega(M),d)$ of differential
forms on a $\mathfrak{g}$-manifold $M$ carries graded derivations
$L_\xi = L(a_\xi)$ (Lie derivatives, degree 0) and $\iota_\xi = \iota(a_\xi)$
(contractions, degree $-1$). One has the following
graded commutation relations:

$$[d,d] = 0, \quad [L_\xi,d] = 0, \quad [\iota_\xi,d] = L_\xi$$ [7]

$$[\iota_\xi,\iota_\eta] = 0, \quad [L_\xi,L_\eta] = L_{[\xi,\eta]}, \quad [L_\xi,\iota_\eta] = \iota_{[\xi,\eta]}$$ [8]


More generally, the following definitions are 
introduced. 


Definition 8 A $\mathfrak{g}$-differential algebra ($\mathfrak{g}$-da) is a
commutative graded algebra $A = \bigoplus_{n\geq 0} A^n$, equipped
with graded derivations $d, L_\xi, \iota_\xi$ of degrees $1, 0, -1$
(where $L_\xi, \iota_\xi$ depend linearly on $\xi \in \mathfrak{g}$), satisfying the
graded commutation relations [7] and [8].


Definition 9 For any $\mathfrak{g}$-da $A$, one defines the
horizontal subalgebra $A_{hor} = \bigcap_\xi \ker(\iota_\xi)$, the invariant
subalgebra $A^{\mathfrak{g}} = \bigcap_\xi \ker(L_\xi)$, and the basic
subalgebra $A_{basic} = A_{hor} \cap A^{\mathfrak{g}}$.


Note that the basic subalgebra is a differential
subcomplex of $A$.


Definition 10 A connection on a $\mathfrak{g}$-da is an
invariant element $\theta \in A^1 \otimes \mathfrak{g}$, with the property
$\iota_\xi\theta = \xi$. The curvature of a connection is the element
$F^\theta \in A^2 \otimes \mathfrak{g}$ given as $F^\theta = d\theta + (1/2)[\theta,\theta]$.


$\mathfrak{g}$-da's $A$ admitting connections are the algebraic
counterparts of (smooth) principal bundles, with
$A_{basic}$ playing the role of the base of the principal
bundle.


Weil Algebra 


The Weil algebra $W\mathfrak{g}$ is the algebraic analog to the
classifying bundle $EG$. Similar to $EG$, it may be
characterized by a universal property:


Theorem 11 There exists a $\mathfrak{g}$-da $W\mathfrak{g}$ with
connection $\theta_W$, having the following universal
property: if $A$ is a $\mathfrak{g}$-da with connection $\theta$, there
is a unique algebra homomorphism $c: W\mathfrak{g} \to A$
taking $\theta_W$ to $\theta$.


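Clearly, the universal property characterizes $W\mathfrak{g}$
up to a unique isomorphism. To get an explicit
construction, choose a basis $\{e_a\}$ of $\mathfrak{g}$, with dual
basis $\{e^a\}$ of $\mathfrak{g}^*$. Let $y^a \in \wedge^1\mathfrak{g}^*$ be the corresponding
generators of the exterior algebra, and $v^a \in S^1\mathfrak{g}^*$ the
generators of the symmetric algebra. Let

$$W^n\mathfrak{g} = \bigoplus_{2i+j=n} S^i\mathfrak{g}^* \otimes \wedge^j\mathfrak{g}^*$$ [9]

carry the differential

$$dy^a = v^a - \tfrac{1}{2} f^a_{bc}\, y^b y^c$$ [10]

$$dv^a = f^a_{bc}\, v^b y^c$$ [11]

where $f^a_{bc} = \langle e^a,[e_b,e_c]\rangle$ are the structure constants
of $\mathfrak{g}$. Define the contractions $\iota_a = \iota_{e_a}$ by

$$\iota_a y^b = \delta_a^b, \quad \iota_a v^b = 0$$ [12]

and let $L_a = [d,\iota_a]$. Then the $L_a$ are the generators for
the adjoint action on $W\mathfrak{g}$. The element $\theta_W = y^a \otimes e_a \in W^1\mathfrak{g} \otimes \mathfrak{g}$
is a connection on $W\mathfrak{g}$. Notice that
we could also use $y^a$ and $dy^a$ as generators of $W\mathfrak{g}$.
This identifies $W\mathfrak{g}$ with the Koszul algebra, and
implies:


Theorem 12 $W\mathfrak{g}$ is acyclic, that is, the inclusion
$\mathbb{R} \to W\mathfrak{g}$ is a homotopy equivalence.


Acyclicity of $W\mathfrak{g}$ corresponds to the contractibility
of the total space of $EG$.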

The basic subalgebra of $W\mathfrak{g}$ is equal to $(S\mathfrak{g}^*)^{\mathfrak{g}}$, and
the differential restricts to zero on this subalgebra,
since $d$ changes parity. Hence, if $A$ is a $\mathfrak{g}$-da with
connection, the characteristic homomorphism
$c: W\mathfrak{g} \to A$ induces an algebra homomorphism,
$(S\mathfrak{g}^*)^{\mathfrak{g}} \to H(A_{basic})$. This homomorphism is independent
of $\theta$:


Theorem 13 Suppose $\theta_0, \theta_1$ are two connections on
a $\mathfrak{g}$-da $A$. Then their characteristic homomorphisms
$c_0, c_1: W\mathfrak{g} \to A$ are $\mathfrak{g}$-homotopic. That is, there is a
chain homotopy intertwining contractions and Lie
derivatives.


Remark 14 One obtains other interesting examples
of $\mathfrak{g}$-da's if one drops the commutativity
assumption from the definition. For instance,
suppose $\mathfrak{g}$ carries an invariant scalar product. Let
$Cl(\mathfrak{g})$ be the corresponding Clifford algebra, and
$U(\mathfrak{g})$ the enveloping algebra. The noncommutative
Weil algebra (introduced by Alekseev and
Meinrenken 2002)

$$\mathcal{W}\mathfrak{g} = U\mathfrak{g} \otimes Cl(\mathfrak{g})$$ [13]

is a (noncommutative) $\mathfrak{g}$-da, with the derivations $d$,
$L_a$, $\iota_a$ defined on generators by the same formulas as
for $W\mathfrak{g}$.


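Equivariant Cohomology of $\mathfrak{g}$-da's

In analogy to $H_G(M) := H(M_G)$, we now declare:


Definition 15 The equivariant cohomology algebra
of a $\mathfrak{g}$-da $A$ is the cohomology of the differential
algebra $A_{\mathfrak{g}} := (W\mathfrak{g} \otimes A)_{basic}$:

$$H_{\mathfrak{g}}(A) := H(A_{\mathfrak{g}})$$ [14]

The equivariant cohomology $H_{\mathfrak{g}}(A)$ has functorial
properties parallel to those of $H_G(M)$. In particular,
$H_{\mathfrak{g}}(A)$ is a module over

$$H_{\mathfrak{g}}(\{0\}) = H((W\mathfrak{g})_{basic}) = (S\mathfrak{g}^*)^{\mathfrak{g}}$$ [15]


Theorem 16 Suppose $A$ is a $\mathfrak{g}$-da with connection
$\theta$, and let $c: W\mathfrak{g} \to A$ be the characteristic homomorphism.
Then

$$W\mathfrak{g} \otimes A \to A, \quad w \otimes x \mapsto c(w)x$$ [16]

is a $\mathfrak{g}$-homotopy equivalence, with $\mathfrak{g}$-homotopy
inverse the inclusion

$$A \to W\mathfrak{g} \otimes A, \quad x \mapsto 1 \otimes x$$ [17]

In particular, there is a canonical isomorphism

$$H(A_{basic}) \to H_{\mathfrak{g}}(A)$$ [18]


Proof By Theorem 13, the automorphism
$w \otimes x \mapsto 1 \otimes c(w)x$ of $W\mathfrak{g} \otimes A$ is $\mathfrak{g}$-homotopic to the
identity map. $\square$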


The above definition of the complex $A_{\mathfrak{g}}$ is often
referred to as the Weil model of equivariant
cohomology, while the term Cartan model is reserved
for a slightly different description of $A_{\mathfrak{g}}$. Identify
the space $(S\mathfrak{g}^* \otimes A)^{\mathfrak{g}}$ with the algebra of equivariant
$A$-valued polynomial functions $\alpha: \mathfrak{g} \to A$. Define a
differential $d_{\mathfrak{g}}$ on this space by setting

$$(d_{\mathfrak{g}}\alpha)(\xi) = d(\alpha(\xi)) - \iota_\xi\alpha(\xi)$$ [19]


Theorem 17 (H Cartan). The natural projection
$W\mathfrak{g} \otimes A \to S\mathfrak{g}^* \otimes A$ restricts to an isomorphism of
differential algebras, $A_{\mathfrak{g}} \cong (S\mathfrak{g}^* \otimes A)^{\mathfrak{g}}$.


Suppose $A$ carries a connection $\theta$. The $\mathfrak{g}$-homotopy
equivalence [16] induces a homotopy equivalence
$A_{\mathfrak{g}} \to A_{basic}$ of the basic subcomplexes. By explicit
calculation, the corresponding map for the Cartan
model is given by

$$(S\mathfrak{g}^* \otimes A)^{\mathfrak{g}} \to A_{basic}, \quad \alpha \mapsto P_{hor}(\alpha(F^\theta))$$ [20]

Here $\alpha(F^\theta) \in A^{\mathfrak{g}}$ is the result of substituting the
curvature of $\theta$, and $P_{hor}: A \to A_{hor}$ is horizontal
projection. On elements of $(S\mathfrak{g}^*)^{\mathfrak{g}} \subset (S\mathfrak{g}^* \otimes A)^{\mathfrak{g}}$,
the map [20] specializes to the Chern-Weil
homomorphism.

There is an algebraic counterpart of the Leray
spectral sequence: introduce a filtration

$$F^pA_{\mathfrak{g}}^{p+q} = \bigoplus_{2i\geq p} (S^i\mathfrak{g}^* \otimes A^{p+q-2i})^{\mathfrak{g}}$$ [21]

Since the second term in the equivariant differential
[19] raises the filtration degree by 2, it follows that

$$E_2^{p,q} = (S^{p/2}\mathfrak{g}^*)^{\mathfrak{g}} \otimes H^q(A)$$ [22]

for $p$ even, $E_2^{p,q} = 0$ for $p$ odd. In fortunate cases, the
spectral sequence collapses at the $E_2$-stage (see
below).


Equivariant de Rham Theory 


We will now restrict ourselves to the case that
$A = \Omega(M)$ is the algebra of differential forms on a
$G$-manifold, where $G$ is compact and connected.


Theorem 18 (Equivariant de Rham theorem). Suppose
$G$ is a compact, connected Lie group, and
that $M$ is a $G$-manifold. Then there is a canonical
isomorphism

$$H_G(M;\mathbb{R}) = H_{\mathfrak{g}}(\Omega(M))$$ [23]

where the left-hand side is the equivariant cohomology
as defined by the Borel construction.


Motivated by this result, the notation can be
changed slightly; write

$$\Omega_G(M) = (S\mathfrak{g}^* \otimes \Omega(M))^G$$ [24]

for the Cartan complex of equivariant differential
forms, and $d_G$ for the equivariant differential [19].


Remark 19 Theorem 18 fails, in general, for 
noncompact Lie groups G. A differential form 
model for the noncompact case was developed by 
Getzler (1990). 


Example 20 Let $(M,\omega)$ be a symplectic manifold,
and $a: G \to \mathrm{Diff}(M)$ a Hamiltonian group action.
That is, $a$ preserves the symplectic form, $a_g^*\omega = \omega$,
and there exists an equivariant moment map
$\Phi: M \to \mathfrak{g}^*$ such that $\iota_\xi\omega + d\langle\Phi,\xi\rangle = 0$. Then the
equivariant symplectic form $\omega_G(\xi) := \omega + \langle\Phi,\xi\rangle$ is
equivariantly closed.


Example 21 Let $G$ be a Lie group, and denote,
respectively, by

$$\theta^L = g^{-1}dg \quad\text{and}\quad \theta^R = dg\,g^{-1}$$ [25]

the left- and right-invariant Maurer-Cartan forms.
Suppose $\mathfrak{g} = \mathrm{Lie}(G)$ carries an invariant scalar
product “$\cdot$”, and consider the closed 3-form

$$\phi = \tfrac{1}{12}\,\theta^L \cdot [\theta^L,\theta^L]$$ [26]

Then

$$\phi_G(\xi) = \phi + \tfrac{1}{2}(\theta^L + \theta^R)\cdot\xi$$ [27]

is a closed equivariant extension for the conjugation
action of $G$. More generally, transgression gives
explicit differential forms $\phi_j$ generating the cohomology
ring $H(G) = (\wedge\mathfrak{g}^*)^G$. Closed equivariant
extensions of these forms were obtained by Jeffrey
(1995), using a construction of Bott-Shulman.


A G-manifold is called equivariantly formal if

$$H_G(M) \cong (S\mathfrak{g}^*)^G \otimes H(M) \qquad [28]$$

as an $(S\mathfrak{g}^*)^G$-module. Equivalently, this is the condition that the spectral sequence [22] for $H_G(M)$ collapses at the $E_2$-term. M is equivariantly formal under any of the following conditions: (1) $H^q(M) = 0$ for $q$ odd, (2) the map $H_G(M) \to H(M)$ is onto, (3) M admits a G-invariant Morse function with only even indices, and (4) M is a symplectic manifold and the G-action is Hamiltonian. (The last fact is a theorem due to Ginzburg and Kirwan.)


Example 22 The conjugation action of a compact Lie group is equivariantly formal, by criterion (2). In this case, eqn [28] is an isomorphism of algebras.


It is important to note that eqn [28] is not an algebra isomorphism, in general. Already the rotation action of $G = U(1)$ on $M = S^2$, discussed in Example 6, provides a counter-example.


Theorem 23 (Injectivity). Suppose T is a compact torus, and M is T-equivariantly formal. Then the pullback map $H_T(M) \to H_T(M^T)$ to the fixed point set is injective.


Since the pullback map to the fixed point set is an algebra homomorphism, one can sometimes use this result to determine the algebra structure on $H_T(M)$: let $\alpha_r \in H(M)$ be generators of the ordinary cohomology algebra, and let $(\alpha_r)_T$ be equivariant extensions. Denote by $x_r \in H_T(M^T)$ the pullbacks of $(\alpha_r)_T$ to the fixed point set, and let $y_i$ be a basis of $\mathfrak{t}^*$, viewed as elements of $S\mathfrak{t}^* \subset H_T(M^T)$. Then $H_T(M)$ is isomorphic to the subalgebra of $H_T(M^T)$ generated by the $x_r$ and $y_i$.

The case of nonabelian compact groups G may be reduced to the case of a maximal torus T using the following result. Observe that for any G-manifold M, there is a natural action of the Weyl group $W = N(T)/T$ on $H_T(M)$.


Theorem 24 The natural restriction map

$$H_G(M;\mathbb{R}) \to H_T(M;\mathbb{R})^W \qquad [29]$$

onto the Weyl group invariants is an algebra isomorphism.


Remark 25 The Cartan complex [24] may be viewed as a small model for the differential forms on the infinite-dimensional space $M_G$. In the noncommutative case, there exists an even "smaller" Cartan model, with underlying complex $(S\mathfrak{g}^*)^G \otimes \Omega(M)^G$, involving only invariant differential forms on M (see Alekseev and Meinrenken (2005) and Goresky, Kottwitz, and MacPherson (1998)).


Equivariant Characteristic Forms 


Let G be a compact Lie group, and $E \to B$ a principal G-bundle with connection $\theta \in \Omega^1(E) \otimes \mathfrak{g}$. Suppose the principal G-action commutes with the action of a compact Lie group K on E, and that $\theta$ is K-invariant. The K-equivariant curvature of $\theta$ is defined as follows:

$$F_K^\theta = d_K\theta + \tfrac{1}{2}[\theta,\theta] \in \Omega_K^2(E) \otimes \mathfrak{g}.$$

By the equivariant version of eqn [20], there is a canonical chain map

$$\Omega_{K\times G}(E) \to \Omega_K(B) \qquad [30]$$

defined by substituting the K-equivariant curvature for the $\mathfrak{g}$-variable, followed by horizontal projection with respect to $\theta$. The Cartan map [30] is homotopy inverse to the pullback map from $\Omega_K(B)$ to $\Omega_{K\times G}(E)$.


Example 26 The complex $\Omega_{K\times G}(E)$ contains a subcomplex $(S\mathfrak{g}^*)^G$. The restriction of eqn [30] is the equivariant Chern–Weil map

$$(S\mathfrak{g}^*)^G \to \Omega_K(B). \qquad [31]$$

Forms in the image of eqn [31] are equivariantly closed; they are called the K-equivariant characteristic forms of E.


Example 27 Similarly, if $V \to B$ is a K-equivariant vector bundle with structure group $G \subset GL(k)$, one defines the K-equivariant characteristic forms of V to be those of the corresponding bundle of G-frames in V.


For instance, suppose V is an oriented K-equivariant vector bundle of even rank k, with an invariant metric and compatible connection. The Pfaffian defines an invariant polynomial on $\mathfrak{so}(k)$:

$$\zeta \mapsto \det{}^{1/2}(\zeta/2\pi) \qquad [32]$$

(equal to 0 if k is odd). The K-equivariant characteristic form of degree k on B determined by eqn [32] is known as the equivariant Euler form

$$\mathrm{Eul}_K(V) \in \Omega_K^k(B). \qquad [33]$$

Similarly, one defines equivariant Pontrjagin forms of V, and (for Hermitian vector bundles) equivariant Chern forms.


Example 28 Suppose G is a maximal rank subgroup of the compact Lie group K. The bundle $K \to K/G$ admits a unique K-invariant connection. Hence, one obtains a canonical chain map $(S\mathfrak{g}^*)^G \to \Omega_K(K/G)$, realizing the isomorphism $H_K(K/G) \cong (S\mathfrak{g}^*)^G$. In particular, any G-invariant element of $\mathfrak{g}^*$ defines a closed K-equivariant 2-form on K/G. For instance, symplectic forms on coadjoint orbits are obtained in this way.


Suppose M is a G-manifold, and let $E \times_G M$ be the associated bundle. For any K-invariant connection on E, one obtains a chain map

$$\Omega_G(M) \to \Omega_{K\times G}(E \times M) \to \Omega_K(E \times_G M) \qquad [34]$$

by composing the pullback to $E \times M$ with the Cartan map for the principal bundle $E \times M \to E \times_G M$.


Example 29 Suppose $(M,\omega)$ is a Hamiltonian G-manifold, with moment map $\Phi\colon M \to \mathfrak{g}^*$. The image of $\omega_G = \omega + \Phi$ under the map [34] defines a closed K-equivariant 2-form on the associated bundle $E \times_G M$. This construction is of importance in symplectic geometry, where it arises in the context of Sternberg's minimal coupling.


Equivariant Thom Forms 


Let $\pi\colon V \to B$ be a G-equivariant oriented real vector bundle of rank k over a compact base B. There is a canonical chain map, called fiber integration,

$$\pi_*\colon \Omega^\bullet(V)_{\mathrm{cp}} \to \Omega^{\bullet-k}(B) \qquad [35]$$

where the subscript indicates "compact support." It is characterized by the following properties:

(1) for a form of degree k, the value of its fiber integral at $x \in B$ is equal to the integral over the fiber $V_x$, and

(2)

$$\pi_*(\alpha \wedge \pi^*\beta) = \pi_*\alpha \wedge \beta \qquad [36]$$

for all $\alpha \in \Omega(V)_{\mathrm{cp}}$ and $\beta \in \Omega(B)$.

Fiber integration extends to G-equivariant differential forms, and commutes with the equivariant differential.

Theorem 30 (Equivariant Thom isomorphism). Fiber integration defines an isomorphism,

$$H_G^\bullet(V)_{\mathrm{cp}} \to H_G^{\bullet-k}(B). \qquad [37]$$

An equivariant Thom form for a G-vector bundle is a cocycle $\mathrm{Th}_G(V) \in \Omega_G^k(V)_{\mathrm{cp}}$ with the property

$$\pi_*\mathrm{Th}_G(V) = 1. \qquad [38]$$

Given $\mathrm{Th}_G(V)$, the inverse to eqn [37] is realized on the level of differential forms as

$$\Omega_G^{\bullet-k}(B) \to \Omega_G^\bullet(V)_{\mathrm{cp}}, \quad \alpha \mapsto \mathrm{Th}_G(V) \wedge \pi^*\alpha. \qquad [39]$$
A beautiful "universal" construction of Thom forms was obtained by Mathai and Quillen (1986). Using eqn [34], it suffices to describe an SO(k)-equivariant Thom form for the trivial bundle $\mathbb{R}^k \to \{0\}$. Using multi-index notation for ordered subsets $I \subset \{1,\ldots,k\}$,

$$\mathrm{Th}_{SO(k)}(\mathbb{R}^k)(\zeta) = \frac{e^{-\|x\|^2}}{\pi^{k/2}} \sum_I \epsilon_I \det{}^{1/2}\Big(\frac{\zeta_I}{2}\Big)\,(dx)^{I^c}. \qquad [40]$$

Here the sum is over all subsets I with |I| even, and $I^c$ is the complement of I. The matrix $\zeta_I$ is obtained from $\zeta$ by deleting all rows and columns that are not in I, and $\det^{1/2}$ is defined as a Pfaffian. Finally, $\epsilon_I$ is the sign of the shuffle permutation defined by I, that is, $(dx)^I (dx)^{I^c} = \epsilon_I\, dx_1 \cdots dx_k$. As shown by Mathai and Quillen, the form [40] is equivariantly closed, and clearly eqn [38] holds since the top degree part is just a Gaussian. If k is even, the Mathai–Quillen formula can also be written, on the open dense subset where $\zeta \in \mathfrak{so}(k)$ is invertible, as


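$$\mathrm{Th}_{SO(k)}(\mathbb{R}^k)(\zeta) = \det{}^{1/2}\Big(\frac{\zeta}{2\pi}\Big)\, \exp\big(-\|x\|^2 - \langle dx, \zeta^{-1}dx\rangle\big). \qquad [41]$$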


The form $\mathrm{Th}_{SO(k)}(\mathbb{R}^k)$ given by these formulas does not have compact support, but is rapidly decreasing at infinity. One obtains a compactly supported Thom form by applying an SO(k)-equivariant diffeomorphism from $\mathbb{R}^k$ onto some open ball of finite radius.

Note that the pullback of eqn [40] to the origin is equal to $\det^{1/2}(\zeta/2\pi)$ (equal to 0 if k is odd). This implies:


Theorem 31 Let $\iota\colon B \to V$ denote the inclusion of the zero section. Then

$$\iota^*\mathrm{Th}_G(V) = \mathrm{Eul}_G(V) \qquad [42]$$

where $\mathrm{Eul}_G(V) \in \Omega_G^k(B)$ is the equivariant Euler form.


Suppose M is a G-manifold, and S a closed G-invariant submanifold with oriented normal bundle $\nu_S$. Choose a G-equivariant tubular neighborhood embedding

$$\nu_S \to U \subset M \qquad [43]$$

and let $\mathrm{PD}_G(S) \in \Omega_G(M)_{\mathrm{cp}}$ be the image of $\mathrm{Th}_G(\nu_S)$ under this embedding. The form $\mathrm{PD}_G(S)$ has the property

$$\int_M \mathrm{PD}_G(S) \wedge \alpha = \int_S \iota_S^*\alpha \qquad [44]$$

for all closed equivariant forms $\alpha \in \Omega_G(M)$. It is called an "equivariant Poincaré dual" of S. By construction, the pullback to S is the equivariant Euler form:

$$\iota_S^*\mathrm{PD}_G(S) = \mathrm{Eul}_G(\nu_S). \qquad [45]$$

Equivariant Poincaré duality takes transversal intersections of G-manifolds to wedge products, similar to the nonequivariant case.


Remark 32 In general, the $(S\mathfrak{g}^*)^G$-submodule generated by Poincaré duals of G-invariant submanifolds is strictly smaller than $H_G(M)$. In this sense, the terminology "duality" is misleading.


Localization Theorem 


In this section, T will denote a torus. Suppose M is a compact oriented T-manifold. For any component F of the fixed point set of T, the action of T on the normal bundle $\nu_F$ fixes only the zero section F. This implies that $\nu_F$ has even rank and is orientable. Fix an orientation, and give F the induced orientation.

Since T is compact, the list of stabilizer groups of points in M is finite. Call $\xi \in \mathfrak{t}$ generic if it is not in the Lie algebra of any of these stabilizers, other than T itself. In this case, the value $\mathrm{Eul}_T(\nu_F,\xi)$ of the equivariant Euler form is invertible as an element of $\Omega(F)$.


Theorem 33 (Integration formula). Suppose M is a compact oriented T-manifold, where T is a torus. Let $\alpha \in \Omega_T(M)$ be a closed equivariant form, and let $\xi \in \mathfrak{t}$ be generic. Then

$$\int_M \alpha(\xi) = \sum_F \int_F \frac{\iota_F^*\alpha(\xi)}{\mathrm{Eul}_T(\nu_F,\xi)} \qquad [46]$$

where the sum is over the connected components of the fixed point set.


Rather than fixing $\xi$, one can also view eqn [46] as an equality of rational functions of $\xi \in \mathfrak{t}$.


Remark 34 The integration formula was obtained by Berline and Vergne (1983), based on ideas of Bott (1967). The topological counterpart, as a "localization principle," was proved independently by Atiyah and Bott (1984). More abstract versions of the localization theorem in equivariant cohomology had been proved earlier by Borel, Chang–Skjelbred and others.


Remark 35 If $\alpha = \mathrm{PD}_T(F) \wedge \beta$, where $\beta$ is equivariantly closed, the integration formula is immediate from the property [44] of Poincaré duals. The essence of the proof is to reduce to this case.


Remark 36 The localization contributions are particularly nice if $F = \{p\}$ is isolated (which can only happen if dim M is even). In this case, $\iota_F^*\alpha(\xi)$ is simply the value of the function $\alpha_{[0]}(\xi)$ at p. For the Euler form, one has

$$\mathrm{Eul}(\nu_p,\xi) = (-1)^{\dim M/2} \prod_j \langle \mu_j(p), \xi\rangle \qquad [47]$$

where $\mu_j(p) \in \mathfrak{t}^*$ are the (real) weights of the action on the tangent space $T_pM$. (Here we have chosen an isomorphism $T_pM \cong \mathbb{C}^l$ compatible with the orientation.) Hence, if all fixed points are isolated,

$$\int_M \alpha(\xi) = (-1)^{\dim M/2} \sum_p \frac{\alpha_{[0]}(\xi)(p)}{\prod_j \langle \mu_j(p), \xi\rangle}. \qquad [48]$$


Example 37 Let M be a compact oriented manifold, and $\chi(M) = \int_M \mathrm{Eul}(TM)$ its Euler characteristic. Suppose a torus T acts on M. Then

$$\chi(M) = \sum_F \chi(F) \qquad [49]$$

where the sum is over the connected components of the fixed point set of T. This follows from the integral of the equivariant Euler form $\alpha(\xi) = \mathrm{Eul}_T(TM,\xi)$, by letting $\xi \to 0$ in the localization formula. In particular, if M admits a circle action with isolated fixed points, the number of fixed points is equal to the Euler characteristic.


In a similar fashion, the localization formula gives interesting expressions for other characteristic numbers of manifolds and vector bundles, in the presence of a circle action. Some of these formulas were discovered prior to the localization formula; see in particular Bott (1967).


Example 38 In this example, we show that for a simply connected, simple Lie group G the 3-form $\phi \in \Omega^3(G)$ defined in eqn [26] is integral, provided "$\cdot$" is taken to be the basic inner product (for which the length squared of the short coroots equals 2). Since any such G is known to contain an SU(2) subgroup, it suffices to prove this for G = SU(2). Consider the conjugation action of the maximal torus T = U(1), consisting of diagonal matrices. The fixed point set for this action is T itself. The normal bundle $\nu_T$ is trivial, with T acting on the fiber $\mathfrak{g}/\mathfrak{t}$ by the negative root $-\alpha$. Hence, $\mathrm{Eul}(\nu_T,\xi) = \langle\alpha,\xi\rangle$. Let $\alpha^\vee \in \mathfrak{t}$ be the coroot, defined by $\langle\alpha,\alpha^\vee\rangle = 2$. By definition of the basic inner product, $\alpha^\vee\cdot\alpha^\vee = 2$. Let us integrate the T-equivariant extension $\phi_T(\xi)$ (cf. [27]). Its pullback to T is $\theta^T\cdot\xi$, where $\theta^T \in \Omega^1(T,\mathfrak{t})$ is the Maurer–Cartan form. The integral of $\theta^T$ is a generator of the integral lattice, that is, it equals $\alpha^\vee$. Thus,

$$\int_{SU(2)} \phi = \int_T \frac{\theta^T\cdot\xi}{\langle\alpha,\xi\rangle} = \frac{\alpha^\vee\cdot\xi}{\langle\alpha,\xi\rangle} = 1. \qquad [50]$$








Duistermaat-—Heckman Formulas 


In this section, we discuss the Duistermaat–Heckman formula, for the case of isolated fixed points. Let T be a torus, and $(M,\omega)$ a compact Hamiltonian T-space, with moment map $\Phi\colon M \to \mathfrak{t}^*$. Denote by $\omega_T = \omega + \Phi$ the equivariant extension of $\omega$. Assuming isolated fixed points, the localization formula gives, for all integers $k \ge 0$,

$$\int_M \big(\omega + \langle\Phi,\xi\rangle\big)^k = (-1)^n \sum_p \frac{\langle\Phi(p),\xi\rangle^k}{\prod_j \langle\mu_j(p),\xi\rangle} \qquad [51]$$

where $n = (1/2)\dim M$. Note that both sides are homogeneous of degree $k - n$ in $\xi$, but the terms on the right-hand side are only rational functions while the left-hand side is a polynomial. For $k = n$, both sides are independent of $\xi$, and compute the integral $\int_M \omega^n$. For $k < n$, the integral [51] is zero, and the cancellation of the terms on the right-hand side gives identities among the weights $\mu_j(p)$. Equation [51] also implies

$$\int_M e^{\omega + \langle\Phi,\xi\rangle} = (-1)^n \sum_p \frac{e^{\langle\Phi(p),\xi\rangle}}{\prod_j \langle\mu_j(p),\xi\rangle}. \qquad [52]$$
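The cancellation for $k < n$ can be checked in exact arithmetic on fixed-point data of a simple shape. The following is a minimal sketch under assumptions that are not taken from the text: the fixed points are labeled by distinct rational numbers `lam[i]`, the moment map value at the i-th fixed point is `lam[i]`, and the weights there are `lam[j] - lam[i]` for $j \ne i$ (data modeled on a circle action on complex projective space, up to normalization and sign conventions). The function name `localization_sum` is hypothetical.

```python
from fractions import Fraction

def localization_sum(values, k):
    """Sum over i of values[i]**k / prod_{j != i} (values[j] - values[i]).

    This mimics the right-hand side of a fixed-point formula without the
    overall sign (-1)**n; the 'fixed point data' are an illustrative assumption.
    """
    total = Fraction(0)
    for i, vi in enumerate(values):
        denom = Fraction(1)
        for j, vj in enumerate(values):
            if j != i:
                denom *= (vj - vi)  # product of the assumed weights at point i
        total += Fraction(vi) ** k / denom
    return total

# Four hypothetical fixed points, so n = 3.
lam = [Fraction(0), Fraction(1), Fraction(3), Fraction(7)]
n = len(lam) - 1
sums = [localization_sum(lam, k) for k in range(n + 1)]
for k, s in enumerate(sums):
    print(k, s)
```

For $k < n$ the individual terms are nonzero rational numbers that cancel exactly, and for $k = n$ the sum is a constant independent of the chosen `lam`, in line with the remark that both sides are then independent of $\xi$.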


Assume, in particular, that T = U(1), and let $\xi = t\xi_0$, where $\xi_0$ is the generator of the integral lattice in $\mathfrak{t}$. Identify $\mathfrak{t} \cong \mathbb{R}$ in such a way that $\xi_0$ corresponds to $1 \in \mathbb{R}$. Then $H = \langle\Phi,\xi_0\rangle$ is a Hamiltonian function with periodic flow. Write $a_j(p) = \langle\mu_j(p),\xi_0\rangle \in \mathbb{Z}$. Then eqn [52] reads

$$\int_M e^{tH}\,\frac{\omega^n}{n!} = (-1)^n \sum_p \frac{e^{tH(p)}}{t^n \prod_j a_j(p)}. \qquad [53]$$

The right-hand side of eqn [53] is the leading term for the stationary phase approximation of the integral on the left. For this reason, eqn [52] is known as the Duistermaat–Heckman exact stationary phase theorem.
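As a numerical illustration of the exact stationary phase phenomenon, one can compare both sides on the 2-sphere with its rotation action. This is a sketch under stated assumptions rather than a computation from the text: in spherical coordinates the symplectic form is taken to be `(area / (4*pi)) * sin(theta) dtheta dphi`, the Hamiltonian is `(area / 2) * cos(theta)` (so that its flow has period 1), and the weights at the north and south poles are taken to be -1 and +1. Other sign and normalization conventions change the intermediate signs but not the comparison.

```python
import math

def dh_integral_sphere(t, area=2.0, steps=20000):
    """Midpoint-rule value of the integral of exp(t*H) * omega over the 2-sphere.

    Assumed conventions: omega = (area / (4*pi)) * sin(theta) dtheta dphi and
    H = (area / 2) * cos(theta), whose Hamiltonian flow is a rotation of period 1.
    """
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        height = 0.5 * area * math.cos(theta)
        density = (area / (4 * math.pi)) * math.sin(theta)
        # The phi-integral contributes a factor 2*pi.
        total += math.exp(t * height) * density * 2 * math.pi * h
    return total

def fixed_point_sum(t, area=2.0):
    """Fixed-point side for n = 1: the two poles with assumed weights -1 and +1."""
    north = math.exp(t * area / 2) / (t * (-1))
    south = math.exp(-t * area / 2) / (t * (+1))
    return (-1) ** 1 * (north + south)

for t in (0.7, -1.3, 2.0):
    print(t, dh_integral_sphere(t), fixed_point_sum(t))
```

The two columns agree up to quadrature error for every nonzero `t`, not only asymptotically, which is the sense in which the stationary phase approximation is exact here.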

Formula [52] has the following consequence for the push-forward of the Liouville measure under the moment map, the so-called Duistermaat–Heckman measure $H_*(\omega^n/n!)$. Let $\Theta$ be the Heaviside measure (i.e., the characteristic measure of the positive real axis).


Theorem 39 (Duistermaat–Heckman). The push-forward $H_*(\omega^n/n!)$ is a piecewise polynomial measure of degree $n - 1$, with singularities at the set of all $H(p)$ for fixed points p of the action. One has the formula

$$H_*\Big(\frac{\omega^n}{n!}\Big) = \sum_p \frac{1}{\prod_j a_j(p)}\,\frac{(\lambda - H(p))^{n-1}}{(n-1)!}\,\Theta(\lambda - H(p)). \qquad [54]$$


Proof It is enough to show that the Laplace transforms of the two sides are equal. Multiplying by $e^{t\lambda}$ and integrating over $\lambda$ (take $t < 0$ to ensure convergence of the integral), the resulting identity is just eqn [53]. □


Remark 40 The theorem generalizes to Hamiltonian 
actions of higher-rank tori, and also to nonisolated 
fixed points. See the paper by Guillemin, Lerman, and 
Sternberg (1988) for a detailed discussion of this 
formula and of its “quantum analog.” 


Equivariant Index Theory 


By definition, the Cartan model consists of equivariant forms $\alpha(\xi)$ with polynomial dependence on the equivariant parameter $\xi$. However, the integration formula holds in much greater generality. For instance, one may consider generalized Cartan complexes (Kumar and Vergne 1993). Here the parameter $\xi$ varies in some invariant open subset of $\mathfrak{g}$, and the polynomial dependence is replaced by smooth dependence. The use of these more general complexes in equivariant index theory was pioneered by Berline and Vergne (1992).

Assume that M is an even-dimensional, compact oriented Riemannian manifold, equipped with a Spin-c structure. According to the Atiyah–Singer theorem, the index of the corresponding Dirac operator D is given by the formula

$$\mathrm{ind}(D) = \int_M \hat{A}(M)\,e^{c/2}. \qquad [55]$$

Here c is the curvature 2-form of the complex line bundle associated to the Spin-c structure, and $\hat{A}(M)$ is the $\hat{A}$-form. Recall that $\hat{A}(M)$ is obtained by substituting the curvature form in the power series expansion of the function $\hat{A}(x) = \det^{1/2}\big((x/2)/\sinh(x/2)\big)$ on $\mathfrak{so}(n)$.

Suppose now that a compact, connected Lie group G acts on M by isometries, and that the action lifts to the Spin-c bundle. Replacing curvatures with equivariant curvatures, one defines the equivariant form $\hat{A}(M)(\xi)$ and the form $c(\xi)$. Note that $\hat{A}(M)(\xi)$ is only defined for $\xi$ in a sufficiently small neighborhood of 0, since the function $\hat{A}(x)$ is not analytic for all x. The G-index of the equivariant Spin-c Dirac operator is a virtual character $g \mapsto \mathrm{ind}(D)(g)$ of the group G. For $g = \exp\xi$ sufficiently small, it is given by the formula

$$\mathrm{ind}(D)(\exp\xi) = \int_M \hat{A}(M)(\xi)\,e^{c(\xi)/2}. \qquad [56]$$

For $\xi$ sufficiently small, the fixed point set of g coincides with the set of zeroes of the vector field $a_\xi$. The localization formula reproduces the Atiyah–Segal formula for $\mathrm{ind}(D)(g)$, as an integral over $M^g$.

Berline and Vergne (1996) gave similar formulas 
for the equivariant index of any G-equivariant 
elliptic operator, and more generally for operators 
that are transversally elliptic in the sense of Atiyah. 


See also: Cohomology Theories; Compact Groups and 
Their Representations; Hamiltonian Group Actions; 
K-theory; Lie Groups: General Theory; Mathai—Quillen 
Formalism; Path-Integrals in Noncommutative Geometry; 
Stationary Phase Approximation.


Introduction


Ergodic theory grew out of the following result of Poincaré, which served as the starting point for the measure theory of dynamical systems, understood as the study of the properties of motions that take place at "almost all" initial states of a system: let $(X,\mathcal{B},\mu)$ be a probability space and let a transformation $T\colon X \to X$ preserve $\mu$ (i.e., $\mu(T^{-1}A) = \mu(A)$ for any $A \in \mathcal{B}$). If $\mu(A) > 0$, then for almost all points $x \in A$ the orbit $\{T^n x\}_{n\ge 0}$ returns to A infinitely often (the Poincaré–Carathéodory recurrence theorem).

The main theme of ergodic theory is to know whether averages of quantities generated in a stationary manner converge. In the classical situation the stationarity is described by a measure-preserving transformation T, and one considers averages taken along a sequence $f, f\circ T, f\circ T^2, \ldots$ for integrable f. This corresponds to the probabilistic concept of stationarity. Hence, traditionally, ergodic theory is the qualitative study of iterates of an individual transformation, or of a one-parameter flow of transformations (such as that obtained from the solution of an autonomous ordinary differential equation). We should note that an important purpose behind this theory is to verify significant facts from a statistical point of view (e.g., the law of large numbers, convergence to limit distributions). The oldest branch of this theory is the study of ergodic theorems. It was started in 1931 by Birkhoff (1931) and von Neumann (1932), having its origins in statistical mechanics.
More specifically, the central notion is that of ergodicity, which is intended to capture the idea that a flow is "random" or "chaotic." In dealing with the motion of molecules, Boltzmann and Gibbs made such hypotheses from the beginning. One of the earliest precise definitions of randomness of a dynamical system was "minimality": the orbit of almost every point is dense. In order to describe such phenomena in a measure-theoretical setting, von Neumann and Birkhoff required the stronger assumption of ergodicity as follows. Let $(X,\mathcal{B},\mu)$ be a measure space and $F_t$ a measurable flow on X. We call $F_t$ ergodic if the only invariant measurable sets are $\emptyset$ or all of X. Here, the invariance of the set A means that $F_t(A) = A$ for all $t \in \mathbb{R}$, and we agree to write $A = B$ if A and B differ by a null set with respect to $\mu$. Note that ergodicity implies minimality if we are on a second countable Borel space. A function $f\colon X \to \mathbb{R}$ will be called a "constant of the motion" iff $f \circ F_t = f$ a.e. for each $t \in \mathbb{R}$. Then we see that a flow $F_t$ on X is ergodic iff the only constants of the motion are constant a.e.

In the case of a measurable transformation T on X, the invariance of the set A means that $T^{-1}A = A$, and the measurable function f is called invariant if $f \circ T = f$ a.e. Then we call T ergodic provided that, if A is invariant, then either $\mu(A) = 0$ or $\mu(A) = 1$; equivalently, any invariant function is constant a.e. (Cornfeld et al. 1982). The most basic example where ergodicity can be verified is the following: if M is a compact Riemannian manifold and has negative sectional curvatures at each point, then the geodesic flow on each sphere bundle is ergodic (Hopf–Hadamard). In general, verifying ergodicity can still be very difficult. In the Hamiltonian case, the first step is to pass to an energy surface. For example, Sinai (1970) shows that one has ergodicity on an energy surface of a classical model for molecular motion, that is, a collection of hard spheres in a box.


Ergodic Theorems 


Koopman (1931) published the following significant observation: if T is an invertible measure-preserving transformation of a measure space $(X,\mathcal{B},\mu)$, then the operator U, defined on $L^2(X,\mathcal{B},\mu)$ by $Uf(x) := f(Tx)$, is unitary. Thus, the association of U with T replaces a nonlinear finite-dimensional problem with a linear infinite-dimensional one. Then von Neumann (1932) showed an intimate connection between measure-preserving transformations and unitary operators (the mean ergodic theorem): let U be a unitary operator on a Hilbert space $\mathcal{H}$. Denote by P the orthogonal projection onto the subspace $\mathcal{H}_0 := \{f \in \mathcal{H} \mid Uf = f\}$. For any $f \in \mathcal{H}$, one has

$$\lim_{N\to\infty} \Big\| \frac{1}{N}\sum_{n=0}^{N-1} U^n f - Pf \Big\|_{\mathcal{H}} = 0.$$
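The mean ergodic theorem can be watched at work in the simplest possible setting. The sketch below assumes a diagonal unitary operator on a three-dimensional complex Hilbert space, with eigenvalues `exp(i*phase)` for hypothetical phases 0, 1 and 2; the projection P then simply keeps the coordinate with eigenvalue 1. It is an illustration only and does not involve a measure-preserving transformation.

```python
import cmath

def ergodic_average(phases, f, N):
    """Return (1/N) * sum_{n<N} U^n f for U = diag(exp(i*phase_k))."""
    avg = []
    for phase, fk in zip(phases, f):
        u = cmath.exp(1j * phase)
        power = 1 + 0j   # holds u**n
        acc = 0j
        for _ in range(N):
            acc += power * fk
            power *= u
        avg.append(acc / N)
    return avg

def projection_onto_fixed(phases, f):
    """Orthogonal projection onto {f : Uf = f}: keep coordinates with eigenvalue 1."""
    return [fk if phase == 0.0 else 0j for phase, fk in zip(phases, f)]

def distance(u, v):
    """Euclidean (Hilbert space) distance between two complex vectors."""
    return sum(abs(a - b) ** 2 for a, b in zip(u, v)) ** 0.5

phases = [0.0, 1.0, 2.0]          # assumed spectrum, chosen for illustration
f = [1 + 0j, 1 + 0j, 1 + 0j]
target = projection_onto_fixed(phases, f)
errors = [distance(ergodic_average(phases, f, N), target) for N in (10, 100, 10000)]
print(errors)
```

The coordinates with eigenvalue different from 1 average out like a geometric sum divided by N, so the error decays at rate 1/N in this example; in infinite dimensions no rate is guaranteed in general.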








As a corollary, one can show that if $T\colon X \to X$ is an ergodic measure-preserving transformation on a probability space $(X,\mathcal{B},\mu)$ then, for any $f \in L^1(X,\mathcal{B},\mu)$,

$$\lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = \int_X f\,d\mu$$

in $L^1$-norm. We also know that T is ergodic if and only if U has 1 as a simple eigenvalue. In the case of a continuous invertible process, the setting is the following. Let M be a manifold and $\Omega$ a volume on M, with $\mu_\Omega$ the corresponding measure. If $F_t$ is a volume-preserving flow on M, then $F_t$ induces a linear one-parameter group of isometries on $\mathcal{H} = L^2(M,\mu_\Omega)$ by $U_t(f) = f \circ F_{-t}$. Then $U_t$ has 1 as a simple eigenvalue for all t if and only if $F_t$ is ergodic.

On the other hand, Birkhoff (1931) proved the following almost everywhere statement (the pointwise ergodic theorem): for any $f \in L^1(X,\mathcal{B},\mu)$, there exists a function $\bar{f} \in L^1(X,\mathcal{B},\mu)$ such that for $\mu$-a.e. x, $\bar{f}(Tx) = \bar{f}(x)$ and

$$\lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = \bar{f}(x).$$

In particular, if T is ergodic then for $\mu$-a.e. x, $\bar{f}(x) = \int_X f\,d\mu$. Thus, the Birkhoff theorem allows one to prove the ergodic hypothesis of Boltzmann and Gibbs, that is, the space average of an observable function coincides with its time averages almost everywhere, and guarantees the existence, almost everywhere, of the mean number of occurrences in any measurable set.

On the other hand, the physical meaning of the mean ergodic theorem can be explained as follows. We now turn to a one-parameter flow of transformations. In order to study continuous averages

$$\frac{1}{t}\int_0^t f(F_s x)\,ds$$

fix some $s_0 \in \mathbb{R}$ and consider the averages of the form

$$\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x)$$

where $T = F_{s_0}$. In reality, the measurements can be done only approximately at times $t = 0, 1, \ldots, N-1$, and it is natural to consider the perturbed averages

$$\frac{1}{N}\sum_{n=0}^{N-1} f(T^{n+\delta_n} x)$$

where $\{\delta_n\}_{n\in\mathbb{N}}$ is an independent random sequence in a small interval $(-\epsilon,\epsilon)$. Assuming that $T = F_{s_0}$ is ergodic, we would like to know whether for large N the perturbed averages are close to the space average $\int_X f\,d\mu$. The answer to this question is satisfactory if one is concerned with norm convergence (see, e.g., Bergelson et al. (1994)).
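The statement that time averages equal space averages can be illustrated with a transformation that is easy to iterate without numerical degeneration. The sketch below assumes the rotation $x \mapsto x + \alpha \pmod 1$ of the unit interval with the golden-mean rotation number, which is a standard example of an ergodic transformation for Lebesgue measure; the observables and the starting point are arbitrary choices made for illustration.

```python
import math

def birkhoff_average(f, x0, alpha, N):
    """Time average (1/N) * sum_{n<N} f(T^n x0) for the rotation T x = x + alpha (mod 1)."""
    return sum(f((x0 + n * alpha) % 1.0) for n in range(N)) / N

alpha = (math.sqrt(5) - 1) / 2  # irrational rotation number (golden mean)

def indicator(x):
    """Indicator function of the interval [0, 0.3)."""
    return 1.0 if x < 0.3 else 0.0

def wave(x):
    """A smooth observable with space average 0."""
    return math.cos(2 * math.pi * x)

time_avg_indicator = birkhoff_average(indicator, x0=0.1, alpha=alpha, N=20000)
time_avg_wave = birkhoff_average(wave, x0=0.1, alpha=alpha, N=20000)
print(time_avg_indicator, time_avg_wave)
```

The first average approximates the Lebesgue measure 0.3 of the interval, i.e. the mean frequency of visits to that set, and the second approximates the integral of the cosine, which is 0.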


Induced Transformations and 
Tower Constructions 


Suppose T is a measure-preserving transformation on a probability space $(X,\mathcal{B},\mu)$ and $A \in \mathcal{B}$ with $\mu(A) > 0$. Let us transform A into a space with normalized measure by choosing the σ-algebra $\mathcal{B}_A$ consisting of all subsets $E \subset A$, $E \in \mathcal{B}$, and setting $\mu_A(E) = \mu(A \cap E)/\mu(A)$. Let $R_A\colon A \to \mathbb{N} \cup \{\infty\}$ be the "first return function," that is, $R_A(x) := \inf\{n \in \mathbb{N} \mid T^n x \in A\}$. Then it follows from the Poincaré recurrence theorem that $\mu_A(\{x \in A \mid R_A(x) < \infty\}) = 1$. Define $T_A\colon \{x \in A \mid R_A(x) < \infty\} \to A$ by $T_A x := T^{R_A(x)} x$, which is called the "induced transformation" over A (constructed from T). For each $n \in \mathbb{N}$ we define $A_n := \{x \in A \mid R_A(x) = n\}$. Then for every $E \in \mathcal{B}_A$ we see that $T_A^{-1}E = \bigcup_{n=1}^\infty (A_n \cap T^{-n}E)$. Hence, if T is invertible, then we have immediately $\mu_A(T_A^{-1}E) = \mu_A(E)$; thus, $\mu_A$ is invariant under $T_A$. Even if T is noninvertible, since for every $k \ge 1$ the equality

$$\mu\Big(\bigcap_{j=0}^{k-1} T^{-j}A^c \cap T^{-k}E\Big) = \mu\big(A_{k+1} \cap T^{-(k+1)}E\big) + \mu\Big(\bigcap_{j=0}^{k} T^{-j}A^c \cap T^{-(k+1)}E\Big)$$

holds, we have $\mu(E) = \sum_{k=1}^\infty \mu(A_k \cap T^{-k}E) = \mu(T_A^{-1}E)$, which allows us to see that $T_A$ preserves $\mu_A$.

We note that for every $E \in \mathcal{B}_A$ with $\mu(E) > 0$, $\#\{n \in \mathbb{N} \mid T^n x \in E\} = \#\{n \in \mathbb{N} \mid T_A^n x \in E\}$. Therefore, for a.e. $x \in A$, $\sum_{n=1}^\infty 1_E(T^n x) = \sum_{n=1}^\infty 1_E(T_A^n x)$. This equality allows us to see that if $(T,\mu)$ is ergodic then $(T_A,\mu_A)$ is ergodic. Indeed, suppose $T_A^{-1}E = E$ and $\mu(A \cap E^c) > 0$. Then for $x \in A \cap E^c$, we have $\sum_{n=1}^\infty 1_E(T_A^n x) = 0$. On the other hand, as $E \subset \bigcup_{n=1}^\infty T^{-n}E$ (mod $\mu$), $\bigcup_{n=1}^\infty T^{-n}E = \bigcup_{n=0}^\infty T^{-n}E$ is a T-invariant set. Hence, ergodicity of $(T,\mu)$ allows us to see that $\bigcup_{n=1}^\infty T^{-n}E = X$ ($\mu$ mod 0), which implies $\sum_{n=1}^\infty 1_E(T^n x) = \infty$.

In the case when T is invertible, we can write $\int_A R_A\,d\mu = \mu\big(\bigcup_{n\ge 0} T^n A\big)$, so that Kac's formula (Darling and Kac 1957):

$$\int_A R_A\,d\mu_A = \frac{1}{\mu(A)}$$

is valid when $\bigcup_{n\ge 0} T^n A = X$ (mod $\mu$). In particular, $\mu\big(\bigcup_{n\ge 0} T^n A\big) = 1$ if T is ergodic. The key to establishing the Kac formula is to show that the sets $T^i A_k$ ($0 \le i \le k-1$, $k \ge 1$) are pairwise disjoint. This property holds when T is invertible. On the other hand, in the case when T is noninvertible, if $\bigcup_{n\ge 0} T^{-n}A = X$ ($\mu$ mod 0) then we can establish, for every $E \in \mathcal{B}$,

$$\mu(E) = \mu(A)\int_A \sum_{h=0}^{R_A(x)-1} 1_E(T^h x)\,d\mu_A(x) \qquad [1]$$

by noting that the following equality holds for all $n \ge 1$:

$$\mu(E) = \sum_{k=1}^{n} \mu\Big(A \cap \bigcap_{j=1}^{k} T^{-j}A^c \cap T^{-k}E\Big) + \mu(A \cap E) + \mu\Big(\bigcap_{j=0}^{n} T^{-j}A^c \cap T^{-n}E\Big).$$

Then choosing E = X allows one to establish the Kac formula. As we have observed above, the assumption that $\bigcup_{n\ge 0} T^{-n}A = X$ ($\mu$ mod 0) is automatically satisfied if $(T,\mu)$ is ergodic. Conversely, if $(T_A,\mu_A)$ is ergodic and $\bigcup_{n=1}^\infty T^{-n}A = X$ (mod $\mu$) holds, then $(T,\mu)$ is ergodic. We should remark that the formula [1] allows one to obtain a T-invariant measure when a $T_A$-invariant measure $\mu_A$ is obtained previously. Even if $R_A$ is nonintegrable, we may have a σ-finite infinite invariant measure. Then if $\mu_A$ is ergodic, $\mu$ obtained by [1] is still ergodic (i.e., $T^{-1}E = E$ implies that $\mu(E) = 0$ or $\mu(E^c) = 0$) under the assumption that $\bigcup_{n\ge 0} T^{-n}A = X$ (mod $\mu$) (cf. Aaronson (1997)).

In particular, the recent progress in the study of nonhyperbolic systems strongly depends on such constructions of induced maps over hyperbolic regions. More specifically, if one can find a subset A over which the induced map possesses an invariant measure satisfying nice statistical properties, then the formula [1] may give a σ-finite invariant measure $\mu$ for the original map T which reflects the statistical properties of the induced system. The fundamental problem in the study of nonhyperbolic phenomena arising from complex systems is to clarify how to predict statistical properties of nonhyperbolic systems $(T,\mu)$ by using those of induced systems $(T_A,\mu_A)$ over hyperbolic regions. We should remark that induced maps are well defined over positive-measure sets with respect to a reference measure $\nu$ that is "conservative." Here conservativity of $(T,\nu)$ means that there are no wandering sets of positive measure with respect to $\nu$. In many cases, the reference measures are physical measures (e.g., Lebesgue measures, conformal measures) which satisfy nonsingularity with respect to T. Here nonsingularity of $\nu$ means that $\nu T^{-1} \sim \nu$. Then as long as we obtain a $T_A$-invariant measure $\mu_A$ which is equivalent to $\nu|_A$, the formula [1] may give us a T-invariant σ-finite measure which is equivalent to $\nu$.
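The first return function, the induced transformation and Kac's formula can all be computed by hand on a finite model. The sketch below assumes a toy system that is not from the text: X is the set {0, ..., 11} with the uniform probability measure, T is the cyclic permutation $x \mapsto x + 5 \pmod{12}$ (invertible and ergodic for this measure), and A is an arbitrarily chosen four-point subset. The helper names are hypothetical.

```python
from fractions import Fraction

def first_return_times(T, A):
    """First return time R_A(x) = min{n >= 1 : T^n x in A} for each x in A."""
    times = {}
    for x in A:
        n, y = 1, T[x]
        while y not in A:
            y = T[y]
            n += 1
        times[x] = n
    return times

def induced_map(T, A):
    """Induced transformation T_A x = T^{R_A(x)} x on A."""
    R = first_return_times(T, A)
    TA = {}
    for x in A:
        y = x
        for _ in range(R[x]):
            y = T[y]
        TA[x] = y
    return TA

# Toy system: a single 12-cycle with the uniform probability measure.
size = 12
T = {x: (x + 5) % size for x in range(size)}
A = {0, 1, 4, 9}

R = first_return_times(T, A)
measure_of_A = Fraction(len(A), size)
mean_return = Fraction(sum(R.values()), len(A))  # integral of R_A against mu_A
T_A = induced_map(T, A)
print(R, mean_return, 1 / measure_of_A)
```

Here the mean return time equals 1/μ(A) exactly, and the induced map is a permutation of A, so it preserves the normalized counting measure on A.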

At the end of this section, we will explain that the formula [1] can be obtained via a Rohlin tower (Kakutani's skyscraper) in the case when T is invertible. This tower construction is dual to the construction of induced transformations. Assuming that we are given an invertible transformation T of the measure space $(X,\mathcal{B},\mu)$, consider a measurable integer-valued positive function $f \in L^1(X,\mathcal{B},\mu)$. By using this function, construct a new measure space $X^f$, whose points are of the form $(x,i)$, where $x \in X$, $1 \le i \le f(x)$ and i is an integer. The σ-algebra $\mathcal{B}^f$ of measurable sets in $X^f$ is constructed in an obvious way. The measure $\mu^f$ is defined as follows: for any subset of the form $(A,i)$, $A \in \mathcal{B}$, we put

$$\mu^f((A,i)) = \frac{\mu(A)}{\int_X f\,d\mu}.$$

Let

$$T^f(x,i) = \begin{cases} (x,i+1) & \text{if } i+1 \le f(x) \\ (Tx,1) & \text{if } i+1 > f(x). \end{cases}$$

It is easy to see that $T^f$ preserves $\mu^f$. The space can naturally be visualized as a tower whose foundation is the space X and which has f(x) floors over the point $x \in X$. The space X is identified with the set of points (x,1). We see that $T = (T^f)_X$, and the construction of $(X^f,T^f)$ is called the Rohlin tower over X.

Let T be an invertible measure-preserving transformation on a probability space $(X,\mathcal{B},\mu)$ and $A \in \mathcal{B}$ with $\mu(A) > 0$. Suppose that $X = \bigcup_{n\ge 0} T^n A$ (mod $\mu$). Then T is represented as the Rohlin tower $(A^{R_A}, \mathcal{B}^{R_A}, (\mu_A)^{R_A})$ over A as follows. We define $p\colon (A^{R_A}, \mathcal{B}^{R_A}, (\mu_A)^{R_A}) \to (X,\mathcal{B},\mu)$ by $p(x,i) := T^i x$. Then p is an isomorphism satisfying $p \circ (T_A)^{R_A} = T \circ p$ (almost everywhere). Moreover, we can verify that $(\mu_A)^{R_A} \circ p^{-1} = \mu$ by assuming ergodicity of $\mu$. This is because, for every $E \in \mathcal{B}$, the measure of $p^{-1}E$ can be computed from the formula [1] together with the Kac formula.

On the other hand, in the case when T is noninvertible, the formula [1] is not necessarily obtained by any tower construction, except in very special cases. For example, even if T is not invertible, the tower construction is valid if $T|_A$ and $T|_{A^c}$ are one-to-one and TA = X.
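The representation of an invertible system as a tower over a subset can be made concrete on a finite model. The sketch below assumes the same kind of toy system as one would use for Kac's formula (a cyclic permutation of {0, ..., 11} and a hypothetical four-point base A); it builds the tower with heights given by the first return times, with the floor map following the two-case rule above and the map p(x, i) = T^i x. The function name `build_tower` is hypothetical.

```python
def build_tower(T, A):
    """Return (points, tower_map, p) for the tower over A with height function R_A."""
    def iterate(x, n):
        for _ in range(n):
            x = T[x]
        return x

    # First return times R_A(x) for x in A.
    R = {}
    for x in A:
        n, y = 1, T[x]
        while y not in A:
            y, n = T[y], n + 1
        R[x] = n

    # Points (x, i) with 1 <= i <= R_A(x).
    points = [(x, i) for x in sorted(A) for i in range(1, R[x] + 1)]

    tower_map = {}
    for (x, i) in points:
        if i + 1 <= R[x]:
            tower_map[(x, i)] = (x, i + 1)             # climb one floor
        else:
            tower_map[(x, i)] = (iterate(x, R[x]), 1)  # apply the induced map, go to floor 1

    p = {(x, i): iterate(x, i) for (x, i) in points}   # p(x, i) = T^i x
    return points, tower_map, p

size = 12
T = {x: (x + 5) % size for x in range(size)}
A = {0, 1, 4, 9}
points, tower_map, p = build_tower(T, A)
is_bijection = sorted(p.values()) == list(range(size))
is_conjugacy = all(p[tower_map[pt]] == T[p[pt]] for pt in points)
print(len(points), is_bijection, is_conjugacy)
```

In this finite model p is a bijection from the tower onto X that intertwines the tower map with T, which is the discrete counterpart of the isomorphism statement above.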


Convergence to Equilibrium States 
and Mixing Properties 


Let T: X — X be a measure-preserving transforma- 
tion on a probability space (X, B, u). We call T to be 
“weak mixing” if for any A,B € B 





TAOB) — w(A)u(B)| = 0 


The weak-mixing property of (T, u) can be repre- 
sented by; Vf, g € L?(X, B, m) 


1X1 
ms f Egan- f fdu f sdu) = 0 
Ca FT )adu— | fdu | gdu 


and this is equivalent to the ergodicity of (T x 
T, u x u). Moreover, (T, u) is weak mixing if and 
only if the unitary operator U:H — H defined by 
Uf(x)=f(Tx) has no eigenfunctions that are not 
constants (umod 0). We say that the operator U has 
continuous spectrum if there are no eigenvectors. If 
H is the closure of the linear span of the 
eigenvectors, then we say that the operator U has 
pure point spectrum. The weak-mixing property of 
(T, u) just implies that U restricted on the ortho- 
normal subspace of the subspace consisting of 
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constant functions has continuous spectrum. We 
recall that if U has one as a simple eigenvalue then T 
is ergodic. Additionally, if there are no other 
eigenvalues, then T is weakly mixing. Hence, if T 
is weak mixing, then it is necessarily ergodic. The 
next property corresponds to the term “relaxation” 
in physics literature which is used to describe 
processes under which the system passes to a certain 
stationary state independently of its original state. 
We call T (strong) mixing if for any A,B € B 


lim u(T "A AB) = u(A)u(B) 


Then (T, u) is (strong) mixing if and only if for any 
fg € L*(X, B, py) 


im [FT \edu= | fdu f gdu 
B X x 

and mixing necessarily implies weak mixing. Moreover, for any probability measure $\nu$ absolutely continuous with respect to $\mu$, one can show that $\lim_{n\to\infty} \nu(T^{-n}A) = \mu(A)$ for every $A \in \mathcal{B}$. Thus, any nonequilibrium distribution tends to an equilibrium one with time. The mixing property has a significant meaning from a physical point of view, as it implies decay of correlations of observable functions; moreover, limiting distributions of averaged observables are determined by the decay rates of correlation functions in many cases (e.g., hyperbolic systems). For any $f \in L^2(X, \mathcal{B}, \mu)$ we consider the scalar products $s_n = s_n(f) = (U^n f, f)$, $n \ge 0$, and define $s_n := \overline{s_{-n}}$ for $n < 0$. The sequence $\{s_n\}_{n \in \mathbb{Z}}$ is positive definite and so, by Bochner's theorem, we can write $s_n(f) = \int_{S^1} \exp[2\pi i n \lambda]\, d\sigma_f(\lambda)$, where $\sigma_f$ is a finite Borel measure on the unit circle $S^1$ and satisfies the condition that $\sigma_f(S^1) = \|f\|^2$. Such a measure is called a spectral measure of $f$. We see that $T$ is mixing iff for any $f \in L^2(X, \mathcal{B}, \mu)$ with $\int_X f\, d\mu = 0$ the Fourier coefficients $\{s_n\}$ of the spectral measure $\sigma_f$ tend to zero as $|n| \to \infty$. Let $(X, \mathcal{B}, \mu)$ be isomorphic to $([0,1], \mathcal{B}_0, \lambda)$, where $\mathcal{B}_0$ is the Borel $\sigma$-algebra on $[0,1]$ and $\lambda$ is the normalized Lebesgue measure of $[0,1]$. Then we call a measure-preserving transformation $T$ on $(X, \mathcal{B}, \mu)$ an exact endomorphism if $\bigcap_{n=0}^{\infty} T^{-n}\mathcal{B} = \{X, \emptyset\}$ ($\mu$ mod 0). We can verify that an exact endomorphism is (strong) mixing (Rohlin 1964). Moreover, $\mu$ is exact if for any positive-measure set $A \in \mathcal{B}$ with $T^n A \in \mathcal{B}$ ($\forall n \ge 0$), $\lim_{n\to\infty} \mu(T^n A) = 1$ holds. Let $T$ be a nonsingular transformation on $(X, \mathcal{B}, \nu)$, that is, $\nu T^{-1} \sim \nu$. Then we can define the transfer (Perron-Frobenius) operator $\mathcal{L}_\nu: L^1(X, \nu) \to L^1(X, \nu)$ by $\mathcal{L}_\nu f := d\big((f\nu)T^{-1}\big)/d\nu$, which satisfies

$$\int_X (\mathcal{L}_\nu f)\, g\, d\nu = \int_X f\, (g \circ T)\, d\nu \qquad (\forall g \in L^\infty(X, \nu))$$


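We say that a nonsingular measure $\nu$ is exact if $A \in \bigcap_{n=0}^{\infty} T^{-n}\mathcal{B}$ implies $\nu(A)\nu(A^c) = 0$. By Lin's theorem (Lin 1971) the exactness of $\nu$ can be described as follows: for all $f \in L^1(X, \nu)$ with $\int_X f\, d\nu = 0$, $\lim_{n\to\infty} \|\mathcal{L}_\nu^n f\|_1 = 0$. Let $\mu = h\nu$ be an exact $T$-invariant probability measure equivalent to $\nu$. Then the upper bounds of mixing rates of the exact measure $\mu = h\nu$ are determined by the speed of $L^1$-convergence of the iterated transfer operators $\{\mathcal{L}_\nu^n\}$. This is because $\mathcal{L}_\nu h = h$ and for every $f \in L^1(X, \nu)$ with $\int_X f\, d\nu = 1$, $\lim_{n\to\infty} \|\mathcal{L}_\nu^n f - h\|_1 = 0$. Hence, the property $\mathcal{L}_\mu f = h^{-1}\mathcal{L}_\nu(hf)$ allows one to see that for every $f, g \in L^\infty(X, \mu)$ the correlation function

$$\left| \int_X f(T^n x)\, g(x)\, d\mu - \int_X f\, d\mu \int_X g\, d\mu \right|$$

is bounded from above by

$$\|f\|_\infty \left\| \mathcal{L}_\mu^n g - \int_X g\, d\mu \right\|_{L^1(\mu)} = \|f\|_\infty \left\| \mathcal{L}_\nu^n (gh) - P(gh) \right\|_{L^1(\nu)}$$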


where $P: L^1(X, \nu) \to L^1(X, \nu)$ is a linear operator defined by $Pf := h \int_X f\, d\nu$. The operator $P$ is the one-dimensional projection operator associated with the eigenvalue 1 (which is maximal in many cases) of $\mathcal{L}_\nu$, satisfying $P^2 = P$ and $P\mathcal{L}_\nu = \mathcal{L}_\nu P = P$. Moreover, since $\mathcal{L}_\nu^n - P = (\mathcal{L}_\nu - P)^n$, the exponential decay of mixing rates follows from the spectral gap of $\mathcal{L}_\nu$, that is, from 1 being a simple isolated maximal eigenvalue of $\mathcal{L}_\nu$.
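
As a rough numerical illustration of exponential decay of correlations (a sketch only, not part of the original argument), one may take the doubling map $Tx = 2x \bmod 1$ on $[0,1]$ with Lebesgue measure, a standard example of an exact endomorphism. For the mean-zero observable $f(x) = x - 1/2$ a direct computation gives $\int f(T^n x) f(x)\, dx = 2^{-n}/12$. The function name and the grid size below are illustrative assumptions.

```python
def doubling_correlation(n, grid_bits=14):
    """Midpoint-rule estimate of the integral of f(T^n x) f(x) over [0, 1],
    where T x = 2x mod 1 and f(x) = x - 1/2 (so the mean of f is zero)."""
    num_cells = 1 << grid_bits
    total = 0.0
    for j in range(num_cells):
        x = (j + 0.5) / num_cells          # dyadic midpoint, exact in binary floats
        tx = (x * (1 << n)) % 1.0          # T^n x = 2^n x mod 1
        total += (tx - 0.5) * (x - 0.5)
    return total / num_cells


# Compare the estimate with the closed form 2^{-n} / 12 for a few n.
for n in range(0, 9):
    print(n, doubling_correlation(n), 2.0 ** (-n) / 12.0)
```

The correlations halve at each step, which is the kind of geometric rate that a spectral gap of the transfer operator would produce.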


Entropy and Reversibility 


We recall one of the fundamental problems of ergodic theory, namely deciding when two automorphisms $T_1, T_2$ of probability spaces $(X_1, \mathcal{B}_1, \mu_1)$ and $(X_2, \mathcal{B}_2, \mu_2)$ are equivalent. The approach developed for this problem involved the study of spectral properties of the associated isometric operators $U_i: L^2(X_i, \mu_i) \to L^2(X_i, \mu_i)$ $(i = 1, 2)$ and is based on the concept of the entropy of an automorphism $T$, introduced by Kolmogorov (1958). The entropy is a non-negative number, which is the same for equivalent automorphisms. For example, the entropy of the Bernoulli shift $\sigma: \prod_{n \in \mathbb{Z}} \{1, 2, \ldots, d\} \to \prod_{n \in \mathbb{Z}} \{1, 2, \ldots, d\}$ with probability vector $(p_1, p_2, \ldots, p_d)$ is equal to $-\sum_{k=1}^{d} p_k \log p_k$. A remarkable theorem of Ornstein (1970) states that Bernoulli shifts with the same entropy are equivalent. On the other hand, Shannon (1948) introduced a notion of entropy in his work on information theory, which is essentially the same as Kolmogorov's. Let $T: X \to X$ be a measure-preserving transformation on a probability space $(X, \mathcal{B}, \mu)$. We define the entropy of a measurable partition $\alpha$ of $X$ by

$$H_\mu(\alpha) = -\sum_{A \in \alpha} \mu(A) \log \mu(A)$$

and define the entropy of $T$ with respect to $\alpha$ by

$$h_\mu(T, \alpha) := \lim_{n\to\infty} \frac{1}{n} H_\mu\left( \bigvee_{i=0}^{n-1} T^{-i}\alpha \right)$$

Then the (measure-theoretic) entropy of $T$ is defined by

$$h_\mu(T) = \sup_{\alpha:\, H_\mu(\alpha) < \infty} h_\mu(T, \alpha)$$
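
The definitions above can be checked by hand on a Bernoulli shift. The following sketch (illustrative only; the function names and the probability vector are assumptions, not taken from the article) lists the measures of all cylinders of length $n$, which are the elements of the joined partition for the time-zero partition $\alpha$, and compares $\frac{1}{n} H_\mu$ of that partition with $-\sum_k p_k \log p_k$.

```python
import itertools
import math


def partition_entropy(measures):
    """H_mu(alpha) = -sum mu(A) log mu(A) over the elements A of a partition."""
    return -sum(m * math.log(m) for m in measures if m > 0)


def cylinder_entropy_rate(p, n):
    """(1/n) H_mu of the join of T^{-i} alpha, i = 0..n-1, for the Bernoulli(p) shift.

    Each element of the join is a cylinder fixing n symbols; its measure is the
    product of the corresponding entries of the probability vector p.
    """
    cylinders = [math.prod(word) for word in itertools.product(p, repeat=n)]
    return partition_entropy(cylinders) / n


p = (0.5, 0.3, 0.2)                      # hypothetical probability vector
for n in (1, 2, 4, 6):
    print(n, cylinder_entropy_rate(p, n), partition_entropy(p))
```

For a Bernoulli measure the ratio is independent of $n$, so the limit defining $h_\mu(T, \alpha)$ is reached immediately.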


The following theorem of Abramov gives an important method of practical computation: let $\{\alpha_n\}_{n \ge 1}$ be an increasing sequence of partitions with $H_\mu(\alpha_n) < \infty$ ($\forall n \ge 1$) and such that $\bigvee_{n \ge 1} \alpha_n$ generates the $\sigma$-algebra $\mathcal{B}$. Then $h_\mu(T) = \lim_{n\to\infty} h_\mu(T, \alpha_n)$. A partition $\alpha$ is called a generator for a noninvertible measure-preserving transformation $T$ on a probability space $(X, \mathcal{B}, \mu)$ if $\bigvee_{i=0}^{\infty} T^{-i}\alpha$ generates $\mathcal{B}$. If $T$ is invertible then a partition $\alpha$ is called a generator if $\bigvee_{i=-\infty}^{\infty} T^{-i}\alpha$ generates $\mathcal{B}$. In the case when $\alpha$ is a generator with $H_\mu(\alpha) < \infty$, by the Kolmogorov-Sinai theorem we have $h_\mu(T) = h_\mu(T, \alpha)$. Let $\alpha_n(x)$ denote the element of $\bigvee_{i=0}^{n-1} T^{-i}\alpha$ containing $x \in X$. By the Shannon-McMillan-Breiman theorem, if $T$ is a measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$ and $\alpha$ is a partition of $X$ with $H_\mu(\alpha) < \infty$, then $-(1/n) \log \mu(\alpha_n(x))$ converges $\mu$-a.e. and in $L^1(X, \mu)$ as $n \to \infty$. If $T$ is ergodic, then the limit coincides with $h_\mu(T, \alpha)$. Now we can apply these results to piecewise expanding transitive (countable) Markov transformations $T$ of $X \subset \mathbb{R}^d$. More specifically, let $\nu$ be the normalized Lebesgue measure of $X$. It is well known that under certain conditions there exists a unique ergodic invariant probability measure $\mu$ equivalent to $\nu$. Then we can establish Rohlin's entropy formula (Rohlin 1964):

$$h_\mu(T) = \int_X \log |\det DT|\, d\mu$$


under the assumptions that $H_\mu(\alpha) < \infty$ and $\log |\det DT| \in L^1(X, \nu)$. In particular, if $\alpha$ is a finite partition and $\phi = -\log |\det DT|$ is piecewise Hölder continuous, then the entropy formula just implies that $\mu$ is an equilibrium state for the potential $\phi$ in the following sense:

$$h_\mu(T) + \int_X \phi\, d\mu = \sup\left\{ h_m(T) + \int_X \phi\, dm \;\Big|\; m \text{ is a } T\text{-invariant Borel probability measure on } X \right\}$$

where the right-hand side is called the pressure for $\phi$ (Walters 1981).

We now turn our attention to results which relate entropy to Lyapunov exponents in the context of smooth invertible systems. Let $T$ be a diffeomorphism of a compact manifold $M$. We say that $x \in M$ is a regular point of $T$ if there exist numbers $\lambda_1(x) > \lambda_2(x) > \cdots > \lambda_s(x)$ and a decomposition $T_x M = E_1(x) \oplus E_2(x) \oplus \cdots \oplus E_s(x)$ such that

$$\lim_{n\to\pm\infty} \frac{1}{n} \log \|DT^n(x)u\| = \lambda_j(x)$$

for every $0 \ne u \in E_j(x)$ and every $1 \le j \le s$. Let $\Lambda$ be the set of regular points of $T$. Then we define a function

$$\chi(x) = \sum_{\lambda_j(x) > 0} \lambda_j(x) \dim E_j(x)$$


In the case when all Lyapunov exponents at $x$ are negative, we put $\chi(x) = 0$. Then for every $T$-invariant Borel probability measure $\mu$ on $(X, \mathcal{B})$, it holds that $h_\mu(T) \le \int_X \chi\, d\mu$ (Ruelle 1978). Moreover, equality holds whenever $T$ is $C^1$-Hölder and $\mu$ is absolutely continuous with respect to the Lebesgue measure of $X$ (Pesin 1977). Let $T$ be a transitive $C^1$-Hölder Anosov diffeomorphism, and let $E^s$, $E^u$ denote the stable and unstable fiber bundles of $T$. Suppose that $\mu_+$ is the unique $T$-invariant probability measure which satisfies

$$\int_M f\, d\mu_+ = \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x)$$

for every continuous function $f: M \to \mathbb{R}$ and almost every $x \in M$ with respect to the Lebesgue measure. The probability measure $\mu_+$ is the so-called Sinai-Ruelle-Bowen (SRB) measure. Then we have

$$h_{\mu_+}(T) = \int_M \log \left| \det DT(x)|_{E^u_x} \right| d\mu_+(x)$$

On the other hand, we have

$$h_{\mu_+}(T) = -\int_M \log \left| \det DT(x)|_{E^s_x} \right| d\mu_+(x) + \int_M \log |\det DT(x)|\, d\mu_+(x)$$


We also define the anti-SRB measure $\mu_-$ by replacing $T$ by $T^{-1}$. Then the SRB measure $\mu_+$ is absolutely continuous with respect to the Lebesgue measure of $M$ iff $\mu_+$ coincides with the anti-SRB measure $\mu_-$ (Bowen 1975). Hence, the SRB measure is absolutely continuous iff $\int_M \log |\det DT(x)|\, d\mu_+(x) = 0$. This property is sometimes explained as “zero entropy production” and also as “reversibility” in the context of nonequilibrium statistical mechanics (Ruelle 1997).
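
For a concrete feel of the quantities entering the Ruelle inequality and the Pesin formula, the following sketch (an illustration chosen here, not an example from the article) estimates the positive Lyapunov exponent of the hyperbolic toral automorphism induced by the matrix $\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$. This map is Anosov and preserves Lebesgue measure, its derivative is constant, and $\det DT = 1$, so $\int \log|\det DT|\, d\mu = 0$. Its positive exponent is $\log\big((3+\sqrt{5})/2\big)$, which by the Pesin formula is also its entropy with respect to Lebesgue measure. The function name and the number of steps are assumptions.

```python
import math


def top_lyapunov_exponent(matrix, steps=2000):
    """Estimate lim (1/n) log ||A^n u|| for a constant 2x2 derivative matrix A,
    renormalizing the vector at every step to avoid overflow."""
    (a, b), (c, d) = matrix
    u = (1.0, 0.0)                      # generic starting vector
    log_growth = 0.0
    for _ in range(steps):
        v = (a * u[0] + b * u[1], c * u[0] + d * u[1])
        norm = math.hypot(v[0], v[1])
        log_growth += math.log(norm)
        u = (v[0] / norm, v[1] / norm)
    return log_growth / steps


CAT_MAP = ((2, 1), (1, 1))
estimate = top_lyapunov_exponent(CAT_MAP)
exact = math.log((3.0 + math.sqrt(5.0)) / 2.0)
print(estimate, exact)
```

The finite-time estimate differs from the exact value by a term of order $1/n$, coming from the component of the starting vector along the unstable direction.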


See also: Chaos and Attractors; Determinantal Random 
Fields; Dissipative Dynamical Systems of Infinite 
Dimension; Dynamical Systems and Thermodynamics; 
Finitely Correlated States; Fourier Law; Fractal 
Dimensions in Dynamics; Homeomorphisms and 
Diffeomorphisms of the Circle; Hyperbolic Billiards; 
Hyperbolic Dynamical Systems; Intermittency in
Turbulence; Large Deviations in Equilibrium Statistical
Mechanics; Lyapunov Exponents and Strange Attractors; 
Nonequilibrium Statistical Mechanics: Interaction 
Between Theory and Numerical Simulations; 
Nonequilibrium Statistical Mechanics (Stationary): 
Overview; Phase Transitions in Continuous Systems; 
Polygonal Billiards; Regularization for Dynamical Zeta 
Functions; Singularity and Bifurcation Theory; von 
Neumann Algebras: Introduction, Modular Theory, and 
Classification Theory. 
Introduction


In this article, we consider Euclidean field theory as 
a formulation of quantum field theory which lives in 
some Euclidean space, and is expressed in probabil- 
istic terms. Methods arising from Euclidean field 
theory have been introduced in a very successful 
way in the study of concrete models of constructive 
quantum field theory. 

Euclidean field theory was initiated by Schwinger 
(1958) and Nakano (1959), who proposed to study 
the vacuum expectation values of field products 
analytically continued into the Euclidean region 
(Schwinger functions), where the first three (spatial) 
coordinates of a world point are real and the last one 
(time) is purely imaginary (Schwinger points). The 
possibility of introducing Schwinger functions, and
their invariance under the Euclidean group are immedi- 
ate consequences of the now classic formulation of 
quantum field theory in terms of vacuum expectation 
values given by Wightman (Streater and Wightman 
1964). The convenience of dealing with the Euclidean 
group, with its positive-definite scalar product, instead 
of the Lorentz group, is evident, and has been exploited 
by several authors, in different contexts. 

The next step was made by Symanzik (1966), who realized that Schwinger functions for boson fields have a remarkable positivity property, which allows one to introduce Euclidean fields in their own right.
Symanzik also pointed out an analogy between 
Euclidean field theory and classical statistical 
mechanics, at least for some interactions (Symanzik 
1969). 

This analogy was successfully extended, with a 
different interpretation, to all boson interactions by 
Guerra et al. (1975), with the purpose of using 
rigorous results of modern statistical mechanics for 


the study of constructive quantum field theory, 
within the program advocated by Wightman (1967), 
and further pursued by Glimm and Jaffe (see Glimm 
and Jaffe (1981) for an overall presentation). 

The most dramatic advance of Euclidean theory 
was due to Nelson (1973a, b). He was able to isolate 
a crucial property of Euclidean fields (the Markov 
property) and gave a set of conditions for these 
fields, which allow us to derive all properties of 
relativistic quantum fields satisfying Wightman 
axioms. The Nelson theory is very deep and rich in 
new ideas. Even after so many years since the basic 
papers were published, we lack a complete under- 
standing of the radical departure from the conven- 
tional theory afforded by Nelson’s ideas, especially 
about their possible further developments. 

By using the Nelson scheme, in particular a very 
peculiar symmetry property, it was very easy to prove 
(Guerra 1972) the convergence of the ground-state 
energy density, and the van Hove phenomenon in the 
infinite-volume limit for two-dimensional boson 
theories. A subsequent analysis (Guerra et al. 1972) 
gave other properties of the infinite-volume limit of 
the theory, and allowed a remarkable simplification 
in the proof of a very important regularity property 
for fields, previously established by Glimm and Jaffe. 

Since then, all work on constructive quantum field 
theory has exploited in different ways ideas coming 
from Euclidean field theory. Moreover, a very 
important reconstruction theorem has been estab- 
lished by Osterwalder and Schrader (1973), allowing 
a reconstruction of relativistic quantum fields from 
the Euclidean Schwinger functions, and avoiding the 
previously mentioned Nelson reconstruction theorem, 
which is technically more difficult to handle. 

This article is intended to be an introduction to the 
general structure of Euclidean quantum field theory, 
and to some of the applications to constructive 
quantum field theory. Our purpose is to show that, 
50 years after its introduction, the Euclidean theory is 
still interesting, both from the point of view of 
technical applications and physical interpretation. 

The article is organized as follows. In the next 
section, by considering simple systems made of a 
single spinless relativistic particle, we introduce the 
relevant structures in both Euclidean and Minkowski 
worlds. In particular, a kind of (pre)Markov property 
is introduced already at the one-particle level. 

Next we present a description of the procedure of 
second quantization on the one-particle structure. 
The free Markov field is introduced, and its crucial 
Markov property explained. Following Nelson, we 
use probabilistic concepts and methods, whose 
relevance for constructive quantum field theory 
became increasingly apparent. The very structure of classical statistical mechanics for Euclidean fields is firmly based on these probabilistic methods. This is followed by an introduction of interaction, and we show the connection between the Markov theory and the Hamiltonian theory, for two-dimensional space-cutoff interacting scalar fields. In particular, we present the Feynman-Kac-Nelson formula that gives an explicit expression of the semigroup generated by the space-cutoff Hamiltonian in Fock space. We also deal with some applications to constructive quantum field theory.
This is followed by a short discussion about the 
physical interpretation of the theory. In particular, 
we discuss the Osterwalder—Schrader reconstruction 
theorem on Euclidean Schwinger functions, and the 
Nelson reconstruction theorem on Euclidean fields. 
For the sake of completeness, we sketch the main 
ideas of a proposal, advanced in Guerra and 
Ruggiero (1973), according to which the Euclidean 
field theory can be interpreted as a stochastic field 
theory in the physical Minkowski spacetime. 

Our treatment will be as simple as possible, by relying 
on the basic structural properties, and by describing 
methods of presumably very long lasting power. The 
emphasis given to probabilistic methods, and to the 
statistical mechanics analogy, is a result of the historical 
development. Our opinion is that not all possibilities 
of Euclidean field theory have been fully exploited 
yet, both from technical and physical points of view. 


One-Particle Systems 


A system made of only one relativistic scalar particle, of mass $m > 0$, has a quantum state space represented by the positive-frequency solutions of the Klein-Gordon equation. In momentum space, with points $p_\mu$, $\mu = 0, 1, 2, 3$, let us introduce the upper mass hyperboloid, characterized by the constraints $p^2 \equiv p_0^2 - \sum_{i=1}^{3} p_i^2 = m^2$, $p_0 \ge m$, and the relativistic invariant measure on it, formally given by $d\mu(p) = \theta(p_0)\,\delta(p^2 - m^2)\, dp$, where $\theta$ is the step function, $\theta(x) = 1$ if $x \ge 0$ and $\theta(x) = 0$ otherwise, and $dp$ is the four-dimensional Lebesgue measure. The Hilbert space of quantum states $F$ is given by the square-integrable functions on the mass hyperboloid equipped with the invariant measure $d\mu(p)$. Since in some reference frame the mass hyperboloid is uniquely characterized by the space values of the momentum $\mathbf{p}$, with the energy given by $p_0 \equiv \omega(\mathbf{p}) = \sqrt{\mathbf{p}^2 + m^2}$, the Hilbert space $F$ of the states is, in fact, made of those complex-valued tempered distributions $f$ in the configuration space $\mathbb{R}^3$ whose Fourier transforms $\tilde{f}(\mathbf{p})$ are square-integrable functions in momentum space with respect to the image of the relativistic invariant measure, $d\mathbf{p}/2\omega(\mathbf{p})$, where $d\mathbf{p}$ is the Lebesgue measure in momentum space. The scalar product on $F$ is defined by

$$\langle f, g \rangle_F = (2\pi)^3 \int \overline{\tilde{f}(\mathbf{p})}\, \tilde{g}(\mathbf{p})\, \frac{d\mathbf{p}}{2\omega(\mathbf{p})}$$

where we have normalized the Fourier transform in such a way that

$$f(\mathbf{x}) = \int \exp(i\mathbf{p}\cdot\mathbf{x})\, \tilde{f}(\mathbf{p})\, d\mathbf{p}$$
$$\tilde{f}(\mathbf{p}) = (2\pi)^{-3} \int \exp(-i\mathbf{p}\cdot\mathbf{x})\, f(\mathbf{x})\, d\mathbf{x}$$
$$\int \exp(i\mathbf{p}\cdot\mathbf{x})\, d\mathbf{p} = (2\pi)^3\, \delta(\mathbf{x})$$


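The scalar product on $F$ can also be expressed in the form

$$\langle f, g \rangle_F = \iint \overline{f(\mathbf{x}')}\, W(\mathbf{x}' - \mathbf{x})\, g(\mathbf{x})\, d\mathbf{x}'\, d\mathbf{x}$$

where we have introduced the two-point Wightman function at fixed time, defined by

$$W(\mathbf{x}' - \mathbf{x}) = (2\pi)^{-3} \int \exp(i\mathbf{p}\cdot(\mathbf{x}' - \mathbf{x}))\, \frac{d\mathbf{p}}{2\omega(\mathbf{p})}$$

A unitary irreducible representation of the Poincaré group can be defined on $F$ in the obvious way. In particular, the generators of space translations are given by multiplication by the components of $\mathbf{p}$ in momentum space, and the generator of time translations (the energy of the particle) is given by $\omega(\mathbf{p})$.

For the scalar product of time-evolved wave functions, we can write

$$\langle \exp(-it\omega) f, \exp(-it'\omega) g \rangle_F = \iint \overline{f(\mathbf{x}')}\, W(t' - t, \mathbf{x}' - \mathbf{x})\, g(\mathbf{x})\, d\mathbf{x}'\, d\mathbf{x}$$

where we have introduced the two-point Wightman function, defined by

$$W(t' - t, \mathbf{x}' - \mathbf{x}) = (2\pi)^{-3} \int \exp(-i(t' - t)\omega(\mathbf{p}))\, \exp(i\mathbf{p}\cdot(\mathbf{x}' - \mathbf{x}))\, \frac{d\mathbf{p}}{2\omega(\mathbf{p})}$$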

To the physical single-particle system living in Minkowski spacetime, we associate a kind of mathematical image, living in Euclidean space, from which all properties of the physical system can be easily derived. We start from the two-point Schwinger function

$$S(x) = \frac{1}{(2\pi)^4} \int \frac{\exp(ip\cdot x)}{p^2 + m^2}\, dp$$

which is the analytic continuation of the previously given two-point Wightman function into the Schwinger points. Here $x, p \in \mathbb{R}^4$, $p\cdot x = \sum_{i=1}^{4} x_i p_i$, and $dp$ and $dx$ are the Lebesgue measures in the $\mathbb{R}^4$ momentum and configuration spaces, respectively. The function $S(x)$ is positive and analytic for $x \ne 0$, decreases as $\exp(-m\|x\|)$ as $x \to \infty$, and satisfies the equation

$$(-\Delta + m^2) S(x) = \delta(x)$$

where $\Delta = \sum_{i=1}^{4} \partial^2/\partial x_i^2$ is the Laplacian in four dimensions.

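The mathematical image we are looking for is described by the Hilbert space $N$ of those tempered distributions in four-dimensional configuration space $\mathbb{R}^4$ whose Fourier transforms are square integrable with respect to the measure $dp/(p^2 + m^2)$. The scalar product on $N$ is defined by

$$\langle f, g \rangle_N = (2\pi)^4 \int \overline{\tilde{f}(p)}\, \tilde{g}(p)\, \frac{dp}{p^2 + m^2}$$

Four-dimensional Fourier transforms are normalized as follows:

$$f(x) = \int \exp(ip\cdot x)\, \tilde{f}(p)\, dp$$
$$\tilde{f}(p) = (2\pi)^{-4} \int \exp(-ip\cdot x)\, f(x)\, dx$$
$$\int \exp(ip\cdot x)\, dp = (2\pi)^4\, \delta(x)$$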


We also write

$$\langle f, g \rangle_N = \iint \overline{f(x)}\, S(x - y)\, g(y)\, dx\, dy = \langle f, (-\Delta + m^2)^{-1} g \rangle$$

where $\langle\,,\rangle$ is the ordinary Lebesgue product defined on Fourier transforms and, in momentum space, $(-\Delta + m^2)^{-1}$ amounts to a multiplication by $(p^2 + m^2)^{-1}$. The Schwinger function $S(x - y)$ is formally the kernel of the operator $(-\Delta + m^2)^{-1}$.

The Hilbert space $N$ is the carrier space of a unitary (nonirreducible) representation of the four-dimensional Euclidean group $E(4)$. In fact, let $(a, R)$ be an element of $E(4)$,

$$(a, R): \mathbb{R}^4 \to \mathbb{R}^4, \qquad x \mapsto Rx + a$$

where $a \in \mathbb{R}^4$, and $R$ is an orthogonal matrix, $RR^T = R^T R = 1_4$. Then the transformation $u(a, R)$ defined by

$$u(a, R): N \to N, \qquad f(x) \mapsto (u(a, R)f)(x) = f(R^{-1}(x - a))$$

provides the representation. In particular, we consider the reflection $r_0$ with respect to the hyperplane $x_4 = 0$, and the translations $u(t)$ in the $x_4$-direction. Then we have $r_0 u(t) r_0 = u(-t)$, and analogously for other hyperplanes.

Now we introduce a local structure on $N$ by considering, for any closed region $A$ of $\mathbb{R}^4$, the subspace $N_A$ of $N$ made by distributions in $N$ with support on $A$. We call $e_A$ the orthogonal projection on $N_A$. It is obvious that if $A \subset B$ then $N_A \subset N_B$ and $e_A e_B = e_B e_A = e_A$. A kind of (pre)Markov property for one-particle systems is introduced as follows. Consider a closed three-dimensional piecewise smooth manifold $\sigma$, which divides $\mathbb{R}^4$ in two closed regions $A$ and $B$, having $\sigma$ in common. Therefore, $\sigma \subset A$, $\sigma \subset B$, $A \cap B = \sigma$, $A \cup B = \mathbb{R}^4$. Let $N_A, N_B, N_\sigma$, and $e_A, e_B, e_\sigma$ be the associated subspaces and projections, respectively. Then $N_\sigma \subset N_A$, $N_\sigma \subset N_B$, and $e_\sigma e_A = e_A e_\sigma = e_\sigma$, $e_\sigma e_B = e_B e_\sigma = e_\sigma$. It is very simple to prove the following:


Theorem 1 Let $e_A, e_B, e_\sigma$ be defined as above; then $e_A e_B = e_B e_A = e_\sigma$.


Clearly, it is enough to show that for any $f \in N$ we have $e_A e_B f \in N_\sigma$. In that case, $e_\sigma e_A e_B f = e_A e_B f$, from which the theorem easily follows, since $e_\sigma e_A e_B = e_\sigma e_B = e_\sigma$. Since $e_A e_B f$ has support on $A$, we must show that for any $C_0^\infty$ function $g$ with support on $A^\circ = A \setminus \sigma$, we have $\langle g, e_A e_B f \rangle = 0$. Then $e_A e_B f$ has support on $\sigma$, and the proof is complete. Now we have

$$\langle g, e_A e_B f \rangle = \langle (-\Delta + m^2) g, e_A e_B f \rangle_N = \langle e_A (-\Delta + m^2) g, e_B f \rangle_N = \langle (-\Delta + m^2) g, e_B f \rangle_N = \langle g, e_B f \rangle = 0$$

where we have used the definition of $\langle\,,\rangle_N$ in terms of $\langle\,,\rangle$, the fact that $e_A(-\Delta + m^2)g = (-\Delta + m^2)g$, since $(-\Delta + m^2)g$ has support on $A^\circ$, and the fact that $e_B f$ has support on $B$. This ends the proof of the (pre)Markov property for one-particle systems.

A very important role in the theory is played by subspaces of $N$ associated to hyperplanes in $\mathbb{R}^4$. To fix ideas, consider the hyperplane $x_4 = 0$ and the associated subspace $N_0$. A tempered distribution in $N$ with support on $x_4 = 0$ has necessarily the form $(f \otimes \delta_0)(x) \equiv f(\mathbf{x})\delta(x_4)$, with $f \in F$. By using the basic magic formula, for $x \ge 0$ and $M > 0$,

$$\int_{-\infty}^{+\infty} \frac{\exp(ipx)}{p^2 + M^2}\, dp = \frac{\pi}{M} \exp(-Mx)$$

it is immediate to verify that $\|f \otimes \delta_0\|_N = \|f\|_F$. Therefore, we have an isomorphic and isometric identification of the two Hilbert spaces $F$ and $N_0$.
Obviously, similar considerations hold for any hyperplane. In particular, we consider the hyperplanes $x_4 = t$ and the associated subspaces $N_t$. Let us introduce injection operators $j_t$ defined by

$$j_t: F \to N, \qquad f \mapsto f \otimes \delta_t$$

where $f$ is a generic element of $F$, with values $f(\mathbf{x})$, and $(f \otimes \delta_t)(x) \equiv f(\mathbf{x})\delta(x_4 - t)$. It is immediate to verify the following properties for $j_t$ and its adjoint $j_t^*$: the range of $j_t$ is $N_t$; moreover, $j_t$ is an isometry, so that $j_t^* j_t = 1_F$, $j_t j_t^* = e_t$, where $1_F$ is the identity on $F$, and $e_t$ is the projection on $N_t$. Moreover, $e_t j_t = j_t$ and $j_t^* = j_t^* e_t$.
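
The basic magic formula $\int_{-\infty}^{+\infty} \exp(ipx)/(p^2 + M^2)\, dp = (\pi/M)\exp(-Mx)$, which underlies the isometry between $F$ and $N_0$, can be checked numerically. The following is only a sketch under stated assumptions: the integral is truncated at a finite cutoff and evaluated with Simpson's rule, and the function name, cutoff and step count are illustrative choices.

```python
import math


def magic_integral(x, M, cutoff=200.0, steps=40000):
    """Simpson estimate of the integral of exp(ipx)/(p^2 + M^2) over [-cutoff, cutoff].

    The imaginary part vanishes by symmetry, so we integrate
    2 * cos(px) / (p^2 + M^2) over [0, cutoff]; `steps` must be even.
    """
    h = cutoff / steps

    def integrand(p):
        return math.cos(p * x) / (p * p + M * M)

    total = integrand(0.0) + integrand(cutoff)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return 2.0 * total * h / 3.0


x, M = 1.0, 1.5                       # hypothetical test values
print(magic_integral(x, M), math.pi / M * math.exp(-M * x))
```

The neglected tail is of order $1/(x\,\mathrm{cutoff}^2)$ for $x > 0$, so the agreement is limited by the cutoff rather than by the quadrature.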

If we introduce translations $u(t)$ along the $x_4$-direction and the reflection $r_0$ with respect to $x_4 = 0$, then we also have the covariance property $u(t) j_s = j_{t+s}$, and the reflexivity property $r_0 j_0 = j_0$, $j_0^* r_0 = j_0^*$. The reflexivity property is very important. It tells us that $r_0$ leaves $N_0$ pointwise invariant, and it is an immediate consequence of the fact that $\delta(x_4) = \delta(-x_4)$.

Therefore, if we start from $N$ we can obtain $F$ by taking the projection $j_\pi^*$ with respect to some hyperplane $\pi$, in particular $x_4 = 0$. It is also obvious that we can induce on $F$ a representation of $E(3)$ by taking those elements of $E(4)$ that leave $\pi$ invariant.

Let us now see how we can define the Hamiltonian 
on F starting from the properties of N. Since we are 
considering the simple case of the one-particle system, 
we could just perform the following construction 
explicitly by hand, through a simple application of 
the basic magic formula given earlier. But we prefer 
to follow a route that emphasizes Markov property 
and can be immediately generalized to more 
complicated cases. 

Let us introduce the operator $p(t)$ on $F$ defined by the dilation $p(t) = j_0^* j_t = j_0^* u(t) j_0$, $t \ge 0$. Then we prove the following:


Theorem 2 The operator $p(t)$ is bounded and self-adjoint. The family $\{p(t)\}$, for $t \ge 0$, is a norm-continuous semigroup.


Proof Boundedness and continuity are obvious. Self-adjointness is a consequence of reflexivity. In fact,

$$p^*(t) = j_0^* u(-t) j_0 = j_0^* r_0 u(t) r_0 j_0 = j_0^* u(t) j_0 = p(t)$$

The semigroup property is a consequence of the Markov property. In fact, let us introduce $N_+, N_0, N_-$ as subspaces of $N$ made by distributions with support in the regions $x_4 \ge 0$, $x_4 = 0$, $x_4 \le 0$, respectively, and call $e_+, e_0, e_-$ the respective projections. By the Markov property, we have $e_0 = e_- e_+$. Now write, for $s, t \ge 0$,

$$p(t)p(s) = j_0^* u(t) j_0 j_0^* u(s) j_0 = j_0^* u(t) e_0 u(s) j_0$$

If $e_0$ could be cancelled, then the semigroup property would follow from the group property of the translations $u(t)u(s) = u(t+s)$ (a miracle of the dilations!). For this, consider the matrix element

$$\langle f, p(t)p(s) g \rangle_F = \langle u(-t) j_0 f, e_0 u(s) j_0 g \rangle_N$$

recall $e_0 = e_- e_+$, and use $u(s) j_0 g \in N_+$ and $u(-t) j_0 f \in N_-$.

Let us call $h$ the generator of $p(t)$, so that $p(t) = \exp(-th)$, for $t \ge 0$. By definition, $h$ is the Hamiltonian of the physical system. A simple explicit calculation shows that $h$ is just the energy $\omega$ introduced earlier. Starting from the representation of the Euclidean group $E(3)$ already given and from the Hamiltonian, we immediately get a representation of the full Poincaré group on $F$. Therefore, all physical properties of the one-particle system have been reconstructed from its Euclidean image on the Hilbert space $N$.

As a last remark of this section, let us note that we can consider the real Hilbert spaces $N_r$ and $F_r$, made of real elements (in configuration space) in $N$ and $F$. The operators $u(a, R)$, $u(t)$, $r_0$, $j_\pi$, $j_\pi^*$, $e_A$ are all reality preserving, that is, they map real spaces into real spaces.

This completes our discussion about the one-particle system. For more details we refer to Guerra et al. (1975) and Simon (1974). We have introduced the Euclidean image, discussed its main properties, and shown how we can derive all properties of the physical system from its Euclidean image. In the next sections, we will show how this kind of construction carries through the second-quantized case and the interacting case.


Second Quantization and Free Fields 


We begin this section with a short review about the 
procedure of second quantization based on prob- 
abilistic methods, by following mainly Nelson 
(1973b); see also Guerra et al. (1975) and Simon 
(1974). Probabilistic methods are particularly useful 
in the framework of the Euclidean theory. 

Let $H$ be a real Hilbert space with symmetric scalar product $\langle\,,\rangle$. Let $\phi(u)$ be the elements of a family of centered Gaussian random variables indexed by $u \in H$, uniquely defined by the expectation values $E(\phi(u)) = 0$, $E(\phi(u)\phi(v)) = \langle u, v \rangle$. Since $\phi$ is Gaussian, we also have

$$E(\exp(\lambda \phi(u))) = \exp\left(\tfrac{1}{2} \lambda^2 \langle u, u \rangle\right)$$

and

$$E(\phi(u_1)\phi(u_2) \cdots \phi(u_n)) = [u_1 u_2 \cdots u_n]$$

Here $[\ldots]$ is the Hafnian of elements $[u_i u_j] = \langle u_i, u_j \rangle$, defined to be zero for odd $n$, and for even $n$ given by the recursive formula

$$[u_1 u_2 \cdots u_n] = \sum_{i=2}^{n} [u_1 u_i]\, [u_1 u_2 \cdots u_n]'$$

where in $[\ldots]'$ the terms $u_1$ and $u_i$ are suppressed.


Hafnians, from the Latin name of Copenhagen, the 
first seat of the theoretical group of CERN, were 
introduced in quantum field theory by Caianiello 
(1973), as a useful tool when dealing with Bose 
Statistics. 
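
The recursive definition of the Hafnian translates directly into code. The following sketch (function name and test matrices are illustrative assumptions) takes the Gram matrix of scalar products $\langle u_i, u_j \rangle$ and expands along the first element, suppressing the paired terms at each step; the empty Hafnian is taken to be 1, corresponding to $E(1) = 1$.

```python
def hafnian(gram, indices=None):
    """Hafnian [u_1 ... u_n] of the symmetric matrix gram[i][j] = <u_i, u_j>,
    computed with the recursion that pairs the first element with each other one."""
    if indices is None:
        indices = tuple(range(len(gram)))
    if len(indices) == 0:
        return 1.0                      # empty product
    if len(indices) % 2 == 1:
        return 0.0                      # odd moments of a centered Gaussian vanish
    first, rest = indices[0], indices[1:]
    total = 0.0
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]   # suppress `first` and `partner`
        total += gram[first][partner] * hafnian(gram, remaining)
    return total


# E(phi(u)^4) for <u, u> = 1: all entries of the Gram matrix equal 1.
ones4 = [[1.0] * 4 for _ in range(4)]
print(hafnian(ones4))
```

For $n$ copies of the same unit vector the recursion counts the pairings of $n$ objects, reproducing the even moments $(n-1)!!$ of a standard Gaussian variable.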

Let $(Q, \Sigma, \mu)$ be the underlying probability space where $\phi$ are defined as random variables. Here $Q$ is a compact space, $\Sigma$ a $\sigma$-algebra of subsets of $Q$, and $\mu$ a regular, countably additive probability measure on $\Sigma$, normalized to $\mu(Q) = \int_Q d\mu = 1$.

The fields $\phi(u)$ are represented by measurable functions on $Q$. The probability space is uniquely defined, but for trivial isomorphisms, if we assume that $\Sigma$ is the smallest $\sigma$-algebra with respect to which all fields $\phi(u)$, with $u \in H$, are measurable. Since $\phi(u)$ are Gaussian, they are represented by $L^p(Q, \Sigma, \mu)$ functions, for any $p$ with $1 \le p < \infty$, and the expectations will be given by

$$E(\phi(u_1)\phi(u_2) \cdots \phi(u_n)) = \int_Q \phi(u_1)\phi(u_2) \cdots \phi(u_n)\, d\mu$$

where, by a mild abuse of notation, $\phi(u_i)$ on the right-hand side denote the $Q$ space functions which represent the random variables $\phi(u_i)$. We call the complex Hilbert space $\mathcal{F} = \Gamma(H) = L^2(Q, \Sigma, \mu)$ the Fock space constructed on $H$, and the function $\Omega_0 \equiv 1$ on $Q$ the Fock vacuum.

In order to introduce the concept of second quantization of operators, we must introduce subspaces of $\mathcal{F}$ with a “fixed number of particles.” Call $\mathcal{F}_{(0)} = \{\lambda \Omega_0\}$, where $\lambda$ is any complex number. Define $\mathcal{F}_{(\le n)}$ as the subspace of $\mathcal{F}$ generated by complex linear combinations of monomials of the type $\phi(u_1) \cdots \phi(u_j)$, with $u_i \in H$, and $j \le n$. Then $\mathcal{F}_{(\le n-1)}$ is a subspace of $\mathcal{F}_{(\le n)}$. We define $\mathcal{F}_{(n)}$, the $n$-particle subspace, as the orthogonal complement of $\mathcal{F}_{(\le n-1)}$ in $\mathcal{F}_{(\le n)}$, so that

$$\mathcal{F}_{(\le n)} = \mathcal{F}_{(n)} \oplus \mathcal{F}_{(\le n-1)}$$

By construction, the $\mathcal{F}_{(n)}$ are orthogonal, and it is not difficult to verify that

$$\mathcal{F} = \bigoplus_{n=0}^{\infty} \mathcal{F}_{(n)}$$

Let us now introduce the Wick normal products by the definition

$$:\!\phi(u_1)\phi(u_2) \cdots \phi(u_n)\!: \;= E_{(n)}\, \phi(u_1)\phi(u_2) \cdots \phi(u_n)$$

where $E_{(n)}$ is the projection on $\mathcal{F}_{(n)}$. It is not difficult to prove the usual Wick theorem (see, e.g., Guerra et al. (1975)), and its inversion given by Caianiello (1973).

It is interesting to remark that, in the framework 
of the second quantization performed with prob- 
abilistic methods, it is not necessary to introduce 
creation and destruction operators as in the usual 
treatment. However, the two procedures are com- 
pletely equivalent, as shown, for example, in Simon 
(1974). 

Given an operator $A$ from the real Hilbert space $H_1$ to the real Hilbert space $H_2$, we define its second-quantized operator $\Gamma(A)$ through the following definitions:

$$\Gamma(A)\Omega_{01} = \Omega_{02}$$
$$\Gamma(A) :\!\phi_1(u_1)\phi_1(u_2) \cdots \phi_1(u_n)\!: \;=\; :\!\phi_2(Au_1)\phi_2(Au_2) \cdots \phi_2(Au_n)\!:$$

where we have introduced the probability spaces $Q_1$ and $Q_2$, their vacua $\Omega_{01}$ and $\Omega_{02}$, and the random variables $\phi_1$ and $\phi_2$, associated to $H_1$ and $H_2$, respectively. The following remarkable theorem by Nelson (1973b) gives a full characterization of $\Gamma(A)$, very useful in the applications.


Theorem 3 Let $A$ be a contraction from the real Hilbert space $H_1$ to the real Hilbert space $H_2$. Then $\Gamma(A)$ is an operator from $L^1_1$ to $L^1_2$ which is positivity preserving, $\Gamma(A)u \ge 0$ if $u \ge 0$, and such that $E(\Gamma(A)u) = E(u)$. Moreover, $\Gamma(A)$ is a contraction from $L^p_1$ to $L^p_2$ for any $p$, $1 \le p < \infty$. Finally, $\Gamma(A)$ is also a contraction from $L^p_1$ to $L^q_2$, with $q \ge p$, if $\|A\|^2 \le (p - 1)/(q - 1)$.


We have indicated with $L^p_1, L^p_2$ the $L^p$ spaces associated to $H_1$ and $H_2$, respectively. This is the celebrated best hypercontractive estimate given by Nelson. For the proof, we refer to the original paper of Nelson (1973b); see also Simon (1974).

This completes our short review on the theory of 
second quantization based on probabilistic methods. 

The usual time-zero quantum field $\phi(u)$, $u \in F_r$, in the Fock representation, can be obtained through second quantization starting from $F_r$. We call $(\tilde{Q}, \tilde{\Sigma}, \tilde{\mu})$ the underlying probability space, and $\mathcal{F} = \Gamma(F_r) = L^2(\tilde{Q}, \tilde{\Sigma}, \tilde{\mu})$ the Hilbert Fock space of the free physical particles.

Now we introduce the free Markov field $\phi(f)$, $f \in N_r$, by taking $N_r$ as the starting point. We call $(Q, \Sigma, \mu)$ the associated probability space. We introduce the Hilbert space $\mathcal{N} = \Gamma(N_r) = L^2(Q, \Sigma, \mu)$, and the operators $U(a, R) = \Gamma(u(a, R))$, $R_0 = \Gamma(r_0)$, $U(t) = \Gamma(u(t))$, $E_A = \Gamma(e_A)$, and so on, for which the previous Nelson theorem holds (take $H_1 = H_2 = N_r$).

Since in general $\Gamma(AB) = \Gamma(A)\Gamma(B)$, we have immediately the following expression of the Markov property: $E_\sigma = E_A E_B$, where the closed regions $A, B, \sigma$ of the Euclidean space have the same properties as explained earlier in the proof of the (pre)Markov property for one-particle systems.

It is obvious that $E_A$ can also be understood as conditional expectation with respect to the sub-$\sigma$-algebra $\Sigma_A$ generated by the fields $\phi(f)$ with $f \in N_r$ and the support of $f$ on $A$.

The relations, previously pointed out, between the subspaces $N_t$ and $F$ are also valid for their real parts $N_r$ and $F_r$. Therefore, they carry through the second quantization procedure. We introduce $J_t = \Gamma(j_t)$ and $J_t^* = \Gamma(j_t^*)$; then the following properties hold. $J_t$ is an isometric injection of $L^p(\tilde{Q}, \tilde{\Sigma}, \tilde{\mu})$ into $L^p(Q, \Sigma, \mu)$; the range of $J_t$ as an operator $L^2 \to L^2$ is obviously $\mathcal{N}_t = \Gamma(N_t)$; moreover, $J_t J_t^* = E_t$. The free Hamiltonian $H_0$ is given for $t \ge 0$ by

$$J_0^* J_t = \exp(-tH_0) = \Gamma(\exp(-t\omega))$$

Moreover, we have the covariance property $U(t) J_0 = J_t$, and the reflexivity $R_0 J_0 = J_0$, $J_0^* R_0 = J_0^*$.

These relations allow a very simple expression for the matrix elements of the Hamiltonian semigroup in terms of Markov quantities. In fact, for $u, v \in \mathcal{F}$ we have

$$\langle u, \exp(-tH_0) v \rangle = \int_Q \overline{(J_t u)}\, J_0 v\, d\mu$$


In the next section, we will generalize this 
representation to the interacting case. 

Finally, let us derive the hypercontractive property 
of the free Hamiltonian semigroup. 

Since $\|\exp(-t\omega)\| \le \exp(-tm)$, where $m$ is the mass of the particle, we have immediately, by a simple application of the Nelson theorem,

$$\|\exp(-tH_0)\|_{p,q} \le 1$$

provided $q - 1 \le (p - 1)\exp(2tm)$, where $\|\cdot\|_{p,q}$ denotes the norm of an operator from $L^p$ to $L^q$ spaces.
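As a small illustration of the condition $q - 1 \le (p - 1)\exp(2tm)$, the following Python sketch computes the shortest time after which the free semigroup is guaranteed to be a contraction from $L^p$ to $L^q$. The helper name and the sample values are illustrative assumptions and do not come from the text.

```python
import math

def min_contraction_time(p: float, q: float, m: float) -> float:
    """Smallest t >= 0 with q - 1 <= (p - 1) * exp(2 * t * m).

    Hypothetical helper: p and q are Lebesgue exponents with 1 < p <= q,
    and m > 0 plays the role of the particle mass.
    """
    if not (1.0 < p <= q):
        raise ValueError("need 1 < p <= q")
    if m <= 0.0:
        raise ValueError("need m > 0")
    return math.log((q - 1.0) / (p - 1.0)) / (2.0 * m)

# Example: contraction from L^2 to L^4 for unit mass needs t >= ln(3) / 2.
t_star = min_contraction_time(p=2.0, q=4.0, m=1.0)
```

For $p = q$ the function returns zero, matching the fact that the semigroup is a contraction on every $L^p$ for all $t \ge 0$.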


Interacting Fields 


The discussion of the previous sections was limited to free fields both in Minkowski and Euclidean spaces. Now we must introduce interaction in order to get nontrivial theories.

First, as a general motivation, we will proceed quite formally and then we will resort to precise statements.

Let us recall that in standard quantum field theory, for scalar self-coupled fields, the time-ordered products of quantum fields in Minkowski spacetime can be expressed formally through the formula

$$\frac{\langle T\big(\varphi(x_1) \cdots \varphi(x_n) \exp(i \int \mathcal{L}\,dx)\big) \rangle}{\langle T \exp(i \int \mathcal{L}\,dx) \rangle}$$

where $T$ denotes time ordering, $\varphi$ are free fields in Minkowski spacetime, $\mathcal{L}$ is the interaction Lagrangian, and $\langle \ldots \rangle$ are vacuum averages. As is well known, this expression can be put, for example, at the basis of perturbative expansions, giving rise to terms expressed through Feynman graphs. The appropriately chosen normalization provides automatic cancelation of the vacuum to vacuum graphs.

Now we can introduce a formal analytic continuation to the Schwinger points, as previously done for the one-particle system, and obtain the following expression for the analytic continuation of the field time-ordered products, now called Schwinger functions,

$$S(x_1, \ldots, x_n) = \frac{\langle \phi(x_1) \cdots \phi(x_n) \exp U \rangle}{\langle \exp U \rangle}$$


Here $x_1, \ldots, x_n$ denote points in Euclidean space, and $\phi$ are the Euclidean fields introduced earlier. The chronological time ordering disappears, because the fields $\phi$ are commutative, and there is no distinguished “time” direction in Euclidean space. Here the symbol $\langle \ldots \rangle$ denotes the expectation values represented by $\int \ldots d\mu$, as explained earlier, and $U$ is the Euclidean “action” of the system formally given by the integral on Euclidean space

$$U = -\int P(\phi(x))\,dx$$


if the field self-interaction is produced by the 
polynomial P. 

Therefore, these formal considerations suggest that the passage from the free Euclidean theory to the fully interacting one is obtained through a change of the free probability measure $d\mu$ to the interacting measure

$$\exp U \, d\mu \Big/ \int_Q \exp U \, d\mu$$

The analogy with classical statistical mechanics is evident. The expression $\exp U$ acts as Boltzmannfaktor, and $Z = \int_Q \exp U \, d\mu$ is the partition function.
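The change of measure can be illustrated on a finite sample space. The sketch below is a toy stand-in, not the field-theoretic construction: it reweights a discrete probability measure by the Boltzmann factor and normalizes by the partition function. All names and the three-point example are assumptions made for illustration.

```python
import math

def interacting_measure(mu, U):
    """Return (Z, nu) with Z = sum_i exp(U_i) mu_i and nu_i = exp(U_i) mu_i / Z.

    mu: probabilities on a finite sample space (toy stand-in for d mu).
    U:  values of the "action" at each sample point.
    """
    weights = [math.exp(u) * m for u, m in zip(U, mu)]
    Z = sum(weights)
    return Z, [w / Z for w in weights]

def expectation(values, measure):
    """Expectation of a function given by its values on the sample points."""
    return sum(v * w for v, w in zip(values, measure))

# Toy example: a "field" with three values under a symmetric free measure,
# and U = -phi**4 in the role of minus a quartic self-interaction.
phi = [-1.0, 0.0, 1.0]
mu = [0.25, 0.5, 0.25]
U = [-(x ** 4) for x in phi]

Z, nu = interacting_measure(mu, U)
second_moment_free = expectation([x * x for x in phi], mu)
second_moment_int = expectation([x * x for x in phi], nu)
```

In this toy case the interaction suppresses large field values, so the second moment under the reweighted measure is smaller than under the free one.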

Our task will be to make these statements precise 
from a mathematical point of view. We will be 


obliged to introduce cutoffs, and then be involved in 
their careful removal. 

For the sake of convenience, we make the 
substantial simplification of considering only two- 
dimensional theories (one space, one time dimension 
in the Minkowski region) for which the well-known 
ultraviolet problem of quantum field theory gives no 
trouble. There is no difficulty in translating the 
contents of the previous sections to the two- 
dimensional case. 

Let $P$ be a real polynomial, bounded below and normalized to $P(0) = 0$. We introduce approximations $h$ to the Dirac $\delta$ function at the origin of the two-dimensional Euclidean space $\mathbb{R}^2$, with $h \in N_r$. Let $h_x$ be the translate of $h$ by $x$, with $x \in \mathbb{R}^2$. The introduction of $h$, equivalent to some ultraviolet cutoff, is necessary, because local fields, of the formal type $\phi(x)$, have no rigorous meaning, and some smearing is necessary.

For some compact region $\Lambda$ in $\mathbb{R}^2$, acting as space cutoff (infrared cutoff), introduce the $Q$ space function

$$U_\Lambda^{(h)} = -\int_\Lambda {:}P(\phi(h_x)){:}\,dx$$

where $dx$ is the Lebesgue measure in $\mathbb{R}^2$. It is immediate to verify that $U_\Lambda^{(h)}$ is well defined, bounded above and belongs to $L^p(Q, \Sigma, \mu)$ for any $p$, $1 \le p < \infty$. This is the infrared and ultraviolet cutoff action. Notice the presence of the Wick normal products in its definition. They provide a kind of automatic introduction of counterterms, in the framework of renormalization theory.

The following theorem allows us to remove the 
ultraviolet cutoff. 


Theorem 4 Let h — ô, in the sense that the Fourier 
transforms h are uniformly bounded and converge 
pointwise in momentum space to the Fourier trans- 
form of the 6-function given by (2r)®. Then UW) is 
$L^p$-convergent for any $p$, $1 \le p < \infty$, as $h \to \delta$. Call $U_\Lambda$ the $L^p$-limit; then $U_\Lambda, \exp U_\Lambda \in L^p(Q, \Sigma, \mu)$, for $1 \le p < \infty$.


The proof uses standard methods of probability theory, and originates from the pioneering work of Nelson (1966). It can be found, for example, in Guerra et al. (1975) and Simon (1974).

Since $U_\Lambda$ is defined with normal products, and the interaction polynomial $P$ is normalized to $P(0) = 0$, an elementary application of Jensen's inequality gives

$$\int_Q \exp U_\Lambda \, d\mu \ge \exp \int_Q U_\Lambda \, d\mu = 1$$
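The final equality rests on the fact that Wick-ordered monomials of degree at least one have vanishing expectation. As a sketch, assume (in notation introduced here for illustration) that the cutoff action has the form $U_\Lambda^{(h)} = -\int_\Lambda {:}P(\phi(h_x)){:}\,dx$ with $P(y) = \sum_{n=1}^{N} a_n y^n$, so that no constant term is present:

```latex
\int_Q U_\Lambda^{(h)} \, d\mu
  = -\sum_{n=1}^{N} a_n \int_\Lambda \Big( \int_Q {:}\phi(h_x)^n{:} \, d\mu \Big) dx
  = 0,
\qquad \text{since} \quad
\int_Q {:}\phi(f)^n{:} \, d\mu = 0 \quad \text{for } n \ge 1 .
```

The mean stays zero in the limit $h \to \delta$ because the convergence holds in $L^1$, and convexity of the exponential then gives the lower bound $\exp(0) = 1$.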


Therefore, we can rigorously define the new space cutoff measure in $Q$ space:

$$d\mu_\Lambda = \exp U_\Lambda \, d\mu \Big/ \int_Q \exp U_\Lambda \, d\mu$$


The space-cutoff interacting Euclidean theory is defined by the same fields on $Q$ space, but with a change in the measure and, therefore, in the expectation values. The correlations for the interacting fields $\phi$ are the cutoff Schwinger functions

$$S_\Lambda(x_1, \ldots, x_n) = \langle \phi(x_1) \cdots \phi(x_n) \rangle_\Lambda = Z_\Lambda^{-1} \langle \phi(x_1) \cdots \phi(x_n) \exp U_\Lambda \rangle$$

where the partition function is

$$Z_\Lambda = \langle \exp U_\Lambda \rangle$$


We see that the analogy with statistical mechanics 
is complete here. Of course, the introduction of the 
space cutoff $\Lambda$ destroys translation invariance. The full Euclidean covariant theory must be recovered by taking the infinite-volume limit $\Lambda \to \mathbb{R}^2$ on field correlations. For the removal of the space cutoff, all
methods of statistical mechanics are available. In 
particular, correlation inequalities of ferromagnetic 
type can be easily exploited, as shown, for example, 
in Guerra et al. (1975) and Simon (1974). 

We would like to conclude this section by giving the connection between the space-cutoff Euclidean theory and the space-cutoff Hamiltonian theory in the physical Fock space.

For $\ell \ge 0$, $t \ge 0$, consider the rectangle in $\mathbb{R}^2$,

$$\Lambda(\ell, t) = \{(x_1, x_2) : -\ell/2 \le x_1 \le \ell/2,\ 0 \le x_2 \le t\}$$

and define the operator in the physical Fock space

$$P_\ell(t) = J_0^* \exp U_{\Lambda(\ell, t)} \, J_t$$

where $J_0$ and $J_t$ are injections relative to the lines $x_2 = 0$ and $x_2 = t$, respectively. Then the following theorem, largely due to Nelson, holds.


Theorem 5 The operator $P_\ell(t)$ is bounded and self-adjoint. The family $\{P_\ell(t)\}$, for $\ell$ fixed and $t \ge 0$, is a strongly continuous semigroup. Let $H_\ell$ be its lower bounded self-adjoint generator, so that $P_\ell(t) = \exp(-tH_\ell)$. On the physical Fock space, there is a core $D$ for $H_\ell$ such that on $D$ the equality $H_\ell = H_0 + V_\ell$ holds, where $H_0$ is the free Hamiltonian introduced earlier and $V_\ell$ is the volume-cutoff interaction given by

$$V_\ell = \lim \int_{-\ell/2}^{\ell/2} {:}P(\phi(h_{x_1})){:}\,dx_1$$

where $h_{x_1}$ are the translates of approximations to the $\delta$-function at the origin on the $x_1$-space, and the limit is taken in $L^p$, in analogy to what has been explained for the two-dimensional case in the definition of $U_\Lambda$.


While we refer to Guerra et al. (1975) and Simon (1974) for a full proof, we mention here that boundedness is related to hypercontractivity of the free Hamiltonian, self-adjointness is a consequence of reflexivity, and the semigroup property follows from the Markov property. This theorem is remarkable, because it expresses the cutoff interacting Hamiltonian semigroup in an explicit form in the Euclidean theory through probabilistic expectations. In fact, we have

$$\langle u, \exp(-tH_\ell)\,v \rangle = \int_Q (J_t u)^* \, J_0 v \, \exp U_{\Lambda(\ell, t)} \, d\mu$$

We could call this expression the Feynman-Kac-Nelson formula; in fact, it is nothing but a path integral expressed in stochastic terms, and adapted to the Hamiltonian semigroup.

By comparison with the analogous formula given 
for the free Hamiltonian semigroup, we see that 
the introduction of the interaction inserts the 
Boltzmannfaktor under the integral. 

As an immediate consequence of the Feynman-Kac-Nelson formula, together with Euclidean covariance, we have the following astonishing Nelson symmetry:

$$\langle \Omega_0, \exp(-tH_\ell)\,\Omega_0 \rangle = \langle \Omega_0, \exp(-\ell H_t)\,\Omega_0 \rangle$$


which was at the basis of Guerra (1972) and Guerra 
et al. (1972), and played some role in showing the 
effectiveness of Euclidean methods in constructive 
quantum field theory. 

It is easy to establish, through simple probabilistic reasoning, that $H_\ell$ has a unique ground state $\Omega_\ell$ of lowest energy $E_\ell$. For a convenient choice of normalization and phase factor, one has $\|\Omega_\ell\|_2 = 1$, and $\Omega_\ell > 0$ almost everywhere on $Q$ space (for bosonic systems, ground states have no nodes in configuration space!). Moreover, $\Omega_\ell \in L^p$, for any $1 \le p < \infty$. If $\ell > 0$ and the interaction is not trivial, then $\Omega_\ell \ne \Omega_0$, $E_\ell < 0$, and $\|\Omega_\ell\|_1 < 1$. Obviously, $\|\exp(-tH_\ell)\|_{2,2} = \exp(-tE_\ell)$.

The general structure of Euclidean field theory, as 
explained in this section, has been at the basis of all 
applications in constructive quantum field theory. 
These applications include the proof of the existence 
of the infinite-volume limit, with the establishment of 
all Wightman axioms, for two- and three-dimensional 
theories. Moreover, the existence of phase transitions and symmetry breaking has been firmly established. Extensions have also been given to theories involving fermions, and to gauge field theory. Due to the scope of this review, limited to a description of the general structure of Euclidean field theory, we cannot give a detailed treatment of these applications. Therefore, we refer to recent general reviews on constructive quantum field theory for a complete description of all results (see, e.g., Jaffe (2000)). For recent applications of Euclidean field theory to quantum fields on curved spacetime manifolds we refer, for example, to Schlingemann (1999).


The Physical Interpretation of Euclidean 
Field Theory 


Euclidean field theory has been considered by most 
researchers as a very useful tool for the study of 
quantum field theory. In particular, it is quite easy, 
for example, to obtain the fully interacting Schwin- 
ger functions in the infinite-volume limit in two- 
dimensional spacetime. At this point, there arises 
the problem of connecting these Schwinger func- 
tions with observable physical quantities in Min- 
kowski spacetime. A very deep result of 
Osterwalder and Schrader (1973) gives a very 
natural interpretation of the resulting limiting 
theory. In fact, the Euclidean theory, as has been 
shown earlier, arises from an analytic continuation 
from the physical Minkowski spacetime to the 
Schwinger points, through a kind of analytic 
continuation in time (also called Wick rotation, 
because Wick exploited this trick in the study of 
the Bethe-Salpeter equation). Therefore, having 
obtained the Schwinger functions for the full 
covariant theory, after all cutoff removal, it is 
very natural to try to reproduce the inverse analytic 
continuation in order to recover the Wightman 
functions in Minkowski spacetime. Therefore, 
Osterwalder and Schrader have been able to 
identify a set of conditions, quite easy to verify, 
which allow us to recover Wightman functions from
Schwinger functions. A key role in this reconstruc- 
tion theorem is played by the so-called reflection 
positivity for Schwinger functions, a property quite 
easy to verify. In this way, a fully satisfactory 
solution for the physical interpretation of Euclidean 
field theory is achieved. 

From a historical point of view, an alternate route 
is possible. In fact, at the beginning of the exploita- 
tion of Euclidean methods in constructive quantum 
field theory, Nelson was able to isolate a set of 
axioms for the Euclidean fields (Nelson 1973a), 
allowing the reconstruction of the physical theory. 
Of course, Nelson axioms are more difficult to 


verify, since they also involve properties of the 
Euclidean fields and not only of the Schwinger 
functions. However, it is still very interesting to 
investigate whether the Euclidean fields play only an 
auxiliary role in the construction of the physical 
content of relativistic theories, or if they have a 
more fundamental meaning. 

From a physical point of view, the following 
considerations could also lead to further developments 
along this line. By its very structure, the Euclidean 
theory contains the fixed-time quantum correlations in 
the vacuum. In elementary quantum mechanics, it is 
possible to derive all physical content of the theory 
from the simple knowledge of the ground state wave 
function, including scattering data. Therefore, at least 
in principle, it should be possible to derive all physical 
content of the theory directly from the Euclidean 
theory, without any analytic continuation. 

We conclude this short section on the physical 
interpretation of the Euclidean theory with a mention 
of a quite surprising result (Guerra and Ruggiero 
1973) obtained by submitting classical field theory 
to the procedure of stochastic quantization in the sense 
of Nelson (1985). The procedure of stochastic 
quantization associates a stochastic process to each 
quantum state. In this case, in a fixed reference frame, 
the procedure of stochastic quantization, applied to 
interacting fields, produces, for the ground state, a 
process in the physical spacetime that has the same 
correlations as Euclidean field theory. This opens the 
way to a possible interpretation of Euclidean field 
theory directly in Minkowski spacetime. However, a 
consistent development along this line requires a new 
formulation of representations of the Poincaré group 
in the form of measure-preserving transformations in 
the probability space where the Euclidean fields are 
defined. This difficult task has not been accomplished
as yet.


Introduction


In this article we present the semigroup approach to linear and nonlinear evolution equations in general Banach spaces. In the first part we introduce the general framework and explain the cornerstones of the widely developed theory of linear evolution equations. Besides the classical approach to linear evolution equations based on $C_0$-semigroups, we also give a brief introduction to the more recent theory of maximal regularity. The linear theory is not only important in its own right (which we illustrate by discussing applications to the heat equation, Schrödinger equation, wave equation, and Maxwell equations) but is also the indispensable basis for the theory of nonlinear evolution equations, which we present in the second part.


Linear Evolution Equations 


Let $E_0$ be a Banach space, $T > 0$, and assume that $\mathcal{A} := \{A(t); t \in [0, T]\}$ is a family of closed linear operators in $E_0$. By this we mean that, given $t \in [0, T]$, there is a linear subspace $D(A(t))$ of $E_0$ and a linear mapping $A(t): D(A(t)) \subset E_0 \to E_0$ such that the graph $\{(x, A(t)x); x \in D(A(t))\}$ of $A(t)$ is a closed subspace of $E_0 \times E_0$. Given a mapping $f: [0, T] \to E_0$ and a vector $u_0 \in E_0$, we study the following initial-value problem for $(\mathcal{A}, f, u_0)$: find a function $u \in C^1((0, T], E_0)$ such that $u(t) \in D(A(t))$ for $t \in (0, T]$ and

$$u'(t) = A(t)u(t) + f(t), \quad t \in (0, T], \qquad u(0) = u_0 \qquad [1]$$

Sometimes we also call [1] the Cauchy problem of the linear evolution equation $u'(t) = A(t)u(t) + f(t)$. In the following, we will specify different conditions on $(\mathcal{A}, f, u_0)$ which guarantee the well-posedness of [1], and we shall discuss several examples of equations of type [1] which are relevant in mathematical physics.


Autonomous Homogeneous Equations 


As in the case of ordinary differential equations in finite-dimensional spaces, it is convenient to consider first the autonomous version of [1], that is, we assume that $\mathcal{A}$ is trivial in the sense that $T = \infty$ and that $A(0) = A(t)$ for all $t \ge 0$. In order to simplify our notation, we set $A := A(0)$. We consider first the homogeneous problem

$$u'(t) = Au(t), \quad t \in (0, \infty), \qquad u(0) = u_0 \qquad [2]$$

where $u_0 \in E_0$ is given. The question of the well-posedness of [2] is closely tied to the notion of a $C_0$-semigroup in $E_0$. Let $\mathcal{L}(E_0)$ denote the Banach space of all bounded linear operators on $E_0$, endowed with the usual operator norm. A one-parameter family $\mathcal{T} = \{T(t) \in \mathcal{L}(E_0); t \ge 0\}$ is called a “$C_0$-semigroup” in $\mathcal{L}(E_0)$ iff


1. $T(0) = \mathrm{id}_{E_0}$ (normalization),

2. $T(s + t) = T(s)T(t)$ for all $s, t \ge 0$ (semigroup property), and

3. $\lim_{t \to 0^+} T(t)x = x$ for all $x \in E_0$ (strong continuity at 0).
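The three defining properties can be checked numerically in a finite-dimensional example. The sketch below uses the rotation group on $\mathbb{R}^2$ as the family $T(t)$; this choice, and all helper names, are assumptions made for illustration rather than an example taken from the text.

```python
import math

def T(t):
    """Rotation by the angle t, viewed as a one-parameter family on R^2."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(a, x):
    """Apply a 2x2 matrix to a vector."""
    return [a[0][0] * x[0] + a[0][1] * x[1], a[1][0] * x[0] + a[1][1] * x[1]]

def dist(x, y):
    return math.hypot(x[0] - y[0], x[1] - y[1])

# Property 2: T(s + t) = T(s) T(t).
s, t = 0.7, 1.9
lhs, rhs = T(s + t), matmul(T(s), T(t))
semigroup_error = max(abs(lhs[i][j] - rhs[i][j])
                      for i in range(2) for j in range(2))

# Property 3: ||T(h)x - x|| shrinks as h -> 0+.
x = [1.0, 2.0]
continuity_gaps = [dist(apply(T(h), x), x) for h in (1e-1, 1e-2, 1e-3)]
```

In finite dimensions strong continuity is automatic for matrix exponentials; the condition becomes a genuine restriction only in infinite-dimensional spaces.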


Given a $C_0$-semigroup $\mathcal{T}$, we define its (infinitesimal) generator $B$ by setting

$$\mathrm{dom}(B) := \Big\{ x \in E_0; \ \lim_{t \to 0^+} \frac{T(t)x - x}{t} \text{ exists in } E_0 \Big\}$$

and by defining

$$Bx := \lim_{t \to 0^+} \frac{T(t)x - x}{t} \quad \text{for } x \in \mathrm{dom}(B)$$

This clearly defines a linear operator in $E_0$ and it is well known that $B$ is closed and densely defined. Moreover, we have


Theorem 1 Assume that $A: D(A) \subset E_0 \to E_0$ is the generator of a $C_0$-semigroup $\{T(t); t \ge 0\}$. Then, given $u_0 \in D(A)$, problem [2] possesses a unique solution $u$ in $C^1([0, \infty), E_0)$, which is given by $u(t) = T(t)u_0$.


Under suitable additional assumptions it can be shown that the converse of Theorem 1 also holds true. However, we shall not go into these details; instead we present the following characterization of generators of $C_0$-semigroups:


Theorem 2 (Hille-Yosida). The operator $A: D(A) \subset E_0 \to E_0$ generates a $C_0$-semigroup iff it is closed, densely defined, and there exist $\omega, M \in \mathbb{R}$ such that the resolvent set $\rho(A)$ of $A$ contains the ray $(\omega, \infty)$ and such that $\|(\lambda - \omega)^n (\lambda - A)^{-n}\| \le M$ for all $\lambda > \omega$ and all $n \in \mathbb{N}$.


In applications, it is in general rather difficult to derive a uniform estimate of powers of the resolvent of an unbounded operator. Luckily, generators of $C_0$-semigroups of contractions (i.e., $\|T(t)\|_{\mathcal{L}(E_0)} \le 1$ for all $t \ge 0$) can be characterized in a rather useful way. To formulate this result we call an operator $B: D(B) \subset E_0 \to E_0$ “dissipative” iff for any $x \in D(B)$ there is an $x' \in E_0'$ with $\langle x', x \rangle = \|x\|_{E_0}^2 = \|x'\|_{E_0'}^2$ such that $\mathrm{Re}\,\langle x', Bx \rangle \le 0$. Here $\langle \cdot, \cdot \rangle$ denotes the duality pairing between $E_0'$ and $E_0$. The operator $B$ is called “m-dissipative” if it is dissipative and $\mathrm{im}(\lambda_0 - B) = E_0$ for some $\lambda_0 > 0$.


Theorem 3 (Lumer-Phillips). Let $A: D(A) \subset E_0 \to E_0$ be a closed and densely defined operator. Then $A$ generates a $C_0$-semigroup of contractions in $\mathcal{L}(E_0)$ iff $A$ is m-dissipative.


Before we discuss examples of $C_0$-semigroups and their infinitesimal generators, let us introduce the following definition: given $\alpha \in (0, \pi]$, let $\Sigma_\alpha := \{z \in \mathbb{C}; |\arg(z)| < \alpha\}$ denote the sector in $\mathbb{C}$ of angle $2\alpha$. A family of operators $\mathcal{T} = \{T(z) \in \mathcal{L}(E_0); z \in \Sigma_\alpha\}$ is called a “holomorphic $C_0$-semigroup” in $\mathcal{L}(E_0)$ iff


1. $[z \mapsto T(z)]: \Sigma_\alpha \to \mathcal{L}(E_0)$ is holomorphic,
2. $T(0) = \mathrm{id}_{E_0}$ and $\lim_{z \to 0} T(z)x = x$ for all $x \in E_0$, and
3. $T(w + z) = T(w)T(z)$ for all $w, z \in \Sigma_\alpha$.


Generators of holomorphic $C_0$-semigroups can be characterized in the following way:


Theorem 4 A densely defined closed linear operator $A: D(A) \subset E_0 \to E_0$ generates a holomorphic $C_0$-semigroup iff there exist $M > 0$ and $\omega_0 > 0$ such that $\lambda \in \rho(A)$ and $\|\lambda(\lambda - A)^{-1}\| \le M$ for all $\lambda \in \mathbb{C}$ with $\mathrm{Re}\,\lambda > \omega_0$.


Examples 5 


(i) Self-adjoint generators. Let $E_0$ be a Hilbert space and assume that $A$ is self-adjoint and that there exists an $\alpha_0 \in \mathbb{R}$ such that $A \le \alpha_0$. Then $A$ generates a holomorphic $C_0$-semigroup $\{T(t); t \ge 0\}$. If $\{E_A(\lambda); \lambda \in \mathbb{R}\}$ denotes the spectral resolution of $A$, then $T(t) = \int_{\mathbb{R}} \exp(t\lambda)\,dE_A(\lambda)$ for $t \ge 0$.

(ii) Dissipative operators in Hilbert spaces. Assume again that $E_0$ is a Hilbert space. Then, by Riesz’ representation formula, an operator $A$ is dissipative iff $\mathrm{Re}\,(u|Au) \le 0$ for all $u \in D(A)$.
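The Hilbert space criterion is easy to test on a matrix. The following sketch samples vectors in $\mathbb{C}^2$ and evaluates $\mathrm{Re}\,(u|Au)$ for an illustrative matrix whose symmetric part is negative definite. The matrix and the helper names are assumptions chosen for the example.

```python
import random

# Illustrative matrix: symmetric part diag(-1, -3), skew part irrelevant here.
A = [[-1.0, 2.0], [-2.0, -3.0]]

def inner(u, v):
    """Inner product (u|v) on C^2, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def apply(mat, u):
    """Apply a real 2x2 matrix to a complex vector."""
    return [sum(mat[i][j] * u[j] for j in range(2)) for i in range(2)]

rng = random.Random(0)  # fixed seed so the sample is reproducible
samples = [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(2)]
           for _ in range(1000)]

# Largest value of Re (u|Au) over the sample; dissipativity predicts <= 0.
worst = max(inner(u, apply(A, u)).real for u in samples)
```

For this matrix one finds by hand that $\mathrm{Re}\,(u|Au) = -|u_1|^2 - 3|u_2|^2$, because the off-diagonal terms contribute a purely imaginary number.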

(iii) The heat semigroup. Let $M$ be either a smooth compact closed Riemannian manifold or $\mathbb{R}^m$ with the Euclidean metric and write $\Delta$ for the Laplace-Beltrami operator on $M$. Then it is known that $\Delta \in \mathcal{L}(\mathcal{D}'(M))$, where $\mathcal{D}'(M)$ is the space of all distributions on $M$. Given $1 \le p < \infty$, let

$$D(\Delta_p) := \{u \in L_p(M); \ \Delta u \in L_p(M)\}$$

and set $\Delta_p u = \Delta u$ for $u \in D(\Delta_p)$. Then $\Delta_p$ generates a holomorphic $C_0$-semigroup on $L_p(M)$, the so-called “diffusion” or “heat semigroup” on $M$. If $1 < p < \infty$, then it can be shown that $D(\Delta_p) = W_p^2(M)$, where $W_p^k(M)$ denotes the Sobolev space of order $k \in \mathbb{N}$, built over $L_p(M)$.

If $M = \mathbb{R}^m$ then the operators $T(t)$ of the semigroup generated by $\Delta_{\mathbb{R}^m}$ are given by

$$(T(t)u)(x) = \frac{1}{(4\pi t)^{m/2}} \int_{\mathbb{R}^m} \exp\Big(-\frac{|x - y|^2}{4t}\Big) u(y)\,dy$$

for all $t > 0$ and almost all $x \in \mathbb{R}^m$.
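The semigroup property of the Gaussian kernel can be checked numerically in one space dimension. The sketch below approximates the integral by a midpoint sum on a truncated interval; the truncation bounds, the grid size, and the function names are assumptions made for this illustration.

```python
import math

def heat_kernel(t, x, y):
    """One-dimensional kernel (4 pi t)^(-1/2) exp(-(x - y)^2 / (4 t))."""
    return math.exp(-(x - y) ** 2 / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def apply_heat(t, u, x, lo=-30.0, hi=30.0, n=6000):
    """Approximate (T(t)u)(x) on R by a midpoint sum over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * h
        total += heat_kernel(t, x, y) * u(y)
    return h * total

s, t, x = 0.5, 0.8, 0.3

def u0(y):
    """Initial datum: the kernel itself at time s, centred at the origin."""
    return heat_kernel(s, y, 0.0)

evolved = apply_heat(t, u0, x)               # (T(t) u0)(x)
expected = heat_kernel(s + t, x, 0.0)        # predicted by T(t)T(s) = T(s + t)
mass = apply_heat(t, lambda y: 1.0, x)       # the kernel integrates to one
```

Evolving the kernel at time $s$ for a further time $t$ reproduces the kernel at time $s + t$, which is the semigroup law in this concrete case.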


Observe that the case $L_\infty(M)$ is excluded here. In fact, it is known that if a linear operator $A$ generates a $C_0$-semigroup on $L_\infty(M)$, then $A$ must be bounded. However, it can be shown that suitable realizations of the Laplace-Beltrami operator on spaces of continuous and Hölder continuous functions generate holomorphic semigroups. For more details on that topic the reader is referred to the “Further reading” section.

(iv) Stone’s theorem and the Schrödinger equation. Let $E_0$ be a Hilbert space and assume that $A$ is self-adjoint. Then Theorem 3 and Remark (ii) imply that $iA$ generates a $C_0$-group $\{U(t); t \in \mathbb{R}\}$ of unitary operators. In fact, Stone’s theorem ensures that every generator of a $C_0$-group of unitary operators is of the form $iA$ with a self-adjoint operator $A$. As an example of particular interest, let us consider the Schrödinger equation

$$\frac{1}{i} \frac{\partial u}{\partial t} = \Delta u - Vu \qquad [3]$$

with a bounded potential $V: \mathbb{R}^m \to \mathbb{R}$. Letting $D(A) := H^2(\mathbb{R}^m)$ and $Au := \Delta u - Vu$, it follows that $A$ is self-adjoint in $L_2(\mathbb{R}^m)$. Hence, the evolution of [3] is governed by the group of unitary operators generated by $iA$. Of course, the assumption that $V$ be bounded is rather restrictive. In fact, there are numerous contributions which show that this assumption can be weakened considerably. Again the reader is referred to the “Further reading” section for more details in this direction.
(v) The wave equation. Let us consider the following initial-value problem

$$\Box u(t, x) = 0, \quad x \in \mathbb{R}^m, \ t > 0; \qquad u(0, x) = \varphi_1(x), \quad \frac{\partial u}{\partial t}(0, x) = \varphi_2(x), \quad x \in \mathbb{R}^m \qquad [4]$$

for the d’Alembert operator $\Box = \partial^2/\partial t^2 - \Delta_{\mathbb{R}^m}$ in $m + 1$ dimensions. In order to associate with [4] a semigroup, let us formally re-express [4] as the following first-order system:

$$\frac{dU}{dt} = AU, \quad t > 0, \qquad U(0) = \Phi$$

where

$$U = (u, u'), \qquad A = \begin{pmatrix} 0 & \mathrm{id} \\ \Delta & 0 \end{pmatrix}, \qquad \Phi = (\varphi_1, \varphi_2)$$

Letting now $E_0 := H^1(\mathbb{R}^m) \times L_2(\mathbb{R}^m)$ and $D(A) := H^2(\mathbb{R}^m) \times H^1(\mathbb{R}^m)$, it can be shown that $A$ generates a $C_0$-group of linear operators in $\mathcal{L}(E_0)$. Hence, given any initial datum $(\varphi_1, \varphi_2) \in H^2(\mathbb{R}^m) \times H^1(\mathbb{R}^m)$, there exists a unique solution $u \in C^1([0, \infty), L_2(\mathbb{R}^m))$ to the initial-value problem [4]. It can be shown that this solution possesses the following additional regularity:

$$u \in C^2([0, \infty), L_2(\mathbb{R}^m)) \cap C([0, \infty), H^2(\mathbb{R}^m))$$

Hence, eqns [4] are satisfied for all $t \in [0, \infty)$ and for almost all $x \in \mathbb{R}^m$.
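After a Fourier transform in $x$, the first-order system decouples into $2 \times 2$ systems, one for each frequency. The sketch below works with a single frequency $k > 0$, for which the generator becomes the matrix with rows $(0, 1)$ and $(-k^2, 0)$ and the group can be written in closed form. The reduction to one mode and all names are assumptions made for illustration.

```python
import math

def wave_group(k, t):
    """exp(t A_k) for A_k = [[0, 1], [-k**2, 0]], a single Fourier mode of A."""
    c, s = math.cos(k * t), math.sin(k * t)
    return [[c, s / k], [-k * s, c]]

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][n] * b[n][j] for n in range(2)) for j in range(2)]
            for i in range(2)]

def apply(a, x):
    """Apply a 2x2 matrix to a vector."""
    return [a[0][0] * x[0] + a[0][1] * x[1], a[1][0] * x[0] + a[1][1] * x[1]]

def energy(k, state):
    """Mode energy k^2 |u|^2 + |u'|^2 for the state (u, u')."""
    return k * k * state[0] ** 2 + state[1] ** 2

k, s, t = 3.0, 0.4, 2.0
U0 = [1.0, -0.5]
Ut = apply(wave_group(k, t), U0)

product = matmul(wave_group(k, s), wave_group(k, t))
direct = wave_group(k, s + t)
group_gap = max(abs(product[i][j] - direct[i][j])
                for i in range(2) for j in range(2))
energy_gap = abs(energy(k, Ut) - energy(k, U0))
back = apply(wave_group(k, -t), Ut)   # negative times are allowed: a group
```

Unlike the heat semigroup, this family is defined for negative times as well, which is why the text speaks of a group rather than a semigroup.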

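(vi) Maxwell equations. Let $E$ and $H$ denote the electric and magnetic field vector, respectively, $\varepsilon$ and $\mu$ the electrical permittivity and magnetic permeability, respectively, and consider the initial-value problem for Maxwell equations in vacuum and without charges and currents: given sufficiently smooth vector fields $(E_0, H_0)$ find a pair $(E, H)$ such that

$$\varepsilon\,\partial_t E - \mathrm{rot}\,H = 0 \ \text{ in } (0, \infty) \times \mathbb{R}^3, \qquad \mu\,\partial_t H + \mathrm{rot}\,E = 0 \ \text{ in } (0, \infty) \times \mathbb{R}^3, \qquad E(0, \cdot) = E_0, \ H(0, \cdot) = H_0 \ \text{ in } \mathbb{R}^3 \qquad [5]$$

We assume that $\varepsilon$ and $\mu$ belong to $L_\infty(\mathbb{R}^3, \mathcal{L}_{\mathrm{sym}}(\mathbb{R}^3))$ and are uniformly positive definite, that is, we assume that there are $\varepsilon_0 > 0$ and $\mu_0 > 0$ such that

$$(\varepsilon(x)y|y) \ge \varepsilon_0 |y|^2, \qquad (\mu(x)y|y) \ge \mu_0 |y|^2$$

for all $x, y \in \mathbb{R}^3$. Based on these assumptions we endow the space $L_2(\mathbb{R}^3) \times L_2(\mathbb{R}^3)$ with the inner product

$$((u_1, u_2)|(v_1, v_2)) := (\varepsilon u_1|v_1)_{L_2} + (\mu u_2|v_2)_{L_2}$$

for $(u_1, u_2), (v_1, v_2) \in L_2(\mathbb{R}^3) \times L_2(\mathbb{R}^3)$, and call this Hilbert space $E_0$. We further set

$$E_1 := \{(u_1, u_2) \in E_0; \ (\mathrm{rot}\,u_1, \mathrm{rot}\,u_2) \in E_0\}$$

Finally, given $u = (u_1, u_2) \in E_1$, let

$$Au := (\varepsilon^{-1}\,\mathrm{rot}\,u_2, \ -\mu^{-1}\,\mathrm{rot}\,u_1)$$

It can be shown that $iA$ is self-adjoint in $E_0$. Hence, Stone’s theorem ensures that $A$ generates a $C_0$-group of unitary operators in $\mathcal{L}(E_0)$. Therefore, given $(E_0, H_0) \in E_1$, there exists a unique solution $(E(\cdot), H(\cdot))$ of [5]. For this solution, the energy functional

$$\mathcal{E}(t) = \frac{1}{2} \int_{\mathbb{R}^3} \big[ (\varepsilon E(t)|E(t))_{\mathbb{R}^3} + (\mu H(t)|H(t))_{\mathbb{R}^3} \big]\,dx$$

is constant on $[0, \infty)$.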


Autonomous Inhomogeneous Equations 


Next, we study problem [1] in the case $A(t) = A$ for all $t \in [0, T]$. Throughout this section we assume that the following minimal hypotheses


1. $A$ generates a $C_0$-semigroup in $\mathcal{L}(E_0)$,
2. $f \in L_1((0, T), E_0)$, and
3. $u_0 \in E_0$

are satisfied. Later on we shall discuss several more restrictive assumptions on $(A, f, u_0)$. A function $u: [0, T] \to E_0$ is called a “(classical) solution” of

$$u'(t) = Au(t) + f(t), \quad t \in (0, T], \qquad u(0) = u_0 \qquad [6]$$

iff $u \in C([0, T], E_0) \cap C^1((0, T], E_0)$, $u(t) \in D(A)$ for all $t \in (0, T]$, and $u$ satisfies [6] pointwise on $[0, T]$. It can be shown that [6] has at most one solution. If it has a solution, this solution is represented by the following variation-of-constants formula:

$$u(t) = T(t)u_0 + \int_0^t T(t - s)f(s)\,ds, \quad t \in [0, T] \qquad [7]$$

where $\{T(t); t \ge 0\}$ denotes the semigroup generated by $A$. Observe that the function $u: [0, T] \to E_0$, defined by [7], is continuous, but in general not differentiable on $(0, T]$. For this reason one calls [7] the “mild solution” of [6].
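In the scalar case the variation-of-constants formula can be compared with a closed-form solution. The sketch below takes the generator to be multiplication by $-a$, so that $T(t) = \exp(-at)$, and a constant forcing term; these choices and the helper name are assumptions made for illustration.

```python
import math

def mild_solution(a, u0, f, t, n=2000):
    """Evaluate u(t) = T(t) u0 + int_0^t T(t - s) f(s) ds with T(t) = exp(-a t).

    The integral is approximated by a midpoint sum with n subintervals.
    """
    h = t / n
    integral = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        integral += math.exp(-a * (t - s)) * f(s)
    return math.exp(-a * t) * u0 + h * integral

a, u0, t = 2.0, 1.5, 1.0
numeric = mild_solution(a, u0, lambda s: 1.0, t)

# Closed form for u' = -a u + 1, u(0) = u0.
exact = math.exp(-a * t) * u0 + (1.0 - math.exp(-a * t)) / a
```

Here the mild solution is also a classical one, since the forcing term is smooth and every initial value lies in the domain of a bounded generator.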

It is not difficult to see that if $u_0 \in D(A)$ and $f \in C^1([0, T], E_0)$, then the mild solution is a classical solution, that is, [6] is uniquely solvable in the classical sense. In applications to nonlinear problems, the assumption $f \in C^1([0, T], E_0)$ is often too restrictive. Fortunately, in the case of generators of holomorphic semigroups, this assumption on $f$ can be weakened in two different directions. Let $\|x\|_A := \|x\|_{E_0} + \|Ax\|_{E_0}$ denote the graph norm on $D(A)$. Then the closedness of $A$ implies that $(D(A); \|\cdot\|_A)$ is a Banach space. In the following, we call this Banach space $E_1$. Moreover, given $\alpha \in (0, 1)$, we write $E_\alpha = [E_0, E_1]_\alpha$ for the complex interpolation space between $E_0$ and $E_1$. Then we have the following result.


Theorem 6 Let A generate a holomorphic 
Co-semigroup in L(Eo) and assume that there is a 
constant a € (0,1) such that 


f z G0; T], Eo) T C((0, T], Ea) 


Then, given $u_0 \in E_0$, the Cauchy problem [6] possesses a unique classical solution. It is given by

$$u(t) = T(t)u_0 + \int_0^t T(t - s)f(s)\,ds, \quad t \in [0, T]$$

where $\{T(t); t \ge 0\}$ stands for the semigroup generated by $A$.


In the following, we discuss an alternative approach to the Cauchy problem [6], which is based on the so-called theory of maximal regularity. There are several different types of results on maximal regularity, which we cannot discuss in full detail here. We therefore give only a brief introduction to the theory of so-called “maximal $L_p$-regularity.” For further results on maximal regularity, we again draw the reader’s attention to the “Further reading” section.

The Banach space $E_0$ is called a UMD space (for “unconditionality of martingale differences”) if the Hilbert transform is bounded on $L_q(\mathbb{R}, E_0)$ for some $q \in (1, \infty)$. It is known that Hilbert spaces, the Lebesgue spaces $L_p(X, d\mu)$ with $1 < p < \infty$ and with a $\sigma$-finite measure space $(X, \mu)$, and closed subspaces of UMD spaces are UMD spaces. Furthermore, UMD spaces are without exception reflexive. Thus, the spaces $L_1(X, d\mu)$, $L_\infty(X, d\mu)$, and spaces of continuous or Hölder continuous functions are not UMD spaces.

Next, assume that $-A$ generates a holomorphic $C_0$-semigroup in $\mathcal{L}(E_0)$ and that $[0, \infty) \subset \rho(-A)$. Then, it is known that, given $z \in \mathbb{C}$, the fractional power $A^z$ of $A$ is a densely defined closed operator in $E_0$. We say that $A$ has bounded imaginary powers (BIP) of angle $\theta \ge 0$ if there exist positive constants $M$ and $\varepsilon$ such that

$$A^{it} \in \mathcal{L}(E_0) \quad \text{and} \quad \|A^{it}\|_{\mathcal{L}(E_0)} \le M \exp(\theta |t|), \quad t \in (-\varepsilon, \varepsilon) \qquad [8]$$

In order to have a neat notation, we write $A \in \mathrm{BIP}(\theta)$ if [8] holds true.


Remarks 7 In the following, we assume that $-A$ generates a holomorphic $C_0$-semigroup in $\mathcal{L}(E_0)$ and that $[0, \infty) \subset \rho(-A)$.


(i) If $\mathrm{Re}\,z < 0$, then $A^z$ is bounded on $E_0$.

(ii) There are several representation formulas for the fractional powers of $A$. Among them we picked the following: if $\mathrm{Re}\,z \in (-1, 1)$ and $x \in D(A)$, then

$$A^z x = \frac{\sin \pi z}{\pi z} \int_0^\infty s^z (s + A)^{-2} A x \, ds$$

(iii) Assume that $E_0$ is a Hilbert space, that $A$ is self-adjoint, and that there is a positive constant $\alpha$ such that $A \ge \alpha$. Further, let $\{E_A(\lambda); \lambda \in \mathbb{R}\}$ be the spectral resolution of $A$; then

$$A^z = \int_0^\infty \lambda^z \, dE_A(\lambda), \quad z \in \mathbb{C}$$

Moreover, $A \in \mathrm{BIP}(0)$.

(iv) Let again $E_0$ be a Hilbert space and assume that $-A$ is m-dissipative and satisfies $0 \in \rho(A)$. Then $A \in \mathrm{BIP}(\pi/2)$.
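The integral representation of the fractional powers can be tested when $A$ is multiplication by a positive number $a$, in which case it should reproduce the ordinary power $a^z$. The sketch below does this for real $z$; the substitution $s = e^x$, the truncation bounds, and the function name are assumptions made for this illustration.

```python
import math

def fractional_power(a, z, lo=-60.0, hi=60.0, n=24000):
    """Evaluate sin(pi z)/(pi z) * int_0^inf s^z (s + a)^(-2) a ds.

    Intended for real z in (-1, 1) with z != 0 and a > 0. Substituting
    s = exp(x) gives an integral over the real line, approximated here by
    a midpoint sum on [lo, hi]; the bounds suit z not too close to -1 or 1.
    """
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        s = math.exp(x)
        total += s ** z * a / (s + a) ** 2 * s   # extra factor s from ds = s dx
    return math.sin(math.pi * z) / (math.pi * z) * h * total

a, z = 2.5, 0.3
approx = fractional_power(a, z)   # should be close to a ** z
```

The agreement reflects the Beta-function identity $\int_0^\infty s^z (s + a)^{-2} a\,ds = a^z \pi z / \sin \pi z$ for $-1 < \mathrm{Re}\,z < 1$.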


Given $p \in (1, \infty)$, Sobolev’s embedding theorem ensures that $W_p^1((0, T), E_0)$ is continuously injected into $C([0, T], E_0)$. Consequently, given any function $u \in W_p^1((0, T), E_0)$ and $t \in [0, T]$, the pointwise evaluation $u(t)$ is well defined. In particular, the trace at 0 with respect to time

$$\mathrm{tr}: W_p^1((0, T), E_0) \to E_0, \quad u \mapsto u(0)$$

is a well-defined and bounded linear operator. In order to formulate the next result, let $E_{s,p} = (E_0, E_1)_{s,p}$, with $p \in (1, \infty)$ and $s \in (0, 1)$, denote the real interpolation space between the basic space $E_0$ and $E_1$, the domain $D(A)$ of $A$, endowed with the graph norm. Furthermore, we set

$$\mathbb{E}_0 := L_p((0, T), E_0), \qquad \mathbb{E}_1 := L_p((0, T), E_1) \cap W_p^1((0, T), E_0)$$

and we write $\mathrm{Isom}(E, F)$ for the set of all topological isomorphisms mapping the Banach space $E$ onto the Banach space $F$.


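Theorem 8 (Dore and Venni). Suppose that $E_0$ is a UMD space and that $A \in \mathrm{BIP}(\theta)$ for some $\theta \in [0, \pi/2)$. Then, given $p \in (1, \infty)$, we have

$$(\partial_t + A, \mathrm{tr}) \in \mathrm{Isom}(\mathbb{E}_1, \mathbb{E}_0 \times E_{1 - 1/p, p})$$

This means that, given $(f, u_0) \in L_p((0, T), E_0) \times E_{1 - 1/p, p}$, there exists a unique solution $u \in L_p((0, T), E_1) \cap W_p^1((0, T), E_0)$ of the Cauchy problem [6]. Moreover, $u$ depends continuously on $(f, u_0)$ and fulfills the following a priori estimate:

$$\|u\|_{\mathbb{E}_1} \le C \big( \|f\|_{\mathbb{E}_0} + \|u_0\|_{E_{1 - 1/p, p}} \big)$$

where $C := \|(\partial_t + A, \mathrm{tr})^{-1}\|_{\mathcal{L}(\mathbb{E}_0 \times E_{1 - 1/p, p}, \mathbb{E}_1)}$.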


Nonautonomous Equations of Hyperbolic Type 


According to Theorem 1 and the corresponding remark, it is reasonable to impose in the study of the Cauchy problem [1] the minimal hypothesis that, given $s \in [0, T]$, each individual operator $A(s)$ be the generator of a $C_0$-semigroup $\{T_s(t); t \ge 0\}$ in $\mathcal{L}(E_0)$. If this semigroup is holomorphic, we call [1] of “parabolic type.” Otherwise the evolution equation [1] is said to be of “hyperbolic type.”

A family $\{A(t); t \in [0, T]\}$ of generators of $C_0$-semigroups in $\mathcal{L}(E_0)$ is called “stable” iff there exist positive constants $M$ and $\omega$ such that $(\omega, \infty) \subset \rho(A(t))$ for all $t \in [0, T]$ and such that

$$\Big\| \prod_{j=1}^{k} (\lambda - A(t_j))^{-1} \Big\|_{\mathcal{L}(E_0)} \le M (\lambda - \omega)^{-k} \quad \text{for } \lambda > \omega$$

and every finite sequence $0 \le t_1 \le t_2 \le \cdots \le t_k \le T$ with $k \in \mathbb{N}$. Observe that the resolvent operators $(\lambda - A(t_j))^{-1}$ do not commute in general. Therefore, the order of the terms on the left-hand side of the above estimate has to be obeyed. Assume that $\mathcal{A} = \{A(t); t \in [0, T]\}$ is a family of m-dissipative operators. Then $\mathcal{A}$ is stable, since any m-dissipative operator $B$ satisfies the estimate $\|(\lambda - B)^{-1}\| \le 1/\lambda$ for all $\lambda > 0$.

It turns out that the stability of a family of generators is not sufficient to construct a solution of [1] even in the case $f = 0$. We also need a certain time regularity of the mapping $t \mapsto A(t)$. For this we say that the family $\{A(t); t \in [0, T]\}$ has a common domain $D$ iff $D$ is a dense subspace of $E_0$ such that $D(A(t)) = D$ for all $t \in [0, T]$. The family $\{A(t); t \in [0, T]\}$ is called “strongly differentiable” iff it has a common domain $D$ and, given $v \in D$, the function $t \mapsto A(t)v$ belongs to $C^1([0, T], E_0)$.

We are now prepared to formulate the following 
result. 


Theorem 9 (Kato). Let $\{A(t); t \in [0, T]\}$ be a stable and strongly differentiable family of generators of $C_0$-semigroups with common domain $D$. If $f \in C^1([0, T], E_0)$ and $u_0 \in D$ then [1] possesses a unique classical solution.


The above result is based on the construction of
an evolution operator $U(t,s)$, which can be
considered as the generalization of the notion of a
$C_0$-semigroup for autonomous equations to the case of
evolution equations of the form

$$\dot u(t) = A(t)u(t), \quad t \in (s,T], \qquad u(s) = v$$

for fixed $s \in [0,T)$. Once an evolution operator is
available, the solution of [1] is given by

$$u(t) = U(t,0)u_0 + \int_0^t U(t,s)f(s)\,ds, \quad t \in [0,T]$$

Of course, this generalizes [7], and if $A(t)$ is
independent of $t$, then $U(t,s) = T(t-s)$, where
$\{T(t);\ t \ge 0\}$ is the semigroup generated by $A(0)$.
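
To make the role of the evolution operator concrete, here is a minimal numerical sketch for a scalar toy problem. The choices $A(t) = -(1+t)$ and $f \equiv 1$ are hypothetical and are not taken from the text; for them $U(t,s) = \exp(\int_s^t A(r)\,dr)$ is explicit, so the solution formula can be compared with a direct Runge-Kutta integration of $\dot u = A(t)u + f(t)$.

```python
import math

def a(t):
    # Hypothetical time-dependent scalar "generator" A(t) = -(1 + t).
    return -(1.0 + t)

def f(t):
    # Hypothetical constant forcing term.
    return 1.0

def U(t, s):
    # Evolution operator of the scalar problem: exp of the integral of a over [s, t].
    return math.exp(-((t + 0.5 * t * t) - (s + 0.5 * s * s)))

def duhamel(t, u0, n=2000):
    # u(t) = U(t,0) u0 + int_0^t U(t,s) f(s) ds, integral by composite Simpson (n even).
    h = t / n
    total = U(t, 0.0) * f(0.0) + U(t, t) * f(t)
    for i in range(1, n):
        s = i * h
        total += (4 if i % 2 else 2) * U(t, s) * f(s)
    return U(t, 0.0) * u0 + total * h / 3.0

def rk4(t, u0, n=2000):
    # Direct integration of u' = a(t) u + f(t) with the classical Runge-Kutta scheme.
    h = t / n
    u, s = u0, 0.0
    rhs = lambda s, u: a(s) * u + f(s)
    for _ in range(n):
        k1 = rhs(s, u)
        k2 = rhs(s + h / 2, u + h * k1 / 2)
        k3 = rhs(s + h / 2, u + h * k2 / 2)
        k4 = rhs(s + h, u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        s += h
    return u
```

In this toy setting `duhamel(1.0, 2.0)` and `rk4(1.0, 2.0)` should agree to high accuracy, and `U(t, r) * U(r, s)` reproduces `U(t, s)`, which is the composition law that replaces the semigroup property in the nonautonomous case.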

Furthermore, there are several extensions of
Kato’s result. The most interesting
contributions weaken the time
regularity of $f$ and the assumption that
$\{A(t);\ t \in [0,T]\}$ be strongly differentiable. In particular,
it is possible to study [1] for families without
a common domain.

For the construction of evolution operators as 
well as generalizations of Theorem 9, the reader is 
again referred to the “Further reading” section. 


Nonautonomous Equations of Parabolic Type 


Throughout this section we assume that $E_0$ and $E_1$
are Banach spaces such that $E_1$ is dense and
continuously injected in $E_0$. In the study of parabolic
evolution equations, the class of all operators in
$\mathcal{L}(E_1, E_0)$, considered as unbounded operators in $E_0$
with common domain $E_1$, which generate holomorphic
$C_0$-semigroups in $\mathcal{L}(E_0)$ has turned out to
be very useful. In the following, we call this class
$\mathcal{H}(E_1, E_0)$. It is known that $A \in \mathcal{H}(E_1, E_0)$ iff there
exist constants $\omega > 0$ and $\kappa \ge 1$ such that
$\omega - A \in \mathrm{Isom}(E_1, E_0)$ and such that

$$\kappa^{-1} \le \frac{\|(\lambda - A)x\|_0}{|\lambda|\,\|x\|_0 + \|x\|_1} \le \kappa, \qquad x \in E_1 \setminus \{0\}, \quad \mathrm{Re}\,\lambda \ge \omega$$

where $\|\cdot\|_j$ denotes the norm of $E_j$. Using the above
characterization, it can be shown that $\mathcal{H}(E_1, E_0)$ is
an open subset of $\mathcal{L}(E_1, E_0)$. In the following, we
always endow $\mathcal{H}(E_1, E_0)$ with the topology induced
by the norm of $\mathcal{L}(E_1, E_0)$. As a consequence of this
convention it is meaningful to consider, for example,
continuous mappings from $[0,T]$ into $\mathcal{H}(E_1, E_0)$.
Observe that if $A \in C([0,T], \mathcal{H}(E_1, E_0))$, then
$\mathcal{A} = \{A(t);\ t \in [0,T]\}$ is a family of generators of
holomorphic semigroups with the common domain
$E_1$. Then we have the following result.


Theorem 10 (Sobolevskii, Tanabe). Assume that
there is a $\rho \in (0,1)$ such that

$$(A, f) \in C^{\rho}([0,T], \mathcal{H}(E_1, E_0) \times E_0)$$

Then, given $u_0 \in E_0$, the Cauchy problem [1]
possesses a unique classical solution $u$. This solution
has the additional regularity

$$u \in C^{\rho}((0,T], E_1) \cap C^{1+\rho}((0,T], E_0)$$

Finally, if $u_0 \in E_1$, then $u \in C^1([0,T], E_0)$.


As in the hyperbolic case, the proof of Theorem 10 
is based on the evolution operator U(t,s) for the 
homogeneous problem, although the constructions of 
the corresponding evolution operators are completely 
different. 

In addition, there are several extensions and 
generalizations of Theorem 10. In particular, the 
assumption that the family $\{A(t);\ t \in [0,T]\}$ possesses
a common domain can be weakened considerably.
Furthermore, it is possible to look at
parabolic evolution equations in the so-called inter- 
polation and extrapolation scales. This offers a great 
flexibility in the study of nonlinear problems. 
Further details in this direction can be found in the 
“Further reading” section. 


Nonlinear Evolution Equations 


Let $E_0, E_1$ be Banach spaces such that $E_1$ is
dense and continuously embedded in $E_0$. Assume
further that $u_0 \in E_1$ and that we are given a
nonlinear operator $F \in C([0,T] \times V, E_0)$, where $V$
is an open neighborhood of $u_0$ in $E_1$. In this
section, we will discuss the well-posedness of the
Cauchy problem for the following nonlinear
evolution equation

$$\dot u(t) = F(t, u(t)), \quad t \in (0,T], \qquad u(0) = u_0 \qquad [9]$$

in the Banach space $E_0$. We will always assume
that the nonlinear operator $F$ either carries a quasilinear
structure or is of fully nonlinear parabolic type.
By a “quasilinear structure,” we mean that there is a
mapping $A \in C([0,T] \times V, \mathcal{L}(E_1, E_0))$ and a suitable
“lower-order term” $f \in C([0,T] \times V, E_0)$ such that

$$F(t,v) = A(t,v)v + f(t,v) \quad \text{for all } (t,v) \in [0,T] \times V$$

Problem [9] is of fully nonlinear parabolic type if
$F \in C^1([0,T] \times V, E_0)$ and if the Fréchet derivative
$D_2F(0, u_0)$ of $F$ with respect to $v$ at $(0, u_0)$ belongs to
the class $\mathcal{H}(E_1, E_0)$.


Quasilinear Evolution Equations of Hyperbolic Type 


Assume that $E_0$ is a reflexive Banach space and let
$u_0 \in V \subset E_1$ be chosen as above. We consider the
following abstract quasilinear evolution equation of
hyperbolic type:

$$\dot u(t) = A(t, u(t))u(t) + f(t, u(t)), \quad t \in (0,T], \qquad u(0) = u_0 \qquad [10]$$

and assume that the following hypotheses are
satisfied:

(H1) $A \in C([0,T] \times V, \mathcal{L}(E_1, E_0))$ is bounded on
bounded subsets of $V$ and, given $(t,v) \in [0,T] \times V$,
the operator $A(t,v)$ is m-dissipative
and there is a constant $\mu_A$ such that

$$\|A(t,v) - A(t,w)\|_{\mathcal{L}(E_1,E_0)} \le \mu_A \|v - w\|_{E_0}$$

for all $t \in [0,T]$ and all $v, w \in V$.

(H2) There is a $Q \in \mathrm{Isom}(E_1, E_0)$ such that
$QA(t,v)Q^{-1} = A(t,v) + B(t,v)$, where $B(t,v) \in \mathcal{L}(E_0)$
is bounded, uniformly on bounded
subsets of $V$. Moreover,

$$\|B(t,v) - B(t,w)\|_{\mathcal{L}(E_0)} \le \mu_B \|v - w\|_{E_1}$$

for all $t \in [0,T]$ and all $v, w \in V$.

(H3) $f \in C([0,T] \times V, E_1)$ is bounded on bounded
subsets of $V$ and there are $\mu_0$ and $\mu_1$ such that

$$\|f(t,v) - f(t,w)\|_{E_j} \le \mu_j \|v - w\|_{E_j} \quad \text{for all } v, w \in V,\ j \in \{0,1\}$$

Then we have the following result. 


Theorem 11 (Kato). Assume that (H1), (H2),
and (H3) are satisfied. Then there is a maximal
$t^+ \in (0,T]$, depending only on $\|u_0\|_{E_1}$, and a unique
solution $u$ to [10] such that

$$u = u(\cdot, u_0) \in C([0,t^+), V) \cap C^1([0,t^+), E_0)$$

Moreover, the mapping $u_0 \mapsto u(\cdot, u_0)$ is continuous
from $V$ to $C([0,t^+), V) \cap C^1([0,t^+), E_0)$.


There are many applications of Theorem 11 to
different concrete partial differential equations
(PDEs), including symmetric hyperbolic first-order
systems, the Korteweg-de Vries equation,
nonlinear elastodynamics, quasilinear wave equations,
Navier-Stokes and Euler equations, and
coupled Maxwell-Dirac equations. We explain
in some detail an application to the
so-called periodic Camassa-Holm equation:

$$u_t - u_{xxt} + 3uu_x = 2u_xu_{xx} + uu_{xxx}, \quad t > 0,\ x \in S^1 \qquad [11]$$

where $S^1$ stands for the unit circle. In the above
model, the function $u$ is the height of a unidirectional
water wave over a flat bottom.
Set $X := L_2(S^1)$, $V := H^1(S^1)$, and $Q := (I - \partial_x^2)^{1/2}$.
With $y := u - u_{xx}$, eqn [11] can be re-expressed as

$$y_t + (Q^{-2}y)\,y_x = -2y\,(Q^{-2}y)_x \quad \text{in } L_2(S^1)$$

which is of type [10] with

$$A(y) = (Q^{-2}y)\,\partial_x, \qquad f(y) = -2y\,(Q^{-2}y)_x, \qquad y \in V$$

where $\mathrm{dom}(A(y)) := \{v \in L_2(S^1);\ (Q^{-2}y)v \in H^1(S^1)\}$.
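
The passage from [11] to the first-order form for $y$ can be checked in a few lines. The computation below is a sketch that assumes [11] has the standard Camassa-Holm form $u_t - u_{xxt} + 3uu_x = 2u_xu_{xx} + uu_{xxx}$ and that $u$ is recovered from $y$ by inverting $I - \partial_x^2$ on the circle.

```latex
\begin{aligned}
y_t &= u_t - u_{xxt} = -3uu_x + 2u_xu_{xx} + uu_{xxx}, \\
u\,y_x + 2u_x\,y &= u\,(u_x - u_{xxx}) + 2u_x\,(u - u_{xx})
                  = 3uu_x - uu_{xxx} - 2u_xu_{xx}, \\
\Longrightarrow\quad y_t + u\,y_x &= -2\,y\,u_x ,
\qquad u = (I - \partial_x^2)^{-1}y .
\end{aligned}
```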


Quasilinear Evolution Equations of Parabolic Type 


Assume that $E_0$ and $E_1$ are Banach spaces such that
$E_1$ is dense and continuously injected in $E_0$. Moreover,
let $(\cdot,\cdot)_\theta$ for each $\theta \in (0,1)$ be an admissible
interpolation functor (e.g., the real or complex
interpolation functor) and set $E_\theta := (E_0, E_1)_\theta$ for
$\theta \in (0,1)$. Given a subset $X \subset E_\beta$ for some $\beta \in (0,1)$, we
set $X_\gamma := X \cap E_\gamma$ for $\gamma \in [0,1]$, equipped with the
topology induced by $E_\gamma$. Finally, we write $C^{1-}(M, N)$
for the class of all locally Lipschitz continuous
functions mapping the metric space $M$ into the
metric space $N$.


Theorem 12 (Amann). Suppose that $0 < \gamma \le \beta < \alpha < 1$,
that $X_\beta$ is open in $E_\beta$, and that

$$(A, f) \in C^{1-}([0,T] \times X_\beta, \mathcal{H}(E_1, E_0) \times E_\gamma)$$

Then, given $u_0 \in X_\alpha$, there exists a unique maximal
$t^+ \in (0,T]$ such that the quasilinear parabolic
Cauchy problem

$$\dot u(t) = A(t, u(t))u(t) + f(t, u(t)), \quad t \in (0,T], \qquad u(0) = u_0$$

possesses a unique classical solution

$$u := u(\cdot, u_0) \in C([0,t^+), X_\alpha) \cap C^1((0,t^+), E_0)$$

Assume $A$ and $f$ are independent of $t$ and let $u(\cdot, u_0)$ be
the solution to the corresponding autonomous problem

$$\dot u(t) = A(u(t))u(t) + f(u(t)), \quad t \in (0,\infty), \qquad u(0) = u_0$$

Then the mapping $(t, u_0) \mapsto u(t, u_0)$ is a semiflow
on $X_\alpha$.


Due to its clarity and flexibility, Theorem 12 has 
found a plethora of applications, which we cannot 
discuss in detail here. Let us at least mention the 
following: reaction-diffusion systems, population 
dynamics, phase transition models, flows through 
porous media, Stefan problems, and nonlinear and 
dynamic boundary conditions in boundary-value 
problems. In addition, many geometric evolution 
equations fall into the scope of Theorem 12. Consider, 
for example, the volume-preserving gradient flow of 
the area functional of a compact hypersurface $M$ in
$\mathbb{R}^{n+1}$ with respect to $L_2(M)$ and $W_2^{-1}(M)$, respectively.
These flows are known as the averaged mean curvature 
flow and the surface diffusion flow, respectively, and 
have been investigated on the basis of Theorem 12. 


Fully Nonlinear Evolution Equations 
of Parabolic Type 


Based on the theory of maximal regularity for linear
evolution equations, it is possible to investigate
abstract fully nonlinear parabolic problems of type
[9]. As there are different techniques of maximal
regularity, there are also different approaches to [9].
We present here a result which uses maximal
regularity properties in singular Hölder spaces $C^\beta_\beta$.
Let $E_0$ and $E_1$ be Banach spaces such that $E_1$ is
continuously embedded into $E_0$ (density of $E_1$ in $E_0$
is not needed here). As before, $V$ is an open subset of
$E_1$ and $D_2F$ stands for the Fréchet derivative of
$F(t,v)$ with respect to the second variable.


Theorem 13 (Lunardi). Assume that
$F \in C^2([0,T] \times V, E_0)$ such that
$D_2F \in C^1([0,T] \times V, \mathcal{H}(E_1, E_0))$. Then, given $u_0 \in V$, there is a maximal
$t^+ \in (0,T]$ such that problem [9] has a solution
$u \in C([0,t^+), E_1) \cap C^1([0,t^+), E_0)$. This solution is
unique in the class

$$\bigcup_{0 < \beta < 1} C^\beta_\beta((0, t^+ - \varepsilon], E_1) \cap C([0, t^+ - \varepsilon], E_1)$$

for each $\varepsilon \in (0, t^+)$.


Theorem 13 has important applications to problems 
for which the hypotheses of Theorem 12 (in particular 
the assumption on the quasilinear structure) are not
satisfied. We mention here fully nonlinear second-order
boundary-value problems, Hele-Shaw models, models 
from combustion theory, and Bellman equations. 


See also: Boltzmann Equation (Classical and Quantum); 
Breaking Water Waves; Dissipative Dynamical Systems 
of Infinite Dimension; Elliptic Differential Equations: 
Linear Theory; Ginzburg-Landau Equation; Image
Processing: Mathematics; Incompressible Euler
Equations: Mathematical Theory; Nonlinear Schrödinger
Equations; Partial Differential Equations: Some
Examples; Quantum Dynamical Semigroups; Relativistic
Wave Equations Including Higher Spin Fields; Semilinear
Wave Equations; Separation of Variables for Differential
Equations; Singularities of the Ricci Flow; Symmetric
Hyperbolic Systems and Shock Waves; Wave Equations
and Diffraction.


Introduction


The renormalization group (RG) in its modern form 
was invented by K G Wilson in the context of 
statistical mechanics and Euclidean quantum field 
theory (EQFT). It offers the deepest understanding of
renormalization in quantum field theory (QFT) by
connecting EQFT with the theory of second-order
phase transitions and associated critical phenomena.
Thermodynamic functions of many statistical mechan- 
ical models (the prototype being the Ising model in two 
or more dimensions) exhibit power-like singularities as 
the temperature approaches a critical value. One of the 
major triumphs of the Wilson RG was the prediction 
of the exponents (known as critical exponents) 
associated to these singularities. Wilson’s fundamental 
contribution was to realize that many length scales 
begin to cooperate as one approaches criticality and 
that one should disentangle them and treat them one at 
a time. This leads to an iterative procedure known 
as the “renormalization group.” Singularities and 
critical exponents then arise from a limiting process.
Ultraviolet singularities of field theory can also
be understood in the same way. Wilson reviews this 
(Wilson and Kogut 1974) and gives the historical 
genesis of his ideas (Wilson 1983). 

The early work in the subject was heuristic, in the 
sense that clever but uncontrolled approximations 
were made to the exact equations often with much 
success. Subsequently, authors with a mathematical bent
began to use the underlying ideas to prove theorems.
Benfatto, Cassandro, Gallavotti, Nicolo, Olivieri et al.
pioneered the rigorous use of Wilson’s renormalization
group in the construction of super-renormalizable
QFTs (see Benfatto and Gallavotti (1995) and
references therein). The subject saw further mathematical
development in the work of Gawedzki and
Kupiainen (1984, 1986) and that of Balaban (1982),
and references therein. Balaban, in a series of papers
ending in Balaban (1989), proved a basic result on the
continuum limit of Wilson’s lattice gauge theory. 
Brydges and Yau (1990) simplified the mathematical 
treatment of the renormalization group for a class of 
models and this has led to further systemization and 
simplification in the work of Brydges et al. (1998, 
2003). Another method which has been intensely 
developed during the same historical period is based on 
phase cell expansions: Feldman, Magnen, Rivasseau, 
and Sénéor developed the early phase cell ideas of 


Glimm and Jaffe and were able to prove independently 
many of the results cited earlier (see Rivasseau (1991) 
and references therein). Although these methods share 
many features of the Wilson RG, they are different in 
methodology and thus remain outside of the purview 
of the present exposition. 

A somewhat different line of development has been 
the use of the RG to give simple proofs of perturbative 
renormalizability of various QFTs: Gallavotti 
and Nicolo, via iterative methods (see Benfatto and 
Gallavotti (1995) and references therein), and 
Polchinski (1984), who exploited a continuous version 
of the RG for which Wilson (1974) had derived a 
nonlinear differential equation. These early works
were devoted to the standard $(\phi^4)_4$ scalar field theory,
but subsequently Polchinski’s work has been extended
to a large class of models, including four-dimensional
nonabelian gauge theories (see Kopper and Müller
(2000) and references therein).

Finally, it should be mentioned that apart from 
QFT and statistical mechanics, the RG method has 
proved fruitful in other domains. An example is the 
study of interacting fermion systems in condensed 
matter physics (see Fermionic Systems and Renor- 
malization: Statistical Mechanics and Condensed 
Matter). In the rest of this article, our focus will be 
on EQFT and statistical mechanics. 


The RG as a Discrete Semigroup 


We will first define a discrete version of the RG and 
consider its continuous version later. As we will see, 
the RG is really a semigroup, so calling it a group is 
a misnomer. 

Let $\phi$ be a Gaussian random field (see, e.g., Gelfand
and Vilenkin (1964) for a discussion of random fields)
in $\mathbb{R}^d$. Associated to it there is a positive-definite
function which is identified as its covariance. In QFT
one is interested in the covariance

$$E(\phi(x)\phi(y)) = \mathrm{const.}\,|x - y|^{-2[\phi]} = \int_{\mathbb{R}^d} dp\; e^{ip\cdot(x-y)}\,\frac{1}{|p|^{d - 2[\phi]}} \qquad [1]$$

Here $[\phi] > 0$ is the (canonical) dimension of the
field, which for the standard massless free field is
$[\phi] = (d-2)/2$. The latter is positive for $d > 2$.
Other choices are possible, but in EQFT
they are restricted by Osterwalder-Schrader
positivity. This is assured if $[\phi] = (d - \alpha)/2$ with
$0 < \alpha \le 2$. If $\alpha < 2$, we get a generalized free field.

Observe that the covariance is singular for $x = y$ and
this singularity is responsible for the ultraviolet
divergences of QFT. This singularity has to be initially
cut off and there are many ways to do this. A simple
way is as follows. Let $u(x)$ be a smooth, rotationally
invariant, positive-definite function of fast decrease.
Examples of such functions are legion. Observe that

$$|x - y|^{-2[\phi]} = \mathrm{const.} \int_0^\infty \frac{dl}{l}\, l^{-2[\phi]}\, u\Big(\frac{x-y}{l}\Big) \qquad [2]$$

as can be seen by scaling in $l$. We define the unit
ultraviolet cutoff covariance $C$ by cutting off the
lower end point of the $l$ integration (responsible for
the singularity at $x = y$) at $l = 1$:

$$C(x - y) = \int_1^\infty \frac{dl}{l}\, l^{-2[\phi]}\, u\Big(\frac{x-y}{l}\Big) \qquad [3]$$

$C(x - y)$ is positive-definite and everywhere smooth.
Being positive-definite, it qualifies as the covariance
of a Gaussian probability measure denoted $\mu_C$ on a
function space $\Omega$ (which it is not necessary to specify
any further). The covariance $C$ being smooth implies
that the sample fields of the measure are $\mu_C$-almost
everywhere sufficiently differentiable.





Remark. Note that, more generally, we could have
cut off the lower end point singularity in [2] at any
$\epsilon > 0$. The $\epsilon$-cutoff covariance is related to the unit
cutoff covariance by a scale transformation (defined
below) and we will exploit this relation later.


Let $L > 1$ be any real number. We define a scale
transformation $S_L$ on fields $\phi$ by

$$S_L\phi(x) = L^{-[\phi]}\,\phi\Big(\frac{x}{L}\Big) \qquad [4]$$

on covariances by

$$S_LC(x - y) = L^{-2[\phi]}\,C\Big(\frac{x-y}{L}\Big) \qquad [5]$$

and on functions of fields $F(\phi)$ by

$$S_LF(\phi) = F(S_L\phi) \qquad [6]$$

The scale transformations form a multiplicative
group: $S_L^n = S_{L^n}$.
Now define a fluctuation covariance $\Gamma_L$:

$$\Gamma_L(x - y) = \int_1^L \frac{dl}{l}\, l^{-2[\phi]}\, u\Big(\frac{x-y}{l}\Big) \qquad [7]$$

$\Gamma_L(x - y)$ is smooth, positive-definite and of fast
decrease on scale $L$. It generates a key scaling
decomposition

$$C(x - y) = \Gamma_L(x - y) + S_LC(x - y) \qquad [8]$$

Iterating this, we get

$$C(x - y) = \sum_{n=0}^{\infty} \Gamma_n(x - y) \qquad [9]$$

where

$$\Gamma_n(x - y) = S_{L^n}\Gamma_L(x - y) = L^{-2n[\phi]}\,\Gamma_L\Big(\frac{x-y}{L^n}\Big) \qquad [10]$$

The functions $\Gamma_n(x - y)$ are of fast decrease on scale
$L^{n+1}$.
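
The scaling decomposition can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $[\phi] = 1/2$ (the canonical value for $d = 3$), a Gaussian profile $u(r) = e^{-r^2}$ in the distance $r = |x-y|$, and evaluates the $l$-integrals with a midpoint rule after the substitution $l = e^s$. The names `PHI_DIM`, `S_MAX`, `scale_integral`, `scaled` and `Gamma_n` are ours, not the article's.

```python
import math

PHI_DIM = 0.5   # assumed field dimension [phi] = (d - 2)/2 with d = 3
S_MAX = 40.0    # truncation of the upper limit in s = log(l); the tail is of order exp(-40)

def u(r):
    # Assumed cutoff profile: a Gaussian in the distance r = |x - y|.
    return math.exp(-r * r)

def scale_integral(r, s_lo, s_hi, n=20000):
    # Midpoint rule for int_{s_lo}^{s_hi} ds exp(-2[phi] s) u(r exp(-s)),
    # i.e. int dl/l l^{-2[phi]} u(r/l) after substituting l = exp(s).
    h = (s_hi - s_lo) / n
    total = 0.0
    for i in range(n):
        s = s_lo + (i + 0.5) * h
        total += math.exp(-2.0 * PHI_DIM * s) * u(r * math.exp(-s))
    return total * h

def C(r):
    # Unit-cutoff covariance: l runs over [1, infinity).
    return scale_integral(r, 0.0, S_MAX)

def Gamma(r, L):
    # Fluctuation covariance: l runs over [1, L].
    return scale_integral(r, 0.0, math.log(L))

def scaled(K, r, L):
    # (S_L K)(r) = L^{-2[phi]} K(r / L)
    return L ** (-2.0 * PHI_DIM) * K(r / L)

def Gamma_n(r, L, n):
    # Gamma_n = S_{L^n} Gamma_L
    Ln = L ** n
    return Ln ** (-2.0 * PHI_DIM) * Gamma(r / Ln, L)
```

With these definitions, `C(r)` agrees with `Gamma(r, L) + scaled(C, r, L)` up to quadrature error, and summing `Gamma_n` over the first few scales plus the rescaled remainder `scaled(C, r, L**N)` reproduces `C(r)` as well.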

Thus, [9] achieves the decomposition into a sum
over increasing length scales as desired. Being
positive-definite, the $\Gamma_n$ qualify as covariances of
Gaussian probability measures, and therefore
$\mu_C = \bigotimes_{n=0}^{\infty} \mu_{\Gamma_n}$. Correspondingly, introduce a family
of independent Gaussian random fields $\zeta_n$, called
fluctuation fields, distributed according to $\mu_{\Gamma_n}$. Then

$$\phi = \sum_{n=0}^{\infty} \zeta_n \qquad [11]$$

Note that the fluctuation fields $\zeta_n$ are slowly varying
over length scales $L^n$. In fact, an easy estimate using
a Tchebycheff inequality shows that, for any $\gamma > 0$,

$$|x - y| \le L^n \;\Longrightarrow\; \mu_C\big(|\zeta_n(x) - \zeta_n(y)| \ge \gamma\big) \le \mathrm{const.}\,\gamma^{-2} \qquad [12]$$

which reveals the slowly varying nature of $\zeta_n$ on
scale $L^n$. Equation [11] is an example of a multiscale
decomposition of a Gaussian random field.

The above implies that the $\mu_C$ integral of a function
can be written as a multiple integral over the fields $\zeta_n$.
We calculate it by integrating out the fluctuation fields
$\zeta_n$ step by step, going from shorter to longer length
scales. This can be accomplished by the iteration of a
single transformation $T_L$, a renormalization group
transformation, as follows. Let $F(\phi)$ be a function of fields.
Then we define an RG transformation $F \to T_LF$ by

$$(T_LF)(\phi) = S_L\,\mu_{\Gamma_L} * F(\phi) = \int d\mu_{\Gamma_L}(\zeta)\,F(\zeta + S_L\phi) \qquad [13]$$

Thus the renormalization group transformation
consists of a convolution with the fluctuation
measure followed by a rescaling.


Semigroup Property 

The discrete RG transformations form a semigroup:

$$T_LT_{L^n} = T_{L^{n+1}} \quad \text{for all } n \ge 0 \qquad [14]$$

To prove this, we must first see how scaling
commutes with convolution with a measure. We
have the property

$$\mu_{\Gamma_L} * S_LF = S_L\,\mu_{S_L\Gamma_L} * F \qquad [15]$$

To see this, observe first that if $\zeta$ is a Gaussian
random field distributed with covariance $\Gamma_L$, then the
Gaussian field $S_L\zeta$ is distributed according to $S_L\Gamma_L$.
This can be checked by computing the covariance of
$S_L\zeta$. Now the left-hand side of [15] is just the
integral of $F(S_L\zeta + S_L\phi)$ with respect to $d\mu_{\Gamma_L}(\zeta)$.
By the previous observation, this is the integral of
$F(\zeta + S_L\phi)$ with respect to $d\mu_{S_L\Gamma_L}(\zeta)$, and the latter is
the right-hand side of [15]. Now we can check the
semigroup property trivially:

$$\begin{aligned}
T_LT_{L^n}F &= S_L\,\mu_{\Gamma_L} * S_{L^n}\,\mu_{\Gamma_{L^n}} * F \\
&= S_LS_{L^n}\,\mu_{S_{L^n}\Gamma_L} * \mu_{\Gamma_{L^n}} * F \\
&= S_{L^{n+1}}\,\mu_{\Gamma_{L^n} + S_{L^n}\Gamma_L} * F \\
&= S_{L^{n+1}}\,\mu_{\Gamma_{L^{n+1}}} * F \\
&= T_{L^{n+1}}F
\end{aligned} \qquad [16]$$

We have used the fact that $\Gamma_{L^n} + S_{L^n}\Gamma_L = \Gamma_{L^{n+1}}$, where
$\Gamma_{L^n}$ denotes [7] with $L$ replaced by $L^n$. This
is because $S_{L^n}\Gamma_L$ has the representation [7] with
integration interval changed to $[L^n, L^{n+1}]$.
We note some properties of $T_L$. $T_L$ has a unique
invariant measure, namely $\mu_C$: for any bounded
function $F$,

$$\int d\mu_C\; T_LF = \int d\mu_C\; F \qquad [17]$$

To understand [17], recall the earlier observation
that if $\phi$ is distributed according to the covariance $C$,
then $S_L\phi$ is distributed according to $S_LC$. By [8],
$\Gamma_L + S_LC = C$. Therefore,

$$\int d\mu_C\; T_LF = \int d\mu_{S_LC}\;\mu_{\Gamma_L} * F = \int d\mu_C\; F \qquad [18]$$

The uniqueness of the invariant measure follows
from the fact that the semigroup $T_L$ is realized by a
convolution with a probability measure and, therefore,
is positivity improving:

$$F \ge 0 \ \ \mu_C\text{-a.e.} \;\Longrightarrow\; T_LF > 0 \ \ \mu_C\text{-a.e.}$$

Finally, note that $T_L$ is a contraction semigroup
on $L^p(d\mu_C)$ for $1 \le p < \infty$. To see this, note that
since $T_L$ is a convolution with a probability measure,
$T_LF = \mu_{S_{L^{-1}}\Gamma_L} * S_LF$, we have, via Hölder’s inequality,
$|T_LF|^p \le T_L|F|^p$. Then use the fact that $\mu_C$ is an
invariant measure.
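
The semigroup and invariance properties can be illustrated in a zero-dimensional toy model. This is an assumption made purely for illustration: the field is a single real variable with unit variance $c = 1$, the scaling is $S_L\phi = L^{-a}\phi$ with a hypothetical exponent $a$, and the fluctuation variance is $\gamma_L = 1 - L^{-2a}$, so that $c = \gamma_L + L^{-2a}c$ mimics [8]. Gaussian expectations are computed with Gauss-Hermite quadrature, which is exact for the polynomial test function used here.

```python
import numpy as np

A = 0.5  # hypothetical scaling exponent, playing the role of [phi]

# Nodes and weights for the weight exp(-x^2/2); normalise to a probability measure.
X, W = np.polynomial.hermite_e.hermegauss(30)
W = W / np.sqrt(2.0 * np.pi)

def gamma(L):
    # Toy fluctuation variance, chosen so that 1 = gamma(L) + L^{-2A} * 1.
    return 1.0 - L ** (-2.0 * A)

def T(L, F):
    # (T_L F)(phi) = E[F(zeta + L^{-A} phi)] with zeta ~ N(0, gamma(L)).
    def TF(phi):
        phi = np.asarray(phi, dtype=float)
        vals = F(np.sqrt(gamma(L)) * X + L ** (-A) * phi[..., None])
        return vals @ W
    return TF

def expect(F, var=1.0):
    # Expectation of F under the centred Gaussian with the given variance.
    return float(F(np.sqrt(var) * X) @ W)

F = lambda p: p ** 4 + p ** 2  # a polynomial test function
```

In this toy model `T(L, T(L**n, F))` and `T(L**(n + 1), F)` agree pointwise, and `expect(T(L, F))` equals `expect(F)`, which are the analogues of [14] and [17].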


Eigenfunctions 


Let $:\!P_{n,m}\!:(\phi(x))$ be a $C$-Wick-ordered local monomial
of $m$ fields with $n$ derivatives. Define

$$P_{n,m}(X) = \int_X dx\; :\!P_{n,m}\!:(\phi(x))$$

The functions $P_{n,m}(X)$ play the role of eigenfunctions
of the RG transformation $T_L$ up to a scaling of
volume:

$$T_LP_{n,m}(X) = L^{d - m[\phi] - n}\,P_{n,m}(L^{-1}X) \qquad [19]$$

Because of the scaling in volume, the $P_{n,m}(X)$ are not
true eigenfunctions. Nevertheless, they are very
useful because they play an important role in the
analysis of the evolution of the dynamical system
which we will later associate with $T_L$. They are
classified as expanding (relevant), contracting
(irrelevant) or central (marginal), depending on
whether the exponent of $L$ on the right-hand side
of [19] is positive, negative, or zero, respectively.
This depends, of course, on the space dimension
$d$ and the field dimension $[\phi]$.
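
The classification is simple power counting and can be written down directly. The helper below is a sketch which assumes that the exponent of $L$ in [19] is $d - m[\phi] - n$ and that, unless stated otherwise, the field has the canonical dimension $[\phi] = (d-2)/2$. The function names are ours.

```python
from fractions import Fraction

def rg_exponent(d, m, n, phi_dim):
    # Exponent of L in [19], assuming T_L P = L^(d - m[phi] - n) P.
    return Fraction(d) - m * Fraction(phi_dim) - n

def classify(d, m, n=0, phi_dim=None):
    # d: integer space dimension, m: number of fields, n: number of derivatives.
    # Default to the canonical dimension [phi] = (d - 2)/2 of the massless free field.
    if phi_dim is None:
        phi_dim = Fraction(d - 2, 2)
    e = rg_exponent(d, m, n, phi_dim)
    if e > 0:
        return "relevant"
    if e < 0:
        return "irrelevant"
    return "marginal"
```

For example, `classify(4, 4)` labels the quartic monomial in four dimensions as marginal, while `classify(5, 4)` and `classify(3, 4)` give irrelevant and relevant, respectively.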

Gaussian measures are of limited interest. But we 
can create new measures by perturbing the Gaussian
measure $\mu_C$ with local interactions. We cannot
study directly the situation where the interactions 
are in infinite volume. Instead, we put them in a 
very large volume which will eventually go to 
infinity. We have a ratio of two length scales, one 
from the size of the diameter of the volume and the 
other from the ultraviolet cutoff in $\mu_C$, and this
ratio is enormous. The RG is useful whenever there 
are two length scales whose ratio is very large. It 
permits us to do a scale-by-scale analysis and at 
each step the volume is reduced at the cost of 
changing the interactions. The largeness of the ratio 
is reflected in the large number of steps to be 
accomplished, this number tending eventually to 
infinity. This large number of steps has to be 
controlled mathematically. 


Perturbation of the Gaussian Measure 


Let $\Lambda_N = [-L^N/2, L^N/2]^d \subset \mathbb{R}^d$ be a large cube.
For any $X \subset \Lambda_N$, let $V_0(X, \phi)$ be a local
semibounded function where the fields are
restricted to the set $X$. Here “local” means that if
$X, Y$ are sets with disjoint interiors then
$V_0(X \cup Y, \phi) = V_0(X, \phi) + V_0(Y, \phi)$. Consider the integral
(known as the partition function in QFT and
statistical mechanics)

$$Z(\Lambda_N) = \int d\mu_C(\phi)\, z_0(\Lambda_N, \phi) \qquad [20]$$

where

$$z_0(X, \phi) = e^{-V_0(X, \phi)} \qquad [21]$$

and

$$d\mu(\Lambda_N, \phi) = \frac{1}{Z(\Lambda_N)}\, d\mu_C(\phi)\, z_0(\Lambda_N, \phi) \qquad [22]$$

is the corresponding probability measure. $V_0$ is
typically not quadratic in the fields and therefore
leads to a non-Gaussian perturbation. For example,

$$V_0(X, \phi) = \int_X dx\,\big(\xi_0 |\nabla\phi(x)|^2 + g_0\,\phi^4(x) + \mu_0\,\phi^2(x)\big) \qquad [23]$$

where we take $g_0 > 0$. The integral [20] is well
defined because the sample fields are smooth.

We now proceed to the scale-by-scale analysis
mentioned earlier. Because $\mu_C$ is an invariant
measure of $T_L$, we can write the partition function
$Z(\Lambda_N)$ in the volume $\Lambda_N$ as

$$Z(\Lambda_N) = \int d\mu_C(\phi)\, z_0(\Lambda_N, \phi) = \int d\mu_C(\phi)\, T_Lz_0(\Lambda_N, \phi) \qquad [24]$$

The integrand on the right-hand side is a new
function of fields which, because of the final scaling,
live in the smaller volume $\Lambda_{N-1}$. This leads to the
following definition:

$$z_1(\Lambda_{N-1}, \phi) = T_Lz_0(\Lambda_N, \phi) \qquad [25]$$


Because Vo is local, zo has a factorization property for 
unions of sets with disjoint interiors. This is no longer 
the case for z1. Wilson noted that, nevertheless, the 
integral is well approximated by an integrand which 
does, but the approximator has new coupling con- 
stants. The phrase “well approximated” is what all the 
rigorous work is about and this was not evident in the 
early Wilson era. The idea is to extract out a local part 
and also consider the remainder. The local part leads 
to a flow of coupling constants and the (unexponen- 
tiated) remainder is an irrelevant term. This operation 
and its mathematical control is an essential feature of 
RG analysis. 

Iterating the above transformation, we get, for all
$0 \le n < N$,

$$z_{n+1}(\Lambda_{N-n-1}, \phi) = T_Lz_n(\Lambda_{N-n}, \phi) \qquad [26]$$

After $N$ iterations, we get

$$Z(\Lambda_N) = \int d\mu_C(\phi)\, z_N(\Lambda_0, \phi) \qquad [27]$$

where $\Lambda_0$ is the unit cube. To take the limit as
$N \to \infty$, we have to control the infinite sequence
of iterations. We cannot hope to control the
infinite sequence at the level of the entire partition
function. Instead, one chooses representative coordinates
for which the infinite sequence has a
chance of having a meaning. The coordinates are
provided by the coupling constants of the
extracted local part and the irrelevant terms (an
approximate calculation of the flow of coupling
constants is given in the next section). The
existence of a global trajectory for such coordinates
helps us to control the limit for moments of
the probability measure (correlation functions).
The question of coordinates and the representation
of the irrelevant terms will be taken up in the
section “Rigorous RG analysis.”


Ultraviolet Cutoff Removal 


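The next issue is ultraviolet cutoff removal in field
theory. This problem can be put into the earlier
framework as follows. Let $\epsilon_N$ be a sequence of
positive numbers which tend to 0 as $N \to \infty$.
Following the remark after [3], we replace the unit
cutoff covariance $C$ by the covariance $C_{\epsilon_N}$ defined
by taking $\epsilon_N$ (instead of 1) as the lower end point in
the integral [3]. Thus, $\epsilon_N$ acts as a short-distance or
ultraviolet cutoff. It is easy to see that

$$C_{\epsilon_N}(x - y) = S_{\epsilon_N}C(x - y) \qquad [28]$$

Consider the partition function $Z_{\epsilon_N}(\Lambda)$ in a cube
$\Lambda = [-R/2, R/2]^d$:

$$Z_{\epsilon_N}(\Lambda) = \int d\mu_{C_{\epsilon_N}}(\phi)\, e^{-V_0(\Lambda, \phi, \tilde\xi_N, \tilde g_N, \tilde\mu_N)} \qquad [29]$$

where $V_0$ is given by [23] with $\xi_0, g_0, \mu_0$ replaced by
$\tilde\xi_N, \tilde g_N, \tilde\mu_N$, respectively. By dimensional analysis we can
write

$$\tilde\xi_N = \epsilon_N^{(2[\phi] + 2 - d)}\,\xi_N, \qquad \tilde g_N = \epsilon_N^{(4[\phi] - d)}\,g_N, \qquad \tilde\mu_N = \epsilon_N^{(2[\phi] - d)}\,\mu_N \qquad [30]$$

where $\xi_N, g_N, \mu_N$ are dimensionless parameters. Now $\phi$
distributed according to $C_{\epsilon_N}$ equals in distribution
$S_{\epsilon_N}\phi$ distributed according to $C$. Therefore, choosing
$\epsilon_N = L^{-N}$, we get

$$Z_{\epsilon_N}(\Lambda) = \int d\mu_C(\phi)\, e^{-V_0(\Lambda, S_{\epsilon_N}\phi, \tilde\xi_N, \tilde g_N, \tilde\mu_N)} = \int d\mu_C(\phi)\, e^{-V_0(\Lambda_N, \phi, \xi_N, g_N, \mu_N)} \qquad [31]$$

where $\Lambda_N = [-L^NR/2, L^NR/2]^d$. Thus, the field
theory problem of removing the ultraviolet cutoff,
that is, taking the limit $\epsilon_N \to 0$, has been reduced to
the study of a statistical mechanical model in a very
large volume. The latter has to be analyzed via RG
iterations as before.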


Critical Field Theories 


As mentioned earlier, we have to study the flow of 
local interactions as well as that of irrelevant terms. 
Together they constitute the RG trajectory and we 
have to prove that it exists globally. In general, the 


trajectory will tend to explode after a large number 
of iterations due to growing relevant terms (char- 
acterized in terms of the expanding Wick monomials 
mentioned earlier). Wilson pointed out that the 
saving factor is to exploit fixed points and their 
invariant manifolds by tuning the initial interaction 
so that the RG has a global trajectory. This leads to 
the notion of a critical manifold which can be 
defined as follows. A fixed point will have contract- 
ing and/or marginal attractive directions besides the 
expanding ones. In the language of dynamical 
systems, the critical manifold is the stable or center 
stable manifold of the fixed point in question. This 
is determined by a detailed study of the discrete 
flow. In the examples above, it amounts to fixing
the initial “mass” parameter $\mu_0 = \mu_c(g_0)$ with a
suitable function $\mu_c$ such that the flow remains
bounded in an invariant set. The critical manifold is
then the graph of a function from the space of
contracting and marginal variables to the space of
$\mu$’s which remains invariant under the flow.
Restricted to it the flow will now converge to a 
fixed point. All references to initial coupling 
constants have disappeared. The result is known as 
a critical theory. 

Critical theories have been rigorously constructed
in a number of cases. Take the standard $\phi^4$ in $d$
dimensions. Then $[\phi] = (d-2)/2$. For $d \ge 5$ the $\phi^4$
interaction is irrelevant and the Gaussian fixed point
is attractive with one unstable direction (corresponding
to $\mu$). In this case one can prove that the
interactions converge exponentially fast to the
Gaussian fixed point on the critical manifold. For
$d = 4$ the interaction is marginal and the Gaussian
fixed point attractive for $g > 0$. The critical theory
has been constructed by Gawedzki and Kupiainen
(1984) starting with a sufficiently small coupling
constant. The fixed point is Gaussian (interactions
vanish in the limit) and the convergence rate is
logarithmic. This is thus a mean-field theory with
logarithmic corrections, as expected on heuristic
grounds. The mathematical construction of the
critical theory in $d = 3$ is an open problem. (It is
expected to exist with a non-Gaussian fixed point,
and this is indicated by the perturbative $\epsilon$ expansion
of Wilson and Fisher in $4 - \epsilon$ dimensions.) However,
the critical theory for $d = 3$ with $[\phi] = (3 - \epsilon)/4$,
for $\epsilon > 0$ held very small, has been rigorously
constructed by Brydges et al. (2003). This theory
has a nontrivial hyperbolic fixed point of $O(\epsilon)$. The
stable manifold is constructed in a small neighborhood
of the fixed point. Note that the covariance
without cutoff is Osterwalder-Schrader positive and
thus this is a candidate for a nontrivial EQFT. For
$\epsilon = 1$ we have the standard situation in $d = 3$, and


this remains open, as mentioned earlier. A very 
simplified picture of the above is furnished by the 
perturbative computation in the next section. 


Unstable Fixed Points 


We may attempt to construct field theories around 
unstable fixed points. In this case the initial 
parameters have to be adjusted as functions of the 
cutoff in such a way as to stabilize the flow in the 
neighborhood of the fixed point. This may be called 
a genuine renormalization. A famous example of 
this is pure Yang-Mills theory in d=4, where the 
Gaussian fixed point has only marginal unstable 
directions. Balaban in a series of papers ending in 
Balaban (1989) considered Wilson’s lattice cutoff 
version of Yang-Mills theory in d=4 with initial 
coupling fixed by the two-loop asymptotic freedom 
formula. He proved, by lattice RG iterations, that in 
the weak-coupling regime the free energy per unit 
volume is bounded above and below by constants 
independent of the lattice spacing. Instability of the 
flow is expected to lead to mass generation for 
observables but this is a famous open problem. 
Another example is the standard nonlinear sigma 
model for d=2. Here too the flow is unstable 
around the Gaussian fixed point and we can set the 
initial coupling constant by the two-loop asymptotic 
freedom formula. Although much is known via 
approximation methods (as well as by methods 
based on integrable systems) this theory remains to 
be rigorously constructed as an EQFT. 

Let us now consider a relatively simpler
example, that of constructing a massive
super-renormalizable scalar field theory. This has been
studied in $d = 3$, with $[\phi] = (d-2)/2 = 1/2$. With the
dimensionful parameters in [30] held fixed at $\tilde\xi, \tilde g, \tilde\mu$, we
get $\xi_N = \tilde\xi$, $g_N = L^{-N}\tilde g$, $\mu_N = L^{-2N}\tilde\mu$, and $\tilde g$ is taken to be
small. $\xi_N$ is marginal, whereas $g_N, \mu_N$ are relevant
parameters and grow with the iterations. After $N$
iterations, they are brought up to $\tilde g, \tilde\mu$ together
with remainders. This realizes the so-called
massive continuum $\phi^4$ theory in $d = 3$, and this
has been mathematically controlled in the exact
RG framework. This was proved by Brydges,
Dimock, and Hurd and earlier by Benfatto,
Cassandro, Gallavotti, and others (see the references
in Brydges et al. (1998) and Benfatto and
Gallavotti (1995)).


The Exact RG as a Continuous Semigroup 


The discrete semigroup defined in [13] of the previous 
section has a natural continuous counterpart. Just take 
$L$ to be a continuous parameter, $L=e^{t}$, $t\ge 0$, and
write by abuse of notation $\Gamma_t$, $S_t$, $T_t$ instead of $\Gamma_{e^t}$, etc.
The continuous transformations $T_t$,

$$T_tF = S_t\,\mu_{\Gamma_t}*F \qquad [32]$$

give a semigroup

$$T_tT_s = T_{t+s} \qquad [33]$$

of contractions on $L^2(d\mu_C)$ with $\mu_C$ as invariant
measure. One can show that $T_t$ is strongly continuous
and, therefore, has a generator which we will
call $\mathcal{L}$. This is defined by

$$\mathcal{L}F = \lim_{t\to 0^+}\frac{T_t-1}{t}\,F \qquad [34]$$


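whenever this limit exists. This restricts $F$ to a
suitable subspace $\mathcal{D}(\mathcal{L})\subset L^2(d\mu_C)$. $\mathcal{D}(\mathcal{L})$ contains,
for example, polynomials in fields as well as twice-differentiable
bounded cylindrical functions. The
generator $\mathcal{L}$ can be easily computed. To state it, we
need some definitions. Define $(D^nF)(\phi;f_1,\dots,f_n)$ as
the $n$th tangent map at $\phi$ along directions $f_1,\dots,f_n$.
The functional Laplacian $\Delta_{\dot\Gamma}$ is defined by

$$\Delta_{\dot\Gamma}F(\phi) = \int d\mu_{\dot\Gamma}(\zeta)\,(D^2F)(\phi;\zeta,\zeta) \qquad [35]$$

where $\dot\Gamma=u$. Define an infinitesimal dilatation
operator

$$\mathcal{D}\phi(x) = x\cdot\nabla\phi(x) \qquad [36]$$

and a vector field $\mathcal{X}$,

$$\mathcal{X}F = -[\phi]\,(DF)(\phi;\phi) - (DF)(\phi;\mathcal{D}\phi) \qquad [37]$$

Then, an easy computation gives

$$\mathcal{L} = \tfrac12\,\Delta_{\dot\Gamma} + \mathcal{X} \qquad [38]$$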


$T_t$ is a semigroup with $\mathcal{L}$ as generator. Therefore,
$T_t=e^{t\mathcal{L}}$. Let $F_t(\phi)=T_tF(\phi)$. Then $F_t$ satisfies the
linear PDE

$$\frac{\partial F_t}{\partial t} = \mathcal{L}F_t \qquad [39]$$

with the initial condition $F_0=F$. This evolution
equation assumes a more familiar form if we write
$F_t=e^{-V_t}$, $V_t$ being known as the effective potential.
We get

$$\frac{\partial V_t}{\partial t} = \mathcal{L}V_t - \frac12\,(V_t)_\phi\cdot(V_t)_\phi \qquad [40]$$

where

$$(V_t)_\phi\cdot(V_t)_\phi = \int d\mu_{\dot\Gamma}(\zeta)\,\bigl((DV_t)(\phi;\zeta)\bigr)^2 \qquad [41]$$

and $V_0=V$. This infinite-dimensional nonlinear
PDE is a version of Wilson's flow equation.
Note that the linear semigroup $T_t$ acting on
functions induces a semigroup $R_t$ acting nonlinearly
on effective potentials giving a trajectory
$V_t=R_tV_0$.
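
As a reader's aid, the passage from the linear equation [39] to the nonlinear equation [40] can be spelled out by substituting $F_t=e^{-V_t}$. The sketch below assumes only the definitions [35], [37], [38] and the fact that $\mathcal{X}$ acts as a first-order derivation; it is not part of the original argument.

```latex
\begin{align*}
\partial_t e^{-V_t} &= -(\partial_t V_t)\,e^{-V_t}, \\
\mathcal{X}e^{-V_t} &= -(\mathcal{X}V_t)\,e^{-V_t}, \\
\tfrac12\Delta_{\dot\Gamma}e^{-V_t}
  &= \Bigl[-\tfrac12\Delta_{\dot\Gamma}V_t
     + \tfrac12\int d\mu_{\dot\Gamma}(\zeta)\,
       \bigl((DV_t)(\phi;\zeta)\bigr)^2\Bigr]e^{-V_t}.
\end{align*}
% Insert these into \partial_t F_t = \mathcal{L}F_t with
% \mathcal{L} = \tfrac12\Delta_{\dot\Gamma} + \mathcal{X}
% and divide by -e^{-V_t}:
\[
\partial_t V_t = \mathcal{L}V_t
  - \tfrac12\int d\mu_{\dot\Gamma}(\zeta)\,
    \bigl((DV_t)(\phi;\zeta)\bigr)^2 .
\]
```

The quadratic term arises entirely from the second derivative of the exponential, which is why the first-order part $\mathcal{X}$ contributes no nonlinearity.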

Equations like the above are notoriously difficult 
to control rigorously, especially for large times. 
However, they may be solved in formal perturbation
theory when the initial $V_0$ is small via the presence
of small parameters. In particular, they give rise
easily to perturbative flow equations for coupling
constants. These can be obtained to any order, but
then there is the remainder. It is hard to control the
remainder from the flow equation for effective
potentials in bosonic field theories; this requires
other methods based on the discrete RG. Nevertheless,
these approximate perturbative flows are
very useful for getting a preliminary view of the
flow. Moreover, their discrete versions figure as an
input in further nonperturbative analysis.


Perturbative Flow 


It is instructive to see this in second-order perturbation
theory. We will simplify by working in infinite
volume (no infrared divergences can arise because
$\dot\Gamma(x-y)$ is of fast decrease). Now suppose that we
are in standard $\phi^4$ theory with $[\phi]=(d-2)/2$ and
$d>2$. We want to show that

$$V_t = \int dx\,\bigl(\xi_t\,{:}|\nabla\phi(x)|^2{:} + g_t\,{:}\phi(x)^4{:} + \mu_t\,{:}\phi(x)^2{:}\bigr) \qquad [42]$$


satisfies the flow equation in second order modulo
irrelevant terms provided the parameters flow
correctly. We will ignore field-independent terms.
The Wick ordering is with respect to the covariance
$C$ of the invariant measure. The reader will notice
that we have ignored a $\phi^6$ term which is actually
relevant in $d=3$ for the above choice of $[\phi]$. This is
because we will only discuss the $d=3$ case for the
model discussed at the end of this section and for
this case the $\phi^6$ term is irrelevant. We will assume
that $\xi_t,\mu_t$ are of order $O(g^2)$. Plug in the above in
the flow equation. The quantity $\lambda_t^{n,m}\,{:}P_{n,m}{:}$ represents
one of the terms above with $m$ fields and $n$
derivatives. Because $\mathcal{L}$ is the generator of the
semigroup $T_t$ we have

$$\Bigl(\frac{\partial}{\partial t}-\mathcal{L}\Bigr)\lambda_t^{n,m}\,{:}P_{n,m}{:} = \Bigl(\frac{d\lambda_t^{n,m}}{dt} - (d-m[\phi]-n)\,\lambda_t^{n,m}\Bigr)\,{:}P_{n,m}{:} \qquad [43]$$





Next turn to the nonlinear term in the flow equation
and insert the $\phi^4$ term (the others are already of
order $O(g^2)$). This produces a double integral of
$\dot\Gamma(x-y)\,{:}\phi(x)^3{:}\,{:}\phi(y)^3{:}$, which after complete Wick
ordering, gives

$$-\frac{g_t^2}{2}\,16\int dx\,dy\;\dot\Gamma(x-y)\Bigl({:}\phi(x)^3\phi(y)^3{:} + 9\,C(x-y)\,{:}\phi(x)^2\phi(y)^2{:} + 36\,C(x-y)^2\,{:}\phi(x)\phi(y){:} + 6\,C(x-y)^3\Bigr) \qquad [44]$$


Consider the nonlocal $\phi^4$ term. We can localize it by
writing

$${:}\phi(x)^2\phi(y)^2{:} = \frac12\,{:}\Bigl(\phi(x)^4+\phi(y)^4-\bigl(\phi(x)^2-\phi(y)^2\bigr)^2\Bigr){:} \qquad [45]$$

The local part gives a $\phi^4$ contribution and the last
term above gives rise to an irrelevant contribution
because it produces additional derivatives. The
coefficients are well defined because $C$, $\dot\Gamma$ are smooth
and $\dot\Gamma(x-y)$ is of fast decrease. Now the nonlocal $\phi^2$
term is similarly localized. It gives a relevant local $\phi^2$
contribution as well as a marginal $|\nabla\phi|^2$ contribution.
Finally, the same principle applies to the
nonlocal $\phi^6$ contribution and gives rise to further
irrelevant terms. Then it is easy to see by matching
that the flow equation is satisfied in second order up
to irrelevant terms (these would have to be compensated
by adding additional terms in $V_t$) provided

$$\frac{dg_t}{dt} = (4-d)\,g_t - a\,g_t^2 + O(g_t^3)$$
$$\frac{d\mu_t}{dt} = 2\mu_t - b\,g_t^2 + O(g_t^3) \qquad [46]$$
$$\frac{d\xi_t}{dt} = c\,g_t^2 + O(g_t^3)$$


where $a,b,c$ are positive constants. We see from the
above formulas that, up to second order in $g$, as
$t\to\infty$, $g_t\to 0$ for $d\ge 4$. In fact, for $d\ge 5$ the decay
rate is $O(e^{-t})$ and for $d=4$ the rate is $O(t^{-1})$.
However, to see if $V_t$ converges, we also have to
discuss the $\mu_t,\xi_t$ flows. It is clear that in general the
$\mu_t$ flow will diverge. This is fixed by choosing the
initial $\mu_0$ to be the bare critical mass. This is
obtained by integrating up to time $t$ and then
expressing $\mu_0$ as a function of the entire $g$ trajectory
up to time $t$. Assume that $\mu_t$ is uniformly bounded
and take $t\to\infty$. This gives the critical mass as

$$\mu_0 = b\int_0^\infty ds\;e^{-2s}\,g_s^2 = \mu_c(g_0) \qquad [47]$$

This integral converges for all cases discussed above.
With this choice of $\mu_0$ we get

$$\mu_t = b\int_0^\infty ds\;e^{-2s}\,g_{s+t}^2 \qquad [48]$$

and this exists for all $t$ and converges as $t\to\infty$.
Now consider the perturbative $\xi_t$ flow. It is easy to
see from the above that for $d\ge 4$, $\xi_t$ converges as
$t\to\infty$.

We have not discussed the $d=3$ case because the
perturbative $g$ fixed point is of order $O(1)$. But
suppose we take, in the $d=3$ case, $[\phi]=(3-\varepsilon)/4$
with $\varepsilon>0$ held small as in Brydges et al. (2003).
Then the above perturbative flow equations are
easily modified (by taking account of [43]) and we
get, to second order, an attractive fixed point
$g_*=O(\varepsilon)$ of the $g$ flow. The critical bare mass $\mu_0$
can be determined as before and the $\xi_t$ flow
converges. The qualitative picture obtained above
has a rigorous justification.
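
The qualitative behaviour of the truncated flow [46] can be explored numerically. The following is a minimal sketch, not taken from the source: the function names, the choice $a=b=1$, the step size, and the cutoff time are all illustrative assumptions, and the $O(g^3)$ remainders are simply dropped. The parameter `lam` stands for the linear coefficient $d-4[\phi]$ given by [43], so that `lam = 0` models $d=4$ and `lam = eps` models $d=3$ with $[\phi]=(3-\varepsilon)/4$.

```python
import math

def g_trajectory(g0, lam, a=1.0, t_max=50.0, dt=0.01):
    """RK4 integration of dg/dt = lam*g - a*g**2 (second-order truncation of [46]).

    Returns the samples g_0, g_dt, g_2dt, ... up to t_max.
    The values of a, t_max and dt are illustrative assumptions.
    """
    def rhs(g):
        return lam * g - a * g * g

    n = int(round(t_max / dt))
    g, gs = g0, [g0]
    for _ in range(n):
        k1 = rhs(g)
        k2 = rhs(g + 0.5 * dt * k1)
        k3 = rhs(g + 0.5 * dt * k2)
        k4 = rhs(g + dt * k3)
        g += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        gs.append(g)
    return gs

def critical_mass(gs, b=1.0, dt=0.01):
    """Trapezoidal estimate of mu_c = b * int_0^inf ds exp(-2s) g_s^2, cf. [47].

    The integral is truncated at the last sample of the trajectory.
    """
    vals = [math.exp(-2.0 * i * dt) * g * g for i, g in enumerate(gs)]
    return b * dt * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# d = 4: lam = 0, and the truncated flow has the exact solution g0 / (1 + a*g0*t).
g4 = g_trajectory(0.1, lam=0.0)
mu_c = critical_mass(g4)

# d = 3 with [phi] = (3 - eps)/4: lam = eps, attractive fixed point g_* = eps / a.
eps = 0.1
g3 = g_trajectory(0.02, lam=eps, t_max=200.0)
```

In this truncation the $d=4$ trajectory decays like $1/t$, while the $d=3$ trajectory approaches `eps / a`, in line with the statements above.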


Rigorous RG Analysis 


We will give a brief introduction to rigorous RG
analysis in the discrete setup of the section "The RG
as a Discrete Semigroup," concentrating on the
principal problems encountered and how one
attempts to solve them. Our approach is borrowed
from Brydges et al. (2003). It is a simplification
of the methods initiated by Brydges and Yau
(1990) and developed further by Brydges et al.
(1998). The reader will find other approaches to
rigorous RG methods in the selected references, such
as those of Balaban, Gawedzki and Kupiainen,
Gallavotti, and others. We will take as a concrete
example the scalar field model introduced earlier.

At the core of the analysis is the choice of good
coordinates for the partition function density, $z$, of
the section "The RG as a Discrete Semigroup". This
is provided by a polymer representation (defined
below) which parametrizes $z$ by a couple $(V,K)$,
where $V$ is a local potential and $K$ is a set function
also depending on the fields. Then the RG transformation
$T_L$ maps $(V,K)$ to a new $(V,K)$. $(V,K)$
remain good coordinates as the volume tends to $\infty$,
whereas $z(\text{volume})$ diverges. There exist norms
which are suited to the fixed-point analysis of the map
from $(V,K)$ to the new $(V,K)$. Now comes the important point: $z$
does not uniquely specify the representation $(V,K)$.
Therefore, we can take advantage of this nonuniqueness
to keep $K$ small in norm and let most of the
action of $T_L$ reside in $V$. This process is called
extraction in Brydges et al. (2003). It makes sure that
$K$ is an irrelevant term, whereas the local flow of $V$
gives rise to discrete flow equations in coupling
constants. We will not discuss extraction any further.
In the following, we introduce the polymer representation
and explain how the RG transformation acts
on it.




To proceed further, we first introduce a simplification
in the setup used in the section "The RG as a
Discrete Semigroup." Recall that the function $u$
introduced in [3] was smooth, positive definite, and
of rapid decrease. We will simplify further by
imposing the stronger property that it is actually of
finite range: $u(x)=0$ for $|x|\ge 1$. We say that $u$ is of
finite range 1. It is easy to construct such functions.
For example, if $g$ is any smooth function of finite
range 1/2, then $u=g*g$ is a smooth positive-definite
function of finite range 1. This implies that the
fluctuation covariance $\Gamma_L$ of [7] has finite range $L$.
As a result, $\Gamma_n$ in [10] has finite range $L^{n+1}$ and the
corresponding fluctuation fields $\zeta_n(x)$ and $\zeta_n(y)$ are
independent when $|x-y|\ge L^{n+1}$.
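
The finite-range property of $u=g*g$ is easy to check numerically in one dimension. The sketch below is an illustration under stated assumptions: the bump function, the grid spacing `h`, and the helper names are hypothetical choices, not taken from the source, and the code tests only the range property (not positive definiteness).

```python
import math

def bump(x, r=0.5):
    """Smooth function vanishing for |x| >= r (a standard C-infinity bump)."""
    if abs(x) >= r:
        return 0.0
    return math.exp(-1.0 / (1.0 - (x / r) ** 2))

def convolve_gg(x, h=0.005, r=0.5):
    """Riemann-sum approximation of u(x) = (g * g)(x) = int dy g(y) g(x - y)."""
    n = int(round(r / h))
    return h * sum(bump(k * h, r) * bump(x - k * h, r) for k in range(-n, n + 1))

u_center = convolve_gg(0.0)    # maximal overlap of the two supports
u_inside = convolve_gg(0.6)    # |x| < 1: the supports still overlap
u_edge = convolve_gg(1.0)      # |x| = 1: the supports touch at a single point
u_outside = convolve_gg(1.3)   # |x| > 1: the supports are disjoint
```

Since $g(y)\,g(x-y)$ can be nonzero only when $|y|<1/2$ and $|x-y|<1/2$, the integrand vanishes identically once $|x|\ge 1$, which is what the last two values illustrate.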


Polymer Representation 


Pave $\mathbb{R}^d$ with closed cubes of side length 1 called
1-blocks or unit blocks denoted by $\Delta$, and suppose
that $\Lambda$ is a large cube consisting of unit blocks. A
connected polymer $X\subset\Lambda$ is a closed connected
subset of these unit blocks. A polymer
activity $K(X,\phi)$ is a map $X,\phi\to\mathbb{R}$ where the fields
$\phi$ depend only on the points of $X$. We will set
$K(X,\phi)=0$ if $X$ is not connected. A generic form of
the partition function density $z(\Lambda,\phi)$ after a certain
number of RG iterations is

$$z(\Lambda) = \sum_{N\ge 0}\frac{1}{N!}\sum_{X_1,\dots,X_N} e^{-V(X_c)}\prod_{j=1}^{N}K(X_j) \qquad [49]$$

Here $X_j\subset\Lambda$ are disjoint polymers, $X=\cup_jX_j$, and
$X_c=\Lambda\setminus X$. $V$ is a local potential of the form [23] with
parameters $\xi,g,\mu$. We have suppressed the $\phi$-dependence.
Initially, the activities $K_j=0$, but they will arise
under RG iterations and the form [49] remains stable,
as we will see. The partition function density is thus
parametrized as a couple $(V,K)$.


Norms for Polymer Activities 


Polymer activities $K(X,\phi)$ are endowed with a norm
$\|K(X)\|$, which must satisfy two properties:


XAY = 0 = ||Ki(X)K2(Y)IIS IIK (XI |K (Y)| 
ITKI < AIKE [50] 


where X is the interior of X and |X| is the number 
of blocks in $X$. $c$ is a constant of order $O(1)$. The
norm measures (Fréchet) differentiability properties
of the activity $K(X,\phi)$ with respect to the field
$\phi$ as well as its admissible growth in $\phi$. The
growth is admissible if it is $\mu_C$ integrable. The
second property above ensures the stability of
the norm under RG iteration. For a fixed polymer
$X$, the norm is such that it gives rise to a Banach
space of activities $K(X)$. The final norm $\|\cdot\|_{\mathcal{A}}$
incorporates the previous one and washes out the
set dependence,

$$\|K\|_{\mathcal{A}} = \sup_{\Delta}\sum_{X\supset\Delta}\mathcal{A}(X)\,\|K(X)\| \qquad [51]$$

where $\mathcal{A}(X)=L^{(d+2)|X|}$. This norm essentially ensures
that large polymers have small activities. The details
of the above norms can be found in Brydges et al.
(2003).

The RG operation map $f$ is a composition of two
maps. The RG iteration map $z\to T_Lz$ induces a map
$V\to\tilde V_L$ and a nonlinear map $\tilde T_L:K\to\tilde K=\tilde T_L(K)$.
We then compose this with a (nonlinear) extraction
map $\mathcal{E}$ which takes out the expanding (relevant)
parts of $\tilde K\to\mathcal{E}(\tilde K)=K'$ and compensates the local
potential $\tilde V_L\to V'$ such that $T_Lz$ remains invariant.
We denote by $f$ the composition of these two maps
with

$$V\to V'=f_V(V,K),\qquad K\to K'=f_K(V,K) \qquad [52]$$


The Map $\tilde T_L$


Consider applying the RG map $T_L$ to [49]. The map
consists of a convolution $\mu_{\Gamma_L}*$ followed by the
rescaling $S_L$. In the integration over the fluctuation
field $\zeta$, we will exploit the independence of $\zeta(x)$ and
$\zeta(y)$ when $|x-y|\ge L$. To do this, we pave $\Lambda$ by
closed blocks of side $L$, called $L$-blocks, so that
each $L$-block is a union of 1-blocks. Let $\bar X^L$ be
the $L$-closure of a set $X$, namely the smallest union
of $L$-blocks containing $X$. The polymers will be
combined into $L$-polymers which are, by definition,
connected unions of $L$-blocks. The combination is
performed in such a way that the new polymers are
associated to independent functionals of $\zeta$.
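
The combinatorial part of this step, taking $L$-closures and regrouping them into connected $L$-polymers, can be sketched in a few lines. This is an illustration only: the encoding of a block by the integer coordinates of its lower corner, the convention that closed cubes count as touching when their indices differ by at most 1 in every coordinate, and all function names are assumptions made here, not definitions from the source.

```python
from itertools import product

def l_closure(blocks, L):
    """Indices of the L-blocks meeting a set of unit blocks.

    Their union is the smallest union of L-blocks containing the set.
    """
    return {tuple(c // L for c in b) for b in blocks}

def connected_components(cells):
    """Split a set of blocks into connected polymers.

    Assumption: two closed cubes touch when their index vectors differ
    by at most 1 in every coordinate.
    """
    cells, comps = set(cells), []
    while cells:
        stack = [cells.pop()]
        comp = set(stack)
        while stack:
            c = stack.pop()
            for step in product((-1, 0, 1), repeat=len(c)):
                nb = tuple(ci + si for ci, si in zip(c, step))
                if nb in cells:
                    cells.remove(nb)
                    comp.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return comps

# Two 1-polymers in d = 2 that are disjoint on scale 1 ...
X1 = {(0, 0), (1, 0)}
X2 = {(3, 1)}
# ... but whose L-closures (L = 2) touch, so they merge into one L-polymer.
L_blocks = l_closure(X1 | X2, L=2)
L_polymers = connected_components(L_blocks)
```

The example shows why the regrouping is needed: polymers that are separate on scale 1 may end up in the same connected $L$-polymer.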

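Let $\tilde V(X,\phi)$, to be chosen later, be a local
potential independent of $\zeta$. For a coupling constant
sufficiently small, there is a bound

$$\|e^{-V(\Delta)}\| \le 2 \qquad [53]$$

We assume that $\tilde V$ is so chosen that the same bound
holds when $V$ is replaced by $\tilde V$. Define

$$P(\Delta,\zeta,\phi) = e^{-V(\Delta,\zeta+\phi)} - e^{-\tilde V(\Delta,\phi)} \qquad [54]$$

Then we have

$$e^{-V(X_c,\phi+\zeta)} = \prod_{\Delta\subset\bar X_c}\bigl(e^{-\tilde V(\Delta,\phi)} + P(\Delta,\zeta,\phi)\bigr) \qquad [55]$$

where $\bar X_c$ is the closure of $X_c$. Expand out the
product and insert into the representation [49] for
$z(\Lambda,\phi+\zeta)$. We then rewrite the resulting sum in
terms of $L$-polymers. The sum splits into a sum over
connected components. Define, for every connected
$L$-polymer $Y$,

$$\mathcal{B}K(Y) = \sum_{N+M\ge 1}\frac{1}{N!\,M!}\sum_{(X_j),(\Delta_i)\to Y} e^{-\tilde V(X_0)}\prod_{j=1}^{N}K(X_j)\prod_{i=1}^{M}P(\Delta_i) \qquad [56]$$

where $X_0=Y\setminus\bigl((\cup_jX_j)\cup(\cup_i\Delta_i)\bigr)$ and the sum over the
distinct $\Delta_i$ and disjoint 1-polymers $X_j$ is such that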
their $L$-closure is $Y$. Equation [49] now becomes

$$z(\Lambda) = \sum_{N\ge 0}\frac{1}{N!}\sum_{Y_1,\dots,Y_N} e^{-\tilde V(Y_c)}\prod_{j=1}^{N}\mathcal{B}K(Y_j) \qquad [57]$$

where the sum is over disjoint, connected closed
$L$-polymers. We now perform the fluctuation integration
over $\zeta$ followed by the rescaling. Now $\tilde V(Y_c)$
is independent of $\zeta$. The $\zeta$-integration sails through
and then factorizes because the $Y_j$, being disjoint
closed $L$-polymers, are separated from each other by
a distance $\ge L$. The rescaling brings us back to
1-polymers and reduces the volume from $\Lambda$ to $L^{-1}\Lambda$.
Therefore,

$$z'(L^{-1}\Lambda) = T_Lz(\Lambda) = \sum_{N\ge 0}\frac{1}{N!}\sum_{X_1,\dots,X_N} e^{-\tilde V_L(X_c)}\prod_{j=1}^{N}(T_L\mathcal{B}K)(X_j) \qquad [58]$$

where the sum is over disjoint 1-polymers,
$X_c=L^{-1}\Lambda\setminus X$. By definition $\tilde V_L(\Delta)=S_L\tilde V(L\Delta)$ and
$(T_L\mathcal{B}K)(Z)=S_L\,\mu_{\Gamma_L}*\mathcal{B}K(LZ)$. This shows that the
representation [49] is stable under iteration and,
furthermore, gives us the map

$$V\to\tilde V_L,\qquad K\to\tilde K=\tilde T_L(K)=T_L\mathcal{B}K \qquad [59]$$

The norm boundedness of $K$ implies that $\tilde T_L(K)$ is
norm bounded. We see from the above that a
variation in the choice of $\tilde V$ is reflected in the
corresponding variation of $\tilde K$. The extraction map $\mathcal{E}$
now takes out from $\tilde K$ the expanding parts and then
compensates it by a change of $\tilde V_L$ in such a way that
the representation [58] is left invariant by the
simultaneous replacement $\tilde V_L\to V'$, $\tilde K\to K'=\mathcal{E}(\tilde K)$.
The extraction map is nonlinear. Its linearization is a
subtraction operation and this dominates in norm the
nonlinearities (Brydges et al. 1998).

The map $V\to\tilde V_L\to V'$ leads to a discrete flow of
the coupling constants in $V$. It is convenient to write
$K=K^{\rm pert}+R$, where $R$ is the remainder. Then the
coupling constant flow is a discrete version of the
continuous flows encountered in the last section,
together with remainders which are controlled by the
size of $R$. In addition, we have the flow of $K$. The
discrete flow of the pair (coupling constants, $K$) can be
studied in a Banach space norm. Once one proves that
the nonlinear parts satisfy a Lipschitz property, the
discrete flow can be analyzed by the methods of stable-manifold
theory of dynamical systems in a Banach
space context. The reader is referred to the article by
Brydges et al. (2003) for details of the extraction map
and the application of stable-manifold theory in the
construction of a global RG trajectory.


Further Topics 
Lattice RG Methods 


Statistical mechanical systems are often defined on a 
lattice. Moreover, the lattice provides an ultraviolet 
cutoff for Euclidean field theory compatible with 
Osterwalder-Schrader positivity. The standard lattice
RG is based on Kadanoff-Wilson block spins.
Its mathematical theory and applications have 
been developed by Balaban, and Gawedzki and 
Kupiainen (see Gawedzki and Kupiainen (1986) and 
references therein). This leads to multiscale decom- 
positions of the Gaussian lattice field as a sum of 
independent fluctuation fields on increasing length 
scales. Brydges et al. (2004) have shown that 
standard Gaussian lattice fields have multiscale 
decompositions as a sum of independent fluctuation 
fields with the finite-range property introduced in 
the last section. This permits the development of 
rigorous lattice RG theory in the spirit of the 
continuum framework of the previous section. 


Fermionic Field Theories 


Field theories of interacting fermions are often 
simpler to handle than bosonic field theories. Because 
of statistics, fermion fields are bounded and pertur- 
bation series converges in finite volume in the 
presence of an ultraviolet cutoff. The notion of 
studying the RG flow at the level of effective 
potentials makes sense. At any given scale, there is 
always an ultraviolet cutoff and the fluctuation 
covariance being of fast decrease provides an infrared 
cutoff. This is illustrated by the work of Gawedzki 
and Kupiainen (1985), who gave a nonperturbative 
construction in the weak effective coupling regime of 
the RG trajectory for the Gross-Neveu model in two 
dimensions. This is an example of a model with an 
unstable Gaussian fixed point where the initial 
coupling has to be adjusted as a function of the
ultraviolet cutoff consistent with ultraviolet asymptotic
freedom so as to stabilize the flow.


A Brief History


The “Falicov-Kimball model” was first considered by 
Hubbard and Gutzwiller during 1963-65 as a simpli- 
fication of the Hubbard model. In 1969, Falicov and 
Kimball introduced a model that included a few extra 
complications, in order to investigate metal-insulator
phase transitions in rare-earth materials and transition-metal
compounds (Falicov and Kimball 1969). Experimental
data suggested that this transition is due to the
interactions between electrons in two electronic states: 
nonlocalized states (itinerant electrons), and states that 
are localized around the sites corresponding to the 
metallic ions of the crystal (static electrons). 

A tight-binding approximation leads to a model 
defined on a lattice (the crystal) and two species of 
particles are considered. The first species consists of 
spinless quantum fermions (we refer to them as 
“electrons”), and the second species consists of localized 
holes or electrons (“classical particles”). Electrons hop 
between nearest-neighbor sites but classical particles do 
not. Both species obey Fermi statistics (in particular, the
Pauli exclusion principle prevents more than one
particle of a given species from occupying the same site).
Interactions are on-site and thus involve particles of 
different species; they can be repulsive or attractive. 

The very simplicity of the model allows for a 
broad range of applications. It was studied in the 
context of mixed valence systems, binary alloys, and 
crystal formation. Adding a magnetic field yields the 
flux phase problem. The Falicov-Kimball model can 
also be viewed as the simplest model where 
quantum particles interact with classical fields. 

The fifteen years following the introduction of the 
model saw studies based on approximate methods, 
such as Green’s function techniques, that gave rise to 
a lot of confusion. A breakthrough occurred in 1986 
when Brandt and Schmidt, and Kennedy and Lieb, 
proposed the first rigorous results. In particular, 


Kennedy and Lieb showed in their beautiful paper 
that the electrons create an effective interaction 
between the classical particles and that a phase 
transition takes place for any value of the coupling 
constant, provided the temperature is low enough. 
Many studies by mathematical physicists fol- 
lowed and several results are presented in this 
short survey. Recent years have seen an increasing 
interest from condensed matter physicists. We 
encourage interested readers to consult the reviews 
by Freericks and Zlatic (2003), Gruber and Macris 
(1996), and Jedrzejewski and Lemanski (2001). 


Mathematical Setting 
Definitions 


Let $\Lambda\subset\mathbb{Z}^d$ denote a finite cubic box. The configuration
space for the classical particles is

$$\Omega_\Lambda = \{0,1\}^\Lambda = \{\omega=(\omega_x): x\in\Lambda, \text{ and } \omega_x=0,1\}$$

where $\omega_x=0$ or 1 denotes the absence or presence of
a classical particle at the site $x$. The total number of
classical particles is $N_c(\omega)=\sum_{x\in\Lambda}\omega_x$. The Hilbert
space for the spinless quantum particles ("electrons")
is the usual fermionic Fock space

$$\mathcal{F}_\Lambda = \bigoplus_{N=0}^{|\Lambda|}\mathcal{H}_{\Lambda,N}$$

where $\mathcal{H}_{\Lambda,N}$ is the Hilbert space of square summable,
antisymmetric, complex functions $\Psi=\Psi(x_1,\dots,x_N)$
of $N$ variables $x_i\in\Lambda$. Let $a_x^\dagger$ and $a_x$ denote the
standard creation and annihilation operators of an
electron at $x$; recall that they satisfy the anticommutation
relations

$$\{a_x,a_y\}=0,\qquad \{a_x^\dagger,a_y^\dagger\}=0,\qquad \{a_x,a_y^\dagger\}=\delta_{xy}$$

The Hamiltonian for the Falicov-Kimball model is an
operator on $\mathcal{F}_\Lambda$ that depends on the configurations of
classical particles. Namely, for $\omega\in\Omega_\Lambda$, we define

$$H_\Lambda^U(\omega) = -\sum_{\substack{x,y\in\Lambda\\ |x-y|=1}} a_x^\dagger a_y - U\sum_{x\in\Lambda}\omega_x\,a_x^\dagger a_x$$

The first term represents the kinetic energy of the
electrons. The second term represents the on-site
attraction ($U>0$) or repulsion ($U<0$) between
electrons and classical particles.

The Falicov-Kimball Hamiltonian can be written
with the help of a one-body Hamiltonian $h_\Lambda$, which
is an operator on the Hilbert space for a single
electron $\ell^2(\Lambda)$. Indeed, we have

$$H_\Lambda^U(\omega) = \sum_{x,y\in\Lambda} h_{xy}(\omega)\,a_x^\dagger a_y$$

The matrix $h_\Lambda(\omega)=(h_{xy}(\omega))$ is the sum of a hopping
matrix (adjacency matrix) $t_\Lambda$, and of a matrix $v_\Lambda(\omega)$
that represents an external potential due to the
classical particles. Namely, we have

$$h_{xy}(\omega) = -t_{xy} - U\omega_x\delta_{xy}$$

where $t_{xy}$ is one if $x$ and $y$ are nearest neighbors, and is
zero otherwise. The spectrum of $t_\Lambda$ lies in $(-2d,2d)$,
and the eigenvalues of $v_\Lambda(\omega)$ are $-U$ (with degeneracy
$N_c(\omega)$) and 0 (with degeneracy $|\Lambda|-N_c(\omega)$). Denoting
$\lambda_j(A)$ the eigenvalues of a matrix $A$, it follows from the
minimax principle that

$$\lambda_j(A)-\|B\| \le \lambda_j(A+B) \le \lambda_j(A)+\|B\|$$

Let $\lambda_1(\omega)\le\lambda_2(\omega)\le\dots\le\lambda_{|\Lambda|}(\omega)$ be the eigenvalues
of $h_\Lambda(\omega)$. Choosing $A=v_\Lambda(\omega)$ and $B=-t_\Lambda$ in the
inequality above, we find that for $U>0$,

$$-U-2d < \lambda_j(\omega) < -U+2d \quad\text{for } j=1,\dots,N_c(\omega)$$
$$-2d < \lambda_j(\omega) < 2d \quad\text{for } j=N_c(\omega)+1,\dots,|\Lambda|$$

In particular, for any configuration $\omega$ and any $\Lambda$,

$$\operatorname{Spec} h_\Lambda(\omega) \subset (-U-2d,-U+2d)\cup(-2d,2d)$$

Thus, for $U>4d$, the spectrum of $h_\Lambda(\omega)$ has the
"universal" gap $(-U+2d,-2d)$. A similar property
holds for $U<-4d$.
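
The band structure and the universal gap can be checked on a small example. The sketch below is a hedged illustration, not the method of the source: it builds $h_\Lambda(\omega)$ on an open chain ($d=1$) for one arbitrarily chosen configuration and diagonalizes it with a plain cyclic Jacobi iteration, written out by hand so that the example needs no external library. All names, the chain length, and the value $U=5$ are assumptions made for illustration.

```python
import math

def jacobi_eigenvalues(a, sweeps=50, tol=1e-20):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Angle chosen so that the rotated (p, q) entry vanishes.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):      # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):      # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def one_body_hamiltonian(omega, U):
    """h_xy = -t_xy - U * omega_x * delta_xy on an open chain (d = 1)."""
    n = len(omega)
    h = [[0.0] * n for _ in range(n)]
    for x in range(n):
        h[x][x] = -U * omega[x]
        if x + 1 < n:
            h[x][x + 1] = h[x + 1][x] = -1.0
    return h

d, U = 1, 5.0                       # U > 4d, so a gap is expected
omega = [1, 0, 0, 1, 1, 0, 1, 0]    # an arbitrary illustrative configuration
eigs = jacobi_eigenvalues(one_body_hamiltonian(omega, U))
n_c = sum(omega)
lower_band, upper_band = eigs[:n_c], eigs[n_c:]
```

With these choices the lowest $N_c(\omega)$ eigenvalues fall in $(-U-2d,-U+2d)$ and the remaining ones in $(-2d,2d)$, so no eigenvalue lies in the interval $(-U+2d,-2d)$.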


Canonical Ensemble 


A fruitful approach towards understanding the
behavior of the Falicov-Kimball model is to first
fix the configuration of the classical particles, and
then to introduce the ground-state energy $E_\Lambda(N_e,\omega)$
as the lowest eigenvalue of $H_\Lambda(\omega)$ in the subspace
$\mathcal{H}_{\Lambda,N_e}$:

$$E_\Lambda(N_e,\omega) = \inf_{\Psi\in\mathcal{H}_{\Lambda,N_e},\ \|\Psi\|=1}\langle\Psi|H_\Lambda(\omega)|\Psi\rangle = \sum_{j=1}^{N_e}\lambda_j(\omega)$$

A typical problem is to find the set of ground-state
configurations, that is, the set of configurations
that minimize $E_\Lambda(N_e,\omega)$ for given $N_e$ and
$N_c=N_c(\omega)$.


In the case $U>4d$ and $N_e=N_c(\omega)$, the ground-state
energy $E_\Lambda(N_c(\omega),\omega)$ has a convergent expansion
in powers of $U^{-1}$:

$$E_\Lambda(N_c(\omega),\omega) = -UN_c(\omega) + \sum_{k\ge 2}\frac{1}{kU^{k-1}}\sum_{\substack{x_1,\dots,x_k\in\Lambda\\ |x_i-x_{i-1}|=1\\ 0<m(x_1,\dots,x_k)<k}}(-1)^{m(x_1,\dots,x_k)}\binom{k-2}{m(x_1,\dots,x_k)-1} \qquad [1]$$

where $m(x_1,\dots,x_k)$ is the number of sites $x_i$ with
$\omega_{x_i}=0$. The last sum also includes the condition
$|x_k-x_1|=1$. Simple estimates show that the series is
less than $(2d/(U-4d))N_c(\omega)$. The lowest-order term
is a nearest-neighbor interaction,

$$-\frac1U\sum_{\{x,y\}:|x-y|=1}\delta_{\omega_x,1-\omega_y}$$

that favors pairs with different occupation numbers.
Formula [1] is the starting point for most studies of
the phase diagram for large $U$. A similar expansion
holds for $U<-4d$ and $N_e=|\Lambda|-N_c(\omega)$.

A simple derivation of expansion [1] using Cauchy's
formula can be found in Gruber and Macris (1996). It
can be extended to positive temperatures with the
help of Lie-Schwinger series (Datta et al. 1999).

Phase diagrams are better discussed in the limit of
infinite volumes where boundary effects can be
discarded. Let $\Omega^{\rm per}$ be the set of configurations on $\mathbb{Z}^d$
that are periodic in all $d$ directions, and $\Omega^{\rm per}(\rho_c)\subset\Omega^{\rm per}$
be the set of periodic configurations with density $\rho_c$.
For $\omega\in\Omega^{\rm per}$ and $\rho_e\in[0,1]$, we introduce the energy
per site in the infinite volume limit by

$$e(\rho_e,\omega) = \lim_{\Lambda\nearrow\mathbb{Z}^d}\frac{1}{|\Lambda|}E_\Lambda(N_e,\omega) \qquad [2]$$

Here, the limit is taken over any sequence of
increasing cubes, and $N_e=\lfloor\rho_e|\Lambda|\rfloor$ is the integer
part of $\rho_e|\Lambda|$. Existence of this limit follows from
standard arguments.

In the case of the empty configuration $\omega_x\equiv 0$,
we get the well-known energy per site of free lattice
electrons: for $k\in[-\pi,\pi]^d$, let $\varepsilon(k)=-2\sum_{i=1}^{d}\cos k_i$;
then

$$e(\rho_e,\omega\equiv 0) = \frac{1}{(2\pi)^d}\int_{\varepsilon(k)<\varepsilon_F(\rho_e)}\varepsilon(k)\,dk$$

where $\varepsilon_F(\rho_e)$ is the Fermi energy, defined by

$$\rho_e = \frac{1}{(2\pi)^d}\int_{\varepsilon(k)<\varepsilon_F(\rho_e)}dk$$

The other simple situation is the full configuration
$\omega_x\equiv 1$, whose energy is $e(\rho_e,\omega\equiv 1)=
e(\rho_e,\omega\equiv 0)-U\rho_e$.
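
In one dimension the free-electron energy can be evaluated in closed form: filling the levels with $|k|<\pi\rho_e$ gives $e(\rho_e,\omega\equiv 0)=-(2/\pi)\sin(\pi\rho_e)$. The sketch below compares this with a direct sum over the levels of a finite periodic chain. It is an illustration under stated assumptions: the restriction to $d=1$, the chain length, and the function names are choices made here, not taken from the source.

```python
import math

def free_energy_density_1d(rho, n_sites=2000):
    """Energy per site of floor(rho * n_sites) free electrons on a periodic chain.

    The lowest one-particle levels eps(k) = -2 cos k, k = 2*pi*j/n_sites, are filled.
    """
    levels = sorted(-2.0 * math.cos(2.0 * math.pi * j / n_sites)
                    for j in range(n_sites))
    n_e = int(math.floor(rho * n_sites))
    return sum(levels[:n_e]) / n_sites

def closed_form_1d(rho):
    """Infinite-volume value in d = 1: -(2/pi) * sin(pi * rho)."""
    return -2.0 / math.pi * math.sin(math.pi * rho)

def full_configuration_1d(rho, U):
    """Energy per site for omega = 1, using e(rho, 1) = e(rho, 0) - U * rho."""
    return closed_form_1d(rho) - U * rho

finite_volume = {rho: free_energy_density_1d(rho) for rho in (0.25, 0.5, 0.75)}
infinite_volume = {rho: closed_form_1d(rho) for rho in (0.25, 0.5, 0.75)}
```

The two values agree up to a finite-size correction that shrinks as the chain length grows; the energy is symmetric under $\rho_e\to 1-\rho_e$, as expected from the electron-hole symmetry of the hopping term.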

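Let $e(\rho_e,\rho_c)$ denote the absolute ground-state
energy density, namely,

$$e(\rho_e,\rho_c) = \inf_{\omega\in\Omega^{\rm per}(\rho_c)} e(\rho_e,\omega)$$

Notice that $e(\rho_e,\omega)$ is convex in $\rho_e$, and that $e(\rho_e,\rho_c)$
is the convex envelope of $\{e(\rho_e,\omega):\omega\in\Omega^{\rm per}(\rho_c)\}$. It
may be locally linear around some $(\rho_e,\rho_c)$. This is
the case if the infimum is not realized by a periodic
configuration. The nonperiodic ground states can be
expressed as linear combinations of two or more
periodic ground states ("mixtures"). That is, for $1\le
i\le n$ there are $\alpha_i\ge 0$ with $\sum_i\alpha_i=1$, $\omega^{(i)}\in\Omega^{\rm per}$,
and $\rho_e^{(i)}$, such that

$$\rho_c=\sum_i\alpha_i\,\rho_c(\omega^{(i)}),\qquad \rho_e=\sum_i\alpha_i\,\rho_e^{(i)}$$

and

$$e(\rho_e,\rho_c)=\sum_i\alpha_i\,e(\rho_e^{(i)},\omega^{(i)})$$

The simplest mixture is the "segregated state" for
densities $\rho_e<\rho_c$: take $\omega^{(1)}$ to be the empty
configuration, $\omega^{(2)}$ to be the full configuration,
$\rho_e^{(1)}=0$, $\rho_e^{(2)}=\rho_e/\rho_c$, and $\alpha_2=1-\alpha_1=\rho_c$.

If $d\ge 2$, a mixture between configurations $\omega^{(i)}$
can be realized as follows. First, partition $\mathbb{Z}^d$ into
domains $D_1\cup\dots\cup D_n$ such that $|D_i|/|\Lambda|\to\alpha_i$ and
$|\partial D_i|/|\Lambda|\to 0$ as $\Lambda\nearrow\mathbb{Z}^d$. Then, define a nonperiodic
configuration $\omega$ by setting $\omega_x=\omega_x^{(i)}$ for $x\in D_i$ (see the
illustration in Figure 1). The canonical energy can be
computed from [2], and it is equal to

$$e(\rho_e,\omega) = \inf_{(\rho_e^{(i)}):\ \sum_i\alpha_i\rho_e^{(i)}=\rho_e}\ \sum_i\alpha_i\,e(\rho_e^{(i)},\omega^{(i)})$$

Furthermore, the infimum is realized by densities $\rho_e^{(i)}$
such that there exists $\mu_e$ with $\rho_e(\mu_e,\omega^{(i)})=\rho_e^{(i)}$ for all
$i$ (see [4] below for the definition of $\rho_e(\mu_e,\omega)$).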



Figure 1 A two-dimensional mixed configuration formed by
periodic configurations of densities 0, 1/5, and 1/2.

We define the canonical ground-state phase
diagram as the set of ground states $\omega$ (either a
periodic configuration or a mixture) that minimize
the ground-state energy for given densities $\rho_e,\rho_c$:

$$G_{\rm can}(\rho_e,\rho_c) = \{\omega: e(\rho_e,\omega)=e(\rho_e,\rho_c) \text{ and } \rho_c(\omega)=\rho_c\}$$


Grand-Canonical Ensemble 


Properties of the system at finite temperatures
are usually investigated within the grand-canonical
formalism. The equilibrium state is characterized by
an inverse temperature $\beta=1/k_BT$, and by chemical
potentials $\mu_e,\mu_c$, for the electrons and for the
classical particles, respectively. In this formalism,
the thermodynamic properties are derived from the
partition functions

$$Z_\Lambda(\beta,\mu_e,\omega) = \operatorname{Tr}_{\mathcal{F}_\Lambda} e^{-\beta[H_\Lambda(\omega)-\mu_eN_\Lambda]}$$
$$Z_\Lambda(\beta,\mu_e,\mu_c) = \sum_{\omega\in\Omega_\Lambda} e^{\beta\mu_cN_c(\omega)}\,Z_\Lambda(\beta,\mu_e,\omega) \qquad [3]$$

Here, $N_\Lambda=\sum_{x\in\Lambda}a_x^\dagger a_x$ is the operator for the total
number of electrons. We then define the free energy by

$$F_\Lambda(\beta,\mu_e,\mu_c) = -\frac1\beta\log Z_\Lambda(\beta,\mu_e,\mu_c)$$

The first partition function in [3] allows us to
introduce an effective interaction for the classical
particles, mediated by the electrons, by

$$F_\Lambda(\beta,\mu_e,\mu_c,\omega) = -\mu_cN_c(\omega) - \frac1\beta\log Z_\Lambda(\beta,\mu_e,\omega)$$


It depends on the inverse temperature $\beta$. Taking the
limit of zero temperature gives the corresponding
ground-state energy of the electrons in the classical
configuration $\omega$:

$$E_\Lambda(\mu_e,\mu_c,\omega) = \lim_{\beta\to\infty}F_\Lambda(\beta,\mu_e,\mu_c,\omega) = \sum_{j:\lambda_j(\omega)<\mu_e}(\lambda_j(\omega)-\mu_e) - \mu_cN_c(\omega)$$

Notice that $F_\Lambda$ and $E_\Lambda$ are strictly decreasing and
concave in $\mu_e,\mu_c$ ($E_\Lambda$ is actually linear in $\mu_c$). We
also define the energy density in the infinite volume
limit by considering a sequence of increasing cubes.
For $\omega\in\Omega^{\rm per}$,

$$e(\mu_e,\mu_c,\omega) = \lim_{\Lambda\nearrow\mathbb{Z}^d}\frac{1}{|\Lambda|}E_\Lambda(\mu_e,\mu_c,\omega)$$

The corresponding electronic density is

$$\rho_e(\mu_e,\omega) = \lim_{\Lambda\nearrow\mathbb{Z}^d}\frac{1}{|\Lambda|}\#\{j:\lambda_j(\omega)<\mu_e\} = -\frac{\partial}{\partial\mu_e}e(\mu_e,\mu_c,\omega) \qquad [4]$$

and the density of classical particles is
$\rho_c(\omega)=\lim_\Lambda N_c(\omega)/|\Lambda|$. One can check that canonical
and grand-canonical energies are related by

$$e(\mu_e,\mu_c,\omega) = e(\rho_e(\mu_e,\omega),\omega) - \mu_e\rho_e(\mu_e,\omega) - \mu_c\rho_c(\omega) \qquad [5]$$


Given $(\mu_e,\mu_c)$, the ground-state energy density
$e(\mu_e,\mu_c)$ is defined by

$$e(\mu_e,\mu_c) = \inf_{\omega\in\Omega^{\rm per}} e(\mu_e,\mu_c,\omega)$$

The set of periodic ground-state configurations
for given chemical potentials $\mu_e,\mu_c$ is the grand-canonical
ground-state phase diagram:

$$G_{\rm gc}(\mu_e,\mu_c) = \{\omega\in\Omega^{\rm per}: e(\mu_e,\mu_c,\omega)=e(\mu_e,\mu_c)\}$$

It may happen that no periodic configuration
minimizes $e(\mu_e,\mu_c,\omega)$ and that $G_{\rm gc}(\mu_e,\mu_c)=\emptyset$.
However, results suggest that $G_{\rm gc}(\mu_e,\mu_c)$ is
nonempty for almost all $\mu_e,\mu_c$.

The situation simplifies for $U>4d$ and $\mu_e\in
(-U+2d,-2d)$. Since $\mu_e$ belongs to the gap of
$h_\Lambda(\omega)$, we have $\rho_e(\mu_e,\omega)=\rho_c(\omega)$, and

$$e(\mu_e,\mu_c,\omega) = e(\rho_c(\omega),\omega) - (\mu_e+\mu_c)\rho_c(\omega)$$

Thus, $G_{\rm gc}(\mu_e,\mu_c)$ is invariant along the line $\mu_e+\mu_c=$
const. (for $\mu_e$ in the gap).


Symmetries of the Model 


The Hamiltonian $H_\Lambda$ clearly has the symmetries of
the lattice (for a box with periodic boundary
conditions, there is invariance under translations,
rotations by 90°, and reflections through an axis).
More important, it also possesses particle-hole
symmetries and these are useful since they allow us
to restrict investigations to positive $U$ and to certain
domains of densities or chemical potentials
(see below).

• The classical particle-hole transformation
$\omega_x\mapsto\bar\omega_x=1-\omega_x$ results in

$$H_\Lambda^U(\bar\omega) = H_\Lambda^{-U}(\omega) - UN_\Lambda$$

and $N_c(\bar\omega)=|\Lambda|-N_c(\omega)$. It follows that
$E_\Lambda^U(N_e,\bar\omega)=E_\Lambda^{-U}(N_e,\omega)-UN_e$, and

$$G_{\rm can}^U(\rho_e,\rho_c) = \{\bar\omega:\omega\in G_{\rm can}^{-U}(\rho_e,1-\rho_c)\}$$
$$G_{\rm gc}^U(\mu_e,\mu_c) = \{\bar\omega:\omega\in G_{\rm gc}^{-U}(\mu_e+U,-\mu_c)\}$$

• An electron-hole transformation can be defined
via the unitary transformation $a_x\mapsto\epsilon_xa_x^\dagger$ and
$a_x^\dagger\mapsto\epsilon_xa_x$, where $\epsilon_x=1$ on a sublattice, and $\epsilon_x=-1$
on the other sublattice. Then,

$$H_\Lambda^U(\omega)\mapsto H_\Lambda^{-U}(\omega)-UN_c(\omega)$$

and $N_\Lambda\mapsto|\Lambda|-N_\Lambda$. It follows that $E_\Lambda^U(|\Lambda|-N_e,\omega)=
E_\Lambda^{-U}(N_e,\omega)-UN_c(\omega)$, and

$$G_{\rm can}^U(\rho_e,\rho_c) = G_{\rm can}^{-U}(1-\rho_e,\rho_c)$$
$$G_{\rm gc}^U(\mu_e,\mu_c) = G_{\rm gc}^{-U}(-\mu_e,\mu_c+U)$$

• Finally, the particle-hole transformation for
both the classical particles and the electrons
gives

$$H_\Lambda^U(\bar\omega)\mapsto H_\Lambda^U(\omega)+UN_\Lambda+UN_c(\omega)-U|\Lambda|$$

It follows that

$$E_\Lambda^U(|\Lambda|-N_e,\bar\omega) = E_\Lambda^U(N_e,\omega)+U(N_e+N_c(\omega)-|\Lambda|)$$

and

$$G_{\rm can}^U(\rho_e,\rho_c) = \{\bar\omega:\omega\in G_{\rm can}^U(1-\rho_e,1-\rho_c)\}$$
$$G_{\rm gc}^U(\mu_e,\mu_c) = \{\bar\omega:\omega\in G_{\rm gc}^U(-\mu_e-U,-\mu_c-U)\}$$

Either of the first two symmetries allows us to choose
the sign of $U$. We assume from now on that $U>0$. The
third symmetry indicates that the phase diagrams have
a point of central symmetry, given by $\rho_e=\rho_c=1/2$ in
the canonical ensemble and $\mu_e=\mu_c=-U/2$ in the
grand-canonical ensemble. Consequently, it is enough
to study densities satisfying $\rho_e\le 1/2$ and chemical
potentials satisfying $\mu_e\le -U/2$.

These symmetries also have useful consequences at
positive temperatures. In particular, both species of
particles have average density 1/2 at $\mu_e=\mu_c=-U/2$,
for all $\beta$.




The Ground State - Arbitrary Dimensions 
The Segregated State 


What follows is best understood in the limit
$U\to\infty$ and when $\rho_e<\rho_c$. In this case, the electrons
become localized in the domain $D_\Lambda(\omega)=\{x\in
\Lambda:\omega_x=1\}$ and their energy per site is that of the
full configuration, $e(\rho,\omega\equiv 1)$ (see the section
"Canonical Ensemble"), where $\rho=\rho_e/\rho_c$ is the
effective electronic density. The presence of a
boundary for $D_\Lambda(\omega)$ raises the energy and the
correction is roughly proportional to

$$B_\Lambda(\omega) = \#\{(x,y): x\in D_\Lambda(\omega) \text{ and } y\in\mathbb{Z}^d\setminus D_\Lambda(\omega)\}$$

The following theorem was proposed by Freericks
et al. (2002).


Theorem 1 


(i) Let $\Lambda\subset\mathbb{Z}^d$ be a finite box, and $U>4d$. Then
for all $\omega\in\Omega_\Lambda$, and all $N_e\le N_c(\omega)=N_c$, we
have the following upper and lower bounds:


Ne 
zg (w=) 


~Nee(SEw=1) > a(S) 0) Baw 


Here, $\alpha(\rho)=\alpha(1-\rho)$ is strictly positive for $0<
\rho<1$. $\gamma(U)$ behaves as $8d^2/U$ for large $U$, in
the sense that $U\gamma(U)\to 8d^2$ as $U\to\infty$.

(ii) For any $\rho_e\ne\rho_c$ that differ from zero, the
segregated state is the unique ground state if
$\alpha(\rho_e/\rho_c)>\gamma(U)$, that is, if $U$ is large enough.


| Ba lw) > Ea(Ne,w) 








The proof of (i) is rather lengthy and we only
show here that it implies (ii). Let
$b(w) = \lim_{\Lambda} B_\Lambda(w)/|\Lambda|$, and notice that $b(w) = 0$ for the
empty, the full, and the segregated configurations;
$0 < b(w) \le d$ for all other periodic configurations
or mixtures. Recall that $\rho_c\,e(\rho_e/\rho_c, w \equiv 1)$ is the
energy density of the segregated state. For all
densities such that $\alpha(\rho_e/\rho_c) > \gamma(U)$, and all
configurations such that $\rho_c(w) = \rho_c$, we have

\[ e(\rho_e,w) \ge \rho_c\,e(\rho_e/\rho_c, w \equiv 1) + [\alpha(\rho_e/\rho_c) - \gamma(U)]\,b(w) \ge \rho_c\,e(\rho_e/\rho_c, w \equiv 1), \]

and the inequality is strict for any periodic
configuration. This shows that the segregated
configuration is the unique ground state.


General Properties of the Grand-Canonical 
Phase Diagram 


We have already seen that the grand-canonical
phase diagram is symmetric with respect to
$(-U/2, -U/2)$. Other properties follow from
concavity of $e(\mu_e,\mu_c)$.

Let $w \in G_{\mathrm{gc}}(\mu_e,\mu_c) \setminus G_{\mathrm{gc}}(\mu_e',\mu_c')$
and $w' \in G_{\mathrm{gc}}(\mu_e',\mu_c') \setminus G_{\mathrm{gc}}(\mu_e,\mu_c)$. Then,

(a) $\mu_e = \mu_e'$ and $\mu_c' > \mu_c$ imply $\rho_c(w') \ge \rho_c(w)$;

(b) $\mu_c = \mu_c'$ and $\mu_e' > \mu_e$ imply $\rho_e(\mu_e',w') \ge \rho_e(\mu_e,w)$,
and $w$ cannot be obtained by adding some
classical particles to the configuration $w'$.


It follows from (b) that if $w \equiv 1 \in G_{\mathrm{gc}}(\mu_e,\mu_c)$, then
$w \equiv 1 \in G_{\mathrm{gc}}(\mu_e',\mu_c')$ for all $\mu_e' \ge \mu_e$, $\mu_c' \ge \mu_c$. A similar
property holds for the empty configuration. To
establish these properties, we can start from


\[ e(\mu_e,\mu_c,w') - e(\mu_e,\mu_c,w) \ge 0 \ge e(\mu_e',\mu_c',w') - e(\mu_e',\mu_c',w). \qquad [6] \]

Since $e(\mu_e,\mu_c,w)$ is concave with respect to $\mu_e$ and
linear with respect to $\mu_c$, we have

\[ e(\mu_e',\mu_c',w) \le e(\mu_e,\mu_c,w) + (\mu_e - \mu_e')\,\rho_e(\mu_e,w) + (\mu_c - \mu_c')\,\rho_c(w). \qquad [7] \]


Using this inequality for both terms on the right- 
hand side of [6], we obtain the inequality 


\[ (\mu_e' - \mu_e)\,[\rho_e(\mu_e',w') - \rho_e(\mu_e,w)] + (\mu_c' - \mu_c)\,[\rho_c(w') - \rho_c(w)] \ge 0, \]

which proves (a) and the first part of (b). The second
part of (b) follows from

\[ e(\mu_e,\mu_c,w) = -\int_{-\infty}^{\mu_e} d\mu\, \rho_e(\mu,w) - \mu_c\,\rho_c(w). \]


Indeed, the minimax principle implies that the eigenvalues
$\lambda_j(w)$ are decreasing with respect to $w$ (if $U > 0$), so
that $\rho_e(\mu_e,w)$ is increasing (with respect to $w$). Then for
any $w'' \ge w$ and $\mu_e' > \mu_e$,

\[ e(\mu_e,\mu_c,w'') - e(\mu_e',\mu_c,w'') \ge e(\mu_e,\mu_c,w) - e(\mu_e',\mu_c,w), \]

and $w'' \in G_{\mathrm{gc}}(\mu_e,\mu_c)$ implies $w \notin G_{\mathrm{gc}}(\mu_e',\mu_c)$.

Next, we discuss domains in the plane of
chemical potentials where the empty, full, and
chessboard configurations have minimum energy
(see, e.g., Gruber and Macris (1996), and references
therein). One easily sees that $w \equiv 1$ is the unique
ground-state configuration if $\mu_c > 0$, or if $\mu_e > 2d$
and $\mu_c > -U$. Similarly, $w \equiv 0$ is the unique ground
state if $\mu_c < -U$, or if $\mu_e < -U - 2d$ and $\mu_c < 0$.
For $U > 4d$, it follows from the expansion [1] that
the full configuration is also a ground state if
$-U + 2d < \mu_e < -2d$ and $\mu_e + \mu_c + U > 4d/(U - 4d)$.
These domains can be rigorously extended using
energy estimates that involve correlation functions
of classical particles. The results are illustrated in
Figures 2 ($U < 4d$) and 3 ($U > 4d$).
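The elementary sufficient conditions for the full and empty configurations can be written down and checked against the central symmetry of the phase diagram. The following is a minimal sketch, not part of the original analysis: the function names `sufficient_domain` and `reflect` are hypothetical, only the two elementary conditions per configuration are encoded (the refinement for $U > 4d$ is left out), and the reflection $(\mu_e,\mu_c) \mapsto (-\mu_e-U, -\mu_c-U)$ is assumed to be the one about the point $(-U/2,-U/2)$.

```python
def sufficient_domain(mu_e, mu_c, U, d):
    """Return 'full', 'empty' or None according to the elementary
    sufficient conditions quoted in the text (U > 0 assumed)."""
    if mu_c > 0 or (mu_e > 2 * d and mu_c > -U):
        return "full"
    if mu_c < -U or (mu_e < -U - 2 * d and mu_c < 0):
        return "empty"
    return None  # not decided by these elementary conditions


def reflect(mu_e, mu_c, U):
    """Point reflection about (-U/2, -U/2) in the plane of chemical potentials."""
    return -mu_e - U, -mu_c - U


if __name__ == "__main__":
    U, d = 6, 2
    for point in [(5, -1), (-11, -5), (-3, -3)]:
        image = reflect(*point, U)
        print(point, sufficient_domain(*point, U, d),
              "<->", image, sufficient_domain(*image, U, d))
```

Under the reflection the two sets of conditions are exchanged, which is the statement that the full and empty domains are images of each other.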





Figure 2 Grand-canonical ground-state phase diagram for
$U < 4d$. Domains for the empty, chessboard, and full configurations
are denoted in light gray, black, and dark gray, respectively.

Figure 3 Grand-canonical ground-state phase diagram
for $U > 4d$. Domains for the empty, chessboard, and full
configurations are denoted in light gray, black, and dark gray,
respectively.


Finally, canonical and grand-canonical phase 
diagrams are related by the following properties: 


(c) If $w \in G_{\mathrm{gc}}(\mu_e,\mu_c)$, then $w \in G_{\mathrm{can}}(\rho_e(\mu_e,w),\rho_c(w))$.

(d) More generally, suppose that $w^{(1)},\dots,w^{(n)} \in G_{\mathrm{gc}}(\mu_e,\mu_c)$,
and consider a mixture with coefficients $\alpha_1,\dots,\alpha_n$.
The mixture belongs to $G_{\mathrm{can}}(\rho_e,\rho_c)$ with
$\rho_e = \sum_i \alpha_i\,\rho_e(\mu_e,w^{(i)})$ and $\rho_c = \sum_i \alpha_i\,\rho_c(w^{(i)})$.

To establish (c), observe that any $w'$ satisfies
$e(\mu_e,\mu_c,w') \ge e(\mu_e,\mu_c,w)$ if $w \in G_{\mathrm{gc}}(\mu_e,\mu_c)$. Let
$\rho_e = \rho_e(\mu_e,w)$ and $\rho_c = \rho_c(w)$, and let $\mu_e'$ be such that
$\rho_e(\mu_e',w') = \rho_e$. By eqns [5] and [7],

\[ e(\rho_e(\mu_e',w'),w') - \mu_e\,\rho_e(\mu_e',w') - \mu_c\,\rho_c(w') \ge e(\rho_e(\mu_e,w),w) - \mu_e\,\rho_e(\mu_e,w) - \mu_c\,\rho_c(w). \]

Then, $e(\rho_e,w') \ge e(\rho_e,w)$ for any configuration $w'$
such that $\rho_c(w') = \rho_c$. Property (d) follows from (c) by
a limiting argument, because a mixture can be
approximated by a sequence of periodic
configurations.

Next we describe further properties of the phase 
diagrams that are specific to dimensions 1 and 2. 


Ground-State Configurations - Dimension 1 


A large number of investigations, either analytical or 
numerical, have been devoted to the study of the 
ground-state configurations in one dimension. One-dimensional
results also serve as a guide to higher
dimensions. Recall that symmetries allow us to
restrict to $U > 0$ and $\rho_e \le 1/2$.

Most ground-state configurations that appear in 
the canonical phase diagram seem to be given by an 
intriguing formula, which we now describe. Let 


$\rho_e = p/q$ with $p$ relatively prime to $q$. Then
corresponding periodic ground-state configurations
have period $q$ and density $\rho_c = r/q$ ($r$ is an integer).
The occupied sites in the cell $\{0,1,\dots,q-1\}$ are
given by the solutions $k_0,\dots,k_{r-1}$ of

\[ (p\,k_j) = j \bmod q, \qquad 0 \le j \le r-1. \qquad [8] \]

Note that the first classical particle is located at
$k_0 = 0$, and $k_0,\dots,k_{r-1}$ are not in increasing order. In
order to discuss the solutions of [8], we introduce
$\ell = \lfloor q/p \rfloor$ (the integer part of $q/p$), and we write

\[ q = (\ell+1)\,p - s, \qquad [9] \]

where $1 \le s \le p-1$, and $s$ is relatively prime to $p$.
Next, let $L(x)$ denote the distance between the particle
at $x$ and the one immediately preceding it (to the left).

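Let us observe that if $\rho_e = \rho_c$, that is, if $r = p$, then

(a) $L(k_j) = \ell$ for $0 \le j \le s-1$ and $k_j - \ell = k_{j+p-s}$;
(b) $L(k_j) = \ell+1$ for $s \le j \le p-1$ and $k_j - (\ell+1) = k_{j-s}$.

Indeed, for $p\,k_j = j + nq$, eqn [9] implies

\[ p\,(k_j - \ell) = j + (n-1)\,q + (p-s) = j + p - s \bmod q \]

and

\[ p\,(k_j - \ell - 1) = j - s \bmod q. \]

Therefore, $k_j - \ell$ is a solution of [8] if $j + p - s \le p - 1$,
while $k_j - (\ell+1)$ is a solution of [8] if
$j - s \ge 0$.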
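Formula [8] and properties (a) and (b) are easy to check numerically. The sketch below is an illustration only: the helper names `formula8_sites` and `gaps` are hypothetical, and the solutions are obtained by multiplying $j$ with the modular inverse of $p$, which exists because $p$ and $q$ are relatively prime.

```python
from math import gcd


def formula8_sites(p, q, r=None):
    """Occupied sites k_0, ..., k_{r-1} in {0, ..., q-1} solving p*k_j = j (mod q)."""
    if gcd(p, q) != 1:
        raise ValueError("p and q must be relatively prime")
    r = p if r is None else r          # default: equal densities, r = p
    p_inv = pow(p, -1, q)              # modular inverse (Python 3.8+)
    return [(p_inv * j) % q for j in range(r)]


def gaps(sites, q):
    """Map each occupied site x to L(x), the distance to the occupied
    site immediately to its left (with period q)."""
    xs = sorted(sites)
    return {x: ((x - xs[i - 1]) % q) or q for i, x in enumerate(xs)}


if __name__ == "__main__":
    p, q = 7, 24
    k = formula8_sites(p, q)
    L = gaps(k, q)
    ell, s = q // p, (q // p + 1) * p - q
    print("k_j      :", k)
    print("L(k_j)   :", [L[x] for x in k])
    print("ell, s   :", ell, s)
```

For $q = 24$ and $p = 7$ the first $s = 4$ solutions have gap $\ell = 3$ and the remaining ones have gap $\ell + 1 = 4$, in line with (a) and (b).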

These two properties show that the configuration
defined by [8] is such that $L(x) \in \{\ell, \ell+1\}$ for all
occupied $x$. A periodic configuration such that all
distances between consecutive particles are either $\ell$
or $\ell+1$ is called homogeneous. Let $w$ be a
homogeneous configuration with period $q$ and
density $\rho_c = r/q$, and let $x_0 < \dots < x_{r-1}$ be the
occupied sites in $\{0,1,\dots,q-1\}$. We introduce the
derivative $w'$ of $w$ as the periodic configuration with
period $r$ defined by (see Figure 4)

\[ w'_j = \begin{cases} 1 & \text{if } L(x_j) = \ell, \\ 0 & \text{if } L(x_j) = \ell+1. \end{cases} \]

A configuration is most homogeneous if it can be
“differentiated” repeatedly until the empty or the
full configuration is obtained.

Let $w$ be the homogeneous configuration from [8]
and $w'$ be its derivative. Using the same arguments as
for properties (a) and (b) above, and the fact that $s$ is
relatively prime to $p$, we obtain

(c) Let $k'_0,\dots,k'_{p-1}$ be the solutions of

\[ (s\,k'_j) = j \bmod p. \]

Then $(k'_0,\dots,k'_{p-1})$ is a permutation of
$(0,1,\dots,p-1)$. Further, $k_j = x_{k'_j}$ for $0 \le j \le s-1$,
and $k'_j - 1 = k'_{j-s}$ for $s \le j \le p-1$.

w:  ● ○ ○ ○ ● ○ ○ ● ○ ○ ○ ● ○ ○ ● ○ ○ ○ ● ○ ○ ● ○ ○
    k_0 = 0, k_4 = 4, k_1 = 7, k_5 = 11, k_2 = 14, k_6 = 18, k_3 = 21

w': ● ○ ● ○ ● ○ ●
    k'_0 = 0, k'_4 = 1, k'_1 = 2, k'_5 = 3, k'_2 = 4, k'_6 = 5, k'_3 = 6

Figure 4 The configuration $w$ given by the formula [8] with $q = 24$ and $p = 7$, and its derivative $w'$. Notice that $\ell = 3$ and $s = 4$.

Consider the periodic configuration with period $p$
where sites $k'_0,\dots,k'_{s-1}$ are occupied and sites
$k'_s,\dots,k'_{p-1}$ are empty. Since $k'_0 = 0$, this configuration
is precisely the derivative $w'$ of $w$. Iterating,
these properties prove that the solutions of [8] are
most homogeneous.
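The notion of a derivative and the "most homogeneous" property can be made concrete with a few lines of code. This is a sketch under one stated assumption: site $j$ of the derivative is taken to be occupied exactly when the $j$-th particle (in increasing order) sits at the shorter distance $\ell = \lfloor q/r \rfloor$ from its left neighbour. The function names are hypothetical.

```python
def derivative(sites, q):
    """Return (occupied sites, period) of the derivative of a homogeneous
    configuration with period q, or None if it is not homogeneous."""
    xs = sorted(sites)
    r = len(xs)
    ell = q // r
    # L[j]: distance from the j-th particle to its left neighbour (periodic)
    L = [((xs[j] - xs[j - 1]) % q) or q for j in range(r)]
    if any(dist not in (ell, ell + 1) for dist in L):
        return None
    return [j for j in range(r) if L[j] == ell], r


def is_most_homogeneous(sites, q):
    """Differentiate repeatedly until the empty or the full configuration appears."""
    sites = list(sites)
    while 0 < len(sites) < q:
        step = derivative(sites, q)
        if step is None:
            return False
        sites, q = step
    return True


if __name__ == "__main__":
    w = [0, 7, 14, 21, 4, 11, 18]          # solutions of [8] for q = 24, p = 7
    print(derivative(w, 24))               # derivative with period 7
    print(is_most_homogeneous(w, 24))
    print(is_most_homogeneous(range(7), 24))  # seven adjacent particles
```

For the configuration of Figure 4 the iteration passes through periods 24, 7, 4 and 1 before reaching the full configuration, whereas a cluster of seven adjacent particles already fails the homogeneity test at the first step.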

One of the most important results in one dimension
is that only most homogeneous configurations
are present in the canonical phase diagram, for $U$
large enough and for equal densities $\rho_e = \rho_c$.


Theorem 2 Suppose that $\rho_e = \rho_c = p/q$. There
exists a constant $c$ such that for $U > c\,4^q$, the only
ground-state configuration is the most homogeneous
configuration, given by [8] (together with translations
and reflections).


This theorem was established using the expansion [1]
of $E_\Lambda(N_e,w)$ in powers of $U^{-1}$. It suggests a devil’s
staircase structure with infinitely many domains.
However, the number of domains for fixed $U$ could
still be finite. Results from Theorem 2 are illustrated
in Figure 5. Notice that $\rho_e = \rho_c$ when $\mu_e$ is in
the universal gap. These results have been extended
to positive temperatures by using “quantum
Pirogov-Sinai theory” (Datta et al. 1999).

Figure 5 Grand-canonical ground-state phase diagram in one
dimension for $U > 4$ and $\mu_e$ in the universal gap. Chessboard
configurations occur in the black domain. Dark gray oblique
domains correspond to densities 1/5, 1/4, 1/3, 2/3, 3/4, 4/5. Total
width of these domains is of order $U^{-1}$.

For small U, on the other hand, one can use a 
(nonrigorous) Wigner—Brillouin degenerate pertur- 
bation theory (a standard tool in band theory). 
Let $\rho_e = p/q$ with $p$ relatively prime to $q$, and $w$ be
a periodic configuration with period $nq$, $n \in \mathbb{N}$.
Then for $U$ small enough ($U < 1/q$), we obtain
the following expansion for the ground-state energy
(Freericks et al. 1996):

\[ e(\rho_e,w) = -\frac{2}{\pi}\sin\pi\rho_e - U\rho_e\,\rho_c(w) - \frac{|\hat w(\rho_e)|^2}{4\pi\sin\pi\rho_e}\,U^2|\log U| + O(U^2), \qquad [10] \]

where $\hat w(\rho_e)$ is the “structure factor” of the periodic
configuration $w$, namely

\[ \hat w(\rho_e) = \frac{1}{nq}\sum_{j=0}^{nq-1} e^{2\pi i \rho_e j}\, w_j. \]
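For a concrete feeling of what maximizing the structure factor means, one can enumerate all configurations with a small period. The sketch below is illustrative only and rests on two assumptions: the structure factor is taken as $|\hat w(\rho_e)| = \frac{1}{q}\big|\sum_x e^{2\pi i \rho_e x} w_x\big|$ for period $q$ (that is, $n = 1$), and the brute-force search is restricted to that single period.

```python
import cmath
from itertools import combinations


def structure_factor(sites, p, q):
    """|w_hat(rho_e)| for rho_e = p/q and a period-q configuration
    given by the list of its occupied sites."""
    return abs(sum(cmath.exp(2j * cmath.pi * p * x / q) for x in sites)) / q


# Small example: rho_e = 3/10 and three classical particles per period.
p, q, r = 3, 10, 3

# Brute force over all ways of placing r particles on q sites.
best = max(combinations(range(q), r), key=lambda s: structure_factor(s, p, q))

# Configuration given by formula [8]: k_j = p^{-1} * j mod q.
formula8 = [(pow(p, -1, q) * j) % q for j in range(r)]

if __name__ == "__main__":
    print("brute-force maximizer:", best, structure_factor(best, p, q))
    print("formula [8]          :", sorted(formula8), structure_factor(formula8, p, q))
    print("three adjacent sites :", structure_factor([0, 1, 2], p, q))
```

In this small case the configuration from [8] attains the brute-force maximum, because its phases $e^{2\pi i p k_j/q} = e^{2\pi i j/q}$ are consecutive $q$-th roots of unity, while three adjacent particles give a much smaller value.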


This expansion suggests that the ground-state 
configuration can be found by maximizing the 
structure factor. The following theorem holds 
independently of U. 


Theorem 3 Let $\rho_e = p/q$. There exist $r_1 \ge q/4$ and
$r_2 \le 3q/4$ such that the configurations maximizing
the structure factor are given as follows:

(i) for $\rho_c = r/q$ with $r_1 \le r \le r_2$, use the
formula [8];

(ii) for $\rho_c \in (r/q,(r+1)/q)$ with $r_1 \le r \le r_2 - 1$, the
configuration is a mixture of those for $\rho_c = r/q$ and
$\rho_c = (r+1)/q$; and

(iii) for $\rho_c \in (0,r_1/q)$, the configurations are mixtures
of $w \equiv 0$ and that for $\rho_c = r_1/q$. For $\rho_c \in (r_2/q,1)$,
the configurations are mixtures of $w \equiv 1$
and that for $\rho_c = r_2/q$.


Figure 6 Grand-canonical ground-state phase diagram for
$U \approx 0.4$. Enlarged are domains for $\rho_e = 1/7$ and $2/7$, with the
same densities $\rho_c = 2/7, 3/7, 4/7$.

Some insight for low densities is provided by
computing the energy of just one classical particle
and one electron on the infinite line, and comparing
it with that of two consecutive classical particles and two
electrons. It turns out that the former is more
favorable than the latter for $U > 2/\sqrt{3} \approx 1.15$,
while “molecules” of two particles are forming
when $U < 2/\sqrt{3}$. Smaller $U$ shows even bigger
molecules for $\rho_c = n\rho_e$, and $n$-molecules are most
homogeneously distributed according to the formula [8].
It should be stressed that the canonical
ground state cannot be periodic if $U$ is small and
$\rho_c \notin [1/4, 3/4]$, which is different from the case of
large $U$.
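The threshold $2/\sqrt{3}$ can be reproduced with an elementary bound-state calculation. The closed forms used below are not taken from the text; they are a reader's sketch under the assumption that the electron hopping amplitude is $-1$ and that each classical particle contributes an on-site energy $-U$. A single particle then binds one electron with energy $-\sqrt{U^2+4}$, while two adjacent particles have a bonding level $-(1+U) - 1/(1+U)$; for $U < 2$ the antibonding level is not bound, so the second electron is placed at the band bottom $-2$.

```python
from math import sqrt


def energy_two_atoms(U):
    """Two far-apart classical particles, each binding one electron."""
    return -2.0 * sqrt(U * U + 4.0)


def energy_molecule(U):
    """Two adjacent classical particles with two electrons (valid for 0 < U < 2)."""
    bonding = -(1.0 + U) - 1.0 / (1.0 + U)
    band_bottom = -2.0   # antibonding state is unbound for U < 2
    return bonding + band_bottom


def crossover(lo=0.5, hi=1.9, tol=1e-12):
    """Bisection for the value of U at which both energies coincide."""
    f = lambda U: energy_two_atoms(U) - energy_molecule(U)
    assert f(lo) > 0 > f(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print("crossover U :", crossover())
    print("2/sqrt(3)   :", 2 / sqrt(3))
```

Under these assumptions the two energies cross at $U = 2/\sqrt{3}$: molecules are favored below this value and isolated particles above it.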

Only numerical results are available for
intermediate $U$. They suggest that configurations
occurring in the phase diagram are essentially given
by Theorem 3 (together with the segregated
configuration). This is sketched in Figure 6, where bold
coexistence lines for $\mu_e > -U - 2$ and $\mu_e < 2$
represent segregated states.


Ground-State Configurations - Dimension 2 


We discuss the canonical ensemble only, but many 
results extend to the grand-canonical ensemble. 
Recall that $G_{\mathrm{can}}(1/2,1/2)$ consists of the two
chessboard configurations for any $U > 0$, and
that segregation takes place when $\rho_e \ne \rho_c$, provided
$U$ is large enough (Theorem 1). Other results
deal with the case of equal densities, and for $U$
large enough (see Haller and Kennedy (2001), and
references therein).


Theorem 4 Let $\rho_e = \rho_c \equiv \rho \le 1/2$.

(i) If

\[ \rho \in \left\{ \tfrac{1}{2}, \tfrac{2}{5}, \tfrac{1}{3}, \tfrac{1}{4}, \tfrac{2}{9}, \tfrac{1}{5}, \tfrac{2}{11}, \tfrac{1}{6} \right\}, \]

then for $U$ large enough, the ground-state
configurations are those displayed in Figure 7.
If $\rho = 1/(n^2 + (n+1)^2)$ with integer $n$, then for
$U$ large enough (depending on $\rho$), the ground-state
configurations are periodic.

(ii) If $\rho$ is a rational number between 1/3 and 2/5,
then for $U$ large enough (depending on the
denominator of $\rho$), the ground-state configurations
are periodic. Further, the restriction to
any horizontal line is a one-dimensional periodic
configuration given by [8], and the configuration
is constant along one of two lattice directions.

(iii) Suppose that $U$ is large enough. If $\rho \in (1/6, 2/11)$,
the ground-state configurations are
mixtures of the configurations $\rho = 1/6$ and
$\rho = 2/11$ of Figure 7. If $\rho \in (1/5, 2/9)$, the
ground-state configurations are mixtures of the
configurations $\rho = 1/5$ and $\rho = 2/9$. If $\rho \in (2/9, 1/4)$,
the ground-state configurations are
mixtures of the configurations $\rho = 2/9$ and
$\rho = 1/4$.

Figure 7 Ground-state configurations for several densities.
Occupied sites are denoted by black circles, empty sites by
white circles. Lines are present only to clarify the patterns.

Figure 8 Canonical ground-state phase diagram in two
dimensions for $U > 8$.


The canonical phase diagram for $\rho_e = \rho_c$ is presented
in Figure 8.

The situation for densities p < 1/2 that are not 
mentioned in Theorem 4 is unknown. All these 
periodic configurations are present in the grand- 
canonical phase diagram as well. Theorem 4(ii) 
suggests that the two-dimensional situation is similar 
to the one-dimensional one where a devil’s staircase 
structure may occur. Let us stress that no periodic
configurations occur for large $U$ and densities $\rho_e = \rho_c$
in the intervals (1/6, 2/11), (1/5, 2/9), and (2/9, 1/4).
This resembles the one-dimensional situation, but for
small $U$.


See also: Quantum Spin Systems; Quantum Statistical 
Mechanics: Overview; Fermionic Systems; Hubbard 
Model; Pirogov—Sinai Theory. 
Introduction


On the one hand, quantum mechanics and classical 
mechanics appear to be formulated within quite 
different mathematical frameworks, that is, in terms 
of Hilbert spaces and operators on Hilbert spaces on 
the quantum side and in terms of phase spaces, that 
is, symplectic, or more generally, Poisson manifolds, 
and functions on these phase spaces on the classical 
side. On the other hand, there is a strong structural 
similarity between the algebras of observable quan- 
tities in both theories which are associative *-algebras 
over $\mathbb{C}$. In the classical case, the algebra is commutative,
the product being the pointwise product of
functions on the phase space, and is endowed with the
additional structure of a Poisson bracket by means of
which the dynamics of the system can be formulated.
In the quantum case, the product is the noncommutative
composition of operators on a Hilbert space
and the dynamics is determined by the corresponding
commutator. The difference between functions on a
phase space and operators on a Hilbert space
constitutes the main difficulty for the passage from a
classical theory to the corresponding quantum theory,
which would be desirable since a formulation of the
more fundamental but much less intuitive quantum
theory is often impossible. Even the consideration of
the classical limit leads to the same problem of
comparing quite different mathematical objects. One
possibility to avoid these problems, which is the basic
idea of deformation quantization, is to pass from
classical observables to quantum observables not by
changing the underlying vector space, but only by
deforming the algebraic structures, namely the associative
product and possibly the *-involution.

This idea motivates the following definition of a
star product by Bayen et al. (1978), which collects
the minimal demands made on a suitable
quantization:


Definition 1 A star product on a Poisson manifold
$(M,\Pi)$ is an associative $\mathbb{C}[[\nu]]$-bilinear product $\star$ on
$C^\infty(M)[[\nu]]$ such that, writing $f \star g = \sum_{r=0}^{\infty} \nu^r C_r(f,g)$
for $f, g \in C^\infty(M)$ with $\mathbb{C}$-bilinear maps $C_r$ with values
in $C^\infty(M)$, the following properties hold:

(i) $C_0(f,g) = fg$,
(ii) $C_1(f,g) - C_1(g,f) = \{f,g\}$, and
(iii) $1 \star f = f = f \star 1$.

In case the $\mathbb{C}$-bilinear maps $C_r$ are differential
operators, the star product is called differential. If
$\overline{f \star g} = \bar g \star \bar f$, then $\star$ is called Hermitian.
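The standard example satisfying these conditions is the Weyl-Moyal product on $\mathbb{R}^2$ with coordinates $(q,p)$ and $\{q,p\} = 1$. The sketch below implements it for polynomials with exact rational arithmetic; it is an illustration, not part of the original text. Polynomials are stored as dictionaries mapping `(a, b, n)` to the coefficient of $q^a p^b \nu^n$, and the normalization $f \star g = \mu \circ \exp\big(\tfrac{\nu}{2}(\partial_q \otimes \partial_p - \partial_p \otimes \partial_q)\big)(f \otimes g)$ is an assumption chosen so that condition (ii) holds.

```python
from fractions import Fraction
from math import comb, factorial


def diff(poly, dq, dp):
    """Apply (d/dq)^dq (d/dp)^dp to {(a, b, n): c}, meaning c * q^a p^b nu^n."""
    out = {}
    for (a, b, n), c in poly.items():
        if a >= dq and b >= dp:
            factor = (factorial(a) // factorial(a - dq)) * (factorial(b) // factorial(b - dp))
            key = (a - dq, b - dp, n)
            out[key] = out.get(key, 0) + c * factor
    return out


def mul(f, g, extra_nu=0, scale=1):
    """Pointwise product, multiplied by scale * nu^extra_nu."""
    out = {}
    for (a, b, n), c in f.items():
        for (a2, b2, n2), c2 in g.items():
            key = (a + a2, b + b2, n + n2 + extra_nu)
            out[key] = out.get(key, 0) + scale * c * c2
    return out


def add(f, g):
    """Sum of two polynomials, dropping zero coefficients."""
    out = dict(f)
    for key, c in g.items():
        out[key] = out.get(key, 0) + c
    return {key: c for key, c in out.items() if c != 0}


def star(f, g):
    """Weyl-Moyal product; the series terminates for polynomial f."""
    max_deg = max((a + b for (a, b, _) in f), default=0)
    result = {}
    for n in range(max_deg + 1):
        for k in range(n + 1):
            coeff = Fraction((-1) ** k * comb(n, k), 2 ** n * factorial(n))
            term = mul(diff(f, n - k, k), diff(g, k, n - k), extra_nu=n, scale=coeff)
            result = add(result, term)
    return result


q = {(1, 0, 0): Fraction(1)}
p = {(0, 1, 0): Fraction(1)}

if __name__ == "__main__":
    print("q * p =", star(q, p))
    print("p * q =", star(p, q))
```

Here $q \star p = qp + \nu/2$ and $p \star q = qp - \nu/2$, so $C_0$ is the pointwise product and $C_1(q,p) - C_1(p,q) = 1 = \{q,p\}$; associativity can be spot-checked on any three polynomials.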





The conditions (i) and (ii) express the correspondence
principle in deformation quantization and, in
case the star product converges, the formal parameter
is to be identified with $i\hbar$, whence we set
$\bar\nu = -\nu$, considering the formal parameter as purely
imaginary. Since the Fedosov star products we are
going to study in the sequel are differential, we shall
drop stressing this property explicitly and refer to
differential star products merely as star products.

One main advantage of deformation quantization 
is that one has the following very general existence 
result: 


Theorem 1 On every Poisson manifold $(M,\Pi)$
there exist (even differential) star products.


This theorem was first shown by DeWilde and
Lecomte (1983) for the symplectic case and
independently by Fedosov (1985), who gave a
beautiful explicit construction using geometrical
structures on $(M,\omega)$ to build a star product
recursively. Omori et al. (1991) gave yet another
existence proof of star products on a symplectic
manifold $(M,\omega)$ that appears to combine
the methods of DeWilde and Lecomte (1983)
and Fedosov (1985). The proof of existence
on general Poisson manifolds is due to Kontsevich
(2003) and is a consequence of Kontsevich’s
formality theorem.

If $S = \mathrm{id} + \sum_{r=1}^{\infty} \nu^r S_r$ is a formal series of differential
operators on $C^\infty(M)$ with $S_r 1 = 0$ for $r \ge 1$,
then

\[ f \star' g := S^{-1}\big((Sf) \star (Sg)\big) \qquad [1] \]

again defines a star product. Clearly, $\star'$ is Hermitian
in case $\star$ is Hermitian and $\overline{Sf} = S\bar f$ for all
$f \in C^\infty(M)[[\nu]]$. The above observation of the shape
of certain isomorphisms between star product
algebras gives rise to the notion of equivalence of
star products:


Definition 2 Two star products $\star$ and $\star'$ on $(M,\Pi)$
are called equivalent in case there is a formal series
$S = \mathrm{id} + \sum_{r=1}^{\infty} \nu^r S_r$ of differential operators on
$C^\infty(M)$ with $S_r 1 = 0$ for $r \ge 1$ such that eqn [1] is
satisfied for all $f, g \in C^\infty(M)[[\nu]]$.


The full classification of star products up to
equivalence was first obtained in the symplectic case
by Nest and Tsygan (1995) and independently by
Deligne (1995) and Bertelson et al. (1997). The
general Poisson case again follows from Kontsevich’s
formality theorem. In particular, in the symplectic
case, star products are classified in a functorial way
by the characteristic class

\[ c(\star) \in \frac{[\omega]}{\nu} + H^2_{\mathrm{dR}}(M)[[\nu]] \qquad [2] \]

defined by Deligne, which induces a bijection
between the equivalence classes of star products
and $[\omega]/\nu + H^2_{\mathrm{dR}}(M)[[\nu]]$. Moreover, it has been
shown (Bertelson et al. 1997, Deligne 1995, Nest
and Tsygan 1995) that every star product on a
symplectic manifold is equivalent to a Fedosov star
product. This fact can also be seen as a direct
consequence of the explicit computation of the
characteristic class of a Fedosov star product
(cf. Neumaier (2002)). The importance of Fedosov’s
construction for the general theory of deformation
quantization in the symplectic case is also shown by
the fact that in many proofs Fedosov star
products were used as reference star products
with which to compare a given star product. Moreover,
there is a great variety of modifications and
generalizations of Fedosov’s method, and there are
many examples where additional structures on the
symplectic manifold suggest looking for star
products adapted to them, where modified
Fedosov constructions can be applied successfully.


Fedosov Star Products on $(M,\omega)$


The attempt to construct a star product step by step
in fact leads to a cohomological problem, where
a priori an obstruction in the third Hochschild
cohomology of $C^\infty(M)$ occurs. This problem results
from the demand for associativity, which is the
most restrictive condition on a star product. Therefore,
additional arguments are necessary to show
that these obstructions can be circumvented, since
the cohomology concerned is isomorphic to
$\Gamma^\infty(\bigwedge^3 TM)$ and hence, for $\dim(M) \ge 3$, is not
trivial at all.

The basic strategy of Fedosov’s construction to
build in associativity of the resulting product is
to begin with a “very large” associative algebra
$(\mathcal{W}\otimes\Lambda, \circ)$, where $\circ$ mimics the well-known
Weyl-Moyal star product on a vector space with a
constant symplectic Poisson tensor, and to specify a
suitable subalgebra which is in bijection to
$C^\infty(M)[[\nu]]$. Pulling back the product to the subalgebra
then clearly results in an associative product
on $C^\infty(M)[[\nu]]$, but as we shall see later on, one has
to take care that the bijection is sufficiently nontrivial
in order to obtain in fact a nontrivial deformation of
the usual pointwise product on $C^\infty(M)[[\nu]]$.

Defining

\[ \mathcal{W}\otimes\Lambda := \Big( \prod_{s=0}^{\infty} \Gamma^\infty\big(\textstyle\bigvee^s T^*M \otimes \bigwedge T^*M\big) \Big)[[\nu]], \qquad [3] \]

$\mathcal{W}\otimes\Lambda$ becomes in a natural way an associative,
supercommutative algebra using the symmetric
$\vee$-product in the first factor and the antisymmetric
$\wedge$-product in the second factor. This product is
denoted by $\mu(a \otimes b) = ab$ for $a, b \in \mathcal{W}\otimes\Lambda$. By
$\mathcal{W}\otimes\Lambda^k$ we denote the elements of antisymmetric
degree $k$ and set $\mathcal{W} := \mathcal{W}\otimes\Lambda^0$. Besides this pointwise
product, the Poisson tensor $\Pi$ corresponding to $\omega$ gives
rise to another associative product $\circ$ on $\mathcal{W}\otimes\Lambda$ by

\[ a \circ b = \mu\Big(\exp\big(\tfrac{\nu}{2}\,\Pi^{ij}\, i_s(\partial_i) \otimes i_s(\partial_j)\big)(a \otimes b)\Big), \qquad [4] \]

which is a deformation of $\mu$. Here $i_s(Y)$ denotes the
symmetric insertion of a vector field $Y \in \Gamma^\infty(TM)$
and similarly $i_a(Y)$ shall be used to denote the
antisymmetric insertion of a vector field. We set
$\mathrm{ad}(a)b := [a,b]$, where the latter denotes the
$\deg_a$-graded supercommutator with respect to $\circ$.
Denoting the obvious degree maps by $\deg_s$, $\deg_a$,
and $\deg_\nu = \nu\partial_\nu$, one observes that they all are
derivations with respect to $\mu$ but $\deg_s$ and $\deg_\nu$ fail
to be derivations with respect to $\circ$. Instead,
$\mathrm{Deg} := \deg_s + 2\deg_\nu$ is a derivation of $\circ$ and hence
$(\mathcal{W}\otimes\Lambda, \circ)$ is formally Deg-graded and the corresponding
degree is referred to as the total degree.
Sometimes we write $\mathcal{W}_k\otimes\Lambda$ to denote the elements
of total degree $\ge k$. The total degree can be used to
define an ultrametric $d$ on $\mathcal{W}\otimes\Lambda$ and it is known
that $(\mathcal{W}\otimes\Lambda, d)$ is complete, which implies that
Banach’s fixed-point theorem can be applied in this
setting. This observation is important since all the
proofs of existence and uniqueness of certain
elements in $\mathcal{W}\otimes\Lambda$ we shall construct in the sequel
can be reduced to the application of this theorem.
In local coordinates, we define the differential

\[ \delta := (1 \otimes dx^i)\, i_s(\partial_i), \qquad [5] \]

which satisfies $\delta^2 = 0$ and is a superderivation of $\circ$.
Evaluated at a point $m \in M$, the product $a(m)b(m)$
of two elements $a, b \in \mathcal{W}\otimes\Lambda$ can be considered as
the $\wedge$-product of two differential forms with polynomial
coefficient functions on the vector space
$T_mM$. Interpreted this way, the restriction of $\delta$ to the
fiber at $m$ is nothing but the exterior derivative of
differential forms with polynomial coefficients.
Hence, it is clear that there is a homotopy operator
$\delta^{-1}$ satisfying

\[ \delta\delta^{-1} + \delta^{-1}\delta + \sigma = \mathrm{id}, \qquad [6] \]
where $\sigma : \mathcal{W}\otimes\Lambda \to C^\infty(M)[[\nu]]$ denotes the projection
onto the part of symmetric and antisymmetric
degree 0. With the above view of $\delta$, this is just the
Poincaré Lemma for differential forms with polynomial
coefficients, which says that all the cohomology
spaces vanish except for the one of degree 0
and the cohomology in degree 0 is just given by the
constant functions on the vector space $T_mM$. This
means that the $\delta$-cohomology on $\mathcal{W}\otimes\Lambda$ is trivial
except for the space of degree 0, which is given by
the formal functions on $M$. For computational
purposes, it is useful to have a concrete formula of
the homotopy operator $\delta^{-1}$, which is given by

\[ \delta^{-1} a := \begin{cases} \dfrac{1}{k+l}\,(dx^i \otimes 1)\, i_a(\partial_i)\, a & \text{for } \deg_s a = ka,\ \deg_a a = la \text{ with } k + l \ne 0, \\[1ex] 0 & \text{else.} \end{cases} \qquad [7] \]

Now $\ker(\delta) \cap \mathcal{W} = C^\infty(M)[[\nu]]$ and one might
wonder whether this subalgebra of $(\mathcal{W}\otimes\Lambda, \circ)$ is
already suitable to induce a deformed product on
$C^\infty(M)[[\nu]]$ by pulling back the product $\circ$ from $\mathcal{W}\otimes\Lambda$.
Evidently, the answer to the question is negative
since the resulting product just gives back the
undeformed pointwise product of formal functions
on $M$. Hence one has to find a less trivial
superderivation of the product $\circ$ the kernel of
which is still in bijection to $C^\infty(M)[[\nu]]$. The
essential new component of Fedosov’s construction
is a superderivation of $(\mathcal{W}\otimes\Lambda, \circ)$ that is not
$C^\infty(M)[[\nu]]$-linear and hence in a certain sense
generates derivatives along the base manifold $M$.
Using a torsion-free symplectic connection $\nabla$ on $M$
we define an endomorphism also denoted by $\nabla$ of
$\mathcal{W}\otimes\Lambda$ by

\[ \nabla := (1 \otimes dx^i)\, \nabla_{\partial_i}, \qquad [8] \]

which turns out to be a superderivation of $\circ$ due to
the fact that $\nabla\omega = \nabla\Pi = 0$. The map $\nabla$ satisfies the
identities

\[ [\delta, \nabla] = 0, \quad \text{since the connection is torsion-free,} \qquad [9] \]

\[ \nabla^2 = -\frac{1}{\nu}\,\mathrm{ad}(R), \quad \text{where } R := \tfrac{1}{4}\,\omega_{it} R^{t}_{jkl}\, dx^i \vee dx^j \otimes dx^k \wedge dx^l \in \mathcal{W}\otimes\Lambda^2 \qquad [10] \]

involves the curvature of the connection. Moreover,
we have

\[ \delta R = 0 = \nabla R \qquad [11] \]

by the Bianchi identities.

Now one could consider the superderivation $-\delta + \nabla$
of $(\mathcal{W}\otimes\Lambda, \circ)$ and try to define a mapping $\tau$ from
$C^\infty(M)[[\nu]]$ to $\ker(-\delta + \nabla) \cap \mathcal{W}$ such that $\sigma(\tau(f)) = f$
for all $f \in C^\infty(M)[[\nu]]$. But in case the curvature of
the connection does not vanish, the necessary
condition for the solvability of the equation
$(-\delta + \nabla)\tau(f) = 0$ subject to the additional condition
$\sigma(\tau(f)) = f$ is not satisfied. Only in case there is a
torsion-free symplectic connection on $M$ with
vanishing curvature can this procedure be carried
through, and it yields again the Weyl-Moyal star
product, since the fact that $\nabla$ is symplectic in this
case implies that the components of the Poisson
tensor are constant. However, in general, the kernel
of $-\delta + \nabla$ does not have the desired properties to
specify a suitable subalgebra of $(\mathcal{W}\otimes\Lambda, \circ)$ and one
makes the ansatz

\[ \mathfrak{D} = -\delta + \nabla - \frac{1}{\nu}\,\mathrm{ad}(r) \qquad [12] \]

with an element $r \in \mathcal{W}_3\otimes\Lambda^1$ for a suitable superderivation.
Now a direct computation yields that

\[ \mathfrak{D}^2 = \frac{1}{\nu}\,\mathrm{ad}\Big(\delta r - \nabla r + \frac{1}{\nu}\, r \circ r - R\Big), \qquad [13] \]

which vanishes iff $\delta r - \nabla r + (1/\nu)\, r \circ r - R$ is a
central element in $\mathcal{W}\otimes\Lambda^2$. This is the case iff
there is a formal series of 2-forms
$\Omega \in \nu\Gamma^\infty(\bigwedge^2 T^*M)[[\nu]]$ with

\[ \delta r - \nabla r + \frac{1}{\nu}\, r \circ r - R = 1 \otimes \Omega. \qquad [14] \]
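The direct computation behind [13] can be sketched as follows. This is a reader's sketch rather than a quotation of Fedosov's proof; it uses only $\delta^2 = 0$, the identities [9] and [10], the fact that $\delta$ and $\nabla$ are superderivations of $\circ$ (so that $[\delta,\mathrm{ad}(r)] = \mathrm{ad}(\delta r)$ and $[\nabla,\mathrm{ad}(r)] = \mathrm{ad}(\nabla r)$), and the fact that $r$ has antisymmetric degree 1, so that $[r,r] = 2\, r \circ r$. The calligraphic symbols are the notation assumed throughout this reconstruction.

```latex
\begin{align*}
\mathfrak{D}^2 = \tfrac12[\mathfrak{D},\mathfrak{D}]
 &= \delta^2 - [\delta,\nabla] + \nabla^2
    + \tfrac{1}{\nu}[\delta,\mathrm{ad}(r)] - \tfrac{1}{\nu}[\nabla,\mathrm{ad}(r)]
    + \tfrac{1}{\nu^2}\,\mathrm{ad}(r)^2 \\
 &= -\tfrac{1}{\nu}\,\mathrm{ad}(R) + \tfrac{1}{\nu}\,\mathrm{ad}(\delta r)
    - \tfrac{1}{\nu}\,\mathrm{ad}(\nabla r) + \tfrac{1}{2\nu^2}\,\mathrm{ad}([r,r]) \\
 &= \tfrac{1}{\nu}\,\mathrm{ad}\Big(\delta r - \nabla r + \tfrac{1}{\nu}\, r\circ r - R\Big).
\end{align*}
```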


After these preparations, one is in the position to 
prove the following theorem: 


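Theorem 2 (Fedosov 1994, theorem 3.2; Fedosov
1996, theorem 5.2.2). For every formal series
$\Omega \in \nu\Gamma^\infty(\bigwedge^2 T^*M)[[\nu]]$ of closed 2-forms there
exists a unique element $r \in \mathcal{W}_3\otimes\Lambda^1$ such that

\[ \delta r = \nabla r - \frac{1}{\nu}\, r \circ r + R + 1 \otimes \Omega \quad \text{and} \quad \delta^{-1} r = 0. \qquad [15] \]

Moreover, $r$ satisfies

\[ r = \delta^{-1}\Big(R + 1 \otimes \Omega + \nabla r - \frac{1}{\nu}\, r \circ r\Big), \qquad [16] \]

from which $r$ can be determined recursively. In this
case the Fedosov derivation

\[ \mathfrak{D} := -\delta + \nabla - \frac{1}{\nu}\,\mathrm{ad}(r) \qquad [17] \]

is a superderivation of antisymmetric degree 1 and
has square zero: $\mathfrak{D}^2 = 0$.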
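To see how [16] determines $r$ recursively, one can sort both sides by total degree. The sketch below is an illustration under stated assumptions: it writes $\Omega = \sum_{i \ge 1} \nu^i \Omega_i$ and $r = \sum_{k \ge 0} r^{(k+3)}$ with $r^{(m)}$ homogeneous of total degree $m$ (a notation introduced here only for this purpose). Since $\delta^{-1}$ raises the total degree by one, $\nabla$ preserves it, $\circ$ is Deg-graded, and the factor $1/\nu$ lowers the degree by two, comparing degrees in [16] gives

```latex
\begin{align*}
r^{(3)} &= \delta^{-1}\big(R + \nu\, 1\otimes\Omega_1\big), \\
r^{(k+3)} &= \delta^{-1}\Big(\nabla r^{(k+2)}
   - \frac{1}{\nu}\sum_{l=1}^{k-1} r^{(l+2)}\circ r^{(k-l+2)}
   + \epsilon_k\, \nu^{\frac{k}{2}+1}\, 1\otimes\Omega_{\frac{k}{2}+1}\Big),
   \qquad k \ge 1,
\end{align*}
% where \epsilon_k = 1 for even k and \epsilon_k = 0 for odd k.
```

Each right-hand side involves only components of $r$ of lower total degree, which is what makes the recursion well defined.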


For obvious reasons Fedosov calls $\mathfrak{D}$ a flat or
abelian connection for the bundle $\mathcal{W}\otimes\Lambda$, and $\omega + \Omega$ is
referred to as the central Weyl curvature of the
connection $\mathfrak{D}$. In some sense, the flatness property
$\mathfrak{D}^2 = 0$ guarantees that there are sufficiently many flat
sections. Before investigating the structure of
$\ker(\mathfrak{D}) \cap \mathcal{W}$ we note that the $\mathfrak{D}$-cohomology is trivial
on elements $a$ with positive antisymmetric degree
since one has the following homotopy formula:

\[ \mathfrak{D}\mathfrak{D}^{-1} a + \mathfrak{D}^{-1}\mathfrak{D} a = a, \]

where

\[ \mathfrak{D}^{-1} a := -\delta^{-1}\left(\frac{1}{\mathrm{id} - [\delta^{-1}, \nabla - \frac{1}{\nu}\,\mathrm{ad}(r)]}\; a\right) \qquad [18] \]

(cf. Fedosov (1996, theorem 5.2.5)). The reason for
this fact, which is also the crucial point for the proof
of Theorem 1, is the property of the $\delta$-cohomology
to vanish except for the cohomology space of
degree 0.

The next step in Fedosov’s construction now
consists in establishing a bijection between the flat
sections $a \in \mathcal{W}$, that is, those elements of $\mathcal{W}$ with
$\mathfrak{D}a = 0$, and $C^\infty(M)[[\nu]]$.


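Theorem 3 (Fedosov 1994, theorem 3.3, Fedosov
1996, theorem 5.2.4). Let $\mathfrak{D} = -\delta + \nabla - (1/\nu)\,\mathrm{ad}(r) : \mathcal{W}\otimes\Lambda \to \mathcal{W}\otimes\Lambda$
be given as in [17] with $r$ as in [15].

(i) Then for any $f \in C^\infty(M)[[\nu]]$ there exists a
unique element $\tau(f) \in \ker(\mathfrak{D}) \cap \mathcal{W}$ such that

\[ \sigma(\tau(f)) = f \qquad [19] \]

and $\tau : C^\infty(M)[[\nu]] \to \ker(\mathfrak{D}) \cap \mathcal{W}$ is $\mathbb{C}[[\nu]]$-linear
and referred to as the Fedosov-Taylor series
corresponding to $\mathfrak{D}$.

(ii) In addition, $\tau(f)$ can be obtained recursively for
$f \in C^\infty(M)$ from

\[ \tau(f) = f + \delta^{-1}\Big(\nabla\tau(f) - \frac{1}{\nu}\,\mathrm{ad}(r)\,\tau(f)\Big). \qquad [20] \]

Using $\mathfrak{D}^{-1}$ according to [18] one can also write

\[ \tau(f) = f - \mathfrak{D}^{-1}(1 \otimes df) \quad \text{for all } f \in C^\infty(M)[[\nu]]. \qquad [21] \]

(iii) Since $\mathfrak{D}$ as constructed above is a $\circ$-superderivation,
$\ker(\mathfrak{D}) \cap \mathcal{W}$ is a $\circ$-subalgebra and a new
associative product $\star$ for $C^\infty(M)[[\nu]]$, which
turns out to be a star product, is defined by
pullback of $\circ$ via $\tau$:

\[ f \star g := \sigma\big(\tau(f) \circ \tau(g)\big). \qquad [22] \]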
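The name “Fedosov-Taylor series” becomes plausible once the first terms of the recursion [20] are written out. The following is a reader's sketch in a local chart, with the Christoffel symbols defined by $\nabla_{\partial_i}\partial_j = \Gamma^k_{ij}\partial_k$ (a notation assumed here) and with the fibrewise product normalized with the factor $\nu/2$ in front of $\Pi^{ij}$. Only the parts of total degree at most 2 are kept; the term involving $\mathrm{ad}(r)$ first contributes in total degree 3 because $r \in \mathcal{W}_3\otimes\Lambda^1$.

```latex
\begin{align*}
\tau(f) &= f + \delta^{-1}(1\otimes df) + \delta^{-1}\nabla\,\delta^{-1}(1\otimes df) + \dots \\
        &= f + \partial_i f\, dx^i\otimes 1
           + \tfrac12\big(\partial_i\partial_j f - \Gamma^k_{ij}\,\partial_k f\big)\,
             dx^i\vee dx^j\otimes 1 + \dots
\end{align*}
% Inserting this into the definition of the star product, only the parts of
% symmetric degree 1 and order \nu^0 contribute at order \nu:
\begin{equation*}
f\star g = \sigma\big(\tau(f)\circ\tau(g)\big)
         = fg + \tfrac{\nu}{2}\,\Pi^{ij}\,\partial_i f\,\partial_j g + O(\nu^2).
\end{equation*}
```

With $\{f,g\} = \Pi^{ij}\partial_i f\,\partial_j g$ this gives $C_0(f,g) = fg$ and $C_1(f,g) - C_1(g,f) = \{f,g\}$, as required of a star product.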


In the following, we shall refer to the associative
product $\star$ defined above as the Fedosov star product
corresponding to $(\nabla,\Omega)$. The choice of the formal
series of closed 2-forms $\Omega$ in fact has a crucial effect
on the equivalence class of the resulting star
product, whereas the choice of the torsion-free
symplectic connection, which in contrast to a
Riemannian connection is not unique, does not
affect this class. This observation has been the
main step in all the proofs of the classification
results in deformation quantization of symplectic
manifolds. Another way to prove this fact is to
compute the characteristic class $c(\star)$ introduced by
Deligne (1995) directly, using the methods developed in Gutt
and Rawnsley (1999), which yields:


Theorem 4 (Neumaier 2002, theorem 2). Deligne’s
characteristic class $c(\star)$ of a Fedosov star product $\star$ as
constructed above is given by

\[ c(\star) = \frac{1}{\nu}[\omega] + \frac{1}{\nu}[\Omega]. \qquad [23] \]


The properties of $\Omega$ with respect to complex
conjugation also decide whether $\star$ is Hermitian
or not. In case $\Omega$ is real, that is, satisfies $\bar\Omega = \Omega$, it is
easy to show, observing that $\overline{a \circ b} = (-1)^{kl}\, \bar b \circ \bar a$ for
$a \in \mathcal{W}\otimes\Lambda^k$, $b \in \mathcal{W}\otimes\Lambda^l$, that $\bar r$ solves the equations
that uniquely determine $r$ and hence $\bar r = r$. But
then $\mathfrak{D}$ commutes with complex conjugation and
therefore the unique characterization of the Fedosov-Taylor
series yields $\overline{\tau(f)} = \tau(\bar f)$ for all $f \in C^\infty(M)[[\nu]]$,
implying that $\star$ is Hermitian.


Derivations, Automorphisms, 
and Equivalence Transformations 


Having defined the Fedosov star product $\star$ corresponding
to $(\nabla,\Omega)$, the next logical step is to investigate the
structure of its derivations and automorphisms and to
find out how they can be described in the framework of
Fedosov’s construction. In addition, one can ask for an
explicit construction of equivalence transformations
between two Fedosov star products $\star$ and $\star'$ obtained
from $(\nabla,\Omega)$ and $(\nabla',\Omega')$ that exist according to
Theorem 4 iff $[\Omega] = [\Omega']$.

Since the basic philosophy of Fedosov’s construction
is to consider suitable operations on the algebra
$(\mathcal{W}\otimes\Lambda, \circ)$ in order to obtain induced mappings on
the level of $(C^\infty(M)[[\nu]], \star)$, one may expect to be
able to define derivations of $(C^\infty(M)[[\nu]], \star)$ by
considering appropriate fiberwise quasi-inner
derivations of $(\mathcal{W}\otimes\Lambda, \circ)$ of the shape

\[ \mathcal{D}_h = -\frac{1}{\nu}\,\mathrm{ad}(h), \qquad [24] \]

where $h \in \mathcal{W}$ and without loss of generality we
assume $\sigma(h) = 0$. Our aim is to define $\mathbb{C}[[\nu]]$-linear
derivations of $\star$ by $C^\infty(M)[[\nu]] \ni f \mapsto \sigma(\mathcal{D}_h\tau(f))$, but
for an arbitrary element $h \in \mathcal{W}$ with $\sigma(h) = 0$ this
mapping fails to be a derivation as $\mathcal{D}_h$ does not map
elements of $\ker(\mathfrak{D}) \cap \mathcal{W}$ to elements of $\ker(\mathfrak{D}) \cap \mathcal{W}$.
In order to achieve this, the supercommutator of $\mathfrak{D}$
and $\mathcal{D}_h$ has to vanish. As $\mathfrak{D}$ is a $\mathbb{C}[[\nu]]$-linear
$\circ$-superderivation, we obviously have

\[ [\mathfrak{D}, \mathcal{D}_h] = -\frac{1}{\nu}\,\mathrm{ad}(\mathfrak{D}h), \qquad [25] \]

and hence $\mathfrak{D}h$ must be central, that is, $\mathfrak{D}h$
has to be of the shape $1 \otimes B$ with $B \in \Gamma^\infty(T^*M)[[\nu]]$
to have $[\mathfrak{D},\mathcal{D}_h] = 0$. From $\mathfrak{D}^2 = 0$, we get that the
necessary condition for the solvability of the
equation $\mathfrak{D}h = 1 \otimes B$ is the closedness of $B$ since
$\mathfrak{D}(1 \otimes B) = 1 \otimes dB$. But as the $\mathfrak{D}$-cohomology is
trivial on elements with positive antisymmetric
degree, this condition is also sufficient for the
solvability of the equation $\mathfrak{D}h = 1 \otimes B$ and we get
the following statement.


Lemma 1 (Müller-Bahns and Neumaier 2004,
lemma 2.1).

(i) For all formal series $B \in \Gamma^\infty(T^*M)[[\nu]]$ of closed
1-forms on $M$ there is a uniquely determined
element $h_B \in \mathcal{W}$ such that $\mathfrak{D}h_B = 1 \otimes B$ and
$\sigma(h_B) = 0$. Moreover, $h_B$ is explicitly given by

\[ h_B = \mathfrak{D}^{-1}(1 \otimes B). \qquad [26] \]

(ii) For all $B \in Z^1_{\mathrm{dR}}(M)[[\nu]]$ the mapping
$D_B : C^\infty(M)[[\nu]] \to C^\infty(M)[[\nu]]$, where

\[ D_B f := \sigma\big(\mathcal{D}_{h_B}\tau(f)\big) = \sigma\Big(-\frac{1}{\nu}\,\mathrm{ad}(h_B)\,\tau(f)\Big) \qquad [27] \]

for $f \in C^\infty(M)[[\nu]]$, defines a $\mathbb{C}[[\nu]]$-linear derivation
of $\star$ and hence this construction yields a mapping
$Z^1_{\mathrm{dR}}(M)[[\nu]] \ni B \mapsto D_B \in \mathrm{Der}_{\mathbb{C}[[\nu]]}(C^\infty(M)[[\nu]], \star)$.

Furthermore, one can show that one even obtains
all $\mathbb{C}[[\nu]]$-linear derivations of $\star$ by varying $B$
in the derivations $D_B$ constructed above.


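Proposition 1 (Müller-Bahns and Neumaier 2004,
proposition 2.2). The mapping

\[ Z^1_{\mathrm{dR}}(M)[[\nu]] \ni B \mapsto D_B \in \mathrm{Der}_{\mathbb{C}[[\nu]]}(C^\infty(M)[[\nu]], \star) \]

defined in Lemma 1 is a bijection. Moreover, $D_{df}$ is a
quasi-inner derivation for all $f \in C^\infty(M)[[\nu]]$, that is,
$D_{df} = (1/\nu)\,\mathrm{ad}_\star(f)$, and the induced mapping
$[B] \mapsto [D_B]$ from $H^1_{\mathrm{dR}}(M)[[\nu]]$ to
$\mathrm{Der}_{\mathbb{C}[[\nu]]}(C^\infty(M)[[\nu]], \star)/\mathrm{Der}^{\mathrm{qi}}_{\mathbb{C}[[\nu]]}(C^\infty(M)[[\nu]], \star)$,
the space of $\mathbb{C}[[\nu]]$-linear derivations of $\star$ modulo
the quasi-inner derivations, is also bijective.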


Actually, it is well known that for an arbitrary star product
$\star$ on a symplectic manifold the space of
$\mathbb{C}[[\nu]]$-linear derivations is in bijection with
$Z^1_{\mathrm{dR}}(M)[[\nu]]$ and that the quotient space of these
derivations modulo the quasi-inner derivations is in bijection with
$H^1_{\mathrm{dR}}(M)[[\nu]]$ (cf. Bertelson et al. (1997), theorem
4.2), but the remarkable thing about Fedosov star products is that
these bijections can be explicitly expressed in terms of $D$ resp.
$D^{-1}$ in a very lucid way.

Now we turn to the consideration of $\mathbb{C}[[\nu]]$-linear
automorphisms of $\star$. For such automorphisms that start with id,
which are also called self-equivalences, it is known (cf. Gutt and
Rawnsley (1999), Proposition 3.3) for arbitrary star products $\star$
on $(M,\omega)$ that they are of the form

\[ A = \exp(\nu D) \qquad [28] \]

with a $\mathbb{C}[[\nu]]$-linear derivation $D$ of $\star$.
Therefore, the above result about the description of all the
derivations of $\star$ directly yields a complete description of all
self-equivalences of $\star$.
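
As a brief check, not taken from the cited references, the following sketch shows why a map of the form [28] is multiplicative. It uses only the Leibniz rule for a $\mathbb{C}[[\nu]]$-linear derivation $D$ of $\star$; the summation indices $n$, $k$, $l$ are auxiliary.

```latex
% Leibniz rule, iterated n times:
D^n(f \star g) = \sum_{k=0}^{n} \binom{n}{k}\, D^k f \star D^{n-k} g .
% Summing with weights \nu^n / n! and reordering the double sum (l = n - k):
\exp(\nu D)(f \star g)
  = \sum_{n=0}^{\infty} \frac{\nu^n}{n!} \sum_{k=0}^{n} \binom{n}{k}\, D^k f \star D^{n-k} g
  = \Big( \sum_{k=0}^{\infty} \frac{\nu^k}{k!} D^k f \Big) \star
    \Big( \sum_{l=0}^{\infty} \frac{\nu^l}{l!} D^l g \Big)
  = (\exp(\nu D) f) \star (\exp(\nu D) g) .
```

Since $\exp(\nu D) = \mathrm{id} + O(\nu)$, the map starts with id and is invertible with inverse $\exp(-\nu D)$.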

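The description of $\mathbb{C}[[\nu]]$-linear automorphisms that are
not self-equivalences of $\star$ is slightly more involved and we
first need some results about the concrete structure of the
equivalence transformations between two Fedosov star products
$\star$ and $\star'$. To compare two Fedosov star products obtained
from different torsion-free symplectic connections $\nabla$ and
$\nabla'$ and different but cohomologous formal series of 2-forms
$\Omega$ and $\Omega'$, one has to compare the corresponding Fedosov
derivations $D$ and $D'$. First recall some well-known facts about
torsion-free symplectic connections on $(M,\omega)$. Given two such
connections $\nabla$ and $\nabla'$, it is obvious that
$S^{\nabla-\nabla'}(X,Y) := \nabla_X Y - \nabla'_X Y$, where
$X,Y \in \Gamma^\infty(TM)$, defines a symmetric tensor field
$S^{\nabla-\nabla'} \in \Gamma^\infty(\bigvee^2 T^*M \otimes TM)$ on
$M$. Defining
$\sigma^{\nabla-\nabla'}(X,Y,Z) := \omega(S^{\nabla-\nabla'}(X,Y),Z)$,
it is easy to see that
$\sigma^{\nabla-\nabla'} \in \Gamma^\infty(\bigvee^3 T^*M)$ is a
totally symmetric tensor field. Conversely, given an arbitrary
element $\sigma \in \Gamma^\infty(\bigvee^3 T^*M)$ and a symplectic
torsion-free connection $\nabla$, defining
$S^\sigma \in \Gamma^\infty(\bigvee^2 T^*M \otimes TM)$ by
$\sigma(X,Y,Z) = \omega(S^\sigma(X,Y),Z)$, then $\nabla^\sigma$
defined by $\nabla^\sigma_X Y := \nabla_X Y - S^\sigma(X,Y)$ again is
a torsion-free symplectic connection and all such connections can be
obtained this way by varying $\sigma$. Using these relations, one can
compare the corresponding mappings $\nabla$ and $\nabla'$ on
$\mathcal{W}\otimes\Lambda$. With the notations from above we have

\[ \nabla - \nabla' = -(dx^j \otimes dx^i)\, i_s\big(S^{\nabla-\nabla'}(\partial_i,\partial_j)\big) = \frac{1}{\nu}\,\mathrm{ad}\big(T^{\nabla-\nabla'}\big) \qquad [29] \]

where
$T^{\nabla-\nabla'} \in \Gamma^\infty(\bigvee^2 T^*M \otimes T^*M) \subseteq \mathcal{W}\otimes\Lambda^1$
is defined by
$T^{\nabla-\nabla'}(Z,Y;X) := \sigma^{\nabla-\nabla'}(X,Y,Z) = \omega(S^{\nabla-\nabla'}(X,Y),Z)$.
Moreover, $T^{\nabla-\nabla'}$ satisfies the equations

\[ \delta T^{\nabla-\nabla'} = 0 \qquad [30a] \]

and

\[ \nabla T^{\nabla-\nabla'} = R' - R + \frac{1}{\nu}\, T^{\nabla-\nabla'} \circ T^{\nabla-\nabla'} \qquad [30b] \]

where
$R = (1/4)\,\omega_{it} R^t_{jkl}\, dx^i \vee dx^j \otimes dx^k \wedge dx^l$
and
$R' = (1/4)\,\omega_{it} R'^t_{jkl}\, dx^i \vee dx^j \otimes dx^k \wedge dx^l$
denote the corresponding elements of $\mathcal{W}\otimes\Lambda^2$
that are built from the curvature tensors of $\nabla$ and $\nabla'$.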

Now we are in the position to compare two Fedosov derivations $D$
and $D'$ resp. the induced star products $\star$ and $\star'$
obtained from $(\nabla,\Omega)$ and $(\nabla',\Omega')$. The idea for
the construction of an equivalence transformation from $\star$ to
$\star'$ is to look for an automorphism $A_h$ of
$(\mathcal{W}\otimes\Lambda, \circ)$ of the form

\[ A_h = \exp\Big(\frac{1}{\nu}\,\mathrm{ad}(h)\Big) \quad\text{such that}\quad D' = A_h D (A_h)^{-1} \qquad [31] \]

where $h$ is an element of $\mathcal{W}_3$, guaranteeing that $A_h$
is well defined, and without loss of generality is assumed to satisfy
$\sigma(h)=0$. In case one can find such an element $h$, it is clear
that $A_h$ yields a bijection between $\ker(D)\cap\mathcal{W}$ and
$\ker(D')\cap\mathcal{W}$ and hence one would obtain an equivalence
$S_h$ from $\star$ to $\star'$ by defining

\[ S_h f := \sigma(A_h\tau(f)) = \sigma\Big(\exp\Big(\frac{1}{\nu}\,\mathrm{ad}(h)\Big)\tau(f)\Big) \qquad [32a] \]

with inverse

\[ (S_h)^{-1} f = \sigma\big((A_h)^{-1}\tau'(f)\big) = \sigma\Big(\exp\Big(-\frac{1}{\nu}\,\mathrm{ad}(h)\Big)\tau'(f)\Big) \qquad [32b] \]

A direct computation yields

\[ A_h D (A_h)^{-1} = D - \frac{1}{\nu}\,\mathrm{ad}\Big(\frac{\exp((1/\nu)\mathrm{ad}(h)) - \mathrm{id}}{(1/\nu)\mathrm{ad}(h)}\,(Dh)\Big) \]

which is equal to $D'$ iff $h$ has been chosen such that

\[ T^{\nabla-\nabla'} + r' - r - \frac{\exp((1/\nu)\mathrm{ad}(h)) - \mathrm{id}}{(1/\nu)\mathrm{ad}(h)}\,(Dh) \in \mathcal{W}\otimes\Lambda^1 \qquad [33] \]

is a central element. Considering the total degree of the terms in
this expression, this is the case iff there is a formal series of
1-forms $C \in \Gamma^\infty(T^*M)[[\nu]]$ such that the expression
in eqn [33] equals $1\otimes C$. Applying $D$ to this equation and
using the equations that $r$ and $r'$ satisfy together with the
relations [30], it is cumbersome but not difficult to show that
necessarily $\Omega$ and $\Omega'$ have to be cohomologous:

\[ \Omega - \Omega' = dC \qquad [34] \]

with $C$ as above. Now, using [6] one can show that this condition is
in fact sufficient and moreover one can even determine the element
$h$ in question recursively:

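Theorem 5 (Fedosov 1994, theorem 4.3). Two Fedosov star products
$\star$ and $\star'$ obtained from $(\nabla,\Omega)$ and
$(\nabla',\Omega')$ are equivalent iff $\Omega$ and $\Omega'$ are
cohomologous. In case $C \in \nu\Gamma^\infty(T^*M)[[\nu]]$ satisfies
$\Omega - \Omega' = dC$ there is a uniquely determined element
$h_C \in \mathcal{W}_3$ with $\sigma(h_C)=0$ such that

\[ T^{\nabla-\nabla'} + r' - r - \frac{\exp((1/\nu)\mathrm{ad}(h_C)) - \mathrm{id}}{(1/\nu)\mathrm{ad}(h_C)}\,(Dh_C) = 1\otimes C \]

Moreover, $h_C$ can be determined recursively from

\[ h_C = C\otimes 1 + \delta^{-1}\Big(\nabla h_C - \frac{1}{\nu}\,\mathrm{ad}(r)h_C - \frac{(1/\nu)\mathrm{ad}(h_C)}{\exp((1/\nu)\mathrm{ad}(h_C)) - \mathrm{id}}\,\big(r' - r + T^{\nabla-\nabla'}\big)\Big) \qquad [35] \]

and with the so-constructed $h_C$ one has
$D' = A_{h_C} D (A_{h_C})^{-1}$ and thus $S_{h_C}$ according to eqn
[32] defines an equivalence transformation from $\star$ to $\star'$.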


Evidently, in the above construction of the equivalence
transformation $S_{h_C}$ there is some choice of the formal series of
1-forms $C$. Different possible choices $C$ and $\tilde{C}$ differ by
a formal series of closed 1-forms, but choosing $\tilde{C}$ instead
of $C$ amounts to another equivalence transformation
$S_{h_{\tilde{C}}} = A' S_{h_C} A$ from $\star$ to $\star'$, where
$A$ and $A'$ are certain self-equivalences of $\star$ and $\star'$,
respectively. In case $\Omega$ and $\Omega'$ are real, we have seen
that $\star$ as well as $\star'$ are Hermitian star products and it
is easy to verify that choosing a formal series $C$ of 1-forms as
above that is moreover real yields an element $h_C$ satisfying
$\overline{h_C} = h_C$. But then it is evident that the resulting
equivalence transformation is also compatible with complex
conjugation, that is, $\overline{S_{h_C} f} = S_{h_C}\bar{f}$ for
all $f \in C^\infty(M)[[\nu]]$.

Now we are prepared to give a construction of all
$\mathbb{C}[[\nu]]$-linear automorphisms of a Fedosov star product
$\star$. It is easy to show that any $\mathbb{C}[[\nu]]$-linear
automorphism of a star product $\star$ on a symplectic manifold is
the combination of the action of a symplectomorphism
$\psi: M \to M$ and an equivalence between $\star$ and the pullback
$\star'$ via $\psi^{-1}$ of $\star$, which is defined by
$f \star' g = (\psi^{-1})^*((\psi^* f) \star (\psi^* g))$ (cf. Gutt
and Rawnsley (1999), Proposition 9.4). Since the characteristic class
of $\star'$ is given by $c(\star') = (\psi^{-1})^* c(\star)$, the
necessary and sufficient condition for a symplectomorphism $\psi$ to
define a possible zeroth-order term of an automorphism is that
$(\psi^{-1})^* c(\star) = c(\star)$, since $\star'$ and $\star$ have
to be equivalent.

Within Fedosov’s framework, it can be shown that the pullback
$\star'$ via a symplectomorphism $\psi^{-1}$ of $\star$ is identical
to the Fedosov star product obtained from
$(\nabla' = (\psi^{-1})^*\nabla, \Omega' = (\psi^{-1})^*\Omega)$,
which just expresses the functoriality of Fedosov’s construction.
Together with Theorem 4 this particularly shows that
$c(\star') = (\psi^{-1})^* c(\star)$, and therefore $\star'$ is
equivalent to $\star$ iff $\Omega$ and $\Omega'$ differ by a formal
series $dC_\psi$ of exact 2-forms, where
$C_\psi \in \nu\Gamma^\infty(T^*M)[[\nu]]$ clearly depends on $\psi$.
But in this situation one can apply the construction of equivalence
transformations between Fedosov star products given in Theorem 5
with $C$ replaced by $C_\psi$ and $\nabla'$, $\Omega'$ as above,
yielding an equivalence $S_\psi := S_{h_{C_\psi}}$ from $\star$ to
$\star'$. Finally, we therefore get that the combination

\[ A_\psi = \psi^* S_\psi \qquad [36] \]

is a $\mathbb{C}[[\nu]]$-linear automorphism of $\star$ and it is
obvious from the above that every such automorphism can be obtained
by considering all symplectomorphisms $\psi$ of $(M,\omega)$
satisfying $[(\psi^{-1})^*\Omega] = [\Omega]$ and composing the
resulting $A_\psi$ according to [36] with all self-equivalences $A$
of $\star$ according to [28].


Adaptions, Modifications, and Generalizations


The geometrical construction of Fedosov has gone 
through many adaptions and modifications that are 
well suited to the particular geometry of the under- 
lying symplectic manifold. Moreover, there are 
generalizations that go beyond the case of symplec- 
tic manifolds and others that yield more general 
deformations than star products. We just give a few 
important examples that stress the power and 
beauty of Fedosov’s construction. 

On a Kähler manifold, one can define the notion of star products
with separation of variables (cf. Karabegov (1996)) that are also
called star products of Wick type (cf. Bordemann and Waldmann (1997)
and Neumaier (2003)). These are star products such that in local
holomorphic coordinates the bidifferential operators $C_r$ are of
the form

\[ C_r(f,g) = \sum_{K,L} C_r^{K;\bar{L}}\, \frac{\partial^{|K|} f}{\partial z^K}\, \frac{\partial^{|L|} g}{\partial \bar{z}^L} \qquad [37] \]

with certain coefficient functions $C_r^{K;\bar{L}}$. These star
products can be obtained by a modified Fedosov construction starting
from the product $\circ_{\mathrm{Wick}}$ on
$\mathcal{W}\otimes\Lambda$ given by

\[ a \circ_{\mathrm{Wick}} b = \mu \circ \exp\Big(\frac{2\nu}{i}\, g^{k\bar{l}}\, i_s(\partial_{z^k}) \otimes i_s(\partial_{\bar{z}^l})\Big)(a \otimes b) \qquad [38] \]

where $g^{k\bar{l}}$ denotes the components of the inverse of the
Kähler metric in local holomorphic coordinates. In the case of a
Kähler manifold, there is a distinguished torsion-free symplectic
connection, namely the Kähler connection $\nabla$, that induces a
superderivation of $\circ_{\mathrm{Wick}}$ in a way completely
analogous to [8]. With these structures the Fedosov construction
works for an arbitrary formal series $\Omega$ of closed 2-forms as
before, but one can show that the resulting star product is of Wick
type iff $\Omega$ is of type (1, 1) and one can even show that one
obtains all star products of Wick type by varying $\Omega$
(cf. Neumaier (2003)).

In the case of an almost-Kähler manifold, one can consider a product
$\circ'$ on $\mathcal{W}\otimes\Lambda$ similar to
$\circ_{\mathrm{Wick}}$ which is adapted to the almost-complex
structure (cf. Karabegov and Schlichenmaier (2001)). However, in this
situation there is no torsion-free connection that yields a
superderivation of this product but only a connection $\nabla'$ with
torsion that defines such a superderivation. Nevertheless, one can
consider a generalized Fedosov construction. To this end, one shows
that $[\delta, \nabla'] = (1/\nu)\,\mathrm{ad}'(T')$ with some
$T' \in \mathcal{W}\otimes\Lambda^2$ that satisfies $\delta T' = 0$
and encodes the torsion of $\nabla'$, and $\delta R' = \nabla' T'$,
where again $\nabla'^2 = -(1/\nu)\,\mathrm{ad}'(R')$ and $R'$, which
depends on the curvature of $\nabla'$, satisfies $\nabla' R' = 0$.
But then it is easy to show that there is a unique element
$r' \in \mathcal{W}_2\otimes\Lambda^1$ such that

\[ \delta r' = \nabla' r' - \frac{1}{\nu}\, r' \circ' r' + T' + R' + 1\otimes\Omega \quad\text{and}\quad \delta^{-1} r' = 0 \qquad [39] \]

with $\Omega$ as above, which can also be computed recursively.
Clearly, $D' = -\delta + \nabla' - (1/\nu)\,\mathrm{ad}'(r')$ then is
a suitable Fedosov derivation with square zero for $\circ'$ and one
can proceed as described earlier to obtain a star product $\star'$
adapted to the almost-complex structure.

On a cotangent bundle $\pi: T^*Q \to Q$, where $T^*Q$ is equipped
with the canonical symplectic form $\omega_0 = -d\theta_0$, one can
consider (cf. Bordemann et al. (1998)) the following so-called
standard ordered product $\circ_{\mathrm{std}}$ on
$\mathcal{W}\otimes\Lambda$ given by

\[ a \circ_{\mathrm{std}} b = \mu \circ \exp\Big(-\nu\, i_s(\partial_{p_k}) \otimes i_s\big(\partial_{q^k} + p_l\, \pi^*\Gamma^l_{kj}\, \partial_{p_j}\big)\Big)(a \otimes b) \qquad [40] \]

in local Darboux coordinates. Here $\Gamma^l_{kj}$ denotes the
Christoffel symbols of a torsion-free connection $\nabla^0$ on $Q$ in
the chart of $Q$ corresponding to the bundle chart $(q,p)$ and it is
straightforward to see that $\circ_{\mathrm{std}}$ does not depend on
the chosen local coordinates and is associative. In the present
situation, one can define a torsion-free symplectic connection
$\nabla^{T^*Q}$ on $T^*Q$ solely in terms of $\nabla^0$, but then the
corresponding mapping $\nabla^{T^*Q}$ on $\mathcal{W}\otimes\Lambda$
again fails to be a superderivation of $\circ_{\mathrm{std}}$,
whereas the combination
$\nabla^{T^*Q} + B$ with
$B = (\nu/3)\, p_l\, \pi^* R^l_{jki}\, (1\otimes dq^i)\, i_s(\partial_{p_j})\, i_s(\partial_{p_k})$,
where $R^l_{jki}$ denotes the components of the curvature tensor of
$\nabla^0$, turns out to be a suitable superderivation to start the
Fedosov construction with $\circ_{\mathrm{std}}$. In fact, the square
of $\nabla^{T^*Q} + B$ turns out to equal the square of
$\nabla^{T^*Q}$ and all the other preconditions of Fedosov’s
construction are easily verified just replacing $\nabla$ by
$\nabla^{T^*Q} + B$. The particular property of the resulting star
product $\star_{\mathrm{std}}$ for $\Omega = 0$ on $T^*Q$ is that it
is a standard ordered star product, that is, for all
$f \in C^\infty(T^*Q)[[\nu]]$ and all $\chi \in C^\infty(Q)[[\nu]]$
one has

\[ \pi^*\chi \star_{\mathrm{std}} f = (\pi^*\chi)\, f \qquad [41] \]

and hence $\star_{\mathrm{std}}$ in a certain sense is adapted to the
vertical polarization.


The methods mentioned so far can even be merged into a more general
situation, where one considers a (complex) polarization on
$(M,\omega)$ and looks for star products that are adapted to this
polarization, which are then called polarized deformation
quantizations (cf. Donin (2003)). Here again a generalization of
Fedosov’s construction yields the existence and the classification
of such particular star products.

Another recent generalization of Fedosov’s con- 
struction that goes beyond the framework of smooth 
symplectic manifolds is that of the construction of 
star products on symplectic orbispaces (cf. Pflaum 
(2003)), which are stratified symplectic spaces. The 
main idea there is to consider Fedosov’s construction 
in local orbicharts and to show that the changes of 
orbicharts induce isomorphisms between the locally 
defined deformation quantizations, implying that the 
locally defined products match together to define a 
global deformation quantization on the symplectic 
orbispace. To achieve this property, one has to adjust 
the local Fedosov constructions appropriately, that is, 
one has to use locally defined torsion-free symplectic 
connections and formal series of closed 2-forms that 
are related by the changes of the orbicharts. 

Considering a vector bundle $E \to M$, the sections
$\Gamma^\infty(E)$ are naturally a $C^\infty(M)$-right module and a
$\Gamma^\infty(\mathrm{End}(E))$-left module, and it is a natural
question whether this bimodule structure can be deformed such that
$\Gamma^\infty(E)[[\nu]]$ becomes a $(C^\infty(M)[[\nu]], \star)$-right
module and a $(\Gamma^\infty(\mathrm{End}(E))[[\nu]], \star')$-left
module, where $\star'$ is a deformation of the usual composition of
elements of $\Gamma^\infty(\mathrm{End}(E))$. In order to construct
such deformations, one can also adapt Fedosov’s construction (cf.
Waldmann (2002)), considering
$\mathcal{W}\otimes\Lambda\otimes E = \big(\prod_{s=0}^{\infty} \Gamma^\infty(\bigvee^s T^*M \otimes \bigwedge T^*M \otimes E)\big)[[\nu]]$
and
$\mathcal{W}\otimes\Lambda\otimes\mathrm{End}(E) = \big(\prod_{s=0}^{\infty} \Gamma^\infty(\bigvee^s T^*M \otimes \bigwedge T^*M \otimes \mathrm{End}(E))\big)[[\nu]]$
and extending the product $\circ$ to these spaces in a natural way,
making $\mathcal{W}\otimes\Lambda\otimes E$ a
$(\mathcal{W}\otimes\Lambda\otimes\mathrm{End}(E), \circ)$-$(\mathcal{W}\otimes\Lambda, \circ)$-bimodule.
Furthermore, one has to consider a connection $\nabla^E$ that
naturally induces a connection on $\mathrm{End}(E)$, and both have to
be added to $\nabla$ to define the corresponding substitute of
$\nabla$ on the respective space. Then the Fedosov construction with
$\mathcal{W}\otimes\Lambda\otimes\mathrm{End}(E)$ can be considered,
yielding a Fedosov derivation $D^{\mathrm{End}(E)}$ with square zero,
hence a Fedosov–Taylor series $\tau^{\mathrm{End}(E)}$ and an
associative deformation
$F \star' G = \sigma(\tau^{\mathrm{End}(E)}(F) \circ \tau^{\mathrm{End}(E)}(G))$
of the usual composition of sections in the endomorphism bundle.
Moreover, there is a map $D^E$ on $\mathcal{W}\otimes\Lambda\otimes E$
that is a superderivation with respect to the bimodule multiplication
$\circ$ along $D^{\mathrm{End}(E)}$ and $D$, respectively. This map
also has square zero and the intersection of its kernel with the
elements of antisymmetric degree zero is in bijection to
$\Gamma^\infty(E)[[\nu]]$ via a natural generalization $\tau^E$ of
the Fedosov–Taylor series. Defining
$F \bullet' s := \sigma(\tau^{\mathrm{End}(E)}(F) \circ \tau^E(s))$
and $s \bullet f := \sigma(\tau^E(s) \circ \tau(f))$,
$\Gamma^\infty(E)[[\nu]]$ can be given the structure of a
$(\Gamma^\infty(\mathrm{End}(E))[[\nu]], \star')$-$(C^\infty(M)[[\nu]], \star)$-bimodule
which is indeed a deformation of the classical bimodule structure of
$\Gamma^\infty(E)$. It is rather evident that the same procedure also
works for other products on $\mathcal{W}\otimes\Lambda$ and the above
generalizations, in particular for the product
$\circ_{\mathrm{Wick}}$ on a Kähler manifold, where one can obtain
$(\Gamma^\infty(\mathrm{End}(E))[[\nu]], \star'_{\mathrm{Wick}})$-$(C^\infty(M)[[\nu]], \star_{\mathrm{Wick}})$-bimodules
that are adapted to the complex structure in case the curvature
endomorphism of the connection $\nabla^E$ is of type (1,1). For
example, this holds true for (anti-)holomorphic vector bundles
endowed with a Hermitian fiber metric $h$ and the corresponding
connection that is compatible with $h$ and the (anti-)holomorphic
structure.

Finally, the proof of existence of deformation quantizations on
arbitrary Poisson manifolds $(M,\Pi)$ given by Cattaneo et al.
(2002), which includes a concrete construction starting from
Kontsevich’s star product on the flat space $\mathbb{R}^d$ equipped
with a Poisson tensor, is similar in spirit to Fedosov’s
construction. There one constructs two bundles $J^\infty$ and
$J^\infty_\nu$ of associative algebras, where, as a bundle,
$J^\infty_\nu$ is isomorphic to $J^\infty[[\nu]]$ and $J^\infty$ is
the bundle of infinite jets of smooth functions on $M$, which is
equipped with the canonical flat connection $D_0$. The Poisson tensor
gives rise to the structure of a Poisson algebra on each fiber of
$J^\infty$ and the canonical map $C^\infty(M) \to J^\infty$ yields a
Poisson algebra isomorphism between $C^\infty(M)$ and the Poisson
algebra of $D_0$-flat sections in $J^\infty$. The second step in the
construction consists in a deformation of this correspondence. Using
the Kontsevich formula for $\mathbb{R}^d$, each fiber of
$J^\infty_\nu$ can be equipped with an associative product which is a
deformation of the above product on the fibers of $J^\infty$ in the
direction of the Poisson bracket induced by $\Pi$. Then, analogously
to Fedosov’s construction, one constructs a compatible connection
$D = D_0 + \nu D_1 + \nu^2 D_2 + \cdots$ which is a deformation of
$D_0$. Here compatibility just means that $D$ is a derivation with
respect to the above product on sections in $J^\infty_\nu$, implying
that the $D$-flat sections form a subalgebra. Moreover, one can
achieve that this connection is flat and in this case
$C^\infty(M)[[\nu]]$ turns out to be in bijection to the $D$-flat
sections in $J^\infty_\nu$. For the proof of existence of $D$ and for
its recursive determination using an adaption of Fedosov’s method,
again special cases of Kontsevich’s formality theorem prove to be the
crucial tools. Pulling back the above fiberwise product to
$C^\infty(M)[[\nu]]$ via this isomorphism, one then obtains a star
product on $(M,\Pi)$. Since this isomorphism can be determined
recursively, the star product can in principle be computed
explicitly.
See also: Deformation Quantization; Deformation
Quantization and Representation Theory; Deformation 
Theory; Deformations of the Poisson Bracket on a 
Symplectic Manifold; String Field Theory. 
Quantum Statistics


Quantum particles are described by a complex, square-integrable wave
function $\Psi(x_1,\ldots,x_N)$ with $|\Psi|^2$ representing the
probability density of finding $N$ particles at positions
$x_1,x_2,\ldots,x_N$, which will be assumed to be in a
$d$-dimensional square box $V$ with side $L$ and periodic boundary
conditions. If the $N$ particles are identical, $|\Psi|^2$ must be
totally symmetric in the exchange of any pair of coordinates.
Regarding the symmetry properties of $\Psi$ itself, it is an
experimental fact (which finds its theoretical explanation in the
context of relativistic quantum field theory) that only two
possibilities can arise: either $\Psi$ is symmetric or it is
antisymmetric, which means that
$\Psi(x_1,\ldots,x_N) = (-1)^P\, \Psi(x_{P_1},\ldots,x_{P_N})$, where
$P_1,\ldots,P_N$ is a permutation of $1,\ldots,N$, and $(-1)^P$ is
the parity of the permutation. Particles described by a symmetric
wave function are called bosons, while particles with an
antisymmetric wave function are called fermions, after Bose and
Fermi, who introduced these concepts. The fermionic wave function
therefore vanishes if two coordinates are equal, a property called
the Pauli exclusion principle. Particles have an intrinsic quantized
angular momentum called spin; particles with half-integer spin are
fermions, while particles with integer spin are bosons. Examples of
fermions are electrons, protons, or neutrons, with spin
$\sigma = \pm\hbar/2$, where $\hbar$ is the (reduced) Planck
constant; examples of bosons are phonons or mesons with integer spin.

The time evolution of a wave function is driven (through the
Schrödinger equation) by the Hamiltonian operator, and the choice of
such an operator is determined by the physical system we want to
describe. One of the most important physical realizations of a
fermionic system is given by the conduction electrons in solids with
a crystalline structure (like metals). According to the classical
theory of Drude, a crystal can be described as a lattice of atoms in
which the valence electrons are lost by the atoms (which become ions)
and move freely in the metal; they are responsible for the conduction
properties of the crystal. However, if one assumes that the electrons
are classical particles (in the sense that they obey Newtonian
mechanics), one obtains wrong predictions about the properties of
crystals. One has to take into account that the conduction electrons
are quantum particles and this provides us with a natural example of
a fermionic system; the Hamiltonian can be taken as

\[ H_N = \sum_{i=1}^{N}\Big[-\frac{\hbar^2\partial^2_{x_i}}{2m} + u\,c(x_i)\Big] + \lambda\sum_{i<j} v(x_i - x_j) \qquad [1] \]

The first term represents the nonrelativistic kinetic energy of the
electrons ($m$ is the mass), $u\,c(x)$ is a periodic potential due to
the ions in the lattice ($c(x) = c(x+R)$ with
$R = (n_1 a_1,\ldots,n_d a_d)$, $n_i \in \mathbb{Z}$) and
$\lambda v(x-y)$ is a two-body interaction potential, which is
modeled by a short-range potential to take into account,
phenomenologically, the electrostatic screening. Finally, $\lambda$
and $u$ are couplings which measure the “strength” of the
corresponding interaction. Much more complicated and “realistic”
Hamiltonians could be considered; for instance, one can add an
interaction with a stochastic field to take into account impurities
in the lattice, or with a boson field to take into account the
dynamics of the ions, and so on. Note also that one can study not
only three-dimensional Fermi systems ($d=3$), but also $d=2$ or $d=1$
systems; they can describe the conduction electrons of crystals that
are anisotropic and should be considered as two-dimensional or
one-dimensional systems. We focus on the nonrelativistic fermionic
systems with Hamiltonian [1], which is a problem of great importance
from both the conceptual and the applications point of view.


Second-Quantization Formalism 


The Hilbert space of states of a system of $N \ge 1$ fermions is the
space $\mathcal{H}_N$ of all the complex square-integrable
antisymmetric functions $\Psi(x_1,\ldots,x_N)$. Let
$\{\phi_k(x)\}_k$ be a basis for $\mathcal{H}_1$ (the one-particle
Hilbert space of all the complex square-integrable functions
$\Psi(x_1)$), where $k$ is an index called quantum number. Usually,
the set of $\phi_k(x)$ is chosen as the eigenfunctions of the
single-particle Hamiltonian

\[ -\frac{\hbar^2\partial_x^2}{2m} + u\,c(x) \]

for instance, if $u = 0$ then

\[ \phi_k(x) = \frac{1}{L^{d/2}}\, e^{ikx} \]

with $\hbar k$ representing the momentum; due to periodic boundary
conditions, $k$ has the form $k = (2\pi/L)n$, $n = (n_1,\ldots,n_d)$
with $n_i$ integer and $-[L/2] \le n_i \le [(L-1)/2]$. If we call
$|k_1,\ldots,k_N\rangle$ the normalized antisymmetrization of
$\phi_{k_1}(x_1)\phi_{k_2}(x_2)\cdots\phi_{k_N}(x_N)$ (Slater
determinant), then the set of all possible $|k_1,\ldots,k_N\rangle$
is a basis for $\mathcal{H}_N$; $|k_1,\ldots,k_N\rangle$ describes a
state in which the $N$ fermions have quantum numbers
$k_1,\ldots,k_N$. One can introduce (Negele and Orland 1988, Berezin
1966) the creation or annihilation operators $a^+_k$, $a^-_k$: they
are anticommuting operators,

\[ \{a^+_k, a^-_{k'}\} \equiv a^+_k a^-_{k'} + a^-_{k'} a^+_k = \delta_{k,k'}, \qquad \{a^+_k, a^+_{k'}\} = \{a^-_k, a^-_{k'}\} = 0 \]

such that $a^+_k |k_1,\ldots,k_N\rangle = |k,k_1,\ldots,k_N\rangle$
if $k \ne k_i$, $i=1,\ldots,N$, and 0 otherwise; $a^-_k$ is the
adjoint of $a^+_k$. The action of $a^+_k$ is to create a particle
with quantum number $k$ if it is not present in the state, and to
yield zero otherwise (according to the Pauli principle). The state
$|0\rangle$ such that $a^-_k|0\rangle = 0$ for all $k$ is called the
vacuum state and it represents a state with no particles. The Fock
space is defined as the direct sum of the Hilbert spaces with any
number of particles, and all the elements of the Fock space can be
generated by linearly superposing products of creation operators
acting on the vacuum state. We can extend such definitions by adding
a label to such operators to take into account the spin of the
particle; for example, $a^\pm_{k,\sigma}$ are creation or
annihilation operators of a particle with spin $\sigma$ and quantum
number $k$. In terms of
$a^+_{x,\sigma} = \sum_k \phi_k(x)\, a^+_{k,\sigma}$ and of its
adjoint $a^-_{x,\sigma}$, the Hamiltonian can be written as


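\[ H = \sum_\sigma \int_V dx\, a^+_{x,\sigma}\Big(-\frac{\hbar^2\partial_x^2}{2m}\Big) a^-_{x,\sigma} + u\sum_\sigma \int_V dx\, c(x)\, a^+_{x,\sigma} a^-_{x,\sigma} + \lambda\sum_{\sigma,\sigma'} \int_V dx \int_V dy\, v(x-y)\, a^+_{x,\sigma} a^+_{y,\sigma'} a^-_{y,\sigma'} a^-_{x,\sigma} \qquad [3] \]

According to the postulates of quantum statistical mechanics, the
grand canonical partition function is given by
$Z = \mathrm{tr}\, e^{-\beta(H-\mu N)}$, where
$\beta = (\kappa T)^{-1}$, $\kappa$ is the Boltzmann constant, $T$ is
the temperature, $\mu$ is the chemical potential,
$N = \sum_\sigma \int dx\, a^+_{x,\sigma} a^-_{x,\sigma}$, and tr is
the trace operation over the Fock space. The thermodynamical average
of an observable $O$ is given by
$\langle O\rangle = Z^{-1}\, \mathrm{tr}[e^{-\beta(H-\mu N)}\, O]$.
Given a fermionic system, one is often interested in its Schwinger
functions, defined as follows: if $\mathbf{x} = (x,t)$ and
$t_1 > t_2 > \cdots > t_s$, $s$ even, then

\[ S(\mathbf{x}_1,\sigma_1,\varepsilon_1;\ldots;\mathbf{x}_s,\sigma_s,\varepsilon_s) = \frac{\mathrm{tr}\big[ e^{-(\beta - t_1)(H-\mu N)}\, a^{\varepsilon_1}_{x_1,\sigma_1}\, e^{-(t_1 - t_2)(H-\mu N)}\, a^{\varepsilon_2}_{x_2,\sigma_2} \cdots a^{\varepsilon_s}_{x_s,\sigma_s}\, e^{-t_s(H-\mu N)} \big]}{\mathrm{tr}\, e^{-\beta(H-\mu N)}} \qquad [4] \]

with $\varepsilon_i = \pm$, $-\beta/2 \le t_i \le \beta/2$; periodic
and antiperiodic boundary conditions are, respectively, imposed over
$x_i$ and $t_i$. From the knowledge of the Schwinger functions, one
can compute all the thermodynamical properties of a system at
equilibrium or close to equilibrium.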
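
The anticommutation relations of this section can be checked in a minimal sketch that represents a Fock-space vector as a dictionary from sorted tuples of occupied quantum numbers to amplitudes. The representation, the integer labels for the quantum numbers, and the function names `create`, `annihilate`, `add` and `anticommutator` are illustrative assumptions, not notation from the text.

```python
def create(k, state):
    """Apply a_k^+ to a state {sorted tuple of occupied modes: amplitude}."""
    out = {}
    for occ, amp in state.items():
        if k in occ:
            continue  # Pauli principle: the mode is already occupied
        # a_k^+ |k1,...,kN> = |k,k1,...,kN>; bringing k to its sorted
        # position costs one sign per occupied mode smaller than k.
        sign = (-1) ** sum(1 for q in occ if q < k)
        new = tuple(sorted(occ + (k,)))
        out[new] = out.get(new, 0) + sign * amp
    return out


def annihilate(k, state):
    """Apply a_k^-, the adjoint of a_k^+."""
    out = {}
    for occ, amp in state.items():
        if k not in occ:
            continue  # nothing to remove: the result is zero
        sign = (-1) ** sum(1 for q in occ if q < k)
        new = tuple(q for q in occ if q != k)
        out[new] = out.get(new, 0) + sign * amp
    return out


def add(s1, s2):
    """Sum of two states, dropping components with zero amplitude."""
    out = dict(s1)
    for occ, amp in s2.items():
        out[occ] = out.get(occ, 0) + amp
    return {occ: amp for occ, amp in out.items() if amp != 0}


def anticommutator(op1, k1, op2, k2, state):
    """Compute (op1_k1 op2_k2 + op2_k2 op1_k1) applied to state."""
    return add(op1(k1, op2(k2, state)), op2(k2, op1(k1, state)))
```

For instance, `anticommutator(create, 1, annihilate, 1, {(0, 2): 1})` returns the input state, while the same call with two different quantum numbers returns the empty dictionary, that is, the zero vector.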


The Free Fermi Gas 


Computation of the physical observables corresponding to the complete
Hamiltonian [3] is a very difficult task. The natural starting point
consists in taking into account only the kinetic term by putting
$\lambda = u = 0$ in [3], obtaining the free Fermi gas model. The
resulting model is not trivial at all; its properties are radically
different from those of a gas of classical particles, and it is
sufficient to understand many properties of matter (see, e.g., Mahan
(1990)). If

\[ \phi_k(x) = \frac{1}{L^{d/2}}\, e^{ikx} \]

then $|k_1,\sigma_1,\ldots,k_N,\sigma_N\rangle$ are eigenfunctions of
$H$ with eigenvalue $\sum_{k,\sigma} \varepsilon(k)\, n_{k,\sigma}$,
where $\varepsilon(k) = \hbar^2|k|^2/2m$ and $n_{k,\sigma} = 0,1$,
the occupation number, is the eigenvalue of
$a^+_{k,\sigma} a^-_{k,\sigma}$; $n_{k,\sigma} = 1$ if in the state
there is a fermion with momentum $k$ and spin $\sigma$, and it is
zero otherwise. The eigenfunction $|\Omega\rangle$ of $H$ with lowest
energy is called the ground state, and it determines the
low-temperature properties of the system. In order to find the ground
state $|\Omega\rangle$, one has to minimize
$\sum_{k,\sigma} \varepsilon(k)\, n_{k,\sigma}$ with the constraint
that $n_{k,\sigma}$ can take only the values 0 or 1 and
$\sum_{k,\sigma} n_{k,\sigma} = N$; if there are many solutions to
this problem, one says that the ground state is degenerate. An
approximate solution is the following: if $d=3$, one can consider a
state such that $n_{k,\sigma} = 1$ if $k$ is in a sphere of radius
$k_F$ and zero otherwise; since the number of momenta
$k = (2\pi/L)n$ in the sphere is approximately given by

\[ \frac{4\pi k_F^3}{3}\, \frac{L^3}{8\pi^3} \]

we can choose $k_F = (3\pi^2\rho)^{1/3}$, with $\rho = N L^{-3}$. The
state
$\prod_{|k| \le k_F} a^+_{k,1/2}\, a^+_{k,-1/2}\, |0\rangle$ is not
the true ground state when $N$, $L$ are finite, but it is a very good
approximation of it and converges to it (in a suitable sense) in the
limit $N, L \to \infty$, $\rho$ fixed. The boundary of the sphere
with radius $k_F$ in the space of momenta is called the Fermi
surface, and it is a key notion in the theory of Fermi systems; if
$d=2$, it is replaced by a circle and in $d=1$ by two points.
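
The counting argument behind the choice of $k_F$ can be illustrated numerically. The following is a minimal sketch under stated assumptions: the box size `L = 40` and density `rho = 1` used in the usage note are arbitrary illustrative values, the function names are hypothetical, and the factor 2 for the two spin states is applied by the caller.

```python
import math


def count_momenta(k_f, L):
    """Count k = (2*pi/L)*n, n in Z^3, with |k| <= k_f (one spin state)."""
    radius = k_f * L / (2 * math.pi)  # sphere radius in units of 2*pi/L
    n_max = int(radius)
    r2 = radius * radius
    count = 0
    for nx in range(-n_max, n_max + 1):
        for ny in range(-n_max, n_max + 1):
            for nz in range(-n_max, n_max + 1):
                if nx * nx + ny * ny + nz * nz <= r2:
                    count += 1
    return count


def sphere_estimate(k_f, L):
    """Continuum estimate (4*pi*k_f^3/3) * L^3 / (8*pi^3)."""
    return (4 * math.pi * k_f ** 3 / 3) * L ** 3 / (8 * math.pi ** 3)


def fermi_momentum(rho):
    """k_F = (3*pi^2*rho)^(1/3) for spin-1/2 fermions in d = 3."""
    return (3 * math.pi ** 2 * rho) ** (1 / 3)
```

With `k_f = fermi_momentum(1.0)` and `L = 40`, twice the continuum estimate reproduces `rho * L**3` exactly, and twice the lattice count `count_momenta(k_f, 40)` agrees with it up to a small surface correction that becomes negligible as `L` grows.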

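Coming to the thermodynamical properties, the partition function is
given by

\[ Z = \prod_{k,\sigma}\, \sum_{n_{k,\sigma}=0,1} e^{-\beta(\varepsilon(k)-\mu)\, n_{k,\sigma}} = \prod_{k,\sigma} \big(1 + e^{-\beta(\varepsilon(k)-\mu)}\big) \]

and the specific heat by

\[ C_v = -\frac{\partial}{\partial T}\, \frac{\partial}{\partial \beta} \log Z \]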
One finds, by expressing $\mu$ in terms of $\beta$ through the
relation $N = \beta^{-1}\, \partial \log Z/\partial\mu$, that if
$d=3$, in the $L \to \infty$ limit

\[ C_v = \rho\kappa\, \frac{\pi^2}{2} \Big(\frac{\kappa T}{\varepsilon_F}\Big) + O(T^3) \]

where $\varepsilon_F = \hbar^2 k_F^2/2m$. Early models for metals
described the electrons as classical particles; however, in such a
case, a well-known result of classical statistical mechanics states
that they should contribute to the specific heat by
$\frac{3}{2}\rho\kappa$, while experimentally their contribution is
much smaller. The solution of this puzzle was provided by the above
formula for $C_v$; the classical value is in fact depressed by a
factor

\[ \frac{\pi^2}{3}\, \frac{\kappa T}{\varepsilon_F} \]

which at room temperatures is $O(10^{-2})$, in agreement with
experimental data. The average number of electrons with momentum
$\hbar k$ is given, in the infinite-volume limit, by

\[ \langle a^+_{k,\sigma} a^-_{k,\sigma} \rangle = \big(1 + e^{\beta(\varepsilon(k)-\mu)}\big)^{-1} \]

At zero temperature, it reduces to the step function
$\theta(|k| \le k_F)$, that is, it has a discontinuity at the Fermi
surface, while at high temperatures it is very close to the Maxwell
distribution $\sim e^{-\beta(\varepsilon(k)-\mu)}$.
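
The occupation number and the suppression of the specific heat can be evaluated numerically. The following is a minimal sketch: the function names are hypothetical, energies are taken in electronvolts, and the Fermi energy of 7 eV used in the usage note is only an illustrative value typical of a good metal, not a figure from the text.

```python
import math

K_B_EV = 8.617333e-5  # Boltzmann constant in eV/K


def occupation(eps, mu, beta):
    """Mean occupation (1 + exp(beta*(eps - mu)))^-1 of a level of energy eps."""
    x = beta * (eps - mu)
    # Evaluate in a form that cannot overflow for large |x|.
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def specific_heat_ratio(temperature, eps_f):
    """Ratio of rho*kappa*(pi^2/2)*(kappa*T/eps_F) to the classical (3/2)*rho*kappa."""
    return (math.pi ** 2 / 3) * K_B_EV * temperature / eps_f
```

At low temperature (large `beta`) `occupation` approaches a step at `eps = mu`; when `mu` lies far below all levels it approaches the Maxwell form `exp(-beta*(eps - mu))`. With the assumed values, `specific_heat_ratio(300.0, 7.0)` is of order $10^{-2}$.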

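Finally, in the free Fermi gas model, all Schwinger functions can be
computed. One finds that if, for instance, $\varepsilon_i = +$ for
$i = 1,2,\ldots,s/2$ and $\varepsilon_i = -$ otherwise, the
Schwinger function with $s \ge 4$ can be expressed as a sum of
products of the $s=2$ Schwinger function (also called the propagator)

\[ S(\mathbf{x}_1,\ldots,\mathbf{x}_s) = \sum_\pi (-1)^\pi \prod_i S_0(\mathbf{x}_i - \mathbf{x}_{\pi(i)}) \qquad [5] \]

where $i = 1,\ldots,s/2$, $\pi(1),\ldots,\pi(s/2)$ is a permutation
of $j = s/2+1,\ldots,s$, $(-1)^\pi$ is the parity of this
permutation, and $\sum_\pi$ is the sum over all the possible
permutations; such a formula is called the Wick rule. By an explicit
computation, $S_0(\mathbf{x}-\mathbf{y})$ is given by

\[ S_0(\mathbf{x}-\mathbf{y}) = \frac{1}{\beta} \sum_{k_0 = 2\pi(n_0+1/2)\beta^{-1}}\; \frac{1}{L^d} \sum_{k = (2\pi/L)n} \frac{e^{i\mathbf{k}(\mathbf{x}-\mathbf{y})}}{-ik_0 + \hbar^2|k|^2/2m - \mu} \equiv \frac{1}{\beta L^d} \sum_{k_0,\,k} e^{i\mathbf{k}(\mathbf{x}-\mathbf{y})}\, S_0(\mathbf{k}) \qquad [6] \]

where $\mathbf{k} = (k_0,k)$. In the limit $L,\beta \to \infty$, for
large distances $S_0(\mathbf{x}-\mathbf{y})$ decays as a power law,
$O(|\mathbf{x}-\mathbf{y}|^{-1})$, times an oscillating function of
period $k_F^{-1}$. Note that $S_0(\mathbf{k})$ in the limit
$\beta, L \to \infty$ diverges for $k_0 = 0$ and
$\varepsilon(k) = \mu$, that is, at the Fermi surface
($\mu = \varepsilon_F$ in the limit $\beta \to \infty$); when $\beta$
is finite, $S_0(\mathbf{k})$ is finite even for $L \to \infty$, that
is, the finite temperature acts as an infrared cutoff.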
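
The Wick rule [5] is a signed sum over permutations, that is, the determinant of the matrix of two-point functions. The following is a minimal sketch of that sum for scalar arguments; the function names and the toy propagator `s0` used in the usage note are illustrative assumptions, and the overall sign convention of [5] is taken at face value.

```python
from itertools import permutations


def parity(perm):
    """Sign (+1 or -1) of a permutation given as a tuple of 0..n-1."""
    sign = 1
    seen = [False] * len(perm)
    for start in range(len(perm)):
        if seen[start]:
            continue
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length % 2 == 0:  # a cycle of even length is an odd permutation
            sign = -sign
    return sign


def wick_sum(xs, ys, s0):
    """Sum over permutations pi of (-1)^pi * prod_i s0(xs[i] - ys[pi(i)])."""
    n = len(xs)
    total = 0.0
    for perm in permutations(range(n)):
        term = parity(perm)
        for i in range(n):
            term *= s0(xs[i] - ys[perm[i]])
        total += term
    return total
```

For two creation and two annihilation points the sum reduces to the 2×2 determinant `s0(x1-y1)*s0(x2-y2) - s0(x1-y2)*s0(x2-y1)`, and exchanging two of the `ys` flips its sign, as expected for fermions. The number of terms grows as $(s/2)!$, so for large $s$ one would evaluate the determinant directly instead.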




Fermions in an External Potential 


The next step consists in adding an external periodic potential to
the free Fermi gas model, taking into account the field generated by
the ions of the lattice. We consider then [3] with $\lambda = 0$ and
$u \ne 0$. As in the previous case, the eigenfunctions of the
$N$-particle Hamiltonian can be computed and are expressed in terms
of the single-particle eigenfunctions of
$-\hbar^2\partial_x^2/2m + u\,c(x)$; they are called Bloch waves and
have the form

\[ \phi_k(x) = \frac{1}{L^{d/2}}\, e^{ikx}\, u_k(x), \qquad u_k(x) = u_k(x+R) \]

$k$, called the crystalline momentum, is conserved modulo $G$, the
vectors of the reciprocal lattice, defined by the condition that
$G \cdot R$ is an integer multiple of $2\pi$ for every lattice vector
$R$.

The eigenvalue $\varepsilon(k)$ of
$-\hbar^2\partial_x^2/2m + u\,c(x)$ associated with a Bloch wave
$\phi_k(x)$ has some peculiar properties; in the $L \to \infty$
limit, one finds that $\varepsilon(k)$ is not a continuous function
(unlike the $u=0$ case) but it has gaps, that is, first-order
discontinuities. For $d=1$, by a convergent power-series expansion in
$u$, one finds that $\varepsilon(k)$ is a continuous monotonically
increasing function except at the points $\pm n\pi/a$, $n$ an
integer; at these points $\varepsilon(k)$ is discontinuous and
$\varepsilon((n\pi/a)^+) - \varepsilon((n\pi/a)^-) = \Delta_n$, with
$\Delta_n$ of first order in $u$ up to $O(u^2)$ corrections; the gaps
divide $\varepsilon(k)$ into disconnected pieces called energy bands.
Something similar happens in $d = 2,3$, in which gaps open for $k$
such that $G^2 + 2kG = 0$.

Again, the eigenfunctions of $H$ are given by
$|k_1,\sigma_1,\ldots,k_N,\sigma_N\rangle$ with eigenvalue
$\sum_{k,\sigma} \varepsilon(k)\, n_{k,\sigma}$, and the Fermi
surface is still defined by the set of $k$ such that
$\varepsilon(k) = \varepsilon_F$ with $\varepsilon_F$ determined by
the condition
$\sum_{k,\sigma:\, \varepsilon(k) \le \varepsilon_F} 1 = N$. However,
in this case the Fermi surface is no longer a sphere in $d=3$, but it
is in general a polyhedron of a very complex shape. The Schwinger
functions are expressed by the Wick rule [5] in terms of the
two-point Schwinger functions; they are given by [6] with
$e^{ik(x-y)}$ replaced by $\phi_k(x)\overline{\phi_k(y)}$ and
$\hbar^2|k|^2/2m$ replaced by $\varepsilon(k)$. The asymptotic
properties of the two-point Schwinger function are quite different
with respect to the $u=0$ case. This is easy to see if $d=1$; in the
limit $L,\beta \to \infty$, $S(\mathbf{k})$ is singular if $\mu$ does
not belong to the interval
$[\varepsilon((n\pi/a)^-), \varepsilon((n\pi/a)^+)]$, whereas it is
finite if $\mu$ belongs to such an interval; in the first case,
$S(\mathbf{x},\mathbf{y})$ decays for large distances as
$O(|\mathbf{x}-\mathbf{y}|^{-1})$, whereas in the second case it is
$O(e^{-\xi|\mathbf{x}-\mathbf{y}|})$ for some $\xi > 0$. This means
that, depending on the number of particles (which essentially fixes
$\mu$), the Schwinger function has a totally different asymptotic
behavior. This fact has important consequences in many physical
properties; for instance, the conductivity (which can be computed
from the $s=4$ Schwinger function) vanishes if $\mu$ belongs to the
interval $[\varepsilon((n\pi/a)^-), \varepsilon((n\pi/a)^+)]$.
Similar properties hold for $d=2,3$; hence, from the knowledge of the
number of particles and the periodic potential generated by the ions,
one can predict if the system is an insulator or a metal.

Note also that the conductivity is infinite in the
infinite-volume and zero-temperature limit when $\mu$
does not correspond to a gap; in other words, the
electric current in a perfect crystal lattice is not
subject to any dissipation of energy. A finite
resistivity is found only if one takes into account
deviations from perfect periodicity. To simulate
impurities in the lattice, one can add to the
Hamiltonian, following Anderson, an interaction term
of the form $\alpha\varphi_x\psi^+_x\psi^-_x$, where $\varphi_x$ is a Gaussian
stochastic field. A detailed mathematical investigation
has been devoted to the properties of the
eigenfunctions of $-\hbar^2\partial_x^2/2m+\alpha\varphi_x$, where $\varphi_x$ is a
Gaussian field (see, e.g., Pastur and Figotin (1991));
it is found that if $\alpha$ is large enough in d=2,3, and
for any $\alpha$ in d=1, the single-particle eigenfunctions
are exponentially localized, that is, they decay
exponentially at large distances; this implies a finite
conductivity. One can also add to the Hamiltonian a
term $\beta\varphi_x\psi^+_x\psi^-_x$, with $\varphi_x$ a quasiperiodic function, in
order to describe crystals in which the lattice
develops a periodic distortion whose period is
incommensurate with the lattice periodicity. For
d=1 and $\beta$ large, one again finds localized
eigenfunctions, whereas for small $\beta$ there are
extended states (see, e.g., Pastur and Figotin
(1991)); such results are obtained with
Kolmogorov-Arnol'd-Moser (KAM) techniques.


Interacting Systems 


The analysis of noninteracting Fermi systems has
been very successful in understanding qualitatively
many features of crystals, but there are many
properties (e.g., superconductivity or magnetism)
which cannot really be explained without taking
into account the interaction between fermions;
however, the analysis becomes more involved.
When there is no interaction, the properties of
the many-body system can be understood in terms of
the single-body properties; the eigenfunctions of the
Hamiltonian are, in fact, obtained in terms of the
single-particle eigenfunctions. This is not true when
$\lambda \neq 0$, when a description of the system in terms of
independent particles is impossible. In order to
compute the interacting Schwinger functions, it is
convenient to write them in terms of fermionic
functional integrals (Berezin 1966). One introduces
a set of anticommuting Grassmann variables $\psi^+_k, \psi^-_k$,
$k=(k_0,\mathbf{k})$; the Grassmann integration is defined by
$\int d\psi^\sigma_k\,\psi^\sigma_k = 1$ and $\int d\psi^\sigma_k = 0$, $\sigma=\pm$, and the integral
of any analytic function of the Grassmann variables
can be obtained by expanding it in Taylor series
(which is a finite sum if suitable cutoffs are imposed
and $L, \beta$ are finite) and using the above rules; finally,


$$\psi^+_x = \frac{1}{L^d\beta}\sum_k e^{ikx}\,\psi^+_k$$

and $\psi^-_x$ is defined in an analogous way. The
Schwinger function can be written as a Grassmann 
integral as follows: 


S(x1, X2, er , XN) 
ðN 


_ —v+ | dxgivé 
where $P(d\psi)$ is the fermionic integration
$[\prod_k d\psi^+_k\,d\psi^-_k]\exp\{-\sum_k \psi^+_k(-ik_0+\varepsilon(\mathbf{k})-\mu)\psi^-_k\}$, while
$y=(\mathbf{y},s)$, and


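$$V=\lambda\sum_{\sigma,\sigma'}\int dx\,dy\;v(\mathbf{x}-\mathbf{y})\,\delta(t-s)\,\psi^+_{x,\sigma}\psi^+_{y,\sigma'}\psi^-_{y,\sigma'}\psi^-_{x,\sigma} \qquad [8]$$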


The Grassmann integral of a monomial of Grassmann
variables can be obtained by the Wick rule [5], with
propagator given by the Fourier transform of
$(-ik_0+\varepsilon(\mathbf{k})-\mu)^{-1}$. As stated earlier, the propagator is finite at
nonzero temperature, whereas if $\beta=\infty$, then it is
singular when $k=(k_0,\mathbf{k})$ is such that $k_0=0$ and
$\varepsilon(\mathbf{k})=\mu$.

One can write [7] as a series by Taylor-expanding 
the exponential and using the Wick rule; each order 
of the expansion can be represented as a sum of 
Feynman diagrams, very similar to the ones appear- 
ing in quantum field theory. We have then an 
algorithm to compute [7]; nevertheless, to extract 
information from such a series is quite difficult. One
cannot really compute an infinite (in the $L=\infty$
limit) number of coefficients, so one is tempted, for
small $\lambda$, to compute only the first few of them,
neglecting the others. However, it appears that this
approximation is generally not justified, and it leads
to wrong results; the reason is that the Schwinger
functions for $\lambda=0$ and $\lambda\neq 0$ are not analytically
close, or, in more physical terms, even if $\lambda$ is small,
the physical behavior of the free and interacting
theories can be quite different, especially at low
temperatures. A number of very interesting concepts
(e.g., spontaneous symmetry breaking or the mass 
generation phenomenon), or techniques (e.g., the 
renormalization group method, or the parquet or 
random phase approximation) have been introduced 
in the last 50 years to analyze [7], and indeed many 
results have been obtained which explain several 
physical properties of the matter, such as super- 
conductivity or the Kondo effect (see, e.g., Anderson 
1985, Abrikosov et al. 1965, Mahan 1990, Negele 
and Orland 1988, Pines 1961). Unfortunately, most 
of such results are not really mathematically 
consistent, and in many cases quantitative computa- 
tions are impossible (in computations one generally 
neglects terms which, according to a heuristic 
physical intuition, are irrelevant, but no control of 
the error introduced by this approximation is 
attempted). In recent times, attempts towards a
mathematical understanding of the functional integral
[7] have started (see, e.g., Benfatto and
Gallavotti (1995), and references therein); the
methods rely on the mathematical implementation
of Wilson's renormalization group methods via
multiscale analysis (Gallavotti 1985). The necessity
of a firmer mathematical basis was felt mainly
under the pressure of the recent discovery of
high-$T_c$ superconductors, whose behavior is still not
understood in terms of the microscopic model [7];
this has forced reconsideration of the validity of the
approximations usually made in the analysis of this
model.

The behavior of [7] depends crucially on the
temperature. At high temperatures, we can simply
expand the exponential in [7] in a power series in $\lambda$,
and find that each Feynman graph contributing to
the nth perturbative order is bounded by $C_\beta^n|\lambda|^n$,
with $C_\beta \le C\beta^{\alpha}$ for some constants $C, \alpha$; this follows
immediately by using the Wick rule and by
remembering that the denominator of the propagator is
larger than $O(\beta^{-1})$. As the number of Feynman graphs
contributing to order n is $O(n!)$, a bound on each
Feynman graph is not sufficient to prove the
convergence of the series. To prove convergence,
one has to take into account cancellations due to
the anticommutativity of fermionic variables. Such
cancellations are proved via Gram's inequality for
determinants, and a bound $C_\beta^n|\lambda|^n$ can be obtained for
the order n (without factorials); hence, convergence
follows for temperatures greater than $O(|\lambda|^{\alpha})$ for
some constant $\alpha>0$. One finds that
$S(k) = S_0(k)(1 + A_\lambda(k))$ with $|A_\lambda(k)| \le C|\lambda|$, that is,
the interaction has essentially no influence on the
physical properties of the system at high
temperatures.
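
Gram's inequality states that a determinant of inner products, $\det(\langle f_i, g_j\rangle)$, is bounded in absolute value by $\prod_i \|f_i\|\,\|g_i\|$, with no factorial even though the determinant is a sum of n! terms. The following sketch checks this numerically on random complex vectors; the function name `gram_bound_check` and the choice of random test vectors are assumptions made for illustration only and are not taken from the article.

```python
import numpy as np

def gram_bound_check(n=8, dim=12, seed=0):
    """Compare |det(<f_i, g_j>)| with the Gram bound prod ||f_i|| * prod ||g_j||.

    f and g are random complex vectors (an illustrative choice only).
    Returns (absolute value of the determinant, Gram bound).
    """
    rng = np.random.default_rng(seed)
    f = rng.normal(size=(n, dim)) + 1j * rng.normal(size=(n, dim))
    g = rng.normal(size=(n, dim)) + 1j * rng.normal(size=(n, dim))
    # m[i, j] = <f_i, g_j>, with the inner product antilinear in the first slot.
    m = f.conj() @ g.T
    det_abs = abs(np.linalg.det(m))
    bound = np.prod(np.linalg.norm(f, axis=1)) * np.prod(np.linalg.norm(g, axis=1))
    return det_abs, bound

if __name__ == "__main__":
    det_abs, bound = gram_bound_check()
    print(det_abs, bound, det_abs <= bound)
```

In the fermionic expansion the sum over Wick contractions at a given order can be organized into determinants of this kind, which is how the n! from counting graphs is avoided.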


Landau Fermi Liquids 


We consider next an intermediate region of temperatures,
that is, $e^{-a/|\lambda|} \le T \le |\lambda|^{\alpha}$ for some constants
$a, \alpha$. In this region, the naive expansion in
power series of $\lambda$ fails and other techniques, such as
the renormalization group, are necessary. Such a
method allows us to perform a suitable resummation
of the naive power series in $\lambda$, and one gets, for $\lambda$
small enough, $T \ge e^{-a/|\lambda|}$ and $\varepsilon(\mathbf{k}) =|\mathbf{k}|^2/2m$,


$$S(k) = \frac{1}{Z(\lambda)}\,\frac{1 + A_\lambda(k)}{-ik_0 + v_F(\lambda)(|\mathbf{k}| - k_F(\lambda))} \qquad [9]$$

where $Z(\lambda)=1+z(\lambda)$, $v_F(\lambda)=\hbar k_F/m+v(\lambda)$, and
$k_F(\lambda)=k_F+\nu(\lambda)$, with $z(\lambda)=O(\lambda^2)$, $\nu(\lambda)=O(\lambda)$,
$v(\lambda)=O(\lambda^2)$, and $z(\lambda)$, $\nu(\lambda)$, $v(\lambda)$ essentially
temperature independent; moreover, $|A_\lambda(k)|$ is $O(\lambda)$.
The above formula has been proved rigorously for
d=2 (see Rivasseau (1994), and references therein);
for d=3, it has been proved at the level of formal
perturbation theory (Benfatto and Gallavotti 1995).
The case $\varepsilon(\mathbf{k}) =|\mathbf{k}|^2/2m$ is quite special, as the shape
of the interacting Fermi surface is fixed by
rotational invariance; it is necessarily circular
(d=2) or spherical (d=3), whereas in general the
interaction can also modify its shape. For d=2, if the
interacting Fermi surface is symmetric, smooth and
convex, a formula like [9] still holds (with a function
$k_F(\lambda,\mathbf{k})$ replacing $k_F(\lambda)$) up to exponentially
small temperatures (see references in Gentile and
Mastropietro (2001)).

It is apparent from [9] that one cannot derive such
a formula from a power-series expansion in $\lambda$; by
expanding [9] as a series in $\lambda$, one immediately finds
that the nth term is $O(\lambda^n\beta^n)$, which means that the
naive perturbative expansion cannot be convergent
up to exponentially small temperatures. It can be
derived only by selecting and resumming some
special class of terms in the original expansion. A
peculiar property of [9] is that the wave function
renormalization $Z(\lambda)$ is essentially independent of
the temperature. Such temperature independence is a
consequence of cancellations in the perturbative
series essentially due to the curvature of the Fermi
surface. For d=1, a formula similar to [9] is also
valid; however, such cancellations are not present
and one finds $Z(\lambda)=1+O(\lambda^2\log\beta)$. Comparing
S(k) given by [9] with the Fourier transform So(k) of 
[6], we note that the Schwinger function of the 
interacting system is still very similar to the 
Schwinger function of a free Fermi gas, with 
physical parameters (e.g., the Fermi momentum, 
the wave function renormalization, or the Fermi 
velocity) which are changed by the interaction. This 
property is quite remarkable: the eigenstates cannot 
be constructed when $\lambda \neq 0$ starting from the
single-particle states but, nevertheless, the physical
properties of the interacting system (which can be deduced
from the Schwinger functions) are qualitatively very 
similar to the ones of the free Fermi gas, although 
with different parameters; this explains why the free 
Fermi gas model works so well to explain the 
properties of crystals, although one neglects the 
interactions between fermions which are, of course, 
quite relevant. A fermionic system with such a 
property is called a Landau Fermi liquid (see, e.g., 
Abrikosov et al. 1965, Mahan 1990, Pines 1961),
after Landau, who postulated in the 1950s that 
interacting systems may evolve continuously from 
the free system in many cases. 

It was generally accepted that metals in this range
of temperatures were all Landau Fermi liquids
(except one-dimensional systems). However, the
experimental discovery of the high-$T_c$ superconductors
(see, e.g., Anderson (1997)) has changed this
belief, as such metals in their normal state, that is,
above $T_c$, are not Landau Fermi liquids; their
wave function renormalization behaves like
$1+O(\lambda^2\log\beta)$ instead of $1+O(\lambda^2)$ as in a Landau
Fermi liquid. This behavior has been called
marginal-Fermi-liquid behavior and many attempts
have been devoted to predicting such behavior from [7].
In order to see deviations from Fermi liquid
behavior, one could consider Fermi surfaces with
flat or almost flat sides or corners (which are quite
possible; e.g., in a square lattice with one conduction
electron per atom, such as in the "half-filled
Hubbard model").

Let us finally consider the last regime, that is,
temperatures lower than $O(e^{-a/|\lambda|})$. Except for very
exceptional cases (e.g., asymmetric Fermi surfaces,
i.e., such that $\varepsilon(\mathbf{k}) \neq \varepsilon(-\mathbf{k})$ except for a finite
number of points, in which Fermi liquid behavior
is found down to T = 0 (Feldman et al. 2002)), a
strong deviation from Fermi liquid behavior is
observed; the interacting Schwinger function is not
similar to the free one and the physical properties in
this regime are totally new.


One-Dimensional Systems up to T = 0


The only case in which the Schwinger functions of 
the Hamiltonian [3] can be really computed down to 
T =0 occurs for d= 1; in such a case, an expression 
like [9] is not valid anymore and the system is not a 
Fermi liquid. On the contrary, when u=0 and for 
small repulsive A > 0, one can prove, for spinning 
fermions (see Benfatto and Gallavotti (1995), Gentile 
and Mastropietro (2001) and references therein) that 


$$S(k) = \frac{[k_0^2 + v_F^2(\lambda)(|\mathbf{k}| - k_F(\lambda))^2]^{\eta(\lambda)}}{-ik_0 + v_F(\lambda)(|\mathbf{k}| - k_F(\lambda))}\,(1 + A_\lambda(k)) \qquad [10]$$

where $k_F(\lambda) = k_F + O(\lambda)$ and $\eta(\lambda) = a\lambda^2 + O(\lambda^3)$ is a
critical index. This means that the interaction
changes qualitatively the nature of the singularity
at the Fermi surface; $S(k)$ is still diverging at the
Fermi surface but with an exponent which is no
longer 1 but is $1 - 2\eta(\lambda)$, with $\eta(\lambda)$ a nonuniversal
(i.e., $\lambda$-dependent) critical index. As a consequence,
the physical properties are different with respect to
the free Fermi gas; for instance, the occupation
number $n_k$ is not discontinuous at $k = \pm k_F(\lambda)$ when
T = 0. Nonuniversal critical indices appear in all
the other response functions. Fermionic systems 
behaving in this way are called Luttinger liquids, 
as they behave like the exactly solvable Luttinger 
model describing relativistic spinless fermions 
with linear dispersion relation. The solvability of 
this model, due to Mattis and Lieb (1966), relies 
on the possibility of mapping its Hamiltonian into a
system of free bosons. Such a mapping is not 
possible for the Hamiltonian [3], which is not 
solvable; however, one can use renormalization 
group methods and suitable Ward identities to 
show that its behavior is similar to the Luttinger 
model (in a sense, one makes perturbation theory 
not around the free Fermi gas, but around the 
Luttinger model). 

If we take into account the interaction with an
external periodic potential with period a, that is,
consider $u \neq 0$, we find that if $k_F \neq n\pi/a$, then the
Schwinger function behaves essentially like [10]. On
the contrary, in the filled-band case, $k_F = n\pi/a$, one
finds that there is still an energy gap, which becomes
$O(u^{1+\eta_1})$ with $\eta_1=O(\lambda)$; this means that the
renormalization of the gap is described by a critical
index; moreover, $S(x) \sim O(e^{-u^{1+\eta_1}|x|})$. A similar
behavior is also observed in the presence of a
quasiperiodic potential. In the attractive case, $\lambda<0$,
$u=0$, the behavior is much less understood; it is
believed that the interaction produces a gap $\Delta$ in
the spectrum which is nonanalytic in $\lambda$, and $S(x)$
shows an exponential decay rather than a power-law
decay, and the interaction converts the system
from a metal to an insulator.

Finally, it is remarkable that a large variety of 
models, like Heisenberg spin chains or bidimensional 
classical statistical mechanics models, such as the 
eight-vertex or the Ashkin-Teller model, can be 
mapped into interacting d = 1 fermionic systems, and 
consequently their critical behavior can be understood 
by using fermionic techniques (see Gentile and 
Mastropietro (2001), and references therein). 


Superconductors 


The theory up to T=0 for d=2,3 systems with
dispersion relation $|\mathbf{k}|^2/2m$ is based only on
approximate computations, predicting the phenomenon
of superconductivity. According to the theory
of Bardeen, Cooper, and Schrieffer (BCS theory),
the interaction between fermions leads to the
formation of a gap in the energy spectrum, below
the critical temperature. There are many ways to
derive the BCS theory. One is based on the fact that
one verifies, by perturbative computations, that the
effective interaction is stronger when the four
momenta of the fermions are such that $k_1 \simeq -k_3$
and $k_2 \simeq -k_4$. This suggests, heuristically,
replacing V in [7] with

$$V_{\mathrm{BCS}} = \lambda\,\frac{1}{L^d\beta}\sum_{k,k'}\psi^+_{k,\uparrow}\psi^+_{-k,\downarrow}\psi^-_{-k',\downarrow}\psi^-_{k',\uparrow}$$


which is an interaction between pairs of electrons
with opposite spin and momenta, which are called
Cooper pairs. Replacing V with $V_{\mathrm{BCS}}$ has the great
advantage that it makes the Schwinger functions
exactly computable and explains the mechanism
of superconductivity in many metals (but not in
the recently discovered high-$T_c$ superconductors).
On the other hand, proving that [7] with V or $V_{\mathrm{BCS}}$
has a similar behavior is still an important open
problem. The two-point Schwinger function in the
model with $V_{\mathrm{BCS}}$ can be written, after the so-called
Hubbard-Stratonovich transformation, as


$$S(k) = \frac{1}{C}\int du\; e^{-L^d\beta\,v(u)}\,\frac{-ik_0-\varepsilon(\mathbf{k})+\mu}{k_0^2+(\varepsilon(\mathbf{k})-\mu)^2+u^2}, \qquad C=\int du\; e^{-L^d\beta\,v(u)} \qquad [11]$$


where $v(u)$ is a function with a global minimum at
$u=0$ for repulsive interactions $\lambda > 0$, whereas for
$\lambda < 0$ and sufficiently small temperatures (for
$T \le T_c$, with $T_c=O(e^{-a/|\lambda|})$), it has the form of a
double well with two minima at $u=\pm\Delta$ with
$\Delta = O(e^{-a/|\lambda|})$; for T greater than $T_c$, there is only
a global minimum at $u=0$. By the saddle-point
theorem, we find, for $T \le T_c$ and $\lambda < 0$,


$$\lim_{L\to\infty} S(k) = \frac{-ik_0-\varepsilon(\mathbf{k})+\mu}{k_0^2+(\varepsilon(\mathbf{k})-\mu)^2+\Delta^2} \qquad [12]$$

The physical properties predicted by [12] are
completely different with respect to the free case:
the occupation number is continuous, there is an
energy gap in the spectrum, the specific heat is
$O(e^{-\Delta\beta})$ and the phenomenon of superconductivity
appears. The fact that the interaction generates a
gap is called mass generation; a similar mechanism
appears in particle theory.


Conclusions 


Many other physical phenomena observed experimentally
can be essentially understood by studying
fermionic systems, but a clear mathematical comprehension
is still lacking. We mention: the Kondo
effect, that is, the resistance minimum observed in
some metals due to magnetic impurities; the Mott
transition, in which a strong interaction produces
an insulating state in a system which should be
a conductor; antiferromagnetism; the fractional quantum
Hall effect; and many others. We can say that the
situation in this area of study is reminiscent of
classical mechanics at the end of the nineteenth
century; there is agreement on the models to
consider, which are believed to be able to account for
the marvelous properties of matter
found experimentally, but extracting information
from them requires deeper and more complex analytical
and mathematical investigations.


See also: Falicov—Kimball Model; Fractional Quantum 
Hall Effect; Quantum Statistical Mechanics: Overview; 
Renormalization: Statistical Mechanics and Condensed 
Matter. 
Introduction


In nonrelativistic quantum mechanics, the state of
a d-dimensional particle is represented by a
unit vector $\psi$ in the complex separable Hilbert
space $L^2(\mathbb{R}^d)$, the so-called "wave function," while
its time evolution is described by the Schrödinger
equation:

$$i\hbar\frac{\partial}{\partial t}\psi = -\frac{\hbar^2}{2m}\Delta\psi + V\psi, \qquad \psi(0,x) = \psi_0(x) \qquad [1]$$

where $\hbar$ is the reduced Planck constant, m > 0 is the
mass of the particle, and $F=-\nabla V$ is an external
force.

In 1942 R P Feynman, following a suggestion by 
Dirac, proposed an alternative (Lagrangian) formu- 
lation of quantum mechanics, and a heuristic but 
very suggestive representation for the solution of eqn 
[1]. According to Feynman, the wave function of the
system at time t evaluated at the point $x\in\mathbb{R}^d$ is
given as an "integral over histories," that is, as an
integral over all possible paths $\gamma$ in the configuration
space of the system with finite energy passing through the
point x at time t:


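$$\psi(t,x) = \Big(\int_{\{\gamma|\gamma(t)=x\}} e^{(i/\hbar)S_t^0(\gamma)}\,D\gamma\Big)^{-1}\int_{\{\gamma|\gamma(t)=x\}} e^{(i/\hbar)S_t(\gamma)}\,\psi_0(\gamma(0))\,D\gamma \qquad [2]$$

$S_t(\gamma)$ is the classical action of the system evaluated
along the path $\gamma$,

$$S_t(\gamma) = S_t^0(\gamma) - \int_0^t V(\gamma(s))\,ds \qquad [3]$$

$$S_t^0(\gamma) = \frac{m}{2}\int_0^t |\dot\gamma(s)|^2\,ds \qquad [4]$$

$D\gamma$ is a heuristic Lebesgue "flat" measure on the
space of paths and $(\int_{\{\gamma|\gamma(t)=x\}} e^{(i/\hbar)S_t^0(\gamma)}\,D\gamma)^{-1}$ is a
normalization constant.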

Some time later, Feynman himself extended 
formula [2] to more general quantum systems, 
including the case of quantum fields. 

The Feynman path-integral formulation of quan- 
tum mechanics is particularly suggestive, as it 
provides a spacetime visualization of quantum 
dynamics, reintroducing in quantum mechanics the 
concept of trajectory (which was banned in the 
“orthodox interpretation” of the theory) and creat- 
ing a connection between the classical description of 
the physical world and the quantum one. Indeed, it
provides a quantization method, allowing one, at least
heuristically, to associate a quantum evolution with
each classical Lagrangian. Moreover, the application
of the stationary-phase method for oscillatory
integrals allows the study of the semiclassical limit
of the Schrödinger equation, that is, the study of the
detailed behavior of the solution when the Planck
constant is regarded as a parameter converging to 0.
Indeed, when $\hbar$ is small, the integrand in [2] is
strongly oscillating and the main contributions to
the integral should come from those paths $\gamma$ that
make the phase function $S_t(\gamma)$ stationary. These, by
Hamilton's least action principle, are exactly the
classical orbits of the system.

Feynman path integrals also allow a heuristic
calculus in path space, leading to variational
calculations of quantities of physical and mathematical
interest. An interesting application can be
found in topological field theories, such as
Chern-Simons models. In this case,
heuristic calculations based on the Feynman
path-integral formulation of the theory, where
the integration is performed on a space of
geometrical objects, lead to the computation of
topological invariants.

Even if from a physical point of view, formula [2] 
is a source of important results, from a mathema- 
tical point of view, it lacks rigor: indeed, neither the 
“infinite-dimensional Lebesgue measure,” nor the 
normalization constant in front of the integral is 
well defined. In this article, we shall describe the 
main approaches to the rigorous mathematical 
realization of Feynman path integrals, as well as 
their most important applications. 


Possible Mathematical Definitions 
of Feynman’s Measure 


In the rigorous mathematical definition of Feynman's
complex measure

$$\Big(\int_{\{\gamma|\gamma(t)=x\}} e^{(i/\hbar)S_t^0(\gamma)}\,D\gamma\Big)^{-1} e^{(i/\hbar)S_t(\gamma)}\,D\gamma \qquad [5]$$

one has to face mainly two problems. First of all, the
integral is defined on a space of paths, that is, on an
infinite-dimensional space. The implementation of
an integration theory is nontrivial: for instance, it is
well known that a Lebesgue-type measure cannot be
defined on infinite-dimensional Hilbert spaces.
Indeed, the assumption of the existence of a
$\sigma$-additive measure $\mu$ which is invariant under
rotations and translations and assigns a positive
finite measure to all bounded open sets leads to a
contradiction. In fact, by taking an orthonormal
system $\{e_i\}_{i\in\mathbb{N}}$ in an infinite-dimensional Hilbert
space H and by considering the open balls
$B_i = \{x \in H, \|x - e_i\| < 1/2\}$, one has that they are
pairwise disjoint and their union is contained in the
open ball $B(0, 2) = \{x \in H, \|x\| < 2\}$. By the Euclidean
invariance of the Lebesgue-type measure $\mu$, one can
deduce that $\mu(B_i) = a$, $0 < a < \infty$, for all $i \in \mathbb{N}$. By the
$\sigma$-additivity, one has

$$\mu(B(0,2)) \ge \mu(\cup_i B_i) = \sum_i \mu(B_i) = \infty$$

but, on the other hand, $\mu(B(0,2))$ should be finite as
$B(0,2)$ is bounded. As a consequence, we can also
deduce that the term $D\gamma$ in [2] does not make sense.
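
The two geometric facts used in this argument, that the balls $B_i$ are pairwise disjoint and that they all lie in $B(0,2)$, follow from a short computation. The following is a sketch using only the orthonormality of the $e_i$ and the triangle inequality.

```latex
% Distance between two distinct orthonormal vectors (i \neq j):
\|e_i - e_j\|^2 = \|e_i\|^2 + \|e_j\|^2 - 2\,\mathrm{Re}\,\langle e_i, e_j\rangle = 2 .
% If some x belonged to both B_i and B_j, the triangle inequality would give
\sqrt{2} = \|e_i - e_j\| \le \|e_i - x\| + \|x - e_j\| < \tfrac12 + \tfrac12 = 1 ,
% which is impossible; hence B_i \cap B_j = \emptyset.
% For any x \in B_i,
\|x\| \le \|x - e_i\| + \|e_i\| < \tfrac12 + 1 < 2 ,
% hence B_i \subset B(0,2) for every i.
```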

The second problem is the fact that the exponent
in the density $e^{(i/\hbar)S_t(\gamma)}$ is imaginary, so that the
exponential oscillates. Even in finite dimensions,
integrals of the form $\int_{\mathbb{R}^N} e^{i\Phi(x)}f(x)\,dx$, where
$\Phi, f:\mathbb{R}^N \to \mathbb{R}$ are continuous functions and f is not
summable, have to be suitably defined, in order to
exploit the cancellations in the integral due to the
oscillatory behavior of the exponential.

The study of the rigorous foundation of Feynman
path integrals began in the 1960s, when Cameron
proved that Feynman's heuristic complex measure
[5] cannot be realized as a complex bounded
variation $\sigma$-additive measure, even on very nice
subsets of the space $(\mathbb{R}^d)^{[0,t]}$ of paths, contrary to
the case of complex measures on $\mathbb{R}^n$ of the form
$e^{(i/2)|x|^2}dx$. In other words, it is not possible to
implement an integration theory in the traditional
(Lebesgue) sense. As a consequence, mathematicians
tried to realize [5] as a linear continuous
functional on a sufficiently rich Banach algebra of
functions, inspired by the fact that a bounded
measure can be regarded as a continuous functional
on the space of bounded continuous functions.
In order to mirror the features of the heuristic
Feynman measure, such a functional should have
some properties:


1. it should behave in a simple way under “transla- 
tions and rotations in path space,” as Dy denotes 
a “flat” measure; 

2. it should satisfy a Fubini-type theorem, concern- 
ing iterated integrations in path space (allowing 
the construction, in physical applications, of a 
one-parameter group of unitary operators); 

3. it should be approximable by finite-dimensional 
oscillatory integrals, allowing a sequential approach 
in the spirit of Feynman’s original work; and 

4. it should be sufficiently flexible to allow a rigorous
mathematical implementation of an infinite-dimensional
version of the stationary-phase
method and the corresponding study of the
semiclassical limit of quantum mechanics.


Nowadays, several implementations of this program 
can be found in the literature of physics and mathe- 
matics, for instance, by means of analytic continuation 
of Wiener integrals, or as an infinite-dimensional 
distribution in the framework of Hida calculus, or via 
“complex Poisson measures,” or via nonstandard 
analysis, or as an infinite-dimensional oscillatory inte- 
gral. The last of these methods is particularly interesting 
as it allows the systematic implementation of an infinite- 
dimensional version of the stationary-phase method, 
which can be applied to the study of the semiclassical 
limit of the solution of the Schrödinger equation [1]. 


Analytic Continuation 


In one of the first approaches in the definition of 
Feynman path integrals, formula [2] was realized as 
the analytic continuation in a suitable complex 
parameter of a (nonoscillatory) Gaussian integral 
on the space of paths. 

In 1949, inspired by Feynman's work, M Kac
observed that by considering the heat equation

$$-\frac{\partial}{\partial t}u = -\frac{1}{2m}\Delta u + V(x)u, \qquad u(0,x) = \psi_0(x) \qquad [6]$$

instead of the Schrödinger equation [1] and by
replacing the oscillatory term $e^{(i/\hbar)S_t(\gamma)}$ in Feynman's
complex measure with the fast decreasing one
$e^{-(1/\hbar)S_t(\gamma)}$, it is possible to give a well-defined
mathematical meaning to Feynman's heuristic formula
[2] in terms of a well-defined integral on the
space of continuous paths $W_{t,x}= \{w \in C(0, t; \mathbb{R}^d):
w(0)=x\}$ with respect to the Wiener Gaussian
measure $P_{t,x}$:

$$u(t,x) = \int_{W_{t,x}} e^{-\int_0^t V(\sqrt{1/m}\,w(\tau))\,d\tau}\,\psi_0(\sqrt{1/m}\,w(t))\,dP_{t,x}(w) \qquad [7]$$


The path-integral representation [7] for the solution of
the heat equation [6] is called the Feynman-Kac formula.
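
A Feynman-Kac expectation of this kind can be estimated by sampling Brownian paths. The sketch below is an illustration under stated assumptions, not part of the article: it takes d = 1, m = 1, starting point x = 0, the potential $V(x) = x^2/2$ and the constant initial datum $\psi_0 \equiv 1$, for which the Cameron-Martin formula gives the exact value $(\cosh t)^{-1/2}$. The function name `feynman_kac_harmonic` and all numerical parameters are hypothetical choices.

```python
import numpy as np

def feynman_kac_harmonic(t=1.0, n_paths=20000, n_steps=200, seed=0):
    """Monte Carlo estimate of E[exp(-int_0^t V(w(s)) ds)] for V(x) = x^2/2.

    w is a standard Brownian motion started at 0 (assumes m = 1, psi_0 = 1).
    The time integral is approximated by the trapezoidal rule on each path.
    """
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    # Path values at times 0, dt, ..., t, with w(0) = 0.
    w = np.hstack([np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)])
    v = 0.5 * w ** 2
    integral = dt * (v[:, 1:-1].sum(axis=1) + 0.5 * (v[:, 0] + v[:, -1]))
    return np.exp(-integral).mean()

if __name__ == "__main__":
    estimate = feynman_kac_harmonic()
    exact = np.cosh(1.0) ** -0.5
    print(estimate, exact)
```

The estimate carries both a statistical error, which shrinks like the inverse square root of the number of paths, and a time-discretization error, so agreement with the closed form is only approximate.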
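The underlying idea of the analytic continuation
approach comes from the fact that by introducing in
[6] a suitable parameter $\lambda$, proportional, for
instance, to the time t as in the case $\lambda = \lambda_1$,

$$-\lambda_1\hbar\frac{\partial}{\partial t}u = -\frac{\hbar^2}{2m}\Delta u + V(x)u$$

$$u(t,x) = \int_{W_{t,x}} e^{-(1/(\lambda_1\hbar))\int_0^t V(\sqrt{\hbar/(m\lambda_1)}\,w(\tau))\,d\tau}\,\psi_0(\sqrt{\hbar/(m\lambda_1)}\,w(t))\,dP_{t,x}(w)$$
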
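or to the Planck constant, as in the case $\lambda = \lambda_2$,

$$\lambda_2\frac{\partial}{\partial t}u = \frac{\lambda_2^2}{2m}\Delta u + V(x)u$$

$$u(t,x) = \int_{W_{t,x}} e^{(1/\lambda_2)\int_0^t V(\sqrt{\lambda_2/m}\,w(\tau))\,d\tau}\,\psi_0(\sqrt{\lambda_2/m}\,w(t))\,dP_{t,x}(w)$$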


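or to the mass, as in the case $\lambda = \lambda_3$,

$$\frac{\partial}{\partial t}u = \frac{1}{2\lambda_3}\Delta u - iV(x)u$$

$$u(t,x) = \int_{W_{t,x}} e^{-i\int_0^t V(\sqrt{1/\lambda_3}\,w(\tau))\,d\tau}\,\psi_0(\sqrt{1/\lambda_3}\,w(t))\,dP_{t,x}(w)$$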


and by allowing $\lambda$ to assume complex values, one
gets, at least heuristically, the Schrödinger equation and its
solution by substituting, respectively, $\lambda_1 = -i$, $\lambda_2 = i\hbar$,
or $\lambda_3 = -im$. These procedures can be made completely
rigorous under suitable conditions on the
potential V and initial datum $\psi_0$.


The Approach via Fourier Transform 


This approach has its roots in a couple of papers
by K Itô in the 1960s and was extensively
developed by S Albeverio and R Høegh-Krohn in
the 1970s. The main idea is the definition of
oscillatory integrals with quadratic phase function
on a real separable Hilbert space $(H,(\cdot,\cdot))$, the
Fresnel integrals,


$$\widetilde{\int_H} e^{(i/2\hbar)\|x\|^2} f(x)\,dx \qquad [8]$$

as the distributional pairing between $e^{(i/2\hbar)\|x\|^2}$ and
a complex-valued function f belonging to the
space $\mathcal{F}(H)$ of functions that are Fourier transforms
of complex bounded variation measures on
H, that is,

$$f = \hat\mu_f, \qquad f(x) = \int_H e^{i(x,y)}\,d\mu_f(y)$$


$\mathcal{F}(H)$ is a Banach algebra, where the product is the
pointwise one and the identity is the function
$f(x) = 1$ for all $x \in H$. The norm of an element f is the
total variation of the corresponding measure $\mu_f$, that
is, $\|\mu_f\| = \sup \sum_i |\mu_f(E_i)|$, where the supremum is
taken over all sequences $\{E_i\}$ of pairwise-disjoint
Borel subsets of H such that $\cup_i E_i = H$.

Given a function $f \in \mathcal{F}(H)$, $f=\hat\mu_f$, its Fresnel
integral is defined by the Parseval formula:

$$\widetilde{\int_H} e^{(i/2\hbar)\|x\|^2} f(x)\,dx = \int_H e^{-(i\hbar/2)\|x\|^2}\,d\mu_f(x) \qquad [9]$$

where the right-hand side is a well-defined absolutely
convergent integral with respect to a $\sigma$-additive
measure on H.

It is important to recall that this approach
provides the implementation of a method of
stationary phase for the expansion of the integral
in powers of the small parameter $\hbar$ occurring in the
integrand. We postpone the discussion of these
results, as well as the application to the solution
of the Schrödinger equation, to the next section
where a generalization of the present approach is
described.


Infinite-Dimensional Oscillatory Integrals 


The main idea of this approach is the extension of
the definition of oscillatory integrals with quadratic
phase function [8] to infinite-dimensional Hilbert
spaces by means of a twofold limiting procedure.
The study of integrals of the form

$$\int_{\mathbb{R}^N} e^{(i/\hbar)\Phi(x)} f(x)\,dx \qquad [10]$$

where $\Phi:\mathbb{R}^N\to\mathbb{R}$ is the phase function and
$f:\mathbb{R}^N\to\mathbb{C}$ a complex-valued continuous function,
is a classical topic, largely developed in connection
with various problems in mathematics (such as the
theory of pseudodifferential operators) and physics
(such as optics). Particular effort has been devoted
to the study of the detailed behavior of the above
integral in the limit of "strong oscillations," that is,
when $\hbar\to 0$, by means of the method of stationary
phase.

Thanks to the cancellations due to the oscillatory
term $e^{(i/\hbar)\Phi(x)}$, the integral can still be defined, even if
the function f is not summable, as the limit of a
sequence of regularized, hence absolutely convergent,
integrals. According to a proposal of Hörmander,
the oscillatory integral of a function $f:\mathbb{R}^N\to\mathbb{C}$ is
well defined if, for each test function $\phi \in S(\mathbb{R}^N)$ such
that $\phi(0) = 1$, the limit

$$\lim_{\epsilon\to 0}\int_{\mathbb{R}^N} e^{(i/\hbar)\Phi(x)}\,\phi(\epsilon x)\,f(x)\,dx$$

exists and is independent of $\phi$.
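
The regularization procedure can be tried out in one dimension. The sketch below is an illustration under stated assumptions rather than the article's construction: it takes N = 1, the quadratic phase $\Phi(x) = x^2/2$, the function $f(x) = e^{iax}$, and a Gaussian regulator $e^{-\epsilon x^2}$, and it normalizes by the same integral with f = 1. For this choice the regularized ratio has the closed form $\exp(-a^2/(4c))$ with $c = \epsilon - i/(2\hbar)$, which tends to $e^{-i\hbar a^2/2}$ as $\epsilon \to 0$. The function names are hypothetical.

```python
import numpy as np

def regularized_integral(a, hbar=1.0, eps=1e-2, half_width=60.0, n=240001):
    """Quadrature of exp(i x^2/(2 hbar)) * exp(-eps x^2) * exp(i a x) over the real line.

    The Gaussian regulator makes the integrand negligible at +-half_width,
    so a uniform Riemann sum on a fine grid is accurate.
    """
    x = np.linspace(-half_width, half_width, n)
    integrand = np.exp(1j * x ** 2 / (2.0 * hbar) - eps * x ** 2 + 1j * a * x)
    return np.sum(integrand) * (x[1] - x[0])

def normalized_oscillatory_integral(a, hbar=1.0, eps=1e-2):
    """Regularized integral of f(x) = exp(i a x), divided by the same integral with f = 1."""
    return regularized_integral(a, hbar, eps) / regularized_integral(0.0, hbar, eps)

def closed_form(a, hbar=1.0, eps=0.0):
    """Exact value of the regularized ratio; eps = 0 gives exp(-i hbar a^2 / 2)."""
    c = eps - 1j / (2.0 * hbar)
    return np.exp(-a ** 2 / (4.0 * c))

if __name__ == "__main__":
    a = 1.3
    print(normalized_oscillatory_integral(a), closed_form(a, eps=1e-2), closed_form(a))
```

The limiting value $e^{-i\hbar a^2/2}$ is what the Parseval-type definition of the Fresnel integral assigns to f when it is the Fourier transform of a unit point mass at a.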

This definition was generalized in the 1980s
by D Elworthy and A Truman to the case where the
underlying space $\mathbb{R}^N$ is replaced by a real separable
infinite-dimensional Hilbert space $(H, (\cdot,\cdot))$, under
the assumption that the phase function is quadratic,
that is, $\Phi(x)=\|x\|^2/2$. The "infinite-dimensional
oscillatory integral"

$$\widetilde{\int_H} e^{(i/2\hbar)\|x\|^2} f(x)\,dx$$

is defined as the limit of a sequence of finite-dimensional
approximations. More precisely, a
function $f:H\to\mathbb{C}$ is "integrable" if, for each
increasing sequence $\{P_n\}_{n\in\mathbb{N}}$ of finite-dimensional
projection operators in H converging strongly to the
identity operator as $n \to \infty$, the limit

$$\lim_{n\to\infty}\Big(\int_{P_nH} e^{(i/2\hbar)\|P_nx\|^2}\,d(P_nx)\Big)^{-1}\int_{P_nH} e^{(i/2\hbar)\|P_nx\|^2} f(P_nx)\,d(P_nx) \qquad [11]$$

exists and is independent of the sequence $\{P_n\}_{n\in\mathbb{N}}$.
In this case, the limit is denoted by

$$\widetilde{\int_H} e^{(i/2\hbar)\|x\|^2} f(x)\,dx$$


The description of the largest class of integrable
functions is still an open problem, even in finite
dimension, but it is possible to find some interesting
subsets of it. In particular, any function belonging
to $F(H)$, the Banach algebra considered in the
approach by Fourier transform, is integrable. Indeed,
by assuming that the function $f$ in [11] is of the type

\[ f(x) = e^{-(i/2\hbar)(x,Lx)}\,g(x) \]

where $L:H\to H$ is a linear self-adjoint trace-class
operator on $H$ such that $(I-L)$ is invertible and
$g\in F(H)$, that is, $g(x)=\int_H e^{i(x,y)}\,d\mu_g(y)$, then it is
possible to prove that $f$ is integrable in the sense of
definition [11] and the corresponding infinite-
dimensional oscillatory integral can be explicitly
computed in terms of a well-defined integral with
respect to the bounded variation measure $\mu_g$ by means
of the following Parseval-type equality:

\[ \widetilde{\int_H} e^{(i/2\hbar)(x,(I-L)x)}\,g(x)\,dx
   = \det(I-L)^{-1/2}\int_H e^{-(i\hbar/2)(y,(I-L)^{-1}y)}\,d\mu_g(y) \qquad [12] \]

$\det(I-L)$ being the Fredholm determinant of the
operator $I-L$, that is, the product of its eigenvalues,
counted with their multiplicity. If $L=0$, then we
obtain eqn [9], so that we can look at the infinite-
dimensional oscillatory integrals approach as a
generalization of the Fourier transform approach, since it
allows, at least in principle, the integration of a class of
functions larger than $F(H)$. In fact, this feature
has recently been used by S Albeverio and S Mazzucchi in the
proof of a Parseval-type equality similar to [12] for
infinite-dimensional oscillatory integrals with
polynomially growing phase functions.

Feynman's heuristic formula [2] for the representation
of the solution of the Schrödinger equation [1] can
be realized as an infinite-dimensional oscillatory integral
on the Hilbert space $H_t$ of absolutely continuous paths
$\gamma:[0,t]\to\mathbb{R}^d$ with endpoint $\gamma(t)=0$ and finite
kinetic energy $\int_0^t\dot\gamma(\tau)^2\,d\tau<\infty$, endowed with the inner
product $(\gamma_1,\gamma_2)=\int_0^t\dot\gamma_1(\tau)\cdot\dot\gamma_2(\tau)\,d\tau$. One has to take
an initial datum $\psi_0\in L^2(\mathbb{R}^d)$ that is the Fourier
transform of a complex bounded variation measure $\mu_0$ on $\mathbb{R}^d$,
that is, $\psi_0(x)=\int_{\mathbb{R}^d}e^{ikx}\,d\mu_0(k)$. Moreover, one has to
assume that the potential $V$ in [1] is the sum of a
harmonic oscillator part plus a bounded perturbation
$V_1$ that is the Fourier transform of a complex bounded
variation measure $\mu_v$ on $\mathbb{R}^d$:

\[ V(x) = \tfrac{1}{2}\,x\Omega^2x + V_1(x), \qquad
   V_1(x) = \int_{\mathbb{R}^d} e^{ikx}\,d\mu_v(k) \]

($\Omega^2$ being a symmetric positive $d\times d$ matrix).
In this case, it is possible to prove that the linear
operator $L$ on $H_t$ defined by

\[ (\gamma,L\gamma) = \int_0^t \gamma(\tau)\,\Omega^2\gamma(\tau)\,d\tau \]

is self-adjoint and trace class, and $(I-L)$ is invertible.
Moreover, by considering the function $v:H_t\to\mathbb{C}$,

\[ v(\gamma) = \int_0^t V_1(\gamma(\tau)+x)\,d\tau
   + 2x\Omega^2\int_0^t\gamma(\tau)\,d\tau, \qquad \gamma\in H_t \]

it is possible to prove that the function $f:H_t\to\mathbb{C}$
given by

\[ f(\gamma) = e^{-(i/\hbar)v(\gamma)}\,\psi_0(\gamma(0)+x) \]

is the Fourier transform of a complex bounded
variation measure $\mu_f$ on $H_t$, and the infinite-
dimensional Fresnel integral of the function
$g(\gamma)=e^{-(i/2\hbar)(\gamma,L\gamma)}f(\gamma)$, that is,

\[ \widetilde{\int_{H_t}} e^{(i/2\hbar)(\gamma,(I-L)\gamma)}\,e^{-(i/\hbar)v(\gamma)}\,
   \psi_0(\gamma(0)+x)\,d\gamma \qquad [13] \]

is well defined and is equal to

\[ \det(I-L)^{-1/2}\int_{H_t} e^{-(i\hbar/2)(\gamma,(I-L)^{-1}\gamma)}\,d\mu_f(\gamma) \]

Moreover, it is a representation of the solution of
equation [1] evaluated at $x\in\mathbb{R}^d$ at time $t$. Recently,
solutions of the Schrödinger equation with quartic
anharmonic potential via infinite-dimensional
oscillatory integrals have been provided by S Albeverio
and S Mazzucchi using a combination of the Parseval
formula and a new analytic method (the inclusion of
such potentials had been a stumbling block for many
years).

In this framework, it is possible to implement an
infinite-dimensional version of the stationary-phase
method and study the asymptotic behavior of the
oscillatory integrals in the limit $\hbar\to 0$.

The method of stationary phase was originally
proposed by Stokes, who noted that when $\hbar\to 0$ the
oscillatory integral [10] is $O(\hbar^n)$ for any $n\in\mathbb{N}$,
provided that there are no critical points of the
phase function $\Phi$ in the support of the function $f$. As
a consequence, one can deduce that the leading
contribution to the integral [10] should come from a
neighborhood of those points $c\in\mathbb{R}^N$ such that
$\nabla\Phi(c)=0$. More precisely, by assuming that the set
$C$ of critical points is finite, that is, $C=\{c_1,\ldots,c_p\}$,
and that every critical point is nondegenerate, that
is, $\det D^2\Phi(c_i)\neq 0$ for all $c_i\in C$, then one has,
writing $I(\hbar)$ for the integral [10],

\[ I(\hbar) \sim \sum_{c_i\in C} e^{(i/\hbar)\Phi(c_i)}\,I_i^*(\hbar) \qquad [14] \]

where $I_i^*:\mathbb{R}\to\mathbb{C}$ are $C^\infty$ functions of $\hbar$, such that

\[ I_i^*(0) = f(c_i)\,(2\pi i\hbar)^{N/2}\,(\det D^2\Phi(c_i))^{-1/2} \]

If some critical point is degenerate, the situation is
more complicated: one has to take into account the
type of degeneracy and apply the theory of
unfoldings of singularities.
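As a hedged illustration of the nondegenerate case (an example chosen here, not taken from the article), consider $N=1$, $\Phi(x)=x^2/2$ and $f(x)=e^{-x^2/2}$. The only critical point is $c=0$, with $\Phi(0)=0$, $f(0)=1$ and $D^2\Phi(0)=1$, and the integral can be computed in closed form as a Gaussian integral with complex coefficient $a=1-i/\hbar$, $\operatorname{Re}a>0$:

```latex
\begin{align*}
\int_{\mathbb{R}} e^{(i/2\hbar)x^2}\,e^{-x^2/2}\,dx
  &= \sqrt{\frac{2\pi}{1-i/\hbar}}
   = \sqrt{\frac{2\pi i\hbar}{1+i\hbar}} \\
  &= (2\pi i\hbar)^{1/2}\,(1+i\hbar)^{-1/2}
   = (2\pi i\hbar)^{1/2}\Bigl(1-\tfrac{i\hbar}{2}+O(\hbar^2)\Bigr).
\end{align*}
```

The leading term $(2\pi i\hbar)^{1/2}f(0)\,(\det D^2\Phi(0))^{-1/2}$ is the standard stationary-phase contribution of the critical point, and the remaining factor $(1+i\hbar)^{-1/2}$ is a smooth function of $\hbar$ whose Taylor coefficients give the higher-order corrections.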

These results can be generalized to infinite-
dimensional oscillatory integrals of the form

\[ I(\hbar) = \widetilde{\int_H} e^{(i/2\hbar)(x,(I-L)x)}\,e^{-(i/\hbar)v(x)}\,g(x)\,dx \qquad [15] \]

with $v(x)=\int_H e^{i(x,y)}\,d\mu(y)$, $g(x)=\int_H e^{i(x,y)}\,d\nu(y)$, $\mu,\nu$
being complex bounded variation measures on $H$
satisfying suitable assumptions, and $L:H\to H$ a
self-adjoint and trace-class linear operator such that
$(I-L)$ is invertible. Under suitable growth conditions on
the moments of the measures $\mu,\nu$ and by assuming
that the phase function $\Phi(x)=\tfrac{1}{2}(x,(I-L)x)-v(x)$
has a finite number of nondegenerate critical points
$c_1,\ldots,c_s$, it is possible to prove that the integral $I(\hbar)$
in [15] is equal to

\[ I(\hbar) = \sum_{k=1}^{s} e^{(i/\hbar)\Phi(c_k)}\,I_k^*(\hbar) + I_0(\hbar) \]

for some $C^\infty$ functions $I_k^*$, $I_0$ satisfying:

\[ I_k^*(0) = \bigl[\det(I-L-D^2v(c_k))\bigr]^{-1/2}\,g(c_k), \qquad k=1,\ldots,s \]
\[ I_0^{(j)}(0) = 0, \qquad j=0,1,2,\ldots \]

Moreover, under some additional smallness
assumptions on $v$, it has been proved that the phase function
$\Phi$ has a unique stationary point $c$ and, as $\hbar\to 0$,

\[ I(\hbar) \sim e^{(i/\hbar)\Phi(c)}\,I^*(\hbar) \]

for some $C^\infty$ function $I^*$. Each term of the
asymptotic expansion in powers of $\hbar$ of the function
$I^*$ can be explicitly computed, and it is possible to
prove that such an asymptotic expansion is Borel
summable and determines $I^*$ uniquely.

The application of these results to the infinite-
dimensional oscillatory integral representation [13]
for the solution of the Schrödinger equation allows
the study of its semiclassical limit. One has to
consider a potential $V$ that is the Fourier transform
of a complex bounded variation measure $\mu$ on $\mathbb{R}^d$,
such that $\int_{\mathbb{R}^d}e^{\epsilon|\beta|}\,d|\mu|(\beta)<\infty$ for some $\epsilon>0$, and a
particular form for the initial wave function
$\psi_0(x)=e^{(i/\hbar)f(x)}\chi(x)$, where $f$ is real and
$f,\chi\in C_0^\infty(\mathbb{R}^d)$ are independent of $\hbar$. This initial
datum corresponds to an initial particle distribution
$\rho_0(x)=|\chi|^2(x)$ and to a limiting value of the
probability current $J_{\hbar=0}=\nabla f(x)\rho_0(x)/m$, giving an
initial particle flux associated to the velocity field
$\nabla f(x)/m$. One also has to assume that the Lagrange
manifold $L_f=(y,-\nabla f)$ intersects transversally the
subset $\Lambda_V$ of the phase space made of all points $(y,p)$
such that $p$ is the momentum at $y$ of a classical
particle that starts at time zero from $x$, moves under
the action of $V$, and ends at $y$ at time $t$. In this case,
the Feynman path integral [13] has an asymptotic
expansion in powers of $\hbar$ for $\hbar\to 0$, whose leading
term is the sum of the values of the function

\[ \left|\det\left(\frac{\partial\bar\gamma_k^{(j)}}{\partial y_l^{(j)}}(y^{(j)},t)\right)\right|^{-1/2}
   e^{-(i/2)\pi m^{(j)}}\,e^{(i/\hbar)S}\,e^{(i/\hbar)f}\,\chi \]

taken at the points $y^{(j)}$ such that a classical particle
starting at $y^{(j)}$ at time zero with momentum $\nabla f(y^{(j)})$
is at $x$ at time $t$. $S$ is the classical action along this
classical path $\bar\gamma^{(j)}$ and $m^{(j)}$ is the Maslov index of the
path $\bar\gamma^{(j)}$, that is, $m^{(j)}$ is the number of zeros of

\[ \det\left(\frac{\partial\bar\gamma_k^{(j)}}{\partial y_l^{(j)}}(y^{(j)},\tau)\right) \]

as $\tau$ varies on the interval $(0,t)$.








White-Noise Calculus 


The leading idea of the present approach, which was
originally proposed by C DeWitt-Morette and
P Krée and presently realized in the framework of
white-noise calculus by T Hida, L Streit, and many
other authors, is the realization of the Feynman
integrand $e^{(i/\hbar)S_t(\gamma)}$ as an infinite-dimensional
distribution. This idea is similar to the one of the
approach via Fourier transform, where the expression
$(2\pi i)^{-d/2}\int_{\mathbb{R}^d}e^{(i/2)(x,x)}f(x)\,dx$ is realized as a
distributional pairing between $e^{(i/2)(x,x)}/(2\pi i)^{d/2}$ and
the function $f\in F(\mathbb{R}^d)$ by means of the Parseval-type
equality [9] and generalized to infinite-dimensional
spaces. In white-noise calculus, the pairing is
realized in a different measure space. Indeed, by
manipulating the integrand in

\[ (2\pi i)^{-d/2}\int_{\mathbb{R}^d} e^{(i/2)(x,x)} f(x)\,dx \]

one has

\[ \int_{\mathbb{R}^d}\frac{e^{(i/2)(x,x)}}{(2\pi i)^{d/2}}\,f(x)\,dx
   = \int_{\mathbb{R}^d}\frac{e^{(i/2)(x,x)+(1/2)(x,x)}}{i^{d/2}}\,f(x)\,
     \frac{e^{-(1/2)(x,x)}}{(2\pi)^{d/2}}\,dx \qquad [16] \]

where the latter line can be interpreted as the
distributional pairing of

\[ \frac{e^{(i/2)(x,x)+(1/2)(x,x)}}{i^{d/2}} \]

and $f$ not with respect to Lebesgue measure but
rather with respect to the standard Gaussian
measure

\[ \frac{e^{-(1/2)(x,x)}}{(2\pi)^{d/2}}\,dx \]

on $\mathbb{R}^d$. The RHS of [16] can be generalized to the
case in which $\mathbb{R}^d$ is replaced by a path space, thanks
to the fact that on infinite-dimensional spaces, even
if Lebesgue measure is meaningless, Gaussian
measures are well defined and can be used as
reference measures. The detailed realization of this
idea, as well as its application to the mathematical
realization of the Feynman integrand, is rather
technical and we do not provide details
here. We recall that this approach has been
successfully applied to the rigorous realization of the Feynman
path-integral formulation of Chern–Simons models.


Other Possible Approaches 


Another possible mathematical definition of Feynman
path integrals is based on Poisson measures. It
was originally proposed by A M Chebotarev and
V P Maslov and further developed by several
authors such as S Albeverio, Ph Blanchard,
Ph Combe, R Høegh-Krohn, M Sirugue, and
V Kolokol'tsov. It can be applied to "phase-space
integrals," to the Dirac equation and in particular
algebraic settings, as well as to the Schrödinger
equation, with potentials of the same type ("Fourier
transform of bounded measure") discussed in the
subsection "Infinite-dimensional oscillatory integrals."

Another possible definition of Feynman path
integrals is based on a "time-slicing" approximation
and a limiting procedure, rather close to Feynman's
original work and based on the Trotter product formula.
The "sequential approach" was proposed originally
by A Truman and further extensively developed by
D Fujiwara and N Kumano-go. The paths $\gamma$ in
formula [2] are approximated by piecewise linear
paths and the Feynman path integral is correspondingly
approximated by a finite-dimensional integral.
In particular, D Fujiwara and N Kumano-go proved
that the integrals defined in this way have some
important properties, such as invariance under
translations and orthogonal transformations. It is
also possible to interchange the order of integration
with Riemann–Stieltjes integrals and study the
semiclassical approximation.

Finally, it is worthwhile to recall a very interesting
and intuitive approach to Feynman integration
which is based on nonstandard analysis. It was
introduced by S Albeverio, J E Fenstad, R Høegh-
Krohn, and T Lindstrøm in the 1980s, but it has not
been systematically developed yet.


Abbreviations 

$D\gamma$        Heuristic Lebesgue-type measure on the space of paths
$P_{t,x}$     Wiener Gaussian measure on $W_{t,x}$
$S_t$         Action functional
$S^0$         Action functional for the free particle
$V$           Potential
$W_{t,x}$     Space of continuous paths with fixed initial point,
              $W_{t,x}=\{w\in C(0,t;\mathbb{R}^d) : w(0)=x\}$
$\hbar$       Reduced Planck constant
$\Phi$        Phase function
$\gamma$      Path, $\gamma:[0,t]\to\mathbb{R}^d$
$\hat\mu$     Fourier transform of the measure $\mu$
$\psi$        Wave function, solution of the Schrödinger equation
$H$           Hilbert space
              Fresnel integral on the Hilbert space $H$
$\widetilde{\int_H}$  Infinite-dimensional oscillatory integral on the Hilbert space $H$
$(\ ,\ )$     Inner product
$\|\ \|$      Norm


See also: Chern–Simons Models: Rigorous Results;
Euclidean Field Theory; Functional Integration in
Quantum Physics; Path Integrals in Noncommutative
Geometry; Quillen Determinant; Singularity and
Bifurcation Theory; Stationary Phase Approximation.


Introduction


Algebras and their representations are ubiquitous in
mathematics. It turns out that representations of
finite-dimensional algebras are intimately related to
quivers, which are simply oriented graphs. Quivers
arise naturally in many areas of mathematics,
including representation theory, algebraic and
differential geometry, Kac–Moody algebras, and
quantum groups. In this article, we give a brief overview
of some of these topics. We start by giving the basic
definitions of associative algebras and their
representations. We then introduce quivers and their
representation theory, mentioning the connection to
the representation theory of associative algebras. We
also discuss in some detail the relationship between
quivers and the theory of Lie algebras.


Associative Algebras


An "algebra" is a vector space $A$ over a field $k$
equipped with a multiplication which is distributive
and such that

\[ a(xy) = (ax)y = x(ay), \qquad \forall\, a\in k,\ x,y\in A \]

When we wish to make the field explicit, we call $A$ a
$k$-algebra. An algebra is "associative" if $(xy)z=x(yz)$
for all $x,y,z\in A$. $A$ has a "unit," or "multiplicative
identity," if it contains an element $1_A$ such that
$1_Ax=x1_A=x$ for all $x\in A$. From now on, we will
assume all algebras are associative with unit. $A$ is said
to be "commutative" if $xy=yx$ for all $x,y\in A$ and
finite dimensional if the underlying vector space of $A$
is finite dimensional.

A vector subspace $I$ of $A$ is called a "left (resp.
right) ideal" if $xy\in I$ for all $x\in A$, $y\in I$ (resp.
$x\in I$, $y\in A$). If $I$ is both a right and a left ideal, it
is called a two-sided ideal of $A$. If $I$ is a two-sided
ideal of $A$, then the factor space $A/I$ is again an
algebra.

An algebra homomorphism is a linear map
$f:A_1\to A_2$ between two algebras such that

\[ f(1_{A_1}) = 1_{A_2}, \qquad f(xy) = f(x)f(y), \quad \forall\, x,y\in A_1 \]

A representation of an algebra $A$ is an algebra
homomorphism $\rho:A\to\mathrm{End}_k(V)$ for a $k$-vector
space $V$. Here $\mathrm{End}_k(V)$ is the space of
endomorphisms of the vector space $V$ with multiplication
given by composition. Given a representation of
an algebra $A$ on a vector space $V$, we may view $V$
as an $A$-module with the action of $A$ on $V$ given
by

\[ a\cdot v = \rho(a)v, \qquad a\in A,\ v\in V \]

A morphism $\psi:V\to W$ of two $A$-modules (or
equivalently, representations of $A$) is a linear map
commuting with the action of $A$. That is, it is a
linear map satisfying

\[ a\cdot\psi(v) = \psi(a\cdot v), \qquad \forall\, a\in A,\ v\in V \]

Let $G$ be a commutative monoid (a set with an
associative multiplication and a unit element). A
$G$-graded $k$-algebra is a $k$-algebra which can be
expressed as a direct sum $A=\bigoplus_{g\in G}A_g$ such that
$aA_g\subseteq A_g$ for all $a\in k$ and $A_{g_1}A_{g_2}\subseteq A_{g_1+g_2}$ for all
$g_1,g_2\in G$. A morphism $\psi:A\to B$ of $G$-graded
algebras is a $k$-algebra morphism respecting the
grading, that is, satisfying $\psi(A_g)\subseteq B_g$ for all
$g\in G$.





Quivers and Path Algebras 


A "quiver" is simply an oriented graph. More
precisely, a quiver is a pair $Q=(Q_0,Q_1)$ where $Q_0$
is a finite set of vertices and $Q_1$ is a finite set of
arrows (oriented edges) between them. For $\rho\in Q_1$,
we let $h(\rho)$ denote the "head" of $\rho$ and $t(\rho)$ denote the
"tail" of $\rho$. A path in $Q$ is a sequence $x=\rho_1\rho_2\cdots\rho_m$
of arrows such that $h(\rho_{i+1})=t(\rho_i)$ for $1\le i\le m-1$.
We let $t(x)=t(\rho_m)$ and $h(x)=h(\rho_1)$ denote the initial
and final vertices of the path $x$. For each vertex
$i\in Q_0$, we let $e_i$ denote the trivial path which starts
and ends at the vertex $i$.

Fix a field $k$. The path algebra $kQ$ associated to a
quiver $Q$ is the $k$-algebra whose underlying vector
space has basis the set of paths in $Q$, and with the
product of paths given by concatenation. Thus, if
$x=\rho_1\cdots\rho_m$ and $y=\sigma_1\cdots\sigma_n$ are two paths, then
$xy=\rho_1\cdots\rho_m\sigma_1\cdots\sigma_n$ if $h(y)=t(x)$ and $xy=0$
otherwise. We also have

\[ e_ie_j = \begin{cases} e_i & \text{if } i=j \\ 0 & \text{if } i\neq j \end{cases}
   \qquad
   e_ix = \begin{cases} x & \text{if } h(x)=i \\ 0 & \text{if } h(x)\neq i \end{cases}
   \qquad
   xe_i = \begin{cases} x & \text{if } t(x)=i \\ 0 & \text{if } t(x)\neq i \end{cases} \]

for any path $x$ in $Q$. This multiplication is associative. Note
that, writing $A=kQ$, the spaces $e_iA$ and $Ae_i$ have bases given
by the sets of paths ending and starting at $i$, respectively. The path
algebra has a unit given by $\sum_{i\in Q_0}e_i$.


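Example 1 Let $Q$ be the following quiver:

[Figure: quiver with vertices 1, 2, 3, 4 and arrows $\rho:1\to 2$, $\sigma:2\to 3$, $\lambda:4\to 3$]

Then $kQ$ has a basis given by the set of paths
$\{e_1,e_2,e_3,e_4,\rho,\sigma,\lambda,\sigma\rho\}$. Some sample products are
$\rho\sigma=0$, $\lambda\lambda=0$, $\lambda\sigma=0$, $e_3\sigma=\sigma e_2=\sigma$, $e_2\sigma=0$.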
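The multiplication rule for paths can be sketched in a few lines of code. This is a minimal illustration, not part of the article: the orientation of the arrows is an assumption read off from the basis $\{e_1,\ldots,e_4,\rho,\sigma,\lambda,\sigma\rho\}$ and from the Euler matrix of Example 6, and the names `ARROWS`, `multiply` and `all_paths` are hypothetical. A path is stored as `(head, tail, arrows)`, with the arrows listed left to right as in $x=\rho_1\cdots\rho_m$, and `None` stands for the zero element.

```python
# Assumed quiver: arrow name -> (tail, head)
ARROWS = {"rho": (1, 2), "sigma": (2, 3), "lambda": (4, 3)}
VERTICES = (1, 2, 3, 4)


def trivial(i):
    """Trivial path e_i at the vertex i."""
    return (i, i, ())


def arrow(name):
    """Path of length one given by a single arrow."""
    tail, head = ARROWS[name]
    return (head, tail, (name,))


def multiply(x, y):
    """Product xy of two paths: nonzero only if h(y) = t(x)."""
    head_x, tail_x, arrows_x = x
    head_y, tail_y, arrows_y = y
    if head_y != tail_x:
        return None  # the product is 0 in the path algebra
    return (head_x, tail_y, arrows_x + arrows_y)


def all_paths():
    """List every path; terminates only if the quiver has no oriented cycles."""
    paths = [trivial(i) for i in VERTICES]
    frontier = [arrow(name) for name in ARROWS]
    while frontier:
        paths.extend(frontier)
        longer = []
        for x in frontier:
            for name in ARROWS:
                product = multiply(arrow(name), x)  # extend x by one more arrow
                if product is not None:
                    longer.append(product)
        frontier = longer
    return paths


if __name__ == "__main__":
    for path in all_paths():
        print(path)
```

Under these assumptions `all_paths()` returns eight paths, matching the basis listed in Example 1. For a quiver with an oriented cycle the loop would not terminate, in line with the fact that the path algebra is then infinite dimensional.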


Example 2 Let $Q$ be the following quiver (the
so-called "Jordan quiver").

[Figure: quiver with a single vertex and a single loop $\rho$]

Then $kQ\cong k[t]$, the algebra of polynomials in one
variable.


Note that the path algebra $kQ$ is finite dimensional
if and only if $Q$ has no oriented cycles (paths
with the same head and tail vertex).


Example 3 Let $Q$ be the following quiver:

    1 → 2 → 3 → ⋯ → n−2 → n−1 → n

Then for every $1\le i\le j\le n$, there is a unique path
from $i$ to $j$. Let $f:kQ\to M_n(k)$ be the linear map from
the path algebra to the $n\times n$ matrices with entries in
the field $k$ that sends the unique path from $i$ to $j$ to the
matrix $E_{ji}$ with $(j,i)$ entry 1 and all other entries zero.
Then one can show that $f$ is an isomorphism onto the
algebra of lower triangular matrices.


Representations of Quivers 


Fix a field $k$. A representation of a quiver $Q$ is an
assignment of a vector space to each vertex and to
each arrow a linear map between the vector spaces
assigned to its tail and head. More precisely, a
representation $V$ of $Q$ is a collection

\[ \{V_i \mid i\in Q_0\} \]

of finite-dimensional $k$-vector spaces together with a
collection

\[ \{V_\rho : V_{t(\rho)}\to V_{h(\rho)} \mid \rho\in Q_1\} \]

of $k$-linear maps. Note that a representation $V$ of a
quiver $Q$ is equivalent to a representation of the
path algebra $kQ$. The dimension of $V$ is the map
$d_V:Q_0\to\mathbb{Z}_{\ge 0}$ given by $d_V(i)=\dim V_i$ for $i\in Q_0$.

If $V$ and $W$ are two representations of a quiver $Q$,
then a morphism $\psi:V\to W$ is a collection of $k$-linear
maps

\[ \{\psi_i : V_i\to W_i \mid i\in Q_0\} \]

such that

\[ W_\rho\,\psi_{t(\rho)} = \psi_{h(\rho)}\,V_\rho, \qquad \forall\,\rho\in Q_1 \]


Proposition 1 Let $A$ be a finite-dimensional
$k$-algebra. Then the category of representations of
$A$ is equivalent to the category of representations of
the algebra $kQ/I$ for some quiver $Q$ and some two-
sided ideal $I$ of $kQ$.


It is for this reason that the study of finite-
dimensional associative algebras is intimately related
to the study of quivers.

We define the direct sum $V\oplus W$ of two
representations $V$ and $W$ of a quiver $Q$ by

\[ (V\oplus W)_i = V_i\oplus W_i, \qquad i\in Q_0 \]

and $(V\oplus W)_\rho : V_{t(\rho)}\oplus W_{t(\rho)}\to V_{h(\rho)}\oplus W_{h(\rho)}$ by

\[ (V\oplus W)_\rho((v,w)) = (V_\rho(v),\,W_\rho(w)) \]

for $v\in V_{t(\rho)}$, $w\in W_{t(\rho)}$, $\rho\in Q_1$. A representation $V$ is
"trivial" if $V_i=0$ for all $i\in Q_0$ and "simple" if its
only subrepresentations are the zero representation
and $V$ itself. We say that $V$ is "decomposable" if it is
isomorphic to $W\oplus U$ for some nontrivial
representations $W$ and $U$. Otherwise, we call $V$
"indecomposable." Every representation of a quiver has a
decomposition into indecomposable representations
that is unique up to isomorphism and permutation
of the components. Thus, to classify all
representations of a quiver, it suffices to classify the
indecomposable representations.
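Example 4 Let $Q$ be the following quiver:

[Figure: quiver with two vertices 1 and 2 joined by a single arrow $\rho$]

Then $Q$ has three indecomposable representations
$U$, $V$, and $W$ given by:

\[ U_1=k,\quad U_2=0,\quad U_\rho=0 \]
\[ V_1=0,\quad V_2=k,\quad V_\rho=0 \]
\[ W_1=k,\quad W_2=k,\quad W_\rho=1 \]

Then any representation $Z$ of $Q$ is isomorphic to

\[ Z \cong U^{d_1-r}\oplus V^{d_2-r}\oplus W^{r} \]

where $d_1=\dim Z_1$, $d_2=\dim Z_2$, $r=\operatorname{rank}Z_\rho$.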
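The decomposition in Example 4 only depends on the two dimensions and on the rank of the single linear map, so the multiplicities can be computed directly. The sketch below is an illustration under assumptions: the arrow is taken to point from vertex 1 to vertex 2, the map $Z_\rho$ is given as a nonempty $d_2\times d_1$ matrix with rational entries, and the helper names `rank` and `multiplicities` are hypothetical.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix with rational entries, by exact Gaussian elimination."""
    rows = [[Fraction(value) for value in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def multiplicities(z_rho):
    """Multiplicities of U, V, W in Z, where z_rho is the d2 x d1 matrix of Z_rho."""
    d2 = len(z_rho)
    d1 = len(z_rho[0])
    r = rank(z_rho)
    return {"U": d1 - r, "V": d2 - r, "W": r}


if __name__ == "__main__":
    # A rank-one map from a 3-dimensional space to a 2-dimensional space.
    print(multiplicities([[1, 2, 3], [2, 4, 6]]))
```

For the rank-one map in the usage example, $d_1=3$, $d_2=2$ and $r=1$, so $Z\cong U^2\oplus V\oplus W$.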


Example 5 Let $Q$ be the Jordan quiver. Then
representations $V$ of $Q$ are classified up to
isomorphism by the Jordan normal form of $V_\rho$, where $\rho$
is the single arrow of the quiver. Indecomposable
representations correspond to single Jordan blocks.
These are parametrized by a discrete parameter $n$
(the size of the block) and a continuous parameter $\lambda$
(the eigenvalue of the block).


A quiver is said to be of "finite type" if it has only
finitely many indecomposable representations (up to
isomorphism). If a quiver has infinitely many
isomorphism classes of indecomposable representations
but they can be split into
families, each parametrized by a single continuous
parameter, then we say the quiver is of "tame" (or
"affine") type. If a quiver is of neither finite nor
tame type, it is of "wild type." It turns out that there
is a rather remarkable relationship between the
classification of quivers and their representations
and the theory of Kac–Moody algebras.

The "Euler form" or "Ringel form" of a quiver $Q$
is defined to be the asymmetric bilinear form on $\mathbb{Z}^{Q_0}$
given by

\[ \langle\alpha,\beta\rangle = \sum_{i\in Q_0}\alpha(i)\beta(i)
   - \sum_{\rho\in Q_1}\alpha(t(\rho))\,\beta(h(\rho)) \]

In the standard coordinate basis of $\mathbb{Z}^{Q_0}$, the Euler
form is represented by the matrix $E=(a_{ij})$ where

\[ a_{ij} = \delta_{ij} - \#\{\rho\in Q_1 \mid t(\rho)=i,\ h(\rho)=j\} \]

Here $\delta_{ij}$ is the Kronecker delta symbol. We define
the "Cartan form" of the quiver $Q$ to be the
symmetric bilinear form given by

\[ (\alpha,\beta) = \langle\alpha,\beta\rangle + \langle\beta,\alpha\rangle \]

Note that the Cartan form is independent of the
orientation of the arrows in $Q$. In the standard
coordinate basis of $\mathbb{Z}^{Q_0}$, the Cartan form is represented
by the Cartan matrix $C=(c_{ij})$ where $c_{ij}=a_{ij}+a_{ji}$.


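Example 6 For the quiver in Example 1, the Euler
matrix is

\[ E = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{pmatrix} \]

and the Cartan matrix is

\[ C = \begin{pmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{pmatrix} \]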
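The matrices of Example 6 follow mechanically from the list of arrows. The following sketch shows one way to compute them; the function names are hypothetical, vertices are assumed to be numbered $1,\ldots,n$, and the arrow list $1\to 2$, $2\to 3$, $4\to 3$ is the orientation encoded by the Euler matrix above.

```python
def euler_matrix(n, arrows):
    """Matrix (a_ij) with a_ij = delta_ij - #{arrows with tail i and head j}."""
    e = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for tail, head in arrows:
        e[tail - 1][head - 1] -= 1
    return e


def cartan_matrix(e):
    """Symmetrization c_ij = a_ij + a_ji of the Euler matrix."""
    n = len(e)
    return [[e[i][j] + e[j][i] for j in range(n)] for i in range(n)]


def euler_form(e, alpha, beta):
    """Value <alpha, beta> of the Euler form on two dimension vectors."""
    n = len(e)
    return sum(alpha[i] * e[i][j] * beta[j] for i in range(n) for j in range(n))


def tits_form(e, alpha):
    """Quadratic form q(alpha) = <alpha, alpha>."""
    return euler_form(e, alpha, alpha)


if __name__ == "__main__":
    E = euler_matrix(4, [(1, 2), (2, 3), (4, 3)])
    for row in E:
        print(row)
    for row in cartan_matrix(E):
        print(row)
    print(tits_form(E, (1, 1, 1, 1)))
```

Reversing any arrow changes `euler_matrix` but leaves `cartan_matrix` unchanged, which is the orientation independence noted in the text.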


The "Tits form" $q$ of a quiver $Q$ is defined by

\[ q(\alpha) = \langle\alpha,\alpha\rangle = \tfrac{1}{2}(\alpha,\alpha) \]

It is known that the number of continuous
parameters describing representations of dimension $\alpha$ for
$\alpha\neq 0$ is greater than or equal to $1-q(\alpha)$.

Let $\mathfrak{g}$ be the Kac–Moody algebra associated to
the Cartan matrix of a quiver $Q$. By forgetting the
orientation of the arrows of $Q$, we obtain the
underlying (undirected) graph. This is the Dynkin
graph of $\mathfrak{g}$. Associated to $\mathfrak{g}$ is a root system and a set
of simple roots $\{\alpha_i \mid i\in Q_0\}$ indexed by the vertices of
the Dynkin graph.


Theorem 1 (Gabriel's theorem).

(i) A quiver is of finite type if and only if the
underlying graph is a union of Dynkin graphs
of type $A$, $D$, or $E$.

(ii) A quiver is of tame type if and only if the
underlying graph is a union of Dynkin graphs
of type $A$, $D$, or $E$ and extended Dynkin graphs
of type $\hat A$, $\hat D$, or $\hat E$ (with at least one extended
Dynkin graph).

(iii) The isomorphism classes of indecomposable
representations of a quiver $Q$ of finite type are
in one-to-one correspondence with the positive
roots of the root system associated to the
underlying graph of $Q$. The correspondence is
given by

\[ V \mapsto \sum_{i\in Q_0} d_V(i)\,\alpha_i \]

The Dynkin graphs of type $A$, $D$, and $E$ are as follows.

[Figure: Dynkin graphs of type $A_n$, $D_n$, $E_6$, $E_7$, $E_8$]

Here the subscript indicates the number of vertices in the
graph.

The extended Dynkin graphs of type $\hat A$, $\hat D$, and $\hat E$
are as follows.

[Figure: extended Dynkin graphs of type $\hat A_n$, $\hat D_n$, $\hat E_6$, $\hat E_7$, $\hat E_8$]

Here we have used an open dot to denote the vertex
that was added to the corresponding Dynkin graph
of type $A$, $D$, or $E$.


Theorem 2 (Kac's theorem). Let $Q$ be an arbitrary
quiver. The dimension vectors of indecomposable
representations of $Q$ correspond to positive roots
of the root system associated to the underlying graph
of $Q$ (and are thus independent of the orientation of
the arrows of $Q$). The correspondence is given by

\[ d_V \mapsto \sum_{i\in Q_0} d_V(i)\,\alpha_i \]

Note that in Kac's theorem, it is not asserted that the
isomorphism classes are in one-to-one correspondence
with the roots as in the finite case considered in
Gabriel's theorem. It turns out that in the general case,
dimension vectors for which there is exactly one
isomorphism class correspond to real roots while
imaginary roots correspond to dimension vectors for
which there are families of representations.


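Example 7 Let $Q$ be the quiver of type $A_n$,
oriented as follows.

[Figure: quiver of type $A_n$ with vertices $1,2,\ldots,n$ and arrows $\rho_1,\rho_2,\ldots,\rho_{n-1}$, where $\rho_i$ joins the vertices $i$ and $i+1$]

It is known that the set of positive roots of the
simple Lie algebra of type $A_n$ is

\[ \Bigl\{\,\sum_{i=j}^{l}\alpha_i \;\Bigm|\; 1\le j\le l\le n \Bigr\}\cup\{0\} \]

The zero root corresponds to the trivial representation.
The root $\sum_{i=j}^{l}\alpha_i$ for some $1\le j\le l\le n$ corresponds
to the unique (up to isomorphism) representation $V$
with

\[ V_i = \begin{cases} k & \text{if } j\le i\le l \\ 0 & \text{otherwise} \end{cases} \]

and

\[ V_{\rho_i} = \begin{cases} 1 & \text{if } j\le i\le l-1 \\ 0 & \text{otherwise} \end{cases} \]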
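The list of positive roots in Example 7 can be cross-checked by brute force. The sketch below relies on the standard fact that, for a Dynkin graph of type $A$, $D$, or $E$, the positive roots are exactly the nonzero vectors $\alpha$ with nonnegative integer entries and $q(\alpha)=1$, where $q$ is the Tits form. The function names are hypothetical, and the search bound `max_coeff=2` is an assumption that is more than enough for type $A$, where all coefficients are 0 or 1.

```python
from itertools import product


def tits_form_A(alpha):
    """Tits form of a quiver of type A_n: sum of squares minus sum over edges."""
    squares = sum(a * a for a in alpha)
    edges = sum(alpha[i] * alpha[i + 1] for i in range(len(alpha) - 1))
    return squares - edges


def positive_roots_A(n, max_coeff=2):
    """Nonzero vectors with entries in 0..max_coeff on which the Tits form equals 1."""
    return [
        alpha
        for alpha in product(range(max_coeff + 1), repeat=n)
        if any(alpha) and tits_form_A(alpha) == 1
    ]


def is_interval(alpha):
    """True if alpha is the indicator vector of a set {j, j+1, ..., l}."""
    support = [i for i, a in enumerate(alpha) if a]
    return all(a in (0, 1) for a in alpha) and support == list(
        range(support[0], support[-1] + 1)
    )


if __name__ == "__main__":
    roots = positive_roots_A(4)
    print(len(roots))
    for alpha in roots:
        print(alpha, is_interval(alpha))
```

For $n=4$ the search finds $n(n+1)/2=10$ vectors, each the indicator of an interval $\{j,\ldots,l\}$, in agreement with the sums $\sum_{i=j}^{l}\alpha_i$ above. The Tits form does not depend on the orientation, so the same list is obtained for every orientation of the arrows.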


Example 8 Let $Q$ be the quiver of type $\hat A_n$, with all
arrows oriented in the same direction (for instance,
counter-clockwise). The positive root $\sum_{i=0}^{n}\alpha_i$
(where $\{0,1,2,\ldots,n\}$ are the vertices of the quiver)
is imaginary. There is a one-parameter family of
isomorphism classes of indecomposable
representations where the maps assigned to each arrow are
nonzero. The parameter is the composition of the
maps around the loop.


If a quiver $Q$ has no oriented cycles, then the only
simple $kQ$-modules are the modules $S^i$ for $i\in Q_0$,
where

\[ S^i_j = \begin{cases} k & \text{if } i=j \\ 0 & \text{if } i\neq j \end{cases} \]

and $S^i_\rho=0$ for all $\rho\in Q_1$.


Ringel–Hall Algebras


Let $k$ be the finite field $\mathbb{F}_q$ with $q$ elements and let
$Q$ be a quiver with no oriented cycles. Let $P$ be
the set of all isomorphism classes of $kQ$-modules
which are finite as sets (since $k$ is finite,
these are just the finite-dimensional quiver representations we
considered above). Let $A$ be a commutative
integral domain containing $\mathbb{Z}$ and elements $v,v^{-1}$
such that $v^2=q$. The Ringel–Hall algebra
$H=H_{A,v}(kQ)$ is the free $A$-module with basis
$\{[V]\}$ indexed by the isomorphism classes of
representations of the quiver $Q$, with an $A$-bilinear
multiplication defined by

\[ [V^1]\cdot[V^2] = v^{\langle\dim V^1,\,\dim V^2\rangle}\sum_{V} g^{V}_{V^1,V^2}\,[V] \]

Here $\langle\dim V^1,\dim V^2\rangle$ is the Euler form and $g^{V}_{V^1,V^2}$
is the number of submodules $W$ of $V$ such that
$V/W\cong V^1$ and $W\cong V^2$. $H$ is an associative
$(\mathbb{Z}_{\ge 0})^{Q_0}$-graded algebra, with identity element $[0]$, the
isomorphism class of the trivial representation.
The grading $H=\bigoplus_\alpha H_\alpha$ is given by letting $H_\alpha$ be
the $A$-span of the set of isomorphism classes $[V]$
such that $\dim V=\alpha$.

Let $C=C_{A,v}(kQ)$ be the $A$-subalgebra of $H$
generated by the isomorphism classes $[S^i]$ of the
simple $kQ$-modules. $C$ is called the "composition
algebra." If the underlying graph of $Q$ is of finite
type, then $C=H$.

Now let $K$ be a set of finite fields $k$ such that the
set $\{|k| \mid k\in K\}$ is infinite. Let $A$ be an integral
domain containing $\mathbb{Q}$ and, for each $k\in K$, an
element $v_k$ such that $v_k^2=|k|$. For each $k\in K$, we
have the corresponding composition algebra $C_k$,
generated by the elements $[{}_kS^i]$ (here we make the
field $k$ explicit). Now let $C$ be the subring of
$\prod_{k\in K}C_k$ generated by $\mathbb{Q}$ and the elements

\[ t = (t_k)_{k\in K}, \qquad t_k = v_k \]
\[ t^{-1} = (t_k^{-1})_{k\in K}, \qquad t_k^{-1} = (v_k)^{-1} \]
\[ u^i = (u^i_k)_{k\in K}, \qquad u^i_k = [{}_kS^i], \quad i\in Q_0 \]

Now, $t$ lies in the center of $C$ and if $p(t)=0$ for some
polynomial $p$, then $p$ must be the zero polynomial
since the set of $v_k$ is infinite. Thus, we may think
of $C$ as the $A$-algebra generated by the $u^i$, $i\in Q_0$,
with $A=\mathbb{Q}[t,t^{-1}]$ and $t$ an indeterminate. Let
$C^*=\mathbb{Q}(t)\otimes_A C$. We call $C^*$ the "generic
composition algebra."

Let $\mathfrak{g}$ be the Kac–Moody algebra associated to the
Cartan matrix of the quiver $Q$ and let $U$ be the
quantum group associated by Drinfeld and Jimbo to $\mathfrak{g}$.
It has a triangular decomposition $U=U^-\otimes U^0\otimes U^+$.
Specifically, $U^+$ is the $\mathbb{Q}(t)$-algebra with generators
$E_i$, $i\in Q_0$, and relations

\[ \sum_{p=0}^{1-c_{ij}} (-1)^p \begin{bmatrix} 1-c_{ij} \\ p \end{bmatrix}
   E_i^{p}\,E_j\,E_i^{1-c_{ij}-p} = 0, \qquad i\neq j \]

where $c_{ij}$ are the entries of the Cartan matrix and

\[ \begin{bmatrix} m \\ p \end{bmatrix} = \frac{[m]!}{[p]!\,[m-p]!}, \qquad
   [n] = \frac{t^n-t^{-n}}{t-t^{-1}}, \qquad
   [n]! = [1][2]\cdots[n] \]

Theorem 3 There is a $\mathbb{Q}(t)$-algebra isomorphism
$C^*\to U^+$ sending $u^i\mapsto E_i$ for all $i\in Q_0$.


The proof of Theorem 3 is due to Ringel in the 
case that the underlying graph of O is of finite or 
affine type. The more general case presented here is 
due to Green. 

All of the Kac-Moody algebras considered so 
far have been simply-laced. That is, their Cartan 
matrices are symmetric. There is a way to deal 
with non-simply-laced Kac-Moody algebras using 
species. We will not treat this subject in this 
article. 


Quiver Varieties 


One can use varieties associated to quivers to yield a 
geometric realization of the upper half of the 
universal enveloping algebra of a Kac—Moody 
algebra g and its irreducible highest-weight 
representations. 


Lusztig’s Quiver Varieties 


We first introduce the quiver varieties, first
defined by Lusztig, which yield a geometric
realization of the upper half $U^+$ of the universal
enveloping algebra of a simply laced Kac–Moody
algebra $\mathfrak{g}$. Let $Q=(Q_0,Q_1)$ be the quiver whose
vertices $Q_0$ are the vertices of the Dynkin
diagram of $\mathfrak{g}$ and whose set of arrows $Q_1$ consists
of all the edges of the Dynkin diagram with both
orientations. By definition, $U^+$ is the $\mathbb{Q}$-algebra
defined by generators $e_i$, $i\in Q_0$, subject to the
Serre relations

\[ \sum_{p=0}^{1-c_{ij}} (-1)^p \binom{1-c_{ij}}{p}\, e_i^{p}\,e_j\,e_i^{1-c_{ij}-p} = 0 \]

for all $i\neq j$ in $Q_0$, where $c_{ij}$ are the entries of the Cartan
matrix associated to $Q$. For any $\nu=\sum_{i\in Q_0}\nu_i\,i$, $\nu_i\in\mathbb{N}$,
let $U^+_\nu$ be the subspace of $U^+$ spanned by the
monomials $e_{i_1}e_{i_2}\cdots e_{i_n}$ for various sequences
$i_1,i_2,\ldots,i_n$ in which $i$ appears $\nu_i$ times for each
$i\in Q_0$. Thus, $U^+=\bigoplus_\nu U^+_\nu$. Let $U^+_{\mathbb{Z}}$ be the subring of
$U^+$ generated by the elements $e_i^p/p!$ for $i\in Q_0$, $p\in\mathbb{N}$.
Then $U^+_{\mathbb{Z}}=\bigoplus_\nu U^+_{\mathbb{Z},\nu}$, where $U^+_{\mathbb{Z},\nu}=U^+_{\mathbb{Z}}\cap U^+_\nu$.

We define the involution $\bar{\ }:Q_1\to Q_1$ to be the
function which takes $\rho\in Q_1$ to the element of $Q_1$
consisting of the same edge with opposite
orientation. An orientation of our graph/quiver is a choice
of a subset $\Omega\subset Q_1$ such that $\Omega\cup\bar\Omega=Q_1$ and
$\Omega\cap\bar\Omega=\emptyset$.

Let $\mathcal{V}$ be the category of finite-dimensional
$Q_0$-graded vector spaces $V=\bigoplus_{i\in Q_0}V_i$ over $\mathbb{C}$
with morphisms being linear maps respecting
the grading. Then $V\in\mathcal{V}$ shall denote that $V$ is an
object of $\mathcal{V}$. The dimension of $V\in\mathcal{V}$ is given by
$\nu=\dim V=(\dim V_0,\ldots,\dim V_n)$.

Given $V\in\mathcal{V}$, let $E_V$ be the space of
representations of $Q$ with underlying vector space $V$. That
is,

\[ E_V = \bigoplus_{\rho\in Q_1}\mathrm{Hom}(V_{t(\rho)},V_{h(\rho)}) \]

For any subset $Q_1'$ of $Q_1$, let $E_{V,Q_1'}$ be the subspace
of $E_V$ consisting of all vectors $x=(x_\rho)$ such that
$x_\rho=0$ whenever $\rho\notin Q_1'$. The algebraic group
$G_V=\prod_i\mathrm{Aut}(V_i)$ acts on $E_V$ and $E_{V,Q_1'}$ by

\[ (g,x) = ((g_i),(x_\rho)) \mapsto gx = (x'_\rho)
   = (g_{h(\rho)}\,x_\rho\,g_{t(\rho)}^{-1}) \]

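Define the function $\varepsilon:Q_1\to\{-1,1\}$ by $\varepsilon(\rho)=1$ for
all $\rho\in\Omega$ and $\varepsilon(\rho)=-1$ for all $\rho\in\bar\Omega$. Let $\langle\cdot,\cdot\rangle$ be the
nondegenerate, $G_V$-invariant, symplectic form on
$E_V$ with values in $\mathbb{C}$ defined by

\[ \langle x,y\rangle = \sum_{\rho\in Q_1}\varepsilon(\rho)\,\mathrm{tr}(x_\rho\,y_{\bar\rho}) \]

Note that $E_V$ can be considered as the cotangent
space of $E_{V,\Omega}$ under this form.

The moment map associated to the $G_V$-action on
the symplectic vector space $E_V$ is the map
$\psi:E_V\to\mathfrak{gl}_V=\prod_i\mathrm{End}\,V_i$, the Lie algebra of $G_V$,
with $i$-component $\psi_i:E_V\to\mathrm{End}\,V_i$ given by

\[ \psi_i(x) = \sum_{\rho\in Q_1,\ h(\rho)=i}\varepsilon(\rho)\,x_\rho\,x_{\bar\rho} \]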

Definition 1 An element $x\in E_V$ is said to be
nilpotent if there exists an $N\ge 1$ such that for any
sequence $\rho_1,\rho_2,\ldots,\rho_N$ in $Q_1$ satisfying $t(\rho_1)=h(\rho_2)$,
$t(\rho_2)=h(\rho_3),\ldots,t(\rho_{N-1})=h(\rho_N)$, the composition
$x_{\rho_1}x_{\rho_2}\cdots x_{\rho_N}:V_{t(\rho_N)}\to V_{h(\rho_1)}$ is zero.


Definition 2 Let $\Lambda_V$ be the set of all nilpotent
elements $x\in E_V$ such that $\psi_i(x)=0$ for all $i\in Q_0$.


A subset of an algebraic variety is said to be
"constructible" if it is obtained from subvarieties
by a finite number of the usual set-theoretic
operations. A function $f:A\to\mathbb{Q}$ on an algebraic
variety $A$ is said to be a constructible function if $f^{-1}(a)$
is a constructible set for all $a\in\mathbb{Q}$ and is empty for all
but finitely many $a$. Let $M(\Lambda_V)$ denote the $\mathbb{Q}$-vector
space of all constructible functions on $\Lambda_V$. Let $\tilde M(\Lambda_V)$
denote the $\mathbb{Q}$-subspace of $M(\Lambda_V)$ consisting of those
functions that are constant on any $G_V$-orbit in $\Lambda_V$.

Let $V,V',V''\in\mathcal{V}$ such that $\dim V=\dim V'+
\dim V''$. Now, suppose that $S$ is a $Q_0$-graded subspace
of $V$. For $x\in\Lambda_V$ we say that $S$ is $x$-stable if $x(S)\subseteq S$.
Let $\Lambda_{V;V',V''}$ be the variety consisting of all pairs $(x,S)$
where $x\in\Lambda_V$ and $S$ is a $Q_0$-graded $x$-stable subspace of
$V$ such that $\dim S=\dim V''$. Now, if we fix some
isomorphisms $V/S\cong V'$, $S\cong V''$, then $x$ induces
elements $x'\in\Lambda_{V'}$ and $x''\in\Lambda_{V''}$. We then have the maps

\[ \Lambda_{V'}\times\Lambda_{V''} \xleftarrow{\;p_1\;} \Lambda_{V;V',V''} \xrightarrow{\;p_2\;} \Lambda_V \]

where $p_1(x,S)=(x',x'')$, $p_2(x,S)=x$.

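For a holomorphic map $\pi$ between complex
varieties $A$ and $B$, let $\pi_!$ denote the map between
the spaces of constructible functions on $A$ and $B$
given by

\[ (\pi_!f)(b) = \sum_{a\in\mathbb{Q}} a\,\chi\bigl(\pi^{-1}(b)\cap f^{-1}(a)\bigr) \]

where $\chi$ denotes the Euler characteristic.
Let $\pi^*$ be the pullback map from functions on $B$ to
functions on $A$ acting as $\pi^*f(a)=f(\pi(a))$. We then
define a map

\[ \tilde M(\Lambda_{V'})\times\tilde M(\Lambda_{V''})\to\tilde M(\Lambda_V) \qquad [1] \]

by $(f',f'')\mapsto f'*f''$ where

\[ f'*f'' = (p_2)_!\,p_1^*(f'\times f'') \]

Here $f'\times f''\in\tilde M(\Lambda_{V'}\times\Lambda_{V''})$ is defined by
$(f'\times f'')(x',x'')=f'(x')f''(x'')$. The map [1] is bilinear
and defines an associative $\mathbb{Q}$-algebra structure on
$\bigoplus_\nu\tilde M(\Lambda_{V^\nu})$ where $V^\nu$ is the object of $\mathcal{V}$ defined by
$V^\nu=\bigoplus_{i\in Q_0}\mathbb{C}^{\nu_i}$.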

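There is a unique algebra homomorphism
$\kappa: U^+ \to \bigoplus_\nu \tilde{M}(\Lambda_{V^\nu})$ such that $\kappa(e_i)$ is the function
on the point $\Lambda_{V^i}$ with value 1. Then $\kappa$ restricts to a
map $\kappa_\nu: U^+_\nu \to \tilde{M}(\Lambda_{V^\nu})$. It can be shown that
$\kappa_{pi}(e_i^p/p!)$ is the function 1 on the point $\Lambda_{V^{pi}}$ for
$i \in Q_0$, $p \in \mathbb{Z}_{\geq 0}$.

Let $\tilde{M}_{\mathbb{Z}}(\Lambda_V)$ be the set of all functions in $\tilde{M}(\Lambda_V)$ that
take on only integer values. One can show that if
$f' \in \tilde{M}_{\mathbb{Z}}(\Lambda_{V'})$ and $f'' \in \tilde{M}_{\mathbb{Z}}(\Lambda_{V''})$, then $f' * f'' \in \tilde{M}_{\mathbb{Z}}(\Lambda_V)$
in the setup of [1]. Thus $\kappa_\nu(U^+_{\mathbb{Z},\nu}) \subseteq \tilde{M}_{\mathbb{Z}}(\Lambda_{V^\nu})$.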

Let $\mathrm{Irr}\,\Lambda_V$ denote the set of irreducible components
of $\Lambda_V$. The following proposition was conjectured
by Lusztig and proved by him in the affine
(and finite) case. The general case was proved by
Kashiwara and Saito.


Proposition 2 For any $\nu \in (\mathbb{Z}_{\geq 0})^I$, we have
$\dim U^+_\nu = \#\,\mathrm{Irr}\,\Lambda_{V^\nu}$.


We then have the following important result due 
to Lusztig. 


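Theorem 4 Let $\nu \in (\mathbb{Z}_{\geq 0})^I$. Then,


(i) For any $Z \in \mathrm{Irr}\,\Lambda_{V^\nu}$, there exists a unique
$f_Z \in \kappa_\nu(U^+_{\mathbb{Z},\nu})$ such that $f_Z$ is equal to 1 on an
open dense subset of Z and equal to zero on an
open dense subset of $Z' \in \mathrm{Irr}\,\Lambda_{V^\nu}$ for all $Z' \neq Z$.

(ii) $\{f_Z \mid Z \in \mathrm{Irr}\,\Lambda_{V^\nu}\}$ is a $\mathbb{Q}$-basis of $\kappa_\nu(U^+_\nu)$.

(iii) $\kappa_\nu: U^+_\nu \to \kappa_\nu(U^+_\nu)$ is an isomorphism.

(iv) Define $[Z] \in U^+_\nu$ by $\kappa_\nu([Z]) = f_Z$. Then $B_\nu =
\{[Z] \mid Z \in \mathrm{Irr}\,\Lambda_{V^\nu}\}$ is a $\mathbb{Q}$-basis of $U^+_\nu$.

(v) $\kappa_\nu(U^+_{\mathbb{Z},\nu}) = \kappa_\nu(U^+_\nu) \cap \tilde{M}_{\mathbb{Z}}(\Lambda_{V^\nu})$.

(vi) $B_\nu$ is a $\mathbb{Z}$-basis of $U^+_{\mathbb{Z},\nu}$.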


From this theorem, we see that $B = \bigsqcup_\nu B_\nu$ is a
$\mathbb{Q}$-basis of $U^+$, which is called the “semicanonical
basis.” This basis has many remarkable properties.
One of these properties is as follows. Via the algebra
involution of the entire universal enveloping algebra
$U$ of $\mathfrak{g}$ given on the Chevalley generators by
$e_i \mapsto f_i$, $f_i \mapsto e_i$ and $h \mapsto -h$ for h in the Cartan
subalgebra of $\mathfrak{g}$, one obtains from the results of
this section a semicanonical basis of $U^-$, the lower
half of the universal enveloping algebra of $\mathfrak{g}$. For any
irreducible highest-weight integrable representation
V of U (or, equivalently, $\mathfrak{g}$), let $v \in V$ be a nonzero
highest-weight vector. Then the set

$$\{bv \mid b \in B,\ bv \neq 0\}$$

is a $\mathbb{Q}$-basis of V, called the semicanonical basis of
V. Thus, the semicanonical basis of $U^-$ is simultaneously
compatible with all irreducible highest-weight
integrable modules. There is also a way to
define the semicanonical basis of a representation 
directly in a geometric way. This is the subject of the 
next subsection. 

One can also obtain a geometric realization of the 
upper part $U^+$ of the quantum group in a similar
manner using perverse sheaves instead of construc- 
tible functions. This construction yields the canoni- 
cal basis of the associated quantum group (a 
q-deformation of the universal enveloping algebra) 
which also has many remarkable properties and is 
closely related to the theory of crystal bases. 
Nakajima’s Quiver Varieties


We introduce here a description of the quiver varieties 
first presented by Nakajima. They yield a geometric 
realization of the irreducible highest-weight represen- 
tations of simply-laced Kac-Moody algebras. The
construction was motivated by the work of Kronhei- 
mer and Nakajima on solutions to the anti-self-dual 
Yang-Mills equations on ALE gravitational instantons 
(see Instantons: Topological Aspects). 


Definition 3 For $v, w \in (\mathbb{Z}_{\geq 0})^I$, choose I-graded vector
spaces V and W of graded dimensions v and w,
respectively. Then define

$$\Lambda \equiv \Lambda(v,w) = \Lambda_V \times \bigoplus_{i \in I} \mathrm{Hom}(V_i, W_i)$$


Definition 4 Let $\Lambda^{st} = \Lambda(v,w)^{st}$ be the set of all
$(x,t) \in \Lambda(v,w)$ satisfying the following condition: if
$S = (S_i)$ with $S_i \subset V_i$ is x-stable and $t_i(S_i) = 0$ for
$i \in I$, then $S_i = 0$ for $i \in I$.


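The group $G_V$ acts on $\Lambda(v, w)$ via

$$(g, (x, t)) \mapsto \big( (g_{h(\rho)}\, x_\rho\, g_{t(\rho)}^{-1}),\ (t_i\, g_i^{-1}) \big)$$

and the stabilizer of any point of $\Lambda(v,w)^{st}$ in $G_V$ is
trivial. We then make the following definition.


Definition 5 Let $\mathcal{L} \equiv \mathcal{L}(v, w) = \Lambda(v, w)^{st} / G_V$.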


We should note that while the above definition
and other constructions in this article are algebraic,
there are also more geometric ways of looking at
quiver varieties. In particular, the space

$$M(v, w) = \bigoplus_{\rho \in \bar{Q}_1} \mathrm{Hom}(V_{t(\rho)}, V_{h(\rho)}) \oplus \bigoplus_{i \in I} \big( \mathrm{Hom}(W_i, V_i) \oplus \mathrm{Hom}(V_i, W_i) \big)$$

has a natural hyper-Kähler metric and one can
consider a hyper-Kähler quotient by the group
$\prod_i U(V_i)$. The variety $\mathcal{L}(v,w)$ is a Lagrangian
subvariety of (and is homotopic to) this hyper-Kähler
quotient. In the case $\mathfrak{g} = \mathfrak{sl}_n$, the varieties
involved are closely related to flag varieties.

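Let $w, v, v', v'' \in (\mathbb{Z}_{\geq 0})^I$ be such that $v = v' + v''$.
Consider the maps

$$\Lambda(v'',0) \times \Lambda(v',w) \xleftarrow{\;p_1\;} \tilde{F}(v,w;v'') \xrightarrow{\;p_2\;} F(v,w;v'') \xrightarrow{\;p_3\;} \Lambda(v,w) \qquad [2]$$

where the notation is as follows. A point of
$F(v,w;v'')$ is a point $(x,t) \in \Lambda(v,w)$ together with
an I-graded, x-stable subspace S of V such that
$\dim S = v' = v - v''$. A point of $\tilde{F}(v,w;v'')$ is a point
$(x,t,S)$ of $F(v,w;v'')$ together with a collection of
isomorphisms $R'_i: V'_i \cong S_i$ and $R''_i: V''_i \cong V_i/S_i$ for
each $i \in I$. Then we define $p_2(x, t, S, R', R'') = (x,t,S)$,
$p_3(x, t, S) = (x,t)$ and $p_1(x, t, S, R', R'') = (x'', x', t')$
where $x''$, $x'$, $t'$ are determined by

$$R'_{h(\rho)}\, x'_\rho = x_\rho\, R'_{t(\rho)} : V'_{t(\rho)} \to S_{h(\rho)}$$

$$t'_i = t_i\, R'_i : V'_i \to W_i$$

$$R''_{h(\rho)}\, x''_\rho = x_\rho\, R''_{t(\rho)} : V''_{t(\rho)} \to V_{h(\rho)} / S_{h(\rho)}$$

It follows that $x'$ and $x''$ are nilpotent.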


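Lemma 1 One has

$$(p_3 \circ p_2)^{-1}(\Lambda(v, w)^{st}) \subset p_1^{-1}(\Lambda(v'', 0) \times \Lambda(v', w)^{st})$$


Thus, we can restrict [2] to $\Lambda^{st}$, forget the $\Lambda(v'', 0)$-factor
and consider the quotient by $G_V$ and $G_{V'}$.
This yields the diagram

$$\mathcal{L}(v',w) \xleftarrow{\;\pi_1\;} \mathcal{F}(v,w;v - v') \xrightarrow{\;\pi_2\;} \mathcal{L}(v,w) \qquad [3]$$

where

$$\mathcal{F}(v,w;v-v') := \{ (x,t,S) \in F(v,w;v-v') \mid (x,t) \in \Lambda(v,w)^{st} \} / G_V$$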


Let $M(\mathcal{L}(v,w))$ be the vector space of all
constructible functions on $\mathcal{L}(v,w)$. Then define
maps

$$h_i : M(\mathcal{L}(v,w)) \to M(\mathcal{L}(v,w))$$
$$e_i : M(\mathcal{L}(v,w)) \to M(\mathcal{L}(v - e^i, w))$$
$$f_i : M(\mathcal{L}(v - e^i, w)) \to M(\mathcal{L}(v,w))$$

by

$$h_i f = u_i f, \qquad e_i f = (\pi_1)_!(\pi_2^* f), \qquad f_i g = (\pi_2)_!(\pi_1^* g)$$

Here

$$u = (u_0, \ldots, u_n) = w - Cv$$

where C is the Cartan matrix of $\mathfrak{g}$ and we are using
diagram [3] with $v' = v - e^i$ where $e^i$ is the vector
whose components are given by $e^i_j = \delta_{ij}$.

Now let $\alpha$ be the constant function on $\mathcal{L}(0, w)$
with value 1. Let $L(w)$ be the vector space of
functions generated by acting on $\alpha$ with all possible
combinations of the operators $f_i$. Then let
$L(v,w) = M(\mathcal{L}(v, w)) \cap L(w)$.


Proposition 3 The operators $e_i$, $f_i$, $h_i$ on $L(w)$ provide
it with the structure of the irreducible highest-weight
integrable representation of $\mathfrak{g}$ with highest weight
$\sum_{i \in I} w_i \omega_i$. Each summand of the decomposition
$L(w) = \bigoplus_v L(v,w)$ is a weight space with weight
$\sum_{i \in I} (w_i \omega_i - v_i \alpha_i)$. Here the $\omega_i$ and $\alpha_i$ are the
fundamental weights and simple roots of $\mathfrak{g}$, respectively.


Let $Z \in \mathrm{Irr}\,\mathcal{L}(v,w)$ and define a linear map
$T_Z: L(v,w) \to \mathbb{C}$ that associates to a constructible
function $f \in L(v,w)$ the (constant) value of f on a
suitable open dense subset of Z. The fact that
$L(v,w)$ is finite dimensional allows us to take such
an open set on which any $f \in L(v,w)$ is constant. So
we have a linear map

$$\Phi: L(v, w) \to \mathbb{C}^{\mathrm{Irr}\,\mathcal{L}(v,w)}$$


Then we have the following proposition. 


Proposition 4 The map $\Phi$ is an isomorphism; for
any $Z \in \mathrm{Irr}\,\mathcal{L}(v,w)$, there is a unique function
$g_Z \in L(v,w)$ such that for some open dense subset
O of Z we have $g_Z|_O = 1$ and for some closed
$G_V$-invariant subset $K \subset \mathcal{L}(v,w)$ of dimension <
$\dim \mathcal{L}(v,w)$ we have $g_Z = 0$ outside $Z \cup K$. The
functions $g_Z$ for $Z \in \mathrm{Irr}\,\mathcal{L}(v,w)$ form a basis of
$L(v, w)$.


Additional Topics 


To conclude, we have given here a brief overview
of some topics related to finite-dimensional
algebras and quivers. There is much more to be
found in the literature. For basics on associative
algebras and their representations, the reader
may consult introductory texts on abstract algebra
such as Lang (2002). For further results (and
their proofs) on Ringel-Hall algebras see the
papers of Ringel (1990a, b, 1993, 1995, 1996)
and of Green (1995) and the references cited
therein. The reader interested in species, which
extend many of these results to non-simply-laced
Lie algebras, should consult Dlab and Ringel
(1976).

The book by Lusztig (1993) covers the quiver 
varieties of Lusztig and canonical bases. Canonical 
bases are closely related to crystal bases and crystal 
graphs (see Hong and Kang (2002) for an overview 
of these topics). In fact, the set of irreducible 
components of the quiver varieties of Lusztig and 
Nakajima can be endowed with the structure of a 
crystal graph in a purely geometric way (see 
Kashiwara and Saito (1997) and Saito (2002)). 
Many results on Nakajima’s quiver varieties can be 
found in the original papers (Nakajima 1994, 
1998). The overview article (Nakajima 1996) is 
also useful. 




Quiver varieties can also be used to give geometric 
realizations of tensor products of representations 
(see Malkin (2002, 2003), Nakajima (2001), and 
Savage (2003)) and finite-dimensional representa- 
tions of quantum affine Lie algebras (see Nakajima 
(2001)). These are just a select few of the many
applications of quiver varieties. Much more can be
found in the literature.
Introduction


It is a commonplace situation that symmetric laws 
of Nature give rise to physical states which are not 
symmetric. States related by symmetry operations 
are equivalent, but still nature selects one of them. 

As an example, consider a ferromagnetic system 
of interacting spins with no external magnetic field. 
The “up” and “down” states are equivalent, but one 
of the two is chosen: the interaction makes states 
with agreeing spin orientation (and therefore macro- 
scopic magnetization) energetically preferred, and 
fluctuations will decide which state is actually 
chosen by a given sample. 

Finite group symmetry is also commonplace in 
physics, in particular through crystallographic 
groups occurring in condensed matter physics — but 
also through the inversions (C, P, T and their combi- 
nations) occurring in high-energy physics and field 
theory. 

The breaking of finite group symmetry has thus 
been thoroughly studied, and general approaches 
exist to investigate it in mathematically precise 
terms with physical counterparts. In particular, 
a widely applicable approach is provided by the 
Landau theory of phase transitions — whose 
mathematical counterpart resides in the realm 
of equivariant singularity and bifurcation theory. 
In Landau theory, the state of a system is 
described by a finite-dimensional variable (the 
“order parameter”), and physical states corre- 
spond to minima of a potential, invariant under a 
group. 

In this article we describe the basics of
symmetry breaking analysis for systems described
by a symmetric polynomial; in particular, we
discuss generic symmetry breakings, that is, those
determined by the symmetry properties themselves
and independent of the details of the
polynomial describing a concrete system. We
also discuss how the plethora of invariant polynomials
can be to some extent reduced by means
of changes of coordinates, that is, how one can
restrict attention to certain types of polynomials
with no loss of generality. Finally, we will give
some indications on extensions of this theory, that
is, on how one deals with symmetry breakings for
more general groups and/or more general physical
systems.


Basic Notions 
Finite Groups 


A finite group $(G, \circ)$ is a finite set G of elements
$\{g_0, \ldots, g_N\}$ equipped with a composition law $\circ$, and
such that the following conditions hold:


1. for all $g, h \in G$ the composition $g \circ h$ belongs to
G, that is, $g \circ h \in G$;

2. the composition is associative, that is, $(g \circ h) \circ k
= g \circ (h \circ k)$ for all $g, h, k \in G$;

3. there is an element in G, which we will denote
as e, which is the identity for the action of $\circ$ on
G, that is, $e \circ g = g = g \circ e$ for all $g \in G$; and

4. for each $g \in G$ there is an element $g^{-1}$ which is
the inverse of g, that is, $g^{-1} \circ g = e = g \circ g^{-1}$.


In the following, we omit the symbol $\circ$, that is, we
write gh to mean $g \circ h$. Similarly, we usually write
simply G for the group, rather than $(G, \circ)$.

A subset $H \subseteq G$ is a subgroup of $(G, \circ)$
if $(H, \circ)$ satisfies the group axioms (1)-(4) above.
Note that this implies that $e \in H$ whenever H is a
subgroup, and {e} is a subgroup. Subgroups not
coinciding with the whole G and with {e} are said to
be “proper.”


Given two elements g, h we say that $ghg^{-1}$ is the
conjugate of h by g. The conjugate of a subgroup
$H \subseteq G$ by $g \in G$ is the subgroup of elements
conjugated to elements of H, $gHg^{-1} = \{ghg^{-1},\ h \in H\}$.
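
The axioms and the notion of conjugate subgroup can be checked by brute force on a small example. The sketch below is only an illustration: the choice of the permutation group $S_3$, the encoding of elements as tuples, and all function names are assumptions and not part of the text.

```python
from itertools import permutations

# Elements of S_3 encoded as tuples: g[i] is the image of i under g.
G = list(permutations(range(3)))
e = (0, 1, 2)

def compose(g, h):
    """Composition law: (g o h)(i) = g(h(i))."""
    return tuple(g[h[i]] for i in range(len(h)))

def inverse(g):
    """Inverse permutation of g."""
    inv = [0] * len(g)
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def is_group(S):
    """Brute-force check of the group axioms (1)-(4) on the set S."""
    S = set(S)
    closed = all(compose(g, h) in S for g in S for h in S)
    assoc = all(compose(compose(g, h), k) == compose(g, compose(h, k))
                for g in S for h in S for k in S)
    has_identity = e in S
    has_inverses = all(inverse(g) in S for g in S)
    return closed and assoc and has_identity and has_inverses

def conjugate_subgroup(g, H):
    """Return g H g^{-1}."""
    return {compose(compose(g, h), inverse(g)) for h in H}

H = {e, (1, 0, 2)}   # proper subgroup generated by the transposition (0 1)
g = (0, 2, 1)        # the transposition (1 2)

print(is_group(G), is_group(H))
print(conjugate_subgroup(g, H))   # the subgroup generated by (0 2)
```

Conjugating the subgroup generated by the transposition (0 1) by the transposition (1 2) gives the subgroup generated by (0 2), which is again a subgroup but a different one.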


Group Action 


In physics, one is usually interested in a realization
of an abstract group as a group of transformations
in some set X; in physical applications, this is
usually a (possibly, function) space or a manifold,
and we refer to elements of X as “points.” That is,
there is a map $\rho: G \to \mathrm{End}(X)$ from G to the group
of endomorphisms of X, such that the
composition law is preserved:

$$\rho(g) \cdot \rho(h) = \rho(g \circ h) \qquad \forall g, h \in G$$


In this case, we say that we have a “representation”
of the abstract group G acting in the “carrier” space
or manifold X; we also say that X is a G-space or
G-manifold. We often denote by the same letter the
abstract element and its representation, that is, write
simply g for $\rho(g)$ and G for $\rho(G)$. (In many
physically relevant cases, but not necessarily, X has
a linear structure and we consider linear endomorphisms.
In this case, we sometimes write $T_g$ for
the linear operator representing g.)

If $x \in X$ is a point in X, the G-orbit G(x) is the set
of points to which x is mapped under G, that is,

$$G(x) = \{ y \in X : y = gx,\ g \in G \} \subseteq X$$

Belonging to the same orbit is obviously an
equivalence relation, and partitions X into equivalence
classes. The “orbit space” for the G action on
X, also denoted as $\Omega = X/G$, is the set of these
equivalence classes. It corresponds, in physical
terms, to considering X modulo identification of
elements related by the group action.

For any point $x \in X$, the “isotropy (sub)group”
$G_x$ is the set of elements leaving x fixed,

$$G_x = \{ g \in G : gx = x \} \subseteq G$$

Points on the same G-orbit have conjugated isotropy
subgroups: indeed, $y = gx$ implies immediately that
$G_y = g G_x g^{-1}$.
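
Orbits and isotropy subgroups are easy to compute explicitly for a small matrix group. As a minimal sketch, we assume the group $D_4$ of the eight signed permutation matrices acting on integer points of the plane; this example and the function names are ours, chosen so that all arithmetic is exact.

```python
# The eight signed 2x2 permutation matrices: the symmetry group D4 of the square.
D4 = ([((a, 0), (0, d)) for a in (1, -1) for d in (1, -1)] +
      [((0, b), (c, 0)) for b in (1, -1) for c in (1, -1)])

def mul(g, h):
    """Matrix product g h."""
    return tuple(tuple(sum(g[i][k] * h[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def transpose(g):
    return tuple(tuple(g[j][i] for j in range(2)) for i in range(2))

def act(g, x):
    """Linear action of the matrix g on the point x."""
    return tuple(sum(g[i][k] * x[k] for k in range(2)) for i in range(2))

def orbit(x):
    """G-orbit G(x)."""
    return {act(g, x) for g in D4}

def isotropy(x):
    """Isotropy subgroup G_x."""
    return {g for g in D4 if act(g, x) == x}

def conjugate(g, H):
    """g H g^{-1}; for orthogonal matrices g^{-1} is the transpose."""
    return {mul(mul(g, h), transpose(g)) for h in H}

for x in [(0, 0), (1, 0), (1, 1), (1, 2)]:
    print(x, "orbit size:", len(orbit(x)), "isotropy order:", len(isotropy(x)))
```

In this example the origin is the only symmetric point, points on the axes and on the diagonals have isotropy of order 2, and a generic point has trivial isotropy; in each case the orbit size times the isotropy order equals the order of the group.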

When a topology is defined on X, the question
arises whether the G-action preserves it; if this is the case,
we say that the G-action is “regular.” In the case of
a compact Lie group (and a fortiori for a finite
group) we are guaranteed the action is regular.
(A physically relevant example of nonregular action
is provided by the irrational flow on a torus. In this
case $G = \mathbb{R}$, realized as the time-t irrational flow on
the torus $X = T^2$.)
Spontaneous Symmetry Breaking


Let us now consider the case of physical systems
whose state is described by a point x in the G-space
or G-manifold X, with G a group acting by smooth
mappings $g: X \to X$. In physical problems, G quite
often acts by linear and orthogonal transformations.
(If this is not the case, the Palais-Mostow theorem
guarantees that, for suitable groups (including in
particular the finite ones) we can reduce to this case
upon embedding X into a suitably larger carrier
space Y.)

Usually, G represents physical equivalence of
states, and G-orbits are collections of physically
equivalent states. A point which is G-invariant, that
is, such that $G_x = G$, is called “symmetric” for short.

Let $\Phi$ be a scalar function (potential) defined on
X, $\Phi: X \to \mathbb{R}$, possibly depending on some parameter
$\mu$, such that the physical state corresponds to
critical points (usually the local minima) of $\Phi$.

A concrete example is provided by the case
where $\Phi$ is the Gibbs free energy; more generally,
this is the framework met in the Landau theory of
phase transitions (Landau 1937, Landau and
Lifshitz 1958).

We are interested in the case where $\Phi$ is invariant
under the group action, or briefly G-invariant, that
is, where

$$\Phi(gx) = \Phi(x) \qquad \forall x \in X,\ \forall g \in G \qquad [1]$$

A critical point x such that $G_x = G$ is a symmetrical
critical point. If $G_x$ is strictly smaller than G,
then x is a symmetry-breaking critical point.

If a physical system corresponds to a nonsymmetric
critical point, we have a spontaneous
symmetry breaking: albeit the physical laws (the
potential function $\Phi$) are symmetric, the physical
state (the critical point for $\Phi$) breaks the symmetry
and chooses one of the G-equivalent critical points.

It follows from [1] that the gradient of $\Phi$ is
covariant under G. If $y = g(x)$, then the differential
(Dg) of the map $g: X \to X$ is a linear map between
the corresponding tangent spaces, $(Dg): T_xX \to T_yX$.
The covariance amounts, with $\eta$ the Riemannian
metric in X, to $(\eta^{ij}\partial_j\Phi)(gx) = [(Dg)^i_k\, \eta^{km}\, \partial_m\Phi](x)$; this
is also written compactly, with obvious notation, as

$$(\nabla\Phi)(gx) = (Dg)\,[(\nabla\Phi)(x)] \qquad [2]$$

(in the case of euclidean spaces ($\eta = \delta$) and linear
actions described by matrices $T_g$, the covariance
condition reduces to $(\nabla\Phi)^i(T_g x) = (T_g)^i_j\,[(\nabla\Phi)^j(x)]$).
As (Dg) is a linear map, $(\nabla\Phi)(x) = 0$ implies the
vanishing of $\nabla\Phi$ at all points on the G-orbit of x.
We conclude that critical points of a G-invariant
potential come in G-orbits: if x is a critical point for
$\Phi$, then each $y \in G(x)$ is also a critical point for $\Phi$.
We speak therefore of critical orbits for $\Phi$.
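
The covariance of the gradient and the grouping of critical points into orbits can be verified on a toy potential. The sketch below assumes $X = \mathbb{R}^2$ with the euclidean metric, $G = D_4$ acting by signed permutation matrices, and the invariant polynomial $\Phi(x,y) = (x^4 + y^4)/4 - (x^2 + y^2)/2$; these choices are illustrative assumptions, made so that all critical points have integer coordinates.

```python
# D4 as the eight signed 2x2 permutation matrices (assumed example group).
D4 = ([((a, 0), (0, d)) for a in (1, -1) for d in (1, -1)] +
      [((0, b), (c, 0)) for b in (1, -1) for c in (1, -1)])

def act(g, x):
    """Linear action of the matrix g on the vector x."""
    return tuple(sum(g[i][k] * x[k] for k in range(2)) for i in range(2))

def grad_phi(p):
    """Gradient of Phi(x, y) = (x^4 + y^4)/4 - (x^2 + y^2)/2."""
    x, y = p
    return (x**3 - x, y**3 - y)

grid = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]

# Covariance [2] in the linear euclidean case: grad Phi(g x) = g grad Phi(x).
covariant = all(grad_phi(act(g, p)) == act(g, grad_phi(p))
                for g in D4 for p in grid)

# All real critical points have coordinates in {0, 1, -1}, so the grid finds them all.
critical = [p for p in grid if grad_phi(p) == (0, 0)]
critical_orbits = {frozenset(act(g, p) for g in D4) for p in critical}

print(covariant)
for o in sorted(critical_orbits, key=len):
    print(sorted(o))
```

The nine critical points split into three critical orbits: the symmetric point at the origin, the four points on the axes, and the four points on the diagonals.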

It is thus possible (thanks to the regularity of the
G-action), and actually convenient, to study spontaneous
symmetry breaking in the orbit space
$\Omega = X/G$ rather than in the carrier manifold X
(Michel 1971).

If G describes physical equivalence, physical states
whose symmetries are G-conjugated should be seen
as physically equivalent. An equivalence class of
isotropy types under conjugation will be said to be a
symmetry type. We are thus interested, given a
G-invariant polynomial $\Phi$, in knowing the symmetry
types of its critical points. We denote symmetry
types as $[H] = \{gHg^{-1}\}$, and say that $[H] < [K]$ if a
group conjugated to H is strictly contained in
a group conjugated to K.

As we have seen, points on the same G-orbit have
the same symmetry type. On the other hand, points
on different G-orbits can have the same isotropy
type (e.g., for the standard action of O(n) in $\mathbb{R}^n$, all
collinear nonzero points will have the same isotropy
subgroup but will lie on distinct group orbits).


G-Invariant Polynomials 


Consider a finite group G acting in X. (Many of the
notions and results mentioned in this section have a
much wider range of applicability.) We look at the
ring of G-invariant scalar polynomials in $x^1, \ldots, x^n$.

By the Hilbert basis theorem, there is a set
$\{J_1(x), \ldots, J_k(x)\}$ of G-invariant homogeneous polynomials
of degrees $\{d_1, \ldots, d_k\}$ such that any
G-invariant polynomial $\Phi(x)$ can be written as a
polynomial in the $\{J_1, \ldots, J_k\}$, that is,

$$\Phi(x) = \Psi[J_1(x), \ldots, J_k(x)] \qquad [3]$$

with $\Psi$ a polynomial. (A similar theorem holds for
smooth functions.)

The algebra of G-invariant polynomials is finitely
generated, that is, we can choose k finite. When the
$J_a$ are chosen so that none of them can be written as
a polynomial of the others and k has the smallest
possible value (this value depends on G), we say that
they are a minimal integrity basis (MIB). (Note that
some of the $J_a$ could be written as nonpolynomial
functions of the others, and the $J_a$ could satisfy
polynomial relations. For example, consider the
group $\mathbb{Z}_2$ acting in $\mathbb{R}^2$ via $g: (x,y) \to (-x, -y)$; an
MIB is made of $J_1(x,y) = x^2$, $J_2(x,y) = y^2$, and
$J_3(x,y) = xy$. None of these can be written as a
polynomial function of the others, but $J_1 J_2 = J_3^2$.) In
this case, we say that the $\{J_a\}$ are a set of basic
invariants for G. There is obviously some arbitrariness
in the choice of the $J_a$ in an MIB, but the
degrees $\{d_1, \ldots, d_k\}$ of $\{J_1, \ldots, J_k\}$ are fixed by G. (In
mathematical terms, they are determined through
the Poincaré series of the graded algebra $P_G$ of
G-invariant polynomials.)
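
The $\mathbb{Z}_2$ example can be checked directly. In the sketch below the invariant polynomial $\Phi(x,y) = x^4 + 3xy^3 - x^2$ and its expression $\Psi(J_1,J_2,J_3) = J_1^2 + 3J_2J_3 - J_1$ are hypothetical choices made for illustration; the checks are done on a grid of integer points, where the arithmetic is exact.

```python
def g(p):
    """The nontrivial element of Z2 acting as (x, y) -> (-x, -y)."""
    return (-p[0], -p[1])

def J(p):
    """Basic invariants J1 = x^2, J2 = y^2, J3 = x y."""
    x, y = p
    return (x * x, y * y, x * y)

def phi(p):
    """An illustrative Z2-invariant polynomial (every monomial has even degree)."""
    x, y = p
    return x**4 + 3 * x * y**3 - x**2

def psi(j):
    """The same polynomial written in terms of the basic invariants, as in [3]."""
    j1, j2, j3 = j
    return j1**2 + 3 * j2 * j3 - j1

points = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]

invariant = all(J(g(p)) == J(p) for p in points)
relation = all(J(p)[0] * J(p)[1] == J(p)[2] ** 2 for p in points)   # J1 J2 = J3^2
factorizes = all(phi(p) == psi(J(p)) for p in points)

print(invariant, relation, factorizes)
```

Because of the relation $J_1 J_2 = J_3^2$, the polynomial $\Psi$ representing a given $\Phi$ is not unique in this example.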

We will henceforth assume that we have chosen
an MIB, with elements $\{J_1, \ldots, J_k\}$ of degrees
$\{d_1, \ldots, d_k\}$ in x, say with $d_1 \leq d_2 \leq \cdots \leq d_k$.

When the elements of an MIB for G are
algebraically independent, we say that the MIB is
regular; if G admits a regular MIB we say that G is
coregular.

An algebraic relation between elements $J_a$ of the
MIB is said to be a relation of the first kind. The
algebraic relations among the $J_a$ are a set of
polynomials in $\{J_1, \ldots, J_k\}$, which are identically
zero when seen as polynomials in x. If there are
algebraic relations among these, they are called
relations of the second kind, and so on. A theorem
by Hilbert guarantees that the chain of relations has
finite maximal length. (This is the homological
dimension of the graded algebra $P_G$ mentioned
above.)

In the following, we will consider a matrix built
with the gradients of basic invariants, the P-matrix
(Sartori). This is defined as

$$P_{ih}(x) = \langle \nabla J_i(x), \nabla J_h(x) \rangle \qquad [4]$$

with $\langle \cdot , \cdot \rangle$ the scalar product in $T_xX$.

The gradient of an invariant is necessarily a
covariant quantity; the scalar product of two
covariant quantities is an invariant one, and thus
can be expressed again in terms of the basic
invariants. Thus, the P-matrix can always be written
in terms of the basic invariants themselves.
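
For the $\mathbb{Z}_2$ action $(x,y) \to (-x,-y)$ on $\mathbb{R}^2$ with basic invariants $J_1 = x^2$, $J_2 = y^2$, $J_3 = xy$, the P-matrix can be worked out by hand. The sketch below (our own worked example, with the euclidean scalar product assumed) compares the definition [4] with the resulting expression in terms of the invariants.

```python
def J(p):
    """Basic invariants J1 = x^2, J2 = y^2, J3 = x y."""
    x, y = p
    return (x * x, y * y, x * y)

def grad_J(p):
    """Gradients of J1, J2, J3 at the point p."""
    x, y = p
    return [(2 * x, 0), (0, 2 * y), (y, x)]

def dot(u, v):
    """Euclidean scalar product in the plane."""
    return u[0] * v[0] + u[1] * v[1]

def P_from_gradients(p):
    """P-matrix computed from definition [4]."""
    grads = grad_J(p)
    return [[dot(a, b) for b in grads] for a in grads]

def P_from_invariants(j):
    """The same matrix written in terms of the basic invariants only."""
    j1, j2, j3 = j
    return [[4 * j1, 0,      2 * j3],
            [0,      4 * j2, 2 * j3],
            [2 * j3, 2 * j3, j1 + j2]]

points = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
agree = all(P_from_gradients(p) == P_from_invariants(J(p)) for p in points)
print(agree)
```

Every entry of the matrix is a polynomial in $J_1, J_2, J_3$, as stated above for a general G.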


Geometry of Group Action 


The use of an MIB allows one to introduce a map $J: x \to
\{J_1(x), \ldots, J_k(x)\}$ from X to a subset P of $\mathbb{R}^k$. If
the MIB is regular, $P = \mathbb{R}^k$, while if the $J_i$ satisfy
some relation then $P \subset \mathbb{R}^k$ is the submanifold
satisfying the corresponding relations. The manifold
P is isomorphic to the orbit space $\Omega = X/G$ (the
isomorphism being realized by the J map) and
provides a more convenient framework to study $\Omega$.

As mentioned above, in physical terms we are
mainly interested in the orbit space up to equivalence
of symmetry type. The set of points in X (of
orbits in $\Omega$) with the same symmetry type will be
called a G-stratum in X (a G-stratum in $\Omega$); the
G-stratum of the point x will be denoted as $\sigma(x) \subset X$
(the G-stratum of the orbit $\omega$ as $\Sigma(\omega) \subset \Omega$). (The
notion of stratum was introduced by Whitney in
topology; a stratified manifold is a set which can
be decomposed as the disjoint union of smooth
manifolds of different dimensions, the topological
(or Whitney) strata: $M = \bigcup_k M^k$, with $M^k \subseteq \partial M^j$ for
all $k < j$.)

It results that the G-stratification is compatible
with the topological stratification. Indeed, P is a
semialgebraic (i.e., it is defined by algebraic equalities
and inequalities) stratified manifold in $\mathbb{R}^k$; the
image of any G-stratum in $\Omega$ belongs to a single
topological stratum in P, and topological strata in P
are the union of images of G-strata in $\Omega$.

Moreover, the subgroup relations correspond to
bordering relations between G-strata: if $[G_x] <
[G_y]$, then $\sigma(y) \subset \partial\sigma(x)$ and (with $\omega_x$ the orbit of x)
$\Sigma(\omega_y) \subset \partial\Sigma(\omega_x)$.

There is a stratum, called the principal stratum $\sigma_0$,
which corresponds to minimal isotropy, open and
dense in X; similarly, the principal stratum $\Sigma_0$ is
open and dense in $\Omega$.


Landau Polynomial 


In the Landau (1937) theory of phase transitions,
the state of the system under study is described by a
G-invariant polynomial $\Phi: X \to \mathbb{R}$ having a critical
point in the origin, with at least some of its
coefficients (in particular those controlling the
stability of the zero critical point) depending on
external control parameters (usually, $X = \mathbb{R}^n$ and
$G \subseteq O(n)$; in particular, in solid-state physics G is a
crystallographic group). This should be chosen as
the most general G-invariant polynomial of the
lowest degree $\ell$ sufficient to ensure thermodynamic
stability; in mathematical terms, this amounts to the
requirement that there is some open set $B$ containing
the origin and such that, for all values of the
control parameters, $\nabla\Phi$ points inwards at all points
of $\partial B$ (i.e., $B$ is invariant under the gradient flow
of $\Phi$). If the polynomials in the MIB are of degree
$d_1 \leq d_2 \leq \cdots \leq d_k$, then usually $\ell = 2d_k$.

The G-invariance of $\Phi$ and the results recalled
above mean that we can always write it in terms of
the polynomials in an MIB for G as in [3],
$\Phi(x) = \Psi[J(x)]$.

The discussion of previous sections shows that we
can study symmetry breakings for $\Phi: X \to \mathbb{R}$ by
studying critical points of $\Psi: P \to \mathbb{R}$; in other words,
Landau theory can be worked out in the G-orbit
space $\Omega := X/G$. The polynomial $\Psi$, providing
a representation of the Landau polynomial in the
orbit space, will also be called the Landau-Michel
polynomial. (Louis Michel (1923-1999) pioneered
the use of orbit space techniques in physics and
nonlinear dynamics, originally motivated by the
study of hadronic interactions.)
In this way, the evaluation of the map $\Phi: X \to \mathbb{R}$
is, in principle, substituted by evaluation of two
maps, $J: X \to P$ and $\Psi: P \to \mathbb{R}$. However, if, as in
Landau theory, we have to consider the most
general G-invariant polynomial on X, we can just
consider the most general polynomial on P.


Critical Points of the Landau Polynomial 
and Geometry of Orbit Space 


The G-invariance has consequences on the critical
points of $\Phi$. We have already seen one such
consequence: critical points come in G-orbits.

However, this is not all. Indeed, G-invariance
enforces the presence of a certain set $\chi(G) \subset X$ of
critical points, and conversely if we look for points
which are critical under any G-invariant potential,
these are precisely the points in $\chi(G)$; the critical
points on $\chi(G)$ correspond to critical orbits which
we call principal critical orbits.

The set $\chi(G)$ can be determined on the basis
of the geometry of the G-action. (A trivial
example is provided by $X = \mathbb{R}$ and $G = \mathbb{Z}_2$ acting
via $g: x \to -x$; any even function has a critical point
in zero, and albeit even functions can, and in general
will, have nonzero critical points, this is the only
critical point common to all the even functions.)
Indeed (Michel 1971): an orbit $\omega$ is a principal
critical orbit if and only if it is isolated in its
stratum.

For the linear orthogonal group actions in $\mathbb{R}^n$
often occurring in physics, no nonzero point or
orbit can be isolated in its stratum. However,
we can quotient out the radial degeneracy and
work on $X = S^{n-1} \subset \mathbb{R}^n$. In this case, a G-orbit $\omega_1$
in $S^{n-1}$ which is isolated in its stratum corresponds
to a one-dimensional family $\{\omega_r\}$ of G-orbits in
$\mathbb{R}^n$ (call $X_0$ the corresponding submanifold in X);
the gradient of $\Phi$ at $x \in X_0$ points along $T_xX_0$.
We can thus restrict attention to the restriction
$\Phi_0$ of the potential $\Phi$ to $X_0$. (See also the
reduction lemma of Golubitsky and Stewart in this
context.)

Correspondingly, if $P_0 \subset P$ is the submanifold in
P image of $X_0$, that is, $P_0 = J(X_0)$, we can restrict
attention to the restriction $\Psi_0$ of $\Psi$ to $P_0$.

As these become one-dimensional problems,
general results are available. In particular, one
can provide general conditions ensuring the
existence of one-dimensional branches of symmetry-breaking
solutions bifurcating from zero along
any such $X_0$ or $P_0$; this is also known as the
equivariant branching lemma of Cicogna and
Vanderbauwhede.
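
The simplest instance of such a branch is the pitchfork bifurcation for $X = \mathbb{R}$ and $G = \mathbb{Z}_2$ acting via $x \to -x$. The following sketch assumes the quartic Landau polynomial $\Phi(x) = -\lambda x^2/2 + x^4/4$, that is, $\Psi(J) = -\lambda J/2 + J^2/4$ with $J = x^2$; the polynomial and the function names are illustrative assumptions, and the code is not an implementation of the general lemma.

```python
import math

def psi(j, lam):
    """Landau polynomial in the orbit-space variable J = x^2."""
    return -lam * j / 2 + j * j / 4

def phi(x, lam):
    """Landau polynomial in the carrier space, Phi(x) = Psi[J(x)]."""
    return psi(x * x, lam)

def dphi(x, lam):
    """Derivative of Phi with respect to x."""
    return -lam * x + x**3

def critical_points(lam):
    """Real critical points of Phi for a given control parameter lam."""
    if lam <= 0:
        return [0.0]            # only the symmetric critical point
    r = math.sqrt(lam)
    return [-r, 0.0, r]         # the pair (-r, r) forms a single Z2-orbit

def symmetry_type(x):
    """Isotropy subgroup of x under x -> -x."""
    return "Z2" if x == 0 else "trivial"

for lam in (-1.0, 4.0):
    for x in critical_points(lam):
        print(lam, x, phi(x, lam), symmetry_type(x))
```

For $\lambda \leq 0$ only the symmetric critical point exists; for $\lambda > 0$ a symmetry-breaking critical orbit $\{\pm\sqrt{\lambda}\}$ appears, and it has lower potential than the origin.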
Reduction of the Landau Potential


In realistic problems, $\Phi$ quickly becomes extremely
complicated, that is, it includes a high number of
terms and therefore of coefficients. A thorough
study of different symmetry-breaking patterns, that
is, of the symmetry type of minima of $\Phi$ for different
values of these coefficients and of the external
control parameter, is in this case a prohibitive task.
It is possible to reduce the generality of the Landau
polynomial with no loss of generality for the
corresponding physical problem. Indeed, a change
of coordinates in the X space will produce a
formally different (but obviously equivalent)
Landau polynomial; it is convenient to use coordinates
in which the Landau polynomial is simpler.

A systematic and algorithmic reduction procedure,
based on perturbative expansion near the origin, is well
known in dynamical systems theory (Poincaré-Birkhoff
normal forms), and can be adapted to the reduction
of Landau polynomials. (An alternative and more
general, but also much more demanding, approach
is provided by the spectral sequence approach, also
originating in normal-form theory.)

We work near the origin, so that we can assume
$X = \mathbb{R}^n$ (with metric $\eta$), and for simplicity we also
take the case where G acts via a linear representation
$T_g$. We consider changes of coordinates of the
(Poincaré) form

$$x^i = y^i + h^i(y) \qquad [5]$$

generated by a G-invariant function H: $h^i(y) = \eta^{ij}\,
(\partial H(y)/\partial y^j)$; this guarantees that [5] preserves the
G-invariance of $\Phi$. The action of [5] on $\Phi$ can be read
from its action on the basic invariants $J_a$. It results

$$J_a(x) = J_a(y) + (\delta J_a)(y), \qquad \delta J_a := P_{ab}\,(\partial H/\partial J_b) \qquad [6]$$

Let us now consider the reduction of an invariant
polynomial $\Phi(x) = \Psi(J)$. We write $D_a := \partial/\partial J_a$, and
understand that summation over repeated indices is
implied. In general,

$$\Psi(J) \to \Psi(J + \delta J) = \Psi(J) + \sum_{a=1}^{k} \frac{\partial \Psi}{\partial J_a}\, \delta J_a + \cdots$$

where the ellipsis means higher-order terms.
Disregarding higher-order terms and using [6] and
[4], we get

$$\Psi \to \Psi + (D_a \Psi)\, P_{ab}\, (D_b H) \qquad [7]$$


We expand $\Phi$ as a sum of homogeneous polynomials,
and write $\Phi(x) = \sum_{k \geq 0} \Phi_k(x)$, where
$\Phi_k(ax) = a^{k+1}\Phi_k(x)$. Also, write $\Psi = \sum_k \Psi_k$, where
$\Phi_k(x) := \Psi_k[J(x)]$.

It results that under a change of coordinates [5]
generated by $H = H_m$ homogeneous of degree m + 1,
the terms $\Psi_k$ with $k < m$ are not changed, while the
terms $\Psi_{m+p}$ change according to

$$\Psi_{m+p} \to \Psi_{m+p} + (D_a \Psi_p)\, P_{ab}\, (D_b H_m) + \cdots \qquad [8]$$

We can then operate sequentially with $H_m$ of
degree 3, 4, ...; at each stage (generator $H_m$), we are
not affecting the terms $\Psi_k$ with $k < m$. Moreover,
we can just consider [8], as higher-order terms are
generic and will be taken care of in subsequent
steps. (This procedure requires determining suitable
generating functions $H_m$; these are obtained as
solutions to homological equations.)
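
The first-order formula $\delta J_a = P_{ab}\,(\partial H/\partial J_b)$ can be checked in the simplest setting. The sketch below assumes $X = \mathbb{R}$, $G = \mathbb{Z}_2$, a single invariant $J = x^2$ (so that $P = 4J$), and the hypothetical generator $H = cJ^2$; exact integer arithmetic shows that the difference between the transformed invariant and the first-order prediction is of higher order.

```python
c = 1  # coefficient of the assumed generating function H = c * J^2 = c * y^4

def J(x):
    """The single basic invariant for Z2 acting on R via x -> -x."""
    return x * x

def h(y):
    """h(y) = dH/dy for H(y) = c * y^4 (euclidean metric)."""
    return 4 * c * y**3

def delta_J(y):
    """First-order variation delta J = P * (dH/dJ), with P = <grad J, grad J> = 4 J."""
    P = 4 * J(y)
    dH_dJ = 2 * c * J(y)
    return P * dH_dJ

# Change of coordinates x = y + h(y): compare J(x) with J(y) + delta J(y).
residuals = [J(y + h(y)) - J(y) - delta_J(y) for y in range(-3, 4)]
print(residuals)
```

Here $J(x) = y^2 + 8cy^4 + 16c^2y^6$, so the first-order term $8cy^4 = 8cJ^2$ is reproduced exactly and the residual $16c^2y^6$ is the higher-order remainder.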

In the above, we disregarded the dependence on the
control parameters, such as temperature, pressure,
magnetic field, etc.; that is, we implicitly considered
fixed values for these. However, they have to change
for a phase transition to take place. If we consider a full
range of values (including in particular the critical
ones) for the control parameters, say $\lambda \in \Lambda$, we should
take care that the concerned quantities and operators
are nonsingular uniformly in $\Lambda$.

This leads to reduction criteria for the Landau and
Landau-Michel polynomials (Gufan). Define, for
$i = 1, \ldots, k$ the quantities $U_i(J_1, \ldots, J_k) := (\partial\Psi/\partial J_a)\,P_{ai}$.


Reduction Criterion 


For $\Phi(x) = \Psi(J_1, \ldots, J_k): \mathbb{R}^n \to \mathbb{R}$ a G-invariant potential
depending on physical parameters $\lambda \in \Lambda$, there is a
sequence of Poincaré changes of coordinates such that
$\Phi$ is expressed in the new coordinates y as $\hat\Phi(y) = \hat\Psi(J)$,
where terms which can be written (up to higher-order
terms) uniformly in $\Lambda$ as $\sum_{a=1}^{k} Q_a(J_1, \ldots, J_k)\,
U_a(J_1, \ldots, J_k)$, with $Q_a$ polynomials in $J_1, \ldots, J_k$
satisfying the compatibility condition $(\partial Q_a/\partial J_b) =
(\partial Q_b/\partial J_a)$, are not present in $\hat\Psi$.


Nonstationary and Nonvariational 
Problems 


So far we have considered stationary physical states.
In some cases, one is not satisfied with such a
description, and wants to study time evolution. A
model framework for this is provided by the
Ginzburg-Landau equation

$$\dot{x} = f(x) \qquad [9]$$

where $f = \eta(\nabla\Phi): X \to TX$ (see above for notation).
In this case, G-invariance of $\Phi$ implies equivariance
of [9]. More generally, we can consider [9] for an
equivariant smooth f (not necessarily a gradient),
that is, $f^i(gx) = (Dg)^i_j\, f^j(x)$.

In this case, one shows that

$$f(x) \in T_x\sigma(x) \qquad [10]$$

so that closures of G-strata are dynamically invariant,
and the dynamics can be reduced to them. This
is of special interest for the “most singular” strata,
that is, those of lower dimension. The reduction
lemma and the equivariant branching lemma mentioned
above also hold (and were originally formulated)
in this context.

The relation [10] also implies that one can project 
the dynamics [9] in X to a smooth dynamics $\dot{p} = F(p)$ 
in the orbit space; this satisfies $F[J(x)] = (DJ)[f(x)]$. 
In the gradient case, this (together with initial 
conditions) embodies the full dynamics in X, while 
in the generic case one loses all information about 
motions along group orbits (note that these corre- 
spond to phonon modes). 

An orbit $\omega$ isolated in its stratum is still an orbit of 
fixed points for any G-equivariant dynamics in X in 
the gradient case, while in the generic case it 
corresponds to a fixed point for F and to relative 
equilibria (dynamical orbits which belong to a single 
group orbit) in X. In this case, time averages of 
physical quantities can be G-invariant for nontrivial 
relative equilibria. 


Extensions and Physical Applications 


We have discussed finite group symmetry breaking 
and focused on polynomial potentials (which can be 
thought of as Taylor expansions around critical 
points). For nonfinite groups, and in particular 
noncompact ones, the situation can be considerably 
more complicated. 


1. An extension of the theory sketched here is 
provided by Palais’ theory, and in particular by 
his “symmetric criticality principle,” which 
applies in Hilbert or Banach spaces of sections 
of a fiber bundle satisfying certain conditions. 
This is especially relevant in connection with field 
theory and gauge groups. 

2. We focused on the situation discussed in classical 
physics. Finite group symmetry breaking is of 
course also relevant in quantum mechanics; 
this is discussed, for example, in the classical 
books by Weyl (1931) and Wigner (1959), and in 
the review by Michel et al. (2004). 

3. One speaks of “explicit symmetry breaking” 
when a nonsymmetric perturbation is introduced 
in a symmetric problem. In the Hamiltonian 
case (or in the Lagrangian one for Noether 
symmetries), Hamiltonian symmetries correspond 
to conserved quantities, and nonsymmetric 
perturbations make these become approximate 
constants of motion. 

4. The symmetry of differential equations — as well 
as symmetric and symmetry-breaking solutions for 
symmetric equations — can be studied in general 
mathematical terms (see, e.g., Olver (1986)). 

5. Physical applications of the theory discussed here 
abound in the literature, in particular through the 
Landau theory of phase transitions. A number of 
these, together with a deeper discussion of the 
underlying theory, is given in the monumental 
review paper by Michel et al. (2004). 


See also: Central Manifolds, Normal Forms; Compact 
Groups and Their Representations; Electroweak Theory; 
Finite Group Symmetry Breaking; Phase Transitions in 
Continuous Systems; Quasiperiodic Systems; Symmetry 
and Symmetry Breaking in Dynamical Systems; 
Symmetry Breaking in Field Theory. 
Introduction 


Finite Weyl systems have their applications in 
various branches of quantum information theory. 
They are helpful to tame the growth of complexity 
for a large class of quantum systems: a key 
discrepancy between classical and quantum systems 
is the difference in the growth of complexity as one 
goes to larger and larger systems. This is encountered 
when simulating a quantum spin system on a 
computer, for example, with the aim of determining 
the ground state of a solid-state model of magnetism. 
For a model of N classical spins, this involves 
checking the energy for $2^N$ different configurations, 
but for a model with quantum spins it requires the 
solution of an eigenvalue equation in a Hilbert space 
of dimension $2^N$, which is a vastly more difficult 
problem for large N. For a three-dimensional lattice, 
three sites each way (N=27), this is a problem in 
$10^8$ dimensions, and lattice size 4 leads to utterly 
intractable $10^{19}$ dimensions. 
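
The orders of magnitude quoted above follow from simple arithmetic. The short sketch below (the function name is an illustrative choice) evaluates the Hilbert space dimension $2^N$ for cubic lattices of side 3 and 4.

```python
def hilbert_space_dimension(num_spins, local_dim=2):
    """Dimension of the Hilbert space of num_spins systems with local_dim levels each."""
    return local_dim ** num_spins

for side in (3, 4):
    n = side ** 3  # number of sites of a cubic lattice
    print(side, n, hilbert_space_dimension(n))
```

For side 3 this gives 134217728 (about $10^8$), and for side 4 it gives 18446744073709551616 (about $10^{19}$).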

It is therefore highly desirable to find ways of 
treating at least some aspects of large, complex 
quantum systems without actually having to write 
out state vectors component by component. States 
which are invariant under a suitable discrete abelian 
symmetry group satisfy this condition. They can be 
characterized by simple combinatorial data, which 
do not grow exponentially with the system size N. 
At the same time, the class of these so-called 
stabilizer states is sufficiently complex to capture 
some of the key features needed for computation, 
especially the quantum correlation (entanglement) 
between subsystems. They have also been shown to 
be sufficient to generate large quantum error 
correcting codes. 

A further motivation for finite Weyl systems is 
directly based on constructing quantum error cor- 
recting codes from classical coding procedures (see 
Quantum Error Correction and Fault Tolerance). 
The “quantization” technique which is used there 
naturally leads to the structure of finite Weyl 
systems. 

Finite Weyl systems precisely represent quantum 
versions of discrete abelian symmetry groups. It is a 
standard procedure to build the quantum version of 
a symmetry group by an appropriate central exten- 
sion, or equivalently, to study all its projective 


representations: the composition of two symmetry 
transformations is only preserved up to a phase on 
the representation Hilbert space. The unitary opera- 
tors which represent the symmetry transformations 
are called Weyl operators. 

The simplest and most prominent example for a 
finite Weyl system is given by the three Pauli 
matrices and the identity. These four unitary 
operators build a projective representation of the 
symmetry group of binary vectors (0,0), (0,1), 
(1,0), (1,1), where the group law is the addition 
modulo two. The null-vector (0,0) corresponds to 
the identity, the vector (0,1) is assigned to X, (1,0) 
corresponds to Z, and (1,1) is mapped to iY. It is 
not difficult to verify that the product of two Pauli 
operators preserves the addition of binary vectors up 
to a phase. 
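
This verification can be sketched in a few lines. The code below uses the assignment quoted above, (0,1) to X, (1,0) to Z, (1,1) to iY, and checks that every product of two operators equals a phase times the operator of the sum vector, and that exchanging two operators produces the sign $(-1)^{pq' - qp'}$. Matrix helpers and names are illustrative; only the assignment itself is taken from the text.

```python
from itertools import product

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def scale(c, a):
    """Multiply a matrix by the scalar c."""
    return [[c * entry for entry in row] for row in a]

# Assignment quoted in the text: (0,0) -> 1, (0,1) -> X, (1,0) -> Z, (1,1) -> iY.
WEYL = {(0, 0): I2, (0, 1): X, (1, 0): Z, (1, 1): scale(1j, Y)}

def add(xi, eta):
    """Addition of binary vectors modulo two."""
    return ((xi[0] + eta[0]) % 2, (xi[1] + eta[1]) % 2)

def product_phase(xi, eta):
    """Return the phase c with W(xi) W(eta) = c W(xi + eta), or None if none fits."""
    lhs = matmul(WEYL[xi], WEYL[eta])
    for c in (1, -1, 1j, -1j):
        if lhs == scale(c, WEYL[add(xi, eta)]):
            return c
    return None

def commutation_sign(xi, eta):
    """(-1)^(p q' - q p') for xi = (p, q) and eta = (p', q')."""
    return (-1) ** ((xi[0] * eta[1] - xi[1] * eta[0]) % 2)

GROUP = list(product((0, 1), repeat=2))
PHASES = {(xi, eta): product_phase(xi, eta) for xi in GROUP for eta in GROUP}
print(PHASES[((1, 0), (0, 1))], PHASES[((0, 1), (1, 0))])
```

With this assignment ZX equals iY, so the phase for the pair ((1,0),(0,1)) is 1 while the reversed pair gives -1.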

Discrete Weyl systems are deeply related to 
symplectic geometry for vector spaces over finite 
fields. The additive structure of the vector space is 
the underlying abelian symmetry group. The 
exchange of two Weyl operators within a product 
produces a phase that is the exponential of an 
antisymmetric bilinear form, as it is explained in the 
next section. For irreducible Weyl systems, this 
antisymmetric form must be symplectic because the 
Weyl operators generate a full matrix algebra. In 
particular, this requires that the dimension of the 
underlying vector space is even. The Pauli matrices 
are also an example for this more special structure: 
the binary vectors $(p,q)$ with $p,q = 0,1$ form a two-dimensional 
vector space over the field with two elements {0, 1}. 
The commutation relations for Pauli operators 
imply that the symplectic form can be evaluated 
for two binary vectors $(p,q)$, $(p',q')$ according to 
$pq' - qp' \bmod 2$. It is natural to interpret the 
binary vectors $(p,q)$ as points in a discrete phase 
space, where the first entry corresponds to the 
momentum and the second to the position. In view 
of this, discrete Weyl systems serve as a finite- 
dimensional analog of the canonical commutation 
relations. 

For the generic situation in quantum information 
theory, an irreducible Weyl system is represented on 
the Hilbert space describing a system of several 
single particles. Stabilizer states are left unchanged 
under the action of a so-called isotropic subgroup 
which consists of mutually commuting Weyl opera- 
tors: this kind of invariance is precisely the type of 
constraint that reduces the complexity for the 
parametrization of the state. For an efficient 
description of such states, combinatorial 
techniques are available, e.g., graph theory. 


Operations that preserve the class of stabilizer 
states (for a particular symmetry group) must be 
covariant with respect to this symmetry. These 
operations are called Clifford channels, which have 
far-reaching applications in the theory of quantum 
error correction. They also make it possible to take classical 
coding procedures and turn them into quantum 
codes: on the classical level, the encoding operation 
acts on classical phase space as a linear map 
(additive code). Up to a choice of phases, this 
induces a quantum channel that preserves the 
structure of Weyl systems. These codes are called 
stabilizer codes and have been investigated by many 
authors (Calderbank et al. 1997, Cleve and Gottesman 
1996, 1997) (see Quantum Error Correction and 
Fault Tolerance). In particular, the first quantum 
error correcting codes belong to this class. 

This article is organized as follows. In the next 
section, the basic mathematical notions are provided, 
like projective representations, Weyl systems, and 
irreducibility. Moreover, statements on the main 
structure of Weyl systems are presented. Next, the 
notion of Weyl covariant channels (Clifford channels) 
is introduced and their basic properties are stated. In 
particular, stronger results for the reversible case are 
given. The relation between symplectic geometry and 
reversible Clifford operations on finite Weyl systems 
is explained. Results on the general structure of 
stabilizer states and stabilizer codes are given in the 
penultimate section. Finally, the representation of 
stabilizer codes in terms of graphs is described. 


Finite Weyl Systems 


A projective representation of a group $\Xi$ assigns to 
each group element $\xi$ a unitary operator $w(\xi)$ on a 
Hilbert space $\mathcal{H}$ such that the group law is preserved 
up to a phase, that is, the relation 


$w(\xi_1 + \xi_2) = f(\xi_1, \xi_2)\,w(\xi_1)\,w(\xi_2)$ [1] 


is fulfilled for a phase-valued function $f$ on $\Xi^2$. In the 
following, we denote a projective representation by a 
triple $(w, f, \mathcal{H})$. A finite Weyl system is a projective 
representation of a finite abelian group. The operators 
$w(\xi)$ are called Weyl operators and the function $f$ 
is called the factor system. We refer to the work by 
Zmud (1971, 1972) for an analysis of projective 
representations for general abelian groups. 

The Weyl algebra A(w,f, H) associated with a 
Weyl system (w,f, H) is the smallest norm-closed 
subalgebra in the space of bounded operators B(H) 
which contains all Weyl operators. If the Weyl 
algebra coincides with the algebra of all bounded 
operators, then the Weyl system is called irreducible. 
This is equivalent to the fact that each operator that 
commutes with all Weyl operators must be a 
multiple of the identity. 

In order to analyze the properties of factor 
systems systematically, we introduce here a few 
pieces of the cohomology theory of groups. For each 
positive integer $k = 1, 2, 3, \ldots$ we introduce the 
abelian group $C^k(\Xi)$ of k-cochains, which consists 
of all phase-valued functions on $\Xi^k$. The product 
and the inverse of k-cochains are defined pointwise. 
Factor systems are special 2-cochains. Namely, if we 
consider a Weyl system $(w, f, \mathcal{H})$, then associativity 
implies that the so-called 2-cocycle condition, 


$f(\xi_1 + \xi_2, \xi_3)\,f(\xi_1, \xi_2)\,f(\xi_1, \xi_2 + \xi_3)^{-1}\,f(\xi_2, \xi_3)^{-1} = 1$ [2] 


holds. This property can also be expressed by a 
coboundary map $\delta$ which is a group homomorphism 
from k-cochains to (k + 1)-cochains. We consider here 
the action of the coboundary map on a 1-cochain $\varphi$ 
and a 2-cochain $f$: 


$(\delta\varphi)(\xi_1, \xi_2) := \varphi(\xi_1 + \xi_2)\,\varphi(\xi_1)^{-1}\,\varphi(\xi_2)^{-1}$ [3] 


$(\delta f)(\xi_1, \xi_2, \xi_3) := f(\xi_1 + \xi_2, \xi_3)\,f(\xi_1, \xi_2)\,f(\xi_1, \xi_2 + \xi_3)^{-1}\,f(\xi_2, \xi_3)^{-1}$ [4] 


The group of 2-cocycles $Z^2(\Xi)$ consists of all 
2-cochains $f$ with $\delta f = 1$ and the group of all 
2-coboundaries $B^2(\Xi)$ contains all 2-cochains of 
the form $f = \delta\varphi$. The 2-fold concatenation of the 
coboundary map is the trivial homomorphism 
$\delta \circ \delta = 1$, which implies that each 2-coboundary is 
a 2-cocycle. The converse is in general not the case 
and the 2-cohomology group $H^2(\Xi) := Z^2(\Xi)/B^2(\Xi)$ 
is nontrivial. 
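
A small exact check of these notions can be sketched as follows. Phases $e^{2\pi i k/d}$ are represented by their integer exponents $k$ modulo $d$, so products of phases become sums. The group $\mathbb{Z}_d \times \mathbb{Z}_d$ with $d = 3$, the factor system with exponent $-p\,q'$, the sample 1-cochain, and the standard sign conventions used for the coboundary maps are all assumptions made for illustration, not the article's specific choices.

```python
from itertools import product

D = 3  # local dimension (illustrative choice)
GROUP = list(product(range(D), repeat=2))  # elements (p, q) of Z_D x Z_D

def add(x, y):
    """Group law: componentwise addition modulo D."""
    return ((x[0] + y[0]) % D, (x[1] + y[1]) % D)

def f_exp(x, y):
    """Exponent of a hypothetical factor system: f(x, y) = exp(2 pi i (-p q') / D)."""
    return (-x[0] * y[1]) % D

def delta1(phi):
    """Coboundary of a 1-cochain, in exponent form."""
    return lambda x, y: (phi(add(x, y)) - phi(x) - phi(y)) % D

def delta2(f):
    """Coboundary of a 2-cochain, in exponent form."""
    return lambda x, y, z: (f(add(x, y), z) + f(x, y) - f(x, add(y, z)) - f(y, z)) % D

def is_cocycle(f):
    """True if the 2-cocycle condition holds on the whole group."""
    d2 = delta2(f)
    return all(d2(x, y, z) == 0 for x in GROUP for y in GROUP for z in GROUP)

def antisymmetric_part(f):
    """Exponent of f^{-1} (theta f), i.e. f(y, x) / f(x, y)."""
    return lambda x, y: (f(y, x) - f(x, y)) % D

def sample_cochain(x):
    """An arbitrary 1-cochain, chosen only for the test."""
    return (x[0] * x[0] + 2 * x[1]) % D

print(is_cocycle(f_exp), is_cocycle(delta1(sample_cochain)))
```

Under these assumptions the antisymmetric part has exponent $p\,q' - p'q$, the additive symplectic form, while every coboundary is symmetric; the sample factor system is therefore a cocycle that is not a coboundary.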

The analysis of Zmud (1971, 1972) shows that the 
set of Weyl systems is characterized by elements of 
the 2-cohomology $H^2(\Xi)$. The multiplication of a 
Weyl system $(w, f, \mathcal{H})$ by a 1-cochain $\varphi$ yields a new 
family of Weyl operators $(\varphi w)(\xi) = \varphi(\xi)w(\xi)$. The 
2-cocycle $f$ is altered by the multiplication of the 
2-coboundary $\delta\varphi$ and the new Weyl system is given 
by $(\varphi w, \delta\varphi f, \mathcal{H})$. This kind of transformation does 
not change the cohomology class of the factor system 
and the corresponding Weyl algebras coincide: 
$\mathfrak{A}(w, f, \mathcal{H}) = \mathfrak{A}(\varphi w, \delta\varphi f, \mathcal{H})$. Thus, the fundamental 
properties of a Weyl system only depend on the 
cohomology class of the factor system. In particular, 
if the factor system $f = \delta\varphi$ is a 2-coboundary, then 
we can trivialize the Weyl system $(w, \delta\varphi, \mathcal{H})$ by 
multiplying by the inverse 1-cochain $\varphi^{-1}$ and we obtain 
a true unitary representation $(\varphi^{-1}w, 1, \mathcal{H})$. The 
corresponding Weyl algebra $\mathfrak{A}(w, \delta\varphi, \mathcal{H})$ is abelian. 
The relation between cohomology and Weyl systems 
can be made even more precise by the following 
theorem: 


Theorem 1 (Zmud 1971, 1972). Let $\theta$ be the group 
homomorphism on 2-cochains that exchanges the 
variables: $(\theta f)(\xi_1, \xi_2) = f(\xi_2, \xi_1)$. 

(i) The antisymmetric part $f^{-1}(\theta f)$ of a factor 
system (2-cocycle) is an antisymmetric bicharacter, 
that is, a group homomorphism in both 
arguments keeping the other variable fixed. 

(ii) Each symmetric 2-cocycle $f = \theta f$ is a 
2-coboundary $f = \delta\varphi$. 

(iii) The group of antisymmetric bicharacters on $\Xi$ is 
isomorphic to the 2-cohomology group $H^2(\Xi)$. 
For each antisymmetric bicharacter $\sigma$ the corresponding 
2-cohomology class is uniquely determined 
by $\sigma = f^{-1}(\theta f)$ for some representative 
$f \in Z^2(\Xi)$. 


Example 2 The following Weyl system describes 
n quantum digits (in short, qudits). The system’s 
Hilbert space is spanned by orthonormal vectors 
$|a\rangle = |a_1, a_2, \ldots, a_n\rangle$ which are labeled by vectors $a$ 
of the additive group $F^n$, where $F = \mathbb{Z}_d$ is the cyclic 
field of prime order. A projective representation 
$(w, \chi, \mathbb{C}^{d^n})$ of the additive group $F^{2n}$ is given by 


w(p,q)|a) = ee 2ta + q) [5] 


where $p^t$ is the transposed vector. The factor system 
$\chi$ assigns to each pair $(p,q)$, $(p',q')$ the phase 


x(p, q|p',q’) =e ara [6] 


The finite vector space $F^{2n}$ is interpreted as finite 
phase space with a multiplicative symplectic form $\sigma$. 
It assigns to a pair of vectors $(p,q)$, $(a,b)$ the phase 


$\sigma(p, q|a, b) := e^{(2\pi i/d)(p^t b - a^t q)}$ [7] 


The commutation relations for Weyl operators 
involve the symplectic form: 


$w(p,q)\,w(a,b) = \sigma(a, b|p, q)\,w(a,b)\,w(p,q)$ [8] 


The $d^{2n}$ Weyl operators $w(p,q)$ are a basis of 
the algebra of all operators acting on the Hilbert 
space $\mathbb{C}^{d^n}$, hence $(w, \chi, \mathbb{C}^{d^n})$ is irreducible. In 
particular, this Weyl system is a nice error basis in 
the sense of Klappenecker and Roetteler (2002, 
2005). Namely, the Weyl operators form a projective 
representation, on the one hand, and a unitary 
basis (Werner 2001) on the other. 

For d=2 and n=4, we obtain a system of four 
qubits and the Weyl operators are tensor products of 
four Pauli matrices including the identity. For 
instance, the Weyl operator of the binary vector 
$(p,q) = (0011, 1010)$ can be expressed in terms of 
Pauli matrices (see Introduction) as follows: 


$w(0011, 1010) = w(0,1) \otimes \mathbf{1} \otimes w(1,1) \otimes w(1,0) = iX \otimes \mathbf{1} \otimes Y \otimes Z$ [9] 
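
The translation from a pair of binary strings to a Pauli string can be sketched as a table lookup, using only the single-qubit assignment from the Introduction ((0,1) to X, (1,0) to Z, (1,1) to iY). The function and table names are illustrative.

```python
# Single-qubit assignment from the Introduction: (p, q) -> (phase, Pauli letter).
PAULI = {(0, 0): (1, "1"), (0, 1): (1, "X"), (1, 0): (1, "Z"), (1, 1): (1j, "Y")}

def weyl_to_pauli(p, q):
    """Translate binary strings p and q into an overall phase and a list of tensor factors."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    phase = 1
    letters = []
    for pb, qb in zip(p, q):
        c, letter = PAULI[(int(pb), int(qb))]
        phase *= c
        letters.append(letter)
    return phase, letters

print(weyl_to_pauli("0011", "1010"))
```

For the vector (0011, 1010) this yields the phase i and the factors X, 1, Y, Z, in agreement with [9].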


Clifford Channels 


Weyl systems can be seen as quantized symmetries 
corresponding to finite abelian groups. In the 
Heisenberg picture the symmetry transformations 
act on operators $A \in \mathcal{B}(\mathcal{H})$ of the observable algebra 
by automorphisms (reversible quantum channels): 


$\mathrm{Ad}[w(\xi)](A) := w(\xi)\,A\,w(\xi)^*$ [10] 


Since a projective representation preserves the group 
law up to a phase, the corresponding automorphisms 
preserve the group law: 


$\mathrm{Ad}[w(\xi)] \circ \mathrm{Ad}[w(\eta)] = \mathrm{Ad}[w(\xi + \eta)]$ [11] 


A quantum channel $T$ is called a Clifford channel if 
it is covariant with respect to Weyl systems 
$(w_1, f_1, \mathcal{H}_1)$ and $(w_2, f_2, \mathcal{H}_2)$, that is, the intertwiner 
relation 


$T \circ \mathrm{Ad}[w_2(\xi)] = \mathrm{Ad}[w_1(\xi)] \circ T$ [12] 


holds. It is required that the antisymmetric parts of 
the factor systems $f_1$ and $f_2$ coincide, that is, 
$\sigma = f_1^{-1}\theta f_1 = f_2^{-1}\theta f_2$. We call $(w_1, f_1, \mathcal{H}_1)$ the input 
and $(w_2, f_2, \mathcal{H}_2)$ the output system. We refer to the 
article by Scutaru (1979), which is concerned with 
the general properties of covariant channels. 

It is a natural question to ask how Clifford 
channels act on Weyl operators. As shown by 
Holevo (n.d.), a Clifford channel maps Weyl 
operators of the output system to multiples of 
Weyl operators of the input system, provided the 
input system is irreducible. 


Theorem 3 (Holevo (n.d.)). Let $T$ be a Clifford 
channel such that the input system $(w_1, f_1, \mathcal{H}_1)$ is 
irreducible. Then there exists a function $\varphi: \Xi \to \mathbb{C}$ 
such that 


$T(w_2(\xi)) = \varphi(\xi)\,w_1(\xi)$ [13] 


holds for all $\xi \in \Xi$. The function $\varphi$ is of positive 
type, that is, for all complex functions $f$ on $\Xi$ the 
inequality 


$0 \le \sum_{\xi,\eta \in \Xi} \varphi(\xi - \eta)\,\overline{f(\xi)}\,f(\eta)$ [14] 


holds. Conversely, if the factor systems $f_1 = f_2$ 
coincide, then a well-defined channel is determined 
by [13] for any function $\varphi$ of positive type with 
$\varphi(0) = 1$. 


We apply Theorem 3 to a reversible Clifford 
channel $T$. Each output Weyl operator $w_2(\xi)$ is 
mapped to a multiple of an input Weyl operator 


$T(w_2(\xi)) = \varphi(\xi)\,w_1(\xi)$ [15] 


where $\varphi$ is phase-valued (a 1-cochain) according to 
the reversibility of $T$. We focus now on the converse 
problem: construct all reversible Clifford channels 
for irreducible Weyl systems that have a common 
antisymmetric part of the factor system. The 
following theorem gives a useful characterization 
of reversible Clifford channels. 


Theorem 4 (Schlingemann and Werner 2001). If 
$(w_1, f_1, \mathcal{H}_1)$ and $(w_2, f_2, \mathcal{H}_2)$ are irreducible Weyl 
systems with $f_1^{-1}(\theta f_1) = f_2^{-1}(\theta f_2)$, then there exists a 
1-cochain $\varphi$ with coboundary $\delta\varphi = f_1^{-1} f_2$, and a 
reversible Clifford channel $T_\varphi$ is determined by 


$T_\varphi(w_2(\xi)) = \varphi(\xi)\,w_1(\xi)$ [16] 


If $\tau$ is a 1-cochain that also satisfies $\delta\tau = f_1^{-1} f_2$, then 
there exists $\eta \in \Xi$ such that 


$\tau(\xi) = \sigma(\eta|\xi)\,\varphi(\xi)$ [17] 
$T_\tau = \mathrm{Ad}[w_1(\eta)] \circ T_\varphi = T_\varphi \circ \mathrm{Ad}[w_2(\eta)]$ [18] 


holds. In other words, two irreducible Weyl systems 
determine a reversible Clifford channel up to a 
“phase space translation $\eta$.” 


We consider the Weyl system $(w, f, \mathcal{H})$ over a 
discrete phase space $F^{2n}$, where $F$ is a finite field of 
prime order. The group of symplectic transformations 
$\mathrm{Sp}(n, F)$ consists of all $F$-linear maps $s$ on the 
phase space $F^{2n}$ that preserve the symplectic form 
$\sigma = f^{-1}\theta f$. A further Weyl system $(w \circ s, f \circ s, \mathcal{H})$ is 
obtained for each symplectic transformation $s$. Here 
the factor system $f \circ s$ is defined according to 
$(f \circ s)(\xi, \eta) := f(s\xi, s\eta)$ and the corresponding Weyl 
operators are $(w \circ s)(\xi) = w(s\xi)$. Obviously, the antisymmetric 
part of the factor system $f \circ s$ is the 
symplectic form $\sigma \circ s = \sigma$. The following statement 
is a direct consequence of Theorem 4. 


Corollary 5 For each symplectic transformation 
$s \in \mathrm{Sp}(n, F)$ there exists a 1-cochain $\varphi$ with 
coboundary $\delta\varphi = f^{-1}(f \circ s)$ and the corresponding 
reversible Clifford channel $T_{[\varphi,s]}$ is given by 


$T_{[\varphi,s]}(w(\xi)) = \varphi(\xi)\,w(s\xi)$ [19] 
with $\xi \in F^{2n}$. 


Example 6 We consider a finite field $F$. To a 
symmetric matrix $\Gamma \in M_n(F)$ we associate the 
symplectic transformation on $F^{2n}$ that maps a 
phase space vector $(p,q)$ to $(p - \Gamma q, q)$. This shear 
transformation is viewed as one elementary step of a 
discrete dynamics. The quantized version of this 
dynamics is given by the unitary multiplication 
operator 


$u(\Gamma)|a\rangle = \zeta^{a^t \Gamma a}|a\rangle$ [20] 


with the root of unity $\zeta = \exp(i\pi(d+1)/d)$ for 
$d \neq 2$ and $\zeta = i$ for $d = 2$. The unitary operator $u(\Gamma)$ 
implements a reversible Clifford operation for the 
symplectic transformation $(p,q) \mapsto (p - \Gamma q, q)$ 
since the relation 


$u(\Gamma)\,w(p,q)\,u(\Gamma)^* = \zeta^{-q^t \Gamma q}\,w(p - \Gamma q, q)$ [21] 


holds. The symmetric matrix $\Gamma$ describes a pattern 
of two-qudit interactions. This can be visualized by 
a graph $\Gamma$ whose vertices are the positions 
$x, y = 1, \ldots, n$. Two vertices $x, y$ are connected by 
an edge if the matrix element $\Gamma^x_y \neq 0$ is nonvanishing. 
The value of the matrix element $\Gamma^x_y$ is 
interpreted as the strength of the interaction. 
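
The role of the symmetry of the matrix can be checked by brute force on a small phase space. The sketch below works with the additive form $p^t q' - p'^t q$ modulo $d$, verifies that the shear $(p,q) \mapsto (p - \Gamma q, q)$ preserves it for a symmetric matrix and fails to do so for a nonsymmetric one, and lists the weighted edges of the associated graph. The choices $d = 3$, $n = 2$, the sample matrices, and all names are illustrative assumptions.

```python
from itertools import product

D = 3  # prime local dimension (illustrative choice)

def dot(u, v):
    """Scalar product modulo D."""
    return sum(a * b for a, b in zip(u, v)) % D

def mat_vec(m, v):
    """Matrix times vector modulo D."""
    return tuple(dot(row, v) for row in m)

def form(x, y):
    """Additive symplectic form p.q' - p'.q mod D for x = (p, q), y = (p', q')."""
    (p, q), (pp, qq) = x, y
    return (dot(p, qq) - dot(pp, q)) % D

def shear(gamma, x):
    """Map (p, q) to (p - gamma q, q)."""
    p, q = x
    gq = mat_vec(gamma, q)
    return (tuple((a - b) % D for a, b in zip(p, gq)), tuple(q))

def phase_space(n):
    """All points (p, q) of the phase space over F_D with n positions."""
    vectors = list(product(range(D), repeat=n))
    return [(p, q) for p in vectors for q in vectors]

def preserves_form(gamma):
    """True if the shear defined by gamma preserves the symplectic form."""
    points = phase_space(len(gamma))
    return all(form(shear(gamma, x), shear(gamma, y)) == form(x, y)
               for x in points for y in points)

def edges(gamma):
    """Edges (x, y, weight) with x <= y, one per nonzero matrix element."""
    n = len(gamma)
    return [(x, y, gamma[x][y] % D)
            for x in range(n) for y in range(x, n) if gamma[x][y] % D]

SYMMETRIC = [[0, 1], [1, 2]]
NONSYMMETRIC = [[0, 1], [0, 0]]
print(preserves_form(SYMMETRIC), preserves_form(NONSYMMETRIC), edges(SYMMETRIC))
```

In the sample matrix the off-diagonal entry is a single edge between the two positions and the diagonal entry 2 is a self-link of weight 2.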


Example 7 The second type of symplectic transformations, 
which is relevant here, is determined by 
an invertible matrix $C \in M_n(F)$. It induces a 
symplectic transformation which maps the vector 
$(p,q)$ to $(Cq, -\hat{C}p)$, where $\hat{C}$ is the inverse of the 
transpose of $C$. This is implemented by a unitary 
transformation $F_C$. It is called the Fourier transform 
associated with the invertible matrix $C$: 


$F_C|p\rangle = \frac{1}{\sqrt{d^n}} \sum_{q \in F^n} e^{(2\pi i/d)\,p^t C q}\,|q\rangle$ [22] 


By construction, the relation 


$F_C\,w(p,q)\,F_C^* = e^{(2\pi i/d)\,p^t q}\,w(Cq, -\hat{C}p)$ [23] 


follows. If $C = \mathrm{diag}(c_1, \ldots, c_n)$ is a diagonal matrix, 
then $F_C$ is a local unitary transformation. In fact, 
the Fourier transform is a tensor product 


$F_C = F_{c_1} \otimes F_{c_2} \otimes \cdots \otimes F_{c_n}$ [24] 


with $c_x \in F \setminus \{0\}$, where the tensor product structure is 
determined by $|q\rangle = |q_1\rangle \otimes \cdots \otimes |q_n\rangle$. 


The Stabilizer Formalism 


This section is dedicated to the stabilizer formalism, 
which has widely been discussed in the literature 
(Calderbank et al. 1997, Gottesman 1996, 1997). 
We investigate here stabilizer codes from the point of 
view of symmetries and show how they can be 
characterized by Clifford channels. We verify that 
stabilizer codes are specific Clifford channels in the 
sense described in the last section. To begin with, we 
consider an irreducible Weyl system $(w, f, \mathcal{H})$ of an 
even-dimensional $F$-vector space $\Xi$ such that the 
antisymmetric part of the factor system $\sigma := f^{-1}\theta f$ is 
a symplectic form on $\Xi$. Furthermore, we need to 
introduce the following notions: 

The symplectic complement of a subspace $Q \subset \Xi$ 
is the subspace 


$Q^\sigma = \{\xi \in \Xi \mid \sigma(\xi|q) = 1\ \forall q \in Q\}$ [25] 


Furthermore, a subspace $Q$ of $\Xi$ is isotropic if it is 
contained in its symplectic complement, $Q^\sigma \supset Q$. In 
other words, for all pairs of vectors $q, q' \in Q$ we 
have $\sigma(q|q') = 1$. 
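
For small systems these definitions can be explored by enumeration. The sketch below treats two qubits, writes phase space vectors as tuples $(p_1, p_2, q_1, q_2)$ over the field with two elements, and uses the additive form $p^t q' - p'^t q$ modulo 2, so that $\sigma = 1$ corresponds to the value 0. The generator names (ZZ, XX, Z1, X1) and all function names are illustrative assumptions.

```python
from itertools import product

D, N = 2, 2  # two qubits (illustrative choice)

def form(x, y):
    """Additive form p.q' - p'.q mod D; sigma(x|y) = 1 exactly when this is 0."""
    p, q = x[:N], x[N:]
    pp, qq = y[:N], y[N:]
    return (sum(a * b for a, b in zip(p, qq)) - sum(a * b for a, b in zip(pp, q))) % D

def span(generators):
    """All linear combinations of the generators over the field with D elements."""
    vectors = set()
    for coeffs in product(range(D), repeat=len(generators)):
        vectors.add(tuple(sum(c * g[i] for c, g in zip(coeffs, generators)) % D
                          for i in range(2 * N)))
    return vectors

def complement(subspace):
    """Symplectic complement, found by checking every phase space point."""
    return {x for x in product(range(D), repeat=2 * N)
            if all(form(x, q) == 0 for q in subspace)}

def is_isotropic(subspace):
    """A subspace is isotropic if it lies inside its symplectic complement."""
    return subspace <= complement(subspace)

ZZ = (1, 1, 0, 0)  # p = 11, q = 00
XX = (0, 0, 1, 1)  # p = 00, q = 11
Z1 = (1, 0, 0, 0)
X1 = (0, 0, 1, 0)

Q = span([ZZ, XX])
print(is_isotropic(Q), complement(Q) == Q, is_isotropic(span([Z1, X1])))
```

The subspace spanned by ZZ and XX is isotropic and equal to its own complement, whereas the span of Z1 and X1 is not isotropic because the two generators do not commute.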

We consider an isotropic subspace $Q$ and we 
denote by $(w|_Q, f|_Q, \mathcal{H})$ the corresponding restriction 
of the Weyl system $(w, f, \mathcal{H})$. Since $Q$ is isotropic, it 
follows that the restriction $f|_Q$ is symmetric. Hence, 
the Weyl algebra for the restricted system 
$\mathfrak{A}_Q := \mathfrak{A}(w|_Q, f|_Q, \mathcal{H})$ is an abelian subalgebra of $\mathcal{B}(\mathcal{H})$. As 
a consequence, all the operators in $\mathfrak{A}_Q$ can be 
diagonalized simultaneously. To obtain the joint 
spectral resolution for all operators in $\mathfrak{A}_Q$, we employ 
some facts from the theory of finite-dimensional 
abelian C*-algebras: 


1. $\mathfrak{A}_Q$ is a finite-dimensional abelian C*-algebra and 
can be identified with the algebra of complex 
functions $C(Q^\wedge)$ on a finite set $Q^\wedge$. 

2. Each element $\omega \in Q^\wedge$ is a character (pure state), 
that is, a linear functional such that 
$\omega(AB) = \omega(A)\omega(B)$ and $\omega(A^*) = \overline{\omega(A)}$. 

3. For each operator $A \in \mathfrak{A}_Q$ there exists precisely 
one function $f_A$ on $Q^\wedge$ which is uniquely 
determined by $\omega(A) = f_A(\omega)$. The isomorphism 
$A \mapsto f_A$ is called the Gelfand isomorphism. 

4. A character $\omega \in Q^\wedge$ is an irreducible representation 
of $\mathfrak{A}_Q$ and there is a unique projection $e_\omega$ 
onto the subspace in $\mathcal{H}$ which carries this 
irreducible representation. 


From these facts we derive a joint spectral 
resolution for all operators in $\mathfrak{A}_Q$. Namely, each 
$A \in \mathfrak{A}_Q$ can be written as 


$A = \sum_{\omega \in Q^\wedge} e_\omega\,\omega(A)$ [26] 
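
A concrete instance of such a resolution can be sketched for two qubits. The example assumes the isotropic subspace spanned by the vectors of $Z \otimes Z$ and $X \otimes X$, builds Weyl operators as tensor products of the single-qubit assignment from the Introduction (1, X, Z, iY), and uses the standard group-averaging formula $e_\omega = |Q|^{-1} \sum_{q \in Q} \overline{\omega(w(q))}\, w(q)$ for the projections. That formula, the choice of subspace, and all names are assumptions for illustration; all matrices here are real, so complex conjugation is omitted.

```python
from fractions import Fraction
from itertools import product

# Single-qubit Weyl operators 1, X, Z, iY (all real matrices).
W1 = {(0, 0): [[1, 0], [0, 1]], (0, 1): [[0, 1], [1, 0]],
      (1, 0): [[1, 0], [0, -1]], (1, 1): [[0, 1], [-1, 0]]}

def kron(a, b):
    """Tensor (Kronecker) product of two square matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def matmul(a, b):
    """Matrix product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def weyl(p, q):
    """Two-qubit Weyl operator as a tensor product of single-qubit ones."""
    return kron(W1[(p[0], q[0])], W1[(p[1], q[1])])

# Elements a*ZZ + b*XX of the isotropic subspace: p = (a, a), q = (b, b).
Q = [((a, a), (b, b)) for a, b in product((0, 1), repeat=2)]

def character(sz, sx):
    """Character fixed by its values sz on w(ZZ) and sx on w(XX)."""
    return lambda p, q: sz ** p[0] * sx ** q[0]

def projection(omega):
    """Group average (1/|Q|) * sum over Q of omega(w(q)) w(q)."""
    e = [[Fraction(0)] * 4 for _ in range(4)]
    for p, q in Q:
        w = weyl(p, q)
        c = Fraction(omega(p, q), len(Q))
        for i in range(4):
            for j in range(4):
                e[i][j] += c * w[i][j]
    return e

CHARACTERS = [character(sz, sx) for sz, sx in product((1, -1), repeat=2)]
PROJECTIONS = [projection(omega) for omega in CHARACTERS]

def resolve(p, q):
    """Right-hand side of the spectral resolution for A = w(p, q)."""
    total = [[Fraction(0)] * 4 for _ in range(4)]
    for omega, e in zip(CHARACTERS, PROJECTIONS):
        for i in range(4):
            for j in range(4):
                total[i][j] += omega(p, q) * e[i][j]
    return total

print(resolve((1, 1), (0, 0)) == weyl((1, 1), (0, 0)))
```

Each of the four projections has trace one in this example, which reflects that the chosen subspace is maximally isotropic.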


We are now prepared to introduce the notion of 
stabilizer codes in accordance with Calderbank et al. 
(1997) and Gottesman (1996, 1997): Let $Q$ be an 
isotropic subspace in $\Xi$ and let $\omega \in Q^\wedge$ be a character 
of $\mathfrak{A}_Q$. The projection $e_\omega$ is called a stabilizer code. 
The abelian group that is generated by the Weyl 
operators $w(q)$, $q \in Q$, is called the stabilizer group. The 
abelian C*-algebra $\mathfrak{A}_Q$ is called the stabilizer algebra. 
According to the following theorem, each stabilizer 
code is uniquely associated with a Clifford channel: 


Theorem 8 (Schlingemann 2002, 2004). Let $Q$ be 
an isotropic subspace of $\Xi$ and let $e_\omega$ be the 
stabilizer code of a character $\omega$. Then there exists 
a unique Clifford channel $E_\omega$ with input system 
$(w_\omega, f_\omega, \mathcal{H}_\omega)$ and output system $(w|_{Q^\sigma}, f|_{Q^\sigma}, \mathcal{H})$ such 
that the following is true: 


(i) For each $\xi \in \Xi$ the identity 

$E_\omega(w(\xi)) = \delta_{Q^\sigma}(\xi)\,w_\omega(\xi)$ [27] 

is fulfilled. 
(ii) Let $v_\omega: \mathcal{H}_\omega \to \mathcal{H}$ be the isometry which embeds 
$\mathcal{H}_\omega$ into $\mathcal{H}$, then 

$E_\omega(A) = v_\omega^* A v_\omega$ [28] 

holds for all $A \in \mathcal{B}(\mathcal{H})$. 
(iii) The channel $E_\omega$ is invariant under translations 
in the isotropic subspace $Q$, that is, the identity 

$E_\omega \circ \mathrm{Ad}[w(q)] = E_\omega$ [29] 

holds for all $q \in Q$. 


Stabilizer codes for maximally isotropic subspaces 
$Q = Q^\sigma$ are special, since the projection $e_\omega$ onto the 
eigenspace of the character $\omega$ is one-dimensional. 
Thus, $e_\omega$ is the density matrix of a pure state which is 
called a stabilizer state. In view of Theorem 8, the 
expectation value of a Weyl operator $w(\xi)$ is given by 


$\mathrm{tr}(e_\omega w(\xi)) = \omega(w(\xi))\,\delta_Q(\xi)$ [30] 


Representation by Graphs 


As described in the previous section (Theorem 8), 
each stabilizer code is a pure Clifford channel 
which is completely determined by an isotropic 
subspace and a character of the corresponding 
stabilizer algebra. A constructive characterization 
of isotropic subspaces can be given in terms of 
graphs, as has been shown in Schlingemann (2002, 
2004). The complete description of a stabilizer code 
requires in addition the choice of a character of the 
stabilizer algebra. Both data, the isotropic subspace 
and the character, can be encoded in a single graph 
$\Lambda$. The set of vertices $N$ is partitioned into four 
different types, the input vertices $I$, the output 
vertices $J$, the measurement vertices $K$ and the 
syndrome vertices $L$ (see Figure 1). The edges of 
the graph are undirected, and a pair of vertices can 
be connected by at most $d - 1$ edges, where self-links 
are also allowed. The adjacency matrix (also 
denoted by $\Lambda$) is a symmetric matrix with entries 
$\Lambda^x_y = 0, 1, \ldots, d-1$ according to the number of 
edges between $x$ and $y$. Thus, the adjacency matrix 
can be seen as a linear operator on $F^N$ with cyclic 
field $F = \mathbb{Z}_d$. Each subset $A \subset N$ corresponds to a 
linear projection onto the subspace $F^A \subset F^N$, which 
we denote by $\pi_A$. For a convenient description we 
introduce the following notation: the union of two 
sets of vertices is written without the symbol $\cup$, that 
is, instead of $I \cup J$ we write $IJ$. 

Figure 1 (a) A graphical representation of a Weyl operator 
$-Y \otimes Y \otimes Z \otimes Z \otimes \mathbf{1}$ of the stabilizer algebra of a quantum error 
correcting code, encoding one qubit into five (see 00273). The 
input vertex is gray, the output vertices are black. Each binary 
vector represents a Pauli matrix sitting at a tensor position of the 
output system. (b) The expectation values, which are products 
over all edges, where to each edge with labels $q, q'$ the value 
$(-1)^{qq'}$ is assigned. The character corresponds to the syndrome 
configuration (1110) (blank vertices). 


Theorem 9 (Schlingemann 2002, 2004). Let QC 
FF” be an isotropic subspace and let w be a 
character of the stabilizer algebra Aq. Then there 
exists a graph A with input vertices I, output vertices 
J, measurement vertices K and syndrome vertices L 
such that the following holds: 


(i) The linear operator Ti Amy, is invertible. 
(ii) The isotropic subspace Q consists of the vectors 
(tj AT Kq,73q) with q € ker(tirArx). 
(iii) There is a unique vector a in the syndrome 
subspace F™ such that the expectation values of 
the character w are given by 


co(w(mz At Kg, 71q)) = oi A(q+a) [31] 


with q€ ker(TikATz). 


Theorems 8 and 9 provide different useful 
characterizations of stabilizer codes, namely in 
terms of eigenspaces, Clifford channels, and graphs. 


• The original definition of stabilizer codes in terms of 
eigenspaces goes back to Calderbank, Gottesman, 
Rains, Shor, and Sloane (see, e.g., Calderbank et al. 
(1997), Gottesman (1996, 1997)). They have developed 
an approach to derive quantum codes from 
classical binary codes. 

• Stabilizer codes can also be characterized by 
specific Clifford channels (see Theorem 8). The 
condition for a channel to be a stabilizer code is 
the covariance with respect to a subgroup of 
phase space translations. This reflects stabilizer 
codes in terms of symmetries. 

• Theorem 9 yields a characterization of stabilizer 
codes in terms of graphs providing an explicit 
expression for the isotropic subspace and the 
character of the stabilizer code. This graphical 
representation provides a suggestive encoding of 
various properties like error-correcting capabilities, 
multipartite entanglement, and the effects of 
specific local operations. In fact, it has been 
shown in Briegel and Raussendorf (2001), Dür 
et al. (2003), and Hein et al. (2004) that the 
entanglement present in a graph state can be 
derived from its shape. 


See also: Capacities Enhanced by Entanglement; 
Quantum Error Correction and Fault Tolerance. 
Introduction 


A typical problem of quantum statistical mechanics is 
to compute equilibrium states of quantum dynamical 
systems. However, there is a strange difficulty inherent 
in this task, which is to describe the solution: if we try 
to describe the quantum state by specifying all matrix 
elements of all local density operators, we have a job 
which grows exponentially with the system size. This 
approach is obviously out of the question for the large 
systems statistical mechanics is interested in. Luckily, 
in practice nobody really wants to see all those 
numbers anyway, and one is content with determining 
a few correlation functions, or other easily parame- 
trized characteristics of the state. But for computing a 
state in the first place, we cannot restrict the state 
description to such parameters. So the problem there 
is again: how can we efficiently parametrize the states 
of interest? 

In this article we collect some results on a 
particular way of addressing this problem. It 
originated in the early 1990s (Fannes et al. 1992b) 
in ideas for quantizing the notion of Markov chains 
(Accardi and Frigerio 1983). Recently, there has 
been a new surge of interest in such ideas, because 
they turned out to be very useful for numerical work 
on quantum spin chains. 

Its typical feature is that one does not directly 
describe expectation values of the state, but instead 
generates the state from a description of its correla- 
tions between neighboring sites. In the language of 
quantum information theory, it could be said that 
the method focuses on the entanglement between 
different parts of the system. 


The Basic Construction 
Notation 


We consider a quantum spin chain, that is, a system 
of infinitely many subsystems, labeled by the integers, each 
of which is a quantum-mechanical d-level system. 
Let us denote the observable algebra at site $x \in \mathbb{Z}$ by 
$\mathcal{A}_x$. Each $\mathcal{A}_x$ is hence isomorphic to the $d \times d$ 
matrices. The observables of the whole (infinite) 
system lie in the infinite tensor product 
$\mathcal{A}_{\mathbb{Z}} = \bigotimes_{x \in \mathbb{Z}} \mathcal{A}_x$. This is defined as a quasilocal 
algebra (Bratteli and Robinson 1987, 1997), which 
is to say that it is the algebra generated by all finite 
tensor products of elements of the $\mathcal{A}_x$, say $\bigotimes_{x \in \Lambda} A_x$ 
with $A_x \in \mathcal{A}_x$ and $\Lambda$ finite. Such an element is said 
to be localized in $\Lambda$, and we denote by $\mathcal{A}_\Lambda$ the 
corresponding algebra. For $\Lambda_1 \subset \Lambda_2$, we identify $\mathcal{A}_{\Lambda_1}$ 
with a subalgebra of $\mathcal{A}_{\Lambda_2}$, by tensoring with the 
identity operator on all sites in $\Lambda_2 \setminus \Lambda_1$. $\mathcal{A}_{\mathbb{Z}}$ is the 
completion of the union of all $\mathcal{A}_\Lambda$, with $\Lambda$ finite, 
under the C*-norm. 

Zmud EM (1971) Symplectic geometries over finite abelian 
groups. Mathematics of the USSR Sbornik 15: 7-29. 

Zmud E (1972) Symplectic geometry and projective representations 
of finite abelian groups. Mathematics of the USSR 
Sbornik 16: 1-16. 

A state $\omega$ on $\mathcal{A}_{\mathbb{Z}}$ is uniquely specified by its 
expectations on the subalgebras $\mathcal{A}_\Lambda$. Since these are 
finite-dimensional matrix algebras, we can write 
$\omega(A) = \mathrm{tr}(\rho_\Lambda A)$ for $A \in \mathcal{A}_\Lambda$, with a “local density 
operator” $\rho_\Lambda$. The system of local density operators 
must be consistent with respect to restrictions 
(partial traces). 
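
The consistency requirement says that the density operator of a smaller region is obtained from that of a larger region by tracing out the extra sites. A minimal sketch for two sites, using a Bell state as the two-site density operator (an illustrative choice, as are the function names), is:

```python
from fractions import Fraction

def partial_trace_second(rho, d1, d2):
    """Trace out the second factor of a density matrix on C^d1 (x) C^d2."""
    return [[sum(rho[i * d2 + k][j * d2 + k] for k in range(d2))
             for j in range(d1)] for i in range(d1)]

def partial_trace_first(rho, d1, d2):
    """Trace out the first factor of a density matrix on C^d1 (x) C^d2."""
    return [[sum(rho[k * d2 + i][k * d2 + j] for k in range(d1))
             for j in range(d2)] for i in range(d2)]

# Density matrix of the Bell state (|00> + |11>)/sqrt(2), basis order 00, 01, 10, 11.
HALF = Fraction(1, 2)
BELL = [[HALF, 0, 0, HALF],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [HALF, 0, 0, HALF]]

rho_first_site = partial_trace_second(BELL, 2, 2)
rho_second_site = partial_trace_first(BELL, 2, 2)
print(rho_first_site, rho_second_site)
```

Both one-site restrictions equal the maximally mixed state, so they agree with each other, as translation invariance would require for neighboring sites of a chain.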

So far we have not used the structure of the 
underlying lattice $\mathbb{Z}$ in any way. This enters via the 
translation automorphisms $\tau_x$ of $\mathcal{A}_{\mathbb{Z}}$, which identify 
$\mathcal{A}_y$ with $\mathcal{A}_{y+x}$. A state is called translationally 
invariant, if $\omega \circ \tau_x = \omega$. The translationally invariant 
states form a weakly compact convex subset of the 
state space of $\mathcal{A}_{\mathbb{Z}}$, whose extreme points are called 
ergodic states. 


How to Generate Correlations 


Correlations between parts of a system typically 
have their origin in an interaction in the past. Even 
if the subsystems are dynamically separated later on, 
the correlation persists, and one can take this as a 
motivation to model correlations from two ingredients: 
a simplified prototype of a correlated system, 
and some evolution taking the parts of the simplified 
system to the parts of the given system. Let us 
consider a composite system, whose parts have 
observable algebras $\mathcal{A}_1$ and $\mathcal{A}_2$, respectively, so 
that the whole system has algebra $\mathcal{A}_1 \otimes \mathcal{A}_2$. We can 
build a state $\omega$ on this system from a simpler one, 
say a state $\eta$ on some $\mathcal{B}_1 \otimes \mathcal{B}_2$, and two completely 
positive unit preserving maps $T_i : \mathcal{A}_i \to \mathcal{B}_i$ such that 

$$\omega(A_1 \otimes A_2) = \eta(T_1(A_1) \otimes T_2(A_2))$$


Some features of $\eta$ are inherited by $\omega$. For example, 
when $\eta$ is separable (a convex combination of 
products), which is always the case if either $\mathcal{B}_1$ or 
$\mathcal{B}_2$ is classical (i.e., an abelian algebra), then the 
same holds for $\omega$. Hence, if we want to describe 
quantum correlated “entangled” states, we have to 
build the correlations on an entangled state $\eta$. 
Similarly, the “size” of the model system $\mathcal{B}_1 \otimes \mathcal{B}_2$ 
limits the strength of correlations in $\omega$. As for every 
correlated state, we can look at the linear functionals 
on $\mathcal{A}_2$, which are of the form $A \mapsto \omega(A_1 \otimes A)$ 
with fixed $A_1 \in \mathcal{A}_1$. The dimension of the space of 
such functionals might be called the correlation 
dimension of $\omega$. This dimension is 1 for product 
states, and can clearly not increase by passing from 
$\eta$ to $\omega$. Hence, it is bounded by the dimensions of $\mathcal{B}_1$ 
and $\mathcal{B}_2$, even if $\mathcal{A}_1$ and $\mathcal{A}_2$ are infinite dimensional. 
“Finite correlation” in the sense of the title of this 
article refers to the finiteness of the correlation 
dimension between the two halves of a spin chain. 
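
As a small illustration of the correlation dimension, the following sketch computes it for a few two-qubit states. It rests on assumptions that are not part of the text: the state is given as a density matrix with rational entries, the matrix units $E_{ab} = |a\rangle\langle b|$ are used as a basis of each one-site algebra, and the helper names `matrix_rank` and `correlation_dimension` are purely illustrative. The dimension of the span of the functionals $A \mapsto \omega(A_1 \otimes A)$ is then the rank of the table of numbers $\omega(E_{ab} \otimes E_{cd})$.

```python
from fractions import Fraction
from itertools import product


def matrix_rank(rows):
    """Exact rank of a rational matrix via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def correlation_dimension(rho, d1, d2):
    """Rank of the table omega(E_ab (x) E_cd) for a density matrix rho.

    rho is indexed by composite indices i * d2 + j, and
    omega(E_ab (x) E_cd) = tr(rho (E_ab (x) E_cd)) = rho[(b, d)][(a, c)].
    """
    table = [
        [rho[b * d2 + d][a * d2 + c] for c, d in product(range(d2), repeat=2)]
        for a, b in product(range(d1), repeat=2)
    ]
    return matrix_rank(table)


half, quarter = Fraction(1, 2), Fraction(1, 4)
# |00><00|, a pure product state
product_state = [[1 if (i, j) == (0, 0) else 0 for j in range(4)] for i in range(4)]
# projector onto (|00> + |11>)/sqrt(2), a maximally entangled state
bell_state = [[half if i in (0, 3) and j in (0, 3) else 0 for j in range(4)]
              for i in range(4)]
# the trace state, a mixed product state
mixed_state = [[quarter if i == j else 0 for j in range(4)] for i in range(4)]

for name, rho in [("product", product_state), ("bell", bell_state), ("mixed", mixed_state)]:
    print(name, correlation_dimension(rho, 2, 2))
```

Both product states come out with correlation dimension 1, while the maximally entangled state reaches the largest value possible for two qubits.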


The VBS Construction, and Matrix Product States 


The so-called valence bond solid (VBS) states on a 
chain are constructed by applying these ideas to the 
correlations across every link of a spin chain. Let us 
introduce a correlated model state $\eta_x$ on some 
algebra $\mathcal{B}^-_x \otimes \mathcal{B}^+_{x+1}$ for every bond $(x, x+1)$. Then 
the state at site $x$ is a function of contributions from 
both bonds connecting it, and we express this by a 
completely positive map $T_x : \mathcal{A}_x \to \mathcal{B}^+_x \otimes \mathcal{B}^-_x$. Then 
an observable $A_1 \otimes \cdots \otimes A_L$ on a chain piece of 
length $L$ is first mapped by $\bigotimes_{x=1}^{L} T_x$ to an element 
of $\mathcal{B}^+_1 \otimes \mathcal{B}^-_1 \otimes \cdots \otimes \mathcal{B}^+_L \otimes \mathcal{B}^-_L$. Evaluating with the 
states $\eta_1 \otimes \cdots \otimes \eta_{L-1}$, we are left with an element of 
$\mathcal{B}^+_1 \otimes \mathcal{B}^-_L$, which we can evaluate with yet another 
state $\eta_{0L}$ describing the boundary conditions for the 
construction (see Figure 1). 

Clearly, if we take the algebras $\mathcal{B}^\pm_x$ large enough, 
and the model states $\eta_x$ sufficiently highly entangled, 
we can generate every state on the finite chain. 
However, we can get an interesting class of states, 
even for fixed finite dimensions of the $\mathcal{B}^\pm_x$. By 
restricting this correlation dimension, we can set a 
level of complexity for the state description. We can 
then try to handle a given physical problem first 
with simple states of low correlation dimension, and 
increase this parameter only as needed. A typical 
problem here is to determine the ground state of a 
finite-range Hamiltonian. We can then optimize 
each $T_x$ and $\eta_x$ separately, minimizing the ground 
state energy with all other elements fixed. This is a 
semidefinite programming problem, for which very 
efficient methods are known. The global minimization 
is then done by letting the optimization site $x$ 
sweep over the whole chain as often as needed. 

In a ground-state problem one is looking for a 
pure state, and it is therefore sufficient to choose 
both the model states $\eta_x$ and the operations $T_x$ to 
be pure, that is, without decomposition into sums of 
similar objects. The scheme is thus run at the vector 
level rather than the operator level: we take the 
algebras $\mathcal{B}^+_{x+1} = \mathcal{B}^-_x$ as the operators on a Hilbert space 
$\mathcal{K}_x$, and $\eta_x = (\dim \mathcal{K}_x)^{-1} |\Omega_x\rangle\langle\Omega_x|$ with the 
(unnormalized) maximally entangled vector 

$$\Omega_x = \sum_i |i\rangle \otimes |i\rangle \in \mathcal{K}_x \otimes \mathcal{K}_x$$  [1] 


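The maps $T_x$ will be implemented by a single 
operator $V_x : \mathcal{K}_{x-1} \otimes \mathcal{K}_x \to \mathcal{H}$ as $T_x(A) = V_x^* A V_x$. 
Then the vectors $\Psi \in \mathcal{H}^{\otimes L}$ contributing to the state 
on the chain of length $L$ are of the form 

$$\Psi = (V_1 \otimes \cdots \otimes V_L)\,(|j_0\rangle \otimes \Omega_1 \otimes \cdots \otimes \Omega_{L-1} \otimes |j_L\rangle)
      = \sum_{j_1, \ldots, j_{L-1}} V_1|j_0 j_1\rangle \otimes V_2|j_1 j_2\rangle \otimes \cdots \otimes V_L|j_{L-1} j_L\rangle$$

where $j_0, j_L$ are labels for bases in $\mathcal{K}_0$ and $\mathcal{K}_L$, 
describing the possible choices at the boundary, and 
we have used the special form of $\Omega_x$. We write out 
the operators $V_x$ in components, so that 

$$V_x |ij\rangle = \sum_\mu |\mu\rangle\, V^\mu_{x,ij}$$

with suitable $\dim \mathcal{K}_{x-1} \times \dim \mathcal{K}_x$ dimensional 
matrices $V^\mu_x$, in terms of which the above expression 
can be interpreted as a matrix product. The 
components of $\Psi$ in a product basis $\{|\mu\rangle\}$ become 

$$\langle \mu_1, \ldots, \mu_L | \Psi \rangle = \langle j_0 | V_1^{\mu_1} V_2^{\mu_2} \cdots V_L^{\mu_L} | j_L \rangle$$  [2] 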


Due to this form the states generated in this way 
have also been called “matrix product states” 
(Klümper 1991). If one wants to consider periodic 
boundary conditions, the indices $j_0$ and $j_L$ can also 
be contracted, and the expression becomes a trace. 
For some simulations it is also convenient to choose 
$\dim \mathcal{K}_0 = \dim \mathcal{K}_L = 1$, so there is only one matrix 
element to be considered. 
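
The matrix product form [2] can be sketched in a few lines. The example below is only an illustration under stated assumptions: the matrices are taken to be the same at every site, the function name `mps_amplitude` and the two sample matrices are hypothetical, and omitting both boundary labels is used to select the periodic (trace) version mentioned above.

```python
def mat_mul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mps_amplitude(mats, config, j0=None, jL=None):
    """Component <mu_1, ..., mu_L | Psi> of a matrix product vector.

    mats[mu] is the k x k matrix V^mu (site-independent here).
    With boundary labels j0, jL the result is <j0| V^mu_1 ... V^mu_L |jL>;
    with both omitted, the boundary indices are contracted to a trace.
    """
    k = len(mats[0])
    prod = [[1 if i == j else 0 for j in range(k)] for i in range(k)]
    for mu in config:
        prod = mat_mul(prod, mats[mu])
    if j0 is None and jL is None:
        return sum(prod[i][i] for i in range(k))
    return prod[j0][jL]


# Two diagonal projections: with a trace boundary only the constant
# configurations (0,...,0) and (1,...,1) get a nonzero amplitude.
V = [[[1, 0], [0, 0]],
     [[0, 0], [0, 1]]]

print(mps_amplitude(V, (0, 0, 0)))        # periodic boundary
print(mps_amplitude(V, (0, 1, 0)))        # periodic boundary
print(mps_amplitude(V, (0, 0, 0), 0, 0))  # fixed boundary labels
```

The cost of one amplitude is linear in $L$ and polynomial in the matrix size, which is the practical appeal of restricting the correlation dimension.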

The scheme for getting ground-state vectors 
described here is essentially the same as the density 
matrix renormalization group method (Verstraete 
et al. 2004). The version given here, however, 
appears to be more transparent, more flexible, and 
in some cases (e.g., periodic boundary conditions) 
vastly more efficient. It may nevertheless be too early 
for such judgment, since this is very much work in 
progress (Verstraete et al. 2005). 

In the sequel, we will focus not so much on the 
numerical aspects, but on the possibility this 
construction offers to explicitly construct nontrivial 
translationally invariant states on the infinite chain. 
Numerically, even in a translation invariant situation 
the matrices $V^\mu_x$ obtained by optimization may 
turn out to depend on $x$ (Wolf, Private Communication), 
that is, one has to admit the possibility of 
a spontaneous symmetry breaking. However, for the 
construction of states on the infinite chain we will 
simply fix all $V_x$ to be equal. In some sense this 
turns the matrix product into a matrix power, which 
could be analyzed by methods familiar from the 
transfer matrix formalism of statistical mechanics. 
In eqn [2] this does not work, because of the 
$\mu$-dependence of the matrices involved. Nevertheless, 
a slight reorganization of the construction will 
lead to a transfer-matrix-like formalism. 


The Evolution Operator Construction 


Fixing all $T_x$ to be the same in Figure 1 still does not 
fix the state uniquely, since both in the mixed state 
version and in the pure state version of the 
construction some boundary information enters, as 
well. This boundary information then has to be 
chosen in such a way that a consistent family of 
local density operators is generated. It turns out that 
by rearranging the construction a little bit one can 
trivially solve one boundary condition, and reduce 
the other to finding a fixed point of a linear 
operator. This rearrangement was first carried out 
in Fannes et al. (1992b), where the term “finitely 
correlated state” was also coined. 

The basic element of the VBS construction was 
the operator $T : \mathcal{A} \to \mathcal{B}^+ \otimes \mathcal{B}^-$ (here already taken 
independent of $x$). This is specified by 
$\dim \mathcal{A} \cdot \dim \mathcal{B}^+ \cdot \dim \mathcal{B}^-$ matrix elements. However, 
assuming we can identify the algebras $\mathcal{B}^\pm$, we can also 
consider these matrix elements as those of an 
“evolution operator” $E : \mathcal{A} \otimes \mathcal{B} \to \mathcal{B}$. This operator 
is once again taken to be completely positive and 
unit preserving. We introduce its $n$th iterate 
$E^{(n)} : \mathcal{A}^{\otimes n} \otimes \mathcal{B} \to \mathcal{B}$ by the recursion 

$$E^{(1)} = E, \qquad E^{(n+1)} = E(\mathrm{id}_{\mathcal{A}} \otimes E^{(n)})$$  [3] 

Clearly, these operators are again completely positive 
and unit preserving. Another way to express this 
iteration is to look at $E$ as a family of maps on $\mathcal{B}$, 
parametrized by $A \in \mathcal{A}$: We set $E_A(B) = E(A \otimes B)$, 
and find 

$$E^{(n)}(A_1 \otimes \cdots \otimes A_n \otimes B) = E_{A_1} \cdots E_{A_n}(B)$$  [4] 

An important special role is played by the operator 
$\hat{E} = E_{\mathbb{1}}$, which is again completely positive and unit 
preserving. 

Now given any state $\eta$ on $\mathcal{B}$, we get a state $\omega_n$ on 
$\mathcal{A}^{\otimes n}$, by setting 

$$\omega_n(A) = \eta\big(E^{(n)}(A \otimes \mathbb{1})\big), \qquad A \in \mathcal{A}^{\otimes n}$$  [5] 

Since $E(\mathbb{1}) = \mathbb{1}$, this family of states is consistent 
with respect to increasing $n$, by adding sites on the 
right, that is, $\omega_{n+1}(A \otimes \mathbb{1}) = \omega_n(A)$. In other words, 
the family $\omega_n$ defines a state on the infinite right 
half-chain. This state can be extended to the full 
chain, as a translationally invariant state, if and 
only if consistency also holds for adding sites 
on the left, that is, if $\omega_{n+1}(\mathbb{1} \otimes A) = \omega_n(A)$ for all 
$A \in \mathcal{A}^{\otimes n}$. For this we need a condition on the state 
$\eta$: it must be invariant under the map $\hat{E}$ (i.e., 
$\eta(\hat{E}(B)) = \eta(B)$ for all $B \in \mathcal{B}$). This is the only 
requirement, and we call $\omega$ the state on $\mathcal{A}_{\mathbb{Z}}$ generated 
by $E$ and $\eta$. Note that since $\hat{E}$ has the invariant 
vector $\mathbb{1}$, its transpose also has an invariant vector, 
which can also be chosen as a state. We will often 
look at the case of a unique invariant state, in which case we can 
call $\omega$ the state generated by $E$, without having to 
mention $\eta$. 

The valence bond picture was very much suggested 
by trying to describe correlations in a 
spatially distributed quantum system (the chain). 
The construction given here is perhaps more readily 
suggested by a process in time, rather than space. In 
fact, the paper by Fannes et al. (1992b) was partly 
motivated by an attempt to define a quantum analog 
of Markov processes (Accardi and Frigerio 1983). 
Indeed, we can think of the construction as a general 
form for a repeated measurement in quantum 
theory. The object on which the measurements are 
performed has observable algebra $\mathcal{B}$, whereas 
$\mathcal{A}$ describes the successive outputs. Choosing $\mathcal{A}$ to 
be classical (abelian) we would find in $\omega$ the joint 
probability distribution of the sequence of measured 
values, when the initial state of the object is $\eta$ (not 
necessarily invariant). Allowing nonabelian $\mathcal{A}$ would 
then correspond to a family of delayed choice 
experiments: while $E$ describes the interaction of 
the system with the measurement apparatus (including 
the overall state change $\hat{E}$), we are still free to 
make correlated and even entangled measurements 
on the successive output systems. This interpretation 
suggests many extensions, in particular, to continuous 
time (where the case of abelian outputs is 
discussed extensively in the classic book by Davies 
1976), or to cases allowing an external quantum 
input in each step, in which case we are looking at a 
quantum channel with memory $\mathcal{B}$ (Kretschmann and 
Werner 2005). 
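
The repeated-measurement reading can be made concrete in the fully classical case, where both the output algebra and the memory algebra are abelian. The sketch below is an illustration under assumptions of its own: the kernels `K[a]`, the two-state memory, and all function names are made up for the example. Each `K[a]` plays the role of the map $E_a$ for the indicator function of output $a$, their sum plays the role of $\hat{E}$, and the state is evaluated as in [5].

```python
from fractions import Fraction as F
from itertools import product

# K[a][i][j]: weight for emitting output a while the memory moves from i to j.
# The sum over a and j equals 1 for every i (the unit-preserving property).
K = {
    0: [[F(1, 2), F(1, 4)], [F(0), F(1, 4)]],
    1: [[F(0), F(1, 4)], [F(1, 2), F(1, 4)]],
}


def apply(kernel, f):
    """Apply a kernel to a function f on the memory states."""
    return [sum(row[j] * f[j] for j in range(len(f))) for row in kernel]


def omega(eta, outputs):
    """Probability of an output sequence: eta(E_{a_1} ... E_{a_n}(1))."""
    f = [F(1)] * len(eta)
    for a in reversed(outputs):  # the innermost map acts first
        f = apply(K[a], f)
    return sum(e * x for e, x in zip(eta, f))


def right_consistent(eta, n):
    """Summing over an extra output on the right reproduces omega_n."""
    return all(sum(omega(eta, seq + (a,)) for a in K) == omega(eta, seq)
               for seq in product(K, repeat=n))


def left_consistent(eta, n):
    """Summing over an extra output on the left reproduces omega_n."""
    return all(sum(omega(eta, (a,) + seq) for a in K) == omega(eta, seq)
               for seq in product(K, repeat=n))


invariant_eta = [F(1, 2), F(1, 2)]  # fixed by the summed kernel K[0] + K[1]
other_eta = [F(1), F(0)]            # not invariant

print(right_consistent(other_eta, 2), left_consistent(other_eta, 2))
print(right_consistent(invariant_eta, 2), left_consistent(invariant_eta, 2))
```

Consistency on the right holds for any initial state, while consistency on the left singles out the invariant one, in line with the condition stated for the full-chain extension.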

In spite of the different natural interpretations, 
however, the constructions in this and previous 
paragraphs give exactly the same class of transla- 
tionally invariant states on the chain, as was shown 
in (Fannes et al. 1992b). 


Ergodic Decomposition 


A state on Az is called ergodic if it is an extreme 
point of the compact convex subset of translation- 
ally invariant states. Often in statistical mechanics, 
one finds states which may be ergodic, but never- 
theless contain a breaking of translation symmetry. 
Such states can be decomposed into periodic states, 
that is, states which are invariant with respect 
to some power of the shift. In general, new 
decompositions may become possible for any 
period. If no decomposition into periodic states is 
possible, the state is called completely ergodic. 

In this section we consider the question of how to 
decompose a finitely correlated state into ergodic 
components, using a well-established connection 
between ergodicity and clustering properties 
(Bratteli and Robinson 1987, 1997), that is, the 
decay of correlation functions. 

Correlation functions are very easily evaluated for 
finitely correlated states: let $A_\pm$ be two observables 
localized on $n_\pm$ sites, and suppose that these sites are 
separated by $L$ sites. Then eqn [5] gives 

$$\omega(A_- \otimes \mathbb{1}^{\otimes L} \otimes A_+) = \eta\big(E^{(n_-)}_{A_-}\, \hat{E}^L\, E^{(n_+)}_{A_+}(\mathbb{1})\big)$$  [6] 

where $E^{(n)}_A(B) = E^{(n)}(A \otimes B)$. The $L$-dependence of this expression is clearly 
governed by the matrix powers of $\hat{E}$. By assumption 
this operator always has the eigenvalue 1, because 
$\hat{E}(\mathbb{1}) = \mathbb{1}$, and has norm $\le 1$, because it is also 
completely positive. The spectrum is hence 
contained in the unit disk. Each eigenvalue with 
modulus $< 1$ thus contributes exponentially decaying 
terms to the correlation function [6]. From 
eigenvalues of modulus 1, which make up the 
so-called peripheral spectrum, we may get constant 
or periodic contributions. This distinction is directly 
reflected in the ergodic properties (Fannes et al. 
1992b): 


• When the eigenvalue 1 is simple, there is a unique 
invariant state $\eta$, and $\lim_n n^{-1} \sum_{k=1}^{n} \hat{E}^k(B) = \eta(B)\mathbb{1}$. 
This implies, by [6] and (Bratteli and 
Robinson 1987, 1997, theorem 4.3.22), that $\omega$ is 
ergodic. 

• When the eigenvalue 1 is simple, the peripheral 
spectrum consists precisely of the $p$th roots of 
unity for some $p \ge 1$. The state $\omega$ is then the 
equal-weight convex combination of $p$ periodic 
states with period $p$, which are translates of each 
other. 

• In particular (i.e., for $p = 1$), a peripheral spectrum 
consisting only of the simple eigenvalue 1 
implies that $\omega$ is exponentially clustering in the 
sense that 

$$|\omega(A_- \otimes \mathbb{1}^{\otimes L} \otimes A_+) - \omega(A_-)\,\omega(A_+)| \le \mathrm{poly}(L)\, r^L\, \|A_-\|\, \|A_+\|$$  [7] 

where $r$ is the largest modulus of eigenvalues other 
than 1, and poly is a polynomial obtained from the 
Jordan normal form of $\hat{E}$. By the previous item, the 
state $\omega$ is then completely ergodic. 

• Conversely, if a state is finitely correlated, and is 
ergodic (resp. completely ergodic), it has a 
representation such that 1 is a simple eigenvalue 
(resp. the peripheral spectrum is trivial). 
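
The exponential clustering in [7] can be checked by hand in a classical toy case. The following sketch assumes an abelian setting that is not taken from the text: a two-state Markov chain with transition matrix `P`, one-site observables given as functions on the two states, and $E_A(B) = A \cdot (PB)$, so that $\hat{E} = P$. Here `P` has eigenvalues 1 and $r = 1/2$, and the names `expectation` and `connected` are illustrative only.

```python
from fractions import Fraction as F

P = [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]]  # plays the role of E-hat
PI = [F(1, 2), F(1, 2)]                       # invariant state of P


def apply(matrix, f):
    """Apply a matrix to a function f on the two states."""
    return [sum(row[j] * f[j] for j in range(len(f))) for row in matrix]


def evolve(a, f):
    """E_A(f) = A * (P f) for a one-site observable A."""
    return [x * y for x, y in zip(a, apply(P, f))]


def expectation(a_minus, a_plus, L):
    """omega(A_- (x) 1^L (x) A_+) evaluated as in eqn [6]."""
    f = evolve(a_plus, [F(1), F(1)])
    for _ in range(L):
        f = apply(P, f)
    f = evolve(a_minus, f)
    return sum(p * x for p, x in zip(PI, f))


def single(a):
    """One-site expectation omega(A)."""
    return sum(p * x for p, x in zip(PI, evolve(a, [F(1), F(1)])))


def connected(a_minus, a_plus, L):
    """Left-hand side of [7] without the absolute value."""
    return expectation(a_minus, a_plus, L) - single(a_minus) * single(a_plus)


spin = [F(1), F(-1)]
for L in range(4):
    print(L, connected(spin, spin, L))
```

For this choice the connected correlation equals $r^{L+1}$ exactly, so the decay rate is set by the second-largest eigenvalue of the transfer operator, as the third item above describes.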


Purity 
Pure States 


As in the case of the VBS construction, there is a 
version of the evolution operator construction, 
which is especially suited to produce pure states. 
Pure states are those which cannot be decomposed 
into a weighted sum of other states. For a 
translationally invariant state, this is a much 
stronger property than ergodicity and even com- 
plete ergodicity: not only the decomposition into 
periodic states is impossible, but any decomposition 
whatsoever. Nevertheless, this is what one expects 
from a ground state of a translationally invariant 
interaction. 

From the formula [5] it is clear that if we 
decompose the $E$-operator entering for a site $x$ into 
a sum of two completely positive terms, we will have 
decomposed $\omega$ into two positive terms. These might 
still be equal, but it is certainly suggestive to look at 
states generated with an $E$ which cannot be 
decomposed nontrivially into a sum of other 
completely positive maps. Such maps are called 
pure, and are characterized by the form 

$$E(A \otimes B) = V^*(A \otimes B)V$$  [8] 

where $V : \mathbb{C}^k \to \mathbb{C}^d \otimes \mathbb{C}^k$ is isometric 
and $\mathcal{A}$ and $\mathcal{B}$ are the algebras of $d \times d$ and $k \times k$ 
matrices, respectively. Finitely correlated states 
generated from such a pure evolution operator are 
called purely generated. These are the candidates for 
pure finitely correlated states. 

The form of a pure map is reminiscent of the 
Stinespring dilation of a general completely positive 
map: for a general $E$, we can set 

$$E(A \otimes B) = V^*(\mathbb{1}_{\tilde{\mathcal{A}}} \otimes A \otimes B)V$$  [9] 

where $\tilde{\mathcal{A}}$ is some auxiliary matrix algebra. Since the 
invariance condition for $\eta$ does not involve the $\mathcal{A}$ 
algebras, we get a purely generated state $\tilde{\omega}$ with 
one-site algebra $(\tilde{\mathcal{A}} \otimes \mathcal{A})$, whose restriction to the 
original chain is $\omega$. Hence, purely generated states 
are the prototypes from which all other finitely 
correlated states are obtained by sitewise restriction. 

But are such states pure? Since $\hat{E}$ need not have a 
trivial peripheral spectrum, the previous section tells us 
that a purely generated state may have a nontrivial 
decomposition into other, perhaps periodic states. But 
this is the only restriction we have to make. Indeed, the 
following statements about a finitely correlated state $\omega$ 
are equivalent (Fannes et al. 1994, theorem 1.5): 

• $\omega$ is pure; 

• $\omega$ is purely generated, and the operator $\hat{E}$ has 
trivial peripheral spectrum; 

• the mean entropy of $\omega$ vanishes, and $\omega$ is 
clustering, that is, [7] holds; and 

• $E$ has the form [8], and no proper subalgebra of $\mathcal{B}$, which 
contains $\mathbb{1}$, is invariant under all operators $E_A$. 


The Asymptotic Form of the Local Support 


Let us now fix an isometry $V$, such that $\hat{E}$ has trivial 
peripheral spectrum, and let $\rho$ denote the unique 
invariant state of $\hat{E}$. Then the vectors $\Psi \in \mathcal{H}^{\otimes n}$ in 
the support of $\omega$ are of the form [2] and depend, 
apart from the fixed choice of the $V^\mu_x = V^\mu$, on the 
boundary indices $j_0, j_n = 1, \ldots, k$. We can consider 
this as a map $\Gamma_n$ from $k \times k$ matrices to $\mathcal{H}^{\otimes n}$: 

$$\langle \mu_1, \ldots, \mu_n | \Gamma_n(B) \rangle = \mathrm{tr}(B\, V^{\mu_1} V^{\mu_2} \cdots V^{\mu_n})$$  [10] 

and denote the range of $\Gamma_n$ by $\mathcal{G}_n$. Then $\mathcal{G}_n$ is at most 
$k^2$-dimensional. Moreover, this family of subspaces 
is nested, that is, $\mathcal{G}_{n+m} \subset \mathcal{G}_n \otimes \mathcal{H}^{\otimes m}$ and 
$\mathcal{G}_{n+m} \subset \mathcal{H}^{\otimes n} \otimes \mathcal{G}_m$. Using that $\hat{E}^n(B) \to \rho(B)\mathbb{1}$ converges 
exponentially fast, we also find that $\Gamma_n$ is asymptotically 
an isometry between $\mathcal{G}_n$ and the Hilbert 
space of $k \times k$ matrices with scalar product 
$\langle B, C \rangle_\rho = \mathrm{tr}(\rho B^* C)$. Hence, all the spaces $\mathcal{G}_n$ are 
asymptotically identified, even though they are 
contained in each other. This “self-similarity” is 
the source of many further properties. For example, 
for any density matrix $\rho$ on $\ell + m + r$ sites supported 
by $\mathcal{G}_{\ell+m+r}$, and any observable $A$, localized on $m$ 
sites in the middle of this interval (with $\ell$ to the 
left and $r$ to the right), we get the expectation 
$\mathrm{tr}(\rho A) \approx \omega(A)$, up to exponentially small terms 
depending only on $\ell$ and $r$. 


Ground States and Gaps 


Suppose we fix some interval length $\ell$, and let $h$ be 
the projection onto the complement of $\mathcal{G}_\ell$ in $\mathcal{H}^{\otimes \ell}$. 
We now consider $h$ as the interaction term of a 
lattice interaction, that is, we consider the formal 
Hamiltonian 

$$H = \sum_{x \in \mathbb{Z}} \tau_x(h)$$  [11] 

Then in the finitely correlated state $\omega$, each term in 
this sum has expectation zero, which is the absolute 
minimum for such expectations, because $h \ge 0$. In 
this sense $\omega$ is the ground state for this Hamiltonian. 
Usually, ground states are not characterized in this 
way: one can only require that the average energy is 
minimized with respect to all translationally invariant 
states (Bratteli and Robinson 1987, 1997, 
theorem 6.2.58). Hence, one can usually perturb a 
ground state locally such that some terms in [11] 
have less than average expectation, at the expense of 
others. For $\omega$ this is clearly impossible. Moreover, 
any state $\omega'$ with $\omega'(\tau_x(h)) = 0$ for all $x$ must coincide 
with $\omega$, even if we do not impose translation 
invariance. This follows from the previous section: 
the local density operators of $\omega'$ must all be 
supported in $\mathcal{G}_n$ by the nesting property; hence, if 
we compare density operators on intervals of length 
$\ell + m + r$ on observables localized on the middle $m$ 
sites, we get $\omega'(A) \approx \omega(A)$, up to errors exponentially 
small in $\ell$ and $r$. 

The Hamiltonian [11] involves an infinite sum, 
which can be mathematically understood as a 
quadratic form in the GNS-representation associated 
with $\omega$ (Bratteli and Robinson 1987, 1997). This is 
the Hilbert space spanned by vectors written as $A\Omega$, 
with the scalar products $\langle A\Omega, B\Omega \rangle = \omega(A^* B)$, for 
local operators $A, B$. The ground-state property then 
implies $H\Omega = 0$, and $H \ge 0$, because $h \ge 0$. It can be 
seen that $H$ generates a well-defined dynamics, and 
is essentially self-adjoint on the domain of such 
vectors. Thus also the spectrum of $H$ is a well-defined 
concept. This suggests a strengthening of the 
ground-state property: not only is $\Omega$ the unique 
eigenvector of $H$ for eigenvalue 0, but there is a gap 
$\gamma > 0$ between zero and the next eigenvalue. This 
property is of considerable interest for models in 
solid-state physics and statistical mechanics. It was 
shown for all ergodic pure finitely correlated states 
in (Fannes et al. 1992b). 


Density 
Density of Finitely Correlated Pure States 


The natural topology in which to consider the 
approximation between states on the chain is the 
weak topology. A sequence $\omega_n$ converges weakly to 
$\omega$ if for all local $A$ the expectations converge, that is, 
$\omega_n(A) \to \omega(A)$. 

Let us start from an arbitrary translationally 
invariant state $\omega$, and see how we can approximate 
it. First, we can split the chain into intervals of 
length $L$, and replace $\omega$ by the tensor product of the 
restrictions of $\omega$ to each of these intervals. This state 
is not translationally invariant, so we average it over 
the $L$ translations, and call the resulting state $\omega_L$. 
Consider a local observable $A$, whose localization 
region has length $R$. Then for $L - R$ out of the $L$ 
translates contributing to $\omega_L$ the expectation will be 
the same as for $\omega$, and we get 

$$\omega_L(A) = \Big(1 - \frac{R}{L}\Big)\, \omega(A) + \frac{R}{L}\, \tilde{\omega}(A),$$

where the error term $\tilde{\omega}$ is again a state. Hence, $\omega_L$ 
converges weakly to $\omega$ as $L \to \infty$. One can show 
easily that $\omega_L$ is finitely correlated, with an algebra 
$\mathcal{B}$ essentially equal to $\mathcal{A}^{\otimes L}$. Hence, the finitely 
correlated states are weakly dense in the set of 
translationally invariant states. 

We can make the approximating states purer by a 
very simple trick. In the previous construction we 
always take two intervals together, and replace the 
tensor product of the two restrictions by a purification, 
that is, by a pure state on an interval of length 
$2L$, whose restrictions to the two length-$L$ subintervals 
coincide with those of $\omega$. We average this over $2L$ 
translates, and call the result $\eta_L$. The estimates 
showing that $\eta_L \to \omega$ weakly are exactly the same as 
before. Moreover, one can show (Fannes et al. 
1992a) that $\eta_L$ is purely generated. 

Being defined as a convex combination of other 
states, $\eta_L$ is not pure, and the peripheral spectrum of 
$\hat{E}$ will contain all the $2L$th roots of unity. However, 
we can use that such a rich peripheral spectrum is 
not generic for $\hat{E}$ constructed from an isometry $V$. 
Therefore, if we choose an isometry $V_\varepsilon$ close to the 
isometry $V$ generating $\eta_L$, we obtain a purely 
generated state $\eta_L^\varepsilon$ with trivial peripheral spectrum. 
Since the expression for expectations of such states 
depends continuously on the generating isometry, 
we have that $\eta_L^\varepsilon \to \eta_L$ as $\varepsilon \to 0$. But we know from 
the previous section that such states are pure. 
Hence, the pure finitely correlated states are weakly 
dense in the set of all translationally invariant states 
(Fannes et al. 1992a). 

This has implications for the geometry of the 
compact convex set of translationally invariant 
states, which are rather counter-intuitive for 
intuitions trained on finite-dimensional convex 
bodies. To begin with, the extreme points (the 
ergodic states) are dense in the whole body. This is 
not such a rare occurrence in infinite-dimensional 
convex sets, and is shared, for example, by the set of 
operators $F$ with $0 \le F \le \mathbb{1}$ on an infinite-dimensional 
Hilbert space (Davies 1976). Together with 
the property that the translationally invariant states 
form a simplex, it actually fixes the structure of this 
compact convex set to be the so-called Poulsen 
simplex. This was known also without looking at 
finitely correlated states. The rather surprising result 
of the above density argument is that even the small 
subclass of states which are extremal, not only in the 
translationally invariant subset but even in the 
whole state space, is still dense. 


Finitely Correlated Pure States with 
Bounded Memory Dimension 


It is clear in the above construction that the dimension 
of the algebra $\mathcal{B}$ goes to infinity for an approximating 
sequence. How many states can we get with a fixed 
memory algebra $\mathcal{B}$? The dimension of this manifold 
can be estimated easily from the number of parameters 
needed to describe the map $E$, and this dimension is 
certainly small compared to the dimension of the state 
space of the length $L$ piece of the chain as $L \to \infty$. 
However, since this is an infinite set, and not a linear 
subspace, we do not get an immediate bound on the 
dimension of the linear span of these states. What we 
want to show in this section is that the space of finitely 
correlated states with fixed $\mathcal{B}$ nevertheless generates a 
low-dimensional subspace of states on any large 
interval of the chain. To this end we will have to 
exhibit many observables $A$, localized on $L$ sites, 
whose expectation is the same for all finitely correlated 
states with given $\mathcal{B}$. 

Let us look first at the case of purely generated 
states, or rather at the vectors $\Psi \in \mathcal{H}^{\otimes L}$, which can 
be written in the form [2], which in the translation 
invariant case becomes 

$$\langle \mu_1, \ldots, \mu_L | \Psi \rangle = \langle j_0 | V^{\mu_1} V^{\mu_2} \cdots V^{\mu_L} | j_L \rangle$$  [12] 

for some collection $V^1, \ldots, V^d$ of $k \times k$ matrices, and 
some basis labels $j_0, j_L \in \{1, \ldots, k\}$. The span of all 
such vectors will be denoted by $\mathcal{V}_L(k, d)$, and we would 
like to analyze the growth of $\dim \mathcal{V}_L(k, d)$, as $L \to \infty$. 
Now a vector with components $a(\mu_1, \ldots, \mu_L)$ lies in the 
orthogonal complement of $\mathcal{V}_L(k, d)$ if and only if 

$$\sum_{\mu_1, \ldots, \mu_L} a(\mu_1, \ldots, \mu_L)\, V^{\mu_1} V^{\mu_2} \cdots V^{\mu_L} = 0$$

for any collection of matrices $V^\mu$. In other words, this 
expression, considered as a noncommutative polynomial 
in $d$ variables, is a polynomial identity for $k \times k$ 
matrices. The simplest such identity, for $k = 2$, 
$d = 3$, $L = 5$, is $[A, [B, C]^2] = 0$. (For the proof observe 
that $[B, C]$ is traceless, so its square is a multiple of the 
identity by the Cayley-Hamilton theorem.) This 
identity alone implies the existence of many more 
identities. For example, we can substitute higher-order 
polynomials for $A, B, C$, and multiply the identity with 
an arbitrary polynomial from the right or from the left. 
There is a well-developed theory for such identities, 
called the theory of polynomial identity (PI) rings. In 
that context, the precise growth we are looking for has 
been worked out (Drensky 1998): 


$$\lim_{L \to \infty} \frac{\log \dim \mathcal{V}_L(k, d)}{\log L} = (d - 1)k^2 + 1$$  [13] 

Thus, $\dim \mathcal{V}_L(k, d)$ only grows like a polynomial 
in $L$, of known degree, and the joint support of all 
purely generated finitely correlated states is exponentially 
small compared to $\mathcal{H}^{\otimes L}$. 
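
The polynomial identity $[A, [B, C]^2] = 0$ for $2 \times 2$ matrices is easy to test numerically. The sketch below is only an illustration: it uses exact integer arithmetic on randomly drawn matrices (the seed, the entry range, and all helper names are arbitrary choices), and it adds a hand-picked $3 \times 3$ triple showing that the identity is special to $k = 2$.

```python
import random


def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(a, b):
    """[a, b] = ab - ba."""
    ab, ba = mat_mul(a, b), mat_mul(b, a)
    return [[x - y for x, y in zip(r, s)] for r, s in zip(ab, ba)]


def identity_lhs(a, b, c):
    """The expression [A, [B, C]^2]."""
    bc = commutator(b, c)
    return commutator(a, mat_mul(bc, bc))


def is_zero(m):
    return all(x == 0 for row in m for x in row)


def random_matrix(n, rng):
    return [[rng.randint(-9, 9) for _ in range(n)] for _ in range(n)]


rng = random.Random(0)
holds_for_2x2 = all(
    is_zero(identity_lhs(*(random_matrix(2, rng) for _ in range(3))))
    for _ in range(200)
)


def unit(i, j, n=3):
    """Matrix unit with a single 1 in row i, column j."""
    return [[1 if (r, s) == (i, j) else 0 for s in range(n)] for r in range(n)]


# For 3 x 3 matrices: [B, C]^2 = diag(1, 1, 0) is not a multiple of the identity.
fails_for_3x3 = not is_zero(identity_lhs(unit(0, 2), unit(0, 1), unit(1, 0)))

print(holds_for_2x2, fails_for_3x3)
```

Each such identity of degree $L$ yields vectors in the orthogonal complement of $\mathcal{V}_L(k, d)$, which is why the span grows only polynomially.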

We can apply the same idea to the set of all finitely 
correlated states with $\mathcal{B}$ equal to the $k \times k$ matrices. 
The joint support in this case is the full space, since the 
trace state on the chain, which is a product state 
generated with $k = 1$, already has full support. However, 
it is still true that all but a polynomial number of 
expectation values of $\omega$ are already fixed by specifying 
$k$. Indeed, formula [5] for a general state is precisely of 
the form [12], with the difference that the arguments $A_x$ 
replace $\mu_x$, and the matrices $E_A$ are now operators on 
the $k^2$-dimensional space $\mathcal{B}$. If we only want an upper 
bound, we can ignore subtleties coming from Hermiticity 
and normalization constraints on $E$, and we get 
that the dimension of all finitely correlated states 
generated from the $k \times k$ matrices, restricted to a 
subchain of length $L$, grows at most like $L^\alpha$, with 
$\alpha \le (d^2 - 1)k^4 + 1$. 


See also: Ergodic Theory; Quantum Spin Systems; 
Quantum Statistical Mechanics: Overview. 


Introduction 


Knots belong to sailors and climbers and upon further 
reflection, perhaps also to geometers, topologists, or 
combinatorialists. Surprisingly, throughout the 1980s, 
it became apparent that knots are also closely related 
to several other branches of mathematics in general 
and mathematical physics in particular. Many of these 
connections (though not all!) factor through the 
notion of “finite-type invariants” (aka “Vassiliev” or 
“Goussarov-Vassiliev” invariants) (Goussarov 1991, 
1993, Vassiliev 1990, 1992, Birman-Lin 1993, 
Kontsevich 1993, Bar-Natan 1995). 

Let $V$ be an arbitrary invariant of oriented knots in 
oriented space with values in some abelian group $A$. 
Extend $V$ to be an invariant of 1-singular knots, knots 
that may have a single singularity that locally looks like 
a double point, using the formula 

$$V(\text{double point}) = V(\text{positive crossing}) - V(\text{negative crossing})$$  [1] 

Further extend $V$ to the set $\mathcal{K}^m$ of $m$-singular 
knots (knots with $m$ double points) by repeatedly 
using [1]. 


Definition 1 We say that $V$ is of type $m$ if its 
extension $V|_{\mathcal{K}^{m+1}}$ to $(m + 1)$-singular knots vanishes 
identically. We also say that $V$ is of finite type if it is 
of type $m$ for some $m$. 


Repeated differences are similar to repeated derivatives; 
hence, it is fair to think of the definition of $V|_{\mathcal{K}^m}$ 
as repeated differentiation. With this in mind, the 
above definition imitates the definition of polynomials 
of degree $m$. Hence, finite-type invariants can be 
considered as “polynomials” on the space of knots. 
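
The analogy with polynomials can be illustrated on the integers, where repeated differences play the role of repeated derivatives. The sketch below is a loose analogy only, not a computation on knots: the function names and the sample cubic are chosen for illustration, and vanishing is checked on a finite sample of points rather than proved.

```python
def difference(f):
    """Forward difference, the analogue of resolving one double point."""
    return lambda x: f(x + 1) - f(x)


def vanishes_after(f, m, samples=range(-5, 6)):
    """True if the (m + 1)-fold difference of f is zero on all sample points."""
    g = f
    for _ in range(m + 1):
        g = difference(g)
    return all(g(x) == 0 for x in samples)


def cubic(x):
    return 2 * x**3 - x + 7


print(vanishes_after(cubic, 3))  # behaves like an invariant "of type 3"
print(vanishes_after(cubic, 2))  # but not "of type 2"
```

A polynomial of degree $m$ is killed by $m + 1$ differences and by no fewer, which is the pattern Definition 1 imposes on a knot invariant.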

As described in the section “Basic facts”, finite-type 
invariants are plenty and powerful and they carry a 
rich algebraic structure and are deeply related to Lie 
algebras. There are several constructions for a 
“universal finite-type invariant” and those are related 
to conformal field theory, the Chern—Simons—Witten 
topological quantum field theory, and Drinfel’d’s 
theory of associators and quasi-Hopf algebras (see 
the section “The proofs of the fundamental theo- 
rem”). Finite-type invariants have been studied 
extensively (see the section “Some further directions”) 
and generalized in several directions (see the section 
“Beyond knots”). But the first question on finite-type 
invariants remains unanswered: 


Problem 2 Honest polynomials are dense in the 
space of functions. Are finite-type invariants dense 
within the space of all knot invariants? Do they 
separate knots? 


In a similar way, one may define finite-type 
invariants of framed knots (and ask the same 
questions). 


Basic Facts 
Classical Knot Polynomials 


The first (nontrivial!) thing to notice is that there are 
plenty of finite-type invariants and they are at least 
as powerful as all the standard knot polynomials 
combined (finite-type invariants are like polynomials 
on the space of knots; the standard phrase “knot 
polynomials” refers to a different thing - knot 
invariants with polynomial values): 


Theorem 3 (Bar-Natan 1995, Birman-Lin 1993). 
Let $J(K)(q)$ be the Jones polynomial of a knot $K$ (it is 
a Laurent polynomial in a variable $q$). Consider the 
power series expansion $J(K)(e^x) = \sum_{m=0}^{\infty} V_m(K)\, x^m$. 
Then each coefficient $V_m(K)$ is a finite-type knot 
invariant (thus, the Jones polynomial can be 
reconstructed from finite-type information). 


A similar theorem holds for the Alexander- 
Conway, HOMEFLY-PT, and Kauffman polynomials 
(Bar-Natan 1995), and indeed, for arbitrary Reshe- 
tikhin—Turaev invariants (Reshetikhin and Turaev 
1990, Lin 1991), although it is still unknown if the 
signature of a knot can be expressed in terms of its 
finite-type invariants. 


Chord Diagrams and the Fundamental Theorem 


The top derivatives of a multivariable polynomial 
form a system of constants which determine 
the polynomial up to polynomials of lower 
degree. Likewise the $m$th derivative $V^{(m)}$, the restriction 
of $V$ to knots with $m$ double points, of a type $m$ invariant $V$ is a constant (for 
a crossing change alters it by the value of $V$ on an $(m+1)$-singular knot, which vanishes, 
so $V^{(m)}$ is blind to 3D topology), and likewise $V^{(m)}$ 
determines $V$ up to invariants of lower type. Hence, a 
primary tool in the study of finite-type invariants is the 
study of the “top derivative” $V^{(m)}$, also known as “the 
weight system of $V$.” 

Blind to 3D topology, $V^{(m)}$ only sees the combinatorics 
of the circle that parametrizes an $m$-singular 
knot. On this circle, there are m pairs of points that 
are pairwise identified in the image; one indicates 
those by drawing a circle with m chords marked (an 
“m-chord diagram”) (see Figure 1). 


Definition 4 Let $\mathcal{D}_m$ denote the space of all formal 
linear combinations with rational coefficients of 
$m$-chord diagrams. Let $\mathcal{A}^r_m$ be the quotient of $\mathcal{D}_m$ 
by all 4T and FI relations as drawn in Figure 2 (full 
details are given in, e.g., Bar-Natan (1995)), and let 
$\hat{\mathcal{A}}^r$ be the graded completion of $\mathcal{A}^r := \bigoplus_m \mathcal{A}^r_m$. Let 
$\mathcal{A}_m$, $\mathcal{A}$, and $\hat{\mathcal{A}}$ be the same as $\mathcal{A}^r_m$, $\mathcal{A}^r$, and $\hat{\mathcal{A}}^r$ 
but without imposing the FI relations. 


Figure 2 The 4T and FI relations.


Theorem 5 (The fundamental theorem)

• (Easy part). If V is a rational valued type
m invariant then $V^{(m)}$ defines a linear functional on
$\mathcal A^r_m$. If in addition $V^{(m)} = 0$, then V is of type m − 1.

• (Hard part). For any linear functional W on $\mathcal A^r_m$,
there is a rational valued type m invariant V so
that $V^{(m)} = W$.


Thus, to a large extent, the study of finite-type
invariants is reduced to the finite (though
super-exponential in m) algebraic study of $\mathcal A^r_m$. A similar
theorem reduces the study of finite-type invariants
of framed knots to the study of $\mathcal A_m$.


The Structure of A 


Knots can be multiplied (the “connected sum” operation)
and knot invariants can be multiplied. This
structure interacts well with finite-type invariants and
induces the following structure on $\mathcal A^r$ and $\mathcal A$:


Theorem 6 (Kontsevich 1993, Bar-Natan 1995,
Willerton 1996, Chmutov et al. 1994). $\mathcal A^r$ and $\mathcal A$ are
commutative and cocommutative graded bialgebras
(i.e., each carries a commutative product and a
compatible cocommutative coproduct). Thus, both
$\mathcal A^r$ and $\mathcal A$ are graded polynomial algebras over their
spaces of primitives, $\mathcal P^r = \bigoplus_m \mathcal P^r_m$ and $\mathcal P = \bigoplus_m \mathcal P_m$.


Framed knots differ from knots only by a single
integer parameter (the “self-linking,” itself a type 1
invariant). Thus, $\mathcal P^r$ and $\mathcal P$ are also closely related.


Theorem 7 (Bar-Natan 1995). $\mathcal P = \mathcal P^r \oplus \langle\theta\rangle$, where
$\theta$ is the unique 1-chord diagram (a circle with a single chord).





Bounds and Computational Results 


Table 1 shows the number of type m invariants of
knots and framed knots modulo type m − 1 invariants
(dim $\mathcal A^r_m$ and dim $\mathcal A_m$) and the number of
multiplicative generators of the algebra $\mathcal A$ in degree
m (dim $\mathcal P_m$) for m ≤ 12. Some further tabulated
results are in Bar-Natan (1996).


Table 1 Some dimensions of spaces of finite type invariants


m            0    1    2    3    4    5
dim A^r_m    1    0    1    1    3    4
dim A_m      1    1    2    3    6   10
dim P_m      0    1    1    1    2    3


Source: Bar-Natan (1995); Kneissler (1997). 
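
The rows of Table 1 are tied together by Theorems 6 and 7: a graded polynomial algebra on dim $\mathcal P_m$ generators in each degree m has Hilbert series $\prod_m (1 - t^m)^{-\dim \mathcal P_m}$, and the unframed case drops the single degree 1 generator. The following is a small consistency check written under those assumptions; the function name `polynomial_algebra_dims` is hypothetical and the input numbers are the dim $\mathcal P_m$ values quoted in Table 1 for m = 1, ..., 12.

```python
def polynomial_algebra_dims(generators, max_degree):
    """Graded dimensions of a free commutative algebra.

    `generators[m]` is the number of generators in degree m >= 1.  The
    Hilbert series is the product over m of (1 - t**m) ** (-generators[m]);
    each factor 1 / (1 - t**m) is applied by the usual partition-counting
    update dims[k] += dims[k - m].
    """
    dims = [1] + [0] * max_degree
    for m, count in sorted(generators.items()):
        for _ in range(count):
            for k in range(m, max_degree + 1):
                dims[k] += dims[k - m]
    return dims

# dim P_m for m = 1..12, copied from Table 1.
PRIMITIVES = dict(zip(range(1, 13), [1, 1, 1, 2, 3, 5, 8, 12, 18, 27, 39, 55]))
# Theorem 7: the unframed primitives omit the single degree-1 generator.
PRIMITIVES_UNFRAMED = {m: d for m, d in PRIMITIVES.items() if m > 1}

dims_framed = polynomial_algebra_dims(PRIMITIVES, 12)
dims_unframed = polynomial_algebra_dims(PRIMITIVES_UNFRAMED, 12)
print(dims_framed)    # compare with the dim A_m row
print(dims_unframed)  # compare with the dim A^r_m row
```

The check only recomputes two rows from the third; it says nothing about how the primitive dimensions themselves are obtained.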


Little is known about these dimensions for large m. 
There is an explicit conjecture in Broadhurst (1997), 
but no progress has been made in the direction of 
proving or disproving it. The best asymptotic bounds 
available are the following. 


Theorem 8 For large m, $\dim \mathcal P_m > e^{c\sqrt{m}}$ (for any
fixed $c < \pi\sqrt{2/3}$) and $\dim \mathcal A_m < 6^m\, m!\, \sqrt{m}/\pi^{2m}$
(Stoimenow 1998, Zagier 2001).


Jacobi Diagrams and the Relation 
with Lie Algebras 


Much of the richness of finite-type invariants stems 
from their relationship with Lie algebras. Theorem 9 
below suggests this relationship on an abstract level, 
Theorem 10 makes that relationship concrete, and 
Theorem 12 makes it a bit deeper. 


Theorem 9 (Bar-Natan 1995). The algebra $\mathcal A$ is
isomorphic to the algebra $\mathcal A^t$ generated by “Jacobi
diagrams in a circle” (chord diagrams that are also
allowed to have oriented internal trivalent vertices)
modulo the AS, STU, and IHX relations (see Figure 3).


Thinking of trivalent vertices as graphical analogs 
of the Lie bracket, the AS relation becomes the 
anti-commutativity of the bracket, STU becomes 
the equation [x,y] =xy — yx, and IHX becomes the 
Jacobi identity. This analogy is made concrete 
within the proof of the following: 


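Theorem 10 (Bar-Natan 1995). Given a finite-dimensional
metrized Lie algebra $\mathfrak g$ (e.g., any semisimple
Lie algebra), there is a map $\mathcal T_{\mathfrak g}: \mathcal A \to U(\mathfrak g)^{\mathfrak g}$
defined on $\mathcal A$ and taking values in the invariant part
$U(\mathfrak g)^{\mathfrak g}$ of the universal enveloping algebra $U(\mathfrak g)$ of $\mathfrak g$.
Given also a finite-dimensional representation R of $\mathfrak g$
there is a linear functional $W_{\mathfrak g,R}: \mathcal A \to \mathbb Q$.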





Figure 3 The AS, STU, and IHX relations.


Table 1 (continued)


m            6    7    8    9   10   11   12
dim A^r_m    9   14   27   44   80  132  232
dim A_m     19   33   60  104  184  316  548
dim P_m      5    8   12   18   27   39   55


Figure 4 A free Jacobi diagram. 


The last assertion along with Theorem 5 show 
that associated with any g, R, and m, there is a 
weight system and hence a knot invariant. Thus, 
knots are unexpectedly linked with Lie algebras. 

The hope (Bar-Natan 1995) that all finite-type 
invariants arise in this way was dashed by Vogel 
(1997, 1999) and Lieberum (1999). But finite-type 
invariants that do not arise in this way remain rare 
and not well understood. 

The Poincaré–Birkhoff–Witt (PBW) theorem of
the theory of Lie algebras says that the obvious
“symmetrization” map $\chi_{\mathfrak g}: S(\mathfrak g) \to U(\mathfrak g)$ from the
symmetric algebra $S(\mathfrak g)$ of a Lie algebra $\mathfrak g$ to its
universal enveloping algebra $U(\mathfrak g)$ is a $\mathfrak g$-module
isomorphism. The following definition and theorem
form a diagrammatic counterpart of this theorem:


Definition 11 Let $\mathcal B$ be the space of formal linear
combinations of “free Jacobi diagrams” (Jacobi
diagrams as before, but with unmarked univalent
ends (“legs”) replacing the circle; see an example in
Figure 4), modulo the AS and IHX relations of before.
Let $\chi: \mathcal B \to \mathcal A$ be the symmetrization map which maps
a k-legged free Jacobi diagram to the average of the k!
ways of planting these legs along a circle.


Theorem 12 (Diagrammatic PBW; Kontsevich
1993, Bar-Natan 1995). $\chi$ is an isomorphism of
vector spaces. Furthermore, fixing a metrized $\mathfrak g$ there
is a commutative square as in Figure 5.


Note that $\mathcal B$ can be graded (by half the number of
vertices in a Jacobi diagram) and that $\chi$ respects
degrees so it extends to an isomorphism $\chi: \hat{\mathcal B} \to \hat{\mathcal A}$
of graded completions.


$$\begin{array}{ccc}
\mathcal B & \xrightarrow{\ \chi\ } & \mathcal A \\
\downarrow{\scriptstyle \mathcal T_{\mathfrak g}} & & \downarrow{\scriptstyle \mathcal T_{\mathfrak g}} \\
S(\mathfrak g) & \xrightarrow{\ \chi_{\mathfrak g}\ } & U(\mathfrak g)
\end{array}$$


Figure 5 The diagrammatic PBW isomorphism and its
classical counterpart.


Proofs of the Fundamental Theorem 


The heart of all known proofs of Theorem 5 is 
always a construction of a “universal finite-type 
invariant” (see below); it is simple to show that the 
existence of a universal finite-type invariant is 
equivalent to Theorem 5. 


Definition 13 A universal finite-type invariant is a
map $Z: \{\text{knots}\} \to \hat{\mathcal A}^r$ whose extension to singular
knots satisfies $Z(K) = D + (\text{higher degrees})$ whenever
a singular knot K and a chord diagram D are
related as discussed before.


The Kontsevich Integral 


The first construction of a universal finite-type
invariant was given by Kontsevich (1993) (see also
Bar-Natan (1995) and Chmutov and Duzhin
(2001)). It is known as “the Kontsevich integral”
and up to a normalization factor it is given by

$$Z_1(K) = \mathop{⨋} \frac{(-1)^{\#P_\downarrow}}{(2\pi i)^m}\, D_P \bigwedge_{i=1}^{m} \frac{dz_i - dz_i'}{z_i - z_i'}$$

where the relationship between the knot K, the pairing
P, the real variables $t_i$, the complex variables $z_i$ and $z_i'$,
and the chord diagram $D_P$ is summarized in Figure 6
(the symbol ⨋ means “sum over all discrete variables
and integrate over all continuous variables.”)

Figure 6 The key ingredients of the Kontsevich integral.

The Kontsevich integral arises from studying the
holonomy of the Knizhnik–Zamolodchikov equation
of conformal field theory (Knizhnik and Zamolodchikov
1984). When evaluating $Z_1$, one encounters
multiple ζ-numbers (Le-Murakami 1995) in a substantial
way, and the proof that the end result is
rational is quite involved (Le-Murakami 1996) and
relies on deep results about associators and quasitriangular
quasi-Hopf algebras (Drinfel’d 1990, 1991).
Employing the same techniques, in Le-Murakami
(1996), it is also shown that the composition $W_{\mathfrak g,R} \circ Z_1$
precisely reproduces the Reshetikhin–Turaev invariants
(Reshetikhin and Turaev 1990).


Perturbative Chern-Simons-Witten Theory 
and Configuration Space Integrals 


Historically, the first approach to the construction 
of a universal finite-type invariant was to use 
perturbation theory with the Chern—Simons—Witten 
topological quantum field theory; this is also how 
the relationship with Lie algebras first arose. But 
taming the integrals involved turned out to be 
difficult and working constructions using this 
approach appeared only a bit later. 

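In short, one writes a perturbative expansion for
the large k asymptotics of the Chern–Simons–Witten
path integral for some metrized Lie algebra $\mathfrak g$ with a
Wilson loop in some representation R of $\mathfrak g$,

$$\int_{\mathfrak g\text{-connections}} \mathcal D A \;\, \mathrm{tr}_R\, \mathrm{hol}_K(A)\, \exp\left[\frac{ik}{4\pi} \int_{\mathbb R^3} \mathrm{tr}\left(A\wedge dA + \frac{2}{3}\, A\wedge A\wedge A\right)\right].$$

The result is of the form

$$\sum_{D:\ \text{Feynman diagram}} W_{\mathfrak g,R}(D)\, \mathop{⨋}\, \mathcal E(D)$$

where $\mathcal E(D)$ is a very messy integral expression and
the diagrams D as well as the weights $W_{\mathfrak g,R}(D)$ were
already discussed before. Replacing $W_{\mathfrak g,R}(D)$ by
simply D in the above formula, we get an expression
with values in $\hat{\mathcal A}$:

$$Z_2(K) := \sum_{D} D\, \mathop{⨋}\, \mathcal E(D) \in \hat{\mathcal A}.$$

For formal reasons $Z_2(K)$ ought to be a universal
finite-type invariant, and after much work taming
the $\mathcal E(D)$ factors and after multiplying by a further
framing-dependent renormalization term $Z_2^{\text{anomaly}}$,
the result is indeed a universal finite-type invariant.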

Upon further inspection, the $\mathcal E(D)$ factors can be
reinterpreted as integrals of certain spherical volume 
forms on certain (compactified) configuration spaces 
(Bott-Taubes 1994). These integrals can be further 
interpreted as counting certain “tinker toy construc- 
tions” built on top of K (Thurston 1995). The latter 
viewpoint makes the construction of Z visually 
appealing (Bar-Natan 2000), but there is no satis- 
factory write-up of this perspective yet. 

We note that the precise form of the renormalization
term $Z_2^{\text{anomaly}}$ remains an open problem. An
appealing conjecture is that $Z_2^{\text{anomaly}} = \exp(\tfrac{1}{2}\theta)$,
where $\theta$ is the 1-chord diagram. If
this is true then $Z_2 = Z_1$ (Poirier 1999); but the
conjecture is only verified up to degree 6 (Lescop
2001) (there is also an unconfirmed verification to all
orders (Yang 1997)).

The most important open problem about pertur- 
bative Chern—Simons—Witten theory is not directly 
about finite-type invariants, but it is nevertheless 
worthwhile to recall it here: 


Problem 14 Does the perturbative expansion of the
Chern–Simons–Witten theory converge to (or is it
asymptotic to) the exact solution due to Witten (1989) and
Reshetikhin and Turaev (1990) when the parameter
k converges to infinity?


Associators and Trivalent Graphs 


There is also an entirely algebraic approach for the 
construction of a universal finite-type invariant Z3. 
The idea is to find some algebraic context within which 
knot theory is finitely presented — that is, presented by 
finitely many generators subject to finitely many 
relations. If the algebraic context at hand is compatible 
with the definitions of finite-type invariants and of 
chord diagrams, one may hope to define Z3 by defining 
it on the generators in such a way that the relations are 
satisfied. Thus, the problem of defining Z3 is reduced 
to finding finitely many elements of A-like spaces 
which solve certain finitely many equations. 

A concrete realization of this idea is in
Le-Murakami (1996) and Bar-Natan (1997) (following
ideas from Drinfel’d (1990, 1991) on quasitriangular
quasi-Hopf algebras). The relevant “algebraic
context” is a category with certain extra operations,
and within it, knot theory is generated by just two
elements, the braiding and the re-association.
Thus, to define $Z_3$ it is enough to find $R = Z_3(\text{braiding})$
and “an associator” $\Phi = Z_3(\text{re-association})$ which satisfy certain
normalization conditions as well as the pentagon
and hexagon equations:

$$\Phi^{123}\cdot(1\Delta 1)(\Phi)\cdot\Phi^{234} = (\Delta 11)(\Phi)\cdot(11\Delta)(\Phi),$$
$$(\Delta 1)(R^{\pm}) = \Phi^{123}\,(R^{\pm})^{23}\,(\Phi^{-1})^{132}\,(R^{\pm})^{13}\,\Phi^{312}.$$

As it turns out, the solution for R is easy and
nearly canonical. But finding an associator $\Phi$ is rather
difficult. There is a closed-form integral expression
$\Phi_{KZ}$ due to Drinfel’d (1990) but one encounters the
same not-too-well-understood multiple ζ-numbers.
There is a rather complicated iterative procedure
for finding an associator (Drinfel’d 1991, Bar-Natan
1998). On a computer it has been used to find
an associator up to degree 7. There is also a closed-form
associator that works only with the Lie superalgebra
gl(1|1) (Lieberum 2002). But it remains an open
problem to find a closed-form formula for a rational
associator (existence by Drinfel’d (1991) and
Bar-Natan (1998)).

On the positive side, we should note that the end
result, the invariant $Z_3$, is independent of the choice
of $\Phi$ and that $Z_3 = Z_1$.

There is an alternative (more symmetric and intrinsi- 
cally three dimensional, but less well-documented) 
description of the theory of associators in terms 
of knotted trivalent graphs (Bar-Natan and 
Thurston). There ought to be a perturbative invariant 
associated with knotted trivalent graphs in the spirit of 
the last subsection and such an invariant should lead to 
a simple proof that $Z_2 = Z_3 = Z_1$. But the $\mathcal E(D)$ factors
remain untamed in this case.


Step-by-Step Integration 


The last approach for proving the fundamental
theorem is the most natural and historically the
first. But here it is last because it is yet to lead to an
actual proof. A weight system $W: \mathcal A^r_m \to \mathbb Q$ is an
invariant of m-singular knots. We want to show that
it is the mth derivative of an invariant V of
nonsingular knots. It is natural to try to integrate
W step by step, first finding an invariant $V^{m-1}$ of
(m − 1)-singular knots whose derivative in the sense
of [1] is W, then an invariant $V^{m-2}$ of (m − 2)-singular
knots whose derivative is $V^{m-1}$, and so on
all the way up to an invariant $V^0 = V$ whose mth
derivative will then be W. If proven, the following
conjecture would imply that such an inductive
procedure can be made to work:


Conjecture 15 (Hutchings 1998). If $V^r$ is a once-integrable
invariant of r-singular knots, then it is
also twice integrable. That is, if there is an invariant
$V^{r-1}$ of (r − 1)-singular knots whose derivative is
$V^r$, then there is an invariant $V^{r-2}$ of (r − 2)-singular
knots whose second derivative is $V^r$.


Hutchings (1998) reduced this conjecture to a
certain appealing topological statement and further
to a certain combinatorial-algebraic statement about
the vanishing of a certain homology group $H^1$ which
is probably related to Kontsevich’s graph homology
complex (Kontsevich 1994) (Kontsevich’s $H^0$ is $\mathcal A$,
so this is all in the spirit of many deformation theory
problems where $H^0$ enumerates infinitesimal deformations
and $H^1$ is the obstruction to globalization).
Hutchings (1998) was also able to prove the
vanishing of $H^1$ (and hence reprove the fundamental
theorem) in the simpler case of braids. But no
further progress has been made along these lines
since then.


Some Further Directions


We would like to touch upon a number of 
significant further directions in the theory of finite- 
type invariants and describe each of those only 
briefly; the reader is referred to the “Further read- 
ing” section for more information. 


The Original “Vassiliev” Perspective 


V A Vassiliev came to the study of finite-type knot 
invariants by studying the infinite-dimensional space 
of all immersions of a circle into $\mathbb R^3$ and the topology
of the “discriminant,” the locus of all singular 
immersions within the latter space (Vassiliev 1990, 
1992). Vassiliev studied the topology of the comple- 
ment of the discriminant (the space of embeddings) 
using a certain spectral sequence and found that 
certain terms in it correspond to finite-type invar- 
iants. This later got related to the Goodwillie calculus 
and back to the configuration spaces discussed in the 
last section. See Volic (2004). 


Interdependent Modifications 


The standard definition of finite-type invariants is 
based on modifying a knot by replacing over (or 
under) crossings with under (or over) crossings. 
Goussarov (1998) generalized this by allowing 
arbitrary modifications done to a knot — just take 
any segment of the knot and move it anywhere else 
in space. The resulting new “finite-type” theory 
turns out to be equivalent to the old one, though
with a factor of 2 applied to the grading (so an
“old” type m invariant is a “new” type 2m invariant
and vice versa); see also Bar-Natan (2001) and
Conant (2003).


n-Equivalence, Commutators, and Claspers 


While little is known about the overall power of finite-type
invariants, much is known about the power of
type n invariants for any given n. Goussarov (1993)
defined the notion of n-equivalence: two knots are
said to be “n-equivalent” if all their type n invariants
are the same. This equivalence relation is well understood
both in terms of commutator subgroups of the
pure braid group (Stanford 1998, Ng and Stanford
1999) and in terms of Habiro’s calculus of surgery
over “claspers” (Habiro 2000) (the latter calculus also
gives a topological explanation for the appearance of
Jacobi diagrams). In particular, already Goussarov
(1993) shows that the set of equivalence classes of
knots modulo n-equivalence is a finitely generated
abelian group $G_n$ under the operation of connected
sum, and the rank of that group is equal to the
dimension of the space of type n invariants.
Ng (1998) has shown that ribbon knots generate
an index 2 subgroup of $G_n$.


Polynomiality and Gauss Sums 


Goussarov (1998) (see also Goussarov—Polyak—Viro 
(2001)) found an intriguing way to compute finite- 
type invariants from a Gauss diagram presentation of 
a knot, showing in particular that finite-type invariants
grow as polynomials in the number of crossings
n and can be computed in polynomial time in n 
(though actual computer programs are still missing!). 

Gauss diagrams are obtained from knot diagrams
in much the same way as chord diagrams are
obtained from singular knots, except all crossings 
are counted and not just the double points, and 
certain over/under and sign information is asso- 
ciated with each crossing/chord so that the knot 
diagram can be recovered from its Gauss diagram. 
In the example below (Figure 8), we also dashed a 
subdiagram of the Gauss diagram equivalent to the 
chord diagram shown in Figure 7. 

If G is a Gauss diagram and D is a chord diagram,
then let $\langle D, G\rangle$ be the number of subdiagrams of
G equivalent to D, counted with appropriate signs
(to be precise, we also need to base the diagrams
involved and count subdiagrams that respect the
basing).


Theorem 16 (Goussarov 1998, Goussarov et al.
2000). If V is a type m invariant, then there are
finitely many (based) chord diagrams $D_i$ with at
most m chords and rational numbers $a_i$ so that
$V(K) = \sum_i a_i \langle D_i, G\rangle$ whenever G is a Gauss
diagram representing a knot K.
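
A formula of the kind described in Theorem 16 can be sketched for the simplest nontrivial case. The sketch below counts a single two-chord pattern in a based Gauss diagram; the pattern is the one usually attributed to Polyak and Viro for the type 2 invariant (the $z^2$ coefficient of the Conway polynomial), which is not stated in the text above. The function name, the encoding of Gauss codes as lists of (label, 'O'/'U') pairs, and the sample codes and crossing signs are all illustrative assumptions.

```python
from itertools import combinations

def type2_invariant(gauss_code, signs):
    """Signed count of one two-arrow pattern in a based Gauss diagram.

    `gauss_code` lists the crossings in the order they are met when walking
    along the knot from the base point; each entry is (label, 'O') for an
    overpass or (label, 'U') for an underpass.  `signs[label]` is +1 or -1.
    A pair of chords is counted when their endpoints interleave and are met
    in the order over, under, under, over.
    """
    visits = {}
    for position, (label, level) in enumerate(gauss_code):
        visits.setdefault(label, []).append((position, level))
    total = 0
    for a, b in combinations(visits, 2):
        ends = sorted(
            [(pos, lvl, a) for pos, lvl in visits[a]]
            + [(pos, lvl, b) for pos, lvl in visits[b]]
        )
        labels = [label for _, _, label in ends]
        levels = "".join(lvl for _, lvl, _ in ends)
        interleaved = labels[0] == labels[2] and labels[1] == labels[3]
        if interleaved and levels == "OUUO":
            total += signs[a] * signs[b]
    return total

# Assumed sample inputs: alternating diagrams of the trefoil and figure-eight.
TREFOIL = [(1, "O"), (2, "U"), (3, "O"), (1, "U"), (2, "O"), (3, "U")]
TREFOIL_SIGNS = {1: 1, 2: 1, 3: 1}
FIGURE_EIGHT = [(1, "O"), (2, "U"), (3, "O"), (4, "U"),
                (2, "O"), (1, "U"), (4, "O"), (3, "U")]
FIGURE_EIGHT_SIGNS = {1: 1, 2: 1, 3: -1, 4: -1}

print(type2_invariant(TREFOIL, TREFOIL_SIGNS))
print(type2_invariant(FIGURE_EIGHT, FIGURE_EIGHT_SIGNS))
```

The double loop over chords makes the quadratic cost in the number of crossings visible; a type m formula of this shape would loop over m-tuples of chords in the same way.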


Figure 7 A chord diagram. 





Figure 8 A knot and its Gauss diagram. 


Computing the Kontsevich Integral 


While the Kontsevich integral $Z_1$ is a cornerstone of
the theory of finite-type invariants, it has been
computed for surprisingly few knots. Even for the
unknot, the result is nontrivial:


Theorem 17 (“Wheels,” Bar-Natan et al. 2000,
2003). The framed Kontsevich integral of the unknot,
$Z_1(\bigcirc)$, expressed in terms of diagrams in $\mathcal B$, is given
by $\Omega = \exp_{\cup} \sum_{n} b_{2n}\omega_{2n}$, where the “modified
Bernoulli numbers” $b_{2n}$ are defined by the power series
expansion $\sum_{n=0}^{\infty} b_{2n}x^{2n} = \frac{1}{2} \log\frac{\sinh x/2}{x/2}$,
the “2n-wheel” $\omega_{2n}$ is the free Jacobi diagram made
of a 2n-gon with 2n legs (so, e.g., $\omega_4$ is a square with a
leg at each corner) and where
$\exp_{\cup}$ means “exponential in the disjoint union sense.”
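
The modified Bernoulli numbers of Theorem 17 are easy to tabulate from their defining series. The following is a minimal sketch in exact rational arithmetic; the function name `modified_bernoulli` is hypothetical, and the logarithm is taken with the standard recurrence that follows from $a' = a \cdot (\log a)'$.

```python
from fractions import Fraction
from math import factorial

def modified_bernoulli(max_n):
    """Return [b_0, b_2, ..., b_{2*max_n}] as exact fractions.

    Works in the variable y = x**2.  The series a(y) = sinh(x/2) / (x/2)
    has coefficients a_k = 1 / (4**k * (2k+1)!), and l = log(a) is obtained
    from the identity a' = a * l', i.e.
        k * a_k = sum_{j=1..k} j * l_j * a_{k-j}.
    """
    a = [Fraction(1, 4**k * factorial(2 * k + 1)) for k in range(max_n + 1)]
    log_coeffs = [Fraction(0)] * (max_n + 1)
    for k in range(1, max_n + 1):
        partial = sum(
            (j * log_coeffs[j] * a[k - j] for j in range(1, k)), Fraction(0)
        )
        log_coeffs[k] = a[k] - partial / k
    # The defining series carries an overall factor of 1/2.
    return [c / 2 for c in log_coeffs]

b = modified_bernoulli(3)
print(b)
```

The first entries are $b_2 = 1/48$ and $b_4 = -1/5760$, so the wheels expansion begins $\Omega = \exp_{\cup}\left(\tfrac{1}{48}\omega_2 - \tfrac{1}{5760}\omega_4 + \cdots\right)$.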


Closed-form formulas have also been given for the 
Kontsevich integral of framed unknots, the Hopf 
link and Hopf chains. 

Theorem 17 has a companion that utilizes the same
element $\Omega$, the “wheeling” theorem (Bar-Natan et al.
2000, 2003). The wheeling theorem “upgrades” the
vector space isomorphism $\chi: \mathcal B \to \mathcal A$ to an algebra
isomorphism and is related to the Duflo isomorphism
of the theory of Lie algebras. It is amusing to note that
the wheeling theorem (and hence Duflo’s theorem in
the metrized case) follows using finite-type techniques
from the “1 + 1 = 2 on an abacus” identity (Figure 9).


Taming the Kontsevich Integral 


While explicit calculations are rare, there is a nice
structure theorem for the values of the Kontsevich
integral, saying that for a knot K and up to any fixed
number of loops in the Jacobi diagrams, $\chi^{-1}Z_1(K)$ can
be described by finitely many rational functions (with
denominators powers of the Alexander polynomial)
which dictate the placement of the legs. This structure
theorem was conjectured in Rozansky (2003), proven
in Kricker (2000), and partially generalized to links in
Garoufalidis and Kricker (2004).


The Rozansky-Witten Theory 


One way to construct linear functionals on $\mathcal A$ (and
hence finite-type invariants) is using Lie algebras
and representations as discussed earlier; much of our
insight about $\mathcal A$ comes this way. But there is another
construction for such functionals (and hence invariants),
due to Rozansky and Witten (1997), using
contractions of curvature tensors on hyper-Kähler
manifolds. Very little is known about the Rozansky–Witten
approach; in particular, it is not known if it
is stronger or weaker than the Lie algebraic
approach. For an application of the Rozansky–Witten
theory back to hyper-Kähler geometry
see Hitchin and Sawon (2001), and for a
unification of the Rozansky–Witten approach with
the Lie algebraic approach (albeit at a categorical
level) see Roberts and Willerton (in preparation).


Figure 9 A knot theoretic 1 + 1 = 2.


The Melvin-—Morton Conjecture and the 
Volume Conjecture 


The Melvin–Morton conjecture (stated in Melvin and
Morton (1995), proven in Bar-Natan and Garoufalidis
(1996)) says that the Alexander polynomial can be read
off certain coefficients of the colored Jones polynomial.
The Kashaev–Murakami–Murakami volume conjecture
(stated in Kashaev (1997) and J Murakami and H
Murakami (2001), unproven) says that a certain
asymptotic growth rate of the colored Jones polynomial
is the hyperbolic volume of the knot complement.

Both conjectures are not directly about finite-type 
invariants but both have ramifications to the theory 
of finite-type invariants. The Melvin—Morton con- 
jecture was first proven using finite-type invariants 
and several later proofs and generalizations (see 
(Bar-Natan)) also involve finite-type invariants. The 
volume conjecture would imply, in particular, that 
the hyperbolic volume of a knot complement can be 
read from that knot’s finite-type invariants, and 
hence finite-type invariants would be at least as 
strong as the volume invariant. 

A particularly noteworthy result and direction for 
further research is Gukov’s (preprint) recent unifica- 
tion of these two conjectures under the Chern- 
Simons umbrella (along with some relations to 
three-dimensional quantum gravity). 


Beyond Knots 


For lack of space, we have restricted ourselves here to a 
discussion of finite-type invariants of knots. But the 
basic “differentiation” idea of the first section calls for 
generalization, and indeed it has been generalized 
extensively. We will only make a few quick comments. 
Finite-type invariants of homotopy links (links
where each component is allowed to move across
itself freely) and of braids are extremely well
behaved. They separate, they all come from Lie
algebraic constructions, and in the case of braids,
step-by-step integration as discussed previously works
(for homotopy links the issue was not studied).

Finite-type invariants of 3-manifolds and especially
of integral and rational homology spheres have been
studied extensively and the picture is nearly a
complete parallel of the picture for knots. There are
several competing definitions of finite-type invariants,
and they all agree up to regrading. There are
weight systems and they are linear functionals on a
space $\mathcal A(\emptyset)$ which is a close cousin of $\mathcal A$ and $\mathcal B$ and is
related to Lie algebras and hyper-Kähler manifolds in
a similar way. There is a notion of a “universal”
invariant, and there are several constructions; they all
agree or are conjectured to agree, and they are related
to the Chern–Simons–Witten theory.

Finite-type invariants were studied for several 
other types of topological objects, including knots 
within other manifolds, higher-dimensional knots, 
virtual knots, plane curves and doodles and more 
(see Bar-Natan). 


Introduction

Physics Background and Motivation 


Suppose G is a semisimple compact Lie group and
M a closed oriented 3-manifold. Witten (1989)
defined quantum invariants by the path integral
over all G-connections A:

$$Z(M, G; k) := \int \exp\left(\sqrt{-1}\, k\, \mathrm{CS}(A)\right) \mathcal D A,$$

where k is an integer and CS(A) is the Chern–Simons
functional,

$$\mathrm{CS}(A) = \frac{1}{4\pi} \int_M \mathrm{tr}\left(A \wedge dA + \frac{2}{3} A^3\right).$$

The path integral is not mathematically rigorous.
According to the stationary-phase approximation in
quantum field theory, in the limit $k \to \infty$ the path
integral decomposes as a sum of contributions from
the flat connections:

$$Z(M, G; k) \sim \sum_{\text{flat connections } f} Z^{(f)}(M, G; k) \quad \text{as } k \to \infty.$$

Each contribution is $\exp(2\pi\sqrt{-1}\, k\, \mathrm{CS}(f))$ times a
power series in 1/k. The contribution from the
trivial connection is important, especially for
rational homology 3-spheres, and the coefficients
of the powers $(1/k)^n$, calculated using (n + 1)-loop
Feynman diagrams by quantum field theory techniques,
are known as perturbative invariants.


Mathematical Theories 


A mathematically rigorous theory of quantum
invariants Z(M, G; k) was pioneered by Reshetikhin
and Turaev in 1990 (see Turaev (1994)).
A number-theoretical expansion of the quantum
invariants into power series that should correspond
to the perturbative invariants was given by Ohtsuki
(in the case of $sl_2$, and for general simple Lie algebras
by the author) in 1994. This led him to introduce the
finite-type invariant (FTI) theory for 3-manifolds. A
universal perturbative invariant was constructed by
Le-Murakami-Ohtsuki (LMO) in 1995; it is universal
for both finite-type invariants and quantum
invariants, at least for homology 3-spheres.
Rozansky in 1996 defined perturbative invariants
using Gaussian integrals, very close in spirit to
the original physics point of view. Later Habiro
(for $sl_2$, and Habiro and the author for all simple
Lie algebras) found a finer expansion of quantum
invariants, known as the cyclotomic expansion, but
no physics origin is known for the cyclotomic
expansion. The cyclotomic expansion helps to show
that the LMO invariant dominates all quantum
invariants for homology 3-spheres.

The purpose of this article is to give an overview 
of the mathematical theory of finite-type and 
perturbative invariants of 3-manifolds. 


Conventions and Notations 


All vector spaces are assumed to be over the ground
field $\mathbb Q$ of rational numbers, unless otherwise stated.
For a graded space A, let $\mathrm{Gr}_n A$ be the subspace of
grading n and $\mathrm{Gr}_{\le n} A$ the subspace of grading ≤ n.
For $x \in A$, let $\mathrm{Gr}_n x$ and $\mathrm{Gr}_{\le n} x$ be the projections of
x onto, respectively, $\mathrm{Gr}_n A$ and $\mathrm{Gr}_{\le n} A$.

All 3-manifolds are supposed to be closed and
oriented. A 3-manifold M is an integral homology
3-sphere (ZHS) if $H_1(M, \mathbb Z) = 0$; it is a rational
homology 3-sphere (QHS) if $H_1(M, \mathbb Q) = 0$. For a
framed link L in a 3-manifold M denote by $M_L$ the
3-manifold obtained from M by surgery along L (see,
e.g., Turaev (1994)).


Finite-Type Invariants 


After its introduction by Ohtsuki in 1994, the theory
of FTIs of 3-manifolds has been developed rapidly
by many authors. Later Goussarov and Habiro
independently introduced clasper calculus, or 
Y-surgery, which provides a powerful geometric 
technique and deep insight in the theory. Y-surgery, 
corresponding to the commutator in group theory, 
naturally gives rise to 3-valent graphs. 


Generality on FTIs 


Decreasing filtration In a theory of FTIs, one
considers a class of objects, and a “good” decreasing
filtration $\mathcal F_0 \supset \mathcal F_1 \supset \mathcal F_2 \supset \cdots$ on the vector space
$\mathcal F = \mathcal F_0$ spanned by these objects. An invariant of
the objects with values in a vector space is of order
less than or equal to n if its restriction to $\mathcal F_{n+1}$ is 0;
it is of finite type if it is of order ≤ n for some n. An
invariant has order n if it is of order ≤ n but not
≤ n − 1. Good here means at least that the space of FTIs
of each order is finite dimensional. It is desirable to
have a polynomial-time algorithm to calculate
every FTI. In addition, one wants the set of FTIs to
separate the objects (completeness).

The space of invariants of order ≤ n can be
identified with the dual space of $\mathcal F_0/\mathcal F_{n+1}$; the
subspace $\mathcal F_n/\mathcal F_{n+1}$ of $\mathcal F_0/\mathcal F_{n+1}$ is isomorphic to the space of
invariants of order ≤ n modulo the space of invariants
of order ≤ n − 1. Informally, one can say that
$\mathcal F_n/\mathcal F_{n+1}$ is more or less the set of invariants of
order n.


Elementary moves, the knot case Usually the
filtrations are defined using “independent elementary
moves.” For the class of knots the elementary
move is given by crossing change. Any two knots
can be connected by a finite sequence of such moves.
The idea is that if $K \in \mathcal F_n$, the nth term of the
filtration, then $K - K' \in \mathcal F_{n+1}$, where K′ is obtained
from K by an elementary move. The formal definition is
as follows. Suppose S is a set of double points of a
knot diagram D. Let

$$[D, S] = \sum_{S' \subset S} (-1)^{\#S'} D_{S'},$$

where the sum is over all subsets S′ of S, including
the empty set, $D_{S'}$ is the knot obtained by changing
the crossing at every point in S′, and #S is the
number of elements of S. Then $\mathcal F_n$ is the vector
space spanned by all elements of the form [D, S]
with #S = n. For the knot case, the Kontsevich
integral is an invariant that is universal for all FTIs
(see Bar-Natan (1995)).
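
The alternating sum $[D, S]$ is a formal linear combination with $2^{\#S}$ terms, one per subset of S. The following sketch simply enumerates those terms with their signs; the function name `alternating_sum` and the labels "p" and "q" for the double points are illustrative assumptions, and each frozenset stands for the diagram obtained by changing exactly those crossings.

```python
from itertools import combinations

def alternating_sum(double_points):
    """Expand [D, S] as a formal linear combination.

    Returns a dict mapping each subset S' of `double_points` (as a
    frozenset, standing for the diagram D_{S'} with those crossings
    changed) to its coefficient (-1) ** #S'.
    """
    points = list(double_points)
    return {
        frozenset(subset): (-1) ** size
        for size in range(len(points) + 1)
        for subset in combinations(points, size)
    }

terms = alternating_sum({"p", "q"})
for subset, coefficient in sorted(terms.items(), key=lambda t: sorted(t[0])):
    print(sorted(subset), coefficient)
```

For nonempty S the coefficients sum to zero, so a constant invariant vanishes on every such $[D, S]$ and therefore has order 0.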


Ohtsuki’s Definition of FTIs for ZHS 


An elementary move here is a surgery along a
knot: $M \to M_K$, where K is a framed knot in a
ZHS M. A collection of moves corresponds to
surgery on a framed link. To always remain in the
class of ZHS we need to restrict ourselves to unit-framed
and algebraically split links, that is, framed
links in ZHS each component of which has
framing ±1 and the linking number of every two
components is 0. It is easy to prove that a link L
in a ZHS M is unit-framed and algebraically split
if and only if $M_{L'}$ is a ZHS for every sublink L′ of
L. For a unit-framed, algebraically split link L in a
ZHS M define

$$[M, L] = \sum_{L' \subset L} (-1)^{\#L'} M_{L'},$$

which is an element in the vector space $\mathcal M$ freely
spanned by ZHS.

For a non-negative integer n let $\mathcal F^{As}_n$ be the
subspace of $\mathcal M$ spanned by [M, L] with #L = n.
Then the descending filtration $\mathcal M = \mathcal F^{As}_0 \supset \mathcal F^{As}_1 \supset
\mathcal F^{As}_2 \supset \cdots$ defines a theory of FTIs on the class of
ZHS.


Theorem 1 


(i) (Ohtsuki) The dimension of F (M) is finite for 
every n. 

(ii) (Garoufalidis-Ohtsuki) One has $\mathcal F^{As}_{3n+1}(\mathcal M) =
\mathcal F^{As}_{3n+2}(\mathcal M) = \mathcal F^{As}_{3n+3}(\mathcal M)$.


The orders of FTIs in this theory are multiples of 3. 
The first nontrivial invariant, which is the only (up 
to scalar) invariant of degree 3, is the Casson 
invariant. 


The Goussarov—-Habiro Definition 


Y-surgery or clasper surgery Consider the standard
Y-graph Y and a small neighborhood N(Y) of it in
the standard $\mathbb R^3$ (see Figure 1). Denote by L(Y) the
six-component framed link diagram in $N(Y) \subset \mathbb R^3$,
each component of which has framing 0 in $\mathbb R^3$
(see Figure 1).


Figure 1 The Y-graph, its neighborhood N(Y), and the surgery link L(Y).


A framed Y-graph C in a 3-manifold M is the
image of an embedding of N(Y) into M. The surgery
of M along the image of the six-component link
L(Y) is called a Y-surgery along C, denoted by $M_C$.
If one of the leaves bounds a disk in M whose
interior is disjoint from the graph, then $M_C$ is
homeomorphic to M.

Matveev in 1987 proved that two 3-manifolds M
and M′ are related by a finite sequence of
Y-surgeries if and only if there is an isomorphism
from $H_1(M, \mathbb Z)$ onto $H_1(M', \mathbb Z)$ preserving the
linking form on the torsion group. It is natural to
partition the class of 3-manifolds into subclasses of
the same $H_1$ and the same linking form.


Goussarov–Habiro filtrations For a 3-manifold $M$
denote by $\mathcal{M}(M)$ the vector space spanned by all
3-manifolds with $H_1$ and linking form the same as
those of $M$. Define, for a set $S$ of Y-graphs in $M$,
$[M, S] = \sum_{S' \subset S} (-1)^{\#S'} M_{S'}$, and $\mathcal{F}^Y_n\mathcal{M}(M)$ the vector
space spanned by all $[N, S]$ such that $N$ is in $\mathcal{M}(M)$
and $\#S = n$. The following theorem of Goussarov
and Habiro (Goussarov 1999, Garoufalidis et al.
2001, Habiro 2000) shows that the FTI theory
based on Y-surgery is the same as the one of Ohtsuki
in the case of ZHS.


Theorem 2 For the case $\mathcal{M} = \mathcal{M}(S^3)$, one has $\mathcal{F}^Y_{2n-1} = \mathcal{F}^Y_{2n} = \mathcal{F}^{As}_{3n}$.


The Fundamental Theorem of FTIs of ZHS


Jacobi diagrams A closed Jacobi diagram is 
a vertex-oriented trivalent graph, that is, a graph 
for which the degree of each vertex is equal to 3 and 
a cyclic order of the three half-edges at every vertex 
is fixed. Here, multiple edges and self-loops are 
allowed. In pictures, the orientation at a vertex is 
the clockwise orientation, unless otherwise stated. 
The “degree” of a Jacobi diagram is half the number
of its vertices.

Let $\mathrm{Gr}_n\mathcal{A}(\emptyset)$, $n \ge 0$, be the vector space spanned
by all closed Jacobi diagrams of degree $n$, modulo
the antisymmetry (AS) and Jacobi (IHX) relations
(see Figure 2).


The universal weight map W Suppose $D$ is a closed
Jacobi diagram of degree $n$. Embedding $D$ into $\mathbb{R}^3 \subset S^3$
arbitrarily and then projecting down onto $\mathbb{R}^2$ in
general position, one can describe $D$ by a diagram,
with over/under-crossing information at every double
point just as in the case of a link diagram. We
can assume that the orientation at every vertex of $D$
is given by a clockwise cyclic order. From the image
of $D$, construct a set $G$ of $2n$ Y-graphs as in
Figure 3. Here only the cores of the Y-graphs are
drawn, with the convention that each framed
Y-graph is a small neighborhood of its core in $\mathbb{R}^2$.

Figure 2 The AS and IHX relations.

Figure 3 The weight map.

If $G'$ is a proper subset of $G$, then in $G'$ there is a
Y-graph, one of the leaves of which bounds a disk,
hence $S^3_{G'} = S^3$. Thus, $W(D) := [S^3, G] = S^3_G - S^3$. By
definition, $W(D) \in \mathcal{F}^Y_{2n}$; it might depend on the
embedding of $D$ into $\mathbb{R}^3$, but one can show that
$W(D)$ is well defined in $\mathcal{F}^Y_{2n}/\mathcal{F}^Y_{2n+1}$. The map $W$ was
first constructed by Garoufalidis and Ohtsuki in the
framework of $\mathcal{F}^{As}$.
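
The collapse of the alternating sum to two terms can be written out as a short worked step. This is only a sketch that uses the definition of the bracket, the fact that $G$ has $2n$ elements, and the identity $S^3_{G'} = S^3$ for proper subsets $G'$:

```latex
\[
W(D) = [S^3, G]
     = \sum_{G' \subseteq G} (-1)^{\#G'} S^3_{G'}
     = (-1)^{2n} S^3_G + \Big( \sum_{G' \subsetneq G} (-1)^{\#G'} \Big) S^3
     = S^3_G - S^3 ,
\]
\[
\text{since } \sum_{G' \subseteq G} (-1)^{\#G'} = (1-1)^{2n} = 0
\quad\Longrightarrow\quad
\sum_{G' \subsetneq G} (-1)^{\#G'} = -(-1)^{2n} = -1 .
\]
```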


Fundamental theorem 


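Theorem 3 (Lê et al. 1998, Lê 1997). The map $W$
descends to a well-defined linear map
$W: \mathrm{Gr}_n\mathcal{A}(\emptyset) \to \mathcal{F}^Y_{2n}/\mathcal{F}^Y_{2n+1}$ and moreover, is an
isomorphism between the vector spaces $\mathrm{Gr}_n\mathcal{A}(\emptyset)$
and $\mathcal{F}^Y_{2n}/\mathcal{F}^Y_{2n+1}$, for $\mathcal{M} = \mathcal{M}(S^3)$.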


The theorem essentially says that the set of
invariants of degree $2n$ is dual to the space of closed
Jacobi diagrams $\mathrm{Gr}_n\mathcal{A}(\emptyset)$. The proof is based on the
LMO invariant (see the next section).

A $\mathbb{Q}$-valued invariant $I$ of order $\le 2n$ restricts to a
linear map from $\mathcal{F}^Y_{2n}/\mathcal{F}^Y_{2n+1}$ to $\mathbb{Q}$. The composition
of $I$ and $W$ is a functional on $\mathrm{Gr}_n\mathcal{A}(\emptyset)$ called the
“weight system” of $I$. The theorem shows that every
linear functional on $\mathrm{Gr}_n\mathcal{A}(\emptyset)$ is the weight system of an
invariant of order $\le 2n$.


Relation to knot invariants Under the map that
sends an (unframed) knot $K \subset S^3$ to the ZHS
obtained by surgery along $K$ with framing 1, an
invariant of degree $\le 2n$ (in the $\mathcal{F}^Y$ theory) of ZHS
pulls back to an invariant of order $\le 2n$ of knots.
This was conjectured by Garoufalidis and proved by
Habegger.


Other classes of rational homology 3-spheres
Actually, the theorem was first proved in the framework
of $\mathcal{F}^{As}$. Clasper surgery theory allows Habiro
(2000) to generalize the fundamental theorem to QHS:
for $M$ a QHS, the universal weight map
$W: \mathrm{Gr}_n\mathcal{A}(\emptyset) \to \mathcal{F}^Y_{2n}\mathcal{M}(M)/\mathcal{F}^Y_{2n+1}\mathcal{M}(M)$, defined similarly as
in the case of ZHS, is an isomorphism, and

\[ \mathcal{F}^Y_{2n-1}\mathcal{M}(M) = \mathcal{F}^Y_{2n}\mathcal{M}(M). \]


Other filtrations and approaches Other equivalent
filtrations were introduced (and compared) by
Garoufalidis, Garoufalidis and Levine (1997), and
Garoufalidis–Goussarov–Polyak (2001). Of importance
is the one using subgroups of mapping class
groups in Garoufalidis and Levine (1997). A theory
of n-equivalence was constructed by Goussarov and
Habiro that encompasses many geometric aspects of
FTIs of 3-manifolds (Habiro 2000, Goussarov
1999). Cochran and Melvin (2000) extended the
original Ohtsuki definition to manifolds with
homology, using algebraically split links, but the
filtrations are different from those of Goussarov–Habiro.


The Lê–Murakami–Ohtsuki Invariant

Jacobi Diagrams


An open Jacobi diagram is a vertex-oriented uni-trivalent
graph, that is, a graph with univalent and
trivalent vertices together with a cyclic ordering of
the edges incident to the trivalent vertices. A
univalent vertex is also called “a leg.” The degree
of an open Jacobi diagram is half the number of
vertices (trivalent and univalent). A Jacobi diagram
based on $X$, a compact oriented 1-manifold, is a
graph $D$ together with a decomposition $D = X \cup \Gamma$,
such that $D$ is the result of gluing all the legs of an
open Jacobi diagram $\Gamma$ to distinct interior points of
$X$. The degree of $D$, by definition, is the degree of $\Gamma$.
In Figure 4 $X$ is depicted by bold lines. Let $\mathcal{A}^f(X)$ be
the space of Jacobi diagrams based on $X$ modulo the
usual antisymmetry, Jacobi and the new STU
relations. The completion of $\mathcal{A}^f(X)$ with respect to
degree is denoted by $\mathcal{A}(X)$.

When $X$ is a set of $m$ ordered oriented intervals,
denote $\mathcal{A}(X)$ by $\mathcal{P}_m$, which has a natural algebra
structure where the product $DD'$ of two Jacobi
diagrams is defined by stacking $D$ on top of $D'$
(concatenating the corresponding oriented intervals).
When $X$ is a set of $m$ ordered oriented circles,
denote $\mathcal{A}(X)$ by $\mathcal{A}_m$. By identifying the two
endpoints of each interval, one gets a map
$\mathrm{pr}: \mathcal{P}_m \to \mathcal{A}_m$, which is an isomorphism if $m = 1$ (see
Bar-Natan (1995)).


Figure 4 The STU relation.

For $x \in \mathcal{A}_m$ and $y \in \mathcal{A}_1$, the connected sum is
defined by $x \#_m y := \mathrm{pr}\big((\mathrm{pr}^{-1}x)(\mathrm{pr}^{-1}y)^{\otimes m}\big)$, where
$(\mathrm{pr}^{-1}y)^{\otimes m}$ is the element in $\mathcal{P}_m$ with $\mathrm{pr}^{-1}y$ on each
oriented interval.


Symmetrization maps Let $\mathcal{B}_m$ be the vector space
spanned by open Jacobi diagrams whose legs are
labeled by elements of $\{1, 2, \dots, m\}$, modulo the
antisymmetry and Jacobi relations. One can define
an analog of the Poincaré-Birkhoff-Witt isomorphism
$\chi: \mathcal{B}_m \to \mathcal{P}_m$ as follows. For a diagram $D$, $\chi(D)$
is obtained by taking the average over all possible
ways of ordering the legs labeled by $j$ and attaching
them to the $j$th oriented interval. It is known that $\chi$
is a vector space isomorphism (Bar-Natan (1995)).


The Framed Kontsevich Integral of Links 


For an $m$-component framed link $L \subset \mathbb{R}^3$, the
(framed version of the) Kontsevich integral $Z(L)$ is
an invariant taking values in $\mathcal{A}_m$ (see, e.g., Ohtsuki
(2002)). Let $\nu := Z(K)$, when $K$ is the unknot with
framing 0, and $\check{Z}(L) := Z(L) \#_m \nu$. An explicit formula
for $\nu$ is given in Bar-Natan et al. (2003).


Removing Solid Loops: The Maps $\iota_n$


Suppose $D \in \mathcal{B}_m$ is an open Jacobi diagram with legs
labeled by $\{1, \dots, m\}$. If the number of vertices of
any label is different from $2n$, or if the degree of
$D$ is greater than $(m+1)n$, we set $\iota_n(D) = 0$. Otherwise,
partitioning the $2n$ vertices of each label into $n$ pairs and
identifying points in each pair, from $D$ we get a
trivalent graph which may contain some isolated loops
(no vertices) and which depends on the partition.
Replacing each isolated loop by a factor $-2n$,
and summing up over all partitions, we get
$\iota_n(D) \in \mathrm{Gr}_{\le n}\mathcal{A}(\emptyset)$.
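
The sum over partitions in the definition of $\iota_n$ can be made concrete by enumerating the pairings of the $2n$ legs that carry one label. The following is a minimal sketch; the helper names `pairings` and `double_factorial_odd` are illustrative and not from the source. It only lists the pairings and does not build the glued graph or count isolated loops.

```python
from math import prod


def pairings(items):
    """Yield every way to split `items` (even length) into unordered pairs."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        # Pair the first leg with `partner`, then pair up what is left.
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def double_factorial_odd(n):
    """(2n-1)!! = 1*3*5*...*(2n-1), the number of pairings of 2n legs."""
    return prod(range(1, 2 * n, 2))


if __name__ == "__main__":
    # Legs 0..3 carry the same label (n = 2): three pairings.
    for p in pairings(range(4)):
        print(p)
```

For a diagram with $m$ labels, each carrying $2n$ legs, the sum therefore runs over $((2n-1)!!)^m$ partitions.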

For $x \in \mathcal{A}_m$, choose $y \in \mathcal{P}_m$ such that $\mathrm{pr}(y) = x$.
Using the isomorphism $\chi$ we pull back $\chi^{-1}y \in \mathcal{B}_m$.
Define $\iota_n(x) := \iota_n(\chi^{-1}y)$. One can prove that $\iota_n(x)$
does not depend on the choice of the preimage $y$ of $x$.
Note that $\iota_n$ lowers the degree by $nm$.


Definition of the Lê–Murakami–Ohtsuki
Invariant $Z^{LMO}$


In $\mathcal{A}(\emptyset) := \prod_{n=0}^{\infty} \mathrm{Gr}_n\mathcal{A}(\emptyset)$ let the product of two
Jacobi diagrams be their disjoint union. In addition,
define the coproduct $\Delta(D) = 1 \otimes D + D \otimes 1$ for $D$ a
connected Jacobi diagram. Then $\mathcal{A}(\emptyset)$ is a commutative
cocommutative graded Hopf algebra.

For the unknot $U_\pm$ with framing $\pm 1$, one has
$\iota_n(\check{Z}(U_\pm)) = (\mp 1)^n + (\text{terms of degree} \ge 1)$; hence,
their inverses exist. Suppose the linking matrix of an
oriented framed link $L \subset \mathbb{R}^3$ has $\sigma_+$ positive eigenvalues
and $\sigma_-$ negative eigenvalues. Define

\[ \Omega_n(L) = \frac{\iota_n(\check{Z}(L))}{(\iota_n(\check{Z}(U_+)))^{\sigma_+}\,(\iota_n(\check{Z}(U_-)))^{\sigma_-}} \in \mathrm{Grad}_{\le n}(\mathcal{A}(\emptyset)) \qquad [1] \]


Theorem 4 (Lê et al. 1998). $\Omega_n(L)$ is an invariant
of the 3-manifold $M = S^3_L$.


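We can combine all the $\Omega_n$ to get a better
invariant:

\[ Z^{LMO}(M) := 1 + \mathrm{Grad}_1(\Omega_1(M)) + \cdots + \mathrm{Grad}_n(\Omega_n(M)) + \cdots \in \mathcal{A}(\emptyset) \]

For $M$ a QHS, we also define

\[ \hat{Z}^{LMO}(M) := 1 + \frac{\mathrm{Grad}_1(\Omega_1(M))}{d(M)} + \cdots + \frac{\mathrm{Grad}_n(\Omega_n(M))}{d(M)^n} + \cdots \]

where $d(M)$ is the cardinality of $H_1(M, \mathbb{Z})$.


Proposition 1 (Lê et al. 1998). Both $Z^{LMO}(M)$ and
$\hat{Z}^{LMO}(M)$ (when defined) are group-like elements,
that is,

\[ \Delta(Z^{LMO}(M)) = Z^{LMO}(M) \otimes Z^{LMO}(M) \]
\[ \Delta(\hat{Z}^{LMO}(M)) = \hat{Z}^{LMO}(M) \otimes \hat{Z}^{LMO}(M) \]

Moreover, $\hat{Z}^{LMO}(M_1 \# M_2) = \hat{Z}^{LMO}(M_1) \times \hat{Z}^{LMO}(M_2)$.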


Universality Properties of the LMO Invariant 
Let us restrict ourselves to the case of ZHS. 


Theorem 5 (Lê 1997). The less than or equal to $n$
degree part $\mathrm{Gr}_{\le n} Z^{LMO}$ is an invariant of degree $2n$.
Any invariant of degree $\le 2n$ is a composition
$w(\mathrm{Gr}_{\le n} Z^{LMO})$, where $w: \mathrm{Gr}_{\le n}\mathcal{A}(\emptyset) \to \mathbb{Q}$ is a
linear map.


Clasper calculus (or Y-surgery) theory allows 
Habiro to extend the theorem to rational homology 
3-spheres. 


The Århus Integral


The Århus integral (ca. 1998) of Bar-Natan,
Garoufalidis, Rozansky and Thurston, based on a
theory of formal integration, calculates the LMO
invariant of rational homology 3-spheres. The
formal integration theory has a conceptual flavor
and helps to relate the LMO invariant to perturbative
expansions of quantum invariants. We give here
the definition for the case when one does surgery on
a knot $K$ with nonzero framing $b$. The link case is
similar (see Bar-Natan et al. (2002a, b)).

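When $K$ is a knot, $\check{Z}(K)$ is an element of
$\mathcal{A}_1 \cong \mathcal{P}_1 \cong \mathcal{B}_1$. Note that $\mathcal{B}_1$ is an algebra where the
product is the disjoint union $\sqcup$. Since the framing is
$b$, one has

\[ \check{Z}(K) = \exp_{\sqcup}(b\,w_1/2) \sqcup Y \]

where $w_1$ is the “dashed interval” (the only
connected open Jacobi diagram without trivalent
vertex), and $Y$ is an element in $\mathcal{B}_1$ every term of
which must have at least one trivalent vertex. For
uni-trivalent graphs $C, D \in \mathcal{B}_1$, let

\[ \langle C, D \rangle = \begin{cases} 0 & \text{if the numbers of legs of } C, D \text{ are different} \\ \text{sum of all ways to glue legs of } C \text{ and } D \text{ together} & \text{otherwise} \end{cases} \]

One defines $\int^{FG} \check{Z}(K) := \langle \exp_{\sqcup}(-w_1/2b), Y \rangle$. Then

\[ \int^{FG} \check{Z}(K) = \sum_{n=0}^{\infty} \frac{\mathrm{Gr}_n(\iota_n \check{Z}(K))}{(-b)^n} \]

Hence,

\[ \hat{Z}^{LMO}(S^3_K) = \frac{\int^{FG} \check{Z}(K)}{\int^{FG} \check{Z}(U_{\mathrm{sign}(b)})} \]


Other Approaches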


Another construction of a universal perturbative 
invariant based on integrations over configuration 
spaces, closer to the original physics approach but 
harder to calculate because of the lack of a surgery 
formula, was developed by Axelrod and Singer, 
Kontsevich, Bott and Cattaneo, Kuperberg and 
Thurston (see Axelrod and Singer (1992), Bott and
Cattaneo (1998)).


Quantum Invariants and Perturbative 
Expansion 


Fix a simple (complex) Lie algebra $\mathfrak{g}$ of finite
dimension. Using the quantized enveloping algebra
of $\mathfrak{g}$ one can define quantum link and 3-manifold
invariants. We recall here the definition, adapted for
the case of the root lattice (projective group case).
Here our $q$ is equal to $q^2$ in the text book (Jantzen
1995). Fix a root system of $\mathfrak{g}$. Let $X, X_+, Y$ denote
respectively the weight lattice, the set of dominant
weights, and the root lattice. We normalize the
invariant scalar product in the real vector space of
the weight lattice so that the length of any short root
is $\sqrt{2}$.


Quantum Link Invariants 


Suppose $L$ is a framed oriented link with $m$ ordered
components; then the quantum invariant
$J_L(\lambda_1, \dots, \lambda_m)$ is a Laurent polynomial in $q^{1/2D}$,
where $\lambda_1, \dots, \lambda_m$ are dominant weights, standing
for the simple $\mathfrak{g}$-modules of highest weights
$\lambda_1, \dots, \lambda_m$, and $D$ is the determinant of the Cartan
matrix of $\mathfrak{g}$ (see, e.g., Turaev (1994) and Lê (1996)).
The Jones polynomial is the case when $\mathfrak{g} = sl_2$ and
all the $\lambda_i$'s are the highest weights of the fundamental
representation. For the unknot $U$ with zero
framing, one has (here $\rho$ is the half-sum of all
positive roots)

\[ J_U(\lambda) = \prod_{\text{positive roots } \alpha} \frac{q^{(\lambda+\rho|\alpha)/2} - q^{-(\lambda+\rho|\alpha)/2}}{q^{(\rho|\alpha)/2} - q^{-(\rho|\alpha)/2}} \]
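
As a sanity check of the unknot formula, here is a worked instance for $\mathfrak{g} = sl_2$. It is a sketch under the normalization stated above: there is a single positive root $\alpha$ with $(\alpha|\alpha) = 2$, so $\rho = \alpha/2$, and a dominant weight is written $\lambda = k\alpha/2$ with $k \ge 0$ an integer (the label $k$ is introduced here for illustration).

```latex
\[
(\rho|\alpha) = 1, \qquad (\lambda + \rho|\alpha) = k + 1,
\]
\[
J_U(\lambda) = \frac{q^{(k+1)/2} - q^{-(k+1)/2}}{q^{1/2} - q^{-1/2}}
             = q^{k/2} + q^{(k-2)/2} + \cdots + q^{-k/2} .
\]
```

At $q = 1$ this gives $k + 1$, the dimension of the simple $sl_2$-module of highest weight $\lambda$, and for $k = 1$ it gives $q^{1/2} + q^{-1/2}$.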


We will also use another normalization of the
quantum invariant:

\[ Q_L(\lambda_1, \dots, \lambda_m) := J_L(\lambda_1, \dots, \lambda_m) \times \prod_{j=1}^{m} J_U(\lambda_j) \]


This definition is good only for $\lambda_j \in X_+$. Note
that each $\lambda \in X$ is either fixed by a nontrivial element of the
Weyl group under the dot action (see Humphreys
(1978)) or can be moved to $X_+$ by the dot action.
We define $Q_L(\lambda_1, \dots, \lambda_m)$ for arbitrary $\lambda_j \in X$ by
requiring that $Q_L(\lambda_1, \dots, \lambda_m) = 0$ if one of the $\lambda_j$'s is
fixed by a nontrivial element of the Weyl group, and that
$Q_L(\lambda_1, \dots, \lambda_m)$ is component-wise invariant under
the dot action of the Weyl group, that is, for every
$w_1, \dots, w_m$ in the Weyl group,

\[ Q_L(w_1 \cdot \lambda_1, \dots, w_m \cdot \lambda_m) = Q_L(\lambda_1, \dots, \lambda_m) \]
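
The dichotomy "fixed by the dot action or movable to $X_+$" is easy to see for $sl_2$. The sketch below assumes the usual dot action $w \cdot \lambda = w(\lambda + \rho) - \rho$ and identifies a weight $\lambda = k\alpha/2$ with the integer $k$, so that $\rho$ corresponds to 1; the function names are illustrative only.

```python
def dot_reflect(k):
    """Dot action of the nontrivial Weyl element of sl_2 on the weight k.

    Assumption: s.k = s(k + rho) - rho with rho = 1 and s acting by -1.
    """
    return -(k + 1) - 1


def dominant_representative(k):
    """Return the dominant weight in the dot-orbit of k, or None if k is fixed."""
    if dot_reflect(k) == k:
        return None  # k = -1: the extended invariant is set to 0 here
    return k if k >= 0 else dot_reflect(k)


if __name__ == "__main__":
    for k in (-5, -2, -1, 0, 4):
        print(k, "->", dominant_representative(k))
```

Only $k = -1$ is fixed; every other integer weight is either dominant already or is sent to the dominant weight $-k-2$.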


Proposition 2 (Lê 1996). Suppose $\lambda_1, \dots, \lambda_m$ are in
the root lattice $Y$.

(i) (Integrality) Then $Q_L(\lambda_1, \dots, \lambda_m) \in \mathbb{Z}[q^{\pm 1}]$ (no
fractional power).

(ii) (Periodicity) When $q$ is an $r$th root of 1, then
$Q_L(\lambda_1, \dots, \lambda_m)$ is invariant under the action of
the lattice group $rY$, that is, for $y_1, \dots, y_m \in Y$,
$Q_L(\lambda_1, \dots, \lambda_m) = Q_L(\lambda_1 + r y_1, \dots, \lambda_m + r y_m)$.


Quantum 3-Manifold Invariants 


Although the infinite sum $\sum_{\lambda_j \in Y} Q_L(\lambda_1, \dots, \lambda_m)$
does not have a meaning, heuristic ideas show that
it is invariant under the second Kirby move, and
hence almost defines a 3-manifold invariant. The
problem is to regularize the infinite sum. One
solution is based on the fact that at $r$th roots of
unity, $Q_L(\lambda_1, \dots, \lambda_m)$ is periodic, so we should use
the sum in which the $\lambda_j$'s run over a fundamental set $P_r$ of
the action of $rY$, where

\[ P_r := \{ x = c_1\alpha_1 + \cdots + c_\ell\alpha_\ell \mid 0 \le c_1, \dots, c_\ell < r \} \]

Here $\alpha_1, \dots, \alpha_\ell$ are basis roots. For a root $\xi$ of
unity of order $r$, let

\[ F_L(\xi) = \sum_{\lambda_j \in (P_r \cap Y)} Q_L(\lambda_1, \dots, \lambda_m)\big|_{q=\xi} \]

If $F_{U_\pm}(\xi) \neq 0$, define

\[ \tau_L(\xi) := \frac{F_L(\xi)}{(F_{U_+}(\xi))^{\sigma_+}\,(F_{U_-}(\xi))^{\sigma_-}} \]

Recall that $D$ is the determinant of the Cartan
matrix. Let $d$ be the maximum of the absolute values
of entries of the Cartan matrix outside the diagonal.


Theorem 6 (Lê 2003)

(i) If the order $r$ of $\xi$ is coprime with $dD$, then
$F_{U_\pm}(\xi) \neq 0$.

(ii) If $F_{U_\pm}(\xi) \neq 0$ then $\tau_M^{P\mathfrak{g}}(\xi) := \tau_L(\xi)$ is an invariant
of the 3-manifold $M = S^3_L$.

Remark 1 The version presented here corresponds to
projective groups. It was defined by Kirby and Melvin
for $sl_2$, Kohno and Takata for $sl_n$, and by Lê (2003) for
arbitrary simple Lie algebras. When $r$ is coprime with
$dD$, there is also an associated modular category that
generates a topological quantum field theory. In most
texts in the literature, say Kirillov (1996) and Turaev
(1994), another version $\tau^{\mathfrak{g}}$ was defined. The reason we
choose $\tau^{P\mathfrak{g}}$ is that it has nice integrality properties and eventually
a perturbative expansion. For relations between the
version $\tau^{P\mathfrak{g}}$ and the usual $\tau^{\mathfrak{g}}$, see Lê (2003).


Examples When $M$ is the Poincaré sphere and
$\mathfrak{g} = sl_2$,

\[ \tau_M^{Psl_2}(q) = \frac{1}{1-q} \sum_{n=0}^{\infty} q^n (1 - q^{n+1})(1 - q^{n+2}) \cdots (1 - q^{2n+1}) \]

Here $q$ is a root of unity, and the sum is easily
seen to be finite.


Integrality The following theorem was proved for
$\mathfrak{g} = sl_2$ by Murakami (1995) and for $\mathfrak{g} = sl_n$ by
Takata-Yokota and Masbaum-Wenzl (using ideas
of J Roberts) and for arbitrary simple Lie algebras
by Lê (2003).


Theorem 7 Suppose the order $r$ of $\xi$ is a prime big
enough, then $\tau_M^{P\mathfrak{g}}(\xi)$ is in $\mathbb{Z}[\xi] = \mathbb{Z}[\exp(2\pi i/r)]$.


Perturbative Expansion


Unlike the link case, quantum 3-manifold invariants
can be defined only at certain roots of unity. In
general, there is no analytic extension of the
function $\tau_M^{P\mathfrak{g}}$ around $q = 1$. In perturbative theory,
we want to expand the function $\tau_M^{P\mathfrak{g}}$ around $q = 1$
into power series. For QHS, Ohtsuki (for $\mathfrak{g} = sl_2$)
and then the present author (for all other simple Lie
algebras) showed that there is a number-theoretical
expansion of $\tau_M^{P\mathfrak{g}}$ around $q = 1$ in the following sense.

Suppose $r$ is a big enough prime, and
$\xi = \exp(2\pi i/r)$. By the integrality (Theorem 7),

\[ \tau_M^{P\mathfrak{g}}(\xi) \in \mathbb{Z}[\xi] = \mathbb{Z}[q]/(1 + q + q^2 + \cdots + q^{r-1}) \]

Choose a representative $f(q) \in \mathbb{Z}[q]$ of $\tau_M^{P\mathfrak{g}}(\xi)$.
Formally substitute $q = (q - 1) + 1$ in $f(q)$:

\[ f(q) = c_{r,0} + c_{r,1}(q - 1) + \cdots + c_{r,r-2}(q - 1)^{r-2} + \cdots \]


The integers $c_{r,n}$ depend on $r$ and the representative
$f(q)$. It is easy to see that $c_{r,n} \pmod r$ does not depend
on the representative $f(q)$ and hence is an invariant of
QHS. The dependence on $r$ is a big drawback. The
theorem below says that there is a rational number $c_n$,
not depending on $r$, such that $c_{r,n} \pmod r$ is the
reduction of either $c_n$ or $-c_n$ modulo $r$, for sufficiently
large prime $r$. It is easy to see that if such $c_n$ exists, it
must be unique. Let $s$ be the number of positive roots
of $\mathfrak{g}$. Recall that $\ell$ is the rank of $\mathfrak{g}$.
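
The formal substitution $q = (q-1) + 1$ and the independence of $c_{r,n} \pmod r$ from the representative can be checked on small examples. The sketch below is illustrative only: the function name `expand_around_one` and the sample polynomial are not from the source. It expands the polynomial $1 + q + \cdots + q^{r-1}$ around $q = 1$ and shows that all its coefficients below the top one are divisible by the prime $r$, which is why adding a multiple of it to $f$ does not change $c_{r,n}$ modulo $r$ for $n \le r-2$.

```python
from math import comb


def expand_around_one(coeffs):
    """Given f(q) = sum_j coeffs[j] * q**j, return c with f(q) = sum_n c[n] * (q-1)**n.

    Uses q**j = ((q-1) + 1)**j = sum_n C(j, n) * (q-1)**n.
    """
    return [
        sum(a * comb(j, n) for j, a in enumerate(coeffs))
        for n in range(len(coeffs))
    ]


if __name__ == "__main__":
    r = 5
    cyclotomic = [1] * r                      # 1 + q + ... + q^(r-1)
    print(expand_around_one(cyclotomic))      # coefficients are C(r, n+1)

    f = [3, 0, 2, 0, 0]                       # hypothetical representative 3 + 2 q^2
    g = [a + b for a, b in zip(f, cyclotomic)]  # another representative of the same class
    print(expand_around_one(f), expand_around_one(g))
```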


Theorem 8 For every QHS $M$, there is a sequence
of numbers

\[ c_n \in \mathbb{Z}\left[\frac{1}{(2n+2s)!\,|H_1(M, \mathbb{Z})|}\right] \]


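such that for sufficiently large prime $r$

\[ c_{r,n} \equiv \left(\frac{|H_1(M, \mathbb{Z})|}{r}\right)^{\ell} c_n \pmod r \]

where

\[ \left(\frac{|H_1(M, \mathbb{Z})|}{r}\right) = \pm 1 \]

is the Legendre symbol. Moreover, $c_n$ is an invariant
of order $\le 2n$.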
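
The Legendre symbol that fixes the sign in the congruence is straightforward to compute with Euler's criterion. This is a minimal sketch; the function name and the sample value 5 for $|H_1(M, \mathbb{Z})|$ are illustrative and not taken from the source.

```python
def legendre_symbol(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion.

    Returns 1 if a is a nonzero square mod p, -1 if it is a non-square,
    and 0 if p divides a.
    """
    value = pow(a, (p - 1) // 2, p)
    return -1 if value == p - 1 else value


if __name__ == "__main__":
    order_h1 = 5  # hypothetical cardinality of H_1(M, Z)
    for r in (7, 11, 13):
        print(r, legendre_symbol(order_h1, r))
```

Since $r$ is taken sufficiently large and prime, it does not divide $|H_1(M, \mathbb{Z})|$, so the symbol is $\pm 1$ as stated.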


The series $t_M^{P\mathfrak{g}}(q - 1) := \sum_{n=0}^{\infty} c_n (q - 1)^n$, called
the Ohtsuki series, can be considered as the
perturbative expansion of the function $\tau_M^{P\mathfrak{g}}$ at $q = 1$.
For actual calculation of $t_M^{P\mathfrak{g}}(q - 1)$, see Lê (2003),
Ohtsuki (2002), and Rozansky (1997).


Recovery from the LMO invariant It is known that
for any metrized Lie algebra $\mathfrak{g}$, there is a linear map
$W_{\mathfrak{g}}: \mathrm{Gr}_n\mathcal{A}(\emptyset) \to \mathbb{Q}$ (see Bar-Natan (1995)).


Theorem 9 One has

\[ \sum_{n=0}^{\infty} W_{\mathfrak{g}}(\mathrm{Gr}_n Z^{LMO})\, h^n = t_M^{P\mathfrak{g}}(q - 1)\big|_{q = e^h} \]


This shows that the Ohtsuki series $t_M^{P\mathfrak{g}}(q - 1)$ can
be recovered from, and hence totally determined by,
the LMO invariant. The theorem was proved by
Ohtsuki for $sl_2$. For other simple Lie algebras, the
theorem follows from the Århus integral (see Bar-Natan
et al. (2002a, b) and Ohtsuki (2002)).


Rozansky’s Gaussian Integral 


Rozansky (1997) gave a definition of the Ohtsuki
series using a formal Gaussian integral in an important
work. The work is only for $sl_2$, but can be
generalized to other Lie algebras; it is closer to the
original physics ideas of perturbative invariants.


Cyclotomic Expansion 
The Habiro Ring 


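Let us define the Habiro ring $\widehat{\mathbb{Z}[q]}$ by

\[ \widehat{\mathbb{Z}[q]} := \varprojlim_n \mathbb{Z}[q]/\big((1 - q)(1 - q^2) \cdots (1 - q^n)\big) \]

Habiro (2002) called it the cyclotomic completion
of $\mathbb{Z}[q]$. Formally, $\widehat{\mathbb{Z}[q]}$ is the set of all series of the
form

\[ f(q) = \sum_{n=0}^{\infty} f_n(q)\,(1 - q)(1 - q^2) \cdots (1 - q^n), \qquad f_n(q) \in \mathbb{Z}[q] \]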


Suppose $U$ is the set of roots of 1. If $\xi \in U$ then
$(1 - \xi)(1 - \xi^2) \cdots (1 - \xi^n) = 0$ if $n$ is big enough;
hence, one can define $f(\xi)$ for $f \in \widehat{\mathbb{Z}[q]}$. One can
consider every $f \in \widehat{\mathbb{Z}[q]}$ as a function with domain $U$.
Note that $f(\xi) \in \mathbb{Z}[\xi]$ is always an algebraic integer.
It turns out that $\widehat{\mathbb{Z}[q]}$ has remarkable properties,
and plays an important role in quantum topology.
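
The fact that the series terminates at a root of unity can be illustrated numerically. The sketch below is an assumption-laden illustration, not part of the source: `habiro_eval` is a hypothetical helper, the coefficient polynomials $f_n$ are passed as a Python callable `f_n(n, q)`, and floating-point complex numbers stand in for exact arithmetic in $\mathbb{Z}[\xi]$.

```python
import cmath


def habiro_eval(f_n, r, k=1):
    """Evaluate sum_n f_n(q) (1-q)(1-q^2)...(1-q^n) at xi = exp(2*pi*i*k/r).

    Every term with n >= r contains the factor (1 - xi**r) = 0, so the
    sum is truncated after n = r - 1.
    """
    xi = cmath.exp(2j * cmath.pi * k / r)
    total = 0
    product = 1  # empty product for n = 0
    for n in range(r):
        if n > 0:
            product *= 1 - xi ** n
        total += f_n(n, xi) * product
    return total


if __name__ == "__main__":
    # All f_n equal to 1: the series sum_n (1-q)(1-q^2)...(1-q^n).
    print(habiro_eval(lambda n, q: 1, 2))  # value at xi = -1
    print(habiro_eval(lambda n, q: 1, 3))  # value at a primitive cube root of unity
```

With all $f_n = 1$ the value at $\xi = -1$ is $1 + (1 - \xi) = 3$, and at a primitive cube root of unity it is $1 + (1 - \xi) + (1 - \xi)(1 - \xi^2) = 5 - \xi$.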

Note that the formal derivative of
$(1 - q)(1 - q^2) \cdots (1 - q^n)$ is divisible by
$(1 - q)(1 - q^2) \cdots (1 - q^k)$ for a suitable $k \ge (n - 1)/2$. This means every
element $f \in \widehat{\mathbb{Z}[q]}$ has a derivative $f' \in \widehat{\mathbb{Z}[q]}$, and hence
derivatives of all orders in $\widehat{\mathbb{Z}[q]}$. One can then
associate to $f \in \widehat{\mathbb{Z}[q]}$ its Taylor series at a root $\xi$ of 1:

\[ T_\xi f := \sum_{n=0}^{\infty} \frac{f^{(n)}(\xi)}{n!}\,(q - \xi)^n \]

which can also be obtained by noticing that
$(1 - q)(1 - q^2) \cdots (1 - q^n)$ is divisible by $(q - \xi)^k$ if $n$ is
bigger than $k$ times the order of $\xi$. Thus, one has a
map $T_\xi: \widehat{\mathbb{Z}[q]} \to \mathbb{Z}[\xi][[q - \xi]]$.


Theorem 10 (Habiro 2004)


(i) For each root of unity $\xi$, the map $T_\xi$ is injective,
that is, a function in $\widehat{\mathbb{Z}[q]}$ is determined by its
Taylor expansion at a point in the domain $U$.

(ii) If $f(\xi) = g(\xi)$ at infinitely many roots $\xi$ of prime
power orders, then $f = g$ in $\widehat{\mathbb{Z}[q]}$.


One important consequence is that $\widehat{\mathbb{Z}[q]}$ is an
integral domain, since we have the embedding

\[ T_1: \widehat{\mathbb{Z}[q]} \to \mathbb{Z}[[q - 1]] \]

In general the Taylor series $T_1 f$ has 0 convergence
radius. However, one can speak about $p$-adic
convergence to $f(\xi)$ in the following sense. Suppose
the order $r$ of $\xi$ is a power of a prime, $r = p^k$. Then it is
known that $(\xi - 1)^n$ is divisible by $p^m$ if $n \ge m\,p^{k-1}(p - 1)$.
Hence, $T_1 f(\xi)$ converges in the $p$-adic topology, and
it can be easily shown that the limit is exactly $f(\xi)$.

The above properties suggest considering $\widehat{\mathbb{Z}[q]}$ as
a class of “analytic functions” with domain $U$.


Quantum Invariants as an Element of $\widehat{\mathbb{Z}[q]}$


It was proved, by Habiro for $sl_2$ and by Habiro with
the present author for general simple Lie algebras,
that quantum invariants of ZHSs belong to $\widehat{\mathbb{Z}[q]}$
and thus have remarkable integrality properties:


Theorem 11


(i) For every ZHS $M$, there is an invariant $I_M^{\mathfrak{g}} \in \widehat{\mathbb{Z}[q]}$
such that if $\xi$ is a root of unity for which
the quantum invariant $\tau_M^{P\mathfrak{g}}(\xi)$ can be defined,
then $I_M^{\mathfrak{g}}(\xi) = \tau_M^{P\mathfrak{g}}(\xi)$.

(ii) The Ohtsuki series is equal to the Taylor series
of $I_M^{\mathfrak{g}}$ at 1.


Corollary 1 Suppose $M$ is a ZHS.


(i) For every root of unity $\xi$, the quantum invariant
at $\xi$ is an algebraic integer, $\tau_M^{P\mathfrak{g}}(\xi) \in \mathbb{Z}[\xi]$. (No
restriction on the order of $\xi$ is required.)

(ii) The Ohtsuki series $t_M^{P\mathfrak{g}}(q - 1)$ has integer coefficients.
If $\xi$ is a root of order $r = p^k$, where $p$ is
prime, then the Ohtsuki series at $\xi$ converges
$p$-adically to the quantum invariant at $\xi$.

(iii) The quantum invariant $\tau_M^{P\mathfrak{g}}$ is determined by
values at infinitely many roots of prime power
orders and also determined by its Ohtsuki series.

(iv) The LMO invariant totally determines the
quantum invariants $\tau_M^{P\mathfrak{g}}$.

Part (ii) was conjectured by R Lawrence for $sl_2$
and first proved by Rozansky (also for $sl_2$). Part (iv)
follows from the fact that the LMO invariant
determines the Ohtsuki series; it exhibits another
universality property of the LMO invariant.

See also: Finite-Type Invariants; Knot Invariants and
Quantum Gravity; Lie Groups: General Theory; Quantum
3-Manifold Invariants.


Introduction


Morse theory allows one to reconstruct the homology 
of a compact manifold B from data obtained from the 
gradient flow of a function $f: B \to \mathbb{R}$, the Morse
function. The term “Floer homology” is used to 
describe homology groups that arise from carrying 
out the same construction, but in a setting where the 
space B is replaced by an infinite-dimensional mani- 
fold (a space of maps, or a space of configurations for a 
gauge theory), and where the gradient trajectories of 
the Morse function correspond to solutions of an 
elliptic differential equation. There are two important 
types of such homology theories that have been 
extensively developed, and the study of both was 
initiated in the 1980s by Andreas Floer. In the first 
type, the elliptic equation that arises is a Cauchy-Riemann
equation, whose solutions are pseudoholomorphic
maps from a two-dimensional domain into a
symplectic manifold. In the second type, the elliptic
equation is an equation of gauge theory on a
4-manifold: either the anti-self-dual Yang-Mills
equations or the Seiberg-Witten equations. Important
antecedents of Floer’s work included work of Conley, 
Zehnder, and others on the symplectic fixed-point 
problem, and Witten’s ideas about Morse theory. 

This article describes the background material 
from Morse theory before discussing Floer homol- 
ogy of Cauchy—Riemann type and its application to 
the Arnol’d conjecture in symplectic topology. Floer 
homology in the context of four-dimensional gauge 
theories is discussed more briefly. 


Morse Theory 


Let $B$ be a smooth, compact manifold and $f: B \to \mathbb{R}$
a smooth function. A critical point $p$ of $f$ is said to
be nondegenerate if the Hessian of $f$ is a nonsingular
operator on $T_pB$. The function $f$ is a Morse function
if all its critical points are nondegenerate. In the
presence of a Riemannian metric $g$ on $B$, the
derivative $df$ becomes a vector field, the gradient
$\nabla f$, and we can consider the downward gradient-flow
equation for a path $x(s)$ in $B$:

\[ \frac{dx}{ds} = -\nabla f(x) \]

If $p$ and $q$ are nondegenerate critical points, let us
write $M(p, q)$ for the space of solutions $x(s)$
satisfying

\[ \lim_{s \to -\infty} x(s) = p, \qquad \lim_{s \to +\infty} x(s) = q \]

To understand the structure of $M(p, q)$, consider the
linearization of the gradient-flow equation at a
solution $x \in M(p, q)$. This is a linear equation for a
vector field $X$ along the path $x$ in $B$, and takes the
form

\[ \nabla_{d/ds} X = -\nabla\nabla f(X) \qquad [1] \]

where $\nabla\nabla f$ is the covariant derivative of the
gradient $\nabla f$, an operator on tangent vectors. Let $e_x$
be the dimension of the space of solutions $X$ to this
linear equation, with the boundary conditions
$\lim_{s \to \pm\infty} X(s) = 0$, and let $e_x^*$ be the dimension of
the space of solutions to the adjoint equation

\[ \nabla_{d/ds} X = +\nabla\nabla f(X) \]

We say that the trajectory $x$ is “regular” if $e_x^* = 0$. In
this case, the trajectory space $M(p, q)$ has the
structure of a smooth manifold near $x$: its dimension
is $e_x$ and its tangent space is the space of solutions $X$
to [1]. The gradient flow is said to be Morse-Smale
if all trajectories between critical points are regular.
If $f$ is any Morse function, one can always choose
the metric $g$ so that the corresponding flow is
Morse-Smale. (It is also the case that one can leave
$g$ fixed and perturb $f$ to achieve the same effect.)


In the Morse-Smale case, each $M(p, q)$ is a
smooth manifold. The dimension of $M(p, q)$ in the
neighborhood of a trajectory $x$ depends only on
$p$ and $q$, not otherwise on $x$. Indeed, even without
the regularity condition, the index of eqn [1],
namely the difference $e_x - e_x^*$, is given by

\[ e_x - e_x^* = \mathrm{index}(p) - \mathrm{index}(q) \]

where $\mathrm{index}(p)$ denotes the number of negative
eigenvalues (counting multiplicity) of the Hessian
at $p$. In the Morse-Smale case therefore, the
dimension of $M(p, q)$ is given by $\mathrm{index}(p) - \mathrm{index}(q)$.
If $x(s)$ is a solution of the gradient-flow
equation, then so is the reparametrized trajectory
$x(s + c)$; and this is different from $x(s)$ as long as
$p \neq q$. Let us denote by $\check{M}(p, q)$ the quotient of
$M(p, q)$ by the action of $\mathbb{R}$ given by these reparametrizations.
We have

\[ \dim \check{M}(p, q) = \mathrm{index}(p) - \mathrm{index}(q) - 1 \qquad (p \neq q) \]

as long as the trajectory space is nonempty.
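
A simple instance of these dimension formulas, given here only as an illustrative sketch, is the height function on the round 2-sphere. Write $N$ and $S$ for the north and south poles (names introduced for this example):

```latex
\[
B = S^2 \subset \mathbb{R}^3, \quad f(x, y, z) = z, \qquad
\mathrm{index}(N) = 2, \quad \mathrm{index}(S) = 0 ,
\]
\[
M(N, S) \cong S^2 \setminus \{N, S\}, \qquad
\dim M(N, S) = 2 = \mathrm{index}(N) - \mathrm{index}(S),
\]
\[
\check{M}(N, S) \cong S^1 \ (\text{one trajectory per meridian}), \qquad
\dim \check{M}(N, S) = 1 = \mathrm{index}(N) - \mathrm{index}(S) - 1 .
\]
```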

Let $\mathbb{F}_2$ denote the field with two elements. The
Morse complex of a Morse-Smale gradient flow,
with coefficients in $\mathbb{F}_2$, is defined as follows. For
each $i$, let $C_i(f)$ be the finite-dimensional vector
space over $\mathbb{F}_2$ having a basis

\[ e_{p_1}, \dots, e_{p_{r_i}} \]

indexed by the critical points $p_1, \dots, p_{r_i}$ with index $i$.
For each pair of critical points $p$ and $q$ with indices
$i$ and $i - 1$ respectively, let $\epsilon_{pq} \in \mathbb{F}_2$ denote the
number of points in the zero-dimensional manifold
$\check{M}(p, q)$, counted mod 2:

\[ \epsilon_{pq} = \#\check{M}(p, q) \pmod 2 \]

The Morse-Smale condition ensures that the zero-dimensional
space $\check{M}(p, q)$ is finite, so this definition
is satisfactory. Define a differential

\[ \partial: C_i(f) \to C_{i-1}(f) \]

by

\[ \partial(e_p) = \sum_{\mathrm{index}(q) = i-1} \epsilon_{pq}\, e_q \]

The first important fact is that $\partial$ really is a
differential: as long as the flow is Morse-Smale,
we have

\[ \text{the composite } \partial \circ \partial: C_i(f) \to C_{i-2}(f) \text{ is zero} \qquad [2] \]

We can therefore construct the homology of the
complex $(C_*(f), \partial)$. This is the Morse homology:

\[ H_i(f) = \frac{\ker(\partial: C_i(f) \to C_{i-1}(f))}{\mathrm{im}(\partial: C_{i+1}(f) \to C_i(f))} \qquad [3] \]

The proof of [2] is as follows. Suppose that $p$ has
index $i$ and $r$ is a critical point with index $i - 2$, and
consider $\check{M}(p, r)$, which has dimension 1. The key
step is to understand that $\check{M}(p, r)$ is noncompact,
and that its ends correspond to “broken trajectories”:
pairs $(x_1, x_2)$ (modulo reparametrization),
where $x_1$ is a gradient trajectory from $p$ to some $q$ of
index $i - 1$, and $x_2$ is a trajectory from $q$ to $r$. The
number of ends is thus $\sum_q \epsilon_{qr}\epsilon_{pq}$. Since the number
of ends of a 1-manifold is even, this sum is zero in
$\mathbb{F}_2$. This sum is also the matrix entry of $\partial \circ \partial$ from $e_p$
to $e_r$; so $\partial \circ \partial = 0$.
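
Once the trajectory counts are known, the Morse homology over the field with two elements is a matter of linear algebra. The following sketch is illustrative only: the function names and the two sets of trajectory counts (a circle with two maxima and two minima, and a tilted torus) are assumptions chosen for the example, not data from the source.

```python
def rank_gf2(matrix):
    """Rank of an integer matrix after reducing its entries mod 2."""
    rows = [[entry % 2 for entry in row] for row in matrix]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def morse_betti(num_critical, counts):
    """Dimensions of the mod 2 Morse homology groups.

    num_critical[i] is the number of critical points of index i.
    counts[i][a][b] is the number of unparametrized trajectories from the
    a-th critical point of index i to the b-th critical point of index i-1.
    """
    top = len(num_critical)
    ranks = {i: rank_gf2(counts.get(i, [])) for i in range(top + 1)}
    return [num_critical[i] - ranks[i] - ranks[i + 1] for i in range(top)]


if __name__ == "__main__":
    # Circle with two maxima and two minima: each maximum flows once to each minimum.
    print(morse_betti([2, 2], {1: [[1, 1], [1, 1]]}))
    # Tilted torus: one minimum, two saddles, one maximum, two trajectories per pair.
    print(morse_betti([1, 2, 1], {1: [[2], [2]], 2: [[2, 2]]}))
```

In the circle example the differential has rank 1, so two of the four generators cancel; in the torus example every count is even and the differential vanishes mod 2.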

The main result about Morse homology in finite 
dimensions is the following: 


Theorem 1 The Morse homology $H_i(f)$ is isomorphic
to the ordinary homology of the compact
manifold $B$ with coefficients $\mathbb{F}_2$: the group $H_i(B; \mathbb{F}_2)$.


This result can be proved by first showing that
$H_i(f)$ depends only on $B$, not on the choice of $f$ or
the metric. (This step can be accomplished by
examining a nonautonomous flow of the form
$dx/ds = -\nabla f(s, x)$.) Then one can examine the
Morse complex in the case of a self-indexing 
Morse function (where the value of f at the critical 
points is a monotone-increasing function of their 
index). In the self-indexing case, the unstable 
manifolds of the critical points give rise to a cell 
decomposition of the manifold B, and the Morse 
complex is easily identified with the cellular chain 
complex for this cell decomposition. 

The sum of the dimensions of the Morse 
homology groups cannot be larger than the sum of 
the dimensions of the chain groups C;(f), which is 
the total number of critical points. The above 
theorem therefore implies the following basic ver- 
sion of the “Morse inequalities”: 


Corollary 2 The number of critical points of a
Morse function $f: B \to \mathbb{R}$ cannot be less than
$\sum_i \dim H_i(B; \mathbb{F}_2)$.


The Morse complex can be refined in various
ways. For example, one can use integer coefficients
in place of coefficients $\mathbb{F}_2$ by taking account of
orientations of the spaces of trajectories. One can
also introduce Morse theory with coefficients in a
local system, and in both these cases a version of the
above theorem continues to hold. One can also
study the Morse complex of a multivalued Morse
function: that is, one can start with a closed 1-form $\alpha$
on $B$, with nontrivial periods, and study the flow
generated by the corresponding vector field $-g^{-1}\alpha$.
Such a theory was developed by Novikov.

The Morse complex can be generalized in a
different direction, replacing $f$ by a functional
related to a geometric problem. The canonical
example of this (and one of the very few cases in
which the theory works as in the finite-dimensional
case) is the case that $B = LW$ is the space of loops
$u: S^1 \to W$ in a Riemannian manifold $W$ and $f$ is the
“energy function,” $f_E(u) = \int |du/dt|^2\,dt$. If the
Morse-Smale condition holds, then the Morse
homology $H_i(f_E)$ computes the homology of $LW$,
as expected. Critical points of $f_E$ are geodesics, and
the relationship between geodesics and the topology
of $LW$, for which Corollary 2 provides a prototype,
is an idea with many applications.

For the energy functional, the downward gradient- 
flow equation is a parabolic equation (the ordinary 
heat equation if the target space is Euclidean), and 
a solution to the flow exists for each choice of 
initial condition. Floer homology can be loosely 
characterized as the Morse theory of certain 
variational problems for which the gradient-flow 
equation is not parabolic, but elliptic of first order: 
the important models are the Cauchy—Riemann 
equation in dimension 2, the anti-self-dual Yang- 
Mills equations in dimension 4, or the closely 
related Seiberg-Witten equations. For an elliptic 
equation, one does not expect to solve the Cauchy 
problem with arbitrary initial condition; so with 
Floer homology, one is studying a functional for 
which the gradient flow is not everywhere defined. 
However, to define the Morse complex, the import- 
ant thing is only that we have a good understanding 
of the trajectory spaces M(p, q), which will now be 
solution spaces for an elliptic problem of geometric 
origin. The proof of Theorem 1 depends very much 
on the fact that the flow is everywhere defined: this 
theorem will therefore fail for the Morse complexes 
arising in Floer theory, and one must look else- 
where for a means to compute the Morse homology 
groups. 

Before discussing Floer homology in more specific 
terms, we shall describe the problem in symplectic 
geometry that motivated its development. 


The Arnol’d Conjecture 


A symplectic manifold of dimension $2n$ is a smooth
manifold $W$ equipped with a 2-form $\omega$ which is
closed and nondegenerate. On a symplectic manifold,
one can associate to each smooth function
$H: W \to \mathbb{R}$ a vector field $X_H$ on $W$: the vector field
is characterized by the property that

\[ \omega(X_H, V) = dH(V) \]

for all vector fields $V$. In this situation, one refers to
$H$ as the Hamiltonian and $X_H$ as the corresponding
Hamiltonian vector field. If $W$ is compact, or if $X_H$
is otherwise complete, then this vector field generates
a flow $\phi_t: W \to W$ ($t \in \mathbb{R}$). We also wish to
consider the case that $H$ is time dependent: we
suppose that $H_t: W \to \mathbb{R}$ is a Hamiltonian which
varies smoothly with $t \in \mathbb{R}$ and is periodic, in that
$H_{t+1} = H_t$. In this case, there is a time-dependent
Hamiltonian vector field $X_t$, and we can consider
the flow $\phi_t$ that it generates: so for $x \in W$, the path
$\phi_t(x)$ will be the solution to

\[ \frac{d}{dt}\phi_t(x) = X_t(\phi_t(x)) \qquad [4] \]

with initial condition $\phi_0(x) = x$. The Arnol’d conjecture,
in one formulation, concerns the 1-periodic
solutions to this equation, or equivalently the fixed
points of $\phi_1: W \to W$. A fixed point $x$ with $\phi_1(x) = x$
is called nondegenerate if $d\phi_1: T_xW \to T_xW$ does not
have 1 as an eigenvalue. With this understood, one
version of the conjecture states:


Conjecture 3 Suppose W is compact and let $H_t$ be
any 1-periodic, time-dependent Hamiltonian. If the
fixed points of $\phi_1$ are all nondegenerate, then the
number of fixed points is not less than the sum of 
the Betti numbers of the manifold W. 


There is another, more general version of this
conjecture. Let $L \subset W$ be a closed Lagrangian
submanifold: that is, an n-dimensional submanifold
such that the restriction of $\omega$ to L as a 2-form is
identically zero. Let $L' \subset W$ be another Lagrangian,
obtained from L by a Hamiltonian isotopy: that is,
$L'$ is $\phi_1(L)$, for some flow $\phi_t$ generated by a
time-dependent Hamiltonian $H_t$ as above.


Question 4 If L and $L'$ intersect transversely, is it
always true that the number of intersection points of
L and $L'$ is at least the sum of the Betti numbers of
the manifold L:

$$\#(L \cap L') \ge \sum_i \operatorname{rank} H_i(L)\,?$$


This is phrased as a question rather than a 
conjecture, because the answer is certainly “no” in 
some cases. For example, L might be a circle
contained in a small disk in a symplectic 2-manifold,
in which case there is no reason why $\phi_1$ should not
move the disk to be completely disjoint from itself.
Nevertheless, with extra hypotheses, it is known 
that the answer is often “yes.” 

We can exhibit Conjecture 3 as a special case of
Question 4, as follows. Given a symplectic manifold
$(V, \omega)$, we can form the product $W = V \times V$, with the
symplectic form $\omega_W = -p_1^*\omega + p_2^*\omega$, where the $p_i$ are
the two projections. The result of this definition is
that the diagonal in $V \times V$ is a Lagrangian
submanifold,

$$L \subset W = V \times V$$

for this symplectic form. Let $H_t$ be a time-dependent
Hamiltonian on V, and let $\phi_t : V \to V$ be the flow.
Then $H_t \circ p_2$ is a time-dependent Hamiltonian
generating a flow on W. For the flow on W, the
image $L'$ of the diagonal $L \subset W$ at time 1 is the
graph of $\phi_1 : V \to V$. Thus, $L \cap L'$ can be identified
with the set of fixed points of $\phi_1$ in V, and an
affirmative answer to Question 4 for $L \subset W$ implies
Conjecture 3 for V.
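
The Lagrangian property of the diagonal, and of the graph of a symplectic map, can be checked by hand in the simplest linear model. The sketch below assumes $V = \mathbb{R}^2$ with the standard form $\omega(a, b) = a_1 b_2 - a_2 b_1$ and considers only linear maps; the function names are hypothetical and serve the illustration only.

```python
def omega(a, b):
    """Standard symplectic form on R^2 (assumed model for V)."""
    return a[0] * b[1] - a[1] * b[0]

def omega_W(x, y):
    """Product form -p1^* omega + p2^* omega on R^2 x R^2.

    x and y are 4-tuples (first factor, second factor).
    """
    return -omega(x[:2], y[:2]) + omega(x[2:], y[2:])

def diagonal(v):
    """Tangent vector (v, v) to the diagonal."""
    return tuple(v) + tuple(v)

def graph(A, v):
    """Tangent vector (v, A v) to the graph of the linear map A (2x2 nested tuples)."""
    Av = (A[0][0] * v[0] + A[0][1] * v[1],
          A[1][0] * v[0] + A[1][1] * v[1])
    return tuple(v) + Av
```

For any pair of vectors, `omega_W(diagonal(v), diagonal(w))` vanishes, and the same holds for `graph(A, v)` and `graph(A, w)` whenever A has determinant 1, since the form evaluates to $(\det A - 1)\,\omega(v, w)$. A map that is not area preserving, such as a stretch, gives a graph that is not Lagrangian.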

Conjecture 3 and Question 4 can both be 
extended to the case of isolated degenerate fixed 
points of $\phi_1$ for Conjecture 3, or to the case of
isolated, nontransverse intersections for Question 4. 
For example, one can ask whether, in the non- 
transverse case, the sum of the intersection multi- 
plicities can ever be less than the sum of the Betti 
numbers. 


Morse Theory and the Arnol’d Conjecture 


The Arnol’d conjecture, and the related Question 4, 
can both be studied by reformulating them as 
questions about the number of critical points of a 
carefully chosen functional. 

We begin with the situation addressed by Conjecture 3.
For simplicity, we suppose that $\pi_2(W)$ is
zero. Let B be the space of smooth, null-homotopic
loops in W:

$$B = \{ u : S^1 \to W \mid u \text{ is smooth and null homotopic} \}$$

This is a smooth, infinite-dimensional manifold.
There is a natural functional $f_0 : B \to \mathbb{R}$, the
symplectic action, defined as

$$f_0(u) = \int_{D^2} v^*\omega$$

where $v : D^2 \to W$ is any extension of the map
$u : S^1 \to W$. The extension v exists because u is null
homotopic, and the value of $f_0$ is independent of the
choice of v because $\pi_2(W) = 0$. This functional can be
modified in the presence of a periodic Hamiltonian.
Introduce a coordinate t on $S^1$ with period 1, and so
regard u as a periodic function of t. Write the
Hamiltonian as $H_t$ as before, and define

$$f(u) = f_0(u) + \int_0^1 H_t(u(t)) \, dt$$

To compute the first variation of f, consider a one-parameter
family of loops $u_s(t) = u(s, t)$ parametrized
by $s \in \mathbb{R}$. We compute

$$\frac{d}{ds} f(u_s) = \int_0^1 \omega\!\left(\frac{\partial u}{\partial s}, \frac{\partial u}{\partial t}\right) dt + \int_0^1 dH_t\!\left(\frac{\partial u}{\partial s}\right) dt = \int_0^1 \omega\!\left(\frac{\partial u}{\partial s}, \frac{\partial u}{\partial t} - X_t(u)\right) dt$$

using the relationship between $dH_t$ and $X_t$. Thus, a
loop $u \in B$ is a critical point of $f : B \to \mathbb{R}$ if and only
if it is a solution of the equation

$$\frac{du}{dt} = X_t(u(t)) \qquad [5]$$


This means that there is a one-to-one correspondence
between these critical points and certain
1-periodic solutions of eqn [4]: these in turn
correspond to fixed points p of $\phi_1$ with the
additional property that the path $\phi_t(p)$ from p to
p is null homotopic.

To consider the formal gradient flow of the
functional f, one must introduce a metric on B. A
Riemannian metric g on the symplectic manifold
$(W, \omega)$ is compatible with $\omega$ if there is an
almost-complex structure $J : TW \to TW$ such that
$\omega(X, Y) = g(JX, Y)$ for all tangent vectors X and Y
at any point of W. Let $g_t$ be a 1-periodic family of
compatible Riemannian metrics on W. Using these,
one can define an inner product on the tangent
bundle of B by the formula

$$\langle U, V \rangle = \int_0^1 g_t(U(t), V(t)) \, dt$$

in which U and V are tangent vectors at $u \in B$,
regarded as vector fields along the loop u in W. We
can rewrite the above formula for the variation of
f in terms of this inner product:

$$\frac{d}{ds} f(u_s) = \left\langle \frac{\partial u}{\partial s},\; J_t\!\left(\frac{\partial u}{\partial t} - X_t(u)\right) \right\rangle$$

where $J_t$ is the almost-complex structure corresponding
to $g_t$. Formally then, a one-parameter
family of loops u(s, t) is a solution of the downward
gradient-flow equations for the functional f with
respect to this metric, if u satisfies the differential
equation

$$\frac{\partial u}{\partial s} + J_t\!\left(\frac{\partial u}{\partial t} - X_t(u)\right) = 0 \qquad [6]$$

In the absence of the term $X_t$, and with W replaced
by $\mathbb{C}^n$ with the standard J, this equation becomes the
Cauchy-Riemann equation $\partial u / \partial \bar{z} = 0$, for a function
u of the complex variable $z = s + it$, periodic in t.
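
The reduction of eqn [6] to the Cauchy-Riemann equation can be checked on an explicit example. The sketch below assumes $W = \mathbb{C}$ with the standard complex structure and no Hamiltonian term, and evaluates $\partial u/\partial s + i\,\partial u/\partial t$ by central differences; the test functions and the names `cr_residual`, `holomorphic`, and `antiholomorphic` are illustrative choices, not part of the original text.

```python
import cmath
import math

def cr_residual(u, s, t, h=1e-5):
    """Finite-difference value of du/ds + i du/dt at the point (s, t)."""
    du_ds = (u(s + h, t) - u(s - h, t)) / (2 * h)
    du_dt = (u(s, t + h) - u(s, t - h)) / (2 * h)
    return du_ds + 1j * du_dt

def holomorphic(s, t):
    """u = exp(2*pi*z) with z = s + it; 1-periodic in t."""
    return cmath.exp(2 * math.pi * complex(s, t))

def antiholomorphic(s, t):
    """u = exp(2*pi*conj(z)); also 1-periodic in t, but not a solution."""
    return cmath.exp(2 * math.pi * complex(s, -t))
```

The residual of `holomorphic` is zero up to discretization error, whereas for `antiholomorphic` it equals $4\pi \bar{u}$ and is therefore far from zero: periodicity in t alone does not make a map a trajectory.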
Let us now suppose we are in the situation of
Conjecture 3, so W is closed, and the fixed points of
$\phi_1$ are nondegenerate. As we have seen, each fixed
point p of $\phi_1$ corresponds to a 1-periodic solution $u_p$
of eqn [5], a critical point of f. For each pair of fixed
points p and q, introduce M(p,q) as the space of
solutions of the formal gradient-flow equations of f,
running from p to q: that is, M(p,q) is the space of
maps $u : \mathbb{R} \times S^1 \to W$ satisfying eqn [6], with

$$\lim_{s \to -\infty} u(s, t) = u_p(t), \qquad \lim_{s \to +\infty} u(s, t) = u_q(t)$$


With these definitions in place, one can follow the 
same sequence of steps that we outlined previously 
in the context of finite-dimensional Morse theory, to 
construct the Morse complex. First, if u belongs to
M(p,q), we can consider the linearization at u of
eqns [6], to obtain the counterpart of eqn [1]. These
are linear equations for a vector field U(s,t) along u
in W, and take the form

$$\nabla_{\partial/\partial s} U + J_t \nabla_{\partial/\partial t} U + b(U) = 0 \qquad [7]$$

where b is a linear operator of order zero. Let $\epsilon_u$
denote the dimension of the space of solutions U which
decay at $s = \pm\infty$, and let $\epsilon_u^*$ denote the dimension of
the space of solutions of the formal adjoint
equation. Elliptic theory for the Cauchy-Riemann
equation, and the nondegeneracy condition for $u_p$
and $u_q$, mean that the operator that appears on the
left-hand side of the equation is Fredholm: so both
$\epsilon_u$ and $\epsilon_u^*$ are finite, and the index $\epsilon_u - \epsilon_u^*$ is
deformation invariant. This index depends only on
p and q: we give it a name,

$$\epsilon_u - \epsilon_u^* = \mathrm{index}(p, q)$$

As before, u is said to be regular if $\epsilon_u^*$ is zero. For
suitable choice of the almost-complex structures $J_t$
(or equivalently the metrics $g_t$), the Morse-Smale
condition will hold: that is, the trajectories in all
spaces M(p, q) are regular. In this case, each M(p, q)
is a smooth manifold and has dimension index(p,q)
if it is nonempty.

The “relative index” index(p,q) plays the role of 
the difference of the Morse indices in the finite- 
dimensional case. It can be defined whether or not 
M(p,q) is empty by considering an equation such as 
[7] along an arbitrary path u(s, t). In general, there is 
no natural way to define the “index” of p: if we
wish, we can select one fixed point $p_0$ and declare it
to have index zero; we can then define index(p) as
$\mathrm{index}(p, p_0)$. Alternatively, we can regard the critical
points as indexed by an affine copy of Z (without a 
preferred zero). 


Imitating the construction of the Morse complex,
we define a vector space $CF_*$ over $\mathbb{F}_2$ as having a
basis consisting of elements $e_p$ indexed by the fixed
points p. We then define $\partial : CF_* \to CF_*$ by

$$\partial e_p = \sum_{\mathrm{index}(p,q)=1} \partial_{pq} \, e_q$$

where $\partial_{pq}$ is defined by counting points in M(p, q) as
before. The vector space $CF_*$ is Z-graded if we make
a choice of critical point $p_0$ to have index zero;
otherwise, $CF_*$ has an “affine” Z-grading. The map
$\partial$ maps $CF_i$ into $CF_{i-1}$.

To show that $\partial$ is well defined, and to show that
$\partial \circ \partial = 0$, one must show that the zero-dimensional
spaces M(p,q) are compact, and that the ends of
the one-dimensional spaces M(p,r) correspond
bijectively to broken trajectories, as in the finite-dimensional
case. Both of these desired properties
hold, under the Morse-Smale conditions; but this is
a very special feature of the specific problem.
Without the hypothesis that $\pi_2(W)$ is zero, additional
noncompactness can arise from the following
“bubbling” phenomenon. There could be a
sequence of solutions $u^i \in M(p,q)$ to eqns [6], and
a point $(s_0, t_0)$ in $\mathbb{R} \times S^1$, such that for suitable
constants $\epsilon_i$ converging to zero, the rescaled
solutions

$$\tilde{u}^i(\sigma, \tau) = u^i(s_0 + \epsilon_i \sigma, t_0 + \epsilon_i \tau)$$

converge on compact subsets of the plane $\mathbb{R}^2$ to a
nonconstant pseudoholomorphic map $\tilde{u} : \mathbb{CP}^1 \to W$,
or more precisely a solution of the equation

$$\frac{\partial \tilde{u}}{\partial \sigma} + J_{t_0}\!\left(\frac{\partial \tilde{u}}{\partial \tau}\right) = 0$$

(In the original coordinates, the derivatives of the $u^i$
would grow like $1/\epsilon_i$ near $(s_0, t_0)$.) A pseudoholomorphic
sphere always has nontrivial homology
class (and therefore nontrivial homotopy class); so
this sort of noncompactness does not occur when
$\pi_2(W) = 0$.

Granted the compactness results, the proof that
$\partial \circ \partial = 0$ runs as before, and we can construct a Floer
homology group,

$$HF_* = \ker(\partial) / \operatorname{im}(\partial)$$


Unlike the Morse homology of the energy functional,
the Floer homology does not yield the
ordinary homology of B. To compute it, one first
shows that it depends only on the symplectic
manifold $(W, \omega)$, not on the choice of Hamiltonian
$H_t$ or metrics $g_t$: this step is similar to the proof that
the finite-dimensional Morse homology $H_*(f)$ does
not depend on the Morse function. Once one has
established this independence, $HF_*$ can be computed
by examining a special case. Floer did this by taking
the Hamiltonian to be independent of t and equal to
a small negative multiple $-\eta h$ of a fixed Morse
function $h : W \to \mathbb{R}$ on the symplectic manifold. If
the multiple $\eta \in \mathbb{R}$ is small enough, the only fixed
points of $\phi_1$ are the stationary points of the flow,
and these are exactly the critical points of h.
Furthermore the only index-1 solutions of eqn [6]
for small $\eta$ are the solutions u(s,t) with no t
dependence; and these are the solutions of
$du/ds = -\eta \nabla h$, the downward gradient flow of h,
scaled by $\eta$. In this case therefore, the Floer complex
$CF_*$ is precisely the Morse complex $C_*(h)$ of the
Morse function h, and Theorem 1 yields:


Theorem 5 For a periodic, time-dependent Hamiltonian
$H_t$ on a closed symplectic manifold $(W, \omega)$
with $\pi_2(W) = 0$, the Floer homology $HF_*$ is isomorphic
to the ordinary homology of W with $\mathbb{F}_2$
coefficients, $H_*(W; \mathbb{F}_2)$.


Because the generators of $CF_*$ correspond to fixed
points p of $\phi_1$ such that the path $\phi_t(p)$ is null
homotopic, the number of these fixed points is not
less than the dimension of $HF_*$, and therefore not
less than $\sum_i \dim H_i(W; \mathbb{F}_2)$ because of the above
result. The sum of the mod 2 Betti numbers is at 
least as large as the sum of the ordinary Betti 
numbers (the dimensions of the rational homology 
groups); so one deduces, following Floer, 


Corollary 6 The Arnol’d conjecture (Conjecture 3)
holds for symplectic manifolds $(W, \omega)$ satisfying the
additional condition $\pi_2(W) = 0$.
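
The counting argument behind this corollary (the number of generators of a complex is at least the total dimension of its homology) can be made concrete in the finite-dimensional setting. The following Python sketch computes the homology of a small chain complex over $\mathbb{F}_2$ from its boundary matrices. The function names and the two example complexes (a circle with two maxima and two minima, and a torus with four critical points) are assumptions made for illustration.

```python
def rank_gf2(rows):
    """Rank over F_2 of a matrix given as a list of 0/1 rows."""
    basis = {}  # leading-bit position -> reduced row stored as an int bitmask
    for row in rows:
        x = 0
        for bit in row:
            x = (x << 1) | (bit & 1)
        while x:
            pivot = x.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = x
                break
            x ^= basis[pivot]  # clear the leading bit and keep reducing
    return len(basis)

def betti_gf2(dims, boundaries):
    """Dimensions of the homology groups of a chain complex over F_2.

    dims[i] is the number of generators of index i; boundaries[i] is the
    matrix of the map C_i -> C_{i-1}, one row per generator of C_i, whose
    entries count trajectories mod 2.
    """
    ranks = {i: rank_gf2(m) for i, m in boundaries.items()}
    return [dims[i] - ranks.get(i, 0) - ranks.get(i + 1, 0)
            for i in range(len(dims))]
```

For the circle example, `betti_gf2([2, 2], {1: [[1, 1], [1, 1]]})` gives `[1, 1]`: four critical points, but total homology of dimension 2. For the torus example with all counts even, `betti_gf2([1, 2, 1], {1: [[0], [0]], 2: [[0, 0]]})` gives `[1, 2, 1]`, so the bound is attained.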


Orientations can be introduced rather as in the 
case of finite-dimensional Morse theory, allowing 
one to define Floer groups with arbitrary 
coefficients. 

The Arnol’d conjecture is now known to hold in
complete generality, without the hypothesis on $\pi_2$.
The proof has been achieved by successive extensions
of the Floer homology technique. When $\pi_2(W)$
is nonzero, the space B is not simply connected. The
first complication that arises is that the symplectic
action functional $f_0$, and therefore f also, is multivalued.
This is not an obstacle initially, because $\nabla f$
is still well defined, and the spaces M(p,q) of
gradient trajectories can still be assumed to satisfy
the Morse-Smale condition: this is the type of
Morse theory considered by Novikov, as mentioned
above. Because $\pi_1(B)$ is nontrivial, M(p, q) is a union
of parts $M_z(p,q)$, one for each homotopy class of
paths from p to q. For each homotopy class z, we
have the index $\mathrm{index}_z(p,q)$, which is the dimension
of $M_z(p, q)$.
The spaces $M_z(p,q)$ may now have additional
noncompactness, due to the presence of pseudoholomorphic
spheres $\tilde{u} : \mathbb{CP}^1 \to W$. The simplest
manifestation is when a sequence $u^i$ in $M_z(p,q)$
“bubbles off” a single such sphere at a point $(s_0, t_0)$,
and converges elsewhere to a smooth trajectory $u'$ in
$M_{z'}(p, q)$, belonging to a different homotopy class.
Let $\sigma$ be the homology class of the sphere $\tilde{u}$. Because
the sphere has positive area, the pairing of $\sigma$ with
the de Rham class $[\omega]$ is positive: $\langle [\omega], \sigma \rangle > 0$. The
indices are related by

$$\mathrm{index}_{z'}(p, q) = \mathrm{index}_z(p, q) - 2\langle c_1(W), \sigma \rangle$$

where $c_1(W) \in H^2(W; \mathbb{Z})$ is the first Chern class of a
compatible almost-complex structure. The symplectic
manifold is said to be “monotone” if, in real
cohomology, $c_1(W)$ is a positive multiple of $[\omega]$. In
the monotone case, we always have
$\mathrm{index}_{z'}(p,q) < \mathrm{index}_z(p,q)$, and no bubbling off can occur for
trajectory spaces $M_z(p,q)$ of index 2 or less: the
above formula either makes $M_{z'}(p,q)$ a space
of negative dimension (in which case it is empty)
or a zero-dimensional space (in which case one
has to exploit an additional transversality argument,
to show that the holomorphic spheres belonging
to classes $\sigma$ with $\langle c_1(W), \sigma \rangle = 1$ cannot intersect one
of the loops $u_p$ in W). Since the construction of
$HF_*$ involves only the trajectories of indices 1 and 2,
the construction goes through with minor changes.
Because $\mathrm{index}_z(p,q)$ depends on the path z,
the group $HF_*$ will no longer be Z-graded: the
grading is defined only modulo 2d, where d is the
smallest nonzero value of $\langle c_1(W), \sigma \rangle$ for spherical
classes $\sigma$.

In the case that W is not monotone, additional 
techniques are needed to deal with the essential 
noncompactness of the trajectory spaces. These 
techniques involve (amongst other things) multi- 
valued perturbations on orbifolds — a strategy that 
requires the use of rational coefficients in order to 
perform the necessary averaging. For this reason, in
the non-monotone case, the Arnol’d conjecture is known
to hold only in its original form: with the ordinary
(rational) Betti numbers.

To address Question 4 for Lagrangian intersections,
a closely related Floer homology theory is
used. Assume L is connected, and introduce the
space of smooth paths joining L to $L'$:

$$\Omega(W; L, L') = \{ u : [0,1] \to W \mid u(0) \in L,\ u(1) \in L' \}$$

Fix a point $x_0$ in L, and let $u_0$ be the path
$u_0(t) = \phi_t(x_0)$. Let B be the connected component
of $\Omega(W; L, L')$ containing $u_0$. On B we have a
symplectic action functional, defined as

$$f(u) = \int_{[0,1] \times [0,1]} v^*\omega$$

where $v : [0,1] \times [0,1] \to W$ is a path in B with
$v(0,t) = u_0(t)$ and $v(1,t) = u(t)$. The symplectic
action is single valued if $\pi_2(W, L)$ is trivial (even
though this condition does not guarantee that B is
simply connected). The critical points of f correspond
to constant paths whose image in W is an
intersection point of L and $L'$ (though not all such
constant paths belong to the connected component
B). If we fix a one-parameter family of compatible
metrics $g_t$ and almost-complex structures $J_t$ on W,
then we can consider the downward gradient
trajectories of the functional. These are maps

$$u : \mathbb{R} \times [0,1] \to W$$

satisfying the Cauchy-Riemann equation

$$\frac{\partial u}{\partial s} + J_t \frac{\partial u}{\partial t} = 0$$

with boundary conditions $u(s,0) \in L$ and $u(s,1) \in L'$.
With coefficients $\mathbb{F}_2$, a Morse complex can be
constructed much as in the case just considered. If
$\pi_2(W, L)$ is trivial, then the Floer homology group $HF_*$
obtained as the homology of this Morse complex is
isomorphic to $H_*(L; \mathbb{F}_2)$; and as a corollary, Question
4 has an affirmative answer in this case.

Without the hypothesis that $\pi_2(W, L)$ is trivial,
one does not expect an affirmative answer to
Question 4 in all cases. There is a “monotone”
case, in which $HF_*$ can always be defined; but it is
not always isomorphic to $H_*(L; \mathbb{F}_2)$: instead, there is
a spectral sequence relating the two. In the general
case, there is once again the need to use rational
coefficients in place of mod 2 coefficients, in order
to deal with the orbifold nature of the trajectory
spaces that appear. This raises the question of
orientability for the trajectory spaces. In contrast
to the Morse theory for Hamiltonian diffeomorphisms,
there is an obstruction to orientability,
involving spin structures on L and W. Even when
the trajectory spaces are orientable, there are further
obstructions to the existence of a Morse differential
satisfying $\partial \circ \partial = 0$. The theory of these obstructions
is developed in Fukaya et al. (2000). There are still
open questions in this area.


Instanton Floer Homology 


A “Floer homology theory” for 3-manifolds should
assign to each 3-manifold Y (satisfying perhaps some
additional topological requirements) a group, say
HF(Y). Furthermore, given a four-dimensional
cobordism W from $Y_1$ to $Y_2$, the theory should
provide a corresponding homomorphism of groups,
from $HF(Y_1)$ to $HF(Y_2)$. These homomorphisms
should satisfy the natural composition law for composite
cobordisms. One can formulate this by considering
the category in which an object is a closed, connected,
oriented 3-manifold Y, and in which the morphisms
from $Y_1$ to $Y_2$ are the oriented four-dimensional
cobordisms, considered up to diffeomorphism. A
Floer homology theory is then a functor from this
category (perhaps with some additional decorations or
restrictions) to the category of groups. Such a functor
was constructed by Floer (1988a), at least for the full
subcategory of homology 3-spheres (manifolds Y with
$H_1(Y; \mathbb{Z}) = 0$). We outline the construction.

Let $P \to Y$ be a principal SU(2) bundle (necessarily
trivial). Let $\mathcal{A}$ denote the space of SU(2) connections
in the bundle P, and let $A_0$ be any chosen basepoint
in $\mathcal{A}$. Any other $A \in \mathcal{A}$ can be written as $A_0 + a$, for
some 1-form a with values in the adjoint bundle
ad(P) whose fiber is the Lie algebra su(2). So $\mathcal{A}$ is an
affine space,

$$\mathcal{A} = A_0 + \Omega^1(Y; \mathrm{ad}(P))$$

and we can identify the tangent space $T_A\mathcal{A}$ at any
A with $\Omega^1(Y; \mathrm{ad}(P))$. The Chern-Simons functional
is a smooth function

$$CS : \mathcal{A} \to \mathbb{R}$$

depending on our choice of a reference connection
$A_0$. It can be defined by stating that its derivative at
$A \in \mathcal{A}$ is the linear map $T_A\mathcal{A} \to \mathbb{R}$ given by

$$a \mapsto -\int_Y \mathrm{tr}(a \wedge F_A)$$

where $F_A$ denotes the curvature of A, as an ad(P)-valued
2-form on Y, and tr denotes the trace of a
matrix-valued 3-form. If we equip Y with a
Riemannian metric, then we have the $L^2$ inner
product on $\Omega^1(Y; \mathrm{ad}(P))$, with respect to which we
can consider the gradient of CS. The formal downward
gradient-flow equation on $\mathcal{A}$ is then

$$\frac{d}{ds} A = -{*F_A} \qquad [8]$$

where $*$ is the Hodge star on Y. If A(s) is a solution
defined on an interval $[s_1, s_2]$, then we can form the
corresponding four-dimensional connection $\mathbf{A}$ on
$[s_1, s_2] \times Y$, and eqn [8] implies that $\mathbf{A}$ is a solution
of the anti-self-dual Yang-Mills equation, $F^+_{\mathbf{A}} = 0$.
Here $F^+_{\mathbf{A}}$ is the self-dual part of the curvature 2-form
on the cylinder. The critical points of CS are the flat
connections on Y, with $F_A = 0$.


Let $\mathcal{G}$ denote the gauge group, by which we mean
the group of automorphisms of P. When a trivialization
of P is chosen, $\mathcal{G}$ becomes the group of smooth
maps $g : Y \to SU(2)$. A connection $A \in \mathcal{A}$ is irreducible
if its stabilizer in $\mathcal{G}$ consists only of the constant gauge
transformations $\pm 1$. The functional CS is invariant
only under the identity component of $\mathcal{G}$: it descends to
a function $CS : \mathcal{A}/\mathcal{G} \to \mathbb{R}/(4\pi^2\mathbb{Z})$. If we choose a
basepoint in Y, then the gauge-equivalence classes of
flat connections in $\mathcal{A}$ are in one-to-one correspondence
with conjugacy classes of representations,

$$\rho : \pi_1(Y) \to SU(2)$$

Given representations $\rho$ and $\sigma$, we write $M(\rho, \sigma)$ for
the quotient by $\mathcal{G}$ of the space of trajectories A(s)
which satisfy the gradient-flow equation [8] and
which are asymptotic to flat connections belonging
to the classes $\rho$ and $\sigma$ as $s \to \pm\infty$. There is a purely
four-dimensional interpretation of $M(\rho, \sigma)$: it can be
identified with the moduli space of solutions A to
the anti-self-dual Yang-Mills equation, or “instantons,”
on $\mathbb{R} \times Y$, satisfying the same asymptotic
conditions.

One defines the “instanton Floer homology” of Y, 
roughly speaking, as the Morse homology arising 
from the functional CS. In the case that Y is a 
homology 3-sphere, Floer defined $I_*(Y)$ as the
homology $H_*(C, \partial)$ of a complex C whose generators
correspond to the irreducible representations $\rho$, and
whose differential $\partial$ is defined in terms of the one-dimensional
components of the moduli spaces
$M(\rho, \sigma)$. To carry out the construction of $I_*(Y)$, it
is necessary to perturb the functional CS to achieve
a Morse-Smale condition: this is done by adding a
function $f : \mathcal{A} \to \mathbb{R}$ defined in terms of the holonomy
of connections along families of loops in Y. The
group $\mathcal{G}$ is not connected, and for given $\rho$ and $\sigma$, the
moduli space $M(\rho, \sigma)$ has components differing in
dimension by multiples of 8. For this reason, $I_*(Y)$ is
a $\mathbb{Z}/8$-graded homology theory. It is a topological
invariant of Y, and is functorial for cobordisms, in
the manner outlined at the beginning of this section.

Various extensions have been made, to allow the
definition of $I_*(Y)$ for 3-manifolds with nontrivial $H_1$,
and to incorporate the reducible representations.
Although there have been some successes (Donaldson 
2002), a completely satisfactory general theory has not 
been constructed. The main difficulties stem from the 
noncompactness of the instanton moduli spaces (a 
bubbling phenomenon) and the interaction of this 
bubbling with the reducible solutions. 

The instanton Floer theory for 3-manifolds is 
closely tied up with Donaldson’s polynomial invari- 
ants of closed 4-manifolds, which are also defined 
using the anti-self-dual Yang-Mills equations. 
Seiberg-Witten Floer Homology


Seiberg-Witten Floer homology can be defined in a
manner very similar to the instanton case. Again, we
start with a Riemannian 3-manifold Y, equipped
now with a spin$^c$ structure $\mathfrak{s}$: a rank-2 Hermitian
vector bundle $S \to Y$ together with a Clifford multiplication
$\rho : \Lambda^*(Y) \to \mathrm{End}(S)$. The configuration
space $\mathcal{C}$ is defined as the space of pairs $(A, \Phi)$,
where A is a spin$^c$ connection and $\Phi$ is a section of S.
In place of the Chern-Simons functional considered
above, we have the Chern-Simons-Dirac functional
$CSD : \mathcal{C} \to \mathbb{R}$ defined by

$$CSD(A, \Phi) = \frac{1}{4} CS(\mathrm{tr}(A)) + \frac{1}{2} \int_Y \langle D_A \Phi, \Phi \rangle \, d\mathrm{vol}$$

where tr(A) denotes the connection induced by A on
the line bundle $\Lambda^2 S$ and $D_A$ is the Dirac operator for
the connection A. The functional is invariant again
under the identity component of the gauge group $\mathcal{G}$,
which this time is the group of maps $g : Y \to S^1$,
acting as automorphisms of S. The critical points are
the solutions $(A, \Phi)$ to the three-dimensional
“Seiberg-Witten equations,”

$$\tfrac{1}{2} \rho(F_{\mathrm{tr}(A)}) - (\Phi\Phi^*)_0 = 0$$
$$D_A \Phi = 0$$

in which the subscript 0 denotes the traceless part of
the endomorphism. If $\alpha$ and $\beta$ are gauge-equivalence
classes of critical points, then we write $M(\alpha, \beta)$ for
the quotient by $\mathcal{G}$ of the space of gradient trajectories
from $\alpha$ to $\beta$.

As in the instanton case, $M(\alpha, \beta)$ has a four-dimensional
interpretation: it is the quotient by the
four-dimensional gauge group of a space of solutions
$(A, \Phi)$ on $\mathbb{R} \times Y$ to the four-dimensional
Seiberg-Witten equations:

$$\tfrac{1}{2} \rho(F^+_{\mathrm{tr}(A)}) - (\Phi\Phi^*)_0 = 0$$
$$D^+_A \Phi = 0$$

Here $\Phi$ is a section of the summand $S^+$ of the four-dimensional
spin$^c$ bundle $S = S^+ \oplus S^-$, and
$D^+_A : \Gamma(S^+) \to \Gamma(S^-)$ is the four-dimensional Dirac operator.

The action of the gauge group on $\mathcal{C}$ is free except
at configurations with $\Phi = 0$. These reducible configurations
have an $S^1$ stabilizer. Reducible critical
points of CSD correspond to flat connections in the
line bundle $\Lambda^2 S$. We can now distinguish two cases,
according to whether $c_1(S)$ is a torsion class or not.

If $c_1(S)$ is not a torsion class, then there are no flat
connections in $\Lambda^2 S$, so all critical points are
irreducible. In this case, there is a straightforward
Floer-type Morse theory for the functional CSD on
the space $\mathcal{C}/\mathcal{G}$: for generators of our complex we
take the gauge-equivalence classes of critical points,
and we use the one-dimensional trajectory spaces
$M(\alpha, \beta)$ to define the boundary map. The resulting
Morse homology group is denoted $HM_*(Y, \mathfrak{s})$. It has
a canonical $\mathbb{Z}/2$-grading, and is a topological
invariant of Y and its spin$^c$ structure.

If $c_1(S)$ is torsion, the theory is more complex.
There will be reducible critical points, and one
cannot exclude these from the Morse complex and
still obtain a topological invariant of Y. One may
incorporate the reducible critical points in two
different ways, that are in a sense dual to one
another; and there is a third homology theory that
one can define, using the reducibles alone. Thus, one
can construct three Floer groups associated to Y
with the spin$^c$ structure $\mathfrak{s}$. The resulting theory
closely resembles the Heegaard Floer homology that
is described next.


Heegaard Floer Homology and Other 
Floer Theories 


Heegaard Floer homology is a Floer homology 
theory for 3-manifolds that is formally similar to 
Seiberg-Witten Floer homology, and conjecturally 
isomorphic to it. Unlike the instanton and Seiberg-Witten
theories, its construction, due to Ozsváth
and Szabó, does not use gauge theory. Instead, one
begins with a decomposition of the 3-manifold into
two handlebodies with common boundary $\Sigma$, and
one studies a symplectic manifold $s^g\Sigma$, the configuration
space of g-tuples of points on $\Sigma$, where
g denotes the genus. The Heegaard Floer groups are
then defined by a variant of the construction used
for Lagrangian intersections (see the section “Morse
theory and the Arnol’d conjecture”), applied to a
particular pair of Lagrangian tori in $s^g\Sigma$.

As in the case of Seiberg-Witten theory, Heegaard
Floer homology assigns to each oriented 3-manifold
Y three different Floer groups, $HF^+(Y)$, $HF^-(Y)$, and
$HF^\infty(Y)$, related by a long exact sequence:

$$\cdots \to HF^+(Y) \to HF^-(Y) \to HF^\infty(Y) \to HF^+(Y) \to \cdots$$

The first two groups are dual, in that there is
a nondegenerate pairing between $HF^+(Y)$ and
$HF^-(-Y)$, where $-Y$ denotes the same 3-manifold
with opposite orientation. If W is an oriented four-dimensional
cobordism from $Y_1$ to $Y_2$, then there
are associated functorial maps

$$F^+(W) : HF^+(Y_1) \to HF^+(Y_2)$$
$$F^-(W) : HF^-(Y_1) \to HF^-(Y_2)$$
$$F^\infty(W) : HF^\infty(Y_1) \to HF^\infty(Y_2)$$

In addition, if the intersection form of W is not
negative semidefinite, there is a map

$$F(W) : HF^-(Y_1) \to HF^+(Y_2)$$

As a special case, one can start with a closed
4-manifold X, and consider the cobordism W from
$S^3$ to $S^3$ obtained from X by removing two 4-balls.
In this case, the map

$$F(W) : HF^-(S^3) \to HF^+(S^3)$$

encodes a diffeomorphism invariant of the original
4-manifold X. This invariant is conjectured to be
equivalent to the Seiberg-Witten invariants of X.

Heegaard Floer homology, and its cousin Seiberg-Witten
Floer homology, have been applied successfully
to settle long-standing problems in topology,
particularly questions related to surgery on knots.
An example of such an application is the theorem of
Kronheimer et al. that one cannot obtain the
projective space $\mathbb{RP}^3$ by surgery on a nontrivial
knot in the 3-sphere.

In these and other applications of both Heegaard
and Seiberg-Witten Floer homology, two key properties
of the homology groups play an important part.
The first is a nonvanishing theorem, which shows, for
example, that these Floer groups can distinguish $S^1 \times S^2$
from any other manifold with the same homology.
The second is a long exact sequence, which relates the 
Floer groups of the manifolds obtained by three 
different surgeries on a knot. The latter property is 
shared by the instanton Floer groups, as was shown by 
Floer (Braam and Donaldson 1995). 

Other Floer-type theories have been considered, 
not all of which arise from a gradient flow, but in 
which the boundary map of the complex is obtained 
by counting solutions to a geometric differential 
equation. At the time of writing, Floer homology is 
an area of very active development. 


See also: Four-Manifold Invariants and Physics; Gauge 
The objective of this article is to give an overview
of some advanced numerical methods commonly 
used in fluid mechanics. The focus is set primarily 
on finite-element methods and finite-volume 
methods. 


Fluid Mechanics Models 
Let $\Omega$ be a domain in $\mathbb{R}^d$ ($d = 2, 3$) with boundary $\partial\Omega$
and outer unit normal n. $\Omega$ is assumed to be
occupied by a fluid. The basic equations governing
fluid flows are derived from three conservation
principles: conservation of mass, momentum, and
energy. Denoting the density by $\rho$, the velocity by u,
and the mass specific internal energy by $e_i$, these
equations are

$$\partial_t \rho + \nabla \cdot (\rho u) = 0 \qquad [1]$$
$$\partial_t (\rho u) + \nabla \cdot (\rho u \otimes u) = \nabla \cdot \sigma + \rho f \qquad [2]$$
$$\partial_t (\rho e_i) + \nabla \cdot (\rho u e_i) = \sigma : \varepsilon + q_T - \nabla \cdot j_T \qquad [3]$$

where $\sigma$ is the stress tensor, $\varepsilon = \frac{1}{2}(\nabla u + (\nabla u)^T)$ is
the strain tensor, f is a body force per unit mass
(gravity is a typical example), $q_T$ is a volume source
(it may model chemical reactions, Joule effects,
radioactive decay, etc.), and $j_T$ is the heat flux. In
addition to the above three fundamental conservation
equations, one may also have to add L
equations that account for the conservation of


other quantities, say $\phi_\ell$, $1 \le \ell \le L$. These quantities
may, for example, be the concentration of constituents
in an alloy, the turbulent kinetic energy, the
mass fractions of various chemical species by unit
volume, etc. All these conservation equations take
the following form:

$$\partial_t (\rho \phi_\ell) + \nabla \cdot (\rho u \phi_\ell) = q_{\phi_\ell} - \nabla \cdot j_{\phi_\ell}, \qquad 1 \le \ell \le L \qquad [4]$$

Henceforth, the index $\ell$ is dropped to alleviate the
notation.

The above set of equations must be supplemented
with initial and boundary conditions. Typical initial
conditions are $\rho|_{t=0} = \rho_0$, $u|_{t=0} = u_0$, and $\phi|_{t=0} = \phi_0$.
Boundary conditions are usually classified into
two types: the essential boundary conditions and
the natural boundary conditions. Natural conditions
impose fluxes at the boundary. Typical examples are

$$(\sigma \cdot n + R \cdot u)|_{\partial\Omega} = a_u, \qquad (j_T \cdot n + r_T e_i)|_{\partial\Omega} = a_T$$

and

$$(j_\phi \cdot n + r_\phi \phi)|_{\partial\Omega} = a_\phi$$

The quantities $R$, $r_T$, $r_\phi$, $a_u$, $a_T$, $a_\phi$ are given. Essential
boundary conditions consist of enforcing boundary
values on the dependent variables. One typical
example is the so-called no-slip boundary condition:
$u|_{\partial\Omega} = 0$.

The above system of conservation laws is closed
by adding three constitutive equations whose purpose is to relate each field $\sigma$, $j_T$, and $j_\phi$ to the fields
$\rho$, $u$, and $\phi$. They account for microscopic properties
of the fluid and thus must be frame-independent.
Depending on the constitutive equations and
adequate hypotheses on time and space scales,
various models are obtained. An important class of
fluid models is one for which the stress tensor is a
linear function of the strain tensor, yielding the so-called Newtonian fluid model:

$$\sigma = (-p + \lambda\nabla\cdot u) I + 2\mu\varepsilon$$ [5]


Here $p$ is the pressure, $I$ is the identity matrix, and $\lambda$
and $\mu$ are viscosity coefficients. Still assuming
linearity, common models for heat and solute fluxes
consist of assuming

$$j_T = -\kappa\nabla T, \qquad j_\phi = -D\nabla\phi$$ [6]


where T is the temperature. These are the so-called 
Fourier’s law and Fick’s law, respectively. 

Having introduced two new quantities, namely 
the pressure p and the temperature T, two new 
scalar relations are needed to close the system. These 
are the state equations. One admissible assumption
consists of setting $\rho = \rho(p, T)$. Another usual additional hypothesis consists of assuming that the
variations in the internal energy are proportional
to those in the temperature, that is, $\partial e_i = c_P\,\partial T$.

Let us now simplify the above models by
assuming that $\rho$ is constant. Then, mass conservation implies that the flow is incompressible, that is,
$\nabla\cdot u = 0$. Let us further assume that neither $\lambda$, $\mu$,
nor $p$ depend on $e_i$. Then, upon abusing the
notation and still denoting by $p$ the ratio $p/\rho$, and upon setting $\nu = \mu/\rho$, the
above set of assumptions yields the so-called
incompressible Navier-Stokes equations:

$$\nabla\cdot u = 0$$ [7]
$$\partial_t u + u\cdot\nabla u - \nu\Delta u + \nabla p = f$$ [8]


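As a result, the mass and momentum conservation
equations are independent of that of the energy and
those of the solutes:

$$\rho c_P(\partial_t T + u\cdot\nabla T) - \nabla\cdot(\kappa\nabla T) = 2\mu\,\varepsilon : \varepsilon + q_T$$ [9]

$$\partial_t\phi + u\cdot\nabla\phi - \frac{1}{\rho}\nabla\cdot(D\nabla\phi) = \frac{1}{\rho} q_\phi$$ [10]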


Another model allowing for a weak dependency of
$\rho$ on the temperature, while still enforcing incompressibility, consists of setting $\rho = \rho_0(1 - \beta(T - T_0))$.
If buoyancy effects induced by gravity are important,
it is then possible to account for them by setting
$f = \rho_0 g(1 - \beta(T - T_0))$, where $g$ is the gravitational
acceleration, yielding the so-called Boussinesq model.

Variations on these themes are numerous and a 
wide range of fluids can be modeled by using 
nonlinear constitutive laws and nonlinear state 
laws. For the purpose of numerical simulations, 


however, it is important to focus on simplified 
models. 


The Building Blocks 


From the above considerations we now extract a 
small set of elementary problems which constitute 
the building blocks of most numerical methods in 
fluid mechanics. 


Elliptic Equations 


By taking the divergence of the momentum equation
[8] and assuming $u$ to be known and renaming $p$ to
$\phi$, one obtains the Poisson equation

$$-\Delta\phi = f$$ [11]


where f is a given source term. This equation plays a 
key role in the computation of the pressure when 
solving the Navier-Stokes equations; see [54b]. 
Assuming that adequate boundary conditions are 
enforced, this model equation is the prototype for 
the class of the so-called elliptic equations. A simple 
generalization of the Poisson equation consists of the 
advection-diffusion equation 


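$$u\cdot\nabla\phi - \nabla\cdot(\kappa\nabla\phi) = f$$ [12]

where $\kappa > 0$. Admissible boundary conditions are
$(\kappa\partial_n\phi + r\phi)|_{\partial\Omega} = a$, $r \ge 0$, or $\phi|_{\partial\Omega} = a$. This type of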
equation is obtained by neglecting the time deriva- 
tive in the heat equation [9] or in the solute 
conservation equation [10]. Mathematically speak- 
ing, [12] is also elliptic since its properties (in 
particular, the way the boundary conditions must 
be enforced) are controlled by the second-order 
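derivatives. For the sake of simplicity, assume that
$u = 0$ in the above equation and that the boundary
condition is $\phi|_{\partial\Omega} = 0$; then it is possible to show that
$\phi$ solves [12] if and only if $\phi$ minimizes the
functional

$$J(\psi) = \int_\Omega \left(\tfrac{1}{2}\kappa|\nabla\psi|^2 - f\psi\right) dx$$

where $|\cdot|$ is the Euclidean norm and $\psi$ spans

$$H = \left\{\psi;\ \int_\Omega |\nabla\psi|^2\,dx < \infty;\ \psi|_{\partial\Omega} = 0\right\}$$ [13]

Writing the first-order optimality condition for this
optimization problem yields

$$\int_\Omega \kappa\nabla\phi\cdot\nabla\psi\,dx = \int_\Omega f\psi\,dx$$

for all $\psi \in H$. This is the so-called variational
formulation of [12]. When $u$ is not zero, no
variational principle holds but a similar way to
reformulate [12] consists of multiplying the equation
by arbitrary functions in $H$ and integrating by parts
the second-order term to give

$$\int_\Omega \left((u\cdot\nabla\phi)\psi + \kappa\nabla\phi\cdot\nabla\psi\right) dx = \int_\Omega f\psi\,dx, \quad \forall\psi \in H$$ [14]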


This is the so-called weak formulation of [12]. Weak 
and variational formulations are the starting point 
for finite-element approximations. 
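
The integration by parts behind [14] can be written out as follows. This is a standard computation, spelled out here under the assumptions that $\phi$ is smooth enough for the integrals to make sense and that $\psi$ vanishes on $\partial\Omega$, as in the definition of $H$ in [13]:

```latex
\begin{align*}
\int_\Omega (u\cdot\nabla\phi)\,\psi\,dx
  - \int_\Omega \nabla\cdot(\kappa\nabla\phi)\,\psi\,dx
  &= \int_\Omega f\,\psi\,dx, \\
-\int_\Omega \nabla\cdot(\kappa\nabla\phi)\,\psi\,dx
  &= \int_\Omega \kappa\nabla\phi\cdot\nabla\psi\,dx
   - \int_{\partial\Omega} \kappa\,(\partial_n\phi)\,\psi\,ds .
\end{align*}
```

Since $\psi|_{\partial\Omega} = 0$, the boundary integral vanishes, and substituting the second line into the first gives [14].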


Stokes Equations 


Another elementary building block is deduced from 
[8] by assuming that the time derivative and the 
nonlinear term are both small. The corresponding 
model is the so-called Stokes equations, 


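$$-\nu\Delta u + \nabla p = f$$ [15]

$$\nabla\cdot u = 0$$ [16]

Assume for the sake of simplicity that the no-slip
boundary condition is enforced: $u|_{\partial\Omega} = 0$. Introduce
the Lagrangian functional

$$\mathcal{L}(v, q) = \int_\Omega \left(\tfrac{1}{2}\nu\nabla v : \nabla v - q\nabla\cdot v - f\cdot v\right) dx$$

Set

$$X = \left\{v;\ \int_\Omega |\nabla v|^2\,dx < \infty;\ v|_{\partial\Omega} = 0\right\}$$

$$M = \left\{q;\ \int_\Omega q^2\,dx < \infty\right\}$$

Then, the pair $(u, p) \in X \times M$ solves the Stokes
equations if and only if it is a saddle point of $\mathcal{L}$, that is,

$$\mathcal{L}(u, q) \le \mathcal{L}(u, p) \le \mathcal{L}(v, p), \quad \forall (v, q) \in X \times M$$ [17]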


In other words, the pressure $p$ is the Lagrange multiplier of the incompressibility constraint $\nabla\cdot u = 0$.
Realizing this fact helps to understand the nature of
the Stokes equations, especially when it comes to
constructing discrete approximations. A variational
formulation of the Stokes equations is obtained by
writing the first-order optimality condition, namely:

$$\int_\Omega (\nu\nabla u : \nabla v - p\nabla\cdot v - f\cdot v)\,dx = 0, \quad \forall v \in X$$

$$\int_\Omega q\nabla\cdot u\,dx = 0, \quad \forall q \in M$$


When the nonlinear term is not zero in the
momentum equation, or when this term is linearized, there is no saddle point, but a weak formulation is obtained by multiplying the momentum
equation by arbitrary functions $v$ in $X$ and integrating by parts the Laplacian, and by multiplying the
mass equation by arbitrary functions $q$ in $M$:

$$\int_\Omega \left((u\cdot\nabla u)\cdot v + \nu\nabla u : \nabla v - p\nabla\cdot v\right) dx = \int_\Omega f\cdot v\,dx$$ [18]

$$\int_\Omega q\nabla\cdot u\,dx = 0$$ [19]


Parabolic Equations 


The class of elliptic equations generalizes to that of 
the parabolic equations when time is accounted for: 


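$$\partial_t\phi + u\cdot\nabla\phi - \nabla\cdot(\kappa\nabla\phi) = f, \qquad \phi|_{t=0} = \phi_0$$ [20]

Fundamentally, this equation has many similarities
with the elliptic equation

$$\alpha\phi + u\cdot\nabla\phi - \nabla\cdot(\kappa\nabla\phi) = f$$ [21]

where $\alpha > 0$. In particular, the sets of boundary
conditions that are admissible for [20] and [21] are
identical, that is, it is legitimate to enforce $(\kappa\partial_n\phi + r\phi)|_{\partial\Omega} = a$, $r \ge 0$, or $\phi|_{\partial\Omega} = a$. Moreover, solving [21]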
is always a building block of any algorithm solving 
[20]. The important fact to remember here is that if 
a good approximation technique for solving [21] is 
at hand, then extending it to solve [20] is usually 
straightforward. 
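
To make the link between [20] and [21] concrete, here is a minimal sketch, assuming a one-dimensional domain $(0,1)$, no advection ($u = 0$), homogeneous Dirichlet conditions, and a centered finite-difference discretization in space (none of which is prescribed by the text). With the backward Euler scheme, each time step of [20] is a problem of type [21] with $\alpha = 1/\Delta t$ and right-hand side $f + \phi^n/\Delta t$. The function names are illustrative only.

```python
import numpy as np

def solve_elliptic(alpha, kappa, rhs, h):
    """Solve alpha*phi - kappa*phi'' = rhs at interior nodes, with phi = 0 at both ends."""
    n = rhs.size
    main = (alpha + 2.0 * kappa / h**2) * np.ones(n)
    off = (-kappa / h**2) * np.ones(n - 1)
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.solve(A, rhs)

def backward_euler(phi0, kappa, f, h, dt, n_steps):
    """March the parabolic problem in time; every step is one elliptic solve."""
    phi = phi0.copy()
    for _ in range(n_steps):
        # (phi_new - phi)/dt - kappa*phi_new'' = f  <=>  elliptic problem with alpha = 1/dt
        phi = solve_elliptic(1.0 / dt, kappa, f + phi / dt, h)
    return phi

n = 99                       # number of interior nodes
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
kappa, dt, n_steps = 1.0, 1e-3, 100
phi0 = np.sin(np.pi * x)
phi = backward_euler(phi0, kappa, np.zeros(n), h, dt, n_steps)

# Exact solution of the heat equation for this initial datum
exact = np.exp(-kappa * np.pi**2 * dt * n_steps) * np.sin(np.pi * x)
err = np.max(np.abs(phi - exact))
```

The elliptic solver is the only nontrivial ingredient; the time loop merely rebuilds the right-hand side, which is the sense in which a good solver for [21] extends to [20].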


Hyperbolic Equations 


When $\kappa/(UL) \to 0$, where $U$ is the reference velocity
scale and $L$ is the reference length scale, [20]
degenerates into the so-called transport equation

$$\partial_t\phi + u\cdot\nabla\phi = f$$ [22]

This is the prototypical example for the class of
hyperbolic equations. For this equation to be well-posed, it is necessary to enforce an initial condition
$\phi|_{t=0} = \phi_0$ and an inflow boundary condition, that
is, $\phi|_{\partial\Omega^-} = a$, where $\partial\Omega^- = \{x \in \partial\Omega;\ (u\cdot n)(x) < 0\}$ is
the so-called inflow boundary of the domain. To
better understand the nature of this equation,
introduce the characteristic lines $X(x, s; t)$ of $u(x, t)$
defined as follows:

$$d_t X(x, s; t) = u(X(x, s; t), t), \qquad X(x, s; s) = x$$ [23]

If $u$ is continuous with respect to $t$ and Lipschitz
with respect to $x$, this ordinary differential equation
has a unique solution. Furthermore, [22] becomes

$$d_t[\phi(X(x, s; t), t)] = f(X(x, s; t), t)$$ [24]

so that

$$\phi(x, t) = \phi_0(X(x, t; 0)) + \int_0^t f(X(x, t; \tau), \tau)\,d\tau$$

provided $X(x, t; \tau) \in \Omega$ for all $\tau \in [0, t]$. This shows
that the concept of characteristic curves is important
to construct an approximation to [22].
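
As an illustration of the representation formula above, the following sketch evaluates the solution of [22] along a characteristic. It assumes one space dimension, a constant velocity $u$ (so that $X(x, t; \tau) = x - u(t - \tau)$), and a domain large enough for the characteristic to stay inside it; the midpoint quadrature and the function name are choices made here, not part of the text.

```python
import numpy as np

def solve_by_characteristics(phi0, f, u, x, t, n_quad=200):
    """phi(x,t) = phi0(X(x,t;0)) + int_0^t f(X(x,t;tau), tau) dtau, constant speed u."""
    tau = (np.arange(n_quad) + 0.5) * t / n_quad   # midpoint quadrature nodes in (0, t)
    foot = x - u * t                               # X(x, t; 0): foot of the characteristic
    path = x - u * (t - tau)                       # X(x, t; tau)
    return phi0(foot) + np.sum(f(path, tau)) * t / n_quad

u, x, t = 2.0, 0.7, 0.5
# Source f(x, t) = x, initial datum sin(x)
value = solve_by_characteristics(np.sin, lambda y, s: y, u, x, t)
# Closed form for this data: sin(x - u t) + x t - u t^2 / 2
expected = np.sin(x - u * t) + x * t - 0.5 * u * t**2
```

For a velocity field that depends on $x$ or $t$, the two lines computing `foot` and `path` would have to be replaced by a numerical integration of [23].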


Meshes 


The starting point of every approximation technique 
for solving any of the above model problems consists 
of defining a mesh of $\Omega$ on which the approximate
solution is defined. To avoid having to account for
curved boundaries, let us assume that the domain $\Omega$ is
a two-dimensional polygon (resp. three-dimensional
polyhedron). A mesh of $\Omega$, say $\mathcal{T}_h$, is a partition of $\Omega$
into small cells, hereafter assumed to be simple
convex polygons in two dimensions (resp. polyhe- 
drons in three dimensions), say triangles or quad- 
rangles (resp. tetrahedrons or cuboids). Moreover, 
this partition is usually assumed to be such that if 
two different cells have a nonempty intersection, then 
the intersection is a vertex, or an entire edge, or an 
entire face. The left panel of Figure 1 shows a mesh 
satisfying the above requirement. The mesh in the 
right panel is not admissible. 


Finite Elements: Interpolation 


The finite-element method is foremost an interpola- 
tion technique. The goal of this section is to 
illustrate this idea by giving examples. 

Let $\mathcal{T}_h = \{K_m\}_{1 \le m \le N_{el}}$ be a mesh composed of $N_{el}$
simplices, that is, triangles in two dimensions or
tetrahedrons in three dimensions. Consider the
following vector spaces of functions:

$$V_h = \{v_h \in C^0(\overline{\Omega});\ v_h|_{K_m} \in \mathbb{P}_k,\ \forall\, 1 \le m \le N_{el}\}$$ [25]

where $\mathbb{P}_k$ denotes the space of polynomials of global
degree at most $k$. $V_h$ is called a finite-element
approximation space. We now construct a basis for $V_h$.

Figure 1 Admissible (left) and nonadmissible (right) meshes.

Given a simplex $K_m$ in $\mathbb{R}^d$, let $v_n$ be a vertex of
$K_m$, let $F_n$ be the face of $K_m$ opposite to $v_n$, and
define $n_n$ to be the outward normal to $F_n$, $1 \le n \le d+1$. Define the barycentric coordinates

$$\lambda_n(x) = 1 - \frac{(x - v_n)\cdot n_n}{(v_l - v_n)\cdot n_n}, \quad 1 \le n \le d+1$$ [26]

where $v_l$ is an arbitrary vertex in $F_n$ (the definition
of $\lambda_n$ is clearly independent of $v_l$ provided $v_l$ belongs
to $F_n$). The barycentric coordinate $\lambda_n$ is an affine
function; it is equal to 1 at $v_n$ and vanishes on $F_n$; its
level sets are hyperplanes parallel to $F_n$. The
barycenter of $K_m$ has barycentric coordinates

$$\left(\frac{1}{d+1}, \ldots, \frac{1}{d+1}\right)$$

The barycentric coordinates satisfy the following
properties: for all $x \in K_m$, $0 \le \lambda_n(x) \le 1$, and for all
$x \in \mathbb{R}^d$,

$$\sum_{n=1}^{d+1} \lambda_n(x) = 1 \quad\text{and}\quad \sum_{n=1}^{d+1} \lambda_n(x)(x - v_n) = 0$$


Consider the set of nodes $\{a_{n,m}\}_{1 \le n \le n_{sh}}$ of $K_m$ with
barycentric coordinates

$$\left(\frac{i_0}{k}, \ldots, \frac{i_d}{k}\right), \quad 0 \le i_0, \ldots, i_d \le k, \quad i_0 + \cdots + i_d = k$$

These points are called the Lagrange nodes of $K_m$. It
is clear that there are $n_{sh} = (1/2)(k+1)(k+2)$
of these points in two dimensions and $n_{sh} = (1/6)(k+1)(k+2)(k+3)$ in three dimensions. It is
remarkable that $n_{sh} = \dim \mathbb{P}_k$.
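
The node count can be checked by direct enumeration. The sketch below lists the barycentric coordinates of the Lagrange nodes for a given dimension $d$ and degree $k$ and compares their number with $\dim \mathbb{P}_k = \binom{k+d}{d}$; the function name is illustrative.

```python
from itertools import product
from math import comb

def lagrange_nodes(d, k):
    """Barycentric coordinates (i_0/k, ..., i_d/k) with 0 <= i_j <= k and i_0 + ... + i_d = k."""
    return [tuple(i / k for i in idx)
            for idx in product(range(k + 1), repeat=d + 1)
            if sum(idx) == k]

# Number of nodes versus the dimension of P_k, for triangles and tetrahedra
counts = {(d, k): len(lagrange_nodes(d, k)) for d in (2, 3) for k in (1, 2, 3)}
dims = {(d, k): comb(k + d, d) for d in (2, 3) for k in (1, 2, 3)}
```

For $k = 2$ on a triangle this yields the three vertices and the three midedge points.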

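Let $\{b_1, \ldots, b_N\} = \bigcup_{K_m \in \mathcal{T}_h} \{a_{1,m}, \ldots, a_{n_{sh},m}\}$ be the
set of all the Lagrange nodes in the mesh. For $K_m \in \mathcal{T}_h$
and $n \in \{1, \ldots, n_{sh}\}$, let $j(n, m) \in \{1, \ldots, N\}$ be the
integer such that $a_{n,m} = b_{j(n,m)}$; $j(n, m)$ is the global
index of the Lagrange node $a_{n,m}$. Let $\{\varphi_1, \ldots, \varphi_N\}$ be
the set of functions in $V_h$ defined by $\varphi_i(b_j) = \delta_{ij}$; then it
can be shown that

$$\{\varphi_1, \ldots, \varphi_N\} \text{ is a basis for } V_h$$ [27]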


The functions $\varphi_i$ are called global shape functions.
An important property of global shape functions is
that their supports are small sets of cells. More
precisely, let $i \in \{1, \ldots, N\}$ and let $\mathcal{V}_i = \{m;\ \exists n;\ i = j(n, m)\}$ be the set of cell indices to which the
node $b_i$ belongs; then the support of $\varphi_i$ is $\bigcup_{m \in \mathcal{V}_i} K_m$.
For $k = 1$, it is clear that $\varphi_i|_{K_m} = \lambda_n$ for all $m \in \mathcal{V}_i$
and all $n$ such that $i = j(n, m)$, and $\varphi_i|_{K_m} = 0$
otherwise. The graph of such a shape function in
two dimensions is shown in the left panel of Figure 2.
For $k = 2$, enumerate from 1 to $d+1$ the vertices of
$K_m$, and enumerate from $d+2$ to $n_{sh}$ the Lagrange
nodes located at the midedges. For a midedge node
of index $d+2 \le n \le n_{sh}$, let $b(n), e(n) \in \{1, \ldots, d+1\}$ be the two indices of the two Lagrange
nodes at the extremities of the edge in question. Then,
the restriction to $K_m$ of a $\mathbb{P}_2$ shape function $\varphi_i$ is

$$\varphi_i|_{K_m} = \begin{cases} \lambda_n(2\lambda_n - 1) & \text{if } 1 \le n \le d+1 \\ 4\lambda_{b(n)}\lambda_{e(n)} & \text{if } d+2 \le n \le n_{sh} \end{cases}$$ [28]

Figure 2 Two-dimensional Lagrange shape functions: piecewise $\mathbb{P}_1$ (left) and piecewise $\mathbb{P}_2$ (center and right).

Figure 2 shows the graph of two $\mathbb{P}_2$ shape functions
in two dimensions.
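
The defining property of the quadratic shape functions, namely that each one equals 1 at its own Lagrange node and 0 at the others, can be verified numerically on a single triangle. The sketch below works directly in barycentric coordinates and uses 0-based local indices (vertices 0 to 2, midedge nodes 3 to 5), which is a convention chosen here for the code rather than the 1-based numbering of the text.

```python
import numpy as np

# Midedge node number 3 + e sits between the two vertices listed in EDGES[e].
EDGES = [(0, 1), (1, 2), (2, 0)]

def p2_shape_functions(lam):
    """Evaluate the six P2 shape functions at barycentric coordinates lam (length 3)."""
    lam = np.asarray(lam, dtype=float)
    vertex = lam * (2.0 * lam - 1.0)                              # lambda_n (2 lambda_n - 1)
    midedge = np.array([4.0 * lam[b] * lam[e] for b, e in EDGES])  # 4 lambda_b lambda_e
    return np.concatenate([vertex, midedge])

# Barycentric coordinates of the six Lagrange nodes: vertices, then edge midpoints
identity = np.eye(3)
nodes = [identity[n] for n in range(3)]
nodes += [0.5 * (identity[b] + identity[e]) for b, e in EDGES]

# table[i, j] = value of shape function j at node i; expected to be the identity matrix
table = np.array([p2_shape_functions(a) for a in nodes])
partition_of_unity = p2_shape_functions([0.2, 0.3, 0.5]).sum()
```

The same table, computed for every cell, is what the global numbering $j(n, m)$ stitches together into the basis of $V_h$.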

Once the space $V_h$ is introduced, it is natural to
define the interpolation operator

$$\Pi_h : C^0(\overline{\Omega}) \ni v \longmapsto \sum_{i=1}^N v(b_i)\varphi_i \in V_h$$ [29]

This operator is such that for all continuous
functions $v$, the restriction of $\Pi_h(v)$ to each mesh
cell is a polynomial in $\mathbb{P}_k$ and $\Pi_h(v)$ takes the same
values as $v$ at the Lagrange nodes. Moreover, setting
$h = \max_{K_m \in \mathcal{T}_h} \mathrm{diam}(K_m)$, and defining

$$\|v\|_{L^p} = \left(\int_\Omega |v|^p\,dx\right)^{1/p}, \quad 1 \le p < \infty$$

the following approximation holds:

$$\|v - \Pi_h(v)\|_{L^p} + h\|\nabla(v - \Pi_h(v))\|_{L^p} \le c\,h^{k+1}\|v\|_{C^{k+1}(\overline{\Omega})}$$ [30]

where $c$ is a constant that depends on the quality of
the mesh. More precisely, for $K_m \in \mathcal{T}_h$, let $\rho_{K_m}$ be
the diameter of the largest ball that can be inscribed
into $K_m$ and let $h_{K_m}$ be the diameter of $K_m$. Then, $c$
depends on $\sigma = \max_{K_m \in \mathcal{T}_h} h_{K_m}/\rho_{K_m}$. Hence, for the
mesh to have good interpolation properties, it is
recommended that the cells not be too flat. Families
of meshes for which $\sigma$ is bounded uniformly with
respect to $h$ as $h \to 0$ are said to be shape-regular
families.
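
The rate $h^{k+1}$ in the interpolation estimate can be observed numerically. The sketch below assumes the simplest setting, piecewise linear interpolation ($k = 1$) on uniform meshes of the interval $(0,1)$, and approximates the $L^2$ norm of the error by sampling on a fine grid; halving $h$ should divide the error by about $2^{k+1} = 4$. The test function and the sampling resolution are arbitrary choices.

```python
import numpy as np

def p1_interpolation_error(v, n_cells, n_sample=2001):
    """Approximate L2 norm of v - Pi_h(v) for piecewise linear interpolation on (0, 1)."""
    nodes = np.linspace(0.0, 1.0, n_cells + 1)
    x = np.linspace(0.0, 1.0, n_sample)
    interpolant = np.interp(x, nodes, v(nodes))    # piecewise linear through nodal values
    return np.sqrt(np.mean((v(x) - interpolant) ** 2))

v = lambda x: np.sin(np.pi * x)
errors = [p1_interpolation_error(v, n) for n in (10, 20, 40)]
# Observed convergence rates between successive refinements
rates = [np.log2(errors[i] / errors[i + 1]) for i in range(2)]
```

The observed rates are close to 2, in agreement with the exponent $k + 1$ for $k = 1$.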

The above example of finite-element approxima- 
tion space generalizes easily to meshes composed of 
quadrangles or cuboids. In this case, the shape 
functions are piecewise polynomials of partial 


degree at most $k$. These spaces are usually referred
to as $\mathbb{Q}_k$ approximation spaces.


Finite Elements: Approximation 


We show in this section how finite-element approximation spaces can be used to approximate some
model problems exhibited in the section “The Building
Blocks.”


Advection-Diffusion 


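Consider the model problem [21] supplemented
with the boundary condition $(\kappa\partial_n\phi + r\phi)|_{\partial\Omega} = g$.
Assume $\kappa > 0$, $\alpha + (1/2)\nabla\cdot u \ge 0$, and $r \ge 0$. Define

$$a(\phi, \psi) = \int_\Omega \left((\alpha\phi + u\cdot\nabla\phi)\psi + \kappa\nabla\phi\cdot\nabla\psi\right) dx + \int_{\partial\Omega} r\phi\psi\,ds$$

Then, the weak formulation of [21] is: seek $\phi \in H$
($H$ defined in [13]) such that for all $\psi \in H$

$$a(\phi, \psi) = \int_\Omega f\psi\,dx + \int_{\partial\Omega} g\psi\,ds$$ [31]

Using the approximation space $V_h$ defined in [25]
together with the basis defined in [27], we seek an
approximate solution to the above problem in the
form $\phi_h = \sum_{i=1}^N U_i\varphi_i \in V_h$. Then, a simple way of
approximating [31] consists of seeking $U = (U_1, \ldots, U_N)^T \in \mathbb{R}^N$ such that for all $1 \le i \le N$

$$a(\phi_h, \varphi_i) = \int_\Omega f\varphi_i\,dx + \int_{\partial\Omega} g\varphi_i\,ds$$ [32]

This problem finally amounts to solving the following linear system:

$$AU = F$$ [33]

where $A_{ij} = a(\varphi_j, \varphi_i)$ and

$$F_i = \int_\Omega f\varphi_i\,dx + \int_{\partial\Omega} g\varphi_i\,ds$$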


The above approximation technique is usually
referred to as the Galerkin method. The following
error estimate can be proved:

$$\|\phi - \phi_h\|_{L^2} + h\|\nabla(\phi - \phi_h)\|_{L^2} \le c\,h^{k+1}\|\phi\|_{C^{k+1}(\overline{\Omega})}$$ [34]


where, in addition to depending on the shape
regularity of the mesh, the constant $c$ also depends
on $\kappa$, $\alpha$, and $\beta$.
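
The Galerkin procedure can be sketched in a few lines in one space dimension. The example below assumes the interval $(0,1)$, constant coefficients $\alpha$, $u$, $\kappa$, piecewise linear elements on a uniform mesh, and homogeneous Dirichlet conditions (so that the boundary integrals drop out and the test space is the one of [13]). The load vector is computed with Simpson's rule on each cell; these are simplifications chosen for illustration, not the general setting of the text.

```python
import numpy as np

def galerkin_p1(alpha, u, kappa, f, n_cells):
    """P1 Galerkin approximation of alpha*phi + u*phi' - kappa*phi'' = f, phi(0) = phi(1) = 0."""
    h = 1.0 / n_cells
    nodes = np.linspace(0.0, 1.0, n_cells + 1)
    A = np.zeros((n_cells + 1, n_cells + 1))
    F = np.zeros(n_cells + 1)
    # Local matrices on one cell: entry (i, j) pairs test function i with trial function j
    mass = h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
    advection = 0.5 * np.array([[-1.0, 1.0], [-1.0, 1.0]])
    stiffness = 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
    local = alpha * mass + u * advection + kappa * stiffness
    for m in range(n_cells):
        dofs = [m, m + 1]
        a, b = nodes[m], nodes[m + 1]
        mid = 0.5 * (a + b)
        A[np.ix_(dofs, dofs)] += local
        # Simpson's rule for the integrals of f times each local shape function
        F[dofs] += h / 6.0 * np.array([f(a) + 2.0 * f(mid), 2.0 * f(mid) + f(b)])
    interior = np.arange(1, n_cells)          # Dirichlet nodes are eliminated
    U = np.zeros(n_cells + 1)
    U[interior] = np.linalg.solve(A[np.ix_(interior, interior)], F[interior])
    return nodes, U

# Manufactured solution phi = sin(pi x) with alpha = u = kappa = 1
exact = lambda x: np.sin(np.pi * x)
f = lambda x: (1.0 + np.pi**2) * np.sin(np.pi * x) + np.pi * np.cos(np.pi * x)
errors = []
for n_cells in (20, 40):
    nodes, U = galerkin_p1(1.0, 1.0, 1.0, f, n_cells)
    errors.append(np.max(np.abs(U - exact(nodes))))
```

The matrix assembled here is not symmetric because of the advection term, which is why a general linear solver is used rather than a Cholesky factorization.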


Stokes Equations 


The line of thought developed above can be used to 
approximate the Navier-Stokes problem [15]-[16]. 
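Let us assume that the nonlinear term $u\cdot\nabla u$ is
linearized in the form $v\cdot\nabla u$, where $v$ is known. Let
$\mathcal{T}_h$ be a mesh of $\Omega$, and assume that finite-element
approximation spaces have been constructed to
approximate the velocity and the pressure, say $X_h$
and $M_h$. Assume for the sake of simplicity that $X_h \subset X$ and $M_h \subset M$. Assume that bases for $X_h$ and $M_h$
are at hand, say $\{\varphi_1, \ldots, \varphi_{N_u}\}$ and $\{\psi_1, \ldots, \psi_{N_p}\}$,
respectively. Set

$$a(u, \varphi) = \int_\Omega \left((v\cdot\nabla u)\cdot\varphi + \nu\nabla u : \nabla\varphi\right) dx$$

and

$$b(v, \psi) = -\int_\Omega \psi\nabla\cdot v\,dx$$

Then, we seek an approximate velocity $u_h = \sum_{i=1}^{N_u} U_i\varphi_i$ and an approximate pressure $p_h = \sum_{k=1}^{N_p} P_k\psi_k$ such that for all $i \in \{1, \ldots, N_u\}$ and all
$k \in \{1, \ldots, N_p\}$ the following holds:

$$a(u_h, \varphi_i) + b(\varphi_i, p_h) = \int_\Omega f\cdot\varphi_i\,dx$$ [35]

$$b(u_h, \psi_k) = 0$$ [36]

Define the matrix $A \in \mathbb{R}^{N_u \times N_u}$ such that
$A_{ij} = a(\varphi_j, \varphi_i)$. Define the matrix $B \in \mathbb{R}^{N_p \times N_u}$ such
that $B_{ki} = b(\varphi_i, \psi_k)$. Then, the above problem can be
recast into the following partitioned linear system:

$$\begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix} \begin{bmatrix} U \\ P \end{bmatrix} = \begin{bmatrix} F \\ 0 \end{bmatrix}$$ [37]

where the vector $F \in \mathbb{R}^{N_u}$ is such that $F_i = \int_\Omega f\cdot\varphi_i\,dx$.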


An important aspect of the above approximation
technique is that, for the linear system to be
invertible, the matrix $B^T$ must have full column rank
(i.e., $B$ has full row rank). This amounts to

$$\exists\,\beta_h > 0, \quad \inf_{q_h \in M_h} \sup_{v_h \in X_h} \frac{\int_\Omega q_h\nabla\cdot v_h\,dx}{\|v_h\|_X\,\|q_h\|_M} \ge \beta_h$$ [38]

where

$$\|v_h\|_X^2 = \int_\Omega |\nabla v_h|^2\,dx, \qquad \|q_h\|_M^2 = \int_\Omega q_h^2\,dx$$

This nontrivial condition is called the Ladyženskaja-Babuška-Brezzi condition (LBB) in the literature.
For instance, if $\mathbb{P}_1$ finite elements are used to approximate both the velocity and the pressure, the above
condition does not hold, since there are nonzero
pressure fields $q_h$ in $M_h$ such that $\int_\Omega q_h\nabla\cdot v_h\,dx = 0$
for all $v_h$ in $X_h$. Such fields are called spurious
pressure modes. An example is shown in Figure 3.
The spurious function alternately takes the values
$-1$, $0$, and $+1$ at the vertices of the mesh so that its
mean value on each cell is zero.

Figure 3 The $\mathbb{P}_1/\mathbb{P}_1$ finite element: the mesh (left); one
pressure spurious mode (right).
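
The algebraic meaning of a spurious pressure mode can be seen on a toy example. The sketch below does not discretize the Stokes equations; it uses small hypothetical matrices `A`, `B_ok`, and `B_bad` (chosen here purely for illustration) to show that the partitioned system is invertible when no nonzero vector $q$ satisfies $B^T q = 0$, and singular as soon as such a vector exists.

```python
import numpy as np

def saddle_point_matrix(A, B):
    """Assemble the partitioned matrix [[A, B^T], [B, 0]]."""
    n_p = B.shape[0]
    return np.block([[A, B.T], [B, np.zeros((n_p, n_p))]])

# Hypothetical data: 3 velocity unknowns, 2 pressure unknowns
A = np.diag([2.0, 3.0, 4.0])                            # symmetric positive definite
B_ok = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])     # B^T q = 0 only for q = 0
B_bad = np.array([[1.0, 0.0, 1.0], [2.0, 0.0, 2.0]])    # second row is twice the first

rank_ok = np.linalg.matrix_rank(saddle_point_matrix(A, B_ok))
rank_bad = np.linalg.matrix_rank(saddle_point_matrix(A, B_bad))

# Discrete analogue of a spurious mode: a nonzero q with B_bad^T q = 0
spurious = np.array([2.0, -1.0])
residual = B_bad.T @ spurious
```

In the second case the pressure is determined only up to multiples of `spurious`, which is the discrete counterpart of the oscillating mode of Figure 3.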

Pairs of finite-element spaces satisfying the
LBB condition are numerous. For instance, assuming
$k \ge 2$, using $\mathbb{P}_k$ finite elements to approximate the
velocity and $\mathbb{P}_{k-1}$ finite elements to approximate the
pressure is acceptable. Likewise, using $\mathbb{Q}_k$ elements
for the velocity and $\mathbb{Q}_{k-1}$ elements for the pressure
on meshes composed of quadrangles or cuboids is
admissible.

Approximation techniques for which the pressure 
and the velocity degrees of freedom are not 
associated with the same nodes are usually called 
staggered approximations. Staggering pressure and 
velocity unknowns is common in solution methods 
for the incompressible Stokes and Navier-Stokes 
equations; see also the subsection “Stokes 
equations.” 


Finite Volumes: Principles 


The finite-volume method is an approximation 
technique whose primary goal is to approximate 
conservation equations, whether time dependent or 


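not. Given a mesh, say $\mathcal{T}_h = \{K_m\}_{1 \le m \le N_{el}}$, and a
conservation equation

$$\alpha\partial_t\phi + \nabla\cdot F(\phi, \nabla\phi, x, t) = f$$ [39]

($\alpha = 0$ if the problem is time independent and $\alpha = 1$
otherwise), the main idea underlying every finite-volume method is to represent the approximate
solution by its mean values over the mesh cells
$(\phi_{K_1}, \ldots, \phi_{K_{N_{el}}})^T \in \mathbb{R}^{N_{el}}$ and to test the conservation
equation by the characteristic functions of the mesh
cells $\{1_{K_1}, \ldots, 1_{K_{N_{el}}}\}$. For each cell $K_m \in \mathcal{T}_h$, denote by
$n_{K_m}$ the outward unit normal vector and denote by $\mathcal{F}_m$
the set of the faces of $K_m$. The finite-volume approximation to [39] consists of seeking $(\phi_{K_1}, \ldots, \phi_{K_{N_{el}}})^T \in \mathbb{R}^{N_{el}}$ such that the function $\phi_h = \sum_{m=1}^{N_{el}} \phi_{K_m} 1_{K_m}$
satisfies the following: for all $1 \le m \le N_{el}$

$$|K_m|\,\alpha\,d_t\phi_{K_m}(t) + \sum_{\sigma \in \mathcal{F}_m} F_m^\sigma(\phi_h, \nabla_h\phi_h, t) = \int_{K_m} f\,dx$$ [40]

where

$$|K_m| = \int_{K_m} dx$$

$\nabla_h\phi_h$ is an approximation of $\nabla\phi$, and $F_m^\sigma$ is an
approximation of

$$\int_\sigma F(\phi, \nabla\phi, x, t)\cdot n_{K_m}\,d\sigma$$

The precise definition of the so-called approximate
flux $F_m^\sigma$ depends on the nature of the problem
(e.g., elliptic, parabolic, hyperbolic, saddle point)
and the desired accuracy. In general, the approximate fluxes are required to satisfy the following
two important properties:

1. Conservativity: for $K_m, K_l \in \mathcal{T}_h$ such that
$\sigma = K_m \cap K_l$, $F_m^\sigma = -F_l^\sigma$.

2. Consistency: let $\psi$ be the solution to [39], and set

$$\psi_h = \frac{1}{|K_1|}\int_{K_1}\psi\,dx\,1_{K_1} + \cdots + \frac{1}{|K_{N_{el}}|}\int_{K_{N_{el}}}\psi\,dx\,1_{K_{N_{el}}}$$

then

$$F_m^\sigma(\psi_h, \nabla_h\psi_h, t) \to \int_\sigma F(\psi, \nabla\psi, x, t)\cdot n_{K_m}\,d\sigma \quad\text{as } h \to 0$$

The quantity

$$F_m^\sigma(\psi_h, \nabla_h\psi_h, t) - \int_\sigma F(\psi, \nabla\psi, x, t)\cdot n_{K_m}\,d\sigma$$

is called the consistency error.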


Note that [40] is a system of ordinary differential 
equations. This system is usually discretized in time 
by using standard time-marching techniques such as 
explicit Euler, Runge-Kutta, etc.

The discretization technique described above is
sometimes referred to as the cell-centered finite-volume
method. Another method, called the vertex-centered
finite-volume method, consists of using the characteristic functions associated with the vertices of
the mesh instead of those associated with the cells.


Finite Volumes: Examples 


In this section we illustrate the ideas introduced 
above. Three examples are developed: the Poisson 
equation, the transport equation, and the Stokes 
equations. 


Poisson Problem 


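Consider the Poisson equation [11] equipped with
the boundary condition $\partial_n\phi|_{\partial\Omega} = a$. To avoid technical details, assume that $\Omega = [0, 1]^d$. Let $\mathcal{T}_h$ be a mesh
of $\Omega$ composed of rectangles (or cuboids in three
dimensions).

The flux function is $F(\phi, \nabla\phi, x) = -\nabla\phi$; hence,
$F_m^\sigma$ must be a consistent conservative approximation of $-\int_\sigma n_{K_m}\cdot\nabla\phi\,d\sigma$. Let $\sigma$ be an interior face of
the mesh and let $K_m$, $K_l$ be the two cells such that
$\sigma = K_m \cap K_l$. Let $x_{K_m}$, $x_{K_l}$ be the barycenters of $K_m$
and $K_l$, respectively. Then, an admissible formula
for the approximate flux is

$$F_m^\sigma = -\frac{|\sigma|}{|x_{K_m} - x_{K_l}|}(\phi_{K_l} - \phi_{K_m})$$ [41]

where $|\sigma| = \int_\sigma d\sigma$. The consistency error is $O(h)$ in
general, and is $O(h^2)$ if the mesh is composed of
identical cuboids. The conservativity is evident. If $\sigma$
is part of $\partial\Omega$, an admissible formula for the
approximate flux is $F_m^\sigma = -\int_\sigma a\,d\sigma$. Then, upon
defining $\mathcal{F}_m^i = \mathcal{F}_m \setminus \partial\Omega$ and $\mathcal{F}_m^\partial = \mathcal{F}_m \cap \partial\Omega$, the
finite-volume approximation of the Poisson problem
is: seek $\phi_h \in \mathbb{R}^{N_{el}}$ such that for all $1 \le m \le N_{el}$

$$\sum_{\sigma \in \mathcal{F}_m^i} F_m^\sigma = \int_{K_m} f\,dx + \sum_{\sigma \in \mathcal{F}_m^\partial} \int_\sigma a\,d\sigma$$ [42]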
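
A one-dimensional version of this scheme fits in a few lines. The sketch below assumes the interval $(0,1)$ split into identical cells, so that a face is a point and $|\sigma| = 1$, and it assumes that the boundary data prescribe the outward normal derivative at both ends. Because the solution of such a problem is defined only up to an additive constant, the linear system is singular; the sketch picks the zero-mean solution through a least-squares solve, which is a choice made here and not part of the text.

```python
import numpy as np

def fv_poisson_1d(cell_integrals_f, a_left, a_right, h):
    """Two-point-flux finite volumes for -phi'' = f on (0, 1) with prescribed normal derivatives."""
    n = cell_integrals_f.size
    A = np.zeros((n, n))
    t = 1.0 / h                       # |sigma| / |x_m - x_l| with |sigma| = 1 in one dimension
    for m in range(n - 1):            # interior face shared by cells m and m + 1
        # Flux leaving cell m: -t * (phi_{m+1} - phi_m); the neighbor receives the opposite
        A[m, m] += t
        A[m, m + 1] -= t
        A[m + 1, m + 1] += t
        A[m + 1, m] -= t
    rhs = cell_integrals_f.copy()
    rhs[0] += a_left                  # boundary faces contribute the prescribed data
    rhs[-1] += a_right
    # Singular but consistent system: lstsq returns the minimum-norm (zero-mean) solution
    phi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return phi

n = 50
h = 1.0 / n
edges = np.linspace(0.0, 1.0, n + 1)
# Exact solution cos(pi x): f = pi^2 cos(pi x), zero normal derivative at both ends
cell_f = np.pi * (np.sin(np.pi * edges[1:]) - np.sin(np.pi * edges[:-1]))
phi = fv_poisson_1d(cell_f, 0.0, 0.0, h)
exact_means = (np.sin(np.pi * edges[1:]) - np.sin(np.pi * edges[:-1])) / (np.pi * h)
err = np.max(np.abs(phi - exact_means))
```

Each interior face contributes equal and opposite amounts to its two neighboring rows, which is the conservativity property in matrix form.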


Transport Equation 


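Consider the transport equation

$$\partial_t\phi + \nabla\cdot(u\phi) = f$$ [43]

$$\phi|_{t=0} = \phi_0, \qquad \phi|_{\partial\Omega^-} = a$$ [44]

where $u(x, t)$ is a given field in $C^1(\overline{\Omega} \times [0, T])$. Let $\mathcal{T}_h$
be a mesh of $\Omega$. For the sake of simplicity, let us use
the explicit Euler time-stepping to approximate [40].
Let $N$ be a positive integer, set $\Delta t = T/N$, set $t^n = n\Delta t$
for $0 \le n \le N$, and partition $[0, T]$ as follows:

$$[0, T] = \bigcup_{n=0}^{N-1} [t^n, t^{n+1}]$$

Denote by $\phi_h^n \in \mathbb{R}^{N_{el}}$ the finite-volume approximation of $\phi(t^n)$. Then, [40] is approximated as
follows:

$$\frac{|K_m|}{\Delta t}(\phi_{K_m}^{n+1} - \phi_{K_m}^n) + \sum_{\sigma \in \mathcal{F}_m} F_m^{\sigma,n}(\phi_h^n, \nabla_h\phi_h^n, t^n) = \int_{K_m} f(x, t^n)\,dx$$ [45]

where $\phi_{K_m}^0 = \frac{1}{|K_m|}\int_{K_m}\phi_0\,dx$. The approximate flux $F_m^{\sigma,n}$
must be a consistent conservative approximation of
$\int_\sigma (u\cdot n_{K_m})\phi\,d\sigma$. Let $\sigma$ be a face of the mesh and let
$K_m$, $K_l$ be the two cells such that $\sigma = K_m \cap K_l$ (note
that if $\sigma$ is on $\partial\Omega$, $\sigma$ belongs to one cell only and we
set $K_m = K_l$). If $\sigma$ is on $\partial\Omega^-$, set

$$F_m^{\sigma,n} = \int_\sigma (u\cdot n_{K_m})\,a\,d\sigma$$ [46]

If $\sigma$ is not on $\partial\Omega^-$, set $u_m^{\sigma,n} = \int_\sigma (u\cdot n_{K_m})\,d\sigma$ and
define

$$F_m^{\sigma,n} = \begin{cases} \phi_{K_m}^n u_m^{\sigma,n} & \text{if } u_m^{\sigma,n} \ge 0 \\ \phi_{K_l}^n u_m^{\sigma,n} & \text{if } u_m^{\sigma,n} < 0 \end{cases}$$ [47]

The above choice for the approximate flux is usually
called the upwind flux. It is consistent with the analysis
that has been done for [22], that is, information flows
along the characteristic lines of the field $u$; see [24]. In
other words, the updating of $\phi_{K_m}^n$ must be done by
using the approximate values $\phi_{K_l}^n$ coming from the cells
that are upstream of the flow field.

An important feature of the above approximation
technique is that it is $L^\infty$-stable, in the sense that

$$\max_{0 \le n \le N,\ 1 \le m \le N_{el}} |\phi_{K_m}^n| \le c(\phi_0, f)$$

if the two mesh parameters $\Delta t$ and $h$ satisfy
the so-called Courant-Friedrichs-Lewy (CFL)
condition $\|u\|_{L^\infty}\Delta t/h \le c(\sigma)$, where $c(\sigma)$ is a constant that depends on the mesh regularity parameter
$\sigma = \max_{K_m \in \mathcal{T}_h} h_{K_m}/\rho_{K_m}$. In one dimension, $c(\sigma) = 1$.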
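
The role of the CFL condition can be seen in a one-dimensional experiment. The sketch below assumes a constant velocity $u > 0$ on $(0,1)$, a uniform mesh, no source term, and a constant inflow value at the left end; with the upwind flux, the explicit Euler update is a convex combination of neighboring values when $u\Delta t/h \le 1$, so the discrete solution stays within the bounds of the data, while a larger time step lets it escape them. The step initial datum and the parameter values are arbitrary.

```python
import numpy as np

def upwind_transport(phi0, a, u, h, dt, n_steps):
    """Explicit Euler with upwind fluxes for d_t phi + d_x(u phi) = 0, constant u > 0."""
    phi = phi0.copy()
    for _ in range(n_steps):
        upstream = np.concatenate(([a], phi[:-1]))   # value in the upstream cell (inflow data at x = 0)
        flux_out = u * phi                           # flux through the right face of each cell
        flux_in = u * upstream                       # flux through the left face of each cell
        phi = phi - dt / h * (flux_out - flux_in)
    return phi

n_cells, u, a = 100, 1.0, 1.0
h = 1.0 / n_cells
x = (np.arange(n_cells) + 0.5) * h
phi0 = np.where(x < 0.5, 1.0, 0.0)

stable = upwind_transport(phi0, a, u, h, dt=0.8 * h / u, n_steps=50)     # CFL number 0.8
unstable = upwind_transport(phi0, a, u, h, dt=1.5 * h / u, n_steps=50)   # CFL number 1.5
```

With the CFL number equal to 0.8 the values remain between 0 and 1; with 1.5 the update weights the cell's own value negatively and overshoots appear at the front from the first step on.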





Stokes Equations 


To finish this short review of finite-volume methods,
we turn our attention to the Stokes problem [15]-[16]
equipped with the homogeneous Dirichlet boundary
condition $u|_{\partial\Omega} = 0$.

Let $\mathcal{T}_h$ be a mesh of $\Omega$ composed of triangles (or
tetrahedrons). All the angles in the triangulation are
assumed to be acute so that, for all $K \in \mathcal{T}_h$, the
intersection of the orthogonal bisectors of the sides
of $K$, say $x_K$, is in $K$. We propose a finite-volume
approximation for the velocity and a finite-element
approximation for the pressure. Let $\{e_1, \ldots, e_d\}$ be a
Cartesian basis for $\mathbb{R}^d$. Set $1_{K_m}^k = 1_{K_m} e_k$ for all $1 \le m \le N_{el}$ and $1 \le k \le d$; then define

$$X_h = \mathrm{span}\{1_{K_1}^1, \ldots, 1_{K_m}^k, \ldots, 1_{K_{N_{el}}}^d\}$$

Let $\{b_1, \ldots, b_{N_v}\}$ be the vertices of the mesh, and let
$\{\varphi_1, \ldots, \varphi_{N_v}\}$ be the associated piecewise linear
global shape functions. Then, set (see the section
“Finite elements: interpolation”)

$$N_h = \mathrm{span}\{\varphi_1, \ldots, \varphi_{N_v}\}, \qquad M_h = \left\{q \in N_h;\ \int_\Omega q\,dx = 0\right\}$$

The approximate problem consists of seeking
$(u_{K_1}, \ldots, u_{K_{N_{el}}}) \in \mathbb{R}^{dN_{el}}$ and $p_h \in M_h$ such that for
all $1 \le m \le N_{el}$ and all $1 \le i \le N_v$,

$$\sum_{\sigma \in \mathcal{F}_m} F_m^\sigma + \int_{K_m} \nabla p_h\,dx = \int_{K_m} f\,dx$$ [48]

$$c(u_h, \varphi_i) = 0$$ [49]

where $u_h = \sum_{m=1}^{N_{el}} u_{K_m} 1_{K_m}$ and

$$c(v_h, q_h) = \sum_{m=1}^{N_{el}} \int_{K_m} v_{K_m}\cdot\nabla q_h\,dx$$

Moreover,

$$F_m^\sigma = \begin{cases} \nu\dfrac{|\sigma|}{|x_{K_m} - x_{K_l}|}(u_{K_m} - u_{K_l}) & \text{if } \sigma = K_m \cap K_l \\ \nu\dfrac{|\sigma|}{d(x_{K_m}, \sigma)}\,u_{K_m} & \text{if } \sigma = K_m \cap \partial\Omega \end{cases}$$

where $d(x_{K_m}, \sigma)$ is the Euclidean distance between
$x_{K_m}$ and $\sigma$. This formulation yields a linear system
with the same structure as in [37]. Note in particular
that

$$\sup_{v_h \in X_h} \frac{c(v_h, p_h)}{\|v_h\|_{L^\infty}} = \|\nabla p_h\|_{L^1}$$ [50]

Since the mean value of $p_h$ is zero, $\|\nabla p_h\|_{L^1}$ is a norm
on $M_h$. As a result, an inequality similar to [38] holds.
This inequality is a key step to proving that the linear
system is well-posed and the approximate solution
converges to the exact solution of [15]-[16].


Projection Methods for Navier-Stokes 


In this section we focus on the time approximation 
of the Navier-Stokes problem: 


$$\partial_t u - \nu\Delta u + u\cdot\nabla u + \nabla p = f$$ [51a]
$$\nabla\cdot u = 0$$ [51b]
$$u|_{\partial\Omega} = 0$$ [51c]
$$u|_{t=0} = u_0$$ [51d]


where f is a body force and uo is a solenoidal 
velocity field. There are numerous ways to discretize 
this problem in time, but, undoubtedly, one of the 
most popular strategies is to use projection methods, 
sometimes also referred to as Chorin—Temam 
methods. 

A projection method is a fractional-step time-marching technique. It is a predictor-corrector
strategy aiming at uncoupling viscous diffusion and
incompressibility effects. One time step is composed 
of three substeps: in the first substep, the pressure is 
made explicit and a provisional velocity field is 
computed using the momentum equation; in the 
second substep, the provisional velocity field is 
projected onto the space of incompressible (solenoi- 
dal) vector fields; in the third substep, the pressure is 
updated. 

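Let $q > 0$ be an integer and approximate the time
derivative of $u$ using a backward difference formula of
order $q$. To this end, introduce a positive integer $N$, set
$\Delta t = T/N$, set $t^n = n\Delta t$ for $0 \le n \le N$, and consider a
partitioning of the time interval in the form

$$[0, T] = \bigcup_{n=0}^{N-1} [t^n, t^{n+1}]$$

For all sequences $v_{\Delta t} = (v^0, v^1, \ldots, v^N)$, set

$$D^{(q)} v^{n+1} = \beta_q v^{n+1} - \sum_{j=0}^{q-1} \beta_j v^{n-j}$$ [52]

where $q - 1 \le n \le N - 1$. The coefficients $\beta_j$ are
such that

$$\frac{1}{\Delta t} D^{(q)} u(t^{n+1}) = \frac{1}{\Delta t}\left(\beta_q u(t^{n+1}) - \sum_{j=0}^{q-1} \beta_j u(t^{n-j})\right)$$

is a $q$th-order backward difference formula approximating $\partial_t u(t^{n+1})$. For instance,

$$D^{(1)} v^{n+1} = v^{n+1} - v^n$$

$$D^{(2)} v^{n+1} = \tfrac{3}{2} v^{n+1} - 2v^n + \tfrac{1}{2} v^{n-1}$$

Furthermore, for all sequences $\phi_{\Delta t} = (\phi^0, \phi^1, \ldots, \phi^N)$,
define

$$\phi^{*,n+1} = \sum_{j=0}^{q-1} \gamma_j\phi^{n-j}$$ [53]

so that $\sum_{j=0}^{q-1} \gamma_j\phi(t^{n-j})$ is a $(q-1)$th-order extrapolation of $\phi(t^{n+1})$. For instance, $p^{*,n+1} = 0$ for
$q = 1$, $p^{*,n+1} = p^n$ for $q = 2$, and $p^{*,n+1} = 2p^n - p^{n-1}$
for $q = 3$. Finally, denote by $(u\cdot\nabla u)^{*,n+1}$ a $q$th-order extrapolation of $(u\cdot\nabla u)(t^{n+1})$. For instance,

$$(u\cdot\nabla u)^{*,n+1} = \begin{cases} u^n\cdot\nabla u^n & \text{if } q = 1 \\ 2u^n\cdot\nabla u^n - u^{n-1}\cdot\nabla u^{n-1} & \text{if } q = 2 \end{cases}$$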
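
The order of the backward difference formulas in [52] can be checked on a smooth scalar function. The sketch below encodes the coefficients of the two examples above ($q = 1$ and $q = 2$) and measures how the error in the approximation of the derivative decreases when $\Delta t$ is halved; the test function $e^t$ and the step sizes are arbitrary choices, and the dictionary layout is only one possible way of storing the coefficients.

```python
import numpy as np

# q -> (beta_q, [beta_0, ..., beta_{q-1}]) in D^(q) v^{n+1} = beta_q v^{n+1} - sum_j beta_j v^{n-j}
BDF = {1: (1.0, [1.0]), 2: (1.5, [2.0, -0.5])}

def bdf_derivative(v, t_new, dt, q):
    """Approximate v'(t_new) by D^(q) v^{n+1} / dt, with v^{n-j} = v(t_new - (j + 1) dt)."""
    beta_q, betas = BDF[q]
    past = sum(b * v(t_new - (j + 1) * dt) for j, b in enumerate(betas))
    return (beta_q * v(t_new) - past) / dt

def error(q, dt):
    # The derivative of exp at t = 1 is exp(1)
    return abs(bdf_derivative(np.exp, 1.0, dt, q) - np.exp(1.0))

# Error ratio when the time step is halved: about 2 for q = 1 and about 4 for q = 2
ratios = {q: error(q, 0.01) / error(q, 0.005) for q in (1, 2)}
```

Ratios close to $2^q$ confirm that the formula of index $q$ is of order $q$.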


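A general projection algorithm is as follows. Set
$u^0 = u_0$ and $\phi^l = 0$ for $0 \le l \le q - 1$. If $q > 1$,
assume that $u^1, \ldots, u^{q-1}$, $p^{*,q}$, and $(u\cdot\nabla u)^{*,q}$ have
been initialized properly. For $n \ge q - 1$, seek $u^{n+1}$
such that $u^{n+1}|_{\partial\Omega} = 0$ and

$$\frac{D^{(q)}}{\Delta t} u^{n+1} - \nu\Delta u^{n+1} + \nabla\left(p^{*,n+1} + \sum_{j=0}^{q-1} \frac{\beta_j}{\Delta t}\phi^{n-j}\right) = S^{n+1}$$ [54a]

where $S^{n+1} = f^{n+1} - (u\cdot\nabla u)^{*,n+1}$. Then solve

$$\Delta\phi^{n+1} = \nabla\cdot u^{n+1}, \qquad \partial_n\phi^{n+1}|_{\partial\Omega} = 0$$ [54b]

Finally, update the pressure as follows:

$$p^{n+1} = \frac{\beta_q}{\Delta t}\phi^{n+1} + p^{*,n+1} - \nu\nabla\cdot u^{n+1}$$ [54c]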


The algorithm [54a-c] is known in the literature as
the rotational form of the pressure-correction
method. Upon denoting by $u_{\Delta t} = (u(t^0), \ldots, u(t^N))$
and $p_{\Delta t} = (p(t^0), \ldots, p(t^N))$ the sequences of exact values, and by $\tilde{u}_{\Delta t}$ and $\tilde{p}_{\Delta t}$ the sequences computed by the algorithm, the above algorithm
has been proved to yield the following error
estimates:

$$\|u_{\Delta t} - \tilde{u}_{\Delta t}\|_{\ell^2(L^2)} \le c\,\Delta t^2$$

$$\|\nabla(u_{\Delta t} - \tilde{u}_{\Delta t})\|_{\ell^2(L^2)} + \|p_{\Delta t} - \tilde{p}_{\Delta t}\|_{\ell^2(L^2)} \le c\,\Delta t^{3/2}$$

where $\|\phi_{\Delta t}\|^2_{\ell^2(L^2)} = \Delta t\sum_{n=0}^N \int_\Omega |\phi^n|^2\,dx$.

A simple strategy to initialize the algorithm
consists of using $D^{(1)}u^1$ at the first step in [54a];
then using $D^{(2)}u^2$ at the second step, and proceeding likewise until $u^1, \ldots, u^{q-1}$ have all been
computed.

At the present time, projection methods count among
the few methods that are capable of solving the time-dependent incompressible Navier-Stokes equations in
three dimensions on fine meshes within reasonable
computation times. The reason for this success is that
the unsplit strategy, which consists of solving

$$\frac{D^{(q)}}{\Delta t} u^{n+1} - \nu\Delta u^{n+1} + \nabla p^{n+1} = f^{n+1} - (u\cdot\nabla u)^{*,n+1}$$ [55a]
$$\nabla\cdot u^{n+1} = 0, \qquad u^{n+1}|_{\partial\Omega} = 0$$ [55b]

yields a linear system similar to [37], which usually takes
far more time to solve than sequentially solving [54a]
and [54b]. It is commonly reported in the literature that
the ratio of the CPU time for solving [55a]-[55b] to that
for solving [54a-c] ranges between 10 and 30.


See also: Compressible Flows: Mathematical Theory; 
Computational Methods in General Relativity: The Theory; 
Geophysical Dynamics; Image Processing: Mathematics; 
Incompressible Euler Equations: Mathematical Theory; 
Interfaces and Multicomponent Fluids; 
Magnetohydrodynamics; Newtonian Fluids and 
Thermohydraulics; Non-Newtonian Fluids; Partial 
Differential Equations: Some Examples; Variational 
Methods in Turbulence.

Introduction


In the famous 1822 treatise by Jean Baptiste Joseph 
Fourier, Théorie analytique de la chaleur, the Discours 
préliminaire opens with: “Primary causes are 
unknown to us; but are subject to simple and constant 
laws, which may be discovered by observation, the 
study of them being the subject of natural philosophy. 
Heat, like gravity, penetrates every substance of the 
universe, its rays occupy all parts of space. The object 
of our work is to set forth the mathematical laws 
which this element obeys. The theory of heat will 
hereafter form one of the most important branches of 
general physics.” After a brief discussion of rational 
mechanics, he continues with the sentence: “But 
whatever may be the range of mechanical theories, 
they do not apply to the effects of heat. These make up 
a special order of phenomena, which cannot be 
explained by the principles of motion and equilibria.” 
Fourier goes on with a thorough description of the 
phenomenology of heat transport and the derivation of 
the partial differential equation describing heat transport: the heat equation. A large part of the treatise is
then devoted to solving the heat equation for various
geometries and boundary conditions. Fourier’s treatise 
marks the birth of Fourier analysis. After Boltzmann, 
Gibbs, and Maxwell and the invention of statistical 
mechanics in the decades after Fourier’s work, we 
believe that Fourier was wrong and that, in principle, 
heat transport can and should be explained “by the 
principles of motion and equilibria,” that is, within the 
formalism of statistical mechanics. But well over a 
century after the foundations of statistical mechanics 
were laid down, we still lack a mathematically 
reasonable derivation of Fourier’s law from first 
principles. Fourier’s law describes the macroscopic 
transport properties of heat, that is, energy, in nonequilibrium systems. Similar laws are valid for the
transport of other locally conserved quantities, for 
example, charge, particle density, momentum, etc. We 
will not discuss these laws here, except to point out 
that in none of these cases macroscopic transport laws 
have been derived from microscopic dynamics. As 
Peierls once put it: “It seems there is no problem in 
modern physics for which there are on record as many 
false starts, and as many theories which overlook some 
essential feature, as in the problem of the thermal 
conductivity of [electrically] non-conducting crystals.” 


Macroscopic Law 


Consider a macroscopic system characterized at 
some initial time, say t=0, by a nonuniform 


temperature profile To(r). This temperature profile 
will generate a heat, that is, energy current J(r). 


Due to energy conservation and _ basic 
thermodynamics: 
o 
o(T) a T(r, t) =-V ‘J [1] 


where c_v(T) is the specific heat per unit volume. On the
other hand, we know that if the temperature profile is
uniform, that is, if T_0(r) = T_0, there is no current in
the system. It is then natural to assume that, for small
temperature gradients, the current is given by


    J(r) = -\kappa(T) \nabla T(r) \qquad [2]


where κ(T) is the conductivity. Here we have
assumed that there is no mass flow or other mode
of energy transport besides heat conduction (we
also ignore, for simplicity, any variations in density
or pressure). Equation [2] is usually called
Fourier’s law. Putting together eqns [1] and [2], we
get the heat equation:


    c_v(T) \frac{\partial}{\partial t} T(r,t) = \nabla \cdot [\kappa(T) \nabla T] \qquad [3]

This equation must be completed with suitable 
boundary conditions. Let us consider two distinct 
situations in which the heat equation is observed to 
hold experimentally with high precision: 


1. An isolated macroscopic system, for example,
a fluid or solid in a domain Λ surrounded
by effectively adiabatic walls. In this case,
eqn [3] is to be solved subject to the initial
condition T(r,0) = T_0(r) and no heat flux
across the boundary of Λ (denoted by ∂Λ), that
is, n(r)·∇T(r) = 0 if r ∈ ∂Λ, with n the normal
vector to ∂Λ at r. As t → ∞, the system reaches a
stationary state characterized by a uniform
temperature T determined by the constancy of
the total energy.

2. A system in contact with heat reservoirs. Each
reservoir α fixes the temperature of some portion
(∂Λ)_α of the boundary ∂Λ. The rest of the
boundary is insulated. When the system reaches
a stationary state (again assuming no matter
flow), its temperature will be given by the
solution of eqn [3] with the left-hand side set
equal to zero,

    \nabla \cdot J(r) = -\nabla \cdot (\kappa \nabla T(r)) = 0 \qquad [4]

subject to the boundary condition T(r) = T_α for
r ∈ (∂Λ)_α and no flux across the rest of the
boundary.
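
The relaxation described in case 1 can be illustrated numerically. The following is a minimal sketch, assuming a one-dimensional rod with constant κ and c_v and a simple explicit finite-volume scheme; the function name, grid and parameter values are illustrative choices, not part of the text.

```python
def relax_insulated_rod(temps, kappa=1.0, c_v=1.0, dx=1.0, dt=0.2, steps=2000):
    """Explicit update of c_v dT/dt = d/dx(kappa dT/dx) with zero-flux ends.

    Assumes constant kappa and c_v; stable for kappa*dt/(c_v*dx**2) <= 0.5.
    """
    temps = list(temps)
    n = len(temps)
    for _ in range(steps):
        # flux[i] is J = -kappa dT/dx through the face between cells i-1 and i.
        # The two outer faces keep zero flux (adiabatic walls).
        flux = [0.0] * (n + 1)
        for i in range(1, n):
            flux[i] = -kappa * (temps[i] - temps[i - 1]) / dx
        # Energy conservation, eqn [1]: c_v dT/dt = -dJ/dx.
        temps = [
            temps[i] - dt * (flux[i + 1] - flux[i]) / (c_v * dx)
            for i in range(n)
        ]
    return temps


if __name__ == "__main__":
    start = [2.0] * 10 + [1.0] * 10   # hot half next to a cold half
    final = relax_insulated_rod(start)
    print(sum(start) / len(start), sum(final) / len(final))
    print(max(final) - min(final))
```

Because every interior flux is subtracted from one cell and added to its neighbor, the total energy is conserved by construction, and the profile flattens toward the uniform temperature fixed by that energy.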


The simplest geometry for a conducting system is
that of a cylindrical slab of height h and
cross-sectional area A. It can be either a cylindrical
container filled with a fluid or a piece of crystalline
solid. In both cases, one keeps the lateral surface of
the cylinder insulated. If the top and the bottom of
the cylinder are also insulated we are in case (1). If
one keeps the top and the bottom in contact with
thermostats at temperatures T_t and T_b, respectively,
this is (for a fluid) the usual setup for a Bénard
experiment. To avoid convection, one has to make
T_t > T_b or keep |T_t − T_b| small. Assuming
uniformity in the directions perpendicular to the vertical
x-axis one has, in the stationary state, a temperature
profile T(x) with T(0) = T_b, T(h) = T_t and
κ(T) dT/dx = const. for x ∈ (0,h).
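
The condition κ(T) dT/dx = const. determines the stationary profile once κ(T) is known. As a hedged illustration, assume a power law κ(T) = T^a (the exponent a and both function names are assumptions made for this sketch). Then T^(a+1) is linear in x, which the code below uses and checks by evaluating the flux at several heights.

```python
def stationary_profile(x, h, t_bottom, t_top, a=0.5):
    """T(x) with kappa(T) dT/dx = const, assuming kappa(T) = T**a.

    Boundary values: T(0) = t_bottom, T(h) = t_top.
    """
    s = a + 1.0
    return (t_bottom**s + (t_top**s - t_bottom**s) * x / h) ** (1.0 / s)


def heat_flux(x, h, t_bottom, t_top, a=0.5, dx=1e-6):
    """J = -kappa(T) dT/dx, with dT/dx from a central difference."""
    t = stationary_profile(x, h, t_bottom, t_top, a)
    slope = (
        stationary_profile(x + dx, h, t_bottom, t_top, a)
        - stationary_profile(x - dx, h, t_bottom, t_top, a)
    ) / (2.0 * dx)
    return -(t**a) * slope


if __name__ == "__main__":
    for x in (0.2, 0.5, 0.8):
        print(x, stationary_profile(x, 1.0, 1.0, 4.0), heat_flux(x, 1.0, 1.0, 4.0))
```

The flux comes out the same at every height, equal to −(T_t^(a+1) − T_b^(a+1))/((a+1)h), while the temperature profile itself is not linear unless a = 0.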

In deriving the heat equation, we have implicitly 
assumed that the system is described fully by specifying
its temperature T(r,t) everywhere in Λ. What this
means on the microscopic level is that we imagine the
system to be in local thermal equilibrium (LTE).
Heuristically, we might think of the system as being
divided up (mentally) into many little cubes, each large
enough to contain very many atoms yet small enough
on the macroscopic scale to be accurately described, at
a specified time t, as a system in equilibrium at
temperature T(r_i, t), where r_i is the center of the ith
cube. For slow variation in space and time, we can
then use a continuous description T(r, t). The theory
of the heat equation is highly developed and, together
with its generalizations, plays a central role in modern 
analysis. In particular, one can consider more general 
boundary conditions. Here we are interested in the 
derivation of eqn [2] from first principles. This clearly 
presupposes, as a first fundamental step, a precise 
definition of the concept of LTE and its justification 
within the laws of mechanics.


Empirical Argument 


A theory of heat conduction has as a goal the
computation of the conductivity κ(T) for realistic
models, or, at the very least, the derivation of the
behavior of κ(T) as a function of T. The early
analysis was based on “kinetic theory.” Its application
to heat conduction goes back to the works of
Clausius, Maxwell, and Boltzmann, who obtained a
theoretical expression for the heat conductivity of
gases, κ ~ √T, independent of the gas density. This
agrees with experiment (when the density is not too 
high) and was a major early achievement of the 
atomic theory of matter. 


Heat Conduction in Gases 


Clausius and Maxwell used the concept of a “mean
free path” λ: the average distance a particle (atom or
molecule) travels between collisions in a gas with
particle density ρ. Straightforward analysis gives
λ ~ 1/(ρπσ²), where σ is an “effective” hard-core diameter
of a particle. They considered a gas with a temperature
gradient in the x-direction and assumed that the gas is
(approximately) in local equilibrium with density ρ
and temperature T(x). Between collisions, a particle
moves a distance λ carrying a kinetic energy
proportional to T(x) from x to x + λ/√3, while in the
opposite direction the amount carried is proportional
to T(x + λ√3). Taking into account the fact that the
speed is proportional to √T, the amount of energy J
transported per unit area and time across a plane
perpendicular to the x-axis is approximately


    J \sim \rho \sqrt{T} \left[ T(x) - T(x + \lambda\sqrt{3}) \right] \sim -\sigma^{-2} \sqrt{T} \, \frac{dT}{dx} \qquad [5]


and so κ ~ √T independent of ρ, in agreement with
experiment. It was clear to the founding fathers that,
starting with a local equilibrium situation, the process
described above will produce, as time goes on, a
deviation from LTE. They reasoned, however, that this
deviation from local equilibrium will be small when
(λ/T) dT/dx ≪ 1, the regime in which Fourier’s law is
expected to hold, and the above calculation should
yield, up to some factor of order unity, the right heat
conductivity. To have a more precise theory, one can
describe the state of the gas through the probability
distribution f(r,p,t) of finding a particle in the
volume element dr dp around the phase space point
(r, p). Here LTE means that


    f(r, p, t) \simeq \exp\left( -\frac{p^{2}}{2 m k T(r)} \right)


where m is the mass of the particles and k is Boltzmann’s
constant. If one computes
the heat flux at a point r by averaging the microscopic
energy current at r, j = ρ v (½ m v²), over f(r,p,t), then
it is only the deviation from local equilibrium which
makes a contribution. The result, however, is essentially
the same as eqn [5]. This was shown by Boltzmann,
who derived an accurate formula for κ in gases by
using the Boltzmann equation. If one takes κ from
experiment, the above analysis yields a value for σ, the
effective size of an atom or molecule, which turns out
to be close to other determinations of the characteristic
size of an atom. This provided evidence for the reality of
atoms and the molecular theory of heat.
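
The mean-free-path estimate can be condensed into one chain of order-of-magnitude relations. This is only a sketch with all numerical factors dropped (in particular the factors of √3); it expands the temperature difference over one free path to first order in λ and then uses ρλ ~ 1/(πσ²).

```latex
J \;\sim\; \rho\,\sqrt{T}\,\bigl[\,T(x) - T(x+\lambda)\,\bigr]
  \;\approx\; -\,\rho\,\lambda\,\sqrt{T}\,\frac{dT}{dx},
\qquad
\rho\,\lambda \;\sim\; \frac{1}{\pi\sigma^{2}}
\;\;\Longrightarrow\;\;
J \;\sim\; -\,\frac{\sqrt{T}}{\sigma^{2}}\,\frac{dT}{dx},
\qquad
\kappa \;\sim\; \frac{\sqrt{T}}{\sigma^{2}} .
```

The density cancels because a denser gas has more carriers but each carrier travels a proportionally shorter distance between collisions.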


Heat Conduction in Insulating Crystals 


In (electrically) conducting solids, heat is mainly
transported by the conduction electrons. In this case,
one can adapt the theory discussed in the previous
section. In (electrically) insulating solids, on the other
hand, heat is transmitted through the vibrations of the 
lattice. In order to use the concepts of kinetic theory, it 
is useful to picture a solid as a gas of phonons which 
can store and transmit heat. A perfectly harmonic 
crystal, due to the fact that phonons do not interact, 
has an infinite thermal conductivity: in the language of
kinetic theory, the mean free path λ is infinite. In a real
crystal, the anharmonic forces produce interactions 
between the phonons and therefore a finite mean free 
path. Another source of finite thermal conductivity 
may be the lattice imperfections and impurities which 
scatter the phonons. Debye devised a kind of kinetic 
theory for phonons in order to describe thermal 
conductivity. One assumes that a small gradient of 
temperature is imposed and that the collisions between 
phonons maintain local equilibrium. An elementary 
argument gives a thermal conductivity analogous to 
eqn [5] obtained in the last subsection for gases 
(remembering, however, that the density of phonons 
is itself a function of T) 


    \kappa \sim c_v \, c^{2} \, \tau \qquad [6]


where, with respect to eqn [5], ρ has been replaced by
c_v, the specific heat of phonons, √T by c, the (mean)
velocity of the phonons, and λ by cτ, where τ is the
effective mean free time between phonon collisions.
The thermal conductivity depends on the temperature
via τ, and a more refined theory is needed to account
for this dependence. This was done by Peierls via a
Boltzmann equation for the phonons. In collisions
among phonons, the momentum of phonons is
conserved only modulo a vector of the reciprocal
lattice. One calls “normal processes” those where the
phonon momentum is conserved and “Umklapp
processes” those where the initial and final momenta
differ by a nonzero reciprocal lattice vector. Peierls’
theory may be summarized (very roughly) as follows:
in the absence of Umklapp processes, the mean free
path, and thus the thermal conductivity of an
insulating solid, is infinite. A success of Peierls’ theory is to
describe correctly the temperature dependence of the
thermal conductivity. Furthermore, on the basis of this
theory, one does not expect a finite thermal
conductivity in one-dimensional monoatomic lattices with
pair interactions. This seems so far to be a correct
prediction, at least in the numerous numerical
simulations performed on various models.


Statistical Mechanics Paradigm: 
Rigorous Analysis 


In a rigorous approach to the above arguments, we
have to first formulate the problem precisely on a
mathematical level. It is natural to adapt the standard
formalism of statistical mechanics to our situation. To
this end, we assume that our system is described by
the positions Q and momenta P of a (very large)
number of particles, N, with Q = (q_1, ..., q_N) ∈ Λ^N,
Λ ⊂ R^d, and P = (p_1, ..., p_N) ∈ R^{dN}. The
dynamics (in the bulk) is given by a Hamiltonian
function H(Q,P). A state of the system is a
probability measure μ(P,Q) on phase space. As
usual in statistical mechanics, the value of an
observable f(P,Q) will be given by the expected
value of f with respect to the measure μ. In the case of
a fluid contained in a region Λ, we can assume that
the Hamiltonian has the form


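    H(P,Q) = \sum_{i=1}^{N} \left[ \frac{p_i^{2}}{2m} + \sum_{j \neq i} \phi(q_j - q_i) + u(q_i) \right] = \sum_{i=1}^{N} \frac{p_i^{2}}{2m} + \mathcal{V}(Q) \qquad [7]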


where φ(q) is some short-range interparticle potential
and u(q_i) an external potential (e.g., the interaction of
the particle with fixed obstacles such as a conduction
electron interacting with the fixed crystalline ions). If
we want to describe the case in which the temperature
at the boundary is kept different in different regions
∂Λ_α, we have to properly define the dynamics at the
boundary of the system. A possibility is to use
“Maxwell boundary conditions”: when a particle hits
the wall in ∂Λ_α, it gets reflected and re-emerges with a
distribution of velocities


    f_\alpha(dv) = \frac{m^{2}}{2\pi (k T_\alpha)^{2}} \, |v_x| \, \exp\left[ -\frac{m v^{2}}{2 k T_\alpha} \right] dv \qquad [8]

Several other ways to impose boundary conditions
have been considered in the literature. The notion of
LTE can be made precise here in the so-called
hydrodynamic scaling limit (HSL), where the ratio
of microscopic to macroscopic scales goes to zero.
The macroscopic coordinates r and t are related to
the microscopic ones q and τ by r = εq and t = ε^α τ,
that is, if Λ is a cube of macroscopic sides l, then its
sides, now measured in microscopic length units, are
of length L = ε^{-1} l. We then suppose that at t = 0 our
system of N = ρL^d particles is described by an
equilibrium Gibbs measure with a temperature
T(r) = T(εq): roughly speaking, the phase-space
ensemble density has the form


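    \mu_0(P,Q) \sim \exp\left\{ -\sum_{i=1}^{N} \beta_0(\varepsilon q_i) \left[ \frac{p_i^{2}}{2m} + \sum_{j \neq i} \phi(q_j - q_i) + u(q_i) \right] \right\} \qquad [9]
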
where β_0^{-1}(r) = T_0(r). In the limit ε → 0, ρ fixed, the
system at t = 0 will be macroscopically in LTE with
a local temperature T_0(r) (as already noted, here we
suppress the variation in the particle density n(r)).
We are interested in the behavior of a macroscopic
system, for which ε ≪ 1, at macroscopic times
t > 0, corresponding to microscopic times
τ = ε^{-α} t, α = 2 for heat conduction or other
diffusive behavior. The implicit assumption then made
in the macroscopic description given earlier is that,
since the variations in T_0(r) are of order ε on a
microscopic scale, then for ε ≪ 1, the system will,
also at time t, be in a state very close to LTE, with
a temperature T(r,t) that evolves in time according
to the heat equation, eqn [3]. From a mathematical
point of view, the difficult problem is to prove that
the system stays in LTE for t > 0 when the
dynamics are given by a Hamiltonian time
evolution. This requires proving that the macroscopic
system has some very strong ergodic properties, for
example, that the only time-invariant measures
locally absolutely continuous with respect to the
Lebesgue measure are, for infinitely extended
spatially uniform systems, of the Gibbs type. This
has only been proved so far for systems evolving
via stochastic dynamics (e.g., interacting Brownian
particles or lattice gases). For such stochastic
systems, one can sometimes prove the hydrodynamic
limit and derive macroscopic transport
equations for the particle or energy density and
thus verify the validity of Fourier’s law. Another
possibility, as we already saw, is to use the
Boltzmann equation. Using ideas of hydrodynamic
space and time scaling described earlier, it is
possible to derive a controlled expansion for the
solution of the stationary Boltzmann equation
describing the steady state of a gas coupled to
temperature reservoirs at the top and bottom. One
then shows that for ε ≪ 1, ε being now the ratio
λ/L, the Boltzmann equation for f in the slab has a
time-independent solution which is close to a local
Maxwellian, corresponding to LTE (apart from
boundary layer terms) with a local temperature and
density given by the solution of the Navier-Stokes
equations, which incorporate Fourier’s law as
expressed in eqn [2]. The main mathematical
problem is in controlling the remainder in an
asymptotic expansion of f in powers of ε. This
requires that the macroscopic temperature gradient,
that is, |T_1 − T_2|/h, where h = εL is the thickness of
the slab on the macroscopic scale, be small. Even if
this apparently technical problem could be
overcome, we would still be left with the question of
justifying the Boltzmann equation for such steady
states and, of course, it would not tell us anything
about dense fluids or crystals. In fact, the
Boltzmann equation itself is really closer to a
macroscopic than to a microscopic description. It is
obtained in a well-defined kinetic scaling limit in
which, in addition to rescaling space and time, the
particle density goes to zero, that is, λ → ∞.
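
The choice α = 2 for diffusive behavior can be checked directly on the heat equation. As a short sketch, suppose eqn [3] held in the microscopic variables (q, τ) and substitute r = εq, t = ε^α τ:

```latex
\partial_\tau = \varepsilon^{\alpha}\,\partial_t ,
\qquad
\nabla_q = \varepsilon\,\nabla_r
\quad\Longrightarrow\quad
\varepsilon^{\alpha}\, c_v(T)\,\partial_t T
  \;=\; \varepsilon^{2}\,\nabla_r \cdot \bigl[\kappa(T)\,\nabla_r T\bigr] .
```

The powers of ε cancel exactly when α = 2, so only this scaling leaves a nontrivial, ε-independent macroscopic equation.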

A simplified model of a crystal is characterized by
the fact that all atoms oscillate around given
equilibrium positions. The equilibrium positions
can be thought of as the points of a regular lattice
in R^d, say Z^d. Although d = 3 is the physical
situation, one can also be interested in the case
d = 1, 2. In this situation, Λ ⊂ Z^d with cardinality
N, and each atom is identified by its position
x_i = i + q_i, where i ∈ Λ and q_i ∈ R^d is the
displacement of the particle at lattice site i from its
equilibrium position. Since interatomic forces in
real solids have short range, it is reasonable to
assume that the atoms interact only with their
nearest neighbors via a potential that depends only
on the relative distance with respect to the
equilibrium distance. Accordingly, the Hamiltonians that
we consider have the general form





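    H(P,Q) = \sum_{i \in \Lambda} \frac{p_i^{2}}{2m} + \sum_{|i-j|=1} V(q_i - q_j) + \sum_{i \in \Lambda} U_i(q_i) = \sum_{i \in \Lambda} \frac{p_i^{2}}{2m} + \mathcal{V}(Q) \qquad [10]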


where P = (p_i)_{i∈Λ} and analogously for Q. We shall
further assume that as |q| → ∞ so do U_i(q) and
V(q). The addition of U_i(q) pins down the crystal
and ensures that exp[−βH(P,Q)] is integrable with
respect to dP dQ, and thus the corresponding Gibbs
measure is well defined. In this case, in order to fix the
temperature at the boundary, one can add a Langevin
term to the equation of motion of the particles on the
boundaries, that is, if i ∈ ∂Λ, the equation for the particle is


    \dot{p}_i = -\partial_{q_i} H(P,Q) - \lambda p_i + \sqrt{2 \lambda m T_i} \, \dot{w}_i \qquad [11]


where ẇ_i is a standard white noise and T_i is the
temperature imposed at site i. Other thermostatting
mechanisms can be considered. In this case
we can also define LTE using eqn [9], but we run
into the same difficulties described above, although
the problem is somewhat simpler due to the presence
of the lattice structure and the fact that the particles
oscillate close to their equilibrium points. We can
obtain Fourier’s law only by adding stochastic
terms, for example, terms like those in eqn [11], to the
equation of motion of every particle and assuming
that U(q) and V(q) are harmonic. These added
noises can be thought of as an effective description
of the chaotic motion generated by the anharmonic
terms in U(q) and V(q).


Just how far we are from establishing rigorously
the Fourier law is clear from our very limited
mathematical understanding of the stationary
nonequilibrium state (SNS) of mechanical systems
whose ends are, as in the example of the Bénard
problem, kept at fixed temperatures T_1 and T_2.
Various models have been considered, for
example, models with Hamiltonian [10] coupled at the
boundaries with heat reservoirs described by eqn
[11]. The best mathematical results one can prove
are: the existence and uniqueness of the SNS; the
existence of a stationary nontrivial heat flow;
properties of the fluctuations of the heat flow in
the SNS; central-limit-theorem type fluctuations
(related to the Kubo formula and Onsager
relations); and large-deviation type fluctuations
(related to the Gallavotti-Cohen fluctuation
theorem). What is missing is information on how the
relevant quantities depend on the size of the
system, N. In this context, the heat conductivity
can be defined precisely without invoking LTE. To
do this, we let J be the expectation value in the SNS
of the energy or heat current flowing from reservoir
1 to reservoir 2. We then define the conductivity
κ_L as J/(A δT/L), where δT/L = (T_1 − T_2)/L is the
effective temperature gradient for a cylinder of
microscopic length L and uniform cross section A,
and κ(T) is the limit of κ_L when
δT → 0 (T_1 = T_2 = T) and L → ∞. The existence of
such a limit with κ positive and finite is what one
would like to prove.
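
For a purely harmonic chain the SNS is Gaussian, so the current J in the definition of κ_L can be computed without simulation. The sketch below makes several assumptions not stated in the text: d = 1, unit masses and spring constants, fixed ends, no pinning potential, units with k = 1, and Langevin reservoirs of the form [11] on the two end particles only. It obtains the stationary covariance from the Lyapunov equation of the linear stochastic system, a standard tool brought in for this illustration; all function names are hypothetical.

```python
import numpy as np


def stationary_covariance(n, t_left, t_right, lam=1.0):
    """Covariance of x = (q, p) in the SNS of a harmonic chain.

    Assumes unit masses, unit nearest-neighbor springs and fixed walls;
    sites 0 and n-1 feel friction lam and noise at temperatures
    t_left and t_right. Solves A C + C A^T + D = 0.
    """
    # Force matrix: -(phi q)_i is the harmonic force on particle i.
    phi = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    gamma = np.zeros((n, n))
    gamma[0, 0] = gamma[-1, -1] = lam
    a = np.block([[np.zeros((n, n)), np.eye(n)], [-phi, -gamma]])
    # Noise covariance: only the thermostatted momenta are driven.
    d = np.zeros((2 * n, 2 * n))
    d[n, n] = 2.0 * lam * t_left
    d[2 * n - 1, 2 * n - 1] = 2.0 * lam * t_right
    # Row-major vectorization: vec(A C) = (A x I) vec(C), vec(C A^T) = (I x A) vec(C).
    eye = np.eye(2 * n)
    lhs = np.kron(a, eye) + np.kron(eye, a)
    return np.linalg.solve(lhs, -d.reshape(-1)).reshape(2 * n, 2 * n)


def heat_current(n, t_left, t_right, lam=1.0):
    """Stationary energy current through the first bond, J = <q_0 p_1> (n >= 2)."""
    c = stationary_covariance(n, t_left, t_right, lam)
    return c[0, n + 1]


if __name__ == "__main__":
    for n in (4, 8, 16):
        j = heat_current(n, 2.0, 1.0)
        print(n, j, j * n / (2.0 - 1.0))  # last column: kappa_L with A = 1, L = n
```

In this model the current barely changes with the chain length, so κ_L = J L/δT grows roughly linearly in L instead of converging, in line with the infinite conductivity of a perfectly harmonic crystal noted earlier.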


See also: Dynamical Systems and Thermodynamics; 
Ergodic Theory; Interacting Particle Systems and 
Hydrodynamic Equations; Kinetic Equations; 
Nonequilibrium Statistical Mechanics: Dynamical 
Systems Approach; Nonequilibrium Statistical 
Mechanics: Interaction Between Theory and Numerical 
Simulations. 
Introduction


The Fourier-Mukai transform was introduced by Mukai
in the study of abelian varieties and can
be thought of as a nontrivial algebro-geometric
analog of the Fourier transform. Since its original
introduction, the Fourier-Mukai transform has turned
out to be a useful tool for studying various aspects
of sheaves on varieties and their moduli spaces and,
as a natural consequence, for learning about the
varieties themselves. Various links between geome- 
try and derived categories have been uncovered; for 
instance, Bondal and Orlov proved that Fano 
varieties, and certain varieties of general type, can 
be reconstructed from their derived categories. 
Moreover, Orlov proved a derived version of the 
Torelli theorem for K3 surfaces and also a structure 
theorem for derived categories of abelian varieties. 
Later, Kawamata gave evidence to the conjecture 
that two birational smooth projective varieties with 
trivial canonical sheaves have equivalent derived 
categories, which has been proved by Bridgeland in 
dimension 3. 

The Fourier-Mukai transform also enters into 
string theory. The most prominent example is 
Kontsevich’s homological mirror-symmetry conjec- 
ture. The conjecture predicts (for mirror dual pairs 
of Calabi-Yau manifolds) an equivalence between 
the bounded derived category of coherent sheaves 
and the Fukaya category. The conjecture implies a 
correspondence between certain self-equivalences 
(given by Fourier-Mukai transforms) of the derived 
category and symplectic self-equivalences of the 
mirror manifold. 

Besides their importance for geometrical aspects 
of mirror symmetry, the Fourier-Mukai transforms 
have also been important for heterotic string 
compactifications. The motivation for this came 
from the conjectured correspondence between the
heterotic string and F-theory, which both rely on
elliptically fibered Calabi-Yau manifolds. To give 
evidence for this correspondence, an explicit descrip- 
tion of stable holomorphic vector bundles was 
necessary and inspired a series of publications by 
Friedman, Morgan, and Witten. Their bundle con- 
struction relies on two geometrical objects: a 
hypersurface in the Calabi-Yau manifold together 
with a line bundle on it; more precisely, they 
construct vector bundles using a relative Fourier- 
Mukai transform. 

Various aspects and refinements of this construc- 
tion have been studied by now. For instance, a 
physical way to understand the bundle construction 
can be given using the fact that holomorphic vector 
bundles can be viewed as D-branes and that 
D-branes can be mapped under T-duality to new 
D-branes (of different dimensions). 

We survey aspects of the Fourier-Mukai trans- 
form, its relative version and outline the bundle 
construction of Friedman, Morgan, and Witten. The 
construction has led to many new insights, for 
instance, the presence of 5-branes in heterotic string 
vacua has been understood. The construction also 
inspired a tremendous amount of work towards a 
heterotic string phenomenology on elliptic Calabi- 
Yau manifolds. For the many topics omitted the 
reader should consult the “Further reading” section. 


The Fourier-Mukai Transforms


Every object E of the derived category on the
product X × Y of two smooth algebraic varieties X
and Y gives rise to a functor Φ^E from the bounded
derived category D(X) of coherent sheaves on X to
the similar category on Y:

    \Phi^{E} \colon D(X) \to D(Y), \qquad F \mapsto \Phi^{E}(F) = R\hat{\pi}_{*}(\pi^{*}F \otimes E)

where π, π̂ are the projections from X × Y to X
and Y, respectively, and ⊗ denotes the derived
tensor product. Φ^E(F) is called the Fourier-Mukai
transform with kernel E ∈ D(X × Y) (in analogy
with the definition of an integral transform with
kernel). Note that, given a Fourier-Mukai functor
Φ^E, Φ^E(F) is in general a complex having
homology in several degrees even if F is a sheaf.
Furthermore, a result by Orlov states that if X
and Y are smooth projective varieties then any
fully faithful functor D(X) → D(Y) is a
Fourier-Mukai functor.

In analogy with the Fourier transform, there is a
kind of “convolution product” giving the
composition of two such functors. More precisely, given
smooth algebraic varieties X, Y, Z, and elements
E ∈ D(X × Y) and G ∈ D(Y × Z), we can define
G ∘ E ∈ D(X × Z) by

    G \circ E = R\pi_{XZ,*}(\pi_{XY}^{*}E \otimes \pi_{YZ}^{*}G)

where π_{XY}, π_{YZ}, π_{XZ} are the projections from
X × Y × Z to the pairwise products, giving a natural
isomorphism of functors

    \Phi^{G} \circ \Phi^{E} \simeq \Phi^{G \circ E}


Another analogy with the Fourier transform can
be drawn. For this, assume that we have sheaves F
and G which only have one nonvanishing
Fourier-Mukai transform, the ith one Φ^i(F) (where
Φ^i: D(X) → Coh(Y), F ↦ H^i(Φ^E(F)); cf. remarks below)
in the case of F, and the jth one Φ^j(G) in the case
of G. Given such sheaves, there is the Parseval
formula

    \operatorname{Ext}^{h}_{X}(F, G) \simeq \operatorname{Ext}^{h+i-j}_{Y}(\Phi^{i}(F), \Phi^{j}(G))

which gives a correspondence between the
extensions of F, G and the extensions of their
Fourier-Mukai transforms. This formula can be considered
as the analog of the Parseval formula for the
ordinary Fourier transform for functions on a torus.

The Parseval formula can be proved using two
facts. First, for arbitrary coherent sheaves E, G the
Ext groups can be computed in terms of the derived
category, namely

    \operatorname{Ext}^{i}(E, G) = \operatorname{Hom}_{D(X)}(E, G[i])

Second, the Fourier-Mukai transforms of F and G in
the derived category D(Y) are given by Φ(F) =
Φ^i(F)[−i] and Φ(G) = Φ^j(G)[−j]. Since the
Fourier-Mukai transform is an equivalence of categories, we
have

    \operatorname{Hom}_{D(X)}(F, G[h]) = \operatorname{Hom}_{D(Y)}(\Phi^{i}(F), \Phi^{j}(G)[i - j + h])

implying the Parseval formula.
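
For readers who want the intermediate steps, the argument can be laid out as a chain of isomorphisms. This is a sketch assuming that Φ: D(X) → D(Y) is fully faithful and commutes with the shift functor, as any exact functor does.

```latex
\begin{aligned}
\operatorname{Ext}^{h}_{X}(F,G)
  &\simeq \operatorname{Hom}_{D(X)}\bigl(F,\; G[h]\bigr) \\
  &\simeq \operatorname{Hom}_{D(Y)}\bigl(\Phi(F),\; \Phi(G)[h]\bigr) \\
  &\simeq \operatorname{Hom}_{D(Y)}\bigl(\Phi^{i}(F)[-i],\; \Phi^{j}(G)[-j+h]\bigr) \\
  &\simeq \operatorname{Hom}_{D(Y)}\bigl(\Phi^{i}(F),\; \Phi^{j}(G)[h+i-j]\bigr) \\
  &\simeq \operatorname{Ext}^{h+i-j}_{Y}\bigl(\Phi^{i}(F),\; \Phi^{j}(G)\bigr).
\end{aligned}
```

The fourth line shifts both arguments by i, which does not change the Hom group.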

A first simple example of a Fourier-Mukai functor
can be given: let F be the complex in D(X × X)
defined by the structure sheaf O_Δ of the diagonal
Δ ⊂ X × X. Then it is easy to check that Φ^F:
D(X) → D(X) is isomorphic to the identity functor
on D(X). Moreover, if we shift degrees by n, taking
F = O_Δ[n] (a complex with only the sheaf O_Δ placed
in degree n), then Φ^F: D(X) → D(X) is the degree
shifting functor G ↦ G[n].

As we will be interested in relative Fourier-Mukai
transforms for elliptic fibrations, let us consider the
case of a Fourier-Mukai transform on an elliptic
curve: consider an elliptic curve E with a fixed
origin p_0 and identify E with Ê = Pic^0(E) via
f: E → Ê, x ↦ O_E(x − p_0). As kernel we take the
normalized Poincaré line bundle
P := O_{E×E}(Δ − {p_0} × E − E × {p_0}).
The restriction of P to {p_0} × E
or E × {p_0} is isomorphic to the trivial line bundle
O_E. P has the universal property, which can be
expressed by Φ^P(k(x)) = f(x), where k(x) is the
skyscraper sheaf supported at a point x ∈ E; in particular,
Φ^P(k(p_0)) = O_E and Φ^P(O_E) = k(p_0)[−1], where O_E
is the structure sheaf of E.
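
The universal property can be verified in one line from the definition of the transform. The sketch below uses only that tensoring with the pullback of k(x) restricts the kernel to the slice {x} × E, which the second projection maps isomorphically onto E:

```latex
\Phi^{\mathcal P}\bigl(k(x)\bigr)
  = R\hat{\pi}_{*}\bigl(\pi^{*}k(x)\otimes\mathcal P\bigr)
  \;\simeq\; \mathcal P\big|_{\{x\}\times E}
  \;\simeq\; \mathcal O_{E}(x - p_0) .
```

Here the diagonal Δ meets {x} × E in the point (x, x), the divisor E × {p_0} meets it in (x, p_0), and the line bundle of {p_0} × E restricts trivially to the slice. Setting x = p_0 gives the trivial bundle O_E.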


Relative Fourier-Mukai Transforms
for Elliptic Fibrations


It is often convenient to study problems for families
rather than for single varieties. The main advantage
of the relative setting is that base-change properties
(or parameter dependencies) are better encoded into
the problem. We can do that for Fourier-Mukai
functors as well. To this end, we consider two
morphisms p: X → B, p̂: X̂ → B of algebraic
varieties. We will assume that the morphisms are flat
and so give nice families of algebraic varieties. We
shall define relative Fourier-Mukai functors in this
setting by means of a “kernel” E in the derived
category D(X ×_B X̂).

Let us make the relative setting explicit for elliptic
fibrations: an elliptic fibration is a proper flat
morphism p: X → B of schemes whose fibers are
Gorenstein curves of arithmetic genus 1. We also
assume that p has a section σ: B ↪ X taking values
in the smooth locus X′ → B of p. The generic fibers
are then smooth elliptic curves, whereas some
singular fibers are allowed. If the base B is a smooth
curve, elliptic fibrations were studied and classified
by Kodaira, who described all the types of singular
fibers that may occur, the so-called Kodaira curves.
When the base is a smooth surface, more
complicated configurations of singular curves can occur and
have indeed been studied by Miranda.

First let us fix notation and setup. We denote by
σ = σ(B) the image of the section, by X_t the fiber of
p over t ∈ B (we assume, in what follows, that B is either
a smooth curve or surface) and by i_t: X_t ↪ X the
inclusion. Furthermore, ω_{X/B} is the relative dualizing
sheaf and ω = R^1 p_* O_X ≅ (p_* ω_{X/B})^*, where the
isomorphism is Grothendieck-Serre duality for p. The
sheaf L = p_* ω_{X/B} is a line bundle whose first Chern
class we denote by K = c_1(L). The adjunction
formula for σ ↪ X gives that σ² = −σ · p^{-1}K as cycles
on X. Moreover, we will consider elliptic fibrations
with a section whose fibers are all geometrically
integral. This means that the fibration is isomorphic
to its Weierstrass model.

From Kodaira’s classification of possible singular
fibers one finds that the components of reducible
fibers of p which do not meet σ form rational double
point configurations disjoint from σ. Let X → X̄ be
the result of contracting these configurations and let
p̄: X̄ → B be the induced map. Then all fibers of p̄
are irreducible with at worst nodes or cusps as
singularities. In this case, one refers to X̄ as the
Weierstrass model of X.

The Weierstrass model can be constructed as
follows: the divisor 3σ is relatively ample and, if
E = p_* O_X(3σ) ≅ O_B ⊕ ω^{⊗2} ⊕ ω^{⊗3} and
p̄: P = P(E^*) → B is the associated projective bundle,
there is a projective morphism j: X → P such that
j(X) = X̄.

Now special fibers of X → B can have at most
one singular point, either a cusp or a simple node.
Thus, in this case 3σ is relatively very ample
and gives rise to a closed immersion j: X ↪ P
such that j^* O_P(1) = O_X(3σ), where j is locally a
complete intersection whose normal sheaf is
N(X/P) ≅ p^* ω^{−⊗6} ⊗ O_X(9σ). This follows by
relative duality since ω_{P/B} = Λ²Ω_{P/B} ≅ p̄^* ω^{⊗5}(−3), due
to the Euler exact sequence


    0 \to \Omega_{P/B} \to \bar{p}^{*}\mathcal{E}(-1) \to \mathcal{O}_{P} \to 0
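
The normal sheaf formula can be traced step by step. The following sketch takes three facts as assumptions: the splitting of E into O_B ⊕ ω^{⊗2} ⊕ ω^{⊗3}, the identification of ω_{X/B} with the pullback of ω^{-1} (standard for Weierstrass fibrations), and the adjunction formula for the codimension-one immersion j.

```latex
\begin{aligned}
\det\mathcal E \simeq \omega^{\otimes 5}
  \;&\Longrightarrow\;
  \omega_{P/B} = \Lambda^{2}\Omega_{P/B}
  \simeq \bar p^{*}\omega^{\otimes 5}(-3), \\
j^{*}\omega_{P/B}
  &\simeq p^{*}\omega^{\otimes 5}\otimes\mathcal O_X(-9\sigma)
  \qquad\text{since } j^{*}\mathcal O_P(1)=\mathcal O_X(3\sigma), \\
\omega_{X/B} \simeq j^{*}\omega_{P/B}\otimes N_{X/P}
  \;&\Longrightarrow\;
  N_{X/P} \simeq p^{*}\omega^{-1}\otimes p^{*}\omega^{\otimes -5}\otimes\mathcal O_X(9\sigma)
  = p^{*}\omega^{\otimes -6}\otimes\mathcal O_X(9\sigma).
\end{aligned}
```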


The morphism p: X → B is then a local complete
intersection morphism (cf. Fulton (1984)) and has a
virtual relative tangent bundle T_{X/B} = [j^* T_{P/B}] −
[N_{X/P}] in the K-group K^•(X). The Todd class of
T_{X/B} is given by

    \operatorname{Td}(T_{X/B}) = 1 - \tfrac{1}{2}\, p^{-1}K + \tfrac{1}{12}\left( 12\, \sigma \cdot p^{-1}K + 13\, p^{-1}K^{2} \right) - \tfrac{1}{2}\, \sigma \cdot p^{-1}K^{2} + \text{terms of higher degree}


Now if p̂: X̂ → B denotes the dual elliptic fibration,
defined as the relative moduli space of torsion-free
rank-1 sheaves of relative degree 0, it is known that
for t ∈ B there is an isomorphism X̂_t ≅ X_t between
the fibers of both fibrations. Since we assume that
the original fibration p: X → B has a section σ,
p and p̂ are globally isomorphic; hereafter we
identify X ≅ X̂, where X̂ denotes the compactified
relative Jacobian of X.

Note that X̂ is the scheme representing the
functor which, to any scheme morphism φ: S → B,
associates the space of equivalence classes of S-flat
sheaves on p_S: X ×_B S → S whose restrictions to the
fibers of p_S are torsion free (the usual definition of
“torsion free” is only for integral varieties, i.e.,
varieties whose local rings have no zero-divisors. In
this case, a sheaf M is torsion free if for any open
subset U, any nonzero section m of M on U and
any nonzero section a of the structure sheaf on U,
one has a·m ≠ 0. When the variety is not
integral (it is reducible or nonreduced) this
definition has no real meaning, and what replaces the
notion of “torsion free” is the Simpson definition
of “pure of maximal dimension”: a sheaf M is
“torsion free” in this sense if the support of any of
its nonzero subsheaves is the whole variety (cf. Huybrechts
and Lehn (1997))), of rank 1 and degree 0; two such
sheaves F, F′ are considered to be equivalent if
F′ ≅ F ⊗ p_S^* L for a line bundle L on S (cf. Altman
and Kleiman (1980); note that the Altman-Kleiman
compactification of the relative Jacobian applies to
our situation since we consider elliptic fibrations
with integral fibers). Moreover, the natural
morphism X → X̂, x ↦ I_x ⊗ O_{X_t}(σ(t)) is an isomorphism
(of B-schemes); here I_x is the ideal sheaf of the
point x in X_t.

Note also that if π: Y → X_t is the normalization
of one of our fibers X_t and z is the exceptional
divisor (the pre-image of the singular point x), then
π_*(O_Y(−z)) is the maximal ideal of x.

The variety X̂ is a fine moduli space. This means
that there exists a coherent sheaf P on X ×_B X̂, flat
over X̂, whose restrictions to the fibers of π̂ are
torsion free, and of rank 1 and degree 0. The sheaf
P is defined up to tensor product by the pullback
of a line bundle on X̂, and is called the universal
Poincaré sheaf, which we will normalize by letting
P|_{σ ×_B X̂} ≅ O_X̂. We shall henceforth assume that P is
normalized in this way, so that


    \mathcal{P} = \mathcal{I}_{\Delta} \otimes \pi^{*}\mathcal{O}_{X}(\sigma) \otimes \hat{\pi}^{*}\mathcal{O}_{\hat{X}}(\sigma) \otimes q^{*}\omega^{-1}


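where π, π̂ and q = p ∘ π = p̂ ∘ π̂ refer to the diagram

    \begin{array}{ccc}
    X \times_B \hat{X} & \xrightarrow{\;\hat{\pi}\;} & \hat{X} \\
    {\scriptstyle \pi}\downarrow & & \downarrow{\scriptstyle \hat{p}} \\
    X & \xrightarrow{\;p\;} & B
    \end{array}

and I_Δ is the ideal sheaf of the diagonal immersion
X ↪ X ×_B X.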

Starting with the diagram and with the kernel
given by the normalized relative universal Poincaré
sheaf P on the fibered product X ×_B X̂, we define
the relative Fourier-Mukai transform as

    \Phi = \Phi^{\mathcal{P}} \colon D(X) \to D(\hat{X}), \qquad F \mapsto \Phi(F) = R\hat{\pi}_{*}(\pi^{*}F \otimes \mathcal{P})

Note that Φ(F) can be generalized if we allow
changes in the base space B, that is, we consider
base-change morphisms g: S → B.

We close this section with some remarks: 


• An important feature of Fourier-Mukai functors
is that they are exact as functors of triangulated
categories. In more familiar terms, we can say
that for any exact sequence $0 \to N \to F \to G \to 0$
of coherent sheaves on X, we obtain a long exact
sequence

$$\cdots \to \Phi^{i-1}(G) \to \Phi^{i}(N) \to \Phi^{i}(F) \to \Phi^{i}(G) \to \Phi^{i+1}(N) \to \cdots$$

where we have written $\Phi = \Phi^{E}$, and $\Phi^{i}(F) = \mathcal{H}^{i}(\Phi(F))$
denotes the ith cohomology sheaf of
the complex $\Phi(F)$.

Given a Fourier-Mukai functor $\Phi^{E}$, a complex
F in D(X) satisfies the WIT$_i$ condition (or is WIT$_i$)
if there is a coherent sheaf G on $\hat{X}$ such that
$\Phi^{E}(F) \simeq G[i]$ in $D(\hat{X})$, where $G[i]$ is the associated
complex concentrated in degree i. Furthermore,
we say that F satisfies the IT$_i$ condition if, in
addition, G is locally free.

When the kernel E is simply a sheaf Q on $X \times \hat{X}$
flat over $\hat{X}$, the cohomology and base-change
theorem (cf. Hartshorne (1977)) allows one to
show that a coherent sheaf F on X is IT$_i$ if and
only if $H^{j}(X, F \otimes Q_\xi) = 0$ for all $\xi \in \hat{X}$ and for all
$j \neq i$, where $Q_\xi$ denotes the restriction of Q to
$X \times \{\xi\}$, and F is WIT$_0$ if and only if it is IT$_0$.

The acronym “IT” stands for “index theorem,” 
while “W” stands for “weak.” This terminology 
comes from Nahm transforms for connections on 
tori in complex differential geometry. 

• The Parseval formula for the relative Fourier-
Mukai transform has been proved by Mukai in
his original Fourier-Mukai transform for abelian 
varieties and can be extended to any situation 
in which a Fourier-Mukai transform is fully 
faithful. 

• For physical applications, it is often convenient to
work in cohomology $H^*(X, \mathbb{Q})$. The passage from
D(X) to $H^*(X, \mathbb{Q})$ can be described as follows. We
first send a complex $Z \in D(X)$ to its natural class
in the K-group; we then make use of the fact that
the Chern character ch maps $K(X) \to CH^*(X) \otimes \mathbb{Q}$,
and finally we apply the cycle map to $H^*(X, \mathbb{Q})$.
This passage (by abuse of notation) is often denoted
by $\mathrm{ch}\colon D(X) \to H^*(X, \mathbb{Q})$; it commutes with
pullbacks and transforms tensor products into
dot products. Furthermore, if we substitute the
Mukai vector $v(Z) = \mathrm{ch}(Z)\sqrt{\mathrm{Td}(X)}$ for the Chern
character $\mathrm{ch}(Z)$, we find a commutative
diagram

$$\begin{array}{ccc}
D(X) & \xrightarrow{\ \Phi^{E}\ } & D(Y) \\
\downarrow v & & \downarrow v \\
H^*(X, \mathbb{Q}) & \longrightarrow & H^*(Y, \mathbb{Q})
\end{array}$$

This can be shown using the Grothendieck-
Riemann-Roch theorem and the fact that the
power series defining the Todd class starts with
constant term 1 and thus is invertible.
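
As a worked illustration of the Mukai vector, here is a minimal sketch under the assumption (not made in the general statement above) that X is a Calabi-Yau 3-fold, so that $c_1(X) = 0$ and all classes of complex degree above 3 vanish. The square root is then obtained by truncating the power series:

```latex
\mathrm{Td}(X) = 1 + \tfrac{1}{12}\,c_2(X),
\qquad
\sqrt{\mathrm{Td}(X)} = 1 + \tfrac{1}{24}\,c_2(X),
\qquad
v(Z) = \mathrm{ch}(Z)\Bigl(1 + \tfrac{1}{24}\,c_2(X)\Bigr).
```

The quadratic correction to the square root is proportional to $c_2(X)^2$, which has complex degree 4 and therefore vanishes on a 3-fold.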


Vector Bundles for Heterotic Strings 


A compactification of the ten-dimensional heterotic 
string is given by a holomorphic, stable G-bundle V 
(with G some Lie group specified below) over a 
Calabi-Yau manifold X. The Calabi-Yau condition, 
the holomorphy and stability of V are a direct 
consequence of the required supersymmetry in the 
uncompactified spacetime. We assume that the
underlying ten-dimensional space $M_{10}$ is decomposed
as $M_{10} = M_4 \times X$, where $M_4$ (the uncompactified
spacetime) denotes the four-dimensional
Minkowski space and X a six-dimensional compact
space given by a Calabi-Yau 3-fold. To be more
precise: supersymmetry requires that the curvature F
of the connection A on V satisfies

$$F^{2,0} = F^{0,2} = 0, \qquad F^{1,1} \wedge J^{2} = 0$$

where J denotes a Kähler form of X. It follows that
the connection has to be a holomorphic connection 
on a holomorphic vector bundle and, in addition, 
satisfies the Donaldson—Uhlenbeck—Yau equation, 
which has a unique solution if and only if the vector 
bundle is polystable. 

In addition to X and V, we have to specify a
B-field on X of field strength H. In order to get an
anomaly-free theory, the Lie group G is fixed to be
either $E_8 \times E_8$ or $\mathrm{Spin}(32)/\mathbb{Z}_2$ or one of their
subgroups, and H must satisfy the identity

$$dH = \operatorname{tr} R \wedge R - \operatorname{Tr} F \wedge F$$

where R and F are, respectively, the associated
curvature forms of the spin connection on X and the
gauge connection on V. Also tr refers to the trace of
the composite endomorphism of the tangent bundle
to X and Tr denotes the trace in the adjoint
representation of G. For any closed four-dimensional
submanifold $X_4$ of the ten-dimensional spacetime
$M_{10}$, the 4-form $\operatorname{tr} R \wedge R - \operatorname{Tr} F \wedge F$ must have
trivial cohomology. Thus, a necessary topological
condition V has to satisfy is $\mathrm{ch}_2(TX) = \mathrm{ch}_2(V)$,
which simplifies to $c_2(TX) = c_2(V)$ for Calabi-Yau
manifolds, V being an SU(n) vector bundle.


A physical interpretation of the third Chern class 
can be given as a result of the decomposition of the 
ten-dimensional spacetime into a four-dimensional 
flat Minkowski space and X. The decomposition of 
the corresponding ten-dimensional Dirac operator 
with values in V shows that massless four-dimensional
fermions are in one-to-one correspondence
with zero modes of the Dirac operator $D_V$ on
X. The index of $D_V$ can be effectively computed
using the Hirzebruch-Riemann-Roch theorem and
is given by

$$\mathrm{index}(D_V) = \int_X \mathrm{Td}(X)\,\mathrm{ch}(V) = \frac{1}{2}\int_X c_3(V)$$

Equivalently, we can write the index as
$\mathrm{index}(D_V) = \sum_{i=0}^{3} (-1)^i \dim H^i(X, V)$. For stable vector bundles,
we have $H^0(X, V) = H^3(X, V) = 0$ and so the index
computes the net number of fermion generations $N_{\mathrm{gen}}$
in the respective model.
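
The step from the Hirzebruch-Riemann-Roch integrand to the third Chern class can be spelled out as follows. This is a sketch assuming that X is a Calabi-Yau 3-fold ($c_1(X) = 0$) and that V is an SU(n) bundle of rank n ($c_1(V) = 0$), as in the surrounding discussion:

```latex
\begin{aligned}
\mathrm{Td}(X) &= 1 + \tfrac{1}{12}\,c_2(X), \\
\mathrm{ch}(V) &= n - c_2(V) + \tfrac{1}{2}\,c_3(V), \\
\int_X \mathrm{Td}(X)\,\mathrm{ch}(V)
  &= \int_X \Bigl[\mathrm{ch}_3(V) + \tfrac{1}{12}\,c_2(X)\,\mathrm{ch}_1(V)\Bigr]
   = \tfrac{1}{2}\int_X c_3(V).
\end{aligned}
```

Only the top-degree part of the product contributes to the integral, and the mixed term drops out because $\mathrm{ch}_1(V) = c_1(V) = 0$.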

Now it has been observed that the inclusion of 
background 5-branes changes the anomaly con- 
straint. Various 5-brane solutions of the heterotic 
string equations of motion have been discussed in
the literature: the gauge 5-brane, the symmetric 5-brane, and the
neutral 5-brane. It has been shown that the gauge
and symmetric 5-brane solutions involve finite-size 
instantons of an unbroken nonabelian gauge group. 
In contrast, the neutral 5-branes can be interpreted 
as zero-size instantons of the SO(32) heterotic 
string. The magnetic 5-brane contributes a source 
term to the Bianchi identity for the 3-form H, 


$$dH = \operatorname{tr} R \wedge R - \operatorname{Tr} F \wedge F + n_5 \sum_{\text{five-branes}} \delta_5^{(4)}$$

and integration over a 4-cycle in X gives the anomaly
constraint

$$c_2(TX) = c_2(V) + [W]$$

The new term $\delta_5^{(4)}$ is a current that integrates to 1 in
the direction transverse to a single 5-brane whose
class is denoted by [W]. The class [W] is the
Poincaré dual of an integer sum of all these sources
and thus [W] should be an integral class, representing
a class in $H_2(X, \mathbb{Z})$. [W] can be further specified
by taking into account that supersymmetry requires
that 5-branes are wrapped on holomorphic curves
and thus [W] must correspond to the homology class 
of holomorphic curves. This fact constrains [W] to 
be an algebraic class. Further, algebraic classes 
include negative classes; however, these lead to 
negative magnetic charges, which are unphysical, 
and so they have to be excluded. This constrains [W] 
to be an effective class. Thus, for a given Calabi- 
Yau 3-fold X the effectivity of [W] constrains the 
choice of vector bundles V.

The study of the correspondence between the
heterotic string (on an elliptic Calabi-Yau 3-fold) 
and F-theory (on an elliptic Calabi-Yau fourfold) 
has led Friedman, Morgan, and Witten to introduce 
a new class of vector bundles which satisfy the 
anomaly constraint with [W] nonzero. As a result, 
they prove that the number obtained by integration 
of [W] over the elliptic fibers of the Calabi-Yau 
3-fold agrees with the number of 3-branes given by 
the Euler characteristic of the Calabi-Yau fourfold 
divided by 24. 


Fourier—Mukai Transforms and Spectral Covers 


Let us now describe how the construction of vector
bundles out of spectral data (first considered by
Hitchin and by Beauville, Narasimhan, and Ramanan)
can be easily described in the case of elliptic
fibrations by means of the relative Fourier-Mukai
transform. This construction was widely exploited
by Friedman, Morgan, and Witten to construct
stable vector bundles on elliptic Calabi-Yau 3-folds
X; we summarize it now.

If $V \to X$ is a vector bundle of rank n which is
semistable and of degree 0 on each fiber f of $X \to B$,
then its Fourier-Mukai transform $\Phi^{1}(V)$ is a torsion
sheaf of pure dimension 2 on $\hat{X}$. The support of
$\Phi^{1}(V)$ is a surface $i\colon C \hookrightarrow \hat{X}$, which is finite of degree
n over B. Moreover, $\Phi^{1}(V)$ is of rank 1 on C and, if
C is smooth, then $\Phi^{1}(V) = i_*L$ is just the extension
by zero of some line bundle $L \in \mathrm{Pic}(C)$. Conversely,
given a sheaf $G \to \hat{X}$ of pure dimension 2 which is
flat over B, then $\Phi(G)$ is a vector bundle on X of
rank equal to the degree of supp(G) over B.

This correspondence between vector bundles on X 
and sheaves on X supported on finite covers of B is 
known as the spectral cover construction. The 
torsion sheaf G is called the spectral sheaf (or line 
bundle) and the surface C=supp(G) is called the 
spectral cover. 

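For the description of vector bundles on elliptic
Calabi-Yau 3-folds X it is appropriate to take $i_*L$
with Chern characters given by (with $\eta_E, \eta \in H^2(B, \mathbb{Q})$
and $a_E, s_E \in \mathbb{Z}$)

$$\mathrm{ch}_0(i_*L) = 0, \qquad \mathrm{ch}_1(i_*L) = n\sigma + \pi^*\eta$$

$$\mathrm{ch}_2(i_*L) = \sigma\,\pi^*\eta_E + a_E F, \qquad \mathrm{ch}_3(i_*L) = s_E$$

The characteristic classes of the rank-n vector bundle
V can be obtained if we apply the Grothendieck-
Riemann-Roch theorem to the projection $\pi$:

$$\mathrm{ch}(V) = \pi_*\bigl[\hat{\pi}^*(\mathrm{ch}(i_*L))\,\mathrm{ch}(\mathcal{P})\,\mathrm{Td}(T_{X/B})\bigr]$$
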
where $\mathrm{Td}(T_{X/B})$ is as given above.

To make sure that the construction leads to SU(n)
vector bundles we set $\eta_E = \tfrac{1}{2} n\,c_1(B)$, giving $c_1(V) = 0$,
and the remaining Chern classes are given by

$$c_2(V) = \pi^*(\eta)\,\sigma + \pi^*(\omega), \qquad c_3(V) = -2\gamma|_S$$

where

$$\omega = -\tfrac{1}{24}\,c_1(B)^2 (n^3 - n) + \tfrac{1}{2}\bigl(\lambda^2 - \tfrac{1}{4}\bigr)\, n\,\eta\,(\eta - n\,c_1(B))$$

and $\gamma \in H^{1,1}(C, \mathbb{Z})$ is some cohomology class
satisfying $\pi_{C*}\gamma = 0 \in H^{1,1}(B, \mathbb{Z})$. The general solution
for $\gamma$ has been derived by Friedman, Morgan,
and Witten and is given by
$\gamma = \lambda\,(n\sigma|_C - \pi_C^*\eta + n\,\pi_C^*c_1(B))$ and
$\gamma|_S = -\lambda\,\pi^*\eta\,(\pi^*\eta - n\,\pi^*c_1(B))\,\sigma$ with
$S = C \cap \sigma$. The parameter $\lambda$ has to be determined
such that $c_1(L)$ is an integer class. If n is even,
$\lambda = m$ ($m \in \mathbb{Z}$) and in addition we must impose
$\eta \equiv c_1(B)$ modulo 2. If n is odd, $\lambda = m + 1/2$.

It remains to discuss the stability of V. The 
stability depends on the properties of the defining 
data C and L. If C is irreducible and L a line bundle 
over C then V will be a vector bundle stable with 
respect to the polarization 


$$J = \epsilon J_0 + \pi^*H_B, \qquad \epsilon > 0$$

if $\epsilon$ is sufficiently small. This has been proved by
Friedman, Morgan, and Witten under the additional
assumption that the restriction of V to the generic
fiber is regular and semistable. Here $J_0$ refers to
some arbitrary Kähler class on X and $H_B$ to a Kähler
class on the base B. It implies that the bundle V can
be taken to be stable with respect to J while keeping
the volume of the fiber f of X arbitrarily small
compared to the volumes of effective curves associated
with the base. That J is actually a good
polarization can be seen by assuming $\epsilon = 0$. Now we
observe that $\pi^*H_B$ is not a Kähler class on X: its
integral is non-negative on each effective curve C
in X, but there is one curve, the fiber f, where
the integral vanishes. This means that $\pi^*H_B$ is on the
boundary of the Kähler cone and, to make V stable,
we have to move slightly into the interior of the
Kähler cone, that is, into the chamber which is
closest to the boundary point $\pi^*H_B$. Also we note
that although $\pi^*H_B$ is in the boundary of the Kähler
cone, we can still define the slope $\mu_{\pi^*H_B}(V)$ with
respect to it. Since $(\pi^*H_B)^2$ is some positive multiple
of the class of the fiber f, semistability with respect
to $\pi^*H_B$ is implied by the semistability of the
restrictions $V|_f$ to the fibers. Assume that V is not
stable with respect to J; then there is a destabilizing
sub-bundle $V' \subset V$ with $\mu_J(V') \geq \mu_J(V)$. But
semistability along the fibers says that
$\mu_{\pi^*H_B}(V') \leq \mu_{\pi^*H_B}(V)$. If we had equality, it would follow that
$V'$ arises by the spectral construction from a proper
subvariety of the spectral cover of V, contradicting
the assumption that this cover is irreducible. So we
must have a strict inequality $\mu_{\pi^*H_B}(V') < \mu_{\pi^*H_B}(V)$.
Now taking $\epsilon$ small enough, we can ensure that
$\mu_J(V') < \mu_J(V)$; thus $V'$ cannot destabilize V.


D-Branes and Homological Mirror 
Symmetry 


Kontsevich proposed a homological mirror symme- 
try for a pair (X,Y) of mirror dual Calabi-Yau 
manifolds; it is conjectured that there exists a 
categorical equivalence between the bounded 
derived category D(X) and Fukaya's $A_\infty$ category
F(Y), which is defined by using the symplectic
structure on Y. A Lagrangian submanifold with a
flat bundle gives an object of F(Y). If we consider a
locally trivial family of symplectic manifolds Y (i.e.,
the symplectic form is locally constant as we vary Y 
in the family) the object of F(Y) undergoes mono- 
dromy transformations going round a loop in the 
base. On the other hand, the object of D(X) is a 
complex of coherent sheaves on X and under the 
categorical equivalence between D(X) and F(Y) the 
monodromy (of 3-cycles) is mapped to certain self- 
equivalences in D(X). 

Since all elements in D(X) can be represented by
suitable complexes of vector bundles on X, we can
consider the topological K-group and the image
$K_{\mathrm{hol}}(X)$ of D(X). The Fourier-Mukai transform
$\Phi^{E}\colon D(X) \to D(X)$ then induces a corresponding
automorphism $K_{\mathrm{hol}}(X) \to K_{\mathrm{hol}}(X)$ and also an
automorphism on $H^{\mathrm{even}}(X, \mathbb{Q})$ if we use the Chern
character ring homomorphism
$\mathrm{ch}\colon K(X) \to H^{\mathrm{even}}(X, \mathbb{Q})$, as described above. With this in mind, we
can introduce various kernels and their associated
monodromy transformations.

For instance, let D be the associated divisor defining
the large-radius limit in the Kähler moduli space and
consider the kernel $\mathcal{O}_\Delta(D)$, with $\Delta$ being the diagonal
in $X \times X$. The corresponding Fourier-Mukai
transform acts on an object $G \in D(X)$ as twisting by a line
bundle, that is, $G \mapsto G \otimes \mathcal{O}(D)$. This automorphism is
then identified with the monodromy about the large
complex structure limit point (LCSL point) in the
complex structure moduli space.

Furthermore, if we consider the kernel given by
the ideal sheaf $\mathcal{I}_\Delta$ of $\Delta$, we find that the action of
$\Phi^{\mathcal{I}_\Delta}$ on $H^{\mathrm{even}}(X)$ can be expressed by taking the
Chern character ring homomorphism:

$$\mathrm{ch}(\Phi^{\mathcal{I}_\Delta}(G)) = \mathrm{ch}_0(\Phi^{\mathcal{O}_{X \times X}}(G)) - \mathrm{ch}(G) = \Bigl(\int_X \mathrm{ch}(G)\,\mathrm{Td}(X)\Bigr) - \mathrm{ch}(G)$$

Kontsevich proposed that this automorphism
should reproduce the monodromy about the principal
component of the discriminant of the mirror
family Y. At the principal component we have
vanishing $S^3$ cycles (and the conifold singularity),
thus the action of this monodromy on cohomology
may be identified with the Picard-Lefschetz formula.

Now for a given pair of mirror dual Calabi-Yau 
3-folds, it is generally assumed that A-type and 
B-type D-branes exchange under mirror symmetry. 
For such a pair, Kontsevich’s correspondence 
between automorphisms of D(X) and monodromies 
of 3-cycles can then be tested. More specifically, a 
comparison relies on the identification of two 
central charges associated to D-brane configurations 
on both sides of the mirror pair. 

For this, we first have to specify a basis for the
3-cycles $\Sigma_i \in H_3(Y, \mathbb{Z})$ such that the intersection form
takes the canonical form $\Sigma_i \cdot \Sigma_j = \delta_{j,\,i+b_{2,1}+1} = \eta_{i,j}$ for
$i = 0, \ldots, b_{2,1}$. It follows that a 3-brane wrapped about
the cycle $\Sigma = \sum_i n_i [\Sigma_i]$ has an (electric, magnetic)
charge vector $n = (n_i)$. The periods of the holomorphic
3-form $\Omega$ are then given by

$$\Pi_i = \int_{\Sigma_i} \Omega$$

and can be used to provide projective coordinates on
the complex structure moduli space. If we choose a
symplectic basis $(A_i, B_j)$ of $H_3(Y, \mathbb{Z})$ then the $A_i$
periods serve as projective coordinates and the $B_j$
periods satisfy the relations $\Pi^j = \eta_{i,j}\,\partial F/\partial \Pi^i$, where
F is the prepotential which has, near the large-
radius limit, the asymptotic form (as analyzed by
Candelas, Klemm, Theisen, Yau, and Hosono, cf.
“Further reading”):

$$F = \frac{1}{6}\sum_{abc} k_{abc}\,t_a t_b t_c + \frac{1}{2}\sum_{ab} c_{ab}\,t_a t_b - \sum_a \frac{c_2(X) J_a}{24}\,t_a + \frac{\zeta(3)}{2(2\pi i)^3}\,\chi(X) + \text{const.}$$

where $\chi(X)$ is the Euler characteristic of X, $c_{ab}$ are
rational constants (with $c_{ab} = c_{ba}$) reflecting an
$Sp(2h^{1,1} + 2)$ ambiguity, and $k_{abc}$ is the classical
triple intersection number given by

$$k_{abc} = \int_X J_a \wedge J_b \wedge J_c$$

The periods determine the central charge Z(n) of a
3-brane wrapped about the cycle $\Sigma = \sum_i n_i [\Sigma_i]$:


$$Z(n) = \int_\Sigma \Omega = \sum_i n_i \Pi_i$$

On the other hand, the central charge associated
with an object E of D(X) is given by

$$Z(E) = -\int_X e^{-t_a J_a}\,\mathrm{ch}(E)\Bigl(1 + \frac{c_2(X)}{24}\Bigr)$$

Now, physically it is assumed that the two central
charges are to be identified under mirror symmetry.
If we compare the two central charges Z(n) and
Z(E), then we obtain a map relating the Chern
characters ch(E) of E to the D-brane charges n. If we
insert the expressions for ch(E) in $\mathrm{ch}(\Phi^{\mathcal{I}_\Delta}(E))$, it
yields a linear transformation acting on n, such that
$n_6 \to n_6 + n_0$, which agrees with the monodromy
transformation about the conifold locus.

Similarly, the monodromy transformation about 
the LCSL point corresponding to automorphisms 
$[E] \to [E \otimes \mathcal{O}_X(D)]$ can be made explicit.

Using the central charge identification, the auto- 
morphism/monodromy correspondence has been 
made explicit for various dual pairs of mirror 
Calabi-Yau 3-folds (given as hypersurfaces in 
weighted projective spaces). This identification pro- 
vides evidence for Kontsevich’s proposal of homo- 
logical mirror symmetry. 


See also: Derived Categories; Mirror Symmetry: A
Geometric Survey.

Introduction


Manifolds of dimension 4 play a distinguished role 
in physics and have done so ever since special and 
general relativity ushered in the celebrated four- 
dimensional spacetime. It is also the case that 
manifolds of dimension 4 play a distinguished role 
in mathematics: many generalities about manifolds 
of a general dimension do not apply in dimension 4; 
there are also phenomena in dimension 4 with no 
counterpart in other dimensions. 

This article describes some of the more important 
physical and mathematical properties of dimension 4. 
We begin with an account of some topological and 
geometric properties for manifolds in general, but 
avoiding dimension 4, and then embark on the 
dimension 4 discussion. The references at the end 
will serve to take the reader further into the subject. 


Topological, Piecewise-Linear, and 
Differentiable Structures for Manifolds 


In dealing with topological spaces which are mani- 
folds, one distinguishes three types of manifolds M: 
topological, piecewise-linear, and differentiable (also 
called smooth). It is possible to describe the more 
important differences between these three types 
using topological techniques. 

Consider then a manifold M of dimension n; M will
always be assumed to be compact, connected and
closed unless we indicate the contrary. The type of M is
determined by examining whether the transition
functions $g_{\alpha\beta}$ are homeomorphisms, (invertible)
piecewise-linear maps, or diffeomorphisms. Now, since the
transition functions are maps from one subset of $\mathbb{R}^n$ to
another, we introduce the groups $\mathrm{TOP}_n$, $\mathrm{PL}_n$, and
$\mathrm{DIFF}_n$, which are all the homeomorphisms,
piecewise-linear maps, and diffeomorphisms of $\mathbb{R}^n$, respectively.
We are naturally led to the three sets of inclusions:

$$\begin{aligned}
\cdots \subset \mathrm{TOP}_n &\subset \mathrm{TOP}_{n+1} \subset \cdots \\
\cdots \subset \mathrm{PL}_n &\subset \mathrm{PL}_{n+1} \subset \cdots \\
\cdots \subset \mathrm{DIFF}_n &\subset \mathrm{DIFF}_{n+1} \subset \cdots
\end{aligned} \qquad [1]$$


For each of the three sets of inclusions we pass to 
the direct limit and construct the three limiting 
groups 


TOP, PL, DIFF [2] 


With these three groups are associated the classifying
spaces BTOP, BPL and BDIFF. The transition
functions $g_{\alpha\beta}$ are those of the tangent bundle to M,
and there are three possible tangent bundles depending
on the type of M; we denote these tangent bundles
by $TM_{\mathrm{TOP}}$, $TM_{\mathrm{PL}}$, and $TM_{\mathrm{DIFF}}$ in an obvious notation.
Then to determine the tangent bundles $TM_{\mathrm{TOP}}$,
$TM_{\mathrm{PL}}$, and $TM_{\mathrm{DIFF}}$ one simply selects an element of the
homotopy classes

$$[M, B\mathrm{TOP}], \quad [M, B\mathrm{PL}], \quad \text{and} \quad [M, B\mathrm{DIFF}] \qquad [3]$$


respectively. 

Given this threefold hierarchy of manifold structures
one wishes to know when one can straighten
out a topological manifold to make it piecewise
linear, and also when one can smooth a
piecewise-linear manifold to make it differentiable.

If dim M ≥ 5, these two questions can be
formulated as lifting problems.


TOP versus PL for dim M ≠ 4


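Taking the first of them, so that we are comparing
piecewise-linear and topological structures on M,
one can check that BPL fibers over BTOP with fiber
TOP/PL, yielding

$$\begin{array}{ccc}
\mathrm{TOP}/\mathrm{PL} & \longrightarrow & B\mathrm{PL} \\
 & & \downarrow \pi \\
 & & B\mathrm{TOP}
\end{array} \qquad [4]$$

A method for straightening out a topological manifold is now
apparent: a topological manifold M determines a
map $\alpha\colon M \to B\mathrm{TOP}$, and a factorization of $\alpha$ through
BPL will give M a PL structure. We show this below:

$$\begin{array}{ccc}
 & & B\mathrm{PL} \\
 & \nearrow{\scriptstyle \beta} & \downarrow \pi \\
M & \xrightarrow{\ \alpha\ } & B\mathrm{TOP}
\end{array} \qquad \alpha = \pi \circ \beta \qquad [5]$$

The existence of the map $\beta\colon M \to B\mathrm{PL}$ satisfying
$\alpha = \pi \circ \beta$ provides M with a PL structure and is a
lifting of the map $\alpha$ from the base BTOP to the total
space BPL.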

This lifting method, for passing from TOP
structures to PL structures, does work, provided
dim M ≥ 5, since we have the stability result that

$$\frac{\mathrm{TOP}_n}{\mathrm{PL}_n} \simeq \frac{\mathrm{TOP}}{\mathrm{PL}}, \qquad n \geq 5 \qquad [6]$$

For the map $\beta$ to exist, the obstructions to the lifting,
which are cohomology classes of the form

$$H^{k+1}(M; \pi_k(\mathrm{TOP}/\mathrm{PL})) \qquad [7]$$



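must vanish. However, Kirby and Siebenmann have
shown that

$$\mathrm{TOP}/\mathrm{PL} \simeq K(\mathbb{Z}_2, 3) \qquad [8]$$

where $K(\mathbb{Z}_2, 3)$ is an Eilenberg-Mac Lane space, so that
its sole nonvanishing homotopy group is in dimension
3, giving us

$$\pi_n(\mathrm{TOP}/\mathrm{PL}) = \begin{cases} \mathbb{Z}_2 & \text{if } n = 3 \\ 0 & \text{otherwise} \end{cases} \qquad [9]$$

Any obstruction to the existence of $\beta$ is a class e(M),
say, in

$$H^4(M; \mathbb{Z}_2), \qquad \dim M \geq 5 \qquad [10]$$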


When e(M) vanishes, the map $\beta$ exists and furnishes
M with a PL structure; if e(M) = 0 it is natural to go
on to ask how many (homotopy classes of) such maps $\beta$
exist. Standard obstruction theory says the relevant
homotopy classes are just the whole cohomology
group

$$H^k(M; \pi_k(\mathrm{TOP}/\mathrm{PL})) \qquad [11]$$

which, since k = 3, is just

$$H^3(M; \mathbb{Z}_2) \qquad [12]$$

So, for dim M ≥ 5, we see that when a closed
topological manifold M acquires a PL structure by
the lifting process just described, then the possible
distinct PL structures are in one-to-one correspondence
with the elements of

$$H^3(M; \mathbb{Z}_2) \qquad [13]$$

which is not zero in general.
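
To get a feeling for the size of this group, the following minimal sketch counts the elements of $H^3(M; \mathbb{Z}_2)$ for the n-torus $M = T^n$ with n ≥ 5. The torus is our own choice of example, not one made in the text; it is smooth, so the obstruction e(M) vanishes, and its mod 2 cohomology in degree 3 has rank given by a binomial coefficient. The function name is hypothetical.

```python
from math import comb


def count_pl_structure_classes_on_torus(n: int) -> int:
    """Number of elements of H^3(T^n; Z_2) for the n-torus T^n.

    Assumption (illustrative example, not from the text): M = T^n with n >= 5.
    H^3(T^n; Z_2) is a Z_2-vector space of rank C(n, 3), so it has
    2**C(n, 3) elements; by the statement above these label the
    distinct PL structures obtained by lifting.
    """
    if n < 5:
        raise ValueError("the lifting argument in the text requires dim M >= 5")
    rank = comb(n, 3)  # rank of H^3(T^n; Z_2)
    return 2 ** rank


if __name__ == "__main__":
    for n in (5, 6, 7):
        print(n, count_pl_structure_classes_on_torus(n))
```

For spheres $S^n$ with n ≥ 5 the same group is trivial, so the count is 1 in that case.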

Finally, if dim M ≤ 3, then the notions PL and
TOP coincide, so we are left with the case dim M = 4
which we shall come to below. Now we wish to
describe the next step in the sequence TOP, PL,
DIFF, which is the smoothing problem.


PL versus DIFF for dim M ≠ 4


Similar ideas are used to address the question of
smoothing a piecewise-linear manifold; however,
the results are different. Let us assume that M is a
closed PL manifold with dim M ≥ 5. This time the
fibration is

$$\begin{array}{ccc}
\mathrm{PL}/\mathrm{DIFF} & \longrightarrow & B\mathrm{DIFF} \\
 & & \downarrow \\
 & & B\mathrm{PL}
\end{array} \qquad [14]$$

The smoothing of a piecewise-linear M can also be
handled with obstruction theory and leads us immediately
to the consideration of the homotopy groups
$\pi_n(\mathrm{PL}/\mathrm{DIFF})$. This time the nontrivial homotopy
groups of the fiber are much more numerous than in
the piecewise-linear case. In fact one has

$$\pi_n(\mathrm{PL}/\mathrm{DIFF}) = \begin{cases} 0 & \text{if } n \leq 6 \\ \mathbb{Z}_{28} & \text{if } n = 7 \\ \ \vdots & \\ \mathbb{Z}_{992} & \text{if } n = 11 \end{cases} \qquad [15]$$

The obstructions to passing from a PL to a DIFF
structure on M now lie in

$$H^{n+1}(M; \pi_n(\mathrm{PL}/\mathrm{DIFF})) \qquad [16]$$

and the number of distinct liftings is given by the
cohomology group


$$H^{n}(M; \pi_n(\mathrm{PL}/\mathrm{DIFF})) \qquad [17]$$

As an illustration of all this, consider the case
$M = S^7$; then the first nontriviality occurs when n = 7
and so the obstruction to smoothing $S^7$ lies in

$$H^8(S^7; \pi_7(\mathrm{PL}/\mathrm{DIFF})) \qquad [18]$$

which is of course zero; this means that $S^7$ can be
smoothed, a fact which we know from first
principles. However, by the obstruction theory
introduced above, the resulting smooth structures
are in one-to-one correspondence with

$$H^7(S^7; \pi_7(\mathrm{PL}/\mathrm{DIFF})) = H^7(S^7; \mathbb{Z}_{28}) = \mathbb{Z}_{28} \qquad [19]$$

Hence, we have the celebrated result of Milnor, and of
Kervaire and Milnor, that $S^7$ has 28 distinct
differentiable structures, 27 of which correspond to
what are known as exotic spheres.

Lastly, if dim M ≤ 3, then PL and DIFF coincide;
this leaves us with the case of greatest interest,
namely dim M = 4.


The Strange Case of Four Dimensions 


In four dimensions there are phenomena which have 
no counterpart in any other dimension. First of all, 
there are topological 4-manifolds which have no 
smooth structure, though if they have a PL structure, 
then they possess a unique smooth structure. Second, 
the impediment to the existence of a smooth structure 
is of a completely different type to that met in the 
standard obstruction theory — it is not the pullback of 
an element in the cohomology of a classifying space, 
that is, it is not a characteristic class. Also the four- 
dimensional story is far from completely known. 
Nevertheless, there are some very striking results 
dating from the early 1980s onwards. 

We begin by disposing of the difference between 
PL and DIFF structures: our earlier results together 
with the vanishing statement 


$$\pi_n(\mathrm{PL}/\mathrm{DIFF}) = 0, \qquad n \leq 6 \qquad [20]$$


mean that every PL 4-manifold possesses a unique 
DIFF structure. Thus, we can take the crucial 
difference to be between DIFF and TOP. 

In Freedman (1982), all simply connected topological
4-manifolds were classified by their intersection
form q.

We recall that q is a quadratic form constructed
from the cohomology of M as follows: take two
elements $\alpha$ and $\beta$ of $H^2(M; \mathbb{Z})$ and form their cup
product $\alpha \cup \beta \in H^4(M; \mathbb{Z})$; then we define $q(\alpha, \beta)$ by

$$q(\alpha, \beta) = (\alpha \cup \beta)[M] \qquad [21]$$

where $(\alpha \cup \beta)[M]$ denotes the integer obtained by
evaluating $\alpha \cup \beta$ on the generating cycle [M] of the
top homology group $H_4(M; \mathbb{Z})$ of M. Poincaré
duality ensures that such a form is always nondegenerate
over $\mathbb{Z}$ and so has $\det q = \pm 1$; q is then
called unimodular. Also we refer to q as “even” if
all its diagonal entries are even, and as “odd”
otherwise.

Freedman’s work yields the following: 


Theorem (Freedman). A simply connected 
4-manifold M with even intersection form q belongs 
to a unique homeomorphism class, while if q is 
odd there are precisely two nonhomeomorphic 
manifolds M with q as their intersection form. 


This is a very powerful result — the intersection 
form q very nearly determines the homeomorphism 
class of a simply connected M, and actually only 
fails to do so in the odd case where there are still 
just two possibilities. Further, every unimodular 
quadratic form occurs as the intersection form of 
some manifold. 

As an illustration of the impressive nature of
Freedman's work, choose M to be the sphere $S^4$;
since $H^2(S^4; \mathbb{Z})$ is trivial, q is the zero quadratic
form and is of course even; we write this as $q = \emptyset$.
Now recollect that the Poincaré conjecture in four
dimensions is the statement that any homotopy
4-sphere, $S^4_h$ say, is actually homeomorphic to $S^4$.
Well, since $H^2(S^4_h; \mathbb{Z})$ is also trivial, any $S^4_h$ also
has intersection form $q = \emptyset$. Applying Freedman's
theorem to $S^4_h$ immediately asserts that $S^4_h$ belongs to
a unique homeomorphism class, which must be that
of $S^4$, thereby establishing the Poincaré conjecture.

Freedman's result combined with a much earlier
result of Rohlin (1952) also gives us an example of a
nonsmoothable 4-manifold: Rohlin's theorem asserts
that, given a smooth, simply connected 4-manifold
with even intersection form q, the signature $\sigma(q)$ of q
(defined to be the difference between the number of
positive and negative eigenvalues of q) is divisible
by 16.


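Now write

$$q = \begin{pmatrix}
 2 & -1 &  0 &  0 &  0 &  0 &  0 &  0 \\
-1 &  2 & -1 &  0 &  0 &  0 &  0 &  0 \\
 0 & -1 &  2 & -1 &  0 &  0 &  0 &  0 \\
 0 &  0 & -1 &  2 & -1 &  0 &  0 &  0 \\
 0 &  0 &  0 & -1 &  2 & -1 &  0 & -1 \\
 0 &  0 &  0 &  0 & -1 &  2 & -1 &  0 \\
 0 &  0 &  0 &  0 &  0 & -1 &  2 &  0 \\
 0 &  0 &  0 &  0 & -1 &  0 &  0 &  2
\end{pmatrix} = E_8 \qquad [22]$$

($E_8$ is actually the Cartan matrix for the exceptional
Lie algebra $e_8$); then, by inspection, q is even, and
by calculation, it has signature 8. By Freedman's
theorem there is a single simply connected 4-manifold
with intersection form $q = E_8$. However, by
Rohlin's theorem, it cannot be smoothed since its
signature is 8, which is not divisible by 16.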
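
The properties of $E_8$ quoted above (even, unimodular, signature 8) can be checked directly. The following is a minimal sketch in exact rational arithmetic; the helper names are our own, and the pivot-based signature relies on the assumption that all leading principal minors are nonzero, which holds for this matrix.

```python
from fractions import Fraction

# The E8 intersection form as written in equation [22].
E8 = [
    [ 2, -1,  0,  0,  0,  0,  0,  0],
    [-1,  2, -1,  0,  0,  0,  0,  0],
    [ 0, -1,  2, -1,  0,  0,  0,  0],
    [ 0,  0, -1,  2, -1,  0,  0,  0],
    [ 0,  0,  0, -1,  2, -1,  0, -1],
    [ 0,  0,  0,  0, -1,  2, -1,  0],
    [ 0,  0,  0,  0,  0, -1,  2,  0],
    [ 0,  0,  0,  0, -1,  0,  0,  2],
]


def pivots(matrix):
    """Pivots of Gaussian elimination without row exchanges.

    Assumes every leading principal minor is nonzero. For a symmetric
    matrix the signs of these pivots give the numbers of positive and
    negative eigenvalues (Sylvester's law of inertia).
    """
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    for k in range(n):
        if a[k][k] == 0:
            raise ValueError("zero pivot: this simple sketch does not apply")
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return [a[k][k] for k in range(n)]


def is_even(matrix):
    """True if all diagonal entries are even."""
    return all(matrix[i][i] % 2 == 0 for i in range(len(matrix)))


def determinant(matrix):
    """Determinant as the product of the pivots."""
    det = Fraction(1)
    for d in pivots(matrix):
        det *= d
    return det


def signature(matrix):
    """Number of positive pivots minus number of negative pivots."""
    p = pivots(matrix)
    return sum(1 for d in p if d > 0) - sum(1 for d in p if d < 0)


if __name__ == "__main__":
    print("even:", is_even(E8))
    print("det:", determinant(E8))
    print("signature:", signature(E8))
    print("signature divisible by 16:", signature(E8) % 16 == 0)
```

The last check is the one that matters for Rohlin's theorem: the form is even but its signature is not a multiple of 16.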

The next breakthrough was due to Donaldson 
(1983). Donaldson's theorem is applicable to definite
forms q, which by appropriate choice of
orientation on M we can take to be positive definite.
One has: 


Theorem (Donaldson). A simply connected, smooth 
4-manifold, with positive-definite intersection form 
q is always diagonalizable over the integers to 
q=diag(1,...,1). 


One can immediately deduce that no simply
connected 4-manifold for which q is even and
positive definite can be smoothed!

For example, the manifold with $q = E_8 \oplus E_8$ has
signature 16, so Rohlin's theorem does not exclude it.
But since $E_8$ is even and positive definite, so is
$E_8 \oplus E_8$, and so Donaldson's
theorem forbids such a manifold from existing
smoothly.

In fact, in contrast to Freedman's theorem, which
allows all unimodular quadratic forms to occur as
the intersection form of some topological manifold,
Donaldson's theorem says that in the positive-
definite, smooth case only one quadratic form is
allowed, namely I.
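
The interplay of the two theorems can be summarized in a small sketch that tests only the necessary conditions for smoothability quoted in the text, given the rank, signature, and parity of a unimodular form of a simply connected 4-manifold. The function name and its interface are hypothetical; it does not decide smoothability in general.

```python
def smoothing_obstructions(rank: int, signature: int, even: bool) -> list:
    """Return which of the two theorems in the text rule out a smooth structure.

    Assumptions: the form is unimodular and M is simply connected.
    - Rohlin: an even form of a smooth M has signature divisible by 16.
    - Donaldson: a definite form of a smooth M is diagonalizable to
      +/- the identity, which is odd whenever rank > 0.
    Only these two necessary conditions are tested.
    """
    obstructions = []
    if even and signature % 16 != 0:
        obstructions.append("Rohlin")
    definite = rank > 0 and abs(signature) == rank
    if definite and even:
        obstructions.append("Donaldson")
    return obstructions


if __name__ == "__main__":
    print("E8      :", smoothing_obstructions(8, 8, True))
    print("E8 + E8 :", smoothing_obstructions(16, 16, True))
    print("CP^2    :", smoothing_obstructions(1, 1, False))
```

For $E_8 \oplus E_8$ the Rohlin test passes, and only the Donaldson test detects the obstruction, which is the point of the example above.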

Donaldson’s work makes contact with physics 
because it uses the Yang-Mills equations as we now 
outline. 

Let A be a connection on a principal SU(2) bundle
over a simply connected 4-manifold M with
positive-definite intersection form. If the curvature
2-form of A is F, then F has an $L^2$ norm which is
the Euclidean Yang-Mills action S. One has

$$S = \|F\|^2 = -\int_M \operatorname{tr}(F \wedge *F) \qquad [23]$$

where $*F$ is the usual dual 2-form to F. The minima
of the action S are given by those A, called
instantons, which satisfy the famous self-duality
equations

$$F = *F \qquad [24]$$

Given one instanton A which minimizes S, one can
perturb about A in an attempt to find more
instantons. This process is successful and the space
of all instantons can be fitted together to form a
global moduli space of finite dimension. For the
instanton which provides the absolute minimum of
S, the moduli space $\mathcal{M}$ is a noncompact space of
dimension 5.

We can now summarize the logic that is used to
prove Donaldson's theorem: there are very strong
relationships between M and the moduli space $\mathcal{M}$;
for example, let q be regarded as an n × n matrix
with precisely p unit eigenvalues (clearly p ≤ n and
Donaldson's theorem is just the statement that
p = n), then $\mathcal{M}$ has precisely p singularities which
look like cones on the space $\mathbb{CP}^2$. These combine to
produce the result that the 4-manifold M has the
same topological signature Sign(M) as p copies of
$\mathbb{CP}^2$; and so they have signature a − b, where a of
the $\mathbb{CP}^2$'s are oriented as usual and b have the
opposite orientation. Thus,

$$\mathrm{Sign}(M) = a - b \qquad [25]$$

Now by definition, Sign(M) is the signature $\sigma(q)$ of
the intersection form q of M. But, by assumption, q is
positive definite n × n so $\sigma(q) = n = \mathrm{Sign}(M)$. Hence,

$$n = a - b \qquad [26]$$

However, a + b = p and p ≤ n so we can say that

$$n = a - b, \qquad p = a + b \leq n \qquad [27]$$

but one always has a + b ≥ a − b so we have

$$n \leq p \leq n \ \Rightarrow\ p = n \qquad [28]$$

which is Donaldson's theorem.


Donaldson’s Polynomial Invariants 


Donaldson extended his work by introducing poly- 
nomial invariants also derived from Yang-Mills 
theory and to discuss them we must introduce 
some notation. 

Let M be a smooth, simply connected, orientable
Riemannian 4-manifold without boundary and A be
an SU(2) connection which is anti-self-dual so that

$$F = -*F \qquad [29]$$

Then the space of all gauge-inequivalent solutions to
this anti-self-duality equation, the moduli space
$\mathcal{M}_k$, has a dimension given by the integer

$$\dim \mathcal{M}_k = 8k - 3(1 + b_2^+) \qquad [30]$$

Here k is the instanton number which gives the
topological type of the solution A. The instanton
number is minus the second Chern class $c_2(F) \in
H^4(M; \mathbb{Z})$ of the bundle on which A is defined.
This means that we have

$$k = -c_2(F)[M] = \frac{1}{8\pi^2}\int_M \operatorname{tr}(F \wedge F) \in \mathbb{Z} \qquad [31]$$

The number $b_2^+$ is defined to be the rank of the
positive part of the intersection form q of M.

A Donaldson invariant $q^M_{d,r}$ is a symmetric integer
polynomial of degree d in the 2-homology $H_2(M; \mathbb{Z})$
of M

$$q^M_{d,r}\colon \underbrace{H_2(M) \times \cdots \times H_2(M)}_{d\ \text{factors}} \to \mathbb{Z} \qquad [32]$$

Given a certain map $m_i$,

$$m_i\colon H_i(M) \to H^{4-i}(\mathcal{M}_k) \qquad [33]$$

if $\alpha \in H_2(M)$ and $*$ represents a point in M, we
define $q^M_{d,r}(\alpha)$ by writing

$$q^M_{d,r}(\alpha) = m_2^d(\alpha)\, m_0^r(*)\, [\mathcal{M}_k] \qquad [34]$$

The evaluation on $[\mathcal{M}_k]$ on the RHS of the above
equation means that

$$2d + 4r = \dim \mathcal{M}_k \qquad [35]$$

so that $\mathcal{M}_k$ is even dimensional; this is achieved by
requiring $b_2^+$ to be odd.

Now the Donaldson invariants Vi are differential 
topological invariants rather than topological invari- 
ants but they are difficult to calculate as they require 
detailed knowledge of the instanton moduli space 
Mpg. However they are nontrivial and their values 
are known for a number of 4-manifolds M. For 
example, if M is a complex algebraic surface, a 
positivity argument shows that they are nonzero 
when d is large enough. Conversely, if M can be 
written as the connected sum 


M = M1#M2 


where both Mı and M3 have bj > 0, then they all 
vanish. 


Topological Quantum Field Theories 


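Turning now to physics, it is time to point out that the $q^M_{d,r}$ can also be obtained (Witten 1988) as the correlation functions of a twisted $N = 2$ supersymmetric topological quantum field theory.

The action $S$ for this theory is given by

$$
\begin{aligned}
S = \int_M d^4x\, \sqrt{g}\; \operatorname{tr}\Big\{ & \tfrac{1}{4} F_{\mu\nu} F^{\mu\nu} + \tfrac{1}{4} F^{*}_{\mu\nu} F^{\mu\nu} \\
& + \tfrac{1}{2}\, \phi D_\mu D^\mu \lambda + i D_\mu \psi_\nu \chi^{\mu\nu} - i \eta D_\mu \psi^\mu \\
& - \tfrac{i}{8}\, \phi [\chi_{\mu\nu}, \chi^{\mu\nu}] - \tfrac{i}{2}\, \lambda [\psi_\mu, \psi^\mu] \\
& - \tfrac{i}{2}\, \phi [\eta, \eta] - \tfrac{1}{8} [\phi, \lambda]^2 \Big\}
\end{aligned}
\qquad [36]
$$

where $F_{\mu\nu}$ is the curvature of a connection $A_\mu$ and $(\phi, \lambda, \eta, \psi_\mu, \chi_{\mu\nu})$ are a collection of fields introduced in order to construct the right supersymmetric theory; $\phi$ and $\lambda$ are both spinless while the multiplet $(\eta, \psi_\mu, \chi_{\mu\nu})$ contains the components of a 0-form, a 1-form, and a self-dual 2-form, respectively.

The significance of this choice of multiplet is that the instanton deformation complex used to calculate $\dim \mathcal{M}_k$ contains precisely these fields.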

Even though $S$ contains a metric, its correlation functions are independent of the metric $g$, so that $S$ can still be regarded as the action of a topological quantum field theory. This is because both $S$ and its associated energy-momentum tensor $T = (\delta S/\delta g)$ can be written as BRST commutators $S = \{Q, V\}$, $T = \{Q, V'\}$ for suitable $V$ and $V'$.

With this theory, it is possible to show that the correlation functions are independent of the gauge coupling, and hence we can evaluate them in a small-coupling limit. In this limit, the functional integrals are dominated by the classical minima of $S$, which for $A$ are just the instantons

$$F_{\mu\nu} = -F^{*}_{\mu\nu} \qquad [37]$$

We also need $\phi$ and $\lambda$ to vanish for irreducible connections. If we expand all the fields around the minima up to quadratic terms and do the resulting Gaussian integrals, the correlation functions may be formally evaluated.

A general correlation function of this theory is given by

$$\langle P \rangle = \int \mathcal{D}\mathcal{F}\, \exp[-S]\, P(\mathcal{F}) \qquad [38]$$

where $\mathcal{F}$ denotes the collection of fields present in $S$ and $P(\mathcal{F})$ is some polynomial in the fields.

$S$ has been constructed so that the zero modes in the expansion about the minima are the tangents to the moduli space $\mathcal{M}_k$. This suggests doing the $\mathcal{D}\mathcal{F}$ integration as follows: express the integral as an integral over modes, then integrate out all the nonzero modes first, leaving a finite-dimensional integration over the compactified moduli space $\overline{\mathcal{M}}_k$. The Gaussian integration over the nonzero modes is a boson-fermion ratio of determinants, which supersymmetry constrains to be $\pm 1$, bosonic and fermionic eigenvalues being equal in pairs.

This amounts to writing

$$\langle P \rangle = \int_{\overline{\mathcal{M}}_k} P_n \qquad [39]$$

where $P_n$ denotes some $n$-form over $\overline{\mathcal{M}}_k$ and $n = \dim \overline{\mathcal{M}}_k$. If the original polynomial $P(\mathcal{F})$ is judiciously chosen, then calculation of $\langle P \rangle$ reproduces evaluation of the Donaldson polynomials $q^M_{d,r}$.


The Seiberg-Witten Equations 


The Seiberg-Witten equations constitute another breakthrough in the work on the topology of 4-manifolds, since they greatly simplify the calculation of the data supplied by the Donaldson polynomial invariants. We discuss this below, but turn first to the equations themselves.

If we choose an oriented, compact, closed, Riemannian manifold $M$, then the data we need for the Seiberg-Witten equations are a connection $A$ on a line bundle $L$ over $M$ and a "local spinor" field $\psi$. The Seiberg-Witten equations are then

$$\not\!\partial_A \psi = 0, \qquad F^+ = -\tfrac{1}{2}\, \bar\psi\, \Gamma\, \psi \qquad [40]$$

where $\not\!\partial_A$ is the Dirac operator and $\Gamma$ is made from the gamma matrices $\Gamma_i$ according to $\Gamma = \tfrac{1}{2} [\Gamma_i, \Gamma_j]\, dx^i \wedge dx^j$.

We call $\psi$ a local spinor because global spinors may not exist on $M$; however, in dimension 4, orientability guarantees that a spin$_c$ structure exists on $M$ (a choice of spin$_c$ structure on $M$ is an extra piece of data in the Seiberg-Witten case); $\psi$ is then the appropriate section for the spin$_c$ bundle and behaves locally like a spinor coupled to the U(1) connection $A$. Let $\mathrm{Spin}_c(M)$ denote the set of isomorphism classes of spin$_c$ structures on $M$; then, for the case $b_2^+ > 1$ (the case $b_2^+ = 1$ has some technicalities), the Seiberg-Witten invariants determine a map $SW$ of the form

$$SW : \mathrm{Spin}_c(M) \to \mathbf{Z} \qquad [41]$$

We emphasize that $A$ is just a U(1) abelian connection and so $F = dA$, with $F^+$ denoting the self-dual part of $F$.

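We shall now have a look at an example of a new result obtained directly from the Seiberg-Witten equations. The equations clearly provide the absolute minima for the action

$$S = \int_M \left\{ |\not\!\partial_A \psi|^2 + \tfrac{1}{2} \left| F^+ + \tfrac{1}{2}\, \bar\psi\, \Gamma\, \psi \right|^2 \right\} \qquad [42]$$

If we use a Weitzenböck formula to relate the Laplacian $\nabla_A^* \nabla_A$ to $\not\!\partial_A^* \not\!\partial_A$ plus curvature terms, we find that $S$ satisfies

$$\int_M \left\{ |\not\!\partial_A \psi|^2 + \tfrac{1}{2} \left| F^+ + \tfrac{1}{2}\, \bar\psi\, \Gamma\, \psi \right|^2 \right\} = \int_M \left\{ |\nabla_A \psi|^2 + \tfrac{1}{2} |F^+|^2 + \tfrac{1}{8} |\psi|^4 + \tfrac{1}{4} R |\psi|^2 \right\} \qquad [43]$$

$$= \int_M \left\{ |\nabla_A \psi|^2 + \tfrac{1}{4} |F|^2 + \tfrac{1}{8} |\psi|^4 + \tfrac{1}{4} R |\psi|^2 \right\} + \pi^2 c_1^2(L) \qquad [44]$$

where $R$ is the scalar curvature of $M$ and $c_1(L)$ is the first Chern class of $L$.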

We notice that the action now looks like one for monopoles. But now suppose that $R$ is positive and that the pair $(A, \psi)$ is a solution to the Seiberg-Witten equations; then the left-hand side (LHS) of this last expression is zero and all the integrands on the RHS are non-negative, so the solution must obey $\psi = 0$ and $F^+ = 0$. A technical point is that if $M$ has $b_2^+ > 1$, then a perturbation of the metric can preserve the positivity of $R$ but perturb $F^+ = 0$ to be simply $F = 0$, rendering the connection $A$ flat. Hence, in these circumstances, the solution $(A, \psi)$ is the trivial one. This means that we have a new kind of vanishing theorem in four dimensions.


Theorem (Witten 1994). No 4-manifold with $b_2^+ > 1$ and a nontrivial solution to the Seiberg-Witten equations admits a metric of positive scalar curvature.


Now, for technical reasons, we assume that the $q^M_{d,r}$ have the property that

$$q^M_{d,r+1} = 4\, q^M_{d,r} \qquad [45]$$

A simply connected $M$ with this property is said to be of simple type. We also define $\tilde q^M_d$ by writing

$$\tilde q^M_d = \begin{cases} q^M_{d,0}, & \text{if } d = (b_2^+ + 1) \bmod 2 \\ \tfrac{1}{2}\, q^M_{d,1}, & \text{if } d = b_2^+ \bmod 2 \end{cases} \qquad [46]$$

The generating function $G_M(\alpha)$ is now given by

$$G_M(\alpha) = \sum_{d=0}^{\infty} \frac{1}{d!}\, \tilde q^M_d(\alpha) \qquad [47]$$

According to Kronheimer and Mrowka (1994), $G_M(\alpha)$ can be expressed in terms of a finite number of classes (known as basic classes) $\kappa_i \in H^2(M)$ with rational coefficients $a_i$ (the Seiberg-Witten invariants), resulting in the formula

$$G_M(\alpha) = \exp[\alpha \cdot \alpha / 2] \sum_i a_i \exp[\kappa_i \cdot \alpha] \qquad [48]$$

Hence, for $M$ of simple type, the polynomial invariants are determined by a (finite) number of basic classes and the Seiberg-Witten invariants.

Returning now to the physics, we find that the quantum field theory approach to the polynomial invariants relates them to properties of the moduli space for the Seiberg-Witten equations rather than to properties of the instanton moduli space $\mathcal{M}_k$.

The moduli space for the Seiberg-Witten equations, unlike the instanton case, is compact and generically has dimension

$$\frac{c_1^2(L) - 2\chi(M) - 3\sigma(M)}{4} \qquad [49]$$

$\chi(M)$ and $\sigma(M)$ being the Euler characteristic and signature of $M$, respectively. When

$$c_1^2(L) = 2\chi(M) + 3\sigma(M) \qquad [50]$$

we get a zero-dimensional moduli space consisting of a finite collection of points

$$\{P_1, \ldots, P_N\} \qquad [51]$$

Now each point $P_j$ has a sign $\epsilon_j = \pm 1$ associated with it, coming from the sign of the determinant of the elliptic operator whose index gave the dimension of the moduli space. The sum of these signs is an integer topological invariant denoted by $n_L$, that is,

$$n_L = \sum_{j=1}^{N} \epsilon_j \qquad [52]$$

Returning now to our formula for $G_M(\alpha)$, one finds that

$$G_M(\alpha) = 2^{p(M)} \exp[\alpha \cdot \alpha / 2] \sum_L n_L \exp[c_1(L) \cdot \alpha] \qquad [53]$$

where

$$p(M) = 1 + \tfrac{1}{4} \big( 7\chi(M) + 11\sigma(M) \big) \qquad [54]$$

and the sum over $L$ on the RHS of the formula is over line bundles $L$ that satisfy

$$c_1^2(L) = 2\chi(M) + 3\sigma(M) \qquad [55]$$

that is, it is a sum over $L$ with zero-dimensional Seiberg-Witten moduli spaces.
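
As a quick arithmetic check of the dimension formula and of the zero-dimension condition, the sketch below evaluates $(c_1^2(L) - 2\chi(M) - 3\sigma(M))/4$ for given integers. The function name is hypothetical, and the sample inputs (the K3 surface, with $\chi = 24$ and $\sigma = -16$, and the two values of $c_1^2$) are assumed illustrations rather than examples taken from the text.

```python
from fractions import Fraction


def sw_moduli_dim(c1_squared: int, euler_char: int, signature: int) -> Fraction:
    """Generic dimension (c1^2 - 2*chi - 3*sigma) / 4 of the Seiberg-Witten moduli space."""
    return Fraction(c1_squared - 2 * euler_char - 3 * signature, 4)


# Assumed sample values: for the K3 surface chi = 24 and sigma = -16,
# so 2*chi + 3*sigma = 48 - 48 = 0.
chi_k3, sigma_k3 = 24, -16
print(sw_moduli_dim(0, chi_k3, sigma_k3))  # c1^2 = 0 meets the zero-dimension condition
print(sw_moduli_dim(8, chi_k3, sigma_k3))  # c1^2 = 8 would give dimension 2
```

Only line bundles of the first kind, with vanishing dimension, enter the sum over $L$ in the physical formula for $G_M(\alpha)$.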

Comparison of the two formulas for $G_M(\alpha)$, the first mathematical in origin and the second physical, allows one to identify the Seiberg-Witten invariants $a_i$ with the $n_L$ (up to the overall factor $2^{p(M)}$) and the Kronheimer-Mrowka basic classes $\kappa_i$ with the $c_1(L)$'s.

The results described thus far are for simply connected 4-manifolds, but this condition is not obligatory and there is also a theory in the non-simply-connected case (Mariño and Moore 1999).

The physics underlying these topological results is of great importance since many of the ideas originate there. It is known that the computation of the Donaldson invariants there uses the fact that the $N = 2$ gauge theory is asymptotically free. This means that the ultraviolet limit, being one of weak coupling, is tractable. However, the less tractable infrared or strong-coupling limit would do just as well to calculate the Donaldson invariants, since the latter are metric independent.

In Seiberg and Witten's work, this infrared behavior is actually determined and it is found that, in the strong-coupling infrared limit, the theory is equivalent to a weakly coupled theory of abelian fields and monopoles. There is also a duality between the original theory and the theory with monopoles, which is expressed by the fact that the (abelian) gauge group of the monopole theory is the dual of the maximal torus of the group of the nonabelian theory.

We recall that the Yang-Mills gauge group in this discussion is SU(2). Seiberg and Witten's results mean the replacement of the SU(2) instantons used to compute the Donaldson invariants by the counting of U(1) monopoles. This calculation of the nonabelian Donaldson data by abelian Seiberg-Witten data is much like the representation theory of a nonabelian Lie group $G$, where everything is determined by an abelian object: the maximal torus.

The theory considered by Seiberg and Witten possesses a collection of quantum vacua labeled by a complex parameter $u$, which turns out to parametrize a family of elliptic curves. A central part is played by a function $\tau(u)$ on which there is a modular action of $SL(2, \mathbf{Z})$. The successful determination of the infrared limit involves an electric-magnetic duality, and the whole matter is of very considerable independent interest for quantum field theory, quark confinement, and string theory in general.


Seiberg-Witten Theory and Exotic 
Structures on 4-Manifolds 


We saw earlier that, when $\dim M \ne 4$, a manifold may possess a finite number of differentiable structures, $S^7$ having 28 distinct smooth structures. However, in dimension 4, Seiberg-Witten theory has been used to show that there are many 4-manifolds with a countable infinity of smooth structures. We just mention two: the K3 surface has infinitely many smooth structures, as does the manifold $\mathbf{CP}^2 \# 5\overline{\mathbf{CP}}^2$. This is another instance of how dimension 4 differs from all other dimensions. This infinite variety of exotic smooth structures in four dimensions is also of great interest to physics.

An outstanding four-dimensional problem is still the smooth Poincaré conjecture, which asks whether a smooth 4-manifold $M$ homotopy equivalent to $S^4$ is diffeomorphic to $S^4$. Such an $M$ is certainly homeomorphic to $S^4$, because this is the standard Poincaré conjecture proved by Freedman, and if the answer to this question is yes then $S^4$ would be an example of a 4-manifold with no exotic smooth structures. There is at present no consensus on the answer to this question.


Exotic Structures on Open 4-Manifolds 


If $M$ is an open manifold, that is, a noncompact manifold without boundary, and $M = \mathbf{R}^n$ then, for $n \ne 4$, there is only one smooth structure; but for $n = 4$, there are exotic differentiable structures on $\mathbf{R}^4$. In fact, Gompf showed that there is a continuum of exotic differentiable structures that can be placed on $\mathbf{R}^4$.


Symplectic and Kähler 4-Manifolds


Many 4-manifolds are symplectic, and symplectic manifolds are central in physics; there are many results obtained using Seiberg-Witten theory concerning the topology and geometry of symplectic manifolds. The exotic K3 structures referred to above are all symplectic, and so there is no shortage of symplectic structures even within one homeomorphism class. Taubes obtained far-reaching new results for symplectic 4-manifolds, including establishing an equivalence between the Seiberg-Witten invariants in the symplectic case and the Gromov invariants.

Kähler manifolds possess, simultaneously, compatible Riemannian, symplectic, and complex structures and, beginning with Witten's work, there are many results to be found for Kähler 4-manifolds using Seiberg-Witten techniques.


4-Manifolds with Boundary 


There is a very important extension of the Donaldson-Seiberg-Witten theory to 4-manifolds $M$ with boundary $\partial M = N$. When $\partial M \ne \emptyset$, the Donaldson invariants are not numerical invariants but take values in $HF(N)$, where $HF(N)$ denotes what is called the Floer homology of the 3-manifold $N$. Topological quantum field theory is the ideal setting for this theory since it naturally treats manifolds with boundaries. The Floer homology groups $HF(N)$ act as Hilbert spaces for the quantum fields defined on the boundary. There is now a full interplay of 4-manifold theory and 3-manifold theory, as well as of Yang-Mills theory in three and four dimensions. This interplay is often realized by taking two 4-manifolds $M_1$ and $M_2$ with the same boundary $N$ and joining them along $N$ to obtain a closed 4-manifold $M$, so that

$$M = M_1 \cup_N M_2 \qquad [56]$$


Given a 3-manifold $N$ and an SU(2) connection $A$, Floer studied the critical points of the Chern-Simons function $f(A)$ defined by

$$f(A) = \frac{1}{8\pi^2} \int_N \operatorname{tr}\left( A \wedge dA + \tfrac{2}{3}\, A \wedge A \wedge A \right) \qquad [57]$$

where $f(A)$ is regarded as a function on the infinite-dimensional space $\mathcal{A}$ of connections. The function $f(A)$ changes by an integer under a gauge transformation and so descends to a single-valued gauge-invariant function on the space of gauge orbits $\mathcal{A}/\mathcal{G}$ if one considers $\exp(2\pi i k f(A))$, where $k \in \mathbf{Z}$ ($\mathcal{G}$ being the group of gauge transformations). Morse theory applied to this infinite-dimensional setting gives an infinite Morse index to each critical point, a pathology which is avoided by only defining the difference of the index between two critical points using spectral flow. The critical points correspond, via gradient flow and a consideration of the instanton equations

$$F = -{*F} \qquad [58]$$

on the 4-manifold $N \times \mathbf{R}$, to the flat connections on the 3-manifold $N$. The latter are identifiable as the set of (equivalence classes of) representations of the fundamental group $\pi_1(N)$ in the gauge group SU(2), that is, with

$$\operatorname{Hom}(\pi_1(N), SU(2)) / \operatorname{Ad} SU(2) \qquad [59]$$


For the Seiberg-Witten formulation, let $A$ denote a connection on the 3-manifold $N$ with curvature $F(A)$. Then the Chern-Simons function $f(A)$ is replaced by the abelian Chern-Simons function together with a quadratic fermion term, resulting in the function $f^{SW}(A)$, defined by

$$f^{SW}(A) = \int_N \left\{ \bar\phi\, \not\!\partial_A \phi + A \wedge F(A) \right\} \qquad [60]$$

where $\not\!\partial_A$ denotes the self-adjoint Dirac operator in three dimensions acting on a spinor $\phi$ on $N$; because of the presence of the Chern-Simons function, $f^{SW}(A)$ is only defined up to a multiple of $8\pi^2$, in a manner similar to the case for $f(A)$. Gradient flow together with the Seiberg-Witten equations on the 4-manifold $N \times \mathbf{R}$ result in critical points corresponding to the solutions to


$$\not\!\partial_A \phi = 0, \qquad F(A) = \tfrac{1}{2}\, \bar\phi\, \Gamma\, \phi \qquad [61]$$

which is a three-dimensional version of the Seiberg-Witten equations.

The critical point theory of these two functions $f(A)$ and $f^{SW}(A)$ permits the construction of the instanton Floer homology groups $HF^{inst}(N)$ and the Seiberg-Witten Floer homology groups $HF^{SW}(N)$, respectively. In fact, there are several kinds of Floer homology: Lagrangian Floer homology, instanton Floer homology, Heegaard-Floer homology, and Seiberg-Witten-Floer homology, together with conjectures concerning their relations to one another.

There are still many unanswered questions of joint interest to mathematicians and physicists in the entire area of 4-manifold theory.


See also: Electric-Magnetic Duality; Gauge Theoretic 
Invariants of 4-Manifolds; Floer Homology; Topological 
Quantum Field Theory: Overview. 
Introduction


Since the 1970s, dimension theory for dynamics has 
evolved into an independent field of mathematics. 
Its main goal is to measure complexity of invariant 
sets and measures using fractal dimensions. The 
history of fractal dimensions is closely related to 
the names of H Minkowski (Minkowski content, 
1903), H Hausdorff (Hausdorff dimension, 
1919), G Bouligand (Bouligand dimension, 1928), 
LS Pontryagin and LG Schnirelmann (metric order, 
1932), P Moran (Moran geometric constructions, 
1946), AS Besicovitch and SJ Taylor (Besicovitch— 
Taylor index, 1954), A Rényi (Rényi spectrum 
for dimensions, 1957), AN Kolmogorov and 
VM Tihomirov (metric dimension, Kolmogorov
complexity, 1959), YaG Sinai, D Ruelle, R Bowen
(thermodynamic formalism, Bowen’s equation, 
1972, 1973, 1979), B Mandelbrot (fractals and 
multifractals, 1974), JL Kaplan and JA Yorke 
(Lyapunov dimension, 1979), JE Hutchinson (frac- 
tals and self-similarity, 1981), C Tricot, D Sullivan 
(packing dimension, 1982, 1984), HGE Hentschel 
and I Procaccia (Hentschel—Procaccia spectrum for 
dimensions, 1983), Ya Pesin (Carathéodory—Pesin 
dimension, 1988), M Lapidus and M van Franken- 
huysen (complex dimensions for fractal strings, 
2000), etc. Fractal dimensions enable us to have a 
better insight into the dynamics appearing in various 
problems in physics, engineering, chemistry, medi- 
cine, geology, meteorology, ecology, economics, 
computer science, image processing, and, of course, 
in many branches of mathematics. Concentrating on 
box and Hausdorff dimensions only, we describe 
basic methods of fractal analysis in dynamics, sketch 
their applications, and indicate some trends in this 
rapidly growing field. 


Fractal Dimensions 
Box Dimensions 


Let $A$ be a bounded set in $\mathbb{R}^N$, and let $d(x, A)$ be the Euclidean distance from $x$ to $A$. The Minkowski sausage of radius $\varepsilon$ around $A$ (a term coined by B Mandelbrot) is defined as the $\varepsilon$-neighborhood of $A$, that is, $A_\varepsilon := \{y \in \mathbb{R}^N : d(y, A) < \varepsilon\}$. By the upper $s$-dimensional Minkowski content of $A$, $s \ge 0$, we mean

$$M^{*s}(A) := \limsup_{\varepsilon \to 0} \frac{|A_\varepsilon|}{\varepsilon^{N-s}} \in [0, \infty]$$

Here $|\cdot|$ denotes $N$-dimensional Lebesgue measure. The corresponding upper box dimension is defined by

$$\overline{\dim}_B A := \inf\{s \ge 0 : M^{*s}(A) = 0\}$$

The lower $s$-dimensional Minkowski content $M_*^s(A)$ and the corresponding lower box dimension $\underline{\dim}_B A$ are defined analogously. The name box dimension stems from the following: if we have an $\varepsilon$-grid in $\mathbb{R}^N$ composed of closed $N$-dimensional boxes with side $\varepsilon$, and if $N(A, \varepsilon)$ is the number of boxes of the grid intersecting $A$, then

$$\overline{\dim}_B A = \limsup_{\varepsilon \to 0} \frac{\log N(A, \varepsilon)}{\log(1/\varepsilon)}$$

and analogously for $\underline{\dim}_B A$. It suffices to take any geometric subsequence $\varepsilon_k = b^{-k}$ in the limit, where $b > 1$ (H Furstenberg, 1970). There are many other names for the upper box dimension appearing in the literature, like the Cantor-Minkowski order, Minkowski dimension, Bouligand dimension, Borel logarithmic rarefaction, Besicovitch-Taylor index, entropy dimension, Kolmogorov dimension, fractal dimension, capacity dimension, and limit capacity. If $A$ is such that $\underline{\dim}_B A = \overline{\dim}_B A$, the common value is denoted by $d := \dim_B A$, and we call it the box dimension of $A$. If, in addition to this, both $M_*^d(A)$ and $M^{*d}(A)$ are in $(0, \infty)$, we say that $A$ is Minkowski nondegenerate. If, moreover, $M_*^d(A) = M^{*d}(A) =: M^d(A) \in (0, \infty)$, then $A$ is said to be Minkowski measurable.

Assume that $A$ is such that $d := \dim_B A$ and $M^d(A)$ exist. Then the value of $M^d(A)^{-1}$ is called the lacunarity of $A$ (B Mandelbrot, 1982). A bounded set $A \subset \mathbb{R}^N$ is said to be porous (A Denjoy, 1920) if there exist $\alpha > 0$ and $\delta > 0$ such that for every $x \in A$ and $r \in (0, \delta)$ there is $y \in \mathbb{R}^N$ such that the open ball $B_{\alpha r}(y)$ is contained in $B_r(x) \setminus A$. If $A$ is porous then it is easy to see that $\overline{\dim}_B A < N$ (O Martio and M Vuorinen, 1987, A Salli, 1991).

We proceed with two examples. Let $A := C^{(a)}$, $a \in (0, 1/2)$, be the Cantor set obtained from $[0,1]$ by consecutive deletion of $2^k$ middle open intervals of length $a^k(1 - 2a)$ in step $k \in \mathbb{N} \cup \{0\}$. Then $\dim_B A = (\log 2)/(\log(1/a))$ (G Bouligand, 1928), and $A$ is nondegenerate, but not Minkowski measurable (Lapidus and Pomerance, 1993). For the spiral $\Gamma$ of focus type defined by $r = m\varphi^{-\alpha}$ in polar coordinates, where $\alpha \in (0,1)$ and $m > 0$ are fixed, $\varphi \ge \varphi_1 > 0$, we have $\dim_B \Gamma = 2/(1+\alpha)$ (Y Dupain, M Mendès-France, C Tricot, 1983). It is Minkowski measurable (Žubrinić and Županović, 2005), and the larger $m$, the smaller the lacunarity; see Figure 1.

Figure 1 Spirals of equal box dimensions (4/3) and different lacunarities (0.43 and 0.05).
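
The box-counting characterization can be tried numerically on the Cantor set $C^{(a)}$ of the first example. The following is a minimal sketch under stated assumptions: the set is represented by the endpoints of the intervals left after finitely many deletion steps, half-open grid boxes are used instead of closed ones (which changes the counts by a bounded factor but not the limit), and the dimension is read off as the slope of $\log N(A, \varepsilon)$ against $\log(1/\varepsilon)$ between two scales $\varepsilon = a^{m}$. All function names are hypothetical.

```python
from fractions import Fraction
from math import floor, log


def cantor_endpoints(a: Fraction, depth: int) -> set:
    """Endpoints of the 2**depth intervals left after `depth` deletion steps for C^(a)."""
    intervals = [(Fraction(0), Fraction(1))]  # (left endpoint, length)
    for _ in range(depth):
        refined = []
        for left, length in intervals:
            child = length * a  # each interval keeps two end pieces of relative length a
            refined.append((left, child))
            refined.append((left + length - child, child))
        intervals = refined
    points = set()
    for left, length in intervals:
        points.add(left)
        points.add(left + length)
    return points


def box_count(points: set, eps: Fraction) -> int:
    """Number of half-open grid boxes [j*eps, (j+1)*eps) that contain a sample point."""
    return len({floor(x / eps) for x in points})


def box_dimension_slope(a: Fraction, m1: int, m2: int, depth: int) -> float:
    """Slope of log N(A, eps) against log(1/eps) between eps = a**m1 and eps = a**m2."""
    points = cantor_endpoints(a, depth)
    n1 = box_count(points, a ** m1)
    n2 = box_count(points, a ** m2)
    return (log(n2) - log(n1)) / ((m2 - m1) * log(float(1 / a)))


estimate = box_dimension_slope(Fraction(1, 3), m1=3, m2=7, depth=10)
print(estimate, log(2) / log(3))
```

Exact rational arithmetic is used so that points lying on grid lines are assigned to boxes without rounding error; the deletion depth must be at least as large as the finer scale index `m2`.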


Hausdorff Dimension 


For a given subset $A$ of $\mathbb{R}^N$ (not necessarily bounded) and $s \ge 0$ we define

$$H^s(A) := \lim_{\varepsilon \to 0} \inf \left\{ \sum_{i=1}^{\infty} r_i^s \right\} \in [0, \infty]$$

where the infimum is taken over all finite or countable coverings of $A$ by open balls of radii $r_i \le \varepsilon$. The value of $H^s(A)$ is called the $s$-dimensional Hausdorff outer measure of $A$. The Hausdorff dimension of $A$, sometimes called the Hausdorff-Besicovitch dimension, is defined by

$$\dim_H A := \inf\{s \ge 0 : H^s(A) = 0\}$$

If $A$ is bounded then $\dim_H A \le \underline{\dim}_B A \le \overline{\dim}_B A \le N$.

We say that $A$ is Hausdorff nondegenerate (or a $d$-set) if $H^d(A) \in (0, \infty)$ for some $d \ge 0$. Cantor sets share this property, and $\dim_H C^{(a)} = (\log 2)/(\log(1/a))$, where $a \in (0, 1/2)$ (Hausdorff, 1919).


Gauge Functions 


The notions of Minkowski contents and Hausdorff measure can be generalized using gauge functions $h : [0, \varepsilon_0) \to \mathbb{R}$ that are assumed to be continuous and increasing, with $h(0) = 0$. For example,

$$M^{*h}(A) := \limsup_{\varepsilon \to 0} \frac{|A_\varepsilon|\, h(\varepsilon)}{\varepsilon^N}$$

and similarly for $M_*^h(A)$ (M Lapidus and C He, 1997), while for $H^h(A)$ it suffices to replace $r_i^s$ with $h(r_i)$ in the above definition of the Hausdorff outer measure (Besicovitch, 1934). Gauge functions are used for sets that are Minkowski or Hausdorff degenerate. The aim, if possible, is to find an explicit gauge function so that the corresponding generalized Minkowski contents or Hausdorff measure of $A$ are nondegenerate.


Methods of Fractal Analysis in Dynamics 
Thermodynamic Formalism 


Thermodynamic formalism has been developed by Sinai (1972), Ruelle (1973), and Bowen (1975), using methods of statistical mechanics in order to study dynamics and to find dimensions of various fractal sets. We first describe a "dictionary" for explicit geometric constructions of Cantor-like sets. Let $X_p$ be the set of all sequences $i = (i_1, i_2, \ldots)$ of elements $i_k$ from a given set of $p$ symbols, say $\{1, 2, \ldots, p\}$. We endow $X_p$ with the metric $d(i, j) := \sum_k 2^{-k} |i_k - j_k|$ and introduce the one-sided shift operator (or left shift) $\sigma : X_p \to X_p$ defined by $(\sigma(i))_n = i_{n+1}$, that is, $\sigma(i_1, i_2, i_3, \ldots) = (i_2, i_3, i_4, \ldots)$. A set $Q \subset X_p$ is called the symbolic dynamics if it is compact and $\sigma$-invariant, that is, $\sigma(Q) \subset Q$. Hence, $(Q, \sigma)$ is a symbolic dynamical system. Denote $i|n := (i_1, \ldots, i_n)$. Given a continuous function $\varphi : Q \to \mathbb{R}$, let us define the topological pressure of $\varphi$ with respect to $\sigma$ by

$$P(\varphi) := \lim_{n \to \infty} \frac{1}{n} \log \sum_{\{i|n \,:\, i \in Q\}} E(i|n), \qquad E(i|n) := \exp\left( \sup_{\{j \in Q \,:\, j|n = i|n\}} \sum_{k=0}^{n-1} \varphi(\sigma^k(j)) \right)$$

The topological entropy of $\sigma|Q$ is defined by $h(\sigma|Q) := P(0)$, that is,

$$h(\sigma|Q) = \lim_{n \to \infty} \frac{1}{n} \log \#\{i|n : i \in Q\}$$

where $\#$ denotes the cardinal number of a set. The function $\varphi_n := \sum_{k=0}^{n-1} \varphi \circ \sigma^k$ appearing above has the property $\varphi_{n+m} = \varphi_n + \varphi_m \circ \sigma^n$, and therefore we speak about additive thermodynamic formalism. Topological pressure was introduced by D Ruelle (1973) and extended by P Walters (1976). Bowen's equation (1979) has a very important role in the computation of the Hausdorff dimension of various sets. For the unknown $s \in \mathbb{R}$, and with a suitably chosen function $\varphi$, this equation reads

$$P(s\varphi) = 0$$


Geometric Constructions 


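A geometric construction $(Q, \Delta)$ in $\mathbb{R}^N$ indexed by symbolic dynamics $Q$ is a family $\Delta$ of compact sets $\Delta_{i|n} \subset \mathbb{R}^N$, $i \in Q$, $n \in \mathbb{N}$, such that $\operatorname{diam} \Delta_{i|n} \to 0$ as $n \to \infty$, $\Delta_{i|n+1} \subset \Delta_{i|n}$, $\Delta_{i|n} = \overline{\operatorname{int} \Delta_{i|n}}$ for every $i \in Q$ and all $n$, and $\operatorname{int} \Delta_{i|n} \cap \operatorname{int} \Delta_{j|n} = \emptyset$ whenever $i|n \ne j|n$ (Moran's open set condition). This family induces the Cantor-like set

$$F := \bigcap_{n=1}^{\infty} \left( \bigcup_{i \in Q} \Delta_{i|n} \right)$$

(see Figure 2). The mapping $h : Q \to F$ defined by $h(i) := \bigcap_{n=1}^{\infty} \Delta_{i|n}$ is called the coding map of $F$.

Figure 2 Cantor-like set (basic sets $\Delta_1$, $\Delta_2$ and $\Delta_{11}$, $\Delta_{12}$, $\Delta_{21}$, $\Delta_{22}$).

The above geometric construction includes well-known iterated function systems of similarities as a special case. If $\lambda_1, \ldots, \lambda_p$ are given numbers in $(0, 1)$, and $\Delta_{i|n}$ are balls of radii $r_{i|n} := \lambda_{i_1} \cdots \lambda_{i_n}$, then $s := \dim_H F$ is the unique solution of Bowen's equation $P(s\varphi) = 0$, where $\varphi$ is defined by $\varphi(i) := \log \lambda_{i_1}$ (Ya Pesin and H Weiss, 1996). In this case Bowen's equation is equivalent to Moran's equation (1946),

$$\sum_{k=1}^{p} \lambda_k^s = 1$$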
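
Moran's equation has no closed-form solution in general, but its left-hand side is strictly decreasing in $s$, so the root can be found by bisection. The sketch below assumes the full shift $Q = X_p$ and ratios $\lambda_k \in (0,1)$; the function name `moran_dimension` and the sample ratios are hypothetical.

```python
from math import log, sqrt


def moran_dimension(ratios, tol=1e-12):
    """Solve sum(r**s for r in ratios) == 1 for s by bisection; ratios lie in (0, 1)."""
    if not ratios or any(not (0.0 < r < 1.0) for r in ratios):
        raise ValueError("all contraction ratios must lie in (0, 1)")

    def excess(s):
        return sum(r ** s for r in ratios) - 1.0  # strictly decreasing in s

    lo, hi = 0.0, 1.0
    while excess(hi) > 0.0:  # enlarge the bracket until the sum drops below 1
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Two equal ratios 1/3 reproduce the Cantor set value log 2 / log 3.
print(moran_dimension([1 / 3, 1 / 3]), log(2) / log(3))
# Ratios 1/2 and 1/4: x = 2**(-s) solves x + x**2 = 1, so s = log2 of the golden ratio.
print(moran_dimension([0.5, 0.25]), log((1 + sqrt(5)) / 2) / log(2))
```

Bisection is chosen over Newton's method here because monotonicity alone guarantees convergence, with no need for a good starting point.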


This result has been generalized by L Barreira (1996) using the Carathéodory-Pesin construction (1988). Let us illustrate Barreira's theory of nonadditive thermodynamic formalism with a special case. Assume that $(Q, \Delta)$ is a geometric construction for which the basic sets $\Delta_{i|n}$ are balls of radii $r_{i|n}$, and let there exist $\delta > 0$ such that $r_{i|n+1} \ge \delta\, r_{i|n}$ and $r_{i|n+m} \le r_{i|n}\, r_{\sigma^n(i)|m}$ for all $i \in Q$, $n, m \in \mathbb{N}$. Then $\dim_H F = \dim_B F = s$, where $s$ is the unique real number such that

$$\lim_{n \to \infty} \frac{1}{n} \log \sum_{\{i|n \,:\, i \in Q\}} r_{i|n}^s = 0 \qquad [1]$$

This is a special case of Barreira's extension of Bowen's equation to nonadditive thermodynamic formalism. Moran's equation can be deduced from [1] by defining $r_{i|n} := \lambda_{i_1} \cdots \lambda_{i_n}$, where $i = (i_1, i_2, \ldots)$ and $\lambda_1, \ldots, \lambda_p \in (0,1)$ are given numbers. Pesin and Weiss (1996) showed that Moran's open set condition can be weakened so that partial intersections of interiors of pairs of basic sets in the family $\Delta$ are allowed. Thermodynamic formalism has been used to study the Hausdorff dimension of Julia sets (Ruelle, 1982), horseshoes (H McCluskey and A Manning, 1983), etc.

An important example of symbolic dynamics is the topological Markov chain $X_A$ generated by a $p \times p$ matrix $A$ with entries $a_{ij} \in \{0, 1\}$:

$$X_A := \{ i = (i_1, i_2, \ldots) \in X_p : a_{i_k i_{k+1}} = 1 \text{ for all } k \in \mathbb{N} \}$$

It is a compact, $\sigma$-invariant subset of $X_p$. The map $\sigma|X_A$ is called the subshift of finite type (Bowen, 1975). A construction of a Cantor-like set $F$ using dynamics $Q = X_p$ is called a simple geometric construction, while a geometric construction is said to be a Markov geometric construction if $Q = X_A$. If $F$ is obtained by a Markov geometric construction such that all basic sets $\Delta_{i|n}$ are balls of radii $r_{i|n} := \lambda_{i_1} \cdots \lambda_{i_n}$, where $\lambda_i \in (0,1)$, $i \in \{1, \ldots, p\}$, then $\dim_H F = \dim_B F = s$, where $s$ is the unique solution of the equation $\rho(A M_s) = 1$. Here $M_s := \operatorname{diag}(\lambda_1^s, \ldots, \lambda_p^s)$ and $\rho(A M_s)$ is the spectral radius of the matrix $A M_s$. This and more general results have been obtained by Pesin and Weiss (1996).

Any Cantor-like set $F$ obtained via an iterated function system of similarities satisfying Moran's open set condition is Hausdorff nondegenerate (Moran, 1946). If $F$ is of nonlattice type, that is, the set $\{\log \lambda_1, \ldots, \log \lambda_p\}$ is not contained in $r \cdot \mathbb{Z}$ for any $r > 0$, then $F$ is Minkowski measurable (D Gatzouras, 1999).


Hyperbolic Measures 


Let $X$ be a complete metric space and assume that $f : X \to X$ is continuous. Let $\mu$ be an $f$-invariant Borel probability measure on $X$ (i.e., $\mu(f^{-1}(A)) = \mu(A)$ for measurable sets $A$) with a compact support. The Hausdorff dimension of $\mu$, and the lower and upper box dimensions of $\mu$ (L-S Young, 1982) are defined by

$$\dim_H \mu := \inf\{\dim_H Z : Z \subset X,\ \mu(Z) = 1\}$$
$$\underline{\dim}_B \mu := \lim_{\delta \to 0} \inf\{\underline{\dim}_B Z : Z \subset X,\ \mu(Z) \ge 1 - \delta\}$$
$$\overline{\dim}_B \mu := \lim_{\delta \to 0} \inf\{\overline{\dim}_B Z : Z \subset X,\ \mu(Z) \ge 1 - \delta\}$$

It is natural to introduce the lower and upper pointwise dimensions of $\mu$ at $x \in X$ by

$$\underline{d}_\mu(x) := \liminf_{r \to 0} \frac{\log \mu(B_r(x))}{\log r}$$

and similarly $\overline{d}_\mu(x)$. It has been shown by Young (1982) that if $X$ has finite topological dimension and if $\mu$ is exact dimensional, that is, $\underline{d}_\mu(x) = \overline{d}_\mu(x) =: d$ for $\mu$-a.e. $x \in X$, then

$$\dim_H \mu = \dim_B \mu = d$$

She also proved that hyperbolic measures (ergodic measures with nonzero Lyapunov exponents), invariant under a $C^{1+\alpha}$-diffeomorphism, $\alpha > 0$, are exact dimensional. F Ledrappier (1986) derived exact dimensionality for hyperbolic Bowen-Ruelle-Sinai measures. This result was extended by Ya Pesin and Ch Yue (1996) to hyperbolic measures with semilocal product structure. J-P Eckmann and D Ruelle (1985) conjectured that exact dimensionality holds for general hyperbolic measures, and this was proved by Barreira, Pesin, and Schmeling (1996). More precisely, if $f$ is a $C^{1+\alpha}$-diffeomorphism on a smooth Riemannian manifold $X$ without boundary, and if $\mu$ is an $f$-invariant, compactly supported Borel probability measure, then its hyperbolicity implies that

$$\underline{d}_\mu(x) = \overline{d}_\mu(x) = d^s_\mu(x) + d^u_\mu(x)$$

for $\mu$-a.e. $x \in X$, where $d^s_\mu(x)$ and $d^u_\mu(x)$ are the stable and unstable pointwise dimensions of $\mu$ at $x$ introduced by Ledrappier and Young (1985).


Multifractal Analysis of Functions and Measures 


Invariant sets of many dynamical systems are not self-similar. Roughly speaking, the aim of multifractal analysis is to make a decomposition of the invariant set with respect to desired fractal properties and then to study a fractal dimension of each set of the decomposition. Some dynamical systems have invariant sets equal to graphs of Hölderian functions $f : \mathbb{R}^N \to \mathbb{R}$, so that wavelet methods can be used. One of the goals of multifractal analysis of functions is to study the spectrum of singularities of $f$ defined by

$$d_f(\alpha) := \dim_H H_\alpha(f)$$

introduced by U Frisch and G Parisi (1985) in the context of fully developed turbulence. Here $H_\alpha(f)$ is the set of points at which the corresponding pointwise Hölder exponent of $f$ is equal to $\alpha \ge 0$. If the function $f$ is self-similar then $d_f(\alpha)$ is real analytic and strictly concave (first increasing and then decreasing) on an explicit interval $(\underline{\alpha}, \overline{\alpha})$ (S Jaffard, 1997). It is natural to consider the set $C_{\alpha,\beta}(f)$ of points $x_0$ called chirps of order $(\alpha, \beta)$ (Y Meyer 1996), at which $f$ behaves roughly like $|x - x_0|^\alpha \sin(1/|x - x_0|^\beta)$, $\beta > 0$. The function $D_f(\alpha, \beta) := \dim_H C_{\alpha,\beta}(f)$ is called the chirp spectrum of $f$ (S Jaffard 2000). Wavelet methods have found applications in the study of evolution equations and in modeling and detection of chirps in turbulent flows (S Jaffard, Y Meyer, RD Robert, 2001).

Basic ideas of multifractal analysis have been introduced by physicists T Halsey, MH Jensen, LP Kadanoff, I Procaccia, and BI Shraiman (1988). In applications it often deals with an invariant ergodic probability measure associated with the dynamical system considered. Multifractal analysis of a Borel finite measure $\mu$ defined on $\mathbb{R}^N$ consists in the study of the function

$$d_\mu(\alpha) := \dim_H K_\alpha(\mu), \qquad \alpha \ge 0$$

called the spectrum of pointwise dimensions of $\mu$. Here $K_\alpha(\mu)$ is the set of points where the pointwise dimension of $\mu$ is equal to $\alpha$:

$$K_\alpha(\mu) := \{x \in \mathbb{R}^N : \underline{d}_\mu(x) = \overline{d}_\mu(x) = \alpha\}$$

It is also of interest to study the Hausdorff dimension of the irregular set $K(\mu) := \{x \in \mathbb{R}^N : \underline{d}_\mu(x) < \overline{d}_\mu(x)\}$. These sets are pairwise disjoint and constitute a multifractal decomposition of $\mathbb{R}^N$, that is,

$$\mathbb{R}^N = K(\mu) \cup \left( \bigcup_{\alpha \in \mathbb{R}} K_\alpha(\mu) \right)$$

The function $d_\mu(\alpha)$ provides important information about the complexity of the multifractal decomposition. In many situations, there is an open interval $(\underline{\alpha}, \overline{\alpha})$ on which the function $d_\mu(\alpha)$ is analytic and strictly concave (first increasing and then decreasing), and equal to the Legendre transform of an explicit convex function. We thus obtain an uncountable family of sets $K_\alpha(\mu)$ with positive Hausdorff dimension, which shows the enormous complexity of the multifractal decomposition of $\mathbb{R}^N$. These and related questions have been studied by L Olsen (1995), K Falconer (1996), Pesin and Weiss (1996), Barreira and Schmeling (2000), and many other authors.


Local Lyapunov Dimension 


Let $\Omega$ be an open set in $R^N$ and let $f : \Omega \to R^N$ be a $C^1$-map. To any fixed $x \in \Omega$ we assign N singular values $a_1 \ge a_2 \ge \dots \ge a_N \ge 0$ of f, defined as square roots of eigenvalues of the matrix $f'(x)^T \cdot f'(x)$, where $f'(x)$ is the Jacobian of f at x, and $f'(x)^T$ its transpose. The local Lyapunov dimension of f at x is defined by

$$\dim_L(f, x) := j + s$$

where j is the largest integer in [0, N] such that $a_1 \cdots a_j \ge 1$ (if there is no such j we let j = 0), and $s \in [0, 1)$ is the unique solution of $a_1 \cdots a_j a_{j+1}^s = 1$ (except for j = N, when we define s = 0). This definition, due to BR Hunt (1996), is close to that of Kaplan and Yorke (1979). The Jacobian $f'(x)$ contracts k-dimensional volumes (that is, $a_1 \cdots a_k < 1$) if and only if $\dim_L(f, x) < k$. In this case, we say that f is k-contracting at x. Furthermore, the function $x \mapsto \dim_L(f, x)$ is upper-semicontinuous, so that for any compact subset A of $\Omega$ the Lyapunov dimension of f on A,

$$\dim_L(f, A) := \max_{x \in A} \dim_L(f, x)$$

is well defined. Yu S Ilyashenko conjectured that if f locally contracts k-dimensional volumes then the upper box dimension of any compact invariant set is < k. Hunt (1996) proved that if A is a compact, strictly invariant set of f (i.e., f(A) = A) then

$$\overline{\dim}_B A \le \dim_L(f, A) \qquad [2]$$

This is an improvement of $\dim_H A \le \dim_L(f, A)$ obtained by A Douady and J Oesterlé (1980), and independently by Ilyashenko (1982). MA Blinchevskaya and Yu S Ilyashenko (1999) proved that if A is any attractor of a smooth map in a Hilbert space that contracts k-dimensional volumes then $\overline{\dim}_B A \le k$. See [3] below.
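The definition of the local Lyapunov dimension can be sketched in a few lines of code. The following is a minimal illustration, not taken from the cited papers: the function names `lyapunov_dimension` and `singular_values_2x2` are hypothetical, the singular values are assumed to be supplied by the caller (or computed in closed form for a 2 x 2 Jacobian), and the case $a_{j+1} = 0$ is handled by the assumed convention s = 0.

```python
import math


def lyapunov_dimension(singular_values):
    """Return j + s for singular values a_1 >= ... >= a_N >= 0.

    j is the largest integer with a_1 * ... * a_j >= 1 (j = 0 if none),
    and s in [0, 1) solves a_1 * ... * a_j * a_{j+1}**s = 1.
    """
    a = sorted(singular_values, reverse=True)
    n = len(a)
    j = 0
    product = 1.0
    # Because the values are sorted, the admissible j form an initial segment.
    for value in a:
        if product * value >= 1.0:
            product *= value
            j += 1
        else:
            break
    if j == n:
        return float(n)  # convention from the text: s = 0 when j = N
    if a[j] == 0.0:
        return float(j)  # assumed convention: no fractional part possible
    s = math.log(product) / math.log(1.0 / a[j])
    return j + s


def singular_values_2x2(p, q, r, s):
    """Singular values of the matrix [[p, q], [r, s]] in closed form."""
    trace = p * p + q * q + r * r + s * s      # trace of M^T M
    det = p * s - q * r                        # det(M)
    disc = math.sqrt(max(trace * trace - 4.0 * det * det, 0.0))
    return (math.sqrt((trace + disc) / 2.0),
            math.sqrt(max((trace - disc) / 2.0, 0.0)))
```

For a planar map one can pass the entries of the Jacobian at a point to `singular_values_2x2` and feed the result to `lyapunov_dimension`; taking the maximum over sample points of a compact set gives only a numerical lower estimate of $\dim_L(f, A)$, since the true value is a maximum over all of A.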

A continuous variant of this method is used in order to obtain estimates of fractal dimensions of global attractors of dynamical systems (X, S) on a Hilbert space X. Here S(t), $t \ge 0$, is a semigroup of continuous operators on X, that is, S(t + s) = S(t)S(s) and S(0) = I. A set A in X is called a global attractor of the dynamical system if it is compact, attracting (i.e., for any bounded set B and $\varepsilon > 0$ there exists $t_0$ such that for $t \ge t_0$ we have $S(t)B \subset A_\varepsilon$), and A is strictly invariant (i.e., S(t)A = A for all $t \ge 0$).


Applications in Dynamics 
Logistic Map 


M Feigenbaum, a mathematical physicist, introduced and studied the dynamics of the logistic map $f_\lambda : [0, 1] \to [0, 1]$, $f_\lambda(x) := \lambda x(1 - x)$, $\lambda \in (0, 4]$. Taking $\lambda = \lambda_\infty \approx 3.570$ the corresponding invariant set $A \subset [0, 1]$ (i.e., $S_1(A) \cup S_2(A) = A$, where $S_i$ are the two branches of $f_{\lambda_\infty}^{-1}$) has both Hausdorff and box dimensions equal to $\approx 0.538$ (P Grassberger 1981, P Grassberger and I Procaccia, 1983). The set A has Cantor-like structure, but is not self-similar. Its multifractal properties have been studied by U Frisch, K Khanin, and T Matsumoto (2004).
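A crude numerical experiment along these lines can be sketched as follows. This is only an illustration under stated assumptions: the value of `LAMBDA_INF` is a truncated decimal approximation of $\lambda_\infty$, the function names are hypothetical, and a least-squares box-counting slope over a handful of dyadic scales is a rough estimator that should not be expected to reproduce the quoted value 0.538 accurately.

```python
import math

# Truncated approximation of the accumulation point of period doublings
# (an assumption of this sketch; the text only quotes 3.570).
LAMBDA_INF = 3.569945672


def logistic_orbit(lam, x0=0.5, burn_in=5000, n=20000):
    """Iterate x -> lam * x * (1 - x), discarding a transient."""
    x = x0
    for _ in range(burn_in):
        x = lam * x * (1.0 - x)
    orbit = []
    for _ in range(n):
        x = lam * x * (1.0 - x)
        orbit.append(x)
    return orbit


def box_count(points, eps):
    """Number of intervals [k*eps, (k+1)*eps) that contain a point."""
    return len({math.floor(p / eps) for p in points})


def box_dimension_slope(points, exponents=range(4, 11)):
    """Least-squares slope of log N(eps) against log(1/eps), eps = 2**-k."""
    xs = [k * math.log(2.0) for k in exponents]
    ys = [math.log(box_count(points, 2.0 ** -k)) for k in exponents]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

Calling `box_dimension_slope(logistic_orbit(LAMBDA_INF))` returns a slope strictly between 0 and 1; refining the scales requires longer orbits and a more accurate parameter value.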


Smale Horseshoe 


In the early 1960s S Smale defined his famous horseshoe map and showed that it has a strange invariant set resulting in chaotic dynamics. The notion of strange attractor was introduced in 1971 by Ruelle and Takens in their study of turbulence. Let S be a square in the plane and let $f : R^2 \to R^2$ be a map transforming S as indicated in Figure 3, such that on both components of $S \cap f^{-1}(S)$ the map f is affine and preserves both horizontal and vertical directions, and such that points 1, 2, 3, and 4 are mapped to 1', 2', 3', and 4'. Iterating f we get the backward invariant set $\Lambda_- := \bigcap_{k=0}^{\infty} f^{-k}(S)$, the forward invariant set $\Lambda_+ := \bigcap_{k=0}^{\infty} f^{k}(S)$, and the invariant set (horseshoe) $\Lambda_f := \Lambda_+ \cap \Lambda_-$. These sets have the Cantor set structure. More precisely, assuming that the contraction parameter of f in the vertical direction is $a \in (0, 1/2)$, and the expansion parameter in the horizontal direction is b > 2, then $\Lambda_+ = [0, 1] \times C^{(a)}$, where $C^{(a)}$ is the Cantor set, $\Lambda_- = C^{(1/b)} \times [0, 1]$, and $\Lambda_f = C^{(1/b)} \times C^{(a)}$, so that $\dim_B \Lambda_+ = \dim_H \Lambda_+ = 1 + (\log 2)/(\log(1/a))$ and

$$\dim_B \Lambda_f = \dim_H \Lambda_f = \frac{\log 2}{\log b} + \frac{\log 2}{\log(1/a)}$$

Figure 3 The Smale horseshoe.

This is a special case of a general result about horseshoes in $R^2$ (not necessarily affine), due to McCluskey and Manning (1983), stated in terms of the pressure function. An analogous result can be obtained for Smale solenoids. In $R^3$ it is possible to construct affine horseshoes $\Lambda_f$ such that $\dim_H \Lambda_f < \dim_B \Lambda_f$ (M Pollicott and H Weiss, 1994).

Smale discovered a connection between homoclinic orbits and the horseshoe map. It has been noticed that fractal dimensions have an important role in the study of homoclinic bifurcations of nonconservative dynamical systems. Since the 1970s the relationship between invariants of hyperbolic sets and the typical dynamics appearing in the unfolding of a homoclinic tangency by a parametrized family of surface diffeomorphisms has been studied by J Newhouse, J Palis, F Takens, J-C Yoccoz, CG Moreira and M Viana. The main result is that if the Hausdorff dimension of the hyperbolic set involved in the tangency is < 1 then the parameter set where the hyperbolicity prevails has full Lebesgue density. If the Hausdorff dimension is > 1, then hyperbolicity is not prevalent. This result and its proof were inspired by previous work of JM Marstrand (1954) about arithmetic differences of Cantor sets on the real line. According to the result by Moreira, Palis, and Viana (2001) the paradigm "hyperbolicity prevails if and only if the Hausdorff dimension is < 1" extends to homoclinic bifurcations in any dimension.

Using methods of thermodynamic formalism McCluskey and Manning (1983) proved that if f is the above horseshoe map, then there exists a $C^1$-neighborhood U of f such that the mapping $f \mapsto \dim_H \Lambda_f$ is continuous. Continuity of box and Hausdorff dimensions for horseshoes has also been studied by Takens, Palis, and Viana (1988).
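For the affine horseshoe of this section, with vertical contraction a in (0, 1/2) and horizontal expansion b > 2, the stated dimension formulas are simple enough to evaluate directly. The snippet below is a minimal sketch; the function name `horseshoe_dimensions` and the returned dictionary keys are illustrative choices, not notation from the cited works.

```python
import math


def horseshoe_dimensions(a, b):
    """Dimensions of the affine Smale horseshoe sets.

    a: contraction parameter in the vertical direction, 0 < a < 1/2.
    b: expansion parameter in the horizontal direction, b > 2.
    """
    if not (0.0 < a < 0.5):
        raise ValueError("a must lie in (0, 1/2)")
    if not b > 2.0:
        raise ValueError("b must be greater than 2")
    cantor_vertical = math.log(2.0) / math.log(1.0 / a)   # dimension of C^(a)
    cantor_horizontal = math.log(2.0) / math.log(b)       # dimension of C^(1/b)
    return {
        "forward_invariant": 1.0 + cantor_vertical,        # [0,1] x C^(a)
        "horseshoe": cantor_horizontal + cantor_vertical,  # C^(1/b) x C^(a)
    }
```

With a = 1/3 and b = 3 both Cantor factors are the classical middle-third set, so the horseshoe has dimension 2 log 2 / log 3.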


Lorenz Attractor 


EN Lorenz (1963), a meteorologist and student of G Birkhoff, showed by numerical experiments that for certain values of positive parameters $\sigma, r, b$, the quadratic system

$$\dot{x} = \sigma(y - x), \quad \dot{y} = rx - y - xz, \quad \dot{z} = xy - bz$$

has the global attractor A, for example, for $\sigma = 10$, $r = 28$, $b = 8/3$. In this case $\dim_B A \approx 2.06$, which is a numerical result (Grassberger and Procaccia, 1983). Using the analysis of local Lyapunov dimension along the flow in A, GA Leonov (2001) showed that if $\sigma + 1 \ge b \ge 2$ and $r\sigma^2(4 - b) + 2\sigma(b - 1)(2\sigma - 3b) > b(b - 1)^2$ then

$$\overline{\dim}_B A \le 3 - \frac{2(\sigma + b + 1)}{\sigma + 1 + \sqrt{(\sigma - 1)^2 + 4r\sigma}}$$

Hénon Attractor 


M Hénon (1976), a theoretical astronomer, discovered the map $f : R^2 \to R^2$, $f(x, y) := (a + by - x^2, x)$, capturing several essential properties of the Lorenz system. In the case of a = 1.4 and b = 0.3, Hunt (1996) derived from [2] that for any compact, strictly f-invariant set A in the trapping region $[-1.8, 1.8]^2$ there holds $\overline{\dim}_B A < 1.5$. Numerical experiments show that $\dim_B A \approx 1.28$ (Grassberger, 1983). Assuming $a > 0$, $b \in (0, 1)$, and $P_\pm(x_\pm, x_\pm) \in A$, where $P_\pm$ are fixed points of f, Leonov (2001) obtained that

$$\overline{\dim}_B A \le 1 + \frac{1}{1 - \ln b / \ln\left(\sqrt{x_-^2 + b} - x_-\right)}$$

Here

$$x_\pm := \frac{1}{2}\left[b - 1 \pm \sqrt{(b - 1)^2 + 4a}\right]$$

The proof is based on the study of local Lyapunov dimension of f and its iterates on A.
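The Hénon estimate can be evaluated numerically in the same spirit. The sketch below assumes the bound has the form $1 + 1/(1 - \ln b / \ln \alpha_1)$, where $\alpha_1 = \sqrt{x_-^2 + b} - x_-$ is the expanding eigenvalue of the Jacobian at the fixed point $P_-$; the function names are hypothetical and this reading of the formula is an assumption of the sketch.

```python
import math


def henon_fixed_points(a, b):
    """x-coordinates of the fixed points of (x, y) -> (a + b*y - x**2, x)."""
    root = math.sqrt((b - 1.0) ** 2 + 4.0 * a)
    return 0.5 * (b - 1.0 - root), 0.5 * (b - 1.0 + root)


def leonov_henon_bound(a, b):
    """Evaluate 1 + 1 / (1 - ln(b) / ln(alpha_1)) at the fixed point P_-."""
    if not (a > 0.0 and 0.0 < b < 1.0):
        raise ValueError("requires a > 0 and 0 < b < 1")
    x_minus, _ = henon_fixed_points(a, b)
    # Jacobian [[-2x, b], [1, 0]] has eigenvalues -x +/- sqrt(x**2 + b).
    alpha_1 = math.sqrt(x_minus ** 2 + b) - x_minus
    return 1.0 + 1.0 / (1.0 - math.log(b) / math.log(alpha_1))
```

For a = 1.4 and b = 0.3 this gives a value just below 1.5, in agreement with the bound derived by Hunt and above the numerical estimate 1.28.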


Embedology 


The physical relevance of box dimensions in the study of attractors is related to the problem of finding the smallest possible dimension n sufficient to "embed" an attractor into $R^n$. If $A \subset R^k$ is a compact set and if $n > 2\dim_B A$, then almost every map from $R^k$ into $R^n$, in the sense of prevalence, is one-to-one on A and, moreover, it is an embedding on smooth manifolds contained in A (T Sauer, JA Yorke, and M Casdagli, 1991). If A is a strange attractor then the same is true for almost every delay-coordinate map from $R^k$ to $R^n$. This improves an earlier result by H Whitney (1936) and F Takens (Takens' embedology, 1981). The above notion of prevalence means the following: a property holds almost everywhere in the sense of prevalence if it holds on a subset S of the space $V := C^1(R^k, R^n)$ for which there exists a finite-dimensional subspace $E \subset V$ (probe space) such that for each $v \in V$ we have that $v + e \in S$ for Lebesgue a.e. $e \in E$.


Julia and Mandelbrot Sets 


M Shishikura (1998) proved that the boundary of the Mandelbrot set M generated by $f_c(z) := z^2 + c$ has Hausdorff dimension equal to 2, thus answering positively the conjecture by B Mandelbrot, J Milnor, and other mathematicians. Also for Julia sets there holds $\dim_H J(f_c) = 2$ for generic c in $\partial M$ (i.e., on a set of second Baire category). The proof is based on the study of the bifurcation of parabolic periodic points. Also, each baby Mandelbrot set sitting inside of M has boundary of Hausdorff dimension 2 (L Tan, 1998). Shishikura's results hold for more general functions $f_c(z) := z^d + c$, where $d \ge 2$.

For Julia sets $J(f_c)$ generated by $f_c(z) := z^2 + c$ there holds $d(c) := \dim_H J(f_c) = 1 + |c|^2/(4 \log 2) + o(|c|^2)$ for $c \to 0$. This and more general results have been obtained by Ruelle (1982). He also proved that the function d(c) when restricted to the interval $[0, \infty)$ is real analytic in $[0, 1/4) \cup (1/4, \infty)$. Furthermore, it is left continuous at 1/4 (O Bodart and M Zinsmeister, 1996), but not continuous (A Douady, P Sentenac, and M Zinsmeister, 1997). Discontinuity of this map is related to the phenomenon of parabolic implosion at c = 1/4. The derivative $d'(c)$ tends to $+\infty$ from the left at c = 1/4 like $(1/4 - c)^{d(1/4) - 3/2}$ (G Havard and M Zinsmeister, 2000). Here $d(1/4) \approx 1.07$, which is a numerical result. Analysis of dimensions is based on methods of thermodynamic formalism.

C McMullen (1998) showed that if $\theta$ is an irrational number of bounded type (i.e., its continued fraction expansion $[a_1, a_2, \dots]$ is such that the sequence $(a_i)$ is bounded from above) and $f(z) := z^2 + e^{2\pi i\theta} z$, then the Julia set J(f) is porous. In particular, $\overline{\dim}_B J(f) < 2$. YC Yin (2000) showed that if all critical points in J(f) of a rational map $f : C \to C$ are nonrecurrent (a point is nonrecurrent if it is not contained in its $\omega$-limit set) then J(f) is porous, hence $\overline{\dim}_B J(f) < 2$. Urbanski and Przytycki (2001) described more general rational maps such that $\overline{\dim}_B J(f) < 2$.





Spiral Trajectories 


A standard planar model where the Hopf-Takens bifurcation occurs is $\dot{r} = r\left(r^{2l} + \sum_{i=0}^{l-1} a_i r^{2i}\right)$, $\dot{\varphi} = 1$, where $l \in N$. If $\Gamma$ is a spiral tending to the limit cycle r = a of multiplicity m (i.e., r = a is a zero of order m of the right-hand side of the first equation in the system) then $\dim_B \Gamma = 2 - 1/m$. Furthermore,
for m>1 the spiral is Minkowski measurable 
(Žubrinić and Županović, 2005). For m=1 the 
spiral is Minkowski nondegenerate with respect to 
the gauge function $h(\varepsilon) := \varepsilon(\log(1/\varepsilon))^{-1}$.


Infinite-Dimensional Dynamical Systems 


In many situations the dynamics of the global attractor A of the flow corresponding to an autonomous Navier-Stokes system is finite-dimensional (Ladyzhenskaya, 1972). This means that there exists a positive integer N such that any trajectory in A is completely determined by its orthogonal projection onto an N-dimensional subspace of a Hilbert space X. The aim is to find estimates of box and Hausdorff dimensions of the global attractor, in order to understand some of the basic and challenging problems of turbulence theory. If A is a subset of a Hilbert space X, its Hausdorff dimension is defined analogously as for $A \subset R^N$. The definition of the upper box dimension can be extended from $A \subset R^N$ to

$$\overline{\dim}_B A := \limsup_{\varepsilon \to 0} \frac{\log m(A, \varepsilon)}{\log(1/\varepsilon)},$$

where $m(A, \varepsilon)$ is the minimal number of balls of radius $\varepsilon$ sufficient to cover a given compact set $A \subset X$. The value of $\log m(A, \varepsilon)$ is called the $\varepsilon$-entropy of A.

Foias and Temam (1979), Ladyzhenskaya (1982), AV Babin and MI Vishik (1982), Ruelle (1983), and E Lieb (1984) were among the first to obtain explicit upper bounds of Hausdorff and box dimensions of attractors of infinite-dimensional systems. For global attractors A associated with some classes of two-dimensional Navier-Stokes equations with nonhomogeneous boundary conditions it can be shown that $\overline{\dim}_B A \le c_1 G + c_2 Re^{3/2}$, where G is the Grashof number, Re is the Reynolds number, and $c_i$ are positive constants (RM Brown, PA Perry, and Z Shen, 2000). VV Chepyzhov and AA Ilyin (2004) obtained that $\overline{\dim}_B A \le (1/\sqrt{2\pi})(\lambda_1|\Omega|)^{1/2} G$ for equations with homogeneous boundary conditions, where $\Omega \subset R^2$ is a bounded domain, and $\lambda_1$ is the first eigenvalue of $-\Delta$. In the case of periodic boundary conditions Constantin, Foias, and Temam (1988) proved that $\overline{\dim}_B A \le c_1 G^{2/3}(1 + \log G)^{1/3}$, while for a special class of external forces there holds $\dim_H A \ge c_2 G^{2/3}$ (VX Liu, 1993). Let us mention an open problem by VI Arnol'd: is it true that the Hausdorff dimension of any attracting set of the Navier-Stokes equation on the two-dimensional torus is growing with the Reynolds number?

In their study of partial regularity of solutions of three-dimensional Navier-Stokes equations, L Caffarelli, R Kohn, and L Nirenberg (1982) proved that the one-dimensional Hausdorff measure in space and time (defined by parabolic cylinders) of the singular set of any "suitable" weak solution is equal to zero. A weak solution is said to be singular at a point $(x_0, t_0)$ if it is essentially unbounded in any of its neighborhoods. Dimensions of attractors of many other classes of partial differential equations (PDEs) have been studied, such as reaction-diffusion systems, wave equations with dissipation, and complex Ginzburg-Landau equations. Related questions for nonautonomous PDEs have been considered by VV Chepyzhov and MI Vishik since 1992.


Probability 


Important examples of trajectories appearing in physics are provided by Brownian motions. Brownian motions $\omega$ in $R^N$, $N \ge 2$, have paths $\omega([0, 1])$ of Hausdorff dimension 2 with probability 1, and they are almost surely Hausdorff degenerate, since $H^2(\omega([0, 1])) = 0$ for a.e. $\omega$ (SJ Taylor, 1953). Defining gauge functions $h(\varepsilon) := \varepsilon^2 \log(1/\varepsilon) \cdot \log\log\log(1/\varepsilon)$ when N = 2, and $h(\varepsilon) := \varepsilon^2 \log\log(1/\varepsilon)$ when $N \ge 3$, there holds $H^h(\omega([0, 1])) \in (0, \infty)$ for a.e. $\omega$ (D Ray, 1963, SJ Taylor, 1964). If N = 1 then for a.e. $\omega$ the graph of $\omega|_{[0,1]}$ has box and Hausdorff dimensions equal to 3/2 (Taylor, 1953), and for the gauge function $h(\varepsilon) := \varepsilon^{3/2} \log\log(1/\varepsilon)$ the corresponding generalized Hausdorff measure is nondegenerate. In the case of $N \ge 2$ we have the uniform dimension doubling property (R Kaufman, 1969). This means that for a.e. Brownian motion $\omega$ there holds $\dim_H \omega(A) = 2\dim_H A$ for all subsets $A \subset [0, \infty)$. There are also results concerning the almost sure Hausdorff dimension of double, triple, and multiple points of a Brownian motion and of more general Lévy stable processes.

Fractal dimensions also appear in the study of stochastic differential equations, like

$$dx_t = X_0(x_t)\,dt + \sum_{k=1}^{d} X_k(x_t)\,d\theta_k(t), \quad x_0 = x \in R^N$$

The stochastic flow $(x_t)_{t \ge 0}$ in $R^N$ is driven by a Brownian motion $(\theta(t))_{t \ge 0}$ in $R^d$. Let us assume that $X_k$, $k = 0, \dots, d$, are $C^\infty$-smooth T-periodic divergence-free vector fields on $R^N$. Then for almost every realization of the Brownian motion $(\theta(t))_{t \ge 0}$, the set of initial points x generating the flow $(x_t)_{t \ge 0}$ with linear escape to infinity (i.e., $\lim_{t \to \infty}(|x_t|/t) > 0$) is dense and of full Hausdorff dimension N (D Dolgopyat, V Kaloshin, and L Koralov, 2002).


Other Directions 


There are many other fractal dimensions important 
for dynamics, like the Rényi spectrum for dimen- 
sions, correlation dimension, information dimen- 
sion, Hentschel—Procaccia spectrum for dimensions, 
packing dimension, and effective fractal dimension. 
Relations between dimension, entropy, Lyapunov 
exponents, Gibbs measures, and multifractal rigidity 
have been investigated by Pesin, Weiss, Barreira, 
Schmeling, etc. Fractal dimensions are used to study 
dynamics appearing in Kleinian groups (D Sullivan, 
CJ Bishop, P W Jones, C McMullen, B O Stratmann, 
etc.), quasiconformal mappings and quasiconfor- 
mal groups (FW Gehring, J Vaisala, K Astala, 
CJ Bishop, P Tukia, JW Anderson, P Bonfert-Taylor, 
EC Taylor, etc.), graph directed Markov systems 
(RD Mauldin, M Urbanski, etc.), random walks on 
fractal graphs (J Kigami, A Telcs, etc.), billiards 
(H Masur, Y Cheung, P Balint, S Tabachnikov, 
N Chernov, D Szász, IP Toth, etc.), quantum 
dynamics (J-M Barbaroux, J-M Combes, H 
Schulz-Baldes, I Guarneri, etc.), quantum gravity 
(M Aizenman, A Aharony, ME Cates, TA Witten, 
GF Lawler, B Duplantier, etc.), harmonic analysis 
(RS Strichartz, ZM Balogh, JT Tyson, etc.), 
number theory (L Barreira, M Pollicott, H Weiss,
B Stratmann, B Saussol, etc.), Markov processes 
(RM Blumenthal, R Getoor, SJ Taylor, S Jaffard, 
C Tricot, Y Peres, Y Xiao, etc.), and theoretical 
computer science (B Ya Ryabko, L Staiger, 
JH Lutz, E Mayordomo, etc.), and so on. 


See also: Bifurcations of Periodic Orbits; Chaos and 
Attractors; Dissipative Dynamical Systems of Infinite 
Dimension; Dynamical Systems in Mathematical Physics: 
An Illustration from Water Waves; Ergodic Theory; 
Generic Properties of Dynamical Systems; Holomorphic 
Dynamics; Homoclinic Phenomena; Hyperbolic 
Dynamical Systems; Image Processing: Mathematics; 
Lyapunov Exponents and Strange Attractors; Partial 
Differential Equations: Some Examples; Polygonal 
Billiards; Quantum Ergodicity and Mixing of 
Eigenfunctions; Stochastic Differential Equations; 
Synchronization of Chaos; Universality and 
Renormalization; Wavelets: Applications; Wavelets: 
Mathematical Theory. 
Introduction


Interacting particles sometimes collectively behave in ways that take us by complete surprise. In a superfluid $^4$He atoms flow without viscosity, and in a superconductor electrons flow without resistance. Such behaviors announce emergent structures and principles which have often found applications in other areas. This article concerns the surprising collective effects that occur when electrons are confined in two dimensions and subjected to a strong transverse magnetic field. At low temperatures, the Hall resistance (defined below) exhibits plateaus on which it is precisely quantized at

$$R_H = \frac{h}{f e^2} \qquad [1]$$

where h and e are fundamental constants and f is a plateau-specific rational fraction. This phenomenon is known as the "fractional quantum Hall effect" (FQHE), or, after its discoverers, the "Tsui-Stormer-Gossard" (TSG) effect. The underlying state provides a new paradigm for collective behavior in nature, and is understood in terms of a new class of quasiparticles known as "composite fermions," which are topological bound states of electrons and quantized vortices. This article will outline the basics of the experimental phenomenology and our theoretical understanding of this effect.


The Hall Effect 


Ohm's law, I = V/R, tells us that the current through a resistor is proportional to the applied voltage. The local form of the law is

$$J = \sigma E \qquad [2]$$

where $\sigma$ is the conductivity, and $J = q\rho v$ is the current density for particles of charge q and density $\rho$ moving with a velocity v.

In 1879, EH Hall discovered that in the presence of crossed electric and magnetic fields (E and B), the current flows in a direction "perpendicular" to the plane containing the two fields. Alternatively, the passage of current induces a voltage perpendicular to the direction of the current flow. This is known as the Hall effect (see Figure 1). The phenomenon has a classical origin. A consequence of the Lorentz force law of electrodynamics,

$$F = q\left(E + \frac{1}{c}\, v \times B\right) \qquad [3]$$

which gives the force on a particle of charge q moving with a velocity v, is that for crossed electric and magnetic fields the particle drifts in the direction $E \times B$ with a velocity v = cE/B. The current density is therefore given by $J = q\rho v$, where $\rho$ is the (three-dimensional) density of particles. That produces the Hall resistivity

$$\rho_H = \frac{E_y}{J_x} = \frac{B}{\rho q c} \qquad [4]$$


The von Klitzing Effect 


Molecular beam epitaxy allows controllable layer by layer growth in which one type of semiconductor, say GaAs, can be grown on top of another, say Al$_x$Ga$_{1-x}$As, to produce an atomically sharp interface. By appropriately doping such structures, electrons can be captured at the interface, thus producing a two-dimensional electron system (2DES). We note that these are three-dimensional electrons confined to move in two dimensions. The interaction has the standard Coulomb form $V(r) = e^2/\epsilon r$, where $\epsilon$ is the dielectric constant of the host material. (In a hypothetical world which has only two space dimensions, the interaction would be logarithmic.)

Figure 1 Schematics of magnetotransport measurement. $I$, $V_L$, and $V_H$ are the current, longitudinal voltage, and the Hall voltage, respectively. The longitudinal and Hall resistances are defined as $R_L = V_L/I$ and $R_H = V_H/I$.

The "integral quantum Hall effect" (IQHE) or the "von Klitzing effect" was discovered unexpectedly by von Klitzing and collaborators in 1980, in their study of the Hall effect in a 2DES. In two dimensions, one defines the Hall resistance as

$$R_H = \frac{V_H}{I} \qquad [5]$$

which, from classical electrodynamics, is expected to be proportional to the magnetic field B. That is indeed the case at small magnetic fields. At sufficiently high B, however, quantum mechanical effects appear in a dramatic manner. The essential observations are as follows.

1. When plotted as a function of the magnetic field B, the Hall resistance exhibits numerous plateaus. On any given plateau, $R_H$ is precisely quantized with values given by

$$R_H = \frac{h}{n e^2} \qquad [6]$$

where n is an integer (hence the name "integral quantum Hall effect"). The plateau occurs in the vicinity of $\nu = \rho h c/eB = n$, where $\nu$ is the "filling factor" (defined below).

2. In the plateau region, the longitudinal resistance exhibits an Arrhenius behavior:

$$R_L \sim \exp\left(-\frac{\Delta}{2 k_B T}\right) \qquad [7]$$

This gives a filling-factor dependent energy scale $\Delta$, which indicates the presence of a gap in the excitation spectrum. $R_L$ vanishes in the limit $T \to 0$.


The absolute accuracy of the quantization has been established to a few parts in $10^8$ for $1\sigma$ uncertainty, and the relative accuracy to a few parts in $10^{10}$. There is presently no known "intrinsic" correction to the quantization. Perhaps the most remarkable aspect of the effect is its universality. It is independent of the sample type, geometry, various material parameters (the band mass of the electron or the dielectric constant of the semiconductor), and disorder. The combination $h/e^2$ also occurs in the definition of the fine structure constant $\alpha = e^2/\hbar c$, the value of which is approximately 1/137. The Hall effect measurements in dirty, solid state systems thus provide one of the most accurate values for $\alpha$. Finally, the lack of resistance at T = 0 is to be contrasted with ordinary metals, for which the resistance at $T \to 0$, called the residual resistance, is finite and proportional to disorder.


The TSG Effect 


The next revolution occurred in 1982 with the discovery of the TSG effect, that is, plateaus on which the Hall resistance is quantized at values given by eqn [1] (see Figure 2). The observation of the $R_H = h/fe^2$ plateau is often referred to as the observation of the fraction f. Improvement of experimental conditions has led to the observation of a large number of fractions over the years, revealing the richness of the TSG effect. At the time of the writing of this article, the number of observed fractions is more than 50 if one counts only fractions below unity. As in the von Klitzing effect, the longitudinal resistance exhibits an Arrhenius behavior, vanishing in the limit $T \to 0$.




Landau Levels 


The Hamiltonian for a nonrelativistic electron moving in two space dimensions in a perpendicular magnetic field is given by

$$H = \frac{1}{2m_b}\left(p + \frac{eA}{c}\right)^2 \qquad [8]$$

Here, $m_b$ is the electron's band mass and $-e$ its charge. For a uniform magnetic field, the vector potential A satisfies

$$\nabla \times A = B\hat{z} \qquad [9]$$

Because A is a linear function of the spatial coordinates, it follows that H is a generalized two-dimensional harmonic oscillator Hamiltonian which is quadratic in both the spatial coordinates and in the canonical momentum $p = -i\hbar\nabla$, and therefore can be diagonalized exactly.

A convenient gauge choice is the symmetric gauge:

$$A = \frac{B \times r}{2} = \frac{B}{2}(-y, x, 0) \qquad [10]$$


Figure 2 The TSG effect. The Hall resistance ($R_H$) exhibits many precisely quantized plateaus, concurrent with minima in the longitudinal resistance ($R_L$). Reproduced with permission from Perspectives in Quantum Hall Effects; HL Stormer and DC Tsui; SD Sarma and A Pinczuk (eds.); Copyright © 1997, Wiley. Reprinted with permission of John Wiley & Sons, Inc.

With the magnetic length $\ell = \sqrt{\hbar c/eB}$ and the cyclotron energy $\hbar\omega_c = \hbar eB/m_b c$ chosen as the units for length and energy, the Hamiltonian can be expressed as





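$$H = \frac{1}{2}\left[\left(-i\frac{\partial}{\partial x} - \frac{y}{2}\right)^2 + \left(-i\frac{\partial}{\partial y} + \frac{x}{2}\right)^2\right] \qquad [11]$$

Choosing as independent variables

$$z = x - iy, \quad \bar{z} = x + iy \qquad [12]$$

we get

$$H = \frac{1}{2}\left[-4\frac{\partial^2}{\partial z\,\partial\bar{z}} + \frac{1}{4} z\bar{z} - z\frac{\partial}{\partial z} + \bar{z}\frac{\partial}{\partial\bar{z}}\right] \qquad [13]$$

Now define the following sets of ladder operators:

$$a^\dagger = \frac{1}{\sqrt{2}}\left(\frac{\bar{z}}{2} - 2\frac{\partial}{\partial z}\right) \qquad [14]$$

$$a = \frac{1}{\sqrt{2}}\left(\frac{z}{2} + 2\frac{\partial}{\partial\bar{z}}\right) \qquad [15]$$

$$b^\dagger = \frac{1}{\sqrt{2}}\left(\frac{z}{2} - 2\frac{\partial}{\partial\bar{z}}\right) \qquad [16]$$

$$b = \frac{1}{\sqrt{2}}\left(\frac{\bar{z}}{2} + 2\frac{\partial}{\partial z}\right) \qquad [17]$$

which have the property that

$$[a, a^\dagger] = 1, \quad [b, b^\dagger] = 1 \qquad [18]$$

and all the other commutators are zero. In terms of these operators, the Hamiltonian can be written as

$$H = a^\dagger a + \frac{1}{2} \qquad [19]$$

The eigenvalue of $a^\dagger a$ is an integer, n, called the Landau level (LL) index. The z-component of the canonical angular momentum operator, the only relevant component for the two-dimensional problem, is defined as

$$L_z = -i\frac{\partial}{\partial\theta} = \bar{z}\frac{\partial}{\partial\bar{z}} - z\frac{\partial}{\partial z} = a^\dagger a - b^\dagger b \qquad [20]$$

Exploiting the property $[H, L_z] = 0$, the eigenfunctions will be chosen to diagonalize H and $L_z$ simultaneously. The eigenvalue of $L_z$ will be denoted by $-m$. The analogy to the harmonic oscillator problem immediately gives the solution

$$H|m, n\rangle = E_n|m, n\rangle \qquad [21]$$

where

$$E_n = \left(n + \frac{1}{2}\right) \qquad [22]$$

and

$$|m, n\rangle = \frac{(b^\dagger)^{m+n}}{\sqrt{(m+n)!}}\,\frac{(a^\dagger)^n}{\sqrt{n!}}\,|0, 0\rangle \qquad [23]$$

where $m = -n, -n + 1, \dots$. The single-particle orbital at the bottom of the two ladders defined by the two sets of raising and lowering operators is

$$\langle r|0, 0\rangle \equiv \eta_{0,0}(r) = \frac{1}{\sqrt{2\pi}}\, e^{-z\bar{z}/4} \qquad [24]$$

which satisfies

$$a|0, 0\rangle = b|0, 0\rangle = 0 \qquad [25]$$

The single-particle states are particularly simple in the lowest Landau level (n = 0):

$$\eta_{0,m}(r) = \langle r|m, 0\rangle = \frac{(b^\dagger)^m}{\sqrt{m!}}\,\eta_{0,0}(r) = \frac{z^m e^{-z\bar{z}/4}}{\sqrt{2\pi\, 2^m m!}} \qquad [26]$$

Aside from the ubiquitous Gaussian factor, a general state in the lowest Landau level is given by a polynomial of z; it does not involve any $\bar{z}$. In other words, apart from the Gaussian factor, the lowest Landau level wave functions are analytic functions of z.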


Landau Level Degeneracy 


The state $\eta_{0,m}(r)$ is peaked strongly at $r = \sqrt{2m}\,\ell$. Neglecting order-1 effects, there are m states in the lowest Landau level in a disk of radius $r = \sqrt{2m}\,\ell$, giving a degeneracy of $(2\pi\ell^2)^{-1}$ per unit area per Landau level. (The same degeneracy is obtained for higher Landau levels as well.) It is equal to $B/\phi_0$, where $\phi_0 = hc/e$ is called the flux quantum, that is, there is one state per flux quantum in each Landau level.


Filling Factor


The number of filled Landau levels, called the filling factor, is given by

$$\nu = \rho\, 2\pi\ell^2 = \frac{\rho\phi_0}{B} \qquad [27]$$
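The degeneracy and filling-factor relations can be checked numerically. The sketch below is only an illustration and makes one loud assumption: it works in SI units, where the flux quantum is h/e and the magnetic length is $\sqrt{\hbar/eB}$, whereas the article uses Gaussian units ($\phi_0 = hc/e$). The function names are hypothetical.

```python
import math

# Exact SI values of the defining constants (SI units are an assumption of
# this sketch; the article itself works in Gaussian units).
PLANCK_H = 6.62607015e-34            # J s
ELEMENTARY_CHARGE = 1.602176634e-19  # C
HBAR = PLANCK_H / (2.0 * math.pi)


def flux_quantum():
    """Flux quantum h/e in webers."""
    return PLANCK_H / ELEMENTARY_CHARGE


def magnetic_length(b_field):
    """Magnetic length sqrt(hbar / (e B)) in metres, B in tesla."""
    return math.sqrt(HBAR / (ELEMENTARY_CHARGE * b_field))


def landau_degeneracy_per_area(b_field):
    """States per unit area per Landau level, B / phi_0."""
    return b_field / flux_quantum()


def filling_factor(density, b_field):
    """nu = rho * phi_0 / B for a 2D density rho in m^-2."""
    return density * flux_quantum() / b_field
```

For a given density, scanning `b_field` and locating the fields where `filling_factor` passes through integers or simple fractions indicates where plateaus would be expected to be centered.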


The Origin of Plateaus 


The von Klitzing effect can be explained in terms of a model which neglects the interactions between electrons. It occurs because the ground state at an integral filling is unique and nondegenerate, separated from excitations by a gap. Laughlin (1981) showed that the disorder-induced Anderson localization also plays a crucial role in the establishment of the Hall plateaus. To see this, imagine changing the filling away from an integer by adding some electrons or holes. In a perfect system, the additional particles would also be free to carry current, but in the actual, disordered sample, they are immobilized by impurities (which create localized states in the energy gap), and do not contribute to transport. The transport properties therefore remain unaffected as the filling factor is varied slightly away from an integer, and the system continues to behave as though it had filled shells.


The Lowest Landau Level Problem 


The TSG effect arises due to interelectron interaction. We wish to obtain solutions for the Schrödinger equation

$$H\Psi = E\Psi \qquad [28]$$

at an arbitrary filling $\nu$, where

$$H = \sum_j \frac{1}{2m_b}\left[\frac{\hbar}{i}\nabla_j + \frac{e}{c}A(r_j)\right]^2 + \frac{e^2}{\epsilon}\sum_{j<k}\frac{1}{|r_j - r_k|} \qquad [29]$$

The first term on the right-hand side is the kinetic energy in the presence of a constant external magnetic field $B = \nabla \times A$, and the second term is the Coulomb interaction energy. (The Zeeman energy is not included explicitly because we consider, for now, magnetic fields that are sufficiently high that only fully spin-polarized states are relevant.) It is convenient to consider the limit $(e^2/\epsilon\ell)/(\hbar\omega_c) \to 0$, when the Coulomb interaction is so weak that it is not able to cause Landau level mixing, so electrons can be taken to be within the lowest Landau level. The kinetic energy then is an irrelevant constant which can be thrown away, and the Hamiltonian reduces to

$$H = P_{LLL}\,\frac{e^2}{\epsilon}\sum_{j<k}\frac{1}{|r_j - r_k|}\,P_{LLL} \qquad [30]$$

"which must be solved with the lowest LL restriction," as explicitly indicated by the lowest LL projection operator $P_{LLL}$. The problem is thus mathematically well defined, but requires degenerate perturbation theory in an enormously large Hilbert space, with $\binom{N/\nu}{N}$ many-particle basis vectors. The usual perturbative techniques are not useful due to the absence of a small parameter in the problem; $e^2/\epsilon\ell$ merely sets the energy scale in the lowest Landau level.


Composite-Fermion Theory 


Inspired by the qualitative similarity between the integral and the fractional Hall effects, the composite-fermion (CF) theory (Jain 1989) postulates that the eigenfunctions of interacting electrons at filling factor $\nu$, $\Psi_\nu$, are related to the (known) eigenfunctions of noninteracting electrons at filling factor $\nu^*$, $\Phi_{\nu^*}$, according to

$$\Psi_\nu = P_{LLL}\,\Phi_{\nu^*}\prod_{j<k}(z_j - z_k)^{2p} \qquad [31]$$

where $P_{LLL}$ denotes projection of the wave function on its right into the lowest Landau level. The filling factors are related by

$$\nu = \frac{\nu^*}{2p\nu^* + 1} \qquad [32]$$

which can be seen as follows: the largest power of $z_1$ in $\Phi_{\nu^*}$ (neglecting order-one corrections) is $N/\nu^*$, as follows from the definition of the filling factor. The Jastrow factor contributes a further power $2p(N - 1)$, so the largest power of $z_1$ on the right-hand side is $2p(N - 1) + N/\nu^*$. This is the number of flux quanta penetrating the "sample." Dividing it by N and taking the limit $N \to \infty$ gives the inverse of the filling factor, $\nu^{-1} = 2p + 1/\nu^*$. These wave functions are now known to capture the correct nonperturbative physics of the TSG effect (see below), and also to provide extremely accurate representations for the actual correlated ground states and their excitations. They recover Laughlin's 1983 wave function for the ground state at $\nu = 1/(2p + 1)$, while clarifying that it is a part of a much bigger conceptual structure.
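The mapping between the electron filling factor and the composite-fermion filling factor in eqn [32] can be tabulated with exact rational arithmetic. The following is a small illustrative sketch; the function names `cf_filling` and `jain_sequence` are hypothetical, and negative values of $\nu^*$ are simply substituted into the same formula.

```python
from fractions import Fraction


def cf_filling(nu_star, p):
    """Electron filling nu = nu* / (2 p nu* + 1) for CF filling nu*."""
    nu_star = Fraction(nu_star)
    denominator = 2 * p * nu_star + 1
    if denominator == 0:
        raise ZeroDivisionError("nu* = -1/(2p) has no finite electron filling")
    return nu_star / denominator


def jain_sequence(p, n_max):
    """Fractions obtained from integer CF fillings nu* = 1, ..., n_max."""
    return [cf_filling(n, p) for n in range(1, n_max + 1)]


if __name__ == "__main__":
    print([str(f) for f in jain_sequence(1, 4)])
    print([str(cf_filling(-n, 1)) for n in range(2, 5)])
```

Integer values of $\nu^*$ with p = 1 give 1/3, 2/5, 3/7, ..., and negative integers give 2/3, 3/5, 4/7, ...; the case $\nu^* = 1$ reproduces the Laughlin fractions 1/(2p + 1).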


Physical Interpretation 


The crucial property of the wave function in eqn [31]
is that the complex Jastrow factor $\prod_{j<k} (z_j - z_k)^{2p}$
binds $2p$ vortices on each electron. More precisely,
each electron sees $2p$ vortices on every other electron,
in that a complete loop of an electron around any other
electron produces a phase of $2\pi \times 2p$. The bound state
is interpreted as a particle, called the “composite
fermion.” Because the vortex is a topological object, so
is the composite fermion. The vorticity $2p$ is quantized
to be an even integer, as required by the single-valuedness
and antisymmetry requirements of quantum
mechanics, which will be seen to lie at the root of
the exact quantization of the Hall resistance. 

When composite fermions move about, they experience,
in addition to the Aharonov-Bohm (AB) phase,
the Berry phases coming from vortices on other
composite fermions. Imagine taking a composite
fermion in a closed loop enclosing an area $A$. The
phase associated with that loop is given by


  $\Phi^* = -2\pi \left( \frac{BA}{\phi_0} - 2p N_{\rm enc} \right)$  [33]


where $N_{\rm enc}$ is the number of composite fermions
inside the loop. The first term is the familiar AB
phase due to a charge going around in a loop. The
second is the Berry phase due to $2p$ vortices going
around $N_{\rm enc}$ particles, with each particle producing
a phase of $2\pi$. Replacing $N_{\rm enc}$ by its average value
$\rho A$ shows that, on average, $\Phi^*$ is equal to the AB
phase from an “effective” magnetic field


  $B^* = B - 2p\rho\phi_0$  [34]


The composite fermions thus experience an effective
magnetic field $B^*$ which is much smaller than the
external, applied field $B$. That lies at the heart of the
phenomenology of this lowest Landau level liquid.
One treats composite fermions as noninteracting
in the simplest approximation. They form their own
Landau-like levels in $B^*$. Their filling factor is
defined as $\nu^* = \rho\phi_0/B^*$, with which eqn [34]
becomes equivalent to eqn [32]. The effective field
$B^*$ can be antiparallel to $B$, in which case
$\nu^* = \rho\phi_0/B^*$ is formally negative. For negative values
of $\nu^*$, $\Phi_{\nu^*}$ in eqn [31] is defined as $\Phi_{\nu^*} = [\Phi_{|\nu^*|}]^*$,
because complex conjugation is equivalent to
switching the direction of the magnetic field. 
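
The equivalence of eqns [34] and [32], including the case of negative $\nu^*$, can be verified with a short sketch. It is an illustration rather than part of the original article: the function names are hypothetical, and the units $\rho = 1$, $\phi_0 = 1$ are an assumption made so that $B = 1/\nu$.

```python
from fractions import Fraction

def effective_field(B, rho, p, phi0=1):
    """Eqn [34]: B* = B - 2 p rho phi0."""
    return B - 2 * p * rho * phi0

def cf_filling(nu, p):
    """Return nu* for electron filling nu, or None when B* = 0."""
    nu = Fraction(nu)
    rho, phi0 = Fraction(1), 1            # assumed units: unit density and flux quantum
    B = rho * phi0 / nu                   # definition of the filling factor
    B_star = effective_field(B, rho, p, phi0)
    if B_star == 0:
        return None                       # nu* is undefined at vanishing B*
    nu_star = rho * phi0 / B_star         # negative when B* is antiparallel to B
    assert nu == nu_star / (2 * p * nu_star + 1)   # eqn [32]
    return nu_star

print(cf_filling(Fraction(2, 5), 1))   # B* parallel to B
print(cf_filling(Fraction(2, 3), 1))   # B* antiparallel to B
print(cf_filling(Fraction(1, 2), 1))   # B* = 0
```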


Fermion Chern-Simons Theory 


Lopez and Fradkin (1991) developed a field-theoretic
formulation of composite fermions through a singular
gauge transformation defined by


  $\Psi = \prod_{j<k} \left( \frac{z_j - z_k}{|z_j - z_k|} \right)^{2p} \Psi'$  [35]

under which the eigenvalue problem of eqn [29]
transforms into


  $H' \Psi' = E \Psi'$  [36]


  $H' = \frac{1}{2m_b} \sum_i \left( \mathbf{p}_i + \frac{e}{c} \mathbf{A}(\mathbf{r}_i) - \frac{e}{c} \mathbf{a}(\mathbf{r}_i) \right)^2 + V$  [37]


  $\mathbf{a}(\mathbf{r}_i) = \frac{2p}{2\pi}\, \phi_0 \sum_j{}' \nabla_i \theta_{ij}$  [38]


where


  $\theta_{jk} = i \ln \frac{z_j - z_k}{|z_j - z_k|}$


is the relative angle between the particles $j$ and $k$. The
magnetic field corresponding to $\mathbf{a}(\mathbf{r}_i)$ is given by


  $b_i = \nabla_i \times \mathbf{a}(\mathbf{r}_i) = 2p\phi_0 \sum_l{}' \delta^2(\mathbf{r}_i - \mathbf{r}_l)$  [39]

The above transformation thus amounts to attaching
a point flux of strength $-2p\phi_0$ to each electron,
which is how the composite fermion is modeled
in this approach. (A flux quantum is topologically
equivalent to a vortex.) This definition is reminiscent
of the treatments of particles obeying fractional
statistics (“anyons”) introduced by Leinaas and
Myrheim (1977) and Wilczek (1982); an anyon is
modeled as an electron bound to a point flux of
magnitude $\alpha\phi_0$, where $\alpha$ determines the winding
statistics. 

It is not possible to proceed further without making
approximations. The usual approach is to make a
“mean-field” approximation, which amounts to
spreading the point flux on each electron into a
uniform magnetic field. Formally, one writes


  $\mathbf{A} - \mathbf{a} = \mathbf{A}^* + \delta\mathbf{A}$  [40]
  $\nabla \times \mathbf{A}^* = B^* \hat{\mathbf{z}}$  [41]


The transformed Hamiltonian is written as


  $H' = \frac{1}{2m_b} \sum_i \left( \mathbf{p}_i + \frac{e}{c} \mathbf{A}^*(\mathbf{r}_i) \right)^2 + V + V' = H'_0 + V + V'$  [42]


$V$ is the Coulomb interaction and $V'$ denotes the
terms containing $\delta\mathbf{A}$. The solution to $H'_0$ is trivial,
describing free fermions in an effective magnetic
field $B^*$. We have thus decomposed the Hamiltonian
into a part $H'_0$, which can be solved exactly, and the
rest, $V + V'$, which is to be treated perturbatively. 

Lopez and Fradkin recast the problem in the
language of functional integrals, which is suitable
for studying corrections to the mean-field theory.
One writes the zero-temperature quantum partition
function


  $Z = \int \mathcal{D}\psi\, \mathcal{D}\psi^*\, \mathcal{D}a\, \exp(iS)$  [43]


  $S = \int d^2r \int dt\, \mathcal{L}$  [44]


1 
L= w (10; = ag)w =P 2m, | (—ibv +A = ajul 
1 2 / / 
taa a+ ja rplr)V(r-=r )plr) [45] 


where $\psi$ and $\psi^*$ are anticommuting Grassmann
variables. The flux attachment is introduced
through a Lagrange multiplier $a_0$; because $a_0$ enters
linearly in the action, it can be integrated out to
produce a delta function that imposes the
constraint


  $\nabla \times \mathbf{a}(\mathbf{r}) = 2p\phi_0 \rho(\mathbf{r}) = 2p\phi_0 \psi^*(\mathbf{r}) \psi(\mathbf{r})$  [46]

This formalism is closely related to the topological
Chern-Simons (CS) field theory. Recall that the CS
Lagrangian has the form


  $\mathcal{L}_{\rm CS} \sim \epsilon^{\mu\nu\lambda} A_\mu F_{\nu\lambda} = 2 \epsilon^{\mu\nu\lambda} A_\mu \partial_\nu A_\lambda$  [47]


where $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$, and $\epsilon^{\mu\nu\lambda}$ is the
antisymmetric Levi-Civita tensor, with $\epsilon^{012} = 1$. The index
$\mu$ takes values $\mu = 0, 1, 2$, the first being the time
component and the remaining space components.
The CS action is invariant, up to surface terms,
under a gauge transformation, because the change in
$\mathcal{L}_{\rm CS}$ under a functional variation $\delta A_\mu = \partial_\mu \Lambda$ is a total
derivative. 

Zhang et al. (1989) noted that the term proportional
to $a_0 \nabla \times \mathbf{a}$ in eqn [45], which enforces flux
attachment, is precisely equal to the CS Lagrangian
in the Coulomb gauge. Write


  $\mathcal{L}_{\rm CS} = \frac{1}{4p\phi_0} \epsilon^{\mu\nu\lambda} a_\mu \partial_\nu a_\lambda
    = \frac{1}{2p\phi_0} a_0 \epsilon^{ij} \partial_i a_j - \frac{1}{4p\phi_0} \epsilon^{ij} a_i \partial_0 a_j$  [48]

where $i, j$ represent the spatial components
($i, j = 1, 2$), and the time components have been
displayed explicitly in the second step ($\partial_0 = \partial_t$). The
first term on the right-hand side of eqn [48] is
identical to the third term on the right-hand side of
eqn [45]. In the Fourier space the last term is
proportional to 


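  $\epsilon^{ij} a_i(\mathbf{q}, \omega)\, (-i\omega)\, a_j(-\mathbf{q}, -\omega)$  [49]


By choosing the x-axis along $\mathbf{q}$, the Coulomb gauge
condition $\mathbf{q} \cdot \mathbf{a} = 0$ implies $a_1(\mathbf{q}, \omega) = 0$, guaranteeing
that the last term in eqn [48] is identically zero. 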

The constraint of eqn [46] is used to eliminate
the two factors of density in the last term of
eqn [45]. The action is then quadratic in the
fermion field, which can be integrated out. Various
response functions can be expressed as correlation
functions of the vector potential field, and their
averages over the CS field configurations are
evaluated perturbatively by standard diagrammatic
methods. 

The fermion CS theory is believed to capture the
topological properties of composite fermions, but
has not lent itself, because of the lack of a small
parameter, to quantitative calculations. It is not
known what classes of Feynman diagrams will need
to be summed to eliminate the electron mass $m_b$
(which is not a parameter of the lowest Landau
level problem; see eqn [30]) in the fermion CS
approach. Halperin et al. (1993) proceeded by
replacing $m_b$ by an adjustable parameter $m^*$,
interpreted as the composite-fermion mass. Murthy
and Shankar (1997) proposed to separate out
the inter- and intra-Landau level degrees of
freedom by making a sequence of further
transformations. 


Consequences 


Fractional Quantum Hall Effect 


The CF theory provides a simple understanding of
why gaps open up at “fractional” fillings, which
happens at those fillings $\nu = f$ for which composite
fermions fill integral numbers of CF Landau levels.
That results in Hall plateaus at $R_H = h/fe^2$ in the
presence of disorder. The fractional QHE is thus
understood as the integral QHE for composite
fermions. 


Sequences of Fractions 


The integral fillings of composite fermions correspond
to fractional fillings of electrons given by


  $\nu = \frac{n}{2pn \pm 1}$  [50]

which are precisely the observed fractions. Some of
these are: for $p = 1$, 1/3, 2/5, 3/7, ... and 2/3, 3/5, 4/7, ...;
for $p = 2$, 1/5, 2/9, 3/13, ... and 2/7, 3/11, 4/15, ....
Particle-hole symmetry in the lowest Landau level
also implies fractions $1 - f$. The fractions appear
in the form of sequences because they are all
derived from the sequence of integers. The Hall
quantization is exact because the right-hand side
of eqn [50] is made up of whole numbers and
therefore is not susceptible to small perturbations
in the Hamiltonian. The CF theory unifies the
FQHEs and IQHEs. 
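
The sequences generated by eqn [50] are easy to enumerate. The sketch below is illustrative only (the function name is hypothetical); it lists the first members of each sequence and their particle-hole conjugates $1 - f$, and checks that every denominator is odd.

```python
from fractions import Fraction

def cf_sequence(p, n_max, sign):
    """Fractions n / (2 p n + sign) of eqn [50] for n = 1..n_max, sign = +1 or -1."""
    return [Fraction(n, 2 * p * n + sign) for n in range(1, n_max + 1)]

for p in (1, 2):
    for sign in (+1, -1):
        seq = cf_sequence(p, 4, sign)
        conj = [1 - f for f in seq]           # particle-hole conjugates
        print(p, sign, [str(f) for f in seq], [str(f) for f in conj])

# n and 2pn +/- 1 are coprime, so every denominator is odd.
assert all(f.denominator % 2 == 1
           for p in (1, 2) for s in (1, -1) for f in cf_sequence(p, 50, s))
```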


Fermi Sea at Half Filling 


Equation [50] is consistent with the fact that only 
odd-denominator fractions have been observed in 
the lowest Landau level (i.e., with f < 1). Halperin 
et al. (1993) and Kalmeyer and Zhang (1992) 
proposed that at the simplest even-denominator 
fraction, namely v= 1/2, composite fermions form 
a Fermi sea. This was motivated by the fact that the 
effective magnetic field is B*=0 at v=1/2. A 
number of experiments have directly measured the 
Fermi sea of composite fermions. The TSG effect 
with f=1/2 is absent because the Fermi sea has 
gapless excitations. 


Effective Magnetic Field 


For small values of B* (i.e. in the vicinity of 
v= 1/2), the cyclotron radius of composite fermions 
can be very large compared to the radius of the 
cyclotron orbit of a classical electron in B. Direct 
measurements of the cyclotron orbit in several 
geometric experiments have confirmed that the 
charge carriers experience a magnetic field B* rather 
than B. 


Fractional Charge 


Laughlin (1983) showed that the presence of a gap
at a fractional filling implies the existence of
fractionally charged excitations. He obtained an
excitation through the adiabatic insertion of a point
flux quantum at, say, the origin, which can be
gauged away at the end, leaving behind an exact
excited state. Faraday’s law implies that the
azimuthal component of the induced electric field is
$E_\phi = -(2\pi r)^{-1}\, d\phi/dt$. The current density then is
$j_r = \sigma_H E_\phi$, where $\sigma_H = fe^2/h$ is the Hall conductivity.
The charge leaving the area defined by a circle of
radius $r$ per unit time is $2\pi r j_r$. The total charge
leaving this area in the adiabatic process then is


  $Q = \int 2\pi r j_r \, dt = -\sigma_H \phi_0 = -fe$  [55]


The charge excess associated with the excitation is
therefore $fe$. It is in general not an elementary
excitation. For $f = n/(2pn + 1)$, it can be shown to
be a collection of $n$ elementary excitations, giving a
charge of $e^* = e/(2pn + 1)$ for a single elementary
excitation. 
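
The arithmetic behind the elementary charge can be written out as a small sketch (illustrative only; the function name is hypothetical and charges are measured in units of $e$): one flux quantum produces a charge $f$, shared among $n$ elementary excitations.

```python
from fractions import Fraction

def elementary_charge(n, p):
    """Charge (in units of e) of one elementary excitation at f = n/(2pn+1)."""
    f = Fraction(n, 2 * p * n + 1)
    total = f              # magnitude of eqn [55] for one flux quantum
    return total / n       # the excitation consists of n elementary excitations

for n in (1, 2, 3):
    print(n, elementary_charge(n, 1))
```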


Microscopic Tests 


Exact solutions of the Schrödinger equation can be
obtained, for a finite number of particles, by a
brute-force diagonalization of the Hamiltonian in the
lowest LL subspace, which enables a rigorous and
nontrivial testing of the CF theory. Figure 3 shows
some typical comparisons, which help test both the
qualitative and the quantitative aspects of the CF
theory in a model-independent manner. The low-energy
spectrum of interacting electrons at $B$ is
explicitly seen to have a one-to-one correspondence
to that of weakly interacting electrons at $B^*$.
Furthermore, there is a remarkably good quantitative
agreement. The predicted energies agree with
the exact energies to better than 0.05%, and the
overlaps between the wave functions of eqn [31]
with the exact eigenfunctions are close to 100%.
Such comparisons are even more convincing in light
of the fact that the wave functions of eqn [31]
do not contain any adjustable parameters for the
states at $\nu$ in eqn [50], because the ground state
wave function and its low-energy excitations at
$\nu^* = n$ are unique and fully known: the former is
the Slater determinant corresponding to $n$ filled
Landau levels, and the latter are the excitons.
The predicted energies are calculated by determining
the expectation values of the full Hamiltonian
of eqn [30] with respect to the wave functions in
eqn [31]. 

Figure 3 Exact spectra (dashes) for several particle numbers
at $\nu = 1/3$, 2/5, and 3/7. Dots show the CF prediction for the
energy, obtained with no adjustable parameters. The electrons
are taken to be confined on the surface of a sphere in the
presence of a radial magnetic field; $L$ is the total orbital angular
momentum, and each dash represents a multiplet of $2L + 1$
degenerate states. The number on top is the dimension of the
Fock space in the corresponding $L$ sector. Reproduced from
Jain JK (2000) The composite fermion: a quantum particle and
its quantum fluids. Physics Today 39(4): 39-42, with permission
from American Institute of Physics. 


More Physics 


Spin 


At small Zeeman energies, partially spin-polarized 
or spin-unpolarized FQHE states become possible. 
The TSG effect with spin is well described by a 
generalization of the CF theory. The observed 
fractions are still given by eqn [50], but with 


  $n = n_\uparrow + n_\downarrow$  [56]


where $n_\uparrow$ is the number of occupied spin-up
Landau-like CF bands and $n_\downarrow$ is the number of occupied
spin-down Landau-like CF bands. There are in
general several states with different spin polarizations
possible at any given fraction. The observed 
quantum phase transitions as a function of the 
Zeeman energy, which can be changed by increasing 
the parallel component of the magnetic field, are 
consistent with this picture. Direct measurements of 
the spin polarization further confirm this, but also 
see evidence for certain additional fragile states, 
which are presumably caused by the residual 
interaction between composite fermions. 
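
The possible spin configurations allowed by eqn [56] can be enumerated directly. The sketch below is illustrative and rests on an assumption not stated in the text above: the polarization is taken to be $(n_\uparrow - n_\downarrow)/(n_\uparrow + n_\downarrow)$, and only states with $n_\uparrow \ge n_\downarrow$ are listed. The function name is hypothetical.

```python
def spin_states(n):
    """List (n_up, n_down, polarization) with n_up + n_down = n and n_up >= n_down."""
    states = []
    for n_down in range(0, n // 2 + 1):
        n_up = n - n_down
        # Assumed definition of the polarization; not taken from the article.
        states.append((n_up, n_down, (n_up - n_down) / n))
    return states

for n in (1, 2, 3, 4):
    print(n, spin_states(n))
```

For $n = 2$ (for example $\nu = 2/5$) this gives a fully polarized state (2, 0) and an unpolarized state (1, 1).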


Bilayers 


It has been proposed that for two parallel 2DES 
planes at small separations and at total filling v= 1, 
neutral interlayer excitons (each exciton made up of 
an electron in one layer and a hole in the other) 
undergo Bose-Einstein condensation, producing a 
true off-diagonal long-range order. Tunneling and 
transport experiments by Eisenstein and collabora- 
tors provide evidence for nontrivial behavior under 
such conditions. The resistivity in the antisymmetric 
channel is very small but does not vanish. 


Pairing 


An even-denominator fraction f=5/2 has been 
observed. Writing 5/2=2+1/2 and noting that 
the lowest LL contributes 2 (counting the spin 
degree of freedom), v = 5/2 corresponds to a filling 
of 1/2 in the second Landau level. The most 
promising scenario for the explanation of the 5/2 
effect is that composite fermions form a p-wave 
paired state, which opens up a gap to excitations. 
This state is believed to be well described by a
Pfaffian wave function proposed by Moore and
Read (1991)


  $\Psi^{\rm Pf}_{1/2} = {\rm Pf}\left( \frac{1}{z_i - z_j} \right) \prod_{i<j} (z_i - z_j)^2 \exp\left( -\frac{1}{4} \sum_k |z_k|^2 \right)$  [57]


The Pfaffian of an antisymmetric matrix $M$ is
defined, apart from an overall factor, as


  ${\rm Pf}(M_{ij}) = A(M_{12} M_{34} \cdots M_{N-1,N})$  [58]


where $A$ is the antisymmetrization operator. The
Bardeen-Cooper-Schrieffer wave function


  $\Psi_{\rm BCS} = A[\phi_0(\mathbf{r}_1, \mathbf{r}_2)\, \phi_0(\mathbf{r}_3, \mathbf{r}_4) \cdots \phi_0(\mathbf{r}_{N-1}, \mathbf{r}_N)]$  [59]


has the same form as the Pfaffian in eqn [58].
Hence, ${\rm Pf}\, 1/(z_i - z_j)$ describes a p-wave pairing of
electrons, and $\Psi^{\rm Pf}_{1/2}$ is interpreted as a paired state of
composite fermions carrying two vortices. 
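
For small $N$ the Pfaffian in eqn [58] can be evaluated by a recursive expansion along the first row. The sketch below is illustrative only: the function names are hypothetical, and real particle positions with exact rational arithmetic are an assumption made for easy checking (physical coordinates $z_j$ are complex). The normalization is the standard one, for which the $4 \times 4$ case is $M_{12}M_{34} - M_{13}M_{24} + M_{14}M_{23}$.

```python
from fractions import Fraction

def pfaffian(M):
    """Pfaffian of an antisymmetric matrix (list of lists), first-row expansion."""
    n = len(M)
    if n == 0:
        return 1
    if n % 2:
        return 0                       # odd dimension: the Pfaffian vanishes
    total = 0
    for j in range(1, n):
        keep = [k for k in range(n) if k not in (0, j)]
        minor = [[M[a][b] for b in keep] for a in keep]
        total += (-1) ** (j - 1) * M[0][j] * pfaffian(minor)
    return total

def pairing_matrix(zs):
    """Antisymmetric matrix with entries 1/(z_i - z_j), zero on the diagonal."""
    zs = [Fraction(z) for z in zs]
    return [[0 if i == j else 1 / (zi - zj) for j, zj in enumerate(zs)]
            for i, zi in enumerate(zs)]

M = pairing_matrix([0, 1, 3, 7])
explicit = M[0][1] * M[2][3] - M[0][2] * M[1][3] + M[0][3] * M[1][2]
print(pfaffian(M), explicit)
```

The recursion has factorial cost and is meant only to make the definition concrete.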


FQHE of Composite Fermions 


Recently, some fractions other than those in eqn [50] 
have been observed, for example, f=4/11 and 
f=5/13. These are understood as the delicate 
“fractional” QHE of composite fermions at v* = 1+ 
1/3 and v* =1 + 2/3. 


TSG Effect in Higher Landau Levels 


The short-range part of the Coulomb interaction 
is less effective in higher Landau levels because 
of the greater spread of the electron wave func- 
tion. As a result, composite fermions are less 
stable, often losing to charge density wave states. 
A few fractions have been observed in the second 
Landau level (1/3, 2/3, 2/5, 1/2) and one (1/3) in 
the third. 


Edge states 


There is a gap to excitations in the bulk at the 
magic fillings of eqn [50], but there is no gap at the 
edge of the sample. The dynamics of the low-energy 
edge excitations is formally equivalent to that of a 
chiral one-dimensional Tomonaga-Luttinger liquid. 
Wen (1991) argued that the exponent characterizing 
the long-distance behavior of this liquid is quan- 
tized, fully determined by the filling factor of the 
bulk state. Experimental studies of the tunneling 
of an external electron into the edge of an 
FQHE system provide evidence for a nontrivial
Tomonaga-Luttinger liquid but do not find the
predicted universal value for the edge exponent. 


Introduction 


In several models coming from very different
applications, one needs to describe physical phenomena
where the state function may present some
regions of discontinuity. We may think, for instance,
of problems arising in fracture mechanics, where the
function which describes the displacement of the
body has a jump along the fracture; of phase
transitions; or of problems of image reconstruction,
where the function that describes a picture (the
intensity of black, e.g., in black-and-white pictures)
naturally has some discontinuities along the profiles
of the objects. 

The Sobolev space analysis is then no longer
appropriate for this kind of problem, since Sobolev
functions cannot have jump discontinuities along
hypersurfaces, whereas the models above require
exactly that. For a rigorous presentation of
variational problems involving functions with
discontinuities, the essential tool is the space, BV, of
functions with bounded variation. The first ideas
about this space were developed by De Giorgi in the
1950s, in order to provide a variational framework to
study the problems of minimal surfaces, and several
monographs are now available on the subject. We
quote, for instance, the classical volumes of Evans
and Gariepy (1992), Federer (1969), Giusti (1984),
Massari and Miranda (1984), Ziemer (1989), and the
recent book by Ambrosio et al. (2000), where a
systematic presentation is given, also in view of the
applications mentioned above. 


The Space BV 


Consider a generic open subset $\Omega$ of $\mathbb{R}^N$, which, for
simplicity, we take bounded and with a Lipschitz
boundary. In the following, we denote by $\mathcal{L}^N(E)$, or
simply $|E|$, the Lebesgue measure of $E$ in $\mathbb{R}^N$, while
$\mathcal{H}^k$ denotes the $k$-dimensional Hausdorff measure. 


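Definition 1 We say that a function $u \in L^1(\Omega)$ is a
function of bounded variation in $\Omega$ if its distributional
gradient $Du$ is an $\mathbb{R}^N$-valued finite Borel
measure on $\Omega$. In other words, we have


  $\int_\Omega u\, \frac{\partial \varphi}{\partial x_i}\, dx = -\int_\Omega \varphi \, dD_i u \qquad \forall \varphi \in C_c^\infty(\Omega),\ i = 1, \dots, N$  [1]


where $D_i u$ are finite Borel measures. The space of all
functions of bounded variation in $\Omega$ is denoted by
$BV(\Omega)$. 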


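The space $BV(\Omega)$ is clearly a vector space and,
with the norm


  $\|u\|_{BV(\Omega)} = \|u\|_{L^1(\Omega)} + |Du|(\Omega)$  [2]


it becomes a Banach space. The total variation
$|Du|(\Omega)$ appearing above is understood as


  $|Du|(\Omega) = \sup\left\{ \sum_{i=1}^N \int_\Omega \phi_i \, dD_i u : \phi \in C_c^\infty(\Omega; \mathbb{R}^N),\ |\phi| \le 1 \right\}$
  $\qquad\quad\ = \sup\left\{ -\int_\Omega u \, {\rm div}\, \phi \, dx : \phi \in C_c^1(\Omega; \mathbb{R}^N),\ |\phi| \le 1 \right\}$


and is sometimes indicated by $\int_\Omega |Du|$. The space
$BV_{\rm loc}(\Omega)$ is defined in a similar way, requiring that
$u \in BV(\Omega')$ for every $\Omega' \subset\subset \Omega$. 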

From the point of view of functional analysis, the
space $BV(\Omega)$ does not enjoy the nice properties of
Sobolev spaces. In particular,


• the Banach space $BV(\Omega)$ is not separable;

• the Banach space $BV(\Omega)$ is not reflexive; and

• the class of smooth functions is not dense in
$BV(\Omega)$ for the norm [2].


These issues explain why the norm [2] is not
very helpful in the study of variational problems
involving the space $BV(\Omega)$. By contrast, the
weak* convergence defined below is much more
suitable for treating minimization problems for integral
functionals. 


Definition 2 We say that a sequence $(u_n)$ weakly*
converges in $BV(\Omega)$ to a function $u \in BV(\Omega)$ if $u_n \to u$
strongly in $L^1(\Omega)$ and $Du_n \rightharpoonup Du$ in the weak*
convergence of measures. 


The weak* convergence on $BV(\Omega)$ satisfies the
following properties:


• Compactness Every bounded sequence in $BV(\Omega)$
for the norm [2] admits a weakly* convergent
subsequence.

• Lower-semicontinuity The norm [2] is sequentially
lower-semicontinuous with respect to the
weak* convergence.

• Density Every function $u \in BV(\Omega)$ can be
approximated, in the weak* convergence, by a
sequence $(u_n)$ of smooth functions. 


The density property above can actually be made
stronger: in fact, the approximation of $u$ by $(u_n)$
holds in the sense that


  $u_n \to u$ strongly in $L^1(\Omega)$
  $Du_n \rightharpoonup Du$ weakly* as measures


  $|Du_n|(\Omega) \to |Du|(\Omega)$ 
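
The difference between this convergence and convergence in the norm [2] can be seen in a one-dimensional numerical sketch. It is an illustration added here, under stated assumptions: $\Omega = (-1, 1)$, $u$ is the unit step at the origin, and piecewise-linear ramps stand in for smooth approximations; the grid and all names are hypothetical. The total variation of $u_n$ matches that of $u$, while the total variation of $u_n - u$ stays close to 2, so $u_n$ does not converge to $u$ in norm.

```python
def total_variation(values):
    """Discrete total variation: sum of the absolute increments of the samples."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))

def step(x):
    return 1.0 if x >= 0 else 0.0

def ramp(x, n):
    """Continuous approximation of the step, rising linearly over (-1/n, 1/n)."""
    return min(1.0, max(0.0, 0.5 + 0.5 * n * x))

M = 20000                                    # even, so x = 0 is a grid point
h = 2.0 / M
xs = [-1.0 + h * i for i in range(M + 1)]
n = 100

u = [step(x) for x in xs]
u_n = [ramp(x, n) for x in xs]

l1_distance = h * sum(abs(a - b) for a, b in zip(u_n, u))
tv_u_n = total_variation(u_n)
tv_difference = total_variation([a - b for a, b in zip(u_n, u)])
print(l1_distance, tv_u_n, tv_difference)
```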


Further properties of the space $BV(\Omega)$ concern
the embeddings into Lebesgue spaces, traces,
and Poincaré-type inequalities. More precisely, we
have:


• Embeddings The space $BV(\Omega)$ is embedded
continuously into $L^{N/(N-1)}(\Omega)$ and compactly
into $L^p(\Omega)$ for every $p < N/(N - 1)$.

• Traces Every function $u \in BV(\Omega)$ has a boundary
trace which belongs to $L^1(\partial\Omega)$, and the trace
operator from $BV(\Omega)$ into $L^1(\partial\Omega)$ is continuous.

• Poincaré inequalities There exist suitable constants
$c_1$ and $c_2$ such that for every $u \in BV(\Omega)$


  $\int_\Omega |u| \, dx \le c_1 \left[ |Du|(\Omega) + \int_{\partial\Omega} |u| \, d\mathcal{H}^{N-1} \right]$


  $\int_\Omega |u - u_\Omega| \, dx \le c_2 |Du|(\Omega)$


  (where $u_\Omega = \frac{1}{|\Omega|} \int_\Omega u \, dx$)


Sets of Finite Perimeter 


An important class of functions with bounded
variation are those that can be written as $1_E$, the
characteristic function of a set $E$, taking the value 1
on $E$ and 0 elsewhere. This is the natural class where
many phase-transition problems with sharp interfaces
may be framed. 


Definition 3 For a measurable set $E \subset \mathbb{R}^N$ the
perimeter of $E$ in $\Omega$ is defined as


  ${\rm Per}(E, \Omega) = |D1_E|(\Omega)$


The equality above is intended as ${\rm Per}(E, \Omega) = +\infty$
whenever $1_E \notin BV(\Omega)$. If ${\rm Per}(E, \Omega) < +\infty$ then the set
$E$ is called a set of finite perimeter in $\Omega$. 


Note that by the compactness property above for
BV functions, a family of characteristic functions of
sets with finite perimeter in a bounded open set $\Omega$
with equibounded perimeter is weakly*-precompact,
and its limit is of the same form.

For a set $E$ of finite perimeter in $\Omega$, we may define
the inner normal versor and the reduced boundary
as follows. 


Definition 4 Let $E$ be a set of finite perimeter in $\Omega$.
We call reduced boundary $\partial^* E$ the set of all points
$x \in \Omega \cap {\rm spt}\,|D1_E|$ such that the limit


  $\nu_E(x) = \lim_{r \to 0} \frac{D1_E(B_r(x))}{|D1_E|(B_r(x))}$


exists and satisfies $|\nu_E(x)| = 1$. The vector $\nu_E(x)$ is
called the generalized inner normal versor to $E$. 


In order to link the measure-theoretical objects
introduced above with some structure property of sets
of finite perimeter, we introduce, for every $t \in [0, 1]$
and every measurable set $E \subset \mathbb{R}^N$, the set $E^t$ defined by


  $E^t = \left\{ x \in \mathbb{R}^N : \lim_{r \to 0} \frac{|E \cap B_r(x)|}{|B_r(x)|} = t \right\}$


For instance, if $E$ is a smooth domain of $\mathbb{R}^N$, $E^1$ is
the interior part of $E$, $E^0$ is its exterior part, while
$E^{1/2}$ is the boundary $\partial E$. 
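
The density ratio defining $E^t$ can be estimated numerically. The sketch below is an illustration under stated assumptions: $N = 2$, $E$ is the open unit disk, and the ratio $|E \cap B_r(x)|/|B_r(x)|$ is approximated by counting points of a uniform grid; the function names and grid resolution are hypothetical. The ratio is 1 at an interior point, 0 at an exterior point, and close to 1/2 at a boundary point for small $r$.

```python
def density(in_E, x, r, m=100):
    """Approximate |E ∩ B_r(x)| / |B_r(x)| in R^2 by counting grid points."""
    inside = total = 0
    for i in range(-m, m + 1):
        for j in range(-m, m + 1):
            dx, dy = r * i / m, r * j / m
            if dx * dx + dy * dy < r * r:          # grid point lies in B_r(x)
                total += 1
                inside += in_E(x[0] + dx, x[1] + dy)
    return inside / total

def unit_disk(a, b):
    return a * a + b * b < 1.0

for point in [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]:
    print(point, [round(density(unit_disk, point, r), 3) for r in (0.1, 0.01)])
```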

The main properties of the reduced boundary and
of the generalized inner normal versor are stated in
the following result.


Theorem 5 Let $E$ be a set of finite perimeter in $\Omega$.
Then its reduced boundary $\partial^* E$ coincides $\mathcal{H}^{N-1}$-a.e.
with the set $E^{1/2}$ introduced above, and we
have the equality


  ${\rm Per}(E, \Omega) = \mathcal{H}^{N-1}(\Omega \cap \partial^* E) = \mathcal{H}^{N-1}(\Omega \cap E^{1/2})$


Moreover, the generalized inner normal versor $\nu_E(x)$
exists for $\mathcal{H}^{N-1}$-a.e. $x \in \partial^* E$, and we have


  $D1_E = \nu_E(x)\, \mathcal{H}^{N-1} \llcorner \partial^* E$ 


Note that the lower-semicontinuity of $|D1_E|(\Omega)$
entails the lower-semicontinuity of $E \mapsto \mathcal{H}^{N-1}(\Omega \cap \partial^* E)$
with respect to the weak*-convergence of $1_E$.
As a consequence, we may apply the direct methods
of the calculus of variations to obtain, for example,
existence of minimizers of


  $\min\left\{ {\rm Per}(E, \mathbb{R}^N) - \int_E g \, dx \right\}$


that are sets with prescribed mean curvature $g$. This
lower-semicontinuity property can be further generalized,
for example, as in the following result for
anisotropic perimeters. 


Theorem 6 Let $\varphi : S^{N-1} \to \mathbb{R}$ be a Borel function.
The energy


  $\int_{\Omega \cap \partial^* E} \varphi(\nu_E) \, d\mathcal{H}^{N-1}$


is lower-semicontinuous with respect to the weak*-convergence
of $1_E$ in $BV(\Omega)$ if and only if the
positively one-homogeneous extension of $\varphi$ from
$S^{N-1}$ to $\mathbb{R}^N$ is convex. 


This result immediately implies the existence of
solutions of isovolumetric problems of the form


  $\min\left\{ \int_{\partial^* E} \varphi(\nu_E) \, d\mathcal{H}^{N-1} : |E| = c \right\}$


whose solutions are obtained by suitably scaling the
Wulff shape of $\varphi$. 


The Structure of BV Functions 


The simplest situation occurs when $N = 1$ and so $\Omega$ is
an interval of the real line. In this case, decomposing
the derivative $u'$ into positive and negative parts, and
taking their primitives, we obtain that $u \in BV(\Omega)$ if
and only if $u$ is the sum of two bounded monotone
functions (one increasing and one decreasing). Therefore,
in the one-dimensional case, the BV functions
share all the properties of monotone functions. 
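
The one-dimensional decomposition described above has a direct discrete analogue. The sketch below is illustrative only, under stated assumptions: $u$ is sampled on a uniform grid of $(0, 1)$, and the sample function (a sine wave plus a unit jump) and all names are hypothetical. Positive and negative increments are accumulated separately, giving an increasing and a decreasing part whose sum is $u$, and the total variation is the sum of their variations.

```python
import math

def monotone_decomposition(values):
    """Split samples of u into an increasing part and a decreasing part."""
    up, down = [values[0]], [0.0]
    for a, b in zip(values, values[1:]):
        inc = b - a
        up.append(up[-1] + max(inc, 0.0))      # collects positive increments
        down.append(down[-1] + min(inc, 0.0))  # collects negative increments
    return up, down

xs = [i / 1000 for i in range(1001)]
u = [math.sin(2 * math.pi * x) + (1.0 if x >= 0.5 else 0.0) for x in xs]

up, down = monotone_decomposition(u)
total_variation = (up[-1] - up[0]) - (down[-1] - down[0])
print(total_variation)   # close to 5 = 4 (sine wave) + 1 (jump)
```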

The situation is more delicate when N > 1, for 
which we need the notion of approximate limit. 


Definition 7 Let $u \in BV(\Omega)$. We say that $u$ has the
approximate limit $z$ at $x$ if


  $\lim_{r \to 0} \frac{1}{|B_r(x)|} \int_{B_r(x)} |u(y) - z| \, dy = 0$

The set where no approximate limit exists is called
the approximate discontinuity set, and is denoted by
$S_u$. In a similar way, when $x \in S_u$ we may define the
approximate values $z^+$ and $z^-$, by requiring that


  $\lim_{r \to 0} \frac{1}{|B_r^\pm(x, \nu)|} \int_{B_r^\pm(x, \nu)} |u(y) - z^\pm| \, dy = 0$

where

  $B_r^+(x, \nu) = \{ y \in B_r(x) : (y - x) \cdot \nu > 0 \}$

  $B_r^-(x, \nu) = \{ y \in B_r(x) : (y - x) \cdot \nu < 0 \}$


Analogous definitions can be given in the vector-valued
case, when $u \in BV(\Omega; \mathbb{R}^m)$. 


The triplet $(z^+, z^-, \nu)$ in Definition 7 is unique up
to interchanging $z^+$ with $z^-$ and changing sign to $\nu$,
and is denoted by $(u^+(x), u^-(x), \nu_u(x))$. 

We are now in a position to describe the structure
of the measure $Du$ when $u \in BV(\Omega)$, or more
generally $u \in BV(\Omega; \mathbb{R}^m)$. We first apply the
Radon-Nikodym theorem to $Du$ and we decompose
it into absolutely continuous and singular parts:
$Du = (Du)^a + (Du)^s$. We denote by $\nabla u$ the density of
the absolutely continuous part, so that we have

  $Du = \nabla u \cdot \mathcal{L}^N + (Du)^s$

The singular part $(Du)^s$ can be further decomposed
into an $(N - 1)$-dimensional part, concentrated on
the approximate discontinuity set $S_u$, and the
remaining part, which vanishes on all sets with
finite $\mathcal{H}^{N-1}$ measure. More precisely, if
$u \in BV(\Omega; \mathbb{R}^m)$, we have

  $Du = \nabla u \cdot \mathcal{L}^N + (u^+(x) - u^-(x)) \otimes \nu_u(x) \cdot \mathcal{H}^{N-1} \llcorner S_u + (Du)^c$  [4]


The three terms on the right-hand side are mutually
singular and are, respectively, called the absolutely
continuous part, the jump part, and the Cantor part
of the gradient measure $Du$. 

In the vector-valued case, $Du$ is an $m \times N$ matrix
of finite Borel measures, $\nabla u$ is an $m \times N$ matrix of
functions in $L^1(\Omega)$, and the jump term in [4] is an
$(N - 1)$-dimensional measure of rank 1. The structure
of the Cantor part $(Du)^c$ is described by
Alberti’s rank-1 theorem (see Alberti (1993)).


Theorem 8 For every $u \in BV(\Omega; \mathbb{R}^m)$ the Cantor
part $(Du)^c$ is a measure with values in the $m \times N$
matrices of rank 1. 
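
A standard scalar example of a nonzero Cantor part, added here as an illustration and not taken from the text, is the Cantor-Vitali function on $(0, 1)$: it is continuous, so it has no jump part, and it is constant off the middle-thirds Cantor set, so its derivative vanishes almost everywhere; yet it increases from 0 to 1. The sketch below (hypothetical names, exact rational arithmetic) shows that at stage $k$ of the construction the whole increment is carried by $2^k$ intervals of total length $(2/3)^k$.

```python
from fractions import Fraction

def cantor_stage(k):
    """Intervals remaining after k steps of the middle-thirds construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined += [(a, a + third), (b - third, b)]   # drop the middle third
        intervals = refined
    return intervals

for k in (1, 5, 10):
    intervals = cantor_stage(k)
    length = sum(b - a for a, b in intervals)
    # The Cantor-Vitali function rises by 2^-k on each stage-k interval.
    increment = len(intervals) * Fraction(1, 2 ** k)
    print(k, float(length), increment)
```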


Convex Functionals on BV 


Many problems of the calculus of variations deal
with the minimization of energies of the form


  $F(u) = \int_\Omega f(x, u, Du)$  [5]


The direct methods to obtain the existence of at
least one minimizer require some coercivity hypotheses
on $F$, as well as its lower-semicontinuity. This
last issue, already rather delicate when working
in Sobolev spaces (see, e.g., Buttazzo (1989)
and Dacorogna (1989)), presents additional difficulties
when the unknown function $u$ varies in the
space $BV(\Omega)$, due to the fact that $Du$ is a measure,
and the precise meaning of the integral in [5] has to
be clarified. 

In this section, we limit ourselves to considering the
simpler situation of convex functionals, and we also
assume that the integrand $f(x, u, Du)$ depends only
on $x$ and $Du$. It is then convenient to study the
problem in the framework of functionals defined on
the space of finite Borel vector measures $\mathcal{M}(\Omega; \mathbb{R}^k)$.
Let $f : \mathbb{R}^N \times \mathbb{R}^k \to [0, +\infty]$ be a Borel function such
that


• $f$ is lower-semicontinuous, and
• $f(x, \cdot)$ is convex for every $x \in \mathbb{R}^N$.


We denote by $f^\infty(x, z)$ the recession function
associated with $f$, given by


  $f^\infty(x, z) = \lim_{t \to +\infty} \frac{f(x, z_0 + tz)}{t}$


where $z_0$ is any point in $\mathbb{R}^k$ such that $f(x, z_0) < +\infty$
(in fact, the definition above is independent of the
choice of $z_0$). Then we may consider the functional


  $F(\lambda) = \int_\Omega f(x, \lambda^a(x)) \, dx + \int_\Omega f^\infty\left( x, \frac{d\lambda^s}{d|\lambda^s|} \right) d|\lambda^s|$  [6]


where $\lambda = \lambda^a \cdot dx + \lambda^s$ is the Lebesgue-Nikodym
decomposition of $\lambda$ into absolutely continuous and
singular parts, and the notation $d\lambda^s/d|\lambda^s|$ stands for
the density of $\lambda^s$ with respect to its total variation
$|\lambda^s|$. For simplicity, the last term on the right-hand
side of [6] is often denoted by $\int_\Omega f^\infty(x, \lambda^s)$. 
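
The recession function can be approximated numerically by evaluating the quotient at a large but finite $t$. The sketch below is illustrative only, for scalar integrands independent of $x$; the function names, the choice $t = 10^8$ and the two sample integrands are assumptions. It also illustrates that the limit does not depend on $z_0$.

```python
import math

def recession(f, z, z0=0.0, t=1e8):
    """Approximate f_inf(z) = lim_{t -> +inf} f(z0 + t z) / t at a finite t."""
    return f(z0 + t * z) / t

def area(z):
    """Area-type integrand sqrt(1 + z^2): linear growth, recession function |z|."""
    return math.sqrt(1.0 + z * z)

def modulus(z):
    """Total-variation integrand |z|: equal to its own recession function."""
    return abs(z)

for z in (2.0, -3.0):
    print(z, recession(area, z), recession(area, z, z0=5.0), recession(modulus, z))
```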

For the functional $F$, the following lower-semicontinuity
result holds (see, e.g., Buttazzo (1989)).


Theorem 9 Under the assumptions above the functional
[6] is sequentially lower-semicontinuous for the
weak* convergence on $\mathcal{M}(\Omega; \mathbb{R}^k)$. Moreover, if


  $f(x, z) \ge c_0 |z| - a(x)$ with $c_0 > 0$ and $a \in L^1(\Omega)$  [7]

then the functional $F$ turns out to be coercive for the
same topology. 

From Theorem 9 we deduce immediately a lower-semicontinuity
result for functionals defined on
$BV(\Omega; \mathbb{R}^m)$.


Corollary 10 Under the assumptions above on the
integrand $f$ (with $k = mN$) the functional defined on
$BV(\Omega; \mathbb{R}^m)$ by


  $F(u) = \int_\Omega f(x, (Du)^a) \, dx + \int_\Omega f^\infty\left( x, \frac{d(Du)^s}{d|(Du)^s|} \right) d|(Du)^s|$  [8]

is sequentially lower-semicontinuous for the weak*
convergence. Moreover, under the assumption [7]
the functional $F$ is coercive with respect to the same
topology. 


For some extensions of the result above to the case
when $f(x, \cdot)$ is quasiconvex (in the vector-valued
situation $m > 1$), we refer the interested reader to
Fonseca and Müller (1992) and references therein. 

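Fixing boundary data is another difference
between variational problems on Sobolev spaces
and on BV spaces. Because the class
$\{u \in BV(\Omega) : u = u_0 \text{ on } \partial\Omega\}$ is not weakly* closed, to
set a minimum problem of Dirichlet type on $BV(\Omega)$
with datum $u_0 \in BV(\mathbb{R}^N)$ correctly it is
convenient to consider a larger domain $\Omega' \supset\supset \Omega$
and, for every $u \in BV(\Omega)$, the extended function


  $\tilde{u} = u$ on $\Omega$, $\quad \tilde{u} = u_0$ on $\Omega' \setminus \Omega$


whose distributional gradient is

  $D\tilde{u} = Du \llcorner \Omega + Du_0 \llcorner (\Omega' \setminus \bar{\Omega}) + (u_0 - u)\, \nu_\Omega \, \mathcal{H}^{N-1} \llcorner \partial\Omega$


$\nu_\Omega$ being the exterior normal versor to $\Omega$. We have
then the following functional on $BV(\Omega')$:


  $F(\tilde{u}) = \int_{\Omega'} f(x, (D\tilde{u})^a) \, dx + \int_{\Omega'} f^\infty(x, (D\tilde{u})^s)$

  $\quad = \int_\Omega f(x, (Du)^a) \, dx + \int_{\Omega' \setminus \bar{\Omega}} f(x, (Du_0)^a) \, dx$
  $\quad\ + \int_\Omega f^\infty(x, (Du)^s) + \int_{\Omega' \setminus \bar{\Omega}} f^\infty(x, (Du_0)^s)$
  $\quad\ + \int_{\partial\Omega} f^\infty(x, (u_0 - u)\nu_\Omega) \, d\mathcal{H}^{N-1}$

If we drop the constant term


  $\int_{\Omega' \setminus \bar{\Omega}} f(x, (Du_0)^a) \, dx + \int_{\Omega' \setminus \bar{\Omega}} f^\infty(x, (Du_0)^s)$


irrelevant for the minimization, we end up with the
functional


  $F_{u_0}(u) = F(u) + \int_{\partial\Omega} f^\infty(x, (u_0 - u)\nu_\Omega) \, d\mathcal{H}^{N-1}$


where $F$ is as in [8]. The Dirichlet problem we
consider is then


  $\min\left\{ F(u) + \int_{\partial\Omega} f^\infty(x, (u_0 - u)\nu_\Omega) \, d\mathcal{H}^{N-1} : u \in BV(\Omega) \right\}$  [9]


For instance, if $f(z) = |z|$, problem [9] becomes


  $\min\left\{ \int_\Omega |Du| + \int_{\partial\Omega} |u - u_0| \, d\mathcal{H}^{N-1} : u \in BV(\Omega) \right\}$


Under the assumptions considered, the problem
above admits a solution $u \in BV(\Omega)$, but in general
we do not have $u = u_0$ on $\partial\Omega$ in the sense of BV
traces. 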
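
A one-dimensional discrete analogue of the relaxed problem with $f(z) = |z|$ shows how a boundary mismatch is penalized rather than forbidden. The sketch below is an illustration under stated assumptions: $\Omega$ is an interval sampled on a grid, the boundary integral reduces to the two endpoint mismatches, and the function name and data are hypothetical. A ramp attaining the boundary data and a constant attaining neither boundary value have the same energy.

```python
def relaxed_dirichlet_energy(values, left_datum, right_datum):
    """Discrete analogue of |Du|(Omega) + boundary mismatch on an interval."""
    interior = sum(abs(b - a) for a, b in zip(values, values[1:]))
    boundary = abs(values[0] - left_datum) + abs(values[-1] - right_datum)
    return interior + boundary

M = 100
ramp = [i / M for i in range(M + 1)]      # attains the data 0 and 1
constant = [0.5] * (M + 1)                # attains neither boundary value

e_ramp = relaxed_dirichlet_energy(ramp, 0.0, 1.0)
e_constant = relaxed_dirichlet_energy(constant, 0.0, 1.0)
print(e_ramp, e_constant)
```

By the triangle inequality no sampled function can have energy below the difference of the two data, so both functions are minimizers in this discrete setting.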


Nonconvex Functionals on BV 


In order to introduce the class of nonconvex 
functionals on BV(Q), let us denote v = Du so that 
every functional ®(v) provides an energy F(u). If we 
work in the setting of Sobolev spaces, we have u € 
W4?(Q) (p > 1), which implies v € L. (Q; R); now, 
it happens that in this case all “interesting” 
functionals ® are convex. More precisely, it can be 
proved that a functional ®:L?(0;R‘) — [0, +00], 
which is 


è sequentially lower-semicontinuous for the weak 
convergence of L?(Q;R™), and 

e local on L?(Q; RY) in the sense that ®(v + w) = 
O(v) + P(w) whenever v.w = 0 in Q, 


has to be necessarily convex, and of the form 

$$\Phi(v) = \int_\Omega \varphi(x, v(x))\,dx$$

for a suitable integrand $\varphi$ such that $\varphi(x,\cdot)$ is 
convex. Then the energies $F(u)$ defined on Sobolev 
spaces and obtained by a functional $\Phi(v)$ through 
the identification $v=Du$ are necessarily convex. 
This is no longer true if $\Phi$ is defined on the space 
$\mathcal{M}(\Omega;\mathbb{R}^N)$ of measures, and hence $F$ is defined on 
$BV(\Omega)$. The first example of a nonconvex functional 
$\Phi$ on $\mathcal{M}(\Omega;\mathbb{R}^N)$ in the literature comes from the 
so-called Mumford-Shah model for computer vision 
(see below) and is given by 

$$\Phi(\lambda) = \int_\Omega |\lambda^a(x)|^2\,dx + \#(A_\lambda)$$

where $\lambda^a$ is the absolutely continuous part of $\lambda$, $A_\lambda$ is 
the set of atoms of $\lambda$, and $\#$ is the counting measure. 
The functional $\Phi$ is set equal to $+\infty$ on all measures 
$\lambda$ whose singular part $\lambda^s$ is nonatomic. A general 
representation result (see Bouchitte and Buttazzo 
(1992) and references therein) establishes that a 
functional $\Phi:\mathcal{M}(\Omega;\mathbb{R}^N)\to[0,+\infty]$, which is 


• sequentially lower-semicontinuous for the weak* 
convergence of $\mathcal{M}(\Omega;\mathbb{R}^N)$, and 

• local on $\mathcal{M}(\Omega;\mathbb{R}^N)$ in the sense that $\Phi(\lambda+\nu) = 
\Phi(\lambda)+\Phi(\nu)$ whenever $\lambda$ and $\nu$ are mutually 
singular in $\Omega$, 


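has to be of the form 

$$\Phi(\lambda) = \int_\Omega \varphi(x,\lambda^a)\,d\mu + \int_\Omega \varphi^\infty(x,\lambda^c) + \int_\Omega \psi\big(x,\lambda^\#(x)\big)\,d\#$$

where $\mu$ is a non-negative measure, $\lambda=\lambda^a\cdot dx + 
\lambda^c+\lambda^\#$ is the decomposition of $\lambda$ into absolutely 
continuous, Cantor, and atomic parts, $\varphi(x,v)$ is an 
integrand convex in $v$, and $\varphi^\infty$ is its recession 
function. The novelty is now represented by the 
integrand $\psi(x,v)$, which has to be subadditive in $v$ 
and to satisfy the compatibility condition 

$$\lim_{t\to+\infty}\frac{\varphi(x,tv)}{t} = \lim_{t\to 0^+}\frac{\psi(x,tv)}{t}$$

When $\varphi$ has a superlinear growth the condition 
above gives that the slope of $\psi(x,\cdot)$ at the origin has 
to be infinite. For instance, in the Mumford-Shah 
case we have 

$$\varphi(x,v) = |v|^2, \qquad \psi(x,v) = \begin{cases} 1 & \text{if } v\neq 0\\ 0 & \text{if } v = 0\end{cases}$$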


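Coming back to the case $u\in BV(\Omega)$, we have the 
decomposition (see [4]): 

$$Du = \nabla u\cdot\mathcal{L}^N + D^c u + [u]\,\nu_u(x)\cdot\mathcal{H}^{N-1}\llcorner S_u$$

where we considered, for simplicity, only the scalar 
case $m=1$ and denoted by $[u]$ the jump $u^+-u^-$. 
We have then the functional 

$$F(u) = \int_\Omega \varphi(x,\nabla u)\,dx + \int_\Omega \varphi^\infty(x,D^c u) + \int_{S_u}\psi\big(x,[u]\nu_u\big)\,d\mathcal{H}^{N-1}$$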


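For instance, in the homogeneous-isotropic 
case, when $\varphi(x,v)$ and $\psi(x,v)$ are independent 
of $x$ and depend only on $|v|$, the formula above 
reduces to 

$$F(u) = \int_\Omega \varphi(|\nabla u|)\,dx + \beta\,|D^c u|(\Omega) + \int_{S_u}\psi\big(|[u]|\big)\,d\mathcal{H}^{N-1}$$

where $\beta,\varphi,\psi$ satisfy the compatibility condition 

$$\beta = \varphi^\infty(1) = \lim_{t\to 0^+}\frac{\psi(t)}{t} \qquad [12]$$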


In the original Mumford-Shah model for computer 
vision, $\Omega$ is a rectangle of the plane, $u_0:\Omega\to[0,1]$ 
represents the gray level of a picture, $c_1$ and $c_2$ are 
positive scale and contrast parameters, and the 
variational problem under consideration is 

$$\min\Big\{\int_\Omega |\nabla u|^2\,dx + c_1\int_\Omega |u-u_0|^2\,dx + c_2\,\mathcal{H}^{N-1}(S_u) \;:\; D^c u = 0\Big\} \qquad [13]$$

The solution $u$ then represents the reconstructed 
image, whose contours are given by the jump set $S_u$. 
We refer to De Giorgi and Ambrosio (1988) and to the 
book by Morel and Solimini (1995) for further 
details about this model. 
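
As an illustration only, the three terms of [13] can be evaluated on a one-dimensional grid. The sketch below is not part of the original model: the function name `mumford_shah_energy_1d`, the finite-difference discretization, and the encoding of the jump set as a set of edge indices are all assumptions made for this example. In one dimension the measure of the jump set is simply the number of jumps.

```python
def mumford_shah_energy_1d(u, u0, jumps, h, c1, c2):
    """Discrete 1D analogue of the Mumford-Shah energy (illustrative only).

    u, u0 : lists of grid values with spacing h
    jumps : set of indices i such that the edge (i, i+1) lies in the jump set
    """
    # Gradient term: sum of squared difference quotients over edges
    # that are NOT declared to be jumps.
    grad = sum((u[i + 1] - u[i]) ** 2 / h
               for i in range(len(u) - 1) if i not in jumps)
    # Fidelity term: squared L2 distance to the datum u0.
    fidelity = sum((a - b) ** 2 for a, b in zip(u, u0)) * h
    # Jump term: counting measure of the jump set.
    return grad + c1 * fidelity + c2 * len(jumps)


h = 0.1
u0 = [0.0] * 5 + [1.0] * 5  # a single sharp edge in the "image"
with_jump = mumford_shah_energy_1d(u0, u0, {4}, h, c1=1.0, c2=0.5)
no_jump = mumford_shah_energy_1d(u0, u0, set(), h, c1=1.0, c2=0.5)
```

Declaring the edge a jump costs only `c2`, whereas keeping it inside the smooth region is penalized by the squared difference quotient, which grows as the grid is refined. This is the competition between bulk and surface terms that the model encodes.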

Analogously, in the case of the study of fractures 
of an elastic membrane, a problem similar to [13] 
provides the vertical displacement $u$ of the membrane, 
together with its fracture set $S_u$. We refer to 
some recent papers (see Dal Maso and Toader 
(2002) and Francfort and Marigo (1998), and 
references therein) for a more detailed description 
of fracture mechanics problems, even in the more 
delicate vectorial setting of elasticity. 

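Using the functional $F$ in [11] we have the 
generalized Mumford-Shah problem, 

$$\min\Big\{F(u) + c_1\int_\Omega |u-u_0|^2\,dx \;:\; u\in BV(\Omega)\Big\}$$

where $\varphi$ is convex, $\psi$ is subadditive, and the 
compatibility condition [12] is fulfilled. 

If we set $K=S_u$ and assume that it is closed, the 
Mumford-Shah problem can be rewritten as 

$$\min\Big\{\int_{\Omega\setminus K} |\nabla u|^2\,dx + c_1\int_{\Omega\setminus K} |u-u_0|^2\,dx + c_2\,\mathcal{H}^{N-1}(K\cap\Omega) \;:\; K\subset\overline{\Omega}\ \text{closed},\ u\in H^1(\Omega\setminus K)\Big\}$$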


and this justifies the name “free discontinuity 
problems,” which is often used in this setting. 

The regularity properties of optimal pairs $(u,K)$ 
are far from being fully understood; some partial 
results are available but the Mumford-Shah 
conjecture: 


• in the case $N=2$ for an optimal pair $(u,K)$ the set 
$K$ is locally the finite union of $C^{1,1}$ arcs 


is still open. We refer to Ambrosio et al. 
(2000) for a list of the regularity results on the 
problem above that are known thus far. 
Introduction 


Free probability is a probability theory adapted to 
quantities with the highest degree of noncommutativ- 
ity. A basic feature of this is that the definition of 
independence is modified in such a way that the freely 
independent random variables will not commute in 
general. The exploration of this notion of indepen- 
dence, which was initially motivated by questions 
about operator algebras (Voiculescu 1985), has 
produced a theory that runs parallel to an unexpect- 
edly large part of classical probability theory. The 
applications of the theory have also gone into 
unexpected directions, once it turned out that the 
large-N limit of systems of random matrices is a key 
asymptotic model in the theory (Voiculescu 1991). 
There are several signs, like the connections to large N 
for random matrices and to the combinatorics of 
noncrossing partitions (Speicher 1998) (which correspond 
to certain planar diagrams), that perhaps these 
connections may go even further towards the large-N 
limit of models in gauge theory. 

In this article the noncommutative probability and 
the random matrix angle will be emphasized and 
very little will be said about the operator algebras 
and the combinatorics. After discussing free inde- 
pendence and models based on free products of 
groups and creation and annihilation operators on 
the Boltzmann full Fock space, we continue with the 
semicircle law, which is the substitute for the Gauss 
law in this context, and with the nonlinear free 
harmonic analysis arising from addition and multi- 
plication of free random variables. 

We then devote two longer sections to the 
asymptotic free independence of large random 
matrices and to free entropy, the free probability 
analog of Shannon’s information-theoretic entropy 
for continuous random variables. 


Freeness of Noncommutative 
Random Variables 


Classical probability deals with expectation values 
of numerical random variables, that is, with 
numerical functions on a space of events and with 
their integrals with respect to a probability measure 
on the space of events. In noncommutative probability, 
the random variables, like quantum-mechanical 
quantities, are elements of a noncommutative 
algebra $\mathcal{A}$ over $\mathbb{C}$, with unit $1\in\mathcal{A}$, which is 
endowed with a linear expectation functional 
$\varphi:\mathcal{A}\to\mathbb{C}$, so that $\varphi(1)=1$. Frequently, $\mathcal{A}$ is a 
*-algebra of operators on some Hilbert space $\mathcal{H}$ and 
$\varphi(T)=\langle T\xi,\xi\rangle$ for some unit vector $\xi\in\mathcal{H}$. We call 
$(\mathcal{A},\varphi)$ a noncommutative probability space and the 
elements $a\in\mathcal{A}$ noncommutative random variables. 
In this section we shall discuss the basics around the 
notion of freeness (Voiculescu 1985), which plays 
the role of independence in free probability. 

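If $a=(a_i)_{i\in I}\subset\mathcal{A}$ is a family of noncommutative 
random variables, the role of joint distribution is 
played by the collection of noncommutative moments 
$\varphi(a_{i_1}\cdots a_{i_n})$. This can also be extended by linearity to a 
distribution functional $\Phi_a:\mathbb{C}\langle X_i\mid i\in I\rangle\to\mathbb{C}$, where 
$\mathbb{C}\langle X_i\mid i\in I\rangle$ is the ring of polynomials in noncommutative 
indeterminates $X_i$ $(i\in I)$ and 

$$\Phi_a\big(P(X_i\mid i\in I)\big) = \varphi\big(P(a_i\mid i\in I)\big)$$

If $\mathcal{A}$ is a C*-algebra of operators on $\mathcal{H}$, $a=a^*\in\mathcal{A}$ 
and $\varphi(\cdot)=\langle\cdot\,\xi,\xi\rangle$, the distribution of $a$ can also be 
identified with the probability measure $\mu_a$ on $\mathbb{R}$ 

$$\mu_a(\omega) = \langle E(\omega;a)\xi,\xi\rangle$$

where $E(\cdot\,;a)$ is the spectral measure of $a$. Indeed, 
then 

$$\Phi_a(P(X)) = \int P(t)\,d\mu_a(t)$$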


A family $(\mathcal{A}_i)_{i\in I}$, $1\in\mathcal{A}_i\subset\mathcal{A}$, of subalgebras is 
“free” (which is short for freely independent) if 

$$\varphi(a_1\cdots a_n) = 0$$

whenever $a_j\in\mathcal{A}_{i_j}$, $1\le j\le n$, $i_j\neq i_{j+1}$ and $\varphi(a_j)=0$. 
(Here it is only required that consecutive $a_j$'s be in 
different $\mathcal{A}_i$'s. Thus, we may have $i_1=i_3$, provided 
$i_1\neq i_2$.) 

A family of sets of random variables $(\omega_i)_{i\in I}$, $\omega_i\subset\mathcal{A}$, 
is free if the algebras $\mathcal{A}_i$ generated by $\{1\}\cup\omega_i$ are free 
in $(\mathcal{A},\varphi)$. 

Except for rather trivial situations, free random 
variables in $(\mathcal{A},\varphi)$ do not commute. 

Note also that, as in the case of classical 
independence, if $(\omega_i)_{i\in I}$ are disjoint freely independent 
sets of random variables, then, if the distributions 
$\Phi_{\omega_i}$ $(i\in I)$ are given, the distribution $\Phi_\omega$ of 
$\omega=\bigcup_{i\in I}\omega_i$ is completely determined. 

Example 1 Let the group $G$ be the free product of 
its subgroups $(G_i)_{i\in I}$, that is, $G$ is generated by these 
subgroups and there is no nontrivial relation among 
elements of different $G_i$'s. Further, let $\lambda$ be the 
left regular representation $\lambda(g)e_h=e_{gh}$ of $G$ on the 
Hilbert space $\ell^2(G)$ with orthonormal basis $(e_g)_{g\in G}$. 
Then, with respect to the expectation functional 
$\tau(T)=\langle Te_e,e_e\rangle$ on operators on $\ell^2(G)$, the sets 
$(\lambda(G_i))_{i\in I}$ are freely independent. 


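Example 2 If $\mathcal{H}$ is a complex Hilbert space, let 
$\mathcal{T}(\mathcal{H})=\bigoplus_{n\ge 0}\mathcal{H}^{\otimes n}$ denote the full Boltzmann Fock 
space, with vacuum vector $1$ so that $\mathcal{H}^{\otimes 0}=\mathbb{C}1$. 
If $h\in\mathcal{H}$ and $\xi\in\mathcal{T}(\mathcal{H})$, let $l(h)\xi=h\otimes\xi$ denote the 
left creation operator and $\varphi(X)=\langle X1,1\rangle$ the 
vacuum expectation. Then, if the $\mathcal{H}_i$ $(i\in I)$ 
are pairwise orthogonal subspaces in $\mathcal{H}$, the 
*-subalgebras of operators generated by $l(\mathcal{H}_i)\cup 
l^*(\mathcal{H}_i)$, indexed by $i\in I$, are freely independent 
with respect to $\varphi$. 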


Free Independence with Amalgamation 
over a Subalgebra 


The classical notion of conditional independence 
also has a free counterpart based on the notion of 
free independence with amalgamation over a sub- 
algebra. This subject is technically more complicated 
and we will only aim at giving an idea about what 
kind of concepts are involved. 

In the classical context, if $(X,\Sigma,\mu)$ is a probability 
space with a $\sigma$-algebra $\Sigma$, then the conditional 
independence with respect to a $\sigma$-subalgebra of 
events, $\Sigma_0\subset\Sigma$, amounts to replacing in the definition 
of independence the expectation functional 
(which is the integral with respect to $\mu$) by the 
conditional expectation functional 
$L^\infty(X,\Sigma,\mu)\to L^\infty(X,\Sigma_0,\mu|_{\Sigma_0})$. 

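In free probability, one considers an extension of 
the theory, from the $(\mathcal{A},\varphi)$ framework to an $(\mathcal{A},\Phi,\mathcal{B})$ 
framework (Voiculescu 1995), where $\mathcal{A}$ is an algebra 
with unit over $\mathbb{C}$, $\mathcal{B}\ni 1$ is a subalgebra, and 
$\Phi:\mathcal{A}\to\mathcal{B}$ is $\mathcal{B}$-$\mathcal{B}$-bilinear and $\Phi|_{\mathcal{B}}=\mathrm{id}_{\mathcal{B}}$. Then the 
definition of $\mathcal{B}$-freeness (or free independence with 
amalgamation over $\mathcal{B}$) of a family of subalgebras 
$(\mathcal{A}_i)_{i\in I}$, $\mathcal{B}\subset\mathcal{A}_i\subset\mathcal{A}$, requires that 

$$\Phi(a_1\cdots a_n) = 0$$

whenever $a_j\in\mathcal{A}_{i_j}$, $i_j\neq i_{j+1}$ $(1\le j<n)$, and $\Phi(a_j)=0$. 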

In the case of a unital *-algebra of bounded 
operators $M$ with an expectation functional 
$\tau(\cdot)=\langle\cdot\,\xi,\xi\rangle$ which is tracial (i.e., $\tau([m_1,m_2])=0$ if 
$m_1,m_2\in M$) and given a subalgebra $1\in N\subset M$, as 
in the classical theory, there is a certain canonical 
construction in operator algebra theory of a “conditional 
expectation” $\Phi:\overline{M}\to\overline{N}$, where $\overline{M},\overline{N}$ are 
algebras of operators obtained as completion-
separates from $M$ and $N$. With this construction, in 
the trace-state setting there is complete analogy with 
the classical notion of conditional independence. 

Several other constructions of free probability 
have been extended to the $(\mathcal{A},\Phi,\mathcal{B})$ $\mathcal{B}$-valued 
context. 

A group-theoretic example similar to Example 1 
can be constructed from a group $G$ which is a free 
product with amalgamation over a subgroup $H\subset G$ 
of subgroups $H\subset G_i\subset G$, $i\in I$. Then $\mathcal{A}$ is the 
algebra constructed from the left-regular representation 
of $G$, whereas $\mathcal{B}$ is an algebra constructed from 
the left-regular representation of $H$. 


The Semicircle Law 


In free probability the semicircle law appears as the 
limit law in the free central limit theorem 
(Voiculescu 1985). Here is a weak, rather algebraic, 
version of this fact: 

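If $(a_n)_{n\in\mathbb{N}}$ are freely independent in $(\mathcal{A},\varphi)$ and 
satisfy the conditions that 

$$\varphi(a_n) = 0 \quad (n\in\mathbb{N})$$

$$\lim_{N\to\infty} N^{-1}\sum_{1\le n\le N}\varphi(a_n^2) = 1$$

$$\sup_{n\in\mathbb{N}}\,|\varphi(a_n^k)| = C_k < \infty \quad (k\in\mathbb{N})$$

then, if $S_N=N^{-1/2}\sum_{1\le n\le N}a_n$, we have the convergence 
of moments of the distribution of $S_N$ to the 
semicircle distribution 

$$\lim_{N\to\infty}\varphi(S_N^k) = (2\pi)^{-1}\int_{-2}^{2} t^k\,(4-t^2)^{1/2}\,dt$$

Thus, the semicircle law, given by the density 
$(2\pi)^{-1}(4-t^2)^{1/2}$ on $[-2,2]$, is the free analog of the 
(0,1) Gauss law. 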
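
The moments of this density can be checked numerically. The sketch below is an illustration, not part of the original text: the function names and the choice of a midpoint rule are assumptions. It compares the integral $(2\pi)^{-1}\int_{-2}^{2}t^k(4-t^2)^{1/2}dt$ with the Catalan numbers, which are known to give the even moments of the semicircle law; the odd moments vanish by symmetry.

```python
import math


def semicircle_moment(k, n=20000):
    """Midpoint-rule approximation of (2*pi)^-1 * int_{-2}^{2} t^k sqrt(4 - t^2) dt."""
    h = 4.0 / n
    total = 0.0
    for i in range(n):
        t = -2.0 + (i + 0.5) * h
        total += t ** k * math.sqrt(4.0 - t * t)
    return total * h / (2.0 * math.pi)


def catalan(m):
    """m-th Catalan number."""
    return math.comb(2 * m, m) // (m + 1)


# moments[k] approximates the k-th moment of the semicircle law
moments = [semicircle_moment(k) for k in range(7)]
```

In particular the second moment is 1, which is the sense in which this law plays the role of the (0,1) Gauss law.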

Two coincidences involving the semicircle law 
should be noted. 

The field operators $s(h)=2^{-1}(l(h)+l(h)^*)$ on the 
Boltzmann Fock space (Example 2) have semicircle 
distributions with respect to the vacuum expectation 
$\varepsilon(\cdot)=\langle\cdot\,1,1\rangle$. It turns out that this goes further: if 
$\mathcal{H}=\mathcal{H}_{\mathbb{R}}\otimes_{\mathbb{R}}\mathbb{C}$ is the complexification of a real 
Hilbert space, then the map $\mathcal{H}_{\mathbb{R}}\ni h\mapsto s(h)$ is the 
analog in free probability of the Gaussian process 
over the Hilbert space $\mathcal{H}_{\mathbb{R}}$ (Voiculescu 1985). It is 
often called the semicircular process over $\mathcal{H}_{\mathbb{R}}$. This 
points to an important connection of free probability 
to the full Boltzmann statistics. 

The other coincidence is that the semicircle law is 
well known as the Wigner limit distribution of 
eigenvalues of large Gaussian random matrices. As 
we shall see, this is a clue to a deep connection of 
free probability to the large-N limit of random 
matrices (Voiculescu 1991). 


Free Convolution Operations 


In classical probability theory, the distribution of 
the sum of two independent random variables is 
computed by the convolution product of their 
distributions. This has a free probability analog. 
If $a,b$ are free random variables in $(\mathcal{A},\varphi)$ with 
distributions $\mu_a,\mu_b:\mathbb{C}[X]\to\mathbb{C}$, then the joint 
distribution $\mu_{\{a,b\}}$ is completely determined by 
$\mu_a,\mu_b$ and in particular $\mu_{a+b}$, the distribution of $a+b$, 
also depends only on $\mu_a,\mu_b$. It follows that there 
is an additive free convolution operation $\boxplus$ on 
distributions so that $\mu_a\boxplus\mu_b=\mu_{a+b}$ whenever $a,b$ are 
free (Voiculescu 1985). The same can be done with 
multiplication replacing addition, and this defines the 
multiplicative free convolution operation $\boxtimes$ by 
the equation $\mu_a\boxtimes\mu_b=\mu_{ab}$ when $a,b$ are free 
(Voiculescu 1985). A slightly surprising feature of $\boxtimes$ 
is that in spite of noncommutativity of $a$ and $b$, 
the multiplicative operation $\boxtimes$ turns out to be 
commutative, which of course is obvious for $\boxplus$. 

In the classical context, convolutions are bilinear 
operations which can be computed using integrals. 
The free convolutions are quite nonlinear and their 
computation is via another route, which can also be 
explained by a classical analogy. Classically, the 
logarithm of the Fourier transform linearizes con- 
volution, that is, 


$$\log\mathcal{F}(\mu*\nu) = \log\mathcal{F}(\mu) + \log\mathcal{F}(\nu)$$

and we may compute $\mu*\nu$ as the $(\log\mathcal{F})^{-1}$ of 
$\log\mathcal{F}(\mu)+\log\mathcal{F}(\nu)$. The linearizing transform for $\boxplus$ 
is the R-transform (Voiculescu 1986), which is 
obtained by the following procedure. 

If $\mu:\mathbb{C}[X]\to\mathbb{C}$ is a distribution, let $G_\mu(z)=z^{-1}+ 
\sum_{n\ge 1}\mu(X^n)z^{-n-1}$, which, in case $\mu$ is a compactly 
supported probability measure on $\mathbb{R}$, is the Laurent 
series at $\infty$ of the Cauchy transform 

$$\int_{\mathbb{R}}\frac{d\mu(t)}{z-t}$$

From this, one obtains, by inversion at $\infty$, the series 
$K_\mu$, so that $G_\mu(K_\mu(z))=z$, and one defines 
$R_\mu(z)=K_\mu(z)-z^{-1}$, which is a power series in $z$. Then 

$$R_{\mu\boxplus\nu} = R_\mu + R_\nu$$

In case the distribution corresponds to a measure, 
the formal inversion amounts to inverting an 
analytic function. 

For the multiplicative operation $\boxtimes$, it is more 
convenient to describe an analog of the Mellin transform, 
that is, no logarithm will be taken. This is the 
S-transform (Voiculescu 1991), obtained as follows. 

If $\mu:\mathbb{C}[X]\to\mathbb{C}$ is a distribution with $\mu(X)\neq 0$, 
one forms $\psi_\mu(z)=\sum_{n\ge 1}\mu(X^n)z^n$ and its inverse $\chi_\mu$, 
so that $\psi_\mu(\chi_\mu(z))=z$. Then 

$$S_\mu(z) = z^{-1}(1+z)\,\chi_\mu(z)$$

has the property that 

$$S_{\mu\boxtimes\nu} = S_\mu S_\nu$$

The free central limit theorem can be easily 
proved using the R-transform. Another easy application 
of the R-transform is to find the free analog of 
the Poisson law, that is, 

$$\lim_{n\to\infty}\big((1-a/n)\,\delta_0 + (a/n)\,\delta_1\big)^{\boxplus n}$$

where $a>0$. The free Poisson law is 

$$\mu = \begin{cases}(1-a)\,\delta_0+\nu & \text{if } 0<a\le 1\\ \nu & \text{if } a>1\end{cases}$$

where $\nu$ has support $[(1-a^{1/2})^2,(1+a^{1/2})^2]$ and 
density $(2\pi t)^{-1}\big(4a-(t-(1+a))^2\big)^{1/2}$. This distribution 
is well known in random matrix theory as the 
Marchenko-Pastur distribution, again a coincidence 
pointing to a random matrix theory connection. 
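
The following sketch, added here for illustration, checks the density above by numerical integration. The function names, the midpoint rule and the sample values of $a$ are assumptions of this example. It uses two standard facts: the total mass of $\nu$ is $\min(a,1)$ (the rest, for $a<1$, sits in the atom at 0), and the free Poisson law has mean $a$ and second moment $a+a^2$.

```python
import math


def nu_density(t, a):
    """Density (2*pi*t)^-1 * sqrt(4a - (t - (1 + a))^2) of nu on its support."""
    inside = 4.0 * a - (t - (1.0 + a)) ** 2
    if inside <= 0.0 or t <= 0.0:
        return 0.0
    return math.sqrt(inside) / (2.0 * math.pi * t)


def nu_moment(k, a, n=50000):
    """Midpoint-rule approximation of the k-th moment of nu (k = 0 gives its mass)."""
    lo = (1.0 - math.sqrt(a)) ** 2
    hi = (1.0 + math.sqrt(a)) ** 2
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        total += t ** k * nu_density(t, a)
    return total * h


# a = 2: no atom at 0, so nu is a probability measure
mass_2, mean_2, second_2 = (nu_moment(k, 2.0) for k in range(3))
# a = 0.5: nu carries mass a, the atom at 0 carries 1 - a
mass_half, mean_half, second_half = (nu_moment(k, 0.5) for k in range(3))
```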

Because probability measures on $\mathbb{R}$ are distributions 
of self-adjoint operators and a sum of self-adjoint 
operators is again such an operator, the 
additive free convolution yields an operation on 
probability measures on $\mathbb{R}$. Similarly, it can be 
shown that $\boxtimes$ gives rise to operations on probability 
measures on $\{z\in\mathbb{C}\mid |z|=1\}$ and on probability 
measures on $[0,\infty)$. 

With the R-transform machinery at hand, the free 
analogs of many of the classical results around 
addition of independent random variables have been 
developed (we recommend Voiculescu (1998c) for a 
survey of these developments). This includes the 
classification of infinitely divisible laws (Lévy-
Khintchine type theorem), classification of stable 
laws, domains of attraction, and convolution semi- 
groups. Note that the free laws are rather different 
from the classical ones, but the classification results 
are quite parallel, that is, the indexing parameters 
are almost the same. The situation is similar in the 
multiplicative context. As in the classical case, these 
results about laws yield in particular processes with 
independent increments, which in the free frame- 
work are free increments. 

As in the classical setting, also in the free setting, 
convolution semigroups are connected to differential 
equations. In the additive free case, a semigroup is a 
family $(\mu_t)_{t\ge 0}$ of probability measures on $\mathbb{R}$, so that 
$\mu_{t+s}=\mu_t\boxplus\mu_s$. If $G(t,z)$ is the Cauchy transform of $\mu_t$ 
(which is an analytic function on the half-plane 
$\operatorname{Im} z>0$), the equation (Voiculescu 1986) is a 
semilinear complex PDE: 

$$\frac{\partial G}{\partial t} + R_{\mu_1}(G)\,\frac{\partial G}{\partial z} = 0$$

where $R_{\mu_1}$ is the R-transform of $\mu_1$. In particular, 
when $\mu_1$ is the semicircle law, $R_{\mu_1}(z)=az$, $a>0$, and 
the PDE is a complex Burgers equation in the upper 
half-plane. 
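
As a sanity check of the semicircular case with $a=1$, one can take the closed form $G(t,z)=\big(z-\sqrt{z^2-4t}\big)/(2t)$ for the Cauchy transform of the semicircle law of variance $t$ and verify the Burgers equation by finite differences. This is a sketch under stated assumptions: the closed form and its branch (chosen so that $G\sim 1/z$ at infinity) are standard but are not written out in the text above, and the function names and step size are choices made for this example.

```python
import cmath


def G(t, z):
    """Cauchy transform of the semicircle law of variance t, for Im z > 0.

    Written as z * sqrt(1 - 4t/z^2) so that the principal square root
    gives the branch with G(t, z) ~ 1/z as z -> infinity.
    """
    return (z - z * cmath.sqrt(1.0 - 4.0 * t / (z * z))) / (2.0 * t)


def burgers_residual(t, z, h=1e-5):
    """Central-difference estimate of dG/dt + G * dG/dz (R(G) = G when a = 1)."""
    dG_dt = (G(t + h, z) - G(t - h, z)) / (2.0 * h)
    dG_dz = (G(t, z + h) - G(t, z - h)) / (2.0 * h)
    return dG_dt + G(t, z) * dG_dz


residual = abs(burgers_residual(1.0, 1.0 + 1.0j))
```

The residual is zero up to discretization and rounding error, in agreement with the algebraic relation $tG^2-zG+1=0$ satisfied by this $G$.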














Noncrossing Partitions 


The series expansion of the R-transform 

$$R_\mu(z) = \sum_{n\ge 0} R_{n+1}(\mu)\,z^n$$

has as coefficients polynomials $R_n(\mu)$ in the 
moments $\mu(X^k)$. More precisely, assigning to $\mu(X^k)$ 
a degree $k$, $R_n(\mu)$ is a polynomial of degree $n$ and 
$R_n(\mu)-\mu(X^n)$ = polynomial in $\mu(X^k)$ with $k<n$. 
The linearization property of the R-transform 
implies that 

$$R_n(\mu\boxplus\nu) = R_n(\mu) + R_n(\nu)$$
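
The passage from moments to the coefficients $R_n(\mu)$ and back can be sketched in a few lines. The code below is illustrative: it uses the relation $M(z)=1+\sum_{s\ge 1}R_s\,z^s M(z)^s$ for the moment series $M(z)=\sum_n\mu(X^n)z^n$, which is a standard reformulation of $G_\mu(K_\mu(z))=z$ and is an assumption of this example rather than a formula from the text; all function names are hypothetical.

```python
def poly_mul(p, q, deg):
    """Product of two truncated power series (coefficient lists), kept up to degree deg."""
    out = [0] * (deg + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= deg:
                out[i + j] += a * b
    return out


def free_cumulants(m):
    """m[0] = 1, m[k] = k-th moment. Returns r with r[k] = R_k (r[0] is unused)."""
    n = len(m) - 1
    r = [0] * (n + 1)
    for k in range(1, n + 1):
        acc = m[k]
        power = [1] + [0] * n              # M(z)^0
        for s in range(1, k):
            power = poly_mul(power, m, n)  # M(z)^s
            acc -= r[s] * power[k - s]
        r[k] = acc
    return r


def moments_from_cumulants(r):
    """Inverse of free_cumulants: rebuild the moments from r[1], ..., r[n]."""
    n = len(r) - 1
    m = [1] + [0] * n
    for k in range(1, n + 1):
        power = [1] + [0] * n
        total = 0
        for s in range(1, k + 1):
            power = poly_mul(power, m, n)  # only uses m[0..k-1] in degree k - s
            total += r[s] * power[k - s]
        m[k] = total
    return m


def free_additive_convolution(m_mu, m_nu):
    """Moments of the additive free convolution: add the R_n, then convert back."""
    r_mu, r_nu = free_cumulants(m_mu), free_cumulants(m_nu)
    return moments_from_cumulants([a + b for a, b in zip(r_mu, r_nu)])


semicircle = [1, 0, 1, 0, 2, 0, 5]   # moments 0..6 of the (0,1) semicircle law
r_semicircle = free_cumulants(semicircle)
conv = free_additive_convolution(semicircle, semicircle)
```

For the semicircle law only $R_2$ is nonzero, and the free convolution of two (0,1) semicircle laws is again semicircular, with variance 2.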


For classical convolution, polynomials with similar 
properties satisfying 

$$C_n(\mu*\nu) = C_n(\mu) + C_n(\nu)$$

are called cumulants and satisfy 

$$\log\mu(e^{zX}) = \sum_{n\ge 1} C_n(\mu)\,\frac{z^n}{n!}$$

There are combinatorial formulas involving the 
lattice of all partitions of the set $\{1,\dots,n\}$ which give 
the classical cumulants. For free cumulants, like 
$R_n(\mu)$ and generalizations of these, there are similar 
formulas provided the lattice of all partitions is 
replaced by the lattice $NC(n)$ of noncrossing partitions 
(Speicher 1998). A partition $\pi=(V_1,\dots,V_m)$ of 
$\{1,\dots,n\}$ is noncrossing if there are no $a<b<c<d$ 
so that $\{a,c\}\subset V_k$, $\{b,d\}\subset V_l$ and $k\neq l$. 
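
The noncrossing condition is easy to test by brute force for small $n$. The sketch below is illustrative (function names are hypothetical): it enumerates all partitions of $\{1,\dots,n\}$, keeps the noncrossing ones, and counts them. The counts are known to be the Catalan numbers, the same numbers that give the even moments of the semicircle law.

```python
from itertools import combinations
from math import comb


def set_partitions(items):
    """Yield all partitions of the list `items` as lists of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + partition


def is_noncrossing(partition):
    """True if no a < b < c < d has a, c in one block and b, d in another."""
    for v, w in combinations(partition, 2):
        for a, c in combinations(sorted(v), 2):
            for b, d in combinations(sorted(w), 2):
                if a < b < c < d or b < a < d < c:
                    return False
    return True


def count_noncrossing(n):
    return sum(1 for p in set_partitions(list(range(1, n + 1)))
               if is_noncrossing(p))


counts = [count_noncrossing(n) for n in range(1, 7)]
```

For $n=4$ there are 15 partitions in total and exactly one of them, $\{1,3\},\{2,4\}$, is crossing.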

More generally, a family $R^{(n)}(a_1,\dots,a_n)$ of free 
cumulants, where $a_1,\dots,a_n$ are in some $(\mathcal{A},\varphi)$, is 
defined recursively as follows (Speicher 1998). For 
$n=1$, one has $R^{(1)}(a)=\varphi(a)$. If $\pi=(V_1,\dots,V_m)\in 
NC(n)$, where $V_k=\{i(1,k)<\dots<i(p_k,k)\}$, we 
define 

$$R[\pi](a_1,\dots,a_n) = \prod_{1\le k\le m} R^{(p_k)}\big(a_{i(1,k)},\dots,a_{i(p_k,k)}\big)$$

The recurrence relation for cumulants is then 

$$\varphi(a_1\cdots a_n) = \sum_{\pi\in NC(n)} R[\pi](a_1,\dots,a_n)$$

Note that the right-hand side involves only $R^{(k)}$'s 
with $k\le n$ and that actually $R^{(n)}$ appears only in 
and is equal to $R[(\{1,\dots,n\})](a_1,\dots,a_n)$ (the coarsest 
partition). 

A key property of $R^{(n)}(a_1,\dots,a_n)$ is that if 
$\{1,\dots,n\}=\alpha\sqcup\beta$ with $\alpha,\beta$ nonempty and $(a_k)_{k\in\alpha}$, $(a_l)_{l\in\beta}$ are freely 
independent, then $R^{(n)}(a_1,\dots,a_n)=0$. 
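
Together, the recurrence relation and the vanishing of mixed cumulants determine all joint moments of free variables from their individual cumulants. The following brute-force sketch illustrates this for short words; the function names and the encoding of a word as a string of letters are assumptions of this example, and the enumeration is only practical for small $n$.

```python
from itertools import combinations


def set_partitions(items):
    """Yield all partitions of the list `items` as lists of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition


def is_noncrossing(partition):
    for v, w in combinations(partition, 2):
        for a, c in combinations(sorted(v), 2):
            for b, d in combinations(sorted(w), 2):
                if a < b < c < d or b < a < d < c:
                    return False
    return True


def free_moment(word, cumulants):
    """Moment of a word in freely independent variables.

    word      : string such as "abab", one letter per variable
    cumulants : dict letter -> {k: k-th free cumulant}; missing entries are 0
    """
    total = 0
    for partition in set_partitions(list(range(len(word)))):
        if not is_noncrossing(partition):
            continue
        term = 1
        for block in partition:
            letters = {word[i] for i in block}
            if len(letters) > 1:   # mixed cumulants of free variables vanish
                term = 0
                break
            term *= cumulants[letters.pop()].get(len(block), 0)
        total += term
    return total


# two free (0,1)-semicircular variables: only the second cumulant is nonzero
semi = {"a": {2: 1}, "b": {2: 1}}
m_aaaa = free_moment("aaaa", semi)
m_aabb = free_moment("aabb", semi)
m_abab = free_moment("abab", semi)
```

The alternating word `"abab"` has moment 0 because its only same-letter pairing is crossing, whereas `"aabb"` factorizes.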

If $\mu$ is the distribution of $a\in(\mathcal{A},\varphi)$, then the 
cumulants $R_n(\mu)$ are given by 

$$R_n(\mu) = R^{(n)}(a,\dots,a)$$


The noncrossing condition on partitions corre- 
sponds to a planarity requirement for diagrams and 
as such is very suggestive of connections to planar 
diagrams occurring in the constant term of large-N 
expansions from random matrix theory and more 
generally gauge theory. 

For more details on the subject of noncrossing 
partitions, we refer the reader to the memoir by 
Speicher (1998). 


Asymptotic Freeness of Random 
Matrices 


The explanation for the coincidences between certain 
laws in free probability and in random matrix theory 
is that freeness occurs asymptotically among random 
matrices in the large-N limit (Voiculescu 1991). 

Random matrices can be put in a noncommutative 
probability framework $(\mathcal{A}_N,\varphi_N)$, where 
$\mathcal{A}_N=L^{\infty-0}(\Omega,\mathfrak{M}_N;d\sigma)$ (the $N\times N$ complex matrix-
valued functions on the probability space $(\Omega,d\sigma)$ 
which are $p$-integrable for all $p\in[1,\infty)$) and the 
expectation functional is 

$$\varphi_N(X) = N^{-1}\int_\Omega \operatorname{Tr} X(\omega)\,d\sigma(\omega)$$

The basic example is provided by an n-tuple of 
Gaussian random matrices (Voiculescu 1991). Let 

$$X_j^{(N)} = \big(a^{(N)}_{p,q;j}\big)_{1\le p,q\le N}\in\mathcal{A}_N, \qquad 1\le j\le n$$

where $a^{(N)}_{p,q;j}=\overline{a^{(N)}_{q,p;j}}$ and the entries with 
$1\le p\le q\le N$, $1\le j\le n$ are $(0,N^{-1})$-Gaussian and independent. 
Then $(X_j^{(N)})_{1\le j\le n}$ as $N\to\infty$ converges in noncommutative 
distribution to the freely independent n-tuple 
$(l(e_j)+l^*(e_j))_{1\le j\le n}$ in the Boltzmann Fock space 
context of Example 2 for an orthonormal system 
$e_1,\dots,e_n\in\mathcal{H}$, that is, convergence of moments: 

$$\lim_{N\to\infty}\varphi_N\big(X_{i_1}^{(N)}\cdots X_{i_k}^{(N)}\big) = \big\langle (l(e_{i_1})+l^*(e_{i_1}))\cdots(l(e_{i_k})+l^*(e_{i_k}))1,1\big\rangle$$

In particular, the limit variables $(l(e_j)+l^*(e_j))_{1\le j\le n}$ 
are free. 
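
A small simulation can make the convergence of moments plausible. The sketch below is an illustration under explicit assumptions: the normalization of the entries (diagonal entries real of variance $1/N$, off-diagonal entries complex with $E|a_{p,q}|^2=1/N$) is one common choice and is not taken from the text, the matrix size and seed are arbitrary, and plain Python lists are used to keep the example dependency-free. For two free (0,1)-semicircular variables $x,y$ the limits are $\varphi(x^2)=1$, $\varphi(x^4)=2$, $\varphi(x^2y^2)=1$ and $\varphi(xyxy)=0$.

```python
import random


def hermitian_gaussian(n, rng):
    """Random Hermitian n x n matrix with E|a_pq|^2 = 1/n (assumed normalization)."""
    a = [[0j] * n for _ in range(n)]
    s = (2.0 * n) ** -0.5
    for p in range(n):
        a[p][p] = complex(rng.gauss(0.0, n ** -0.5), 0.0)
        for q in range(p + 1, n):
            z = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
            a[p][q], a[q][p] = z, z.conjugate()
    return a


def matmul(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def phi(a):
    """Normalized trace N^-1 Tr(a) for a single sample (no averaging over samples)."""
    return sum(a[i][i] for i in range(len(a))).real / len(a)


rng = random.Random(0)
N = 60
X, Y = hermitian_gaussian(N, rng), hermitian_gaussian(N, rng)
X2, Y2, XY = matmul(X, X), matmul(Y, Y), matmul(X, Y)
m_x2 = phi(X2)                 # close to 1
m_x4 = phi(matmul(X2, X2))     # close to 2
m_xxyy = phi(matmul(X2, Y2))   # close to 1
m_xyxy = phi(matmul(XY, XY))   # close to 0
```

At this matrix size the agreement is only approximate, with deviations of order $1/N$; classically independent commuting variables would instead give $\varphi(xyxy)=\varphi(x^2y^2)$.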

More generally, asymptotic freeness of variables 
or sets of variables in $(\mathcal{A}_N,\varphi_N)$ can be defined 
without the existence of a limit distribution, that is, 
by requiring only that the freeness relations among 
noncommutative moments hold asymptotically as 
$N\to\infty$. 

Note that in these random matrix questions, the 
joint classical distribution of an m-tuple of random 
matrices $(X_1^{(N)},\dots,X_m^{(N)})$ in $\mathcal{A}_N$ is a probability 
measure on $(\mathfrak{M}_N)^m$ which contains more information 
than the collection of noncommutative 
moments, which is the distribution of the noncommutative 
variables in $(\mathcal{A}_N,\varphi_N)$. In particular, for one 
random matrix the classical distribution gives the 
joint distribution of all entries, whereas the noncommutative 
distribution gives information only 
about the distribution of eigenvalues. 

From the Gaussian n-tuple, using operator techniques, 
much more general asymptotic freeness results 
have been obtained. For instance (Voiculescu 1998b): 

Let $(X_1^{(N)},\dots,X_m^{(N)},Y_1^{(N)},\dots,Y_n^{(N)})$ be $(m+n)$-
tuples of self-adjoint $N\times N$ random matrices with 
classical joint distribution $\mu_N$ on $(\mathfrak{M}_N^{sa})^{m+n}$. Assume 
that $\mu_N$ is invariant under the action of the unitary 
group $U(N)$ which takes $(X_1,\dots,X_m,Y_1,\dots,Y_n)$ 
into $(X_1,\dots,X_m,UY_1U^*,\dots,UY_nU^*)$ and assume 
that there is a bound $R$ on the operator norms 
$\|X_i^{(N)}\|$ and $\|Y_j^{(N)}\|$ independent of $N$. Then the 
sets $\{X_1^{(N)},\dots,X_m^{(N)}\}$ and $\{Y_1^{(N)},\dots,Y_n^{(N)}\}$ are asymptotically 
free as $N\to\infty$. 

Note that the uniform bound on the operator 
norms can be easily replaced by weaker conditions. 

Once we know that certain random matrices are 
asymptotically free and that the large-N limit in 
noncommutative distribution exists, the results of 
free probability apply. For instance, if $X^{(N)}$ and $Y^{(N)}$ 
are asymptotically free and have limit distributions 
$\mu$ and $\nu$, then the limit distributions of $X^{(N)}+Y^{(N)}$ 
and of $X^{(N)}Y^{(N)}$ are the free convolutions $\mu\boxplus\nu$ and, 
respectively, $\mu\boxtimes\nu$. 

Free probability techniques have also been suc- 
cessful in dealing with other questions about the 
asymptotic behavior of random matrices. 

If $(X_1^{(N)},\dots,X_n^{(N)})$ is an n-tuple of i.i.d. Hermitian 
Gaussian random matrices, then the uniform operator norms 
of polynomials in noncommutative indeterminates 
have the property that 

$$\lim_{N\to\infty}\big\|P\big(X_1^{(N)},\dots,X_n^{(N)}\big)\big\| = \big\|P\big(l(e_1)+l(e_1)^*,\dots,l(e_n)+l(e_n)^*\big)\big\|$$

almost surely (Haagerup and Thorbjørnsen). 

This result is a far-reaching generalization of the 
results about largest eigenvalues of one Gaussian 
random matrix. The use of operator-valued free 
random variables (with respect to a certain subalgebra) 
was an essential ingredient in the proof. Also, in 
another direction, freeness of operator-valued free 
random variables was used to obtain a free prob- 
ability treatment of Gaussian random band matrices 
and generalizations of these (Shlyakhtenko 1996). 

Finally, quite recently, extensions of the free 
probability framework have appeared which are 
adapted to the study of fluctuations of systems of 
random matrices in the large-N limit. 


Free Entropy 


There are free probability analogs also for information- 
theoretic quantities (Voiculescu 1994, 1998a). 

Let $(f_1,\dots,f_n)$ be an n-tuple of classical numerical 
random variables the joint distribution of which has 
density $p(t_1,\dots,t_n)$ with respect to the n-dimensional 
Lebesgue measure $\lambda_n$ on $\mathbb{R}^n$. The entropy quantity 
associated by Shannon to $(f_1,\dots,f_n)$ is 

$$H(f_1,\dots,f_n) = -\int p\log p\,d\lambda_n$$

The free analog of $H(f_1,\dots,f_n)$ is the free entropy 
quantity $\chi(X_1,\dots,X_n)$. Here $X_j$, $1\le j\le n$, are 
noncommutative self-adjoint random variables in 
$(M,\tau)$, where $M$ is a *-algebra of bounded operators 
on a Hilbert space $\mathcal{H}$. The expectation functional, in 
addition to the positivity properties, equivalent to 
the requirement that it can be defined by a unit 
vector, $\tau(\cdot)=\langle\cdot\,\xi,\xi\rangle$, also has the property of a trace, 
$\tau(XY)=\tau(YX)$ for all $X,Y\in M$. For instance, the 
noncommutative random variables arising from the 
large-N limit of n-tuples of self-adjoint random 
matrices live in noncommutative probability frameworks 
$(M,\tau)$ of this kind. 

There are two approaches to defining free entropy 
and, since there are only partial results about the 
equivalence of these approaches, the quantities 
obtained are denoted by $\chi(X_1,\dots,X_n)$ (Voiculescu 
1994) and $\chi^*(X_1,\dots,X_n)$ (Voiculescu 1998a). The 
quantity $\chi$ is often referred to as the “microstates 
free entropy,” its definition being inspired by the 
Boltzmann formula $S=k\log W$, whereas the other 
entropy, sometimes called “microstates-free free 
entropy,” is obtained via a free probability analog 
of the Fisher information (Voiculescu 1998a). 

The microstates used to define $\chi$ are matricial and 
the reason why this choice produced a quantity with 
the right behavior with respect to free independence 
can be found in the asymptotic freeness properties of 
random matrices. 

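Given $X_j=X_j^*\in M$, $1\le j\le n$, and $m\in\mathbb{N}$, $k\in\mathbb{N}$, 
$\varepsilon>0$, the microstates $\Gamma(X_1,\dots,X_n;m,k,\varepsilon)$ are 
n-tuples $(A_1,\dots,A_n)$ of self-adjoint $k\times k$ matrices, 
such that, for noncommutative moments of order up 
to $m$, we have 

$$\big|k^{-1}\operatorname{Tr}(A_{i_1}\cdots A_{i_p}) - \tau(X_{i_1}\cdots X_{i_p})\big| < \varepsilon$$

where $1\le p\le m$, $1\le i_j\le n$ and $1\le j\le p$. 

One obtains $\chi(X_1,\dots,X_n)$ by taking the infimum 
over $\varepsilon>0$ and $m\in\mathbb{N}$ of 

$$\limsup_{k\to\infty}\Big(k^{-2}\log\operatorname{vol}\Gamma(\dots) + \frac{n}{2}\log k\Big)$$

where vol is the volume on $(\mathfrak{M}_k^{sa})^n$ corresponding to 
the Hilbert-Schmidt norm Hilbert space structure 
(Voiculescu 1994). 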

When $n=1$, there is a simple formula for $\chi(X)$. If 
$\mu$ is the probability measure on $\mathbb{R}$ which represents 
the distribution of $X=X^*\in M$ with respect to the 
expectation $\tau$, then 

$$\chi(X) = \iint\log|s-t|\,d\mu(s)\,d\mu(t) + C$$

where the exact value of the constant $C$ is 
$3/4+\tfrac{1}{2}\log 2\pi$. 
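
As a numerical illustration of the one-variable formula, the sketch below approximates the logarithmic energy of the (0,1) semicircle law and adds the constant $C$. The function names and the quadrature on two staggered grids (used so that $s\neq t$ at every node, which sidesteps the logarithmic singularity) are choices made for this example. The comparison value $\tfrac{1}{2}\log(2\pi e)$ for a (0,1)-semicircular variable is a known result that is not stated in the text above.

```python
import math


def semicircle_density(t):
    """Density (2*pi)^-1 * sqrt(4 - t^2) of the (0,1) semicircle law."""
    return math.sqrt(max(4.0 - t * t, 0.0)) / (2.0 * math.pi)


def log_energy(n=400):
    """Approximate the double integral of log|s - t| against mu x mu."""
    h = 4.0 / n
    s_nodes = [-2.0 + (i + 0.5) * h for i in range(n)]   # cell midpoints
    t_nodes = [-2.0 + j * h for j in range(n + 1)]       # cell boundaries
    ps = [semicircle_density(s) for s in s_nodes]
    pt = [semicircle_density(t) for t in t_nodes]
    total = 0.0
    for s, p in zip(s_nodes, ps):
        for t, q in zip(t_nodes, pt):
            total += math.log(abs(s - t)) * p * q
    return total * h * h


energy = log_energy()
chi = energy + 0.75 + 0.5 * math.log(2.0 * math.pi)
```

The value obtained has the same form as the Shannon entropy $\tfrac{1}{2}\log(2\pi e)$ of a (0,1) Gaussian variable.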

For $n>1$ there is no simple formula for 
$\chi(X_1,\dots,X_n)$, but there are several properties 
which provide a better understanding of this 
quantity. 

If $X_j$ are such that $\chi(X_j)>-\infty$, then 

$$\chi(X_1,\dots,X_n) = \chi(X_1)+\dots+\chi(X_n)$$

if and only if $X_1,\dots,X_n$ are freely independent in 
$(M,\tau)$. Clearly, this property of $\chi$ with respect to 
free independence is analogous to the property of 
$H(f_1,\dots,f_n)$ with respect to classical independence. 

Further, if $F_1,\dots,F_n$ are power series in $n$ 
noncommuting indeterminates, there is a change-
of-variable formula 

$$\chi\big(F_1(X_1,\dots,X_n),\dots,F_n(X_1,\dots,X_n)\big) = \log|\det|\big(\mathcal{J}(F)\big) + \chi(X_1,\dots,X_n)$$

involving the Kadison-Fuglede positive determinant 
$|\det|$ and a certain noncommutative Jacobian 
$\mathcal{J}(F)$, $F=(F_1,\dots,F_n)$, defined in $\mathfrak{M}_n\otimes M\otimes M^{op}$, 
where $M^{op}$ is the opposite algebra of $M$. (For 
definitions and the many technical conditions under 
which this formula holds, see Voiculescu (1994).) 

The free entropy $\chi$ also satisfies semicontinuity, 
subadditivity, and a semicircular bound (analogous 
to the classical Gaussian bound) properties. 

An unexpected feature of $\chi$ is a degeneration of 
convexity. If the trace state $\tau$ is a convex combination 
$\tau=\theta\tau'+(1-\theta)\tau''$, where $\tau',\tau''$ are trace states 
and where $\tau'\neq\tau''$ on the algebra generated by 
$X_1,\dots,X_n$, and $n>1$, then 

$$\chi(X_1,\dots,X_n) = -\infty$$

(for a reference consult the survey Voiculescu 
(2002)). 

With the free entropy at hand, an important 
variational problem can be formulated for the 
noncommutative distribution of an n-tuple of self-
adjoint noncommutative random variables 
$T_1,\dots,T_n$ in the tracial context. The quantity to 
be maximized is 

$$\chi(T_1,\dots,T_n) - \tau\big(P(T_1,\dots,T_n)\big)$$

where $P$ is a given self-adjoint polynomial in 
noncommutative indeterminates (see Voiculescu 
(2002) for comments on this problem). If $n=1$, 
this is a classical problem for the logarithmic energy 

$$\iint\log|s-t|\,d\mu(s)\,d\mu(t) - \int P(t)\,d\mu(t)$$

where $\mu$ is a probability measure on $\mathbb{R}$. 

To explain the second approach, based on Fisher 
information, we begin by recalling some facts about 
Fisher information in the classical context. 

If $f$ is a numerical random variable with distribution 
given by the density $p(t)$ on $\mathbb{R}$, then 

$$\mathrm{Fisher}(f) = \Big\|\Big(\frac{d}{dt}\Big)^{*}1\Big\|^2_{L^2(\mathbb{R},p\,dt)}$$

Here $d/dt$ is the differential operator defined on test 
functions in $L^2(\mathbb{R},p\,dt)$. Then 

$$\Big(\frac{d}{dt}\Big)^{*}1 = -\frac{p'}{p}$$

The classical connection to entropy is that the Fisher 
information is a derivative of the entropy when the 
variable becomes the starting point of a Brownian 
motion. This can be written as 

$$\mathrm{Fisher}(f) = 2\,\frac{d}{d\varepsilon}H\big(f+\varepsilon^{1/2}g\big)\Big|_{\varepsilon=0}$$

where $g$ and $f$ are independent and $g$ is (0,1) 
Gaussian. 

The several-variables version is treated by using 
partial derivatives. 

The analog in free probability of the Fisher 
information (Voiculescu 1998a) is obtained by 
using the free difference quotient derivations, 
which are the appropriate derivations in this 
maximally noncommutative setting. On the polynomials 
in $n$ noncommutative indeterminates, the 
kth partial free difference quotient 

$$\partial_k:\mathbb{C}\langle X_1,\dots,X_n\rangle\to\mathbb{C}\langle X_1,\dots,X_n\rangle\otimes\mathbb{C}\langle X_1,\dots,X_n\rangle$$

is defined on noncommutative monomials by the 
formula 

$$\partial_k X_{i_1}\cdots X_{i_p} = \sum_{\{j\,\mid\, i_j=k\}} X_{i_1}\cdots X_{i_{j-1}}\otimes X_{i_{j+1}}\cdots X_{i_p}$$
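
The action of $\partial_k$ on a monomial is purely combinatorial: each occurrence of $X_k$ is removed and the word is split at that position. A minimal sketch follows, in which a monomial is encoded as a tuple of indices and a tensor $u\otimes v$ as a pair of tuples, with the empty tuple standing for the unit; this encoding and the function name are assumptions made for illustration.

```python
def free_difference_quotient(word, k):
    """Apply the k-th free difference quotient to the monomial X_{i_1} ... X_{i_p}.

    word : tuple of indices (i_1, ..., i_p)
    Returns a list of (left, right) pairs, one per occurrence of k in word,
    each pair standing for the elementary tensor left (x) right.
    """
    return [(word[:j], word[j + 1:]) for j, i in enumerate(word) if i == k]


# d_1 (X1 X2 X1) = 1 (x) X2 X1 + X1 X2 (x) 1
d1 = free_difference_quotient((1, 2, 1), 1)
# d_2 (X1 X2 X1) = X1 (x) X1
d2 = free_difference_quotient((1, 2, 1), 2)
# d_3 (X1 X2 X1) = 0, since X3 does not occur
d3 = free_difference_quotient((1, 2, 1), 3)
```

Extending this map by linearity to polynomials gives the derivation used in the definition of the free Fisher information.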


If $X_j=X_j^*$, $1\le j\le n$, are noncommutative random 
variables in $(M,\tau)$, which do not satisfy any 
nontrivial algebraic relations, to simplify matters we 
can assume that $M$ is generated by $X_1,\dots,X_n$ and 
identify $M$ with $\mathbb{C}\langle X_1,\dots,X_n\rangle$. The trace state $\tau$ 
gives rise to a scalar product $\langle m_1,m_2\rangle=\tau(m_2^*m_1)$ on 
$M$. Let $L^2(M,\tau)$ denote the Hilbert space obtained 
from $M$. Then, skipping some technicalities, $\partial_k$ will 
give rise to a densely defined operator of $L^2(M,\tau)$ 
into $L^2(M,\tau)\otimes L^2(M,\tau)$. If $1\otimes 1$ is in the domain of 
the adjoints $\partial_k^*$, the free Fisher information of the 
n-tuple $X_1,\dots,X_n$ is defined to be 

$$\Phi^*(X_1,\dots,X_n) = \sum_{1\le k\le n}\big\|\partial_k^*(1\otimes 1)\big\|^2$$

In case $1\otimes 1$ is not in the domain of some $\partial_k^*$, the 
free Fisher information is given the value $\infty$. 
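The “microstates-free free entropy” $\chi^*$ is then 
defined by 

$$\chi^*(X_1,\dots,X_n) = \frac{n}{2}\log 2\pi e + \frac{1}{2}\int_0^\infty\Big(\frac{n}{1+t} - \Phi^*\big(X_1+t^{1/2}S_1,\dots,X_n+t^{1/2}S_n\big)\Big)\,dt$$

where $S_1,\dots,S_n$ are (0,1)-semicircular and freely 
independent and also freely independent of 
$\{X_1,\dots,X_n\}$. 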

For $n=1$ it is known that $\chi^*(X)=\chi(X)$ and the 
free Fisher information is 

$$\Phi^*(X) = \frac{4\pi^2}{3}\int p^3(t)\,dt$$

if $p(t)$ is the density with respect to the Lebesgue 
measure of the distribution of $X$. The computation 
of $\partial^*(1\otimes 1)$ is possible in the one-variable case and up 
to a factor the result is $(Hp)(X)$, where $Hp$ is the 
Hilbert transform of $p$. 

Several of the classical inequalities for the Fisher 
information have free probability analogs 
(Voiculescu 1998a) (Cramér-Rao inequality, Stam 
inequality, information-log-Sobolev inequality, and 
others). 

For $n>1$ only $\chi\le\chi^*$, the easier of the inequalities 
among $\chi$ and $\chi^*$, has been established (Biane 
et al. 2003). This result was obtained based on an 
important connection of $\chi$ and $\chi^*$ to large deviations. 
The deviations studied are for the noncommutative 
distributions of n-tuples of matrices in the case of an 
n-tuple of Gaussian random matrices. In this context 
$\chi$ is related to the quantity to be estimated and $\chi^*$ is 
related to the rate function. 

For more details on free entropy, the reader is 


referred to the survey articles by Voiculescu (1998c, 
2002). 


Concluding Comments 


For more details, additional results, and 
bibliography, we refer the reader to the exposi- 
tions in Voiculescu (1998c), Voiculescu et al. 
(1992) and Speicher (1998). To get even more 
detail, the reader may consult, besides the original 
papers of the present author, those of P Biane, 
R Speicher, D Shlyakhtenko, K J Dykema, A Nica, 
U Haagerup, H Bercovici, L Ge, F Radulescu, 
A Guionnet, T Cabanal-Duvillard, M Anshelevich, 
to name a few of the main contributors. 

Also, via random matrices, there are connections
to physics models (especially large-N 2D Yang-Mills
QCD) in work of I M Singer, M Douglas, D Gross,
R Gopakumar, P Zinn-Justin. In a loose sense, one
may view the noncrossing partitions combinatorics
as related to the work on planar diagrams and the
large-N limit of 't Hooft and Brézin-Itzykson-Parisi-
Zuber in the 1970s.


See also: Large Deviations in Equilibrium Statistical 
Mechanics; Large-N and Topological Strings; Random 
Matrix Theory in Physics. 
Introduction


Functional equations have a long and interesting 
history in connection with mathematical physics and 
touch upon many branches of mathematics. They 
have arisen in the context of both classical and 
quantum completely integrable systems in several 
different ways and we shall survey some of these. 

In the great majority of cases functional equations 
appear in the integrable system setting as the result 
of an ansatz: a particular form of a solution is either 
guessed or postulated, the consistency of which 
yields a functional equation. What the ansatz is for 
can vary significantly. As outlined below, amongst 
others, one may postulate algebraic structures in the 
form of the existence of a Lax pair or of conserved 
quantities; in the quantum setting, one may postu- 
late properties of a ground-state wave function or 
the ring of commuting differential operators. 
Appearing in this way, functional equations are 
really just another of the (significant) tools-of-the- 
trade for constructing and discovering new integr- 
able systems. However, as one surveys both the 
functional equations and the functions they describe 
one sees certain common features. The functions are 
most frequently associated with an elliptic curve, a 
genus-1 abelian variety. One can seek to associate 
these to another fundamental ingredient of modern 
integrable systems, the Baker-Akhiezer function.
Indeed, very few of the ansätze made directly
suggest that the systems being constructed will be 
completely integrable. This very desirable property 
usually is a bonus of the construction and hints of 
more fundamental connections. Another fundamen- 
tal connection we shall mention is that with 
topology. The phase space of a completely integr- 
able system is rather special, admitting (generically) 
a foliation by tori. The functional equations we 
encounter often also characterize the Hirzebruch 
genera associated with the index theorems of known 
elliptic operators. These are typically evaluated by 
Atiyah-Bott fixed-point theorems for circle actions 
on the manifold. A general understanding of the 
various interconnections has yet to be achieved. 

To bring to focus our discussion we shall concen- 
trate on functional equations arising from studying 
systems with an arbitrary number of particles 
(n below). In principle, there could be many different 
interactions between the particles and symmetry will 


be used to limit these. The use of symmetry is a key
ingredient, often implicit, in the various ansätze we
shall describe. For simplicity, we shall most often focus
on the situation where the particles are identical. In
algebraic terms, we focus on the symmetric group $S_n$
and root systems of type $a_n$; generalizations frequently
exist for other root systems and Weyl groups and we
shall simply note this at the outset.


Lax Pairs 


The modern approach to integrable systems is to
utilize a Lax pair, that is, a pair of matrices L, M such
that the zero curvature condition $\dot L=[L,M]$ is
equivalent to the equations of motion. By construc-
tion, Lax pairs produce the conserved quantities $\mathrm{tr}\,L^k$.
To establish integrability, one must further show both 
that there are enough functionally independent con- 
served quantities and that these are in involution. 
(R-matrices are the additional ingredient of the 
modern approach to establishing involutivity.) Lax 
pairs can fail on both counts, and so the construction 
of a Lax pair is but the first step in establishing a 
system to be completely integrable. The great merit of 
the modern approach is that it provides a unified 
framework for treating the many disparate completely 
integrable systems known. Unfortunately the construc- 
tion of a Lax pair is often far from straightforward and 
typically hides the “clever tricks” frequently employed 
in establishing integrability. In the present context, we 
shall outline how functional equations have been used 
to construct Lax pairs. The paradigm for this approach 
is the Calogero-Moser system.
Beginning with the ansatz (for n × n matrices)

$$L_{jk} = p_j\delta_{jk} + g(1-\delta_{jk})A(q_j - q_k)$$

$$M_{jk} = g\Big[\delta_{jk}\sum_{l\ne j} B(q_j - q_l) - (1-\delta_{jk})C(q_j - q_k)\Big]$$

one finds $\dot L=[L, M]$ yields the equations of motion
for the Hamiltonian system ($n \ge 3$)

$$H = \frac12\sum_j p_j^2 + g^2\sum_{j<k} U(q_j - q_k), \qquad U(x) = A(x)A(-x) + \text{constant} \qquad [1]$$

provided $C(x)= -A'(x)$, and that A(x) and B(x)
satisfy the functional equation

$$A(x + y)[B(x) - B(y)] = A(x)A'(y) - A(y)A'(x) \qquad [2]$$


This is a particular example of a more general 
functional equation whose solution will be described 
below. For the present, we simply note that for this
system the corresponding potential is the Weier-
strass ℘-function, $A(x)A(-x) = \wp(\nu) - \wp(x)$, and the
resulting Hamiltonian system [1] is known as the
Calogero-Moser system. It is completely integrable
though, as already remarked, the ansatz did not
necessitate this. The Lax pair presented here and the
reduction of its consistency to a functional equation
and algebraic constraints follow Calogero (1976), in
which he discovered the elliptic generalization of the
model he had introduced in 1975.
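The reduction just described can be checked numerically in the simplest (rational) case. The sketch below assumes the illustrative choice $A(x) = 1/x$, $B(x) = 1/x^2$, $C(x) = -A'(x) = 1/x^2$, for which $U(x) = A(x)A(-x) = -1/x^2$; with real g this is an attractive potential, chosen here only because the algebra is transparent. The coupling g, the sample phase-space point, and all function names are our own assumptions. The code evaluates the residual of the functional equation $A(x+y)[B(x)-B(y)] = A(x)A'(y) - A(y)A'(x)$ and compares $\dot L$, computed from Hamilton's equations, with the commutator $[L, M]$.

```python
def A(x):
    return 1.0 / x


def dA(x):
    return -1.0 / x ** 2


def B(x):
    return 1.0 / x ** 2


def C(x):
    return -dA(x)  # the condition C = -A'


def residual_functional_equation(x, y):
    """A(x+y)[B(x)-B(y)] - (A(x)A'(y) - A(y)A'(x)); zero for a solution."""
    return A(x + y) * (B(x) - B(y)) - (A(x) * dA(y) - A(y) * dA(x))


def lax_pair(q, p, g):
    """L_jk = p_j d_jk + g(1-d_jk)A(q_j-q_k), M_jk = g[d_jk sum_l B - (1-d_jk)C]."""
    n = len(q)
    L = [[0.0] * n for _ in range(n)]
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if j == k:
                L[j][j] = p[j]
                M[j][j] = g * sum(B(q[j] - q[l]) for l in range(n) if l != j)
            else:
                L[j][k] = g * A(q[j] - q[k])
                M[j][k] = -g * C(q[j] - q[k])
    return L, M


def commutator(L, M):
    n = len(L)
    return [[sum(L[j][l] * M[l][k] - M[j][l] * L[l][k] for l in range(n))
             for k in range(n)] for j in range(n)]


def lax_time_derivative(q, p, g):
    """dL/dt along the flow of H = sum p^2/2 + g^2 sum_{j<k} U, U(x) = -1/x^2."""
    n = len(q)
    q_dot = p
    # dU/dx = 2/x^3, so p_dot_j = -g^2 * sum_k 2/(q_j - q_k)^3
    p_dot = [-g * g * sum(2.0 / (q[j] - q[k]) ** 3 for k in range(n) if k != j)
             for j in range(n)]
    return [[p_dot[j] if j == k else g * dA(q[j] - q[k]) * (q_dot[j] - q_dot[k])
             for k in range(n)] for j in range(n)]


q = [0.3, 1.1, 2.0, 3.7]      # sample positions (arbitrary, distinct)
p = [0.5, -1.2, 0.8, 0.1]     # sample momenta (arbitrary)
g = 0.7                       # sample coupling (arbitrary)

L, M = lax_pair(q, p, g)
lhs = lax_time_derivative(q, p, g)
rhs = commutator(L, M)
max_lax_residual = max(abs(lhs[j][k] - rhs[j][k])
                       for j in range(len(q)) for k in range(len(q)))
print(residual_functional_equation(0.4, 1.3), max_lax_residual)
```

Both printed numbers should be at the level of rounding error. Replacing B by a function that does not solve the functional equation makes the off-diagonal entries of the Lax equation fail while the diagonal ones still hold, which is one way to see where the functional equation enters.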
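A different ansatz for a Lax pair is

$$L_{jk} = \dot q_j\delta_{jk} + (1-\delta_{jk})\sqrt{\dot q_j \dot q_k}\,A(q_j - q_k)$$

$$M_{jk} = \delta_{jk}\sum_{l\ne j}\dot q_l B(q_j - q_l) + (1-\delta_{jk})\sqrt{\dot q_j\dot q_k}\,C(q_j - q_k)$$

Now the consistency of the Lax pair yields equations
of motion of the form

$$\ddot q_j = \sum_{k\ne j}\dot q_j\dot q_k V(q_j - q_k), \qquad V(x) = \begin{vmatrix} A(x) & A(-x)\\ C(x) & C(-x)\end{vmatrix} = -V(-x)$$

provided $B(x) = B(-x)$, $C(x) = A'(x) - A(x)G(x)$,
where we have defined $G(x) = B(x) + (1/2)V(x)$,
and the functions satisfy the functional equation

$$A(x+y) = A(x)A(y) + \frac{\begin{vmatrix} A(x) & A(y)\\ A'(x) & A'(y)\end{vmatrix}}{\begin{vmatrix} G(x) & G(y)\\ 1 & 1\end{vmatrix}} = \frac{\begin{vmatrix} A(x) & A(y)\\ C(x) & C(y)\end{vmatrix}}{\begin{vmatrix} G(x) & G(y)\\ 1 & 1\end{vmatrix}} \qquad [3]$$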


Again we shall briefly defer describing the solution
of this equation and simply note that the general
solution for V(x) is again given in terms of the
Weierstrass ℘-function, $V(x)=\wp'(x)/(\wp(\nu) - \wp(x))$,
and that the equations of motion follow from the
Hamiltonian

$$H = \sum_j e^{p_j}\prod_{k\ne j}\sqrt{\wp(\nu) - \wp(q_j - q_k)}$$

This is known as the Ruijsenaars-Schneider model
and it too is completely integrable. The Lax pair
here was constructed by Bruschi and Calogero.
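A rational example makes the conditions attached to this second ansatz concrete. The sketch below assumes the trial functions $A(x) = \gamma/(x+\gamma)$ and $G(x) = \gamma/(x(x+\gamma))$, which we obtained by direct substitution and which are not taken from the references; $\gamma$ is a hypothetical parameter. The code checks the functional equation in the form $A(x+y) = A(x)A(y) + [A(x)A'(y) - A(y)A'(x)]/[G(x) - G(y)]$, verifies that $B = G - V/2$ is even, and compares $V(x) = A(x)C(-x) - A(-x)C(x)$ with $2\gamma^2/(x(\gamma^2 - x^2))$, the form that $\wp'(x)/(\wp(\nu) - \wp(x))$ takes when $\wp(x)$ degenerates to $1/x^2$ and $\nu = \gamma$.

```python
GAMMA = 0.8  # hypothetical parameter of the rational example


def A(x):
    return GAMMA / (x + GAMMA)


def dA(x):
    return -GAMMA / (x + GAMMA) ** 2


def G(x):
    return GAMMA / (x * (x + GAMMA))


def C(x):
    return dA(x) - A(x) * G(x)  # the condition C = A' - A G


def V(x):
    return A(x) * C(-x) - A(-x) * C(x)  # 2x2 determinant, odd in x


def B(x):
    return G(x) - 0.5 * V(x)  # from G = B + V/2


def residuals_functional_equation(x, y):
    """Differences between A(x+y) and the two right-hand forms of the equation."""
    denom = G(x) - G(y)
    first = A(x) * A(y) + (A(x) * dA(y) - A(y) * dA(x)) / denom
    second = (A(x) * C(y) - A(y) * C(x)) / denom
    return abs(A(x + y) - first), abs(A(x + y) - second)


def V_closed_form(x):
    return 2.0 * GAMMA ** 2 / (x * (GAMMA ** 2 - x ** 2))


r1, r2 = residuals_functional_equation(0.37, 1.9)
print(r1, r2, V(0.37) - V_closed_form(0.37), B(0.37) - B(-0.37))
```

All four printed numbers should vanish up to rounding. The sample points must avoid $x = y$ and the poles at $x = 0$, $x = \pm\gamma$ and $x + y = -\gamma$.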

In the two examples of Lax pairs just presented,
each particle interacts with every other pairwise. By
modifying the ansatz, it is possible to construct
models in which particles interact with just their nearest
neighbors (which include the Toda systems). More generally,
an ansatz exists for a Lax pair associated with
equations of motion of the form

$$\ddot q_j = \sum_{k\ne j}(a + b\dot q_j)(a + b\dot q_k)V_{jk}(q_j - q_k) \qquad [4]$$

which unifies, for example, the Calogero-Moser,
Ruijsenaars-Schneider, and Toda systems. The
functional equations now encountered are typically
(and whenever $b \ne 0$) of the form








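$$\phi_1(x+y) = \frac{\begin{vmatrix}\phi_2(x) & \phi_2(y)\\ \phi_3(x) & \phi_3(y)\end{vmatrix}}{\begin{vmatrix}\phi_4(x) & \phi_4(y)\\ \phi_5(x) & \phi_5(y)\end{vmatrix}} \qquad [5]$$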








This functional equation, for five a priori unknown 
functions, includes [2] and [3] as special cases. 

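The general analytic solution of [5] is, up to
symmetries, given by

$$\phi_1(x) = \frac{\Phi(x;\nu_1)}{\Phi(x;\nu_2)}, \qquad \begin{pmatrix}\phi_2(x)\\ \phi_3(x)\end{pmatrix} = \begin{pmatrix}\Phi(x;\nu_1)\\ \Phi'(x;\nu_1)\end{pmatrix}, \qquad \begin{pmatrix}\phi_4(x)\\ \phi_5(x)\end{pmatrix} = \begin{pmatrix}\Phi(x;\nu_2)\\ \Phi'(x;\nu_2)\end{pmatrix} \qquad [6]$$

where

$$\Phi(x;\nu) = \frac{\sigma(\nu - x)}{\sigma(\nu)\sigma(x)}\,e^{\zeta(\nu)x}$$

Here, $\zeta(x) =\sigma'(x)/\sigma(x)$ is the Weierstrass ζ-function.
The solution of [2] arises as the $\nu \to 0$ limit of [5].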
The proof of the general solution just stated
is in fact constructive (Braden and Buchstaber
1997). The parameters appearing in the solution
are determined as follows. Suppose $x_0$ is a generic
point for [5]. Then (for k = 1,2), we have that

$$\partial_y \ln\begin{vmatrix}\phi_{2k}(x+x_0) & \phi_{2k}(y+x_0)\\ \phi_{2k+1}(x+x_0) & \phi_{2k+1}(y+x_0)\end{vmatrix}\Bigg|_{y=0} = \zeta(\nu_k) - \zeta(x) - \zeta(\nu_k - x) - \lambda_k = -\frac{1}{x} - \lambda_k + \sum_{l\ge 0} F_l\,\frac{x^{l+1}}{(l+1)!}$$

The Laurent expansion determines the parameters
$g_2, g_3$ (which are the same for both k = 1,2)
characterizing the elliptic functions of [6] by

$$g_2 = \tfrac{5}{3}(F_2 + 6F_0^2), \qquad g_3 = 6F_0^3 - F_1^2 + \tfrac{5}{3}F_0F_2$$

and the parameters $\nu_k$ via $F_0= -\wp(\nu_k)$. Here,
$\wp(x) = -\zeta'(x)$ is the Weierstrass elliptic ℘-function
with periods $2\omega, 2\omega'$ that satisfies the differential
equation

$$\wp'(x)^2 = 4\wp(x)^3 - g_2\wp(x) - g_3$$


The constructive nature of the solutions of [5] means
that it is straightforward to construct solutions to
various specializations of the equation such as

$$\phi_1(x+y) = \phi_4(x)\phi_5(y) + \phi_4(y)\phi_5(x) \qquad [7]$$

(obtained by requiring $\phi_2(x)=\phi_4^2(x)$ and $\phi_3(x)=\phi_5^2(x)$).
More complicated functional equations such as

$$\Psi_1(x+y) = \Psi_2(x+y)\phi_2(x)\phi_3(y) + \Psi_3(x+y)\phi_4(x)\phi_5(y) \qquad [8]$$

may be solved using the solutions of [5].

Finally, let us note that the general system [4] may
lead to functional equations not just of the form [5],
for example,

$$\phi_6(x+y) = \phi_1(x+y)\big(\phi_2(x) - \phi_3(y)\big) + \begin{vmatrix}\phi_4(x) & \phi_4(y)\\ \phi_5(x) & \phi_5(y)\end{vmatrix} \qquad [9]$$

The general analytic solution to [9] has yet to be
determined although particular solutions are known.


As a final example of a functional equation
coming from an ansatz for a Lax pair, consider

$$L_{jk} = \sqrt{p_jp_k}\,A(q_j - q_k), \qquad M_{jk} = \sqrt{p_jp_k}\,C(q_j - q_k)$$

where we now assume A(0) and C(0) regular. Then
the consistency of this Lax pair corresponds to the
equations of motion for the Hamiltonian

$$H = \sum_{j,k} p_jp_k f(q_j - q_k) \qquad [10]$$

provided f is even and the functional equation

$$2A'(x+y)[f(x) - f(y)] - A(x+y)[f'(x) - f'(y)] = \begin{vmatrix} A(x) & A(y)\\ C(x) & C(y)\end{vmatrix} \qquad [11]$$

is satisfied. The Hamiltonian system [10] corre-
sponds to geodesic motion. Nonanalytic solutions
are known to the functional equation [11].


An Algebraic Ansatz: Conserved 
Quantities 


Another way in which functional equations may 
appear is by making an ansatz for an additional 
conserved quantity beyond the Hamiltonian. For two 
and three particles on the line, Hietarinta derived
functional equations by seeking a second quartic or 
cubic integral (respectively). Here, a key ingredient is 
the assumption of a further invariant polynomial in the 
momenta. Polynomial invariance, together with sym- 
metry, is quite constraining. Consider 


Theorem 1 Let H and P be the (natural) Hamilto-
nian and center of mass momentum

$$H = \frac12\sum_{i=1}^n p_i^2 + V, \qquad P = \sum_{i=1}^n p_i$$

Denote by Q an independent third-order quantity

$$Q = \sum_i p_i^3 + \frac16\sum_{i\ne j\ne k} d_{ijk}\,p_ip_jp_k + \sum_{i\ne j} d_{ij}\,p_i^2p_j + \frac12\sum_{i,j} a_{ij}\,p_ip_j + \sum_i b_i p_i + c$$

If these are $S_n$-invariant and Poisson-commute,

$$\{P, H\} = \{P,Q\} = \{Q, H\} = 0$$

then

$$V = \frac16\sum_{i\ne j}\wp(q_i - q_j) + \text{const.}$$

and we have the Calogero-Moser system.


Here, the symmetric group invariance means that
for any coefficient $\alpha_{ij\dots}(q_1,q_2,\dots,q_n)$ in the expan-
sions above, we have $\alpha_{\sigma(i)\sigma(j)\dots}(q_{\sigma(1)},q_{\sigma(2)},\dots,q_{\sigma(n)}) = \alpha_{ij\dots}(q_1,q_2,\dots,q_n)$ for
all $\sigma \in S_n$. In particular, $V(q_1, q_2,\dots,q_n) = V(q_{\sigma(1)},
q_{\sigma(2)},\dots,q_{\sigma(n)})$ for all $\sigma \in S_n$. We remark that had we
begun with particles of possibly different particle
masses, $H=(1/2)\sum_i p_i^2/m_i +V$, the effect of
$S_n$-invariance is such as to require these masses to
be the same. Thus, we are assuming the $S_n$-invariant
Hamiltonian of the theorem. Finally, by “an
independent third-order quantity” Q, we mean one
functionally independent of H and P and for which
one cannot obtain an invariant of lower degree by
subtracting multiples of $P^3$ and $PH$. We are not
dealing with quadratic conserved quantities here.

The assumed polynomial behavior of the con- 
served quantities means that when calculating 
Poisson brackets, the coefficients of independent 
monomials must vanish. This, together with sym- 
metry, leads to the functional equation 


$$\begin{vmatrix} 1 & 1 & 1\\ F(x) & F(y) & F(z)\\ F'(x) & F'(y) & F'(z)\end{vmatrix} = 0, \qquad x+y+z=0 \qquad [12]$$
The result follows in light of


Theorem 2 Let F be a three-times differentiable
function satisfying the functional equation [12]. Up
to the manifest invariance

$$F(x) \to \alpha F(\delta x) + \beta$$

the solutions of [12] are one of $F(x)=\wp(x+d)$,
$F(x) =e^x$ or $F(x)=x$. Here, ℘ is the Weierstrass
℘-function and 3d is a lattice point of the ℘-function.


Again we note that the ansatz per se has not
established complete integrability: the ansatz leads
us to the Calogero-Moser model whose complete
integrability must be established by other means.
This result may be interpreted as a rigidity theorem
for the $a_n$ Calogero-Moser system and in part
explains this model's ubiquity: demanding a cubic
invariant together with $S_n$-invariance necessitates
the model. A natural generalization is to replace the
$S_n$-invariance with the invariance of a general
Weyl group W and make connection with the
Calogero-Moser models associated to other root
systems (Perelomov 1990).

We shall encounter the functional equation [12]
again in this survey and now note that this may be
generalized to

$$\begin{vmatrix} 1 & 1 & 1\\ F(x) & G(y) & H(z)\\ F'(x) & G'(y) & H'(z)\end{vmatrix} = 0, \qquad x+y+z=0 \qquad [13]$$

If F, G, and H are three-times differentiable func-
tions satisfying the functional equation [13], then,
up to the manifest invariance,

$$F(x) \to \alpha F(\delta x + \gamma_1) + \beta$$
$$G(x) \to \alpha G(\delta x + \gamma_2) + \beta$$
$$H(x) \to \alpha H(\delta x + \gamma_3) + \beta$$

where $\gamma_1 + \gamma_2 + \gamma_3 = 0$, the nonconstant solutions of
[13] are given by $F(x) = G(x) = H(x) = e^x$, $x$, or $\wp(x)$.
If (say) H(z) is a constant then either

1. one of the functions F(x) or G(y) is the same
constant as H(z), in which case the remaining
function is arbitrary, or

2. $F(x) = G(x) = e^x$.


We remark that in fact the exponential and linear 
function solutions satisfy [12] and [13] without the 
constraint x+y+z=0. Further, the theorems 
immediately give the general analytic solutions to 


the same functional equations viewed as functions of 
a complex variable, showing that the solutions are 
in fact meromorphic. These theorems were estab- 
lished in Braden and Byatt-Smith (1999) where 
earlier results are described. 
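The determinantal equation [12], $\det\begin{pmatrix}1&1&1\\F(x)&F(y)&F(z)\\F'(x)&F'(y)&F'(z)\end{pmatrix}=0$ with $x+y+z=0$, is easy to probe numerically. The sketch below evaluates the determinant for the linear and exponential solutions and for $F(x)=1/x^2$, the rational degeneration of the Weierstrass ℘-function (both periods sent to infinity). The function names and sample points are our own choices. It also illustrates the remark above: the exponential and linear solutions need no constraint on z, while the rational one does.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def residual_12(F, dF, x, y, z=None):
    """Value of the determinant in [12]; z defaults to -(x + y)."""
    if z is None:
        z = -(x + y)
    return det3([[1.0, 1.0, 1.0],
                 [F(x), F(y), F(z)],
                 [dF(x), dF(y), dF(z)]])


SOLUTIONS = {
    "linear": (lambda x: x, lambda x: 1.0),
    "exponential": (math.exp, math.exp),
    "rational": (lambda x: 1.0 / x ** 2, lambda x: -2.0 / x ** 3),
}

constrained = {name: residual_12(F, dF, 0.7, 1.9)
               for name, (F, dF) in SOLUTIONS.items()}
unconstrained = {name: residual_12(F, dF, 1.0, 2.0, 3.0)
                 for name, (F, dF) in SOLUTIONS.items()}
print(constrained)
print(unconstrained)
```

With $x+y+z=0$ all three residuals vanish to rounding error. At the unconstrained point $(1,2,3)$ only the rational one is visibly nonzero.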


Quantum Calogero—Moser Systems 


Quite a bit is known about the quantum general- 
izations of the Calogero—Moser system. The poly- 
nomial and Weyl group W-invariance of the 
classical conserved quantities is replaced by a 
commutative ring R of W-invariant, holomorphic, 
differential operators, whose highest-order terms 
generate W-invariant differential operators with 
constant coefficients. The Poisson bracket is then 
replaced by a commutator of operators. When this 
is done functional equations again ensue and one 
finds that the potential term for the Laplacian H 
(the quantum Hamiltonian) has Calogero—Moser 
potential appropriate to W (Oshima and Sekiguchi 
1995). In this setting, it is known that the
commutativity of just a few low-order elements of
R dictates the form of the potential and the
commuting algebra (at least for the classical root
systems). In particular, Theorem 1 above is the classical
analog for the $a_n$ root system of a quantum result where
a functional equation equivalent to [12] was obtained
by requiring the commutativity of certain linear,
quadratic, and cubic holomorphic differential opera-
tors. Taniguchi’s results (Taniguchi 1997) are also
indicative of the rigidity of these quantum models: if H
is the quantum Hamiltonian just discussed, and $Q_{1,2}$
are holomorphic (but not a priori W-invariant)
differential operators of appropriate degrees for
which $[Q_{1,2},H] =0$, then $Q_{1,2} \in R$ and consequently
$[Q_1, Q_2]=0$.


An Algebraic Ansatz: The Poincaré
Algebra


We have earlier encountered the Ruijsenaars-
Schneider models when considering functional
equations ensuing from ansätze for Lax pairs. These
models were however discovered by another route
(Ruijsenaars and Schneider 1986) in the course
of investigating mechanical models obeying the
Poincaré algebra

$$\{H,B\}=P, \qquad \{P,B\}=H, \qquad \{H,P\}=0 \qquad [14]$$
Here, H will be the Hamiltonian of the system
generating time translations, P is a space-translation
generator, and B the generator of boosts. Ruijsenaars
and Schneider began with the ansatz

$$H = \sum_{j=1}^n \cosh p_j \prod_{k\ne j} f(x_j - x_k)$$

$$P = \sum_{j=1}^n \sinh p_j \prod_{k\ne j} f(x_j - x_k)$$

$$B = \sum_{j=1}^n x_j$$


With this ansatz and the canonical Poisson bracket
$\{p_i, x_j\} = \delta_{ij}$, the first two Poisson brackets of [14]
involving the boost operator B are automatically
satisfied. The remaining Poisson bracket is then

$$\{H,P\} = -\frac12\sum_{j=1}^n \partial_j\prod_{k\ne j} f^2(x_j - x_k) - \frac12\sum_{j\ne k}\cosh(p_j - p_k)\prod_{l\ne j} f(x_j - x_l)\prod_{m\ne k} f(x_k - x_m)\,\big(\partial_j \ln f(x_k - x_j) + \partial_k\ln f(x_j - x_k)\big)$$

and for the independent terms proportional to
$\cosh(p_j - p_k)$ to vanish we require that $f'(x)/f(x)$ be
odd. This entails that f(x) is either even or odd
(Ruijsenaars and Schneider assumed the function even)
and in either case $F(x) = f^2(x)$ is even. Supposing that
f(x) is so constrained, then the final Poisson bracket is
equivalent to the functional equation


$$\{H, P\}=0 \iff \sum_{j=1}^n \partial_j\prod_{k\ne j} F(x_j - x_k) = 0 \qquad [15]$$

For n = 3, eqn [15] takes precisely the form [12]
with $F(x)=f^2(x)$. From Theorem 2, the even
solutions then have the form $F(x)=\wp(x)+c$.
This was found by Ruijsenaars and Schneider who
further showed this function satisfies [15] for all n.
The general solution to [15] has recently been
established.


Theorem 3 (Byatt-Smith and Braden 2003). The
general even solution of [15] amongst the class of
meromorphic functions whose only singularities on
the real axis are either a double pole at the origin, or
double poles at np (p real, $n \in \mathbb{Z}$) is:

(i) for all odd n given by the solution of Ruijsenaars
and Schneider while

(ii) for even $n \ge 4$, there are in addition to the
Ruijsenaars-Schneider solutions the following:

$$F_i(z) = \sqrt{(\wp(z) - e_j)(\wp(z) - e_k)} \qquad [16]$$

where i, j, k are a cyclic permutation of 1, 2, 3.


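These functions have simple expressions in terms
of Weierstrass elliptic functions, theta functions, and
the Jacobi elliptic functions (Whittaker and Watson
1927). For example,

$$F_1(z) = \sqrt{(\wp(z) - e_2)(\wp(z) - e_3)} = \frac{\sigma_2(z)\sigma_3(z)}{\sigma^2(z)} = \frac{\theta_1'(0)^2}{4\omega^2}\,\frac{\theta_3(v)\theta_4(v)}{\theta_3(0)\theta_4(0)\,\theta_1^2(v)} = b\,\frac{\mathrm{dn}(u)}{\mathrm{sn}^2(u)}$$

where

$$\sigma_\alpha(z) = \frac{\sigma(z+\omega_\alpha)}{\sigma(\omega_\alpha)}\,e^{-z\zeta(\omega_\alpha)}, \qquad u = \sqrt{e_1 - e_3}\,z,$$

$v=z/2\omega$, $b=e_1 - e_3$ with $\omega_1 =\omega$, $\omega_2= -\omega - \omega'$, and
$\omega_3 =\omega'$. For appropriate ranges of z the solutions
are real. Their degenerations yield all the even
solutions with only a double pole at x = 0 on
the real axis. These degenerations may in fact
coincide with the degenerations of the Ruijsenaars-
Schneider solution.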
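The statement that the Ruijsenaars-Schneider solution satisfies the functional equation $\sum_{j}\partial_j\prod_{k\ne j}F(x_j - x_k)=0$ for every n can be illustrated in the rational degeneration $F(x) = 1/x^2 + c$ of $\wp(x)+c$. The sketch below expands each derivative with the product rule and evaluates the sum at an arbitrary configuration. The constant c, the sample points, and the function names are illustrative assumptions of ours; the new solutions of Theorem 3 are not tested here.

```python
def F(x, c):
    """Rational degeneration of the Ruijsenaars-Schneider solution."""
    return 1.0 / x ** 2 + c


def dF(x):
    return -2.0 / x ** 3


def lhs_15(xs, c):
    """sum_j d/dx_j prod_{k != j} F(x_j - x_k), via the product rule."""
    n = len(xs)
    total = 0.0
    for j in range(n):
        for m in range(n):
            if m == j:
                continue
            # differentiate the factor F(x_j - x_m), keep the others
            term = dF(xs[j] - xs[m])
            for k in range(n):
                if k != j and k != m:
                    term *= F(xs[j] - xs[k], c)
            total += term
    return total


points = [0.0, 0.9, 2.3, 3.1, 4.6, 6.0]   # arbitrary distinct positions
results = {n: lhs_15(points[:n], c=0.4) for n in range(3, 7)}
print(results)
```

Each entry of `results` should vanish to rounding error, for odd and even n alike. A generic even function such as $F(x) = 1/x^4$ does not pass this test, which can be confirmed by swapping in its derivative.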

Thus far, complete integrability has not been
mentioned. The models discovered by Ruijsenaars
and Schneider not only exhibited an action of the
Poincaré algebra but were completely integrable as
well. In particular, Ruijsenaars and Schneider
demonstrated the Poisson commutativity for their
solutions of the light-cone quantities

$$S_{\pm k} = \sum_{I\subset\{1,2,\dots,n\},\ |I|=k} \exp\Big(\pm\sum_{i\in I} p_i\Big)\prod_{i\in I,\ j\notin I} f(x_i - x_j) \qquad [17]$$

Then, $H = (S_1 + S_{-1})/2$ and $P= (S_1 -S_{-1})/2$. (Note
the even/oddness of the functions f(x) means that there
really are only n functionally independent quantities.)
It is an open problem whether the new solutions [16] 
of Theorem 3 yield integrable systems. We know that 
these new solutions do not always yield Poisson 
commuting quantities using the ansatz of Ruijsenaars 
and Schneider, but as yet one cannot rule out other 
Poisson commuting conserved quantities. 


Quantum Ruijsenaars-Schneider Models


Ruijsenaars later investigated the quantum version
of the classical models he and Schneider introduced.
From the outset, he sought operator analogs of the
light-cone quantities [17]. He showed that (for
k = 1, ..., n)

$$S_{\pm k} = \sum_{I\subset\{1,2,\dots,n\},\ |I|=k}\ \prod_{i\in I,\ j\notin I} h(\pm(x_j - x_i))^{1/2}\,\exp\Big(\mp\sqrt{-1}\,\beta\sum_{i\in I}\partial_i\Big)\prod_{i\in I,\ j\notin I} h(\pm(x_i - x_j))^{1/2}$$

pairwise commute if and only if

$$\sum_{I\subset\{1,2,\dots,n\},\ |I|=k}\Big(\prod_{i\in I,\ j\notin I} h(x_j - x_i)\,h(x_i - x_j - \mathrm{i}\beta) - \prod_{i\in I,\ j\notin I} h(x_i - x_j)\,h(x_j - x_i - \mathrm{i}\beta)\Big) = 0 \qquad [18]$$

held for all k and n > 1. Here, β is an arbitrary
positive number and the sum is over all subsets with
k elements. Observe that upon dividing [18] by
β and letting $\beta \to 0$ this yields [15] with $F(x)=
h(x)h(-x)$ when k = 1.

Ruijsenaars found a solution to [18] which has
subsequently been shown to be unique. The general
solution of the functional equation [18] analytic in a
neighborhood of the real axis with either a simple
pole at the origin or an array of such poles at np on
the real axis ($n \in \mathbb{Z}$) is given by

$$h(x) = b\,\frac{\sigma(x+\nu)}{\sigma(x)\sigma(\nu)}\,e^{\alpha x} \qquad [19]$$


This solution is related to the earlier Ruijsenaars-
Schneider solution via

$$\frac{\sigma(x+\nu)\sigma(x-\nu)}{\sigma^2(x)\sigma^2(\nu)} = \wp(\nu) - \wp(x)$$


Geometric Ansatz 


We have already encountered the Hamiltonian 
system [10] corresponding to geodesic motion 
while discussing Lax pairs. We shall now consider 
various ansätze with a geometric flavor and their 
attendant functional equations. 

It is known that the Ruijsenaars—Schneider model 
has the Calogero—Moser system as a scaling limit. 
Other scaling limits also exist for the Ruijsenaars— 
Schneider model. In particular, we may consider one 
in which the Poincaré algebra scales to either the 
Galilean algebra or a central extension of the 
Galilean algebra. 


Similar to our analysis of the Poincaré algebra, we
find that the functions

$$H = \frac12\sum_{j=1}^n p_j^2\prod_{k\ne j} f(x_j - x_k), \qquad P = \sum_{j=1}^n p_j\prod_{k\ne j} f(x_j - x_k), \qquad B = \sum_{j=1}^n x_j$$

obey the algebra

$$\{H,B\}=P, \qquad \{P,B\}=\lambda, \qquad \{H,P\}=0 \qquad [20]$$

if and only if f(x) is either an even or odd function
satisfying

$$\sum_{j=1}^n\prod_{k\ne j} f(x_j - x_k) = \lambda \qquad [21]$$

where λ is a constant. When λ = 0 this is the
Galilean algebra, while λ ≠ 0 is a central extension
of the Galilean algebra. Again we are considering
models of the form $H=(1/2)\sum_j g^{jj}p_j^2$ and so
dealing with diagonal metrics. We note that if [21]
holds for n = 3 then it holds for all n; and if it
holds for n = 4 then it holds for all “even” n. This
type of behavior was already encountered in
Theorem 3.

Some particular solutions of [21] are known
although the general solution is not known as yet.
The odd functions $f(x)=1/x$ (λ = 0), $\coth(x)$ (λ = 1
for n odd and λ = 0 for n even), $\sqrt{\wp(x) - e_\alpha}$ (λ = 0)
yield solutions for example. Interestingly, in the
case of an even number of particles, particular cases
of the elliptic Ruijsenaars-Schneider model are in
this list.
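The first two particular solutions quoted above can be confirmed directly. The sketch below evaluates $\sum_{j}\prod_{k\ne j} f(x_j - x_k)$ at an arbitrary configuration for $f(x) = 1/x$ and $f(x) = \coth(x)$, for several particle numbers. The sample points and helper names are our own. The elliptic solution is not tested, since that would require an implementation of the ℘-function.

```python
import math


def coth(x):
    return math.cosh(x) / math.sinh(x)


def lhs_21(f, xs):
    """sum_j prod_{k != j} f(x_j - x_k)."""
    return sum(
        math.prod(f(xj - xk) for k, xk in enumerate(xs) if k != j)
        for j, xj in enumerate(xs)
    )


points = [0.0, 0.7, 1.9, 2.4, 3.8, 5.1]   # arbitrary distinct positions

rational = {n: lhs_21(lambda x: 1.0 / x, points[:n]) for n in range(2, 7)}
hyperbolic = {n: lhs_21(coth, points[:n]) for n in range(2, 7)}
print(rational)
print(hyperbolic)
```

The rational sums vanish for every n, while the hyperbolic ones alternate between 0 (n even) and 1 (n odd), independently of the chosen positions.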

Diagonal metrics arise in many settings in integr-
able systems. By taking the ansatz

$$ds^2 = \sum_{i=1}^n\Big(\prod_{j\ne i}\Psi(x^i - x^j)\Big)(dx^i)^2$$

we may construct and solve a functional equation to 

show that the potentially nonvanishing curvature 

components Rigs Rig (h # i,j), and Ri; have 

1. Rin = = Ri, = =0 (kÆi,j) if and only if 
W(x) = ale? — 1)* or ax. We may set a=1 by 
rescaling x. 


2. Ri, =(—1)"b? when a ) = (e29* — 1). 
3. Ri,=0 when V(x) = 
Thus, $\Psi(x)=x$ yields a solution of the Lamé
equations. These metrics are of Stäckel form. The
rational degenerations of the Galilean models above
are given by this theorem. They may be understood
as a parabolic limit of Jacobi elliptic coordinates.


Similar techniques may be applied to the more 
general metric 


ds? = > (o 1] W(x! — =) (dxi)* 


i=1 jżi 
to show that Ria = Rin = Olki) if and only 
if U(x) =ale?®* — 1)? or ax? 


Ground-State Factorization 


Some years ago, Sutherland and Calogero consid-
ered the problem as to when the ground-state wave
function of a one-dimensional n-body Schrödinger
equation with pairwise interactions would factorize.
Thus, the problem is to determine those potentials 


v(x) for which 


and where 


$$\Psi(x_1,x_2,\dots,x_n) = \prod_{i<j}\psi(x_i - x_j)$$


It is convenient to set

$$\psi(x_i - x_j) = \exp\Big(\int^{x_i - x_j} f(x)\,dx\Big)$$


Substitution now shows 


p 
TA Iro 
i1=1 


Comparison with [22] shows that this may be
expressed in terms of two-body potentials if and
only if we have the functional equation

$$f(a)f(-b) - f(a)f(c) + f(c)f(-b) = G(a) + G(c) + G(-b), \qquad a+b+c=0 \qquad [23]$$


Now [23] is not quite the functional equation
studied by Sutherland and Calogero. On physical
grounds, Sutherland implicitly, and Calogero expli-
citly, made the “assumption” that f is an odd
function. This ensured that the potential was even
and so bounded from below; equally it may be
imposed so that $\psi(x_i-x_j)=\psi(x_j-x_i)$ and the
ground state describes bosons. With this assump-
tion, one arrives at the functional equation of
Sutherland:

$$f(a)f(b) + f(b)f(c) + f(c)f(a) = G(a) + G(b) + G(c) \qquad [24]$$

Actually the assumption of f being odd is unneces-
sary. One can show that there is a bijection between
analytic solutions of [23] and analytic solutions of
[24] for which f'(x) is even. Requiring a
potential of the stated form then necessitates f
being odd. Either way, we arrive at the functional
equation [24]. This is connected with [12] by


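Lemma 4 If a + b + c = 0, then

$$\begin{vmatrix} 1 & 1 & 1\\ f'(a) & f'(b) & f'(c)\\ f''(a) & f''(b) & f''(c)\end{vmatrix} = 0 \qquad [25]$$

$$\iff \big(f(a) + f(b) + f(c)\big)^2 = F(a) + F(b) + F(c) \qquad [26]$$

$$\iff f(a)f(b) + f(b)f(c) + f(c)f(a) = G(a) + G(b) + G(c) \qquad [27]$$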


Now, we may use Theorem 2 to determine those 
potentials with factorizable ground-state wave func- 
tions. We remark that the 6-function potential a(x) 
of many-body quantum mechanics on the line, 
which also has a factorizable ground-state wave 
function, can be viewed as the a — O limit of 
—b/asinh*(—x/a + 7i/3) with maa=6b. Thus, all 
of the known quantum mechanical problems with 
factorizable ground-state wave function are included
in [12].
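Sutherland's equation $f(a)f(b) + f(b)f(c) + f(c)f(a) = G(a) + G(b) + G(c)$, with $a+b+c=0$, can be illustrated with two odd functions whose derivatives are even: $f(x) = 1/x$ and $f(x) = \coth(x)$. In both cases the left-hand side turns out to be constant, so G may be taken constant. The sketch below, with sample points and names chosen by us, computes that constant.

```python
import math


def coth(x):
    return math.cosh(x) / math.sinh(x)


def pair_sum(f, a, b):
    """f(a)f(b) + f(b)f(c) + f(c)f(a) with c fixed by a + b + c = 0."""
    c = -(a + b)
    return f(a) * f(b) + f(b) * f(c) + f(c) * f(a)


samples = [(0.8, 1.7), (-0.4, 2.9), (1.1, 0.3)]
rational_values = [pair_sum(lambda x: 1.0 / x, a, b) for a, b in samples]
hyperbolic_values = [pair_sum(coth, a, b) for a, b in samples]
print(rational_values)     # all close to 0, so G(x) = 0 works
print(hyperbolic_values)   # all close to -1, so G(x) = -1/3 works
```

Squaring $f(a)+f(b)+f(c)$ then shows that the sum of squares differs from the squared sum by this same constant: for $f = 1/x$ one finds $(1/a + 1/b + 1/c)^2 = 1/a^2 + 1/b^2 + 1/c^2$ whenever $a+b+c=0$.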


Baker-Akhiezer Functions


Baker-Akhiezer functions are one of the foundations
of the algebro-geometric or finite-gap integration of
integrable systems. These functions may be viewed
as an extension of the exponential function to curves
of arbitrary genus g. They have essential singula-
rities at various points on the curve and a prescribed
asymptotic expansion at these points. The functions
may be described in terms of theta functions on the
Jacobian of the curve, and suitable meromorphic
differentials on the curve. The functions [6] and [19]
may be viewed as the Baker-Akhiezer function for a
genus-1 curve. Now, just as the exponential function
satisfies Cauchy’s functional equation one may ask
what functional equations (if any) characterize the
Baker-Akhiezer function. This is an area of research
still ongoing. Theta functions of a general abelian
variety are known to satisfy addition formulas with
$N=2^g$ terms. It appears Baker-Akhiezer functions
satisfy a similar functional equation with far fewer
terms. Such a characterization of Baker-Akhiezer
functions, if found, will provide an analogous
answer to that of the Riemann-Schottky problem
which seeks to describe the Jacobians of curves
amongst general abelian varieties.

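The functional equations [5], and after suitably
symmetrizing [9], are particular cases of the func-
tional equation

$$\sum_{i=0}^N \phi_{3i}(x+y)\begin{vmatrix}\phi_{3i+1}(x) & \phi_{3i+1}(y)\\ \phi_{3i+2}(x) & \phi_{3i+2}(y)\end{vmatrix} = 0 \qquad [28]$$

with N = 1 in the former case and N = 2 in the
latter. In the case $\phi_{3i+2} = \phi_{3i+1}'$, these may be viewed
as differentiated forms of

$$\sum_{i=0}^N \phi_{3i}(x+y)\,\phi_{3i+1}(x)\,\phi_{3i+1}(y) = 1 \qquad [29]$$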
i=0 


For N =0, this is Cauchy’s equation characterizing 
the exponential function and for N=2 it is 
equivalent to [8]. For N=1 and N =2, Buchstaber 
and Krichever have shown that “all” the solutions to 
this equation are the Baker—Akhiezer functions 
corresponding to algebraic curves of genus 1 and 
2, respectively. In general, the Baker—Akhiezer 
functions for a genus-g curve are known to satisfy 
[29] for N=g. Thus, many of the equations we have 
encountered are related to Baker-Akhiezer func- 
tions. Dubrovin, Fokas, and Santini have shown that 
Baker—Akhiezer functions for a genus-g curve are 
related to the functional equation 


§ 


saana =S — r(z,y) + ds (vy) Pe (x, z) 


Multivariable generalizations of [29] have been sought 
as a means of characterizing Baker—Akhiezer functions 
but such a characterization remains unproved as yet. 


Topology 


Several of the functional equations we have encoun- 
tered also arise in topology, where the German and 
Russian schools have powerfully applied functional 
equations to formal group laws and genera. It is still 
unclear whether these common threads form part of 
a greater fabric. A genus is a ring homomorphism

$$\varphi: \Omega\otimes\mathbb{Q} \to R, \qquad \varphi(1)=1$$

where Ω is the cobordism ring and R an integral
domain over $\mathbb{Q}$. To each even power series Q(x) with
Q(0) = 1, one can associate a genus $\varphi_Q$ and vice versa
(Hirzebruch et al. 1992). Defining the odd power
series $f(x) =x/Q(x)$ with leading coefficient 1 and coefficients
in R, the inverse function $g=f^{-1}$ is such that

$$g'(y) = \sum_{n=0}^\infty \varphi_Q(\mathbb{CP}^n)\,y^n$$
n=0 


The genus corresponding to $Q(x)=x/\tanh(x)$ is
known as the L-genus; it takes the value 1 on every
even complex projective space. The genus corre-
sponding to $Q(x) =(x/2)/\sinh(x/2)$ is known as the
Â-genus. The so-called (string-inspired) Witten or
elliptic genus corresponds to $Q(x) =x/\sigma(x)$. Certain
genera may be associated with the index of natural
differential operators on the manifold. Thus, the
signature of M, sign(M), is given in terms of the de
Rham differential d and its adjoint d*,


2n , 
ind(d + d*) = sign(M) = i =i [M] 


j=0 


with variants for the Â-genus and elliptic genus.
Further, when a compact topological group acts on 
the manifold, Atiyah and Bott showed how these 
indices may be determined from the fixed point sets 
of the action. 
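The generating-function relation between a characteristic power series Q(x) and the values of its genus on complex projective spaces can be made explicit. Since $g = f^{-1}$ with $f(x) = x/Q(x)$, the Lagrange inversion formula gives the coefficient of $y^n$ in $g'(y)$ as the coefficient of $x^n$ in $Q(x)^{n+1}$. This inversion step and all the helper names below are our own additions. The sketch uses exact rational arithmetic on truncated power series to evaluate the L-genus and the Â-genus on $\mathbb{CP}^0,\dots,\mathbb{CP}^6$.

```python
from fractions import Fraction
from math import factorial


def series_mul(a, b, order):
    """Product of two power series (coefficient lists), truncated at x**order."""
    c = [Fraction(0)] * order
    for i, ai in enumerate(a[:order]):
        for j, bj in enumerate(b[:order - i]):
            c[i + j] += ai * bj
    return c


def series_inv(a, order):
    """Multiplicative inverse of a power series with nonzero constant term."""
    b = [Fraction(0)] * order
    b[0] = 1 / Fraction(a[0])
    for n in range(1, order):
        b[n] = -b[0] * sum(a[k] * b[n - k] for k in range(1, n + 1))
    return b


def genus_on_projective_spaces(Q, max_n):
    """Coefficient of x**n in Q(x)**(n+1), for n = 0..max_n."""
    order = max_n + 1
    values = []
    power = [Fraction(1)] + [Fraction(0)] * max_n   # Q**0
    for n in range(order):
        power = series_mul(power, Q, order)          # now Q**(n+1)
        values.append(power[n])
    return values


ORDER = 7
cosh = [Fraction(1, factorial(k)) if k % 2 == 0 else Fraction(0)
        for k in range(ORDER)]
sinh_over_x = [Fraction(1, factorial(k + 1)) if k % 2 == 0 else Fraction(0)
               for k in range(ORDER)]
# sinh(x/2)/(x/2) = sum over even k of x**k / ((k+1)! * 2**k)
half_sinh_ratio = [Fraction(1, factorial(k + 1) * 2 ** k) if k % 2 == 0
                   else Fraction(0) for k in range(ORDER)]

Q_L = series_mul(cosh, series_inv(sinh_over_x, ORDER), ORDER)   # x / tanh(x)
Q_A = series_inv(half_sinh_ratio, ORDER)                        # (x/2) / sinh(x/2)

L_values = genus_on_projective_spaces(Q_L, ORDER - 1)
A_values = genus_on_projective_spaces(Q_A, ORDER - 1)
print(L_values)
print(A_values)
```

The L-genus values come out as 1 on even projective spaces and 0 on odd ones, in agreement with $g'(y) = 1/(1-y^2)$ for $f = \tanh$. The first nontrivial Â-genus value is $-1/8$ on $\mathbb{CP}^2$.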

Now, functional equations arise naturally in this 
context when seeking genera with special properties. 
Novikov’s school has shown, for example, that the 
genera associated with the index theorems of known 
elliptic operators arise as solutions of functional 
equations which are particular examples of [5]. 
Similarly, one may seek the following property of
a genus φ: for the fiber bundle $p:E\to B$ with
smooth fiber F and base, one has that

$$\varphi(E) = \varphi(F)\cdot\varphi(B)$$

Such a genus is said to be strictly multiplicative. It
may be shown that a genus is strictly multiplicative
in bundles with fiber $\mathbb{CP}^{n-1}$ if and only if

$$\sum_{j=1}^n\prod_{k\ne j}\frac{1}{f(x_j - x_k)} = c \qquad [30]$$


which is essentially [21]. Following the remarks after
that equation, a genus φ is strictly multiplicative for
all fiber bundles with fibers $\mathbb{CP}^{n-1}$ if and only if it is
strictly multiplicative for all fiber bundles with fiber
$\mathbb{CP}^2$, in which case the genus is the L-genus. If, on the
other hand, we only demand strict multiplicativity for
all fiber bundles with fibers $\mathbb{CP}^{2n-1}$, then this is
equivalent to requiring it to hold for all fiber bundles
with fiber $\mathbb{CP}^3$, in which case the genus is an elliptic
genus. That the same functional equations arise in
both the integrable systems and topological settings 
may reflect something deeper. String theory physics, 
for example, allows some topology changes such as 
flops, and physical quantities such as the partition 
function should reflect this invariance; invariance 
under classical flops characterizes the elliptic genus. 
In addition, connections have been made between the 
complex cobordism ring and conformal field theory. 


Other Areas 


The constraints placed on this review have meant that 
several further applications of functional equations 
and integrable systems can only be noted. Using an 
ansatz together with functional equations, Wojcie-
chowski gives an analog of the Bäcklund transforma-
tion for integrable many-body systems. Similarly,
Inozemtsev constructs generalizations of the
Calogero-Moser models, while this route was used to
construct new solutions to the Witten-Dijkgraaf-
Verlinde-Verlinde (WDVV) equations by Braden,
Marshakov, Mironov, and Morozov. In the quantum
regime, Gutkin derived and solved several functional 
relations by requiring a nondiffractive potential, while 
functional equations have been used to construct 
R-operators, solutions of the quantum Yang—Baxter 
equation on a function space. 


See also: Calogero-Moser-Sutherland Systems of
Nonrelativistic and Relativistic Type; Classical r-matrices,
Lie Bialgebras, and Poisson Lie Groups; Cohomology
Theories; Eigenfunctions of Quantum Completely 
Integrable Systems; Integrability and Quantum Field 
Theory; Integrable Systems and Algebraic Geometry; 
Integrable Systems: Overview; Lie Groups: General 
Theory; Quantum Calogero—Moser Systems; Toda 
Lattices; WDVV Equations and Frobenius Manifolds. 
The Domain of Integration


Functional integration is integration over function 
spaces, that is, the variable of integration is a 
function f with values in a D-dimensional manifold: 


f:U—>M? [1] 


Generically a space of functions is an infinite- 
dimensional space. Our understanding of infinite- 
dimensional spaces has progressed significantly 
during the twentieth century, and we can formulate 
functional integration in its proper setting. 

Let $\mathcal{F}$ be the domain of integration, and $f \in \mathcal{F}$
the variable of integration. If the domain of f is
a subset U of $\mathbb{R}$, the functional integral is called
a path integral; if U is of dimension higher than 1
(e.g., spacetime), $\mathcal{F}$ is often called a space of
histories.

The information necessary for defining a domain 
of integration includes 


• the domain and the range of the variable of
integration f,

• the analytical properties of f, and

• possibly additional information, such as requirements
on the values of f on its boundary.


Examples of variables of integration f in [1] 


The domain U of f may be a time interval, a scale
range, or any parameter. The range $M^D$ of f may be
a group manifold, a Riemannian manifold, a
symplectic manifold, a multiply-connected space,
etc., or simply $\mathbb{R}^D$. The domain of integration $\mathcal{F}$
may be a space of pointed paths, for example,


$$ x : T \to M^D, \quad T = [t_a, t_b] $$ [2]


$$ x(t_b) = x_b \in M^D \quad \text{for all } x \in \mathcal{F} $$


The paths x may be continuous (e.g., Brownian
paths), or may have square-integrable derivatives;
$\mathcal{F}$ is then an $L^{2,1}$ space (e.g., quantum physics):


$$ \int_T dt\, |\dot{x}(t)|^2 < \infty, \quad x \in \mathcal{F} $$ [3]


Given a domain of integration $\mathcal{F}$, one needs to
select a volume element appropriate to $\mathcal{F}$. This is a
challenge which has been met in a number of cases
(Cartier and DeWitt-Morette 2006). Examples are
given below. Given a volume element, one can then
characterize the functionals F on $\mathcal{F}$ integrable with
respect to the chosen volume element.


Two Basic Techniques 


The two most useful techniques for computing 
integrals are change of variable of integration and 
integration by parts. They follow from fundamental 
properties that apply to functional integrals as well 
as to ordinary integrals. Let us recall them in the 
context of ordinary integrals. 

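Let f and g be functions on $\mathbb{R}$ of compact support.
Let I stand for integration


$$ I(f) = \int dx\, f(x), \quad x \in \mathbb{R} $$


and D for derivation of f with respect to x,


$$ (Df)(x) = f'(x) $$

The fundamental rule DI = 0 reads

$$ DI = 0 \;\Rightarrow\; 0 = \frac{d}{dy} \int dx\, f(x + y) $$ [4]


The functional I(f) is invariant under a change of
variable of integration.
Another fundamental rule is ID = 0:


$$ ID = 0 \;\Rightarrow\; 0 = \int d\big(f(x) g(x)\big) = \int df(x)\, g(x) + \int f(x)\, dg(x) $$ [5]

This is the rule of integration by parts.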

The fundamental rules [4] and [5] apply to
functional integration. The derivation D can be
either a functional derivative or a Lie derivative
defined as follows. Let $\mathbb{K}$ be the reals $\mathbb{R}$ or the
complex numbers $\mathbb{C}$, and let f be a differentiable
functional on a Banach space X


$$ f : U \subset X \to \mathbb{K} $$ [6]


The functional derivative $Df|_{x_0}$ of f at $x_0$ is defined
by the equation


$$ f(x_0 + h) - f(x_0) = Df|_{x_0} \cdot h + R(h) $$ [7]


with the norm $\|R(h)\|$ of order less than the
norm $\|h\|$.
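A discretized illustration of definition [7] may help. The functional, the grid, and the names below are hypothetical choices made for this sketch; the point is only that the remainder R(h) shrinks faster than the norm of h.

```python
import math

N = 200                      # number of grid points (an arbitrary choice)
DT = 1.0 / N
ts = [(k + 0.5) * DT for k in range(N)]

def functional(path):
    """Hypothetical example functional F(x) = int_0^1 x(t)^3 dt on a grid."""
    return sum(v ** 3 for v in path) * DT

def derivative_at(path, direction):
    """DF|_x . h = int_0^1 3 x(t)^2 h(t) dt, linear in the direction h."""
    return sum(3.0 * v * v * w for v, w in zip(path, direction)) * DT

x0 = [math.sin(math.pi * t) for t in ts]
h = [math.cos(3.0 * t) for t in ts]

def remainder(eps):
    """R(eps * h) as defined by eqn [7]."""
    moved = [v + eps * w for v, w in zip(x0, h)]
    scaled = [eps * w for w in h]
    return functional(moved) - functional(x0) - derivative_at(x0, scaled)

# |R(eps h)| / eps should tend to zero with eps.
ratios = [abs(remainder(eps)) / eps for eps in (1e-1, 1e-2, 1e-3)]
```

For this cubic functional the ratio decreases roughly in proportion to eps, as expected for a remainder of second order.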

The Lie derivative $\mathcal{L}_V$ along the vector field V is
conceptually intuitive and of practical interest: an
infinite-dimensional space X of paths x is not an
intuitive concept, but a one-parameter family of
paths $\{x(\alpha)\} \in X$, with $\alpha \in [0,1]$, is a convenient
tool for dealing with X:


$$ x(\alpha) : T \to M^D, \qquad x(\alpha, t) := (x(\alpha))(t) \in M^D $$ [8]


Figure 1 A one-parameter family of paths with fixed endpoints.


Set $x(0) = x_0$. A differentiable family $\{x(\alpha)\}$ defines a
vector field $V(x_0)$ along the path $x_0$ (see Figure 1):


$$ V(x_0) = \frac{d}{d\alpha} x(\alpha) \Big|_{\alpha=0}, \qquad V(x_0(t)) = \frac{\partial}{\partial\alpha} x(\alpha, t) \Big|_{\alpha=0} $$ [9]


The functional vector field V on the tangent bundle
of X defines a group of transformations on X, and a
Lie derivative $\mathcal{L}_V$ of tensor fields on X.

The Lie derivative $\mathcal{L}_V$ obeys the Cartan (Elie and
Henri) equation


$$ \mathcal{L}_V = d\, i_V + i_V\, d $$ [10]


where d is the exterior differential and $i_V$ is the
interior product, defined as usual on Banach spaces.


Remark (Berezin integrals). To show the power of
the rules [4] and [5], we can mention that they
provide Berezin rules of integration over Grassmann
variables (Cartier and DeWitt-Morette 2006).


Path Integrals and Quantum Dynamics 


The history of path integrals in quantum physics did
not begin with the definitions of domain of integration,
volume elements, etc. It began with the Ph.D.
thesis of R P Feynman in 1942. Feynman expressed
the time evolution of a system as the limit $N = \infty$ of
the following N-tuple integral:


$$ (x_t | x_0) = \lim_{N=\infty} \int \cdots \int (x_t | x_N)\, dx_N\, (x_N | x_{N-1})\, dx_{N-1} \cdots (x_2 | x_1)\, dx_1\, (x_1 | x_0) $$ [11]

where the time interval $T = [t_0, t]$ has been replaced
by N of its points $\{t_i\}$, $1 \le i \le N$:


$$ t_0 < t_1 < \cdots < t_N < t $$ [12]


and the path $x : T \to \mathbb{R}^D$ is replaced by N of its values

$$ x_i := x(t_i) $$ [13]
Dirac (1933) had shown that $(x_t | x_0)$ defines the
exponential of a quantum function $S_Q$ by


$$ \exp\big(i S_Q(x_t, x_0, t)/\hbar\big) := (x_t | x_0) $$ [14]


such that the real part of $S_Q$ is the classical action
function (a.k.a. Hamilton's principal function;
further studies have shown that the correct statement
is: the real part is the classical action, up to
order $\hbar$), and the imaginary part of $S_Q$ is of order $\hbar$,
the normalized Planck constant


$$ \hbar = h/2\pi $$ [15]


Feynman remarked that for a system with
Lagrangian L the short-time probability amplitude
$(x_{t+\delta t} | x_t)$ is "often equal to


$$ A^{-1} \exp\left( \frac{i}{\hbar}\, L\!\left( \frac{x_{t+\delta t} - x_t}{\delta t},\, x_{t+\delta t} \right) \delta t \right) $$ [16]


within a normalizing constant A as the limit $\delta t$
approaches zero." The absolute value of A can be
obtained from a unitarity requirement (Morette 1951).
Feynman expressed the finite probability amplitude
as a path integral, limit of the discretized
expression [11]:


$$ (x_t | x_0) = \int \mathcal{D}x\, \exp\big(i S(x)/\hbar\big) $$ [17]


where S(x) is the action functional


$$ S(x) = \int_T ds\, L\big(\dot{x}(s), x(s)\big) $$ [18]


The undefined symbol $\mathcal{D}x$ is a "volume element" on
the space of paths, corresponding to the infinite
product of the normalization constants $A^{-1}$.

The issues raised by the path integral [17] are 


• the definition of the volume element $\mathcal{D}x$; and
• a method for computing [17] for a given action
functional S.


The explicit calculation of the limit [11] of an
N-tuple integral when $N = \infty$ is a Herculean task of
very limited use. But two other methods of wide
application, leaving the volume element $\mathcal{D}x$ as a
heuristic symbol, have vindicated the power of
functional integration: the diagram technique and
the semiclassical expansions.
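To make the structure of the N-tuple integral [11] concrete, here is a minimal sketch of time slicing on a grid. It is not Feynman's oscillatory integral: as an assumption made for numerical stability it uses the real (Wiener, s = 1) short-time kernel of the free particle, in the normalization with factors of π that the section on Gaussian volume elements adopts. The function names and grid parameters are illustrative.

```python
import math

def short_time_kernel(x_new, x_old, dt):
    """Wiener (s = 1) kernel exp(-pi (dx)^2 / dt) / sqrt(dt); integrates to 1."""
    return math.exp(-math.pi * (x_new - x_old) ** 2 / dt) / math.sqrt(dt)

def sliced_kernel(x_end, x_start, total_time, n_slices, half_width=3.0, step=0.02):
    """Approximate the (n_slices - 1)-tuple integral over intermediate points."""
    dt = total_time / n_slices
    m = int(round(2 * half_width / step)) + 1
    grid = [-half_width + j * step for j in range(m)]
    # Amplitude to reach each grid point after the first slice.
    amp = [short_time_kernel(y, x_start, dt) for y in grid]
    # Each further slice integrates over the previous intermediate point.
    for _ in range(n_slices - 2):
        amp = [step * sum(short_time_kernel(y, z, dt) * a for z, a in zip(grid, amp))
               for y in grid]
    # Final slice ends at x_end.
    return step * sum(short_time_kernel(x_end, z, dt) * a for z, a in zip(grid, amp))

approx = sliced_kernel(0.3, 0.0, total_time=1.0, n_slices=6)
exact = short_time_kernel(0.3, 0.0, 1.0)
```

For this free kernel the composition of six slices reproduces the single kernel for the full time interval, so `approx` and `exact` should agree closely; for a general action the limit of many slices is the hard part.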
Feynman devised practical rules for computing
asymptotic expansions of path integrals, order by
order in perturbation theory. The rules are depicted
by graphs, known as the Feynman diagrams
('t Hooft and Veltman 1973). Feynman's first
explicit nontrivial calculation was the Lamb shift.
It earned him the Nobel prize in 1965 (Feynman
1966). The diagram technique is widely used in
quantum mechanics and quantum field theory. The
time ordering provided by the time parameter in
quantum mechanics becomes, in quantum field
theory, a chronological ordering dictated by light
cones.

Another explicit calculation of a path integral [17]
uses the Taylor expansion of the action functional
S(x) around one of its values. It is known as the
background method (DeWitt 2004). It is called a
semiclassical WKB approximation when one expands
around an extremum $S(x_{cl})$ where $x_{cl}$ is a solution of
the Euler-Lagrange equation $S'(x_{cl}) = 0$ (Wentzel
1926, Kramers 1927, Brillouin 1926).

Introduced in 1951 (Morette 1951), semiclassical
approximations are now the subject of a rich
literature reviewed briefly below.


Gaussian Volume Elements 


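A lesson from Gaussians on $\mathbb{R}^D$ suggests a definition
of volume elements on infinite-dimensional Banach
spaces X. Let


$$ I_D(a) := \int_{\mathbb{R}^D} dx\, \exp\left(-\frac{\pi}{a} |x|^2\right) \quad \text{for } a > 0 $$ [19]


with $dx = dx^1 \cdots dx^D$ and $|x|^2 = \sum_{j=1}^{D} (x^j)^2 = \delta_{ij} x^i x^j$.
An elementary calculation gives $I_D(a) = a^{D/2}$. Therefore,
when $D = \infty$,


$$ I_\infty(a) = \begin{cases} 0 & \text{if } 0 < a < 1 \\ 1 & \text{if } a = 1 \\ \infty & \text{if } 1 < a \end{cases} $$ [20]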


This is clearly an unsatisfactory situation, but it can 
be corrected by introducing a dimensionless volume 
element: 


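$$ \mathcal{D}_a x = \frac{dx^1 \cdots dx^D}{a^{D/2}} $$ [21]

The volume element $\mathcal{D}_a x$ can be defined by the integral

$$ \int_{\mathbb{R}^D} \mathcal{D}_a x\, \exp\left(-\frac{\pi}{a} |x|^2 - 2\pi i \langle x', x\rangle\right) = \exp\left(-a\pi |x'|^2\right) $$ [22]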


where x' is in the dual $\mathbb{R}_D$ of $\mathbb{R}^D$. Equation [22]
suggests the following generalization of Gaussians
on $\mathbb{R}^D$ to Gaussians on a Banach space X:


$$ \int_X \mathcal{D}_{s,Q} x\, \exp\left(-\frac{\pi}{s} Q(x)\right) \exp\big(-2\pi i \langle x', x\rangle\big) = \exp\big(-s\pi W(x')\big) $$ [23]


where $s \in \{1, i\}$, Q(x) is a quadratic form on X (see
condition on Q below). W(x') is a quadratic form on
the dual X' of X, inverse of Q(x) in the following
sense. Set


$$ Q(x) = \langle Dx, x\rangle \quad \text{and} \quad W(x') = \langle x', Gx'\rangle $$ [24]


where $\langle\, ,\rangle$ is a duality product, for example, the
product of $x \in X$ and $Dx \in X'$; then


$$ DG = 1_{X'}, \qquad GD = 1_X $$ [25]


Equation [23] defines a Gaussian volume element $d\Gamma_{s,Q}$
by its Fourier transform


$$ (\mathcal{F}\Gamma_{s,Q})(x') := \int_X d\Gamma_{s,Q}(x)\, \exp\big(-2\pi i \langle x', x\rangle\big) = \exp\big(-s\pi W(x')\big) $$ [26]


where the Gaussian volume element is


$$ d\Gamma_{s,Q}(x) \stackrel{\int}{=} \mathcal{D}_{s,Q}(x)\, \exp\left(-\frac{\pi}{s} Q(x)\right) $$ [27]


This is a qualified equality valid upon integration.

The definition of the Gaussian volume element by
its Fourier transform $\mathcal{F}\Gamma$ is valid for s = 1 (Wiener
integral) when Q(x) > 0; it is valid for s = i
(Feynman integral) when Re Q(x) > 0.
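The finite-dimensional statements [19] to [22] are easy to check numerically in one dimension. The sketch below assumes D = 1 for the Fourier transform and uses the factorization of the D-dimensional Gaussian for [19]; the function names and the values of a and x' are illustrative only.

```python
import cmath
import math

def gaussian_integral(a, half_width=12.0, n=4800):
    """Midpoint-rule value of I_1(a) = int dx exp(-pi x^2 / a), eqn [19] with D = 1."""
    step = 2.0 * half_width / n
    xs = (-half_width + (k + 0.5) * step for k in range(n))
    return step * sum(math.exp(-math.pi * x * x / a) for x in xs)

def fourier_transform(a, x_dual, half_width=12.0, n=4800):
    """Left-hand side of eqn [22] for D = 1, using the measure dx / sqrt(a)."""
    step = 2.0 * half_width / n
    total = 0j
    for k in range(n):
        x = -half_width + (k + 0.5) * step
        total += cmath.exp(-math.pi * x * x / a - 2j * math.pi * x_dual * x)
    return step * total / math.sqrt(a)

a = 2.5
i_one = gaussian_integral(a)                  # expected: sqrt(a)
i_ten = i_one ** 10                           # D = 10 factorizes: a ** 5
lhs = fourier_transform(a, 0.4)
rhs = math.exp(-a * math.pi * 0.4 ** 2)       # right-hand side of [22]
```

The growth of `i_ten` relative to `i_one` shows why the undivided measure dx has no sensible limit as D grows, while the normalized transform stays equal to the D-independent right-hand side of [22].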


Remark. Volume elements were introduced with
notation such as dx; later they were identified
with forms such as $\omega = dx$. In [26] we omit d on the
left-hand side (LHS) for visual clarity.


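Example (diagram expansion). The following integrals
follow readily (Cartier and DeWitt-Morette
2006) from the definition [26]. Let x' be in the dual
X' of X,


$$ \int_X d\Gamma_{s,Q}(x)\, \langle x', x\rangle^{2n+1} = 0 $$ [28]


$$ \int_X d\Gamma_{s,Q}(x)\, \langle x', x\rangle^{2n} = \frac{(2n)!}{2^n n!} \left(\frac{s}{2\pi}\right)^n W(x')^n $$ [29]


$$ \int_X d\Gamma_{s,Q}(x)\, \langle x'_1, x\rangle \cdots \langle x'_{2n}, x\rangle = \left(\frac{s}{2\pi}\right)^n \sum W(x'_{i_1}, x'_{i_2}) \cdots W(x'_{i_{2n-1}}, x'_{i_{2n}}) $$ [30]


where $\sum$ is a sum without repetitions of identical
terms.


For instance when n = 1, eqn [30] reads


$$ \int_X d\Gamma_{s,Q}(x)\, \langle x'_1, x\rangle \langle x'_2, x\rangle = \frac{s}{2\pi} W(x'_1, x'_2) $$ [31]


$W(x'_1, x'_2)$ is called the two-point function (a.k.a. the
propagator). In a diagram it stands for a line from
$x'_1$ to $x'_2$.

Feynman diagrams represent Gaussian integrals of
polynomials.

For instance when n = 2, the diagram representation
of [30] is the sum of three terms,


$$ W(x'_1, x'_2) W(x'_3, x'_4) + W(x'_1, x'_3) W(x'_2, x'_4) + W(x'_1, x'_4) W(x'_2, x'_3) $$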
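The sum over pairings in [30] can be generated mechanically. The sketch below is an assumption-laden illustration: `pairings` and `gaussian_moment` are hypothetical helper names, and the two-point function min(t, u) is chosen only as an example.

```python
import math

def pairings(items):
    """Yield every way of grouping an even-length list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def gaussian_moment(labels, two_point, s=1.0):
    """Right-hand side of [30]: (s / 2 pi)^n times the sum over pairings."""
    if len(labels) % 2:
        return 0.0                      # odd moments vanish, eqn [28]
    n = len(labels) // 2
    total = sum(math.prod(two_point(p, q) for p, q in pairing)
                for pairing in pairings(list(labels)))
    return (s / (2.0 * math.pi)) ** n * total

# Hypothetical two-point function: W(t, u) = min(t, u), the Wiener covariance.
w = lambda t, u: min(t, u)
four_point_terms = list(pairings([1, 2, 3, 4]))
sixth_moment = gaussian_moment([0.7] * 6, w)
```

`four_point_terms` contains the three pairings displayed above for n = 2, and when all arguments coincide the number of pairings reproduces the combinatorial factor (2n)!/(2^n n!) of [29].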


Example (Linear maps). Linear maps on $\mathbb{R}^D$ are
limited to $L : x \mapsto Ax$, where A is a D × D constant
matrix. Linear maps on a Banach space X offer
many possibilities:


(i) Projections. For example, let $x : T \to \mathbb{R}$ and

$$ L : x \in X \mapsto \{x(t_1), x(t_2), \ldots, x(t_n)\} \in \mathbb{R}^n $$ [32]


This projection is a discretization of the path,
useful in particular in numerical calculations of path
integrals. Equation [32] is unambiguous, whereas
the limit of the discretized expression [11] is ill-defined.

(ii) Liouville decomposition. For example, let D be
a second-order differential operator on a space of
paths $x : [t_a, t_b] \to M^D$ vanishing on the boundary,
$x(t_a) = 0$, $x(t_b) = 0$. Let $\{\Psi_k\}$ be a complete, orthogonal
set of eigenfunctions of D; then the decomposition
of x into the basis $\{\Psi_k\}$,


$$ x(t) = \sum_{k=1}^{\infty} u^k \Psi_k(t) $$ [33]

is a linear map

$$ L : x \in X \mapsto \{u^1, u^2, \ldots\} \in \mathbb{R}^\infty $$

It is useful in particular for diagonalizing (see,
e.g., [107]) the Green function G of D [25] (a.k.a.
the covariance in a Gaussian integral [24], or the
two-point function in [31]).

(iii) Volterra maps. For example, let $L : X \to Y$ by


$$ y(t) = \int_T ds\, \theta(t - s)\, x(s), \qquad \theta(t - s) = 1 \text{ for } s < t,\ 0 \text{ otherwise} $$ [34]
Let X be the space of square-integrable functions
on T and Y be an $L^{2,1}$ space (square-integrable
functions whose first derivative is also square
integrable); then L maps the canonical quadratic
form on X into the canonical quadratic form on Y,
hence the canonical Gaussian on X into the
canonical Gaussian on Y. The identity mapping
from Y into the space C of continuous functions
maps the canonical Gaussian on Y into the Wiener
Gaussian on C (DeWitt-Morette et al. 1979).


Figure 2 Linear maps. (Published with permission by Elsevier,
North Holland.)


The linear maps [32]-[34] and their obvious
generalizations have been used for computing
explicitly many functional integrals (see Figure 2).
The basic formula reads


$$ \int_X d\Gamma_X(x)\, F(x) = \int_Y d\Gamma_Y(y)\, f(y), \qquad F = f \circ L $$ [35]

where the Fourier transform $\mathcal{F}\Gamma_Y$ is given by

$$ \mathcal{F}\Gamma_Y = \mathcal{F}\Gamma_X \circ \tilde{L} $$ [36]

and $\tilde{L}$ is the transpose of the linear map L defined by

$$ \langle \tilde{L} y', x\rangle = \langle y', Lx\rangle $$ [37]


Computing $\mathcal{F}\Gamma_Y$ does not require any calculation. It
can be read off eqn [36]. Computing $d\Gamma_Y$ is easy in a
number of cases such as the following:


1. Y is finite-dimensional. In other words [35] is
a cylindrical integral. Then


$$ d\Gamma_Y(y) = dy^1 \cdots dy^D\, \big(\det Q_{ij}/s\big)^{1/2} \exp\left(-\frac{\pi}{s} Q(y)\right) $$ [38]

where Q(y) is an abbreviation of

$$ Q_Y(y) = Q_{ij}\, y^i y^j $$ [39]


and its inverse $W_Y(y')$ in the sense of [24]-[25] is

$$ W_Y(y') = W_Y^{ij}\, y'_i y'_j $$ [40]

that is given by [36]:

$$ W_Y = W_X \circ \tilde{L} $$ [41]

$W_X$ is the quadratic form defining $\Gamma_X$ by [26].
When D is small, say less than 4, and this is not an
unusual situation, it is easy to compute [38].


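Example (Wiener Gaussians and Brownian
motion). The Wiener Gaussian on the space $P_a\mathbb{R}$
of pointed paths $x : T \to \mathbb{R}$, $T = [t_a, t_b]$, $x(t_a) = 0$, is
defined by its variance [26]:


$$ W_X(x') = \int_T dx'(t) \int_T dx'(s)\, \inf(t, s) $$ [42]


Let Y be the Wiener differential space consisting of
the differences of two consecutive values of x on the
n-discretized time interval. The space Y is finite
dimensional,


$$ L : X \to Y \quad \text{by} \quad y^j = x(t_{j+1}) - x(t_j) = \langle \delta_{t_{j+1}} - \delta_{t_j}, x\rangle $$


It follows from [37] that

$$ \tilde{L} y' = \sum_j y'_j \big(\delta_{t_{j+1}} - \delta_{t_j}\big) $$


and, from [38]-[41],


$$ d\Gamma_Y(\Delta x) = \prod_j \frac{d(\Delta x_j)}{(s\, \Delta t_j)^{1/2}} \exp\left(-\frac{\pi}{s} \sum_j \frac{(\Delta x_j)^2}{\Delta t_j}\right) $$ [43]


where $\Delta t_j := t_{j+1} - t_j$ and $\Delta x_j := x(t_{j+1}) - x(t_j)$.
When s = 1 the Gaussian $\Gamma_Y$ defines the distribution
of a Brownian path. The Gaussian $\Gamma_X$ of
covariance inf(t, s) is the Wiener measure.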
2. In semiclassical approximations, $Q_X$ is the
Hessian (second variation) of an action functional S:


$$ Q_X(h) = \frac{d^2}{d\alpha^2} S(x(\alpha)) \Big|_{\alpha=0} $$ [44]


where $\{x(\alpha)\}$ is a one-parameter family of paths [8]
whose vector field [9] at $\alpha = 0$ is h.
The Jacobi field technology (the Jacobi operator is
defined by [105]; a Jacobi field is a solution of
[104]) yields the inverse $W_X$ of $Q_X$ and its determinants
(Cartier and DeWitt-Morette 2006); they have been
worked out for a variety of boundary conditions on
classical paths.
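For the Wiener differential space of the example above, relation [41] can be verified with small matrices: the increments map L turns the covariance inf(t, s) into a diagonal form with entries $\Delta t_j$. The sketch assumes $t_a = 0$ and an arbitrary uneven time grid; the helper names are illustrative.

```python
def wiener_covariance(times):
    """Matrix of inf(t_i, t_j), the two-point function appearing in eqn [42]."""
    return [[min(t, u) for u in times] for t in times]

def increment_map(n):
    """Matrix of L: row j picks x(t_{j+1}) - x(t_j), with x(t_0) = 0 dropped."""
    rows = []
    for j in range(n):
        row = [0.0] * n
        row[j] = 1.0
        if j > 0:
            row[j - 1] = -1.0
        rows.append(row)
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

# Unevenly spaced times t_1 < ... < t_n, measured from t_0 = 0 (arbitrary values).
times = [0.2, 0.5, 0.6, 1.0, 1.7]
diff = increment_map(len(times))
w_y = matmul(matmul(diff, wiener_covariance(times)), transpose(diff))
delta_t = [t - s for t, s in zip(times, [0.0] + times[:-1])]
```

The matrix `w_y` should be diagonal with the entries of `delta_t`, which is why the increments of a Brownian path are independent Gaussian variables.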


Volume Elements Other than Gaussian 


The definition [26]-[27] of Gaussian volume elements
is a particular case of volume elements on a
Banach space $\Phi$ defined by


$$ \int_\Phi \mathcal{D}_{\Theta,Z}\varphi \cdot \Theta(\varphi, J) = Z(J) $$ [45]


for $\varphi$ in $\Phi$, and J in the dual $\Phi'$ of $\Phi$. The volume
element $\mathcal{D}_{\Theta,Z}$ is defined by two continuous bounded
functionals


$$ \Theta : \Phi \times \Phi' \to \mathbb{C} \quad \text{and} \quad Z : \Phi' \to \mathbb{C} $$ [46]


In quantum field theory, $\varphi$ is a field and J is a
source. The functional Z(J) is then the Schwinger
generating functional for the n-point functions. An
axiomatic formulation and applications of functional
integrals on $\Phi$ with volume elements $\mathcal{D}_{\Theta,Z}$ can be found in
Cartier and DeWitt-Morette (1993).


Example (Poisson volume elements) (Cartier and
DeWitt-Morette (2006) and Collins (1997)). A
Poisson random variable is a random variable N
taking values in the set $\mathbb{N}$ of non-negative integers
such that the probability $p_n$ that N = n is


$$ p_n = P[N = n] = \exp(-\lambda)\, \frac{\lambda^n}{n!}, \quad \lambda > 0 $$ [47]

Thanks to the normalizing constant $\exp(-\lambda)$,
$\sum_{n=0}^{\infty} p_n = 1$. The parameter $\lambda$ is the mean value of N:


$$ \langle N \rangle = \lambda $$ [48]


A record of fortuitous events occurring at random
times $t_0 < T_1 < T_2 < \cdots$ can consist either of the
number N(t) of events occurring at times less than
or equal to t, or of the waiting times


$$ W_k = T_k - T_{k-1} $$ [49]

between two consecutive events.


When the waiting times are stochastically independent
and when


$$ P[t < W_k < t + dt] = p_a(t)\, dt $$ [50]


$$ p_a(t) = a \exp(-at), \quad t > 0 $$ [51]


the record is a Poisson random variable. It is related
to the number of events N(t) as follows.
Let T be a finite time interval [t', t''], and


$$ N_T = N(t'') - N(t') $$ [52]


the number of events during T. The random variable
$N_T$ follows a Poisson law [47] with mean value


$$ \lambda(T) = a(t'' - t') $$ [53]


For mutually disjoint time intervals $T^{(1)}, T^{(2)}, \ldots$ the
random variables $N_{T^{(1)}}, N_{T^{(2)}}, \ldots$ are stochastically
independent.
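The link between exponential waiting times [51] and the Poisson law [47] with mean [53] can be illustrated by simulation. This is a sketch with a fixed random seed; the rate, the interval, and the sample size are arbitrary illustrative values, and the function names are not from the source.

```python
import math
import random

def count_events(rate, t_start, t_end, rng):
    """Count events in [t_start, t_end] for waiting times with density a exp(-a t)."""
    clock, count = 0.0, 0
    while True:
        clock += rng.expovariate(rate)      # independent waiting time W_k, eqn [51]
        if clock > t_end:
            return count
        if clock >= t_start:
            count += 1

def poisson_pmf(lam, n):
    """p_n = exp(-lam) lam^n / n!, eqn [47]."""
    return math.exp(-lam) * lam ** n / math.factorial(n)

rng = random.Random(12345)                  # fixed seed, so runs are repeatable
rate, t_start, t_end = 1.5, 1.0, 3.0        # illustrative values of a, t', t''
samples = [count_events(rate, t_start, t_end, rng) for _ in range(20000)]

sample_mean = sum(samples) / len(samples)
lam = rate * (t_end - t_start)              # eqn [53]
freq_two = samples.count(2) / len(samples)  # empirical estimate of p_2
```

Within sampling error, `sample_mean` should be close to `lam` and `freq_two` close to `poisson_pmf(lam, 2)`.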

Whereas the parameter $\lambda$ must be real non-negative,
the parameter a can be pure imaginary;
therefore, Poisson processes defined by waiting
times can be used in quantum physics as well as in
probability. When a is real, it is called the decay
constant because its physical dimension is $[\text{time}]^{-1}$.


Figure 3 A Poisson path in $X_4$.

A Poisson path $x \in X_n$ is characterized by n jumps
and the jump times during a given time interval
$T = [t_a, t_b]$ (Figure 3 illustrates a Poisson path
in $X_4$). The space X of Poisson paths is the union
of all $X_n$:


$$ X = \bigcup_n X_n $$ [54]


One can define a volume element $\mathcal{D}_{a,T}x$ on X by its
Fourier transform:


$$ \int_X \mathcal{D}_{a,T}x \cdot \exp\big(i \langle x, f\rangle\big) = \exp\left(\int_T dv(t)\, e^{i f(t)}\right) $$ [55]


Here a path $x \in X_n$, characterized by n jump times
$T_1, \ldots, T_n$, is represented by the sum

$$ \delta_{T_1} + \cdots + \delta_{T_n} $$

Hence

$$ \langle x, f\rangle = f(T_1) + \cdots + f(T_n) $$ [56]

The dimensionless volume element on T is

$$ dv(t) = a\, dt $$

Therefore,


$$ \mathrm{vol}(T) = aT, \quad T = t_b - t_a $$
$$ \mathrm{vol}(X_n) = a^n T^n / n! $$ [57]
$$ \mathrm{vol}(X) = \exp(\mathrm{vol}(T)) $$ [58]

and it makes sense to write formally

$$ X = \exp T $$


It can be proved that the volume element $\mathcal{D}_{a,T}x$ is a
measure, in the technical sense of the word (Cartier
and DeWitt-Morette 2006).
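For a constant test function f = c, the pairing [56] equals nc on $X_n$, so the Fourier transform [55] reduces to a series over the volumes [57]. The sketch below checks this special case only; the constant f, the numerical values, and the function names are assumptions made for the illustration.

```python
import cmath
import math

def vol_n(a, length, n):
    """vol(X_n) = a^n T^n / n!, eqn [57]."""
    return (a * length) ** n / math.factorial(n)

def fourier_lhs(a, length, c, n_max=60):
    """Sum over n of vol(X_n) exp(i n c): the left side of [55] for constant f = c."""
    return sum(vol_n(a, length, n) * cmath.exp(1j * n * c) for n in range(n_max))

def fourier_rhs(a, length, c):
    """exp(int_T a dt e^{ic}) = exp(a T e^{ic}): the right side of [55] for f = c."""
    return cmath.exp(a * length * cmath.exp(1j * c))

a, length, c = 0.8, 2.5, 0.6      # illustrative values, not taken from the text
total_volume = sum(vol_n(a, length, n) for n in range(60))   # should equal exp(a T)
lhs, rhs = fourier_lhs(a, length, c), fourier_rhs(a, length, c)
```

Setting c = 0 recovers [58], the statement that the total volume of X is the exponential of the volume of T.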
Functional integration on spaces of Poisson paths
has been used extensively in solutions of Klein-Gordon
equations, the telegrapher equation and the
Dirac equation (Cartier and DeWitt-Morette 2006).

Other volume elements of interest in quantum
physics include (LaChapelle 2004):


• gamma volume elements, which are to gamma
probability distributions what Gaussian volume
elements are to Gaussian probability distributions;
and

• Hermite volume elements convenient for integrating
Wick-ordered polynomials.


A Dirac "δ-function" is formally the limit of a
Gaussian integral. Formally, one can introduce a Dirac
functional volume element as the limit of a Gaussian
volume element.


The Koszul Formula 


There are several roadblocks on the road from finite
to infinite-dimensional spaces. For instance, a
volume in a D-dimensional space is a top-differential
form, that is, a D-form. There is no top-form in an
infinite-dimensional space, nor on Grassmann
manifolds since Grassmann forms are totally symmetric
tensors. A D-form in $\mathbb{R}^D$ has only one strict
component and is equivalent to a scalar density of
weight 1, but scalar densities of weight 1 do not
form an algebra.

For these reasons, volume elements have so far
been defined by integrals [26], [27], [43], and [55].
Short of giving an explicit expression for their
differential forms, one can require them to satisfy
the Koszul formula


$$ \mathcal{L}_X \omega = \mathrm{Div}(X)\, \omega $$ [59]


where $\omega$ is a volume element on a Banach space X,
X a vector field generating a group of transformations
on X, $\mathcal{L}_X$ the Lie derivative defined by X, and
Div(X) the standard generalization of div(ergence)
on finite-dimensional spaces (see, e.g., Cartier and
DeWitt-Morette (2006) for the explicit expression of
divergences on Riemannian, symplectic, Grassmann
manifolds). The Koszul formula dictates how a
volume element changes under a group of
transformations.

It often happens that an object cannot be defined
per se, but that it is sufficient to define its variation. For
example, one does not define potentials, but potential
differences; the ratio of infinite-dimensional determinants
can be defined without defining each determinant;
the work of Wiener on "differential-spaces,"
which is a landmark in functional integration, is based
on differences between two consecutive values of a
function, etc. Similarly, the Koszul formula does not
define $\omega$ but gives its variation $\mathcal{L}_X \omega$.


The Operator Formalism of Quantum 
Physics 


Functional integrals can be used to represent
operator matrix elements, and solutions of the
Schrödinger equation.


1. Matrix elements of operators on Hilbert spaces.
Symbolically,


$$ \langle \beta | \exp(-iHt/\hbar) | \alpha \rangle = \int_{X_{\alpha\beta}} \mathcal{D}x\, \exp\big(i S(x)/\hbar\big) $$ [60]


The domain of integration $X_{\alpha\beta}$ is a space of paths
x on $[t_0, t]$ satisfying initial conditions that
characterize the quantum state $\alpha$, and final
conditions that characterize the quantum state $\beta$.
The action functional S yields the Hamiltonian H.
A key property of path integrals is their
representation of matrix elements of time-ordered
operators. The path parameter (time,
scale, or any other parameter) provides the
operator ordering [11]. A simple example is the
two-point function of the Wiener measure [42]:


$$ \int_X d\Gamma(x)\, x(t)\, x(s) = \inf(t, s) $$ [61]


The functional integral orders the times, that is, the
arguments of the variable of integration. In
quantum field theory, time ordering becomes a
chronological ordering dictated by light cones.

2. Schrödinger equation and other parabolic equations
(Cartier and DeWitt-Morette 2006).


The following theorem provides the mathematical
underpinning for a great variety of functional integrals.
It also provides a construction of functional
integrals, which begins with the symmetries of a given
physical system rather than its action functional. The
theorem consists of two parts: the definition of a
functional integral, and the partial differential equation
satisfied by the value of the functional integral, as
a function of a set of parameters.

Given a manifold M, consider the contractible
space $P_b M$ of pointed $L^{2,1}$ paths over $T = [t_a, t_b]$:


$$ x : T \to M, \quad \text{e.g., } x(t_b) = x_b, \quad \text{i.e., } x \in P_b M $$ [62]


Given D + 1 vector fields Y, $\{X_{(\alpha)}\}$, generators of a
group of transformations on M, define a map


$$ P : P_0 \mathbb{R}^D \to P_b M \quad \text{by } z \mapsto x $$ [63]


explicitly

$$ dx(t, z) = X_{(\alpha)}(x(t, z))\, dz^\alpha(t) + Y(x(t, z))\, dt $$ [64]


$$ x(t_b, z) = x_b, \qquad z(t_b) = 0 $$ [65]


In general, the vector fields do not commute and the
solution of [64]-[65] is of the form


$$ x(t, z) = x_b \cdot \Sigma(t, z) $$ [66]


where $\Sigma(t, z)$ is an element of a group of right
actions on M, defined by the D + 1 generators Y,
$\{X_{(\alpha)}\}$:


$$ x_b \cdot \Sigma(t + t', z \times z') = x_b \cdot \Sigma(t, z) \cdot \Sigma(t', z') $$


The path z defined on $[t_a, t']$ is followed by the path
z' on $[t', t_b]$.

Consider the following functional integral over
$P_0 \mathbb{R}^D$ of a functional of paths on $P_b M$:


$$ (U_T \phi)(x_b) := \int_{P_0 \mathbb{R}^D} \mathcal{D}_{s,Q_0} z\, \exp\left(-\frac{\pi}{s} Q_0(z)\right) \phi\big(x_b \cdot \Sigma(T, z)\big) $$ [67]


where


$$ Q_0(z) = \int_T dt\, h_{\alpha\beta}\, \dot{z}^\alpha(t)\, \dot{z}^\beta(t) $$ [68]


The functional $(U_T \phi)$ at $x_b$ is a function $\Psi(T, x_b)$. It
is a solution of the generalized Schrödinger equation,


$$ \frac{\partial \Psi}{\partial T} = \frac{s}{4\pi} h^{\alpha\beta} \mathcal{L}_{X_{(\alpha)}} \mathcal{L}_{X_{(\beta)}} \Psi + \mathcal{L}_Y \Psi $$ [69]

This equation is valid on manifolds M (e.g., frame
bundles, U(N) bundles, multiply connected spaces,
symplectic manifold phase space) in arbitrary systems
of coordinates.


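Example (Polar coordinates on $\mathbb{R}^2$). Let us
abbreviate $z^\alpha(t)$ to $z^\alpha$, $x^1(t)$ to r, and $x^2(t)$ to $\theta$. It
follows from


$$ z^1 = r \cos\theta, \qquad z^2 = r \sin\theta $$ [70]


that


$$ dr = \cos\theta \cdot dz^1 + \sin\theta \cdot dz^2, \qquad d\theta = -\frac{\sin\theta}{r}\, dz^1 + \frac{\cos\theta}{r}\, dz^2 $$ [71]

that is, $dx =: X_{(1)}\, dz^1 + X_{(2)}\, dz^2$.


The dynamical vector fields are, therefore,


$$ X_{(1)} = \cos\theta\, \frac{\partial}{\partial r} - \frac{\sin\theta}{r} \frac{\partial}{\partial\theta} $$ [72]

$$ X_{(2)} = \sin\theta\, \frac{\partial}{\partial r} + \frac{\cos\theta}{r} \frac{\partial}{\partial\theta} $$ [73]

Here $h^{\alpha\beta} = \delta^{\alpha\beta}$ and eqn [69] reads

$$ \frac{\partial\Psi}{\partial t} = \frac{s}{4\pi} \delta^{\alpha\beta} X_{(\alpha)} X_{(\beta)} \Psi = \frac{s}{4\pi} \left( \frac{\partial^2}{\partial r^2} + \frac{1}{r^2} \frac{\partial^2}{\partial\theta^2} + \frac{1}{r} \frac{\partial}{\partial r} \right) \Psi $$ [74]


This example is trivial because x(t, z) is not a
functional of z but a function of z(t) given by [70].
In the following example, x(t, z) is a functional of z.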
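The passage from the vector fields [72] and [73] to the polar Laplacian in [74] can be checked by finite differences. The test function, the evaluation point, and the step size below are arbitrary choices made for this sketch.

```python
import math

STEP = 1e-4   # finite-difference step (a numerical choice, not from the text)

def d_r(func):
    return lambda r, th: (func(r + STEP, th) - func(r - STEP, th)) / (2 * STEP)

def d_theta(func):
    return lambda r, th: (func(r, th + STEP) - func(r, th - STEP)) / (2 * STEP)

def x1(func):
    """X_(1) = cos(theta) d/dr - (sin(theta)/r) d/dtheta, eqn [72]."""
    fr, fth = d_r(func), d_theta(func)
    return lambda r, th: math.cos(th) * fr(r, th) - math.sin(th) / r * fth(r, th)

def x2(func):
    """X_(2) = sin(theta) d/dr + (cos(theta)/r) d/dtheta, eqn [73]."""
    fr, fth = d_r(func), d_theta(func)
    return lambda r, th: math.sin(th) * fr(r, th) + math.cos(th) / r * fth(r, th)

def psi(r, th):
    return r ** 3 * math.cos(th)          # arbitrary smooth test function

def polar_laplacian_of_psi(r, th):
    # d^2/dr^2 + (1/r^2) d^2/dtheta^2 + (1/r) d/dr applied to r^3 cos(theta)
    return (6 * r - r + 3 * r) * math.cos(th)

r0, th0 = 1.3, 0.7
sum_of_squares = x1(x1(psi))(r0, th0) + x2(x2(psi))(r0, th0)
expected = polar_laplacian_of_psi(r0, th0)
```

Up to discretization error, `sum_of_squares` should equal `expected`, confirming that the sum of the squared vector fields is the Laplacian written in polar coordinates.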


Example (Paths with values on a Riemannian
manifold $(M^D, g)$). Consider the frame bundle over
$M^D$ and a connection $\sigma$ defining the horizontal lift
$\dot\rho(t)$ of a vector $\dot{x}(t)$,


$$ \dot\rho(t) = \sigma(\rho(t)) \cdot \dot{x}(t) $$ [75]


In order to bring eqn [75] into the form [64], we think
of a frame u(t) as a linear map from $\mathbb{R}^D$ into the
tangent space $T_{x(t)} M^D$:


$$ u(t) : \mathbb{R}^D \to T_{x(t)} M^D $$ [76]

Let

$$ \dot{z}(t) := u(t)^{-1} \dot{x}(t) $$ [77]


Choose a basis $\{e_{(A)}\}$ in $\mathbb{R}^D$ and $\{e_{(\alpha)}\}$ in $T_{x(t)} M^D$
such that


$$ \dot{z}(t) = \dot{z}^A(t)\, e_{(A)} = u(t)^{-1} \big(\dot{x}^\alpha(t)\, e_{(\alpha)}\big) $$ [78]


Insert $u(t) \circ u(t)^{-1}$ into [75], then


$$ \dot\rho(t) = X_{(A)}(\rho(t))\, \dot{z}^A(t) $$ [79]

where the dynamical vector fields are

$$ X_{(A)}(\rho(t)) = \big(\sigma(\rho(t)) \circ u(t)\big) \cdot e_{(A)} $$ [80]


The construction [64]-[69] gives a parabolic equation
on the bundle. If the connection $\sigma$ is the metric
connection, then the parabolic equation on the bundle
gives, by projection on the base space, the parabolic
equation with the Laplace-Beltrami operator. Explicitly,
the projection on the base space of [67] is


$$ \Psi(t_b, x_b) := \int_{P_0 \mathbb{R}^D} \mathcal{D}_{s,Q_0} z\, \exp\left(-\frac{\pi}{s} Q_0(z)\right) \phi\big((\mathrm{Dev}\, z)(t_a)\big) $$ [81]


where Dev is the Cartan development map, namely
the bijection, defined by [82], from the space of
pointed paths z on $T_{x_b} M^D$ (identified to $\mathbb{R}^D$ via the
frame $u(t_b)$) into the space of pointed paths x on $M^D$
(paths such that $x(t_b) = x_b$):


$$ (\Pi \circ \rho)(t) =: (\mathrm{Dev}\, z)(t) $$ [82]


$\Pi$ is the projection on the base space. The path
integral [81] is the solution of the equations


$$ \frac{\partial\Psi}{\partial t_b}(t_b, x) = \frac{s}{4\pi}\, \Delta \Psi(t_b, x) $$ [83]

$$ \Psi(t_a, x) = \phi(x) $$ [84]

where $\Delta$ is the Laplace-Beltrami operator on $(M^D, g)$,

$$ \Delta = g^{ij} D_i D_j $$ [85]


and $D_i$ is the covariant derivative defined by the
Riemann connection $\sigma$.


Semiclassical Expansions 


Classical mechanics is a limit of quantum
mechanics; therefore, it is natural to expand the
action functional S of a given system around, or
near, its classical value, namely its extremum S(q),
where q is a solution of the Euler-Lagrange
equation,


$$ S'(q) = 0 $$ [86]


Set

$$ S(x) = S(q) + S'(q) \cdot \xi + \frac{1}{2!} S''(q) \cdot \xi\xi + \frac{1}{3!} S'''(q) \cdot \xi\xi\xi + \cdots $$ [87]


where $x \in X$ is a path


$$ x : T \to M^D $$


and $\xi, \eta \in T_q X$ are vector fields at $q \in X$. The second
variation of S is called its Hessian


$$ S''(q) \cdot \xi\eta =: \mathrm{Hess}(q; \xi, \eta) $$ [88]


The arena of semiclassical expansions of a functional
integral schematically written as


$$ I = \int_{X_{a,b}} \mathcal{D}x\, \exp\big(i S(x)/\hbar\big) \cdot \phi(x(t_a)) $$ [89]


consists of the intersection $U_{a,b}$ of two spaces:
$X_{a,b} \subset X$, the space of paths satisfying D initial
conditions (a) and D final conditions (b), and
$U^{2D}(S)$, the space of critical points of S,


$$ q \in U^{2D}(S) \iff S'(q) = 0 $$ [90]


$$ U_{a,b} := X_{a,b} \cap U^{2D}(S) $$ [91]


Figure 4 Intersection of the space $P_{a,b} M^D$ (abbreviated to
$X_{a,b}$) of paths on $M^D$ with fixed points, and the 2D-dimensional
space $U^{2D}(S)$ of critical points of the system S. (Adapted from a
Plenum Press publication with permission by Springer-Verlag.)


The nature of the intersection $U_{a,b}$ determines the
behavior of the system S. Figure 4 shows the
intersection of the space $X_{a,b}$ of paths on $M^D$ with
fixed points. It also shows the space $U^{2D}(S)$ of
critical points of S.

We consider first the case in which $U_{a,b}$ consists
of a single point q, or several isolated points $\{q_i\}$. The
semiclassical expansion consists in dropping the
terms beyond the Hessian:


$$ I_{\mathrm{WKB}} := \int_{X_b} \mathcal{D}\xi\, \exp\left(\frac{i}{\hbar}\left(S(q) + \frac{1}{2} S''(q) \cdot \xi\xi\right)\right) \phi(x(t_a)) $$ [92]


where the initial wave function $\phi$ accounts for the D
initial conditions of the system, and $X_b$ is the space
of pointed paths


$$ x(t_b) = x_b, \quad \text{and} \quad \xi(t_b) = 0 \quad \text{for every } x \in X_b $$ [93]


WKB Approximations 


The integral $I_{\mathrm{WKB}}$ is the Gaussian defined by the
Hessian. Explicit calculations of $I_{\mathrm{WKB}}$ exploit the
power of Jacobi fields of S at q.


Example (Momentum-to-position transitions)
(e.g., Cartier and DeWitt-Morette 2006). We have


$$ I_{\mathrm{WKB}}(x_b, t_b; p_a, t_a) = \exp\left(\frac{i}{\hbar}\, \mathcal{S}(x_{cl}(t_b), p_{cl}(t_a))\right) \left(\det \frac{\partial^2 \mathcal{S}}{\partial x^i_{cl}(t_b)\, \partial p_{cl\, j}(t_a)}\right)^{1/2} $$ [94]


where $\mathcal{S}$ is the action function (a.k.a. Hamilton's
principal function)


$$ \mathcal{S}(x_{cl}(t_b), p_{cl}(t_a)) = S(q) + \langle p_a, x(t_a)\rangle $$ [95]


where the classical path q is characterized by its
initial momentum $p_a$ and its final position $x_b$. The


proof of [94] rests on the following property of
quadratic forms Q. Let $L : X \to Y$ linearly and


$$ Q_X = Q_Y \circ L $$ [96]


According to the notations used in [26], [27],


$$ \int_X \mathcal{D}_X(x)\, \exp\left(-\frac{\pi}{s} Q_X(x)\right) = 1 $$ [97]


According to [35], [27],

$$ 1 = \int_X \mathcal{D}_X(x)\, \exp\left(-\frac{\pi}{s} Q_X(x)\right) = \int_Y \mathcal{D}_Y(Lx)\, \exp\left(-\frac{\pi}{s} Q_Y(Lx)\right) $$ [98]


$$ \phantom{1} = |\det L| \int_X \mathcal{D}_Y(x)\, \exp\left(-\frac{\pi}{s} Q_X(x)\right) $$ [99]


If s = 1, that is, if $Q_X$ and $Q_Y$ are positive definite,
then


$$ \int_X \mathcal{D}_Y(x)\, \exp\big(-\pi Q_X(x)\big) = |\det L|^{-1} = \det(Q_X/Q_Y)^{-1/2} $$ [100]


If s = i, that is, for Feynman integrals


$$ \int_X \mathcal{D}_Y(x)\, \exp\big(\pi i\, Q_X(x)\big) = |\det(Q_X/Q_Y)|^{-1/2}\, i^{\mathrm{Ind}(Q_X/Q_Y)} $$ [101]


where "Ind($Q_X/Q_Y$)" is the ratio of the numbers of
negative eigenvalues of $Q_X$ and $Q_Y$ respectively, and


$$ i = \sqrt{-1} = e^{i\pi/2} $$


Equation [100] is a key equation for semiclassical
expansions where it is convenient to break up the
second variation $S''(q) \cdot \xi\xi$ into two quadratic forms:


$$ S''(q) \cdot \xi\xi = Q_0(\xi) + Q(\xi) $$ [102]


where $Q_0$ is the kinetic energy. The quadratic form
$Q_0$ defines a convenient Gaussian volume element for
computing [92]. Moreover, splitting the Hessian
into $Q_0 + Q$ corresponds to splitting the system into
a "free" system and a perturbation.

In eqns [100] and [101] the determinant of the
ratio of the infinite-dimensional quadratic forms
$Q_X/Q_Y$ has been shown (Cartier and DeWitt-Morette
2006) to be a finite-dimensional determinant,
thanks to Jacobi field technology.
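Equation [100] can be checked directly in two dimensions, where every object is an ordinary integral. The sketch assumes s = 1, takes the finite-dimensional measure $\mathcal{D}_Y x = (\det Q_Y)^{1/2}\, dx\, dy$, and uses two arbitrary positive-definite forms; all names are illustrative.

```python
import math

def quad(q, x, y):
    """Quadratic form Q(x, y) for a symmetric 2 x 2 matrix q."""
    return q[0][0] * x * x + 2.0 * q[0][1] * x * y + q[1][1] * y * y

def det2(q):
    return q[0][0] * q[1][1] - q[0][1] * q[1][0]

def gaussian_ratio_integral(q_x, q_y, half_width=6.0, n=240):
    """Integrate D_Y x exp(-pi Q_X(x)) with D_Y x = sqrt(det Q_Y) dx dy (s = 1)."""
    step = 2.0 * half_width / n
    pts = [-half_width + (k + 0.5) * step for k in range(n)]
    total = sum(math.exp(-math.pi * quad(q_x, x, y)) for x in pts for y in pts)
    return math.sqrt(det2(q_y)) * total * step * step

# Two positive-definite forms chosen only for illustration.
q_y = [[2.0, 0.3], [0.3, 1.0]]
q_x = [[1.5, -0.4], [-0.4, 0.9]]

lhs = gaussian_ratio_integral(q_x, q_y)
rhs = (det2(q_x) / det2(q_y)) ** -0.5          # det(Q_X / Q_Y)^(-1/2), eqn [100]
normalization = gaussian_ratio_integral(q_y, q_y)   # eqn [97]: should be 1
```

Only the ratio of the two determinants enters, which is the feature that survives in infinite dimensions.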


Degenerate Hessians; Beyond WKB 


When $U_{a,b}$ consists of isolated points, the Hessian is
not degenerate, and the semiclassical expansion is
usually called the (strict) WKB approximation.
When the Hessian is degenerate,


$$ S''(q) \cdot \xi\xi = 0 \quad \text{for } \xi \ne 0 $$ [103]


there is at least one nonzero Jacobi field h along q,

$$ S''(q) \cdot h = 0, \qquad h \in T_q U^{2D}(S) $$ [104]


with D vanishing initial conditions (a) and D
vanishing final conditions (b). Equation [104] is
the defining equation of Jacobi fields. The vanishing
boundary conditions imply that $h \in T_q X_{a,b}$, as well
as being a Jacobi field.

For understanding the intersections $U_{a,b}$ when the
Hessian is degenerate, one can construct the following
bases for the intersecting tangent spaces
$T_q U^{2D}(S)$ and $T_q X_{a,b}$:


• Basis for $T_q U^{2D}(S)$: a complete set (if it exists) of
linearly independent Jacobi fields. It can be
constructed by varying the 2D conditions (a), (b)
satisfied by $q \in X_{a,b}$.

• Basis for $T_q X_{a,b}$: a complete set of orthonormal
eigenvectors $\{\Psi_k\}$ of the Jacobi operator $\mathcal{J}(q)$
defined by the Hessian


$$ S''(q) \cdot \xi\xi =: \langle \mathcal{J}(q) \cdot \xi, \xi\rangle $$ [105]


$$ \mathcal{J}(q) \cdot \Psi_k = \alpha_k \Psi_k, \qquad k \in \{0, 1, \ldots\} $$ [106]


The basis $\{\Psi_k\}$ diagonalizes the Hessian. When
the Hessian is degenerate, there is at least one
eigenvector of $\mathcal{J}(q)$ with zero eigenvalue.


1. The intersection $U_{a,b}$ is of dimension l > 0. Let
$\{u^k\}$ be the coordinates of $\xi$ in the $\{\Psi_k\}$ basis of
$T_q X_{a,b}$. Then the diagonalized Hessian is

$$ S''(q) \cdot \xi\xi = \sum_{k=0}^{\infty} \alpha_k (u^k)^2 $$ [107]


There are l zero eigenvalues $\{\alpha_k\}$ when the system of
Euler-Lagrange equations decouples (possibly after
a change of variable in $X_{a,b}$) into two sets: l
constraint equations, and D − l equations determining
D − l coordinates $\{q^A\}$ of q. Say l = 1, for
simplicity. Then


$$ S(x) = S(q) + c_0 u^0 + \frac{1}{2} \sum_{k=1}^{\infty} \alpha_k (u^k)^2 $$ [108]


where


$$ c_0 = \int_T dt\, \frac{\delta S}{\delta q^j(t)}\, \Psi_0^j(t) $$ [109]
a r 6q(t) 


The change of variable $\xi \mapsto \{u^k\}$ is a linear change
of variable of type [33]. The integral [92]
decomposes into the product of an ordinary integral
over $u^0$ and a Gaussian functional integral defined
by a nondegenerate quadratic form. The integral
over $u^0$ yields a Dirac δ-function, $\delta(c_0/h)$. The
propagator vanishes unless the conservation law
$c_0 = 0$ is satisfied.


Functional Integration in Quantum Physics


Figure 5 A flow of particles scattered by a repulsive Coulomb
potential. (Reprinted from Physical Review D with permission by
the American Physical Society.)

Conservation laws appear in the classical limit of 
quantum physics. The quantum system may have 
less symmetry than its classical limit. 

2. The intersection $U_{a,b}$ is a multiple root of the
Euler-Lagrange equation. The flow of classical
solutions has an envelope, known as a caustic.
Caustics abound in physics: the soap bubble
problem, scattering of particles by a repulsive
Coulomb potential (see Figure 5), rainbow scattering
from a source at infinity, glory scattering, etc.
(Cartier and DeWitt-Morette 2006).

Let us consider a specific example for simplicity,
for instance, the scattering of particles of given
momenta $p_a$ by a repulsive Coulomb potential.
Let q and $q^\Delta$ be two solutions of the Euler-Lagrange
equation with slightly different boundary
conditions at $t_b$. Compute $I(x_b^\Delta, t_b; p_a, t_a)$ by expanding
the action functional not around $q^\Delta$ but around q.
The path $q^\Delta$ is not in $X_{a,b}$ and the expansion of the
action functional has to be carried up to and including
the third variation. As before, let $\{u^k\}$ be the
coordinates of $\xi$ in the base $\{\Psi_k\}$, $k \in \{0, 1, \ldots\}$. The
integral over $u^0$ is an Airy integral


$$ 2\pi \nu^{-1/3}\, \mathrm{Ai}\big(-\nu^{-1/3} c\big) = \int_{\mathbb{R}} du^0\, \exp\left(i \left(c\, u^0 - \frac{\nu}{3} (u^0)^3\right)\right) $$ [110]


where


$$ \nu = \frac{\pi}{h} \int_T dr \int_T ds \int_T dt\, \frac{\delta^3 S}{\delta q(r)\, \delta q(s)\, \delta q(t)}\, \Psi_0(r)\, \Psi_0(s)\, \Psi_0(t) $$ [111]

$$ c = -\frac{2\pi}{h} \int_T dt\, \frac{\delta^2 S}{\delta q(t_b)\, \delta q(t)}\, \Psi_0(t)\, (x_b^\Delta - x_b) $$ [112]
The leading contribution of the Airy function when
$h$ tends to zero can be computed by the stationary
phase method. When $x_b^\Delta$ is in the “illuminated”
region, the probability amplitude $K(x_b^\Delta, t_b; p_a, t_a)$
oscillates rapidly as $h$ tends to zero. When $x_b^\Delta$ is in
the “dark” region, the probability amplitude decays
exponentially. Quantum mechanics softens up the
caustics.

The two kinds of degeneracies described in 
sections (1) and (2) may occur simultaneously. This 
happens, for instance, in glory scattering for which 
the cross section, to leading terms in the semiclassi- 
cal expansions, has been obtained by functional 
integration in closed form in terms of Bessel 
functions (Cartier and DeWitt-Morette 2005). 

3. The intersection $U_{a,b}$ is the empty set. There is
no classical solution corresponding to the quantum 
transition. This phenomenon, called “tunneling” or 
“barrier penetration,” is a rich chapter of quantum 
physics which can be found in most of the books 
listed under “Further reading.” 


A Multipurpose Tool 


Functional integration provides insight and techni- 
ques to quantum physics not available from the 
operator formalism. Just as an example, one can 
quote the section “Beyond WKB” which has often 
been dismissed in the operator formalism by stating 
that “WKB breaks down” in such cases. 

The power of functional integration stems from 
the power of infinite-dimensional spaces. For 
instance, compare the Lagrangian of a system with 
its action functional 


\[ S : X_{a,b} \to \mathbb{R}, \qquad S(x) = \int_T dt \, L(x(t), \dot{x}(t)), \quad x \in X_{a,b} \]   [113]


A classical solution $q$ of the system can be defined
either by a solution of the Euler-Lagrange equation,
together with the boundary conditions dictated by
$q \in X_{a,b}$, or by an extremum of the action functional,
$S'(q) = 0$. The path $q$ is a significant point in
$X_{a,b}$ but it is not isolated, and the Hessian $S''(q)$
gives much information on $q$, such as conservation
laws, caustics, tunneling.

A list of applications is beyond the scope of this 
article. We treat only two applications, then give in 
the “Further reading” section a short list of books 
that develop such applications as polarons, phase 
transitions, properties of quantum gases, scattering 
processes, many-body theory of bosons and fer- 
mions, knot invariants, quantum crystals, quantum 
field theory, anomalies, etc. 


The Homotopy Theorem for Paths Taking Their 
Values in a Multiply-Connected Space 


The space $X_{a,b}$ of paths $x$,

\[ x : T \to M^D, \qquad x \in X_{a,b}, \]

probes the global properties of their ranges $M^D$.
When $M^D$ is multiply connected, $X_{a,b}$ is the sum of
distinct homotopy classes of paths. The integral over
$X_{a,b}$ is a linear combination of integrals over each
homotopy class of paths. The coefficients of this
linear combination are provided by the homotopy
theorem.

The principle of superposition of quantum states
requires the probability amplitude for a given
transition to be a linear combination of probability
amplitudes. It follows that the absolute value of the
probability amplitude for a transition from the state
$a$ at $t_a$ to the state $b$ at $t_b$ has the form

\[ |K(b, t_b; a, t_a)| = \Bigl| \sum_{\alpha} \chi(\alpha) K^{\alpha}(b, t_b; a, t_a) \Bigr| \]   [114]

where $K^{\alpha}$ is the integral over paths in the same
homotopy class. The homotopy theorem (Laidlaw
and Morette-DeWitt (1971) and Schulman (1971), in
Cartier and DeWitt-Morette (2006)) states that the
set $\{\chi(\alpha)\}$ forms a representation of the fundamental
group of the multiply connected space $M^D$. One
cannot label a homotopy class by an element of the
fundamental group unless one has chosen a point
$c \in M^D$ and a homotopy class for paths going from
$c$ to $a$ and for paths going from $c$ to $b$; in brief,
unless one has chosen a homotopy mesh on $M^D$.
The fundamental group based at $c$ is isomorphic to
the fundamental group based at any other point of
$M^D$ but not canonically so. Therefore, eqn [114] is
only an equality between absolute values of probability
amplitudes. The proof of the homotopy
theorem consists in requiring [114] to be independent
of the chosen homotopy mesh.


Application: Systems of n-Indistinguishable
Particles in $\mathbb{R}^D$


In order that there be a one-to-one correspondence
between the system and its configuration space,
the configuration space is taken to be the quotient

\[ \mathbb{R}^{D,n} / S_n \]

where $S_n$ is the symmetric group of permutations of the $n$ particles;
the coincidence points in $\mathbb{R}^{D,n}$ are excluded so that
$S_n$ acts effectively on $\mathbb{R}^{D,n}$. Note that $\mathbb{R}^{1,n}$ is not
connected, but $\mathbb{R}^{2,n}$ is multiply connected. When
$D \ge 3$, $\mathbb{R}^{D,n}$ is simply connected and the fundamental
group of $\mathbb{R}^{D,n}/S_n$ is isomorphic to $S_n$.


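There are only two scalar unitary representations
of $S_n$:

\[ \chi^{B} : \alpha \in S_n \mapsto 1 \quad \text{for all permutations } \alpha \]

\[ \chi^{F} : \alpha \in S_n \mapsto \begin{cases} 1 & \text{for even permutations} \\ -1 & \text{for odd permutations} \end{cases} \]

Therefore, in $\mathbb{R}^3$ there are two different propagators
of indistinguishable particles:

\[ K^{\mathrm{Bose}} = \sum_{\alpha} \chi^{B}(\alpha) K^{\alpha} \]   [115]

is a symmetric propagator;

\[ K^{\mathrm{Fermi}} = \sum_{\alpha} \chi^{F}(\alpha) K^{\alpha} \]   [116]

is an antisymmetric propagator.

The arguments leading to the existence of (scalar)
bosons and fermions in $\mathbb{R}^3$ fail in $\mathbb{R}^2$. Statistics
cannot be assigned to particles in $\mathbb{R}^1$; particles
“without” statistics have been called anyons.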
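
The two scalar characters used in the Bose and Fermi propagators can be checked by brute force for small $n$. The sketch below is only an illustration: the function names (`sign`, `compose`, `is_representation`) are hypothetical helpers, and permutations are represented as tuples of `0..n-1`. It verifies that the constant character and the sign character are multiplicative on $S_n$; it does not prove that they are the only ones.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation given as a tuple of 0..n-1 (+1 even, -1 odd)."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def compose(p, q):
    """Composition 'p after q': (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def is_representation(chi, n):
    """Check chi(p o q) == chi(p) * chi(q) for all p, q in S_n."""
    group = list(permutations(range(n)))
    return all(chi(compose(p, q)) == chi(p) * chi(q) for p in group for q in group)

def chi_bose(perm):
    """Trivial character: 1 for every permutation."""
    return 1

chi_fermi = sign  # sign character: +1 for even, -1 for odd permutations
```

For instance, `is_representation(chi_fermi, 4)` runs over all 576 pairs of permutations in $S_4$, whereas a map that is identically $-1$ fails the same check.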


Application: a Spinning Top 


Schulman’s analysis of the Schrödinger equation for
a spinning top (Schulman 1968) motivated the
formulation of the homotopy theorem. Therefore,
Schulman’s results can easily be formulated as an
application of [114].


Application: Instantons (DeWitt 2004) 

The homotopy theorem reformulated for functional
integrals applies to the total $\langle \mathrm{out} | \mathrm{in} \rangle$ amplitude of
instantons in Minkowski spacetime.


Scaling Properties of Gaussians 


We rewrite the definition [26] of Gaussian volume 
elements as 


J deea a a e awe). rr 

X 

where the covariance $G$ is defined by the variance $W$,

\[ W(x') = \langle x', G x' \rangle \]


In quantum field theory the definition [26] reads

\[ \int d\Gamma_G(\varphi) \exp(-2\pi i \langle J, \varphi \rangle) := \exp(-\pi i W(J)) \]   [118]

where $\varphi$ is a field on spacetime (Minkowski, or
Euclidean) and $J$ is called the source. A Gaussian $\Gamma_G$
can be decomposed into the convolution of any
number of Gaussians. For example, if

\[ W = W_1 + W_2, \qquad G = G_1 + G_2 \]   [119]

then

\[ \Gamma_G = \Gamma_{G_1} * \Gamma_{G_2} \]   [120]

Explicitly, in QFT

\[ \int d\Gamma_G(\varphi) \exp(-2\pi i \langle J, \varphi \rangle) = \int d\Gamma_{G_2}(\varphi_2) \int d\Gamma_{G_1}(\varphi_1) \exp(-2\pi i \langle J, \varphi_1 + \varphi_2 \rangle) \]   [121]

where

\[ \varphi = \varphi_1 + \varphi_2 \]   [122]


The additive property [119] makes it possible to 
express a covariance G as an integral over an 
independent scale variable. 

Let $\lambda \in [0, \infty[$ be an independent scale variable
(some authors use $\lambda \in [1, \infty[$ and $\lambda^{-1} \in [0, 1[$). A
scale variable has no physical dimension:

\[ [\lambda] = 0 \]   [123]

The scaling operator $S_\lambda$ acting on a function $f$ of
length dimension $[f]$ is by definition

\[ S_\lambda f(x) = \lambda^{[f]} f(x/\lambda) \]   [124]

The scaling of an interval $[a, b[$ is given by

\[ S_\lambda [a, b[ \; = \{ s/\lambda \mid s \in [a, b[ \}, \quad \text{that is,} \quad S_\lambda [a, b[ \; = [a/\lambda, b/\lambda[ \]   [125]

The scaling of a functional $F$ is

\[ (S_\lambda F)(\varphi) = F(S_\lambda \varphi) \]   [126]
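
The scaling rules [124] and [125] are easy to try out numerically. The following is a minimal sketch, assuming that functions are plain Python callables and that the length dimension is passed explicitly as a number `dim`; the names `scale` and `scale_interval` are hypothetical. It also assumes that the scaled function carries the same length dimension as the original, so that two successive scalings compose.

```python
def scale(f, dim, lam):
    """Return S_lam f, where (S_lam f)(x) = lam**dim * f(x / lam)."""
    return lambda x: lam ** dim * f(x / lam)

def scale_interval(a, b, lam):
    """Endpoints of the scaled interval S_lam [a, b[ = [a/lam, b/lam[."""
    return (a / lam, b / lam)

# A function of length dimension 2 that is homogeneous of degree 2
# is left unchanged by the scaling operator.
square = lambda x: x ** 2
scaled_square = scale(square, 2, 3.0)
```

Under the stated assumption, `scale(scale(f, d, a), d, b)` agrees with `scale(f, d, a * b)`, which is the multiplicative structure that a semigroup in the scale variable relies on.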


In order to decompose a covariance into an integral
of scale-dependent contributions we note that a
covariance $G$ is a two-point function [31]. In
quantum field theory [118], the engineering length
dimension of $G$ is twice the field dimension:

\[ [G] = 2[\varphi] \]   [127]


Let x,y € spacetime and G be a Laplacian Green 
function. One can introduce a scaled (truncated) 
Green function 


Ginx, y) = [as Spul — yl) [128] 
where 
fet, Wet, Feed 
lb <l, [4] = [G] [129] 
such that

\[ \lim_{l_0 = 0, \; l = \infty} G_{[l_0, l[}(x, y) = G(x, y) \]   [130]

Example: $G(x, y) = c_D / |x - y|^{D-2}$; then the only
requirement on the function $u$ in [128] is


f d*rr*lu(r) =cp, [r|=1 [131] 
0 


All objects defined by the scaled covariance [128]
are labeled with the interval $[l_0, l[$. For instance,
a Gaussian volume element $\Gamma_{G_{[l_0, l[}}$ is abbreviated
to $\Gamma_{[l_0, l[}$.


A Coarse-Graining Operator 


The following coarse-graining operator has been
used for constructing a parabolic semigroup equation
in the scaling variable (Brydges et al. 1998):

\[ P_l F := S_{l/l_0} \cdot \bigl( \Gamma_{[l_0, l[} * F \bigr) \]   [132]

where the convolution product is by definition

\[ (\Gamma_{[l_0, l[} * F)(\varphi) = \int_X d\Gamma_{[l_0, l[}(\psi) \, F(\varphi + \psi) \]

The coarse-graining operator $P_l$ rescales the convolution
of a Gaussian volume element $\Gamma_{[l_0, l[}$ so that
all volume elements entering the construction of the
semigroup renormalization equation are scale
independent.

Some properties of the coarse-graining operator: 


s Pori = Phy Ip: 
e The scaled eigenfunctions of the coarse-graining 


operator are Wick-ordered monomials (Wurm 
and Berg 2002) 


Pca (es Eou , 
1: P (5) Vives) = l : P T * [lo,o[ [133] 


Note that P; preserves the scale range. 
e Let H be the generator of the coarse-graining 
operator 


Laer [134] 


og” 
a w o al 
0 





The semigroup renormalization equation (a.k.a.
the flow equation)

\[ \frac{\partial^{\times}}{\partial l} P_l F(\varphi) = H P_l F(\varphi), \qquad P_{l_0} F(\varphi) = F(\varphi) \]   [135]

Brydges et al. have applied the coarse-graining
operator to the quantum field theory known as
“$\lambda\varphi^4$” (more precisely the Wick-ordered Lagrangian
of $\lambda\varphi^4$). The flow equation [135] plays the role of
the “$\beta$-function” equation in perturbative quantum
field theory.


Functional Integrals in Quantum Field Theory 


Functional integrals in quantum field theory have 
been modeled to some extent on path integrals in 
quantum mechanics: mutatis mutandis, the defini- 
tion [23] of Gaussian volume elements, the diagram 
expansion [30], the property [36] of linear maps, 
semiclassical expansions [87], the homotopy theo- 
rem [114], and the scaling eqns [135] apply to 
functional integrals in quantum field theory. The 
time ordering encoded in a path integral becomes a
chronological ordering dictated by light cones in
functional integrals of fields on Minkowski spacetime.

The fundamental difference between quantum 
mechanics (systems with a finite number of degrees 
of freedom) and quantum field theory (systems with 
an infinite number of degrees of freedom) can be 
said to be “radiative corrections.” In quantum field 
theory, the concept of “particle” is intrinsically
associated with the concept of “field.” A particle is
affected by its field. Its mass and charge are 
modified by the surrounding fields, namely its own 
and other fields interacting with it. One speaks of 
“bare mass” and “renormalized mass” when the 
bare mass is renormalized by surrounding fields. 
Computing radiative corrections is a delicate proce- 
dure because the Green functions G defined by [25] 
are singular. Regularization techniques have been 
developed for handling singular Green functions. 

Particles in quantum mechanics are simply particles, 
and bosons and fermions can be treated separately. 
Not so in quantum field theory. Therefore, the 
configuration space in quantum field theory is a 
supermanifold. For functional integrals in this theory, 
we refer the reader to the “Further reading” section, in 
particular to the book of A Das for an introduction, to 
the book of B DeWitt for an in-depth study, and to the 
book of K Fujikawa and H Suzuki for applications to 
quantum anomalies. 


Concluding Remarks 


The key issue in functional integration is the domain
of integration, that is, a function space. This infinite-dimensional
space, say $X$, cannot be considered as
the limit $D = \infty$ of $\mathbb{R}^D$.

Concepts of $\mathbb{R}^D$ stated without reference to $D$ are
likely to be meaningful on $X$. Other approaches
which have been used for exploring $X$ are


• projective systems of finite-dimensional spaces
coherently defined on $X$ (DeWitt-Morette et al.
1979),

• one-parameter curves on $X$ (Figure 1), and

• projecting $X$ on finite-dimensional spaces (cylindrical
integrals).


Functional integration has advanced our under- 
standing of infinite-dimensional spaces, and like all 
good mathematical tools, it improves with usage. 


See also: BRST Quantization; Euclidean Field Theory; 
Feynman Path Integrals; Infinite-Dimensional 
Hamiltonian Systems; Knot Theory and Physics; 
Malliavin Calculus; Path Integrals in Noncommutative 
Geometry; Quantum Mechanics: Foundations; Stationary 
Phase Approximation; Topological Sigma Models. 
Introduction


Several asymptotic problems in the calculus of
variations lead to the following question: given a
sequence $F_k$ of functionals, defined on a suitable
function space, does there exist a functional $F$ such
that the solutions of the minimum problems for $F_k$
converge to the solutions of the corresponding
minimum problems for $F$? $\Gamma$-convergence, introduced
by Ennio De Giorgi and his collaborators in 1975, and
developed as a powerful tool to attack a wide range of
applied problems, provides a unified answer to this
kind of question.


Definition and Main Properties 


Let $U$ be a topological space with a countable
base and let $F_k$ be a sequence of functions defined
on $U$ with values in the extended real line
$\overline{\mathbb{R}} := \mathbb{R} \cup \{-\infty, +\infty\}$. We say that $F_k$ $\Gamma$-converges
to a function $F : U \to \overline{\mathbb{R}}$, or that $F$ is the $\Gamma$-limit of
$F_k$, if for every $u \in U$ the following conditions are
satisfied:


1. For every sequence $u_k$ converging to $u$ in $U$ we
have

\[ F(u) \le \liminf_{k \to \infty} F_k(u_k) \]

2. There exists a sequence $u_k$ converging to $u$ in $U$
such that

\[ F(u) = \lim_{k \to \infty} F_k(u_k) \]


Property (1) appears to be a variant of the usual
definition of lower semicontinuity. Property (2)
requires the existence, for every $u \in U$, of a “recovery
sequence,” which provides an approximation of the
value of $F$ at $u$ by means of values attained by $F_k$
near $u$.
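
A standard example, not taken from the text, helps to see how the two conditions differ from pointwise convergence: on the real line the functions $F_k(u) = \sin(ku)$ have no pointwise limit, yet they $\Gamma$-converge to the constant $-1$, because every neighborhood of every point eventually contains points where $\sin(ku) = -1$. The sketch below estimates the infimum of $F_k$ over a small interval around $u$, which for large $k$ approximates the $\Gamma$-limit at $u$. The function names and the grid resolution are illustrative assumptions.

```python
import numpy as np

def local_infimum(F, u, delta, samples=200_001):
    """Approximate the infimum of F over [u - delta, u + delta] on a uniform grid."""
    v = np.linspace(u - delta, u + delta, samples)
    return float(np.min(F(v)))

def gamma_limit_estimate(k, u, delta):
    """Local infimum of F_k(v) = sin(k v) near u; close to -1 once k * delta is large."""
    return local_infimum(lambda v: np.sin(k * v), u, delta)
```

The recovery sequence of condition (2) is visible here: for each $k$ one may take $u_k$ to be the grid point nearest to $u$ at which the infimum is attained, and $|u_k - u| \le \delta$ can be made as small as desired by increasing $k$.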


It follows immediately from the definition that,
if $F_k$ $\Gamma$-converges to $F$, then $F_k + G$ $\Gamma$-converges to
$F + G$ for every continuous function $G : U \to \mathbb{R}$.

The first general property of $\Gamma$-limits is lower
semicontinuity: if $F_k$ $\Gamma$-converges to $F$, then $F$ is
lower semicontinuous on $U$; that is,

\[ F(u) \le \liminf_{k \to \infty} F(u_k) \]

for every $u \in U$ and for every sequence $u_k$ converging
to $u$ in $U$.

Another important property of $\Gamma$-convergence is
compactness: every sequence $F_k$ has a $\Gamma$-convergent
subsequence.

For every $k$ assume that the function $F_k$ has a
minimum point $u_k$. The following property is the
link between $\Gamma$-convergence and convergence of
minimizers: if $F_k$ $\Gamma$-converges to $F$ and $u_k$ converges
to $u$, then $u$ is a minimum point of $F$ and
$F_k(u_k)$ converges to $F(u)$, hence

\[ \min_{v \in U} F(v) = \lim_{k \to \infty} \min_{v \in U} F_k(v) \]   [1]

Under suitable coerciveness assumptions, the
convergence of $u_k$ is obtained by a compactness
argument. We recall that a sequence of functions $F_k$
is said to be equicoercive if for every $t \in \mathbb{R}$ there
exists a compact set $K_t$ (independent of $k$) such that

\[ \{ u \in U : F_k(u) \le t \} \subset K_t \]   [2]

for every $k$.

If $F_k$ is equicoercive and $\Gamma$-converges to $F$, the
previous result implies that [1] holds. If, in addition,
$F$ is not identically $+\infty$, then the sequence $u_k$ of
minimizers considered above has a subsequence $u_{k_j}$
which converges to a minimizer $u$ of $F$. The whole
sequence $u_k$ converges to $u$ whenever $F$ has a unique
minimizer $u$.
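
The convergence of minima and minimizers can be observed on a one-dimensional toy problem. The example is an assumption made for illustration and is not from the text: $F_k(x) = x^2 + \sin(kx)$ on the real line. Since $\sin(kx)$ $\Gamma$-converges to the constant $-1$ and $x^2$ is continuous, the $\Gamma$-limit is $F(x) = x^2 - 1$. The sequence is equicoercive because $F_k(x) \ge x^2 - 1$. The sketch minimizes on a fine grid over $[-2, 2]$, which contains all the minimizers.

```python
import numpy as np

def minimize_on_grid(F, a=-2.0, b=2.0, samples=400_001):
    """Return (argmin, min) of F on a uniform grid over [a, b]."""
    x = np.linspace(a, b, samples)
    values = F(x)
    i = int(np.argmin(values))
    return float(x[i]), float(values[i])

def F_k(k):
    """Oscillating functional F_k(x) = x^2 + sin(k x)."""
    return lambda x: x ** 2 + np.sin(k * x)

def F_limit(x):
    """Gamma-limit of F_k: x^2 - 1."""
    return x ** 2 - 1.0

x_k, min_k = minimize_on_grid(F_k(1000))  # minimizer and minimum for k = 1000
x_0, min_0 = minimize_on_grid(F_limit)    # minimizer and minimum of the limit
```

For $k = 1000$ the computed minimum is within $10^{-3}$ of $\min F = -1$ and the minimizer lies within $10^{-2}$ of the unique minimizer $0$ of $F$, in agreement with [1], even though $F_k$ does not converge pointwise.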

In many applications to the calculus of variations,
$U$ is the Lebesgue space $L^p(\Omega; \mathbb{R}^m)$, with $\Omega$ a
bounded open subset of $\mathbb{R}^n$ and $1 < p < +\infty$, but
the effective domains of the functionals $F_k$, defined
as $\{u \in U : F_k(u) \in \mathbb{R}\}$, are often contained in the
Sobolev space $W^{1,p}(\Omega; \mathbb{R}^m)$, composed of all functions
$u \in L^p(\Omega; \mathbb{R}^m)$ whose distributional gradient
$\nabla u$ belongs to $L^p(\Omega; \mathbb{R}^{m \times n})$. When one considers
homogeneous Dirichlet boundary conditions, the
effective domains of the functionals $F_k$ are often
contained in the smaller Sobolev space $W_0^{1,p}(\Omega; \mathbb{R}^m)$,
composed of all functions of $W^{1,p}(\Omega; \mathbb{R}^m)$ which
vanish on the boundary $\partial\Omega$, technically defined as
the closure of $C_c^{\infty}(\Omega; \mathbb{R}^m)$ in $W^{1,p}(\Omega; \mathbb{R}^m)$.

In this case, the equicoerciveness condition [2] can
be obtained by using Rellich’s theorem, which
asserts that the natural embedding of $W_0^{1,p}(\Omega; \mathbb{R}^m)$
into $L^p(\Omega; \mathbb{R}^m)$ is compact. Therefore, a sequence of
functionals $F_k$ defined on $L^p(\Omega; \mathbb{R}^m)$ is equicoercive
if there exists a constant $\alpha > 0$ such that

\[ F_k(u) \ge \alpha \int_{\Omega} |\nabla u|^p \, dx \]

for every $u \in W_0^{1,p}(\Omega; \mathbb{R}^m)$, while $F_k(u) = +\infty$ for
every $u \notin W_0^{1,p}(\Omega; \mathbb{R}^m)$.


Homogenization Problems 


Many problems for composite materials (fibered or 
stratified materials, porous media, materials with 
many small holes or fissures, etc.) lead to the study 
of mathematical models with many interacting scales, 
which may differ by several orders of magnitude. 
From a microscopic viewpoint, the systems considered 
are highly inhomogeneous. Typically, in such com- 
posite materials, the physical parameters (such as 
electric and thermal conductivity, elasticity coeffi- 
cients, etc.) are discontinuous and oscillate between 
the different values characterizing each component. 

When these components are intimately mixed, 
these parameters oscillate very rapidly and the 
microscopic structure becomes more and more com- 
plex. On the other hand, the material becomes quite 
simple from a macroscopic point of view, and it tends 
to behave like an ideal homogeneous material, called 
“homogenized material.” The purpose of the mathe- 
matical theory of homogenization is to describe this 
limit process when the parameters which describe the 
fineness of the microscopic structure tend to zero. 

Homogenization problems are often treated by 
studying the partial differential equations that 
govern the physical properties under investigation. 
Due to the small scale of the microscopic structure, 
these equations contain some small parameters. The 
mathematical problem consists then in the study of 
the limit of the solutions of these equations when the 
parameters tend to zero. $\Gamma$-convergence is a very
useful tool to obtain homogenization results for 
systems governed by variational principles, which 
are the only ones described in this article. 


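Let $Q := (-1/2, 1/2)^n$ be the open unit cube in $\mathbb{R}^n$
centered at 0. We say that a function $u$ defined on $\mathbb{R}^n$ is
$Q$-periodic if, for every $z \in \mathbb{R}^n$ with integer coordinates,
we have $u(x + z) = u(x)$ for every $x \in \mathbb{R}^n$.

Let $f : \mathbb{R}^n \times \mathbb{R}^{m \times n} \to [0, +\infty)$ be a function such
that $x \mapsto f(x, \xi)$ is measurable and $Q$-periodic on $\mathbb{R}^n$
for every $\xi \in \mathbb{R}^{m \times n}$ and $\xi \mapsto f(x, \xi)$ is convex on $\mathbb{R}^{m \times n}$
for every $x \in \mathbb{R}^n$. Given a bounded open set $\Omega \subset \mathbb{R}^n$
and a constant $p > 1$, let $F_\varepsilon : L^p(\Omega; \mathbb{R}^m) \to [0, +\infty]$
be the family of functionals defined by

\[ F_\varepsilon(u) := \begin{cases} \int_{\Omega} f(x/\varepsilon, \nabla u) \, dx & \text{if } u \in W_0^{1,p}(\Omega; \mathbb{R}^m) \\ +\infty & \text{otherwise} \end{cases} \]

In the applications to composite materials, the functional
$F_\varepsilon$ represents the energy of the portion of the
material occupying the domain $\Omega$. The fact that the
energy density depends on $x/\varepsilon$ reflects the $\varepsilon$-periodic
structure of the material, which implies that the energy
density oscillates faster and faster as $\varepsilon \to 0$.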

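Assume that there exist two constants $\beta \ge \alpha > 0$
such that

\[ \alpha |\xi|^p \le f(x, \xi) \le \beta (1 + |\xi|^p) \]   [3]

for every $x \in \mathbb{R}^n$ and every $\xi \in \mathbb{R}^{m \times n}$. Then for every
sequence $\varepsilon_k \to 0$ the functionals $F_{\varepsilon_k}$ $\Gamma$-converge to
the functional $F_{\mathrm{hom}} : L^p(\Omega; \mathbb{R}^m) \to [0, +\infty]$ defined by

\[ F_{\mathrm{hom}}(u) := \begin{cases} \int_{\Omega} f_{\mathrm{hom}}(\nabla u) \, dx & \text{if } u \in W_0^{1,p}(\Omega; \mathbb{R}^m) \\ +\infty & \text{otherwise} \end{cases} \]   [4]

The integrand $f_{\mathrm{hom}} : \mathbb{R}^{m \times n} \to [0, +\infty)$ is obtained by
solving the cell problem

\[ f_{\mathrm{hom}}(\xi) := \min_{w \in W^{1,p}_{\mathrm{per}}(Q; \mathbb{R}^m)} \int_{Q} f(x, \xi + \nabla w) \, dx \]   [5]

where $W^{1,p}_{\mathrm{per}}(Q; \mathbb{R}^m)$ denotes the space of functions
$w \in W^{1,p}_{\mathrm{loc}}(\mathbb{R}^n; \mathbb{R}^m)$ which are $Q$-periodic.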

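The function $f_{\mathrm{hom}}$ is always convex and satisfies
[3]. If it is strictly convex, the basic properties of
$\Gamma$-convergence imply that for every $g \in L^q(\Omega; \mathbb{R}^m)$,
with $1/p + 1/q = 1$, the solutions $u_\varepsilon$ of the minimum
problems

\[ \min_{v \in W_0^{1,p}(\Omega; \mathbb{R}^m)} \int_{\Omega} \bigl[ f(x/\varepsilon, \nabla v) - g(x) \, v \bigr] \, dx \]   [6]

converge in $L^p(\Omega; \mathbb{R}^m)$, as $\varepsilon \to 0$, to the solution $u$ of
the minimum problem

\[ \min_{v \in W_0^{1,p}(\Omega; \mathbb{R}^m)} \int_{\Omega} \bigl[ f_{\mathrm{hom}}(\nabla v) - g(x) \, v \bigr] \, dx \]   [7]

Similar results can be proved for nonhomogeneous
Dirichlet boundary conditions, as well as for
Neumann boundary conditions.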

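In the special case $m = 1$, $p = 2$, and

\[ f(x, \xi) = \frac{1}{2} \sum_{i,j=1}^{n} a_{ij}(x) \, \xi_j \xi_i \]   [8]

with $a_{ij}(x)$ $Q$-periodic, the function $f_{\mathrm{hom}}$ takes the
form

\[ f_{\mathrm{hom}}(\xi) = \frac{1}{2} \sum_{i,j=1}^{n} a^{\mathrm{hom}}_{ij} \, \xi_j \xi_i \]

for suitable constant coefficients $a^{\mathrm{hom}}_{ij}$.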

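By considering the Euler equations of the problems
[6] and [7] in this special case, from the
previous result we obtain the homogenization
theorem for symmetric elliptic operators in divergence
form, which asserts that for every $g \in L^2(\Omega)$
the solutions $u_\varepsilon$ of the Dirichlet problems

\[ -\sum_{i,j=1}^{n} D_i \bigl( a_{ij}(x/\varepsilon) \, D_j u_\varepsilon(x) \bigr) = g(x) \ \text{on } \Omega, \qquad u_\varepsilon(x) = 0 \ \text{on } \partial\Omega \]

converge in $L^2(\Omega)$ to the solution $u$ of the Dirichlet
problem

\[ -\sum_{i,j=1}^{n} a^{\mathrm{hom}}_{ij} \, D_i D_j u(x) = g(x) \ \text{on } \Omega, \qquad u(x) = 0 \ \text{on } \partial\Omega \]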
An extensive literature is devoted to precise
estimates of the homogenized coefficients $a^{\mathrm{hom}}_{ij}$
depending on various structure conditions on the
periodic coefficients $a_{ij}(x)$. Some of these estimates
are based on a clever use of the variational
formula [5].

Explicit formulas for $a^{\mathrm{hom}}_{ij}$ are known in the case
of layered materials, which correspond to the case
where $\mathbb{R}^n$ is periodically partitioned into parallel
layers on which the coefficients $a_{ij}(x)$ take constant
values.

Easy examples show that, even if the composite
material is isotropic at a microscopic level (i.e.,
$a_{ij}(x) = a(x) \delta_{ij}$ for some scalar function $a(x)$), the
homogenized material can be anisotropic (i.e.,
$a^{\mathrm{hom}}_{ij} \ne a \, \delta_{ij}$), due to the anisotropy of the periodic
function $a(x)$, which describes the microscopic
distribution of the different components of the
composite material.
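
The layered case can be checked numerically in its simplest form. The sketch below rests on assumptions chosen for illustration: a two-phase laminate with conductivities 1 and 10 in equal volume fractions, a scalar coefficient $a(x)$ depending on one variable only, and a finite-difference discretization of the cell problem for the density $a(x)(\xi + w')^2$ with $\xi = 1$. A discrete gradient $d$ comes from a periodic $w$ exactly when its mean vanishes, so the cell problem becomes an equality-constrained quadratic minimization, solved here through its optimality (KKT) system. The helper name `cell_problem_1d` is hypothetical.

```python
import numpy as np

def cell_problem_1d(a, xi=1.0):
    """Minimise sum_i h*a_i*(xi + d_i)^2 over d with sum_i h*d_i = 0.

    d_i plays the role of the derivative of a periodic corrector w on cell i.
    Returns the minimum value, i.e. the effective coefficient times xi**2.
    """
    n = a.size
    h = 1.0 / n
    # Optimality conditions with Lagrange multiplier mu:
    #   2*h*a_i*(xi + d_i) + mu*h = 0  for each i,   sum_i h*d_i = 0
    kkt = np.zeros((n + 1, n + 1))
    kkt[:n, :n] = np.diag(2.0 * h * a)
    kkt[:n, n] = h
    kkt[n, :n] = h
    rhs = np.concatenate([-2.0 * h * a * xi, [0.0]])
    d = np.linalg.solve(kkt, rhs)[:n]
    return float(np.sum(h * a * (xi + d) ** 2))

a = np.where(np.arange(200) < 100, 1.0, 10.0)  # two layers, equal volume fractions
a_across = cell_problem_1d(a)                  # field applied across the layers
a_harmonic = float(1.0 / np.mean(1.0 / a))     # harmonic mean of a
a_along = float(np.mean(a))                    # arithmetic mean of a
```

The computed `a_across` reproduces the harmonic mean, which is the classical effective coefficient across the layers. For a field applied along the layers the corrector $w = 0$ already satisfies the Euler equation of the cell problem, so the effective coefficient is the arithmetic mean `a_along`. The two values differ, which is the anisotropy described above even though each layer is isotropic.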

In the vector case $m > 1$, the convexity hypothesis
on $\xi \mapsto f(x, \xi)$ is not satisfied by the most interesting
functionals related to nonlinear elasticity. If $\xi \mapsto f(x, \xi)$
is not convex, one can still prove that $F_{\varepsilon_k}$ $\Gamma$-converges
to a functional $F_{\mathrm{hom}} : L^p(\Omega; \mathbb{R}^m) \to [0, +\infty]$ of the
form [4], but this time $f_{\mathrm{hom}} : \mathbb{R}^{m \times n} \to [0, +\infty)$ cannot
be obtained by solving a problem in the unit cell.
Instead, it is given by the asymptotic formula

\[ f_{\mathrm{hom}}(\xi) := \lim_{R \to \infty} \frac{1}{R^n} \inf_{w \in W_0^{1,p}(Q_R; \mathbb{R}^m)} \int_{Q_R} f(x, \xi + \nabla w) \, dx \]

where $Q_R := (-R/2, R/2)^n$ is the open cube of side
$R$ centered at 0. Similar formulas can be obtained
for quasiperiodic integrands $f$ and for stochastic
homogenization problems.

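In the nonperiodic case one can prove that, if
$g_\varepsilon : \mathbb{R}^n \times \mathbb{R}^{m \times n} \to [0, +\infty)$ are arbitrary Borel functions
satisfying [3], with constants independent of $\varepsilon$,
and $G_\varepsilon : L^p(\Omega; \mathbb{R}^m) \to [0, +\infty]$ are defined by

\[ G_\varepsilon(u) := \begin{cases} \int_{\Omega} g_\varepsilon(x, \nabla u) \, dx & \text{if } u \in W_0^{1,p}(\Omega; \mathbb{R}^m) \\ +\infty & \text{otherwise} \end{cases} \]

then there exists a sequence $\varepsilon_k \to 0$ such that the
functionals $G_{\varepsilon_k}$ $\Gamma$-converge to a functional $G$ of the
form

\[ G(u) := \begin{cases} \int_{\Omega} g(x, \nabla u) \, dx & \text{if } u \in W_0^{1,p}(\Omega; \mathbb{R}^m) \\ +\infty & \text{otherwise} \end{cases} \]

with $g$ satisfying [3].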

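In this case, no easy formula provides the integrand
$g(x, \xi)$ in terms of simple operations on the integrands
$g_{\varepsilon_k}(x, \xi)$. The indirect connection between these
integrands can be obtained by introducing the
functions $M_\varepsilon(x, \xi, \rho)$ defined, for $x \in \Omega$, $\xi \in \mathbb{R}^{m \times n}$,
and $0 < \rho < \mathrm{dist}(x, \partial\Omega)$, by

\[ M_\varepsilon(x, \xi, \rho) := \min_{w \in W_0^{1,p}(B(x,\rho); \mathbb{R}^m)} \int_{B(x,\rho)} g_\varepsilon(y, \xi + \nabla w) \, dy \]

where $B(x, \rho)$ is the open ball with center $x$ and radius
$\rho$. These functions describe the local behavior of the
integrands $g_\varepsilon$ in some special minimum problems. The
sequence $G_{\varepsilon_k}$ $\Gamma$-converges to $G$ if and only if

\[ g(x, \xi) = \liminf_{\rho \to 0} \liminf_{k \to \infty} \frac{M_{\varepsilon_k}(x, \xi, \rho)}{|B(x, \rho)|} = \limsup_{\rho \to 0} \limsup_{k \to \infty} \frac{M_{\varepsilon_k}(x, \xi, \rho)}{|B(x, \rho)|} \]

for almost every $x \in \Omega$ and every $\xi \in \mathbb{R}^{m \times n}$.
Similar results have also been proved for integral
functionals of the form

\[ G_\varepsilon(u) := \begin{cases} \int_{\Omega} g_\varepsilon(x, u, \nabla u) \, dx & \text{if } u \in W_0^{1,p}(\Omega; \mathbb{R}^m) \\ +\infty & \text{otherwise} \end{cases} \]

under suitable structure conditions for the integrands
$g_\varepsilon$.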
Perforated Domains


In some homogenization problems, the integrand is
fixed, but the domain depends on a small parameter $\varepsilon$
and its boundary becomes more and more fragmented
as $\varepsilon \to 0$. A typical example is given by periodically
perforated domains with small holes. Given a
bounded open set $\Omega \subset \mathbb{R}^n$ and a compact set $K \subset Q$,
both with smooth boundaries, for every $\varepsilon > 0$ we
consider the perforated sets

\[ \Omega_\varepsilon := \Omega \setminus \bigcup_{z \in Z_\varepsilon} (\varepsilon z + \varepsilon K) \]   [9]

where $Z_\varepsilon$ is the set of vectors $z \in \mathbb{R}^n$ with integer
coordinates such that $\varepsilon z + \varepsilon Q \subset \Omega$.

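Given $g \in L^2(\Omega)$, let $F_\varepsilon : L^2(\Omega) \to [0, +\infty]$ be the
functionals defined by

\[ F_\varepsilon(u) := \begin{cases} \int_{\Omega_\varepsilon} \bigl[ \frac{1}{2} |\nabla u|^2 - g u \bigr] \, dx & \text{if } u \in W_0^{1,2}(\Omega) \\ +\infty & \text{otherwise} \end{cases} \]   [10]

Minimizing [10] is equivalent to solving the mixed
problems

\[ -\Delta u_\varepsilon = g \ \text{on } \Omega_\varepsilon, \qquad u_\varepsilon = 0 \ \text{on } \partial\Omega, \qquad \frac{\partial u_\varepsilon}{\partial \nu} = 0 \ \text{on } \partial\Omega_\varepsilon \setminus \partial\Omega \]   [11]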

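The homogenization formula [5] is still valid, with
minor modifications. It leads to a matrix of
coefficients $a^{\mathrm{hom}}_{ij}$ such that

\[ \sum_{i,j=1}^{n} a^{\mathrm{hom}}_{ij} \, \xi_j \xi_i = \min_{w \in W^{1,2}_{\mathrm{per}}(Q)} \int_{Q \setminus K} |\xi + \nabla w|^2 \, dx \]

for every $\xi \in \mathbb{R}^n$. For every sequence $\varepsilon_k \to 0$ the $\Gamma$-limit
of the functionals $F_{\varepsilon_k}$ is the functional $F : L^2(\Omega) \to
[0, +\infty]$ defined by

\[ F(u) := \begin{cases} \int_{\Omega} \Bigl[ \frac{1}{2} \sum_{i,j=1}^{n} a^{\mathrm{hom}}_{ij} \, D_j u \, D_i u - m g u \Bigr] \, dx & \text{if } u \in W_0^{1,2}(\Omega) \\ +\infty & \text{otherwise} \end{cases} \]

where $m := |Q \setminus K|$ is the volume fraction of the
sets $\Omega_\varepsilon$.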

Since a slight modification of the functionals $F_\varepsilon$
satisfies an equicoerciveness condition, it follows
from the basic properties of $\Gamma$-convergence that the
solutions $u_\varepsilon$ of the mixed problems [11] in the
perforated domains [9], extended to the holes so
that $u_\varepsilon$ are harmonic on $\Omega \setminus \Omega_\varepsilon$ and $u_\varepsilon \in W_0^{1,2}(\Omega)$,
converge in $L^2(\Omega)$ to the solution $u$ of the Dirichlet
problem

\[ -\sum_{i,j=1}^{n} a^{\mathrm{hom}}_{ij} \, D_i D_j u = m g \ \text{on } \Omega, \qquad u = 0 \ \text{on } \partial\Omega \]

Therefore, the asymptotic effect of the small holes
with Neumann boundary condition is a change in
the coefficients of the elliptic equation.

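In the case of Dirichlet boundary conditions, it is
interesting to consider perforated domains with
holes of a different size, namely

\[ \Omega_\varepsilon := \Omega \setminus \bigcup_{z \in Z_\varepsilon} \bigl( \varepsilon z + \varepsilon^{n/(n-2)} K \bigr) \]   [12]

with $\varepsilon^{n/(n-2)}$ replaced by $\exp(-1/\varepsilon^2)$ if $n = 2$, while
the case $n = 1$ gives only trivial results.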

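Given $g \in L^2(\Omega)$, let $G_\varepsilon : L^2(\Omega) \to [0, +\infty]$ be the
functionals defined by

\[ G_\varepsilon(u) := \begin{cases} \int_{\Omega_\varepsilon} \bigl[ \frac{1}{2} |\nabla u|^2 - g u \bigr] \, dx & \text{if } u \in W_0^{1,2}(\Omega_\varepsilon) \\ +\infty & \text{otherwise} \end{cases} \]   [13]

Minimizing [13] is equivalent to solving the Dirichlet
problems

\[ -\Delta u_\varepsilon = g \ \text{on } \Omega_\varepsilon, \qquad u_\varepsilon = 0 \ \text{on } \partial\Omega_\varepsilon \]   [14]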


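For every sequence $\varepsilon_k \to 0$ the $\Gamma$-limit of the
functionals $G_{\varepsilon_k}$ is the functional $G : L^2(\Omega) \to [0, +\infty]$
defined by

\[ G(u) := \begin{cases} \int_{\Omega} \bigl[ \frac{1}{2} |\nabla u|^2 + \frac{c}{2} u^2 - g u \bigr] \, dx & \text{if } u \in W_0^{1,2}(\Omega) \\ +\infty & \text{otherwise} \end{cases} \]

where, for $n \ge 3$,

\[ c := \mathrm{cap}(K) = \min_{w = 1 \text{ on } K} \int_{\mathbb{R}^n} |\nabla w|^2 \, dx \]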

Since a slight modification of the functionals $G_\varepsilon$
satisfies an equicoerciveness condition, it follows
from the basic properties of $\Gamma$-convergence that the
solutions $u_\varepsilon$ of the Dirichlet problems [14] in the
perforated domains [12], extended as zero on
$\Omega \setminus \Omega_\varepsilon$, converge in $L^2(\Omega)$ to the solution $u$ of the
Dirichlet problem

\[ -\Delta u + c u = g \ \text{on } \Omega, \qquad u = 0 \ \text{on } \partial\Omega \]   [15]

In the electrostatic interpretation of these problems,
the boundary $\partial\Omega_\varepsilon$ is a conductor kept at potential
zero. The extra term $cu$ in [15] is due to the electric
charges induced on $\partial\Omega_\varepsilon$ by the charge distribution $g$.
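
As a worked instance of the constant $c$ in [15], assume $n = 3$ and let $K$ be the closed ball of radius $r$ centered at the origin (an illustrative choice, not taken from the text). With the capacity defined as the minimum of $\int_{\mathbb{R}^3} |\nabla w|^2 \, dx$ among functions equal to 1 on $K$, the minimizer is harmonic outside $K$ and vanishes at infinity, and the computation can be sketched as follows.

```latex
% Capacity of a ball of radius r in R^3 (assumed example).
% The minimizer is w(x) = 1 for |x| <= r and w(x) = r/|x| for |x| > r.
\[
  |\nabla w(x)| = \frac{r}{|x|^{2}} \quad \text{for } |x| > r ,
\]
\[
  \mathrm{cap}(K)
  = \int_{|x| > r} |\nabla w|^{2} \, dx
  = \int_{r}^{\infty} \frac{r^{2}}{\rho^{4}} \, 4\pi \rho^{2} \, d\rho
  = 4\pi r^{2} \int_{r}^{\infty} \frac{d\rho}{\rho^{2}}
  = 4\pi r .
\]
```

Under these assumptions the limit problem reads $-\Delta u + 4\pi r \, u = g$ on $\Omega$, so larger holes produce a stronger zero-order term.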

These results on Dirichlet and Neumann boundary 
conditions have been extended to more general 
functionals and also to a wide class of nonperiodic 
distributions of small holes. 


Dimension Reduction Problems 


In the study of thin elastic structures, like plates, 
membranes, rods, and strings, it is customary to 
approximate the mechanical behavior of a thin three- 
dimensional body by an effective theory for two- or 
one-dimensional elastic bodies. $\Gamma$-convergence provides
a useful tool for a rigorous deduction of the lower- 
dimensional theory. 

Let us focus on the derivation of plate theory from
three-dimensional finite elasticity. The reference
configuration of the thin three-dimensional elastic
body is a cylinder of the form

\[ \Omega_\varepsilon := S \times \left( -\tfrac{\varepsilon}{2}, \tfrac{\varepsilon}{2} \right) \]

where $\varepsilon > 0$ and $S$ is a bounded open subset of $\mathbb{R}^2$
with smooth boundary. We assume that the body is
hyperelastic, with stored elastic energy

\[ \int_{\Omega_\varepsilon} W(\nabla u) \, dx \]

where $u : \Omega_\varepsilon \to \mathbb{R}^3$ is the deformation. The energy
density $W : \mathbb{R}^{3 \times 3} \to [0, +\infty]$, depending on the
material, is continuous and frame indifferent; that
is, $W(QF) = W(F)$ for every rotation $Q$ and every
$F \in \mathbb{R}^{3 \times 3}$, where $QF$ denotes the usual product of
$3 \times 3$ matrices. We assume that $W$ vanishes on the
set $SO(3)$ of rotations, is of class $C^2$ in a
neighborhood of $SO(3)$, and satisfies the inequality

\[ W(F) \ge \alpha \, \mathrm{dist}^2(F, SO(3)) \quad \text{for every } F \in \mathbb{R}^{3 \times 3} \]   [16]

with a constant $\alpha > 0$.

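Plate theory is obtained in the limit as $\varepsilon \to 0$ when
the densities of the volume forces applied to the
body have the form $\varepsilon^2 f(x_1, x_2)$, with $f \in L^2(S; \mathbb{R}^3)$.
We assume that $f$ is balanced; that is,

\[ \int_{\Omega_\varepsilon} f \, dx = 0, \qquad \int_{\Omega_\varepsilon} x \wedge f \, dx = 0 \]

Stable equilibria are then obtained by minimizing
the functionals

\[ \int_{\Omega_\varepsilon} \bigl[ W(\nabla u) - \varepsilon^2 f \cdot u \bigr] \, dx \]   [17]

on $W^{1,2}(\Omega_\varepsilon; \mathbb{R}^3)$.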



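To study the behavior of [17] as $\varepsilon \to 0$, it is
convenient to change variables, so that the scaled
deformations $v(x_1, x_2, x_3) := u(x_1, x_2, \varepsilon x_3)$ are
defined on the same domain

\[ \Omega := S \times \left( -\tfrac{1}{2}, \tfrac{1}{2} \right) \]

The scaled energy density $W_\varepsilon : \mathbb{R}^{3 \times 3} \to [0, +\infty]$ is
then defined as

\[ W_\varepsilon(F_1 | F_2 | F_3) := W\bigl( F_1 \,|\, F_2 \,|\, \tfrac{1}{\varepsilon} F_3 \bigr) \]

where $(F_1 | F_2 | F_3)$ denotes the $3 \times 3$ matrix with
columns $F_1$, $F_2$, and $F_3$. This implies that

\[ \int_{\Omega_\varepsilon} \bigl[ W(\nabla u) - \varepsilon^2 f \cdot u \bigr] \, dx = \varepsilon \int_{\Omega} \bigl[ W_\varepsilon(\nabla v) - \varepsilon^2 f \cdot v \bigr] \, dx \]

The asymptotic behavior of the minimizers of
these functionals can be obtained from the knowledge
of the $\Gamma$-limit of the functionals
$F_\varepsilon : L^2(\Omega; \mathbb{R}^3) \to [0, +\infty]$ defined by

\[ F_\varepsilon(v) := \begin{cases} \frac{1}{\varepsilon^2} \int_{\Omega} W_\varepsilon(\nabla v) \, dx & \text{if } v \in W^{1,2}(\Omega; \mathbb{R}^3) \\ +\infty & \text{otherwise} \end{cases} \]

Let us fix a sequence $\varepsilon_k \to 0$. The $\Gamma$-limit of $F_{\varepsilon_k}$
turns out to be finite on the set $\Sigma(S; \mathbb{R}^3)$ of all
isometric embeddings of $S$ into $\mathbb{R}^3$ of class $W^{2,2}$; that
is, $v \in \Sigma(S; \mathbb{R}^3)$ if and only if $v \in W^{2,2}(S; \mathbb{R}^3)$ and
$(\nabla v)^T \nabla v = I$ a.e. on $S$. The elements of $\Sigma(S; \mathbb{R}^3)$ will
be often regarded as maps from $\Omega$ into $\mathbb{R}^3$,
independent of $x_3$.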

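To describe the $\Gamma$-limit, we introduce the quadratic
form $Q_3$ defined on $\mathbb{R}^{3 \times 3}$ by

\[ Q_3(F) := \tfrac{1}{2} D^2 W(I)[F, F] \]

which is the density of the linearized energy for the
three-dimensional problem, and the quadratic form $Q_2$
defined on the space of symmetric $2 \times 2$ matrices by

\[ Q_2 \begin{pmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{pmatrix} := \min_{(b_1, b_2, b_3) \in \mathbb{R}^3} Q_3 \begin{pmatrix} a_{11} & a_{12} & b_1 \\ a_{12} & a_{22} & b_2 \\ b_1 & b_2 & b_3 \end{pmatrix} \]

The $\Gamma$-limit of $F_{\varepsilon_k}$ is the functional
$F : L^2(\Omega; \mathbb{R}^3) \to [0, +\infty]$ defined by

\[ F(v) := \begin{cases} \frac{1}{12} \int_{S} Q_2(A) \, dx_1 \, dx_2 & \text{if } v \in \Sigma(S; \mathbb{R}^3) \\ +\infty & \text{otherwise} \end{cases} \]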
where $A(x_1, x_2)$ denotes the second fundamental
form of $v$; that is,

\[ A_{ij} := -D_i D_j v \cdot \nu \]   [18]

with normal vector $\nu := D_1 v \wedge D_2 v$.

The equicoerciveness of the functionals $F_\varepsilon$ in
$L^2(\Omega; \mathbb{R}^3)$ is not trivial for this problem: it follows
from [16] through a very deep geometric rigidity
estimate which generalizes Korn’s inequality
(see Friesecke et al. (2002)). The basic properties of
$\Gamma$-convergence imply that

\[ \min_{u \in W^{1,2}(\Omega_\varepsilon; \mathbb{R}^3)} \int_{\Omega_\varepsilon} \bigl[ W(\nabla u) - \varepsilon^2 f \cdot u \bigr] \, dx = \varepsilon^3 \min_{v \in \Sigma(S; \mathbb{R}^3)} \int_{S} \Bigl[ \tfrac{1}{12} Q_2(A) - f \cdot v \Bigr] \, dx' + o(\varepsilon^3) \]

with $x' := (x_1, x_2)$ and $A$ defined by [18].

For every $\varepsilon > 0$ let $u_\varepsilon$ be a minimizer of [17] and let
$v_\varepsilon(x_1, x_2, x_3) := u_\varepsilon(x_1, x_2, \varepsilon x_3)$. Then the basic properties
of $\Gamma$-convergence imply that there exists a sequence
$\varepsilon_k \to 0$ such that $v_{\varepsilon_k}(x_1, x_2, x_3)$ converges in $L^2(\Omega; \mathbb{R}^3)$
to a solution $v(x_1, x_2)$ of the minimum problem

\[ \min_{v \in \Sigma(S; \mathbb{R}^3)} \int_{S} \Bigl[ \tfrac{1}{12} Q_2(A) - f \cdot v \Bigr] \, dx' \]   [19]

These results provide a sound mathematical justification
of the reduced two-dimensional theory of plates based
on the minimum problem [19].

Similar results have been proved for shells, 
membranes, rods, and strings. 


Phase Transition Problems 


The Cahn-Hilliard gradient theory of phase transitions
deals with a fluid with mass $m$, under
isothermal conditions, confined in a bounded open
subset $\Omega$ of $\mathbb{R}^n$ with smooth boundary, whose Gibbs
free energy, per unit volume, is a prescribed function
$W$ of the density distribution $u$. Given a small
parameter $\varepsilon > 0$, the energy functional $F_\varepsilon : L^1(\Omega) \to
[0, +\infty]$ has the form

\[ F_\varepsilon(u) := \begin{cases} \int_{\Omega} \bigl[ W(u) + \varepsilon^2 |\nabla u|^2 \bigr] \, dx & \text{if } u \in A(m) \\ +\infty & \text{otherwise} \end{cases} \]   [20]

where $A(m)$ is the set of all functions $u \in W^{1,2}(\Omega)$
with $\int_{\Omega} u \, dx = m$.

We assume that $W : \mathbb{R} \to [0, +\infty)$ is continuous
and that there exist $\alpha, \beta \in \mathbb{R}$, with $\alpha |\Omega| < m < \beta |\Omega|$,
such that $W(t) = 0$ if and only if $t = \alpha$ or $t = \beta$.
Moreover, we assume that $W(t) \to +\infty$ as $t \to \pm\infty$.
In the minimization of $F_\varepsilon$, the Gibbs free energy
$W(u)$ favors the functions whose values are close to $\alpha$
and $\beta$, which represent the pure phases, while the
gradient term penalizes the transitions between
different phases.

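It is easy to see that for every sequence $\varepsilon_k \to 0$
the sequence $F_{\varepsilon_k}$ $\Gamma$-converges to the functional
$F : L^1(\Omega) \to [0,+\infty]$ defined by

\[
F(u) :=
\begin{cases}
\int_\Omega W(u)\,dx & \text{if } \int_\Omega u\,dx = m \\
+\infty & \text{otherwise}
\end{cases}
\]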


The set $M(\alpha,\beta,m)$ of minimum points of $F$ is
composed of all measurable functions $u$ on $\Omega$ which
take only the values $\alpha$ and $\beta$ (on $E_\alpha$ and $E_\beta$,
respectively), and satisfy the mass constraint
$\alpha|E_\alpha| + \beta|E_\beta| = m$, which is equivalent to

\[
|E_\alpha| = \frac{\beta|\Omega| - m}{\beta - \alpha} \qquad [21]
\]
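The mass constraint can be checked with a few lines of arithmetic. The following is only a minimal sketch: the function name `phase_volumes` and the sample values are illustrative assumptions, not part of the text. It solves the two linear relations $|E_\alpha| + |E_\beta| = |\Omega|$ and $\alpha|E_\alpha| + \beta|E_\beta| = m$ for the volumes of the two phases.

```python
def phase_volumes(alpha, beta, omega_volume, mass):
    """Return (|E_alpha|, |E_beta|) for a two-phase function with total mass `mass`.

    Hypothetical helper: it only encodes the volume identity stated above.
    """
    # The text assumes alpha*|Omega| < m < beta*|Omega|, so both phases are present.
    if not alpha * omega_volume < mass < beta * omega_volume:
        raise ValueError("need alpha*|Omega| < m < beta*|Omega|")
    vol_alpha = (beta * omega_volume - mass) / (beta - alpha)
    vol_beta = omega_volume - vol_alpha
    return vol_alpha, vol_beta


# Sample values (assumed): phases 0 and 1 in a unit-volume container, mass 0.3.
va, vb = phase_volumes(alpha=0.0, beta=1.0, omega_volume=1.0, mass=0.3)
print(va, vb)
```

With these sample values the denser phase occupies 30% of the container, and the pair of volumes reproduces the prescribed mass.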
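From the basic properties of $\Gamma$-convergence, we
deduce that

\[
\min_{u \in A(m)} \int_\Omega \bigl[ W(u) + \varepsilon^2 |\nabla u|^2 \bigr]\,dx \to 0 \qquad [22]
\]

and that there exists a sequence $\varepsilon_k \to 0$ such that the
minimizers $u_{\varepsilon_k}$ of $F_{\varepsilon_k}$ converge in $L^1(\Omega)$ to a
function $u$ which takes only the values $\alpha$ and $\beta$
and satisfies [21].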
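This result can be improved by considering the
rescaled functionals

\[
G_\varepsilon(u) := \frac{1}{\varepsilon}\, F_\varepsilon(u) \qquad [23]
\]

where $F_\varepsilon$ is defined by [20]. Then for every
sequence $\varepsilon_k \to 0$ the sequence $G_{\varepsilon_k}$ $\Gamma$-converges to
the functional $G : L^1(\Omega) \to [0,+\infty]$ defined by

\[
G(u) :=
\begin{cases}
c_W\, P(E_\alpha, \Omega) & \text{if } u \in M(\alpha,\beta,m) \\
+\infty & \text{otherwise}
\end{cases}
\]

where

\[
c_W := 2 \int_\alpha^\beta \sqrt{W(t)}\,dt
\]

and

\[
P(E,\Omega) := \sup \Bigl\{ \int_E \operatorname{div}\varphi\,dx : \varphi \in C^1_c(\Omega;\mathbb{R}^n),\ |\varphi| \le 1 \Bigr\}
\]

is the Caccioppoli-De Giorgi perimeter of $E$ in $\Omega$,
which coincides with the $(n-1)$-dimensional measure
of $\Omega \cap \partial E$ when $E$ is smooth enough.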
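For a concrete double-well potential, the constant multiplying the perimeter in the limit functional, $2\int_\alpha^\beta \sqrt{W(t)}\,dt$, can be evaluated numerically. The sketch below is illustrative only: the potential $W(t) = (1-t^2)^2$ with wells at $\alpha=-1$, $\beta=1$, and the helper names `simpson` and `interface_constant`, are assumptions made for the example.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n subintervals (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def interface_constant(W, alpha, beta):
    """Approximate 2 * integral of sqrt(W) between the two wells alpha < beta."""
    return 2.0 * simpson(lambda t: math.sqrt(max(W(t), 0.0)), alpha, beta)


def double_well(t):
    """Assumed example potential, vanishing exactly at t = -1 and t = 1."""
    return (1.0 - t * t) ** 2


c = interface_constant(double_well, -1.0, 1.0)
print(c)  # analytically 2 * (4/3) = 8/3 for this potential
```

For this potential $\sqrt{W(t)} = 1 - t^2$ on $[-1,1]$, so the integral is $4/3$ and the constant is $8/3$; the numerical value can be compared against that.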

Note that the effective domain $A(m)$ of the
functionals $G_\varepsilon$ is disjoint from the effective domain
of the limit functional $G$, which is the set of all
functions $u \in M(\alpha,\beta,m)$ with $P(E_\alpha,\Omega) < +\infty$.


As the functionals [20] and [23] have the same
minimizers, we deduce that there exists a sequence
$\varepsilon_k \to 0$ such that the minimizers $u_{\varepsilon_k}$ of
$F_{\varepsilon_k}$ converge in $L^1(\Omega)$ to a function $u$ which
takes only the values $\alpha$ and $\beta$, satisfies [21], and
fulfills the minimal interface criterion

\[
P(E_\alpha, \Omega) \le P(E, \Omega)
\]

for every measurable set $E \subset \Omega$ with $|E| = |E_\alpha|$.
Moreover, [22] can be improved, and we obtain

\[
\min_{u \in A(m)} F_\varepsilon(u) = \varepsilon\, c_W\, P(E_\alpha, \Omega) + o(\varepsilon)
\]

where $c_W$ is the constant appearing in the definition of $G$.
Similar results have been proved when the term
$|\nabla u|^2$ in [20] is replaced by a general quadratic form
like [8], which leads to an anisotropic notion of
perimeter.


Free-Discontinuity Problems 


Free-discontinuity problems are minimum problems 
for functionals composed of two terms of different 
nature: a bulk energy, typically given by a volume 
integral depending on the gradient of an unknown 
function u; and a surface energy, given by an 
integral on the unknown discontinuity surface of u. 
These problems arise in many different fields of 
science and technology, such as liquid crystals, 
fracture mechanics, and computer vision. 

The prototype of free-discontinuity problems is 
the minimum problem proposed by David Mumford 
and Jayant Shah: 


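\[
\min_{(u,K) \in \mathcal{A}} \Bigl\{ \int_{\Omega \setminus K} |\nabla u|^2\,dx + \mathcal{H}^{n-1}(K \cap \Omega) + \int_{\Omega \setminus K} |u - g|^2\,dx \Bigr\} \qquad [24]
\]

where $\Omega$ is a bounded open subset of $\mathbb{R}^n$, $\mathcal{H}^{n-1}$
denotes the $(n-1)$-dimensional Hausdorff measure,
$g \in L^\infty(\Omega)$, and $\mathcal{A}$ is the set of all pairs $(u,K)$ with $K$
compact, $K \subset \mathbb{R}^n$, and $u \in C^1(\Omega \setminus K)$.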

In the applications to image segmentation problems 
the dimension n is 2 and the function g represents the
grey level of an image. Given a solution (u, K) of the 
minimum problem [24], the set K is interpreted as 
the set of the relevant boundaries of the objects in 
the image, while u provides a smoothed version of the 
image. The first term in [24] has a regularizing effect, 
the purpose of the second term is to avoid over- 
segmentation, while the last term, called “fidelity term,” 
forces u to be close to g. Of course, in the applications 
these terms are multiplied by different coefficients, 
whose relative values are very important for image 
segmentation problems, since they determine the
strength of the effect of each term. However, the 
mathematical analysis of the problem can be easily 
reduced to the case where all coefficients are equal to 1. 

To solve [24], it is convenient to introduce a weak 
formulation of the problem based on the space 
$GSBV(\Omega)$ of generalized special functions with
bounded variation (see Ambrosio et al. (2000)).
Without entering into details, here it is enough to
say that every $u \in GSBV(\Omega)$ has, at almost every
point, an approximate gradient $\nabla u$ in the sense of
geometric measure theory. This is a measurable map
from $\Omega$ into $\mathbb{R}^n$ which coincides with the usual
gradient in the sense of distributions on every open
subset $U$ of $\Omega$ such that $u \in W^{1,1}(U)$.

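The functional $F : L^1(\Omega) \to [0,+\infty]$ used for the
weak formulation of [24] is defined by

\[
F(u) :=
\begin{cases}
\int_\Omega |\nabla u|^2\,dx + \mathcal{H}^{n-1}(J_u) & \text{if } u \in GSBV(\Omega) \\
+\infty & \text{otherwise}
\end{cases}
\qquad [25]
\]

where $J_u$ is the jump set of $u$, defined in a measure-theoretical
way as the set of points $x \in \Omega$ such that

\[
\limsup_{\rho \to 0} \frac{1}{|B(x,\rho)|} \int_{B(x,\rho)} |u(y) - a|\,dy > 0
\]

for every $a \in \mathbb{R}$.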

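For every $g \in L^\infty(\Omega)$, the functional

\[
F(u) + \int_\Omega |u - g|^2\,dx
\]

is lower semicontinuous and coercive on $L^1(\Omega)$;
therefore, the minimum problem

\[
\min_{u \in GSBV(\Omega)} \Bigl\{ F(u) + \int_\Omega |u - g|^2\,dx \Bigr\} \qquad [26]
\]

has a solution. The connection with the Mumford-Shah
problem is given by the following regularity
result, proved by Ennio De Giorgi and his collaborators:
if $u$ is a solution of [26] and $\overline{J}_u$ is the
closure of its jump set $J_u$, then
$\mathcal{H}^{n-1}(\Omega \cap (\overline{J}_u \setminus J_u)) = 0$,
$u \in C^1(\Omega \setminus \overline{J}_u)$,
and $(u, \overline{J}_u)$ is a solution of [24].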

Since the numerical treatment of [24] and [26] is 
quite difficult, $\Gamma$-convergence has been used to
approximate [26] by means of minimum problems 
for integral functionals, whose minimizers can be 
obtained by standard numerical techniques. 


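Let us consider the nonlocal functionals
$F_\varepsilon : L^1(\Omega) \to [0,+\infty]$ defined by

\[
F_\varepsilon(u) :=
\begin{cases}
\frac{1}{\varepsilon} \int_\Omega f\bigl( \varepsilon\, \mathrm{Av}(|\nabla u|^2, x, \varepsilon) \bigr)\,dx & \text{if } u \in W^{1,2}(\Omega) \\
+\infty & \text{otherwise}
\end{cases}
\]

where

\[
\mathrm{Av}(|\nabla u|^2, x, \varepsilon) := \frac{1}{|B(x,\varepsilon) \cap \Omega|} \int_{B(x,\varepsilon) \cap \Omega} |\nabla u(y)|^2\,dy
\]

and $f : [0,+\infty) \to [0,+\infty)$ is any increasing continuous
function with $f(0) = 0$, $f'(0) = 1$, and $f(t) \to 1/2$
as $t \to +\infty$. Then for every sequence $\varepsilon_k \to 0$ the
sequence $F_{\varepsilon_k}$ $\Gamma$-converges to the functional $F$ defined in [25].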

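Given $g \in L^\infty(\Omega)$, for every $\varepsilon > 0$ let $u_\varepsilon$ be a
solution of the minimum problem

\[
\min_{u \in W^{1,2}(\Omega)} \Bigl\{ \frac{1}{\varepsilon} \int_\Omega f\bigl( \varepsilon\, \mathrm{Av}(|\nabla u|^2, x, \varepsilon) \bigr)\,dx + \int_\Omega |u - g|^2\,dx \Bigr\}
\]

From the basic properties of $\Gamma$-convergence it
follows that there exists a sequence $\varepsilon_k \to 0$ such
that $u_{\varepsilon_k}$ converges in $L^1(\Omega)$ to a solution $u$ of [26],
so that $(u, \overline{J}_u)$ is a solution of [24].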

Other approximations by nonlocal functionals use 
finite differences instead of averages of gradients. 

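A different approximation can be obtained by
using the local functionals $G_\varepsilon : (L^1(\Omega))^2 \to [0,+\infty]$
defined by

\[
G_\varepsilon(u,v) :=
\begin{cases}
\int_\Omega \Bigl[ \psi_\varepsilon(v)\,|\nabla u|^2 + \frac{\varepsilon}{2} |\nabla v|^2 + \frac{1}{2\varepsilon}\, h(v) \Bigr]\,dx & \text{if } (u,v) \in (W^{1,2}(\Omega))^2 \\
+\infty & \text{otherwise}
\end{cases}
\]

where $\psi_\varepsilon(t) := \eta_\varepsilon + t^2$, $0 < \eta_\varepsilon \ll \varepsilon$, and
$h(t) := (1-t)^2$ for $0 \le t \le 1$, while $h(t) := +\infty$ otherwise. Let
$G : (L^1(\Omega))^2 \to [0,+\infty]$ be the functional defined by

\[
G(u,v) :=
\begin{cases}
F(u) & \text{if } v = 1 \text{ a.e. on } \Omega \\
+\infty & \text{otherwise}
\end{cases}
\]

where $F$ is defined in [25]. Then for every sequence
$\varepsilon_k \to 0$ the sequence $G_{\varepsilon_k}$ $\Gamma$-converges to $G$.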

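Given $g \in L^\infty(\Omega)$, for every $\varepsilon > 0$ let
$(u_\varepsilon, v_\varepsilon)$ be a solution of the minimum problem

\[
\min_{(u,v) \in (W^{1,2}(\Omega))^2} \int_\Omega \Bigl[ \psi_\varepsilon(v)\,|\nabla u|^2 + \frac{\varepsilon}{2} |\nabla v|^2 + \frac{1}{2\varepsilon}\, h(v) + |u - g|^2 \Bigr]\,dx \qquad [27]
\]

From the basic properties of $\Gamma$-convergence it
follows that there exists a sequence $\varepsilon_k \to 0$ such
that $u_{\varepsilon_k}$ converges in $L^1(\Omega)$ to a solution $u$ of [26],
so that $(u, \overline{J}_u)$ is a solution of [24].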

The approximation of the solutions of [24] based 
on [27] has been used to construct numerical 
algorithms for image segmentation. 
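To see why an approximation of this phase-field type prefers to mark an edge rather than smooth across it, one can evaluate a one-dimensional discretization of an energy of the form appearing in [27], namely $(\eta + v^2)|u'|^2 + \frac{\varepsilon}{2}|v'|^2 + \frac{1}{2\varepsilon}(1-v)^2 + |u-g|^2$, on a step signal. The sketch below is not an algorithm from the literature cited here: the grid, the parameter values, the function name `at_energy`, and the exponential edge profile for $v$ are all assumptions chosen for illustration.

```python
import math


def at_energy(u, v, g, h, eps, eta):
    """Discrete 1D phase-field energy on a uniform grid with spacing h.

    Each cell [x_i, x_{i+1}] uses a difference quotient for the derivatives and
    the average of the two nodal values for v, u and g. Values of v are assumed
    to lie in [0, 1], where the penalty (1 - v)^2 is finite.
    """
    total = 0.0
    for i in range(len(u) - 1):
        du = (u[i + 1] - u[i]) / h
        dv = (v[i + 1] - v[i]) / h
        vm = 0.5 * (v[i] + v[i + 1])
        um = 0.5 * (u[i] + u[i + 1])
        gm = 0.5 * (g[i] + g[i + 1])
        total += h * ((eta + vm * vm) * du * du
                      + 0.5 * eps * dv * dv
                      + (1.0 - vm) ** 2 / (2.0 * eps)
                      + (um - gm) ** 2)
    return total


# Assumed test data: a unit step on [0, 1], located between nodes 100 and 101.
n, eps, eta = 200, 0.05, 1e-4
h = 1.0 / n
x = [i * h for i in range(n + 1)]
g = [0.0 if i <= n // 2 else 1.0 for i in range(n + 1)]
x_jump = (n // 2 + 0.5) * h

v_flat = [1.0] * (n + 1)                                         # no edge marked
v_edge = [1.0 - math.exp(-abs(xi - x_jump) / eps) for xi in x]   # edge marked at the step

e_flat = at_energy(g, v_flat, g, h, eps, eta)
e_edge = at_energy(g, v_edge, g, h, eps, eta)
print(e_flat, e_edge)
```

With $u = g$ in both cases, keeping $v \equiv 1$ pays the full gradient cost of the step, which grows like $1/h$, whereas letting $v$ dip towards zero near the step costs an amount of order one, in line with the surface term of the limit functional.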


Free discontinuity problems similar to [24] appear 
in the mathematical treatment of Griffith’s model in 
fracture mechanics. In this case, u is a vector-valued 
function, which represents the deformation of an 
elastic body, the first term in [24] is replaced by a 
more general integral functional which represents 
the energy stored in the elastic region $\Omega \setminus K$, while the
second term is interpreted as the energy dissipated to 
produce the crack K. An approximation based on 
minimum problems similar to [27] has been used to 
construct numerical algorithms to study the process 
of crack growth in brittle materials. 

An important line of research, connected with these
problems, has been developed in recent years to
derive the macroscopic theories of fracture
mechanics from the microscopic theories of interatomic
interactions. Using $\Gamma$-convergence, some
theories expressed in the language of continuum
mechanics can be obtained as limits of discrete
variational models on lattices, as the distance
between neighboring points tends to zero.


See also: Convex Analysis and Duality Methods; Elliptic 
Differential Equations: Linear Theory; Free Interfaces 
and Free Discontinuities: Variational Problems; 
Geometric Measure Theory; Image Processing: 
Mathematics; Variational Techniques for Ginzburg- 
Landau Energies; Variational Techniques for 
Microstructures. 
Introduction


Poincaré duality is fundamental in the study of
manifolds. In the case of an orientable closed
manifold X of dimension n, this duality appears as an isomorphism

\[
w : H^k(X;\mathbb{Z}) \to H_{n-k}(X;\mathbb{Z})
\]

between integral cohomology and homology. The
map $w$ is defined by cap product with a chosen
orientation class. This article focuses on dimension
$n = 4$, where Poincaré duality induces a bilinear
form $Q$ on $H^2(X;\mathbb{Z})$ by use of the Kronecker pairing

\[
Q(a,b) = \langle w(a), b \rangle \in \mathbb{Z}
\]


One of the outstanding achievements of modern 
topology, the classification of simply connected 
topological 4-manifolds by Freedman (1982), can 
be phrased in terms of the intersection pairing $Q$.
Indeed, two simply connected differentiable
4-manifolds X and X' are orientation preservingly
homeomorphic if and only if the associated
pairings $Q$ and $Q'$ are equivalent. Freedman's
classification scheme has been extended to also 
cover a wide range of fundamental groups, 
resulting in a fair understanding of topological 
4-manifolds (Freedman and Quinn 1990). 

When it comes to differentiable 4-manifolds, the 
situation changes drastically. On the one hand, there is 
an abundance of topological 4-manifolds which do not 
admit a differentiable structure at all. On the other 
hand, there also are topological 4-manifolds support- 
ing infinitely many distinct differentiable structures. 
A classification of differentiable 4-manifolds up to 
differentiable equivalence seems out of reach of 
current technology, even in the most simple cases. 

The discrepancy between topological and differen- 
tiable 4-manifolds was uncovered by gauge-theoretic 
methods, applying the concepts of instantons and of 
monopoles. In order to study these, one has to equip a 
4-manifold both with a Riemannian metric and some 


additional structure: a Hermitian rank-2 bundle in
the case of instantons and a spin$^c$-structure in the case
of monopoles. Given such data, instantons and
monopoles arise as solutions to partial differential
equations whose gauge equivalence classes form
finite-dimensional moduli spaces. As it turns out,
these moduli spaces encode significant information 
about the differentiable structures of the underlying 
4-manifolds. 

A decoding of such information contained in the 
instanton moduli and in the monopole moduli is 
achieved through Donaldson invariants and Seiberg- 
Witten invariants, respectively. This article outlines 
these theories from a mathematical point of view. 


Instantons and Donaldson Invariants 


Let X denote a closed, connected, oriented differentiable
Riemannian 4-manifold. We will consider a
principal bundle P over X with fiber a compact Lie
group G with Lie algebra $\mathfrak{g}$. Connections on P form
an infinite-dimensional affine space
$\mathcal{A}(P) = A_0 + \Omega^1(X;\mathfrak{g}_P)$ modeled on the vector
space of 1-forms with values in the adjoint bundle

\[
\mathfrak{g}_P = P \times_{\mathrm{Ad}(G)} \mathfrak{g}
\]

The curvature $F_A \in \Omega^2(X;\mathfrak{g}_P)$ of a connection A is a
$\mathfrak{g}_P$-valued 2-form satisfying the Bianchi identity
$D_A F_A = 0$. The group $\mathcal{G}$ of principal bundle automorphisms
of P acts in a natural way on the space
of connections with quotient space

\[
\mathcal{B}(P) = \mathcal{A}(P)/\mathcal{G}
\]

The Yang-Mills functional

\[
YM : \mathcal{A}(P) \to \mathbb{R}_{\ge 0}
\]

associates to a connection A the norm square

\[
\|F_A\|^2 = -\int_X \mathrm{tr}(F_A \wedge *F_A)
\]

of its curvature. Here $*$ denotes the Hodge star
operator defined by the metric on X and the
orientation. The metric $-\mathrm{tr} : \mathfrak{g} \otimes \mathfrak{g} \to \mathbb{R}$ is
$\mathrm{Ad}(G)$-invariant and hence YM is invariant under the
action of $\mathcal{G}$. In particular, the Yang-Mills functional
descends to a function on the space $\mathcal{B}(P)$ of gauge
equivalence classes of connections.

The Euler-Lagrange equations for the critical
points of YM, called Yang-Mills equations, are of
the form

\[
D_A(*F_A) = 0
\]

and can be derived easily from the formula

\[
F_{A+a} = F_A + D_A(a) + [a \wedge a]
\]

Satisfying the equations

\[
D_A(*F_A) = 0 \quad \text{and} \quad D_A(F_A) = 0
\]

a Yang-Mills connection is characterized by the fact
that its curvature is harmonic with respect to the
Laplacian defined by the connection itself.

The bundle $\Lambda^2 T^*X$ of 2-forms on X decomposes
into $(\pm 1)$-eigenbundles of the Hodge operator. This
orthogonal splitting leads to a decomposition of
curvature forms

\[
F_A = F_A^+ + F_A^-
\]

into self-dual and anti-self-dual components. The
differential form $-(1/4\pi^2)\,\mathrm{tr}(F_A \wedge F_A)$ represents a
characteristic class of the principal bundle P. In
particular, the integral

\[
\kappa(P) = -\int_X \mathrm{tr}(F_A \wedge F_A) = \|F_A^+\|^2 - \|F_A^-\|^2
\]

is independent of the connection A. The Yang-Mills
functional therefore is bounded from below,

\[
YM(A) = \|F_A^+\|^2 + \|F_A^-\|^2 \ge |\kappa(P)|
\]

and attains this minimum at connections A which
satisfy the equation

\[
*F_A = \pm F_A
\]

Such connections are self-dual, anti-self-dual,
or both, that is, flat, depending on whether $\kappa(P)$ is
positive, negative, or zero. The moduli space of
instantons on P is the subset of minima of the
Yang-Mills functional

\[
M(P) = YM^{-1}(|\kappa(P)|) \subset \mathcal{B}(P)
\]

The moduli space thus consists of gauge equivalence
classes of connections which are either self-dual or
anti-self-dual. Donaldson theory indeed considers
anti-self-dual connections on principal bundles with
structure group PU(2) = SO(3).
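The linear algebra behind the splitting into self-dual and anti-self-dual parts can be checked on flat $\mathbb{R}^4$ with real-valued 2-forms. The following sketch is only an illustration under stated assumptions: 2-forms are stored as coefficient lists in the basis $e_{12}, e_{13}, e_{14}, e_{23}, e_{24}, e_{34}$, the orientation is $e_1 \wedge e_2 \wedge e_3 \wedge e_4$, and the helper names are hypothetical. It verifies that the Hodge star is an involution and that $F \wedge F = (|F^+|^2 - |F^-|^2)\,\mathrm{vol}$, the pointwise identity underlying the formula for $\kappa(P)$ (there the positive pairing on Lie-algebra-valued forms is $-\mathrm{tr}$).

```python
def star(F):
    """Hodge star on 2-forms of Euclidean R^4 in the basis e12, e13, e14, e23, e24, e34."""
    f12, f13, f14, f23, f24, f34 = F
    # *e12 = e34, *e13 = -e24, *e14 = e23, *e23 = e14, *e24 = -e13, *e34 = e12
    return [f34, -f24, f23, f14, -f13, f12]


def split(F):
    """Return the self-dual and anti-self-dual parts (F + *F)/2 and (F - *F)/2."""
    sF = star(F)
    plus = [(a + b) / 2 for a, b in zip(F, sF)]
    minus = [(a - b) / 2 for a, b in zip(F, sF)]
    return plus, minus


def norm_sq(F):
    """Pointwise norm square with respect to the orthonormal basis above."""
    return sum(a * a for a in F)


def wedge_square(F):
    """Coefficient of e1^e2^e3^e4 in F ^ F."""
    f12, f13, f14, f23, f24, f34 = F
    return 2 * (f12 * f34 - f13 * f24 + f14 * f23)


F = [1.0, -2.0, 0.5, 3.0, 4.0, -1.5]   # arbitrary sample coefficients
Fp, Fm = split(F)
print(wedge_square(F), norm_sq(Fp) - norm_sq(Fm), norm_sq(Fp) + norm_sq(Fm))
```

The last printed number is $|F|^2$, which is at least the absolute value of the first two, mirroring the lower bound for the Yang-Mills functional.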

The Hodge $*$ operator induces a decomposition of
the second cohomology

\[
H^2(X) = H^2_+(X) \oplus H^2_-(X)
\]

into $(\pm 1)$-eigenspaces of dimension $b^+$ and $b^-$.
Unless specified differently, cohomology groups are
meant with real coefficients. In order to simplify the
exposition, we will assume X to be simply connected.
The Donaldson invariants then are defined if
$b^+$ is odd and greater than 1.

A “homology orientation” consists of an orientation
of $H^2_+(X)$ and an integral homology class
$c \in H_2(X;\mathbb{Z})$. The Donaldson invariant $D_{X,c} = D_c$
is defined after fixing such a homology orientation.
It is a linear function

\[
D_c : A(X) \to \mathbb{R}
\]

where $A(X)$ is the graded algebra

\[
A(X) = \mathrm{Sym}_*(H_0(X) \oplus H_2(X))
\]

in which $H_i(X)$ has degree $(1/2)(4 - i)$. The significance
of $D_c$ is its functoriality

\[
D_{X', f_*c}(f_*(a)) = D_{X,c}(a)
\]

under diffeomorphisms $f : X \to X'$ which preserve
both orientation and homology orientation. Switching
the orientation of $H^2_+(X;\mathbb{R})$ reverses the sign of
$D_c$. Similarly,

\[
D_{c'} = (-1)^{((c - c')/2)^2}\, D_c
\]

if $c - c' \in 2H_2(X;\mathbb{Z}) \subset H_2(X;\mathbb{Z})$.
The construction of this invariant makes use of
the following facts:


1. An SO(3) principal bundle P over X is
determined by its first Pontrjagin number $p_1(P)$ and
its Stiefel-Whitney class $w_2(P) \in H^2(X;\mathbb{Z}/2)$. As X
is simply connected, this Stiefel-Whitney class
admits integer lifts. Let $c$ be such a lift and let $c^2$
be shorthand for the intersection pairing $Q(c,c)$.
A pair $(p_1, w_2)$ is realized by a principal bundle
provided it satisfies the relation $p_1 \equiv c^2$ modulo 4.

2. If $b^+$ is nonvanishing, then for generic metrics
on X, the moduli space $M(P)$ is a manifold of
dimension

\[
-2p_1(P) - 3(1 + b^+)
\]

This follows from a transversality theorem whose
main ingredient is the Sard-Smale theorem. The
dimension is computed by use of the Atiyah-Singer
index theorem: to an anti-self-dual connection A on
P there is an associated elliptic complex

\[
0 \to \Omega^0(X;\mathfrak{g}_P) \xrightarrow{\;D_A\;} \Omega^1(X;\mathfrak{g}_P) \xrightarrow{\;D_A^+\;} \Omega^2_+(X;\mathfrak{g}_P) \to 0
\]

where $\Omega^i(X;\mathfrak{g}_P)$ denotes $\mathfrak{g}_P$-valued i-forms on X.
This complex describes the tangential structure of
the moduli space at the equivalence class of A. The
space $\Omega^1(X;\mathfrak{g}_P)$ is the tangent space of $\mathcal{A}(P)$ at A,
$\Omega^0(X;\mathfrak{g}_P)$ is the tangent space of the group $\mathcal{G}$ at the
identity, and $D_A$ is the differential of the orbit map.
The differential operator $D_A^+$ is the linearization of
the anti-self-duality map

\[
a \mapsto F^+_{A+a} = D_A^+(a) + [a \wedge a]^+
\]


3. The moduli space $M(P)$ can be oriented if it is
a manifold. The orientation depends on an orientation
of $H^2_+(X)$ and on a U(2)-principal bundle which
has P as its PU(2)-quotient bundle. It is determined
by an integer lift of $w_2(P)$. The elliptic complex
above then can be compared with a corresponding
elliptic complex where the differentials are given by
a complex Dirac operator. This leads to an almost-complex
structure on the tangent space for each
point in the moduli space and in particular to an
orientation on the moduli space itself.

4. Over the product $M(P) \times X$ there is a universal
PU(2)-bundle $\mathbb{P}$ with first Pontrjagin class $p_1(\mathbb{P})$.
Taking slant product with the class $-(1/4)\,p_1(\mathbb{P})$
results in a homomorphism

\[
\mu : H_i(X) \to H^{4-i}(M(P))
\]


5. The moduli space $M(P)$ in general is noncompact.
There is an Uhlenbeck compactification $\overline{M}(P)$
describing “ideal instantons.” Such an ideal instanton
consists of an element $(x_1,\ldots,x_n) \in \mathrm{Sym}^n(X)$ and an
anti-self-dual connection $A'$ on the principal bundle
$P'$ on X with $w_2(P') = w_2(P)$ for which the equality

\[
p_1(P') - p_1(P) = 4n
\]

of Pontrjagin numbers holds. Uhlenbeck's compactness
theorem describes what happens if a sequence $(A_k)$
of anti-self-dual connections has no convergent
subsequence: after passing to a subsequence, the
sequence converges to an anti-self-dual connection
on the restriction of P to $X \setminus \{x_1,\ldots,x_n\}$. This limit
connection extends to a connection $A'$ on the
principal bundle $P'$. The functions $|F_{A_k}|^2$ on X
converge to the measure

\[
|F_{A'}|^2 + 8\pi^2 \sum_{i=1}^{n} \delta_{x_i}
\]

The compactification $\overline{M}(P)$ is a stratified space and
not usually a manifold. If $w_2(P) \ne 0$, then the
singular set is of codimension at least 2 and thus the
space $\overline{M}(P)$ carries a fundamental class. In the case
$w_2(P) = 0$, such a fundamental class in general can
only be defined if $-p_1(P) > 4 + 3b^+$. In practice,
this problem can be circumvented by blowing up X
and considering bundles with $w_2(P) \ne 0$ over the
connected sum $X \# \overline{\mathbb{CP}}^2$. Note that the complex
projective plane $\mathbb{CP}^2$ as a complex manifold carries
a natural orientation. The notation $\overline{\mathbb{CP}}^2$ indicates a
reversed orientation.

6. The classes $\mu(a) \in H^2(M(P))$ for $a \in H_2(X)$
extend over the compactification. The same holds for
the class $\mu(x)$, where $x \in H_0(X;\mathbb{Z})$ is the generator
corresponding to the orientation, as long as $w_2(P) \ne 0$.
Otherwise, there are certain dimension restrictions.
However, the same blow-up trick as mentioned above
allows one to handle the case $w_2(P) = 0$ as well.
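The numerical bookkeeping in facts 1 and 2 (the congruence $p_1 \equiv c^2$ modulo 4 and the dimension $-2p_1(P) - 3(1+b^+)$) can be sketched in a few lines. The function names below are hypothetical, and the sample values ($b^+ = 3$, the value for a K3 surface, with $c^2 = 0$) are chosen only for illustration.

```python
def moduli_dimension(p1, b_plus):
    """Expected dimension -2*p1 - 3*(1 + b_plus) of the instanton moduli space."""
    return -2 * p1 - 3 * (1 + b_plus)


def is_realizable(p1, c_squared):
    """Check the congruence p1 = c^2 (mod 4) for a pair (p1, w2) with integer lift c."""
    return (p1 - c_squared) % 4 == 0


def pontrjagin_number(d, b_plus):
    """Pontrjagin number for which the moduli space has dimension 2d.

    Solves 2d = -2*p1 - 3*(1 + b_plus); this needs b_plus odd so that p1 is an integer.
    """
    if b_plus % 2 == 0:
        raise ValueError("b_plus must be odd")
    return -d - (3 * (b_plus + 1)) // 2


# Sample values (assumed): b_plus = 3 and c^2 = 0.
print(moduli_dimension(-8, 3))    # 16 - 12
print(pontrjagin_number(2, 3))    # recovers p1 = -8 from d = 2
print(is_realizable(-8, 0), is_realizable(-6, 0))
```

The second function is simply the inverse of the first in the variable $p_1$, which is how the Pontrjagin number is expressed in terms of half the dimension when $b^+$ is odd.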


Now fix an element $c \in H_2(X;\mathbb{Z})$ and let

\[
M_c = \bigsqcup_{d} \overline{M}(P_{c,d})
\]

denote the disjoint union of all moduli spaces
of anti-self-dual connections on principal PU(2)-bundles
$P_{c,d}$ whose second Stiefel-Whitney class is
Poincaré-dual to $c$ modulo 2 and whose Pontrjagin
number equals $-d - (3/2)(b^+ + 1)$.

Our assumption of $b^+$ being odd corresponds
to the fact that the dimension $2d$ of the moduli
space $M(P_{c,d})$ is even, with $d$ congruent to
$-c^2 + (1/2)(1 + b^+)$ modulo 4. Neglecting the difficulties
in the case $w_2(P) = 0$ mentioned above, we may use
the cup product on $H^*(M_c)$ to extend $\mu$ to an
algebra homomorphism

\[
\mu : A(X) \to H^*(M_c)
\]

The Donaldson invariant $D_c$ is nonzero only on
elements $z$ of $A(X)$ whose total degree $d$ is congruent
to $-c^2 + (1/2)(1 + b^+)$ modulo 4. For such an
element it is defined by





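\[
D_c(z) = \langle \mu(z), [\overline{M}(P_{c,d})] \rangle = \int_{\overline{M}(P_{c,d})} \mu(z)
\]

The Donaldson series $\mathbb{D}_c$ is defined as a formal
power series

\[
\mathbb{D}_c(a) = D_c\bigl( (1 + x/2) \exp(a) \bigr) = \sum_{d \ge 0} \frac{D_c\bigl( (1 + x/2)\, a^d \bigr)}{d!}
\]

for $a \in H_2(X)$, where $x \in H_0(X;\mathbb{Z})$ denotes the generator.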


Computations and Structure Theorems 


The first results about these invariants are due to
S. Donaldson. He proved both a vanishing and a
nonvanishing theorem (Donaldson and Kronheimer
1990):

Theorem 1 If both $b^+(X) > 0$ and $b^+(Y) > 0$, then
all Donaldson invariants vanish for the connected
sum $X \# Y$.

Theorem 2 If $c$ represents a divisor on a complex
algebraic surface X and $a$ represents an ample
divisor, then

\[
D_c(a^r) \ne 0 \quad \text{for } r \gg 0
\]


The second theorem is a consequence of the fact that in
the case of an algebraic surface the instanton moduli
can be described in algebraic geometric terms: the
moduli space $M(P, g)$ associated to the metric $g$ induced
from the Fubini-Study metric on $\mathbb{CP}^n$ by an embedding
$X \to \mathbb{CP}^n$ carries the structure of a projective variety.
This variety is reduced and of complex dimension $d$, as
soon as $d$ is large enough. Furthermore, $\mu(a)$ is the first
Chern class of an ample line bundle.

The translation of instanton moduli into algebraic
geometry uses two steps: suppose the first Chern class
of a U(r)-principal bundle P on a Kähler surface is also
the first Chern class of a holomorphic line bundle. Then
the absolute minima of the Yang-Mills functional are
achieved by Hermite-Einstein connections. These are
connections for which the Ricci curvature is a constant
multiple of the identity. The second step, the translation
from differential geometry into algebraic geometry,
is called the Kobayashi-Hitchin correspondence,
which again was proved by Donaldson.

The Donaldson invariants have been computed
for a number of 4-manifolds. A simply connected
4-manifold is said to have simple type, if the relation

\[
D_c(x^2 z) = 4\, D_c(z)
\]

is satisfied by its Donaldson invariant for all $z \in A(X)$
and $c \in H_2(X;\mathbb{Z})$. It is known that this simple type
condition holds for many 4-manifolds. Indeed, it is an
open question whether there are 4-manifolds which
are not of simple type. For manifolds of simple type
the Donaldson series $\mathbb{D}_c$ completely determines the
Donaldson invariant $D_c$. A main result is due to
Kronheimer and Mrowka (1995):


Theorem 3 Let X be a simply connected 4-manifold
of simple type. Then, there exist finitely many basic
classes $\kappa_1,\ldots,\kappa_n \in H_2(X;\mathbb{Z})$ such that

\[
\mathbb{D}_c = \exp(Q/2) \sum_{i=1}^{n} (-1)^{(c^2 + Q(\kappa_i, c))/2}\, a_i \exp(\kappa_i)
\]

as analytic functions on $H_2(X)$. The numbers $a_i$ are
rational and each basic class $\kappa_i$ is characteristic, that
is, it satisfies $a^2 \equiv Q(a,\kappa_i)$ modulo 2 for all
$a \in H_2(X;\mathbb{Z})$. The homology class $\kappa_i$ in this formula
acts on an arbitrary homology class by intersection.


The geometric significance of the basic classes is 
underlined by the following theorem (Kronheimer 
and Mrowka 1995): 


Theorem 4 If $a \in H_2(X;\mathbb{Z})$ is represented by an
embedded surface of genus $g$ with self-intersection
$a^2 > 2$, then for each basic class $\kappa$ the following
adjunction inequality is satisfied:

\[
2g - 2 \ge a^2 + |Q(\kappa, a)|
\]


There are many 4-manifolds for which the Donaldson
series have been computed (Friedman and Morgan
1997). The basic classes for complete intersections, for
example, are the canonical divisor and its negative.
Another example is given by elliptic surfaces. Let
$E(n;p,q)$ be a minimal elliptic surface, that is, a
holomorphic surface admitting a holomorphic map to
$\mathbb{CP}^1$ with generic fiber $f$ an elliptic curve. For any
numbers $n$, $p$, and $q$ with $p < q$ coprime, there exists
such a simply connected elliptic surface with Euler
characteristic $12n$ and two multiple fibers of multiplicity
$p$ and $q$, respectively. The Donaldson series of
$E(n;p,q)$ for $c = 0$ then is given by

\[
\mathbb{D} = \exp(Q/2)\, \frac{(\sinh f)^n}{\sinh(f/p)\, \sinh(f/q)}
\]

Another important formula relates the Donaldson
series $\mathbb{D}$ of a manifold X of simple type and the
Donaldson series $\hat{\mathbb{D}}$ of the blow-up $X \# \overline{\mathbb{CP}}^2$:

\[
\hat{\mathbb{D}}_c = \mathbb{D}_c \cdot \exp(-e^2/2) \cosh(e)
\]
\[
\hat{\mathbb{D}}_{c+e} = -\mathbb{D}_c \cdot \exp(-e^2/2) \sinh(e)
\]

Here $e \in H_2(\overline{\mathbb{CP}}^2;\mathbb{Z})$ denotes a generator. Indeed, a
more general blow-up formula is known which
relates the Donaldson invariants for X and its
blow-up even in case X is not of simple type. This
formula, due to Fintushel and Stern (1996), involves
Weierstrass sigma-functions.

The instanton moduli space carries nontrivial
information about 4-manifolds even in the case
$b^+(X) \le 1$. However, one has to deal with singularities
in the moduli space. Let us first consider the
case $b^+(X) = 0$. If the intersection form on X is
negative definite, the instanton moduli spaces in
general are bound to have singularities. Indeed,
Donaldson examined the case with the Pontrjagin
number $p_1(P) = -4$ and $w_2(P) = 0$. In this case, the
moduli space for a generic metric on X will be an
orientable smooth manifold except at isolated
singular points. The singularities are cones over
$\mathbb{CP}^2$ and they correspond to reducible connections,
that is, reductions of the structure group of P to
U(1). These reductions are in bijective correspondence
to pairs $\pm a \in H_2(X;\mathbb{Z})$ with $a^2 = -1$. The
Uhlenbeck compactification of the moduli space
thus leads to an oriented cobordism between X and
the disjoint union $\bigsqcup \mathbb{CP}^2$ over all pairs $\pm a$ in
$H_2(X;\mathbb{Z})$ of square $-1$. As the signature of a
manifold is an invariant of oriented cobordism,
there have to be $b^-$ many pairs $\pm a$ of square $-1$ in
$H_2(X;\mathbb{Z})$ and, in particular, the intersection form $Q$
is represented by the negative of the identity matrix
(Donaldson 1983):


Theorem 5 The intersection form on a differenti- 
able manifold with negative-definite intersection 
form is diagonal. 


Indeed, from rank 8 on there are lots of definite 
unimodular forms which are not diagonal. By 
Freedman’s (1982) classification, any unimodular 
form is realized as the intersection form of a simply 
connected topological manifold. This theorem 
shows that most of these manifolds do not support 
differentiable structures. 
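The standard example of a definite unimodular form of rank 8 that is not diagonal is the $E_8$ form. The sketch below checks the relevant properties by exact arithmetic. It is an illustration only: the labeling of the Dynkin diagram (a chain of seven nodes with an eighth node attached to the third) and the helper names are assumptions of the example. The form is unimodular (determinant 1), positive definite (all elimination pivots positive), and even (all diagonal entries even), so it takes no value $\pm 1$ and cannot be equivalent to a diagonal unimodular form; its negative is then a negative-definite example.

```python
from fractions import Fraction


def e8_gram():
    """Gram matrix of the E8 form: 2 on the diagonal, -1 for each Dynkin edge."""
    # Assumed labeling: chain 0-1-2-3-4-5-6 with node 7 attached to node 2.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 7)]
    m = [[0] * 8 for _ in range(8)]
    for i in range(8):
        m[i][i] = 2
    for i, j in edges:
        m[i][j] = m[j][i] = -1
    return m


def pivots(matrix):
    """Pivots of Gaussian elimination without row exchanges, in exact arithmetic."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    out = []
    for k in range(n):
        p = a[k][k]
        if p == 0:
            raise ZeroDivisionError("zero pivot; elimination without row exchanges fails")
        out.append(p)
        for i in range(k + 1, n):
            factor = a[i][k] / p
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return out


gram = e8_gram()
piv = pivots(gram)
det = Fraction(1)
for p in piv:
    det *= p
is_even = all(gram[i][i] % 2 == 0 for i in range(8))
is_positive_definite = all(p > 0 for p in piv)
print(det, is_even, is_positive_definite)
```

For a symmetric integer matrix, an even diagonal forces $Q(x,x)$ to be even for every integer vector $x$, because the off-diagonal terms always appear twice.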

The case $b^+(X) = 1$ is also interesting. Here, the
moduli space is a smooth manifold for a generic
metric, giving rise to Donaldson invariants. However,
over a smooth path of metrics, there is in
general no smooth cobordism of moduli spaces. So
the invariants depend on the chosen metric. The
singularities in the cobordisms again correspond to
classes in $H_2(X;\mathbb{Z})$ with negative square. An
analysis of these singularities leads to wall-crossing
formulas describing how different choices of the
metric affect Donaldson invariants. The case of
$\mathbb{CP}^2$ is special, as there are no elements of negative
square in $H_2(\mathbb{CP}^2;\mathbb{Z})$. The Donaldson invariants for
$\mathbb{CP}^2$ as well as the wall-crossing formulas turn out
to be closely related to modular forms (Göttsche
2000).


Monopoles and Seiberg-Witten Invariants 


A spin$^c$-structure on an oriented Riemannian
4-manifold X is a $\mathrm{Spin}^c(4)$-principal bundle $\tilde{P}$ projecting
to the orthonormal tangent frame bundle P over X
through the group homomorphism $\mathrm{Spin}^c(4) \to SO(4)$
with kernel U(1). The group $H^2(X;\mathbb{Z})$ acts freely
and transitively on the set of all spin$^c$-structures.
A spin$^c$-connection is a lift to $\tilde{P}$ of the Levi-Civita
connection on P. Fixing a background spin$^c$-connection
$A_0$, the monopole map

\[
\mu : (A, \phi) \mapsto (D_A \phi,\ F_A^+ - \phi\phi^*,\ d^*a)
\]

is defined (Witten 1994) for spin$^c$-connections
$A = A_0 + a \in A_0 + \Omega^1(X;i\mathbb{R})$ and positive spinors $\phi$. Here, $D_A$
denotes the complex Dirac operator associated to A,
and $d^*a$ for $a \in \Omega^1(X;i\mathbb{R})$ is the image of $a$ under the
adjoint $d^*$ of the de Rham differential on forms. The section
$\phi\phi^*$ of the traceless endomorphism bundle of positive spinors is
viewed as a self-dual 2-form on X.
In case the first Betti number vanishes, this map,
after suitable Sobolev completion, becomes a map
between Hilbert spaces $\mu : \mathcal{A} \to \mathcal{C}$ which is a compact
deformation of a linear Fredholm map. The
Weitzenböck formula can be used to show that
preimages under $\mu$ of bounded sets in $\mathcal{C}$ are bounded
in $\mathcal{A}$. Furthermore, $\mu$ is U(1)-equivariant, where U(1)
acts by complex multiplication on spinors and
trivially on forms. If $b_1(X) > 0$, the monopole map
is a map between Hilbert space bundles over the
torus $H^1(X)/H^1(X;\mathbb{Z})$. These properties of the
monopole map allow for an interpretation in terms
of stable homotopy (Bauer 2004):


Theorem 6 If the first Betti number of X vanishes,
then $\mu$ defines an element

\[
[\mu] \in \pi_i^{U(1)}(S^0)
\]

in an equivariant stable homotopy group of spheres.
The index $i = \mathrm{ind}\, D_A - H^2_+(X)$ as an element of the
real representation ring $RO(U(1))$ is determined by
the analytic index of the linearization of $\mu$.


In the case $b^+(X) > 1$, these equivariant
stable homotopy groups can be identified with
nonequivariant stable cohomotopy groups
$\pi^{b^+ - 1}(\mathbb{CP}^{d-1})$. Here, $d$ denotes the index of the
complex Dirac operator $\mathrm{ind}\, D_A$. Fixing an orientation
of $H^2_+(X)$ results in a Hurewicz homomorphism

\[
h : \pi^{b^+ - 1}(\mathbb{CP}^{d-1}) \to H^{b^+ - 1}(\mathbb{CP}^{d-1};\mathbb{Z})
\]

If $b^+(X)$ is odd, the image

\[
h([\mu]) = SW(X)\, t^{(b^+ - 1)/2}
\]

is an integer multiple of a power of the generator
$t \in H^2(\mathbb{CP}^{d-1};\mathbb{Z})$. This integer $SW(X)$ is known as
the Seiberg-Witten invariant (Witten 1994).

This invariant alternatively can be defined by
considering the moduli space $M(a) = \mu^{-1}(a)$. Assuming
$b^+ > 0$, this is a smooth oriented manifold with a
free U(1)-action for generic $a \in \Omega^2_+(X;i\mathbb{R})$. The
Seiberg-Witten invariant is the characteristic
number obtainable by these data. In general, the
stable homotopy invariant $[\mu]$ encodes global information
about the monopole map, which cannot be
recovered by only considering the moduli space. In
case the spin$^c$-structure is associated to an almost-complex
structure, however, there is a fortunate
coincidence: the Hurewicz homomorphism in this
case is an isomorphism. So for almost-complex
spin$^c$-structures, the invariants $[\mu]$ and SW carry
the same information.

The Seiberg-Witten invariants turn out to be directly
computable for Kähler manifolds and to some degree
also for symplectic manifolds (Taubes 1994). Indeed,
the following theorem follows from arguments of
Witten and of Taubes:


Theorem 7 Let X be a 4-manifold with $b^+ > 1$ and
$b_1 = 0$ which can be equipped with a Kähler or a
symplectic structure. If $[\mu]$ is nonvanishing for a
spin$^c$-structure on X, then the spin$^c$-structure is
associated to an almost-complex structure. For the
canonical spin$^c$-structure on X the Seiberg-Witten
invariant is $\pm 1$.


Seiberg-Witten invariants and Donaldson invariants
are closely related: Witten gave physical
arguments that an equality of the form

\[
\mathbb{D} = 2^k \exp(Q/2) \cdot \sum_{a} SW(a) \exp(a)
\]

should hold for the Donaldson series for $c = 0$ of
a simply connected manifold of simple type. Here,
$a \in H^2(X;\mathbb{Z})$ denotes the first Chern class of the
complex determinant line bundle. This first Chern
class characterizes spin$^c$-structures in the simply
connected case. The number $k$ is related to the
signature $\sigma$ and the Euler characteristic $\chi$ of the
manifold X by the formula

\[
4k = 11\sigma + 7\chi + 8
\]

A mathematical proof of this formula is known in
special cases (Feehan and Leness 2003).
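The exponent $k$ is easy to tabulate. The sketch below assumes the normalization $4k = 11\sigma + 7\chi + 8$, which is the form in which Witten's relation is usually quoted, and evaluates it for the elliptic surfaces $E(n)$ without multiple fibers, using $\chi = 12n$ together with the signature $\sigma = -8n$ (a standard fact not stated in the text above). The function names are hypothetical.

```python
def witten_exponent(sigma, chi):
    """Return k with 4k = 11*sigma + 7*chi + 8 (assumed normalization)."""
    four_k = 11 * sigma + 7 * chi + 8
    if four_k % 4:
        raise ValueError("11*sigma + 7*chi + 8 is not divisible by 4")
    return four_k // 4


def elliptic_surface_numbers(n):
    """Signature and Euler characteristic of E(n): sigma = -8n, chi = 12n."""
    return -8 * n, 12 * n


# E(2) is the K3 surface; n >= 2 ensures b+ = 2n - 1 > 1.
k_values = {n: witten_exponent(*elliptic_surface_numbers(n)) for n in range(2, 6)}
print(k_values)
```

For these surfaces the result is $k = 2 - n$. In particular $k = 0$ for the K3 surface, which matches a Donaldson series equal to $\exp(Q/2)$ when the only nonvanishing Seiberg-Witten invariant is $SW(0) = 1$.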

As is the case for Donaldson invariants, the Seiberg-Witten
invariants vanish for connected sums $X \# Y$ if
both $b^+(X) > 0$ and $b^+(Y) > 0$ hold. This is not the
case for the stable homotopy refinement, as follows
from the following theorem (Bauer 2004).


Theorem 8 For a connected sum $X \# Y$ of
4-manifolds the stable equivariant homotopy
invariants are related by smash product

\[
[\mu_{X \# Y}] = [\mu_X] \wedge [\mu_Y]
\]


As an example application, consider connected sums of
elliptic surfaces of the form $E(2n;p,q)$. Now suppose X
and X' are each connected sums of at most four copies of
such elliptic surfaces. Then X and X' are diffeomorphic
if and only if the summands were already diffeomorphic.
This contrasts with the fact that the connected sum
$E(2n;p,q) \# \mathbb{CP}^2$ is diffeomorphic to a connected sum
of $4n$ copies of $\mathbb{CP}^2$ and $20n - 1$ copies of $\overline{\mathbb{CP}}^2$,
independently of $p$ and $q$.

As a final application, we consider the case of spin
manifolds. If the manifold X is spin, then the
intersection form $Q$ is even, that is, $Q(a,a) \equiv 0$ mod
2 for $a \in H_2(X;\mathbb{Z})$. According to Rochlin's theorem,
the signature of a spin 4-manifold is divisible by 16.
The monopole map $\mu$ for the spin structure admits
additional symmetry: it is Pin(2)-equivariant. The
nonabelian group Pin(2) appears as the normalizer
of the maximal torus in SU(2). Methods from equivariant
K-theory lead to Furuta's (2001) theorem:


Theorem 9 Let X be a spin 4-manifold. Then

\[
\chi(X) \ge \tfrac{5}{4}\, |\sigma(X)|
\]


Manifolds with Boundary 


Both Donaldson invariants and Seiberg—Witten invar- 
iants to some extent satisfy formal properties which 
fit into a general conceptual framework known as 
“topological quantum field theories (TQFTs).” Such 
a TQFT in 3+1 dimensions is a functor on the 
cobordism category of oriented 3-manifolds to the 
category of, say, vector spaces over a ground field: it 
assigns to an oriented 3-manifold Y a vector space 
$h(Y)$. To a disjoint union it assigns


$h(Y_1 \sqcup Y_2) = h(Y_1) \otimes h(Y_2)$

Reversing orientation corresponds to dualizing

$h(\bar{Y}) = h(Y)^*$


Viewing a four-dimensional manifold $X$ with
boundary $\partial X = \bar{Y}_1 \sqcup Y_2$ formally as a morphism
from $Y_1$ to $Y_2$, this functor associates to $X$ a
homomorphism


$H(X) : h(Y_1) \to h(Y_2)$


that is, an element $H(X) \in h(\bar{Y}_1 \sqcup Y_2)$. The most
important feature is the composition law


$H(X_1 \cup_Y X_2) = H(X_2) \circ H(X_1)$


So if a cobordism $X$ from $Y_1$ to $Y_2$ can be decomposed
as a cobordism $X_1$ from $Y_1$ to an intermediate
submanifold $Y$ and a cobordism $X_2$ from $Y$ to $Y_2$,
then the homomorphism $H(X)$ can be computed from
$H(X_1)$ and $H(X_2)$ as their composition.
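
The formal rules above can be mimicked in a finite-dimensional toy model. In the sketch below, which is only an illustration and not a construction of an actual TQFT, cobordisms are replaced by integer matrices, gluing along $Y$ becomes matrix multiplication, and disjoint union becomes the Kronecker product. The matrices `H_X1` and `H_X2` and the dimensions of the vector spaces are hypothetical data.

```python
from typing import List

Matrix = List[List[int]]


def compose(h2: Matrix, h1: Matrix) -> Matrix:
    """Matrix product h2 . h1, modelling H(X1 u_Y X2) = H(X2) o H(X1)."""
    inner = len(h1)
    assert all(len(row) == inner for row in h2), "intermediate spaces must match"
    cols = len(h1[0])
    return [[sum(row[k] * h1[k][j] for k in range(inner)) for j in range(cols)]
            for row in h2]


def tensor(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product, modelling the map assigned to a disjoint union."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


# Hypothetical data: dim h(Y1) = 2, dim h(Y) = 3, dim h(Y2) = 2.
H_X1: Matrix = [[1, 0], [2, 1], [0, 3]]   # h(Y1) -> h(Y)
H_X2: Matrix = [[1, 1, 0], [0, 1, 1]]     # h(Y)  -> h(Y2)

# Gluing X1 and X2 along Y gives a map h(Y1) -> h(Y2).
H_X = compose(H_X2, H_X1)
print(H_X)
```

The compatibility of `tensor` with `compose` (the mixed-product property of the Kronecker product) is the toy version of the statement that gluing two disjoint pairs of cobordisms can be done componentwise.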

Donaldson invariants and Seiberg-Witten invar- 
iants fit neatly into the framework of a TQFT if one 
restricts to 3-manifolds which are disjoint unions of 
homology 3-spheres. In both the instanton and 
the monopole case, the vector spaces h(Y) are 
Floer homology groups. The construction of Floer 
homology carries the Morse theory description of 
the homology of a finite-dimensional manifold over 
to an infinite-dimensional setting. In the instanton 
case, one considers the Chern—Simons function 


$CS(a) = -\frac{1}{8\pi^2}\int_Y \mathrm{tr}\left(a \wedge da + \frac{2}{3}\, a \wedge a \wedge a\right)$


This function is defined on the space of gauge
equivalence classes of SU(2)-connections on $Y$. Note
that for a homology 3-sphere, any SU(2) or PU(2)
principal bundle over $Y$ is trivial. Choosing a
trivialization, a connection becomes identified with
a Lie-algebra-valued 1-form $a$. Critical points of the
Chern-Simons functional lead to generators in a
chain complex, the homology of which then gives the
Floer groups. Such critical points correspond to flat
connections on $Y$. The Floer homology groups $HF_*(Y)$
are $\mathbb{Z}/8$-graded in the SU(2) case and $\mathbb{Z}/4$-graded in
the SO(3) case. If $X$ is a 4-manifold with $b_1(X) = 0$
and $b^+(X) > 1$ and such that the boundary $\partial X$ is a
disjoint union of homology 3-spheres, then the
Donaldson invariants are linear maps


$D_X : A(X) \to HF_*(\partial X)$


These invariants satisfy a composition law on the 
subring of A(X) generated by two-dimensional 
homology classes (Donaldson 2002). 

In the monopole case, one considers a Chern- 
Simons—Dirac functional 


$CSD(a, \psi) = \frac{1}{2}\left(\int_Y \langle \psi, D_a \psi\rangle\, d\mathrm{vol} - \int_Y a \wedge da\right)$


and obtains integer-graded Floer homology groups.
Details and proofs of the relevant composition laws
have been announced.


See also: Floer Homology; Four-Manifold Invariants and 
Physics; Gauge Theory: Mathematical Applications; 
Instantons: Topological Aspects; Moduli Spaces: An 
Introduction; Several Complex Variables: Basic 
Geometric Theory; Topological Quantum Field Theory: 
Overview. 
Introduction


One of the most exciting properties of string theory,
which led ten years ago to the formulation of M
theory as the unique theory unifying all interactions,
has been the discovery that type II theories, besides a
perturbative spectrum consisting of closed-string
excitations, also contain a nonperturbative one
consisting of “solitonic” p-dimensional objects
called Dp branes. They are characterized by two
important properties. They are coupled to closed-string
states such as the graviton, the dilaton, and the
R-R (p + 1)-form potential, and are described by a
classical solution of the low-energy string effective
action. Their dynamics is, on the other hand,
described by open strings having their endpoints
attached to the world volume of the branes and therefore
satisfying Dirichlet boundary conditions in the
directions transverse to that world volume. This is
the reason why they are called D (Dirichlet) branes.
Since the lightest open-string excitation corresponds
to a gauge field, they have a gauge theory living on
their world volume. This twofold description of
D-branes has opened the way to study both the
perturbative and nonperturbative properties of the
gauge theory living on their world volume from
their dynamics in terms of closed strings. With the
addition of the decoupling limit, these two properties
have led to the Maldacena (1998) conjecture of
the equivalence between the maximally supersymmetric
and conformal $\mathcal{N}=4$ super Yang-Mills and
type IIB string theory on $AdS_5 \times S^5$.

They have also been successfully applied to less
supersymmetric and nonconformal gauge theories
that live on the world volume of fractional and
wrapped branes. For general reviews of various
approaches see Bertolini et al. (2000), Herzog et al.
(2001), Bertolini (2003), Bigazzi et al. (2002), and
Di Vecchia and Liccardo (2003). In these cases too,
one has constructed a classical solution of the
supergravity equations of motion corresponding to
these more sophisticated branes. These equations
contain not only the supergravity fields present in
the bulk ten-dimensional action but also boundary
terms corresponding to the location of the branes. It
turns out that in general the classical solution
develops a naked singularity of the repulson type
at short distances from the branes. This means that
at short distances, it does not provide a reliable
description of the branes. In the case of $\mathcal{N}=2$
supersymmetry, this can be explicitly seen because
of the appearance of an enhançon located at
distances slightly larger than that of the naked singularity
(Johnson et al. 2000). The enhançon radius corresponds,
in supergravity, to the distance where a
brane probe becomes tensionless, and, in the gauge
theory living on the branes, to the dynamically
generated scale $\Lambda_{QCD}$. Then, since short distances
in supergravity correspond to large distances in
the gauge theory, as implied by holography, the
presence of the enhançon and of the naked
singularity does not allow one to obtain any information
on the nonperturbative large-distance behavior of
the gauge theory living on the D-branes. Above the
radius of the enhançon, instead, the classical solution
provides a good description of the branes and
therefore it can be used to get information on the
perturbative behavior of the gauge theory. This
shows that, if we want to use the D-branes for
studying the nonperturbative properties of the gauge
theory living on their world volume, we must
construct a classical solution that has no naked
singularity at short distances in supergravity. We
will see in a specific example that it is possible
to deform the classical solution, eliminating the
naked singularity, and use it to describe nonperturbative
properties such as the gaugino condensate.

In this article, we review some of the results obtained
by using fractional D3 branes of some orbifold and D5
branes wrapped on 2-cycles of some Calabi-Yau
manifold. The analysis of the supersymmetric gauge
theories living on the world volume of these D-branes
will be based on the gauge/gravity relations that relate
the gauge coupling constant and the $\theta$-angle to the
supergravity fields (see, e.g., Di Vecchia et al.
(2005) for a derivation):
where $\mathcal{C}_2$ is the 2-cycle on which the branes are
wrapped.

In the next section, we will describe the case of
the fractional D3 branes of the orbifold $\mathbb{C}^2/\mathbb{Z}_2$ and
show that the classical solution corresponding to a
system of $N$ D3 and $M$ D7 branes reproduces the
perturbative behavior of $\mathcal{N}=2$ super-QCD.
Then, we will consider D5 branes wrapped on 2-cycles
of a Calabi-Yau manifold described by the
Maldacena-Núñez classical solution (Maldacena
and Núñez 2001, Chamseddine and Volkov 1997)
and show that in this case we are able to reproduce
the phenomenon of gaugino condensation and to
construct the complete $\beta$-function of $\mathcal{N}=1$ super
Yang-Mills.


Fractional D3 Branes of the Orbifold
$\mathbb{C}^2/\mathbb{Z}_2$ and $\mathcal{N}=2$ Super-QCD


In this section, we consider fractional D3 and D7
branes of the noncompact orbifold $\mathbb{C}^2/\mathbb{Z}_2$ in order
to study the properties of $\mathcal{N}=2$ super-QCD. We
group the coordinates of the directions $(x^4, \ldots, x^9)$
transverse to the world volume of the D3
brane, where the gauge theory lives, into three
complex quantities: $z_1 = x^4 + i x^5$, $z_2 = x^6 + i x^7$,
$z_3 = x^8 + i x^9$. The nontrivial generator $h$ of $\mathbb{Z}_2$
acts as $z_2 \to -z_2$, $z_3 \to -z_3$, leaving $z_1$ invariant.
This orbifold has one fixed point, located at
$z_2 = z_3 = 0$ and corresponding to a vanishing
2-cycle. Fractional D3 branes are D5 branes
wrapped on the vanishing 2-cycle and therefore
are, unlike bulk branes, stuck at the orbifold fixed
point. By considering $N$ fractional D3 and $M$
fractional D7 branes of the orbifold $\mathbb{C}^2/\mathbb{Z}_2$, we are
able to study $\mathcal{N}=2$ super-QCD with $M$ hypermultiplets.
In order to do that, we need to
determine the classical solution corresponding to
the previous brane configuration. For the case of
the orbifold $\mathbb{C}^2/\mathbb{Z}_2$, the complete classical solution
is found in Bertolini et al. (2002b); see also
references therein and Bertolini et al. (2000) for a
review on fractional branes. In the following, we
write it explicitly for a system of $N$ fractional D3
branes with their world volume along the directions
$x^0, x^1, x^2$, and $x^3$ and $M$ fractional D7 branes
containing the D3 branes in their world volume
and having the remaining four world-volume
directions along the orbifolded ones. The metric,
the 5-form field strength, the axion, and the
dilaton are given by


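$ds^2 = H^{-1/2}\,\eta_{\alpha\beta}\,dx^\alpha dx^\beta + H^{1/2}\left(\delta_{\ell m}\,dx^\ell dx^m + e^{-\phi}\,\delta_{ij}\,dx^i dx^j\right)$ [3]
$\tilde{F}_{(5)} = d\left(H^{-1}\,dx^0 \wedge \cdots \wedge dx^3\right) + {}^*d\left(H^{-1}\,dx^0 \wedge \cdots \wedge dx^3\right)$ [4]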
where the self-dual field strength $\tilde{F}_{(5)}$ is given in
terms of the NS-NS and R-R 2-forms $B_2$ and $C_2$
and of the 4-form potential $C_4$ by $\tilde{F}_{(5)} = dC_4 + C_2 \wedge dB_2$.
The warp factor $H$ is a function of the
coordinates $(x^4, \ldots, x^9)$ and $\epsilon$ is an infrared cutoff.
We denote by $\alpha$ and $\beta$ the four directions corresponding
to the world volume of the fractional D3
brane, by $\ell$ and $m$ those along the four orbifolded
directions $x^6$, $x^7$, $x^8$, and $x^9$, and by $i$ and $j$ the
directions $x^4$ and $x^5$ that are transverse to both the
D3 and the D7 branes. The twisted fields are instead
given by $B_2 = \omega_2 b$, $C_2 = \omega_2 c$, where $\omega_2$ is the volume
form of the vanishing 2-cycle and
The expression of $H$ (Kirsch and Vaman 2005) shows
that the previous solution has a naked singularity of
the repulson type at short distances. On the other
hand, if we use a brane probe approaching the stack
of branes from infinity, described by the previous
classical solution, it can also be seen that the tension
of the probe vanishes at a distance that is larger than
that of the naked singularity. The point where the
probe brane becomes tensionless is called the “enhançon”
(Johnson et al. 2000), and at this point the classical
solution no longer describes the stack of
fractional branes.

Let us now use the gauge/gravity relations given in
the introduction to determine the coupling constants
of the world-volume theory from the supergravity
solution. In the case of fractional D3 branes
of the orbifold $\mathbb{C}^2/\mathbb{Z}_2$, which is characterized by one
single vanishing 2-cycle $\mathcal{C}_2$, the gauge coupling
constant given in eqn [1] reduces to
By inserting the classical solution in eqns [7] and [2],
we get the following expressions for the gauge coupling
constant and the $\theta_{YM}$ angle (Bertolini et al. 2002b):
[8]


Notice that the gauge coupling constant appearing
in the previous equation is the “bare” gauge
coupling constant computed at the scale $m \sim y/\alpha'$,
while the square of the bare gauge coupling constant
computed at the cutoff $\Lambda \sim \epsilon/\alpha'$ is equal to $8\pi g_s$.
In the case of an $\mathcal{N}=2$ supersymmetric gauge
theory, the gauge multiplet contains a complex
scalar field $\Psi$ that corresponds to the complex
coordinate $z$ transverse to both the world volume
of the D3 brane and the four orbifolded directions:
$\Psi \sim z/(2\pi\alpha')$. This is another example of holographic
identification between a quantity, $\Psi$, peculiar to the
gauge theory living on the fractional D3 branes and
another one, the coordinate $z$, peculiar to supergravity.
It allows one to obtain the gauge theory
anomalies from the supergravity background. In
fact, since we know how the scale and U(1)
transformations act on $\Psi$, from the previous gauge/
gravity relation we can deduce how they act on $z$,
namely


$\Psi \to s\,e^{2i\alpha}\,\Psi \iff z \to s\,e^{2i\alpha}\,z \implies y \to s\,y$
Those transformations do not leave invariant the
supergravity background in eqn [6] and, when
we use them in eqns [7] and [2], they generate the
anomalies of the gauge theory living on the
fractional D3 branes. In fact, by acting with
those transformations in eqns [8], we get
The first equation generates the $\beta$-function of $\mathcal{N}=2$
super-QCD with $M$ hypermultiplets given by


$\beta(g_{YM}) = -\frac{2N-M}{16\pi^2}\, g_{YM}^3$

while the second one reproduces the chiral U(1)
anomaly (Klebanov et al. 2002, Bertolini et al. 2002a).
In particular, if we choose $\alpha = 2\pi/(2(2N - M))$,
then $\theta_{YM}$ is shifted by $2\pi$. But since $\theta_{YM}$ is
periodic with period $2\pi$, this means that the subgroup
$\mathbb{Z}_{2(2N-M)}$ is not anomalous, in perfect agreement
with the gauge theory results.
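
The two statements above can be checked numerically. The following sketch assumes the one-loop form $\beta(g) = -(2N-M)\,g^3/(16\pi^2)$, which integrates to $1/g^2(\mu) = 1/g^2(\mu_0) + \frac{2N-M}{8\pi^2}\ln(\mu/\mu_0)$, and it assumes that a U(1) rotation by $\alpha$ shifts $\theta_{YM}$ by $-2(2N-M)\alpha$. The sign of that shift, the function names, and the sample values of $N$ and $M$ are illustrative assumptions, not taken from the equations of this article.

```python
import math


def beta_one_loop(g: float, n: int, m: int) -> float:
    """Assumed one-loop beta function: -(2N - M) g^3 / (16 pi^2)."""
    return -(2 * n - m) * g**3 / (16 * math.pi**2)


def coupling(mu: float, mu0: float, g0: float, n: int, m: int) -> float:
    """Running coupling obtained by integrating d(1/g^2)/d ln(mu) = (2N - M)/(8 pi^2)."""
    inv_sq = 1.0 / g0**2 + (2 * n - m) / (8 * math.pi**2) * math.log(mu / mu0)
    return inv_sq ** -0.5


def theta_shift(alpha: float, n: int, m: int) -> float:
    """Assumed anomalous shift of theta_YM under a U(1) rotation by alpha."""
    return -2 * (2 * n - m) * alpha


def is_symmetry(alpha: float, n: int, m: int, tol: float = 1e-9) -> bool:
    """A rotation is non-anomalous when theta_YM shifts by a multiple of 2 pi."""
    turns = theta_shift(alpha, n, m) / (2 * math.pi)
    return abs(turns - round(turns)) < tol


N, M = 3, 2                      # hypothetical example: SU(3) with 2 hypermultiplets
order = 2 * (2 * N - M)          # expected order of the unbroken discrete group
print(order, [is_symmetry(2 * math.pi * k / order, N, M) for k in range(order)])
```

For $2N > M$ the coupling decreases with increasing $\mu$, and the rotations that survive are exactly the multiples of $2\pi/(2(2N-M))$.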
Wrapped D5 Branes and $\mathcal{N}=1$ Super
Yang-Mills


In this section, we will consider the classical solution
corresponding to $N$ D5 branes wrapped on a 2-cycle
of a noncompact Calabi-Yau space and we use it to
study the properties of the gauge theory living on
their world volume, which can be shown to be $\mathcal{N}=1$
super Yang-Mills.

We start by writing the classical solution found in 
Maldacena and Núñez (2001) and Chamseddine and 
Volkov (1997). It has a nontrivial metric: 


ds*, = e? ay + e” (a8 + sin? fag”) 
a 2-form R-R potential
and a dilaton
with $\rho = \lambda r$ and $\lambda^{-2} = N g_s \alpha'$. The left-invariant
1-forms of $S^3$ are
with $0 \le \theta \le \pi$, $0 \le \varphi \le 2\pi$, and $0 \le \psi \le 4\pi$. The
variables $\tilde{\theta}$ and $\tilde{\varphi}$ describe a two-dimensional sphere
and vary in the range $0 \le \tilde{\theta} \le \pi$ and $0 \le \tilde{\varphi} \le 2\pi$.
Before proceeding, we want to stress the fact that
the presence of the function $a(\rho) \neq 0$ makes the
solution regular everywhere. This will allow us to use
it later on to describe the nonperturbative gaugino
condensate of $\mathcal{N}=1$ super Yang-Mills.

We can now use the previous solution for computing
the running coupling constant and the $\theta$ parameter
of $\mathcal{N}=1$ super Yang-Mills (see Di Vecchia et al.
(2002), Bertolini and Merlatti (2003), and Mück
(2003), reviewed in Bertolini (2003), Di Vecchia and
Liccardo (2003), and Imeroni (2003)). In order to do
that, we have to fix the cycle on which to perform the
integrals in eqns [1] and [2]. It turns out that this
2-cycle is specified by


$\tilde{\theta} = \theta, \quad \tilde{\varphi} = -\varphi,$


keeping $\rho$ fixed. If we now compute the gauge couplings
on the previous cycle with $B_2 = C_0 = 0$, we get


$\frac{4\pi^2}{N g_{YM}^2} = \rho \coth 2\rho + \frac{1}{2}\, a(\rho) \cos\psi$ [19]
and
where we have kept $\psi \neq 0$ for reasons that will
become clear in a moment. Equation [19] shows
that the coupling constant is running as a function
of the distance $\rho$ from the branes. In order to obtain
the correct running of the gauge theory, we have to
find a relation between $\rho$ and the renormalization
group scale $\mu$. This can be obtained with the
following considerations. If we look at the previous
solution, it is easy to see that the metric in eqn [12]
is invariant under the following transformations:


$\psi \to \psi + 2\pi$ if $a \neq 0$
$\psi \to \psi + 2\epsilon$ if $a = 0$


where $\epsilon$ is an arbitrary constant. On the other hand,
$C_2$ is not invariant under the previous transformations,
but its flux, which is exactly equal to $\theta_{YM}$ in eqn
[20], changes by an integer multiple of $2\pi$:


[21]
if $a = 0$, $\epsilon = \frac{\pi k}{N}$
But since the physics does not change when
$\theta_{YM} \to \theta_{YM} + 2\pi$, one gets that the transformation in
eqn [22] is an invariance. Notice that eqn [19] for
the gauge coupling constant is also invariant under the
transformation in eqn [21]. The previous considerations
show that the classical solution and also the
gauge couplings are invariant under the $\mathbb{Z}_2$ transformation
if $a \neq 0$, while this symmetry becomes $\mathbb{Z}_{2N}$ if $a$
is taken to be zero. As a consequence, since in the
ultraviolet $a(\rho)$ is exponentially small, we can neglect
it and we have a $\mathbb{Z}_{2N}$ symmetry, while in the infrared
we cannot neglect $a(\rho)$ anymore and we have only a
$\mathbb{Z}_2$ symmetry left. This fits very well with the fact that
$\mathcal{N}=1$ super Yang-Mills has a nonzero gaugino
condensate $\langle\lambda\lambda\rangle$ that is responsible for the breaking
of $\mathbb{Z}_{2N}$ into $\mathbb{Z}_2$. Therefore, it is natural to identify
the gaugino condensate precisely with the function
$a(\rho) \neq 0$ that makes the classical solution regular also
at short distances in supergravity (Di Vecchia et al.
2002, Apreda et al. 2002):


$\langle\lambda^2\rangle = \mu^3 a(\rho)$ [23]


This provides the relation between the renormalization
group scale $\mu$ and the supergravity spacetime
parameter $\rho$. In the ultraviolet (large $\rho$), $a(\rho)$ is
exponentially suppressed and in eqns [19] and [20]
we can neglect it, obtaining


$\frac{4\pi^2}{N g_{YM}^2} = \rho, \qquad \theta_{YM} = -N(\psi + \psi_0)$ [24]


The chiral anomaly can be obtained by performing
the transformation $\psi \to \psi + 2\epsilon$ and getting


$\theta_{YM} \to \theta_{YM} - 2N\epsilon$ [25]


This implies that the $\mathbb{Z}_{2N}$ transformations corresponding
to $\epsilon = \pi k/N$ are symmetries because they
shift $\theta_{YM}$ by multiples of $2\pi$.

In general, however, eqns [19] and [20] are only
invariant under the $\mathbb{Z}_2$ subgroup of $\mathbb{Z}_{2N}$ corresponding
to the transformation


$\psi \to \psi + 2\pi$ [26]
that changes $\theta_{YM}$ in eqn [20] as follows:
$\theta_{YM} \to \theta_{YM} - 2\pi N$ [27]
leaving invariant the gaugino condensate:
$\langle\lambda^2\rangle = \frac{16\pi^2 \mu^3}{3N g_{YM}^2}\, e^{-8\pi^2/(N g_{YM}^2)}\, e^{i\theta_{YM}/N}$ [28]
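
The pattern of symmetry breaking can be made concrete with a short enumeration. The sketch below assumes only that the condensate depends on $\theta_{YM}$ through the phase $e^{i\theta_{YM}/N}$ and that the element $k$ of $\mathbb{Z}_{2N}$ acts as $\theta_{YM} \to \theta_{YM} - 2N\epsilon$ with $\epsilon = \pi k/N$. The function names and the choice $N = 3$ are hypothetical.

```python
import cmath
import math
from typing import List, Tuple


def condensate_phase(theta_ym: float, n: int) -> complex:
    """Phase factor exp(i theta_YM / N) carried by the gaugino condensate."""
    return cmath.exp(1j * theta_ym / n)


def act(theta_ym: float, k: int, n: int) -> float:
    """Element k of Z_2N: psi -> psi + 2 eps with eps = pi k / N."""
    eps = math.pi * k / n
    return theta_ym - 2 * n * eps


def orbit_and_stabilizer(n: int, theta0: float = 0.0,
                         tol: float = 1e-9) -> Tuple[List[complex], List[int]]:
    """Distinct condensate phases reached by Z_2N, and the elements fixing the phase."""
    base = condensate_phase(theta0, n)
    distinct: List[complex] = []
    stabilizer: List[int] = []
    for k in range(2 * n):
        phase = condensate_phase(act(theta0, k, n), n)
        if abs(phase - base) < tol:
            stabilizer.append(k)
        if all(abs(phase - q) > tol for q in distinct):
            distinct.append(phase)
    return distinct, stabilizer


phases, stab = orbit_and_stabilizer(3)
print(len(phases), stab)
```

For $N = 3$ the six elements of $\mathbb{Z}_6$ produce three distinct phases, and only $k = 0$ and $k = N$ leave the phase unchanged. The element $k = N$ is the shift $\psi \to \psi + 2\pi$, so the stabilizer is the $\mathbb{Z}_2$ subgroup.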


Therefore, the chiral anomaly and the breaking of
$\mathbb{Z}_{2N}$ to $\mathbb{Z}_2$ are encoded in eqns [19] and [20].
Finally, if we put 7 =O in eqn [19], we get 


$\frac{4\pi^2}{N g_{YM}^2} = \rho \coth 2\rho - \frac{1}{2}\, a(\rho) = \rho \tanh \rho$ [29]
This equation taken together with eqn [23] allows us to
determine the running coupling constant as a function of
$\mu$. From it, we get (Di Vecchia et al. 2002, Di Vecchia
and Liccardo 2003) the Novikov-Shifman-Vainshtein-Zakharov
(NSVZ) $\beta$-function plus nonperturbative
corrections due to fractional instantons:
where in the ultraviolet we have approximated
$\rho$ with $\frac{4\pi^2}{N g_{YM}^2} \coth \frac{4\pi^2}{N g_{YM}^2}$.
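
The hyperbolic identity behind eqn [29] is easy to confirm numerically. The sketch below assumes the Maldacena-Núñez profile $a(\rho) = 2\rho/\sinh 2\rho$, which is the standard form in the literature but is not displayed in the text above, and it takes the coupling to be $\rho\coth 2\rho + \frac{1}{2}a(\rho)\cos\psi$. With these assumptions the combination $\rho\coth 2\rho - \frac{1}{2}a(\rho)$ corresponds to $\cos\psi = -1$. The function names are hypothetical.

```python
import math


def a(rho: float) -> float:
    """Assumed Maldacena-Nunez profile a(rho) = 2 rho / sinh(2 rho)."""
    return 2.0 * rho / math.sinh(2.0 * rho)


def inverse_coupling(rho: float, psi: float) -> float:
    """Assumed form of 4 pi^2 / (N g_YM^2): rho coth(2 rho) + a(rho) cos(psi) / 2."""
    return rho / math.tanh(2.0 * rho) + 0.5 * a(rho) * math.cos(psi)


for rho in (0.3, 1.0, 2.5):
    lhs = inverse_coupling(rho, math.pi)   # cos(psi) = -1
    rhs = rho * math.tanh(rho)
    print(rho, lhs, rhs)

# Ultraviolet regime: a(rho) is exponentially small and the coupling reduces to rho.
print(a(20.0), inverse_coupling(20.0, 0.0))
```

The last line illustrates why $a(\rho)$ can be dropped at large $\rho$: it is suppressed like $4\rho\,e^{-2\rho}$, so the value of $\psi$ no longer matters there.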


See also: AdS/CFT Correspondence; Anomalies; 

BF Theories; Brane Construction of Gauge Theories; 
Gauge Theory: Mathematical Applications; 
Noncommutative Geometry from Strings; 
Nonperturbative and Topological Aspects of Gauge 
Theory; Perturbation Theory and its Techniques; 
Seiberg—Witten Theory; Superstring Theories. 
Introduction


This article surveys some developments in pure mathe- 
matics which have, to varying degrees, grown out of the 
ideas of gauge theory in mathematical physics. The 
realization that the gauge fields of particle physics and 
the connections of differential geometry are one and the 
same has had wide-ranging consequences, at different 
levels. Most directly, it has led mathematicians to work 
on new kinds of questions, often shedding light later on 
well-established problems. Less directly, various funda- 
mental ideas and techniques, notably the need to work 
with the infinite-dimensional gauge symmetry group, 
have found a place in the general world-view of many 
mathematicians, influencing developments in other 
fields. Still less directly, the work in this area — between 
geometry and mathematical physics — has been a prime 
example of the interaction between these fields which 
has been so fruitful since the 1970s. 

The body of this article is divided into three 
sections: roughly corresponding to analysis, geome- 
try, and topology. However, the different topics 
come together in many different ways: indeed the 
existence of these links between the topics is one of 
the most attractive features of the area. 


Gauge Transformations 


For a review of the usual foundational material
on connections, curvature, and related differential
geometric constructions, the reader is referred to
standard texts. We will, however, briefly recall
the notions of gauge transformations and gauge
fixing. The simplest case is that of abelian gauge
theory: connections on a U(1)-bundle, say over
$\mathbb{R}^3$. In that case the connection form, representing
the connection in a local trivialization, is a pure
imaginary 1-form $A$, which can also be identified
with a vector field $\mathbf{A}$. The curvature of the
connection is the 2-form $dA$. Changing the local
trivialization by a U(1)-valued function $g = e^{i\chi}$
changes the connection form to


$\tilde{A} = A - dg\,g^{-1} = A - i\,d\chi$


The forms $A, \tilde{A}$ are two representations of the same
geometric object, just as the same metric can be
represented by different expressions in different
coordinate systems. One may want to fix this choice
of representation, usually by choosing $A$ to satisfy
the Coulomb gauge condition $d^*A = 0$ (equivalently
$\operatorname{div} \mathbf{A} = 0$), supplemented by appropriate boundary
conditions. Here we are using the standard Euclidean
metric on $\mathbb{R}^3$. (Throughout this article we will
work with positive-definite metrics, regardless of the
fact that, at least at the classical level, the
Lorentzian signature may have more obvious bearing
on physics.) Arranging this choice of gauge
involves solving a linear partial differential equation
(PDE) for $\chi$.
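
A one-dimensional lattice analogue shows why the abelian gauge-fixing problem is linear. In the sketch below, which is a toy model and not the $\mathbb{R}^3$ problem itself, a real-valued connection lives on the links of a periodic lattice (the factor $i$ is dropped), the gauge function $\chi$ lives on the sites, and the Coulomb condition is the vanishing of the discrete divergence. The function names are hypothetical.

```python
from typing import List, Tuple


def coulomb_gauge(a: List[float]) -> Tuple[List[float], List[float]]:
    """Return (chi, a_new) with a_new[j] = a[j] - (chi[j+1] - chi[j]) divergence-free.

    On a periodic 1D lattice the divergence-free link fields are the constants,
    so chi solves the linear difference equation chi[j+1] - chi[j] = a[j] - mean(a).
    """
    n = len(a)
    mean = sum(a) / n
    chi = [0.0]
    for j in range(n - 1):
        chi.append(chi[-1] + a[j] - mean)
    a_new = [a[j] - (chi[(j + 1) % n] - chi[j]) for j in range(n)]
    return chi, a_new


def divergence(a: List[float]) -> List[float]:
    """Discrete divergence at site j: a[j] - a[j-1] (indices taken periodically)."""
    return [a[j] - a[j - 1] for j in range(len(a))]


links = [0.4, -1.2, 0.7, 2.0, -0.3]
chi, fixed = coulomb_gauge(links)
print(fixed, divergence(fixed))
```

The sum of the link variables around the circle, the lattice counterpart of the holonomy, is unchanged by the gauge transformation; only the redundant part of the connection is removed.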

The case of a general structure group $G$ is not
much different. The connection form $A$ now takes
values in the Lie algebra of $G$ and the curvature is
given by the expression


$F = dA + \frac{1}{2}[A, A]$


The change of bundle trivialization is given by a
$G$-valued function $g$ and the resulting change in the
connection form is


$\tilde{A} = g A g^{-1} - dg\,g^{-1}$


(Our notation here assumes that $G$ is a matrix
group, but this is not important.) Again, we can seek
to impose the Coulomb gauge condition $d^*A = 0$,
but now the resulting equation for $g$ is nonlinear and
cannot be linearized as before.

We can carry the same ideas over to a global
problem, working on a $G$-bundle $P$ over a general
Riemannian manifold $M$. The space of connections on
$P$ is an affine space $\mathcal{A}$: any two connections differ by a
bundle-valued 1-form. Now the gauge group $\mathcal{G}$ of
automorphisms of $P$ acts on $\mathcal{A}$ and, again, two
connections in the same orbit of this action represent
essentially the same geometric object. Thus, in a sense
we would really like to work on the quotient space
$\mathcal{A}/\mathcal{G}$. Working locally in the space of connections, near
to some $A_0$, this is quite straightforward. We represent
the nearby connections as $A_0 + a$, where $a$ satisfies the
analog of the Coulomb condition


$d_{A_0}^* a = 0$


Under suitable hypotheses, this condition picks out a
unique representative of each nearby orbit. However,
this gauge-fixing condition need not single out a
unique representative if we are far away from $A_0$:
indeed, the space $\mathcal{A}/\mathcal{G}$ typically has, unlike $\mathcal{A}$, a
complicated topology, which means that it is impossible
to find any such global gauge-fixing condition.
As noted above, this is one of the distinctive features
of gauge theory. The gauge group $\mathcal{G}$ is an
infinite-dimensional group, but one of a comparatively
straightforward kind, much less complicated than
the diffeomorphism groups relevant in Riemannian
geometry, for example. One could argue that one of
the most important influences of gauge theory has
been to accustom mathematicians to working with
infinite-dimensional symmetry groups in a comparatively
simple setting.


Analysis and Variational Methods 
The Yang-Mills Functional 


A primary object brought to mathematicians’ attention
by physics is the Yang-Mills functional


$YM(A) = \int_M |F_A|^2$


Clearly, $YM(A)$ is non-negative and vanishes if and
only if the connection is flat: it is broadly analogous
to functionals such as the area functional in minimal
submanifold theory, or the energy functional for
maps. As such, it fits into a general framework
associated with such functionals. The Euler-Lagrange
equations are the Yang-Mills equations


$d_A^* F_A = 0$


For any solution (a Yang-Mills connection), there is
a “Jacobi operator” $H_A$, such that the second
variation is given by


$YM(A + ta) = YM(A) + t^2 \langle H_A a, a\rangle + O(t^3)$
The omnipresent phenomenon of gauge invariance
means that Yang-Mills connections are never
isolated, since we can always generate an
infinite-dimensional family by gauge transformations. Thus,
as explained in the last section, one imposes the
gauge-fixing condition $d_A^* a = 0$. Then the operator
$H_A$ can be written as


$H_A a = \Delta_A a + [F_A, a]$


where $\Delta_A$ is the bundle-valued “Hodge Laplacian”
$d_A d_A^* + d_A^* d_A$ and the expression $[F_A, a]$ combines
the bracket in the Lie algebra with the action of
$\Lambda^2$ on $\Lambda^1$. This is a self-adjoint elliptic operator
and, if $M$ is compact, the span of the negative
eigenspaces is finite dimensional, the dimension
being defined to be the index of the Yang-Mills
connection $A$.

In this general setting, a natural aspiration is to 
construct a “Morse theory” for the functional. Such 
a theory should relate the topology of the ambient 
space to the critical points and their indices. In the 
simplest case, one could hope to show that for any 
bundle P there is a Yang-Mills connection with 
index 0, giving a minimum of the functional. More 
generally, the relevant ambient space here is the 
quotient A/G and one might hope that the rich 
topology of this is reflected in the solutions to the 
Yang-Mills equations. 


Uhlenbeck’s Theorem 


The essential foundation needed to underpin such a
“direct method” in the calculus of variations is an
appropriate compactness theorem. Here the dimension
of the base manifold $M$ enters in a crucial way.
Very roughly, when a connection is represented
locally in a Coulomb gauge, the Yang-Mills action
combines the $L^2$-norm of the derivative of the
connection form $A$ with the $L^2$-norm of the
quadratic term $[A, A]$. The latter can be estimated
by the $L^4$-norm of $A$. If $\dim M \le 4$, then the
Sobolev inequalities allow the $L^4$-norm of $A$ to be
controlled by the $L^2$-norm of its derivative, but this
is definitely not true in higher dimensions. Thus,
$\dim M = 4$ is the “critical dimension” for this
variational problem. This is related to the fact that
the Yang-Mills equations (and Yang-Mills functional)
are conformally invariant in four dimensions.
For any nontrivial Yang-Mills connection over the
4-sphere, one generates a one-parameter family of
Yang-Mills connections, on which the functional
takes the same value, by applying conformal
transformations corresponding to dilations of $\mathbb{R}^4$.
In such a family of connections the integrand $|F_A|^2$,
the “curvature density,” converges to a $\delta$-function
at the origin. More generally, one can encounter
sequences of connections over 4-manifolds for
which YM is bounded but which do not converge,
the Yang-Mills density converging to $\delta$-functions.
There is a detailed analogy with the theory of
the harmonic maps energy functional, where the
relevant critical dimension (for the domain of the
map) is 2.
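
The concentration phenomenon can be seen in a simple radial model. The sketch below assumes a curvature density on $\mathbb{R}^4$ of the form $48\lambda^4/(r^2 + \lambda^2)^4$, the profile of the basic instanton of scale $\lambda$ in one common normalization; the normalization and the function names are assumptions. With this choice the total integral is $8\pi^2$ for every $\lambda$, while the part contained in a fixed ball tends to the total as $\lambda \to 0$.

```python
import math


def density(r: float, lam: float) -> float:
    """Assumed instanton-like curvature density 48 lam^4 / (r^2 + lam^2)^4."""
    return 48.0 * lam**4 / (r * r + lam * lam) ** 4


def energy_within(radius: float, lam: float) -> float:
    """Closed form of the integral over the ball of given radius in R^4.

    With t = lam^2 / (radius^2 + lam^2) the result is 8 pi^2 (1 - 3 t^2 + 2 t^3).
    """
    t = lam**2 / (radius**2 + lam**2)
    return 8.0 * math.pi**2 * (1.0 - 3.0 * t * t + 2.0 * t**3)


def energy_within_numeric(radius: float, lam: float, steps: int = 20000) -> float:
    """Midpoint rule for the radial integral, using the 3-sphere area 2 pi^2 r^3."""
    h = radius / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        total += density(r, lam) * 2.0 * math.pi**2 * r**3
    return total * h


for lam in (1.0, 0.1, 0.01):
    print(lam, energy_within(0.1, lam) / (8.0 * math.pi**2))
```

The printed fractions increase toward 1 as $\lambda$ decreases: the same total is squeezed into an ever smaller neighborhood of the origin, which is the behavior described above as convergence to a $\delta$-function.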

The result of Uhlenbeck (1982), which makes
these ideas precise, considers connections over a ball
$B^n \subset \mathbb{R}^n$. If the exponent $p$ satisfies $2p \geq n$, then there are
positive constants $\epsilon(p,n), C(p,n) > 0$ such that any
connection with $\|F\|_{L^p(B^n)} < \epsilon$ can be represented in
Coulomb gauge over the ball, by a connection form
which satisfies the condition $d^*A = 0$, together with
certain boundary conditions, and


$\|A\|_{L^p_1} \leq C\,\|F\|_{L^p}$


In this Coulomb gauge, the Yang-Mills equations
are elliptic and it follows readily that, in this setting,
if the connection $A$ is Yang-Mills one can obtain
estimates on all derivatives of $A$.


Instantons in Four Dimensions 


This result of Uhlenbeck gives the analytical basis
for the direct method of the calculus of variations
for the Yang-Mills functional over base manifolds
$M$ of dimension $\le 3$. For example, any bundle over
such a manifold must admit a Yang-Mills connection,
minimizing the functional. Such a statement
is definitely false in dimensions $\ge 5$. For example,
an early result of Bourguignon and Lawson (1981)
and Simons asserts that there is no minimizing
connection on any bundle over $S^n$ for $n \ge 5$. The
proof exploits the action of the conformal
transformations of the sphere. In the critical dimension
4, the situation is much more complicated. In four
dimensions, there are the renowned “instanton”
solutions of the Yang-Mills equation. Recall that
if $M$ is an oriented 4-manifold the Hodge
$*$-operation is an involution of $\Lambda^2 T^*M$ which
decomposes the 2-forms into self-dual and
anti-self-dual parts, $\Lambda^2 T^*M = \Lambda^+ \oplus \Lambda^-$. The
curvature of a connection can then be written as


$F_A = F_A^+ + F_A^-$


and a connection is a self-dual (respectively
anti-self-dual) instanton if $F_A^-$ (respectively $F_A^+$) is 0.
The Yang-Mills functional is


$YM(A) = \|F_A^+\|^2 + \|F_A^-\|^2$


while the difference $\|F_A^+\|^2 - \|F_A^-\|^2$ is a topological
invariant $\kappa(P)$ of the bundle $P$, obtained by
evaluating a four-dimensional characteristic class
on $[M]$. Depending on the sign of $\kappa(P)$, the
self-dual or anti-self-dual connections (if any exist)
minimize the Yang-Mills functional among all
connections on $P$. These instanton solutions of
the Yang-Mills equations are analogous to the
holomorphic maps from a Riemann surface to a
Kähler manifold, which minimize the harmonic
maps energy functional in their homotopy class.
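
The pointwise algebra behind this decomposition can be checked on constant 2-forms on $\mathbb{R}^4$. The sketch below uses real coefficients as a stand-in for Lie-algebra-valued forms, fixes the orientation $e_1 \wedge e_2 \wedge e_3 \wedge e_4$, and verifies that the norm splits as a sum while the difference of the two pieces equals the coefficient of $F \wedge F$, the pointwise version of the topological term. The representation of a 2-form as a dictionary and all names are hypothetical.

```python
from typing import Dict, Tuple

TwoForm = Dict[Tuple[int, int], float]  # coefficients F_ij for i < j

# Hodge star on basis 2-forms of R^4: *(e_i ^ e_j) = sign * (e_k ^ e_l).
STAR = {
    (1, 2): ((3, 4), 1.0), (1, 3): ((2, 4), -1.0), (1, 4): ((2, 3), 1.0),
    (2, 3): ((1, 4), 1.0), (2, 4): ((1, 3), -1.0), (3, 4): ((1, 2), 1.0),
}


def hodge_star(f: TwoForm) -> TwoForm:
    """Apply the Hodge star to a 2-form given by its six coefficients."""
    out = {key: 0.0 for key in STAR}
    for key, value in f.items():
        target, sign = STAR[key]
        out[target] += sign * value
    return out


def split(f: TwoForm) -> Tuple[TwoForm, TwoForm]:
    """Self-dual and anti-self-dual parts (F + *F)/2 and (F - *F)/2."""
    sf = hodge_star(f)
    plus = {k: 0.5 * (f.get(k, 0.0) + sf[k]) for k in STAR}
    minus = {k: 0.5 * (f.get(k, 0.0) - sf[k]) for k in STAR}
    return plus, minus


def norm_sq(f: TwoForm) -> float:
    """Pointwise norm squared: sum of squared coefficients over i < j."""
    return sum(v * v for v in f.values())


def wedge_square(f: TwoForm) -> float:
    """Coefficient of e1^e2^e3^e4 in F ^ F."""
    g = lambda k: f.get(k, 0.0)
    return 2.0 * (g((1, 2)) * g((3, 4)) - g((1, 3)) * g((2, 4)) + g((1, 4)) * g((2, 3)))


F: TwoForm = {(1, 2): 1.0, (1, 3): 2.0, (1, 4): -1.0,
              (2, 3): 0.5, (2, 4): 3.0, (3, 4): -2.0}
F_plus, F_minus = split(F)
print(norm_sq(F), norm_sq(F_plus), norm_sq(F_minus), wedge_square(F))
```

Since the sum of the two norms is the energy density and their difference is fixed by topology after integration, minimizing the energy amounts to making one of the two pieces vanish, which is the instanton condition.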


Moduli Spaces 


The instanton solutions typically occur in “moduli
spaces.” To fix ideas, let us consider bundles with
structure group SU(2), in which case $\kappa(P) = -8\pi^2 c_2(P)$.
For each $k > 0$, we have a moduli space
$M_k$ of anti-self-dual instantons on a bundle $P_k \to M^4$,
with $c_2(P_k) = k$. It is a manifold of dimension $8k - 3$.
The general goal of the calculus of variations in this
setting is to relate three things:


1. the topology of the space $\mathcal{A}/\mathcal{G}$ of equivalence
classes of connections on $P_k$;

2. the topology of the moduli space $M_k$ of
instantons; and

3. the existence and indices of other, nonminimal,
solutions to the Yang-Mills equations on $P_k$.


In this direction, a very influential conjecture was
made by Atiyah and Jones (1978). They considered
the case when $M = S^4$ and, to avoid certain
technicalities, worked with spaces of “framed”
connections, dividing by the restricted group $\mathcal{G}_0$ of
gauge transformations equal to the identity at
infinity. Then, for any $k$, the quotient $\mathcal{A}/\mathcal{G}_0$ is
homotopy equivalent to the third loop space $\Omega^3 S^3$
of based maps from the 3-sphere to itself. The
corresponding “framed” moduli space $\tilde{M}_k$ is a
manifold of dimension $8k$ (a bundle over $M_k$ with
fiber SO(3)). Atiyah and Jones conjectured that
the inclusion $\tilde{M}_k \to \mathcal{A}/\mathcal{G}_0$ induces an isomorphism
of homotopy groups $\pi_l$ in a range of dimensions
$l \le l(k)$, where $l(k)$ increases with $k$. This would be
consistent with what one might hope to prove by the
calculus of variations if there were no other Yang-Mills
solutions, or if the indices of such solutions
increased with $k$.

The first result along these lines was due to 
Bourguignon and Lawson (1981), who showed that 
the instanton solutions are the only local minima of 
the Yang-Mills functional over the 4-sphere.
Subsequently, Taubes (1983) showed that the index of a
non-instanton Yang-Mills connection on $P_k$ is at least
$k + 1$. Taubes’ proof used ideas related to the action
of the quaternions and the hyper-Kähler structure on
the $\tilde{\mathcal{M}}_k$ (see the section on hyper-Kähler quotients).
Contrary to some expectations, it was shown by
Sibner et al. (1989) that nonminimal solutions do
exist; some later constructions were very explicit
(Sadun and Segert 1992). Taubes’ index bound gave
ground for hope that an analytical proof of the
Atiyah-Jones conjecture might be possible, but this
is not at all straightforward. The problem is that in
the critical dimension 4 a mini-max sequence for the
Yang-Mills functional in a given homotopy class
may diverge, with curvature densities converging to
sums of $\delta$-functions as outlined above. This is
related to the fact that the $\mathcal{M}_k$ are not compact. In
a series of papers culminating in a framework for
Morse theory for the Yang-Mills functional, Taubes
(1998) succeeded in proving a partial version of the
Atiyah-Jones conjecture, together with similar
results for general base manifolds $M^4$. Taubes
showed that, if the homotopy groups of the moduli
spaces stabilize as $k \to \infty$, then the limit must be
that predicted by Atiyah and Jones. Related
analytical techniques were developed for other variational
problems at the critical dimension involving “critical
points at infinity.” The full Atiyah-Jones conjecture
was established by Boyer et al. but using geometrical
techniques: the “explicit” description of the moduli
spaces obtained from the Atiyah-Drinfeld-Hitchin-Manin
(ADHM) construction (see below). A different
geometrical proof was given by Kirwan (1994),
together with generalizations to other gauge groups.

There was a parallel story for the solutions of the 
Bogomolny equation over $\mathbb{R}^3$, which we will not
recount in detail. Here the base dimension is below 
the critical case but the analytical difficulty arises 
from the noncompactness of $\mathbb{R}^3$. Taubes succeeded
in overcoming this difficulty and obtained relations 
between the topology of the moduli space, the 
appropriate configuration space and the higher 
critical points. Again, these higher critical points 
exist but their index grows with the numerical 
parameter corresponding to k. At about the same 
time, Donaldson (1984) showed that the moduli 
spaces could be identified with spaces of rational 
maps (subsequently extended to other gauge 
groups). The analog of the Atiyah-Jones conjecture
is a result on the topology of spaces of rational maps 
proved earlier by Segal, which had been one of the 
motivations for Atiyah and Jones. 


Higher Dimensions 


While the scope for variational methods in Yang- 
Mills theory in higher dimensions is very limited, 
there are useful analytical results about solutions of 
the Yang-Mills equations. An important monotoni- 
city result was obtained by Price (1983). For 
simplicity, consider a Yang-Mills connection over 
the unit ball $B^n \subset \mathbb{R}^n$. Then Price showed that the
normalized energy


$$E(A, B(r)) = r^{4-n} \int_{B(r)} |F|^2 \, d\mu$$


is a nondecreasing function of r, so it decreases as r
shrinks. Nakajima (1988) and Uhlenbeck used
this monotonicity to show that for each n there is an
$\epsilon$ such that if A is a Yang-Mills connection over a ball
with $E(A, B(r)) < \epsilon$ then all derivatives of A, in a
suitable gauge, can be controlled by $E(A, B(r))$. Tian
(2000) showed that if $A_i$ is a sequence of Yang-Mills
connections over a compact manifold M with bounded
Yang-Mills functional, then there is a subsequence
which converges away from a set Z of Hausdorff
codimension at least 4 (extending the case of points in a
4-manifold). Moreover, the singular set Z is a minimal
subvariety, in a suitably generalized sense.
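
The power of r in the normalization can be motivated by a scaling computation. What follows is a sketch under stated assumptions: the normalization $r^{4-n}$ and the rescaled connection $A_\lambda$ are written out here for illustration and are not quoted from the text.

```latex
% Assumed normalization: E(A, B(r)) = r^{4-n} \int_{B(r)} |F_A|^2 d\mu.
% Assumed rescaling: A_\lambda is the pull-back of A under x \mapsto \lambda x.
\begin{align*}
A_\lambda(x) &= \lambda\, A(\lambda x), \qquad
F_{A_\lambda}(x) = \lambda^{2} F_A(\lambda x), \\
\int_{B(r)} |F_{A_\lambda}(x)|^{2}\, d\mu(x)
  &= \lambda^{4} \int_{B(r)} |F_A(\lambda x)|^{2}\, d\mu(x)
   = \lambda^{4-n} \int_{B(\lambda r)} |F_A(y)|^{2}\, d\mu(y), \\
E(A_\lambda, B(r)) &= r^{4-n} \lambda^{4-n} \int_{B(\lambda r)} |F_A|^{2}\, d\mu
   = E(A, B(\lambda r)).
\end{align*}
```

With this normalization the energy is unchanged by dilations in every dimension, whereas the unnormalized integral is dilation-invariant only when n = 4. This is one way to see why 4 is the critical dimension.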

In higher dimensions, important examples of 
Yang-Mills connections arise within the framework 
of “calibrated geometry.” Here, we consider a 
Riemannian n-manifold M with a covariant constant 
calibrating form $\Omega \in \Omega^{n-4}(M)$. There is then an analog
of the instanton equation


$$F_A = \pm * (\Omega \wedge F_A)$$


whose solutions minimize the Yang-Mills functional. 
This includes the Hermitian Yang-Mills equation 
over a Kähler manifold (see the section on moment
maps) and also certain equations over manifolds with 
special holonomy groups (Donaldson and Thomas 
1998). For these “higher-dimensional instantons,” 
Tian shows that the singular sets Z that arise are 
calibrated varieties. 


Gluing Techniques 


Another set of ideas from PDEs and analysis which 
has had great impact in gauge theory involves the 
construction of solutions to appropriate equations 
by the following general scheme: 


1. constructing an “approximate solution,” formed 
from some standard models using cutoff 
functions; 

2. showing that the approximate solution can be 
deformed to a true solution by means of an 
implicit function theorem. 


The heart of the second step usually consists of 
estimates for the relevant linear differential opera- 
tor. Of course, the success of this strategy depends 
on the particular features of the problem. This 
approach, due largely to Taubes, has been particu- 
larly effective in finding solutions to the first-order 
instanton equations and their relatives. (The applic- 
ability of the approach is connected to the fact that 
such solutions typically occur in moduli spaces and
one can often “see” local coordinates in the moduli 
space by varying the parameters in the approximate 
solution.) Taubes applied this approach to the 
Bogomolny monopole equation over $\mathbb{R}^3$ (Jaffe and
Taubes 1980) and to construct instantons over 
general 4-manifolds (Taubes 1982). In the latter 
case, the approximate solutions are obtained by 
transplanting standard solutions over $\mathbb{R}^4$, with
curvature density concentrated in a small ball, to
small balls on the 4-manifold, glued to the trivial 
flat connection over the remainder of the manifold. 
These types of techniques have now become a fairly 
standard part of the armory of many differential 
geometers, working both within gauge theory and 
other fields. An example of a problem where similar 
ideas have been used is Joyce’s construction of 
compact manifolds with exceptional holonomy
groups (Joyce 1996). (Of course, it is likely that 
similar techniques have been developed over the 
years in many other areas, but Taubes’ work in 
gauge theory has done a great deal to bring them 
into prominence.) 


Geometry: Integrability and Moduli Spaces


The Ward Correspondence 


Suppose that S is a complex surface and $\omega$ is the
2-form corresponding to a Hermitian metric on S.
Then S is an oriented Riemannian 4-manifold and $\omega$
is a self-dual form. The orthogonal complement of $\omega$
in $\Lambda^+$ can be identified with the real parts of forms
of type (0,2). Hence, if A is an anti-self-dual
instanton connection on a principal U(r)-bundle over
S, the (0,2) part of the curvature of A vanishes. This
is the integrability condition for the $\bar\partial$-operator
defined by the connection, acting on sections of the
associated vector bundle $E \to S$. Thus, in the
presence of the connection, the bundle E is naturally
a holomorphic bundle over S.

The Ward correspondence (Ward 1977) builds
on this idea to give a complete translation of the
instanton equations over certain Riemannian
4-manifolds into holomorphic geometry. In the
simplest case, let A be an instanton on a bundle
over $\mathbb{R}^4$. Then, for any choice of a linear complex
structure on $\mathbb{R}^4$, compatible with the metric, A
defines a holomorphic structure. The choices of
such a complex structure are parametrized by a
2-sphere; in fact, the unit sphere in $\Lambda^+(\mathbb{R}^4)$. So, for
any $\lambda \in S^2$ we have a complex surface $S_\lambda$ and a
holomorphic bundle over $S_\lambda$. These data can be
viewed in the following way. We consider the
projection $\pi : \mathbb{R}^4 \times S^2 \to \mathbb{R}^4$ and the pull-back
$\pi^*(E)$ to $\mathbb{R}^4 \times S^2$. This pull-back bundle has a
connection which defines a holomorphic structure
along each fiber $S_\lambda \subset \mathbb{R}^4 \times S^2$ of the other
projection. The product $\mathbb{R}^4 \times S^2$ is the twistor space of $\mathbb{R}^4$
and it is in a natural way a three-dimensional
complex manifold. It can be identified with the
complement of a line $L_\infty$ in $\mathbb{CP}^3$ where the projection
$\mathbb{R}^4 \times S^2 \to S^2$ becomes the fibration of $\mathbb{CP}^3 \setminus L_\infty$
by the complex planes through $L_\infty$. One can see
then that $\pi^*(E)$ is naturally a holomorphic bundle
over $\mathbb{CP}^3 \setminus L_\infty$. The construction extends to the
conformal compactification $S^4$ of $\mathbb{R}^4$. If $S^4$ is viewed
as the quaternionic projective line $\mathbb{HP}^1$ and we
identify $\mathbb{H}^2$ with $\mathbb{C}^4$ in the standard way, we get a
natural map $\pi : \mathbb{CP}^3 \to \mathbb{HP}^1$. Then $\mathbb{CP}^3$ is the
twistor space of $S^4$ and an anti-self-dual instanton
on a bundle E over $S^4$ induces a holomorphic
structure on the bundle $\pi^*(E)$ over $\mathbb{CP}^3$.

In general, the twistor space Z of an oriented 
Riemannian 4-manifold M is defined to be the unit 
sphere bundle in $\Lambda^+_M$. This has a natural almost-
complex structure which is integrable if and only if 
the self-dual part of the Weyl curvature of M 
vanishes (Atiyah et al. 1978). The antipodal map 
on the 2-sphere induces an antiholomorphic involu- 
tion of Z. In such a case, an anti-self-dual instanton 
over M lifts to a holomorphic bundle over Z. 
Conversely, a holomorphic bundle over Z which is 
holomorphically trivial over the fibers of the
fibration $Z \to M$ (projective lines in Z), and which
satisfies a certain reality condition with respect to 
the antipodal map, arises from a unitary instanton 
over M. This is the Ward correspondence, part of 
Penrose’s twistor theory. 


The ADHM Construction 


The problem of describing all solutions to the 
Yang-Mills instanton equation over S* is thus 
reduced to a problem in algebraic geometry, of 
classifying certain holomorphic vector bundles. This 
was solved by Atiyah et al. (1978). The resulting 
ADHM construction reduces the problem to certain 
matrix equations. The equations can be reduced to 
the following form. For a bundle with Chern class k and
rank r, we require a pair of $k \times k$ matrices $\alpha_1, \alpha_2$, a
$k \times r$ matrix a, and an $r \times k$ matrix b. Then the
equations are


$$[\alpha_1, \alpha_2] = ab$$
$$[\alpha_1^*, \alpha_1] + [\alpha_2^*, \alpha_2] = aa^* - b^*b \qquad [1]$$


We also require certain open, nondegeneracy
conditions. Given such matrix data, a holomorphic
bundle over $\mathbb{CP}^3$ is constructed via a “monad”: a
pair of bundle maps over $\mathbb{CP}^3$


$$\mathbb{C}^k \otimes \mathcal{O}(-1) \xrightarrow{D_1} \mathbb{C}^k \oplus \mathbb{C}^k \oplus \mathbb{C}^r \xrightarrow{D_2} \mathbb{C}^k \otimes \mathcal{O}(1)$$


with $D_2 D_1 = 0$. That is, the rank-r holomorphic
bundle we construct is $\mathrm{Ker}\, D_2 / \mathrm{Im}\, D_1$. The bundle
maps $D_1$, $D_2$ are obtained from the matrix data in a
straightforward way, in suitable coordinates. It is 
this matrix description which was used by Boyer 
et al. to prove the Atiyah-Jones conjecture on the
topology of the moduli spaces of instantons. The 
only other case when the twistor space of a compact 
4-manifold is an algebraic variety is the complex 
projective plane, with the nonstandard orientation. 
An analog of the ADHM description in this case was 
given by Buchdahl (1986). 
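
The matrix equations [1] can be checked numerically for small data. The following is a minimal sketch, not the construction itself: it assumes the sign conventions $[\alpha_1, \alpha_2] = ab$ and $[\alpha_1^*, \alpha_1] + [\alpha_2^*, \alpha_2] = aa^* - b^*b$, uses illustrative sample data for $k = 1$, $r = 2$ (the numbers are hypothetical), and adds a naive parameter count which assumes that the equations are transverse and that a group U(k) of dimension $k^2$ acts freely on the solutions. The helper names are illustrative.

```python
# Sketch: residuals of the ADHM matrix equations for small matrices.
# Matrices are lists of rows of Python complex numbers (no external libraries).

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def dagger(x):
    # Conjugate transpose.
    return [[x[i][j].conjugate() for i in range(len(x))]
            for j in range(len(x[0]))]

def add(x, y):
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

def sub(x, y):
    return [[p - q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

def comm(x, y):
    return sub(matmul(x, y), matmul(y, x))

def adhm_residuals(alpha1, alpha2, a, b):
    """Return (complex equation residual, real equation residual)."""
    complex_eq = sub(comm(alpha1, alpha2), matmul(a, b))
    lhs = add(comm(dagger(alpha1), alpha1), comm(dagger(alpha2), alpha2))
    rhs = sub(matmul(a, dagger(a)), matmul(dagger(b), b))
    return complex_eq, sub(lhs, rhs)

def max_abs(x):
    return max(abs(e) for row in x for e in row)

def framed_dimension(k, r):
    # Naive real count (assumption: transversality and a free U(k) action).
    unknowns = 2 * (2 * k * k) + 2 * (2 * k * r)  # alpha1, alpha2, a, b
    equations = 2 * k * k + k * k                 # complex + Hermitian equation
    symmetry = k * k                              # dim U(k)
    return unknowns - equations - symmetry

# Hypothetical sample data for k = 1, r = 2.
s = 0.5
alpha1 = [[1.0 + 2.0j]]
alpha2 = [[-0.3 + 0.0j]]
a = [[s + 0j, 0j]]        # 1 x 2
b = [[0j], [s + 0j]]      # 2 x 1
complex_eq, real_eq = adhm_residuals(alpha1, alpha2, a, b)
print(max_abs(complex_eq), max_abs(real_eq), framed_dimension(1, 2))
```

Under these assumptions the count gives 4kr, which for r = 2 is 8k and so agrees with the dimension of the framed moduli space quoted earlier.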


Integrable Systems 


The Ward correspondence can be viewed in the general 
framework of integrable systems. Working with the 
standard complex structure on $\mathbb{R}^4$, the integrability
condition for the $\bar\partial$-operator takes the shape


$$[\nabla_1 + i\nabla_2, \nabla_3 + i\nabla_4] = 0$$


where $\nabla_i$ are the components of the covariant
derivative in the coordinate directions. So, the
instanton equation can be viewed as a family of
such commutator equations parametrized by $\lambda \in S^2$.
One obtains many reductions of the instanton 
equation by imposing suitable symmetries. Solutions 
invariant under translation in one variable corre- 
spond to the Bogomolny “monopole equation” 
(Jaffe and Taubes 1980). Solutions invariant under 
three translations correspond to solutions of Nahm’s 
equations, 


$$\frac{dT_i}{dt} = \epsilon_{ijk} [T_j, T_k]$$

for matrix-valued functions $T_1, T_2, T_3$ of one
variable t. Nahm (1982) and Hitchin (1983)
developed an analog of the ADHM construction
relating these two equations. This is now seen as a
part of a general “Fourier-Mukai-Nahm transform”
(Donaldson and Kronheimer 1990). The
instanton equations for connections invariant 
under two translations, Hitchin’s equations 
(Hitchin 1983), are locally equivalent to the 
harmonic map equation for a surface into the 
symmetric space dual to the structure group. 
Changing the signature of the metric on $\mathbb{R}^4$ to (2, 2),
one gets the harmonic mapping equations into Lie 
groups (Hitchin 1990). More complicated reduc- 
tions yield almost all the known examples of 
integrable PDEs as special forms of the instanton
equations (Mason and Woodhouse 1996). 
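
Nahm's equations can be tested on a simple explicit solution. The sketch below is illustrative only: it assumes that repeated indices in $dT_i/dt = \epsilon_{ijk}[T_j, T_k]$ are summed, takes the basis $e_i = -(i/2)\sigma_i$ of su(2) (so that $[e_1, e_2] = e_3$ cyclically), and checks the ansatz $T_i(t) = -e_i/(2t)$ by a central finite difference. The ansatz, the step size and the function names are choices made here, not taken from the text.

```python
# Sketch: numerical check of Nahm's equations dT_i/dt = eps_ijk [T_j, T_k]
# (summation over j, k assumed) for the ansatz T_i(t) = -e_i / (2 t).

# e_i = -(i/2) * sigma_i for the Pauli matrices sigma_i.
E = [
    [[0j, -0.5j], [-0.5j, 0j]],
    [[0j, -0.5 + 0j], [0.5 + 0j, 0j]],
    [[-0.5j, 0j], [0j, 0.5j]],
]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(x, y):
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

def sub(x, y):
    return [[p - q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

def scale(c, x):
    return [[c * e for e in row] for row in x]

def comm(x, y):
    return sub(matmul(x, y), matmul(y, x))

def levi_civita(i, j, k):
    # +1, -1 or 0 for indices in {0, 1, 2}.
    return (i - j) * (j - k) * (k - i) // 2

def T(i, t):
    return scale(-1.0 / (2.0 * t), E[i])

def nahm_residual(t, h=1e-6):
    """Largest matrix entry of dT_i/dt - eps_ijk [T_j, T_k] over i."""
    worst = 0.0
    for i in range(3):
        lhs = scale(1.0 / (2.0 * h), sub(T(i, t + h), T(i, t - h)))
        rhs = [[0j, 0j], [0j, 0j]]
        for j in range(3):
            for k in range(3):
                eps = levi_civita(i, j, k)
                if eps:
                    rhs = add(rhs, scale(eps, comm(T(j, t), T(k, t))))
        diff = sub(lhs, rhs)
        worst = max(worst, max(abs(e) for row in diff for e in row))
    return worst

print(nahm_residual(1.0))
```

The pole of this solution at t = 0 is typical of the boundary behavior imposed in the Nahm construction; with a different normalization of the equations the coefficient 1/2 in the ansatz would change.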


Moment Maps: the Kobayashi-Hitchin Conjecture 


Let $\Sigma$ be a compact Riemann surface. The Jacobian
of $\Sigma$ is the complex torus $H^1(\Sigma, \mathcal{O})/H^1(\Sigma, \mathbb{Z})$: it
parametrizes holomorphic line bundles of degree 0
over $\Sigma$. The Hodge theory (which was, of course,
developed long before Hodge in this case) shows
that the Jacobian can also be identified with the
torus $H^1(\Sigma, \mathbb{R})/H^1(\Sigma, \mathbb{Z})$ which parametrizes flat
U(1)-connections. That is, any holomorphic line 
bundle of degree 0 admits a unique compatible flat 
unitary connection. 

The generalization of these ideas to bundles of 
higher rank began with Weil. He observed that any 
holomorphic vector bundle of degree 0 admits a flat 
connection, not necessarily unitary. Narasimhan and 
Seshadri (1965) showed that (in the case of degree 0) 
the existence of a flat, irreducible, unitary connec- 
tion was equivalent to an algebro-geometric condi- 
tion of stability which had been introduced shortly 
before by Mumford, for quite different purposes. 
Mumford introduced the stability condition in order 
to construct separated moduli spaces of
holomorphic bundles, generalizing the Jacobian, as
part of his general geometric invariant theory. For 
bundles of nonzero degree, the discussion is slightly 
modified by the use of projectively flat unitary 
connections. The result of Narasimhan and Seshadri 
asserts that there are two different descriptions of 
the same moduli space $\mathcal{M}_{r,d}(\Sigma)$: either as
parametrizing certain irreducible projectively flat
unitary connections (representations of $\pi_1(\Sigma)$), or
parametrizing stable holomorphic bundles of degree 
d and rank r. While Narasimhan and Seshadri 
probably did not view the ideas in these terms, 
another formulation of their result is that a certain 
nonlinear PDE for a Hermitian metric on a 
holomorphic bundle — analogous to the Laplace 
equation in the abelian case — has a solution when 
the bundle is stable. 

Atiyah and Bott (1982) cast these results in the 
framework of gauge theory. (The Yang-Mills 
equations in two dimensions essentially reduce to 
the condition that the connection be flat, so they are 
rather trivial locally but have interesting global 
structure.) They made the important observation 
that the curvature of a connection furnishes a map 


$$F : \mathcal{A} \to \mathrm{Lie}(\mathcal{G})^*$$


which is an equivariant moment map for the action
of the gauge group $\mathcal{G}$ on $\mathcal{A}$. Here the symplectic form
on the affine space $\mathcal{A}$ and the map from the adjoint
bundle-valued 2-forms to the dual of the Lie algebra
of $\mathcal{G}$ are both given by integration of products of
forms. From this point of view, the Narasimhan- 
Seshadri result is an infinite-dimensional example of 
a general principle relating symplectic and complex 
quotients. At about the same time, Hitchin and 
Kobayashi independently proposed an extension of 
these ideas to higher dimensions. Let E be a 
holomorphic bundle over a complex manifold V. 
Any compatible unitary connection on E has 
curvature F of type (1,1). Let $\omega$ be the (1,1)-form
corresponding to a fixed Hermitian metric on V.
The Hermitian Yang-Mills equation is the equation


$$F \cdot \omega = \mu 1_E$$


where $\mu$ is a constant (determined by the topological
invariant $c_1(E)$). The Kobayashi-Hitchin conjecture
is that, when $\omega$ is Kähler, this equation has an
irreducible solution if and only if E is a stable 
bundle in the sense of Mumford. Just as in the 
Riemann surface case, this equation can be viewed 
as a nonlinear second-order PDE of Laplace type for 
a metric on E. The moment map picture of Atiyah 
and Bott also extends to this higher-dimensional 
version. In the case when V has complex dimension 
2 (and $\mu$ is zero), the Hermitian Yang-Mills
connections are exactly the anti-self-dual instantons, 
so the conjecture asserts that the moduli spaces of 
instantons can be identified with certain moduli 
spaces of stable holomorphic bundles. 

The Kobayashi-Hitchin conjecture was proved in
the most general form by Uhlenbeck and Yau 
(1986), and in the case of algebraic manifolds in 
Donaldson (1987). The proofs in Donaldson (1985, 
1987) developed some extra structure surrounding 
these equations, connected with the moment map 
point of view. The equations can be obtained as the 
Euler-Lagrange equations for a nonlocal functional, 
related to the renormalized determinants of Quillen 
and Bismut. The results have been extended to
non-Kähler manifolds and certain noncompact
manifolds. There are also many extensions to equations
for systems of data comprising a bundle with 
additional structure such as a holomorphic section 
or Higgs field (Bradlow et al. 1995), or a parabolic
structure along a divisor. Hitchin’s equations 
(Hitchin 1987) are a particularly rich example. 


Topology of Moduli Spaces 


The moduli spaces $\mathcal{M}_{r,d}(\Sigma)$ of stable holomorphic
bundles/projectively flat unitary connections over
Riemann surfaces $\Sigma$ have been studied intensively
from many points of view. They have natural Kähler
structures: the complex structure being visible in the
holomorphic bundles guise and the symplectic form
as the “Marsden-Weinstein quotient” in the unitary
connections guise. In the case when r and d are
coprime, they are compact manifolds with
complicated topologies. There is an important basic
construction for producing cohomology classes
over these (and other) moduli spaces. One takes a
universal bundle U over the product $\mathcal{M} \times \Sigma$ with
Chern classes


$$c_i(U) \in H^{2i}(\mathcal{M} \times \Sigma)$$


Then, for any class $\alpha \in H_p(\Sigma)$, we get a cohomology
class $c_i(U)/\alpha \in H^{2i-p}(\mathcal{M})$. Thus, if $R_\Sigma$ is the graded
ring freely generated by such classes, we have a
homomorphism $\nu : R_\Sigma \to H^*(\mathcal{M})$. The questions
about the topology of the moduli spaces which
have been studied include:


1. finding the Betti numbers of the moduli space $\mathcal{M}$;

2. identifying the kernel of $\nu$;

3. giving an explicit system of generators and
relations for the ring $H^*(\mathcal{M})$;

4. identifying the Pontryagin and Chern classes of
$\mathcal{M}$ within $H^*(\mathcal{M})$; and

5. evaluating the pairings


$$\int_{\mathcal{M}} \nu(W)$$


for elements W of the appropriate degree in $R_\Sigma$.


All of these questions have now been solved quite 
satisfactorily. In early work, Newstead (1967) found 
the Betti numbers in the rank-2 case. The main aim 
of Atiyah and Bott was to apply the ideas of Morse 
theory to the Yang-Mills functional over a Riemann 
surface and they were able to reproduce Newstead’s 
results in this way and extend them to higher rank. 
They also showed that the map $\nu$ is a surjection, so
the universal bundle construction gives a system of
generators for the cohomology. Newstead made
conjectures on the vanishing of the Pontryagin
and Chern classes above a certain range which were
established by Kirwan and extended to higher rank
by Earl and Kirwan (1999). Knowing that $R_\Sigma$ maps
onto $H^*(\mathcal{M})$, a full set of relations can (by Poincaré
duality) be deduced in principle from a knowledge
of the integral pairings in (5) above, but this is not
very explicit. A solution to (5) in the case of rank 2
was found by Thaddeus (1992). He used results
from the Verlinde theory (see section on 3-manifolds
below) and the Riemann-Roch formula. Another
point of view was developed by Witten (1991), who
showed that the volume of the moduli space was
related to the theory of torsion in algebraic topology
and satisfied simple gluing axioms. These different
points of view are compared in Donaldson (1993).
Using a nonrigorous localization principle in infinite 
dimensions, Witten (1992) wrote down a general 
formula for the pairings (5) in any rank, and this 
was established rigorously by Jeffrey and Kirwan, 
using a finite-dimensional version of the same 
localization method. A very simple and explicit set 
of generators and relations for the cohomology (in 
the rank-2 case) was given by King and Newstead 
(1998). Finally, the quantum cohomology of the 
moduli space, in the rank-2 case, was identified 
explicitly by Muñoz (1999).


Hyper-Kähler Quotients


Much of this story about the structure of moduli 
spaces extends to higher dimensions and to the 
moduli spaces of connections and Higgs fields. 
A particularly notable extension of the ideas
involves hyper-Kähler structures. Let M be a
hyper-Kähler 4-manifold, so there are three
covariant-constant self-dual forms $\omega_1, \omega_2, \omega_3$ on M. These
correspond to three complex structures $I_1, I_2, I_3$
obeying the algebra of the quaternions. If we single
out one structure, say $I_1$, the instantons on M can be
viewed as holomorphic bundles with respect to $I_1$
satisfying the moment map condition (Hermitian
Yang-Mills equation) defined by the form $\omega_1$.
Taking a different complex structure interchanges
the roles of the moment map and integrability
conditions. This can be put in a general framework
of hyper-Kähler quotients due to Hitchin et al.
(1987). Suppose initially that M is compact
(so either a K3 surface or a torus). Then the $\omega_i$
components of the curvature define three maps


$$F_i : \mathcal{A} \to \mathrm{Lie}(\mathcal{G})^*$$


The structures on M make $\mathcal{A}$ into a flat
hyper-Kähler manifold and the three maps $F_i$ are the
moment maps for the gauge group action with
respect to the three symplectic forms on $\mathcal{A}$. In this
situation, it is a general fact that the hyper-Kähler
quotient (the quotient by $\mathcal{G}$ of the common zero set
of the three moment maps) has a natural
hyper-Kähler structure. This hyper-Kähler quotient is just
the moduli space of instantons over M. In the case
when M is the noncompact manifold $\mathbb{R}^4$, the same
ideas apply except that one has to work with
the based gauge group $\mathcal{G}_0$. The conclusion is that
the framed moduli spaces $\tilde{\mathcal{M}}$ of instantons over $\mathbb{R}^4$
are naturally hyper-Kähler manifolds. One can also
see this hyper-Kähler structure through the ADHM
matrix description. A variant of these matrix
equations was used by Kronheimer to construct
“gravitational instantons.” The same ideas also
apply to the moduli spaces of monopoles, where
the hyper-Kähler metric, in the simplest case, was
studied by Atiyah and Hitchin (1989).


Low-Dimensional Topology


Instantons and 4-Manifolds


Gauge theory has had unexpected applications in 
low-dimensional topology, particularly the topology 
of smooth 4-manifolds. The first work in this 
direction, in the early 1980s, involved the Yang- 
Mills instantons. The main issue in 4-manifold 
theory at that time was the correspondence between 
the diffeomorphism classification of simply con- 
nected 4-manifolds and the classification up to 
homotopy. The latter is determined by the intersec- 
tion form, a unimodular quadratic form on the 
second integral homology group (i.e., a symmetric 
matrix with integral entries and determinant $\pm 1$,
determined up to integral change of basis). The only
known restriction was Rohlin’s theorem, which
asserts that if the form is even the signature must be 
divisible by 16. The achievement of the first phase of 
the theory was to show that 


1. There are unimodular forms which satisfy the 
hypotheses of Rohlin’s theorem but which do not 
appear as the intersection forms of smooth 
4-manifolds. In fact, no nonstandard definite 
form, such as a sum of copies of the $E_8$ matrix,
can arise in this way. 

2. There are simply connected smooth 4-manifolds 
which have isomorphic intersection forms, and 
hence are homotopy equivalent, but which are 
not diffeomorphic. 


These results stand in contrast to the homeomorph- 
ism classification which was obtained by Freedman 
shortly before and which is almost the same as the 
homotopy classification. 
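
The algebraic conditions on the intersection form are easy to test for a concrete matrix. The sketch below is only an illustration: it builds the $E_8$ matrix from its Dynkin diagram (the node ordering is a choice made here), takes the positive-definite sign convention (the opposite sign works equally well), and checks unimodularity, evenness and definiteness with exact rational arithmetic. The function names are illustrative.

```python
# Sketch: checking that E8 and E8 + E8 are even, unimodular and definite.
from fractions import Fraction

def e8_matrix():
    # Cartan matrix of E8: a chain 0-1-2-3-4-5-6 with node 7 attached to node 2.
    n = 8
    m = [[2 if i == j else 0 for j in range(n)] for i in range(n)]
    for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 7)]:
        m[i][j] = m[j][i] = -1
    return m

def determinant(m):
    # Exact Gaussian elimination over the rationals.
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return det

def leading_minors(m):
    return [determinant([row[:k] for row in m[:k]]) for k in range(1, len(m) + 1)]

def is_positive_definite(m):
    # Sylvester's criterion.
    return all(d > 0 for d in leading_minors(m))

def is_even(m):
    return all(m[i][i] % 2 == 0 for i in range(len(m)))

def direct_sum(x, y):
    p, q = len(x), len(y)
    return [row + [0] * q for row in x] + [[0] * p + row for row in y]

e8 = e8_matrix()
e8e8 = direct_sum(e8, e8)
for name, form in [("E8", e8), ("E8+E8", e8e8)]:
    signature = len(form) if is_positive_definite(form) else None
    print(name, determinant(form), is_even(form), signature)
```

The single $E_8$ form has signature 8, which is not divisible by 16, so it is already excluded by Rohlin's theorem. The sum of two copies is even and unimodular with signature 16, so it satisfies the hypotheses of Rohlin's theorem; it is definite and nonstandard, which is the kind of form ruled out in item (1).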

The original proof of item (1) above argued with
the moduli space $\mathcal{M}$ of anti-self-dual SU(2)
instantons on a bundle with $c_2 = 1$ over a simply
connected Riemannian 4-manifold M with a
negative-definite intersection form (Donaldson 1983). In the
model case when M is the 4-sphere the moduli space
$\mathcal{M}$ can be identified explicitly with the open 5-ball.
Thus the 4-sphere arises as the natural boundary of
the moduli space. A sequence of points in the moduli
space converging to a boundary point corresponds to
a sequence of connections with curvature densities
converging to a $\delta$-function, as described earlier. One
shows that in the general case (under our hypotheses
on the 4-manifold M) the moduli space $\mathcal{M}$ has a
similar behavior: it contains a collar $M \times (0, \delta)$
formed by instantons made using Taubes’ gluing
construction, described previously. The complement
of this collar is compact. In the interior of the moduli
space, there are a finite number of special points
corresponding to U(1)-reductions of the bundle P.
This is the way in which the moduli space “sees” the
integral structure of the intersection form since such
reductions correspond to integral homology classes
with self-intersection $-1$. Neighborhoods of these
special points are modeled on quotients $\mathbb{C}^3/U(1)$; that
is, cones on copies of $\mathbb{CP}^2$. The upshot is that (for
generic Riemannian metrics on M) the moduli space
gives a cobordism from the manifold M to a set of
copies of $\mathbb{CP}^2$ which can be counted in terms of the
intersection form, and the result follows easily from
standard topology. More sophisticated versions of the
argument extended the results to rule out some
indefinite intersection forms.

On the other hand, the original proofs of item (2) 
used “invariants” defined by instanton moduli spaces 
(Donaldson 1990). The general scheme exploits the 
same construction outlined in the previous section. 
We suppose that M is a simply connected 4-manifold 
with $b^+(M) = 1 + 2p$, where $p > 0$ is an integer.
(Here $b^+(M)$ is, as usual, the number of positive
eigenvalues of the intersection matrix.) Ignoring some
technical restrictions, there is a map


$$\nu : R_M \to H^*(\mathcal{M}_k)$$


where $R_M$ is a graded ring freely generated by
the homology (below the top dimension) of the
4-manifold M and $\mathcal{M}_k$ is the moduli space of
anti-self-dual SU(2)-instantons on a bundle with
$c_2 = k > 0$. For an element W in $R_M$ of the
appropriate degree, one obtains a number by
evaluating, or integrating, $\nu(W)$ on $\mathcal{M}_k$. The main
technical difficulty here is that the moduli space $\mathcal{M}_k$
is rarely compact, so one needs to make sense of this
“evaluation.” With all the appropriate technicalities 
in place, these invariants could be shown to 
distinguish various homotopy-equivalent, homeo- 
morphic 4-manifolds. All these early developments 
are described in detail in the book by Donaldson 
and Kronheimer (1990). 


Basic Classes 


Until the early 1990s, these instanton invariants 
could only be calculated in isolated favorable 
cases (although the calculations which were 
made, through the work of many mathematicians, 
led to a large number of further results about 
4-manifold topology). Deeper understanding of 
their structure came with the work of Kronheimer 
and Mrowka. This work was, in large part,
motivated by a natural question in geometric
topology. Any homology class $\alpha \in H_2(M; \mathbb{Z})$ can
be represented by an embedded, connected,
smooth surface. One can define an integer $g(\alpha)$
to be the minimal genus of such a representative.
The problem is to find $g(\alpha)$, or at least bounds on
it. A well-known conjecture, ascribed to Thom,
was that when M is the complex projective plane
the minimal genus is realized by a complex curve;
that is,


$$g(\pm dH) = \tfrac{1}{2}(d - 1)(d - 2)$$


where H is the standard generator of $H_2(\mathbb{CP}^2)$ and
$d \ge 1$.

The new geometrical idea introduced by 
Kronheimer and Mrowka was to study instantons 
over a 4-manifold M with singularities along a 
surface $\Sigma \subset M$. For such connections, there is a real
parameter: the limit of the trace of the holonomy 
around small circles linking the surface. By 
varying this parameter, they were able to inter- 
polate between moduli spaces of nonsingular 
instantons on different bundles over M and obtain 
relations between the different invariants. They 
also found that if the genus of $\Sigma$ is suitably small
then some of the invariants are forced to vanish, 
thus, conversely, getting information about g for 
4-manifolds with nontrivial invariants. For example,
they showed that if M is a K3 surface then
$g(\alpha) = \tfrac{1}{2}(\alpha \cdot \alpha + 2)$.
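
The two genus formulas quoted above are simple enough to tabulate. The snippet below is a small illustrative sketch: the function names are hypothetical, and the K3 formula is applied only to even, non-negative self-intersection numbers, which is an assumption made here (the intersection form of a K3 surface is even).

```python
# Sketch: evaluating the genus formulas quoted in the text.

def thom_genus(d):
    """Genus (d - 1)(d - 2) / 2 for the class +/- d H in the projective plane."""
    d = abs(d)
    if d < 1:
        raise ValueError("the formula is stated for |d| >= 1")
    return (d - 1) * (d - 2) // 2

def k3_genus(self_intersection):
    """Genus (a.a + 2) / 2 for a class a in a K3 surface.

    Assumes a.a is even and non-negative (an assumption of this sketch).
    """
    if self_intersection < 0 or self_intersection % 2:
        raise ValueError("expected an even, non-negative self-intersection")
    return (self_intersection + 2) // 2

print([thom_genus(d) for d in range(1, 6)])
print([k3_genus(n) for n in (0, 2, 4)])
```

For d = 3 the first formula gives genus 1, the familiar statement that a smooth plane cubic is a torus.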
The structural results of Kronheimer and Mrowka
(1995) introduced the notion of a 4-manifold of
“simple type.” Write the invariant defined above by
the moduli space $\mathcal{M}_k$ as $I_k : R_M \to \mathbb{Q}$. Then $I_k$
vanishes except on terms of degree $2d(k)$, where
$d(k) = 4k - 3(1 + p)$. We can put all these together
to define $I = \sum_k I_k : R_M \to \mathbb{Q}$. The ring $R_M$ is a
polynomial ring generated by classes $\alpha \in H_2(M)$,
which have degree 2 in $R_M$, and a class X of degree
4 in $R_M$, corresponding to the generator of $H_0(M)$.
The 4-manifold is of simple type if


$$I(X^2 W) = 4 I(W)$$


for all $W \in R_M$. Under this condition, Kronheimer
and Mrowka showed that all the invariants are
determined by a finite set of “basic” classes
$K_1, \ldots, K_s \in H_2(M)$ and rational numbers $\beta_1, \ldots, \beta_s$.
To express the relation, they form a generating
function


$$D_M(\alpha) = I(e^{\alpha}) + I\left(\tfrac{X}{2} e^{\alpha}\right)$$


This is a priori a formal power series on $H_2(M)$ but a
posteriori the series converges and can be regarded
as a function on $H_2(M)$. Kronheimer and Mrowka’s
result is that


$$D_M(\alpha) = \exp\left(\frac{\alpha \cdot \alpha}{2}\right) \sum_{r=1}^{s} \beta_r e^{K_r \cdot \alpha}$$


It is not known whether all simply connected 
4-manifolds are of simple type, but Kronheimer and 
Mrowka were able to show that this is the case for a 
multitude of examples. They also introduced a 
weaker notion of “finite type,” and this condition 
was shown to hold in general by Muñoz and
Frøyshov. The overall result of this work of
Kronheimer and Mrowka was to make the calcula- 
tion of the instanton invariants for many familiar 
4-manifolds a comparatively straightforward matter. 
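
As a consistency check on the degree $2d(k)$ appearing above, one can compare it with the dimension of the moduli space. The sketch below assumes the standard dimension formula for SU(2) instantons over a simply connected 4-manifold, which is not stated in this passage.

```latex
% Assumption: for simply connected M, dim M_k = 8k - 3(1 + b^+(M)),
% combined with b^+(M) = 1 + 2p and d(k) = 4k - 3(1 + p) from the text.
\begin{align*}
\dim \mathcal{M}_k &= 8k - 3\bigl(1 + b^{+}(M)\bigr)
  = 8k - 3(2 + 2p) \\
  &= 2\bigl(4k - 3(1 + p)\bigr) = 2\,d(k).
\end{align*}
```

For instance, with p = 1 and k = 4 this gives d(k) = 10 and a moduli space of dimension 20, so a monomial in classes of degree 2 and in X of degree 4 is paired with it when its total degree is 20.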


3-Manifolds: Casson’s Invariant 


Gauge theory has also entered into 3-manifold 
topology. In 1985, Casson introduced a new 
integer-valued invariant of oriented homology 
3-spheres which “counts” the set Z of equivalence 
classes of irreducible flat SU(2)-connections, or 
equivalently irreducible representations $\pi_1(Y) \to SU(2)$.
Casson’s approach (Akbulut and McCarthy
1990) was to use a Heegaard splitting of a
3-manifold Y into two handle bodies $Y^+, Y^-$ with
a surface $\Sigma$ as common boundary. Then $\pi_1(\Sigma)$ maps
onto $\pi_1(Y)$ and a flat SU(2) connection on Y is
determined by its restriction to $\Sigma$. Let $\mathcal{M}_\Sigma$ be the
moduli space of irreducible flat connections over $\Sigma$
(as discussed in the last section) and let $L^\pm \subset \mathcal{M}_\Sigma$ be
the subsets which extend over $Y^\pm$. Then $L^\pm$ are
submanifolds of half the dimension of $\mathcal{M}_\Sigma$ and the
set Z can be identified with the intersection $L^+ \cap L^-$.
The Casson invariant is one-half the algebraic
intersection number of $L^+$ and $L^-$. Casson showed
that this is independent of the Heegaard splitting
(and is also, in fact, an integer, although this is not
obvious). He showed that when Y is changed by 
Dehn surgery along a knot, the invariant changes 
by a term computed from the Alexander polynomial 
of the knot. This makes the Casson invariant 
computable in examples. (For a discussion of 
Casson’s formula see Donaldson (1999).) Taubes 
showed that the Casson invariant could also be 
obtained in a more differential-geometric fashion, 
analogous to the instanton invariants of 4-manifolds 
(Taubes 1990). 


3-Manifolds: Floer Theory 


Independently, at about the same time, Floer
(1989) introduced more sophisticated invariants,
the Floer homology groups, of homology
3-spheres, using gauge theory. This development
ran parallel to his introduction of similar ideas in
symplectic geometry. Suppose, for simplicity, that
the set $Z$ of equivalence classes of irreducible flat
connections is finite. For pairs $\rho_-, \rho_+$ in $Z$, Floer
considered the instantons on the tube $Y \times \mathbb{R}$
asymptotic to $\rho_\pm$ at $\pm\infty$. There is an infinite set
of moduli spaces of such instantons, labeled by a
relative Chern class, but the dimensions of these
moduli spaces agree modulo 8. This gives a relative
index $\delta(\rho_-, \rho_+) \in \mathbb{Z}/8$. If $\delta(\rho_-, \rho_+) = 1$ there is a
moduli space of dimension 1 (possibly empty), but
the translations of the tube act on this moduli space
and, dividing by translations, we get a finite set.
The number of points in this set, counted with
suitable signs, gives an integer $n(\rho_-, \rho_+)$. Then,
Floer considers the free abelian groups


\[ C_* = \bigoplus_{\rho \in Z} \mathbb{Z}\langle\rho\rangle \]


generated by the set $Z$ and a map $\partial : C_* \to C_*$
defined by


\[ \partial(\langle\rho_-\rangle) = \sum n(\rho_-, \rho_+)\,\langle\rho_+\rangle \]


Here the sum runs over the $\rho_+$ with $\delta(\rho_-, \rho_+) = 1$.
Floer showed that $\partial^2 = 0$ and the homology
$HF_*(Y) = \ker\partial / \operatorname{Im}\partial$ is independent of the metric
on $Y$ (and various other choices made in implementing
the construction in detail). The chain complex
$C_*$ and hence the Floer homology can be graded by
$\mathbb{Z}/8$, using the relative index, so the upshot is to
define 8 abelian groups $HF_i(Y)$: invariants of the
3-manifold $Y$. The Casson invariant appears now as
the Euler characteristic of the Floer homology.
There has been extensive work on extending these 
ideas to other 3-manifolds (not homology spheres) 
and gauge groups, but this line of research does not 
yet seem to have reached a clear-cut conclusion. 
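
The purely algebraic step in this construction (pass from a graded chain complex to its homology and Euler characteristic) can be illustrated with a toy example. The sketch below is not a computation of any actual Floer homology: the generators, their gradings in Z/8, and the signed counts `n` are invented data, and ranks are taken over the rationals, so torsion is ignored.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a rational matrix by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# Hypothetical toy data: four generators with gradings in Z/8, and signed
# counts n(rho_minus, rho_plus) for pairs of relative index 1.
gens = ["a", "b", "c", "d"]
grade = {"a": 1, "b": 0, "c": 0, "d": 5}
n = {("a", "b"): 1, ("a", "c"): -1}

def d_squared_is_zero():
    """Check that applying the boundary map twice gives zero."""
    return all(
        sum(n.get((s, t), 0) * n.get((t, u), 0) for t in gens) == 0
        for s in gens for u in gens
    )

def boundary_block(g):
    """Matrix of the boundary map from grade g to grade g-1 (mod 8)."""
    src = [x for x in gens if grade[x] == g % 8]
    dst = [x for x in gens if grade[x] == (g - 1) % 8]
    return [[n.get((s, t), 0) for s in src] for t in dst], len(src)

def homology_ranks():
    """Rank of ker/im in each of the eight gradings."""
    ranks = {}
    for g in range(8):
        d_out, dim = boundary_block(g)
        d_in, _ = boundary_block(g + 1)
        ranks[g] = dim - rank(d_out) - rank(d_in)
    return ranks

ranks = homology_ranks()
euler = sum((-1) ** g * r for g, r in ranks.items())
```

In this toy complex the homology has rank 1 in gradings 0 and 5, so the alternating sum of ranks vanishes, matching the signed count of generators.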

Part of the motivation for Floer’s work came from 
Morse theory, and particularly the approach to 
this theory expounded by Witten (1982). The 
Chern-Simons functional is a map 


\[ CS : \mathcal{A}/\mathcal{G} \to \mathbb{R}/\mathbb{Z} \]


from the space of SU(2)-connections over $Y$.
Explicitly, in a trivialization of the bundle


\[ CS(A) = \int_Y A \wedge dA + \tfrac{2}{3}\, A \wedge A \wedge A \]


It appears as a boundary term in the Chern-Weil 
theory for the second Chern class, in a similar way 
as holonomy appears as a boundary term in the 
Gauss—Bonnet theorem. The set Z can be identified 
with the critical points of CS and the instantons on 
the tube as integral curves of the gradient vector
field of CS. Floer’s definition mimics the definition
of homology in ordinary Morse theory, taking 
Witten’s point of view. It can be regarded formally 
as the “middle-dimensional” homology of the 
infinite-dimensional space A/G. See Atiyah (1988) 
and Cohen et al. (1995) for discussions of these 
ideas. 

The Floer theory interacts with 4-manifold invariants,
making up a structure approximating to
a (3+1)-dimensional topological field theory 
(Atiyah 1988). Roughly, the numerical invariants 
of closed 4-manifolds generalize to invariants for a 
4-manifold M with boundary Y taking values in the 
Floer homology of Y. If two such manifolds are 
glued along a common boundary, the invariants of 
the result are obtained by a pairing in the Floer 
groups. There are, however, at the moment, some 
substantial technical restrictions on this picture. This 
theory, as well as Floer’s original construction, is 
developed in detail by Donaldson (2002). At the time 
of writing, the Floer homology groups are still difficult 
to compute in examples. One important tool is a 
surgery-exact sequence found by Floer (Braam and 
Donaldson 1995), related to Casson’s surgery 
formula. 


3-Manifolds: Jones-Witten Theory


There is another, quite different, way in which ideas
from gauge theory have entered 3-manifold topology.
This is the Jones-Witten theory of knot and
3-manifold invariants. This theory falls outside the
main line of this article, but we will say a little about
it since it draws on many of the ideas we have
discussed. The goal of the theory is to construct a
family of (2 + 1)-dimensional topological field theories
indexed by an integer $k$, assigning a complex
vector space $H_k(\Sigma)$ to a surface $\Sigma$ and an invariant
in $H_k(\partial Y)$ to a 3-manifold-with-boundary $Y$. If $\partial Y$ is
empty, the vector space $H_k(\partial Y)$ is taken to be $\mathbb{C}$, so
one seeks numerical invariants of closed 3-manifolds.
Witten’s (1989) idea is that these invariants of closed
3-manifolds are Feynman integrals


\[ \int_{\mathcal{A}/\mathcal{G}} e^{i 2\pi k\, CS(A)}\, \mathcal{D}A \]


This functional integral is probably a schematic 
rather than a rigorous notion. The data associated 
with surfaces can, however, be defined rigorously. If
we fix a complex structure $I$ on $\Sigma$, we can define a
vector space $H_k(\Sigma, I)$ to be


\[ H_k(\Sigma, I) = H^0(M(\Sigma); L^k) \]


where $M(\Sigma)$ is the moduli space of stable holomorphic
bundles/flat unitary connections over $\Sigma$
and $L$ is a certain holomorphic line bundle over
$M(\Sigma)$. These are the spaces of “conformal blocks”
whose dimension is given by the Verlinde formulas.
Recall that $M(\Sigma)$, as a symplectic manifold, is
canonically associated with the surface $\Sigma$, without
any choice of complex structure. The Hilbert
spaces $H_k(\Sigma, I)$ can be regarded as the quantization
of this symplectic manifold, in the general frame- 
work of geometric quantization: the inverse of k 
plays the role of Planck’s constant. What is not 
obvious is that this quantization is independent of 
the complex structure chosen on the Riemann 
surface: that is, that there is a natural identification 
of the vector spaces (or at least the associated 
projective spaces) formed by using different complex
structures. This was established rigorously by
Hitchin (1990) and Axelrod et al. (1991), who
constructed a projectively flat connection on the
bundle of spaces $H_k(\Sigma, I)$ over the space of complex
structures $I$ on $\Sigma$. At a formal level, these
constructions are derived from the construction of
the metaplectic representation of a linear symplectic
group, since the $M_\Sigma$ are symplectic quotients of
an affine symplectic space.

The Jones-Witten invariants have been rigorously
established by indirect means, but it seems that there
is still work to be done in developing Witten’s point
of view. If $Y^+$ is a 3-manifold with boundary, one
would like to have a geometric definition of a vector
in $H_k(\partial Y^+)$. This should be the quantized version of
the submanifold $L^+$ (which is Lagrangian in $M_\Sigma$)
entering into the Casson theory.


Seiberg-Witten Invariants 


The instanton invariants of a 4-manifold can be 
regarded as the integrals of certain natural differ- 
ential forms over the moduli spaces of instantons. 
Witten (1988) showed that these invariants could be 
obtained as functional integrals, involving a variant 
of the Feynman integral, over the space of connec- 
tions and certain auxiliary fields (insofar as this 
latter integral is defined at all). A geometric 
explanation of Witten’s construction was given by 
Atiyah and Jeffrey (1990). Developing this point of 
view, Witten made a series of predictions about the 
instanton invariants, many of which were subsequently
verified by other means. This line of work
culminated in 1994 when, applying developments
in supersymmetric Yang-Mills QFT, Seiberg and
Witten introduced a new system of invariants and a 
precise prediction as to how these should be related 
to the earlier ones. 

The Seiberg-Witten invariants (Witten 1994) are
associated with a Spin$^c$ structure on a 4-manifold $M$.
If $M$ is simply connected this is specified by a class
$K \in H^2(M; \mathbb{Z})$ lifting $w_2(M)$. One has spin bundles
$S^+, S^- \to M$ with $c_1(S^\pm) = K$. The Seiberg-Witten
equation is for a spinor field $\phi$, a section of $S^+$,
and a connection $A$ on the complex line bundle
$\Lambda^2 S^\pm$. This gives a connection on $S^\pm$ and hence a
Dirac operator


\[ D_A : \Gamma(S^+) \to \Gamma(S^-) \]


The Seiberg-Witten equations are


\[ D_A \phi = 0, \qquad F_A^+ = \sigma(\phi) \]


where $\sigma : S^+ \to \Lambda^+$ is a certain natural quadratic
map. The crucial differential-geometric feature of
these equations arises from the Weitzenböck
formula


\[ D_A^* D_A \phi = \nabla_A^* \nabla_A \phi + \frac{R}{4}\,\phi + \rho(F^+)\,\phi \]


where $R$ is the scalar curvature and $\rho$ is a natural
map from $\Lambda^+$ to the endomorphisms of $S^+$. Then $\rho$ is
adjoint to $\sigma$ and


\[ \langle \rho(\sigma(\phi))\phi, \phi \rangle = |\phi|^4 \]


It follows easily from this that the moduli space of 
solutions to the Seiberg—Witten equation is compact. 
The most important invariants arise when K is 
chosen so that 


\[ K \cdot K = 2\chi(M) + 3\,\mathrm{sign}(M) \]


where $\chi(M)$ is the Euler characteristic and $\mathrm{sign}(M)$ is
the signature. (This is just the condition for K to 
correspond to an almost-complex structure on M.) In 
this case, the moduli space of solutions is zero 
dimensional (after generic perturbation) and the 
Seiberg-Witten invariant SW(K) is the number of 
points in the moduli space, counted with suitable signs. 

Witten’s conjecture relating the invariants, in its
simplest form, is that when $M$ has simple type the
classes $K$ for which $SW(K)$ is nonzero are exactly the
basic classes $K_r$ of Kronheimer and Mrowka and that


\[ \beta_r = 2^{C(M)}\, SW(K_r) \]


where $C(M) = 2 + \tfrac{1}{4}\,(7\chi(M) + 11\,\mathrm{sign}(M))$. This
asserts that the two sets of invariants contain exactly 
the same information about the 4-manifold. 
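
As a small arithmetic check of the two formulas above, the following sketch evaluates the exponent $C(M)$ and the zero-dimensionality condition for the K3 surface. The values $\chi = 24$, $\mathrm{sign} = -16$ and the choice $K = 0$ are standard facts supplied here rather than taken from the text, and the formal-dimension formula $(K \cdot K - 2\chi - 3\,\mathrm{sign})/4$ is the usual one, assumed for illustration.

```python
from fractions import Fraction

def sw_formal_dimension(k_squared, euler, signature):
    """Formal dimension (K.K - 2*chi - 3*sign) / 4 of the Seiberg-Witten
    moduli space; it vanishes exactly when K.K = 2*chi + 3*sign."""
    return Fraction(k_squared - 2 * euler - 3 * signature, 4)

def witten_exponent(euler, signature):
    """Exponent C(M) = 2 + (7*chi + 11*sign) / 4 in Witten's conjecture."""
    return 2 + Fraction(7 * euler + 11 * signature, 4)

# K3 surface (assumed data): chi = 24, sign = -16, and K = 0 so K.K = 0.
k3_dimension = sw_formal_dimension(0, 24, -16)
k3_exponent = witten_exponent(24, -16)
```

For these inputs both quantities are zero, so the conjectured coefficient would reduce to the Seiberg-Witten invariant itself.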

The evidence for this conjecture, via calculations of 
examples, is very strong. A somewhat weaker 
statement has been proved rigorously by Feehan and 
Leness (2003). They use an approach suggested by
Pidstrigach and Tyurin, studying moduli spaces of
solutions to a nonabelian version of the Seiberg-Witten
equations. These contain both the instanton
and abelian Seiberg-Witten moduli spaces, and the
strategy is to relate the topology of these two sets by 
standard localization arguments. (This approach is 
related to ideas introduced by Thaddeus (1994) in the
case of bundles over Riemann surfaces.) The serious
technical difficulty in this approach stems from the
lack of compactness of the nonabelian moduli spaces.
The more general versions of Witten’s conjecture
(Moore and Witten 1997) (e.g., when $b^+(M) = 1$)
contain very complicated formulas, involving modular
forms, which presumably arise as contributions
from the compactification of the moduli spaces.


Applications 


Regardless of the connection with the instanton 
theory, one can go ahead directly to apply the 
Seiberg-Witten invariants to 4-manifold topology, 
and this has been the main direction of research 
since the 1990s. The features of the Seiberg—Witten 
theory which have led to the most prominent 
developments are the following. 


1. The reduction of the equations to two dimensions
is very easy to understand. This has led to proofs
of the Thom conjecture and wide-ranging generalizations
(Ozsváth and Szabó 2000).

2. The Weitzenböck formula implies that, if $M$ has
positive scalar curvature, then solutions to the
Seiberg-Witten equations must have $\phi = 0$. This
has led to important interactions with four-dimensional
Riemannian geometry (LeBrun 1996).

3. In the case when M is a symplectic manifold, 
there is a natural deformation of the Seiberg— 
Witten equations, discovered by Taubes (1996), 
who used it to show that the Seiberg-Witten 
invariants of M are nontrivial. More generally, 
Taubes showed that for large values of the 
deformation parameter the solutions of the 
deformed equation localize around surfaces in 
the 4-manifold and used this to relate the 
Seiberg-Witten invariants to the Gromov theory 
of pseudoholomorphic curves. These results of 
Taubes have completely transformed the subject 
of four-dimensional symplectic geometry. 


Bauer and Furuta (2004) have combined the 
Seiberg-Witten theory with more sophisticated 
algebraic topology to obtain further results about 
4-manifolds. They consider the map from the space of 
connections and spinor fields defined by the formulas 
on the left-hand side of the equations. The general 
idea is to obtain invariants from the homotopy class 
of this map, under a suitable notion of homotopy. 
A technical complication arises from the gauge group 
action, but this can be reduced to the action of a single 
U(1). Ignoring this issue, Bauer and Furuta have
obtained invariants in the stable homotopy groups
$\lim_{N \to \infty} \pi_{N+r}(S^N)$, which reduce to the ordinary
numerical invariants when $r = 1$. Using these invariants,
they obtain results about connected sums of
4-manifolds, for which the ordinary invariants are
trivial. Using ideas from refined cobordism invariants,
Furuta made great progress towards resolving the
question of which intersection forms arise from
smooth, simply connected 4-manifolds. A well-known
conjecture is that, if such a manifold is spin,
then the second Betti number satisfies


\[ b_2(M) \geq \tfrac{11}{8}\,|\mathrm{sign}(M)| \]

Furuta (2001) proved that $b_2(M) \geq \tfrac{10}{8}\,|\mathrm{sign}(M)| + 2$.
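
To see how the conjectured 11/8 bound and the 10/8 bound attributed to Furuta differ in practice, the sketch below tests both on forms of the shape $2k\,E_8 \oplus l\,H$, which have rank $16k + 2l$ and signature of absolute value $16k$. Restricting to this family, and the helper names, are assumptions of the illustration; integer arithmetic avoids rounding.

```python
def form_invariants(k, l):
    """(b_2, |sign|) for the form 2k*E8 + l*H: rank 16k + 2l, |sign| = 16k."""
    return 16 * k + 2 * l, 16 * k

def satisfies_eleven_eighths(b2, abs_sign):
    """Conjectured bound b_2 >= (11/8)|sign|, in integer arithmetic."""
    return 8 * b2 >= 11 * abs_sign

def satisfies_ten_eighths(b2, abs_sign):
    """Bound b_2 >= (10/8)|sign| + 2, in integer arithmetic."""
    return 8 * b2 >= 10 * abs_sign + 16

k3 = form_invariants(1, 3)          # the K3 form: b_2 = 22, |sign| = 16
gap_case = form_invariants(2, 5)    # b_2 = 42, |sign| = 32
```

The K3 form sits exactly on both bounds. The form with $k = 2$, $l = 5$ passes the 10/8 bound but fails the 11/8 bound, which is the kind of case the conjecture would still have to exclude.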


An important and very recent achievement, bringing
together many different lines of work, is the proof of
“Property P” in 3-manifold topology by Kronheimer
and Mrowka (2004). This asserts that one cannot
obtain a homotopy sphere (counter-example to the
Poincaré conjecture) by +1-surgery along a nontrivial
knot in $S^3$. The proof uses work of Gabai and
Eliashberg to show that the manifold obtained by
0-framed surgery is embedded in a symplectic
4-manifold; Taubes’ results to show that the
Seiberg-Witten invariants of this 4-manifold are nontrivial;
Feehan and Leness’ partial proof of Witten’s conjecture
to show that the same is true for the instanton
invariants; and the gluing rule and Floer’s exact
sequence to show that the Floer homology of the
+1-surgered manifold is nontrivial. It follows then
from the definition of Floer homology that the fundamental
group of this manifold is not trivial; in fact,
it must have an irreducible representation in SU(2).


See also: Cotangent Bundle Reduction; Floer Homology;
Gauge Theories from Strings; Gauge Theoretic Invariants
of 4-Manifolds; Instantons: Topological Aspects;
Knot Homologies; Moduli Spaces: An Introduction;
Nonperturbative and Topological Aspects of Gauge
Theory; Seiberg-Witten Theory; Topological Quantum
Field Theory: Overview; Variational Techniques for
Ginzburg-Landau Energies.


Introduction


Einstein’s general theory of relativity has become the
foundation for our understanding of the gravitational
interaction. Four decades of high-precision
experiments have verified the theory with
ever-increasing accuracy, with no confirmed evidence of
a deviation from its predictions. The theory is now
the standard framework for much of astronomy, 
with its searches for black holes, neutron stars, 
gravitational waves, and the origin and fate of the 
universe. 

Yet modern developments in particle theory 
suggest that it may not be the entire story, and that
modification of the basic theory may be required at
some level. String theory generally predicts a 
proliferation of gravity-like fields that could result 
in alterations of general relativity (GR) reminiscent 
of the Brans—Dicke theory of the 1960s. In the 
presence of extra dimensions, the gravity of the four- 
dimensional “brane” of a higher-dimensional world 
could be somewhat different from a pure four- 
dimensional GR. However, any theoretical specula- 
tion along these lines must still abide by the best 
current empirical bounds. This article will review 
experimental tests of GR and the theoretical 
implications of the results. 


The Einstein Equivalence Principle 


The Einstein equivalence principle is a modern 
generalization of Einstein’s 1907 idea of an equiva- 
lence between gravity and acceleration, or between 
free fall and an absence of gravity. It states that: 
(1) test bodies fall with the same acceleration 
independently of their internal structure or composi- 
tion (weak equivalence principle, or WEP); (2) the 
outcome of any local nongravitational experiment is 
independent of the velocity of the freely falling 
reference frame in which it is performed (local 
Lorentz invariance, or LLI); and (3) the outcome of 
any local nongravitational experiment is indepen- 
dent of where and when in the universe it is 
performed (local position invariance, or LPI). 

This principle is fundamental to gravitational 
theory, for it is possible to argue that, if EEP is 
valid, then gravitation and geometry are synon- 
ymous. In other words, gravity must be described by 
a “metric theory of gravity,” in which (1) spacetime 
is endowed with a symmetric metric, (2) the 
trajectories of freely falling bodies are geodesics of 
that metric, and (3) in local freely falling reference 
frames, the nongravitational laws of physics are 
those written in the language of special relativity 
(see Will (1993) for further details). 

GR is a metric theory of gravity, but so are many 
others, including the scalar-tensor theory of Brans 
and Dicke and many of its modern descendants,
some of which are inspired by string theory. 


Tests of the Weak Equivalence Principle 


To test the WEP, one compares the acceleration of 
two laboratory-sized bodies of different composition 
in an external gravitational field. Although legend 
suggests that Galileo may have demonstrated this 
principle to his students at the Leaning Tower of Pisa, 
and Newton tested it by means of pendulum 
experiments, the first true high-precision experiments 


were done at the end of the nineteenth century by the 
Hungarian physicist Baron Roland von Eötvös and 
colleagues. 

Eötvös employed a torsion balance, in which 
(schematically) two bodies of different composition 
are suspended at the ends of a rod that is supported 
horizontally by a fine wire or fiber. One then looks 
for a difference in the horizontal accelerations of the 
two bodies as revealed by a slight rotation of the 
rod. The source of the horizontal gravitational force 
could be the Sun, a large mass in or near the 
laboratory, or, as Eötvös recognized, the Earth itself. 
A measurement or limit on the fractional difference
in acceleration between two bodies yields a quantity
$\eta = 2|a_1 - a_2|/|a_1 + a_2|$, called the “Eötvös ratio.”
Eötvös’ experiments showed that $\eta$ was smaller than
a few parts in $10^9$, and later classic experiments in
the 1960s and 1970s by Dicke and Braginsky
improved the bounds by several orders of magnitude.
Additional experiments were carried out
during the 1980s as part of a search for a putative
“fifth force” that was motivated in part by a
re-analysis of Eötvös’ original data.
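
The Eötvös ratio is a simple normalized difference, which a short sketch makes concrete. The accelerations used below are invented illustrative numbers, not measured values.

```python
def eotvos_ratio(a1, a2):
    """Eotvos ratio: 2|a1 - a2| / |a1 + a2| for two measured accelerations."""
    return 2 * abs(a1 - a2) / abs(a1 + a2)

# Hypothetical example: two bodies whose accelerations differ by one part
# in a billion give a ratio of about 1e-9.
a_first = 9.8
a_second = 9.8 * (1 + 1e-9)
eta_example = eotvos_ratio(a_first, a_second)
```

If the weak equivalence principle holds exactly, the two accelerations coincide and the ratio is zero.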

The best limit on $\eta$ currently comes from experiments
carried out during the 1985-2000 period at
the University of Washington (called the “Eöt-Wash”
experiments), which used a sophisticated
torsion balance tray to compare the accelerations of
bodies of different composition toward the Earth,
the Sun, and the galaxy. Another strong bound
comes from ongoing laser ranging to reflectors
deposited on the Moon during the Apollo program
in the 1970s (lunar laser ranging, LLR), which
routinely determines the Earth-Moon distance to
millimeter accuracies. The data may be used to
check the equality of acceleration of the Earth and
Moon toward the Sun. The results from laboratory
and LLR experiments are (Will 2001):

\[ \eta_{\text{Eöt-Wash}} < 4 \times 10^{-13}, \qquad \eta_{\text{LLR}} < 5 \times 10^{-13} \qquad [1] \]

LLR also shows that gravitational binding energy
falls with the same acceleration as ordinary matter to
$1.3 \times 10^{-3}$ (test of the Nordtvedt effect; see the section
“Bounds on the PPN parameters” and Table 1).

Many of the high-precision, low-noise methods that 
were developed for tests of WEP have been adapted 
to laboratory tests of the inverse-square law of 
Newtonian gravitation at millimeter scales and 
below. The goal of these experiments is to search for 
additional gravitational interactions involving massive 
particles or for the presence of large extra dimensions. 
The challenge of these experiments is to distinguish 
gravitation-like interactions from electromagnetic and 
quantum-mechanical effects. No deviations from
Newton’s inverse-square law have been found to date
at distances between 10 μm and 10 mm.


Table 1 Current limits on the PPN parameters

Parameter | Effect | Limit | Remarks
$\gamma - 1$ | (i) Shapiro delay | $2.3 \times 10^{-5}$ | Cassini tracking
 | (ii) Light deflection | $4 \times 10^{-4}$ | VLBI
$\beta - 1$ | (i) Perihelion shift | $3 \times 10^{-3}$ | $J_2 = 10^{-7}$ from helioseismology
 | (ii) Nordtvedt effect | $2.3 \times 10^{-4}$ | LLR plus bounds on other parameters
$\xi$ | Anisotropy in Newton’s G | $10^{-3}$ | Gravimeter bounds on anomalous Earth tides
$\alpha_1$ | Orbit polarization for moving systems | $10^{-4}$ | Lunar laser ranging
$\alpha_2$ | Anomalous spin precession for moving bodies | $4 \times 10^{-7}$ | Alignment of solar axis relative to ecliptic
$\alpha_3$ | Anomalous self-acceleration for spinning moving bodies | $2 \times 10^{-20}$ | Pulsar spindown timing data
$\eta$ (a) | Nordtvedt effect | $9 \times 10^{-4}$ | Lunar laser ranging
$\zeta_1$ | | $2 \times 10^{-2}$ | Combined PPN bounds
$\zeta_2$ | Anomalous self-acceleration for binary systems | $4 \times 10^{-5}$ | Timing data for PSR 1913+16
$\zeta_3$ | Violation of Newton’s third law | $10^{-8}$ | Lunar laser ranging
$\zeta_4$ | | | Not independent

(a) Here $\eta = 4\beta - \gamma - 3 - 10\xi/3 - \alpha_1 + 2\alpha_2/3 - 2\zeta_1/3 - \zeta_2/3$.


Tests of Local Lorentz Invariance 


Although special relativity itself never benefited 
from the kind of “crucial” experiments, such as the 
perihelion advance of Mercury and the deflection of 
light, that contributed so much to the initial 
acceptance of GR and to the fame of Einstein, the 
steady accumulation of experimental support, 
together with the successful integration of special 
relativity into quantum mechanics, led to its being 
accepted by mainstream physicists by the late 1920s, 
ultimately to become part of the standard toolkit of 
every working physicist. 

But in recent years new experiments have placed 
very tight bounds on any violations of the Lorentz 
invariance, which underlies special relativity. A 
simple way of interpreting this new class of 
experiments is to suppose that a coupling of some 
external gravitation-like field (not the metric) to the 
electromagnetic interactions results in an effective 
change in the speed of electromagnetic radiation, $c$,
relative to the limiting speed of material test
particles, $c_0$; in other words, $c \neq c_0$. It can be
shown that such a Lorentz-noninvariant electromagnetic
interaction would cause shifts in the energy
levels of atoms and nuclei that depend on the
orientation of the quantization axis of the state
relative to our velocity relative to the rest of the
universe, and on the quantum numbers of the state,
resulting in orientation dependences of the fundamental
frequencies of such atomic clocks. The
magnitude of these “clock anisotropies” would be
proportional to $\delta = |(c_0/c)^2 - 1|$, which vanishes if
Lorentz invariance holds (see Will (1993) and
Haugan and Will (1987) for details).


The earliest clock anisotropy experiments were 
carried out around 1960 independently by Hughes 
and Drever, although their original motivation was 
somewhat different. Dramatic improvements were 
made in the 1980s using laser-cooled trapped atoms 
and ions. This technique made it possible to reduce
the broadening of resonance lines caused by collisions,
leading to the impressive bound $|\delta| < 10^{-21}$
(Will 2001).

Other recent tests of Lorentz invariance violation 
include comparisons of resonant cavities with 
atomic clocks, tests of dispersion and birefringence 
in the propagation of high-energy photons from 
astrophysical sources, threshold effects in elemen- 
tary particle collisions, and anomalies in neutrino 
oscillations. Mattingly (2005) gives a thorough and 
up-to-date review of both the theoretical frame- 
works for studying these effects and the experimen- 
tal results. 


Tests of Local Position Invariance 


LPI requires, among other things, that the internal 
binding energies of atoms and nuclei be indepen- 
dent of location in space and time, when measured 
against some standard atom. This means that a 
comparison of the rates of two different kinds of 
atomic clocks should be independent of location or 
epoch, and that the frequency shift between two 
identical clocks at different locations is simply a 
consequence of the apparent Doppler shift 
between a pair of inertial frames momentarily 
comoving with the clocks at the moments of 
emission and reception, respectively. The relevant
parameter $\alpha$ appears in the formula for the
frequency shift,


\[ \Delta f/f = (1 + \alpha)\,\Delta\Phi/c^2 \qquad [2] \]

where $\Phi$ is the Newtonian gravitational potential. If
LPI holds, $\alpha = 0$. An early test of this was the
Pound-Rebka experiment of 1960, which measured 
the frequency shift of gamma rays from radioactive 
iron nuclei in a tower at Harvard University. The 
best bounds come from a 1976 experiment in which 
a hydrogen maser atomic clock was launched to 
10 000 km altitude on a Scout rocket and its
frequency compared via telemetry with an identical 
clock on the ground, and a 1993 experiment in 
which two different kinds of atomic clocks were 
intercompared as a function of the varying solar 
gravitational field as seen on Earth (a “null” redshift 
experiment). The results are (Will 2001): 


\[ \alpha_{\text{Maser}} < 2 \times 10^{-4}, \qquad \alpha_{\text{Null}} < 10^{-3} \qquad [3] \]
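
To give a feel for the size of the frequency shift in eqn [2], the following sketch evaluates it for two of the configurations described above. The tower height (about 22.5 m), the surface gravity, and the Earth constants are assumed round values supplied for illustration; they are not taken from the text.

```python
C = 299_792_458.0            # speed of light, m/s
GM_EARTH = 3.986004418e14    # Earth's gravitational parameter, m^3/s^2 (assumed)
R_EARTH = 6.371e6            # mean Earth radius, m (assumed)

def fractional_shift(delta_phi, alpha=0.0):
    """Fractional frequency shift (1 + alpha) * delta_phi / c^2."""
    return (1.0 + alpha) * delta_phi / C**2

def potential_difference(altitude_m):
    """Newtonian potential difference between the ground and a given altitude."""
    return GM_EARTH * (1.0 / R_EARTH - 1.0 / (R_EARTH + altitude_m))

# Tower experiment: uniform-field approximation g*h with assumed g and h.
tower_shift = fractional_shift(9.81 * 22.5)

# Rocket-borne clock at 10 000 km altitude (gravitational part only).
rocket_shift = fractional_shift(potential_difference(1.0e7))
```

With these inputs the tower shift is a few parts in $10^{15}$ and the rocket shift a few parts in $10^{10}$, which indicates why the rocket experiment gives the tighter bound on $\alpha$.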


Recent “clock comparison” tests of LPI include 
experiments done at the National Institute of 
Standards and Technology (NIST) in Boulder and 
at the Observatory of Paris, to look for cosmological 
variations in clock rates. The NIST experiment 
compared laser-cooled mercury ions with neutral 
cesium atoms over a two-year period, while the 
Paris experiment compared laser-cooled cesium and 
rubidium atomic fountains over five years; the
results showed that the fine-structure constant is
constant in time to a part in $10^{15}$ per year. A better
bound of $6 \times 10^{-17}$ yr$^{-1}$ comes from analysis of
fission yields of the Oklo natural reactor, which
operated in Africa two billion years ago.


Solar-System Tests


The Parametrized Post-Newtonian Framework


It was once customary to discuss experimental tests of 
GR in terms of the “three classical tests,” the 
gravitational redshift (which is really a test of the 
EEP, not of GR itself; see the section on tests of LPI), 
the perihelion advance of Mercury (the first success of 
the theory), and the deflection of light (whose 
measurement in 1919 made Einstein a celebrity). 
However, the proliferation of additional experimental 
tests and of well-motivated alternative metric theories 
of gravity made it desirable to develop a more general 
theoretical framework for analyzing both experiments 
and theories. This “parametrized post-Newtonian 
(PPN) framework” dates back to Eddington in 1922, 
but was fully developed by Nordtvedt and Will in the 
period 1968-72 (see Will (1993) for details). 

When attention is confined to metric theories of 
gravity and, further, the focus is on the slow-motion, 
weak-field limit appropriate to the solar system and 
similar systems, it turns out that, in a broad class of 
metric theories, only the numerical values of a set of 


coefficients in the spacetime metric vary from theory to
theory. The resulting PPN framework contains ten
parameters: $\gamma$, related to the amount of spatial curvature
generated by mass; $\beta$, related to the degree of
nonlinearity in the gravitational field; $\xi$, $\alpha_1$, $\alpha_2$, and
$\alpha_3$, which determine whether the theory violates LPI
or LLI in gravitational experiments; and $\zeta_1$, $\zeta_2$, $\zeta_3$, and
$\zeta_4$, which describe whether the theory has appropriate
momentum conservation laws. In GR, $\gamma = 1$, $\beta = 1$,
and the remaining parameters all vanish. In the scalar-tensor
theory of Brans-Dicke, $\gamma = (1 + \omega_{BD})/(2 + \omega_{BD})$,
where $\omega_{BD}$ is an adjustable parameter.
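
The Brans-Dicke relation for $\gamma$ can be inverted to turn an experimental bound on $|\gamma - 1|$ into a lower bound on the coupling parameter. The sketch below does this, taking the Cassini limit as $2.3 \times 10^{-5}$; that input value and the function names are assumptions of the illustration.

```python
def gamma_brans_dicke(omega):
    """PPN gamma in Brans-Dicke theory: (1 + omega) / (2 + omega)."""
    return (1.0 + omega) / (2.0 + omega)

def omega_lower_bound(gamma_limit):
    """Smallest omega compatible with |gamma - 1| < gamma_limit.

    Since 1 - gamma = 1 / (2 + omega), the bound is omega > 1/limit - 2.
    """
    return 1.0 / gamma_limit - 2.0

# Assumed input: a limit of 2.3e-5 on |gamma - 1|.
omega_min = omega_lower_bound(2.3e-5)
```

With this input the coupling parameter must exceed roughly 40 000, and $\gamma$ tends to its GR value of 1 as the parameter grows.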

A number of well-known relativistic effects can be 
expressed in terms of these PPN parameters: 


Deflection of light 


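\[
\begin{aligned}
\delta\theta &= \left(\frac{1+\gamma}{2}\right)\frac{4GM}{dc^2} \\
&= \left(\frac{1+\gamma}{2}\right)\left(\frac{R_\odot}{d}\right) \times 1.7505\ \text{arcsec} \qquad [4]
\end{aligned}
\]


where $d$ is the distance of closest approach of a ray
of light to a body of mass $M$, and where the second
line is the deflection by the Sun, with radius $R_\odot$.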
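
The solar value quoted for the deflection can be checked numerically from the first form of the expression. The sketch below uses assumed reference values for the Sun's gravitational parameter and radius, so the result agrees with the quoted coefficient only to about three significant figures.

```python
import math

C = 299_792_458.0           # speed of light, m/s
GM_SUN = 1.32712440018e20   # solar gravitational parameter, m^3/s^2 (assumed)
R_SUN = 6.957e8             # nominal solar radius, m (assumed)

def deflection_arcsec(d, gamma=1.0, gm=GM_SUN):
    """Light deflection ((1 + gamma)/2) * 4GM/(d c^2), in arcseconds."""
    radians = 0.5 * (1.0 + gamma) * 4.0 * gm / (d * C**2)
    return math.degrees(radians) * 3600.0

grazing_gr = deflection_arcsec(R_SUN)               # GR value, gamma = 1
grazing_half = deflection_arcsec(R_SUN, gamma=0.0)  # half the GR value
```

Setting $\gamma = 0$ halves the result, which is why the factor $(1 + \gamma)/2$ is a convenient measure of agreement with GR.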


Shapiro time delay 


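\[ \Delta t = \left(\frac{1+\gamma}{2}\right)\frac{4GM}{c^3}\,\ln\!\left[\frac{(r_1 + \mathbf{x}_1\cdot\mathbf{n})(r_2 - \mathbf{x}_2\cdot\mathbf{n})}{d^2}\right] \qquad [5] \]

where $\Delta t$ is the excess travel time of a round-trip
electromagnetic tracking signal, $\mathbf{x}_1$ and $\mathbf{x}_2$ are the
locations relative to the body of mass $M$ of the
emitter and receiver of the round-trip signal ($r_1$ and
$r_2$ are the respective distances), and $\mathbf{n}$ is the direction
of the outgoing tracking signal.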


Perihelion advance 


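\[
\begin{aligned}
\frac{d\omega}{dt} &= \left(\frac{2+2\gamma-\beta}{3}\right)\frac{6\pi GM}{Pa(1-e^2)c^2} \\
&= \left(\frac{2+2\gamma-\beta}{3}\right) \times 42.98\ \text{arcsec}/100\ \text{yr} \qquad [6]
\end{aligned}
\]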


where P, a, and e are the period, semimajor axis, 
and eccentricity of the planet’s orbit, respectively; 
the second line is the value for Mercury. 
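
The value quoted for Mercury can be reproduced from the first form of the expression. The orbital elements and the solar gravitational parameter in this sketch are assumed reference values, not data from the text.

```python
import math

C = 299_792_458.0           # speed of light, m/s
GM_SUN = 1.32712440018e20   # solar gravitational parameter, m^3/s^2 (assumed)

def perihelion_advance(period_days, a_m, e, gamma=1.0, beta=1.0):
    """Perihelion advance in arcseconds per Julian century.

    Uses ((2 + 2*gamma - beta)/3) * 6*pi*GM / (a (1 - e^2) c^2) per orbit,
    multiplied by the number of orbits in 36525 days.
    """
    per_orbit = ((2.0 + 2.0 * gamma - beta) / 3.0
                 * 6.0 * math.pi * GM_SUN / (a_m * (1.0 - e**2) * C**2))
    orbits_per_century = 36525.0 / period_days
    return math.degrees(per_orbit * orbits_per_century) * 3600.0

# Assumed elements for Mercury: period 87.969 d, a = 5.7909e10 m, e = 0.2056.
mercury_gr = perihelion_advance(87.969, 5.7909e10, 0.2056)
```

The prefactor $(2 + 2\gamma - \beta)/3$ equals 1 in GR, so any measured departure from the GR rate constrains that combination of $\gamma$ and $\beta$.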


Nordtvedt effect 





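\[ \frac{m_G}{m_I} - 1 = \left(4\beta - \gamma - 3 - \frac{10\xi}{3} - \alpha_1 + \frac{2\alpha_2}{3} - \frac{2\zeta_1}{3} - \frac{\zeta_2}{3}\right)\frac{E_g}{m_I c^2} \qquad [7] \]


where $m_G$ and $m_I$ are, respectively, the gravitational
and inertial masses of a body such as the Earth or
Moon, and $E_g$ is its gravitational binding energy. A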
nonzero Nordtvedt effect would cause the Earth and 
Moon to fall with a different acceleration toward 
the Sun. In GR, this effect vanishes. 
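
The combination of PPN parameters that controls the Nordtvedt effect is easy to evaluate for a given theory. In the sketch below the GR values give zero, while a Brans-Dicke theory is modelled by $\beta = 1$ and $\gamma = (1 + \omega)/(2 + \omega)$; the value $\beta = 1$ for Brans-Dicke and the sample coupling are assumptions supplied for illustration.

```python
def nordtvedt_eta(beta=1.0, gamma=1.0, xi=0.0, alpha1=0.0, alpha2=0.0,
                  zeta1=0.0, zeta2=0.0):
    """PPN combination multiplying E_g / (m_I c^2) in the Nordtvedt effect."""
    return (4.0 * beta - gamma - 3.0 - 10.0 * xi / 3.0 - alpha1
            + 2.0 * alpha2 / 3.0 - 2.0 * zeta1 / 3.0 - zeta2 / 3.0)

eta_gr = nordtvedt_eta()  # all parameters at their GR values

# Hypothetical Brans-Dicke example with coupling omega = 500.
omega = 500.0
eta_bd = nordtvedt_eta(gamma=(1.0 + omega) / (2.0 + omega))
```

In the Brans-Dicke case the combination reduces to $1/(2 + \omega)$, so the effect is small but nonzero for any finite coupling.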


Precession of a gyroscope 


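\[
\begin{aligned}
\frac{d\mathbf{S}}{d\tau} &= (\mathbf{\Omega}_{FD} + \mathbf{\Omega}_{Geo}) \times \mathbf{S} \\
\mathbf{\Omega}_{FD} &= -\frac{1}{2}\left(1 + \gamma + \frac{\alpha_1}{4}\right)\frac{G}{r^3 c^2}\,(\mathbf{J} - 3\mathbf{n}\,\mathbf{n}\cdot\mathbf{J}) \\
&= \frac{1}{2}\left(1 + \gamma + \frac{\alpha_1}{4}\right) \times 0.041\ \text{arcsec yr}^{-1} \\
\mathbf{\Omega}_{Geo} &= -\frac{1}{2}(1 + 2\gamma)\,\mathbf{v} \times \frac{Gm\mathbf{n}}{r^2 c^2} \\
&= \frac{1}{3}(1 + 2\gamma) \times 6.6\ \text{arcsec yr}^{-1} \qquad [8]
\end{aligned}
\]

where $\mathbf{S}$ is the spin of the gyroscope, and $\mathbf{\Omega}_{FD}$ and
$\mathbf{\Omega}_{Geo}$ are, respectively, the precession angular velocities
caused by the dragging of inertial frames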
(Lense-Thirring effect) and by the geodetic effect, a 
combination of Thomas precession and precession 
induced by spatial curvature; J is the angular 
momentum of the Earth, and v, n, and r are, 
respectively, the velocity, direction, and distance of 
the gyroscope. The second line in each case is the 
corresponding value for a gyroscope in polar Earth 
orbit at about 650 km altitude (Gravity Probe B). 
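
The size of the geodetic term can be estimated directly from its expression. The sketch below assumes a circular orbit, so that the velocity is perpendicular to the radial direction and the cross product has magnitude $v\,Gm/(r^2 c^2)$, with $m$ taken to be the mass of the Earth; the Earth constants are assumed reference values.

```python
import math

C = 299_792_458.0            # speed of light, m/s
GM_EARTH = 3.986004418e14    # Earth's gravitational parameter, m^3/s^2 (assumed)
R_EARTH = 6.371e6            # mean Earth radius, m (assumed)
SECONDS_PER_YEAR = 365.25 * 86400.0

def geodetic_rate_arcsec_per_year(altitude_m, gamma=1.0):
    """Magnitude of the geodetic precession for a circular orbit.

    Evaluates (1/2)(1 + 2*gamma) * v * GM / (r^2 c^2) with v = sqrt(GM/r).
    """
    r = R_EARTH + altitude_m
    v = math.sqrt(GM_EARTH / r)
    rate = 0.5 * (1.0 + 2.0 * gamma) * v * GM_EARTH / (r**2 * C**2)  # rad/s
    return math.degrees(rate * SECONDS_PER_YEAR) * 3600.0

gpb_geodetic = geodetic_rate_arcsec_per_year(650e3)
```

For a 650 km orbit this gives a rate close to the 6.6 arcsec per year quoted above for GR.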


Bounds on the PPN Parameters 


Four decades of high-precision experiments, ranging 
from the standard light-deflection and perihelion- 
shift tests, to LLR, planetary and satellite tracking 
tests of the Shapiro time delay, and geophysical and 
astronomical observations, have placed bounds on 
the PPN parameters that are consistent with GR. 
The current bounds are summarized in Table 1 (Will 
2001). 

To illustrate the dramatic progress of experimen- 
tal gravity since the dawn of Einstein’s theory, 
Figure 1 shows a history of results for $(1 + \gamma)/2$,
from the 1919 solar eclipse measurements of 
Eddington and his colleagues (which made Einstein 
a celebrity), to modern-day measurements using very 
long baseline radio interferometry (VLBI), advanced 
radar tracking of spacecraft, and the astrometry 
satellite Hipparcos. The most recent results include a 
2003 measurement of the Shapiro delay, performed 
by tracking the “Cassini” spacecraft on its way 
to Saturn, and a 2004 measurement of the bending 
of light via analysis of VLBI data on 541 quasars 
and compact radio galaxies distributed over the 
entire sky.

Figure 1 Measurements of the coefficient $(1 + \gamma)/2$ from
observations of the deflection of light and of the Shapiro delay
in propagation of radio signals near the Sun. The GR prediction
is unity. “Optical” denotes measurements of stellar deflection 
made during solar eclipse, and “Radio” denotes interferometric 
measurements of radio-wave deflection. “Hipparcos” denotes the 
European optical astrometry satellite. Arrows denote values well 
off the chart from one of the 1919 eclipse expeditions and from 
others through 1947. Shapiro delay measurements using the 
Cassini spacecraft on its way to Saturn yielded tests at the 
0.001% level, and light deflection measurements using VLBI 
have reached 0.02%. 


The perihelion advance of Mercury, the first of 
Einstein’s successes, is now known to agree with 
observation to a few parts in $10^{3}$. During the 1960s
there was controversy about this test when reports of an 
excess solar oblateness implied an unacceptably large 
Newtonian contribution to the perihelion advance. 
However, it is now known from helioseismology, the 
study of short-period vibrations of the Sun, that the 
oblateness is of the order of a part in $10^{7}$, as expected
from standard solar models, much too small to affect 
Mercury’s orbit, within the observational errors. 
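
As a rough numerical illustration of the size of this effect, the following sketch evaluates the standard GR expression for the perihelion advance per orbit, $6\pi GM/[a(1 - e^2)c^2]$. The orbital elements of Mercury and the solar gravitational parameter used here are approximate values supplied for illustration; they are not taken from the text above.

```python
import math

GM_SUN = 1.32712440018e20   # m^3 s^-2, solar gravitational parameter (assumed value)
C = 299_792_458.0           # m s^-1, speed of light


def perihelion_advance_per_orbit(a_m, e):
    """GR perihelion advance per orbit in radians: 6*pi*GM / (a (1 - e^2) c^2)."""
    return 6.0 * math.pi * GM_SUN / (a_m * (1.0 - e**2) * C**2)


def arcsec_per_century(a_m, e, period_days):
    """Accumulate the per-orbit advance over one Julian century (36525 days)."""
    orbits_per_century = 36525.0 / period_days
    rad = perihelion_advance_per_orbit(a_m, e) * orbits_per_century
    return math.degrees(rad) * 3600.0


# Approximate orbital elements of Mercury (illustrative assumptions).
mercury = arcsec_per_century(a_m=5.790905e10, e=0.205630, period_days=87.9691)
print(f"GR perihelion advance of Mercury: {mercury:.2f} arcsec per century")
```

With these inputs the result is close to the familiar figure of about 43 arcsec per century.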


Gravity Probe B 


The NASA Relativity Mission called Gravity Probe B
(GPB) recently completed its mission to measure the
Lense-Thirring and geodetic precessions of gyroscopes
in Earth’s orbit. Launched on 20 April 2004 for a
16-month mission, it consisted of four spherical rotors
coated with a thin layer of superconducting niobium,
spinning at 70-100 Hz, in a spacecraft filled with liquid
helium, containing a telescope continuously pointed
toward a distant guide star (IM Pegasi). Superconducting
current loops encircling each rotor were designed to
measure the change in direction of the rotors by
detecting the change in magnetic flux through the loop
generated by the London magnetic moment of the
spinning superconducting film. The spacecraft was in a
polar orbit at 650 km altitude. The primary science goal
of GPB was a 1% measurement of the 41 marcsec yr$^{-1}$
frame dragging or Lense-Thirring effect caused by the
rotation of the Earth; its secondary goal was to measure
to six parts in $10^{5}$ the larger 6.6 arcsec yr$^{-1}$ geodetic
precession caused by space curvature.
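
The quoted geodetic rate can be checked to leading order with a short calculation. The sketch below assumes a circular orbit around a spherical Earth and uses the standard GR expression $\Omega = \tfrac{3}{2}(GM)^{3/2}/(c^2 r^{5/2})$ for the geodetic precession; the Earth radius and gravitational parameter are assumed values, not figures from the text.

```python
import math

GM_EARTH = 3.986004418e14            # m^3 s^-2 (assumed value)
R_EARTH = 6.371e6                    # m, mean radius (assumes a spherical Earth)
C = 299_792_458.0                    # m s^-1
SECONDS_PER_YEAR = 365.25 * 86400.0


def geodetic_precession_arcsec_per_year(altitude_m):
    """Geodetic precession of a gyroscope in a circular orbit, in arcsec per year."""
    r = R_EARTH + altitude_m
    omega = 1.5 * GM_EARTH**1.5 / (C**2 * r**2.5)   # rad per second
    return math.degrees(omega * SECONDS_PER_YEAR) * 3600.0


rate = geodetic_precession_arcsec_per_year(650e3)
print(f"geodetic precession at 650 km: {rate:.2f} arcsec per year")
```

Under these assumptions the result agrees with the 6.6 arcsec yr$^{-1}$ quoted above.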


The Binary Pulsar 


The binary pulsar PSR 1913+16, discovered in 1974,
provided important new tests of GR. The pulsar, with 
a pulse period of 59 ms, was observed to be in orbit 
about an unseen companion (now generally thought to 
be a dead pulsar), with a period of ~8h. Through 
precise timing of apparent variations in the pulsar 
“clock” caused by the Doppler effect, the important 
orbital parameters of the system could be measured 
with exquisite precision. These included nonrelativistic 
“Keplerian” parameters, such as the eccentricity e, and
the orbital period (at a chosen epoch) $P_b$, as well as a
set of relativistic “post-Keplerian” (PK) parameters.
The first PK parameter, $\langle\dot\omega\rangle$, is the mean rate of
advance of periastron, the analog of Mercury’s
perihelion shift. The second, denoted $\gamma'$, is the effect
of special relativistic time dilation and the gravita- 
tional redshift on the observed phase or arrival time of 
pulses, resulting from the pulsar’s orbital motion and 
the gravitational potential of its companion. The third, 
$\dot{P}_b$, is the rate of decrease of the orbital period; this is
taken to be the result of gravitational radiation 
damping (apart from a small correction due to the 
acceleration of the system in our rotating galaxy). Two 
other parameters, s and r, are related to the Shapiro 
time delay of the pulsar signal if the orbital inclination 
is such that the signal passes in the vicinity of the 
companion; s is a direct measure of the orbital
inclination sin i. According to GR, the first three PK
effects depend only on e and $P_b$, which are known, and
on the two stellar masses, which are unknown. By
combining the observations of PSR 1913+16 (see
Table 2) with the GR predictions, one obtains both a 
measurement of the two masses and a test of GR, since 
the system is overdetermined. The results are 


$$m_1 = 1.4414 \pm 0.0002\,M_\odot, \qquad m_2 = 1.3867 \pm 0.0002\,M_\odot,$$
$$\dot{P}_b^{\rm GR} / \dot{P}_b^{\rm OBS} = 1.0013 \pm 0.0021 \qquad (9)$$

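
The consistency between the fitted masses and the measured periastron advance can be illustrated numerically. The sketch below uses the standard GR expression $\langle\dot\omega\rangle = 3 (2\pi/P_b)^{5/3} (G(m_1 + m_2)/c^3)^{2/3} / (1 - e^2)$, which is not written out in the text; the constant `T_SUN` ($GM_\odot/c^3$ in seconds) is an assumed value, and the function name is purely illustrative.

```python
import math

T_SUN = 4.925490947e-6               # G * M_sun / c^3 in seconds (assumed value)
SECONDS_PER_DAY = 86400.0
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def periastron_advance_deg_per_year(m1, m2, pb_days, e):
    """Mean periastron advance in GR, for masses in solar units."""
    n = 2.0 * math.pi / (pb_days * SECONDS_PER_DAY)      # orbital angular frequency
    total_mass_s = (m1 + m2) * T_SUN                      # total mass in seconds
    omega_dot = 3.0 * n ** (5.0 / 3.0) * total_mass_s ** (2.0 / 3.0) / (1.0 - e * e)
    return math.degrees(omega_dot * SECONDS_PER_YEAR)


# Masses from the fit above; e and P_b for PSR 1913+16 as listed in Table 2.
rate = periastron_advance_deg_per_year(1.4414, 1.3867, 0.322997448930, 0.6171338)
print(f"predicted periastron advance: {rate:.3f} degrees per year")
```

The output should lie close to the measured value of about 4.23 degrees per year listed for PSR 1913+16 in Table 2.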
Other relativistic binary pulsars may provide even 
more stringent tests. These include the relativistic 
neutron star/white dwarf binary pulsar J1141-6545, 
with a 0.19 day orbital period, which may ultimately 
lead to a very strong bound on the phenomenon of 
dipole gravitational radiation, predicted by many 
alternative theories of gravity, but not by GR; and 
the remarkable “double pulsar” J0737-3039, a 
binary system with two detected pulsars, in a 
0.10 day orbit seen almost edge on and a periastron 
advance of 17° per year. For further discussion of 
binary pulsar tests, see Stairs (2003). 


Gravitational-Wave Tests 


The detection of gravitational radiation by either 
laser interferometers or resonant cryogenic bars will 
usher in a new era of gravitational-wave astronomy 
(Barish and Weiss 1999). Furthermore, it will yield 
new and interesting tests of GR in its radiative 
regime (Will 1999). 

GR predicts that gravitational waves possess only two 
polarization modes independently of the source; they are 
transverse to the direction of propagation and quad- 
rupolar in their effect on a detector. Other theories of 
gravity may predict up to four additional modes of 
polarization. A suitable array of gravitational antennas 
could delineate or limit the number of modes present in a 
given wave. If distinct evidence were found of any mode 
other than the two transverse quadrupolar modes of 
GR, the result would be disastrous for the theory. 


Table 2 Parameters of the binary pulsars PSR 1913+16 and J0737-3039

Parameter                   Symbol                                      Value^a in PSR 1913+16   Value^a in J0737-3039

Keplerian parameters
Eccentricity                e                                           0.6171338(4)             0.087779(5)
Orbital period              $P_b$ (day)                                 0.322997448930(4)        0.102251563(1)

Post-Keplerian parameters
Periastron advance          $\langle\dot\omega\rangle$ (° yr$^{-1}$)    4.226595(5)              16.90(1)
Redshift/time dilation      $\gamma'$ (ms)                              4.2919(8)                0.382(5)
Orbital period derivative   $\dot{P}_b$ ($10^{-12}$)                    -2.4184(9)
Shapiro delay (sin i)       s                                                                    0.9995(4)

^a Numbers in parentheses denote errors in last digit.



According to GR, gravitational waves propagate with 
the same speed, c, as light. In other theories, the speed 
could differ from c because of coupling of gravitation to 
“background” gravitational fields, or propagation of the 
waves into additional spatial dimensions. Another way 
in which the speed of gravitational waves could differ 
from c is if gravitation were propagated by a massive 
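field (a massive graviton), in which case $v_g$ would be
given by, in a local inertial frame,

$$\frac{v_g^2}{c^2} = 1 - \frac{m_g^2 c^4}{E^2}, \qquad \frac{v_g}{c} \approx 1 - \frac{1}{2} \frac{c^2}{f^2 \lambda_g^2} \qquad (10)$$

where $m_g$, E, and f are the graviton rest mass,
energy, and frequency, respectively, and $\lambda_g = h/m_g c$
is the graviton Compton wavelength (it is assumed
that $\lambda_g \gg c/f$).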

The most obvious way to measure the speed of 
gravitational waves is to compare the arrival times of a 
gravitational wave and an electromagnetic wave from 
the same event (e.g., a supernova). For a source at a 
distance of 600 million light years (a typical distance 
for the currently operational detectors), and a differ- 
ence in times on the order of seconds, the bound on the 
difference $|1 - v_g/c|$ could be as small as a part in
$10^{17}$. It is worth noting that a 2002 report that the
speed of gravity had been measured by studying light 
from a quasar as it propagated past Jupiter was 
fundamentally flawed. That particular measurement 
was not sensitive to the speed of gravity. 
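
The order of magnitude of this bound follows from simple arithmetic: if both signals travel a distance D, an arrival-time difference $\Delta t$ corresponds to $|1 - v_g/c| \approx c\,\Delta t / D$. The sketch below evaluates this for the numbers quoted above; the function name is illustrative only, and a difference of exactly one second is an assumption standing in for "on the order of seconds".

```python
SECONDS_PER_YEAR = 365.25 * 86400.0


def speed_difference_bound(distance_light_years, arrival_delay_s):
    """Approximate |1 - v_g/c| implied by an arrival-time difference."""
    # Light travel time in seconds equals the distance in light years
    # multiplied by the number of seconds in a year.
    travel_time_s = distance_light_years * SECONDS_PER_YEAR
    return arrival_delay_s / travel_time_s


bound = speed_difference_bound(600e6, 1.0)
print(f"|1 - v_g/c| is roughly {bound:.1e}")
```

For a delay of one second this gives a few parts in $10^{17}$, in line with the estimate quoted above.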


Conclusions 


The past four decades have witnessed a systematic,
high-precision experimental verification of Einstein’s
theories. Relativity has passed every test with flying
colors. A central theme of future work will be to test
strong-field gravity in the vicinity of black holes and
neutron stars, and to see how well GR works on
cosmological scales. Gamma-ray, X-ray, microwave,
infrared, neutrino, and gravitational-wave astronomy
will all play a critical role in probing these largely
unexplored aspects of GR.

GR is now the “standard model” of gravity. But,
as in particle physics, there may be a world beyond
the standard model. Quantum gravity, strings, and
branes may lead to testable effects beyond Einstein’s
GR. Searches for such effects using laboratory
experiments, particle accelerators, space instrumentation,
and cosmological observations are likely to
continue for some time to come.


See also: Cosmology: Mathematical Aspects; Einstein
Equations: Exact Solutions; General Relativity: Overview;
Geometric Flows and the Penrose Inequality;
Gravitational Lensing; Gravitational Waves; Standard
Model of Particle Physics.


The Principle of Equivalence


The special theory of relativity is founded on two
basic principles: that the laws of physics should be
independent of the uniform motion of an inertial
frame of reference, and that the speed of light
should have the same constant value in any such
frame. In the years between 1905 and 1915,
Einstein pondered deeply on what was, to him, a
profound enigma, which was the issue of why these
laws retain their proper form only in the case of an
inertial frame. In special relativity, as had been the
case in the earlier dynamics of Galilei-Newton, the
laws indeed retain their basic form only when the
reference frame is unaccelerated (which includes it
being nonrotating). It demonstrated a particular
prescience on the part of Einstein that he should
have demanded the seemingly impossible requirement
that the very same dynamical laws should
hold also in an accelerating (or even rotating)
reference frame. The key realization came to him
late in 1907, when sitting in his chair in the Bern
patent office he had the “happiest thought” in his life,
namely that if a person were to fall freely in a 
gravitational field, then he would not notice that field 
at all while falling. The physical point at issue is 
Galileo’s early insight (itself having roots even earlier 
from Simon Stevin in 1586 or Ioannes Philiponos in 
the fifth or sixth century) that the acceleration 
induced by gravity is independent of the body upon 
which it acts. Accordingly, if two neighboring bodies 
are accelerated together in the same gravitational 
field, then the motion of one body, in the (nonrotat- 
ing) reference frame of the other, will be as though 
there were no gravitational field at all. To put this 
another way, the effect of a gravitational force is just 
like that of an accelerating reference system, and can 
be eliminated by free fall. This is now known as the 
“principle of equivalence.” 

It should be made clear that this is a particular 
feature of only the gravitational field. From the 
perspective of Newtonian dynamics, it is a conse- 
quence of the seemingly accidental fact that the 
concept of (passive) “mass” m that features in 
Newton’s law of gravitational attraction, where the 
attractive force due to the gravitational field of 
another body, of mass M, has the form 


$$\frac{GmM}{r^2}$$





is the same as — or, at least, proportional to — the 
inertial mass m of the body which is being acted 
upon. Thus, the impedance to acceleration of a body 
and the strength of the attractive force on that body 
are, in the case of gravity (and only in the case of 
gravity), in proportion to one another, so that the 
acceleration of a body in a gravitational field is 
independent of its mass (or, indeed, of any other
localized magnitude possessed by it). (The fact that
the active gravitational mass, here given by the 
quantity M, is also in proportion to its own passive 
gravitational mass — from Newton’s third law — may 
be regarded as a feature of the general Lagrangian/ 
Hamiltonian framework of physics. But see Bondi 
(1957).) Other forces of nature do not have this 
property. For example, the electrostatic force on a 
charged body, by an electric field, acts in proportion to 
the electric charge on that body, whereas, the 
impedance to acceleration is still the inertial mass of 
that body, so the acceleration induced depends on the 
charge-to-mass ratio. Accordingly, it is the gravita- 
tional field alone which is equivalent to an acceleration. 
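
The contrast drawn here can be made concrete with a few lines of arithmetic. In the sketch below, the gravitational acceleration is computed as force divided by inertial mass, with the passive gravitational mass set equal to the inertial mass, while the electrostatic acceleration depends on the charge-to-mass ratio. The numerical constants and function names are assumptions chosen for illustration.

```python
G = 6.67430e-11          # m^3 kg^-1 s^-2, Newton's constant
M_EARTH = 5.972e24       # kg (assumed value)
R_EARTH = 6.371e6        # m (assumed value)


def gravitational_acceleration(inertial_mass_kg, passive_mass_kg=None):
    """Acceleration = force / inertial mass, with force G * m_passive * M / r^2."""
    if passive_mass_kg is None:
        # Equality of passive gravitational mass and inertial mass.
        passive_mass_kg = inertial_mass_kg
    force = G * passive_mass_kg * M_EARTH / R_EARTH**2
    return force / inertial_mass_kg


def electrostatic_acceleration(charge_c, inertial_mass_kg, field_v_per_m):
    """Acceleration of a charged body in an electric field: q E / m."""
    return charge_c * field_v_per_m / inertial_mass_kg


feather = gravitational_acceleration(0.001)
cannonball = gravitational_acceleration(10.0)

# Same charge magnitude, same field, very different inertial masses.
electron = electrostatic_acceleration(1.602176634e-19, 9.1093837015e-31, 1.0)
proton = electrostatic_acceleration(1.602176634e-19, 1.67262192369e-27, 1.0)

print(f"gravity: {feather:.3f} and {cannonball:.3f} m/s^2")
print(f"electric field: accelerations differ by a factor {electron / proton:.0f}")
```

The two gravitational accelerations coincide, whereas the two electrostatic accelerations differ by the ratio of the inertial masses.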

Einstein’s fundamental idea, therefore, was to 
take the view that the “relativity principle” could 
as well be applied to accelerating reference frames as 
to inertial ones, where the same physical laws would 
apply in each, but where now the perceived
gravitational field would be different in the two
frames. In accordance with this perspective, Einstein 
found it necessary to adopt a different viewpoint 
from the Newtonian one, both with regard to the 
notion of “gravitational force” and to the very 
notion of an “inertial frame.” According to the 
Newtonian perspective, it would be appropriate to 
describe the action of the Earth’s gravitational field, 
near some specific place on the Earth’s surface, in 
terms of a “Newtonian inertial frame” in which the 
Earth is “fixed” (here we ignore the Earth’s rotation 
and the Earth’s motion about the Sun), and we 
consider that there is a constant gravitational field 
of force (directed towards the Earth’s center). But 
the Einsteinian perspective is to regard that frame as 
noninertial where, instead, it would be a frame 
which falls freely in the Earth’s (Newtonian) 
gravitational field that would be regarded as a 
suitable “Einsteinian inertial frame.” Generally, to 
be inertial in Einstein’s sense, the frame would refer 
to free fall under gravity, so that the Newtonian 
field of gravitational force would appear to have 
disappeared, in accordance with the “happiest
thought” that Einstein had had in the Bern patent
office. We see that the concept of a gravitational 
field must also be changed in the passage from 
Newton’s to Einstein’s viewpoint. For in Newton’s 
picture we indeed have a “gravitational force” 
directed towards the ground with a magnitude of 
gm, where m is the mass of the body being acted 
upon and g is the “acceleration due to gravity” at 
the Earth’s surface, whereas in Einstein’s picture we 
have specifically eliminated this “gravitational 
force” by the choice of “Einsteinian inertial frame.” 

It might at first seem puzzling that the gravitational 
field has appeared to have been removed altogether by 
this device, and it is natural to wonder how gravita- 
tional effects can have any physical role to play at all 
from this point of view! However, this would be to go 
too far, as the Newtonian gravitational field may vary 
from place to place — as it does, indeed, in the case of 
the Earth’s field, since it is directed towards the Earth’s 
center, which is a different spatial direction at different 
places on the Earth’s surface. Our considerations up to 
this point really refer only to a small neighborhood of a 
point. One might well take the view that a “frame” 
ought really to describe things also at widely separated 
places at once, and the considerations of the para- 
graphs above do not really take this into consideration. 


The Tidal Effect 


To proceed further, it will be helpful to consider an 
astronaut A in free fall, high above the Earth’s 
surface. Let us first adopt a Newtonian perspective. 


We shall be concerned only with the instantaneous 
accelerations due to gravity in the neighborhood of 
A, so it will be immaterial whether we regard the 
astronaut as falling to the ground or - more 
comfortably! — in orbit about the Earth. 

Let us imagine that the astronaut is initially 
surrounded, nearby, by a sphere of particles, with A 
at the centre, which are taken to be initially at rest 
with respect to A (see Figure 1). To a first 
approximation, all the particles will share the same 
acceleration as the astronaut, so they will seem to the 
astronaut to hover motionless all around. But now let 
us be a little more precise about the accelerations. 
Those particles which are initially located in a vertical 
line from A, that is, either directly below A, at B, or 
directly above A, at T, will have, like A, an 
acceleration which is in the direction AO, where O 
is the Earth’s center. But for the bottom point B, the 
acceleration will be slightly greater than that at A, and 
for the top point T, the acceleration will be slightly 
less than the acceleration at A, because of the slightly 
differing distances from O. Thus, relative to A, both 
will initially accelerate away from A. With regard to 
particles in the sphere which are initially in a circle in 
the horizontal plane through A, the direction to O 
will now be somewhat inwards, so that the particles
at these points $H_i$ will accelerate, relative to A,
slightly inwards. Accordingly, the entire sphere of
particles will begin to get distorted into a prolate
spheroid (elongated ellipsoid of revolution). This is
referred to as the tidal distortion, for the good reason
that it is precisely the same physical effect which is
responsible for the tides in the Earth’s oceans, where
for this illustration we are to think of the Earth’s
center as being at A, the Moon (or Sun) to be situated
at O, and the sphere of particles to represent the
surface of the water of the Earth’s oceans.

Figure 1 (a) Tidal effect. The astronaut A surrounded by a
sphere of nearby particles initially at rest with respect to A. In
Newtonian terms, they have an acceleration towards the Earth’s
center E, varying slightly in direction and magnitude (single-shafted
arrows). By subtracting A’s acceleration from each, we
obtain the accelerations relative to A (double-shafted arrows);
this relative acceleration is slightly inward for those particles
displaced horizontally from A, but slightly outward for those
displaced vertically from A. Accordingly, the sphere becomes
distorted into a (prolate) ellipsoid of revolution, with symmetry
axis in the direction AE. The initial distortion preserves volume.
(b) Now move A to the Earth’s center E and the sphere of
particles to surround E just above the atmosphere. The
acceleration (relative to A=E) is inward all around the sphere,
with an initial volume reduction acceleration $4\pi GM$, where M is
the total mass surrounded. Reproduced with permission from
Penrose R (2004) The Road to Reality: A Complete Guide to the
Laws of the Universe. London: Jonathan Cape.

It is not hard to calculate (reverting, now, to our 
original picture) that, as a reflection of Newton’s 
inverse-square law of gravitational attraction, the 
amount of (small) outward vertical displacement 
from A (at B and T) will be twice the inward 
horizontal displacement (over the circle of points 
$H_i$); accordingly, the sphere will initially be distorted
into an ellipsoid of the same volume. This depends 
upon there being no gravitating matter inside the 
sphere. The presence of such matter would con- 
tribute a volume-reducing effect in proportion to the 
total mass surrounded. (An extreme case illustrating 
this would occur if we take our sphere of particles to 
surround the entire Earth, where the volume- 
reducing effect would be manifest in the accelera- 
tions towards the ground at all points of the 
surrounding sphere.) 
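
The factor of two between the vertical stretching and the horizontal squeezing can be checked numerically from the inverse-square law alone. The sketch below places A on the z-axis at an assumed distance `R0` from the attracting center and compares the relative accelerations of particles displaced by one metre vertically and horizontally; the numerical values are illustrative assumptions.

```python
GM = 3.986004418e14      # m^3 s^-2, used only to set the scale (assumed value)
R0 = 7.0e6               # m, assumed distance of A from the attracting center


def acceleration(x, y, z):
    """Newtonian inverse-square acceleration towards the origin."""
    r3 = (x * x + y * y + z * z) ** 1.5
    return (-GM * x / r3, -GM * y / r3, -GM * z / r3)


def relative_acceleration(dx, dy, dz):
    """Acceleration of a nearby particle minus that of A, with A on the z-axis."""
    a_particle = acceleration(dx, dy, R0 + dz)
    a_astronaut = acceleration(0.0, 0.0, R0)
    return tuple(p - q for p, q in zip(a_particle, a_astronaut))


d = 1.0                                                   # displacement in metres
vertical = relative_acceleration(0.0, 0.0, d)[2] / d      # outward (positive)
horizontal = relative_acceleration(d, 0.0, 0.0)[0] / d    # inward (negative)
unit = GM / R0**3

print(f"vertical:   {vertical / unit:+.4f} GM/r^3")
print(f"horizontal: {horizontal / unit:+.4f} GM/r^3")
print(f"sum over three axes: {(vertical + 2 * horizontal) / unit:+.1e} GM/r^3")
```

The vertical rate is close to $+2GM/r^3$ and each horizontal rate close to $-GM/r^3$, so their sum over the three axes vanishes to this accuracy, which is the statement that the initial distortion preserves volume when no matter lies inside the sphere.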


Gravity as Curved Spacetime 


Figure 2 Spacetime versions of Figure 1 in terms of the
relative distortion of neighboring geodesics. (a) Geodesic
deviation in empty space (basically Weyl curvature) as seen in
the world lines of A and surrounding particles (one spatial
dimension suppressed), as might be induced from the gravitational
field of a nearby body E. (b) The corresponding inward
acceleration (basically Ricci curvature) due to the mass density
within the bundle of geodesics. Reproduced with permission
from Penrose R (2004) The Road to Reality: A Complete Guide
to the Laws of the Universe. London: Jonathan Cape.

It is appropriate to take a spacetime view of these
phenomena (Figure 2). The distortions that we have
been considering are, in fact, direct manifestations
of spacetime curvature, according to Einstein’s
viewpoint. We are to think of the world line of a 
particle, falling freely under gravity (Einsteinian 
inertial motion), as described as some kind of 
geodesic in spacetime. We shall be coming to this 
more completely shortly, but for the moment it will 
be helpful to picture the behavior of geodesics 
within an ordinary curved 2-surface S (Figure 3). 
If S has positive (Gaussian) curvature, then there 
will be a tendency for geodesics on S to bend 
towards each other, so that a pair of infinitesimally 
separated geodesics which are initially parallel will 
begin to get closer together as we move along them; 
if S has negative (Gaussian) curvature, then there 
will be a corresponding tendency for geodesics on S 
to bend away from each other. This is what happens 
in two dimensions, where the intrinsic curvature at a 
point is given by a single number. However, we are 
now concerned with a four-dimensional space, 
where the notion of curvature requires many more 
components. We see in Figure 2 that we are indeed 
to expect mixtures of convergence and divergence of 
geodesics, which suggests that there are both 
positive and negative curvature components 
involved, the positive curvature being in the hor- 
izontally displaced directions from A and the 
negative curvature in the vertically displaced direc- 
tions. In a curved space of dimension 4, as is the 
case for a curved spacetime, we can expect 20 
independent components of curvature at each point 
altogether. In the present situation, the others would 
be called into play when differing velocities of A are 
considered. 

Let us see how we are to accommodate the above 
considerations within the standard framework of 
differential geometry. So far, we have not really 
deviated from Newtonian theory, even though we 
have been considering “geodesics” in a four-dimen- 
sional spacetime. In fact, it is perfectly legitimate to 
view Newtonian theory in this way (see Newtonian
Limit of General Relativity), although the 4-geometry
description is somewhat more complicated than one
might wish. This is due to the fact that the infinite
speed at which gravitation is taken to act in
Newtonian theory demands that the “metric” of
Newtonian spacetime is degenerate. (In effect, one
would have a degenerate “dual metric” $G^{ab}$, of
matrix rank 3, which plays a role in defining spatial
displacements and a very degenerate “metric” $G_{ab}$, of
matrix rank 1, which defines temporal differences,
where $G^{ab} G_{bc} = 0$; see Newtonian Limit of General
Relativity.) Accordingly, there is no unique notion of
“geodesic” defined by the metric in Newtonian
theory.

Figure 3 Geodesic deviation when M is a 2-surface (a) of
positive (Gaussian) curvature, when the geodesics $\gamma$, $\gamma'$ bend
towards each other, and (b) of negative curvature, when they
bend apart. Reproduced with permission from Penrose R (2004)
The Road to Reality: A Complete Guide to the Laws of the
Universe. London: Jonathan Cape.

It is striking that although the insights provided 
by the principle of equivalence are to some 
considerable extent independent of special relativity 
(since we see from the paragraphs prior to the 
preceding one that a curved-spacetime-geometry 
view of gravity is natural in the light of the 
equivalence principle alone), it is the nondegenerate 
metric $g_{ab}$ (and its inverse $g^{ab}$) that special relativity
gives us locally, which leads to an elegant spacetime
theory of gravity. Although the metric $g_{ab}$
is Lorentzian (with preferred choice of signature
$+ - - -$ here) rather than positive definite, so that
the spacetime is not strictly a Riemannian one, the 
change of signature makes little difference to 
the local formalism. In particular, the fact that the 
metric defines a unique (torsion-free) connection 
preserving it is unaffected by the signature. This 
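connection is the one defined by Christoffel’s symbols

$$\Gamma_{bc}{}^{a} = \tfrac{1}{2} g^{ad} (\partial_b g_{cd} + \partial_c g_{bd} - \partial_d g_{bc})$$

where $\partial_a$ stands for coordinate derivative $\partial/\partial x^a$,
so that the covariant derivative of a vector $V^a$ is
given by

$$\nabla_a V^b = \partial_a V^b + V^c \Gamma_{ca}{}^{b}$$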


(Here the standard “physicist’s conventions” are 
being used, whereby notation such as “$g_{ab}$” and
“$V^a$” can be used interchangeably either for the sets
of components of the metric tensor g and the vector 
V, respectively, or alternatively for the entire 
geometrical metric tensor g or vector V, in each 
case; moreover, the summation convention is being 
assumed, or this can alternatively be understood in 
terms of abstract indices. (For the abstract-index 
notation for tensors, see Penrose and Rindler (1984), 
especially Chapters 2 and 4. Sign and index-ordering 
conventions used here follow those given in that 
book. Many other authors use conventions which 
differ from these in various, usually minor 
respects.)) 
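
As a concrete illustration of how a connection is obtained from a metric, the following sketch evaluates $\Gamma_{bc}{}^{a} = \tfrac{1}{2} g^{ad} (\partial_b g_{cd} + \partial_c g_{bd} - \partial_d g_{bc})$ numerically for the unit 2-sphere, using central differences for the coordinate derivatives. The choice of the 2-sphere (a positive-definite example rather than a spacetime), the step size and all function names are assumptions made for illustration. The array `gamma[a][b][c]` stores $\Gamma_{bc}{}^{a}$.

```python
import math


def metric(theta, phi):
    """Metric of the unit 2-sphere in coordinates (theta, phi)."""
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]


def inverse_2x2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]


def metric_derivative(x, d, h=1e-5):
    """Central-difference estimate of the derivative of g_{bc} along x^d."""
    up, down = list(x), list(x)
    up[d] += h
    down[d] -= h
    g_up, g_down = metric(*up), metric(*down)
    return [[(g_up[b][c] - g_down[b][c]) / (2 * h) for c in range(2)]
            for b in range(2)]


def christoffel(x):
    """gamma[a][b][c] = 1/2 g^{ad} (d_b g_{cd} + d_c g_{bd} - d_d g_{bc})."""
    n = 2
    g_inv = inverse_2x2(metric(*x))
    dg = [metric_derivative(x, d) for d in range(n)]   # dg[d][b][c] = d_d g_{bc}
    return [[[0.5 * sum(g_inv[a][d] * (dg[b][c][d] + dg[c][b][d] - dg[d][b][c])
                        for d in range(n))
              for c in range(n)] for b in range(n)] for a in range(n)]


theta = 1.0
gamma = christoffel((theta, 0.3))
print(f"Gamma^theta_(phi phi) = {gamma[0][1][1]:+.6f}")
print(f"Gamma^phi_(theta phi) = {gamma[1][0][1]:+.6f}")
```

The two printed values can be compared with the known closed forms $-\sin\theta\cos\theta$ and $\cot\theta$ for the sphere.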


Physical Interpretation of the Metric 


Some words of clarification are needed, as to the 
meaning of the metric tensor $g_{ab}$ in relativity theory.
In the early discussions by Einstein and others, the 
spacetime metric tended to be interpreted in terms of 
little “rulers” placed on a curved manifold. 
Although this is natural in the Riemannian 
(positive-definite) case, it is not quite so appropriate 
for the Lorentzian geometry of spacetime manifolds. 
An ordinary physical ruler has a spacetime descrip- 
tion as a timelike strip, and it does not naturally 
express the spatial separation between two 
spacelike-separated events. In order for a ruler to 
measure such a spacelike separation, it would be 
necessary for the two events to be simultaneous in 
the ruler’s rest frame, and for this to be assured, 
some further mechanism would be needed, such as 
Einstein’s procedure for ensuring simultaneity by the 
use of light signals from the two events to be 
received simultaneously at their midpoint on the 
ruler. Clearly this complicates the issue, and it turns 
out to be much preferable to concentrate on 
temporal displacements rather than spatial ones. 

The idea that spacetime geometry should really be 
regarded as “chronometry,” in this way, has been 
stressed by a number of distinguished expositors of 
relativity theory, most notably John L Synge (1956, 
1960) and Hermann Bondi (1961, 1964, 1967). 
Where needed, spatial displacements can then be 
defined by the use of temporal ones together with 
light signals. This has the additional advantage that 
in modern technology, the measurement of (proper) 
time far surpasses that of distance in accuracy, to the 
extent that the meter is now defined simply by the 
requirement that there are exactly 299792458 of 
them in a light-second! The proper time interval 
between two nearby events is, indeed, measured by a 
clock which encounters both events, moving iner- 
tially between the two, and very precise atomic and 
nuclear clocks are now a common feature of current 
technology. The physical role of the metric $g_{ab}$ is
most clearly seen in the formula

$$\tau = \int_p^q (g_{ab}\, dx^a\, dx^b)^{1/2}$$

which measures the (proper) time interval $\tau$ between
an event p and a later event q on its world line, the 
integral being taken along this curve, and where now 
that curve need not be a geodesic, so that accelerating 
(noninertial) motion of the clock is allowed. The 
metric (with choice of signature $+ - - -$ so that it is
the timelike displacements that are directly provided 
as real numbers) is very precisely specified by this 
physical requirement, and this tells us that the 
pseudo-Riemannian (Lorentzian) structure of spacetime
is far from being an arbitrary construction, but
is given to us by Nature with enormous precision.
(Some theorists prefer to use the alternative spacetime
signature $- + + +$, because this more directly relates
to familiar Newtonian concepts, these being normally
described in spatial terms. The difference is essentially
just a notational one, however. It may be remarked
that the 2-spinor formalism (see Spinors and
Spin Coefficients) fits in much more readily with the
$+ - - -$ signature being used here.) It may be noted
also that this time measure is ultimately fixed by
quantum principles and the masses of the elementary
ingredients involved (e.g., particle masses) via the
Einstein and Planck relations $E = mc^2$ and $E = h\nu$, so
that there is a natural frequency associated with a
given mass, via $\nu = mc^2/h$ (c being the speed of light
and h being Planck’s constant).
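
The role of the metric as a measure of proper time can be illustrated with a small numerical sketch in flat spacetime. It sums $(g_{ab}\, dx^a dx^b)^{1/2}$ over short steps of a world line, using the signature $+ - - -$ and units with $c = 1$, and compares an inertial clock with one that travels out and back. The world lines, the speed and the function names are hypothetical choices made for illustration.

```python
import math


def proper_time(worldline, c=1.0):
    """Sum sqrt(g_ab dx^a dx^b) over small steps of a world line in flat spacetime.

    `worldline` is a list of (t, x, y, z) events. Signature + - - - is used,
    so timelike steps have a positive squared interval.
    """
    total = 0.0
    for (t0, x0, y0, z0), (t1, x1, y1, z1) in zip(worldline, worldline[1:]):
        ds2 = (c * (t1 - t0)) ** 2 - (x1 - x0) ** 2 - (y1 - y0) ** 2 - (z1 - z0) ** 2
        total += math.sqrt(ds2) / c
    return total


def out_and_back(speed, half_time, steps=1000):
    """Hypothetical noninertial clock: outward at `speed`, then back at `speed`."""
    events = []
    for i in range(2 * steps + 1):
        t = half_time * i / steps
        x = speed * t if i <= steps else speed * (2 * half_time - t)
        events.append((t, x, 0.0, 0.0))
    return events


stay_at_home = proper_time([(0.0, 0.0, 0.0, 0.0), (10.0, 0.0, 0.0, 0.0)])
traveller = proper_time(out_and_back(speed=0.6, half_time=5.0))
print(f"inertial clock: {stay_at_home:.3f}, travelling clock: {traveller:.3f}")
```

Both clocks encounter the same initial and final events, but the travelling clock records less proper time (a factor $\sqrt{1 - 0.6^2} = 0.8$ here), illustrating that the integral applies to accelerated world lines as well as to geodesics.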


Riemann Curvature and Geodesic 
Deviation 


The unique torsion-free (Christoffel-Levi-Civita)
connection $\nabla_a$ is, via this physically determined
metric, also fixed accordingly by these physical
considerations, as is the notion of a geodesic, and
therefore so also is the curvature. The 20-independent-component
Riemann curvature tensor $R_{abcd}$ may be
defined by

$$(\nabla_a \nabla_b - \nabla_b \nabla_a) V^d = R_{abc}{}^{d} V^c$$


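with normal index-raising/lowering conventions, so
that $R_{abcd} = R_{abc}{}^{e} g_{ed}$, etc., and we have the standard
classical formula

$$R_{abc}{}^{d} = \partial_a \Gamma_{bc}{}^{d} - \partial_b \Gamma_{ac}{}^{d} + \Gamma_{bc}{}^{e} \Gamma_{ae}{}^{d} - \Gamma_{ac}{}^{e} \Gamma_{be}{}^{d}$$

The symmetries $R_{abcd} = R_{cdab} = -R_{bacd}$, $R_{abcd} + R_{bcad} + R_{cabd} = 0$
reduce the number of independent components
of $R_{abcd}$ to 20 (from a potential $4^4 = 256$).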
Of these, 10 are locally fixed by the kind of 
physical requirement indicated above, that in order 
to express something that agrees closely with New- 
ton’s inverse-square law we require that there should 
be a net inward curving of free world lines (the 
timelike geodesics that represent local inertial 
motions, or “free fall” under gravity). Let us see 
how this requirement is satisfied in Einstein’s general 
relativity. 
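
The count of 20 can be reproduced by a short combinatorial argument: antisymmetry in each index pair leaves $N = n(n-1)/2$ values per pair, symmetry under exchange of the two pairs leaves $N(N+1)/2$ combinations, and the cyclic identity removes one further constraint for each choice of four distinct indices. The sketch below carries out this count for a space of dimension n; the function name is illustrative only.

```python
from math import comb


def riemann_independent_components(n):
    """Independent components of the Riemann tensor in n dimensions."""
    pairs = n * (n - 1) // 2                        # antisymmetric index pairs
    symmetric_in_pairs = pairs * (pairs + 1) // 2   # R_abcd = R_cdab
    cyclic_constraints = comb(n, 4)                 # R_abcd + R_bcad + R_cabd = 0
    return symmetric_in_pairs - cyclic_constraints


counts = {n: riemann_independent_components(n) for n in (2, 3, 4)}
print(counts)
```

This gives a single number in two dimensions (the Gaussian curvature) and 20 in four dimensions, in agreement with the closed form $n^2(n^2 - 1)/12$.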

What we find, from Newton’s theory, is that a 
system of test particles which, at some initial time 
constitutes a closed 2-surface at rest surrounding 
some gravitating matter, will begin to accelerate in 
such a way that the volume surrounded is initially 
reduced in proportion to the total mass surrounded. 
This volume reduction is a direct consequence of
Poisson’s equation $\nabla^2 \Phi = -4\pi G \rho$ ($\Phi$ being the gravitational
potential and $\rho$ the mass density) and of
Newton’s second law, which tell us that the second 
time derivative of the free-fall volume of our initially 
stationary closed surface of test particles is indeed 
$-4\pi GM$, where M is the total gravitating mass
surrounded (and G is Newton’s constant, as above). 
In Einstein’s theory, we can basically carry this over 
to our four-dimensional Lorentzian spacetime. 
We do, however, find that such a general statement 
as this does not exactly hold. Instead of referring to 
3-volumes of any size, we must restrict attention to 
infinitesimal volumes. 
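
For the special case of a spherical shell of test particles, the Newtonian statement can be verified directly. The following short derivation is a sketch under simplifying assumptions: the shell has radius $R(t)$, is initially at rest, and surrounds a spherically symmetric mass $M$, so that each particle has inward acceleration $GM/R^2$.

```latex
\begin{align}
V(t) &= \tfrac{4}{3}\pi R(t)^3, \qquad \dot{R}(0) = 0, \qquad
        \ddot{R}(0) = -\frac{GM}{R^2}, \\
\ddot{V} &= 4\pi R^2 \ddot{R} + 8\pi R \dot{R}^2, \\
\ddot{V}(0) &= 4\pi R^2 \left( -\frac{GM}{R^2} \right) = -4\pi G M .
\end{align}
```

The radius of the shell cancels, so the initial volume acceleration depends only on the total mass surrounded.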

The basic mathematical tool is the equation of 
“geodesic deviation,” namely the “Jacobi equation”: 


$$D^2 u^d = R_{abc}{}^{d}\, t^a u^b t^c$$

where D describes “propagation derivative”

$$D = t^a \nabla_a$$

along a timelike geodesic $\gamma$, where $t^a$ is a unit
timelike tangent vector to $\gamma$ (so $t_a t^a = g_{ab} t^a t^b = 1$)
which is (consequently) parallel-propagated along $\gamma$,

$$D t^a = 0$$


(When acting on a scalar quantity defined along $\gamma$,
we can read “D” as “$d/d\tau$,” where $\tau$ measures
proper time along $\gamma$.) The vector $u^a$ is what is called
a connecting vector between the geodesic $\gamma$ and
some “neighboring geodesic” $\gamma'$. We think of the
vector $u^a$ as “connecting” a point p on $\gamma$ to some
neighboring point p′ on $\gamma'$, where it is usual to take
$u^a$ to be orthogonal to $t^a$ (i.e., $u_a t^a = 0$). The
derivative $D u^a$ measures the rate of change of $u^a$,
as p and p′ move together into the future along $\gamma$.
Mathematically, we express this as the vanishing of
the Lie derivative of $u^a$ with respect to $t^a$ (with $t^a$
extended to a unit vector field which is tangent both
to $\gamma$ and to $\gamma'$). By taking three independent vectors $u^a$
at p, we can form a spatial 3-volume element W and
investigate how this propagates along $\gamma$. We find

$$D^2 W = W R_{ab}\, t^a t^b$$

where the Ricci tensor $R_{ab}$ ($= R_{ba}$) is here defined by

$$R_{ab} = R_{acb}{}^{c}$$


The Einstein Field Equations 


In view of what has been said above, with regard to 
the way that the acceleration of volume behaves in 
Newtonian theory, it would be natural to “identify” 
$R_{ab} t^a t^b$ with ($-4\pi G \times$) the (active gravitational) 
mass density, with respect to the time direction $t^a$. 
In (special) relativity theory we expect 
to identify mass density with $c^{-2} \times$ energy density 
(by $E = mc^2$) and to take energy density as just one 
component (the time-time component) of a symmetric 
tensor $T_{ab}$, called the “energy tensor,” and 
for simplicity we now take $c = 1$. The tensor 
quantity $T_{ab}$ is to incorporate the contributions to 
the local mass/energy density of all particles and 
fields other than gravity itself. Since we would 
require this to work for all choices of time-direction 
$t^a$, it would be natural, accordingly, to 
make the identification 


$$R_{ab} = -4\pi G\, T_{ab}$$


Indeed, this was Einstein’s initial choice for a 
gravitational field equation. However, this will 
actually not do, as Einstein later realized. The 
trouble comes from the Bianchi identity 


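$$\nabla_a R_{bcd}{}^{e} + \nabla_b R_{cad}{}^{e} + \nabla_c R_{abd}{}^{e} = 0$$
from which we deduce 
$$\nabla^a \left( R_{ab} - \tfrac{1}{2} R g_{ab} \right) = 0$$
where 
$$R = R_a{}^{a}.$$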


This causes trouble in connection with the standard 
requirement on the energy tensor, that it satisfy the 
local “conservation law” 


$$\nabla^a T_{ab} = 0$$


The latter equation is an essential requirement in special 
relativity, since it expresses the conservation of energy 
and momentum for fields in flat spacetime. In standard 
Minkowski coordinates, each of $T_{a0}$, $T_{a1}$, $T_{a2}$, $T_{a3}$ 
satisfies an equation just like the $\nabla^a J_a = 0$ of the 
charge-current vector $J_a$ of Maxwell’s theory of 
electromagnetism, with now $\nabla_a = \partial_a = \partial/\partial x^a$, which 
expresses global conservation of charge. Similarly to the 
way that $J_a$ encapsulates density and flux of electric 
charge, $T_{a0}$ encapsulates density and flux of energy, and 
$T_{a1}$, $T_{a2}$, $T_{a3}$ encapsulate the same for the three 
components of momentum. So the equation 
$\nabla^a T_{ab} = 0$ is essential in special relativity, for similarly 
expressing global conservation of energy and momentum. 
We find (referring to a local inertial frame) that, 
when we pass to general relativity, this equation should 
still hold, with $\nabla_a$ now standing for covariant 
derivative. But the initially proposed field equation 
$R_{ab} = -4\pi G T_{ab}$ would now give us $\nabla^a R_{ab} = 0$, which 
combined with the geometrically necessary 
$\nabla^a (R_{ab} - \frac{1}{2} R g_{ab}) = 0$, tells us that $R$ is constant. In turn this 
implies the physically unacceptable requirement that 
$T = T_a{}^{a}$ is constant (since we have $R = -4\pi G T$). 
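
The step from the two divergence conditions to the constancy of $R$ can be spelled out as follows. This is only a short sketch that uses the equations quoted above, in the same conventions, together with the metric compatibility of the covariant derivative.

```latex
\begin{align*}
0 &= \nabla^a\!\left(R_{ab} - \tfrac{1}{2} R\, g_{ab}\right)
   = \nabla^a R_{ab} - \tfrac{1}{2}\nabla_b R
   && \text{(contracted Bianchi identity, } \nabla_c g_{ab} = 0\text{)} \\
\nabla^a R_{ab} &= -4\pi G\, \nabla^a T_{ab} = 0
   && \text{(proposed field equation and conservation law)} \\
\Rightarrow\quad \nabla_b R &= 0,
   \qquad R = g^{ab} R_{ab} = -4\pi G\, T
   \quad\Rightarrow\quad \nabla_b T = 0 .
\end{align*}
```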


Einstein eventually became convinced (by 1915) 
of the modified field equations 


$$R_{ab} - \tfrac{1}{2} R g_{ab} = -8\pi G\, T_{ab}$$


(the “8” rather than “4” being now needed to fit in 
with the Newtonian limit) and it is these that are 
now commonly referred to as “Einstein’s field 
equations.” (Some authors prefer to use the singular 
form “field equation,” especially if the formula is to 
be read as an abstract-index expression rather than a 
family of component equations, since the tensors 
involved are really single entities.) It may be noted 
that the formula can be rewritten as 


$$R_{ab} = -8\pi G \left( T_{ab} - \tfrac{1}{2} T g_{ab} \right)$$


from which we deduce that in Einstein’s theory the 
source of gravity is not simply the mass (or equivalently 
energy) density, but there is an additional contribution 
from the pressure (momentum flux, i.e., space-space 
components of $T_{ab}$). This can have significant implications 
for the instability of very large and massive stars in 
highly relativistic regimes, where increases in pressure 
can, paradoxically, actually increase the tendency for a 
star to collapse, owing to its contribution to the 
attractive effect of its gravity. 

In 1917, Einstein put forward a slight modifica- 
tion of his field equations — basically the only 
modification that can be made without fundamen- 
tally changing the foundations of his theory — by 
introducing the very tiny cosmological constant $\Lambda$. 
The modified equations are 


$$R_{ab} - \tfrac{1}{2} R g_{ab} + \Lambda g_{ab} = -8\pi G\, T_{ab}$$


and the source of gravity, or active gravitational 
mass is now 


$$\rho + P_1 + P_2 + P_3 - \frac{\Lambda}{4\pi G}$$


where (with respect to a local Lorentzian orthonormal 
frame, units being chosen so that $c = 1$) $\rho = T_{00}$ is the 
mass/energy density and $P_1 = T_{11}$, $P_2 = T_{22}$, $P_3 = T_{33}$ 
are the principal pressures. The $\Lambda$-term, for positive $\Lambda$, 
provides a repulsive contribution to the gravitational 
effect, but it is extremely tiny (and totally ignorable) on 
all ordinary scales, beginning to show itself only at the 
most vast of observed cosmological distances (since the 
effect of $\Lambda$ adds up relentlessly at larger and larger 
distances). Einstein originally introduced the term in 
order to have the possibility of a static universe, where 
the attractive gravitational effect of the totality of 
ordinary matter would be balanced, overall, by $\Lambda$. But 
the discovery of the expansion of the universe (by 
Hubble and others) led Einstein to abandon the 
cosmological term. However, since 1998 (initially 
from the supernova observations of Brian Schmidt and 
Robert Kirshner, and Saul Perlmutter, see Perlmutter 
et al. (1998)), cosmological evidence has mounted in 
favor of the presence of a very small positive $\Lambda$-term, 
which has resulted in the expansion of the universe 
beginning to accelerate. While the presence of Einstein’s 
constant $\Lambda$-term is consistent with observations, and 
remains the simplest explanation of this observed 
acceleration, many cosmologists prefer to allow for 
what would amount to a “varying $\Lambda$,” and refer to it as 
“dark energy.” 
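
The expression for the active gravitational mass can be checked directly from the field equations with $\Lambda$. The sketch below assumes metric signature $(+,-,-,-)$ and a local orthonormal frame in which $T_{ab}$ is diagonal, so that $g_{00} = 1$ and $T = T_a{}^{a} = \rho - P_1 - P_2 - P_3$; these conventions are an assumption made here, chosen to agree with $t_a t^a = 1$ for the unit timelike vector $t^a$.

```latex
\begin{align*}
R_{ab} - \tfrac{1}{2} R g_{ab} + \Lambda g_{ab} &= -8\pi G\, T_{ab}
  \quad\Rightarrow\quad -R + 4\Lambda = -8\pi G\, T
  \quad \text{(trace, } g_a{}^{a} = 4\text{)} \\
R_{ab} &= -8\pi G \left( T_{ab} - \tfrac{1}{2} T g_{ab} \right) + \Lambda g_{ab} \\
R_{00} &= -8\pi G \left( \rho - \tfrac{1}{2}(\rho - P_1 - P_2 - P_3) \right) + \Lambda \\
       &= -4\pi G \left( \rho + P_1 + P_2 + P_3 - \frac{\Lambda}{4\pi G} \right)
\end{align*}
```

Comparing with the identification of $R_{ab} t^a t^b$ with $-4\pi G$ times the active mass density reproduces the quoted source term; setting $\Lambda = 0$ gives the pressure contribution discussed earlier.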


Energy Conservation and Related Matters 


One of the features of Einstein’s general relativity 
theory that had been deeply puzzling to a good many of 
Einstein’s contemporaries, and which may be said to be 
still not fully resolved, even today, is “energy conserva- 
tion,” in the presence of a dynamical gravitational field. 
We have noted that the energy tensor $T_{ab}$ is to 
incorporate the contributions of all particles and fields 
other than gravity. But what about gravity itself? There 
are many physical situations in which energy can be 
transferred back and forth from gravitational systems 
to nongravitational ones (most strikingly in the example 
of the emission of gravitational waves; see Gravitational 
Waves). The conservation of energy would make no 
sense without an understanding of how energy can be 
stored in a gravitational field. At first sight we seem to 
see no role for a gravitational contribution to energy in 
Einstein’s theory, since the conservation law $\nabla^a T_{ab} = 0$ 
seems to be a self-contained expression of energy 
conservation with no direct contribution from the 
gravitational field in the tensor $T_{ab}$. However, this is 
illusory, since the formulation of a global conservation 
law from the local covariant expression $\nabla^a T_{ab} = 0$ does 
not work in curved spacetime (basically because, unlike 
the charge-current quantity $J_a$ of Maxwell’s 
electrodynamical theory, the extra index on $T_{ab}$ prevents it 
from being regarded as a 1-form). We may take the view 
that the energy of gravitation enters nonlocally into the 
equation, so that the failure of $T_{ab}$ to provide a global 
conservation law on its own is an expression of the 
gravitational contributions of energy not being taken 
into account. This is no doubt a correct attitude to take, 
but it is a difficult one to express comprehensively in a 
mathematical form. Einstein himself provided a partial 
understanding, but at the expense of introducing 
concepts known as “pseudotensors” whose meaning 
was too tied up with arbitrary choices of coordinate 
systems to provide an overall picture. In modern 
approaches, the most clear-cut results come from the 
study of asymptotically flat or asymptotically de Sitter 
spacetimes (de Sitter space being the empty universe 
which takes over the role of Minkowski space when 
there is a positive cosmological constant $\Lambda$; see 
Cosmology: Mathematical Aspects). 
The important role of the “Weyl conformal tensor” 


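$$C_{abcd} = R_{abcd} - \tfrac{1}{2} \left( R_{ac} g_{bd} - R_{bc} g_{ad} + R_{bd} g_{ac} - R_{ad} g_{bc} \right) + \tfrac{1}{6} R \left( g_{ac} g_{bd} - g_{bc} g_{ad} \right)$$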


should also be pointed out. This tensor retains all 
the symmetries of the full Riemann tensor, but has 
the Ricci tensor contribution removed, so that all its 
contractions vanish, as is exemplified by 


$$C_{abc}{}^{b} = 0$$


It describes the conformal part of the curvature, that 
is, that part that survives under conformal rescalings 
of the metric; 


$$g_{ab} \mapsto \Omega^2 g_{ab}$$


where $\Omega$ is a smooth (positive) function of position. The 
tensor $C_{abc}{}^{d}$ is itself invariant under these conformal 
rescalings. This has importance in the asymptotic 
analysis of gravitational fields (see Asymptotic Structure 
and Conformal Infinity). We may take the view 
that $C_{abcd}$ describes the degrees of freedom in the free 
gravitational field, whereas $R_{ab}$ contains the information 
of the sources of gravity. This is analogous to the 
Maxwell tensor $F_{ab}$ describing the degrees of freedom 
in the free electromagnetic field, whereas $J_a$ contains 
the information of the sources of electromagnetism. 
From the observational point of view, general 
relativity stands in excellent shape, with full agreement 
with all known relevant data, starting with the 
anomalous perihelion advance of the planet Mercury 
observed by LeVerrier in the mid-nineteenth century, 
through clock-slowing, light-bending (lensing) and 
time-delay effects, and the necessary corrections to 
GPS positioning systems, to the precise orbiting of 
double neutron-star systems, with energy loss due to the 
emission of gravitational waves. The effects of gravita- 
tional lensing now play vital roles in modern cosmology. 


To get some idea of the precision in Einstein’s theory, we 
may take note of the fact that the double neutron-star 
system PSR 1913+16 has been observed for some 
30 years, and the agreement between observation and 
theory overall is to about one part in $10^{14}$. 


See also: Asymptotic Structure and Conformal Infinity; 
Canonical General Relativity; Computational Methods in 
General Relativity: the Theory; Cosmology: Mathematical 
Aspects; Einstein Equations: Exact Solutions; Einstein 
Equations: Initial Value Formulation; Einstein—Cartan 
Theory; Einstein’s Equations with Matter; General 
Relativity: Experimental Tests; Geometric Flows and the 
Penrose Inequality; Gravitational Lensing; Gravitational 
Waves; Hamiltonian Reduction of Einstein’s Equations; 
Lorentzian Geometry; Newtonian Limit of General 
Relativity; Noncommutative geometry and the Standard 
Model; Spacetime Topology, Causal Structure and 
Singularities; Spinors and Spin Coefficients; Symmetries 
and Conservation Laws; Twistor Theory: Some 
Applications [in Integrable Systems, Complex Geometry 
and String Theory]. 
Generic Properties of Dynamical Systems 


Introduction 


The state of a concrete system (from physics, 
chemistry, ecology, or other sciences) is described 
using (finitely many, say $n$) observable quantities 
(e.g., positions and velocities for mechanical 
systems, population densities for ecological 
systems, etc.). Hence, the state of a system may be 
represented as a point $x$ in a geometrical space $\mathbb{R}^n$. 
In many cases, the quantities describing the state are 
related, so that the phase space (space of all possible 
states) is a submanifold $M \subset \mathbb{R}^n$. The time evolution 
of the system is represented by a curve $x_t$, $t \in \mathbb{R}$, 
drawn on the phase space $M$, or by a sequence $x_n \in M$, 
$n \in \mathbb{Z}$, if we consider discrete time (i.e., every 
day at the same time, or every January 1st). 
If we believe in determinism and the system is 
isolated from external influences, the state $x_0$ of the 
system at the present time determines its evolution. 
For continuous-time systems, the infinitesimal 
evolution is given by a differential equation or vector 
field $dx/dt = X(x)$; the vector $X(x)$ represents velocity 
and direction of the evolution. For a discrete-time 
system, the evolution rule is a function $F\colon M \to M$; if 
$x$ is the state at time $t$, then $F(x)$ is the state at the 
time $t + 1$. The evolution of the system, starting at 
the initial data $x_0$, is described by the orbit of $x_0$, that 
is, the sequence $\{(x_n)_{n \in \mathbb{Z}} \mid x_{n+1} = F(x_n)\}$ (discrete 
time) or the maximal solution $x_t$ of the differential 
equation $dx/dt = X(x)$ (continuous time). 
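
The discrete-time orbit described above can be sketched in a few lines of Python. This is only an illustration: the function names `orbit_segment` and `euler_segment` and the example rules are hypothetical and not taken from the text, and the forward Euler step is merely a crude stand-in for the maximal solution of $dx/dt = X(x)$.

```python
from typing import Callable, List


def orbit_segment(F: Callable[[float], float], x0: float, n: int) -> List[float]:
    """Return the finite orbit segment [x0, F(x0), ..., F^n(x0)]."""
    points = [x0]
    for _ in range(n):
        points.append(F(points[-1]))
    return points


def euler_segment(X: Callable[[float], float], x0: float, dt: float, n: int) -> List[float]:
    """Approximate the solution of dx/dt = X(x) by n forward Euler steps.

    Assumption: forward Euler is used only for illustration; it is a
    first-order approximation, not the exact flow of the vector field.
    """
    return orbit_segment(lambda x: x + dt * X(x), x0, n)


if __name__ == "__main__":
    # Hypothetical discrete-time rule: the contraction F(x) = x / 2.
    print(orbit_segment(lambda x: x / 2.0, 1.0, 3))
    # Hypothetical vector field X(x) = -x, integrated with step dt = 0.5.
    print(euler_segment(lambda x: -x, 1.0, 0.5, 3))
```

Only forward iterates are computed here; for a diffeomorphism the backward half of the orbit would be obtained in the same way from $F^{-1}$.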


General problem Knowing the initial data and the 
infinitesimal evolution rule, what can we tell about 
the long-time evolution of the system? 


The dynamics of a dynamical system (differential 
equation or function) is the behavior of the orbits, 
when the time tends to infinity. The aim of 
“dynamical systems” is to produce a general 
procedure for describing the dynamics of any 
system. For example, Conley’s theory presented in 
the next section organizes the global dynamics of a 
general system using regions concentrating the orbit 
accumulation and recurrence, and splits these regions 
into elementary pieces: the chain recurrence classes. 

We focus our study on $C^r$-diffeomorphisms $F$ (i.e., $F$ 
and $F^{-1}$ are $r$ times continuously differentiable) on a 
compact smooth manifold $M$ (most of the notions and 
results presented here also hold for vector fields). Even 
for very regular systems ($F$ algebraic) of a low-dimensional 
space ($\dim(M) = 2$), the dynamics may 
be chaotic and very unstable: one cannot hope for a 
precise description of all systems. Furthermore, neither 
the initial data of a concrete system nor the 
infinitesimal-evolution rule is known exactly: fragile properties 
describe the evolution of the theoretical model, and 
not of the real system. For these reasons, we are mostly 
interested in properties that are persistent, in some 
sense, under small perturbations of the dynamical system. 
The notion of small perturbations of the system 
requires a topology on the space $\mathrm{Diff}^r(M)$ of 
$C^r$-diffeomorphisms: two diffeomorphisms are close for 
the $C^r$-topology if all their partial derivatives of order 
$\le r$ are close at each point of $M$. Endowed with this 
topology, $\mathrm{Diff}^r(M)$ is a complete metric space. 

The open and dense subsets of $\mathrm{Diff}^r(M)$ provide the 
natural topological notion of “almost all” $F$. Genericity 
is a weaker notion: by Baire’s theorem, if $O_i$, $i \in \mathbb{N}$, are 
dense and open subsets, the intersection $\bigcap_{i \in \mathbb{N}} O_i$ is a 
dense subset. A subset is called residual if it contains 
such a countable intersection of dense open subsets. A 
property $P$ is generic if it is verified on a residual 
subset. By a practical abuse of language, one says: 


“$C^r$-generic diffeomorphisms verify $P$” 


A countable intersection of residual sets is a residual 
set. Hence, if $\{P_i\}$, $i \in \mathbb{N}$, is a countable family of 
generic properties, generic diffeomorphisms verify 
simultaneously all the properties $P_i$. 

A property $P$ is $C^r$-robust if the set of diffeomorphisms 
verifying $P$ is open in $\mathrm{Diff}^r(M)$. A 
property $P$ is locally generic if there is a (nonempty) 
open set $O$ on which it is generic, that is, there is 
a residual set $R$ such that $P$ is verified on $R \cap O$. 

The properties of generic dynamical systems 
depend mostly on the dimension of the manifold $M$ 
and on the $C^r$-topology considered, $r \in \mathbb{N} \cup \{+\infty\}$ 
(an important problem is that $C^r$-generic 
diffeomorphisms are not $C^{r+1}$): 


• On very low dimensional spaces (diffeomorphisms of 
the circle and vector fields on compact surfaces) the 
dynamics of generic systems (indeed in an open and 
dense subset of systems) is very simple (called 
Morse-Smale) and well understood; see the subsection 
“Generic properties of the low-dimensional 
systems.” 

• In higher dimensions, for $C^r$-topology, $r > 1$, one 
has generic and locally generic properties related 
to the periodic orbits, like the Kupka-Smale 
property (see the subsection “Kupka-Smale theorem”) 
and the Newhouse phenomenon (see the 
subsection “Local $C^2$-genericity of wild behavior 
for surface diffeomorphisms”). However, we still 
do not know if the dynamics of $C^r$-generic 
diffeomorphisms is well approached by their 
periodic orbits, so that one is still far from a 
global understanding of $C^r$-generic dynamics. 

• For the $C^1$-topology, perturbation lemmas show that 
the global dynamics is very well approximated by 
periodic orbits (see the section “$C^1$-generic systems: 
global dynamics and periodic orbits”). One then 
divides generic systems into “tame” systems, with a 
global dynamics analogous to hyperbolic dynamics, 
and “wild” systems, which present infinitely many 
dynamically independent regions. The notion of 
dominated splitting (see the section “Hyperbolic 
properties of $C^1$-generic diffeomorphisms”) seems 
to play an important role in this division. 


Results on General Systems 
Notions of Recurrence 


Some regions of M are considered as the heart of the 
dynamics: 


• $\mathrm{Per}(F)$ denotes the set of periodic points $x \in M$ of 
$F$, that is, $F^n(x) = x$ for some $n > 0$. 

• A point $x$ is recurrent if its orbit comes back 
arbitrarily close to $x$, infinitely many times. 
$\mathrm{Rec}(F)$ denotes the set of recurrent points. 

• The limit set $\mathrm{Lim}(F)$ is the union of all the 
accumulation points of all the orbits of $F$. 

• A point $x$ is “wandering” if it admits a neighborhood 
$U_x \subset M$ disjoint from all its iterates 
$F^n(U_x)$, $n > 0$. The nonwandering set $\Omega(F)$ is the 
set of the nonwandering points. 

• $\mathcal{R}(F)$ is the set of chain recurrent points, that is, 
points $x \in M$ which look like periodic points if we 
allow small mistakes at each iteration: for any 
$\varepsilon > 0$, there is a sequence $x = x_0, x_1, \dots, x_k = x$ 
where $d(F(x_i), x_{i+1}) < \varepsilon$ (such a sequence is an 
$\varepsilon$-pseudo-orbit). 


A periodic point is recurrent, a recurrent point is a 
limit point, a limit point is nonwandering, and a 
nonwandering point is chain recurrent: 


$$\mathrm{Per}(F) \subset \mathrm{Rec}(F) \subset \mathrm{Lim}(F) \subset \Omega(F) \subset \mathcal{R}(F)$$


All these sets are invariant under $F$, and $\Omega(F)$ and 
$\mathcal{R}(F)$ are compact subsets of $M$. There are 
diffeomorphisms $F$ for which the closures of these sets are 
distinct: 


• A rotation $x \mapsto x + \alpha$ with irrational angle 
$\alpha \in \mathbb{R} \setminus \mathbb{Q}$ on the circle $S^1 = \mathbb{R}/\mathbb{Z}$ has no periodic 
points but every point is recurrent. 

• The map $x \mapsto x + (1/4\pi)(1 + \cos(2\pi x))$ induces 
on the circle $S^1$ a diffeomorphism $F$ having a 
unique fixed point at $x = 1/2$; one verifies that 
$\Omega(F) = \{1/2\}$ and $\mathcal{R}(F)$ is the whole circle $S^1$. 
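
The claim that every point of the circle is chain recurrent for the second map can be illustrated numerically. The sketch below is a hedged illustration, not part of the text: all function names are hypothetical, the map is handled through its lift to the real line, and the inverse is computed by bisection. It follows the true forward orbit of $x_0$ until it is within $\varepsilon/2$ of the fixed point, makes a single jump across the fixed point, and then follows a true orbit that ends exactly at $x_0$.

```python
import math

FIXED = 0.5  # fixed point of the circle map (one of its lifts to the real line)


def G(y: float) -> float:
    """Lift to the real line of x -> x + (1/(4*pi)) * (1 + cos(2*pi*x))."""
    return y + (1.0 + math.cos(2.0 * math.pi * y)) / (4.0 * math.pi)


def G_inv(t: float, iterations: int = 80) -> float:
    """Inverse of G by bisection; G is strictly increasing since G' >= 1/2."""
    lo, hi = t - 0.2, t  # 0 <= G(y) - y <= 1/(2*pi) < 0.2, so G(lo) < t <= G(hi)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if G(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def circle_dist(a: float, b: float) -> float:
    """Distance between the classes of a and b in R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def is_pseudo_orbit(points, eps: float) -> bool:
    """Check d(F(x_i), x_{i+1}) < eps for every consecutive pair."""
    return all(circle_dist(G(p), q) < eps for p, q in zip(points, points[1:]))


def closed_pseudo_orbit(x0: float, eps: float):
    """Build an eps-pseudo-orbit from x0 back to x0 (as points of the lift).

    Assumes 0 <= x0 < 1 and x0 != 1/2; the loops become slow when x0 is
    very close to 1/2, because the fixed point is only weakly repelling.
    """
    start = x0 if x0 < FIXED else x0 - 1.0
    forward = [start]
    while FIXED - forward[-1] >= eps / 2.0:      # true forward orbit of x0
        forward.append(G(forward[-1]))
    backward = [start + 1.0]                     # same circle point as x0
    while backward[-1] - FIXED >= eps / 2.0:     # true backward orbit of x0
        backward.append(G_inv(backward[-1]))
    return forward + backward[::-1]              # one jump across the fixed point


if __name__ == "__main__":
    pts = closed_pseudo_orbit(0.1, 0.01)
    print(len(pts), is_pseudo_orbit(pts, 0.01))
```

The true orbit of $x_0$ never returns to $x_0$, since every point other than $1/2$ is pushed forward toward the fixed point; it is the single jump of size less than $\varepsilon$ that closes the sequence.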


An invariant compact set $K \subset M$ is transitive if there 
is $x \in K$ whose forward orbit is dense in $K$. Generic 
points $x \in K$ have their forward and backward 
orbits dense in $K$: in this sense, transitive sets are 
dynamically indecomposable. 


Conley’s Theory: Pairs Attractor/Repeller and 
Chain Recurrence Classes 


A trapping region $U \subset M$ is a compact set whose 
image $F(U)$ is contained in the interior of $U$. By 
definition, the intersection $A = \bigcap_{n \ge 0} F^n(U)$ is an 
attractor of $F$: any orbit in $U$ “goes to $A$.” Denote by 
$V$ the complement of the interior of $U$: it is a trapping 
region for $F^{-1}$ and the intersection $R = \bigcap_{n \ge 0} F^{-n}(V)$ is 
a repeller. Each orbit either is contained in $A \cup R$, or 
“goes from the repeller to the attractor.” More 
precisely, there is a smooth function $\psi\colon M \to [0,1]$ 
(called Lyapunov function) equal to 1 on $R$ and 0 on $A$, 
and strictly decreasing on the other orbits: 


$$\psi(F(x)) < \psi(x) \quad \text{for } x \notin A \cup R$$


So, the chain recurrent set is contained in $A \cup R$. 
Any compact set contained in $U$ and containing the 
interior of $F(U)$ is a trapping region inducing the 
same attractor/repeller pair $(A, R)$; hence, the set 
of attractor/repeller pairs is countable. We denote by 
$(A_i, R_i, \psi_i)$, $i \in \mathbb{N}$, the family of these pairs endowed 
with an associated Lyapunov function. Conley 
(1978) proved that 


$$\mathcal{R}(F) = \bigcap_{i \in \mathbb{N}} (A_i \cup R_i)$$

This induces a natural partition of $\mathcal{R}(F)$ in equivalence 
classes: $x \sim y$ if, for every $i$, $x \in A_i \Leftrightarrow y \in A_i$. Conley proved 
that $x \sim y$ iff, for any $\varepsilon > 0$, there are $\varepsilon$-pseudo-orbits 
from $x$ to $y$ and vice versa. The equivalence classes 
for $\sim$ are called chain recurrence classes. 

Now, considering an average of the Lyapunov 
functions $\psi_i$ one gets the following result: there is a 
continuous function $\varphi\colon M \to \mathbb{R}$ with the following 
properties: 


• $\varphi(F(x)) \le \varphi(x)$ for every $x \in M$ (i.e., $\varphi$ is a 
Lyapunov function); 

• $\varphi(F(x)) = \varphi(x) \Leftrightarrow x \in \mathcal{R}(F)$; 

• for $x, y \in \mathcal{R}(F)$, $\varphi(x) = \varphi(y) \Leftrightarrow x \sim y$; and 

• the image $\varphi(\mathcal{R}(F))$ is a compact subset of $\mathbb{R}$ with 
empty interior. 


This result is called the “fundamental theorem of 
dynamical systems” by several authors (see 
Robinson (1999)). 

Any orbit is $\varphi$-decreasing from a chain recurrence 
class to another chain recurrence class (the global 
dynamics of $F$ looks like the dynamics of the 
gradient flow of a function $\varphi$, the chain recurrence 
classes supplying the singularities of $\varphi$). However, 
this description of the dynamics may be very rough: 
if $F$ preserves the volume, Poincaré’s recurrence 
theorem implies that $\Omega(F) = \mathcal{R}(F) = M$; the whole $M$ 
is the unique chain recurrence class and the function 
$\varphi$ of Conley’s theorem is constant. 

Conley’s theory provides a general procedure for 
describing the global topological dynamics of a 
system: one has to characterize the chain recurrence 
classes, the dynamics in restriction to each class, 
the stable set of each class (i.e., the set of points 
whose positive orbits go to the class), and the 
relative positions of these stable sets. 


Hyperbolicity 


Smale’s hyperbolic theory is the first attempt to give 
a global vision of almost all dynamical systems. In 
this section we give a very quick overview of this 
theory. For further details, see Hyperbolic Dynami- 
cal Systems. 


Hyperbolic Periodic Orbits 


A fixed point $x$ of $F$ is hyperbolic if the derivative 
$DF(x)$ has no (neither real nor complex) eigenvalue 
with modulus equal to 1. The tangent space at $x$ 
splits as $T_x M = E^s \oplus E^u$, where $E^s$ and $E^u$ are the 
$DF(x)$-invariant spaces corresponding to the eigenvalues 
of moduli $<1$ and $>1$, respectively. There are 
$C^r$-injectively immersed $F$-invariant submanifolds 
$W^s(x)$ and $W^u(x)$ tangent at $x$ to $E^s$ and $E^u$; the 
stable manifold $W^s(x)$ is the set of points $y$ whose 
forward orbit goes to $x$. The implicit-function 
theorem implies that a hyperbolic fixed point $x$ 
varies (locally) continuously with $F$; (compact parts 
of) the stable and unstable manifolds vary continuously 
for the $C^r$-topology when $F$ varies with the 
$C^r$-topology. 

A periodic point $x$ of period $n$ is hyperbolic if it is 
a hyperbolic fixed point of $F^n$ and its invariant 
manifolds are the corresponding invariant manifolds 
for $F^n$. The stable and unstable manifolds of the orbit 
of $x$, $W^s_{\mathrm{orb}}(x)$ and $W^u_{\mathrm{orb}}(x)$, are the unions of the 
invariant manifolds of the points in the orbit. 
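
As an illustration of the definition above, the following minimal sketch classifies a fixed point of a surface diffeomorphism from the $2 \times 2$ matrix of $DF(x)$. The function names, the tolerance `tol`, and the example matrices are assumptions introduced here, not part of the text; the labels sink, saddle, and source correspond to $\dim E^s = 2, 1, 0$.

```python
import cmath


def eigenvalues_2x2(a: float, b: float, c: float, d: float):
    """Eigenvalues of [[a, b], [c, d]] from the characteristic polynomial."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def classify_fixed_point(a: float, b: float, c: float, d: float, tol: float = 1e-9) -> str:
    """Classify a fixed point whose derivative has matrix [[a, b], [c, d]].

    Hyperbolic means that no eigenvalue has modulus 1; `tol` is a
    hypothetical numerical tolerance for that comparison.
    """
    moduli = [abs(ev) for ev in eigenvalues_2x2(a, b, c, d)]
    if any(abs(m - 1.0) <= tol for m in moduli):
        return "non-hyperbolic"
    stable_dim = sum(1 for m in moduli if m < 1.0)  # dimension of E^s
    return {2: "sink", 1: "saddle", 0: "source"}[stable_dim]


if __name__ == "__main__":
    print(classify_fixed_point(2.0, 1.0, 1.0, 1.0))   # linear map with one expanding direction
    print(classify_fixed_point(0.0, -1.0, 1.0, 0.0))  # rotation by a quarter turn
```

For a periodic point of period $n$ the same test would be applied to the derivative of $F^n$ at that point.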


Homoclinic Classes 


Distinct stable manifolds are always disjoint; however, 
stable and unstable manifolds may intersect. At 
the end of the nineteenth century, Poincaré noted 
that the existence of transverse homoclinic orbits, 
that is, transverse intersection of $W^s_{\mathrm{orb}}(x)$ with 
$W^u_{\mathrm{orb}}(x)$ (other than the orbit of $x$), implies a very 
rich dynamical behavior: indeed, Birkhoff proved 
that any transverse homoclinic point is accumulated 
by a sequence of periodic orbits (see Figure 1). The 
homoclinic class $H(x)$ of a periodic orbit is the 
closure of the transverse homoclinic points associated 
to $x$: 


$$H(x) = \overline{W^s_{\mathrm{orb}}(x) \pitchfork W^u_{\mathrm{orb}}(x)}$$


There is an equivalent definition of the homoclinic 
class of $x$: we say that two hyperbolic periodic 
points $x$ and $y$ are homoclinically related if $W^s_{\mathrm{orb}}(x)$ 
and $W^u_{\mathrm{orb}}(x)$ intersect transversally $W^u_{\mathrm{orb}}(y)$ and 
$W^s_{\mathrm{orb}}(y)$, respectively; this defines an equivalence 
relation in $\mathrm{Per}_{\mathrm{hyp}}(F)$ and the homoclinic classes are 
the closures of the equivalence classes. 

The homoclinic classes are transitive invariant 
compact sets canonically associated to the periodic 
orbits. However, for general systems, homoclinic 
classes are not necessarily disjoint. 
For more details, see Homoclinic Phenomena. 


Figure 1 A transverse homoclinic orbit. 


Smale’s Hyperbolic Theory 


A diffeomorphism $F$ is Morse-Smale if $\Omega(F) = \mathrm{Per}(F)$ 
is finite and hyperbolic, and if $W^s(x)$ is transverse to 
$W^u(y)$ for any $x, y \in \mathrm{Per}(F)$. Morse-Smale 
diffeomorphisms have a very simple dynamics, similar to 
that of the gradient flow of a Morse function; apart 
from periodic points and invariant manifolds of 
periodic saddles, each orbit goes from a source to a 
sink (hyperbolic periodic repellers and attractors). 
Furthermore, Morse-Smale diffeomorphisms are 
$C^1$-structurally stable, that is, any diffeomorphism 
$C^1$-close to $F$ is conjugated to $F$ by a homeomorphism: 
the topological dynamics of $F$ remains unchanged by 
small $C^1$-perturbation. Morse-Smale vector fields 
were known (Andronov and Pontryagin, 1937) to 
characterize the structural stability of vector fields on 
the sphere $S^2$. However, a diffeomorphism having 
transverse homoclinic intersections is robustly not 
Morse-Smale, so that Morse-Smale diffeomorphisms 
are not $C^r$-dense, on any compact manifold of 
dimension $\ge 2$. In the early 1960s, Smale generalized 
the notion of hyperbolicity for nonperiodic sets in 
order to get a model for homoclinic orbits. The goal of 
the theory was to cover a whole dense open set of all 
dynamical systems. 

An invariant compact set $K$ is hyperbolic if the 
tangent space $TM|_K$ of $M$ over $K$ splits as the direct 
sum $TM|_K = E^s \oplus E^u$ of two $DF$-invariant vector 
bundles, where the vectors in $E^s$ and $E^u$ are 
uniformly contracted and expanded, respectively, 
by $F^n$, for some $n > 0$. Hyperbolic sets persist 
under small $C^1$-perturbations of the dynamics: any 
diffeomorphism $G$ which is $C^1$-close enough to $F$ 
admits a hyperbolic compact set $K_G$ close to $K$ and 
the restrictions of $F$ and $G$ to $K$ and $K_G$ are 
conjugated by a homeomorphism close to the 
identity. Hyperbolic compact sets have well-defined 
invariant (stable and unstable) manifolds, 
tangent (at the points of $K$) to $E^s$ and $E^u$, and the 
(local) invariant manifolds of $K_G$ vary locally 
continuously with $G$. 

The existence of hyperbolic sets is very common: 
if y is a transverse homoclinic point associated to a 
hyperbolic periodic point x, then there is a transitive 
hyperbolic set containing x and y. 

Diffeomorphisms for which $\mathcal{R}(F)$ is hyperbolic 
are now well understood: the chain recurrence 
classes are homoclinic classes, finitely many, and 
transitive, and admit a combinatorial model 
(subshift of finite type). Some of them are 
attractors or repellers, and the basins of the 
attractors cover a dense open subset of $M$. If, 
furthermore, all the stable and unstable manifolds 
of points in $\mathcal{R}(F)$ are transverse, the diffeomorphism 
is $C^1$-structurally stable (Robbin 1971, 
Robinson 1976); indeed, this condition, called 
“axiom A + strong transversality,” is equivalent 
to the $C^1$-structural stability (Mañé 1988). 

In 1970, Abraham and Smale built examples of 
robustly non-axiom A diffeomorphisms, when 
dim M > 3: the dream of a global understanding 
of dynamical systems was postponed. However, 
hyperbolicity remains a key tool in the study of 
dynamical systems, even for nonhyperbolic 
systems. 


$C^r$-Generic Systems 
Kupka-Smale Theorem 


Thom’s transversality theorem asserts that two 
submanifolds can always be put in transverse position 
by a $C^r$-small perturbation. Hence, for $F$ in an 
open and dense subset of $\mathrm{Diff}^r(M)$, $r \ge 1$, the graph 
of $F$ in $M \times M$ is transverse to the diagonal 
$\Delta = \{(x, x), x \in M\}$: $F$ has finitely many fixed points 
$x_i$, depending locally continuously on $F$, and 1 is not 
an eigenvalue of the differential $DF(x_i)$. Small local 
perturbations in the neighborhood of the $x_i$ avoid 
eigenvalues of modulus equal to 1: one gets a dense 
and open subset $\mathcal{O}^r_1$ of $\mathrm{Diff}^r(M)$ such that every fixed 
point is hyperbolic. This argument, adapted for 
periodic points, provides a dense and open set 
$\mathcal{O}^r_n \subset \mathrm{Diff}^r(M)$, such that every periodic point of period $n$ 
is hyperbolic. Now $\bigcap_{n \in \mathbb{N}} \mathcal{O}^r_n$ is a residual subset of 
$\mathrm{Diff}^r(M)$, for which every periodic point is 
hyperbolic. 

Similarly, the set of diffeomorphisms 
$F \in \bigcap_{i \le n} \mathcal{O}^r_i$ such that all the disks of size $n$, of 
invariant manifolds of periodic points of period less 
than $n$, are pairwise transverse, is open and dense. 
One gets the Kupka-Smale theorem (see Palis and de 
Melo (1982) for a detailed exposition): for $C^r$-generic 
diffeomorphisms $F \in \mathrm{Diff}^r(M)$, every periodic orbit is 
hyperbolic and $W^s(x)$ is transverse to $W^u(y)$ for 
$x, y \in \mathrm{Per}(F)$. 


Generic Properties of Low-Dimensional Systems 


Poincaré-Denjoy theory describes the topological 
dynamics of all diffeomorphisms of the circle $S^1$ (see 
Homeomorphisms and Diffeomorphisms of the 
Circle). Diffeomorphisms in an open and dense 
subset of $\mathrm{Diff}^r(S^1)$ have a nonempty finite set of 
periodic orbits, all hyperbolic, and alternately 
attracting (sink) or repelling (source). The orbit of 
a nonperiodic point comes from a source and goes 
to a sink. Two $C^r$-generic diffeomorphisms of $S^1$ are 
conjugated iff they have the same rotation number and 
the same number of periodic points. 

This simple behavior was generalized in 1962 
by Peixoto for vector fields on compact orientable 
surfaces $S$. Vector fields $X$ in a $C^r$-dense and open 
subset are Morse-Smale, hence structurally stable 
(see Palis and de Melo (1982) for a detailed proof). 
Peixoto gives a complete classification of these 
vector fields, up to topological equivalence. 

Peixoto’s argument uses the fact that the return 
maps of the vector field on transverse sections are 
increasing functions: this helped control the effect 
on the dynamics of small “monotonous” perturbations, 
and allowed him to destroy any nontrivial 
recurrences. Peixoto’s result remains true on 
nonorientable surfaces for the $C^1$-topology but remains 
an open question for $r > 1$: is the set of Morse-Smale 
vector fields $C^2$-dense, for $S$ a nonorientable 
closed surface? 


Local $C^2$-Genericity of Wild Behavior for Surface 
Diffeomorphisms 


The generic systems we have seen above have a very 
simple dynamics, simpler than the general systems. 
This is not always the case. In the 1970s, Newhouse 
exhibited a $C^2$-open set $O \subset \mathrm{Diff}^2(S^2)$ (where $S^2$ 
denotes the two-dimensional sphere), such that 
$C^2$-generic diffeomorphisms $F \in O$ have infinitely 
many hyperbolic periodic sinks. In fact, $C^2$-generic 
diffeomorphisms in $O$ present many other pathological 
properties: for instance, it has been recently 
noted that they have uncountably many chain 
recurrence classes without periodic orbits. Densely 
(but not generically) in $O$, they present many other 
phenomena, such as strange (Hénon-like) attractors 
(see Lyapunov Exponents and Strange Attractors). 
This phenomenon appears each time that a 
diffeomorphism $F_0$ admits a hyperbolic periodic 
point $x$ whose invariant manifolds $W^s(x)$ and 
$W^u(x)$ are tangent at some point $p \in W^s(x) \cap W^u(x)$ 
($p$ is a homoclinic tangency associated to $x$). 
Homoclinic tangencies appear locally as a codimension-1 
submanifold of $\mathrm{Diff}^2(S^2)$; they are such a 
simple phenomenon that they appear in very natural 
contexts. When a small perturbation transforms the 
tangency into transverse intersections, a new hyperbolic 
set $K$ with very large fractal dimensions is 
created. The local stable and unstable manifolds of 
$K$, each homeomorphic to the product of a Cantor 
set by a segment, present tangencies in a $C^2$-robust way, 
that is, for $F$ in some $C^2$-open set $O$ (see Figure 2). 
As a consequence, for a $C^2$-dense subset of $O$, the 
invariant manifolds of the point $x$ present some 
tangency (this is not generic, by the Kupka-Smale 
theorem). If the Jacobian of $F$ at $x$ is $<1$, each 
tangency allows one to create one more sink, by an 
arbitrarily small perturbation. Hence, the sets of 
diffeomorphisms having more than $n$ hyperbolic 
sinks are dense open subsets of $O$, and the 
intersection of all these dense open subsets is the 
announced residual set. See Palis and Takens 
(1993) for details on this deep argument. 


Figure 2 Robust tangencies. 


$C^1$-Generic Systems: Global Dynamics 
and Periodic Orbits 


See Bonatti et al. (2004), Chapter 10 and Appendix A, 
for a more detailed exposition and precise 
references. 


Perturbations of Orbits: Closing and Connecting 
Lemmas 


In 1968, Pugh proved the following lemma. 


Closing lemma If $x$ is a nonwandering point of a 
diffeomorphism $F$, then there are diffeomorphisms 
$G$ arbitrarily $C^1$-close to $F$, such that $x$ is periodic 
for $G$. 


Consider a segment $x_0, \dots, x_n = F^n(x_0)$ of orbit 
such that $x_n$ is very close to $x_0 = x$; one would like 
to take G close to F such that $G(x_n) = x_0$, and 
$G(x_i) = F(x_i) = x_{i+1}$ for $i \neq n$. This idea works for 
the $C^0$-topology (so that the $C^0$-closing lemma is 
easy). However, if one wants G to be $\varepsilon$-$C^1$-close to F, one 
needs that the points $x_i$, $i \in \{1, \dots, n-1\}$, remain at 
distance $d(x_i, x_0)$ greater than $C\,(d(x_n, x_0)/\varepsilon)$, where 
C bounds $\|DF\|$ on M. If $C/\varepsilon$ is very large, such a 
segment of orbit does not exist. Pugh solved this 
difficulty in two steps: the perturbation is first 
spread along a segment of orbit of x in order to 
decrease this constant; then a subsegment $y_0, \dots, y_k$ 
of $x_0, \dots, x_n$ is selected, verifying the geometrical 
condition. 

For the $C^2$ topology, the distances $d(x_i, x_0)$ need 
to remain greater than $\sqrt{d(x_n, x_0)/\varepsilon} \gg d(x_n, x_0)$. 
This new difficulty is why the $C^2$-closing lemma 
remains an open question. 
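
The thresholds quoted above can be recovered from a rough scaling estimate. The following is only a heuristic sketch, not Pugh's argument: it assumes that the perturbation is obtained by composing F with a bump-type diffeomorphism h (a hypothetical auxiliary map, not named in the text) which is the identity outside a ball of radius r around $x_0$ and moves $x_n$ to $x_0$, and it writes $\delta = d(x_n, x_0)$. All constants are ignored.

```latex
% h = id outside B(x_0, r),  h(x_n) = x_0,  \delta = d(x_n, x_0)
\begin{align*}
\|h - \mathrm{id}\|_{C^0} &\sim \delta
  && \text{(small as soon as $\delta$ is small, whatever $r$ is)}, \\
\|Dh - \mathrm{Id}\|_{C^0} &\sim \frac{\delta}{r} \lesssim \varepsilon
  && \Longleftrightarrow\quad r \gtrsim \frac{\delta}{\varepsilon}, \\
\|D^2 h\|_{C^0} &\sim \frac{\delta}{r^2} \lesssim \varepsilon
  && \Longleftrightarrow\quad r \gtrsim \sqrt{\frac{\delta}{\varepsilon}} .
\end{align*}
```

Under these assumptions the intermediate points $x_i$ must stay outside the support of h, so their distance to $x_0$ has to exceed r; this is where the lower bounds of order $\delta/\varepsilon$ (for $C^1$) and $\sqrt{\delta/\varepsilon}$ (for $C^2$) come from.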

Pugh’s argument does not suffice to create 
a homoclinic point for a periodic orbit whose unstable 
manifold accumulates on the stable one. In 1998, 
Hayashi solved this problem by proving the following lemma. 


Connecting lemma (Hayashi 1997) Let y and z be 
two points such that the forward orbit of y and the 
backward orbit of z accumulate on the same 
nonperiodic point x. Fix some $\varepsilon > 0$. There are $N > 0$ 
and an $\varepsilon$-$C^1$-perturbation G of F such that $G^n(y) = z$ 
for some $n > 0$, and $G = F$ outside an arbitrarily small 
neighborhood of $\{x, F(x), \dots, F^N(x)\}$. 


Using Hayashi’s arguments, we (with Crovisier) 
proved the following lemma: 


Connecting lemma for pseudo-orbits (Bonatti and 
Crovisier 2004) Assume that all periodic orbits of F 
are hyperbolic; consider $x, y \in M$ such that, for any 
$\varepsilon > 0$, there are $\varepsilon$-pseudo-orbits joining x to y; then 
there are arbitrarily small $C^1$-perturbations of F for 
which the positive orbit of x passes through y. 
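
The notion of pseudo-orbit used in this lemma can be made concrete with a small numerical check. The sketch below is an illustration only: it assumes the usual definition (a finite sequence of points in which each jump from $F(x_i)$ to $x_{i+1}$ is smaller than $\varepsilon$), and the function names, the circle rotation and the sample sequences are all hypothetical choices made for the example.

```python
def is_pseudo_orbit(F, points, eps, dist):
    """True if dist(F(x_i), x_{i+1}) < eps for every consecutive pair."""
    return all(dist(F(a), b) < eps for a, b in zip(points, points[1:]))


def circle_dist(a, b):
    """Distance on the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def rotation(alpha):
    """Rigid rotation x -> x + alpha (mod 1), used here as a toy map F."""
    return lambda x: (x + alpha) % 1.0


F = rotation(0.1)

# A genuine orbit segment: every jump has size exactly 0.
true_orbit = [0.0]
for _ in range(5):
    true_orbit.append(F(true_orbit[-1]))

# A sequence that drifts by about 0.005 at each step:
# it is a 0.01-pseudo-orbit but not a 0.001-pseudo-orbit.
jumpy = [0.0, 0.105, 0.21, 0.315]
```

A true orbit is an $\varepsilon$-pseudo-orbit for every $\varepsilon > 0$, whereas `jumpy` only passes the check once $\varepsilon$ exceeds the size of its jumps.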


Densities of Periodic Orbits 


As a consequence of the perturbation lemmas above, 
we (Bonatti and Crovisier 2004) proved that for 
F $C^1$-generic, 


\[ \mathcal{R}(F) = \Omega(F) = \overline{\mathrm{Per}_{hyp}(F)} \]


where $\overline{\mathrm{Per}_{hyp}(F)}$ denotes the closure of the set of 
hyperbolic periodic points. 

For this, consider the map $\Psi : F \mapsto \Psi(F) = \overline{\mathrm{Per}_{hyp}(F)}$ 
defined on $\mathrm{Diff}^1(M)$ and with values in $\mathcal{K}(M)$, the space 
of all compact subsets of M, endowed with the 
Hausdorff topology. $\overline{\mathrm{Per}_{hyp}(F)}$ may be approximated 
by a finite set of hyperbolic periodic points, and this 
set varies continuously with F; so $\overline{\mathrm{Per}_{hyp}(F)}$ varies 
lower-semicontinuously with F: for G very close to 
F, $\overline{\mathrm{Per}_{hyp}(G)}$ cannot be very much smaller than 
$\overline{\mathrm{Per}_{hyp}(F)}$. As a consequence, a result from general 
topology asserts that, for $C^1$-generic F, the map $\Psi$ is 
continuous at F. On the other hand, $C^1$-generic 
diffeomorphisms are Kupka-Smale, so that the 
connecting lemma for pseudo-orbits may apply: 
if $x \in \mathcal{R}(F)$, x can be turned into a hyperbolic 
periodic point by a $C^1$-small perturbation of F. So, 
if $x \notin \overline{\mathrm{Per}_{hyp}(F)}$, F is not a continuity point of $\Psi$, 
leading to a contradiction. 

Furthermore, Crovisier proved the following 
result: “for $C^1$-generic diffeomorphisms, each chain 
recurrence class is the limit, for the Hausdorff 
distance, of a sequence of periodic orbits.” 

This good approximation of the global dynamics 
by the periodic orbits will now allow us to 
better understand the chain recurrence classes of 
$C^1$-generic diffeomorphisms. 


Chain Recurrence Classes/Homoclinic Classes 
of $C^1$-Generic Systems 


Transverse intersections of invariant manifolds of 
hyperbolic orbits are robust and vary locally 
continuously with the diffeomorphism F. So, the 
homoclinic class H(x) of a periodic point x varies 
lower-semicontinuously with F (on the open set 
where the continuation of x is defined). As a 
consequence, for $C^r$-generic diffeomorphisms ($r \geq 
1$), each homoclinic class varies continuously with F. 
Using the connecting lemma, Arnaud (2001) proved 
the following result: “for Kupka-Smale 
diffeomorphisms, if the closures $\overline{W^u_{\mathrm{orb}}(x)}$ and $\overline{W^s_{\mathrm{orb}}(x)}$ 
have some intersection point z, then a $C^1$-perturbation 
of F creates a transverse intersection of $W^u_{\mathrm{orb}}(x)$ 
and $W^s_{\mathrm{orb}}(x)$ at z.” So, if $z \notin H(x)$, then F is not a 
continuity point of the function $F \mapsto H(x, F)$. Hence, 
for $C^1$-generic diffeomorphisms F and for every 
periodic point x, 

\[ H(x) = \overline{W^u_{\mathrm{orb}}(x)} \cap \overline{W^s_{\mathrm{orb}}(x)} \]

In the same way, $\overline{W^u_{\mathrm{orb}}(x)}$ and $\overline{W^s_{\mathrm{orb}}(x)}$ vary locally 
lower-semicontinuously with F so that, for F 
$C^r$-generic, the closures of the invariant manifolds 
of each periodic point vary locally continuously. For 
Kupka-Smale diffeomorphisms, the connecting 
lemma for pseudo-orbits implies: “if z is a point in 
the chain recurrence class of a periodic point x, then 
a $C^1$-small perturbation of F puts z on the unstable 
manifold of x”; so, if $z \notin \overline{W^u_{\mathrm{orb}}(x)}$, then F is not a 
continuity point of the function $F \mapsto \overline{W^u_{\mathrm{orb}}(x, F)}$. 
Hence, for $C^1$-generic diffeomorphisms F and for 
every periodic point x, the chain recurrence class of 
x is contained in $\overline{W^u_{\mathrm{orb}}(x)} \cap \overline{W^s_{\mathrm{orb}}(x)}$, and, therefore, 
coincides with the homoclinic class of x. This 
argument proves: 














For a $C^1$-generic diffeomorphism F, each homoclinic class 
H(x) is a chain recurrence class of F (of Conley’s theory): 
a chain recurrence class containing a periodic point x 
coincides with the homoclinic class H(x). In particular, 
two homoclinic classes are either disjoint or equal. 


Tame and Wild Systems 


For generic diffeomorphisms, the number $N(F) \in 
\mathbb{N} \cup \{\infty\}$ of homoclinic classes varies 
lower-semicontinuously with F. One deduces that N(F) is locally 
constant on a residual subset of $\mathrm{Diff}^1(M)$ (Abdenur 
2003). 


A local version (in the neighborhood of a chain 
recurrence class) of this argument shows that, for 
$C^1$-generic diffeomorphisms, any isolated chain 
recurrence class C is robustly isolated: for any 
diffeomorphism G, $C^1$-close enough to F, the 
intersection of $\mathcal{R}(G)$ with a small neighborhood of 
C is a unique chain recurrence class $C_G$ close to C. 

One says that a diffeomorphism is “tame” if each 
chain recurrence class is robustly isolated. We 
denote by $\mathcal{T}(M) \subset \mathrm{Diff}^1(M)$ the ($C^1$-open) set of 
tame diffeomorphisms and by $\mathcal{W}(M)$ the complement 
of the closure of $\mathcal{T}(M)$. $C^1$-generic 
diffeomorphisms in $\mathcal{W}(M)$ have infinitely many disjoint 
homoclinic classes, and are called “wild” 
diffeomorphisms. 

Generic tame diffeomorphisms have a global 
dynamics analogous to hyperbolic systems: the 
chain recurrence set admits a partition into finitely 
many homoclinic classes varying continuously with 
the dynamics. Every point belongs to the stable set 
of one of these classes. Some of the homoclinic 
classes are (transitive) topological attractors, and the 
union of the basins covers a dense open subset of M, 
and the basins vary continuously with F (Carballo and 
Morales 2003). It remains to get a good description 
of the dynamics in the homoclinic classes, and 
particularly in the attractors. As we shall see in the 
next section, tame behavior requires some kind of 
weak hyperbolicity. Indeed, in dimension 2, tame 
diffeomorphisms satisfy Axiom A and the no-cycle 
condition. 

As of now, very little is known about wild 
systems. One knows some semilocal mechanisms 
generating locally $C^1$-generic wild dynamics, 
thereby proving their existence on any manifold with 
dimension $\dim(M) \geq 3$ (the existence of wild 
diffeomorphisms in dimension 2, for the $C^1$-topology, 
remains an open problem). Some of the known 
examples exhibit a universal dynamics: they admit 
infinitely many disjoint periodic disks such that, up 
to renormalization, the return maps on these disks 
induce a dense subset of diffeomorphisms of the 
disk. Hence, these locally generic diffeomorphisms 
present infinitely many times any robust property of 
diffeomorphisms of the disk. 


Ergodic Properties 


A point x is well closable if, for any $\varepsilon > 0$, there is 
G $\varepsilon$-$C^1$-close to F such that x is periodic for G and 
$d(F^i(x), G^i(x)) < \varepsilon$ for $i \in \{0, \dots, p\}$, p being the 
period of x. As an important refinement of Pugh’s 
closing lemma, Mañé proved the following lemma: 


Ergodic closing lemma For any F-invariant prob- 
ability, almost every point is well closable. 


As a consequence, “for $C^1$-generic 
diffeomorphisms, any ergodic measure $\mu$ is the weak limit of a 
sequence of Dirac measures on periodic orbits, 
which converges also in the Hausdorff distance to 
the support of $\mu$.” 

It remains an open problem to know if, for 
$C^1$-generic diffeomorphisms, the ergodic measures 
supported in a homoclinic class are approached by 
periodic orbits in this homoclinic class. 


Conservative Systems 


The connecting lemma for pseudo-orbits has been 
adapted for volume preserving and symplectic 
diffeomorphisms, replacing the condition on the 
periodic orbits by another generic condition on the 
eigenvalues. As a consequence, one gets: “$C^1$-generic 
volume-preserving or symplectic diffeomorphisms 
are transitive, and M is a unique homoclinic class.” 

Notice that KAM theory implies that this 
result is wrong for $C^r$-generic diffeomorphisms with r 
sufficiently large, the persistence of invariant tori 
allowing transitivity to be robustly broken. 

The Oxtoby-Ulam (1941) theorem asserts that 
$C^0$-generic volume-preserving homeomorphisms are 
ergodic. The ergodicity of $C^1$-generic 
volume-preserving diffeomorphisms remains an open question. 


Hyperbolic Properties of $C^1$-Generic 
Diffeomorphisms 


For a more detailed exposition of hyperbolic 
properties of $C^1$-generic diffeomorphisms, the reader is 
referred to Bonatti et al. (2004, chapter 7 and 
appendix B). 


Perturbations of Products of Matrices 


The $C^1$-topology enables us to do small perturbations of 
the differential DF at a point x without perturbing either 
F(x) or F outside an arbitrarily small neighborhood of x. 
Hence, one can perturb the differential of F along a 
periodic orbit, without changing this periodic orbit 
(Franks’ lemma). When x is a periodic point of period n, 
the differential of $F^n$ at x is fundamental for knowing the 
local behavior of the dynamics. This differential is (up to 
a choice of local coordinates) a product of the matrices 
$DF(x_i)$, where $x_i = F^i(x)$. So, the control of the 
dynamical effect of local perturbations along a periodic 
orbit comes from a problem of linear algebra: “consider 
a product $A = A_n \circ A_{n-1} \circ \cdots \circ A_1$ of $n > 0$ bounded 
linear isomorphisms of $\mathbb{R}^d$; how do the eigenvalues and 
the eigenspaces of A vary under small perturbations of 
the $A_i$?” 

A partial answer to this general problem uses 
the notion of dominated splitting. Let $X \subset M$ be an 
F-invariant set such that the tangent space of M at 
the points $x \in X$ admits a DF-invariant splitting 
$T_x M = E_1(x) \oplus \cdots \oplus E_k(x)$, the dimensions $\dim(E_i(x))$ 
being independent of x. This splitting is dominated if 
the vectors in $E_{i+1}$ are uniformly more expanded than 
the vectors in $E_i$: there exists $\ell > 0$ such that, for 
any $x \in X$, any $i \in \{1, \dots, k-1\}$ and any unit vectors 
$u \in E_i(x)$ and $v \in E_{i+1}(x)$, one has 


\[ \|DF^\ell(u)\| < \tfrac{1}{2}\, \|DF^\ell(v)\| \]


Dominated splittings are always continuous, 
extend to the closure of X, and persist and vary 
continuously under $C^1$-perturbation of F. 
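
The domination inequality can be tested by hand in the simplest linear situation. The sketch below is a toy illustration under explicit assumptions: the differential along a periodic orbit is taken to be diagonal, `diag(a[i], b[i])` at the i-th point, so that the candidate bundles $E_1$ and $E_2$ are the two coordinate axes; the function names and the sample numbers are hypothetical and not taken from the text.

```python
def dominated_with_ell(a, b, ell):
    """Check |DF^ell u| < 1/2 |DF^ell v| at every point of a periodic orbit.

    a[i], b[i] are the diagonal entries of DF at the i-th point of the orbit,
    u is the unit vector of E_1 (first axis), v the unit vector of E_2.
    """
    n = len(a)
    for start in range(n):
        prod_a = 1.0
        prod_b = 1.0
        for k in range(ell):
            prod_a *= abs(a[(start + k) % n])
            prod_b *= abs(b[(start + k) % n])
        if not prod_a < 0.5 * prod_b:
            return False
    return True


def smallest_ell(a, b, max_ell=50):
    """Smallest ell <= max_ell realizing the domination inequality, or None."""
    for ell in range(1, max_ell + 1):
        if dominated_with_ell(a, b, ell):
            return ell
    return None
```

With `a = [0.9, 1.2, 0.8]` and `b = [1.3, 1.5, 1.6]` the second direction is more expanded at every point, but the factor 1/2 is only reached after three iterates. With `a = [2.0, 0.5]` and `b = [1.0, 1.0]` no number of iterates works, so the coordinate splitting is not dominated in this order.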


Dominated Splittings versus Wild Behavior 


Let $\{\gamma_i\}$ be a set of hyperbolic periodic orbits. On 
$X = \bigcup_i \gamma_i$ one considers the natural splitting 
$TM|_X = E^s \oplus E^u$ induced by the hyperbolicity of the 
$\gamma_i$. Mañé (1982) proved: “if there is a $C^1$-neighborhood 
of F on which each $\gamma_i$ remains hyperbolic, then 
the splitting $TM|_X = E^s \oplus E^u$ is dominated.” 

A generalization of Mañé’s result shows: “if a 
homoclinic class H(x) has no dominated splitting, 
then for any $\varepsilon > 0$ there is a periodic orbit $\gamma$ in H(x) 
whose derivative at the period can be turned into a 
homothety, by an $\varepsilon$-small perturbation of the 
derivative of F along the points of $\gamma$”; in particular, 
this periodic orbit can be turned into a sink or a 
source. As a consequence, one gets: “for $C^1$-generic 
diffeomorphisms F, any homoclinic class either has a 
dominated splitting or is contained in the closure of 
the (infinite) set of sinks and sources.” 

This argument has been used in two directions: 


• Tame systems must satisfy some hyperbolicity. In 
fact, using the ergodic closing lemma, one proves 
that the homoclinic classes H(x) of tame 
diffeomorphisms are volume hyperbolic, that is, there is 
a dominated splitting $TM = E_1 \oplus \cdots \oplus E_k$ over 
H(x) such that DF contracts uniformly the 
volume in $E_1$ and expands uniformly the volume 
in $E_k$. 

• If F admits a homoclinic class H(x) which is 
robustly without dominated splittings, then 
generic diffeomorphisms in the neighborhood of F are 
wild: at this time this is the only known way to 
get wild systems. 


See also: Cellular Automata; Chaos and Attractors; 
Fractal Dimensions in Dynamics; Homeomorphisms and 
Diffeomorphisms of the Circle; Homoclinic Phenomena; 
Hyperbolic Dynamical Systems; Lyapunov Exponents 
and Strange Attractors; Polygonal Billiards; Singularity 
and Bifurcation Theory; Synchronization of Chaos. 
Geometric analysis can be said to originate in the 
nineteenth century work of Weierstrass, Riemann, 
Schwarz, and others on minimal surfaces, a problem 
whose history can be traced at least as far back as 
the work of Meusnier and Lagrange in the eight- 
eenth century. The experiments performed by 
Plateau in the mid-19th century, on soap films 
spanning wire contours, served as an important 
inspiration for this work, and led to the formulation 
of the Plateau problem, which concerns the exis- 
tence and regularity of area-minimizing surfaces in 
$\mathbb{R}^3$ spanning a given boundary contour. The Plateau 
problem for area-minimizing disks spanning a curve 
in $\mathbb{R}^3$ was solved by J Douglas (who shared the first 
Fields medal with Lars V Ahlfors) and T Radó in the 
1930s. Generalizations of Plateau’s problem have 
been an important driving force behind the devel- 
opment of modern geometric analysis. Geometric 
analysis can be viewed broadly as the study of 
partial differential equations arising in geometry, 
and includes many areas of the calculus of varia- 
tions, as well as the theory of geometric evolution 
equations. The Einstein equation, which is the 
central object of general relativity, is one of the 
most widely studied geometric partial differential 
equations, and plays an important role in its 
Riemannian as well as in its Lorentzian form, the 
Lorentzian being most relevant for general relativity. 


The Einstein equation is the Euler-Lagrange 
equation of a Lagrangian with gauge symmetry 
and thus in the Lorentzian case it, like the Yang- 
Mills equation, can be viewed as a system of 
evolution equations with constraints. After imposing 
suitable gauge conditions, the Einstein equation 
becomes a hyperbolic system; in particular, using 
spacetime harmonic coordinates (also known as 
wave coordinates), it becomes a 
quasilinear system of wave equations. The 
constraint equations implied by the Einstein equations 
can be viewed as a system of elliptic equations in 
terms of suitably chosen variables. Thus, the 
Einstein equation leads to both elliptic and hyper- 
bolic problems, arising from the constraint equa- 
tions and the Cauchy problem, respectively. The 
groundwork for the mathematical study of the 
Einstein equation and the global nature of space- 
times was laid by, among others, Choquet-Bruhat, 
who proved local well-posedness for the Cauchy 
problem, Lichnerowicz, and later York who pro- 
vided the basic ideas for the analysis of the 
constraint equations, and Leray who formalized the 
notion of global hyperbolicity, which is essential for 
the global study of spacetimes. An important frame- 
work for the mathematical study of the Einstein 
equations has been provided by the singularity 
theorems of Penrose and Hawking, as well as the 
cosmic censorship conjectures of Penrose. 

Techniques and ideas from geometric analysis 
have played, and continue to play, a central role in 
recent mathematical progress on the problems posed 
by general relativity. Among the main results are the 


proof of the positive mass theorem using the 
minimal surface technique of Schoen and Yau, and 
the spinor-based approach of Witten, as well as the 
proofs of the (Riemannian) Penrose inequality by 
Huisken and Ilmanen, and Bray. The proof of the 
Yamabe theorem by Schoen has played an important 
role as a basis for constructing Cauchy data using 
the conformal method. 

The results just mentioned are all essentially 
Riemannian in nature, and do not involve study of 
the Cauchy problem for the Einstein equations. 
There has been great progress recently concerning 
global results on the Cauchy problem for the 
Einstein equations, and the cosmic censorship con- 
jectures of Penrose. The results available so far are 
either small data results (among these the nonlinear 
stability of Minkowski space proved by Christodoulou 
and Klainerman) or assume additional symmetries, 
such as the recent proof by Ringström of strong 
cosmic censorship for the class of Gowdy space- 
times. However, recent progress concerning quasi- 
linear wave equations and the geometry of 
spacetimes with low regularity due to, among 
others, Klainerman and Rodnianski, and Tataru 
and Smith, appears to show the way towards an 
improved understanding of the Cauchy problem for 
the Einstein equations. 

Since the constraint equations, the Penrose 
inequality and the Cauchy problem are discussed 
in separate articles, the focus of this article will be 
on the role in general relativity of “critical” and 
other geometrically defined submanifolds and folia- 
tions, such as minimal surfaces, marginally trapped 
surfaces, constant mean curvature hypersurfaces 
and null hypersurfaces. In this context it would be 
natural also to discuss geometrically defined flows 
such as mean curvature flows, inverse mean 
curvature flow, and Ricci flow. However, this 
article restricts the discussion to mean curvature 
flows, since the inverse mean curvature flow 
appears naturally in the context of the Penrose 
inequality and the Ricci flow has so far mainly 
served as a source of inspiration for research on the 
Einstein equations rather than an important tool. 
Other topics which would fit well under the 
heading “General relativity and geometric analysis” 
are spin geometry (the Witten proof of the Positive 
mass theorem), the Yamabe theorem and related 
results concerning the Einstein constraint equa- 
tions, gluing and other techniques of “spacetime 
engineering.” These are all discussed in other 
articles. Some techniques which have only recently 
come into use and for which applications in general 
relativity have not been much explored, such as 
Cheeger—Gromov compactness, are not discussed. 
Minimal and Related Surfaces 


Consider a hypersurface N in Euclidean space $\mathbb{R}^n$ 
which is a graph $x_n = u(x_1, \dots, x_{n-1})$ with respect to 
the function u. The area of N is given by 
$A(N) = \int \sqrt{1 + |Du|^2}\, dx^1 \cdots dx^{n-1}$. N is stationary 
with respect to A if u satisfies the equation 

\[ \sum_i D_i \left( \frac{D_i u}{\sqrt{1 + |Du|^2}} \right) = 0 \qquad [1] \]


A hypersurface N defined as a graph of u solving 
[1] minimizes area with respect to compactly 
supported deformations, and hence is called a minimal 
surface. For $n \leq 7$, a solution to eqn [1] defined on 
all of $\mathbb{R}^{n-1}$ must be an affine function. This fact is 
known as a Bernstein principle. Equation [1], and 
more generally, the prescribed mean curvature 
equation which will be discussed below, is a quasi- 
linear, uniformly elliptic second-order equation. The 
book by Gilbarg and Trudinger (1983) is an excellent 
general reference for such equations. 
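
As a sanity check on the divergence form of eqn [1], one can evaluate its left-hand side numerically on a classical example. The sketch below is an illustration, not part of the original text: it assumes n = 3 (a graph over the plane), uses Scherk's surface $u(x,y) = \log(\cos y / \cos x)$ as a known solution and a paraboloid as a non-solution, and approximates all derivatives by central finite differences. The function names and step sizes are hypothetical choices.

```python
import math


def scherk(x, y):
    """Scherk's surface, a classical solution of the minimal surface equation."""
    return math.log(math.cos(y) / math.cos(x))


def paraboloid(x, y):
    """A graph that is not a minimal surface, for comparison."""
    return x * x + y * y


def flux(u, x, y, h=1e-5):
    """Du / sqrt(1 + |Du|^2), with Du from central differences."""
    ux = (u(x + h, y) - u(x - h, y)) / (2 * h)
    uy = (u(x, y + h) - u(x, y - h)) / (2 * h)
    w = math.sqrt(1.0 + ux * ux + uy * uy)
    return ux / w, uy / w


def minimal_surface_residual(u, x, y, h=1e-4):
    """Finite-difference value of sum_i D_i(D_i u / sqrt(1 + |Du|^2)) at (x, y)."""
    fx_plus, _ = flux(u, x + h, y)
    fx_minus, _ = flux(u, x - h, y)
    _, fy_plus = flux(u, x, y + h)
    _, fy_minus = flux(u, x, y - h)
    return (fx_plus - fx_minus) / (2 * h) + (fy_plus - fy_minus) / (2 * h)
```

Evaluated at an interior point such as (0.3, 0.2), the residual for `scherk` vanishes up to discretization error, while for `paraboloid` it is of order one.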

The theory of rectifiable currents, developed by 
Federer and Fleming, is a basic tool in the modern 
approach to the Plateau problem and related varia- 
tional problems. A rectifiable current is a countable 
union of Lipschitz submanifolds, counted with integer 
multiplicity, and satisfying certain regularity condi- 
tions. Hausdorff measure gives a notion of area for 
these objects. One may therefore approach the study of 
minimal surfaces via rectifiable currents which are 
stationary with respect to variations of area. Suitable 
generalizations of familiar notions from smooth 
differential geometry such as tangent plane, normal 
vector, extrinsic curvature can be introduced. The 
book by Federer (1969) is a classic treatise on the 
subject. Further information concerning minimal sur- 
faces and related variational problems can be found in 
Lawson, Jr. (1980) and Simon (1997). Note, however, 
that unless otherwise stated, all fields and manifolds 
considered in this article are assumed to be smooth. For 
the Plateau problem in a Riemannian ambient space, 
we have the following existence and regularity result. 


Theorem 1 (Existence of embedded solutions for 
Plateau problem). Let M be a complete Riemannian 
manifold of dimension $n \leq 7$ and let $\Gamma$ be a compact 
$(n-2)$-dimensional submanifold in M which bounds. 
Then there is an $(n-1)$-dimensional area-minimizing 
hypersurface N with $\Gamma$ as its boundary. N is a smooth, 
embedded manifold in its interior. 


If the dimension of the ambient space is greater than 7, 
solutions to the Plateau problem will in general have 
a singular set of dimension $n - 8$. Let N be an 
oriented hypersurface of a Riemannian manifold M 
with covariant derivative D. Let $\eta$ be the unit 
normal of N and define the second fundamental 
form and mean curvature of N by $A_{ij} = \langle D_{e_i}\eta, e_j \rangle$ 
and $H = \mathrm{tr}\, A$. Define the action functional 
$\mathcal{E}(N) = A(N) - \int_{M;N} H_0$, where $H_0$ is a function 
defined on M, and $\int_{M;N}$ denotes the integral over 
the volume bounded by N in M. The problem of 
minimizing $\mathcal{E}$ is a useful generalization of the 
minimization problem for A. 


Theorem 2 (Existence of minimizers in homology). 
Let M be a compact Riemannian manifold of 
dimension $\leq 7$, and let $\alpha$ be an integral homology 
class on M of codimension 1. Then there is a smooth 
minimizer for $\mathcal{E}$ representing $\alpha$. 


Again, in higher dimensions, the minimizers will 
in general have singularities. The general form of 
this result deals with elliptic functionals. For 
surfaces in 3-manifolds, the problem of minimizing 
area within homotopy classes has been studied. 
Results in this direction played a central role in the 
approach of Schoen and Yau to manifolds with non- 
negative scalar curvature. 

If M is not compact, it is in general necessary to use 
barriers to control the minimizers, or consider some 
version of the Plateau problem. Barriers can be used due 
to the strong maximum principle, which holds for the 
mean curvature operator since it is quasilinear elliptic. 
Consider two hypersurfaces $N_1$, $N_2$ which intersect at a 
point p and assume that $N_1$ lies on one side of $N_2$ with 
the normal pointing towards $N_1$. If the mean curvatures 
$H_1$, $H_2$ of the hypersurfaces, defined with respect to 
consistently oriented normals, satisfy $H_1 \leq \lambda \leq H_2$ for 
some constant $\lambda$, then $N_1$ and $N_2$ coincide near p and 
have mean curvatures equal to $\lambda$. This result requires 
only mild regularity conditions on the hypersurfaces. 
Generalizations hold also for the case of spacelike or 
null hypersurfaces in a Lorentzian ambient space, see 
Andersson et al. (1998) and Galloway (2000). 

Let $\phi$ be a smooth compactly supported function 
on N. The variation $\mathcal{E}' = \delta_{\phi\eta}\mathcal{E}$ of $\mathcal{E}$ under a 
deformation $\phi\eta$ is 


\[ \mathcal{E}' = \int_N \phi\, (H - H_0) \]


Thus, N is stationary with respect to $\mathcal{E}$ if and only if 
N solves the prescribed mean curvature equation 
$H(x) = H_0(x)$ for $x \in N$. Supposing that N is 
stationary and $H_0$ is constant, the second variation 
$\mathcal{E}'' = \delta_{\phi\eta}\mathcal{E}'$ of $\mathcal{E}$ is of the form 


\[ \mathcal{E}'' = \int_N \phi\, J\phi \]


where J is the second-variation operator, a 
second-order elliptic operator. A calculation, using the 
Gauss equation and the second-variation equation, 
shows 


\[ J\phi = -\Delta_N \phi - \tfrac{1}{2}\left[ (\mathrm{Scal}_M - \mathrm{Scal}_N) + H^2 + |A|^2 \right]\phi \qquad [2] \]


where $\Delta_N$, $\mathrm{Scal}_M$, $\mathrm{Scal}_N$ denote the Laplace-Beltrami 
operator of N, and the scalar curvatures of M and 
N, respectively. If J is positive semidefinite, N is 
called stable. 
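
The calculation leading to [2] can be sketched as follows. This is a reconstruction under a stated assumption, namely that the second-variation operator is taken in its classical form $-\Delta_N - (\mathrm{Ric}_M(\eta,\eta) + |A|^2)$, where $\eta$ is the unit normal of N; the Gauss equation is then used to eliminate the Ricci term.

```latex
% Twice-contracted Gauss equation for a hypersurface N in M, with H = tr A:
\begin{align*}
\mathrm{Scal}_M &= \mathrm{Scal}_N + 2\,\mathrm{Ric}_M(\eta,\eta) + |A|^2 - H^2 \\
\Longrightarrow\quad
\mathrm{Ric}_M(\eta,\eta) + |A|^2
  &= \tfrac12\left[(\mathrm{Scal}_M - \mathrm{Scal}_N) + H^2 + |A|^2\right], \\
J\phi = -\Delta_N\phi - \left(\mathrm{Ric}_M(\eta,\eta) + |A|^2\right)\phi
  &= -\Delta_N\phi
     - \tfrac12\left[(\mathrm{Scal}_M - \mathrm{Scal}_N) + H^2 + |A|^2\right]\phi .
\end{align*}
```

For a minimal surface (H = 0) in a 3-manifold this is the form in which the stability inequality is used in the positive-mass argument, since $\mathrm{Scal}_N$ is then twice the Gauss curvature of N.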

To set the context where we will apply the 
above, let (M, g) be a connected, asymptotically 
Euclidean three-dimensional Riemannian manifold 
with covariant derivative $\nabla$, and let $K_{ij}$ be a symmetric 
tensor on M. Suppose $(M, g_{ij}, K_{ij})$ is imbedded 
isometrically as a spacelike hypersurface in a 
spacetime $(V, \gamma_{\alpha\beta})$ with $g_{ij}, K_{ij}$ the first and second 
fundamental forms induced on M from V, in 
particular $K_{ij} = \langle D_{e_i} T, e_j \rangle$ where T is the timelike 
normal of M in the ambient spacetime V, and D is 
the ambient covariant derivative. We will refer to 
$(M, g_{ij}, K_{ij})$ as a Cauchy data set for the Einstein 
equations. Although many of the results which will 
be discussed below generalize to the case of a 
nonzero cosmological constant $\Lambda$, we will discuss 
only the case $\Lambda = 0$ in this article. Let $G_{\alpha\beta} = \mathrm{Ric}_{V\alpha\beta} - 
(1/2)\mathrm{Scal}_V \gamma_{\alpha\beta}$ be the Einstein tensor of V, and let 
$\rho = G_{\alpha\beta} T^\alpha T^\beta$, $\mu_j = G_{j\beta} T^\beta$. Then the fields $(g_{ij}, K_{ij})$ 
satisfy the Einstein constraint equations 


\[ R + (\mathrm{tr}\, K)^2 - |K|^2 = 2\rho \qquad [3] \]
\[ \nabla_j \mathrm{tr}\, K - \nabla^i K_{ij} = \mu_j \qquad [4] \]


We assume that the dominant energy condition 
(DEC) 


\[ \rho \geq \Big( \sum_i \mu_i \mu^i \Big)^{1/2} \qquad [5] \]

holds. We will sometimes make use of the null 
energy condition (NEC), $G_{\alpha\beta} L^\alpha L^\beta \geq 0$ for null 
vectors L, and the strong energy condition (SEC), 
$\mathrm{Ric}_{V\alpha\beta} v^\alpha v^\beta \geq 0$ for causal vectors v. M will be 
assumed to satisfy the fall-off conditions 


\[ g_{ij} = \Big( 1 + \frac{2m}{r} \Big)\delta_{ij} + O(1/r^2) \qquad [6a] \]


\[ K_{ij} = O(1/r^2) \qquad [6b] \]


as well as suitable conditions for the fall-off of 
derivatives of $g_{ij}$, $K_{ij}$. Here m is the ADM (Arnowitt, Deser, 
Misner) mass of $(M, g_{ij}, K_{ij})$. 


Minimal Surfaces and Positive Mass 


Perhaps the most important application of the theory 
of minimal surfaces in general relativity is in the 


Schoen-Yau proof of the positive-mass theorem, 
which states that $m \geq 0$, and $m = 0$ only if (M, g, K) 
can be embedded as a hypersurface in Minkowski 
space. Consider an asymptotically Euclidean manifold 
(M, g) with g satisfying [6a] and with non-negative 
scalar curvature. By using Jang’s equation, see below, 
the general situation is reduced to the case of a time 
symmetric data set, with K = 0. In this case, the DEC 
implies that (M, g) has non-negative scalar curvature. 

Assuming $m < 0$ one may, after applying a 
conformal deformation, assume that $\mathrm{Scal}_M > 0$ in 
the complement of a compact set. Due to the 
asymptotic conditions, level sets for sufficiently 
large values of one of the coordinate functions, say 
$x^3$, can be used as barriers for minimal surfaces in 
M. By solving a sequence of Plateau problems with 
boundaries tending to infinity, a stable entire 
minimal surface N homeomorphic to the plane is 
constructed. Stability implies, using [2], 


\[ \int_N \Big( \tfrac{1}{2}\mathrm{Scal}_M - \kappa + \tfrac{1}{2}|A|^2 \Big) \leq 0 \]


where $\kappa = (1/2)\mathrm{Scal}_N$ is the Gauss curvature of N. 
Since by construction $\mathrm{Scal}_M \geq 0$, with $\mathrm{Scal}_M > 0$ outside a 
compact set, this gives $\int_N \kappa > 0$. Next, one uses the 
identity, related to the Cohn-Vossen inequality, 


\[ \int_N \kappa = 2\pi - \lim_i \frac{L_i^2}{2A_i} \]


where $A_i$, $L_i$ are the area and circumference of a 
sequence of large discs. Estimates using the fact 
that M is asymptotically Euclidean show that 
$\lim_i (L_i^2 / 2A_i) \geq 2\pi$, which gives a contradiction and 
shows that the minimal surface constructed cannot 
exist. It follows that $m \geq 0$. It remains to show that the 
case $m = 0$ is rigid. To do this, one proves that for an 
asymptotically Euclidean metric with non-negative 
scalar curvature, which is positive near infinity, there 
is a conformally related metric with vanishing scalar 
curvature and strictly smaller mass. Applying this 
argument in case $m = 0$ gives a contradiction to the 
fact that $m \geq 0$. Therefore, $m = 0$ only if the scalar 
curvature vanishes identically. Suppose now that (M, g) 
has vanishing scalar curvature but nonvanishing Ricci 
curvature $\mathrm{Ric}_M$. Then using a deformation of g in the 
direction of $\mathrm{Ric}_M$, one constructs a metric close to g 
with negative mass, which leads to a contradiction. 

This technique generalizes to Cauchy surfaces of 
dimension $n \leq 7$. The proof involves induction on 
dimension. For $n > 7$ minimal hypersurfaces are 
singular in general and this approach runs into 
problems. The Witten proof using spinor techniques 
does not suffer from this limitation but instead 
requires that M be spin. 
Marginally Trapped Surfaces 


Consider a Cauchy data set $(M, g_{ij}, K_{ij})$ as above and 
let N be a compact surface in M with normal $\eta$, 
second fundamental form A and mean curvature H. 
Then considering N as a surface in an ambient 
Lorentzian space V containing M, N has two null 
normal fields which after a rescaling can be taken to 
be $L_\pm = T \pm \eta$. Here, T is the future-directed 
timelike unit normal of M in V. The null mean 
curvatures (or null expansions) corresponding to 
$L_\pm$ can be defined in terms of the variation of the 
area element $\mu_N$ of N as $\delta_{L_\pm}\mu_N = \theta_\pm \mu_N$ or 


\[ \theta_\pm = \mathrm{tr}_N K \pm H \]


where $\mathrm{tr}_N K$ denotes the trace of the projection of 
$K_{ij}$ to N. Suppose $L_+$ is the outgoing null normal. 
N is called outer trapped (marginally trapped, 
untrapped) if $\theta_+ < 0$ ($\theta_+ = 0$, $\theta_+ > 0$). An asymptotically 
flat spacetime which contains a trapped surface 
with $\theta_- < 0$, $\theta_+ < 0$ is causally incomplete. In the 
following we will for simplicity drop the word outer 
from our terminology. 
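
The classification by the sign of the outgoing null expansion can be illustrated on coordinate spheres in a time-symmetric slice. The sketch below is a worked example under explicit assumptions that are not part of the text above: it takes the null expansions to be $\mathrm{tr}_N K \pm H$, sets K = 0, and uses the conformally flat metric $\psi^4\delta$ with $\psi = 1 + m/(2r)$ (the time-symmetric Schwarzschild slice in isotropic coordinates), for which the mean curvature of the sphere of coordinate radius r with respect to the normal pointing towards $r \to \infty$ is $(2/r)\,\psi^{-3}\,(1 - m/(2r))$. All function names are hypothetical.

```python
def null_expansions(tr_N_K, H):
    """Return (theta_plus, theta_minus) = (tr_N K + H, tr_N K - H)."""
    return tr_N_K + H, tr_N_K - H


def classify(theta_plus, tol=1e-12):
    """Classify a surface by the sign of its outgoing null expansion."""
    if abs(theta_plus) <= tol:
        return "marginally trapped"
    return "trapped" if theta_plus < 0 else "untrapped"


def schwarzschild_sphere_mean_curvature(m, r):
    """Mean curvature of the coordinate sphere of radius r in the metric
    psi^4 * delta, psi = 1 + m/(2r) (assumed isotropic Schwarzschild slice)."""
    psi = 1.0 + m / (2.0 * r)
    return (2.0 / r) * (1.0 - m / (2.0 * r)) / psi ** 3


def classify_sphere(m, r):
    # Time-symmetric data: K = 0, hence tr_N K = 0 and theta_plus = H.
    H = schwarzschild_sphere_mean_curvature(m, r)
    theta_plus, _ = null_expansions(0.0, H)
    return classify(theta_plus)
```

In this example the sphere at r = m/2 has H = 0, so it is a minimal surface and hence marginally trapped, while spheres with r > m/2 are untrapped.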

Consider a Cauchy surface M. The boundary of 
the region in M containing trapped surfaces is, if it is 
sufficiently smooth, a marginally trapped surface. 
The equation $\theta_+ = 0$ is an equation analogous to the 
prescribed curvature equation, in particular it is a 
quasilinear elliptic equation of second order. Mar- 
ginally trapped surfaces are not variational in the 
same sense as minimal surfaces. Nevertheless, they 
are stationary with respect to variations of area 
within the outgoing light cone. The second variation 
of area along the outgoing null cone is given, in view 
of the Raychaudhuri equation, by 


\[ \delta_{\phi L_+}\theta_+ = -(G_{++} + |\sigma_+|^2)\,\phi \qquad [7] \]


for a function $\phi$ on N. Here $G_{++} = G_{\alpha\beta} L_+^\alpha L_+^\beta$ and $\sigma_+$ 
denotes the shear of N with respect to $L_+$, that is, the 
tracefree part of the null second fundamental form 
with respect to $L_+$. Equation [7] shows that the 
stability operator in the direction $L_+$ is not elliptic. 

In the case of time-symmetric data, $K_{ij} = 0$, the 
DEC implies $\mathrm{Scal}_M \geq 0$ and marginally trapped 
surfaces are simply minimal surfaces. A stable 
compact minimal 2-surface N in a 3-manifold M 
with non-negative scalar curvature must satisfy 


\[ 2\pi\chi(N) = \int_N \kappa \geq \frac{1}{2} \int_N \big( \mathrm{Scal}_M + |A|^2 \big) \geq 0 \]


and hence by the Gauss-Bonnet theorem, N is 
diffeomorphic to a sphere or a torus. In case N is a 
stable minimal torus, the induced geometry is flat 
and the ambient curvature vanishes at N. If, in 
addition, N minimizes, then M is flat. 
For a compact marginally trapped surface N in M, 
analogous results can be proved by studying the 
stability operator defined with respect to the 
direction $\eta$. Let J be the operator defined in terms of a 
variation of $\theta_+$ by $J\phi = \delta_{\phi\eta}\theta_+$. Then 


\[ J\phi = -\Delta_N \phi + 2 s^A D_A \phi + \Big( \tfrac{1}{2}\mathrm{Scal}_N - s_A s^A + D_A s^A - \tfrac{1}{2}|\sigma_+|^2 - G_{+-} \Big)\phi \]

Here, $s_A = -(1/2)\langle L_-, D_A L_+ \rangle$ and $G_{+-}$ is the Einstein 
tensor evaluated on $L_+, L_-$. We may call N stable if 
the real part of the spectrum of J is non-negative. A 
sufficient condition for N to be stable is that N is 
locally outermost. This can be formulated, for 
example, by requiring that a neighborhood of N in M 
contains no trapped surfaces exterior to N. In this case, 
assuming that the DEC holds, N is a sphere or a torus, 
and if the real part of the spectrum of J is positive then 
N is a sphere. If N is a torus, then the ambient 
curvature and shear vanish at N, $s_A$ is a gradient, and 
N is flat. One expects that in addition, global rigidity 
should hold, in analogy with the minimal surface case. 
This is an open problem. If N satisfies the stronger 
condition of strict stability, which corresponds to the 
spectrum of J having positive real part, then N is in the 
interior of a hypersurface H of the ambient spacetime, 
with the property that it is foliated by marginally 
trapped surfaces (Andersson et al. 2005). If the NEC 
holds and N has nonvanishing shear, then H is 
spacelike at N. A hypersurface H with these proper- 
ties is known as a dynamical horizon. 


Jang’s Equation 


Consider a Cauchy data set $(M, g_{ij}, K_{ij})$. Extend $K_{ij}$
to a tensor field on $M \times \mathbb{R}$, constant in the vertical
direction. Then the equation for a graph


$$N = \{(x,t) \in M \times \mathbb{R},\ t = f(x)\}$$


such that N has mean curvature equal to the trace of
the projection of $K_{ij}$ to N with respect to the induced
metric on N, is given by


$$\sum_{i,j}\left(K^{ij} - \frac{\nabla^i\nabla^j f}{\sqrt{1 + |\nabla f|^2}}\right)\left(g_{ij} - \frac{\nabla_i f\,\nabla_j f}{1 + |\nabla f|^2}\right) = 0$$   [8]


an equation closely related to the equation
$\theta_+ = 0$. Equation [8] was introduced by P S Jang
(Jang 1978) as part of an attempt to generalize the
inverse mean curvature flow method of Geroch from
time-symmetric to general Cauchy data.

Existence and regularity for Jang’s equation were
proved by Schoen and Yau (1981) and used to
generalize their proof of the positive-mass theorem
from the case of maximal slices to the general case.
The solution to Jang’s equation is constructed as the
limit of the solutions to a sequence of regularized
problems. The limit consists of a collection N of
submanifolds of $M \times \mathbb{R}$. In particular, the component
near infinity is a graph and has the same mass as M.
N may contain vertical components which project
onto marginally trapped surfaces in M, and in fact
these constitute the only possibilities for blow-up of
the sequence of graphs used to construct N. If the
DEC is valid, the metric on N has non-negative
scalar curvature in the weak sense that


$$\int_N \mathrm{Scal}_N\,\phi^2 + 2|\nabla\phi|^2 \geq 0$$


for smooth compactly supported functions $\phi$. If the
DEC holds strictly, the strict inequality holds and in
this case the metric on N is conformal to a metric
with vanishing scalar curvature.

Jang’s equation can be applied to prove existence
of marginally trapped surfaces, given barriers. Let
$(M, g_{ij}, K_{ij})$ be a Cauchy data set containing two
compact surfaces $N_1$, $N_2$ which together bound a
compact region $M'$ in M. Suppose that $\theta_+ < 0$ on $N_1$
and $\theta_+ > 0$ on $N_2$.
Schoen recently proved the following result.


Theorem 3 (Existence of marginally trapped surfaces).
Let $M'$, $N_1$, $N_2$ be as above. Then there is a
finite collection of compact, marginally trapped
surfaces $\{\Sigma_a\}$ contained in the interior of $M'$, such
that $\cup\Sigma_a$ is homologous to $N_1$. If the DEC holds,
then $\{\Sigma_a\}$ is a collection of spheres and tori.


The proof proceeds by solving a sequence of
Dirichlet boundary-value problems for Jang’s equation
with boundary values on $N_1$, $N_2$ tending to $-\infty$ and $\infty$,
respectively. The assumption on $\theta_+$ is used to show the
existence of barriers for Jang’s equation. Let $f_k$ be the
sequence of solutions to the Dirichlet problems. Jang’s
equation is invariant under renormalization $f_k \to f_k + c_k$
for some sequence $c_k$ of real numbers. A Harnack
inequality for the gradient of the solutions to Jang’s
equation is used to show that the sequence of solutions
$f_k$, possibly after a renormalization, has a subsequence
converging to a vertical submanifold of $M' \times \mathbb{R}$, which
projects to a collection $\Sigma_a$ of marginally trapped
surfaces. By construction, the zero sets of the $f_k$
are homologous to $N_1$ and $N_2$. The estimates on
the sequence $\{f_k\}$ show that this holds also in the limit
$k \to \infty$. The statement about the topology of the $\Sigma_a$
follows by showing, using the above-mentioned
inequality for $\mathrm{Scal}_N$, that if the DEC holds, the total
Gauss curvature of each surface $\Sigma_a$ is non-negative.


Center of mass 


Since by the positive-mass theorem m > 0 unless the 
ambient spacetime is flat, it makes sense to consider 
the problem of finding an appropriate notion of 
center of mass. This problem was solved by Huisken 
and Yau who showed that under the asymptotic 
conditions [6] the isoperimetric problem has a unique 
solution if one considers sufficiently large spheres. 


Theorem 4 (Huisken and Yau 1996). There is an
$H_0 > 0$ and a compact region $B_{H_0}$ such that for each
$H \in (0, H_0)$ there is a unique constant mean curvature
sphere $S_H$ with mean curvature H contained in
$M \setminus B_{H_0}$. The spheres form a foliation.


The proof involves a study of the evolution
equation


$$\frac{dx}{ds} = (\bar H - H)\,n$$   [9]


where $\bar H$ is the average mean curvature and n is the
unit normal. This is the
gradient flow for the isoperimetric problem of
minimizing area keeping the enclosed volume constant.
The solutions in Euclidean space are standard
spheres. Equation [9] defines a parabolic system; in
particular, we have


$$\frac{d}{ds}H = \Delta H + \left(\mathrm{Ric}(n,n) + |A|^2\right)(H - \bar H)$$


It follows from the fall-off conditions [6] that the
spheres of the foliation constructed in Theorem 4 are
untrapped surfaces. They can therefore be used as
outer barriers in the existence result for marginally
trapped surfaces (Theorem 3).

The mean curvature flow for a spatial hypersur- 
face in a Lorentz manifold is also parabolic. This 
flow has been applied to construct constant mean 
curvature Cauchy hypersurfaces in spacetimes. 


Maximal and Related Surfaces 


Let N be the hypersurface $x_0 = u(x_1, \ldots, x_n)$ in
Minkowski space $\mathbb{R}^{1,n}$ with line element
$-dx_0^2 + dx_1^2 + \cdots + dx_n^2$. Assume $|\nabla u| < 1$ so that N is
spacelike. Then N is stationary with respect to
variations of area if u solves the equation


$$\sum_i \nabla_i\left(\frac{\nabla_i u}{\sqrt{1 - |\nabla u|^2}}\right) = 0$$   [10]


N maximizes area with respect to compactly
supported variations, and hence is called a maximal
surface. As in the case of the minimal surface
equation, eqn [10], and more generally the
Lorentzian prescribed mean curvature equation, is
quasilinear elliptic, but it is not uniformly elliptic,
which makes the regularity theory more subtle.
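
The loss of uniform ellipticity can be seen by expanding the divergence in eqn [10]. The following is a sketch, assuming u is twice differentiable with $|\nabla u| < 1$; the symbols W and $a_{ij}$ are introduced here for illustration, subscripts on u denote partial derivatives, and repeated indices are summed.

```latex
% Notation (illustrative): W = \sqrt{1 - |\nabla u|^2}, so that \partial_i W = -u_k u_{ki}/W.
\partial_i\!\left(\frac{u_i}{W}\right)
  = \frac{\Delta u}{W} + \frac{u_i u_k u_{ik}}{W^3}
  = \frac{1}{W^3}\, a_{ij}\, u_{ij},
\qquad
a_{ij} = \left(1 - |\nabla u|^2\right)\delta_{ij} + u_i u_j .
% Eigenvalues of the coefficient matrix a_{ij}:
%   1                   in the direction of \nabla u,
%   1 - |\nabla u|^2    on the orthogonal complement of \nabla u.
% Ratio of largest to smallest eigenvalue:
\frac{\lambda_{\max}}{\lambda_{\min}} = \frac{1}{1 - |\nabla u|^2} .
```

The equation is therefore elliptic wherever $|\nabla u| < 1$, but the eigenvalue ratio is unbounded as $|\nabla u| \to 1$, that is, as the surface approaches a null direction, so ellipticity is uniform only once a gradient bound away from 1 is available.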

A Bernstein principle analogous to the one for the
minimal surface equation holds for the maximal
surface equation [10]. Suppose that u is a solution to
[10] which is defined on all of $\mathbb{R}^n$. Then u is an
affine function (Cheng and Yau 1976). An important
tool used in the proof is a Bochner type identity,
originally due to Calabi, for the norm of the second
fundamental form. For a hypersurface in a flat
ambient space, the Codazzi equation states
$\nabla_i A_{jk} - \nabla_j A_{ik} = 0$. This gives the identity


$$\Delta A_{ij} = \nabla_i\nabla_j H + A^{km}\,\mathrm{Riem}_{kijm} + A_{im}\,\mathrm{Ric}^{m}{}_{j}$$   [11]


The curvature terms can be rewritten in terms of $A_{ij}$
if the ambient space is flat. Using [11] to compute
$\Delta|A|^2$ gives an expression which is quadratic in $\nabla A$
and fourth order in $|A|$, and which allows one to
perform maximum principle estimates on $|A|$.
Generalizations of this technique for hypersurfaces in
general ambient spaces play an important role in the
proof of regularity of minimal surfaces, and in the
proof of existence for Jang’s equation as well as in
the analysis of the mean curvature flow used to
prove existence of round spheres. The generalization
of eqn [11] is known as a Simons identity.

For the case of maximal hypersurfaces of 
Minkowski space, it follows from further maximum 
principle estimates that a maximal hypersurface of 
Minkowski space is convex, in particular, it has 
nonpositive Ricci curvature. Generalizations of this 
technique allow one to analyze entire constant mean 
curvature hypersurfaces of Minkowski space. 

Consider a globally hyperbolic Lorentzian manifold
$(V, \gamma)$. A $C^0$ hypersurface is said to be weakly
spacelike if timelike curves intersect it in at most one
point. Call a codimension-2 submanifold $\Gamma \subset V$ a
weakly spacelike boundary if it bounds a weakly
spacelike hypersurface $N_0$.


Theorem 5 (Existence for Plateau problem for
maximal surfaces (Bartnik 1988)). Let V be a
globally hyperbolic spacetime and assume that the
causal structure of V is such that the domain of
dependence of any compact domain in V is compact.
Given a weakly spacelike boundary $\Gamma$ in V, there is a
weakly spacelike maximal hypersurface N with $\Gamma$ as
its boundary. N is smooth except possibly on null
geodesics connecting points of $\Gamma$.


Here, maximal hypersurface is understood in a
weak sense, referring to stationarity with respect to
variations. Due to the nonuniform ellipticity for the
maximal surface equation, the interior regularity
which holds for minimal surfaces fails to hold in
general for the maximal surface equation.

A time-oriented spacetime is said to have a crushing
singularity to the past (future) if there is a sequence $\Sigma_n$
of Cauchy surfaces so that the mean curvature
function $H_n$ of $\Sigma_n$ diverges uniformly to $-\infty$ ($\infty$).


Theorem 6 (Gerhardt 1983). Suppose that $(V, \gamma)$ is
globally hyperbolic with compact Cauchy surfaces
and satisfies the SEC. Then if $(V, \gamma)$ has crushing
singularities to the past and future it is globally
foliated by constant mean curvature hypersurfaces.
The mean curvature $\tau$ of these Cauchy surfaces is a
global time function.


The proof involves an application of results from
geometric measure theory to an action $\mathcal{E}$ of the form
discussed earlier. A barrier argument is used to control
the maximizers. Bartnik (1984, theorem 4.1) gave a
direct proof of existence of a constant mean curvature
(CMC) hypersurface, given barriers. If the spacetime
$(V, \gamma)$ is symmetric, so that a compact Lie group acts
on V by isometries, then CMC hypersurfaces in V
inherit the symmetry. Theorem 6 gives a condition
under which a spacetime is globally foliated by CMC
hypersurfaces. In general, if the SEC holds in a
spatially compact spacetime, then for each $\tau \neq 0$,
there is at most one constant mean curvature Cauchy
surface with mean curvature $\tau$. In case V is vacuum,
$\mathrm{Ric}_V = 0$, and 3 + 1 dimensional, then each point $x \in V$
is on at most one hypersurface of constant mean
curvature unless V is flat and splits as a metric product.

There are vacuum spacetimes with compact Cauchy 
surface which contain no CMC hypersurface 
(Chrusciel et al. 2004). The proof is carried out by 
constructing Cauchy data, using a gluing argument, on 
the connected sum of two tori, such that the resulting 
Cauchy data set $(M, g_{ij}, K_{ij})$ has an involution which
reverses the sign of $K_{ij}$. The involution extends to the
maximal vacuum development V of the Cauchy data 
set. Existence of a CMC surface in V gives, in view of 
the involution, barriers which allow one to construct a 
maximal Cauchy surface homeomorphic to M. This 
leads to a contradiction, since the connected sum of 
two tori does not carry a metric of positive scalar 
curvature, and therefore, in view of the constraint 
equations, cannot be imbedded as a maximal Cauchy 
surface in a vacuum spacetime. The maximal vacuum 
development V is causally geodesically incomplete. 
However, in view of the existence proof for CMC 
Cauchy surfaces (cf. Theorem 6), these spacetimes 
cannot have a crushing singularity. It would be 
interesting to settle the open question whether there 
are stable examples of this type. 

In the case of a spacetime V which has an 
expanding end, one does not expect in general that 


the spacetime is globally foliated by CMC hyper- 
surfaces even if V is vacuum and contains a CMC 
Cauchy surface. This expectation is based on the 
phenomenon known as the collapse of the lapse; for 
example, the Schwarzschild spacetime does not 
contain a global foliation by maximal Cauchy 
surfaces (Beig and Murchadha 1998). However, no 
counterexample is known in the spatially compact 
case. In spite of these caveats, many examples of 
spacetimes with global CMC foliations are known, 
and the CMC condition, or more generally pre- 
scribed mean curvature, is an important gauge 
condition for general relativity. 

Some examples of situations where global 
constant or prescribed mean curvature foliations 
are known to exist in vacuum or with some types of 
matter are spatially homogeneous spacetimes, 
and spacetimes with two commuting Killing fields. 
Small data global existence for the Einstein equations
with CMC time gauge has been proved for
spacetimes with one Killing field, with Cauchy
surface a circle bundle over a surface of genus >1,
by Choquet-Bruhat and Moncrief. Further, for
(3 + 1)-dimensional spacetimes with Cauchy 
surface admitting a hyperbolic metric, small data 
global existence in the expanding direction has been 
proved by Andersson and Moncrief. See Andersson 
(2004) and Rendall (2002) for surveys on the 
Cauchy problem in general relativity. 


Null Hypersurfaces 


Consider an asymptotically flat spacetime contain- 
ing a black hole, that is, a region B such that future 
causal curves starting in B cannot reach observers at 
infinity. The boundary of the trapped region is 
called the event horizon H. This is a null hypersur- 
face, which under reasonable conditions on causality 
has null generators which are complete to the future. 
Due to the completeness, assuming that H is
smooth, one can use the Raychaudhuri equation
[7] to show that the null expansion $\theta_+$ of a spatial
cross section of H must satisfy $\theta_+ \geq 0$, and hence
that the area of cross sections of H grows monotonically
to the future. A related statement is that
null generators can enter H but may not leave it.
This was first proved by Hawking for the case of
smooth horizons, using essentially the Raychaudhuri
equation. In general, H can fail to be smooth.
However, from the definition of H as the boundary
of the trapped region it follows that it has support
hypersurfaces, which are past light cones. This
property allows one to prove that H is Lipschitz
and hence smooth almost everywhere. At smooth
points of H, the calculations in the proof of
Hawking apply, and the monotonicity of the area of
cross sections follows.
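
The way completeness enters can be made explicit with the standard focusing argument. The following is a sketch under stated assumptions: a (3 + 1)-dimensional spacetime, an affine parameter s along the null generators (a symbol introduced here), and the Raychaudhuri equation in its general form, which includes a term quadratic in $\theta_+$ that vanishes on a marginally trapped surface; normalization conventions for the shear term may differ from those of eqn [7].

```latex
% Raychaudhuri equation along a null generator, with the NEC giving Ric(L_+,L_+) \geq 0:
\frac{d\theta_+}{ds} = -\tfrac{1}{2}\theta_+^2 - |\sigma_+|^2 - \mathrm{Ric}(L_+,L_+)
  \;\leq\; -\tfrac{1}{2}\theta_+^2 .
% Suppose \theta_+(0) = \theta_0 < 0 at some point of a cross section. Then
\frac{d}{ds}\!\left(\frac{1}{\theta_+}\right) = -\frac{1}{\theta_+^2}\frac{d\theta_+}{ds} \;\geq\; \frac{1}{2}
\quad\Longrightarrow\quad
\frac{1}{\theta_+(s)} \;\geq\; \frac{1}{\theta_0} + \frac{s}{2} ,
% so 1/\theta_+ reaches 0 from below, i.e. \theta_+ \to -\infty, at some parameter value
s \;\leq\; \frac{2}{|\theta_0|} .
```

A generator that is complete to the future on a smooth horizon cannot exhibit such a blow-up at finite affine parameter, so $\theta_+ \geq 0$ on every cross section.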


Theorem 7 (Area theorem (Chrusciel et al. 2001)).
Let H be a black hole event horizon in a smooth
spacetime (M, g). Suppose that the generators are
future complete and the NEC holds on H. Let
$S_a$, a = 1, 2, be two spacelike cross sections of H and
suppose that $S_2$ is to the future of $S_1$. Then
$A(S_2) \geq A(S_1)$.


The eikonal equation $\nabla^\alpha u \nabla_\alpha u = 0$ plays a central
role in geometric optics. Level sets of a solution u are
null hypersurfaces which correspond to wave fronts.
Much of the recent progress on rough solutions to the
Cauchy problem for quasilinear wave equations is
based on understanding the influence of the geometry
of these wave fronts on the evolution of high-frequency
modes in the background spacetime. In
this analysis many objects familiar from general 
relativity, such as the structure equations for null 
hypersurfaces, the Raychaudhuri equation, and the 
Bianchi identities play an important role, together 
with novel techniques of geometric analysis used to 
control the geometry of cross sections of the wave 
fronts and to estimate the connection coefficients in a 
rough spacetime geometry. These techniques show 
great promise and can be expected to have a 
significant impact on our understanding of the 
Einstein equations and general relativity. 
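
As a concrete instance of the eikonal equation discussed above, the simplest case can be checked by hand. The following is a sketch, assuming Minkowski space with line element $-dt^2 + dx_1^2 + dx_2^2 + dx_3^2$ and the particular choice of u below, which is made here purely for illustration.

```latex
% Candidate solution, defined for x \neq 0:
u(t,x) = t - |x| .
% Derivatives:
\partial_t u = 1, \qquad \nabla_x u = -\frac{x}{|x|}, \qquad |\nabla_x u| = 1 .
% Eikonal equation:
\nabla^\alpha u \nabla_\alpha u = -(\partial_t u)^2 + |\nabla_x u|^2 = -1 + 1 = 0 .
% Level sets:
\{u = c\} = \{(t,x) : t = c + |x|\} ,
```

so each level set is the future light cone of the point (c, 0), a null hypersurface that represents an outgoing spherical wave front. The plane waves $u = t - k \cdot x$ with $|k| = 1$ give null hyperplanes in the same way.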
Introduction


In a paper, R Penrose (1973) made a physical
argument that the total mass of a spacetime which
contains black holes with event horizons of total area
A should be at least $\sqrt{A/16\pi}$. An important special
case of this physical statement translates into a very
beautiful mathematical inequality in Riemannian
geometry known as the Riemannian Penrose inequality.
The Riemannian Penrose inequality was first
proved by Huisken and Ilmanen (1997) for a single
black hole and then by the author in 1999 for any
number of black holes. The two approaches use two
different geometric flow techniques. The most general
version of the Penrose inequality is still open.

A natural interpretation of the Penrose inequality
is that the mass contributed by a collection of
black holes is (at least) $\sqrt{A/16\pi}$. More generally,
the question “How much matter is in a given region
of a spacetime?” is still very much an open problem
(Christodoulou and Yau 1988). In this paper, we
will discuss some of the qualitative aspects of mass
in general relativity, look at examples which are
informative, and describe the two very geometric
proofs of the Riemannian Penrose inequality.


Total Mass in General Relativity 


Two notions of mass which are well understood in 
general relativity are local energy density at a point 
and the total mass of an asymptotically flat space- 
time. However, defining the mass of a region larger 
than a point but smaller than the entire universe is 
not very well understood at all. 

Suppose $(M^3, g)$ is a Riemannian 3-manifold
isometrically embedded in a (3 + 1)-dimensional
Lorentzian spacetime $N^4$. Suppose that $M^3$ has zero
second fundamental form in the spacetime. This is a
simplifying assumption which allows us to think of
$(M^3, g)$ as a “t = 0” slice of the spacetime. (Recall that
the second fundamental form is a measure of how
much $M^3$ curves inside $N^4$. $M^3$ is also sometimes
called “totally geodesic” since geodesics of $N^4$ which
are tangent to $M^3$ at a point stay inside $M^3$ forever.)
The Penrose inequality (which allows for $M^3$ to have
general second fundamental form) is known as the
Riemannian Penrose inequality when the second
fundamental form is set to zero.

We also want to consider only $(M^3, g)$ that are
asymptotically flat at infinity, which means that for
some compact set K, the “end” $M^3 \setminus K$ is diffeomorphic
to $\mathbb{R}^3 \setminus B_1(0)$, where the metric g is
asymptotically approaching (with certain decay
conditions) the standard flat metric $\delta_{ij}$ on $\mathbb{R}^3$ at
infinity. The simplest example of an asymptotically
flat manifold is $(\mathbb{R}^3, \delta_{ij})$ itself. Other good examples
are the conformal metrics $(\mathbb{R}^3, u(x)^4\delta_{ij})$, where u(x)
approaches a constant sufficiently rapidly at infinity.
(Also, sometimes it is convenient to allow $(M^3, g)$ to
have multiple asymptotically flat ends, in which
case each connected component of $M^3 \setminus K$ must
have the property described above.) A qualitative
picture of an asymptotically flat 3-manifold is shown
in Figure 1.

The purpose of these assumptions on the asymptotic
behavior of $(M^3, g)$ at infinity is that they imply
the existence of the limit


$$m = \frac{1}{16\pi}\lim_{\sigma\to\infty}\int_{S_\sigma}\sum_{i,j}\left(g_{ij,i}\nu_j - g_{ii,j}\nu_j\right)d\mu$$


where $S_\sigma$ is the coordinate sphere of radius $\sigma$, $\nu$ is the
unit normal to $S_\sigma$, and $d\mu$ is the area element of $S_\sigma$ in the
coordinate chart. The quantity m is called the “total
mass” (or ADM mass) of $(M^3, g)$ and does not depend
on the choice of asymptotically flat coordinate chart.
The above equation is where many people would
stop reading an article like this. But before you do,
we will promise not to use this definition of the total
mass in this paper. In fact, it turns out that total mass
can be quite well understood with an example. Going
back to the example $(\mathbb{R}^3, u(x)^4\delta_{ij})$, if we suppose that
u(x) > 0 has the asymptotics at infinity


$$u(x) = a + b/|x| + O(1/|x|^2)$$   [1]


(and derivatives of the $O(1/|x|^2)$ term are $O(1/|x|^3)$),
then the total mass of $(M^3, g)$ is


$$m = 2ab$$   [2]


Figure 1 A qualitative picture of an asymptotically flat
3-manifold.


Furthermore, suppose $(M^3, g)$ is any metric whose
“end” is isometric to $(\mathbb{R}^3 \setminus K, u(x)^4\delta_{ij})$, where u(x) is
harmonic in the coordinate chart of the end
$(\mathbb{R}^3 \setminus K, \delta_{ij})$ and goes to a constant at infinity. Then
expanding u(x) in terms of spherical harmonics
demonstrates that u(x) satisfies condition [1].
We will call these Riemannian manifolds $(M^3, g)$
“harmonically flat at infinity,” and we note that the
total mass of these manifolds is also given by eqn [2].
A very nice lemma by Schoen and Yau is that,
given any $\epsilon > 0$, it is always possible to perturb an
asymptotically flat manifold to become harmonically
flat at infinity such that the total mass changes
less than $\epsilon$ and the metric changes less than
$\epsilon$ pointwise, all while maintaining non-negative
scalar curvature. Hence, it happens that to prove
the theorems in this paper, we only need to consider
harmonically flat manifolds! Thus, we can use eqn
[2] as our definition of total mass. As an example,
note that $(\mathbb{R}^3, \delta_{ij})$ has zero total mass. Also, note
that, qualitatively, the total mass of an asymptotically
flat or harmonically flat manifold is the 1/r
rate at which the metric becomes flat at infinity.


The Phenomenon of Gravitational Attraction 


What do the above definitions of total mass have to 
do with anything physical? That is, if the total mass 
is the 1/r rate at which the metric becomes flat at 
infinity, what does this have to do with our real- 
world intuitive idea of mass? 

The answer to this question is very nice. Given a
Schwarzschild spacetime metric


$$\left(\mathbb{R}^4,\ \left(1 + \frac{m}{2|x|}\right)^4\left(dx_1^2 + dx_2^2 + dx_3^2\right) - \left(\frac{1 - m/2|x|}{1 + m/2|x|}\right)^2 dt^2\right)$$


$|x| > m/2$, for example, note that the t = 0 slice
(which has zero second fundamental form) is the
spacelike Schwarzschild metric


$$\left(\mathbb{R}^3 \setminus B_{m/2}(0),\ \left(1 + \frac{m}{2|x|}\right)^4\delta_{ij}\right)$$


(discussed more later). Note that according to eqn
[2], the parameter m is in fact the total mass of this
3-manifold.
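
The check against eqn [2] is a one-line computation, written out here as a sketch; the subscript on $m_{\text{total}}$ is added only to distinguish the value computed from [2] from the parameter m in the metric.

```latex
% Conformal factor of the spacelike Schwarzschild metric:
u(x) = 1 + \frac{m}{2|x|} .
% Comparing with the asymptotics [1], u(x) = a + b/|x| + O(1/|x|^2):
a = 1, \qquad b = \frac{m}{2}, \qquad \text{remainder} = 0 .
% Total mass from [2]:
m_{\text{total}} = 2ab = 2 \cdot 1 \cdot \frac{m}{2} = m .
```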

On the other hand, suppose we were to release a
small test particle, initially at rest, a large distance r
from the center of the Schwarzschild spacetime. If
this particle is not acted upon by external forces,
then it should follow a geodesic in the spacetime. It
turns out that with respect to the asymptotically flat
coordinate chart, these geodesics “accelerate”
towards the middle of the Schwarzschild metric
at a rate proportional to $m/r^2$ (in the limit as r goes to
infinity). Thus, our Newtonian notion of mass also
suggests that the total mass of the spacetime is m.


Local Energy Density 


Another quantification of mass which is well understood
is local energy density. In fact, in this setting,
the local energy density at each point is


$$\mu = \frac{1}{16\pi}R$$


where R is the scalar curvature of the 3-manifold
(which has zero second fundamental form in the
spacetime) at each point. Note that $(\mathbb{R}^3, \delta_{ij})$ has zero
energy density at each point as well as zero total mass.
This is appropriate since $(\mathbb{R}^3, \delta_{ij})$ is in fact a “t = 0”
slice of Minkowski spacetime, which represents a
vacuum. Classically, physicists consider $\mu \geq 0$ to be a
physical assumption. Hence, from this point on, we
will not only assume that $(M^3, g)$ is asymptotically flat,
but also that it has non-negative scalar curvature,


$$R \geq 0$$


This notion of energy density also helps us 
understand total mass better. After all, we can take 
any asymptotically flat manifold and then change 
the metric to be perfectly flat outside a large 
compact set, thereby giving the new metric zero 
total mass. However, if we introduce the physical 
condition that both metrics have non-negative scalar 
curvature, then it is a beautiful theorem that this is
in fact not possible, unless the original metric was
already $(\mathbb{R}^3, \delta_{ij})$! (This theorem is actually a corollary
to the positive mass theorem discussed below.)
Thus, the curvature obstruction of having non- 
negative scalar curvature at each point is a very 
interesting condition. 

Also, notice the indirect connection between the 
total mass and local energy density. At this point, 
there does not seem to be much of a connection at 
all. The total mass is the 1/r rate at which the metric 
becomes flat at infinity, and local energy density is 
the scalar curvature at each point. Furthermore, if a 
metric is changed in a compact set, local energy 
density is changed, but the total mass is unaffected. 

The reason for this is that the total mass is “not”
the integral of the local energy density over the
manifold. In fact, this integral fails to take potential
energy into account (which would be expected to
contribute a negative energy) as well as gravitational
energy. Hence, it is not initially clear what we should
expect the relationship between total mass and local
energy density to be, so let us begin with an example.


Example Using Superharmonic Functions in $\mathbb{R}^3$


Once again, let us return to the $(\mathbb{R}^3, u(x)^4\delta_{ij})$
example. The formula for the scalar curvature is


$$R = -8u(x)^{-5}\Delta u(x)$$


Hence, since the physical assumption of non-negative
energy density implies non-negative scalar
curvature, we see that u(x) > 0 must be superharmonic
($\Delta u \leq 0$). For simplicity, let us also
assume that u(x) is harmonic outside a bounded set
so that we can expand u(x) at infinity using
spherical harmonics. Hence, u(x) has the asymptotics
of eqn [1]. By the maximum principle, it follows
that the infimum of u(x) must be a, referring
to eqn [1]. Hence, $b \geq 0$, which implies that $m \geq 0$!
Thus, we see that the assumption of non-negative
energy density at each point of $(\mathbb{R}^3, u(x)^4\delta_{ij})$ implies
that the total mass is also non-negative, which is
what one would hope.
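
The same conclusion can be reached by a flux computation, which also gives an explicit formula for the mass in this example. The following is a sketch, assuming u is smooth on all of $\mathbb{R}^3$, harmonic outside a bounded set, and has the asymptotics [1]; $B_r$ and $S_r$ denote the coordinate ball and sphere of radius r, and dx, dA are the Euclidean volume and area elements (notation introduced here).

```latex
% Divergence theorem on a large coordinate ball, using \partial_r u = -b/r^2 + O(1/r^3):
\int_{B_r} \Delta u \, dx = \int_{S_r} \partial_r u \, dA
  = 4\pi r^2\left(-\frac{b}{r^2} + O\!\left(\frac{1}{r^3}\right)\right)
  \;\longrightarrow\; -4\pi b \quad (r \to \infty) .
% Hence
b = -\frac{1}{4\pi}\int_{\mathbb{R}^3} \Delta u \, dx \;\geq\; 0
  \quad \text{whenever } \Delta u \leq 0 .
% Using R = -8u^{-5}\Delta u, i.e. \Delta u = -\tfrac{1}{8} R u^5, and m = 2ab:
m = 2ab = \frac{a}{16\pi}\int_{\mathbb{R}^3} R \, u^5 \, dx .
```

Since the volume element of $u^4\delta_{ij}$ is $u^6\,dx$, the integral of the energy density $R/16\pi$ over the manifold is $\frac{1}{16\pi}\int R\,u^6\,dx$, which is not the same quantity. In this conformally flat example the total mass is thus a weighted integral of the energy density rather than its plain integral.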


The Positive Mass Theorem 


Why would one hope this? What would be the 
difference if the total mass were negative? This 
would mean that a gravitational system of positive 
energy density could collectively act as a net 
negative total mass. This phenomenon has not 
been observed experimentally, and so it is not a 
property that we would hope to find in general 
relativity. 

More generally, suppose we have any asymptotically
flat manifold with non-negative scalar curvature. Is it
true that the total mass is also non-negative? The
answer is yes, and this fact is known as the positive mass
theorem, first proved by Schoen and Yau (1979) using
minimal surface techniques and then by Witten (1981)
using spinors. In the zero second fundamental form
case, the positive mass theorem is known as the
Riemannian positive mass theorem and is stated below.


Theorem 1 (Schoen, Yau). Let $(M^3, g)$ be any
asymptotically flat, complete Riemannian manifold
with non-negative scalar curvature. Then the total
mass $m \geq 0$, with equality if and only if $(M^3, g)$ is
isometric to $(\mathbb{R}^3, \delta_{ij})$.


Gravitational Energy 


The previous example neglects to illustrate some of 
the subtleties of the positive mass theorem. For 
example, it is easy to construct asymptotically flat 


manifolds $(M^3, g)$ (not conformal to $\mathbb{R}^3$) which have
zero scalar curvature everywhere and yet have 
“nonzero” total mass. By the positive mass theorem, 
the mass of these manifolds is positive. Physically, 
this corresponds to a spacetime with zero energy 
density everywhere which still has positive total 
mass. From where did this mass come? How can a 
vacuum have positive total mass? 

Physicists refer to this extra energy as gravita- 
tional energy. There is no known local definition of 
the energy density of a gravitational field, and 
presumably such a definition does not exist. The 
curious phenomenon, then, is that for some reason, 
gravitational energy always makes a non-negative 
contribution to the total mass of the system. 


Black Holes 


Another very interesting and natural phenomenon in 
general relativity is the existence of black holes. 
Instead of thinking of black holes as singularities in 
a spacetime, we will think of black holes in terms of 
their horizons. For example, suppose we are explor- 
ing the universe in a spacecraft capable of traveling 
at any speed less than the speed of light. If we are 
investigating a black hole, we would want to make 
sure that we don’t get too close and get trapped by 
the “gravitational forces” of the black hole. In fact, 
we could imagine a “sphere of no return” beyond 
which it is impossible to escape from the black hole. 
This is called the event horizon of a black hole. 

However, one limitation of the notion of an event 
horizon is that it is very hard to determine its location. 
One way is to let daredevil spacecraft see how close 
they can get to the black hole and still escape from it 
eventually. The only problem with this approach 
(besides the cost in spacecraft) is that it is hard to 
know when to stop waiting for a daredevil spacecraft 
to return. Even if it has been 50 years, it could be that 
this particular daredevil was not trapped by the black 
hole but got so close that it will take it 1000 or more 
years to return. Thus, to define the location of an event 
horizon even mathematically, we need to know the 
entire evolution of the spacetime. Hence, event
horizons cannot be computed based only on the
local geometry of the spacetime.

This problem is solved (at least for the mathema- 
tician) with the notion of apparent horizons of black 
holes. Given a surface in a spacetime, suppose that it 
emits an outward shell of light. If the surface area of 
this shell of light is decreasing everywhere on the 
surface, then this is called a trapped surface. The 
outermost boundary of these trapped surfaces is 
called the apparent horizon of the black hole. 
Apparent horizons can be computed based on their 


local geometry, and an apparent horizon always 
implies the existence of an event horizon outside of 
it (Hawking and Ellis 1973). 

Now let us return to the case we are considering
in this paper where $(M^3, g)$ is a “t = 0” slice of a
spacetime with zero second fundamental form. Then
it is a very nice geometric fact that apparent
horizons of black holes intersected with $M^3$ correspond
to the connected components of the outermost
minimal surface $\Sigma_0$ of $(M^3, g)$.

All of the surfaces we are considering in this paper
will be required to be smooth boundaries of open
bounded regions, so that outermost is well defined
with respect to a chosen end of the manifold.
A minimal surface in $(M^3, g)$ is a surface which is a
critical point of the area function with respect to any
smooth variation of the surface. The first variational
calculation implies that minimal surfaces have zero
mean curvature. The surface $\Sigma_0$ of $(M^3, g)$ is defined
as the boundary of the union of the open regions
bounded by all of the minimal surfaces in $(M^3, g)$. It
turns out that $\Sigma_0$ also has to be a minimal surface,
so we call $\Sigma_0$ the “outermost minimal surface.”
A qualitative sketch of an outermost minimal
surface of a 3-manifold is shown in Figure 2.

We will also define a surface to be “(strictly) outer 
minimizing” if every surface which encloses it has 
(strictly) greater area. Note that outermost minimal 
surfaces are strictly outer minimizing. Also, we define 
a “horizon” in our context to be any minimal surface 
which is the boundary of a bounded open region. 

It also follows from a stability argument (using
the Gauss-Bonnet theorem, interestingly) that each
component of an outermost minimal surface (in a
3-manifold with non-negative scalar curvature) must
have the topology of a sphere. Furthermore, there is
a physical argument, based on Penrose (1973),
which suggests that the mass contributed by the
black holes (thought of as the connected components
of $\Sigma_0$) should be defined to be $\sqrt{A_0/16\pi}$,
where $A_0$ is the area of $\Sigma_0$. Hence, the physical
argument that the total mass should be greater than
or equal to the mass contributed by the black holes
yields the following geometric statement.


Figure 2 A qualitative sketch of an outermost minimal surface
of a 3-manifold.


The Riemannian Penrose Inequality Let $(M^3, g)$ be
a complete, smooth, 3-manifold with non-negative
scalar curvature which is harmonically flat at
infinity with total mass m and which has an
outermost minimal surface $\Sigma_0$ of area $A_0$. Then,


$$m \geq \sqrt{\frac{A_0}{16\pi}}$$   [3]


with equality if and only if $(M^3, g)$ is isometric to the
Schwarzschild metric


$$\left(\mathbb{R}^3 \setminus \{0\},\ \left(1 + \frac{m}{2|x|}\right)^4\delta_{ij}\right)$$


outside their respective outermost minimal surfaces.

The above statement has been proved by the
present author, and Huisken and Ilmanen proved it
when $A_0$ is defined instead to be the area of the
largest connected component of $\Sigma_0$. We will discuss
both approaches in this paper, which are very
different, although they both involve flowing surfaces
and/or metrics.

We also clarify that the above statement is with
respect to a chosen end of $(M^3, g)$, since both the
total mass and the definition of outermost refer to a
particular end. In fact, nothing very important is
gained by considering manifolds with more than one
end, since extra ends can always be compactified by
connect summing them (around a neighborhood of
infinity) with large spheres while still preserving
non-negative scalar curvature, for example. Hence, we
will typically consider manifolds with just one end. In
the case that the manifold has multiple ends, we will
require every surface (which could have multiple
connected components) in this paper to enclose all of
the ends of the manifold except the chosen end.


The Schwarzschild Metric 


The Schwarzschild metric

$$\left(\mathbb{R}^3 \setminus \{0\},\ \left(1 + \frac{m}{2|x|}\right)^4 \delta_{ij}\right),$$

referred to in the above statement of the Riemannian
Penrose inequality, is a particularly important
example to consider, and corresponds to a zero
second fundamental form, spacelike slice of the
usual (3 + 1)-dimensional Schwarzschild metric
(which represents a spherically symmetric static
black hole in vacuum). The three-dimensional
Schwarzschild metrics have total mass $m > 0$ and
are characterized by being the only spherically
symmetric, geodesically complete, zero scalar
curvature 3-metrics, other than $(\mathbb{R}^3, \delta_{ij})$. They can also be
embedded in four-dimensional Euclidean space
$(x, y, z, w)$ as the set of points satisfying

$$|(x, y, z)| = \frac{w^2}{8m} + 2m,$$

which is a parabola rotated around an $S^2$. This last
picture allows us to see that the Schwarzschild
metric, which has two ends, has a $\mathbb{Z}_2$ symmetry
which fixes the sphere with $w = 0$ and $|(x, y, z)| = 2m$,
which is clearly minimal. Furthermore, the area
of this sphere is $4\pi(2m)^2$, giving equality in the
Riemannian Penrose inequality.
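
The equality case can be checked numerically. The following is a minimal sketch, not part of the original article: the function names are hypothetical, and it only evaluates the two formulas quoted above (the horizon area $4\pi(2m)^2$ and the embedding profile $w^2/8m + 2m$).

```python
import math


def horizon_area(m: float) -> float:
    """Area of the sphere w = 0, |(x, y, z)| = 2m in the embedded slice."""
    return 4.0 * math.pi * (2.0 * m) ** 2


def penrose_bound(area: float) -> float:
    """Right-hand side of the Riemannian Penrose inequality: sqrt(A / 16 pi)."""
    return math.sqrt(area / (16.0 * math.pi))


def embedded_radius(w: float, m: float) -> float:
    """|(x, y, z)| on the embedded Schwarzschild slice at height w."""
    return w * w / (8.0 * m) + 2.0 * m


if __name__ == "__main__":
    for m in (0.5, 1.0, 3.0):
        bound = penrose_bound(horizon_area(m))
        # Equality case: the bound computed from the horizon area equals m.
        print(f"m = {m}: sqrt(A0 / 16 pi) = {bound:.12f}")
```

The profile `embedded_radius` is even in `w` and smallest at `w = 0`, which is the reflection symmetry and the minimal sphere described above.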


A Brief History of the Problem 


The Riemannian Penrose inequality has a rich 
history spanning nearly three decades and has 
motivated much interesting mathematics and phy- 
sics. In 1973, R Penrose in effect conjectured an 
even more general version of inequality [3] using a 
very clever physical argument, which we will not 
have room to repeat here (Penrose 1973). His 
observation was that a counterexample to inequality 
[3] would yield Cauchy data for solving the Einstein 
equations, the solution to which would likely violate 
the cosmic censor conjecture (which says that 
singularities generically do not form in a spacetime 
unless they are inside a black hole). 

Jang and Wald (1977), extending ideas of Geroch,
gave a heuristic proof of inequality [3] by defining a
flow of 2-surfaces in $(M^3, g)$ in which the surfaces
flow in the outward normal direction at a rate equal
to the inverse of their mean curvatures at each point.
The Hawking mass of a surface (which is supposed
to estimate the total amount of energy inside the
surface) is defined to be

$$m_{\mathrm{Hawking}}(\Sigma) = \sqrt{\frac{|\Sigma|}{16\pi}} \left(1 - \frac{1}{16\pi} \int_\Sigma H^2\right)$$

(where $|\Sigma|$ is the area of $\Sigma$ and $H$ is the mean
curvature of $\Sigma$ in $(M^3, g)$) and, amazingly, is
nondecreasing under this “inverse mean curvature
flow.” This is seen by the fact that under inverse
mean curvature flow, it follows from the Gauss
equation and the second variation formula that

$$\frac{d}{dt}\, m_{\mathrm{Hawking}}(\Sigma) = \sqrt{\frac{|\Sigma|}{16\pi}} \left[\frac{1}{2} + \frac{1}{16\pi} \int_\Sigma \left(2\,\frac{|\nabla_\Sigma H|^2}{H^2} + R - 2K + \frac{1}{2}(\lambda_1 - \lambda_2)^2\right)\right]$$

when the flow is smooth, where $R$ is the scalar
curvature of $(M^3, g)$, $K$ is the Gauss curvature of the
surface $\Sigma$, and $\lambda_1$ and $\lambda_2$ are the eigenvalues of the
second fundamental form of $\Sigma$, or principal
curvatures. Hence,

$$R \ge 0$$

and

$$\int_\Sigma K \le 4\pi \qquad [4]$$

(which is true for any connected surface by the
Gauss-Bonnet theorem) imply

$$\frac{d}{dt}\, m_{\mathrm{Hawking}}(\Sigma) \ge 0 \qquad [5]$$

Furthermore,

$$m_{\mathrm{Hawking}}(\Sigma_0) = \sqrt{\frac{|\Sigma_0|}{16\pi}}$$

since $\Sigma_0$ is a minimal surface and has zero mean
curvature. In addition, the Hawking mass of sufficiently
round spheres at infinity in the asymptotically
flat end of $(M^3, g)$ approaches the total mass $m$. Hence,
if inverse mean curvature flow beginning with $\Sigma_0$
eventually flows to sufficiently round spheres at
infinity, inequality [3] follows from inequality [5].
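
As a sanity check on the definition, the Hawking mass can be evaluated on the coordinate spheres $|x| = r$ of the Schwarzschild metric $(1 + m/2|x|)^4 \delta_{ij}$. The sketch below is an illustration under stated assumptions, not taken from the article: the function names are hypothetical, and the mean curvature is obtained from the standard conformal change formula $H = u^{-2}(H_0 + 4\,\partial_r u / u)$ with $H_0 = 2/r$, which gives $H = (2/r)(1 - m/2r)/(1 + m/2r)^3$.

```python
import math


def conformal_factor(r: float, m: float) -> float:
    """u(r) = 1 + m / (2 r), so that the metric is u^4 times the flat metric."""
    return 1.0 + m / (2.0 * r)


def sphere_area(r: float, m: float) -> float:
    """Area of the coordinate sphere |x| = r in the metric u^4 delta."""
    return 4.0 * math.pi * r * r * conformal_factor(r, m) ** 4


def sphere_mean_curvature(r: float, m: float) -> float:
    """Mean curvature of |x| = r (assumed conformal change formula, see lead-in)."""
    u = conformal_factor(r, m)
    return (2.0 / r) * (1.0 - m / (2.0 * r)) / u ** 3


def hawking_mass(r: float, m: float) -> float:
    """sqrt(|S| / 16 pi) * (1 - (1 / 16 pi) * integral of H^2 over S)."""
    area = sphere_area(r, m)
    h = sphere_mean_curvature(r, m)
    integral_h_squared = h * h * area  # H is constant on a coordinate sphere
    return math.sqrt(area / (16.0 * math.pi)) * (
        1.0 - integral_h_squared / (16.0 * math.pi)
    )


if __name__ == "__main__":
    m = 1.0
    for r in (0.5, 1.0, 10.0, 1000.0):
        print(f"r = {r}: Hawking mass = {hawking_mass(r, m):.10f}")
```

Under these assumptions the Hawking mass of every such sphere equals $m$, from the horizon $r = m/2$ (where $H = 0$) out to infinity, which matches the fact that Schwarzschild gives equality in inequality [3].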

As noted by Jang and Wald, this argument only 
works when inverse mean curvature flow exists and 
is smooth, which is generally not expected to be the 
case. In fact, it is not hard to construct manifolds 
which do not admit a smooth inverse mean 
curvature flow. The problem is that if the mean 
curvature of the evolving surface becomes zero or is 
negative, it is not clear how to define the flow. 

For 20 years, this heuristic argument lay dormant 
until the work of Huisken and Ilmanen in 1997. With 
a very clever new approach, Huisken and Ilmanen 
discovered how to reformulate inverse mean curvature 
flow using an energy minimization principle in such a 
way that the new generalized inverse mean curvature 
flow always exists. The added twist is that the surface 
sometimes jumps outward. However, when the flow is 
smooth, it equals the original inverse mean curvature 
flow, and the Hawking mass is still monotone. Hence, 
as will be described in the next section, their new flow 
produced the first complete proof of inequality [3] for 
a single black hole. 

Coincidentally, the author found another proof of 
inequality [3], submitted in 1999, which works for any 
number of black holes. The approach involves flowing 
the original metric to a Schwarzschild metric (outside 
the horizon) in such a way that the area of the 
outermost minimal surface does not change and the 


total mass is nonincreasing. Then, since the Schwarzs- 
child metric gives equality in inequality [3], the 
inequality follows for the original metric. 

Fortunately, the flow of metrics which is defined 
is relatively simple, and in fact stays inside the 
conformal class of the original metric. The outer- 
most minimal surface flows outwards in this 
conformal flow of metrics, and encloses any 
compact set (and hence all of the topology of the 
original metric) in a finite amount of time. Further- 
more, this conformal flow of metrics preserves non- 
negative scalar curvature. We will describe this 
approach later in the paper. 

Other contributions on the Penrose conjecture 
have also been made by Herzlich using the Dirac 
operator which Witten used to prove the positive 
mass theorem, by Gibbons in the special case of 
collapsing shells, by Tod, by Bartnik for quasi- 
spherical metrics, and by the present author using 
isoperimetric surfaces. There is also some interesting 
work of Ludvigsen and Vickers using spinors and 
Bergqvist, both concerning the Penrose inequality 
for null slices of a spacetime. 


Inverse Mean Curvature Flow 


Geometrically, Huisken and Ilmanen’s idea can be
described as follows. Let $\Sigma(t)$ be the surface
resulting from inverse mean curvature flow for
time $t$ beginning with the minimal surface $\Sigma_0$.
Define $\bar\Sigma(t)$ to be the outermost minimal area
enclosure of $\Sigma(t)$. Typically, $\Sigma(t) = \bar\Sigma(t)$ in the flow,
but in the case that the two surfaces are not equal,
immediately replace $\Sigma(t)$ with $\bar\Sigma(t)$ and then
continue flowing by inverse mean curvature.

An immediate consequence of this modified flow is
that the mean curvature of $\bar\Sigma(t)$ is always non-negative
by the first variation formula, since otherwise $\bar\Sigma(t)$
would be enclosed by a surface with less area. This is
because if we flow a surface $\Sigma$ in the outward
direction with speed $\eta$, the first variation of the area
is $\int_\Sigma H\eta$, where $H$ is the mean curvature of $\Sigma$.

Furthermore, by stability, it follows that in the
regions where $\bar\Sigma(t)$ has zero mean curvature, it is
always possible to flow the surface out slightly to
have positive mean curvature, allowing inverse mean
curvature flow to be defined, at least heuristically at
this point.

Furthermore, the Hawking mass is still monotone
under this new modified flow. Notice that when $\Sigma(t)$
jumps outwards to $\bar\Sigma(t)$,

$$\int_{\bar\Sigma(t)} H^2 \le \int_{\Sigma(t)} H^2$$

since $\bar\Sigma(t)$ has zero mean curvature where the two
surfaces do not touch. Furthermore,

$$|\bar\Sigma(t)| = |\Sigma(t)|$$

since (this is a neat argument) $|\bar\Sigma(t)| \le |\Sigma(t)|$ (since
$\bar\Sigma(t)$ is a minimal area enclosure of $\Sigma(t)$) and we
cannot have $|\bar\Sigma(t)| < |\Sigma(t)|$ since $\Sigma(t)$ would have
jumped outwards at some earlier time. This is only a
heuristic argument, but we can then see that the
Hawking mass is nondecreasing during a jump by
the above two equations.

This new flow can be rigorously defined, always
exists, and the Hawking mass is monotone. Huisken
and Ilmanen define $\Sigma(t)$ to be the level sets of a
scalar valued function $u(x)$ defined on $(M^3, g)$ such
that $u(x) = 0$ on the original surface $\Sigma_0$ and satisfies

$$\operatorname{div}\left(\frac{\nabla u}{|\nabla u|}\right) = |\nabla u| \qquad [6]$$

in an appropriate weak sense. Since the left-hand
side of the above equation is the mean curvature of
the level sets of $u(x)$ and the right-hand side is the
reciprocal of the flow rate, the above equation
implies inverse mean curvature flow for the level sets
of $u(x)$ when $|\nabla u(x)| \ne 0$.

Huisken and Ilmanen use an energy minimization
principle to define weak solutions to eqn [6].
Equation [6] is said to be weakly satisfied in $\Omega$ by
the locally Lipschitz function $u$ if for all locally
Lipschitz $v$ with $\{v \ne u\} \subset\subset \Omega$,

$$J_u(u) \le J_u(v)$$

where

$$J_u(v) := \int_\Omega |\nabla v| + v\,|\nabla u|$$

It can then be seen that the Euler-Lagrange equation
of the above energy functional yields eqn [6].

In order to prove that a solution $u$ exists to the above
two equations, Huisken and Ilmanen regularize the
degenerate elliptic equation [6] to the elliptic equation

$$\operatorname{div}\left(\frac{\nabla u}{\sqrt{|\nabla u|^2 + \epsilon^2}}\right) = \sqrt{|\nabla u|^2 + \epsilon^2}$$

Solutions to the above equation are then shown to
exist using the existence of a subsolution, and then
taking the limit as $\epsilon$ goes to zero yields a weak
solution to eqn [6]. There are many details which we
are skipping here, but these are the main ideas.

As it turns out, weak solutions $u(x)$ to eqn [6]
often have flat regions where $u(x)$ equals a
constant. Hence, the level sets $\Sigma(t)$ of $u(x)$ will be
discontinuous in $t$ in this case, which corresponds
to the “jumping out” phenomenon referred to at
the beginning of this section.

We also note that since the Hawking mass of the
level sets of $u(x)$ is monotone, this inverse mean
curvature flow technique not only proves the
Riemannian Penrose inequality, but also gives a
new proof of the positive mass theorem in
dimension 3. This is seen by letting the initial surface be a
very small, round sphere (which will have approximately
zero Hawking mass) and then flowing by
inverse mean curvature, thereby proving $m \ge 0$.

The Huisken and Ilmanen inverse mean curvature
flow also seems ideally suited for proving Penrose
inequalities for 3-manifolds which have $R \ge -6$ and
which are asymptotically hyperbolic. This situation
occurs if $(M^3, g)$ is chosen to be a constant mean
curvature slice of the spacetime or if the spacetime is
defined to solve the Einstein equation with nonzero
cosmological constant. In these cases, there exists a
modified Hawking mass which is monotone under
inverse mean curvature flow, namely the usual
Hawking mass plus $4(|\Sigma|/16\pi)^{3/2}$. However, because
the monotonicity of the Hawking mass relies on the
Gauss-Bonnet theorem, these arguments do not work
in higher dimensions, at least so far. Also, because
of the need for eqn [4], inverse mean curvature
flow only proves the Riemannian Penrose inequality
for a single black hole. In the next section, we
present a technique which proves the Riemannian
Penrose inequality for any number of black holes,
and which can likely be generalized to higher
dimensions.


The Conformal Flow of Metrics 


Given any initial Riemannian manifold $(M^3, g_0)$
which has non-negative scalar curvature and which
is harmonically flat at infinity, we will define a
continuous, one-parameter family of metrics $(M^3, g_t)$,
$0 \le t < \infty$. This family of metrics will converge to a
three-dimensional Schwarzschild metric and will have
other special properties which will allow us to prove
the Riemannian Penrose inequality for the original
metric $(M^3, g_0)$.

In particular, let $\Sigma_0$ be the outermost minimal
surface of $(M^3, g_0)$ with area $A_0$. Then, we will also
define a family of surfaces $\Sigma(t)$ with $\Sigma(0) = \Sigma_0$ such
that $\Sigma(t)$ is minimal in $(M^3, g_t)$. This is natural since
as the metric $g_t$ changes, we expect that the location
of the horizon $\Sigma(t)$ will also change. Then, the
interesting quantities to keep track of in this flow are
$A(t)$, the total area of the horizon $\Sigma(t)$ in $(M^3, g_t)$,
and $m(t)$, the total mass of $(M^3, g_t)$ in the chosen end.

In addition to all of the metrics $g_t$ having
non-negative scalar curvature, we will also have the very
nice properties that

$$A'(t) = 0$$
$$m'(t) \le 0$$

for all $t \ge 0$. Then, since $(M^3, g_t)$ converges to a
Schwarzschild metric (in an appropriate sense)
which gives equality in the Riemannian Penrose
inequality as described in the introduction,

$$m(0) \ge m(\infty) = \sqrt{\frac{A(\infty)}{16\pi}} = \sqrt{\frac{A(0)}{16\pi}} \qquad [7]$$

which proves the Riemannian Penrose inequality for
the original metric $(M^3, g_0)$. The hard part, then, is
to find a flow of metrics which preserves
non-negative scalar curvature and the area of the
horizon, decreases total mass, and converges to a
Schwarzschild metric as $t$ goes to infinity.





The Definition of the Flow 


In fact, the metrics $g_t$ will all be conformal to $g_0$.
This conformal flow of metrics can be thought of as
the solution to a first-order ODE in $t$ defined by
eqns [8]-[11]. Let

$$g_t = u_t(x)^4\, g_0 \qquad [8]$$

and $u_0(x) \equiv 1$. Given the metric $g_t$, define

$$\Sigma(t) = \text{the outermost minimal area enclosure of } \Sigma_0 \text{ in } (M^3, g_t) \qquad [9]$$

where $\Sigma_0$ is the original outer minimizing horizon in
$(M^3, g_0)$. In the cases in which we are interested, $\Sigma(t)$
will not touch $\Sigma_0$, from which it follows that $\Sigma(t)$ is
actually a strictly outer minimizing horizon of $(M^3, g_t)$.
Then given the horizon $\Sigma(t)$, define $v_t(x)$ such that

$$\begin{cases} \Delta_{g_0} v_t(x) \equiv 0 & \text{outside } \Sigma(t) \\ v_t(x) = 0 & \text{on } \Sigma(t) \\ \lim_{x \to \infty} v_t(x) = -e^{-t} \end{cases} \qquad [10]$$

and $v_t(x) \equiv 0$ inside $\Sigma(t)$. Finally, given $v_t(x)$, define

$$u_t(x) = 1 + \int_0^t v_s(x)\, ds \qquad [11]$$

so that $u_t(x)$ is continuous in $t$ and has $u_0(x) \equiv 1$.
Note that eqn [11] implies that the first-order rate
of change of $u_t(x)$ is given by $v_t(x)$. Hence, the
first-order rate of change of $g_t$ is a function of itself, $g_0$,
and $v_t(x)$, which is a function of $g_0$, $t$, and $\Sigma(t)$, which
is in turn a function of $g_t$ and $\Sigma_0$. Thus, the first-order
rate of change of $g_t$ is a function of $t$, $g_t$, $g_0$, and $\Sigma_0$.


Theorem 2 Taken together, eqns [8]-[11] define a
first-order ODE in $t$ for $u_t(x)$ which has a solution
which is Lipschitz in the $t$ variable, $C^1$ in the $x$
variable everywhere, and smooth in the $x$ variable
outside $\Sigma(t)$. Furthermore, $\Sigma(t)$ is a smooth, strictly
outer minimizing horizon in $(M^3, g_t)$ for all $t \ge 0$,
and $\Sigma(t_2)$ encloses but does not touch $\Sigma(t_1)$ for all
$t_2 > t_1 \ge 0$.


Since $v_t(x)$ is a superharmonic function in $(M^3, g_0)$
(harmonic everywhere except on $\Sigma(t)$, where it is
weakly superharmonic), it follows that $u_t(x)$ is
superharmonic as well. Thus, from eqn [11] we see that
$\lim_{x \to \infty} u_t(x) = e^{-t}$ and consequently that $u_t(x) > 0$
for all $t$ by the maximum principle. Then, since

$$R(g_t) = u_t(x)^{-5} \left(-8\Delta_{g_0} + R(g_0)\right) u_t(x) \qquad [12]$$

it follows that $(M^3, g_t)$ is an asymptotically flat
manifold with non-negative scalar curvature.

Even so, it still may not seem like $g_t$ is particularly
naturally defined since the rate of change of $g_t$ appears
to depend on $t$ and the original metric $g_0$ in eqn [10].
We would prefer a flow where the rate of change of $g_t$
can be defined purely as a function of $g_t$ (and $\Sigma_0$
perhaps), and interestingly enough this actually does
turn out to be the case! The present author has proved
this very important fact and defined a new equivalence
class of metrics called the harmonic conformal class.
Then, once we decide to find a flow of metrics which
stays inside the harmonic conformal class of the
original metric (outside the horizon) and keeps the
area of the horizon $\Sigma(t)$ constant, we are basically
forced to choose the particular conformal flow of
metrics defined above.


Theorem 3 The function $A(t)$ is constant in $t$ and
$m(t)$ is nonincreasing in $t$, for all $t \ge 0$.


The fact that $A'(t) = 0$ follows from the fact that
to first order the metric is not changing on $\Sigma(t)$
(since $v_t(x) = 0$ there) and from the fact that to first
order the area of $\Sigma(t)$ does not change as it moves
outward since $\Sigma(t)$ is a critical point for area in
$(M^3, g_t)$. Hence, the interesting part of Theorem 3 is
proving that $m'(t) \le 0$. Curiously, this follows from
a nice trick using the Riemannian positive mass
theorem, which we describe later.

Another important aspect of this conformal flow of
the metric is that outside the horizon $\Sigma(t)$, the manifold
$(M^3, g_t)$ becomes more and more spherically symmetric
and “approaches” a Schwarzschild manifold
$(\mathbb{R}^3 \setminus \{0\}, s)$ in the limit as $t$ goes to $\infty$. More precisely,


Theorem 4 For sufficiently large $t$, there exists a
diffeomorphism $\phi_t$ between $(M^3, g_t)$ outside the
horizon $\Sigma(t)$ and a fixed Schwarzschild manifold
$(\mathbb{R}^3 \setminus \{0\}, s)$ outside its horizon. Furthermore, for all
$\epsilon > 0$, there exists a $T$ such that for all $t > T$, the
metrics $g_t$ and $\phi_t^*(s)$ (when determining the lengths
of unit vectors of $(M^3, g_t)$) are within $\epsilon$ of each other
and the total masses of the two manifolds are within
$\epsilon$ of each other. Hence,

$$\lim_{t \to \infty} \frac{m(t)}{\sqrt{A(t)}} = \sqrt{\frac{1}{16\pi}}$$

Theorem 4 is not that surprising really, although a
careful proof is reasonably long. However, if one is
willing to believe that the flow of metrics converges
to a spherically symmetric metric outside the
horizon, then Theorem 4 follows from two facts.
The first fact is that the scalar curvature of $(M^3, g_t)$
eventually becomes identically zero outside the
horizon $\Sigma(t)$ (assuming $(M^3, g_0)$ is harmonically
flat). This follows from the facts that $\Sigma(t)$ encloses
any compact set in a finite amount of time, that
harmonically flat manifolds have zero scalar
curvature outside a compact set, that $u_t(x)$ is harmonic
outside $\Sigma(t)$, and eqn [12]. The second fact is that
the Schwarzschild metrics are the only complete,
spherically symmetric 3-manifolds with zero scalar
curvature (except for the flat metric on $\mathbb{R}^3$).

The Riemannian Penrose inequality, inequality
[3], then follows from eqn [7] using Theorems 2-4,
for harmonically flat manifolds. Since asymptotically
flat manifolds can be approximated arbitrarily
well by harmonically flat manifolds while changing
the relevant quantities arbitrarily little, the
asymptotically flat case also follows. Finally, the case of
equality of the Penrose inequality follows from a
more careful analysis of these same arguments.





Qualitative Discussion 


Figures 3 and 4 are meant to help illustrate some of the
properties of the conformal flow of the metric. Figure 3
is the original metric which has a strictly outer
minimizing horizon $\Sigma_0$. As $t$ increases, $\Sigma(t)$ moves
outwards, but never inwards. In Figure 4, we can
observe one of the consequences of the fact that
$A(t) = A_0$ is constant in $t$. Since the metric is not
changing inside $\Sigma(t)$, all of the horizons $\Sigma(s)$, $0 \le s \le t$,
have area $A_0$ in $(M^3, g_t)$. Hence, inside $\Sigma(t)$, the
manifold $(M^3, g_t)$ becomes cylinder-like in the sense
that it is laminated (i.e., foliated but with some gaps
allowed) by all of the previous horizons which all have
the same area $A_0$ with respect to the metric $g_t$.

Figure 3 Original metric $(M^3, g_0)$ having a strictly outer minimizing
horizon $\Sigma_0$.

Figure 4 Metric $(M^3, g_t)$ after time $t$.

Now let us suppose that the original horizon $\Sigma_0$
of $(M^3, g)$ had two components, for example. Then
each of the components of the horizon will move
outwards as $t$ increases, and at some point before
they touch they will suddenly jump outwards to
form a horizon with a single component enclosing
the previous horizon with two components. Even
horizons with only one component will sometimes
jump outwards, but no more than a countable
number of times. It is interesting that this
phenomenon of surfaces jumping is also found in the
Huisken-Ilmanen approach to the Penrose
conjecture using their generalized $1/H$ flow.


Proof that $m'(t) \le 0$


The most surprising aspect of the flow defined
earlier is that $m'(t) \le 0$. As mentioned in that
section, this important fact follows from a nice
trick using the Riemannian positive mass theorem.

The first step is to realize that while the rate of
change of $g_t$ appears to depend on $t$ and $g_0$, this is in
fact an illusion. As described in detail by Bray, the
rate of change of $g_t$ can be described purely in terms
of $g_t$ (and $\Sigma_0$). It is also true that the rate of change
of $g_t$ depends only on $g_t$ and $\Sigma(t)$. Hence, there is no
special value of $t$, so proving $m'(t) \le 0$ is equivalent
to proving $m'(0) \le 0$. Thus, without loss of
generality, we take $t = 0$ for convenience.

Now expand the harmonic function $v_0(x)$, defined
in eqn [10], using spherical harmonics at infinity, to get

$$v_0(x) = -1 + \frac{c}{|x|} + O\!\left(\frac{1}{|x|^2}\right) \qquad [13]$$

for some constant $c$. Since the rate of change of the
metric $g_t$ at $t = 0$ is given by $v_0(x)$ and since the total
mass $m(t)$ depends on the $1/r$ rate at which the
metric $g_t$ becomes flat at infinity (see eqn [2]), it is
not surprising that direct calculation gives us that

$$m'(0) = 2(c - m(0))$$

Hence, to show that $m'(0) \le 0$, we need to show
that

$$c \le m(0) \qquad [14]$$


In fact, counterexamples to eqn [14] can be found
if we remove either of the requirements that $\Sigma(0)$
(which is used in the definition of $v_0(x)$) be a
minimal surface or that $(M^3, g_0)$ have non-negative
scalar curvature. Hence, we quickly see that eqn
[14] is a fairly deep conjecture which says something
quite interesting about manifolds with non-negative
scalar curvature. Well, the Riemannian positive
mass theorem is also a deep conjecture which says
something quite interesting about manifolds with
non-negative scalar curvature. Hence, it is natural to
try to use the Riemannian positive mass theorem to
prove eqn [14].
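
For the exact Schwarzschild metric, inequality [14] can be checked by hand, and the short script below does so numerically. It is only an illustrative sketch under stated assumptions, not part of the original argument: the function names are hypothetical, and the closed form $v_0(r) = -(1 - m/2r)/(1 + m/2r)$ is an assumption obtained from the fact that, for $g_0 = U^4\delta$ with $U = 1 + m/2r$ harmonic, a function $f$ is $g_0$-harmonic exactly when $Uf$ is harmonic in the flat metric.

```python
def v0_schwarzschild(r: float, m: float) -> float:
    """Assumed closed form of v_0 on Schwarzschild: 0 at r = m/2, -> -1 at infinity."""
    return -(1.0 - m / (2.0 * r)) / (1.0 + m / (2.0 * r))


def estimate_c(m: float, r: float = 1.0e6) -> float:
    """Estimate c in v_0 = -1 + c/|x| + O(1/|x|^2) by evaluating r * (v_0 + 1)."""
    return r * (v0_schwarzschild(r, m) + 1.0)


def mass_derivative(m: float) -> float:
    """m'(0) = 2 (c - m(0)), using the numerical estimate of c."""
    return 2.0 * (estimate_c(m) - m)


if __name__ == "__main__":
    for m in (0.5, 1.0, 2.0):
        print(f"m = {m}: c ~ {estimate_c(m):.8f}, m'(0) ~ {mass_derivative(m):.2e}")
```

Under these assumptions $c = m(0)$, so $m'(0) = 0$: the Schwarzschild metric sits exactly at the borderline of [14], as one would expect of the equality case.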

Thus, we want to create a manifold whose total
mass depends on $c$ from eqn [13]. The idea is to use
a reflection trick similar to one used by Bunting and
Masood-ul-Alam (1987) for another purpose. First,
remove the region of $M^3$ inside $\Sigma(0)$ and then reflect
the remainder of $(M^3, g_0)$ through $\Sigma(0)$. Define the
resulting Riemannian manifold to be $(\bar M^3, \bar g_0)$, which
has two asymptotically flat ends since $(M^3, g_0)$ has
exactly one asymptotically flat end not contained by
$\Sigma(0)$. Note that $(\bar M^3, \bar g_0)$ has non-negative scalar
curvature everywhere except on $\Sigma(0)$, where the
metric has corners. In fact, the fact that $\Sigma(0)$ has
zero mean curvature (since it is a minimal surface)
implies that $(\bar M^3, \bar g_0)$ has “distributional”
non-negative scalar curvature everywhere, even on $\Sigma(0)$.
This notion is made rigorous by Bray. Thus, we have
used the fact that $\Sigma(0)$ is minimal in a critical way.

Recall from eqn [10] that $v_0(x)$ was defined to be
the harmonic function equal to zero on $\Sigma(0)$ which
goes to $-1$ at infinity. We want to reflect $v_0(x)$ to be
defined on all of $(\bar M^3, \bar g_0)$. The trick here is to define
$v_0(x)$ on $(\bar M^3, \bar g_0)$ to be the harmonic function which
goes to $-1$ at infinity in the original end and goes to
$1$ at infinity in the reflected end. By symmetry, $v_0(x)$
equals 0 on $\Sigma(0)$ and so agrees with its original
definition on $(M^3, g_0)$.

The next step is to compactify one end of $(\bar M^3, \bar g_0)$.
By the maximum principle, we know that $v_0(x) > -1$
and $c > 0$, so the new Riemannian manifold
$(\bar M^3, (v_0(x) + 1)^4\, \bar g_0)$ does the job quite nicely and
compactifies the original end to a point. In fact, the
compactified point at infinity and the metric there
can be filled in smoothly (using the fact that $(M^3, g_0)$ is
harmonically flat). It then follows from eqn [12] that
this new compactified manifold has non-negative
scalar curvature since $v_0(x) + 1$ is harmonic.

The last step is simply to apply the Riemannian
positive mass theorem to $(\bar M^3, (v_0(x) + 1)^4\, \bar g_0)$. It is
not surprising that the total mass $\tilde m(0)$ of this
manifold involves $c$, but it is quite lucky that direct
calculation yields

$$\tilde m(0) = -4(c - m(0))$$

which must be positive by the Riemannian positive
mass theorem. Thus, we have that

$$m'(0) = 2(c - m(0)) = -\tfrac{1}{2}\, \tilde m(0) \le 0$$


Open Questions and Applications 


Now that the Riemannian Penrose conjecture has been 
proved, what are the next interesting directions? What 
applications can be found? Is this subject only of 
physical interest, or are there possibly broader 
applications to other problems in mathematics? 

Clearly, the most natural open problem is to find a
way to prove the general Penrose inequality in which
$M^3$ is allowed to have any second fundamental form in
the spacetime. There is good reason to think that this
may follow from the Riemannian Penrose inequality,
although this is a bit delicate. On the other hand, the
general positive mass theorem followed from the
Riemannian positive mass theorem as was originally
shown by Schoen and Yau using an idea due to Jang.
For physicists, this problem is definitely a top priority
since most spacetimes do not even admit spacelike
slices with zero second fundamental form.

Another interesting question is to ask these same 
questions in higher dimensions. The author is currently 
working on a paper to prove the Riemannian Penrose 
inequality in dimensions <8. Dimension 8 and higher 
are harder because of the surprising fact that minimal 
hypersurfaces (and hence apparent horizons of black 
holes) can have codimension 7 singularities (points 
where the hypersurface is not smooth). This curious 
technicality is also the reason that the positive mass 
conjecture is still open in dimensions 8 and higher for 
manifolds which are not spin. 

Naturally, it is harder to tell what the applications
of these techniques might be to other problems, but
already there have been some. One application is to
the famous Yamabe problem: given a compact
3-manifold $M^3$, define $E(g) = \int_{M^3} R_g\, dV_g$, where $g$ is
scaled so that the total volume of $(M^3, g)$ is 1, $R_g$ is
the scalar curvature at each point, and $dV_g$ is the
volume form. An idea due to Yamabe was to try to
construct canonical metrics on $M^3$ by finding critical
points of this energy functional on the space of metrics.
Define $C(g)$ to be the infimum of $E(\bar g)$ over all metrics $\bar g$
conformal to $g$. Then the (topological) Yamabe
invariant of $M^3$, denoted here as $Y(M^3)$, is defined to
be the supremum of $C(g)$ over all metrics $g$.
$Y(S^3) = 6 \cdot (2\pi^2)^{2/3} \equiv Y_1$ is known to be the largest possible value
for Yamabe invariants of 3-manifolds. It is also known
that $Y(T^3) = 0$ and $Y(S^2 \times S^1) = Y_1 = Y(S^2 \tilde\times S^1)$,
where $S^2 \tilde\times S^1$ is the nonorientable $S^2$ bundle over $S^1$.

The author, working with Andre Neves on a
problem suggested by Richard Schoen, recently was
able to compute the Yamabe invariant of $\mathbb{RP}^3$ using
inverse mean curvature flow techniques and found
that $Y(\mathbb{RP}^3) = Y_1/2^{2/3} \equiv Y_2$. A corollary is
$Y(\mathbb{RP}^2 \times S^1) = Y_2$ as well. These techniques also yield the
surprisingly strong result that the only prime
3-manifolds with Yamabe invariant larger than that of $\mathbb{RP}^3$ are
$S^3$, $S^2 \times S^1$, and $S^2 \tilde\times S^1$. The Poincare conjecture for
3-manifolds with Yamabe invariant greater than that of $\mathbb{RP}^3$
is therefore a corollary. Furthermore, the problem of
classifying 3-manifolds is known to reduce to the
problem of classifying prime 3-manifolds. The
Yamabe approach then would be to make a list of
prime 3-manifolds ordered by $Y$. The first five prime
3-manifolds on this list are therefore $S^3$, $S^2 \times S^1$,
$S^2 \tilde\times S^1$, $\mathbb{RP}^3$, and $\mathbb{RP}^2 \times S^1$.
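
For orientation, the two constants $Y_1$ and $Y_2$ quoted above are easy to evaluate. The snippet below is a small numerical aside rather than part of the article; the function name is hypothetical, and the cross-check assumes the standard facts that the round unit $S^3$ has scalar curvature 6 and volume $2\pi^2$, and that the round $\mathbb{RP}^3$ has half that volume.

```python
import math


def normalized_energy(scalar_curvature: float, volume: float) -> float:
    """E(g) for a constant scalar curvature metric rescaled to unit volume.

    In dimension 3 the integral of R dV scales like volume^(1/3), so after
    normalizing the volume to 1 the energy is R * volume^(2/3).
    """
    return scalar_curvature * volume ** (2.0 / 3.0)


Y1 = 6.0 * (2.0 * math.pi ** 2) ** (2.0 / 3.0)  # Y(S^3)
Y2 = Y1 / 2.0 ** (2.0 / 3.0)                    # Y(RP^3)

if __name__ == "__main__":
    print(f"Y1 = {Y1:.4f}")
    print(f"Y2 = {Y2:.4f}")
    # Cross-check against the round metrics on S^3 and RP^3.
    print(f"round S^3 energy  = {normalized_energy(6.0, 2.0 * math.pi ** 2):.4f}")
    print(f"round RP^3 energy = {normalized_energy(6.0, math.pi ** 2):.4f}")
```

The agreement of `Y2` with the energy of the round $\mathbb{RP}^3$ is consistent with the stated value $Y_1/2^{2/3}$.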
Introduction


The aim of these pages is to give a brief,
self-contained introduction to that part of geometric
measure theory which is more directly related to the
calculus of variations, namely the theory of currents
and its applications to the solution of the Plateau
problem. (The theory of finite-perimeter sets, which
is closely related to currents and to the Plateau
problem, is treated in the article Free Interfaces and
Free Discontinuities: Variational Problems in the
Encyclopedia.)

Named after the Belgian physicist JAF Plateau
(1801-1883), this problem was originally formulated
as follows: find the surface of minimal area spanning
a given curve in space. Nowadays, it is mostly
understood in the sense of developing a mathematical
framework where the existence of k-dimensional
surfaces of minimal volume that span a prescribed
boundary can be rigorously proved. Indeed, several
solutions have been proposed in the last century,
none of which is completely satisfactory.

One difficulty is that the infimum of the area
among all smooth surfaces with a certain boundary
may not be attained. More precisely, it may happen
that all minimizing sequences (i.e., sequences of
smooth surfaces whose area approaches the infimum)
converge to a singular surface. Therefore, one is
forced to consider a larger class of admissible surfaces
than just smooth ones (in fact, one might want to do
this also for modeling reasons; this is indeed the case
with soap films, soap bubbles, and other capillarity
problems). But what does it mean that a set “spans” a
given curve? And what should we mean by the area of a
set which is not a smooth surface?

The theory of integral currents developed by 
Federer and Fleming (1960) provides a class of 
generalized (oriented) surfaces with well-defined 
notions of boundary and area (called mass) where 
the existence of minimizers can be proved by direct 
methods. More precisely, this class is large enough 
to have good compactness properties with respect 
to a topology that makes the mass a lower- 
semicontinuous functional. This approach turned 
out to be quite powerful and flexible, and in the 
last decades the theory of currents has found 
applications in several different areas, from dyna- 
mical systems (in particular, Mather theory) to the 
theory of foliations, to optimal transport problems. 


Hausdorff Measures, Dimension, 
and Rectifiability 


The volume of a smooth d-dimensional surface in
$\mathbb{R}^n$ is usually defined using parametrizations by
subsets of $\mathbb{R}^d$. The notion of Hausdorff measure
allows one to compute the d-dimensional volume using
coverings instead of parametrizations, and, what is
more important, applies to all sets in $\mathbb{R}^n$, and makes
sense even if $d$ is not an integer. Attached to
Hausdorff measure is the notion of Hausdorff
dimension. Again, it can be defined for all sets in
$\mathbb{R}^n$ and is not necessarily an integer. The last
fundamental notion is rectifiability: k-rectifiable
sets can be roughly understood as the largest class
of k-dimensional sets for which it is still possible to
define a k-dimensional tangent bundle, even if only
in a very weak sense. They are essential to the
construction of integral currents.


Hausdorff Measure 


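Let $d \ge 0$ be a positive real number. Given a set $E$ in
$\mathbb{R}^n$, for every $\delta > 0$ we set

$$\mathcal{H}^d_\delta(E) := \frac{\omega_d}{2^d} \inf\Big\{ \sum_j \big(\operatorname{diam}(E_j)\big)^d \Big\} \qquad [1]$$

where $\omega_d$ is the d-dimensional volume of the unit
ball in $\mathbb{R}^d$ whenever $d$ is an integer (there is no
canonical choice for $\omega_d$ when $d$ is not an integer;
a convenient one is $\omega_d = 2^d$), and the infimum is
taken over all countable families of sets $\{E_j\}$ that
cover $E$ and whose diameters satisfy $\operatorname{diam}(E_j) \le \delta$.
The d-dimensional Hausdorff measure of $E$ is

$$\mathcal{H}^d(E) := \lim_{\delta \to 0} \mathcal{H}^d_\delta(E) \qquad [2]$$

(the limit exists because $\mathcal{H}^d_\delta(E)$ is decreasing in $\delta$).

Remarks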

(i) $\mathcal{H}^d$ is called d-dimensional because of its
scaling behavior: if $E_\lambda$ is a copy of $E$ scaled
homothetically by a factor $\lambda$, then

$$\mathcal{H}^d(E_\lambda) = \lambda^d\, \mathcal{H}^d(E)$$

Thus, $\mathcal{H}^1$ scales like the length, $\mathcal{H}^2$ scales like the
area, and so on.

(ii) The measure $\mathcal{H}^d$ is clearly invariant under
rigid motions (translations and rotations). This
implies that $\mathcal{H}^d$ agrees on $\mathbb{R}^d$ with the Lebesgue
measure up to some constant factor; the
renormalization constant $\omega_d/2^d$ in [1] makes this factor
equal to 1. Thus, $\mathcal{H}^d(E)$ agrees with the usual
d-dimensional volume for every set $E$ in $\mathbb{R}^d$, and
the area formula shows that the same is true if
$E$ is (a subset of) a d-dimensional surface of class $C^1$
in $\mathbb{R}^n$.

(iii) Besides the Hausdorff measure, there are 
several other, less popular notions of d-dimensional 
measure: all of them are invariant under rigid motion, 
scale in the expected way, and agree with 74 for sets 
contained in R or in a d-dimensional surface of class 
C!, and yet they differ for other sets (for further 
details, see Federer (1996, section 2.10)). 

(iv) The definition of 7”4(E) uses only the notion 
of diameter, and therefore makes sense when E is a 
subset of an arbitrary metric space. Note that 7¢(E) 
depends only on the restriction of the metric to E, 
and not on the ambient space. 

(v) The measure 7% is countably additive on 
the o-algebra of Borel sets in R”, but not on all sets; 
to avoid pathological situations, we shall always 
assume that sets and maps are Borel measurable. 


Hausdorff Dimension


Intuitively, the length of a surface
should be infinite, while the area of a curve should
be null. These are indeed particular cases of the
following implications:

$$\mathcal{H}^d(E) > 0 \;\Rightarrow\; \mathcal{H}^{d'}(E) = \infty \quad \text{for } d' < d$$
$$\mathcal{H}^d(E) < \infty \;\Rightarrow\; \mathcal{H}^{d'}(E) = 0 \quad \text{for } d' > d$$

Hence, the infimum of all d such that $\mathcal{H}^d(E) = 0$ and
the supremum of all d such that $\mathcal{H}^d(E) = \infty$
coincide. This number is called the Hausdorff dimension
of E, and is denoted by $\dim_H(E)$. For surfaces of class
$C^1$, the notion of Hausdorff dimension agrees with
the usual one. Examples of sets with nonintegral
dimension are described in the next subsection.


Remarks 


(i) Note that $\mathcal{H}^d(E)$ may be 0 or $\infty$ even for
$d = \dim_H(E)$.

(ii) The Hausdorff dimension of a set E is strictly
related to the metric on E, and not just to the
topology. Indeed, it is preserved under diffeomorphisms
but not under homeomorphisms, and it does
not always agree with the topological dimension.
For instance, the Hausdorff dimension of the graph
of a continuous function $f: \mathbb{R} \to \mathbb{R}$ can be any
number between 1 and 2 (included).

(iii) For nonsmooth sets, the Hausdorff dimension 
does not always conform to intuition: for example, 
the dimension of a Cartesian product E x F of 
compact sets does not agree in general with the sum 
of the dimensions of E and F. 

(iv) There are many other notions of dimension 
besides Hausdorff and topological ones. Among 
these, packing dimension and box-counting dimen- 
sion have interesting applications (see Falconer 
(2003, chapters 3 and 4)). 


Self-Similar Fractals 


Interesting examples of sets with nonintegral dimension
are self-similar fractals. We present here a
simplified version of a construction due to Hutchinson
(Falconer 2003, chapter 9). Let $\{W_i\}$ be a finite set of
similitudes of $\mathbb{R}^n$ with scaling factors $\lambda_i < 1$, and
assume that there exists a bounded open set V such
that the sets $V_i := W_i(V)$ are pairwise disjoint and
contained in V. The self-similar fractal associated with
the system $\{W_i\}$ is the compact set C that satisfies

$$C = \bigcup_i W_i(C)$$ [3]

The term "self-similar" comes from the fact that C
can be written as a union of scaled copies of itself.
The existence (and uniqueness) of such a C follows
from a standard fixed-point argument applied to the
map $C \mapsto \bigcup_i W_i(C)$. The dimension d of C is the
unique solution of the equation

$$\sum_i \lambda_i^d = 1$$ [4]

Formula [4] can be easily justified: if the sets $W_i(C)$
are disjoint (and the assumption on V implies that
this is almost the case), then [3] implies
$\mathcal{H}^d(C) = \sum_i \mathcal{H}^d(W_i(C)) = \sum_i \lambda_i^d \, \mathcal{H}^d(C)$,
and therefore $\mathcal{H}^d(C)$ can
be positive and finite if and only if d satisfies [4].
An example of this construction is the usual
Cantor set in $\mathbb{R}$, which is given by the similitudes

$$W_1(x) := \tfrac{1}{3}x \quad \text{and} \quad W_2(x) := \tfrac{2}{3} + \tfrac{1}{3}x$$

By [4], its dimension is $d = \log 2/\log 3$. Other
examples are described in Figures 1-3.
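
Equation [4] rarely has a closed-form solution when the scaling
factors differ, but it is easy to solve numerically. The following
is a minimal sketch, not part of the original construction: the
function name similarity_dimension and the bisection tolerance are
our own choices. It relies only on the fact that the left-hand side
of [4] is strictly decreasing in d when all factors lie in (0, 1).

```python
import math


def similarity_dimension(ratios, tol=1e-12):
    """Solve sum(r**d for r in ratios) == 1 for d >= 0 by bisection.

    `ratios` are the scaling factors of the similitudes; each one is
    assumed to lie strictly between 0 and 1.
    """
    if not ratios or any(not (0.0 < r < 1.0) for r in ratios):
        raise ValueError("scaling factors must lie strictly between 0 and 1")

    def g(d):
        return sum(r ** d for r in ratios) - 1.0

    # g(0) = len(ratios) - 1 >= 0 and g decreases to -1, so a root exists.
    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Cantor set: two similitudes with factor 1/3.
    print(similarity_dimension([1 / 3, 1 / 3]), math.log(2) / math.log(3))
    # von Koch curve: four similitudes with factor 1/3.
    print(similarity_dimension([1 / 3] * 4), math.log(4) / math.log(3))
```

When all factors are equal to a common value, the numerical root can
be compared with the explicit value log N/(-log of the factor), where
N is the number of similitudes, as in the captions of Figures 1-3.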


Rectifiable Sets 


Given an integer $k = 1, \ldots, n$, we say that a set E in
$\mathbb{R}^n$ is k-rectifiable if it can be covered by a countable
family of sets $\{S_j\}$ such that $S_0$ is $\mathcal{H}^k$-negligible (i.e.,
$\mathcal{H}^k(S_0) = 0$) and $S_j$ is a k-dimensional surface of class
$C^1$ for $j = 1, 2, \ldots$ Note that $\dim_H(E) \le k$ because
each $S_j$ has dimension k.

A k-rectifiable set E bears little resemblance to
smooth surfaces (it can be everywhere dense!), but it
still admits a suitably weak notion of tangent bundle.











Figure 1 The maps $W_i$, $i = 1, \ldots, 4$, take the square V into the
squares $V_i$ at the corners of V. The scaling factor is $\lambda$ for all i,
hence $\dim_H(C) = \log 4/(-\log \lambda)$. Note that $\dim_H(C)$ can be any
number between 0 and 2, including 1.


Figure 2 A self-similar fractal with more complicated topology.
The scaling factor is 1/4 for all twelve similitudes, hence
$\dim_H(C) = \log 12/\log 4$.


Figure 3 The von Koch curve (or snowflake). The scaling
factor is 1/3 for all four similitudes, hence $\dim_H(C) = \log 4/\log 3$.


More precisely, it is possible to associate with every
$x \in E$ a k-dimensional subspace of $\mathbb{R}^n$, denoted by
Tan(E,x), so that for every k-dimensional surface S
of class $C^1$ in $\mathbb{R}^n$ there holds

$$\mathrm{Tan}(E,x) = \mathrm{Tan}(S,x) \quad \text{for } \mathcal{H}^k\text{-a.e. } x \in E \cap S$$ [5]

where Tan(S,x) is the tangent space to S at x
according to the usual definition.

It is not difficult to see that Tan(E,x) is uniquely
determined by [5] up to an $\mathcal{H}^k$-negligible set of
points $x \in E$, and if E is a surface of class $C^1$, then it
agrees with the usual tangent space for $\mathcal{H}^k$-almost
all points of E.


Remarks 


(i) In the original definition of rectifiability, the
sets $S_j$ with $j > 0$ are Lipschitz images of $\mathbb{R}^k$, that is,
$S_j := f_j(\mathbb{R}^k)$, where $f_j: \mathbb{R}^k \to \mathbb{R}^n$ is a Lipschitz map.
It can be shown that this definition is equivalent to
the one above.

(ii) The construction of the tangent bundle is
straightforward: Let $\{S_j\}$ be a covering of E as earlier,
and set $\mathrm{Tan}(E,x) := \mathrm{Tan}(S_j,x)$, where j is the smallest
positive integer such that $x \in S_j$. Then [5] is an
immediate corollary of the following lemma: if S and
S' are k-dimensional surfaces of class $C^1$ in $\mathbb{R}^n$, then
$\mathrm{Tan}(S,x) = \mathrm{Tan}(S',x)$ for $\mathcal{H}^k$-almost every $x \in S \cap S'$.

(iii) A set E in $\mathbb{R}^n$ is called purely k-unrectifiable
if it contains no k-rectifiable subset with positive
k-dimensional measure, or, equivalently, if
$\mathcal{H}^k(E \cap S) = 0$ for every k-dimensional surface S of
class $C^1$. For instance, every product $E := E_1 \times E_2$,
where $E_1$ and $E_2$ are $\mathcal{H}^1$-negligible sets in $\mathbb{R}$, is a
purely 1-unrectifiable set in $\mathbb{R}^2$ (it suffices to show
that $\mathcal{H}^1(E \cap S) = 0$ whenever S is the graph of a
function $f: \mathbb{R} \to \mathbb{R}$ of class $C^1$, and this follows by
the usual formula for the length of the graph). Note
that the Hausdorff dimension of such product sets
can be any number between 0 and 2, hence
rectifiability is not related to dimension. The
self-similar fractals described in Figures 1 and 3 are both
purely 1-unrectifiable.


Rectifiable Sets with Finite Measure 


If E is a k-rectifiable set with finite (or locally finite)
k-dimensional measure, then Tan(E,x) can be
related to the behavior of E close to the point x.

Let B(x,r) be the open ball in $\mathbb{R}^n$ with center x
and radius r, and let C(x,T,a) be the cone with
center x, axis T (a k-dimensional subspace of $\mathbb{R}^n$),
and amplitude $\alpha = \arcsin a$, that is,


$$C(x,T,a) := \{ x' \in \mathbb{R}^n : \mathrm{dist}(x' - x, T) \le a\,|x' - x| \}$$


Figure 4 A rectifiable set E close to a point x of approximate
tangency. The part of E contained in the ball B(x,r) but not in the
cone C(x,T,a) is not empty, but only small in measure.











For $\mathcal{H}^k$-almost every $x \in E$, the measure of
$E \cap B(x,r)$ is asymptotically equivalent, as $r \to 0$, to the
measure of a flat disk of radius r, that is,


$$\mathcal{H}^k(E \cap B(x,r)) \sim \omega_k r^k$$


Moreover, the part of E contained in B(x,r) is
mostly located close to the tangent plane Tan(E,x),
that is,


$$\mathcal{H}^k(E \cap B(x,r) \cap C(x, \mathrm{Tan}(E,x), a)) \sim \omega_k r^k \quad \text{for every } a > 0$$


When this condition holds, Tan(E,x) is called the
approximate tangent space to E at x (see Figure 4).


The Area Formula 


The area formula makes it possible to compute the measure
$\mathcal{H}^k(\Phi(E))$ of the image of a set E in $\mathbb{R}^k$ as the
integral over E of a suitably defined Jacobian
determinant of $\Phi$. When $\Phi$ is injective and takes
values in $\mathbb{R}^k$, we recover the usual change of
variable formula for multiple integrals.

We consider first the linear case. If L is a linear
map from $\mathbb{R}^k$ to $\mathbb{R}^m$ with $m \ge k$, the volume ratio
$\rho := \mathcal{H}^k(L(E))/\mathcal{H}^k(E)$ does not depend on E, and
agrees with $|\det(PL)|$, where P is any linear
isometry from the image of L into $\mathbb{R}^k$, and $\det(PL)$
is the determinant of the $k \times k$ matrix associated
with PL. The volume ratio $\rho$ can be computed using
one of the following identities:


$$\rho = \sqrt{\det(L^* L)} = \sqrt{\sum (\det M)^2}$$ [6]


where $L^*$ is the adjoint of L (thus, $L^* L$ is a linear
map from $\mathbb{R}^k$ into $\mathbb{R}^k$), and the sum in the last term
is taken over all $k \times k$ minors M of the matrix
associated with L.

Let $\Phi: \mathbb{R}^k \to \mathbb{R}^m$ be a map of class $C^1$ with $m \ge k$,
and E a set in $\mathbb{R}^k$. Then


$$\int_{\Phi(E)} \#(\Phi^{-1}(y) \cap E) \, d\mathcal{H}^k(y) = \int_E J(x) \, d\mathcal{H}^k(x)$$ [7]


where #A stands for the number of elements of A,
and the Jacobian J is


$$J(x) := \sqrt{\det(\nabla\Phi(x)^* \, \nabla\Phi(x))}$$ [8]


Note that the left-hand side of [7] is $\mathcal{H}^k(\Phi(E))$ when
$\Phi$ is injective.


Remark 

Formula [7] holds even if E is a k-rectifiable set in $\mathbb{R}^n$.
In this case, the gradient $\nabla\Phi(x)$ in [8] should be
replaced by the tangential derivative of $\Phi$ at x (viewed
as a linear map from Tan(E,x) into $\mathbb{R}^m$). No version of
formula [7] is available when E is not rectifiable.
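
The two expressions for the volume ratio in [6] can be compared
numerically. The sketch below is only an illustration under our own
conventions: a linear map from $\mathbb{R}^k$ to $\mathbb{R}^m$ is stored as a list of
m rows of length k, and the helper names det, rho_gram and rho_minors
are hypothetical. The first function evaluates the Gram determinant,
the second sums the squared $k \times k$ minors.

```python
import itertools
import math


def det(a):
    """Determinant of a square matrix (list of rows), by Gaussian elimination."""
    a = [[float(x) for x in row] for row in a]
    n = len(a)
    sign, result = 1.0, 1.0
    for c in range(n):
        # Partial pivoting: pick the row with the largest entry in column c.
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            sign = -sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return sign * result


def rho_gram(L):
    """Volume ratio as the square root of det(L^T L)."""
    m, k = len(L), len(L[0])
    gram = [[sum(L[r][i] * L[r][j] for r in range(m)) for j in range(k)]
            for i in range(k)]
    return math.sqrt(max(det(gram), 0.0))


def rho_minors(L):
    """Volume ratio as the square root of the sum of squared k x k minors."""
    m, k = len(L), len(L[0])
    total = sum(det([L[r] for r in rows]) ** 2
                for rows in itertools.combinations(range(m), k))
    return math.sqrt(total)


if __name__ == "__main__":
    L = [[1, 0], [0, 1], [1, 1]]  # a linear map from R^2 to R^3
    print(rho_gram(L), rho_minors(L))
```

For the sample matrix both functions should return the same number,
which is the area of the parallelogram spanned by the two columns.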


Vectors, Covectors, and Differential 
Forms 


In this section, we review some basic notions of
multilinear algebra. We have chosen a definition of
k-vectors and k-covectors in $\mathbb{R}^n$, and of the corresponding
exterior products, which is quite convenient
for computations, even though not as satisfactory from
the formal viewpoint. The main drawback is that it
depends on the choice of a standard basis of $\mathbb{R}^n$, and
therefore cannot be used to define forms (and currents)
when the ambient space is a general manifold.


k-Vectors and Exterior Product 


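Let $\{e_1, \ldots, e_n\}$ be the standard basis of $\mathbb{R}^n$. Given an
integer $k \le n$, $I(n,k)$ is the set of all multi-indices
$i = (i_1, \ldots, i_k)$ with $1 \le i_1 < i_2 < \cdots < i_k \le n$, and
for every $i \in I(n,k)$ we introduce the expression


$$e_i := e_{i_1} \wedge e_{i_2} \wedge \cdots \wedge e_{i_k}$$


A k-vector in $\mathbb{R}^n$ is any formal linear combination
$\sum \alpha_i e_i$ with $\alpha_i \in \mathbb{R}$ for every $i \in I(n,k)$. The space of
k-vectors is denoted by $\Lambda_k(\mathbb{R}^n)$; in particular,
$\Lambda_1(\mathbb{R}^n) = \mathbb{R}^n$. For reasons of formal convenience,
we set $\Lambda_0(\mathbb{R}^n) := \mathbb{R}$ and $\Lambda_k(\mathbb{R}^n) := \{0\}$ for $k > n$.

We denote by $|\cdot|$ the Euclidean norm on $\Lambda_k(\mathbb{R}^n)$.

The exterior product $v \wedge w \in \Lambda_{k+h}(\mathbb{R}^n)$ is defined
for every $v \in \Lambda_k(\mathbb{R}^n)$ and $w \in \Lambda_h(\mathbb{R}^n)$, and is
completely determined by the following properties:
(1) associativity, (2) linearity in both arguments, and
(3) $e_i \wedge e_j = -e_j \wedge e_i$ for every $i \ne j$ and $e_i \wedge e_i = 0$ for
every i.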
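
Properties (1)-(3) translate directly into a small computation on
coefficients. In the following sketch, which is only one possible
encoding and not taken from the text, a k-vector is stored as a
dictionary mapping increasing multi-indices (tuples) to real
coefficients; the names wedge, basis and norm are our own.

```python
import math


def wedge(v, w):
    """Exterior product of multivectors stored as {increasing index tuple: coeff}."""
    out = {}
    for i, a in v.items():
        for j, b in w.items():
            merged = i + j
            if len(set(merged)) < len(merged):
                continue  # a repeated basis vector gives zero (e_i ^ e_i = 0)
            # Each transposition of adjacent factors changes the sign, so the
            # sign is (-1) to the number of inversions needed to sort `merged`.
            inversions = sum(
                1
                for p in range(len(merged))
                for q in range(p + 1, len(merged))
                if merged[p] > merged[q]
            )
            key = tuple(sorted(merged))
            out[key] = out.get(key, 0.0) + (-1) ** inversions * a * b
    return {key: c for key, c in out.items() if c != 0}


def basis(i):
    """The basis 1-vector e_i."""
    return {(i,): 1.0}


def norm(v):
    """Euclidean norm of the coefficient vector."""
    return math.sqrt(sum(c * c for c in v.values()))


if __name__ == "__main__":
    e1, e2 = basis(1), basis(2)
    print(wedge(e1, e2), wedge(e2, e1), wedge(e1, e1))
```

Bilinearity is built into the double loop, and associativity can be
checked on examples by comparing the two ways of bracketing a triple
product.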


Simple Vectors and Orientation 


A simple k-vector is any v in $\Lambda_k(\mathbb{R}^n)$ that can be
written as a product of 1-vectors, that is,


$$v = v_1 \wedge v_2 \wedge \cdots \wedge v_k$$


It can be shown that v is null if and only if the
vectors $\{v_i\}$ are linearly dependent. If v is not null,
then it is uniquely determined by the following
objects: (1) the k-dimensional space M spanned
by $\{v_i\}$; (2) the orientation of M associated with the
basis $\{v_i\}$; (3) the Euclidean norm $|v|$. In particular,
M does not depend on the choice of the vectors $v_i$.
Note that $|v|$ is equal to the k-dimensional volume of
the parallelogram spanned by $\{v_i\}$.

Hence, the map $v \mapsto M$ is a one-to-one correspondence
between the class of simple k-vectors with
norm $|v| = 1$ and the Grassmann manifold of
oriented k-dimensional subspaces of $\mathbb{R}^n$.

This remark paves the way to the following definition:
if S is a k-dimensional surface of class $C^1$ in $\mathbb{R}^n$, possibly
with boundary, an orientation of S is a continuous map
$\tau_S: S \to \Lambda_k(\mathbb{R}^n)$ such that $\tau_S(x)$ is a simple k-vector with
norm 1 that spans Tan(S,x) for every x. With every
orientation of S (if any exists) is canonically associated
the orientation of the boundary $\partial S$ that satisfies


$$\tau_S(x) = \eta(x) \wedge \tau_{\partial S}(x) \quad \text{for every } x \in \partial S$$ [9]


where $\eta(x)$ is the inner normal to $\partial S$ at x.


k-Covectors 


The standard basis of the dual of $\mathbb{R}^n$ is
$\{dx_1, \ldots, dx_n\}$, where $dx_i: \mathbb{R}^n \to \mathbb{R}$ is the linear
functional that takes every $x = (x_1, \ldots, x_n)$ into the
ith component $x_i$. For every $i \in I(n,k)$ we set


$$dx_i := dx_{i_1} \wedge dx_{i_2} \wedge \cdots \wedge dx_{i_k}$$


and the space $\Lambda^k(\mathbb{R}^n)$ of k-covectors consists of all
formal linear combinations $\sum \alpha_i \, dx_i$. The exterior
product of covectors is defined as that for vectors.
The space $\Lambda^k(\mathbb{R}^n)$ is dual to $\Lambda_k(\mathbb{R}^n)$ via the duality
pairing $\langle\, ;\, \rangle$ defined by the relations $\langle dx_i ; e_j \rangle := \delta_{ij}$
(that is, 1 if i = j and 0 otherwise).


Differential Forms and Stokes Theorem 


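A differential form of order k on $\mathbb{R}^n$ is a map
$\omega: \mathbb{R}^n \to \Lambda^k(\mathbb{R}^n)$. Using the canonical basis of
$\Lambda^k(\mathbb{R}^n)$, we can write $\omega$ as


$$\omega(x) = \sum_{i \in I(n,k)} \omega_i(x) \, dx_i$$


where the coordinates $\omega_i$ are real functions on $\mathbb{R}^n$.
The exterior derivative of a k-form $\omega$ of class $C^1$ is
the (k + 1)-form


$$d\omega(x) := \sum_{i \in I(n,k)} d\omega_i(x) \wedge dx_i$$


where, for every scalar function f, df is the 1-form


$$df(x) := \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}(x) \, dx_j$$


If S is a k-dimensional oriented surface, the
integral of a k-form $\omega$ on S is naturally defined by


$$\int_S \omega := \int_S \langle \omega(x); \tau_S(x) \rangle \, d\mathcal{H}^k(x)$$


Stokes theorem states that for every (k - 1)-form $\omega$
of class $C^1$ there holds


$$\int_{\partial S} \omega = \int_S d\omega$$ [10]


provided that $\partial S$ is endowed with the orientation $\tau_{\partial S}$
that satisfies [9].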


Currents 


The definition of k-dimensional currents closely 
resembles that of distributions: they are the dual of 
smooth k-forms with compact support. Since every 
oriented k-dimensional surface defines by integration 
a linear functional on forms, currents can be regarded 
as generalized oriented surfaces. As every distribution 
admits a derivative, so every current admits a 
boundary. Indeed, many other basic notions of 
homology theory can be naturally extended to 
currents — this was actually one of the motivations 
behind the introduction of currents, due to de Rham. 
For the applications to variational problems, 
smaller classes of currents are usually considered; 
the most relevant to the Plateau problem is that of 
integral currents. Note that the definitions of the 
spaces of normal, rectifiable, and integral currents 
and the symbols used to denote them vary, some- 
times more than slightly, depending on the author. 


Currents, Boundary, and Mass 


Let n, k be integers with $n \ge k$. The space of
k-dimensional currents on $\mathbb{R}^n$, denoted by $\mathcal{D}_k(\mathbb{R}^n)$,
is the dual of the space $\mathcal{D}^k(\mathbb{R}^n)$ of smooth k-forms
with compact support in $\mathbb{R}^n$. For $k \ge 1$, the
boundary of a k-current T is the (k - 1)-current $\partial T$
defined by

$$\langle \partial T; \omega \rangle := \langle T; d\omega \rangle \quad \text{for every } \omega \in \mathcal{D}^{k-1}(\mathbb{R}^n)$$ [11]

while the boundary of a 0-current is set equal to 0.
The mass of T is the number


$$M(T) := \sup \{ \langle T; \omega \rangle : \omega \in \mathcal{D}^k(\mathbb{R}^n), \ |\omega| \le 1 \}$$ [12]


Fundamental examples of k-currents are oriented
k-dimensional surfaces: with each oriented surface
S of class $C^1$ is canonically associated the current
$\langle T; \omega \rangle := \int_S \omega$ (in fact, S is completely determined
by its action on forms, i.e., by the associated
current). By Stokes theorem, the boundary of T is
the current associated with the boundary of S; thus,
the notion of boundary for currents is compatible
with the classical one for oriented surfaces.
A simple computation shows that $M(T) = \mathcal{H}^k(S)$;
therefore, the mass provides a natural extension
of the notion of k-dimensional volume to
k-currents.


Remarks 


(i) Not all k-currents look like k-dimensional
surfaces. For example, every k-vectorfield $v: \mathbb{R}^n \to \Lambda_k(\mathbb{R}^n)$
defines by duality the k-current


$$\langle T; \omega \rangle := \int \langle \omega(x); v(x) \rangle \, d\mathcal{L}^n(x)$$


where $\mathcal{L}^n$ is the Lebesgue measure on $\mathbb{R}^n$.
The mass of T is $\int |v| \, d\mathcal{L}^n$, and the boundary is
represented by a similar integral formula involving
the partial derivatives of v (in particular, for
1-vectorfields, the boundary is the 0-current associated
with the divergence of v). Note that the
dimension of such T is k because k-vectorfields act
on k-forms, and there is no relation with the
dimension of the support of T, which is n.

(ii) To be precise, $\mathcal{D}^k(\mathbb{R}^n)$ is a locally convex
topological vector space, and $\mathcal{D}_k(\mathbb{R}^n)$ is its topological
dual. As such, $\mathcal{D}_k(\mathbb{R}^n)$ is endowed with a dual
(or weak*) topology. We say that a sequence of
k-currents $(T_j)$ converges to T if it converges in the
dual topology, that is,


$$\langle T_j; \omega \rangle \to \langle T; \omega \rangle \quad \text{for every } \omega \in \mathcal{D}^k(\mathbb{R}^n)$$ [13]


Recalling the definition of mass, it is easy to show
that it is lower semicontinuous with respect to the dual
topology, and in particular


$$\liminf_{j \to \infty} M(T_j) \ge M(T)$$ [14]


Currents with Finite Mass 


By definition, a k-current T with finite mass is a
linear functional on k-forms which is bounded with
respect to the supremum norm, and by the Riesz
theorem it can be represented as a bounded measure
with values in $\Lambda_k(\mathbb{R}^n)$. In other words, there exist a
finite positive measure $\mu$ on $\mathbb{R}^n$ and a density
function $\tau: \mathbb{R}^n \to \Lambda_k(\mathbb{R}^n)$ such that $|\tau(x)| = 1$ for
every x and


$$\langle T; \omega \rangle = \int \langle \omega(x); \tau(x) \rangle \, d\mu(x)$$


The fact that currents are the dual of a separable
space yields the following compactness result: a
sequence of k-currents $(T_j)$ with uniformly bounded
masses $M(T_j)$ admits a subsequence that converges
to a current with finite mass.


Normal Currents 


A k-current T is called normal if both T and $\partial T$
have finite mass. The compactness result stated in
the previous paragraph implies the following compactness
theorem for normal currents: a sequence of
normal currents $(T_j)$ with $M(T_j)$ and $M(\partial T_j)$
uniformly bounded admits a subsequence that
converges to a normal current.


Rectifiable Currents 


A k-current T is called rectifiable if it can be
represented as


$$\langle T; \omega \rangle = \int_E \langle \omega(x); \tau(x) \rangle \, \theta(x) \, d\mathcal{H}^k(x)$$


where E is a k-rectifiable set, $\tau$ is an orientation of
E (that is, $\tau(x)$ is a simple unit k-vector that spans
Tan(E,x) for $\mathcal{H}^k$-almost every $x \in E$), and $\theta$ is a real
function such that $\int_E |\theta| \, d\mathcal{H}^k$ is finite, called the
multiplicity. Such T is denoted by $T = [E, \tau, \theta]$. In
particular, a rectifiable 0-current can be written as
$\langle T; \omega \rangle = \sum_j \theta_j \, \omega(x_j)$, where $E = \{x_j\}$ is a countable set in
$\mathbb{R}^n$ and $\{\theta_j\}$ is a sequence of real numbers with
$\sum_j |\theta_j| < +\infty$.
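
The simplest rectifiable currents can be handled by hand. The
sketch below is a toy model of our own, not a construction from the
text: a 1-current in the plane is a finite list of oriented segments
(start, end, multiplicity) that are assumed to overlap at most at
endpoints. It uses the fact that a segment from a to b with
multiplicity $\theta$ has boundary $\theta(\delta_b - \delta_a)$, and that the mass of
$[E, \tau, \theta]$ is the integral of $|\theta|$ over E. All function names are
hypothetical.

```python
import math
from collections import defaultdict


def boundary(segments):
    """Boundary 0-current, returned as {point: multiplicity}."""
    points = defaultdict(int)
    for a, b, theta in segments:
        points[b] += theta  # end point counted with sign +
        points[a] -= theta  # start point counted with sign -
    return {p: m for p, m in points.items() if m != 0}


def mass_1(segments):
    """Mass of the 1-current: sum of |multiplicity| times length."""
    return sum(abs(theta) * math.dist(a, b) for a, b, theta in segments)


def size_1(segments):
    """Size of the 1-current: total length of its support."""
    return sum(math.dist(a, b) for a, b, theta in segments if theta != 0)


def mass_0(points):
    """Mass of a 0-current {point: multiplicity}: sum of |multiplicities|."""
    return sum(abs(m) for m in points.values())


if __name__ == "__main__":
    # A closed triangle has no boundary.
    triangle = [((0, 0), (1, 0), 1), ((1, 0), (0, 1), 1), ((0, 1), (0, 0), 1)]
    print(boundary(triangle))
    # A fork with multiplicity 2 on the stem: mass exceeds size.
    fork = [((0, 0), (1, 0), 2), ((1, 0), (2, 1), 1), ((1, 0), (2, -1), 1)]
    print(boundary(fork), mass_1(fork), size_1(fork))
```

In the fork example the multiplicities cancel at the branching
point, so the boundary is supported on the three free endpoints only.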


Integral Currents 


If T is a rectifiable current and the multiplicity $\theta$
takes integral values, T is called an integer multiplicity
rectifiable current. If both T and $\partial T$ are
integer multiplicity rectifiable currents, then T is an
integral current.

The first nontrivial result is the boundary rectifiability
theorem: if T is an integer multiplicity
rectifiable current and $\partial T$ has finite mass, then $\partial T$
is an integer multiplicity rectifiable current, too, and
therefore T is an integral current.

The second fundamental result is the compactness
theorem for integral currents: a sequence of integral
currents $(T_j)$ with $M(T_j)$ and $M(\partial T_j)$ uniformly
bounded admits a subsequence that converges to
an integral current.


Remarks 


(i) The point of the compactness theorem for 
integral currents is not the existence of a converging 
subsequence — that being already established by the 
compactness theorem for normal currents — but the 
fact that the limit is an integral current. In fact, this 
result is often referred to as a “closure theorem” 
rather than a “compactness theorem.” 


Figure 5 (Panels, in order: $T := e \cdot 1_Q$; $T_j := [E_j, e, 1/j]$;
$T'_j := [E'_j, e, 1]$.) T is the normal 1-current on $\mathbb{R}^2$ associated with the
vectorfield equal to the unit vector e on the unit square Q, and
equal to 0 outside. $T_j$ are the rectifiable currents associated with
the sets $E_j$ (middle) and the constant multiplicity 1/j, and then
$M(T_j) = 1$, $M(\partial T_j) = 2$. $T'_j$ are the integral currents associated
with the sets $E'_j$ (right) and the constant multiplicity 1, and then
$M(T'_j) = 1$, $M(\partial T'_j) = 2j^2$. Both $(T_j)$ and $(T'_j)$ converge to T.


(ii) The following observations may clarify the
role of the assumptions in the compactness theorem:
(1) A sequence of integral currents $(T_j)$ with $M(T_j)$
uniformly bounded, but not $M(\partial T_j)$, may
converge to any current with finite mass, not
necessarily a rectifiable one.
(2) A sequence of rectifiable currents $(T_j)$ with
rectifiable boundaries and $M(T_j)$, $M(\partial T_j)$ uniformly
bounded may converge to any normal current,
not necessarily a rectifiable one. Examples of both
situations are described in Figure 5.


Application to the Plateau Problem 


The compactness result for integral currents implies
the existence of currents with minimal mass: if $\Gamma$ is
the boundary of an integral k-current in $\mathbb{R}^n$, $1 \le k \le n$,
then there exists a current T of minimal mass
among those that satisfy $\partial T = \Gamma$.

The proof of this existence result is a typical
example of the direct method: let m be the infimum
of M(T) among all integral currents with boundary
$\Gamma$, and let $(T_j)$ be a minimizing sequence (i.e., a
sequence of integral currents with boundary $\Gamma$ such
that $M(T_j)$ converges to m). Since $M(T_j)$ is bounded
and $M(\partial T_j) = M(\Gamma)$ is constant, we can apply the
compactness theorem for integral currents and
extract a subsequence of $(T_j)$ that converges to an
integral current T. By the continuity of the boundary
operator, $\partial T = \lim \partial T_j = \Gamma$, and by the semicontinuity
of the mass, $M(T) \le \lim M(T_j) = m$ (cf. [14]).
Thus, T is the desired minimal current.


Remarks 


(i) Every integral (k - 1)-current $\Gamma$ with null
boundary and compact support in $\mathbb{R}^n$ is the boundary
of an integral current, and therefore is an admissible
datum for the previous existence result.

(ii) A mass-minimizing integral current T is more
regular than a general integral current. For k = n - 1,
there exists a closed singular set S with $\dim_H(S) \le k - 7$
such that T agrees with a smooth surface in the
complement of S and of the support of the boundary.
In particular, T is smooth away from the boundary
for $n \le 7$. For general k, it can only be proved
that $\dim_H(S) \le k - 2$. Both results are optimal: in
$\mathbb{R}^4 \times \mathbb{R}^4$, the minimal 7-current with boundary
$\Gamma := \{|x| = |y| = 1\}$, a product of two 3-spheres, is the
cone $T := \{|x| = |y| \le 1\}$, and is singular at the origin.
In $\mathbb{R}^2 \times \mathbb{R}^2$, the minimal 2-current with boundary
$\Gamma := \{x = 0, |y| = 1\} \cup \{y = 0, |x| = 1\}$, a union of
two disjoint circles, is the union of the disks
$\{x = 0, |y| \le 1\} \cup \{y = 0, |x| \le 1\}$, and is singular at
the origin.


(iii) In certain cases, the mass-minimizing current
T may not agree with the solution of the Plateau
problem suggested by intuition. The first reason is
that currents do not include nonorientable surfaces,
which sometimes may be more convenient (Figure 6).
Another reason is that the mass of an integral
current T associated with a k-rectifiable set E does
not agree with the measure $\mathcal{H}^k(E)$, called the size of T,
because multiplicity must be taken into account,
and for certain $\Gamma$ the mass-minimizing current may
not be size-minimizing (Figure 7). Unfortunately,
proving the existence of size-minimizing currents is
much more complicated, due to the lack of suitable
compactness theorems.

(iv) For k = 2, the classical approach to the
Plateau problem consists in parametrizing surfaces
in $\mathbb{R}^n$ by maps f from a given two-dimensional
domain D into $\mathbb{R}^n$, and looking for minimizers of
the area functional


$$\int_D \sqrt{\det(\nabla f^* \, \nabla f)}$$


(recall the area formula, discussed earlier) under the
constraint $f(\partial D) = \Gamma$. In this framework, the choice
of the domain D prescribes the topological type of
admissible surfaces, and therefore the minimizer
may differ substantially from the mass-minimizing
current with boundary $\Gamma$ (Figure 8).


Figure 6 The surface with minimal area spanning the
(oriented) curve $\Gamma$ is the Möbius strip $\Sigma$. However, $\Sigma$ is not
orientable, and cannot be viewed as a current. The
mass-minimizing current with boundary $\Gamma$ is $\Sigma'$.


Figure 7 The boundary $\Gamma$ is a 0-current associated with four
oriented points. The size (length) of T is smaller than that of T'.
However, $\partial T = \Gamma$ implies that the multiplicity of T must be 2 on
the central segment and 1 on the others; thus the mass of T is
larger than its size. The size-minimizing current with boundary $\Gamma$
is T, while the mass-minimizing one is T'.


Figure 8 The surface $\Sigma$ minimizes the area among surfaces
parametrized by the disk with boundary $\Gamma$. The mass-minimizing
current $\Sigma'$ can only be parametrized by a disk with a handle.
Note that $\Sigma$ is a singular surface, while $\Sigma'$ is not.


Figure 9 Two possible soap films spanning the wire $\Gamma$: unlike
$\Sigma$, $\Sigma'$ cannot be viewed as a current with multiplicity 1 and
boundary $\Gamma$.

(v) For some modeling problems, for instance, 
those related to soap films and soap bubbles, currents 
do not provide the right framework (Figure 9). A 
possible alternative are integral varifolds (cf. Almgren 
2001). However, it should be pointed out that this 
framework does not allow for “easy” application of 
the direct method, and the existence of minimal 
varifolds is in general quite difficult to prove. 


Miscellaneous Results and Useful Tools 


(i) An important issue, related to the use of currents
for solving variational problems, concerns the extent
to which integral currents can be approximated by
regular objects. For many reasons, the "right" regular
class to consider is not that of smooth surfaces, but that
of integral polyhedral currents, that is, linear combinations
with integral coefficients of oriented simplexes. The following
approximation theorem holds: for every integral
current T in $\mathbb{R}^n$ there exists a sequence of integral
polyhedral currents $(T_j)$ such that


$$T_j \to T, \quad \partial T_j \to \partial T, \quad M(T_j) \to M(T), \quad M(\partial T_j) \to M(\partial T)$$


The proof is based on a quite useful tool, called
polyhedral deformation.

(ii) Many geometric operations for surfaces have an
equivalent for currents. For instance, it is possible to
define the image of a current in $\mathbb{R}^n$ via a smooth proper
map $f: \mathbb{R}^n \to \mathbb{R}^m$. Indeed, with every k-form $\omega$ on $\mathbb{R}^m$
is canonically associated a k-form $f^\#\omega$ on $\mathbb{R}^n$, called the
pullback of $\omega$ according to f. The adjoint of the
pullback is an operator, called push-forward, that
takes every k-current T in $\mathbb{R}^n$ into a k-current $f_\# T$ in
$\mathbb{R}^m$. If T is the rectifiable current associated with a
rectifiable set E and a multiplicity $\theta$, the push-forward
$f_\# T$ is the rectifiable current associated with f(E) and
a multiplicity $\theta'(y)$ which is computed by adding up
with the right sign all $\theta(x)$ with $x \in f^{-1}(y)$. As one
might expect, the boundary of the push-forward is the
push-forward of the boundary.

(iii) In general, it is not possible to give a meaning
to the intersection of two currents, and not even of
a current and a smooth surface. However, it is
possible to define the intersection of a normal
k-current T and a level surface $f^{-1}(y)$ of a smooth
map $f: \mathbb{R}^n \to \mathbb{R}^h$ (with $h \le k \le n$) for almost every
y, resulting in a current $T_y$ with the expected
dimension k - h. This operation is called slicing.

(iv) When working with currents, a quite useful
notion is that of flat norm:


$$F(T) := \inf \{ M(R) + M(S) : T = R + \partial S \}$$


where T and R are k-currents, and S is a (k + 1)-current.
The relevance of this notion lies in the fact
that a sequence $(T_j)$ that converges with respect to
the flat norm converges also in the dual topology,
and the converse holds if the masses $M(T_j)$ and
$M(\partial T_j)$ are uniformly bounded. Hence, the flat
norm metrizes the dual topology of currents (at
least on sets of currents where the mass and the
mass of the boundary are bounded).

Since F(T) can be explicitly estimated from above, it
can be quite useful in proving that a sequence of
currents converges to a certain limit. Finally, the flat
norm gives a (geometrically significant) measure of how
far apart two currents are: for instance, given the
0-currents $\delta_x$ and $\delta_y$ (the Dirac masses at x and y,
respectively), $F(\delta_x - \delta_y)$ is exactly the distance
between x and y, as long as this distance does not
exceed 2 (otherwise the trivial decomposition with
$R = \delta_x - \delta_y$ and S = 0 gives the smaller value 2).


See also: Free Interfaces and Free Discontinuities:
Variational Problems; $\Gamma$-Convergence and
Homogenization; Geometric Phases; Image Processing:
Mathematics; Minimal Submanifolds; Mirror Symmetry:
A Geometric Survey; Moduli Spaces: An Introduction.


Introduction


We invite the reader to perform the following simple 
experiment. Put your arm out in front of you keeping 
your thumb pointing up perpendicular to your arm. 
Move your arm up over your head, then bring it down 
to your side, and at last bring the arm back in front of 
you again. In this experiment an object (your thumb) 
was taken along a closed path traced by another object 
(your arm) in a way that a simple local law of transport 
was applied. In this case the local law consisted of two 
ingredients: (1) preserve the orthogonality of your 
thumb with respect to your arm and (2) do not rotate 
the thumb about its instantaneous axis (i.e., your arm). 
Performing the experiment in this way, you will 
manage to avoid rotations of your thumb locally; 
however, in the end you will experience a rotation of 
90° globally. 

The experiment above can be regarded as the 
archetypical example of the phenomenon called 
anholonomy by physicists and holonomy by math- 
ematicians. In this article, we consider the manifes- 
tation of this phenomenon in the realm of quantum 
theory. The objects to be transported along closed 
paths in suitable manifolds will be wave functions 
representing quantum systems. After applying local 
laws dictated by inputs coming from physics, one 
ends up with a new wave function that has picked 
up a complex phase factor. Phases of this kind are 
called geometric phases, with the famous Berry 
phase being a special case. 


The Space of Rays 


Let us consider a quantum system with physical
states represented by elements $|\psi\rangle$ of some Hilbert
space H with scalar product $\langle\,|\,\rangle: H \times H \to \mathbb{C}$. For
simplicity, we assume that H is finite dimensional,
$H \simeq \mathbb{C}^{n+1}$ with $n \ge 1$. The infinite-dimensional case
can be studied by taking the inductive limit $n \to \infty$.



Let us denote the complex amplitudes characterizing 
the state |W) by Z°,a=0,1,...,7. For a normalized 
state, 


lI = ib) = bupZ°Z2=Z,.Z°=1 [1] 


where summation over repeated indices is understood, 
indices raised and lowered by 6°” and 6,3, respectively, 
and the overbar refers to complex conjugation. A 
normalized state lies on the unit sphere S ~ S*”*! in 
C”*!, Two nonzero states |x) and |y) are equivalent, 
lw) ~ |p), iff they are related as |Y) = Aly) for some 
nonzero complex number A. For equivalent states, 
physically meaningful quantities such as 


2 
(HA) Mal) : 


D aE lel’ 


(the mean value of a physical quantity represented by a
Hermitian operator $A$, and the transition probability from a
physical state represented by $|\psi\rangle$ to one represented
by $|\varphi\rangle$) are invariant. Hence, the true space of states,
the one representing the physical states of a quantum system
unambiguously, is the set of equivalence classes $P =
\mathcal{H}/\sim$. $P$ is called the “space of rays.” For $\mathcal{H} \simeq \mathbb{C}^{n+1}$,
we have $P \simeq CP^n$, where $CP^n$ is the $n$-dimensional
complex projective space. For normalized states, $|\psi\rangle$
and $|\varphi\rangle$ are equivalent iff $|\psi\rangle = \lambda|\varphi\rangle$, where $|\lambda| = 1$,
that is, $\lambda \in U(1)$. Thus, two normalized states are
equivalent iff they differ merely by a complex phase.
It is well known that $S$ can be regarded as the total
space of a principal bundle over $P$ with structure
group $U(1)$. This means that we have the projection

$$\pi : |\psi\rangle \in S \subset \mathcal{H} \mapsto |\psi\rangle\langle\psi| \in P \qquad [3]$$

where the rank-1 projector $|\psi\rangle\langle\psi|$ represents the
equivalence class of $|\psi\rangle$. Since we will use this bundle
frequently in this article, we call it $\eta_1$ (the meaning of
the subscript 1 will be clarified later). Then, we have

$$\eta_1 : U(1) \hookrightarrow S \xrightarrow{\ \pi\ } P \qquad [4]$$


For $Z^0 \neq 0$ the space of rays $P$ can be given local
coordinates

$$w^j = Z^j / Z^0, \qquad j = 1, \ldots, n \qquad [5]$$

The $w^j$ are inhomogeneous coordinates for $CP^n$ on
the coordinate patch $U_0$ defined by the condition
$Z^0 \neq 0$.

$P$ is a compact complex manifold with a natural
Riemannian metric $g$. This metric $g$ is induced from
the scalar product on $\mathcal{H}$. Let us consider the
construction of $g$ by using the physical input
provided by the invariance of the transition probability
of [2]. For this we define a distance $\delta(\psi, \varphi)$ between
$|\psi\rangle\langle\psi|$ and $|\varphi\rangle\langle\varphi|$ in $P$ as follows:

$$\cos^2\big(\delta(\psi, \varphi)/2\big) = \frac{|\langle\psi|\varphi\rangle|^2}{\|\psi\|^2\,\|\varphi\|^2} \qquad [6]$$


This definition makes sense since, due to the
Cauchy–Schwarz inequality, the right-hand side of
[6] is non-negative and $\leq 1$. It is equal to 1 iff $|\psi\rangle$ is
a nonzero complex multiple of $|\varphi\rangle$, that is, iff they
define the same point in $P$. Hence in this case,
$\delta(\psi, \varphi) = 0$ as expected.

Suppose now that $|\psi\rangle$ and $|\varphi\rangle$ are separated by an
infinitesimal distance $ds = \delta(\psi, \varphi)$. Putting this into
the definition [6], using the local coordinates $w^j$ of
[5] for $|\psi\rangle$ and $w^j + dw^j$ for $|\varphi\rangle$, and expanding both
sides in Taylor series, one gets

$$ds^2 = 4 g_{j\bar{k}}\, dw^j\, d\bar{w}^k, \qquad j, k = 1, 2, \ldots, n \qquad [7]$$

where

$$g_{j\bar{k}} = \frac{(1 + \bar{w}_l w^l)\,\delta_{jk} - \bar{w}_j w_k}{(1 + \bar{w}_l w^l)^2} \qquad [8]$$

with $d\bar{w}^k = \overline{dw^k}$. The line element [7] defines the
Fubini–Study metric for $P$.
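The expansion leading from [6] to [7] and [8] can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper names `state_from_w`, `distance`, and `fubini_study_ds2` are illustrative and not part of the article. It compares the squared distance of [6] between two nearby rays with the line element $4 g_{j\bar{k}}\, dw^j d\bar{w}^k$.

```python
import numpy as np

def state_from_w(w):
    """Normalized state with Z^0 > 0 and inhomogeneous coordinates w^j = Z^j / Z^0."""
    z = np.concatenate(([1.0 + 0j], np.asarray(w, dtype=complex)))
    return z / np.linalg.norm(z)

def distance(psi, phi):
    """delta(psi, phi) defined by cos^2(delta/2) = |<psi|phi>|^2 / (|psi|^2 |phi|^2)."""
    overlap = abs(np.vdot(psi, phi)) ** 2
    overlap /= np.vdot(psi, psi).real * np.vdot(phi, phi).real
    return 2.0 * np.arccos(np.sqrt(min(overlap, 1.0)))

def fubini_study_ds2(w, dw):
    """ds^2 = 4 g_{j kbar} dw^j conj(dw^k), with g taken from eqn [8]."""
    w = np.asarray(w, dtype=complex)
    dw = np.asarray(dw, dtype=complex)
    s = 1.0 + np.vdot(w, w).real                      # 1 + wbar_l w^l
    g = (s * np.eye(len(w)) - np.outer(np.conj(w), w)) / s**2
    return 4.0 * np.real(dw @ g @ np.conj(dw))

w = np.array([0.3 + 0.2j, -0.5j])
dw = 1e-3 * np.array([1.0 + 2.0j, 0.5 - 1.0j])
d2 = distance(state_from_w(w), state_from_w(w + dw)) ** 2
print(d2, fubini_study_ds2(w, dw))   # the two numbers agree to leading order in dw
```

For $n = 1$ the metric reduces to $ds^2 = 4|dw|^2/(1 + |w|^2)^2$, the round metric of the unit 2-sphere in stereographic coordinates.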


The Pancharatnam Connection 


Having defined the basic entity, the space of rays $P$,
and the principal $U(1)$ bundle $\eta_1$, we now define a
connection giving rise to a local law of parallel
transport. This approach leads to a very general
definition of the geometric phase. In the mathematical
literature, the connection defined below is
called the “canonical connection” on the principal
bundle. However, since the motivation comes
from physics, we are going to rediscover this
construction using only the physical information
provided by quantum theory.

The information needed is an adaptation of Pancharatnam’s
study of polarized light to quantum
mechanics. Let us consider two normalized states $|\psi\rangle$
and $|\varphi\rangle$. When these states belong to the same ray,
we have $|\psi\rangle = e^{i\phi}|\varphi\rangle$ for some phase factor $e^{i\phi}$; hence,
the phase difference between them can be defined to be
just $\phi$. How should one define the phase difference between $|\psi\rangle$
and $|\varphi\rangle$ (not orthogonal) when these states belong to
different rays? To compare the phases of
nonorthogonal states belonging to different rays,
Pancharatnam employed the following simple rule:
two states are “in phase” iff their interference is
maximal. In order to find the state $|\varphi\rangle = e^{i\phi}|\varphi'\rangle$ from
the ray spanned by the representative $|\varphi'\rangle$ which is “in
phase” with $|\psi\rangle$, we have to find a $\phi$ modulo $2\pi$ for
which the interference term in

$$\|\psi + e^{i\phi}\varphi'\|^2 = 2\big(1 + \mathrm{Re}\,(e^{i\phi}\langle\psi|\varphi'\rangle)\big) \qquad [9]$$


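is maximal. Obviously, the interference is maximal
iff $e^{i\phi}\langle\psi|\varphi'\rangle$ is a real positive number, that is,

$$e^{i\phi} = \frac{\langle\varphi'|\psi\rangle}{|\langle\varphi'|\psi\rangle|} = \frac{|\langle\psi|\varphi'\rangle|}{\langle\psi|\varphi'\rangle} \qquad [10]$$

Hence, for the state $|\varphi\rangle$ “in phase” with $|\psi\rangle$, one has

$$\langle\psi|\varphi\rangle = |\langle\psi|\varphi'\rangle| \in \mathbb{R}^+ \qquad [11]$$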
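Pancharatnam's rule is easy to state in code. The sketch below assumes NumPy and uses a hypothetical helper name, `in_phase_representative`; it picks the representative of the ray of $|\varphi'\rangle$ that is in phase with $|\psi\rangle$ by applying the phase factor of [10].

```python
import numpy as np

def in_phase_representative(psi, phi_prime):
    """Return e^{i phi}|phi'> such that <psi|phi> is real and positive (eqns [10], [11])."""
    overlap = np.vdot(psi, phi_prime)            # <psi|phi'>
    if np.isclose(abs(overlap), 0.0):
        raise ValueError("orthogonal states cannot be brought in phase")
    phase = np.conj(overlap) / abs(overlap)      # e^{i phi} = <phi'|psi> / |<phi'|psi>|
    return phase * phi_prime

psi = np.array([1.0, 1.0j]) / np.sqrt(2.0)
phi_prime = np.array([1.0 + 2.0j, 0.5 - 1.0j])
phi_prime = phi_prime / np.linalg.norm(phi_prime)
phi = in_phase_representative(psi, phi_prime)
print(np.vdot(psi, phi))   # real and positive, equal to |<psi|phi'>|
```

The orthogonal case is rejected explicitly, since the rule only applies to nonorthogonal states.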


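When such $|\psi\rangle$ and $|\varphi\rangle = |\psi + d\psi\rangle$ are infinitesimally
separated, from [11] it follows that

$$\mathrm{Im}\,\langle\psi|d\psi\rangle = \frac{1}{2i}\left(\bar{Z}_\alpha\, dZ^\alpha - d\bar{Z}_\alpha\, Z^\alpha\right) = 0 \qquad [12]$$

where $\bar{Z}_\alpha Z^\alpha = \bar{Z}_0 Z^0 (1 + \bar{w}_j w^j) = 1$ due to normalization.
Writing $Z^0 = |Z^0| e^{i\Phi}$ and using [5], one obtains

$$d\Phi + A = 0, \qquad A \equiv \mathrm{Im}\,\frac{\bar{w}_j\, dw^j}{1 + \bar{w}_k w^k} \qquad [13]$$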


In order to clarify the meaning of the 1-form $A$,
notice that the choice

$$|s(w, \bar{w})\rangle = \frac{1}{\sqrt{1 + \bar{w}_k w^k}} \begin{pmatrix} 1 \\ w^j \end{pmatrix} \qquad [14]$$

defines a local section of the bundle $\eta_1$. In terms of
this section, the state $|\psi\rangle$ can be expressed as

$$|\psi\rangle = \begin{pmatrix} Z^0 \\ Z^j \end{pmatrix} = |Z^0|\, e^{i\Phi} \begin{pmatrix} 1 \\ w^j \end{pmatrix} = e^{i\Phi}\, |s(w, \bar{w})\rangle \qquad [15]$$

For a path $w^j(t)$ lying entirely in $U_0 \subset P$,
$|\psi(t)\rangle = e^{i\Phi(t)} |s(w(t), \bar{w}(t))\rangle$ defines a path in $S$ with
$\Phi(t)$ satisfying the equation $d\Phi + A = 0$. For a closed
path $C$, the equation above defines a (generically)
open path $\Gamma$ projecting onto $C$ under the projection $\pi$.
It should be clear by now that the process described is
parallel transport with respect to a
connection with connection 1-form $\omega$. The pullback
of $\omega$ with respect to the local section in [14] is
the 1-form ($U(1)$ gauge field) $A$ in [13]. The curve $\Gamma$
corresponding to $|\psi(t)\rangle$ is the horizontal lift of $C$ in
$P$. The $U(1)$ phase

$$e^{i\Phi[C]} = e^{-i\oint_C A} \qquad [16]$$

is the holonomy of the connection. We call this
connection the “Pancharatnam connection,” and its
holonomy for a closed path in the space of rays is
the geometric phase acquired by the wave function.
The question of fundamental importance now is
how to realize closed paths in $P$ physically. This
question is addressed in the following sections.


Quantum Jumps 


We have seen that physical states of a quantum
system are represented by the space of rays $P$, and that
the normalized states used as representatives for
such states form the total space $S$ of a principal
$U(1)$ bundle $\eta_1$ over $P$. Moreover, in the previous
section we have seen that the physical
notions of transition probability and quantum
interference naturally lead to the introduction of a
Riemannian metric $g$ and an abelian $U(1)$ gauge
field $A$ living on $P$.

An interesting result based on the connection 
between g and A concerns a nice geometric descrip- 
tion of a special type of quantum evolution consist- 
ing of a sequence of “quantum jumps.” 

Consider two nonorthogonal rays $|A\rangle\langle A|$ and
$|B\rangle\langle B|$ in $P$. Let us suppose that the system’s
normalized wave function initially is $|A\rangle \in S$, and that we
measure it with the “polarizer” $|B\rangle\langle B|$. Then the result of
this filtering measurement is $|B\rangle\langle B|A\rangle$, or, after
projecting back to the set of normalized states, we
have the “quantum jump”

$$|A\rangle \to |B\rangle\, \frac{\langle B|A\rangle}{|\langle B|A\rangle|} \qquad [17]$$


Now we have the following theorem: 


Theorem The jump [17] can be recovered by
parallel transporting the normalized state $|A\rangle$
according to the Pancharatnam connection along
the shortest geodesic (with respect to the metric [8])
connecting $|A\rangle\langle A|$ and $|B\rangle\langle B|$ in $P$.


Let us now consider a cyclic series of filtering
measurements with projectors $|A_a\rangle\langle A_a|$, $a = 1, 2, \ldots,
N + 1$, where $|A_1\rangle\langle A_1| = |A_{N+1}\rangle\langle A_{N+1}|$. Prepare the
system in the state $|A_1\rangle \in S$, and then subject it to
the sequence of filtering measurements. Then,
according to the theorem, the phase

$$e^{i\Phi} = \frac{\langle A_1|A_N\rangle \langle A_N|A_{N-1}\rangle \cdots \langle A_2|A_1\rangle}{|\langle A_1|A_N\rangle \langle A_N|A_{N-1}\rangle \cdots \langle A_2|A_1\rangle|} \qquad [18]$$

picked up by the state is equal to the one obtained
by parallel transporting $|A_1\rangle$ along a geodesic
polygon consisting of the shorter arcs connecting
the projectors $|A_a\rangle\langle A_a|$ and $|A_{a+1}\rangle\langle A_{a+1}|$ with
$a = 1, 2, \ldots, N$. It is important to realize that this
filtering measurement process is not unitary;
hence, unitarity is not essential for the geometric
phase to appear.
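As an illustration of [18], the following sketch evaluates the phase for a cycle of three filtering measurements on a spin-1/2 system. The spin-1/2 specialization, the Bloch-sphere parametrization, and the helper names are assumptions made for this example only. For a two-level system the phase of the product of overlaps equals minus one half of the oriented solid angle of the geodesic triangle spanned by the three Bloch vectors, which is what the last two printed numbers compare.

```python
import numpy as np

def spin_half_state(theta, chi):
    """Spin-1/2 state whose Bloch vector has polar angles (theta, chi)."""
    return np.array([np.cos(theta / 2), np.exp(1j * chi) * np.sin(theta / 2)])

def bloch_vector(theta, chi):
    return np.array([np.sin(theta) * np.cos(chi),
                     np.sin(theta) * np.sin(chi),
                     np.cos(theta)])

def pancharatnam_phase(states):
    """Phase of <A1|AN><AN|A_{N-1}>...<A2|A1> for the cycle A1 -> A2 -> ... -> AN -> A1."""
    cycle = list(states) + [states[0]]
    product = 1.0 + 0j
    for prev, nxt in zip(cycle[:-1], cycle[1:]):
        product *= np.vdot(nxt, prev)            # <next|prev>
    return np.angle(product)

def triangle_solid_angle(a, b, c):
    """Oriented solid angle of the geodesic triangle with unit-vector vertices a, b, c."""
    num = np.dot(a, np.cross(b, c))
    den = 1.0 + np.dot(a, b) + np.dot(b, c) + np.dot(c, a)
    return 2.0 * np.arctan2(num, den)

angles = [(0.3, 0.1), (1.2, 0.4), (0.9, 2.0)]
states = [spin_half_state(t, c) for t, c in angles]
vectors = [bloch_vector(t, c) for t, c in angles]
print(pancharatnam_phase(states), -0.5 * triangle_solid_angle(*vectors))
```

No unitary evolution enters this computation: only the overlaps of successive projector representatives are used, in line with the remark that unitarity is not essential.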

In this section we have managed to obtain closed
paths in the form of geodesic polygons in $P$ via the
physical process of subjecting the initial state $|A_1\rangle$ to
a sequence of filtering measurements. It is clear that
for any type of evolution, the geodesics of the
Fubini–Study metric play a fundamental role since
any smooth closed curve in $P$ can be approximated
by geodesic polygons.

Nonunitary evolution provided by the quantum 
measuring process is only half of the story. In the 
next section, we start describing closed paths in P 
arising also from unitary evolutions generated by 
parameter-dependent Hamiltonians, the original 
context where geometric phases were discovered. 


Unitary Evolutions 
Adiabatic Evolution 


Suppose that the evolution of our quantum system
with $\mathcal{H} \simeq \mathbb{C}^{n+1}$ is generated by a Hermitian Hamiltonian
matrix $H(x)$ depending on a set of external
parameters $x^\mu$, $\mu = 1, 2, \ldots, M$. Here we assume
that the $x^\mu$ are local coordinates on some coordinate
patch $V$ of a smooth $M$-dimensional manifold $\mathcal{M}$.
We label the eigenvalues of $H(x)$ by the numbers
$r = 0, 1, 2, \ldots, n$, and assume that the $r$th eigenvalue
$E_r(x)$ is nondegenerate:

$$H(x)\,|r, x\rangle = E_r(x)\,|r, x\rangle, \qquad r = 0, 1, 2, \ldots, n \qquad [19]$$

We assume that $H(x)$, $E_r(x)$, $|r, x\rangle$ are smooth functions
of $x$. The rank-1 spectral projectors

$$P_r(x) = |r, x\rangle\langle r, x|, \qquad r = 0, 1, 2, \ldots, n \qquad [20]$$

for each $r$ define a map $f_r : \mathcal{M} \to P$:

$$f_r : x \in V \subset \mathcal{M} \mapsto P_r(x) \in P \qquad [21]$$


Recall now that we have the bundle $\eta_1$ over $P$ at
our disposal, and we can pull back $\eta_1$ using the map
$f_r$ to construct a new bundle $\xi_1^r$ over the parameter
space $\mathcal{M}$. Moreover, we can define a connection on
$\xi_1^r$ by pulling back the canonical (Pancharatnam)
connection of $\eta_1$. The resulting bundle $\xi_1^r$ is called
the Berry–Simon bundle over the parameter space
$\mathcal{M}$. Explicitly,

$$\xi_1^r : U(1) \hookrightarrow \xi_1^r \to \mathcal{M} \qquad [22]$$

The states $|r, x\rangle$ of [19] define a local section of $\xi_1^r$.
Suppressing the index $r$, the relationship between $\eta_1$
and $\xi_1$ can be summarized by the following
diagram:

$$\begin{array}{ccc} \xi_1 & \xrightarrow{\ f^*\ } & \eta_1 \\ \downarrow & & \downarrow{\scriptstyle \pi} \\ \mathcal{M} & \xrightarrow{\ f\ } & P \end{array} \qquad [23]$$

Here $f^*$ denotes the pullback map, and we have $\xi_1 =
f^*(\eta_1)$. (We have denoted the total space $S$ as $\eta_1$.)

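The local section of $\xi_1$ arising as the pullback of
[14] on $\eta_1$ is given by

$$|r, x\rangle = \frac{1}{\sqrt{1 + \bar{w}_k(x)\, w^k(x)}} \begin{pmatrix} 1 \\ w^j(x) \end{pmatrix}, \qquad x \in V \subset \mathcal{M} \qquad [24]$$

with $j = 1, 2, \ldots, n$. The pullback of the Pancharatnam
connection $\omega$ on $\eta_1$ is $f^*(\omega)$. We can further
pull back $f^*(\omega)$ to $V \subset \mathcal{M}$ with respect to the local
section of [24] to obtain a gauge field living on the
parameter space. This gauge field is called the
“Berry gauge field” and the corresponding connection
is the Berry connection. Thus,

$$\mathcal{A} = f^*(A) = \mathcal{A}_\mu(x)\, dx^\mu = \left(A_j\, \partial_\mu w^j + A_{\bar{j}}\, \partial_\mu \bar{w}^j\right) dx^\mu \qquad [25]$$

where $\partial_\mu = \partial/\partial x^\mu$ and $A$ is given by [13]. When we
have a closed curve $C$ in $\mathcal{M}$, then $f \circ C$ defines a
closed curve $\mathcal{C}$ in $P$. We already know that the
holonomy for $\mathcal{C}$ in $P$ can be written in the form [16];
hence,

$$\Phi[C] = -\oint_{\mathcal{C}} A = -\oint_C f^*(A) = -\oint_C \mathcal{A} \qquad [26]$$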


This formula states that there is a geometric phase
picked up by the eigenstates of a parameter-dependent
Hermitian Hamiltonian when we change
the parameters along a closed curve. Our formula
shows that the geometric phase can be calculated
using either the canonical connection on $\eta_1$ or the
Berry connection on $\xi_1$.

Let us then change the parameters $x^\mu$ adiabatically.
The closed path in parameter space then
defines Hamiltonians satisfying $H(x(T)) = H(x(0))$
for some $T \in \mathbb{R}^+$. Moreover, there is also the
associated closed curve $P_r(x(T)) = P_r(x(0))$ in $P$.
The quantum adiabatic theorem states that if we
prepare a state $|\Psi(0)\rangle = |r, x(0)\rangle$ at $t = 0$, which is an
eigenstate of the instantaneous Hamiltonian
$H(x(0))$, then after changing the parameters
infinitely slowly, the time evolution generated by
the time-dependent Schrödinger equation

$$i\hbar \frac{d}{dt} |\Psi(t)\rangle = H(x(t))\, |\Psi(t)\rangle \qquad [27]$$

takes the form

$$|\Psi(t)\rangle = |r, x(t)\rangle\, e^{i\Delta_r(t)} \qquad [28]$$

after time $t$, which belongs to the same eigensubspace.
The point is that the theorem holds only for
cases when the kinetic energy associated with the
slow change in the external parameters is much
smaller than the energy separation between $E_r(x)$
and $E_{r'}(x)$, $r' \neq r$, for all $x \in \mathcal{M}$. Under this assumption,
transitions between adjacent levels are prohibited
during evolution. Notice that the adiabatic theorem
clearly breaks down in the vicinity of level crossings,
where the gap is comparable with the magnitude of
the kinetic energy of the external parameters.

However, if one takes it for granted that the
projector $P_r(t) = P_r(x(t))$ for some $r$ satisfies the
Schrödinger–von Neumann equation

$$i\hbar \frac{d}{dt} P_r(t) = [H(t), P_r(t)] \qquad [29]$$

then, by virtue of [19], we get zero for the right-hand side.
This means that $P_r(t)$ is constant; hence, the curve in
$P$ degenerates to a point. The upshot of this is that
exact adiabatic cyclic evolutions do not exist. It can
be shown, however, that under certain conditions
one can find an initial state $|\Psi(0)\rangle \neq |r, x(0)\rangle$ whose
evolution stays “close enough” to $P_r(x(t)) = |r, x(t)\rangle\langle r, x(t)|$. Then,
we can say that the projector analog of [28] holds only
approximately:

$$|\Psi(t)\rangle\langle\Psi(t)| \simeq |r, x(t)\rangle\langle r, x(t)| \qquad [30]$$

This means that the bundle picture for
the generation of closed curves in $P$ via
adiabatic evolution can be used only as an
approximation.


Berry’s Phase 


A straightforward calculation after substituting
[28] into [27] shows that

$$\exp\big(i\Delta_r(T)\big) = \exp\left(-\frac{i}{\hbar} \int_0^T E_r(t)\, dt\right) \exp\left(-i \oint_C \mathcal{A}^{(r)}\right) \qquad [31]$$

where $C$ is a closed curve lying entirely in $V \subset \mathcal{M}$.
The first phase factor is the dynamical one and the
second is the celebrated Berry phase. Notice that the
index $r$ labeling the eigensubspace in question
should now be included in the definition of $\mathcal{A}$
(see eqn [25]).
As an explicit example, let us take the Hamiltonian

$$H(X(t)) = -\omega_0\, \mathbf{J} \cdot X(t), \qquad \omega_0 = \frac{geB}{2mc}, \qquad X \in \mathbb{R}^3, \quad |X| = 1 \qquad [32]$$

where $e$, $m$, and $g$ are the charge, mass, and Landé
factor of a particle, $c$ is the speed of light, and $B$ is
the (constant) magnitude of an applied magnetic
field. The three components of $\mathbf{J}$ are
$(2J + 1) \times (2J + 1)$-dimensional spin matrices satisfying
$\mathbf{J} \times \mathbf{J} = i\hbar \mathbf{J}$. The Hamiltonian (eqn [32])
describes a spin $J$ particle moving in a magnetic
field with slowly varying direction. It is obvious that
the parameter space is a 2-sphere. Introducing polar
coordinates $0 \leq \theta < \pi$, $0 \leq \chi < 2\pi$ for the patch $V$
of $S^2$ excluding the south pole, we have
$X(\theta, \chi) = (\sin\theta \cos\chi, \sin\theta \sin\chi, \cos\theta)$.

As an illustration, let us consider the spin 1/2 case.
Then $H$ can be expressed in terms of the $2 \times 2$ Pauli
matrices. The eigenvalues are $E_0 = -\omega_0 \hbar/2$ and
$E_1 = \omega_0 \hbar/2$ ($r = 0, 1$). For the ground state, the
mapping $f_0$ of [21] from $V \subset \mathcal{M} \simeq S^2$ to $P \simeq CP^1$ is
given by

$$w(\theta, \chi) = \tan\left(\frac{\theta}{2}\right) e^{i\chi} \qquad [33]$$

which is stereographic projection of $S^2$ from the
south pole onto the complex plane corresponding to
the coordinate patch $U_0 \subset CP^1$. Using [13] and [25],
one can calculate the pullback gauge field and its
curvature $\mathcal{F}^{(0)} = d\mathcal{A}^{(0)}$, where

$$\mathcal{A}^{(0)} = \frac{1}{2}(1 - \cos\theta)\, d\chi, \qquad \mathcal{F}^{(0)} = \frac{1}{2} \sin\theta\, d\theta \wedge d\chi \qquad [34]$$

Notice that $\mathcal{F}^{(0)}$ is the field strength of a magnetic
monopole of strength 1/2 living on $\mathcal{M}$. Using Stokes’
theorem, from [26] one can calculate Berry’s phase

$$\Phi^{(0)}[C] = -\oint_C \mathcal{A}^{(0)} = -\int_S \mathcal{F}^{(0)} = -\frac{1}{2}\, \Omega[C] \qquad [35]$$

where $S$ is the surface bounded by the loop $C$ and $\Omega[C]$ is
the solid angle subtended by the curve $C$ at $X = 0$.
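The spin-1/2 result can be reproduced numerically from the coordinate expression of the gauge field. The sketch below assumes NumPy and a loop of fixed latitude, chosen here purely for illustration; the function name `berry_phase_latitude` is hypothetical. It integrates $A = \mathrm{Im}(\bar{w}\, dw)/(1 + |w|^2)$ from [13] along the image $w(\theta, \chi) = \tan(\theta/2)\, e^{i\chi}$ of the loop under [33] and compares $-\oint A$ with $-\frac{1}{2}\Omega[C]$, where $\Omega[C] = 2\pi(1 - \cos\theta)$ for the polar cap.

```python
import numpy as np

def berry_phase_latitude(theta, steps=4000):
    """-oint A along the circle of polar angle theta, with A = Im(conj(w) dw) / (1 + |w|^2)."""
    chi = np.linspace(0.0, 2.0 * np.pi, steps + 1)
    w = np.tan(theta / 2.0) * np.exp(1j * chi)   # stereographic coordinate, eqn [33]
    w_mid = 0.5 * (w[1:] + w[:-1])               # midpoint rule on each segment
    dw = np.diff(w)
    a = np.imag(np.conj(w_mid) * dw) / (1.0 + np.abs(w_mid) ** 2)
    return -np.sum(a)

theta = np.pi / 3
solid_angle = 2.0 * np.pi * (1.0 - np.cos(theta))
print(berry_phase_latitude(theta), -0.5 * solid_angle)
```

The discretization error of the midpoint rule decreases quadratically with the number of steps, so a few thousand segments already reproduce the solid-angle formula to several digits.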

The above result can be generalized for arbitrary spin
$J$. Then, we have the eigenvalues $E_r = -\omega_0 \hbar (J - r)$,
where $0 \leq r \leq 2J$. The final result in this case is

$$\Phi^{(r)}[C] = -(J - r)\, \Omega[C], \qquad 0 \leq r \leq 2J \qquad [36]$$


The Aharonov–Anandan Phase


We have seen that the quantum adiabatic theorem
can only be used approximately for generating
closed curves in $P$. This section describes
how such curves can be generated exactly.

Let us consider the Schrödinger equation with a
time-dependent Hamiltonian (eqn [27]). Then we
call its solution $|\Psi(t)\rangle$ cyclic if the state of the system
returns, after a period $T$, to its original state. This
means that the projector $|\Psi(t)\rangle\langle\Psi(t)|$ traverses a
closed path $C$ in $P$. In order to realize this situation,
we have to find solutions of [27] for which
$|\Psi(T)\rangle = e^{i\Delta} |\Psi(0)\rangle$ for some $\Delta$.

Taking for granted the existence of such a solution, let
us first explore its consequences. First, we remove the
dynamical phase from the cyclic solution $|\Psi(t)\rangle$:

$$|\psi(t)\rangle = \exp\left(\frac{i}{\hbar} \int_0^t \langle\Psi(t')|H(t')|\Psi(t')\rangle\, dt'\right) |\Psi(t)\rangle \qquad [37]$$

Then, $|\psi(t)\rangle$ satisfies [12], that is, it defines a unique
horizontal lift of the closed curve $C$ in $P$. Following
the same steps as in the section describing the Pancharatnam
connection, we see that the phase

$$\Phi[C] = -\oint_C A = \Delta + \frac{1}{\hbar} \int_0^T \langle\Psi(t)|H(t)|\Psi(t)\rangle\, dt \qquad [38]$$

is purely geometric in origin. It is called the
Aharonov–Anandan (AA) phase.

Let us now turn back to the question of finding
cyclic states satisfying $|\Psi(T)\rangle = e^{i\Delta} |\Psi(0)\rangle$. One
possible solution is as follows. Suppose that $H$
depends on time through some not necessarily
slowly changing parameters $x$. Let us find a partner
Hamiltonian $h$ for our $H$ by defining a smooth
mapping $\sigma : \mathcal{M} \to \mathcal{M}$, such that

$$h(x) = H(\sigma(x)), \qquad x \in V \subset \mathcal{M} \qquad [39]$$

For the special class we study here, the cyclic vectors
are eigenvectors of $h(x)$. Hence, the projectors $p_r$
and $P_r$ of $h$ and $H$ are related as $p_r(x) = P_r(\sigma(x))$; this
means that we have a map $g_r : \mathcal{M} \to P$,

$$g_r = f_r \circ \sigma : x \in V \subset \mathcal{M} \mapsto p_r(x) \in P \qquad [40]$$

which associates with every $x$ an eigenstate of $h(x)$.
Moreover, $g_r$ associates with a closed curve $C$ in $\mathcal{M}$
a closed curve $\mathcal{C}$ in $P$. Notice that generically
$[h(x), H(x)] \neq 0$; hence, cyclic states are not eigenstates
of the instantaneous Hamiltonian.

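It should be clear by now that we can repeat the
construction as discussed in the adiabatic case with
$g_r$ replacing $f_r$. In particular, we can construct a new
bundle $\zeta_1$ over the parameter space via the usual
pullback procedure. More precisely, we have the
corresponding diagram

$$\begin{array}{ccc} \zeta_1 & \xrightarrow{\ g^*\ } & \eta_1 \\ \downarrow & & \downarrow{\scriptstyle \pi} \\ \mathcal{M} & \xrightarrow{\ g\ } & P \end{array} \qquad [41]$$

The AA connection can be obtained by pulling back
the Pancharatnam connection:

$$a = g^*(A) = \sigma^* \circ f^*(A) = \sigma^*(\mathcal{A}) \qquad [42]$$

where the last equality relates the AA connection
with the Berry connection. Now the AA phase is

$$\Phi[C] = -\oint_{\mathcal{C}} A = -\oint_C \sigma^*(\mathcal{A}) = -\oint_C a \qquad [43]$$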


As an example, let us take the Hamiltonian [32]
with the curve $C$ on $\mathcal{M} = S^2$:

$$X(t) = \big(\sin\theta \cos(\chi + \omega t),\ \sin\theta \sin(\chi + \omega t),\ \cos\theta\big) \qquad [44]$$

Here $\theta$ and $\chi$ are the polar coordinates of a fixed point
in $S^2$ where the motion starts. The curve $C$ is a circle of
fixed latitude and is traversed with an arbitrary speed.
This model can be solved exactly and it can be shown
that the mapping $\sigma_s : S^2 \to S^2$ is given by

$$\sigma_s : (u, \chi) \mapsto \left(\frac{u - s}{\sqrt{s^2 - 2us + 1}},\ \chi\right), \qquad u = \cos\theta, \quad s = \frac{\omega}{\omega_0} \qquad [45]$$

One can prove that for $0 \leq s < 1$, $\sigma_s$ is a diffeomorphism.
In the $s \to 0$ (adiabatic) limit, the
mapping $g_{r,s} = f_r \circ \sigma_s$ is continuously deformed to
$f_r$. Moreover, $h(x)$ as defined above commutes with
the time evolution operator; hence, cyclic states are
indeed eigenstates of $h(x)$.

Using [42], [43], and [45], the explicit form of $\sigma_s$,
we get for the AA phase

$$\Phi_s^{(r)}[C] = -2\pi (J - r) \left(1 - \frac{u - s}{\sqrt{s^2 - 2us + 1}}\right) \qquad [46]$$

In the adiabatic limit, the result goes to $-2\pi(J - r)(1 - u)$,
which is just $-(J - r)$ times the solid angle of
the path of fixed latitude, as it has to be.
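The dependence of the AA phase on the speed of traversal can be tabulated directly. The sketch below is only an illustration: it assumes that the AA phase for the circle of fixed latitude is obtained by composing the map $\sigma_s$ of [45] with the Berry result [36], that is, $-2\pi(J - r)(1 - u_s)$ with $u_s = (u - s)/\sqrt{s^2 - 2us + 1}$; the function name `aa_phase` is hypothetical.

```python
import math

def aa_phase(theta, s, J=0.5, r=0):
    """AA phase for a latitude circle, assuming the Berry formula applied at sigma_s(u)."""
    u = math.cos(theta)
    u_s = (u - s) / math.sqrt(s * s - 2.0 * u * s + 1.0)   # first component of sigma_s, eqn [45]
    return -2.0 * math.pi * (J - r) * (1.0 - u_s)

theta = math.pi / 3
for s in (0.0, 0.01, 0.1, 0.5):
    print(s, aa_phase(theta, s))
# At s = 0 the value equals -(J - r) times the solid angle 2*pi*(1 - cos(theta)).
```

For $s = 0$ the map $\sigma_s$ is the identity and the expression reduces to the adiabatic (Berry) value; for $s > 0$ the effective polar angle is shifted toward the south pole and the magnitude of the phase grows.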


Generalization 


In the sequence of examples, we have shown that
geometric phases are related to the geometric structures
on the bundle $\eta_1$. The Berry and AA phases are
special cases arising from Pancharatnam’s phase via a
pullback procedure with respect to suitable maps
defined by the physical situation in question. Hence,
the Pancharatnam connection in this sense is universal.
The root of this universality rests in a deep theorem of
mathematics concerning the existence of universal
bundles and their universal connections. In order to
elaborate the insight provided by this theorem into the
geometry of quantum evolution, let us first make a
further generalization.

In our study of time-dependent Hamiltonians we
have assumed that the eigenvalues of [19] were
nondegenerate. Let us now relax this assumption. Fix
an integer $N \geq 1$, the degeneracy of the eigensubspace
corresponding to the eigenvalue $E_r$. One can then form
a $U(N)$ principal bundle $\xi_N$ over $\mathcal{M}$, furnished with a
connection that is a natural generalization of the Berry
connection. The pullback of this connection to a patch
of $\mathcal{M}$ is a $U(N)$-valued gauge field, and its holonomy
along a loop in $\mathcal{M}$ gives rise to a $U(N)$ matrix
generalization of the $U(1)$ Berry phase.

The natural description of this connection and its AA
analog is as follows. Take the complex Grassmannian
$Gr(n + 1, N)$ of $N$ planes in $\mathbb{C}^{n+1}$. Obviously, $Gr(n +
1, 1) = P$. Each point of $Gr(n + 1, N)$ corresponds to
an $N$ plane through the origin represented by a rank-$N$
projector. This projector can be written in terms of $N$
orthonormal basis vectors in an infinite number of
ways. This ambiguity of choosing orthonormal frames
is captured by the $U(N)$ gauge symmetry, the analog of
the $U(1)$ (phase) ambiguity in defining a normalized
state as the representative of the rank-1 projector. This
bundle of frames is the Stiefel bundle $V(n + 1, N)$,
alternatively denoted by $\eta_N$. $V(n + 1, N)$ is a principal
$U(N)$ bundle over $Gr(n + 1, N)$ equipped with a
canonical connection $\omega_N$ which is the $U(N)$ analog of
Pancharatnam’s connection.

Now, according to the powerful theorem of Narasimhan
and Ramanan, if we have a $U(N)$ bundle $\xi_N$
over the $M$-dimensional parameter space $\mathcal{M}$, then
there exists an integer $n_0(N, M)$ such that for $n \geq n_0$
there exists a map $f : \mathcal{M} \to Gr(n + 1, N)$ such that
$\xi_N = f^*(V(n + 1, N))$. Moreover, given any two such
maps $f$ and $g$, the corresponding pullback bundles are
isomorphic if and only if $f$ is homotopic to $g$.

For the examples of the sections “Berry’s phase”
and “The Aharonov–Anandan phase,” we have
$N = 1$, $n = 1$, and $M = 2$. Since the maps $f_r$ and $g_{r,s}$
defined by the rank-1 spectral projectors of $H(x)$
and $h(x)$ for $0 \leq s < 1$ are homotopic, the corresponding
pullback bundles $\xi_1$ and $\zeta_1$ are isomorphic.
Moreover, the Berry and AA connections are the
pullbacks of the universal connection on $V(n +
1, 1) = \eta_1$, which is just Pancharatnam’s connection.

For the infinite-dimensional case, one can define
$Gr(\infty, N)$ by taking the union under the natural
inclusion maps of $Gr(n, N)$ into $Gr(n + 1, N)$.
We denote this universal classifying bundle $V(\infty, N)$
as $\eta$. Then, we see that given an $N$-dimensional
eigensubspace bundle over $\mathcal{M}$ and a map $f_r : x \in
\mathcal{M} \mapsto P_r(x) \in Gr(\infty, N)$ defined by the physical
situation, the geometry of evolving eigensubspaces
can be understood in terms of the holonomy of the
pullback of the universal connection on $\eta$.


Conclusions 


In this article, we have elucidated the mathematical origin of
geometric phases. We have seen that the key observation
is the fact that the space of rays $P$ represents
unambiguously the physical states of a quantum
system. The particular representatives of a class in $P$
belonging to the usual Hilbert space $\mathcal{H}$ form (local)
sections of a $U(1)$ bundle $\eta_1$. Based on the physical
notions of transition probability and interference, $\eta_1$
can be furnished with extra structures: the metric and
the connection, the latter giving rise to a natural
definition of parallel transport. We have seen that the
geodesics of $P$ with respect to the metric play a
fundamental role in approximating evolutions of any
kind that give rise to a curve in $P$.

The geometric structures of $\eta_1$ induce similar
structures for pullback bundles. These bundles encapsulate
the geometric details of time evolutions generated
by Hamiltonians that depend on a set of
parameters $x$ belonging to a manifold $\mathcal{M}$. It was
shown that the famous examples of Berry and AA
phases arise as important special cases in this
formalism. A generalization to evolving $N$-dimensional
subspaces based on the theory of universal
connections can also be given. This shows that the
basic structure responsible for the occurrence of
anholonomy effects in evolving quantum systems is
the universal bundle $\eta$, which is the bundle of subspaces
of arbitrary dimension $N$ in a Hilbert space.

The important issue of applying the idea of
anholonomy to physical problems has not been
dealt with in this article. There are spectacular
applications such as holonomic quantum computation,
the gauge kinematics of deformable bodies,
the quantum Hall effect, and fractional spin and statistics.
The interested reader should consult the vast
literature on the subject or, as a first glance, the
book of Shapere and Wilczek (1989).


See also: Fractional Quantum Hall Effect; Geometric
Measure Theory; Holomorphic Dynamics; Moduli
Spaces: An Introduction.


Introduction


The equations of geophysical fluid dynamics are the
equations governing the motion of the atmosphere and
the ocean, and are derived from the conservation
equations from physics, namely conservation of mass,
momentum, energy, and some other components such
as salt for the ocean, humidity (or chemical pollutants)
for the atmosphere.

The first assumption used in any circulation 
model is the well-accepted Boussinesq approxima- 
tion, that is, the density differences are neglected in 
the system except in the buoyancy term and in the 
equation of state. The resulting system is the so- 
called Boussinesq equations (Pedlosky 1987). Due to 
the extremely high accuracy of this approximation, 
these equations are considered as the basic equations 


in geophysical dynamics. From the computational 
point of view, however, the Boussinesq equations 
are still not accessible. 

Owing to the difference of sizes of the vertical and 
horizontal dimensions, both in the atmosphere and in 
the ocean (10-20 km versus several thousands of 
kilometers), the second approximation is based on the 
smallness of the vertical length scales with respect to 
the horizontal length scales, that is, oceans (and the 
atmosphere) compose very thin layers. The scale 
analysis ensures that the dominant forces in the 
vertical-momentum equation come from the pressure 
gradient and the gravity. This leads to the so-called 
hydrostatic approximation, which amounts to repla- 
cing the vertical component of the momentum equa- 
tion by the hydrostatic balance equation, and hence 
leading to the well-accepted primitive equations (PEs) 
(Washington and Parkinson 1986). As far as we
know, the primitive equations were first considered
by L F Richardson (1922); when it appeared that
they were still too complicated, they were set aside
and, instead, attention was focused on even simpler
models, the geostrophic and quasigeostrophic models,
considered in the late 1940s by J von Neumann
and his collaborators, in particular J G Charney.
With the increase of computing power, interest 
eventually returned to the PEs, which are now the 
core of many global circulation models (GCMs) or 
ocean global circulation models (OGCMs), avail- 
able at the National Center for Atmospheric 
Research (NCAR) and elsewhere. GCMs and 
OGCMs are very complex models which contain 
many components, but still, the PEs are the central 
component for the dynamics of the air or the water. 
Further approximations based on the fast rotation
of the Earth, implying the smallness of the Rossby
number, lead to the quasigeostrophic and geostrophic
equations (Pedlosky 1987).

The mathematical study of the PEs was initiated by
Lions, Temam, and Wang in the early 1990s. They
produced a mathematical formulation of the PEs
which resembles that of the Navier–Stokes equations due to
Leray, and obtained the existence, for all time, of weak
solutions (see Lions et al. 1992a, b, 1993, 1995).
Further works conducted during the 1990s have
improved and supplemented these early results, bringing
the mathematical theory of the PEs to the same level as that of the
three-dimensional incompressible Navier–Stokes
equations (Constantin and Foias 1998, Temam 2001).
In summary, the following results are now available
and will be presented in this article:


1. existence of weak solutions for all time;

2. existence of strong solutions in space dimension
three, local in time;

3. existence and uniqueness of a strong solution in
space dimension two, for all time; and

4. uniqueness of weak solutions in space dimension
two.


The PEs of the Ocean 


The ocean is made up of a slightly compressible
fluid subject to a Coriolis force. The full set of
equations of the large-scale ocean are the following:
the conservation of momentum equation, the continuity
equation (conservation of mass), the thermodynamics
equation, the equation of state, and the
equation of diffusion for the salinity $S$:

$$\rho \frac{dV_3}{dt} + 2\rho\, \Omega \times V_3 + \nabla_3 p + \rho\, \mathbf{g} = D \qquad [1]$$

$$\frac{d\rho}{dt} + \rho\, \mathrm{div}_3 V_3 = 0 \qquad [2]$$

$$\frac{dT}{dt} = Q_T \qquad [3]$$

$$\frac{dS}{dt} = Q_S \qquad [4]$$

$$\rho = f(T, S, p) \qquad [5]$$

Here $V_3$ is the three-dimensional velocity vector,
$V_3 = (u, v, w)$; $\rho$, $p$, $T$ are, respectively, the density,
pressure, and temperature, and $S$ is the concentration
of salinity; $\mathbf{g} = (0, 0, g)$ is the gravity vector, $D$
the molecular dissipation, and $Q_T$ and $Q_S$ are the heat
and salinity diffusions, respectively.


Remark 1 The equation of state for the oceans is
derived on a phenomenological basis. Only empirical
forms of the function $f(T, S, p)$ are known (see
Washington and Parkinson (1986)). It is natural,
however, to expect that $\rho$ decreases if $T$ increases and
that $\rho$ increases if $S$ increases. The simplest law is

$$\rho = \rho_0 \big(1 - \beta_T (T - T_r) + \beta_S (S - S_r)\big) \qquad [6]$$

corresponding to a linearization around reference
values $\rho_0$, $T_r$, $S_r$ of, respectively, the density, temperature,
and salinity; $\beta_T$ and $\beta_S$ are positive
expansion coefficients.
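The linearized law [6] is simple enough to evaluate directly. In the sketch below the default numerical values of the reference density, temperature, salinity, and expansion coefficients are illustrative placeholders chosen for the example, not values taken from the article; the function name `linear_density` is likewise hypothetical.

```python
def linear_density(T, S, rho0=1025.0, T_ref=10.0, S_ref=35.0,
                   beta_T=2.0e-4, beta_S=7.6e-4):
    """rho = rho0 * (1 - beta_T (T - T_ref) + beta_S (S - S_ref)), eqn [6].

    All default parameter values are illustrative assumptions.
    """
    return rho0 * (1.0 - beta_T * (T - T_ref) + beta_S * (S - S_ref))

print(linear_density(10.0, 35.0))   # reference state: returns rho0
print(linear_density(15.0, 35.0))   # warmer water is lighter
print(linear_density(10.0, 36.0))   # saltier water is denser
```

With positive $\beta_T$ and $\beta_S$ the function reproduces the qualitative behavior expected in the remark: density decreases with temperature and increases with salinity.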


The Mach number for the flow in the ocean is not
large and, therefore, as a starting point, we can
make the so-called Boussinesq approximation in
which the density is assumed constant, $\rho = \rho_0$,
except in the buoyancy term and in the equation of
state. This amounts to replacing [1], [2] by

$$\rho_0 \frac{dV_3}{dt} + 2\rho_0\, \Omega \times V_3 + \nabla_3 p + \rho\, \mathbf{g} = D \qquad [7]$$

$$\mathrm{div}_3 V_3 = 0 \qquad [8]$$

Furthermore, since for the large-scale ocean the horizontal
scale is much larger than the vertical one, a scale
analysis (Pedlosky 1987) shows that $\partial p/\partial z$ and $\rho g$ are
the dominant terms in the vertical-momentum equation,
leading to the hydrostatic approximation

$$\frac{\partial p}{\partial z} = -\rho g \qquad [9]$$
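Integrating the hydrostatic balance [9] from a depth $z \leq 0$ up to the surface $z = 0$ gives $p(z) = p_s + g \int_z^0 \rho\, dz'$, with $p_s$ the surface pressure. The following sketch assumes NumPy, a vertical grid that increases toward the surface, and a trapezoidal rule; the function name `hydrostatic_pressure` and the sample values are illustrative.

```python
import numpy as np

def hydrostatic_pressure(z, rho, g=9.81, p_surface=0.0):
    """p(z_k) = p_surface + g * int_{z_k}^{0} rho dz', for z increasing up to the surface z[-1] = 0."""
    z = np.asarray(z, dtype=float)
    rho = np.asarray(rho, dtype=float)
    # trapezoidal contribution of each layer between consecutive levels
    layer = 0.5 * (rho[1:] + rho[:-1]) * np.diff(z)
    # sum of the layers lying above each level; zero at the surface itself
    above = np.concatenate((np.cumsum(layer[::-1])[::-1], [0.0]))
    return p_surface + g * above

z = np.linspace(-1000.0, 0.0, 101)           # depth levels in meters
rho = np.full_like(z, 1025.0)                # constant density, for illustration
p = hydrostatic_pressure(z, rho)
print(p[0], 1025.0 * 9.81 * 1000.0)          # pressure at 1000 m depth vs. rho * g * depth
```

For constant density the result is linear in depth; a density profile computed from temperature and salinity can be passed in its place without changing the routine.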


For mid-latitude regional studies, it is usual to
consider the beta-plane approximation of the equations.
Thus, we assume that the ocean fills a domain $M_\varepsilon$ of $\mathbb{R}^3$.
The top of the ocean is a domain $\Gamma_i$ included in the
surface of the earth $S_a$ (sphere of radius $a$ centered at
0). The bottom $\Gamma_b$ of the ocean is defined by ($z = x_3 =
r - a$), $z = -\varepsilon h(\theta, \varphi)$, where $\varepsilon > 0$ is a positive parameter.
It is introduced to take into consideration the
smallness of the vertical scales compared to the
horizontal scales. $h$ is a function of class $C^2$ at least on
$\bar{\Gamma}_i$; it is assumed also that $h$ is bounded from below, that
is, $0 < \underline{h} \leq h(\theta, \varphi) \leq \bar{h}$, $(\theta, \varphi) \in \Gamma_i$. The lateral surface
$\Gamma_l$ consists of the part of cylinder $\{(\theta, \varphi) \in \partial\Gamma_i,
-\varepsilon h(\theta, \varphi) \leq z \leq 0\}$. The PEs of the ocean are given by


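$$\frac{\partial v}{\partial t} + \nabla_v v + w \frac{\partial v}{\partial z} + \frac{1}{\rho_0} \nabla p + 2\Omega \sin\theta\, k \times v - \mu_v \Delta v - \nu_v \frac{\partial^2 v}{\partial z^2} = F_v \qquad [10]$$

$$\frac{\partial p}{\partial z} = -\rho g, \qquad \mathrm{div}\, v + \frac{\partial w}{\partial z} = 0 \qquad [11]$$

$$\frac{\partial T}{\partial t} + \nabla_v T + w \frac{\partial T}{\partial z} - \mu_T \Delta T - \nu_T \frac{\partial^2 T}{\partial z^2} = F_T \qquad [12]$$

$$\frac{\partial S}{\partial t} + \nabla_v S + w \frac{\partial S}{\partial z} - \mu_S \Delta S - \nu_S \frac{\partial^2 S}{\partial z^2} = F_S \qquad [13]$$

$$\mathrm{div} \int_{-h}^{0} v\, dz = 0 \qquad [14]$$

$$p = p_s + P, \qquad P = P(T, S) = g \int_z^0 \rho\, dz' \qquad [15]$$

$$\rho = \rho_0 \big(1 - \beta_T (T - T_r) + \beta_S (S - S_r)\big) \qquad [16]$$

$$\int_{M_\varepsilon} S\, dM_\varepsilon = 0 \qquad [17]$$

where $v$ is the horizontal velocity of the water, $w$ is the
vertical velocity, and $T_r$, $S_r$ are averaged (or reference)
values of $T$ and $S$. The diffusion coefficients $\mu_v$, $\mu_T$, $\mu_S$
and $\nu_v$, $\nu_T$, $\nu_S$ are different in the horizontal and
vertical directions, accounting for some eddy diffusions
in the sense of Smagorinsky (1962). Note that
$F_v$, $F_T$, and $F_S$ correspond to volumic sources of
horizontal momentum, heat, and salt, respectively.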


Boundary conditions 


There are several sets of natural boundary condi- 
tions that one can associate to the PEs; for instance, 
the following: 


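On the top of the ocean $\Gamma_i$ ($z = 0$):

$$\nu_v \frac{\partial v}{\partial z} + \alpha_v (v - v^a) = \tau_v, \quad w = 0, \quad \nu_T \frac{\partial T}{\partial z} + \alpha_T (T - T^a) = 0, \quad \frac{\partial S}{\partial z} = 0 \qquad [18]$$

At the bottom of the ocean $\Gamma_b$ ($z = -h(\theta, \varphi)$):

$$v = 0, \quad w = 0, \quad \frac{\partial T}{\partial n_T} = 0, \quad \frac{\partial S}{\partial n_S} = 0 \qquad [19]$$

On the lateral boundary $\Gamma_l = \{-h(\theta, \varphi) < z < 0,\ (\theta, \varphi) \in \partial\Gamma_i\}$:

$$v = 0, \quad w = 0, \quad \frac{\partial T}{\partial n_T} = 0, \quad \frac{\partial S}{\partial n_S} = 0 \qquad [20]$$

Here $n = (n_H, n_z)$ is the unit outward normal on
$\partial M_\varepsilon$, decomposed into its horizontal and vertical
components; the conormal derivatives $\partial/\partial n_T$ and
$\partial/\partial n_S$ are those associated with the linear (temperature
and salinity) operators,

$$\frac{\partial}{\partial n_T} = \mu_T\, n_H \cdot \nabla + \nu_T\, n_z \frac{\partial}{\partial z}, \qquad \frac{\partial}{\partial n_S} = \mu_S\, n_H \cdot \nabla + \nu_S\, n_z \frac{\partial}{\partial z} \qquad [21]$$

Equations [10]–[17] with boundary conditions
[18]–[20] are supplemented with the initial conditions

$$v|_{t=0} = v_0, \qquad T|_{t=0} = T_0, \qquad S|_{t=0} = S_0 \qquad [22]$$

where $v_0$, $T_0$, $S_0$ are given initial data.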

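Following the work of Lions et al. (1992a, b,
1993, 1995) (see also Temam and Ziane (2004)),
we introduce the following function spaces $V =
V_1 \times V_2 \times V_3$, $H = H_1 \times H_2 \times H_3$, where

$$V_1 = \left\{ v \in H^1(M_\varepsilon)^2,\ \mathrm{div} \int_{-h}^{0} v\, dz = 0,\ v = 0 \text{ on } \Gamma_b \cup \Gamma_l \right\}$$

$$V_2 = H^1(M_\varepsilon)$$

$$V_3 = \dot{H}^1(M_\varepsilon) = \left\{ S \in H^1(M_\varepsilon),\ \int_{M_\varepsilon} S\, dM_\varepsilon = 0 \right\}$$

$$H_1 = \left\{ v \in L^2(M_\varepsilon)^2,\ \mathrm{div} \int_{-h}^{0} v\, dz = 0,\ n_H \cdot \int_{-h}^{0} v\, dz = 0 \text{ on } \partial\Gamma_i \text{ (i.e., on } \Gamma_l) \right\}$$

$$H_2 = L^2(M_\varepsilon)$$

$$H_3 = \dot{L}^2(M_\varepsilon) = \left\{ S \in L^2(M_\varepsilon),\ \int_{M_\varepsilon} S\, dM_\varepsilon = 0 \right\}$$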


The global existence of weak solutions is established
in Lions et al. (1992b), using the Galerkin
method and assuming the $H^2$-regularity of the GFD-Stokes
problem, which was established in Ziane
(1995). A more general global existence result based
on the method of finite differences in time and
independent of the $H^2$-regularity is established in
Temam and Ziane (2004), which we state here.


Theorem 2 Given $t_1 > 0$, $U_0$ in H, and $F = (F_v, F_T, F_S)$
in $L^2(0, t_1; H)$; $g = (g_v, g_T)$ is given in
$L^2(0, t_1; (L^2(\Gamma_i))^3)$. Then there exists

$$U \in L^\infty(0, t_1; H) \cap L^2(0, t_1; V)$$ [23]

which is a weak solution of [10]-[17] and [18]-[20],
[22]; furthermore, U is weakly continuous from
$[0, t_1]$ into H.


Strong Solutions 


The local existence and uniqueness of strong
solutions of the primitive equations of the ocean
rely on the $H^2$-regularity of the stationary linear
primitive equations associated with [10]-[17]:

$$\frac{1}{\rho_0} \nabla p + 2\Omega \sin\theta\, k \times v - \mu_v \Delta v - \nu_v \frac{\partial^2 v}{\partial z^2} = F_v, \qquad \mathrm{div} \int_{-h}^{0} v\, dz = 0$$ [24]

$$-\mu_T \Delta T - \nu_T \frac{\partial^2 T}{\partial z^2} = F_T, \qquad -\mu_S \Delta S - \nu_S \frac{\partial^2 S}{\partial z^2} = F_S$$ [25]

$$p = p_s + P, \qquad P = P(T, S) = g \int_z^0 \rho\, dz'$$ [26]

with boundary conditions [18]-[20]. Here $F_v$, $F_T$, $F_S$
are independent of time. We have the following
$H^2$-regularity of solutions (Ziane 1995, Hu et al.
2002, Temam and Ziane 2004).


Theorem 3 Assume, that h is in C*(T;),hb > b >O, 
Foal Fs1's F (IL? (M. ))* and 8v = Ty + Aa, 8T = Qala 
c (Hj(T;))*. Let (v, T, S; p) € (H'(M.))* x L?(T;) be 
a weak solution of [24]-[26]. Then 








(H2(M.)) xH! (Ma) 
€ (EP (M:))" 


(v, p) € 
(T, S) 


|27] 





Moreover, the following inequalities hold: 





2 2 
Yee) + E|P lrvr, . 


< Cll ‘ale + elVauliacry | 


Thies $C + lerim) + lVartiacry | 





where C is a positive constant independent of $\varepsilon$.

We now turn our attention to the nonlinear time-
dependent PEs. The local-in-time existence and
uniqueness of strong solutions is obtained in 
Temam and Ziane (2004); see also Hu et al. 
(2003) and Guillén-Gonzalez et al. (2001). The 
proof is more involved than that of the three- 
dimensional Navier-Stokes equations. It consists of 
several steps. In the first step, one proves the global 
existence of strong solutions to the linearized time- 
dependent problem. In the second step, one uses the 
solution of the linearized equation in order to reduce 
the PEs to a nonlinear evolution equation with zero 
initial data and homogeneous boundary conditions. 
Finally, in the last step, one uses nonisotropic 
Sobolev inequalities together with Theorem 3. The 
local existence result is given by the following: 


Theorem 4 Let £ > 0 be given. We assume that T; 
is of class È and that h:T; +R, is of class C?. We 
are given Uo in V, F=(F,, Fr, Fs) in L7(0, tı; H) with 
OF/dt in 1L7(0,t1;L?(M-.)*), and g=(gy,gr) in 
L7(0, t1; H4(T;)°) with Og/dt in L*(0,t1;H4(T;)>). 
Then there exists t, >0,t.=t,(||Uo||), and there 
exists a unique solution U=U(t)=(v(t), T(t), S(t)) 
of the PEs [10|-[17], [18]-[20], and [22] such that 


U € C((0, t]; V) A L7(0,t,, H?(M-)*) [28] 


The PEs of the Atmosphere 


In this section we briefly describe the PEs of the
atmosphere, for which all the mathematical results
obtained for the PEs of the ocean are valid. We start
from conservation equations similar to [1]-[5]: in
fact, [1] and [2] are the same; the equation of energy
conservation (temperature) is slightly different from
[3] because of the compressibility of air; the state
equation is that of a perfect gas instead of [5]; finally,
instead of the concentration of salt in the water, we
consider the amount of water in air, q. Hence, we have


dV 

p- +2PQ x V3 + Vsp + pg = D: [29] 
C N, 30 
Fa A [30] 

dT RT dp O 

p =. u Yr 
, dt pdt 31) 
a e _ 


Here $c_p > 0$ is the specific heat of air at constant
pressure, and R is the specific gas constant for the
air. Proceeding as in the PEs of the ocean, we
decompose $V_3$ into its horizontal and vertical
components, $V_3 = v + w$; then we use the hydrostatic
approximation, replacing the equation of
conservation of vertical momentum by the hydrostatic
equation [9]. We find


Way a eae 
Ot i Oz Vpo 
One E E, [32] 
Ly ? ozz 
Op 
— = — 33 
az = PB [33] 
oT oT 
ae ee a 
oT RTdp 
=e a 4 
"Oz p dt Qr pal 
ôq Og 3q 
Ly ade jA = 
p = RpT [36] 


The right-hand side of [34] represents the solar
heating.


Change of Vertical Coordinate 


Since the density $\rho$ does not vanish, the hydrostatic equation
[33] implies that p is a strictly decreasing function of
z, and we are thus allowed to use p as the vertical
coordinate; hence in spherical geometry the independent
variables are now $\varphi$, $\theta$, p, and t. By an abuse
of notation, we still denote by v, p, T, q, p these 
functions expressed in the $\varphi$, $\theta$, p, t variables. We
denote by $\omega$ the vertical component of the wind in
the new variables, and one can show that the PEs of
the atmosphere become


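$$\frac{\partial v}{\partial t} + \nabla_v v + \omega \frac{\partial v}{\partial p} + 2\Omega \sin\theta\, k \times v + \nabla\Phi - L_v v = F_v$$ [37]

$$\frac{\partial \Phi}{\partial p} + \frac{R}{p} T = 0$$ [38]

$$\mathrm{div}\, v + \frac{\partial \omega}{\partial p} = 0$$ [39]

$$\frac{\partial T}{\partial t} + \nabla_v T + \omega \frac{\partial T}{\partial p} - \frac{RT}{c_p\, p}\, \omega - L_T T = F_T$$ [40]

$$\frac{\partial q}{\partial t} + \nabla_v q + \omega \frac{\partial q}{\partial p} - L_q q = F_q$$ [41]

$$p = R \rho T$$ [42]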
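The change of vertical coordinate rests on p being a strictly decreasing function of z. The following is only a sketch of that fact, not part of the original derivation. It assumes an isothermal atmosphere, so that the hydrostatic equation [33] and the perfect gas law [36] integrate in closed form. The constants and the names `pressure_at_height` and `height_at_pressure` are illustrative choices, not taken from the article.

```python
import numpy as np

R_AIR = 287.0    # specific gas constant of dry air in J/(kg K) (assumed value)
GRAVITY = 9.81   # gravitational acceleration in m/s^2 (assumed value)
T_MEAN = 250.0   # constant temperature in K (isothermal assumption)


def pressure_at_height(z, p_surface=1.0e5):
    """Closed-form solution of dp/dz = -rho*g with p = R*rho*T and constant T."""
    return p_surface * np.exp(-GRAVITY * z / (R_AIR * T_MEAN))


def height_at_pressure(p, p_surface=1.0e5):
    """Inverse map z(p); it exists because p(z) is strictly decreasing."""
    return -(R_AIR * T_MEAN / GRAVITY) * np.log(p / p_surface)


heights = np.linspace(0.0, 30_000.0, 301)
pressures = pressure_at_height(heights)
```

Because the map from z to p is invertible, any field given as a function of z can be re-expressed as a function of p, which is all the coordinate change requires.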


We have denoted by $\Phi = gz$ the geopotential (z is
now a function of $\varphi$, $\theta$, p, t); $L_v$, $L_T$, $L_q$ are the Laplace
operators, with suitable eddy viscosity coefficients,
expressed in the $\varphi$, $\theta$, p variables. Hence, for example,

$$L_v v = \mu_v \Delta v + \nu_v \frac{\partial}{\partial p} \left[ \left( \frac{g p}{R \bar{T}} \right)^2 \frac{\partial v}{\partial p} \right]$$ [43]

with similar expressions for $L_T$ and $L_q$. Note that
$F_T$ corresponds to the heating of the Sun, whereas $F_v$
and $F_q$ (which vanish in reality) are added here for
mathematical generality. The change of variable gives,
for $\partial^2 v/\partial z^2$, a term different from the coefficient of $\nu_v$.
The expression above is a simplified form of this
coefficient; the simplification is legitimate because $\nu_v$ is a very
small coefficient (in particular, T has been replaced by
$\bar{T}$, a (known) average value of the temperature).








Pseudogeometrical Domain 


For physical and mathematical reasons, we do not allow
the pressure to go to zero, and assume that $p > p_0$, with
$p_0 > 0$ "small." Physically, in the very high atmosphere
(p very small), the air is ionized and the equations above
are not valid anymore. The pressure is then restricted to
an interval $p_0 < p < p_1$, where $p_1$ is a value of the
pressure smaller on average than the pressure on Earth,
so that the isobar $p = p_1$ is slightly above the Earth and
the isobar $p = p_0$ is an isobar high in the sky. We study
the motion of the air between these two isobars.

For the whole atmosphere, the boundary of this
domain

$$M = \{ (\varphi, \theta, p) :\ p_0 < p < p_1 \}$$

consists first of an upper part $\Gamma_u$, $p = p_0$; the lower
part $p = p_1$ is divided into two parts: $\Gamma_i$, the part of
$p = p_1$ at the interface with the ocean, and $\Gamma_e$, the
part of $p = p_1$ above the earth.


Boundary Conditions 


Typically, the boundary conditions are as follows: 


On the top of the atmosphere $\Gamma_u$ ($p = p_0$):

$$\frac{\partial v}{\partial p} = 0, \quad \omega = 0, \quad \frac{\partial T}{\partial p} = 0, \quad \frac{\partial q}{\partial p} = 0$$ [44]

Definitions and an Example


A gerbe can be viewed as the next step in a ladder of
geometric and topological objects on a manifold,
which starts from ordinary complex-valued functions
and continues, in the second step, with sections of complex
line bundles.

It is useful to recall the construction of complex
line bundles and their connections. Let M be a
smooth manifold and $\{U_\alpha\}$ an open cover of M
which trivializes a line bundle L over M. Topologically,
up to equivalence, the line bundle is completely
determined by its Chern class, which is a
cohomology class $[c] \in H^2(M, \mathbb{Z})$. On each open
set $U_\alpha$ we may write $2\pi i c = dA_\alpha$, where $A_\alpha$ is a
1-form. On the overlaps $U_{\alpha\beta} = U_\alpha \cap U_\beta$ we can write

$$A_\alpha - A_\beta = f_{\alpha\beta}^{-1}\, df_{\alpha\beta}$$ [1]


at least when $U_{\alpha\beta}$ is contractible, where $f_{\alpha\beta}$ is a
circle-valued complex function on the overlap. The
data $\{c, A_\alpha, f_{\alpha\beta}\}$ define what is known as a
(representative of a) Deligne cohomology class on the
open cover $\{U_\alpha\}$. The 1-forms $A_\alpha$ are the local
potentials of the curvature form $2\pi i c$ and the $f_{\alpha\beta}$'s
are the transition functions of the line bundle L.
Each of these three different data defines separately
the equivalence class of the line bundle but together
they define the line bundle with a connection.

The essential thing here is that there is a bijection 
between the second integral cohomology of M and 
the set of equivalence classes of complex line bundles 
over M. It is natural to ask whether there is a 
geometric realization of integral third (or higher) 
cohomology. In fact, gerbes provide such a realiza- 
tion. Here, we shall restrict to a smooth differential 
geometric approach which by no means is the most 
general possible, but it is sufficient for most applica- 
tions to quantum field theory. However, there are 
examples of gerbes over orbifolds that do not need to 
come from finite group action on a manifold, which 
are not covered by the following definition. 

For the examples in this article, it is sufficient to
adopt the following definition. A gerbe over a
manifold M (without geometry) is simply a
principal bundle $\pi : P \to M$ with fiber equal to
PU(H), the projective unitary group of a Hilbert
space H. The Hilbert space may be either finite or
infinite dimensional.

The quantum field theory applications discussed
in this article are related to the chiral anomaly for
fermions in external fields. The link comes from
the fact that the chiral symmetry breaking leads in
the generic case to projective representations of the
symmetry groups. For this reason, when modding
out by the gauge or diffeomorphism symmetries,
one is led to study bundles of projective Hilbert
spaces. The anomaly is reflected as a nontrivial
characteristic class of the projective bundle,
known in mathematics literature as the Dixmier-Douady
class.

In a suitable open cover, the bundle P has a family
of local trivializations with transition functions
$g_{\alpha\beta} : U_{\alpha\beta} \to PU(H)$, with the usual cocycle property

$$g_{\alpha\beta}\, g_{\beta\gamma}\, g_{\gamma\alpha} = 1$$ [2]

on triple overlaps. Assuming that the overlaps are
contractible, we can choose lifts $\hat{g}_{\alpha\beta} : U_{\alpha\beta} \to U(H)$
to the unitary group of the Hilbert space. However,

$$\hat{g}_{\alpha\beta}\, \hat{g}_{\beta\gamma}\, \hat{g}_{\gamma\alpha} = f_{\alpha\beta\gamma} \cdot 1$$ [3]

where the f's are circle-valued functions on triple
overlaps. They satisfy automatically the cocycle
property

$$f_{\alpha\beta\gamma}\, f_{\alpha\beta\delta}^{-1}\, f_{\alpha\gamma\delta}\, f_{\beta\gamma\delta}^{-1} = 1$$ [4]

on quadruple overlaps. There is an important difference
between the finite- and infinite-dimensional
cases. In the finite-dimensional case, the circle
bundle $U(H) \to U(H)/S^1 = PU(H)$ reduces to a
bundle with fiber $\mathbb{Z}/N\mathbb{Z} = \mathbb{Z}_N$, where $N = \dim H$.
This follows from $U(N)/S^1 = SU(N)/\mathbb{Z}_N$ and the fact
that SU(N) is a subgroup of U(N). For this reason
one can choose the lifts $\hat{g}_{\alpha\beta}$ such that the functions
$f_{\alpha\beta\gamma}$ take values in the finite subgroup $\mathbb{Z}_N \subset S^1$.
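The reduction to $\mathbb{Z}_N$ can be checked numerically at a single point of a quadruple overlap. The sketch below is an illustration under stated assumptions, not the article's construction. It reads [3] as $\hat{g}_{\alpha\beta}\hat{g}_{\beta\gamma}\hat{g}_{\gamma\alpha} = f_{\alpha\beta\gamma} \cdot 1$, uses the convention $\hat{g}_{\beta\alpha} = \hat{g}_{\alpha\beta}^{-1}$, and models the projective transition functions by hypothetical local frames `frames[a]`. The lifts are normalized to determinant 1, which is one of the N possible choices.

```python
import itertools
import numpy as np

N = 3                      # dimension of H (illustrative choice)
rng = np.random.default_rng(1)


def random_unitary(n):
    """Random unitary matrix from the QR decomposition of a complex Gaussian."""
    q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return q


def su_lift(u):
    """Rescale a unitary by a phase so that its determinant equals 1."""
    return u / np.linalg.det(u) ** (1.0 / u.shape[0])


# Values of four hypothetical local frames at one point; the projective
# transition function g_ab is the class of frames[a] @ frames[b]^{-1}.
frames = [random_unitary(N) for _ in range(4)]
lifts = {}
for a, b in itertools.combinations(range(4), 2):
    phase = np.exp(2j * np.pi * rng.random())   # arbitrary representative
    lifts[a, b] = su_lift(phase * frames[a] @ frames[b].conj().T)
    lifts[b, a] = lifts[a, b].conj().T          # convention: g_ba = g_ab^{-1}


def f(a, b, c):
    """Scalar f_abc defined by lifts[a,b] lifts[b,c] lifts[c,a] = f_abc * 1."""
    return np.trace(lifts[a, b] @ lifts[b, c] @ lifts[c, a]) / N
```

With determinant-one lifts each triple product lies in SU(N) and is a multiple of the identity, so `f(a, b, c) ** N` equals 1 up to rounding, and the alternating product in [4] equals 1 as well.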

The functions $f_{\alpha\beta\gamma}$ define an element $a = \{a_{\alpha\beta\gamma\delta}\}$ in
the Čech cohomology $H^3(\mathcal{U}, \mathbb{Z})$ by a choice of
logarithms,

$$2\pi i\, a_{\alpha\beta\gamma\delta} = \log f_{\alpha\beta\gamma} - \log f_{\alpha\beta\delta} + \log f_{\alpha\gamma\delta} - \log f_{\beta\gamma\delta}$$ [5]

In the finite-dimensional case, the Čech cocycle is
necessarily torsion, $Na = 0$, but not so if H is infinite
dimensional. In the finite-dimensional case (by passing
to a good cover and using the Čech-de Rham
equivalence over real or complex numbers), the class in
third de Rham cohomology constructed from the
transition functions is necessarily zero. Thus, in
general one has to work with Čech cohomology to
preserve torsion information. One can prove:


Theorem The construction above is a one-to-one
map between the set of equivalence classes of PU(H)
bundles over M and elements of $H^3(M, \mathbb{Z})$.


The characteristic class in $H^3(M, \mathbb{Z})$ of a PU(H)
bundle is called the Dixmier-Douady class.


First example 


Let M be an oriented Riemannian manifold and FM
its bundle of oriented orthonormal frames. The
structure group of FM is the rotation group SO(n)
with $n = \dim M$. The spin bundle (when it exists) is a
double covering Spin(M) of FM, with structure
group Spin(n), a double cover of SO(n). Even when
the spin bundle does not exist there is always the
bundle Cl(M) of Clifford algebras over M. The fiber
at $x \in M$ is the Clifford algebra defined by the
metric $g_x$, that is, it is the complex $2^n$-dimensional
algebra generated by the tangent vectors $v \in T_x(M)$
with the defining relations

$$\gamma(u)\gamma(v) + \gamma(v)\gamma(u) = 2 g_x(u, v)$$

The Clifford algebra has a faithful representation in
$N = 2^{[n/2]}$ dimensions ([x] is the integral part of x)
such that

$$\gamma(a \cdot u) = S(a)\, \gamma(u)\, S(a)^{-1}$$

where S is a unitary representation of Spin(n) in
$\mathbb{C}^N$. Since Spin(n) is a double cover of SO(n), the
representation S may be viewed as a projective
representation of SO(n). Thus again, if the overlaps
$U_{\alpha\beta}$ are contractible, we may choose a lift of the
frame bundle transition functions $g_{\alpha\beta}$ to unitaries
$\hat{g}_{\alpha\beta}$ in $H = \mathbb{C}^N$. In this case, the functions $f_{\alpha\beta\gamma}$ reduce
to $\mathbb{Z}_2$-valued functions, and the obstruction to the
lifting problem, which is the same as the obstruction
to the existence of spin structure, is an element of
$H^2(M, \mathbb{Z}_2)$, known as the second Stiefel-Whitney
class $w_2$. The image of $w_2$ with respect to the
Bockstein map (in this case, given by the formula
[5]) gives a 2-torsion element in $H^3(M, \mathbb{Z})$, the
Dixmier-Douady class.
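For n = 3 the objects in this example can be written down concretely, with $N = 2^{[3/2]} = 2$. The sketch below is an illustrative assumption rather than part of the article: it takes the flat metric, represents $\gamma$ by the Pauli matrices, and restricts to rotations about the third axis. The names `gamma`, `rotation_z` and `spin_lift_z` are hypothetical helpers.

```python
import numpy as np

# Pauli matrices, used here as gamma(e_1), gamma(e_2), gamma(e_3)
PAULI = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def gamma(u):
    """Clifford generator gamma(u) = sum_k u_k sigma_k for u in R^3."""
    return sum(c * s for c, s in zip(u, PAULI))


def rotation_z(theta):
    """Element of SO(3): rotation by theta about the third axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def spin_lift_z(theta):
    """One of the two lifts S = exp(-i theta sigma_3 / 2) of rotation_z(theta)."""
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * PAULI[2]


rng = np.random.default_rng(0)
u, v = rng.normal(size=3), rng.normal(size=3)
theta = 0.9
S = spin_lift_z(theta)
anticommutator = gamma(u) @ gamma(v) + gamma(v) @ gamma(u)
```

Here `anticommutator` equals `2 * np.dot(u, v)` times the identity, and `gamma(rotation_z(theta) @ u)` equals `S @ gamma(u) @ S.conj().T`. Replacing `theta` by `theta + 2 * np.pi` gives the same rotation but changes the sign of `S`; this sign ambiguity is what produces the $\mathbb{Z}_2$-valued functions.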

Another way to think of a gerbe is the following
(we shall see that this arises in a natural way in
quantum field theory). There is a canonical complex
line bundle L over PU(H), the associated line bundle
to the circle bundle $S^1 \to U(H) \to PU(H)$. Pulling
back L by the local transition functions
$g_{\alpha\beta} : U_{\alpha\beta} \to PU(H)$, we obtain a family of line bundles
$L_{\alpha\beta}$ over the open sets $U_{\alpha\beta}$. By the cocycle property
[2] we have natural isomorphisms

$$L_{\alpha\beta} \otimes L_{\beta\gamma} = L_{\alpha\gamma}$$ [6]

We can take this as a definition of a gerbe over M:
a collection of line bundles over intersections of
open sets in an open cover of M, satisfying the
cocycle condition [6]. By [6] we have a trivialization

$$L_{\alpha\beta} \otimes L_{\beta\gamma} \otimes L_{\gamma\alpha} = f_{\alpha\beta\gamma} \cdot 1$$ [7]


where the f’s are circle-valued functions on the 
triple overlaps. By the theorem above, we conclude 


that indeed the data in [6] define (an equivalence 
class of) a principal PU(H) bundle. 

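If $L_{\alpha\beta}$ and $L'_{\alpha\beta}$ are two systems of local line
bundles over the same cover, then the gerbes are
equivalent if there is a system of line bundles $L_\alpha$
over open sets $U_\alpha$ such that

$$L'_{\alpha\beta} = L_{\alpha\beta} \otimes L_\alpha^* \otimes L_\beta$$ [8]

on each $U_{\alpha\beta}$.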

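A gerbe may come equipped with geometry,
encoded in a Deligne cohomology class with respect
to a given open covering of M. The Deligne class is
given by functions $f_{\alpha\beta\gamma}$, 1-forms $A_{\alpha\beta}$, 2-forms $F_\alpha$,
and a global 3-form (the Dixmier-Douady class of
the gerbe) $\Omega$, subject to the conditions

$$dF_\alpha = 2\pi i\, \Omega$$
$$F_\alpha - F_\beta = dA_{\alpha\beta}$$ [9]
$$A_{\alpha\beta} - A_{\alpha\gamma} + A_{\beta\gamma} = f_{\alpha\beta\gamma}^{-1}\, df_{\alpha\beta\gamma}$$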


Gerbes from Canonical Quantization 


Let $D_x$ be a family of self-adjoint Fredholm operators
in a complex Hilbert space H parametrized by $x \in M$.
This situation arises in quantum field theory, for
example, when M is some space of external fields,
coupled to a Dirac operator D on a compact manifold.
The space M might consist of gauge potentials
(modulo gauge transformations) or M might be the
moduli space of Riemann metrics. In these examples,
the essential spectrum of $D_x$ is both positive and
negative and the family $D_x$ defines an element of
$K^1(M)$. In fact, one of the definitions of $K^1(M)$ is that
its elements are homotopy classes of maps from M to
the space $\mathcal{F}_*$ of self-adjoint Fredholm operators with
both positive and negative essential spectrum. In
physics applications, one deals most often with
unbounded Hamiltonians, and the operator norm
topology must be replaced by something else; popular
choices are the Riesz topology defined by the map
$F \mapsto F/(|F| + 1)$ to bounded operators or the gap
topology defined by the graph metric.

The space $\mathcal{F}_*$ is homotopy equivalent to the
group $G = U_1(H)$ of unitary operators g in H such
that $g - 1$ is a trace-class operator. This space is a
classifying space for principal $U_{res}$ bundles, where
$U_{res}$ is the group of unitary operators g in a polarized
complex Hilbert space $H = H_+ \oplus H_-$ such that the
off-diagonal blocks of g are Hilbert-Schmidt operators.
This is related to Bott periodicity. There is a
natural principal bundle P over $G = U_1(H)$ with fiber
equal to the group $\Omega G$ of based loops in G. The total
space P consists of smooth paths f(t) in G starting
from the neutral element such that $f^{-1} df$ is smooth
and periodic. The projection $P \to G$ is the evaluation
at the end point f(1). The fiber is clearly $\Omega G$. By Bott
periodicity, the homotopy groups of $\Omega G$ are shifted
from those of G by one dimension, that is,

$$\pi_n \Omega G = \pi_{n+1} G$$

The latter are zero in even dimensions and equal to $\mathbb{Z}$
in odd dimensions. On the other hand, it is known that
the even homotopy groups of $U_{res}(H)$ are equal to $\mathbb{Z}$
and the odd ones vanish. In fact, with a little more
effort, one can show that the embedding of $\Omega G$
to $U_{res}(\mathcal{H})$ is a homotopy equivalence, when $\mathcal{H} =
L^2(S^1, H)$, the polarization being the splitting to
non-negative and negative Fourier modes and the action of
$\Omega G$ is the pointwise multiplication on H-valued
functions on the circle $S^1$.

Since P is contractible, it is indeed the classifying
bundle for $U_{res}$ bundles. Thus, we conclude that
"$K^1(M)$ = the set of homotopy classes of maps
$M \to G$ = the set of equivalence classes of $U_{res}$
bundles over M." The relevance of this fact in
quantum field theory follows from the properties of
representations of the algebra of canonical
anticommutation relations (CAR). For any complex
Hilbert space H, this algebra is the algebra generated
by elements a(v) and a*(v), with $v \in H$, subject
to the relations

$$a^*(u)a(v) + a(v)a^*(u) = 2\langle v, u \rangle$$

where the Hilbert space inner product on the right-hand
side is antilinear in the first argument, and all
other anticommutators vanish. In addition, a*(u) is
linear and a(v) antilinear in its argument.

An irreducible Dirac representation of the CAR
algebra is given by a polarization $H = H_+ \oplus H_-$.
The representation is characterized by the existence
of a vacuum vector $\psi$ in the fermionic Fock space $\mathcal{F}$
such that

$$a^*(u)\psi = 0 = a(v)\psi \quad \text{for } u \in H_-,\ v \in H_+$$ [10]
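In finite dimensions both the CAR relations and the vacuum condition [10] can be checked explicitly. The sketch below is an illustration only. It assumes $H = \mathbb{C}^2$ with $H_+$ spanned by the first basis vector and $H_-$ by the second, and it realizes the operators by a Jordan-Wigner construction, which is an illustrative choice not made in the article. The factor $\sqrt{2}$ matches the normalization $2\langle v, u \rangle$ used above.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2, dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
LOWER = np.array([[0, 1], [0, 0]], dtype=complex)   # maps |1> to |0>


def mode_annihilator(k, n_modes):
    """Jordan-Wigner annihilation operator for mode k (anticommutator 1)."""
    factors = [Z] * k + [LOWER] + [I2] * (n_modes - k - 1)
    return reduce(np.kron, factors)


def a(v):
    """a(v), antilinear in v, normalized so that {a*(u), a(v)} = 2<v, u>."""
    n = len(v)
    return np.sqrt(2) * sum(np.conj(v[k]) * mode_annihilator(k, n) for k in range(n))


def a_star(u):
    """a*(u), linear in u; the adjoint of a(u)."""
    n = len(u)
    return np.sqrt(2) * sum(u[k] * mode_annihilator(k, n).conj().T for k in range(n))


rng = np.random.default_rng(0)
u = rng.normal(size=2) + 1j * rng.normal(size=2)
v = rng.normal(size=2) + 1j * rng.normal(size=2)

e_plus = np.array([1.0, 0.0], dtype=complex)    # spans H_+
e_minus = np.array([0.0, 1.0], dtype=complex)   # spans H_-
bare = np.zeros(4, dtype=complex)
bare[0] = 1.0                                   # state with no modes occupied
dirac_vacuum = a_star(e_minus) @ bare           # H_- filled, H_+ empty
```

The vector `dirac_vacuum` is annihilated by `a(e_plus)` and by `a_star(e_minus)`, which is the content of [10] in this toy setting, while `np.vdot(v, u)` reproduces the inner product that is antilinear in the first argument.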


A theorem of D Shale and W F Stinespring says that
two Dirac representations defined by a pair of
polarizations $H_+$, $H'_+$ are equivalent if and only if
there is $g \in U_{res}(H_+ \oplus H_-)$ such that $H'_+ = g \cdot H_+$. In
addition, in order that a unitary transformation g is
implementable in the Fock space, that is, there is a
unitary operator $\hat{g}$ in $\mathcal{F}$ such that

$$\hat{g}\, a^*(v)\, \hat{g}^{-1} = a^*(gv), \quad \forall v \in H$$ [11]

and similarly for the a(v)'s, one must have $g \in U_{res}$
with respect to the polarization defining the vacuum
vector. This condition is both necessary and sufficient.

The polarization of the one-particle Hilbert space
comes normally from a spectral projection onto the
positive-energy subspace of a Hamilton operator. In
the background field problems one studies families
of Hamilton operators $D_x$ and then one would like
to construct a family of fermionic Fock spaces
parametrized by $x \in M$. If none of the Hamilton
operators has zero modes, this is unproblematic.
However, the presence of zero modes makes it
impossible to define the positive-energy subspace
$H_+(x)$ as a continuous function of x. One way out of
this is to weaken the condition for the polarization:
each $x \in M$ defines a Grassmann manifold $\mathrm{Gr}_{res}(x)$
consisting of all subspaces $W \subset H$ such that the
projections onto W and $H_+(x)$ differ by Hilbert-Schmidt
operators. The definition of $\mathrm{Gr}_{res}(x)$ is
stable with respect to finite-rank perturbations of
$D_x/|D_x|$. For example, when $D_x$ is a Dirac operator
on a compact manifold then $(D_x - \lambda)/|D_x - \lambda|$
defines the same Grassmannian for all real numbers
$\lambda$ because in each finite interval there are only a
finite number of eigenvalues (with multiplicities) of
$D_x$. From this follows that the Grassmannians form
a locally trivial fiber bundle Gr over families of
Dirac operators.

If the bundle Gr has a global section $x \mapsto W_x$, then
we can define a bundle of Fock space representations
for the CAR algebra over the parameter space M.
However, there are important situations when no
global sections exist. It is easier to explain the
potential obstruction in terms of a principal $U_{res}$
bundle P such that Gr is an associated bundle to P.

The fiber of P at $x \in M$ is the set of all unitaries g
in H such that $g \cdot H_+ \in \mathrm{Gr}_x$, where $H = H_+ \oplus H_-$ is a
fixed reference polarization. Then we have

$$\mathrm{Gr} = P \times_{U_{res}} \mathrm{Gr}_{res}$$

where the right action of $U_{res} = U_{res}(H_+ \oplus H_-)$ in
the fibers of P is the right multiplication on unitary
operators and the left action on $\mathrm{Gr}_{res}$ comes from
the observation that $\mathrm{Gr}_{res} = U_{res}/(U_+ \times U_-)$, where
$U_\pm$ are the diagonal block matrices in $U_{res}$. By a
result of N Kuiper, the subgroup $U_+ \times U_-$ is
contractible and so Gr has a global section if and
only if P is trivial.

Thus, when P is trivial we can define the family of
Dirac representations of the CAR algebra parametrized
by M such that in each of the Fock spaces we
have a Dirac vacuum which, in a precise sense, is close
to the vacuum defined by the energy polarization.
However, the triviality of P is not a necessary
condition. Actually, what is needed is that P has a
prolongation to a bundle $\hat{P}$ with fiber $\hat{U}_{res}$. The group
$\hat{U}_{res}$ is a central extension of $U_{res}$ by the group $S^1$.

The Lie algebra $\hat{\mathfrak{u}}_{res}$ is as a vector space the direct
sum $\mathfrak{u}_{res} \oplus i\mathbb{R}$, with commutators

$$[X + \lambda, Y + \mu] = [X, Y] + c(X, Y)$$ [12]

where c is the Lie algebra cocycle

$$c(X, Y) = \tfrac{1}{4}\, \mathrm{tr}\, \epsilon [\epsilon, X][\epsilon, Y]$$ [13]

Here $\epsilon$ is the grading operator with eigenvalues $\pm 1$
on $H_\pm$. The trace exists since the off-diagonal blocks
of X, Y are Hilbert-Schmidt.
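The algebraic properties of the cocycle can be tested on finite matrices. The sketch below is an illustration under assumptions: it reads [13] as $c(X, Y) = \tfrac{1}{4}\,\mathrm{tr}\,\epsilon[\epsilon, X][\epsilon, Y]$ and takes $H_+ = \mathbb{C}^3$, $H_- = \mathbb{C}^2$. In finite dimensions the cocycle is a coboundary, $c(X, Y) = -\tfrac{1}{2}\,\mathrm{tr}\,\epsilon[X, Y]$, so the example only checks identities; the nontrivial central extension needs infinite-dimensional $H_\pm$, where $\mathrm{tr}\,\epsilon X$ need not exist.

```python
import numpy as np


def commutator(a, b):
    return a @ b - b @ a


def cocycle(x, y, eps):
    """c(X, Y) = (1/4) tr eps [eps, X][eps, Y]; only off-diagonal blocks contribute."""
    return 0.25 * np.trace(eps @ commutator(eps, x) @ commutator(eps, y))


def random_antihermitian(n, rng):
    """Random element of the Lie algebra u(n)."""
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return a - a.conj().T


rng = np.random.default_rng(0)
eps = np.diag([1.0, 1.0, 1.0, -1.0, -1.0]).astype(complex)   # grading on H_+ + H_-
X, Y, W = (random_antihermitian(5, rng) for _ in range(3))

# Left-hand side of the cocycle identity; it vanishes by the Jacobi identity.
jacobi_sum = (cocycle(commutator(X, Y), W, eps)
              + cocycle(commutator(Y, W), X, eps)
              + cocycle(commutator(W, X), Y, eps))
```

For anti-Hermitian X and Y the value `cocycle(X, Y, eps)` is purely imaginary, consistent with the central term taking values in $i\mathbb{R}$.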

The group $\hat{U}_{res}$ is a circle bundle over $U_{res}$. The
Chern class of the associated complex line bundle is
the generator of $H^2(U_{res}, \mathbb{Z})$ and is given explicitly at
the identity element as the antisymmetric bilinear
form $c/2\pi i$ and at other points on the group
manifold through left-translation of $c/2\pi i$. If P is
trivial, then it has an obvious prolongation to the
trivial bundle $M \times \hat{U}_{res}$. In any case, if the prolongation
exists we can define the bundle of Fock spaces
carrying CAR representations as the associated
bundle

$$\mathcal{F} = \hat{P} \times_{\hat{U}_{res}} \mathcal{F}_0$$

where $\mathcal{F}_0$ is the fixed Fock space defined by the
same polarization $H = H_+ \oplus H_-$ used to define $U_{res}$.
By the Shale-Stinespring theorem, any $g \in U_{res}$ has
an implementation $\hat{g}$ in $\mathcal{F}_0$, but $\hat{g}$ is only defined up
to phase, thus the central $S^1$ extension.

The action of the CAR algebra in the fibers is
given as follows. For $x \in M$ choose any $\hat{g} \in \hat{P}_x$.
Define

$$a^*(v) \cdot (\hat{g}, \psi) = (\hat{g},\ a^*(g^{-1} v)\psi)$$

where $\psi \in \mathcal{F}_0$ and $v \in H$; similarly for the operators
a(v). It is easy to check that this definition passes
to the equivalence classes in $\mathcal{F}$. Note that the
representations in different fibers are in general
inequivalent because the transformation g is not
implementable in the Fock space $\mathcal{F}_0$.

The potential obstruction to the existence of the
prolongation of P is again a 3-cohomology class on
the base. Choose a good cover of M. On the
intersections $U_{\alpha\beta}$ of the open cover the transition
functions $g_{\alpha\beta}$ of P can be prolonged to functions
$\hat{g}_{\alpha\beta} : U_{\alpha\beta} \to \hat{U}_{res}$. We have

$$\hat{g}_{\alpha\beta}\, \hat{g}_{\beta\gamma}\, \hat{g}_{\gamma\alpha} = f_{\alpha\beta\gamma} \cdot 1$$ [14]

for functions $f_{\alpha\beta\gamma} : U_{\alpha\beta\gamma} \to S^1$, which by construction
satisfy the cocycle property [4]. Since the cocycle is
defined on a good cover, it defines an integral Čech
cohomology class $\omega \in H^3(M, \mathbb{Z})$.

Let us return to the universal $U_{res}$ bundle P over
$G = U_1(H)$. In this case the prolongation obstruction
can be computed relatively easily. It turns out that
the 3-cohomology class is represented by the de
Rham class which is the generator of $H^3(G, \mathbb{Z})$.
Explicitly,

$$\omega = \frac{1}{24\pi^2}\, \mathrm{tr}\, (g^{-1} dg)^3$$ [15]

Any principal $U_{res}$ bundle over M comes from a
pullback of P with respect to a map $f : M \to G$, so
the Dixmier-Douady class in the general case is the
pullback $f^*\omega$.

The line bundle construction of the gerbe over
the parameter space M for Dirac operators is given
by the observation that the spectral subspaces
$E_{\lambda\lambda'}(x)$ of $D_x$, corresponding to the open interval
$]\lambda, \lambda'[$ in the real line, form finite-rank vector
bundles over open sets $U_{\lambda\lambda'} = U_\lambda \cap U_{\lambda'}$. Here $U_\lambda$ is
the set of points $x \in M$ such that $\lambda$ does not belong
to the spectrum of $D_x$. Then we can define, as top
exterior power,

$$L_{\lambda\lambda'} = \Lambda^{top}(E_{\lambda\lambda'})$$

as the complex line bundle over $U_{\lambda\lambda'}$. It follows
immediately from the definition that the cocycle
property [6] is satisfied.


Example 1 (Fermions on an interval). Let K be a
compact group and $\rho$ its unitary representation in a
finite-dimensional vector space V. Let H be the
Hilbert space of square-integrable V-valued functions
on the interval $[0, 2\pi]$ of the real axis. For
each $g \in K$ let $\mathrm{Dom}_g \subset H$ be the dense subspace of
smooth functions $\psi$ with the boundary condition
$\psi(2\pi) = \rho(g)\psi(0)$. Denote by $D_g$ the operator
$-i\, d/dx$ on this domain. The spectrum of $D_g$ is a
function of the eigenvalues $\lambda_k$ of $\rho(g)$, consisting of
real numbers $n + \log(\lambda_k)/2\pi i$ with $n \in \mathbb{Z}$. For this
reason the splitting of the one-particle space H to
positive and negative modes of the operator $D_g$ is
in general not continuous as a function of the
parameter g. This leads to the problems described
above. However, the principal $U_{res}$ bundle can be
explicitly constructed. It is the pullback of the
universal bundle P with respect to the map $f : K \to G$
defined by the embedding $\rho(K) \subset G$ as $N \times N$ block
matrices, $N = \dim V$. Thus, the Dixmier-Douady
class in this example is

$$\omega = \frac{1}{24\pi^2}\, \mathrm{tr}\, \big( \rho(g)^{-1} d\rho(g) \big)^3$$ [16]
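The spectrum in Example 1 can be made explicit: if $\rho(g)w = \lambda w$, then $\psi(x) = e^{i\mu x} w$ satisfies $-i\,\psi' = \mu\psi$ and the boundary condition exactly when $e^{2\pi i \mu} = \lambda$. The sketch below is an illustration under assumptions. It takes K = SO(2) acting on $\mathbb{C}^2$ and uses the principal branch of the logarithm; `twisted_spectrum` is a hypothetical helper name.

```python
import numpy as np


def twisted_spectrum(rho_g, n_values):
    """Eigenvalues n + log(lambda_k)/(2 pi i) of -i d/dx with psi(2 pi) = rho(g) psi(0).

    Uses the principal branch, so log(lambda)/(2 pi i) = arg(lambda)/(2 pi).
    """
    eigenvalues, eigenvectors = np.linalg.eig(rho_g)
    shifts = np.angle(eigenvalues) / (2.0 * np.pi)
    spectrum = np.array([[n + s for s in shifts] for n in n_values])
    return spectrum, eigenvalues, eigenvectors


angle = 0.7   # rotation angle parametrizing g (illustrative value)
rho_g = np.array([[np.cos(angle), -np.sin(angle)],
                  [np.sin(angle), np.cos(angle)]])
spectrum, eigenvalues, eigenvectors = twisted_spectrum(rho_g, range(-2, 3))
```

As `angle` varies, the shifts $\arg(\lambda_k)/2\pi$ move eigenvalues of $D_g$ across zero. This is the mechanism behind the discontinuity of the splitting into positive and negative modes.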


Example 2 (Fermions on a circle). Let $H = L^2(S^1, V)$
and $D_A = -i(d/dx + A)$ where A is a smooth vector
potential on the circle taking values in the Lie
algebra $\mathfrak{k}$ of K. In this case, the domain is fixed,
consisting of smooth V-valued functions on the
circle. The $\mathfrak{k}$-valued function A is represented as a
multiplication operator through the representation $\rho$
of K. The parameter space $\mathcal{A}$ of smooth vector
potentials is flat; thus, there cannot be any obstruction
to the prolongation problem. However, in
quantum field theory, one wants to pass to the
moduli space $\mathcal{A}/\mathcal{G}$ of gauge potentials. Here $\mathcal{G}$ is the
group of smooth based gauge transformations, that
is, $\mathcal{G} = \Omega K$. Now the moduli space is the group of
holonomies around the circle, $\mathcal{A}/\mathcal{G} = K$. Thus, we are
in a similar situation as in Example 1. In fact, these
examples are really two different realizations of the
same family of self-adjoint Fredholm operators.
The operator $D_A$ with g = holonomy(A) has exactly
the same spectrum as $D_g$ in Example 1. For this
reason, the Dixmier-Douady class on K is the same as
before.


The case of Dirac operators on the circle is simple
because all the energy polarizations for different
vector potentials are elements in a single Hilbert-Schmidt
Grassmannian $\mathrm{Gr}(H_+ \oplus H_-)$, where we can
take as the reference polarization the splitting to
positive and negative Fourier modes. Using this
polarization, the bundle of fermionic Fock spaces
over $\mathcal{A}$ can be trivialized as $\mathcal{F} = \mathcal{A} \times \mathcal{F}_0$. However,
the action of the gauge group $\mathcal{G}$ on $\mathcal{F}$ acquires a
central extension $\hat{\mathcal{G}} \subset \widehat{LK}$, where LK is the free loop
group of K. The Lie algebra cocycle determining the
central extension is

$$c(X, Y) = \frac{1}{2\pi i} \int_{S^1} \mathrm{tr}_\rho\, X\, dY$$ [17]

where $\mathrm{tr}_\rho$ is the trace in the representation $\rho$ of K.
Because of the central extension, the quotient $\mathcal{F}/\hat{\mathcal{G}}$
defines only a projective vector bundle over $\mathcal{A}/\mathcal{G}$,
the Dixmier-Douady class being given by [16].

In Example 1 (and Example 2) above, the
complex line bundles can be constructed quite
explicitly. Let us study the case K = SU(n). Define
$U_\lambda \subset K$ as the set of matrices g such that $\lambda$ is not an
eigenvalue of g. Select n different points $\lambda_i$ on the
unit circle such that their product is not equal to 1.
We assume that the points are ordered counterclockwise
on the circle. Then the sets $U_i = U_{\lambda_i}$ form
an open cover of SU(n). On each $U_i$ we can choose a
continuous branch of the logarithmic function
$\log : U_i \to \mathfrak{su}(n)$. The spectrum of the Dirac operator
$D_g$ with the holonomy g consists of the infinite set of
numbers $\mathbb{Z} + \mathrm{Spec}(-i \log(g))$. In particular, the
numbers $\mathbb{Z} - i \log \lambda_i$ do not belong to the spectrum
of $D_g$. Choosing $\mu_i = -i \log \lambda_i$ as an increasing
sequence in the interval $[0, 2\pi]$, we can as well
define $U_i = \{x \in M \mid \mu_i \notin \mathrm{Spec}(D_x)\}$. In any case, the
top exterior power of the spectral subspace $E_{\mu_i \mu_j}(x)$
is given by zero Fourier modes consisting of the
spectral subspace of the holonomy g in the segment
$[\lambda_i, \lambda_j]$ of the unit circle.


Index Theory and Gerbes 


Gauge and gravitational anomalies in quantum field
theory can be computed by Atiyah-Singer index
theory. The basic setup is as follows. On a compact
even-dimensional spin manifold S (without boundary)
the Dirac operators coupled to vector potentials
and metrics form a family of Fredholm operators.
The parameter space is the set $\mathcal{A}$ of smooth vector
potentials (gauge connections) in a vector bundle
over S and the set $\mathcal{M}$ of smooth Riemann metrics on S.
The family of Dirac operators is covariant with
respect to gauge transformations and diffeomorphisms
of S; thus, we may view the Dirac operators
parametrized by the moduli space $\mathcal{A}/\mathcal{G}$ of gauge
connections and the moduli space $\mathcal{M}/\mathrm{Diff}_0(S)$ of
Riemann metrics. Again, in order that the moduli
spaces are smooth manifolds, one has to restrict to
the based gauge transformations, that is, those
which are equal to the neutral element in a fixed
base point in each connected component of S.
Similarly, the Jacobian of a diffeomorphism is
required to be equal to the identity matrix at the
base points. Passing to the quotient modulo gauge
transformations and diffeomorphisms, we obtain a
vector bundle over the space

$$S \times \mathcal{A}/\mathcal{G} \times \mathcal{M}/\mathrm{Diff}_0(S)$$ [18]

Actually, we could as well consider a generalization
in which the base space is a fibering over the moduli
space with model fiber equal to S, but for simplicity
we stick to [18].

According to the Atiyah-Singer index formula for
families, the K-theory class of the family of Dirac
operators acting on the smooth sections of the tensor
product of the spin bundle and the vector bundle V
over [18] is given through the differential forms

$$\hat{A}(R) \wedge \mathrm{ch}(V)$$

where $\hat{A}(R)$ is the A-roof genus, a function of the
Riemann curvature tensor R associated with the
Riemann metric,

$$\hat{A}(R) = \det{}^{1/2} \left( \frac{R/4\pi i}{\sinh(R/4\pi i)} \right)$$

and ch(V) is the Chern character

$$\mathrm{ch}(V) = \mathrm{tr}\, e^{F/2\pi i}$$

where F is the curvature tensor of a gauge connection.
Here both R and F are forms on the infinite-dimensional
base space [18]. After integrating over
the fiber S,

$$\mathrm{Ind} = \int_S \hat{A}(R) \wedge \mathrm{ch}(V)$$ [19]

we obtain a family of differential forms $\phi_{2k}$, one in
each even dimension, on the moduli space.

The (cohomology classes of) forms $\phi_{2k}$ contain
important topological information for the quantized
Yang-Mills theory and for quantum gravity. The
form $\phi_2$ describes potential chiral anomalies. The
chiral anomaly is a manifestation of gauge or
reparametrization symmetry breaking. If the class
$[\phi_2]$ is nonzero, the quantum effective action cannot
be viewed as a function on the moduli space.
Instead, it becomes a section of a complex line
bundle DET over the moduli space.

Since the Dirac operators are Fredholm (on
compact manifolds), at a given point in the moduli
space we can define the complex line

$$\mathrm{DET}_x = \Lambda^{top}(\ker D_x^+) \otimes \Lambda^{top}(\mathrm{coker}\, D_x^+)$$ [20]

for the chiral Dirac operators $D_x^+$. In the even-dimensional
case, the spin bundle is $\mathbb{Z}_2$ graded such
that the grading operator $\Gamma$ anticommutes with $D_x$.
Then $D_x^+ = P_- D_x P_+$, where $P_\pm = (1/2)(1 \pm \Gamma)$ are
the chiral projections. $\Lambda^{top}$ means the operation on
finite-dimensional vector spaces W taking the
exterior power of W to dim W.

When the dimensions of the kernel and cokernel
of $D_x^+$ are constant, eqn [20] defines a smooth
complex line bundle over the moduli space. In the
case of varying dimensions, a little extra work is
needed to define the smooth structure.

The form $\phi_2$ is the Chern class of DET. So if DET
is nontrivial, gauge covariant quantization of the
family of Dirac operators is not possible.

One can also give a geometric and topological
meaning to the chiral symmetry breaking in Hamiltonian
quantization, and this leads us back to gerbes
on the moduli space. Here we have to use an odd
version of the index formula [19]. Assuming that the
physical spacetime is even dimensional, at a fixed
time the space is an odd-dimensional manifold S.
We still assume that S is compact. In this case, the
integration in [19] is over odd-dimensional fibers
and, therefore, the formula produces a sequence of
odd forms on the moduli space.

The first of the odd forms, $\phi_1$, gives the spectral
flow of a one-parameter family of operators $D_{x(t)}$.
Its integral along the path x(t), after a correction by
the difference of the eta invariant at the end points
of the path, in the moduli space, gives twice the
difference of positive eigenvalues crossing over to
the negative side of the spectrum minus the flow of
eigenvalues in the opposite direction. The second
term $\phi_3$ is the Dixmier-Douady class of the
projective bundle of Fock spaces over the moduli
space. In Examples 1 and 2, the index theory
calculation gives exactly the form [16] on K.


Example Consider Dirac operators on the three-dimensional sphere $S^3$ coupled to vector potentials. Any vector bundle on $S^3$ is trivial, so let $V = S^3 \times \mathbb{C}^N$. Take $SU(N)$ as the gauge group and let $\mathcal{A}$ be the space of 1-forms on $S^3$ taking values in the Lie algebra $su(N)$ of $SU(N)$. Fix a point $x_s$ on $S^3$, the “south pole,” and let $\mathcal{G}$ be the group of gauge transformations based at $x_s$. That is, $\mathcal{G}$ consists of smooth functions $g: S^3 \to SU(N)$ with $g(x_s) = 1$. In this case $\mathcal{A}/\mathcal{G}$ can be identified as $\mathrm{Map}(S^2, SU(N))$ times a contractible space. This is because any point $x$ on the equator of $S^3$ determines a unique semicircle from the south pole to the north pole through $x$. The parallel transport along this path with respect to a vector potential $A \in \mathcal{A}$ defines an element $g'_A(x) \in SU(N)$, using the fixed trivialization of $V$. Set $g_A(x) = g'_A(x)\, g'_A(x_0)^{-1}$, where $x_0$ is a fixed point on the equator. The element $g_A(x)$ then depends only on the gauge equivalence class $[A] \in \mathcal{A}/\mathcal{G}$. It is not difficult to show that the map $A \mapsto g_A$ is a homotopy equivalence from the moduli space of gauge potentials to the group $\mathcal{G}_2 = \mathrm{Map}_{x_0}(S^2, SU(N))$, based at $x_0$. When $N > 2$, the cohomology $H^5(SU(N), \mathbb{Z}) = \mathbb{Z}$ transgresses to the cohomology $H^3(\mathcal{G}_2, \mathbb{Z}) = \mathbb{Z}$. In particular, the generator

$$\omega_5 = \left(\frac{i}{2\pi}\right)^3 \frac{2}{5!}\, \mathrm{tr}\,(g^{-1}dg)^5$$

of $H^5(SU(N), \mathbb{Z})$ gives the generator of $H^3(\mathcal{G}_2, \mathbb{Z})$ by contraction and integration,

$$\Omega = \int_{S^2} \omega_5$$


Gauge Group Extensions 


The new feature for gerbes associated with Dirac operators in higher than one dimension is that the gauge group, acting on the bundle of Fock spaces parametrized by vector potentials, is represented through an abelian extension. On the Lie algebra level this means that the Lie algebra extension is not given by a scalar cocycle $c$ as in the one-dimensional case but by a cocycle taking values in an abelian Lie algebra. In the case of Dirac operators coupled to vector potentials, the abelian Lie algebra consists of a certain class of complex functions on $\mathcal{A}$. The extension is then defined by the commutators

$$[(X, \alpha), (Y, \beta)] = ([X, Y],\ \mathcal{L}_X\beta - \mathcal{L}_Y\alpha + c(X, Y)) \qquad [21]$$

where $\alpha$, $\beta$ are functions on $\mathcal{A}$ and $\mathcal{L}_X\beta$ denotes the Lie derivative of $\beta$ in the direction of the infinitesimal gauge transformation $X$. The 2-cocycle property of $c$ is expressed as

$$c([X, Y], Z) + \mathcal{L}_X c(Y, Z) + \text{cyclic permutations of } X, Y, Z = 0$$


In the case of Dirac operators on a 3-manifold $S$ the form $c$ is the Mickelsson-Faddeev cocycle

$$c(X, Y) = \frac{i}{12\pi^2} \int_S \mathrm{tr}\, A \wedge (dX \wedge dY - dY \wedge dX) \qquad [22]$$

The corresponding gauge group extension is an extension of $\mathrm{Map}(S, G)$ by the normal subgroup $\mathrm{Map}(\mathcal{A}, S^1)$. As a topological space, the extension is the product

$$\mathrm{Map}(\mathcal{A}, S^1) \times_{S^1} P$$

where $P$ is a principal $S^1$ bundle over $\mathrm{Map}(S, G)$. The Chern class $c_1$ of the bundle $P$ is again computed by transgression from $\omega_5$; this time

$$c_1 = \int_S \omega_5$$

In fact, we can think of the cocycle $c$ as a 2-form on the space of flat vector potentials $A = g^{-1}dg$ with $g \in \mathrm{Map}(S^3, G)$. Then one can show that the cohomology classes $[c]$ and $[c_1]$ are equal.

As we have seen, the central extension of a loop group is the key to understanding the quantum field theory gerbe. Here is a brief description of it starting from the 3-form [16] on a compact Lie group $G$. First define a central extension $\mathrm{Map}(D, G) \times S^1$ of the group of smooth maps from the unit disk $D$ to $G$, with pointwise multiplication. The group multiplication is given as

$$(g, \lambda) \cdot (g', \lambda') = (gg',\ \lambda\lambda' e^{2\pi i \gamma(g, g')})$$

where

$$\gamma(g, g') = \frac{1}{8\pi^2} \int_D \mathrm{tr}\, g^{-1}dg \wedge dg'\, g'^{-1} \qquad [23]$$

where the trace is computed in a fixed unitary representation $\rho$ of $G$. This group contains as a normal subgroup the group $N$ consisting of pairs $(g, e^{2\pi i C(g)})$ with

$$C(g) = \frac{1}{24\pi^2} \int_B \mathrm{tr}\,(g^{-1}dg)^3 \qquad [24]$$

Here $g(x) = 1$ on the boundary circle $S^1 = \partial D$, and thus $g$ can be viewed as a function $S^2 \to G$. The three-dimensional unit ball $B$ has $S^2$ as a boundary and $g$ is extended in an arbitrary way from the boundary to the ball $B$. The extension is possible since $\pi_2(G) = 0$ for any finite-dimensional Lie group. The value of $C(g)$ depends on the extension only modulo an integer and therefore $e^{2\pi i C(g)}$ is well defined.

The central extension is then defined as

$$\widehat{LG} = (\mathrm{Map}(D, G) \times S^1)/N$$

One can show easily that the Lie algebra of $\widehat{LG}$ is indeed given through the cocycle [17]. When $G = SU(n)$ in the defining representation, this central extension is the basic extension: the cohomology class is the generator of $H^2(LG, \mathbb{Z})$. In general, to obtain the basic extension one has to correct [23] and [24] by a normalization factor.

This construction generalizes to the higher loop groups $\mathrm{Map}(S, G)$ for compact odd-dimensional manifolds $S$. For example, in the case of a 3-manifold, one starts from an extension of $\mathrm{Map}(D, G)$, where $D$ is a 4-manifold with boundary $S$. The extension is defined by a 2-cocycle $\gamma$, but now for given $g, g'$ the cocycle $\gamma$ is a real-valued function of a point $g_0 \in \mathrm{Map}(S, G)$, which is a certain differential polynomial in the Maurer-Cartan 1-forms $g_0^{-1}dg_0$, $g^{-1}dg$, $g'^{-1}dg'$. The normal subgroup $N$ is defined in a similar way; now $C(g)$ is the integral of the 5-form $\omega_5$ over a 5-manifold $B$ with boundary $\partial B$ identified as $D/\!\sim$, the equivalence shrinking the boundary of $D$ to one point. This gives the extension only over the connected component of the identity in $\mathrm{Map}(S, G)$, but it can be generalized to the whole group. For example, when $S = S^3$ and $G$ is simple, the connected components are labeled by elements of the third homotopy group $\pi_3 G = \mathbb{Z}$.

In some cases, the de Rham cohomology class of the extension vanishes but the extension still contains interesting torsion information. In quantum field theory this comes from the Hamiltonian formulation of global anomalies. A typical example of this phenomenon is the Witten $SU(2)$ anomaly in four spacetime dimensions. In the Hamiltonian formulation, we take $S^3$ as the physical space and $G = SU(2)$ as the gauge group. In this case, the second cohomology of $\mathrm{Map}(S^3, G)$ becomes pure torsion, related to the fact that the 5-form $\omega_5$ on $SU(2)$ vanishes for dimensional reasons. Here the homotopy group $\pi_4(G) = \mathbb{Z}_2$ leads to the nontrivial fundamental group $\mathbb{Z}_2$ in each connected component of $\mathrm{Map}(S^3, G)$. Using this fact, one can show that there is a nontrivial $\mathbb{Z}_2$ extension of the group $\mathrm{Map}(S^3, G)$.


See also: Anomalies; Bosons and Fermions in External Fields; Characteristic Classes; Dirac Operator and Dirac Field; Index Theorems; K-Theory.

Introduction

In the Ginzburg-Landau theory of superconductivity, a complex order parameter $\Psi$ characterizes a macroscopic/mesoscopic superconducting state in a bulk superconductor. The square of the magnitude $|\Psi|^2$ expresses the density of superconducting electrons and $\Psi$ is regarded as a macroscopic wave function. With a magnetic vector potential $A$ and the order parameter $\Psi$, the Helmholtz free energy density in a superconducting material near the critical temperature is given by

$$F = F_n + \alpha|\Psi|^2 + \frac{\beta}{2}|\Psi|^4 + \frac{1}{2m_s}\left|\left(-i\hbar\nabla - \frac{e_s}{c}A\right)\Psi\right|^2 + \frac{|H|^2}{8\pi}$$

where $F_n$ denotes the energy density of the normal state, $c$ is the speed of light, $H = \mathrm{curl}\, A$, and $m_s$ and $e_s$ are the mass and charge of a superconducting electron, respectively. The parameters $\alpha$ and $\beta$ depend on temperature and are determined by the material. Moreover, below the critical temperature $T_c$, $\alpha = \alpha(T)$ and $\beta = \beta(T)$ take negative and positive values, respectively. In the presence of an applied magnetic field $H_{ap}$, we have to consider the Gibbs free energy density, $G = F - H \cdot H_{ap}/4\pi$.

Introduce the following physical parameters:

$$\Psi_0 = \sqrt{-\alpha/\beta}, \quad H_c = \sqrt{4\pi\alpha^2/\beta}, \quad \lambda = \sqrt{-\beta m_s c^2/4\pi\alpha e_s^2}, \quad \xi = \hbar/\sqrt{2m_s|\alpha|}, \quad \kappa = \lambda/\xi \qquad [1]$$

The value $\Psi_0$ implies the equilibrium density and $H_c$ is the thermodynamic critical field, which is obtained by equating $G = F_n - |H_{ap}|^2/8\pi$ (for the normal state $\Psi = 0$, $H = H_{ap}$) with $G = F_n - \alpha^2/2\beta$ (for the perfect superconductivity $|\Psi| = \Psi_0$, $A = 0$). The parameters $\lambda$ and $\xi$ stand for penetration depth and coherence length, respectively. The ratio $\kappa$ of these characteristic lengths is called the Ginzburg-Landau parameter, which determines the type of superconducting material: type I for $\kappa < 1/\sqrt{2}$ and type II for $\kappa > 1/\sqrt{2}$.
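
As a small illustration of this classification, the following Python sketch computes $\kappa = \lambda/\xi$ and reports the type. The function names and the treatment of the exact threshold $\kappa = 1/\sqrt{2}$ as "borderline" are assumptions made for this example only.

```python
import math

# Threshold of the Ginzburg-Landau parameter separating type I from type II.
KAPPA_CRITICAL = 1.0 / math.sqrt(2.0)


def gl_parameter(penetration_depth: float, coherence_length: float) -> float:
    """Return kappa = lambda / xi (both lengths given in the same unit)."""
    if penetration_depth <= 0 or coherence_length <= 0:
        raise ValueError("lengths must be positive")
    return penetration_depth / coherence_length


def superconductor_type(kappa: float) -> str:
    """Classify by kappa; the exact threshold is reported as 'borderline'
    (an illustrative choice, not part of the theory)."""
    if math.isclose(kappa, KAPPA_CRITICAL, rel_tol=1e-12):
        return "borderline"
    return "type I" if kappa < KAPPA_CRITICAL else "type II"
```

For instance, hypothetical lengths with $\lambda$ much smaller than $\xi$ give a small $\kappa$ and hence type I, while $\lambda$ much larger than $\xi$ gives type II.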

















Ginzburg-Landau Equation 547 


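We use the nondimensional variables $x'$, $\Psi'$, $A'$, $H_{ap}'$, and $G'$:

$$x = \lambda x', \quad \Psi = \Psi_0\Psi', \quad A = \sqrt{2}H_c\xi A' \ (H' = \mathrm{curl}' A'), \quad H_{ap} = \sqrt{2}H_c H_{ap}'/\kappa \qquad [2]$$

$$F = F_n + \left(G'/\kappa^2 - 1/2 + 2H' \cdot H_{ap}'/\kappa^2 - |H_{ap}'|^2/\kappa^2\right) H_c^2/4\pi$$

Dropping the primes after the change of variables and integrating $G$ over a domain $\Omega \subset \mathbb{R}^n$ $(n = 2, 3)$, which is occupied by a superconducting sample, yields a functional of $\Psi$ and $A$, called the Ginzburg-Landau energy in a nondimensional form,

$$\mathcal{E}(\Psi, A) = \int_\Omega \left\{ |(\nabla - iA)\Psi|^2 + \frac{\kappa^2}{2}(1 - |\Psi|^2)^2 + |\mathrm{curl}\, A - H_{ap}|^2 \right\} dx \qquad [3]$$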


The Ginzburg-Landau equations are the Euler-Lagrange equations of this energy, which are given by

$$(\nabla - iA)^2\Psi = \kappa^2(|\Psi|^2 - 1)\Psi \quad \text{in } \Omega \qquad [4]$$
$$\mathrm{curl}^2 A = J + \mathrm{curl}\, H_{ap} \quad \text{in } \Omega \qquad [5]$$

where

$$J := \frac{1}{2i}(\Psi^*\nabla\Psi - \Psi\nabla\Psi^*) - |\Psi|^2 A \qquad [6]$$

$\Psi^*$ stands for the complex conjugate of $\Psi$. In a two-dimensional domain $\Omega$, the differential operator “curl” acts on $A = (A_1, A_2): \mathbb{R}^2 \to \mathbb{R}^2$ such that

$$\mathrm{curl}\, A = \partial_{x_1}A_2 - \partial_{x_2}A_1, \qquad \mathrm{curl}\, H = (\partial_{x_2}H, -\partial_{x_1}H), \qquad H := \mathrm{curl}\, A$$

and $H_{ap}$ is replaced by a scalar-valued function. Note that $J$ represents a supercurrent in the material. Every critical point of the energy is obtained by solving the Ginzburg-Landau equations with appropriate boundary conditions and, thus, a physical state in the superconducting sample is realized by a solution of the equations. A minimizer of [3] is a solution of [4]-[5] that minimizes the energy [3] in an appropriate function space, whereas a local minimizer is a solution minimizing the energy locally in the space. A solution is called a stable solution if it is a local minimizer of the energy. A physically stable phenomenon could be realized by a minimizer or at least a local minimizer.

The Ginzburg-Landau energy and the equations are gauge invariant under the transformation

$$(\Psi, A) \mapsto (\Psi e^{i\chi}, A + \nabla\chi) \qquad [7]$$

for a smooth scalar function $\chi(x)$. Therefore, we can identify two solutions that correspond to each other through the transformation [7]. The following London (Coulomb) gauge is often chosen:

$$\mathrm{div}\, A = 0 \quad \text{in } \Omega \qquad [8]$$

(with a boundary condition if necessary).
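
The gauge invariance can be checked numerically in a simplified setting. The sketch below works in one space dimension, where the covariant derivative reduces to $d/dx - ia(x)$, and verifies that the transformed pair $(\psi e^{i\chi}, a + \chi')$ satisfies $D_{a+\chi'}(\psi e^{i\chi}) = e^{i\chi} D_a\psi$, so that $|D\psi|$ and $|\psi|$ are unchanged. The test fields, the step size, and all function names are assumptions chosen for illustration.

```python
import cmath
import math


def covariant_derivative(psi, a, x, h=1e-5):
    """Central-difference approximation of (d/dx - i a(x)) psi(x)."""
    return (psi(x + h) - psi(x - h)) / (2 * h) - 1j * a(x) * psi(x)


# Hypothetical smooth test fields (chosen only for illustration).
psi = lambda x: (1.0 + 0.5 * math.sin(x)) * cmath.exp(1j * x * x)
a = lambda x: math.cos(x)
chi = lambda x: math.sin(3.0 * x)
dchi = lambda x: 3.0 * math.cos(3.0 * x)  # derivative of chi

# Gauge-transformed pair (psi * exp(i chi), a + chi').
psi_g = lambda x: psi(x) * cmath.exp(1j * chi(x))
a_g = lambda x: a(x) + dchi(x)


def gauge_defect(x):
    """|D_{a_g} psi_g - exp(i chi) D_a psi| at x; it should vanish up to
    finite-difference and rounding error."""
    lhs = covariant_derivative(psi_g, a_g, x)
    rhs = cmath.exp(1j * chi(x)) * covariant_derivative(psi, a, x)
    return abs(lhs - rhs)
```

Since the defect vanishes, each term of the energy density built from $|D\psi|^2$ and $|\psi|^2$ takes the same value before and after the transformation.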

Let $(\Psi, A)$ be a smooth solution of [4]-[5]. In a region where $|\Psi(x)| > 0$, the expression $\Psi = w(x)\exp(i\theta(x))$ $(w = |\Psi(x)|)$ leads to

$$\nabla^2 w = |\nabla\theta - A|^2 w + \kappa^2(w^2 - 1)w \qquad [9]$$
$$\mathrm{div}(w^2(\nabla\theta - A)) = 0 \qquad [10]$$
$$\mathrm{curl}^2 A = J = w^2(\nabla\theta - A) \qquad [11]$$

where the gauge [8] is fixed and $\mathrm{curl}\, H_{ap} = 0$ is assumed. Let $S$ be a surface in $\Omega$ bounded by a closed curve $\partial S$. Suppose $w(x) > 0$ on $\partial S$. Then from [11],

$$\Phi := \oint_{\partial S}\left(\frac{J}{w^2} + A\right) \cdot ds = \oint_{\partial S}\frac{J}{w^2} \cdot ds + \int_S \mathrm{curl}\, A \cdot dS = \oint_{\partial S}\nabla\theta \cdot ds = 2\pi d \qquad [12]$$

where $d$ is an integer; in fact, $d = \deg(\Psi, \partial S)$ is the winding number of $\Psi(\partial S)$ in the complex plane. Thus, the identity [12] relates the magnetic field to a topological degree of the order parameter. The quantity $\Phi$, multiplied by an appropriate constant, is called the fluxoid. A connected component of vanishing points of $\Psi$ generally has codimension 2 in the domain, and it is called a vortex.
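
The degree appearing in [12] can be approximated numerically by accumulating the phase increments of $\Psi$ along a closed curve. The following Python sketch does this for a circle; the function name, the choice of a circular contour, and the sampling resolution are assumptions for illustration, and the field must not vanish on the contour.

```python
import cmath
import math


def winding_number(psi, center=(0.0, 0.0), radius=1.0, samples=2000):
    """Approximate deg(psi, circle) by summing phase increments of psi
    along the circle of the given center and radius.

    psi takes (x, y) and returns a complex number that is nonzero on the
    circle; samples must be large enough that successive phase increments
    stay well below pi.
    """
    cx, cy = center
    total = 0.0
    prev = psi(cx + radius, cy)
    for k in range(1, samples + 1):
        t = 2.0 * math.pi * k / samples
        cur = psi(cx + radius * math.cos(t), cy + radius * math.sin(t))
        total += cmath.phase(cur / prev)  # increment in (-pi, pi]
        prev = cur
    return round(total / (2.0 * math.pi))
```

For example, the field $(x + iy)^2$ has a zero of degree 2 at the origin, its complex conjugate has degree $-2$, and a field with no zero inside the circle has degree 0.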

From the expression [9], the asymptotic behavior $w \to 1$ as $\kappa \to \infty$ is expected under a suitable condition. Then, taking the curl of [11] with $w = 1$, $H = \mathrm{curl}\, A$ satisfies $\mathrm{curl}^2 H + H = 0$, which is known as the London equation. However, this is valid only where $|\Psi| > 0$; otherwise, a singularity appears around zeros of $\Psi$.

There are several characteristic phenomena observed in a bulk superconductor. Typical phenomena are: perfect conductivity (persistent current), perfect diamagnetism (Meissner effect), nucleation of superconductivity, and vortices (quantization of a penetrating magnetic field). These phenomena can be expressed by solutions of the Ginzburg-Landau equations in various settings.











Ginzburg-Landau Equations in $\mathbb{R}^2$


A standard model of the Ginzburg-Landau energy is considered in the whole space $\mathbb{R}^2$. Let $A = (A_1, A_2)$ and assume $H_{ap} = 0$ in [3]. Consider then the energy functional

$$\mathcal{E}(\Psi, A) = \int_{\mathbb{R}^2}\left\{ |D_A\Psi|^2 + |\mathrm{curl}\, A|^2 + \frac{\kappa^2}{2}(1 - |\Psi|^2)^2 \right\} dx \qquad [13]$$

where $D_A := \nabla - iA$. Then the Ginzburg-Landau equations are

$$D_A^2\Psi = \kappa^2(|\Psi|^2 - 1)\Psi \quad \text{in } \mathbb{R}^2 \qquad [14]$$
$$\mathrm{curl}^2 A = \mathrm{Im}(\Psi^* D_A\Psi) \quad \text{in } \mathbb{R}^2 \qquad [15]$$

In the gauge theory, this model can be regarded as a two-dimensional abelian ($U(1)$) Higgs model. In that context, $\Psi$ is a scalar (Higgs) field, $A$ is a connection on the $U(1)$ bundle $\mathbb{R}^2 \times U(1)$, and $D_A$ is the covariant derivative.

Equations [14]-[15] are useful in observing quantization of the magnetic field, although they form an idealized model for superconductivity. By the natural condition that the right-hand side of [13] is finite, we may assume that $|D_A\Psi|, |\mathrm{curl}\, A| \to 0$ and $|\Psi| \to 1$ as $|x| \to \infty$. From [12], the flux quantization follows:

$$\int_{\mathbb{R}^2} \mathrm{curl}\, A\, dx = 2\pi d \qquad [16]$$

If $\Psi$ has a finite number of zeros $\{a_j\}_{j=1}^N$, [16] implies

$$\int_{\mathbb{R}^2} \mathrm{curl}\, A\, dx = 2\pi \sum_{j=1}^N \deg(\Psi, \partial B(a_j, \rho))$$

for a small positive number $\rho$, where $B(a_j, \rho)$ stands for the disk with the center $a_j$ and the radius $\rho$. A zero of $\Psi$ represents a vortex, at which the magnetic field is quantized, and a supercurrent moves around the field.
To characterize the configuration analytically, we find a solution $(\Psi, A)$ expressed in polar coordinates in the form

$$\Psi = f(r)\exp(id\theta), \qquad A = a(r)(-\sin\theta, \cos\theta)$$

Substituting these into [14]-[15], one obtains

$$f'' + \frac{f'}{r} - \left(\frac{d}{r} - a\right)^2 f = \kappa^2(f^2 - 1)f, \qquad -\left(a' + \frac{a}{r}\right)' = f^2\left(\frac{d}{r} - a\right)$$

This system of equations has a solution for $\kappa > 0$. In addition to these types of solutions, when $\kappa = 1/\sqrt{2}$, a special transformation reduces the system [14]-[15] to a scalar nonlinear equation with a singular term. Then, it is proved that for an arbitrary $d \in \mathbb{Z}$, under the constraint of [16] there exists a minimizer of [13] with zeros at prescribed points $\{a_j\}$ (Jaffe and Taubes 1980).


Solutions for Persistent Current 


A current flowing in a superconducting ring with no decay even in the absence of an applied magnetic field is called a persistent current. Assume that a superconducting sample $\Omega$ in $\mathbb{R}^3$ is surrounded by vacuum and adopt the energy functional

$$\mathcal{E}(\Psi, A) = \int_\Omega \left\{ |D_A\Psi|^2 + \frac{\kappa^2}{2}(1 - |\Psi|^2)^2 \right\} dx + \int_{\mathbb{R}^3} |\mathrm{curl}\, A|^2\, dx \qquad [17]$$

Although the functional [17] is minimized by a trivial solution $(\Psi, A) = (\exp(ic), 0)$ $(c \in \mathbb{R})$, which is the case for perfect diamagnetism, this is not the solution describing a persistent current since $J = 0$ everywhere. We have to look for a nontrivial solution that locally minimizes the energy, that is, a local minimizer of [17]. To characterize a solution representing the persistent current, we define a mapping from $\Omega$ to $S^1 \subset \mathbb{C}$ by $x \in \Omega \mapsto \Psi(x)/|\Psi(x)|$ for a solution $(\Psi, A)$ of the Ginzburg-Landau equations corresponding to [17]. Consider a domain $\Omega$ for which the space of continuous functions $C^0(\Omega, S^1)$ has infinitely many homotopy classes (e.g., a solid torus). If $(\Psi, A)$ is a local minimizer and $\Psi/|\Psi|$ is not homotopic to a constant map in $C^0(\Omega, S^1)$, then it is a solution describing a persistent current. The existence of such a solution has been established mathematically for large $\kappa$ (Jimbo and Morita 1996, Rubinstein and Sternberg 1996).


Configuration of Solutions under an 
Applied Magnetic Field 


In the presence of an applied magnetic field, according to the magnitude of the field, a sample exhibits the transition from the superconducting state to the normal state and vice versa. This transition can be considered mathematically as a bifurcation of solutions to the Ginzburg-Landau equations with a parameter measuring the magnitude of the applied magnetic field. In fact, let $H_{ap}$ be an applied magnetic field perpendicular to the horizontal plane and assume that it is constant along the vertical axis, that is, $H_{ap} = (0, 0, H_a)$. Then a rich bifurcation structure is suggested by numerical and analytical studies in the parameter space of $(H_a, \kappa)$. Mathematical developments for variational methods and nonlinear analysis reveal the configuration of the solutions and provide rigorous estimates for critical fields in a parameter regime for a two-dimensional model, predicted by physicists.

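Throughout this section, we consider the Ginzburg-Landau model in an infinite cylinder $\Omega = D \times \mathbb{R}$ $(D \subset \mathbb{R}^2)$ with a constant applied magnetic field $H_{ap} = H_a e_z = (0, 0, H_a)$, $H_a > 0$. Assuming the uniformity along the vertical axis, we may write $A = (A_1, A_2)$ and $H = \mathrm{curl}\, A = \partial_{x_1}A_2 - \partial_{x_2}A_1$ as in the previous section. Then the Ginzburg-Landau energy on $D$ is

$$\mathcal{E}(\Psi, A) = \int_D \left\{ |D_A\Psi|^2 + \frac{\kappa^2}{2}(1 - |\Psi|^2)^2 + |\mathrm{curl}\, A - H_a|^2 \right\} dx \qquad [18]$$

With the London gauge

$$\mathrm{div}\, A = 0 \ \text{in } D, \qquad A \cdot n = 0 \ \text{on } \partial D$$

the Ginzburg-Landau equations in the present setting are written as

$$D_A^2\Psi = \kappa^2(|\Psi|^2 - 1)\Psi \quad \text{in } D \qquad [19]$$
$$-\nabla^2 A = \mathrm{Im}(\Psi^* D_A\Psi) \quad \text{in } D \qquad [20]$$
$$n \cdot \nabla\Psi = 0 \quad \text{on } \partial D \qquad [21]$$
$$\mathrm{curl}\, A = H_a \quad \text{on } \partial D \qquad [22]$$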

where n denotes the outer unit normal. 


Meissner Solutions 


As seen in the case of no applied magnetic field, the trivial solution $(\Psi, A) = (\exp(ic), 0)$ is a minimizer of [18]. This solution expresses no magnetic field in the sample. In a superconducting sample, the diamagnetism holds even in the presence of an applied magnetic field if the field is weak. Namely, the sample is shielded so that penetration of the field is only allowed near the surface of the sample. This phenomenon is called the Meissner effect. A solution expressing the Meissner effect is called a Meissner solution. Mathematically, it is understood that as $H_a$ increases, such a Meissner solution continues from the trivial solution. Then the solution preserves the configuration $0 < |\Psi(x)| < 1$. A study of the asymptotic behavior of the Meissner solution as $\kappa$ tends to $\infty$ shows that the Meissner solution is a minimizer up to $H_a = O(\log\kappa)$ for sufficiently large $\kappa$ (Serfaty 1999).


Nucleation of Superconductivity 


In an experiment, the Meissner state breaks down under a stronger applied magnetic field. Then the sample turns to the normal state (in a type I superconductor) or it allows a mixed state of superconductivity and normal state (in a type II superconductor). In the former case, the critical magnitude of the field is denoted by $H_c$, which corresponds to the one of [1], while it is denoted by $H_{c1}$ in the latter case. Moreover, the mixed state eventually breaks down to the normal state by further increasing the applied field up to another critical field $H_{c2}$. To characterize these two types mathematically, we consider a transition from the normal state to the superconducting state by reducing the magnitude of the field.

Let $A_{ap}$ satisfy $\mathrm{curl}\, A_{ap} = H_a$ $(x \in D)$ and $A_{ap} \cdot n = 0$ $(x \in \partial D)$. Then eqns [19]-[22] have a trivial solution $(\Psi, A) = (0, A_{ap})$, which stands for the normal state. Consider the second variation of the energy functional [18] at this trivial solution

$$\frac{1}{2}\frac{d^2}{ds^2}\mathcal{E}(s\phi, A_{ap} + sB)\Big|_{s=0} = \int_D \left\{ |(\nabla - iA_{ap})\phi|^2 - \kappa^2|\phi|^2 + |\mathrm{curl}\, B|^2 \right\} dx$$

If the minimum of this second variation for nonzero $(\phi, B)$ is positive (or negative), then the trivial solution is stable (or unstable). The minimum gives the least eigenvalue of the linearized problem of [19]-[20] around the trivial solution. Seeking such a least eigenvalue $\mu$ is reduced to studying an eigenvalue problem of the Schrödinger operator $L[\phi] := -(\nabla - iA_{ap})^2\phi$.

If the domain $D$ is the whole space $\mathbb{R}^2$, it is proved that $\mu = H_a$. Back to the original variable of [2], we can define a critical field $H_{c2} = \sqrt{2}H_c\kappa$; $\kappa = 1/\sqrt{2}$ separates the class of superconductors into type I by $\kappa < 1/\sqrt{2}$ $(H_{c2} < H_c)$ and type II by $\kappa > 1/\sqrt{2}$ $(H_{c2} > H_c)$.
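
The step from $\mu = H_a$ to the critical field can be spelled out as follows. This is only a sketch: it assumes that the normal state loses stability exactly when the least eigenvalue $\mu$ of $L$ drops below $\kappa^2$, as suggested by the second variation, and it writes $H_a'$ for the nondimensional field of [2].

```latex
\begin{align*}
\frac{1}{2}\frac{d^2}{ds^2}\mathcal{E}(s\phi, A_{ap}+sB)\Big|_{s=0}
  &\ge (\mu - \kappa^2)\int |\phi|^2\,dx, \qquad \mu = H_a' \ \text{on } \mathbb{R}^2,\\
\mu = \kappa^2 \ &\Longrightarrow\ H_a' = \kappa^2,\\
H_{c2} &= \frac{\sqrt{2}\,H_c}{\kappa}\,H_a' = \sqrt{2}\,\kappa\,H_c \quad \text{by [2]},\\
H_{c2} < H_c \ &\Longleftrightarrow\ \kappa < 1/\sqrt{2}.
\end{align*}
```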

In the bounded domain $D$, however, the critical field at which superconductivity nucleates in the interior of a sample is larger than $H_{c2}$ (it is denoted by $H_{c3}$), since the eigenvalue problem of $L$ is considered in the domain with the Neumann boundary condition. A study of the least eigenvalue $\mu$ shows that the critical field has the asymptotics

$$H_{c3}/\sqrt{2}H_c = \frac{\kappa}{\beta_0} + O(1), \qquad \kappa \to \infty$$

where $0 < \beta_0 < 1$. If the applied field is very close to $H_{c3}$ and $\kappa$ is sufficiently large, the amplitude of the eigenfunction associated with the least eigenvalue of $L$ (with the Neumann boundary condition) is very small except for a $1/\kappa$ neighborhood of the boundary. This implies that the nucleation of superconductivity takes place at the boundary. This phenomenon is called surface nucleation (Del Pino et al. 2000, Lu and Pan 1999).


Solutions of Vortices 


In a type II superconductor, it is well known that there exists a mixed state of superconductivity and normal state in a parameter regime $H_{c1} < H_a < H_{c2}$. In the mixed state, the magnetic field penetrating the sample is quantized such that it forms a finite number of lines or curves in the sample. This configuration (called a vortex) is characterized by zero sets of the order parameter of the Ginzburg-Landau equations. In a two-dimensional domain, isolated vanishing points of the order parameter are called vortices. Thus, it is quite an interesting problem how such a vortex configuration can be described mathematically by a minimizer of the energy functional. In the section “Ginzburg-Landau equations in $\mathbb{R}^2$,” a specific configuration for vortex solutions is stated under very special conditions: $\kappa = 1/\sqrt{2}$, the whole space, and no applied magnetic field. However, this result does not generalize to the present setting.

A standard approach to a solution with the vortex configuration is using a bifurcation analysis near the critical field $H_{c2}$ (or $H_{c3}$) by expanding a solution and the difference $H_a - H_{c2}$ in a small parameter. Then the leading term is given by an eigenfunction of the least eigenvalue of the Schrödinger operator coming from the linearization. Under doubly periodic conditions in the whole space $\mathbb{R}^2$, the spatial pattern of vortices, called Abrikosov's vortex lattice, is studied by a local bifurcation theory.

However, this kind of bifurcation analysis only works near the critical field and the trivial solution $(\Psi, A) = (0, A_{ap})$, which implies that only a small-amplitude solution can be found. To realize a sharp configuration of vortices, we need to consider a parameter regime far from the bifurcation point. As a matter of fact, mathematical and numerical studies for sufficiently large $\kappa$ exhibit nice configurations of vortex solutions. In this case, in a neighborhood of each vortex, with radius $O(1/\kappa)$, a sharp layer arises, and there exists a solution with multivortices in an appropriate parameter region for $H_a$. In addition, as $H_a$ increases (up to $H_{c2}$), the number of vortices also increases. This implies that the minimizer of the energy functional [18] admits a larger number of zeros for a higher magnitude of applied magnetic field. This is puzzling, since a solution with a smaller number of vortices seems to have less energy. Thus, there is some balance mechanism between contributions of the vortices and the applied magnetic field to the total energy.

Mathematically, it is possible to estimate $\mathcal{E}(\Psi, A)$ for the vortex solution to [19]-[22] as follows: consider a family of square tiles $K_j$ with side-length $\rho$ which are periodically arranged over the whole space. Assume each square in the domain $D$ has a single vortex. For an appropriate test function, the energy over $K_j$ is estimated as $O(\log(\kappa\rho))$. Since the number of vortices in the domain is $O(|D|/\rho^2)$ ($|D|$: the measure of $D$), we obtain an upper bound $O((|D|/\rho^2)\log(\kappa\rho))$. This bound is less than $\mathcal{E}(0, A_{ap}) = |D|\kappa^2/2$ for $H_a/\kappa^2 = o(1)$ and $\rho = 1/\sqrt{H_a}$. Although in a general case it is difficult to estimate the energy of the minimizer from below, the leading order can be precisely determined in some range of the interval $(H_{c1}, H_{c2})$ if $\kappa$ is sufficiently large (Sandier and Serfaty 2000).
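
The comparison between the upper bound and the normal-state energy can be made explicit. The following worked step is a sketch in which the constants hidden in the $O(\cdot)$ estimates are set to 1 and the ratio $t := H_a/\kappa^2$ is introduced for convenience; both are assumptions of this illustration.

```latex
\begin{align*}
\frac{|D|}{\rho^2}\log(\kappa\rho)\Big|_{\rho = 1/\sqrt{H_a}}
  &= |D|\,H_a\log\frac{\kappa}{\sqrt{H_a}}
   = \frac{|D|\,H_a}{2}\log\frac{\kappa^2}{H_a},\\
\frac{(|D|\,H_a/2)\log(\kappa^2/H_a)}{|D|\,\kappa^2/2}
  &= t\log\frac{1}{t}, \qquad t := \frac{H_a}{\kappa^2},\\
t\log\frac{1}{t} &\to 0 \quad \text{as } t \to 0.
\end{align*}
```

So when $H_a/\kappa^2 = o(1)$, the vortex test configuration has an energy of lower order than that of the normal state.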


A Simplified Model 


Since the Ginzburg-Landau equations [4]-[5] are coupled equations for $\Psi$ and $A$, we often encounter mathematical difficulty in realizing a solution with the configuration shown by a numerical experiment. To look at a specific configuration, we may use a simpler model equation. A typical simplification is to neglect the magnetic field, which leads to the equation for the order parameter $\Psi$:

$$\nabla^2\Psi + \kappa^2(1 - |\Psi|^2)\Psi = 0 \quad \text{in } \Omega \qquad [23]$$

This equation is also called the Ginzburg-Landau equation and it is the Euler-Lagrange equation of the energy

$$\mathcal{G}(\Psi) = \int_\Omega \left\{ |\nabla\Psi|^2 + \frac{\kappa^2}{2}(1 - |\Psi|^2)^2 \right\} dx \qquad [24]$$

in an appropriate function space. Under no constraint, a constant solution with $|\Psi| = 1$ is a minimizer. If a domain is topologically nontrivial, eqn [23] also allows local minimizers of [24] for large $\kappa$ as seen in the section “Solutions for Persistent Current.”

On the other hand, [23] in a simply connected domain $D \subset \mathbb{R}^2$ with a boundary condition $\Psi = g(x)$ $(x \in \partial D)$ is used for a study of a vortex solution for large $\kappa$. Let $\varepsilon = 1/\kappa$. Under the constraint $\deg(g, \partial D) = d$, a minimizer $\Psi_\varepsilon$ must have at least $|d|$ zeros. The leading order of the energy around each vortex is estimated as $2\pi\log(1/\varepsilon)$. The result of Bethuel et al. (1994) describes the energy for a minimizer

$$\mathcal{G}(\Psi_\varepsilon) = 2\pi|d|\log(1/\varepsilon) + \gamma + W(a_1^\varepsilon, \ldots, a_{|d|}^\varepsilon) + o(1)$$

where $\{a_j^\varepsilon\}$ are zeros of $\Psi_\varepsilon$ and $\gamma$ is a universal constant. The function $W$ is explicitly given as

$$W(a_1, \ldots, a_{|d|}) = -2\pi \sum_{1 \le j,k \le |d|,\ j \ne k} \log|a_j - a_k| + R$$

where $R$ is derived from a Green function satisfying some boundary condition depending on $g$. Moreover, as $\varepsilon \to 0$, the zeros converge to a minimizer of $W$, which implies that the asymptotic position of every zero (vortex) is determined by the explicit function $W$. The first term of $W$ shows that vortices with the same sign of the degree are repulsive to one another and the optimal arrangement of vortices never allows the superposition of multivortices. Although the boundary condition is rather artificial, their mathematical formulation promoted the development of variational methods applied to the Ginzburg-Landau equation.


Time-Dependent Ginzburg-Landau 
Equations 


The Ginzburg-Landau equations in the preceding sections are static models. We now consider time evolution models called the time-dependent Ginzburg-Landau equations. The evolution equations serve various numerical simulations exhibiting dynamical properties of solutions. They also provide mathematical problems on global-in-time behavior of solutions, stability of stationary solutions, dynamical laws of vortices, etc. The Ginzburg-Landau energy is denoted by $\mathcal{E}(u)$, $u = (\Psi, A)$. The simplest model for the time-dependent problem is the gradient flow for $\mathcal{E}(u)$

$$\frac{\partial u}{\partial t} = -\frac{\delta\mathcal{E}}{\delta u}$$

where $\delta\mathcal{E}/\delta u$ is the first variation of the energy. A more standard evolution equation in a nondimensional form is given by

$$(\partial_t + i\phi)\Psi - D_A^2\Psi = \kappa^2(1 - |\Psi|^2)\Psi \qquad [25]$$
$$\eta(\partial_t A + \nabla\phi) + \mathrm{curl}^2 A = \mathrm{Im}(\Psi^* D_A\Psi) + \mathrm{curl}\, H_{ap} \qquad [26]$$

where $\phi(x, t)$ is the electric (scalar) potential and $\eta$ is a positive parameter with a physical meaning. In fact, this equation was derived by Gor'kov and Eliashberg from the Bardeen, Cooper, and Schrieffer (BCS) theory.
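
The gradient-flow idea can be tried out on a much simpler problem than [25]-[26]. The Python sketch below integrates the gradient flow of the simplified energy [24] for a real-valued order parameter on a one-dimensional interval, with an explicit Euler scheme and natural boundary conditions; the discretization, the initial data, the parameter values, and all function names are assumptions made for this illustration.

```python
import math


def energy(psi, h, kappa):
    """Discrete analogue of int |psi'|^2 + (kappa^2/2)(1 - psi^2)^2 dx
    for a real grid function psi with spacing h."""
    grad_part = sum((psi[i + 1] - psi[i]) ** 2 for i in range(len(psi) - 1)) / h
    pot_part = h * sum(0.5 * kappa ** 2 * (1.0 - p * p) ** 2 for p in psi)
    return grad_part + pot_part


def gradient_flow_step(psi, h, kappa, dt):
    """One explicit Euler step of psi_t = 2 psi'' + 2 kappa^2 (1 - psi^2) psi,
    i.e. psi_t = -(first variation of the energy), with natural boundary."""
    n = len(psi)
    new = []
    for i in range(n):
        left = psi[i - 1] if i > 0 else psi[i]
        right = psi[i + 1] if i < n - 1 else psi[i]
        lap = (left - 2.0 * psi[i] + right) / (h * h)
        reaction = 2.0 * kappa ** 2 * (1.0 - psi[i] ** 2) * psi[i]
        new.append(psi[i] + dt * (2.0 * lap + reaction))
    return new


def run(n=51, length=5.0, kappa=2.0, steps=2000):
    """Relax a hypothetical initial profile; return final psi and energy history."""
    h = length / (n - 1)
    dt = h * h / 8.0  # well inside the explicit stability limit h^2 / 4
    psi = [0.5 + 0.3 * math.cos(math.pi * i * h / length) for i in range(n)]
    energies = [energy(psi, h, kappa)]
    for _ in range(steps):
        psi = gradient_flow_step(psi, h, kappa, dt)
        energies.append(energy(psi, h, kappa))
    return psi, energies
```

In this setting the energy should not increase along the flow, and the profile should relax to the constant minimizer with $|\psi| = 1$, which mirrors the dissipative character of the time-dependent model.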

The system of equations [25]-[26] is invariant under the following time-dependent gauge transformation:

$$(\Psi, \phi, A) \mapsto (\Psi\exp(i\chi),\ \phi - \partial_t\chi,\ A + \nabla\chi)$$

The equations in the bounded domain $D \subset \mathbb{R}^2$ are considered subject to boundary and initial conditions

$$D_A\Psi \cdot n = 0 \ \text{on } \partial\Omega \times (0, T), \qquad \mathrm{curl}\, A = H_a \ \text{on } \partial\Omega \times (0, T),$$
$$\Psi(x, 0) = \Psi_0(x) \ \text{in } \Omega, \qquad A(x, 0) = A_0(x) \ \text{in } \Omega \qquad [27]$$

Then, besides the Coulomb gauge [8], we can choose the Lorentz gauge as follows:

$$\mathrm{div}\, A + \phi = 0 \ \text{in } D, \qquad A \cdot n = 0 \ \text{on } \partial D, \qquad \int_D \phi\, dx = 0$$

For a smooth solution $u(x, t)$ to [25]-[26] with [27],

$$\frac{d}{dt}\mathcal{E}(u) = -2\int_\Omega \left\{ |(\partial_t + i\phi)\Psi|^2 + \eta|\partial_t A + \nabla\phi|^2 \right\} dx \le 0$$

holds if $H_{ap}$ is time independent. This is also true in the case of the whole space $\mathbb{R}^2$ with a condition for the asymptotic behavior as $|x| \to \infty$.

Suppose that a domain $\Omega \subset \mathbb{R}^3$ is occupied by a superconducting sample and it is surrounded by a medium (or vacuum). Then the electromagnetic behavior in the outside domain, caused by the induced magnetic field of a supercurrent in $\Omega$ and an applied magnetic field, should be expressed by the Maxwell equations. With the electric field $E = -(\partial_t A + \nabla\phi)$, we obtain

$$-\nu\partial_t E - \sigma E + \mathrm{curl}^2 A = \mathrm{curl}\, H_{ap} \quad \text{in } \mathbb{R}^3 \setminus \Omega$$

where $\mu$, $\nu$, and $\sigma$ are physical parameters (e.g., $\sigma = 0$ in the vacuum). To match the inside and the outside of $\Omega$, appropriate boundary conditions are required.

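From the point of view of gauge theory, as in the section “Ginzburg-Landau equations in $\mathbb{R}^2$,” the following time-dependent equations in the whole space are also considered:

$$(\partial_t + i\phi)\Psi - D_A^2\Psi = \kappa^2(1 - |\Psi|^2)\Psi$$
$$-\partial_t E + \mathrm{curl}^2 A = \mathrm{Im}(\Psi^* D_A\Psi)$$
$$-\nabla \cdot E = \mathrm{Im}(\Psi^*(\partial_t + i\phi)\Psi)$$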


Other Topics 


In realistic problems, a superconducting sample contains impurities. This inhomogeneity is usually expressed by putting a variable coefficient into the Ginzburg-Landau energy and the equations. Such a model with a variable coefficient is useful in studies of pinning of vortices, the Josephson effect through an inhomogeneous medium, etc. A model in a thin film with variable thickness is also described by the Ginzburg-Landau equations with a variable coefficient. Since the Ginzburg-Landau equations (or a modified model) can be considered in various settings, more applications to realistic problems can be expected to be treated as nonlinear analysis develops.


See also: Abelian Higgs Vortices; Bifurcation Theory; Evolution Equations: Linear and Nonlinear; High Tc Superconductor Theory; Image Processing: Mathematics; Integrable Systems: Overview; Interacting Stochastic Particle Systems; Ljusternik-Schnirelman Theory; Nonlinear Schrödinger Equations; Quantum Phase Transitions; Variational Techniques for Ginzburg-Landau Energies.

Introduction


Many macroscopic systems, if left to evolve in isolation or in contact with a bath, are able to relax, after a finite time, to history-independent equilibrium states characterized by time-independent values of the state variables and time-translation-invariant correlations. In glassy systems, the relaxation time becomes so large that equilibrium behavior is never observed. On short timescales, the microscopic degrees of freedom appear to be frozen in far-from-equilibrium disordered states. On longer timescales, slow, history-dependent, off-equilibrium relaxation phenomena become detectable.

The list of physical systems falling into disordered glassy states at low temperature is long. To mention just a few examples, one can cite the canonical case of simple and complex liquid systems undergoing a glass transition, polymeric glasses, dipolar glasses, spin glasses, charge density wave systems, and vortex systems in type II superconductors.

Experimental and theoretical research has pointed 
out the existence of dynamical scaling laws char- 
acterizing the off-equilibrium evolution of glassy 
systems. These laws, in turn, reflect the statistical 
properties of the regions of configuration space 
explored during relaxation. 

The goal of a theory of glassy systems is to
understand the mechanisms that lead to the
growth of the relaxation time and the nature of
the scaling laws in off-equilibrium relaxation.
A well-developed description of glassy phenomena
is provided by mean-field theory based on spin glass
models, which gives a coherent framework that
describes the dynamics of glassy systems and
provides a statistical interpretation of glassy relaxation.
Despite important limitations of the mean-field
description for finite-dimensional systems, it allows
precise discussion of general concepts such as
effective temperatures and configurational entropies,
which have been successfully applied to the description
of glassy systems.

In the following, two different ways of
freezing will be discussed: spin glasses, where
disorder is built into the random nature of the couplings
between the dynamical variables, and structural
glasses, where the disordered nature of the frozen
state has a self-induced character.


A Glimpse of Freezing Phenomenology 
Spin Glasses 


The archetypical example of systems undergoing the 
complex dynamical phenomena described in this 
article is the case of spin glasses (Fischer and Hertz 
1991, Young 1997). Spin glass materials are 
magnetic systems where the magnetic atoms occupy
random positions in lattices formed by nonmagnetic
matrices, fixed at the moment of preparation of
the material. The exchange interaction between the
spins of the magnetic impurities in these materials is
an oscillating function, taking positive and negative
values according to the distance between the atoms.

Spin glass models (see Spin Glasses, Mean Field
Spin Glasses and Neural Networks, and Short-Range
Spin Glasses: The Metastate Approach) are
defined by giving the form of the exchange
Hamiltonian, describing the interaction between
the spins $S_i$ of the magnetic atoms. In the presence
of an external magnetic field $h$, the exchange
Hamiltonian can be written as

$$H = -\sum_{(i,j) \in \Lambda} J_{ij} S_i S_j - h \sum_{i \in \Lambda} S_i \qquad [1]$$


The spin variables can be of classical or quantum
nature. This article will be limited to the physics of
classical systems. The most common choice in
models is to use Ising variables $S_i = \pm 1$. The
couplings $J_{ij}$, which in real materials depend on the
distance, are most commonly chosen to be independent
random variables with a distribution with
support on both positive and negative values. Most
commonly, one considers either a symmetric bimodal
distribution on $\{-1, 1\}$ or a symmetric Gaussian.
The sums are restricted to lattices $\Lambda$ of various types.
The most common choices are $\Lambda = \mathbb{Z}^d$ for the
Edwards-Anderson model, the complete graph
$\Lambda = \{(i, j) \mid i < j;\ i, j = 1, \ldots, N\}$ for the Sherrington-Kirkpatrick
(SK) model, and the Erdős-Rényi random
graph for the Viana-Bray (VB) model.

The presence of interactions of both signs induces 
frustration in the system: the impossibility of 
minimizing all the terms of the Hamiltonian at the 
same time. One then has a complex energy land- 
scape, where relaxation to equilibrium is hampered 
by barriers of energetic and entropic nature. 
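A minimal sketch of frustration, assuming three Ising spins on a triangle at zero field with two ferromagnetic bonds and one antiferromagnetic bond. The couplings and the names `COUPLINGS` and `triangle_energy` are illustrative choices, not taken from any specific model above. Enumerating all configurations shows that no configuration satisfies the three bonds at once.

```python
from itertools import product

# Hypothetical couplings on a triangle: two ferromagnetic bonds, one antiferromagnetic.
COUPLINGS = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): -1.0}


def triangle_energy(spins, couplings=COUPLINGS):
    """Exchange energy H = -sum_{(i,j)} J_ij S_i S_j at zero field."""
    return -sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())


# Enumerate the 2**3 Ising configurations and record their energies.
energies = {spins: triangle_energy(spins) for spins in product((-1, 1), repeat=3)}
ground_energy = min(energies.values())

# Energy that would be reached if every bond could be satisfied simultaneously.
unfrustrated_bound = -sum(abs(j) for j in COUPLINGS.values())

ground_states = [s for s, e in energies.items() if e == ground_energy]
```

With these couplings the lowest energy stays above the bound that would hold if every bond were satisfied, and the ground state is degenerate beyond the trivial global spin flip.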

Spin glass materials, which have a paramagnetic
behavior at high temperature, show glassy behavior
at low temperature, where magnetic degrees of
freedom appear to be frozen for long times in
apparently random directions. There is quite a
general consensus, based on the analysis of
experimental data and numerical simulations,
that in three dimensions and in the absence of a
magnetic field, the two regimes are separated by a
thermodynamic phase transition at a temperature $T_c$,
where the magnetic response $\chi$ exhibits a cusp (see
Figure 1). By linear response, $\chi$ is related to the
equilibrium spin correlation function

$$\chi = \frac{1}{NT} \sum_{i,j} \left( \langle S_i S_j \rangle - \langle S_i \rangle \langle S_j \rangle \right)$$

having denoted by $\langle \cdot \rangle$ the Boltzmann-Gibbs average.
A cusp in $\chi$ indicates a second-order transition
where the so-called Edwards-Anderson parameter
$q = (1/N) \sum_i \langle S_i \rangle^2$ becomes different from zero,
indicating freezing of the spins in random directions.
In the presence of a magnetic field, although the
low-temperature phenomenology is similar to
the one at zero field, the thermodynamic nature of
the freezing transition is more controversial.
Theoretically, mean-field theory, based on the SK model,
predicts a phase transition with a cusp in the
susceptibility both in the absence and in the
presence of a magnetic field. Unfortunately, no
firm theoretical result is available on the existence
and the nature of phase transitions in finite-dimensional
spin glass models, which remains a completely
open problem.

Figure 1 Magnetic susceptibility as a function of temperature
in spin glass materials. Reproduced from Fischer KH and
Hertz JA (1991) Spin Glasses. Cambridge, UK: Cambridge
University Press.


Structural Glasses 


Analogous freezing of dynamical variables is 
observed in a variety of systems. Some of them 
share with the spin glasses the presence of quenched 
disorder; in many others, this feature is absent. This 
is the case of structural glasses (Debenedetti 1996). 

Many liquids under fast enough cooling, instead 
of crystallizing, as dictated by equilibrium thermo- 
dynamics, form glasses. Simple liquids can be 
modeled as classical systems of particles with 
pairwise interactions. In the simplest example of a 
monoatomic liquid, the potential energy of a 
configuration is then written as 


$$V(r_1, \ldots, r_N) = \sum_{i<j} \phi(r_i - r_j) \qquad [2]$$

In the case of atomic mixtures, the potential $\phi$
acquires a dependence on the species of the interacting
atoms.

Liquids can be characterized as good or bad glass
formers depending on the ease with which they
form glasses. In good glass formers, in order to
avoid crystallization, it is in general sufficient to
cross the region around the liquid-crystal transition
point fast enough, so that the system can settle in a
supercooled liquid metastable equilibrium. On lowering
the temperature, the supercooled liquid
becomes denser and more viscous while the relaxation
time of the system, related to the viscosity $\eta$
through the Maxwell relation $\tau = \eta/G$ ($G$ is the
instantaneous shear modulus of the liquid), undergoes
a rapid growth. One defines a conventional
glass transition temperature $T_g$ as the point where $\eta$
takes the solid-like value $\eta = 10^{13}$ Poise, corresponding
to a relaxation time $\tau \sim 100$ s. After that point,
the system falls out of equilibrium; under usual
experimental conditions, it does not have the time to
adjust to external perturbations and behaves mechanically
like a solid. The glass transition temperature is
then characterized as the point where the liquid goes
out of equilibrium, the relaxation time becomes
larger than the external timescale, and the positions
of the atoms appear frozen on that scale.

A great effort has been devoted to understanding the
temperature dependence of the
relaxation time and the nature of the dynamical
processes in supercooled liquids. In deeply supercooled
liquids, the empirical behavior of the relaxation
time ranges from the Arrhenius form
$\tau \sim \exp(A/T)$ for “strong glasses” to the Vogel-Fulcher
form $\tau(T) \sim \exp(D/(T - T_0))$ for “fragile glasses.”
The Vogel-Fulcher law predicts a finite-temperature
divergence of the relaxation time at the temperature
$T_0$. Unfortunately, in typical cases, $T_0$ is
estimated to be 10-15% lower than $T_g$, so that it is
not possible to verify the law close enough to $T_0$ to
support the divergent behavior.
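The difference between the two empirical forms can be made concrete through the apparent activation energy $d \ln \tau / d(1/T)$, which is constant for the Arrhenius form and grows on cooling for the Vogel-Fulcher form. The sketch below is only an illustration: the function names, the unit prefactor, and the parameter values in the usage note are assumptions, and the derivative is taken by a simple central finite difference.

```python
def arrhenius_log_tau(temperature, activation):
    """ln(tau) for the Arrhenius form tau ~ exp(A / T), prefactor set to 1."""
    return activation / temperature


def vogel_fulcher_log_tau(temperature, d_param, t0):
    """ln(tau) for the Vogel-Fulcher form tau ~ exp(D / (T - T0)), prefactor set to 1."""
    if temperature <= t0:
        raise ValueError("the Vogel-Fulcher form is only defined for T > T0")
    return d_param / (temperature - t0)


def apparent_activation(log_tau, temperature, rel_step=1e-5):
    """Central finite-difference estimate of d ln(tau) / d(1/T) at the given T."""
    inv_t = 1.0 / temperature
    step = inv_t * rel_step
    upper = log_tau(1.0 / (inv_t + step))
    lower = log_tau(1.0 / (inv_t - step))
    return (upper - lower) / (2.0 * step)
```

For example, `apparent_activation(lambda T: vogel_fulcher_log_tau(T, 1.0, 0.5), T)` increases as `T` approaches the hypothetical `T0 = 0.5` from above, while the Arrhenius counterpart returns the same barrier at every temperature.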

As a consequence of freezing, one observes 
important qualitative changes in the behavior of 
thermodynamic quantities similar to those encoun- 
tered in equilibrium phase transitions. In a narrow 
interval around T,, specific heat and compressibility 
undergo jumps from liquid-like values to much 
lower solid-like values. 


Aging and Slow Dynamics 


While the crudest picture of the glass transition
describes freezing as complete structural arrest, both
in cases where the glass transition is a genuine
off-equilibrium phenomenon, as in structural
glasses, and in cases where it has a thermodynamic
character, as in spin glasses, the study of
dynamical quantities reveals the existence of persistent,
history-dependent, slow relaxation processes in
the frozen phase (Nordblad and Svedlindh 1997).
This is the phenomenon of aging, which is a
constitutive feature of the glassy state. Its
analysis plays a central role in
understanding the way glassy systems explore
configuration space. A first characterization of
relaxation is given by the behavior of “one-time
quantities” like internal energy, density, etc., which
slowly evolve in the course of time towards values
corresponding to states of lower free energy. More
interesting is the behavior of “two-time quantities,”
that is, time-dependent correlation functions and responses,
which reveal the deep off-equilibrium nature of
glassy relaxation. In experimental, numerical, and
theoretical studies, a special position is occupied by
the linear response function. Using the language of
magnetic systems, suited to spin glasses, one
considers the response of the magnetization to an
applied magnetic field. To deal with other systems,
different pairs of conjugate variables are considered
and simple changes of language are needed.
Linear perturbations make it possible to probe the dynamics of
the system without affecting its evolution. Denoting
by $M(t)$ the magnetization at time $t$ and by $h(t')$
the magnetic field at time $t'$, the instantaneous linear
response function is defined as


$$R(t, t') = \left. \frac{\delta \langle M(t) \rangle}{\delta h(t')} \right|_{h=0} \qquad [3]$$


Measurements of the time integral of $R(t, t')$ are commonly
performed to reveal the presence of aging in
glassy systems. Aging is usually studied by observing the
dynamics that follows a rapid quench from high
temperature, at an instant that marks the origin of
time. One can reconstruct the response function by
measuring the zero-field-cooled (ZFC) magnetization
as the response to a magnetic field acting from a
waiting time $t_w$ to the measuring time $t$,

$$\chi_{ZFC}(t, t_w) = \int_{t_w}^{t} dt'\, R(t, t') \qquad [4]$$

or its complement, the thermoremanent magnetization
(TRM), corresponding to the response to a magnetic
field acting from the time of the quench up to $t_w$,

$$\chi_{TRM}(t, t_w) = \int_{0}^{t_w} dt'\, R(t, t') \qquad [5]$$


In Figure 2, the behavior of the susceptibility $\chi_{ZFC}$ is
shown as a function of $t - t_w$ in a typical example
of an aging experiment at low temperature. Out-of-equilibrium
behavior is manifest in the dependence
of the curves on the waiting time $t_w$. The relaxation
appears slower and slower for larger waiting times,
and the $t_w$ dependence does not disappear even for
very large times. Two nontrivial dynamical regimes
can be identified: a first regime for small $t - t_w$, that
is, $t - t_w \ll t_w$, where the relaxation is independent
of $t_w$, and a second regime, roughly valid for $t - t_w \sim t_w$,
where time-translation invariance is manifestly violated.
The analysis of experimental and simulation data
shows a scale-invariant behavior according to which
curves corresponding to different waiting times can be
superimposed by rescaling the time difference $t - t_w$ with a
suitable $t_w$-dependent relaxation time $\tau(t_w)$. This is a
growing function of $t_w$ which seems to diverge for large
$t_w$. Up to the waiting times where it has been possible to
test the relation, $\tau(t_w)$ behaves as a power law $\tau(t_w) \sim t_w^a$,
where, in different materials and models, $a$ = 0.8-0.9.

Figure 2 ZFC magnetization in an aging experiment. The
curves, from bottom to top, correspond to increasing waiting
times. Reproduced from Nordblad P and Svedlindh P (1997)
Experiments in spin glasses. In: Young AP (ed.) Spin Glasses
and Random Fields. Singapore: World Scientific, with permission
from World Scientific Publishing Co. Pte Ltd.

Many efforts have been devoted to understanding
the scaling laws in aging (Bouchaud
et al.). Among the theories and models of aging that
have been proposed, one can cite the phenomenological
model known as the “trap model,” developed by
Bouchaud and collaborators, which likens aging
to a random walk between “traps” characterized by
a broad distribution of trapping times. Suitable
choices of the trapping-time distribution allow one to
derive scaling laws similar to the ones characteristic
of aging systems. A different theory, the “droplet
model” for spin glasses, likens aging phenomena
to the competition between slowly growing domains
of equilibrium phases, in analogy with the dynamics
of phase separation in first-order phase transitions.
The approach that has led to the most detailed and
spectacular predictions has been the study of
microscopic mean-field models.
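The trap picture can be illustrated with a small Monte Carlo sketch. It assumes, purely for illustration, that successive trapping times are independent with a power-law tail $P(\tau > s) = s^{-x}$ for $s \ge 1$ and $0 < x < 1$, so that the mean trapping time is infinite; the function names and this specific distribution are hypothetical choices, not the form used in any particular reference. The quantity estimated is the probability that the walker does not leave its trap between $t_w$ and $t_w + t$.

```python
import random


def sample_trapping_time(x, rng):
    """Draw a trapping time with tail P(tau > s) = s**(-x) for s >= 1 (assumed form)."""
    # 1 - rng.random() lies in (0, 1], so the result is finite and at least 1.
    return (1.0 - rng.random()) ** (-1.0 / x)


def no_jump_probability(t_w, t, x, n_samples, rng):
    """Estimate the probability of staying in the same trap during [t_w, t_w + t]."""
    stays = 0
    for _ in range(n_samples):
        clock = 0.0
        while True:
            exit_time = clock + sample_trapping_time(x, rng)
            if exit_time > t_w:
                # This is the trap occupied at time t_w; check whether it outlasts t_w + t.
                if exit_time > t_w + t:
                    stays += 1
                break
            clock = exit_time
    return stays / n_samples
```

Under these assumptions, estimates obtained at different waiting times but at the same ratio $t/t_w$ should roughly coincide, which is the kind of $t_w$-dependent scaling described above; for instance one can compare `no_jump_probability(1000.0, 1000.0, 0.5, 4000, random.Random(1))` with the same call at `t_w = t = 10000.0`.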


Mean-Field Models of Disordered 
Systems 


Mean-field theory starts from the analysis of the
relaxation dynamics of disordered systems with
weak long-range forces (Bouchaud et al.). The
reference model of spin glass mean-field theory is
the so-called p-spin model, which considers $N$ spins
$S_i$ with random p-body interactions with each other
and is described by the Hamiltonian

$$H_p(S) = -\sum_{1 \le i_1 < \cdots < i_p \le N} J_{i_1 \ldots i_p} S_{i_1} \cdots S_{i_p} \qquad [6]$$

where the quenched coupling constants $J_{i_1 \ldots i_p}$ are
assumed to be i.i.d. Gaussian variables with zero
average and $N$-dependent variance $E(J_{i_1 \ldots i_p}^2) = p!/(2N^{p-1})$.
The case $p = 2$ coincides with the SK
model defined above. The reason for
considering the p-spin generalization is that the
order of the transition passes from second order
for $p = 2$ to first order for $p \ge 3$, and that this last
case has been suggested to provide a mean-field limit
for the structural glass transition. It is also useful to
define Hamiltonians

$$H(S) = \sum_{p \ge 1} a_p H_p(S) \qquad [7]$$

that mix p-spin Hamiltonians for different $p$. These
are random Gaussian functions of the spin variables,
with covariance induced by the coupling distribution

$$E[H(S) H(S')] = N f(q(S, S')) = \frac{N}{2} \sum_{p \ge 1} a_p^2\, q(S, S')^p \qquad [8]$$

where the function

$$q(S, S') = \frac{1}{N} \sum_i S_i S'_i$$

is the overlap between configurations. A crucial
hypothesis in the study of relaxation in spin systems
is that any local spin update rule verifying the detailed
balance condition with respect to the Boltzmann-Gibbs
measure gives rise to the same long-time
properties. In this perspective, in Monte Carlo simulations,
it is convenient to use Ising spins with
Metropolis or Glauber dynamics. Much theoretical
progress has been achieved by considering spherical
models, where the spin variables are real numbers
subject to a global spherical constraint $\sum_i S_i^2 = N$ and
evolve according to the following Langevin dynamics:

$$\frac{dS_i(t)}{dt} = -\frac{\partial H(S(t))}{\partial S_i(t)} - \mu(t) S_i(t) + \eta_i(t) \qquad [9]$$





where $\mu(t)$ is a time-dependent multiplier that at
each instant of time ensures that the spherical
constraint is respected, and $\eta_i(t)$ is a thermal white
noise with variance

$$E(\eta_i(t) \eta_j(s)) = 2T \delta_{ij} \delta(t - s) \qquad [10]$$
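A rough numerical sketch of the Langevin dynamics above for the spherical $p = 2$ case is given below. It is an Euler-Maruyama discretization in which, as a simplifying assumption, the multiplier is not computed explicitly: after each unconstrained step the configuration is rescaled back onto the sphere $\sum_i S_i^2 = N$. The function names, the step size, and the system size in the usage note are illustrative choices only.

```python
import math
import random


def sample_couplings(n, rng):
    """Symmetric Gaussian couplings J_ij with zero diagonal and variance 1/N (p = 2 case)."""
    couplings = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            couplings[i][j] = couplings[j][i] = rng.gauss(0.0, 1.0 / math.sqrt(n))
    return couplings


def langevin_step(spins, couplings, temperature, dt, rng):
    """One Euler-Maruyama step followed by projection onto the sphere sum_i S_i^2 = N."""
    n = len(spins)
    # For p = 2, the force -dH/dS_i equals sum_j J_ij S_j.
    force = [sum(couplings[i][j] * spins[j] for j in range(n)) for i in range(n)]
    # Discretized white noise with variance 2 T dt per step.
    noise_scale = math.sqrt(2.0 * temperature * dt)
    trial = [s + f * dt + noise_scale * rng.gauss(0.0, 1.0) for s, f in zip(spins, force)]
    # Rescaling stands in for the multiplier term -mu(t) S_i (an assumption of this sketch).
    scale = math.sqrt(n / sum(s * s for s in trial))
    return [s * scale for s in trial]
```

A run could start from `spins = [1.0] * n` with `couplings = sample_couplings(n, random.Random(0))` and iterate `langevin_step`; the projection keeps the constraint exact at every step, at the price of a discretization error of order `dt` in the dynamics itself.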


In order to model the quench from high temperature
performed in experiments, the initial conditions are
chosen randomly with uniform probability. To
describe long but finite-time dynamics, it is necessary
to consider the limit of large volume $N \to \infty$ at
finite time, which is the only case where one can
have infinite relaxation times. Application of functional
Martin-Siggia-Rose techniques has allowed
the derivation of closed integro-differential equations
for the spin autocorrelation function

$$C(t, t') = \frac{1}{N} \sum_i E(S_i(t) S_i(t'))$$

and the response to an impulsive external field

$$R(t, t') = \frac{1}{N} \sum_i E\left( \frac{\delta S_i(t)}{\delta h_i(t')} \right)$$

where the average is taken over the quenched
disordered couplings, the initial conditions, and the realizations
of the thermal noise. Unfortunately, in the case
$p = 2$ relevant for spin glass phenomenology, the
spherical constraint reduces the model to a linear
system where different eigenmodes of the interaction
matrix $J_{ij}$ evolve independently. This oversimplification
renders the model similar to systems that
describe phase separation rather than freezing
phenomena. Many of the glassy features of the SK
model, however, are captured by a mixture of $p = 2$
with $p = 4$ Hamiltonians and $f(q) = (1/2)(q^2 + a q^4)$.
For the general Hamiltonian [7] one gets the
coupled equations








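$$\frac{\partial C(t, t')}{\partial t} = -\mu(t) C(t, t') + \int_0^{t} dt''\, f''(C(t, t'')) R(t, t'') C(t'', t') + \int_0^{t'} dt''\, f'(C(t, t'')) R(t', t'')$$

$$\frac{\partial R(t, t')}{\partial t} = -\mu(t) R(t, t') + \int_{t'}^{t} dt''\, f''(C(t, t'')) R(t, t'') R(t'', t') \qquad [11]$$

$\mu(t)$ is a multiplier that at each instant of time
ensures the spherical constraint $C(t, t) = 1$, and is
determined by

$$\mu(t) = \int_0^{t} dt''\, \left[ f''(C(t, t'')) C(t, t'') + f'(C(t, t'')) \right] R(t, t'') + T \qquad [12]$$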


In the next sections, we will discuss how these
equations describe dynamical freezing at low temperature.
The gross features are determined by the
form of the function $f(q)$. Two main behaviors can
be identified:

1. Systems of type I. This behavior is found if
$f'''(q)/(f''(q))^{3/2}$ is a monotonically decreasing
function of $q$. To this family belongs the pure
spherical p-spin model for $p \ge 3$, and one finds a
dynamical transition, not corresponding to a
point of singularity in the free energy, where the
Edwards-Anderson parameter jumps discontinuously
to a nonzero value. Models of this family
have been proposed as appropriate mean-field
limits for structural glass behavior.

2. Systems of type II. This behavior is found if
$f'''(q)/(f''(q))^{3/2}$ is a monotonically increasing
function of $q$. This family mimics the behavior
of the SK model. An example of a function $f$
verifying the condition for type II behavior is
$f(q) = (1/2)(q^2 + a q^4)$ for sufficiently small but
positive values of $a$. In this case, the dynamical
transition is found at a point of second-order
singularity of the free energy and the Edwards-Anderson
parameter is continuous at the transition.
Models of this family provide a mean-field
limit for spin glass type behavior.
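The two families can be told apart numerically. The sketch below assumes that the classifying quantity is the ratio $f'''(q)/(f''(q))^{3/2}$, the criterion commonly quoted in the mean-field literature; this assumption, the grid of $q$ values, and all function names are illustrative. It checks the monotonicity of the ratio on $0 < q \le 1$ for a pure p-spin covariance $f(q) = q^p/2$ and for the mixture $f(q) = (q^2 + a q^4)/2$.

```python
def classifying_ratio(q, f2, f3):
    """f'''(q) / f''(q)**1.5, the quantity assumed here to separate type I from type II."""
    return f3(q) / f2(q) ** 1.5


def classify(f2, f3, grid):
    """Return 'type I', 'type II' or 'neither' from the monotonicity of the ratio on the grid."""
    values = [classifying_ratio(q, f2, f3) for q in grid]
    steps = [b - a for a, b in zip(values, values[1:])]
    if all(s < 0 for s in steps):
        return "type I"
    if all(s > 0 for s in steps):
        return "type II"
    return "neither"


def pure_p_spin(p):
    """Second and third derivatives of f(q) = q**p / 2."""
    return (lambda q: 0.5 * p * (p - 1) * q ** (p - 2),
            lambda q: 0.5 * p * (p - 1) * (p - 2) * q ** (p - 3))


def mixed_2_4(a):
    """Second and third derivatives of f(q) = (q**2 + a * q**4) / 2."""
    return (lambda q: 1.0 + 6.0 * a * q * q,
            lambda q: 12.0 * a * q)


GRID = [k / 20 for k in range(1, 21)]  # q values in (0, 1]
```

Calling `classify(*pure_p_spin(3), GRID)` and `classify(*mixed_2_4(0.05), GRID)` illustrates the two cases; with a larger mixing coefficient such as `a = 0.5` the ratio is no longer monotonic on the whole interval, which is why the text restricts type II behavior to sufficiently small positive $a$.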


Equilibrium Dynamics at High Temperature 


At high temperature, after a finite transient, eqn [11]
describes equilibrium behavior. In these conditions,
time-translation invariance holds, $C(t, t') = C(t - t')$,
$R(t, t') = R(t - t')$, while the Lagrange multiplier $\mu$
becomes time independent. In addition, correlation
and response are related by the fluctuation-dissipation
theorem (FDT) relation

$$R(t) = -\frac{1}{T} \frac{dC(t)}{dt} \qquad [13]$$

Ergodic behavior is manifest in the fact that the
dynamics decorrelates completely: $\lim_{t \to \infty} C(t) = 0$.
Then from [11] one gets the equilibrium equation:

$$\frac{dC(t)}{dt} = -T C(t) - \frac{1}{T} \int_0^{t} ds\, f'(C(t - s)) \frac{dC(s)}{ds} \qquad [14]$$


It is worth noticing that this equation, apart from an
irrelevant inertial term, coincides for type I systems
with the schematic mode-coupling theory (MCT)
equation, which has been successfully used to
describe moderately supercooled liquids (Goetze
1989). In the context of liquid theory, mode-coupling
equations stem from an approximate
treatment of the dynamical evolution of the
space- and time-dependent density-density correlation
function. The schematic MCT equations consider
an equation for a single mode, neglecting any
space dependence of the correlator.

Both in type I and in type II systems, eqn [14]
displays a dynamical transition at a finite temperature
$T_c$ where the relaxation time diverges as a
power law $\tau \sim |T - T_c|^{-\gamma}$, with $\gamma > 0$, and the asymptotic value of
the correlation acquires a nonzero value.
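The equilibrium equation [14], read as $dC/dt = -T C(t) - (1/T) \int_0^t ds\, f'(C(t-s))\, dC(s)/ds$ with $C(0) = 1$, can be integrated numerically at high temperature. The following is a minimal sketch using an explicit Euler step and a rectangle rule for the memory integral; the discretization, the function name, and the choice $f(q) = q^3/2$ in the usage note are assumptions made for illustration, and the scheme is not meant to be accurate near the transition, where the relaxation time becomes very long.

```python
def equilibrium_correlation(temperature, f_prime, dt=0.01, n_steps=500):
    """Explicit Euler integration of dC/dt = -T C - (1/T) int_0^t ds f'(C(t-s)) dC/ds."""
    corr = [1.0]   # C(0) = 1 from the spherical constraint
    deriv = []     # stored values of dC/ds on the time grid
    for n in range(n_steps):
        # Rectangle-rule estimate of the memory integral at time t = n * dt.
        memory = sum(f_prime(corr[n - m]) * deriv[m] for m in range(n)) * dt
        deriv.append(-temperature * corr[n] - memory / temperature)
        corr.append(corr[n] + dt * deriv[n])
    return corr
```

For the pure model with $p = 3$ one would pass `f_prime=lambda q: 1.5 * q * q`; well above the transition the returned list decays from 1 towards zero, in line with the complete decorrelation described above.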

This behavior in type I systems represents a failure
of MCT to describe the temperature dependence of
the relaxation time in supercooled liquids, which, as
previously observed, empirically follows the Vogel-Fulcher
law. The MCT temperature is interpreted as
a singularity which is avoided in supercooled
liquids, thanks to relaxation mechanisms specific to
short-range systems. It has been noticed that this
singularity at $T_c$ can be associated with the growth of
spatial heterogeneities and dynamical correlations,
as exemplified by the behavior of the four-point
function


$$\chi_4(t) = \frac{1}{N} \sum_{i,j} \langle S_i(t) S_i(0) S_j(t) S_j(0) \rangle$$


and its associated correlation length (Franz and Parisi
2000, Biroli and Bouchaud 2004).


Off-Equilibrium Dynamics Below $T_c$: Aging
and Slow Dynamics


Type I systems Below the transition temperature
$T_c$, slow dynamics and aging set in. In 1993,
Cugliandolo and Kurchan found a long-time
solution to the equations of motion [11] for type I
systems describing an asymptotic off-equilibrium
state that follows from a high-temperature quench.
Soon after, type II systems were also analyzed
(Bouchaud et al.).

The equations can be analyzed in the limit in
which both times tend to infinity, $t, t' \to \infty$. In this
regime all “one-time quantities,” that is, state
functions like energy, magnetization, etc., reach
asymptotic time-independent limits. Though the
decay to the asymptotic values cannot be read off directly
from the analysis of the equations in that limit,
numerical and theoretical evidence suggests that the
final values are approached as power laws in time.

The study of correlation and response functions 
displays an asymptotic scaling behavior similar to 
the one observed in glassy systems in laboratory and 
numerical experiments. 

Two different interesting regimes are found. First of
all, there is a stationary regime: the limit $t, t_w \to \infty$ is
performed keeping the difference $t - t_w = s$ finite. In
this regime, equilibrium behavior is observed, with
correlation and response related by the FDT relation
$R_{st}(s) = -(1/T)\, \partial C_{st}(s)/\partial s$. The stationary regime is followed
by an aging regime, where correlations decay
below the value $q_{EA} = \lim_{s \to \infty} C_{st}(s)$ down to zero.
One of the most striking features of aging evolution is
that the system, though at a decreasing speed,
constantly moves away from any previously visited region of
configuration space. The decay of correlations is
nonstationary and takes place on a timescale $\tau(t_w)$
diverging for large $t_w$. While the theory can infer the
existence of the timescale $\tau(t_w)$, its precise form
remains undetermined. This is a consequence of an
asymptotic invariance under monotonic time reparametrizations
$t \to g(t)$ appearing for large times.
Consistently with nonstationary behavior, other equilibrium
properties break down in the aging regime.
Correlation and response do not verify the
FDT; rather, they are asymptotically related by a generalized
form of the fluctuation-dissipation relation

$$R_{ag}(t, t') = \frac{X}{T} \frac{\partial C_{ag}(t, t')}{\partial t'} \qquad [15]$$
This relation, despite predicting the vanishing of
the instantaneous response, implies a finite contribution
of the aging dynamics to the value of the
integrated ZFC and TRM responses. The constant
$X$, called the fluctuation-dissipation ratio (FDR), is a
temperature-dependent factor varying monotonically
between the values 1 and 0 as the temperature is
decreased from $T_c$ down to zero. Violations of the
FDT have to be expected in any off-equilibrium
regime; however, a constant ratio between response
and derivative of the correlation is highly nongeneric.
It is of great theoretical importance that the same
constant that governs the FDR between spin autocorrelation
and magnetic response also appears in
the relation between any other conceivable pair of
correlation and conjugate response in the system.
Slow dynamics can be interpreted as motion
between finite-life metastable states with well-defined
free energy $f$ and exponential multiplicity
$\exp(N\Sigma(f))$. The FDR verifies the generalized
thermodynamic relation

$$\frac{\partial \Sigma}{\partial f} = \frac{X}{T} \qquad [16]$$


This relation is in turn intimately related to the
possibility of considering the ratio $T_{eff} = T/X$ as
an effective temperature that governs the
heat exchanges among slow degrees of freedom
(Cugliandolo et al. 1997). Slow degrees of freedom
do not exchange heat with the fast ones, but they
are in equilibrium among themselves at the
temperature $T_{eff}$. The validity of relation [16] has
been put at the basis of a detailed statistical
description of the glassy state (Franz and Virasoro
2000, Biroli and Kurchan 2001, Nieuwenhuizen
2000), which assumes that metastable states with
equal free energy are encountered with equal
probability during the descent to equilibrium.
Modified thermodynamic relations follow, which
condense all the dependence on the thermal
history into the value of the effective temperature.
Given the interest of a thermodynamic description
of the glassy state, many numerical studies have
addressed the problem of the identification and
determination of effective temperatures from the
fluctuation-dissipation relations, and their relation
with configurational entropy. Figure 3 presents the result
of a numerical study on a realistic system,
verifying relation [15]. Experimental
verifications are just beginning, and new
results are expected in the future.
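The shape of a fluctuation-dissipation plot for a type I system can be sketched from the relations above. Integrating the response over $t'$ from $t_w$ to $t$, with the FDT in the stationary regime and relation [15] in the aging regime, gives $\chi_{ZFC} = (1/T) \int_C^1 X(q)\, dq$, where $X(q) = 1$ for $q > q_{EA}$ and $X(q) = X$ for $q < q_{EA}$. The function below encodes this piecewise-linear curve; its name and the parameter values in the usage note are illustrative assumptions.

```python
def zfc_susceptibility(correlation, temperature, q_ea, fdr):
    """Piecewise-linear chi(C) for a constant FDR `fdr` in the aging regime C < q_ea."""
    if not 0.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie in [0, 1]")
    if correlation >= q_ea:
        # Stationary regime: FDT holds, slope -1/T in the (C, chi) plane.
        return (1.0 - correlation) / temperature
    # Aging regime: slope -X/T, continuous with the stationary branch at C = q_ea.
    return (1.0 - q_ea) / temperature + fdr * (q_ea - correlation) / temperature
```

With hypothetical values `temperature=0.5`, `q_ea=0.6` and `fdr=0.4`, the curve has slope $-1/T$ for correlations above `q_ea` and the smaller slope $-X/T$ below it, so the effective temperature $T/X$ can be read off from the second branch.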


Figure 3 Fluctuation-dissipation plot: $\chi_{ZFC}(t, t_w)$ vs. $C(t, t_w)$ in
a model of Lennard-Jones glass for different values of the
waiting time $t_w$. The slope of the curves is equal to the finite-time
FDR divided by the temperature. One observes the characteristic
shape of type I systems with an FDR equal to 1 in the stationary
regime for high correlations and equal to a constant smaller than
1 in the low-correlation aging regime. Reproduced from Kob W
and Barrat J-L (1999) Europhysics Letters 46: 637, with
permission from EDP Sciences.


Type II systems In these systems the dynamic
transition occurs at the point of thermodynamic
singularity, where the Edwards-Anderson parameter
becomes nonzero in a second-order fashion. The
magnetic susceptibility exhibits a cusp singularity
similar to the one found in spin glass materials.
Differently from type I systems, one-time quantities
tend to their equilibrium values for long times. The
off-equilibrium nature of the relaxation shows up in
the behavior of correlations and responses, which
display aging behavior.

Their behavior generalizes the one found in type I
systems, with a more complex pattern of violation
of time-translation invariance and FDT. Also in this
case a short-time equilibrium behavior can be
identified, where the correlation decreases from 1 to
$q_{EA}$, and a long-time inhomogeneous aging behavior,
where correlations decrease to zero. Differently from
type I systems, it is impossible to characterize aging
through a unique timescale $\tau(t_w)$. One finds instead
a continuum of hierarchically organized timescales.
The analysis of the equations at the
reparametrization-invariant level reveals the existence
of a continuum of separate timescales $\tau(t_w, q)$ associated
with each value of $C(t, t_w) = q < q_{EA}$, such that
$\lim_{t_w \to \infty} \tau(t_w, q)/\tau(t_w, q') = 0$ for $q > q'$, meaning that
for large $t_w$ the time to decay to $q'$ is much larger than
the time to decay to $q$. For large times, $1 \ll t_1 \ll t_2 \ll t_3$,
the correlations verify the ultrametric property
$C(t_3, t_1) = \min[C(t_3, t_2), C(t_2, t_1)]$. To each timescale
corresponds in this case a different effective
temperature, and correlation and response are related
by the equation

$$\frac{X(q)}{T} = \lim_{t, t' \to \infty,\ C(t, t') = q} \frac{R(t, t')}{\partial C(t, t')/\partial t'} \qquad [17]$$

where the function $X(q)$ is an increasing function of
$q$ with the properties of a cumulative probability
distribution. In fact it can be seen (Franz et al. 1999)
that this is related to the Parisi overlap probability
function describing the correlations among ergodic
components at equilibrium, in a generalization of
relation [14]. Figure 4 shows the result of a
numerical experiment in a three-dimensional spin
glass, where $X(q)$ is not piecewise constant.

Figure 4 Fluctuation-dissipation plot in a three-dimensional
spin glass at low temperature. As predicted for type II systems,
the FDR is an increasing function of the correlation, constant
only in the stationary part of the relaxation. Reproduced from
Marinari E et al. (1998) Violation of the fluctuation dissipation
theorem in finite dimensional spin glasses. Journal of Physics A:
Mathematical and General 31: 2611, with permission from
Institute of Physics Publishing Ltd.

The ideas presented in this article, fruits of the mean-field
theory of disordered systems, are the object of intense
debate in their application to the physics of short-range
systems. Many of the relations derived have stimulated
a great deal of numerical, experimental, and theoretical work.
Some of the predictions of the theory are very well
verified in many short-range glassy systems, at least on
the accessible timescales. Notably, the violations of the
FDT, and the possibility of associating the values of the
FDR with effective temperatures, are very well verified both
in structural glass models and in finite-range spin
glasses. Since finite aging times imply finite length scales
over which the dynamic variables can exhibit correlated
behavior, this indicates that the mean-field theory is at
least good at describing glassy phenomena on a local
scale. Whether the mean-field theory also gives a
good description in the infinite-time limit, and whether the
anomalous response persists forever, is at present an
open theoretical problem. It relates to the possibility of
having mean-field type equilibrium ergodicity breaking,
which is an open question and the object of active research.


See also: Interacting Stochastic Particle Systems; 
Short-Range Spin Glasses: The Metastate Approach; 
Spin Glasses.

Definitions
Graded Vector Spaces 


By a $\mathbb{Z}$-graded vector space (or simply, graded
vector space) we mean a direct sum $A = \oplus_{i \in \mathbb{Z}} A_i$ of
vector spaces over a field $k$ of characteristic zero.
The $A_i$ are called the components of $A$ of degree $i$
and the degree of a homogeneous element $a \in A$ is
denoted by $|a|$. We also denote by $A[n]$ the graded
vector space with degree shifted by $n$, namely
$A[n] = \oplus_{i \in \mathbb{Z}} (A[n])_i$ with $(A[n])_i = A_{i+n}$. The tensor
product of two graded vector spaces $A$ and $B$ is
again a graded vector space whose degree $r$
component is given by $(A \otimes B)_r = \oplus_{p+q=r} A_p \otimes B_q$.

The symmetric and exterior algebras of a graded
vector space $A$ are defined, respectively, as $S(A) = T(A)/I_S$
and $\Lambda(A) = T(A)/I_\Lambda$, where $T(A) = \oplus_{n \ge 0} A^{\otimes n}$
is the tensor algebra of $A$ and $I_S$ (resp. $I_\Lambda$) is the
two-sided ideal generated by elements of the form
$a \otimes b - (-1)^{|a||b|}\, b \otimes a$ (resp. $a \otimes b + (-1)^{|a||b|}\, b \otimes a$),
with $a$ and $b$ homogeneous elements of $A$. The images
of $A^{\otimes n}$ in $S(A)$ and $\Lambda(A)$ are denoted by $S^n(A)$ and
$\Lambda^n(A)$, respectively. Notice that there is a canonical
décalage isomorphism $S^n(A[1]) \simeq \Lambda^n(A)[n]$.

Graded Algebras and Graded Lie Algebras


We say that $A$ is a graded algebra (of degree zero) if
$A$ is a graded vector space endowed with a degree
zero bilinear associative product $\cdot : A \otimes A \to A$. A
graded algebra is graded commutative if the product
satisfies the condition

$$a \cdot b = (-1)^{|a||b|}\, b \cdot a$$

for any two homogeneous elements $a, b \in A$ of
degree $|a|$ and $|b|$, respectively.

A graded Lie algebra of degree $n$ is a graded
vector space $A$ endowed with a graded Lie bracket
on $A[n]$. Such a bracket can be seen as a degree $-n$
Lie bracket on $A$, that is, as a bilinear operation
$\{\cdot, \cdot\} : A \otimes A \to A[-n]$ satisfying graded antisymmetry
and graded Jacobi relations:

$$\{a, b\} = -(-1)^{(|a|+n)(|b|+n)} \{b, a\}$$

$$\{a, \{b, c\}\} = \{\{a, b\}, c\} + (-1)^{(|a|+n)(|b|+n)} \{b, \{a, c\}\}$$


Graded Poisson Algebras 


We can now define the main object of interest of 
this note: 


Definition 1 A graded Poisson algebra of degree $n$,
or $n$-Poisson algebra, is a triple $(A, \cdot, \{\,,\})$ consisting
of a graded vector space $A = \oplus_{i \in \mathbb{Z}} A_i$ endowed with
a degree zero graded commutative product and with
a degree $-n$ Lie bracket. The bracket is required to
be a biderivation of the product, namely:

$$\{a, b \cdot c\} = \{a, b\} \cdot c + (-1)^{|b|(|a|+n)}\, b \cdot \{a, c\}$$


Notation. Graded Poisson algebras of degree zero 
are called Poisson algebras, while for n=1 one 
speaks of Gerstenhaber (1963) algebras or of 
Schouten algebras. 

Sometimes a $\mathbb{Z}_2$-grading is used instead of a
$\mathbb{Z}$-grading. In this case, one just speaks of even and
odd Poisson algebras.


Example 1 Any graded commutative algebra can 
be seen as a Poisson algebra with the trivial Lie 
structure, and any graded Lie algebra can be seen as 
a Poisson algebra with the trivial product. 


Example 2 The most classical example of a Poisson algebra (already
considered by Poisson himself) is the algebra of smooth functions on
$\mathbb{R}^{2n}$ endowed with usual multiplication and with the
Poisson bracket
$\{f,g\} = \partial_{q^i} f\, \partial_{p_i} g - \partial_{q^i} g\, \partial_{p_i} f$,
where the $p_i$'s and the $q^i$'s, for $i = 1,\dots,n$, are
coordinates on $\mathbb{R}^{2n}$. The bivector field
$\partial_{q^i} \wedge \partial_{p_i}$ is induced by the symplectic
form $\omega = dp_i \wedge dq^i$. An immediate generalization of this
example is the algebra of smooth functions on a symplectic manifold
$(\mathbb{R}^{2n}, \omega)$ with the Poisson bracket
$\{f,g\} = \omega^{ij}\, \partial_i f\, \partial_j g$, where
$\omega^{ij}\, \partial_i \wedge \partial_j$ is the bivector field
defined by the inverse of the symplectic form
$\omega = \omega_{ij}\, dx^i \wedge dx^j$; viz.
$\omega^{ij}\omega_{jk} = \delta^i_k$.

A further generalization is when the bracket on
$C^\infty(\mathbb{R}^m)$ is defined by
$\{f,g\} = \alpha^{ij}\, \partial_i f\, \partial_j g$, with the matrix
function $\alpha$ not necessarily nondegenerate. The bracket is
Poisson if and only if $\alpha$ is skewsymmetric and satisfies


$\alpha^{il}\partial_l \alpha^{jk} + \alpha^{jl}\partial_l \alpha^{ki} + \alpha^{kl}\partial_l \alpha^{ij} = 0$


An example of this, already considered by Lie (1894), is
$\alpha^{ij}(x) = f^{ij}_k x^k$, where the $f^{ij}_k$'s are the
structure constants of some Lie algebra.
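
The linear example can be checked by direct computation. The following is a minimal sketch, assuming the structure constants of so(3) (the Levi-Civita symbol, chosen here only for illustration) and the index convention $\alpha^{ij}(x) = f^{ij}_k x^k$. It evaluates skew-symmetry and the cyclic Jacobi condition $\alpha^{il}\partial_l\alpha^{jk} + \text{cyclic} = 0$ at an integer point; the function names are hypothetical.

```python
from itertools import product

def structure_constant(i, j, k):
    """Levi-Civita symbol on indices 0..2: structure constants of so(3) (illustrative choice)."""
    return (i - j) * (j - k) * (k - i) // 2

def alpha(i, j, x):
    """Linear bivector component alpha^{ij}(x) = f^{ij}_k x^k."""
    return sum(structure_constant(i, j, k) * x[k] for k in range(3))

def d_alpha(l, j, k):
    """Partial derivative of alpha^{jk} with respect to x^l (constant, since alpha is linear)."""
    return structure_constant(j, k, l)

def jacobiator(i, j, k, x):
    """alpha^{il} d_l alpha^{jk} plus its cyclic permutations in (i, j, k)."""
    return sum(
        alpha(i, l, x) * d_alpha(l, j, k)
        + alpha(j, l, x) * d_alpha(l, k, i)
        + alpha(k, l, x) * d_alpha(l, i, j)
        for l in range(3)
    )

x = (2, -3, 5)  # arbitrary test point
skew = all(alpha(i, j, x) == -alpha(j, i, x)
           for i, j in product(range(3), repeat=2))
jacobi = all(jacobiator(i, j, k, x) == 0
             for i, j, k in product(range(3), repeat=3))
print(skew, jacobi)
```

Because $\alpha$ is linear, its derivatives are the structure constants themselves, so the Jacobi condition for the bracket reduces to the Jacobi identity of the Lie algebra.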


Example 3 Example 2 can be generalized to any symplectic manifold
$(M, \omega)$. To every function $h \in C^\infty(M)$ one associates
the Hamiltonian vector field $X_h$, which is the unique vector field
satisfying $i_{X_h}\omega = dh$. The Poisson bracket of two functions
$f$ and $g$ is then defined by


$\{f,g\} = i_{X_f}\, i_{X_g}\, \omega$


In local coordinates, the corresponding Poisson 
bivector field is related to the symplectic form as in 
Example 2. 


A generalization is the algebra of smooth functions on a manifold $M$
with bracket $\{f,g\} = \langle \alpha \mid df \wedge dg \rangle$,
where $\alpha$ is a bivector field (i.e., a section of
$\Lambda^2 TM$) such that $\{\alpha,\alpha\}_{SN} = 0$, where
$\{\cdot,\cdot\}_{SN}$ is the Schouten–Nijenhuis bracket (see the
first subsection in the next section for details, and Example 2 for
the local coordinate expression). Such a bivector field is called a
Poisson bivector field and the manifold $M$ is called a Poisson
manifold. Observe that a Poisson algebra structure on the algebra of
smooth functions on a smooth manifold is necessarily defined this
way. In the symplectic case, the bivector field corresponding to the
Poisson bracket is the inverse of the symplectic form (regarded as a
bundle map $TM \to T^*M$).

The linear case described at the end of Example 2 corresponds to
$M = \mathfrak{g}^*$, where $\mathfrak{g}$ is a (finite-dimensional)
Lie algebra. The Lie bracket $\Lambda^2\mathfrak{g} \to \mathfrak{g}$
is regarded as an element of
$\mathfrak{g} \otimes \Lambda^2\mathfrak{g}^* \subset \Gamma(\Lambda^2 T\mathfrak{g}^*)$
and reinterpreted as a Poisson bivector field on $\mathfrak{g}^*$.
The Poisson algebra structure restricted to polynomial functions is
described at the beginning of the next section.


Batalin-Vilkovisky Algebras 


When $n$ is odd, a generator for the bracket of an $n$-Poisson
algebra $A$ is a degree $-n$ linear map from $A$ to itself,


$\Delta : A \to A[-n]$


such that

$\Delta(a\cdot b) = \Delta(a)\cdot b + (-1)^{|a|}\, a\cdot\Delta(b) + (-1)^{|a|}\, \{a,b\}$


A generator $\Delta$ is called exact if and only if it satisfies the
condition $\Delta^2 = 0$, and in this case $\Delta$ becomes a
derivation of the bracket:


$\Delta(\{a,b\}) = \{\Delta(a),b\} + (-1)^{|a|+1}\, \{a,\Delta(b)\}$


Remark 1 Notice that not every odd Poisson 
algebra A admits a generator. For instance, a 
nontrivial odd Lie algebra seen as an odd Poisson 
algebra with trivial multiplication admits no gen- 
erator. Moreover, even if a generator A for an odd 
Poisson algebra exists, it is far from being unique. In 
fact, all different generators are obtained by adding 
to A a derivation of A of degree —n. 


Definition 2 An $n$-Poisson algebra $A$ is called an
$n$-Batalin–Vilkovisky algebra if it is endowed with an exact
generator.


Notation. When $n = 1$ it is customary to speak of
Batalin–Vilkovisky algebras, or simply BV algebras (see
Batalin–Vilkovisky Quantization; see also Batalin and Vilkovisky
(1983), Getzler (1994), and Koszul (1985)).

There exists a characterization of $n$-Batalin–Vilkovisky algebras
in terms of only the product and the generator (Getzler 1994, Koszul
1985). Suppose in fact that a graded vector space $A$ is endowed with
a degree zero graded commutative product and a linear map
$\Delta : A \to A[-n]$ such that $\Delta^2 = 0$, satisfying the
following "seven-term" relation:

$\Delta(a\cdot b\cdot c) + \Delta(a)\cdot b\cdot c + (-1)^{|a|}\, a\cdot\Delta(b)\cdot c + (-1)^{|a|+|b|}\, a\cdot b\cdot\Delta(c)$
$\quad = \Delta(a\cdot b)\cdot c + (-1)^{|a|}\, a\cdot\Delta(b\cdot c) + (-1)^{(|a|+1)|b|}\, b\cdot\Delta(a\cdot c)$

In other words, $\Delta$ is a derivation of order 2.


Then, if we define the bilinear operation
$\{\,,\} : A \otimes A \to A[-n]$ by


$\{a,b\} = (-1)^{|a|} \left( \Delta(a\cdot b) - \Delta(a)\cdot b - (-1)^{|a|}\, a\cdot\Delta(b) \right)$


we have that the quadruple $(A, \cdot, \{\,,\}, \Delta)$ is an
$n$-Batalin–Vilkovisky algebra. Conversely, one easily checks that
the product and the generator of an $n$-Batalin–Vilkovisky algebra
satisfy the above "seven-term" relation.


Examples 
Schouten-Nijenhuis Bracket 


Suppose $\mathfrak{g}$ is a graded Lie algebra of degree zero. Then
$A = S(\mathfrak{g}[n])$ is a $(-n)$-Poisson algebra with its natural
multiplication (the one induced from the tensor algebra $T(A)$) and a
degree $-n$ bracket defined as follows (Koszul 1985, Krasil'shchik
1988): the bracket on $S^1(\mathfrak{g}[n]) = \mathfrak{g}[n]$ is
defined as the suspension of the bracket on $\mathfrak{g}$, while on
$S^k(\mathfrak{g}[n])$, for $k > 1$, the bracket, often called the
Schouten–Nijenhuis bracket, is defined inductively by forcing the
Leibniz rule


$\{a, b\cdot c\} = \{a,b\}\cdot c + (-1)^{|b|(|a|+n)}\, b\cdot\{a,c\}$


Moreover, when $n$ is odd, there exists a generator defined as


$\Delta(a_1\cdot a_2 \cdots a_k) = \sum_{i<j} (-1)^{\epsilon}\, \{a_i,a_j\}\cdot a_1 \cdots \hat{a}_i \cdots \hat{a}_j \cdots a_k$

where $a_1,\dots,a_k \in \mathfrak{g}$ and
$\epsilon = |a_i| + (|a_i|+1)(|a_1| + \cdots + |a_{i-1}| + i - 1) + (|a_j|+1)(|a_1| + \cdots + \widehat{|a_i|} + \cdots + |a_{j-1}| + j - 2)$.
An easy check shows that $\Delta^2 = 0$, thus
$S(\mathfrak{g}[n])$ is an $n$-Batalin–Vilkovisky algebra for every
odd $n \in \mathbb{N}$. For $n = -1$ the $\Delta$-cohomology on
$\Lambda\mathfrak{g}$ is the usual Cartan–Chevalley–Eilenberg
cohomology.

In particular, one can consider the Lie algebra
$\mathfrak{g} = \mathrm{Der}(B) = \bigoplus_{j\in\mathbb{Z}} \mathrm{Der}^j(B)$
of derivations of a graded commutative algebra $B$. More explicitly,
$\mathrm{Der}^j(B)$ consists of linear maps $\phi : B \to B$ of
degree $j$ such that
$\phi(ab) = \phi(a)b + (-1)^{j|a|}\, a\phi(b)$, and the bracket is
$\{\phi,\psi\} = \phi\circ\psi - (-1)^{|\phi||\psi|}\, \psi\circ\phi$.
The space of multiderivations $S(\mathrm{Der}(B)[-1])$, endowed with
the Schouten–Nijenhuis bracket, is a Gerstenhaber algebra.

We can further specialize to the case when $B$ is the algebra
$C^\infty(M)$ of smooth functions on a smooth manifold $M$; then
$\mathfrak{X}(M) = \mathrm{Der}(C^\infty(M))$ is the space of vector
fields on $M$ and $\mathcal{V}(M) = S(\mathfrak{X}(M)[-1])$ is the
space of multivector fields on $M$. It is a classical result by
Koszul (1985) that there is a bijective correspondence between
generators for $\mathcal{V}(M)$ and connections on the highest
exterior power $\Lambda^{\dim M} TM$ of the tangent bundle of $M$.
Moreover, flat connections correspond to generators which square to
zero.


Lie Algebroids 


A Lie algebroid $E$ over a smooth manifold $M$ is a vector bundle
$E$ over $M$ together with a Lie algebra structure (over
$\mathbb{R}$) on the space $\Gamma(E)$ of smooth sections of $E$, and
a bundle map $\rho : E \to TM$, called the anchor, extended to a map
between sections of these bundles, such that


$\{X, fY\} = f\{X,Y\} + (\rho(X)f)\, Y$


for any smooth sections $X$ and $Y$ of $E$ and any smooth function
$f$ on $M$. In particular, the anchor map induces a morphism of Lie
algebras $\rho_* : \Gamma(E) \to \mathfrak{X}(M)$, namely
$\rho_*(\{X,Y\}) = \{\rho_*(X), \rho_*(Y)\}$.

The link between Lie algebroids and Gerstenhaber 
algebras is given by the following Proposition 
(Kosmann-Schwarzbach and Monterde 2002, Xu 
1999): 


Proposition 1 Given a vector bundle $E$ over $M$, there exists a
one-to-one correspondence between Gerstenhaber algebra structures on
$A = \Gamma(\Lambda(E))$ and Lie algebroid structures on $E$.


The key of the proposition is that one can extend the Lie algebroid
bracket to a unique graded antisymmetric bracket on
$\Gamma(\Lambda(E))$ such that $\{X,f\} = \rho(X)f$ for
$X \in \Gamma(\Lambda^1(E))$ and $f \in \Gamma(\Lambda^0(E))$, and
that for $Q \in \Gamma(\Lambda^{q+1}(E))$, $\{Q,\cdot\}$ is a
derivation of $\Gamma(\Lambda(E))$ of degree $q$.


Example 4 A finite-dimensional Lie algebra g can 
be seen as a Lie algebroid over a trivial base 
manifold. The corresponding Gerstenhaber algebra 
is the one of last subsection. 


Example 5 The tangent bundle TM of a smooth 
manifold M is a Lie algebroid with anchor map 
given by the identity and algebroid Lie bracket given 
by the usual Lie bracket on vector fields. In this 
case, we recover the Gerstenhaber algebra of multi- 
vector fields on M described in the last subsection. 


Example 6 If $M$ is a Poisson manifold with Poisson bivector field
$\alpha$, then the cotangent bundle $T^*M$ inherits a natural Lie
algebroid structure where the anchor map
$\alpha^\sharp : T^*_pM \to T_pM$ at the point $p \in M$ is given by
$\alpha^\sharp(\xi)(\eta) = \alpha(\xi,\eta)$, with
$\xi, \eta \in T^*_pM$, and the Lie bracket of the 1-forms $\omega_1$
and $\omega_2$ is given by


$\{\omega_1,\omega_2\} = L_{\alpha^\sharp(\omega_1)}\omega_2 - L_{\alpha^\sharp(\omega_2)}\omega_1 - d\,\alpha(\omega_1,\omega_2)$


The associated Gerstenhaber algebra is the de Rham algebra of
differential forms endowed with the bracket defined by Koszul (1985).
As shown in Kosmann-Schwarzbach (1995), $\Gamma(\Lambda(T^*M))$ is
indeed a BV algebra with an exact generator $\Delta = [d, i_\alpha]$
given by the commutator of the contraction $i_\alpha$ with the
Poisson bivector $\alpha$ and the de Rham differential $d$. Similar
results hold if $M$ is a Jacobi manifold.


It is natural to ask what additional structure on a Lie algebroid
$E$ makes the Gerstenhaber algebra $\Gamma(\Lambda(E))$ into a BV
algebra. The answer is given by the following result, which is proved
in Xu (1999):


Proposition 2 Given a Lie algebroid $E$, there is a one-to-one
correspondence between generators for the Gerstenhaber algebra
$\Gamma(\Lambda(E))$ and $E$-connections on
$\Lambda^{\mathrm{rk}E}E$ (where $\mathrm{rk}E$ denotes the rank of
the vector bundle $E$). Exact generators correspond to flat
$E$-connections, and in particular, since flat $E$-connections always
exist, $\Gamma(\Lambda(E))$ is always a BV algebra.


Lie Algebroid Cohomology 


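A Lie algebroid structure on $E \to M$ defines a differential
$\delta$ on $\Gamma(\Lambda E^*)$ by


$\delta f := \rho^* df, \qquad f \in C^\infty(M) = \Gamma(\Lambda^0 E^*)$

and


$\langle \delta\alpha, X \wedge Y \rangle := \langle \delta\langle\alpha,X\rangle, Y \rangle - \langle \delta\langle\alpha,Y\rangle, X \rangle - \langle \alpha, \{X,Y\} \rangle, \qquad X, Y \in \Gamma(E),\ \alpha \in \Gamma(E^*)$


where $\rho^* : \Omega^1(M) \to \Gamma(E^*)$ is the transpose of
$\rho_* : \Gamma(E) \to \mathfrak{X}(M)$ and $\langle\,,\rangle$ is
the canonical pairing of sections of $E^*$ and $E$. On
$\Gamma(\Lambda^n E^*)$, with $n \ge 2$, the differential $\delta$ is
defined by forcing the Leibniz rule.

In Example 4 we get the Cartan–Chevalley–Eilenberg differential on
$\Lambda\mathfrak{g}^*$; in Example 5 we recover the de Rham
differential on $\Omega^*(M) = \Gamma(\Lambda T^*M)$, while in
Example 6 the differential on $\mathcal{V}(M) = \Gamma(\Lambda TM)$
is $\{\alpha,\cdot\}_{SN}$.


Lie–Rinehart Algebras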


The algebraic generalization of a Lie algebroid is a Lie–Rinehart
algebra. Recall that, given a commutative associative algebra $B$
(over some ring $R$) and a $B$-module $\mathfrak{g}$, a Lie–Rinehart
algebra structure on $(B,\mathfrak{g})$ is a Lie algebra structure
(over $R$) on $\mathfrak{g}$ and an action of $\mathfrak{g}$ on the
left on $B$ by derivations, satisfying the following compatibility
conditions:


$\{\gamma, a\sigma\} = \gamma(a)\sigma + a\{\gamma,\sigma\}$
$(a\gamma)(b) = a(\gamma(b))$


for every $a, b \in B$ and $\gamma, \sigma \in \mathfrak{g}$.

The Lie–Rinehart structures on the pair $(B,\mathfrak{g})$
bijectively correspond to the Gerstenhaber algebra structures on the
exterior algebra $\Lambda_B(\mathfrak{g})$ of $\mathfrak{g}$ in the
category of $B$-modules. When $\mathfrak{g}$ is of finite rank over
$B$, generators for these structures are in turn in bijective
correspondence with $(B,\mathfrak{g})$-connections on
$\Lambda_B^{\mathrm{rk}_B\mathfrak{g}}\mathfrak{g}$, and flat
connections correspond to exact generators. For additional
discussions, see Gerstenhaber and Schack (1992) and Huebschmann
(1998).

Lie algebroids are Lie–Rinehart algebras in the smooth setting.
Namely, if $E \to M$ is a Lie algebroid, then the pair
$(C^\infty(M), \Gamma(E))$ is a Lie–Rinehart algebra (with action
induced by the anchor and the given Lie bracket).


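Lie–Rinehart Cohomology


Lie algebroid cohomology may be generalized to every Lie–Rinehart
algebra $(B,\mathfrak{g})$. Namely, on the complex
$\mathrm{Alt}_B(\mathfrak{g},B)$ of alternating multilinear
functions on $\mathfrak{g}$ with values in $B$, one can define a
differential $\delta$ by the rules


$\langle \delta a, \gamma \rangle = \gamma(a), \qquad a \in B = \mathrm{Alt}^0_B(\mathfrak{g},B),\ \gamma \in \mathfrak{g}$
$\langle \delta a, \gamma \wedge \sigma \rangle = \langle \delta\langle a,\gamma\rangle, \sigma \rangle - \langle \delta\langle a,\sigma\rangle, \gamma \rangle - \langle a, \{\gamma,\sigma\} \rangle, \qquad \gamma, \sigma \in \mathfrak{g},\ a \in \mathrm{Alt}^1_B(\mathfrak{g},B)$


and forcing the Leibniz rule on elements of
$\mathrm{Alt}^n_B(\mathfrak{g},B)$, $n \ge 2$.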


Hochschild Cohomology 


Let $A$ be an associative algebra with product $\mu$, and consider
the Hochschild cochain complex
$\mathrm{Hoch}(A) = \prod_{n\ge 0} \mathrm{Hom}(A^{\otimes n}, A)[-n+1]$.
There are two basic operations between two elements
$f \in \mathrm{Hom}(A^{\otimes k}, A)[-k+1]$ and
$g \in \mathrm{Hom}(A^{\otimes l}, A)[-l+1]$, namely a degree zero
product


$f \cup g\,(a_1 \otimes \cdots \otimes a_{k+l}) = (-1)^{kl}\, f(a_1 \otimes \cdots \otimes a_k)\cdot g(a_{k+1} \otimes \cdots \otimes a_{k+l})$

and a degree $-1$ bracket
$\{f,g\} = f\circ g - (-1)^{(k-1)(l-1)}\, g\circ f$, where


$f\circ g\,(a_1 \otimes \cdots \otimes a_{k+l-1}) = \sum_{i=1}^{k} (-1)^{(i-1)(l-1)}\, f(a_1 \otimes \cdots \otimes a_{i-1} \otimes g(a_i \otimes \cdots \otimes a_{i+l-1}) \otimes a_{i+l} \otimes \cdots \otimes a_{k+l-1})$


It is well known from Gerstenhaber (1963) that the cohomology
$HH(A)$ of the Hochschild complex with respect to the differential
$d_{\mathrm{Hoch}} = \{\mu, \cdot\}$ has the structure of a
Gerstenhaber algebra. More generally, there is a Gerstenhaber algebra
structure on Hochschild cohomology of differential graded associative
algebras (Loday 1998).
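
To make the composition product concrete, here is a minimal Python sketch. It assumes the standard sign convention $(-1)^{(i-1)(l-1)}$ for the $i$-th insertion and models cochains as ordinary Python functions on integers; the names `circle` and `bracket` are illustrative and not taken from the literature. For a binary operation $\mu$, the composition $\mu\circ\mu$ is the associator, so it vanishes exactly when $\mu$ is associative.

```python
def circle(f, k, g, l):
    """Composition f∘g of a k-ary cochain f with an l-ary cochain g."""
    def composed(*a):
        if len(a) != k + l - 1:
            raise ValueError("expected %d arguments" % (k + l - 1))
        total = 0
        for i in range(1, k + 1):
            # sign of inserting g into the i-th slot of f
            sign = (-1) ** ((i - 1) * (l - 1))
            inner = g(*a[i - 1:i - 1 + l])
            total += sign * f(*a[:i - 1], inner, *a[i - 1 + l:])
        return total
    return composed

def bracket(f, k, g, l):
    """Bracket {f, g} = f∘g - (-1)^{(k-1)(l-1)} g∘f."""
    fg, gf = circle(f, k, g, l), circle(g, l, f, k)
    sign = (-1) ** ((k - 1) * (l - 1))
    return lambda *a: fg(*a) - sign * gf(*a)

def mul(a, b):
    """Associative product on the integers."""
    return a * b

def sub(a, b):
    """A non-associative binary operation, for contrast."""
    return a - b

assoc_mul = circle(mul, 2, mul, 2)  # (ab)c - a(bc)
assoc_sub = circle(sub, 2, sub, 2)  # ((a-b)-c) - (a-(b-c)) = -2c
print(assoc_mul(5, 3, 2), assoc_sub(5, 3, 2))
```

With this convention $\{\mu,\mu\} = 2\,\mu\circ\mu$, so $\{\mu,\mu\} = 0$ expresses associativity, which is what makes $\{\mu,\cdot\}$ square to zero.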


Graded Symplectic Manifolds 


The construction of Example 3 can be extended to graded symplectic
manifolds (see Supermanifolds; see also Alexandrov et al. (1997),
Getzler (1994), and Schwarz (1993)). Recall that a symplectic
structure of degree $n$ on a graded manifold $N$ is a closed
nondegenerate 2-form $\omega$ such that $L_E\omega = n\omega$, where
$L_E$ is the Lie derivative with respect to the Euler field of $N$
(see Roytenberg (2002) for details). Let us denote by $X_h$ the
vector field associated to the function $h \in C^\infty(N)$ by the
formula $i_{X_h}\omega = dh$. Then the bracket


$\{f,g\} = i_{X_f}\, i_{X_g}\, \omega$


gives $C^\infty(N)$ the structure of a graded Poisson algebra of
degree $n$.

If the symplectic form has odd degree and the graded manifold has a
volume form, then it is possible to construct an exact generator
defined by


$\Delta(f) = \frac{1}{2}\, \mathrm{div}(X_f)$


where div is the divergence operator associated to the given volume
form (Getzler 1994, Kosmann-Schwarzbach and Monterde 2002).

An explicit characterization of graded symplectic manifolds has been
given in Roytenberg (2002). In particular, it is proved there that
every symplectic form of degree $n$ with $n \ge 1$ is necessarily
exact. More precisely, one has $\omega = d(i_E\omega/n)$.
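
The exactness statement follows in one line from Cartan's formula. The derivation below is a sketch, assuming that the usual identity $L_E = d\, i_E + i_E\, d$ holds for the degree-zero Euler field $E$ (sign conventions for forms on graded manifolds vary between references).

```latex
\begin{align*}
n\,\omega &= L_E\,\omega = d\,(i_E\omega) + i_E\,(d\omega) \\
          &= d\,(i_E\omega) \qquad (\text{since } d\omega = 0), \\
\omega    &= d\!\left(\frac{i_E\omega}{n}\right) \qquad (n \neq 0).
\end{align*}
```

The argument uses only closedness of $\omega$ and $n \neq 0$, which is why no analogous conclusion is available in degree zero.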


Shifted Cotangent Bundles 


The main examples of graded symplectic manifolds are given by
shifted cotangent bundles. If $N$ is a graded manifold then the
shifted cotangent bundle $T^*[n]N$ is the graded manifold obtained by
shifting by $n$ the degrees of the fibers of the cotangent bundle of
$N$. This graded manifold possesses a nondegenerate closed 2-form of
degree $n$, which can be expressed in local coordinates as


$\omega = \sum_i dx^i \wedge dx^\dagger_i$


where $\{x^i\}$ are local coordinates on $N$ and $\{x^\dagger_i\}$
are coordinate functions on the fibers of $T^*[n]N$. In local
coordinates, the bracket between two homogeneous functions $f$ and
$g$ is given by


$\{f,g\} = \frac{\partial f}{\partial x^i}\frac{\partial g}{\partial x^\dagger_i} - (-1)^{(|f|+n)(|g|+n)}\, \frac{\partial g}{\partial x^i}\frac{\partial f}{\partial x^\dagger_i}$


If in addition the graded manifold $N$ is orientable, then
$T^*[n]N$ has a volume form too; when $n$ is odd, the exact generator
$\Delta(f) = (1/2)\,\mathrm{div}\,X_f$ is written in local
coordinates as


$\Delta = \frac{\partial}{\partial x^i}\frac{\partial}{\partial x^\dagger_i}$


In the case $n = 1$, we have a natural identification between
functions on $T^*[1]N$ and multivector fields $\mathcal{V}(N)$ on
$N$, and we recover again the Gerstenhaber algebra of the subsection
"Schouten–Nijenhuis bracket." Moreover, it is easy to see that, under
the above identification, $\Delta$ applied to a vector field of $N$
is the usual divergence operator.


Examples from Algebraic Topology 


For any $n > 1$, the homology of the $n$-fold loop space
$\Omega^n(M)$ of a topological space $M$ has the structure of an
$(n-1)$-Poisson algebra (May 1972). In particular, the homology of
the double loop space $\Omega^2(M)$ is a Gerstenhaber algebra, and
has an exact generator defined using the natural circle action on
this space (Getzler 1994). The homology of the free loop space
$\mathcal{L}(M)$ of a closed oriented manifold $M$ is also a BV
algebra when endowed with the "Chas–Sullivan intersection product"
and with a generator defined again using the natural circle action on
the free loop space (Cohen and Jones 2002).


Applications 
BRST Quantization in the Hamiltonian Formalism 


The BRST procedure is a method for quantizing classical mechanical
systems or classical field theories in the presence of symmetries
(see BRST Quantization). The starting point is a symplectic manifold
$M$ (the "phase space"), a function $H$ (the "Hamiltonian" of the
system) governing the evolution of the system, and the "constraints"
given by several functions $g_i$ which commute with $H$ and among
each other up to a $C^\infty(M)$-linear combination of the $g_i$'s.

Then the dynamics is constrained on the locus $V$ of common zeros of
the $g_i$'s. When $V$ is a submanifold, the $g_i$'s are a set of
generators for the ideal $I$ of functions vanishing on $V$. Observe
that $I$ is closed under the Poisson bracket. Functions in $I$ are
called "first-class constraints." The Hamiltonian vector fields of
first-class constraints, which are by construction tangential to $V$,
are the "symmetries" of the system.

When $V$ is smooth, it is a coisotropic submanifold of $M$ and the
Hamiltonian vector fields determined by the constraints give a
foliation $F$ of $V$. In the nicest case $V$ is a principal bundle
with $F$ its vertical foliation, and the algebra of functions
$C^\infty(V/F)$ on the "reduced phase space" $V/F$ (see Poisson
Reduction, and Symmetry and Symplectic Reduction) is identified with
the $I$-invariant subalgebra of $C^\infty(M)/I$.

From a physical point of view, the points of $V/F$ are the
interesting states at a classical level, and a quantization of this
system means a quantization of $C^\infty(V/F)$. The BRST procedure
gives a method of quantizing $C^\infty(V/F)$ starting from the
(known) quantization of $C^\infty(M)$. Notice that these notions
immediately generalize to graded symplectic manifolds.

From an algebraic point of view, one starts with a graded Poisson
algebra $P$ and a multiplicative ideal $I$ which is closed under the
Poisson bracket. The algebra of functions on the "reduced phase
space" is replaced by $(P/I)^I$, the $I$-invariant subalgebra of
$P/I$. This subalgebra inherits a Poisson bracket even if $P/I$ does
not. Moreover, the pair $(B,\mathfrak{g}) = (P/I, I/I^2)$ inherits a
graded Lie–Rinehart structure. The "Rinehart complex"
$\mathrm{Alt}_{P/I}(I/I^2, P/I)$ of alternating multilinear functions
on $I/I^2$ with values in $P/I$, endowed with the differential
described in the subsection "Lie–Rinehart cohomology," plays the role
of the de Rham complex of vertical forms on $V$ with respect to the
foliation $F$ determined by the constraints.

In case $V$ is a smooth submanifold, we also have the following
geometric interpretation: let $N^*V$ denote the conormal bundle of
$V$ (i.e., the annihilator of $TV$ in $T_VP$). This is a Lie
subalgebroid of $T^*P$ if and only if $V$ is coisotropic. Since we
may identify $I/I^2$ with sections of $N^*V$ (by the de Rham
differential), $(P/I, I/I^2)$ is the corresponding Lie–Rinehart pair.
The Rinehart complex is then the corresponding Lie algebroid complex
$\Gamma(\Lambda(N^*V)^*)$ with differential described in the
subsection "Lie algebroid cohomology." The image of the anchor map
$N^*V \to TV$ is the distribution determining $F$, so by duality we
get an injective chain map from the vertical de Rham complex to the
Rinehart complex.

The main point of the BRST procedure is to define a chain complex
$C^\bullet = \Lambda(\Psi^* \oplus \Psi) \otimes P$, where $\Psi$ is
a graded vector space, with a coboundary operator $D$ (the "BRST
operator"), and a quasi-isomorphism (i.e., a chain map that induces
an isomorphism in cohomology)


$\pi : (C^\bullet, D) \to (\mathrm{Alt}_{P/I}(I/I^2, P/I), d)$


This means in particular that the zeroth cohomology $H^0_D(C)$ gives
the algebra $(P/I)^I$ of functions on the "reduced phase space."
Observe that there is a natural symmetric inner product on
$\Psi^* \oplus \Psi$ given by the evaluation of $\Psi^*$ on $\Psi$.
This inner product, as an element of
$S^2(\Psi \oplus \Psi^*) \simeq S^2(\Psi) \oplus (\Psi \otimes \Psi^*) \oplus S^2(\Psi^*)$,
is concentrated in the component $\Psi \otimes \Psi^*$, and so it
defines an element in
$\Lambda^2(\Psi[1] \oplus \Psi^*[-1]) \simeq S^2(\Psi)[2] \oplus (\Psi \otimes \Psi^*) \oplus S^2(\Psi^*)[-2]$,
that is, a degree zero bivector field on
$\Psi[1] \oplus \Psi^*[-1]$. It is easy to see that this bivector
field induces a degree zero Poisson structure on
$S(\Psi^*[-1] \oplus \Psi[1])$. From another viewpoint this is the
Poisson structure corresponding to the canonical symplectic structure
on $T^*\Psi[1]$. Finally, we have that
$S(\Psi^*[-1] \oplus \Psi[1]) \otimes P$ is a degree zero Poisson
algebra. Note that the superalgebra underlying the graded algebra
$S(\Psi^*[-1] \oplus \Psi[1]) \otimes P$ is canonically isomorphic to
the complex $C^\bullet = \Lambda(\Psi^* \oplus \Psi) \otimes P$. When
$P = C^\infty(M)$, we can think of
$S(\Psi^*[-1] \oplus \Psi[1]) \otimes C^\infty(M)$ as the algebra of
functions on the graded symplectic manifold
$N = (\Psi[1] \oplus \Psi^*[-1]) \times M$ (the "extended phase
space"). In physical language, coordinate functions on $\Psi[1]$ are
called "ghost fields" while coordinate functions on $\Psi^*[-1]$ are
called "ghost momenta" or, by some authors, "antighost fields" (not
to be confused with the antighosts of the Lagrangian
functional-integral approach to quantization).

Suppose now that there exists an element
$\Theta \in S(\Psi^*[-1] \oplus \Psi[1]) \otimes P$ such that
$\{\Theta, \cdot\} = D$, that one can extend the "known" quantization
of $P$ to a quantization of $S(\Psi^*[-1] \oplus \Psi[1]) \otimes P$
as operators on some (graded) Hilbert space $\mathcal{T}$, and that
the operator $Q$ which quantizes $\Theta$ has square zero. Then one
can consider the "true space of physical states" $H^0_Q(\mathcal{T})$
on which the $\mathrm{ad}_Q$-cohomology of operators will act. This
provides one with a quantization of $(P/I)^I$.

For further details on this procedure, and in particular for the
construction of $D$, we refer to Henneaux and Teitelboim (1992),
Kostant and Sternberg (1987), and Stasheff (1997), and references
therein. Observe that some authors refer to this method as BVF
(Batalin–Vilkovisky–Fradkin) and reserve the name BRST for the case
when the $g_i$'s are the components of an equivariant moment map.
For a generalization to graded manifolds different from
$(\Psi[1] \oplus \Psi^*[-1]) \times M$ we refer to Roytenberg (2002).
There it is proved that the element $\Theta$ exists if the graded
symplectic form has degree different from $-1$.


BV Quantization in the Lagrangian Formalism 


The BV formalism (see Batalin–Vilkovisky Quantization; see also
Batalin and Vilkovisky (1983) and Henneaux and Teitelboim (1992)) is
a procedure for the quantization of physical systems with symmetries
in the Lagrangian formalism. As a first step, the "configuration
space" $M$ of the system is augmented by the introduction of
"ghosts." If $G$ is the group of symmetries, this means that one has
to consider the graded manifold $W = \mathfrak{g}[1] \times M$. The
second step is to double this space by introducing "antifields for
fields and ghosts," namely one has to consider the "extended
configuration space" $T^*[-1]W$, whose space of functions is a BV
algebra (see the subsection "Shifted cotangent bundles"). The algebra
of "observables" is by definition the cohomology
$H_\Delta(C^\infty(T^*[-1]W))$ with respect to the exact generator
$\Delta$.


Related Topics 
AKSZ 


The graded manifold $T^*[-1]W$ considered above is a particular
example of a QP-manifold, that is, of a graded manifold $M$ endowed
with an integrable (i.e., self-commuting) vector field $Q$ of degree
1 and a graded $Q$-invariant symplectic structure $P$. In
quantization of classical mechanical theories, the graded symplectic
manifold of interest is the space of fields and antifields with
symplectic form of degree 1, while $Q$ is the Hamiltonian vector
field defined by the action functional $S$; the integrability of $Q$
is equivalent to the classical master equation $\{S,S\} = 0$ for the
action functional. Quantization of the theory is then reduced to the
computation of the functional integral
$\int_{\mathcal{L}} \exp(iS/\hbar)$, where $\mathcal{L}$ is a
Lagrangian submanifold of $M$. This functional integral actually
depends only on the homology class of the Lagrangian. Locally, a QP
manifold is a shifted cotangent bundle $T^*[-1]N$ and a Lagrangian
submanifold is the graph of an exact 1-form. In the notations of the
subsection "Shifted cotangent bundles," a Lagrangian submanifold
$\mathcal{L}$ is therefore locally defined by equations
$x^\dagger_i = \partial\Phi/\partial x^i$, and the function $\Phi$ is
called a gauge-fixing fermion. The action functional of interest is
then the gauge-fixed action
$S|_{\mathcal{L}} = S(x^i, \partial\Phi/\partial x^i)$.

The language of QP manifolds has powerful applications to sigma
models (see Topological Sigma Models): if $\Sigma$ is a
finite-dimensional graded manifold equipped with a volume element,
and $M$ is a QP manifold, then the graded manifold
$C^\infty(\Sigma, M)$ of smooth maps from $\Sigma$ to $M$ has a
natural structure of QP manifold which describes some field theory if
one arranges for the symplectic structure to be of degree 1. As an
illustrative example, if $\Sigma = T[1]X$, for a compact oriented
three-dimensional smooth manifold $X$, and $M = \mathfrak{g}[1]$,
where $\mathfrak{g}$ is the Lie algebra of a compact Lie group, the
QP manifold $C^\infty(\Sigma, M)$ is relevant to Chern–Simons theory
on $X$. Similarly, if $\Sigma = T[1]X$, for a compact oriented
two-dimensional smooth manifold $X$, and $M = T[1]N$ is the shifted
tangent bundle of a symplectic manifold, then the QP structure on
$C^\infty(\Sigma, M)$ is related to the A-model with target $N$; if
the symplectic manifold $N$ is of the form $N = T^*K$ for a complex
manifold $K$, then one can endow $C^\infty(\Sigma, M)$ with a complex
QP manifold structure, which is related to the B-model with target
$K$; this shows that, in some sense, the B-model can be obtained from
the A-model by "analytic continuation" (Alexandrov et al. 1997). If
$\Sigma = T[1]X$, for a compact oriented two-dimensional smooth
manifold $X$, and $M = T^*[1]N$ with canonical symplectic structure,
then the QP structure on $C^\infty(\Sigma, M)$ is related to the
Poisson sigma model (QP structures on $T^*[1]N$ with canonical
symplectic structure are in one-to-one correspondence with Poisson
structures on $N$). The study of QP manifolds is sometimes referred
to as "the AKSZ formalism". In Roytenberg (2002) QP manifolds with
symplectic structure of degree 2 are studied and shown to be in
one-to-one correspondence with Courant algebroids.


Graded Poisson Algebras from Cohomology of $P_\infty$


The Poisson bracket on a Poisson manifold can be derived from the
Poisson bivector field $\alpha$ using the Schouten–Nijenhuis bracket
as follows:


$\{f,g\} = \{\{\alpha, f\}_{SN}, g\}_{SN}$


This may be generalized to the case of a graded manifold $M$ endowed
with a multivector field $\alpha$ of total degree 2 (i.e.,
$\alpha = \sum_{i=0}^{\infty} \alpha_i$, where $\alpha_i$ is an
$i$-vector field of degree $2 - i$) satisfying the equation
$\{\alpha,\alpha\}_{SN} = 0$. One then has the derived multibrackets


$\lambda_i : A^{\otimes i} \to A$
$\lambda_i(a_1,\dots,a_i) := \{\{\dots\{\{\alpha_i, a_1\}_{SN}, a_2\}_{SN} \dots\}_{SN}, a_i\}_{SN}$


with $A = C^\infty(M)$. Observe that $\lambda_i$ is a
multiderivation of degree $2 - i$. The operations $\lambda_i$ define
the structure of an $L_\infty$-algebra on $A$. Such a structure is
called a $P_\infty$-algebra (P for Poisson) since the $\lambda_i$'s
are multiderivations. If $\lambda_0 = \alpha_0$ vanishes, then
$\lambda_1$ is a differential, and the $\lambda_1$-cohomology
inherits a graded Poisson algebra structure. This structure can be
used to describe the deformation quantization of coisotropic
submanifolds and to describe their deformation theory.


Introduction


Einstein's theory of general relativity states that gravity attracts
light. The deflection angle of a light ray by an object with mass $m$
was predicted to be


$\tilde\alpha = \frac{4Gm}{c^2 r}$   [1]


where $c$ and $G$ are the velocity of light and the gravitational
constant, respectively, and $r$ is the impact parameter. The
quantitative measurement of this light deflection at the solar limb
during the solar eclipse in 1919 with


$\tilde\alpha_\odot = \frac{4GM_\odot}{c^2 R_\odot} \approx 1.74$ arcsec   [2]


(here $m$ is replaced by the solar mass $M_\odot$ and the impact
parameter is the solar radius $R_\odot$) confirmed Einstein's theory.
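
As a quick numerical check of eqn [2], the following Python sketch evaluates $4GM_\odot/(c^2 R_\odot)$ and converts the result to arcseconds. The physical constants are rounded standard values supplied here as assumptions (they are not quoted in the text), and the function name is illustrative.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2 (rounded, assumed)
C = 2.998e8          # speed of light, m s^-1 (rounded, assumed)
M_SUN = 1.989e30     # solar mass, kg (rounded, assumed)
R_SUN = 6.957e8      # solar radius, m (rounded, assumed)
RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0

def deflection_angle(mass, impact_parameter):
    """Deflection angle 4 G m / (c^2 r) of eqn [1], in radians."""
    return 4.0 * G * mass / (C ** 2 * impact_parameter)

alpha_sun = deflection_angle(M_SUN, R_SUN) * RAD_TO_ARCSEC
print(round(alpha_sun, 2))
```

With these rounded inputs the result is about 1.75 arcsec, in line with the value of roughly 1.74 arcsec quoted above; the small difference depends on the adopted constants.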

In the decades following this measurement, 
various aspects of the gravitational lens effect were 
explored theoretically, which include (1) the possi- 
bility of multiple or ring-like images of background 
sources, (2) the use of lensing as a gravitational 
telescope on very faint and distant objects, and 
(3) the possibility of determining Hubble’s constant 
with lensing. Only relatively recently, after the discovery of the
first doubly imaged quasar in 1979, did gravitational lensing become
an observational science. Today gravitational lensing is a booming
part of astrophysics.

Lensing has established itself as a very useful astrophysical tool
with some remarkable successes: with the discovery of multiply imaged
quasars, giant luminous arcs, Einstein rings, and quasar and galactic
microlensing, significant new results have been obtained in areas as
different as cosmology, physics of quasars, and galaxy structure. In
this article, only the aspects of "strong lensing" are treated. More
detailed studies on strong and weak lensing can be found in the
"Further reading" section.


Basics of Gravitational Lensing 


The path, the size, and the cross section of a light 
bundle propagating through spacetime in principle 
are affected by all the matter between the light 
source and the observer. For most practical pur- 
poses, we can assume that the lensing action is 
dominated by a single matter inhomogeneity at 
some location between source and observer. This is 
usually called the “thin-lens approximation”: all the 
action of deflection is thought to take place at a 
single distance. This approach is valid only if the relative
velocities of lens, source, and observer are small compared to the
velocity of light ($v \ll c$) and if the Newtonian potential is small
($|\Phi| \ll c^2$). These two assumptions are justified in all
astronomical cases of interest. The size of a galaxy, for example, is
of order 50 kpc; even a cluster of galaxies is not much larger than
1 Mpc. This "lens thickness" is small compared to the typical
distances of the order of a few Gpc between observer and lens or lens
and background quasar/galaxy, respectively. We assume that the
underlying spacetime is well described by a perturbed
Friedmann–Robertson–Walker metric:


$ds^2 = \left(1 + \frac{2\Phi}{c^2}\right) c^2\, d\tau^2 - a^2(\tau) \left(1 - \frac{2\Phi}{c^2}\right) d\sigma^2$   [3]


A detailed description of optics in curved spacetimes 
and a derivation of the lens equation from Einstein’s 
field equations can be found in Schneider et al. 
(1992, chapters 3 and 4). 


Lens Equation 


The basic setup for such a simplified gravitational lens scenario
involving a point source and a point lens is displayed in Figure 1.
The three ingredients in such a lensing situation are the source S,
the lens L, and the observer O. Light rays emitted from the source
are deflected by the lens. For a point-like lens, there will always
be (at least) two images $S_1$ and $S_2$ of the source. With external
shear, due to the tidal field of objects outside but near the light
bundles, there can be more images. The observer sees the images in
directions corresponding to the tangents to the real incoming light
paths.


Figure 1 The relation between the various angles and distances
involved in the lensing setup can be derived for the case
$\tilde\alpha \ll 1$ and formulated in the lens equation [6].

In Figure 1, the corresponding angles and angular 
diameter distances D,,Ds,D 5 are indicated. (In 
cosmology, the various methods to define distance 
diverge. The relevant distances for gravitational 
lensing are the angular diameter distances.) In the 
thin-lens approximation, the hyperbolic paths are 
approximated by their asymptotes. In the circular- 
symmetric case, the deflection angle is given as 


$$\tilde{\alpha}(\xi) = \frac{4 G M(\xi)}{c^2} \frac{1}{\xi} \qquad [4]$$


where $M(\xi)$ is the mass inside a radius $\xi$. In this 
depiction, the origin is chosen at the observer. From 
the diagram, it can be seen that the following 
relation holds: 


$$\theta D_S = \beta D_S + \tilde{\alpha} D_{LS} \qquad [5]$$


(for $\theta, \beta, \tilde{\alpha} \ll 1$; this condition is fulfilled in 
practically all astrophysically relevant situations). With 
the definition of the reduced deflection angle as 
$\alpha(\theta) = (D_{LS}/D_S)\,\tilde{\alpha}(\theta)$, this can be expressed as 


$$\beta = \theta - \alpha(\theta) \qquad [6]$$


This relation between the positions of images and 
source can easily be derived for a nonsymmetric 
mass distribution as well. In that case, all angles are 
vector valued. The two-dimensional lens equation 
then reads 


$$\vec{\beta} = \vec{\theta} - \vec{\alpha}(\vec{\theta}) \qquad [7]$$


Einstein Radius 


For a point lens of mass $M$, the deflection angle is 
given by eqn [4]. Plugging this deflection angle into 
eqn [6] and using the relation $\xi = D_L \theta$ (cf. Figure 1), 
one obtains 


$$\beta(\theta) = \theta - \frac{D_{LS}}{D_L D_S} \frac{4 G M}{c^2 \theta} \qquad [8]$$


For the special case in which the source lies exactly 
behind the lens ($\beta = 0$), due to the symmetry, a 
ring-like image occurs whose angular radius is called the 
Einstein radius $\theta_E$: 


$$\theta_E = \sqrt{\frac{4 G M}{c^2} \frac{D_{LS}}{D_L D_S}} \qquad [9]$$


The Einstein radius defines the angular scale for a 
lens situation. For a massive galaxy with a mass of 
$M = 10^{12} M_\odot$ at a redshift of $z_L = 0.5$ and a source at 
redshift $z_S = 2.0$ (we used here $H = 50$ km s$^{-1}$ Mpc$^{-1}$ 
as the value of the Hubble constant and an 
Einstein-de Sitter universe), the Einstein radius is 


$$\theta_E \approx 1.8 \sqrt{\frac{M}{10^{12} M_\odot}} \ \text{arcsec} \qquad [10]$$


(note that for cosmological distances, in general, 
$D_{LS} \neq D_S - D_L$!). For a galactic microlensing 
scenario in which stars in the disk of the Milky Way act 
as lenses for stars close to its center, the scale 
defined by the Einstein radius is 


$$\theta_E \approx 0.5 \sqrt{\frac{M}{M_\odot}} \ \text{marcsec} \qquad [11]$$


An application and some illustrations of the point 
lens case can be found in the section on galactic 
microlensing. 
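
As a rough numerical check of the galaxy-scale estimate, the Einstein radius formula can be evaluated directly. The sketch below assumes the standard closed-form angular diameter distance of an Einstein-de Sitter universe (the function `eds_distance` and the rounded constants are illustrative choices, not taken from the text) and uses the quoted values $M = 10^{12} M_\odot$, $z_L = 0.5$, $z_S = 2.0$, and $H = 50$ km s$^{-1}$ Mpc$^{-1}$.

```python
import math

# Physical constants in SI units (rounded values).
G = 6.674e-11                 # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8                   # speed of light, m/s
M_SUN = 1.989e30              # solar mass, kg
MPC = 3.0857e22               # one megaparsec, m
ARCSEC = math.pi / (180.0 * 3600.0)   # one arcsecond, rad


def eds_distance(z1, z2, hubble_km_s_mpc):
    """Angular diameter distance (m) between redshifts z1 < z2 in an
    Einstein-de Sitter universe (assumed closed-form expression)."""
    hubble_si = hubble_km_s_mpc * 1.0e3 / MPC          # H in 1/s
    return (2.0 * C / hubble_si) / (1.0 + z2) * (
        (1.0 + z1) ** -0.5 - (1.0 + z2) ** -0.5
    )


def einstein_radius(mass_kg, d_l, d_s, d_ls):
    """Einstein radius in radians for a point lens."""
    return math.sqrt(4.0 * G * mass_kg / C**2 * d_ls / (d_l * d_s))


d_l = eds_distance(0.0, 0.5, 50.0)
d_s = eds_distance(0.0, 2.0, 50.0)
d_ls = eds_distance(0.5, 2.0, 50.0)
theta_e = einstein_radius(1.0e12 * M_SUN, d_l, d_s, d_ls) / ARCSEC

print(f"D_L = {d_l / MPC:.0f} Mpc, D_S = {d_s / MPC:.0f} Mpc, "
      f"D_LS = {d_ls / MPC:.0f} Mpc")
print(f"theta_E = {theta_e:.2f} arcsec")
```

Under these assumptions the result lands close to the arcsecond scale quoted for a massive galaxy, and the printed distances illustrate that the lens-source distance is not the difference of the other two.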


Critical Surface Mass Density 


In the more general case of a three-dimensional mass 
distribution of an extended lens, the density $\rho(\vec{r})$ can 
be projected along the line of sight onto the lens 
plane to obtain the two-dimensional surface mass 
density distribution $\Sigma(\vec{\xi})$ as 


$$\Sigma(\vec{\xi}) = \int_0^{D_S} \rho(\vec{r})\,dz \qquad [12]$$


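Here $\vec{r}$ is a three-dimensional vector in space, and $\vec{\xi}$ 
is a two-dimensional vector in the lens plane. The 
two-dimensional deflection angle $\vec{\tilde{\alpha}}$ is then given as 
the sum over all mass elements in the lens plane: 


$$\vec{\tilde{\alpha}}(\vec{\xi}) = \frac{4G}{c^2} \int \frac{(\vec{\xi} - \vec{\xi}')\,\Sigma(\vec{\xi}')}{|\vec{\xi} - \vec{\xi}'|^2}\,d^2\xi' \qquad [13]$$


For a finite circle with constant surface mass density 
$\Sigma$, the deflection angle can be written as 


$$\alpha(\xi) = \frac{D_{LS}}{D_S} \frac{4G}{c^2} \frac{\Sigma \pi \xi^2}{\xi} \qquad [14]$$


With $\xi = D_L \theta$ this simplifies to 


$$\alpha(\theta) = \frac{4 \pi G \Sigma}{c^2} \frac{D_L D_{LS}}{D_S}\,\theta \qquad [15]$$


With the definition of the critical surface mass 
density $\Sigma_{crit}$ as 


$$\Sigma_{crit} = \frac{c^2}{4 \pi G} \frac{D_S}{D_L D_{LS}} \qquad [16]$$


the deflection angle for such a mass distribution 
can be expressed as 


$$\alpha(\theta) = \frac{\Sigma}{\Sigma_{crit}}\,\theta \qquad [17]$$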





The critical surface mass density can be visualized as 
the lens mass $M$ “smeared out” over the area of the 
Einstein ring: $\Sigma_{crit} = M/(\pi R_E^2)$, where $R_E = \theta_E D_L$. 
The value of the critical surface mass density is 
roughly $\Sigma_{crit} \approx 0.8$ g cm$^{-2}$ for lens and source 
redshifts of $z_L = 0.5$ and $z_S = 2.0$, respectively. For an 
arbitrary mass distribution, the condition $\Sigma > \Sigma_{crit}$ 
at any point is sufficient to produce multiple images. 


Image Positions and Magnifications 


The lens equation [6] can be re-formulated in the 
case of a single point lens: 


$$\beta = \theta - \frac{\theta_E^2}{\theta} \qquad [18]$$


Solving this for the image positions $\theta$, one finds that 
an isolated point lens always produces two 
images of a background source. The positions of 
the images are given by the two solutions: 


$$\theta_{1,2} = \frac{1}{2} \left(\beta \pm \sqrt{\beta^2 + 4 \theta_E^2}\right) \qquad [19]$$


The magnification of an image is defined by the 
ratio between the solid angles of the image and the 
source, since the surface brightness is conserved. 
Hence, the magnification $\mu$ is given as 


$$\mu = \frac{\theta}{\beta} \frac{d\theta}{d\beta} \qquad [20]$$


In the symmetric case above, the image magnification 
can be written as (by using the lens equation) 


$$\mu_{1,2} = \left(1 - \left[\frac{\theta_E}{\theta_{1,2}}\right]^4\right)^{-1} = \frac{1}{2} \pm \frac{u^2 + 2}{2u\sqrt{u^2 + 4}} \qquad [21]$$


Here we defined $u$ as the “impact parameter,” the 
angular separation between lens and source in units 
of the Einstein radius: $u = \beta/\theta_E$. The magnification 
of one image (the one inside the Einstein radius) is 
negative. This means it has negative parity: it is 
mirror-inverted. For $\beta \to 0$ the magnification 
diverges. In the limit of geometrical optics, the 
Einstein ring of a point source has infinite 
magnification! (Due to the fact that physical objects have a 
finite size, and also because at some limit wave 
optics has to be applied, in reality the magnification 
stays finite.) The sum of the absolute values of the 
two image magnifications is the measurable total 
magnification $\mu$: 


$$\mu = |\mu_1| + |\mu_2| = \frac{u^2 + 2}{u\sqrt{u^2 + 4}} \qquad [22]$$


Note that this value is (always) larger than 1! (This 
does not violate energy conservation, since this is the 
magnification relative to an “empty” universe and 
not relative to a “smoothed out” universe. This issue 
is treated in detail in Schneider et al. (1992, chapter 
4.5).) The “sum” of the two image magnifications is 
unity: 


$$\mu_1 + \mu_2 = 1 \qquad [23]$$
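
The point-lens relations of this section are easy to verify numerically. The following minimal sketch works in units of the Einstein radius (so `theta_e = 1` by default); the function names and the sample value `u = 0.5` are illustrative choices. It solves the lens equation for the two image positions, evaluates the signed magnification of each image, and checks both the total magnification and the signed sum.

```python
import math


def point_lens_images(beta, theta_e=1.0):
    """Two image positions of a point lens for source position beta."""
    root = math.sqrt(beta**2 + 4.0 * theta_e**2)
    return 0.5 * (beta + root), 0.5 * (beta - root)


def image_magnification(theta, theta_e=1.0):
    """Signed magnification of the image at angular position theta."""
    return 1.0 / (1.0 - (theta_e / theta) ** 4)


def total_magnification(u):
    """Sum of the absolute image magnifications for impact parameter u."""
    return (u**2 + 2.0) / (u * math.sqrt(u**2 + 4.0))


u = 0.5                                   # source offset in Einstein radii
theta_1, theta_2 = point_lens_images(u)
mu_1 = image_magnification(theta_1)       # image outside the Einstein ring
mu_2 = image_magnification(theta_2)       # image inside, negative parity

print(f"images at {theta_1:.4f} and {theta_2:.4f} Einstein radii")
print(f"mu_1 = {mu_1:.4f}, mu_2 = {mu_2:.4f}")
print(f"|mu_1| + |mu_2| = {abs(mu_1) + abs(mu_2):.4f}")
print(f"mu_1 + mu_2 = {mu_1 + mu_2:.4f}")
```

The image inside the Einstein ring comes out with negative magnification, which is the mirror-inverted image described above.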


(Non)Singular Isothermal Sphere 


A popular model for galaxy lenses is the singular 
isothermal sphere with a three-dimensional density 
distribution of 


$$\rho(r) = \frac{\sigma_v^2}{2 \pi G} \frac{1}{r^2} \qquad [24]$$


where $\sigma_v$ is the one-dimensional velocity dispersion. 
Projecting the matter on a plane, one obtains the 
circularly symmetric surface mass distribution 


$$\Sigma(\xi) = \frac{\sigma_v^2}{2G} \frac{1}{\xi} \qquad [25]$$


With $M(\xi) = \int_0^\xi \Sigma(\xi')\,2\pi\xi'\,d\xi'$ plugged into eqn [4], 
one obtains the deflection angle for an isothermal 
sphere, which is a constant (i.e., independent of the 
impact parameter $\xi$): 


$$\tilde{\alpha}(\xi) = 4\pi \frac{\sigma_v^2}{c^2} \qquad [26]$$


In practical units for the velocity dispersion of a 
galaxy, this can be expressed as 


$$\tilde{\alpha}(\xi) = 1.15 \left(\frac{\sigma_v}{200\ \text{km s}^{-1}}\right)^2 \text{arcsec} \qquad [27]$$


Two generalizations of this isothermal model are 
commonly used. First, models with finite cores are more 
realistic for (spiral) galaxies. In this case, the 
deflection angle is modified to (core radius $\xi_c$): 


$$\tilde{\alpha}(\xi) = 4\pi \frac{\sigma_v^2}{c^2} \frac{\xi}{(\xi_c^2 + \xi^2)^{1/2}} \qquad [28]$$


Furthermore, a realistic galaxy lens usually is not 
perfectly symmetric but is slightly elliptical. Depending 
on whether one wants an elliptical mass 
distribution or an elliptical potential, various 
formalisms are in use. 


Lens Mapping 


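In the vicinity of an arbitrary point, the lens 
mapping as shown in eqn [7] can be described by 
its Jacobian matrix $A$: 


$$A = \frac{\partial \vec{\beta}}{\partial \vec{\theta}} = \left(\delta_{ij} - \frac{\partial \alpha_i(\vec{\theta})}{\partial \theta_j}\right) = \left(\delta_{ij} - \frac{\partial^2 \psi(\vec{\theta})}{\partial \theta_i \partial \theta_j}\right) \qquad [29]$$


Here we made use of the fact that the deflection 
angle can be expressed as the gradient of an effective 
two-dimensional scalar potential $\psi$: $\vec{\nabla}_\theta \psi = \vec{\alpha}$, where 


$$\psi(\vec{\theta}) = \frac{D_{LS}}{D_L D_S} \frac{2}{c^2} \int \Phi(\vec{r})\,dz \qquad [30]$$


and $\Phi(\vec{r})$ is the Newtonian potential of the lens. The 
determinant of the Jacobian $A$ is the inverse of the 
magnification: 


$$\mu = \frac{1}{\det A} \qquad [31]$$


Defining 


$$\psi_{ij} = \frac{\partial^2 \psi}{\partial \theta_i \partial \theta_j} \qquad [32]$$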


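the Laplacian of the effective potential $\psi$ is twice the 
convergence: 


$$\psi_{11} + \psi_{22} = 2\kappa = \mathrm{tr}\,\psi_{ij} \qquad [33]$$


With the definitions of the components of the 
external shear $\gamma$, 


$$\gamma_1(\vec{\theta}) = \frac{1}{2}(\psi_{11} - \psi_{22}) = \gamma(\vec{\theta}) \cos[2\varphi(\vec{\theta})] \qquad [34]$$


and 


$$\gamma_2(\vec{\theta}) = \psi_{12} = \psi_{21} = \gamma(\vec{\theta}) \sin[2\varphi(\vec{\theta})] \qquad [35]$$


(where the angle $\varphi$ reflects the direction of the 
shear-inducing tidal force relative to the coordinate 
system), the Jacobian matrix can be written as 


$$A = \begin{pmatrix} 1 - \kappa - \gamma_1 & -\gamma_2 \\ -\gamma_2 & 1 - \kappa + \gamma_1 \end{pmatrix} = (1 - \kappa) \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - \gamma \begin{pmatrix} \cos 2\varphi & \sin 2\varphi \\ \sin 2\varphi & -\cos 2\varphi \end{pmatrix} \qquad [36]$$


The magnification can now be expressed as a 
function of the local convergence $\kappa$ and the local 
shear $\gamma$: 


$$\mu = (\det A)^{-1} = \frac{1}{(1 - \kappa)^2 - \gamma^2} \qquad [37]$$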


Locations at which det A =0 have formally infinite 
magnification. They are called “critical curves” in 
the lens plane. The corresponding locations in the 
source plane are the “caustics.” For spherically 
symmetric mass distributions, the critical curves are 
circles. For a point lens, the caustic degenerates into 
a point. For elliptical lenses or spherically symmetric 
lenses plus external shear, the caustics consist of 
cusps and folds. 
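
The link between the Jacobian matrix and the magnification can be illustrated with a few lines of code. This is a minimal sketch: the function names are illustrative, and the sample values of convergence 0.36 and shear 0.44 are the ones quoted later in this article for image A of Q2237 + 0305. It builds the 2x2 matrix from convergence, shear, and shear direction, and inverts its determinant.

```python
import math


def lens_jacobian(kappa, gamma, phi=0.0):
    """Jacobian matrix of the lens mapping for convergence kappa,
    shear magnitude gamma, and shear direction phi (radians)."""
    gamma_1 = gamma * math.cos(2.0 * phi)
    gamma_2 = gamma * math.sin(2.0 * phi)
    return [
        [1.0 - kappa - gamma_1, -gamma_2],
        [-gamma_2, 1.0 - kappa + gamma_1],
    ]


def magnification(kappa, gamma, phi=0.0):
    """Signed magnification, the inverse of the Jacobian determinant."""
    a = lens_jacobian(kappa, gamma, phi)
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return 1.0 / det


mu = magnification(0.36, 0.44)
mu_rotated = magnification(0.36, 0.44, phi=0.7)   # shear direction is irrelevant
print(f"mu = {mu:.3f}, rotated shear: mu = {mu_rotated:.3f}")
```

The determinant depends on the shear only through its magnitude, so rotating the shear direction leaves the magnification unchanged; the determinant approaches zero, and the magnification diverges, as a critical curve is approached.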


Time Delay and Fermat’s Theorem 


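The deflection angle is the gradient of an effective 
lensing potential $\psi$. Hence, the lens equation can be 
rewritten as 


$$(\vec{\theta} - \vec{\beta}) - \vec{\nabla}_\theta \psi = 0 \qquad [38]$$


or 


$$\vec{\nabla}_\theta \left(\frac{1}{2}(\vec{\theta} - \vec{\beta})^2 - \psi\right) = 0 \qquad [39]$$


The term in brackets appears as well in the physical 
time delay function for gravitationally lensed 
images: 


$$\tau(\vec{\theta}, \vec{\beta}) = \tau_{geom} + \tau_{grav} = \frac{1 + z_L}{c} \frac{D_L D_S}{D_{LS}} \left(\frac{1}{2}(\vec{\theta} - \vec{\beta})^2 - \psi(\vec{\theta})\right) \qquad [40]$$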


This time delay surface is a function of the image 
geometry ($\vec{\theta}$, $\vec{\beta}$), the gravitational potential $\psi$, and 
the distances $D_L$, $D_S$, and $D_{LS}$. The first part, the 
geometrical time delay $\tau_{geom}$, reflects the extra path 
length compared to the direct line between observer 
and source. The second part, the gravitational time 
delay $\tau_{grav}$, is the retardation due to the gravitational 
potential of the lensing mass (known and confirmed 
as Shapiro delay in the solar system). From eqns [39] 
and [40], it follows that the gravitationally lensed 
images appear at locations that correspond to 
extrema in the light travel time, which reflects 
Fermat’s principle in gravitational-lensing optics. 
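
Fermat's principle can be checked for the point lens with a small numerical experiment. The sketch below assumes the one-dimensional point-lens potential $\psi(\theta) = \theta_E^2 \ln|\theta|$, chosen here because its derivative reproduces the point-lens deflection $\theta_E^2/\theta$; that potential, the function names, and the source position are illustrative assumptions. It confirms that the bracketed time-delay term is stationary at the two image positions.

```python
import math


def fermat_potential(theta, beta, theta_e=1.0):
    """Bracketed term of the time delay function for a point lens,
    with the assumed potential psi = theta_e^2 * ln|theta|."""
    return 0.5 * (theta - beta) ** 2 - theta_e**2 * math.log(abs(theta))


def central_difference(func, x, step=1.0e-6):
    """Numerical first derivative of func at x."""
    return (func(x + step) - func(x - step)) / (2.0 * step)


beta = 0.3                                     # source position, Einstein radii
root = math.sqrt(beta**2 + 4.0)
images = (0.5 * (beta + root), 0.5 * (beta - root))

slopes = [
    central_difference(lambda t: fermat_potential(t, beta), theta)
    for theta in images
]
for theta, slope in zip(images, slopes):
    print(f"theta = {theta:+.4f}: d(tau)/d(theta) = {slope:.2e}")
```

Both slopes vanish to within the accuracy of the finite difference, while a generic position such as the source position itself gives a clearly nonzero slope.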

The (angular-diameter) distances that appear in 
eqn [40] depend on the value of the Hubble 
constant. Therefore, it is possible to determine the 
latter by measuring the time delay between different 
images and using a good model for the effective 
gravitational potential $\psi$ of the lens. 


Lensing Phenomena 


Strong lensing phenomena involve multiple images, 
caustics, critical lines, usually a significant magnifi- 
cation, and large distortions if extended sources are 
involved. Below we discuss the most frequent strong 
lensing phenomena. 


Galactic Microlensing 


The conceptually simplest strong lensing scenario is 
a foreground star acting as a lens on a background 
star. Since stars in the Milky Way move relative to 
each other, this can be observed as a time-variable 
situation: due to the relative motion between 
observer, lensing star, and source star, the projected 
impact parameter between lens and source changes 
with time and produces a time-dependent magnifi- 
cation. If the impact parameter is smaller than an 
Einstein radius ($u < 1$), then the magnification is 
$\mu > 1.34$ (cf. eqn [22]). 

For an extended source, a sequence of image 
configurations with decreasing impact parameter 
is illustrated in Figure 2 for five instants of time. 
The separation of the two images is of order two 
Einstein radii when they are of comparable 
magnification, which corresponds to only about 
1 marcsec in a realistic situation in the Milky 
Way. Hence, the two images cannot be resolved 
individually; we can only observe the combined 
brightness of the image pair. This is illustrated in 
Figures 3 and 4, which show the relative tracks 
and the respective light curves for five values of 
the minimum impact parameter $u_{min}$. 







Figure 2 Five snapshots of a gravitational lens situation with a 
point lens and an extended source: from left to right the 
alignment between lens and source gets better and better, until 
it is perfect in the rightmost panel. This results in the image of an 
“Einstein ring.” 

Figure 3 Five relative tracks between background star and 
foreground lens (indicated as the central star) parametrized by 
the impact parameter $u_{min}$. The dashed line indicates the 
Einstein ring for the lens. 

Figure 4 Five microlensing light curves for the tracks indicated 
in Figure 3, parametrized by the impact parameter $u_{min}$. The 
vertical axis is the magnification in astronomical magnitudes 
relative to the unlensed case, the horizontal axis displays the 
time in “normalized” units. 


Quantitatively, the total magnification $\mu = |\mu_1| + |\mu_2|$ 
of the two images (cf. eqn [22]) entirely depends 
on the impact parameter $u(t) = r(t)/R_E$ between the 
lensed star and the lensing object, measured in the 
lens plane (here $R_E$ is the Einstein radius of the lens, 
i.e., the radius at which a circular image appears for 
perfect alignment between source, lens, and observer, 
cf. Figure 2, rightmost panel): 


$$\mu(u) = \frac{u^2 + 2}{u\sqrt{u^2 + 4}} \qquad [41]$$
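
A microlensing light curve follows from this magnification once the track of the source is specified. The sketch below assumes uniform straight-line relative motion, so that $u(t) = \sqrt{u_{min}^2 + (t/t_0)^2}$ with time measured in units of the Einstein-radius crossing time $t_0$; this parametrization, the function names, and the sample value $u_{min} = 0.5$ are illustrative assumptions.

```python
import math


def magnification(u):
    """Total point-lens magnification for impact parameter u."""
    return (u * u + 2.0) / (u * math.sqrt(u * u + 4.0))


def light_curve(u_min, times):
    """Magnification along a straight track with closest approach u_min.
    Times are in units of the Einstein-radius crossing time, with t = 0
    at closest approach (assumed uniform relative motion)."""
    return [magnification(math.hypot(u_min, t)) for t in times]


times = [0.1 * k for k in range(-30, 31)]
curve = light_curve(0.5, times)
peak = max(curve)
peak_magnitudes = 2.5 * math.log10(peak)     # brightening in magnitudes

print(f"peak magnification = {peak:.3f} ({peak_magnitudes:.2f} mag)")
print(f"magnification at u = 1: {magnification(1.0):.3f}")
```

The curve is symmetric about the time of closest approach, and the value at $u = 1$ reproduces the threshold magnification of about 1.34 for a source at one Einstein radius.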


The timescale of such a “microlensing event” is 
defined as the time it takes for the source to cross 
the Einstein radius. With realistic values for distances 
and relative velocity, this can be expressed as 


$$t_0 = \frac{R_E}{v_\perp} \approx 0.214\ \text{yr}\ \sqrt{\frac{M}{M_\odot}} \sqrt{\frac{D_L}{10\ \text{kpc}}} \sqrt{1 - \frac{D_L}{D_S}} \left(\frac{v_\perp}{200\ \text{km s}^{-1}}\right)^{-1} \qquad [42]$$


(here $v_\perp$ is the (relative) transverse velocity of the 
lens; we applied the simple relation $D_{LS} = D_S - D_L$, 
which is valid here). 
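
The event timescale can be evaluated directly from its definition as the Einstein radius in the lens plane divided by the transverse velocity. In the sketch below the lens at 4 kpc, the source at 8 kpc, and the velocity of 200 km/s are hypothetical illustrative values, and the constants are rounded.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg
KPC = 3.0857e19      # one kiloparsec, m
DAY = 86400.0        # one day, s


def einstein_crossing_time(mass_kg, d_l, d_s, v_perp):
    """Time (s) to cross one Einstein radius in the lens plane,
    using D_LS = D_S - D_L (valid for Galactic distances)."""
    d_ls = d_s - d_l
    r_e = math.sqrt(4.0 * G * mass_kg / C**2 * d_l * d_ls / d_s)
    return r_e / v_perp


# Hypothetical event: solar-mass lens halfway to a source at 8 kpc.
t0_days = einstein_crossing_time(M_SUN, 4.0 * KPC, 8.0 * KPC, 200.0e3) / DAY
t0_heavy = einstein_crossing_time(4.0 * M_SUN, 4.0 * KPC, 8.0 * KPC, 200.0e3) / DAY

print(f"t_0 = {t0_days:.1f} days for one solar mass")
print(f"t_0 = {t0_heavy:.1f} days for four solar masses")
```

Quadrupling the mass only doubles the timescale, and the same duration could equally be produced by a different combination of mass, distance, and velocity.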

Note that from eqn [42] it is obvious that it is not 
possible to determine the mass of the lens from one 
individual microlensing event. The duration of an 
event is determined by three unknown parameters: 
the mass of the lens $M$, the transverse velocity $v_\perp$, 
and the distance of the lens $D_L$ (assuming we know 
the distance to the source). It is impossible to 
disentangle these for individual events. Only with a 
model for the spatial and velocity distribution of the 
lensing stars in the Milky Way can one obtain 
approximate information about the masses of the 
lensing objects. 

In 1986, Bohdan Paczyński suggested using this 
microlensing method as an observational test for 
potential dark matter candidates in the halo of the 
Milky Way. If the dark matter is in the form of 
astrophysical objects (such as brown dwarfs, neutron 
stars, or black holes, sometimes called “MACHOs” 
for MAssive Compact Halo Objects), then they 
should occasionally act as lenses on stars in the 
neighboring galaxy, the Large Magellanic Cloud. It 
turned out that too few such microlensing events 
were observed to explain the dark matter 
this way. 

However, this method produced more than 2000 
microlensing events by ordinary stars in the direc- 
tion to the center of the Milky Way. Two of these 
events provide convincing evidence for a planet 
accompanying the lensing star. It is likely that 
gravitational microlensing will provide a statistically 
very valuable sample of extrasolar planets, because 
in contrast to most other methods these planets are 
pre-selected by their host stars. Furthermore, micro- 
lensing is sensitive to masses as low as a few Earth 
masses. 





Multiply-Imaged Quasars 


The first gravitationally lensed double quasar was 
discovered in 1979: two images of the same quasar, 
separated by about 6 arcsec. This led to the field of 
gravitational lensing as an observational science. By 
now, more than 120 multiply imaged quasars are 
known, mostly double and quadruple images. They 


span image separations from 0.3 arcsec to almost 30 
arcsec. 

Gravitationally lensed quasar systems are studied 
individually in great detail to get a better under- 
standing of both lens and source. The lens systems 
are analyzed statistically as well, in order to get 
information about the population of lenses (and 
quasars) in the universe, their distribution in 
distance (i.e., cosmic time) and mass, and hence 
about the cosmological model. 


Time delay and Hubble constant. As stated above, 
the signals from a gravitational lens system reach us 
with a certain “time delay” $\Delta t$, so that the measured 
fluxes as functions of time, $I_A(t)$ and $I_B(t)$, can be 
described as: $I_B(t) = \text{const} \times I_A(t + \Delta t)$. Any intrinsic 
fluctuation of the quasar shows up in both 
images, in general with an overall offset in apparent 
magnitude and an offset in time. 
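
The relation between the two light curves suggests a simple way to estimate the delay: shift one curve against the other until their ratio is as constant as possible. The sketch below applies this idea to noiseless, evenly sampled synthetic data; the variability function, the flux ratio, and the delay used to build the data are invented for illustration. Real monitoring data are noisy and irregularly sampled and call for more careful statistical methods.

```python
import math


def intrinsic_flux(t):
    """Hypothetical smooth quasar variability (arbitrary flux units)."""
    return 10.0 + 2.0 * math.sin(0.013 * t) + 1.0 * math.sin(0.041 * t + 1.0)


TRUE_DELAY = 417     # days; used only to construct the synthetic data
FLUX_RATIO = 0.7     # constant offset between the two images

days = range(1500)
flux_a = [intrinsic_flux(t) for t in days]
flux_b = [FLUX_RATIO * intrinsic_flux(t + TRUE_DELAY) for t in days]


def estimate_delay(flux_a, flux_b, max_lag):
    """Return the integer lag (days) for which flux_b[k] / flux_a[k + lag]
    is closest to constant, measured by the variance of the log ratio."""
    best_lag, best_score = 0, float("inf")
    for lag in range(max_lag + 1):
        log_ratios = [
            math.log(flux_b[k] / flux_a[k + lag])
            for k in range(len(flux_a) - lag)
        ]
        mean = sum(log_ratios) / len(log_ratios)
        score = sum((x - mean) ** 2 for x in log_ratios) / len(log_ratios)
        if score < best_score:
            best_lag, best_score = lag, score
    return best_lag


recovered = estimate_delay(flux_a, flux_b, max_lag=600)
print(f"recovered delay: {recovered} days")
```

Working with the logarithm of the flux ratio makes the estimate insensitive to the constant magnitude offset between the images.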

Q0957 + 561 is the first lens system in which the 
time delay was firmly established: 


$$\Delta t_{Q0957+561} = (417 \pm 3)\ \text{days} \qquad [43]$$


With a model of the lens system, the time delay 
can be used to determine the Hubble constant. (This 
can be seen very simply: imagine a certain lens 
situation like the one displayed in Figure 1. If now 
all length scales are reduced by a factor of 2 and at 
the same time all masses are reduced by a factor of 
2, then for an observer, the angular configuration in 
the sky would appear exactly identical. But the total 
length of the light path is reduced by a factor of 2. 
Now, since the time delay between the two paths is 
the same fraction of the total lengths in either 
scenario, a measurement of this fractional length 
allows us to determine the total length, and hence 
the Hubble constant, the constant of proportionality 
between distance and redshift.) The resulting value 
of $H_0$ is 


$$H_0 = (67 \pm 13)\ \text{km s}^{-1}\ \text{Mpc}^{-1} \qquad [44]$$


where the uncertainty comprises the 95% confi- 
dence level. To date, about a dozen quasar lens 
systems have measured time delays. The derived 
values of the Hubble constant are “lowish,” if we 
assume the best astrophysically motivated lens 
models. 


Quasar Microlensing 


Light bundles from “lensed” quasars are split by 
intervening galaxies. Usually the quasar light bundle 
passes through the galaxy and/or the galaxy halo. 
Galaxies consist at least partly of stars, and galaxy 
halos consist possibly of compact objects as well. 

Figure 5 “Microimages”: the top left panel shows an assumed 
“unlensed” source profile of a quasar. The other three panels 
illustrate the microimage configuration as it would be produced 
by stellar objects in the foreground. The surface mass density of 
the lenses is 20% (top right), 50% (bottom left), and 80% 
(bottom right) of the critical density (cf. eqn [16]). 


Each of these stars (or other compact objects like 
black holes, brown dwarfs, or planets) acts as a 
“compact lens” or “microlens” and produces at least 
one additional microimage of the source. This 
means the “macroimage” consists of many “micro- 
images” (Figure 5). But because the image splitting is 
proportional to the square root of the lens mass, 
these microimages are only of order a microarcse- 
cond apart and cannot be resolved. Various aspects 
of microlensing have been addressed after the first 
double quasar had been discovered. 

The microlenses produce a complicated two- 
dimensional magnification distribution in the source 
plane. It consists of many caustics, locations that 
correspond to formally infinitely high magnification. 
An example for such a magnification pattern is shown 
in Figure 6. It is determined with the parameters of 
image A of the quadruple quasar Q2237 + 0305 





Figure 6 Magnification pattern in the source plane, produced 
by a dense field of stars in the lensing galaxy. The gray scale 
reflects the magnification as a function of the quasar position. 
Light curves taken along the straight tracks are shown in 
Figure 7. The microlensing parameters were chosen according 
to a model for image A of the quadruple quasar Q2237 + 
0305: $\kappa = 0.36$, $\gamma = 0.44$. 

(surface mass density $\kappa = 0.36$; external shear 
$\gamma = 0.44$). Gray scale indicates the magnification. 

Due to the relative motion between observer, lens, 
and source, the quasar changes its position relative 
to this arrangement of caustics, that is, the apparent 
brightness of the quasar changes with time. A one- 
dimensional cut through such a magnification 
pattern, convolved with a source profile of the 
quasar, results in a microlensed light curve. Exam- 
ples for microlensed light curves taken along the 
straight lines in Figure 6 can be seen in Figure 7 for 
two different quasar sizes. 

In particular when the quasar track crosses a 
caustic (the sharp lines in Figure 6 for which the 
magnification formally is infinite, because the 
determinant of the Jacobian vanishes, cf. eqn [31]), a 
pair of highly magnified microimages appears 
newly or merges and disappears. Such a microlen- 
sing event can easily be detected as a strong peak in 
the light curve of the quasar image. 

Microlens-induced fluctuations in the observed 
brightness of quasars contain information both 
about the light-emitting source (size of continuum 
region or broad line region of the quasar, brightness 
profile of quasar) and about the lensing objects 
(masses, density, transverse velocity). Hence, from a 
comparison between observed and simulated quasar 
microlensing, one can draw conclusions about the 
density and mass scale of the microlenses. So far the 
“best” example of a microlensed quasar is the 
quadruple quasar Q2237 + 0305. 

Figure 7 Microlensing light curve for the straight lines in 
Figure 6. The solid and dashed lines indicate relatively small and 
large quasar sizes. The time axis is in units of Einstein radii 
divided by unit velocity. 


Einstein Rings 


If a point source lies exactly behind a point lens, a ring- 
like image occurs. Theorists had recognized early on 
that such a symmetric lensing arrangement would result 
in a ring image, the so-called “Einstein ring.” There are 
two necessary requirements for the observability of 
Einstein rings: the mass distribution of the lens needs to 
be approximately axially symmetric, as seen from the 
observer, and the source must lie exactly on top of the 
resulting degenerate pointlike caustic. Such a geometric 
arrangement is highly unlikely for pointlike sources. But 
astrophysical sources in the real universe have a finite 
extent, and it is enough if a part of the source covers the 
point caustic (or the complete astroid caustic in the case 
of a not quite axially symmetric mass distribution) in 
order to produce such an annular image. 

In 1988, the first example of an “Einstein ring” 
was discovered. With high-resolution radio observa- 
tions, the extended radio source MG1131 + 0456 
turned out to be a ring with a diameter of about 
1.75 arcsec. The source was identified as a radio 
lobe at a redshift of $z_S = 1.13$, whereas the lens is a 
galaxy at $z_L = 0.85$. By now more than a dozen cases 
have been found that qualify as Einstein rings. Their 
diameters vary between 0.33 and about 2 arcsec. 


Giant Luminous Arcs and Arclets 


Fritz Zwicky had pointed out the potential use of 
galaxies and galaxy clusters as gravitational lenses in 
the 1930s. With background galaxies as sources, the 
apparent lensing consequences for them would be 
far more dramatic than for quasars: galaxies should 
be heavily deformed once they are strongly lensed. 
Rich clusters of galaxies at redshifts beyond z = 0.2 
with masses of order $10^{14} M_\odot$ are very effective 
lenses if they are centrally concentrated. Their 
Einstein radii are of the order of 20 arcsec. 

In 1986, the following gravitational lensing phe- 
nomenon was discovered: magnified, distorted, and 
strongly elongated images of background galaxies 
which happen to lie behind foreground clusters of 
galaxies, the so-called giant luminous arcs. The giant 
arcs can be exploited in two ways, as is typical for 
many lens phenomena. Firstly, they provide us with 
strongly magnified galaxies at (very) high redshifts. 
These galaxies would be too faint to be detected or 
analyzed in their unlensed state. Hence, with the 
lensing boost, we can study these galaxies in their early 
evolutionary stages, possibly as infant or protoga- 
laxies, relatively shortly after the big bang. The other 
practical application of the arcs is to take them as tools 
to study the potential and mass distribution of the 
lensing galaxy cluster. In the simplest model of a 
spherically symmetric mass distribution for the cluster, 


giant arcs form very close to the critical curve, which 
marks the Einstein ring. So with the redshifts of the 
cluster and the arc, it is easy to determine a rough 
estimate of the lensing mass by just determining the 
radius of curvature and interpreting it as the Einstein 
radius of the lens system. 
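
This rough mass estimate amounts to inverting the Einstein radius formula for the mass. In the sketch below the arc radius of 20 arcsec follows the scale quoted above for rich clusters, while the three angular diameter distances are hypothetical placeholder values; in practice they follow from the two redshifts and a cosmological model.

```python
import math

G = 6.674e-11                 # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8                   # speed of light, m/s
M_SUN = 1.989e30              # solar mass, kg
MPC = 3.0857e22               # one megaparsec, m
ARCSEC = math.pi / (180.0 * 3600.0)   # one arcsecond, rad


def mass_within_einstein_radius(theta_e, d_l, d_s, d_ls):
    """Lens mass (kg) whose Einstein radius is theta_e (radians)."""
    return theta_e**2 * C**2 / (4.0 * G) * d_l * d_s / d_ls


def einstein_radius(mass_kg, d_l, d_s, d_ls):
    """Einstein radius (radians) of a lens of the given mass."""
    return math.sqrt(4.0 * G * mass_kg / C**2 * d_ls / (d_l * d_s))


# Hypothetical distances for a cluster lens and a background galaxy.
d_l, d_s, d_ls = 1000.0 * MPC, 1700.0 * MPC, 950.0 * MPC
arc_radius = 20.0 * ARCSEC

cluster_mass = mass_within_einstein_radius(arc_radius, d_l, d_s, d_ls)
print(f"mass inside the arc radius: {cluster_mass / M_SUN:.2e} solar masses")
```

The estimate refers only to the projected mass inside the arc radius, and it inherits the assumption of a spherically symmetric lens.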


Weak Lensing/Statistical Lensing/Cosmic Shear 


In contrast to the phenomena that were mentioned 
here, “weak lensing” deals with effects of light 
deflection that cannot be measured individually, but 
rather in a statistical way only. No caustics, critical 
lines, or multiple images are involved. As was 
discussed above, “strong lensing” — usually defined as 
the regime that involves multiple images, high magni- 
fications, and caustics in the source plane — is a rare 
phenomenon. Weak lensing on the other hand is much 
more common. In principle, weak lensing acts along 
each line of sight in the universe, since each light 
bundle’s path is affected by matter inhomogeneities 
along or near its path. It is just a matter of how 
accurately we can measure. In recent years, many 
teams started impressive and ambitious observational 
programs to determine the slight distortion of tens of 
thousands of background galaxies by foreground 
galaxy clusters and/or by the large-scale structure in 
the universe, the so-called cosmic shear. It is beyond 
the scope of this article to discuss these applications of 
weak gravitational lensing. The interested reader is 
referred to the “Further reading” section, in particular 
to Bartelmann and Schneider (2001). 

See also: Cosmology: Mathematical Aspects; General 
Relativity: Experimental Tests; General Relativity: 
Overview; Newtonian Limit of General Relativity; 
Singularity and Bifurcation Theory. 


Introduction 


Let a number, N, of particles interact classically 
through Newton’s laws of motion and Newton’s 
inverse-square law of gravitation. Then the equa- 
tions of motion are 


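$$\ddot{\mathbf{r}}_i = -G \sum_{j=1,\, j \neq i}^{j=N} m_j \frac{\mathbf{r}_i - \mathbf{r}_j}{|\mathbf{r}_i - \mathbf{r}_j|^3} \qquad [1]$$


where $\mathbf{r}_i$ is the position vector of the $i$th particle 
relative to some inertial frame, $G$ is the universal 
constant of gravitation, and $m_i$ is the mass of the $i$th 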
particle. These equations provide an approximate 


mathematical model with numerous applications in 
astrophysics, including the motion of the Moon and 
other bodies in the solar system (planets, asteroids, 
comets, and meteor particles); stars in stellar systems 
ranging from binary and other multiple stars to star 
clusters and galaxies; and the motion of dark-matter 
particles in cosmology. For N=1 and N=2, the 
equations can be solved analytically. The case N=3 
provides one of the richest of all unsolved dynamical 
problems — the general three-body problem. For 
problems dominated by one massive body, as in 
many planetary problems, approximate methods 
based on perturbation expansions have been devel- 
oped. In stellar dynamics, astrophysicists have 
developed numerous numerical and theoretical 
approaches to the problem for larger values of N, 
including treatments based on the Boltzmann equa- 
tion and the Fokker—Planck equation; such N-body 
systems can also be modeled as self-gravitating 
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gases, and thermodynamic insights underpin much 
of our qualitative understanding. 
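
The equations of motion translate directly into a double loop over particle pairs. The following minimal sketch uses plain Python lists and units with $G = 1$ by default; the function name and the two-particle test configuration are illustrative. It evaluates the right-hand side of the equations of motion by direct summation, whose cost grows as $N^2$.

```python
def accelerations(positions, masses, g=1.0):
    """Gravitational acceleration of every particle by direct summation
    over all other particles. positions is a list of [x, y, z] lists."""
    n = len(masses)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            delta = [positions[i][k] - positions[j][k] for k in range(3)]
            dist_cubed = sum(d * d for d in delta) ** 1.5
            for k in range(3):
                acc[i][k] -= g * masses[j] * delta[k] / dist_cubed
    return acc


# Two particles of mass 1 and 3, two length units apart on the x-axis.
masses = [1.0, 3.0]
positions = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
acc = accelerations(positions, masses)
print(acc)
```

The mass-weighted accelerations sum to zero, reflecting the force-free motion of the center of mass.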


Few-Body Problems 
The Two-Body Problem 


For N =2, the relative motion of the two bodies can 
be reduced to the force-free motion of the center of 
mass and the problem of the relative motion. If 
$\mathbf{r} = \mathbf{r}_1 - \mathbf{r}_2$, then 


$$\ddot{\mathbf{r}} = -G(m_1 + m_2) \frac{\mathbf{r}}{|\mathbf{r}|^3} \qquad [2]$$

often called the Kepler problem. It represents motion 
of a particle of unit mass under a central inverse-square 
force of attraction. Energy and angular momentum are 
constant, and the motion takes place in a plane passing 
through the origin. Using plane polar coordinates $(r, \theta)$ 
in this plane, the equations for the energy and angular 
momentum reduce to 


$$E = \frac{1}{2}\left(\dot{r}^2 + r^2 \dot{\theta}^2\right) - \frac{G(m_1 + m_2)}{r} \qquad [3]$$

$$L = r^2 \dot{\theta} \qquad [4]$$


(Note that these are not the energy and angular 
momentum of the two-body problem, even in the 
barycentric frame of the center of mass; $E$ and $L$ must 
be multiplied by the reduced mass $m_1 m_2/(m_1 + m_2)$.) 
Using eqns [3] and [4], the problem is reduced to 
quadratures. The solution shows that the motion is on 
a conic section (ellipse, circle, straight line, parabola, 
or hyperbola), with the origin at one focus. 

This reduction depends on the existence of integrals 
of the equations of motion, and these in turn depend 
on symmetries of the underlying Lagrangian or 
Hamiltonian. Indeed, eqns [1] yield ten first integrals: 
six yield the rectilinear motion of the center of mass, 
three the total angular momentum, and one the energy. 
Furthermore, eqn [2] may be transformed, via the 
Kustaanheimo-Stiefel (KS) transformation, to a four- 
dimensional simple harmonic oscillator. This reveals 
further symmetries, corresponding to further invar- 
iants: the three components of the Lenz vector. 
Another manifestation of the abundance of symme- 
tries of the Kepler problem is the fact that there exist 
action-angle variables in which the Hamiltonian 
depends only on one action, that is, H=H(L). 
Another application of the KS transformation is one 
that has practical importance: it removes the singular- 
ity of (i.e., regularizes) the Kepler problem at r=0, 
which is troublesome numerically. 

To illustrate the character of the KS transforma- 
tion, we consider briefly the planar case, which can 


be handled with a complex variable obeying the 
equation of motion $\ddot{z} = -z/|z|^3$ (after scaling 
eqn [2]). By introducing the Levi-Civita transformation 
$z = Z^2$ and Sundman’s transformation of the 
time, that is, $dt/d\tau = |z|$, the equation of motion 
transforms to $Z'' = hZ/2$, where $h = |\dot{z}|^2/2 - 1/|z|$ is 
the constant of energy. The KS transformation is a 
very similar exercise using quaternions. 
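
The planar regularization can be verified numerically. For negative $h$ the transformed equation is a harmonic oscillator, so one may take $Z(\tau) = A\cos\omega\tau + iB\sin\omega\tau$ with $\omega^2 = -h/2$. The sketch below additionally assumes $A^2 + B^2 = -1/h$, a condition derived here from the energy relation rather than quoted from the text, and the particular values of $h$ and $A$ are arbitrary. It then checks that $z = Z^2$ satisfies the scaled Kepler equation in physical time.

```python
import math

H = -0.5                            # energy constant h (negative: bound orbit)
OMEGA = math.sqrt(-H / 2.0)         # Z'' = h Z / 2 is a harmonic oscillator
A = 1.2                             # hypothetical amplitude of Re Z
B = math.sqrt(-1.0 / H - A * A)     # assumed condition A^2 + B^2 = -1/h


def check_point(tau):
    """Return (|z'' + z/|z|^3|, energy) at fictitious time tau,
    where dots denote derivatives with respect to physical time t."""
    c, s = math.cos(OMEGA * tau), math.sin(OMEGA * tau)
    big_z = complex(A * c, B * s)
    d_big_z = complex(-A * OMEGA * s, B * OMEGA * c)    # dZ/dtau
    d2_big_z = H * big_z / 2.0                          # d2Z/dtau2
    z = big_z * big_z                                   # Levi-Civita: z = Z^2
    r = abs(z)                                          # Sundman: dt/dtau = |z|
    conj = big_z.conjugate()
    z_dot = 2.0 * d_big_z / conj                        # dz/dt = (dz/dtau) / r
    z_ddot = (2.0 * d2_big_z / conj
              - 2.0 * d_big_z * d_big_z.conjugate() / conj**2) / r
    energy = abs(z_dot) ** 2 / 2.0 - 1.0 / r
    return abs(z_ddot + z / r**3), energy


results = [check_point(0.37 * k) for k in range(40)]
worst_residual = max(residual for residual, _ in results)
print(f"largest residual of the Kepler equation: {worst_residual:.2e}")
print(f"energy along the orbit: {results[0][1]:.6f}")
```

The regularized solution is perfectly smooth in the fictitious time, which is what makes the transformed equation convenient numerically near close approaches.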


The Restricted Three-Body Problem 


The simplest three-body problem is given by the 
motion of a test particle in the gravitational field of 
two particles, of positive mass $m_1$, $m_2$, in circular 
Keplerian motion. This is called the circular 
restricted three-body problem, and the two massive 
particles are referred to as primaries. In a rotating 
frame of reference, with origin at the center of mass 
of these two particles, which are at rest at positions 
$\mathbf{r}_1$, $\mathbf{r}_2$, the equation of motion is 


$$\ddot{\mathbf{r}} + 2\boldsymbol{\Omega} \times \dot{\mathbf{r}} + \boldsymbol{\Omega} \times (\boldsymbol{\Omega} \times \mathbf{r}) = G \nabla \left(\frac{m_1}{|\mathbf{r} - \mathbf{r}_1|} + \frac{m_2}{|\mathbf{r} - \mathbf{r}_2|}\right) \qquad [5]$$


where $\mathbf{r}$ is the position of the massless particle and $\boldsymbol{\Omega}$ 
is the angular velocity of the frame. 

This problem has three degrees of freedom but only 
one known integral: it is the Hamiltonian in the 
rotating frame, and equivalent to the Jacobi integral, J. 
One consequence is that Liouville’s theorem is not 
applicable, and more elaborate arguments are required 
to decide its integrability. Certainly, no general 
analytical solution is known. 

There are five equilibrium solutions, discovered 
by Euler and Lagrange (see Figure 1). They lie at 
critical points of the effective potential in the 





Figure 1 The equilibrium solutions of the circular restricted 
three-body problem. A rotating frame of reference is chosen in 
which two particles are at rest on the x-axis. The massless 
particle is at equilibrium at each of the five points shown. Five 
similar configurations exist for the general three-body problem; 
these are the “central” configurations. 


rotating frame, and demarcate possible regions of 
motion. 
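
The equilibrium points can be located numerically as the points where the gradient of the effective potential vanishes. The sketch below assumes the usual normalized units, which are not fixed by the text: $G(m_1 + m_2) = 1$, unit separation of the primaries, unit angular velocity, and mass parameter $\mu = m_2/(m_1 + m_2)$ with the primaries at $(-\mu, 0)$ and $(1 - \mu, 0)$. The value $\mu = 0.01215$ is an illustrative choice, roughly that of the Earth-Moon system.

```python
import math


def effective_force(x, y, mu):
    """Gradient of (x^2 + y^2)/2 + (1 - mu)/r1 + mu/r2 in the rotating
    frame; it vanishes at an equilibrium point."""
    r1 = math.hypot(x + mu, y)
    r2 = math.hypot(x - 1.0 + mu, y)
    fx = x - (1.0 - mu) * (x + mu) / r1**3 - mu * (x - 1.0 + mu) / r2**3
    fy = y - (1.0 - mu) * y / r1**3 - mu * y / r2**3
    return fx, fy


def collinear_point_between(mu):
    """Bisection for the equilibrium on the x-axis between the primaries,
    where the x-component of the gradient increases monotonically."""
    lo, hi = -mu + 1.0e-9, 1.0 - mu - 1.0e-9
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if effective_force(mid, 0.0, mu)[0] < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


mu = 0.01215
x_inner = collinear_point_between(mu)
triangular = (0.5 - mu, math.sqrt(3.0) / 2.0)     # equilateral point

print(f"collinear point between the primaries: x = {x_inner:.4f}")
print("gradient at the triangular point:", effective_force(*triangular, mu))
```

The triangular point forms an equilateral triangle with the two primaries for any mass ratio, whereas the collinear points have to be found by solving a nonlinear equation.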

Throughout the twentieth century, much numerical 
effort was used in finding and classifying periodic 
orbits, and in determining their stability and bifurca- 
tions. For example, there are families of periodic orbits 
close to each primary; these are perturbed Kepler 
orbits, and are referred to as satellite motions. Other 
important families are the series of Liapounov orbits 
starting at the equilibrium points. 

Some variants of the restricted three-body pro- 
blem include the following: 


1. The elliptic restricted three-body problem, in 
which the primaries move on an elliptic Kepler- 
ian orbit; in suitable coordinates the equation of 
motion closely resembles eqn [5], except for a 
factor on the right side which depends explicitly 
on the independent variable (transformed time); 
this system has no first integral. 

2. Sitnikov’s problem, which is a special case of the 
elliptic problem, in which $m_1 = m_2$ and the 
motion of the massless particle is confined to 
the axis of symmetry of the Keplerian motion; 
this is still nonintegrable, but simple enough to 
allow extensive analysis of such fundamental 
issues as integrability and stochasticity. 

3. Hill’s problem, which is a scaled version suitable 
for examining motions close to one primary; its 
importance in applications began with studies of 
the motion of the moon, and it remains vital for 
understanding the motion of asteroids. 


The General Three-Body Problem 


Exact solutions When all three particles have 
nonzero masses, the equations of motion become 


$$m_i \ddot{\mathbf{r}}_i = -\nabla_i W$$

where the potential energy is 

$$W = -G \sum_{1 \le i < j \le 3} \frac{m_i m_j}{|\mathbf{r}_i - \mathbf{r}_j|}$$

Then the exact solutions of Euler and Lagrange 
survive in the form of homographic solutions. In 
these solutions, the configuration remains geometrically 
similar, but may rotate and/or pulsate in the 
same way as in the two-body problem. 

Let us represent the position vector $\mathbf{r}_i$ in the 
planar three-body problem by the complex number 
$z_i$. Then, it is easy to see that we have a solution of 
the form $z_i(t) = z(t) z_{0i}$, provided that 

$$\ddot{z} = -C \frac{z}{|z|^3}$$

and 

$$m_i C z_{0i} = \nabla_i W(z_{01}, z_{02}, z_{03})$$

for some constant C. Thus, z(t) may take the form of 
any solution of the Kepler problem, while the 
complex numbers $z_{0i}$ must correspond to what is 
called a central configuration. These are in fact 
critical points of the scale-free function $W\sqrt{I}$, where 
I (the “moment of inertia of the system”) is given by 
$I = \sum_{i=1}^{3} m_i r_i^2$; and $C = -W/I$. 
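A quick numerical check may make the central-configuration condition concrete. The sketch below (hypothetical function names, G set to 1 as an assumption) represents planar positions as complex numbers, moves to the barycentric frame, sets C = -W/I, and measures how far a configuration is from satisfying m_i C z_i = grad_i W. Lagrange's equilateral triangle should give a residual at rounding level for any masses, whereas a generic triangle should not.

```python
import math

G = 1.0  # gravitational constant in arbitrary units (assumption of this sketch)

def barycentric(masses, points):
    """Shift complex positions so that the centre of mass is at the origin."""
    total = sum(masses)
    centre = sum(m * z for m, z in zip(masses, points)) / total
    return [z - centre for z in points]

def potential(masses, points):
    """W = -G * sum over pairs of m_i m_j / |z_i - z_j|."""
    w = 0.0
    for i in range(len(masses)):
        for j in range(i + 1, len(masses)):
            w -= G * masses[i] * masses[j] / abs(points[i] - points[j])
    return w

def grad_potential(masses, points, i):
    """Gradient of W with respect to z_i, written as a complex number."""
    g = 0j
    for j, (mj, zj) in enumerate(zip(masses, points)):
        if j != i:
            d = points[i] - zj
            g += G * masses[i] * mj * d / abs(d) ** 3
    return g

def central_configuration_residual(masses, points):
    """Largest |m_i C z_i - grad_i W| over the bodies, with C = -W/I."""
    pts = barycentric(masses, points)
    inertia = sum(m * abs(z) ** 2 for m, z in zip(masses, pts))
    c = -potential(masses, pts) / inertia
    return max(abs(m * c * z - grad_potential(masses, pts, i))
               for i, (m, z) in enumerate(zip(masses, pts)))

if __name__ == "__main__":
    masses = [1.0, 2.0, 3.0]
    equilateral = [0j, 1 + 0j, complex(0.5, math.sqrt(3.0) / 2.0)]
    isosceles = [0j, 1 + 0j, complex(0.5, 1.0)]
    print("equilateral residual:", central_configuration_residual(masses, equilateral))
    print("isosceles residual:  ", central_configuration_residual(masses, isosceles))
```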

The existence of other important classes of 
periodic solutions can be proved analytically, even 
though it is not possible to express the solution in 
closed form. Examples include hierarchical three-body 
systems, in which two masses $m_1$, $m_2$ exhibit 
nearly elliptic relative motion, while a third mass 
orbits the barycenter of $m_1$ and $m_2$ in another nearly 
elliptic orbit. In the mathematical literature, this is 
referred to as motion of elliptic—elliptic type. More 
surprisingly, the existence of a periodic solution in 
which the three bodies travel in succession along the 
same path, shaped like a figure 8 (cf. Figure 2), was 
established by Chenciner and Montgomery (2000), 
following its independent discovery by Moore using 
numerical methods. Another interesting periodic 
motion that was discovered numerically, by 
Schubart, is a solution of the collinear three-body 
problem, and so collisions are inevitable. In this 
motion, the body in the middle alternately encoun- 
ters the other two bodies. 





Figure 2 A rare example of a scattering encounter between 
two binaries (which approach from upper right and lower left) 
which leads to a permanently bound triple system describing the 
“figure-8” periodic orbit. A fourth body escapes at the bottom. 
Note the differing scales on the two axes. (Reproduced with 
permission from Heggie DC (2000) A new outcome of binary- 
binary scattering. Monthly Notices of the Royal Astronomical 
Society 318(4): L61-L63; © Blackwell Publishing Ltd.) 

Singularities As Schubart’s solution illustrates, 
two-body encounters can occur in the three-body 
problem. Such singularities can be regularized just as 
in the pure two-body problem. Triple collisions 
cannot be regularized in general, and this singularity 
has been studied by the technique of “blowup.” This 
has been worked out most thoroughly in the collinear 
three-body problem, which has only two degrees of 
freedom. The general idea is to transform to two 
variables, of which one (denoted by r, say) deter- 
mines the scale of the system, while the other (s) 
determines the configuration (e.g., the ratio of 
separations of the three masses). By scaling the 
corresponding velocities and the time, one obtains a 
system of three equations of motion for s and the 
two velocities which are perfectly regular in the limit 
$r \to 0$. In this limit, the energy integral restricts the 
solutions of the system to a manifold (called the 
collision manifold). Exactly the same manifold 
results for zero-energy solutions, which permits a 
simple visualization. Equilibria on the collision 
manifold correspond to the Lagrangian collinear 
solutions in which the system either expands to 
infinity or contracts to a three-body collision. 


Qualitative ideas Reference has already been made 
to motion of elliptic—elliptic type. In a motion of 
elliptic-hyperbolic type, there is again an “inner” 
pair of bodies describing nearly Keplerian motion, 
while the relative motion of the third body is nearly 
hyperbolic. In applications, this is referred to as a 
kind of scattering encounter between a binary and a 
third body. When the encounter is sufficiently close, 
it is possible for one member of the binary to be 
exchanged with the third body. One of the major 
historical themes of the general three-body problem 
is the classification of connections between these 
different types of asymptotic motion. It is possible to 
show, for instance, that the measure of initial 
conditions of hyperbolic—elliptic type leading asymp- 
totically to elliptic—elliptic motion (or any other type 
of permanently bound motion) is zero. Much of the 
study of such problems has been carried out 
numerically. 

There are many ways in which the stability of 
three-body motions may be approached. One exam- 
ple is furnished by the central configurations already 
referred to. They can be used to establish sufficient 
conditions for ensuring that exchange is impossible, 
and similar conclusions. 

A powerful tool for qualitative study of three-body 
motions is Lagrange’s identity, which is now 
thought of as the reduction to three bodies of the 
virial theorem. Let the size of the system be 
characterized by the “moment of inertia” I. Then it 
is easy to show that 

$$\frac{d^2 I}{dt^2} = 4T + 2W$$

where T, W are, respectively, the kinetic and 
potential energies of the system. Usually, the barycentric 
frame is adopted. Since E = T + W is constant 
and T > 0, the identity gives $d^2I/dt^2 = 2T + 2E > 2E$, 
and it follows that the system is not 
bounded for all t > 0 unless E < 0. 


Perturbation theory The question of the integr- 
ability of the general three-body problem has 
stimulated much research, including the famous 
study by Poincaré which established the nonexis- 
tence of integrals beyond the ten classical ones. 
Poincaré’s work was an important landmark in the 
application to the three-body problem of perturbation 
methods. If one mass dominates, that is, $m_1 \gg m_2$ 
and $m_1 \gg m_3$, then the motion of $m_2$ and $m_3$ 
relative to $m_1$ is a mildly perturbed two-body 
motion, unless $m_2$ and $m_3$ are close together. Then 
it is beneficial to describe the motion of $m_2$ relative 
to $m_1$ by the parameters of Keplerian motion. These 
would be constant in the absence of m3, and vary 
slowly because of the perturbation by m3. This was 
the idea behind Lagrange’s very general method of 
variation of parameters for solving systems of 
differential equations. Numerous methods were 
developed for the iterative solution of the resulting 
equations. In this way, the solution of such a three- 
body problem could be represented as a type of 
trigonometric series in which the arguments are the 
angle variables describing the two approximate 
Keplerian motions. These were of immense value in 
solving problems of celestial mechanics, that is, the 
study of the motions of planets, their satellites, 
comets, and asteroids. 

A major step forward was the introduction of 
Hamiltonian methods. A three-body problem of the 
type considered here has a Hamiltonian of the form 


$$H = H_2(L_2) + H_3(L_3) + R$$


where $H_i$, $i = 2, 3$, are the Hamiltonians describing 
the interaction between $m_i$ and $m_1$, and R is the 
“disturbing function.” It depends on all the variables, 
but is small compared with the $H_i$. Now 
perturbation theory reduces to the task of perform- 
ing canonical transformations which simplify R as 
much as possible. 

Poincaré’s major contribution in this area was to 
show that the series solutions produced by perturba- 
tion methods are not, in general, convergent, but 


asymptotic. Thus, they were of practical rather than 
theoretical value. For example, nothing could be 
proved about the stability of the solar system using 
perturbation methods. It took the further analytic 
development of KAM theory to rescue this aspect of 
perturbation theory. This theory can be used to 
show that, provided that two of the three masses are 
sufficiently small, then for almost all initial condi- 
tions the motions remain close to Keplerian for all 
time. Unfortunately, now it is the practical aspect of 
the theory which is missing; though we have 
introduced this topic in the context of the three- 
body problem, it is extensible to any N-body system 
with N − 1 small masses in nearly Keplerian motion 
about $m_1$, but to be applicable to the solar system 
the masses of the planets would have to be 
ridiculously small. 


Numerical methods Numerical integrations of the 
three-body problem were first carried out near the 
beginning of the twentieth century, and are now 
commonplace. For typical scattering events, or other 
short-lived solutions, there is usually little need to go 
beyond common Runge-Kutta methods, provided 
that automatic step-size control is adopted. When 
close two-body approaches occur, some regulariza- 
tion based on the KS transformation is often 
exploited. In cases of prolonged elliptic—elliptic 
motion, an analytic approximation based on Kepler- 
ian motion may be adequate. Otherwise (as in 
problems of planetary motion, where the evolution 
takes place on an extremely long timescale), meth- 
ods of very high order are often used. Symplectic 
methods, which have been developed in the context 
of Hamiltonian problems, are increasingly adopted 
for problems of this kind, as their long-term error 
behavior is generally much superior to that of 
methods which ignore the geometrical properties of 
the equations of motion. 
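The contrast between symplectic and non-symplectic schemes can be seen even in the two-body problem. The sketch below is an illustration only, not a production integrator: it assumes units with GM = 1, uses the kick-drift-kick leapfrog as a representative symplectic method, and uses forward Euler purely as a simple non-symplectic foil (it is also of lower order, so the comparison concerns the qualitative behavior of the energy error, bounded versus drifting, rather than accuracy per step). All names are hypothetical.

```python
import math

def acceleration(x, y):
    """Kepler acceleration for GM = 1 (units are an assumption of this sketch)."""
    r3 = (x * x + y * y) ** 1.5
    return -x / r3, -y / r3

def energy(x, y, vx, vy):
    """Specific energy of the relative orbit."""
    return 0.5 * (vx * vx + vy * vy) - 1.0 / math.hypot(x, y)

def leapfrog_step(state, dt):
    """One kick-drift-kick step; this map is symplectic and time-reversible."""
    x, y, vx, vy = state
    ax, ay = acceleration(x, y)
    vx += 0.5 * dt * ax
    vy += 0.5 * dt * ay
    x += dt * vx
    y += dt * vy
    ax, ay = acceleration(x, y)
    vx += 0.5 * dt * ax
    vy += 0.5 * dt * ay
    return x, y, vx, vy

def euler_step(state, dt):
    """Forward Euler, a non-symplectic method used here only for contrast."""
    x, y, vx, vy = state
    ax, ay = acceleration(x, y)
    return x + dt * vx, y + dt * vy, vx + dt * ax, vy + dt * ay

def max_energy_error(step, state, dt, n_steps):
    """Largest |E(t) - E(0)| seen while advancing n_steps with the given scheme."""
    e0 = energy(*state)
    worst = 0.0
    for _ in range(n_steps):
        state = step(state, dt)
        worst = max(worst, abs(energy(*state) - e0))
    return worst

if __name__ == "__main__":
    initial = (1.0, 0.0, 0.0, 0.9)  # mildly eccentric bound orbit
    dt, n_steps = 0.01, 6000        # roughly ten orbital periods
    print("leapfrog:", max_energy_error(leapfrog_step, initial, dt, n_steps))
    print("euler:   ", max_energy_error(euler_step, initial, dt, n_steps))
```

With the leapfrog the energy error oscillates without secular growth, while with forward Euler it accumulates from step to step.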


Four- and Five-Body Problems 


Many of the foregoing remarks, on central config- 
urations, numerical methods, KAM theory, etc., 
apply equally to few-body problems with N > 3. 
Of special interest from a theoretical point of view is 
the occurrence of a new kind of singularity, in which 
motions become unbounded in finite time. For 
N=4, the known examples also require two-body 
collisions, but noncollision orbits exhibiting finite- 
time blowup are known for N=5. 

One of the practical (or, at least, astronomical) 
applications is again to scattering encounters, this 
time involving the approach of two binaries on a 
hyperbolic relative orbit. Numerical results show 
that a wide variety of outcomes is possible, including 
even the creation of the figure-8 periodic orbit of 
the three-body problem, while a fourth body escapes 
(Figure 2). 


Many-Body Problems 


Many of the concepts already introduced, such as 
the virial theorem, apply equally well to the many- 
body classical gravitational problem. This section 
refers mainly to the new features which arise when 
N is not small. In particular, statistical descriptions 
become central. The applications also have a 
different emphasis, moving from problems of plane- 
tary dynamics (celestial mechanics) to those of 
stellar dynamics. Typically, N lies in the range 
107-10". 


Evolution of the Distribution Function 


The most useful statistical description is obtained if 
we neglect correlations and focus on the one-particle 
distribution function f(r, v, t), which can be 
interpreted as the number density at time t at the 
point in phase space corresponding to position r and 
velocity v. Several processes contribute to the 
evolution of f. 


Collective effects When the effects of near neigh- 
bors are neglected, the dynamics is described by the 
Vlasov—Poisson system 


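$$\frac{\partial f}{\partial t} + \mathbf{v} \cdot \frac{\partial f}{\partial \mathbf{r}} - \frac{\partial \phi(\mathbf{r}, t)}{\partial \mathbf{r}} \cdot \frac{\partial f}{\partial \mathbf{v}} = 0 \qquad [6]$$

$$\nabla^2 \phi = 4\pi G m \int f(\mathbf{r}, \mathbf{v}, t)\, d^3 v \qquad [7]$$


where $\phi$ is the gravitational potential and m is the 
mass of each body. Obvious extensions are necessary 
if not all bodies have the same mass. 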

Solutions of eqn [6] may be found by the method 
of characteristics, which is most useful in cases 
where the equation of motion $\ddot{\mathbf{r}} = -\nabla\phi$ is integrable, 
for example, in stationary, spherical potentials. 
An example is the solution 

$$f = |E|^{7/2} \qquad [8]$$

where E is the specific energy of a body, that is, 
$E = v^2/2 + \phi$. This satisfies eqn [6] provided that $\phi$ 
is static. Equation [7] is satisfied provided that $\phi$ 
satisfies a case of the Lane-Emden equation, which 
is easy to solve in this case. 

The solution just referred to is an example of an 
equilibrium solution. In an equilibrium solution, the 
virial theorem takes the form 4T + 2W = 0, where 
T, W are appropriate mean-field approximations for 
the kinetic and potential energy, respectively. It 
follows that E = −T, where E = T + W is the total 
energy. An increase in E causes a decrease in T, 
which implies that a self-gravitating N-body system 
exhibits a negative specific heat. 

There is little to choose between one equilibrium 
solution and another, except for their stability. In 
such an equilibrium, the bodies orbit within the 
potential on a timescale of the crossing time, which 
is conventionally defined to be 


$$t_{cr} = \frac{G M^{5/2}}{(2|E|)^{3/2}}$$

where M is the total mass of the system. 
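For orientation, the conventional crossing time can be evaluated directly. The sketch below assumes the definition t_cr = G M^(5/2) / (2|E|)^(3/2), with M the total mass and E the total energy of a bound system; the function name and argument order are hypothetical. In units with G = M = 1 and E = -1/4 (the common "N-body units", an assumption here) it returns 2√2, which appears to be the convention behind the time unit quoted for Figure 4.

```python
import math

def crossing_time(total_mass, total_energy, grav_const=1.0):
    """Conventional crossing time t_cr = G M^(5/2) / (2|E|)^(3/2) of a bound system."""
    if total_energy >= 0.0:
        raise ValueError("the crossing time is defined here only for E < 0")
    return grav_const * total_mass ** 2.5 / (2.0 * abs(total_energy)) ** 1.5

if __name__ == "__main__":
    # N-body units: G = 1, M = 1, E = -1/4
    print(crossing_time(1.0, -0.25), 2.0 * math.sqrt(2.0))
```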


The most important evolutionary phenomenon of 
collisionless dynamics is violent relaxation. If f is not 
time independent then $\phi$ is time dependent in 
general. Also, from the equation of motion of one 
body, E varies according to $dE/dt = \partial\phi/\partial t$, and so 
energy is exchanged between bodies, which leads to 
an evolution of the distribution of energies. This 
process is known as violent relaxation. 

Two other relaxation processes are of importance: 


1. Relaxation is possible on each energy hypersur- 
face, even in a static potential, if the potential is 
nonintegrable. 

2. The range of collective phenomena becomes 
remarkably rich if the system exhibits ordered 
motions, as in rotating systems. Then an impor- 
tant role is played by resonant motions, espe- 
cially resonances of low order. The 
corresponding theory lies at the basis of the 
theory of spiral structure in galaxies, for instance. 


Collisional effects The approximations of colli- 
sionless stellar dynamics suppress two important 
processes: 


1. The exponential divergence of stellar orbits, 
which takes place on a timescale of order $t_{cr}$. 
Even in an integrable potential, therefore, f 
evolves on each energy hypersurface. 

2. Two-body relaxation. It operates on a timescale of 
order $(N/\ln N)\,t_{cr}$, where N is the number of 
particles. Although this two-body relaxation timescale, 
$t_r$, is much longer than any other timescale 
we have considered, this process leads to evolution 
of f(E), and it dominates the long-term evolution 
of large N-body systems. It is usually modeled by 
adding a collision term of Fokker—Planck type on 
the right-hand side of eqn [6]. 
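To give a feeling for the separation of timescales, the scaling of the two-body relaxation time with N can be tabulated. The sketch below uses only the order-of-magnitude relation t_r ~ (N / ln N) t_cr; the numerical prefactor, which depends on the precise definition of the relaxation time, is deliberately omitted, and the function name is hypothetical.

```python
import math

def relaxation_to_crossing_ratio(n_bodies):
    """Order-of-magnitude ratio t_r / t_cr ~ N / ln N (prefactor omitted)."""
    if n_bodies <= 1:
        raise ValueError("the estimate needs N > 1")
    return n_bodies / math.log(n_bodies)

if __name__ == "__main__":
    for n in (1e3, 1e6, 1e11):
        print(f"N = {n:.0e}: t_r / t_cr ~ {relaxation_to_crossing_ratio(n):.3g}")
```

The ratio grows almost linearly with N, which is why collisional relaxation, although dominant in the long run, is so slow in large systems.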


In this case, the only equilibrium solutions 
in a steady potential are those in which $f(E) \propto \exp(-\beta E)$, 
where $\beta$ is a constant. Then eqn [7] 
becomes Liouville’s equation, and for the case of 
spherical symmetry the relevant solutions are those 
corresponding to the isothermal sphere. 


Collisional Equilibrium 


We consider the collisional evolution of an N-body 
system further in a later subsection and here develop 
fundamental ideas about the isothermal model. The 
isothermal model has infinite mass, and much has 
been learned by considering a model confined within 
an adiabatic boundary or enclosure. There is a series 
of such models, characterized by a single dimension- 
less parameter, which can be taken to be the ratio 
between the central density and the density at the 
boundary, $\rho_0/\rho_e$ (Figure 3). 

These models are extrema of the Boltzmann 
entropy $S = -k \int f \ln f \, d^6\tau$, where k is the Boltzmann 
constant, and the integration is taken over all 
available phase space. Their stability may be 
determined by evaluating the second variation of S. 
It is found that it is negative definite, so that S is a 
local maximum and the configuration is stable, only 
if $\rho_0/\rho_e < 709$ approximately. A physical explanation 
for this is the following. In the limit when 
$\rho_0/\rho_e \simeq 1$, the self-gravity (which causes the spatial 
inhomogeneity) is weak, and the system behaves like 
an ordinary perfect gas. When $\rho_0/\rho_e \gg 1$, however, 
the system is highly inhomogeneous, consisting of a 
core of low mass and high density surrounded by an 
extensive halo of high mass and low density. 
Consider a transfer of energy from the deep interior 
to the envelope. In the envelope, which is restrained 
by the enclosure, the additional energy causes a rise 
in temperature, but this is small, because of the very 
large mass of the halo. Extraction of energy from 
around the core, however, causes the bodies there to 
sink and accelerate, and, because of the negative 
specific heat of a self-gravitating system, they gain 
more kinetic energy than they lost in the original 
transfer. Now the system is hotter in the core than in 
the halo, and the transfer of energy from the interior 
to the exterior is self-sustaining, in a gravothermal 
runaway. The isothermal model with large density 
contrast is therefore unstable. 

Figure 3 The density profile of the nonsingular isothermal 
model, with conventional scalings. 

The negative specific heat, and the lack of an 
equilibrium which maximizes the entropy, are two 
examples of the anomalous thermodynamic beha- 
vior of the self-gravitating N-body problem. They 
are related to the long-range nature of the gravita- 
tional interaction, the importance of boundary 
terms, and the nonextensivity of the energy. Another 
consequence is the inequivalence of canonical and 
microcanonical ensembles. 


Numerical Methods 


The foregoing considerations are difficult to extend 
to systems without a boundary, although they are a 
vital guide to the behavior even in this case. Our 
knowledge of such systems is due largely to 
numerical experiments, which fall into several 
classes: 


1. Direct N-body calculations. These minimize the 
number of simplifying assumptions, but are 
expensive. Special-purpose hardware is readily 
available, which greatly accelerates the necessary 
calculations. Great care has to be taken in the 
treatment of few-body configurations, which 
otherwise consume almost all resources. 

2. Hierarchical methods, including tree methods, 
which shorten the calculation of forces by 
grouping distant masses. They are mostly used 
for collisionless problems. 

3. Grid-based methods, which are used for colli- 
sionless problems. 

4. Fokker—Planck methods, which usually require a 
theoretical knowledge of the statistical effects of 
two-, three- and four-body interactions. Other- 
wise they can be very flexible, especially in the 
form of Monte Carlo codes. 

5. Gas codes. The behavior of a self-gravitating 
system is simulated surprisingly well by modeling 
it as a self-gravitating perfect gas, rather like a 
star. 


Collisional Evolution 


Consider an isolated N-body system, which is 
supposed initially to be given by a spherically 
symmetric equilibrium solution of eqns [6] and [7], 
such as eqn [8]. The temperature decreases with 
increasing radius, and a gravothermal runaway 
causes the “collapse” of the core, which reaches 
extremely high density in finite time. (This collapse 
takes place on the long two-body relaxation timescale, 
and so it is not the rapid collapse, on a free-fall 
timescale, which the name rather suggests.) 

Figure 4 Gravothermal oscillations in an N-body system with 
N = 65 536. The central density is plotted as a function of time in 
units such that $t_{cr} = 2\sqrt{2}$. (Source: Baumgardt H, Hut P, 
and Makino J, with permission.) 

At sufficiently high densities, the timescale of 
three-body reactions becomes competitive. These 
create bound pairs, the excess energy being removed 
by a third body. From the point of view of the one- 
particle distribution function, f, these reactions are 
exothermic, causing an expansion and cooling of the 
high-density central regions. This temperature inver- 
sion drives the gravothermal runaway in reverse, 
and the core expands, until contact with the cool 
envelope of the system restores a normal tempera- 
ture profile. Core collapse resumes once more, and 
leads to a chaotic sequence of expansions and 
contractions, called gravothermal oscillations 
(Figure 4). 

The monotonic addition of energy during the 
collapsed phases causes a secular expansion of the 
system, and a general increase in all timescales. In 
each relaxation time, a small fraction of the masses 
escape, and eventually (it is thought) the system 
consists of a dispersing collection of mutually 
unbound single masses, binaries, and (presumably) 
stable higher-order systems. 

It is very remarkable that the long-term fate of 
the largest self-gravitating N-body systems appears 
to be intimately linked with the three-body 
problem. 

In elementary physics presentations, one learns about 
electricity and magnetism, and also about gravity. 
There appear striking similarities between Newton’s 
law of gravitational attraction and Coulomb’s law of 
attraction between charges. There are also obvious 
differences, the most immediate one being that in 
gravitation all masses are positive and always attract 
each other, whereas in electromagnetism charges may 
attract or repel, depending on their signs. We also 
know today that Newton’s theory of gravity is not 
considered an entirely correct description of the 
gravitational field, particularly when fields are time 
dependent and intense. The currently accepted theory 
of gravity is Einstein’s theory of general relativity. 

The similarity between electromagnetism and 
gravitation also holds to a certain extent when the 
fields depend on time. This is usually not discussed 
in elementary treatments since a full description of 
time-dependent gravitational fields requires the use 
of general relativity. It is true, however, that if the 
fields are weak, there exist several similarities 
between gravitation and electromagnetism. In parti- 
cular, one can have waves in the gravitational field 
that are able to carry energy from a source to a 
receptor. 

If one assumes that the metric of spacetime is 
close to the flat Minkowski metric $\eta_{\mu\nu}$, that is, 
$g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}$ with $|h_{\mu\nu}| \ll 1$ in Cartesian coordinates, 
the Einstein equations of general relativity, 
expanding to linear order in $h_{\mu\nu}$, become 

$$0 = R_{\mu\nu} - \tfrac{1}{2}\eta_{\mu\nu} R = \tfrac{1}{2}\left(\partial_\sigma\partial_\mu h^\sigma{}_\nu + \partial_\sigma\partial_\nu h^\sigma{}_\mu - \partial_\mu\partial_\nu h - \Box h_{\mu\nu} - \eta_{\mu\nu}\partial_\rho\partial_\lambda h^{\rho\lambda} + \eta_{\mu\nu}\Box h\right) \qquad [1]$$

These do not look like wave equations. However, 
if one chooses “harmonic coordinates,” $\Box x^\mu = 0$, 
where $\Box$ is the d’Alembertian constructed with the 
full metric and then linearized, the vacuum Einstein 
equations become 

$$\Box h_{\mu\nu} = 0 \qquad [2]$$

where $\Box$ is the d’Alembertian computed in the flat 
Minkowski metric. 

Just as in electromagnetism the motion of charges 
produces waves, the motion of masses produces 
waves in the gravitational field. In the above wave 
equations, one would have nonzero right-hand sides 
if masses were present. In electromagnetism, the 
conservation of charge implies that the lowest order 
of “structure” a source must have to produce 
electromagnetic waves is that of a time-dependent 
dipole. In the gravitational field, the conservation of 
momentum implies that the lowest multipolar order 
of a source of gravitational waves must be a 
quadrupole. Moreover, gravity is a weaker force 
than electromagnetism when one considers usually 
available situations. One can exert forces of the 
order of fractions of a newton with electromagnetic 


charges easily collected in tabletop experiments. To 
produce similar amounts of gravitational force, one 
needs large quantities of mass. This last fact, 
coupled with the quadrupolar nature of the sources 
of gravitational waves, makes their production quite 
challenging in experimental terms. The luminosity of 
a gravitational wave source is given by the cele- 
brated Einstein quadrupole formula, 


$$L = \frac{G}{5c^5} \sum_{ij} \left(\dddot{I}_{ij}\right)^2 \qquad [3]$$


where G is Newton’s gravitational constant, c is the 
speed of light, and $\dddot{I}_{ij}$ is the third-order time 
derivative of the traceless part of the quadrupole 
mass moment of the source. 
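As a worked example of the quadrupole formula, consider two point masses on a circular Newtonian orbit; this special case is not discussed in the text and is chosen here only because everything can be written in closed form. With reduced mass mu, separation a and orbital frequency omega (omega^2 = G(m1 + m2)/a^3), the nonzero third derivatives of the traceless quadrupole have amplitude 4 mu a^2 omega^3, and summing their squares gives L = (32/5) G^4 m1^2 m2^2 (m1 + m2) / (c^5 a^5). The sketch below evaluates the formula both ways; the function names and the rounded SI constants are assumptions of the sketch.

```python
import math

G = 6.674e-11      # m^3 kg^-1 s^-2 (rounded)
C = 2.998e8        # m/s (rounded)
M_SUN = 1.989e30   # kg (rounded)

def quadrupole_luminosity(i3dot):
    """L = G / (5 c^5) * sum_ij (d^3 I_ij / dt^3)^2 for a 3x3 nested list."""
    total = sum(i3dot[i][j] ** 2 for i in range(3) for j in range(3))
    return G / (5.0 * C ** 5) * total

def circular_binary_i3dot(m1, m2, separation, t):
    """Third time derivative of the traceless quadrupole of a circular binary."""
    mu = m1 * m2 / (m1 + m2)
    omega = math.sqrt(G * (m1 + m2) / separation ** 3)
    amp = 4.0 * mu * separation ** 2 * omega ** 3
    s, c = math.sin(2.0 * omega * t), math.cos(2.0 * omega * t)
    return [[amp * s, -amp * c, 0.0],
            [-amp * c, -amp * s, 0.0],
            [0.0, 0.0, 0.0]]

def circular_binary_luminosity(m1, m2, separation):
    """Closed form (32/5) G^4 m1^2 m2^2 (m1 + m2) / (c^5 a^5)."""
    return (32.0 / 5.0 * G ** 4 * (m1 * m2) ** 2 * (m1 + m2)
            / (C ** 5 * separation ** 5))

if __name__ == "__main__":
    m1 = m2 = 1.4 * M_SUN
    a = 2.0e9  # metres, an illustrative separation
    direct = quadrupole_luminosity(circular_binary_i3dot(m1, m2, a, t=123.0))
    print(direct, circular_binary_luminosity(m1, m2, a))
```

The steep dependence on the separation (as a^-5) is what makes compact, close binaries the most promising emitters.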

Gravity is, however, a dominant force if one 
considers the universe at large (say, at least 
planetary) scales. There one would expect gravita- 
tional waves to play some role in the dynamics of 
the systems. In such systems, the presence of 
gravitational waves has indeed been experimentally 
confirmed. We know of a system of two pulsars in 
mutual orbit, PSR 1913+16, whose orbit has been 
tracked with enough accuracy via radioastronomy 
to make the influence of gravitational waves 
observable. The motion of the pulsars makes the 
system an emitter of gravitational waves. Since the 
waves carry away energy, the orbit of the system 
shrinks and the orbital period 
decreases. The system has now been tracked for over 
20 years, and the prediction of the emitted amount 
of energy in gravitational waves due to general 
relativity has been confirmed with a very significant 
degree of accuracy. Penrose was the first to notice 
that if one considers how accurately Newton’s 
theory plus the corrections due to general relativity 
predict the positions of the pulsars in their orbit, this 
is in fact the most accurately verified physical 
prediction ever. 

Technically, even the existence of gravitational 
waves at a conceptual mathematical level was an 
open problem for many years. Since the correct 
description of the waves is through the general 
theory of relativity, a “gravitational wave” should 
really be viewed as a “ripple in spacetime.” Disentangling 
whether such ripples are a true physical effect or a 
time-dependent coordinate transformation that propagates, 
to use the words of Eddington, “with the 
speed of thought” took quite a bit of technical 
development within the general theory of relativity. 
It was only in the 1960s that a clear enough 
conceptual picture was developed to determine that 
gravitational waves were indeed a true physical 
phenomenon akin to electromagnetic waves, and in 
particular that one can unambiguously characterize 
them as transporting energy, momentum, and 
angular momentum from a source to an observer. 

Gravitational waves are as difficult to detect as 
they are to produce. Since all masses fall in the same 
way in a gravitational field, one needs to couple to 
the gradients of the field to detect gravitational 
waves, which diminishes the efficiency. Attempting 
to produce gravitational waves via mechanical 
means in the lab (e.g., by rotating a bar of metal) 
produces too little luminosity, and in addition, the 
relatively low frequency implies that the wave zone 
is far away, which further decreases the chances for 
detection. To date, no one has succeeded in 
producing a Hertz-like experiment for gravitational 
waves, and the jury is still out on whether future 
technologies (e.g., the use of superconductors to 
produce waves of gigahertz frequency) will ever 
allow such an experiment. 

Efforts to detect gravitational waves 
produced by astrophysical phenomena started in the 
1960s with pioneering work by Weber. The initially 
proposed technology for detection was the construc- 
tion of large (~1 ton) resonant bars. The idea was 
to use sensitive technology to measure the resonance 
of the bar as gravitational waves of astrophysical 
origin impact on it. Gravitational waves manifest 
themselves as a stretching and contraction of 
lengths. The contraction or stretching is propor- 
tional to the length of the object considered and is 
therefore characterized by a dimensionless number, 
the “strain” AL/L usually called “h.” Conservative 
current estimates of possible astrophysical sources 
state that on Earth one should not expect strains 
larger than 1072 for events that repeat more 
frequently than a few times every year. Detectors 
with bar technology are approaching their funda- 
mental quantum limits with strains that appear to be 
too large for detection to be ensured. This led to the 
proposal of a new technology, the use of Michelson- 
type interferometers to detect the waves. Currently, 
several interferometric detectors are being built in 
the US, Europe, Japan, and Australia that expect to 
achieve enough sensitivity for detection within a few 
years. In contrast to the bars, which are quintessentially 
narrow-band detectors (most bars operate at 
~900 Hz with a bandwidth of ~10 Hz), interferometric 
detectors are broadband. Current detectors 
have a sensitivity curve limited by various sources of 
noise that make them suitable for detection within 
the 10 Hz—1 kHz band. The broadband nature of the 
detectors opens several opportunities for the use of 
data analysis techniques that can allow the detection 
of gravitational waves that have strains even lower 
than the noise of the detectors. Moreover, several of 
the candidate events “evolve” in frequency as they 
emit gravitational waves (in the case of the binary 
pulsar, for instance, the frequency “sweeps up” as the 
system loses energy), and such evolution could be 
monitored with interferometric detectors. This 
would allow several insights into the physics of the 
observed systems. 

An important limitation of any type of detector 
based on Earth is that the seismic noise increases 
quite significantly below 10 Hz. Even if seismic 
isolation allowed sensitivities below 10 Hz, gravity 
gradients due to Earth’s seismic motion and due to 
clouds would limit ground-based detectors to 1 Hz 
and above. The frequency at which a system emits 
gravitational waves is inversely proportional to the 
system’s mass (a simple way to see this is to realize 
that larger systems move more slowly in proportion to 
their size). However, larger systems generically have 
more mass and therefore emit larger 
amounts of energy in gravitational waves. This 
suggests that setting up detectors in space, free of 
the constraints of seismic noise, would offer sig- 
nificant promise in detecting gravitational waves. 
Currently, there is a proposal for a space-borne 
gravitational wave detector consisting of three 
satellites in a solar orbit that trails that of Earth. 
Laser beams would be exchanged between the satellites, 
which would be separated by 5 million kilometers, to 
track their relative positions. Such a detector would be 
sensitive in frequencies of 10-107 Hz. In such a 
frequency band, one expects that compact objects 
plunging into supermassive black holes and other 
sources will be readily available. Detection of 
gravitational waves on Earth is considered marginal, 
in the sense that conservative current estimates 
cannot guarantee that there will be enough events 
to make the detection successful at significant event 
rates. Conversely, for the detectors in space, detec- 
tion should be guaranteed at high event rates. 
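
The inverse scaling of frequency with mass can be made quantitative with a rough estimate. The sketch below uses a commonly quoted rule of thumb that is not part of the text: the gravitational-wave frequency when a binary of total mass M reaches the innermost stable circular orbit of a non-spinning black hole of that mass, f = c^3 / (6^(3/2) pi G M). It is an order-of-magnitude guide only, and the function name and rounded constants are assumptions.

```python
import math

G = 6.674e-11      # m^3 kg^-1 s^-2 (rounded)
C = 2.998e8        # m/s (rounded)
M_SUN = 1.989e30   # kg (rounded)

def characteristic_gw_frequency(total_mass_kg):
    """Rough end-of-inspiral frequency f = c^3 / (6^1.5 * pi * G * M), in Hz."""
    return C ** 3 / (6.0 ** 1.5 * math.pi * G * total_mass_kg)

if __name__ == "__main__":
    for label, mass in (("neutron-star binary, 2.8 solar masses", 2.8 * M_SUN),
                        ("black-hole binary, 20 solar masses", 20.0 * M_SUN),
                        ("supermassive black hole, 1e6 solar masses", 1.0e6 * M_SUN)):
        print(f"{label}: ~{characteristic_gw_frequency(mass):.3g} Hz")
```

Stellar-mass systems come out in the range accessible from the ground, while million-solar-mass systems fall well below the seismic cutoff, which is the motivation for space-based detectors given above.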

Possible sources of gravitational waves to be 
detected by the Earth-based interferometric detec- 
tors are: 


1. Binary systems of compact objects. As the system
orbits, it emits gravitational waves, which makes
the orbit shrink in size and the orbital period
shorten, with the objects eventually merging.
Potential systems include black hole binaries,
neutron star binaries and mixed black hole/neutron
star binaries. As the system sweeps up in orbital
speed towards the merger, so does the frequency of
the gravitational waves emitted. For binaries of
neutron stars, which usually have masses slightly
larger than the mass of the Sun, the last few minutes
of the binary inspiral will be detectable by the
current generation of gravitational wave detectors,
up to a distance of several megaparsecs for the
initial detectors, increasing to a few hundred
megaparsecs with the improvements planned for the
next few years. For black hole binaries, since the
masses can be larger, one expects a larger
signal-to-noise ratio at the same distance, or
equivalently detection at larger distances.

2. Spinning neutron stars that develop “mountains” 
or other irregularities in the surface would 
produce gravitational waves of small amplitude 
but of a very regular periodic nature. This makes 
them prime candidates for data analysis techni- 
ques that could exhibit the presence of the wave 
even though it is weaker than the background 
noise of the interferometers. Integration periods 
of several months may be needed for detection, 
depending on the size of the asymmetries in the 
neutron stars. 

3. Supernovas or other violent events are obviously
possible sources of gravitational waves. However,
the quadrupole nature of the waves requires
the events to be asymmetric in order to produce
gravitational waves. Current numerical models of
supernovas are not accurate enough to predict the
level of asymmetry clearly, so they cannot yet give
reliable estimates of how frequently and at what
intensity these types of sources could be detected.

4. The primordial background of gravitational 
waves produced in the big bang is not expected 
to be detectable by the Earth-based detectors. 
The precise amplitude of the background is 
unknown, depending on details of cosmological 
models. The detectors are likely to be able to 
constrain some of the models that predict large 
amplitudes for the gravitational wave 
background. 


For the space-based detectors, the situation is 
more favorable, since there exist sources of gravita- 
tional waves that are guaranteed to be detected. 
Potential sources of gravitational waves are: 


1. Merger of the supermassive black holes at the 
centers of two galaxies. Given the large amounts 
of mass involved, they would be easily detected 
and very precise measurements of the system’s 
parameters and of various general relativistic 
behaviors could be possible. Such systems should 
be detectable all across the universe, although it 
is not expected that such systems form for 
redshifts larger than 30. 

2. Inspiral of compact objects into the supermassive
black holes at the centers of galaxies (neutron
stars, white dwarfs, solar-sized black holes).
These processes will allow the use of gravitational
waves to map precisely the gravitational
field of the supermassive object.

3. White-dwarf binaries and low-mass X-ray bin- 
aries. There exist about a dozen such systems 
optically observable with gravitational wave 
frequencies above 0.1 mHz that the space-based 
detectors should be able to detect. There is likely 
to be a large population of other systems that are 
also detectable and are not optically visible. In 
fact, there may be so many of these sources that 
time resolution would be impossible, and they 
would form a random background. 

4. Collapse of supermassive stars. The formation 
mechanism for the supermassive black holes in 
the centers of galaxies is still uncertain. One 
possibility is that they stem from the collapse of 
supermassive stars, and in that case a potentially 
significant emission of gravitational waves could 
take place. 

5. Primordial background of gravitational waves. 
Unfortunately, the abundance of white-dwarf 
binaries as a source is expected to cloud the ability 
of the space detectors to observe primordial 
gravitational waves in an important portion of the 
spectrum of the instrument, although it appears 
possible at low frequencies, where it could compete 
with the bounds set by pulsar timing. 


The current Earth-based gravitational wave projects
include the LIGO project in the US, funded by
the National Science Foundation and jointly operated
by Caltech and MIT together with a consortium of
institutions known as the LIGO Scientific Collaboration.
LIGO consists of two 4 km long Fabry-Perot
recycled Michelson interferometers, one in Hanford,
WA, and one in Livingston, LA. In Europe, the
GEO600 project is a 600 m dual-recycled interferometer
near Hanover in Germany, and the Virgo
project is a 3 km interferometer near Pisa in Italy,
operated by a French-Italian consortium, with an
optical configuration similar to that of LIGO. TAMA300 is
a 300 m interferometer in Japan, also with the same
configuration as LIGO. When all these detectors are
in operation, sources seen in coincidence could be
localized by triangulation. TAMA is now operating
close to design sensitivity; GEO600 and LIGO are
likely to operate at design sensitivity in 2006, with
Virgo following close behind. The space-based
interferometer project is called LISA and
is planned as a joint NASA/ESA project. ESA has
approved a launch date of 2015, but it is
plausible that the mission could be launched at an
earlier date.
A direct detection of gravitational waves would be a
breakthrough in experimental science, as well as a 
confirmation of the dynamic nature of gravity in 
general relativity. Once the detection of gravitational 
waves becomes a routine matter, one can imagine a 
revolution in astronomy as one uses gravitational 
waves to “see” the universe. Since they are so hard to 
produce and interfere with, gravitational waves 
become an excellent type of “light” to look at the 
universe with. Gravitational waves will be produced by
important concentrations of mass, correlating well
with “interesting” astronomical processes, and are not
expected to be affected by the presence of dust or other
interfering objects that could easily obscure
electromagnetic waves. In addition to this, one has several
“standard candles” for gravitational waves (e.g., most
neutron stars have masses that differ by a few percent
from 1.4 solar masses). This could allow one, for instance,
to determine the Hubble constant with a high degree of
accuracy. Gravitational waves will also provide insight
into the nuclear equation of state that holds in the 
interior of compact objects like neutron stars. Contrary 
to ordinary electromagnetic radiation, which 
“decoupled” from matter only when the universe 
became cool enough after the big bang, gravitational 
waves could be used to probe the universe further into 
the past. The detection could also prove that gravita- 
tional waves travel at the speed of light, a prediction of 
general relativity and other theories. 

An interesting observation is that most astrophysical 
objects that are quite visible in the electromagnetic 
spectrum are unlikely to be visible in terms of 
gravitational waves, and vice versa. This makes the 
information we will gather from gravitational wave 
astronomy complementary to what we learn from 
optical (electromagnetic) astronomy. Moreover, it 
should be noted that wavelengths of electromagnetic 
waves are typically very small compared to the size of 
the astronomical objects they depict. This is due to the 
fact that the waves are really not produced by the 
objects themselves but by atoms on the surface of the 
objects or in regions nearby, usually very hot and in 
gaseous form. In contrast, gravitational waves are 
produced by the bulk matter of astronomical objects 
and their wavelengths are expected to be long as 
compared to the objects that produce them. They are 
more akin to a sound than to light in this respect, 
another reason to suspect that the information we will 
get from them is unlike any information obtained 
electromagnetically. 

Gravitational waves are likely to bring great 
surprises. Every time a new window has been 
opened on the universe — for instance, the use of 
radio waves — our view of the universe has been 
revolutionized. Given how differently they operate 
at a detailed level with respect to radio waves, the
surprises from gravitational waves used as tools to 
view the universe are potentially even greater. 


See also: Asymptotic Structure and Conformal 
Infinity; Computational Methods in General Relativity: 
The Theory; General Relativity: Experimental Tests; 
General Relativity: Overview. 
Introduction


Probability distributions coming from random matrix 
theory (RMT), RMT laws, occur in different contexts, 
notably in quantum physics and in number theory. 
RMT laws are also seen in certain local random growth 
models and related problems in discrete probability, 
random permutations, exclusion processes, and ran- 
dom tilings (dimer models). In these models limit laws 
for height/shape fluctuations are given by limit laws 
from RMT, in particular the largest eigenvalue or 
Tracy-Widom distributions. These models belong to
the Kardar-Parisi-Zhang (KPZ) universality class.
Models in this class have two universal exponents,
1/3 describing the interface fluctuations and 2/3
describing the correlations in the transversal direction. By a
local random growth model, we mean a model where 
the random growth mechanism is local in that it does 
not depend on the global geometry as in diffusion 
limited aggregation (DLA). Typically there is also some 
smoothing mechanism. The connection with RMT can 
only be established for special exactly solvable models. 
Below we discuss a basic model based on a last-passage 
percolation problem, which translates into a poly- 
nuclear growth (PNG) process. Other models that can 
be treated are in a sense variations of this model. Point 
processes with determinantal correlation functions 
play a central role in RMT and in the analysis of the 
basic model and we start by discussing these. The basic 


model has several different interpretations that will be 
outlined. Another basic tool, which can be formulated 
in different ways, is the Robinson-Schensted-Knuth
(RSK) correspondence, well known in combinatorics.
One approach is in terms of nonintersecting paths,
which translates into a multilayer PNG process. Limit
theorems can be formulated for the height above a
fixed location, and also for the whole height function in
terms of the Airy process, which extends the Hermitian
Tracy-Widom distribution $F_2$. It is expected that
several results should generalize to a broader class of 
models. There is a natural universality problem of 
extending the validity of the RMT laws. 


Determinantal Processes 


Point processes with determinantal correlation func- 
tions play an important role in the exactly solvable 


models. We consider probability measures on
$\Lambda^n$, $\Lambda \subseteq \mathbb{R}$, of the form

$$\frac{1}{Z_n} \det(\phi_i(x_j))_{i,j=1}^{n} \det(\psi_i(x_j))_{i,j=1}^{n} \, d^n\mu(x)$$ [1]

which can be thought of as describing random points
in $\Lambda$ at positions $x_1,\dots,x_n$. Here, $\mu$ is a reference
measure on $\Lambda$, for example, Lebesgue or counting
measure, $Z_n$ a normalization constant, and $\phi_i$, $\psi_i$ given
functions. A measure of this form has determinantal
correlation functions in the sense that the density, with
respect to $d^m\mu(y)$, of particles at $y_1,\dots,y_m$ is

$$\rho_m(y_1,\dots,y_m) = \det(K_n(y_i,y_j))_{i,j=1}^{m}$$ [2]

There is an explicit formula for the correlation
kernel $K_n$ in terms of the functions $\phi_i$, $\psi_i$.


The eigenvalue measures in the basic random
matrix ensembles have the form

$$\frac{1}{Z_n} |\Delta_n(x)|^{\beta} \prod_{j=1}^{n} w(x_j) \, d^n\mu(x)$$ [3]

where $\Delta_n(x) = \det(x_i^{j-1})_{i,j=1}^{n}$ is Vandermonde’s
determinant, $x \in \Lambda^n$ and $x_1,\dots,x_n$ are the eigenvalues.
For the Gaussian unitary ensemble (GUE$_n$),
$Z^{-1}\exp(-\operatorname{tr} M^2)\,dM$ of $n \times n$ Hermitian matrices $M$,
we have $\beta = 2$, $\Lambda = \mathbb{R}$, $w(x) = \exp(-x^2)$ and $\mu$ the
Lebesgue measure. For the Laguerre unitary ensemble
(LUE$_{n,\nu}$) of complex covariance matrices, $M^*M$,
where $M$ is an $(n+\nu) \times n$ matrix with standard
complex Gaussian elements, we have $w(x) = x^{\nu}e^{-x}$,
$\nu \ge 0$, $\beta = 2$, $\Lambda = [0,\infty)$ and $\mu$ the Lebesgue
measure. The $\beta = 2$ case of [3] can be put into the
form [1] and hence has determinantal correlation
functions. In this case the correlation kernel can be
expressed in terms of the normalized orthogonal
polynomials $p_k(x)$ with respect to $w(x)\,d\mu(x)$ on $\Lambda$.
Because of this, when $\beta = 2$ the ensemble [3] is
referred to as an orthogonal polynomial ensemble
(OPE). The kernel is given by

$$K_n(x,y) = \sum_{k=0}^{n-1} p_k(x)\,p_k(y)\,(w(x)w(y))^{1/2}$$ [4]


A consequence of [2] is that the probability of
finding no particle in a set $J \subseteq \Lambda$ is given by a
Fredholm determinant,

$$P[\text{no particle in } J] = \det(I - K_n)_{L^2(J,\mu)}$$ [5]

In particular the distribution function $F(\xi)$ of the largest
eigenvalue or rightmost particle $x_{\max} = \max_{1 \le j \le n} x_j$
in an OPE is given by [5] with $K_n$ as in [4] and
$J = (\xi,\infty)$.


A Basic Model 


Let $(w(i,j))_{(i,j) \in \mathbb{Z}_+^2}$ be independent geometric random
variables with parameter $a_i b_j$,

$$P[w(i,j) = k] = (1 - a_i b_j)(a_i b_j)^k$$ [6]

$k \ge 0$ and $0 \le a_i b_j < 1$. As a limiting case we can
obtain exponential random variables. Consider the
last-passage time

$$G(M,N) = \max_{\pi} \sum_{(i,j) \in \pi} w(i,j)$$ [7]

where the maximum is over all up/right paths $\pi$
from (1,1) to (M,N), that is, $\pi = \{(i_1,j_1),\dots,(i_m,j_m)\}$
with $(i_{k+1},j_{k+1}) - (i_k,j_k) = (1,0)$ or $(0,1)$, $(i_1,j_1) = (1,1)$
and $(i_m,j_m) = (M,N)$, $m = M+N-1$. We can
also think of this as a zero-temperature directed
polymer, by thinking of the $w(i,j)$ as (minus)
energies and $\pi$ as random walk paths.

As will be explained in some more detail below,
if the $w(i,j)$ are exponential with mean 1 and
$M \ge N$, then $G(M,N) = \lambda_{\max}$ in distribution,
where $\lambda_{\max}$ is the largest eigenvalue in LUE$_{N,M-N}$.
Hence, in this case G(M,N) behaves exactly like a
largest eigenvalue. If the $w(i,j)$ are geometric with
parameter q, then G(M,N) has the same distribution
(up to a deterministic shift by $N-1$) as the rightmost
particle in an OPE, namely [3], with $\beta = 2$,
$w(x) = \binom{x+M-N}{x} q^x$ and $\mu$ the counting
measure on $\Lambda = \mathbb{N}$, called the Meixner ensemble.
Since in this case the relevant orthogonal polynomials
are discrete the ensemble is referred to as a discrete
OPE.

The random variables $\{G(M,N)\}_{(M,N) \in \mathbb{Z}_+^2}$ have two
interpretations related to random growth. It follows
from [7] that


G(M, N) 
=max(G(M — 1,N),G(M,N —1)) 
+w(M,N) [8] 


This can be thought of as a growth rule. We change
variables by letting $G(M,N) = h(M-N, M+N-1)$
and $w(M,N) = \omega(M-N, M+N-1)$ with $w(M,N) = 0$
if $(M,N) \notin \mathbb{Z}_+^2$. Then

$$h(x,t+1) = \max(h(x-1,t), h(x,t), h(x+1,t)) + \omega(x,t+1)$$ [9]

$x \in \mathbb{Z}$, $t \in \mathbb{N}$, $h(x,0) = 0$, and $\omega(x,t) = 0$ if $|x| \ge t$ or if
$x - t$ is even. We can extend it to the whole real line
by letting $h(x,t) = h([x],t)$. The growth rule [9] is a
discrete polynuclear growth (PNG) model. Up-steps
in the interface, $x \to h(x,t)$ (see the top curve in
Figure 1), move at unit speed to the left and down-steps
move at unit speed to the right and they merge
at collision. On top of this smoothing mechanism,
we have random deposition given by $\omega(x,t)$. Looking
at the definition of $\omega$, we see that all deposition
up to time t is on top of a basic layer $(-t,t)$. The
asymptotic shape will look like a droplet and this
setting of PNG is called the droplet geometry. We
see that height fluctuations are directly related to
fluctuations of G(M,N).
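
The recursion [8] gives a direct way to compute last-passage times. The following is a minimal sketch; the function names and the zero-based indexing are choices made here for illustration, and the geometric sampler follows the law [6] with a single parameter q.

```python
import random


def last_passage_times(w):
    """Return the table G with G[i][j] equal to the last-passage time
    to site (i+1, j+1), computed with the recursion
    G(M,N) = max(G(M-1,N), G(M,N-1)) + w(M,N).

    w is a rectangular list of lists of nonnegative weights.
    """
    rows, cols = len(w), len(w[0])
    G = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            from_below = G[i - 1][j] if i > 0 else 0
            from_left = G[i][j - 1] if j > 0 else 0
            G[i][j] = max(from_below, from_left) + w[i][j]
    return G


def geometric(q, rng):
    """Sample k >= 0 with P[k] = (1 - q) * q**k."""
    k = 0
    while rng.random() < q:
        k += 1
    return k


if __name__ == "__main__":
    rng = random.Random(0)
    n, q = 200, 0.25
    weights = [[geometric(q, rng) for _ in range(n)] for _ in range(n)]
    print("G(n,n)/n =", last_passage_times(weights)[-1][-1] / n)
```

Each anti-diagonal of the table corresponds to one time step of the growth rule, so filling the table row by row is equivalent to running the discrete PNG dynamics.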

We get another growth model, the corner growth
model, somewhat similar to the classical Eden
growth model, by considering the random shape
(see Figure 2),

$$\Omega(n) = \{(M,N) \in \mathbb{Z}_+^2 ;\ G(M,N) + M + N - 1 \le n\} + [-1,0]^2$$ [10]
Figure 1 Multilayer PNG model.





Figure 2 Corner growth model at time n= 5. The crosses are 
the possible growth sites. 


The complement of this set in $\mathbb{R}_+^2$ has a boundary
$B(n)$ which we can think of as an interface. By [8]
and the lack of memory property of the geometric/
exponential distribution, the region $\Omega(n)$ grows by
adding new squares independently at each corner of
$B(n)$ with geometric/exponential waiting times (see
Figure 2). If we look at $B(n)$ in a coordinate system
with $M = N$ as vertical axis and write a 1 for every
unit down-step on $B(n)$ and a 0 for every unit up-step
(see Figure 2), the corner growth dynamics
translates into the totally asymmetric simple exclusion
process (TASEP), in discrete or continuous
time, with initial configuration ...1111000... . As
shown by Jockusch, Propp, and Shor, $\Omega(n)$ also
occurs in a uniform random domino tiling of a
region called the Aztec diamond (see Figure 3). The
shape $\Omega(n)$, when $q = 1/2$, has the same law as the
completely regular (frozen) North Polar Region
(NPR) in the tiling and hence the boundary
fluctuations of the NPR are related to the fluctuations
of G(M,N). The NPR in Figure 3 has the same
shape as $\Omega(5)$ in Figure 2. This connects the models
considered here with dimer or tiling problems in
two-dimensional equilibrium statistical mechanics.

Figure 3 Domino tiling of an Aztec diamond of size n=5.
Dominos marked by dots form the NPR.


Consider a (Poissonized) random permutation $\sigma$
from $S_N$, where N is a Poisson($\alpha$) random variable.
Let $L(\alpha)$ denote the length of the longest increasing
subsequence in $\sigma$, for example, $\sigma = 316452$ has
$L = 3$. By thinking of the representation of a
permutation by its permutation matrix, we see that
G(N,N) with $w(i,j)$ geometric with parameter
$q = \alpha/N^2$ converges to $L(\alpha)$ in distribution as
$N \to \infty$. We call this limit the Poisson limit. Taking
this limit in the PNG process yields the Prähofer-Spohn
continuous time PNG (cont-PNG) model,
which is similar to the discrete PNG defined above
but where all steps have unit size and we have
continuous time dynamics with deposition events
according to a two-dimensional spacetime Poisson
process. The study of $L(\alpha)$, and its de-Poissonization
when N is nonrandom, is known as Ulam’s problem
in combinatorial probability.
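
The length of a longest increasing subsequence is easy to compute. The sketch below uses patience sorting with binary search, a standard technique chosen here for illustration (the text does not prescribe an algorithm), and checks the example permutation 316452.

```python
from bisect import bisect_left


def longest_increasing_subsequence_length(seq):
    """Length of a longest strictly increasing subsequence of seq.

    tails[k] holds the smallest possible last element of an increasing
    subsequence of length k + 1 seen so far (patience sorting).
    """
    tails = []
    for x in seq:
        pos = bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)   # x extends the longest subsequence found so far
        else:
            tails[pos] = x    # x gives a smaller tail for length pos + 1
    return len(tails)


if __name__ == "__main__":
    print(longest_increasing_subsequence_length([3, 1, 6, 4, 5, 2]))
```

For the permutation 316452 the function returns 3, attained for instance by the subsequence 3, 4, 5.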


The RSK Correspondence 


The mapping of the last-passage problem [7] into a
determinantal process is based on the RSK correspondence.
This correspondence maps the integer
matrix $(w(i,j))_{1 \le i,j \le M}$ bijectively to a pair of
semistandard Young tableaux (P,Q) with common
shape $\lambda$, which is a partition $\lambda = (\lambda_1, \lambda_2, \dots)$ of
$\sum_{1 \le i,j \le M} w(i,j)$. This map has the property that
$G(M,N) = \lambda_1$, the length of the first row in the
Young diagram. From the combinatorial definition
of the Schur polynomials $s_\lambda$ it follows that the
measure [6] on the integer matrix is mapped to a
probability measure on partitions, the Schur measure,
given by

$$P_{\mathrm{Schur}}[\lambda] = \frac{1}{Z}\, s_\lambda(a_1,\dots,a_M)\, s_\lambda(b_1,\dots,b_M)$$ [11]

where $Z = \prod_{i,j=1}^{M} (1 - a_i b_j)^{-1}$ is the normalization constant.
This measure has determinantal correlation functions
if we think of $x_i = \lambda_i - i$ as the positions of particles
in $\mathbb{Z}$. If we use $x_i = \lambda_i + N - i$ as variables and
specialize to $a_1 = \dots = a_M = \sqrt{q}$, $b_1 = \dots = b_N = \sqrt{q}$
and $b_j = 0$ for $j > N$ we get the Meixner ensemble.
The case of exponential random variables, for
example the relation to LUE discussed above, is
obtained from the Meixner ensemble by taking an
appropriate limit. In the Poisson limit we get the
Poissonized Plancherel measure,

$$P_{\mathrm{Plan}}^{\alpha}[\lambda] = e^{-\alpha} \sum_{N=0}^{\infty} \frac{\alpha^N}{N!}\, P_{\mathrm{Plan},N}[\lambda]$$ [12]

where $P_{\mathrm{Plan},N}[\lambda] = (\dim \lambda)^2/N!$ if $\lambda$ is a partition of
N and 0 otherwise. Here $\dim \lambda$ is the dimension
of the irreducible representation of $S_N$ labeled by $\lambda$.
In the work of Borodin and Olshanski in representation
theory various measures on partitions with
determinantal correlation functions occur naturally.
Also Okounkov and co-workers have used the
Plancherel and Schur measures in Gromov-Witten
theory. The Poissonized Plancherel measure, represented
as the point process $(x_i)_{i \ge 1}$ in $\mathbb{Z}$
with $x_i = \lambda_i - i$, has the correlation kernel, called the
discrete Bessel kernel,

$$B^{\alpha}(x,y) = \sqrt{\alpha}\, \frac{J_x(2\sqrt{\alpha})\, J_{y+1}(2\sqrt{\alpha}) - J_{x+1}(2\sqrt{\alpha})\, J_y(2\sqrt{\alpha})}{x - y}$$ [13]

where $J_n$ is the ordinary Bessel function. The
random variable $L(\alpha)$ has the same distribution as
$\max_i x_i + 1$. Hence, by [5],

$$P[L(\alpha) \le n] = \det(I - B^{\alpha})_{\ell^2(\{n, n+1, \dots\})}$$ [14]

The random variable $L(\alpha)$ also gives the height
above the origin in cont-PNG.
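
The property that the first row of the RSK shape equals the last-passage time can be checked on small matrices. The sketch below is a minimal illustration under stated assumptions: it uses Schensted row insertion on the sequence of column indices (each cell (i,j), read row by row, contributes w(i,j) copies of j) and returns only the shape, not the tableaux. The function name is hypothetical.

```python
from bisect import bisect_right


def rsk_shape(w):
    """Shape (list of row lengths) of the tableau obtained by RSK row
    insertion from the nonnegative integer matrix w.

    Cells are read row by row; cell (i, j) contributes w[i][j] copies
    of the column index j to the inserted sequence.
    """
    rows = []
    for row in w:
        for j, count in enumerate(row):
            for _ in range(count):
                x = j
                for r in rows:
                    # Position of the first entry strictly greater than x.
                    pos = bisect_right(r, x)
                    if pos == len(r):
                        r.append(x)        # x fits at the end of this row
                        break
                    r[pos], x = x, r[pos]  # bump the entry to the next row
                else:
                    rows.append([x])       # bumped out of the last row
    return [len(r) for r in rows]


if __name__ == "__main__":
    w = [[1, 2], [3, 1]]
    print(rsk_shape(w))  # first entry is the last-passage time G(2,2)
```

For the matrix with rows (1, 2) and (3, 1) the best up/right path collects 1 + 3 + 1 = 5, and the shape produced is (5, 2), a partition of the total weight 7 whose first row has length 5.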

There is a geometric interpretation of RSK going
back to Viennot. The pair (P,Q) is represented as a
family of nonintersecting paths in a directed graph.
These paths can be obtained by running a multilayer
version of the PNG process where the sizes of
collisions are deposited as growth in lower layers,
which evolve according to the same PNG dynamics.
The information lost in the collisions is recorded in
the lower layers. This can be done also for
$\{w(i,j)\}_{i+j-1 \le t}$ and leads to a multilayer version of
[9], $\{h_{-j}(x,t)\}_{j \ge 0}$, where $h_{-j}(x,0) = -j$,
and $h(x,t) = h_0(x,t)$ is the top path (see Figure 1).
The Karlin-McGregor theorem or the Gessel-Viennot
method says that the weight (probability)
of a family of nonintersecting paths with fixed initial
and final positions on a weighted directed acyclic
graph is given by a determinant. It follows that the
probability of a certain configuration $\{h_{-j}(0,t)\}_{j \ge 0}$ is
given by a product of two determinants and hence
has the form [1]. In Figure 1, the weights of the
horizontal line segments will be 1, whereas each unit
vertical step has weight $a_i$ or $b_j$ as indicated in the
figure. This leads to [11] using the Jacobi-Trudi
formula for the Schur polynomial.


Limit Theorems 


The existence of a limit shape in a model like [6]
with $w(i,j)$ independent random variables, and in
related problems, follows by a subadditivity argument,
although explicit shapes are only known in a
few cases. The formalism described above makes it 
possible to get more detailed results about the 
fluctuations around the limit shape, like a central- 
limit theorem, but with a non-normal limit law. We 
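know that G(M,N) has the same distribution as
$x_{\max} - N + 1$, where $x_{\max}$ is the rightmost particle in
the Meixner ensemble. This, together with [4], [5],
and an asymptotic analysis of the Meixner polynomials,
gives

$$P[G(M,N) \le \omega(\gamma,q)N + \xi\sigma(\gamma,q)N^{1/3}] \to F_2(\xi)$$ [15]

as $N \to \infty$, $M \to \infty$, $M/N \to \gamma \ge 1$, where

$$\omega(\gamma,q) = \frac{(1+\sqrt{q\gamma})^2}{1-q} - 1$$ [16]

and

$$\sigma(\gamma,q) = \frac{q^{1/6}\gamma^{-1/6}}{1-q}\, (\sqrt{\gamma}+\sqrt{q})^{2/3}\, (1+\sqrt{q\gamma})^{2/3}$$ [17]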


The limiting distribution function $F_2$ is the Tracy-Widom
distribution given by

$$F_2(\xi) = \det(I - A)_{L^2(\xi,\infty)}$$ [18]

where

$$A(x,y) = \frac{\mathrm{Ai}(x)\,\mathrm{Ai}'(y) - \mathrm{Ai}'(x)\,\mathrm{Ai}(y)}{x - y}$$ [19]

is the Airy kernel. It is also the limiting largest
eigenvalue distribution for GUE$_N$,

$$\lim_{N \to \infty} P[\sqrt{2}\,N^{1/6}(\lambda_{\max} - \sqrt{2N}) \le \xi] = F_2(\xi)$$ [20]

The function $F_2$ can also be expressed in terms of a
Painlevé II function. The limit theorem [15]
translates into a fluctuation result for the height
function in the corner growth and the PNG models,
saying that the height fluctuations above a fixed
location at time t are of order $t^{1/3}$ and given by the
$F_2$-distribution. Here we see the KPZ exponent 1/3.

For the length $L(\alpha)$ of a longest
increasing subsequence in a random permutation or
the height above the origin in the cont-PNG, [14]
and asymptotics for Bessel functions yield

$$P[(L(\alpha) - 2\sqrt{\alpha})/\alpha^{1/6} \le \xi] \to F_2(\xi)$$ [21]

as $\alpha \to \infty$. This result was first proved by Baik,
Deift, and Johansson using a Toeplitz determinant
formula (Gessel’s formula) for the left-hand side of
[14] and the Deift-Zhou nonlinear steepest descent
method for oscillatory Riemann-Hilbert problems.
The above limit theorems can be extended to limit 
theorems for the whole point process rescaled 
around the rightmost point. This results in a 
limiting determinantal point process given by the 
Airy kernel [19]. 


The Airy Process 


From the point of view of the growth processes, for
example, the PNG process [9], it is natural to consider
a scaling limit of the whole height function $x \to h(x,t)$
as $t \to \infty$. Looking at the height configuration in the
multilayer growth process $x \to \{h_{-j}(x,t)\}_{j \ge 0}$ at different
locations $x_1,\dots,x_r,x_{r+1},\dots,x_{M-1}$ leads, via the
Karlin-McGregor or Gessel-Viennot method, to
probability measures of the form

$$\frac{1}{Z} \prod_{r=0}^{M-1} \det(\phi_{r,r+1}(y_i^{r}, y_j^{r+1}))_{i,j=1}^{n} \, \prod_{r=1}^{M-1} d^n\mu(y^{r})$$ [22]

with $y^0$ and $y^M$ fixed configurations. Here, in the
discrete PNG model, $\phi_{r,r+1}(x,y)$ is the transition
probability (weight) to go from height x to height y
between positions $x_r$ and $x_{r+1}$. This measure
generalizes [1] and it also has determinantal correlation
functions. Measures of this form also arise in
multimatrix models and in Dyson’s Brownian
motion model, $t \to M(t)$, $t \in \mathbb{R}$, for Hermitian
matrices, which is a Gaussian multimatrix model.
The elements of the time-dependent Hermitian
matrix M(t) evolve according to independent
Ornstein-Uhlenbeck processes and we have the
transition kernel

$$Z^{-1} \exp[-\operatorname{tr}(M(t) - qM(0))^2/(1 - q^2)]$$ [23]

where $q = \exp(-t)$. This process has GUE as its
stationary distribution. The Harish-Chandra/Itzykson-Zuber
integral can be used to show that the joint
eigenvalue measure for $M(t_1),\dots,M(t_{M-1})$ has the
form [22] and hence has determinantal correlation
functions. The correlation kernel is the extended
Hermite kernel,


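$$K_n(\tau,x;\sigma,y) = \begin{cases} \sum_{k=1}^{\infty} e^{-k(\tau-\sigma)}\, p_{n-k}(x)\, p_{n-k}(y)\, e^{-(x^2+y^2)/2}, & \text{if } \tau \ge \sigma \\ -\sum_{k=-\infty}^{0} e^{-k(\tau-\sigma)}\, p_{n-k}(x)\, p_{n-k}(y)\, e^{-(x^2+y^2)/2}, & \text{if } \tau < \sigma \end{cases}$$ [24]

with $p_k$ the normalized Hermite polynomials, $p_k = 0$
if $k < 0$. Notice that this reduces to the Hermite
kernel [4] when $\tau = \sigma$. This machinery can be used
to show that the largest eigenvalue process
$t \to \lambda_{\max}^{(n)}(t)$ induced by M(t) converges in the sense
of finite-dimensional distributions to a limiting
process, the Airy process,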


$$\sqrt{2}\, n^{1/6} \left( \lambda_{\max}^{(n)}(n^{-1/3} t) - \sqrt{2n} \right) \to A(t)$$ [25]

as $n \to \infty$. The Airy process $A(t)$, which is a
stationary process, can be viewed as the top curve
of a multilayer process $t \to (A_{-j}(t))_{j \ge 0}$, $A(t) = A_0(t)$,
such that the point process $\{A_{-j}(t_p)\}_{1 \le p \le m,\, j \ge 0}$ has
determinantal correlation functions with correlation
kernel

$$A(\tau,\xi;\tau',\xi') = \begin{cases} \int_0^{\infty} e^{-\lambda(\tau-\tau')}\, \mathrm{Ai}(\xi+\lambda)\, \mathrm{Ai}(\xi'+\lambda)\, d\lambda, & \text{if } \tau \ge \tau' \\ -\int_{-\infty}^{0} e^{-\lambda(\tau-\tau')}\, \mathrm{Ai}(\xi+\lambda)\, \mathrm{Ai}(\xi'+\lambda)\, d\lambda, & \text{if } \tau < \tau' \end{cases}$$ [26]

the extended Airy kernel, which reduces to the
ordinary Airy kernel [19] when $\tau = \tau'$. The Airy
process can be viewed as an extension of the Tracy-Widom
distribution $F_2$. For the PNG model above,
the multilayer process is described by an extended
kernel, which in the cont-PNG is an extended
version of the discrete Bessel kernel [13]. In a
suitable scaling limit, this extended kernel converges
to the extended Airy kernel. For the PNG process
[9], this leads to the limit law


aN H (2a Ln 2N — 1) 


2/3 
- An — A(T) 27] 
as N — oo, where d=(1 — dvara + Ja)”. 
Notice the exponent 2/3 which is the second KPZ
exponent. This exponent can also be seen in the
transversal fluctuations of the maximal paths in [6]
for G(N,N). These are superdiffusive: they have
fluctuations of order $N^{2/3}$ around the diagonal,
compared to $N^{1/2}$ for random walk paths between
the same points. A fluctuation result like [27] can 
also be proved for the corner growth model and 
hence also for the Aztec diamond. The boundary of 
the NPR suitably rescaled converges to the Airy 
process. 


Variations 


Above we discussed one possible geometry, the 
droplet, for the PNG process. If we start with 
h(x,0) = 0 and allow random depositions along the 
whole line, we get an interface that is macroscopi- 
cally flat, and not curved as in the droplet case. In 
this case, the height fluctuations above a fixed
location at time t are again of size $t^{1/3}$ and described
by the Gaussian Orthogonal Ensemble largest
eigenvalue distribution. This law comes from the
scaling limit of the rightmost particle in [3] with
$\beta = 1$, $w(x) = \exp(-x^2)$, $\Lambda = \mathbb{R}$, and $\mu$ the Lebesgue
measure. In this case, the correlation functions are
not determinants but rather pfaffians. The result for
flat PNG follows from the Baik-Rains analysis of
symmetrized last-passage or permutation problems.
In the PNG model we can also consider an interface
in equilibrium. This can be put into the last-passage
percolation picture by suitable boundary conditions,
different parameters for $w(i,j)$ when i or j equals 1
or extra Poisson points on the axes in the Poisson
limit. Results by Baik and Rains show that in the
cont-PNG in equilibrium the height fluctuations are
given by a relative of the Tracy-Widom distribution,
$F_0$. In these last two cases, the scaling limit of the
whole height profile is not known.

The types of results discussed above can only be 
obtained for very special models. However, it is 
expected that many of the results (in particular the 
KPZ exponents 1/3 and 2/3, and also the fluctuation 
laws, including the Airy process) should generalize 


to many other models. The different interpretations 
of [6] mentioned above suggest different general- 
izations, various local growth models, directed 
polymers, asymmetric exclusion processes, and 
dimer/tiling problems. RMT laws are natural limit 
laws for which the domain of attraction is not 
understood. 


See also: Combinatorics: Overview; Determinantal 
Random Fields; Dimer Problems; Integrable Systems in 
Random Matrix Theory; Random Walks in Random 
Environments; Random Matrix Theory in Physics; 
Random Partitions. 
Introduction


The ideal fluid description is one in which viscosity or 
other phenomenological terms are neglected. Thus, as is 
the case for systems governed by Newton’s second law 
without dissipation, such fluid descriptions possess 
Lagrangian and Hamiltonian descriptions. In fact, in 
the eighteenth century, Lagrange himself discussed 
what is in essence the action principle for the 
incompressible fluid. The subsequent history of action 
functional and Hamiltonian formulations of the ideal 
fluid is long and convoluted with contributions from 
Clebsch in the nineteenth century, and the likes of L 
Landau and V Arnol’d in the mid-twentieth century. 
In the early 1980s, there was a flurry of activity on 
the noncanonical Poisson bracket formulation, and 
this formulation is the focus of the present treatment, 
which is motivated by the work of the author, D 
Holm, J Marsden, T Ratiu, A Weinstein, and others. 


Noncanonical Hamiltonian Structure 


The traditional arena for Hamiltonian dynamics is the
cotangent bundle $M := T^*Q$, the phase space, which is
naturally a symplectic manifold with a closed
nondegenerate 2-form. In coordinates, the 2-form is given
by $\omega_c = dq \wedge dp$, where q denotes the configuration
coordinate for the base space manifold Q and p
denotes the corresponding canonical momenta that
arise from Legendre (convex) transformation. The
2-form $\omega_c$ provides a natural identification at a point
$z = (q,p) \in M$ of $T_zM$ with $T_z^*M$, and because of
nondegeneracy its inverse, the cosymplectic form,
provides the map $J_c : T_z^*M \to T_zM$. Thus, for a
Hamiltonian $H : M \to \mathbb{R}$ we have the Hamiltonian
system of ordinary differential equations $\dot z = J_c\, dH$,
which in canonical coordinates has the familiar form

$$\dot q^i = \partial H/\partial p_i, \qquad \dot p_i = -\partial H/\partial q^i$$ [1]

with $i = 1,2,\dots,N$, where N is the number of
degrees of freedom.

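Hamilton’s equations can also be written in terms
of the Poisson bracket $[f,g] := \omega_c(J_c\, df, J_c\, dg)$, where
$f,g : M \to \mathbb{R}$ are smooth phase-space functions. In
terms of $z = (q,p)$, Hamilton’s equations are

$$\dot z^{\alpha} = J_c^{\alpha\beta} \frac{\partial H}{\partial z^{\beta}} = [z^{\alpha}, H]$$ [2]

where the Poisson bracket is

$$[f,g] = \frac{\partial f}{\partial z^{\alpha}}\, J_c^{\alpha\beta}\, \frac{\partial g}{\partial z^{\beta}}$$ [3]

with

$$(J_c^{\alpha\beta}) = \begin{pmatrix} 0_N & I_N \\ -I_N & 0_N \end{pmatrix}$$ [4]

Note, repeated indices are to be summed with
$\alpha,\beta = 1,2,\dots,2N$. In [4], $0_N$ is an $N \times N$ matrix
of zeros and $I_N$ is the $N \times N$ unit matrix.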
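
The bracket form of Hamilton's equations can be checked numerically. The sketch below is an illustration under stated assumptions: it takes one degree of freedom with the harmonic oscillator Hamiltonian $H = (q^2 + p^2)/2$ (a choice made here, not taken from the text), builds the canonical matrix of [4], and evaluates brackets with central finite differences. All function names are hypothetical.

```python
def canonical_matrix(n):
    """Return the 2n x 2n canonical cosymplectic matrix [[0, I], [-I, 0]]."""
    J = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        J[i][n + i] = 1.0
        J[n + i][i] = -1.0
    return J


def gradient(f, z, eps=1e-6):
    """Central finite-difference gradient of f at the point z."""
    grad = []
    for a in range(len(z)):
        plus, minus = list(z), list(z)
        plus[a] += eps
        minus[a] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad


def poisson_bracket(f, g, z, J):
    """Evaluate [f, g] = (df/dz^a) J^{ab} (dg/dz^b) at z."""
    df, dg = gradient(f, z), gradient(g, z)
    size = len(z)
    return sum(df[a] * J[a][b] * dg[b] for a in range(size) for b in range(size))


if __name__ == "__main__":
    J = canonical_matrix(1)
    H = lambda z: 0.5 * (z[0] ** 2 + z[1] ** 2)   # z = (q, p)
    q = lambda z: z[0]
    p = lambda z: z[1]
    z0 = [0.3, -1.2]
    print("[q, p] =", poisson_bracket(q, p, z0, J))
    print("dq/dt  =", poisson_bracket(q, H, z0, J))
    print("dp/dt  =", poisson_bracket(p, H, z0, J))
```

With this choice the bracket of q and p is 1, and the brackets with H reproduce the equations of motion: the rate of change of q equals p and the rate of change of p equals minus q.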


Noncanonical Poisson Brackets 


The canonical Poisson bracket description of [2]-[4] 
suggests a generalization, with antecedents to S Lie 
and others, that was termed noncanonical Hamilto- 
nian form in the fluid mechanics context by 
P Morrison and J Greene (1980): 


A system has noncanonical Hamiltonian form if it
can be written as $\dot z = [z,H]$, where the noncanonical
Poisson bracket [ , ] is a Lie product for a realization of
a Lie enveloping algebra on phase-space functions.


Recall a Lie enveloping algebra $\mathfrak{a}$ is a Lie algebra,
with the usual product [ , ] that is bilinear,
antisymmetric, and satisfies the Jacobi identity, which in
addition has a product $\mathfrak{a} \times \mathfrak{a} \to \mathfrak{a}$ that satisfies
the Leibniz identity $[fg,h] = f[g,h] + [f,h]g$ for all
$f,g,h \in \mathfrak{a}$.

The geometric description of noncanonical Hamiltonian form has
evolved into a structure called the Poisson manifold, a differential
manifold $Z$ endowed with the binary bracket operation $[\,,\,]$
defined on smooth functions, say, $f,g: Z \to \mathbb{R}$. Poisson
manifolds differ from symplectic manifolds because the nondegeneracy
condition is removed. In coordinates, $[\,,\,]$ is given by

$$[f,g] = \frac{\partial f}{\partial z^\alpha} J^{\alpha\beta}(z) \frac{\partial g}{\partial z^\beta}, \qquad \alpha,\beta = 1,2,\ldots,M$$  [5]

where $M = \dim Z$. Note that $J$ need not have the form of [4], may
depend upon the coordinate $z$, and may have vanishing determinant.
Bilinearity, antisymmetry, $[f,g] = -[g,f]$ for all $f,g$, and the
Jacobi identity, $[f,[g,h]] + [g,[h,f]] + [h,[f,g]] = 0$ for all
$f,g,h$, imply that the cosymplectic matrix satisfies
$J^{\alpha\beta} = -J^{\beta\alpha}$ and

$$J^{\alpha\delta} \frac{\partial J^{\beta\gamma}}{\partial z^\delta} + J^{\beta\delta} \frac{\partial J^{\gamma\alpha}}{\partial z^\delta} + J^{\gamma\delta} \frac{\partial J^{\alpha\beta}}{\partial z^\delta} = 0$$  [6]

respectively, for $\alpha,\beta,\gamma,\delta = 1,2,\ldots,M$.
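
Condition [6] is easy to test for a candidate matrix $J(z)$. The sketch below is an illustration only: it evaluates the left-hand side of [6] with central differences and returns its largest absolute component. The two test matrices are assumptions chosen for this example: `J_so3` has entries $-\epsilon_{\alpha\beta\gamma} z^\gamma$ and satisfies [6], while `J_bad` is antisymmetric but violates it.

```python
# Numerical check of the Jacobi condition [6] for a z-dependent antisymmetric matrix.
# Both example matrices below are illustrative choices, not taken from the article.


def jacobi_defect(J, z, h=1e-5):
    """Largest |S^{abc}|, where S^{abc} = J^{ad} dJ^{bc}/dz^d + cyclic permutations."""
    m = len(z)
    dJ = []  # dJ[d][a][b] approximates dJ^{ab}/dz^d by central differences
    for d in range(m):
        zp, zm = list(z), list(z)
        zp[d] += h
        zm[d] -= h
        Jp, Jm = J(zp), J(zm)
        dJ.append([[(Jp[a][b] - Jm[a][b]) / (2.0 * h) for b in range(m)]
                   for a in range(m)])
    J0 = J(z)
    worst = 0.0
    for a in range(m):
        for b in range(m):
            for c in range(m):
                s = sum(J0[a][d] * dJ[d][b][c]
                        + J0[b][d] * dJ[d][c][a]
                        + J0[c][d] * dJ[d][a][b] for d in range(m))
                worst = max(worst, abs(s))
    return worst


def J_so3(z):
    """J^{ab} = -eps_{abc} z^c: linear in z, satisfies [6]."""
    x, y, w = z
    return [[0.0, -w, y], [w, 0.0, -x], [-y, x, 0.0]]


def J_bad(z):
    """Antisymmetric, with J^{12} = 1 and J^{23} = z^2, but not a Poisson matrix."""
    return [[0.0, 1.0, 0.0], [-1.0, 0.0, z[1]], [0.0, -z[1], 0.0]]


if __name__ == "__main__":
    point = [0.2, 0.5, -0.1]
    print(jacobi_defect(J_so3, point))  # close to 0
    print(jacobi_defect(J_bad, point))  # close to 1, so [6] fails
```

Antisymmetry alone is therefore not sufficient: the second matrix defines a bilinear antisymmetric operation that fails the Jacobi identity.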

The local structure of $Z$ is elucidated by the Darboux-Lie theorem,
which states that in a neighborhood of a point $z \in Z$, for which
rank $J = 2N \le M$, there exist coordinates in which $J$ has the
following form:

$$(J) = \begin{pmatrix} 0_N & I_N & 0 \\ -I_N & 0_N & 0 \\ 0 & 0 & 0_{M-2N} \end{pmatrix}$$  [7]

From [7] it is clear that in the right coordinates, the system looks
like a canonical $N$-degree-of-freedom Hamiltonian system with some
extraneous coordinates, $M - 2N$ in fact. Through any point of the
$M$-dimensional phase space $Z$, there exists a local foliation by
symplectic leaves of dimension $2N$.

A consequence of the degeneracy is that there exists a special class
of invariants called Casimir invariants that is built into the phase
space. Since the rank of $J$ is $2N$, there exist possibly $M - 2N$
independent null eigenvectors. A consequence of the Darboux-Lie
theorem is that the independent null eigenvectors exist and,
moreover, the null space can in fact be spanned by the gradients of
the Casimir invariants, which satisfy
$J^{\alpha\beta}\,\partial C^{(a)}/\partial z^\beta = 0$, where
$a = 1,2,3,\ldots,M-2N$. That the Casimir invariants are constants of
motion follows from

$$\dot C^{(a)} = \frac{\partial C^{(a)}}{\partial z^\alpha} J^{\alpha\beta} \frac{\partial H}{\partial z^\beta} = 0$$  [8]

Thus, Casimir invariants are constants of motion for any Hamiltonian.
The symplectic leaves of dimension $2N$ are the intersections of the
$M - 2N$ surfaces defined by $C^{(a)}$ = constant. Dynamics generated
by any $H$ that begins on a particular symplectic leaf remains there.
The structure of Poisson manifolds has now been widely studied, but
we will not pursue this further here.

Let us turn to infinite-dimensional systems, field theories such as
those that govern ideal fluids, where
the governing equations are partial differential equations. Although
the level of rigor does not match that achieved for the finite
systems described above, formally one can parody most of the steps
and, consequently, the finite theory provides cogent imagery and
serves as a beacon for shedding light. In infinite dimensions, an
analog of [5] is given by

$$\{F,G\} = \int d\mu\, \frac{\delta F}{\delta \psi^i} \mathfrak{J}^{ij} \frac{\delta G}{\delta \psi^j} = \left\langle \frac{\delta F}{\delta \psi}, \mathfrak{J} \frac{\delta G}{\delta \psi} \right\rangle$$  [9]

where $F$ and $G$ are functionals of the functions $\psi^i(\mu,t)$,
which are functions of $\mu = (\mu_1,\ldots,\mu_n)$ independent
variables of some kind, $\delta F/\delta\psi^i$ denotes the
functional (variational) derivative, and $\langle\,,\,\rangle$ is a
pairing between a vector (function) space and its dual. The
$\psi^i$, $i = 1,\ldots,n$, are $n$ field components, and now
$\mathfrak{J}$ is a cosymplectic operator. To be noncanonically
Hamiltonian requires antisymmetry, $\{F,G\} = -\{G,F\}$, and the
Jacobi identity,
$\{F,\{G,H\}\} + \{G,\{H,F\}\} + \{H,\{F,G\}\} = 0$, for all
functionals $F$, $G$, and $H$. Antisymmetry requires $\mathfrak{J}$
to be skew-symmetric, that is,
$\langle f, \mathfrak{J} g\rangle = \langle \mathfrak{J}^\dagger f, g\rangle = -\langle g, \mathfrak{J} f\rangle$.
The Jacobi identity for infinite-dimensional systems has a condition
analogous to [6]; it can be shown that one need only consider
variations of $\mathfrak{J}$ when calculating, for example,
$\{F,\{G,H\}\}$.


Lie-Poisson Brackets


As noted in the Introduction, the usual variables of fluid mechanics
are not a set of canonical variables, and, consequently, the
Hamiltonian description in terms of these variables is noncanonical.
There is a special general form that the Poisson bracket takes for
equations that describe media in terms of Eulerian-like variables,
the so-called Lie-Poisson brackets, a special form of noncanonical
Poisson bracket. Lie-Poisson brackets describe essentially every
fundamental equation that describes classical media. In addition to
the equations for the ideal fluid, they describe Liouville’s equation
for the dynamics of the phase-space density of a collection of
particles, the various hierarchies of kinetic theory, the Vlasov
equation of plasma physics and various approximations thereof, and
magnetized and other more complicated fluids.

Both finite- and infinite-dimensional Lie-Poisson brackets are
intimately associated with a Lie group $\mathfrak{G}$. We use the
pairing between a vector space and its dual,
$\langle\,,\,\rangle$, where the second slot is reserved for elements
of the Lie algebra $\mathfrak{g}$ of $\mathfrak{G}$ and the first
slot for elements of its dual $\mathfrak{g}^*$. Thus,
$\langle\,,\,\rangle: \mathfrak{g}^* \times \mathfrak{g} \to \mathbb{R}$.
In terms of the pairing, noncanonical Lie-Poisson brackets have the
following compact form:

$$\{F,G\} = \langle \chi, [F_\chi, G_\chi] \rangle$$  [10]

where we suppose the dynamical variable $\chi \in \mathfrak{g}^*$,
$[\,,\,]$ is the Lie algebra product, which takes
$\mathfrak{g} \times \mathfrak{g} \to \mathfrak{g}$, and we have
introduced the shorthand $F_\chi := \delta F/\delta\chi$. The
quantities $F_\chi$ and $G_\chi$ are, of course, in $\mathfrak{g}$.
We refer to $\{\,,\,\}$ as the “outer” bracket of the realization
enveloping algebra and $[\,,\,]$ as the “inner” bracket of the Lie
algebra $\mathfrak{g}$. The binary operator $[\,,\,]^\dagger$ is
defined as follows:

$$\langle \chi, [f,g] \rangle = \langle [\chi,g]^\dagger, f \rangle$$  [11]

where evidently $\chi \in \mathfrak{g}^*$, $g,f \in \mathfrak{g}$,
and $[\,,\,]^\dagger: \mathfrak{g}^* \times \mathfrak{g} \to \mathfrak{g}^*$.
The operator $[\,,\,]^\dagger$, which defines the coadjoint orbit, is
necessary for obtaining the equations of motion from a Lie-Poisson
bracket.

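For finite-dimensional systems, the group $\mathfrak{G}$ must be a
finite-parameter Lie group, the variable $\chi$ corresponds to $w$,
and the cosymplectic form in coordinates is given by
$J^{\alpha\beta} = c^{\alpha\beta}_\gamma w^\gamma$, where the
$c^{\alpha\beta}_\gamma$ are the structure constants for the Lie
algebra $\mathfrak{g}$, which satisfy

$$c^{\alpha\beta}_\gamma = -c^{\beta\alpha}_\gamma, \qquad c^{\alpha\delta}_\epsilon c^{\beta\gamma}_\delta + c^{\beta\delta}_\epsilon c^{\gamma\alpha}_\delta + c^{\gamma\delta}_\epsilon c^{\alpha\beta}_\delta = 0$$  [12]

relations that imply [10] satisfies the antisymmetry condition and
the Jacobi identity.

For infinite-dimensional systems, the group $\mathfrak{G}$ must be an
infinite-parameter Lie group and the cosymplectic operator has the
form $\mathfrak{J}^{ij} = \mathcal{C}^{ij}_k \chi^k$, where
$\mathcal{C}^{ij}_k$ are structure operators. The meaning of these
structure operators will be clarified when we consider brackets for
fluid mechanics.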
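
A standard finite-dimensional illustration of a Lie-Poisson bracket, added here as a hedged sketch rather than as material from the article, is the free rigid body, with $w$ the body angular momentum and structure constants $c^{\alpha\beta}_\gamma = -\epsilon_{\alpha\beta\gamma}$. The code checks the two relations of [12], forms $J^{\alpha\beta} = c^{\alpha\beta}_\gamma w^\gamma$, and integrates $\dot w^\alpha = J^{\alpha\beta}\,\partial H/\partial w^\beta$ with a fourth-order Runge-Kutta step. The Hamiltonian, the moments of inertia, the sign convention, and the integrator are all assumptions of this example. The Casimir $C = |w|^2/2$ has gradient $w$, which is a null vector of $J$, so it is conserved for any Hamiltonian.

```python
# Rigid-body sketch of a finite-dimensional Lie-Poisson system.
# Structure constants, Hamiltonian, and integrator are illustrative assumptions.

R3 = range(3)


def eps(a, b, c):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (a - b) * (b - c) * (c - a) // 2


# C[a][b][g] holds the structure constant c^{ab}_g = -eps_{abg}
C = [[[-eps(a, b, g) for g in R3] for b in R3] for a in R3]


def structure_constants_ok():
    """Check antisymmetry and the quadratic identity of [12]."""
    antisym = all(C[a][b][g] == -C[b][a][g] for a in R3 for b in R3 for g in R3)
    jacobi = all(
        sum(C[a][d][e] * C[b][g][d] + C[b][d][e] * C[g][a][d]
            + C[g][d][e] * C[a][b][d] for d in R3) == 0
        for a in R3 for b in R3 for g in R3 for e in R3)
    return antisym and jacobi


def J(w):
    """Cosymplectic matrix J^{ab} = c^{ab}_g w^g."""
    return [[sum(C[a][b][g] * w[g] for g in R3) for b in R3] for a in R3]


def energy(w, inertia):
    """Illustrative Hamiltonian H = sum_i w_i^2 / (2 I_i)."""
    return 0.5 * sum(w[i] ** 2 / inertia[i] for i in R3)


def casimir(w):
    """Casimir invariant C = |w|^2 / 2."""
    return 0.5 * sum(x * x for x in w)


def rhs(w, inertia):
    """wdot^a = J^{ab} dH/dw^b."""
    dH = [w[b] / inertia[b] for b in R3]
    Jw = J(w)
    return [sum(Jw[a][b] * dH[b] for b in R3) for a in R3]


def rk4_step(w, dt, inertia):
    """One classical Runge-Kutta step (not a structure-preserving integrator)."""
    k1 = rhs(w, inertia)
    k2 = rhs([w[i] + 0.5 * dt * k1[i] for i in R3], inertia)
    k3 = rhs([w[i] + 0.5 * dt * k2[i] for i in R3], inertia)
    k4 = rhs([w[i] + dt * k3[i] for i in R3], inertia)
    return [w[i] + dt * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6.0 for i in R3]


if __name__ == "__main__":
    w, inertia = [1.0, 0.5, -0.3], [1.0, 2.0, 3.0]
    c0, h0 = casimir(w), energy(w, inertia)
    for _ in range(1000):
        w = rk4_step(w, 0.01, inertia)
    print(structure_constants_ok(), casimir(w) - c0, energy(w, inertia) - h0)
```

The trajectory stays on the sphere $|w|$ = constant, which is the two-dimensional symplectic leaf of this three-dimensional Poisson manifold, up to the small drift of the non-symplectic integrator.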


The Fluid State 


Fluid mechanics has a long history, and thus it 
comes as no surprise that the fluid state has been 
described in many ways. Because the Hamiltonian 
structure depends on the state variables, some of 
these ways are described below, beginning with the
Lagrangian variable description.


Lagrangian Variables 


The description of a fluid that is most like that of 
particle mechanics occurs in terms of variables usually 
referred to as Lagrangian variables. This description 
dates to the eighteenth century. The idea behind the use 
of these variables is a simple one: if a fluid is described 
as a continuum collection of fluid particles, also called 
fluid parcels or elements, then its motion is governed by 
an equation that is an infinite-dimensional version of 
Newton’s second law and, consequently, as we will see, 
both the Hamiltonian and the Lagrangian descriptions 
are infinite degree-of-freedom generalizations of those 
of ordinary particle mechanics. 

The position of a fluid element, referred to a fixed rectangular
coordinate system, is given by $q = q(a,t)$, where
$q = (q^1,q^2,q^3)$ and $a = (a^1,a^2,a^3)$ is a continuum label that
replaces the index $i$ of [1]. In practice, the label can be any
quantity that identifies a fluid particle, but it is often taken to
be the position of the fluid particle at time $t = 0$ in rectangular
coordinates. The quantities $q^i(a,t)$ are coordinates for the
configuration space $Q$, which is in fact a function space because in
addition to the three indices “$i$” there is the continuum label $a$.
We assume that $a$ varies over a fixed domain,
$\Omega \subset \mathbb{R}^3$, which is completely filled with fluid,
and that the function $q: \Omega \to \Omega$ is one-to-one and onto.
We will assume that as many derivatives of $q$ with respect to $a$ as
needed exist, but we will not say more about $Q$; in fact, not much
is known about the solution function space for the 3D fluid equations
in Lagrangian variables. Often in the Hamiltonian context the
functions $q = q(a,t)$ are assumed to be diffeomorphisms and their
collection is referred to as the diffeomorphism group.

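In the sequel several manipulations are needed and so we record here
some identities for later use. Viewing the map $a \mapsto q$ at fixed
$t$ as a coordinate change, the Jacobian matrix
$\partial q^i/\partial a^j =: q^i_{,j}$ has an inverse given by

$$\frac{\partial q^k}{\partial a^j} \frac{A^i_k}{\mathcal{J}} = \delta^i_j$$  [13]

where $A^i_k$ is the cofactor of $q^k_{,i}$ and $\mathcal{J}$ is its
determinant. A convenient expression for $A^i_k$ is given by

$$A^i_k = \frac{1}{2} \epsilon_{kjl} \epsilon^{imn} \frac{\partial q^j}{\partial a^m} \frac{\partial q^l}{\partial a^n}$$  [14]

where $\epsilon_{ijk}$ ($= \epsilon^{ijk}$) is the skew-symmetric
tensor (density). Evidently,
$\partial\mathcal{J}/\partial q^i_{,k} = A^k_i$ follows from [13].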
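
The identities [13] and [14] can be verified for any invertible 3 x 3 matrix. The sketch below is illustrative only: it stores $\partial q^j/\partial a^m$ as `Q[j][m]`, builds the cofactors from [14], checks the inverse relation [13], and compares the derivative of the determinant with the cofactor by central differences. The sample matrix and all function names are assumptions.

```python
# Cofactor identities [13]-[14] for a sample Jacobian matrix Q[j][m] = dq^j/da^m.
# The matrix used in the demo is an arbitrary illustrative choice.

R3 = range(3)


def eps(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def det3(Q):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (Q[0][0] * (Q[1][1] * Q[2][2] - Q[1][2] * Q[2][1])
            - Q[0][1] * (Q[1][0] * Q[2][2] - Q[1][2] * Q[2][0])
            + Q[0][2] * (Q[1][0] * Q[2][1] - Q[1][1] * Q[2][0]))


def cofactor(Q):
    """A[i][k] = A^i_k = (1/2) eps_{kjl} eps^{imn} Q[j][m] Q[l][n], as in [14]."""
    return [[0.5 * sum(eps(k, j, l) * eps(i, m, n) * Q[j][m] * Q[l][n]
                       for j in R3 for l in R3 for m in R3 for n in R3)
             for k in R3] for i in R3]


def inverse_check(Q):
    """Matrix with entries sum_k Q[k][j] A[i][k] / det, the identity according to [13]."""
    A, d = cofactor(Q), det3(Q)
    return [[sum(Q[k][j] * A[i][k] for k in R3) / d for j in R3] for i in R3]


def det_derivative(Q, i, k, h=1e-6):
    """Central-difference derivative of det(Q) with respect to the entry Q[i][k]."""
    Qp = [row[:] for row in Q]
    Qm = [row[:] for row in Q]
    Qp[i][k] += h
    Qm[i][k] -= h
    return (det3(Qp) - det3(Qm)) / (2.0 * h)


if __name__ == "__main__":
    Q = [[1.0, 0.2, 0.0], [0.1, 1.5, 0.3], [0.0, -0.4, 2.0]]
    print(inverse_check(Q))                          # close to the identity matrix
    print(det_derivative(Q, 0, 1), cofactor(Q)[1][0])  # the two values agree
```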


Eulerian Variables 


In the Lagrangian variable description, one picks out a particular
particle, labeled by $a$, and keeps track in time $t$ of where it
goes. However, in the Eulerian variable description, one stays at a
spatial observation point $r = (x^1,x^2,x^3) \in \Omega$ and monitors
the nature of the fluid at $r$ at time $t$.

The most important Eulerian variable is the Eulerian velocity field
$v(r,t)$. This quantity is the velocity of the particular fluid
element that is located at the spatial point $r$ at time $t$. The
label of that particular fluid element is given by
$a = q^{-1}(r,t)$, and so

$$v(r,t) = \dot q(a,t)\big|_{a=q^{-1}(r,t)} = \dot q \circ q^{-1}(r,t)$$  [15]

where the overdot denotes differentiation with respect to time at
fixed label $a$. Attached to a fluid element is a certain amount of
mass described by a density function $\rho_0(a)$. As the fluid moves
so that $a \mapsto q$, the volume of an infinitesimal region will
change, but its mass must remain fixed. The statement of local mass
conservation is $\rho\,d^3q = \rho_0\,d^3a$, where $d^3a$ is an
initial infinitesimal volume element that maps to $d^3q$ at time $t$,
and $d^3q = \mathcal{J}\,d^3a$. (When integrating over $\Omega$ we
will replace $d^3q$ by $d^3r$.) Thus, we obtain

$$\rho(r,t) = \frac{\rho_0(a)}{\mathcal{J}(a,t)}\bigg|_{a=q^{-1}(r,t)} = \frac{\rho_0}{\mathcal{J}} \circ q^{-1}(r,t)$$  [16]

where recall the Jacobian $\mathcal{J} = \det(q^i_{,j})$. Besides the
density, for the ideal fluid, one attaches an entropy per unit mass,
$s = s_0(a)$, to a fluid element, and this quantity remains fixed in
time. In the Eulerian description this gives rise to the entropy
field

$$s(r,t) = s_0(a)\big|_{a=q^{-1}(r,t)} = s_0 \circ q^{-1}(r,t)$$  [17]


One could attach other scalar, vector, etc., quan- 
tities to the fluid element, but we will not pursue 
this. In the usual ideal fluid closure only the above 
variables are considered. 
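
A one-dimensional sketch may make the Euler-Lagrange map of [15] and [16] concrete. The flow $q(a,t) = a + \epsilon t \sin(2\pi a)/(2\pi)$ on the unit interval, the initial density, and the bisection inversion below are all assumptions chosen for illustration; they are not taken from the article. The code evaluates the Eulerian velocity and density at a point $r$ by first solving $r = q(a,t)$ for the label, and then checks that the total mass is unchanged.

```python
# One-dimensional illustration of the Euler-Lagrange map [15]-[16].
# The flow q, the initial density rho0, and EPS are hypothetical choices.
import math

EPS = 0.5  # flow amplitude; the map stays invertible while |EPS * t| < 1


def q(a, t):
    """Position at time t of the fluid element with label a in [0, 1]."""
    return a + EPS * t * math.sin(2.0 * math.pi * a) / (2.0 * math.pi)


def q_dot(a, t):
    """Time derivative of q at fixed label a."""
    return EPS * math.sin(2.0 * math.pi * a) / (2.0 * math.pi)


def jacobian(a, t):
    """dq/da, the one-dimensional Jacobian determinant."""
    return 1.0 + EPS * t * math.cos(2.0 * math.pi * a)


def rho0(a):
    """Initial mass density attached to the label a."""
    return 1.0 + 0.2 * math.cos(2.0 * math.pi * a)


def label(r, t, iters=60):
    """Invert r = q(a, t) by bisection; q is increasing in a for |EPS * t| < 1."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if q(mid, t) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def eulerian_fields(r, t):
    """Return (v, rho) at the observation point r, following [15] and [16]."""
    a = label(r, t)
    return q_dot(a, t), rho0(a) / jacobian(a, t)


def total_mass(t, n=2000):
    """Midpoint-rule integral of rho(r, t) over [0, 1]."""
    dr = 1.0 / n
    return sum(eulerian_fields((k + 0.5) * dr, t)[1] for k in range(n)) * dr


if __name__ == "__main__":
    print(eulerian_fields(0.25, 1.0))
    print(total_mass(0.0), total_mass(1.0))  # both close to 1
```

The density changes pointwise as fluid elements are compressed or stretched, while the integral of $\rho$ over the domain equals the integral of $\rho_0$ over the labels.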

Equations [15]-[17] express the Euler-Lagrange map. There is a
natural representation of this map in terms of the Eulerian density
variables, $M := \rho v$, $\rho$, and $\sigma := \rho s$, the
momentum, mass, and entropy densities, respectively, which, as will
be seen, are variables in which the noncanonical Poisson bracket has
Lie-Poisson form.


Other Variables 


Fluid mechanics is rife with variables that have been used for its
description. For example, Euler, Monge, Clebsch, and others
introduced potential representations, of varying generality, for the
Eulerian velocity field, an example being

$$v(r,t) = \alpha \nabla\beta + \nabla\varphi$$  [18]

where the three components of $v$ are replaced by the functions
$\alpha$, $\beta$, and $\varphi$, all of which depend on $(r,t)$.

Often reduced variables that are tailored to specific ideal flows
with less generality than those described by $\rho$, $s$, and $v$ are
considered. Examples include incompressible flow with
$\nabla \cdot v = 0$, vortex dynamics, including contour dynamics and
point vortex dynamics, flow governed by the shallow-water equations,
quasigeostrophy, etc. The Hamiltonian structure in terms of these
reduced variables derives from that of the parent model in terms of
Lagrangian variables. Specific variables may embody constraints, and
understanding these constraints, although tractable, can be a cause
of confusion. Pursuing this further is beyond the scope here.


Hamilton’s Principle for Fluid 


Lagrange, in his famous work of 1788, Mécanique 
Analytique, produced in essence a variational 
principle for incompressible fluid flow in terms of 
Lagrangian variables. The generalization to com- 
pressible flow awaited the discovery of thermody- 
namics, and that is what we describe here. In 
traditional mechanics nomenclature, this variational 
principle is an infinite-dimensional generalization of 
what is known variously as the action principle, the 
principle of least action, or Hamilton’s principle, 
whereby one constructs, on physical grounds, a 
Lagrangian function on TQ used in the action 
principle, where Q is the function space of the 
q(a, t). 

Construction of the Lagrangian requires identification of the
potential energy, and this requires thermodynamics, because potential
energy is stored in terms of pressure and temperature. A basic
assumption of the fluid approximation is that of local thermodynamic
equilibrium. In the energy representation of thermodynamics, the
extensive energy is treated as a function of the entropy and the
volume. For a fluid, it is convenient to consider the energy per unit
mass, denoted by $U$, to be a function of the entropy per unit mass,
$s$, and the mass density, $\rho$, a measure of the volume. The
intensive quantities, pressure and temperature, are given by
$T = \partial U/\partial s$ and
$p = \rho^2\,\partial U/\partial\rho$. Choices for $U$ produce
equations of state. For barotropic or isentropic flow, $U$ depends
only on $\rho$. For an ideal monatomic gas
$U(\rho,s) = c\rho^{\gamma-1}\exp(\alpha s)$, where $c$, $\gamma$,
and $\alpha$ are constants. The function $U$ could also depend on
additional scalar quantities, such as a quantity known as spice that
has been considered in oceanography.
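
As a small worked check of the relations $T = \partial U/\partial s$ and $p = \rho^2\,\partial U/\partial\rho$, the sketch below differentiates the ideal-gas energy $U = c\rho^{\gamma-1}\exp(\alpha s)$ numerically and compares with the closed forms $p = (\gamma - 1)\rho U$ and $T = \alpha U$ that follow by direct differentiation. The numerical values of the constants (including $\gamma = 5/3$) and the function names are assumptions made for illustration.

```python
# Pressure and temperature from the internal energy U(rho, s).
# The constants below are illustrative values, not taken from the article.
import math

C_CONST = 1.0        # the constant c
GAMMA = 5.0 / 3.0    # assumed value of gamma
ALPHA = 1.0          # the constant alpha


def U(rho, s):
    """Energy per unit mass, U = c rho^(gamma - 1) exp(alpha s)."""
    return C_CONST * rho ** (GAMMA - 1.0) * math.exp(ALPHA * s)


def pressure(rho, s, h=1e-6):
    """p = rho^2 dU/drho, with the derivative taken by central differences."""
    return rho ** 2 * (U(rho + h, s) - U(rho - h, s)) / (2.0 * h)


def temperature(rho, s, h=1e-6):
    """T = dU/ds, with the derivative taken by central differences."""
    return (U(rho, s + h) - U(rho, s - h)) / (2.0 * h)


if __name__ == "__main__":
    rho, s = 1.3, 0.4
    print(pressure(rho, s), (GAMMA - 1.0) * rho * U(rho, s))  # the two agree
    print(temperature(rho, s), ALPHA * U(rho, s))             # the two agree
```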

Conventional thermodynamic variables can be viewed as Eulerian
variables with a static velocity field. Thus, we write $U(\rho,s)$,
where $\rho$ and $s$ are spatially independent or, if the system has
only locally relaxed, these variables can be functions of $r$. For
the ideal fluid, each fluid element can be viewed as a self-contained
isentropic thermodynamic system that moves with the fluid. Thus, the
total fluid potential energy functional is given by
$V[q] = \int_\Omega d^3a\,\rho_0 U(s_0, \rho_0/\mathcal{J})$, which
is a functional of $q$ that depends only upon $\mathcal{J}$ and hence
only upon $\partial q/\partial a$.

The next item required for constructing Hamilton’s principle is the
kinetic energy functional, which is given by
$T[q,\dot q] = \int_\Omega d^3a\,\rho_0 \dot q^2/2$, where
$\dot q^2 := \eta_{ij}\dot q^i \dot q^j$, with the Cartesian metric
$\eta_{ij} := \delta_{ij}$. This metric and its inverse can be used
to raise and lower indices.

The Lagrangian functional is $L[q,\dot q] := T - V$, where
$L[q,\dot q] = \int_\Omega d^3a\,\mathcal{L}(q,\dot q,\partial q/\partial a)$
and $\mathcal{L}$ is the Lagrangian density, in terms of which the
action functional of Hamilton’s principle is given by

$$S[q] = \int_{t_0}^{t_1} dt\, L[q,\dot q] = \int_{t_0}^{t_1} dt \int_\Omega d^3a \left[ \frac{1}{2}\rho_0 \dot q^2 - \rho_0 U \right]$$  [19]


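The end conditions for Hamilton’s principle for the fluid are the
same as those of mechanics, that is,
$\delta q(a,t_0) = \delta q(a,t_1) = 0$. The nonpenetration
condition, $\delta q \cdot \hat n = 0$ on $\partial\Omega$, where
$\hat n$ is a unit normal vector, is also assumed. Other boundary
conditions, such as periodic and free boundary conditions, are also
possibilities. Hamilton’s principle amounts to
$\delta S/\delta q(a,t) = 0$, which, with the end and boundary
conditions, implies the following equations of motion:

$$\rho_0 \ddot q_i + A^j_i \frac{\partial}{\partial a^j} \left( \frac{\rho_0^2}{\mathcal{J}^2} \frac{\partial U}{\partial \rho} \right) = 0$$  [20]

Here we have used $\partial A^j_i/\partial a^j = 0$, which can be
seen using [14]. Equation [20] amounts to Newton’s second law for the
ideal fluid, which is made clearer by using the following useful
identity:

$$\frac{\partial}{\partial q^k} = \frac{1}{\mathcal{J}} A^i_k \frac{\partial}{\partial a^i}$$  [21]

Alternatively, upon using [13], [20] is sometimes written in the form

$$\rho_0 \ddot q_j \frac{\partial q^j}{\partial a^i} + \mathcal{J} \frac{\partial}{\partial a^i} \left( \frac{\rho_0^2}{\mathcal{J}^2} \frac{\partial U}{\partial \rho} \right) = 0$$  [22]

The Eulerian variable force law follows from [20] upon using [21]:

$$\rho \left( \frac{\partial v}{\partial t} + v \cdot \nabla v \right) = -\nabla p$$  [23]

where $v = v(r,t)$. The remaining Eulerian equations of mass
conservation and entropy advection follow from the constraints that
$s_0$ and $\rho_0$ are constant on fluid elements. Time
differentiation and the transformations of [16] and [17] yield

$$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho v) = 0$$  [24]

$$\frac{\partial s}{\partial t} + v \cdot \nabla s = 0$$  [25]

Equations [23]-[25] together with a given function $U(\rho,s)$ and
the relation $p = \rho^2\,\partial U/\partial\rho$ constitute the
Eulerian description.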

Variational principles similar to that described 
above exist for essentially all ideal fluid models, 
including incompressible flow, magnetohydrody- 
namics, the two-fluid equations of plasma physics, etc. 


Eulerian Action Principles


Some early researchers sought variational principles 
that directly produce the ideal fluid equations in 
Eulerian form. Because the Eulerian form of the 
equations does not treat the fluid as a collection of 
particles, the resulting action principles possess a 
certain awkwardness. Below, we describe three 
approaches to such action principles. 


Clebsch action The action principle for electromagnetism proceeds by
introducing the 4-vector potential. In a similar way, the Clebsch
action principle anticipates this idea by using a potential
representation of the velocity field, an example being that of [18].

Although compressible flow with an arbitrary equation of state can be
treated in full generality, for simplicity and variety we will
restrict to incompressible flow and set $\nabla \cdot v = 0$. This
constraint is enforced by requiring $\varphi$ to be dependent on
$\alpha$ and $\beta$ according to
$\varphi[\alpha,\beta] := -\Delta^{-1} \nabla \cdot (\alpha\nabla\beta)$,
where $\Delta^{-1}$ is the inverse Laplacian. The Clebsch action is
then written as follows:

$$S_C[\alpha,\beta] := \int_{t_0}^{t_1} dt \int_\Omega d^3r \left[ \beta\alpha_t - \frac{v^2}{2} \right]$$  [26]

where the subscript $t$ denotes differentiation at fixed $r$, we have
set $\rho = 1$, and $v$ is a shorthand for the expression of [18]
with $\varphi = \varphi[\alpha,\beta]$. The form of $S_C$ is that of
the phase-space action that produces Hamilton’s equations upon
independent variation of the configuration space coordinate and its
conjugate momentum, which are here $\alpha$ and $\beta$,
respectively. Thus, we require
$\delta\alpha(r,t_0) = \delta\alpha(r,t_1) = 0$, but no condition is
needed for $\delta\beta$ at $t_{0,1}$. We also require
$\hat n \cdot v = 0$ on $\partial\Omega$. The variations
$\delta S_C/\delta\beta = 0$ and $\delta S_C/\delta\alpha = 0$ imply

$$\alpha_t = \frac{\delta H}{\delta \beta} = -v \cdot \nabla\alpha, \qquad \beta_t = -\frac{\delta H}{\delta \alpha} = -v \cdot \nabla\beta$$  [27]

an infinite-dimensional version of [1] with
$H = \int_\Omega d^3r\, v^2/2$. Evidently, both $\alpha$ and $\beta$
are advected by the flow.

Because the vorticity is
$\zeta := \nabla \times v = \nabla\alpha \times \nabla\beta$,
knowledge of $\alpha$ and $\beta$ determines $\zeta$ and one can
invert the curl operator to obtain $v$ in the usual way. The
intersections of level sets of $\alpha$ and $\beta$ define vortex
lines, and, evidently, these quantities, like the entropy for
compressible dynamics, are constant on fluid elements. It is not
difficult to show that the advection of $\alpha$ and $\beta$ implies
the correct dynamical equation for incompressible $v$.
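
The identity $\nabla \times (\alpha\nabla\beta + \nabla\varphi) = \nabla\alpha \times \nabla\beta$ used above can be confirmed numerically at a point. In the following sketch the three scalar fields are arbitrary smooth functions chosen for illustration; in particular `phi` is not the constrained potential $\varphi[\alpha,\beta]$, since any gradient term drops out of the curl. All derivatives are central differences.

```python
# Check that curl(alpha grad(beta) + grad(phi)) equals grad(alpha) x grad(beta).
# alpha, beta, and phi are hypothetical sample fields.
import math


def grad(f, r, h=1e-4):
    """Central-difference gradient of the scalar field f at the point r."""
    out = []
    for i in range(3):
        rp, rm = list(r), list(r)
        rp[i] += h
        rm[i] -= h
        out.append((f(rp) - f(rm)) / (2.0 * h))
    return out


def cross(u, w):
    """Cross product of two 3-vectors."""
    return [u[1] * w[2] - u[2] * w[1],
            u[2] * w[0] - u[0] * w[2],
            u[0] * w[1] - u[1] * w[0]]


def alpha(r):
    return r[0] * r[1]


def beta(r):
    return r[2] + r[0] ** 2


def phi(r):
    return math.sin(r[0]) * r[1] * r[2]


def velocity(r):
    """Clebsch form [18]: v = alpha grad(beta) + grad(phi)."""
    a, gb, gp = alpha(r), grad(beta, r), grad(phi, r)
    return [a * gb[i] + gp[i] for i in range(3)]


def curl(u, r, h=1e-4):
    """Central-difference curl of the vector field u at the point r."""
    d = []  # d[i][j] approximates d u_j / d x_i
    for i in range(3):
        rp, rm = list(r), list(r)
        rp[i] += h
        rm[i] -= h
        up, um = u(rp), u(rm)
        d.append([(up[j] - um[j]) / (2.0 * h) for j in range(3)])
    return [d[1][2] - d[2][1], d[2][0] - d[0][2], d[0][1] - d[1][0]]


if __name__ == "__main__":
    point = [0.4, -0.3, 0.7]
    print(curl(velocity, point))
    print(cross(grad(alpha, point), grad(beta, point)))  # same vector
```

For these sample fields both expressions evaluate to $(x, -y, -2x^2)$ at the point $(x,y,z)$.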


Herivel-Lin action The Herivel-Lin action incorporates [24] and [25]
as constraints with Lagrange multipliers, $\phi$ and $\rho\beta$.
(Here $\beta$ is not the Clebsch $\beta$ and the factor of $\rho$ is
included for convenience.) It was discovered early on that these
constraints were not enough to achieve complete generality and so a
new one, known as the Lin constraint, was added. The Lin constraint
corresponds to constancy of the fluid particle label. One defines an
Eulerian label field by setting $q(a,t) = r$ and solving for the
label $a = q^{-1}(r,t) =: a(r,t)$. Conservation of particle identity
is thus given by $a_t + v \cdot \nabla a = 0$, and this constraint is
associated with a Lagrange multiplier
$\gamma = (\gamma_1,\gamma_2,\gamma_3)$. The Herivel-Lin action is
thus given by

$$S_{HL}[v,\rho,s,a,\phi,\beta,\gamma] = \int_{t_0}^{t_1} dt \int_\Omega d^3r \left[ \frac{1}{2}\rho v^2 - \rho U(\rho,s) + \phi\,[\rho_t + \nabla \cdot (\rho v)] - \rho\beta\,[s_t + v \cdot \nabla s] - \rho\gamma \cdot [a_t + v \cdot \nabla a] \right]$$  [28]

Variation of [28] with respect to the Lagrange multipliers just
reproduces the constraints; however, variation with respect to $v$,
$\rho$, $s$, and $a$ produces equations that imply [23]. Moreover,
every flow can be shown to be an extremal of $S_{HL}$.


Euler-Poincaré-Hamel action Another approach is to use directly
constrained variations. The essential idea is to only consider
Eulerian variable variations that are induced by underlying
Lagrangian variable variations $\delta q$, the so-called dynamically
accessible variations. Explicitly, a basic Eulerian variation
$\eta = (\eta_1,\eta_2,\eta_3)$ is given by
$\eta(r,t) = \delta q(a,t)|_{a=q^{-1}(r,t)}$. In terms of this
quantity, the dynamically accessible variations of the Eulerian
velocity field, density, and entropy are given, respectively, by
$\delta v = \eta_t + v \cdot \nabla\eta - \eta \cdot \nabla v$,
$\delta\rho = -\nabla \cdot (\rho\eta)$, and
$\delta s = -\eta \cdot \nabla s$. Inserting them into the variation
of

$$S_{EPH} = \int_{t_0}^{t_1} dt \int_\Omega d^3r \left[ \frac{1}{2}\rho v^2 - \rho U(\rho,s) \right]$$  [29]

and integrating by parts gives

$$\delta S_{EPH} = \int_{t_0}^{t_1} dt \int_\Omega d^3r\, \eta \cdot [\ldots]$$

where $[\ldots]$ is equivalent to [23]. Thus, assuming $\eta$ is
arbitrary, we obtain directly the equation of motion.

There is a version of this kind of constrained 
variational principle for all ideal fluid and plasma 
equations. Also, it possesses a geometric interpreta- 
tion. In a more practical vein, constrained variations 
can be used to derive reduced models, and dynami- 
cally accessible variations can also be used for 
stability calculations. Exploring these ideas is out- 
side the present scope. 


Fluid Hamiltonian Description


Having described variational principles, we turn 
to the associated canonical and noncanonical 
Hamiltonian descriptions. 


Canonical Description 


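Because the action of [19] is of standard form, it is convex in
$\dot q$ and the Legendre transform follows easily: the canonical
momentum density is
$\pi_i(a,t) := \delta L/\delta\dot q^i(a) = \rho_0 \dot q_i$ and
$H[q,\pi] = \int_\Omega d^3a\,[\pi \cdot \dot q - \mathcal{L}] = \int_\Omega d^3a\,[\pi^2/(2\rho_0) + \rho_0 U]$.
Hamilton’s equations are then

$$\dot q^i = \frac{\delta H}{\delta \pi_i} = \{q^i, H\}, \qquad \dot\pi_i = -\frac{\delta H}{\delta q^i} = \{\pi_i, H\}$$  [30]

an infinite-dimensional version of [1], with the canonical Poisson
bracket

$$\{F,G\} = \int_\Omega d^3a \left[ \frac{\delta F}{\delta q^i} \frac{\delta G}{\delta \pi_i} - \frac{\delta G}{\delta q^i} \frac{\delta F}{\delta \pi_i} \right]$$  [31]

(Note, $\delta q^i(a)/\delta q^j(a') = \delta^i_j\,\delta(a - a')$,
a relation analogous to $\partial q^i/\partial q^j = \delta^i_j$ for
finite systems.)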


Reduction to Noncanonical Poisson Brackets 


Reduction is a procedure for reducing the size of a 
Hamiltonian system. Given constants of motion in 
involution, that is, with pairwise vanishing Poisson 
brackets, the dimension of a Hamiltonian system 
can be reduced by 2 for each such constant of 
motion. However, when constants do not commute, 
the situation is more complicated and one must 
invoke a theory due to Lie, Poincaré, Cartan, and 
others. Associated with invariants are symmetries, 
and so a complete discussion of this theory requires 
examination of symmetry groups and associated 
geometry. For the ideal fluid, the map from the 
Lagrangian to the Eulerian descriptions is an 
example of reduction, whereby the Poisson bracket 
of [31] is mapped into a noncanonical Poisson 
bracket. En route to describing this example, a brief
discussion of reduction of finite systems is given
first.


Reduction of Finite-Dimensional Systems 


Consider a canonical system with the phase space $M$, a
$2N$-dimensional symplectic manifold. In a coordinate patch with
coordinates $z = (q,p)$ the system has the canonical description of
[2]-[4]. Suppose we have a map $P: M \to \mathfrak{m}^*$, where
$\mathfrak{m}^*$ is some $M < 2N$-dimensional space described by
coordinates $w = (w_1,w_2,\ldots,w_M)$. In coordinates, this map is
represented in terms of functions $w_\alpha = w_\alpha(z)$, with
$\alpha = 1,2,\ldots,M$, which, because $M < 2N$, is always
noninvertible. Suppose $f,g: M \to \mathbb{R}$ obtain their
$z$-dependence through the functions $w$, that is,
$f(z) = \bar f(w(z)) = \bar f \circ w$. Making use of the chain rule
yields

$$[f,g] = \frac{\partial \bar f}{\partial w_\alpha} J_{\alpha\beta} \frac{\partial \bar g}{\partial w_\beta}$$  [32]

where the quantity

$$J_{\alpha\beta} := \frac{\partial w_\alpha}{\partial z^\mu} J_c^{\mu\nu} \frac{\partial w_\beta}{\partial z^\nu}$$  [33]

is in general a function of $z$. However, it is possible that
$J_{\alpha\beta}$ may only depend on $w$. When this happens, we have
a reduction of the phase space $M$.

If the original dynamics of interest has the Hamiltonian vector field
generated by $H(z)$, and if it is possible that $H(z)$ can be
expressed solely in terms of the $w$’s, that is,
$H(z) = \bar H(w)$, then the system has been reduced. Clearly, this
is a statement of symmetry, since the function $H(z)$ in reality
depends on fewer variables, the $w$’s.
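
A familiar instance of [32]-[33], given here as a hedged illustration rather than as an example from the article, is the map from a single particle with $z = (q,p) \in \mathbb{R}^6$ to its angular momentum $w = q \times p$. Evaluating [33] gives $J_{\alpha\beta} = \epsilon_{\alpha\beta\gamma} w_\gamma$, which depends on $z$ only through $w$, so the bracket closes on functions of $w$. The code below computes [33] with a central-difference Jacobian and compares it with that closed form; the names and the sample point are assumptions.

```python
# Reduction sketch: canonical bracket on R^6 pushed forward by w = q x p.
# The choice of map and the sample point are illustrative assumptions.


def w_of_z(z):
    """Angular momentum w = q x p for z = (q1, q2, q3, p1, p2, p3)."""
    q, p = z[:3], z[3:]
    return [q[1] * p[2] - q[2] * p[1],
            q[2] * p[0] - q[0] * p[2],
            q[0] * p[1] - q[1] * p[0]]


def jacobian(z, h=1e-5):
    """D[a][m] approximates dw_a/dz^m by central differences."""
    cols = []
    for m in range(6):
        zp, zm = list(z), list(z)
        zp[m] += h
        zm[m] -= h
        wp, wm = w_of_z(zp), w_of_z(zm)
        cols.append([(wp[a] - wm[a]) / (2.0 * h) for a in range(3)])
    return [[cols[m][a] for m in range(6)] for a in range(3)]


def canonical_J6():
    """Canonical matrix of [4] for N = 3."""
    J = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        J[i][3 + i] = 1.0
        J[3 + i][i] = -1.0
    return J


def reduced_J(z):
    """J_{ab} = (dw_a/dz^m) J_c^{mn} (dw_b/dz^n), as in [33]."""
    D, Jc = jacobian(z), canonical_J6()
    return [[sum(D[a][m] * Jc[m][n] * D[b][n] for m in range(6) for n in range(6))
             for b in range(3)] for a in range(3)]


def lie_poisson_J(w):
    """Closed form eps_{abc} w_c, written out as a matrix."""
    return [[0.0, w[2], -w[1]], [-w[2], 0.0, w[0]], [w[1], -w[0], 0.0]]


if __name__ == "__main__":
    z = [0.3, -1.2, 0.5, 0.8, 0.1, -0.4]
    print(reduced_J(z))
    print(lie_poisson_J(w_of_z(z)))  # same matrix up to rounding
```

Any Hamiltonian that depends on $z$ only through $w$, for example $\bar H = |w|^2/2$, then generates closed dynamics on the three-dimensional space of the $w$’s.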

A beautiful form of reduction occurs when the map $P$ has a special
form $w_\alpha = L^i_\alpha(q) p_i$, where the quantity $L$ is
associated with a symmetry group. An identity for what is required of
$L^i_\alpha$, in order for the transformed bracket to be expressible
in terms of the $w$’s can be worked out, but this is explained in
terms of Lie groups. If the space $\mathfrak{m}$ is a Lie algebra
$\mathfrak{g}$, then the functions $\bar f, \bar g$ are real-valued
functions on $\mathfrak{g}^*$ that can be extended by left or right
translation to functions $f,g$ on $T^*\mathfrak{G}$. Thus, $f$
restricted to $T^*\mathfrak{G}$ at the identity,
$T^*_e\mathfrak{G} = \mathfrak{g}^*$, is $\bar f$. Because
$T^*\mathfrak{G}$ is a cotangent bundle, it carries the canonical
Poisson bracket and we get a natural map $P$, called a momentum map,
into the dual of a Lie algebra. This geometrical description of
obtaining brackets on $\mathfrak{g}^*$ from brackets on
$T^*\mathfrak{G}$ is a case of Marsden-Weinstein reduction. In the
early 1980s, these authors and others developed the geometrical
interpretation of the noncanonical Poisson brackets for the ideal
fluid.


Ideal Fluid Noncanonical Poisson Brackets 


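The Euler-Lagrange map of the fluid is of the form of the map $P$
above. It maps the canonical bracket of [31] into a noncanonical
Poisson bracket. If we use the Eulerian variables $M := \rho v$,
$\rho$, and $\sigma := \rho s$, then the resulting noncanonical
bracket is of Lie-Poisson form. To effect this map, one must vary
[15]-[17] to relate functional derivatives with respect to $q$ and
$\pi$ to those with respect to $M$, $\rho$, and $\sigma$. This
amounts to working out the chain rule for functionals. Upon doing
this, one obtains the following noncanonical bracket:

$$\{F,G\} = -\int_\Omega d^3r \left[ M_i \left( \frac{\delta F}{\delta M_j} \frac{\partial}{\partial x^j} \frac{\delta G}{\delta M_i} - \frac{\delta G}{\delta M_j} \frac{\partial}{\partial x^j} \frac{\delta F}{\delta M_i} \right) + \rho \left( \frac{\delta F}{\delta M} \cdot \nabla \frac{\delta G}{\delta \rho} - \frac{\delta G}{\delta M} \cdot \nabla \frac{\delta F}{\delta \rho} \right) + \sigma \left( \frac{\delta F}{\delta M} \cdot \nabla \frac{\delta G}{\delta \sigma} - \frac{\delta G}{\delta M} \cdot \nabla \frac{\delta F}{\delta \sigma} \right) \right]$$  [34]

This bracket together with the Hamiltonian
$H[M,\rho,\sigma] = \int_\Omega d^3r\,[M^2/(2\rho) + \rho U(\rho,\sigma/\rho)]$
generates the ideal fluid equations. This Hamiltonian follows from
$\bar H[M,\rho,\sigma] := H[q,\pi]$ with
$H[q,\pi] = \int_\Omega d^3a\,[\pi^2/(2\rho_0) + \rho_0 U]$. The
bracket of [34] is clearly seen to be linear in the variables $M$,
$\rho$, and $\sigma$, and the form of the cosymplectic operator and
structure operators $\mathcal{C}^{ij}_k$ can be obtained by
integration by parts. The Lie group in this case can be seen to be an
extension by semidirect product of the diffeomorphism group.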

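An alternative form of the noncanonical Poisson bracket is given in
terms of the variables $v$, $\rho$, and $s$. Upon changing to these
coordinates, the noncanonical Poisson bracket transforms into

$$\{F,G\} = -\int_\Omega d^3r \left[ \left( \frac{\delta F}{\delta \rho} \nabla \cdot \frac{\delta G}{\delta v} - \frac{\delta G}{\delta \rho} \nabla \cdot \frac{\delta F}{\delta v} \right) + \frac{\nabla \times v}{\rho} \cdot \left( \frac{\delta G}{\delta v} \times \frac{\delta F}{\delta v} \right) + \frac{\nabla s}{\rho} \cdot \left( \frac{\delta F}{\delta s} \frac{\delta G}{\delta v} - \frac{\delta G}{\delta s} \frac{\delta F}{\delta v} \right) \right]$$  [35]

which, with the Hamiltonian
$H[v,\rho,s] = \int_\Omega d^3r\,[\rho v^2/2 + \rho U(\rho,s)]$,
produces the Eulerian fluid equations of [23]-[25] directly as
$v_t = \{v,H\}$, $\rho_t = \{\rho,H\}$, and $s_t = \{s,H\}$,
respectively. Observe that in these variables, the bracket is no
longer of Lie-Poisson form.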


Conclusion 


In a general sense, Hamiltonian dynamics is about 
coordinate changes, and it is clear from the above 
that there is no shortage of coordinates for describ- 
ing the ideal fluid. The most intuitive form of fluid 
equations (at present) is the Eulerian form, and this 
possesses a noncanonical Hamiltonian description. 
Other noncanonical variables are also used for both 
less and more general fluid systems than those 
described above. Vortex dynamics, shallow-water 
theory, and other equations of geophysical fluid 
dynamics are possibilities, as well as equations from 
plasma physics and other disciplines. The general 
story for these systems is much the same as above, 
although in some descriptions constraints are 
involved and they can complicate matters. 

There are various motivations for pursuing an
understanding of the Hamiltonian structure of 
fluids, but ultimately these motivations are the 
same as those for investigating the Hamiltonian 
dynamics of particle and other finite degree-of- 
freedom systems. Hamiltonian theory serves as an 
organizing framework, one that can be used for the 
derivation and approximation of systems. If one 
understands something about a particular Hamilto- 
nian system, then often it can be said to be true of a 
general class of Hamiltonian systems. By now, many 
applications have been worked out, some of which 
can be accessed from the literature cited below. 


See also: Adiabatic Piston; Bi-Hamiltonian Methods in
Soliton Theory; Classical r-Matrices, Lie Bialgebras, and
Poisson Lie Groups; Contact Manifolds; Hamiltonian Group
Actions; Infinite-Dimensional Hamiltonian Systems;
Korteweg-de Vries Equation and Other Modulation
Equations; Stochastic Hydrodynamics.


Hamiltonian Group Actions 


L C Jeffrey, University of Toronto, Toronto, ON, 
Canada 


© 2006 Elsevier Ltd. All rights reserved. 


Introduction 


The idea of a Hamiltonian flow on a symplectic 
manifold has its roots in Hamilton’s equations, 
which govern the trajectory of a particle in phase 
space (the space parametrizing coordinates and 
momenta of a classical particle). A fundamental 
idea in theoretical physics (Noether’s theorem) is 
that to every symmetry in a physical system (such as 
a group action), there is an associated conserved 
quantity: invariance under translation corresponds 
to conservation of linear momentum, invariance 
under rotation corresponds to conservation of 
angular momentum and so on, and these momenta 
are functions on the phase space. The mathematical 
formulation of this idea is the idea of the moment 
map associated to a group action on a symplectic 
manifold; the group action is obtained from the 
Hamiltonian flow of the moment map. 

This article will describe some basic features of 
moment maps associated to Hamiltonian group 
actions, and some recent results about the geometry
and topology of symplectic manifolds which have such 
group actions. We first define Hamiltonian group 
actions and list some of their properties. Next we give 
the definition of the symplectic quotient, which is a 
means of dividing out the symmetry to form a new 
symplectic manifold. We also explain some properties 
of the quotient construction. The convexity theorem 
and the moment polytope are outlined and toric 
manifolds (a particular type of symplectic manifold 
with a Hamiltonian torus action of maximal dimen- 
sion) are defined. Finally, we list some properties of 
cohomology rings of symplectic quotients. 

Two standard references on this material are the 
books of Cannas da Silva (2001) and McDuff and 
Salamon (1995). An authoritative and comprehen- 
sive reference is the monograph by Guillemin, 
Ginzburg and Karshon (2002). 


Hamiltonian Group Actions 


Let $(M,\omega)$ be a symplectic manifold. The Hamiltonian vector
field $\xi_H$ generated by a function $H$ is defined by

$$\omega_m((\xi_H)_m, Y) = (dH)_m(Y)$$

for any $Y \in T_mM$. If $X \in \mathfrak{g} \mapsto X^\#$ are the
vector fields on $M$ generated by the symplectic action of a compact
Lie group $G$ with Lie algebra $\mathfrak{g}$, then the moment map
$\mu: M \to \mathfrak{g}^*$ is defined by two properties:


1. $(d\mu)_m(Y)(X) = \omega_m(X^\#_m, Y)$ for any $Y \in T_mM$: in
other words the function $\mu_X: M \to \mathbb{R}$ defined by

$$\mu_X(m) = \mu(m)(X)$$

is the Hamiltonian function generating the vector field $X^\#$.

2. $\mu: M \to \mathfrak{g}^*$ is equivariant (where $G$ acts on
$\mathfrak{g}^*$ by the coadjoint action).
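
As a concrete instance of the two properties, consider the circle acting on $\mathbb{R}^2$ by rotations with the symplectic form $dx \wedge dy$. This example, the function names, and the sign of $\mu$ (which is fixed by the convention of property 1) are assumptions of the sketch below and are not taken from the article. The generating vector field is $X^\# = (-y, x)$, and $\mu(x,y) = -(x^2 + y^2)/2$ satisfies $d\mu(Y) = \omega(X^\#, Y)$; because the circle is abelian, the coadjoint action is trivial and equivariance reduces to invariance of $\mu$ under rotations.

```python
# Moment map for the rotation action of the circle on R^2 with omega = dx ^ dy.
# The example and its sign conventions are illustrative assumptions.
import math


def omega(u, v):
    """Symplectic form dx ^ dy evaluated on tangent vectors u and v."""
    return u[0] * v[1] - u[1] * v[0]


def generator(m):
    """Vector field X^# of the rotation action at the point m = (x, y)."""
    x, y = m
    return (-y, x)


def mu(m):
    """Candidate moment map mu = -(x^2 + y^2)/2."""
    x, y = m
    return -0.5 * (x * x + y * y)


def d_mu(m, Y, h=1e-6):
    """Directional derivative of mu at m along Y, by central differences."""
    mp = (m[0] + h * Y[0], m[1] + h * Y[1])
    mm = (m[0] - h * Y[0], m[1] - h * Y[1])
    return (mu(mp) - mu(mm)) / (2.0 * h)


def rotate(m, angle):
    """Action of the group element exp(i * angle) on the point m."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * m[0] - s * m[1], s * m[0] + c * m[1])


if __name__ == "__main__":
    m, Y = (0.6, -1.1), (0.3, 0.8)
    print(d_mu(m, Y), omega(generator(m), Y))  # property 1: the two agree
    print(mu(m), mu(rotate(m, 0.9)))           # property 2: mu is invariant
```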


Remark 1 In this article, we shall only consider 
actions of compact connected Lie groups, although 
the definition of Hamiltonian group action may be 
extended to noncompact groups. In particular, 
unless otherwise specified the term “torus” refers 
to the compact torus $T = U(1)^n$.


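Remark 2 (Existence and uniqueness of moment maps). One sees that
$L_{X^\#}\omega = d(\iota_{X^\#}\omega)$, because $\omega$ is closed,
and $L_{X^\#}\omega = 0$ because the action is symplectic, so that
$\iota_{X^\#}\omega$ is closed. The moment map $\mu_X$ exists if and
only if $\iota_{X^\#}\omega$ is also “exact.” The moment map need not
always exist: for example, if $S^1$ acts on $T^2$ by

$$e^{i\phi} \cdot (e^{i\theta_1}, e^{i\theta_2}) = (e^{i(\theta_1 + \phi)}, e^{i\theta_2})$$

we see that for the standard symplectic form
$\omega = d\theta_1 \wedge d\theta_2$ we have
$\iota_{X^\#}\omega = d\theta_2$. Since $\theta_2$ is only defined
mod $2\pi$ we see that the moment map does not exist as a map into
$\mathbb{R}$. Conditions guaranteeing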
the existence of a moment map (other than M being 
simply connected) include the hypothesis that G 
is semisimple (Guillemin and Sternberg (1990, 
theorem 26.1)); conditions on the existence and 
uniqueness of the moment map can be formulated in 
terms of Lie algebra cohomology (see Guillemin and 
Sternberg (1990)). The obstruction to the existence 
of the moment map for a symplectic action of G is 
an element of H'(g); the obstruction to uniqueness 
of the moment map is an element of H*(g), where g 
is the Lie algebra of G. See Guillemin and Sternberg 
(1990, proposition 24.1). 


Basic Properties of Moment Maps 


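Proposition 1 (Guillemin–Sternberg (1982, 1984))

$$\mathrm{Im}(d\mu_m)^{\perp} = \mathrm{Lie}(\mathrm{Stab}(m))$$

where $\perp$ denotes the annihilator under the canonical
pairing $\mathfrak{g}^* \otimes \mathfrak{g} \to \mathbb{R}$.

Proof We have

$$\omega_m(Y^{\#}, Z) = d\mu_Y(Z) = \langle Y, d\mu_m(Z) \rangle$$

for all $Z \in T_mM$. Since $\omega$ is nondegenerate, $Y$ annihilates
all $\xi \in \mathrm{Im}(d\mu_m)$ if and only if $Y^{\#}$ vanishes at $m$, that is,
if and only if $Y \in \mathrm{Lie}(\mathrm{Stab}(m))$.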


Corollary 1 Zero is a regular value of $\mu$ if and only
if $\mathrm{Stab}(m)$ is finite for all $m \in \mu^{-1}(0)$. In this
situation, $\mu^{-1}(0)$ is a manifold and the stabilizer of
the action at any point in $\mu^{-1}(0)$ is finite.


Example 1 Let $T$ be a torus acting on $M$ and let
$F \subset M^T$ be a component of the fixed-point set. Then
for any $f \in F$, we have $d\mu_f = 0$, so $\mu(F)$ is a point.


Proposition 2 


(i) If $H \subset G$ are two groups acting in a Hamiltonian
fashion on a symplectic manifold $M$, then $\mu_H = \pi \circ \mu_G$
where $\pi : \mathfrak{g}^* \to \mathfrak{h}^*$ is the projection map. In
other words, if $X \in \mathfrak{h}$, then $\mu_H(m)(X) = \mu_G(m)(X)$
for any $m \in M$. One example that frequently
arises is the case when $H = T$ is a maximal torus of
a compact Lie group $G$.

(ii) More generally, if $f : H \to G$ is a Lie group
homomorphism, and the two groups $G$ and $H$
act in a Hamiltonian fashion on a symplectic
manifold $M$, in such a way that the action is
compatible with the homomorphism $f$, then
$\mu_H = f^* \circ \mu_G$ where $f^* : \mathfrak{g}^* \to \mathfrak{h}^*$ is induced from
the homomorphism $f$. (The case (i) is the special
case where $f$ is the inclusion map.)

(iii) If two symplectic manifolds $M_1$ and $M_2$ are acted
on in a Hamiltonian fashion by a group $G$ with
moment maps $\mu_1$ and $\mu_2$, then the moment map
for the diagonal action of $G$ on $M_1 \times M_2$ with the
product symplectic structure is $\mu_1 + \mu_2$.

Example 2 The standard symplectic form on $S^2$ is
$\omega = -d\cos\theta \wedge d\phi = -dz \wedge d\phi$ (where $\theta$ is the polar
angle, $\phi$ is the azimuthal angle, and $z$ is the height
function). The associated moment map for the action
of $U(1)$ on $S^2$ by rotation about the $z$ axis is $\mu(z, \phi) = z$.


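Example 3 If $\mathbb{R}^2 = \mathbb{C}$ has the symplectic structure
$\omega = dx \wedge dy$, the moment map for the standard
action of $U(1)$ on $\mathbb{R}^2$ with multiplicity $m \in \mathbb{Z}$, in
other words the action

$$u \in U(1) : z \in \mathbb{C} \mapsto u^m z$$

is $\mu(x, y) = -m(x^2 + y^2)/2$.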
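
The claim of Example 3 can be checked numerically. The following is a minimal sketch, not part of the original article: it verifies by finite differences that $\mu(x, y) = -m(x^2 + y^2)/2$ satisfies the defining property $d\mu(Y) = \omega(X^{\#}, Y)$, where $X^{\#} = m(-y\,\partial_x + x\,\partial_y)$ generates the weight-$m$ rotation. The function names and the sample points are illustrative assumptions.

```python
# Sketch: check d(mu)(Y) = omega(X#, Y) for the weight-m circle action on R^2 = C.
# All names are illustrative; only the formulas come from Example 3.

def omega(u, v):
    """Standard symplectic form dx ^ dy evaluated on tangent vectors u, v."""
    return u[0] * v[1] - u[1] * v[0]

def generator(m, x, y):
    """Vector field X# of the action z -> exp(i*m*theta) z at the point (x, y)."""
    return (-m * y, m * x)

def mu(m, x, y):
    """Candidate moment map from Example 3."""
    return -m * (x * x + y * y) / 2.0

def directional_derivative(m, x, y, v, h=1e-6):
    """Central-difference approximation of d(mu)(v) at (x, y)."""
    forward = mu(m, x + h * v[0], y + h * v[1])
    backward = mu(m, x - h * v[0], y - h * v[1])
    return (forward - backward) / (2.0 * h)

def max_defect(m, points, directions):
    """Largest |d(mu)(Y) - omega(X#, Y)| over the sampled points and directions."""
    worst = 0.0
    for (x, y) in points:
        for v in directions:
            lhs = directional_derivative(m, x, y, v)
            rhs = omega(generator(m, x, y), v)
            worst = max(worst, abs(lhs - rhs))
    return worst

POINTS = [(1.0, 0.0), (0.3, -0.7), (-1.2, 0.5)]
DIRECTIONS = [(1.0, 0.0), (0.0, 1.0), (0.6, -0.8)]

if __name__ == "__main__":
    for m in (-2, 1, 3):
        print(m, max_defect(m, POINTS, DIRECTIONS))
```

Because $\mu$ is quadratic, the central difference is exact up to rounding, so the reported defect should be at the level of floating-point noise for every weight.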


Example 4 Suppose a torus $T$ acts on $\mathbb{C}$ preserving
the standard symplectic structure, and suppose
the action factors through a homomorphism
$\beta : T \to U(1)$ which can be written as

$$\beta(\exp_T X) = \exp_{U(1)}(\beta(X))$$

in terms of a linear map $\beta \in \mathfrak{t}^*$ that maps the integer
lattice of $\mathfrak{t}$ into $\mathbb{Z}$ (in other words, a weight) and the
exponential maps

$$\exp_T : \mathfrak{t} \to T$$

and

$$\exp_{U(1)} : \mathbb{R} \to U(1)$$

(the latter being normalized as $\exp_{U(1)}(t) = e^{2\pi i t}$).
Then, by Proposition 2(ii) and Example 3 we
see that the moment map for the action of $T$ on
$\mathbb{C}$ is

$$\mu(z) = -\frac{1}{2} \beta |z|^2$$

It follows that if $T$ acts on $\mathbb{C}^n$ via a collection of
weights $\beta_1, \ldots, \beta_n \in \mathfrak{t}^*$, then the moment map is

$$\mu(z_1, \ldots, z_n) = -\frac{1}{2} \sum_{j=1}^{n} \beta_j |z_j|^2$$

and the image of the moment map is the cone in $\mathfrak{t}^*$
spanned by $\{\beta_1, \ldots, \beta_n\}$.


The Symplectic Quotient 


Since the moment map $\mu$ is equivariant, we may
form the symplectic quotient (or Marsden–Weinstein
reduction)

$$M_{\mathrm{red}} = M_0 = \mu^{-1}(0)/G$$

The symplectic structure on $M$ descends to give a
symplectic structure on $M_{\mathrm{red}}$. Corollary 1 implies that
if 0 is a regular value of $\mu$, then $M_0$ is an orbifold.


Remark 3 Another way to formulate Corollary 1 is
that if $G$ acts freely, 0 is a regular value of $\mu$ so
$\mu^{-1}(0)$ is a manifold with a free $G$ action, and hence
$\mu^{-1}(0)/G$ is also a manifold. If the $G$ action is only
locally free, then $\mu^{-1}(0)$ is still a manifold, but the
quotient $\mu^{-1}(0)/G$ is only an orbifold.


Remark 4 The definition of orbifold is due to 
Satake (1957); an alternate formulation is given in 
the paper by Henriques and Metzler (2004) and 
references cited there. 


If $T$ is a torus, then the equivariance condition on
the moment map reduces to invariance, so we may
form the reduced space $M_t = \mu^{-1}(t)/T$ for any
regular value $t \in \mathfrak{t}^*$ of the moment map $\mu$; the
space $M_t$ is a symplectic orbifold for any regular
value $t$ of $\mu$.


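Example 5 Let $U(1)$ act diagonally on $\mathbb{C}^n$ equipped
with the standard symplectic structure

$$\omega = \frac{i}{2} \sum_{j=1}^{n} dz_j \wedge d\bar{z}_j = \sum_{j=1}^{n} dx_j \wedge dy_j \qquad [1]$$

where $z_j = x_j + i y_j$. The moment map for this action is

$$\mu(z_1, \ldots, z_n) = -\frac{1}{2} \sum_{j=1}^{n} |z_j|^2$$

so the symplectic quotient $\mu^{-1}(-1/2)/U(1)$ is complex
projective space

$$\mathbb{CP}^{n-1} = \Big\{ (z_1, \ldots, z_n) \in \mathbb{C}^n : \sum_{j=1}^{n} |z_j|^2 = 1 \Big\} \Big/ U(1)$$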
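
Two facts used in Example 5 can be confirmed directly: the moment map $-\frac{1}{2}\sum_j |z_j|^2$ is invariant under the diagonal $U(1)$ action, and its level set at $-1/2$ is the unit sphere. The following is a small illustrative sketch; the function names `mu`, `rotate` and `on_level_set` are assumptions introduced here, not notation from the article.

```python
import cmath

def mu(z):
    """Moment map of the diagonal U(1) action on C^n: -(1/2) * sum |z_j|^2."""
    return -0.5 * sum(abs(c) ** 2 for c in z)

def rotate(z, theta):
    """Diagonal U(1) action: multiply every coordinate by exp(i*theta)."""
    u = cmath.exp(1j * theta)
    return [u * c for c in z]

def on_level_set(z, tol=1e-12):
    """True if z lies in mu^{-1}(-1/2), i.e. on the unit sphere in C^n."""
    return abs(mu(z) + 0.5) <= tol

if __name__ == "__main__":
    z = [0.6 + 0j, 0.8j]
    print(on_level_set(z), on_level_set(rotate(z, 1.234)))
```

Since the level set is preserved by the action, the quotient $\mu^{-1}(-1/2)/U(1)$ is well defined, which is the set-theoretic content of the reduction.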


More generally we may consider the reduced
space $M_\lambda = \mu^{-1}(\mathcal{O}_\lambda)/G$ when $\mathcal{O}_\lambda$ is the orbit in $\mathfrak{g}^*$
through $\lambda \in \mathfrak{g}^*$ (coadjoint orbit). All such orbits may
be parametrized by $\lambda \in \mathfrak{t}_+^*$, where $\mathfrak{t}_+^*$ is a chosen
positive Weyl chamber in $\mathfrak{t}^*$.


Example 6 Let $U(n)$ act on $\mathbb{C}^n$ in the standard way,
where $\mathbb{C}^n$ is equipped with the standard symplectic
structure [1]. The moment map for this action is

$$\mu(z_1, \ldots, z_n)_{jk} = \frac{i}{2} z_j \bar{z}_k \qquad [2]$$

which is the $(j,k)$ element of a matrix in the Lie
algebra of $U(n)$. The standard symplectic form on
$\mathbb{C}^n$ descends under reduction to the standard
symplectic form on $\mathbb{CP}^{n-1}$ (which corresponds to
the Fubini–Study metric).


Example 7 (Coadjoint orbits). Let $\lambda \in \mathfrak{g}^*$. We
define a symplectic structure $\omega_\lambda$ on the coadjoint
orbit $\mathcal{O}_\lambda$ (in terms of the vector fields $X^{\#}, Y^{\#}$
generated by the action of $X, Y \in \mathfrak{g}$) by
$\omega_\lambda(X^{\#}, Y^{\#}) = -\lambda([X, Y])$ at the point $\lambda \in \mathcal{O}_\lambda$ (and
everywhere else on the orbit by equivariance). The
moment map for the action of $G$ on $\mathcal{O}_\lambda$ with respect
to this symplectic structure is the inclusion of $\mathcal{O}_\lambda$
in $\mathfrak{g}^*$. (The symplectic structure on the orbit was
found by Kirillov and Kostant; see, for instance,
Berline et al. (1992, section 7.5).)


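Example 8 (The shifting trick). Define a symplectic
structure $\Omega$ on $M \times \mathcal{O}_\lambda$ by

$$\Omega = \omega \oplus (-\omega_\lambda)$$

Then for the moment map with respect to the
induced action of $G$ on $M \times \mathcal{O}_\lambda$ we have

$$M_\lambda \cong (M \times \mathcal{O}_\lambda)_0$$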


Corollary 2 Combining Example 6 with Proposition
2(ii) we see that for any linear action of a group $G$ on
$\mathbb{CP}^{n-1}$ (i.e., an action factoring through a representation
$G \to U(n)$, or in other words an action descending
from a linear action on $\mathbb{C}^n$) the moment map factors as

$$\mu = \pi \circ \hat{\mu}$$

where $\hat{\mu} : \mathbb{CP}^{n-1} \to \mathfrak{u}(n)^*$ is given in [3] below, and
$\pi : \mathfrak{u}(n)^* \to \mathfrak{g}^*$ is the projection map.


In particular, one often requires for a projective
manifold $M$ (i.e., a compact complex manifold with
an embedding into $\mathbb{CP}^{n-1}$) that the action of $G$
extends to a linear action on $\mathbb{CP}^{n-1}$. Thus, moment
maps for such linear actions are given by [3]
composed with $\pi$ and with the embedding of $M$
into $\mathbb{CP}^{n-1}$ (see also Cotangent Bundle Reduction,
Poisson Reduction, Symmetry and Symplectic
Reduction).


Reduction in Stages 


Suppose a compact Lie group $G$ acts in a Hamiltonian
fashion on a symplectic manifold $M$, and $H$ is a
normal subgroup of $G$. (For example, this hypothesis
is satisfied if both $H$ and $G$ are tori.) Suppose
also that 0 is a regular value for $\mu_H$ and $\mu_G$. Then
the symplectic quotient $\mu_H^{-1}(0)/H$ is acted on
naturally by the quotient group $G/H$, and this
action is Hamiltonian; furthermore, the symplectic
quotient of $\mu_H^{-1}(0)/H$ by $G/H$ is naturally isomorphic
to $\mu_G^{-1}(0)/G$. (This result is known as
“reduction in stages.”)

Let $M$ be a symplectic manifold equipped with the
Hamiltonian action of a torus $T$. Let $H \subset T$ be a Lie
subgroup of $T$ (so $H$ is a torus whose dimension is
smaller than the dimension of $T$). Let
$\mu_T : M \to \mathrm{Lie}(T)^*$ and $\mu_H : M \to \mathrm{Lie}(H)^*$ be the moment maps:
recall that $\mu_H = \pi_H \circ \mu_T$, where
$\pi_H : \mathrm{Lie}(T)^* \to \mathrm{Lie}(H)^*$ is the standard projection.

For any $\eta \in \mathrm{Lie}(H)^*$ we may form the reduced
space $M_\eta = \mu_H^{-1}(\eta)/H$. This is equipped with a
Hamiltonian action of $T/H$.


Example 9 Let $U(n)$ act on $\mathbb{C}^n$ in the standard way.
This action descends to an action on $\mathbb{CP}^{n-1}$, which
is the symplectic quotient of $\mathbb{C}^n$ under the action of
the diagonal $U(1)$ subgroup of $U(n)$. Hence, the
moment map $\hat{\mu}$ for the action of $U(n)$ on $\mathbb{CP}^{n-1}$ is
given by the formula

$$\hat{\mu}([z_1 : \cdots : z_n])_{jk} = \frac{i}{2} \, \frac{z_j \bar{z}_k}{\sum_{l=1}^{n} |z_l|^2} \qquad [3]$$

which comes from the moment map [2] for the
action of $U(n)$ on $\mathbb{C}^n$.


The Normal Form Theorem


There is a neighborhood of $\mu^{-1}(0)$ on which the
symplectic form is given in a standard way related to
the symplectic form $\omega_0$ on $M_{\mathrm{red}}$ (see, e.g., Guillemin
and Sternberg (1990, sections 39–41)).


Proposition 3 (Normal form theorem). Assume 0 is
a regular value of $\mu$ (so that $\mu^{-1}(0)$ is a smooth
manifold and $G$ acts on $\mu^{-1}(0)$ with finite stabilizers).
Then there is a neighborhood
$U \cong \mu^{-1}(0) \times \{z \in \mathfrak{g}^*, |z| \leq h\} \subset \mu^{-1}(0) \times \mathfrak{g}^*$
of $\mu^{-1}(0)$ on which the symplectic form is given as follows. Let
$P = \mu^{-1}(0) \xrightarrow{q} M_{\mathrm{red}}$ be the orbifold principal $G$-bundle
given by the projection map $q : \mu^{-1}(0) \to \mu^{-1}(0)/G$,
and let $\theta \in \Omega^1(P) \otimes \mathfrak{g}$ be a connection for it. Let $\omega_0$
denote the induced symplectic form on $M_{\mathrm{red}}$, in other
words $q^*\omega_0 = i_0^*\omega$ (where $i_0$ is the inclusion of $\mu^{-1}(0)$
in $M$). Then if we define a 1-form $\tau$ on
$U \subset P \times \mathfrak{g}^*$ by $\tau_{p,z} = z(\theta)$ (for $p \in P$ and $z \in \mathfrak{g}^*$), the
symplectic form on $U$ is given by

$$\omega = q^*\omega_0 + d\tau \qquad [4]$$

Further, the moment map on $U$ is given by
$\mu(p, z) = z$.


Corollary 3 Let $t_0$ be a regular value for the
moment map for the Hamiltonian action of a torus
$T$ on a symplectic manifold $M$. Then for $t$ in a
neighborhood of $t_0$, all symplectic quotients $M_t$ are
diffeomorphic to $M_{t_0}$ by a diffeomorphism under which
$\omega_t = \omega_{t_0} + (t - t_0, d\theta)$ where $\theta \in \Omega^1(\mu^{-1}(t_0)) \otimes \mathfrak{t}$ is a
connection for the action of $T$ on $\mu^{-1}(t_0)$.


Corollary 4 Suppose $G$ acts in a Hamiltonian
fashion on a symplectic manifold $M$, and suppose
0 is a regular value for the moment map $\mu$. Then the
reduced space $M_\lambda = \mu^{-1}(\mathcal{O}_\lambda)/G$ at the orbit $\mathcal{O}_\lambda$
fibers over $M_0 = \mu^{-1}(0)/G$ with fiber the orbit $\mathcal{O}_\lambda$;
furthermore, if $\pi : M_\lambda \to M_0$ is the projection map,
then the symplectic form $\omega_\lambda$ on $\mu^{-1}(\mathcal{O}_\lambda)/G$ is given
as $\omega_\lambda = \pi^*\omega_0 + \Omega_\lambda$, where $\omega_0$ is the symplectic form
on $M_0$ and $\Omega_\lambda$ restricts to the standard Kirillov–Kostant
symplectic form on the fiber.


Convexity Theorems 


Theorem 1 (Atiyah (1982); Guillemin–Sternberg
(1982 and 1984)). Suppose $M$ is a connected
compact symplectic manifold equipped with a Hamiltonian
action of a torus $T$. Then the image $\mu(M)$ is a
convex polytope, the convex hull of $\{\mu(F)\}$, where $F$ are
the components of the fixed-point set of $T$ in $M$.


Example 10 Consider the orbits $\mathcal{O}_\xi$ of $SU(2)$ in
$\mathfrak{su}(2)^* \cong \mathbb{R}^3$ through $\xi \in \mathbb{R}^{+}$. The image of the
moment map for the action of the maximal torus
$T = U(1)$ is the interval $[-\xi, \xi]$.


Example 11 When $\mathcal{O}_\xi$ is the coadjoint orbit
(through $\xi \in \mathfrak{t}^*$) for a compact Lie group $G$ with
maximal torus $T$, the image $\mu_T(\mathcal{O}_\xi)$ of the moment
map $\mu_T$ for the action of the maximal torus $T$ is the
convex hull $\mathrm{Conv}\{w\xi : w \in W\}$, where $W$ is the Weyl
group.
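
The case $G = SU(2)$ of Examples 10 and 11 is simple enough to sample directly. The sketch below is illustrative only and rests on two assumptions stated here rather than in the article: the coadjoint orbit through $\xi > 0$ is modeled as the round sphere of radius $\xi$ in $\mathbb{R}^3$, and the moment map of the maximal torus is taken to be the height coordinate (as in Example 2, up to normalization). The function names are hypothetical.

```python
import math

def orbit_point(xi, theta, phi):
    """Point on the sphere of radius xi in R^3 (polar angle theta, azimuth phi)."""
    return (xi * math.sin(theta) * math.cos(phi),
            xi * math.sin(theta) * math.sin(phi),
            xi * math.cos(theta))

def torus_moment_map(point):
    """Assumed moment map for rotation about the z axis: the height coordinate."""
    return point[2]

def sampled_image(xi, steps=24):
    """Moment-map values on a (theta, phi) grid that includes both poles."""
    values = []
    for a in range(steps + 1):
        theta = math.pi * a / steps
        for b in range(steps):
            phi = 2.0 * math.pi * b / steps
            values.append(torus_moment_map(orbit_point(xi, theta, phi)))
    return values

def image_interval(xi, steps=24):
    """Smallest and largest sampled moment-map value."""
    values = sampled_image(xi, steps)
    return (min(values), max(values))

if __name__ == "__main__":
    print(image_interval(2.5))
```

The extreme values are attained at the two poles, which are the fixed points of the circle action and correspond to the Weyl group images $\pm\xi$; every other sampled value lies between them, in line with Theorem 1.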


The convexity theorem above can be generalized
to actions of nonabelian groups. If $M$ is a connected
compact symplectic manifold equipped with a
Hamiltonian action of a compact Lie group $G$ with
maximal torus $T$ and positive Weyl chamber $\mathfrak{t}_+^*$,
then the intersection of the image $\mu(M)$ of the
moment map with the positive Weyl chamber $\mathfrak{t}_+^*$ (in
other words, a fundamental domain for the action
of the Weyl group on $\mathfrak{t}^*$) is a convex polytope.
This result is due to Kirwan (1984b) and for Kähler
manifolds to Guillemin and Sternberg (1982 and
1984).

The proofs of Atiyah and Guillemin—Sternberg are 
based on Morse theory applied to the moment map. 
A key ingredient in the proofs is to establish that the 
fibers of the moment map are connected. 


The Moment Polytope 


Given a compact symplectic manifold $M$ equipped
with the Hamiltonian action of a torus $T$, we see
that there is an associated polytope $P$, the “moment
polytope.” The fibers of the moment map $\mu$ are
preserved by the action of $T$, so the value of $\mu$
parametrizes a family $\{M_t\}$ of symplectic quotients.
By Theorem 1 the moment polytope is the convex
hull of the images of the fixed-point set under the
moment map.

By Proposition 1, we see that the moment
polytope is decomposed according to the stabilizers
of points in the preimage, and the critical values of
the moment map are the images $\mu_T(W_i)$ of the
fixed-point sets $W_i$ of one-parameter subgroups $S_i$
of $T$. These critical values form hyperplanes
(“walls”) which subdivide the moment polytope:
the complement of the walls is a collection of open
regions consisting of regular values of the moment
map.


Example 12 The group $SU(3)$ has maximal torus
$T \cong U(1)^2$. We identify $\mathfrak{g}^*$ with $\mathfrak{g}$ via the bi-invariant
inner product (i.e., the Killing form) on $\mathfrak{g}$, and thus
identify $\mathfrak{t}^*$ with $\mathfrak{t}$. For $\lambda \in \mathfrak{t}$, the Weyl group images
of $\lambda$ are the six vertices of a hexagon: the “walls” in
the moment polytope for the action of $T$ on the
coadjoint orbit $\mathcal{O}_\lambda$ arising from the action of $G$ on
$\mathfrak{g}^*$ through $\lambda \in \mathfrak{t}^*$ are the edges of the hexagon
(exterior walls) and the three lines connecting
opposite vertices (interior walls).


Toric Manifolds 


Definition 1 A toric manifold is a compact
symplectic manifold $M$ of dimension $2n$ equipped
with the effective Hamiltonian action of a torus $T$ of
dimension $n$.


Example 13 Complex projective space $\mathbb{CP}^n$ with
the obvious Hamiltonian action of $U(1)^n \subset U(1)^{n+1}$
is a toric manifold.


Example 14 A special case of Example 13 is the
2-sphere $S^2 = \mathbb{CP}^1$ (with the action of $U(1)$ given by
rotation around one axis). The 2-sphere is a toric
manifold.


Elementary Properties of Toric Manifolds 


If $M$ is a toric manifold, the fiber of the moment map
for the action of $T$ is an orbit of the action. Hence,
the symplectic quotient $M_t$ at any value $t \in \mathfrak{t}^*$ is a
point (if it is nonempty).

The regular values of $\mu$ are the interior points of
the moment polytope $P$. All points in the preimage
$\mu^{-1}(\partial P)$ are fixed points of some one-parameter
subgroup of $T$. Points in the interior of a face $P_j$ of
dimension $j$ are fixed by a subtorus of $T$ of
dimension $n - j$. Hence, each fiber of $\mu$ over a
point in $P_j$ is a quotient torus of dimension $j$. In
particular, the vertices of the polytope are the
images of the components of the fixed-point set of
the whole torus $T$, and the inverse image of a vertex
is contained in the fixed-point set of $T$.

The push-forward function $\mu_*(\omega^n/n!)$ under the
moment map is just the characteristic function of the
moment polytope.
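
For the 2-sphere of Example 2 the push-forward statement reduces to Archimedes' theorem: the symplectic area of the band $\{a \leq z \leq b\}$ depends only on $b - a$. The following sketch checks this by midpoint-rule integration. It is an illustration under stated assumptions: $\omega = \sin\theta \, d\theta \wedge d\phi$ and $\mu = z = \cos\theta$ as in Example 2, the function name `band_area` is hypothetical, and the constant $2\pi$ reflects a choice of normalization for the circle that the article does not fix.

```python
import math

def band_area(a, b, steps=20000):
    """Symplectic area of {a <= z <= b} on S^2 for omega = sin(theta) dtheta ^ dphi."""
    theta_hi = math.acos(a)  # z = cos(theta) decreases as theta increases
    theta_lo = math.acos(b)
    width = (theta_hi - theta_lo) / steps
    total = 0.0
    for k in range(steps):
        theta = theta_lo + (k + 0.5) * width  # midpoint of the k-th subinterval
        total += math.sin(theta) * width
    return 2.0 * math.pi * total  # the phi integral contributes 2*pi

if __name__ == "__main__":
    print(band_area(-0.3, 0.4), 2.0 * math.pi * 0.7)
```

The computed area agrees with $2\pi(b - a)$, so the push-forward of the symplectic measure has constant density on the moment polytope $[-1, 1]$ and vanishes outside it.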


Delzant’s Theorem 


In fact, toric manifolds are characterized by their
moment polytopes. A theorem of Delzant (1988)
says that any polytope $P$ satisfying appropriate
hypotheses (a simple polytope) is the moment
polytope for some toric manifold; furthermore, if
two toric manifolds acted on effectively by a torus $T$
have the same moment polytope, then they are
$T$-equivariantly symplectomorphic. The first statement
is proved by constructing a toric manifold
which has the polytope $P$ as its moment polytope; if
$P$ has $d$ faces of codimension 1, one constructs the
toric manifold $M$ as a symplectic quotient of a
vector space $V \cong \mathbb{C}^d$ by the linear action of a torus
$T' \cong U(1)^{d-n}$. The torus $T \cong U(1)^n$ acting on $M$ is
then obtained by reduction in stages, as the quotient
of $U(1)^d$ by $T'$.

The construction of a toric manifold whose
moment polytope is a given simple polytope is
given in Guillemin (1994, chapter 1). The second
statement (namely that toric manifolds are classified
by their moment polytopes) is proved in Delzant
(1988).


Example 15 The moment polytope for the action
of $U(1)^n$ on $\mathbb{CP}^n$ is the $n$-simplex. This action
descends from the action of $U(1)^{n+1}$ on $\mathbb{C}^{n+1}$, using
reduction in stages: recall from Example 5 that we
constructed $\mathbb{CP}^n$ as the symplectic quotient of $\mathbb{C}^{n+1}$
by the standard action of $U(1)$.
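
The simplex of Example 15 can be made concrete with a short computation. The sketch below is a hedged illustration: it assumes the normalization obtained by combining Examples 4 and 5, in which the $j$-th component of the moment map for $U(1)^{n+1}$ at $[z_0 : \cdots : z_n]$ is $-|z_j|^2 / (2 \sum_k |z_k|^2)$. With that assumption every image point has nonpositive components summing to $-1/2$, which cuts out an $n$-simplex, and the $n + 1$ torus-fixed points map to its vertices. The function names are hypothetical.

```python
def torus_moment_map(z):
    """Moment map of U(1)^(n+1) on CP^n at [z_0 : ... : z_n].

    Assumed normalization (from Examples 4 and 5): component j is
    -|z_j|^2 / (2 * sum_k |z_k|^2).
    """
    norm_sq = sum(abs(c) ** 2 for c in z)
    if norm_sq == 0:
        raise ValueError("homogeneous coordinates must not all vanish")
    return tuple(-abs(c) ** 2 / (2.0 * norm_sq) for c in z)

def in_simplex(p, tol=1e-12):
    """True if p lies in the simplex {p_j <= 0 for all j, sum_j p_j = -1/2}."""
    return all(c <= tol for c in p) and abs(sum(p) + 0.5) <= tol

def vertices(n):
    """Images of the n + 1 torus-fixed points [0 : ... : 1 : ... : 0]."""
    result = []
    for j in range(n + 1):
        z = [0j] * (n + 1)
        z[j] = 1 + 0j
        result.append(torus_moment_map(z))
    return result

if __name__ == "__main__":
    print(vertices(2))
    print(in_simplex(torus_moment_map([1 + 2j, 0.5j, -3.0])))
```

The image lies in the hyperplane where the components sum to $-1/2$ because the diagonal circle has already been divided out; projecting away one coordinate gives the moment polytope of the residual $U(1)^n$ action.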


Cohomology Rings of Symplectic 
Quotients 


For material on the equivariant cohomology of
symplectic manifolds equipped with Hamiltonian
group actions and the relation to the fixed-point set,
we refer to Equivariant Cohomology and the Cartan
Model. As in that reference we shall describe
the equivariant cohomology of a Hamiltonian
$G$-manifold using the Cartan model.

Two fundamental results of Kirwan give comple- 
mentary descriptions of the equivariant cohomology 
of a symplectic manifold. 


Kirwan Injectivity 
Kirwan’s first theorem is the injectivity theorem: 


Theorem 2 (Injectivity theorem). If $T$ is a compact
torus and $M$ is a Hamiltonian $T$-space, then the
direct sum of restriction maps to all components $F$ of
the fixed-point set

$$H_T^*(M) \to \bigoplus_{F} H^*(F) \otimes S(\mathfrak{t}^*)$$

is injective.

The proof appears in Kirwan (1984a); this
material is treated in Equivariant Cohomology and
the Cartan Model (theorem 6.6).


Kirwan Surjectivity 


Let $G$ be a complex torus, and let 0 be a regular
value of the moment map $\mu$. Suppose $M$ is a
compact symplectic manifold equipped with a
Hamiltonian action of a compact Lie group $G$.
There is a natural map $\kappa : H_G^*(M) \to H^*(M_{\mathrm{red}})$
defined by

$$\kappa : H_G^*(M) \to H_G^*(\mu^{-1}(0)) \cong H^*(M_{\mathrm{red}})$$

(where the first map is the restriction map and the
second is the identification of $H_G^*(Z)$ with $H^*(Z/G)$
when $G$ acts locally freely on $Z$ and the cohomology
is taken with rational coefficients). The map $\kappa$ is
obviously a ring homomorphism. Kirwan’s second
theorem treats the image of $\kappa$.


Theorem 3 (Surjectivity theorem). Under the
above hypotheses, the map $\kappa$ is surjective.


The proof of this theorem (Kirwan (1984a, 5.4 and
8.10); see also Kirwan (1992, section 6)) uses the
Morse theory of the “Yang–Mills function” $|\mu|^2 : M \to \mathbb{R}$
to define an equivariant stratification of $M$
by strata $S_\beta$ which flow under the gradient flow of
$-|\mu|^2$ to a critical set $C_\beta$ of $|\mu|^2$. One shows that the
function $|\mu|^2$ is equivariantly perfect (i.e., that the
Thom–Gysin (long) exact sequence in equivariant
cohomology decomposes into short exact sequences,
so that one may build up the cohomology as

$$H_G^*(M) \cong H_G^*(\mu^{-1}(0)) \oplus \bigoplus_{\beta \neq 0} H_G^*(S_\beta)$$

with the appropriate shifts in degree). Here, the
stratification by $S_\beta$ has a partial order $>$;
thus, one may define an open dense set
$U_\beta = M - \bigcup_{\gamma > \beta} S_\gamma$, which includes the open dense stratum $S_0$ of
points that flow into $\mu^{-1}(0)$ (note $S_0$ retracts onto
$\mu^{-1}(0)$). The equivariant Thom–Gysin sequence is

$$\cdots \to H_G^{n - 2d(\beta)}(S_\beta) \xrightarrow{(i_\beta)_*} H_G^n(U_\beta) \to H_G^n(U_\beta - S_\beta) \to \cdots$$

where $2d(\beta)$ is the real codimension of $S_\beta$.
To show that the Thom–Gysin sequence splits into
short exact sequences, it suffices to know that the
maps $(i_\beta)_*$ are injective. Since $i_\beta^*(i_\beta)_*$ is multiplication
by the equivariant Euler class $e_\beta$ of the normal
bundle to $S_\beta$, injectivity follows because this
equivariant Euler class is not a zero divisor (see
Kirwan (1984a, 5.4) for the proof).

Because $\kappa$ is a surjective ring homomorphism, it
follows that

$$H^*(M_{\mathrm{red}}) \cong H_G^*(M)/\mathrm{Ker}(\kappa)$$

The above theorem is also valid when $G$ is the
complexification of a compact semisimple Lie
group. In this case, one must reduce at 0 (because
of the condition that the moment map is equivariant,
since 0 is the only value which is invariant
under the coadjoint action). The case of reducing at
coadjoint orbits can be treated using the proof for
the case of reducing at 0 via the shifting trick
(Example 8).

Several recent articles (Jeffrey and Kirwan (1995,
1997), Tolman and Weitsman 2003) compute
$\mathrm{Ker}(\kappa)$. Some articles compute $\mathrm{Ker}(\kappa)$ in specific
examples, notably the action of $S^1$ on products of
two-dimensional spheres of general radii.


The Residue Formula 


One approach to identifying $\mathrm{Ker}(\kappa)$ is the “residue
formula” of Jeffrey and Kirwan (1995, theorem 8.1):

Theorem 4 (Jeffrey and Kirwan (1995), corrected
as in Jeffrey and Kirwan (1997)).
Let $\eta \in H_G^*(M)$ induce $\eta_0 \in H^*(M_{\mathrm{red}})$. Then we
have

$$\int_{M_{\mathrm{red}}} \kappa(\eta) \, e^{i\omega_0} = n_0 C^G \, \mathrm{Res}\Big( D(X)^2 \sum_{F \in \mathcal{F}} h_F^{\eta}(X) \, [dX] \Big) \qquad [5]$$

where $n_0$ is the order of the stabilizer in $G$ of a
generic element of $\mu^{-1}(0)$, and the constant $C^G$ is
defined by

$$C^G = \frac{(-1)^{s + n_+}}{|W| \, \mathrm{vol}(T)} \qquad [6]$$

We have introduced $s = \dim G$ and $l = \dim T$; here
$n_+ = (s - l)/2$ is the number of positive roots. Also,
$\mathcal{F}$ denotes the set of components of the fixed-point
set of $T$, and if $F$ is one of these components, then
the meromorphic function $h_F^{\eta}$ on $\mathfrak{t} \otimes \mathbb{C}$ is defined by

$$h_F^{\eta}(X) = e^{i\mu(F)(X)} \int_F \frac{i_F^*(\eta(X)) \, e^{i\omega}}{e_F(X)} \qquad [7]$$

(where $e_F$ is the equivariant Euler class of the normal
bundle to $F$) and the polynomial $D : \mathfrak{t} \to \mathbb{R}$ is defined by
$D(X) = \prod_{\gamma > 0} \gamma(X)$, where $\gamma$ runs over the positive roots of $G$.


The residue map Res is defined on (a subspace of)
the meromorphic differential forms on $\mathfrak{t} \otimes \mathbb{C}$: its
definition depends on some choices, but the sum of
the residues over all $F \in \mathcal{F}$ is independent of these
choices. When $T = U(1)$, we define the residue on
meromorphic functions of the form $e^{i\lambda X}/X^N$ when
$\lambda \neq 0$ (for $N \in \mathbb{Z}$) by

$$\mathrm{Res}\, \frac{e^{i\lambda X}}{X^N} = \begin{cases} \mathrm{Res}_{X=0} \, \dfrac{e^{i\lambda X}}{X^N}, & \text{if } \lambda > 0 \\ 0, & \text{if } \lambda < 0 \end{cases}$$

More generally, the residue is specified by certain
axioms (see Jeffrey and Kirwan (1995, proposition
8.11)), and may be defined as a sum of iterated
multivariable residues $\mathrm{Res}_{X_1 = \lambda_1} \cdots \mathrm{Res}_{X_l = \lambda_l}$ for a
suitably chosen basis of $\mathfrak{t}$ yielding coordinates
$X_1, \ldots, X_l$ (see Jeffrey and Kirwan (1997)).
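
The one-variable residue just described is easy to evaluate explicitly, since the coefficient of $1/X$ in the Laurent expansion of $e^{i\lambda X}/X^N$ at $X = 0$ is $(i\lambda)^{N-1}/(N-1)!$ for $N \geq 1$ and zero otherwise. The sketch below assumes the convention that the residue equals $\mathrm{Res}_{X=0}$ when $\lambda > 0$ and vanishes when $\lambda < 0$; the function names are hypothetical and the code is an illustration, not the construction of Jeffrey and Kirwan in higher rank.

```python
import math

def residue_at_zero(lam, N):
    """Ordinary residue at X = 0 of exp(i*lam*X) / X**N.

    Expanding the exponential, the coefficient of 1/X is
    (i*lam)**(N-1) / (N-1)! when N >= 1, and 0 when N <= 0.
    """
    if N <= 0:
        return 0j
    return (1j * lam) ** (N - 1) / math.factorial(N - 1)

def one_variable_residue(lam, N):
    """Residue for T = U(1) under the sign convention assumed above (lam != 0)."""
    if lam == 0:
        raise ValueError("the convention is only stated for lam != 0")
    return residue_at_zero(lam, N) if lam > 0 else 0j

if __name__ == "__main__":
    for lam, N in [(2.0, 1), (2.0, 3), (-1.0, 2), (2.0, 0)]:
        print(lam, N, one_variable_residue(lam, N))
```

The dependence on the sign of $\lambda$ is what makes this residue different from the ordinary one: terms whose exponent has negative $\lambda$ are discarded entirely.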


The Tolman-Weitsman Theorem 


The Tolman and Weitsman (2002) theorem is as
follows:


Theorem 5 We have

$$\mathrm{Ker}(\kappa) = \sum_{S} \big( K_+^S \oplus K_-^S \big) \qquad [8]$$

Here, $S$ is a generic circle subgroup of $T$ and $K_+^S$
(resp. $K_-^S$) denote the set of all equivariant cohomology
classes $\eta$ whose restriction to $\mathcal{F}_+^S$ (resp. $\mathcal{F}_-^S$) is
zero. Here,

$$\mathcal{F}_\pm^S = \{ F \in \mathcal{F} : \pm\mu_S(F) > 0 \}$$

where $\mu_S$ is the component of the moment map in
the direction of the Lie algebra of $S$.


For more information, see Intersection Theory,
Moduli Spaces: An Introduction, and Equivariant
Cohomology and the Cartan Model.


Introduction


In general relativity there are several levels within 
the framework of symplectic reduction of Einstein’s 
equations at which one could attempt to define a 
Hamiltonian for the gravitational dynamics of a 
spatially closed universe. At the most basic unreduced
level, this Hamiltonian is simply a linear
function of the Einstein constraints and thus
vanishes for any solution of the field equations. At
the other extreme, at the deepest fully reduced level,
one effects a transformation to a complete set of new
canonical variables, the so-called “observables,” 
which Poisson-commute with all of the constraints. 
At this level, the relevant Hamiltonian vanishes 
identically since each of the new canonical variables 
is a constant of the motion. 

There is, however, an intermediate level wherein, 
after making a suitable choice of coordinate gauge 
and imposing the constraint equations, one can 
define a nonvanishing Hamiltonian that generates 
the gauge-fixed and constrained evolution equations 
and whose global infimum as a function on the 
relevant reduced phase space has direct topological 
significance. For the large class of manifolds on 
which this Hamiltonian can be defined, it has the 
attractive feature of globally monotonically decaying 
in the direction of cosmological expansion and thus 
evolves in such a way so as to seek and, in certain 
cases at least, to asymptotically attain its infimum 
value in the limit of this expansion. This Hamilto- 
nian provides in these cases a weak Lyapunov 
function for the dynamics that can be used to 
partially control its global behavior. Since under- 
standing the global behavior of solutions to 
Einstein’s equations and its dependence upon the 
spatial topology is one of the central open problems 
in classical general relativity, the mathematical 
properties of this quantity are worthy of study. 


Further information and details regarding the
authors’ work discussed in this article can be found
in Fischer and Moncrief (2000, 2002a, b) and in the
references therein.


Topological Background 


Einstein’s field equations are nonvacuous and
compatible with the introduction of material sources
in $(n + 1)$ dimensions for all $n \geq 2$, the case of most
physical interest being of course $n = 3$. For the field
equations to be deterministic in a classical sense,
that is, for the Cauchy problem to be well-posed, it
is essential that they be formulated on a manifold
that is globally hyperbolic and, in particular, has a
product topology $M \times \mathbb{R}$ (roughly, space $\times$ time =
spacetime) where $M$ is a smooth ($C^\infty$) connected
manifold of dimension $n$ and $\mathbb{R}$ is the real line. For
the case of spatially closed universes of interest here,
$M$ should be closed, that is, compact and without
boundary. To simplify the analysis further, we also
assume that $M$ is oriented, that is, orientable and
an orientation has been chosen. Thus, unless stated
otherwise, throughout this article $M$ will denote
a smooth closed connected oriented $n$-manifold,
$n \geq 2$, and all maps will be smooth.

Let “$\approx$” denote the diffeomorphic equivalence
relation between smooth manifolds. Let $S^n$ denote
the unit $n$-sphere in Euclidean $(n + 1)$-space
$\mathbb{R}^{n+1}$, $n \geq 1$. An $n$-manifold $M$ is trivial if $M \approx S^n$
and nontrivial if $M \not\approx S^n$.

The connected sum $M \# N$ of two closed connected
oriented $n$-manifolds $M$ and $N$ is constructed by
removing the interior of an embedded closed $n$-ball in
$M$ and $N$, respectively, and then identifying the
resulting $S^{n-1}$-boundary components by an
orientation-reversing diffeomorphism of the $(n - 1)$-spheres.
The resulting manifold is smooth, connected, closed,
and orientable, and is naturally oriented by the
orientations on $M$ and $N$. Up to orientation-preserving
diffeomorphism, this construction is independent of
the choice of the embeddings of the $n$-balls and of the
choice of the orientation-reversing diffeomorphism
used to join the manifolds together.

Let $M$ be a nontrivial closed connected oriented
$n$-manifold. Then $M$ is prime if $M \approx M_1 \# M_2$
implies that either $M_1 \approx S^n$ or $M_2 \approx S^n$ (but not
both since we are assuming that $M$ is nontrivial).
$M$ is a composite if $M$ can be written as a nontrivial
connected sum, that is, if $M \approx M_1 \# M_2$ where both
$M_1 \not\approx S^n$ and $M_2 \not\approx S^n$.

Note that with this definition, $S^n$ itself is not
prime. This is analogous to the fact that for the
positive integers, the unit 1 is not prime.

Now let $M$ be a connected $n$-manifold without
boundary (not necessarily compact or orientable) and
let $\pi$ be a group. Then $M$ is a $K(\pi, 1)$-manifold if $M$ is
an Eilenberg–MacLane space, that is, if its first
homotopy group (or fundamental group) $\pi_1(M) \cong \pi$
and if all of its higher homotopy groups are trivial, that
is, $\pi_i(M) = 0$ for $i > 1$ (equivalently, the universal
covering space $\tilde{M}$ of $M$ is contractible). Since the
higher homotopy groups $\pi_i(M)$, $i > 1$, can be interpreted
as the homotopy classes of continuous maps
$S^i \to M$, each such map must be homotopic to a
constant map. Thus a $K(\pi, 1)$-manifold is said to be
aspherical. Moreover, at the level of homotopy, all of
the information about the topology of $M$ is contained
in $\pi_1(M) \cong \pi$. Thus, in particular, if $f$ is a map between
connected aspherical manifolds that induces an
isomorphism on their fundamental groups, then $f$ is a
homotopy equivalence. Consequently, any two connected
aspherical manifolds are homotopy equivalent
if and only if their fundamental groups are isomorphic.

It is useful to define a connected $n$-manifold $M$ to
be hyperbolizable if there exists a complete Riemannian
metric $g$ on $M$ with constant negative sectional
curvature, $K(g) = \text{constant} < 0$. We introduce this
terminology to emphasize the underlying topology
of manifolds that can support hyperbolic metrics
rather than the geometry of such metrics. Similarly,
$M$ is of flat type if $M$ admits a complete flat
Riemannian metric $g$, $K(g) = 0$, and $M$ is of spherical
type if $M$ admits a complete Riemannian metric $g$
with constant positive sectional curvature,
$K(g) = \text{constant} > 0$. In this latter case, by the
Bonnet–Myers theorem, $M$ is necessarily compact and if $n$
is odd, then by Synge’s theorem, $M$ is necessarily
orientable. In fact, all such manifolds have been
classified. As an important example, we note that a
connected 3-manifold $M$ is of spherical type if and only
if it is diffeomorphic to a spherical space form $S^3/\Gamma$,
where $\Gamma$ is a finite subgroup of $SO(4)$ acting freely and
orthogonally, that is, isometrically, on $S^3$.

Within the class of $K(\pi, 1)$-manifolds are all flat-type
and hyperbolizable $n$-manifolds, since any such
manifold is isometrically covered by $\mathbb{R}^n$ in the flat case
and homothetically covered by $H^n$ in the hyperbolic
case, where $H^n$ is the standard single-sheeted spacelike
hyperboloid with constant sectional curvature $K = -1$
embedded in $(n + 1)$-Minkowski space $\mathbb{R}^{n,1}$.

We now return to our standard assumptions on $M$,
so that $M$ is connected, closed, and oriented. For
$n = 2$, these assumptions restrict the possibilities to
$S^2$, $T^2$, and the orientable higher genus surfaces
$\Sigma_p = T^2 \# T^2 \# \cdots \# T^2$ ($p$ factors) consisting of the
connected sum of $p$ copies of $T^2$, $p \geq 2$. However,
from the point of view of $(2 + 1)$ gravity, unless one
includes material sources or a cosmological constant,
the spherical case is vacuous in that there are no
vacuum solutions of the field equations on $S^2 \times \mathbb{R}$.
The torus case is nonvacuous but the solutions, the
so-called flat Kasner spacetimes, can all be found by
elementary means. Thus only the case of genus $p \geq 2$
surfaces presents problems of interest.

For $n = 3$, although not essential for the program of
reduction, it is convenient to assume the elliptization
conjecture of 3-manifold topology. This conjecture
asserts that a closed connected 3-manifold $M$ with
finite fundamental group $\pi_1(M)$ must be diffeomorphic
to a spherical space form $S^3/\Gamma$, where, in
such a quotient, $\Gamma$ will always be a finite subgroup of
$SO(4)$ acting freely and orthogonally on $S^3$ and thus $\Gamma$
is isomorphic to $\pi_1(M)$.

The simply connected case is the Poincaré conjecture.
The full elliptization conjecture is equivalent
to the Poincaré conjecture and a conjecture asserting
that the only free actions of finite groups on $S^3$
are equivalent to the standard orthogonal ones.
The elliptization conjecture is part of Thurston’s
geometrization program (Thurston 1997). For background
information regarding 3-manifold topology,
see Hempel (1976) and Jaco (1980).

Under the assumption of the elliptization conjecture,
the Kneser–Milnor prime decomposition
theorem asserts that if $M$ is nontrivial, then up to
order, $M$ is uniquely diffeomorphic to a finite
connected sum of the following form:

$$M \approx \underbrace{S^3/\Gamma_1 \# \cdots \# S^3/\Gamma_k}_{k \text{ spherical factors}} \; \# \; \underbrace{(S^1 \times S^2) \# \cdots \# (S^1 \times S^2)}_{l \text{ wormholes (or handles)}} \; \# \; \underbrace{K(\pi_1, 1) \# \cdots \# K(\pi_m, 1)}_{m \text{ aspherical factors}} \qquad [1]$$

where $k$, $l$, and $m$ are integers $\geq 0$, $k + l + m \geq 1$,
and if either $k$, $l$, or $m$ is 0, terms of that type do not
appear. Moreover, if $k \geq 1$, then each $\Gamma_i$, $1 \leq i \leq k$,
is a finite nontrivial ($\Gamma_i \neq \{I\}$) subgroup of $SO(4)$
acting freely and orthogonally on $S^3$, and if $m \geq 1$,
then each aspherical factor is a $K(\pi_j, 1)$-manifold,
$1 \leq j \leq m$, and thus is universally covered by a
contractible manifold.

We remark that although in general a contractible
3-manifold need not be $\mathbb{R}^3$, conjecturally the
universal covering manifold of a $K(\pi, 1)$ 3-manifold
is diffeomorphic to $\mathbb{R}^3$.

In 3-manifold topology, a concept closely related 
to that of a prime manifold is that of an irreducible 
manifold. A closed 3-manifold M is irreducible if 
every embedded 2-sphere in M is the boundary of an 
embedded closed 3-ball. 

An embedded 2-sphere that does not bound such a 
3-ball is essential. Thus in the prime decomposition [1] 
above, M is decomposed along essential 2-spheres. For 
this reason, the prime decomposition is sometimes 
referred to as the sphere decomposition. 

With the exception of S³, which is irreducible but 
not prime (by definition of prime), and S¹ × S², 
which is prime but not irreducible, a closed oriented 
3-manifold is prime if and only if it is irreducible. 
We also remark that the Poincaré conjecture, when 
taken in the form that there do not exist any fake 
3-cells, is equivalent to every K(π, 1) 3-manifold 
being irreducible. Thus in this article, since we are 
assuming the elliptization conjecture and hence the 
Poincaré conjecture, every K(π, 1) 3-manifold will 
automatically be irreducible. 

Examples of the kinds of K(π, 1)-factors that can 
occur in the decomposition [1] are as follows (we will 
explain the Seifert and graph designations below): 


1. Non-Seifert manifolds. Closed oriented hyperbolizable 
manifolds diffeomorphic to H³/Γ, where Γ is a 
discrete torsion-free (i.e., no nontrivial element has 
finite order) co-compact subgroup of the Lie group 
Isom⁺(H³) of orientation-preserving isometries of 
H³, which is Lie-group isomorphic to the proper 
orthochronous Lorentz group SO↑(1, 3). 

2. Seifert manifolds. T³ and five other 3-manifolds of 
flat type which are finitely covered by T³. Noting 
that Σ₁ = T², we remark that the product manifold 
S¹ × Σ₁ = S¹ × T² = T³ is included in this class. 

3. Seifert manifolds. Product manifolds S¹ × Σ_p, p ≥ 2. 

4. Seifert manifolds. Nontrivial circle bundles over 
Σ_p, p ≥ 1. 

5. Graph manifolds. Any 3-manifold which fibers 
nontrivially over a circle with fiber Σ_p, p ≥ 1. Any 
such manifold is obtained by identifying the 
boundary components of [0, 1] × Σ_p with an 
orientation-reversing diffeomorphism of Σ_p. 


Since the handle S¹ × S² and spherical manifolds 
S³/Γ are well understood, under the assumption of 
the elliptization conjecture the task of 3-manifold 
topology now reduces to understanding the topology 
of the (automatically irreducible) K(π, 1)-factors that 
can occur in the prime decomposition [1]. Since 
essential 2-spheres have already been used to 
decompose M into its prime components, the idea 
now is to use the next simplest 2-manifold, the 
2-torus, to probe the irreducible K(π, 1)-factors. 

Let i: T² → M be an embedding of T² into a 
closed oriented 3-manifold M. Then the embedded 
torus i(T²), identified with T², is incompressible if 
the induced mapping of fundamental groups 
i_*: π₁(T²) → π₁(M) is injective. Thus noncontractible 
loops in T² remain noncontractible when T² is 
embedded in M, or, in other words, the ambient 
manifold M does not fill in any homotopy hole that 
exists in T² when standing alone. 

A closed oriented 3-manifold M is a Seifert-fibered 
space, or a Seifert manifold, if M admits a 
foliation by circles. For example, if S¹ acts freely on 
M, then M is the total space of an S¹-bundle over a 
surface M/S¹ and M is a Seifert-fibered space (see 
examples 2, 3, and 4 above). More generally, if S¹ 
acts without fixed points (locally free), then M is a 
Seifert-fibered space, and in either case the fibers of 
M are the orbits of the S¹-action. 

All spherical 3-manifolds are Seifert fibered with 
base S². Also, the product manifold S¹ × S² is Seifert 
fibered, as are all manifolds finitely covered by T³, and 
thus all 3-manifolds of flat type are Seifert fibered. 
The only nontrivial connected sum that is a Seifert-fibered 
space is P³ # P³. No hyperbolizable manifold 
is Seifert fibered. Thus the remaining Seifert 
manifolds are among the nonhyperbolizable nonflat 
type K(π, 1)-manifolds (i.e., those for which M does 
not admit either a hyperbolic or a flat Riemannian 
metric). 

A generalization of Seifert-fibered spaces is given by the 
graph manifolds. A closed oriented 3-manifold M is a 
graph manifold if there exists a finite collection {T_i²} of 
disjoint embedded incompressible tori T_i² ⊂ M such 
that each component M_j of M \ ∪ T_i² is a Seifert-fibered 
space. Thus a graph manifold is a union of Seifert-fibered 
spaces glued together by toral automorphisms 
along toral boundary components. The collection of 
tori may be empty so that, in particular, a Seifert-fibered 
manifold is a graph manifold. 

We remark that the manifolds described by example 
5 above are graph manifolds. We also remark that 
graph manifolds are closed under connected sums so 
that a graph manifold may be a composite. This 
contrasts with the situation for Seifert spaces which, 
with the exception of P³ # P³, are not composites. 

Conjecturally, the most general K(π, 1)-manifold, 
not included in the list above, is obtained by “gluing 
together” across disjoint embedded incompressible 
tori a finite collection of finite-volume-type hyperbolizable 
manifolds, that is, noncompact manifolds that 
admit a finite-volume complete hyperbolic metric, 
together with a possibly empty finite collection of 
irreducible graph manifolds with toral boundaries. 
Thus, overall, in this picture, to decompose an 
arbitrary closed oriented 3-manifold M into its 
elementary constituents, one first cuts along essential 
2-spheres to break M down into its prime factors, that 
is, the nontrivial spherical S³/Γ-factors, the wormhole 
(S¹ × S²)-factors, and the aspherical K(π, 1)-factors, as 
given by [1]. Then one cuts each nonelementary 
K(π, 1)-factor along incompressible tori to separate 
these factors into their final finite-volume-type 
hyperbolizable and irreducible graph manifold components. 
The graph manifold components can then be further 
broken down along incompressible tori into Seifert-fibered 
pieces, finally yielding the toral decomposition 
of Jaco, Shalen, and Johannson (see Anderson (1997), 
Jaco (1980), and the end of the section “The Reduced 
Hamiltonian” for further details). 

The Thurston (1997) geometrization program, 
which implies that every closed oriented 3-manifold 
has the structure described by the above prime (or 
spherical) and toroidal decomposition, has been the 
subject of recent work by G Perelman (see Anderson 
(2003) and the references therein) who has argued 
that it can be proved by an enhancement of the Ricci 
flow program of R Hamilton (see the collected 
papers edited by Cao et al. (2003)). Without 
entering into the technical issues surrounding the 
completeness of Perelman’s proof, one can simply 
limit one’s attention to 3-manifolds of the above type. 
If geometrization is correct, then no 3-manifolds of 
interest have been excluded. 

Returning to the general case of n-manifolds, in the 
program of Hamiltonian reduction of Einstein’s 
equations, an important consideration is under what 
topological conditions on M the conformal classes 
of M can be uniquely represented by a given metric in each 
class. To analyze that question, we introduce the 
concept of the Yamabe type of a manifold. 

Let M be a connected closed oriented n-manifold, 
n ≥ 3. There is no topological obstruction to the 
existence of Riemannian metrics with constant negative 
scalar curvature, so all such manifolds admit a 
Riemannian metric g such that R(g) = −1. However, 
there are topological obstructions for zero scalar 
curvature and positive constant scalar curvature 
metrics on M. To help categorize these topological 
obstructions, we introduce the following terminology: 


1. M is of positive Yamabe type if M admits a 
Riemannian metric g₁ with scalar curvature 
R(g₁) = 1; 

2. M is of zero Yamabe type if M admits a 
Riemannian metric g₀ with R(g₀) = 0, but no 
Riemannian metric g with R(g) = 1; and 

3. M is of negative Yamabe type if M admits no 
Riemannian metric g with R(g) = 0. 


The definition of Yamabe type partitions the class 
of connected closed oriented n-manifolds, n ≥ 3, 
into three classes that are mutually exclusive and 
exhaustive. The following rather complete topologi- 
cal information regarding 3-manifolds of negative 
Yamabe type is known. 

Let M be a connected closed oriented 3-manifold. 
Assume that the Poincaré conjecture is true. Then M 
is of negative Yamabe type if and only if M satisfies 
one of the following three mutually exclusive 
conditions: 


1. M is hyperbolizable (and thus is a K(π, 1)-manifold; 
see example 1 of K(π, 1)-manifolds); 

2. M is a nonhyperbolizable nonflat type K(π, 1)-manifold 
(see examples 3, 4, and 5 of K(π, 1)-manifolds); 

3. M has a nontrivial connected sum decomposition 
(i.e., M is a composite) in which at least one factor 
is a K(π, 1)-manifold; that is, M ≅ M′ # K(π, 1), 
where M′ ≉ S³. In this case the K(π, 1)-factor may 
be either of flat type or hyperbolizable. 


We remark that (1) is the vast class of closed 
oriented hyperbolizable 3-manifolds. We also 
remark that the six closed orientable 3-manifolds 
of flat type, although K(π, 1)-manifolds, are 
excluded from (2) as they are not of negative 
Yamabe type (they are of zero Yamabe type). Lastly 
we remark that if M is of negative Yamabe type and 
Seifert fibered, then M must be of type (2) (see 
remarks on Seifert-fibered spaces above). 

In any dimension n ≥ 3, a manifold M of negative 
Yamabe type has the property that it admits no 
Riemannian metric g having scalar curvature R(g) ≥ 0 
everywhere on M, or, in other words, every Riemannian 
metric on M has scalar curvature which is negative 
somewhere. For such a manifold M, Yamabe’s theorem 
asserts that each Riemannian metric g on M is uniquely 
globally conformal to a metric γ with scalar curvature 
R(γ) = −1 (see also [21]). Thus one can represent 
the conformal classes of Riemannian metrics on M 
in a suitable function space setting by an 
infinite-dimensional submanifold 

\[ \mathcal{M}_{-1} = \mathcal{M}_{-1}(M) = \{\gamma \in \mathcal{M} \mid R(\gamma) = -1\} \qquad [2] \]

of the space ℳ = ℳ(M) = Riem(M) of Riemannian 
metrics on M (see Fischer and Marsden (1975) for 
details). For this reason, we refer to metrics γ in 
ℳ₋₁ as conformal metrics. 


The quotient of ℳ₋₁ by the natural action of 
𝒟₀ = 𝒟₀(M) = Diff₀(M), the connected component 
of the identity of the diffeomorphism group 
𝒟 = 𝒟(M) = Diff(M) of M, defines an orbit space 
(not necessarily a manifold) 𝒯 = 𝒯(M), 

\[ \mathcal{T} = \frac{\mathcal{M}_{-1}}{\mathcal{D}_0} \qquad [3] \]

which, when M is of negative Yamabe type, we 
define as the Teichmüller space of conformal 
structures on M. 

In two dimensions in the case of a higher genus 
manifold Σ_p, p ≥ 2, this construction leads precisely 
to the conventional Teichmüller space, as discussed 
by Fischer and Tromba (1984). In this case the 
resulting Teichmüller space 

\[ \mathcal{T}_p = \mathcal{T}(\Sigma_p) = \frac{\mathcal{M}_{-1}(\Sigma_p)}{\mathcal{D}_0(\Sigma_p)} \approx \mathbb{R}^{6p-6} \qquad [4] \]

is then a manifold diffeomorphic to ℝ^{6p−6}, which 
then plays the role of the natural reduced configuration 
space for the Einstein equations in (2 + 1) 
dimensions. Moreover, these constructions can be 
carried out globally using known global cross 
sections for the 𝒟₀(Σ_p) action on ℳ₋₁(Σ_p). These 
global cross sections can then be used to provide an 
explicit model for the Teichmüller space 𝒯_p as a 
finite-dimensional subspace of ℳ₋₁(Σ_p). 

For n = 3, 𝒯 = 𝒯(M) plays the analogous role for 
the reduced field equations in (3 + 1) dimensions. 
Moreover, for many 3-manifolds it is possible to 
show that 𝒯 is itself an infinite-dimensional 
contractible manifold, rather than something more 
general such as an orbifold or a stratified union of 
manifolds. For technical simplicity, we shall assume 
throughout this article that 𝒯 is a manifold. Our 
results remain valid in the more general case but in 
that case one must work on stratified spaces (see 
Fischer (1970) for results on the structure of orbit 
spaces when they are not manifolds). 

For higher-dimensional manifolds there is no 
analog of the Thurston geometrization program. 
Indeed, it is known that the set of closed n-manifolds 
for n ≥ 4 is so rich that no purely 
algebraic classification is possible. Nevertheless, for 
manifolds of negative Yamabe type, every Riemannian 
metric g is still uniquely conformal to a metric 
γ ∈ ℳ₋₁ so that the orbit space 𝒯 = ℳ₋₁/𝒟₀ still 
represents the Teichmüller space of conformal 
equivalence classes on M. However, in these 
higher-dimensional cases, very little is known 
about the structure of 𝒯. 
The Field Equations 


Relative to a global time coordinate t = x⁰ and local 
spatial coordinates (x¹, …, xⁿ) on a connected 
closed oriented n-manifold M, one can express the 
line element of an arbitrary (n + 1)-Lorentzian 
metric with signature (− + ⋯ +) (n positive signs) 
in the form 

\[ ds^2 = {}^{(n+1)}g_{\mu\nu}\,dx^\mu\,dx^\nu = -N^2\,dt^2 + g_{ij}\,(dx^i + X^i\,dt)(dx^j + X^j\,dt) \qquad [5] \]

where ⁽ⁿ⁺¹⁾g_μν denotes the components of the spacetime 
metric, 0 ≤ μ, ν ≤ n, where the Riemannian 
metric g with components g_ij is the first fundamental 
form induced on each t = constant hypersurface, 
where the time-dependent positive function 
N = N(x, t) > 0 is referred to as the lapse function, 
and where the time-dependent spatial vector field 
X = X(x, t) with components X^i = ⁽ⁿ⁺¹⁾g_0j g^ij, where 
g^ij denotes the inverse of the spatial metric g_ij, is 
referred to as the shift vector field. 
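
As a quick consistency check, expanding the second form of the line element [5] gives the components of the spacetime metric in terms of the lapse, shift, and spatial metric. The sketch below is a routine expansion; the lowered shift X_i = g_ij X^j is the only notation added here for illustration.

```latex
\begin{align*}
ds^2 &= -N^2\,dt^2 + g_{ij}\,(dx^i + X^i\,dt)(dx^j + X^j\,dt) \\
     &= -\bigl(N^2 - g_{ij}X^iX^j\bigr)\,dt^2
        + 2\,g_{ij}X^i\,dt\,dx^j + g_{ij}\,dx^i\,dx^j ,
\end{align*}
so that
\begin{align*}
{}^{(n+1)}g_{00} &= -N^2 + X_iX^i, &
{}^{(n+1)}g_{0j} &= g_{ij}X^i = X_j, &
{}^{(n+1)}g_{ij} &= g_{ij}.
\end{align*}
```

Contracting the middle relation with the inverse spatial metric recovers the expression for the shift components quoted above.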

Let ℓ denote the dimension of length. In this article 
we use the convention that the spatial coordinates 
(x¹, …, xⁿ) are always dimensionless, but the time 
coordinate t may have a dimension (see [19] and 
[36]). Since the line element ds² [5] has dimension ℓ² 
and the spatial coordinates are dimensionless, the 
physical spatial metric coefficients g_ij also have 
dimension ℓ². If the time coordinate t has a 
dimension, then the dimension of the lapse function 
N is such that the quantity N dt has dimension ℓ and 
the dimension of the shift vector field X is such that 
the quantity X dt is dimensionless. 

We now briefly consider the canonical formula- 
tion of Einstein’s equations. For more information 
regarding this formulation, see Arnowitt, Deser, and 
Misner (1962) (ADM) or Fischer and Marsden 
(1972) for a global perspective. We remark that 
the canonical formulation of gravity itself is local 
and is valid for any spatial topology of M. However, 
as we shall see, Hamiltonian reduction of gravity 
along the lines described in this article requires the 
topological restriction that M be of negative 
Yamabe type. 

The standard definition of the second fundamental 
form k, or extrinsic curvature, induced on a 
t = constant hypersurface leads to the coordinate 
formula 

\[ k_{ij} = -\frac{1}{2N}\left(\frac{\partial g_{ij}}{\partial t} - X_{i|j} - X_{j|i}\right) \qquad [6] \]

where the vertical bar signifies covariant differentiation 
with respect to the spatial metric g and spatial 
indices are raised and lowered using this metric. The 
natural momentum variable conjugate to g turns out 
to be the 2-contravariant symmetric tensor density π 
(that is, π is a relative tensor of weight 1) whose 
components in a positively oriented local coordinate 
chart (x¹, …, xⁿ), that is, in a chart in the orientation 
atlas of M, are given by 


\[ \pi^{ij} = -\sqrt{\det g_{kl}}\,\bigl(k^{ij} - (\operatorname{tr}_g k)\,g^{ij}\bigr) \qquad [7] \]

where k^ij = g^ik g^jl k_kl is the contravariant form of k, 
and where 

\[ \tau = \tau(g, k) = \operatorname{tr}_g k = g^{ij}k_{ij} \qquad [8] \]

is the trace of the second fundamental form, or the 
mean (extrinsic) curvature. From the coordinate 
formula [6] for the extrinsic curvature, we see that 
the components k_ij have dimension ℓ²/ℓ = ℓ and 
thus the mean curvature τ = tr_g k = g^ij k_ij has the 
dimension ℓ⁻²ℓ = ℓ⁻¹. 
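
The dimension counting used here can be collected in one place. The sketch below writes [Q] for the dimension of a quantity Q (a notation introduced only for this illustration) and uses the relation ∂g_ij/∂t = −2N k_ij + X_{i|j} + X_{j|i} implied by [11] and [12].

```latex
\begin{align*}
[ds^2] &= \ell^2, \quad [x^i] = 1
  &&\Longrightarrow\quad [g_{ij}] = \ell^2, \quad [g^{ij}] = \ell^{-2}, \\
[N\,dt] &= \ell
  &&\Longrightarrow\quad [k_{ij}] = \frac{[g_{ij}]}{[N\,dt]} = \frac{\ell^2}{\ell} = \ell, \\
\tau &= g^{ij}k_{ij}
  &&\Longrightarrow\quad [\tau] = \ell^{-2}\,\ell = \ell^{-1}.
\end{align*}
```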

Let √det g denote the (global) scalar density and 
dμ_g denote the (global) Riemannian measure on 
M determined by the Riemannian metric g (note 
that here d is not the exterior derivative). Similarly, 
let μ_g denote the volume element, a nonvanishing 
n-form on M, determined by g and the orientation 
on M. In a positively oriented local coordinate chart 
(x^i) = (x¹, …, xⁿ) on M, (√det g)_(x^i) = √det g_ij, 
(dμ_g)_(x^i) = √det g_ij dx¹ dx² ⋯ dxⁿ = √det g_ij dⁿx, where 
dⁿx = dx¹ dx² ⋯ dxⁿ is the Lebesgue measure in ℝⁿ, 
and (μ_g)_(x^i) = √det g_ij dx¹ ∧ dx² ∧ ⋯ ∧ dxⁿ. We adopt 
the convention of suppressing the coordinate-chart 
designation (x^i) so that one can, for example, write 
with some ambiguity √det g = (√det g)_(x^i) = √det g_ij. 


We let 

\[ \operatorname{vol}(M, g) = \int_M \mu_g = \int_M d\mu_g = \int_M \sqrt{\det g}\;d^n x \qquad [9] \]

denote the volume of the Riemannian manifold 
(M, g), given by either the integral of the volume 
n-form μ_g or the Riemannian measure dμ_g over M, 
which is given in the last integral in its coordinate 
form using the suppressed coordinate-chart convention 
adopted above. As expected, the spatial 
physical volume has dimension (ℓ²)^{n/2} = ℓⁿ. 

We shall refer to the canonical variables (g_ij, π^ij) as 
the physical variables, in contrast to the reduced or 
conformal variables (γ_ij, (pᵀᵀ)^ij) to be introduced 
later. 

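Note that the mean curvature τ = tr_g k is a scalar 
function on M whereas tr_g π is a scalar density on M. 
Taking the trace of [7] expresses the mean curvature 
in terms of the canonical variables (g, π), 

\[ \tau = \frac{1}{(n-1)\sqrt{\det g}}\,\operatorname{tr}_g\pi = \frac{1}{(n-1)\sqrt{\det g}}\,g_{ij}\pi^{ij} \qquad [10] \]

Using [10], eqn [7] can be inverted to give k in terms 
of g and π, 

\[ k_{ij} = -\frac{1}{\sqrt{\det g}}\left(\pi_{ij} - \frac{1}{n-1}\,g_{ij}\,(\operatorname{tr}_g\pi)\right) \qquad [11] \]

and then combined with [6] to give the kinematical 
equation 

\[ \frac{\partial g_{ij}}{\partial t} = \frac{2N}{\sqrt{\det g}}\left(\pi_{ij} - \frac{1}{n-1}\,g_{ij}\,(\operatorname{tr}_g\pi)\right) + X_{i|j} + X_{j|i} \qquad [12] \]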
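
The inversion leading to [10] and [11] is a short algebraic step. A sketch, using only the definition [7] of the momentum, the definition [8] of the mean curvature, and the identity g_ij g^ij = n:

```latex
\begin{align*}
\operatorname{tr}_g \pi = g_{ij}\pi^{ij}
  &= -\sqrt{\det g}\,\bigl(g_{ij}k^{ij} - \tau\, g_{ij}g^{ij}\bigr)
   = -\sqrt{\det g}\,(\tau - n\tau)
   = (n-1)\sqrt{\det g}\;\tau , \\
k^{ij} &= -\frac{\pi^{ij}}{\sqrt{\det g}} + \tau\, g^{ij}
        = -\frac{1}{\sqrt{\det g}}
          \Bigl(\pi^{ij} - \frac{1}{n-1}\,(\operatorname{tr}_g\pi)\,g^{ij}\Bigr).
\end{align*}
```

The first line is [10] after division by (n − 1)√det g, and lowering indices in the second line gives [11].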


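In terms of the canonical variables (g, π), a 
Hamiltonian form for the action for Einstein’s 
vacuum field equations can be expressed as 

\[ I_{\mathrm{ADM}}(g, \pi) = \int_I dt \int_M \left(\pi^{ij}\,\frac{\partial g_{ij}}{\partial t} - N\,\mathcal{H}(g, \pi) - X^i\,\mathcal{J}_i(g, \pi)\right) d^n x \qquad [13] \]

where I = [t₀, t₁] ⊂ ℝ is a closed interval and where 
the Hamiltonian (scalar) density ℋ(g, π) and the 
momentum (1-form) density 𝒥(g, π) are given by 

\[ \mathcal{H}(g, \pi) = \frac{1}{\sqrt{\det g}}\left(\pi\cdot\pi - \frac{1}{n-1}\,(\operatorname{tr}_g\pi)^2\right) - \sqrt{\det g}\,R(g) \qquad [14] \]

\[ \phantom{\mathcal{H}(g, \pi)} = \frac{1}{\sqrt{\det g}}\left(g_{ik}g_{jl}\pi^{ij}\pi^{kl} - \frac{1}{n-1}\,(g_{ij}\pi^{ij})^2\right) - \sqrt{\det g}\,R(g) \qquad [15] \]

\[ \mathcal{J}_i(g, \pi) = 2(\delta_g\pi)_i = -2\,g_{ij}\,\pi^{jk}{}_{|k} \qquad [16] \]

where π · π is the g-metric contraction of π with itself, 
and where, as above, R(g) is the scalar curvature of 
the spatial metric. We also note that each of the three 
terms in the integrand of [13] is a global scalar 
density and thus can be integrated over M without 
any further involvement of the metric g. 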

Variation of I_ADM with respect to the lapse 
function and shift vector field yields the constraint 
equations 

\[ \mathcal{H}(g, \pi) = 0 \qquad [17] \]

\[ \mathcal{J}_i(g, \pi) = 0 \qquad [18] \]

which comprise that subset of the empty space 
(n + 1)-Einstein field equations corresponding to the 
normal-normal and normal-tangential projections 
of the Einstein tensor relative to a t = constant initial 
hypersurface. Variation of I_ADM with respect to π^ij 
reproduces the kinematical equation [12], whereas 
variation of I_ADM with respect to g_ij generates the 
complementary tangential-tangential projections of 
Einstein’s equations. 

There are no evolution or constraint equations for 
either the lapse function N or the shift vector field X 
and therefore these quantities must be fixed by either 
externally imposed or implicitly defined gauge conditions. 
A convenient choice, for which a local existence 
and well-posedness theorem for the corresponding 
field equations can be established in any dimension 
n ≥ 2, is given indirectly by imposing constancy of the 
mean curvature and a spatial harmonic gauge condition 
on each t = constant slice (see Andersson and 
Moncrief (2003, 2004)). These constant mean curvature 
spatial harmonic (CMCSH) gauge conditions are 
given, respectively, by the equations 


\[ t = \tau \qquad [19] \]


\[ g^{ij}\bigl(\Gamma^k_{ij}(g) - \Gamma^k_{ij}(\hat g)\bigr) = 0 \qquad [20] \]


where from [10], τ is a function of the canonical 
variables (g, π) and where ĝ is some convenient fixed 
spatial reference metric (or background metric) on 
M. The latter condition corresponds to the requirement 
that the identity map between the Riemannian 
manifolds (M, g) and (M, ĝ) be harmonic. Neither of 
these conditions involves the lapse function or shift 
vector field directly, but their preservation in time, 
implemented by the demand that the time derivatives 
of the given conditions be enforced, leads 
immediately to a linear elliptic system for (N, X^i) 
which determines these variables. The foregoing 
formalism is easily extended to the nonvacuum 
field equations in the presence of suitable material 
sources whose field equations are amenable to a 
constrained Hamiltonian treatment. To simplify the 
analysis, such sources will be ignored in the present 
discussion. 

For the special case of Einstein gravity in (2 + 1) 
dimensions, there is an elegant, alternative, triad-based 
formulation of the action functional as an 
Isom(ℝ³₁)-invariant gauge-theoretic Chern-Simons 
action, where Isom(ℝ³₁) denotes the full isometry 
group, or the Poincaré group (= the inhomogeneous 
Lorentz group), of (2 + 1)-Minkowski space ℝ³₁. For 
nondegenerate triads the resulting field equations for 
this alternative formulation can easily be shown to 
be equivalent to those of the conventional formalism 
when the latter is re-expressed in terms of triads, but 
the new formulation allows for meaningful field 
equations in the case of degenerate triads as well 
and thus suggests a potentially interesting generalization 
of the theory (see Carlip (1998) for details). 

In any dimension n ≥ 2, there is a well-known 
technique, pioneered by Lichnerowicz (1955), for 
solving the constraint equations on a constant mean 
curvature (CMC) hypersurface (see Choquet-Bruhat 
and York (1980) and Isenberg (1995)). Of major 
importance for the treatment of Hamiltonian reduction 
is that if n = 2 and M = Σ_p, p ≥ 2, or if n ≥ 3 
and M is of negative Yamabe type, then every 
Riemannian metric g on M is uniquely globally 
pointwise conformal to a metric γ which satisfies 
R(γ) = −1 (see remark above [2]). Thus, from now 
on, we assume this topological condition on M. In 
this case, every Riemannian metric g on M can be 
uniquely expressed as 

\[ g_{ij} = \begin{cases} e^{2\varphi}\,\gamma_{ij} & \text{if } n = 2 \text{ and } M = \Sigma_p,\ p \ge 2 \\ \varphi^{4/(n-2)}\,\gamma_{ij} & \text{if } n \ge 3 \text{ and } M \text{ is of negative Yamabe type} \end{cases} \qquad [21] \]

with the conformal metric γ normalized so that 
R(γ) = −1 and with the specific form of the 
conformal factor being chosen to simplify 
calculations involving the curvature tensors. In 
the case n ≥ 3, φ is positive and thus the space of 
all Riemannian metrics on M is parametrized by 
ℳ₋₁ and the space of scalar functions φ > 0 on M. 
The function φ is then determined by solving the 
Hamiltonian constraint [17] (see also the remark 
before [33]). 

In the given CMC slicing and imposing the 
vacuum field equations, since by the momentum 
constraint π must have zero divergence (see [16] 
and [18]), one finds that π^ij must be expressible in 
the form 

\[ \pi^{ij} = (\pi^{TT})^{ij} + \frac{1}{n}\,(\operatorname{tr}_g\pi)\,g^{ij} \qquad [22] \]

where πᵀᵀ is transverse (i.e., divergence-free) and 
traceless with respect to g. In the nonvacuum case, π^ij 
picks up an additional summand determined by the 
sources in the modified momentum constraint [18]. 

Substitution of the foregoing decompositions of 
(g_ij, π^ij) into the Hamiltonian constraint leads to a 
nonlinear elliptic equation for φ which, under the 
conditions assumed here, determines this function 
uniquely, provided τ ≠ 0. No solutions exist for 
τ = 0 (equivalently, tr_g π = 0) since from [14], [17], 
and [22], the Hamiltonian constraint would then 
immediately imply that 

\[ R(g) = \frac{1}{\det g}\,\pi^{TT}\cdot\pi^{TT} \ge 0 \qquad [23] \]

everywhere on M, which is not possible for a 
manifold M of negative Yamabe type. Instantaneous 
vanishing of the mean curvature, the defining 
property of a maximal hypersurface, would correspond 
to a moment at which an expanding universe 
ceases to expand or a collapsing universe ceases to 
collapse. From [23], such behavior is topologically 
excluded here by the requirement that M be of negative 
Yamabe type (see also the discussion after [36]). 
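
To see why τ = 0 is excluded, one can substitute τ = 0 directly into the Hamiltonian constraint. A sketch of the intermediate steps, using the trace relation [10], the Hamiltonian density [14], the constraint [17], and the decomposition [22]:

```latex
\begin{align*}
\tau = 0 \;&\Longrightarrow\;
  \operatorname{tr}_g\pi = (n-1)\sqrt{\det g}\,\tau = 0
  \;\Longrightarrow\; \pi = \pi^{TT}, \\
0 = \mathcal{H}(g,\pi)
  &= \frac{1}{\sqrt{\det g}}\,\pi^{TT}\!\cdot\pi^{TT} - \sqrt{\det g}\,R(g)
  \;\Longrightarrow\;
  R(g) = \frac{1}{\det g}\,\pi^{TT}\!\cdot\pi^{TT} \;\ge\; 0 .
\end{align*}
```

A metric with nonnegative scalar curvature everywhere cannot exist on a manifold of negative Yamabe type, which is the contradiction used in the text.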

In the unreduced formalism of I_ADM, the role of a 
super-Hamiltonian is played by the functional 

\[ H_{\mathrm{super}}(g, \pi) = \int_M \bigl(N\,\mathcal{H}(g, \pi) + X^i\,\mathcal{J}_i(g, \pi)\bigr)\,d^n x \qquad [24] \]

which evidently vanishes whenever the constraints 
are satisfied. To achieve a fully reduced formulation 
wherein again the effective Hamiltonian would 
vanish, one could endeavor to solve the associated 
Hamilton-Jacobi equations 

\[ \mathcal{H}(g_{ij}, \delta S/\delta g_{ij}) = 0 \qquad [25] \]

\[ \mathcal{J}_i(g_{ij}, \delta S/\delta g_{ij}) = 0 \qquad [26] \]

for a real-valued functional S = S(g, α^A) of the metric 
g and a set of additional independent parameters α^A. 
A complete solution S(g_ij, α^A) would be one for 
which an arbitrary solution (g_ij, π^ij) of the constraints 
could be realized as (g_ij, δS/δg_ij) for a 
suitable (unique) choice of the α^A. A complementary 
set of reduced canonical variables β_A (the momenta 
conjugate to the α^A’s) could then be defined by 
β_A = δS/δα^A and one could in principle solve the 
equations 

\[ \pi^{ij} = \frac{\delta S}{\delta g_{ij}} \qquad [27] \]

\[ \beta_A = \frac{\delta S}{\delta \alpha^A} \qquad [28] \]

for (α^A, β_A) as functionals of the canonical variables 
(g_ij, π^ij). This procedure, if it could be carried out, 
would ensure that these functionals (α^A(g, π), 
β_A(g, π)) Poisson-commute with all of the constraints 
and hence are conserved for an arbitrary 
slicing of spacetime. Conversely, if a suitable set of 
gauge conditions such as the CMCSH conditions 
were imposed, one could in principle solve for the 
remaining independent canonical variables as functionals 
of the (α^A, β_A) and an internal variable, such 
as the mean curvature τ, which plays the role of 
time, and hence solve the field equations for (g_ij, π^ij) 
in the chosen gauge. 

This proposal is purely heuristic in (3 + 1) and 
higher dimensions in that there is no known 
procedure for finding the needed complete solution 
of the Hamilton-Jacobi equations in these cases. 
However, by exploiting the Chern-Simons analogy 
discussed earlier in this section, a complete solution 
can be found in (2 + 1) dimensions and the 
corresponding complete set of “observables” 
(α^A, β_A) identified. The latter are equivalent, up to 
a diffeomorphism of the associated reduced phase 
space, to a complete set of traces of holonomies of 
the flat Isom(ℝ³₁)-connections defined in this 
Chern-Simons formulation (see Carlip (1998) for more 
details). 


The Reduced Hamiltonian 


We continue with the assumption that M is a 
connected closed oriented n-manifold, with either 
n = 2 and M = Σ_p, p ≥ 2, or n ≥ 3 and M of negative 
Yamabe type. We now define the reduced phase 
space as the set of conformal variables given by 

\[ P_{\mathrm{reduced}} = \{(\gamma, p^{TT}) \mid \gamma \in \mathcal{M}_{-1} \text{ and } p^{TT} \text{ is a 2-contravariant symmetric tensor density that is transverse and traceless with respect to } \gamma\} \qquad [29] \]

We remark that the fully reduced phase space is 
given by P_reduced/𝒟₀, where 𝒟₀ is the group of 
diffeomorphisms of M isotopic to the identity. 
However, here, for clarity of exposition, we work 
on P_reduced rather than the fully reduced phase space. 

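Given a scalar function φ, with φ > 0 if n ≥ 3, the 
physical variables (g, πᵀᵀ) are related to the 
conformal variables (γ, pᵀᵀ) by 

\[ (g, \pi^{TT}) = \begin{cases} (e^{2\varphi}\gamma,\ e^{-2\varphi}p^{TT}) & \text{if } n = 2 \\ (\varphi^{4/(n-2)}\gamma,\ \varphi^{-4/(n-2)}p^{TT}) & \text{if } n \ge 3 \end{cases} \qquad [30] \]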


We adopt the convention that raising and lowering 
of indices on either momentum variable πᵀᵀ or 
pᵀᵀ will be with respect to its own conjugate metric, 
either g or γ, respectively. With this convention, the 
mixed forms of πᵀᵀ and pᵀᵀ are equal, since for 
n ≥ 3, 

\[ (\pi^{TT})^i{}_j = g_{jk}(\pi^{TT})^{ik} = \bigl(\varphi^{4/(n-2)}\gamma_{jk}\bigr)\bigl(\varphi^{-4/(n-2)}(p^{TT})^{ik}\bigr) = \gamma_{jk}(p^{TT})^{ik} = (p^{TT})^i{}_j \qquad [31] \]

(and similarly for the n = 2 case). Thus the squared 
norms of pᵀᵀ and πᵀᵀ are equal, 

\[ p^{TT}\cdot p^{TT} = \gamma_{ik}\gamma_{jl}(p^{TT})^{ij}(p^{TT})^{kl} = (p^{TT})^i{}_j\,(p^{TT})^j{}_i = (\pi^{TT})^i{}_j\,(\pi^{TT})^j{}_i = g_{ik}g_{jl}(\pi^{TT})^{ij}(\pi^{TT})^{kl} = \pi^{TT}\cdot\pi^{TT} \qquad [32] \]

where in the first term the center dot is γ-metric 
contraction and in the last term the center dot is 
g-metric contraction. 

The uniquely determined scalar factor φ relating 
the physical metric g to the conformal metric γ is 
obtained by solving the Hamiltonian constraint 
equation [17]. In the special case that pᵀᵀ = 0 (or 
equivalently, from [30], that πᵀᵀ = 0), φ is constant 
and is given in the n ≥ 3 case by 

\[ \varphi = \left(\frac{n}{(n-1)\,\tau^2}\right)^{(n-2)/4} \qquad [33] \]

Thus in this case 

\[ \gamma = \varphi^{-4/(n-2)}\,g = \frac{n-1}{n}\,\tau^2\,g \qquad [34] \]

In particular, since τ has the dimension ℓ⁻¹ (see the 
remark after [8]) and the components g_ij have 
the dimension ℓ², we see from this formula that the 
conformal metric γ_ij is dimensionless. Although φ is 
not constant in the general case when pᵀᵀ ≠ 0, its 
dimension, as in [33], is still ℓ^{(n−2)/2} and thus the 
components γ_ij are still dimensionless in the general 
case. Since in the conventions used in this article, the 
spatial coordinates are dimensionless, the volume 
vol(M, γ) of the Riemannian manifold (M, γ), as well 
as all curvature tensors of γ, are also dimensionless. 
Having a dimensionless conformal metric γ with a 
dimensionless volume has its advantages over the 
physical metric g with dimension ℓ² inasmuch as, as we 
shall see below, an infimum of the volume of the 
conformal metric is related to a dimensionless 
topological invariant of M (see [48] and the remark 
thereafter). 
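
The constant value of the conformal factor in the case of vanishing transverse-traceless momentum can be checked by hand. The sketch below assumes n ≥ 3, the conformal decomposition g = φ^{4/(n−2)} γ with R(γ) = −1 from [21], and the fact that for a constant conformal factor the scalar curvature simply rescales by the inverse factor; the symbol φ for the conformal factor is the notation assumed in this illustration.

```latex
\begin{align*}
\pi^{TT} = 0 \;&\Longrightarrow\;
  \pi^{ij} = \tfrac{1}{n}(\operatorname{tr}_g\pi)\,g^{ij}, \qquad
  \pi\cdot\pi = \tfrac{1}{n}(\operatorname{tr}_g\pi)^2, \qquad
  \operatorname{tr}_g\pi = (n-1)\sqrt{\det g}\,\tau, \\
0 = \mathcal{H}(g,\pi)
  &= \frac{1}{\sqrt{\det g}}\Bigl(\frac{1}{n} - \frac{1}{n-1}\Bigr)
     (\operatorname{tr}_g\pi)^2 - \sqrt{\det g}\,R(g)
   = -\sqrt{\det g}\,\Bigl(\frac{n-1}{n}\,\tau^2 + R(g)\Bigr), \\
R(g) &= \varphi^{-4/(n-2)}R(\gamma) = -\varphi^{-4/(n-2)}
  \;\Longrightarrow\;
  \varphi^{4/(n-2)} = \frac{n}{(n-1)\,\tau^2}, \qquad
  \gamma = \frac{n-1}{n}\,\tau^2\,g .
\end{align*}
```

Since τ² carries dimension ℓ⁻² and g carries ℓ², the last relation makes the dimensionlessness of γ explicit, and the conformal factor itself carries dimension ℓ^{(n−2)/2}.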

If one now uses the conformal variables given by 
[30] and the decomposition [22] in the ADM action 
given by [13], one finds the reduced action to be 

\[ I_{\mathrm{reduced}} = \int_I dt \int_M \left((p^{TT})^{ij}\,\frac{\partial\gamma_{ij}}{\partial t} - \frac{2(n-1)}{n}\,\frac{\partial\tau}{\partial t}\,\sqrt{\det g} + \frac{\partial}{\partial t}\Bigl(\frac{2(n-1)}{n}\,\tau\sqrt{\det g}\Bigr)\right) d^n x \qquad [35] \]

In this expression one can discard the final time 
derivative which contributes only a boundary 
integral and so does not contribute to the equations 
of motion. Moreover, the conformal metric γ_ij is 
constrained to lie in the intersection of ℳ₋₁ and a 
slice for the action of 𝒟₀ on ℳ. This space can be 
regarded as a local chart for the reduced configuration 
space 𝒯 = ℳ₋₁/𝒟₀, under the technical 
assumption that 𝒯 is a manifold. Thus, taken 
together, the conformal variables (γ_ij, (pᵀᵀ)^ij) can be 
viewed as local canonical coordinates for the 
cotangent bundle T*𝒯 of Teichmüller space 𝒯, 
where T*𝒯 now plays the role of the reduced 
phase space. 
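
The reduction of the kinetic term of the ADM action [13] to the conformal variables can be sketched as follows for n ≥ 3. The computation assumes the decompositions [21], [22], and [30], drops the constraint terms (which vanish on solutions of [17] and [18]), and uses the standard identity ∂(√det g)/∂t = ½ √det g g^ij ∂g_ij/∂t.

```latex
\begin{align*}
\pi^{ij}\,\frac{\partial g_{ij}}{\partial t}
  &= (\pi^{TT})^{ij}\,\frac{\partial g_{ij}}{\partial t}
     + \frac{1}{n}(\operatorname{tr}_g\pi)\,g^{ij}\,\frac{\partial g_{ij}}{\partial t} \\
  &= (p^{TT})^{ij}\,\frac{\partial \gamma_{ij}}{\partial t}
     + \frac{2(n-1)}{n}\,\tau\,\frac{\partial}{\partial t}\sqrt{\det g} \\
  &= (p^{TT})^{ij}\,\frac{\partial \gamma_{ij}}{\partial t}
     - \frac{2(n-1)}{n}\,\frac{\partial \tau}{\partial t}\,\sqrt{\det g}
     + \frac{\partial}{\partial t}\Bigl(\frac{2(n-1)}{n}\,\tau\sqrt{\det g}\Bigr).
\end{align*}
```

In the second line, the term proportional to the time derivative of the conformal factor drops out of the first summand because the transverse-traceless momentum is traceless, and the conformal weights of the metric and the momentum cancel. The second summand uses the trace relation [10].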

For n = 2, these constructions can be carried out 
globally for the Teichmüller space 𝒯_p of an arbitrary 
closed oriented surface Σ_p, p ≥ 2 (see the remarks 
after [4]). Using these global constructions, the 
reduced phase space T*𝒯_p for the (2 + 1)-reduced 
Einstein equations can be modeled explicitly. 

Having restricted the slices to be CMC, one need 
only choose the relationship between the time 
coordinate and the CMC τ in order to fix a 
corresponding reduced Hamiltonian. The most 
natural choice of time coordinate from the present 
point of view is to take 

\[ t = t(\tau) = \frac{2}{n\,(-\tau)^{n-1}} \qquad [36] \]

Note that this choice of time coordinate, although 
also denoted t, is no longer dimensionless but has 
dimension ℓⁿ⁻¹. 

This choice of time coordinate is motivated by 
three considerations. Firstly, we remark that since 
τ = 0 is excluded in the setting used in this article 
(see [23] and the discussion after), τ can range in 
either the domain ℝ⁻ = (−∞, 0) or ℝ⁺ = (0, ∞). The 
usual convention on the sign of k, as adopted here, 
is that the sign of k is negative when the tips of the 
normals on a spacelike hypersurface are further 
apart than their bases, as for example in the 
expansion of a model universe, in which case 
τ = tr_g k < 0. Thus, with this convention, τ in the 
range ℝ⁻ corresponds to an expanding universe and 
τ in the range ℝ⁺ corresponds to a collapsing one in 
the future direction of increasing t. Thus for 
manifolds of negative Yamabe type that we consider 
here, the expected maximal range of the CMC τ is ℝ⁻, 
for which τ → −∞ corresponds to a “crushing 
singular” big bang of vanishing spatial volume and 
τ → 0⁻ corresponds to the limit of infinite volume 
expansion. Then, with the time function given by [36], 
the coordinate time t ranges in the interval ℝ⁺, 
vanishes at the big bang, and tends to positive infinity 
in the limit of infinite cosmological expansion. 

We remark that to prove that a solution determined 
by Cauchy data prescribed at some initial 
coordinate time t₀ ∈ ℝ⁺ actually exhausts the range 
ℝ⁺ is a difficult global existence problem that is not 
dealt with here. Nevertheless, one of the main 
motivations for this work is the hope that Hamiltonian 
reduction will lead to advances in the study 
of the global existence question for Einstein’s 
equations. 

We also remark that with the choice of temporal 
gauge function given by [36] and with τ in its 
natural range ℝ⁻, 

\[ \frac{d\tau}{dt} = \frac{n}{2(n-1)}\,(-\tau)^n > 0 \qquad [37] \]

so that this temporal coordinate choice preserves the 
time orientation of the flow for all n ≥ 2. 
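
The sign and the normalization of the time function can be verified by a one-line differentiation. The sketch below assumes the time function t = 2/(n(−τ)ⁿ⁻¹) of [36] with τ < 0.

```latex
\begin{align*}
t &= \frac{2}{n}\,(-\tau)^{-(n-1)}
  \;\Longrightarrow\;
  \frac{dt}{d\tau} = \frac{2(n-1)}{n}\,(-\tau)^{-n} > 0, \\
\frac{d\tau}{dt} &= \frac{n}{2(n-1)}\,(-\tau)^{n} > 0,
  \qquad
  \frac{2(n-1)}{n}\,\frac{d\tau}{dt} = (-\tau)^{n}.
\end{align*}
```

The last identity is what converts the coefficient 2(n − 1)/n multiplying the time derivative of τ in the reduced action into the factor (−τ)ⁿ. For n = 3 the time function reads t = 2/(3τ²).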
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Secondly, with this choice of temporal gauge, the 
reduced action given by [35] simplifies to 


d Induce = dt (> = 
= rya) d”x [38] 


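from which one can read off an effective reduced
Hamiltonian density,

$$ \mathcal{H}_{\mathrm{reduced}}(\tau, \gamma, p^{TT}) = (-\tau)^n \sqrt{\det g} \qquad [39] $$

and an effective reduced Hamiltonian,

$$ H_{\mathrm{reduced}}(\tau, \gamma, p^{TT}) = \int_M (-\tau)^n \sqrt{\det g} \, d^n x = (-\tau)^n \int_M d\mu_g = (-\tau)^n \operatorname{vol}(M, g) \qquad [40] $$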


where $\operatorname{vol}(M, g) = \int_M d\mu_g$ is the volume of the
Riemannian manifold $(M, g)$. Thus in terms of the
physical variables $(g_{ij}, \pi^{ij})$, the reduced Hamiltonian
$H_{\mathrm{reduced}}$ at “time” $\tau$ is simply the volume of the CMC
slice with mean curvature $\tau$ rescaled by the factor
$(-\tau)^n$. With this reduced Hamiltonian density, the
reduced action [38] takes the canonical form

$$ I_{\mathrm{reduced}} = \int dt \int_M \left( p^{TT\,ij} \frac{\partial \gamma_{ij}}{\partial t} - \mathcal{H}_{\mathrm{reduced}} \right) d^n x \qquad [41] $$


As the third consideration for the given choice of the
time function, we note that rescaling the physical
volume $\operatorname{vol}(M, g)$ by the factor $(-\tau)^n$ yields a
dimensionless quantity. Indeed, as we have seen, the spatial
physical volume has the dimension $\ell^n$ and the constant
mean curvature $\tau$ has the dimension $\ell^{-1}$, so that the
reduced Hamiltonian $(-\tau)^n \operatorname{vol}(M, g)$ is dimensionless.
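The relation [37] between the mean curvature and the coordinate time can be checked numerically. The sketch below is illustrative only: it assumes the closed form $t = 2/(n(-\tau)^{n-1})$, which is obtained here by integrating [37] with $t \to 0$ as $\tau \to -\infty$ rather than quoted from [36], and all function names are hypothetical.

```python
def tau_of_t(t: float, n: int) -> float:
    """Mean curvature tau < 0 at coordinate time t > 0.

    Assumes t = 2 / (n * (-tau)**(n - 1)), obtained by integrating [37]
    with t -> 0 as tau -> -infinity (an assumption of this sketch).
    """
    if t <= 0 or n < 2:
        raise ValueError("need t > 0 and n >= 2")
    return -((2.0 / (n * t)) ** (1.0 / (n - 1)))


def rhs_37(tau: float, n: int) -> float:
    """Right-hand side of [37]: n / (2 (n - 1)) * (-tau)**n."""
    return n / (2.0 * (n - 1)) * (-tau) ** n


def dtau_dt_numeric(t: float, n: int, h: float = 1e-6) -> float:
    """Central finite-difference estimate of d(tau)/dt at time t."""
    return (tau_of_t(t + h, n) - tau_of_t(t - h, n)) / (2.0 * h)


def max_relative_error(n: int, times=(0.5, 1.0, 4.0)) -> float:
    """Largest relative mismatch between the numeric derivative and [37]."""
    worst = 0.0
    for t in times:
        exact = rhs_37(tau_of_t(t, n), n)
        worst = max(worst, abs(dtau_dt_numeric(t, n) - exact) / exact)
    return worst


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, max_relative_error(n))
```

With this form, small $t$ corresponds to large negative $\tau$ (the big bang) and large $t$ to $\tau \to 0^-$ (infinite expansion), in line with the ranges described above.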

The main advantage of having a dimensionless 
reduced Hamiltonian is that only such a reduced 
Hamiltonian can have a topological significance, 
and indeed, the infimum of $H_{\mathrm{reduced}}$ is closely related
to a dimensionless topological invariant of M (see 
the remarks after [48]). 

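In terms of the conformal variables $(\gamma, p^{TT})$, the
reduced Hamiltonian is found from [21] and [40] to
be given for $n \ge 3$ by

$$ H_{\mathrm{reduced}}(\tau, \gamma, p^{TT}) = (-\tau)^n \int_M \varphi^{2n/(n-2)} \sqrt{\det \gamma} \, d^n x = (-\tau)^n \int_M \varphi^{2n/(n-2)} \, d\mu_\gamma \qquad [42] $$

where $d\mu_\gamma$ is the Riemannian measure on $M$
determined by $\gamma$ (locally, $d\mu_\gamma = \sqrt{\det \gamma} \, d^n x$) and
$\varphi = \varphi(\tau, \gamma, p^{TT})$ is the conformal factor which,
through the solution of the Hamiltonian constraint
[17], is expressed as a function of the “time” $\tau$ and
the independent conformal (or canonical) variables
$(\gamma, p^{TT})$.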

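In the special case $n = 2$, $M = \Sigma_p$, $p \ge 2$, a simple
formula for $H_{\mathrm{reduced}}$ can be derived. In terms of the
conformal variables $(\gamma, p^{TT})$, we find from [40],
[10], [14], [17], [21], [22], and [32] that

$$ H_{\mathrm{reduced}}(\tau, \gamma, p^{TT}) = \int_{\Sigma_p} (-\tau)^2 \, d\mu_g = 2 \int_{\Sigma_p} \left( (\det g)^{-1} (\pi^{TT} \cdot \pi^{TT}) - R(g) \right) d\mu_g $$

$$ = 2 \int_{\Sigma_p} e^{-2\lambda} (\det \gamma)^{-1} (p^{TT} \cdot p^{TT}) \, d\mu_\gamma + 16\pi (p - 1) \qquad [43] $$

where $\lambda = \lambda(\tau, \gamma, p^{TT})$, $\chi(\Sigma_p) = 2(1 - p)$ is the Euler
characteristic of the genus $p$ surface $\Sigma_p$, and where
we have used the Gauss-Bonnet theorem

$$ \int_{\Sigma_p} R(g) \, d\mu_g = 4\pi \chi(\Sigma_p) = 8\pi (1 - p) \qquad [44] $$

Since

$$ H_{\mathrm{reduced}}(\tau, \gamma, p^{TT}) = 2 \int_{\Sigma_p} e^{-2\lambda} (\det \gamma)^{-1} (p^{TT} \cdot p^{TT}) \, d\mu_\gamma + 16\pi (p - 1) \ge 16\pi (p - 1) \qquad [45] $$

the infimum of $H_{\mathrm{reduced}}$ is attained precisely when
$p^{TT} = 0$ and this infimum coincides with the topological
invariant $-8\pi \chi(\Sigma_p) = 16\pi (p - 1)$, which characterizes
the surface $\Sigma_p$ (see also [51] below). As we
shall see shortly, an analogous result holds for
$n \ge 3$.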
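The arithmetic behind the two-dimensional bound can be sketched in a few lines. This is a minimal illustration, not part of the original derivation: the nonnegative $p^{TT}$ integral in [45] is passed in as a plain number, and the function names are hypothetical.

```python
import math


def euler_characteristic(genus: int) -> int:
    """Euler characteristic chi = 2(1 - p) of a closed oriented genus-p surface."""
    return 2 * (1 - genus)


def total_scalar_curvature(genus: int) -> float:
    """Gauss-Bonnet [44]: the integral of R(g) over the surface is 4*pi*chi."""
    return 4.0 * math.pi * euler_characteristic(genus)


def reduced_hamiltonian_2d(kinetic_term: float, genus: int) -> float:
    """Evaluate [45], with the nonnegative p^TT integral given as a number."""
    if genus < 2:
        raise ValueError("the formula is stated for genus p >= 2")
    if kinetic_term < 0:
        raise ValueError("the p^TT term in [45] is nonnegative")
    # H = (p^TT term) - 2 * integral of R(g) = (p^TT term) + 16*pi*(p - 1)
    return kinetic_term - 2.0 * total_scalar_curvature(genus)


if __name__ == "__main__":
    for p in (2, 3, 4):
        print(p, reduced_hamiltonian_2d(0.0, p), 16.0 * math.pi * (p - 1))
```

Setting the kinetic term to zero reproduces the topological value $16\pi(p-1)$, and any positive kinetic term only raises the result.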

A straightforward but lengthy calculation, which
is valid in arbitrary dimensions, shows that the
reduced Hamiltonian is strictly monotonically
decreasing in the direction of cosmological expansion
except for a family of continuously self-similar
spacetimes for which this Hamiltonian is constant
(Fischer and Moncrief 2002b). The latter solutions
exist if and only if $M$ admits a Riemannian metric
$\gamma \in \mathcal{M}_{-1}$ which is an Einstein metric, that is, for
which the Ricci tensor satisfies $\operatorname{Ric}(\gamma) = -(1/n)\gamma$.
Using the mean curvature as a convenient time
coordinate, that is, temporarily taking $t = \tau$, the
corresponding self-similar vacuum spacetime metrics
then have the line element

$$ ds^2 = -\left( \frac{n}{\tau^2} \right)^2 d\tau^2 + \frac{n}{(n-1)\tau^2} \, \gamma_{ij} \, dx^i dx^j \qquad [46] $$


In the case that $n = 3$, the Einstein metric $\gamma$ is
actually hyperbolic with constant sectional curvature
$K(\gamma) = -1/6$ and Ricci curvature $\operatorname{Ric}(\gamma) = -(1/3)\gamma$.
Although the conformal variables $(\gamma, p^{TT}) = (\gamma, 0)$ are
static in this model, the physical variables $(g, \pi)$ are
not. In this case, the resulting spacetimes (which 
depend on the underlying topology of M) have 
expanding closed hyperbolic spacelike hypersurfaces 
where the physical volume vol(M, g) “starts” at zero 
at the big bang and expands to infinity in the forward 
time direction, as befits a universe endlessly expand- 
ing from the big bang. Such a universe is depicted in 
Figure 1, where the genus-2 surface is used to 
represent a generic closed hyperbolic 3-manifold. 
The Bianchi and Thurston types of this model are 
discussed in the next section. 
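The behavior just described, a static conformal geometry with an expanding physical volume and a constant reduced Hamiltonian, can be illustrated numerically. The sketch below assumes that the spatial part of [46] is $g = \frac{n}{(n-1)\tau^2}\gamma$ with lapse $n/\tau^2$; the function names and the sample conformal volume are hypothetical.

```python
def conformal_scale(tau: float, n: int) -> float:
    """Factor multiplying gamma in the spatial part of [46] (assumed form)."""
    return n / ((n - 1) * tau ** 2)


def physical_volume(tau: float, conformal_volume: float, n: int) -> float:
    """vol(M, g) for g = conformal_scale * gamma; volume scales as scale**(n/2)."""
    return conformal_scale(tau, n) ** (n / 2.0) * conformal_volume


def reduced_hamiltonian(tau: float, conformal_volume: float, n: int) -> float:
    """(-tau)**n * vol(M, g), the reduced Hamiltonian of [40]."""
    return (-tau) ** n * physical_volume(tau, conformal_volume, n)


def proper_time_since_big_bang(tau: float, n: int) -> float:
    """Integral of the lapse n / tau**2 from -infinity to tau, equal to n / (-tau)."""
    return n / (-tau)


if __name__ == "__main__":
    n, vol_gamma = 3, 10.0  # illustrative values only
    for tau in (-10.0, -1.0, -0.1):
        print(tau, physical_volume(tau, vol_gamma, n), reduced_hamiltonian(tau, vol_gamma, n))
```

The physical volume grows without bound as $\tau \to 0^-$, while the rescaled volume stays fixed at $(n/(n-1))^{n/2}$ times the conformal volume.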

The line element [46] is locally isometric to the
vacuum Friedmann-Lemaître-Robertson-Walker
(FLRW) $k = -1$ spacetime, which is well known to
be flat. Although these spatially compactified models
are technically not classical FLRW spacetimes
since the expanding compact hypersurfaces are not
globally homogeneous (and thus not globally isotropic),
they are Lorentz-covered by the FLRW $k = -1$ spacetime
and thus are locally isometric to this classical
spacetime.

Figure 1 Expansion of the physical universe in the Bianchi V,
Thurston type $H^3$, spatially compactified FLRW flat spacetime
cosmology.

The same result leading to [46] holds even if
matter sources are allowed, provided they satisfy a
suitable energy condition, in which case the corresponding
reduced Hamiltonian will only be stationary
in the vacuum limit and then only when the
metric is of the above type; otherwise it monotonically
decays. This result even has a quasilocal
generalization expressible in terms of the corresponding
quasilocal reduced Hamiltonian defined
for an arbitrary domain $D_\tau$ within the CMC slice
$\tau = \text{constant}$ by restricting $H_{\mathrm{reduced}}$ in [42] to the
domain $D_\tau$, so that for $n \ge 3$,

$$ H_{D_\tau} = (-\tau)^n \int_{D_\tau} d\mu_g = (-\tau)^n \int_{D_\tau} \varphi^{2n/(n-2)} \, d\mu_\gamma \qquad [47] $$

If $D_\tau$ is determined from its specification on some
initial slice $\tau = \tau_0$, by letting the domain flow along
the normal trajectories of the CMC foliation, one
can then verify that $H_{D_\tau}$ is monotonically decreasing
except for the vacuum solutions of self-similar type
described above, in which case $H_{D_\tau}$ is constant. This
result is independent of the initial domain chosen.

We remark that one cannot use the quasilocal 
Hamiltonian to get equations of motion (even 
quasilocally) since the full true Hamiltonian is 
nonlocal and so one gets contributions from the 
whole manifold. 

Since the reduced Hamiltonian $H_{\mathrm{reduced}}$ as well as
its quasilocal variant $H_{D_\tau}$ is monotonically decreasing
for generic solutions of Einstein’s equations, it is
natural to ask what its infimum is and whether this
infimum is ever attained, at least asymptotically, by
solutions of the field equations. The infimum of the
reduced Hamiltonian for $n \ge 3$ and for a spatial
manifold $M$ of negative Yamabe type can be characterized
in terms of a certain topological invariant of $M$
called the sigma constant $\sigma(M)$ of $M$. For manifolds of
negative Yamabe type, this quantity can be defined in
terms of the infimum of the volume over all metrics
in the space of conformal metrics $\mathcal{M}_{-1}$. The
precise definition leads to the formula

$$ \sigma(M) = -\left( \inf_{\gamma \in \mathcal{M}_{-1}} \operatorname{vol}(M, \gamma) \right)^{2/n} \qquad [48] $$

Interestingly, this equation defines the topological
invariant $\sigma(M)$ by a purely geometrical equation
involving the volume functional restricted to $\mathcal{M}_{-1}$.
We also remark that [48] is a dimensionless
equation, the left-hand side being dimensionless
since it is a topological invariant of $M$ and the
right-hand side being dimensionless since the conformal
metric and its volume are dimensionless (see
the remarks after [34]).

Although the $\sigma$-constant can be defined for all
Yamabe types, [48] holds only for manifolds of
negative Yamabe type. From this equation, one can
conclude that for such manifolds

$$ \sigma(M) \le 0 \qquad [49] $$

One can relate the foregoing to the reduced
Hamiltonian by showing that the infimum of
$H_{\mathrm{reduced}}$, defined for arbitrary $\tau < 0$ as a functional
on the reduced phase space

$$ T^*\mathcal{T} = T^*\left( \frac{\mathcal{M}_{-1}}{\mathcal{D}_0} \right) \qquad [50] $$

is given by

$$ \inf H_{\mathrm{reduced}}(\tau, \gamma, p^{TT}) = \begin{cases} -8\pi \chi(\Sigma_p) = 16\pi (p - 1) & \text{if } n = 2 \text{ and } p \ge 2 \\ \left( \dfrac{n}{n-1} (-\sigma(M)) \right)^{n/2} & \text{if } n \ge 3 \end{cases} \qquad [51] $$

where for $n \ge 3$, $M$ is of negative Yamabe type and
thus $\sigma(M) \le 0$ (see [49]).

One proves this result by first showing that within
an arbitrary fiber of the cotangent bundle
$T^*(\mathcal{M}_{-1}/\mathcal{D}_0)$, one minimizes $H_{\mathrm{reduced}}$ by setting the
fiber variable $p^{TT}$ to zero. In this case, the solution for
the conformal factor $\varphi$ reduces to a spatial constant
which is a function of $\tau$ alone (see [33]), and thus the
formula for $H_{\mathrm{reduced}}$ given in [42] reduces to

$$ H_{\mathrm{reduced}}(\tau, \gamma, 0) = \left( \frac{n}{n-1} \right)^{n/2} \operatorname{vol}(M, \gamma) \qquad [52] $$

The infimum over all conformal metrics $\gamma \in \mathcal{M}_{-1}$ of
this latter functional yields the $\sigma$-constant as outlined
above. If matter sources obeying a suitable energy
condition are allowed, the argument goes through in
much the same way with the additional implication
that the infimum is achieved only for a vacuum
solution so that in fact the matter must be “turned off.”

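Thus, as a consequence of the above analysis,
one has

$$ \inf H_{\mathrm{reduced}}(\tau, \gamma, p^{TT}) = \inf_{\gamma \in \mathcal{M}_{-1}} H_{\mathrm{reduced}}(\tau, \gamma, 0) = \inf_{\gamma \in \mathcal{M}_{-1}} \left( \frac{n}{n-1} \right)^{n/2} \operatorname{vol}(M, \gamma) $$

$$ = \left( \frac{n}{n-1} \right)^{n/2} \inf_{\gamma \in \mathcal{M}_{-1}} \operatorname{vol}(M, \gamma) = \left( \frac{n}{n-1} (-\sigma(M)) \right)^{n/2} \qquad [53] $$

where the last equality follows by inverting [48] to give

$$ \inf_{\gamma \in \mathcal{M}_{-1}} \operatorname{vol}(M, \gamma) = (-\sigma(M))^{n/2} \qquad [54] $$

Moreover, if $\gamma \in \mathcal{M}_{-1}$ actually achieves the
$\sigma$-constant, that is, if $\operatorname{vol}(M, \gamma) = (-\sigma(M))^{n/2}$ (and
not just asymptotically approaches it as a curve or
sequence), then $\gamma$ must be an Einstein metric with

$$ \operatorname{Ric}(\gamma) = -\frac{1}{n} \gamma \qquad [55] $$

If, additionally, $n = 3$, then $\gamma$ must be hyperbolic
(with constant sectional curvature $K(\gamma) = -1/6$).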
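The chain of relations [48], [53], and [54] amounts to a pair of power-law conversions between the infimum of the conformal volume, the $\sigma$-constant, and the infimum of the reduced Hamiltonian. A minimal sketch, with hypothetical function names and an illustrative input volume:

```python
def sigma_from_inf_volume(inf_volume: float, n: int) -> float:
    """[48]: sigma(M) = -(inf vol)**(2/n), for negative Yamabe type."""
    if inf_volume < 0:
        raise ValueError("a volume cannot be negative")
    return 0.0 - inf_volume ** (2.0 / n)


def inf_volume_from_sigma(sigma: float, n: int) -> float:
    """[54]: inf vol = (-sigma(M))**(n/2), requiring sigma <= 0."""
    if sigma > 0:
        raise ValueError("[48] applies only when sigma(M) <= 0")
    return (0.0 - sigma) ** (n / 2.0)


def inf_reduced_hamiltonian(sigma: float, n: int) -> float:
    """[53]: inf H_reduced = ((n / (n - 1)) * (-sigma(M)))**(n/2), for n >= 3."""
    if sigma > 0 or n < 3:
        raise ValueError("need sigma(M) <= 0 and n >= 3")
    return (n / (n - 1.0) * (0.0 - sigma)) ** (n / 2.0)


if __name__ == "__main__":
    n, inf_vol = 3, 12.0  # illustrative values only
    sigma = sigma_from_inf_volume(inf_vol, n)
    print(sigma, inf_volume_from_sigma(sigma, n), inf_reduced_hamiltonian(sigma, n))
```

When $\sigma(M) = 0$ both infima vanish, which is the volume-collapsing case; when $\sigma(M) < 0$ the infimum of the reduced Hamiltonian is $(n/(n-1))^{n/2}$ times the infimum of the conformal volume.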

Although Thurston’s conjectures do not refer to
the $\sigma$-constant, Anderson (1997) has been able to
reformulate and somewhat refine the Thurston
geometrization conjectures for 3-manifolds of arbitrary
Yamabe type in terms of conjectured properties
of the $\sigma$-constant. Additionally, if Perelman’s
results are technically complete, they would provide
a proof of Anderson’s conjectures as well as those of
Thurston (see Anderson (2003)).

The conjectured behavior for a sequence of
conformal metrics $\{\gamma_i\}$, $\gamma_i \in \mathcal{M}_{-1}$, $i = 1, 2, \ldots$, which
seeks to minimize the volume of a stand-alone
$K(\pi, 1)$ 3-manifold $M$ of negative Yamabe type can
be described as follows:


1. If $M$ is hyperbolizable, then $\sigma(M) < 0$ is attained
by a hyperbolic metric $\gamma_h \in \mathcal{M}_{-1}$, unique up to
diffeomorphism, and the sequence of conformal
metrics $\{\gamma_i\}$ converges to this metric in a suitable
function space topology.

2. If $M$ is a pure graph manifold, then $\sigma(M) = 0$ and
the sequence $\{\gamma_i\}$ of conformal metrics “volume
collapses” $M$ with bounded curvature. Typically
this occurs through collapse of circular or toroidal
fibers in the associated circle or 2-torus bundle
structure (see examples 3, 4, and 5 in the section
“Topological Background” and see also the penultimate
section). The six manifolds of flat type are
not included here as they are of zero Yamabe type.

3. If $M$ is a generic $K(\pi, 1)$-manifold (not of type 1
or 2 above), then $M$ can be decomposed along
incompressible tori into its final finite-volume-type
hyperbolizable and (possibly empty set of)
graph-manifold pieces. In this case, $\sigma(M) < 0$ and
the sequence $\{\gamma_i\}$ of conformal metrics collapses
the graph-manifold components and converges to
finite-volume complete hyperbolic metrics on the
hyperbolizable components (normalized to have
$R(\gamma) = -1$) yielding a $\sigma$-constant that is entirely
determined by the volumes of these final hyperbolic
components (see the final section).


We shall return to this conjectured characteriza- 
tion of sequences of conformal metrics in the next 
two sections. 


Reduction of Bianchi Models 
and Conformal Volume Collapse 


For manifolds of negative Yamabe type, the strict
monotonic decay of $H_{\mathrm{reduced}}$ in the direction of
cosmological expansion along nonconstant integral
curves of the reduced Einstein equations suggests
that the reduced Hamiltonian is seeking to achieve
its infimum $\inf H_{\mathrm{reduced}} = ((n/(n-1))(-\sigma(M)))^{n/2}$.
But does this ever happen? Does the reduced
Einstein flow of the conformal geometry asymptotically
approach $\inf H_{\mathrm{reduced}}$ in the limit of infinite
cosmological expansion?

To answer this question, one can consider for n = 3 
known locally homogeneous vacuum solutions of 
Einstein’s equations which spatially compactify to 
manifolds of negative Yamabe type. Applying the 
theory of Hamiltonian reduction to these classical 
models, one can show that the reduced Hamiltonian 
behaves as expected under the reduced Einstein flow 
defined by these models. Since these models existed 
long before this theory, it is somewhat satisfying to see 
that they can be interpreted in terms of Hamiltonian 
reduction and how, with this interpretation, new 
properties of these classical solutions can be found. 

Since $H_{\mathrm{reduced}}$ is a strictly monotonically decreasing
function along nonconstant integral curves of the
reduced Einstein flow, it is expected that under certain
conditions, the reduced Hamiltonian decays monotonically
to its infimum. Thus, it is of interest to
examine the consequences for Hamiltonian reduction
of the following two assumptions:


1. The reduced Einstein field equations give rise
to the existence of a positive semiglobal nonconstant
solution $(\gamma(t), p^{TT}(t))$ defined for all
$t \in (0, \infty)$ (or equivalently, for all $\tau \in (-\infty, 0)$);

2. The reduced Hamiltonian strictly monotonically
decays to its infimum along nonconstant integral
curves,

$$ H_{\mathrm{reduced}}(\tau(t), \gamma(t), p^{TT}(t)) \longrightarrow \inf H_{\mathrm{reduced}} \quad \text{as } t \to \infty \qquad [56] $$

From [40] and [51], in terms of the physical
variables $(g, \pi)$ (or $(g, k)$), [56] can be written
equivalently as

$$ -\tau^3 \operatorname{vol}(M, g) = -(\operatorname{tr}_g k)^3 \operatorname{vol}(M, g) \longrightarrow \left( \tfrac{3}{2} (-\sigma(M)) \right)^{3/2} \quad \text{as } t \to \infty \qquad [57] $$

As a consequence of these assumptions, it follows
from [53] that the conformal volume $\operatorname{vol}(M, \gamma)$ must
also decay to its infimum [54] (although not
necessarily monotonically),

$$ \operatorname{vol}(M, \gamma(t)) \longrightarrow \inf_{\gamma \in \mathcal{M}_{-1}} \operatorname{vol}(M, \gamma) = (-\sigma(M))^{3/2} \quad \text{as } t \to \infty \qquad [58] $$

Now suppose that $\sigma(M) = 0$. A large class of
manifolds for which this is true are the graph
manifolds (and thus also the Seifert manifolds) of
negative Yamabe type since $\sigma(M) \ge 0$ for graph
manifolds in general and since $\sigma(M) \le 0$ for manifolds
of negative Yamabe type. In this case the curve
$\gamma(t) \in \mathcal{M}_{-1}$ of conformal metrics must necessarily
(conformally) volume collapse $M$ in the direction of
cosmological expansion,

$$ \operatorname{vol}(M, \gamma(t)) \longrightarrow (-\sigma(M))^{3/2} = 0 \quad \text{as } t \to \infty \qquad [59] $$

Consequently, the curve of conformal metrics $\gamma(t)$
must undergo some form of degeneration as its
volume collapses. The details of this metric degeneration
are of importance and are discussed below.

Not all locally homogeneous vacuum Bianchi
models admit spatially compact quotients. Fortunately,
the general theory of which Bianchi models
admit spatially compact quotients has been worked
out in detail by Tanimoto, Koike, and Hosoya (see
Tanimoto et al. (1997) and the references therein).
These Bianchi models together with their corresponding
Thurston classification and typical examples
of their closed quotient manifolds are listed in
Table 1, where “K-S” indicates “Kantowski-Sachs,”
“P,” “Z,” and “N” denote manifolds of Yamabe
type positive, zero, and negative, respectively (see
the section “Topological Background”), “Seifert”
means Seifert fibered, “Hyper” means hyperbolizable,
“?” indicates “unknown, but conjectured to be
so,” and “manifold collapse” denotes the type of
collapse that the conformal manifold $(M, \gamma(t))$ goes
through as the conformal volume $\operatorname{vol}(M, \gamma(t))$
collapses. We also remark that all of the manifolds
listed in the “Typical examples” column are irreducible
with the exception of $S^1 \times S^2$, which is prime
but not irreducible. Also, in this column, $p \ge 2$.

Table 1 Bianchi, Thurston, and Yamabe type of a connected closed oriented irreducible 3-manifold

Bianchi type   Thurston type                     Typical examples                            Yamabe type   $\sigma$-constant   Manifold structure   Manifold collapse
K-S            $S^2 \times \mathbb{R}$           $S^2 \times S^1$                            P             $> 0$               Seifert              -
IX             $S^3$                             Nontrivial $S^1$-bundles over $S^2$         P             $> 0$               Seifert              -
I              $\mathbb{R}^3$                    $T^3$                                       Z             $0$                 Seifert              -
II             Nil                               Nontrivial $S^1$-bundles over $T^2$         N             $0$                 Seifert              Total
III            $H^2 \times \mathbb{R}$           $\Sigma_p \times S^1$                       N             $0$                 Seifert              Pancake
VIII           $\widetilde{SL}(2, \mathbb{R})$   Nontrivial $S^1$-bundles over $\Sigma_p$    N             $0$                 Seifert              Pancake
VI$_0$         Sol                               Nontrivial $T^2$-bundles over $S^1$         N             $0$                 Graph                Barrel
V, VII$_h$     $H^3$                             Closed hyperbolizable manifolds             N             $< 0$ ?             Hyper                None
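For readers who want to query the classification programmatically, Table 1 can be encoded as a small list of records. The sketch below is only an illustration: the rows are transcribed from Table 1 as read here, the field names are hypothetical, and plain ASCII labels stand in for the mathematical symbols.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    bianchi: str
    thurston: str
    example: str
    yamabe: str              # "P", "Z" or "N"
    sigma: str               # sign of the sigma-constant as listed
    structure: str           # "Seifert", "Graph" or "Hyper"
    collapse: Optional[str]  # None where the table lists no entry


TABLE_1 = [
    Row("K-S", "S2 x R", "S2 x S1", "P", ">0", "Seifert", None),
    Row("IX", "S3", "nontrivial S1-bundles over S2", "P", ">0", "Seifert", None),
    Row("I", "R3", "T3", "Z", "0", "Seifert", None),
    Row("II", "Nil", "nontrivial S1-bundles over T2", "N", "0", "Seifert", "Total"),
    Row("III", "H2 x R", "Sigma_p x S1", "N", "0", "Seifert", "Pancake"),
    Row("VIII", "SL(2,R)~", "nontrivial S1-bundles over Sigma_p", "N", "0", "Seifert", "Pancake"),
    Row("VI_0", "Sol", "nontrivial T2-bundles over S1", "N", "0", "Graph", "Barrel"),
    # The string "None" below is the table's own entry: no collapse occurs.
    Row("V, VII_h", "H3", "closed hyperbolizable manifolds", "N", "<0?", "Hyper", "None"),
]


def negative_yamabe() -> List[Row]:
    """Rows whose compact quotients are of negative Yamabe type."""
    return [row for row in TABLE_1 if row.yamabe == "N"]


def collapse_by_bianchi(bianchi: str) -> Optional[str]:
    """Collapse type listed for a given Bianchi label."""
    for row in TABLE_1:
        if row.bianchi == bianchi:
            return row.collapse
    raise KeyError(bianchi)


if __name__ == "__main__":
    for row in negative_yamabe():
        print(row.bianchi, row.thurston, row.sigma, row.collapse)
```

Filtering on the Yamabe type picks out the five models of most interest in this article, four of which have vanishing $\sigma$-constant.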

In this table, the eight Thurston types are grouped 
into three sets according to their Yamabe type. The 
first set of such Bianchi models are those that 
spatially compactify to yield 3-manifolds of positive 
Yamabe type which allow metrics with positive 
constant scalar curvature, for example, Bianchi IX 
models defined over spherical space forms. The 
second set (consisting of one type) yields manifolds 
of zero Yamabe type which allow zero scalar 
curvature metrics but not constant positive scalar 
curvature metrics, for example, Bianchi I models 
defined over $T^3$ or one of the other five manifolds of
flat type finitely covered by $T^3$. The third set (the
last five entries in Table 1 and the set of most 
interest in this article) yields manifolds of negative 
Yamabe type which do not allow metrics with zero 
scalar curvature. 

These latter models are the five Bianchi models of
types II, III, VIII, VI$_0$, and V (and in part VII$_h$),
which in turn correspond, in Thurston’s classification,
to manifolds of type Nil, $H^2 \times \mathbb{R}$, $\widetilde{SL}(2, \mathbb{R})$, Sol, and
$H^3$, respectively.

In the first three cases, the models of Bianchi type
II, III, and VIII compactify to a nontrivial $S^1$-bundle
over $T^2$ or to a trivial or nontrivial $S^1$-bundle over
$\Sigma_p$, $p \ge 2$, respectively. Each of these spaces is Seifert
fibered. In the fourth case, the model of Bianchi type
VI$_0$ compactifies to a nontrivial $T^2$-bundle over $S^1$
which is an irreducible graph manifold. Since each
of these manifolds is also of negative Yamabe type,
in each of these four cases, as discussed in the
beginning of this section, $\sigma(M) = 0$. In the fifth case,
we consider vacuum Bianchi V metrics as well
as a special case of Bianchi type VII$_h$, which
compactify to an arbitrary closed oriented hyperbolizable
manifold $M$.

For these latter five Bianchi models that spatially 
compactify to manifolds of negative Yamabe type, 
one can consider the classical solutions from 
the point of view of Hamiltonian reduction. The 
starting point for this point of view is to use 
explicitly known vacuum metrics for the 
simplest “standard” metric forms, given, for exam- 
ple, in Wainwright and Ellis (1997). One need not 
consider all such possible spatially compact quoti- 
ents, even though that would appear to be quite 
feasible, but one need only consider some represen- 
tative examples for each of the Bianchi types listed. 

It can be shown by explicit calculation, using the
known solutions, that in the four nonhyperbolizable
cases where $\sigma(M) = 0$, each of the classical Bianchi
solutions gives rise to the existence of a positive
semiglobal nonconstant solution to the reduced Einstein
field equations and that along this solution, the reduced
Hamiltonian asymptotically approaches 0 under the
reduced Einstein flow, thereby confirming the expectation
that the reduced Hamiltonian asymptotically
approaches its infimum $((3/2)(-\sigma(M)))^{3/2} = 0$. Thus
in these cases the reduced Einstein flow conformally
volume-collapses the 3-manifold.

The explicit calculations also show the details of
this collapse. In the second and third models of
Bianchi type III, Thurston type $H^2 \times \mathbb{R}$, and
Bianchi type VIII, Thurston type $\widetilde{SL}(2, \mathbb{R})$, respectively,
the conformal metric degenerates along
embedded circular fibers and this metric degeneration
causes $M$ to collapse to its base manifold $\Sigma_p$,
$p \ge 2$. Since the collapse is along one-dimensional
fibers and since the two-dimensional base manifold
$\Sigma_p$ does not collapse, we refer to this type of
collapse as pancake collapse (see Figure 2).

In the fourth model of Bianchi type VI$_0$, Thurston
type Sol, the conformal metric degenerates along
embedded $T^2$-fibers and this metric degeneration
causes $M$ to collapse to its base manifold $S^1$. Since
the collapse is along two-dimensional fibers and
since the one-dimensional base manifold $S^1$ does not
collapse, we refer to this type of collapse as barrel
collapse (see Figure 3).

In the first model of Bianchi type II, Thurston type
Nil, as in the second and third models, the
conformal metric degenerates along embedded circular
fibers. Additionally, not only do the circular
fibers collapse but simultaneously the flat quotient
2-torus base manifold $T^2 \approx M/S^1$ of $M$ modulo its
circular fibers also collapses. Thus the metric
degeneration collapses $M$ to a point, exhibiting a
case of total collapse. Thus these model universes
provide examples of nonflat almost-flat manifolds
that exhibit total collapse with bounded curvature.
Since the conformal geometries of these model
universes collapse to a point, they aptly deserve
their name Nil (see Figure 4).

Figure 2 Bianchi III, Thurston type $H^2 \times \mathbb{R}$, $M = \Sigma_2 \times S^1$,
pancake collapses to $\Sigma_p$, $p = 2$. The conformal geometry starts
with an infinite $S^1$ fiber at the big bang ($t = 0^+$) and pancake
collapses with bounded curvature to $\Sigma_2$ at infinite cosmological
expansion ($t \to \infty$).

Figure 3 Bianchi VI$_0$, Thurston type Sol, nontrivial $T^2$-bundle
over $S^1$, barrel collapses to $S^1$. The conformal geometry evolves
from a base manifold $S^1$ at the big bang ($t = 0^+$). Instantaneously
after the big bang, flat $T^2$-fibers bloom out of the
collapsed $S^1$ state. The conformal metric then expands to a
maximum volume and then barrel collapses with bounded
curvature back to the base manifold $S^1$ at infinite physical
cosmological expansion ($t \to \infty$). The two facial 2-tori are flat
and are glued together by an orientation-reversing toral
automorphism so as to give a nontrivial $T^2$-bundle over $S^1$.
The gray-scale density grading along the tube also indicates the
nontriviality of the bundle.

Remarkably, in each of these four cases of 
collapse, the collapse occurs with bounded curva- 
ture, precisely as occurs in the totally different 
setting of the Cheeger-Gromov theory of collapsing 
Riemannian manifolds, recognized many years ago 
to be of importance in the understanding of the 
behavior of sequences of metrics with uniform 
curvature bound (see Gromov (1999) for references 
and Anderson (2004) for other applications of 
Cheeger-Gromov theory to general relativity).
What is somewhat remarkable is that the above 
cosmological models were constructed completely 
independently of that setting and thus provide 
naturally occurring cosmological models whose 
closed spatial hypersurfaces undergo conformal 
volume collapse and metric degeneration exactly as 
occurs in the theory of collapsing Riemannian 
manifolds. 

Of course, this volume collapse and metric
degeneration only occur as described in the conformal
variables. The physical variables behave
differently. Indeed, in contrast to the conformal
volume which collapses to zero in the first four cases
and is constant in the hyperbolizable case (see
below), the volume of the physical metric in all
five cases goes to infinity since the flow is
temporally oriented in the direction of infinite
cosmological expansion.

Figure 4 Bianchi II, Thurston type Nil, nontrivial $S^1$-bundle
over $T^2$, totally collapses to a point. The conformal geometry
evolves from a point at the big bang ($t = 0^+$). Instantaneously
after the big bang, the full 3-manifold blooms out from that point.
The conformal geometry then evolves to a metric of maximum
volume and then totally collapses with bounded curvature back
to a point at infinite physical cosmological expansion ($t \to \infty$).
The two 2-tori, represented here by doughnuts, are flat and
are glued together by an orientation-reversing toral automorphism
so as to give a nontrivial $S^1$-bundle over $T^2$.

In the fifth case where $M$ is hyperbolizable, $\sigma(M)$
is conjectured to be negative and to be determined
by the hyperbolic volume, $\sigma(M) = -(\operatorname{vol}(M, \gamma_h))^{2/3}$,
of the hyperbolic conformal metric $\gamma_h$ normalized so
that $R(\gamma_h) = -1$. In this case, $\gamma_h$ together with
$p^{TT} = 0$ is a fixed point for the reduced Einstein
flow so that trivially the conformal volume does
not collapse. Moreover, if $\sigma(M)$ is determined by
the volume of $\gamma_h$, then the constant reduced
Hamiltonian also trivially achieves its infimum
$H_{\mathrm{reduced}}(\tau, \gamma_h, 0) = ((3/2)(-\sigma(M)))^{3/2} = (3/2)^{3/2} \operatorname{vol}(M, \gamma_h)$,
again confirming the expectation for the
behavior of $H_{\mathrm{reduced}}$ on these Bianchi models.
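Hyperbolic volumes are usually quoted for metrics of sectional curvature $-1$, whereas the normalization used here is $R(\gamma_h) = -1$. The sketch below converts between the two and evaluates the fixed-point value of the reduced Hamiltonian. It rests on the standard scaling facts stated in the comments and on the conjectured relation between $\sigma(M)$ and the hyperbolic volume; the function names and the unit input volume are hypothetical.

```python
def volume_with_scalar_curvature_minus_one(vol_k_minus_one: float, n: int = 3) -> float:
    """Volume of gamma_h = n(n-1) h, where h has sectional curvature -1.

    A metric h with K = -1 has scalar curvature -n(n-1). Scaling h by a
    constant c divides the scalar curvature by c, so c = n(n-1) gives
    R(gamma_h) = -1, and the volume picks up the factor c**(n/2).
    """
    return (n * (n - 1)) ** (n / 2.0) * vol_k_minus_one


def conjectured_sigma(vol_gamma_h: float, n: int = 3) -> float:
    """sigma(M) = -(vol(M, gamma_h))**(2/n), assuming gamma_h realizes the infimum."""
    return 0.0 - vol_gamma_h ** (2.0 / n)


def fixed_point_hamiltonian(vol_gamma_h: float, n: int = 3) -> float:
    """H_reduced(tau, gamma_h, 0) = (n / (n - 1))**(n/2) * vol(M, gamma_h)."""
    return (n / (n - 1.0)) ** (n / 2.0) * vol_gamma_h


if __name__ == "__main__":
    vol_h = 1.0  # illustrative K = -1 volume, not a value for any specific manifold
    vol_gamma = volume_with_scalar_curvature_minus_one(vol_h)
    print(vol_gamma, conjectured_sigma(vol_gamma), fixed_point_hamiltonian(vol_gamma))
```

Under these assumptions the two factors combine, so that for $n = 3$ the fixed-point Hamiltonian is $27$ times the volume measured with the $K = -1$ normalization.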

Note that for this static case, the physical 
variables behave as described after [46] and as 
shown in Figure 1. Also note that in contrast to 
Figures 2-4 where the conformal geometry is 
depicted, Figure 1 depicts the physical geometry. 

Overall, in all five cases, subject in the hyperbolizable
case to a hyperbolic metric realizing the
$\sigma$-constant, the reduced Hamiltonian asymptotically
approaches its $\sigma$-constant infimum along the flow
lines of the reduced Einstein system. In doing so, the
volumes of the conformal metrics either go to zero
(in the first four cases) or to the hyperbolic volume
(in the hyperbolic case). In all five cases, the
curvature of the conformal metrics is uniformly
bounded.

Because the reduced Einstein field equations behave
as expected for the Bianchi models that we have
considered with spatially compactified manifolds
being either Seifert fibered, graph, or hyperbolizable,
it seems plausible that for a more complicated starting
manifold $M$, the reduced Einstein flow may induce a
decomposition of $M$ into geometric pieces. Indeed,
Anderson’s conjectures (Anderson 1997) predict how
a sequence of geometries with bounded curvature
approaching $\sigma(M)$ degenerates. Assuming these conjectures,
the asymptotic behavior of large classes of
Einstein spacetimes may perhaps be characterized
rather explicitly in terms of the geometrization
program of 3-manifolds (see the next section).

Conversely, it is conceivable that the damped 
hyperbolic system of equations defined by the reduced 
Einstein flow (with its strictly monotonically decreas- 
ing reduced Hamiltonian on nonconstant curves) 
could be used to try to establish some form of the 
geometrization conjectures for 3-manifolds, much like 
the parabolic system of equations defined by Ricci 
flow is currently being used. If such a program were to 
be successful, it would amount to a spectacular 
consequence of Einstein’s equations, implying as it 
does that geometrization may actually occur in nature. 


Possible Cosmological Applications 
of the Reduced Hamiltonian 


Astronomical observations strongly support the view 
that in a sufficiently coarse-grained sense, the universe 
is homogeneous and isotropic. Furthermore, it is 
expanding at such a rate, relative to its observable 
energy density, that it will continue to expand forever. 
The simplest cosmological model consistent with these 
properties and which has a vacuum limit is the $k = -1$
FLRW model. Spatially compactified variants of this 
model are still locally homogeneous and isotropic even 
though they are no longer globally so (see the 
discussion after [46]). Evidence for one or another of 
the infinitely many compactifications possible could be 
sought in patterns of fluctuations of the cosmic 
microwave background radiation and the detection 
of such patterns could be strong evidence for a 
spatially closed universe. 

However, is one really justified in extrapolating 
local observations of that portion of the universe 
visible to astronomers to a conclusion about its 
global topology? Could it be instead that there is a 
dynamical reason, provided by Einstein’s equations, 
for the observed fact that the universe seems to be 
locally homogeneous and isotropic and in such a 
state as to continue expanding forever? 

Suppose for the sake of argument that the 
universe has a more complicated topology, such as 


that of one of the generic $K(\pi, 1)$-manifolds which
does not admit a locally homogeneous and isotropic 
metric even though its hyperbolizable components 
would each individually do so. A plausible scenario 
suggested by the results in this article is that under 
the Einstein evolution, the reduced Hamiltonian given 
by [40] consisting of the rescaled spatial volume 
becomes asymptotically dominated in the future 
direction of cosmological expansion by the contribu- 
tion of the hyperbolizable components. On each of 
these components, the limiting conformal metric 
approaches local homogeneity and isotropy with the 
relative contribution of the graph-manifold constitu- 
ents, if any are present, collapsing asymptotically to a 
negligible fraction of the whole. The idea is that if 
structure formation develops sufficiently late in the 
evolution of such a universe, then it should occur, with 
overwhelmingly high probability, in those regions 
which dominate the conformal volume and admit an 
asymptotically locally homogeneous and isotropic 
metric of constant negative curvature, locally
indistinguishable from a $k = -1$ FLRW model.

One can speculate still further and imagine what 
happens if the spatial topology is not of prime type 
but rather consists of a connected sum of several 
$K(\pi, 1)$’s together perhaps with nontrivial spherical
manifolds $S^3/\Gamma$ and handles $S^1 \times S^2$. Here it seems
conceivable, especially in view of the expected 
tendency of spherical manifolds to “recollapse,” that 
the evolving universe would develop pinch-off singu- 
larities along the essential 2-spheres that separate the 
individual prime factors. Such singularities might 
occur in finite time between connected sums of 
spherical recollapsing factors or in infinite time 
between connected sums of $K(\pi, 1)$-factors. Similar
patterns of singularity formation are seen to occur in 
Ricci flow and must be treated in the resolution of 
the 3-manifold geometrization program. 

Of course there is no proof of such behavior for the
full (3 + 1)-dimensional Einstein gravity but for the
model problem of Einstein’s theory in (2 + 1) dimensions,
something close to a proof of the analogous
conjecture is already at hand. In the vacuum case,
which can be described rather explicitly, one can
construct the generic solution for a higher genus
surface topology by cutting open the corresponding
$k = -1$ FLRW model and gluing in the so-called
Kasner wedges. These wedges play the role of the
graph-manifold constituents of a generic $K(\pi, 1)$-manifold
in three dimensions and evolve anisotropically.
However, it is known rigorously in this case that
the rescaled spatial area $H_{\mathrm{reduced}}(\tau, \gamma, p^{TT}) = (-\tau)^2 \operatorname{Area}(\Sigma_p, g)$
is asymptotically exhausted by the
FLRW components with the contribution from the flat
Kasner anisotropic pieces shrinking to zero in this
limit. If certain types of matter sources are included,
for example, those analogous to terms which result
from Kaluza-Klein reduction of vacuum gravity in
(3 + 1) dimensions, then a similar result can be proved
at least for sufficiently small but fully nonlinear
perturbations away from the vacuum backgrounds
(see Choquet-Bruhat (2004)).

In fully general (3 + 1)-dimensional gravity, there 
are few known topologically general results beyond 
those mentioned earlier and the problem is compli- 
cated by the presence of gravitational waves (which 
are absent in (2+ 1) dimensions) and the fact that 
on such more general manifolds, there are no known 
“background” solutions to perturb about. However, 
for the special case of (future) vacuum evolution on 
a pure closed hyperbolizable manifold, one can show 
that if the initial data is sufficiently close to that of an 
FLRW model, then the fully nonlinear gravitational 
perturbations eventually die out leaving a locally 
homogeneous and isotropic model in the asymptotic 
limit (see Andersson and Moncrief (2004)). It seems 
likely that this result can be generalized to allow for 
the inclusion of various types of matter sources as in 
the (2 + 1)-dimensional case. 


Introduction 


In the study of differential systems, and particularly 
of Hamiltonian differential equations, a fundamen- 
tal problem is the question of their integrability. 
Because there are different definitions of this notion, 
a system which is integrable according to one 
definition can be nonintegrable according to another 
one. The notion of integrability is connected to the 
existence of a sufficiently large number of first 
integrals, which are linked to conservation laws. For 
a real analytic Hamiltonian system with n degrees of 
freedom, the “complete integrability” means the 
existence of n first integrals, which are functionally 
independent, and “in involution,” in the entire phase 
space. These integrals can be functions of class $C^r$ 
($r$ finite), $C^\infty$, or analytic. 

For the classical problems of Hamiltonian 
mechanics which are integrable, their first integrals 
can be continued into the complex domain of the 
variables, as one-valued holomorphic, or mero- 
morphic, functions of complex time. This fact leads 
to the concept of “complex integrability” of a 
system. Note that a real Hamiltonian system which 
is integrable may be nonintegrable in the complex 
domain, if the real first integrals cannot be con- 
tinued as one-valued holomorphic functions of the 
complex time. 

Generally, the branching of solutions of a system, 
as functions of complex time, is an obstruction to 
the existence of one-valued first integrals. To study 
this problem, one can, following Poincaré, expand 
the solutions in convergent series in a small 
parameter: this is the basis of “perturbation meth- 
ods,” and the main fact is that a small perturbation 
of an integrable Hamiltonian system generally 
destroys its integrability. Another method of proving 
nonintegrability consists of studying the linearized 
equations along a particular solution. This last 
direction has been exploited recently, in particular, 
through methods based on algebraic results inspired 
by differential Galois theory. 


Hamiltonian Systems and Mechanics 


Let us consider a conservative holonomic real 
dynamical system with n degrees of freedom: the 
positions of this system are points of an 
n-dimensional real manifold N (the state space or 
configuration space) with local coordinates 
$x_1, x_2, \ldots, x_n$. If the velocities are denoted by 
$\dot{x}_i = dx_i/dt$, we consider the Lagrangian function L 
associated to this system: 


$$L(x, \dot{x}) = T(x, \dot{x}) - V(x)$$


where $\dot{x} = (\dot{x}_1, \ldots, \dot{x}_n)$ is a tangent vector to the 
manifold N at the point $x = (x_1, \ldots, x_n)$. The kinetic 
energy $T(x, \dot{x})$ is a positive-definite quadratic form in 
$\dot{x}_1, \ldots, \dot{x}_n$, and V(x) is the potential energy, whose 
gradient determines the forces acting on the system. 
The motions $(x_1(t), x_2(t), \ldots, x_n(t))$ of the system 
on the manifold N are the extremals of the action 
integral $\int_{t_0}^{t_1} L(x_1, \ldots, x_n, \dot{x}_1, \ldots, \dot{x}_n)\,dt$ (“principle 
of stationary action of Hamilton”) and they are 
the solutions of the Euler-Lagrange system, which 
consists of n differential equations of second order 
for the coordinates $x_1, x_2, \ldots, x_n$ (Whittaker 1904): 


$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}_i}\right) = \frac{\partial L}{\partial x_i}, \qquad 1 \le i \le n$$

This system can be written in the Hamiltonian 
form: the Lagrangian L is a function defined on the 
tangent bundle TN of the state space N, with local 
coordinates $x_1, \ldots, x_n, \dot{x}_1, \ldots, \dot{x}_n$ (i.e., an element of 
TN consists of a point x of N together with a tangent 
vector to N at x). Now, we consider the cotangent 
bundle T*N: an element of T*N consists of a point x 
of N together with a cotangent vector to N at x, that is, a 
linear form defined on the tangent space to N at x. In 
local coordinates, the components of this linear form 
are $y_1, \ldots, y_n$, defined by $y_i = \partial L/\partial \dot{x}_i$; $y_1, \ldots, y_n$ are 
called the generalized momenta, or impulses. $x_i$ 
and $y_i$ are called conjugate canonical variables. 

The mapping from TN to T*N thus defined is the 
Legendre transformation (Abraham and Marsden 
1967). Through it, the Euler-Lagrange equations 
become a system of 2n differential equations of first 
order: 


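$$\frac{dx_i}{dt} = \frac{\partial H}{\partial y_i}, \qquad \frac{dy_i}{dt} = -\frac{\partial H}{\partial x_i}, \qquad 1 \le i \le n$$


where $H(x_1, \ldots, x_n, y_1, \ldots, y_n) = T(x_1, \ldots, x_n, y_1, \ldots, y_n) + V(x_1, \ldots, x_n)$. 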

H is the Hamiltonian function of this system. The 
solutions of these differential equations are curves 
on the 2n-dimensional manifold T*N, whose projec- 
tions onto the n-dimensional state manifold N coincide 
with the solutions of the Lagrangian system. T*N is 
called the phase space of the system. The right-hand 
sides of the differential system define a vector 
field on the phase space. 
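
As a worked illustration of the Legendre transformation, the following sketch treats a single particle on a line. The mass m and the convention L = T − V with V the potential energy are assumptions made for this example only.

```latex
\begin{aligned}
L(x, \dot{x}) &= \tfrac{1}{2} m \dot{x}^2 - V(x), \\
y &= \frac{\partial L}{\partial \dot{x}} = m \dot{x}, \\
H(x, y) &= y\,\dot{x} - L(x, \dot{x}) = \frac{y^2}{2m} + V(x), \\
\frac{dx}{dt} &= \frac{\partial H}{\partial y} = \frac{y}{m}, \qquad
\frac{dy}{dt} = -\frac{\partial H}{\partial x} = -V'(x).
\end{aligned}
```

Eliminating y from the last line gives back $m\ddot{x} = -V'(x)$, which is the Euler-Lagrange equation of L.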

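Let M = T*N. On this 2n-dimensional manifold, 
consider the standard symplectic form $\Omega = \sum_{i=1}^{n} dy_i \wedge dx_i$. If f and g are $C^\infty$-functions on M, 
we define their Poisson bracket {f, g} in local 
coordinates by 


$$\{f, g\} = \sum_{i=1}^{n} \left( \frac{\partial f}{\partial x_i}\frac{\partial g}{\partial y_i} - \frac{\partial f}{\partial y_i}\frac{\partial g}{\partial x_i} \right)$$


It makes the space $C^\infty(M)$ a Lie algebra over R. 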

Then, if $H \in C^\infty(M)$ is the Hamiltonian function 
associated to a system, the corresponding Hamilto- 
nian equations can be written as the following 2n 
“canonical equations” (Arnol’d 1976): 


$$\frac{dx_i}{dt} = \frac{\partial H}{\partial y_i}, \qquad \frac{dy_i}{dt} = -\frac{\partial H}{\partial x_i}, \qquad 1 \le i \le n \qquad [1]$$


A function $F \in C^\infty(M)$ is a (first) integral of eqns [1] 
if it is constant along any solution of [1], that is, if it 
verifies: {F, H} = 0. Thus, a first integral is a quantity 
which is preserved along a solution (“conservation 
law”). In particular, H itself is a first integral of the eqns 
[1]. It represents the “total energy” of the system. 
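
The criterion {F, H} = 0 can be checked numerically. The sketch below is only an illustration: it assumes the sign convention $\{f, g\} = \sum_i (\partial_{x_i} f\, \partial_{y_i} g - \partial_{y_i} f\, \partial_{x_i} g)$ (the vanishing of the bracket does not depend on this choice), approximates the partial derivatives by central differences, and uses a planar isotropic oscillator and its angular momentum as hypothetical test functions.

```python
def partial_derivative(func, x, y, index, wrt_x, step=1e-5):
    """Central-difference estimate of d(func)/dx_index or d(func)/dy_index."""
    x_plus, x_minus = list(x), list(x)
    y_plus, y_minus = list(y), list(y)
    if wrt_x:
        x_plus[index] += step
        x_minus[index] -= step
    else:
        y_plus[index] += step
        y_minus[index] -= step
    return (func(x_plus, y_plus) - func(x_minus, y_minus)) / (2.0 * step)


def poisson_bracket(f, g, x, y):
    """Approximate {f, g} at the phase-space point (x, y)."""
    total = 0.0
    for i in range(len(x)):
        total += partial_derivative(f, x, y, i, True) * partial_derivative(g, x, y, i, False)
        total -= partial_derivative(f, x, y, i, False) * partial_derivative(g, x, y, i, True)
    return total


def hamiltonian(x, y):
    # Hypothetical example: planar isotropic harmonic oscillator.
    return 0.5 * (y[0] ** 2 + y[1] ** 2) + 0.5 * (x[0] ** 2 + x[1] ** 2)


def angular_momentum(x, y):
    # Conserved for any rotationally symmetric Hamiltonian.
    return x[0] * y[1] - x[1] * y[0]


def first_coordinate(x, y):
    # Not a first integral: its bracket with H equals y[0].
    return x[0]


point_x, point_y = [0.3, -1.2], [0.7, 0.4]
print(poisson_bracket(angular_momentum, hamiltonian, point_x, point_y))
print(poisson_bracket(first_coordinate, hamiltonian, point_x, point_y))
```

The first printed value is zero up to rounding, while the second reproduces $dx_1/dt = y_1$ at the chosen point, so the bracket with H also gives the time derivative of any observable along the flow.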


1. The simplest example of a Hamiltonian system is 
the harmonic oscillator defined by the one-degree-of- 
freedom Hamiltonian: 


$$H(x, y) = \frac{1}{2} y^2 + \frac{1}{2} x^2$$


It possesses the energy integral H. Thus, the trajectories 
in the phase space $\mathbb{R}^2$ (phase plane) are given by 
$x^2 + y^2 = 2h$, which are concentric circles if the 
constant energy satisfies h > 0. The phase space $\mathbb{R}^2$ is 
foliated by these circles. The system is said to be 
“integrable.” Obviously, it is also possible to construct 
Hamiltonian systems with n degrees of freedom 
(n > 1), by coupling n harmonic oscillators, with a 
Hamiltonian defined by 


$$H(x_1, \ldots, x_n, y_1, \ldots, y_n) = \frac{1}{2}\sum_{i=1}^{n} y_i^2 + \sum_{i=1}^{n} a_i x_i^2$$


with constant coefficients $a_i > 0$. 

2. Another example of a Hamiltonian system with one 
degree of freedom is the simple mathematical pendu- 
lum. The state coordinate is the angle $\theta$ of the 
pendulum with the vertical axis, defined modulo $2\pi$. 
The phase space is $M = S^1 \times \mathbb{R}$ ($x = \theta \pmod{2\pi} \in S^1$, $y \in \mathbb{R}$), that is, a cylinder. The Hamiltonian function 
is $H(x, y) = (1/2) y^2 - \cos x$; H is a first integral of the 
differential equations, the system is integrable and the 
trajectories on the cylinder $S^1 \times \mathbb{R}$ are defined by 
$(1/2) y^2 - \cos x = h$. According to the constant value 
of h on each phase curve, the solutions are periodic 
oscillations of the pendulum (if $h < h_0$), periodic 
solutions of rotation where the angle varies mono- 
tonically with time (if $h > h_0$), two equilibria (one 
stable, one unstable) and solutions which “begin” 
when $t \to -\infty$ at the unstable equilibrium and “finish” 
when $t \to +\infty$ at the same point (if $h = h_0$): the 
corresponding phase curves are called “separatrices.” 
Here $h_0 = 1$ is the energy of the unstable equilibrium. 
3. The system of Hénon-Heiles (Hénon and Heiles 
1964) is a system with two degrees of freedom. The 
phase space is $\mathbb{R}^2 \times \mathbb{R}^2$ and the Hamiltonian is 
defined by 


$$H(x_1, x_2, y_1, y_2) = \frac{1}{2}(y_1^2 + y_2^2) + \frac{1}{2}(x_1^2 + x_2^2) + x_1^2 x_2 - \lambda x_2^3$$


where $\lambda$ is a real constant. This system is “integrable” 
for some isolated values of the parameter $\lambda$ (Ziglin 
1983) and “nonintegrable” otherwise. Of course, it is 
necessary to define the integrability of a Hamiltonian 
system, although according to Poincaré: “A system of 
differential equations is only more or less integrable.” 
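
The classification of the pendulum's phase curves by the value of the energy (example 2 above) can be sketched in a few lines. This is an illustrative helper, not part of the original text: it assumes the Hamiltonian $H(x, y) = y^2/2 - \cos x$, for which the stable equilibrium has energy −1 and the unstable one has energy $h_0 = 1$; the function names and the tolerance are hypothetical.

```python
import math

H_MINIMUM = -1.0      # energy of the stable equilibrium (x = 0, y = 0)
H_SEPARATRIX = 1.0    # h0: energy of the unstable equilibrium (x = pi, y = 0)


def pendulum_energy(x, y):
    """First integral H(x, y) = y^2/2 - cos x of the simple pendulum."""
    return 0.5 * y * y - math.cos(x)


def classify_motion(x, y, tol=1e-9):
    """Name the type of phase curve passing through the point (x, y)."""
    h = pendulum_energy(x, y)
    if abs(h - H_MINIMUM) <= tol:
        return "stable equilibrium"
    if abs(h - H_SEPARATRIX) <= tol:
        # On the level h = h0 the momentum vanishes only at x = pi.
        return "unstable equilibrium" if abs(y) <= tol else "separatrix"
    return "oscillation" if h < H_SEPARATRIX else "rotation"


for point in [(0.0, 0.0), (0.5, 0.0), (0.0, 3.0), (math.pi, 0.0),
              (1.0, 2.0 * math.cos(0.5))]:
    print(point, classify_motion(*point))
```

The last sample point uses the explicit separatrix $y = 2\cos(x/2)$, which follows from $y^2/2 = 1 + \cos x$.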


Integrability of Hamiltonian Systems 


Generally, if a differential system is of order p, it is 
necessary to know p first integrals to integrate it. But if 
the system is Hamiltonian of order 2n, only n first 
integrals are sufficient to integrate it “by quadratures,” 
that is, by “algebraic” operations such as integrations 
and inverting of functions. The reason is that the 
existence of one first integral allows us to reduce the 
order of the system by two: a system of order 2n with 
one first integral can be reduced to order 2n − 2. 


Theorem of Liouville (see Arnol’d (1976)). Sup- 
pose that $F_1, F_2, \ldots, F_n \in C^\infty(M)$ are n first integrals 
of the Hamiltonian system [1] which are “in 
involution,” that is, such that $\{F_i, F_j\} = 0$ for all $i, j$, and 
suppose that they are functionally independent, that 
is, the n differentials $dF_i$ are linearly independent at 
each point of the level set $M_f$ defined by 


$$M_f = \{(x_1, \ldots, x_n, y_1, \ldots, y_n) \in M : F_i(x_1, \ldots, x_n, y_1, \ldots, y_n) = f_i,\ 1 \le i \le n\}$$

Then 
(i) the set $M_f$ is a manifold which is invariant 
under the flow of the system [1]; 
(ii) if $M_f$ is compact and connected, it is diffeo- 
morphic to an n-dimensional torus 
$$T^n = S^1 \times \cdots \times S^1 = \{(\varphi_1, \ldots, \varphi_n) : \varphi_i \in \mathbb{R}/2\pi\mathbb{Z}\};$$
(iii) the Hamiltonian flow on each torus $M_f$ is linear 
and “quasiperiodic” with frequencies $\omega_i$ defined 
by $d\varphi_i/dt = \omega_i(f_1, f_2, \ldots, f_n)$; and 

(iv) the Hamiltonian equations are integrable by 
quadratures. 


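If a Hamiltonian satisfies the assumptions of the 
theorem of Liouville, one can prove that there exist, 
locally, canonical coordinates $\varphi = (\varphi_1, \ldots, \varphi_n)$ and 
$I = (I_1, \ldots, I_n)$ such that the Hamiltonian function 
depends only on the variables $I_i$. Then 


$$\frac{d\varphi_i}{dt} = \frac{\partial H}{\partial I_i}, \qquad \frac{dI_i}{dt} = -\frac{\partial H}{\partial \varphi_i} = 0$$


These equations are immediately integrated as 
follows: 


$$I_i = \text{constant}, \qquad \varphi_i = \omega_i \cdot t + \varphi_i(0), \qquad \text{with } \omega_i(I_1, \ldots, I_n) = \left.\frac{\partial H}{\partial I_i}\right|_{I = \mathrm{cst}}$$


Such local coordinates $(\varphi_i, I_i)$ are called “action- 
angle” variables. They were defined for the first time 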
by Delaunay and they play an important part in the 
theory of perturbations. 
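
For the harmonic oscillator $H = \tfrac{1}{2}(y^2 + x^2)$, action-angle variables can be written down explicitly. The derivation below is a sketch under the usual definition of the action as the area enclosed by the level curve divided by $2\pi$; this normalization is an assumption not stated in the text.

```latex
\begin{aligned}
&\text{Level curve: } x^2 + y^2 = 2h \quad (\text{circle of radius } \sqrt{2h}), \\
&I = \frac{1}{2\pi}\oint y\,dx = \frac{1}{2\pi}\,\pi\,(2h) = h
  \quad\Longrightarrow\quad H = I, \qquad \omega = \frac{\partial H}{\partial I} = 1, \\
&x = \sqrt{2I}\,\sin\varphi, \qquad y = \sqrt{2I}\,\cos\varphi, \\
&dy \wedge dx = \cos^2\!\varphi\; dI \wedge d\varphi + \sin^2\!\varphi\; dI \wedge d\varphi
  = dI \wedge d\varphi, \\
&I(t) = I(0), \qquad \varphi(t) = t + \varphi(0).
\end{aligned}
```

The fourth line shows that the change of variables preserves the symplectic form, with $\varphi$ in the role of x and I in the role of y.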


Remark An invariant torus $T^n$ of the theorem of 
Liouville is characterized by the constant values of the 
actions $I_i$, which determine the frequencies $\omega_i$ on it. 
Such a torus is said to be nonresonant if the relation 
between the frequencies $\sum_{i=1}^{n} k_i \omega_i = 0$ (where 
$k_1, \ldots, k_n$ are integers) implies that $k_i = 0$ for all $i$. The 
frequencies $\omega_i$ are then rationally independent. If a 
torus is nonresonant, each phase trajectory is dense 
in it and the motion on it is quasiperiodic. 


A torus is said to be resonant if the frequencies $\omega_i$ 
are rationally dependent: they satisfy a relation 
$\sum_{i=1}^{n} k_i \omega_i = 0$ with $(k_1, \ldots, k_n) \neq (0, \ldots, 0)$. Then 
the phase trajectories are not dense on the torus; 
they belong to tori of lower dimension. 
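
A resonance relation of bounded order can be looked for by brute force. The sketch below is a hypothetical helper that assumes the frequencies are given exactly, as `Fraction` values; it can confirm that a torus is resonant, but an empty search only shows that no relation exists up to the chosen order, never that the torus is nonresonant.

```python
from fractions import Fraction
from itertools import product


def find_resonance(frequencies, max_order):
    """Return a nonzero integer vector k with sum(k_i * w_i) == 0 and
    |k_i| <= max_order, or None if no such vector exists."""
    span = range(-max_order, max_order + 1)
    for k in product(span, repeat=len(frequencies)):
        if any(k) and sum(ki * wi for ki, wi in zip(k, frequencies)) == 0:
            return k
    return None


# Frequencies in the ratio 2 : 3 satisfy 3*w1 - 2*w2 = 0.
omega = [Fraction(1), Fraction(3, 2)]
print(find_resonance(omega, 1))  # no relation with |k_i| <= 1
print(find_resonance(omega, 3))  # a relation appears at order 3
```

Exact rational arithmetic is used because a floating-point test of $\sum k_i \omega_i = 0$ would be unreliable.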

A consequence of the theorem of Liouville is that, 
if a two-degree-of-freedom Hamiltonian system 
possesses one first integral F (in addition to H, and 
independent of H), it is integrable because F is 
necessarily in involution with H: {F, H}=0. 

An example of a system with three degrees of 
freedom which is integrable is the Lagrange 
symmetric top with one fixed point (the moments of 
inertia have an axial symmetry and the 
center of mass is on the symmetry axis). This system 
possesses three first integrals that are in involution 
and independent: H, and the angular momenta $M_z$ 
and $M_3$, which correspond to the (constant) 
frequencies of precession and nutation of the top. 
The level sets $M_f$ are here tori of dimension 3, which 
are indexed by the three frequencies (or by the 
constant values of the three integrals). 

There are other integrable cases for this problem 
of a rigid body with a fixed point (see Kozlov 
(1983)): Euler’s case (when the fixed point is the 
center of mass); Kowalevskaya’s case (in which 
the moments of inertia satisfy two relations and the 
third coordinate of the center of mass vanishes; see 
Kowalevski (1889)); and the Goryachev-Chaplygin 
case, which is integrable only on a single integral 
level. 

A fundamental and classical example of an integrable 
Hamiltonian system is Kepler’s problem: the 
motion of a point mass in the gravitational 
(Newtonian) field of a center, for instance, a planet 
in the field of attraction of the Sun. 

Another example is the problem of two fixed centers: 
an infinitesimal mass in the field of two centers, a problem 
which was integrated by Lagrange (Lagrange, 1810). 


Isolated Periodic Orbits 
and Nonintegrability 


We consider a real Hamiltonian system with n 
degrees of freedom and we suppose that there exists 
a particular T-periodic solution $\Gamma_T$ (which is not an 
equilibrium). Along $\Gamma_T$, we consider the linearized 
equations deduced from the Hamiltonian system. 
They can be decoupled into the tangential equation 
(one degree of freedom) which possesses the first 
integral dH and the normal variational system which 
can be written as 


$$\frac{d\xi}{dt} = J \cdot K(\Gamma_T(t)) \cdot \xi \qquad [2]$$


where 


$$J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$$


is the standard symplectic matrix of order 2(n − 1) 
and $K(\Gamma_T(t))$ is a T-periodic matrix depending on 
the solution $\Gamma_T$. 

The solutions of the linear system [2] form a 
vector space. As a definition, the monodromy 
matrix M(T) expresses how fundamental solutions 
of the linear system [2] are transformed after one 
period T, that is, along the periodic closed orbit $\Gamma_T$: 


$$\xi(t + T) = M(T) \cdot \xi(t)$$


Poincaré showed that if one of the eigenvalues of 
M(T) is different from 1, then the periodic solution 
$\Gamma_T$ is isolated. Furthermore, if the number of first 
integrals of the Hamiltonian system, independent 
along $\Gamma_T$, is equal to k, then at least 2k eigenvalues 
of M(T) are equal to 1. 


Theorem (Poincaré 1892). If the Hamiltonian 
system possesses n integrals in involution, and 
independent along a periodic solution $\Gamma_T$, then $\Gamma_T$ 
is nonisolated. 


Then, if the Hamiltonian system possesses a dense 
set of isolated periodic orbits, it cannot have 
integrals in involution and independent in an open 
domain. 
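
A monodromy matrix can be approximated by integrating a linear system with periodic coefficients over one period, starting from the columns of the identity matrix. The sketch below is only an illustration: instead of a variational equation along an actual periodic orbit, it uses the hypothetical Hill-type system $dq/dt = p$, $dp/dt = -(a + b\cos t)\,q$ with period $2\pi$, and a classical fourth-order Runge-Kutta scheme; the parameters a, b and the step count are assumptions.

```python
import math


def rhs(t, state, a, b):
    """Right-hand side of dq/dt = p, dp/dt = -(a + b cos t) q."""
    q, p = state
    return (p, -(a + b * math.cos(t)) * q)


def rk4_step(t, state, h, a, b):
    """One classical Runge-Kutta step of size h."""
    k1 = rhs(t, state, a, b)
    k2 = rhs(t + h / 2, tuple(s + h / 2 * k for s, k in zip(state, k1)), a, b)
    k3 = rhs(t + h / 2, tuple(s + h / 2 * k for s, k in zip(state, k2)), a, b)
    k4 = rhs(t + h, tuple(s + h * k for s, k in zip(state, k3)), a, b)
    return tuple(s + h / 6 * (c1 + 2 * c2 + 2 * c3 + c4)
                 for s, c1, c2, c3, c4 in zip(state, k1, k2, k3, k4))


def monodromy(a, b, steps=4000):
    """Approximate M(T), T = 2*pi: column j is the image of basis vector j."""
    period = 2.0 * math.pi
    h = period / steps
    images = []
    for start in ((1.0, 0.0), (0.0, 1.0)):
        t, state = 0.0, start
        for _ in range(steps):
            state = rk4_step(t, state, h, a, b)
            t += h
        images.append(state)
    return [[images[0][0], images[1][0]],
            [images[0][1], images[1][1]]]


def determinant(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def trace(m):
    return m[0][0] + m[1][1]


m = monodromy(0.3, 0.5)
print("det M =", determinant(m), " trace M =", trace(m))
```

For a 2 × 2 symplectic matrix the determinant is 1 and the eigenvalues are $\lambda, 1/\lambda$, so both eigenvalues equal 1 only when the trace equals 2.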


Nearly Integrable Hamiltonian Systems, 
Theorem of Poincaré 


Consider the Hamiltonian system with n degrees of 
freedom, depending on a small real parameter 
$\varepsilon \in (-\varepsilon_0, +\varepsilon_0)$, defined by the analytic function H: 


$$H(\varphi, I, \varepsilon) = H_0(I) + \varepsilon \cdot H_1(\varphi, I) \qquad [3]$$


where $\varphi = (\varphi_1, \ldots, \varphi_n) \in T^n$, $I = (I_1, \ldots, I_n) \in \mathbb{R}^n$, and 
where $H_1$ is periodic in the angles $\varphi_i$. 

This system is called “nearly integrable” because 
when $\varepsilon = 0$, the “unperturbed system” $H_0$ is integr- 
able in the action-angle variables $\varphi$, $I$: 


$$H(\varphi, I, 0) = H_0(I)$$
then 


$$\frac{dI}{dt} = 0, \qquad \frac{d\varphi}{dt} = \frac{\partial H_0}{\partial I} = \omega(I)$$
a system which can be integrated by quadratures: 
$$I = I^0 \quad \text{and} \quad \varphi = \varphi^0 + \omega(I^0) \cdot t$$


According to the theorem of Liouville, the motion of 
the unperturbed problem takes place on n-dimen- 
sional tori $(S^1)^n$ in the phase space. On these 
invariant tori, indexed by the actions I, the motion 
is generally quasiperiodic (if the frequencies $\omega(I)$ are 
rationally independent). 

We are now interested in studying the perturbed 
system [3] with $\varepsilon \neq 0$, and its integrability which is, 
according to Poincaré (1892), “the fundamental 
problem of dynamics.” This problem of nearly 
integrable Hamiltonian systems is directly inspired 
by celestial mechanics where the motions in the 
solar system are, in a first approximation, described 
by the (integrable) Kepler’s problem. In particular, 
the “restricted three-body problem” is the study of the 
motion of a planet in the gravitational field of the Sun, 
with the perturbative attraction of Jupiter. It is also the 
problem of the Moon in the field of the Earth, with 
the perturbative attraction of the Sun (Poincaré 1892). 


Theorem of Poincaré (Poincaré 1892). Assume 
that, in the Hamiltonian function [3]: 
(i) (nondegeneracy condition) the unperturbed 
Hamiltonian $H_0$ is nondegenerate, that is, 


$$\det\left(\frac{\partial^2 H_0}{\partial I_i\,\partial I_j}\right) = \det\left(\frac{\partial \omega_i}{\partial I_j}\right) \neq 0$$


in an open domain of the phase space; 
(ii) (genericity condition) no coefficient $h_k(I)$ in the 
Fourier expansion of $H_1$ with 


$$H_1(\varphi, I) = \sum_{k \in \mathbb{Z}^n} h_k(I) \cdot e^{i k \cdot \varphi}$$


vanishes identically in the nonresonant 
domain $G \subset \mathbb{R}^n$ of the actions defined by 


$$G = \left\{ I \in \mathbb{R}^n : \sum_{i=1}^{n} k_i\,\omega_i(I) = 0 \ \text{iff}\ (k_1, \ldots, k_n) = (0, \ldots, 0) \right\}$$


then there is no analytic first integral $F(\varphi, I, \varepsilon)$ 
independent of the Hamiltonian function H. 


Thus, a perturbation of a nondegenerate integrable 
Hamiltonian system is generically nonintegrable. 

When one wants to apply this theorem to celestial 
mechanics, a peculiarity is that the unperturbed 
problem corresponds to the Keplerian system, which 
is degenerate, and this is a specific difficulty of these 
systems. 
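
The nondegeneracy condition can be illustrated on three unperturbed Hamiltonians with two degrees of freedom. These are examples chosen for this sketch: the constants $\omega_1, \omega_2$ are assumed fixed, and the Kepler Hamiltonian is written in Delaunay-type action variables (L, G) with the gravitational parameter set to 1.

```latex
\begin{aligned}
H_0 &= \tfrac{1}{2}\left(I_1^2 + I_2^2\right): &
\det\!\left(\frac{\partial^2 H_0}{\partial I_i\,\partial I_j}\right)
  &= \det\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = 1 \neq 0
  &&\text{(nondegenerate)}, \\[4pt]
H_0 &= \omega_1 I_1 + \omega_2 I_2: &
\det\!\left(\frac{\partial^2 H_0}{\partial I_i\,\partial I_j}\right)
  &= \det\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} = 0
  &&\text{(degenerate)}, \\[4pt]
H_0 &= -\frac{1}{2L^2}: &
\det\begin{pmatrix}
  \partial^2 H_0/\partial L^2 & \partial^2 H_0/\partial L\,\partial G \\
  \partial^2 H_0/\partial G\,\partial L & \partial^2 H_0/\partial G^2
\end{pmatrix}
  &= \det\begin{pmatrix} -3/L^4 & 0 \\ 0 & 0 \end{pmatrix} = 0
  &&\text{(degenerate)}.
\end{aligned}
```

In the last case the Hamiltonian depends on only one of the actions, so the frequency map cannot be inverted; this is the degeneracy of the Keplerian system mentioned above.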


Splitting of Separatrices and 
Nonintegrability 


Consider a Hamiltonian system with n = 2 (degrees 
of freedom) defined as in eqn [3] by a perturbation 
of an integrable Hamiltonian: 


$$H(\varphi_1, \varphi_2, I_1, I_2, \varepsilon) = H_0(I_1, I_2) + \varepsilon \cdot H_1(\varphi_1, \varphi_2, I_1, I_2) \qquad [4]$$


The unperturbed problem is integrable and its four- 
dimensional phase space is foliated by two-dimensional 
invariant tori $T^2$: I = constant. If $H_0$ is nondegenerate, 
the nonresonant tori are dense and the resonant tori also 
are dense in the phase space. 

According to Kolmogorov’s theorem and 
the Kolmogorov-Arnol’d-Moser (KAM) theory 
(Arnol’d 1985), the majority of the nonresonant 
tori of the unperturbed problem $H_0$ are preserved 
in the full problem [4]: they are slightly 
deformed, and are invariant in the perturbed 
system. The resonant tori of $H_0$ are destroyed in 
the perturbed problem. 

Now we consider, in the phase space, a transverse 
surface S to the invariant tori $T^2$ of the perturbed 
system. A trajectory of the system generated by [4], 
which crosses S through a point $w_0$, will cross S 
again, for the first time, through a point $w_1$: this 
defines the “first return map” or “Poincaré map” 
$R: w_0 \mapsto R(w_0) = w_1$. S is called a Poincaré sec- 
tion. If $w_0$ belongs to a preserved invariant torus 
of the perturbed system, the successive points 
$w_0$, $w_1 = R(w_0)$, $w_2 = R(w_1)$, $w_3 = R(w_2)$, ... belong 
to the intersection of this torus with S; thus, they 
belong to a curve diffeomorphic to a circle, which is 
an invariant curve of the map R. If $w_0$ does not 
belong to a preserved invariant torus of [4], the 
sequence of points $w_0, w_1, w_2, \ldots$ generated by the 
Poincaré map belongs to a curve much more 
complicated than a curve diffeomorphic to a 
circle (Poincaré 1890, Arnol’d 1985) and the 
“chaotic” behavior of this sequence is the mark of 
the nonintegrability of the system [4]. 

The best way of numerically showing the 
“evidence” of nonintegrability is to study the 
example of a system with “one and a half” degrees 
of freedom, that is, a system with one degree of 
freedom whose Hamiltonian depends on time: 
$H(\varphi, I, t)$. An example of such a system is the 
problem of a mathematical pendulum whose length 
l performs periodic oscillations, defined by the 
Hamiltonian function 


$$H(\varphi, p, t) = \frac{p^2}{2} - \omega^2 (1 + \varepsilon \cdot f(t)) \cdot \cos\varphi \qquad [5]$$
where $\varphi \in S^1$, $p \in \mathbb{R}$, and f is periodic of period T. 
The unperturbed system ($\varepsilon = 0$) is integrable (one 
degree of freedom with a Hamiltonian independent 
of t): 
$$H_0(\varphi, p) = \frac{p^2}{2} - \omega^2 \cos\varphi$$


The phase portrait of this problem is similar to the 
one of the simple mathematical pendulum of 
constant length: on the cylinder $S^1 \times \mathbb{R}$ there are 
two equilibria (stable and unstable) and separatrices 
“beginning” and “finishing” at the hyperbolic point 
$\chi$. The invariant stable and unstable manifolds 
associated to $\chi$ and represented by these separatrices 
were called “homoclinic” trajectories by Poincaré, 
because each of them, drawn on the phase cylinder, 
joins the equilibrium $\chi$ to itself. 

If $\varepsilon \neq 0$, we define a Poincaré section of the 
perturbed system [5] in the following way: from an 
initial point $w_0(\varphi_0, p_0, t_0)$, we consider the successive 
planes perpendicular to the t-axis in the “extended” 
phase space $\{(\varphi, p, t)\}$, defined by $t_0$, $t_1 = t_0 + T$, 
$t_2 = t_0 + 2T$, $t_3 = t_0 + 3T$, ... and we look at the 
successive intersections of the orbit of $w_0$ with 
these planes: $w_0, w_1, w_2, \ldots$. If we identify all the 
successive planes and if we draw on the same 
picture the points $w_0, w_1, w_2, \ldots$, we obtain a 
phase portrait in which the equilibria of the 
unperturbed problem are present, but the separatrix 
which “leaves” the point $\chi$ does not coincide with 
the separatrix which “ends” at $\chi$, as it does in the 
unperturbed problem: the two invariant curves are 
transversal to each other: they “split” and have an 
infinite number of intersections. This splitting is the 
manifestation of the nonintegrability of the perturbed 
system [5]. 
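
The stroboscopic construction just described can be sketched numerically. The code below is an illustration under stated assumptions: it takes the Hamiltonian in the form $H = p^2/2 - \omega^2 (1 + \varepsilon f(t)) \cos\varphi$ with the hypothetical choices $\omega = 1$ and $f(t) = \cos t$ (period $T = 2\pi$), integrates with a classical fourth-order Runge-Kutta scheme, and records the state once per period.

```python
import math

OMEGA = 1.0               # assumed pendulum frequency
PERIOD = 2.0 * math.pi    # period T of the assumed forcing f


def forcing(t):
    """Hypothetical periodic modulation f(t) of period T."""
    return math.cos(t)


def rhs(t, phi, p, eps):
    """Hamilton's equations: dphi/dt = p, dp/dt = -omega^2 (1 + eps f) sin(phi)."""
    return p, -OMEGA ** 2 * (1.0 + eps * forcing(t)) * math.sin(phi)


def rk4_step(t, phi, p, h, eps):
    a1, b1 = rhs(t, phi, p, eps)
    a2, b2 = rhs(t + h / 2, phi + h / 2 * a1, p + h / 2 * b1, eps)
    a3, b3 = rhs(t + h / 2, phi + h / 2 * a2, p + h / 2 * b2, eps)
    a4, b4 = rhs(t + h, phi + h * a3, p + h * b3, eps)
    return (phi + h / 6 * (a1 + 2 * a2 + 2 * a3 + a4),
            p + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4))


def stroboscopic_points(phi0, p0, eps, n_periods, steps_per_period=2000):
    """Return the points w_0, w_1, ..., w_n of the orbit sampled at t = k*T."""
    h = PERIOD / steps_per_period
    t, phi, p = 0.0, phi0, p0
    points = [(phi, p)]
    for _ in range(n_periods):
        for _ in range(steps_per_period):
            phi, p = rk4_step(t, phi, p, h, eps)
            t += h
        points.append((phi, p))
    return points


def unperturbed_energy(phi, p):
    return 0.5 * p * p - OMEGA ** 2 * math.cos(phi)


for eps in (0.0, 0.3):
    energies = [unperturbed_energy(*w) for w in stroboscopic_points(1.0, 0.0, eps, 5)]
    print(eps, energies)
```

For $\varepsilon = 0$ all section points lie on one level curve of $H_0$. For $\varepsilon \neq 0$, $H_0$ is no longer a first integral, and plotting many such points for initial conditions near the separatrix is the usual way to visualize the splitting.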

A method to detect this splitting of separatrices 
consists of computing the Melnikov function, which 
gives a measure of the angle between the separatrices 
at their first intersection when they split. 

Many concrete Hamiltonian systems have been 
studied by this method and numerical investigations 
on the splitting have permitted detection of their 
nonintegrability. 


Topological Obstructions to Integrability 


We are interested in a natural mechanical system 
with two degrees of freedom and we suppose that 
the state space N is a real analytic surface which is 
compact and orientable. Then, N consists of a two- 
dimensional sphere with k handles (or a torus with k 
holes). The number k is a topological invariant of 
the surface and is called the genus of N. 

Let H be the Hamiltonian function associated to 
this problem. The Hamiltonian system possesses the 
first integral H. It is completely integrable if and 
only if another analytic integral F exists, function- 
ally independent of H. In this case, the state space N 
belongs necessarily to a very restrictive class of 
surfaces. 


Theorem (Kozlov 1983). If the genus k of the 
state manifold N is not equal to 0 or 1 (i.e., if N is 
neither diffeomorphic to the sphere $S^2$ nor to the torus 
$T^2$), then the Hamiltonian system generated by H does 
not possess a first integral, analytic on T*N and 
functionally independent of the energy integral H. 


Note that this theorem does not apply to first 
integrals which are $C^\infty$ only, and examples can be 
given which illustrate this case (Kozlov 1983, 1989). 

For systems with more than two degrees of 
freedom, it is an open question whether 
complete integrability imposes restrictions on the 
topology of the state manifold N. 


Singular Point Analysis, Branching 
of Solutions and Ziglin’s Theory 


If we look at the classical Hamiltonian problems 
which have been integrated, their first integrals are 
real functions which can be continued in the 
complex domain as one-valued holomorphic or 
meromorphic functions of the complex time t 
(polynomials, rational functions, etc.). This fact 
leads to the concept of “complex integrability.” 
But the nonintegrability of a complex Hamiltonian 
system does not imply the nonintegrability of its 
restriction to the real domain: it may happen that a 
real analytic first integral does not possess a 
continuation in the complex domain as a mero- 
morphic function. 

Adopting this point of view, S Kowalevskaya 
(Kowalevski 1889) studied the problem of a top 
rotating around a fixed point, and she discovered a 
new case of integrability for this classical problem of 
Hamiltonian mechanics. She searched for conditions 
on the parameters such that the movable singula- 
rities of the solutions in the complex plane of time 
are poles (as a definition, a singularity is movable if 
its location in the complex domain depends on the 
initial conditions). Such differential systems are said 
to be of “Painlevé’s type.” In this case, the solutions 
are single valued in the complex t-plane and there is 
no branching of these solutions. The leading idea is 
the following: a first integral must be constant along 
a solution, and a possible branching would change 
its value along a loop around a singularity in the 
complex t-plane. However, finite branchings of 
solutions can be compatible with integrability. 

The main tool in this analysis is the calculation of 
the “Kowalevski exponents,” which determine the 
possible branching of a solution around a 
singularity. 

In spite of the efficiency of the Painlevé analysis in 
the search for integrable (or nonintegrable) systems, the 
relation between the analytic properties of their 
solutions (Painlevé) and their integrability in the 
sense of Liouville remains mysterious. The most 
fundamental result obtained in this field is a theorem 
of Adler and van Moerbeke which proves that, if a 
system has the Painlevé property and if it is integrable 
in the sense of Arnol’d-Liouville, then it is algebrai- 
cally integrable (Adler and van Moerbeke 1989). 

The discovery of S Kowalevskaya inspired 
Ziglin, who related the existence of meromorphic 
first integrals for a Hamiltonian system, to the 
properties of the linearized equations along a 
particular periodic solution of this system, espe- 
cially to the monodromy group associated to this 
linear system. Ziglin used the constraints imposed 
on this monodromy group by the existence of first 
integrals. 

Let us consider a Hamiltonian system defined on 
a complex analytic symplectic manifold of dimen- 
sion 2n, and suppose that there exists a family of 
periodic solutions $\Gamma$. The linearized equations 
deduced from the Hamiltonian system along $\Gamma$ are 
decoupled into tangential and normal equations. We 
are interested in the normal equations, which are 
linear with periodic coefficients. 


Ziglin’s Theorem (Ziglin 1983). Assume that a 
Hamiltonian system has a family of particular 
solutions $\Gamma_h$ (which are not equilibria) parametrized 
by periodic functions of the complex time and 
depending analytically on a real parameter 
$h \in (h_1, h_2)$. Let G be the monodromy group of the 
normal variational equation associated to the solu- 
tion $\Gamma_h$. A monodromy matrix $g \in G$ is said to be 
nonresonant if every eigenvalue of g is different 
from a root of unity. If the Hamiltonian system has 
a meromorphic integral F, functionally independent 
of the Hamiltonian H in a neighborhood of $\Gamma_h$, and 
if the monodromy group G contains a nonresonant 
element $g_1$, then for any $g_2 \in G$, the commutator 
$g^* = g_2^{-1} \cdot g_1^{-1} \cdot g_2 \cdot g_1$ satisfies either $g^* = \mathrm{Id}$ or 
$g^* = (g_1)^2$. 


As a corollary of this theorem, we have sufficient 
conditions of nonintegrability: if the necessary con- 
ditions of integrability of Ziglin are not satisfied by 
a Hamiltonian system, it is not analytically integr- 
able. For instance, this will happen if we can find 
two nonresonant monodromy matrices $g_1$ and $g_2$ 
which do not commute. If the periodic solution $\Gamma_h$ 
has two complex periods, the monodromy group G 
has generators $g_1$ and $g_2$, respectively associated to 
each of these periods, and their commutativity can 
sometimes be studied. 
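
The commutator test of the theorem is easy to try on explicit 2 × 2 matrices, which is the relevant size for two degrees of freedom. The sketch below is only an illustration with hand-picked matrices of determinant 1, not monodromy matrices computed from an actual system. It reads the second alternative as $g^* = g_1^2$, which is what the commutator equals when $g_2$ exchanges the two eigendirections of $g_1$. It also uses the sufficient criterion |trace| > 2 for nonresonance: the eigenvalues are then real and off the unit circle.

```python
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_inv(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(2) for j in range(2))


def is_hyperbolic(g):
    """Sufficient condition for nonresonance of a determinant-1 matrix."""
    return abs(g[0][0] + g[1][1]) > 2.0


def commutator(g1, g2):
    """g* = g2^{-1} . g1^{-1} . g2 . g1"""
    return mat_mul(mat_mul(mat_inv(g2), mat_inv(g1)), mat_mul(g2, g1))


def compatible_with_extra_integral(g1, g2):
    """True if g* is the identity or g1^2, as the theorem requires."""
    g_star = commutator(g1, g2)
    return close(g_star, IDENTITY) or close(g_star, mat_mul(g1, g1))


g1 = [[2.0, 0.0], [0.0, 0.5]]            # nonresonant: eigenvalues 2 and 1/2
commuting = [[3.0, 0.0], [0.0, 1.0 / 3.0]]
swapping = [[0.0, 1.0], [-1.0, 0.0]]     # exchanges the eigendirections of g1
shear = [[1.0, 1.0], [0.0, 1.0]]         # neither commutes nor exchanges

for g2 in (commuting, swapping, shear):
    print(compatible_with_extra_integral(g1, g2))
```

Only the shear fails the test. If such a pair occurred in the monodromy group of a normal variational equation, the theorem would exclude an additional meromorphic first integral.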

These sufficient conditions of nonintegrability 
were studied for particular Hamiltonian systems, 
first by Ziglin himself. 

Several concrete systems with two degrees of free- 
dom were proved to be nonintegrable by Ito, Yoshida, 
Churchill, Rod, and many other mathematicians who 
applied this “Ziglin’s method”: for instance, the 
Hénon-Heiles system, the Yang-Mills system, and 
Hamiltonian systems with a homogeneous potential. 


Nonintegrability and Differential Galois 
Theory (Morales-Ruiz 1999) 


Recently, the integrability of Hamiltonian systems 
was studied with algebraic tools from differen- 
tial Galois theory, applied to linear differential 
systems. As in Ziglin’s theory, we consider a 
particular solution $\Gamma$ (not necessarily periodic) of a 
differential system generated by an analytic Hamil- 
tonian H with n degrees of freedom, and the (linear) 
variational equations along $\Gamma$. The idea is that, if the 
Hamiltonian system is integrable, we can assume 
that the linearized equations along $\Gamma$ must also have 
a “regular behavior.” If the Hamiltonian system is 
integrable, it will also be the case for the variational 
equations. 

The normal variational system (of order 2n − 2) 
can be written as 


$$\frac{d\xi}{dt} = J \cdot K(\Gamma(t)) \cdot \xi \qquad [6]$$


with 


$$J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$$


$K(\Gamma(t))$ is a matrix depending on the particular 
solution $\Gamma$. 

We have to define the “Galois group” of the linear 
equation [6]. Recall that in the classical Galois 
theory of algebraic equations, the Galois group is 
defined by the automorphisms which map roots 
onto roots of the equation. In an analogous way, in 
differential Galois theory, we consider the maps 
which send a fundamental solution of eqn [6] to a 
fundamental solution. In order to define the Galois 
group G associated to [6], we consider a differential 
field K of functions over C (i.e., a field of functions 
equipped with a derivation). The field of constants 
of K is C; it is the subfield of K whose elements have 
a derivative equal to zero. We denote by $K(\xi, \eta, \ldots)$ 
the differential field extension obtained from K by 
the adjunction of the functions $\xi, \eta, \ldots$. If $(\varphi, \psi)$ is a 
fundamental system of solutions of eqn [6], then 
$L = K(\varphi, \psi)$ is the smallest differential field exten- 
sion which contains all the solutions of [6]. The 
field of constants of L is the same as the one of K, 
that is, C. By definition, L is a Picard-Vessiot 
extension of K. 

The differential Galois group of L is defined as 
the group of the automorphisms $\sigma$ of L (that map a 
solution of [6] onto a solution) leaving the field of 
constants fixed. Given a fundamental system of 
solutions $(\varphi, \psi)$, we can associate to each automorph- 
ism $\sigma$ the matrix M such that $(\sigma(\varphi), \sigma(\psi)) = (\varphi, \psi) \cdot M$. 
By definition, the set of these matrices M is the 
Galois group G of eqn [6]. It is a linear algebraic 
group (because, the matrices M being symplectic, 
their coefficients satisfy polynomial equations) and a 
subgroup of the linear group of matrices GL(C). 
We note that, for a given linear system, the mono- 
dromy group is contained in the Galois group and 
both are subgroups of the symplectic group Sp(C). 


In the Galois group G of eqn [6], we consider $G^0$, 
the connected component of the identity. The integr- 
ability of the initial Hamiltonian system is connected 
to the integrability of the variational equation [6] and, 
through it, to the properties of its Galois group: 


Theorem of Morales and Ramis (Morales-Ruiz 
1999). If an analytic Hamiltonian system is com- 
pletely integrable, then the Galois group associated 
to the variational equation along a particular 
solution $\Gamma$ is such that its connected component of 
the identity $G^0$ is Abelian. 


Thus, if a Hamiltonian system is such that $G^0$ is not 
Abelian, there cannot exist a complete set of first 
integrals in involution in a neighborhood of the 
particular solution $\Gamma$ and the system is not integrable. 

In the concrete applications of this theory, an 
algorithm of Kovacic allows us to determine the 
Galois group explicitly. By this method, several 
Hamiltonian systems were proved to be nonintegrable: 
for instance, systems of points on a line with a 
potential in $1/r^2$, studied by Julliard-Tosel (1998), 
but also earlier proofs of nonintegrability of homo- 
geneous potentials, which were improved by Yoshida 
and Umeno, thanks to the theorem of Morales-Ramis. 


See also: Billiards in Bounded Convex Domains; 
Infinite-Dimensional Hamiltonian Systems; Integrable 
Systems: Overview; Peakons; Separatrix Splitting. 


The solar system has long appeared to astronomers 
and mathematicians as a model of stability. On the 
other hand, statistical mechanics relies on the 
assumption that large assemblies of particles form 
highly unstable systems (at the microscopic scale). 
Yet all these physical situations are described, at 
least to a certain degree of approximation, by 
Hamiltonian systems. 

One may hope that Hamiltonian systems can be 
classified in two different categories, stable and 
unstable ones. However, the situation is much more 
complicated and both stable and unstable behaviors 
cohabit in typical systems. Even our examples are 
not perfect paradigms of stability and instability. 
Indeed, it is now clear from numerical as well as 
theoretical points of view that some instability is 
present over long timescales in the solar system, so 
that, for example, future collisions between planets 
cannot be completely ruled out in view of our 
present understanding. On the other hand, unex- 
pected patterns of stability have been discovered in 
systems involving a large number of particles. 

Understanding the impact of stable and unstable 
effects in Hamiltonian systems has been considered 
ever since Poincaré as one of the most important 
questions in dynamical systems. In this article, we 
will discuss model Hamiltonian systems of the form 


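$$H_\epsilon(q, p) = h(p) + \epsilon G_\epsilon(q, p)$$


where $(q, p) \in \mathbb{T}^d \times U$, with $U$ a bounded open
subset of $\mathbb{R}^d$. Recall that the equations of motion are


$$\dot q(t) = \partial_p h(p) + \epsilon\, \partial_p G_\epsilon(q, p) \qquad [1]$$


$$\dot p(t) = -\epsilon\, \partial_q G_\epsilon(q, p) \qquad [2]$$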


The textbook by Arnol’d (1964) is a good general
introduction to Hamiltonian systems. We will always
denote by $\omega(p)$ the frequency map $\partial_p h(p)$, which plays
a crucial role. Here, as is obvious in [2], the action
variables $p$ are preserved under the evolution in the
unperturbed case $\epsilon = 0$. We will try to explain what is
known about the evolution of these action variables for
the perturbed system. As we will see, in many
situations, these variables are extremely stable. For
example, the KAM theorem implies that, for a set of positive
measure of initial conditions $(q_0, p_0)$, the trajectory
$(q(t), p(t))$ satisfies $\|p(t) - p(0)\| \le C\epsilon$ for all times.
Examples show that some initial conditions may lead
to unstable trajectories, that is, trajectories such that
$\|p(t) - p(0)\| \ge 1/C$ for some $t$ (depending on $\epsilon$) and
some fixed constant $C$ independent of $\epsilon$. However, this
is, as we will see, possible only for very large times $t$
(meaning that $t$ as a function of $\epsilon$ has to go to infinity
very quickly when $\epsilon \to 0$). The main questions here
are to understand in what situations instability is or is
not possible, and what kinds of evolution the
action variable $p$ can have. Another important question is to
estimate the speed (as a function of the parameter $\epsilon$) of
the evolution of $p$.


A Convention 


We assume, unless otherwise stated, that the 
Hamiltonians are real analytic. The norm |H| of 
the Hamiltonian H is the uniform norm of its 
holomorphic extension to a certain complex strip. 
We do not specify the width of this strip. 
Whenever we consider a family $H_\epsilon, F_\epsilon, \ldots$ of
Hamiltonians, we mean that the norm $|H_\epsilon|$ is
bounded when $\epsilon \to 0$.


Averaging and Exponential Stability 


The first observation concerning the action variables
is that they should evolve at a speed of the order of
$\epsilon$. However, averaging effects occur. More precisely,
in the equation $\dot p(t) = -\epsilon\, \partial_q G_\epsilon(q(t), p(t))$, the variable
$q(t)$ is moving fast compared to $p(t)$. If the evolution
of $q(t)$ nicely fills the torus $\mathbb{T}^d$, it is tempting to
think that the averaged equation


$$\dot p(t) = -\epsilon V_\epsilon(p(t))$$


should accurately approximate the actual behavior
of $p(t)$, where


$$V_\epsilon(p) = \int_{\mathbb{T}^d} \partial_q G_\epsilon(q, p)\, dq$$


We have $V_\epsilon = 0$, which leads us to think that the
evolution should consist mainly of oscillations of
small amplitude with no large evolution. This
reasoning is limited by the presence of resonances.
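
The vanishing of the averaged vector field can be checked in one line. The following is a sketch, assuming that the average is taken over the angles $q \in \mathbb{T}^d = \mathbb{R}^d/\mathbb{Z}^d$ (period 1 in each angle, consistent with the $\cos 2\pi q$ terms used later in this article) and that $G_\epsilon$ is the perturbation appearing in [2]. The notation $d\hat q_j$, for integration over all angles except $q_j$, is introduced here for illustration only.

```latex
\[
  \bigl(V_\epsilon(p)\bigr)_j
  = \int_{\mathbb{T}^d} \partial_{q_j} G_\epsilon(q,p)\, dq
  = \int_{\mathbb{T}^{d-1}}
      \Bigl[\, G_\epsilon(q,p) \,\Bigr]_{q_j = 0}^{q_j = 1} \, d\hat q_j
  = 0,
  \qquad j = 1, \dots, d .
\]
```

The bracket vanishes because $G_\epsilon$ is periodic in each angle, so the averaged equation predicts no drift of $p$ at order $\epsilon$.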


Frequencies 


A frequency $\omega \in \mathbb{R}^d$ is said to be resonant if there
exists $k \in \mathbb{Z}^d_*$ $(= \mathbb{Z}^d - \{0\})$ such that $\langle k, \omega \rangle = 0$. The
resonance module of $\omega$,


$$Z(\omega) = \{k \in \mathbb{Z}^d : \langle k, \omega \rangle = 0\}$$


is a subgroup of $\mathbb{Z}^d$; we denote by $R(\omega)$ the vector
space generated by $Z(\omega)$ in $\mathbb{R}^d$. The order of
resonance $r(\omega)$ is the dimension of $R(\omega)$. The main
examples of resonances of order $r$ are the
frequencies $\omega = (\omega_1, 0)$, where $\omega_1 \in \mathbb{R}^{d-r}$ is nonresonant.
This example is universal. Indeed, if $\omega$ is a
resonant frequency, then there exists a matrix
$A \in GL_d(\mathbb{Z})$ such that $A\omega = (\omega_1, 0)$, where $\omega_1 \in \mathbb{R}^{d-r}$
is not resonant. The matrix $A$ can be seen as a
diffeomorphism of $\mathbb{T}^d$, which transports the constant
vector field $\omega$ to the constant vector field $A\omega = (\omega_1, 0)$.
It is useful to distinguish, among nonresonant
frequencies, some which are sufficiently nonresonant.
A frequency $\omega \in \mathbb{R}^d$ is called Diophantine if there exist
real constants $\gamma > 0$ and $\tau > d$ such that


$$|\langle \omega, k \rangle| \ge \gamma \|k\|^{-\tau}$$


for each $k \in \mathbb{Z}^d_*$. Finally, a frequency is called
resonant Diophantine if there exists a matrix
$A \in GL_d(\mathbb{Z})$ such that $\omega = A(\omega_1, 0)$, where $\omega_1 \in \mathbb{R}^{d-r}$ is
a Diophantine frequency.
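
The definitions of the resonance module and of the order of resonance can be explored numerically. The following Python sketch is an illustration only, not part of the original text: the function names `resonant_vectors` and `rank` are hypothetical, the frequency must be given with exact rational entries (an irrational frequency cannot be represented exactly), and the search is limited to a finite box, so the computed rank is in general only a lower bound for the order of resonance.

```python
from fractions import Fraction
from itertools import product


def resonant_vectors(omega, kmax):
    """Nonzero integer vectors k with |k_i| <= kmax and <k, omega> = 0."""
    hits = []
    for k in product(range(-kmax, kmax + 1), repeat=len(omega)):
        if any(k) and sum(ki * wi for ki, wi in zip(k, omega)) == 0:
            hits.append(k)
    return hits


def rank(vectors):
    """Rank over the rationals, by exact Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


# Example frequency (chosen for illustration): all integer relations
# k1 + 2*k2 + 3*k3 = 0 form a module of rank 2.
omega = (Fraction(1), Fraction(2), Fraction(3))
ks = resonant_vectors(omega, 2)
order = rank(ks)
```

Here `ks` contains, for instance, the relation (2, -1, 0), and `order` is the dimension of the space spanned by the relations found in the box.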


Symplectic Diffeomorphisms and Normal Forms 


An efficient mathematical method to take averaging 
effects into account is the use of normal forms. 
Normal form theory consists in finding new coordi- 
nates in which the fast angles have been eliminated 
from the equations up to a small remainder. This is 
done exploiting the existence of a large group of 
diffeomorphisms preserving the Hamiltonian struc- 
ture of equations, called symplectic diffeomorphisms 
or canonical transformations. We refer the reader to 
standard textbooks for these notions, for example to 
Arnol’d (1964). An important point is that a
symplectic diffeomorphism $\phi$ sends the trajectories
of the Hamiltonian $H \circ \phi$ to the trajectories of the
Hamiltonian $H$. A Hamiltonian $N(q, p)$ is said to be in
$R$-normal form, where $R$ is a linear subspace of $\mathbb{R}^d$, if
$\partial_q N \in R$ for each $(q, p)$. Let us give an illustrative
result, taken from Lochak et al. (2003). Note that this
result is not sufficient to obtain uniform stability
estimates, as in the Nekhoroshev theorem below. More
precise normal form results are given in Nekhoroshev
(1977) and Pöschel (1993).


Normal Form Theorem 


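Let $\omega_0 = \omega(p_0)$ be a given Diophantine or resonant-Diophantine
frequency. Let us denote by $B_r(p_0)$ the
open ball of radius $r$ in $\mathbb{R}^d$ centered at $p_0$. There
exists a constant $a$ which depends only on $\omega_0$, and
constants $\epsilon_0 > 0$ and $C > 0$ such that the following
holds: for each $\epsilon < \epsilon_0$, there exists an analytic
symplectic embedding $\phi_\epsilon : \mathbb{T}^d \times B_{r(\epsilon)}(p_0) \to \mathbb{T}^d \times U$,
which is $\epsilon$-close to identity and such that


$$H_\epsilon \circ \phi_\epsilon(q, p) = h(p) + N_\epsilon(q, p) + \mu(\epsilon) F_\epsilon(q, p)$$


where $N_\epsilon$ is in $R(\omega_0)$-normal form, $r(\epsilon) \ge \sqrt{\epsilon}$, and
$\mu(\epsilon) \le e^{-C\epsilon^{-a}}$.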


This means that the motions with resonant initial
conditions are confined, up to small oscillations, in
the associated affine plane $p(0) + R(\omega(p(0)))$ until
they leave the domain of the normal form, or until
time $\mu^{-1}(\epsilon)$.


Geometry of Resonances 


In view of the normal form theorem, we are led to 
consider the curves $P(\theta) : \mathbb{R} \to \mathbb{R}^d$ which satisfy


$$P(\theta') - P(\theta) \in R(\omega(P(\theta)))$$


for each $\theta$ and $\theta'$. Indeed, it appears that these curves
are the ones the action variables can follow on
timescales not involving the remainders of the
normal forms. Note that here the parameter $\theta$ is
not the physical time. Assuming that $P(\theta)$ is such a
curve, we can define the affine space


$$\mathcal{R} := P(0) + \sum_{\theta \in \mathbb{R}} R(\omega(P(\theta)))$$


We then have $P(\theta) \in \mathcal{R}$ for each $\theta$. In addition, each
point $P(\theta)$, $\theta \in \mathbb{R}$, is a critical point of the restriction
$h_{\mathcal{R}}$ of the unperturbed Hamiltonian $h$ to the affine
space $\mathcal{R}$. It follows that the curve $P(\theta)$ has to be
constant if the unperturbed Hamiltonian satisfies the 
following hypothesis. 


Nekhoroshev Steepness 


We say that the unperturbed Hamiltonian h is steep 
if, for each affine subspace $A$ in $\mathbb{R}^d$, the restriction
$h_A$ has only isolated critical points.


This formulation, due to Niederman, is much 
simpler than the equivalent one first given by 
Nekhoroshev. It turns out that this condition, 
which was made natural by our heuristic explana- 
tion, implies stability over exponential timescales for 
all initial conditions (see Nekhoroshev (1977)). We 
first need another condition. 


Kolmogorov Nondegeneracy 


We say that the unperturbed Hamiltonian h is 
nondegenerate in the sense of Kolmogorov if it has 
nondegenerate Hessian at each point, or equivalently 
if the frequency map $p \mapsto \omega(p)$ is an immersion.


Nekhoroshev Stability Theorem 


Assume that the unperturbed Hamiltonian does not
have critical points ($\omega(p)$ does not vanish), and satisfies
the Nekhoroshev steepness and Kolmogorov nondegeneracy
conditions. Then there exist constants $a > 0$
and $b > 0$, which depend only on $h$, and constants
$\epsilon_0 > 0$ and $C > 0$ such that the following holds: for
$\epsilon < \epsilon_0$, each trajectory $(q_\epsilon(t), p_\epsilon(t))$ satisfies the
estimate


$$\|p_\epsilon(t) - p_\epsilon(0)\| \le C\epsilon^b$$


for all $t$ such that $|t| \le e^{C\epsilon^{-a}}$.


Herman’s Example 


In order to illustrate the necessity of the condition of 
steepness, let us consider the Hamiltonian 


$$H_\epsilon(q_1, q_2, p_1, p_2) = p_1 p_2 + \epsilon V(q_1)$$


with $V : \mathbb{T} \to \mathbb{R}$. The associated equations are


$$\dot p_2 = 0, \quad \dot p_1 = -\epsilon V'(q_1), \quad \dot q_1 = p_2, \quad \dot q_2 = p_1$$


The trajectories whose initial conditions satisfy
$p_2(0) = 0$ and $V'(q_1(0)) \ne 0$ satisfy


$$p_1(t) = p_1(0) - t\epsilon V'(q_1(0)), \quad p_2(t) = 0, \quad q_1(t) = q_1(0)$$


We see an evolution at speed $\epsilon$ of the action variable
$p_1$, contradicting the conclusion of the Nekhoroshev
theorem. In this example, we have $R(\omega(p(t))) = \mathbb{R} \times \{0\}$,
and $h_{\mathbb{R} \times \{0\}} = 0$, so that the curve


$$P(\theta) = (\theta, 0)$$


is indeed a curve of critical points of $h_{\mathbb{R} \times \{0\}}$.
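
The linear drift in Herman's example is easy to reproduce numerically. The sketch below is an illustration under stated assumptions: the potential $V(q_1) = \cos 2\pi q_1$, the values of $\epsilon$, the initial condition and the integration time are all chosen here for the example, and the helper names `herman_rhs`, `rk4_step` and `integrate` are hypothetical. A classical fourth-order Runge-Kutta scheme is used only for simplicity.

```python
import math


def herman_rhs(state, eps):
    """Right-hand side for H = p1*p2 + eps*V(q1), with V(q1) = cos(2*pi*q1)."""
    q1, q2, p1, p2 = state
    dV = -2.0 * math.pi * math.sin(2.0 * math.pi * q1)  # V'(q1)
    # (dq1/dt, dq2/dt, dp1/dt, dp2/dt)
    return (p2, p1, -eps * dV, 0.0)


def rk4_step(state, eps, dt):
    """One classical Runge-Kutta step of size dt."""
    def shift(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = herman_rhs(state, eps)
    k2 = herman_rhs(shift(state, k1, dt / 2), eps)
    k3 = herman_rhs(shift(state, k2, dt / 2), eps)
    k4 = herman_rhs(shift(state, k3, dt), eps)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate(state, eps, t_final, n_steps):
    dt = t_final / n_steps
    for _ in range(n_steps):
        state = rk4_step(state, eps, dt)
    return state


eps = 1e-3
q1_0, p1_0 = 0.25, 0.0          # p2(0) = 0 and V'(q1(0)) != 0
t_final = 100.0
final = integrate((q1_0, 0.0, p1_0, 0.0), eps, t_final, 1000)

# Closed-form prediction p1(t) = p1(0) - t*eps*V'(q1(0)) from the text.
dV0 = -2.0 * math.pi * math.sin(2.0 * math.pi * q1_0)
predicted_p1 = p1_0 - t_final * eps * dV0
```

Comparing `final[2]` with `predicted_p1` shows the action $p_1$ moving by a quantity of order 1 on a timescale of order $1/\epsilon$, while $q_1$ and $p_2$ stay fixed.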


Genericity of Steepness 


The condition of steepness is frequently satisfied. In 
order to be more precise, we mention that, for N € N 
large enough (how large depends on the dimension d), 
steepness is a generic condition in the finite-dimen- 
sional space of polynomials of degree less than N. 
Note in contrast that a quadratic Hamiltonian is steep 
if and only if it is positive definite. Finally, it is 
important to mention that convex Hamiltonians $h$
with positive-definite Hessian are steep. More gener- 
ally, quasiconvex Hamiltonians are steep. A function 
h : U — R is said to be quasiconvex if, at each point, 
the restriction of its Hessian to the kernel of its 
differential is positive definite. 
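
As a worked illustration of this definition (an example added here, using the unperturbed part of the Hamiltonian studied later in this article), consider $h(p) = (p_1^2 + p_2^2)/2 - p_3$. Its Hessian is degenerate on $\mathbb{R}^3$, yet it is positive definite on the kernel of the differential:

```latex
\[
  dh(p) = (p_1,\, p_2,\, -1) \neq 0,
  \qquad
  \partial^2 h(p) = \operatorname{diag}(1, 1, 0),
\]
\[
  v \in \ker dh(p) \iff v_3 = p_1 v_1 + p_2 v_2,
  \qquad
  \langle \partial^2 h(p)\, v, v \rangle = v_1^2 + v_2^2 .
\]
```

If $v \in \ker dh(p)$ and $v_1^2 + v_2^2 = 0$, then $v_3 = 0$ as well, so the quadratic form is positive on every nonzero vector of the kernel. Hence this $h$ is quasiconvex although it is not convex with positive-definite Hessian.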


The Quasiconvex Case 


It is interesting to be more precise about the values 
of a and b in Nekhoroshev theorem. We shall do so 
in the quasiconvex case, which is the most stable 
case, and where much more is known. If $h$ is
quasiconvex, one can take


$$a = b = \frac{1}{2d}$$


as was proved by Lochak (1992). It is a question of
active present research whether these exponents are
optimal. It now appears that this is almost so, and
that the optimal exponent $a$ should not be larger
than $1/(2(d - 3))$. That this exponent deteriorates as
the dimension increases is of course very natural in 
the perspective of statistical mechanics. As a matter 
of fact, not only the exponent a but also the 
threshold $\epsilon_0$ of validity of the Nekhoroshev theorem
deteriorates with the dimension, as was noticed in 
Bourgain and Kaloshin. 

Another important fact was proved in Lochak 
(1992): in these expressions, the important value of 
d is not the total number of degrees of freedom, but 
the number of active degrees of freedom. More 
precisely, resonant initial conditions are more stable 
than generic ones. If r is the order of resonance of a 
given initial condition, then the number d — r of fast 
angles can be substituted for the total number of
degrees of freedom for the computation of the 
stability exponent. This phenomenon may account 
for the surprising stability obtained numerically by 
Fermi, Pasta, and Ulam. 
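
To get a feeling for how the stability exponent depends on the number of active degrees of freedom, one can tabulate it. The sketch below is illustrative only: it assumes the quasiconvex value $a = 1/(2d)$ with $d$ replaced by $d - r$ at a resonance of order $r$, as described above; the constant `C` is unknown in practice and is set to 1 purely as a placeholder, and the function names are hypothetical.

```python
def stability_exponent(d, r=0):
    """Exponent a = 1 / (2 * (d - r)) for d degrees of freedom and a
    resonance of order r (quasiconvex case, as described in the text)."""
    if not 0 <= r < d:
        raise ValueError("need 0 <= r < d")
    return 1.0 / (2 * (d - r))


def log_stability_time(eps, d, r=0, C=1.0):
    """Natural logarithm of the time bound exp(C * eps**(-a)).

    C is a placeholder constant, not a value taken from the theory.
    """
    return C * eps ** (-stability_exponent(d, r))


# With d = 3, a resonance of order 1 improves the exponent from 1/6 to 1/4.
a_generic = stability_exponent(3)
a_resonant = stability_exponent(3, r=1)
```

The exponent shrinks as $d$ grows, which is the deterioration with dimension mentioned above, and grows at resonances, where fewer angles are fast.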


Permanent Stability 


Many initial conditions satisfy more than exponen- 
tial stability: they are permanently stable. 


Kolmogorov Theorem 


Assume that $h$ satisfies the Kolmogorov nondegeneracy
condition. Then for
each open subset $V \subset \mathbb{R}^d$ such that $\bar{V} \subset U$, there
exists $\epsilon_0 > 0$ such that, for each $\epsilon < \epsilon_0$, there exist


• a smooth symplectic embedding $\phi_\epsilon : \mathbb{T}^d \times V \to \mathbb{T}^d \times U$,
which is $\epsilon$-close to the identity,

• a compact subset $F_\epsilon$ of $V$, whose relative measure
in $V$ converges to 1 as $\epsilon \to 0$,


such that the Hamiltonian system $H_\epsilon \circ \phi_\epsilon$ preserves
the torus $\mathbb{T}^d \times \{p\}$ for each $p \in F_\epsilon$.


The union


$$\mathcal{F}_\epsilon = \phi_\epsilon(\mathbb{T}^d \times F_\epsilon)$$


of all the invariant tori has positive measure. Its
complement is usually an open dense subset of
$\mathbb{T}^d \times U$. All the orbits starting in this invariant set
obviously undergo oscillations of amplitude of the
order of $\epsilon$ for all times. It is worth mentioning that
some energy surfaces may not intersect the invariant
set $\mathcal{F}_\epsilon$. This is illustrated in Herman’s
example, where the surface of zero energy does not
contain invariant tori. The following condition
guarantees the existence of invariant tori on each 
energy surface. 
Arnol’d Nondegeneracy


The Hamiltonian $h$ is said to be nondegenerate in
the sense of Arnol’d if it does not have critical points
and if the map


$$p \mapsto \frac{\omega(p)}{\|\omega(p)\|}$$


is a local diffeomorphism between each level set of $h$
and $S^{d-1}$. This is equivalent to saying that the function
$(\lambda, p) \in \mathbb{R} \times U \mapsto \lambda h(p)$ has nondegenerate Hessian
at each point of the form $(1, p)$.


Arnol’d Theorem 


If $h$ satisfies the Arnol’d nondegeneracy condition, then
the relative measure of the set $\mathcal{F}_\epsilon$ of invariant tori
converges to 1 in each energy surface.


This theorem prevents ergodicity of the perturbed 
systems for the canonical invariant measure on its 
energy surface. This may be considered as a very 
disappointing result for statistical mechanics, whose 
mathematical foundation has often been considered 
to be the Boltzmann hypothesis of ergodicity. 
However, statistical mechanics is first of all a 
question of letting d go to infinity, and ergodicity 
might not be such a crucial hypothesis (see 
Khinchin). 

When d=2, the Arnol’d theorem has particularly 
strong consequences. Indeed, in this case, the 
invariant tori cut the energy surfaces in small 
connected components. The motion is then confined 
in these connected components. As a consequence, 
we obtain permanent stability for all initial 
conditions. 

In higher dimensions, however, the complement of
$\mathcal{F}_\epsilon$ in each energy shell is usually a dense, connected
open set. There may exist orbits wandering in this 
large connected set, although the speed of evolution 
of these orbits is limited by Nekhoroshev theory. 
Understanding the dynamics in this open set is a 
very important and difficult question. It is the 
subject of the next section. 


Relaxed Assumption 


For many applications, such as celestial mechanics, 
the nondegeneracy conditions of Arnol’d or Kolmo- 
gorov are not satisfied, or difficult to check. 
However, the existence of invariant tori has been 
proved under much milder assumptions. As a rule, 
invariant tori exist in the perturbed systems if the 
frequency map $p \mapsto \omega(p)$ stably contains Diophantine
vectors in its image.


The Mechanism of Arnol’d 


Understanding instability is the subject of intense 
present research. General methods of construction 
of interesting orbits as well as clever classes of 
examples are being developed. These methods are 
exploring the limits of stability theory. Here we shall 
only describe the fundamental ideas of Arnol’d (see 
Arnol’d 1964), where most of the present activity 
finds its roots. Although these ideas have some 
ambition of universality, they are best presented, 
like in Arnol’d (1964), on an example. We consider 
the quasiconvex Hamiltonian 


$$H(q_1, q_2, q_3, p_1, p_2, p_3) = (p_1^2 + p_2^2)/2 - p_3 + \epsilon \cos 2\pi q_2 + \mu (\cos 2\pi q_2)(\cos 2\pi q_1 + \cos 2\pi q_3)$$


As we have seen, this system is typical of the kind of 
Hamiltonians one gets after reduction to resonant 
normal form. However, it is illuminating to consider 
$\mu$ not as a function of $\epsilon$ but as an independent
parameter. This is an idea of Poincaré then followed 
by Arnol’d. We shall expose the main steps of the 
proof of the following result. 


Theorem 


Let us fix numbers $0 < A < B$. For each $\epsilon > 0$, there
exists a number $\mu_0(\epsilon)$ such that, when $0 < \mu < \mu_0(\epsilon)$,
there exists a trajectory


$$(q_1(t), q_2(t), p_1(t), p_2(t))$$


and a time $T > 0$ (which depends on $\epsilon$ and $\mu$) such
that


$$p_1(0) \le A, \qquad p_1(T) \ge B$$


The Truncated System 


Let us begin with some remarks about the truncated
Hamiltonian obtained when $\mu = 0$:


$$H_0(q, p) = H_1(q_1, q_3, p_1, p_3) + H_2(q_2, p_2) = p_1^2/2 - p_3 + p_2^2/2 + \epsilon \cos 2\pi q_2$$


This system is the uncoupled product of $H_1$ and of
the pendulum described by $H_2$. The variable $p_1$ is
constant along the motion; hence, the theorem cannot
hold for $\mu = 0$.

Recall that the point $q_2 = 0$, $p_2 = 0$ is a hyperbolic
fixed point of the pendulum $H_2(q_2, p_2) = p_2^2/2 + \epsilon \cos 2\pi q_2$.
The stable and unstable manifolds of this
integrable system coincide; they form the energy
level $H_2 = \epsilon$. As a consequence, in the product
system of Hamiltonian $H_0 = H_1 + H_2$, there exists,
in the zero energy level, a one-parameter family $T_\omega$
of invariant tori of dimension 2:


$$T_\omega = \{p_1 = \omega,\ p_3 = \omega^2/2 + \epsilon,\ q_2 = 0,\ p_2 = 0\} \subset \mathbb{T}^3 \times \mathbb{R}^3$$
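
The hyperbolicity of the pendulum fixed point recalled above can be verified directly. The following short computation is a sketch for the reader, assuming $\epsilon > 0$ and the pendulum $H_2(q_2, p_2) = p_2^2/2 + \epsilon \cos 2\pi q_2$:

```latex
\[
  \dot q_2 = \partial_{p_2} H_2 = p_2,
  \qquad
  \dot p_2 = -\partial_{q_2} H_2 = 2\pi\epsilon \sin 2\pi q_2 ,
\]
\[
  \text{linearization at } (0,0):\quad
  \frac{d}{dt}
  \begin{pmatrix} q_2 \\ p_2 \end{pmatrix}
  =
  \begin{pmatrix} 0 & 1 \\ 4\pi^2 \epsilon & 0 \end{pmatrix}
  \begin{pmatrix} q_2 \\ p_2 \end{pmatrix},
  \qquad
  \text{eigenvalues } \pm 2\pi\sqrt{\epsilon} .
\]
```

The eigenvalues are real and of opposite signs, so the fixed point is hyperbolic, and $H_2(0, 0) = \epsilon$ is the energy of its separatrix. The expansion and contraction rates are of order $\sqrt{\epsilon}$.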


Each of these tori is hyperbolic in the sense that it has a 
stable manifold of dimension 3 and an unstable 
manifold of dimension 3, which are nothing but the 
liftings of the stable and unstable manifolds of the 
hyperbolic fixed point of H2. Notice that these 
manifolds do not intersect transversally along $T_\omega$.

When $\mu \ne 0$, the perturbation is chosen in such a
way that the tori $T_\omega$ are left invariant by the
Hamiltonian flow. 


Splitting 


For $0 < \mu < \mu_0(\epsilon)$, the invariant tori $T_\omega$ still have
stable and unstable manifolds of dimension 3. These 
stable and unstable manifolds intersect transversally 
in the energy surface, along an orbit which is 
homoclinic to the torus. 


The first point is that the tori remain hyperbolic, 
and that the stable and unstable manifolds are 
deformed, but not destroyed by the additional 
term. This results from the observation that the 
manifold M formed by the union of the invariant 
tori is normally hyperbolic in its energy surface. 
Note that this step does not require exponential 
smallness of $\mu$.

It is then a very general result that the stable and 
unstable manifolds have nonempty intersection. It is 
a global property, which can be established by 
variational methods, and which still does not rest on 
exponential smallness of $\mu$.

The key point, where exponential smallness is 
required, is transversality. Since transversality is a 
generic phenomenon, one may think that this step is 
not so crucial. And indeed, it is very likely that the 
statement remains true for most values of $\mu \in\, ]0, \epsilon]$
(and not only for $\mu < \mu_0(\epsilon)$). However, there are two
important issues here. First, transversality is difficult 
to establish on explicit examples. Second, it is useful 
for many further discussions to obtain some quanti- 
tative estimates. 

Indeed, we can associate to the intersection 
between the stable and unstable manifolds a 
quantity, the splitting, which in a sense measures 
transversality. Discussions on such a definition are 
available in Lochak et al. (2003). Using methods of 
Poincaré and Melnikov, Arnol’d showed that this 
splitting can be estimated, for sufficiently small $\epsilon$, by


$$\sigma \ge \mu e^{-C/\sqrt{\epsilon}} + O(\mu^2) \qquad [3]$$


This implies non-nullity of the splitting, hence
transversality, for small $\mu$.


Transition Chain 


We have established the existence, when $\mu > 0$ is
small enough, of a family $T_\omega$ of hyperbolic invariant
tori such that the stable manifold $W^+_\omega$ and the
unstable manifold $W^-_\omega$ intersect transversally along a
homoclinic orbit (but not along $T_\omega$!) for each $\omega$.

A stability argument shows that the stable manifold
$W^+_\omega$ of the torus $T_\omega$ intersects transversally the
unstable manifold $W^-_{\omega_0}$ of the torus $T_{\omega_0}$, when $\omega$ is close
enough to $\omega_0$. How close directly depends on the
size of the splitting. We obtain heteroclinic orbits
between tori close to each other.

Given two values $\omega$ and $\omega'$, we can find a sequence
$\omega_i$, $0 \le i \le N$, such that $\omega_0 = \omega$, $\omega_N = \omega'$, and $W^-_{\omega_i}$
intersects transversally $W^+_{\omega_{i+1}}$ for all $i < N$. The associated
family $T_{\omega_i}$ of tori is called a transition chain.

The remaining step consists in proving that some orbits
shadow the transition chain. Arnol’d solved this step
by a very simple topological argument which,
however, does not provide any estimate on the
time $T$. He proves the existence of an orbit joining
any neighborhood of $T_\omega$ to any neighborhood of $T_{\omega'}$.
This ends the proof of the main theorem, since we
can choose $\omega$ and $\omega'$ such that $\omega < A < B < \omega'$.

The dynamics associated to hyperbolic tori and
transition chains have later been studied more
carefully. In particular, a $\lambda$-lemma can be proved in
this context, which allows us to conclude that, in a
transition chain, the unstable manifold $W^-_{\omega_0}$ of the
first torus intersects transversally the stable manifold
$W^+_{\omega_N}$ of the last torus. These detailed studies also
allow us to relate the speed of diffusion to the
splitting of the invariant manifolds.


Diffusion Speed 


It is interesting to estimate the speed of evolution of
the variable $p_1$, or in other words the time $T$ in the
statement. It follows from Nekhoroshev theory that
this time $T$ has to be exponentially large as a
function of $\epsilon$. In fact, it is possible to prove, either by
recent developments on the ideas of Arnol’d exposed
above, or more easily by variational methods (Bessi
1996), that


$$T \le -\frac{e^{C/\sqrt{\epsilon}}}{\mu} \log \mu$$


for $\mu < \mu_0(\epsilon)$. This time is of course closely related to
the estimate [3] of the splitting. In addition, Ugo Bessi
proved that one can take $\mu_0(\epsilon) = e^{-C/\sqrt{\epsilon}}$. Plugging this
value of $\mu$ in the estimate of $T$, we get the estimate
$T \le e^{C/\sqrt{\epsilon}}$ as a function of the only parameter $\epsilon$.
Considering the fact that the orbit we have described
goes close to double resonances, this is the best 
estimate one may hope for in view of the improved 
Nekhoroshev stability estimates at resonances. 

It is now widely accepted that the time of
diffusion is exponentially large. However, we point
out that, if it is indeed exponentially large as a
function of the parameter $\epsilon$, it is only polynomially
large as a function of the second parameter $\mu$, as was
first understood by P Lochak and proved in Bernard
(1996) using the variational method of U Bessi.


Conclusion 


The theories of instability are developing in several 
directions. One of them is to try to understand the 
limits of stability, and to test to what extent the 
stability results obtained so far are optimal. This
aspect has developed quickly in recent years; for example,
the optimal stability exponent $a$ for convex systems
is almost known. Another direction is to try to give
a description of unstable orbits in typical systems. 
This remains a widely open question. 

Let us finally mention that the application of the 
theories we have presented to concrete systems is 
very difficult. One of the reasons is that the
estimates of the threshold $\epsilon_0$ of validity of the Nekhoroshev
and KAM theorems that can (painfully) be
obtained by inspection of the proofs are very poor,
far too poor, for example, to allow
applications to the solar system with the physical
values of the parameters.


See also: Averaging Methods; Hyperbolic Billiards; KAM 
Theory and Celestial Mechanics; Separatrix Splitting; 
Stability Problems in Celestial Mechanics; Stability 
Theory and KAM; Weakly Coupled Oscillators. 
Overview


Given a continuous Hamiltonian H(x, p) defined on 
the cotangent bundle of a compact boundaryless 
manifold, where x and p are the state and the 
momentum variable, respectively, and satisfying 
suitable convexity and coercivity assumptions, we 
consider the family of Hamilton-Jacobi equations


$$H(x, Du) = a \qquad [1]$$


with $a$ a real parameter. If, in addition, $H$ is
assumed to be smooth, we also consider
Hamilton’s equations


$$\dot\xi = H_p(\xi, \eta), \qquad \dot\eta = -H_x(\xi, \eta) \qquad [2]$$


whose analysis is related to the variational problem
of minimizing the action functional


$$\int_I L(\xi, \dot\xi)\, dt \qquad [3]$$


among all Lipschitz-continuous or, equivalently,
continuous piecewise $C^1$ curves defined on $I$ with
fixed end points. Here $I$ is a compact interval and $L$,
the Lagrangian, is the Fenchel transform of $H$. A
“conjugate” flow, named after Euler-Lagrange, is
also defined on the tangent bundle of the underlying
manifold.

A connection between [1] and [2] is provided by 
the classical Hamilton-Jacobi method, which shows
that the graph of the differential of any regular, say
$C^1$, global solution to [1] is an invariant subset for
the Hamiltonian flow. The drawback of this 
approach is that such regular solutions do not exist 
in general, even for very regular Hamiltonians. 

However, for any continuous Hamiltonian a 
distinguished value of the parameter a can be 
detected, denoted by c and qualified, from now on, 
as critical, for which there are a.e. subsolutions of 
the corresponding Hamilton-Jacobi equations 
enjoying some extremality properties. Note that 
such functions can be equivalently defined as weak 
solutions, in the viscosity sense, of [1] with a=c, or 
as fixed points of the associated Lax-Oleinik
semigroup (see Fathi (to appear)). We do not give 
these interpretations here to avoid any technicalities. 

Even if they are just Lipschitz-continuous on the
whole underlying manifold, these extremal subsolutions
become of class $C^1$ when restricted to a
special compact subset, the same for any of them,
say $\mathcal{A}$, and the corresponding differentials coincide
on $\mathcal{A}$. More generally, all critical subsolutions, that
is, the a.e. subsolutions to [1] with $a = c$, are
continuously differentiable on $\mathcal{A}$. This regularity
property holds if $H$ is at least locally Lipschitz-continuous
in both variables. When, in addition, the
Hamiltonian is smooth, so that the Hamiltonian
flow is defined, the graph of this common differential
defined on $\mathcal{A}$, denoted by $\tilde{\mathcal{A}}$, is an invariant set
for the flow, and is foliated by integral curves of [2]
possessing some global minimizing properties with
respect to the action functional.

The aim of this presentation is to give an 
explanation of the previously described phenomena 
occurring at the critical level, and of some related 
facts, using tools and arguments as simply as 
possible. We propose a metric approach to the 
subject and consider as central in our analysis a 
family of distances, denoted by $S_a$, for any $a > c$. We
emphasize that such distances can be defined for 
only continuous Hamiltonians, and the qualitative 
analysis of the critical subsolutions has an interest 
independent from the dynamical applications. 
Indeed, it can be used in other contexts such as in 
homogenization problems, and the large-time beha- 
vior of the viscosity solutions to the time-dependent 
equation $u_t + H(x, Du) = 0$.

The discovery of the critical value has a history 
that reflects the dual character of the topics, which 
has a dynamical as well as a partial differential 
equation (PDE) interest. 


It was probably Ricardo Mañé who first focused his
attention on it, at the beginning of the 1980s, in
connection with the analysis of integral curves of the
Euler-Lagrange flow with some global minimizing
properties. The set, previously denoted by $\mathcal{A}$, has been
found and analyzed by Serge Aubry, in a purely 
dynamical way, as the union of the supports of such 
minimizing curves. On the other hand, John Mather 
(1986) independently defined, in a more general 
framework, a set, contained in the Aubry set, through 
a weak approach that utilizes minimal probability 
measures invariant with respect to the Euler-Lagrange 
flow. The Mather set is actually the closure of the 
union of the supports of such measures. We will follow 
the approach of Aubry (see Fathi (2005b)), and will 
not introduce Mather’s measures.

In the viscosity solution theory, the critical value 
has instead been introduced in a famous unpub- 
lished paper of P L Lions, S R S Varadhan, and 
G Papanicolaou (1987), in connection with some 
periodic homogenization problems for Hamilton- 
Jacobi equations. It is worth noticing that they 
consider continuous Hamiltonians, defined on the
flat N-dimensional torus, without any convexity 
assumption. 

They define the critical value, and show the 
existence of viscosity solutions to the critical equation 
by means of an ergodic approximation, that is, by 
considering the equation $\epsilon u + H(x, Du) = 0$ and then
passing to the limit for $\epsilon \to 0$. The critical viscosity
solutions are used as correctors in the homogenization.
They do not perform any qualitative analysis;
whether such an analysis can be done, and whether something
similar to the Aubry-Mather sets exists for nonconvex
Hamiltonians, is still an important open
problem.

The two pieces of the picture were pasted together
by Fathi (1996) with his weak KAM theory, where the
relevance of the extremal critical subsolutions was first
recognized for the analysis of the dynamics, and the
Aubry-Mather sets were characterized as a
regularity set for such subsolutions, as described
above (see Contreras and Iturriaga (1999) and Fathi (2005a)
for a general treatment). Evans and co-workers are currently
using more general PDE methods in weak KAM
theory to address some integrability issues and to
find a quantum analog (see Evans and Gomes (2001,
2002) and Evans (2004)).


Critical Value and Extremal Subsolutions 


We consider the family of Hamilton-Jacobi equa- 
tions [1] defined, for simplicity, on the flat torus 
$\mathbb{T}^N = \mathbb{R}^N/\mathbb{Z}^N$, endowed with the flat Riemannian
metric induced by the Euclidean metric on $\mathbb{R}^N$. The
tangent, as well as the cotangent, bundle of $\mathbb{T}^N$ will
be identified with $\mathbb{T}^N \times \mathbb{R}^N$. All the results discussed
in the remainder of the paper are still true in any 
compact boundaryless manifold, and some of them 
also hold in noncompact manifolds. We require H to 
be continuous in both variables, to satisfy the 
coercivity assumption, 


$$\{(x, p) : H(x, p) \le a\} \text{ is compact for any } a$$


and the following (strict) quasiconvexity conditions
for any $x \in \mathbb{T}^N$, $a \in \mathbb{R}$:


$$\{p : H(x, p) \le a\} \text{ is strictly convex}$$
$$\partial\{p : H(x, p) \le a\} = \{p : H(x, p) = a\}$$


where $\partial$, in the above formula, indicates the
boundary. We denote by $\mathcal{S}_a$ the (possibly empty)
set of the Lipschitz-continuous a.e. subsolutions to
[1]. They will be called in the sequel, for short, just
subsolutions. Due to the convex character of the
Hamiltonian and its continuity, the property of
being a subsolution, for some function $u$, can be
equivalently expressed by requiring the inequality
$H(x, p) \le a$ to hold for any $x \in \mathbb{T}^N$ and any $p$ in the
(Clarke) generalized gradient $\partial u(x)$, defined by


$$\partial u(x) = \mathrm{co}\{p = \lim_i Du(x_i) : x_i \text{ differentiability point of } u,\ \lim_i x_i = x\}$$


where co indicates the convex hull. Note that if this 
set of weak derivatives reduces to a singleton at some 
x, then the function u is strictly differentiable at x, 
i.e., it is differentiable and Du is continuous at x. 

By a strict subsolution to [1] we mean a
Lipschitz-continuous function $w$ with
$\operatorname{ess\,sup} H(x, Dw(x)) < a$. The property of being a (strict)
subsolution is not affected by addition of constants.
Moreover, the pointwise supremum (resp. infimum)
of any class of equibounded subsolutions to [1] is
itself a subsolution, and $\mathcal{S}_a$ is stable with respect to
uniform convergence in $\mathbb{T}^N$.

The purpose of this section is to show that there is 
a unique value c (the critical value) for which the 
corresponding equation 


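$$H(x, Du) = c \qquad [4]$$


possesses subsolutions enjoying some extremality
properties. More precisely, we call a subsolution $u \in \mathcal{S}_a$
maximal (resp. minimal) if for any open subset $\Omega$
of $\mathbb{T}^N$ and any Lipschitz-continuous function $\phi$ with


$$u = \phi \text{ on } \partial\Omega \quad \text{and} \quad \operatorname{ess\,sup}_\Omega H(x, D\phi(x)) < a \qquad [5]$$


one has $u \ge \phi$ (resp. $u \le \phi$) in $\Omega$.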


Any maximal (resp. minimal) subsolution $u$ is
actually an a.e. solution of [1]. If, in fact,
$H(x_0, Du(x_0)) < a$ for some differentiability point $x_0$
of $u$, then the function
$\phi(x) = u(x_0) + Du(x_0)(x - x_0) - \epsilon|x - x_0| + \epsilon$
(resp. $\phi(x) = u(x_0) + Du(x_0)(x - x_0) + \epsilon|x - x_0| - \epsilon$)
would satisfy [5] for a suitable
choice of $\epsilon > 0$ and of a neighborhood $\Omega$ of $x_0$, and
so would violate the maximality (resp. minimality)
condition for $u$.

The previous argument can be easily adapted to 
show something more general: if u is a maximal 
(resp. minimal) subsolution then no subtangents 
(resp. supertangents) to $u$ at any $y \in \mathbb{T}^N$ can be local
strict subsolutions at y, that is, strict subsolutions in 
some neighborhood of y. 

The subtangency (resp. supertangency) condition 
of a function $\varphi$ to $u$ at a point $x_0$ means that $x_0$ is a 
local minimizer (resp. maximizer) of $u - \varphi$. We 
denote by $D^-u(x_0)$ (resp. $D^+u(x_0)$) the set made up 
of the differentials of the $C^1$ subtangents (resp. 
supertangents) to $u$ at $x_0$. They are (possibly empty) 
closed convex subsets of $\partial u(x_0)$. It is apparent that if 
$D^+u(x_0) \ne \emptyset \ne D^-u(x_0)$ then $u$ is differentiable at $x_0$ 
and $D^+u(x_0) = D^-u(x_0) = \{Du(x_0)\}$. 

It is an immediate consequence of the previous 
fact that no extremal subsolutions can exist in $\mathcal{S}_a$ 
whenever [1] admits a strict subsolution, say $\varphi$, since 
there are global minimizers and maximizers of $u - \varphi$, 
for any $u \in \mathcal{S}_a$, because of the compactness of $\mathbb{T}^N$. 
The function $\varphi$ is then subtangent and supertangent, 
respectively, to $u$ at such points. 

The unique value we can look at for finding 
extremal subsolutions is therefore 


$$c = \inf\{a \in \mathbb{R} : \mathcal{S}_a \ne \emptyset\} \qquad [6]$$


The set on the right-hand side of [6] is nonempty 
since the null function belongs to $\mathcal{S}_a$ when 
$a \ge \max_{\mathbb{T}^N} H(x,0)$, and bounded from below by 
$\min_x H(x,0)$. The value $c$ is consequently well 
defined by [6]. 
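
As an illustration (a standard example, not taken from the text above), consider the mechanical Hamiltonian $H(x,p) = \tfrac12 |p|^2 + V(x)$, where the continuous potential $V$ on $\mathbb{T}^N$ is an assumption introduced only for this example. Under that assumption, the critical value defined by [6] can be computed by hand:

```latex
\begin{align*}
a \ge \max_{\mathbb{T}^N} V
  &\;\Longrightarrow\; H(x,0) = V(x) \le a \ \text{for all } x
  \;\Longrightarrow\; u \equiv 0 \in \mathcal{S}_a, \\
a < \max_{\mathbb{T}^N} V = V(x_0)
  &\;\Longrightarrow\; H(x_0,p) \ge V(x_0) > a \ \text{for all } p
  \;\Longrightarrow\; \mathcal{S}_a = \emptyset, \\
\text{hence}\quad c
  &= \inf\{a \in \mathbb{R} : \mathcal{S}_a \ne \emptyset\}
   = \max_{\mathbb{T}^N} V .
\end{align*}
```

The second implication uses that the generalized gradient of a Lipschitz-continuous function is nonempty at every point, so that no admissible $p$ is available at $x_0$.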

Moreover, any sequence $u_n \in \mathcal{S}_{a_n}$, with $a_n$ 
decreasing and convergent to $c$, is equi-Lipschitz-continuous 
because of the coercivity of $H$, and 
equibounded, up to addition of suitable constants. 
It is therefore uniformly convergent, up to a 
subsequence, to some $u$, which belongs to $\mathcal{S}_{a_n}$ 
for any $n$, since these classes are stable for the 
uniform convergence. This implies that $u$ is a 
subsolution to [4], so that $\mathcal{S}_c \ne \emptyset$. The critical 
value $c$ is then characterized by the property that 
the corresponding eqn [4] admits subsolutions 
but not strict subsolutions. Our aim is to show 
that extremal subsolutions do exist for the critical 
eqn [4]. 

For any supercritical value $a$, that is, $a \ge c$, we can 
define the functional nonsymmetric semidistance: 


$$S_a(y,x) = \sup\{u(x) - u(y) : u \in \mathcal{S}_a\} = \sup\{u(x) : u \in \mathcal{S}_a,\ u(y) = 0\}$$


for any $x$, $y$ in $\mathbb{T}^N$. It is immediate that $S_a$ satisfies 
the triangle inequality and $S_a(y,y) = 0$ for any $y$. But 
it fails, in general, to be symmetric, and to be positive if 
$x \ne y$. We will nevertheless call it a distance, in the 
sequel, to ease terminology. The function 
$x \mapsto S_a(y,x)$ is itself a subsolution to [1], for any $y$, 
being the pointwise supremum of a family of 
equibounded subsolutions. Taking into account the 
inequality 


$$u(x) - u(y) \ge -S_a(x,y)$$

which holds for any $u \in \mathcal{S}_a$, and the fact that it 
becomes an equality by setting $u = S_a(x,\cdot)$, we also get 

$$-S_a(x,y) = \inf\{u(x) - u(y) : u \in \mathcal{S}_a\} = \inf\{u(x) : u \in \mathcal{S}_a,\ u(y) = 0\}$$

and $-S_a(\cdot,y)$ is, as well, a subsolution to [1]. Note that 


$$S_a(x,y) + S_a(y,x) \ge 0 \quad \text{for any } y, x \qquad [7]$$


The interest of introducing the distance $S_a$ in the 
present context is that, for any $a \ge c$ and $y \in \mathbb{T}^N$, 
the function $x \mapsto S_a(y,x)$ (resp. $x \mapsto -S_a(x,y)$) satisfies 
the maximality (resp. minimality) condition for 
subsolutions of [1] in any open set not containing $y$. 
If, by contradiction, the maximality property of 
$S_a(y,\cdot)$ were violated in some open set $\Omega$ with $y \notin \Omega$ 
by a $\varphi$ satisfying [5], then one could make the set 
$\{x : \varphi(x) > S_a(y,x)\}$ nonempty and compactly 
contained in $\Omega$, by adding a suitable constant. Hence, 
the formula 


$$u = \begin{cases} \max\{\varphi, S_a(y,\cdot)\} & \text{in } \Omega \\ S_a(y,\cdot) & \text{otherwise} \end{cases} \qquad [8]$$

would provide a subsolution to [1] with 
$u(y) = S_a(y,y) = 0$ and $u > S_a(y,\cdot)$ at some point of 
$\Omega$, which is in contrast with the very definition of $S_a$. 
One can similarly prove the minimality condition 
for $-S_a(\cdot,y)$. 

We now focus our attention on the critical case. 
We derive from the previous considerations that if a 
maximal subsolution to [4] does not exist then, for 
any $y$, we can find a neighborhood $\Omega'_y$ of $y$ where 
$S_c(y,\cdot)$ fails to be maximal. We can thus construct, 
through a formula like [8], a $u_y \in \mathcal{S}_c$ with 


$$\operatorname{ess\,sup}_{\Omega_y} H(\cdot, Du_y(\cdot)) < c \qquad [9]$$


in some neighborhood $\Omega_y$ of $y$ contained in $\Omega'_y$. 
Thanks to the compactness of $\mathbb{T}^N$, we can extract 
from $\{\Omega_y\}$ a finite subcover $\{\Omega_{y_i}\}$, $i = 1, \ldots, m$, for 
some $m \in \mathbb{N}$, and define 


$$u = \sum_i \lambda_i u_{y_i}$$


where the $\lambda_i$ are positive constants with $\sum_i \lambda_i = 1$. The 
convex character of the Hamiltonian and [9] imply 
that $u$ is a strict critical subsolution, which cannot 
be. We therefore conclude that there is a nonempty 
set of points $y$, denoted henceforth by $\mathcal{A}$, for which 
$S_c(y,\cdot)$ is indeed a maximal critical subsolution. It 
can also be proved, by exploiting some stability 
properties of the maximal subsolutions, that $\mathcal{A}$ is 
closed. Similarly, $-S_c(\cdot,y)$ must be a minimal 
critical subsolution for some $y$. We denote by $\mathcal{A}'$ 
the closed set made up of such points. 

The previous covering argument shows that if 
$y \notin \mathcal{A}$ (resp. $y \notin \mathcal{A}'$) then there is a local strict 
critical subsolution at $y$. The converse is also true: 
let in fact $\varphi$ be such a strict subsolution satisfying 
$\varphi(y) = S_c(y,y) = 0$; then $\varphi$ is subtangent to $S_c(y,\cdot)$ 
(resp. supertangent to $-S_c(\cdot,y)$) at $y$, by the very 
definition of the distance $S_c$. This shows that $S_c(y,\cdot)$ 
(resp. $-S_c(\cdot,y)$) is not a maximal (resp. minimal) 
critical subsolution, and so $y \notin \mathcal{A}$ (resp. $y \notin \mathcal{A}'$). Since 
the previous characterization holds for both $\mathcal{A}$ and 
$\mathcal{A}'$, it follows that $\mathcal{A} = \mathcal{A}'$. This set is a generalization 
of the (projected) Aubry set. We will come back to 
this point later on. 

We also see from the covering argument that there 
is a critical subsolution $\varphi$ which is strict outside $\mathcal{A}$, 
that is, such that $\operatorname{ess\,sup}_{\Omega} H(x, D\varphi(x)) < c$ for any 
open set $\Omega$ compactly contained in $\mathbb{T}^N \setminus \mathcal{A}$. 

This implies that any $y$ such that $\{p : H(y,p) \le c\}$ 
has empty interior belongs to $\mathcal{A}$. The empty interior 
condition in fact implies, thanks to the strict 
quasiconvexity of $H$, that the sublevel set reduces 
to a singleton, say $\{p_0\}$. We know that 
$\partial u(y) \subset \{p : H(y,p) \le c\}$ for any $u \in \mathcal{S}_c$; therefore, $\partial u(y)$ is a 
singleton and so any critical subsolution $u$ is strictly 
differentiable at $y$ with $H(y, Du(y)) = H(y, p_0) = c$. 
Hence, there cannot be critical subsolutions which 
are strict around $y$. 

The previously described points will be called, in 
the sequel, equilibria, and the (possibly empty) 
closed set made up of them will be denoted by $\mathcal{E}$. 
The reason for this terminology will be explained 
later. The differentiability property of the critical 
subsolutions at equilibria can be extended, quite 
surprisingly, to any point of $\mathcal{A}$, under more stringent 
assumptions on $H$. We will discuss this issue in the 
next section. 


Qualitative Properties of Generalized 
Aubry Set 


We introduce some dynamical aspects in the picture 
by showing that the distances $S_a$, defined in the 
previous section for any $a \ge c$, are actually of length 
type, in the sense that $S_a(y,x)$ equals, for any pair $y$, 
$x$, the infimum of the intrinsic length of absolutely 
continuous, or equivalently Lipschitz-continuous, 
curves joining $y$ to $x$. By intrinsic length, we mean 
the total variation of $S_a$ on the curve. It will be 
denoted by $\ell_a$, while $\ell$ will indicate the natural (i.e., 
Euclidean) length. 

For this purpose, we proceed to give a line-integral 
representation formula of $S_a$. To start with, 
we consider a $C^1$ subsolution $u$ to [1], some $x$, $y \in \mathbb{T}^N$ 
and a (Lipschitz-continuous) curve $\xi$, defined in 
some compact interval $I$, joining $y$ to $x$. We have 


$$u(x) - u(y) = \int_I Du(\xi)\,\dot\xi \, dt \le \int_I \sigma_a(\xi, \dot\xi)\, dt \qquad [10]$$


where, for any $(x,v) \in \mathbb{T}^N \times \mathbb{R}^N$, 
$\sigma_a(x,v) = \max_{p \in Z_a(x)} p\,v$ and 


$$Z_a(x) := \{p : H(x,p) \le a\}$$


Inequality [10] also holds for a Lipschitz-continuous 
subsolution to [1] through suitable replacement of 
the differential by the generalized gradient. The 
set-valued map $Z_a$ is compact convex valued, by the 
coercivity and quasiconvexity assumptions on $H$, 
and continuous with respect to the Hausdorff 
metric. The function $\sigma_a$ is accordingly continuous 
in the first variable, and convex and positively 
homogeneous in the second, being a support function. 
This implies in particular that the integral on 
the right-hand side of [10] is invariant under change 
of parameter preserving the orientation. We derive, 
from [10], 


$$S_a(y,x) \le \inf\left\{ \int_0^1 \sigma_a(\xi, \dot\xi)\, dt : \xi \text{ defined in } [0,1] \text{ and joining } y \text{ to } x \right\} \qquad [11]$$


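for any $y$, $x$. We denote by $\bar S_a(y,x)$ the quantity on 
the right-hand side of [11]. It is immediate that the 
triangle inequality holds for $\bar S_a$. The function 
$u := \bar S_a(y,\cdot)$ is, moreover, Lipschitz-continuous 
since $\sigma_a(x,v)/|v|$ is bounded from above in 
$\mathbb{T}^N \times (\mathbb{R}^N \setminus \{0\})$ because of the coercivity of $H$. Given 
$v \in \mathbb{R}^N$, we exploit the definition of $\bar S_a$, the 
continuity of $\sigma_a$, and the triangle inequality for $\bar S_a$, 
to get at any differentiability point $x_0$ of $u$, 


$$\begin{aligned}
Du(x_0)\,v &= \lim_{h \to 0^+} \frac{u(x_0) - u(x_0 - hv)}{h} \le \lim_{h \to 0^+} \frac{\bar S_a(x_0 - hv, x_0)}{h} \\
&\le \lim_{h \to 0^+} \frac{1}{h} \int_0^1 \sigma_a(x_0 - hvt, hv)\, dt \\
&= \lim_{h \to 0^+} \int_0^1 \sigma_a(x_0 - hvt, v)\, dt = \sigma_a(x_0, v)
\end{aligned}$$

This implies by the Hahn-Banach theorem that 
$Du(x_0) \in Z_a(x_0)$ or, in other terms, that $u = \bar S_a(y,\cdot) \in \mathcal{S}_a$. 
We then derive, from [11] and the very 
definition of $S_a$, 


$$S_a(y,x) = \inf\left\{ \int_0^1 \sigma_a(\xi, \dot\xi)\, dt : \xi \text{ defined in } [0,1] \text{ with } \xi(0) = y,\ \xi(1) = x \right\}$$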


Taking into account that the integral functional 
appearing in the previous formula is lower semicontinuous 
for the uniform convergence of equi-Lipschitz-continuous 
sequences of curves, by standard 
variational results, we in turn infer that it equals the 
intrinsic length $\ell_a$. Mathematically, 


$$\ell_a(\xi) = \int_I \sigma_a(\xi, \dot\xi)\, dt$$


for any compact interval $I$ and any curve $\xi$ defined 
in $I$. 
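
The line-integral formula can be tried out numerically in the simplest setting. The following is a minimal sketch, assuming the one-dimensional torus $\mathbb{R}/\mathbb{Z}$ and the mechanical Hamiltonian $H(x,p) = p^2/2 + V(x)$ with a 1-periodic potential $V$ and a level $a \ge \max V$; none of these choices comes from the text, and the function names `sigma`, `intrinsic_length`, `distance`, `flat` and `cosine` are hypothetical. In this setting $Z_a(x)$ is a symmetric interval, so $\sigma_a(x,v)$ is its radius times $|v|$, and the sketch takes for granted that the infimum over curves is attained on one of the two arcs joining the end points.

```python
import math

def sigma(a, V, x, v):
    """Support function of Z_a(x) = {p : p*p/2 + V(x) <= a} in the direction v."""
    radius = math.sqrt(max(0.0, 2.0 * (a - V(x))))  # Z_a(x) = [-radius, radius]
    return radius * abs(v)

def intrinsic_length(a, V, curve, velocity, t0, t1, steps=2000):
    """Midpoint-rule value of the integral of sigma_a(xi, xi') over [t0, t1]."""
    h = (t1 - t0) / steps
    total = 0.0
    for k in range(steps):
        t = t0 + (k + 0.5) * h
        total += sigma(a, V, curve(t), velocity(t)) * h
    return total

def distance(a, V, y, x):
    """Approximate S_a(y, x) on R/Z as the smaller intrinsic length of the two arcs."""
    d = (x - y) % 1.0  # Euclidean length of the positively oriented arc from y to x
    forward = intrinsic_length(a, V, lambda t: y + t * d, lambda t: d, 0.0, 1.0)
    backward = intrinsic_length(a, V, lambda t: y - t * (1.0 - d),
                                lambda t: -(1.0 - d), 0.0, 1.0)
    return min(forward, backward)

def flat(x):
    """Zero potential (assumption of this example)."""
    return 0.0

def cosine(x):
    """1-periodic potential with maximum 1 at the integers (assumption of this example)."""
    return math.cos(2.0 * math.pi * x)
```

With the zero potential and $a = 1/2$ the radius of $Z_a$ equals 1, so `distance(0.5, flat, y, x)` reduces to the usual distance on the circle; calling `intrinsic_length` on two different parametrizations of the same arc gives the same value up to quadrature error, in line with the invariance under change of parameter noted above.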

Since $S_a$ is just a semidistance, we do not have any 
a priori information on the sign of $\ell_a$; however, by 
[10], the intrinsic length of any cycle must be nonnegative. 
Furthermore, while $|\ell_a(\xi)|$ must be small 
for any curve $\xi$ with small natural length, by the 
coercivity condition on $H$, no converse estimates 
hold, in general. If $a > c$, some information in this 
direction can be gathered by taking a strict subsolution 
$\varphi$ to [1], which can be assumed smooth, up to 
regularization by mollification; then 
$D\varphi(x)\,v \le \sigma_a(x,v) - \rho|v|$ for any $(x,v) \in \mathbb{T}^N \times \mathbb{R}^N$ and some 
$\rho > 0$, and consequently 


$$\ell_a(\xi) = \int_I \big(\sigma_a(\xi,\dot\xi) - D\varphi(\xi)\,\dot\xi\big)\, dt + \varphi(x) - \varphi(y) \ge \rho\,\ell(\xi) - S_a(x,y) \qquad [12]$$


for any pair $y$, $x$ and any curve $\xi$, defined in some 
interval $I$, joining $y$ to $x$. The previous formula says, 
in particular, that when $|x - y|$ is small then any 
curve whose intrinsic length approximates $S_a(y,x)$ 
must have small natural length. The previous 
argument cannot be extended to the critical case. 
This gap suggests the next definition. The main 
purpose for introducing it is to get a metric 
characterization of the Aubry set $\mathcal{A}$. 

We say that $S_c$ is localizable at some $y$ if for every 
$\varepsilon > 0$ there is $0 < \delta_\varepsilon < \varepsilon$ such that 


$$S_c(y,x) = \inf\{\ell_c(\xi) : \xi \text{ joins } y \text{ to } x \text{ and } \ell(\xi) < \varepsilon\} \qquad [13]$$


whenever $|x - y| < \delta_\varepsilon$. If $y \notin \mathcal{A}$, we adapt the argument 
previously used in the strict supercritical case to 
get that $S_c$ is indeed localizable at $y$. In this case we 
have, in fact, at our disposal a critical subsolution, 
say $\varphi$, which is strict in some neighborhood $\Omega$ of $y$, 
thanks to the characterization of the Aubry set given 
in the previous section. 

We assume, to simplify, $\varphi$ to be $C^1$; under the 
natural condition of Lipschitz-continuity, generalized 
gradients should be used in place of the differential. 
We have $D\varphi(x)\,v \le \sigma_c(x,v) - \rho|v|$ for any 
$x \in \Omega$, any $v \in \mathbb{R}^N$, and some $\rho > 0$, and 
$D\varphi(x)\,v \le \sigma_c(x,v)$ for any $x$, $v$. Exploiting these inequalities, we 
obtain an estimate analogous to [12] for curves 
starting from $y$, which allows us to prove [13]. 

Conversely, let $y \notin \mathcal{E}$ be a point where $S_c$ is 
localizable. We claim that $Z_c(y) \subset D^-u(y)$, where 
$u := S_c(y,\cdot)$. It is enough to show that any $p_0$ in the 
interior of $Z_c(y)$ belongs to $D^-u(y)$, since $D^-u(y)$ is 
closed. Note that the interior of $Z_c(y)$ is nonempty 
since we are assuming that $y$ is not an equilibrium. 
Such a $p_0$ belongs to the interior of $Z_c(x)$ for $x$ 
sufficiently close to $y$, thanks to the continuity of $Z_c$; 
consequently, $p_0(x - y) \le \ell_c(\xi)$ for any $x$ close to $y$ 
and any curve $\xi$ joining $y$ to $x$ with $\ell(\xi)$ sufficiently 
small. Taking into account [13], we then deduce 


$$p_0(x - y) \le S_c(y,x) \quad \text{for } x \text{ close to } y$$


and so the linear function $\varphi(x) := p_0(x - y)$ is 
subtangent to $u$ at $y$. This in turn implies that $y$ is 
out of $\mathcal{A}$ since $\varphi$ is a local strict critical subsolution at 
$y$, and so $S_c(y,\cdot)$ cannot be a maximal subsolution by 
the characterization given in the previous section. 

The fact that $S_c$ is not localizable at any point 
$y \in \mathcal{A} \setminus \mathcal{E}$ leads to the announced metric characterization 
of $\mathcal{A}$. If $y$ is such a point, there are an $\varepsilon > 0$, a 
point $x$ with $|x - y|$, and so $|S_c(y,x)|$, as small as 
desired, and a curve $\xi$ joining $y$ to $x$ with 
$\ell_c(\xi) \approx S_c(y,x)$ and $\ell(\xi) \ge \varepsilon$. We construct a cycle $\gamma$, passing 
through $y$, by juxtaposition of $\xi$ and the Euclidean 
segment joining $x$ to $y$. We obtain, in this way, a 
sequence of cycles $\gamma_n$, passing through $y$, with 
$\ell_c(\gamma_n) \to 0$ and $\ell(\gamma_n) \ge \varepsilon$, for any $n$. 

The same result can also be obtained for $y \in \mathcal{E}$. In 
this case we select $\varepsilon > 0$ and $v_0 \in \mathbb{R}^N$ with 
$\sigma_c(y, v_0) = 0$, and denote by $B_n$ a sequence of 
Euclidean balls, centered at $y$, satisfying 
$\sigma_c(\cdot, v_0) < 1/n$ in $B_n$. We construct a sequence of cycles $\gamma_n$, 
passing through $y$, by going up and down on the 
line $\{y + s v_0\}$ in such a way that $\gamma_n(t) \in B_n$ for every 
$t$, and $\varepsilon < \ell(\gamma_n) < 2\varepsilon$; therefore $0 \le \ell_c(\gamma_n) < 2\varepsilon/n$. 

Conversely, such a sequence of cycles cannot exist 
at any $y \notin \mathcal{A}$ because $S_c$ is localizable at $y$. 

We emphasize that the previous characterization of $\mathcal{A}$ 
through cycles and the fact that $S_c$ is not localizable 
at any point $y \in \mathcal{A}$ with $\operatorname{int} Z_c(y) \ne \emptyset$ show that, 
apart from the special case of equilibria, the property of 
being a point of $\mathcal{A}$ is definitely not of local nature. 

As pointed out already, if $y \notin \mathcal{A}$, and so $S_c$ is 
localizable at $y$, then $Z_c(y) \subset D^-u(y)$, where 
$u := S_c(y,\cdot)$; on the other hand, we know that 
$D^-u(y) \subset \partial u(y)$ and $\partial u(y) \subset Z_c(y)$, where the latter 
inclusion holds since $u$ is a critical subsolution. We 
then derive 


$$D^-u(y) = \partial u(y) = Z_c(y)$$


We interpret these equalities as a convexity-type 
property, or, to use a more appropriate terminology, 
a semiconvexity property of the distance function 
$S_c(y,\cdot)$ at $y$. The same property holds for the 
Euclidean distance function $|x|$ at $0$. 

A contrasting phenomenon takes place if $y \in \mathcal{A}$, 
namely $S_c(y,\cdot)$ is semiconcave at $y$, which means 
that $D^+u(y) = \partial u(y)$. This is more complicated to 
prove (see Fathi 2005b), and requires, in addition, $H$ 
to be strictly convex in $p$ and locally Lipschitz-continuous 
in $(x,p)$. Under these assumptions one 
can, more generally, show that $S_c(y,\cdot)$ is semiconcave 
in $\mathbb{T}^N$ if $y \in \mathcal{A}$, while it is semiconcave in 
$\mathbb{T}^N \setminus \{y\}$ and semiconvex at $y$ if $y \notin \mathcal{A}$. Some 
important consequences can be deduced. 

First, thanks to the semiconcavity property there 
are $C^1$ supertangents to $u := S_c(y,\cdot)$ at $y$, whenever 
$y \in \mathcal{A}$. Such a function, say $\varphi$, is also supertangent to 
$-S_c(\cdot,y)$, which is a minimal critical subsolution, at 
the same point. We know from the previous section 
that no supertangent to $-S_c(\cdot,y)$ at $y$ can be a strict 
critical subsolution locally at $y$, and so 
$H(y, D\varphi(y)) = c$. This implies that $D^+u(y)$ is 
contained in the boundary of $Z_c(y)$. We then see, taking 
into account that $D^+u(y)$ is convex and $Z_c(y)$ strictly 
convex, that $D^+u(y)$ reduces to a singleton, and so, 
by the semiconcavity property, $\partial u(y)$ reduces to a 
singleton. Therefore, $S_c(y,\cdot)$ is strictly differentiable 
at $y$, for any $y \in \mathcal{A}$. One can similarly show that 
$-S_c(\cdot,y)$ is strictly differentiable at $y$. 

Second, given $y \in \mathcal{A}$ and a critical subsolution $w$, 
which can be assumed, up to addition of a constant, 
to vanish at $y$, we see that $S_c(y,\cdot)$ (resp. $-S_c(\cdot,y)$) is 
supertangent (resp. subtangent) to $w$ at $y$ because of 
its extremality properties. Since both these 
super(sub)tangents are differentiable, by the previous 
point, we deduce that $w$ itself is differentiable at $y$. 
Moreover, the differentials at $y$ of all three functions 
under consideration, namely $S_c(y,\cdot)$, $-S_c(\cdot,y)$, and 
$w$, coincide. In particular, $H(y, Dw(y)) = c$, and 
$y \mapsto Dw(y)$ is continuous on $\mathcal{A}$, since $S_c(y,\cdot)$ has been 
proved to be strictly differentiable at $y$, whenever 
$y \in \mathcal{A}$. Any critical subsolution, restricted to $\mathcal{A}$, is 
consequently a continuously differentiable solution to [4]. 

Summing up, we have discovered (under the 
assumption of strict convexity and Lipschitz-continuity 
for $H$) that every critical subsolution is 
differentiable on $\mathcal{A}$, and the differential on $\mathcal{A}$ is the 
same for every critical subsolution. A continuous 
map $G : \mathcal{A} \to \mathbb{R}^N$ is then defined by taking $G(y)$ 
equal to the common differential of any critical 
subsolution at $y$. We denote by $\widetilde{\mathcal{A}}$ the graph of $G$, 
which is a subset of the cotangent bundle of $\mathbb{T}^N$, 
identified with $\mathbb{T}^N \times \mathbb{R}^N$. 

As we have already pointed out, the existence of a 
$C^\infty$ subsolution to [1] is obvious when $a > c$, and 
such a subsolution can be obtained through a 
suitable regularization by mollification of any strict 
subsolution. The same construction cannot be 
performed at the critical level, since no strict critical 
subsolutions are available to start the regularization 
procedure. We can nevertheless show the existence 
of $C^1$ critical subsolutions by exploiting the information 
gathered on the Aubry set. We start by 
considering a countable locally finite open cover 
$\{\Omega_i\}$ of $\mathbb{T}^N \setminus \mathcal{A}$; we know from the previous section that 
there is a critical subsolution, say $w_i$, which is strict 
on $\Omega_i$, for any $i$. Loosely speaking, we have some 
room, also in this case, for regularizing $w_i$ in such a 
way that the regularized function is still a critical 
subsolution, at least on $\Omega_i$. 

We can glue together, with some precautions, 
these regularized local critical subsolutions through 
a $C^\infty$ partition of unity, to produce a critical 
subsolution which is $C^\infty$ outside $\mathcal{A}$. Using the fact 
that any critical subsolution is differentiable on $\mathcal{A}$, 
we can further adjust the previous construction so 
that the critical subsolution is $C^1$ on the whole of $\mathbb{T}^N$. 
We state this result in the following way: if the 
equation [1] has a subsolution then it also has a $C^1$ 
subsolution. It is worth noticing that it holds even if 
the underlying manifold is noncompact (see Fathi 
(2004, 2005b)). 


The Intrinsic Lengths and the Action 
Functional 


Here we assume $H$ to satisfy all the usual assumptions 
needed to define Hamilton's equations [2] 
and to have the completeness of the associated 
Hamiltonian flow. Namely, we require $H$ to be $C^2$ 
in both variables, $C^2$-strictly convex, that is, 
$H_{pp} > 0$ in $\mathbb{T}^N \times \mathbb{R}^N$, and superlinear, in the sense that 


$$\lim_{|p| \to +\infty} \frac{H(x,p)}{|p|} = +\infty \quad \text{uniformly in } x$$


We define the Lagrangian $L$ as the Fenchel 
transform of $H$. It takes finite values thanks to the 
superlinearity condition, and, in addition, inherits 
from $H$ the $C^2$ regularity, $C^2$-strict convexity and 
superlinearity. In our setting, the Fenchel transform 
is involutive. 

We call a vector $v_0$ and a covector $p_0$ conjugate at 
a point $x$ if $v_0 = H_p(x, p_0)$, and so 
$L(x, v_0) = p_0 v_0 - H(x, p_0)$. This also implies the relations $p_0 = L_v(x, v_0)$ 
and $H(x, p_0) = p_0 v_0 - L(x, v_0)$. If $H(x, p_0) = a$, for 
some $a$, then $p_0 v_0 = \sigma_a(x, v_0)$, and $p_0$ is the unique 
element of $Z_a(x)$ for which such a relation holds. 
Since the function $y \mapsto p_0 v_0 - H(y, p_0)$ is subtangent 
to $L(\cdot, v_0)$ at $x$, we see that $L_x(x, v_0) = -H_x(x, p_0)$. 
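
The conjugacy relations can be checked numerically on a simple case. The sketch below is only an illustration: it assumes the one-dimensional Hamiltonian $H(x,p) = p^2/2 + V(x)$ with $V(x) = \cos(2\pi x)$ (a choice made here, not in the text) and approximates the Fenchel transform $L(x,v) = \max_p \,(p\,v - H(x,p))$ by a brute-force search on a grid of momenta; the name `fenchel_transform` and its grid parameters are hypothetical.

```python
import math

def V(x):
    """Periodic potential used for illustration (an assumption of this example)."""
    return math.cos(2.0 * math.pi * x)

def H(x, p):
    """Mechanical Hamiltonian H(x, p) = p^2/2 + V(x)."""
    return 0.5 * p * p + V(x)

def fenchel_transform(H, x, v, p_min=-10.0, p_max=10.0, n=20001):
    """Approximate L(x, v) = max_p (p*v - H(x, p)) on a uniform grid of momenta.

    Returns the approximate value together with the maximizing momentum,
    which approximates the covector conjugate to v at x.
    """
    best_value, best_p = -math.inf, p_min
    for k in range(n):
        p = p_min + (p_max - p_min) * k / (n - 1)
        value = p * v - H(x, p)
        if value > best_value:
            best_value, best_p = value, p
    return best_value, best_p
```

For this Hamiltonian the exact transform is $L(x,v) = v^2/2 - V(x)$ and the conjugate covector of $v$ is $p_0 = v$, so `fenchel_transform(H, 0.3, 1.5)` should return a value close to `0.5 * 1.5**2 - V(0.3)` together with a momentum close to 1.5. The grid bounds must of course contain the maximizer; superlinearity of $H$ is what guarantees that such bounds exist.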

We introduce, for any (Lipschitz-continuous) 
curve $\xi$ defined in $[a,b]$, for some $a < b$, the action 
functional $A(\xi)$ through 


$$A(\xi) = \int_a^b L(\xi, \dot\xi)\, dt$$


We say that the curve $\xi$ is a minimizer of the action 
if $A(\xi) \le A(\gamma)$ for any $\gamma$ defined in $[a,b]$ and with the 
same end points as $\xi$. It is a classical result in 
calculus of variations that any such minimizer $\xi$ 
is of class $C^2$ and satisfies the Euler-Lagrange 
equation 


$$\frac{d}{dt} L_v(\xi, \dot\xi) = L_x(\xi, \dot\xi) \quad \text{in } ]a,b[$$

Consequently, $\xi$ and the conjugate curve 
$\eta = L_v(\xi, \dot\xi)$ satisfy Hamilton's equations [2]. 
Note that all the integral curves of [2] lie in a fixed 
level of the Hamiltonian, which is compact by the 
superlinearity condition. The corresponding Hamiltonian 
flow is consequently complete. 

We show that if $x_0 \in \mathcal{E}$, and $Z_c(x_0) = \{G(x_0)\}$, 
then $(x_0, G(x_0))$ is a steady state of the Hamiltonian 
flow. In this case, in fact, $c = \min_p H(x_0,p)$ and so 
$L(x_0,0) = -c$ and $H_p(x_0, G(x_0)) = 0$, or equivalently 
$G(x_0)$ and $0$ are conjugate at $x_0$. Taking into 
account that $c$ is the critical value, we have that 


$$L(x,0) = -\min_p H(x,p) \ge -c \quad \text{for any } x \in \mathbb{T}^N$$


so that $x_0$ is a minimizer of $x \mapsto L(x,0)$ and 
$L_x(x_0, 0) = -H_x(x_0, G(x_0)) = 0$. It is easy to see that, 
conversely, if $(x_0,p_0)$ is a steady state of the 
Hamiltonian flow and $H(x_0,p_0) = c$ then $x_0 \in \mathcal{E}$ 
and $p_0 = G(x_0)$. 
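
Two of the statements above, that integral curves stay in a level set of $H$ and that equilibria correspond to steady states, can be observed numerically. The following is a rough sketch under several assumptions that are not in the text: Hamilton's equations [2] are taken in the standard form $\dot\xi = H_p$, $\dot\eta = -H_x$, the Hamiltonian is $H(x,p) = p^2/2 + \cos(2\pi x)$ on the one-dimensional torus (for which the critical level is 1 and the only equilibrium is $x = 0$), and a classical fourth-order Runge-Kutta step is used. The names `field`, `rk4_step` and `max_energy_drift` are hypothetical.

```python
import math

def V(x):
    """Periodic potential with maximum 1 at the integers (assumption of this example)."""
    return math.cos(2.0 * math.pi * x)

def H(x, p):
    """Mechanical Hamiltonian H(x, p) = p^2/2 + V(x)."""
    return 0.5 * p * p + V(x)

def field(x, p):
    """Right-hand side (H_p, -H_x) of Hamilton's equations for H above."""
    return p, 2.0 * math.pi * math.sin(2.0 * math.pi * x)

def rk4_step(x, p, h):
    """One classical Runge-Kutta step of size h for the Hamiltonian system."""
    k1x, k1p = field(x, p)
    k2x, k2p = field(x + 0.5 * h * k1x, p + 0.5 * h * k1p)
    k3x, k3p = field(x + 0.5 * h * k2x, p + 0.5 * h * k2p)
    k4x, k4p = field(x + h * k3x, p + h * k3p)
    return (x + h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0,
            p + h * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0)

def max_energy_drift(x0, p0, h=1e-3, steps=2000):
    """Largest deviation of H from its initial value along the numerical orbit."""
    energy0 = H(x0, p0)
    x, p, drift = x0, p0, 0.0
    for _ in range(steps):
        x, p = rk4_step(x, p, h)
        drift = max(drift, abs(H(x, p) - energy0))
    return drift
```

In this example `field(0.0, 0.0)` vanishes and `H(0.0, 0.0)` equals the critical level, so $(0,0)$ is a steady state sitting over the equilibrium, while `max_energy_drift` stays at the level of the integration error for a generic initial condition. The Runge-Kutta scheme is not symplectic, so the drift is small only over moderate time spans.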

We want to establish a relation between $A(\cdot)$ and 
the length functionals $\ell_a$, defined in the previous 
section for $a \ge c$. This will allow us, among other 
things, to show that the Aubry set $\widetilde{\mathcal{A}}$, the graph of $G$, 
is invariant for the Hamiltonian flow and to analyze the properties 
of the integral curves lying on it. To this aim, we 
consider the minimal geodesics for $S_a$, $a \ge c$, that is, 
the curves, defined on compact intervals, whose 
intrinsic lengths $\ell_a$ equal the distance $S_a$ between 
their end points. 

If $a > c$, we claim that, given any pair of points in 
$\mathbb{T}^N$, there is a minimal geodesic joining them. 
Recalling the formula [12], whose validity depends 
on the fact that in the strict supercritical case there is 
a smooth strict subsolution to [1], we have 


$$\ell_a(\xi) \to +\infty \quad \text{whenever } \ell(\xi) \to +\infty$$


The claim is then proved by using the Ascoli 
theorem and the lower-semicontinuity property of 
$\ell_a$. In the critical case, given $y \notin \mathcal{A}$, we can use the 
same argument to deduce the existence of minimizing 
geodesics for $S_c$ between $y$ and any point $x$ 
sufficiently close to $y$ (in the Euclidean sense). This 
comes from the fact that $S_c$ is localizable at $y$, and so 
a sequence of curves $\xi_n$ with $\ell_c(\xi_n) \to S_c(y,x)$ can be 
chosen with bounded natural length. For a general pair of points, 
we will show, on the contrary, that existence of a 
minimal geodesic is not guaranteed in the critical 
case. 

We consider a minimizing geodesic $\xi$ for $S_a$ 
between a pair of points $y$ and $x$. We assume $a > c$, 
or $a = c$ and $\xi \cap \mathcal{E} = \emptyset$. We want to show that $\xi$ is a 
minimizing curve for the action, up to a change of 
parameter. We choose the new parameter in such a 
way that 


$$L(\tilde\xi, \dot{\tilde\xi}) + a = \sigma_a(\tilde\xi, \dot{\tilde\xi}) \qquad [14]$$


where we have denoted by $\tilde\xi$ the reparametrized 
curve. Since $\xi$ stays away from $\mathcal{E}$, the velocities $|\dot{\tilde\xi}|$ 
are bounded from below by a positive constant and 
so the domain of definition of $\tilde\xi$, denoted by $[0,T]$, is 
a compact interval. Note that $\ell_a(\xi) = \ell_a(\tilde\xi)$, since the 
intrinsic length is invariant under change of parameter. 
We take into account that $\xi$ is a minimal 
geodesic and the inequality $L(x,v) + a \ge \sigma_a(x,v)$, 
which holds for any $x$, $v$, to get 


$$A(\tilde\xi) = \ell_a(\tilde\xi) - aT \le \ell_a(\gamma) - aT \le A(\gamma)$$


for any $\gamma$ defined in $[0,T]$ with $\gamma(0) = y$, $\gamma(T) = x$. 
This proves the announced minimality property of $\tilde\xi$. 


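Furthermore, we show that the function 
$u := S_a(y,\cdot)$ is strictly differentiable at $\tilde\xi(s)$, for 
$s \in\, ]0,T[$, and 


$$Du(\tilde\xi) = L_v(\tilde\xi, \dot{\tilde\xi}) \qquad [15]$$


in $]0,T[$. Hence, $(\tilde\xi, Du(\tilde\xi))$ is a solution of 
Hamilton's equations in $]0,T[$. To see this, we start 
from the relations 


$$\int_0^t \frac{d}{ds} u(\tilde\xi(s))\, ds = u(\tilde\xi(t)) = \int_0^t \sigma_a(\tilde\xi, \dot{\tilde\xi})\, ds = \int_0^t \eta\,\dot{\tilde\xi}\, ds \qquad [16]$$


which hold in $[0,T]$ because $u$ is Lipschitz-continuous, 
$\tilde\xi$ is a minimizing geodesic, and $\eta(s)$ is 
conjugate to $\dot{\tilde\xi}(s)$ at $\tilde\xi(s)$ for any $s \in [0,T]$. We 
know that 


$$\frac{d}{ds} u(\tilde\xi(s)) = p\,\dot{\tilde\xi}(s) \quad \text{for a.e. } s \text{ and some } p \in \partial u(\tilde\xi(s))$$


We have that $p \in Z_a(\tilde\xi(s))$, since $u$ is a 
subsolution to [1], and so 


$$p\,\dot{\tilde\xi}(s) \le \sigma_a(\tilde\xi(s), \dot{\tilde\xi}(s)) = \eta(s)\,\dot{\tilde\xi}(s)$$

We see, in the light of [16], that equality must hold 
in the previous formula, for a.e. $s$. Therefore, 


$$\frac{d}{ds} u(\tilde\xi(s)) = \eta(s)\,\dot{\tilde\xi}(s) \quad \text{for a.e. } s \qquad [17]$$


We derive from the fact that the function $\eta(\cdot)\,\dot{\tilde\xi}(\cdot)$ is 
continuous that $u(\tilde\xi(\cdot))$ is actually continuously 
differentiable in $]0,T[$ and that [17] holds for any $s$. 
We finally exploit that $u$ is semiconcave in $\mathbb{T}^N \setminus \{y\}$, 
as pointed out in the previous section, and so 
$D^+u(\tilde\xi(s)) = \partial u(\tilde\xi(s))$, for any $s$. If $\varphi$ is a $C^1$ 
supertangent to $u$ at $\tilde\xi(s)$ then 
$D\varphi(\tilde\xi(s))\,\dot{\tilde\xi}(s) = \frac{d}{ds} u(\tilde\xi(s))$; 
accordingly, 


$$p\,\dot{\tilde\xi}(s) = \eta(s)\,\dot{\tilde\xi}(s) \quad \text{for any } s \text{ and } p \in \partial u(\tilde\xi(s))$$

Since $\partial u(\tilde\xi(s)) \subset Z_a(\tilde\xi(s))$, this implies that 
$\partial u(\tilde\xi(s)) = \{\eta(s)\}$. This actually gives the strict differentiability 
of the function $u$ at $\tilde\xi(s)$, and $Du(\tilde\xi(s)) = \eta(s)$ for any $s$. 

The same argument works, with some adjustment, 
also when $a = c$ and $\xi \cap \mathcal{E} \ne \emptyset$. If, for instance, 
$y \notin \mathcal{E}$ and $t_0 = \min\{t : \xi(t) \in \mathcal{E}\}$, then by reparametrizing $\xi$ in 
$[0, t_0[$, as indicated in [14], we get a curve $\tilde\xi$ defined 
in $[0,+\infty[$ which is a minimizer of the action 
functional in any compact interval contained in 
$[0,+\infty[$. Moreover, $u := S_c(y,\cdot)$ is strictly 
differentiable at $\tilde\xi(s)$, for $s \in\, ]0,+\infty[$, and $(\tilde\xi, Du(\tilde\xi))$ is a solution 
of Hamilton's equations. 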

We proceed to investigate the properties of the 
Hamiltonian flow on $\widetilde{\mathcal{A}}$, the graph of $G$. We take a $y_0$ in 
$\mathcal{A} \setminus \mathcal{E}$, and consider a sequence $\xi_n$ of cycles passing through $y_0$ 
with $\ell_c(\xi_n) \to 0$, $\ell(\xi_n) \ge 2\delta$, for some positive $\delta$. Such 
a sequence does exist in view of the characterization 
of $\mathcal{A}$ through cycles given in the previous section. 
Moreover, we assume that the $\xi_n$ are parametrized by 
the natural arc length in $[-T_n, T_n]$, for some $T_n \ge \delta$, 
and satisfy $\xi_n(0) = y_0$ for any $n$. There is then defined 
a uniform limit curve $\gamma$ in $[-\delta,\delta]$, up to a 
subsequence, thanks to the Ascoli theorem. 

The idea is to construct a new sequence of cycles 
$\gamma_n$ by replacing the portion of $\xi_n$ between $-\delta$ and 
$\delta$ by $\gamma$, and pasting this new piece with the 
remainder of $\xi_n$ through Euclidean segments at the 
end points. The $\gamma_n$ are still of infinitesimal intrinsic 
length $\ell_c$, which shows, in particular, that $\gamma$ is 
contained in $\mathcal{A}$. By exploiting that $S_c$ is a length 
distance, that the $\gamma_n$ are cycles, and the formula [7], 
with $a = c$, we get 


$$\ell_c(\gamma_n) \ge S_c(\gamma(-\delta), \gamma(\delta)) + S_c(\gamma(\delta), \gamma(-\delta)) \ge 0$$


for any $n$, and we at last derive 


$$\ell_c(\gamma) = S_c(\gamma(-\delta), \gamma(\delta)) = -S_c(\gamma(\delta), \gamma(-\delta))$$


Note that the second equality is actually redundant. 
By reparametrizing $\gamma$, as in [14], with $a = c$, in some 
open interval containing 0 as interior point and 
contained in $[-\delta,\delta]$, we get a curve contained in 
$\mathcal{A} \setminus \mathcal{E}$, denoted by $\tilde\xi$, defined on some open interval $I$ 
and satisfying 


$$A(\tilde\xi|_{[s,t]}) + c\,(t - s) = \ell_c(\tilde\xi|_{[s,t]}) = -S_c(\tilde\xi(t), \tilde\xi(s)) \quad \text{for any } t > s \qquad [18]$$


This, in particular, shows that $\tilde\xi$ is a minimizer of the 
action functional in any $[s,t] \subset I$. If we denote, as 
usual, by $\eta$ the curve conjugate to $\tilde\xi$, we have, 
arguing as above, that $\eta(t)$ is the differential of the 
function $S_c(\tilde\xi(s),\cdot)$ at $\tilde\xi(t)$, but, since the differentials 
of all critical subsolutions coincide on $\mathcal{A}$, we finally 
get that $\eta(t) = G(\tilde\xi(t))$ for every $t \in I$. Therefore, 
$(\tilde\xi, G(\tilde\xi))$ is a solution of Hamilton's equations in 
$I$ and is contained in $\widetilde{\mathcal{A}}$, the graph of $G$. The same properties can be 
extended to the whole of $\mathbb{R}$. 

Taking into account that if $y \in \mathcal{E}$ then $(y, G(y))$ is 
a steady state of the Hamiltonian flow, we in the 
end see that $\widetilde{\mathcal{A}}$ is foliated by integral curves of 
the Hamiltonian flow of the form $(\xi, G(\xi))$, with $\xi$ enjoying the 
variational property [18]. This is indeed a 
characterization since if, conversely, a curve $\xi$ 
satisfies [18] then it must be contained in $\mathcal{A}$. 

As an application, we finally show that there 
cannot be minimal geodesics, for the critical metric 
$S_c$, joining a point of $\mathcal{A}$, say $y$, to some $x \notin \mathcal{A}$, at 
least when $\mathcal{E} = \emptyset$. If such a geodesic, say $\xi$, exists, 
and is defined in $[0,T]$, for some $T > 0$, then 
$(\xi, Du(\xi))$ is a solution of Hamilton's equations, 
up to a change of parameter, where $u := S_c(y,\cdot)$, 
satisfying the initial conditions $\xi(0) = y$, 
$\eta(0) = \lim_{t \to 0^+} Du(\xi(t))$. 

The last relation tells us that $\eta(0) \in \partial u(y)$ and, 
since $u$ is differentiable at $y \in \mathcal{A}$ with $Du(y) = G(y)$, 
we conclude that $\eta(0) = G(y)$. Therefore, $(\xi, Du(\xi))$ is 
a part of the integral curve of the Hamiltonian flow 
starting at $(y, G(y))$ that we know, by the above 
reasoning, to be contained in the graph of $G$ over $\mathcal{A}$, which is in 
contradiction with $\xi(T) = x \notin \mathcal{A}$. 


See also: Control Problems in Mathematical Physics; 
Dynamical Systems in Mathematical Physics: An 
Illustration from Water Waves; KAM Theory and Celestial 
Mechanics; Minimax Principle in the Calculus of Variations; 
Optimal Transportation; Stability Theory and KAM. 


Introduction 


The phenomenon of superconductivity is one of the 
most profound manifestations of quantum 
mechanics in the macroscopic world. The celebrated 
Bardeen-Cooper-Schrieffer (BCS) theory (Bardeen 
et al. 1957) of superconductivity (SC) provides a 
basic theoretical framework to understand this 
remarkable phenomenon in terms of the pairing of 
electrons with opposite spin and momenta to form a 
collective condensate state. This theory not only 
explains quantitatively the experimental data of 
conventional superconductors; the basic concepts 
developed from it, including spontaneously broken 
symmetry, the Nambu-Goldstone modes and the 
Anderson-Higgs mechanism, also provide the essential 
building blocks for the unified 
theory of fundamental forces. The discovery of 
high-temperature superconductivity (HTSC) in the copper 
oxide materials poses a profound challenge: to 
understand theoretically the phenomenon of 
superconductivity in the extreme limit of strong correlations. 
While the basic idea of electron pairing in the 
BCS theory carries over to the HTSC, other aspects 
like the weak-coupling mean-field approximation 
and the phonon-mediated pairing mechanism may 
not apply without modifications. Therefore, the HTSC 
systems provide an exciting opportunity to develop 
new theoretical frameworks and concepts for 
strongly correlated electronic systems. 

To date, a number of different HTSC materials have 
been discovered. The most studied ones include the 
hole-doped La$_{2-x}$Sr$_x$CuO$_{4+\delta}$ (LSCO), YBa$_2$Cu$_3$O$_{6+\delta}$ 
(YBCO), Bi$_2$Sr$_2$CaCu$_2$O$_{8+\delta}$ (BSCO), Tl$_2$Ba$_2$CuO$_{6+\delta}$ 
(TBCO) materials and the electron-doped 
Nd$_{2-x}$Ce$_x$CuO$_4$ (NCCO) material. All these materials have a 
two-dimensional (2D) CuO$_2$ plane, and have an 
antiferromagnetic (AF) insulating phase at half-filling. 
The magnetic properties of this insulating phase are well 
approximated by the antiferromagnetic Heisenberg 
model with spin $S = 1/2$ and an AF exchange constant 
$J \sim 100$ meV. The Néel temperature for the 3D AF 
ordering is approximately $T_N \sim 300$-$500$ K. 
The HTSC material can be doped either by 
holes or by electrons. In the doping range of 
$5\% < x < 15\%$, there is an SC phase with a dome-like 
shape in the temperature versus doping plane. The 
maximal SC transition temperature $T_c$ is of the order 
of 100 K. The generic phase diagram of HTSC is 
shown in Figure 1. 

One of the main questions concerning the HTSC 
phase diagram is the transition region between the 
AF and the SC phases. Partly because of the 
complicated material chemistry in this regime, 
there is no universal agreement among different 
experiments. Different experiments indicate several 
different possibilities, including phase separation 
with an inhomogeneous density distribution, uni- 
form coexistence phase between AF and SC and 
periodically ordered spin and charge distributions in 
the form of stripes or checkerboards. 

The phase diagram of the HTSC cuprates also 
contains a regime with anomalous behaviors 
conventionally called the pseudogap phase. This region 
of the phase diagram is indicated by the dashed lines 
in Figure 1. In conventional superconductors, a 
pairing gap opens up at $T_c$. In a large class of HTSC 
cuprates, however, an electronic gap starts to open 
up at a temperature much higher than $T_c$. Many 
experiments indicate that the pseudogap “phase” is 
not a true thermodynamical phase, but rather the 
precursor towards a crossover behavior. 

Figure 1 Phase diagram of the NCCO and the YBCO 
superconductors. 

The SC phase of the HTSC has a number of 
striking properties not shared by conventional 
superconductors. First of all, phase-sensitive
experiments indicate that the SC phase for most of the
cuprates has d-wave-like pairing symmetry. This is
also supported by the photoemission experiments
which show the existence of the nodal points in the
quasiparticle gap. Neutron scattering experiments
find a new type of collective mode, carrying spin 1,
lattice momentum close to $(\pi, \pi)$, and a
resolution-limited sharp resonance energy around 20–40 meV.
Most remarkably, this resonance mode appears only
below $T_c$ of the optimally doped cuprates. Another
property uniquely different from the conventional 
superconductors is the vortex state. Most HTSCs are 
type II superconductors where the magnetic field can 
penetrate into the SC state in the form of a vortex 
lattice, where the SC order is destroyed at the center 
of the vortex core. In conventional superconductors, 
the vortex core is filled by the normal metallic 
electrons. However, a number of different
experimental probes, including neutron scattering, muon
spin resonance ($\mu$SR), and nuclear magnetic
resonance (NMR), have shown that the vortex cores in
the HTSC cuprates are antiferromagnetic, rather 
than normal metallic. This phenomenon has been 
observed in almost all HTSC materials, including 
LSCO, YBCO, TBCO, and NSCO, making it one of 
the most universal properties of the HTSC cuprates. 

The HTSC materials also have highly unusual 
transport properties. While conventional metals 
have a $T^2$ dependence of resistivity, in accordance
with the predictions of the Fermi liquid theory, the 
HTSC materials have a linear T dependence of 
resistivity near optimal doping. This linear T 
dependence extends over a wide temperature win- 
dow, and seems to be universal among most of the 
cuprates. When the underdoped or sometimes 
optimally doped SC state is destroyed by applying 
a high magnetic field, the “normal state” is not a 
conventional conducting state, but exhibits insula- 
tor-like behavior, at least along the c-axis. This 
phenomenon may be related to the insulating AF 
vortices mentioned in the previous paragraph. 

The discovery of HTSC has greatly stimulated the 
theoretical understanding of superconductivity in 
strongly correlated systems. There are a number of 
promising approaches, partially reviewed in Dagotto 
(1994), Imada et al. (1998), and Orenstein and 
Millis (2000), but a universally accepted theory has
not yet emerged. This article focuses on a particular
theory, which unifies the AF and the SC phases of
the HTSC cuprates based on an approximate SO(5)
symmetry (Zhang 1997). The SO(5) theory draws
its inspirations from the successful application of 
symmetry concepts in theoretical physics. All funda- 
mental laws of Nature are statements about sym- 
metry. Conservation of energy, momentum, and 
charge are direct consequences of global symmetries. 
The form of fundamental interactions is dictated by 
local gauge symmetries. Symmetry unifies appar- 
ently different physical phenomena into a common 
framework. For example, electricity and magnetism 
were discovered independently, and viewed as 
completely different phenomena before the
nineteenth century. Maxwell’s theory, and the
underlying relativistic symmetry between space and time,
unify the electric field E and the magnetic field B
into a common electromagnetic field tensor $F_{\mu\nu}$.
This unification shows that electricity and magnet- 
ism share a common microscopic origin, and can be 
transformed into each other by going to different 
inertial frames. As discussed previously, the two 
robust and universal ordered phases of the HTSC 
are the AF and the SC phases. The central question 
of HTSC concerns the transition from one phase to 
the other as the doping level is varied. The SO(5) 
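theory unifies the 3D AF order parameter
$(N_x, N_y, N_z)$ and the 2D SC order parameter
$(\mathrm{Re}\,\Delta, \mathrm{Im}\,\Delta)$ into a single, 5D order
parameter called “superspin,” in a way similar to the
unification of electricity and magnetism in Maxwell’s theory:

$$F_{\mu\nu} = \begin{pmatrix} 0 & & & \\ E_x & 0 & & \\ E_y & B_z & 0 & \\ E_z & -B_y & B_x & 0 \end{pmatrix} \Leftrightarrow n_a = \begin{pmatrix} \mathrm{Re}\,\Delta \\ N_x \\ N_y \\ N_z \\ \mathrm{Im}\,\Delta \end{pmatrix} \qquad [1]$$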


This unification relies on the postulate that a 
common microscopic interaction is responsible for 
both AF and SC in the HTSC cuprates and related 
materials. A well-defined SO(5) transformation 
rotates one form of the order into another. Within 
this framework, the mysterious transition from the
AF to the SC phase as a function of doping is explained
in terms of a rotation in the 5D order parameter
space. Symmetry principles are not only fundamental
and beautiful, they are also practically useful in
extracting information from a strongly interacting 
system, which can be tested quantitatively. The 
approximate SO(5) symmetry between the AF and 
the SC phases has many direct consequences, which 
can be, and some of them have been, tested both 
numerically and experimentally. 

The commonly used microscopic model of the 
HTSC materials is the repulsive Hubbard model, 
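which describes the electronic degrees of freedom in
the CuO$_2$ plane. Its low-energy limit, the t–J model,
is defined by

$$H = -t \sum_{\langle x,x' \rangle} \left( c_\sigma^\dagger(x) c_\sigma(x') + \mathrm{h.c.} \right) + J \sum_{\langle x,x' \rangle} \mathbf{S}(x) \cdot \mathbf{S}(x') \qquad [2]$$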

where the term t describes the hopping of an
electron with spin $\sigma$ from a site x to its nearest
neighbor x’, with double occupancy removed, and 
the J terms describe the nearest-neighbor exchange 
of its spin S. The main merit of these models does 
not lie in the microscopic accuracy and realism, but 
rather in the conceptual simplicity. However, 
despite their simplicity, these models are still very 
difficult to solve, and their phase diagrams cannot 
be compared directly with experiments. The idea of 
the SO(5) theory is to derive an effective quantum 
Hamiltonian on a coarse-grained lattice, which 
contains only the superspin degrees of freedom. 
The resulting SO(5) quantum nonlinear $\sigma$-model is
much simpler to solve using the standard field 
theoretical techniques, and the resulting phase 
diagram can be compared directly with 
experiments. 


SO(4) Symmetry of the Hubbard Model 


Before presenting the full SO(5) theory, let us first
discuss a much simpler toy model, namely the 
negative U Hubbard model, which has an SC 
ground state with s-wave pairing. However, it also 
has a charge-density-wave (CDW) ground state at 
half-filling. The competition between CDW and the 
SC states is similar to the competition between AF 
and SC states in the HTSC cuprates. In the negative 
U Hubbard model, the CDW/SC competition can be 
accurately described by a hidden symmetry, namely 
the SO(4) symmetry of the Hubbard model. 

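The Hubbard model is defined by the Hamiltonian

$$H = -t \sum_{\langle x,x' \rangle} \left( c_\sigma^\dagger(x) c_\sigma(x') + \mathrm{h.c.} \right) + U \sum_x \left( n_\uparrow(x) - \tfrac{1}{2} \right) \left( n_\downarrow(x) - \tfrac{1}{2} \right) - \mu \sum_{x,\sigma} n_\sigma(x) \qquad [3]$$

where $c_\sigma(x)$ is the fermion operator and
$n_\sigma(x) = c_\sigma^\dagger(x) c_\sigma(x)$ is the electron density
operator at site x with spin $\sigma$; t, U, and $\mu$ are the
hopping, interaction, and the chemical potential
parameters, respectively.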
The Hubbard model has a pseudospin SU(2) symmetry 
generated by the operators 




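$$\eta^- = \sum_x (-)^x c_\uparrow(x) c_\downarrow(x), \quad \eta^+ = (\eta^-)^\dagger, \quad \eta^z = \frac{1}{2} \sum_{x,\sigma} \left( n_\sigma(x) - \tfrac{1}{2} \right), \quad [\eta^\alpha, \eta^\beta] = i \epsilon^{\alpha\beta\gamma} \eta^\gamma \qquad [4]$$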


where $\eta^\pm = \eta^x \pm i\eta^y$ and $\alpha = x, y, z$. The model is
defined on any bipartite lattice, and the lattice
function $(-)^x$ takes the value 1 on the even sublattice
and −1 on the odd sublattice. These operators commute
with the Hubbard Hamiltonian at half-filling when
$\mu = 0$, that is, $[H, \eta^\alpha] = 0$; therefore, they form the
symmetry generators of the model (Yang and Zhang
1990). Combined with the standard SU(2) spin
rotational symmetry, the Hubbard model enjoys an
$SO(4) = SU(2) \otimes SU(2)/Z_2$ symmetry. This symmetry
has important consequences in the phase
diagram and the collective modes in the system. In 
particular, it implies that the SC and CDW orders 
are degenerate at half-filling. The SC and the CDW 
order parameters are defined by 


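$$\Delta^- = \sum_x c_\uparrow(x) c_\downarrow(x), \quad \Delta^+ = (\Delta^-)^\dagger, \quad \Delta^z = \frac{1}{2} \sum_{x,\sigma} (-)^x n_\sigma(x), \quad [\eta^\alpha, \Delta^\beta] = i \epsilon^{\alpha\beta\gamma} \Delta^\gamma \qquad [5]$$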


where $\Delta^\pm = \Delta^x \pm i\Delta^y$. The last equation of [5]
shows that the $\eta$ operators perform the rotation
between the SC and CDW order parameters. Thus,
$\eta^\alpha$ is the pseudospin generator and $\Delta^\alpha$ is the
pseudospin order parameter. Just like the total spin
and the Neel order parameter in the AF Heisenberg
model, they are canonically conjugate variables.
Since $[H, \eta^\alpha] = 0$ at $\mu = 0$, this exact pseudospin
symmetry implies the degeneracy of SC and CDW 
orders at half-filling. 

The phase diagram of the U < 0 Hubbard model 
is identical to the phase diagram of the AF 
Heisenberg model in a uniform magnetic field. If 
the AF order parameter originally points along the 
z-direction, a magnetic field applied along the 
z-direction causes the AF order parameter to flop 
into the xy-plane. This transition is called the spin- 
flop transition, and is depicted in Figures 2a and 2c. 
The chemical potential $\mu$ in the negative-U Hubbard
model plays a role similar to the magnetic field in
the AF Heisenberg model. It transforms a CDW
state at half-filling to an SC state away from
half-filling, as depicted in Figures 2b and 2c.

Figure 2 The spin-flop transition. (a) The spin-flop transition of the AF Heisenberg model. When a uniform magnetic field is applied
along the direction of the AF moments, there is no net gain of the Zeeman energy. Therefore, after a critical value of the magnetic field,
the AF spin component flops into the xy-plane, while a uniform spin component aligns in the direction of applied magnetic field. (b) The
Mott insulator to superfluid transition of the hardcore boson model or the U < 0 Hubbard model. At half-filling, one possible state is the
CDW state of ordered boson pairs. Upon doping, the pairs become mobile and form the superfluid state. (c) Both transitions can be
described by the spin or the pseudospin flop in the SO(3) nonlinear $\sigma$-model, induced either by the magnetic field or by the chemical
potential.

In the low-energy sector, both the AF Heisenberg
model in a magnetic field and the negative-U
Hubbard model with a chemical potential can be
described by the SO(3) nonlinear $\sigma$-model, which is
defined by the following Lagrangian density (in
imaginary time coordinates) for a unit vector field
$n_a$ with $n_a^2 = 1$:


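$$\mathcal{L} = \frac{\chi}{2} \omega_{ab}^2 + \frac{\rho}{2} (\partial_i n_a)^2 + V(n), \qquad \omega_{ab} = n_a (\partial_t n_b - i B_{bc} n_c) - (a \leftrightarrow b) \qquad [6]$$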


where the magnetic field, or equivalently the chemical
potential, is given by $B_a = \frac{1}{2} \epsilon_{abc} B_{bc}$. $\chi$ and $\rho$ are
the susceptibility and stiffness parameters, and V(n) is
the anisotropy potential, which can be taken as
$V(n) = -(g/2) n_z^2$. Exact SO(3) symmetry is obtained
when $g = B_a = 0$. g > 0 corresponds to easy axis
anisotropy, while g < 0 corresponds to easy plane
anisotropy. In the case of g > 0, there is a phase
transition as a function of $B_z$, with $B_x = B_y = 0$. To see
this, let us expand out the first term in [6]. The
time-independent part contributes to an effective potential

$$V_{\mathrm{eff}} = V(n) - \frac{B_z^2 \chi}{2} \left( n_x^2 + n_y^2 \right)$$

from which we see that there is a phase transition at
$B_{c1} = \sqrt{g/\chi}$. For $B < B_{c1}$, the system is in the Ising
phase, while for $B > B_{c1}$, the system is in the XY
phase. Therefore, tuning B for a fixed g > 0 leads to
the spin-flop transition. In D = 2, both the XY and
the Ising phase can have a finite-temperature phase 
transition into the disordered state. However, 
because of the Mermin-Wagner theorem, a finite- 
temperature phase transition is forbidden at the 
point B=g=0, where the system has an enhanced 
SO(3) symmetry. This SO(3) symmetric point leads 
to a large regime below the mean field transition 
temperature where the fluctuation dominates. This 
large fluctuation regime can be identified as the 
pseudogap behavior. 
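
The spin-flop argument above can be checked numerically. The following Python sketch minimizes the effective potential $-(g/2) n_z^2 - (\chi B^2/2)(n_x^2 + n_y^2)$ over the polar angle of a unit vector and reports which phase is selected on either side of the critical field $\sqrt{g/\chi}$. The parameter values, the function names, and the simple grid search are assumptions made for illustration only; they are not taken from the article.

```python
import numpy as np

def effective_potential(theta, B, g, chi):
    """Effective potential for a unit vector at polar angle theta from the z axis.

    Uses n_z^2 = cos^2(theta) and n_x^2 + n_y^2 = sin^2(theta).
    """
    return -0.5 * g * np.cos(theta) ** 2 - 0.5 * chi * B ** 2 * np.sin(theta) ** 2

def ground_state_phase(B, g, chi, n_grid=2001):
    """Return 'Ising' if the minimum points along z, 'XY' if it lies in the xy-plane."""
    thetas = np.linspace(0.0, np.pi / 2, n_grid)
    theta_min = thetas[np.argmin(effective_potential(thetas, B, g, chi))]
    return "Ising" if theta_min < np.pi / 4 else "XY"

g, chi = 1.0, 0.25            # illustrative values (assumed)
B_c = np.sqrt(g / chi)        # critical field quoted in the text

for B in (0.5 * B_c, 1.5 * B_c):
    print(f"B = {B:.2f}: {ground_state_phase(B, g, chi)} phase")
```

Because the potential is linear in $\sin^2\theta$, the minimum always sits at one of the two end points, which is why the transition in this classical picture is a sudden flop rather than a gradual canting.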

The pseudospin SU(2) symmetry of the negative-U 
Hubbard model has another important consequence. 
Away from half-filling, the $\eta$ operators no longer
commute with the Hamiltonian, but they are
eigenoperators of the Hamiltonian, in the sense that

$$[H, \eta^\pm] = \mp 2\mu \eta^\pm \qquad [7]$$

This means that the $\eta$ operators create well-defined
collective modes with energy $2\mu$. Since they carry
charge ±2, they usually do not couple to any
physical probes. However, in an SC state, the SC
order parameter mixes the $\eta$ operators with the
CDW operator $\Delta^z$, via eqn [5]. From this reasoning,
a pseudo-Goldstone mode was predicted to exist in
the density response function at wave vector $(\pi, \pi)$
and energy $2\mu$, which appears only below the SC
transition temperature $T_c$.


Unification of Antiferromagnetism and 
Superconductivity through the SO(5) 
Theory 


Order Parameters and SO(5) Group Properties 


The negative-U Hubbard model and the SO(3)
nonlinear $\sigma$-models discussed in the previous section
give a nice description of the quantum phase 
transition from the Mott insulating phase with 
CDW order to the SC phase. On the other hand, 
these simple models do not have enough complexity 
to describe the AF insulator at half-filling and the SC 
order away from half-filling. Therefore, a natural 
step is to generalize these models so that the Mott 
insulating phase with the scalar CDW order para- 
meter is replaced by a Mott insulating phase with 
the vector AF order parameter. The pseudospin 
SO(3) symmetry group considered previously arises 
from the combination of one real scalar component 
of the CDW order parameter with one complex, or 
two real components of the SC order parameter. 
After replacing the scalar CDW order parameter by 
the three components of the AF order parameter, 
and combining with the two components of the SC 
order parameters, we are naturally led to consider a 
five-component order parameter vector, and the 
SO(5) symmetry group which transforms it. 

It is simplest to define the concept of the SO(5) 
symmetry generator and order parameter on two 
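sites with fermion operators $c_\sigma$ and $d_\sigma$, respectively,
where $\sigma = 1, 2$ is the usual spin index. The AF order
parameter operator can be naturally defined in
terms of the difference between the spins of the c
and d fermions as follows:

$$N_\alpha = \frac{1}{2} \left( c^\dagger \tau_\alpha c - d^\dagger \tau_\alpha d \right), \qquad n_2 \equiv N_1, \quad n_3 \equiv N_2, \quad n_4 \equiv N_3 \qquad [8]$$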


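where $\tau_\alpha$ are the Pauli matrices. In view of the strong
on-site repulsion in the cuprate problem, the SC order
parameter should be naturally defined on a bond
connecting the c and d fermions, explicitly given by

$$\Delta^\dagger = -\frac{i}{2} c^\dagger \tau_y d^\dagger = \frac{1}{2} \left( -c_\uparrow^\dagger d_\downarrow^\dagger + c_\downarrow^\dagger d_\uparrow^\dagger \right), \qquad n_1 \equiv \frac{\Delta^\dagger + \Delta}{2}, \quad n_5 \equiv \frac{\Delta^\dagger - \Delta}{2i} \qquad [9]$$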


We can group these five components together to 
form a single vector $n_a = (n_1, n_2, n_3, n_4, n_5)$ called the
superspin, since it contains both superconducting 
and antiferromagnetic spin components. The indivi- 
dual components of the superspin are explicitly 
defined in the last parts of eqns [8] and [9]. 

The concept of the superspin is only useful if there 
is a natural symmetry group acting on it. In this 
case, since the order parameter is 5D, it is natural to 
consider the most general rotation in the 5D order
parameter space spanned by $n_a$. In 3D, three Euler
angles specify a general rotation. In higher
dimensions, a rotation is specified by selecting a plane and
the angle of rotation within this plane. Since there
are n(n − 1)/2 independent planes in n dimensions,
the group SO(n) is generated by n(n − 1)/2
elements, specified in general by antisymmetric
matrices $L_{ab} = -L_{ba}$, with a = 1, ..., n. In particular,
the SO(5) group has ten generators. The total spin 
and the total charge operator 


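$$S_\alpha = \frac{1}{2} \left( c^\dagger \tau_\alpha c + d^\dagger \tau_\alpha d \right), \qquad Q = \frac{1}{2} \left( c^\dagger c + d^\dagger d - 2 \right) \qquad [10]$$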


perform the function of rotating the AF and SC 
order parameters within each subspace. In addition, 
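there are six so-called $\pi$ operators, defined by

$$\pi_\alpha^\dagger = -\frac{1}{2} c^\dagger \tau_\alpha \tau_y d^\dagger, \qquad \pi_\alpha = (\pi_\alpha^\dagger)^\dagger \qquad [11]$$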

which perform the rotation from AF to SC and vice 
versa. These infinitesimal rotations are defined by 
the commutation relations 


$$[\pi_\alpha^\dagger, N_\beta] = i \delta_{\alpha\beta} \Delta^\dagger, \qquad [\pi_\alpha^\dagger, \Delta] = i N_\alpha \qquad [12]$$

The ten operators, the total spin $S_\alpha$, the total charge
Q, and the six $\pi$ operators form the ten generators
of the SO(5) group.

The superspin order parameter $n_a$, the associated
SO(5) generators $L_{ab}$, and their commutation relations
can be expressed compactly and elegantly in terms of
the SO(5) spinor and the five Dirac $\Gamma$ matrices. The
four-component SO(5) spinor is defined by


$$\Psi_\mu = \begin{pmatrix} c_\sigma \\ d_\sigma^\dagger \end{pmatrix} \qquad [13]$$


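They satisfy the usual anticommutation relations

$$\{\Psi_\mu^\dagger, \Psi_\nu\} = \delta_{\mu\nu}, \qquad \{\Psi_\mu, \Psi_\nu\} = \{\Psi_\mu^\dagger, \Psi_\nu^\dagger\} = 0 \qquad [14]$$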


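Using the $\Psi$ spinor and the five Dirac $\Gamma$ matrices, we
can express $n_a$ and $L_{ab}$ as

$$n_a = \frac{1}{2} \Psi_\mu^\dagger \Gamma^a_{\mu\nu} \Psi_\nu, \qquad L_{ab} = -\frac{1}{2} \Psi_\mu^\dagger \Gamma^{ab}_{\mu\nu} \Psi_\nu \qquad [15]$$

The $L_{ab}$ operators satisfy the commutation relation

$$[L_{ab}, L_{cd}] = -i \left( \delta_{ac} L_{bd} + \delta_{bd} L_{ac} - \delta_{ad} L_{bc} - \delta_{bc} L_{ad} \right) \qquad [16]$$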


The $n_a$ and the $\Psi_\mu$ operators form the vector and the
spinor representations of the SO(5) group, satisfying
the following equations:

$$[L_{ab}, n_c] = -i \left( \delta_{ac} n_b - \delta_{bc} n_a \right) \qquad [17]$$
and

$$[L_{ab}, \Psi_\mu] = -\frac{1}{2} \Gamma^{ab}_{\mu\nu} \Psi_\nu \qquad [18]$$
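
As a concrete check of the generator count and of the commutation relation [16], the Python sketch below builds the SO(n) generators in the n-dimensional vector representation and tests [16] for every index combination. The explicit matrices $L_{ab} = i(E_{ab} - E_{ba})$, where $E_{ab}$ is the matrix unit with a single 1 in row a and column b, are one possible convention chosen here so that the sign in [16] comes out as written; this choice and all function names are assumptions of the illustration, not definitions from the article.

```python
import itertools
import numpy as np

def so_n_generators(n):
    """Vector-representation generators L_ab = i (E_ab - E_ba) for a < b (0-based)."""
    gens = {}
    for a, b in itertools.combinations(range(n), 2):
        L = np.zeros((n, n), dtype=complex)
        L[a, b] = 1j
        L[b, a] = -1j
        gens[(a, b)] = L
    return gens

def get_L(gens, n, a, b):
    """Antisymmetric extension to all index pairs: L_ba = -L_ab and L_aa = 0."""
    if a == b:
        return np.zeros((n, n), dtype=complex)
    return gens[(a, b)] if a < b else -gens[(b, a)]

def satisfies_eq16(n):
    """Check [L_ab, L_cd] = -i (d_ac L_bd + d_bd L_ac - d_ad L_bc - d_bc L_ad)."""
    gens = so_n_generators(n)
    delta = lambda i, j: 1.0 if i == j else 0.0
    for a, b, c, d in itertools.product(range(n), repeat=4):
        Lab, Lcd = get_L(gens, n, a, b), get_L(gens, n, c, d)
        lhs = Lab @ Lcd - Lcd @ Lab
        rhs = -1j * (delta(a, c) * get_L(gens, n, b, d)
                     + delta(b, d) * get_L(gens, n, a, c)
                     - delta(a, d) * get_L(gens, n, b, c)
                     - delta(b, c) * get_L(gens, n, a, d))
        if not np.allclose(lhs, rhs):
            return False
    return True

print(len(so_n_generators(5)))   # number of independent generators for n = 5
print(satisfies_eq16(5))
```

The same functions work for any n, so the SO(3) case with its three generators can be checked in the same way.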


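If we arrange the ten operators $S_\alpha$, Q, and $\pi_\alpha$ into
$L_{ab}$’s by the following matrix form:

$$L_{ab} = \begin{pmatrix} 0 & & & & \\ \pi_x^\dagger + \pi_x & 0 & & & \\ \pi_y^\dagger + \pi_y & -S_z & 0 & & \\ \pi_z^\dagger + \pi_z & S_y & -S_x & 0 & \\ Q & \frac{1}{i}(\pi_x^\dagger - \pi_x) & \frac{1}{i}(\pi_y^\dagger - \pi_y) & \frac{1}{i}(\pi_z^\dagger - \pi_z) & 0 \end{pmatrix} \qquad [19]$$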


and group $n_a$ as in eqns [8] and [9], we see that eqns
[16] and [17] compactly reproduce all the commutation
relations worked out previously. These equations
show that $L_{ab}$ and $n_a$ are the symmetry generators
and the order parameter vectors of the SO(5) theory.

Having introduced the concept of local symmetry
generators and order parameters on sites in real
space, we now proceed to discuss definitions of
these operators in momentum space. The AF and
dSC order parameters can be naturally expressed in
terms of the microscopic fermion operators as


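$$N_\alpha = \sum_p c_{p+\Pi}^\dagger \tau_\alpha c_p, \qquad \Delta^\dagger = -\frac{i}{2} \sum_p d(p)\, c_p^\dagger \tau_y c_{-p}^\dagger, \qquad d(p) \equiv \cos p_x - \cos p_y \qquad [20]$$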


where $\Pi = (\pi, \pi)$ and d(p) is the form factor for
d-wave pairing in 2D. They can be combined into
the five-component superspin vector $n_a$ by using the
same convention as before. The total spin and total
charge operators are defined microscopically as

$$S_\alpha = \sum_p c_p^\dagger \tau_\alpha c_p, \qquad Q = \frac{1}{2} \sum_p \left( c_p^\dagger c_p - 1 \right) \qquad [21]$$

and the $\pi$ operators can be defined as

$$\pi_\alpha^\dagger = \sum_p g(p)\, c_{p+\Pi}^\dagger \tau_\alpha \tau_y c_{-p}^\dagger \qquad [22]$$

The form factor g(p) needs to be chosen appropriately
to satisfy the SO(5) commutation relation
[16], and this requirement determines g(p) =
sgn(d(p)).

The SO(5) symmetry generators perform the most
general rotation among the five order parameters.
It is easy to see that the quantum numbers of the
$\pi$ operators exactly patch up the difference in
quantum numbers between the AF and the dSC
order parameters, according to Table 1.

Table 1 Quantum numbers of the AF and the dSC order
parameters, and the $\pi$ operator, which rotates the AF and the
dSC order parameters into each other

                                          Charge   Spin   Momentum       Internal angular momentum
$\Delta, \Delta^\dagger$ or $n_1, n_5$    ±2       0      0              d-wave
$N_\alpha$ or $n_{2,3,4}$                 0        1      $(\pi, \pi)$   s-wave
$\pi_\alpha, \pi_\alpha^\dagger$          ±2       1      $(\pi, \pi)$   d-wave
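
To make the d-wave form factor concrete, the following Python sketch evaluates d(p) = cos p_x − cos p_y and the sign factor g(p) = sgn(d(p)) at a few momenta. It illustrates the two properties that identify d-wave symmetry: the form factor vanishes along the zone diagonals |p_x| = |p_y|, and it changes sign under a 90° rotation of the momentum. The function names and the sample momenta are assumptions made for the illustration.

```python
import numpy as np

def d_wave(px, py):
    """d-wave form factor d(p) = cos(p_x) - cos(p_y)."""
    return np.cos(px) - np.cos(py)

def g_factor(px, py):
    """Sign factor g(p) = sgn(d(p)) used in the pi operators."""
    return np.sign(d_wave(px, py))

# Antinodal points: the form factor has maximal magnitude and opposite signs.
print(d_wave(np.pi, 0.0), d_wave(0.0, np.pi))

# Nodal direction: the form factor vanishes on the diagonal p_x = p_y.
print(d_wave(0.7, 0.7))

# A 90-degree rotation (p_x, p_y) -> (-p_y, p_x) flips the sign.
px, py = 0.3, 0.9
print(d_wave(px, py), d_wave(-py, px))
```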


The SO(5) Quantum Nonlinear $\sigma$-Model


In the previous section we presented the concept of 
the local SO(5) order parameters and symmetry 
generators. These relationships are purely kinematic, 
and do not refer to any particular Hamiltonian. One 
can in fact construct microscopic models with exact 
SO(5) symmetry out of these operators. A large class 
of models, however, may not have SO(5) symmetry 
at the microscopic level, but their long-distance, 
low-energy properties may be described in terms of 
an effective SO(5) model. In the previous section, we 
have seen that many different microscopic models 
indeed all have the SO(3) nonlinear $\sigma$-model as their
universal low-energy description. Similarly, we
present the SO(5) quantum nonlinear $\sigma$-model as a
general theory of AF and dSC in the HTSC. 

From eqn [17] and the discussions in the previous 
subsection, we see that $L_{ab}$ and $n_a$ are conjugate
degrees of freedom, very much similar to $[q, p] = i\hbar$
in quantum mechanics. This suggests that we can
construct a Hamiltonian from these conjugate
degrees of freedom. The Hamiltonian of the SO(5)
quantum nonlinear $\sigma$-model takes the following
form:


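$$H = \frac{1}{2\chi} \sum_{x,\, a<b} L_{ab}^2(x) + \frac{\rho}{2} \sum_{\langle x,x' \rangle,\, a} n_a(x) n_a(x') + \sum_{x,\, a<b} B_{ab}(x) L_{ab}(x) + \sum_x V(n(x)) \qquad [23]$$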


where the na vector field is subjected to the 
constraint 


$$n_a^2 = 1 \qquad [24]$$


This Hamiltonian is quantized by the canonical 
commutation relations [16] and [17]. Here, the first 
term is the kinetic energy of the SO(5) rotors, where
$\chi$ has the physical interpretation of the moment of
inertia of the SO(5) rotors. The second term
describes the coupling of the SO(5) rotors on
different sites, through the generalized stiffness $\rho$.
The third term introduces the coupling of external
fields to the symmetry generators, while the V(n)
can include anisotropic terms to break the SO(5)
symmetry to the SO(3) × U(1) symmetry. The SO(5)
quantum nonlinear $\sigma$-model is a natural
combination of the SO(3) nonlinear $\sigma$-model describing the
AF Heisenberg model and the quantum XY model
describing the SC to insulator transition. If we
restrict to the values a = 2, 3, 4, then the first two
terms describe the symmetric Heisenberg model, the 
third term describes easy plane or easy axis 
anisotropy of the Neel vector, while the last term 
represents the coupling to the uniform external 
magnetic field. On the other hand, for a=1,5, the 
first term describes Coulomb or capacitance energy, 
the second term is the Josephson coupling energy, 
while the last term describes coupling to external 
chemical potential. 

The first two terms of the SO(5) model describe 
the competition between the quantum disorder and 
classical order. In the ordered state, the last two 
terms describe the competition between the AF and 
the SC order. Let us first consider the quantum 
competition. The first term prefers sharp eigenstates 
of the angular momentum. At an isolated site,
$C = \sum_{a<b} L_{ab}^2$ is the Casimir operator of the SO(5) group, in
the sense that it commutes with all the SO(5)
generators. The eigenvalues of this operator can be
determined completely from group theory; they are
0, 4, 6, and 10, respectively, for the 1D SO(5)
singlet, 5D SO(5) vector, 10D antisymmetric 
tensor, and 14D symmetric, traceless tensors. There- 
fore, we see that this term always prefers a 
quantum-disordered SO(5) singlet ground state, 
which is a total spin singlet. This ground state is 
separated from the first excited state, the fivefold
SO(5) vector state, by an energy gap of $2/\chi$. This
gap will be reduced when the different SO(5) rotors
are coupled to each other by the second term. This
term represents the effect of stiffness, which prefers
a fixed direction of the $n_a$ vector, rather than a fixed
angular momentum. This competition is an
appropriate generalization of the competition between the
number sharp and phase sharp states in a
superconductor and the competition between the classical
Neel state and the bond or plaquette singlet state in
the Heisenberg AF. The quantum phase transition
occurs near $\chi\rho \simeq 1$.
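
The Casimir eigenvalues quoted above can be reproduced with a short numerical experiment. The Python sketch below builds the ten SO(5) generators in the vector representation, forms $C = \sum_{a<b} L_{ab}^2$, and then repeats the calculation on the product of two vectors, which decomposes into the singlet, the 10D antisymmetric tensor, and the 14D symmetric traceless tensor. The explicit matrices $L_{ab} = i(E_{ab} - E_{ba})$ and all function names are assumptions of this illustration rather than constructions from the article.

```python
import itertools
from collections import Counter
import numpy as np

def vector_generators(n=5):
    """Hermitian generators L_ab = i (E_ab - E_ba), a < b, in the vector representation."""
    gens = []
    for a, b in itertools.combinations(range(n), 2):
        L = np.zeros((n, n), dtype=complex)
        L[a, b], L[b, a] = 1j, -1j
        gens.append(L)
    return gens

def casimir_spectrum(gens):
    """Eigenvalues of C = sum_{a<b} L_ab^2, rounded to integers, with multiplicities."""
    C = sum(L @ L for L in gens)
    return Counter(int(round(v)) for v in np.linalg.eigvalsh(C))

single = vector_generators(5)
identity = np.eye(5)
# Generators acting on a pair of vectors: L (x) 1 + 1 (x) L.
pair = [np.kron(L, identity) + np.kron(identity, L) for L in single]

print(casimir_spectrum(single))   # vector representation
print(casimir_spectrum(pair))     # singlet + antisymmetric + symmetric traceless
```

With the kinetic term $(1/2\chi) C$, the eigenvalue 4 of the vector multiplet corresponds to the gap $4/(2\chi) = 2/\chi$ above the singlet.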

In the classically ordered state, the last two 
anisotropy terms compete to select a ground state. 
To simplify the discussion, we can first consider the 
following simple form of the static anisotropy 
potential: 


$$V(n) = -g \left( n_2^2 + n_3^2 + n_4^2 \right) \qquad [25]$$


At the particle-hole symmetric point with vanishing
chemical potential $B_{15} = \mu = 0$, the AF ground state
is selected by g > 0, while the SC ground state is
selected by g < 0 coupled with the constraint $n_a^2 = 1$.
g = 0 is the quantum phase transition point
separating the two ordered phases.

However, it is unlikely that the HTSC cuprates 
can be close to this quantum phase transition point. 
In fact, we expect the anisotropy term g to be large 
and positive, so that the AF phase is strongly 
favored over the SC phase at half-filling. However, 
the chemical potential term has the opposite, 
competing effect favoring SC. To see this, we 
transform the Hamiltonian into the Lagrangian 
density (in imaginary time coordinates) in the 
continuum limit: 


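$$\mathcal{L} = \frac{\chi}{2} \omega_{ab}(x,t)^2 + \frac{\rho}{2} \left( \partial_k n_a(x,t) \right)^2 + V(n(x,t)) \qquad [26]$$

where

$$\omega_{ab} = n_a (\partial_t n_b - i B_{bc} n_c) - (a \leftrightarrow b) \qquad [27]$$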


is the angular velocity. We see that the chemical
potential enters the Lagrangian as a gauge coupling
in the time direction. Expanding the time derivative
term, we obtain an effective potential

$$V_{\mathrm{eff}}(n) = V(n) - \frac{(2\mu)^2 \chi}{2} \left( n_1^2 + n_5^2 \right) \qquad [28]$$

from which we see that the V term competes with
the chemical potential term. For $\mu < \mu_c = \sqrt{g/\chi}$, the
AF ground state is selected, while for $\mu > \mu_c$, the SC
ground state is realized. At the transition point, even
though each term strongly breaks SO(5) symmetry,
the combined term gives an effective static potential
which is SO(5) symmetric, as we can see from [28].

Even though the static potential is SO(5) symmetric,
the full quantum dynamics is not. This can be most
easily seen from the time-dependent term in the
Lagrangian. When we expand out the square, the
term quadratic in $\mu$ enters the effective static
potential in eqn [28]. However, there is also a
time-dependent term linear in $\mu$. This term breaks
the particle-hole symmetry, and it dominates over
the second-order time derivative term in the $n_1$ and
$n_5$ variables. In the absence of an external magnetic
field, only second-order time derivative terms of
$n_{2,3,4}$ enter the Lagrangian. Therefore, while the
chemical potential term compensates the anisotropy
potential in eqn [28] to arrive at an SO(5) symmetric
static potential, its time-dependent part breaks the
full quantum SO(5) symmetry. This observation
leads to the concept of the projected, or static
SO(5) symmetry (Zhang et al. 1999). A model with
projected or static SO(5) symmetry is described by a
quantum effective Lagrangian of the form


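$$\mathcal{L} = \frac{\chi}{2} \sum_{a=2,3,4} (\partial_t n_a)^2 - \chi\mu \left( n_1 \partial_t n_5 - n_5 \partial_t n_1 \right) - V_{\mathrm{eff}}(n) \qquad [29]$$

where the static potential $V_{\mathrm{eff}}$ is SO(5) symmetric,
but the time-dependent part contains a first-order
time derivative term in $n_1$ and $n_5$.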

The SO(5) quantum nonlinear $\sigma$-model is
constructed from two canonically conjugate field
operators $L_{ab}$ and $n_a$. In fact, there is a kinematic
constraint among these field operators:

$$L_{ab} n_c + L_{bc} n_a + L_{ca} n_b = 0 \qquad [30]$$

This identity is valid for any triples a, b, and c, and
can be easily proved by expressing $L_{ab} = n_a p_b - n_b p_a$,
where $p_a$ is the conjugate momentum of $n_a$.
Geometrically, this identity expresses the fact that
$L_{ab}$ generates a rotation of the $n_a$ vector. The
infinitesimal rotation vector lies on the tangent
plane of the four-sphere $S^4$, and is therefore
orthogonal to the $n_a$ vector itself.

In a large class of materials, including the high-$T_c$
cuprates, the organic superconductors, and the heavy
fermion compounds, the AF and SC phases occur in
close proximity to each other. The SO(5) theory is
developed based on the assumption that these two
phases share a common microscopic origin and should
be treated on an equal footing. The SO(5) theory gives
a coherent description of the rich global phase diagram
of the high-$T_c$ cuprates and its low-energy dynamics
through a simple symmetry principle and a unified
effective model based on a single quantum
Hamiltonian. A number of theoretical predictions, including
the intensity dependence of the neutron resonance
mode, the AF vortex state, and the mixed phase of AF
and SC, have been verified experimentally (Figure 3).
The theory also sheds light on the microscopic
mechanism of superconductivity and quantitatively
correlates the AF exchange energy with the
condensation energy of superconductivity. However, the theory
is still incomplete in many ways and lacks full
quantitative predictive power. While the role of
fermions is well understood within the exact SO(5)
models, their roles in the effective SO(5) models are
still not fully worked out. As a result, the theory has
not made many predictions concerning the transport
properties of these materials.

Figure 3 The finite-temperature phase diagram of the SO(5) model in the temperature (T) versus chemical potential ($\mu$) plane. (a) and
(b) are two different representations of the same phase diagram, corresponding to a direct first-order phase transition between AF and
SC, as a function of the chemical potential and doping, respectively. (c) corresponds to two second-order phase transitions with a uniform
AF/SC mixed phase in between. The AF and the SC transition temperatures $T_N$ and $T_c$ merge into a bicritical point $T_{bc}$ or a tetracritical point $T_{tc}$.
Both possibilities are allowed theoretically; it is up to experiments to determine which one is actually realized in the high-$T_c$ cuprates.


See also: Abelian Higgs Vortices; Effective Field 
Theories; Euclidean Field Theory; Ginzburg—Landau 
Equation; Hubbard Model; Quantum Phase Transitions; 
Quantum Spin Systems; Quantum Statistical Mechanics: 
Introduction
Subject 


Holomorphic dynamics (in a narrow sense) is a
theory of iterates of rational endomorphisms f of the
Riemann sphere $\hat{\mathbb{C}} = \mathbb{C} \cup \{\infty\}$. The goal is to
understand the phase portrait of this dynamical system,
that is, the structure of its trajectories, and the
dependence of the phase portrait on parameters
(coefficients of f).

Holomorphic dynamics in a broader sense would
include the theory of analytic transformations, local 
and global, in dimension 1 and higher, as well as the 
theory of groups and pseudogroups of analytic 
transformations, which would cover theory of 
Kleinian groups and holomorphic foliations. How- 
ever, we will mostly focus on holomorphic dynamics 
in the narrow sense. 


Brief History 


Local dynamical theory of analytic maps was laid
down in the late nineteenth and early twentieth
century by Königs, Schröder, Böttcher, and Leau.
Global theory of iterates of rational maps was
founded by Fatou and Julia in comprehensive
memoirs of 1918–19. The theory had been
developed very little since then until the early 1980s,
when it exploded with new methods, ideas, and
computer images. Particularly influential were the
works of D Sullivan, who introduced ideas of
quasiconformal deformations into the field, of
A Douady and J Hubbard, who gave a comprehensive
combinatorial description of the Mandelbrot set,
and of W Thurston, who linked holomorphic
dynamics to three-dimensional hyperbolic geometry,
bringing to the field ideas of geometrization and
rigidity. As a result, profound rigidity conjectures
were formulated. Renormalization ideas introduced to
the theory later on led to significant progress
towards these conjectures (see Universality and
Renormalization). 

Another source of ideas came from ergodic theory 
and the general theory of dynamical systems, 
particularly hyperbolic dynamics and thermodyna- 
mical formalism. They led to constructions of 
natural geometric measures on the Julia sets that 
helped to penetrate into their fractal nature. 


General Terminology and Notations 


$\mathbb{N} = \{1, 2, \ldots\}$ is the set of natural numbers; $\mathbb{D}$ is the
unit disk; $\mathbb{Z}_+ = \mathbb{N} \cup \{0\}$; $\mathbb{T} = \partial\mathbb{D}$.

A topological disk is a simply connected domain
in $\mathbb{C}$. A topological annulus is a doubly connected
domain in $\mathbb{C}$ (i.e., a domain homeomorphic to a
round annulus). A Cantor set is a totally
disconnected compact subset of $\mathbb{R}^n$ without isolated
points.

Given a map $f : X \to X$, $f^n$ will stand for its n-fold
iterate. The semigroup of iterates forms a dynamical
system with discrete time. An orbit or trajectory of a
point z is $\mathrm{orb}_f(z) = (f^n z)_{n=0}^{\infty}$.

A subset $Y \subset \hat{\mathbb{C}}$ is called invariant if $f(Y) \subset Y$ and
completely invariant if also $f^{-1}(Y) \subset Y$.

A point $a \in \hat{\mathbb{C}}$ is called periodic if $f^p a = a$ for
some natural p. The smallest such p is called the
period of a. If p = 1, then a is called a fixed
point. The orbit of a periodic point is also called a
cycle.
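
The definitions of orbit, periodic point, and period translate directly into code. The Python sketch below is a minimal illustration; the helper names `orbit` and `period`, the tolerance, the search bound, and the example map z → z² − 1 are assumptions chosen for the illustration.

```python
def orbit(f, z, n):
    """Return the finite orbit segment [z, f(z), ..., f^n(z)]."""
    points = [z]
    for _ in range(n):
        z = f(z)
        points.append(z)
    return points

def period(f, a, max_period=64, tol=1e-12):
    """Smallest p <= max_period with |f^p(a) - a| <= tol, or None if there is none."""
    z = a
    for p in range(1, max_period + 1):
        z = f(z)
        if abs(z - a) <= tol:
            return p
    return None

f = lambda z: z * z - 1           # example map (assumed for illustration)

print(orbit(f, 0, 4))             # the points 0 and -1 form a cycle
print(period(f, 0))               # period of the point 0
print(period(f, (1 + 5 ** 0.5) / 2))   # a fixed point of f
```

In floating-point arithmetic a periodic point can only be recognized up to a tolerance, so `period` is a numerical test rather than a proof of periodicity.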

Two maps $f : X \to X$ and $g : Y \to Y$ on topological
spaces X and Y are called topologically conjugate if
there exists a homeomorphism $h : X \to Y$ such that
$h \circ f = g \circ h$. If h has better regularity properties, for
example, it is quasiconformal/conformal/affine, then 
f and g are called quasiconformally/conformally/ 
affinely conjugate. 

Let $f(z) = P(z)/Q(z)$ be a rational function (in lowest terms) viewed
as a map $\hat{\mathbb{C}} \to \hat{\mathbb{C}}$ of the Riemann sphere. Its topological degree
$\deg f = \# f^{-1}(z)$, $z \in \hat{\mathbb{C}}$ (where the preimages of $z$ are
counted with multiplicity), is equal to the algebraic
degree $\max(\deg P, \deg Q)$. The dynamics of $f$ is very
simple in degree 1, so in what follows we assume
that $\deg f \ge 2$.

Let $C_f = \{c : Df(c) = 0\}$ stand for the set of critical
points of $f$, and let $V_f = f(C_f)$ be the set of critical
values. A rational function of degree $d$ has $2d - 2$
critical points counted with multiplicity. Moreover,


$$C_{f^n} = \bigcup_{k=0}^{n-1} f^{-k}(C_f), \qquad V_{f^n} = \bigcup_{k=1}^{n} f^{k}(C_f)$$


The latter formula explains why the behavior of the
critical orbits crucially influences the global dynamics
of $f$. The set $\mathcal{O}_f = \bigcup_n V_{f^n}$ is called postcritical.
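
The two formulas above follow from the chain rule. The following short derivation is a sketch in the notation $C_f$ (critical points) and $V_f = f(C_f)$ (critical values); it uses only that $f$ maps the sphere onto itself.

```latex
% Chain rule for the n-fold iterate:
Df^n(z) = \prod_{k=0}^{n-1} Df(f^k z).
% Hence z is critical for f^n iff f^k z \in C_f for some 0 \le k \le n-1:
C_{f^n} = \bigcup_{k=0}^{n-1} f^{-k}(C_f).
% Apply f^n and use f^k(f^{-k}(C_f)) = C_f (surjectivity of f^k):
V_{f^n} = f^n(C_{f^n}) = \bigcup_{k=0}^{n-1} f^{n-k}(C_f) = \bigcup_{j=1}^{n} f^{j}(C_f).
```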


Basic Dynamical Theory 
Local Theory 


The local theory describes the dynamics of an
analytic map $f : z \mapsto \lambda z + \sum_{n \ge 2} a_n z^n$ near its fixed
point 0. The derivative $\lambda = f'(0)$ is called the multiplier
of 0. The fixed point is called attracting,
repelling, or neutral, depending on whether $|\lambda| < 1$,
$|\lambda| > 1$, or $|\lambda| = 1$. It is called superattracting if
$\lambda = 0$.
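
As a small illustration of this classification, the sketch below sorts a fixed point into the four classes from its multiplier. The function names, the tolerance `tol`, and the choice of the quadratic family $z \mapsto z^2 + c$ (whose fixed points solve $z^2 - z + c = 0$ and have multiplier $2z$) are assumptions made for the example, not part of the text.

```python
import cmath


def classify_fixed_point(multiplier: complex, tol: float = 1e-12) -> str:
    """Classify a fixed point by the modulus of its multiplier.

    `tol` is a hypothetical numerical tolerance: in floating point the
    neutral case |multiplier| = 1 can only be detected approximately.
    """
    modulus = abs(multiplier)
    if modulus <= tol:
        return "superattracting"
    if modulus < 1 - tol:
        return "attracting"
    if modulus > 1 + tol:
        return "repelling"
    return "neutral"


def fixed_points_of_quadratic(c: complex):
    """Return the two fixed points of z -> z^2 + c (roots of z^2 - z + c)."""
    root = cmath.sqrt(1 - 4 * c)
    return (1 + root) / 2, (1 - root) / 2


if __name__ == "__main__":
    for c in (0, 0.25, -1):
        for z in fixed_points_of_quadratic(c):
            # The derivative of z^2 + c at z is 2z.
            print(c, z, classify_fixed_point(2 * z))
```

For $c = 1/4$ the two fixed points merge into a single neutral one with multiplier 1, which is the parabolic case described further on.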

When 0 is an attracting (but not superattracting)
or repelling fixed point, the map is linearizable,
that is, it is conformally conjugate to its linear part
$z \mapsto \lambda z$; thus, there is a local conformal solution of the
Schröder equation $\phi(fz) = \lambda \phi(z)$. This solution is also
called the linearizing coordinate near 0.

In the superattracting case, the map is conformally
conjugate to the map $z \mapsto z^d$, where $a z^d$ is the
first nonvanishing term in the local expansion of $f$.
Thus, in this case there is a local conformal solution
of the Böttcher equation $\phi(fz) = \phi(z)^d$. It is also
called the Böttcher coordinate near 0.

The situation in the neutral case (when
$\lambda = e^{2\pi i\theta}$, $\theta \in \mathbb{R}/\mathbb{Z}$) depends in a delicate way on
the arithmetic properties of the rotation number $\theta$. If
$\theta = q/p$ is rational, the fixed point 0 is called
parabolic. The local dynamics is then described in
terms of the Leau-Fatou flower consisting of
attracting petals alternating with repelling petals.
In each petal, the map is conformally conjugate to
the translation $z \mapsto z + 1$. The quotients of the petals
by the dynamics are conformally equivalent to the
cylinder $\mathbb{C}/\langle z \mapsto z + 1 \rangle$. They are called (attracting/
repelling) Écalle-Voronin cylinders.

In the irrational case, when $\theta \in \mathbb{R} \setminus \mathbb{Q}$, the map can
be either linearizable or not. Accordingly, 0 is called a
Siegel or a Cremer fixed point. If the rotation number is
Diophantine (i.e., there exist $C > 0$ and $a > 2$ such
that for all rational numbers $q/p$ we have
$|\theta - q/p| > C p^{-a}$), then 0 is linearizable (Siegel 1942).
Notice that almost all numbers are Diophantine.
A sharper arithmetic condition for linearizability, in
terms of the continued fraction expansion of $\theta$, was
given by Bruno (1965). In the quadratic case,
$z \mapsto e^{2\pi i\theta} z + z^2$, this condition was proved to be sharp
(Yoccoz 1988).


Fatou and Julia Sets 


From now on, $f : \hat{\mathbb{C}} \to \hat{\mathbb{C}}$ is a rational endomorphism
of the Riemann sphere. The theory starts with the
splitting of the sphere into two subsets, now called the
Fatou and Julia sets, based on the notion of a normal
family in the sense of Montel. A family $(\phi_\alpha : S \to \hat{\mathbb{C}})$
of meromorphic functions on some Riemann surface
$S$ is called normal if it is precompact in the
compact-open topology. The Fatou set $F(f)$ is the maximal
open subset of $\hat{\mathbb{C}}$ on which the family of iterates
$(f^n)_{n=0}^{\infty}$ is normal. The Julia set $J(f)$ is the complement
of the Fatou set. Both sets are completely
invariant. The Julia set is always nonempty, and is
either nowhere dense or coincides with the whole
sphere. The trajectories on the Fatou set are
Lyapunov stable (if $z$ is close to $z_0 \in F(f)$, then
$\mathrm{orb}_f(z)$ is uniformly close to $\mathrm{orb}_f(z_0)$), while the
dynamics on the Julia set is “chaotic.”

If $f$ is a polynomial, then the Fatou and Julia sets
can be defined in a more concrete way as follows. In
this case, $\infty$ is a superattracting fixed point for $f$. Let
us consider its basin of attraction,


$$D_f(\infty) = \{z : f^n z \to \infty \text{ as } n \to \infty\}$$


Its complement, $K(f)$, is called the filled Julia set.
Then,


$$J(f) = \partial K(f) = \partial D_f(\infty)$$
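
The definition of the filled Julia set as the complement of the basin of $\infty$ suggests a simple numerical test. The sketch below is written for the quadratic polynomial $z \mapsto z^2 + c$ only; the function name, the iteration cap `max_iter`, and the escape radius $\max(2, |c|)$ are choices made for this example. The radius is justified by the elementary estimate $|z^2 + c| \ge |z|(|z| - 1) > |z|$ whenever $|z| > \max(2, |c|)$.

```python
def in_filled_julia(z: complex, c: complex, max_iter: int = 200) -> bool:
    """Approximate membership test for K(f), f(z) = z^2 + c.

    Returns False as soon as the orbit leaves the disk of radius
    max(2, |c|), outside of which every orbit tends to infinity.
    Returns True if the orbit stays inside for `max_iter` steps, so a
    True answer is only evidence of membership, not a proof.
    """
    radius = max(2.0, abs(c))
    for _ in range(max_iter):
        if abs(z) > radius:
            return False
        z = z * z + c
    return True


if __name__ == "__main__":
    print(in_filled_julia(0.5, 0))   # inside the unit disk for z^2
    print(in_filled_julia(1.5, 0))   # in the basin of infinity
    print(in_filled_julia(0, -1))    # critical point of the basilica
```

Points reported as lying in $K(f)$ that sit very close to the Julia set may still escape after more iterations, so the cap trades accuracy for speed.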


Periodic Points 


Let $\alpha$ be a periodic point of $f$ of period $p$. As a fixed
point of $f^p$, it is subject to the local theory. Thus, it
(and its cycle) is classified as attracting, repelling,
and so on, according to the properties of the multiplier
$\lambda = (f^p)'(\alpha)$ (which can be calculated in any local chart
near $\alpha$).

The basin of attraction $D_f(\boldsymbol{\alpha})$ of an attracting
cycle $\boldsymbol{\alpha} = (f^n \alpha)_{n=0}^{p-1}$ is the set $\{z : f^n z \to \boldsymbol{\alpha} \text{ as } n \to \infty\}$.
The immediate basin of attraction $D_f^{*}(\boldsymbol{\alpha})$ is the
union of the components of $D_f(\boldsymbol{\alpha})$ containing the points
of $\boldsymbol{\alpha}$.


Theorem 1 (Fatou—Julia). The immediate basin of 
any attracting cycle contains a critical point. (Note 
that a superattracting cycle actually contains some 
critical point.) 


It follows that a rational function of degree d has 
at most 2d —2 attracting cycles. A polynomial of 
degree d has at most d — 1 attracting cycles in C. 
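
Since every attracting cycle attracts a critical point, such cycles can be located numerically by following critical orbits. The sketch below does this for the quadratic family $z \mapsto z^2 + c$, whose only finite critical point is 0. The function name and the parameters `burn_in`, `max_period`, and `tol` are hypothetical, and the procedure is a heuristic rather than a rigorous algorithm: it can fail for cycles that attract very slowly.

```python
def attracting_cycle(c: complex, burn_in: int = 2000,
                     max_period: int = 64, tol: float = 1e-9):
    """Search for an attracting cycle of z -> z^2 + c.

    Follows the critical orbit of 0. Returns (period, multiplier) if the
    orbit settles on a cycle whose multiplier has modulus < 1, and None
    if the orbit escapes or no such cycle is detected.
    """
    z = 0j
    for _ in range(burn_in):
        z = z * z + c
        if abs(z) > 2:
            return None  # the critical orbit escapes to infinity
    start = z
    multiplier = 1 + 0j
    for period in range(1, max_period + 1):
        multiplier *= 2 * z          # chain rule: product of f'(z_k) = 2 z_k
        z = z * z + c
        if abs(z - start) < tol:     # first return: smallest period found
            return (period, multiplier) if abs(multiplier) < 1 else None
    return None


if __name__ == "__main__":
    print(attracting_cycle(-1))              # superattracting 2-cycle
    print(attracting_cycle(-0.12 + 0.74j))   # a parameter near the rabbit
    print(attracting_cycle(1))               # critical orbit escapes
```

Because a quadratic polynomial has a single critical point in $\mathbb{C}$, at most one attracting cycle can be found this way, in agreement with the bound $d - 1$.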

Attracting cycles belong to the Fatou set, while
repelling cycles lie on the Julia set. Parabolic and
Cremer points lie on the Julia set, while Siegel points
belong to the Fatou set. The basin of attraction of a
parabolic cycle $\boldsymbol{\alpha}$ is defined as


$$D_f(\boldsymbol{\alpha}) = \{z : f^n z \to \boldsymbol{\alpha} \text{ as } n \to \infty\} \setminus \bigcup_{n=0}^{\infty} f^{-n}(\boldsymbol{\alpha})$$


It is the union of some components of the Fatou set.
The union of the components of $D_f(\boldsymbol{\alpha})$ containing
the petals of the Leau-Fatou flower is called the
immediate basin of attraction $D_f^{*}(\boldsymbol{\alpha})$ of $\boldsymbol{\alpha}$. As in the
attracting case, the immediate basin $D_f^{*}(\boldsymbol{\alpha})$ of a
parabolic cycle contains a critical point of $f$.

Components of the Fatou set containing Siegel
periodic points are called Siegel disks. If $D$ is a Siegel
disk of period $p$, then $f^p|D$ is conformally conjugate
to the irrational rotation $z \mapsto e^{2\pi i\theta} z$ of the unit disk.


Theorem 2 (Shishikura 1987). A rational function 
of degree d has at most 2d — 2 nonrepelling cycles. 


The proof of this result uses the methods of 
quasiconformal surgery. 


Examples 


For $f : z \mapsto z^d$, $d \ge 2$, the Julia set $J(f)$ is the unit circle.
Moreover, $D_f(\infty) = \hat{\mathbb{C}} \setminus \bar{\mathbb{D}}$, while $\mathbb{D}$ is the basin of
attraction of the superattracting fixed point 0.

For maps $f_\varepsilon : z \mapsto z^2 + \varepsilon$ with sufficiently small
$\varepsilon \ne 0$, the Julia set $J(f_\varepsilon)$ is a nowhere-differentiable
Jordan curve (see Figure 1). The domain bounded
by this curve is the basin of attraction of an
attracting fixed point $\alpha_\varepsilon$.

The filled Julia set of the map $f : z \mapsto z^2 - 1$, called
the basilica, is depicted in black in Figure 2. The
interior of the basilica is the basin of the superattracting
cycle $\boldsymbol{\alpha} = (0, -1)$ of period 2.


Figure 1 Nowhere-differentiable Jordan curve.


Figure 2 Basilica.

For the map $f : z \mapsto z^2 - 2$, the Julia set is the
interval $[-2, 2]$. This map is affinely conjugate to the Chebyshev
quadratic polynomial $\mathrm{Ch}_2 : z \mapsto 2z^2 - 1$. More
generally, for a Chebyshev polynomial $\mathrm{Ch}_d$ of any
degree $d$, the Julia set is the interval $[-1, 1]$. (By definition, the
Chebyshev polynomial $\mathrm{Ch}_d$ is the solution of the
functional equation $\cos dz = \mathrm{Ch}_d(\cos z)$.)
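
Both statements in this example are easy to check numerically. The sketch below evaluates Chebyshev polynomials by the standard three-term recurrence $\mathrm{Ch}_{k+1}(x) = 2x\,\mathrm{Ch}_k(x) - \mathrm{Ch}_{k-1}(x)$, and tests the affine conjugacy using the map $h(z) = z/2$. Both the recurrence and this choice of $h$ are assumptions of the example: they are not stated in the text, though $h \circ f = \mathrm{Ch}_2 \circ h$ for $f(z) = z^2 - 2$ follows by direct substitution.

```python
import math


def chebyshev(d: int, x):
    """Evaluate Ch_d(x) by the recurrence Ch_0 = 1, Ch_1 = x,
    Ch_{k+1} = 2 x Ch_k - Ch_{k-1}. Works for real or complex x."""
    previous, current = 1.0, x
    if d == 0:
        return previous
    for _ in range(d - 1):
        previous, current = current, 2 * x * current - previous
    return current


def conjugacy_defect(z) -> float:
    """Return |h(f(z)) - Ch_2(h(z))| for f(z) = z^2 - 2, h(z) = z/2."""
    return abs((z * z - 2) / 2 - chebyshev(2, z / 2))


if __name__ == "__main__":
    t = 0.7
    # Functional equation cos(d t) = Ch_d(cos t), here with d = 3.
    print(chebyshev(3, math.cos(t)), math.cos(3 * t))
    print(conjugacy_defect(1.3 - 0.4j))
```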

For quadratic maps $f_c : z \mapsto z^2 + c$ with $c < -2$, the
Julia set is a Cantor set on $\mathbb{R}$. For maps $f_c$ with
$c > 1/4$, the Julia set is a Cantor set that does not
meet $\mathbb{R}$. For $c \in (-2, 1/4]$, the Julia set contains an
invariant interval on $\mathbb{R}$, but is not contained in $\mathbb{R}$.

For $f : z \mapsto z^2 + i$, the Julia set is a “dendrite” (see
Figure 3).

For $c \approx -0.12 + 0.74i$, the map $f : z \mapsto z^2 + c$ has an
attracting cycle of period 3. Its Julia set is known as
the Douady rabbit (Figure 4).


No Wandering Domains Theorem and Dynamics 
on the Fatou Set 


A component $D$ of the Fatou set is called wandering
if $f^n(D) \cap f^m(D) = \emptyset$ for all natural $n > m$.





Figure 3 Dendrite.


Figure 4 Douady rabbit.


Theorem 3 (Sullivan 1982). Rational functions do
not have wandering domains.


This theorem is analogous to the Ahlfors finiteness theorem in
the theory of Kleinian groups. Its proof introduced
to holomorphic dynamics the methods of quasiconformal
deformations, which have become a basic tool
of the subject.

The “no wandering domains theorem” completed
the picture of dynamics on the Fatou set.
Namely, for any $z \in F(f)$, one of the following three
events happens:


• $z$ belongs to the basin of attraction of some
attracting cycle;

• $z$ belongs to the basin of attraction of some
parabolic cycle; and

• for some $n$, $f^n z$ belongs to a rotation domain.


Here a rotation domain is either a Siegel disk or a
Herman ring, that is, a topological annulus $A$ such
that $f^p(A) = A$ for some $p \in \mathbb{N}$ and $f^p|A$ is conformally
equivalent to an irrational rotation $z \mapsto e^{2\pi i\theta} z$
of a round annulus $\{z : 1 < |z| < R\}$. Note that
Herman rings cannot occur for polynomial maps.


More Properties of the Julia Set 


There are two more useful characterizations of the 
Julia set: 


• If $z$ is not an attracting periodic point and does
not belong to a rotation domain, then the set of
accumulation points of the full preimages $f^{-n} z$ is
equal to $J(f)$.

• The Julia set is the closure of the set of repelling
periodic points.


In the polynomial case, the Julia set $J(f)$ (and the
filled Julia set $K(f)$) is connected if and only if the
critical points do not escape to $\infty$ (in other words,
$C_f \subset K(f)$). In the quadratic case, the Basic Dichotomy
holds: the Julia set (and the filled Julia set) is
either connected or a Cantor set.


Böttcher Coordinate


Let $f(z) = z^d + a_1 z^{d-1} + \cdots + a_d$ be a monic polynomial
of degree $d \ge 2$. Then $\infty$ is a superattracting
fixed point, and hence there is a univalent function
$B(z) = B_f(z)$ near $\infty$ satisfying the Böttcher equation
$B(fz) = B(z)^d$ (the Böttcher coordinate near $\infty$).
Moreover, $B(z) \sim z$ as $z \to \infty$ since $f$ is monic.

If $J(f)$ is connected, $B(z)$ can be analytically
extended to the whole basin of $\infty$, and provides us
with the Riemann map $\mathbb{C} \setminus K(f) \to \mathbb{C} \setminus \bar{\mathbb{D}}$. Otherwise,
$B(z)$ extends to a conformal map from some
invariant domain $\Omega_f$ whose boundary contains a
critical point onto $\mathbb{C} \setminus \bar{\mathbb{D}}_R$, where $R = R_f > 1$.

The $B$-preimage of a straight ray $\{re^{2\pi i\theta} : 0 < r < \infty\}$
is called the external ray $R_\theta$ of angle $\theta$. The $B$-preimage
of a round circle $\{re^{2\pi i\theta} : 0 \le \theta < 1\}$ is called the
equipotential $E_t$ of level $t = \log r$. External rays and
equipotentials form two orthogonal $f$-invariant foliations.
We let $R_\theta(t) = R_\theta \cap E_t$.
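
The level $t = \log |B(z)|$ of the equipotential through a point $z$ can be approximated without computing $B$ itself. Iterating the Böttcher equation gives $\log|B(z)| = d^{-n} \log|B(f^n z)|$, and $B(w) \sim w$ near $\infty$, so $t = \lim_{n \to \infty} d^{-n} \log |f^n z|$. The sketch below applies this to $f(z) = z^2 + c$ (so $d = 2$). The function name and the cut-offs `max_iter` and `big` are assumptions of the example.

```python
import math


def potential(z: complex, c: complex, max_iter: int = 60,
              big: float = 1e100) -> float:
    """Approximate the equipotential level t = log|B(z)| for z^2 + c.

    Uses t = lim 2^{-n} log|f^n(z)|. Returns 0.0 when the orbit has not
    become large within `max_iter` steps, i.e. when z is in, or
    numerically indistinguishable from, the filled Julia set.
    """
    scale = 1.0
    for _ in range(max_iter):
        if abs(z) > big:
            return scale * math.log(abs(z))
        z = z * z + c
        scale /= 2
    return 0.0


if __name__ == "__main__":
    # For c = 0 the Böttcher coordinate is B(z) = z, so t = log|z|.
    print(potential(2, 0), math.log(2))
    # The level doubles under f, reflecting B(fz) = B(z)^2.
    z, c = 2 + 1j, -1
    print(potential(z * z + c, c), 2 * potential(z, c))
```

The equipotential $E_t$ is then the level set of this function at height $t$. Points very close to the Julia set may need a larger `max_iter` to be distinguished from level 0.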


Combinatorial Equivalence 


Assume now that $J(f)$ is connected. One says that an
external ray $R_\theta$ lands at some point $z \in J(f)$ if
$R_\theta(t) \to z$ as $t \to 0$. Any external ray of rational
angle $\theta = q/p$ with odd $p$ lands at some repelling or
parabolic periodic point of period dividing $p$
(Douady and Hubbard 1982). Vice versa, any
repelling or parabolic point is a landing point of at
least one rational ray as above (Douady 1990s).

Let us consider the following equivalence relation
on the set of rational numbers with odd denominators:
two such numbers $\theta$ and $\theta'$ are equivalent if
the corresponding rays $R_\theta$ and $R_{\theta'}$ land at the same
point $z \in J(f)$. Two polynomials $f$ and $\tilde{f}$ with
connected Julia set are called combinatorially
equivalent if the corresponding equivalence relations
coincide. Notice that topologically equivalent polynomials
are combinatorially equivalent.


Parameter Phenomena 
Spaces of Rational Functions 


Let $\mathrm{Rat}_d$ stand for the space of rational functions
of degree $d$. As an open subset of the complex
projective space $\mathbb{CP}^{2d+1}$, it is endowed with the
natural topology and complex structure.


Hyperbolic Maps and Fatou’s Conjecture 


Hyperbolic maps form an important and best- 
understood class of rational maps (compare with 
Hyperbolic Dynamical Systems). A rational map f is 
called hyperbolic if one of the following equivalent 
conditions holds: 


• The orbits of all critical points of $f$ converge to attracting
cycles;
• The map is expanding on the Julia set:


$$\|Df^n(z)\| \ge C\lambda^n, \quad z \in J(f), \ n \in \mathbb{N}$$


where $C > 0$, $\lambda > 1$.


For instance, the maps $z \mapsto z^2 + \varepsilon$ for small
$\varepsilon$, $z \mapsto z^2 - 1$, and $z \mapsto z^2 + c$ for $c \in \mathbb{R} \setminus [-2, 1/4]$
are hyperbolic. It is easy to see from the first
definition that hyperbolicity is a stable property,
that is, the set of hyperbolic maps is open in the
space $\mathrm{Rat}_d$ of rational maps of degree $d$. One of the
central open problems in holomorphic dynamics is
to prove that this set is also dense. This problem is
known as Fatou’s conjecture.


Postcritically Finite Maps and Thurston’s Theory 


A rational map is called postcritically finite if the
orbits of all critical points are finite. In this case, any
critical point $c$ is either a superattracting periodic
point, or a repelling preperiodic point (i.e., $f^n c$ is a
repelling periodic point for some $n$). If all critical
points of $f$ are preperiodic, then $J(f) = \hat{\mathbb{C}}$.

Important examples of postcritically finite maps with
$J(f) = \hat{\mathbb{C}}$ come from the theory of elliptic functions.
Namely, let $\wp_\tau : \mathbb{C}/\Gamma_\tau \to \hat{\mathbb{C}}$ be the Weierstrass
$\wp$-function, where $\Gamma_\tau$ is the lattice in $\mathbb{C}$ generated by
1 and $\tau$, $\operatorname{Im} \tau > 0$. It satisfies the functional equation
$\wp_\tau(nz) = f_{\tau,n}(\wp_\tau(z))$, where $f_{\tau,n}$ is a rational function.
These functions, called Lattès examples, possess the
desired properties. (For some special lattices, $n$ can be
selected complex: the corresponding maps are also
called Lattès.)

More generally, one can consider postcritically
finite topological branched coverings $f : S^2 \to S^2$. Two
such maps, $f$ and $g$, are called Thurston combinatorially
equivalent if there exist homeomorphisms
$h, h' : (S^2, \mathcal{O}_f) \to (S^2, \mathcal{O}_g)$ homotopic rel $\mathcal{O}_f$ (and
hence coinciding on $\mathcal{O}_f$) such that $h' \circ f = g \circ h$.

A combinatorial class is called realizable if it 
contains a rational function. Thurston (1982) gave a 
combinatorial criterion for a combinatorial class to 
be realizable. If it is realizable, then the realization is
unique, except for Lattès examples (Thurston’s
Rigidity Theorem).


Structural Stability and Holomorphic Motions 


A map $f \in \mathrm{Rat}_d$ is called $J$-stable if for any map
$g \in \mathrm{Rat}_d$ sufficiently close to $f$, the maps $f|J(f)$ and
$g|J(g)$ are topologically conjugate, and moreover,
the conjugacy $h_g : J(f) \to J(g)$ is close to id. Thus, the
Julia set $J(f)$ moves continuously over the set of
$J$-stable maps. The following result proves a weak
version of Fatou’s conjecture:


Theorem 4 (Lyubich and Mañé-Sad-Sullivan
1983). The set of $J$-stable maps is open and dense
in $\mathrm{Rat}_d$. Moreover, the set of unstable maps is the
closure of maps that have a parabolic periodic point.


A map $f \in \mathrm{Rat}_d$ is called structurally stable if for
any map $g \in \mathrm{Rat}_d$ sufficiently close to $f$, the maps $f$
and $g$ are topologically conjugate on the whole
sphere, and moreover, the conjugacy $h_g : \hat{\mathbb{C}} \to \hat{\mathbb{C}}$ is
close to id. The set of structurally stable maps is also
open and dense in $\mathrm{Rat}_d$ (Mañé-Sad-Sullivan).

The proofs make use of the theory of holomorphic
motions, developed for this purpose but having a much
broader range of applications in dynamics and
analysis. Let $X$ be a subset of $\hat{\mathbb{C}}$, and let $h_\lambda : X \to \hat{\mathbb{C}}$
be a family of injections depending on a parameter
$\lambda \in \Lambda$ in some complex manifold with a marked
point $\lambda_0$. Assume that $h_{\lambda_0} = \mathrm{id}$ and that the functions
$\lambda \mapsto h_\lambda(z)$ are holomorphic in $\lambda$ for any $z \in X$. Such
a family of injections is called a holomorphic
motion.

A holomorphic motion of any set $X$ over $\Lambda$
extends to a holomorphic motion of the whole
sphere $\hat{\mathbb{C}}$ over some smaller manifold $\Lambda' \subset \Lambda$
(Bers-Royden, Sullivan-Thurston 1986). If $h_\lambda$ is a
holomorphic motion of an open subset of the sphere,
then the maps $h_\lambda$ are quasiconformal (Mañé-Sad-Sullivan).
These statements are usually referred to as
the $\lambda$-lemma.

If $\Lambda = \mathbb{D}$, then the holomorphic motion of a set
$X \subset \hat{\mathbb{C}}$ extends to a holomorphic motion of $\hat{\mathbb{C}}$ over
the whole disk $\mathbb{D}$ (Slodkowski 1991).


Fundamental Conjectures 


The above rigidity and stability results led to the 
following profound conjectures: 


QC Rigidity Conjecture If two rational maps are 
topologically conjugate, then they are quasiconfor- 
mally conjugate. 


Let us consider the real projective tangent bundle
PT over $\hat{\mathbb{C}}$, with a natural action of the map $f$.
A measurable invariant line field on the Julia set is
an invariant measurable section $X \to \mathrm{PT}$ over an
invariant set $X \subset J(f)$ of positive Lebesgue measure.
In other words, it is a family of tangent lines $L_z \subset T_z$,
$z \in X$, such that $Df(L_z) = L_{fz}$. Note that such a
field can exist only if $J(f)$ has positive Lebesgue
measure.


No Invariant Line Fields Conjecture Let us consider
two rational maps, $f$ and $\tilde{f}$, that are not Lattès
examples. If they are quasiconformally conjugate
and the conjugacy is conformal on the Fatou set,
then they are conjugate by a Möbius transformation.
Equivalently, if $f$ is not Lattès, then there are no
measurable invariant line fields on $J(f)$.


This conjecture would imply Fatou’s conjecture. 


Mandelbrot Set 


Let us consider the quadratic family $f_c : z \mapsto z^2 + c$.
(Note that any quadratic polynomial is affinely
conjugate to a unique map $f_c$.) The Mandelbrot set
classifies parameters $c$ according to the Basic
Dichotomy of the subsection “More properties of
the Julia set”:


$$M = \{c : J(f_c) \text{ is connected}\} = \{c : f_c^n(0) \not\to \infty\}$$


Note that $\phi_n(c) = f_c^n(0)$ is a polynomial in $c$ of
degree $2^{n-1}$, and these polynomials satisfy the
recursive relation $\phi_{n+1} = \phi_n^2 + c$. Moreover,
$M = \{c : |\phi_n(c)| \le 2, \ n \in \mathbb{Z}_+\}$, which gives an easy way to
make a computer image of $M$ (see Figure 5).
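
The "easy way" mentioned here amounts to running the recursion $\phi_{n+1} = \phi_n^2 + c$ from $\phi_0 = 0$ and rejecting $c$ as soon as some $|\phi_n(c)|$ exceeds 2. A minimal sketch follows; the function names, the iteration cap `max_iter`, and the text rendering are choices made for the example. Because only finitely many $n$ are checked, the test can wrongly accept parameters just outside $M$.

```python
def in_mandelbrot(c: complex, max_iter: int = 500) -> bool:
    """Approximate membership test for the Mandelbrot set.

    Checks |phi_n(c)| <= 2 for n = 1..max_iter, where phi_0 = 0 and
    phi_{n+1} = phi_n^2 + c. False is rigorous; True is only evidence.
    """
    phi = 0j
    for _ in range(max_iter):
        phi = phi * phi + c
        if abs(phi) > 2:
            return False
    return True


def render(width: int = 60, height: int = 24) -> str:
    """Return a coarse text picture of M on [-2, 0.6] x [-1.1, 1.1]."""
    rows = []
    for j in range(height):
        y = 1.1 - 2.2 * j / (height - 1)
        row = ""
        for i in range(width):
            x = -2.0 + 2.6 * i / (width - 1)
            row += "#" if in_mandelbrot(complex(x, y), 100) else "."
        rows.append(row)
    return "\n".join(rows)


if __name__ == "__main__":
    print(render())
```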

A distinguished curve seen in the picture of $M$ is
the main cardioid $\mathcal{C} = \{c = e^{2\pi i\theta}/2 - e^{4\pi i\theta}/4\}$, $\theta \in \mathbb{R}/\mathbb{Z}$.
For such a $c = c(\theta) \in \mathcal{C}$, the map $f_c$ has a neutral
fixed point $\alpha_c$ with rotation number $\theta$. For $c$ inside
the domain $H_0$ bounded by $\mathcal{C}$, $f_c$ has an attracting
fixed point $\alpha_c$, and the Julia set $J(f_c)$ is a Jordan
curve (see Figure 1).
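
The parametrization of the main cardioid can be checked directly. If a fixed point $\alpha$ of $z \mapsto z^2 + c$ has multiplier $\mu = 2\alpha$, then $\alpha = \mu/2$ and $c = \alpha - \alpha^2 = \mu/2 - \mu^2/4$; taking $\mu = e^{2\pi i\theta}$ gives the curve $c(\theta)$, and $|\mu| < 1$ gives the domain it bounds. The sketch below encodes this computation. The function names are hypothetical.

```python
import cmath


def c_from_multiplier(mu: complex) -> complex:
    """Parameter c for which z -> z^2 + c has a fixed point of multiplier mu."""
    return mu / 2 - mu * mu / 4


def cardioid_point(theta: float) -> complex:
    """Point c(theta) of the main cardioid (multiplier e^{2 pi i theta})."""
    return c_from_multiplier(cmath.exp(2j * cmath.pi * theta))


if __name__ == "__main__":
    print(cardioid_point(0.0))   # the cusp c = 1/4
    print(cardioid_point(0.5))   # the point c = -3/4
    mu = 0.5 * cmath.exp(2j * cmath.pi / 3)   # an attracting multiplier
    c, alpha = c_from_multiplier(mu), mu / 2
    print(abs(alpha * alpha + c - alpha), abs(2 * alpha))
```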





Figure 5 The Mandelbrot set.

At the cusp $c = 1/4 = c(0)$ of the main cardioid,
the map $f_c$ has a parabolic point with multiplier 1.
This point is also called the root of $\mathcal{C}$. Other
parabolic points $c = c(q/p)$ on $\mathcal{C}$ are bifurcation
points: if one crosses $\mathcal{C}$ transversally at $c$, then the
fixed point $\alpha_c$ “gives birth” to an attracting cycle of
period $p$. This cycle preserves its “attractiveness”
within some component $H_{q/p}$ of int $M$ attached to $\mathcal{C}$.

On the boundary of $H_{q/p}$, the above attracting
cycle becomes neutral, and similar bifurcations
happen as one crosses this boundary transversally,
etc. In this way we obtain cascades of bifurcations
and associated necklaces of components of int $M$.
The most famous one is the cascade of doubling
bifurcations that occur along the real slice of $M$.

Components of int $M$ that occur in these bifurcation
cascades give examples of hyperbolic components of
int $M$. More generally, a component $H$ of int $M$ is
called hyperbolic of period $p$ if the maps $f_c$, $c \in H$,
have an attracting cycle of period $p$. Many other
hyperbolic components become visible if one begins
to zoom in on the Mandelbrot set. Some of them
are satellite, that is, they are born as above by
bifurcation from other hyperbolic components.
Others are primitive. They can be easily distinguished
geometrically: primitive components have a
cusp at their root, while satellite components are
bounded by smooth curves.

Given a hyperbolic component $H$, let us consider the
multiplier $\lambda(c)$, $c \in H$, of the corresponding attracting
cycle, as a function of $c \in H$. The function $\lambda$ univalently
maps $H$ onto the unit disk $\mathbb{D}$ (Douady and Hubbard
1982). Thus, there is a single parameter $c_0 \in H$ for
which $\lambda(c_0) = 0$, so that $f_{c_0}$ has a superattracting cycle.
This parameter is called the center of $H$.
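
At a center the critical point 0 of $z \mapsto z^2 + c$ is itself periodic, so centers of components of period $p$ are roots of the polynomial $\phi_p(c) = f_c^p(0)$. (The roots of $\phi_p$ also include centers of periods dividing $p$, such as $c = 0$.) The sketch below refines an initial guess by Newton's method, differentiating the recursion $\phi_{n+1} = \phi_n^2 + c$ to get $\phi'_{n+1} = 2\phi_n\phi'_n + 1$. The function names, the fixed number of Newton steps, and the reliance on a good initial guess are all assumptions of the example.

```python
def phi_and_derivative(c: complex, n: int):
    """Return (phi_n(c), phi_n'(c)) for phi_0 = 0, phi_{k+1} = phi_k^2 + c."""
    phi, dphi = 0j, 0j
    for _ in range(n):
        # Tuple assignment evaluates the right side with the old values.
        phi, dphi = phi * phi + c, 2 * phi * dphi + 1
    return phi, dphi


def center_near(guess: complex, period: int, steps: int = 50) -> complex:
    """Newton iteration for a root of phi_period close to `guess`."""
    c = complex(guess)
    for _ in range(steps):
        phi, dphi = phi_and_derivative(c, period)
        if dphi == 0:
            break
        c -= phi / dphi
    return c


if __name__ == "__main__":
    print(center_near(-0.9, 2))            # center of the period 2 component
    print(center_near(-0.12 + 0.74j, 3))   # center of a period 3 component
```

Starting from the parameter $c \approx -0.12 + 0.74i$ of the rabbit example, the iteration converges to the center of the period 3 component containing it.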

Nonhyperbolic components of int M are called 
queer. Conjecturally, there are no queer compo- 
nents. This conjecture is equivalent to Fatou’s 
conjecture for the quadratic family. 

The boundary of M coincides with the set of 
J-unstable quadratic maps (see the subsection 
“Structural stability and holomorphic motions”). 


Connectivity and Local Connectivity 


Theorem 5 (Douady and Hubbard 1982). The 
Mandelbrot set is connected. 


The proof provides an explicit uniformization
$R_M : \mathbb{C} \setminus M \to \mathbb{C} \setminus \bar{\mathbb{D}}$. Namely, let $B_c : \Omega_c \to \mathbb{C} \setminus \bar{\mathbb{D}}_{R_c}$,
$c \in \mathbb{C} \setminus M$, be the Böttcher coordinate near $\infty$. Then
$R_M(c) = B_c(c)$. This remarkable formula explains the
phase-parameter similarity between the Mandelbrot
set near a parameter $c \in M$ and the corresponding
Julia set $J(f_c)$ near the critical value $c$.


The following is the most prominent open
problem in holomorphic dynamics:


MLC Conjecture The Mandelbrot set is locally
connected.


If this is the case, then the inverse map $R_M^{-1}$
extends continuously to the unit circle $\mathbb{T}$, and the Mandelbrot
set can be represented as the quotient of $\mathbb{T}$ modulo
a certain equivalence relation that can be explicitly
described. Thus, we would have an explicit topological
model for the Mandelbrot set (Douady and
Hubbard, Thurston).

The MLC conjecture is equivalent to the follow- 
ing conjecture: 


Combinatorial Rigidity Conjecture If two quadratic
maps $f_c$ and $f_{c'}$ with all periodic points repelling are
combinatorially equivalent, then $c = c'$.


In turn, this conjecture would imply, in the
quadratic case, the above fundamental conjectures.
For progress toward the MLC conjecture, see
Universality in Mathematical Physics.


Parabolic Implosion 


Parabolic maps $f_{c_0} : z \mapsto z^2 + c_0$ are unstable in a
dramatic way. In particular, the Julia set $J(f_c)$ does
not depend continuously on $c$ near $c_0$. Instead, $J(f_c)$
tends to fill in a good part of $\operatorname{int} K(f_{c_0})$. This
phenomenon, called parabolic implosion, has been
explored by Douady, Lavaurs, Shishikura, and many
others.


Geometric Aspects 
Area 


One of the basic problems in holomorphic dynamics
is whether a Julia set that does not coincide with $\hat{\mathbb{C}}$
can have positive area. It would give an example of
“observable chaos” that occurs on a topologically
small set. It is also related to the No Invariant Line
Fields Conjecture.

Maps with strong hyperbolic properties have Julia sets of
zero area. A rational map $f$ is called Collet-Eckmann
if there exist constants $C > 0$ and $\lambda > 1$
such that


$$|Df^n(fc)| \ge C\lambda^n, \quad n \in \mathbb{N}$$


for all critical points $c$. If $f$ is a Collet-Eckmann map
with $J(f) \ne \hat{\mathbb{C}}$, then area $J(f) = 0$ (Przytycki and
Rohde 1998) (see Universality and Renormalization
for more examples). On the other hand, A Douady
has set up a compelling program of constructing a
Cremer quadratic polynomial $f : z \mapsto e^{2\pi i\theta} z + z^2$
whose Julia set would have positive area. Buff and
Chéritat have recently announced that they have
completed the program, thus constructing the first
example of a Julia set of positive area. (It makes use
of a renormalization theorem for parabolic implosion
recently announced by Shishikura.)

In the parameter plane, it would be interesting to 
know whether the boundary of the Mandelbrot set 
has zero area. 


Hausdorff Dimension 


Hausdorff dimension (HD) gives us a further
refinement of fractal sets of zero area. Any Julia
set has positive HD. If $f$ is a polynomial with
connected Julia set, then $\mathrm{HD}(J(f)) > 1$ unless $f$ is
affinely conjugate to $z \mapsto z^d$ or a Chebyshev polynomial
(Zdunik 1990). If $f$ is a Collet-Eckmann map
with $J(f) \ne \hat{\mathbb{C}}$, then $\mathrm{HD}(J(f)) < 2$ (Przytycki-Rohde
1998). On the other hand, in the quadratic case
$f_c : z \mapsto z^2 + c$, $\mathrm{HD}(J(f_c)) = 2$ for a generic parameter
$c \in \partial M$. The corresponding parameter result is that
$\mathrm{HD}(\partial M) = 2$ (Shishikura 1998). It is based on the
parabolic implosion phenomenon.


Conformal Measure 


Let $\delta > 0$. A Borel measure $\mu$ on $\hat{\mathbb{C}}$ is called
$\delta$-conformal if


$$\mu(fX) = \int_X |Df|^\delta \, d\mu$$


for any measurable set $X$ such that $f|X$ is injective.


Theorem 6 (Sullivan 1983). Any rational map $f$
has a $\delta$-conformal measure with $\delta \in (0, 2]$ supported
on $J(f)$.


This is a dynamical measure that captures well the
geometric properties of $J(f)$. For instance, for Collet-Eckmann
maps, $\delta = \mathrm{HD}(J(f))$, and $\mu$ is equivalent to
the Hausdorff measure on $J(f)$ in dimension $\delta$.

The hyperbolic dimension $\mathrm{HD}_{\mathrm{hyp}}$ of $J(f)$ is the
supremum of $\mathrm{HD}(X)$ over all compact invariant
hyperbolic subsets $X$ of $J(f)$. Denker and Urbański
(1991) proved that $\mathrm{HD}_{\mathrm{hyp}}(J(f))$ is equal to the
smallest exponent $\delta$ of all $\delta$-conformal measures
supported on $J(f)$ (see Universality and
Renormalization).


Measure of Maximal Entropy 


An $f$-invariant measure $\mu$ is called balanced if
$\mu(fX) = d\,\mu(X)$ for any measurable set $X$ such that
$f|X$ is injective (where $d = \deg f$).


Theorem 7 (Brolin 1965, Lyubich 1982). Any
rational map $f$ has a unique balanced measure $\mu$.
Moreover, preimages of any point $z$ except at most
two are equidistributed with respect to $\mu$ (meaning
that the probability measures $\mu_{n,z}$ that assign mass $1/d^n$
to every $f^n$-preimage of $z$ converge weakly to $\mu$ as
$n \to \infty$).
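
The equidistribution statement underlies a common way of drawing Julia sets by backward iteration. The sketch below computes all $2^n$ preimages of a point under $f^n$ for $f(z) = z^2 + c$, using the two branches $\pm\sqrt{w - c}$ of $f^{-1}$. Giving each preimage mass $2^{-n}$ produces the probability measure described in the theorem. The function names are hypothetical, and the full preimage tree grows exponentially, so the depth must stay small.

```python
import cmath


def preimages(w: complex, c: complex, depth: int) -> list:
    """Return all 2^depth preimages of w under the depth-fold iterate
    of f(z) = z^2 + c (counted with multiplicity)."""
    points = [complex(w)]
    for _ in range(depth):
        next_points = []
        for p in points:
            root = cmath.sqrt(p - c)   # the two solutions of z^2 + c = p
            next_points.extend((root, -root))
        points = next_points
    return points


def iterate(z: complex, c: complex, n: int) -> complex:
    """Apply f(z) = z^2 + c to z exactly n times."""
    for _ in range(n):
        z = z * z + c
    return z


if __name__ == "__main__":
    # For c = 0 the preimages of w = 3 have modulus 3^(1/2^n), so they
    # accumulate on the unit circle, which is the Julia set of z^2.
    cloud = preimages(3, 0, 10)
    print(len(cloud), max(abs(p) for p in cloud))
```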


For polynomials, the balanced measure coincides
with the harmonic measure on $J(f)$ (Brolin). (The
latter is the charge distribution on the conductor $J(f)$
generated by the unit charge placed at $\infty$.) In
general, the balanced measure is the unique measure
of maximal entropy of $f$, and moreover, periodic
points are equidistributed with respect to $\mu$
(Lyubich).

The measure of maximal entropy is supported on a
relatively small measurable set: its HD is strictly less
than $\mathrm{HD}(J(f))$, unless $f$ is conformally equivalent to
$z \mapsto z^d$, a Chebyshev polynomial, or a Lattès example
(Zdunik 1990). In the polynomial case, it is
supported on a set of HD at most 1 (Manning 1984).

In complex analysis, there has been an extensive
study of fractal properties of harmonic measures,
providing insights into the balanced measure $\mu$ and the
other way around (Carleson, Makarov, Jones,
Binder, Smirnov, ...).


See also: Fractal Dimensions in Dynamics; Geometric 
Analysis and General Relativity; Geometric Flows and 
the Penrose Inequality; Geometric Phases; Polygonal 
Billiards; Renormalization: General Theory; 
Renormalization: Statistical Mechanics and Condensed 
Matter; Universality and Renormalization.


Introduction


The term holonomic field was coined by Sato,
Miwa, and Jimbo (SMJ) in 1978, and the subject was
investigated by them in a series of five long papers
and many shorter notes in the period 1978-81
(Sato et al. 1979a, 1979b, 1979c, 1980, Tracy
and Widom 1994). The term refers to a special class of
two-dimensional interacting quantum field theories
whose $n$-point correlations can be expressed in terms
of the solution to a holonomic system of differential
equations. A holonomic system is an overdetermined 
system of differential equations with only a finite- 
dimensional family of solutions. There is a sense in 
which these interacting systems with infinitely many 
degrees of freedom have a finite-dimensional substrate 
(at the level of n point functions for fixed n). After 
developing their theory, SMJ realized that such 
quantum fields made an earlier appearance in work 
of Thirring and Federbush. The models considered by 
Thirring and Federbush are self-interacting fermionic 
systems whose nonlinear classical field equations have 
solutions that are an explicit nonlinear transformation 
of solutions to the free field equations. This inspired 
the idea of trying to study these models by “quantiz- 
ing” the nonlinear transformation. Expressions were 
obtained for the correlations and S-matrix but the 
connection with deformation theory was not under- 
stood until the SMJ work. 

In what follows we will sketch the SMJ theory and 
discuss some of its offshoots. There is one circum- 
stance that it might help the reader to be aware of 
even though it will be mostly glossed over. Quantum 
fields in one space and one time dimension have 
correlations which transform under the symmetries of 
spacetime with metric signature (1,1). Since the work 
of Osterwalder and Schrader, it is customary to pass 
back and forth between this Minkowski regime and 
the Schwinger functions obtained by analytically 
continuing the n point functions to pure imaginary 
values for the time variable where they possess the 
rotational symmetries associated with a positive- 
definite metric. The Ising model, which we take up 
next, is naturally considered in the Euclidean domain 
where the correlations have an interpretation in 
statistical mechanics as the expected value of a 
product of random variables. Ultimately, the SMJ 
deformation analysis is done in the Euclidean 
domain. 


The Two-Dimensional Ising Model 


The SMJ theory was inspired by, and provides an 
attractive setting for, an earlier result of Wu, 
McCoy, Tracy, and Baruch (WMTB), concerning 
the spin-spin scaling functions of the two-dimen- 
sional Ising model (Wu et al. 1976). Since the Ising 
model is the example with the most direct signifi- 
cance for physics, we will take some time to explain 
the WMTB result and to sketch the way in which it 
fits into the SMJ theory. 

The Ising model is a statistical model of magnetism
on a lattice that incorporates ferromagnetic
interactions of nearest-neighbor spins. In the 1920s, 
Ising solved the model for the one-dimensional 
lattice and showed that there was no phase transition 
in the infinite volume limit. Interest in the two- 
dimensional model intensified dramatically following 
Onsager’s calculation of the specific heat in the 
infinite volume limit (see Palmer and Tracy (1981) 
and references within). His formula for the specific 
heat was the first instance of a thermodynamic 
quantity in a nearest-neighbor model which exhibits 
the sort of discontinuity in temperature dependence 
expected at a phase transition. For many years, the 
Ising model served as a testbed for the now accepted 
notion that the infinite volume limit of Gibbsian 
statistical mechanics provides a suitable setting for 
the study of phase transitions. 

A configuration for the Ising model on a finite
subset, $\Lambda$, of the integer lattice, $\mathbb{Z}^2$, is a map
$\sigma : \Lambda \to \{+1, -1\}$, which assigns to each site on the
lattice either an up spin (+1) or a down spin (-1).
The energy function of the Ising model, $E_\Lambda(\sigma)$, is
defined by


$$E_\Lambda(\sigma) = -J \sum_{\langle i,j \rangle \subset \Lambda} \sigma(i)\sigma(j)$$


for $J > 0$ and a spin configuration $\sigma$; the sum is over
pairs of nearest-neighbor sites $i, j$ in $\Lambda$ (boundary
terms require special consideration). This energy
function tends to favor spin configurations, $\sigma$, in
which the nearest-neighbor spins are aligned in the
sense that the Boltzmann weight, $e^{-E_\Lambda(\sigma)/kT}$, is larger
for such configurations. In the Gibbs ensemble,
which is expected to describe systems in equilibrium
at temperature $T$, the configuration $\sigma$ occurs with a
probability proportional to the Boltzmann weight.
The factor $k$ which appears is a conversion factor
between thermal and kinetic energy called the
Boltzmann constant. It is clear from the formula
for the Boltzmann weight that small temperatures
(near 0) tend to accentuate the difference in
statistical weights assigned to configurations with
different energies, and large temperatures tend to
wash out the difference in statistical weights
associated to configurations with different energies.
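
To make the energy function and Boltzmann weight concrete, the sketch below evaluates $E_\Lambda(\sigma) = -J \sum \sigma(i)\sigma(j)$ over nearest-neighbor pairs, and the weight $e^{-E_\Lambda(\sigma)/kT}$, for a rectangular block $\Lambda$. The free boundary conditions (no bonds leaving the block), the list-of-rows representation of $\sigma$, and the units with $J = k = 1$ are assumptions of the example. The text notes that boundary terms need special consideration.

```python
import math


def ising_energy(spins, J: float = 1.0) -> float:
    """Energy -J * sum of sigma(i) * sigma(j) over nearest-neighbor pairs.

    `spins` is a list of equal-length rows with entries +1 or -1.
    Each bond inside the block is counted once; free boundary conditions.
    """
    rows, cols = len(spins), len(spins[0])
    total = 0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                total += spins[i][j] * spins[i + 1][j]   # vertical bond
            if j + 1 < cols:
                total += spins[i][j] * spins[i][j + 1]   # horizontal bond
    return -J * total


def boltzmann_weight(spins, temperature: float,
                     J: float = 1.0, k: float = 1.0) -> float:
    """Unnormalized Gibbs weight exp(-E / (k T)) of a configuration."""
    return math.exp(-ising_energy(spins, J) / (k * temperature))


if __name__ == "__main__":
    aligned = [[1, 1], [1, 1]]
    checkerboard = [[1, -1], [-1, 1]]
    for T in (0.5, 5.0):
        ratio = boltzmann_weight(aligned, T) / boltzmann_weight(checkerboard, T)
        print(T, ratio)
```

The ratio of the two weights is much larger at the low temperature than at the high one, which is the accentuating and washing-out effect described above.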

Remarkably, there is a sharp critical temperature
$0 < T_c < \infty$ so that for $T < T_c$ the propensity for
order built into the energy triumphs in the infinite
volume limit $\Lambda \uparrow \mathbb{Z}^2$, and for $T > T_c$ the randomness
or disorder associated with high temperatures
governs the infinite volume behavior. More specifically,
if $T < T_c$ and the infinite volume limit is taken
with plus spins assigned to the boundary of $\Lambda$, the
system exhibits a residual magnetism (there is a
positive expected value, $\langle\sigma\rangle$, for the spin per site).
This infinite volume plus state is the quintessential
example of symmetry breaking: the spin flip
symmetry possessed by the bulk energy is broken
below $T_c$ in the thermodynamic limit. For $T > T_c$,
the spin per site is 0 no matter what boundary
conditions are imposed on the infinite volume limit.

Pure equilibrium states, both above and below $T_c$,
exhibit clustering in the thermodynamic limit
(uniqueness for the ground state in field theory).
This is the tendency of spin variables $\sigma(a)$ and $\sigma(b)$
at sites $a, b \in \mathbb{Z}^2$ to become statistically independent
as the distance $|a - b|$ tends to $\infty$. In such a pure
state the two-point function, which is the expected
value of the product of spin variables, $\langle\sigma(a)\sigma(b)\rangle$,
will tend to the square $\langle\sigma\rangle^2$ both below ($\langle\sigma\rangle \ne 0$)
and above ($\langle\sigma\rangle = 0$) the critical temperature $T_c$ as
$|a - b| \to \infty$. To leading order, this clustering takes
place at an exponential rate, $e^{-|a-b|/\xi(T)}$, for a
function $\xi(T)$ called the correlation length. The
correlation length $\xi(T) \to \infty$ as $T \to T_c$. The scaling
limit (from below $T_c$) of the spin-spin correlation is
the leading-order correction to the clustering behavior
of the correlations when these correlations are
examined at the scale of the correlation length. It is
the limit


(o(a)o(6)) = lim SOB a 


mT (0) 


where the correlations on the right-hand side are
thermodynamic correlations on the lattice at temperature
$T$. Since $\langle\sigma\rangle_T$ tends to 0 as $T \to T_c$, the
normalization by $\langle\sigma\rangle_T^{-2}$ on the right produces an
“infinite wave function renormalization” in the limit.

Equivalently, one may think of this continuum
limit being achieved by letting the lattice spacing
shrink to 0 as $T$ approaches $T_c$ so that the
correlation length stays fixed on the new scale. The
scaling limit from above $T_c$ turns out to be different
from the scaling limit from below $T_c$ and, since
$\langle\sigma\rangle_T = 0$ for $T > T_c$, it is defined by a different wave
function renormalization. The resulting asymptotics
are expected to capture quite a lot about what is
interesting in the behavior of the correlations near
the phase transition. In the late 1970s, the scaling
behavior in this model was also a prototype for the
emerging connection between renormalization group
ideas in quantum field theory and statistical
mechanics.

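Wu et al. (1976) showed that the two-point
scaling function, $\langle\sigma(0)\sigma(x)\rangle$, is a function of $r = |x|$
and can be written as

$$\langle\sigma(0)\sigma(x)\rangle =
\begin{cases}
\cosh(\psi/2)\,\exp\left\{\dfrac{1}{4}\displaystyle\int_r^\infty \left[\sinh^2\psi - \left(\dfrac{d\psi}{ds}\right)^2\right] s\,ds\right\} & (T < T_c) \\[2ex]
\sinh(\psi/2)\,\exp\left\{\dfrac{1}{4}\displaystyle\int_r^\infty \left[\sinh^2\psi - \left(\dfrac{d\psi}{ds}\right)^2\right] s\,ds\right\} & (T > T_c)
\end{cases}
\qquad [1]$$

where $\psi = \psi(r)$ satisfies the differential equation

$$\frac{1}{r}\frac{d}{dr}\left(r\frac{d\psi}{dr}\right) = \frac{1}{2}\sinh(2\psi)$$

The substitution $\eta = e^{-\psi}$ transforms this differential
equation into a Painlevé equation of the third kind.
This was used by McCoy, Tracy, and Wu (see
Palmer and Tracy (1981) and references within) to
study the short-distance behavior, $r \to 0$, of the
scaling functions, behavior which is far from
manifest in the infinite series expansions obtained
for the scaling functions.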
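The reduction to a Painlevé equation of the third kind can be checked by hand. The following is a sketch, reading the differential equation as $\psi'' + \psi'/r = \frac12\sinh(2\psi)$ with primes denoting $d/dr$, and using only the substitution $\eta = e^{-\psi}$:

```latex
\begin{align*}
\psi &= -\log\eta, \qquad \psi' = -\frac{\eta'}{\eta}, \qquad
\psi'' = -\frac{\eta''}{\eta} + \frac{(\eta')^2}{\eta^2},\\
\tfrac12\sinh(2\psi) &= \tfrac14\left(e^{2\psi} - e^{-2\psi}\right)
  = \tfrac14\left(\eta^{-2} - \eta^{2}\right),\\
\psi'' + \frac{1}{r}\psi' = \tfrac12\sinh(2\psi)
 \;&\Longleftrightarrow\;
 \eta'' = \frac{(\eta')^2}{\eta} - \frac{\eta'}{r}
   + \frac14\left(\eta^{3} - \frac{1}{\eta}\right).
\end{align*}
```

The last equation has the general Painlevé III shape $w'' = (w')^2/w - w'/z + (\alpha w^2 + \beta)/z + \gamma w^3 + \delta/w$ with $\alpha = \beta = 0$, $\gamma = 1/4$, $\delta = -1/4$.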


Deformation Theory 


Sato, Miwa, and Jimbo showed that there was a
class of quantum field theories that included the
scaling limits of the Ising model which have the
property that the $n$-point correlations are “tau
functions” for monodromy-preserving deformations
of the Dirac equation in two dimensions. The
two-dimensional (Euclidean) Dirac operator is

$$D = \begin{pmatrix} m & -2\partial \\ -2\bar\partial & m \end{pmatrix}$$

with

$$\partial = \frac12\left(\frac{\partial}{\partial x_1} - i\frac{\partial}{\partial x_2}\right), \qquad
\bar\partial = \frac12\left(\frac{\partial}{\partial x_1} + i\frac{\partial}{\partial x_2}\right)$$

the usual complex derivatives acting on smooth
functions on $\mathbb{R}^2$. The monodromy-preserving deformations
mentioned above are families of (multivalued)
solutions $w(x)$ to

$$Dw = 0 \qquad [2]$$

which are branched at points $a_j \in \mathbb{R}^2$ ($j = 1, 2, \ldots, n$)
and change by a factor $e^{2\pi i\lambda_j}$ as $x$ makes a small
circuit about $a_j$. SMJ (1979b) show that the $L^2(\mathbb{R}^2)$
(square-integrable) solutions $w(x)$ of the Dirac
equation with this prescribed branching behavior
comprise an $n$-dimensional subspace of $L^2(\mathbb{R}^2)$. The
constants $e^{2\pi i\lambda_j}$ are called the monodromy of the
solutions and the term “deformation” in the description
refers to the fact that the monodromy constants
do not change as the branch points $a_j$ are varied.
SMJ show that it is possible to choose a basis
$w_j(x, a)$, $j = 1, \ldots, n$, so that the
vector

$$W(x, a) = (w_1(x, a), w_2(x, a), \ldots, w_n(x, a))$$

becomes a section for a flat (Dirac compatible)
connection in the $(x, a)$ variables. That is,

$$d_{x,a} W = \Omega(x, a)\, W$$

where $d_{x,a}$ is the exterior derivative in the $x$ and $a$
variables and $\Omega$ is a matrix-valued 1-form that
satisfies the zero curvature condition,

$$d\Omega = \Omega \wedge \Omega$$
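The zero curvature condition is the compatibility condition of the linear system for $W$. As a sketch, assuming the components of $W$ are linearly independent (so that the coefficient of $W$ must vanish) and writing $d$ for the exterior derivative in the $x$ and $a$ variables, with $dW = \Omega W$:

```latex
\begin{align*}
0 = d^2 W &= d(\Omega W) = (d\Omega)\,W - \Omega\wedge dW \\
          &= (d\Omega - \Omega\wedge\Omega)\,W
\quad\Longrightarrow\quad d\Omega = \Omega\wedge\Omega .
\end{align*}
```

The minus sign in the first line comes from moving $d$ past the 1-form $\Omega$.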


They also introduced the notion of a tau function,
$\tau(a)$, for such deformations. The logarithmic derivative
is $d_a \log\tau(a) = \omega$, where $\omega$ is a 1-form on
$\mathbb{R}^2 \setminus \{a_1, a_2, \ldots, a_n\}$ expressed in terms of the matrix
elements of $\Omega$. The 1-form $\omega$ introduced by SMJ is
shown to be closed when $\Omega$ satisfies the zero
curvature condition above. The scaling limit of
the Ising model is related to the situation for
monodromy multipliers $-1$, and when the scaling
limits of correlations are identified as suitable
$\tau$-functions in this case, the WMTB result emerges
when the nonlinear zero curvature condition is
identified with a Painlevé equation.

The connection between the deformation theory
and quantum field theory is developed in the
computationally intensive paper SMJ (1980). Extensive
use is made of local operator product expansions,
analytic continuation, and formal series
expansions that are infinite-dimensional analogs of
Wick-type theorems for finite-dimensional spin
representations (developed by SMJ (1978)). One can get
a feeling for the source of the connection by recalling
that in one of the “exact solutions” of the
two-dimensional Ising model the spin operators, $\sigma(a)$,
are identified as elements of an infinite spin
representation of the orthogonal group and are
characterized by their linear action on Fermionic
variables (Palmer and Tracy 1981). In the physics
literature, the $\sigma(a)$ are referred to as Bogoliubov
transformations. In the scaling limit the associated
representation space is the home to a free Fermi field
$\psi(x)$, an operator-valued solution to the Dirac
equation. Of course, $\psi(x)$ has components $\psi_j(x)$ but
for simplicity we will suppress such details in the
mostly schematic discussion that follows. For coincident
second coordinates $x_2 = a_2$ the Fermi field $\psi(x)$
and $\sigma(a)$ satisfy the commutation relation

$$\sigma(a)\psi(x) = -\operatorname{sgn}(x_1 - a_1)\,\psi(x)\sigma(a) \qquad [3]$$

which is a surviving remnant of the linear action of
$\sigma(a)$ on lattice fermions. In the transfer matrix
formalism, which is natural for statistical mechanics,
translation in the “space” variable $x_1$ is unitary, but
translation in the “time” variable, $x_2$, is governed by
the transfer matrix, the generator of a contractive
semigroup. Because of this, the quantities that are
well behaved in this formalism are “time-ordered
vacuum expectations”; these involve only “positive”
powers of the transfer matrix. Let $\mathcal{T}$ denote the
“time”-ordering operator; a sequence of operators
depending on coordinates in $\mathbb{R}^2$ is reordered following
$\mathcal{T}$ so that the second coordinates appear in
increasing order from left to right. Sign changes are
incorporated whenever it is necessary to exchange
Fermi type operators like $\psi(x)$ and $\psi(y)$ to put them
in the correct order. In the Euclidean setting (pure
imaginary time) it is well known that

$$G(x, y) = \langle \mathcal{T}\psi(x)\psi(y) \rangle$$

is a Green function for the Dirac operator $D$ (the
distribution kernel for $D^{-1}$).

This observation and [3] suggest that the hybrid
vacuum expectation

$$G(x, y; a) = \frac{\langle \mathcal{T}\psi(x)\psi(y)\sigma(a_1)\cdots\sigma(a_n)\rangle}{\langle \mathcal{T}\sigma(a_1)\cdots\sigma(a_n)\rangle}$$

should be the Green function for a Dirac operator
with a domain containing “functions” branched at
the points $a_j$ having “monodromy” $-1$ there. It is
possible to recast the SMJ analysis so that a Dirac
operator, $D(a)$, on a suitable vector bundle with base
$\mathbb{R}^2 \setminus \{a_1, \ldots, a_n\}$ becomes the central player (see
Palmer et al. (1994) and references therein). The
data for the vector bundle includes the factors $e^{2\pi i\lambda_j}$
incorporated in transition functions for the bundle.
The $\tau$-function becomes an infinite determinant (or
Pfaffian in the Ising case)

$$\tau(a) = \det D(a) \qquad [4]$$


in the Segal-Wilson sense (see Palmer et al. (1994)
and references therein). The Green function
$G(x, y; a)$ has a finite-rank derivative,

$$d_a G(x, y; a) = \sum_j r_j(x, a)s_j(y, a)\,da_j + u_j(x, a)v_j(y, a)\,d\bar a_j \qquad [5]$$

which is the key result in this version of the SMJ
analysis (this observation appears in SMJ (1980) but
does not have a central role there). The “wave
functions” $r$, $s$, $u$, and $v$ are closely related to the
$L^2$ wave functions $w_j$ described above. Equation [5]
is both the source of the deformation equations for
$r$, $s$, $u$, and $v$, which arise from $d_a^2 G = 0$ coupled with
the rotational and translational symmetries of the
Green function, and also of the expression for
$d_a \log\tau(a)$ in terms of data associated with the
deformation theory. A “transfer matrix” calculation
of the determinant allows one to make the connection
with the scaling limits of lattice fields including the
Ising model (see Palmer et al. (1994) and references
therein).

The short-distance behavior of the two-point 
function for the Ising model scaling functions has 
been rigorously calculated by Tracy and later by 
Tracy and Widom (see Harnad and Its (2002) and 
references therein). A less detailed analysis of the 
short-distance behavior of the n point functions that 
uses the deformation analysis of the correlations in a 
crucial way can be found in Palmer (2000). 


The Riemann-Hilbert Problem 


In SMJ (1979b), a “massless” version of holonomic
fields is developed. This concerns monodromy-preserving
deformations of the Cauchy-Riemann
operator $\bar\partial$. The techniques used to study this lead
back to the Riemann-Hilbert problem: the problem
of determining a linear differential equation in the
complex plane with rational coefficients and prescribed
monodromy at the poles of the coefficients.
More specifically, suppose one is given $n$ distinct
points $\{a_1, \ldots, a_n\}$ in $\mathbb{P}^1$, the Riemann sphere, and
a base point $a_0$ distinct from the $a_j$, $j \neq 0$. Let $\gamma_j$
denote a simple closed curve based at $a_0$ which
winds counterclockwise once around $a_j$ but has
winding number 0 for the other points $a_k$, $k \neq j$.
Choose $n$ invertible $p \times p$ matrices $M_j$ which satisfy
the single condition $M_1 M_2 \cdots M_n = I$. Then, the
homotopy classes of the curves $\gamma_j$ are the generators
for the fundamental group of the punctured
sphere $\mathbb{P}^1 \setminus \{a_1, \ldots, a_n\}$ with base point $a_0$ and the
map which sends $\gamma_j \to M_j$ determines a representation
of the fundamental group. One version of the
Riemann-Hilbert problem is to find $p \times p$ complex
matrices $A_j$ for $j = 1, \ldots, n$ so that the linear
differential equation

$$\frac{dY}{dz} = \sum_{j=1}^{n} \frac{A_j}{z - a_j}\, Y \qquad [6]$$

has monodromy representation given by $\gamma_j \to M_j$.
This means that the fundamental solution $Y(z)$
defined in a neighborhood of $z = a_0$ and normalized
so that $Y(a_0) = I$ (the identity) will become the
fundamental solution $Y(z)M_j^{-1}$ after analytic continuation
around the curve $\gamma_j$. This form of the
problem does not always have a solution but when
it does, it is interesting to consider deformations
$a \to A_j(a)$ that preserve the monodromy multipliers
$M_j$. Such monodromy-preserving deformations
were first considered by Schlesinger in 1912 (see
Palmer and Tracy (1981) and references therein)
and he discovered that the coefficients $A_j$ must
satisfy nonlinear differential equations that, for
$a_0 = \infty$, can be written as

$$dA_j = -\sum_{k \neq j} [A_j, A_k]\, \frac{da_j - da_k}{a_j - a_k}$$








SMJ introduced $\tau$-functions associated with these
deformations and they gave these $\tau$-functions a
quantum field theory interpretation as $n$-point
functions. Eventually this theory was extended to
include the Birkhoff generalization of the Riemann-Hilbert
problem, a generalization which incorporates
the additional information needed to fix local
holomorphic equivalence at higher-order poles (formal
asymptotics and Stokes' multipliers) (Jimbo and
Miwa 1981, Sato et al. 1978). Roughly speaking, the
problem is to reconstruct a global connection with
specified singularities from its local holomorphic
equivalence data and its global monodromy representation.
Thinking of the differential equation [6] as a
holomorphic connection proved very helpful in a
geometric reworking of the SMJ analysis given
by Malgrange (1983a, 1983b) who showed that the
zeros of the $\tau$-function occurred at points where a
suitably defined Riemann-Hilbert problem fails to
have a solution (see also Palmer (1999) and references
within). The mathematical significance of massless
holonomic quantum fields as (quantized) singular
elements of a gauge group is apparent from the SMJ
work and later work of Miwa but the possibility of
interesting physics in these models does not seem to
have been much investigated at this time. These
quantum fields are also conformal fields; however, a
comprehensive integration into the highly developed
formalism of conformal field theory on compact
Riemann surfaces has not currently been developed
(an analog of [5] should survive on compact Riemann
surfaces but the deformation analysis of the correlations
is likely limited to symmetric spaces).


Further Developments 


This work on massless holonomic fields and the 
connection with the Riemann—Hilbert problem is 
doubtless the aspect of holonomic fields with the 
most “spin offs” in the mathematics and physics 
literature. These include an analysis of the delta- 
function gas done by Jimbo, Miwa, Mori, and Sato 
in 1981, random matrix models first looked at by 
Jimbo, Miwa, Mori, and Sato and later system- 
atically investigated by Tracy and Widom (1994), 
the deformations of line bundles on Riemann 
surfaces that led to KdV in the work of Segal and 
Wilson (1985), which emerged from work of Sato, 
Miwa, Jimbo and collaborators, the analysis of 
Painlevé equations starting with work of McCoy, 
Tracy and Wu (see Palmer and Tracy (1981) and 
references within) and more systematically devel- 
oped by Its and Novokshenov (1986), and the 
revival of interest in monodromy-preserving defor- 
mations (Harnad and Its 2002). 

Holonomic fields are related to free fields in a 
well-understood way and it is natural to study them 
in situations where free fields make sense. In 
particular, they are an interesting testbed for the 
nonperturbative investigation of the influence of 
geometry (or curvature) on quantum fields. In Palmer
et al. (1994), the deformation analysis of $\tau$-functions
for holonomic fields is carried out for the Poincaré
disk. The two-point functions are shown to be
expressible in terms of solutions to the family of
Painlevé VI equations. A quantum field theory
interpretation of these $\tau$-functions is given by
Doyon and there are natural analogs of the scaling
limit of the Ising model on the Poincaré disk as
well. The role of “spacetime” symmetries in the 
deformation theory suggests that such analysis will 
be limited to symmetric spaces. In addition to the 
plane and the Poincaré disk, the cylinder, the 
sphere, and the torus round out the possibilities in 
two dimensions. Lisovyy has recently worked out 
the analysis for the cylinder, which is important for 
the study of thermodynamic correlations. It should 
be possible to recast the analysis of the continuum 
Ising model on the torus (Zuber and Itzykson 1977) 
in deformation theoretic terms. It does not appear 
that the holonomic fields associated with the Dirac 
operator for the constant curvature metric on the 
2-sphere have been studied yet. 


See also: Deformation Theory; Integrable Systems: 
Overview; Isomonodromic Deformations; 
Riemann—Hilbert Problem; Two-Dimensional Ising 
Model. 
Introduction


In this article we consider the following question: 
which homeomorphisms of the circle transport one 
given class of continuous functions into another? 
The allowed classes of functions are Banach spaces 
contained in C(T), the space of continuous functions 
on the unit circle T, and will be defined by the 
properties of the Fourier series of the functions. 
Next, we will develop the Poincaré-Denjoy theory,
which describes some basic geometric
properties of diffeomorphisms of the circle, such
as the existence and properties of the rotation number,
the classification of possible orbits of diffeomorphisms,
and the Denjoy counterexample.

A homeomorphism of the circle is regarded here as 
a change of variables for periodic functions. So, it will 
be our major concern to describe the changes of 
variables that do not affect “too much” the behavior 
of the Fourier series of the functions in the given class. 

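We say that a function $h: \mathbb{R} \to \mathbb{R}$ is a homeomorphism
of the circle $\mathbb{T} = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$
if $h$ itself is a homeomorphism such that $h(t + 2\pi) =
h(t) + 2\pi$ for all $t \in \mathbb{R}$. It is clear that such a function $h$
induces a unique homeomorphism $\tilde h: \mathbb{T} \to \mathbb{T}$ that
makes the following diagram commutative:

$$\begin{array}{ccc}
\mathbb{T} & \xrightarrow{\ \tilde h\ } & \mathbb{T} \\
{\scriptstyle e^{it}}\big\uparrow & & \big\uparrow{\scriptstyle e^{it}} \\
\mathbb{R} & \xrightarrow{\ h\ } & \mathbb{R}
\end{array}
\qquad \text{i.e., } \tilde h(e^{it}) = e^{ih(t)}$$

In the same way, we identify functions $\tilde\varphi: \mathbb{T} \to \mathbb{C}$
with $2\pi$-periodic functions $\varphi: \mathbb{R} \to \mathbb{C}$.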

Let $U(\mathbb{T})$ be the space of all continuous functions
on $\mathbb{T}$ that have uniformly convergent Fourier series,
and let $A(\mathbb{T})$ be the space of all continuous functions
on $\mathbb{T}$ with absolutely convergent Fourier series.

In 1953, Beurling and Helson proved an important
result about the homeomorphisms that preserve the
space $A(\mathbb{T})$: they are rotations and symmetries, that
is, if $f \circ h \in A(\mathbb{T})$ for all $f \in A(\mathbb{T})$, then the
homeomorphism $h$ must have the form $h(t) = t + a$ or
$h(t) = -t + a$. It is quite obvious that rotations and
symmetries preserve $A(\mathbb{T})$, since the Fourier coefficients
of $f \circ h$ and $f$ have the same modulus (up to the
index change $n \mapsto -n$ in the case of a symmetry), but to
prove the converse is very hard. So, homeomorphisms
that preserve $A(\mathbb{T})$ form a very restricted class.
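The easy direction can be illustrated numerically. The following is a minimal sketch using a hypothetical trigonometric polynomial `f` and a hypothetical shift `a = 0.7`; the helper `fourier_coefficient` approximates $(1/2\pi)\int f(t)e^{-int}\,dt$ by an equally spaced Riemann sum, which is exact (up to rounding) for trigonometric polynomials of low degree.

```python
import cmath
import math

def fourier_coefficient(func, n, samples=256):
    """Approximate (1/2pi) * integral of func(t) * exp(-i n t) over one period."""
    total = 0j
    for j in range(samples):
        t = 2 * math.pi * j / samples
        total += func(t) * cmath.exp(-1j * n * t)
    return total / samples

def f(t):
    # Hypothetical test function: a trigonometric polynomial.
    return 1.0 + 2.0 * cmath.exp(1j * t) - 0.5j * cmath.exp(-3j * t)

a = 0.7  # hypothetical shift

def rotated(t):
    return f(t + a)       # f composed with the rotation h(t) = t + a

def reflected(t):
    return f(-t + a)      # f composed with the symmetry h(t) = -t + a

for n in range(-4, 5):
    base = abs(fourier_coefficient(f, n))
    # A rotation multiplies the n-th coefficient by a unimodular factor.
    assert abs(abs(fourier_coefficient(rotated, n)) - base) < 1e-9
    # A symmetry does the same after swapping the index n with -n.
    assert abs(abs(fourier_coefficient(reflected, -n)) - base) < 1e-9
```

In both cases the sum of the moduli of the Fourier coefficients is unchanged, which is why rotations and symmetries preserve $A(\mathbb{T})$.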


A wider class is obtained when we transport $A(\mathbb{T})$
into $U(\mathbb{T})$, that is, $f \circ h \in U(\mathbb{T})$ for all $f \in A(\mathbb{T})$. The
major object of this article is to study such changes
of variables.

We say that a homeomorphism of the circle $h$ is of
finite type if there is an integer $\nu$, satisfying $3 \le \nu < \infty$,
such that $h$ is of class $C^\nu$ and

$$|h^{(2)}(t)| + |h^{(3)}(t)| + \cdots + |h^{(\nu)}(t)| \neq 0, \quad \text{for all } t \in \mathbb{R}$$

In the realm of Fourier analysis, the most
important and general result about homeomorphisms
of the circle is due to R Kaufman, who showed
in 1974 that a finite-type homeomorphism $h$
transports $A(\mathbb{T})$ into $U(\mathbb{T})$. We shall analyze this
seminal result in detail.


Homeomorphism of the Circle 
of Finite Type 


In this section we prove the theorem of R Kaufman
mentioned before, which means that it is sufficient
for a homeomorphism of the circle $h$ to have a
certain amount of curvature in order to transport
$A(\mathbb{T})$ into $U(\mathbb{T})$. We present a simple proof of this
fact, based on a result due to Stein and Wainger.

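If $f: \mathbb{T} \to \mathbb{C}$ is a continuous function and if

$$\hat f_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\,e^{-int}\,dt, \quad n \in \mathbb{Z},$$

denote the Fourier coefficients of $f$, then $f \in A(\mathbb{T})$ if
and only if

$$\sum_{n \in \mathbb{Z}} |\hat f_n| = \lim_{N \to \infty} \sum_{n=-N}^{N} |\hat f_n| < \infty$$

Of course, $A(\mathbb{T})$ is a Banach space with the norm

$$\|f\|_{A(\mathbb{T})} = \sum_{n \in \mathbb{Z}} |\hat f_n|$$

The space $U(\mathbb{T})$ is defined as the space of all
continuous functions $f: \mathbb{T} \to \mathbb{C}$ such that

$$\sum_{n=-N}^{N} \hat f_n e^{int} \to f(t), \quad \text{when } N \to \infty, \text{ for all } t \in [-\pi, \pi]$$

uniformly on $\mathbb{T}$, that is, $U(\mathbb{T})$ is the space of
continuous functions from $\mathbb{T}$ to $\mathbb{C}$ that are the
uniform limit of their Fourier partial sums

$$S_N(f, t) = \sum_{n=-N}^{N} \hat f_n e^{int}$$

Hence, under the natural norm given by

$$\|f\|_{U(\mathbb{T})} = \sup\{|S_N(f, t)| : N \in \mathbb{N} = \{0, 1, \ldots\} \text{ and } t \in [-\pi, \pi]\}$$

the space $U(\mathbb{T})$ is a Banach space.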
We shall prove: 


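Theorem 1 (Kaufman 1974). Let $h$ be a homeomorphism
of the circle of class $C^\nu$, with $\nu \ge 3$.
Suppose that

$$|h^{(2)}(t)| + |h^{(3)}(t)| + \cdots + |h^{(\nu)}(t)| \neq 0, \quad \text{for all } t \in \mathbb{R}$$

Then, $h$ transports $A(\mathbb{T})$ into $U(\mathbb{T})$, that is, $f \circ h \in
U(\mathbb{T})$ whenever $f \in A(\mathbb{T})$.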


It follows from the theorem that an analytic
homeomorphism of the circle transports $A(\mathbb{T})$ into
$U(\mathbb{T})$. To see this, suppose that $h$ is not of finite type.
Then, for each $n \ge 3$, there exists $t_n \in [-\pi, \pi]$ such
that $h^{(j)}(t_n) = 0$ for all $j \in \{2, \ldots, n\}$. Since $\{t_n\}$ has a
convergent subsequence, there exists $t \in [-\pi, \pi]$
such that $h^{(j)}(t) = 0$ for all $j \ge 2$. Since $h$ is analytic,
this implies that $h'$ must be a constant function and, therefore,
$h(t) = t + a$. Since we know that this kind of
homeomorphism preserves $A(\mathbb{T})$, we are done.

One can ask why we demand $\nu \ge 3$. The answer is
easy. Since $h(t + 2\pi) = h(t) + 2\pi$ for all $t \in \mathbb{R}$, it
follows that $h'(t + 2\pi) = h'(t)$ for all $t \in \mathbb{R}$, that is, $h'$
is a periodic function of period $2\pi$. So, there always
exists a point $t \in (-\pi, \pi)$ such that $h''(t) = 0$.

We can also infer from the theorem that a $C^\infty$
homeomorphism of the circle that has no flat point,
that is, no point $t$ such that $h^{(j)}(t) = 0$ for all $j \ge 2$,
transports $A(\mathbb{T})$ into $U(\mathbb{T})$. This is obvious, because
the negation of being of finite type implies the
existence of a flat point. It is not true, however, that
every $C^\infty$ homeomorphism of the circle transports
$A(\mathbb{T})$ into $U(\mathbb{T})$.

The proof of the theorem is based on the two
lemmas that follow. The first lemma was obtained
by Stein and Wainger, who proved it in a more
general setting in 1965, although that proof was
only published five years later. The second lemma
was proved by R Kaufman in 1974.


Lemma 2 (Stein and Wainger 1970). Let $p(t)$ be a
real polynomial of degree $d$. Then

$$\left|\int_{-r}^{r} e^{ip(t)}\,\frac{dt}{t}\right| \le 6(2^{d+1}) - 2d - 10 = 2d + 2\sum_{k=0}^{d}\left[3(2^k) - 2\right]$$

for all $r > 0$.
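The arithmetic behind this constant is easy to check mechanically. The sketch below reads the bound in Lemma 2 as $6(2^{d+1}) - 2d - 10$ and the Van der Corput constant used in the proof as $3(2^k) - 2$; the function names are hypothetical. It verifies that the closed form agrees with the sum $2d + 2\sum_{k=0}^{d}[3(2^k) - 2]$, and that the induction step of the proof (two Van der Corput terms, a term bounded by 2, and the bound for degree $d$) reproduces the bound for degree $d + 1$.

```python
def lemma2_bound(d):
    """Closed form of the constant in Lemma 2 for a polynomial of degree d."""
    return 6 * 2 ** (d + 1) - 2 * d - 10

def lemma2_bound_as_sum(d):
    """The same constant written as 2d + 2 * sum_{k=0}^{d} [3 * 2^k - 2]."""
    return 2 * d + 2 * sum(3 * 2 ** k - 2 for k in range(d + 1))

def van_der_corput_constant(k):
    """Constant 3 * 2^k - 2 used for the pieces I and II in the proof."""
    return 3 * 2 ** k - 2

for d in range(12):
    assert lemma2_bound(d) == lemma2_bound_as_sum(d)
    # Induction step: I + II + III with III <= 2 + (bound for degree d).
    step = 2 * van_der_corput_constant(d + 1) + 2 + lemma2_bound(d)
    assert step == lemma2_bound(d + 1)
```

Both assertions are integer identities, so the check is exact.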


Lemma 3 (Kaufman 1974). Let $f$ be a real function
of class $C^k$ on the interval $[-r, r]$, with $k \ge 2$.
Suppose that $1 \le |f^{(k)}(t)| \le b$ for all $t \in [-r, r]$.
Then

$$\left|\int_{-r}^{r} e^{if(t)}\,\frac{dt}{t}\right| \le C(k, b)$$

where $C(k, b)$ is a constant that depends only on $k$
and $b$.





We shall see that Lemma 3 can be proved from
Lemma 2 in a quite simple way. The proof given
by R Kaufman for Lemma 3 does not make use of
Lemma 2 at all. Also, it is not difficult to see that
Lemma 2 follows from Lemma 3, if we consider
$d \ge 2$. So, they are indeed equivalent results.

Before getting into the proof of these two lemmas,
let us state a result which is the primary tool in
dealing with oscillatory integrals such as those in the
lemmas.


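Lemma 4 (Van der Corput lemma). Let $f$ be real
valued and smooth in $[a, b]$, with $0 < a < b$.
Suppose that $|f^{(k)}(t)| \ge \lambda > 0$ for all $t \in [a, b]$.
Then

$$\left|\int_a^b e^{if(t)}\,\frac{dt}{t}\right| \le \frac{3(2^k) - 2}{a\,\lambda^{1/k}}$$

holds if

(i) $k \ge 2$, or
(ii) $k = 1$ and $f'(t)$ is monotonic.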








Now, let us prove the two lemmas and Theorem 1.


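Proof of Lemma 2 The proof is by induction on
the degree of the polynomial. Suppose that $p(t)$ is a
polynomial of degree 0, that is, $p(t)$ is a constant
function. In this case the result is trivial, since the
integral is equal to zero.

By induction, assume that the statement is true
for polynomials of degree less than or equal to $d$.
Let

$$p(t) = a_{d+1}t^{d+1} + a_d t^d + \cdots + a_1 t + a_0, \quad a_{d+1} \neq 0$$

Make the change of variables $t = |a_{d+1}|^{-1/(d+1)} s$.
Then we have

$$\left|\int_{-r}^{r} e^{ip(t)}\,\frac{dt}{t}\right| = \left|\int_{-\sigma}^{\sigma} e^{i(\pm t^{d+1} + q(t))}\,\frac{dt}{t}\right|$$

where $\sigma = |a_{d+1}|^{1/(d+1)} r$ and $q(t)$ is a polynomial
of degree at most equal to $d$. Suppose $\sigma > 1$.
Then

$$\begin{aligned}
\left|\int_{-\sigma}^{\sigma} e^{i(\pm t^{d+1} + q(t))}\,\frac{dt}{t}\right|
&\le \left|\int_{1}^{\sigma} e^{i(\pm t^{d+1} + q(t))}\,\frac{dt}{t}\right|
 + \left|\int_{-\sigma}^{-1} e^{i(\pm t^{d+1} + q(t))}\,\frac{dt}{t}\right|
 + \left|\int_{-1}^{1} e^{i(\pm t^{d+1} + q(t))}\,\frac{dt}{t}\right| \\
&= I + II + III
\end{aligned}$$

By Van der Corput lemma, $I \le [3(2^{d+1}) - 2]$ and
$II \le [3(2^{d+1}) - 2]$, so $I + II \le 6(2^{d+1}) - 4$. Now

$$\begin{aligned}
III &\le \left|\int_{-1}^{1} \left[e^{i(\pm t^{d+1} + q(t))} - e^{iq(t)}\right]\frac{dt}{t}\right|
 + \left|\int_{-1}^{1} e^{iq(t)}\,\frac{dt}{t}\right| \\
&\le \int_{-1}^{1} |t|^{d}\,dt + 6(2^{d+1}) - 2d - 10 \\
&\le 2 + 6(2^{d+1}) - 2d - 10
\end{aligned}$$

since the degree of $q(t)$ is at most equal to $d$. So

$$I + II + III \le 6(2^{d+1}) - 4 + 2 + 6(2^{d+1}) - 2d - 10 = 6(2^{d+2}) - 2(d+1) - 10$$

On the other hand, if $\sigma \le 1$, then

$$\begin{aligned}
\left|\int_{-\sigma}^{\sigma} e^{i(\pm t^{d+1} + q(t))}\,\frac{dt}{t}\right|
&\le \left|\int_{-\sigma}^{\sigma} \left[e^{i(\pm t^{d+1} + q(t))} - e^{iq(t)}\right]\frac{dt}{t}\right|
 + \left|\int_{-\sigma}^{\sigma} e^{iq(t)}\,\frac{dt}{t}\right| \\
&\le 2 + 6(2^{d+1}) - 2d - 10 \\
&\le 2 + 6(2^{d+1}) - 2d - 10 + 6(2^{d+1}) - 4 \\
&= 6(2^{d+2}) - 2(d+1) - 10
\end{aligned}$$

and the proof is completed. □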


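Proof of Lemma 3 Assume first that $r > 1$. Then

$$\begin{aligned}
\left|\int_{-r}^{r} e^{if(t)}\,\frac{dt}{t}\right|
&\le \left|\int_{1}^{r} e^{if(t)}\,\frac{dt}{t}\right|
 + \left|\int_{-r}^{-1} e^{if(t)}\,\frac{dt}{t}\right|
 + \left|\int_{-1}^{1} e^{if(t)}\,\frac{dt}{t}\right| \\
&= I + II + III
\end{aligned}$$

Since $|f^{(k)}(t)| \ge 1$ and $k \ge 2$, then by Van der
Corput lemma, $I \le [3(2^k) - 2]$ and $II \le [3(2^k) - 2]$.
(Note that we have to assume $k \ge 2$ in order to
apply Van der Corput lemma, since we know
nothing about the monotonicity of $f'(t)$.)

To evaluate III, we proceed as follows:

$$f(t) = f(0) + f'(0)t + \cdots + \frac{f^{(k-1)}(0)}{(k-1)!}\,t^{k-1} + \frac{f^{(k)}(\alpha_t t)}{k!}\,t^k
= p(t) + \frac{f^{(k)}(\alpha_t t)}{k!}\,t^k$$

where the number $\alpha_t$ depends on $t$ and $0 \le \alpha_t \le 1$. So

$$\begin{aligned}
\left|\int_{-1}^{1} e^{if(t)}\,\frac{dt}{t}\right|
&\le \left|\int_{-1}^{1} \left(e^{if(t)} - e^{ip(t)}\right)\frac{dt}{t}\right|
 + \left|\int_{-1}^{1} e^{ip(t)}\,\frac{dt}{t}\right| \\
&\le \frac{2b}{k!} + 6(2^k) - 2(k-1) - 10
\end{aligned}$$

by Lemma 2, since $p(t)$ is a polynomial of degree at
most equal to $k - 1$.

On the other hand, if $r \le 1$, it also follows from
Lemma 2 that

$$\begin{aligned}
\left|\int_{-r}^{r} e^{if(t)}\,\frac{dt}{t}\right|
&\le \left|\int_{-r}^{r} \left(e^{if(t)} - e^{ip(t)}\right)\frac{dt}{t}\right|
 + \left|\int_{-r}^{r} e^{ip(t)}\,\frac{dt}{t}\right| \\
&\le \frac{2b}{k!} + 6(2^k) - 2(k-1) - 10
\end{aligned}$$

Hence

$$\left|\int_{-r}^{r} e^{if(t)}\,\frac{dt}{t}\right| \le C(k, b)$$

and we are done. □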


Proof of Theorem 1 Let $h$ be a homeomorphism
of the circle satisfying the hypotheses of the
theorem.

We claim: there exists $\delta > 0$ such that, for all $x \in
[-\pi, \pi]$, there is $k$ depending on $x$, with $2 \le k \le \nu$,
such that $|h^{(k)}(t + x)| \ge \delta$ for all $t$ satisfying $|t| \le \delta$.

The proof of the claim is simple: suppose that
there is no such $\delta$. Then, for each $n \in \mathbb{N}$, there exist
$x_n \in [-\pi, \pi]$ and, for each $k$ with $2 \le k \le \nu$, a point $t_{k,n}$ such
that $|t_{k,n}| \le 1/n$ and $|h^{(k)}(t_{k,n} + x_n)| < 1/n$. Taking a
subsequence if necessary, we have $x_n \to x \in [-\pi, \pi]$.
Also, $t_{k,n} \to 0$ when $n \to \infty$ for all such $k$. So,
$h^{(k)}(t_{k,n} + x_n) \to h^{(k)}(x)$ when $n \to \infty$. Since
$|h^{(k)}(t_{k,n} + x_n)| < 1/n$, we conclude that $h^{(k)}(x) = 0$ for all $k$
with $2 \le k \le \nu$, thus reaching a contradiction.

Now, let $f \in A(\mathbb{T})$. So, $\sum_{n \in \mathbb{Z}} |\hat f_n| < \infty$, thus
implying that

$$f(t) = \sum_{n=-\infty}^{\infty} \hat f_n e^{int} = \lim_{N \to \infty} \sum_{n=-N}^{N} \hat f_n e^{int}$$

Hence

$$f(h(t)) = \sum_{n=-\infty}^{\infty} \hat f_n e^{inh(t)} = \lim_{N \to \infty} \sum_{n=-N}^{N} \hat f_n e^{inh(t)}$$

Put $g_N(t) = \sum_{n=-N}^{N} \hat f_n e^{inh(t)}$. Since $g_N$ is smooth,
we have $g_N \in U(\mathbb{T})$ for all $N \in \mathbb{N}$. If $g(t)$ stands for
$f(h(t))$, then $g_N \to g$ uniformly, since $f \in A(\mathbb{T})$. Thus,
it suffices to prove that $g \in U(\mathbb{T})$. This happens if
and only if $S_m(g, x) = \sum_{n=-m}^{m} \hat g_n e^{inx}$ converges
uniformly to $g$ as $m \to \infty$, that is, given $\varepsilon > 0$, there
exists $m_0 \in \mathbb{N}$ such that $|S_m(g, x) - g(x)| < \varepsilon$ for all
$m \ge m_0$ and $x \in [-\pi, \pi]$.
We have

$$|S_m(g, x) - g(x)| \le |g_N(x) - g(x)| + |S_m(g_N, x) - g_N(x)| + |S_m(g_N, x) - S_m(g, x)|$$

for all $m, N \in \mathbb{N}$. Since $g_N \to g$ uniformly and $g_N \in
U(\mathbb{T})$, the last inequality shows that we need to
demonstrate that, for each $\varepsilon > 0$, there exists
$N_0 \in \mathbb{N}$ such that

$$|S_m(g_N, x) - S_m(g, x)| < \varepsilon, \quad \forall\, N \ge N_0,\ x \in [-\pi, \pi] \text{ and } m \in \mathbb{N}$$

thus proving that $S_m(g_N, x) \to S_m(g, x)$ uniformly in $x$
and $m$ when $N \to \infty$.
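But, if $K > N \in \mathbb{N}$, we have

$$\begin{aligned}
|S_m(g_N, x) - S_m(g_K, x)|
&= \frac{1}{2\pi}\left|\int_{-\pi}^{\pi} \bigl(g_N(t + x) - g_K(t + x)\bigr) D_m(t)\,dt\right| \\
&= \frac{1}{2\pi}\left|\int_{-\pi}^{\pi} \Bigl(\sum_{K \ge |n| > N} \hat f_n e^{inh(t+x)}\Bigr) D_m(t)\,dt\right| \\
&\le \frac{1}{2\pi}\sum_{K \ge |n| > N} |\hat f_n| \left|\int_{-\pi}^{\pi} e^{inh(t+x)} D_m(t)\,dt\right|
\end{aligned}$$

where

$$D_m(t) = \sum_{k=-m}^{m} e^{ikt} = \frac{\sin\left((m + \frac12)t\right)}{\sin(t/2)}$$

is the Dirichlet kernel.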
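The closed form of the Dirichlet kernel is a standard identity; the following minimal sketch compares the exponential sum $\sum_{k=-m}^{m} e^{ikt}$ with $\sin((m + \frac12)t)/\sin(t/2)$ at a few sample points (the function names and sample values are hypothetical choices).

```python
import cmath
import math

def dirichlet_sum(m, t):
    """Dirichlet kernel as the exponential sum over k = -m, ..., m."""
    return sum(cmath.exp(1j * k * t) for k in range(-m, m + 1)).real

def dirichlet_closed_form(m, t):
    """Dirichlet kernel as sin((m + 1/2) t) / sin(t / 2), valid when sin(t/2) != 0."""
    return math.sin((m + 0.5) * t) / math.sin(t / 2)

for m in (0, 1, 5, 20):
    for t in (0.1, 1.0, 2.5, -3.0):
        assert abs(dirichlet_sum(m, t) - dirichlet_closed_form(m, t)) < 1e-9
```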
Hence, we are done if we show that

$$\left|\int_{-\pi}^{\pi} e^{inh(t+x)} D_m(t)\,dt\right| \le C \qquad [1]$$

where $C$ is a constant that does not depend on $m$, $n$,
and $x$.

To prove that the oscillatory integral above is
bounded, we make use of Lemma 3. We have that

$$D_m(t) = \frac{2\sin(mt)}{t} + O(1)$$

on any compact subset of $(-2\pi, 2\pi)$, that is,

$$\left|\frac{\sin\left((m + \frac12)t\right)}{\sin(t/2)} - \frac{2\sin(mt)}{t}\right|
\le \left|\frac{t\cos(t/2) - 2\sin(t/2)}{t\sin(t/2)}\right| + 1 \le C^*$$

where the constant $C^*$ does not depend on $m$, on
any compact subset of $(-2\pi, 2\pi)$.
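The uniform bound on the remainder can be probed numerically. The sketch below is an illustration, not part of the proof: it evaluates the difference between the Dirichlet kernel and $2\sin(mt)/t$ on a grid in $[-\pi, \pi]$ that avoids $t = 0$, for several values of $m$. On this interval $|\cot(t/2) - 2/t| \le 2/\pi$, so the difference should never exceed $1 + 2/\pi$; the grid size and the values of $m$ are arbitrary choices.

```python
import math

def dirichlet_kernel(m, t):
    """Closed form sin((m + 1/2) t) / sin(t / 2) of the Dirichlet kernel."""
    return math.sin((m + 0.5) * t) / math.sin(t / 2)

def remainder(m, t):
    """Difference between the Dirichlet kernel and 2 sin(mt) / t."""
    return dirichlet_kernel(m, t) - 2 * math.sin(m * t) / t

# Midpoint grid on [-pi, pi]; with an even number of cells no grid point is 0.
grid = [-math.pi + 2 * math.pi * (j + 0.5) / 2000 for j in range(2000)]
worst = max(abs(remainder(m, t)) for m in (1, 10, 100, 1000) for t in grid)
print(worst)
```

The printed maximum stays below $1 + 2/\pi$ regardless of how large $m$ is taken, in line with the $O(1)$ claim.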

In order to prove [1], consider $x \in [-\pi, \pi]$. We
have already proved that there exists $k$ (depending
on $x$), with $2 \le k \le \nu$, such that $|h^{(k)}(t + x)| \ge \delta > 0$
for all $t$ such that $|t| \le \delta$. Therefore,

$$\left|\int_{-\pi}^{\pi} e^{inh(t+x)}\,\frac{\sin(mt)}{t}\,dt\right|
\le \left|\int_{-\delta}^{\delta} e^{inh(t+x)}\,\frac{\sin(mt)}{t}\,dt\right| + 2\log\left(\frac{\pi}{\delta}\right)$$

We can assume that $n$ is a positive integer: if $n$ is
negative, we take the complex conjugate; and if $n = 0$,
the integral is trivially bounded, as we see by
integration by parts or by Van der Corput lemma.
(Indeed, we do not need to worry about $n = 0$, since
it is necessary to bound the integral only for large $n$.)

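So, assuming that $n$ is a positive integer, we
change variables: define $t = rs$, where $r = n^{-1/k}\delta^{-1/k}$.
Since $\sin(mt) = (e^{imt} - e^{-imt})/(2i)$, we have

$$\begin{aligned}
\left|\int_{-\delta}^{\delta} e^{inh(t+x)}\,\frac{\sin(mt)}{t}\,dt\right|
&\le \left|\int_{-\delta}^{\delta} e^{i[nh(t+x) + mt]}\,\frac{dt}{t}\right|
 + \left|\int_{-\delta}^{\delta} e^{i[nh(t+x) - mt]}\,\frac{dt}{t}\right| \\
&= \left|\int_{-\delta/r}^{\delta/r} e^{i[nh(rs+x) + mrs]}\,\frac{ds}{s}\right|
 + \left|\int_{-\delta/r}^{\delta/r} e^{i[nh(rs+x) - mrs]}\,\frac{ds}{s}\right|
\end{aligned}$$

Put $\phi(t) = nh(rt + x) + mrt$ and $\psi(t) = nh(rt + x) - mrt$.
We have $\phi^{(k)}(t) = nr^k h^{(k)}(rt + x)$ and $\psi^{(k)}(t) =
nr^k h^{(k)}(rt + x)$. But, since $nr^k = 1/\delta$, we conclude that

$$|\phi^{(k)}(t)| = |\psi^{(k)}(t)| = \frac{1}{\delta}\,|h^{(k)}(rt + x)| \ge 1, \quad \forall\, t \in \left[-\frac{\delta}{r}, \frac{\delta}{r}\right]$$

Also,

$$|\phi^{(k)}(t)| = |\psi^{(k)}(t)| \le b_k := \frac{1}{\delta}\max\{|h^{(k)}(s)| : -2\pi \le s \le 2\pi\}$$

for all $t \in [-\delta/r, \delta/r]$. Therefore, by Lemma 3, we get

$$\left|\int_{-\delta/r}^{\delta/r} e^{i\phi(t)}\,\frac{dt}{t}\right| \le C(k, b_k) \le \max\{C(j, b_j) : 2 \le j \le \nu\}$$

and

$$\left|\int_{-\delta/r}^{\delta/r} e^{i\psi(t)}\,\frac{dt}{t}\right| \le C(k, b_k) \le \max\{C(j, b_j) : 2 \le j \le \nu\}$$

This concludes the proof. □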


Diffeomorphisms of the Circle 


In this section we study the circle diffeomorph- 
isms. This theory goes back to Poincaré (1885), 
who studied circle diffeomorphisms to decide 
when differential equations on the torus have 
periodic orbits of a specified type. For this he 
introduced the rotation number as an important 
dynamical invariant, which later turned out to be 
very fruitful in the theory of dynamical systems, 
and proved that a diffeomorphism with an 
irrational rotation number is combinatorially 
equivalent to a rotation with the same rotation 
number. 

Denjoy (1932) constructed examples of diffeomorphisms
of class $C^1$ with irrational rotation
number having wandering intervals, in opposition
to early ideas of Poincaré. It was necessary to
assume that a diffeomorphism without periodic
points is smoother, in fact $C^2$, to prove that it
is topologically conjugate to the rotation.


The Poincaré Rotation Number 


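Let $h: \mathbb{T} \to \mathbb{T}$ be an orientation-preserving homeomorphism.
Given such a map, there is a (nonunique)
map $\tilde h: \mathbb{R} \to \mathbb{R}$, which is called a lift of $h$, such
that $h \circ p = p \circ \tilde h$, where $p: \mathbb{R} \to \mathbb{T}$ is the covering map
$p(t) = e^{2\pi it}$.

A lift $\tilde h$ of $h$ satisfies:

1. $\tilde h$ is monotonically increasing, that is, $\tilde h(t_1) <
\tilde h(t_2)$ if $t_1 < t_2$.

2. $\tilde h(t + 1) = \tilde h(t) + 1$ for all $t \in \mathbb{R}$, so $(\tilde h - \mathrm{id})$ has
period 1.

3. If $\tilde h_1$, $\tilde h_2$ are two lifts of $h$, then there is an integer
$k$ such that $\tilde h_2(t) = \tilde h_1(t) + k$ for all $t \in \mathbb{R}$.

These conditions immediately yield the following:
the transformation $\tilde h^k := \tilde h \circ \cdots \circ \tilde h$ is monotonically
increasing and $\tilde h^k(t + r) = \tilde h^k(t) + r$, $t \in \mathbb{R}$, $k \in \mathbb{N}$,
$r \in \mathbb{Z}$.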

The rotation number gives an asymptotic indication
(i.e., in the limit) of the average amount of
rotation of a point along an orbit. We start by
defining, for a lift $\tilde h$ of $h$, the number

$$\rho_0(\tilde h, t) = \lim_{n \to \infty} \frac{\tilde h^n(t) - t}{n}$$

This limit exists and does not depend on the
choice of the point $t \in \mathbb{R}$; so, we denote it by
$\rho_0(\tilde h)$. If $\tilde h_1$, $\tilde h_2$ are two lifts of $h$, then $\rho_0(\tilde h_1, t) -
\rho_0(\tilde h_2, t)$ is an integer, so

$$\rho(h) := \rho_0(\tilde h, t) \bmod 1$$

is well defined. The number $\rho(h) \in [0, 1)$ is called the
rotation number of $h$, and depends continuously on
$h$. For a detailed proof, see Katok and Hasselblatt
(1995) or Robinson (1999).
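The defining limit suggests a direct numerical estimate: iterate a lift and divide the total displacement by the number of iterations. The following is a minimal sketch; the function names, the number of iterations, and the sine-perturbed family `arnold_lift` are hypothetical choices made for illustration (the latter is a lift of a circle homeomorphism when $|\varepsilon| < 1$), and the circle is parametrized with period 1 as in the covering map $p(t) = e^{2\pi it}$.

```python
import math

def rotation_number_estimate(lift, t0=0.0, iterations=10000):
    """Approximate lim (lift^n(t0) - t0) / n by a finite number of iterations."""
    t = t0
    for _ in range(iterations):
        t = lift(t)
    return (t - t0) / iterations

def rigid_lift(lam):
    """Lift t -> t + lam of the rigid rotation by angle lam."""
    return lambda t: t + lam

def arnold_lift(omega, eps):
    """Hypothetical example: lift t -> t + omega + (eps / 2pi) sin(2 pi t)."""
    return lambda t: t + omega + (eps / (2 * math.pi)) * math.sin(2 * math.pi * t)

rigid = rotation_number_estimate(rigid_lift(0.25))
from_zero = rotation_number_estimate(arnold_lift(0.3, 0.5), t0=0.0)
from_other = rotation_number_estimate(arnold_lift(0.3, 0.5), t0=0.4)
print(rigid, from_zero, from_other)
```

For a rigid rotation the estimate returns the angle itself. For the perturbed map, the estimates from two different starting points differ by at most $1/n$ after $n$ iterations, which reflects the independence of the limit from the point $t$.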


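Theorem 5 The rotation number $\rho(h)$ is rational if
and only if $h$ has a periodic point, that is, there exist
$z_0 \in \mathbb{T}$ and $k \in \mathbb{N}$ such that $h^k(z_0) = z_0$.

Proof Take a lift $\tilde h$ of $h$ such that $\tilde h(0) \in [0, 1)$.
Suppose that $\rho(h) = q/m$.

Suppose that $h^m$ has no fixed point. Then $\tilde h^m(t) - t \in \mathbb{R} \setminus \mathbb{Z}$ for
all $t \in \mathbb{R}$, since $\tilde h^m(t) - t \in \mathbb{Z}$ implies that $p(t)$ is a
fixed point for $h^m$. In particular, $\tilde h^m(t) - t \neq q$ for all
$t \in \mathbb{R}$. Since $\tilde h^m - \mathrm{id}$ is continuous and periodic, there
exists a real number $a > 0$ such that $\tilde h^m(t) - t < q - a$ for
all $t \in \mathbb{R}$ (the case $\tilde h^m(t) - t > q + a$ is treated in the same way). Then

$$\tilde h^{mk}(t) - \tilde h^{m(k-1)}(t) = \tilde h^m[\tilde h^{m(k-1)}(t)] - [\tilde h^{m(k-1)}(t)] < q - a, \quad \forall\, k \in \mathbb{N}$$

and

$$\begin{aligned}
\tilde h^{mk}(t) - t
&= \{\tilde h^m[\tilde h^{m(k-1)}(t)] - [\tilde h^{m(k-1)}(t)]\}
 + \{\tilde h^m[\tilde h^{m(k-2)}(t)] - [\tilde h^{m(k-2)}(t)]\}
 + \cdots + \{\tilde h^m(t) - t\} \\
&< k(q - a)
\end{aligned}$$

So

$$\rho_0(\tilde h) = \lim_{k \to \infty} \frac{\tilde h^{mk}(t) - t}{mk} \le \lim_{k \to \infty} \frac{k(q - a)}{mk} = \frac{q - a}{m} < \frac{q}{m}$$

proving the claim by contraposition.

To see the converse, assume that there exists a
periodic point $t_0 \in \mathbb{R}$, that is, there are $m, q \in \mathbb{Z}$
such that $\tilde h^m(t_0) = t_0 + q$; then

$$\tilde h^{mk}(t_0) = t_0 + kq$$

and

$$\rho_0(\tilde h) = \lim_{k \to \infty} \frac{\tilde h^{mk}(t_0) - t_0}{mk} = \lim_{k \to \infty} \frac{kq}{mk} = \frac{q}{m}$$

□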


Corollary 6 A homeomorphism h:T—T does not 
have periodic points if and only if the rotation 
number p(h) is irrational. 


Let $R_\lambda$ be defined on T by $R_\lambda(e^{2\pi i t}) = e^{2\pi i (t + \lambda)}$.
This map is called a rigid rotation of angle $\lambda$ and it
is easy to see that $\tilde h_\lambda(t) = t + \lambda$ is a lift of $R_\lambda$ and that
$\rho(R_\lambda) = \rho_0(\tilde h_\lambda) = \lambda \bmod 1$.

In this example we can see the connection
between the rationality of the rotation number and
the existence of a periodic orbit. Assume $\lambda = m/q$ is
rational. Then $\tilde h_\lambda^q(t) = t + q\lambda = t + m$. Therefore,
every point is periodic with period q. Now, assume
that $\lambda$ is irrational. Since $\tilde h_\lambda^n(t) = t + n\lambda$ for all n,
the map $R_\lambda$ has no periodic points. In this case, one can show
that every point in T has a dense orbit.
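
As a numerical illustration of the rigid rotation example, the following Python sketch estimates the rotation number of a lift by the finite quotient (lift^n(t) - t)/n and checks, in exact rational arithmetic, that a rational angle m/q gives lift^q(t) = t + m. The function names `rigid_lift` and `rotation_number_estimate`, the sample angles, and the iteration counts are assumptions made for this sketch and do not come from the text.

```python
from fractions import Fraction


def rigid_lift(lam):
    """Return the lift t -> t + lam of the rigid rotation of angle lam."""
    return lambda t: t + lam


def rotation_number_estimate(lift, t=0, n=10_000):
    """Approximate the rotation number by the quotient (lift^n(t) - t) / n."""
    s = t
    for _ in range(n):
        s = lift(s)
    return (s - t) / n


# Rational angle m/q: exact arithmetic shows lift^q(t) = t + m.
m, q = 2, 5
lift = rigid_lift(Fraction(m, q))
t = Fraction(1, 7)
s = t
for _ in range(q):
    s = lift(s)
period_shift = s - t  # the integer m

# For a rigid rotation the quotient equals the angle for every n.
rho_rational = rotation_number_estimate(lift, t=t, n=1000)
rho_float = rotation_number_estimate(rigid_lift(0.3), t=0.0, n=10_000)
```

For a general lift the quotient only converges as n grows, so in practice one would compare the estimate for several values of n.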

Now, again let $h : T \to T$ be any orientation-preserving
homeomorphism.


Lemma 7 If the rotation number of h is rational, 
then all periodic orbits have the same period. 




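Proof If $\rho(h) = m/q$ with $m, q \in \mathbb{Z}$ relatively prime,
then we need to show that for any periodic point
$z_0 = p(t)$ (where $p(t) = e^{2\pi i t}$ is the covering
projection of T) there is a lift $\tilde h$ of h such that $\tilde h(0) \in
[0, 1)$ for which $\tilde h^q(t) = t + m$. If $z_0$ is a periodic point,
then $\tilde h^r(t) = t + s$ for some $r, s \in \mathbb{Z}$ and


$$\frac{m}{q} = \rho(h) = \lim_{n \to \infty} \frac{\tilde h^{nr}(t) - t}{nr} = \lim_{n \to \infty} \frac{ns}{nr} = \frac{s}{r},$$


so that $s = km$ and $r = kq$ for some integer k. If we had
$\tilde h^q(t) > t + m$, then the monotonicity of $\tilde h$ would give
$\tilde h^{kq}(t) > t + km$, contradicting $\tilde h^r(t) = t + s$; the
opposite inequality is excluded in the same way. Hence
$\tilde h^q(t) = t + m$ as claimed. $\square$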


The Poincaré Denjoy Theory 


A homeomorphism of the circle with rational
rotation number has all its orbits asymptotic to
periodic ones and this, together with Theorem 5,
yields a complete classification of the possible
asymptotic behavior when the rotation number is
rational. This motivates the study of the asymptotic
behavior of orbits of homeomorphisms with irrational
rotation number.

The $\omega$-limit set of a point $z_0 \in T$ with respect to h
is the set $\omega(z_0) = \{z \in T;\ h^{n_k}(z_0) \to z$ as $n_k \to \infty$, for
some sequence $\{n_k\}\}$. The $\alpha$-limit set $\alpha(z_0)$ of an
arbitrary point $z_0 \in T$ is defined similarly (with
$n_k \to -\infty$ instead of $n_k \to +\infty$).


Any orbit of a rotation $R_\lambda$ with irrational $\lambda$ is
dense in T, that is, $\omega(z_0) = \alpha(z_0) = T$ for all $z_0 \in T$.
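
The density of the orbits of an irrational rotation can be observed numerically. The sketch below measures the largest arc of the circle that contains no point among the first N points of the orbit of 0; the helper name `max_gap`, the golden-mean angle, and the values of N are assumptions chosen for illustration, and a floating-point angle is of course only an approximation of an irrational number.

```python
import math


def max_gap(lam, n_points):
    """Largest arc left uncovered by the first n_points orbit points of 0."""
    pts = sorted((k * lam) % 1.0 for k in range(n_points))
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(1.0 - pts[-1] + pts[0])  # the arc that wraps around 0
    return max(gaps)


lam = (math.sqrt(5) - 1) / 2  # assumed sample angle
gap_10 = max_gap(lam, 10)
gap_1000 = max_gap(lam, 1000)
```

The largest gap shrinks as more orbit points are added, which is what density of the orbit predicts.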


Theorem 8 (Poincaré 1885). Let $h : T \to T$ be an
orientation-preserving homeomorphism with irrational
rotation number. Then the $\omega$-limit set $\omega(x)$ is
independent of x and is either T or perfect and
nowhere dense.


The preceding theorem says that maps with
irrational rotation number have either all orbits
dense or all orbits asymptotic to a Cantor set.

We say that two maps $f, g : T \to T$ are topologically
conjugate if there exists a homeomorphism
$h : T \to T$ such that $h \circ f = g \circ h$. This implies that
$h \circ f^n = g^n \circ h$ for every integer n. Hence, the
conjugacy h maps orbits of f into orbits of g. If a
monotone map $l : T \to T$ satisfies $l \circ f = g \circ l$ but is
not necessarily a homeomorphism, we only have
that the inverse image of each point is either a point or a
closed interval. We say that l is a semiconjugacy
between f and g; in this case l maps orbits or packs of
orbits of f into orbits of g.


Theorem 9 (Denjoy 1932). Let $f : T \to T$ be an
orientation-preserving diffeomorphism of class $C^2$,
with irrational rotation number ($\rho(f) = \lambda$). Then f is
topologically conjugate to the rigid rotation $R_\lambda$.


Note that in spite of the hypothesis of f being $C^2$,
we obtain only a continuous conjugacy. It took
almost 50 years until Michael Herman (1979) was
able to solve the more difficult problem of obtaining
a smooth conjugacy for rotation numbers satisfying
extra arithmetic conditions.

If f is a circle homeomorphism which does not
have periodic points, then there exists a semiconjugacy
h between f and a rotation $R_\lambda$. If h is not a
conjugacy, then there exists a point x of the circle
whose inverse image by h is an interval J. Since
$h \circ f = R_\lambda \circ h$, we have that $h(f^n(J)) = R_\lambda^n(x)$. It
follows that the intervals of the family
$\{J, f(J), f^2(J), \ldots\}$ are pairwise disjoint, and the
$\omega$-limit set of J does not reduce to a periodic orbit.
We say that J is a wandering interval of the map f.
Thus, $C^2$-differentiability implies that f does not
have a wandering interval. For details of the proof
of Theorem 9, see Melo and Strien (1993).


The Denjoy Example 


Denjoy also proved the following result, which
shows that the hypothesis of class $C^2$ is essential.


Theorem 10 (Denjoy 1932). For any irrational
number $\lambda \in [0, 1)$, there exists a $C^1$ circle
diffeomorphism f which has a wandering interval, and
rotation number equal to $\lambda$.

Proof The construction of a diffeomorphism with a
wandering interval will be done in the following
manner. Given an irrational rotation $R_\lambda(e^{2\pi i t}) =
e^{2\pi i (t + \lambda)}$, cut the circle T at all the points of an orbit
$\{z_n = R_\lambda^n(z_0);\ n \in \mathbb{Z}\}$ of $R_\lambda$. In each cut insert a
segment $J_n$ of length $l_n$, where $\sum_{n \in \mathbb{Z}} l_n = 1$. We
obtain in this manner a new circle longer than the
first. The open intervals correspond to the gaps of
the Cantor set.

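In order to construct f formally, let $l_n$ be a
sequence of positive real numbers with $n \in \mathbb{Z}$
satisfying


(i) $\sum_{n \in \mathbb{Z}} l_n = 1$
(ii) $l_{n+1}/l_n \to 1$ as $n \to \pm\infty$
(iii) $l_n > l_{n+1}$ for $n \ge 0$
(iv) $l_n < l_{n+1}$ for $n < 0$
(v) $3 l_{n+1} - l_n > 0$ for $n \ge 0$


For example,


$$l_n = T (|n| + 2)^{-1} (|n| + 3)^{-1},$$


where


$$T^{-1} = \sum_{n \in \mathbb{Z}} (|n| + 2)^{-1} (|n| + 3)^{-1}.$$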
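
The example sequence can be checked with exact rational arithmetic. The sketch below assumes the reading l_n = T / ((|n| + 2)(|n| + 3)) of the example, computes a symmetric partial sum of the unnormalised weights, and tests the properties that the later derivative estimates rely on: lengths decreasing for n >= 0, increasing for n < 0, and 3 l_{n+1} - l_n > 0 for n >= 0. The names `weight` and `partial_sum` and the cutoff N are assumptions of this sketch.

```python
from fractions import Fraction


def weight(n):
    """Unnormalised length 1 / ((|n| + 2)(|n| + 3))."""
    return Fraction(1, (abs(n) + 2) * (abs(n) + 3))


def partial_sum(N):
    """Sum of weight(n) over -N <= n <= N."""
    return sum(weight(n) for n in range(-N, N + 1))


N = 200
S_N = partial_sum(N)
# Telescoping gives S_N = 5/6 - 2/(N + 3), so the full sum is 5/6
# and the normalising constant is T = 6/5.
T = Fraction(6, 5)
lengths = {n: T * weight(n) for n in range(-N, N + 1)}

decreasing_forward = all(lengths[n] > lengths[n + 1] for n in range(0, N))
increasing_backward = all(lengths[n] < lengths[n + 1] for n in range(-N, 0))
positive_slope = all(3 * lengths[n + 1] - lengths[n] > 0 for n in range(0, N))
```

Since the partial sums differ from 5/6 by exactly 2/(N + 3), the series converges slowly, which is harmless here because only the total length matters.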

Let $J_n$ be a closed interval of length $l_n$. We place
these intervals on the circle in the same order as the
order of the orbit $R_\lambda^n(0)$. So to place an interval $J_n$,
consider the sum of the lengths of the intervals $J_k$
where $R_\lambda^k(0)$ is between 0 and $R_\lambda^n(0)$. This
determines the placement of $J_n$.

The next step is to define f on the union of the $J_n$.
It is necessary and sufficient for $f'(t) = 1$ on the
endpoints in order for the map to have a continuous
derivative when it is extended to the closure.
Assume $J_n = [a_n, b_n]$, so $l_n = b_n - a_n$. The integral


$$\int_{a_n}^{b_n} (b_n - t)(t - a_n)\, dt = \frac{l_n^3}{6}.$$

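Therefore, if we define f for $x \in J_n$ by


$$f(x) = a_{n+1} + \int_{a_n}^{x} \left[ 1 + \frac{6(l_{n+1} - l_n)}{l_n^3} (b_n - t)(t - a_n) \right] dt,$$


then $f(b_n) = a_{n+1} + l_n + (l_{n+1} - l_n) = b_{n+1}$. Also, f is
differentiable on $J_n$ with


$$f'(x) = 1 + \frac{6(l_{n+1} - l_n)}{l_n^3} (b_n - x)(x - a_n).$$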

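Thus, $f'(a_n) = 1 = f'(b_n)$. Notice that for $n < 0$, $l_{n+1} - l_n > 0$,
so that


$$1 \le f'(x) \le 1 + \frac{6(l_{n+1} - l_n)}{l_n^3} \left( \frac{l_n}{2} \right)^2 = \frac{3 l_{n+1} - l_n}{2 l_n},$$


and $(3 l_{n+1} - l_n)/(2 l_n)$ goes to 1 as $n \to -\infty$. Similarly,
for $n \ge 0$ and $x \in J_n$,


$$1 \ge f'(x) \ge \frac{3 l_{n+1} - l_n}{2 l_n} > 0,$$


so $f'(x)$ goes to 1 as $n \to +\infty$ uniformly for $x \in J_n$.
From these facts, it follows that f is uniformly $C^1$ on
the union of the interiors of the $J_n$, and has a $C^1$
extension to all of T.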
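
The formula for f on a single interval can be verified in closed form. The following sketch assumes the reading f(x) = a_{n+1} + integral from a_n to x of [1 + 6(l_{n+1} - l_n) l_n^{-3} (b_n - t)(t - a_n)] dt, evaluates the integral exactly, and checks the endpoint values and derivatives. The helper `make_branch` and the sample intervals are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def make_branch(a_n, l_n, a_next, l_next):
    """Return (f, df) on J_n = [a_n, a_n + l_n], mapping onto J_{n+1}."""
    c = 6 * (l_next - l_n) / l_n**3

    def f(x):
        u = x - a_n
        # The integral of (b_n - t)(t - a_n) over [a_n, x] is l_n*u^2/2 - u^3/3.
        return a_next + u + c * (l_n * u**2 / 2 - u**3 / 3)

    def df(x):
        b_n = a_n + l_n
        return 1 + c * (b_n - x) * (x - a_n)

    return f, df


# Hypothetical data: J_n = [0, 1/6] and J_{n+1} = [1/2, 1/2 + 1/10].
a_n, l_n = Fraction(0), Fraction(1, 6)
a_next, l_next = Fraction(1, 2), Fraction(1, 10)
f, df = make_branch(a_n, l_n, a_next, l_next)
b_n = a_n + l_n
midpoint = a_n + l_n / 2
```

The derivative attains its extreme value at the midpoint of the interval, which is where the bound (3 l_{n+1} - l_n)/(2 l_n) comes from.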

Let $\Lambda = T \setminus \bigcup_{n \in \mathbb{Z}} \mathrm{int}(J_n)$. This is a Cantor set. The
orbit of a point $x \in \Lambda$ is dense in $\Lambda$ since it is like the
orbit of 0 for $R_\lambda$. Thus, $\omega(x) = \Lambda$. If $x \in \mathrm{int}(J_n)$, then
there is a smaller interval I whose closure is
contained in $\mathrm{int}(J_n)$. Since the interval $J_n$ never
returns to $J_n$ but wanders among the other $J_k$,
$J_n$ is a wandering interval. $\square$


Further Results 


In this section we shall state some additional results 
about homeomorphisms of the circle in the area of 
Fourier analysis. 

The first result is a theorem of Pál (1914) and
Bohr (1935): let $f : T \to \mathbb{R}$ be a real continuous
function; then, there exists a homeomorphism of the
circle h such that $f \circ h \in U(T)$. The best proof of this
theorem is due to Salem (1945). In 1978, Kahane
and Katznelson showed that the result is still valid
for $f : T \to \mathbb{C}$ continuous.

A similar question was posed by Lusin: given a
continuous function $f : T \to \mathbb{R}$, is there a
homeomorphism of the circle h such that $f \circ h \in A(T)$?
The problem remained open until 1981, when
Olevskii, Kahane, and Katznelson answered the
question negatively: there exists a real (or complex)
continuous function f on the circle, such that, for every
homeomorphism of the circle h, $f \circ h \notin A(T)$.

It was proved by the author that there are $C^\infty$
homeomorphisms of the circle, not necessarily of
finite type, that transport A(T) into U(T). It is a very
technical work, published in 1998, and it gives a
necessary and sufficient condition for a homeomorphism
of the circle with a flat point to transport
A(T) into U(T).

Finally, the Denjoy theorem (Theorem 9) is rather
close to being optimal. The example constructed here
can be improved by obtaining a circle diffeomorphism
whose first derivative has Hölder exponent arbitrarily
close to 1 (see Katok and Hasselblatt (1995)). Recent
work has dealt with the existence of a differentiable
conjugacy between a diffeomorphism f with irrational
rotation number $\lambda$ and $R_\lambda$. Arnol'd, Moser, and Herman
have obtained results (see Melo and Strien (1993) for a
discussion of these results and references).


Introduction


Homoclinic orbits (or motions) were first defined by
Poincaré in his treatise on the “restricted three-body
problem” (Poincaré 1987). Further advances were
made by Birkhoff (Birkhoff 1960) in the 1930s, and
by Smale in the 1960s. Since that time, they have been
studied by many people and have been shown to be
intimately related to our understanding of nonlinear
dynamical systems. There are many systems which
possess homoclinic orbits. In one striking example (as
discussed in the book of Moser (1973)), they can be
used to account for the unbounded oscillatory motion
discovered by Sitnikov in the three-body problem. They
also commonly occur in two-dimensional mappings.

Roughly speaking, a homoclinic orbit is an orbit
of a mapping or differential equation which is both
forward and backward asymptotic to a periodic
orbit which satisfies a certain nondegeneracy condition
called “hyperbolicity.” On its own, such an
orbit is only of mild interest. However, these orbits
induce quite interesting structures among nearby
orbits, and this latter fact is responsible for the main
importance of homoclinic orbits. In addition, when
homoclinic orbits are created in a parametrized
system, many interesting and unexpected phenomena
arise.

In this article, we first describe the history and 
basic properties of homoclinic orbits. Next, we 
consider some simple polynomial diffeomorphisms 
of the plane (the so-called Hénon family) which 
exhibit homoclinic orbits. Subsequently, we discuss 
a general theorem due to Katok which gives 
sufficient conditions for the existence of such 
orbits. Finally, we briefly consider issues related to 
homoclinic bifurcations and some of their 
consequences. 


Homoclinic Orbits in Diffeomorphisms 


Consider a discrete dynamical system given by a $C^r$
diffeomorphism $f : M \to M$ where M is a $C^\infty$ manifold
and r is a positive integer. That is, f is bijective
and both f and $f^{-1}$ are r-times continuously
differentiable. Given a point $x \in M$, set $x_0 = x$. For
non-negative integers n we inductively define
$x_{n+1} = f(x_n)$ and $x_{-n-1} = f^{-1}(x_{-n})$. We also write
$f^n(x) = x_n$ for n in the set $\mathbb{Z}$ of all integers. The
“orbit” of x is the set $O(x) = \{f^n(x) : n \in \mathbb{Z}\}$.

A “periodic point” p of f is a point such that there
is a positive integer N such that $f^N(p) = p$. The
least such number $\tau(p)$ is called the “period” of p. If
$\tau(p) = 1$, we call p a “fixed point.” The periodic
point p with period $\tau$ is called “hyperbolic” if
all eigenvalues of the derivative $Df^\tau(p)$ at p have
absolute value different from 1. For convenience, we
refer to the eigenvalues of $Df^\tau(p)$ as eigenvalues
associated to p. If p is a hyperbolic periodic point all
of whose associated eigenvalues have norm less than
one, we call p a “sink” or “attracting periodic
point.” The opposite case in which all associated
eigenvalues have norm larger than one is called a
“source.” A hyperbolic periodic point p which is
neither a source nor a sink is called a “saddle” or
“hyperbolic saddle.”

Given a saddle p of period $\tau$, we consider the set
$W^s(p) = W^s(p, f)$ of points $y \in M$ which are forward
asymptotic to p under the iterates $f^{n\tau}$. That is, the
points $y \in M$ such that $f^{n\tau}(y) \to p$ as $n \to \infty$. This is
called the “stable set” of p. Similarly, we consider
the “unstable set” of p which we may define as
$W^u(p) = W^u(p, f) = W^s(p, f^{-1})$. The stable manifold
theorem guarantees that $W^s(p)$ and $W^u(p)$ are
injectively immersed submanifolds of M whose
dimensions add up to dim M. In these cases, they
are called the stable and unstable manifolds of p,
respectively. A point $q \in W^s(p) \cap W^u(p) \setminus \{p\}$ is called
a “homoclinic point” of p (or of the pair (f, p)). If the
submanifolds $W^s(p)$ and $W^u(p)$ meet transversely at q,
then q is called a “transverse homoclinic point.”
Otherwise, q is called a “homoclinic tangency.”

In the special case when M is a two-dimensional
manifold, the stable and unstable manifolds of a
saddle periodic point p are injectively immersed
curves in M. A transverse homoclinic point q of p is
a point of intersection, other than p, at which the curves
are not tangent to each other. This is depicted in Figure 1
for the case of a saddle fixed point for the map
$H(x, y) = (7 - x^2 - y, x)$, a member of the so-called
Hénon family, which we will discuss later. The
figure was made using the numerical package
“Dynamics” which comes with the book by Nusse
and Yorke (1998).


Figure 1 Stable and unstable manifolds in the map
$H(x, y) = (7 - x^2 - y, x)$ for the fixed point $p \approx (-3.83, -3.83)$.
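
The classification of hyperbolic periodic points into sinks, sources, and saddles can be tried out on the map of Figure 1. The sketch below solves for the fixed points of H(x, y) = (7 - x^2 - y, x) and classifies each one from the eigenvalues of the derivative DH = [[-2x, -1], [1, 0]]. The function names and the tolerance `tol` are assumptions of this sketch, not part of the text.

```python
import cmath
import math


def fixed_points():
    """Fixed points of H(x, y) = (7 - x^2 - y, x): y = x and x^2 + 2x - 7 = 0."""
    root = math.sqrt(8.0)
    return [(-1.0 + root, -1.0 + root), (-1.0 - root, -1.0 - root)]


def eigenvalues(x):
    """Eigenvalues of DH = [[-2x, -1], [1, 0]], the roots of mu^2 + 2x mu + 1."""
    disc = cmath.sqrt(4.0 * x * x - 4.0)
    return ((-2.0 * x + disc) / 2.0, (-2.0 * x - disc) / 2.0)


def classify(eigs, tol=1e-12):
    """Classify a periodic point from the moduli of its eigenvalues."""
    mods = [abs(mu) for mu in eigs]
    if any(abs(r - 1.0) < tol for r in mods):
        return "non-hyperbolic"
    if all(r < 1.0 for r in mods):
        return "sink"
    if all(r > 1.0 for r in mods):
        return "source"
    return "saddle"


results = [(p, classify(eigenvalues(p[0]))) for p in fixed_points()]
```

Because the determinant of DH equals 1 for this map, the two eigenvalues multiply to 1, so a hyperbolic fixed point of H is necessarily a saddle.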


One easily sees that every point in the orbit of
a transverse homoclinic point q of a hyperbolic
saddle fixed point p is again a transverse homoclinic
point of p. Also, the curves $W^u(p)$ and $W^s(p)$ are
invariant; that is, $f(W^u(p)) = W^u(p)$ and $f(W^s(p)) =
W^s(p)$. This implies that the curves $W^u(p)$ and $W^s(p)$
extend, wind around, and accumulate on each other
forming a complicated web.

Upon seeing this complicated structure in the 
restricted three-body problem, Poincaré very poeti- 
cally wrote (p. 389, Poincaré 1987) 


Que l’on cherche à se représenter la figure formée par
ces deux courbes et leurs intersections en nombre infini
dont chacune correspond à une solution doublement
asymptotique, ces intersections forment une sorte de
treillis, de tissu, de réseau à mailles infiniment serrées;
chacune des deux courbes ne doit jamais se recouper
elle-même, mais elle doit se replier sur elle-même d’une
manière très complexe pour venir recouper une infinité
de fois toutes les mailles du réseau.

On sera frappé de la complexité de cette figure, que je
ne cherche même pas à tracer. Rien n’est plus propre à
nous donner une idée de la complication du problème
des trois corps et en général de tous les problèmes de
Dynamique où il n’y a pas d’intégrale uniforme ...


The next major advance concerning homoclinic
orbits was made by Birkhoff (1960), who proved
that in every neighborhood of a transverse
homoclinic point of a surface diffeomorphism,
one can find infinitely many distinct periodic
points. Birkhoff also presented a symbolic
description of the nearby orbits and noticed the
analogy with Hadamard’s description of geodesics
on a surface. Birkhoff’s analysis was generalized
by Smale to arbitrary dimension, and, in addition,
Smale gave a simpler analysis of the associated
nearby orbits in terms of compact zero-dimensional
symbolic spaces which we now call “shift spaces”
or “topological Markov chains.”

Once one knows that a diffeomorphism f has a 
transverse homoclinic point for a saddle periodic 
point p, it is interesting to consider the closure of the 
orbits of all such homoclinic points. This turns out 
to be a closed invariant set containing a dense orbit 
and a countable dense set of periodic saddle points 
(Newhouse 1980). It is usually called a “homoclinic 
closure” or h-closure. These sets form the basis of 
chaotic or irregular motions in nonlinear systems. 


The Smale Horseshoe Map and 
Associated Symbolic System 


To understand the geometric picture discovered by 
Smale, it is best to start with a concrete example of a 
diffeomorphism of the plane known as the “Smale 
horseshoe diffeomorphism.” 

Given any homeomorphism $f : X \to X$ on a space
X and a subset $U \subset X$, let us define I(f, U) to be the
set of points $x \in X$ such that $f^n(x) \in U$ for every
integer n. Thus, we have


$$I(f, U) = \bigcap_{n \in \mathbb{Z}} f^n(U).$$


We call I(f, U) the invariant set of f in U, or,
alternatively, the invariant set of the pair (f, U).

We now construct a special diffeomorphism f of
the Euclidean plane to itself in which U = Q is the
unit square and for which I(f, U) has a very
interesting structure. It is this map which is usually
known as the Smale horseshoe map.

Let $Q = [0, 1] \times [0, 1]$ be the unit square in the
plane $\mathbb{R}^2$. Let $0 < a < 1/2$, and consider a diffeomorphism
$f : \mathbb{R}^2 \to \mathbb{R}^2$ which is a composition of two
diffeomorphisms $f = T_2 \circ T_1$ as follows. The map
$T_1(x, y) = (a^{-1} x, a y)$ contracts vertically, expands
horizontally, and maps Q to the thin rectangle
$Q_1 = \{(x, y) : 0 \le x \le a^{-1}, 0 \le y \le a\}$ which is short
and wide. The map $T_2$ bends the right side of $Q_1$ up
and around so that $T_2(Q_1) = f(Q)$ has the shape of a
“horseshoe” or “rotated arch.” We arrange for $T_2$ to
take the lower-right corner of $Q_1$ up to the upper-left
corner of Q in such a way that f(Q) meets Q in two
full-width subrectangles which we call $R_1$ and $R_2$.
This can be done in such a way that the preimages
$R_1^{-1} = T_1^{-1}(R_1)$ and $R_2^{-1} = T_1^{-1}(T_2^{-1}(R_2))$ are both
full-height subrectangles of Q, and the restricted maps
$f_1 := f \mid R_1^{-1}$ and $f_2 := f \mid R_2^{-1}$ are both affine. Thus, we
arrange that $f_1$ is simply the restriction of $T_1$ to $R_1^{-1}$,
and the map $f_2$ can be expressed in formulas as
$f_2(x, y) = (-a^{-1} x + a^{-1}, -a y + 1)$. This construction
implies that f will have the origin p = (0, 0) as a
hyperbolic fixed point. We label the upper-left corner
(0, 1) of Q with the letter q. It follows that the bottom
and left edges of Q will be in the unstable and stable
manifolds of p, respectively, and we have indicated
this in Figure 2 with small arrows.


Figure 2 The horseshoe map.

The above construction gives us a diffeomorphism
f of the plane $\mathbb{R}^2$ such that $Q_1^+ := f(Q) \cap Q =
R_1 \cup R_2$ is the union of two full-width subrectangles
of Q. We wish to describe I(f, Q). We begin with
the sets $Q^+ = \bigcap_{n \ge 0} f^n(Q)$ and $Q^- = \bigcap_{n \le 0} f^n(Q)$.
Thus, $Q^+$ is simply the set of points in Q whose
backward orbits stay in Q, and $Q^-$ is the set of
points whose forward orbits stay in Q. For i = 1, 2,
each rectangle $R_i$ is mapped to a thin horseshoe in
f(Q) which meets Q in two full-width subrectangles.
Combining these for i = 1, 2 gives four full-width
rectangles as shaded in Figure 3. Thus,
$Q \cap f(Q) \cap f^2(Q)$ consists of these four subrectangles.
Figure 3 shows the sets $f^2(Q)$, $f^{-2}(Q)$ as well as
the shaded rectangles we just mentioned.

Continuing in this way, one sees that, for each
$n > 0$, the set $Q_n^+ = Q \cap f(Q) \cap \ldots \cap f^n(Q)$ consists
of $2^n$ full-width subrectangles of Q, each with height
$a^n$. It follows that $Q^+ = \bigcap_{n \ge 0} f^n(Q)$ is an interval
times a Cantor set. Analogously, $Q^-$ is a Cantor set
times an interval, and the set I(f, Q) is a Cantor set
in the plane. Let us recall the definition of a Cantor
set C in a metric space X. We first define a Cantor
space C to be a compact, perfect, totally disconnected
metric space. That is, C is a compact metric
space whose connected components are points and such
that every point x in C is a limit point of $C \setminus \{x\}$. A
Cantor set C in a metric space X is a subset which is
a Cantor space in the induced subspace (relative)
topology.


Figure 3 The sets $f^2(Q)$ and $f^{-2}(Q)$ for the horseshoe map f.
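
The count of 2^n rectangles of height a^n can be reproduced by tracking only the vertical coordinate. The sketch below assumes that the two affine branches of the horseshoe map act on y by y -> a*y and y -> 1 - a*y, and iterates both maps on the y-intervals of the current rectangles. The function name `horizontal_strips` and the sample value a = 1/3 are assumptions of this sketch.

```python
from fractions import Fraction


def horizontal_strips(a, n):
    """y-intervals of the full-width rectangles making up
    Q, f(Q), ..., f^n(Q) intersected, using the two affine branches."""
    strips = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        new = []
        for lo, hi in strips:
            new.append((a * lo, a * hi))          # branch y -> a*y
            new.append((1 - a * hi, 1 - a * lo))  # branch y -> 1 - a*y
        strips = sorted(new)
    return strips


a = Fraction(1, 3)  # assumed sample value with 0 < a < 1/2
n = 5
strips = horizontal_strips(a, n)
```

Letting n grow, the y-intervals shrink to a Cantor set, which is the vertical factor of the set described above as an interval times a Cantor set.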

The dynamics of f on the invariant set I(f, Q) can
be conveniently described as follows.

Let $\Sigma_2 = \{1, 2\}^{\mathbb{Z}}$ be the set of doubly infinite
sequences of 1’s and 2’s. Writing elements $a \in \Sigma_2$
as $a = (a_i) = (a_i)_{i \in \mathbb{Z}}$, we define a metric $\rho$ on $\Sigma_2$ by


$$\rho(a, b) = \sum_{i \in \mathbb{Z}} \frac{1}{2^{|i|}} |a_i - b_i|.$$


The pair $(\Sigma_2, \rho)$, then, is a Cantor space.

The “left-shift automorphism” on $\Sigma_2$ is the map
$\sigma : \Sigma_2 \to \Sigma_2$ defined by $\sigma(a)_i = a_{i+1}$ for each $i \in \mathbb{Z}$.
This is a homeomorphism from $\Sigma_2$ to itself. It has a
dense orbit and a dense set of periodic points.

For a point $x \in I(f, Q)$, define an element $\phi(x) =
a = (a_i) \in \Sigma_2$ by $a_i = j$ if and only if $f^i(x) \in R_j$. It turns
out that the map $\phi : I(f, Q) \to \Sigma_2$ is a homeomorphism
such that $\sigma \phi = \phi f$.
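
The relation between the coding map and the shift can be checked on a periodic point. The sketch below assumes the affine branches f_1(x, y) = (x/a, a*y) on the left full-height rectangle and f_2(x, y) = (-x/a + 1/a, -a*y + 1) on the right one, takes a = 1/3, and computes only the forward part a_0, a_1, ... of the symbol sequence. The names `f`, `symbol`, `itinerary`, and the sample point are assumptions of this sketch.

```python
from fractions import Fraction

A = Fraction(1, 3)  # contraction rate, an assumed sample with 0 < A < 1/2


def f(pt):
    """Horseshoe map restricted to the two full-height rectangles."""
    x, y = pt
    if x <= A:                      # left rectangle, branch f1
        return (x / A, A * y)
    if x >= 1 - A:                  # right rectangle, branch f2
        return (-x / A + 1 / A, -A * y + 1)
    raise ValueError("point leaves Q under f")


def symbol(pt):
    """Return j when pt lies in the horizontal rectangle R_j."""
    y = pt[1]
    if y <= A:
        return 1
    if y >= 1 - A:
        return 2
    raise ValueError("point is not in R_1 or R_2")


def itinerary(pt, n):
    """Symbols a_0, ..., a_{n-1} with f^i(pt) in R_{a_i}."""
    out = []
    for _ in range(n):
        out.append(symbol(pt))
        pt = f(pt)
    return out


x0 = (Fraction(3, 10), Fraction(9, 10))  # a point of period two
word = itinerary(x0, 8)
shifted = itinerary(f(x0), 7)
```

The itinerary of f(x0) is the itinerary of x0 with its first symbol dropped, which is the forward half of the statement that the coding map intertwines f with the left shift.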

In general, given two discrete dynamical systems
$f : X \to X$ and $g : Y \to Y$, a homeomorphism
$h : X \to Y$ such that $gh = hf$ is called a topological
conjugacy from the pair (f, X) to the pair (g, Y).
When such a conjugacy exists, the two systems have
virtually the same dynamical properties.

In the present case, one sees that the dynamics of f
on I(f, Q) is completely described by that of $\sigma$
on $\Sigma_2$.

It turns out that the Smale horseshoe map contains
essentially all of the geometry necessary to describe
the orbit structures near homoclinic orbits. To begin
to see this, recall that the left and bottom boundaries
of Q were in the stable and unstable manifolds of p.
Extending these curves as in Figure 4, one sees that
the three corners of Q different from p are, in fact,
all transverse homoclinic points of p.

It was a great discovery of Smale that, in the case
of a general transverse homoclinic point, one sees
the above geometric structure after taking some
power $f^N$ of the diffeomorphism f. Thus, we have


Theorem 1 (Smale). Let $f : M \to M$ be a $C^1$ diffeomorphism
of a manifold M with a hyperbolic
periodic point p and a transverse homoclinic point
q of the pair (f, p). Then, one can find a positive
integer N and a compact neighborhood U of the
points p and q such that the pair $(f^N, I(f^N, U))$ is
topologically conjugate to the full 2-shift $(\sigma, \Sigma_2)$.


Figure 4 Stable and unstable manifolds in the horseshoe map.


In modern language, we can assert that more
is true. Let $\Lambda(f) = \bigcup_{0 \le j < N} f^j(I(f^N, U))$ be the f-orbit
of the set $I(f^N, U)$. Then, $\Lambda(f)$ is a compact
zero-dimensional hyperbolic basic set for f with
$V := \bigcup_{0 \le j < N} f^j(U)$ as an “adapted” or “isolating”
neighborhood. This means that $\Lambda(f) = \bigcap_{n \in \mathbb{Z}} f^n(V)$
is a compact, zero-dimensional hyperbolic set (see
Robinson (1999) for definitions and related references)
contained in the interior of V and $f \mid \Lambda(f)$ has
a dense orbit. If g is $C^1$ near f, then
$\Lambda(g) := \bigcap_{n \in \mathbb{Z}} g^n(V)$ is a hyperbolic basic set for g
and the pairs $(f, \Lambda(f))$ and $(g, \Lambda(g))$ are topologically
conjugate.

To get some appreciation for the magnitude of the 
contribution here, one might note the complicated 
arguments employed by Poincaré at the end of 
Poincaré (1987) to show that so-called heteroclinic 
points (intersections between stable and unstable 
manifolds of saddles with different orbits) existed. 
Birkhoff found a symbolic description (using infinitely 
many symbols) of the orbits near a transverse 
homoclinic orbit from which the existence of both 
infinitely many periodic and heteroclinic points is 
obvious. Smale extended the treatment of transverse 
homoclinic points to all dimensions, and found the 
symbolic description (using two symbols for some 
iterate of the map) given above. Moreover, Smale 
proved the “robustness” of these structures: they persist 
under small $C^1$ perturbations. Note that Poincaré’s
discovery of homoclinic points was in 1899, Birkhoff’s 
results came in 1935, and Smale’s results came in 
1965. Thus, the above advances took over 65 years! 

One can understand the geometry of Smale’s
construction fairly easily in the two-dimensional
case. Let q be the transverse homoclinic point of the
saddle fixed point p of the $C^r$ diffeomorphism f on
the plane $\mathbb{R}^2$. Given a small neighborhood U of p, let
$W^s(p, U)$ denote the connected component of $W^s(p) \cap U$
containing p, and define $W^u(p, U)$ similarly. We may
choose $C^r$ coordinates (x, y) so that in some small
neighborhood U of p, the point p corresponds to
(0, 0), the set $W^u(p, U)$ corresponds to (y = 0), and
the set $W^s(p, U)$ corresponds to (x = 0). We assume
that U is small enough that f in U is closely
approximated by its derivative $Df_{(0,0)}$. Hence, f
nearly contracts vertical directions and expands
horizontal directions in U.

Take compact arcs $I_s \subset W^s(p)$ and $I_u \subset W^u(p)$
both containing the points p and q as in Figure 5.


Figure 5 The curves $I_s \subset W^s(p)$ and $I_u \subset W^u(p)$.

Let D be a curvilinear rectangle which is a slight
thickening of $I_s$. The forward iterates $f^i(D)$ will stay
near $I_s$ for a while and then start to approach $I_u$.
If we choose D appropriately, we can arrange for
some high iterate $f^N(D)$ to be a slight thickening
of $I_u$ as in Figure 6. This looks geometrically like the
horseshoe map. Let $A_1$ be the connected component
of the intersection $D \cap f^N(D)$ containing p, and let
$A_2$ be the connected component of the intersection
$D \cap f^N(D)$ containing q. These sets (which are
shaded in Figure 6) play the role of the rectangles
$R_1$ and $R_2$, respectively, in the horseshoe construction.
We use the set $A_1 \cup A_2$ for U in Theorem 1.

















Figure 6 The curvilinear rectangle D and its Nth iterate $f^N(D)$
are geometrically like the horseshoe map.


The Hénon Family 


To give explicit formulas for the horseshoe map 
above is somewhat tedious, and it is of interest to 
note that similar properties occur in maps with 
simple formulas. Indeed, such properties occur quite 
often in a well-known family of maps known as the 
“Hénon family.” As we have mentioned, the map in 
Figure 1 provides an example. 

One may simply define a Hénon map as a
diffeomorphism $H = (H_1(x, y), H_2(x, y))$ with inverse
$G(x, y) = (G_1(x, y), G_2(x, y))$ such that all the maps
$H_i(x, y)$, $G_i(x, y)$ are polynomials of degree at most
two. It is known (see, e.g., Friedland and Milnor
(1989)) that such maps H have constant Jacobian
determinant, and, up to affine conjugacy, may be
represented in the form $H = H_{a,b}(x, y) = (a - x^2 -
by, x)$ with a, b constants and $b \neq 0$. This makes
sense when all the terms are real or complex. In the
real case, we speak of the real Hénon family and,
in the complex case, we speak of the complex
Hénon family.
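
The normal form can be checked directly: the inverse of H_{a,b} is again polynomial of degree two, and the Jacobian determinant is the constant b. The sketch below uses exact rational arithmetic; the function names `henon` and `jacobian_det` and the sample point are assumptions of this sketch, and the parameters a = 7, b = 1 are those of the map in Figure 1.

```python
from fractions import Fraction


def henon(a, b):
    """Return H(x, y) = (a - x^2 - b*y, x) and its polynomial inverse G."""
    def H(x, y):
        return (a - x * x - b * y, x)

    def G(x, y):
        # Solve (u, v) = H(x, y): x = v and y = (a - v^2 - u) / b.
        return (y, (a - y * y - x) / b)

    return H, G


def jacobian_det(x, y, b):
    """det of DH = [[-2x, -b], [1, 0]]; it equals b at every point."""
    return (-2 * x) * 0 - (-b) * 1


a, b = Fraction(7), Fraction(1)
H, G = henon(a, b)
pt = (Fraction(1, 2), Fraction(-3, 4))
round_trip = G(*H(*pt))
other_way = H(*G(*pt))
```

The condition b != 0 is exactly what makes the division in G legitimate.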

The real Hénon family was first presented by the
physicist M Hénon in 1976 as perhaps the simplest
nonlinear diffeomorphism of the plane exhibiting a
so-called “strange attractor.” These mappings in the
real and complex cases have been the focus of much
attention. Our interest here is that, at least for
certain parameters a, b, they provide concrete
globally defined maps whose dynamics are analogous
to that of the horseshoe diffeomorphism. In
fact, Devaney and Nitecki (1979) proved (in the real
case) that for fixed $b \neq 0$, there is a constant $a_0 > 0$
such that if $a > a_0$, then the set $B_{a,b}$ of bounded
orbits of $H_{a,b}$ is a compact zero-dimensional set and
the pair $(H_{a,b}, B_{a,b})$ is topologically conjugate to
$(\sigma, \Sigma_2)$. In addition, it can be shown that the
invariant set $B_{a,b}$ is a single hyperbolic h-closure.
Analogous results are true for the complex Hénon
family and proofs were originally given in the thesis
of Ralph Oberste-Vorth (unpublished) under the
supervision of John Hubbard at Cornell University.
More recent proofs are in Newhouse (2004) and
Hruska (2004). Many interesting results have been
obtained for the complex Hénon map by Bedford
and Smillie and Sibony and Fornaess (see the
references in Hruska (2004)).


Homoclinic Points in Systems with 
Positive Topological Entropy 


There is an invariant of topological conjugacy which is 
known as the topological entropy. In a certain sense, 
this gives a quantitative measurement of the amount of 
complicated or chaotic motion in the system. 


Let $f : X \to X$ be a continuous self-map of the
compact metric space (X, d). For a positive integer
n, we define an n-orbit to be a finite sequence
$O(x, n) = \{x, f(x), \ldots, f^{n-1}(x)\}$. Given a positive real
number $\epsilon > 0$, we say that two n-orbits O(x, n) and
O(y, n) are “$\epsilon$-distinguishable” if there is a $0 \le j < n$
such that $d(f^j x, f^j y) > \epsilon$. Another way to look at this
is the following. Define the so-called $d_n$-metric on X
by setting $d_n(x, y) = \max_{0 \le j < n} d(f^j x, f^j y)$. Then, the
two n-orbits O(x, n), O(y, n) are $\epsilon$-distinguishable if
and only if $d_n(x, y) > \epsilon$. It follows from compactness
of X and the uniform continuity of each of the
maps $f^j$, $0 \le j < n$, that the maximal number $r(n, \epsilon, f)$ of
pairwise $\epsilon$-distinguishable n-orbits is finite for each given $\epsilon > 0$
and each positive integer n. We define the number


$$h(f) = \lim_{\epsilon \to 0} \limsup_{n \to \infty} \frac{1}{n} \log r(n, \epsilon, f).$$


This means that, for some sequence of integers
$n_1 < n_2 < \ldots$, the map f has roughly $e^{n_i h(f)}$
$\epsilon$-distinguishable $n_i$-orbits for i large and $\epsilon$ small.
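
The growth rate in this definition can be observed on a simple example. The sketch below is an illustration under stated assumptions, not a general algorithm: it uses the doubling map x -> 2x mod 1 on the circle (a continuous self-map that is not discussed in the text), restricts starting points to a finite grid, and builds a set of pairwise distinguishable n-orbits greedily, so the counts are only lower estimates of r(n, eps, f). The names `M`, `EPS`, `d_n`, and `separated_count` are hypothetical.

```python
import math

M = 1024   # grid size (assumed); points are k / M on the circle
EPS = 0.1  # resolution epsilon (assumed)


def circle_dist(i, j):
    """Distance on the circle R/Z between the grid points i/M and j/M."""
    d = abs(i - j) % M
    return min(d, M - d) / M


def d_n(i, j, n):
    """The d_n-metric for the doubling map k -> 2k mod M on the grid."""
    best = 0.0
    for _ in range(n):
        best = max(best, circle_dist(i, j))
        i, j = (2 * i) % M, (2 * j) % M
    return best


def separated_count(n):
    """Size of a greedily built set of pairwise EPS-distinguishable
    n-orbits with starting points on the grid."""
    chosen = []
    for k in range(M):
        if all(d_n(k, c, n) > EPS for c in chosen):
            chosen.append(k)
    return len(chosen)


counts = {n: separated_count(n) for n in (3, 4, 5)}
growth = [math.log(counts[n + 1] / counts[n]) for n in (3, 4)]
```

Each additional iterate roughly doubles the count, so the logarithmic growth per step is close to log 2, the known entropy of the doubling map; the finite grid limits how far n can be pushed.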

The number h(f) is called the topological entropy
of the map f. It may be infinite for homeomorphisms,
but it is always finite for smooth maps on
finite-dimensional manifolds. The number h(f) has
many nice properties. For instance, $h(f^N) = N h(f)$
for every positive integer N, and, if f is a homeomorphism,
then $h(f^{-1}) = h(f)$. Further, if f and g are
topologically conjugate, then h(f) = h(g). The so-called
“variational principle for topological
entropy” asserts that h(f) is the supremum of the
measure-theoretic entropies of the invariant probability
measures for f. Our interest in this invariant
here is the following theorem of Katok.


Theorem 2 (Katok). Let f be a $C^2$ diffeomorphism
of a compact two-dimensional manifold M to itself
with positive topological entropy. Then, f has
transverse homoclinic points.


In fact, Katok extended this theorem (see the
supplement in Hasselblatt and Katok (1995)) to
show that, if $h(f) > 0$ and $\epsilon > 0$, then there is a
compact zero-dimensional hyperbolic basic set $\Lambda$ for
f such that $h(f \mid \Lambda) > h(f) - \epsilon$. Thus, one can find
nice invariant topologically transitive sets for f (i.e.,
sets with dense orbits) on which the topological
entropies of the restrictions of f are arbitrarily close to
that of f.

This theorem has the interesting consequence that
the map $f \mapsto h(f)$ is lower-semicontinuous on the
space of $C^2$ diffeomorphisms of a surface. It was
proved in Newhouse (1989) (and, independently, by
Yomdin (1987)) that the map $f \mapsto h(f)$ is
upper-semicontinuous on the space of $C^\infty$ diffeomorphisms
of any compact manifold. Combining these results
gives the theorem that the map $f \mapsto h(f)$ is continuous
on the space of $C^\infty$ diffeomorphisms on a
compact surface, and that positivity of h(f) implies
the existence of transverse homoclinic points.

It is also worth noting that, for any continuous
self-map $f : M \to M$ on a compact manifold M, one
has the inequality $h(f) \ge \log |\mu|$ where $\mu$ is the
eigenvalue of largest norm of the induced map $f_*$
on the first real homology group (Manning 1975).
Putting this together with Theorem 2 gives the fact
that there are whole homotopy classes of diffeomorphisms
on surfaces all of whose elements have
transverse homoclinic points. For instance, consider
a 2 x 2 matrix


$$L = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$


with integer entries, determinant 1, and eigenvalues
$\lambda_1, \lambda_2$ with $0 < |\lambda_1| < 1 < |\lambda_2|$. Let $L : T^2 \to T^2$ be
the induced diffeomorphism on the two-dimensional
torus $T^2$. This is an example of what is called an
“Anosov” diffeomorphism. In this case the number
$\mu$ above is simply $\lambda_2$, and this holds for any
diffeomorphism f of $T^2$ which can be continuously
deformed into L. Hence, any such f must have
transverse homoclinic points.
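
For a concrete matrix the lower bound log|mu| is easy to evaluate. The sketch below uses the sample matrix with rows (2, 1) and (1, 1), which is an assumption of this illustration and is not taken from the text, and computes its eigenvalues from the trace and determinant. The function name `eigenvalues_2x2` is likewise hypothetical.

```python
import math


def eigenvalues_2x2(a, b, c, d):
    """Real eigenvalues of [[a, b], [c, d]], ordered by absolute value."""
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc < 0:
        raise ValueError("eigenvalues are not real")
    root = math.sqrt(disc)
    return sorted(((tr - root) / 2, (tr + root) / 2), key=abs)


# Assumed sample matrix: integer entries and determinant 1.
a, b, c, d = 2, 1, 1, 1
det = a * d - b * c
lam1, lam2 = eigenvalues_2x2(a, b, c, d)
entropy_lower_bound = math.log(abs(lam2))
```

Every diffeomorphism of the torus homotopic to the map induced by this matrix then has topological entropy at least this value, by the inequality quoted above.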


Homoclinic Tangencies 


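Let $\{f_\lambda, \lambda \in [0, 1]\}$ be a parametrized family of $C^r$
diffeomorphisms of the plane with $\lambda$ an external
parameter. It frequently occurs that there is a
hyperbolic saddle fixed point $p_\lambda$ for each parameter
$\lambda$ moving continuously with $\lambda$ such that, at some
value $\lambda_0$, a homoclinic tangency is created at a point
$q_0$. This means that there are an $\epsilon > 0$, a small
neighborhood U of $q_0$, and curves $\gamma_\lambda^u \subset W^u(p_\lambda)$,
$\gamma_\lambda^s \subset W^s(p_\lambda)$ such that $\gamma_\lambda^u \cap \gamma_\lambda^s = \emptyset$ for
$\lambda_0 - \epsilon < \lambda < \lambda_0$, $\gamma_{\lambda_0}^u \cap \gamma_{\lambda_0}^s = \{q_0\}$, and
$\gamma_\lambda^u \cap \gamma_\lambda^s$ consists of two
distinct points for $\lambda_0 < \lambda < \lambda_0 + \epsilon$. In most cases,
the tangency of $\gamma_{\lambda_0}^u$ and $\gamma_{\lambda_0}^s$ at $q_0$ will be of the
second order, and we will assume that occurs here.
The geometry is as in Figure 7.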
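
The passage from no intersection, through a second-order tangency, to two transverse intersections can be mimicked by a toy model. In the sketch below the local stable curve is the x-axis and the local unstable curve is the parabola y = x^2 - (lam - lam0); these model curves, the value lam0 = 0.5, and the function name `intersections` are assumptions made purely for illustration and are not derived from any particular family of diffeomorphisms.

```python
import math


def intersections(lam, lam0=0.5):
    """Intersection points of the model curves
    y = 0 and y = x^2 - (lam - lam0)."""
    mu = lam - lam0
    if mu < 0:
        return []                   # the curves are disjoint
    if mu == 0:
        return [(0.0, 0.0)]         # quadratic tangency at q0
    r = math.sqrt(mu)
    return [(-r, 0.0), (r, 0.0)]    # two transverse crossings


before = intersections(0.4)
at_tangency = intersections(0.5)
after = intersections(0.6)
```

The two crossings separate at a rate proportional to the square root of lam - lam0, which is the signature of a second-order tangency.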


Figure 7 Creation of a homoclinic tangency (panels: $\lambda < \lambda_0$, $\lambda = \lambda_0$, $\lambda > \lambda_0$).

The creation of homoclinic tangencies is part of
the general subject of “homoclinic bifurcations.” A
recent survey of this subject is in the book by
Bonatti et al. (2005). Typical results are the
following. If $p = p_{\lambda_0}$ is a saddle fixed point whose
derivative is area-decreasing (i.e., $|\det(Df(p))| < 1$),
then there are infinitely many parameters $\lambda$ near $\lambda_0$
for which each transverse homoclinic point of $p_\lambda$ is a
limit of periodic sinks (asymptotically stable periodic
orbits) (Newhouse 1979, Robinson 1983). In
addition, so-called strange attractors and SRB
measures appear (Mora and Viana 1993).

Finally, we mention that recently it has been 
shown that, generically in the $C^r$ topology for $r \geq 2$, 
homoclinic closures associated to a homoclinic 
tangency (in dimension 2) have maximal Hausdorff 
dimension (Theorem 1.6 in Downarowicz and 
Newhouse (2005)). 


See also: Chaos and Attractors; Fractal Dimensions in 
Dynamics; Generic Properties of Dynamical Systems; 
Hyperbolic Dynamical Systems; Lyapunov Exponents 
and Strange Attractors; Saddle Point Problems; 
Singularity and Bifurcation Theory; Solitons and Other 
Extended Field Configurations. 
Overview 


Renormalization theory is a venerable subject put to 
daily use in many branches of physics. Here, we 
focus on its applications in quantum field theory, 
where a standard perturbative approach is provided 
through an expansion in Feynman diagrams. Whilst 
the combinatorics of the Bogoliubov recursion, 
solved by suitable forest formulas, has been known 
for a long time, the subject regained interest on the 
conceptual side with the discovery of an underlying 
Hopf algebra structure behind these recursions. 
Perturbative expansions in quantum field theory 
are organized in terms of one-particle irreducible 
(1PI) Feynman graphs. The goal is to calculate the 
corresponding 1PI Green functions order by order in 
the coupling constants of the theory, by applying 
Feynman rules to these 1PI graphs of a 
renormalizable theory under consideration. This 
allows one to disentangle the problem into an 
algebraic part and an analytic part. 

For the algebraic part, one studies Feynman graphs 
as combinatorial objects which lead to the Lie and 
Hopf algebras discussed below. Feynman rules then 
assign analytic expressions to these graphs, with the 
analytic structure of finite renormalized quantum field 
theory largely dictated by the underlying algebra. 

The objects of interest in quantum field theory are 
the 1PI Green functions. They are parametrized by 
the quantum numbers — masses, momenta, spin, and 
such — of the particles participating in the scattering 
process under consideration. We call a set of such 
quantum numbers an external leg structure r. For 
example, the three terms in the Lagrangian of 
massless quantum electrodynamics correspond to 


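[Equation [1]: r ranges over the three residue graphs of massless quantum electrodynamics, namely the fermion propagator, the fermion-photon vertex, and the photon propagator; the diagrams are not reproduced here.] 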


Note that the Lagrangian L of massless quantum 
electrodynamics is obtained accordingly as 


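$$L = \hat\phi(\text{fermion propagator}) + \hat\phi(\text{vertex}) + \hat\phi(\text{photon propagator}) = \bar\psi\, \partial\!\!\!/\, \psi + \bar\psi A\!\!\!/\, \psi + \tfrac{1}{4} F^2 \qquad [2]$$


where $\hat\phi$ are coordinate space Feynman rules. 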

The renormalized 1PI Green function in momentum 
space, $G^r_R(\{g\};\{p\},\{m\};\mu)$, is obtained as the 
image under renormalized Feynman rules $\phi_R$ applied 
to a series of graphs: 

$$\Gamma^r = 1 + \sum_{k=1}^{\infty} g^k c^r_k = 1 + \sum_{\mathrm{res}(\Gamma)=r} g^{|\Gamma|}\, \frac{\Gamma}{\mathrm{Sym}(\Gamma)} \qquad [3]$$

Here r is a given such external leg structure, while $c^r_k$ 
is the finite sum of 1PI graphs having k loops, 

$$c^r_k = \sum_{|\Gamma|=k,\ \mathrm{res}(\Gamma)=r} \frac{\Gamma}{\mathrm{Sym}(\Gamma)} \qquad [4]$$

and 0 < g < 1 is a coupling constant. The generalization 
to the case of several couplings {g} and 
masses {m} is straightforward. In the above, the sum 
is over all 1PI graphs with the same given external 
leg structure. We have denoted the map which 
assigns r to a given graph a residue. 

[Equation [5]: a graphical example of res applied to a Feynman graph; the diagram is not reproduced here.] 


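The unrenormalized but regularized Feynman rules 
$\phi$ assign to a graph a function 

$$\phi(\Gamma)(\{g\};\{p\},\{m\};\mu,z) = \int \prod_{v \in \Gamma^{[0]}} \delta^{(4)}\Big(\sum_{f\ \mathrm{incident}\ v} k_f\Big) \prod_{e \in \Gamma^{[1]}_{\mathrm{int}}} \mathrm{Prop}(k_e)\, \frac{d^4 k_e}{(2\pi)^4} \qquad [6]$$

and formally the unrenormalized Green function 

$$G^r_u(\{g\};\{p\},\{m\};\mu,z) = \phi(\Gamma^r)(\{g\};\{p\},\{m\};\mu,z) \qquad [7]$$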


which is a function of a suitably chosen regulator z. 
Note that in [6] the four-dimensional Dirac $\delta$ 
distribution guarantees momentum conservation at 
each vertex and restricts the number of four-dimensional 
integrations to the number of independent 
cycles in the graph. It is assumed that the 
reader is familiar with the readily established fact 
that these integrals suffer from UV singularities, 
which render the integration over the momenta in 
internal cycles ill-defined. We also remind the reader 
that the problem persists in coordinate space, where 
one confronts the continuation of products of 
distributions to regions of coinciding support. We 
restrict ourselves here to a discussion of the situation 
in momentum space and refer the reader to the 
literature for the situation in coordinate space. 
Ignoring problems of convergence in the sum over 
all graphs, the problem of renormalization is to 
make sense of these functions term by term: we 
have to determine invertible series $Z^r(\{g\},z)$ in the 
couplings g such that the modified Lagrangian 

$$\tilde{L} = \sum_r Z^r(\{g\},z)\, \hat\phi(r) \qquad [8]$$

produces a perturbation series in graphs that allows 
for the removal of the regulator z. 

This amounts to a transition from unrenormalized 
to renormalized Feynman rules $\phi \to \phi_R$. Let us 
first describe how this transition is achieved using 
the Lie and Hopf algebra structure of the perturbative 
expansion, which is described in detail below: 


• Decide on the free fields and local interactions of 
the theory, appropriately specifying quantum 
numbers (spin, mass, flavor, color, and such) of 
fields, restricting interactions so as to obtain a 
renormalizable theory. 

• Consider the set of all 1PI graphs with edges 
corresponding to free-field propagators. Define 
vertices for local interactions. This allows one to 
construct a pre-Lie algebra of graph insertions. 
Antisymmetrize this pre-Lie product to get a Lie 
algebra $\mathcal{L}$ of graph insertions and define the Hopf 
algebra H which is dual to the enveloping algebra 
$U(\mathcal{L})$ of this Lie algebra. 

• Realize that the coproduct and antipode of this 
Hopf algebra give rise to the forest formula, 
which generates local counter-terms upon introducing 
a Rota-Baxter map, a renormalization 
scheme in physicists’ parlance. 

• Use the Hochschild cohomology of this Hopf 
algebra to show that one can absorb singularities 
in local counter-terms. 

• Determine the corepresentations of this Hopf 
algebra to identify the sub-Hopf algebras corresponding 
to time-ordered products in physical 
fields. This is most easily achieved by rewriting 
the Dyson-Schwinger equations using Hochschild 
1-cocycles. 


The last point exhibits close connections, in parti- 
cular, between the structure of gauge theories and 
the corepresentation theory of their perturbative 
Hopf algebras which we discuss below in brief. 

This program can be carried out in coordinate 
space as well as momentum space renormalization. 
It has given a firm mathematical background to the 
process of renormalization, justifying the practice of 
quantum field theory. The notion of locality has 
achieved a precise formulation in terms of the 
Hochschild cohomology of the perturbation expan- 
sion. In momentum space, this approach emphasizes 
the connections to number theory, which emerge 
when one investigates the role of the Hopf algebra 
primitives, which in turn furnish the Hochschild 
1-cocycles underlying locality. 

The next sections describe the above setup in 
some detail. 


Lie and Hopf Algebras of Graphs 


All algebras are supposed to be over some field K of 
characteristic zero, associative and unital, and 
similarly for coalgebras. The unit (and, by abuse of 
notation, also the unit map) will be denoted by I, 
the counit map by $\bar{e}$. All algebra homomorphisms 
are supposed to be unital. A bialgebra 
$(A = \bigoplus_{i=0}^{\infty} A_i, m, I, \Delta, \bar{e})$ is called graded connected 
if $A_i A_j \subset A_{i+j}$ and $\Delta(A_i) \subset \bigoplus_{j+k=i} A_j \otimes A_k$, and if 
$\Delta(I) = I \otimes I$ and $A_0 = KI$, $\bar{e}(I) = 1 \in K$ and $\bar{e} = 0$ on 
$\bigoplus_{i=1}^{\infty} A_i$. We call $\ker \bar{e}$ the augmentation ideal of A 
and denote by P the projection $A \to \ker \bar{e}$ onto the 
augmentation ideal, $P = \mathrm{id} - I\bar{e}$. Furthermore, we 
use Sweedler’s notation, $\Delta(h) = \sum h' \otimes h''$, for the 
coproduct. We define 

$$\mathrm{Aug}^{(k)} = \big(\underbrace{P \otimes \cdots \otimes P}_{k\ \mathrm{times}}\big)\, \Delta^{k-1}, \qquad A \to \{\ker \bar{e}\}^{\otimes k} \qquad [9]$$

as a map into the k-fold tensor product of the 
augmentation ideal. We let $A^{(k)} = \ker \mathrm{Aug}^{(k+1)} / \ker \mathrm{Aug}^{(k)}$, 
$\forall k \geq 1$. All bialgebras considered here 
are bigraded in the sense that 

$$A = \bigoplus_{i=0}^{\infty} A_i = \bigoplus_{k=0}^{\infty} A^{(k)} \qquad [10]$$

where $A_k \subset \bigoplus_{j=1}^{k} A^{(j)}$ for all $k \geq 1$. $A_0 \simeq A^{(0)} \simeq K$. 
The first construction we have to study is the pre-Lie 
algebra structure of 1PI graphs. 


The Pre-Lie Structure 


For each Feynman graph we have vertices as well as 
internal and external edges. External edges are edges 
that have an open end not connected to a vertex. 
They indicate the particles participating in the 
scattering amplitude under consideration and each 
such edge carries the quantum numbers of the 
corresponding free field. The internal edges and 
vertices form a graph in their own right. For an 
internal edge, both ends of the edge are connected to 
a vertex. 

We consider 1PI Feynman graphs. A graph $\Gamma$ is 
1PI if and only if all graphs, obtained by removal of 
any one of its internal edges, are still connected. 
Such 1PI graphs are naturally graded by their 
number of independent loops, the rank of their 
first homology group $H_{[1]}(\Gamma,\mathbb{Z})$. We write $|\Gamma|$ for 
this degree of a graph $\Gamma$. Note that $|\mathrm{res}(\Gamma)| = 0$, 
where we let $\mathrm{res}(\Gamma)$ be the graph obtained when all 
edges in $\Gamma^{[1]}_{\mathrm{int}}$ shrink to a point, as before. Note that 
the graph obtained in this manner consists of a 
single vertex, to which the edges $\Gamma^{[1]}_{\mathrm{ext}}$ are attached. 
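
The 1PI condition and the loop-number grading are purely combinatorial and can be tested directly. The sketch below is illustrative only: it assumes a graph is given by a list of vertex labels and a list of internal edges (pairs of vertices, external edges ignored), and the function names are hypothetical.

```python
def count_components(vertices, edges):
    """Number of connected components, computed with a small union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(v) for v in vertices})

def loop_number(vertices, edges):
    """Rank of the first homology group: E - V + (number of components)."""
    return len(edges) - len(vertices) + count_components(vertices, edges)

def is_1pi(vertices, edges):
    """True if the graph stays connected after removing any one internal edge."""
    edges = list(edges)
    if count_components(vertices, edges) != 1:
        return False
    return all(
        count_components(vertices, edges[:i] + edges[i + 1:]) == 1
        for i in range(len(edges))
    )

# One-loop self-energy: two vertices joined by two internal edges.
self_energy = ([1, 2], [(1, 2), (1, 2)])
# Two such bubbles joined by a single propagator: connected, but not 1PI.
chain = ([1, 2, 3, 4], [(1, 2), (1, 2), (2, 3), (3, 4), (3, 4)])
print(is_1pi(*self_energy), loop_number(*self_energy))
print(is_1pi(*chain), loop_number(*chain))
```

Multiple edges between the same pair of vertices are allowed, which is why edges are kept in a list rather than a set.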

For a 1PI graph $\Gamma$, $\Gamma^{[0]}$ denotes its set of 
vertices and $\Gamma^{[1]} = \Gamma^{[1]}_{\mathrm{int}} \cup \Gamma^{[1]}_{\mathrm{ext}}$ its set of internal 
and external edges. In addition, let $\omega_r$ be the 
number of spacetime derivatives appearing in the 
corresponding monomial in the Lagrangian. 

Having specified free quantum fields and local 
interaction terms between them, one immediately 
obtains the set of 1PI graphs. One can then consider 
for a given external leg structure r the set of graphs 
with that external leg structure. For a renormalizable 
theory, we can define a superficial degree of 
divergence, 

$$\omega = \sum_{r \in \Gamma^{[1]}_{\mathrm{int}} \cup \Gamma^{[0]}} \omega_r - 4\,|H_{[1]}(\Gamma,\mathbb{Z})| \qquad [11]$$

for each such external leg structure: $\omega(\Gamma) = \omega(\Gamma')$ if 
$\mathrm{res}(\Gamma) = \mathrm{res}(\Gamma')$; all graphs with the same external leg 
structure have the same superficial degree of 
divergence, and only for a finite number of distinct 
external leg structures r will this degree indeed 
signify a divergence. 
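
Formula [11] can be evaluated by simple counting. The following sketch assumes massless quantum electrodynamics and assigns to each internal edge and vertex the number of derivatives in the corresponding monomial of [2] (1 for the fermion propagator, 2 for the photon propagator, 0 for the vertex); these weights and the function name are assumptions made for illustration.

```python
# Assumed weights omega_r: number of spacetime derivatives in the
# corresponding monomial of the massless QED Lagrangian.
OMEGA = {"fermion": 1, "photon": 2, "vertex": 0}

def superficial_degree(internal_edges, n_vertices, loops):
    """omega = sum of omega_r over internal edges and vertices - 4 * loops."""
    edge_part = sum(OMEGA[kind] for kind in internal_edges)
    vertex_part = n_vertices * OMEGA["vertex"]
    return edge_part + vertex_part - 4 * loops

# One-loop graphs for the three external leg structures of eqn [1].
fermion_self_energy = superficial_degree(["fermion", "photon"], 2, 1)
photon_self_energy = superficial_degree(["fermion", "fermion"], 2, 1)
one_loop_vertex = superficial_degree(["fermion", "fermion", "photon"], 3, 1)
# A two-loop vertex graph: four fermion and two photon propagators.
two_loop_vertex = superficial_degree(["fermion"] * 4 + ["photon"] * 2, 5, 2)
print(fermion_self_energy, photon_self_energy, one_loop_vertex, two_loop_vertex)
```

In this example the one-loop and two-loop vertex graphs give the same value, in line with the statement that the degree depends only on the external leg structure.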

This leaves a finite number of external leg structures 
to be considered to which we restrict ourselves from 
now. Our first observation is that there is a natural 
pre-Lie algebra structure on 1PI graphs. 


Hopf Algebra Structure of Renormalizable Quantum Field Theory 681 


To this end, we define a bilinear operation 

$$\Gamma_1 * \Gamma_2 = \sum_\Gamma n(\Gamma_1, \Gamma_2; \Gamma)\, \Gamma \qquad [12]$$

where the sum is over all 1PI graphs $\Gamma$. Here, 
$n(\Gamma_1, \Gamma_2; \Gamma)$ is a section coefficient which counts 
the number of ways in which a subgraph $\Gamma_2$ can be 
reduced to a point in $\Gamma$ such that $\Gamma_1$ is obtained. The 
above sum is evidently finite as long as $\Gamma_1$ and $\Gamma_2$ 
are finite graphs, and the graphs which contribute 
necessarily fulfill $|\Gamma| = |\Gamma_1| + |\Gamma_2|$ and $\mathrm{res}(\Gamma) = \mathrm{res}(\Gamma_1)$. 
One then has the following theorem. 


Theorem 1 The operation $*$ is pre-Lie: 

$$[\Gamma_1 * \Gamma_2] * \Gamma_3 - \Gamma_1 * [\Gamma_2 * \Gamma_3] = [\Gamma_1 * \Gamma_3] * \Gamma_2 - \Gamma_1 * [\Gamma_3 * \Gamma_2] \qquad [13]$$


which is evident when one rewrites the *-product in 
suitable gluing operations. 

To understand this theorem, note that the 
equation claims that the lack of associativity in the 
bilinear operation * is invariant under permutation 
of the elements indexed 2,3. This suffices to show 
that the antisymmetrization of this map fulfills a 
Jacobi identity. Hence, we get a Lie algebra $\mathcal{L}$ by 
antisymmetrizing this operation: 

$$[\Gamma_1, \Gamma_2] = \Gamma_1 * \Gamma_2 - \Gamma_2 * \Gamma_1 \qquad [14]$$
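
The step from the pre-Lie identity to the Jacobi identity can be checked on any concrete pre-Lie algebra. The sketch below does not use Feynman graphs: as a stand-in it takes polynomials in one variable with the product a * b = a'b, which is an assumed toy example satisfying an identity of the form [13]. Polynomials are coefficient lists, and all names are hypothetical.

```python
from itertools import zip_longest

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial is []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q, sign=1):
    return trim(x + sign * y for x, y in zip_longest(p, q, fillvalue=0))

def mul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return trim(out)

def deriv(p):
    return trim([i * c for i, c in enumerate(p)][1:])

def star(a, b):
    """Toy pre-Lie product a * b = a' b (stand-in for graph insertion)."""
    return mul(deriv(a), b)

def associator(a, b, c):
    return add(star(star(a, b), c), star(a, star(b, c)), -1)

def bracket(a, b):
    return add(star(a, b), star(b, a), -1)

def jacobi(a, b, c):
    """Cyclic sum [[a,b],c] + [[b,c],a] + [[c,a],b]."""
    total = add(bracket(bracket(a, b), c), bracket(bracket(b, c), a))
    return add(total, bracket(bracket(c, a), b))

a, b, c = [0, 0, 0, 1], [1, 2], [0, 0, 1]  # x^3, 1 + 2x, x^2
print(associator(a, b, c) == associator(a, c, b), jacobi(a, b, c) == [])
```

In this toy model the associator equals a''bc, which is manifestly symmetric in the last two arguments; the graph-insertion product of the text has the same symmetry by Theorem 1.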


This Lie algebra is graded and of finite dimension in 
each degree. Let us look at a couple of examples for 
pre-Lie products: 


[Equations [15]-[20]: graphical examples of pre-Lie products of Feynman graphs; the diagrams are not reproduced here.] 


Together with $\mathcal{L}$ one is led to consider the dual of its 
universal enveloping algebra $U(\mathcal{L})$ using the theorem 
of Milnor and Moore. For this we use the above 
grading by the loop number. 

This universal enveloping algebra $U(\mathcal{L})$ is built 
from the tensor algebra 

$$T = \bigoplus_k T^k, \qquad T^k = \underbrace{\mathcal{L} \otimes \cdots \otimes \mathcal{L}}_{k\ \mathrm{times}} \qquad [21]$$

by dividing out the ideal generated by the relations 

$$a \otimes b - b \otimes a = [a,b] \in \mathcal{L} \qquad [22]$$

Note that in $U(\mathcal{L})$ we have a natural concatenation 
product $m_*$. Furthermore, $U(\mathcal{L})$ carries a natural 
Hopf algebra structure with this product. For that, 
the Lie algebra $\mathcal{L}$ furnishes the primitive elements: 

$$\Delta(a) = a \otimes 1 + 1 \otimes a, \quad \forall a \in \mathcal{L} \qquad [23]$$

It is, by construction, a connected finitely graded 
Hopf algebra which is cocommutative but not 
commutative. We can then consider its graded 
dual, which will be a Hopf algebra $H(m, I, \Delta, \bar{e})$ 
that is commutative but not cocommutative. One 
finds it upon using a Kronecker pairing 

$$\langle Z_\Gamma, \delta_{\Gamma'} \rangle = \begin{cases} 1, & \Gamma = \Gamma' \\ 0, & \mathrm{else} \end{cases} \qquad [24]$$

The space of primitives of $U(\mathcal{L})$ is in one-to-one 
correspondence with the set Indec(H) of indecomposables 
of H, which is the linear span of its 
generators. One finds the following theorem. 


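Theorem 2 

$$\langle Z_{\Gamma_1} \otimes Z_{\Gamma_2} - Z_{\Gamma_2} \otimes Z_{\Gamma_1}, \Delta(\delta_\Gamma) \rangle = \langle [Z_{\Gamma_1}, Z_{\Gamma_2}], \delta_\Gamma \rangle \qquad [25]$$

For example, one finds 

[Equation [26]: a graphical example evaluating this pairing on explicit graphs, with the result 2; the diagrams are not reproduced here.] 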


H is a graded commutative Hopf algebra which 
suffices to describe renormalization theory, as we 
see in the next section. We have formulated it for 
the superficially divergent 1PI graphs of the theory 
with the understanding that the residues of these 
graphs are in one-to-one correspondence with the 
terms in the Lagrangian of a given theory. Often, 
several terms in a Lagrangian correspond to graphs 
with the same number and type of external legs, but 
correspond to different form-factor projections of 
the graph. In such cases, the above approach can be 
easily adopted considering suitably colored or 
labeled graphs. A similar remark applies if one 
desires to incorporate renormalization of superficially 
convergent Green functions, which requires 
nothing more than the consideration of an easily 
obtained semidirect product of the Lie algebra of 
superficially divergent graphs with the abelian Lie 
algebra of superficially convergent graphs. 


The Principle of Multiplicative Subtraction 


The above algebra structures are available once one 
has decided on the set of 1PI graphs of interest. We 
now use them toward the renormalization of any 
such chosen local quantum field theory. 

From the above, 1PI graphs $\Gamma$ provide the linear 
generators $\delta_\Gamma$ of the Hopf algebra $H = \bigoplus_{i=0}^{\infty} H_i$, 
where $H_{\mathrm{lin}} = \mathrm{span}(\delta_\Gamma)$ and their disjoint union 
provides the commutative product. 

Now let $\Gamma$ be a 1PI graph. We find the Hopf 
algebra H as described above to have a coproduct 
explicitly given as $\Delta: H \to H \otimes H$: 

$$\Delta(\Gamma) = \Gamma \otimes 1 + 1 \otimes \Gamma + \sum_{\gamma \subset \Gamma} \gamma \otimes \Gamma/\gamma \qquad [27]$$

where the sum is over all unions of 1PI superficially 
divergent proper subgraphs, and we extend this 
definition to products of graphs so that we get a 
bialgebra. 

While the Lie bracket inserted graphs into each 
other, the coproduct disentangles them. It is this 
latter operation which is needed in renormalization 
theory: we have to render each subgraph finite before 
we can construct a local counter-term. That is precisely 
what the Hopf algebra structure maps do. 

Having a coproduct, two further structure maps 
of H are immediate: the counit and the antipode. 
The counit $\bar{e}$ vanishes on any nontrivial Hopf 
algebra element, $\bar{e}(1) = 1$, $\bar{e}(X) = 0$. The antipode is 

$$S(\Gamma) = -\Gamma - \sum_{\gamma \subset \Gamma} S(\gamma)\, \Gamma/\gamma \qquad [28]$$


We can work out a few coproducts and antipodes as 
follows: 

[Equations [29]-[34]: graphical examples evaluating $\mathrm{Aug}^{(2)}$ on explicit Feynman graphs; the diagrams are not reproduced here.] 

We give just one example for an antipode: 

[Equation [35]: a graphical example of the antipode S applied to a Feynman graph; the diagram is not reproduced here.] 


Note that for each term in the sum $\Delta(\Gamma) = \sum_i \Gamma'_{(i)} \otimes \Gamma''_{(i)}$ 
we have unique gluing data $G_i$ such that 

$$\Gamma = \Gamma''_{(i)} \circ_{G_i} \Gamma'_{(i)}, \quad \forall i \qquad [36]$$

These gluing data describe the necessary bijections 
to glue the components $\Gamma'_{(i)}$ back into $\Gamma''_{(i)}$ so as to 
obtain $\Gamma$: using them, we can reassemble the whole 
from its parts. Each possible gluing can be interpreted 
as a composition in the insertion operad of 
Feynman graphs. 

We have by now obtained a Hopf algebra 
generated by combinatorial elements, 1PI Feynman 
graphs. Its existence is automatic from the above 
choices of interactions and free fields. What remains 
to be done is a structural analysis of these algebras 
for the renormalizable theories we are confronted 
with in four spacetime dimensions. 

The assertion underlying perturbation theory is 
the fact that meaningful approximations to physical 
observable quantities can be found by evaluating 
these graphs using Feynman rules. 

First, as disjoint scattering processes give rise to 
independent amplitudes, one is led to the study of 
characters of the Hopf algebra, maps $\phi: H \to V$ such 
that $\phi \circ m = m_V(\phi \otimes \phi)$. 

Such maps assign to any element in the Hopf 
algebra an element in a suitable target space V. 
The study of tree-level amplitudes in lowest-order 
perturbation theory justifies assigning to each edge 
a propagator and to each elementary scattering 
process a vertex, which define the Feynman rules 
$\phi(\mathrm{res}(\Gamma))$ and the underlying Lagrangian, on the 
level of residues of these very graphs. Graphs are 
constructed from edges and vertices which are 
provided precisely by the residues of those diver- 
gent graphs, hence one is led to assign to each 
Feynman graph an evaluation in terms of an 
integral over the continuous quantum numbers 
assigned to edges or vertices, which leads to the 
familiar integrals over momenta in closed loops 
mentioned before. 

Then, with the Feynman rules providing a 
canonical character $\phi$, we will have to make one 
further choice: a renormalization scheme. The need 
for such a choice is no surprise: after all we are 
eliminating short-distance singularities in the graphs, 




which renders their remaining finite part ambiguous, 
albeit in a most interesting manner. 

Hence, we choose a map $R: V \to V$, from which 
we obviously demand that it does not modify the 
UV-singular structure, and furthermore that it obeys 


R(xy) + R(x)R(y) = R(R(x)y) + R(xR(y)) [37] 


which guarantees the multiplicativity of renormali- 
zation and is at the heart of the Birkhoff decom- 
position, which emerges below: it tells us that 
elements in V split into two parallel subalgebras 
given by the image and kernel of R. Algebras for 
which such a map exists are known as Rota—Baxter 
algebras. The role Rota—Baxter algebras play for 
associative algebras is similar to the role Yang- 
Baxter algebras play for Lie algebras. The structure 
of these algebras allows one to connect renormaliza- 
tion theory to integrable systems. In addition, most 
of the results obtained initially for a specific 
renormalization scheme, such as minimal subtrac- 
tion, can also be obtained, in general, upon a 
structural analysis of the corresponding Rota—Baxter 
algebras. 
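
The Rota-Baxter relation [37] can be verified directly for minimal subtraction. The sketch below is illustrative: it assumes that V consists of Laurent polynomials in a regulator, stored as dictionaries from exponents to coefficients, and that R projects onto the pole part; the function names are hypothetical.

```python
def lmul(x, y):
    """Product of two Laurent polynomials given as {exponent: coefficient}."""
    out = {}
    for i, a in x.items():
        for j, b in y.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return {k: v for k, v in out.items() if v != 0}

def ladd(x, y, sign=1):
    """x + sign * y, dropping vanishing coefficients."""
    out = dict(x)
    for k, v in y.items():
        out[k] = out.get(k, 0) + sign * v
    return {k: v for k, v in out.items() if v != 0}

def R(x):
    """Minimal subtraction: keep only the pole part (negative exponents)."""
    return {k: v for k, v in x.items() if k < 0}

def rota_baxter_defect(x, y):
    """R(xy) + R(x)R(y) - R(R(x)y) - R(xR(y)); vanishes for a Rota-Baxter map."""
    lhs = ladd(R(lmul(x, y)), lmul(R(x), R(y)))
    rhs = ladd(R(lmul(R(x), y)), R(lmul(x, R(y))))
    return ladd(lhs, rhs, -1)

x = {-2: 1, -1: 3, 0: 5, 1: 7}
y = {-1: 2, 0: -1, 2: 4}
print(rota_baxter_defect(x, y))
```

The defect vanishes because both the image of R (pure pole parts) and its kernel (polynomials without poles) are closed under multiplication, which is the splitting into two subalgebras mentioned above.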

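To see how all the above comes together in 
renormalization theory, we define a further character 
$S^\phi_R$ that deforms $\phi \circ S$ slightly and delivers the 
counter-term for $\Gamma$ in the renormalization scheme R: 

$$S^\phi_R(\Gamma) = -R\, m_V (S^\phi_R \otimes \phi \circ P) \Delta(\Gamma) = -R[\phi(\Gamma)] - R\Big[\sum_{\gamma \subset \Gamma} S^\phi_R(\gamma)\, \phi(\Gamma/\gamma)\Big] \qquad [38]$$

which should be compared with the undeformed 

$$\phi \circ S(\Gamma) = -m_V (\phi \circ S \otimes \phi \circ P) \Delta(\Gamma) = -\phi(\Gamma) - \sum_{\gamma \subset \Gamma} \phi \circ S(\gamma)\, \phi(\Gamma/\gamma) \qquad [39]$$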


The fact that R is a Rota-Baxter map ensures that 
$S^\phi_R$ is an element of the character group G of the 
Hopf algebra, $S^\phi_R \in \mathrm{Spec}(G)$. Note that we have now 
determined the modified Lagrangian: 

$$Z^r = S^\phi_R(\Gamma^r) \qquad [40]$$


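The classical results of renormalization theory 
follow immediately using this group structure: we 
obtain the renormalization of $\Gamma$ by the application 
of a renormalized character 

$$S^\phi_R \star \phi(\Gamma) = m_V (S^\phi_R \otimes \phi) \Delta(\Gamma) \qquad [41]$$

and Bogoliubov’s $\bar{R}$ operation as 

$$\bar{R}(\Gamma) = m_V (S^\phi_R \otimes \phi)(\mathrm{id} \otimes P) \Delta(\Gamma) = \phi(\Gamma) + \sum_{\gamma \subset \Gamma} S^\phi_R(\gamma)\, \phi(\Gamma/\gamma) \qquad [42]$$

so that 

$$S^\phi_R \star \phi(\Gamma) = \bar{R}(\Gamma) + S^\phi_R(\Gamma) \qquad [43]$$

Here, $S^\phi_R \star \phi$ is an element in the group of characters 
of the Hopf algebra, with the group law given by the 
convolution 

$$\phi_1 \star \phi_2 = m_V \circ (\phi_1 \otimes \phi_2) \circ \Delta \qquad [44]$$

so that the coproduct, counit, and coinverse (the 
antipode) give the product, unit, and inverse of this 
group, as befits a Hopf algebra. This Lie group has 
the previous Lie algebra $\mathcal{L}$ of graph insertions as its 
Lie algebra: $\mathcal{L}$ exponentiates to G. 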
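
The recursion for the counter-term and the renormalized character can be run on a toy model. The sketch below is not the Hopf algebra of Feynman graphs: it assumes the much simpler Hopf algebra of ladders t_n, whose coproduct is the sum of t_k ⊗ t_(n-k) over k = 0, ..., n with t_0 = 1, and it assigns to each ladder an arbitrary Laurent polynomial in the regulator as its unrenormalized value. The scheme R is minimal subtraction; all names and input values are hypothetical.

```python
from fractions import Fraction

def lmul(x, y):
    """Product of Laurent polynomials given as {exponent: coefficient}."""
    out = {}
    for i, a in x.items():
        for j, b in y.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return {k: v for k, v in out.items() if v != 0}

def ladd(x, y, sign=1):
    out = dict(x)
    for k, v in y.items():
        out[k] = out.get(k, 0) + sign * v
    return {k: v for k, v in out.items() if v != 0}

def R(x):
    """Minimal subtraction: projection onto the pole part."""
    return {k: v for k, v in x.items() if k < 0}

def renormalize_ladders(phi):
    """phi[n] is the unrenormalized value of the ladder t_n, phi[0] = {0: 1}.

    Returns (S, phi_R): counter-terms and renormalized values, computed by
    the recursion S(t_n) = -R[rbar], phi_R(t_n) = rbar + S(t_n), where
    rbar = phi(t_n) + sum_{0<k<n} S(t_k) phi(t_{n-k}).
    """
    S, phi_R = [{0: 1}], [{0: 1}]
    for n in range(1, len(phi)):
        rbar = dict(phi[n])
        for k in range(1, n):
            rbar = ladd(rbar, lmul(S[k], phi[n - k]))
        counter = ladd({}, R(rbar), -1)
        S.append(counter)
        phi_R.append(ladd(rbar, counter))
    return S, phi_R

# Assumed toy values: phi(t_1) = 1/e + 1, phi(t_2) = 1/(2 e^2) + 1/e + 1.
phi = [{0: 1}, {-1: 1, 0: 1}, {-2: Fraction(1, 2), -1: 1, 0: 1}]
S, phi_R = renormalize_ladders(phi)
print(S[2], phi_R[2])
```

In this toy input the subdivergence of t_2 is removed by the counter-term of t_1 before the overall pole is subtracted, so the counter-term of t_2 contains no 1/e term multiplied by a finite part.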

What we have achieved above is a local renormalization 
of quantum field theory. Let $M^r$ be a 
monomial in the Lagrangian L of degree $\omega_r$: 

$$M^r = D_r\{\phi\} \qquad [45]$$

Then one can prove, using the Hochschild cohomology 
of H: 


Theorem 3 (Locality) 

$$Z^r D_r\{\phi\} = D_r Z^r\{\phi\} \qquad [46]$$

that is, renormalization commutes with infinitesimal 
spacetime variations of the fields. 


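We can now work out the renormalization of a 
Feynman graph $\Gamma$: 

[Equations [47]-[51]: a graphical worked example, not reproduced here. For an explicit graph $\Gamma$ with subdivergences it lists the coproduct $\Delta(\Gamma)$, the Bogoliubov operation $\bar{R}(\Gamma)$, the counter-term $-R[\bar{R}(\Gamma)]$, and the renormalized value $[\mathrm{id} - R]\,\bar{R}(\Gamma)$.] 
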

The formulas [47]-[51] are given in their recursive 
form. Zimmermann’s original forest formula solving 
this recursion is obtained when we trace our 
considerations back to the fact that the coproduct 
can be written in nonrecursive form as a sum over 
forests, and similarly for the antipode. 
Diffeomorphisms of Physical Parameters 


In the above, we have effectively obtained a Birkhoff 
decomposition of the Feynman rules $\phi \in \mathrm{Spec}(G)$ 
into two characters, $\phi^R_+ = S^\phi_R \star \phi \in \mathrm{Spec}(G)$ and 
$\phi^R_- = S^\phi_R \in \mathrm{Spec}(G)$, for any Rota-Baxter map R. 
Thanks to Atkinson’s theorem, this is possible for 
any renormalization scheme R. For the minimal 
subtraction scheme, it amounts to the decomposition 
of the Laurent series $\phi(\Gamma)(\epsilon)$, which has poles of 
finite order in the regulator $\epsilon$, into a part holomorphic 
at the origin and a part holomorphic at 
complex infinity. This has a particularly nice 
geometric interpretation upon considering the 
Birkhoff decomposition of a loop around the origin, 
providing the clutching data for the two half-spheres 
defined by that very loop. 

Whilst in this manner a satisfying understanding 
of perturbative renormalization is obtained, the 
character group G remains rather poorly under- 
stood. On the other hand, renormalization can be 
captured by the study of diffeomorphisms of 
physical parameters as, by definition, the range of 
allowed modification in renormalization theory is 
determined by the variation of the coefficients of 
monomials $\hat\phi(r)$ of the underlying Lagrangian 

$$L = \sum_r Z^r\, \hat\phi(r) \qquad [52]$$


Thus, one desires to obtain the whole Birkhoff 
decomposition at the level of diffeomorphisms of the 
coupling constants. 

The crucial step toward that goal is to realize the 
role of a standard quantum field-theoretic formula 
of the form 

$$g_{\mathrm{new}} = g_{\mathrm{old}}\, Z^g \qquad [53]$$

where 

$$Z^g = \frac{Z^v}{\prod_{e \in \mathrm{res}(v)^{[1]}_{\mathrm{ext}}} \sqrt{Z^e}} \qquad [54]$$

for some vertex v, which obtains the new coupling 
in terms of a diffeomorphism of the old. This 
formula provides, indeed, a Hopf algebra homomorphism 
from the Hopf algebra of diffeomorphisms 
to the Hopf algebra of Feynman graphs, 
regarding $Z^g$ (a series over counter-terms for all 
1PI graphs with the external leg structure corresponding 
to the coupling g), in two different ways: it 
is, at the same time, a formal diffeomorphism in the 
coupling constant $g_{\mathrm{old}}$ and a formal series in Feynman 
graphs. As a consequence, there are two 
competing coproducts acting on $Z^g$. That both give 
the same result defines the required homomorphism, 
which transposes to a homomorphism from the 
largely unknown group of characters of H to the 
one-dimensional diffeomorphisms of this coupling. 

In summary, one finds that a couple of basic 
facts enable one to make a transition from the 
abstract group of characters of a Hopf algebra of 
Feynman graphs (which, incidentally, equals the Lie 
group assigned to the Lie algebra with universal 
enveloping algebra the dual of this Hopf algebra) to 
the rather concrete group of diffeomorphisms of 
physical observables. These steps are given as 
follows: 


• Recognize that Z factors are given as counter-terms 
over a formal series of graphs starting with 
1, graded by powers of the coupling, hence 
invertible. 

• Recognize the series $Z^g$ as a formal diffeomorphism, 
with Hopf algebra coefficients. 

• Establish that the two competing Hopf algebra 
structures of diffeomorphisms and graphs are 
consistent in the sense of a Hopf algebra 
homomorphism. 

• Show that this homomorphism transposes to a Lie 
algebra and hence Lie group homomorphism. 


The effective coupling $g_{\mathrm{eff}}(\epsilon)$ now allows for a 
Birkhoff decomposition in the space of formal 
diffeomorphisms. 


Theorem 4 Let the unrenormalized effective coupling 
constant $g_{\mathrm{eff}}(\epsilon)$ viewed as a formal power 
series in g be considered as a loop of formal 
diffeomorphisms and let $g_{\mathrm{eff}}(\epsilon) = (g_{\mathrm{eff}-})^{-1}(\epsilon)\, g_{\mathrm{eff}+}(\epsilon)$ 
be its Birkhoff decomposition in the group of formal 
diffeomorphisms. Then the loop $g_{\mathrm{eff}-}(\epsilon)$ is the bare 
coupling constant and $g_{\mathrm{eff}+}(0)$ is the renormalized 
effective coupling. 


The above results hold as they stand for any 
massless theory which provides a single coupling 
constant. If there are multiple interaction terms 
in the Lagrangian, one finds similar results relat- 
ing the group of characters of the corresponding 
Hopf algebra to the group of formal diffeomorph- 
isms in the multidimensional space of coupling 
constants. 


The Role of Hochschild Cohomology 


The Hochschild cohomology of the combinatorial 
Hopf algebras which we discuss here plays three 
major roles in quantum field theory: 


1. it allows one to prove locality from the accompanying 
filtration by the augmentation degree 
coming from the kernels $\ker \mathrm{Aug}^{(k)}$; 

2. it allows one to write the quantum equations of 
motion in terms of the Hopf algebra primitives, 
elements in $H_{\mathrm{lin}} \cap \{\ker \mathrm{Aug}^{(2)} / \ker \mathrm{Aug}^{(1)}\}$; and 

3. it identifies the relevant sub-Hopf algebras 
formed by time-ordered products. 


Before we discuss these properties, let us first 
introduce the relevant Hochschild cohomology. 


Hochschild Cohomology of Bialgebras 


Let $(A, m, I, \Delta, \bar{e})$ be a bialgebra, as before. We 
regard linear maps $L: A \to A^{\otimes n}$ as n-cochains and 
define a coboundary map b, $b^2 = 0$, by 

$$bL := (\mathrm{id} \otimes L) \circ \Delta + \sum_{i=1}^{n} (-1)^i \Delta_i \circ L + (-1)^{n+1} L \otimes I \qquad [55]$$

where $\Delta_i$ denotes the coproduct applied to the ith 
factor in $A^{\otimes n}$, which defines the Hochschild cohomology 
of A. 

For the case n = 1, for $L: A \to A$, [55] reduces to 

$$bL = (\mathrm{id} \otimes L) \circ \Delta - \Delta \circ L + L \otimes I \qquad [56]$$

The category of objects (A, C), which consists of 
a commutative bialgebra A and a Hochschild 
1-cocycle C on A, has an initial object $(H_{rt}, B_+)$, 
where $H_{rt}$ is the Hopf algebra of (nonplanar) rooted 
trees, and the closed but nonexact 1-cocycle $B_+$ 
grafts a product of rooted trees together at a new 
root as described below. 

The higher (n > 1) Hochschild cohomology of $H_{rt}$ 
vanishes, but in what follows, the closedness of $B_+$ 
will turn out to be crucial. 


The Hopf Algebra of Rooted Trees 


A rooted tree is a simply connected contractible 
compact graph with a distinguished vertex, the root. 
A forest is a disjoint union of rooted trees. 
Isomorphisms of rooted trees or forests are iso- 
morphisms of graphs preserving the distinguished 
vertex/vertices. Let t be a rooted tree with root o. 
The choice of o determines an orientation of the 
edges of t, away from the root, say. Forests are 
graded by the number of vertices they contain. 

Let $H_{rt}$ be the free commutative algebra generated 
by rooted trees. The commutative product in $H_{rt}$ 
corresponds to the disjoint union of trees, such 
that monomials in $H_{rt}$ are scalar multiples of forests. 
We demand that the linear operator $B_+$ on $H_{rt}$, 
defined by 

$$B_+(I) = \bullet \qquad [57]$$

$$B_+(t_1 \cdots t_n) = \text{the tree obtained by joining the roots of } t_1, \ldots, t_n \text{ to a new common root} \qquad [58]$$

is a Hochschild 1-cocycle, which makes $H_{rt}$ a Hopf 
algebra. The resulting coproduct can be described as 
follows: 


$$\Delta(t) = I \otimes t + t \otimes I + \sum_{\mathrm{adm.}\ c} P_c(t) \otimes R_c(t) \qquad [59]$$

where the sum goes over all admissible cuts of the 
tree t. Such a cut of t is a nonempty set of edges of t 
that are to be removed. The forest which is 
disconnected from the root upon removal of those 
edges is denoted by $P_c(t)$ and the part which remains 
connected to the root is denoted by $R_c(t)$. A cut c(t) 
is admissible if, for each vertex l of t, it contains at 
most one edge on the path from l to the root. 
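
The coproduct [59] can be computed recursively from the cocycle condition [56] with bL = 0, that is, from $\Delta \circ B_+ = B_+ \otimes I + (\mathrm{id} \otimes B_+) \circ \Delta$ together with multiplicativity of $\Delta$. The sketch below is one possible encoding, chosen for illustration: a tree is the sorted tuple of its child subtrees, a forest is a sorted tuple of trees, the empty tuple in forest position stands for I, and tensors are counters indexed by pairs of forests. All names are hypothetical.

```python
from collections import Counter

def B_plus(forest):
    """Graft the trees of a forest onto a new common root."""
    return tuple(sorted(forest))

def forest_product(f, g):
    """Commutative product of forests: disjoint union."""
    return tuple(sorted(f + g))

def coproduct_forest(forest):
    """Coproduct extended multiplicatively to a forest."""
    result = Counter({((), ()): 1})
    for tree in forest:
        updated = Counter()
        for (l1, r1), c1 in result.items():
            for (l2, r2), c2 in coproduct_tree(tree).items():
                key = (forest_product(l1, l2), forest_product(r1, r2))
                updated[key] += c1 * c2
        result = updated
    return result

def coproduct_tree(tree):
    """Delta(B_+(f)) = B_+(f) (x) I + (id (x) B_+) Delta(f), f = children."""
    out = Counter()
    out[((tree,), ())] += 1
    for (left, right), coeff in coproduct_forest(tree).items():
        out[(left, (B_plus(right),))] += coeff
    return out

leaf = ()              # single vertex
ladder2 = (leaf,)      # root with one child
cherry = (leaf, leaf)  # root with two children
for (left, right), coeff in sorted(coproduct_tree(cherry).items()):
    print(coeff, left, right)
```

For the tree with two leaves attached to the root, the three admissible cuts (either edge alone, or both edges) account for the terms beyond $I \otimes t$ and $t \otimes I$: two cuts leave a two-vertex ladder at the root, and one cut leaves a single vertex.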

This Hopf algebra of nonplanar rooted trees is the 
universal object after which all such commutative 
Hopf algebras H providing pairs (H,B), for B a 
Hochschild 1-cocycle, are formed. 


Theorem 5 The pair $(H_{rt}, B_+)$, unique up to 
isomorphism, is universal among all such pairs. In 
other words, for any pair (H,B) where H is a 
commutative Hopf algebra and B a closed nonexact 
1-cocycle, there exists a unique Hopf algebra 
morphism $\rho: H_{rt} \to H$ such that $B \circ \rho = \rho \circ B_+$. 


This theorem suggests that we investigate the 
Hochschild cohomology of the Hopf algebras of 1PI 
Feynman graphs. It clarifies the structure of 1PI 
Green functions. 


The Roles of Hochschild Cohomology 


The Hochschild cohomology of the Hopf algebras of 
1PI graphs sheds light on the structure of 1PI Green 
functions in at least four different ways: 


• it gives a coherent proof of locality of counter-terms: 
the very fact that 

$$[Z^r, D_r] = 0 \qquad [60]$$

means that the coefficients in the Lagrangian 
remain independent of momenta, and hence the 
Lagrangian remains a polynomial expression in 
fields and their derivatives; 

• the quantum equations of motion take a very 
succinct form, identifying the Dyson kernels with 
the primitives of the Hopf algebra; 

• sub-Hopf algebras emerge from the study of the 
Hochschild cohomology, which connects the representation 
theory of these Hopf algebras to the 
structure of theories with internal symmetries; and 

• these Hopf algebras are intimately connected to 
the structure of transcendental functions, such as 
the generalized polylogarithms, which play a 
prominent role these days ranging from applied 
particle physics to recent developments in 
mathematics. 


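To determine the Hochschild 1-cocycles of some 
Feynman graph Hopf algebra H, one determines 
first the primitive graphs $\gamma$ of the Hopf algebra, 
which, by definition, fulfill the condition 

$$\Delta(\gamma) = \gamma \otimes I + I \otimes \gamma \qquad [61]$$

Using the pre-Lie product above, one then determines 
the maps 

$$B^\gamma_+: H \to H_{\mathrm{lin}} \qquad [62]$$

such that 

$$\Delta \circ B^\gamma_+(h) = B^\gamma_+(h) \otimes I + (\mathrm{id} \otimes B^\gamma_+) \Delta(h) \qquad [63]$$

where $B^\gamma_+(h) = \sum_\Gamma n(\gamma, h; \Gamma)\, \Gamma$. The coefficients 
$n(\gamma, h; \Gamma)$ are related to the section coefficients 
noted earlier. 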
Using the definition of the Bogoliubov map $\bar\phi$, this 
immediately shows that 

$$\bar\phi(B^\gamma_+(h)) = \int D_\gamma\, \phi_R(h) \qquad [64]$$

which proves locality of counter-terms upon recognizing 
that $B^\gamma_+$ increases the augmentation degree. 
Here, the insertion of the functions for the subgraph 
is achieved using the relevant gluing data of [36]. 


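To recover the quantum equations of motion from 
the Hochschild cohomology, one proves that 

$$\Gamma^r = 1 + \sum_{\gamma,\ \mathrm{res}(\gamma) = r} \frac{g^{|\gamma|}}{\mathrm{Sym}(\gamma)}\, B^\gamma_+(X^\gamma) \qquad [65]$$

where 

$$X^\gamma = \frac{\prod_{v \in \gamma^{[0]}} \Gamma^v}{\prod_{e \in \gamma^{[1]}_{\mathrm{int}}} \Gamma^e} \qquad [66]$$

has the required solution. Upon application of the 
Feynman rules, the maps $B^\gamma_+$ turn into the integral 
kernels of the usual Dyson-Schwinger equations. 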
This allows for new nonperturbative approaches 
which are a current theme of investigation. 

Finally, we note that the 1-cocycles introduced 
above allow one to determine sub-Hopf algebras of 
the form 


$$\Delta(c^r_k) = \sum_{j=0}^{k} \mathrm{Pol}^r_j(c) \otimes c^r_{k-j} \qquad [67]$$

where the $c^r_k$ are defined in eqn [3]. These algebras 
do not necessitate the considerations of single 
Feynman graphs any longer, but allow one to 
establish renormalization directly for the sum of all 
graphs at a given loop order. Hence, they establish a 
Hopf algebra structure on time-ordered products in 
momentum space. For theories with internal sym- 
metries, one expects and indeed finds that the 
existence of these subalgebras establishes relations 
between graphs that are the same as the Slavnov-Taylor 
identities between the couplings in the Lagrangian. 


Outlook 


Thanks to the Hopf and Lie algebra structures 
described above, quantum field theory has started to 
reveal its internal mathematical structure in recent 
years, which connects it to a motivic theory and 
arithmetic geometry. Conceptually, quantum field 
theory has been the most sophisticated means by 
which a physicist can describe the character of the 
physical law. We have slowly begun to understand
that, in its short-distance singularities, it
encapsulates concepts of matching beauty. We can 
indeed expect local point-particle quantum field 
theory to remain a major topic of mathematical 
physics investigations in the foreseeable future. 


See also: Bicrossproduct Hopf Algebras and 
Noncommutative Spacetime; Exact Renormalization 
Group; Hopf Algebras and q-Deformation Quantum
Groups; Number Theory in Physics; Operads; 
Perturbation Theory and Its Techniques; 
Renormalization: General Theory; von Neumann 
Algebras: Introduction, Modular Theory, and 
Classification Theory.


Introduction


Quantum groups are a remarkable generalization of 
conventional groups using an algebraic language by 
now quite well known to mathematical physicists. 
This language is first and foremost the concept of a 
“Hopf algebra.” In fact, the axioms of a Hopf 
algebra are so attractive from a mathematical point 
of view that they were proposed in the 1940s long 
before the advent of truly representative examples, 
which did not come until the 1980s (from mathe- 
matical physics). Until then, they were used mainly 
by mathematicians as a way for redoing group 
theory and Lie algebra theory in a more uniform 
way. 

It is remarkable that at least three points of view 
lead to the same axioms of a Hopf algebra: 


1. Generalized symmetry. A generalization of a
usual group algebra or enveloping algebra of a
Lie algebra that can nevertheless act on other
algebraic objects. The structure that controls this
is the “coproduct” $\Delta: H \to H \otimes H$, while the
group or Lie structure is encoded in the algebra
H, which is typically not changed up to isomorphism.
$\Delta$ allows H to act on tensor products
and this is needed to define what it means, for
example, for a product $A \otimes A \to A$ of an algebra
to be an intertwiner. The usual flip map between
two representations $V \otimes W \to W \otimes V$ is
typically no longer an intertwiner; instead, that role
is played by an R-matrix solving the Yang–Baxter
equations (YBE).

2. Noncommutative geometry. A generalization of
the coordinate algebra of functions on a conventional
group to allow noncommutative or “quantum”
coordinate algebras. Here the group
structure is encoded in a coproduct $\Delta: H \to H \otimes H$
in a way which would, in the case of functions
on a group, be defined by the group product. It is
typically not changed, the change being in the
algebra.

3. Duality. An object that admits observer-observed
duality or Fourier transform. Such a
duality is known for abelian groups and lost for
nonabelian groups, but re-emerges for Hopf
algebras. If there is to be an algebra with product
$H \otimes H \to H$, then there should also be a
“coproduct” $\Delta: H \to H \otimes H$ to maintain the
duality symmetry. Then a suitable dual space
$H^*$ is also a Hopf algebra, with the roles of
product and coproduct interchanged.


In line with these main ideas are three known classes 
of true quantum groups, and these remain the main 
types of example at the time of writing: the q-deformed
enveloping algebras $U_q(\mathfrak{g})$ of Drinfeld and Jimbo, their
duals as quantizations of the Drinfeld–Sklyanin
Poisson bracket on a simple Lie group (both of these
arising from quantum inverse scattering but also in
the case of $C_q[SU_2]$ from C*-algebras) and the
bicrossproduct quantum groups based on Lie group 
factorizations (arising from ideas for Planck-scale 
physics and quantum gravity). The latter are self-dual 
and hence are both generalized symmetries and 
noncommutative or quantum geometries at the same 
time. The impact of such quantum groups has been 
very far reaching from a mathematician’s point 
of view, spanning revolutions in the theory of knot 
and 3-manifold invariants, Poisson geometry, new 
directions in noncommutative geometry, to name 
some. In physics they are, at the time of writing, 
beginning seriously to be applied in a variety of 
contexts beyond the original ones, such as in book- 
keeping overlapping divergences in general quantum 
field theories, quantum computing, and construction 
of anyons. This article will mention some of these, but 
just as groups have many different roles in physics, 
one can expect that quantum groups and variants of 
them can and will have diverse roles as well. What 
follows is a short overview. 


Hopf Algebras and First Examples 


The general theory works over any field k but (to be 
concrete) we write our examples over C; one can 
also have examples over, say, the field Z2 of two 
elements. A Hopf algebra then is: 


1. An algebra H with unit which is also a 
“coalgebra” with counit, that is, there are maps 
$\Delta: H \to H \otimes H$, $\epsilon: H \to k$ obeying:


\[ (\Delta \otimes \mathrm{id})\Delta = (\mathrm{id} \otimes \Delta)\Delta \]
\[ (\epsilon \otimes \mathrm{id})\Delta = (\mathrm{id} \otimes \epsilon)\Delta = \mathrm{id} \]


2. $\Delta, \epsilon$ should be algebra homomorphisms.
3. There should be a map $S: H \to H$ called the
antipode or “linearized inverse” obeying


\[ \cdot(\mathrm{id} \otimes S)\Delta = \cdot(S \otimes \mathrm{id})\Delta = 1\epsilon \]

If the third axiom is not obeyed one has a
“quantum semigroup” or “bialgebra.” Note also
that S looks nothing like a usual inverse, and it is
not one, yet it plays the same role. For example, we can
define conjugation or the “adjoint action” of any
Hopf algebra on itself by


\[ \mathrm{Ad}_a(b) = \sum a_{(1)}\, b\, S a_{(2)}, \qquad \Delta a = \sum a_{(1)} \otimes a_{(2)} \]


where we use here the “Sweedler notation” for $\Delta a$, a
sum of unspecified pieces in $H \otimes H$. Moreover, if it
exists, then S is unique and (it can be shown)
$S(ab) = (Sb)(Sa)$ for all $a, b \in H$, just like an inverse.

The self-duality of these axioms is evident from 
the first one: a coalgebra is just an algebra with its 
product map $H \otimes H \to H$, unit element (viewed as a
map $k \to H$ sending 1 to 1) and the associativity and
unity axioms all written backwards. Meanwhile,
the middle axiom means in explicit terms
$\Delta(ab) = (\Delta a)(\Delta b)$, $\epsilon(ab) = \epsilon(a)\epsilon(b)$ for all $a, b \in H$
and $\Delta(1) = 1 \otimes 1$, $\epsilon(1) = 1$. This may not look
self-dual but it is equivalent to saying that the product
and unit are coalgebra homomorphisms. Indeed, if
one takes the trouble to write out all the axioms as
commutative diagrams, the set of axioms is invariant
under arrow reversal. Such arrow reversal can
also be concretely implemented, for example, by
taking adjoints. Thus, the coproduct dualizes to a
map $(H \otimes H)^* \to H^*$ and since $H^* \otimes H^* \subseteq (H \otimes H)^*$
we have a product on the dual $H^*$. If the dual space
is defined correctly, one also has a coproduct by 
dualizing the product, etc. One says that two Hopf 
algebras H,H’ are “in duality” if their maps are 
adjoint to each other in such a way. 

The role of quantum groups as generalized 
symmetries is typified by the following examples. 
Thus, let G be a group; then its group algebra CG 
defined as a vector space (written here over C) with 
basis identified with G and product given by the 
group product extended linearly, is a Hopf algebra 
with 


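\[ \Delta g = g \otimes g, \quad \epsilon g = 1, \quad S g = g^{-1}, \quad \forall g \in G \]


Likewise, if $\mathfrak{g}$ is a Lie algebra, then its universal
enveloping algebra $U(\mathfrak{g})$ generated by $\mathfrak{g}$ is a Hopf
algebra with


\[ \Delta\xi = \xi \otimes 1 + 1 \otimes \xi, \quad \epsilon\xi = 0, \quad S\xi = -\xi, \quad \forall \xi \in \mathfrak{g} \]


The two examples are related: if one informally
allows exponentials, then $g = e^{\xi}$ has coproduct


\[ \Delta e^{\xi} = e^{\Delta\xi} = e^{\xi \otimes 1 + 1 \otimes \xi} = e^{\xi} \otimes e^{\xi} \]


using axiom 2 and that $\xi \otimes 1$, $1 \otimes \xi$ commute in the
tensor product algebra.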
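
The group-algebra example can be checked mechanically. The following is a minimal sketch, not taken from the article, assuming the symmetric group S_3 as the group G and representing elements of the group algebra as dictionaries of coefficients; the function names are illustrative. It checks the antipode axiom and the anti-multiplicativity of S on sample elements.

```python
from itertools import permutations

# An element of the group algebra CG is a dict {group element: coefficient}.
# Group elements of S_3 are permutations of (0, 1, 2) stored as tuples.
G = list(permutations(range(3)))
IDENTITY = (0, 1, 2)

def compose(g, h):
    """Group product: (g h)(i) = g(h(i))."""
    return tuple(g[h[i]] for i in range(3))

def inverse(g):
    """Inverse permutation."""
    out = [0, 0, 0]
    for i, gi in enumerate(g):
        out[gi] = i
    return tuple(out)

def clean(a):
    """Drop zero coefficients so that equal elements compare equal."""
    return {g: x for g, x in a.items() if x != 0}

def product(a, b):
    """Product in CG: the group product extended linearly."""
    out = {}
    for g, x in a.items():
        for h, y in b.items():
            k = compose(g, h)
            out[k] = out.get(k, 0) + x * y
    return clean(out)

def coproduct(a):
    """Delta g = g (x) g, stored as {(g, g): coefficient}."""
    return {(g, g): x for g, x in a.items()}

def counit(a):
    """epsilon g = 1, extended linearly."""
    return sum(a.values())

def antipode(a):
    """S g = g^{-1}, extended linearly."""
    return clean({inverse(g): x for g, x in a.items()})

def antipode_axiom_lhs(a):
    """Compute sum a_(1) S(a_(2)); the axiom says this equals counit(a) * 1."""
    out = {}
    for (g, h), x in coproduct(a).items():
        k = compose(g, inverse(h))
        out[k] = out.get(k, 0) + x
    return clean(out)

# Two sample elements of CG with arbitrary integer coefficients.
a = {G[1]: 2, G[3]: -5, G[4]: 7}
b = {G[2]: 3, G[5]: 1}

print(antipode_axiom_lhs(a) == {IDENTITY: counit(a)})
print(antipode(product(a, b)) == product(antipode(b), antipode(a)))
```

Because every map is extended linearly from group elements, checking the axioms on a few elements with generic coefficients exercises the same code paths as a full check on a basis.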

The coproduct structures are therefore implicit
already in Lie theory and group theory. As for any
Hopf algebra, $\Delta$ specifies how the algebra H acts in
a tensor product of two representations. For groups
the tensor product action is diagonal (g acts on each copy);
for Lie algebras it is additive (e.g., the addition of
angular momenta). In general, the action of $a \in H$ is
defined as the action of $\Delta a$ on the tensor product.
This has far-reaching consequences. For example,
for the product $A \otimes A \to A$ of an algebra to be
covariant means that H acting before and after the
product map gives the same answer, and similarly for the
unit map where k has the trivial representation
afforded by $\epsilon$, that is,


\[ h \triangleright (ab) = \sum (h_{(1)} \triangleright a)(h_{(2)} \triangleright b), \qquad h \triangleright 1 = \epsilon(h) 1 \]


for all $a, b \in A$ and $h \in H$. What that means in the
case of a group is therefore $g \triangleright (ab) = (g \triangleright a)(g \triangleright b)$, or
G acts by automorphisms. What it means for a Lie
algebra is $\xi \triangleright (ab) = (\xi \triangleright a)b + a(\xi \triangleright b)$, that is, $\mathfrak{g}$ acts
by derivations. This is how Hopf algebra theory
unifies group theory and Lie algebra theory and
potentially takes us beyond.

In another, dual, point of view, if G is a group
defined by polynomial equations in $\mathbb{C}^n$, then
Hilbert’s “nullstellensatz” in algebraic geometry says
that it corresponds algebraically to a commutative
nilpotent-free algebra with n generators, called its
“coordinate algebra” $H = \mathbb{C}[G]$. The group product
then corresponds to $\Delta$, making $\mathbb{C}[G]$ into a Hopf
algebra. If one replaces $\mathbb{C}$ by any field, one has an
algebraic group over the field. For example, the
group $SL_2(\mathbb{C}) \subset \mathbb{C}^4$ has coordinate algebra generated
by four functions a, b, c, d where a at a matrix
$g \in SL_2(\mathbb{C})$ has value $g_{11}$, the 1,1 entry of the matrix,
similarly $b(g) = g_{12}$, etc. Then $\mathbb{C}[SL_2]$ is the commutative
algebra generated by a, b, c, d with the relation
$ad - bc = 1$. A little thought about matrix multiplication
should convince the reader that


\[ \Delta \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \otimes \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]


where we have written the operation on each
generator as an array and where matrix multiplication
is understood (so $\Delta a = a \otimes a + b \otimes c$, etc.).
The counit and antipode are


\[ \epsilon \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad S \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \]

One could also let G be a finite group, in which case
the algebra $\mathbb{C}(G)$ of (say complex-valued) functions
on it is more obviously a Hopf algebra with


\[ (\Delta a)(g, h) = a(gh), \qquad \epsilon(a) = a(1), \qquad (Sa)(g) = a(g^{-1}) \]

for any function $a \in \mathbb{C}(G)$. Here we identify
$\mathbb{C}(G) \otimes \mathbb{C}(G) = \mathbb{C}(G \times G)$, or functions in two variables on
the group. These examples are dually paired with
$U(\mathfrak{g})$ in the Lie case and $\mathbb{C}G$ in the finite case,
respectively.
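
The statement that the coproduct of the coordinate functions encodes matrix multiplication can be made concrete. The sketch below is illustrative only: it evaluates the coproduct of a, b, c, d on a pair of sample SL_2 matrices (chosen here with rational entries and determinant 1) and compares with the entries of the matrix product, and likewise compares the antipode with the matrix inverse.

```python
from fractions import Fraction

def matmul(g, h):
    """Product of two 2x2 matrices."""
    return [[sum(g[i][k] * h[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(g):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

# Coordinate functions on 2x2 matrices.
def a(g): return g[0][0]
def b(g): return g[0][1]
def c(g): return g[1][0]
def d(g): return g[1][1]

def coproduct(g, h):
    """Values of Delta a, Delta b, Delta c, Delta d on the pair (g, h),
    using Delta a = a(x)a + b(x)c and so on (matrix form of the coproduct)."""
    return [[a(g) * a(h) + b(g) * c(h), a(g) * b(h) + b(g) * d(h)],
            [c(g) * a(h) + d(g) * c(h), c(g) * b(h) + d(g) * d(h)]]

def antipode(g):
    """Values of Sa, Sb, Sc, Sd at g, using S = (d, -b; -c, a)."""
    return [[d(g), -b(g)], [-c(g), a(g)]]

# Sample group elements with determinant 1 (arbitrary choices).
g = [[Fraction(2), Fraction(3)], [Fraction(1), Fraction(2)]]
h = [[Fraction(1), Fraction(4)], [Fraction(0), Fraction(1)]]

print(coproduct(g, h) == matmul(g, h))   # (Delta a)(g, h) = a(gh), etc.
print(antipode(g) == inverse(g))         # (Sa)(g) = a(g^{-1}), etc.
```

Exact rational arithmetic is used so that the comparisons are equalities rather than floating-point approximations.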

In such a coordinate algebra point of view, usual 
constructions in group theory appear expressed 
backwards with arrows reversed. So an action of
the group appears for such a Hopf algebra H as a
“coaction” $\Delta_R: V \to V \otimes H$ (here a right coaction;
one can similarly have $\Delta_L$ a left coaction). It obeys


\[ (\Delta_R \otimes \mathrm{id})\Delta_R = (\mathrm{id} \otimes \Delta)\Delta_R, \qquad (\mathrm{id} \otimes \epsilon)\Delta_R = \mathrm{id} \]


which are the axioms of an algebra acting written
backwards for the coalgebra of H “coacting.” An
example is the right action of a group on itself which
in the coordinate ring point of view is $\Delta_R = \Delta$, that
is, the coproduct viewed as a right coaction. It is the
algebra of H that determines the tensor product of
two coactions, so, for example, A is a coaction
algebra in this sense if $\Delta_R: A \to A \otimes H$ is a coaction
and an algebra homomorphism. Similarly, in this
coordinate point of view, an integral on the group
means a map $\int: H \to k$ and right invariance translates
into invariance under the right coaction, or


\[ \left(\int \otimes \mathrm{id}\right)\Delta = 1 \int \]


There is a theorem that such an integration, if it
exists, is unique up to scale. In the finite-dimensional
case it always exists, for any field k. At least in this
case, let $\exp = \sum_i e_i \otimes f^i$ for a basis $\{e_i\}$ of H and $\{f^i\}$
a dual basis. Then an application of the integral is
Fourier transform $H \to H^*$ defined by


\[ \mathcal{F}(a) = \sum_i \left(\int e_i a\right) f^i \]


with properties that one would expect of Fourier 
transform. The inverse is given similarly the other 
way up to a normalization factor and using the 
antipode of H. This is one among the many results 
from the abstract theory of Hopf algebras, see 
Sweedler (1969) and Larson and Radford (1988) 
among others. 

A given Hopf algebra H does not know which 
point of view one is taking on it; the axioms of a 
Hopf algebra include and unify both enveloping and 
coordinate algebras. So an immediate consequence is 
that constructions which are usual in one point of 
view give new constructions when the wrong point 
of view is taken (put another way, the self-duality of 
the axioms means that any general theorem has a 
second theorem for free, given, if we keep the 
interpretation of H fixed, by reversing all arrows in
the original theorem and its proof). Even the
elementary examples above are quite interesting for 
physics if taken “upside down” in this way. For 
example, if G is nonabelian, then $\mathbb{C}G$ is noncommutative,
so it cannot be functions on any actual
group. But it is a Hopf algebra, so one could think
of it as being like $\mathbb{C}(\hat G)$, where $\hat G$ is not a group but
a quantum group defined as $\mathbb{C}(\hat G) = \mathbb{C}G$. The latter
is a well-defined Hopf algebra viewed the wrong
way. So this is an application of noncommutative
geometry to allow nonabelian Fourier transform
$\mathcal{F}: \mathbb{C}(G) \to \mathbb{C}G$. Similarly, $U(\mathfrak{g})$ is noncommutative
but one could view it upside down as a quantization
of $\mathbb{C}[\mathfrak{g}^*] = S(\mathfrak{g})$ (the symmetric algebra on $\mathfrak{g}$). To do
this let us scale the generators of $\mathfrak{g}$ so that the
relations on $U(\mathfrak{g})$ have the form $\xi\eta - \eta\xi = \lambda[\xi, \eta]$
where $\lambda$ is a deformation parameter. Then the
Poisson bracket that this algebra quantizes
(deforms) is the Kirillov-Kostant one on $\mathfrak{g}^*$ where
$\{\xi, \eta\} = [\xi, \eta]$. Here $\xi, \eta$ on the left-hand side are
regarded as functions on $\mathfrak{g}^*$, while on the right-hand
side we take their Lie bracket and then regard
the result as a function on $\mathfrak{g}^*$. Examples which
have been used successfully in physics include:


\[ [t, x_i] = \imath\lambda x_i \quad \text{(bicrossproduct model } \mathbb{R}^{1,3}_\lambda) \]


\[ [x_i, x_j] = 2\imath\lambda\,\epsilon_{ijk} x_k \quad \text{(spin space model } \mathbb{R}^3_\lambda) \]


(summation understood over k). In both cases, we
may develop geometry on these algebras using
quantum group methods as if they were coordinates
on a usual space (see Bicrossproduct Hopf Algebras
and Noncommutative Spacetime). They are versions
of $\mathbb{R}^n$ because the coproduct which expresses the
addition law on the noncommutative space is the
additive one according to the above. In the second
case, setting the Casimir to the value for a spin j is the
quadratic relation of a “fuzzy sphere.” As algebras,
the latter are just the algebras of $(2j + 1) \times (2j + 1)$
matrices.
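
As an illustration of the fuzzy sphere statement in the smallest case, the following sketch assumes the spin space relations in the form [x_i, x_j] = 2iλ ε_ijk x_k and takes x_i = λσ_i with σ_i the Pauli matrices (the spin 1/2 case, so 2 x 2 matrices). The value of λ and all helper names are illustrative choices, not part of the article.

```python
LAM = 0.5  # deformation parameter lambda (arbitrary illustrative value)

# Pauli matrices sigma_1, sigma_2, sigma_3.
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

# Spin-1/2 realisation x_i = lambda * sigma_i (an assumption of this sketch).
X = [[[LAM * entry for entry in row] for row in s] for s in SIGMA]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def levi_civita(i, j, k):
    """Totally antisymmetric symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) / 2

def commutator_defect(i, j):
    """Largest entry of [x_i, x_j] - 2 i lambda sum_k eps_ijk x_k."""
    lhs = add(matmul(X[i], X[j]), scale(-1, matmul(X[j], X[i])))
    rhs = [[0, 0], [0, 0]]
    for k in range(3):
        rhs = add(rhs, scale(2j * LAM * levi_civita(i, j, k), X[k]))
    return max(abs(lhs[r][c] - rhs[r][c]) for r in range(2) for c in range(2))

def casimir():
    """Quadratic Casimir x_1^2 + x_2^2 + x_3^2."""
    total = [[0, 0], [0, 0]]
    for k in range(3):
        total = add(total, matmul(X[k], X[k]))
    return total

print(max(commutator_defect(i, j) for i in range(3) for j in range(3)))
print(casimir())  # a multiple of the identity: the "sphere" relation
```

In this realisation the Casimir is 3λ² times the identity, which is the quadratic sphere relation for spin 1/2.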

Going the other way, we can take a classical 
coordinate ring C[G] and regard it upside down as 
some kind of group or enveloping algebra but with 
a nonsymmetric $\Delta$. In the finite group case, an
action of $\mathbb{C}(G)$ just means a G-grading. Here if an
element v of a vector space has G-valued degree $|v|$
then $a \triangleright v = a(|v|)v$ is the action of $a \in \mathbb{C}(G)$.
Alternatively, this is the same thing as a right
coaction of $\mathbb{C}G$, $\Delta_R v = v \otimes |v|$. Thus, the notions of
group representation and group grading are also
unified. This is familiar in physics for abelian 
groups (a U(1) action is the same thing as a 
Z-grading) but works fine using Hopf algebra 
methods for nonabelian groups and beyond.

Returning to axioms, if one wants to speak of real
forms and unitary representations, this corresponds,
for Hopf algebras, to H a *-algebra over $\mathbb{C}$ with


A = Te Oa), koS=S lox 


where $\tau$ (throughout this article) denotes transposition
of tensor factors. This requires in particular that
S is invertible (which is not assumed for a general
Hopf algebra though it does hold in the finite- 
dimensional case and in all examples of interest). 
Thus, C[SU2] denotes the above with a certain * 
structure whereby the matrix of generators is unitary. 


q-Deformation Enveloping Algebras 


For a genuinely representative example of a Hopf
algebra, consider $U_q(sl_2)$, defined with noncommutative
generators and relations, coproduct, etc.,


\[ q^{h/2} x_\pm q^{-h/2} = q^{\pm 1} x_\pm, \qquad [x_+, x_-] = \frac{q^{h} - q^{-h}}{q - q^{-1}} \]
\[ \Delta x_\pm = x_\pm \otimes q^{h/2} + q^{-h/2} \otimes x_\pm, \qquad \Delta q^{h/2} = q^{h/2} \otimes q^{h/2} \]
\[ \epsilon x_\pm = 0, \qquad \epsilon q^{h/2} = 1 \]
\[ S x_\pm = -q^{\pm 1} x_\pm, \qquad S q^{h/2} = q^{-h/2} \]


The actual generators here are $x_\pm, q^{\pm h/2}$ but the
notation is intended to be suggestive: if h existed and
we took the limit $q \to 1$, we would have the usual
enveloping algebra of the Lie algebra $sl_2$. The
quantum group $U_q(su_2)$ is the same with the
*-structure $h^* = h$, $x_\pm^* = x_\mp$ when q is real (there are
other possibilities).
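
A quick numerical sanity check of this structure is possible in the spin 1/2 representation. The sketch below assumes the relations in the standard form $q^{h/2} x_\pm q^{-h/2} = q^{\pm 1} x_\pm$, $[x_+, x_-] = (q^h - q^{-h})/(q - q^{-1})$ with coproduct $\Delta x_\pm = x_\pm \otimes q^{h/2} + q^{-h/2} \otimes x_\pm$, represents $x_\pm$ and h by the usual 2 x 2 matrices, and verifies that the coproduct respects the commutation relation on the tensor product of two such representations. The value of q and the helper names are illustrative.

```python
import math

Q = 2.0                 # a generic real value of q (illustrative)
SQ = math.sqrt(Q)       # q^{1/2}

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def kron(a, b):
    """Kronecker product: matrix of a (x) b in the tensor product basis."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def commutator(a, b):
    return add(matmul(a, b), scale(-1, matmul(b, a)))

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Spin-1/2 representation: h = diag(1, -1), x_+ and x_- the raising/lowering matrices.
XP = [[0.0, 1.0], [0.0, 0.0]]
XM = [[0.0, 0.0], [1.0, 0.0]]
K = [[SQ, 0.0], [0.0, 1 / SQ]]       # q^{h/2}
KINV = [[1 / SQ, 0.0], [0.0, SQ]]    # q^{-h/2}

def q_number_of_h(k, kinv):
    """(q^h - q^{-h}) / (q - q^{-1}) built from k = q^{h/2}, kinv = q^{-h/2}."""
    return scale(1 / (Q - 1 / Q),
                 add(matmul(k, k), scale(-1, matmul(kinv, kinv))))

def coproduct_x(x):
    """Delta x = x (x) q^{h/2} + q^{-h/2} (x) x acting on V (x) V."""
    return add(kron(x, K), kron(KINV, x))

# Relations in the representation V.
rel1 = max_abs_diff(matmul(matmul(K, XP), KINV), scale(Q, XP))
rel2 = max_abs_diff(commutator(XP, XM), q_number_of_h(K, KINV))

# The same commutation relation for the coproducts on V (x) V,
# using Delta q^{h/2} = q^{h/2} (x) q^{h/2}.
rel3 = max_abs_diff(commutator(coproduct_x(XP), coproduct_x(XM)),
                    q_number_of_h(kron(K, K), kron(KINV, KINV)))

print(rel1, rel2, rel3)
```

All three printed defects should vanish up to rounding, reflecting that the coproduct is an algebra homomorphism.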

Two words of warning here. Although some
authors write $q = e^{\hbar/2}$, the parameter q here has little
to do with quantization. In fact, the cases of direct
relevance to physics are $q = e^{2\pi\imath/(2+k)}$, where k is the level
of the Wess–Zumino–Witten (WZW) model in which
this quantum group appears as a generalized symmetry.
This quantum group also (first) appeared in the
theory of exactly solvable lattice models, namely the
Ising model with an applied external magnetic field:
$q \ne 1$ is a measure of the resulting nonhomogeneity
of the model. Its origins go further back to the
algebraic Bethe ansatz and the emergence of the YBE
in such models (Baxter 1982). The general $U_q(\mathfrak{g})$
emerged from this context in Drinfeld (1987) and
Jimbo (1985) and the same remark applies (see Affine
Quantum Groups; Yang–Baxter Equations).

The second warning is that at least informally (if
one works with h and allows formal power series
etc.), the algebra here is isomorphic to the usual $U(sl_2)$,
that is, it looks deformed but the true deformation is
not here but in the coproduct, which enters into the
tensor product of representations. The latter are
labeled as usual because the algebra is not really
changed; for example, the unitary ones of $U_q(su_2)$
are labeled by spin. The spin-$\frac{1}{2}$ one even looks the
same, with $x_\pm, h$ represented by the standard Pauli
matrices. Tensor products of representations start
to look different but their multiplicities are the same
as classically and if V, W are representations then
$V \otimes W \cong W \otimes V$. Because the coproduct above is
not symmetric in its two factors, this isomorphism
$\Psi_{V,W} = \tau \circ R_{V,W}$ has $R_{V,W}$ nontrivial. From the
formulas given, the reader can compute that


\[ R_{\frac{1}{2},\frac{1}{2}} = q^{-1/2} \begin{pmatrix} q & 0 & 0 & 0 \\ 0 & 1 & q - q^{-1} & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & q \end{pmatrix} \]


in a tensor product basis. For this particular
quantum group, and others like it, one finds that
these “R-matrices” obey the braid relations as a
version of the YBE. As a result, they can and do lead
to knot invariants; the one above leads to the Jones
knot invariant as a polynomial in q. Briefly, one
represents the knot on a plane, assigns R or $R^{-1}$ to
each braid crossing and takes a suitable trace (see
The Jones Polynomial).
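
The braid relations for the spin 1/2 R-matrix can be verified numerically. The sketch below assumes the matrix in the standard form $q^{-1/2}$ times the 4 x 4 matrix with rows (q, 0, 0, 0), (0, 1, q − 1/q, 0), (0, 0, 1, 0), (0, 0, 0, q) in the tensor product basis, and checks both the YBE in the form R12 R13 R23 = R23 R13 R12 and the braid relation for Ψ = τ∘R on a triple tensor product. The value of q and the helper names are illustrative.

```python
Q = 1.7                     # a generic value of q (illustrative)
C = Q - 1 / Q
PREF = Q ** -0.5

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker product: matrix of a (x) b in the tensor product basis."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I2 = [[1.0, 0.0], [0.0, 1.0]]

# Spin-1/2 R-matrix in the basis e1e1, e1e2, e2e1, e2e2 (assumed form).
R = [[PREF * x for x in row] for row in
     [[Q, 0, 0, 0],
      [0, 1, C, 0],
      [0, 0, 1, 0],
      [0, 0, 0, Q]]]

# Flip map tau on V (x) V.
SWAP = [[1.0, 0, 0, 0],
        [0, 0, 1.0, 0],
        [0, 1.0, 0, 0],
        [0, 0, 0, 1.0]]

# R acting in the indicated pair of factors of V (x) V (x) V.
R12 = kron(R, I2)
R23 = kron(I2, R)
P23 = kron(I2, SWAP)
R13 = matmul(P23, matmul(R12, P23))

ybe_defect = max_abs_diff(matmul(R12, matmul(R13, R23)),
                          matmul(R23, matmul(R13, R12)))

# Braiding Psi = tau o R and the braid relation.
PSI = matmul(SWAP, R)
PSI12 = kron(PSI, I2)
PSI23 = kron(I2, PSI)
braid_defect = max_abs_diff(matmul(PSI12, matmul(PSI23, PSI12)),
                            matmul(PSI23, matmul(PSI12, PSI23)))

print(ybe_defect, braid_defect)
```

The two checks are equivalent formulations of the same identity; both defects should vanish up to rounding.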

Since such features hold in any representation,
these matrices are in fact representations of an
invertible element $\mathcal{R} \in H \otimes H$ provided one allows h
as a generator and formal power series:

\[ \mathcal{R} = q^{\frac{h \otimes h}{2}}\; e_{q^{-2}}^{(1 - q^{-2})\, q^{h/2} x_+ \otimes q^{-h/2} x_-} \]

where

\[ e_q^{x} = \sum_{m=0}^{\infty} \frac{x^m}{[m]_q!}, \qquad [m]_q = \frac{1 - q^m}{1 - q} \]

are the q-exponential and q-integer, respectively.
Their proper explanation is in the section “Braided
groups and quantum planes.” This $\mathcal{R}$ is called the
“universal R-matrix” or quasitriangular structure
and obeys


\[ \tau \circ \Delta = \mathcal{R}(\Delta\ )\mathcal{R}^{-1} \]
\[ (\Delta \otimes \mathrm{id})\mathcal{R} = \mathcal{R}_{13}\mathcal{R}_{23}, \qquad (\mathrm{id} \otimes \Delta)\mathcal{R} = \mathcal{R}_{13}\mathcal{R}_{12} \]


and from the axioms of a Hopf algebra, one may
deduce that the YBE


\[ \mathcal{R}_{12}\mathcal{R}_{13}\mathcal{R}_{23} = \mathcal{R}_{23}\mathcal{R}_{13}\mathcal{R}_{12} \]


hold in the algebra. This induces the YBE for
matrices $R_{V,W}$ in the representation $V \otimes W$. Such a
Hopf algebra is called “quasitriangular” and its
representations form a braided category (see Braided
and Modular Tensor Categories). Even if $\mathcal{R}$ for a
quasitriangular Hopf algebra is defined by a power
series, the $R_{V,W}$ in finite-dimensional representations
are typically actual matrices.

Of considerable interest is the special case when q
is a primitive nth root of unity. In this case the
quasitriangular Hopf algebra $u_q(sl_2)$ has the above
generators but the additional relations

\[ e^n = f^n = 0, \qquad g^n = 1 \]

which render the algebra generated by e, f, g
$n^3$-dimensional. The algebra no longer has a matrix
block decomposition (is not semisimple) and not all
representations descend to it. For example, if n is
odd, then only representations of dimension <n
descend. Other than this, one has many of the
features of a classical enveloping algebra now for
this finite-dimensional object. There is evidence that
such objects over $\mathbb{C}$ are intimately related to
classical Lie algebras but over a finite field.

Finally, there is a similar theory of $U_q(\mathfrak{g})$ for all Lie
algebras determined by symmetrizable Cartan matrices
$\{a_{ij}\}$, including affine ones. Here $i, j \in I$, an indexing set,
and $a_{ij} = 2\, i \cdot j / i \cdot i \in \{0, -1, -2, \ldots\}$ for $i \ne j$, where $\cdot$
is a symmetric bilinear form on the root lattice $\mathbb{Z}[I]$
generated by I with $i \cdot i$ a positive even integer. To be
precise, one should also fix a “root datum” in the form
of an inclusion $\mathbb{Z}[I] \subseteq X$ of the root lattice into a choice
of character lattice X and an inclusion $\mathbb{Z}[I] \subseteq Y$ of the
coroot lattice (also labeled by I) into the cocharacter
lattice Y (the dual of X). Here the evaluation pairing is
required to restrict to $\langle \check\imath, j \rangle = a_{ij}$ if $i, j \in I$ and $\check\imath$ is i
viewed in the cocharacter lattice Y. We let $q_i = q^{i \cdot i / 2}$
and require $q_i^2 \ne 1$ for all i (or one may consider q as an
indeterminate). We have generators $e_i, f^i$ for $i \in I$ and
invertible $g_a$ for each generator a of Y, and the relations


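\[ g_a e_i = q^{\langle a, i \rangle} e_i g_a, \qquad f^i g_a = q^{\langle a, i \rangle} g_a f^i \]
\[ [e_i, f^j] = \delta_i{}^j\, \frac{g_i - g_i^{-1}}{q_i - q_i^{-1}} \]
\[ \sum_{r=0}^{1-a_{ij}} (-1)^r \binom{1-a_{ij}}{r}_{q_i} (e_i)^r e_j (e_i)^{1-a_{ij}-r} = 0 \]


for all $i \ne j$ and an identical set for the $\{f^i\}$. The
coalgebra and antipode are


\[ \Delta e_i = e_i \otimes g_i + 1 \otimes e_i, \qquad \Delta f^i = f^i \otimes 1 + g_i^{-1} \otimes f^i \]
\[ \Delta g_a = g_a \otimes g_a, \qquad \epsilon(g_a) = 1, \qquad \epsilon(e_i) = \epsilon(f^i) = 0 \]
\[ S g_a = g_a^{-1}, \qquad S e_i = -e_i g_i^{-1}, \qquad S f^i = -g_i f^i \]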


The q-Serre relations are those above involving the
q-binomial coefficients, defined now using the
symmetric q-integers $[m]_q = (q^m - q^{-m})/(q - q^{-1})$.
They have their true explanation as


\[ \mathrm{Ad}_{(e_i)^{1-a_{ij}}}(e_j) = 0 \]


where Ad is a braided group adjoint action in the
sense of the section “Braided groups and quantum
planes.” Notice that while the root generators are
modeled on the Lie algebra, the Cartan generators
are modeled on the torus of an algebraic group,
which contains global information. Thus, the more
precise form of $U_q(sl_2)$ is the e, f, g form with the
generator $g = q^h$ as above, with $\mathbb{Z}[I] \subset X$ and
$\mathbb{Z}[I] = Y$. Meanwhile $U_q(psl_2)$ has the square root
of this as generator (what we called $q^{h/2}$ before)
with $\mathbb{Z}[I] = X$ and $\mathbb{Z}[I] \subset Y$ where the strict inclusion
has $\check\imath = 2$ in the lattice $\mathbb{Z}$. Note that, in the
complex case, $SL_2$ has compact real form $SU_2$ while
its quotient, $PSL_2$, has compact real form $SO_3$, so
these are distinguished at the Hopf algebra level. In
general, the root datum has an associated reductive
algebraic group which is simply connected when
$Y = \mathbb{Z}[I]$ and generated by its adjoint representation
when $X = \mathbb{Z}[I]$. The complexified character lattice is
a sublattice of the more familiar Lie algebra weight
lattice and labels representations that extend to the
(algebraic) group. Langlands duality interchanges
the roles of X, Y. These subtleties are lost when we
work over formal power series with $q = e^{\hbar/2}$ and
Lie-algebra-like Cartan generators.

These objects are mathematically so interesting 
that some authors define “quantum groups” as 
nothing more than this particular extension of the 
theory of Lie algebras, Cartan matrices and root 
systems. Among the deepest theorems is the exis- 
tence of the Lusztig—Kashiwara canonical basis 
which is obtained from q = 0 but valid also at
q = 1 (i.e., for classical enveloping algebras) and
which has the remarkable property of inducing bases 
coherently across highest-weight representations. 
From a physicist’s point of view, however, there 
are many other Hopf algebras rather more closely 
connected with actual quantization. Most often, the 
terms quantum group and Hopf algebra are used 
interchangeably. 

There is similarly a reduced version $u_q(\mathfrak{g})$. The
simplest of all possible cases, even simpler than
$u_q(sl_2)$, is for what one could call $u_q(1)$ with a single
generator g and


\[ g^n = 1, \quad \Delta g = g \otimes g, \quad \epsilon g = 1, \quad S g = g^{-1} \]
\[ \mathcal{R} = \frac{1}{n} \sum_{a,b=0}^{n-1} q^{-ab}\, g^a \otimes g^b \]

where q is a primitive nth root of unity. The Hopf
algebra is the same as the group algebra
$\mathbb{C}\mathbb{Z}_n = \mathbb{C}(\mathbb{Z}_n)$ but the $\mathcal{R}$ is nontrivial. A representation
means a $\mathbb{Z}_n$-graded space, that is, graded into
degrees $0, 1, \ldots, n-1$. The braiding matrices have
the diagonal form $R_{V,W} = q^{ab}$ on components of
degree a, b, respectively. The braided category
generated in this case is the one where anyons live.
From this point of view, $u_q(\mathfrak{g})$ generates the category
where nonabelian anyons live. Here $\mathcal{R}_{q^2}$ (the above $\mathcal{R}$
with $q^2$ in place of q, taking the place of
$q^{(h \otimes h)/2}$) along with an additional $e_{q^{-2}}$ factor as
above gives the quasitriangular structure of $u_q(sl_2)$.
The physical model here is the rational conformal
field theory mentioned above with these anyons as
particular bound states. There is a proposal to use
them in the construction of quantum computers.
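
The diagonal form of the anyonic braiding can be checked directly. The sketch below assumes the quasitriangular structure in the form R = (1/n) Σ q^{−st} g^s ⊗ g^t (sum over s, t from 0 to n−1), with q = exp(2πi/n) and with g acting on a vector of degree d as multiplication by q^d; these conventions and the function name are assumptions made for illustration.

```python
import cmath

def braiding_phase(n, a, b):
    """Evaluate R = (1/n) sum_{s,t} q^{-st} g^s (x) g^t on a pair of vectors
    of degrees a and b, where g acts on degree d by q^d and q = exp(2 pi i / n)."""
    q = cmath.exp(2j * cmath.pi / n)
    total = 0
    for s in range(n):
        for t in range(n):
            total += q ** (-s * t) * q ** (s * a) * q ** (t * b)
    return total / n

n = 5
q = cmath.exp(2j * cmath.pi / n)
# Largest deviation of the evaluated R from q^{ab} over all pairs of degrees.
worst = max(abs(braiding_phase(n, a, b) - q ** (a * b))
            for a in range(n) for b in range(n))
print(worst)
```

The double sum collapses because the sum over t of q^{t(b − s)} vanishes unless s = b modulo n, which is the finite Fourier inversion underlying the result q^{ab}.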


q-Deformation Coordinate Algebras 


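From the coordinate algebra point of view, the
corresponding deformation to the one in the last
section is the Hopf algebra $C_q[SL_2]$ with noncommuting
generators and relations


\[ ca = q\,ac, \quad ba = q\,ab, \quad db = q\,bd, \quad dc = q\,cd \]
\[ bc = cb, \quad da - ad = (q - q^{-1})\,bc, \quad ad - q^{-1} bc = 1 \]


The coalgebra has the same matrix form on the
generators as for $\mathbb{C}[SL_2]$ and the antipode and
*-structure (for $C_q[SU_2]$) are


\[ S \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} d & -qb \\ -q^{-1}c & a \end{pmatrix}, \qquad \begin{pmatrix} a^* & b^* \\ c^* & d^* \end{pmatrix} = \begin{pmatrix} d & -q^{-1}c \\ -qb & a \end{pmatrix} \]


Its duality pairing with $U_q(sl_2)$ is afforded by the
2 x 2 Pauli-matrix representation of the latter. The
$C_q[SU_2]$ Hopf *-algebra may be completed to a
C*-algebra.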

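One similarly has $C_q[G]$ for all semisimple Lie
groups G and their various real forms. From an
axiomatic point of view, such quantum groups are
“coquasitriangular” in the sense that there is a map
$\mathcal{R}: H \otimes H \to k$ such that


\[ \sum \mathcal{R}(a_{(1)} \otimes b_{(1)})\, a_{(2)} b_{(2)} = \sum b_{(1)} a_{(1)}\, \mathcal{R}(a_{(2)} \otimes b_{(2)}) \]

for all $a, b \in H$ and

\[ \mathcal{R}(ab \otimes c) = \sum \mathcal{R}(a \otimes c_{(1)})\, \mathcal{R}(b \otimes c_{(2)}) \]
\[ \mathcal{R}(a \otimes bc) = \sum \mathcal{R}(a_{(1)} \otimes c)\, \mathcal{R}(a_{(2)} \otimes b) \]

for all $a, b, c \in H$. We also require that $\mathcal{R}$ is
invertible in a certain sense. These are just the
arrow reversal of the axioms of a quasitriangular
structure. In general, for the deformation of a linear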
algebraic group we will have some $n^2$ generators $t^i{}_j$,
now taken to be noncommutative, and with a
matrix form of coalgebra


\[ \Delta t^i{}_j = \sum_k t^i{}_k \otimes t^k{}_j, \qquad \epsilon\, t^i{}_j = \delta^i{}_j \]


For the compact real form we will have $(t^i{}_j)^* = S t^j{}_i$.
Moreover, from the first of the above axioms we
will have among the relations

\[ R^i{}_m{}^k{}_n\, t^m{}_j\, t^n{}_l = t^k{}_n\, t^i{}_m\, R^m{}_j{}^n{}_l \]

where $R^i{}_j{}^k{}_l = \mathcal{R}(t^i{}_j \otimes t^k{}_l)$ is a matrix $R \in M_n \otimes M_n$
obeying the YBE. If we take only these quadratic
relations, we have the “Faddeev–Reshetikhin–Takhtajan
(FRT) bialgebra” A(R) and it can be shown (see
Majid 1995) that R extends to a coquasitriangular
structure $\mathcal{R}$ on it. However, in our case we also have


\[ (R^{-1})^i{}_j{}^k{}_l = \mathcal{R}(S t^i{}_j \otimes t^k{}_l) \]
\[ \tilde R^i{}_j{}^k{}_l = \mathcal{R}(t^i{}_j \otimes S t^k{}_l) \]


where $\tilde R = ((R^{t_2})^{-1})^{t_2}$ ($t_2$ denoting transposition in the second
factor of $M_n \otimes M_n$) is called the “second inverse” of R. With
these additional matrices, one may define a q-determinant
and antipode relations as well (Majid 1995). One
may also generate a rigid braided monoidal category
and reconstruct a Hopf algebra A(R) from it. In this
way, the R-matrix plays a role similar to that of the
structure constants of a Lie algebra and can in
principle define the quantum group coordinate algebra.
Such R-matrices have been classified in low
dimension and include multiparameter and other
deformations of classical group coordinate algebras as
well as other nonstandard quantum groups.

In the $C_q[G]$ examples it is not the coalgebra which
is essentially deformed but the algebra. We already see
this above on the generators but the coproduct of a
product of generators may look different. Nonetheless,
one can identify the vector space that the products
generate with that of $\mathbb{C}[G]$ and at least informally with
respect to a deformation parameter express the
product as a power series in the undeformed product
(a $\star$-product deformation). For generic values, one still
has a Peter-Weyl decomposition $C_q[G] = \oplus_V (V \otimes V^*)$,
where the sum is over irreducible corepresentations,
which can be identified with the classical representations
of the algebraic group. One can make the same
decomposition for $\mathbb{C}[G]$ and identify the matrix blocks
$V \otimes V^*$ in order to find this $\star$-product. Also, since this
is a flat deformation, it follows that the commutator at
lowest order defines a Poisson bracket on G, given by


\[ \{t^i{}_j, t^k{}_l\} = t^i{}_m t^k{}_n\, r^m{}_j{}^n{}_l - r^i{}_m{}^k{}_n\, t^m{}_j t^n{}_l \]

and this Poisson bracket is compatible with the group
product $G \times G \to G$ as a Poisson map (because the
Hopf algebra coproduct was an algebra map). Here r
is the first order part in the expansion of the
R-matrix. A Lie group equipped with a Poisson
bracket compatible in this way is called a “Poisson
Lie group.” On general functions its Poisson bivector
is generated by the first order part $r \in \mathfrak{g} \otimes \mathfrak{g}$ in the
expansion of $\mathcal{R}$ in the q-deformed enveloping
algebra. In place of the YBE obeyed by $\mathcal{R}$, we have
the “classical Yang–Baxter equations (CYBE),”


\[ [r_{12}, r_{13}] + [r_{12}, r_{23}] + [r_{13}, r_{23}] = 0 \]


In this way, one may characterize an “infinitesimal
version” of $U_q(\mathfrak{g})$ as $(\mathfrak{g}, r, \delta)$ where $\delta: \mathfrak{g} \to \mathfrak{g} \otimes \mathfrak{g}$ is
the leading part of $\tau \circ \Delta - \Delta$ and makes the triple into
a quasitriangular “Lie Bialgebra” (see Classical
r-Matrices, Lie Bialgebras, and Poisson Lie Groups).
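
For a concrete instance of the CYBE, one can take the textbook r-matrix for sl_2, namely r = e ⊗ f + (1/4) h ⊗ h in terms of the usual generators e, f, h. This particular r and its normalization are a standard example brought in here for illustration, not a formula quoted from the article. The sketch below evaluates the three commutators in the triple tensor product of the 2 x 2 representation and checks that they sum to zero.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def kron(a, b):
    """Kronecker product: matrix of a (x) b in the tensor product basis."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def commutator(a, b):
    return add(matmul(a, b), scale(-1, matmul(b, a)))

I2 = [[1.0, 0.0], [0.0, 1.0]]
E = [[0.0, 1.0], [0.0, 0.0]]
F = [[0.0, 0.0], [1.0, 0.0]]
H = [[1.0, 0.0], [0.0, -1.0]]

# r = e (x) f + (1/4) h (x) h, stored as (coefficient, left factor, right factor).
R_TERMS = [(1.0, E, F), (0.25, H, H)]

def r_in_slots(s, t):
    """r placed in tensor slots s and t (0-based) of a triple tensor product."""
    total = [[0.0] * 8 for _ in range(8)]
    for coeff, left, right in R_TERMS:
        factors = [I2, I2, I2]
        factors[s] = left
        factors[t] = right
        term = kron(kron(factors[0], factors[1]), factors[2])
        total = add(total, scale(coeff, term))
    return total

r12, r13, r23 = r_in_slots(0, 1), r_in_slots(0, 2), r_in_slots(1, 2)

cybe = add(add(commutator(r12, r13), commutator(r12, r23)),
           commutator(r13, r23))
cybe_defect = max(abs(x) for row in cybe for x in row)
print(cybe_defect)
```

Each commutator is individually nonzero; only the sum of the three vanishes, which is the content of the equation.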

Finally, returning to our example, when q is an
nth root of unity, one has the q-Frobenius Hopf
algebra homomorphism


\[ \mathbb{C}[SL_2] \to C_q[SL_2], \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto \begin{pmatrix} a^n & b^n \\ c^n & d^n \end{pmatrix} \]

that is, a classical copy sitting inside the quantum
group. Quotienting by this means adding the
relations


\[ a^n = d^n = 1, \qquad b^n = c^n = 0 \]


which gives a finite-dimensional reduced quantum
group. Similarly for other $C_q[G]$. These
reduced quantum groups provide finite noncommutative
geometries having the geometric flavor of the
classical geometry but where geometry and physics
(such as electromagnetic gauge theory modes) are
fully computable.


Self-Dual Quantum Groups 


The arrow-reversibility of the axioms of a quantum
group makes it possible to search for self-dual
quantum groups or for quantum groups which, if 
not self-dual, have a self-dual form. This leads to the 
bicrossproduct quantum groups coming from mod- 
els of quantum gravity (Majid 1988) (see Bicross- 
product Hopf Algebras and Noncommutative 
Spacetime). 

The context here is that of Figure 1, which shows
how Hopf algebras relate to other objects and to
duality in a representation-theoretic sense. Along the
central axis, we have put self-dual categories or, in
physical terms, categories admitting Fourier transform.
This is clear for abelian groups, where the
dual $\hat G$ of an abelian group G is also an abelian
group. Below the axis, we have nonabelian groups
which we view as toy models of geometries with
curvature. Every compact Lie group, for example,
has an associated Killing metric. Above the axis, a
nonabelian group dual $\hat G$ means to construct unitary
representations etc., which we view as toy models of
quantum theory. We have seen that Hopf algebras
are another self-dual category and provide a framework
in which both groups and group duals can be
unified (see the section “Hopf algebras and first
examples”). Thus, G can be viewed as a coordinate
Hopf algebra $\mathbb{C}(G)$ or $\mathbb{C}[G]$ in the finite or Lie cases,
and $\hat G$ as the dual Hopf algebra $\mathbb{C}G$ or $U(\mathfrak{g})$ as a
definition of the coordinate algebra “$\mathbb{C}(\hat G)$.” Note that
$\hat G$ is not merely the set of representations, as these
alone are not enough to reconstruct the group (e.g.,
both $S^1$ and $SO_3$ have the same set). We see that Hopf
algebras are a microcosm for the unification of
quantum theory and gravity. Hopf algebra duality
interchanges the role of position and momentum on
the one hand and of quantum and gravitational effects
on the other. A self-dual Hopf algebra has both aspects
unified and interchanged by the self-duality.

Figure 1 Role of Hopf algebras along the self-dual axis.
[Diagram not reproduced; its labels are: group duals,
quantum theory, monoidal categories, Riemannian geometry,
Hopf algebras, abelian, nonabelian.]

One can also ask what the next most general self- 
dual category of objects is in which to look for more 
general unifications. One answer here is the category 
whose objects are themselves categories C equipped 
with a tensor product (a “monoidal category”) and a 
monoidal functor to a fixed monoidal category V. 
Motivated by the above, a theorem from the 1980s 
is that for any such C there is a dual $C^\circ$ of
“representations in V” (Majid 1991a). The dotted
arrows in Figure 1 indicate that this may be a setting
for more ambitious models than those achieved by
Hopf algebras alone. In fact, the $C^\circ$ construction was
one of the ingredients going into the invention of 
2-categories a few years later. See also several 
articles on TQFT (such as Topological Quantum 
Field Theory: Overview; Axiomatic Approach to 
Topological Quantum Field Theory; Duality in 
Topological Quantum Field Theory).

The simplest self-dual quantum group is C[x] as
the Hopf algebra of polynomial functions on a line
with additive coproduct. This is dually paired with
itself in the form of the enveloping algebra
$U(gl_1) = C[p]$ with pairing

\[ \langle p^n, x^m\rangle = (-i)^n\,\delta_{n,m}\,n! \]

and similarly for higher-dimensional flat space. In
the case of C[x], a basis is $x^n$ and from the above the
dual basis is $(ip)^n/n!$. Hence the canonical element is
$\exp = e^{i x\otimes p}$, so that Hopf algebra Fourier transform
on a suitable completion of these algebras reduces to
the usual Fourier transform.
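
As a brief check of the last statement, the canonical element can be summed explicitly. This is only a sketch, assuming the pairing is normalised as $\langle p^n, x^m\rangle = (-i)^n\delta_{n,m}n!$, which is the normalisation under which $(ip)^n/n!$ is dual to $x^n$:

```latex
\Big\langle \frac{(ip)^n}{n!},\, x^m \Big\rangle
  = \frac{i^n}{n!}\,(-i)^n\,\delta_{n,m}\,n! = \delta_{n,m},
\qquad
\exp = \sum_{n\ge 0} x^n \otimes \frac{(ip)^n}{n!} = e^{\,i\,x\otimes p}.
```

Read as a function of a position variable and a momentum variable, this is the plane-wave kernel of the ordinary Fourier transform.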

A more nontrivial example (Majid 1988) is given
by the “Planck-scale Hopf algebra” C[x]▶◁C[p],
which has algebra and coalgebra

\[ [p,x] = i\hbar(1 - e^{-\gamma x}), \qquad \Delta x = x\otimes 1 + 1\otimes x, \]
\[ \Delta p = p\otimes e^{-\gamma x} + 1\otimes p, \qquad \epsilon x = \epsilon p = 0, \]
\[ Sx = -x, \qquad Sp = -p\,e^{\gamma x}. \]

The actual generator here should be $e^{\gamma x}$ rather than
x for an algebraic treatment (otherwise one should
allow power series or use C*-algebras). The dually
paired Hopf algebra has the same form C[p]▷◀C[x],
with new parameters $\hbar' = 1/\hbar$ and $\gamma' = \hbar\gamma$, and
quantum group Fourier transform connects the
two. More details and the general construction of
Hopf algebras C[M]▶◁U(g) with dual U(m)▷◀C[G]
are in the article on “bicrossproduct” Hopf algebras 
(see Bicrossproduct Hopf Algebras and Noncommu- 
tative Spacetime). These quantize particles in M 
moving under momentum Lie group G with Lie 
algebra g and vice versa. The states of one (in a 
C*-algebra context) lie in the algebra of observables of 
the other (“observable-state duality”). The data 
required are a matched pair of actions of (G,M) 
on each other. Such equations correspond locally to 
a factorization of a larger group G ⋈ M but
typically have singularities and other features in 
keeping with a toy model of Einstein’s equations. 
There are, by the time of writing, many applica- 
tions of bicrossproducts beyond the original one, 
including a Poincaré quantum group for the RO 
mentioned in the section “Hopf algebras and 
first examples,” with links to Planck-scale physics. 
There is also a bicrossproduct quantum group
C[G*]▶◁U(g) canonically associated to any simple
Lie algebra g and related to T-duality. The classical
data here are Lie bialgebras and solutions of the
CYBE as in the section “q-Deformation coordinate
algebras”; however, there is no known relation with
the q-deformation Hopf algebras themselves. Finite
group bicrossproducts are also interesting, and
examples (but not with both actions nontrivial)
were already in the works of G I Kac in the 1960s.
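
For the Planck-scale Hopf algebra, the antipode axiom on the generator p can be checked in two lines. This is a sketch under the assumption that the structure maps read $\Delta p = p\otimes e^{-\gamma x} + 1\otimes p$, $Sx = -x$ (so $S e^{-\gamma x} = e^{\gamma x}$) and $Sp = -p\,e^{\gamma x}$; the parameter name $\gamma$ is our reading of the displayed formulas:

```latex
\cdot\,(S\otimes \mathrm{id})\Delta p
  = S(p)\,e^{-\gamma x} + S(1)\,p
  = -p\,e^{\gamma x}e^{-\gamma x} + p = 0 = \epsilon(p),
\\
\cdot\,(\mathrm{id}\otimes S)\Delta p
  = p\,S(e^{-\gamma x}) + S(p)
  = p\,e^{\gamma x} - p\,e^{\gamma x} = 0 = \epsilon(p).
```

Neither line uses the commutation relation $[p,x]$, so the check is insensitive to the value of $\hbar$.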


These constructions also work when the groups
above are themselves Hopf algebras. For example,
any finite-dimensional Hopf algebra H has a
“quantum double” $D(H) = H \bowtie H^{*\mathrm{op}}$, where the
double cross product $\bowtie$ is by mutual coadjoint
actions. The cross-relations between the two
sub-Hopf algebras are

\[ \sum h_{(1)} a_{(1)} \langle a_{(2)}, h_{(2)}\rangle = \sum \langle a_{(1)}, h_{(1)}\rangle\, a_{(2)} h_{(2)} \]

for $h\in H$ and $a\in H^*$. The construction is due to
Drinfeld (1987) while the $\bowtie$ form is due to the
author. Moreover, D(H) is quasitriangular with
R = exp, the canonical element used in the Fourier
transform on H. Its representations consist of vector
spaces where H acts and at the same time $H^{*\mathrm{op}}$ acts
or (which makes sense when H is infinite dimensional)
where H coacts, in a compatible way. Such
objects are called “crossed modules” because when
H = CG, one has exactly a linearization of the
crossed G-sets of J H C Whitehead. They are a special
case of the C° construction mentioned above.
Finally, one can also view the q-deformed linear
spaces on which quantum groups such as $U_q(g)$ act
as self-dual Hopf algebras under an additive
coproduct. However, this needs to be as braided
groups or Hopf algebras with braid statistics; see the
next section. The simplest example here is the
“braided line” B = C[x], developed not as above but
as a self-dual Hopf algebra with q-statistics. Its
“bosonization” gives a self-dual Hopf algebra
$U_q(b_+)\subset U_q(sl_2)$, and similarly for other
$U_q(b_+)\subset U_q(g)$. Perhaps more surprisingly, the quantum
groups $U_q(g)$ and $C_q[G]$ also both have canonical
braided group versions (a process called “transmutation”)
and as such they too are isomorphic. This
isomorphism extends the linear isomorphism $g\cong g^*$
afforded by the Killing form of any semisimple Lie
algebra. In physical terms, what this means is that
there is in q-deformed geometry just one self-dual
object $B_q(G)$ with two different scaling limits

\[ U(g) \leftarrow B_q(G) \rightarrow C[G] \]

as $q\to 1$, and the structure of which underlies the
deeper structure of $U_q(g)$ and $C_q[G]$ as well.


Braided Groups and Quantum Planes 


A super quantum group or super-Hopf algebra is
not a quantum group or Hopf algebra since the key
homomorphism property of $\Delta: H\to H\otimes H$ is modified:
one must use in the target $H\otimes H$ the $Z_2$-graded
or super tensor product of super algebras. Here,

\[ (a\otimes b)(c\otimes d) = (-1)^{|b||c|}\, ac\otimes bd \]

for elements of degree |b|, |c|. Super quantum groups
$U_q(gl_{m|n})$ etc. have been constructed and have an
analogous theory to the bosonic versions above.
Super spaces in physics are associated to differential
forms and in the same way a bicovariant exterior
algebra on a quantum group H is generally a super
quantum group. Here the exterior algebra is
generated by 1-forms and the coproduct on 1-forms is

\[ \Delta = \Delta_L + \Delta_R \]

Here $\Delta_{L,R}$ are the coactions of H on 1-forms
induced by the left and right coaction of H on itself.

For a true understanding of quantum groups one
must, however, go beyond such objects to “braided
groups” or Hopf algebras with braid statistics (see
Majid 1995). This theory was introduced by the
author in the early 1990s as a more systematic
method for q-deformation of structures in physics
based on q-group covariance. We have seen that a
quasitriangular quantum group, or any Hopf algebra
through its double, generates a braided category
with the flip map $\tau$ replaced by a braiding $\Psi_{V,W}$
between any two representations. Anything which is
covariant under the quantum group means by
definition that it lives in the braided category.
Working with such “braided algebras” is similar to
working with superalgebras except that one should
use $\Psi$ in place of the graded transposition in any
algebraic construction. In particular, two braided
algebras have a natural “braided tensor product”
also in the category. In concrete terms,

\[ (a\otimes b)(c\otimes d) = a\,\Psi(b\otimes c)\,d \]

Then a Hopf algebra in the braided category or
braided group is B, an algebra in the category along
with a coalgebra and antipode, where $\Delta: B\to B\,\underline{\otimes}\,B$
is an algebra homomorphism to the braided tensor
product (see Braided and Modular Tensor Categories).

Next, we have mentioned in the section
“q-Deformation enveloping algebras” that q-algebras
generate topological invariants, but we now
turn this on its head and use braid diagrams to do
q-algebra. We write all operations as flowing down
the page; any transpositions in the algebraic
construction are expressed as a braid crossing $\Psi$ or
its inverse by the reversed braid crossing, and any
other operations as nodes. Thus, a product is
denoted by a node where two strands merge into one
and a coproduct by a node where one strand splits
into two. Algebraic information “flows” along these
“wires” much like the way
that information flows along the wiring in a
computer, except that under- and over-crossings
represent distinct nontrivial operators. (In fact, one
may formulate topological quantum computers
exactly in this way.) In this notation, tensor
products are denoted by juxtaposition and the trivial
object in the category is omitted. In particular, one
has the axioms and all general theorems of Hopf
algebras at this diagrammatic level. For example, the
adjoint action of any braided group B on itself is
given by a braid diagram (see Majid 1995; the diagram
is not reproduced here).

In any concrete example, such diagrams turn into
R-matrix formulas where $\Psi = \tau\circ R$, as explained in the
section “q-Deformation enveloping algebras.”

A basic example of a braided group is the braided
q-plane $C_q^2$ with generators x, y and relations
yx = qxy. Its coproduct is the additive one
$\Delta x = x\otimes 1 + 1\otimes x$ (and similarly for y), reflecting addition in
the plane, but this is extended to products as a
braided group with braiding $q^{1/2}R_{1/2,1/2}$ in terms of
the R-matrix in the section “q-Deformation enveloping
algebras.” The extra factor here means that
$C_q^2$ lives in the braided category of representations of
$U_q(gl_2) = \widetilde{U_q(sl_2)}$ (i.e., with an additional central
$U_q(1)$ generator to provide the $q^{1/2}$). More precisely,
the category is that of corepresentations of
$C_q[GL_2] = \widetilde{C_q[SL_2]}$. The coaction in this case is

\[ (x\ \ y) \mapsto (x\ \ y)\otimes\begin{pmatrix} a & b\\ c & d\end{pmatrix} \]

where the additional central generator is encoded in
the q-determinant (which is no longer set equal to
1). Notice that $q^{1/2}R_{1/2,1/2}$ has eigenvalues $q, -q^{-1}$
(one says that it is q-Hecke). Another braided group,
associated now to the second eigenvalue, is $C_q^{0|2}$
with generators $\xi, \eta$ and relations $\eta\xi = -q^{-1}\xi\eta$,
$\xi^2 = \eta^2 = 0$. It is the quadratic algebra dual of $C_q^2$
(Manin 1988).
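
The q-Hecke property can be checked directly on a 4 x 4 matrix. The sketch below assumes the standard normalisation of the braiding for the fundamental representation in the basis x⊗x, x⊗y, y⊗x, y⊗y; the explicit matrix and the function names are illustrative assumptions, since the article's own R-matrix (from another section) may differ by basis ordering or by replacing q with its inverse.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def braiding_matrix(q):
    """Assumed standard q-Hecke braiding on the 2-dimensional representation."""
    lam = q - 1 / q
    return [
        [q, 0, 0, 0],
        [0, lam, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, q],
    ]


def shifted(m, c):
    """Return m + c * identity."""
    n = len(m)
    return [[m[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]


def hecke_defect(q):
    """(Psi - q)(Psi + 1/q); the zero matrix when Psi is q-Hecke."""
    psi = braiding_matrix(q)
    return matmul(shifted(psi, -q), shifted(psi, 1 / q))


if __name__ == "__main__":
    for row in hecke_defect(Fraction(3, 2)):
        print(row)
```

A vanishing defect means the only eigenvalues are q and -1/q, matching the two quantum planes above: the first eigenvalue goes with yx = qxy and the second with the Grassmann-type relations.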

One has natural braided linear spaces for the whole
family $C_q[G]$, on which the latter coact after central
extension. The general construction is as follows. If V
is an object in a braided category (e.g., the fundamental
representation of a quantum group), let T(V)
be the tensor algebra generated by a basis $\{e_i\}$ of V
with no relations and the additive braided coproduct
as above. Assume that V has a dual V* in the
category, and similarly form T(V*) with dual basis
generators $\{f^i\}$. These two braided groups will be
dually paired by extending the evaluation map to
products, which takes the form of “braided integers”
(see Majid 1995)

\[ \langle f^{i_m}\cdots f^{i_1},\, e_{j_1}\cdots e_{j_n}\rangle = \delta_{m,n}\,([n,\Psi]!)^{i_1\cdots i_n}_{j_1\cdots j_n} \]
\[ [n,\Psi] = \mathrm{id} + \Psi_{12} + \Psi_{12}\Psi_{23} + \cdots + \Psi_{12}\cdots\Psi_{n-1,n} \]

We now quotient by the kernels of this pairing to
obtain B(V), B(V*) as two nondegenerately paired
braided groups. This quotient generates all the relations,
which are very often but not necessarily
quadratic (in practice, one typically imposes only the
quadratic relations to have braided groups with a
possibly degenerate pairing). The construction is due to
the author. Moreover, we can define partial derivatives
on these braided groups by $\Delta a = 1\otimes a + e_i\otimes\partial^i a + \cdots$
for any a in the algebra, that is, as an infinitesimal
generator of translations under the braided group law;
similarly exp, indefinite and Gaussian integration,
Fourier transform, etc. The simplest example here is
B = C[x] viewed not as a usual Hopf algebra but as a
braided group in the category of Z-graded spaces with
$\Psi(x\otimes x) = q\,x\otimes x$. In this example the braided
addition law on C[x] is

\[ \Delta x^n = \sum_{m=0}^{n} \begin{bmatrix} n\\ m\end{bmatrix}_q x^m\otimes x^{n-m} \]

with q-binomial coefficients defined by the q-integers
$[n]_q = (1-q^n)/(1-q)$, and the partial derivative defined
by it is the Jackson (1908) q-derivative

\[ \partial_q f(x) = \frac{f(x) - f(qx)}{x(1-q)} \]

while $\Delta e_q(x) = e_q(x)\otimes e_q(x)$ if we allow power
series. Such objects occur in the theory of q-special
functions (see q-Special Functions).
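
The braided addition law on the braided line can be verified by brute force. The following is a minimal sketch, assuming the braided tensor product rule (x^i ⊗ x^j)(x^k ⊗ x^l) = q^(jk) x^(i+k) ⊗ x^(j+l), which is what the braiding Ψ(x ⊗ x) = q x ⊗ x gives on monomials; all function names are illustrative, and q is taken to be a rational number different from 1 so that exact arithmetic can be used.

```python
from fractions import Fraction
from collections import defaultdict


def q_int(n, q):
    """q-integer [n]_q = 1 + q + ... + q^(n-1)."""
    return sum((q ** k for k in range(n)), Fraction(0))


def q_factorial(n, q):
    out = Fraction(1)
    for k in range(1, n + 1):
        out *= q_int(k, q)
    return out


def q_binomial(n, m, q):
    return q_factorial(n, q) / (q_factorial(m, q) * q_factorial(n - m, q))


def braided_mul(a, b, q):
    """Product in the braided tensor product of C[x] with itself.

    Elements are dicts {(i, j): coeff} standing for coeff * x^i (x) x^j.
    Moving x^j past x^k costs a factor q^(j*k).
    """
    out = defaultdict(Fraction)
    for (i, j), c in a.items():
        for (k, l), d in b.items():
            out[(i + k, j + l)] += c * d * q ** (j * k)
    return dict(out)


def coproduct_of_power(n, q):
    """Compute (x (x) 1 + 1 (x) x)^n in the braided tensor product."""
    delta_x = {(1, 0): Fraction(1), (0, 1): Fraction(1)}
    result = {(0, 0): Fraction(1)}
    for _ in range(n):
        result = braided_mul(result, delta_x, q)
    return result


def jackson_derivative(coeffs, q):
    """Apply (f(x) - f(qx)) / (x (1 - q)) to f = sum coeffs[n] x^n."""
    return [c * (1 - q ** n) / (1 - q) for n, c in enumerate(coeffs)][1:]


if __name__ == "__main__":
    q = Fraction(3, 2)
    print(coproduct_of_power(3, q))
    print([q_binomial(3, m, q) for m in range(4)])
```

The coefficient of x ⊗ x^(n-1) in the coproduct is [n]_q, which is exactly the factor the Jackson derivative produces on x^n.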

Among deeper theorems (see Majid 1995, 2002),
there is a triangular decomposition

U_q(g) = U_q(n_-) >◁· T ·▷< U_q(n_+)

where $U_q(n_+)$ is a braided group and $U_q(n_-)$ is dually
paired to its opposite. T denotes the torus generators
$\{g_a\}$ in the section “q-Deformation enveloping
algebras.” More generally, if $g_0\subset g$ is a principal
embedding of Lie algebras (given by an inclusion of
Dynkin diagrams), then U_q(g) = B* >◁· U_q(g_0) ·▷< B^op
for some additive braided group of additional root
generators and its dual. The general construction
B* >◁· H ·▷< B^op here is “double bosonization,” which
associates to dual braided groups B, B* in the
category of representations of some quasitriangular
Hopf algebra H a new quasitriangular Hopf algebra.
The simplest example B = C[x] lives in the category of
representations of $T = U_q(1)$ in an algebraic form.
The dual is another braided line C[p], and
C[p] >◁· U_q(1) ·▷< C[x] is a version of $U_q(sl_2)$. In this
way, the braided line C[x] is at the root of all
q-deformation quantum groups.

An earlier theorem is that for any braided group B
covariant under a (co)quasitriangular H, we have its
“bosonization” B >◁· H. There is a similar “biproduct”
if B lives in the category of crossed modules for any
Hopf algebra H. These have been extensively applied
in physics, notably in the construction of inhomogeneous
quantum groups. Similar to $C_q^2$ (but as a
*-algebra), there is a natural self-dual q-Minkowski
space $B = R_q^{1,3}$ which is covariant under $U_q(so_{1,3})$,
and its bosonization is the q-Poincaré plus dilations
group R_q^{1,3} >◁· U_q(so_{1,3}). It is not possible to avoid the
dilation here. The double-bosonization extends this
to the q-conformal group $U_q(so_{2,4})$. The braided
adjoint action becomes the action of conformal
translations on $R_q^{1,3}$. The construction of q-propagators
and q-deformed physics on such q-Minkowski
space was achieved in the mid 1990s as one of the
main successes of the theory of braided groups.

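This $R_q^{1,3}$ can also be given as a matrix of
generators, with relations, *-structure and a second
braided coproduct:

\[ \beta\alpha = q^{2}\alpha\beta, \qquad \gamma\alpha = q^{-2}\alpha\gamma, \qquad \delta\alpha = \alpha\delta, \]
\[ \beta\gamma = \gamma\beta + (1-q^{-2})\,\alpha(\delta-\alpha), \]
\[ \delta\beta = \beta\delta + (1-q^{-2})\,\alpha\beta, \]
\[ \gamma\delta = \delta\gamma + (1-q^{-2})\,\gamma\alpha, \]
\[ \begin{pmatrix}\alpha & \beta\\ \gamma & \delta\end{pmatrix}^{*} = \begin{pmatrix}\alpha & \gamma\\ \beta & \delta\end{pmatrix}, \qquad
\Delta\begin{pmatrix}\alpha & \beta\\ \gamma & \delta\end{pmatrix} = \begin{pmatrix}\alpha & \beta\\ \gamma & \delta\end{pmatrix}\otimes\begin{pmatrix}\alpha & \beta\\ \gamma & \delta\end{pmatrix} \]

This is in addition to the additive coproduct above.
It corresponds to the point of view of Minkowski
space as Hermitian 2 x 2 matrices. Note that $\Delta$ is
not a *-algebra map in the usual sense and indeed
Hermitian matrices are not a group under multiplication,
but this does form a natural braided
*-bialgebra. If we quotient by the braided determinant
relation $\alpha\delta - q^{2}\gamma\beta = 1$, we have the unit hyperboloid
in $R_q^{1,3}$, which turns out to be the braided group
$B_q[SU_2]$ mentioned at the end of the previous
section (as obtained canonically from $C_q[SU_2]$). We
now have a braided antipode

\[ S\begin{pmatrix}\alpha & \beta\\ \gamma & \delta\end{pmatrix} = \begin{pmatrix} q^{2}\delta + (1-q^{2})\alpha & -q^{2}\beta\\ -q^{2}\gamma & \alpha\end{pmatrix} \]

This was the first nontrivial example of a braided
group (Majid 1991b) and we see that it has two
$q\to 1$ limits

\[ U(su_2) \leftarrow B_q[SU_2] \rightarrow C[\mathrm{Hyperboloid}\subset R^{1,3}] \]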


Because most constructions in physics can be
uniformly deformed by such methods (including
the totally q-antisymmetric tensor), one finds that q
provides a new regulator in which infinities in
quantum field theory can in principle be encoded
as poles at q = 1. That transmutation from the
quantum group to its braided version unifies unitary
nonabelian symmetries with pseudo-Riemannian
geometry is another deeper aspect of relevance to
physics. In addition, q-constructions have their
original role in quantum integrable systems, at q a
root of unity and for infinite-dimensional (affine)
Lie algebra deformations.


Quasi-Hopf Algebras 


Although the braided category of representations of
a quantum group has a trivial “associator”
$\Phi_{V,W,Z}: (V\otimes W)\otimes Z\to V\otimes(W\otimes Z)$ between any
three objects, a general braided category and the
diagrammatic methods of “braided algebra” in the
last section do not require this (one simply translates
diagrams into algebra by inserting $\Phi$ as needed). A
more general object that generates such categories as
its representations is a “quasi-Hopf algebra.” This is
a generalization of Hopf algebras in which the
coproduct $\Delta: H\to H\otimes H$ is not necessarily coassociative.
Instead,

\[ (\mathrm{id}\otimes\Delta)\Delta = \phi\,((\Delta\otimes\mathrm{id})\Delta\ )\,\phi^{-1} \]
\[ (\mathrm{id}\otimes\epsilon\otimes\mathrm{id})\phi = 1\otimes 1 \]
\[ \phi_{234}\,(\mathrm{id}\otimes\Delta\otimes\mathrm{id})(\phi)\,\phi_{123} = (\mathrm{id}^{2}\otimes\Delta)(\phi)\,(\Delta\otimes\mathrm{id}^{2})(\phi) \]

for some invertible element $\phi\in H\otimes H\otimes H$. The
numbers denote the position in the tensor product
and one says that $\phi$ is a 3-cocycle. The axioms for
the antipode and quasitriangular structure R are
also modified. The tensor product of representations
is given as usual by $\Delta$, and the braiding and
associator by the actions of R and $\phi$.

This notion, due to Drinfeld (1990), arises when
one wishes to write down the quantum groups $U_q(g)$
more explicitly as built on the algebras U(g) (recall
that they are isomorphic over formal power series).
Thus, for each semisimple g there is a natural
(quasitriangular) quasi-Hopf algebra $(U(g), \phi, R)$
where U(g) has the usual Hopf algebra structure,
R is an exponential of the split Casimir (or inverse
Killing form) in $g\otimes g$ and $\phi$ is constructed as a
solution of the Knizhnik-Zamolodchikov equations
coming out of conformal field theory. This is not
$U_q(g)$ but it has an equivalent braided category of
representations. Thus, there is an element $F\in U(g)^{\otimes 2}$
(extended over formal power series) such that

\[ \Delta_F = F(\Delta\ )F^{-1}, \qquad R_F = F_{21}RF^{-1}, \]
\[ \phi_F = F_{23}\,(\mathrm{id}\otimes\Delta)(F)\,\phi\,(\Delta\otimes\mathrm{id})(F^{-1})\,F_{12}^{-1} = 1 \]

recovers $U_q(g)$ as a quasitriangular Hopf algebra
built directly on the algebra U(g). The conjugation
operations here (and a similar process regarding the
antipode) are a “Drinfeld twist” of a quasi-Hopf
algebra, and such twisting by any invertible F such
that

\[ (\epsilon\otimes\mathrm{id})F = (\mathrm{id}\otimes\epsilon)F = 1 \]

(a cochain) does not change the representation
category up to equivalence. In the present case, the
twist transforms $\phi$ into $\phi_F = 1$, that is, into an
ordinary Hopf algebra isomorphic over formal
power series to $U_q(g)$. Note that in rational
conformal field theory the tensor product of
representations appears as a finite-dimensional
commutative associative algebra (the Verlinde algebra)
with integer structure constants $N_{ij}{}^{k}$ (this comes
from the operator-product expansion of primary
fields in the theory). This is because one has more
precisely a truncated representation category
corresponding to q a root of unity, and because we are
identifying equivalent representations (so the $N_{ij}{}^{k}$ are
the multiplicities in the decomposition of a tensor
product of two representations). However, if one
wants to know the tensor product decomposition
more fully, not just its isomorphism class, this is
given in a choice of bases by recoupling matrices.
Computation in terms of these shows that the actual
tensor product is neither commutative nor associative,
but of the form above, at least in the case of the
WZW model.

Hopf algebra theory typically extends to the
quasi-Hopf case. For example, given a quasi-Hopf
algebra H there is a quantum double D(H), at least
in the finite-dimensional case, due to the author. An
example is to take H = C(G) and $\phi$ a 3-cocycle on G
in the usual sense

\[ \phi(y,z,w)\,\phi(x,yz,w)\,\phi(x,y,z) = \phi(x,y,zw)\,\phi(xy,z,w) \]

on elements of G and $\phi(x,1,y) = 1$. Then $(C(G), \phi)$
can be viewed as a quasi-Hopf algebra. Its double
$D^{\phi}(G)$ is generated by C(G) as a sub-quasi-Hopf
algebra and by elements of G with

\[ \delta_s\, x = x\,\delta_{x^{-1}sx}, \qquad x\bullet y = \sum_s \chi(x,y)(s)\,\delta_s\; xy, \]
\[ \Delta x = \sum_s \sum_{ab=s} \frac{\phi(x, x^{-1}ax, x^{-1}bx)\,\phi(a,b,x)}{\phi(a, x, x^{-1}bx)}\; \delta_a x\otimes\delta_b x \]

in terms of a basis $\{\delta_s\}$ of C(G), the product of G on
the right, and

\[ \chi(x,y)(s) = \frac{\phi(x, y, y^{-1}x^{-1}sxy)\,\phi(s,x,y)}{\phi(x, x^{-1}sx, y)} \]

a 2-cocycle on G with values in C(G) (the algebra is
a cocycle semidirect product). There is a quasitriangular
structure $R = \sum_x \delta_x\otimes x$. This quasi-Hopf
algebra first appeared in discrete topological quantum
field theory related to orbifolds in the work of
Dijkgraaf, Pasquier, and Roche.
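
To make the 3-cocycle condition concrete, it can be tested numerically on a small group. The sketch below uses the cyclic group Z_n written additively and the well-known cocycle φ(x,y,z) = exp(2πi x c(y,z)/n), where c(y,z) is the carry of y + z modulo n; this choice of group and cocycle is our own illustration and is not taken from the article.

```python
import cmath
from itertools import product


def phi(x, y, z, n):
    """Assumed example 3-cocycle on Z_n, built from the carry of y + z."""
    carry = (y + z - (y + z) % n) // n
    return cmath.exp(2j * cmath.pi * x * carry / n)


def cocycle_defect(n):
    """Largest violation of
    phi(y,z,w) phi(x,y+z,w) phi(x,y,z) = phi(x,y,z+w) phi(x+y,z,w)
    over all x, y, z, w in Z_n (zero up to rounding for a 3-cocycle)."""
    worst = 0.0
    for x, y, z, w in product(range(n), repeat=4):
        lhs = phi(y, z, w, n) * phi(x, (y + z) % n, w, n) * phi(x, y, z, n)
        rhs = phi(x, y, (z + w) % n, n) * phi((x + y) % n, z, w, n)
        worst = max(worst, abs(lhs - rhs))
    return worst


def is_normalised(n):
    """Check phi(x, 0, y) = 1, the additive form of phi(x, 1, y) = 1."""
    return all(abs(phi(x, 0, y, n) - 1) < 1e-12
               for x, y in product(range(n), repeat=2))


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(n, cocycle_defect(n), is_normalised(n))
```

For n = 2 the cocycle takes the value -1 on (1,1,1), so it is not trivial as a function even though the defect vanishes.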

There are further generalizations in the same spirit 
and which are linked to conformal field theories of 
more general type; for example, weak (quasi-) Hopf
algebras in which $\Delta 1\neq 1\otimes 1$ but is a projector.
These have been related to quantum groupoids. 

Finally, we mention some applications of twisting
outside of the original context. First of all, we are not
limited to starting with U(g): starting with any Hopf
algebra or quasi-Hopf algebra H we can similarly twist
it to another one $H_F$ with the same algebra as H and
$\Delta_F$, $R_F$, $\phi_F$ given by conjugation as above. The
representation category remains unchanged up to
equivalence, so in some sense the twisted object is
equivalent. Moreover, if we start with a Hopf algebra
H and ask F to be a 2-cocycle in the sense

\[ F_{12}\,(\Delta\otimes\mathrm{id})(F)\,(\mathrm{id}\otimes\Delta)(F^{-1})\,F_{23}^{-1} = 1 \]

then $H_F$ will remain a Hopf algebra. It has
conjugated antipode (see Majid 1995)

\[ S_F(a) = U(Sa)U^{-1}, \qquad U = \cdot(\mathrm{id}\otimes S)(F) \]

Many Hopf algebras are twists of more standard
ones; for example, the multiparameter quantum
groups tend to be twists of the standard $U_q(g)$.
Likewise, “triangular” Hopf algebras (where
$R_{21}R = 1$) tend to be twists of classical group or
enveloping algebras.

A second application of twists is an approach to
quantization. Although it can be applied to H itself, this
is more interesting if we think of H as a background
quantum group and ask to quantize objects covariant
under H. For the sake of discussion, we start with H
an ordinary Hopf algebra. We twist this to $H_F$ and
denote by $\mathcal{T}$ the equivalence functor from representations
of H to representations of $H_F$. This functor acts
as the identity on all objects and all morphisms, but
comes with nontrivial isomorphisms
$c_{V,W}: \mathcal{T}(V)\otimes\mathcal{T}(W)\to\mathcal{T}(V\otimes W)$ for any two objects, compatible
with bracketing (see Majid 1995). Given any algebraic
construction covariant under H, we simply apply the
functor $\mathcal{T}$ to all aspects of the construction and obtain
an equivalent $H_F$-covariant construction. As an example,
if A is an H-covariant algebra, then applying $\mathcal{T}$ to
its product we have $\mathcal{T}(\cdot): \mathcal{T}(A\otimes A)\to\mathcal{T}(A)$. Using
$c_{A,A}$ we obtain a map

\[ \bullet: \mathcal{T}(A)\otimes\mathcal{T}(A)\to\mathcal{T}(A), \qquad a\bullet b = \cdot\,(F^{-1}\triangleright(a\otimes b)) \]

in terms of the product in A. Thus, we have a new
algebra $A_F$ built on the same vector space as A but
with a modified product $\bullet$. This is called a
“covariant twist” of an algebra and should not be
confused with the Drinfeld twist above. It is due to
the author in the early 1990s. If F is a 2-cocycle,
then $A_F$ remains associative. The transmutation
construction mentioned in the section “Self-dual
quantum groups” or the passage from $R_q^{4}$ to $R_q^{1,3}$ are
examples in quantum group theory. Other examples
include the standard Moyal product on $R^n$, also
called noncommutative spacetime $[x_\mu, x_\nu] = i\theta_{\mu\nu}$ by
string theorists (see Bicrossproduct Hopf Algebras
and Noncommutative Spacetime).
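
The Moyal case can be written out as a worked instance of the covariant twist formula. This is a sketch under stated assumptions: H is the enveloping algebra of translations acting on functions by $\partial^\mu$ with $\partial^\mu x_\nu = \delta^\mu{}_\nu$, $\theta_{\mu\nu}$ is a constant antisymmetric matrix, and the twist is taken as $F^{-1} = \exp(\tfrac{i}{2}\theta_{\mu\nu}\,\partial^\mu\otimes\partial^\nu)$; the sign and normalisation are one common convention and are not fixed by the text above.

```latex
a \bullet b
  = \cdot\,\exp\!\Big(\tfrac{i}{2}\,\theta_{\mu\nu}\,\partial^\mu\otimes\partial^\nu\Big)(a\otimes b),
\qquad
x_\mu \bullet x_\nu = x_\mu x_\nu + \tfrac{i}{2}\,\theta_{\mu\nu},
\qquad
x_\mu \bullet x_\nu - x_\nu \bullet x_\mu = i\,\theta_{\mu\nu}.
```

Because the translation generators commute, this F is a 2-cocycle and the twisted product stays associative, in line with the statement above.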

If we do not demand that F is a cocycle, then the
algebra $A_F$ is still associative but in the target
category, which means

\[ (a\bullet b)\bullet c = \bullet\,(\mathrm{id}\otimes\bullet)\,\Phi_{A,A,A}((a\otimes b)\otimes c) \]

Such objects are called “quasialgebras.” It may still
be that $\Phi_{A,A,A}$ happens to be trivial ($\phi_F$ happens to
act trivially) so that $A_F$ remains associative. This
turns out frequently to be the case and many
quantizations in physics, including $C_q[G]$ but not
limited to q-examples, can be obtained in this way.
It means that although they are associative there is a
hidden nonassociativity which can surface in other
constructions involving $\Phi$. The physical application
here is with H = U(g) a classical enveloping algebra,
A functions on a classical manifold on which g acts,
and a cochain F. In general the resulting quasialgebra
will not be associative but rather a quantization
of a “quasi-Poisson manifold” obeying

\[ \{a,\{b,c\}\} + \mathrm{cyclic} = 2\tilde{n}(a\otimes b\otimes c) \]

Here $\tilde{n}$ is the trivector field for the action of the
lowest order part of $\phi_F$ and the (quasi-)Poisson
bivector is the leading-order part of $F_{21}F^{-1}$. As
mentioned, there are many cases where $\tilde{n}$ (and the
action of the rest of $\phi_F$) happens to be trivial.

Finally, let us give a discrete example using such
quantum group methods. We consider H = C(G)
and $F\in C(G\times G)$ a cochain. Twisting by this gives
$H_F = (C(G), \phi_F)$, a quasi-Hopf algebra where

\[ \phi_F(x,y,z) = \frac{F(y,z)\,F(x,yz)}{F(xy,z)\,F(x,y)} \]

We take A = CG, the group algebra. The action of
C(G) on it is the diagonal one. The modified algebra
$A_F$ therefore has product

\[ x\bullet y = F^{-1}(x,y)\,xy \]

in terms of the product in G, and will be a
quasialgebra if F is not a cocycle. For example, let
$G = (Z_2)^3$, which we write additively (so elements
are 3-vectors with values in $Z_2$), and take

\[ F(x,y) = (-1)^{\sum_{i\le j} x_i y_j \,+\, y_1x_2x_3 \,+\, x_1y_2x_3 \,+\, x_1x_2y_3} \]

then,

\[ \phi_F(x,y,z) = (-1)^{(x\times y)\cdot z} \]

Moreover, $A_F = O$, the octonions (Albuquerque and
Majid 1999). So these are a nonassociative quantization
of the classical discrete space $(Z_2)^3$. We see
that they are in fact associative up to sign, with
sign -1 exactly when the corresponding 3-vectors are
linearly independent.
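
The octonion example is small enough to verify exhaustively. The sketch below assumes the cochain of Albuquerque and Majid (1999) in the form F(x,y) = (-1)^f(x,y) with f(x,y) = Σ_{i≤j} x_i y_j + y_1x_2x_3 + x_1y_2x_3 + x_1x_2y_3; this formula is quoted from memory of that paper and the function names are illustrative. Since F takes values ±1, dividing by F is the same as multiplying by it.

```python
from itertools import product

# Elements of G = (Z_2)^3 as 3-tuples of 0/1, written additively.
G = list(product((0, 1), repeat=3))


def add(x, y):
    return tuple((a + b) % 2 for a, b in zip(x, y))


def f(x, y):
    """Exponent of the assumed cochain F on (Z_2)^3."""
    bilinear = sum(x[i] * y[j] for i in range(3) for j in range(i, 3))
    cubic = y[0] * x[1] * x[2] + x[0] * y[1] * x[2] + x[0] * x[1] * y[2]
    return (bilinear + cubic) % 2


def F(x, y):
    return -1 if f(x, y) else 1


def phi(x, y, z):
    """phi_F(x,y,z) = F(y,z) F(x,y+z) / (F(x+y,z) F(x,y)), with F = +-1."""
    return F(y, z) * F(x, add(y, z)) * F(add(x, y), z) * F(x, y)


def det_mod2(x, y, z):
    """(x cross y) . z reduced mod 2, i.e. the determinant of the 3 vectors."""
    d = (x[0] * (y[1] * z[2] - y[2] * z[1])
         - x[1] * (y[0] * z[2] - y[2] * z[0])
         + x[2] * (y[0] * z[1] - y[1] * z[0]))
    return d % 2


def mul(x, y):
    """Twisted product of basis elements: returns (sign, label of result)."""
    return F(x, y), add(x, y)


if __name__ == "__main__":
    e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
    print(phi(e1, e2, e3), mul(e1, e2), mul(e2, e1))
```

The seven nonzero vectors play the role of the imaginary octonion units: each squares to -1, distinct ones anticommute, and the associator sign is -1 only on linearly independent triples.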


Noncommutative Geometry 


In this article, we have frequently encountered the 
view of quantum groups and other noncommutative 
algebras as by definition the coordinate algebras on 
“noncommutative spaces.” However, the “quantum 
groups approach” to such noncommutative geome- 
try that emerges has a somewhat different flavor 
from other approaches, as we discuss now. 

In fact, the problem of geometry at such a level 
was mentioned already by Dirac in the 1920s and 
led to theorems of Gelfand and Naimark in the 
1940s and 1950s whereby a noncommutative 
C*-algebra should be viewed as a noncommutative 
topological space, and of Serre and Swan in the 
1960s whereby a finitely generated projective 
module should be viewed as a vector bundle. 
Algebraic K-theory led to further refinement of this 
picture and particularly, in the 1980s, to A Connes’ 
formulation in terms of cyclic cohomology and 
“spectral triples” (see Noncommutative Geometry 
and the Standard Model; Noncommutative Tori, 
Yang-Mills and String Theory; Quantum Hall 
Effect; Hopf Algebra Structure of Renormalizable 
Quantum Field Theory; Path Integrals in Noncom- 
mutative Geometry). The quantum groups approach 
is less axiomatic, and consists of at least three 
disparate elements. 

The first layer of the quantum groups approach is
the theory of q-deformed groups and q-spaces on
which they act, using braided category methods
(such as braided linear spaces). The braided group
additive law leads to partial derivatives and these
define q-exterior algebras etc. This programme
covered during the 1990s most of what is needed
to q-deform physics in flat space at an algebraic
level. Formulas here tend to be complex but
controlled by R-matrices, and the correct R-matrix
formulas can be found systematically by working
with braided algebra as explained in the section
“Braided groups and quantum planes.” From a
slightly different side, q-representation theory and
the further theory of q-homogeneous spaces is
intimately tied to a theory of q-special functions
(such as the q-exponential function in the section
“q-Deformation enveloping algebras”) of interest in
their own right (see q-Special Functions). The use of
*-algebras, in some cases completable to C*-algebras,
is a point of contact with other approaches to
noncommutative geometry, but problems emerge
when one considers the braiding. As a result, the
natural q-Poincaré (plus dilation) quantum group is
not even a Hopf *-algebra. Briefly, once one starts
to braid the constructions, one may need to
represent them with braided (not usual) Hilbert
spaces and q-analysis.

The second layer of the quantum groups approach
is based on “differential calculus” as a specification
of an exterior algebra of differential forms or
differential graded algebra (DGA). In general this is
a wild problem but, as in classical geometry, the
requirement of a quantum group covariance greatly
narrows the possible calculi, although no longer to
the point of uniqueness. The first examples of
covariant calculi on the quantum group $C_q[SU_2]$
were found by Woronowicz (1989). The bicovariant
one of these was cast in R-matrix form by Jurco
while the first actual classification results on the
moduli of irreducible calculi were obtained by the
author (the bicovariant ones are essentially in
correspondence with irreducible representations V,
with left-invariant differentials forming a braided
group of the form $B(V\otimes V^*)$). Probably the most
interesting feature of this theory is that for all $C_q[G]$
the bicovariant q-calculus cannot be of classical
dimensions. For example, for $C_q[SU_2]$ the smallest
nontrivial calculus is four dimensional. The “extra
dimension” is a biinvariant 1-form $\theta$ which has the
property that $[\theta, a] = da$ for all $a\in C_q[SU_2]$ and
which can be viewed as a spontaneously generated
time (see Bicrossproduct Hopf Algebras and Noncommutative
Spacetime). Quantum group methods
also provide DGAs on finite groups, this time
classified in the bicovariant case by nontrivial
conjugacy classes. These therefore provide Lie
structures on finite groups. One can go much further
and define quantum principal bundles (with quantum
groups as fiber) over general noncommutative
algebras (Brzezinski and Majid 1993), associated
bundles, frame bundles, and Riemannian geometry
of the algebra (see Quantum Group Differentials,
Bundles and Gauge Theory).

Again q-deformation provides key examples but
the theory may then be applied to other situations.
For example, the permutation group $S_3$ has a natural
connected calculus with dimensions 1:3:4:3:1
(in other words the space has six points but each
point has the local structure of a 4-manifold in some
sense). It turns out to have a unique Levi-Civita type
connection $\nabla$ for its invariant metric, with constant
curvature. The use of DGAs here is in common with
other approaches (e.g., Connes 1994) and indeed
bundles associated to quantum group principal
bundles and suitable connections can be shown to
be projective modules. The approaches diverge at
the level of spectral triples, however, and the
examples of “Dirac operators” that emerge from
quantum group methods do not usually obey the
required axioms.

A third established layer of the quantum groups
approach is to trade some of the noncommutativity
for nonassociativity, as in the dual version of
Drinfeld’s construction, that is, $C_q[G]$ in terms of
classical C[G] as a (co)quasi-Hopf algebra. The
general approach here is a quantization functor $\mathcal{T}$
which provides all constructions but which will
typically bring out the underlying nonassociative
geometry even when the noncommutative covariant
algebra of interest is associative. For example,
applying the functor to the classical exterior algebra
$\Omega(G)$ gives a bicovariant $\Omega(C_q[G])$ of classical
dimensions but with nonassociative products (it is
a super coquasi-Hopf algebra). As before, one may
then apply these quantum group methods to other
algebras not related to q-deformation.

Beyond these are many recent developments, some 
of which are covered in other articles. Probably one 
of the most interesting frontiers, at the time of 
writing, is the exploration of links of both quantum 
groups and noncommutative geometry to number 
theory. 


See also: Affine Quantum Groups; Axiomatic Approach
to Topological Quantum Field Theory; Bicrossproduct
Hopf Algebras and Noncommutative Spacetime; Braided
and Modular Tensor Categories; Classical r-Matrices, Lie
Bialgebras, and Poisson Lie Groups; Duality in
Topological Quantum Field Theory; Eight Vertex and
Hard Hexagon Models; Hopf Algebra Structure of
Renormalizable Quantum Field Theory; The Jones
Polynomial; Noncommutative Geometry and the
Standard Model; Noncommutative Tori, Yang-Mills and
String Theory; Path Integrals in Noncommutative
Geometry; q-Special Functions; Quantum Group
Differentials, Bundles and Gauge Theory; Quantum Hall
Effect; Symmetries in Quantum Field Theory of Lower
Spacetime Dimensions; Topological Quantum Field
Theory: Overview; von Neumann Algebras: Subfactor
Theory; Yang-Baxter Equations.


Further Reading 


Albuquerque H and Majid S (1999) Quasialgebra structure of the 
octonions. Journal of Algebra 220: 188-224. 

Baxter RJ (1982) Exactly Solvable Models in Statistical 
Mechanics. New York: Academic Press. 

Brzezinski T and Majid S (1993) Quantum group gauge theory on 
quantum spaces. Communications in Mathematical Physics 
157: 591-638. 

Connes A (1994) Noncommutative Geometry. San Diego: 
Academic Press. 

Drinfeld VG (1987) Quantum groups. In: Proceedings of the 
ICM. American Mathematical Society. 

Drinfeld VG (1990) Quasi-Hopf algebras. Leningrad Mathematical 
Journal 1: 1419-1457. 

Faddeev LD, Reshetikhin NYu, and Takhtajan LA (1990) 
Quantization of Lie groups and Lie algebras. Leningrad 
Mathematical Journal 1: 193-225. 

Jackson FH (1908) On q-functions and a certain difference 
operator. Transactions of the Royal Society of Edinburgh 46: 
253-281. 

Jimbo M (1985) A q-difference analog of U(g) and the Yang-Baxter 
equation. Letters in Mathematical Physics 10: 63-69. 

Larson RG and Radford DE (1988) Semisimple cosemisimple 
Hopf algebras. American Journal of Mathematics 110: 
187-195. 

Lusztig G (1993) Introduction to Quantum Groups. Boston: 
Birkhäuser. 

Majid S (1988) Hopf algebras for physics at the Planck scale. 
Journal of Classical and Quantum Gravity 5: 1587-1606. 

Majid S (1990) Quasitriangular Hopf algebras and Yang-Baxter 
equations. International Journal of Modern Physics A 5: 1-91. 

Majid S (1991a) Representations, duals and quantum doubles of 
monoidal categories. Supplimento di Rendiconti del Circolo di 
Palermo, Series II 26: 197-206. 

Majid S (1991b) Examples of braided groups and braided 
matrices. Journal of Mathematical Physics 32: 3246-3253. 

Majid S (1995) Foundations of Quantum Group Theory. 
Cambridge: Cambridge University Press. 

Majid S (2002) A Quantum Groups Primer, London Mathematical 
Society Lecture Notes Series, vol. 291. Cambridge: 
Cambridge University Press. 

Manin YuI (1988) Quantum groups and non-commutative 
geometry. Technical Report, Centre de Recherches Math, 
Montreal. 

Sweedler ME (1969) Hopf Algebras. New York: Benjamin. 

Woronowicz SL (1989) Differential calculus on compact matrix 
pseudogroups (quantum groups). Communications in Mathematical 
Physics 122: 125-170. 



h-Pseudodifferential Operators and Applications 


B Helffer, Université Paris-Sud, Orsay, France 


© 2006 Elsevier Ltd. All rights reserved. 


From Classical Mechanics 
to Quantum Mechanics 


The initial goal of semiclassical mechanics was to 
explore the correspondence principle, due to N Bohr 
in 1923, which states that one should recover the 
classical mechanics from the quantum mechanics as 
the Planck constant h tends to zero. So we start with 
a very brief presentation of these two theories. 


Classical Mechanics 


We start (with the Hamiltonian formalism) from a C^∞ 
function p on R^{2n}: (x, ξ) ↦ p(x, ξ), which describes the 
motion of the system under consideration and is called 
the Hamiltonian. The variable x corresponds, in the 
simplest case, to the position and ξ to the momentum of 
one particle. The evolution is then described, starting 
at time 0 from a given point (y, η), by the so-called 
Hamiltonian equations 


\[
\begin{aligned}
\frac{dx_j}{dt} &= \frac{\partial p}{\partial \xi_j}(x(t), \xi(t)), \quad j = 1, \dots, n \\
\frac{d\xi_j}{dt} &= -\frac{\partial p}{\partial x_j}(x(t), \xi(t)), \quad j = 1, \dots, n
\end{aligned}
\qquad [1]
\]


The classical trajectories are then defined as the 
integral curves of a vector field defined on R^{2n}, called 
the Hamiltonian vector field associated with p 
and defined by H_p = (∂p/∂ξ, −∂p/∂x). All these 
definitions are more generally relevant in the 
framework of symplectic geometry on a symplectic 
manifold M (but we choose, for simplicity, to explain 
the theory on R^{2n}), which can be seen as the cotangent 
vector bundle T*R^n, and is the “local” model of the 
general situation. This space is equipped naturally 
with a symplectic structure defined by giving at each 
point a nondegenerate 2-form, which is here 
σ := Σ_j dξ_j ∧ dx_j. This 2-form permits us to associate 
canonically to a 1-form on T*R^n a vector field on 
T*R^n. In this correspondence, if p is a function on 
T*R^n, H_p is associated with the differential dp. 

In this article, we consider the example of the 
Hamiltonian p(x, ξ) = ξ² + V(x), also called the 
Schrödinger Hamiltonian, as the guiding example. 
More specifically, the case of the harmonic oscillator, 
where V is given by V(x) = Σ_{j=1}^n μ_j x_j² (with 
μ_j > 0), is the most significant, since it is the natural 
approximation of a potential near its minimum, 
when nondegenerate. 


In the framework of the classical mechanics, the 
main questions could be: 


Are the trajectories bounded? 

Are there periodic trajectories? 

Is one trajectory dense in its energy surface? 
Is the energy surface compact? 


The solution of these questions could be very difficult. 
Let us just mention the trivial fact that, if p^{-1}(λ) is 
compact for some λ, then, by the conservation of 
energy law 


\[ p(x(t), \xi(t)) = p(y, \eta) \qquad [2] \]


the whole trajectory starting from one point (y, η) 
remains in the bounded set p^{-1}(p(y, η)) in R^{2n}. This 
is in particular the case for the harmonic oscillator. 
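
The conservation law [2] and the resulting boundedness can be observed numerically. The following is a minimal sketch, assuming one degree of freedom, the Hamiltonian p(x, ξ) = ξ² + μx², and a leapfrog time-stepping scheme; the function names, the step size, and the initial point are illustrative choices, not taken from the text. A discrete scheme conserves the energy only approximately, so the code reports the observed drift instead of assuming exact conservation.

```python
import math

def energy(x, xi, mu=1.0):
    """Hamiltonian p(x, xi) = xi^2 + mu * x^2 (one degree of freedom)."""
    return xi * xi + mu * x * x

def leapfrog(y, eta, dt, steps, mu=1.0):
    """Integrate dx/dt = dp/dxi = 2*xi and dxi/dt = -dp/dx = -2*mu*x
    with the leapfrog scheme, starting from the point (y, eta)."""
    x, xi = y, eta
    trajectory = [(x, xi)]
    for _ in range(steps):
        xi -= 0.5 * dt * 2.0 * mu * x   # half step in momentum
        x += dt * 2.0 * xi              # full step in position
        xi -= 0.5 * dt * 2.0 * mu * x   # half step in momentum
        trajectory.append((x, xi))
    return trajectory

y, eta = 1.0, 0.5
traj = leapfrog(y, eta, dt=1e-3, steps=20000)
e0 = energy(y, eta)
max_drift = max(abs(energy(x, xi) - e0) for x, xi in traj)
max_radius = max(math.hypot(x, xi) for x, xi in traj)
print(f"energy drift: {max_drift:.2e}, max |(x, xi)|: {max_radius:.4f}")
```

With μ = 1 the energy surface is the circle of radius p(y, η)^{1/2}, so the computed trajectory should stay within a small neighborhood of that circle.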


Quantum Mechanics 


The quantum theory was born dynamics-wise around 
1920. It is structurally related to the classical 
mechanics in a way that we shall describe very briefly. 
In quantum mechanics, our basic object will be a 
(possibly nonbounded) self-adjoint operator defined 
on a dense subspace of a Hilbert space H. In order to 
simplify the presentation, we shall always take 
H = L²(R^n). 

This operator can be associated with p by using 
the techniques of quantization. We choose here to 
present a procedure, called the Weyl quantization 
procedure (which was already known in 1928), 
which, under suitable assumptions on p and its 
derivatives, will be defined for u ∈ S(R^n) by 


\[
p^w(x, hD_x; h)\,u(x) = (2\pi h)^{-n} \iint e^{(i/h)(x-y)\cdot\xi}\, p\Big(\frac{x+y}{2}, \xi; h\Big)\, u(y)\, dy\, d\xi \qquad [3]
\]


The operator p^w(x, hD_x; h) is called an h-pseudodifferential 
operator of Weyl symbol p. One can also write 
Op_h^w(p) in order to emphasize that it is the operator 
associated to p by the Weyl quantization. Here h is a 
parameter which plays the role of the Planck constant. 

Of course, one has to give a sense to these integrals 
and this is the object of the theory of the oscillatory 
integrals. If p = 1, we observe that, by Plancherel’s 
formula, 


\[
u(x) = (2\pi h)^{-n} \iint e^{(i/h)(x-y)\cdot\xi}\, u(y)\, dy\, d\xi \qquad [4]
\]


the associated operator is nothing but the identity 
operator. A way to rewrite any h-differential operator 
Σ_{|α|≤m} a_α(x)(hD_x)^α as an h-pseudodifferential operator 
is to apply it on both sides to [4]. In particular, we 
observe that if the symbol is p(x, ξ) = ξ² + V(x), then 
the operator associated with p by the h-Weyl 
quantization is the Schrödinger operator −h²Δ + V. 
Other interesting examples appear naturally in solid 
state physics. Let us, for example, mention the Harper 
operator H (or almost-Mathieu; see Helffer and 
Sjöstrand (1989) and references therein), whose 
symbol is the map (x, ξ) ↦ cos ξ + cos x, and which 
can also be defined, for u ∈ L²(R), by 


\[ (Hu)(x) = \tfrac12 \big( u(x + h) + u(x - h) \big) + \cos x \; u(x) \]


We shall later recall how to relate the properties 
of p and those of the associated operator. More 
precisely, we shall describe under which conditions 
on p the operator p^w(x, hD_x; h) is semibounded, 
symmetric, essentially self-adjoint, compact, with 
compact resolvent, trace class, Hilbert-Schmidt 
(see Robert (1987) for an extensive presentation). 
But before looking at a more general situation, let 
us consider the case of the Schrödinger operator 
S_h = −h²Δ + V(x). If V is (say, continuous) 
bounded from below, S_h, which is a priori defined 
on S(R^n) as a differential operator, admits a unique 
self-adjoint extension on L²(R^n). We are first 
interested in the nature of the spectrum. If 
V(x) → +∞ as |x| → ∞, one can show that S_h, 
more precisely its self-adjoint realization, has 
compact resolvent and its spectrum consists of a 
sequence of eigenvalues tending to +∞. We are next 
interested in the asymptotic behavior of these 
eigenvalues. 

In the case of the harmonic oscillator, corresponding 
to the potential 


\[ V(x) = \sum_{j=1}^{n} \mu_j x_j^2 \quad (\text{with } \mu_j > 0) \]


the criterion of compact resolvent is satisfied and the 
spectrum is described as the set of 


\[ \lambda_\alpha(h) = \sum_{j=1}^{n} \sqrt{\mu_j}\,(2\alpha_j + 1)\,h \]


for α ∈ N^n. 

In this case we also have a complete description 
of the normalized associated eigenfunctions, which 
are constructed recursively starting from the first 
eigenfunction corresponding to λ_0(h) = Σ_j √μ_j h: 


\[
\phi_0(x; h) = (\pi h)^{-n/4} \Big( \prod_{j=1}^{n} \mu_j \Big)^{1/8} \exp\Big( -\frac{1}{2h} \sum_{j=1}^{n} \sqrt{\mu_j}\, x_j^2 \Big) \qquad [5]
\]


The eigenfunction φ_0 is strictly positive and decays 
exponentially. Moreover (and here we enter the 
semiclassical world), the local decay in a fixed closed 
set avoiding {0} (which is measured by its L²-norm) is 
exponentially small as h → 0. In particular, this says 
that the eigenfunction lives asymptotically in the set 
{V(x) ≤ λ(h)}. This last set can also be understood as 
the projection by the map (x, ξ) ↦ x of the energy 
surface, which is classically attached to the eigenvalue 
λ(h), that is, {(x, ξ) ∈ R^{2n} | p(x, ξ) = λ(h)}. This is a 
typical semiclassical statement, which will be true in 
full generality. 
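
The exponential smallness away from the minimum can be made explicit in one dimension. The sketch below assumes n = 1 and V(x) = μx², for which the normalized ground state has density |φ_0(x)|² = (√μ/(πh))^{1/2} exp(−√μ x²/h), so that its L² mass on {|x| ≥ δ} equals erfc(δ (√μ/h)^{1/2}). The function name and the values of δ and h are illustrative assumptions.

```python
import math

def mass_outside(delta, h, mu=1.0):
    """Squared L^2 norm of the normalized ground state of
    -h^2 d^2/dx^2 + mu*x^2 on the set |x| >= delta.
    The density is Gaussian, so the tail mass is an erfc."""
    a = math.sqrt(mu)
    return math.erfc(delta * math.sqrt(a / h))

delta = 0.5
masses = []
for h in (0.1, 0.05, 0.025, 0.0125):
    m = mass_outside(delta, h)
    bound = math.exp(-delta * delta / h)   # Gaussian bound, valid for mu = 1
    masses.append((h, m, bound))
    print(f"h = {h:<7} mass = {m:.3e}   exp(-delta^2/h) = {bound:.3e}")
```

Halving h roughly squares the tail mass, which is the behavior one expects from a bound of the form exp(−c/h).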


From Quantum Mechanics to Classical 
Mechanics: Semiclassical Mechanics 


Before describing the mathematical tools involved in 
the exploration of the correspondence principle, let 
us describe a few results which are typical in the 
semiclassical context. They concern Weyl’s asymp- 
totics and the localization of the eigenfunctions. 


Weyl’s asymptotics We start with the case of the 
Schrödinger operator S_h, but we emphasize that the 
h-pseudodifferential techniques are not limited to 
this situation. 

We assume that V is a C^∞ function on R^n which 
is semibounded and satisfies 


\[ \inf V < \lim_{|x| \to \infty} V(x) \]


The Weyl theorem (which is a basic theorem in 
spectral theory) implies that the essential spectrum is 
contained in 


\[ \Big[ \lim_{|x| \to \infty} V(x), +\infty \Big[ \]


It is also clear that the spectrum is contained in 
[inf V, +∞[. In the interval 


\[ I = \Big[ \inf V, \lim_{|x| \to \infty} V(x) \Big[ \]


the spectrum is discrete, that is, it has only isolated 
eigenvalues with finite multiplicity. For any E in I, it is 
consequently interesting to look at the counting function 
N_h(E) of the eigenvalues contained in [inf V, E], 


\[ N_h(E) = \#\{ \lambda_j(h);\ \lambda_j(h) \le E \} \qquad [6] \]
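The main semiclassical result is then 

Theorem 1 With the previous assumptions, we have: 


\[
\lim_{h \to 0} h^n N_h(E) = (2\pi)^{-n} \omega_n \int_{V(x) \le E} (E - V(x))^{n/2}\, dx
\]


where ω_n denotes the volume of the unit ball in R^n. 
The main term in the expansion of N_h(E), which 
will be denoted by 


\[
W_h(E) := (2\pi h)^{-n} \omega_n \int_{V(x) \le E} (E - V(x))^{n/2}\, dx
\]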


is called the Weyl term. It has an analog for the 
analysis of the counting function for Laplacians on 
compact manifolds (see Quantum Ergodicity and 
Mixing of Eigenfunctions and references therein), but 
let us emphasize that here E is fixed and that one 
looks at the asymptotics as h → 0. In the other case, h 
is fixed and one looks at the asymptotics as E → +∞ 
(note that on a compact manifold and for the 
Laplacian, the formula N_h(E) = N_1(E/h²) permits 
switching between these cases). 
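
The Weyl term can be compared with an exact count in the simplest case. The sketch below assumes n = 1 and V(x) = μx², whose eigenvalues are √μ (2k + 1)h, and takes the Weyl term to be the phase-space volume of {ξ² + V(x) ≤ E} divided by 2πh, that is, (2πh)^{-1} · 2 · ∫ (E − μx²)^{1/2} dx; the function names and the midpoint quadrature are illustrative choices.

```python
import math

def count_eigenvalues(E, h, mu=1.0):
    """N_h(E) for -h^2 d^2/dx^2 + mu*x^2, with eigenvalues sqrt(mu)*(2k+1)*h."""
    k = 0
    while math.sqrt(mu) * (2 * k + 1) * h <= E:
        k += 1
    return k

def weyl_term(E, h, mu=1.0, cells=20000):
    """Phase-space volume of {xi^2 + mu*x^2 <= E} divided by 2*pi*h.
    The volume is 2 * integral of sqrt(E - mu*x^2), by the midpoint rule."""
    a = math.sqrt(E / mu)
    dx = 2.0 * a / cells
    integral = 0.0
    for i in range(cells):
        x = -a + (i + 0.5) * dx
        integral += math.sqrt(max(E - mu * x * x, 0.0)) * dx
    return 2.0 * integral / (2.0 * math.pi * h)

E = 1.0
results = []
for h in (0.1, 0.01, 0.001):
    n, w = count_eigenvalues(E, h), weyl_term(E, h)
    results.append((h, n, w))
    print(f"h = {h:<6} N_h(E) = {n:<4} W_h(E) = {w:.3f}  h*N_h(E) = {h * n:.4f}")
```

In this example the difference N_h(E) − W_h(E) stays bounded as h decreases, consistent with a remainder of order h^{-n+1} for n = 1.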

Although this formula is rather old (first as a 
folk theorem), many efforts have been made by 
mathematicians to analyze the remainder 
N_h(E) − W_h(E) (see Robert (1987), Ivrii (1998), and 
references therein), whose behavior is again related to 
classical analysis. When E is not a critical value of 
V, h^{n−1}(N_h(E) − W_h(E)) can be shown to be 
bounded, and it is even o(1) if the measure 
of the periodic points for the flow is 0 (see Ivrii 
(1998)). 

Beyond the analysis of the counting function, 
one is also interested (e.g., in questions concerning 
the ground-state energy of an atom with a large 
number of particles, N, satisfying the Pauli exclusion 
principle (see Stability of Matter)) in other 
quantities like the Riesz means, which are defined, 
for a given s ≥ 0, by 


\[ N_h^s(E) = \sum_j (E - \lambda_j(h))_+^s \]


The case s = 0 corresponds to the counting function. 
It is then natural to ask for the asymptotic behavior 
as h → 0 of these functions. 

We have, for example, the following result 
(Helffer-Robert, Ivrii-Sigal, and Ivrii; see Robert 
(1987) and Ivrii (1998)), which is written here in a 
more Hamiltonian version, when E is not a critical 
value of V, 


\[
N_h^s(E) = (2\pi h)^{-n} \Big( \iint_{p(x,\xi) \le 0} (-p(x, \xi))^s\, dx\, d\xi + O(h^{1+s}) \Big)
\]


with p(x, ξ) = ξ² + V(x) − E. 


Uncertainty principle and Weyl term The Weyl 
term can be heuristically understood in the following 
way. According to the uncertainty principle, a 
“quantum” particle should occupy at least a volume 
of order h^n in the phase space with the measure 
dx dξ (proportional to (Σ_j dξ_j ∧ dx_j)^n). This 
guess is a consequence of the inequality 


\[
\frac{h}{2}\, \|u\|^2 \le \Big( \int_{\mathbb{R}} \Big| \Big( \frac{h}{i} \frac{d}{dx} - \xi_0 \Big) u \Big|^2 dx \Big)^{1/2} \Big( \int_{\mathbb{R}} |(x - x_0)\, u|^2\, dx \Big)^{1/2}, \quad \forall u \in S(\mathbb{R})
\]


expressing the noncommutation of the operators 
((h/i)d/dx − ξ_0) and (multiplication by) (x − x_0). 
When ||u|| = 1 and x_0 (mean position) and ξ_0 (mean 
momentum) are defined by x_0 := ∫_R x|u|² dx and 
ξ_0 := (h/i) ∫_R u'(x) · ū(x) dx, this inequality expresses 
the impossibility for a quantum particle to have a 
simultaneous small localization in position and 
momentum. 

Consequently, the maximal number of “quantum” 
particles which can live in the region {p(x, ξ) ≤ 0} is 
approximately (up to some universal multiplicative 
constant) the volume of this region divided by (2πh)^n. 








Lieb-Thirring inequalities and Scott’s conjecture In 
the case of regular potentials, we have seen that the 
quantity h^n N_h^s(E) was asymptotically equal as h → 0 
to L^cl_{s,n} (∫_{V(x)≤E} (E − V(x))^{s+n/2} dx). For other questions 
occurring in atomic physics (see Stability of 
Matter), one is more interested in the existence of 
universal constants M_{s,n}, such that 


\[
h^n N_h^s(E) \le M_{s,n} \Big( \int_{V(x) \le E} (E - V(x))^{s + n/2}\, dx \Big)
\]


for any V and any h. 

The best M_{s,n} (which exists if s + n/2 > 0) is 
denoted by L_{s,n} (for s = 0, this is called the 
Cwikel-Lieb-Rozenblum inequality). The semiclassical 
result gives the inequality L_{s,n} ≥ L^cl_{s,n}. 

A still open question is the so-called Lieb-Thirring 
conjecture: do we have L_{1,3} = L^cl_{1,3}? This is related to 
the question of the stability of matter (see Stability 
of Matter). The latest results in this direction have been 
obtained quite recently by A Laptev and T Weidl, 
who show, for example, the equality for s ≥ 3/2. 

The control, when s = 1, of a second term (for more 
singular potentials) for N_h^1(E) was the object of the 
Scott conjecture, which was solved recently in many 
important cases by Hughes, Siedentop-Weikard, 
Ivrii-Sigal, and Fefferman-Seco (see Ivrii (1998), 
Stability of Matter, and references therein). 


Localization of the eigenfunctions The localization 
property was already observed in the specific case of 
the harmonic oscillator. But this was a consequence 
of an explicit description of the eigenfunctions. It 
is quite important to have a good description of the 
decay of the eigenfunctions (as h → 0) outside the 
classically permitted region without having to know 
an explicit formula. Various approaches can be used. 

The first one fits very well in the case of the 
Schrödinger operator (more generally to h-pseudodifferential 
operators with symbols admitting holomorphic 
extensions in the ξ variable) and gives exponential 
decay. This is based on the so-called Agmon estimates 
(developed in the semiclassical context by Helffer-Sjöstrand 
and Simon). We shall not say more about 
this approach, which is the starting point of the analysis 
of the tunneling (see Helffer (1988), Dimassi and 
Sjöstrand (1999), and Martinez (2002)). 

The second one is an elementary application 
of the h-pseudodifferential formalism which will 
be described later and leads, for example, to the 
following statement. Let E be in I and let (λ(h_j), 
φ_{h_j}(x)) be a sequence of spectral pairs in I × 
L²(R^n), where h_j → 0 as j → +∞, λ(h_j) → E, and 
x ↦ φ_{h_j}(x) is an L²-normalized eigenfunction 
associated with λ(h_j). Let Ω be a relatively compact 
set in R^n such that 


\[ V^{-1}(]-\infty, E]) \cap \overline{\Omega} = \emptyset \]


Then there exists, for every integer N, a constant C_{N,Ω} 
such that 


\[ \|\phi_{h_j}\|_{L^2(\Omega)} \le C_{N,\Omega}\, h_j^N \]


A third one uses the notion of frequency set and 
will be discussed later (see also the book of Martinez 
(2002) for what can be done with the Fourier-Bros-Iagolnitzer 
transform as developed by J Sjöstrand). 


Brief Introduction to the 
h-Pseudodifferential Calculus 


For fixed h, the pseudodifferential calculus has a long 
history, starting in its modern form in the 1960s. A 
rather complete version of the calculus is presented in 
Hörmander (1984). We emphasize here the 
semiclassical aspect of the calculus, that is, the 
dependence of the calculus on the parameter h > 0. 


h-Pseudodifferential Calculus 


Basic calculus: the class S^0 We shall mainly discuss 
the simplest one, called the S^0 calculus. Let us 
first say that the S^0 calculus is sufficient once we 
have suitably (micro)-localized the problem (e.g., by 
the functional calculus). Note that it is also 
sufficient for the local analysis of many problems 
occurring on compact manifolds. 


This class of symbols p is simply defined by the 
conditions: 


\[ |\partial_x^\alpha \partial_\xi^\beta p(x, \xi)| \le C_{\alpha,\beta} \qquad [7] \]


for all (α, β) ∈ N^n × N^n. The symbols can possibly 
be h-dependent. With this symbol, one can associate 
an h-pseudodifferential operator by [3]. This operator 
is a continuous operator on S(R^n) but can also 
be defined by duality on S'(R^n). 

The first basic analytical result is the Calderón-Vaillancourt 
theorem (see Hörmander (1984)) establishing 
the L²-continuity. We also mention that if p 
is in L²(R^{2n}), the associated operator is Hilbert-Schmidt. 
One can also give conditions on p implying 
the trace-class property (replace the uniform control 
in [7] by a control in L¹). 

The second important property is the existence of 
a calculus. If a is in S^0 and b is in S^0, then the 
composition a^w(x, hD_x) ∘ b^w(x, hD_x) of the two 
operators is a pseudodifferential operator associated 
with an h-dependent symbol c in S^0: 


\[ a^w(x, hD_x) \circ b^w(x, hD_x) = c^w(x, hD_x; h) \]


We see here that we immediately meet symbols 
admitting expansions in powers of h, which we shall 
call regular symbols, in the sense that they admit 
expansions of the type 


\[
a(x, \xi; h) \sim \sum_{j \ge 0} a_j(x, \xi)\, h^j, \qquad
b(x, \xi; h) \sim \sum_{j \ge 0} b_j(x, \xi)\, h^j
\]


In this case the Weyl symbol c of the composition 
has a similar expansion (with D = (1/i)∂): 


\[
c(x, \xi; h) \sim \exp\Big( \frac{ih}{2} \big( D_\xi \cdot D_y - D_x \cdot D_\eta \big) \Big) \big( a(x, \xi; h)\, b(y, \eta; h) \big) \Big|_{x = y;\ \xi = \eta}
\]


The symbol a_0 is called the principal symbol. At the 
level of principal symbols, the rule is simply that 
the principal symbol of a^w ∘ b^w is the product of 
the principal symbols of a^w and b^w: c_0 = a_0 · b_0. 
Another important property is the following correspondence 
between the commutator of two operators 
and Poisson brackets. The principal symbol of the 
commutator (1/h)(a^w ∘ b^w − b^w ∘ a^w) is (1/i){a_0, b_0}, 
where {f, g} is the Poisson bracket of f and g: 


\[
\{f, g\}(x, \xi) = H_f\, g = \sum_j \big( \partial_{\xi_j} f \cdot \partial_{x_j} g - \partial_{x_j} f \cdot \partial_{\xi_j} g \big)
\]


About global classes The class S^0 is far from being 
sufficient for analyzing the global spectral problem 
and we refer the reader to Hörmander (1984) or 
Robert (1987) for an extensive presentation of the 
theory and for the discussion of other quantizations. 
Our initial operators (think of the harmonic oscillator) 
do not belong to these classes of pseudodifferential 
operators. We are consequently obliged to 
construct more general classes including these 
examples in order to realize this localization. Once 
such a class is introduced, one of the main points to 
consider is the existence of a quasi-inverse (or 
parametrix) for a suitably defined elliptic operator 
of positive order. Following Beals-Fefferman 
(see also the most general Hörmander calculus 
in Hörmander (1984) and references therein), we 
introduce a scale function (possibly h-dependent; 
typically, m(x, €; h) = h”mo(x, €)) (x, €) > m(x, £; h) 
and C^∞ strictly positive weight functions φ and 
Φ such that φ · Φ ≥ 1. All these functions are 
strictly positive and should satisfy additional 
conditions on their variation and growth. The 
class of symbols S^{reg}(m, φ, Φ) is defined by 


\[
|D_x^\alpha D_\xi^\beta p(x, \xi; h)| \le C_{\alpha,\beta}\, m(x, \xi; h)\, \phi(x, \xi)^{-|\alpha|}\, \Phi(x, \xi)^{-|\beta|}
\]


These apparently complicated estimates actually 
permit the control of the variation of the symbol 
in reference balls defined by 


\[
\phi^{-2}(x_0, \xi_0)\, |x - x_0|^2 + \Phi^{-2}(x_0, \xi_0)\, |\xi - \xi_0|^2 \le c
\]


Elliptic theory As noted above, the main point is to 
have a large class of invertible operators, such that 
the inverses are also in the class. This is what we call 
an elliptic theory and the typical statement is: 


Theorem 2 Let P be an h-pseudodifferential operator 
associated with a symbol p in S^{reg}(m, φ, Φ). We assume 
that it is elliptic in the sense that 1/p belongs to 
S^{reg}(1/m, φ, Φ). Then there exists an h-pseudodifferential 
operator Q with symbol in S^{reg}(1/m, φ, Φ), such that 


\[ QP = I + R; \quad PQ = I + S \]


The remainders R and S are pseudodifferential 
operators with symbols in 


\[ \bigcap_{N} S\Big( \Big( \frac{h}{\Phi\, \phi} \Big)^{N}, \phi, \Phi \Big) \]
These remainders are called “regularizing.” Note 
that this notion depends strongly on the choice of the 
class of pseudodifferential operators! When φ = Φ = 1, 
we are just inverting modulo a remainder whose norm 
in L(L²) is O(h^∞) (or simply O(h) at the first step). 
With other weights like φ = Φ = √(1 + |x|² + |ξ|²), we 
invert P modulo a remainder, which has, in addition, a 
distribution kernel in the Schwartz space S(R^n × R^n). 
The invertibility modulo a compact operator (which 
implies the Fredholm property) is a consequence of the 
assumption 


\[ \lim_{|x| + |\xi| \to +\infty} \phi(x, \xi)\, \Phi(x, \xi) = +\infty \]

The proof is rather easy, once the formalism of 
composition and the notion of principal symbol have 
been understood. One can indeed start from the 
operator Q_0 of symbol 1/p and observe that Q_0 P = I + 
R_1 holds, with R_1 in Op^w(S(h/(Φ · φ), φ, Φ)). The 
operator (I + R_1)^{-1} Q_0 ~ (Σ_{j≥0} (−1)^j R_1^j) Q_0 gives 
essentially the solution. 


Essential Self-Adjointness and Semiboundedness 


We now sketch two applications of this calculus in 
spectral theory. We shall usually consider in our 
applications an h-pseudodifferential operator P, 
whose Weyl symbol p is regular, that is, admitting 
an asymptotic expansion: 


\[ \text{(H0)} \qquad p(x, \xi; h) \sim \sum_{j \ge 0} h^j p_j(x, \xi) \]


(We refer to Robert (1987), Hörmander (1984), and 
Dimassi and Sjöstrand (1999) for a more precise 
formulation). Moreover, we assume that 


\[ \text{(H1)} \qquad (x, \xi) \mapsto p(x, \xi; h) \in \mathbb{R} \]


This implies, as can be immediately seen from [3], 
that p^w is symmetric (= formally self-adjoint): 


\[ \langle p^w u, v \rangle_{L^2} = \langle u, p^w v \rangle_{L^2}, \quad \forall u, v \in S(\mathbb{R}^n) \]


The third assumption is that the principal symbol is 
bounded from below (and there is no restriction to 
assume that it is positive) 


\[ \text{(H2)} \qquad p_0(x, \xi) \ge 0 \]

This assumption implies that the operator itself is 
bounded from below. This result belongs to the 
family of the so-called Gårding inequalities. More 
precisely, the assumption (there are other quantizations, 
e.g., the anti-Wick quantization, for which 
this result becomes trivial, the difference between 
the two quantizations being O(h)) will basically 
give, if m > 1, the existence of a constant C such 
that, for any u ∈ S(R^n), 


\[ \langle P_h u, u \rangle \ge -C h\, \|u\|^2 \]

Everything is proved if m(x, ξ) = (p_0 + 1) is a scale 
function, if the p_j and their derivatives are controlled by 
(p_0 + 1): 


(H3) [0°99 p;)(x,6)| <Ca,aj(bo + b(x,é) T 
x (x, rae 


for all (a,G)eEN” xN”, and if there is a suitable 
control of the family (N € N) of symbols 


(msm) -2 


Under these assumptions, the main result is that P 
is, for h small enough, essentially self-adjoint. This 
means that the operator which was initially defined 
on S(R^n) by the pseudodifferential operator with 
symbol p admits a unique self-adjoint extension. 


The Functional Calculus 


It is well known by the spectral theorem for a self-adjoint 
operator P that a functional calculus exists 
for Borel functions. What is important here is to find 
a class of functions (actually essentially C_0^∞) such 
that f(P) is a pseudodifferential operator in the same 
class as P with simple rules of computation for the 
principal symbol. 

We start from the general formula (see 
Dimassi and Sjöstrand (1999)) 


\[
f(P) = -\pi^{-1} \lim_{\epsilon \to 0^+} \iint_{|\mathrm{Im}\, z| \ge \epsilon} \frac{\partial \tilde f}{\partial \bar z}(x, y)\, (z - P)^{-1}\, dx\, dy
\]


which is true for any self-adjoint operator and any f 
in C_0^∞(R). Here the function (x, y) ↦ f̃(x, y) (note 
that z = x + iy) is a compactly supported, almost 
analytic extension of f to C. This means that f̃ = f 
on R and that for any N ∈ N there exists a constant 
C_N such that 


\[ \Big| \frac{\partial \tilde f}{\partial \bar z}(z) \Big| \le C_N\, |\mathrm{Im}\, z|^N \]











The main result due to Helffer-Robert (see also 
Dimassi and Sjöstrand (1999) and references 
therein) is that, for P an h-regular pseudodifferential 
operator satisfying (H0)-(H3) and f in C_0^∞(R), the 
operator f(P) is an h-pseudodifferential operator, 
whose Weyl symbol p_f(x, ξ; h) admits a formal 
expansion in powers of h: 


\[ p_f(x, \xi; h) \sim \sum_{j \ge 0} h^j p_{f,j}(x, \xi) \]


with 


\[
\begin{aligned}
p_{f,0} &= f(p_0) \\
p_{f,1} &= p_1 \cdot f'(p_0) \\
p_{f,j} &= \sum_{k=1}^{2j-1} (-1)^k (k!)^{-1}\, d_{j,k}\, f^{(k)}(p_0), \quad \forall j \ge 2
\end{aligned}
\]


where the d_{j,k} are universal polynomial functions of 
the symbols ∂_x^α ∂_ξ^β p_ℓ, with |α| + |β| + ℓ ≤ j. 

The main point in the proof is that we can construct, 
for Im z ≠ 0, a parametrix (= approximate inverse) for 
(P − z) with a nice control as Im z → 0. The constants 
controlling the estimates on the symbols explode 
as Im z → 0, but the choice of the almost analytic 
extension of f absorbs any negative power of |Im z|. 

As a consequence, we get that if, for some interval 
I and some ε_0 > 0, 

(H4) p_0^{-1}(I + [−ε_0, ε_0]) is compact 

then the spectrum is, for h small enough, discrete in I. 

In particular, we get that, if p_0(x, ξ) → +∞ as 
|x| + |ξ| → +∞, then the spectrum of P_h is discrete 
(P_h has compact resolvent). Under the assumption 
(H4), we get more precisely the following theorem. 


Theorem 3 Let P be an h-regular pseudodifferential 
operator satisfying (H0)-(H4), with I = [E_1, E_2]; 
then, for any g in C_0^∞(]E_1, E_2[), we have the 
following expansion in powers of h: 


\[ \mathrm{tr}\,[g(P(h))] \sim h^{-n} \sum_{j \ge 0} h^j\, T_j(g) \quad \text{as } h \to 0 \]


where g ↦ T_j(g) are distributions in D'(]E_1, E_2[). 
In particular, we have 


\[
\begin{aligned}
T_0(g) &= (2\pi)^{-n} \iint g(p_0(x, \xi))\, dx\, d\xi \\
T_1(g) &= (2\pi)^{-n} \iint g'(p_0(x, \xi))\, p_1(x, \xi)\, dx\, d\xi
\end{aligned}
\]


This theorem is just obtained by integration of the 
preceding one, because in these cases the trace of a 
trace-class pseudodifferential operator Op^w(a) is 
given by the integral of the symbol a over 
R^{2n} = R^n_x × R^n_ξ. According to [3], the distribution 
kernel is given by the oscillatory integral: 


\[
K(x, y; h) = (2\pi h)^{-n} \int_{\mathbb{R}^n} \exp\Big( \frac{i}{h} (x - y) \cdot \xi \Big)\, a\Big( \frac{x + y}{2}, \xi; h \Big)\, d\xi \qquad [8]
\]


and the trace of Op^w(a) is the integral over R^n of the 
restriction to the diagonal of the distribution kernel: 


\[ K(x, x; h) = (2\pi h)^{-n} \int_{\mathbb{R}^n} a(x, \xi; h)\, d\xi \]

Of course, one could think of using the theorem 
with g, the characteristic function of an interval, in 
order to get, for example, the behavior of the counting 
function attached to this interval. This is of course not 
directly possible and this will be obtained only through 
Tauberian theorems (Hörmander (1968), (1984), Ivrii 
(1998)) and at the price of additional errors. 
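
For a smooth compactly supported g, by contrast, the leading term of the trace expansion can be checked directly. The sketch below assumes n = 1 and P = −h² d²/dx² + x², whose eigenvalues are (2k + 1)h, and compares tr g(P) with h^{-1}(2π)^{-1} ∬ g(x² + ξ²) dx dξ, which in polar coordinates equals (2h)^{-1} ∫ g(s) ds. The bump function g, supported in ]1, 2[, and the function names are illustrative assumptions.

```python
import math

def g(s):
    """Smooth bump supported in ]1, 2[, standing in for g in C_0^infty(]E1, E2[)."""
    u = 2.0 * s - 3.0
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1.0 else 0.0

def trace_g(h):
    """tr g(P) for P = -h^2 d^2/dx^2 + x^2, with eigenvalues (2k+1)h."""
    total, k = 0.0, 0
    while (2 * k + 1) * h < 2.0:
        total += g((2 * k + 1) * h)
        k += 1
    return total

def leading_term(h, cells=100000):
    """h^(-1) * (2*pi)^(-1) * double integral of g(x^2 + xi^2).
    In polar coordinates the double integral is pi * integral of g(s) ds."""
    ds = 1.0 / cells
    integral = sum(g(1.0 + (i + 0.5) * ds) for i in range(cells)) * ds
    return math.pi * integral / (2.0 * math.pi * h)

for h in (0.05, 0.02, 0.01):
    print(f"h = {h:<5} tr g(P) = {trace_g(h):.6f}   leading term = {leading_term(h):.6f}")
```

The agreement improves quickly as h decreases, which is compatible with an expansion in powers of h whose higher-order terms are small in this example.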

Let us, however, remark that, if the function g is not 
regular, then the length of the expansion depends on 
the regularity of g. So it will not be surprising that, by 
looking at the Riesz means, we shall get a better 
expansion when s is large. 

Anyway, one basic interest of functional calculus is 
to permit a localization in the energy of the operator. 
For a general h-pseudodifferential operator, it could be 
difficult to approximate an operator like exp(−itP/h) 
by suitable Fourier integral operators, but approximating 
exp(−itP/h)f(P) for suitable compactly supported f 
could be easier. 

Another interest is that for suitable f (possibly 
h-dependent) the operator f(P) could have better 
properties than the initial operator. This idea will, for 
example, be applied for the theorem concerning 
clustering. It appears, in particular, very powerful in 
dimension 1, where we can in some interval of energy 
find a function t ↦ f(t; h) admitting an expansion in 
powers of h such that f(P; h) has the spectrum of the 
harmonic oscillator. This is a way to get the Bohr-Sommerfeld 
conditions (see Helffer-Robert (1987), 
together with Maslov (1972), Leray (1981), or the 
thesis of A Voros in 1977), which read: 


\[ f(\lambda_n(h); h) \sim (2n + 1)\, h \quad \text{modulo } O(h^\infty) \]


h-Fourier Integral Operators 
and Evolution Operators 


Classical Mechanics 


Let us come back to the Hamilton equations [1]. 
The local existence of solutions is well known. If, in 
addition, we assume (H4), the energy conservation 
law implies global existence for these solutions, if 
the initial data (y, η) belong to p^{-1}(I). 

We recall that (y, η) ↦ φ^t(y, η) = (x(t, y, η), ξ(t, y, η)) 
defines for any t a canonical transformation, that is, a 
diffeomorphism respecting the symplectic 2-form: 


\[ \sigma = \sum_j d\xi_j \wedge dx_j \]

We shall denote by Λ^t the graph of φ^t, which is a 
Lagrangian submanifold (which means that at each 
point m of the manifold the restriction of the 
symplectic 2-form to T_m Λ^t is 0) for the 2-form 
on R^{2n} × R^{2n}: Σ_j dη_j ∧ dy_j − Σ_j dξ_j ∧ dx_j. 

When the projection (y, η, x, ξ) ↦ (η, x) gives a 
local system of coordinates for Λ^t (and this will 
always be the case for (η, x) in a compact set and t 
small enough), one easily finds, using the Lagrangian 
character of Λ^t, a function (η, x) ↦ S_t(x, η) such 
that 


\[ \Lambda^t = \{ (y, \eta, x, \xi) \mid y = \partial_\eta S_t,\ \xi = \partial_x S_t \} \]


This function is only defined modulo an arbitrary 
function of t. In order to get a more natural 
choice, we consider the Lagrangian submanifold 
in R^{2n} × R^{2n} × R² defined as 


\[
\Lambda = \{ (y, \eta, x, \xi, t, \tau) \mid (x, \xi) = \phi^t(y, \eta),\ \tau = -p_0(x, \xi) \} \qquad [9]
\]


The parametrization of Λ by its projection 
(y, η, x, ξ, t, τ) ↦ (η, x, t) will now give a natural 
function (η, x, t) ↦ S(t, x, η) = S_t(x, η) describing Λ by 


\[
\Lambda = \{ (y, \eta, x, \xi, t, \tau) \mid y = \partial_\eta S,\ \xi = \partial_x S,\ \tau = \partial_t S \} \qquad [10]
\]

We observe that we can choose 


\[ S(0, x, \eta) = x \cdot \eta \qquad [11] \]


and that S is automatically a solution of the Hamilton-Jacobi 
equation 


\[ (\partial_t S)(t, x, \eta) + p_0(x, \partial_x S(t, x, \eta)) = 0 \qquad [12] \]


also called the eiconal equation. 
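
As a sanity check on [11] and [12], one can test a candidate generating function numerically. The sketch below assumes the one-dimensional Hamiltonian p_0(x, ξ) = (ξ² + x²)/2, whose flow is a rotation of phase space, and the candidate S(t, x, η) = −(1/2)(x² + η²) tan t + xη / cos t for |t| < π/2; the helper names, the finite-difference step, and the sample point are illustrative assumptions.

```python
import math

def S(t, x, eta):
    """Candidate generating function for p_0 = (xi^2 + x^2)/2, for |t| < pi/2."""
    return -0.5 * (x * x + eta * eta) * math.tan(t) + x * eta / math.cos(t)

def p0(x, xi):
    return 0.5 * (xi * xi + x * x)

def d(f, i, args, eps=1e-5):
    """Central finite difference of f with respect to its i-th argument."""
    lo, hi = list(args), list(args)
    lo[i] -= eps
    hi[i] += eps
    return (f(*hi) - f(*lo)) / (2.0 * eps)

def flow(t, y, eta):
    """Hamiltonian flow of p_0: a rotation by the angle t in phase space."""
    return (y * math.cos(t) + eta * math.sin(t),
            -y * math.sin(t) + eta * math.cos(t))

t, x, eta = 0.7, 0.3, -1.2
dS_dt, dS_dx, dS_deta = (d(S, i, (t, x, eta)) for i in range(3))

hj_residual = dS_dt + p0(x, dS_dx)        # left-hand side of the Hamilton-Jacobi equation
x_back, xi_back = flow(t, dS_deta, eta)   # flow applied to (d_eta S, eta)
print(f"Hamilton-Jacobi residual: {hj_residual:.2e}")
print(f"flow check: ({x_back - x:.2e}, {xi_back - dS_dx:.2e})")
print(f"S(0, x, eta) - x*eta = {S(0.0, x, eta) - x * eta:.2e}")
```

The second check verifies that the flow maps (∂_η S, η) to (x, ∂_x S), which is the relation expressed by describing the graph of the flow through S.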
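We also observe the following property (by 
comparison of [9] and [10]): 


\[ \phi^t(\partial_\eta S(t, x, \eta), \eta) = (x, \partial_x S(t, x, \eta)) \]


We have actually an explicit expression of S(t, x, η) in 
terms of the inverse y(t, x, η) of the map y ↦ x(t, y, η): 


\[
S(t, x, \eta) = y(t, x, \eta) \cdot \eta + \int_0^t \big[ \xi(s, y, \eta) \cdot \partial_s x(s, y, \eta) - p_0(y, \eta) \big]\, ds \,\Big|_{y = y(t, x, \eta)}
\]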
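For the harmonic oscillator, easy computations give 


\[
p(x, \xi) = \tfrac12 (\xi^2 + x^2), \qquad H_p = \xi\, \partial_x - x\, \partial_\xi
\]
\[
\phi^t(y, \eta) = (y \cos t + \eta \sin t,\ -y \sin t + \eta \cos t)
\]


and 


\[
S(t, x, \eta) = -\tfrac12 (x^2 + \eta^2) \tan t + \frac{x\, \eta}{\cos t}
\]


Fourier Integral Operators 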


We have already given in [8] the distribution kernel 
of an h-pseudodifferential operator. It appears useful 
to generalize this point of view by considering more 
generally objects defined similarly as 


\[
K(x, y; h) = (2\pi h)^{-(n+N)/2} \int_{\mathbb{R}^N} \exp\Big( \frac{i}{h} \phi(x, y, \theta) \Big)\, a(x, y, \theta; h)\, d\theta
\]


There are a lot of examples entering in this framework. 
The representation of the metaplectic group in 
L²(R^n) appears to be in this class, with the specificity 
that the phase is quadratic (Guillemin and Sternberg 
(1977)). A quite elementary case corresponds to the 
case when N = 0 and φ(x, y) = x · y. No θ variable is 
present and so no integration appears with respect to 
θ. When a = 1, this defines essentially the Fourier 
transform. Under suitable conditions on φ and a, one 
can show that the associated operators are continuous 
on S(R^n) (this is, of course, the case for the 
Fourier transform). This was done by Asada and 
Fujiwara, who transpose the theory developed by 
Hörmander (1971) in this context, and we should 
also mention the older (but more formal) work by 
Maslov (1972) (see also Leray (1981)). We actually 
do not need it in the semiclassical context because 
the case when the amplitude has compact support 
is sufficient. 

The basic object is first to look, thinking of 
the stationary-phase theorem, which gives the 
main contribution as h — 0 in this “formal integral” 
(see Stationary Phase Approximation), at the critical 
set Cg: 


Co = {(x, 9,0) € R” x R” x RY |(d9¢) (x,y, 8) = 0} 


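In the case of a pseudodifferential operator, we find
that it is included in $\{x = y\}$. Then we associate the
canonical object, which is a Lagrangian submanifold
called $\Lambda_\phi$ and defined as

\[
\Lambda_\phi = \{ (x, \xi, y, \eta) \mid \exists\, \theta \ \text{s.t.}\ \xi = \nabla_x \phi(x, y, \theta),\
 \eta = -\nabla_y \phi(x, y, \theta),\ \nabla_\theta \phi(x, y, \theta) = 0 \}
\]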


The assumptions on $\phi$ (which are omitted here)
ensure that $\Lambda_\phi$ is a regular manifold, at
least on the support of $a$. The associated operators
are called Fourier integral operators (FIOs).
L. Hörmander (1971, 1984) has developed a general
and more intrinsic machinery, but with a homogeneity
condition on the phase which is irrelevant in
the semiclassical context. This theory also permits
the reduction of Hamiltonians to normal forms, in
continuation of what can be done in classical
mechanics.


Quantum Evolution 


We just sketch how one approximates the operator
$\exp(-itP/h)$ by an FIO. The formal construction is
probably rather old (Maslov 1972, Fedoryuk and
Maslov 1981), but the rigorous approach with
estimates of the remainders was first considered by
J Chazarain, with rather strong assumptions. It was
later realized that only a local approximation of this
operator is needed, and everything becomes easier.

The first approach, followed by Helffer-Robert (see
Robert (1987)), is to localize in energy, within the
functional calculus associated with the operator $P$. If $I$ is
an interval and $\chi$ has compact support in $I$, it
is easier to approximate $\exp(-itP/h)\,\chi(P)$
when $P$ satisfies (H4) in a neighborhood of $I$.

We do not need any more assumptions at $\infty$, and
the composition with $\chi(P)$ localizes the construction.

Although this construction is simple because we
remain within a functional calculus which involves
only functions of $P$, it is not always sufficient to
localize in energy. We then have to localize through
more general h-pseudodifferential operators and
consider $\exp(-itP/h)\, a^w(x, hD_x)$, where $a$ is a symbol
with compact support. We shall briefly develop
the first approach. The result is that one can
approximate $U_\chi(t) := \chi(P) \exp(-itP/h)$ by a Fourier
integral operator of the form


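\[
K_{U_\chi^{\mathrm{app}}}(t, x, y; h) = (2\pi h)^{-n} \int
 \exp\Big( \frac{i}{h} \big( S(t, x, \eta) - y \cdot \eta \big) \Big)\,
 d_\chi(t, x, \eta; h)\, d\eta
\]

with $d_\chi \sim \sum_j d_{\chi, j}\, h^j$, in order to have

\[
\Big\| \chi(P) \exp\Big( -\frac{itP}{h} \Big) - U_\chi^{\mathrm{app}}(t) \Big\|_{\mathcal{L}(L^2)} = O(h^\infty)
\]

Writing that $U_\chi(t)$ is a solution of $(hD_t + P)U_\chi = 0$,
$U_\chi(0) = \chi(P)$, and expanding in powers of $h$, one
gets a sequence of equations which determine the symbols
recursively. The first one was analyzed
in [12] and reads, in the case when $P = -h^2 \Delta + V$:

\[
(\partial_t S)(t, x, \eta) + |\nabla_x S(t, x, \eta)|^2 + V(x) = 0
\]

with the initial condition $S(0, x, \eta) = x \cdot \eta$.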

This has been solved for $t$ small enough. The other
equations are called transport equations. The first
one is, for $a(t, x, \eta) = d_{\chi, 0}(t, x, \eta)$,

\[
\partial_t a + (\partial_\xi p_0)\big(x, \partial_x S(t, x, \eta)\big) \cdot \partial_x a + c\, a = f
\]

with initial condition $a(0, x, \eta) = \chi(p_0(x, \eta))$.
This type of equation is easily solved by integration
along the integral curves of the vector field

\[
\partial_t + (\partial_\xi p_0)\big(x, \partial_x S(t, x, \eta)\big) \cdot \partial_x
\]


Applications 
The Frequency Set 


We have already met the question of the localization of
the eigenfunctions. It is important to describe this
localization not only in position (in a domain of $\mathbb{R}^n$)
but directly in phase space. This can be
done through the notion of frequency set attached
to a bounded family $u_h$ of functions in $L^2(\mathbb{R}^n)$ (or
more generally of distributions in $\mathcal{S}'(\mathbb{R}^n)$). Here $h$
belongs to an interval $(0, h_0]$ or, more generally, to a
subset of $\mathbb{R}^+$ having 0 as an accumulation point.


Definition 4 We shall say that $(x_0, \xi_0) \in \mathbb{R}^n \times \mathbb{R}^n$
does not belong to the frequency set of the family $u_h$,
and write $(x_0, \xi_0) \notin FS(u_h)$, if there exist a compactly
supported function $\phi$ equal to 1 in a neighborhood of
$x_0$ and a neighborhood $V$ of $\xi_0$ in which the h-Fourier
transform of $\phi u_h$ satisfies, as $h \to 0$,

\[
\int \exp\Big( -\frac{i}{h}\, x \cdot \xi \Big)\, \phi(x)\, u_h(x)\, dx = O(h^\infty) \quad \text{in } V
\]


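For example, the frequency set $FS(u_h)$ of
$u_h(x) = \chi(x) \exp(i\theta(x)/h)$ with compactly supported
$\chi$ is contained in $\{(x, \xi) \mid x \in \mathrm{supp}\, \chi,\ \xi = \nabla_x \theta(x)\}$, and
the frequency set of the coherent state

\[
\varphi_{h, (y, \eta)}(x) = (\pi h)^{-n/4} \exp\Big( \frac{i}{h}\, x \cdot \eta \Big)
 \exp\Big( -\frac{|x - y|^2}{2h} \Big)
\]

is reduced to the point $(y, \eta)$.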
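The concentration of a coherent state at the single phase-space point $(y, \eta)$ can be observed numerically. The sketch below works in dimension 1 and assumes the Gaussian form $(\pi h)^{-1/4} \exp(i x \eta / h) \exp(-(x - y)^2/(2h))$ for the coherent state and the normalization $(2\pi h)^{-1/2} \int \exp(-i x \xi / h)\, u(x)\, dx$ for the h-Fourier transform; both conventions, as well as the function names and quadrature parameters, are assumptions made for illustration. Because the Gaussian already localizes in $x$, no cutoff function is applied.

```python
import cmath
import math

def coherent_state(x, y, eta, h):
    """One-dimensional Gaussian coherent state attached to (y, eta) (assumed form)."""
    return ((math.pi * h) ** -0.25
            * cmath.exp(1j * x * eta / h)
            * math.exp(-(x - y) ** 2 / (2 * h)))

def h_fourier(xi, y, eta, h, half_width=8.0, steps=4000):
    """Approximate (2 pi h)^(-1/2) * integral of exp(-i x xi / h) u(x) dx by the
    trapezoidal rule on [y - w, y + w], where w = half_width * sqrt(h)."""
    w = half_width * math.sqrt(h)
    dx = 2 * w / steps
    total = 0j
    for k in range(steps + 1):
        x = y - w + k * dx
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(-1j * x * xi / h) * coherent_state(x, y, eta, h)
    return total * dx / math.sqrt(2 * math.pi * h)

if __name__ == "__main__":
    y, eta = 0.3, 1.0
    for h in (0.1, 0.03, 0.01):
        at_eta = abs(h_fourier(eta, y, eta, h))
        away = abs(h_fourier(eta + 0.5, y, eta, h))
        print(h, at_eta, away)   # the second value decays rapidly as h decreases
```

Under these conventions the modulus of the transform is $(\pi h)^{-1/4} \exp(-(\xi - \eta)^2/(2h))$, so away from $\xi = \eta$ it is smaller than any power of $h$, which is what Definition 4 requires.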

In this semiclassical context, this notion seems
to have been introduced by Guillemin and Sternberg
(1977) and is further discussed in the book of Robert
(1987) (see references therein). It is the semiclassical
analog of the well-known notion of wave front set
of a distribution, introduced by Hörmander (1984)
in the $C^\infty$-category for describing the singularities
of a distribution; note, however, a major difference:
the frequency set is attached to a family. If $P$ is
an h-pseudodifferential operator with symbol in $S^0$, it
is possible, as a consequence of the elliptic theory,
to prove that $FS(P u_h) \subset FS(u_h)$. For an FIO $F$
attached to a canonical relation $\kappa$, we get similarly
$FS(F u_h) \subset \kappa(FS(u_h))$.

We also get a microlocal version of the localization
result for the eigenfunctions mentioned in the first
section (using again the parametrix construction).


Theorem 5 Let $E$ be in $I$ and let $(\lambda(h_j), \varphi^{(h_j)}(x))$ be a
sequence in $I \times L^2(\mathbb{R}^n)$, where $\lambda(h_j) \to E$ and $h_j \to 0$ as
$j \to \infty$, and $x \mapsto \varphi^{(h_j)}(x)$ is an eigenfunction of norm 1
associated with $\lambda(h_j)$. Then

\[
FS(\varphi^{(h_j)}) \subset p_0^{-1}(E)
\]

Moreover, the frequency set of the family $\varphi^{(h_j)}$ is
invariant under the Hamiltonian flow $\phi^t$.


The last statement in the above theorem is the
analog of the theorem on the propagation of
singularities for the solutions of a partial differential
equation (PDE) (see Hörmander (1984)) and is a
consequence of the Egorov theorem, which will be
presented in the next subsection.

Another remarkable property (see, e.g., the
report on the lecture of T Paul in Rauch and Simon
(1997)) is that, say in dimension 1, when $P$ is the
harmonic oscillator, $\exp(-itP/h)$ maps the coherent
state attached to $(y, \eta)$ to a coherent state attached
to $\phi^t(y, \eta)$.


Egorov’s Theorem 


Egorov’s theorem plays a central role in the classical
theory of PDE by permitting one to reduce the study of
general differential operators to the study of simpler
model operators, the simplest one being $\partial/\partial x_1$ (see
Hörmander (1984)). We use it here in a simple form,
given in the semiclassical context by Robert (1987),
which will play an important role in the study
of ergodic situations (see Quantum Ergodicity and
Mixing of Eigenfunctions, and references therein).
The theorem is the following:


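Theorem 6 Let $P$ satisfy assumptions (H0)-(H3).
For all $a$ in $S^0$ with compact support and all $t \in \mathbb{R}$,
we have

\[
\Big\| \exp\Big( -i\frac{t}{h} P \Big)\, a^w(x, hD_x)\, \exp\Big( i\frac{t}{h} P \Big)
 - a_t^w(x, hD_x) \Big\|_{\mathcal{L}(L^2)} = O(h)
\]

where

\[
a_t(x, \xi) = a(\phi^t(x, \xi))
\]

and $\phi^t$ is the flow of $H_{p_0}$, where $p_0$ is the principal
symbol of $P$.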


The proof is based on the study of the operator
$\exp(-i(t/h)P)\, a^w(x, hD_x)\, \exp(i(t/h)P)$, which appears
as the composition of three FIOs. But the Lagrangian
manifold associated with this composition is the graph
of the identity, so this is a pseudodifferential
operator, whose “principal” symbol can be
computed modulo $O(h)$ as $a(\phi^t(x, \xi))$. As an immediate
consequence, $FS(\exp(-itP/h)\, u_h) = \phi^t(FS(u_h))$.
The Poisson Relation


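We start from the harmonic oscillator

\[
\frac{1}{2} \Big( -h^2 \frac{d^2}{dx^2} + x^2 \Big)
\]

Its spectrum is given by $(n + 1/2)h$ ($n \geq 0$). Its
symbol is $a_0(x, \xi) = (1/2)(\xi^2 + x^2)$ and the corresponding
flow, for any strictly positive level $E$, is
periodic with primitive period $2\pi$. The quantity we
are interested in is

\[
S_h(t) = \sum_{j \in \mathbb{N}} \chi\big( (j + \tfrac{1}{2}) h \big) \exp\big( -it (j + \tfrac{1}{2}) \big)
\]

Using the classical Poisson relation,

\[
\sum_{k \in \mathbb{Z}} \hat{F}(k) \exp(i k \tau) = (2\pi) \sum_{k \in \mathbb{Z}} F(\tau + 2k\pi)
\]

one shows rather easily that the frequency set of $S_h$ is

\[
FS(S_h) = \{ (2k\pi, \tau) \mid \tau > 0,\ \tau \in \mathrm{supp}\, \chi,\ k \in \mathbb{Z} \} \cup (\mathbb{R} \times \{0\})
\]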
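The classical Poisson relation used above is easy to test numerically. The following sketch checks it for a Gaussian, assuming the convention $\hat{F}(k) = \int F(x) \exp(-ikx)\, dx$ for the Fourier transform (with this convention the factor $2\pi$ appears exactly as written above). The test function and the truncation of the sums are illustrative choices.

```python
import cmath
import math

def F(x):
    """Test function: a Gaussian."""
    return math.exp(-x * x / 2)

def F_hat(k):
    """Fourier transform of F, with F_hat(k) = integral of F(x) exp(-i k x) dx."""
    return math.sqrt(2 * math.pi) * math.exp(-k * k / 2)

def lhs(tau, cutoff=40):
    """Truncated sum over k of F_hat(k) exp(i k tau)."""
    return sum(F_hat(k) * cmath.exp(1j * k * tau) for k in range(-cutoff, cutoff + 1))

def rhs(tau, cutoff=40):
    """Truncated 2 pi times the sum over k of F(tau + 2 k pi)."""
    return 2 * math.pi * sum(F(tau + 2 * math.pi * k) for k in range(-cutoff, cutoff + 1))

if __name__ == "__main__":
    for tau in (0.0, 0.3, 1.0, 2.5):
        print(tau, abs(lhs(tau) - rhs(tau)))   # differences at rounding level
```

Both sums converge extremely fast for a Gaussian, so the truncation at 40 terms plays no role; for slowly decaying test functions the cutoff would have to be increased.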


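This admits the following generalization, initiated in
this context by Chazarain.


Theorem 7 Let $P$ satisfy (H0)-(H4). Let $\chi$ be a
function with compact support in $I$ and let
$t \mapsto f_\chi(t; h)$ be the family of distributions defined by

\[
f_\chi(t; h) = \mathrm{tr}\Big( \exp\Big( -\frac{itP}{h} \Big)\, \chi(P) \Big)
\]

Then $FS(f_\chi)$ is contained in

\[
\{ (t, \tau) \mid \tau \in \mathrm{supp}(\chi) \ \text{and}\ \exists\, (x, \xi) \ \text{s.t.}\
 p_0(x, \xi) = \tau,\ \phi^t(x, \xi) = (x, \xi) \}
\]

According to the definition, we have to study

\[
\int \exp\Big( -\frac{it\tau}{h} \Big)\, \rho(t)\, f_\chi(t; h)\, dt
\]

where $\rho$ is a compactly supported cutoff in $t$. This takes the form

\[
\int c(t, x, \eta) \exp\Big( \frac{i}{h} \big( -t\tau + S(t, x, \eta) - x \cdot \eta \big) \Big)\, dt\, dx\, d\eta
\]

and can be analyzed by a nonstationary-phase
theorem, in order to determine for which values of
$\tau$ the quantity is $O(h^\infty)$.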


Gutzwiller’s Formula 


The Gutzwiller formula was established formally by
Gutzwiller (1971). It then appears in the context of
high-energy spectral asymptotics in contributions of
Colin de Verdière, Chazarain, and Duistermaat
and Guillemin (see Duistermaat and Guillemin (1975),
Hörmander (1984), Guillemin and Sternberg (1977);
see also Semi-Classical Spectra and Closed Orbits and
Quantum Ergodicity and Mixing of Eigenfunctions). In
the semiclassical context, the simplest statement (cf.
Chazarain, Helffer-Robert, Guillemin-Uribe, Meinrenken,
Paul-Uribe, Dozias, Combescure-Ralston-Robert;
see Robert (1987), Rauch and Simon (1997),
Dimassi and Sjöstrand (1999), and the article
by Combescure et al. (1999) for techniques involving
coherent states) can be presented in the following way.
For a noncritical $E$, we introduce the energy surface
$W_E = \{ w \in T^*\mathbb{R}^n \mid p_0(w) = E \}$. Let $P(h)$ be an
h-pseudodifferential operator satisfying (H0)-(H4), with $I = \{E\}$.
We also assume that


(Cl) The restriction of the flow $\phi^t$ to $W_E$ is clean.
(A flow $\phi^t$, associated with a $C^\infty$ vector field $X$
on a manifold $W$, is called clean if the two
following properties are satisfied:

• the set $\Gamma = \{ (t, w) \in \mathbb{R} \times W \mid \phi^t(w) = w \}$ is a
submanifold of $\mathbb{R} \times W$;

• at each point $\gamma = (t, w)$ of $\Gamma$, the tangent
space to $\Gamma$ is given by $T_\gamma \Gamma = \{ (\tau, v) \in \mathbb{R} \times
T_w W \mid \tau X(w) + (D\phi^t)(w) \cdot v = v \}$.)


Then there exists a sequence of distributions
$\gamma_\ell \in \mathcal{D}'(\mathbb{R})$ such that, for all $\varphi \in \mathcal{S}(\mathbb{R})$ with
compactly supported Fourier transform, we have the
asymptotic expansion in powers of $h$:

\[
\sum_{\lambda_j(h) \in [E - \epsilon_0/2,\, E + \epsilon_0/2]} \varphi\big( h^{-1} (\lambda_j(h) - E) \big)
 \sim \sum_{\ell \geq 0} \gamma_\ell(\hat{\varphi})\, h^{-n+1+\ell} \qquad [13]
\]

Moreover, the supports of the distributions $\gamma_\ell$ are
contained in the set of the periods of the periodic
trajectories of the flow contained in $W_E$.

Actually, the proof gives more information on the
structure of the different distributions. Let us just
write the formula for $\gamma_0$:

\[
\gamma_0 = (2\pi)^{-n} \Big( \int_{W_E} \frac{dx\, d\xi}{dp_0} \Big)\, \delta_0
\]

where $\delta_0$ is the Dirac measure at 0.


Clustering of Eigenvalues 


We shall mention one typical result due to
Chazarain-Helffer-Robert in this context, inspired by
previous results obtained for the Laplacian on compact
manifolds (see Semi-Classical Spectra and Closed
Orbits, Quantum Ergodicity and Mixing of Eigenfunctions
and references therein, including Chazarain,
Duistermaat-Guillemin, and Colin de Verdière).


Clustering means that the spectrum is concentrated
around a specific sequence tending to $\infty$. This was
observed in the case of the Laplacian on the sphere
$S^{n-1}$ by explicit computations. Here we assume
that, with $I = [E_1, E_2]$, the conditions (H0)-(H4) are
satisfied and that


• (H5) $[E_1, E_2]$ does not meet the set of critical
values of $p_0$.

• (H6) $\forall E \in [E_1 - \epsilon, E_2 + \epsilon]$, $W_E$ is connected.

• (H7) $\forall E \in [E_1 - \epsilon, E_2 + \epsilon]$, the Hamiltonian flow
associated with $p_0$ is periodic, with period $T(E) > 0$,
on $W_E$ (with $T(E)$ bounded).

• (H8) $\forall E \in [E_1 - \epsilon, E_2 + \epsilon]$, the subprincipal symbol $p_1$
vanishes on $W_E$.


Under these conditions, one first observes that, for
a suitable $C^\infty$ function $f$ defined in a neighborhood of
$[E_1, E_2]$, the period of the Hamiltonian flow associated
with $f(p_0)$ can be chosen constant and equal to $2\pi$.
Extending the function $f$ suitably, one can then state the
following result of Chazarain-Helffer-Robert:


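Theorem 8 There exist $h_0$ and $C$ such that, for
$0 < h \leq h_0$,

\[
\sigma(f(P(h))) \cap [E_1, E_2] \subset \bigcup_{k \in \mathbb{Z}} I_k(h)
\]

where

\[
I_k(h) = \Big[ -\frac{S}{2\pi} + \frac{\mu}{4} h + kh - Ch^2,\;
 -\frac{S}{2\pi} + \frac{\mu}{4} h + kh + Ch^2 \Big],
\qquad
S = \int_\gamma \xi\, dx - 2\pi E
\]

for some (hence for any) periodic trajectory $\gamma$ of period
$2\pi$, and $\mu$ is the Maslov index of this trajectory.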
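For the harmonic oscillator with symbol $(1/2)(\xi^2 + x^2)$, whose flow already has period $2\pi$ (so that $f$ can be taken to be the identity), the cluster centers $-S/(2\pi) + (\mu/4)h + kh$ can be compared with the exact eigenvalues $(k + 1/2)h$. The sketch below computes the action $\int_\gamma \xi\, dx$ by numerical quadrature and uses the value $\mu = 2$ for the Maslov index of a closed orbit in dimension 1; that value is a standard fact assumed here, not stated in the text, and the function names are illustrative.

```python
import math

def action_integral(energy, steps=20000):
    """Integrate xi dx over one period of the flow of p = (xi^2 + x^2)/2.

    The orbit of energy E is x(t) = r sin t, xi(t) = r cos t with r = sqrt(2 E)."""
    r = math.sqrt(2 * energy)
    dt = 2 * math.pi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt            # midpoint rule
        xi = r * math.cos(t)
        dx_dt = r * math.cos(t)       # derivative of x(t) = r sin t
        total += xi * dx_dt * dt
    return total

def cluster_center(k, h, energy, maslov_index=2):
    """Center of the interval I_k(h): -S/(2 pi) + (mu/4) h + k h."""
    S = action_integral(energy) - 2 * math.pi * energy
    return -S / (2 * math.pi) + maslov_index * h / 4 + k * h

if __name__ == "__main__":
    h = 0.01
    for k in range(4):
        print(k, cluster_center(k, h, energy=1.0), (k + 0.5) * h)
```

Here the action equals $2\pi E$ (the area enclosed by the orbit), so $S = 0$ and the centers reduce to $(k + 1/2)h$ for every choice of the energy, in agreement with the exact spectrum.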


Moreover, one can compute the multiplicity in each
of the intervals $I_k$. The property remains true
when the assumption is made only for one energy $E$
(e.g., Dozias proved this; see Rauch and Simon
(1997)), but then only in intervals $[E - ah, E + ah]$,
where $a$ can be large provided $h$ is small enough.


Remark 1 These results appear first in the context of
high energy for Laplacians on compact manifolds. After
illuminating contributions by physicists like
Balian-Bloch, the main ideas (see the presentation in
Semi-Classical Spectra and Closed Orbits) appear in the
works of Colin de Verdière, Chazarain,
Duistermaat-Guillemin (1975), and Weinstein (see also Hörmander
(1984) and Quantum Ergodicity and Mixing of
Eigenfunctions). The proof given in the semiclassical context
is actually more general (it contains the case of the
Laplacian on a Riemannian manifold) and shows that
the results are true for general Hamiltonians.


Remark 2 (the case of dimension 1). In this
particular case, the flow is periodic and the above
theorems give the localization of the spectrum predicted
by the Bohr-Sommerfeld relations, and the computation
of the multiplicity gives $m_k(h) = 1$ for $h$ small enough.
This point of view was developed by Helffer-Robert
(1987) (see Semi-Classical Spectra and Closed Orbits).


Similar properties have been extended to the case
of integrable systems by Colin de Verdière in the
high-energy context and in the semiclassical context
by Charbonnel and Ivrii (see Ivrii (1998), Dimassi
and Sjöstrand (1999), and references therein).


Remark 3 Another interesting application of semi- 
classical analysis concerns the Schnirelman theorem 
treating the case when the flow is ergodic. We refer 
the reader to Quantum Ergodicity and Mixing 
of Eigenfunctions for references and to Helffer- 
Martinez—Robert (see Rauch and Simon (1997) for 
references) for the specific statement for general 
Hamiltonians in semiclassical analysis. 


Conclusions and Suggestions 
for Further Reading 


In this brief survey we have tried to present some of the
foundational techniques appearing in “mathematical”
semiclassical analysis. Of course, this is very
limited, and semiclassical methods go far beyond the
verification of the correspondence principle. One can
refer to semiclassical analysis for many other problems
where the same analysis (with a small parameter $h$) is
relevant but where $h$ is no longer the Planck constant.
This could be a flux (Harper’s equation) or the inverse
of a flux, the inverse of a mass (Born-Oppenheimer
approximation), of an energy, or of a number of
particles. We have not developed this point of view here.

The books given in the bibliography will allow the
reader to discover other fields. The books by Robert
(1987), Helffer (1988), and Dimassi and Sjöstrand
(1999) present the basic statements of the theory. The
book by Martinez (2002) is more “microlocal” in spirit.
The lectures published in Rauch and Simon (1997) give
a rather good idea of the state of the art in the middle of the
1990s, and we also refer the reader to other articles in
this encyclopedia for the presentation of resonances
(see Resonances), spectral problems connected with
ergodicity (see Quantum Ergodicity and Mixing of
Eigenfunctions), Kolmogorov-Arnol’d-Moser theory
(see Normal Forms and Semi-Classical Approximation),
and trace formulas (see Semi-Classical Spectra
and Closed Orbits). The book by Ivrii (1998) gives the
most sophisticated theorems on the counting functions
(including boundaries, singularities, ...) but is
written only for specialists.


See also: Normal Forms and Semiclassical
Approximation; Quantum Ergodicity and Mixing of
Eigenfunctions; Resonances; Schrödinger Operators;
Semiclassical Spectra and Closed Orbits; Stability of
Matter; Stationary Phase Approximation.


Further Reading 


Combescure M, Ralston J, and Robert D (1999) A proof of the 
Gutzwiller semi-classical trace formula using coherent states 
decomposition. Communications in Mathematical Physics 
202(2): 463-480. 

Dimassi M and Sjöstrand J (1999) Spectral Asymptotics in the Semi-
Classical Limit, London Mathematical Society Lecture Note
Series, vol. 268. Cambridge: Cambridge University Press.

Duistermaat JJ (1973) Fourier Integral Operators. Courant Institute
of Mathematical Sciences. New York: New York University.

Duistermaat JJ and Guillemin VW (1975) The spectrum of 
positive elliptic operators and periodic bicharacteristics. 
Inventiones Mathematicae 29: 39-79. 

Fedoryuk MV and Maslov VP (1981) Semi-Classical Approxima- 
tion in Quantum Mechanics. Dordrecht: Reidel. 

Guillemin V and Sternberg S (1977) Geometric Asymptotics, 
American Mathematical Surveys, No. 14. Providence, RI: 
American Mathematical Society. 

Gutzwiller M (1971) Periodic orbits and classical quantization 
conditions. Journal of Mathematical Physics 12: 343-358. 


Hubbard Model 


H Tasaki, Gakushuin University, Tokyo, Japan 
© 2006 Elsevier Ltd. All rights reserved. 


Definitions 


The Hubbard model is a standard theoretical model 
for strongly interacting electrons in a solid. It is a 
minimum model which takes into account both 
quantum many-body effects and strong nonlinear 
interaction between electrons. Here we review rigor- 
ous results on the Hubbard model, placing main 
emphasis on magnetic properties of the ground states. 

Let the lattice $\Lambda$ be a finite set whose elements
$x, y, \ldots \in \Lambda$ are called sites. Physically speaking,
each site corresponds to an atomic site in a crystal.
The Hubbard model is based on the simplest
tight-binding description of electrons (Figure 1), where a
single state is associated with each site.

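For each $x \in \Lambda$ and $\sigma \in \{\uparrow, \downarrow\}$, we define the
creation and the annihilation operators $c^\dagger_{x,\sigma}$ and
$c_{x,\sigma}$, respectively, for an electron at site $x$ with
spin $\sigma$. ($A^\dagger$ is the adjoint or the Hermitian conjugate
of $A$.) These operators satisfy the canonical
anticommutation relations

\[
\{ c^\dagger_{x,\sigma}, c_{y,\tau} \} = \delta_{x,y}\, \delta_{\sigma,\tau},
\qquad
\{ c^\dagger_{x,\sigma}, c^\dagger_{y,\tau} \} = \{ c_{x,\sigma}, c_{y,\tau} \} = 0
\qquad [1]
\]

for any $x, y \in \Lambda$ and $\sigma, \tau = \uparrow, \downarrow$, where $\{A, B\} = AB +
BA$. The number operator is defined by

\[
n_{x,\sigma} = c^\dagger_{x,\sigma}\, c_{x,\sigma} \qquad [2]
\]

which has eigenvalues 0 and 1.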

Figure 1 A highly schematic figure which explains the philosophy
of the tight-binding description. (a) A single atom which has
multiple electrons in different orbits. (b) When atoms come
together to form a solid, electrons in the black orbits become
itinerant, while those in the light gray orbits are still localized at the
original atomic sites. Electrons in the gray orbits are mostly
localized around the atomic sites, but tunnel to nearby gray orbits
with nonnegligible probabilities. (c) We only consider electrons in
the gray orbits, which are expected to play essential roles in
determining various aspects of low-energy physics of the system.
(d) If the gray orbit is nondegenerate, we get a lattice model in
which electrons live on lattice sites and hop from one site to
another. In a simplified treatment of a metal, the black and the
gray orbits correspond to the 4s and the 3d bands, respectively.

The Hilbert space of the model is constructed as
follows. Let $\Phi_{\mathrm{vac}}$ be a normalized vector state which
satisfies $c_{x,\sigma} \Phi_{\mathrm{vac}} = 0$ for any $x \in \Lambda$ and $\sigma = \uparrow, \downarrow$.
Physically, $\Phi_{\mathrm{vac}}$ corresponds to a state where there
are no electrons in the system. For arbitrary subsets
$\Lambda_\uparrow, \Lambda_\downarrow \subset \Lambda$, we define

\[
\Phi_{\Lambda_\uparrow, \Lambda_\downarrow} =
 \Big( \prod_{x \in \Lambda_\uparrow} c^\dagger_{x,\uparrow} \Big)
 \Big( \prod_{x \in \Lambda_\downarrow} c^\dagger_{x,\downarrow} \Big) \Phi_{\mathrm{vac}} \qquad [3]
\]

in which sites in $\Lambda_\uparrow$ are occupied by up-spin
electrons and sites in $\Lambda_\downarrow$ by down-spin electrons.
We fix the electron number $N_e$, which is an integer
satisfying $0 \leq N_e \leq 2|\Lambda|$. (We denote by $|S|$ the
number of elements in a set $S$.) The Hilbert space
for the system with $N_e$ electrons is spanned by the
basis states [3] with all subsets $\Lambda_\uparrow$ and $\Lambda_\downarrow$ such that
$|\Lambda_\uparrow| + |\Lambda_\downarrow| = N_e$.
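The basis states [3] can be enumerated directly, which also gives the dimension of the Hilbert space for a given electron number. The sketch below labels the sites by the integers 0, ..., |Λ| - 1 and represents a basis state by the pair of subsets occupied by up-spin and down-spin electrons; the labeling and the function name are illustrative assumptions. The number of such pairs is the binomial coefficient of $2|\Lambda|$ over $N_e$.

```python
from itertools import combinations
from math import comb

def basis_states(num_sites, num_electrons):
    """List all pairs (up_sites, down_sites) of subsets of {0, ..., num_sites - 1}
    with len(up_sites) + len(down_sites) == num_electrons."""
    sites = range(num_sites)
    states = []
    lowest_up = max(0, num_electrons - num_sites)
    highest_up = min(num_sites, num_electrons)
    for n_up in range(lowest_up, highest_up + 1):
        n_down = num_electrons - n_up
        for up in combinations(sites, n_up):
            for down in combinations(sites, n_down):
                states.append((frozenset(up), frozenset(down)))
    return states

if __name__ == "__main__":
    num_sites, num_electrons = 4, 3
    states = basis_states(num_sites, num_electrons)
    print(len(states), comb(2 * num_sites, num_electrons))   # the two numbers agree
```

The dimension grows exponentially with the number of sites at fixed filling, which is one reason why rigorous results of the kind reviewed here are valuable.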
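We define the total spin operators $S_{\mathrm{tot}} = (S^{(1)}_{\mathrm{tot}}, S^{(2)}_{\mathrm{tot}}, S^{(3)}_{\mathrm{tot}})$ by

\[
S^{(\alpha)}_{\mathrm{tot}} = \frac{1}{2} \sum_{x \in \Lambda} \sum_{\sigma, \tau = \uparrow, \downarrow}
 c^\dagger_{x,\sigma}\, (p^{(\alpha)})_{\sigma,\tau}\, c_{x,\tau} \qquad [4]
\]

for $\alpha = 1, 2,$ and 3. Here $p^{(\alpha)}$ are the Pauli matrices

\[
p^{(1)} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
p^{(2)} = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad
p^{(3)} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \qquad [5]
\]

The operators $S_{\mathrm{tot}}$ are the generators of global SU(2)
rotations of the spin space. As usual, we denote the
eigenvalue of $(S_{\mathrm{tot}})^2$ as $S_{\mathrm{tot}}(S_{\mathrm{tot}} + 1)$. The maximum
possible value of $S_{\mathrm{tot}}$ is $S_{\max} = N_e/2$ when $N_e \leq |\Lambda|$,
and $S_{\max} = |\Lambda| - (N_e/2)$ when $N_e \geq |\Lambda|$.

The most general Hamiltonian of the Hubbard
model is

\[
H = -\sum_{x, y \in \Lambda} \sum_{\sigma = \uparrow, \downarrow} t_{x,y}\, c^\dagger_{x,\sigma} c_{y,\sigma}
 + \sum_{x \in \Lambda} U_x\, n_{x,\uparrow} n_{x,\downarrow} \qquad [6]
\]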
Here the first term describes quantum-mechanical
motion of electrons which hop around the lattice
according to the amplitude $t_{x,y} = t_{y,x} \in \mathbb{R}$. Usually,
$t_{x,y}$ is nonnegligible only when the two sites $x$ and $y$
are close to each other. The second term represents
nonlinear interaction between electrons. There is an
increase in energy by $U_x \in \mathbb{R}$ when the site $x$ is
occupied by both an up-spin electron and a down-spin
electron. We usually set $U_x > 0$ to mimic the (screened)
Coulomb interaction between electrons.

The Hamiltonian $H$ commutes with the total spin
operators $S^{(\alpha)}_{\mathrm{tot}}$ for $\alpha = 1, 2,$ and 3. One can thus
investigate simultaneous eigenstates of $(S_{\mathrm{tot}})^2$ and $H$.
For $S_{\mathrm{tot}}$ in the allowed range, we denote by $E_{\min}(S_{\mathrm{tot}})$
the lowest possible energy among the states which
satisfy $(S_{\mathrm{tot}})^2 \Phi = S_{\mathrm{tot}}(S_{\mathrm{tot}} + 1) \Phi$.


Wave-Particle Dualism in the 
Hubbard Model 


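It is illuminating to examine the eigenstates of the
Hamiltonian [6] for the following two special cases.

First suppose that one has $U_x = 0$ for all $x \in \Lambda$,
that is, the model has no interactions. For
$i = 1, 2, \ldots, |\Lambda|$, let $\psi^{(i)} = (\psi^{(i)}_x)_{x \in \Lambda} \in \mathbb{C}^{|\Lambda|}$ be the
single-electron eigenstate, which is the solution of
the Schrödinger equation

\[
-\sum_{y \in \Lambda} t_{x,y}\, \psi^{(i)}_y = \epsilon_i\, \psi^{(i)}_x \quad \text{for any } x \in \Lambda \qquad [7]
\]

We order the energy eigenvalues as $\epsilon_i \leq \epsilon_{i+1}$. By
defining the corresponding creation operator by
$a^\dagger_{i,\sigma} = \sum_{x \in \Lambda} \psi^{(i)}_x c^\dagger_{x,\sigma}$, we see that, for any subsets
$I_\uparrow, I_\downarrow \subset \{1, 2, \ldots, |\Lambda|\}$ such that $|I_\uparrow| + |I_\downarrow| = N_e$, the
state

\[
\Phi_{I_\uparrow, I_\downarrow} =
 \Big( \prod_{i \in I_\uparrow} a^\dagger_{i,\uparrow} \Big)
 \Big( \prod_{i \in I_\downarrow} a^\dagger_{i,\downarrow} \Big) \Phi_{\mathrm{vac}} \qquad [8]
\]

is an eigenstate of $H$ (with $U_x = 0$) with the
eigenvalue $E = \sum_{i \in I_\uparrow} \epsilon_i + \sum_{i \in I_\downarrow} \epsilon_i$. The ground states
are obtained by choosing $I_\uparrow, I_\downarrow$ which minimize $E$. In
particular, when $N_e$ is even and the single-electron
eigenenergies $\epsilon_i$ are nondegenerate, the ground state
is unique and written as

\[
\Phi_{\mathrm{GS}} = \Big( \prod_{i=1}^{N_e/2} a^\dagger_{i,\uparrow} a^\dagger_{i,\downarrow} \Big) \Phi_{\mathrm{vac}} \qquad [9]
\]

The fact that this ground state has the minimum
possible spin $S_{\mathrm{tot}} = 0$ is known as Pauli
paramagnetism.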
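The construction [7]-[9] can be made concrete in the simplest case. The sketch below assumes an open chain of N sites with uniform nearest-neighbor hopping $t_{x,x+1} = t > 0$ (a hypothetical choice of lattice made here for illustration), for which the single-electron problem has the well-known solutions $\psi^{(i)}_x = \sin(i\pi x/(N+1))$ with $\epsilon_i = -2t\cos(i\pi/(N+1))$. It checks equation [7] site by site and evaluates the energy of the state [9]. The function names are illustrative.

```python
import math

def hopping(x, y, t=1.0):
    """Assumed hopping amplitude t_{x,y}: uniform nearest-neighbor hopping, open chain."""
    return t if abs(x - y) == 1 else 0.0

def eigenpair(i, num_sites, t=1.0):
    """Closed-form eigenvalue and (unnormalized) eigenvector, for i = 1, ..., num_sites."""
    energy = -2.0 * t * math.cos(i * math.pi / (num_sites + 1))
    psi = [math.sin(i * math.pi * x / (num_sites + 1)) for x in range(1, num_sites + 1)]
    return energy, psi

def residual(i, num_sites, t=1.0):
    """Largest violation of -sum_y t_{x,y} psi_y = eps_i psi_x over all sites x."""
    energy, psi = eigenpair(i, num_sites, t)
    worst = 0.0
    for x in range(1, num_sites + 1):
        lhs = -sum(hopping(x, y, t) * psi[y - 1] for y in range(1, num_sites + 1))
        worst = max(worst, abs(lhs - energy * psi[x - 1]))
    return worst

def ground_state_energy(num_sites, num_electrons, t=1.0):
    """Energy of the state [9]: the lowest num_electrons / 2 levels, each doubly occupied."""
    if num_electrons % 2 != 0 or not 0 <= num_electrons <= 2 * num_sites:
        raise ValueError("expects an even electron number between 0 and 2 * num_sites")
    energies = sorted(eigenpair(i, num_sites, t)[0] for i in range(1, num_sites + 1))
    return 2.0 * sum(energies[: num_electrons // 2])

if __name__ == "__main__":
    print(max(residual(i, 6) for i in range(1, 7)))   # rounding-level residual
    print(ground_state_energy(6, 6))                  # half-filled six-site chain
```

The levels of this chain are nondegenerate, so for even $N_e$ the ground state of the noninteracting model is unique and has $S_{\mathrm{tot}} = 0$, as stated above.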

We have seen that the Hamiltonian $H$ with $U_x = 0$
can be diagonalized by using the single-electron
eigenstates $\psi^{(i)}$. When $(t_{x,y})$ has a translation
invariance, each $\psi^{(i)}$ behaves as a “wave.” We
can say that the noninteracting models can
be understood in terms of the wave picture of
electrons.

Next suppose that $t_{x,y} = 0$ for all $x, y \in \Lambda$, that is,
the electrons do not hop. Then the Hamiltonian [6]
is readily diagonalized in terms of the basis states [3],
where the corresponding eigenvalue is simply
$E = \sum_{x \in \Lambda_\uparrow \cap \Lambda_\downarrow} U_x$. In this case, the model is best
understood in the particle picture of electrons.

We thus see that the wave-particle dualism
manifests itself in the Hubbard model in an essential
manner. When both the first and the second terms in
the Hamiltonian [6] are present, a “competition” takes
place between the wave-like nature and the particle-like
nature of electrons. The competition generates
rich nontrivial phenomena including antiferromagnetism,
ferromagnetism, metal-insulator transition, and
(probably) superconductivity. To investigate these
phenomena is a major motivation in the study of
the Hubbard model.


One-Dimensional Model 


The Hubbard model defined on a simple one-dimensional
lattice is easier to study. But it does
not exhibit truly nontrivial behavior, as the following
classical theorem of Lieb and Mattis suggests.


Theorem 1 Consider a Hubbard model on a one-dimensional
lattice $\Lambda = \{1, 2, \ldots, N\}$ with open boundary
conditions. We assume that $t_{x,y} \neq 0$ if $|x - y| = 1$,
and $t_{x,y} = 0$ if $|x - y| > 1$. $t_{x,x} \in \mathbb{R}$ and $U_x \in \mathbb{R}$ are
arbitrary. Then one has $E_{\min}(S_{\mathrm{tot}}) < E_{\min}(S_{\mathrm{tot}} + 1)$ for
any $S_{\mathrm{tot}} = 0, 1, \ldots, S_{\max} - 1$ (or $S_{\mathrm{tot}} = 1/2, 3/2, \ldots,
S_{\max} - 1$).


As a consequence, one finds that the ground states
always have $S_{\mathrm{tot}} = 0$ (or $S_{\mathrm{tot}} = 1/2$) as in the
noninteracting models.

The translation-invariant model with $t_{x,y} = t$ if
$|x - y| = 1$, $t_{x,y} = 0$ if $|x - y| \neq 1$, and $U_x = U$ can be
solved by using the Bethe ansatz, as was first shown
by Lieb and Wu. It was found that the model is
insulating for all $U > 0$, and there is no metal-insulator
transition. (A metal-insulator transition is
expected to take place in higher dimensions.) Earlier
works on the Bethe ansatz were based on the
assumption that the Bethe ansatz equation gives
the true ground states. Recently, the existence and
the uniqueness of the Bethe ansatz solution for the
ground state of a finite system was proved by
Goldbaum.


Half-Filled Systems 


The system in which the electron number $N_e$ is
identical to the number of sites $|\Lambda|$ is said to be
half-filled. Many (but not all) physical systems can
be modeled as a half-filled Hubbard model.

Based on a heuristic perturbation theory, low-energy
properties of half-filled models with large $U$ are
expected to be similar to those of Heisenberg
antiferromagnetic spin systems. There is no electrical
conduction, and the spin degrees of freedom may
show antiferromagnetic long-range order in the ground
states.

This expectation is partly justified by the following
theorem due to Lieb. A Hubbard model is said
to be bipartite if the lattice $\Lambda$ can be decomposed
into a disjoint union of two sublattices as $\Lambda = A \cup B$
(with $A \cap B = \emptyset$), and it holds that $t_{x,y} = 0$ whenever
$x, y \in A$ or $x, y \in B$. In other words, only hopping
between different sublattices is allowed.


Theorem 2 Consider a bipartite Hubbard model. We
assume $|\Lambda|$ is even, and the whole $\Lambda$ is connected
through nonvanishing $t_{x,y}$. We also assume $U_x = U > 0$
for any $x \in \Lambda$. Then the ground states of the model
are nondegenerate apart from the trivial spin
degeneracy, and have total spin $S_{\mathrm{tot}} = \big| |A| - |B| \big| / 2$.
It also holds that $E_{\min}(S_{\mathrm{tot}}) < E_{\min}(S_{\mathrm{tot}} + 1)$ for any
$S_{\mathrm{tot}} \geq \big| |A| - |B| \big| / 2$.


The theorem implies that, as far as the total spin is
concerned, the half-filled Hubbard model behaves
exactly as the Heisenberg antiferromagnet. But the
existence of antiferromagnetic ordering has not been
proved in any version of the Hubbard model.

To see another implication of Theorem 2, take the
so-called CuO lattice in Figure 2. Here the A and B
sublattices consist of black and white sites, respectively.
One has $|A| = |\Lambda|/3$ and $|B| = 2|\Lambda|/3$. Then
the theorem implies that the ground state of the
corresponding Hubbard model has total spin
$S_{\mathrm{tot}} = \big| |A| - |B| \big| / 2 = |\Lambda|/6$. Since the total spin
magnetic moment of the system is proportional to the
number of sites $|\Lambda|$, we conclude that the model
exhibits ferrimagnetism, a weaker version of
ferromagnetism.

Figure 2 An example (the so-called CuO lattice) of a bipartite
lattice in which the numbers of sites in two sublattices are
different. Lieb’s theorem implies that the half-filled Hubbard
model defined on this lattice exhibits ferrimagnetism.
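The sublattice count for the CuO lattice can be reproduced with a small graph computation. The sketch below builds a CuO-type lattice under an assumed geometry (periodic boundary conditions, one corner site and two edge-center sites per unit cell, hopping only between a corner site and its four adjacent edge-center sites), finds the two sublattices by a breadth-first two-coloring, and evaluates the total spin predicted by Theorem 2. The geometry, the coordinates, and the function names are illustrative assumptions.

```python
from collections import deque
from fractions import Fraction

def cuo_lattice(cells):
    """Adjacency sets of a CuO-type lattice with cells x cells unit cells and
    periodic boundary conditions (assumed geometry, see the lead-in)."""
    size = 2 * cells
    neighbors = {}
    for i in range(cells):
        for j in range(cells):
            corner = (2 * i, 2 * j)
            edge_centers = [((2 * i + 1) % size, 2 * j), ((2 * i - 1) % size, 2 * j),
                            (2 * i, (2 * j + 1) % size), (2 * i, (2 * j - 1) % size)]
            for site in edge_centers:
                neighbors.setdefault(corner, set()).add(site)
                neighbors.setdefault(site, set()).add(corner)
    return neighbors

def two_coloring(neighbors):
    """Return the two sublattices if the graph is connected and bipartite;
    raise ValueError otherwise."""
    start = next(iter(neighbors))
    color = {start: 0}
    queue = deque([start])
    while queue:
        site = queue.popleft()
        for other in neighbors[site]:
            if other not in color:
                color[other] = 1 - color[site]
                queue.append(other)
            elif color[other] == color[site]:
                raise ValueError("lattice is not bipartite")
    if len(color) != len(neighbors):
        raise ValueError("lattice is not connected")
    part_a = {site for site, c in color.items() if c == 0}
    return part_a, set(color) - part_a

def lieb_total_spin(part_a, part_b):
    """Total spin | |A| - |B| | / 2 given by Theorem 2 at half filling."""
    return Fraction(abs(len(part_a) - len(part_b)), 2)

if __name__ == "__main__":
    part_a, part_b = two_coloring(cuo_lattice(2))
    num_sites = len(part_a) + len(part_b)
    print(num_sites, lieb_total_spin(part_a, part_b), Fraction(num_sites, 6))
```

An even number of unit cells per direction is used so that the number of sites is even, as Theorem 2 requires; the computed spin then equals one sixth of the number of sites.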

Another interesting result for the half-filled
models is the following uniform density theorem
by Lieb, Loss, and McCann.


Theorem 3 Consider a bipartite Hubbard model.
$t_{x,y} \in \mathbb{R}$ and $U_x \in \mathbb{R}$ are arbitrary. Suppose that
the ground states are $n$-fold degenerate, and let
$\Phi^{(\ell)}_{\mathrm{GS}}$ ($\ell = 1, \ldots, n$) be mutually orthogonal
normalized ground states. Define the correlation function
by $\rho(x, y) = n^{-1} \sum_{\ell=1}^{n} \big( \Phi^{(\ell)}_{\mathrm{GS}},
(c^\dagger_{x,\uparrow} c_{y,\uparrow} + c^\dagger_{x,\downarrow} c_{y,\downarrow})\, \Phi^{(\ell)}_{\mathrm{GS}} \big)$.
($(\cdot, \cdot)$ is the inner product.) Then for any $x, y \in A$ or
$x, y \in B$ one has $\rho(x, y) = \delta_{x,y}$.


It is interesting that the density $\rho(x, x)$ in the
ground state is always unity though the hopping
matrix and interactions can be highly nonuniform.


Ferromagnetism 


Ferromagnetism is an interesting phenomenon in 
which the majority of the spins in the system align in 
the same direction. One of the original motivations 
to study the Hubbard model was to understand the 
origin of ferromagnetism in an idealized situation. 
Let us recall that neither the hopping term nor the 
interaction term in the Hamiltonian [6] favors 
ferromagnetism (or any other magnetic order). One 
must deal with the interplay between the two terms 
to have ferromagnetism. Here we review three 
rigorous examples of saturated ferromagnetism in 
the Hubbard model. Saturated ferromagnetism is the
strongest form of ferromagnetism where the ground
state has $S_{\mathrm{tot}} = S_{\max}$.

The first example is due to Nagaoka and 
Thouless. 


Theorem 4 Take an arbitrary finite lattice $\Lambda$, and
let $N_e = |\Lambda| - 1$. Assume that $t_{x,y} \leq 0$ for any $x \neq y$,
and let $U_x \to \infty$ for all $x \in \Lambda$. (Taking the limit
$U_x \to \infty$ is equivalent to inhibiting $x$ from being
occupied by two electrons.) Then among the ground
states of the model, there exist states with total spin
$S_{\mathrm{tot}} = S_{\max} (= N_e/2)$. If the system further satisfies the
connectivity condition (see below), then the ground
states have $S_{\mathrm{tot}} = S_{\max} (= N_e/2)$ and are nondegenerate
apart from the trivial spin degeneracy.


The connectivity condition is a simple condition
which holds in most of the lattices in two or higher
dimensions, including the square lattice, the triangular
lattice, or the cubic lattice. To be precise, the
condition requires that “by starting from any
electron configuration on the lattice and by moving
around the hole along nonvanishing $t_{x,y}$, one can get
any other electron configuration.”

Figure 3 The Hubbard model on the kagomé lattice is a typical
example which exhibits flat-band ferromagnetism.

The requirements that $U_x \to \infty$ and $N_e = |\Lambda| - 1$
are indeed rather pathological. We still do not know
if the ferromagnetism extends to more realistic
situations. Heuristic studies indicate that the issue
is highly delicate.

A completely different class of rigorous examples
of ferromagnetism was found by Mielke. Take, for
example, the kagomé lattice of Figure 3, and define
a Hubbard model by setting $t_{x,y} = t < 0$ when $x$ and
$y$ are neighboring, $t_{x,y} = 0$ otherwise, and $U_x = U > 0$
for any $x \in \Lambda$. Then the corresponding single-electron
Schrödinger equation [7] has the peculiar feature that
its ground states are $\{(|\Lambda|/3) + 1\}$-fold degenerate.
This huge degeneracy corresponds to the fact that the
lowest-energy band of the model is completely
dispersionless (or flat).


Theorem 5 Consider the Hubbard model on the
kagomé lattice with $N_e = (|\Lambda|/3) + 1$. For any $U > 0$,
the ground states have $S_{\mathrm{tot}} = S_{\max} (= N_e/2)$ and are
nondegenerate apart from the trivial spin degeneracy.


There are similar examples in higher dimensions.
Ferromagnetism observed in these models is called
flat-band ferromagnetism.

The above examples of ferromagnetism have
either a singular interaction ($U_x \to \infty$) or a singular
dispersion relation (highly degenerate single-electron
ground states). Tasaki found a class of Hubbard
models which are free from such singularities, and
exhibit ferromagnetism.

For simplicity, we concentrate on the simplest 
model in one dimension. There are similar examples 
in higher dimensions. Take the one-dimensional 
lattice A={1,2,...,N} with N sites (where N is an 
even integer), and impose a periodic boundary 
condition by identifying the site N + 1 with the site 1. 
The hopping matrix is defined by setting tx, x+1 = 
t41, x = —U for any x CA, ty x42 =ty42,x = —t for 
even By hey = leaa S for odd x, and i, —0 
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Figure 4 An example of nonsingular Hubbard model which 
exhibits saturated ferromagnetism. 


otherwise. Here t >0 and s>O are independent 
parameters, but the parameter ft’ is determined as 
ti = J/2(t +s). 

As can be seen from Figure 4, electrons are
allowed to hop to next-nearest neighbors. Thus,
Theorem 1 does not apply. The single-electron
ground states are not degenerate unless s = 0. We
set U_x = U > 0 for any x ∈ Λ, and fix the electron
number as N_e = N/2. In terms of filling factor, this
corresponds to the quarter filling.
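
The statement about the single-electron ground states can be illustrated on a short chain. The sketch below builds the hopping matrix with nearest-neighbor amplitude −t' = −√2 (t + s), next-nearest-neighbor amplitude −t between even sites and s between odd sites, and diagonalizes the single-electron problem. It assumes, as a convention not restated in this section, that the single-electron energy matrix is the negative of the hopping matrix; the function name and the chosen parameter values are illustrative only.

```python
import numpy as np

def single_electron_levels(N, t, s):
    """Sorted single-electron levels of the periodic chain with N sites,
    ASSUMING the energy matrix h satisfies h[x, y] = -t_{x,y}."""
    t_prime = np.sqrt(2.0) * (t + s)
    h = np.zeros((N, N))
    for x in range(1, N + 1):          # sites are labelled 1..N as in the text
        i = x - 1                      # 0-based index of site x
        j1 = x % N                     # index of site x+1 (periodic)
        j2 = (x + 1) % N               # index of site x+2 (periodic)
        h[i, j1] = h[j1, i] = t_prime              # minus (-t')
        h[i, j2] = h[j2, i] = t if x % 2 == 0 else -s   # minus (-t) or minus (s)
    return np.linalg.eigvalsh(h)

def lowest_level_degeneracy(N, t, s, tol=1e-8):
    levels = single_electron_levels(N, t, s)
    return int(np.sum(levels - levels[0] < tol))

if __name__ == "__main__":
    print(lowest_level_degeneracy(12, t=1.0, s=0.5))  # generic s > 0
    print(lowest_level_degeneracy(12, t=1.0, s=0.0))  # flat lowest band
```

With these assumptions the lowest level is single for s > 0, while for s = 0 the lowest band becomes flat and the lowest level is N/2-fold degenerate, in line with the remark above.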


Theorem 6 Suppose that the two dimensionless
parameters t/s and U/s are sufficiently large. Then
the ground states have S_tot = S_max (= N/4) and are
nondegenerate apart from the trivial spin degeneracy.


The theorem is valid, for example, when t/s > 4.5
if U/s = 50, and t/s > 2.6 if U/s = 100. It is crucial
that the statement of the theorem is valid only when
the interaction U is sufficiently large. In the same
model, it is also proved that low-lying excitations
above the ground state have the normal dispersion
relation of a spin-wave excitation.

We would like to point out that one can learn
more details about the Hubbard model and further
rigorous results from the review articles (Lieb 1995,
Tasaki 1998a, Tasaki 1998b). One can also find
references for most of the results discussed here in
these review articles, especially in Lieb (1995).

As for the latest results which are not included in
the above reviews, see recent publications, for
example, Lieb and Wu (2003), Tasaki (2003), and
Goldbaum (2005), and references therein.
Introduction


Billiards are a class of dynamical systems with 
appealingly simple description. A point particle 
moves with constant velocity in a box of arbitrary 
dimension (the billiard table) and reflects elastically
from the boundary (the component of velocity
perpendicular to the boundary is reversed and the
parallel component is preserved). Mathematically,
it is a class of Hamiltonian systems with collisions 
defined by symplectic maps on the boundary of the 
phase space. The billiard dynamics defines a
one-parameter group of maps Φ^t of the phase space
which preserve the Lebesgue measure, and are in 
general only measurable due to discontinuities. The 
boundaries of the box are made up of pieces, 
concave, convex, and flat. Discontinuities occur at 
the orbits tangent to concave pieces of the 
boundary of the box. The orbits hitting two
adjacent pieces (“corners”) cannot be naturally
continued, which is another source of discontinuities.
These singularities are not too severe, so that
the flow has well-defined Lyapunov exponents and 
Pesin structural theory is applicable (Katok and 
Strelcyn 1986). A billiard system is called hyper- 
bolic if it has nonzero Lyapunov exponents on a 
subset of positive Lebesgue measure, and comple- 
tely hyperbolic if all of its Lyapunov exponents are 
nonzero almost everywhere, except for one zero 
exponent in the direction of the flow. 

Billiards in smooth strictly convex domains have 
no singularities, but no such examples are known to 
be hyperbolic. 

In general, billiards exhibit mixed behavior just 
like other Hamiltonian systems; there are invariant
tori intertwined with a “chaotic sea.” In hyperbolic
billiards, stable behavior is excluded by the choice of 
the pieces in the boundary of the box, arbitrary 
concave pieces and special convex ones, and their 
particular placement. Thus, hyperbolicity is achieved 
by design, as in optical instruments. 

It was established by Turaev and Rom-Kedar 
(1998) that complete hyperbolicity may be lost 
under generic singular perturbation of the billiard 
system to a smooth Hamiltonian system. 

Hyperbolicity is the universal mechanism for 
random behavior in deterministic dynamical sys- 
tems. Under suitable additional assumptions, it leads 
to ergodicity, mixing, K-property, Bernoulli prop- 
erty, decay of correlations, central-limit theorem, 
and other stochastic properties. Hyperbolic billiards 
provide a natural class of examples for which these 
properties were studied. In this article we restrict 
ourselves to hyperbolicity itself. 

The most prominent example of a hyperbolic 
billiard is the gas of hard spheres. This way of 
looking at the system was developed in the 
groundbreaking papers of Sinai (see Chernov and 
Sinai (1987) for an exhaustive list of references). 
The collection of papers (Szasz 2000) contains 
more up-to-date information. Another source on 
hyperbolic billiards is the book by Chernov and 
Markarian (2005). The books by Kozlov and 
Treschev (1990), and by Tabachnikov (1995) 
provide broad surveys of billiards from different 
perspectives. 


Jacobi Fields and Monotonicity 


The key to understanding hyperbolicity in billiards 
lies in two essentially equivalent descriptions of 
infinitesimal families of trajectories. The basic 
notion is that of a Jacobi field along a billiard 
trajectory. Let γ(t,u) be a family of billiard
trajectories, where t is time and u is a parameter,
|u| < ε. A Jacobi field along γ(t,0) is defined by
J(t) = ∂γ/∂u |_{u=0}.

Jacobi fields form a finite-dimensional vector 
space which can be identified with the tangent to 
the phase space at points along the trajectory. They 
contain the same information as the derivatives of 
the billiard flow DΦ^t. In particular, the Lyapunov
exponents are the exponential rates of growth of 
Jacobi fields. 

Jacobi fields split naturally into parallel and 
perpendicular components to the trajectory, each of 
them a Jacobi field in its own right. The parallel 
Jacobi field carries the zero Lyapunov exponent. In 
the rest we discuss only the perpendicular Jacobi 
fields. Between collisions the Jacobi fields satisfy the
differential equation J'' = 0, hence J(t) = J(0) + tJ'(0).
At a collision a Jacobi field undergoes a change by
the map

J(t_c^+) = R J(t_c^-)
J'(t_c^+) = R J'(t_c^-) + P*KP J(t_c^+)        [1]

where J(t_c^-) and J(t_c^+) are Jacobi fields immediately
before and after collision, K is the shape operator of
the piece of the boundary (K = ∇n, where n is the inside
unit normal to the boundary), and P is the
projection along the velocity vector from the hyperplane
perpendicular to the orbit to the hyperplane
tangent to the boundary. Finally, R is the orthogonal
reflection in the hyperplane tangent to the
boundary.

Perpendicular Jacobi fields at a point of a 
trajectory can be identified with a subspace of the 
tangent to the phase space, the subspace perpendi- 
cular to the phase trajectory. To measure the 
growth/decay of Jacobi fields, we introduce a
quadratic form on the tangent spaces, or equivalently
on Jacobi fields, Q(J, J') = ⟨J, J'⟩. Evaluation
of Q on a Jacobi field is a function of time Q(t).
Between collisions we have Q(t_2) ≥ Q(t_1) for t_2 ≥ t_1
(monotonicity). By [1] the monotonicity at the
collisions, that is, Q(t_c^+) ≥ Q(t_c^-), is equivalent to
the positive semidefiniteness of the shape operator,
K ≥ 0; it holds for concave pieces of the boundary.
If K > 0 at a point of collision with the boundary,
then for (J, J') ≠ (0,0), we have Q(t_2) > Q(t_1) (strict
monotonicity), assuming that the collision occurred
between time t_1 and t_2.
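
The monotonicity between collisions follows directly from the free-flight formula for Jacobi fields: along J(t) = J(0) + tJ'(0) the form Q(t) = ⟨J(t), J'(t)⟩ equals Q(0) + t |J'(0)|², which cannot decrease. A minimal numerical sketch of this step is given below; the function name and the random test data are illustrative assumptions, and collisions are deliberately left out.

```python
import numpy as np

def q_along_free_flight(J0, dJ0, t):
    """Value of Q = <J, J'> at time t for a perpendicular Jacobi field
    that starts at (J0, dJ0) and evolves without collisions."""
    J_t = J0 + t * dJ0      # J'' = 0, so J is affine in t and J' is constant
    return float(np.dot(J_t, dJ0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    J0, dJ0 = rng.normal(size=3), rng.normal(size=3)
    for t in np.linspace(0.0, 2.0, 5):
        print(t, q_along_free_flight(J0, dJ0, t))
```

The increments of the printed values are proportional to the elapsed time, with slope |J'(0)|², so Q is constant only for fields with J' = 0.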

In billiards with concave pieces of the boundary,
where K ≥ 0, K ≯ 0, strict monotonicity may still
occur after sufficiently many reflections (eventual 
strict monotonicity, or ESM). Such billiards are 
called semidispersing, and the gas of hard spheres is 
an example. 
The role of monotonicity is revealed in the
following:


Theorem 1 (Wojtkowski 1991). If a system is
eventually strictly monotone (ESM), except on a set
of orbits of zero measure, then it is completely
hyperbolic.


The theorem applies to billiard systems. It can be 
generalized and applied to other systems, not even 
Hamiltonian (see Wojtkowski (2001) for precise 
formulations, references and the history of this 
idea). 

The difficulty in applying the above theorem to 
the gas of hard spheres lies in the gap between 
monotonicity and strict monotonicity. There are 
many orbits on which strict monotonicity is never 
attained (parabolic orbits). Establishing that the 
family of parabolic orbits has measure zero (or 
better yet codimension 2) is a formidable task. It 
was brought to conclusion in the work of Simanyi 
(2002). 


Wave Fronts and Monotonicity 


There is a geometric formulation of monotonicity 
(which historically preceded the one given above). 
Let us consider a local wave front, that is, a local
hypersurface W(0) perpendicular to a trajectory γ(t)
at t = 0. Let us consider further all billiard trajectories
perpendicular to W(0). The points on these
trajectories at time t form a local hypersurface W(t)
perpendicular again to the trajectory (warning: at
exceptional moments of time the wave front W(t)
may be singular). Infinitesimally, wave fronts are
described by the shape operator U = ∇n, where n is
the unit normal field. U is a symmetric operator on
the hyperplane tangent to the wave front (and
perpendicular to the trajectory γ(t)). The evolution
of infinitesimal wave fronts is described by the
formulas

U(t) = (tI + U(0)^{-1})^{-1}            without collisions
U(t_c^+) = R U(t_c^-) R + P*KP          at a collision        [2]

It follows that between collisions a wave front 
that is initially convex (i.e., diverging, or U > 0) will 
stay convex. Moreover, any wave front after a 
sufficiently long run without collisions will become 
convex (after which the normal curvatures of the 
wave front will be decaying). The second part of [2] 
shows that after a reflection in a strictly concave 
boundary a convex wave front becomes strictly 
convex (and its normal curvatures increase). These 
properties are equivalent to (strict) monotonicity as 
formulated above. Indeed, in the language of Jacobi
fields an infinitesimal wave front represents a linear
subspace in the space of perpendicular Jacobi fields, 
that is, the tangent space. (Furthermore, it is a 
Lagrangian subspace with respect to the standard 
symplectic form.) We can follow individual Jacobi 
fields or whole subspaces of them. It explains the 
parallel of [1] and [2]. The form Q allows the 
introduction of positive and negative Jacobi fields 
and positive and negative Lagrangian subspaces. An 
infinitesimal convex wave front represents a positive 
Lagrangian subspace. Monotonicity is equivalent to 
the property that a positive Lagrangian subspace 
stays positive under the dynamics. It may appear
that there is a loss of information in formulas [2]
compared to [1], but actually they are equivalent
due to the symplectic nature of the dynamics
(Wojtkowski 2001).
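
The free-flight part of the wave front evolution can be cross-checked against the Jacobi field picture. If the columns of a matrix J(t) are perpendicular Jacobi fields with J(0) = I and J'(0) = U(0), then U(t) = J'(t) J(t)^{-1}, which should agree with (tI + U(0)^{-1})^{-1}. The sketch below verifies this numerically under the assumption that U(0) is invertible; the function names and sample matrices are illustrative only, and collisions are not modelled.

```python
import numpy as np

def front_after_free_flight(U0, t):
    """Shape operator of the wave front after time t without collisions."""
    identity = np.eye(U0.shape[0])
    return np.linalg.inv(t * identity + np.linalg.inv(U0))

def front_from_jacobi_fields(U0, t):
    """Same operator computed as J'(t) J(t)^{-1} with J(0) = I, J'(0) = U0."""
    identity = np.eye(U0.shape[0])
    J_t = identity + t * U0          # J(t) = J(0) + t J'(0)
    return U0 @ np.linalg.inv(J_t)   # J'(t) = J'(0) = U0

if __name__ == "__main__":
    convex = np.array([[2.0, 0.5], [0.5, 1.0]])   # diverging front, U > 0
    for t in (0.5, 1.0, 3.0):
        print(t, np.linalg.eigvalsh(front_after_free_flight(convex, t)))
    converging = -2.0 * np.eye(2)                  # focusing front, U < 0
    print(np.linalg.eigvalsh(front_after_free_flight(converging, 1.0)))
```

The printed normal curvatures of the convex front stay positive and decay with time, and the converging front has become convex once it has passed its focusing point.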


Design of Hyperbolic Billiards 


In view of [2] it seems that a convex piece in the 
boundary (K < 0) excludes monotonicity. There are 
two ways around this obstacle. First, we could 
change the quadratic form Q at the convex 
boundary. Second, we can treat convex pieces as 
“black boxes” and look only at incoming and 
outgoing trajectories. Although the second strategy 
seems more restrictive, all the examples constructed 
to date fit the black box scenario, and we will 
present it in more detail. 

To understand this approach, let us consider a 
billiard table with flat pieces of the boundary and 
exactly one convex piece. A trajectory in such a 
billiard experiences visits to the convex piece 
separated by arbitrarily long sequences of reflections
in flat pieces, which do not affect the geometry of a 
wave front at all. Hence, whatever is the geometry 
of a wave front emerging from the curved piece it 
will become convex and very flat by the time it 
comes back to the curved piece of the boundary 
again. Hence, it follows, at least heuristically, that 
we must study the complete passage through the 
convex piece of the boundary, regarding its effect on 
convex, and especially flat, wave fronts. 

An important difference between convex and concave
pieces is that a trajectory usually has several
consecutive reflections in the same convex piece; 
moreover, the number of such reflections is 
unbounded. A finite billiard trajectory is called 
“complete” if it contains reflections in one and the 
same piece of the boundary, and it is preceded and 
followed by reflections in other pieces. 


Definition A complete trajectory is (strictly)
z-monotone if for every nonzero Jacobi field the
value of the form Q does not decrease (increases)
between the point at the distance z before the first
reflection and the point at the distance z after the
last reflection.

A complete trajectory is parabolic if there is a
nonzero Jacobi field J such that J' vanishes before
the first and after the last reflection.


In the language of wave fronts, a complete 
trajectory is z-monotone if every diverging wave 
front at a distance at least z from the first reflection 
becomes diverging after the last reflection at the 
distance z, or earlier. 

It turns out that the only obstruction to mono- 
tonicity of complete trajectories is parabolicity. 
More precisely, if a complete trajectory is not 
parabolic then it is z-monotone for some z > 0. 

It follows from Theorem 1 that we get a 
completely hyperbolic billiard if we put together 
curved pieces with no complete parabolic trajec- 
tories and some flat pieces, in such a way that for 
every two consecutive complete trajectories, being
z_1- and z_2-monotone, respectively, the distance from
the last reflection in the first trajectory to the first
reflection in the second one is bigger than z_1 + z_2.
Indeed, we can put together the midpoints of 
trajectories leaving one curved piece and hitting 
another one into the Poincaré section of the billiard 
flow and we obtain immediately ESM for the return 
map. 

We can formulate somewhat informally two 
principles for the design of hyperbolic billiards. 


1. No parabolic trajectories. Convex pieces of the
boundary cannot have complete parabolic
trajectories.

2. Separation. There must be enough separation (in
space or in time through reflections in flat pieces)
between strictly z-monotone trajectories according
to the values of z.


All of the examples of hyperbolic billiards 
constructed up to now are designed according to 
these principles. 


Hyperbolic Billiards in Dimension 2 


Checking the absence of parabolic trajectories is 
nontrivial due to the unbounded number of reflec- 
tions in complete trajectories close to tangency. It 
was accomplished so far only in integrable, or near 
integrable examples, with the exception of convex 
scattering pieces described in the following. 
Billiards in dimension 2 are understood best. First 
of all, there is yet another way of describing 
infinitesimal families of nearby trajectories. Every 
infinitesimal family of rays in the plane has a point
of focusing (in linear approximation), possibly at
infinity. This point of focusing contains the same
information as the curvature of a wave front (it is
the center of curvature, rather than curvature itself)
and it has the advantage that it does not change
between collisions. The focusing points before and
after a reflection are related by the familiar mirror
equation of geometric optics:

−1/f_0 + 1/f_1 = 2/d

where f_0, f_1 are the signed distances of the points of
focusing to the reflection point, d = r cos θ, r being
the radius of curvature of the boundary piece (r > 0
for a strictly convex piece), and θ the angle of
incidence. The mirror equation is just the
two-dimensional version of [2].

It is instructive to consider an arc of a circle. A 
billiard in a disk is integrable due to its rotational 
symmetry. Let J be a Jacobi field obtained by 
rotation of a trajectory. This family of trajectories 
(“the rotational family”) is focused exactly in the 
middle between two consecutive reflections (that is 
where J vanishes). It follows further from the mirror 
equation that a parallel family of orbits is focused at 
a distance d/2 after the reflection, and any family 
focusing somewhere between the parallel family and 
the rotational family will focus at a distance some- 
where between d/2 and d, not only after the first 
reflection, but also after an arbitrarily long sequence of
reflections.
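
The focusing behavior in an arc of a circle can be followed step by step with the mirror equation. The sketch below assumes the sign convention −1/f_0 + 1/f_1 = 2/d with distances measured along the direction of motion, so that a focusing point lying before the reflection has f_0 < 0. In a circle every segment has length 2d, so a family focused at distance f_1 after one reflection is focused at signed distance f_1 − 2d with respect to the next one. The function names are illustrative assumptions.

```python
import math

def focus_after_reflection(f0, d):
    """Solve -1/f0 + 1/f1 = 2/d for f1 (assumed sign convention: signed
    distances along the direction of motion; f0 = inf is a parallel family)."""
    inv_f0 = 0.0 if math.isinf(f0) else 1.0 / f0
    return 1.0 / (2.0 / d + inv_f0)

def focusing_distances_in_circle(d, n_reflections, f0=math.inf):
    """Focusing distance after each of n consecutive reflections in a circle
    whose chords all have length 2d."""
    distances = []
    for _ in range(n_reflections):
        f1 = focus_after_reflection(f0, d)
        distances.append(f1)
        f0 = f1 - 2.0 * d   # the same focusing point, seen from the next reflection
    return distances

if __name__ == "__main__":
    print(focusing_distances_in_circle(1.0, 5))            # parallel family
    print(focusing_distances_in_circle(1.0, 3, f0=-1.0))   # rotational family
```

Starting from a parallel family, the focusing distances begin at d/2 and creep toward d without reaching it, while the rotational family stays focused exactly at distance d, the middle of each segment.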

Hence, any complete trajectory in an arc of a
circle is z-monotone, where 2z is the length of a
single segment of the trajectory, and strictly
z'-monotone for any z' > z. Two arcs of a circle
separated by parallel segments form the stadium of
Bunimovich (1979).

Lazutkin (1973) showed that billiards in smooth 
strictly convex domains are near integrable near the 
boundary. Donnay (1991) applied Lazutkin’s 
coordinates to establish that for an arbitrary strictly 
convex arc the situation near the boundary is similar 
to that in a circle, that is, complete trajectories near 
tangency are z-monotone, where z is of the order of 
the length of a single segment. In particular, no near 
tangent complete trajectory can be parabolic. Hence, 
this crucial calculation shows that if a strictly 
convex arc has no parabolic trajectories then any 
sufficiently small perturbation also has no parabolic 
trajectories. It follows further that any sufficiently 
small piece of a given strictly convex arc has no 
parabolic trajectories. 

It turns out that in dimension 2, complete 
parabolic trajectories are also z-monotone for some 
z > 0 (but clearly not strictly monotone)
(Wojtkowski 2005). However, they are still an 
obstacle to complete hyperbolicity because in general 
nearby complete trajectories are z-monotone without 
a bound for the values of z, so that no separation of 
convex pieces is sufficient. 

Integrability of the elliptic billiard allows one to
establish strict monotonicity of trajectories in the
semi-ellipse with endpoints on the longer axis
(Wojtkowski 1986). Donnay (1991) showed that
the semi-ellipse with endpoints on the shorter
axis also has no parabolic trajectories provided that the
eccentricity is less than √2/2. As the eccentricity
goes to √2/2 the separation required to produce a
hyperbolic billiard goes to infinity. Markarian et al.
(1996) obtained explicitly the separation of the
elliptic pieces needed for hyperbolicity, when the
eccentricity is smaller than √(2 − √2)/2.


It follows from the mirror equation that a
trajectory with one reflection in a convex piece is
always strictly z-monotone for z > d. Hence, if for
any two consecutive reflections in convex pieces with
respective values of d equal to d_1 and d_2, the distance
between reflections exceeds d_1 + d_2, then the billiard
is completely hyperbolic. For one convex piece this
condition, called convex scattering, turns out to be
equivalent to d²r/ds² < 0, where s is the arc length
(Wojtkowski 1986). This leads to examples of
hyperbolic billiards with one convex piece of the
boundary, like the domain bounded by the cardioid.
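
The convex scattering condition d²r/ds² < 0 can be checked numerically for the cardioid. The sketch below assumes the polar parametrization ρ(φ) = 1 + cos φ (a normalization chosen here for illustration), computes the radius of curvature r and the speed ds/dφ from the standard polar formulas, and differentiates twice with respect to arc length by finite differences. The function name is hypothetical, and the cusp at φ = ±π is excluded from the sampled range.

```python
import numpy as np

def cardioid_d2r_ds2(phi):
    """Finite-difference estimate of d^2 r / ds^2 along the cardioid
    rho = 1 + cos(phi), sampled on the uniform grid phi."""
    rho = 1.0 + np.cos(phi)
    d_rho = -np.sin(phi)
    dd_rho = -np.cos(phi)
    speed = np.sqrt(rho**2 + d_rho**2)                 # ds / dphi
    radius = speed**3 / (rho**2 + 2.0 * d_rho**2 - rho * dd_rho)
    dr_ds = np.gradient(radius, phi) / speed           # chain rule
    return np.gradient(dr_ds, phi) / speed

if __name__ == "__main__":
    phi = np.linspace(-2.5, 2.5, 2001)
    values = cardioid_d2r_ds2(phi)
    print(values[5:-5].max())   # largest value away from the grid ends
```

The values near the ends of the grid are less accurate because one-sided differences are used there, so only the interior of the sample should be inspected.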

Also, any complete trajectory in a convex scatter- 
ing piece is strictly z-monotone for z bigger than the 
maximum of the values of d for the first and the last 
segment of the trajectory. This makes it easy to find
the explicit separation of convex scattering pieces
guaranteeing hyperbolicity.


Hyperbolic Billiards in Higher Dimensions 


In higher dimensions, only two constructions of 
hyperbolic billiards with convex pieces in the 
boundary are known. The first construction by 
Bunimovich (1988), involves a piece of a sphere 
whose angular size, as seen from the center, does not
exceed π/2 (Wojtkowski 1990, 2005, Bunimovich
and Rehacek 1998). The second construction by 
Papenbrock (2000) uses two cylinders, at 90° with 
respect to each other to destroy integrability 
(Wojtkowski 2005). In both cases, the successful 
treatment is based on integrability of the billiard 
systems bounded by a sphere or a cylinder. 

In both of these constructions, trajectories need to
be cut into strictly monotone pieces of unbounded
lengths. In the case of spherical caps, complete
trajectories are z-monotone with unbounded value
of z and the geometry of the billiard table is used to
separate them in time by sufficiently many reflections
in flat pieces of the boundary (Wojtkowski
2005). In the case of cylinders, trajectories are cut
by consecutive returns to a Poincaré section in the
middle of the billiard table.


Soft Billiards 


The same ideas of monotonicity and strict mono- 
tonicity are applicable to soft billiards, where 
specular reflections are replaced by scatterers in 
which the point particle is subjected to the action of 
a spherically symmetric potential. As in ordinary 
billiards, we compare the wave fronts along trajec- 
tories before entering and after leaving scatterers. 
Again, in the absence of parabolic trajectories 
sufficient separation of the scatterers produces a 
completely hyperbolic system. 

The conditions on the potential that guarantee the 
absence of parabolic trajectories were obtained by 
Donnay and Liverani (1991) in the two-dimensional 
case and by Balint and Toth (2006) in higher 
dimensions. The complete integrability of the 
motion of a point particle in a spherically symmetric 
potential is crucial in the derivation of these 
conditions (Wojtkowski 2005). 


See also: Billiards in Bounded Convex Domains; Ergodic 
Theory; Hamiltonian Systems: Stability and Instability 
Theory; Hyperbolic Dynamical Systems; Polygonal 
Billiards; Random Matrix Theory in Physics. 
Introduction
Division of Smooth Dynamical Systems 


Linear maps can be elliptic (complex diagonalizable 
with all eigenvalues on the unit circle), parabolic (all 
eigenvalues on the unit circle but some Jordan blocks 
of size at least 2), or hyperbolic (no eigenvalues on the 
unit circle), and for differentiable dynamical systems, 
that is, smooth maps or flows, one can roughly make 
an analogous subdivision (see Hasselblatt and Katok 
2002, p. 100f). The linear maps not covered by these 
alternatives are those with some eigenvalues on the 
unit circle and others off it; the corresponding class of 
“partially hyperbolic” dynamical systems is usually 
considered in the context of hyperbolic dynamical 
systems with a view to studying phenomena wherein 
the hyperbolic behavior dominates. Thus, elliptic 
dynamical systems are more or less similar to 
isometries, with orbit separation constant or at most 
oscillatory but without persistent growth. KAM 
theory deals with elliptic systems, establishing that 
much of the ellipticity in an integrable Hamiltonian 
system persists under perturbation. Parabolic systems 
may have polynomial orbit separation produced by a 
local “shear” phenomenon; billiards in polygonal 
domains are an example of this. Hyperbolic dynamical 
systems are characterized by exponential divergence of 
orbits. They are of interest because of the complexity 
of their orbit structure with respect to both topological
and statistical behavior. 
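
For linear maps the division described above can be read off from the eigenvalues. The following sketch classifies a real square matrix along these lines; the function name, the tolerances, and the rank test used to detect nontrivial Jordan blocks are assumptions of this illustration, and the test for diagonalizability is numerically fragile for matrices that are nearly defective.

```python
import numpy as np

def classify_linear_map(A, tol=1e-9):
    """Rough classification of a linear map by its eigenvalues."""
    A = np.asarray(A, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eig(A)
    on_circle = np.abs(np.abs(eigenvalues) - 1.0) < tol
    if not on_circle.any():
        return "hyperbolic"              # no eigenvalue on the unit circle
    if not on_circle.all():
        return "partially hyperbolic"    # some on the circle, some off it
    # All eigenvalues lie on the unit circle: check complex diagonalizability
    # through the rank of the eigenvector matrix (a heuristic test).
    rank = np.linalg.matrix_rank(eigenvectors, tol=1e-8)
    return "elliptic" if rank == A.shape[0] else "parabolic"

if __name__ == "__main__":
    c, s = np.cos(0.3), np.sin(0.3)
    print(classify_linear_map([[c, -s], [s, c]]))   # a rotation
    print(classify_linear_map([[1, 1], [0, 1]]))    # a shear
    print(classify_linear_map([[2, 1], [1, 1]]))    # expanding and contracting
```

The shear illustrates the polynomial orbit separation mentioned for parabolic systems: its n-th power has the entry n in the upper right corner.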

Specifically, the stretching (corresponding to 
eigenvalues outside the unit circle in the case of 
linear maps) combined with the folding necessitated 
by compactness of the phase space produces not 
only highly sensitive dependence of orbit asympto- 
tics on initial conditions, but also a close intertwin- 
ing of different behaviors. On the one hand, there is 
a dense set of periodic points, on the other hand, an 
abundance of dense orbits. While there are only 
finitely many periodic points of a given period, their 
number grows exponentially as a function of the 
period. The entropy of these systems is positive, 
which indicates that the overall complexity of the 
orbit structure grows exponentially as a function of 
the length of time for which it is being tracked. In 
effect, the behavior of orbits is so intricate as to be 
quasirandom, which makes it natural to use statis- 
tical methods to describe these systems. 
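
The exponential growth of the number of periodic points can be made concrete in a standard example that is not taken from this article: the automorphism of the 2-torus induced by the integer matrix with rows (2, 1) and (1, 1). For such a hyperbolic toral automorphism the number of points fixed by the n-th iterate equals |det(A^n − I)|. The sketch below evaluates this count in exact integer arithmetic; the function names are illustrative.

```python
def mat_mul(a, b):
    """Product of two 2 x 2 integer matrices given as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )

def count_points_fixed_by_power(n, a=((2, 1), (1, 1))):
    """Number of torus points x with A^n x = x, computed as |det(A^n - I)|."""
    power = ((1, 0), (0, 1))
    for _ in range(n):
        power = mat_mul(power, a)
    det = (power[0][0] - 1) * (power[1][1] - 1) - power[0][1] * power[1][0]
    return abs(det)

if __name__ == "__main__":
    print([count_points_fixed_by_power(n) for n in range(1, 9)])
```

The counts include points whose minimal period divides n; their growth rate is governed by the expanding eigenvalue (3 + √5)/2 of the matrix.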


History of Hyperbolic Dynamical Systems 


One strand of the history of hyperbolic dynamical 
systems leads back to the question of the stability of 
the solar system and to Poincaré, in whose prize 
memoir on the three-body problem the possibility of 
“homoclinic tangles” first presented itself. For 
Poincaré, this was important because the resulting 
complexity demonstrates that this system is not 
integrable. We describe below how hyperbolic 
dynamics arises in this situation (see Figure 3). 
Another strand emerged about a decade later with
Hadamard’s study of geodesic flows (free particle 
motion) on negatively curved surfaces. Hadamard 
noted that these exhibit the kind of sensitive 
dependence on initial conditions as well as the 
pseudorandom behavior that are central features of 
hyperbolic dynamics. This subject was developed 
much further after the advent of ergodic theory, 
with the Boltzmann ergodic hypothesis as an 
important motivation: work by numerous mathe- 
maticians, principally Hedlund and Hopf, showed 
that free particle motion on a negatively curved 
surface provides examples of ergodic mechanical 
systems. More than two decades later, in the 1960s, 
Anosov and Sinai overcame a fundamental technical 
hurdle and established that this is indeed the case in 
arbitrary dimension. This was done in the more 
general context of a class of dynamical systems 
known now as Anosov systems, which were axio- 
matically defined and systematically studied for the 
first time during this period of research in Moscow. 

A greater class of dynamical systems exhibiting 
chaotic behavior was introduced by Smale in his 
seminal 1967 paper under the name of Axiom-A 
systems. This class includes the hyperbolic dynamics 
arising from homoclinic tangles, see Figure 3 
(see Homoclinic Phenomena). Smale’s motivation 
was his program of classifying dynamical systems 
under topological conjugacy, and the consequent 
search for structurally stable systems. Today, Axiom- 
A (and Anosov) systems are valued as idealized models 
of chaos: while the conditions defining Axiom A are 
too stringent to include many real-life examples, it is 
recognized that they have features shared in various 
forms by most chaotic systems. Here, we concentrate 
on the discrete-time context to keep notations lighter. 

Partial hyperbolicity was introduced in the 1970s
and has shown that a limited amount of hyperbolicity
in a dynamical system can produce much of
the global complexity (such as ergodicity or the 
presence of dense orbits) exhibited by hyperbolic 
systems, and can do so in a robust way. Here one 
imposes uniform conditions, but expansion and 
contraction are not assumed to occur in all direc- 
tions. Stable ergodicity has been an important 
subject of research in the last decade. 

Nonuniform hyperbolicity weakens hyperbolicity 
by allowing the contraction and expansion rates to 
be nonuniform. This was motivated by examples of 
systems with hyperbolicity where expansion or 
contraction can be arbitrarily weak or absent in 
places, such as the Hénon attractor, and by 
situations where hyperbolicity coexists with singula- 
rities, such as for (semi)dispersing billiards (see 
Hyperbolic Billiards). 


With respect to both uniformly and nonuniformly 
hyperbolic systems, dimension theory has been a 
subject of much interest (computations and esti- 
mates of the fractal dimension of attractors and 
hyperbolic sets, which is deeply connected to 
dynamical properties of the system). 

A different weakening of hyperbolicity, the presence
of a dominated splitting, has been of interest
from the viewpoint of stability and classification
of diffeomorphisms.

The study of hyperbolic dynamics has always had 
interactions with other sciences and other areas of 
mathematics. In the natural and social sciences, this 
is the study of chaotic motions of just about any 
kind. Examples of applications in related areas of 
mathematics are geometric rigidity (an interaction 
with differential geometry) and rigidity of group 
actions. 


Uniformly Hyperbolic Dynamical Systems 
Definitions 


Let f be a smooth invertible map. A compact
invariant set of f is said to be “hyperbolic” if at
every point in this set, the tangent space splits into a
direct sum of two subspaces E^u and E^s with the
property that these subspaces are invariant under the
differential df, that is, df(x)E^u(x) = E^u(f(x)),
df(x)E^s(x) = E^s(f(x)), and that df expands vectors
in E^u and contracts vectors in E^s, that is, there are
constants 0 < λ < 1 < μ, c > 0 such that if v ∈ E^s(x)
for some x, then ||df^n v|| ≤ cλ^n ||v|| for n = 1, 2, ...,
and if v ∈ E^u(x) for some x, then
||df^{-n} v|| ≤ cμ^{-n} ||v|| for n = 1, 2, ....
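
The definition can be tested on a standard example that does not come from this article: the linear map of the 2-torus given by the matrix with rows (2, 1) and (1, 1), whose differential is the same matrix at every point. The sketch below computes the contracting and expanding directions and rates from the eigendecomposition; the names are illustrative, and the constant c can be taken equal to 1 here because the matrix is symmetric.

```python
import numpy as np

CAT_MAP = np.array([[2.0, 1.0], [1.0, 1.0]])

def hyperbolic_splitting(A):
    """Return (lam, mu, e_s, e_u) for a symmetric 2 x 2 matrix A with one
    eigenvalue inside and one outside the unit circle."""
    eigenvalues, eigenvectors = np.linalg.eigh(A)   # ascending eigenvalues
    lam, mu = eigenvalues
    e_s, e_u = eigenvectors[:, 0], eigenvectors[:, 1]
    return lam, mu, e_s, e_u

if __name__ == "__main__":
    lam, mu, e_s, e_u = hyperbolic_splitting(CAT_MAP)
    print(lam, mu)
    v = e_s.copy()
    for n in range(1, 6):
        v = CAT_MAP @ v
        print(n, np.linalg.norm(v), lam**n)   # contraction along the stable direction
```

Since the same splitting works at every point of the torus, the whole torus is a hyperbolic set for this map.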

If E^u = {0} in the definition above, then the
invariant set is made up of attracting fixed points
or periodic orbits. Similarly, if E^s = {0}, then the
orbits are repelling. If neither subspace is trivial,
then the behavior is locally “saddle-like,” that is to 
say, relative to the orbit of a point x, most nearby 
orbits diverge exponentially fast in both forward 
and backward time. This is why hyperbolicity is a 
mathematical notion of chaos. 

An Anosov diffeomorphism is a smooth invertible 
map of a compact manifold with the property that 
the entire space is a hyperbolic set. 

Axiom A, which is a larger class, focuses on the 
part of the system that is not transient. More 
precisely, a point x in the phase space is said to be 
“nonwandering” if every neighborhood U of x 
contains an orbit that returns to U. A map is said 
to satisfy Axiom A if its nonwandering set is 
hyperbolic and contains a dense set of periodic 
points. 


Definitions in the continuous-time case are analogous:
f above is replaced by the time-t-maps of the
flow, and the tangent spaces now decompose into
E^u ⊕ E^0 ⊕ E^s where E^0, which is one dimensional,
represents the direction of the flow lines.

A geometric way of detecting (indeed, defining) 
hyperbolicity is via the cone criterion: at every point 
there is a cone that is mapped by the differential into 
the interior of the corresponding cone at the image 
point, and a “complementary” cone family behaves 
similarly for the inverse. 
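
A sampled check of the cone criterion is sketched below for a standard linear example that is not taken from this article, the matrix with rows (2, 1) and (1, 1). The cone family is assumed to be constant: the closed cone xy ≥ 0 around the expanding direction, and the complementary cone xy ≤ 0 for the inverse. The function names and the sampling scheme are illustrative, and a finite sample only supports the criterion rather than proving it.

```python
import numpy as np

CAT_MAP = np.array([[2.0, 1.0], [1.0, 1.0]])

def in_cone_interior(v, sign):
    """sign = +1: interior of the cone x*y >= 0; sign = -1: interior of x*y <= 0."""
    return sign * v[0] * v[1] > 0

def cone_is_mapped_inside(M, sign, n_samples=1000, seed=0):
    """Check on sampled vectors (and on the two boundary rays) that M maps
    the closed cone sign*x*y >= 0 into its own interior."""
    rng = np.random.default_rng(seed)
    samples = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # boundary rays
    for _ in range(n_samples):
        x, y = np.abs(rng.normal(size=2))
        samples.append(np.array([x, sign * y]))              # a vector of the cone
    return all(in_cone_interior(M @ v, sign) for v in samples)

if __name__ == "__main__":
    print(cone_is_mapped_inside(CAT_MAP, +1))
    print(cone_is_mapped_inside(np.linalg.inv(CAT_MAP), -1))
```

Only one half of each cone is sampled, which suffices because both the cones and the maps are symmetric under v → −v.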

Many continuous structures associated with a
hyperbolic dynamical system are, in fact, Hölder
continuous. (For a function g on a metric space this
is defined as the existence of C, α > 0 such that
d(g(x), g(y)) ≤ C d(x,y)^α whenever x, y are sufficiently
close to each other.) In the present article,
almost every assertion of continuity could be
replaced by one of Hölder continuity. This notion
is natural in this context because x_n → y exponentially
fast implies that g(x_n) → g(y) exponentially fast
if g is Hölder continuous.





Structure and Properties 


Stable and Unstable Manifolds, Local 
Product Structure 


Anosov and Axiom-A systems are defined by the 
behavior of the differential. Corresponding to the 
linear structures left invariant by df are nonlinear 
structures, namely “stable manifolds” tangent to E^s
and “unstable manifolds” tangent to E^u.

Thus, associated with an Anosov map are two 
families of invariant manifolds, each one of which 
fills up the entire phase space; they are sometimes 
called the stable and unstable “foliations.” The 
leaves of these foliations are transverse at each 
point, that is, they intersect at positive angles, 
forming a kind of (topological) coordinate system. 
The map f expands distances along the leaves of one 
of these foliations and contracts distances along the 
leaves of the other. For Axiom-A systems, one has a 
similar local product structure or “coordinate 
system” at each point in the nonwandering set, but 
the picture is local, and there are gaps: the stable 
and unstable leaves do not necessarily fill out open 
sets in the phase space. 

There is much interest in determining the fractal 
dimension (box-counting or Hausdorff, say) of 
hyperbolic sets. So far the best dimension estimates 
have been made for stable slices, that is, for the 
intersection of a stable leaf with the hyperbolic set, 
and for unstable slices. Because the local coordinate
systems describing the local product structure are
only known to be continuous, it is not known in
general whether the sum of these stable and unstable
dimensions gives the dimension of the hyperbolic set
(we don’t even know whether all stable slices have
the same fractal dimension). The problem is that an
$\alpha$-Hölder-continuous map can change dimensions by
a factor of $\alpha$ or $1/\alpha$. But there is evidence to suggest
that something like this “dimension product structure”
may often be true; this has been established
for a class of solenoids.


Transitivity and Spectral Decomposition 


In addition to these local structures, Axiom-A 
systems have a global structure theorem known 
as “spectral decomposition.” It says that the
nonwandering set of every Axiom-A map can be
written as $X_1 \cup \dots \cup X_r$, where the $X_i$ are disjoint
closed invariant sets on which f is topologically
transitive, that is, has a dense orbit. The $X_i$ are
called “basic sets.” Each $X_i$ can be decomposed
further into a finite union $\bigcup_j X_{i,j}$, where each $X_{i,j}$ is
invariant and topologically mixing under some 
iterate of f. (Topological transitivity and mixing 
are irreducibility conditions; transitivity means that 
there is no proper open invariant subset, and 
topological mixing says that given two open sets, 
from some time onward the images of one will 
always intersect the other.) This decomposition is 
reminiscent of the corresponding result for finite- 
state Markov chains. 


Stability 


One of the reasons why hyperbolic sets are 
important is their “robustness”: they cannot be 
perturbed away. More precisely, let f be a map
with a hyperbolic set $\Lambda$ which is locally maximal,
that is, it is the largest invariant set in some
neighborhood U. Then for every map g that is
$C^1$-near f, the largest invariant set $\Lambda'$ of g in U
is again hyperbolic; moreover, f restricted to $\Lambda$ is
“topologically conjugate” to g restricted to $\Lambda'$. This
is mathematical shorthand for saying that not only
are the two sets $\Lambda$ and $\Lambda'$ topologically
indistinguishable, but the orbit structure of f on $\Lambda$ is
indistinguishable from that of g on $\Lambda'$.

The phenomenon above brings us to the idea of 
“structural stability.” A map f is said to be 
structurally stable if every map g that is $C^1$-near f is
topologically conjugate to f (on the entire phase 
space). It turns out that a map is structurally stable 
if and only if it satisfies Axiom A and an additional 
condition called strong transversality.


Chains and Shadowing


We discuss next the idea of pseudo-orbits versus real 
orbits. Letting $d(\cdot,\cdot)$ be the metric, a sequence of
points $x_0, x_1, x_2, \dots$ in the phase space is called an
“$\epsilon$-pseudo-orbit” or a “chain” of f if $d(f(x_i), x_{i+1}) < \epsilon$
for every i. Computer-generated orbits, for example,
are pseudo-orbits due to round-off errors. A fact of
consequence to people performing numerical experiments
is that in hyperbolic systems, small errors at
each step get magnified exponentially fast. For
example, if the expansion rate is 3 or more, then
an $\epsilon$-error made at one step is at least tripled at each
subsequent step, that is, after only $O(|\log \epsilon|)$
iterates, the error is O(1), and the pseudo-orbit
bears no relation to the real one. There is, however,
a theorem that says that every pseudo-orbit is
“shadowed” by a real one. More precisely, given a
hyperbolic set, there is a constant C such that if
$x_0, x_1, x_2, \dots$ is an $\epsilon$-pseudo-orbit, then there is a
phase point z such that $d(x_i, f^i(z)) < C\epsilon$ for all i.
Thus, paradoxical as it may first seem, this result 
asserts that on hyperbolic sets, each pseudo-orbit 
approximates a real orbit, even though it may 
deviate considerably from the one with the same 
initial condition. 
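
The error growth described above can be checked with a few lines of Python. This is only a sketch under stated assumptions: it uses the tripling map x -> 3x mod 1 on the circle as a stand-in with expansion rate exactly 3 (an expanding map, not a hyperbolic diffeomorphism), it introduces a single error at time 0 rather than one at every step, and the function names and the threshold 1/10 are choices made here. Exact rational arithmetic keeps round-off out of the picture.

```python
from fractions import Fraction


def tripling_map(x):
    """Tripling map x -> 3x mod 1 on the circle; it expands distances by 3."""
    return (3 * x) % 1


def circle_dist(a, b):
    """Distance between two points of the circle R/Z."""
    d = abs(a - b) % 1
    return min(d, 1 - d)


def steps_until_order_one(eps, threshold=Fraction(1, 10)):
    """Iterate two orbits that start eps apart and return the first n
    at which their distance reaches the (illustrative) threshold."""
    x = Fraction(1, 7)
    y = x + eps
    n = 0
    while circle_dist(x, y) < threshold:
        x, y = tripling_map(x), tripling_map(y)
        n += 1
    return n


if __name__ == "__main__":
    for exponent in (6, 12, 18):
        eps = Fraction(1, 10 ** exponent)
        print(f"eps = 1e-{exponent}: separated after {steps_until_order_one(eps)} steps")
```

Shrinking the initial error by a factor of a million only delays the separation by about a dozen steps, which is the $O(|\log \epsilon|)$ behavior mentioned in the text.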

The shadowing orbit corresponding to a bi- 
infinite pseudo-orbit is, in fact, unique. From this, 
one deduces easily the following Closing Lemma: 
For any hyperbolic set, there is a constant C such 
that the following holds: Every finite orbit segment
$x, f(x), \dots, f^{n-1}(x)$ that nearly closes up, that is,
$d(x, f^{n-1}(x)) < \epsilon$ for some small $\epsilon$, lies within $C\epsilon$ of
a genuine periodic orbit of period n. Thus,
hyperbolic sets contain many periodic points.


Examples 
Anosov Diffeomorphisms 


A large class of Anosov diffeomorphisms comes 
from “linear toral automorphisms,” that is, maps of 
the n-dimensional torus induced by n x n matrices 
with integer entries, determinant $\pm 1$, and no eigenvalues of
modulus one. The most popular example is the map
obtained from

$\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$,


sometimes called the Arnol’d cat map because of an 
illustration used by Arnol’d. The unstable manifolds 
are lines parallel to the expanding direction shown 
in Figure 1 and wrapped around the torus, and the 
stable manifolds are obtained from the orthogonal 
lines. 
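
As a quick numerical check of this example, the following Python sketch computes the eigenvalues and eigendirections of the matrix above. The helper name `eigen_symmetric_2x2` is introduced here for illustration, and the closed-form eigenvector used assumes a symmetric 2 x 2 matrix with nonzero off-diagonal entry, which holds in this case.

```python
import math

A = ((2, 1), (1, 1))  # matrix inducing the cat map on the 2-torus


def eigen_symmetric_2x2(m):
    """Eigenvalues (largest first) and matching eigenvectors of a symmetric
    2x2 matrix ((a, b), (b, d)) with b != 0."""
    (a, b), (_, d) = m
    trace, det = a + d, a * d - b * b
    root = math.sqrt(trace * trace - 4 * det)
    eigenvalues = ((trace + root) / 2, (trace - root) / 2)
    # (a - lam) * x + b * y = 0 is solved by (x, y) = (b, lam - a).
    eigenvectors = tuple((b, lam - a) for lam in eigenvalues)
    return eigenvalues, eigenvectors


if __name__ == "__main__":
    (lam_u, lam_s), (v_u, v_s) = eigen_symmetric_2x2(A)
    print("expanding eigenvalue:  ", lam_u, "direction", v_u)
    print("contracting eigenvalue:", lam_s, "direction", v_s)
    print("dot product of the two directions:", v_u[0] * v_s[0] + v_u[1] * v_s[1])
```

Neither eigenvalue has modulus one, their product is the determinant 1, and the two directions are orthogonal because the matrix is symmetric, in line with the description of the stable and unstable lines.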

















Figure 1 A hyperbolic toral automorphism. Reproduced from 
Katok A and Hasselblatt B (2003) Dynamics: A First Course. 
Cambridge: Cambridge University Press, with permission from 
Cambridge University Press. 


We remark that due to their structural stability, 
(nonlinear) perturbations of linear toral automorphisms
continue to have the Anosov property. This
remark applies also to all of the examples below. In 
fact, all known Anosov diffeomorphisms are topo- 
logically identical to a linear toral automorphism (or 
a slight generalization of these, infranil-manifold 
automorphisms). 


Geodesic Flows 


Geodesic flows describe free motions of points on 
manifolds. Let M be a manifold. Given $x \in M$ and a
unit vector v at x, there is a unique geodesic starting
from x in the direction v. The geodesic flow $\varphi^t$ is
given by $\varphi^t(x,v) = (x',v')$, where x' is the point t units
down the geodesic and v' is the direction at x'.
Geodesic flows on manifolds of strictly negative 
curvature are the main examples of Anosov flows. 
They were studied by Hadamard (ca. 1900) and by
Hedlund and Hopf (1930s), considerably before
Anosov theory was developed.


Horseshoes 


Smale’s horseshoe is the prototypical example of a 
hyperbolic invariant set. This map, so called because 
it bends a rectangle B into the shape of a horseshoe 
and puts it back on top of B, is shown in Figure 2. 
The set $\{x : f^n(x) \in B \text{ for all } n = 0, \pm 1, \pm 2, \dots\}$ is
hyperbolic. It is a two-dimensional Cantor set in B. 
The emergence of this example can be traced back 
directly to real-world systems. 

During World War II, Cartwright and Littlewood 
worked on relaxation oscillations in radar circuits, 







Figure 2 The horseshoe. 





consciously building on Poincaré’s work. Further 
study of the underlying van der Pol equation by 
Levinson contained the first example of a structu- 
rally stable diffeomorphism with infinitely many 
periodic points. (Structural stability originated in 
1937 but began to flourish only 20 years later.) This 
was brought to the attention of Smale. Inspired by the work of
Peixoto, who had carried out such a program
in dimension 2, Smale pursued a program of 
studying diffeomorphisms with a view to classifica- 
tion (Smale 1967). Until alerted by Levinson, Smale 
conjectured that only Morse-Smale systems (which 
have only finitely many periodic points with stable 
and unstable sets in general position) could be 
structurally stable. He eventually extracted the 
horseshoe from Levinson’s work. Smale in turn 
was in contact with the Russian school, where 
Anosov systems (then C- or U-systems) had been 
shown to be structurally stable, and their ergodic 
properties were studied by way of further develop- 
ment of the study of geodesic flows in negative 
curvature. 

The appearance of horseshoes in mathematical 
models of real-world phenomena is quite wide- 
spread. Indeed, in a sense this is the mechanism for 
the production of chaotic behavior, at least in 
dimension 2. In disguise, one of the earliest 
appearances of this phenomenon occurred in the 
prize memoir of Poincaré, where homoclinic tangles 
gave a first glimpse at the serious dynamical 
complexity that can arise in the three-body problem 
in celestial mechanics. If the stable and unstable 
curves of a hyperbolic fixed point intersect trans- 
versely (as in Figure 3a), this engenders further such 
intersections and produces a complicated web of 
accumulations of loops or lobes of stable and 
unstable curves, as shown in Figure 3b. Homoclinic 
tangles always produce horseshoes by the Smale- 
Birkhoff theorem, illustrated by Figure 3c, so in 
trying to solve the three-body problem, Poincaré 
essentially discovered the possibility of nontrivial 
hyperbolic behavior (see Homoclinic Phenomena). 

A related appearance of horseshoes in this context 
is in the work of Alekseev, who used their presence 
to show that capture of celestial bodies can indeed 
occur. 


Solenoids 


Finally we mention the solenoid, which is an 
example of an Axiom-A attractor (see Figure 4). 
Here the map f is defined on a solid torus $M = S^1 \times D^2$,
where $D^2$ is a two-dimensional disk. It is easiest
to describe it in two steps: first it maps M into a
long thin solid torus, which is then put inside M

Figure 3 Homoclinic tangles produce horseshoes. Reproduced
from Katok A and Hasselblatt B (2003) Dynamics: A
First Course. Cambridge: Cambridge University Press, with
permission from Cambridge University Press.





Figure 4 The solenoid. Reproduced from Katok A and 
Hasselblatt B (2003) Dynamics: A First Course. Cambridge: 
Cambridge University Press, with permission from Cambridge 
University Press. 


winding around the $S^1$ direction twice. The attractor
is given by $\Lambda = \bigcap_{n \ge 0} f^n(M)$.


Symbolic Coding of Orbits and 
Ergodic Theory 


An important tool for studying the orbit structure of 
Axiom-A systems is the “Markov partition,” con- 
structed for Anosov systems by Sinai and extended to 
Axiom-A basic sets by Bowen. Given a partition 
$\{R_1, \dots, R_k\}$ of the phase space, there is a natural
way to attach to each point x in the phase space a
sequence of symbols, namely $(\dots, a_{-1}, a_0, a_1, a_2, \dots)$,
where $a_i \in \{1, 2, \dots, k\}$ is the name of the partition
element containing $f^i(x)$, that is, $f^i(x) \in R_{a_i}$ for
each i. In general, not all sequences are realized by
orbits of f. Markov partitions are designed so that 
the set of symbol sequences that correspond to real 
orbits has Markovian properties; it is called a shift of 
finite type. 
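
To make the notion of a shift of finite type concrete, here is a small Python sketch that enumerates the finite words allowed by a 0-1 transition matrix. The two-symbol matrix below (symbol 1 may not follow itself) is a hypothetical example chosen for illustration, symbols are numbered from 0 rather than 1, and the function name is introduced here; none of this comes from a specific Markov partition in the article.

```python
from itertools import product


def admissible_words(transitions, length):
    """All words of the given length over symbols 0..k-1 in which every
    consecutive pair (a, b) is allowed, i.e. transitions[a][b] == 1."""
    k = len(transitions)
    return [
        word
        for word in product(range(k), repeat=length)
        if all(transitions[a][b] for a, b in zip(word, word[1:]))
    ]


# Hypothetical transition rule: 0 may be followed by 0 or 1, 1 only by 0.
TRANSITIONS = ((1, 1), (1, 0))

if __name__ == "__main__":
    for n in range(1, 7):
        print(n, len(admissible_words(TRANSITIONS, n)))
```

Only some of the $2^n$ words of length n are admissible; for this rule the counts follow the Fibonacci numbers, so their exponential growth rate is strictly smaller than that of the full shift on two symbols.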

The ergodic theory of Axiom-A systems has its 
origins in statistical mechanics. In a 1D lattice model in 
statistical mechanics, one has an infinite array of sites 
indexed by the integers; at each site, the system can be 
in any one of a finite number of states. Thus, the 
configuration space for a 1D lattice model is the set of 
bi-infinite sequences on a finite alphabet. Identifying 
this symbol space with the one coming from Markov 
partitions, Sinai and Ruelle were able to transport 
some of the basic ideas from statistical mechanics, 
including the notions of Gibbs states and equilibrium 
states, to the ergodic theory of Axiom-A systems. 

The notion of equilibrium states, which is 
equivalent to Gibbs states for Axiom-A systems, 
has the following meaning in dynamical systems in 
general: given a potential function $\varphi$, an invariant
measure is said to be an equilibrium state if it
maximizes the quantity


$h_\mu(f) - \int \varphi \, d\mu$


where $h_\mu(f)$ denotes the Kolmogorov-Sinai entropy of
f and the supremum is taken over all f-invariant
probability measures $\mu$. In particular, when $\varphi = 0$, this
measure is the measure that maximizes entropy; and
when $\varphi = \log|\det(df|_{E^u})|$, it is the Sinai-Ruelle-Bowen
(SRB) measure. From the physical or observational
point of view, SRB measures are the most
important invariant measures for dissipative dynamical
systems because if f is a diffeomorphism of a
compact manifold M and $\Lambda$ a transitive Axiom-A
attractor with basin U, for example, $\Lambda = U = M$, then
for Lebesgue-a.e. $x \in U$ and for every $\varphi \in C^0(M)$


$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \varphi(f^i(x)) = \int \varphi \, d\mu$


that is, Lebesgue-a.e. point is $\mu$-typical. Thus, while
Axiom-A attractors will have chaotic motions, they are
statistically coherent in that the asymptotic distribution
of any typical orbit is given by the SRB measure.


Periodic Points and Their 
Growth Properties 


We discuss briefly some further results related to the 
abundance of periodic points in Axiom-A systems. 


For an Axiom-A diffeomorphism f, if P(n) is the
number of periodic points of period $\le n$, then
$P(n) \sim e^{hn}$, where h is the topological entropy of f. That is
to say, the dynamical complexity of f is reflected in
its periodic behavior. An analogous result holds for
Axiom-A flows. This asymptotic behavior is known
to remarkably fine accuracy (Margulis 2004), and
these developments used the dynamical zeta function,
which sums up the periodic information of a
system. In the discrete-time case,
$\zeta(z) := \exp \sum_{n=1}^{\infty} P(n) z^n / n$ has been shown to be a rational function
analytic on $|z| < e^{-h}$. In the continuous-time case,
the zeta function is given by
$\zeta(z) := \prod_{\gamma} (1 - \exp(-z\,l(\gamma)))^{-1}$, where the product is taken
over all (nonstationary) periodic orbits $\gamma$ and $l(\gamma)$
is the smallest positive period of $\gamma$. This function is
known to be meromorphic on a certain domain, 
but the location of its poles, which are intimately 
related to correlation decay properties of the 
system, remains one of the yet unresolved issues in 
Axiom-A theory. 
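
The exponential growth of periodic points can be observed directly for the cat map. The Python sketch below relies on the standard fact that a hyperbolic toral automorphism induced by an integer matrix A has exactly $|\det(A^n - I)|$ fixed points of its n-th iterate. Note that this counts fixed points of $f^n$, which is not literally the quantity P(n) defined above, and that the function names are introduced here for illustration.

```python
import math

A = ((2, 1), (1, 1))  # matrix of the cat map


def mat_mul(p, q):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def fixed_point_count(n):
    """Number of fixed points of the n-th iterate on the torus, |det(A^n - I)|."""
    m = ((1, 0), (0, 1))
    for _ in range(n):
        m = mat_mul(m, A)
    (a, b), (c, d) = m
    return abs((a - 1) * (d - 1) - b * c)


if __name__ == "__main__":
    entropy = math.log((3 + math.sqrt(5)) / 2)  # log of the expanding eigenvalue
    for n in (1, 2, 3, 10, 20, 40):
        count = fixed_point_count(n)
        print(n, count, math.log(count) / n, entropy)
```

The printed growth rates approach the logarithm of the expanding eigenvalue, which is the topological entropy of this map.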


Partial Hyperbolicity and 
Dominated Splitting 


There are various ways in which the notion of 
hyperbolicity described above, which we will hence- 
forth refer to as “uniform hyperbolicity,” can be 
extended beyond the one presented so far. This can 
be done with a view to weakening the conditions 
under which some of the salient properties of 
hyperbolic dynamical systems appear. The study of 
partially hyperbolic dynamical systems and that 
of dynamical systems possessing a dominated split- 
ting is of this type. Further below, we describe a 
different extension motivated more by a desire to 
bring the results and methods of hyperbolic 
dynamics to bear on systems that are closer to 
some physical situations. This led to the study of 
nonuniformly hyperbolic dynamical systems. 

If one views hyperbolicity as requiring that the 
spectrum of expansion and contraction rates is 
separated into two components by the unit circle, 
then one can consider systems where this separation 
is provided by a circle centered at 0 whose radius 
may not be 1 (partial hyperbolicity in the broad 
sense), or by two circles centered at 0 of which one 
has radius less than 1 and the other has radius 
greater than 1, with possibly a third component of 
the spectrum in the annulus between these (absolute 
partial hyperbolicity). Further weakenings are 
obtained by controlling not the whole spectrum in 
this absolute way, but rather ratios of expansion and 
contraction rates along orbits (dominated splitting
and relative partial hyperbolicity, respectively).
Among the motivations for these weakenings are 
the desire to understand which systems are topolo- 
gically transitive and robustly so (stable transitivity), 
and to understand which ergodic volume-preserving 
systems remain ergodic if perturbed within the space 
of volume-preserving systems (stable ergodicity). 


Pseudohyperbolicity 


Let f be a smooth invertible map. A compact 
invariant set of f is said to be partially hyperbolic 
in the broad sense if at every point in this set, the 
tangent space splits into a direct sum of two 
subspaces $E^u$ and $E^s$ with the property that these
subspaces are invariant under the differential df,
that is, $df(x)E^u(x) = E^u(f(x))$, $df(x)E^s(x) = E^s(f(x))$,
and that there are constants $0 < \lambda < \mu$, $c > 0$ such
that if $v \in E^s(x)$ for some x then $\|df^n v\| \le c\lambda^n \|v\|$
for n = 1, 2, ... and if $v \in E^u(x)$ for some x
then $\|df^{-n} v\| \le c\mu^{-n} \|v\|$ for n = 1, 2, .... This is
sometimes also referred to as the existence of a
$(\lambda, \mu)$-splitting or pseudohyperbolicity.


Dominated Splitting 


A further weakening of this condition replaces these 
absolute estimates by relative ones. Let f be a 
smooth invertible map. A compact invariant set of 
f is said to admit a dominated splitting if at every 
point in this set, the tangent space splits into a direct 
sum of two subspaces $E^u$ and $E^s$ with the property
that these subspaces are invariant under the differential
and there are constants $\lambda \in (0,1)$, $c > 0$ such
that if $u \in E^u(x)$ and $v \in E^s(x)$ are unit vectors for some x then
$\|df^n v\| / \|df^n u\| \le c\lambda^n$ for n = 1, 2, ....

The presence of a dominated splitting has been 
found to yield substantial information pertinent to 
stability of such systems, and it plays a significant 
role in a program of research aiming at a classifica- 
tion of generic diffeomorphisms up to topological 
conjugacy and specifically motivated by the “Palis 
conjecture,” which aims to describe that classifica- 
tion. With respect to inferring topological and 
ergodic (i.e., statistical) properties of the orbit 
structure, the stricter notion of partial hyperbolicity 
(in the narrow sense below) is more commonly used, 
but in this respect the presence of a dominated 
splitting is also of interest because there is evidence 
in support of the conjecture that stable ergodicity 
implies the presence of a dominated splitting. 


Partial Hyperbolicity 


Let f be a smooth invertible map. A compact 
invariant set of f is said to be (absolutely) partially 
hyperbolic if at every point in this set, the tangent
space splits into a direct sum of unstable, central,
and stable directions $E^u$, $E^c$, and $E^s$, with the
property that these subspaces are invariant under
the differential df and that there exist numbers
$C > 0$,


$0 < \lambda_1 \le \mu_1 < \lambda_2 \le \mu_2 < \lambda_3 \le \mu_3$
with $\mu_1 < 1 < \lambda_3$     [1]


such that if $v \in E^s(x)$, $w \in E^c(x)$, $u \in E^u(x)$,
n = 1, 2, ..., then


$C^{-1}\lambda_1^n \|v\| \le \|df^n(v)\| \le C\mu_1^n \|v\|$
$C^{-1}\lambda_2^n \|w\| \le \|df^n(w)\| \le C\mu_2^n \|w\|$
$C^{-1}\lambda_3^n \|u\| \le \|df^n(u)\| \le C\mu_3^n \|u\|$


In this case, we set $E^{cs} := E^c \oplus E^s$ and $E^{cu} := E^c \oplus E^u$.
Following Burns-Wilkinson, we say that f is “center-bunched”
if $\max\{\mu_1, \lambda_3^{-1}\} < \lambda_2/\mu_2$.

As in the case of (uniformly) hyperbolic dynamical
systems, the sub-bundles $E^s$ and $E^u$ are integrable
to stable and unstable foliations $W^s$ and $W^u$. It is
not automatic that the center-stable sub-bundle $E^{cs}$
and the center-unstable sub-bundle $E^{cu}$ are tangent
to foliations $W^{cs}$ and $W^{cu}$; if this happens to be the
case, the partially hyperbolic system is said to be
“dynamically coherent.”

Partial hyperbolicity can also be defined by a cone
criterion, with suitable adaptations.


Stable Ergodicity and Transitivity 


Partial hyperbolicity was introduced as a means of 
providing just enough hyperbolicity to render a 
dynamical system ergodic or topologically transitive. 
These are both irreducibility conditions, and to 
obtain these, one rules out a Cartesian product 
situation by assuming something like essential 
accessibility: almost every two points (in the sense 
of volume viewed as a measure) can be connected by 
a curve consisting of a finite concatenation of arcs, 
each of which lies entirely in one stable or unstable 
leaf. A celebrated result in this field is in its original 
form (with a much stronger center-bunching 
assumption) due to Pugh and Shub: suppose a 
volume-preserving diffeomorphism is partially 
hyperbolic on the entire manifold. If it is dynami- 
cally coherent and center bunched and has essential 
accessibility, then it is ergodic (Hasselblatt and Pesin 
2006). 

One of the motivating aims of this theory was to 
obtain nonhyperbolic volume-preserving systems 
that are stably ergodic, that is, for which all 
volume-preserving $C^1$-small perturbations are also
ergodic. If, in addition to the above, one assumes
that essential accessibility also persists under such
perturbations and that the center bundle $E^c$ is
integrable to a center foliation $W^c$ that is smooth
(or “plaque-expansive”), then ergodicity is indeed
stable (Hasselblatt and Pesin). There are quite a few
natural examples where these assumptions hold. 

While essential accessibility does not always hold, 
it is fairly common. The stronger property of 
accessibility (that any two points can be connected, 
not only almost every two points) is conjectured to 
be stable under $C^1$-perturbations and has been
shown to hold for an open dense set of partially
hyperbolic systems with respect to the $C^1$-topology.

Ergodicity is a measure-theoretic irreducibility 
notion, and topological transitivity is the topological 
counterpart. It can also be obtained from accessi- 
bility: a partially hyperbolic volume-preserving 
diffeomorphism with the accessibility property is 
topologically transitive (in fact, almost every orbit is 
dense). 

There are interesting converse results as well. Any 
stably transitive diffeomorphism exhibits a domi- 
nated splitting. Moreover, in dimension 2 it is 
hyperbolic and in dimension 3 it is partially 
hyperbolic in the broad sense. 


Nonuniform Hyperbolicity 


Applications have motivated weakening assump- 
tions of uniform hyperbolicity to require only that 
“many” individual orbits exhibit hyperbolic beha- 
vior, without assuming that there are any uniform 
estimates on the degree of hyperbolicity. 

To measure the asymptotic contraction or expan- 
sion of a vector on an exponential scale, one defines 
the Lyapunov exponent of a (nonzero) tangent 
vector v at x for the map f to be 


$\lambda(x, v) := \lim_{n \to \infty} \frac{1}{n} \log \|Df^n(v)\|$     [2]


whenever this limit exists. Note that a positive exponent
indicates asymptotic expansion of the vector,
whereas negative exponents correspond to contracting
vectors. This defines a measurable but, save for
exceptional circumstances, discontinuous function
of x and v. It is relatively easy to see that for a given
point x the function $\lambda(x, \cdot)$ can only take finitely
many values, so it is natural to define nonuniform
hyperbolicity as the property of having all of these
finitely many values nonzero for “most” points.
Given that $\lambda$ is measurable, it is natural to define
“most” by using a measure that is invariant under 
the map f. Therefore, the theory of nonuniformly 
hyperbolic dynamical systems, much of which is due 
to Pesin, is based on measure theory throughout. 
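
As a hedged numerical illustration of definition [2], the Python sketch below estimates the Lyapunov exponent in the simplest possible setting: a linear toral automorphism, where the differential is the same matrix at every point, so the exponent does not depend on x. The choice of the cat map matrix, the function name, and the iteration count are assumptions made here. The vector is renormalized at every step to avoid overflow; the logarithms of the successive norms add up to $\log \|Df^n(v)\|$.

```python
import math

A = ((2, 1), (1, 1))  # differential of the cat map at every point


def lyapunov_exponent(matrix, v, n=200):
    """Estimate (1/n) * log ||matrix^n v|| with renormalization at each step."""
    (a, b), (c, d) = matrix
    x, y = v
    total = 0.0
    for _ in range(n):
        x, y = a * x + b * y, c * x + d * y
        norm = math.hypot(x, y)
        total += math.log(norm)  # accumulates log of the growth at this step
        x, y = x / norm, y / norm
    return total / n


if __name__ == "__main__":
    estimate = lyapunov_exponent(A, (1.0, 0.0))
    exact = math.log((3 + math.sqrt(5)) / 2)
    print("estimate:", estimate, "log of expanding eigenvalue:", exact)
```

For a generic starting vector the estimate approaches the logarithm of the expanding eigenvalue. The negative exponent is only seen for vectors exactly in the contracting direction, which floating-point iteration does not reliably preserve.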


The fundamental fact on which this theory is 
based is the “Oseledets multiplicative ergodic theorem,”
which says that for a $C^1$-diffeomorphism of a
compact Riemannian manifold the set of Lyapunov- 
regular points has full measure with respect to any 
f-invariant Borel probability measure. 

For a Lyapunov-regular point the limit [2] exists
for all v, so this theorem tells us that no matter 
which invariant measure we consider, the limit [2] 
makes sense for all tangent vectors at points x 
outside a null set. (One should add that this small 
“bad” set can be somewhat substantial; for example, 
its Hausdorff dimension is usually that of the whole 
space.) 

Accordingly, one then defines a measure to be 
hyperbolic if at almost every point the limit [2] is 
nonzero for all vectors. In this case, one says that 
“f has nonzero Lyapunov exponents.” This property 
can also be obtained from a cone criterion, but here 
the family of cones may only be invariant and 
eventually strictly invariant, that is, there is a cone 
field such that cones are mapped to cones (but not 
necessarily into the interior of cones), and for almost 
every point there is an iterate that maps a cone 
strictly inside the cone at the image point (i.e., into 
the interior). Which iterate is needed is allowed to 
depend on the point (see Hyperbolic Billiards). 

It is good to keep in mind that a hyperbolic 
measure may be concentrated on a single point, say, 
in which case there is not much gained by this 
approach. The theory is of great interest, however, if 
the measure is equivalent to volume or is the 
“physical measure” on an attractor. 

Examples of this sort are fairly common; indeed, any
any smooth compact Riemannian manifold other 
than the unit circle admits a volume-preserving 
Bernoulli diffeomorphism with nonzero Lyapunov 
exponents (Dolgopyat and Pesin 2002) (and every 
compact smooth Riemannian manifold of dimension 
at least 3 carries a volume-preserving Bernoulli flow 
for which at almost every point the only zero 
Lyapunov exponent is the one in the flow direction 
(Hu et al. 2004)). 

Structurally, these systems exhibit many of the 
features seen in uniformly hyperbolic ones (e.g., 
stable manifolds), but instead of being continuous 
these are now measurable. There are, however, 
(noninvariant) sets of arbitrarily large measure on 
which these structures are continuous. This provides 
a handle for pushing some of the uniform theory to 
this context. 

There are some topological results in this area, of 
which one of the more remarkable ones is that any 
surface diffeomorphism with positive entropy con- 
tains a horseshoe. Much of the current research is 


directed at the ergodic theory of these systems. A 
central result from the initial development of the 
theory is that while these systems may not be 
ergodic, the ergodic components are (a.e. equal to) 
open sets, so in particular there are at most 
countably many of them. 

One natural question is whether nonuniformly 
hyperbolic systems have SRB measures, and it is 
answered on a case-by-case basis. There are even 
benign examples where this fails to be the case, but 
for some realistic systems, such as the Lorenz and 
Hénon attractors, this has been established. 

Because they preserve volume, this is not an issue
for billiard systems (see Hyperbolic Billiards), that
is, the free motion of a point mass in a cavity with 
elastic boundary collisions. This describes not just a 
toy model, but also the phase space and dynamics of 
a gas of convex rigid bodies. Such a gas of hard 
spheres in a rectangular box is semidispersing and 
has been studied intensely. It is now known to be 
hyperbolic and hoped to be ergodic. (The latter 
would provide a solid foundation for statistical 
mechanics, at least for the case of spherical 
molecules.) A gas of nonspherical convex rigid 
bodies is also a point billiard, but it is not 
semidispersing, which puts it beyond the range of 
readily available techniques for establishing 
ergodicity. 


Further Remarks 


The historical remarks made here are significantly 
expanded in Hasselblatt (2002), which contains 
some references to yet more detailed sources as 
well as more detail about uniformly hyperbolic 
dynamical systems in a concise form. A concise but 
reasonably comprehensive and current account of 
partially hyperbolic dynamics is in Hasselblatt and 
Pesin, and an authoritative full presentation is in 
Pesin (2004). A survey of nonuniformly hyperbolic 
dynamics is given in Barreira and Pesin (2006), and 
the definitive treatment is given by Barreira et al. A
textbook presentation of (not only) hyperbolic 
dynamics is in Katok and Hasselblatt (1995) as 
well as Hasselblatt and Katok (2003), and much 
current research, including on all subjects discussed 
here, is surveyed in Handbook.



Our society is often designated as being an “information
society.” It could also be defined as an
“image society.” This is not only because image is a 
powerful and widely used medium of communica- 
tion, but also because it is an easy, compact, and 
widespread way to represent the physical world. If 
we think about it, it is indeed striking to realize just 
how much images are omnipresent in our lives 
through numerous applications such as medical and 
satellite imaging, videosurveillance, cinema, 
robotics, etc. 

Many approaches have been developed to process 
these digital images, and it is difficult to say which 
one is more natural than the other. Image processing 
has a long history. Maybe the oldest methods come 
from 1D signal processing techniques. They rely on 
filter theory (linear or not), on spectral analysis, or 
on some basic concepts of probability and statistics. 
For an overview, we refer the interested reader to 
the book by Gonzalez and Woods (1992). 

In this article, some recent mathematical concepts 
will be revisited and illustrated by the image 
restoration problem, which is presented below. We 
first discuss stochastic modeling which is widely 
based on Markov random field theory and deals 
directly with digital images. This is followed by a 
discussion of variational approaches where the 
general idea is to define some cost functions in a 
continuous setting. Next we show how the scale 
space theory is connected with partial differential 
equations (PDEs). Finally, we present the wavelet 
theory, which is inherited from signal processing 
and relies on decomposition techniques. 


Introduction 


As in the real world, a digital image is composed of 
a wide variety of structures. Figure 1 shows different 


kinds of “textures,” progressive or sharp contours, 
and fine objects. This gives an idea of the complexity
of finding an approach that can cope with
the different structures at the same time. It also 
highlights the discrete nature of images which will 
be handled differently depending on the chosen 
mathematical tools. For instance, PDE-based
approaches are written in a continuous setting,
referring to analog images, and once the existence
and the uniqueness of the solution have been
proved, we need to discretize them in order to find a 
numerical solution. On the contrary, stochastic 
approaches will directly consider discrete images in 
the modeling of the cost functions. 


The Image Restoration Problem 


It is well known that during formation, transmis- 
sion, and recording processes images deteriorate. 
Classically, this degradation is the result of two 
phenomena. The first one is deterministic and is 
related to the image acquisition modality, to possible 
defects of the imaging system (e.g., blur created by 
an incorrect lens adjustment or by motion). The 
second phenomenon is random and corresponds to 
the noise coming from any signal transmission. It 
can also come from image quantization. It is 
important to choose a degradation model as close 
as possible to reality. The random noise is usually 
modeled by a probabilistic distribution. In many 
cases, a Gaussian distribution is assumed. However, 
some applications require more specific ones, like 
the gamma distribution for radar images (speckle 
noise) or the Poisson distribution for tomography. 
Unfortunately, it is usually impossible to identify the 
kind of noise involved for a given real image. 

A commonly used model is the following. Let 
$u: \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ be an original image describing a real
scene, and let f be the observed image of the same
scene (i.e., a degradation of u). We assume that


$f = Au + \eta$     [1]


where $\eta$ stands for a white additive Gaussian noise
and A is a linear operator representing the blur
(usually a convolution). Given f, the problem is

Figure 1 Digital image example. The close-ups show
examples of low resolution, low contrasts, graduated shadings,
sharp transitions, and fine elements. (a) low resolution, (b) low
contrasts, (c) graduated shadings, (d) sharp transitions, and
(e) fine elements.


then to reconstruct u knowing [1]. This problem 
is ill-posed, and we are able to carry out only an 
approximation of u. In this article, we will focus on 
the simplified model of pure denoising: 


$f = u + \eta$     [2]
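
The degradation models [1] and [2] are easy to simulate. The following Python sketch does so for a one-dimensional signal; the 3-point moving average standing in for the blur operator A, the function names, and the noise level are all assumptions made here for illustration and are not taken from the article.

```python
import random


def blur(u):
    """Hypothetical blur operator A: 3-point moving average, edges replicated."""
    n = len(u)
    return [(u[max(i - 1, 0)] + u[i] + u[min(i + 1, n - 1)]) / 3 for i in range(n)]


def degrade(u, sigma, rng, apply_blur=True):
    """Return f = A u + noise (model [1]) or f = u + noise (model [2]),
    with independent Gaussian noise of standard deviation sigma."""
    au = blur(u) if apply_blur else list(u)
    return [value + rng.gauss(0.0, sigma) for value in au]


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so the run is repeatable
    u = [0.0] * 5 + [1.0] * 5       # a step edge as the "original image"
    print("blurred and noisy:", degrade(u, 0.05, rng))
    print("noisy only:       ", degrade(u, 0.05, rng, apply_blur=False))
```

The blur smears the sharp transition over neighboring samples, while the noise perturbs every sample independently; recovering u from such an f is the restoration problem discussed in the text.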


The Probabilistic Approach 
The Bayesian Framework 


In this section, we show how the problem of pure 
denoising, that is, recovering u from the equation
$f = u + \eta$ knowing only some statistical information
on $\eta$, can be solved by using a probabilistic
approach. In this context, f, u, and $\eta$ are considered
as random variables. The general idea for recovering
u is to maximize some a posteriori probability. Most
models involve two parts: a prior model of possible 
restored images u and a data model expressing 
consistency with the observed data. 


• The prior model is given by a probability space
$(\Omega_u, p)$, where $\Omega_u$ is the set of all values of u. The
model is specified by giving the probability p(u)
on all these values.

• The data model is a larger probability space
$(\Omega_{u,f}, p)$, where $\Omega_{u,f}$ is the set of all possible values
of u and all possible values of the observed image
f. This model is completed by giving the conditional
probability p(f/u) of any image f given u,
resulting in the joint probabilities p(f,u) =
p(f/u)p(u). Implicitly, we assume that the spaces
$\Omega_u$ and $\Omega_{u,f}$ are finite although huge.


The next step is to use a Bayesian approach
introduced in image processing by Besag (1974)
and Geman and Geman (1984). The probabilities
p(u) and p(f/u) are supposed to be known and,
given an observed image f, we seek the image
u which maximizes the conditional a posteriori
probability p(u/f) (MAP: Maximum A Posteriori).
By Bayes' rule, we have

$$p(u/f) = \frac{p(f/u)\,p(u)}{p(f)} \qquad [3]$$


Let us explain the meaning of the different terms
in [3]:

• The term p(f/u) expresses the probability, the
likelihood, that an image u is realized in f. It also
quantifies the lack of total precision of the model
and the presence of noise.

• The term p(u) expresses our incomplete a priori
information about the ideal image u (it is the
probability of the model, i.e., the propensity that
u be realized independently of the observation f).

• The term p(f), which is the probability of observing f,
is a constant and does not play any role when
maximizing the conditional probability p(u/f)
with respect to u.


Let us remark that the problem $\max_u p(u/f)$ is
equivalent to $\min_u E(u) = -\log p(f/u) - \log p(u)$.
So Bayesian models lead to a minimization
process.
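The equivalence between maximizing p(u/f) and minimizing E(u) can be checked on a toy example. The following is a minimal sketch, assuming a single pixel with a handful of gray levels, a Gaussian likelihood, and a hypothetical prior peaked at gray level 2; the names `gaussian_likelihood` and `map_and_energy_estimates`, and the prior itself, are illustrative choices that do not come from the text.

```python
import math

def gaussian_likelihood(f, u, sigma):
    """p(f/u) for f = u + eta, with eta a Gaussian noise of variance sigma**2."""
    return math.exp(-(f - u) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def map_and_energy_estimates(f, levels, prior, sigma):
    """Return (argmax of the posterior, argmin of the energy) over the gray levels."""
    # Posterior up to the factor 1/p(f), which does not depend on u.
    posterior = {u: gaussian_likelihood(f, u, sigma) * prior[u] for u in levels}
    # Energy E(u) = -log p(f/u) - log p(u).
    energy = {u: -math.log(gaussian_likelihood(f, u, sigma)) - math.log(prior[u])
              for u in levels}
    u_map = max(levels, key=lambda u: posterior[u])
    u_min = min(levels, key=lambda u: energy[u])
    return u_map, u_min

# Hypothetical prior on 8 gray levels, peaked at level 2 (an assumption for illustration).
levels = list(range(8))
weights = [math.exp(-abs(u - 2)) for u in levels]
prior = {u: w / sum(weights) for u, w in zip(levels, weights)}

u_map, u_min = map_and_energy_estimates(f=5.2, levels=levels, prior=prior, sigma=1.0)
print(u_map, u_min)  # the two estimates coincide
```

In this example the estimate falls between the observation (5.2) and the mode of the prior (2), which is the compromise between data fidelity and prior knowledge that the energy E(u) encodes.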


Then the main question is how to assign these
probabilities. The easiest probability to determine is
p(f/u). If the images u and f consist of sets of values
$u = (u_{i,j})_{i,j=1,N}$ and $f = (f_{i,j})_{i,j=1,N}$, we suppose
the conditional independence of $(f_{i,j}/u_{i,j})$ in any
pixel:

$$p(f/u) = \prod_{i,j=1}^{N} p(f_{i,j}/u_{i,j})$$

and if the restoration model is of the form $f = u + \eta$,
where $\eta$ is a white Gaussian noise with variance $\sigma^2$,
then

$$p(f_{i,j}/u_{i,j}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(f_{i,j} - u_{i,j})^2}{2\sigma^2}\right)$$

and

$$p(f/u) = \prod_{i,j=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(f_{i,j} - u_{i,j})^2}{2\sigma^2}\right)$$

Therefore, at this stage, the MAP reduces to
minimizing

$$E(u) = K_1 \|f - u\|^2 - \log p(u) \qquad [4]$$

where $\|\cdot\|$ stands for the Euclidean norm on $\mathbb{R}^{N^2}$ and
$K_1$ is a constant. So, it remains now to assign a
probability law p(u). To do that, the most common
way is to use the theory of Markov random fields
(MRFs).
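For the white Gaussian noise model, the constant in [4] can be made explicit. The short computation below is a worked step under that model only: it takes minus the logarithm of the product form of p(f/u).

```latex
\begin{aligned}
-\log p(f/u)
  &= -\sum_{i,j=1}^{N} \log\left[\frac{1}{\sqrt{2\pi}\,\sigma}
     \exp\left(-\frac{(f_{i,j}-u_{i,j})^2}{2\sigma^2}\right)\right] \\
  &= N^2 \log\left(\sqrt{2\pi}\,\sigma\right)
     + \frac{1}{2\sigma^2}\sum_{i,j=1}^{N} (f_{i,j}-u_{i,j})^2 \\
  &= \text{const} + \frac{1}{2\sigma^2}\,\|f-u\|^2 .
\end{aligned}
```

The additive constant does not depend on u and can be dropped, so that $K_1 = 1/(2\sigma^2)$: the noisier the observation, the less weight the data term receives relative to the prior.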


The Theory of Markov Random Fields 


In this approach, an image is described as a finite set 
S of sites corresponding to the pixels. For each site, 
we associate a descriptor representing the state of 
the site, for example, its gray level. In order to take 
into account local interaction between sites, one 
needs to endow S with a system of neighborhoods V. 


Definition 1 For each site s, we define its neighborhood
V(s) as:

$$V(s) = \{t\} \text{ such that } s \notin V(s) \text{ and } t \in V(s) \Rightarrow s \in V(t)$$

Then we associate to this neighborhood system the
notion of clique: a clique is either a singleton or a set
of sites which are all neighbors of each other.
Depending on the neighborhood system, the family
of cliques will be different and involve more or fewer
sites. We will denote by C the set of all the cliques
relative to a neighborhood system V (see Figure 2).
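The definition of a clique can be explored by brute force on a small grid. The sketch below assumes two common neighborhood systems, the 4-neighbors and the 8-neighbors (the latter is an assumption made for illustration), and enumerates every set of mutually neighboring sites; the function names are hypothetical.

```python
from itertools import combinations

FOUR = [(-1, 0), (1, 0), (0, -1), (0, 1)]
EIGHT = FOUR + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def neighborhood(site, shape, offsets):
    """V(s): the sites t = s + offset that fall inside the grid (s itself excluded)."""
    i, j = site
    return {(i + di, j + dj) for di, dj in offsets
            if 0 <= i + di < shape[0] and 0 <= j + dj < shape[1]}

def cliques(shape, offsets, max_order=4):
    """All singletons and all sets of mutually neighboring sites, up to max_order sites."""
    sites = [(i, j) for i in range(shape[0]) for j in range(shape[1])]
    V = {s: neighborhood(s, shape, offsets) for s in sites}
    found = []
    for order in range(1, max_order + 1):
        for subset in combinations(sites, order):
            # A clique: every pair of distinct sites in the subset are neighbors.
            if all(t in V[s] for s, t in combinations(subset, 2)):
                found.append(subset)
    return found

c4 = cliques((3, 3), FOUR)
c8 = cliques((3, 3), EIGHT)
print(len(c4), max(len(c) for c in c4))
print(len(c8), max(len(c) for c in c8))
```

With 4-neighbors the cliques are only singletons and pairs of adjacent sites; with 8-neighbors, triples and 2 x 2 blocks also appear, which is what "more or fewer sites" refers to.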

Before introducing the general framework of
MRFs, let us define some notation. For a site s,
$X_s$ will stand for a random variable taking its values
in some set E (e.g., $E = \{0, 1, \ldots, 255\}$), $x_s$ will be
a realization of $X_s$, and $x^s = (x_t)_{t \neq s}$ will denote an
image configuration where site s has been removed.
Finally, we will denote by X the random variable
$X = (X_s, X_t, \ldots)$ with values in $\Omega = E^{|S|}$.


Definition 2 We say that X is an MRF if the local
conditional probability at a site s is only a function
of V(s), that is,

$$p(X_s = x_s / x^s) = p(X_s = x_s / x_t,\ t \in V(s))$$

Therefore, the gray level at a site depends only on
gray levels of neighboring pixels. Now we give the
following fundamental theorem, the Hammersley-Clifford
theorem (Besag 1974), which states the equivalence
between MRFs and Gibbs fields.


Theorem 1 Let us suppose that S is finite, E is a
discrete set and for all $x \in \Omega = E^{|S|}$, $p(X = x) > 0$,
then X is an MRF relatively to a system of
neighborhoods V if and only if there exists a family
of potential functions $(V_c)_{c \in C}$ such that
$p(x) = (1/Z) \exp\left(-\sum_{c \in C} V_c(x)\right)$.

The function $V(x) = \sum_{c \in C} V_c(x)$ is called the
energy potential or the Gibbs measure and Z is a
normalizing constant: $Z = \sum_{x \in \Omega} \exp(-V(x))$.





Figure 2 Examples of neighborhood system and cliques.

If, for example, the collection of neighborhoods is
the set of 4-neighbors, then the theorem says that

$$V(x) = \sum_{c = \{s\} \in C_1} V_c(x_s) + \sum_{c = \{s,t\} \in C_2} V_c(x_s, x_t)$$

where $C_1$ and $C_2$ denote the cliques of order 1 and of order 2.
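One direction of the theorem (a Gibbs field is an MRF) can be observed numerically on a tiny example. The sketch below assumes a 2 x 2 binary image, the 4-neighborhood system, zero singleton potentials, and a hypothetical pair potential equal to beta when two neighbors differ; it computes Z by enumeration and checks that the conditional law at a site does not depend on a non-neighboring site.

```python
import math
from itertools import product

SITES = [(0, 0), (0, 1), (1, 0), (1, 1)]
# Cliques of order 2 for the 4-neighborhood system on a 2 x 2 grid.
PAIRS = [((0, 0), (0, 1)), ((1, 0), (1, 1)), ((0, 0), (1, 0)), ((0, 1), (1, 1))]

def energy(x, beta=1.0):
    """V(x) = beta * (number of neighboring pairs with different states)."""
    return beta * sum(1 for s, t in PAIRS if x[s] != x[t])

def gibbs_distribution(beta=1.0):
    """List of (configuration, probability) with p(x) = exp(-V(x)) / Z."""
    configs = [dict(zip(SITES, values)) for values in product((0, 1), repeat=len(SITES))]
    weights = [math.exp(-energy(x, beta)) for x in configs]
    Z = sum(weights)
    return [(x, w / Z) for x, w in zip(configs, weights)]

def conditional(dist, site, value, given):
    """p(X_site = value / X_t = given[t] for every t in given)."""
    matching = [(x, p) for x, p in dist if all(x[t] == v for t, v in given.items())]
    total = sum(p for _, p in matching)
    return sum(p for x, p in matching if x[site] == value) / total

dist = gibbs_distribution()
s = (0, 0)  # its neighbors are (0, 1) and (1, 0); (1, 1) is not a neighbor
p_far_0 = conditional(dist, s, 1, {(0, 1): 1, (1, 0): 1, (1, 1): 0})
p_far_1 = conditional(dist, s, 1, {(0, 1): 1, (1, 0): 1, (1, 1): 1})
p_neighbors = conditional(dist, s, 1, {(0, 1): 1, (1, 0): 1})
print(p_far_0, p_far_1, p_neighbors)
```

The three conditional probabilities agree: once the two neighbors of the site are known, the state of the opposite corner carries no further information.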


Application to the Denoising Problem 


Now, given this theorem we can reformulate, thanks
to [4], the restoration problem (with the change of
notation u = x and $u_s = x_s$): find u minimizing the
global energy

$$E(u) = K_1 \|f - u\|^2 + V(u) \qquad [5]$$

The next step is now to specify the Gibbs
measure. In restoration, the potential V(u) is often
designed to impose local regularity constraints, for
example, by penalizing differences between neighbors.
This can be modeled using cliques of order 2 in
the following manner:

$$V(u) = \beta \sum_{(s,t) \in C_2} \phi(u_s - u_t)$$

where $\phi$ is a given real function. This term penalizes
the difference of intensities between neighbors which
may come from an edge or some noise. This discrete
cost function is very similar to the gradient penalty
terms in the continuous framework (see the next
section). The resulting final energy is (sometimes
E(u) is written E(u/f))

$$E(u) = K_1 \sum_{s \in S} (f_s - u_s)^2 + \beta \sum_{(s,t) \in C_2} \phi(u_s - u_t)$$

where the constant $\beta$ is a weighting parameter
which can be estimated.

The difficulty in choosing the strength of the
penalty term defined by $\phi$ is to be able to penalize
the noise while keeping the most salient features,
that is, edges. Historically, the function $\phi$ was first
chosen as $\phi(z) = z^2$, but this choice is not good since
the resulting regularization is too strong, introducing
a blur in the image and a loss of the edges. A better
choice is $\phi(z) = |z|$ (Rudin et al. 1992) or a
regularized version of this function. Of course,
other choices are possible depending on the considered
application and the desired degree of
smoothness.
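The effect of the choice of the potential on an edge can be seen on a one-dimensional example. The sketch below is only an illustration under stated assumptions: a single image row (so the cliques of order 2 are pairs of consecutive pixels), $K_1 = \beta = 1$, a noise-free step as data, plain gradient descent, and $\sqrt{z^2 + \varepsilon^2}$ as one possible regularized version of $|z|$. All names and parameter values are hypothetical.

```python
import math

def energy(u, f, phi, k1=1.0, beta=1.0):
    """E(u) = k1 * sum (f_s - u_s)^2 + beta * sum over neighboring pairs of phi(u_s - u_t)."""
    data = k1 * sum((fs - us) ** 2 for fs, us in zip(f, u))
    prior = beta * sum(phi(u[s] - u[s + 1]) for s in range(len(u) - 1))
    return data + prior

def gradient(u, f, dphi, k1=1.0, beta=1.0):
    """Gradient of E with respect to u; dphi is the derivative of phi."""
    grad = [2.0 * k1 * (us - fs) for fs, us in zip(f, u)]
    for s in range(len(u) - 1):
        d = beta * dphi(u[s] - u[s + 1])
        grad[s] += d
        grad[s + 1] -= d
    return grad

def minimize(f, dphi, step, iterations=3000, k1=1.0, beta=1.0):
    """Fixed-step gradient descent started from the data f."""
    u = list(f)
    for _ in range(iterations):
        g = gradient(u, f, dphi, k1, beta)
        u = [us - step * gs for us, gs in zip(u, g)]
    return u

EPS = 0.05  # regularization of |z| (an assumed value)
quadratic = (lambda z: z * z, lambda z: 2.0 * z)
smooth_abs = (lambda z: math.sqrt(z * z + EPS * EPS),
              lambda z: z / math.sqrt(z * z + EPS * EPS))

f = [0.0] * 10 + [1.0] * 10          # an ideal edge of height 1
u_quad = minimize(f, quadratic[1], step=0.01)
u_abs = minimize(f, smooth_abs[1], step=0.01)
jump_quad = u_quad[10] - u_quad[9]
jump_abs = u_abs[10] - u_abs[9]
print(round(jump_quad, 3), round(jump_abs, 3))
```

With the quadratic potential less than half of the edge height survives at the discontinuity, whereas the regularized absolute value keeps a markedly larger jump, which is the behavior described above.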

In this section, it has been shown how to model
the restoration problem through MRFs and the
Bayesian framework. Numerically, two main types
of algorithms can be used to minimize the energy:
deterministic algorithms and stochastic algorithms.
The former are generally used when the global
energy is strictly convex (e.g., algorithms based on
gradient descent). The latter are rather used when
E(u) is not convex. They are stochastic minimization
algorithms mainly based on simulated annealing.
Their main interest is that they always converge
(almost surely) to a global minimizer (this is not the case
for deterministic algorithms, which in general give only local
minimizers), but they are often very time
consuming.

We refer the reader to Li (1995) for more details 
about MRFs and Bayesian framework and 
Kirkpatrick et al. (1983) for more information on 
stochastic algorithms. 


The Variational Approach 


Minimizing a Cost Function over a 
Functional Space 


One important issue in the previous section was the
definition of p(u), which gives some a priori information on the
solution. In the variational approach, this idea is
also present, but the way to infer it is in fact to
define the most suitable functional space that
describes images and their geometrical properties.
The choice of a functional space sets a norm which
in turn will constrain the solution to a certain
smoothness.

We illustrate this idea in this section on the
denoising problem [2], which can be seen as a
decomposition problem. This means that given the
observation f, we look for u and $\eta$ such that
$f = u + \eta$, where $\eta$ incorporates all oscillations, that
is, noise, and also texture. Let us define a functional
to be minimized which takes into account the data f
and possibly some statistical information about $\eta$:

$$\min\{\phi(|u|_E) \text{ such that } \psi(|\eta|_G) = \nu \text{ with } f = u + \eta\} \qquad [6]$$

This formulation means that we look, among all
decompositions $f = u + \eta$, for the one which minimizes
$\phi(|u|_E)$ under the constraint $\psi(|\eta|_G) = \nu$.
Banach spaces E and G, and functions $\phi$ and $\psi$
will be discussed in the next subsection. Since a
minimization problem under constraints can be
expressed with an additional term weighted by a
Lagrange multiplier, the formulation [6] can be
rewritten as:

$$\min\{\phi(|u|_E) + \lambda\, \psi(|\eta|_G);\ f = u + \eta\} \qquad [7]$$

A similar writing consists in replacing $\eta$ by $f - u$ so
that [7] rewrites

$$\min\{\phi(|u|_E) + \lambda\, \psi(|f - u|_G)\} \qquad [8]$$

which is the classical formulation in image restoration.
From a numerical point of view, the minimization
is usually carried out by solving the associated
Euler equations, but this may be a difficult task. The
main concern is the search for E and G and their
norm (or seminorm). It is guided by the idea that
an image u is composed of various geometric
structures (homogeneous regions, edges) while
$\eta = f - u$ represents oscillations (noise and textures).


Examples of Functional Spaces


In this section, we revisit some possible choices of
functional spaces summarized in Table 1.

The first case (a) was inspired by the classical
Tikhonov regularization. The functional space
$H^1(\Omega)$ ($\Omega \subset \mathbb{R}^2$) is the space of functions in $L^2(\Omega)$
such that the distributional gradient Du is in $L^2(\Omega)$.
Unfortunately, functions in $H^1(\Omega)$ do not admit
discontinuities across curves and this is a major
problem with respect to image analysis since images
are made of smooth patches separated by sharp
variations.

Considering the problem reported in (a), Rudin et al.
(1992) proposed to work on $BV(\Omega)$, the space of
functions of bounded variation (BV) (Ambrosio et al. 2000),
defined by

$$BV(\Omega) = \left\{u \in L^1(\Omega);\ \int_\Omega |Du| < \infty\right\}$$

$$\text{with } \int_\Omega |Du| = \sup\left\{\int_\Omega u\, \mathrm{div}\,\varphi\, dx;\ \varphi \in C_c^1(\Omega)^2,\ |\varphi|_{L^\infty(\Omega)} \le 1\right\} \qquad [9]$$

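Table 1 Examples of functional spaces and their norms (see model [8])

Model   E and $|u|_E$                                                     $\phi(t)$   G and $|\eta|_G$                                                                                                  $\psi(t)$
(a)     $H^1(\Omega)$, $|u|_E = (\int_\Omega |\nabla u|^2\,dx)^{1/2}$      $t^2$       $L^2(\Omega)$ with its usual norm                                                                                 $t^2$
(b)     $BV(\Omega)$, $|u|_E = \int_\Omega |Du|$                           $t$         $L^2(\Omega)$ with its usual norm                                                                                 $t^2$
(c)     $BV(\Omega)$, $|u|_E = \int_\Omega |Du|$                           $t$         $\{v \in L^2(\Omega);\ v = \mathrm{div}\,\xi,\ |\xi|_{L^\infty} \le 1,\ \xi \cdot N|_{\partial\Omega} = 0\}$   $t$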


It is equivalent to define $BV(\Omega)$ as the space of
$L^1(\Omega)$ functions whose distributional gradient Du is
a bounded measure, and [9] is its total variation. The
space $BV(\Omega)$ has some interesting properties:

1. lower semicontinuity of the total variation
$\int_\Omega |Du|$ with respect to the $L^1(\Omega)$ topology,

2. if $u \in BV(\Omega)$, we can define, for $\mathcal{H}^1$-almost
every $x \in S_u$, the complement of Lebesgue
points (i.e., the jump set of u), a normal $n_u(x)$
and two approximate “right” and “left” limits
$u^+(x)$ and $u^-(x)$, and

3. Du can be decomposed as a sum of a regular
measure, a jump measure, and a Cantor measure:

$$Du = \nabla u\, dx + (u^+ - u^-)\, n_u\, \mathcal{H}^1_{|S_u} + C_u$$

where $\nabla u$ is the approximate gradient and $\mathcal{H}^1$ the
one-dimensional Hausdorff measure.

This ability to describe functions with discontinuities
across a hypersurface $S_u$ makes $BV(\Omega)$ very
convenient to describe images with edges. In this
context, the image restoration problem is well
posed and suitable numerical tools can be proposed
(Chambolle and Lions 1997).
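To make the "associated Euler equations" of model [8] concrete, here is a formal derivation for the two classical cases, quadratic (Tikhonov) regularization in $H^1(\Omega)$ and total variation regularization in $BV(\Omega)$, both with a squared $L^2$ data term. It is a sketch under stated assumptions: u is taken smooth enough, $\nabla u \neq 0$ in the second case, and the natural (zero normal derivative) boundary condition is imposed; it is not a rigorous statement in $BV(\Omega)$.

```latex
% Quadratic regularization: \phi(t) = t^2 on H^1(\Omega), \psi(t) = t^2 on L^2(\Omega)
J_a(u) = \int_\Omega |\nabla u|^2\,dx + \lambda \int_\Omega (f-u)^2\,dx
\quad\Longrightarrow\quad
-\Delta u + \lambda\,(u - f) = 0 \ \text{in } \Omega,
\qquad \frac{\partial u}{\partial N} = 0 \ \text{on } \partial\Omega .

% Total variation regularization: \phi(t) = t on BV(\Omega), \psi(t) = t^2 on L^2(\Omega)
J_b(u) = \int_\Omega |Du| + \lambda \int_\Omega (f-u)^2\,dx
\quad\Longrightarrow\quad
-\,\mathrm{div}\!\left(\frac{\nabla u}{|\nabla u|}\right) + 2\lambda\,(u - f) = 0
\ \text{in } \Omega \quad (\text{formally}) .
```

The first equation is linear and smooths uniformly, while in the second the diffusion coefficient $1/|\nabla u|$ becomes small across strong gradients, which is why edges are better preserved; its singularity where $\nabla u = 0$ is one reason why solving these equations "may be a difficult task".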

One criticism of the model (b) in Table 1 pointed
out by Meyer (2001) is that if f is a characteristic
function and if f is sufficiently small with respect to
a suitable norm, then the model (Rudin et al. 1992)
gives u = 0 and $\eta = f$, contrary to what one should
expect (u = f and $\eta = 0$). In fact, the main reason for
this phenomenon is that the $L^2$-norm for the $\eta$
component is not the right one, since very oscillating
functions can have a large $L^2$-norm (e.g.,
$f_m(x) = \cos(mx)$). To better describe such oscillating
functions, Meyer (2001) introduced the space of
functions which can be expressed as a divergence
of $L^\infty$-fields. This work was developed in the whole space and
this framework was adapted to bounded 2D
domains by Aubert and Aujol (2005) (see (c) in
Table 1). An example of image decomposition is
shown in Figure 3.
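The remark about oscillating functions can be checked numerically in one dimension. In the sketch below, the functions $\cos(mx)$ are sampled on $[0, 2\pi]$; since in 1D a divergence is simply a derivative, the sup norm of a primitive is used as a stand-in for a norm of the "divergence of a bounded field" type. This 1D stand-in, the integration rule, and the function names are assumptions made for illustration only.

```python
import math

def l2_norm(func, a, b, n=20000):
    """Midpoint-rule approximation of the L2 norm of func on [a, b]."""
    h = (b - a) / n
    return math.sqrt(h * sum(func(a + (i + 0.5) * h) ** 2 for i in range(n)))

def sup_norm_of_primitive(func, a, b, n=20000):
    """Approximate sup |g| on [a, b], where g' = func and g(a) = 0."""
    h = (b - a) / n
    g, best = 0.0, 0.0
    for i in range(n):
        g += h * func(a + (i + 0.5) * h)
        best = max(best, abs(g))
    return best

def oscillation_norms(m):
    """(L2 norm, sup norm of the primitive) of x -> cos(m x) on [0, 2 pi]."""
    f_m = lambda x: math.cos(m * x)
    return l2_norm(f_m, 0.0, 2 * math.pi), sup_norm_of_primitive(f_m, 0.0, 2 * math.pi)

results = {m: oscillation_norms(m) for m in (1, 4, 16, 64)}
for m, (l2, g) in results.items():
    print(m, round(l2, 4), round(g, 4))
```

The $L^2$ norm stays equal to $\sqrt{\pi}$ whatever the frequency, whereas the second quantity decays like 1/m: a norm of this kind regards highly oscillating components as small, which is the property wanted for the $\eta$ part.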

In this section, we have shown how the choice of
the functional spaces is closely related to the
definition of a variational formulation. The
functionals are written in a continuous setting and
they can usually be minimized by solving the
discretized Euler equations iteratively, until convergence.
These PDEs and the differential operators are
constrained by the energy definition but it is also
possible to work directly on the equations, forgetting
the formal link with the energy. Such an
approach has also been much developed in the
computer vision community and it is illustrated in
the next section.

Figure 3 Example of image decomposition (see Aubert and
Aujol (2005)). Panels: original, u, $\eta$.

We refer the reader to Aubert and Kornprobst 
(2002) for a general review of variational 
approaches and PDEs as applied to image analysis. 


Scale Spaces and PDEs 


Another approach to perform nonlinear filtering
is to define a family of image smoothing operators
$T_t$, depending on a scale parameter t. Given an
image f(x), we can define the image $u(t, x) = (T_t f)(x)$,
which corresponds to the image f analyzed at scale t.
In this section, following Alvarez, Guichard, Lions, and
Morel (Alvarez et al. 1993), we show that u(t, x)
is the solution of a PDE under some suitable
assumptions on $T_t$.


Basic Principles of a Scale Space 


This section describes some natural assumptions to
be fulfilled by scale spaces. We first assume that the
output at scale t can be computed from the output at
a scale $t - h$ for very small h. This is natural, since a
coarser scale view of the original picture is likely to
be deduced from a finer one. $T_t$ is obtained by
composition of transition filters, denoted by $T_{t+h,t}$.
So the first axiom is

(A1) $T_{t+h} = T_{t+h,t}\, T_t$, $T_0 = \mathrm{Id}$

Another assumption is that operators act locally,
that is, $(T_{t+h,t} f)(x)$ depends essentially upon the
values of f(y) with y in a small neighborhood of x.
Taking into account the fact that as the scale
increases, no new feature should be created by the
scale space, we have the local comparison principle:
if an image u is locally brighter than another image
v, then this order must be conserved by the analysis.
This is expressed by:

(A2) For all u and v such that u(y) > v(y) in a
neighborhood of x and $y \neq x$, then for h small
enough, we have

$$(T_{t+h,t} u)(x) \ge (T_{t+h,t} v)(x)$$


The third assumption states that a very smooth
image must evolve in a smooth way with the scale
space. Denoting the scalar product of two vectors of
$\mathbb{R}^N$ by $\langle x, y \rangle$, this assumption can be written as

(A3) Let $u(y) = \frac{1}{2}\langle A(y - x), y - x\rangle + \langle p, y - x\rangle + c$
be a quadratic form of $\mathbb{R}^2$, x fixed
($A = \nabla^2 u(x) \in S^2$, the set of 2 x 2 symmetric
matrices, $p = \nabla u(x)$ a vector of $\mathbb{R}^2$, c = u(x) a
constant). We shall say that a scale space is
regular if there exists a function F(t, x, c, p, A),
continuous with respect to A, such that

$$\frac{(T_{t+h,t} u - u)(x)}{h} \to F(t, x, c, p, A) \quad \text{when } h \to 0$$


Scale Spaces are Governed by PDEs 


In the following theorem, it is stated that the former 
assumptions are sufficient to prove that scale spaces 
are in fact governed by PDEs. 


Theorem 2 Under assumptions A1, A2, A3, there
exists a continuous function $F: [0, T] \times \Omega \times \mathbb{R} \times \mathbb{R}^2 \times S^2 \to \mathbb{R}$
satisfying $F(t,x,c,p,A) \ge F(t,x,c,p,B)$
for all $p \in \mathbb{R}^2$, A and B in $S^2$ with $A \ge B$ such that

$$\frac{T_{t+h,t} u - u}{h} \to F(t, x, u, \nabla u, \nabla^2 u), \quad h \to 0^+ \qquad [10]$$

uniformly for $x \in \mathbb{R}^2$, uniformly for u.


In eqn [10], the left-hand side term can be
interpreted as the partial temporal derivative with
respect to t so that the notion of PDEs arises. More
precisely, if f is continuous and uniformly bounded,
then it can be established that $u(t, x) = (T_t f)(x)$ is the
viscosity solution (see Definition 3) of

$$\frac{\partial u}{\partial t} + H(t, x, u, \nabla u, \nabla^2 u) = 0 \quad (\text{here } H = -F)$$
$$u(0, x) = f(x) \qquad [11]$$

The map $H: [0, T] \times \Omega \times \mathbb{R} \times \mathbb{R}^2 \times S^2 \to \mathbb{R}$ is called
a Hamiltonian and the decreasing property of H
with respect to its matrix argument in $S^2$ is called degenerate ellipticity.

The theory of viscosity solutions was introduced
in the 1980s by Crandall and P L Lions (Crandall
and Lions 1981, Crandall et al. 1992). When strong
solutions of [11] do not exist, this theory allows
one to define solutions which are only continuous or
even discontinuous. The definition of viscosity
solutions is


Definition 3 Let $H: [0, T] \times \Omega \times \mathbb{R} \times \mathbb{R}^2 \times S^2 \to \mathbb{R}$ be continuous
and degenerate elliptic and let $u \in C^0((0, T] \times \Omega)$.
Then u is a viscosity solution of [11]
in $[0, T] \times \Omega$ if and only if

(i) u is a subsolution, that is, $\forall \phi \in C^2([0, T] \times \Omega)$,
$\forall (t_0, x_0)$ a local strict maximum point of $(u - \phi)(t, x)$,
we have

$$\frac{\partial \phi}{\partial t}(t_0, x_0) + H(t_0, x_0, u(t_0, x_0), \nabla\phi(t_0, x_0), \nabla^2\phi(t_0, x_0)) \le 0$$

(ii) u is a supersolution, that is, $\forall \phi \in C^2([0, T] \times \Omega)$,
$\forall (t_0, x_0)$ a local strict minimum point of $(u - \phi)(t, x)$,
we have

$$\frac{\partial \phi}{\partial t}(t_0, x_0) + H(t_0, x_0, u(t_0, x_0), \nabla\phi(t_0, x_0), \nabla^2\phi(t_0, x_0)) \ge 0$$

In this definition, it is noticeable that derivatives of
u are replaced by the derivatives of the test functions
$\phi$. It can be verified that this notion of
weak solution coincides with that of classical solution
when u has enough regularity.


Diffusion Operators Coming from the Scale Space 


A step further is to assume additional properties on
the scale spaces and estimate the corresponding
operator. Invariance properties include geometric
invariance axioms, contrast invariance, or scale
invariance. For example, if we assume the axioms
A1-A3, gray-level shift invariance:

(I1) $T_t(0) = 0$, $T_t(u + c) = T_t(u) + c$ for all u and all
constants c,

and translation invariance:

(I2) $T_t(\tau_h.u) = \tau_h.(T_t u)$ for all h in $\mathbb{R}^2$, t > 0, where
$(\tau_h.u)(x) = u(x + h)$,

then it can be established that F in [10] is
independent of (x, u), that is, $u(t, x) = (T_t f)(x)$ is
the unique viscosity solution of

$$\frac{\partial u}{\partial t} = F(\nabla u, \nabla^2 u)$$
$$u(0, x) = f(x)$$

With more precise assumptions, one can even
recover explicitly the operator F. As an example, if
we look for a linear scale space which verifies some
isometry assumption:

(I3) $T_t(R.u)(x) = R.(T_t u)(x)$ for all orthogonal
transformations R on $\mathbb{R}^2$, where $(R.u)(x) = u(Rx)$,

then it can be proved that the scale space is the
unique solution of the heat equation:

$$\frac{\partial u}{\partial t} - \Delta u = 0$$
$$u(0, x) = f(x) \qquad [12]$$

Figure 4 is an example of [12] applied to a noisy
image at different scales, that is, at different times.
Note that noise is quickly removed but one has to
stop the evolution very early if one would like to
preserve some edges. In the nonlinear case, several
operators have also been found based on curvature.
For instance, under suitable axioms (Alvarez et al.
1993), including contrast, scale, and affine invariance,
the associated scale space is

$$\frac{\partial u}{\partial t} = |\nabla u|\, \kappa^{1/3}, \quad \text{where } \kappa = \mathrm{div}\left(\frac{\nabla u}{|\nabla u|}\right)$$
$$u(0, x) = f(x) \qquad [13]$$

This equation is called the affine morphological scale
space (AMSS) and three restored images are shown
in Figure 5. Some qualitative differences are shown
in Figure 6.
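The behavior of the heat equation [12] as a scale space, including the loss of edges, can be reproduced with a few lines of code. The sketch below assumes an explicit finite-difference scheme on a unit grid, border replication at the image boundary, and a time step of 0.2; these discretization choices and the function names are illustrative assumptions and are not prescribed by the text.

```python
def heat_step(u, dt=0.2):
    """One explicit step u <- u + dt * (discrete Laplacian of u), unit grid spacing."""
    rows, cols = len(u), len(u[0])

    def at(i, j):
        # Replicate border pixels, i.e. a zero normal derivative at the boundary.
        return u[min(max(i, 0), rows - 1)][min(max(j, 0), cols - 1)]

    return [[u[i][j] + dt * (at(i - 1, j) + at(i + 1, j) + at(i, j - 1) + at(i, j + 1)
                             - 4.0 * u[i][j])
             for j in range(cols)] for i in range(rows)]

def heat_scale_space(f, iterations, dt=0.2):
    """Image f analyzed at scale t = iterations * dt (dt <= 0.25 keeps the scheme stable)."""
    u = [row[:] for row in f]
    for _ in range(iterations):
        u = heat_step(u, dt)
    return u

# Test image: a vertical edge between a dark half and a bright half.
f = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
contrast = {}
for n in (0, 5, 40):
    u = heat_scale_space(f, n)
    contrast[n] = u[4][4] - u[4][3]   # contrast across the edge
    print(n, round(contrast[n], 3))
```

The contrast across the edge decreases steadily with the number of iterations while the mean gray level is preserved, which is why the evolution has to be stopped early when edges matter.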





Figure 4 Illustration of heat equation [12]. Panels: original image, 40 iterations, 90 iterations, 150 iterations.

Figure 5 Illustration of the AMSS model [13]. Panels: original image, 40 iterations, 90 iterations, 150 iterations.

Figure 6 Some close-ups of Figures 4 and 5 showing
qualitative differences after 40 iterations. Panels: Heat, AMSS.

Remark Scale space theory has shown the formal
link between some operators and PDEs. It has to be
noticed that one may propose some PDEs which do
not directly come from the scale space framework.
Starting from [12], which performs isotropic smoothing
and smears edges, many nonlinear diffusion
models have been proposed to smooth images while
preserving edges (see, e.g., Perona and Malik
(1990)).
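As an illustration of such a nonlinear diffusion, the sketch below compares, on a one-dimensional signal, the linear heat equation with a diffusion whose diffusivity decreases with the gradient magnitude. It assumes the diffusivity $g(s) = 1/(1 + (s/K)^2)$, which is one of the choices proposed by Perona and Malik, an explicit scheme with zero flux at both ends, and hypothetical values for K and the time step; a small deterministic oscillation stands in for noise.

```python
def diffusion_step(u, g, dt=0.2):
    """One explicit step of u_t = d/dx( g(|u_x|) u_x ) with zero flux at both ends."""
    n = len(u)
    flux = [g(abs(u[i + 1] - u[i])) * (u[i + 1] - u[i]) for i in range(n - 1)]
    return [u[i] + dt * ((flux[i] if i < n - 1 else 0.0) - (flux[i - 1] if i > 0 else 0.0))
            for i in range(n)]

def evolve(f, g, iterations, dt=0.2):
    u = list(f)
    for _ in range(iterations):
        u = diffusion_step(u, g, dt)
    return u

def perona_malik_g(K):
    """Diffusivity g(s) = 1 / (1 + (s / K)**2); K acts as a contrast threshold."""
    return lambda s: 1.0 / (1.0 + (s / K) ** 2)

linear_g = lambda s: 1.0  # constant diffusivity: the heat equation

# An edge of height 1 plus a small deterministic oscillation standing in for noise.
f = [(0.0 if i < 10 else 1.0) + 0.02 * (-1) ** i for i in range(20)]
heat = evolve(f, linear_g, 40)
pm = evolve(f, perona_malik_g(K=0.1), 40)
heat_contrast = heat[10] - heat[9]
pm_contrast = pm[10] - pm[9]
print(round(heat_contrast, 3), round(pm_contrast, 3))
```

Differences well below K (the oscillation) are diffused almost as by the heat equation, whereas the diffusion is nearly switched off across the large jump, so the edge contrast is largely retained.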


For more on scale spaces and PDEs, we refer
the reader to Weickert (1998) and Aubert and
Kornprobst (2002).


The Wavelet Approach 


Before the 1980s, the Fourier transform played a
major role for analyzing oscillating signals. The
interest of such a transform for real applications
increased after the discovery of the fast Fourier
transform. However, the Fourier transform has
some limitations. It extracts from
the signal details of the frequency content but loses
all information on the location of a particular frequency.
Moreover, for computing the Fourier transform
$\mathcal{F}f(\lambda)$, we need to know f(t) for all the real
values of t. These difficulties can be overcome by
first windowing the signal, and then by taking its
Fourier transform:

$$\mathcal{F}f(\lambda, t) = \int_{-\infty}^{+\infty} f(s)\, g(s - t)\, e^{-i\lambda s}\, ds$$

where g is a window function. The parameter $\lambda$
plays the role of a frequency localized around the
abscissa t of the temporal signal and $\mathcal{F}f(\lambda, t)$ gives
information about what is happening around
s = t, for the frequency $\lambda$. The main drawback of
this method is that the window has a fixed length,
which is a serious disadvantage when we want to
treat signals having variations of different orders of
magnitude. All these issues highlighted that a
mathematical theory of time-frequency representation
was necessary. This was achieved with the
wavelet representation. In this section, we first recall
some elements of this theory (for 1D signals) and
then we show how it can be applied for restoring
noisy images.


The Wavelet Decomposition 


The basic idea is to construct from a function $\psi$,
called mother wavelet, an orthonormal basis $\{\psi_{j,k}\}$ of
$L^2(\mathbb{R})$ deduced from $\psi$ by translation and dilation.
It is required that $\psi$ be regular, oscillating (but not
too much), that $\psi$ and $\mathcal{F}\psi$ be well localized, and that
$\psi$ have some null moments. Once this function $\psi$ is
chosen, we set $\psi_{j,k}(t) = 2^{j/2}\psi(2^j t - k)$, $j, k \in \mathbb{Z}$. An
elegant and practical way for obtaining such a basis is
to construct a multiresolution analysis of $L^2(\mathbb{R})$
(Mallat 1989).


Definition 4 A multiresolution analysis of $L^2(\mathbb{R})$ is
a sequence $V_j$, $j \in \mathbb{Z}$, of subspaces of $L^2(\mathbb{R})$, with the
following properties:

(i) $V_j \subset V_{j+1}$,

(ii) $\bigcap_{j \in \mathbb{Z}} V_j = \{0\}$,

(iii) $\overline{\bigcup_{j \in \mathbb{Z}} V_j} = L^2(\mathbb{R})$,

(iv) $f(t) \in V_j$ if and only if $f(2t) \in V_{j+1}$, and

(v) There exists a regular function $\phi$ with compact
support such that the family $\phi(t - k)$, $k \in \mathbb{Z}$, is
an orthonormal basis of $V_0$ for the scalar
product of $L^2(\mathbb{R})$. Such a function $\phi$ is called a
scaling function.


Then it is straightforward to check that the family
$\phi_{j,k}(t)$ defined by $\phi_{j,k}(t) = 2^{j/2}\phi(2^j t - k)$, $k \in \mathbb{Z}$, is an
orthonormal basis of $V_j$.

A basic example of multiresolution analysis of
$L^2(\mathbb{R})$ is to choose $V_0$ as the set of functions of $L^2(\mathbb{R})$
that are constant on each interval [k, k+1), $k \in \mathbb{Z}$, and take $\phi$ as the
characteristic function of the interval [0,1):
$\phi(t) = \chi_{[0,1)}(t)$.

Let us now look at the link between wavelet basis
and multiresolution analysis. We just give the main
ideas; all details can be found in the work of Mallat
(1989). Assume that we have a multiresolution
analysis, and let us define $W_0$ as the orthogonal
complement of $V_0$ in $V_1$. We build the mother
wavelet $\psi$ by imposing that the family $\psi(t - k)$,
$k \in \mathbb{Z}$, is an orthonormal basis of $W_0$. For example,
if $\phi(t) = \chi_{[0,1)}(t)$, it can be shown that
$\psi(t) = \chi_{[0,1/2)}(t) - \chi_{[1/2,1)}(t)$ (called the Haar wavelet). By
change of scale, one gets that the family
$\psi_{j,k}(t) = 2^{j/2}\psi(2^j t - k)$, $k \in \mathbb{Z}$, is an orthonormal
basis of $W_j$, the orthogonal complement of $V_j$ in
$V_{j+1}$, that is,

$$V_j \oplus W_j = V_{j+1} \qquad [14]$$

Since the $V_j$ form a multiresolution analysis, we have
$V_j = \bigoplus_{l=-\infty}^{j-1} W_l$ and $L^2(\mathbb{R}) = \bigoplus_{j=-\infty}^{+\infty} W_j$. It is then clear
that $(\psi_{j,k}(t))_{j,k \in \mathbb{Z}}$ is an orthonormal basis of $L^2(\mathbb{R})$, that is,
for each function $f \in L^2(\mathbb{R})$, we get the following
decomposition:

$$f = \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} f_{j,k}\, \psi_{j,k} \quad \text{with } f_{j,k} = (f, \psi_{j,k})_{L^2}$$

Let us see now how in practice a multiresolution
analysis can be interpreted. Let f be a function in
$L^2(\mathbb{R})$. We denote by $A_{2^j} f$ (resp. $D_{2^j} f$) the operator
which approximates f (resp. gives the details of f) at
resolution $2^j$. More precisely, $A_{2^j} f$ (resp. $D_{2^j} f$) is the
projection of f on $V_j$ (resp. on $W_j$):


$$A_{2^j} f(t) = \sum_{k=-\infty}^{+\infty} (f, \phi_{j,k})\, \phi_{j,k}(t)$$

$A_{2^j} f$ is characterized by the sequence of scalar
products $A^d_{2^j} f = ((f, \phi_{j,k}))_{k \in \mathbb{Z}}$. We call $A^d_{2^j} f$ the
discrete approximation of f at resolution $2^j$.

In the same way, we have

$$D_{2^j} f(t) = \sum_{k=-\infty}^{+\infty} (f, \psi_{j,k})\, \psi_{j,k}(t)$$

$D_{2^j} f$ is characterized by the sequence of scalar
products $D^d_{2^j} f = ((f, \psi_{j,k}))_{k \in \mathbb{Z}}$.

We call $D^d_{2^j} f$ the details of f at resolution $2^j$.
According to [14], approximation and detail are
linked by the relation

$$A_{2^{j+1}} f = A_{2^j} f + D_{2^j} f$$

This means that $D_{2^j} f$ represents the details to be
added to pass from one level of approximation to
the next level of approximation.
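For the Haar system, the relation between approximation and details can be written directly on discrete coefficients. The sketch below assumes that the signal is given by its approximation coefficients at the finer level (a list of even length) and uses the Haar scaling function and wavelet; the function names are hypothetical.

```python
import math

def haar_analysis(c):
    """Split finer-level coefficients into coarser approximation and detail coefficients."""
    s = math.sqrt(2.0)
    approx = [(c[2 * k] + c[2 * k + 1]) / s for k in range(len(c) // 2)]
    detail = [(c[2 * k] - c[2 * k + 1]) / s for k in range(len(c) // 2)]
    return approx, detail

def haar_synthesis(approx, detail):
    """Recover the finer-level coefficients: approximation plus details."""
    s = math.sqrt(2.0)
    c = []
    for a, d in zip(approx, detail):
        c.extend([(a + d) / s, (a - d) / s])
    return c

c = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 8.0]
approx, detail = haar_analysis(c)
rebuilt = haar_synthesis(approx, detail)
energy_in = sum(v * v for v in c)
energy_out = sum(v * v for v in approx) + sum(v * v for v in detail)
print(rebuilt)
print(energy_in, energy_out)
```

The reconstruction is exact and the sum of squares is preserved, as expected from the orthogonal decomposition [14]; details vanish wherever the signal is locally constant (here on the pair 5, 5).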

Finally, the decomposition of a signal f on a
wavelet basis is obtained as an accumulation of
details at scales $2^j$ ranging from 0 to $+\infty$:

$$f = \sum_{j=-\infty}^{+\infty} D_{2^j} f = \sum_{j=-\infty}^{+\infty} \sum_{k=-\infty}^{+\infty} (f, \psi_{j,k})\, \psi_{j,k} \qquad [15]$$

Instead of considering the sum over all dyadic
levels j, one can sum over $j \ge J$ for a fixed J; in this
case, we have

$$f = \sum_{k=-\infty}^{+\infty} \sum_{j \ge J} (f, \psi_{j,k})\, \psi_{j,k} + \sum_{k=-\infty}^{+\infty} (f, \phi_{J,k})\, \phi_{J,k}$$

We conclude this section by showing how we can
construct a 2D wavelet basis from the 1D case. We
can simply use a tensor product. Scaling function
and mother wavelets are given, respectively, as
follows:

$$\phi(x, y) = \phi(x)\phi(y), \quad \Psi = \{\psi^1, \psi^2, \psi^3\}$$

with

$$\psi^1(x, y) = \phi(x)\psi(y)$$
$$\psi^2(x, y) = \phi(y)\psi(x)$$
$$\psi^3(x, y) = \psi(x)\psi(y)$$

As for the 1D case, $A_{2^j} f$ denotes the projection of
f on $V_j$, $D^1_{2^j} f$ the horizontal details, $D^2_{2^j} f$ the vertical
details, and $D^3_{2^j} f$ the other details (the index l in $D^l_{2^j}$
is the same as in $\psi^l$). For a 2D image f, we then have
the following decomposition (see Figure 7):

$$f = \sum_{\psi^l \in \Psi} \sum_{k=-\infty}^{+\infty} \sum_{j \ge J} (f, \psi^l_{j,k})\, \psi^l_{j,k} + \sum_{k=-\infty}^{+\infty} (f, \phi_{J,k})\, \phi_{J,k}$$

Figure 7 Illustration of the wavelet methodology.


Application to the Denoising Problem 


We go back to the denoising problem. Our goal is to
solve this problem by using a variational approach
and wavelets. We recall that we have an ideal image
u that has been corrupted by a white Gaussian noise
$\eta$, resulting in an observation f with $f = u + \eta$. As
has been seen in the section “The variational
approach,” this question can be tackled by solving
the variational problem

$$\min\{\lambda\, \phi(|u|_E) + |f - u|_G^2\} \qquad [16]$$

for suitable choices of E, G, and $\phi$. Here we propose to
choose $G = L^2(\Omega)$ ($\Omega$ is the image domain), for E
the Besov space $B^1_1(L^1(\Omega))$, and $\phi$ = Identity. Besov
spaces $B^\alpha_q(L^p(\Omega))$ are used in many domains of
mathematics such as harmonic analysis or approximation
theory. There exist different ways of defining them.
Roughly speaking, they consist of functions having $\alpha$
derivatives in $L^p(\Omega)$; the third parameter q allows one
to make finer distinctions in smoothness. Here we are
only concerned with the Besov space $B^1_1(L^1(\Omega))$. One
important property needed here is that the norm of a
function in $E = B^1_1(L^1(\Omega))$ is equivalent to the $\ell^1$-norm
of the wavelet coefficients, that is, if $\{\psi_{j,k}\}$ is an
orthonormal basis of $L^2(\Omega)$ and if $u_{j,k,\psi}$ are the wavelet
coefficients of $u \in E$, then $|u|_E = \sum_{j,k,\psi} |u_{j,k,\psi}|$.





Remark When one is concerned with a finite
domain, then some changes must be made with
respect to the construction given in [15] to obtain an
orthonormal basis of $L^2(\Omega)$. To avoid further
technical complications, we ignore this question.

Figure 8 Illustration of two regularization methods (one panel shows wavelet shrinkage).


Let us denote, respectively, by $\{u_{j,k,\psi}\}$ and $\{f_{j,k,\psi}\}$
the wavelet coefficients of u and f; then solving [16]
amounts to finding the minimizer of the functional

$$\lambda \sum_{j,k,\psi} |u_{j,k,\psi}| + \sum_{j,k,\psi} |u_{j,k,\psi} - f_{j,k,\psi}|^2 \qquad [17]$$

One notes immediately that minimizing problem
[17] reduces to finding the minimizer s, given t, of
$E(s) = |s - t|^2 + \lambda|s|$ and that the minimizer of E(s) is
given by $s = t - (\lambda/2)$ if $t > \lambda/2$, s = 0 if $|t| \le \lambda/2$,
and $s = t + (\lambda/2)$ if $t < -(\lambda/2)$.
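The three cases of the minimizer follow from a short computation on each half-line, given here as a worked step. The function E is strictly convex, so any stationary point, or the point 0 when the subdifferential there contains 0, is the unique minimizer.

```latex
E(s) = (s-t)^2 + \lambda |s|, \qquad \lambda > 0 .

\begin{cases}
s > 0: & E'(s) = 2(s-t) + \lambda = 0 \iff s = t - \dfrac{\lambda}{2},
         \ \text{admissible iff } t > \dfrac{\lambda}{2}, \\[2mm]
s < 0: & E'(s) = 2(s-t) - \lambda = 0 \iff s = t + \dfrac{\lambda}{2},
         \ \text{admissible iff } t < -\dfrac{\lambda}{2}, \\[2mm]
s = 0: & 0 \in \partial E(0) = [-2t - \lambda,\ -2t + \lambda]
         \iff |t| \le \dfrac{\lambda}{2} .
\end{cases}
```

Because [17] is a sum of independent terms of this form, one per wavelet coefficient, the minimization can be carried out coefficient by coefficient.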

Thus, we shrink the wavelet coefficients $f_{j,k,\psi}$
toward zero by an amount of $\lambda/2$ to obtain the
minimizer. This is exactly the wavelet shrinkage
algorithm of Donoho and Johnstone (1994). It is
remarkable that the wavelet shrinkage algorithm,
which has been found by using statistical tools, can
also be explained via a variational approach
(Chambolle et al. 1998). Figure 8 shows an example
of the result on a noisy image.
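The whole procedure can be sketched on a one-dimensional signal. The example below makes several assumptions that are not in the text: the Haar basis is used, the signal length is a power of two, the coarsest approximation coefficient is left untouched, and a small alternating perturbation stands in for noise so that the result is deterministic. The function names are hypothetical.

```python
import math

def soft_threshold(t, lam):
    """Minimizer of E(s) = (s - t)**2 + lam * abs(s)."""
    if t > lam / 2:
        return t - lam / 2
    if t < -lam / 2:
        return t + lam / 2
    return 0.0

def haar_forward(signal):
    """Full Haar decomposition of a signal whose length is a power of two."""
    approx, details = list(signal), []
    s = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = [(approx[2 * k], approx[2 * k + 1]) for k in range(len(approx) // 2)]
        details.append([(a - b) / s for a, b in pairs])
        approx = [(a + b) / s for a, b in pairs]
    return approx[0], details  # details[0] is the finest level

def haar_inverse(coarse, details):
    """Rebuild the signal from the coarsest coefficient and all detail levels."""
    approx, s = [coarse], math.sqrt(2.0)
    for level in reversed(details):
        finer = []
        for a, d in zip(approx, level):
            finer.extend([(a + d) / s, (a - d) / s])
        approx = finer
    return approx

def wavelet_shrinkage(signal, lam):
    """Shrink every wavelet (detail) coefficient by lam / 2, then reconstruct."""
    coarse, details = haar_forward(signal)
    shrunk = [[soft_threshold(d, lam) for d in level] for level in details]
    return haar_inverse(coarse, shrunk)

clean = [1.0] * 8 + [3.0] * 8
noisy = [c + 0.1 * (-1) ** i for i, c in enumerate(clean)]
denoised = wavelet_shrinkage(noisy, lam=0.4)
print([round(v, 3) for v in denoised])
```

In this example the small fine-scale coefficients created by the perturbation fall below the threshold and are set to zero, while the single large coefficient carrying the edge is only reduced by $\lambda/2$; the edge is kept, at the price of a slight loss of contrast.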

For more details, we refer the reader to Mallat
(1998).


Conclusion 


Image processing is a challenging domain of applied 
mathematics which has to deal with discrete and 
continuous representations. In this article, we have 
covered the core mathematical tools used in the 
area. The example of gray-scale image restoration 
allowed us to illustrate and compare the different 
methodologies. Naturally, as mentioned in the
introduction, image processing refers to a wide
variety of applications, and intensive research
has been carried out on the different topics using the
methodologies described here. The reader will find
in the references (and the works cited therein) several illustrations of
challenging problems.


See also: Γ-Convergence and Homogenization; Convex
Analysis and Duality Methods; Elliptic Differential
Equations: Linear Theory; Evolution Equations: Linear
and Nonlinear; Fluid Mechanics: Numerical Methods;
Fractal Dimensions in Dynamics; Free Interfaces and
Free Discontinuities: Variational Problems; Geometric
Measure Theory; Ginzburg-Landau Equation;
Inequalities in Sobolev Spaces; Minimax Principle in the
Calculus of Variations; Optimal Transportation; Partial
Differential Equations: Some Examples; Stochastic
Differential Equations; Variational Techniques for
Ginzburg-Landau Energies; Wavelets: Applications;
Wavelets: Mathematical Theory.


Incompressible Euler Equations: Mathematical Theory


Introduction


In this article we present comprehensive mathematical
results on the incompressible Euler equations.
Our presentation is focused on two aspects of
the equations. The first one is the theory of
classical solutions and the problem of global in time
continuation/finite time blow-up of the local classical
solutions. The second topic concerns
weak solutions, mainly for the two-dimensional
(2D) Euler equations, for existence and uniqueness
questions.


The motion of homogeneous incompressible ideal
fluid in a domain $\Omega \subset \mathbb{R}^n$ is described by the
following system of Euler equations:

$$\frac{\partial v}{\partial t} + (v \cdot \nabla)v = -\nabla p \qquad [1]$$
$$\mathrm{div}\, v = 0 \qquad [2]$$
$$v(x, 0) = v_0(x) \qquad [3]$$

where $v = (v^1, v^2, \ldots, v^n)$, $v^j = v^j(x, t)$, $j = 1, 2, \ldots, n$,
is the velocity of the fluid flows, p = p(x,t) is the
scalar pressure, and $v_0(x)$ is a given initial velocity
field satisfying $\mathrm{div}\, v_0 = 0$. Here we use the standard
notation of vector calculus, denoting

$$\nabla = \left(\frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \ldots, \frac{\partial}{\partial x_n}\right)$$
$$(v \cdot \nabla)v^j = \sum_{k=1}^{n} v^k \frac{\partial v^j}{\partial x_k}$$
$$\mathrm{div}\, v = \sum_{k=1}^{n} \frac{\partial v^k}{\partial x_k}$$


Equation [1] represents the balance of momentum 
for each portion of fluid, while eqn [2] represents 
the conservation of mass of fluid during its motion, 
combined with the homogeneity (constant density) 
assumption on the fluid. Equations [1] and [2] were first obtained by Euler in 1755. Although we could consider, more generally, the inhomogeneous incompressible Euler equations, in mathematical fluid mechanics the incompressible Euler equations usually mean the above system [1]-[2]. For a bounded domain with fixed boundary $\partial\Omega$, the natural boundary condition is


$$v(x,t) \cdot \nu(x) = 0 \quad \forall (x,t) \in \partial\Omega \times [0, \infty) \qquad [4]$$


where $\nu(x)$ is the unit normal vector at the boundary point $x \in \partial\Omega$. Several studies are concerned with the Cauchy problem of the system [1]-[3], where we consider the case

$$\Omega = \begin{cases} \mathbb{R}^n & \text{(whole domain of } \mathbb{R}^n\text{), or} \\ \mathbb{R}^n/\mathbb{Z}^n & \text{(periodic domain)} \end{cases} \qquad [5]$$

In this article, for simplicity, we suppose $\Omega = \mathbb{R}^n$, $n = 2, 3$, unless otherwise stated. We note that the Euler equations are obtained formally by setting the viscosity equal to zero or, equivalently, the Reynolds number equal to infinity in the Navier-Stokes equations. Thus, we may view the Euler equations as an approximate description of turbulent flows at extremely high Reynolds number. For detailed mathematical studies of the Navier-Stokes equations at finite Reynolds number, see Temam (1984) and Lions (1996). For a much shorter and more comprehensive review see Constantin (1995). In the study of the Euler equations the notion of vorticity, $\omega = \operatorname{curl} v$, plays a very important role. In particular, we can reformulate the system in terms of the vorticity field only, as follows. We first suppose we are working in three-dimensional (3D) space, and rewrite [1] as


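$$\frac{\partial v}{\partial t} - v \times \operatorname{curl} v = -\nabla\left(p + \frac{1}{2}|v|^2\right) \qquad [6]$$

Taking the curl of [6], and using elementary vector identities, we obtain the following vorticity formulation:

$$\frac{\partial \omega}{\partial t} + (v \cdot \nabla)\omega = (\omega \cdot \nabla) v \qquad [7]$$

$$\operatorname{div} v = 0, \quad \operatorname{curl} v = \omega \qquad [8]$$

$$\omega(x,0) = \omega_0(x) \qquad [9]$$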
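The step from [6] to [7] rests on a single vector identity. The following worked computation is a sketch of that step, assuming $v$ is smooth and writing $\omega = \operatorname{curl} v$ as in the text; it is meant as an illustration, not as the only route to [7].

```latex
% Take the curl of [6]; the curl of the gradient on the right-hand side vanishes.
\partial_t \omega - \operatorname{curl}(v \times \omega) = 0 .
% Standard identity for smooth vector fields v and \omega:
\operatorname{curl}(v \times \omega)
  = (\omega\cdot\nabla)v - (v\cdot\nabla)\omega
    + v\,\operatorname{div}\omega - \omega\,\operatorname{div}v .
% Here div v = 0 by [2], and div \omega = div curl v = 0, so
\partial_t \omega + (v\cdot\nabla)\omega = (\omega\cdot\nabla)v ,
% which is the vorticity equation [7].
```

In two dimensions $\omega$ points out of the plane of motion, so $(\omega\cdot\nabla)v = 0$ and the right-hand side disappears.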


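The linear elliptic system [8] for $v$ can be solved explicitly in terms of $\omega$ to give the Biot-Savart law

$$v(x,t) = -\frac{1}{4\pi}\int_{\mathbb{R}^3} \frac{(x-y)\times\omega(y,t)}{|x-y|^3}\,dy \qquad [10]$$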


Substituting this $v$ into [7] formally, we obtain an integrodifferential system for $\omega$. The term on the right-hand side of [7] is called the "vortex stretching term," and is regarded as the main source of difficulties in the mathematical theory of the 3D Euler equations. In the 2D case we take the vorticity as the scalar $\omega = \partial v^2/\partial x_1 - \partial v^1/\partial x_2$, and the evolution equation of $\omega$ becomes

$$\frac{\partial \omega}{\partial t} + (v\cdot\nabla)\omega = 0 \qquad [11]$$


combined with the 2D Biot-Savart law,

$$v(x,t) = -\frac{1}{2\pi}\int_{\mathbb{R}^2}\frac{(-y_2+x_2,\; y_1-x_1)}{|x-y|^2}\,\omega(y,t)\,dy \qquad [12]$$


In many studies of the Euler equations it is convenient to introduce the notion of "particle trajectory mapping," $\Phi(\cdot,t)$, defined by

$$\frac{\partial \Phi(a,t)}{\partial t} = v(\Phi(a,t),t), \quad \Phi(a,0) = a \qquad [13]$$

The mapping $\Phi(\cdot,t)$ takes the initial location of a fluid particle to its location at time $t$, and the parameter $a$ is called the Lagrangian particle marker. If we denote the Jacobian of the transformation by $J(a,t) = \det(\nabla_a\Phi(a,t))$, then we can show easily that

$$\frac{\partial J}{\partial t} = (\operatorname{div} v)J$$
which implies that the velocity field $v$ satisfies the incompressibility condition $\operatorname{div} v = 0$ if and only if the mapping $\Phi(\cdot,t)$ is volume preserving. At this point we note that, although the Euler equations were originally derived by applying the principles of mass conservation and momentum balance, we could also derive them by applying the principle of least action to the action defined by

$$\frac{1}{2}\int_{t_1}^{t_2}\int_{\Omega}\left|\frac{\partial\Phi(a,t)}{\partial t}\right|^2 da\,dt$$

Here, $\Phi(\cdot,t):\Omega\to\Omega$ is a parametrized family of volume-preserving diffeomorphisms. This variational approach to the Euler equations implies that we can view solutions of the Euler equations as geodesic curves in the $L^2$-metric on the infinite-dimensional manifold of volume-preserving diffeomorphisms (for more details see, e.g., Arnol’d and Khesin (1998)). The 3D Euler equations have many conserved quantities. We list some important ones below.


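1. Energy

$$E(t) = \frac{1}{2}\int_{\Omega}|v(x,t)|^2\,dx \qquad [14]$$

2. Helicity

$$H(t) = \int_{\Omega} v(x,t)\cdot\omega(x,t)\,dx \qquad [15]$$

3. Circulation

$$\Gamma_{C(t)} = \oint_{C(t)} v\cdot dl \qquad [16]$$

where $C(t) = \{\Phi(a,t) \mid a \in C\}$ is a curve moving along with the fluid.

4. Impulse

$$I(t) = \frac{1}{2}\int_{\Omega} x\times\omega\,dx \qquad [17]$$

5. Moment of impulse

$$M(t) = \frac{1}{3}\int_{\Omega} x\times(x\times\omega)\,dx \qquad [18]$$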


The conservation of the above quantities can be proved without difficulty using elementary vector calculus (for details see, e.g., Chorin and Marsden (1993), Majda and Bertozzi (2002), Marchioro and Pulvirenti (1994)). The helicity, in particular, represents the degree of knottedness of the vortex lines in the fluid, where the vortex lines are the integral curves of the vorticity field. Arnol’d and Khesin (1998) discuss in detail helicity and other geometric aspects of the Euler equations. For the 2D Euler equations there is no analog of helicity, while the circulation conservation is replaced by conservation of the vorticity flux integral,

$$\int_{A(t)}\omega(x,t)\,dx \qquad [19]$$

where $A(t) = \{\Phi(a,t) \mid a \in A\}$ is a planar region moving along with the fluid. The impulse and the moment of impulse integrals are replaced by

$$\frac{1}{2}\int_{\Omega}(x_2, -x_1)\,\omega\,dx \qquad [20a]$$

and

$$-\frac{1}{3}\int_{\Omega}|x|^2\,\omega\,dx \qquad [20b]$$

respectively.


In 2D ideal incompressible fluids we have extra conserved quantities; namely, for any $p \in [1,\infty]$ the integral

$$\int_{\Omega}|\omega(x,t)|^p\,dx \qquad [21]$$

is conserved, with the $L^\infty$ norm of $\omega$ taking its place when $p = \infty$ (as a matter of fact we can extend this statement by replacing the integral by $\int_{\Omega} f(\omega(x,t))\,dx$ for any continuous function $f$). There are many known explicit solutions to the Euler equations (see, e.g., Lamb (1932) and Majda and Bertozzi (2002)).
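The conservation of [21] and of its generalization can be checked directly from the 2D vorticity equation [11]. The computation below is a sketch under extra assumptions that are not in the text: $f$ is taken continuously differentiable with $f(0)=0$ (the continuous case would follow by approximation), and $\omega$ is assumed smooth and decaying fast enough at infinity that boundary terms vanish.

```latex
\frac{d}{dt}\int_{\Omega} f(\omega)\,dx
  = \int_{\Omega} f'(\omega)\,\partial_t\omega\,dx
  = -\int_{\Omega} f'(\omega)\,(v\cdot\nabla)\omega\,dx   % by [11]
  = -\int_{\Omega} (v\cdot\nabla)\,f(\omega)\,dx          % chain rule
  = \int_{\Omega} (\operatorname{div} v)\,f(\omega)\,dx   % integration by parts
  = 0 .                                                   % by [2]
```

Choosing $f(s) = |s|^p$ with $1 < p < \infty$ gives [21]; the case $p = \infty$ reflects the fact that $\omega$ is merely transported along particle trajectories.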


Local Existence and the Blow-Up 
Problem 


The Classical Results 


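We first introduce some notation for function spaces. The Lebesgue space $L^p(\Omega)$, $p \in [1,\infty]$, is the Banach space defined by the norm

$$\|f\|_{L^p} = \begin{cases} \left(\int_{\Omega}|f(x)|^p\,dx\right)^{1/p}, & p \in [1,\infty) \\ \operatorname{ess\,sup}_{x\in\Omega}|f(x)|, & p = \infty \end{cases}$$

Let us set $\alpha := (\alpha_1,\alpha_2,\ldots,\alpha_n) \in (\mathbb{Z}_+\cup\{0\})^n$ with $|\alpha| = \alpha_1+\alpha_2+\cdots+\alpha_n$. Then $D^\alpha = D_1^{\alpha_1}D_2^{\alpha_2}\cdots D_n^{\alpha_n}$, where $D_j = \partial/\partial x_j$, $j = 1,2,\ldots,n$. For given $k \in \mathbb{Z}_+$ and $p \in [1,\infty)$ the Sobolev space $W^{k,p}(\Omega)$ is the Banach space of functions $f \in L^p(\Omega)$ such that

$$\|f\|_{W^{k,p}} := \left(\sum_{|\alpha|\le k}\int_{\Omega}|D^\alpha f|^p\,dx\right)^{1/p} < \infty$$

where the derivatives are in the sense of distributions. For $p = \infty$ we replace the $L^p$-norm by the $L^\infty$-norm. In order to handle fractional derivatives of order $s \in \mathbb{R}$, we use the space $L^p_s(\Omega)$ defined by the Banach space norm

$$\|f\|_{L^p_s} := \|(1-\Delta)^{s/2}f\|_{L^p}$$

where $(1-\Delta)^{s/2}f = \mathcal{F}^{-1}[(1+|\xi|^2)^{s/2}\mathcal{F}(f)(\xi)]$, with $\mathcal{F}(\cdot)$ and $\mathcal{F}^{-1}(\cdot)$ denoting the Fourier transform and its inverse. Below we outline the key ideas of proving the local existence theorems for the Euler equations. For more details we refer the reader to Majda and Bertozzi (2002). For simplicity, we use the function space $H^m(\mathbb{R}^n) = W^{m,2}(\mathbb{R}^n)$, $n = 2, 3$.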
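Taking derivatives $D^\alpha$ of [1], then taking the $L^2$ inner product with $D^\alpha v$, and summing over the multi-indices $\alpha$ with $|\alpha| \le m$, we obtain

$$\frac{1}{2}\frac{d}{dt}\|v\|_{H^m}^2 = -\sum_{|\alpha|\le m}\left(D^\alpha((v\cdot\nabla)v) - (v\cdot\nabla)D^\alpha v,\; D^\alpha v\right)_{L^2} - \sum_{|\alpha|\le m}\left((v\cdot\nabla)D^\alpha v,\; D^\alpha v\right)_{L^2} - \sum_{|\alpha|\le m}\left(D^\alpha\nabla p,\; D^\alpha v\right)_{L^2} = I + II + III$$

By integration by parts, we obtain

$$III = \sum_{|\alpha|\le m}\left(D^\alpha p,\; D^\alpha\operatorname{div} v\right)_{L^2} = 0$$

Integrating by parts again, and using the fact that $\operatorname{div} v = 0$, we have

$$II = -\frac{1}{2}\sum_{|\alpha|\le m}\int_{\mathbb{R}^n}(v\cdot\nabla)|D^\alpha v|^2\,dx = \frac{1}{2}\sum_{|\alpha|\le m}\int_{\mathbb{R}^n}\operatorname{div} v\,|D^\alpha v|^2\,dx = 0$$

We now use the so-called commutator type of estimate,

$$\sum_{|\alpha|\le m}\|D^\alpha(fg) - fD^\alpha g\|_{L^2} \le C\left(\|\nabla f\|_{L^\infty}\|g\|_{H^{m-1}} + \|f\|_{H^m}\|g\|_{L^\infty}\right)$$

and obtain

$$I \le \sum_{|\alpha|\le m}\|D^\alpha((v\cdot\nabla)v) - (v\cdot\nabla)D^\alpha v\|_{L^2}\,\|v\|_{H^m} \le C\|\nabla v\|_{L^\infty}\|v\|_{H^m}^2$$

Summarizing the above estimates of I, II, and III, we have

$$\frac{d}{dt}\|v\|_{H^m}^2 \le C\|\nabla v\|_{L^\infty}\|v\|_{H^m}^2 \qquad [22]$$

A further estimate, using the Sobolev inequality $\|\nabla v\|_{L^\infty} \le C\|v\|_{H^m}$ for $m > 5/2$, gives

$$\frac{d}{dt}\|v\|_{H^m}^2 \le C\|v\|_{H^m}^3$$

Thanks to Gronwall's lemma, we have the local-in-time uniform estimate

$$\|v(t)\|_{H^m} \le \frac{\|v_0\|_{H^m}}{1 - Ct\|v_0\|_{H^m}} \le 2\|v_0\|_{H^m}$$

for all $t \in [0, 1/(2C\|v_0\|_{H^m})]$. This is the key a priori estimate for the construction of the local solutions.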
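The passage from the cubic differential inequality to the local-in-time bound is a short comparison argument. The sketch below writes $y(t) = \|v(t)\|_{H^m}$ and $y_0 = \|v_0\|_{H^m}$ (shorthand introduced here, not in the text) and assumes $y > 0$ and enough smoothness in $t$ to divide and integrate.

```latex
% Start from  d/dt (y^2) <= C y^3,  i.e.  2 y y' <= C y^3.
y' \le \tfrac{C}{2}\,y^{2}
\;\Longrightarrow\;
\frac{d}{dt}\Bigl(-\frac{1}{y}\Bigr) \le \frac{C}{2}
\;\Longrightarrow\;
\frac{1}{y(t)} \ge \frac{1}{y_0} - \frac{C}{2}\,t .
% Hence, as long as  1 - (C/2) y_0 t > 0,
y(t) \le \frac{y_0}{1 - \tfrac{C}{2}\,y_0\,t} .
% For  0 <= t <= 1/(C y_0)  the denominator is at least 1/2, so
y(t) \le 2\,y_0 .
```

Renaming $C/2$ as $C$ gives the estimate in the form stated above, valid on $[0, 1/(2C\|v_0\|_{H^m})]$. The length of this interval shrinks as the initial norm grows, which is why the argument yields only a local-in-time bound.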
The local-in-time solution of the Euler equations in the Sobolev space $H^m(\mathbb{R}^n)$ for $m > n/2+1$, $m \in \mathbb{Z}$, was obtained by Kato (1972). For these local-in-time solutions, one of the most outstanding open problems in mathematical fluid mechanics is whether the solution can be continued for all future time, or whether it loses regularity and blows up in finite time. Even at the level of numerical experiments, the question is not yet settled. In the direction of solving this problem there is a celebrated result, called the Beale-Kato-Majda criterion (1984), which states

$$\limsup_{t\nearrow T_*}\|v(t)\|_{H^m} = \infty \quad\text{if and only if}\quad \int_0^{T_*}\|\omega(s)\|_{L^\infty}\,ds = \infty \qquad [23]$$


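We outline the proof of this result below (for more details see Majda and Bertozzi (2002)). We first recall the Beale-Kato-Majda version of the logarithmic Sobolev inequality,

$$\|\nabla v\|_{L^\infty} \le C\|\omega\|_{L^\infty}\left(1 + \log(1 + \|v\|_{H^m})\right) + C\|\omega\|_{L^2} \qquad [24]$$

for $m > 5/2$. Now suppose $\int_0^{T_*}\|\omega(t)\|_{L^\infty}\,dt < \infty$. Taking the $L^2$ inner product of [7] with $\omega$, then after integration by parts we obtain

$$\frac{1}{2}\frac{d}{dt}\|\omega\|_{L^2}^2 = \left((\omega\cdot\nabla)v,\;\omega\right)_{L^2} \le \|\omega\|_{L^\infty}\|\nabla v\|_{L^2}\|\omega\|_{L^2} = \|\omega\|_{L^\infty}\|\omega\|_{L^2}^2$$

where we used the identity $\|\nabla v\|_{L^2} = \|\omega\|_{L^2}$. Applying Gronwall's lemma, we obtain

$$\|\omega(t)\|_{L^2} \le \|\omega_0\|_{L^2}\exp\left(\int_0^{T_*}\|\omega(s)\|_{L^\infty}\,ds\right) \le C(\omega_0, T_*) \qquad [25]$$

for all $t \in [0,T_*]$. Substituting [24] into [22], and combining this with [25], we have

$$\frac{d}{dt}\|v\|_{H^m}^2 \le C\left[1 + \|\omega\|_{L^\infty}\left(1 + \log(1+\|v\|_{H^m})\right)\right]\|v\|_{H^m}^2$$

Applying Gronwall's lemma, we obtain

$$\|v(t)\|_{H^m} \le \|v_0\|_{H^m}\exp\left[C_1\exp\left(C_2\int_0^{T_*}\|\omega(\tau)\|_{L^\infty}\,d\tau\right)\right] \le C(v_0,T_*)$$

for all $t \in [0,T_*]$ and for some constants $C_1$, $C_2$. Thus, we have proved the "necessity part" of [23]. The "sufficiency part" is an easy consequence of the Sobolev inequality,

$$\int_0^{T_*}\|\omega(s)\|_{L^\infty}\,ds \le T_*\sup_{0\le t\le T_*}\|\omega(t)\|_{L^\infty} \le CT_*\sup_{0\le t\le T_*}\|v(t)\|_{H^m}$$

for $m > 5/2$.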
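The double exponential in the bound above comes from the logarithm in [24]. The following is one way to see this, offered as a sketch: it uses the auxiliary quantities $y(t) = e + \|v(t)\|_{H^m}$ and $a(t) = \|\omega(t)\|_{L^\infty}$ (names introduced here for illustration), and lets $C$ denote a constant that may depend on $\omega_0$ and $T_*$ through [25].

```latex
% From [22], [24], [25]:  d/dt ||v|| <= (C/2) [ 1 + a (1 + log(1 + ||v||)) ] ||v||.
% Since  ||v|| <= y,  1 + ||v|| <= y  and  log y >= 1:
y' \le \tfrac{C}{2}\,\bigl[\,1 + a\,(1+\log y)\,\bigr]\,y
   \le C\,(1+a)\,y\log y .
% Divide by  y log y > 0:
\frac{d}{dt}\,\log\log y \le C\,\bigl(1 + a(t)\bigr) .
% Integrate from 0 to t <= T_*:
\log y(t) \le \log y(0)\,
   \exp\Bigl(C\,T_* + C\int_0^{T_*}\|\omega(s)\|_{L^\infty}\,ds\Bigr) ,
% so y(t), and with it ||v(t)||_{H^m}, stays bounded on [0,T_*]
% whenever the time integral of ||\omega||_{L^\infty} is finite.
```

The exact constants differ from $C_1$, $C_2$ above, but the structure (an exponential of an exponential of the integral in [23]) is the same.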


Other Related Results 


The previous local existence result in $H^m(\mathbb{R}^n)$, $m > n/2+1$, is basically due to T Kato in 1972. He and G Ponce extended this existence result to the fractional Sobolev space $L^p_s(\mathbb{R}^n)$, $s > n/p+1$, $s \in \mathbb{R}$, in 1986. These results were further extended, using the Besov and the Triebel-Lizorkin spaces, by the present author in 2001.

For a bounded domain $\Omega \subset \mathbb{R}^n$, R Temam obtained the local existence result using the space $W^{k,p}(\Omega)$ in 1975. On the other hand, in the setting of the Hölder space $C^{k,\alpha}(\mathbb{R}^n)$, L Lichtenstein (1925) and W Wolibner (1933) obtained local existence of solutions of the Euler equations. More recently, J-Y Chemin considered the Zygmund space $C^s_*(\mathbb{R}^n)$, which is identical to the Hölder space $C^{[s],s-[s]}(\mathbb{R}^n)$ for noninteger $s$, where $[s]$ means the largest integer not greater than $s$, but is different from $C^{[s],0}(\mathbb{R}^n)$ for integer $s$. He proved, in 1992, local existence of solutions to the 3D Euler equations in this space. See Chemin (1998) for details of the proof.

The Beale-Kato-Majda criterion for the finite-time blow-up of classical solutions of the 3D Euler equations has been refined recently by many authors, replacing the $L^\infty$-norm of the vorticity $\omega(x,t)$ by the weaker BMO (the space of functions of bounded mean oscillation) norm (H Kozono and Y Taniuchi, 2000), and by the even weaker Besov space or Triebel-Lizorkin space norms (the present author, 2001; see Triebel (1983) for more details on those spaces). Here we just note that these spaces are refinements of the usual Sobolev spaces. For the case of a bounded domain, there is a result by A Ferrari in 1993. The blow-up problem is still open even in the case of the axisymmetric 3D Euler equations if there is a nonzero swirl (angular velocity). In this case, the blow-up is controlled only by the angular component of the vorticity, as shown by the present author (1996). In the region off the axis, in particular, the axisymmetric 3D Euler equations have the same form as the 2D Boussinesq equations.

Some researchers have also approached the regularity/singularity problem of the 3D Euler equations by investigating the geometric structure of the vortex stretching term, and obtained a geometric type of blow-up criterion (P Constantin, C Fefferman, and A Majda, 1996). For a more detailed review of studies in this direction see Constantin (1995).

Since the blow-up problem of the 3D Euler equations itself looks too difficult to solve, simplified model problems have also been studied. In 1985, P Constantin, PD Lax, and A Majda considered the following 1D model problem of the 3D Euler equations:

$$\frac{\partial\omega}{\partial t} = H(\omega)\,\omega, \quad \omega(x,0) = \omega_0(x)$$

where $H(\cdot)$ is the Hilbert transform defined by

$$H(\omega) = \frac{1}{\pi}\,\mathrm{p.v.}\int_{\mathbb{R}}\frac{\omega(y)}{x-y}\,dy$$

They proved finite-time blow-up of this model problem by explicitly obtaining the solution. There is another, 2D model problem of the 3D Euler equations, the quasigeostrophic equations,

$$\frac{\partial\theta}{\partial t} + (v\cdot\nabla)\theta = 0, \quad v = -\nabla^{\perp}(-\Delta)^{-1/2}\theta, \quad \theta(x,0) = \theta_0(x) \qquad [26]$$

where $\nabla^{\perp} = (-\partial_2, \partial_1)$. In contrast to the above 1D model equation, this 2D model has real physical relevance in atmospheric science, and $\theta(x,t)$ represents the temperature of the air. The resemblance of this equation to the 3D Euler equations was first observed by P Constantin, A Majda, and E Tabak in 1994, and they derived a finite-time blow-up criterion for the equations. In spite of many interesting partial results, including the work by D Cordoba (1998), the blow-up problem of [26] is still open.
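For the 1D model of Constantin, Lax, and Majda mentioned above, the explicit solution can be recovered by a complex-variable trick. The sketch below assumes the model in the form $\partial_t\omega = H(\omega)\,\omega$ with the normalization $H\omega(x) = \frac{1}{\pi}\,\mathrm{p.v.}\int \omega(y)/(x-y)\,dy$, and assumes $\omega_0$ is smooth and decaying; the auxiliary function $z$ is notation introduced here.

```latex
% Product identity for the Hilbert transform (with H^2 = -1):
H(\omega\,H\omega) = \tfrac12\bigl((H\omega)^2 - \omega^2\bigr) .
% Apply H to the model equation and set  z = H\omega + i\,\omega :
\partial_t z = \tfrac12\bigl((H\omega)^2 - \omega^2\bigr) + i\,\omega\,H\omega
             = \tfrac12\,z^{2} ,
\qquad
z(x,t) = \frac{2\,z_0(x)}{2 - t\,z_0(x)} .
% Taking the imaginary part, with  z_0 = H\omega_0 + i\,\omega_0 :
\omega(x,t) = \frac{4\,\omega_0(x)}
                   {\bigl(2 - t\,H\omega_0(x)\bigr)^{2} + t^{2}\,\omega_0(x)^{2}} .
```

The denominator can vanish only at a point $x_0$ where $\omega_0(x_0) = 0$ and $H\omega_0(x_0) > 0$, and then it does so at time $t = 2/H\omega_0(x_0)$. Under these assumptions the solution therefore breaks down at $T = 2/M$, where $M$ is the supremum of $H\omega_0$ over the zeros of $\omega_0$.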


The 2D Euler Equations and the 
Weak Solutions 


The Case of $W^{1,p}$ Weak Solutions


For the 2D Euler equations, the problem of global well-posedness of classical solutions is settled. This is an immediate consequence of the conservation of $\|\omega(t)\|_{L^\infty}$, as stated in [21], combined with the Beale-Kato-Majda criterion [23]. On the other hand, the notion of weak solutions is not well understood. A weak solution of the Euler equations is a singular (nondifferentiable) solution of the equations. More precisely, by a weak solution of [1]-[2] in $\Omega\times(0,T)$ we mean a vector field $v \in C([0,T); L^2_{loc}(\Omega))$ satisfying the integral identities

$$-\int_0^T\int_{\mathbb{R}^n} v(x,t)\cdot\phi_t(x,t)\,dx\,dt - \int_{\mathbb{R}^n} v_0(x)\cdot\phi(x,0)\,dx - \int_0^T\int_{\mathbb{R}^n} v(x,t)\otimes v(x,t):\nabla\phi(x,t)\,dx\,dt = 0 \qquad [27a]$$

$$\int_0^T\int_{\mathbb{R}^n} v(x,t)\cdot\nabla\psi(x,t)\,dx\,dt = 0 \qquad [27b]$$

for every vector test function $\phi = (\phi_1,\phi_2,\ldots,\phi_n) \in C_0^\infty(\Omega\times[0,T))$ satisfying $\operatorname{div}\phi = 0$, and for every scalar test function $\psi \in C_0^\infty(\Omega\times[0,T))$. Here we used the notation $(u\otimes v)_{ij} = u_iv_j$, and $A:B = \sum_{i,j=1}^n A_{ij}B_{ij}$ for $n\times n$ matrices $A$ and $B$. We observe that [27a] and [27b] are obtained by multiplying [1] and [2] by $\phi$ and $\psi$, respectively, and integrating by parts. Thus, even locally square-integrable vector fields, which are not differentiable in the classical sense, could be solutions of the Euler equations. For the general 3D Euler equations, we do not yet have global existence theorems for weak solutions. Actually, it is even suggested that we need a still weaker notion of solution (the so-called "measure-valued solutions") to describe generic global solutions of the 3D Euler equations. For the 2D Euler equations, however, we have global existence theorems for $\omega_0 \in L^1(\mathbb{R}^2)\cap L^p(\mathbb{R}^2)$, $p \in (1,\infty]$. This better situation of the 2D Euler equations compared to the 3D case is mainly due to the conservation law of the $L^p$-norm described in [21]. Here we present briefly the existence proof of weak solutions for the 2D Euler equations in the simplest situation. We will prove the global existence of weak solutions for $\omega_0 \in L^p(\mathbb{R}^2)$, $1 < p < \infty$. Let $\rho_\varepsilon(x) = (1/\varepsilon^2)\rho(x/\varepsilon)$, where $\rho \in C_0^\infty(\mathbb{R}^2)$ is a standard mollifier satisfying $\rho \ge 0$, $\operatorname{supp}\rho \subset \{x\in\mathbb{R}^2 \mid |x| < 1\}$, and $\int_{\mathbb{R}^2}\rho\,dx = 1$. Let $v_0$ be the velocity associated with the initial vorticity $\omega_0$, given by the Biot-Savart law [12]. Consider the sequence of initial data $v_0^\varepsilon(x) = \rho_\varepsilon * v_0(x) = \int_{\mathbb{R}^2}\rho_\varepsilon(x-y)v_0(y)\,dy$. For each $v_0^\varepsilon$ we have a global-in-time smooth solution $v^\varepsilon(x,t)$. Moreover, thanks to [21], we have the following estimate of the vorticity that is uniform in $\varepsilon$:

$$\|\omega^\varepsilon(t)\|_{L^p} = \|\omega_0^\varepsilon\|_{L^p} \le \|\omega_0\|_{L^p} \qquad [28]$$

where we used the property of the mollifier in the second inequality. If we take the (distributional) derivative of the Biot-Savart law [12], we find $\nabla v = K*\omega + C\omega$, where $K(x)$ is a kernel function defining a singular integral operator of the convolution type, and $C$ is a constant matrix. The well-known Calderon-Zygmund inequality implies that

$$\|\nabla v\|_{L^p} \le C_p\|\omega\|_{L^p} \qquad [29]$$


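Combining [28] and [29] we have

$$\sup_{0\le t\le T}\|\nabla v^\varepsilon(t)\|_{L^p} \le C\|\omega_0\|_{L^p}, \quad \forall T > 0 \qquad [30]$$

namely the sequence $\{v^\varepsilon\}$ is uniformly bounded in $L^\infty(0,T;W^{1,p}(\mathbb{R}^2))$. Next, we claim that $\{v^\varepsilon\}$ satisfies the inequality

$$\|v^\varepsilon(t_1) - v^\varepsilon(t_2)\|_{H^{-3}} \le C\|v_0\|_{L^2}^2\,|t_1 - t_2| \qquad [31]$$

for all $t_1, t_2$ with $0 \le t_1 < t_2 \le T$, where $C$ is an absolute constant. Here the negative-order Sobolev space $H^{-m}(\Omega)$, $m > 0$, is defined as the dual of $H^m_0(\Omega)$, where $H^m_0(\Omega)$ is the completion of $C_0^\infty(\Omega)$ in the norm of $H^m(\Omega)$. Indeed, let $\phi \in C_0^\infty(\mathbb{R}^2)$. Taking the $L^2(\mathbb{R}^2)$ inner product of [1] with $\phi$ we have the estimates

$$\left|\int_{\mathbb{R}^2}\frac{\partial v^\varepsilon}{\partial t}\cdot\phi\,dx\right| \le \left|\int_{\mathbb{R}^2}(v^\varepsilon\cdot\nabla)\phi\cdot v^\varepsilon\,dx\right| + \left|\int_{\mathbb{R}^2}\nabla p^\varepsilon\cdot\phi\,dx\right|$$

$$\le \|v^\varepsilon(t)\|_{L^2}^2\,\|\nabla\phi\|_{L^\infty} + \|\nabla p^\varepsilon(t)\|_{H^{-3}}\|\phi\|_{H^3}$$

$$\le \left(C\|v_0\|_{L^2}^2 + \|\nabla p^\varepsilon(t)\|_{H^{-3}}\right)\|\phi\|_{H^3} \qquad [32]$$

where we used the Sobolev inequality $\|\nabla\phi\|_{L^\infty} \le C\|\phi\|_{H^3}$ and the energy equality in the last step. Since [32] holds for all $\phi \in C_0^\infty(\mathbb{R}^2)$, by taking the closure of $C_0^\infty(\mathbb{R}^2)$ in $H^3(\mathbb{R}^2)$ we obtain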


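$$\left\|\frac{dv^\varepsilon(t)}{dt}\right\|_{H^{-3}} \le C\|v_0\|_{L^2}^2 + \|\nabla p^\varepsilon(t)\|_{H^{-3}} \qquad [33]$$

We now estimate $\|\nabla p^\varepsilon(t)\|_{H^{-3}}$. Taking the divergence of [1], we have the Poisson equation

$$\Delta p^\varepsilon = -\operatorname{div}\left((v^\varepsilon\cdot\nabla)v^\varepsilon\right)$$

Let $\eta \in C_0^\infty(\mathbb{R}^2)$; then

$$\left|\int_{\mathbb{R}^2}\Delta p^\varepsilon(x,t)\,\eta(x)\,dx\right| = \left|\int_{\mathbb{R}^2}\operatorname{div}\left((v^\varepsilon\cdot\nabla)v^\varepsilon\right)\eta\,dx\right| = \left|\int_{\mathbb{R}^2}(v^\varepsilon\cdot\nabla)v^\varepsilon\cdot\nabla\eta\,dx\right| = \left|\int_{\mathbb{R}^2}(v^\varepsilon\cdot\nabla)\nabla\eta\cdot v^\varepsilon\,dx\right|$$

$$\le \|v^\varepsilon(t)\|_{L^2}^2\,\|D^2\eta\|_{L^\infty} \le C\|v_0\|_{L^2}^2\,\|\eta\|_{H^4} \qquad [34]$$

where we used the energy equality [14] and the Sobolev inequality in the last step. Since [34] holds for all $\eta \in C_0^\infty(\mathbb{R}^2)$, taking the closure of $C_0^\infty(\mathbb{R}^2)$ in $H^4(\mathbb{R}^2)$, we obtain

$$\left|\int_{\mathbb{R}^2}\Delta p^\varepsilon(x,t)\,\eta(x)\,dx\right| \le C\|v_0\|_{L^2}^2\,\|\eta\|_{H^4} \quad \forall\eta \in H^4(\mathbb{R}^2) \qquad [35]$$

Thus,

$$\|\Delta p^\varepsilon(t)\|_{H^{-4}} \le C\|v_0\|_{L^2}^2 \quad \forall t \in [0,T)$$

This provides us with

$$\|\nabla p^\varepsilon(t)\|_{H^{-3}} \le \|p^\varepsilon(t)\|_{H^{-2}} \le C\|\Delta p^\varepsilon(t)\|_{H^{-4}} \le C\|v_0\|_{L^2}^2$$

Combining [33] with this pressure estimate, we obtain

$$\sup_{0\le t\le T}\left\|\frac{dv^\varepsilon(t)}{dt}\right\|_{H^{-3}} \le C\|v_0\|_{L^2}^2$$

Thus, from

$$v^\varepsilon(t_1) - v^\varepsilon(t_2) = \int_{t_2}^{t_1}\frac{dv^\varepsilon(t)}{dt}\,dt$$

we have

$$\|v^\varepsilon(t_1) - v^\varepsilon(t_2)\|_{H^{-3}} \le \sup_{0\le t\le T}\left\|\frac{dv^\varepsilon(t)}{dt}\right\|_{H^{-3}}|t_1 - t_2| \le C\|v_0\|_{L^2}^2\,|t_1 - t_2|$$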


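Thus, [31] is proved as claimed. Thanks to the Aubin-Lions compactness lemma together with [30] and [31], we have a subsequence, denoted by the same notation $\{v^\varepsilon\}$, and $v \in L^\infty(0,T;W^{1,p}(\mathbb{R}^2))$ such that

$$v^\varepsilon \to v \quad \text{weakly-}{*}\text{ in } L^\infty(0,T;W^{1,p}(\mathbb{R}^2)) \qquad [36]$$

and

$$v^\varepsilon \to v \quad \text{in } L^2_{loc}(\mathbb{R}^2\times(0,T)) \qquad [37]$$

as $\varepsilon \to 0$. We know that, as a classical solution, each pair $v^\varepsilon$ and $v_0^\varepsilon$ satisfies

$$\int_{\mathbb{R}^2} v_0^\varepsilon(x)\cdot\phi(x,0)\,dx + \int_0^T\int_{\mathbb{R}^2}\left(v^\varepsilon\cdot\phi_t + v^\varepsilon\otimes v^\varepsilon:\nabla\phi\right)dx\,dt = 0 \qquad [38]$$

for all $\phi \in C_0^\infty(\mathbb{R}^2\times[0,T))$ with $\operatorname{div}\phi = 0$ and

$$\int_0^T\int_{\mathbb{R}^2}\nabla\psi\cdot v^\varepsilon\,dx\,dt = 0 \qquad [39]$$

for all $\psi \in C_0^\infty(\mathbb{R}^2\times[0,T))$. We can check easily that the convergences [36] and [37] are enough to pass to the limit $\varepsilon \to 0$ in [38] and [39] to obtain the corresponding equations with $v^\varepsilon$ and $v_0^\varepsilon$ replaced by $v$ and $v_0$. Thus, $v$ is a weak solution of the Euler equations with initial data $v_0$. This completes the outline of the proof of existence of weak solutions to the 2D Euler equations.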
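The integral identity [27a] used in the definition of weak solutions can be checked term by term for a smooth solution. The computation below is a sketch under the assumptions that $v$ and $p$ are smooth with enough decay in $x$, and that $\phi \in C_0^\infty(\mathbb{R}^n\times[0,T))$ is divergence free, so that $\phi$ vanishes near $t = T$ but not necessarily at $t = 0$.

```latex
% Time-derivative term: integrate by parts in t.
\int_0^T\!\!\int \partial_t v\cdot\phi\,dx\,dt
  = -\int v_0(x)\cdot\phi(x,0)\,dx - \int_0^T\!\!\int v\cdot\phi_t\,dx\,dt .
% Convective term: integrate by parts in x and use div v = 0.
\int_0^T\!\!\int (v\cdot\nabla)v\cdot\phi\,dx\,dt
  = -\int_0^T\!\!\int v\otimes v:\nabla\phi\,dx\,dt .
% Pressure term: integrate by parts in x and use div \phi = 0.
\int_0^T\!\!\int \nabla p\cdot\phi\,dx\,dt
  = -\int_0^T\!\!\int p\,\operatorname{div}\phi\,dx\,dt = 0 .
% Adding the three lines and using [1]:
-\int v_0\cdot\phi(x,0)\,dx
 - \int_0^T\!\!\int v\cdot\phi_t\,dx\,dt
 - \int_0^T\!\!\int v\otimes v:\nabla\phi\,dx\,dt = 0 .
```

No derivative of $v$ and no pressure appear in the final line, which is why the identity still makes sense for merely locally square-integrable velocity fields.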


Notes on Further Results 


The study of weak solutions of the 2D Euler equations was initiated by V Yudovich in 1963, where he proved the existence of weak solutions for initial data $\omega_0 \in L^1(\mathbb{R}^2)\cap L^\infty(\mathbb{R}^2)$. Subsequently, the theory of weak solutions was developed through studies of the vortex sheet problem by DiPerna and Majda in 1987. The existence of weak solutions for vortex sheet initial data, namely for initial vorticity $\omega_0 \in H^{-1}(\mathbb{R}^2)\cap\mathcal{M}(\mathbb{R}^2)$, where $\mathcal{M}(\mathbb{R}^2)$ is the space of Radon measures on $\mathbb{R}^2$, is still an outstanding open problem. The main physical motivation of this problem is to understand the dynamics of vortex sheets in 3D turbulence. For this problem JM Delort proved existence in 1991, assuming that the initial vortex sheet has a single sign. The proof was simplified by A Majda in 1993, using the conservation of the moment of impulse. The result was also reproved by LC Evans and S Müller in 1994, using the weak compactness of the Hardy space. Later, in 2001, MC Lopes Filho, HJ Nussenzveig Lopes and Z Xin allowed a change of sign for the initial vortex sheet, but assumed a special reflection symmetry, to prove existence of global weak solutions. Related to this problem is that of characterizing the precise borderline function space to which the initial data belong, and above which there is no concentration phenomenon for weakly approximating sequences of solutions; a recent analysis of this problem was done by E Tadmor in 2001.

For the uniqueness problem of the weak solutions of the 2D Euler equations, there are remarkable works by V Scheffer (1993) and A Shnirelman (1997), where they constructed explicitly an $L^2(\mathbb{R}^2\times\mathbb{R})$ weak solution starting from zero initial data. Also, M Vishik (1999) extended the uniqueness class of the weak solutions of the 2D Euler equations, improving previous work by V Yudovich (1995). The class found by M Vishik, in particular, includes BMO. There is another problem closely related to the weak solutions of the 2D Euler equations, called the vortex patch problem. The main question was whether a singularity can form on the boundary of a patch $\Omega(t) = \{X(a,t) \mid a \in \Omega_0\}$, where $X(a,t)$ is the particle trajectory mapping generated by a weak solution $v(x,t)$ evolving from the initial data $\omega_0(x) = \chi_{\Omega_0}(x)$, the characteristic function of a set $\Omega_0$ with smooth boundary. The problem itself is well defined, due to the work of V Yudovich (1963), and there exist unique particle trajectories associated with such weak solutions. The problem was settled by J-Y Chemin in 1991. He proved the global-in-time preservation of the $C^{1,\alpha}$ regularity of the boundary $\partial\Omega(t)$, contrary to what previous numerical experiments had suggested. The proof of this result was later simplified by A Bertozzi and P Constantin in 1993.

Another interesting problem related to the weak solutions of the Euler equations (2D or 3D) is whether or not the energy is preserved for weak solutions, namely whether there is any "intrinsic dissipation" in singular solutions of ideal fluids. In 1949, L Onsager conjectured that if the weak solution of the 3D Euler equations belongs to a certain Hölder space, then the energy is conserved. This conjecture, in the setting of Besov spaces, was proved by P Constantin, W E and E S Titi in 1994. The possibility of energy dissipation for weak solutions was further studied by J Duchon and R Robert in 2000. Later, in 2003, the present author considered the problem of helicity conservation for weak solutions of the 3D Euler equations, which is related to the question of crossing/reconnection of vortex tubes for weak solutions, and showed that for a large class of weak solutions in certain Besov spaces the helicity is preserved.


See also: Compressible Flows: Mathematical Theory; 
Evolution Equations: Linear and Nonlinear; Fluid 
Mechanics: Numerical Methods; Interfaces and 
Multicomponent Fluids; Intermittency in Turbulence; 
Inviscid Flows; Non-Newtonian Fluids; Partial Differential 
Introduction


If, in a problem of quantization, state spaces with 
indefinite inner product are used instead of Hilbert 
spaces, one speaks of quantization with indefinite 
metric. The main domain of application is the 
quantization of gauge fields, like the electromagnetic 
vector potential $A_\mu(x)$ or Yang-Mills fields in quantum chromodynamics (QCD) and the standard model.

The conceptual problem with the indefinite metric 
is the occurrence of senseless negative probabilities 
in the formalism. Such negative probabilities, 
however, only arise in expectation values of fields
that are not gauge invariant and hence do not 
correspond to observable quantities. Equivalently, 
the inner product of vectors generated by applica- 
tion of such fields to the vacuum vector with itself 
can be negative or null. In order to extract the 
observable content of an indefinite-metric quantum 
theory, a subsidiary condition is needed to single 
out the physical subspace. Restricted to this subspace, 
the inner product is positive semidefinite. This 
subsidiary condition can be seen as the implementation of a gauge, as, for example, the Lorentz gauge $\partial_\mu A^\mu(x) = 0$ in quantum electrodynamics (QED). This procedure is also known under the name Gupta-Bleuler formalism.

The use of indefinite metric in the quantization of 
gauge theories like QED can be avoided entirely. 
This is called quantization in a physical gauge. The problem with such gauges is that they are not Lorentz invariant and that the vector potential $A_\mu(x)$ is not a local field. An example is the Coulomb gauge defined by $A_0(x) = 0$ and $\partial^i A_i(x) = 0$ in QED. Furthermore, Dirac spinor fields $\psi(x)$ in such gauges
do not anticommute when localized in spacelike 
separated regions. The Dirac fields therefore are also 
nonlocal quantities. Although not in contrast with 
special relativity, as Dirac spinors and the vector 
potential are not gauge invariant and hence are 
unobservable, this leads to severe technical problems 
in the formulation of interacting theories. In 
particular, the theory of renormalization heavily 
uses both locality and invariance. Therefore, the 
Gupta—Bleuler formalism generally is the preferred 
quantization procedure for a gauge theory. 

That a local and invariant quantization is not 
possible using a (positive-metric) Hilbert space has 
been proved by F Strocchi in a series of articles 
published between 1967 and 1970. If one wants to 
preserve locality and/or invariance of the quantized 
field theory, it is thus strictly necessary to give up 
the positivity of the state space. 

A short digression into the early history of the 
idea might be of interest. It dates back to 1941, 
where the use of indefinite metric in the quantiza- 
tion of relativistic equations was proposed by Paul 
Dirac in a lecture at the London Royal Society. The 
negative probabilities for the bosonic vector poten- 
tial were thought to be connected with the problem 
of negative-energy solutions of relativistic equations 
as a type of surrogate of the “Dirac sea” in the 
quantization of fermions. Furthermore, Dirac pro- 
posed that negative-energy solutions and negative 
probabilities would jointly lead to the cancellation of 
divergences in QED. The latter idea was taken up by 
W Heisenberg in his lectures on the theory of 
elementary particles held in Munich in 1961, but the 
generally accepted solution to the problem of ultra- 
violet divergences was achieved without recourse to 
Dirac’s original motivation. In 1950 the consistent 
quantization of vector potential in the Lorentz gauge 
was formulated by SN Gupta and K Bleuler 
eliminating the use of negative-energy solutions. 
Since then the indefinite metric has become a building 
block of the standard theory of quantized gauge fields. 


No-Go Theorems 


The strict necessity of the Gupta—Bleuler procedure 
for the local or covariant quantization of gauge 
fields has been demonstrated by F Strocchi in 
the form of no-go theorems for positive metric. 
Here we review their content for the case of the 


electromagnetic field. Related statements can be 
obtained for nonabelian gauge theories. The main 
problem lies in the fact that standard assumptions 
on the quantization of relativistic fields are in 
conflict with Maxwell equations that should hold 
as operator identities in a positive-metric theory 
containing no unobservable states. Let 


$$F_{\mu\nu}(x) = \partial_\mu A_\nu(x) - \partial_\nu A_\mu(x) \qquad [1]$$

be the quantized electromagnetic field strength tensor. Classically, the existence of $A_\mu(x)$ is guaranteed by the first set of Maxwell equations, $\varepsilon^{\mu\nu\rho\sigma}\partial_\nu F_{\rho\sigma}(x) = 0$. Here (and henceforth) indices are raised and lowered with respect to the Minkowski metric $g_{\alpha\beta}$, and $\varepsilon^{\mu\nu\rho\sigma}$ is the completely antisymmetric tensor on $\mathbb{R}^4$. Furthermore, we apply Einstein's convention on summation over repeated upper and lower indices. Standard assumptions from axiomatic quantum field theory are:


1. The field strength tensor $F_{\mu\nu}(x)$ is an operator-valued distribution acting on a (dense core of a) Hilbert space $\mathcal{H}$ with scalar product $(\cdot,\cdot)$; in the indefinite-metric case, $(\cdot,\cdot)$ only needs to be an inner product.

2. $F_{\mu\nu}(x)$ transforms covariantly, that is, there is a strongly continuous unitary (with respect to $(\cdot,\cdot)$) representation $U$ of the orthochronous, proper Poincaré group on $\mathcal{H}$ such that for a translation $a \in \mathbb{R}^4$ combined with a restricted Lorentz transformation $\Lambda$, one has

$$U(a,\Lambda)F_{\mu\nu}(x)U(a,\Lambda)^{-1} = (\Lambda^{-1})_\mu{}^{\rho}(\Lambda^{-1})_\nu{}^{\sigma}F_{\rho\sigma}(\Lambda x + a) \qquad [2]$$

3. There exists a unique (up to multiplication with complex numbers) translation-invariant vector $\Omega \in \mathcal{H}$ (the "vacuum"), that is, $U(a,1)\Omega = \Omega$ $\forall a \in \mathbb{R}^4$.

4. The representation of the translations fulfills the spectral condition

$$\int_{\mathbb{R}^4}(\Phi, U(a,1)\Psi)\,e^{ip\cdot a}\,da = 0 \qquad [3]$$

$\forall\Phi,\Psi \in \mathcal{H}$ if $p$ is not in the closed forward light cone $\bar V^+ = \{p \in \mathbb{R}^4 : p\cdot p \ge 0,\ p^0 \ge 0\}$. Here the dot is the Minkowski inner product.

So far the assumptions concerned only observable quantities. In the following, we also demand:

5. The vector potential $A_\mu(x)$ is realized as an operator-valued distribution on $\mathcal{H}$ and transforms covariantly under translations,

$$U(a,1)A_\mu(x)U(a,1)^{-1} = A_\mu(x+a) \qquad [4]$$


The assumptions on the nature of the vector potential so far are rather weak. Strocchi's no-go theorems show that one cannot add further desirable properties such as Lorentz covariance and/or locality without getting into conflict with the Maxwell equations:

Theorem 1 Suppose that the above assumptions (1)-(3) and (5) hold. If Maxwell's equations in the absence of charges,

$$\varepsilon^{\mu\nu\rho\sigma}\partial_\nu F_{\rho\sigma}(x) = 0, \quad \partial^\mu F_{\mu\nu}(x) = 0 \qquad [5]$$

are valid as operator identities on $\mathcal{H}$ and the gauge potential transforms covariantly,

$$U(a,\Lambda)A_\mu(x)U(a,\Lambda)^{-1} = (\Lambda^{-1})_\mu{}^{\nu}A_\nu(\Lambda x + a) \qquad [6]$$

then the two-point function of the electromagnetic field tensor vanishes identically:

$$(\Omega, F_{\mu\nu}(x)F_{\rho\sigma}(y)\Omega) = 0 \quad \forall x,y \in \mathbb{R}^4 \qquad [7]$$


To gain a better understanding of where the difficulties in the quantization of the Maxwell equations arise from, here is a rough sketch of the proof. Maxwell's equations and covariance imply that $f_{\mu\nu\rho}(x-y) = (\Omega, A_\mu(x)F_{\nu\rho}(y)\Omega)$ fulfills $\partial^\sigma\partial_\sigma f_{\mu\nu\rho}(x) = 0$, and hence its Fourier transform has support in the union of the forward and backward light cones. The Fourier transform thus can be split into a positive- and a negative-frequency part, and $f_{\mu\nu\rho} = f^+_{\mu\nu\rho} + f^-_{\mu\nu\rho}$ accordingly. By the general analysis of axiomatic field theory (see Axiomatic Quantum Field Theory), the functions $f^\pm_{\mu\nu\rho}$ are boundary values of complex analytic functions on certain tubular domains $\mathcal{T}^\pm$, transforming covariantly under a certain representation of the complex Lorentz group. By a theorem of Araki and Hepp giving a general representation of such functions, and using the antisymmetry of the field tensor, the following formula can be derived:

$$f^\pm_{\mu\nu\rho}(z) = (g_{\mu\rho}\partial_\nu - g_{\mu\nu}\partial_\rho)f^\pm(z) + \varepsilon_{\mu\nu\rho\sigma}\partial^\sigma h^\pm(z), \quad z \in \mathcal{T}^\pm \qquad [8]$$

with $f^\pm, h^\pm$ invariant under complex Lorentz transformations. Taking boundary values in $\mathcal{T}^\pm$, one obtains $f_{\mu\nu\rho} = (g_{\mu\rho}\partial_\nu - g_{\mu\nu}\partial_\rho)f + \varepsilon_{\mu\nu\rho\sigma}\partial^\sigma h$, with $f = \bar f^+ + \bar f^-$ and $h = \bar h^+ + \bar h^-$, where the bar stands for the distributional boundary value. Maxwell's equations imply $\partial^\nu f_{\mu\nu\rho} = (g_{\mu\rho}\partial^\sigma\partial_\sigma - \partial_\mu\partial_\rho)f = 0$ and, similarly, $(g_{\mu\alpha}\partial^\sigma\partial_\sigma - \partial_\mu\partial_\alpha)h = 0$. The only Lorentz-invariant solutions to these equations are constant, which implies the statement of Theorem 1.

The second no-go theorem eliminates the assumption that the vector potential $A_\mu(x)$ is covariant; however, a local gauge is assumed. The result is the same as in Theorem 1:


Theorem 2 Suppose that the above assumptions (1)-(5) and Maxwell's equations hold as operator identities on $\mathcal{H}$. If, furthermore, the gauge is local, that is,

$$[A_\mu(x), A_\nu(y)] = 0 \quad \text{if } x-y \text{ is spacelike} \qquad [9]$$

then the two-point function of the field strength tensor vanishes again as in Theorem 1.

Analyzing the interplay of the covariance properties of $F_{\mu\nu}(x)$ with the locality of $A_\mu(x)$, Strocchi was able to show that the function $f_{\mu\nu\rho}(x-y)$ must have the same covariance properties as in Theorem 1, which implies the assertion of Theorem 2.

The first two no-go theorems deal with the free electromagnetic field that is not coupled to charge-carrying fields. This is, of course, already a real obstruction for an interacting theory as well, since, by the LSZ formalism, one expects the asymptotic incoming and outgoing fields $A_\mu^{\mathrm{in}}(x)$ and $A_\mu^{\mathrm{out}}(x)$ to be free. In fact, it has been proved by D Buchholz that, in the positive-metric case, such asymptotic fields can always be constructed. If one assumes a local and covariant gauge and positivity, the vanishing of the two-point function would also imply that the field $F_{\mu\nu}(x) = 0$ identically by the Reeh-Schlieder theorem.

The next no-go theorem shows that the problems
connected to the quantization of the Maxwell
equations are not restricted to the free
electromagnetic field. Let us assume that the second
set of Maxwell equations is given by


$\partial^\mu F_{\mu\nu}(x) = j_\nu(x)$ [10]


where $j_\nu$ is the leptonic current, that is,
$j_\nu(x) = e :\!\bar\psi(x)\gamma_\nu\psi(x)\!:$ in the case of QED, where $\psi$ is
the quantized Dirac field associated with electrons and
positrons. Here $:\cdot:$ stands for Wick ordering and $\gamma_\nu$
are the Dirac matrices, $\bar\psi = \psi^*\gamma_0$. The conservation of
the current $\partial^\nu j_\nu(x) = 0$ implies that the current charge


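$Q_C = \lim_{R\to\infty}\int_{\mathbb R^3}\int_{\mathbb R} j_0(x)\,\alpha(x^0)\,\chi(\mathbf x/R)\,dx^0\,d\mathbf x$ [11]

is a constant of motion, where $\alpha$ and $\chi$ are
compactly supported infinitely differentiable functions
with $\int_{\mathbb R}\alpha(x^0)\,dx^0 = 1$ and $\chi(\mathbf x) = 1$ for $|\mathbf x| < 1$.
Now, an alternative definition of charge, called
gauge charge (it generates the global U(1)-gauge
transformation), is given by


$Q_G\Omega = 0$, $[Q_G, A_\mu(x)] = 0$ and
$[Q_G, \psi(x)] = -e\,\psi(x)$ [12]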
A third formulation of charge, the Maxwell charge
$Q_M$, can also be given by replacing $j_0(x)$ in [11] by
$\partial^\nu F_{\nu 0}(x)$. Obviously, if Maxwell equations hold as
operator identities, $Q_C = Q_M$. On observable states,
all charges $Q_M$, $Q_C$, and $Q_G$ ought to coincide.
Strocchi's third theorem shows that this cannot be
achieved within a local gauge:


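Theorem 3 If the Maxwell equations [10] hold and
the Dirac field $\psi(x)$ is local with respect to the
electromagnetic field tensor $F_{\mu\nu}(x)$, that is,


$[F_{\mu\nu}(x), \psi(y)] = 0$ if $x - y$ is spacelike [13]


then $[Q_M, \psi(x)] = 0$, hence $Q_C = Q_M \ne Q_G$.


The proof is a simple consequence of the
observation that $j_0(x) = \partial^\nu F_{\nu 0}(x) = \partial^i F_{i0}(x)$ is a
three-divergence, as $F_{00}(x) = 0$ by antisymmetry of
$F_{\mu\nu}(x)$. Hence, integrating by parts,


$[Q_C, \psi(y)] = \lim_{R\to\infty}\int [j_0(x), \psi(y)]\,\alpha(x^0)\,\chi(\mathbf x/R)\,dx^0\,d\mathbf x$
$= -\lim_{R\to\infty}\int [F_{i0}(x), \psi(y)]\,\alpha(x^0)\,\partial^i\chi(\mathbf x/R)\,dx^0\,d\mathbf x = 0$ [14]


since, for $R$ sufficiently large, the support
of $\alpha(x^0)\,\partial^i\chi(\mathbf x/R)$ becomes spacelike separated
from $y$.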

It should be noted that none of the proofs of the
above theorems relies on the definiteness of the
inner product. The key point of the indefinite-metric
formalism, therefore, is rather to give up Maxwell
equations as operator identities. In the usual
positive-metric formalism, where all states in $\mathcal H$ are
physical states, this would not be legitimate. But in
indefinite metrics, many states are unobservable, in
particular those with negative "norm" $(\Psi, \Psi) < 0$.
On such states we can neglect the Maxwell
equations.


Axiomatic Framework 


The formalism of axiomatic quantum field theory 
(see Axiomatic Quantum Field Theory) requires a 
revision in order to cover the case of gauge fields. 
The necessary adaptations have been elaborated by
G Morchio and F Strocchi; earlier work
of E Scheibe and J Yngvason also played a significant
role in this development.

Let $\phi(x)$ be a $V'$-valued quantum field, where $V$
is a finite-dimensional $\mathbb C$-vector space with involution
$*$. The prime stands for the (topological) dual.
For the case of QED, $V$ is eight dimensional,
containing four dimensions for the vector potential
$A_\mu(x)$ and another four for the Dirac spinors
$\psi(x)$, $\bar\psi(x)$.

Such a quantum field can be reconstructed from its
vacuum expectation values (Wightman functions) as
follows: let $\mathcal S_1 = \mathcal S(\mathbb R^4, V)$ be the space of rapidly
decreasing functions $f : \mathbb R^4 \to V$ endowed with the
Schwartz topology. Let the Borchers algebra $\mathcal S$ be
the free, unital, involutive tensor algebra over $\mathcal S_1$, that
is, $\mathcal S = \mathbb C 1 \oplus \bigoplus_{n\ge 1}\mathcal S_1^{\otimes n}$ with the multiplication induced
by the tensor product and involution
$(f_1\otimes\cdots\otimes f_n)^* = f_n^*\otimes\cdots\otimes f_1^*$. $\mathcal S$ is endowed with the direct-sum
topology. One can show that any linear, normalized,
continuous functional $W : \mathcal S \to \mathbb C$, $W(1) = 1$, is
determined by its restrictions $W_n$ to $\mathcal S_n := \mathcal S_1^{\otimes n}$. By the
Schwartz kernel theorem, $W_n \in \mathcal S'(\mathbb R^{4n}, V^{\otimes n})$.
Conversely, any such sequence of Wightman distributions
$W_n$ determines a $W$.

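Given a Hermitian Wightman functional $W$, that is,
one with $W(f^*) = \overline{W(f)}$ $\forall f \in \mathcal S$, the set
$L_W = \{f \in \mathcal S : W(h^*\otimes f) = 0\ \forall h \in \mathcal S\}$ forms a left ideal and the inner product
$W(f^*\otimes h)$ induces a nondegenerate inner product
$(\cdot,\cdot)$ on $\mathcal H_0 = \mathcal S/L_W$. Furthermore, Borchers' algebra
$\mathcal S$ acts from the left on $\mathcal H_0$. The quantum field $\phi(x)$
is defined as the restriction of this canonical representation
to the space $\mathcal S_1 \subseteq \mathcal S$ according to
$\phi(f) = \sum_a\int_{\mathbb R^4}\phi_a(x)f_a(x)\,dx$, where the index $a$ runs over a
basis of $V$.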

If the Wightman functional $W$ has further properties
from axiomatic QFT (see Axiomatic Quantum
Field Theory) like invariance with respect to a given
representation of the Lorentz group on $V$, translation
invariance, locality, and the spectral property, the
quantum field $\phi(x)$ fulfills the related requirements in
analogy with the items (1)-(5) listed in the previous
section for the case of the vector potential $A_\mu(x)$. The
Wightman distributions $W_n$, as in the positive-metric
case, are related to the vacuum expectation values of
the theory by


$W_n(x_1,\ldots,x_n) = (\Omega, \phi(x_1)\cdots\phi(x_n)\Omega)$ [15]


where $\Omega$ is the equivalence class of 1 in $\mathcal H_0$.

The state space $\mathcal H_0$ produced by the
Gelfand-Naimark-Segal (GNS) construction for inner-product
spaces might be too small to contain all
states of physical interest. For example, in the QED
case, it does not contain charged states (cf. Theorem 3).
Depending on the physical problem, one might
also be interested in constructing coherent or
scattering states and translation-invariant states
apart from the vacuum. Such states appear in
problems related to symmetry breaking and confinement
(the so-called $\theta$-vacua) or in some problems of
conformal QFT (see Boundary Conformal Field
Theory) in two dimensions. It has therefore
become the standard point of view that one needs
to make a suitable closure of $\mathcal H_0$ such that this
closure includes the states of interest (for an
alternative point of view, see the last paragraph of
the following section).

Typically, larger closures are favorable, as they
contain more states. One therefore focuses on
maximal Hilbert closures of $\mathcal H_0$. A Hilbert topology
$\tau$ is induced by an auxiliary scalar product $\langle\cdot,\cdot\rangle$ on
$\mathcal H_0$. It is admissible if it dominates the indefinite
inner product, $|(\Psi,\Phi)| \le C\,\langle\Psi,\Psi\rangle^{1/2}\langle\Phi,\Phi\rangle^{1/2}$ $\forall\,\Psi,\Phi \in \mathcal H_0$
for some $C > 0$. This guarantees that the inner
product extends to the Hilbert space closure $\mathcal H$ of
$\mathcal H_0$ with respect to $\tau$. Furthermore, there exists a
self-adjoint contraction $\eta$ on $\mathcal H$, the metric operator, such that
$(\Psi,\Phi) = \langle\Psi,\eta\Phi\rangle$ $\forall\,\Psi,\Phi \in \mathcal H$. A Hilbert topology $\tau$ is maximal
if there is no admissible Hilbert topology $\tau'$ that is
strictly weaker than $\tau$. The classification of
maximal admissible Hilbert topologies in terms of
the metric operator $\eta$ is given by the following
theorem:


Theorem 4 A Hilbert topology $\tau$ on $\mathcal H_0$ generated
by a scalar product $\langle\cdot,\cdot\rangle$ is maximal if and only if the
metric operator $\eta$ has a continuous inverse $\eta^{-1}$ on the
Hilbert space closure $\mathcal H$ of $\mathcal H_0$. In that case, one can
replace $\langle\cdot,\cdot\rangle$ by the scalar product $\langle\Psi,\Phi\rangle_1 = \langle\Psi,|\eta|\Phi\rangle$
without changing the topology $\tau$. The new metric
operator $\eta_1$ then fulfills $\eta_1^2 = 1$.


For a proof of the first statement, see the original
work of Morchio and Strocchi (1980). One can
easily check that $\eta_1 = \eta\,|\eta^{-1}|$, which implies the
second assertion of the theorem. A Hilbert space
$(\mathcal H, \langle\cdot,\cdot\rangle)$ with an indefinite inner product induced by
a metric operator $\eta$ with $\eta^2 = 1_{\mathcal H}$ is called a Krein
space. For an extensive study of Krein spaces, see the
monograph by Azizov and Iokhvidov (1989).
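
As a finite-dimensional sanity check of Theorem 4, the following Python sketch works on $\mathbb C^4$ with diagonal operators only. The sign pattern `SIGNS`, the weights `WEIGHTS`, and the sample vectors are illustrative assumptions, not data from the article. The sketch builds the metric operator $\eta$ for a badly scaled auxiliary scalar product, rescales the scalar product by $|\eta|$, and shows that the new metric operator $\eta_1 = \eta|\eta|^{-1}$ squares to the identity.

```python
# Toy model of Theorem 4 on C^4; every operator is diagonal, so plain lists suffice.
# Assumed data (hypothetical, for illustration only):
SIGNS = [-1.0, 1.0, 1.0, 1.0]    # indefinite product (x, y) = sum_k SIGNS[k] * conj(x_k) * y_k
WEIGHTS = [4.0, 1.0, 0.5, 2.0]   # auxiliary product <x, y> = sum_k WEIGHTS[k] * conj(x_k) * y_k


def weighted(x, y, weights):
    """Sesquilinear form sum_k weights[k] * conj(x_k) * y_k."""
    return sum(w * xk.conjugate() * yk for w, xk, yk in zip(weights, x, y))


def apply_diag(diag, x):
    """Apply a diagonal operator to a vector."""
    return [d * xk for d, xk in zip(diag, x)]


# Metric operator eta, defined by (x, y) = <x, eta y>.
eta = [s / w for s, w in zip(SIGNS, WEIGHTS)]
abs_eta = [abs(e) for e in eta]

# Rescaled scalar product <x, y>_1 = <x, |eta| y> and its metric operator eta_1.
weights_1 = [w * a for w, a in zip(WEIGHTS, abs_eta)]
eta_1 = [e / a for e, a in zip(eta, abs_eta)]

x = [1 + 2j, -1j, 0.5 + 0j, 3 + 0j]
y = [2 - 1j, 1 + 1j, -2 + 0j, 1j]

indefinite = weighted(x, y, SIGNS)
via_eta = weighted(x, apply_diag(eta, y), WEIGHTS)
via_eta_1 = weighted(x, apply_diag(eta_1, y), weights_1)

print([e * e for e in eta])     # not all equal to 1: eta is not a Krein metric
print([e * e for e in eta_1])   # all equal to 1: eta_1 satisfies the Krein relation
```

In this toy case the topology is trivially unchanged because the space is finite dimensional; the point is only the algebra of the rescaling, which is the same in the general statement.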

Furthermore, one can show that, given a nonmaximal
admissible Hilbert space topology $\tau$ induced by
some $\langle\cdot,\cdot\rangle$, one obtains a maximal admissible Hilbert
topology as follows: given the metric operator $\eta$, we
define a scalar product $\langle\Psi,\Phi\rangle_1 = \langle\Psi,(1 - P_0)\Phi\rangle$ on
$\mathcal H$ with $P_0$ the null space projector of $\eta$. Obviously,
this scalar product is still admissible and it leads to a
new metric operator $\eta_1$ and a new closure $\mathcal H_1$ of $\mathcal H_0$.
Furthermore, it is easy to show that the scalar
product $\langle\Psi,\Phi\rangle_2 = \langle\Psi,|\eta_1|\Phi\rangle_1$ still induces an
admissible Hilbert topology which is also maximal, as
$\eta_2 = \eta_1|\eta_1^{-1}|$ clearly fulfills the Krein relation
$\eta_2^2 = 1_{\mathcal H_2}$.

The question of the existence of a Krein space
closure of $\mathcal H_0$, therefore, reduces to the question of
the existence of an admissible Hilbert topology on
$\mathcal H_0$. The following condition on the Wightman
functions $W_n$ replaces the positivity axiom in the
case of indefinite-metric quantum fields:


Theorem 5 Given a Wightman functional $W$, there
exists an admissible Hilbert space topology $\tau$ on
$\mathcal H_0 = \mathcal S/L_W$ if and only if there exists a family of
Hilbert seminorms $p_n$ on $\mathcal S_n$ such that
$|W_{n+m}(f\otimes h)| \le p_n(f)\,p_m(h)$, $\forall\,n, m \in \mathbb N_0$, $f \in \mathcal S_n$, $h \in \mathcal S_m$.


In some cases, covering also examples with
nontrivial scattering in arbitrary dimension, the
condition of Theorem 5 can be checked explicitly
(see Non-trivial Models of Quantum Fields with
Indefinite Metric).

It should be mentioned that different choices of the 
Hilbert seminorms p, lead to potentially different 
maximal Hilbert space closures (Hoffmann 
1998, Constantinescu and Gheondea 2001). In fact, 
often the topology is not even Poincaré invariant and 
hence the states that can be approximated with local 
states depend on a chosen inertial frame. This fact, 
for the case of QED, has been interpreted in terms of 
physical gauges. 

Many results from axiomatic field theory (see
Axiomatic Quantum Field Theory) with positive
metric also hold in the case of QFT with indefinite
metric, like the PCT and the Reeh-Schlieder
theorem, the irreducibility of the field algebra (for
massive theories) and the Bisognano-Wichmann
theorem (see Algebraic Approach to Quantum Field
Theory). Other classical results, like the Haag-Ruelle
scattering theory and the spin and statistics
theorem, definitely do not hold, as has been proved
by counterexamples. This is, however, far from
being a disadvantage, as, for example, it permits the
introduction of various gauges in the scattering
theory of the vector potential $A_\mu(x)$ and fermionic
scalar "ghost" fields in the BRST quantization (see
BRST Quantization) formalism.


Gupta-Bleuler Gauge Procedure 


Here the Gupta-Bleuler gauge procedure is presented
in a slightly generalized form following
Steinmann's monograph. Classically, the equations
of motion for the vector potential $A_\mu(x)$,


$\Box A_\mu(x) - \lambda\,\partial_\mu\partial^\nu A_\nu(x) = j_\mu(x)$ [16]


together with the Lorentz gauge condition
$B(x) = \partial_\mu A^\mu(x) = 0$ imply the Maxwell equations
[10]. Here, $\lambda \in \mathbb R$ plays the role of a gauge
parameter. As seen above, the so-called
pseudo-Maxwell equations [16] and the
Lorentz gauge condition $B(x) = 0$ cannot both hold
as operator identities. The idea for the quantization
of the theory therefore is to give up the Lorentz
gauge condition as an operator identity on the entire
state space $\mathcal H$.

Suppose one has constructed such a theory with
an indefinite inner-product state space $\mathcal H$. Already for the
noninteracting theory, any invariant, spectral, local,
and covariant solution requires indefinite metric, cf.
the explicit formula [18] below. To complete the
Gupta-Bleuler program, one needs to find a subspace
of (equivalence classes of) physical states $\mathcal H'$ of
the inner-product space $\mathcal H$ such that the following
conditions hold:


1. the vacuum is a physical state, that is, $\Omega \in \mathcal H'$;

2. observable fields like $j_\mu(x)$ and $F_{\mu\nu}(x)$ map $\mathcal H'$ to
itself;

3. the inner product $(\cdot,\cdot)$ restricted to $\mathcal H'$ is positive
semidefinite;

4. observable fields map $\mathcal H''$, the set of null vectors
in $\mathcal H'$, to itself; and

5. the Maxwell equations hold on $\mathcal H'$ in the sense
$(\Psi, \partial^\mu F_{\mu\nu}(x)\Phi) = (\Psi, j_\nu(x)\Phi)$, $\forall\,\Psi, \Phi \in \mathcal H'$ [17]

Then one obtains $\mathcal H^{\mathrm{phys}}$ as the completion of the
quotient space $\mathcal H'/\mathcal H''$. The physical Hilbert space
$\mathcal H^{\mathrm{phys}}$ contains the vacuum $\Omega$ (1), observable fields
act on $\mathcal H^{\mathrm{phys}}$ (2) and (4), it is a Hilbert space (3),
and the Maxwell equations hold on it (5).


To see that such a construction is possible,
consider first the noninteracting case $j_\mu(x) = 0$, that is,
the limit case of vanishing electrical charge $e \to 0$.
By taking the divergence of [16], one obtains
$(1 - \lambda)\,\Box\,\partial^\mu A_\mu(x) = 0$. Excluding the Landau
gauge ($\lambda = 1$), this implies $\Box B(x) = 0$ and hence,
applying $\Box$ to [16], $\Box^2 A_\mu(x) = 0$. The
most general solution for the two-point vacuum
expectation values that is in agreement with [16]
and the requirements of locality, translation invariance,
the spectral condition, uniqueness of the
vacuum, and the Lorentz covariance of $A_\mu(x)$ is then

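$(\Omega, A_\mu(x)A_\nu(y)\Omega)$
$= (-g_{\mu\nu} + \rho\,\partial_\mu\partial_\nu)\,D^+(x - y) + c(\lambda)\,\partial_\mu\partial_\nu E^+(x - y)$ [18]


where $c(\lambda)$ is a coefficient fixed by the gauge parameter $\lambda$
through [16] and vanishing for $\lambda = 0$, and where $D^+$ and $E^+$
are the inverse Fourier
transforms of $\theta(p^0)\delta(p^2)$ and $\theta(p^0)\delta'(p^2)$, respectively, with
$p^2 = p\cdot p$, $\theta$ being the Heaviside function, $\delta$ the Dirac
measure on $\mathbb R$ of mass one in zero and $\delta'$ its
derivative. $\rho$ and $\lambda$ are gauge parameters; for
example, the Feynman gauge corresponds to
$\lambda = \rho = 0$. We have also omitted an overall factor
corresponding to a field strength normalization
(choice of numerical value of $\hbar$; here $\hbar = 1$).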


Using Wick's theorem and the GNS construction
for inner-product spaces (cf. the preceding section),
it is possible to realize a representation of the vector
potential $A_\mu(x)$ as operator-valued distribution on
some indefinite-metric state space $\mathcal H$ with Fock
structure, for example, a Krein closure of the GNS
space with $\Omega$ the GNS vacuum and $\mathcal D \subseteq \mathcal H$ the
canonical domain of definition. In the case of
Feynman gauge, the metric operator $\eta$ can be
obtained by a second quantization of the operator
$f_\mu \mapsto -g_{\mu\mu}f_\mu$ (no summation over $\mu$) on the one-particle space $\mathcal S_1$.

In particular, the field $B(x)$ acts as an operator-valued
distribution on $\mathcal H$ and, by taking the
divergence of [16], it follows that $\Box B(x) = 0$.
Thus, $B(x) = B^+(x) + B^-(x)$ can be decomposed into
a positive ("annihilation") and a negative ("creation")
frequency part $B^\pm(x)$. One obtains:


Theorem 6 The space $\mathcal H' = \{\Psi \in \mathcal D : B^+(x)\Psi = 0\}$
fulfills all requirements (1)-(5) of the Gupta-Bleuler
gauge procedure.


Condition (1) is obvious and (2) follows from the
fact that the fields $F_{\mu\nu}(x)$ and $B(x)$ commute, which
can be checked on the level of two-point functions
[18]. In the same spirit, one can also use [18] to
check (3) and (4) by explicit calculations on the
one-particle space and showing that $\mathcal H'$ is the Fock space
over the one-particle states annihilated by $B^+(x)$.
Finally, by Hermiticity of $A_\mu(x)$, $B^+(x)^* = B^-(x)$ and
thus $(\Psi, B(x)\Phi) = (\Psi, B^+(x)\Phi) + (B^+(x)\Psi, \Phi) = 0$ for $\Psi, \Phi \in \mathcal H'$. As
the field $B(x)$ stands for the obstruction to Maxwell
equations, this implies condition (5).
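
The one-particle content of conditions (3) and (4) can be made concrete with a small Python sketch. It uses the standard momentum-space picture of the Feynman gauge, which is an assumption added here and not spelled out in the text: metric signature $(+,-,-,-)$, real polarization vectors $\varepsilon_\mu$ at a fixed lightlike momentum $p$, the inner product $-\varepsilon\cdot\varepsilon'$ read off from the $-g_{\mu\nu}D^+$ term of [18], and the subsidiary condition $B^+\Psi = 0$ taken in the form $p\cdot\varepsilon = 0$. The function names are hypothetical.

```python
# One-photon polarization vectors in Feynman gauge (assumed conventions:
# metric signature (+, -, -, -), photon momentum p along the z axis, p^2 = 0).
METRIC = [1.0, -1.0, -1.0, -1.0]


def minkowski(a, b):
    """Minkowski product a . b = a^0 b^0 - a^1 b^1 - a^2 b^2 - a^3 b^3."""
    return sum(g * ak * bk for g, ak, bk in zip(METRIC, a, b))


def inner(eps_a, eps_b):
    """Indefinite one-particle inner product -eps_a . eps_b (real polarizations)."""
    return -minkowski(eps_a, eps_b)


def is_physical(eps, p, tol=1e-12):
    """Momentum-space form of the subsidiary condition: p . eps = 0."""
    return abs(minkowski(p, eps)) < tol


p = [1.0, 0.0, 0.0, 1.0]
transverse = [0.0, 1.0, 0.0, 0.0]
longitudinal_null = p[:]           # polarization proportional to p
timelike = [1.0, 0.0, 0.0, 0.0]

for name, eps in [("transverse", transverse),
                  ("longitudinal_null", longitudinal_null),
                  ("timelike", timelike)]:
    print(name, is_physical(eps, p), inner(eps, eps))
```

Under these assumptions the transverse state is physical with positive norm, the polarization proportional to $p$ is physical but a null vector (it drops out in the quotient $\mathcal H'/\mathcal H''$), and the timelike polarization has negative norm and is excluded by the subsidiary condition.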

It should be noted that the physical state space
$\mathcal H^{\mathrm{phys}}$ does not depend on the gauge parameters $\lambda$, $\rho$
and that it is spanned by repeated application of the
field tensor $F_{\mu\nu}(x)$ to the vacuum.

By current conservation, the divergence of [16]
still yields $\Box B(x) = 0$ in the interacting case as well,
where $e \ne 0$. One can then choose the same gauge
condition as in Theorem 6 to define $\mathcal H'$. One can
then try to prove that this space fulfills all the
requirements of the Gupta-Bleuler procedure, for
example, in the sense of perturbation theory. Using
more advanced formulations as, for example, BRST
quantization and Bogoliubov's local S-matrix formalism,
this program has been completed up to a
solution of the infrared problem (see Perturbative
Renormalization Theory and BRST).

A different procedure, motivated by the necessity of
coincidence of all charges $Q_C$, $Q_G$, and $Q_M$ on the
physical state space, has been elaborated by Steinmann.
It deviates from the standard procedure in the sense that
the physical space $\mathcal H'$ is not included in $\mathcal H$; instead, $\mathcal H^{\mathrm{phys}}$ is
directly obtained from the GNS procedure after taking
certain limits of Wightman functions restricted to
certain gauge-invariant algebras constructed from the
Borchers algebra and a limiting procedure in a gauge
parameter. The Wightman functionals on these
gauge-invariant algebras are positive (in the sense of
perturbation theory). The limiting procedure, however, implies
that the so-obtained physical states are singular (i.e.,
have diverging inner product) with respect to states in $\mathcal H$. Hence
the state spaces so defined, which correspond to going to
a physical gauge after solving the problem of a
perturbative construction of an indefinite-metric solution,
are not subspaces of $\mathcal H$.


See also: Algebraic Approach to Quantum Field Theory; 
Axiomatic Approach to Topological Quantum Field 
Theory; Axiomatic Quantum Field Theory; Boundary 
Conformal Field Theory; BRST Quantization; 
Perturbative Renormalization Theory and BRST; 
Quantum Fields with Indefinite Metric: Non-Trivial 
Models. 
Introduction


Let g be a Riemannian metric on a smooth compact 
manifold M of dimension m. We assume for the 
moment that the boundary of M is empty and 
postpone until later a discussion of the more general 
setting. If $x = (x_1,\ldots,x_m)$ is a local system of
coordinates on $M$, let


$g_{ij} := g(\partial_{x_i}, \partial_{x_j})$


give the components of the metric tensor. Let $D$ be
an operator of Laplace type on a smooth vector
bundle $V$ over $M$. Adopt the Einstein convention
and sum over repeated indices. Relative to a local
coordinate frame for $V$, $D$ has the form


$D = -\{g^{ij}\,\mathrm{Id}\,\partial_{x_i}\partial_{x_j} + A^k\partial_{x_k} + B\}$


where $A^k$ and $B$ are endomorphisms (i.e., matrices)
of $V$.

We assume that $V$ is equipped with a positive-definite
inner product and that $D$ is self-adjoint.
There is then a complete orthonormal basis $\{\phi_i\}$ for
$L^2(V)$, where $\phi_i \in C^\infty(V)$ and $D\phi_i = \lambda_i\phi_i$. The
collection $\{\phi_i, \lambda_i\}$ is called a discrete spectral resolution of
$D$. For example, if $D = -\partial_\theta^2$ on the circle, then the
discrete spectral resolution is


$\{e^{\sqrt{-1}\,n\theta}, n^2\}_{n\in\mathbb Z}$


If we order the eigenvalues $\lambda_1 \le \lambda_2 \le \cdots$ and repeat
each eigenvalue according to multiplicity, then there
is the following estimate due to Weyl:


$\lambda_n \sim n^{2/m}$ as $n \to \infty$
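
The Weyl estimate can be checked by hand on the circle example. The short Python sketch below lists the eigenvalues $n^2$, $n \in \mathbb Z$, with multiplicity and prints $\lambda_n / n^{2/m}$ for $m = 1$; the cutoff `N_MAX` and the helper name `weyl_ratio` are choices made for this illustration only.

```python
# Spectrum of D = -d^2/dtheta^2 on the circle: eigenvalue n^2 for each integer n.
N_MAX = 5000
eigenvalues = sorted(n * n for n in range(-N_MAX, N_MAX + 1))

m = 1  # dimension of the circle


def weyl_ratio(n):
    """lambda_n / n^(2/m), with eigenvalues counted from n = 1."""
    return eigenvalues[n - 1] / n ** (2 / m)


for n in (11, 101, 1001, 10001):
    print(n, weyl_ratio(n))
```

The printed ratios approach the constant 1/4, so $\lambda_n$ grows like $n^2 = n^{2/m}$, in line with the estimate.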
We now suppose given a pair of vector bundles $V_1$
and $V_2$ over $M$ and a $k$th-order partial differential
elliptic operator


$A : C^\infty(V_1) \to C^\infty(V_2)$


Locally, we decompose


$A = \sum_{|I|\le k} a_I(x)\,\partial_x^I$


where $I = (i_1,\ldots,i_m)$ is a multi-index and where
$\partial_x^I := (\partial_{x_1})^{i_1}\cdots(\partial_{x_m})^{i_m}$.
The $a_I$ are linear maps from $V_1$ to $V_2$. The leading
symbol of $A$ is then defined by setting


$\sigma_L(A)(x,\xi) = (\sqrt{-1})^k\sum_{|I|=k} a_I(x)\,\xi^I$


where $\xi^I = (\xi_1)^{i_1}\cdots(\xi_m)^{i_m}$, and
$\xi = (\xi_1,\ldots,\xi_m)$


are local fiber coordinates on the cotangent bundle. 
The leading symbol is an invariantly defined map


$\sigma_L : T^*M \to \mathrm{End}(V_1, V_2)$


For example, if $V_1 = V_2$ and if $D$ is an operator of
Laplace type, then the leading symbol is given by the
metric tensor, that is,


$\sigma_L(D)(x,\xi) = g^{ij}\xi_i\xi_j\,\mathrm{Id} = |\xi|^2\,\mathrm{Id}$


If $d$ is exterior differentiation, then the leading
symbol is given by exterior multiplication, that is,


$\sigma_L(d)(x,\xi)\,\omega = \sqrt{-1}\,\xi\wedge\omega$


The operator $A$ is said to be elliptic if $\sigma_L(A)$ is an
isomorphism from $V_1$ to $V_2$ for any $\xi \ne 0$. If $A$ is an
elliptic partial differential operator, then


$\mathrm{index}(A) := \dim\ker(A) - \dim\mathrm{coker}(A)$
$= \dim\ker(A^*A) - \dim\ker(AA^*)$


is well defined. As the index vanishes if $m$ is odd, we
assume for the most part that $m$ is even.

If $A_t$ is a smooth one-parameter family of such
operators, then index($A_t$) is independent of $t$. The
index depends only on the homotopy class of the
leading symbol of $A$ within the class of invertible
symbols; it does not depend on the underlying
metric of the manifold and it does not depend on
the fiber metrics chosen for $V_1$ and $V_2$.
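
A finite-dimensional analogue may help to see why the index is stable under deformation. The Python sketch below treats a rational matrix as a map $\mathbb R^n \to \mathbb R^k$ and evaluates $\dim\ker(A^TA) - \dim\ker(AA^T)$; the helper names and the family `family(t)` are hypothetical. This is only an analogy: in finite dimensions the index always equals $n - k$ and no ellipticity is involved.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a rational matrix by Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def index(matrix):
    """dim ker(A^T A) - dim ker(A A^T) for a real matrix A: R^n -> R^k."""
    at = transpose(matrix)
    ata, aat = matmul(at, matrix), matmul(matrix, at)
    return (len(ata) - rank(ata)) - (len(aat) - rank(aat))


def family(t):
    """A one-parameter family of maps R^3 -> R^2 whose rank drops at t = 0."""
    return [[t, 0, 0], [0, 1, 0]]


print([index(family(Fraction(t, 2))) for t in range(-2, 3)])
```

At $t = 0$ both the kernel and the cokernel jump by one dimension, and the two jumps cancel in the difference; this cancellation is the mechanism behind the deformation invariance stated above.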

The Atiyah-Singer index theorem expresses the
index as the integral of suitably chosen polynomials
in the curvature tensor for the classical elliptic
complexes and, more generally, in terms of
cohomological information for general elliptic
complexes. Further details appear later in the article.

The primary focus here is on the complexes which
are of Dirac type, that is, complexes where $A$ is a
first-order partial differential operator and where
the associated second-order operators $D_1 := A^*A$ and
$D_2 := AA^*$ are of Laplace type.

Here is a brief outline of this article. The classical 
elliptic complexes (de Rham, signature, spin, 
Dolbeault, Yang-Mills) are discussed first. Next 
the characteristic classes are introduced, followed by 
the relevant formula for the index of the classical 
elliptic complexes, manifolds with boundary, and 
the equivariant index. Index theory is an enormous 
topic and here only classical features are emphasized 
as a complete treatment is beyond the scope of a 
short expository note such as this one. As some 
guide to various applications in mathematical 
physics, the reader is referred to the Further Reading 
section. 


The Classical Elliptic Complexes 
The de Rham Complex 


Let $\Lambda^pM$ be the bundle of smooth $p$ forms over $M$
and let


$d : C^\infty(\Lambda^pM) \to C^\infty(\Lambda^{p+1}M)$
and
$\delta : C^\infty(\Lambda^pM) \to C^\infty(\Lambda^{p-1}M)$


be the exterior derivative and dually the interior
derivative, respectively. We set


$\Delta := (d + \delta)^2$ on $C^\infty(\Lambda M)$


and then decompose $\Delta = \oplus_p\Delta^p$, where $\Delta^p$ is an
operator of Laplace type on $C^\infty(\Lambda^pM)$.

We have $d^2 = 0$. The de Rham cohomology
groups are given by taking the quotient of the closed
forms by the exact forms:

$H^p(M;\mathbb R) := \frac{\ker(d : C^\infty(\Lambda^pM) \to C^\infty(\Lambda^{p+1}M))}{\mathrm{im}(d : C^\infty(\Lambda^{p-1}M) \to C^\infty(\Lambda^pM))}$

The Hodge-de Rham theorem identifies $H^p(M;\mathbb R)$
with the kernel of the Laplacian


$\ker(\Delta^p) = H^p(M;\mathbb R)$


and with the topological cohomology groups.

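If $\xi$ is a cotangent vector, let $\mathrm{ext}(\xi) : \omega \to \xi\wedge\omega$ be
exterior multiplication. Let $\mathrm{int}(\xi)$ be the dual
operator, interior multiplication. If $\{e_i\}$ is a local
orthonormal frame for $TM$ with dual coframe $\{e^i\}$, let
$e^I = e^{i_1}\wedge\cdots\wedge e^{i_p}$,
where $I = \{1 \le i_1 < \cdots < i_p \le m\}$. Then we have


$\mathrm{ext}(e^1)e^I = 0$ if $i_1 = 1$, and $\mathrm{ext}(e^1)e^I = e^1\wedge e^{i_1}\wedge\cdots\wedge e^{i_p}$ if $i_1 > 1$

$\mathrm{int}(e^1)e^I = e^{i_2}\wedge\cdots\wedge e^{i_p}$ if $i_1 = 1$, and $\mathrm{int}(e^1)e^I = 0$ if $i_1 > 1$


Define a Clifford module structure on $\Lambda M$ by
$\gamma(\xi) := \mathrm{ext}(\xi) - \mathrm{int}(\xi)$
If $\{e_i\}$ is a local orthonormal basis for $TM$, then


$\gamma(e^i)\gamma(e^j) + \gamma(e^j)\gamma(e^i) = -2\delta^{ij}\,\mathrm{Id}$


so the usual Clifford commutation rules are satisfied.
Let $\nabla$ be the Levi-Civita connection on $M$. We may
then expand


$d = \mathrm{ext}(e^i)\nabla_{e_i}$, $\delta = -\mathrm{int}(e^i)\nabla_{e_i}$
$d + \delta = \gamma(e^i)\nabla_{e_i}$
The de Rham complex is then defined by taking
$\Lambda^eM := \oplus_n\Lambda^{2n}M$, $\Lambda^oM := \oplus_n\Lambda^{2n+1}M$
$(d + \delta) : C^\infty(\Lambda^eM) \to C^\infty(\Lambda^oM)$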

The Signature Complex 


The signature complex arises from a different
decomposition of the exterior algebra. Let Clif $M$ be the
Clifford algebra of $T^*M$; this is the universal unital
algebra generated by $T^*M$ subject to the Clifford
commutation relations given above:


$\xi_1 * \xi_2 + \xi_2 * \xi_1 = -2g(\xi_1, \xi_2)\cdot\mathrm{Id}$
We suppose $M$ is orientable and let
$\mathrm{orn} = e^1 * \cdots * e^m \in \mathrm{Clif}\,M$


be the orientation class. The map $\xi \to \gamma(\xi)$ extends
to a unital algebra homomorphism


$\gamma : \mathrm{Clif}\,M \to \mathrm{End}(\Lambda M)$


$\gamma(\mathrm{orn})$ defines an endomorphism of $\Lambda M$ which is,
modulo suitable sign conventions, the Hodge $\star$
operator. If $m = 2k$ is even, then


$(d + \delta)\gamma(\mathrm{orn}) = -\gamma(\mathrm{orn})(d + \delta)$
Set
$\Theta := (\sqrt{-1})^k\gamma(\mathrm{orn})$
As $\Theta^2 = \mathrm{Id}$, we can decompose


$\Lambda M\otimes\mathbb C = \Lambda^+M\oplus\Lambda^-M$
where $\Lambda^\pm M$ are the $\pm1$ eigenspaces of $\Theta$. The
signature complex is then given by


$(d + \delta) : C^\infty(\Lambda^+M) \to C^\infty(\Lambda^-M)$
Twisted Signature Complex 


Let $V$ be an auxiliary complex vector bundle over
$M$ which is equipped with a unitary connection $\nabla^V$.
We use the connection $\nabla^V$ on $V$ and the Levi-Civita
connection on $TM$ to covariantly differentiate
tensors of all types. The twisted signature complex
is defined by setting


$(d + \delta)_V := (\gamma(e^i)\otimes\mathrm{Id})\nabla_{e_i} : C^\infty(\Lambda^+M\otimes V) \to C^\infty(\Lambda^-M\otimes V)$


Yang-Mills complex 


This complex in dimension 4 arises from yet another
decomposition of the exterior algebra. We use the
discussion in the previous section to decompose


$\Lambda^2M = \Lambda^{2,+}M\oplus\Lambda^{2,-}M$
into the $\pm1$ eigenspaces of $\Theta$. Let
$\pi_- : \Lambda^2M \to \Lambda^{2,-}M$


be orthogonal projection. The Yang-Mills complex
is the 3-term sequence


$d : C^\infty(\Lambda^0M) \to C^\infty(\Lambda^1M)$
and
$\pi_-d : C^\infty(\Lambda^1M) \to C^\infty(\Lambda^{2,-}M)$


We can wrap up this sequence to obtain an
equivalent elliptic complex


$(d + \delta) : C^\infty(\Lambda^0M\oplus\Lambda^{2,-}M) \to C^\infty(\Lambda^1M)$


As with the signature complex, this complex can
be twisted by taking coefficients in an auxiliary
vector bundle $V$. It is crucial to the study of
four-dimensional geometry using Yang-Mills theory.


Dolbeault Complex 


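Let $z = (z_1,\ldots,z_{\bar m})$ be a local system of holomorphic
coordinates on a complex manifold $M$, where
$z_j = x_j + \sqrt{-1}\,y_j$. We define


$dz^j := dx^j + \sqrt{-1}\,dy^j$, $d\bar z^j := dx^j - \sqrt{-1}\,dy^j$
$\partial_{z_j} := \frac12(\partial_{x_j} - \sqrt{-1}\,\partial_{y_j})$, $\partial_{\bar z_j} := \frac12(\partial_{x_j} + \sqrt{-1}\,\partial_{y_j})$
and decompose $d = \partial + \bar\partial$, where
$\partial := \mathrm{ext}(dz^j)\partial_{z_j}$ and $\bar\partial := \mathrm{ext}(d\bar z^j)\partial_{\bar z_j}$


on the complexified exterior algebra. Let $\delta'$ be the
adjoint of $\partial$ and $\delta''$ be the adjoint of $\bar\partial$. Let


$d\bar z^I := d\bar z^{i_1}\wedge\cdots\wedge d\bar z^{i_p}$
$\Lambda^{0,+}M := \mathrm{Span}\{d\bar z^I\}_{|I|\ \mathrm{even}}$


$\Lambda^{0,-}M := \mathrm{Span}\{d\bar z^I\}_{|I|\ \mathrm{odd}}$


The Dolbeault complex is then defined by
$(\bar\partial + \delta'') : C^\infty(\Lambda^{0,+}M) \to C^\infty(\Lambda^{0,-}M)$


This complex can be twisted by taking coefficients
in a holomorphic bundle $V$ over $M$.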


The Spin Complex 


Let $M$ be orientable. Let $P_{SO}$ be the principal SO
bundle of orthonormal frames for the tangent
bundle. A spin structure $s$ on $M$ is a principal
Spin bundle $P_{Spin}$ together with a double cover
$\rho : P_{Spin} \to P_{SO}$ which respects the usual double
cover $\rho : \mathrm{Spin} \to SO$ of the structure groups.
Equivalently, a spin structure is a lifting of the
transition functions from SO to Spin which
preserves the cocycle condition. One says that $M$
is spin if it admits a spin structure.

A manifold is orientable if and only if the first
Stiefel-Whitney class of $M$ vanishes; an orientable
manifold is spin if and only if the second
Stiefel-Whitney class of $M$ vanishes as well; these are
$\mathbb Z_2$-valued cohomology classes. Inequivalent spin
structures are parametrized by the cohomology group
$H^1(M;\mathbb Z_2)$ or, equivalently, by real line bundles on $M$.

The spin representation $S$ of Spin defines an
associated spin bundle $SM = S(M, s)$. There is a
natural Clifford action $c$ of $TM$ on $SM$. The
Levi-Civita connection lifts to define the spin
connection on $S$ and the Dirac operator is defined by


$A(s) := c(dx^i)\nabla_{\partial_{x_i}}$ on $C^\infty(SM)$
Let $m = 2k$ and let $\Theta := (\sqrt{-1})^kc(\mathrm{orn})$. Since
$\Theta^2 = \mathrm{Id}$, one can decompose
$SM = S^+M\oplus S^-M$


as the direct sum of the half-spin bundles to obtain
the spin complex:


$A(s) : C^\infty(S^+M) \to C^\infty(S^-M)$


As with the signature complex, the spin complex can
be twisted by taking coefficients in an auxiliary vector
bundle $V$.


Relating the Classic Elliptic Complexes 


One has natural isomorphisms of virtual representations
of the spinor group:


$\Lambda^+ - \Lambda^- = (S^+ - S^-)\otimes(S^+ + S^-)$
$\Lambda^{\mathrm{even}} - \Lambda^{\mathrm{odd}} = (-1)^{m/2}(S^+ - S^-)\otimes(S^+ - S^-)$


which show that the signature complex and de Rham
complexes are the spin complexes with coefficients in
the virtual bundles


$S^+M + S^-M$ and $(-1)^{m/2}(S^+M - S^-M)$


respectively. If $M$ is complex and spin, then the
Dolbeault complex is the spin complex with coefficients
in the square root of the canonical bundle.
One can consider complex spinors to define the
group $\mathrm{Spin}^c(m)$. Any spin manifold admits a $\mathrm{Spin}^c$
structure with trivial associated complex line bundle.
Any complex manifold admits a $\mathrm{Spin}^c$ structure
with associated complex line bundle given by the
canonical bundle. Thus, a complex manifold admits
a spin structure if and only if it is possible to take a
square root of the canonical line bundle; inequivalent
spin structures are parametrized by inequivalent
square roots. If $M$ is orientable, then $M$ admits a
$\mathrm{Spin}^c$ structure if and only if the second
Stiefel-Whitney class of $M$ lifts from $H^2(M;\mathbb Z_2)$ to
$H^2(M;\mathbb Z)$; in the complex setting, this lifting is
performed by the first Chern class. Inequivalent
$\mathrm{Spin}^c$ structures are parametrized by $H^2(M;\mathbb Z)$ or,
equivalently, by complex line bundles over $M$.


Characteristic Classes 

The Euler Form 

Let $\nabla$ be the Levi-Civita connection on $M$. Let
$R(x, y) := \nabla_x\nabla_y - \nabla_y\nabla_x - \nabla_{[x,y]}$


be the curvature operator. Let $\{e_1,\ldots,e_m\}$ be a local
orthonormal frame for $TM$ and let
$R_{ijkl} := g(R(e_i, e_j)e_k, e_l)$


give the components of the curvature relative to a
local orthonormal frame. Let


$\varepsilon^I_J := g(e^{i_1}\wedge\cdots\wedge e^{i_m}, e^{j_1}\wedge\cdots\wedge e^{j_m})$
be the totally antisymmetric tensor; this is the sign
of the permutation which sends $i_\nu \to j_\nu$. Let
$m = 2\bar m$. The Euler form is given by setting


$E_m := \frac{\varepsilon^I_J\,R_{i_1i_2j_2j_1}\cdots R_{i_{m-1}i_mj_mj_{m-1}}}{(8\pi)^{\bar m}\,\bar m!}$


Let $\rho_{ij} := R_{ikkj}$ and $\tau := \rho_{ii}$ be the Ricci tensor and the
scalar curvature, respectively. Then,


$E_2 = \frac{1}{4\pi}\tau$ and $E_4 = \frac{1}{32\pi^2}\{\tau^2 - 4|\rho|^2 + |R|^2\}$

The Pontrjagin Forms

Since $R(x, y) = -R(y, x)$, we can regard $R$ as a
2-form-valued endomorphism of the tangent bundle.
We define the Pontrjagin forms $p_i \in C^\infty(\Lambda^{4i}M)$ by
expanding


$\det\left(I + \frac{1}{2\pi}R\right) = 1 + p_1 + p_2 + \cdots$
These differential forms are closed and the
corresponding cohomology classes


$[p_i] \in H^{4i}(M;\mathbb R)$


in the de Rham cohomology are independent of
the particular Riemannian metric on $M$ which was
chosen.

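The $\hat A$ genus and the Hirzebruch $L$ polynomial
are expressed in terms of these classes using the
splitting principle. Let $A$ be a skew-symmetric
matrix. One sets


$p(A) := \det(I + A) = 1 + p_1(A) + p_2(A) + \cdots$


As $A$ is skew symmetric, it decomposes as the direct
sum of $2\times2$ blocks of the form


$\begin{pmatrix} 0 & \lambda_i \\ -\lambda_i & 0 \end{pmatrix}$


so
$p(A) = \prod_i\{1 + \lambda_i^2\}$


We then have


$p_i(A) = s_i(\lambda_1^2, \lambda_2^2, \ldots)$


where $s_i$ is the $i$th elementary symmetric function;
$p_1 = \sum_i\lambda_i^2$, $p_2 = \sum_{i<j}\lambda_i^2\lambda_j^2$
and so forth. Let
$L(A) := \prod_i\frac{\lambda_i}{\tanh\lambda_i} = 1 + L_1 + L_2 + \cdots$ and $\hat A(A) := \prod_i\frac{\lambda_i}{2\sinh(\lambda_i/2)} = 1 + \hat A_1 + \hat A_2 + \cdots$
As $L_i$ and $\hat A_i$ are even symmetric functions of $\lambda$, one
can write $L_i = L_i(p_1(A),\ldots,p_i(A))$, and similarly for $\hat A_i$. For example,


$L = 1 + \frac13p_1 + \frac1{45}(7p_2 - p_1^2) + \cdots$
$\hat A = 1 - \frac1{24}p_1 + \frac1{5760}(7p_1^2 - 4p_2) + \cdots$


Substituting $(1/2\pi)R$ for $A$ then permits one to
define the Hirzebruch polynomial $L(R)$ and the $\hat A$
genus $\hat A(R)$.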
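
The low-order terms of $L$ and $\hat A$ can be recomputed from the generating functions $x/\tanh x$ and $(x/2)/\sinh(x/2)$ with exact rational arithmetic. The Python sketch below is one way to do this and is not taken from the article; the helper names are hypothetical. It expands each generating function as a power series in $t = x^2$ and then converts the quartic part of the product over $i$ into $p_1^2$ and $p_2$ using $\sum_i\lambda_i^4 = p_1^2 - 2p_2$.

```python
from fractions import Fraction
from math import factorial


def series_div(num, den, order):
    """Quotient of two power series (coefficient lists), assuming den[0] != 0."""
    out = []
    for n in range(order + 1):
        acc = num[n] - sum(out[k] * den[n - k] for k in range(n))
        out.append(acc / den[0])
    return out


ORDER = 2  # work in t = x**2 up to t**2, i.e. up to x**4

# cosh(x) and sinh(x)/x are even in x; list their coefficients in t = x**2.
cosh_t = [Fraction(1, factorial(2 * n)) for n in range(ORDER + 1)]
sinhc_t = [Fraction(1, factorial(2 * n + 1)) for n in range(ORDER + 1)]

# x / tanh(x) = cosh(x) / (sinh(x)/x)
l_gen = series_div(cosh_t, sinhc_t, ORDER)

# (x/2) / sinh(x/2) = 1 / (sinh(y)/y) with y = x/2, so t is replaced by t/4.
sinhc_half = [c / 4 ** n for n, c in enumerate(sinhc_t)]
one = [Fraction(1)] + [Fraction(0)] * ORDER
a_gen = series_div(one, sinhc_half, ORDER)


def quartic_part(gen):
    """Coefficients (of p1**2, of p2) in the quartic part of prod_i g(lambda_i**2).

    With g(t) = 1 + c1 t + c2 t^2 + ..., the quartic part is
    c2 * sum_i lambda_i^4 + c1^2 * sum_{i<j} lambda_i^2 lambda_j^2
    = c2 * (p1^2 - 2 p2) + c1^2 * p2.
    """
    c1, c2 = gen[1], gen[2]
    return c2, c1 * c1 - 2 * c2


print(l_gen, quartic_part(l_gen))
print(a_gen, quartic_part(a_gen))
```

The output reproduces the coefficients $\frac13$, $\frac{1}{45}(7p_2 - p_1^2)$, $-\frac1{24}$ and $\frac{1}{5760}(7p_1^2 - 4p_2)$ quoted above.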


The Chern Forms 


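Let $V$ be a $k$-dimensional complex vector bundle
over $M$. Let $\nabla$ be a Hermitian connection on $V$ and
let $\Omega$ be the associated curvature endomorphism.
The Chern forms $c_i \in C^\infty(\Lambda^{2i}M)$ are defined by
expanding


$\det\left(I + \frac{\sqrt{-1}}{2\pi}\Omega\right) = 1 + c_1 + \cdots + c_k$


As with the Hirzebruch polynomial and the $\hat A$ genus,
the Chern character and Todd genus are expressed
in terms of the generating functions:


$Td(\vec x) := \prod_\nu\frac{x_\nu}{1 - e^{-x_\nu}}$


and


$Ch(\vec x) := \sum_\nu e^{x_\nu}$


One has


$Td = 1 + Td_1 + Td_2 + \cdots$
$= 1 + \frac12c_1 + \frac1{12}(c_1^2 + c_2) + \cdots$

$Ch = ch_0 + ch_1 + ch_2 + \cdots$
$= k + c_1 + \frac12(c_1^2 - 2c_2) + \cdots$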

The Index Theorem 
The Gauss-Bonnet Theorem 


We return to the de Rham complex. Let


$\chi(M) := \sum_p(-1)^p\dim H^p(M;\mathbb R)$


be the Euler-Poincaré characteristic; $\chi(M) = 0$ if $m$ is
odd. Let $M$ have a simplicial structure with $n(k)$ cells
of dimension $k$; $n(0)$ is the number of vertices, $n(1)$ is the
number of edges, $n(2)$ is the number of triangles, etc.
Then


$\chi(M) = \sum_k(-1)^kn(k)$


so the Euler-Poincaré characteristic is a combinatorial
invariant. By the Hodge-de Rham theorem,


$\mathrm{index}(d + \delta) = \dim\ker(\Delta^e) - \dim\ker(\Delta^o)$
$= \chi(M)$


The Chern-Gauss-Bonnet theorem expresses this
invariant in terms of curvature


$\chi(M) = \int_M E_m$


where $E_m$ is the Euler form given above. If one twists
the de Rham complex to take coefficients in an
auxiliary vector bundle $V$, then no new information
results, since


$\mathrm{index}\{(d + \delta)_V\} = \chi(M)\cdot\dim(V)$
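
In dimension 2 both descriptions of $\chi(M)$ are easy to test numerically. The Python sketch below integrates $E_2 = \tau/(4\pi)$ over a round sphere and over a torus of revolution, and compares with the alternating cell count. The closed-form curvatures used (scalar curvature $2/r^2$ on the sphere, Gauss curvature $\cos v/(r(R + r\cos v))$ on the torus) are standard facts supplied here as assumptions, and the function names and the midpoint rule are choices made for the illustration.

```python
import math


def midpoint(f, a, b, n=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def euler_sphere(radius):
    """Integral of E_2 = tau / (4 pi) over the round 2-sphere, tau = 2 / radius**2."""
    tau = 2.0 / radius ** 2
    # Area element radius^2 sin(theta) dtheta dphi; the phi integral gives 2 pi.
    area = 2 * math.pi * midpoint(lambda th: radius ** 2 * math.sin(th), 0.0, math.pi)
    return tau / (4 * math.pi) * area


def euler_torus(big_r, small_r):
    """Integral of E_2 over a torus of revolution with radii big_r > small_r > 0."""
    def integrand(v):
        gauss = math.cos(v) / (small_r * (big_r + small_r * math.cos(v)))
        tau = 2.0 * gauss  # scalar curvature is twice the Gauss curvature
        area_density = small_r * (big_r + small_r * math.cos(v))
        return tau / (4 * math.pi) * area_density
    # The integrand does not depend on the angle u; the u integral gives 2 pi.
    return 2 * math.pi * midpoint(integrand, 0.0, 2 * math.pi)


def euler_combinatorial(cell_counts):
    """Alternating sum of the numbers n(k) of k-cells."""
    return sum((-1) ** k * n for k, n in enumerate(cell_counts))


print(euler_sphere(1.7), euler_combinatorial([4, 6, 4]))      # boundary of a tetrahedron
print(euler_torus(3.0, 1.0), euler_combinatorial([7, 21, 14]))  # 7-vertex torus
```

The curvature integrals agree with the cell counts up to quadrature error, and the result for the sphere does not depend on the radius, as expected of a topological invariant.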


The Hirzebruch Signature Theorem 


Let sign($M$) be the index of the signature complex
on a manifold of dimension $4k$; the index vanishes
in dimensions $m \equiv 2$ mod 4. Let $\star$ be the Hodge
duality operator. As $\star\Delta^p\star^{-1} = \Delta^{m-p}$, $\star$ preserves the
eigenspaces of the Laplacian. In particular, $\star$ induces
an isomorphism


$\star : H^p(M;\mathbb R) = \ker(\Delta^p)$
$\to H^{m-p}(M;\mathbb R) = \ker(\Delta^{m-p})$


which implements Poincaré duality. On forms of degree
$2k$, $\star^2 = \mathrm{Id}$. Decompose


$H^{2k}(M;\mathbb R) = H^{2k,+}(M;\mathbb R)\oplus H^{2k,-}(M;\mathbb R)$


into the $\pm1$ eigenspaces of $\star$; these may be identified
with $\ker(\Delta^{2k,\pm})$ acting on $C^\infty(\Lambda^{2k,\pm}M)$. As the
contributions to the signature away from the middle
dimension cancel,


$\mathrm{sign}(M) = \dim H^{2k,+}(M;\mathbb R) - \dim H^{2k,-}(M;\mathbb R)$


As with the de Rham complex, there is a
topological description of this invariant. If $\alpha$ and $\beta$
are closed $2k$ forms, one sets


$(\alpha, \beta) := \int_M\alpha\wedge\beta$


One can use Stokes' theorem to see that this
induces a symmetric bilinear form on the de
Rham cohomology groups $H^{2k}(M;\mathbb R)$. Poincaré
duality then shows that this symmetric bilinear
form is nondegenerate, so this is a form of type
$(p, q)$; sign($M$) is the signature of this quadratic
form:


sign(M) = q — p 


The Hirzebruch signature formula expresses $\mathrm{sign}(M)$
in terms of curvature; if $L$ is the Hirzebruch
polynomial described above and if $m = 4k$, then


$$\mathrm{sign}(M) = \int_M L_k$$


Let $V$ be an auxiliary coefficient bundle. Taking
coefficients in $V$ then yields the formula


$$\mathrm{sign}_V(M) = \sum_{4i+2j=m} 2^{j} \int_M L_i \wedge \mathrm{ch}_j(V)$$


The Index of the Yang-Mills Complex 


Let $\mathrm{YM}_V$ be the Yang-Mills complex with coefficients
in an auxiliary vector bundle $V$. The
index can be evaluated using the formulas given
above as


$$\mathrm{index}\{\mathrm{YM}_V\} = \tfrac{1}{2}\{\dim(V)\chi(M) - \mathrm{sign}(M, V)\}
= \tfrac{1}{2}\int_M \{\dim V\, E_4 - \dim V\, L_1 - 4\,\mathrm{ch}_2(V)\}$$


The Index of the Dolbeault Complex 


If $V$ is a holomorphic bundle over a complex
manifold $M$, then


$$\mathrm{index}\{\bar\partial_V\} = \sum_{2i+2j=m} \int_M \mathrm{Td}_i(M) \wedge \mathrm{ch}_j(V)$$


The index of the untwisted Dolbeault complex is 
called the arithmetic genus and denoted by ag(M). 


The Index of the Spin Complex 


If $M$ is a spin manifold and if $A_V$ is the Dirac
operator with coefficients in an auxiliary coefficient
bundle $V$, then


$$\mathrm{index}\{A_V\} = \sum_{4i+2j=m} \int_M \hat{A}_i(M) \wedge \mathrm{ch}_j(V)$$


The index of the spin complex is called the Â genus
and is denoted by $\hat{A}(M)$. If $M$ is a $\mathrm{Spin}^c$ manifold,
the appropriate formula becomes


$$\mathrm{index}\{A^c_V\} = \sum_{4i+2j+2k=m} \int_M \hat{A}_i(M) \wedge \mathrm{ch}_j(V) \wedge \tfrac{1}{k!}\theta^k$$


where $\theta = \tfrac{1}{2}c_1(L)$, $L$ being the complex line bundle
associated with the $\mathrm{Spin}^c$ structure.


Properties 


The classic elliptic complexes defined above are
multiplicative with respect to Cartesian product.
Suppose that $M_1$ and $M_2$ are Riemannian manifolds
with the appropriate structures. For the signature
complex, suppose $M_1$ and $M_2$ are oriented; for the
Dolbeault complex, suppose $M_1$ and $M_2$ are holomorphic;
for the spin complex, suppose $M_1$ and $M_2$
are spin. Taking the twisting coefficient bundle to
be trivial in the interests of simplicity, one has


$$\chi(M_1 \times M_2) = \chi(M_1)\chi(M_2)$$
$$\mathrm{sign}(M_1 \times M_2) = \mathrm{sign}(M_1)\,\mathrm{sign}(M_2)$$
$$\mathrm{ag}(M_1 \times M_2) = \mathrm{ag}(M_1)\,\mathrm{ag}(M_2)$$
$$\hat{A}(M_1 \times M_2) = \hat{A}(M_1)\hat{A}(M_2)$$


These complexes behave well under finite coverings.
Let $F \to M_2 \to M_1$ be a finite covering projection
with $|F|$ sheets. Then


$$\chi(M_2) = |F|\chi(M_1)$$
$$\mathrm{sign}(M_2) = |F|\,\mathrm{sign}(M_1)$$
$$\mathrm{ag}(M_2) = |F|\,\mathrm{ag}(M_1)$$
$$\hat{A}(M_2) = |F|\hat{A}(M_1)$$


The connected sum $M_1 \# M_2$ is defined by punching
out small disks about points $P_i$ in $M_i$ and then
joining along the spherical boundaries that remain.
It is necessary, of course, to smooth out the resulting
corners. Note that if $M_1$ and $M_2$ are complex
manifolds, then $M_1 \# M_2$ is no longer a complex
manifold in general. Since, for $m$ even,


$$\chi(S^m) = 2, \quad \mathrm{sign}(S^m) = 0, \quad\text{and}\quad \hat{A}(S^m) = 0$$


the following additivity results follow from the
integral formulas given above:


$$\chi(M_1 \# M_2) = \chi(M_1) + \chi(M_2) - 2$$
$$\mathrm{sign}(M_1 \# M_2) = \mathrm{sign}(M_1) + \mathrm{sign}(M_2)$$
$$\hat{A}(M_1 \# M_2) = \hat{A}(M_1) + \hat{A}(M_2)$$

Examples and Applications 


Let $S^m$ be the standard sphere and let $\mathbb{CP}^2$ be the
complex projective plane. One then has


$$\chi(S^4) = 2, \qquad \mathrm{sign}(S^4) = 0$$
$$\chi(S^2 \times S^2) = 4, \qquad \mathrm{sign}(S^2 \times S^2) = 0$$
$$\chi(\mathbb{CP}^2) = 3, \qquad \mathrm{sign}(\mathbb{CP}^2) = 1$$


In dimension 4, the Riemann-Roch formula yields


$$\mathrm{ag}(M^4) = \tfrac{1}{4}\{\chi(M) + \mathrm{sign}(M)\}$$


This would yield $\mathrm{ag}(S^4) = \tfrac{1}{2}$. Since $\tfrac{1}{2}$ is not an integer,
this shows that $S^4$ does not admit a complex
structure; a similar argument shows that $S^n$ does
not admit a complex structure for $n \ne 2, 6$, and it is
not known whether $S^6$ admits a holomorphic
structure; it does admit an almost-complex
structure.


If we set $M = \mathbb{CP}^2 \# \mathbb{CP}^2$, then

$$\mathrm{ag}(M) = \tfrac{1}{4}(3 + 3 - 2 + 1 + 1) = \tfrac{3}{2}$$


and thus $\mathbb{CP}^2 \# \mathbb{CP}^2$ does not admit a complex
structure. These examples are typical of the use of
the index theorem to prove the nonexistence of
certain structures.
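
The integrality test used in these examples is simple enough to
automate. The sketch below is illustrative only: the helper names
(`connected_sum`, `arithmetic_genus_dim4`, `passes_integrality_test`)
are hypothetical, and the input data are the values of the Euler
characteristic and signature quoted in the text.

```python
from fractions import Fraction

# (chi, sign) for the closed 4-manifolds quoted in the text.
S4 = (2, 0)
S2xS2 = (4, 0)
CP2 = (3, 1)


def connected_sum(a, b):
    """(chi, sign) of M1 # M2 in even dimension, using the additivity formulas."""
    return (a[0] + b[0] - 2, a[1] + b[1])


def arithmetic_genus_dim4(invariants):
    """Value of (chi + sign) / 4 predicted by Riemann-Roch in dimension 4."""
    chi, sign = invariants
    return Fraction(chi + sign, 4)


def passes_integrality_test(invariants):
    """A complex structure forces the arithmetic genus to be an integer."""
    return arithmetic_genus_dim4(invariants).denominator == 1


for name, data in [("S^4", S4), ("S^2 x S^2", S2xS2), ("CP^2", CP2),
                   ("CP^2 # CP^2", connected_sum(CP2, CP2))]:
    print(name, arithmetic_genus_dim4(data), passes_integrality_test(data))
```

Passing the test is only a necessary condition: it rules out a complex
structure when it fails, but proves nothing when it succeeds.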


The General Index Theorem 


Let $S(T^*M)$ be the sphere bundle of unit cotangent
vectors and let $D(T^*M)$ be the disk bundle of
cotangent vectors of length at most 1. Let


$$P : C^\infty(V_1) \to C^\infty(V_2)$$


be an elliptic pseudodifferential operator. The
leading symbol $p := \sigma_L(P)$ induces a smooth map


$$p : S(T^*M) \to \mathrm{End}(V_1, V_2).$$


We form $\Sigma(M)$ by gluing two copies of $D(T^*M)$
together along their common boundary $S(T^*M)$ and
we define a bundle $\Sigma(p, V_1, V_2)$ over $\Sigma(M)$ by gluing
$V_1$ to $V_2$ over $S(T^*M)$ using the clutching function $p$.
The Atiyah-Singer index theorem expresses the
index of $P$ in terms of cohomological data involving
the Chern character of the symbol bundle and the
characteristic classes of the tangent bundle of $M$. If
$\Sigma(M)$ is given a suitable orientation, then


$$\mathrm{index}(P) = \sum_{2i+4j=2m} \int_{\Sigma(M)} \mathrm{ch}_i(\Sigma(p, V_1, V_2)) \wedge \mathrm{Td}_j(M)$$


It specializes to the results given above for
the classical elliptic complexes. Conversely, by
using K-theoretic methods, the index theorem in
full generality can be derived from the special case
of the twisted signature complex.


Manifolds with Boundary 


If the boundary of M is nonempty, we must impose 
suitable boundary conditions. 


Local Boundary Conditions 


Choose local coordinates $x = (x^1, \dots, x^m)$ near the
boundary of $M$ so that $x^m$ is the geodesic distance to
the boundary. On the boundary, we can decompose
a differential form $\omega \in C^\infty(\Lambda M)$ in the form
$\omega = \omega_1 + dx^m \wedge \omega_2$, where $\omega_1$ and $\omega_2$ are tangential
differential forms. Absolute and relative boundary
conditions are defined by setting


$$B_a\omega := \omega_2|_{\partial M} \quad\text{and}\quad B_r\omega := \omega_1|_{\partial M}$$


Let $(d + \delta)_a$ and $(d + \delta)_r$ be the associated realizations.
These operators respect the grading of the
exterior algebra $\Lambda M = \Lambda^{ev}M \oplus \Lambda^{odd}M$ and define
elliptic complexes


$$(d + \delta)_a : C^\infty(\Lambda^{ev}M) \to C^\infty(\Lambda^{odd}M)$$
$$(d + \delta)_r : C^\infty(\Lambda^{ev}M) \to C^\infty(\Lambda^{odd}M)$$


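We consider a collection


$$I = \{1 \le i_1 < \cdots < i_p < m\}$$

of tangential indices and let

$$dx^I = dx^{i_1} \wedge \cdots \wedge dx^{i_p}$$


The associated absolute boundary conditions for the
Laplacian are defined by


$$B_a(\phi_I\,dx^I + \psi_I\,dx^m \wedge dx^I)
= (\psi_I|_{\partial M}\,dx^I) \oplus (\partial_m\phi_I|_{\partial M})\,dx^I$$


If $\star$ is the Hodge operator, then one sets dually:

$$B_r(\omega) = B_a(\star\omega)$$


Let $\Delta^p_a$ and $\Delta^p_r$ be the associated realizations of the
Laplacian with these boundary conditions. The
Hodge-de Rham theorem extends to this setting to
yield isomorphisms


$$\ker(\Delta^p_a) = H^p(M;\mathbb{R})$$


and

$$\ker(\Delta^p_r) = H^p(M, \partial M;\mathbb{R})$$


The Hodge $\star$ operator intertwines $\Delta^p_a$ and $\Delta^{m-p}_r$
and implements the Poincaré duality isomorphism
$H^p(M;\mathbb{R}) \cong H^{m-p}(M, \partial M;\mathbb{R})$. This also shows that

$$\mathrm{index}(d + \delta)_a = \sum_p (-1)^p \dim H^p(M;\mathbb{R}) = \chi(M)$$

and

$$\mathrm{index}(d + \delta)_r = \sum_p (-1)^p \dim H^p(M, \partial M;\mathbb{R})
= \chi(M, \partial M) = \chi(M) - \chi(\partial M)$$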


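Let $E_m$ be the Euler form if $m$ is even. We set
$E_m = 0$ if $m$ is odd. Let $L$ be the second fundamental
form. Let $A = (a_1, \dots, a_{m-1})$ and $B = (b_1, \dots, b_{m-1})$
be collections of distinct indices ranging from 1 to
$m - 1$. Set


$$E_{m,k} = \sum_{A,B} \frac{\varepsilon^A_B}{(8\pi)^k\, k!\, (m-1-2k)!\, \mathrm{vol}(S^{m-1-2k})}
\, R_{a_1 a_2 b_2 b_1} \cdots R_{a_{2k-1} a_{2k} b_{2k} b_{2k-1}}
\, L_{a_{2k+1} b_{2k+1}} \cdots L_{a_{m-1} b_{m-1}}$$


The Chern-Gauss-Bonnet theorem generalizes to
this setting to yield


$$\chi(M) = \mathrm{index}(d + \delta)_a = \int_M E_m + \sum_k \int_{\partial M} E_{m,k}$$


For example,


$$\chi(M^2) = \frac{1}{4\pi}\left\{\int_{M^2} R_{abba}\,dx + 2\int_{\partial M^2} L_{aa}\,dy\right\}$$


$$\chi(M^3) = \frac{1}{8\pi}\int_{\partial M^3} \{R_{abba} + L_{aa}L_{bb} - L_{ab}L_{ab}\}\,dy$$


$$\chi(M^4) = \frac{1}{32\pi^2}\int_{M^4} \{\tau^2 - 4|\rho|^2 + |R|^2\}\,dx
+ \frac{1}{24\pi^2}\int_{\partial M^4} \{3\tau L_{aa} + 6R_{amam}L_{bb} + 6R_{acbc}L_{ab}
+ 2L_{aa}L_{bb}L_{cc} - 6L_{ab}L_{ab}L_{cc} + 4L_{ab}L_{bc}L_{ac}\}\,dy$$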

The interior integral vanishes if $m$ is odd. The
boundary integral can be nonzero in any dimension.
Thus, in particular, the index of this elliptic
complex can be nonzero even if $m$ is odd; $\chi(D^m) = 1$
for any $m$. The index of $(d + \delta)_r$ is computed
similarly.


Spectral Boundary Conditions 


In contrast to the de Rham complex, there do not
exist local boundary conditions for the signature,
spin, and Dolbeault complexes. To simplify the
discussion, we assume that the metric is a product
near the boundary; there are appropriate compensating
terms involving the second fundamental form
in the more general setting. Let $A : C^\infty(V_1) \to
C^\infty(V_2)$ denote either the twisted signature or the
twisted spin complex; there are some additional
difficulties for the Dolbeault complex. Near the
boundary, we can express


$$A = \sigma(\partial_m + A_T)$$


where $A_T$ is a self-adjoint tangential operator of
Dirac type on $V_1|_{\partial M}$ and $\sigma$ is a unitary bundle
isomorphism from $V_1|_{\partial M}$ to $V_2|_{\partial M}$. Let $\{\phi_i, \lambda_i\}$ be the
discrete spectral resolution of $A_T$. One defines


$$\eta(A_T, s) = \sum_{\lambda_i \ne 0} \mathrm{sgn}(\lambda_i)\,|\lambda_i|^{-s}$$


as a measure of the spectral asymmetry of $A_T$. This
is well defined for $\mathrm{Re}(s) \gg 1$ and has a meromorphic
extension to the complex plane $\mathbb{C}$. It turns out that $s = 0$
is a regular value and one defines


$$\eta(A_T) := \tfrac{1}{2}\{\eta(A_T, s) + \dim\ker(A_T)\}\big|_{s=0}$$


The spectral boundary conditions can now be
imposed. Let $\Pi_S$ be orthogonal projection in
$L^2(V_1|_{\partial M})$ on the span of the eigensections of $A_T$
corresponding to non-negative eigenvalues and let
$A_S$ be the associated realization defined by this
boundary condition.

One can use the Atiyah-Patodi-Singer index
theorem to generalize the relations given above to
this setting. Let $\int_M a$ be the local integral given above
that involves the Hirzebruch $L$ polynomial for the
signature complex or the Â genus for the spin
complex. One then has


$$\mathrm{index}(A_S) = -\eta(A_T) + \int_M a$$


There are suitable correction formulas involving
integrals of polynomials in the second fundamental
form and in the curvature tensor if the structures are
not product near the boundary.


Equivariant Problems 
The Classical Lefschetz Formula 


Let $M$ be a compact Riemannian manifold without
boundary. Let $T$ be a smooth map from $M$ to $M$. Then
pullback $T^*$ induces an action on $C^\infty(\Lambda^p M)$ which
commutes with the exterior derivative $d$ and hence an
action on the de Rham cohomology groups $H^p(M;\mathbb{R})$.
The Lefschetz number of $T$ is then given by


$$L(T) = \sum_p (-1)^p \,\mathrm{tr}\{T^* \text{ on } H^p(M;\mathbb{R})\}$$


To illustrate the Lefschetz number, let $M = T^2$ be
the two-dimensional torus. Let $e^1 := dx^1$, let
$e^2 := dx^2$, and let $e^{12} := dx^1 \wedge dx^2$. Then,

$$H^0(T^2;\mathbb{R}) = 1 \cdot \mathbb{R}$$
$$H^1(T^2;\mathbb{R}) = e^1 \cdot \mathbb{R} \oplus e^2 \cdot \mathbb{R}$$
$$H^2(T^2;\mathbb{R}) = e^{12} \cdot \mathbb{R}$$

Let $T(x_1, x_2) = (n_{11}x_1 + n_{12}x_2,\; n_{21}x_1 + n_{22}x_2)$, where
the $n_{ij}$ are integers so that $T$ is well defined on the torus. Then,


$$T^*(1) = 1$$
$$T^*(e^1) = n_{11}e^1 + n_{12}e^2$$
$$T^*(e^2) = n_{21}e^1 + n_{22}e^2$$
$$T^*(e^{12}) = (n_{11}n_{22} - n_{12}n_{21})\,e^{12}$$


and, consequently, the Lefschetz number becomes


$$L(T) = \det(I - T^*) = 1 - (n_{11} + n_{22}) + (n_{11}n_{22} - n_{12}n_{21})$$
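
The torus computation can be verified mechanically. The sketch below is
an illustration rather than part of the original text; the function
names are hypothetical. It computes the alternating sum of traces on
$H^0$, $H^1$, $H^2$ and compares it with the determinant of the identity
minus the integer matrix.

```python
def lefschetz_number_torus(n):
    """Alternating sum of traces of T^* on H^0, H^1, H^2 of the 2-torus.

    `n` is the integer matrix ((n11, n12), (n21, n22)) defining the map T.
    """
    (n11, n12), (n21, n22) = n
    trace_h0 = 1                      # T^* fixes the constant function 1
    trace_h1 = n11 + n22              # trace on span{e^1, e^2}
    trace_h2 = n11 * n22 - n12 * n21  # T^* acts on e^12 by the determinant
    return trace_h0 - trace_h1 + trace_h2


def det_identity_minus(n):
    """Determinant of I - n for a 2 x 2 matrix n."""
    (n11, n12), (n21, n22) = n
    return (1 - n11) * (1 - n22) - n12 * n21


for matrix in [((1, 0), (0, 1)), ((2, 1), (1, 1)), ((-1, 0), (0, -1)), ((0, -1), (1, 0))]:
    assert lefschetz_number_torus(matrix) == det_identity_minus(matrix)
    print(matrix, lefschetz_number_torus(matrix))
```

The identity map gives 0, the Euler characteristic of the torus, as
expected from setting $T = \mathrm{Id}$.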


The classical Lefschetz fixed-point formula expresses
$L$ in terms of data for the fixed-point set $F(T)$ and is an
example of the equivariant index theorem. One
assumes that the fixed-point set of $T$ consists of smooth
submanifolds $N_1, \dots, N_k$ and that the induced map
$dT_\nu$ on the normal bundles of these manifolds is
nondegenerate. This means that $\det(I - dT_\nu) \ne 0$, that
is, that there are no infinitesimal normal directions
which are left fixed. One then has


$$L(T) = \sum_i \mathrm{sign}(\det(I - dT_\nu))\,\chi(N_i)$$


The Lefschetz Formula for the Other Classical 
Elliptic Complexes 


Let $T$ be an orientation-preserving isometry of $M$.
When dealing with the spin complex, suppose that $T$
preserves the spin structure; when dealing with the
Dolbeault complex, suppose that $T$ preserves the
holomorphic structure. If


$$A : C^\infty(V_1) \to C^\infty(V_2)$$


is one of the classical elliptic complexes, then by
assumption $T^*$ commutes with $A$ and hence preserves
the eigenspaces of the associated Laplacians.
The Lefschetz number is defined by setting


$$L_A(T) := \mathrm{tr}(T^* \text{ on } \ker(A^*A)) - \mathrm{tr}(T^* \text{ on } \ker(AA^*))$$


Setting $T = \mathrm{Id}$, one recovers the standard index.

To simplify the discussion, we assume henceforth
that $T$ is an orientation-preserving isometry of $M$
with only isolated fixed points. Let $\{\theta_1, \dots, \theta_{m/2}\}$ be
the rotation angles of $dT$ at a fixed point $x$ of $T$. Set


$$\lambda_j := \cos(\theta_j) + \sqrt{-1}\,\sin(\theta_j)$$


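We take the sum over the isolated fixed points $x$ and
then the product over the rotation angles $1 \le j \le
m/2$ to express

$$L_{\mathrm{sign}}(T) = \sum_x \prod_j \left\{-\sqrt{-1}\,\cot(\theta_j/2)\right\}$$
$$L_{\mathrm{spin}}(T) = \sum_x \prod_j \left\{-\tfrac{\sqrt{-1}}{2}\,\csc(\theta_j/2)\right\}$$
$$L_{\mathrm{Dol}}(T) = \sum_x \prod_j \frac{\lambda_j}{\lambda_j - 1}$$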


In considering the spin complex, we assume $T$
preserves the spin structure. This permits us to lift $dT$
from $SO(m)$ to $\mathrm{Spin}(m)$ and defines liftings of the
rotation angles $\theta_j$ from $[0, 2\pi]$ to $[0, 4\pi]$ in such a way
that the formula given above for the spin complex is
well defined. In considering the Dolbeault complex,
we assume that $T$ preserves a complex structure, so the
formula given above for the Dolbeault complex
involving the complex eigenvalues $\lambda_j$ is well defined.
Introduction


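Given $1 < p < n$, it was shown by Sobolev that there
exists a constant $K > 0$ such that, for any $u \in
C_0^\infty(\mathbb{R}^n)$, the space of smooth functions with
compact support in $\mathbb{R}^n$,


$$\left(\int_{\mathbb{R}^n} |u|^{p^*}\,dx\right)^{1/p^*} \le K\left(\int_{\mathbb{R}^n} |\nabla u|^p\,dx\right)^{1/p} \qquad [1]$$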
where $\nabla u$ is the gradient of $u$ and $p^* = np/(n - p)$. It
is easily seen that $p^*$ in [1] is critical in the following
sense. Let $\|\cdot\|_p$ stand for the $L^p$-norm. For $u \in
C_0^\infty(\mathbb{R}^n)$ and $\lambda > 0$, let also $u_\lambda$ be the function given
by $u_\lambda(x) = u(\lambda x)$. For $p$ and $q$ two real numbers,


$$\|\nabla u_\lambda\|_p = \lambda^{1 - n/p}\,\|\nabla u\|_p$$


$$\|u_\lambda\|_q = \lambda^{-n/q}\,\|u\|_q$$


Letting $\lambda \to 0$ and $\lambda \to +\infty$, it follows that an
inequality like $\|u\|_q \le K\|\nabla u\|_p$ holds true for all $u$
(in particular for the $u_\lambda$'s) only when $q = p^*$. To
prove [1], the approach of Sobolev was based on the
straightforward representation formula


$$u(x) = \frac{\Gamma(n/2)}{2\pi^{n/2}} \int_{\mathbb{R}^n} \sum_{i=1}^n \frac{x_i - y_i}{|x - y|^n}\, D_i u(y)\,dy$$

where $\Gamma$ is the Gamma function, and on an
$n$-dimensional version of a theorem of Hardy-Littlewood
concerning fractional integrals that we
apply to the right-hand side of the above representation
formula. More direct arguments were later
discovered in independent works by Gagliardo and
Nirenberg. In particular, the explicit inequality


$$\left(\int_{\mathbb{R}^n} |u|^{n/(n-1)}\,dx\right)^{(n-1)/n}
\le \frac{1}{2}\prod_{i=1}^n \left(\int_{\mathbb{R}^n} |D_i u|\,dx\right)^{1/n}
\le \frac{1}{2}\int_{\mathbb{R}^n} |\nabla u|\,dx \qquad [2]$$


was proved to hold, where $D_i$ is the partial
derivative $D_i = \partial/\partial x_i$. Inequality [2] is of the form
[1] when $p = 1$, since $1^* = n/(n - 1)$. By geometric
measure theory, and the coarea formula, it can be
expressed as an isoperimetric type inequality.
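
The scaling argument that singles out $p^*$ amounts to matching two
exponents of $\lambda$. The following sketch, with hypothetical function
names and exact rational arithmetic, checks that the exponents agree
precisely at $q = np/(n - p)$; it is an illustration, not part of the
original article.

```python
from fractions import Fraction


def gradient_scaling_exponent(n, p):
    """Exponent a with ||grad u_lam||_p = lam**a * ||grad u||_p, u_lam(x) = u(lam * x)."""
    return 1 - Fraction(n) / Fraction(p)


def function_scaling_exponent(n, q):
    """Exponent b with ||u_lam||_q = lam**b * ||u||_q."""
    return -Fraction(n) / Fraction(q)


def critical_exponent(n, p):
    """The Sobolev critical exponent p* = n p / (n - p), for 1 <= p < n."""
    p = Fraction(p)
    if not 1 <= p < n:
        raise ValueError("need 1 <= p < n")
    return n * p / (n - p)


n, p = 3, 2
q = critical_exponent(n, p)
# Both sides of ||u_lam||_q <= K ||grad u_lam||_p scale with the same power
# of lam exactly when q is the critical exponent.
print(q, gradient_scaling_exponent(n, p), function_scaling_exponent(n, q))
```

For any other $q$ the two exponents differ, so letting $\lambda \to 0$ or
$\lambda \to +\infty$ would contradict the inequality.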

There have been several symbols and several
definitions for Sobolev spaces. Before they became
generally associated with the name of Sobolev, they
were sometimes referred to by other names, for
instance, as "Beppo Levi spaces." We often find two
definitions and two notations in the literature. For $\Omega$
a domain in $\mathbb{R}^n$, $p \ge 1$ real, and $u$ of class $C^m$ in $\Omega$,
we let


$$\|u\|_{m,p} = \sum_{0 \le |\alpha| \le m} \|D^\alpha u\|_p \qquad [3]$$


when the right-hand side makes sense, where $\|\cdot\|_p$ is
the $L^p$-norm, $\alpha = (\alpha_1, \dots, \alpha_n)$ is a multi-index,
$|\alpha| = \sum_i \alpha_i$ and $D^\alpha = D_1^{\alpha_1} \cdots D_n^{\alpha_n}$. We define


$$H^{m,p}(\Omega) = \text{the completion of } \{u \in C^m(\Omega) \text{ s.t. } \|u\|_{m,p} < +\infty\}
\text{ with respect to the norm } \|\cdot\|_{m,p}$$

$$W^{m,p}(\Omega) = \{u \in L^p(\Omega) \text{ s.t. } D^\alpha u \in L^p(\Omega) \text{ for all } 0 < |\alpha| \le m\}$$


where $D^\alpha u$ is the weak (or distributional) partial
derivative of $u$ with respect to the multi-index $\alpha$. Both
$H^{m,p}(\Omega)$ and $W^{m,p}(\Omega)$ are Banach spaces (and even
Hilbert when $p = 2$). It is easily seen that $H^{m,p}(\Omega) \subset
W^{m,p}(\Omega)$, but we had to wait for the work of Meyers
and Serrin to realize that $H^{m,p}(\Omega) = W^{m,p}(\Omega)$. The
spaces $H^{m,p}(\Omega)$, also denoted $W^{m,p}(\Omega)$, are referred to
as Sobolev spaces. The spaces $H_0^{m,p}(\Omega)$, also denoted
$W_0^{m,p}(\Omega)$, are defined as the closure of $C_0^\infty(\Omega)$ in
$H^{m,p}(\Omega)$, where $C_0^\infty(\Omega)$ is the space of smooth
functions with compact support in $\Omega$.
Inequality [1] states that the Sobolev space
$H^{1,p}(\mathbb{R}^n)$ is naturally embedded in the Lebesgue
space $L^{p^*}(\mathbb{R}^n)$, a particular case of what we now
refer to as Sobolev embeddings.


Sobolev Inequalities and the Sobolev 
Embedding Theorem in Its First Part 


Let $m$ be an integer and let $p \ge 1$ be real. The
Sobolev space $H^{m,p}(\mathbb{R}^n)$, also denoted by $W^{m,p}(\mathbb{R}^n)$,
is defined in one of the two equivalent ways:


$$H^{m,p}(\mathbb{R}^n) = \text{the completion of } \{u \in C^\infty(\mathbb{R}^n) \text{ s.t. } \|u\|_{m,p} < +\infty\}
\text{ with respect to the norm } \|\cdot\|_{m,p}$$


or


$$H^{m,p}(\mathbb{R}^n) = \{u \in L^p(\mathbb{R}^n) \text{ s.t. } D^\alpha u \in L^p(\mathbb{R}^n) \text{ for all } 0 < |\alpha| \le m\}$$


where $D^\alpha u$ is the weak (or distributional) partial
derivative of $u$ with respect to the multi-index $\alpha$, and
$\|\cdot\|_{m,p}$ is as in [3]. The Sobolev space $(H^{m,p}(\mathbb{R}^n),
\|\cdot\|_{m,p})$ is a Banach space, and even a Hilbert space
when $p = 2$. The space is reflexive when $p > 1$, and
we also have that $H^{m,p}(\mathbb{R}^n) = H_0^{m,p}(\mathbb{R}^n)$, where
$H_0^{m,p}(\mathbb{R}^n)$ is defined as the closure of $C_0^\infty(\mathbb{R}^n)$ in
$H^{m,p}(\mathbb{R}^n)$. What we usually refer to as the first part
of Sobolev inequalities can be expressed as follows.


Sobolev embeddings (Part I). For $p$, $q$ two real
numbers with $1 \le q < p$, and $k$, $m$ two integers with
$0 \le m < k$, if $1/p = 1/q - (k - m)/n$, then $H^{k,q} \subset
H^{m,p}$, and there exists $K > 0$ such that $\|u\|_{m,p} \le
K\|u\|_{k,q}$ for all $u \in H^{k,q}$.


The Sobolev theorem in its first part states that
the above Sobolev embeddings (resp. inequalities)
hold true for the Euclidean space. A particular case
of interest is when $k = 1$. In this case, we get, as in
the introduction, that for any $1 \le p < n$, $H^{1,p}(\mathbb{R}^n) \subset
L^{p^*}(\mathbb{R}^n)$, where $p^* = np/(n - p)$. The embedding for
the Euclidean space reduces to the Sobolev inequality
[1]. An important remark is that there is a
hierarchy for Sobolev embeddings. In particular,
if $H^{1,1} \subset L^{1^*}$, $1^* = n/(n - 1)$, then all the
other embeddings $H^{k,q} \subset H^{m,p}$ hold true. Thanks to
this remark, the Sobolev embedding theorem for
Euclidean space easily follows from an inequality
like [2]. The hierarchy for Sobolev embeddings is an
easy consequence of Hölder's inequalities when
$k = 1$, and of Hölder's inequalities together with
Kato's inequality when $k > 1$.
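
The exponent relation in Part I is easy to tabulate. The helper below
is a small illustrative sketch (the name `embedding_target_exponent` is
hypothetical): it returns the $p$ determined by $1/p = 1/q - (k - m)/n$,
or `None` when the right-hand side is not positive, which is the regime
covered by the second part of the theorem or by the borderline case.

```python
from fractions import Fraction


def embedding_target_exponent(n, k, m, q):
    """Return p with 1/p = 1/q - (k - m)/n, or None if that quantity is <= 0."""
    if not 0 <= m < k:
        raise ValueError("need 0 <= m < k")
    if q < 1:
        raise ValueError("need q >= 1")
    inverse_p = 1 / Fraction(q) - Fraction(k - m, n)
    return 1 / inverse_p if inverse_p > 0 else None


# H^{1,2}(R^3) embeds in L^6(R^3): the case k = 1, m = 0 of the theorem.
print(embedding_target_exponent(3, 1, 0, 2))

# Embeddings compose: H^{2,2} -> H^{1,10/3} -> L^{10} in dimension 5
# gives the same exponent as the direct embedding H^{2,2} -> L^{10}.
step = embedding_target_exponent(5, 2, 1, 2)
print(step, embedding_target_exponent(5, 1, 0, step), embedding_target_exponent(5, 2, 0, 2))
```

The composition check reflects the hierarchy mentioned above: the
exponents obtained by lowering the order one derivative at a time agree
with the one obtained in a single step.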
There are several extensions of Sobolev inequalities
in the literature. Famous extensions were
discovered by Gagliardo and Nirenberg. The Nash
inequality, which reads as


$$\left(\int_{\mathbb{R}^n} u^2\,dx\right)^{(n+2)/n} \le K\left(\int_{\mathbb{R}^n} |u|\,dx\right)^{4/n} \int_{\mathbb{R}^n} |\nabla u|^2\,dx \qquad [4]$$


for all $u \in H^{1,2}(\mathbb{R}^n)$, is one of the Gagliardo-Nirenberg
inequalities. The Nash inequality easily
follows from [1] when $p = 2$ and Hölder's inequality.
There are also extensions of Sobolev spaces, for
instance, spaces of BV-functions or Orlicz-Sobolev
spaces.


The Sobolev Embedding Theorem in Its 
Second Part 


For $m$ integer, let $C_B^m(\mathbb{R}^n)$ be the space of functions
of class $C^m$ in $\mathbb{R}^n$ for which the norm


$$\|u\|_{C^m} = \sum_{0 \le |\alpha| \le m} \sup_{x \in \mathbb{R}^n} |D^\alpha u(x)|$$


is finite. What we usually refer to as the second part
of Sobolev inequalities can be expressed as follows.


Sobolev embeddings (Part II). For $q \ge 1$ a real
number, and $k$, $m$ two integers with $0 \le m < k$, if
$1/q - (k - m)/n < 0$, then $H^{k,q} \subset C_B^m$, and there
exists $K > 0$ such that $\|u\|_{C^m} \le K\|u\|_{k,q}$ for all
$u \in H^{k,q}$.


The Sobolev theorem in its second part states that
the above Sobolev embeddings (resp. inequalities)
hold true for the Euclidean space. Refinements were
then obtained by Morrey with embeddings in
Hölder spaces. Let, for instance, $C^{0,\alpha}(\mathbb{R}^n)$ be the
Hölder space of continuous functions in $\mathbb{R}^n$ for
which the norm


$$\|u\|_{C^{0,\alpha}} = \sup_{x \in \mathbb{R}^n} |u(x)| + \sup_{x \ne y} \frac{|u(x) - u(y)|}{|x - y|^\alpha}$$

is finite. For $k = 1$, $m = 0$, and $q \ge 1$ such that $1/q -
1/n < 0$, the embedding $H^{1,q}(\mathbb{R}^n) \subset C_B^0(\mathbb{R}^n)$ can be
refined into an embedding like $H^{1,q}(\mathbb{R}^n) \subset C^{0,\alpha}(\mathbb{R}^n)$,
where $\alpha \in (0,1)$ is such that $1/q - (1 - \alpha)/n \le 0$.


The Case of Domains and the Kondrakov 
Theorem 


The Sobolev embeddings in their first and second
parts extend to regular domains $\Omega$. A typical
condition is that $\Omega$ satisfies a cone property. When
$\Omega$ is bounded, and thus of finite volume, an
embedding like $H^{1,p}(\Omega) \subset L^{p^*}(\Omega)$ implies that we
also have that $H^{1,p}(\Omega) \subset L^q(\Omega)$ for all $1 \le q \le p^*$.
The Kondrakov theorem states that such embeddings
are all compact, unless $q = p^*$, in the sense that
bounded sequences of functions in $H^{1,p}$ possess
converging subsequences in $L^q$.

For $p \ge 1$ real, the Sobolev embedding theorem in
its first part provides embeddings of $H_0^{1,p}$ into
Lebesgue spaces when $p < n$, while the Sobolev
embedding theorem in its second part provides
embeddings of $H_0^{1,p}$ into Hölder spaces when $p > n$.
For $p = n$, it is false that $H_0^{1,n}$ can be embedded
into $L^\infty$. However, when $\Omega$ is bounded, we can
prove that $\exp(u) \in L^1(\Omega)$ when $u \in H_0^{1,n}(\Omega)$, and
that


$$\int_\Omega \exp(u)\,dx \le K\exp\left(\mu\|\nabla u\|_n^n\right)$$


where $\mu, K > 0$ are independent of $u$. We also have
that


$$\int_\Omega \exp\left(\mu|u|^{n/(n-1)}\right)dx \le K$$


for all $u \in H_0^{1,n}(\Omega)$ such that $\|\nabla u\|_n \le 1$, where $\mu$,
$K > 0$ are independent of $u$. Such inequalities are often
referred to as Moser-Trudinger type inequalities.


The Case of Riemannian Manifolds 


Riemannian manifolds are natural extensions of
Euclidean space. For $(M, g)$ a Riemannian manifold,
$m$ integer, and $p \ge 1$ real, we define the Sobolev
space $H^{m,p}(M)$ by


$$H^{m,p}(M) = \text{the completion of } \{u \in C^\infty(M) \text{ s.t. } \|u\|_{m,p} < +\infty\}
\text{ with respect to the norm } \|\cdot\|_{m,p}$$


where $\|u\|_{m,p} = \sum_{i=0}^m \|\nabla^i u\|_p$, $\nabla^i u$ is the $i$th covariant
derivative of $u$, and $\|\cdot\|_p$ is the $L^p$-norm
in $(M, g)$. A notation like $\|\nabla^i u\|_p$ stands for the
$L^p$-norm of the pointwise norm $|\nabla^i u|$ of $\nabla^i u$. Sobolev
spaces on manifolds are Banach spaces, even Hilbert
when $p = 2$, and they are reflexive when $p > 1$. They
do not depend on the metric when $M$ is compact.
For compact Riemannian manifolds, everything
works as for bounded domains. The Sobolev
embeddings in their first and second parts remain
valid. The Kondrakov theorem also remains valid.
However, since constant functions are in Sobolev
spaces when the manifold is compact, the $L^p$-norm
of $u$ in the $H^{1,p}$-norm of $u$ should be added to the
right-hand side in inequalities like [1]. More
precisely, if $(M, g)$ is a compact Riemannian
manifold of dimension $n$, and $1 \le p < n$, then the
inequality for the embedding $H^{1,p}(M) \subset L^{p^*}(M)$
reads as: there exists $K > 0$ such that for any
$u \in H^{1,p}(M)$,


$$\left(\int_M |u|^{p^*}\,dv_g\right)^{p/p^*} \le K\left(\int_M |\nabla u|^p\,dv_g + \int_M |u|^p\,dv_g\right) \qquad [5]$$


where $dv_g$ is the Riemannian volume element with
respect to $g$. When $(M, g)$ is no longer compact, the
Sobolev embedding theorem might become false. A
nontrivial key observation is that a Sobolev inequality
like [5] on a complete manifold $(M, g)$ implies the
existence of a uniform (with respect to the center)
lower bound for the volume of balls of radius 1. It
follows that for any $n \ge 2$, there exist complete
Riemannian $n$-manifolds $(M, g)$ for which, for any
$p \in [1, n)$, $H^{1,p}(M) \not\subset L^{p^*}(M)$. Possible examples are
warped products of the real line $\mathbb{R}$ and the
$(n - 1)$-sphere $S^{n-1}$. When the Ricci curvature is
bounded from below, the condition that there is a
uniform (with respect to the center) lower bound for
the volume of balls of radius 1 is necessary and
sufficient in order to get that the Sobolev embeddings
are valid.


Isoperimetric and Euclidean 
Type Inequalities 


Let $(M, g)$ be a complete Riemannian $n$-manifold.
Euclidean type inequalities are said to hold on $(M, g)$
if, for some $1 \le p < n$, there exists $K > 0$ such that,
for any $u \in H^{1,p}(M)$,


$$\left(\int_M |u|^{p^*}\,dv_g\right)^{1/p^*} \le K\left(\int_M |\nabla u|^p\,dv_g\right)^{1/p} \qquad [6]$$


where $p^* = np/(n - p)$. As for the Euclidean space, if
the above inequality holds for some $p_0$, then it
holds, with distinct $K$, for all $p_0 \le p < n$. In
particular, if the inequality holds for $p = 1$, it holds
for all $p$'s. The inequality when $p = 1$ was shown to
be true by Hoffman and Spruck when the manifold
is simply connected of nonpositive sectional curvature.
Such manifolds are referred to as Cartan-Hadamard
manifolds. The inequality when $p = 2$ is
related to the nonparabolicity of the manifold,
namely the existence of a minimal Green's function,
and to the behavior of the minimal Green's function.

By geometric measure theory and the coarea
formula, [6] when $p = 1$ is equivalent to the
isoperimetric inequality


$$\mathrm{Area}_g(\partial\Omega) \ge \frac{1}{C}\,\mathrm{Vol}_g(\Omega)^{1 - 1/n} \qquad [7]$$


where $C > 0$, $\Omega$ is a smooth bounded domain in
$M$, $\mathrm{Area}_g(\partial\Omega)$ is the volume of $\partial\Omega$ for the metric
induced by $g$, and $\mathrm{Vol}_g(\Omega)$ is the volume of $\Omega$ with
respect to $g$. Moreover, the constants $C$ and $K$
(for $p = 1$) are the same in the sense that if [6] for
$p = 1$ holds with $K$, then [7] holds with $C = K$, and
if [7] holds with $C$, then [6] for $p = 1$ holds with
$K = C$.

The sharp constant for the isoperimetric inequality
[7] in Euclidean space is known. When $n = 2$ its
value is $C(2) = 1/(2\sqrt{\pi})$ and the sharp isoperimetric
inequality is the well-known inequality $L^2 \ge 4\pi A$,
where $A$ is the volume of a smooth bounded domain
in $\mathbb{R}^2$, and $L$ is the length of its boundary. For
arbitrary $n$, the sharp constant $C(n)$ for the isoperimetric
inequality is given by


$$C(n) = \frac{1}{n}\left(\frac{n}{\omega_{n-1}}\right)^{1/n} \qquad [8]$$


where $\omega_{n-1}$ is the volume of the unit $(n - 1)$-sphere.
Moreover, still for the Euclidean space, equality
holds in the sharp isoperimetric inequality if and
only if $\Omega$ is a ball. A famous conjecture concerning
sharp isoperimetric inequalities, often referred to as
the Cartan-Hadamard conjecture, is that the sharp
isoperimetric inequality holds on Cartan-Hadamard
manifolds. Thanks to works by Croke, Kleiner, and
Weil, the conjecture is known to be true in
dimensions 2, 3, and 4. From the Bishop-Gromov
comparison theorem, we also get that the only
complete manifold of non-negative Ricci curvature
for which the sharp isoperimetric inequality holds is
the Euclidean space itself.

The sharp constants $K = K(n, p)$ for [6] when $p > 1$
have been computed in Euclidean space by Aubin,
Rodemich, and Talenti. The extremal functions were
also computed, where, by definition, an extremal
function is a function which realizes the case of
equality in the inequality. We get that


$$K(n, p) = \frac{1}{n}\left(\frac{n(p - 1)}{n - p}\right)^{(p-1)/p}
\left(\frac{\Gamma(n + 1)}{\Gamma(n/p)\,\Gamma(n + 1 - n/p)\,\omega_{n-1}}\right)^{1/n} \qquad [9]$$

where, as above, $\Gamma$ is the gamma function. Moreover,
$u$ is an extremal function for the sharp
inequality in Euclidean space if and only if, up to a
scale factor,

$$u(x) = \left(\lambda + |x - a|^{p/(p-1)}\right)^{1 - n/p} \qquad [10]$$

for some $\lambda > 0$, and $a \in \mathbb{R}^n$. When $p = 2$, the
functions $u$ in [10] are both the only extremal
functions for the sharp Sobolev inequality in Euclidean
space, and the only positive solutions of the equation
$\Delta u = u^{2^* - 1}$ in $\mathbb{R}^n$, where $\Delta = -\sum_i D_i^2$ is the Laplace-Beltrami
operator (the usual Laplacian with a minus
sign in front of it). Sharp constants are also known for
several of the Gagliardo-Nirenberg inequalities in
Euclidean space. The sharp constant for the Nash
inequality in Euclidean space was computed by Carlen
and Loss. If the sharp isoperimetric inequality holds on
a complete Riemannian $n$-manifold, then the sharp
inequalities [6] hold for all $1 \le p < n$.
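
The sharp constants can be evaluated numerically. The sketch below is
illustrative and rests on the standard closed forms: the isoperimetric
constant $\frac{1}{n}(n/\omega_{n-1})^{1/n}$ and the Aubin-Talenti constant
$K(n,p)$ for $1 < p < n$, with $\omega_{n-1}$ the volume of the unit
$(n-1)$-sphere. The function names are hypothetical.

```python
import math


def sphere_volume(k):
    """Volume of the unit k-sphere S^k in R^(k+1): 2 pi^((k+1)/2) / Gamma((k+1)/2)."""
    return 2.0 * math.pi ** ((k + 1) / 2.0) / math.gamma((k + 1) / 2.0)


def isoperimetric_constant(n):
    """Sharp Euclidean constant C(n) = (1/n) * (n / omega_{n-1})**(1/n)."""
    return (1.0 / n) * (n / sphere_volume(n - 1)) ** (1.0 / n)


def sobolev_constant(n, p):
    """Sharp Euclidean Sobolev constant K(n, p) for 1 < p < n."""
    if not 1 < p < n:
        raise ValueError("need 1 < p < n")
    first = (n * (p - 1) / (n - p)) ** ((p - 1) / p)
    second = math.gamma(n + 1) / (
        math.gamma(n / p) * math.gamma(n + 1 - n / p) * sphere_volume(n - 1)
    )
    return first * second ** (1.0 / n) / n


# K(n, p) approaches the isoperimetric constant as p decreases to 1.
for n in (2, 3, 4):
    print(n, isoperimetric_constant(n), sobolev_constant(n, 1.0 + 1e-9))

# For p = 2 the value can be compared with 4 / (n (n - 2) omega_n^(2/n)).
n = 3
print(sobolev_constant(n, 2) ** 2, 4.0 / (n * (n - 2) * sphere_volume(n) ** (2.0 / n)))
```

The limit $p \to 1$ illustrates the statement that the constants in the
$p = 1$ Sobolev inequality and in the isoperimetric inequality coincide.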


Sharp Inequalities on Compact 
Riemannian Manifolds 


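The study of sharp Sobolev inequalities on compact
manifolds is often referred to as the AB program for
Sobolev inequalities. For $(M, g)$ a compact Riemannian
$n$-manifold, and $1 \le p < n$, [5] can be rewritten
in two different forms:


$$\left(\int_M |u|^{p^*}\,dv_g\right)^{1/p^*} \le A\left(\int_M |\nabla u|^p\,dv_g\right)^{1/p}
+ B\left(\int_M |u|^p\,dv_g\right)^{1/p} \qquad [11]$$


and


$$\left(\int_M |u|^{p^*}\,dv_g\right)^{p/p^*} \le A'\int_M |\nabla u|^p\,dv_g
+ B'\int_M |u|^p\,dv_g \qquad [12]$$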
M 


where $A$, $B$, $A'$, $B'$ are positive constants independent
of $u$. An easy remark is that if [12] holds with
constants $A'$ and $B'$, then [11] holds with $A = (A')^{1/p}$
and $B = (B')^{1/p}$. The sharp first (resp. second)
constants in [11] and [12] are defined as the lowest
possible values for $A$ and $A'$ (resp. for $B$ and $B'$) in
[11] and [12]. The sharp first constants are
independent of the manifold and are given by
$A' = A^p = K(n, p)^p$, where $K(n, p)$ is as in [9]. The
sharp second constants depend on the manifold
and are given by $B' = B^p = V_g^{-p/n}$, where $V_g$ is the
volume of $(M, g)$. A typical question in the AB
program is to know whether or not we can take $A$
or $B$ to be the sharp constants in [11] and, similarly,
whether or not we can take $A'$ or $B'$ to be the sharp
constants in [12]. Another typical question in the AB
program is whether or not there are nonzero
extremal functions for the saturated form of the
sharp inequalities when they are valid. Concerning
the B-part of the program, the sharp inequality [11]
with $B = V_g^{-1/n}$ is true on any manifold, and constant
functions are extremal functions. On the other hand,
it can be proved that the stronger [12] with
$B' = V_g^{-p/n}$ is always false when $p > 2$, whatever
the manifold. Concerning the A-part of the
AB program, Hebey and Vaugon proved that the
sharp inequality [12] with $A' = K(n, 2)^2$ is true on
any manifold. In other words, for any compact
Riemannian manifold $(M, g)$ of dimension $n \ge 3$,
there exists $B' > 0$ such that, for any $u \in H^{1,2}(M)$,


$$\left(\int_M |u|^{2^*}\,dv_g\right)^{2/2^*} \le K(n, 2)^2\int_M |\nabla u|^2\,dv_g
+ B'\int_M u^2\,dv_g \qquad [13]$$
M 


We then get the saturated form of [13] by taking
$B' = B'(g)$ to be the lowest possible $B'$ in [13]. In
general, when $p \ne 2$, we can prove that the sharp
inequality [11] with $A = K(n, p)$ is true on any
manifold, and that there are nonzero extremal
functions for the saturated form of the sharp inequality.
On the other hand, the stronger [12] with
$A' = K(n, p)^p$ when $p > 2$ is false when the curvature
is positive, but true when the curvature is negative.
The $p = 2$ case in the A-part of the AB program is of
importance for its connection with the Yamabe
problem. The $p = 1$ case in the A-part of the AB
program is of importance for its connection with the
isoperimetric inequality. The AB program has also
been considered for Gagliardo-Nirenberg inequalities,
including the Nash inequality, and Sobolev-Poincaré
inequalities on compact manifolds.
Introduction


Infinite-dimensional Hamiltonian systems arise in
many areas of pure and applied mathematics and in
mathematical physics. These are partial differential
equations (PDEs) which can be written as evolution
equations (dynamical systems) in the form


$$\dot{F} = \{F, H\}$$


where $H$ is the Hamiltonian ("energy") and $\{\cdot,\cdot\}$ is a
Poisson bracket on an infinite-dimensional phase space,
called a Poisson manifold. Finite-dimensional
Hamiltonian systems are ordinary differential
evolution equations on finite-dimensional phase spaces,
for which general existence and uniqueness theorems
for solutions exist. This is not the case for PDEs: there
are no general existence and uniqueness theorems for
solutions of infinite-dimensional Hamiltonian systems,
and these have to be established case by case. This article
gives only a broad mathematical framework of
infinite-dimensional Hamiltonian systems. Precise definitions
are presented and the concept is illustrated through
physical examples.


Hamilton’s Equations on Poisson 
Manifolds 


A Poisson manifold is a manifold $P$ (in general
infinite dimensional) equipped with a bilinear
operation $\{\cdot,\cdot\}$, called Poisson bracket, on the
space $C^\infty(P)$ of smooth functions on $P$ such that:


1. $(C^\infty(P), \{\cdot,\cdot\})$ is a Lie algebra, that is, $\{\cdot,\cdot\} : C^\infty(P)
\times C^\infty(P) \to C^\infty(P)$ is bilinear, skew-symmetric
and satisfies the Jacobi identity $\{\{F, G\}, H\} +
\{\{H, F\}, G\} + \{\{G, H\}, F\} = 0$ for all $F, G, H \in
C^\infty(P)$ and

2. $\{\cdot,\cdot\}$ satisfies the Leibniz rule, that is, $\{\cdot,\cdot\}$
is a derivation in each factor: $\{F \cdot G, H\} = F \cdot
\{G, H\} + G \cdot \{F, H\}$, for all $F, G, H \in C^\infty(P)$.


The notion of Poisson manifolds was rediscovered 
many times under different names, starting with Lie, 
Dirac, Pauli, and others. The name Poisson manifold 
was coined by Lichnerowicz. 

For any $H \in C^\infty(P)$, the Hamiltonian vector field
$X_H$ is defined by


$$X_H(F) = \{F, H\}, \qquad F \in C^\infty(P)$$


It follows from (2) that, indeed, $X_H$ defines a
derivation on $C^\infty(P)$, hence a vector field on $P$.
Hamilton's equations of motion for a function $F \in
C^\infty(P)$ with Hamiltonian $H$ (energy function) are
then defined by the flow (integral curves) of the
vector field $X_H$, that is,


$$\dot{F} = X_H(F) = \{F, H\} \qquad [1]$$


where the overdot denotes differentiation with
respect to time. $F$ is then called a Hamiltonian
system on $P$ with energy (Hamiltonian function) $H$.


Examples of Poisson Manifolds and 
Hamilton’s Equations 


Finite-Dimensional Classical Mechanics 


For finite-dimensional classical mechanics, we take
$P = \mathbb{R}^{2n}$ and coordinates $(q^1, \dots, q^n, p_1, \dots, p_n)$
with the standard Poisson bracket for any two
functions $F(q^i, p_i)$, $H(q^i, p_i)$ given by


$$\{F, H\} = \sum_{i=1}^n \left(\frac{\partial F}{\partial q^i}\frac{\partial H}{\partial p_i} - \frac{\partial H}{\partial q^i}\frac{\partial F}{\partial p_i}\right) \qquad [2]$$


Then the classical Hamilton's equations are


$$\dot{q}^i = \{q^i, H\} = \frac{\partial H}{\partial p_i}, \qquad
\dot{p}_i = \{p_i, H\} = -\frac{\partial H}{\partial q^i} \qquad [3]$$

$i = 1, \dots, n$. This finite-dimensional Hamiltonian
system is a system of ordinary differential equations
for which there are well-known existence and
uniqueness theorems, that is, it has locally unique
smooth solutions, depending smoothly on the initial
conditions.
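
As a numerical illustration of [2] and [3], the following sketch
evaluates the canonical bracket by central differences for one degree
of freedom and integrates Hamilton's equations with a classical
Runge-Kutta step. It is not part of the original article: the function
names, the step sizes, and the sample Hamiltonian $H = (q^2 + p^2)/2$
are assumptions made for the example.

```python
import math


def poisson_bracket(F, H, q, p, h=1e-6):
    """Canonical bracket {F, H} at (q, p), one degree of freedom, central differences."""
    dF_dq = (F(q + h, p) - F(q - h, p)) / (2 * h)
    dF_dp = (F(q, p + h) - F(q, p - h)) / (2 * h)
    dH_dq = (H(q + h, p) - H(q - h, p)) / (2 * h)
    dH_dp = (H(q, p + h) - H(q, p - h)) / (2 * h)
    return dF_dq * dH_dp - dH_dq * dF_dp


def hamilton_rhs(H, q, p):
    """Right-hand side of Hamilton's equations: (q_dot, p_dot) = ({q, H}, {p, H})."""
    q_dot = poisson_bracket(lambda q_, p_: q_, H, q, p)
    p_dot = poisson_bracket(lambda q_, p_: p_, H, q, p)
    return q_dot, p_dot


def rk4_step(H, q, p, dt):
    """One classical fourth-order Runge-Kutta step for Hamilton's equations."""
    k1q, k1p = hamilton_rhs(H, q, p)
    k2q, k2p = hamilton_rhs(H, q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = hamilton_rhs(H, q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = hamilton_rhs(H, q + dt * k3q, p + dt * k3p)
    q_new = q + dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
    p_new = p + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return q_new, p_new


def integrate(H, q, p, dt, steps):
    """Follow the flow of X_H for `steps` steps of size `dt`."""
    for _ in range(steps):
        q, p = rk4_step(H, q, p, dt)
    return q, p


def oscillator(q, p):
    """Sample Hamiltonian (an assumption of this example): H = (q^2 + p^2) / 2."""
    return 0.5 * (q * q + p * p)


steps = 1000
q_end, p_end = integrate(oscillator, 1.0, 0.0, 2 * math.pi / steps, steps)
print(q_end, p_end, oscillator(q_end, p_end))
```

After one period the trajectory returns close to its starting point and
the energy stays close to its initial value, as expected since
$\{H, H\} = 0$.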


Example: harmonic oscillator As a concrete example,
consider the harmonic oscillator: here $P = \mathbb{R}^2$ and
the Hamiltonian (energy) is $H(q, p) = \tfrac{1}{2}(q^2 + p^2)$.
Then Hamilton's equations are

$$\dot{q} = p, \qquad \dot{p} = -q \qquad [4]$$


Infinite-Dimensional Classical Field Theory 


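Let $V$ be a Banach space and $V^*$ its dual space
with respect to a pairing $\langle\cdot,\cdot\rangle: V \times V^* \to \mathbb{R}$ (i.e.,
$\langle\cdot,\cdot\rangle$ is a symmetric, bilinear, and nondegenerate
function). On $P = V \times V^*$, the canonical Poisson
bracket for $F, H \in C^\infty(P)$, $\varphi \in V$, and $\pi \in V^*$ is
given by


$\{F, H\}(\varphi, \pi) = \left\langle \frac{\delta F}{\delta \varphi}, \frac{\delta H}{\delta \pi} \right\rangle - \left\langle \frac{\delta H}{\delta \varphi}, \frac{\delta F}{\delta \pi} \right\rangle$ [5]


where the functional derivatives $\delta F/\delta\pi \in V$, $\delta F/\delta\varphi \in V^*$
are the “duals” under the pairing $\langle\cdot,\cdot\rangle$ of the partial
gradients $D_1F(\pi) \in V^*$, $D_2F(\varphi) \in V^{**} \cong V$. The
corresponding Hamilton’s equations are


$\dot\varphi = \{\varphi, H\} = \frac{\delta H}{\delta \pi}, \quad \dot\pi = \{\pi, H\} = -\frac{\delta H}{\delta \varphi}$ [6]


As a special case in finite dimensions, if $V \cong \mathbb{R}^n$, so that
$V^* \cong \mathbb{R}^n$ and $P = V \times V^* \cong \mathbb{R}^{2n}$, and the pairing is
the standard inner product in $\mathbb{R}^n$, then the Poisson
bracket [5] and Hamilton’s equations [6] are
identical with [2] and [3], respectively.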


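Example: wave equations As a concrete example,
consider the wave equations. Let $V = C^\infty(\mathbb{R}^3)$ and
$V^* = \mathrm{Den}(\mathbb{R}^3)$ (densities), with the $L^2$ pairing
$\langle\varphi, \pi\rangle = \int \varphi(x)\pi(x)\,dx$. Take the Hamiltonian to be


$H(\varphi, \pi) = \int \left( \frac{1}{2}\pi^2 + \frac{1}{2}|\nabla\varphi|^2 + F(\varphi) \right) dx$


where $F$ is some function of $\varphi$. Then Hamilton’s
equations [6] become


$\dot\varphi = \pi, \quad \dot\pi = \nabla^2\varphi - F'(\varphi)$ [7]


where the prime denotes differentiation with respect
to $\varphi$, which imply the wave equation


$\frac{\partial^2\varphi}{\partial t^2} = \nabla^2\varphi - F'(\varphi)$ [8]


Different choices of $F$ give different wave equations;
for example, for $F = 0$ we get the linear wave equation


$\frac{\partial^2\varphi}{\partial t^2} = \nabla^2\varphi$


and for $F = \frac{1}{2}m^2\varphi^2$ we get the Klein-Gordon equation


$\nabla^2\varphi - \frac{\partial^2\varphi}{\partial t^2} = m^2\varphi$


So, these wave equations and the Klein-Gordon
equation are infinite-dimensional Hamiltonian
systems on $P = C^\infty(\mathbb{R}^3) \times \mathrm{Den}(\mathbb{R}^3)$.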
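
To make the Hamiltonian form of the Klein-Gordon equation concrete, here is a minimal numerical sketch. It is not from the original article and makes several simplifying assumptions: one space dimension instead of $\mathbb{R}^3$, a periodic lattice, and a leapfrog time stepper. The lattice energy is a discretized version of $H(\varphi,\pi)$ with $F(\varphi) = \frac{1}{2}m^2\varphi^2$, and the update rules are the lattice analogue of [7]. All names and parameter values are illustrative.

```python
import math

N = 64                    # lattice points on a periodic interval (assumed)
LENGTH = 2 * math.pi      # box length (assumed)
MASS = 1.0                # Klein-Gordon mass m, so F(phi) = m^2 phi^2 / 2
DX = LENGTH / N
DT = 0.2 * DX             # time step, small compared with the lattice spacing


def laplacian(phi):
    # Second difference with periodic wrap-around (phi[-1] is the last site).
    return [(phi[(i + 1) % N] - 2 * phi[i] + phi[i - 1]) / DX**2 for i in range(N)]


def energy(phi, pi):
    # Lattice version of H = int (pi^2/2 + |grad phi|^2/2 + F(phi)) dx.
    total = 0.0
    for i in range(N):
        grad = (phi[(i + 1) % N] - phi[i]) / DX
        total += 0.5 * pi[i] ** 2 + 0.5 * grad**2 + 0.5 * MASS**2 * phi[i] ** 2
    return total * DX


def force(phi):
    # Right-hand side of pi_dot = laplacian(phi) - F'(phi), with F'(phi) = m^2 phi.
    lap = laplacian(phi)
    return [lap[i] - MASS**2 * phi[i] for i in range(N)]


def leapfrog_step(phi, pi):
    f = force(phi)
    pi = [pi[i] + 0.5 * DT * f[i] for i in range(N)]   # half step in pi
    phi = [phi[i] + DT * pi[i] for i in range(N)]       # full step: phi_dot = pi
    f = force(phi)
    pi = [pi[i] + 0.5 * DT * f[i] for i in range(N)]   # second half step in pi
    return phi, pi


phi = [math.sin(i * DX) for i in range(N)]
pi = [0.0] * N
e0 = energy(phi, pi)
for _ in range(500):
    phi, pi = leapfrog_step(phi, pi)
drift = abs(energy(phi, pi) - e0) / e0
print(f"relative energy drift after 500 steps: {drift:.2e}")
```

The leapfrog scheme is chosen here because it is a symplectic integrator, so the energy error is expected to stay small and bounded rather than grow steadily; other integrators would also work for a short run.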


Cotangent Bundles 


The finite-dimensional examples of Poisson brackets
[2] and Hamilton’s equations [3] and the
infinite-dimensional examples [5] and [6] are the local versions
of the general case where $P = T^*Q$ is the cotangent
bundle (phase space) of a manifold $Q$ (configuration
space). If $Q$ is an $n$-dimensional manifold, then $T^*Q$ is
a $2n$-dimensional Poisson manifold locally isomorphic to $\mathbb{R}^{2n}$
whose Poisson bracket is locally given by [2] and whose
Hamilton’s equations are locally given by [3]. If $Q$ is
an infinite-dimensional Banach manifold, then $T^*Q$ is
a Poisson manifold locally isomorphic to $V \times V^*$
whose Poisson bracket is locally given by [5] and whose Hamilton’s
equations are locally given by [6].


Symplectic Manifolds 


All the examples above are special cases of symplectic
manifolds $(P, \omega)$. This means that $P$ is equipped with
a symplectic structure $\omega$, which is a closed ($d\omega = 0$),
(weakly) nondegenerate 2-form on the manifold $P$.
Then, for any $H \in C^\infty(P)$, the corresponding
Hamiltonian vector field $X_H$ is defined by $dH = \omega(X_H, \cdot)$
and the canonical Poisson bracket is given by


$\{F, H\} = \omega(X_F, X_H), \quad F, H \in C^\infty(P)$ [9]


For example, on $\mathbb{R}^{2n}$ the canonical symplectic
structure $\omega$ is given by $\omega = \sum_{i=1}^{n} dp_i \wedge dq^i = d\theta$,
where $\theta = \sum_{i=1}^{n} p_i\, dq^i$. The same formula for $\omega$
holds locally in $T^*Q$ for any finite-dimensional $Q$
(Darboux’s lemma). For the infinite-dimensional
example $P = V \times V^*$, the symplectic form $\omega$ is given
by $\omega((\varphi_1, \pi_1), (\varphi_2, \pi_2)) = \langle\varphi_1, \pi_2\rangle - \langle\varphi_2, \pi_1\rangle$. Again,
these two formulas for $\omega$ are identical if $V = \mathbb{R}^n$.


Remarks 


(i) If P is a finite-dimensional symplectic manifold, 
then P is even dimensional. 

(ii) If the Poisson bracket $\{\cdot,\cdot\}$ is nondegenerate,
then $\{\cdot,\cdot\}$ comes from a symplectic form $\omega$, that
is, $\{\cdot,\cdot\}$ is given by [9].


The Lie-Poisson Bracket


Not all Poisson brackets are of the form given in the
above examples [2], [5], and [9], that is, not all
Poisson manifolds are symplectic manifolds. An
important class of Poisson brackets is the so-called
Lie-Poisson bracket. It is defined on the dual of any
Lie algebra. Let $G$ be a Lie group with Lie algebra
$\mathfrak{g} = T_eG \cong$ {left-invariant vector fields on $G$} and let
$[\cdot,\cdot]$ denote the Lie bracket (commutator) on $\mathfrak{g}$. Let
$\mathfrak{g}^*$ be the dual of $\mathfrak{g}$ with respect to a pairing
$\langle\cdot,\cdot\rangle: \mathfrak{g}^* \times \mathfrak{g} \to \mathbb{R}$. Then, for any $F, H \in C^\infty(\mathfrak{g}^*)$ and
$\mu \in \mathfrak{g}^*$, the Lie-Poisson bracket is defined by


$\{F, H\}(\mu) = \pm \left\langle \mu, \left[ \frac{\delta F}{\delta \mu}, \frac{\delta H}{\delta \mu} \right] \right\rangle$ [10]


where $\delta F/\delta\mu, \delta H/\delta\mu \in \mathfrak{g}$ are the “duals” of the
gradients $DF(\mu), DH(\mu) \in \mathfrak{g}^{**} \cong \mathfrak{g}$ under the pairing
$\langle\cdot,\cdot\rangle$. Note that the Lie-Poisson bracket is
degenerate in general; for example, for $G = SO(3)$ the
vector space $\mathfrak{g}^*$ is three dimensional, so the Poisson
bracket [10] cannot come from a symplectic
structure, since a symplectic manifold is even dimensional.
This Lie-Poisson bracket can also be
obtained in a different way, by taking the canonical
Poisson bracket on $T^*G$ (locally given by [2] and [5])
and then restricting it to the fiber at the identity,
$T^*_eG = \mathfrak{g}^*$. In this sense, the Lie-Poisson bracket [10]
is induced from the canonical Poisson bracket
on $T^*G$. It is induced by the symmetry of
left-multiplication, as discussed in the next section.


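Example: rigid body A concrete example of the
Lie-Poisson bracket is given by the rigid body. Here
$G = SO(3)$ is the configuration space of a free rigid
body. Identifying the Lie algebra $(\mathfrak{so}(3), [\cdot,\cdot])$ with
$(\mathbb{R}^3, \times)$, where $\times$ is the vector product on $\mathbb{R}^3$, and
$\mathfrak{g}^* = \mathfrak{so}(3)^* \cong \mathbb{R}^3$, the Lie-Poisson bracket translates
into


$\{F, H\}(m) = -m \cdot (\nabla F \times \nabla H)$ [11]


For any $F \in C^\infty(\mathfrak{so}(3)^*)$, we have


$\frac{dF}{dt}(m) = \nabla F \cdot \dot m = \{F, H\}(m) = -m \cdot (\nabla F \times \nabla H) = \nabla F \cdot (m \times \nabla H)$


hence $\dot m = m \times \nabla H$. With the Hamiltonian


$H = \frac{1}{2} \left( \frac{m_1^2}{I_1} + \frac{m_2^2}{I_2} + \frac{m_3^2}{I_3} \right)$


where $I_1, I_2, I_3$ denote the principal moments of inertia, we obtain


$\dot m_1 = \frac{I_2 - I_3}{I_2 I_3}\, m_2 m_3, \quad \dot m_2 = \frac{I_3 - I_1}{I_3 I_1}\, m_3 m_1, \quad \dot m_3 = \frac{I_1 - I_2}{I_1 I_2}\, m_1 m_2$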


These are Euler’s equations for the free rigid body. 
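
The degeneracy of the Lie-Poisson bracket [11] shows up as a conserved quantity: $C(m) = \frac{1}{2}|m|^2$ has $\nabla C = m$, so $\{C, H\}(m) = -m \cdot (m \times \nabla H) = 0$ for every $H$. The sketch below, which is not part of the original article, integrates $\dot m = m \times \nabla H$ with a classical fourth-order Runge-Kutta step and monitors both the energy and $C$. The moments of inertia, the initial condition, the step size, and all function names are illustrative assumptions.

```python
INERTIA = (1.0, 2.0, 3.0)   # principal moments of inertia (illustrative values)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def grad_H(m):
    # H(m) = (m1^2/I1 + m2^2/I2 + m3^2/I3) / 2, so grad H = (m1/I1, m2/I2, m3/I3).
    return tuple(m[k] / INERTIA[k] for k in range(3))


def energy(m):
    return 0.5 * sum(m[k] ** 2 / INERTIA[k] for k in range(3))


def casimir(m):
    # C(m) = |m|^2 / 2, with gradient m.
    return 0.5 * dot(m, m)


def lie_poisson(grad_F, grad_G, m):
    # Bracket [11]: {F, G}(m) = -m . (grad F x grad G).
    return -dot(m, cross(grad_F, grad_G))


def euler_rhs(m):
    # Equations of motion: m_dot = m x grad H.
    return cross(m, grad_H(m))


def rk4_step(m, dt):
    def shifted(base, k, s):
        return tuple(base[i] + s * k[i] for i in range(3))
    k1 = euler_rhs(m)
    k2 = euler_rhs(shifted(m, k1, dt / 2))
    k3 = euler_rhs(shifted(m, k2, dt / 2))
    k4 = euler_rhs(shifted(m, k3, dt))
    return tuple(m[i] + dt / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                 for i in range(3))


m0 = (1.0, 0.2, -0.5)
m = m0
for _ in range(2000):
    m = rk4_step(m, 0.01)
print(energy(m) - energy(m0), casimir(m) - casimir(m0))
```

Both printed differences should be close to zero; Runge-Kutta is not a structure-preserving scheme, so the conservation is only approximate and degrades for large step sizes.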


Reduction by Symmetries 


The examples discussed so far are all canonical
examples of Poisson brackets, defined either on a
symplectic manifold $(P, \omega)$ or $T^*Q$, or on the dual of
a Lie algebra $\mathfrak{g}^*$. Different, noncanonical Poisson
brackets can arise from symmetries. Assume that a
Lie group $G$ is acting in a Hamiltonian way on the
Poisson manifold $(P, \{\cdot,\cdot\})$. This means that we have
a smooth map $\varphi: G \times P \to P$, $\varphi(g,p) = g \cdot p$, such
that the induced maps $\varphi_g = \varphi(g, \cdot): P \to P$ are
canonical transformations, for each $g \in G$. In terms
of Poisson manifolds, a canonical transformation is
a smooth map that preserves the Poisson bracket.
So, the action of $G$ on $P$ is a Hamiltonian action if
$\varphi_g^*\{F, H\} = \{\varphi_g^*F, \varphi_g^*H\}$ for all $F, H \in C^\infty(P)$, $g \in G$.
For any $\xi \in \mathfrak{g}$, the canonical transformations $\varphi_{\exp(t\xi)}$
generate a Hamiltonian vector field $\xi_P$ on $P$ and a
momentum map $J: P \to \mathfrak{g}^*$ given by $J(x)(\xi) = F(x)$,
where $F$ is a Hamiltonian function for $\xi_P$; the map $J$
is $\mathrm{Ad}^*$-equivariant.

If a Hamiltonian system $X_H$ is invariant under a
Lie group action, that is, $H(\varphi_g(x)) = H(x)$, then we
obtain a reduced Hamiltonian system on a reduced
phase space (reduced Poisson manifold). We recall
the Marsden-Weinstein reduction theorem:


Reduction Theorem For a Hamiltonian action of
a Lie group $G$ on a Poisson manifold $(P, \{\cdot,\cdot\})$,
there is an equivariant momentum map $J: P \to \mathfrak{g}^*$,
and for every regular $\mu \in \mathfrak{g}^*$ the reduced phase
space $P_\mu = J^{-1}(\mu)/G_\mu$ carries an induced Poisson
structure $\{\cdot,\cdot\}_\mu$ ($G_\mu$ being the isotropy group). Any
$G$-invariant Hamiltonian $H$ on $P$ defines a
Hamiltonian $H_\mu$ on the reduced phase space $P_\mu$,
and the integral curves of the vector field $X_H$
project onto integral curves of the induced vector
field $X_{H_\mu}$ on the reduced space $P_\mu$.


Example: rigid body The rigid body discussed
above can be viewed as an example of this
reduction theorem. If $P = T^*G$ and $G$ is acting on
$T^*G$ by the cotangent lift of the left-translation
$L_g: G \to G$, $L_g(h) = gh$, then the momentum map
$J: T^*G \to \mathfrak{g}^*$ is given by $J(\alpha_g) = T_e^*R_g(\alpha_g)$ and the
reduced phase space $(T^*G)_\mu = J^{-1}(\mu)/G_\mu$ is
isomorphic to the coadjoint orbit $\mathcal{O}_\mu$ through $\mu \in \mathfrak{g}^*$.
Each coadjoint orbit $\mathcal{O}_\mu$ carries a natural
symplectic structure $\omega_\mu$ and, in this case, the reduced
Lie-Poisson bracket $\{\cdot,\cdot\}_\mu$ on the coadjoint orbit $\mathcal{O}_\mu$ is
induced by the symplectic form $\omega_\mu$ on $\mathcal{O}_\mu$ as in [9].
Furthermore, $T^*G/G \cong \mathfrak{g}^*$, and the induced
Poisson bracket $\{\cdot,\cdot\}_\mu$ on $\mathcal{O}_\mu$ is identical with the
Lie-Poisson bracket restricted to the coadjoint orbit
$\mathcal{O}_\mu \subset \mathfrak{g}^*$. For the rigid body this construction is
applied to $G = SO(3)$.


We now discuss some infinite-dimensional exam- 
ples of reduced Hamiltonian systems. 


Infinite-Dimensional Lie Groups 


A general theory of infinite-dimensional Lie groups
is hardly developed. Even Bourbaki only develops a
theory of infinite-dimensional manifolds, but all of
the important theorems about Lie groups are stated
for finite-dimensional ones.

An infinite-dimensional Lie group $G$ is a group
and an infinite-dimensional manifold with smooth
group operations


$m: G \times G \to G, \quad m(g,h) = g \cdot h, \quad C^\infty$ [12]


$i: G \to G, \quad i(g) = g^{-1}, \quad C^\infty$ [13]


Such a Lie group $G$ is locally diffeomorphic to an
infinite-dimensional vector space. This can be a
Banach space whose topology is given by a norm
$\|\cdot\|$, a Hilbert space whose topology is given by an
inner product $\langle\cdot,\cdot\rangle$, or a Frechet space whose
topology is given by a metric but not by a norm.
Depending on the choice of the topology on $G$, we
speak of Banach, Hilbert, or Frechet Lie groups,
respectively.

The Lie algebra $\mathfrak{g}$ of $G$ is defined as
$\mathfrak{g}$ = {left-invariant vector fields on $G$} $\cong T_eG$, where
the isomorphism is given (as in finite dimensions) by


$\xi \in T_eG \mapsto X^\xi(g) = T_eL_g(\xi)$ [14]


and the Lie bracket on $\mathfrak{g}$ is induced by the Lie bracket
of left-invariant vector fields, $[\xi, \eta] = [X^\xi, X^\eta](e)$,
$\xi, \eta \in \mathfrak{g}$.

These definitions in infinite dimensions are
identical with the definitions in finite dimensions. The
big difference, however, is that infinite-dimensional
manifolds, hence Lie groups, are not locally
compact. For Frechet Lie groups, one has the additional
nontrivial difficulty of defining the differentiability
of functions defined on a Frechet space. Hence, the
very definition of a Frechet manifold is not
canonical. This problem does not arise for Banach
and Hilbert Lie groups; the differential calculus
extends in a straightforward manner from $\mathbb{R}^n$ to
Banach and Hilbert spaces, but not to Frechet
spaces.


Finite- versus Infinite-Dimensional 
Lie Groups 


The lack of local compactness of infinite-dimensional 
Lie groups causes some deficiencies of the Lie theory 
in infinite dimensions. Some classical results in finite 
dimensions are summarized below, which are not 
true in general in infinite dimensions: 


1. The exponential map $\exp: \mathfrak{g} \to G$ is defined as
follows: to each $\xi \in \mathfrak{g}$ we assign the corresponding
left-invariant vector field $X^\xi$ defined by [14].
We take the flow $\varphi^\xi(t)$ of $X^\xi$ and define
$\exp(\xi) = \varphi^\xi(1)$. The exponential map is a local
diffeomorphism from a neighborhood of zero in $\mathfrak{g}$
onto a neighborhood of the identity in $G$; hence,
$\exp$ defines canonical coordinates on the Lie
group $G$. This is not true in infinite dimensions.

2. If $f_1, f_2: G_1 \to G_2$ are smooth Lie group
homomorphisms (i.e., $f_i(g \cdot h) = f_i(g) \cdot f_i(h)$, $i = 1, 2$)
with $T_ef_1 = T_ef_2$, then locally $f_1 = f_2$. This is not
true in infinite dimensions.

3. If $H$ is a closed subgroup of $G$, then $H$ is a Lie
subgroup of $G$. This is not true in infinite
dimensions.

4. For any finite-dimensional Lie algebra $\mathfrak{g}$, there
exists a connected Lie group $G$ whose Lie algebra
is $\mathfrak{g}$, that is, such that $\mathfrak{g} \cong T_eG$. This is not true in
infinite dimensions.


Some classical finite-dimensional examples of Lie
groups are the matrix groups $GL(n)$, $SL(n)$, $O(n)$,
$SO(n)$, $U(n)$, $SU(n)$, $Sp(n)$ with smooth group
operations given by matrix multiplication and
matrix inversion.


Examples of Infinite-Dimensional 
Lie Groups 


Abelian Gauge Group $G = (C^\infty(M), +)$


Let $M$ be a finite-dimensional manifold and let
$G = C^\infty(M)$, with group operation being addition,
that is, $m(f,g) = f + g$, $i(f) = -f$, $e = 0$. $G$ is an
abelian $C^\infty$ Frechet Lie group with Lie algebra
$\mathfrak{g} = T_eC^\infty(M) \cong C^\infty(M)$, with trivial bracket
$[\xi, \eta] = 0$, and $\exp = \mathrm{id}$. If one completes these spaces
in the $C^k$-norm, $k < \infty$, then $G^k$ is a Banach Lie
group, and if the $H^s$-Sobolev norm is used with $s >
(1/2) \dim M$, then $G^s$ is a Hilbert Lie group.


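Application of $G = (C^\infty(M), +)$ to Maxwell’s
equations Let $E$, $B$ be the electric and magnetic fields
on $\mathbb{R}^3$; then Maxwell’s equations for a charge
density $\rho$ are:


$\dot E = \operatorname{curl} B, \quad \dot B = -\operatorname{curl} E$ [15]


$\operatorname{div} B = 0, \quad \operatorname{div} E = \rho$ [16]


Let $A$ be the magnetic potential such that $B = -\operatorname{curl} A$.
As configuration space, we take $V = \mathrm{Vec}(\mathbb{R}^3)$,
vector fields (potentials) on $\mathbb{R}^3$, so $A \in V$, and as
phase space, we have $P = T^*V \cong V \times V^* \ni (A, E)$,
with the standard $L^2$ pairing $\langle A, E\rangle = \int A(x) \cdot E(x)\,dx$,
and canonical Poisson bracket given by [5], which
becomes


$\{F, H\}(A, E) = \int \left( \frac{\delta F}{\delta A} \cdot \frac{\delta H}{\delta E} - \frac{\delta H}{\delta A} \cdot \frac{\delta F}{\delta E} \right) dx$ [17]


As Hamiltonian, we take the total electromagnetic
energy


$H(A, E) = \frac{1}{2} \int \left( |\operatorname{curl} A|^2 + |E|^2 \right) dx$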


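Then Hamilton’s equations in the canonical
variables $A$ and $E$ are


$\dot A = \frac{\delta H}{\delta E} = E \;\Rightarrow\; \dot B = -\operatorname{curl} E$

and

$\dot E = -\frac{\delta H}{\delta A} = -\operatorname{curl}\operatorname{curl} A = \operatorname{curl} B$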


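So the first two of Maxwell’s equations [15]
are Hamilton’s equations; the third one is obtained
automatically from the potential, since $\operatorname{div} B = -\operatorname{div}\operatorname{curl} A
= 0$; and the fourth equation, $\operatorname{div} E = \rho$, is obtained
through the following symmetry (gauge
invariance): the Lie group $G = (C^\infty(\mathbb{R}^3), +)$ acts on $V$
by $\varphi \cdot A = A + \nabla\varphi$, $\varphi \in G$, $A \in V$. The lifted action
to $V \times V^*$ becomes $\varphi \cdot (A, E) = (A + \nabla\varphi, E)$, and
has the momentum map $J: V \times V^* \to \mathfrak{g}^* \cong$ {charge
densities}


$J(A, E) = \operatorname{div} E$ [18]


With $\mathfrak{g} = C^\infty(\mathbb{R}^3)$ and $\mathfrak{g}^* = \mathrm{Den}(\mathbb{R}^3)$, we identify
the elements of $\mathfrak{g}^*$ with charge densities. The
Hamiltonian $H$ is $G$-invariant, that is,
$H(\varphi \cdot (A, E)) = H(A + \nabla\varphi, E) = H(A, E)$. Then the reduced
phase space for $\rho \in \mathfrak{g}^*$ is


$(V \times V^*)_\rho = J^{-1}(\rho)/G = \{(E, B) \mid \operatorname{div} E = \rho,\ \operatorname{div} B = 0\}$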


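and the reduced Hamiltonian is


$H_\rho(E, B) = \frac{1}{2} \int \left( |E|^2 + |B|^2 \right) dx$ [19]


The reduced Poisson bracket becomes, for any
functions $F$, $H$ on $(V \times V^*)_\rho$,


$\{F, H\}_\rho(E, B) = \int \left( \frac{\delta F}{\delta E} \cdot \operatorname{curl} \frac{\delta H}{\delta B} - \frac{\delta H}{\delta E} \cdot \operatorname{curl} \frac{\delta F}{\delta B} \right) dx$ [20]


and a straightforward computation shows that


$\dot F = \{F, H_\rho\}_\rho \;\Longleftrightarrow\; \frac{\partial E}{\partial t} = \operatorname{curl} B, \quad \frac{\partial B}{\partial t} = -\operatorname{curl} E, \quad \operatorname{div} B = 0, \quad \operatorname{div} E = \rho$ [21]


So, Maxwell’s equations [15], [16] form an
infinite-dimensional Hamiltonian system on this reduced
phase space with respect to the reduced Poisson
bracket.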
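
Two facts carry the gauge argument above: $\operatorname{div}\operatorname{curl} A = 0$, which gives $\operatorname{div} B = 0$ for free, and $\operatorname{curl}\nabla\varphi = 0$, which makes $B = -\operatorname{curl} A$ (and hence $H$) invariant under $A \mapsto A + \nabla\varphi$. The following sketch, not part of the original article, checks both on a small periodic grid using central differences, for which the discrete operators commute. The grid size, the sample fields, and all function names are illustrative assumptions.

```python
import math

N = 8                       # grid points per axis on a periodic box (assumed)
STEP = 2 * math.pi / N      # grid spacing


def sample(f):
    # Tabulate a scalar function f(x, y, z) on the N^3 grid.
    return [[[f(i * STEP, j * STEP, k * STEP) for k in range(N)]
             for j in range(N)] for i in range(N)]


def diff(u, axis):
    # Central difference along one axis with periodic wrap-around.
    out = [[[0.0] * N for _ in range(N)] for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                up, dn = [i, j, k], [i, j, k]
                up[axis] = (up[axis] + 1) % N
                dn[axis] = (dn[axis] - 1) % N
                out[i][j][k] = (u[up[0]][up[1]][up[2]]
                                - u[dn[0]][dn[1]][dn[2]]) / (2 * STEP)
    return out


def combine(u, v, sign):
    # Pointwise u + sign * v.
    return [[[u[i][j][k] + sign * v[i][j][k] for k in range(N)]
             for j in range(N)] for i in range(N)]


def curl(a):
    ax, ay, az = a
    return (combine(diff(az, 1), diff(ay, 2), -1),
            combine(diff(ax, 2), diff(az, 0), -1),
            combine(diff(ay, 0), diff(ax, 1), -1))


def div(a):
    return combine(combine(diff(a[0], 0), diff(a[1], 1), 1), diff(a[2], 2), 1)


def grad(phi):
    return (diff(phi, 0), diff(phi, 1), diff(phi, 2))


def max_abs(u):
    return max(abs(x) for plane in u for row in plane for x in row)


# Sample potential A and gauge function phi (arbitrary smooth periodic choices).
A = (sample(lambda x, y, z: math.sin(y) * math.cos(z)),
     sample(lambda x, y, z: math.cos(x) + math.sin(2 * z)),
     sample(lambda x, y, z: math.sin(x) * math.sin(y)))
phi = sample(lambda x, y, z: math.cos(x + 2 * y) * math.sin(z))

curl_A = curl(A)
grad_phi = grad(phi)
A_gauged = tuple(combine(A[c], grad_phi[c], 1) for c in range(3))
curl_gauged = curl(A_gauged)

div_curl = max_abs(div(curl_A))
gauge_change = max(max_abs(combine(curl_gauged[c], curl_A[c], -1)) for c in range(3))
print(div_curl, gauge_change)
```

Both printed numbers should be at the level of floating-point rounding, which is the discrete counterpart of the statement that $\operatorname{div} B = 0$ holds automatically and that the gauge action does not change $B$.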
Abelian Gauge Group $G = (C^\infty(M, \mathbb{R} - \{0\}), \cdot)$


Let $M$ be a finite-dimensional manifold and let
$G = C^\infty(M, \mathbb{R} - \{0\})$, the group operation being
multiplication, that is, $m(f,g) = f \cdot g$, $i(f) = f^{-1}$, $e = 1$.
For $k < \infty$, $C^k(M, \mathbb{R} - \{0\})$ is open in $C^k(M, \mathbb{R})$, and
if $M$ is compact then $C^k(M, \mathbb{R} - \{0\})$ is a Banach Lie
group. If $s > (1/2) \dim M$, then $H^s(M, \mathbb{R} - \{0\})$ is
closed under multiplication, and if $M$ is compact
then $H^s(M, \mathbb{R} - \{0\})$ is a Hilbert Lie group.


Nonabelian Gauge Groups $\mathcal{G} = (C^k(M, G), \cdot)$


The abelian example can be generalized by replacing
$\mathbb{R} - \{0\}$ with any finite-dimensional (nonabelian) Lie
group $G$. Let $\mathcal{G} = C^k(M, G)$ with pointwise group
operations $m(f, g)(x) = f(x) \cdot g(x)$, $x \in M$, and $i(f)(x) =
(f(x))^{-1}$, where “$\cdot$” and “$(\cdot)^{-1}$” are the operations
in $G$. If $k < \infty$, then $C^k(M, G)$ is a Banach Lie
group. Let $\mathfrak{g}$ denote the Lie algebra of $G$; then the
Lie algebra of $\mathcal{G} = C^k(M, G)$ is $\mathfrak{G} = C^k(M, \mathfrak{g})$, with
pointwise Lie bracket $[\xi, \eta](x) = [\xi(x), \eta(x)]$, $x \in M$,
the latter bracket being the Lie bracket in $\mathfrak{g}$.
The exponential map $\exp: \mathfrak{g} \to G$ defines the
exponential map $\mathrm{EXP}: \mathfrak{G} = C^k(M, \mathfrak{g}) \to \mathcal{G} = C^k(M, G)$,
$\mathrm{EXP}(\xi) = \exp \circ\, \xi$, which is a local diffeomorphism.
The same holds for $H^s(M, G)$ if $s > (1/2) \dim M$.

Applications of these infinite-dimensional Lie 
groups are in gauge theories and quantum field 
theory, where they appear as groups of gauge 
transformations. 




Loop Groups $\mathcal{G} = C^k(S^1, G)$


As a special case of the example above, we take
$M = S^1$, the circle. Then $\mathcal{G} = C^k(S^1, G) = L^k(G)$ is
called a loop group and $\mathfrak{G} = C^k(S^1, \mathfrak{g}) = l^k(\mathfrak{g})$ its loop
algebra. They find applications in the theory of
affine Lie algebras, Kac-Moody Lie algebras (central
extensions), completely integrable systems, soliton
equations (Toda, Korteweg-de Vries (KdV),
Kadomtsev-Petviashvili (KP)), and quantum field theory.
Central extensions of loop algebras are examples of
infinite-dimensional Lie algebras which need not
have a corresponding Lie group.


Diffeomorphism Groups 


Among the most important “classical”
infinite-dimensional Lie groups are the diffeomorphism
groups of manifolds. Their differential structure is
not that of a Banach Lie group as defined above.
Nevertheless, they have important applications.

Let $M$ be a compact manifold (the noncompact
case is technically much more complicated but
similar results are true) and let $G = \mathrm{Diff}^\infty(M)$ be
the group of all smooth diffeomorphisms on $M$, with
group operation being composition, that is,
$m(f,g) = f \circ g$, $i(f) = f^{-1}$, $e = \mathrm{id}_M$. For $C^\infty$
diffeomorphisms, $\mathrm{Diff}^\infty(M)$ is a Frechet manifold and
there are nontrivial problems with the notion
of smooth maps between Frechet spaces. There is
no canonical extension of the differential calculus
from Banach spaces (same as for $\mathbb{R}^n$) to Frechet
spaces. One possibility is to generalize the notion
of differentiability. For example, if we use the
so-called $C^\infty_\Gamma$ differentiability, then $G = \mathrm{Diff}^\infty(M)$
becomes a $C^\infty_\Gamma$ Lie group with $C^\infty_\Gamma$ differentiable
group operations. These notions of differentiability
are difficult to apply to concrete examples.
Another possibility is to complete $\mathrm{Diff}^\infty(M)$ in
the Banach $C^k$-norm, $0 \le k < \infty$, or in the Sobolev
$H^s$-norm, $s > (1/2) \dim M$; $\mathrm{Diff}^k(M)$ and $\mathrm{Diff}^s(M)$
become, in this case, Banach and Hilbert
manifolds, respectively. Then we consider the inverse
limits of these Banach and Hilbert Lie groups,
respectively:


$\mathrm{Diff}^\infty(M) = \varprojlim \mathrm{Diff}^k(M)$ [22]


becomes the so-called inverse limit of Banach (ILB)
Lie group, or with the Sobolev topologies


$\mathrm{Diff}^\infty(M) = \varprojlim \mathrm{Diff}^s(M)$ [23]


becomes the so-called inverse limit of Hilbert (ILH)
Lie group. Nevertheless, the group operations are
not smooth, but have the following differentiability
properties. If the diffeomorphism group is equipped
with the Sobolev $H^s$-topology, then $\mathrm{Diff}^s(M)$
becomes a $C^\infty$ Hilbert manifold if $s > (1/2) \dim M$
and the group multiplication


$m: \mathrm{Diff}^{s+k}(M) \times \mathrm{Diff}^s(M) \to \mathrm{Diff}^s(M)$ [24]


is $C^k$ differentiable; hence, for $k = 0$, $m$ is only
continuous on $\mathrm{Diff}^s(M)$. The inversion


$i: \mathrm{Diff}^{s+k}(M) \to \mathrm{Diff}^s(M)$ [25]


is $C^k$ differentiable; hence, for $k = 0$, $i$ is only
continuous on $\mathrm{Diff}^s(M)$. The same differentiability
properties of $m$ and $i$ hold in the $C^k$ topology. This
situation leads to the notion of nested Lie groups.

The Lie algebra of $\mathrm{Diff}^\infty(M)$ is given by
$\mathfrak{g} = T_e\mathrm{Diff}^\infty(M) \cong \mathrm{Vec}^\infty(M)$, the space of smooth
vector fields on $M$. Note that the space $\mathrm{Vec}(M)$
of all vector fields is a Lie algebra only for $C^\infty$
vector fields, but not for $C^k$ or $H^s$ vector fields if
$k < \infty$, $s < \infty$, because one loses derivatives by
taking brackets.

The exponential map on the diffeomorphism
group is given as follows: for any vector field $X \in
\mathrm{Vec}^\infty(M)$ take its flow $\varphi_t \in \mathrm{Diff}^\infty(M)$, then define


$\mathrm{EXP}: \mathrm{Vec}^\infty(M) \to \mathrm{Diff}^\infty(M)$, $X \mapsto \varphi_1$, the flow at
time $t = 1$. The exponential map EXP is not a local
diffeomorphism; it is not even locally surjective.

Applications of $\mathrm{Diff}^\infty(M)$ occur in general
relativity, where the diffeomorphism group plays the
role of a symmetry group of coordinate
transformations. Let $(M, g)$ be a Lorentz 4-manifold. Then the
vacuum Einstein field equations are


$\mathrm{Ric}(g) = 0$


These are invariant under coordinate
transformations, that is, under the action of $\mathrm{Diff}^\infty(M)$.
Moreover, Einstein’s field equations form a
Hamiltonian system on the space $P$ = {metrics
on $M$}$/\mathrm{Diff}^\infty(M)$.


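Subgroups of $\mathrm{Diff}^\infty(M)$


Several subgroups of $\mathrm{Diff}^\infty(M)$ have important
applications.


Group of volume-preserving diffeomorphisms Let
$\mu$ be a volume on $M$ and $G = \mathrm{Diff}^\infty_\mu(M) = \{f \in
\mathrm{Diff}^\infty(M) \mid f^*\mu = \mu\}$ the group of volume-preserving
diffeomorphisms. $\mathrm{Diff}^\infty_\mu(M)$ is a closed subgroup of
$\mathrm{Diff}^\infty(M)$ with Lie algebra $\mathfrak{g} = \mathrm{Vec}^\infty_\mu(M) = \{X \in
\mathrm{Vec}^\infty(M) \mid \operatorname{div}_\mu X = 0\}$, the space of divergence-free
vector fields on $M$. $\mathrm{Vec}^\infty_\mu(M)$ is a Lie subalgebra of
$\mathrm{Vec}^\infty(M)$.

Remark: We can apply neither the finite-dimensional
theorem that if $\mathrm{Vec}^\infty_\mu(M)$ is a Lie algebra
then there exists a Lie group whose Lie algebra it is,
nor the theorem that if $\mathrm{Diff}^\infty_\mu(M) \subset \mathrm{Diff}^\infty(M)$ is a closed subgroup
then it is a Lie subgroup.

Applications of $\mathrm{Diff}^\infty_\mu(M)$ occur, for example, in
fluid dynamics. Euler’s equations for an
incompressible fluid,


$\frac{\partial u}{\partial t} + u \cdot \nabla u = -\nabla p, \quad \operatorname{div} u = 0$ [26]


are equivalent to the equations of geodesics on
$\mathrm{Diff}^\infty_\mu(M)$.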


Symplectomorphism group Let $\omega$ be a symplectic
2-form on $M$ and $G = \mathrm{Diff}^\infty_\omega(M) = \{f \in \mathrm{Diff}^\infty(M) \mid
f^*\omega = \omega\}$ the group of canonical transformations (or
symplectomorphisms). $\mathrm{Diff}^\infty_\omega(M)$ is a closed
subgroup of $\mathrm{Diff}^\infty(M)$ with Lie algebra $\mathfrak{g} = \mathrm{Vec}^\infty_\omega(M) =
\{X \in \mathrm{Vec}^\infty(M) \mid L_X\omega = 0\}$, the space of locally
Hamiltonian vector fields on $M$. $\mathrm{Vec}^\infty_\omega(M)$ is a Lie
subalgebra of $\mathrm{Vec}^\infty(M)$.

Applications of symplectomorphism groups occur,
for example, in plasma physics. The Maxwell-Vlasov
equations for a plasma density $f(x, v, t)$ generating
the electric and magnetic fields $E$ and $B$ are


$\frac{\partial f}{\partial t} + v \cdot \frac{\partial f}{\partial x} + (E + v \times B) \cdot \frac{\partial f}{\partial v} = 0$

$\frac{\partial B}{\partial t} = -\operatorname{curl} E, \quad \frac{\partial E}{\partial t} = \operatorname{curl} B - J_f$ [27]

$\operatorname{div} E = \rho_f, \quad \operatorname{div} B = 0$


where $J_f$ and $\rho_f$ are the current and charge densities,
respectively. This coupled nonlinear system of
evolution equations is an infinite-dimensional
Hamiltonian system of the form $\dot F = \{F, H\}_{\rho_f}$ on the
reduced phase space


$\mathcal{MV} = (T^*\mathrm{Diff}^\infty_\omega(\mathbb{R}^6) \times T^*V)/C^\infty(\mathbb{R}^3)$ [28]


($V$ is the same space as in the example of Maxwell’s
equations) with respect to the following reduced
Poisson bracket, which is induced via gauge
symmetry from the canonical Poisson bracket on
$T^*\mathrm{Diff}^\infty_\omega(\mathbb{R}^6) \times T^*V$:


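$\{F, G\}_{\rho_f}(f, E, B) = \int f \left\{ \frac{\delta F}{\delta f}, \frac{\delta G}{\delta f} \right\} dx\,dv + \int \left( \frac{\delta F}{\delta E} \cdot \operatorname{curl} \frac{\delta G}{\delta B} - \frac{\delta G}{\delta E} \cdot \operatorname{curl} \frac{\delta F}{\delta B} \right) dx$

$\quad + \int \left( \frac{\delta F}{\delta E} \cdot \frac{\partial f}{\partial v} \frac{\delta G}{\delta f} - \frac{\delta G}{\delta E} \cdot \frac{\partial f}{\partial v} \frac{\delta F}{\delta f} \right) dx\,dv + \int f B \cdot \left( \frac{\partial}{\partial v} \frac{\delta F}{\delta f} \times \frac{\partial}{\partial v} \frac{\delta G}{\delta f} \right) dx\,dv$ [29]

and with Hamiltonian

$H(f, E, B) = \frac{1}{2} \int |v|^2 f(x, v, t)\,dx\,dv + \frac{1}{2} \int \left( |E|^2 + |B|^2 \right) dx$ [30]


More complicated plasma models are formulated
as Hamiltonian systems. For example, for the
two-fluid model the phase space is constituted by
coadjoint orbits of the semidirect product ($\ltimes$) group
$G = \mathrm{Diff}^\infty(\mathbb{R}^3) \ltimes (C^\infty(\mathbb{R}^3) \times C^\infty(\mathbb{R}^3))$. For the
MHD model: $G = \mathrm{Diff}^\infty(\mathbb{R}^3) \ltimes (C^\infty(\mathbb{R}^3) \times \Omega^2(\mathbb{R}^3))$.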


The KdV Equation and Fourier Integral 
Operators 


There are many known examples of PDEs which are
infinite-dimensional Hamiltonian systems, such as the
Benjamin-Ono, Boussinesq, Harry Dym, KdV, and KP
equations and others. In many cases, the Poisson
structures and Hamiltonians are given ad hoc on a
formal level. This is illustrated here with the KdV
equation, where at least one of the three known
Hamiltonian structures is well understood.
The KdV equation


$u_t + 6uu_x + u_{xxx} = 0$ [31]


is an infinite-dimensional Hamiltonian system with
the Lie group of invertible Fourier integral operators
being a symmetry group. Gardner found that with the
bracket


$\{F, G\} = \int_0^{2\pi} \frac{\delta F}{\delta u} \frac{\partial}{\partial x} \frac{\delta G}{\delta u}\,dx$ [32]


and Hamiltonian


$H(u) = \int_0^{2\pi} \left( u^3 + \frac{1}{2} u_x^2 \right) dx$ [33]

$u$ satisfies the KdV equation [31] if and only if
$\dot u = \{u, H\}$.


An important question concerns the origin of the
Poisson bracket [32] and Hamiltonian [33]. It has been
shown that this bracket is the Lie-Poisson
bracket on a coadjoint orbit of the Lie group $G = \mathrm{FIO}$, the
group of invertible Fourier integral operators on the
circle $S^1$. The latter is discussed briefly in the following.

A Fourier integral operator on a compact
manifold $M$ is an operator


$A: C^\infty(M) \to C^\infty(M)$ [34]

locally given by


$A(u)(x) = (2\pi)^{-n} \iint e^{i\varphi(x,y,\xi)} a(x, \xi) u(y)\,dy\,d\xi$ [35]


where $\varphi(x, y, \xi)$ is a phase function with certain
properties and the symbol $a(x, \xi)$ belongs to a certain
symbol class. A pseudodifferential operator is a
special kind of Fourier integral operator, locally of
the form


$P(u)(x) = (2\pi)^{-n} \iint e^{i(x-y)\cdot\xi} p(x, \xi) u(y)\,dy\,d\xi$ [36]


Denote by FIO and $\Psi$DO the groups under
composition (operator product) of invertible Fourier integral
operators and invertible pseudodifferential operators
on $M$, respectively. Then we have the following results.

Both groups $\Psi$DO and FIO are smooth
infinite-dimensional ILH Lie groups. The smoothness
properties of the group operations (operator
multiplication and inversion) are similar to the case of
diffeomorphism groups [24] and [25]. The Lie
algebra of both ILH Lie groups $\Psi$DO and FIO is
the Lie algebra of all pseudodifferential operators
under the commutator bracket. Moreover, FIO is a
smooth infinite-dimensional principal fiber bundle
over the diffeomorphism group of canonical
transformations $\mathrm{Diff}^\infty_\omega(T^*M - \{0\})$ with structure group
(gauge group) $\Psi$DO.

For the KdV equation, we take the special case
where $M = S^1$. Then the Gardner bracket [32] is the
Lie-Poisson bracket on the coadjoint orbit of FIO
through the Schrödinger operator $P \in \Psi$DO.
Complete integrability of the KdV equation follows from
the infinite system of conserved integrals in
involution given by $H_k = \operatorname{tr}(P^k)$; in particular, the
Hamiltonian [33] equals $H = H_3$.


See also: Bi-Hamiltonian Methods in Soliton Theory; 
Functional Integration in Quantum Physics; Hamiltonian 
Fluid Dynamics; Hamiltonian Systems: Obstructions to 
Integrability; Korteweg—de Vries Equation and Other 
Modulation Equations; Symmetries and Conservation Laws. 
Introduction


Let $X$ be a closed (connected, compact without
boundary) smooth manifold of dimension 4,
provided with a Riemannian metric denoted by $g$. Let
$\Omega^p_X$ denote the space of smooth $p$-forms on $X$, that is,
the sections of $\Lambda^p T^*X$. The Hodge operator acting on
$p$-forms,


$*: \Omega^p_X \to \Omega^{4-p}_X$

satisfies $*^2 = (-1)^p$. In particular, $*$ splits $\Omega^2_X$ into
two subspaces $\Omega^{2,\pm}_X$ with eigenvalues $\pm 1$:


$\Omega^2_X = \Omega^{2,+}_X \oplus \Omega^{2,-}_X$ [1]


Note also that this decomposition is an orthogonal
one, with respect to the inner product:


$\langle \omega_1, \omega_2 \rangle = \int_X \omega_1 \wedge *\omega_2$


A 2-form $\omega$ is said to be self-dual if $*\omega = \omega$ and it
is said to be anti-self-dual if $*\omega = -\omega$. Any 2-form $\omega$
can be written as the sum


$\omega = \omega^+ + \omega^-$


of its self-dual $\omega^+$ and anti-self-dual $\omega^-$ components.
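
The splitting of 2-forms can be made concrete at a single point. The sketch below is not part of the original article; it assumes flat $\mathbb{R}^4$ with the standard orientation $dx^0 \wedge dx^1 \wedge dx^2 \wedge dx^3$ (indices start at 0 in the code), represents a 2-form by its six coefficients on the basis $dx^i \wedge dx^j$, $i < j$, and implements the Hodge operator by the sign of the permutation $(i, j, k, l)$. The names `hodge`, `self_dual_parts`, and `inner` are illustrative.

```python
from itertools import combinations

PAIRS = list(combinations(range(4), 2))   # basis 2-forms dx^i ^ dx^j with i < j


def perm_sign(p):
    # Sign of a permutation given as a tuple, via its inversion count.
    inversions = sum(1 for a in range(len(p)) for b in range(a + 1, len(p))
                     if p[a] > p[b])
    return -1 if inversions % 2 else 1


def hodge(w):
    # *(dx^i ^ dx^j) = sign(i, j, k, l) dx^k ^ dx^l, where {k, l} is the
    # complementary index pair with k < l (flat metric assumed).
    out = {}
    for (i, j) in PAIRS:
        k, l = [a for a in range(4) if a not in (i, j)]
        out[(k, l)] = perm_sign((i, j, k, l)) * w[(i, j)]
    return out


def self_dual_parts(w):
    # w^+ = (w + *w) / 2 and w^- = (w - *w) / 2.
    sw = hodge(w)
    plus = {p: 0.5 * (w[p] + sw[p]) for p in PAIRS}
    minus = {p: 0.5 * (w[p] - sw[p]) for p in PAIRS}
    return plus, minus


def inner(a, b):
    # Pointwise inner product: a ^ *b = inner(a, b) times the volume form.
    return sum(a[p] * b[p] for p in PAIRS)


w = {p: float(n + 1) for n, p in enumerate(PAIRS)}   # arbitrary sample 2-form
w_plus, w_minus = self_dual_parts(w)
print(w_plus, w_minus, inner(w_plus, w_minus))
```

Applying `hodge` twice returns the input, `hodge(w_plus)` equals `w_plus`, `hodge(w_minus)` equals the negative of `w_minus`, and the two parts are orthogonal, mirroring the decomposition [1] pointwise.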
Now let $E$ be a complex vector bundle over $X$ as
above, provided with a connection $\nabla$, regarded as a
$\mathbb{C}$-linear operator


$\nabla: \Gamma(E) \to \Gamma(E) \otimes \Omega^1_X$

satisfying the Leibniz rule:

$\nabla(f\sigma) = f\nabla\sigma + \sigma \otimes df$


for all $f \in C^\infty(X)$ and $\sigma \in \Gamma(E)$. Its curvature
$F_\nabla = \nabla \circ \nabla$ is a 2-form with values in $\mathrm{End}(E)$, that
is, $F_\nabla \in \Gamma(\mathrm{End}(E)) \otimes \Omega^2_X$, satisfying the Bianchi
identity $\nabla F_\nabla = 0$.

The Yang-Mills equation is


$\nabla * F_\nabla = 0$ [2]


It is a second-order nonlinear equation on the
connection $\nabla$. It amounts to a nonabelian
generalization of Maxwell’s equations, to which it reduces
when $E$ is a line bundle; the four components of $\nabla$
are interpreted as the electric and magnetic
potentials.

An instanton on $E$ is a smooth connection $\nabla$
whose curvature $F_\nabla$ is anti-self-dual as a 2-form,
that is, it satisfies:


$F_\nabla^+ = 0$, that is, $*F_\nabla = -F_\nabla$ [3]


The instanton equation is still nonlinear (it is linear
only if $E$ is a line bundle), but it is only first-order
on the connection.


Note that if $F_\nabla$ is either self-dual or anti-self-dual
as a 2-form, then the Yang-Mills equation is
automatically satisfied:


$*F_\nabla = \pm F_\nabla \;\Rightarrow\; \nabla * F_\nabla = \pm\nabla F_\nabla = 0$


by the Bianchi identity. In other words, instantons
are particular solutions of the Yang-Mills equation.
Furthermore, while the Yang-Mills equation [2]
makes sense over any Riemannian manifold, the
instanton equation [3] is well defined only in
dimension 4.

A gauge transformation is a bundle automorphism
$g: E \to E$ covering the identity. The set of all gauge
transformations of a given bundle $E \to X$ forms a
group under composition, called the gauge group
and denoted by $\mathcal{G}(E)$. The gauge group acts on the
set of all smooth connections on $E$ by conjugation:


$g \cdot \nabla = g^{-1}\nabla g$


It is then easy to see that [3] is a gauge-invariant
condition, since $F_{g\cdot\nabla} = g^{-1}F_\nabla g$. The anti-self-duality
equation [3] is also conformally invariant: a
conformal change in the metric does not change the
decomposition [1], so it preserves self-dual and
anti-self-dual 2-forms.

The topological charge $k$ of the instanton $\nabla$ is
defined by the integral


$k = -\frac{1}{8\pi^2} \int_X \operatorname{tr}(F_\nabla \wedge F_\nabla) = c_2(E) - \frac{1}{2} c_1(E)^2$ [4]


where the second equality follows from Chern-Weil
theory.

If $X$ is a smooth, noncompact, complete
Riemannian manifold, an instanton on $X$ is an anti-self-dual
connection for which the integral [4] converges.
Note that, in this case, $k$ as above need not be an
integer; however, it is always expected to be
quantized, that is, always a multiple of some fixed
(rational) number which depends only on the base
manifold $X$.


Summary This note is organized as follows.
After revisiting the variational approach to the
anti-self-duality equation [3], we study instantons
over the simplest possible Riemannian 4-manifold,
$\mathbb{R}^4$ with the flat Euclidean metric. In the
subsequent sections, we present ’t Hooft’s explicit
solutions, the ADHM construction, and its
dimensional reductions to $\mathbb{R}^3$, $\mathbb{R}^2$ and $\mathbb{R}$. We conclude by
explaining the construction of the central object of
study in gauge theory, the instanton moduli
spaces.

Variational Aspects of Yang-Mills
Equation


Given a fixed smooth vector bundle $E \to X$, let $\mathcal{A}(E)$
be the set of all (smooth) connections on $E$. The
Yang-Mills functional is defined by


$\mathrm{YM}: \mathcal{A}(E) \to \mathbb{R}$


$\mathrm{YM}(\nabla) = \|F_\nabla\|^2_{L^2} = \int_X \operatorname{tr}(F_\nabla \wedge *F_\nabla)$ [5]

The Euler-Lagrange equation for this functional is
exactly the Yang-Mills equation [2]. In particular,
self-dual and anti-self-dual connections yield critical
points of the Yang-Mills functional.

Splitting the curvature into its self-dual and
anti-self-dual parts, we have


$\mathrm{YM}(\nabla) = \|F_\nabla^+\|^2_{L^2} + \|F_\nabla^-\|^2_{L^2}$


It is then easy to see that every anti-self-dual
connection $\nabla$ is an absolute minimum for the
Yang-Mills functional, and that $\mathrm{YM}(\nabla)$ coincides
with the topological charge [4] of the instanton $\nabla$
times $8\pi^2$.

One can construct, for various 4-manifolds but
most interestingly for $X = S^4$, solutions of the
Yang-Mills equations which are neither self-dual
nor anti-self-dual. Such solutions do not minimize
[5]. Indeed, at least for gauge group $SU(2)$ or
$SU(3)$, it can be shown that there are no other
local minima: any critical point which is neither
self-dual nor anti-self-dual is unstable and must be
a “saddle point” (Bourguignon and Lawson
Jr. 1981).


Instantons on Euclidean Space 


Let $X = \mathbb{R}^4$ with the flat Euclidean metric, and
consider a Hermitian vector bundle $E \to \mathbb{R}^4$. Any
connection $\nabla$ on $E$ is of the form $d + A$, where $d$
denotes the usual de Rham operator and $A \in
\Gamma(\mathrm{End}(E)) \otimes \Omega^1_{\mathbb{R}^4}$ is a 1-form with values in the
endomorphisms of $E$; this can be written as follows:


$A = \sum_{k=1}^{4} A_k\,dx^k, \quad A_k: \mathbb{R}^4 \to \mathfrak{u}(r)$


In the Euclidean coordinates $x_1, x_2, x_3, x_4$, the
anti-self-duality equation [3] is given by


$F_{12} = F_{34}, \quad F_{13} = -F_{24}, \quad F_{14} = F_{23}$

where

$F_{ij} = \frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j} + [A_i, A_j]$

The simplest explicit solution is the charge-1 $SU(2)$
instanton on $\mathbb{R}^4$. The connection 1-form is given by


$A_0 = \frac{1}{1 + |x|^2} \cdot \operatorname{Im}(q\,d\bar q)$ [6]

where $q$ is the quaternion $q = x_1 + x_2 i + x_3 j + x_4 k$,
while Im denotes the imaginary part of the product
quaternion; we are regarding $i, j, k$ as a basis of the
Lie algebra $\mathfrak{su}(2)$; from this, one can compute the
curvature:


$F_{A_0} = \left( \frac{1}{1 + |x|^2} \right)^2 \cdot \operatorname{Im}(dq \wedge d\bar q)$ [7]


We see that the action density function


$|F_A|^2 = \left( \frac{1}{1 + |x|^2} \right)^2$


has a bell-shaped profile centered at the origin and
decays like $r^{-4}$.

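Let $t_{\lambda,y}: \mathbb{R}^4 \to \mathbb{R}^4$ be the conformal map given by the
composition of a translation by $y \in \mathbb{R}^4$ with a
homothety by $\lambda \in \mathbb{R}^+$. The pullback connection
$t_{\lambda,y}^*A_0$ is still anti-self-dual; more explicitly,


$A_{\lambda,y} = t_{\lambda,y}^*A_0 = \frac{\lambda^2}{\lambda^2 + |x - y|^2} \cdot \operatorname{Im}(q\,d\bar q)$


$F_{A_{\lambda,y}} = \left( \frac{\lambda^2}{\lambda^2 + |x - y|^2} \right)^2 \operatorname{Im}(dq \wedge d\bar q)$


Note that the action density function $|F_A|^2$ has again
a bell-shaped profile centered at $y$ and decays like
$r^{-4}$; the parameter $\lambda$ measures the concentration of
the energy density function, and can be interpreted
as the “size” of the instanton $A_{\lambda,y}$.

Instantons of topological charge $k$ can be obtained
by “superimposing” $k$ basic instantons, via the
so-called ’t Hooft ansatz. Consider the function
$\rho: \mathbb{R}^4 \to \mathbb{R}$ given by

$\rho(x) = 1 + \sum_{j=1}^{k} \frac{\lambda_j^2}{(x - y_j)^2}$


where $\lambda_j \in \mathbb{R}$ and $y_j \in \mathbb{R}^4$. Then the connection
1-form $A = A_\mu\,dx_\mu$, with coefficients


$A_\mu = i \sum_{\nu=1}^{4} \bar\sigma_{\mu\nu} \frac{\partial}{\partial x_\nu} \ln(\rho(x))$ [8]

is anti-self-dual; here, $\bar\sigma_{\mu\nu}$ are the matrices given by
($\mu, \nu = 1, 2, 3$):


$\bar\sigma_{\mu\nu} = \frac{1}{4i}[\sigma_\mu, \sigma_\nu], \quad \bar\sigma_{\mu 4} = \frac{1}{2}\sigma_\mu$


where $\sigma_\mu$ are the Pauli matrices.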


The connection [8] corresponds to $k$ instantons
centered at points $y_j$ with size $\lambda_j$. The basic
instanton [6] is exactly (modulo gauge
transformation) what one obtains from [8] for the case $k = 1$.
The ’t Hooft instantons form a $5k$-parameter family
of anti-self-dual connections.

SU(2) instantons are also the building blocks for
instantons with general structure group (Bernard
et al. 1977). Let G be a compact semisimple Lie group,
with Lie algebra $\mathfrak{g}$. Let $\phi : \mathfrak{su}(2) \to \mathfrak{g}$ be any injective
Lie algebra homomorphism. If A is an anti-self-dual
SU(2) connection 1-form, then it is easy to see that
$\phi(A)$ is an anti-self-dual G-connection 1-form. Using
[8] as an example, we have that

$$A_\mu = i \sum_{\nu=1}^{4} \phi(\bar{\sigma}_{\mu\nu})\, \frac{\partial}{\partial x_\nu} \ln(\rho(x)) \qquad [9]$$

is a G-instanton on $\mathbb{R}^4$.

While this guarantees the existence of G-instantons
on $\mathbb{R}^4$, note that the instanton [9] might be
reducible (e.g., $\phi$ can simply be the obvious
inclusion of $\mathfrak{su}(2)$ into $\mathfrak{su}(n)$ for any n) and that
its charge depends on the choice of representation $\phi$.
Furthermore, it is not clear whether every
G-instanton can be obtained in this way, as the
inclusion of an SU(2) instanton through some
representation $\phi : \mathfrak{su}(2) \to \mathfrak{g}$.


The ADHM Construction 


All SU(r) instantons on $\mathbb{R}^4$ can be obtained through
a remarkable construction due to Atiyah, Drinfeld,
Hitchin, and Manin. It starts by considering
Hermitian vector spaces V and W of dimension c
and r, respectively, and the following data (the
so-called ADHM data):

$$B_1, B_2 \in \mathrm{End}(V), \qquad i \in \mathrm{Hom}(W, V), \qquad j \in \mathrm{Hom}(V, W)$$

Assume, moreover, that $(B_1, B_2, i, j)$ satisfy the
ADHM equations:

$$[B_1, B_2] + ij = 0 \qquad [10]$$

$$[B_1, B_1^\dagger] + [B_2, B_2^\dagger] + ii^\dagger - j^\dagger j = 0 \qquad [11]$$

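Now consider the following maps

$$\alpha : V \times \mathbb{R}^4 \to (V \oplus V \oplus W) \times \mathbb{R}^4$$
$$\beta : (V \oplus V \oplus W) \times \mathbb{R}^4 \to V \times \mathbb{R}^4$$

given as follows (1 denotes the appropriate identity
matrix):

$$\alpha(z_1, z_2) = \begin{pmatrix} B_1 + z_1 \mathbf{1} \\ B_2 + z_2 \mathbf{1} \\ j \end{pmatrix} \qquad [12]$$

$$\beta(z_1, z_2) = \begin{pmatrix} -B_2 - z_2 \mathbf{1} & B_1 + z_1 \mathbf{1} & i \end{pmatrix} \qquad [13]$$

where $z_1 = x_1 + i x_2$ and $z_2 = x_3 + i x_4$ are complex
coordinates on $\mathbb{R}^4$. The maps [12] and [13] should
be understood as a family of linear maps parametrized
by points in $\mathbb{R}^4$.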

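A straightforward calculation shows that the ADHM
equation [10] implies that $\beta\alpha = 0$ for every $(z_1, z_2) \in \mathbb{R}^4$:
indeed, $\beta\alpha = [B_1, B_2] + ij$, since the terms involving $z_1$ and $z_2$ cancel.
Therefore, the quotient $E = \ker\beta / \mathrm{im}\,\alpha = \ker\beta \cap \ker\alpha^\dagger$
forms a complex vector bundle over $\mathbb{R}^4$ of rank r
whenever $(B_1, B_2, i, j)$ is such that $\alpha$ is injective and $\beta$ is
surjective for every $(z_1, z_2) \in \mathbb{R}^4$.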
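
The cancellation behind $\beta\alpha = 0$ can be checked numerically. The following is a minimal sketch in plain Python, assuming matrices stored as nested lists; the helper names (`alpha`, `beta`, `beta_alpha`) and the two sample data sets are illustrative choices, not part of the original construction. The second data set satisfies only the complex equation [10], which is all that $\beta\alpha = 0$ requires.

```python
def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(s, A):
    return [[s * a for a in row] for row in A]

def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def alpha(B1, B2, j, z1, z2):
    """Map [12]: rows of B1 + z1, B2 + z2 and j stacked into a (2c + r) x c matrix."""
    c = len(B1)
    return add(B1, scale(z1, identity(c))) + add(B2, scale(z2, identity(c))) + j

def beta(B1, B2, i, z1, z2):
    """Map [13]: blocks -B2 - z2, B1 + z1 and i placed side by side (c x (2c + r))."""
    c = len(B1)
    left = scale(-1, add(B2, scale(z2, identity(c))))
    mid = add(B1, scale(z1, identity(c)))
    return [left[r] + mid[r] + i[r] for r in range(c)]

def beta_alpha(B1, B2, i, j, z1, z2):
    return matmul(beta(B1, B2, i, z1, z2), alpha(B1, B2, j, z1, z2))

# Sample data (assumptions for illustration only).
# c = 1, r = 2: B1 = B2 = 0, i = (1 0), j = (0 1)^T.
basic = ([[0]], [[0]], [[1, 0]], [[0], [1]])
# c = 2, r = 2: commuting diagonal B1, B2 and i, j with i j = 0.
commuting = ([[1, 0], [0, -1]], [[2j, 0], [0, 3]], [[1, 0], [1, 0]], [[0, 0], [1, 1]])
# Data violating [10]: here i j = 1, so beta * alpha = 1 at every point.
violating = ([[0]], [[0]], [[1, 0]], [[1], [0]])

for B1, B2, i, j in (basic, commuting, violating):
    print(beta_alpha(B1, B2, i, j, 1 + 2j, -3j))
```

Because the $z$-dependent terms cancel identically, the printed matrix is the same at every point $(z_1, z_2)$; it vanishes exactly when [10] holds.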

To define a connection on E, note that E can be
regarded as a sub-bundle of the trivial bundle
$(V \oplus V \oplus W) \times \mathbb{R}^4$. So let $\iota : E \to (V \oplus V \oplus W) \times \mathbb{R}^4$ be the
inclusion, and let $P : (V \oplus V \oplus W) \times \mathbb{R}^4 \to E$ be the
orthogonal projection onto E. We can then define a
connection $\nabla$ on E through the projection formula

$$\nabla s = P\, d\, \iota(s)$$

where d denotes the trivial connection on the trivial
bundle $(V \oplus V \oplus W) \times \mathbb{R}^4$.

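To see that this connection is anti-self-dual, note
that the projection P can be written as follows:

$$P = 1 - D^\dagger \Xi^{-1} D$$

where

$$D : (V \oplus V \oplus W) \times \mathbb{R}^4 \to (V \oplus V) \times \mathbb{R}^4, \qquad D = \begin{pmatrix} \beta \\ \alpha^\dagger \end{pmatrix}$$

and $\Xi = DD^\dagger$. Note that D is surjective, so that $\Xi$ is
indeed invertible. Moreover, it also follows from
[11] that $\beta\beta^\dagger = \alpha^\dagger\alpha$, so that $\Xi^{-1} = (\beta\beta^\dagger)^{-1} \mathbf{1}$.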

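The curvature $F_\nabla$ is given by

$$F_\nabla = P\big(d(1 - D^\dagger \Xi^{-1} D)\, d\big) = P\big(d D^\dagger\, \Xi^{-1} (dD)\big)$$
$$= P\big((dD^\dagger)\, \Xi^{-1} (dD) + D^\dagger d(\Xi^{-1}(dD))\big)$$
$$= (dD^\dagger)\, \Xi^{-1} (dD)$$

for $P\big(D^\dagger d(\Xi^{-1}(dD))\big) = 0$ on $E = \ker D$. Since $\Xi$ is
diagonal, we conclude that $F_\nabla$ is proportional to
$dD^\dagger \wedge dD$, as a 2-form.

It is then a straightforward calculation to show
that each entry of $dD^\dagger \wedge dD$ belongs to $\Omega^{2,-}$.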

The extraordinary accomplishment of Atiyah, Drinfeld,
Hitchin, and Manin was to show that every
instanton, up to gauge equivalence, can be obtained in
this way (see, e.g., Donaldson and Kronheimer 1990).
For instance, the basic SU(2) instanton [6] is associated
with the following data (c = 1, r = 2):

$$B_1 = B_2 = 0, \qquad i = \begin{pmatrix} 1 & 0 \end{pmatrix}, \qquad j = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$


Remark The ADHM data $(B_1, B_2, i, j)$ are said to
be stable if $\beta$ is surjective for every $(z_1, z_2) \in \mathbb{R}^4$, and
costable if $\alpha$ is injective for every
$(z_1, z_2) \in \mathbb{R}^4$. The data $(B_1, B_2, i, j)$ are regular if they are both stable
and costable. The quotient

{regular solutions of [10] and [11]}/U(V)

coincides with the moduli space of instantons
of rank r = dim W and charge c = dim V on $\mathbb{R}^4$ (see
below). It is also an example of a quiver variety (see
Finite Dimensional Algebras and Quivers), associated
to the quiver consisting of two vertices V and
W, two loop-edges on the vertex V and two edges
linking V to W, one in each direction.


Dimensional Reductions of the 
Anti-Self-Dual Yang-Mills Equation 


As pointed out above, a connection on a Hermitian
vector bundle $E \to \mathbb{R}^4$ of rank r can be regarded as a
1-form

$$A = \sum_{k=1}^{4} A_k(x_1, \ldots, x_4)\, dx^k, \qquad A_k : \mathbb{R}^4 \to \mathfrak{u}(r)$$

Assuming that the connection components $A_k$ are
invariant under translation in one direction, say $x_4$,
we can think of

$$A = \sum_{k=1}^{3} A_k(x_1, x_2, x_3)\, dx^k$$

as a connection on a Hermitian vector bundle over
$\mathbb{R}^3$, with the fourth component $\phi = A_4$ being
regarded as a bundle endomorphism $\phi : E \to E$,
called a Higgs field. In this way, the anti-self-duality
equation [3] reduces to the so-called Bogomolny (or
monopole) equation:

$$F_A = * d\phi \qquad [14]$$


where $*$ is the Euclidean Hodge star in dimension 3.

Now assume that the connection components $A_k$
are invariant under translation in two directions, say
$x_3$ and $x_4$. Consider

$$A = \sum_{k=1}^{2} A_k(x_1, x_2)\, dx^k$$

as a connection on a Hermitian vector bundle over
$\mathbb{R}^2$, with the third and fourth components combined
into a complex bundle endomorphism:

$$\Phi = (A_3 + i A_4)(dx_1 - i\, dx_2)$$

taking values in 1-forms. The anti-self-duality
equation [3] is then reduced to the so-called
Hitchin's equations:

$$F_A = [\Phi, \Phi^*], \qquad \bar{\partial}_A \Phi = 0 \qquad [15]$$


Conformal invariance of the anti-self-duality equation
means that Hitchin's equations are well defined
over any Riemann surface.

Finally, assume that the connection components $A_k$
are invariant under translation in three directions, say
$x_2$, $x_3$, and $x_4$. After gauging away the first component
$A_1$, the anti-self-duality equations [3] reduce to
the so-called Nahm's equations:

$$\frac{dT_k}{dx_1} + \frac{1}{2} \sum_{j,l} \epsilon_{kjl}\, [T_j, T_l] = 0, \qquad j, k, l \in \{2, 3, 4\} \qquad [16]$$

where each $T_k$ is regarded as a map $\mathbb{R} \to \mathfrak{u}(r)$.

Readers who are interested in monopoles and
Nahm's equations are referred to the survey
by Murray (2002) and references therein. The best
sources for Hitchin's equations are still Hitchin's
(1987a, b) original papers. A beautiful duality,
known as the Nahm transform, relates the various
reductions of the anti-self-duality equation to periodic
instantons; see the survey article by Jardim (2004).

It is also worth mentioning the book by Mason
and Woodhouse (1996), where other interesting
dimensional reductions of the anti-self-duality equation
are discussed, providing a deep relation
between instantons and the general theory of
integrable systems.
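
As a small illustration of Nahm's equations [16], the sketch below checks numerically that $T_k(x) = e_k / x$ is a solution when $e_2, e_3, e_4$ is a basis of $\mathfrak{su}(2)$ with $[e_3, e_4] = e_2$ and cyclic permutations. This particular solution, the choice $e_k = -i\sigma/2$, and the function names are assumptions made for the example; they are not taken from the text above.

```python
def commutator(A, B):
    """Commutator [A, B] of two 2x2 matrices stored as nested lists."""
    def mul(X, Y):
        return [[sum(X[r][k] * Y[k][c] for k in range(2)) for c in range(2)]
                for r in range(2)]
    AB, BA = mul(A, B), mul(B, A)
    return [[AB[r][c] - BA[r][c] for c in range(2)] for r in range(2)]

# Anti-Hermitian basis e_k = -i * sigma / 2, labelled 2, 3, 4 as in [16].
BASIS = {
    2: [[0, -0.5j], [-0.5j, 0]],
    3: [[0, -0.5], [0.5, 0]],
    4: [[-0.5j, 0], [0, 0.5j]],
}

def T(k, x):
    """Candidate solution T_k(x) = e_k / x (assumed ansatz)."""
    return [[entry / x for entry in row] for row in BASIS[k]]

def nahm_residual(x, h=1e-5):
    """Largest entry of dT_k/dx + [T_j, T_l] over the cyclic triples (k, j, l)."""
    worst = 0.0
    for k, j, l in [(2, 3, 4), (3, 4, 2), (4, 2, 3)]:
        plus, minus = T(k, x + h), T(k, x - h)
        comm = commutator(T(j, x), T(l, x))
        for r in range(2):
            for c in range(2):
                derivative = (plus[r][c] - minus[r][c]) / (2 * h)  # central difference
                worst = max(worst, abs(derivative + comm[r][c]))
    return worst

print(nahm_residual(1.5))  # small, limited only by the finite-difference step
```

For a cyclic triple the sum $\frac{1}{2}\sum_{j,l}\epsilon_{kjl}[T_j, T_l]$ reduces to the single commutator $[T_j, T_l]$, which is what the loop evaluates.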


The Instanton Moduli Space


Now fix a rank-r complex vector bundle E over a
four-dimensional Riemannian manifold X. Observe
that the difference between any two connections is a
linear operator:

$$(\nabla - \nabla')(f\sigma) = f\nabla\sigma + \sigma \cdot df - f\nabla'\sigma - \sigma \cdot df = f(\nabla - \nabla')\sigma$$

In other words, any two connections on E differ by
an endomorphism-valued 1-form. Therefore, the set
of all smooth connections on E, denoted by $\mathcal{A}(E)$,
has the structure of an affine space over
$\Gamma(\mathrm{End}(E)) \otimes \Omega^1_X$.


The gauge group $\mathcal{G}(E)$ acts on $\mathcal{A}(E)$ via
conjugation:

$$g \cdot \nabla := g^{-1} \nabla g$$

We can form the quotient set $\mathcal{B}(E) = \mathcal{A}(E)/\mathcal{G}(E)$,
which is the set of gauge equivalence classes of
connections on E.

The set of gauge equivalence classes of anti-self-dual
connections on E is a subset of $\mathcal{B}(E)$, and it is
called the moduli space of instantons on $E \to X$, denoted $\mathcal{M}_X(E)$. The
subset of $\mathcal{M}_X(E)$ consisting of irreducible anti-self-dual
connections is denoted $\mathcal{M}^*_X(E)$.

Since the choice of a particular vector bundle
within its topological class is immaterial, these sets
are usually labeled by the topological invariants
(Chern or Pontryagin classes) of the bundle E. For
instance, $\mathcal{M}_X(r, k)$ denotes the moduli space of
instantons on a rank-r complex vector bundle
$E \to X$ with $c_1(E) = 0$ and $c_2(E) = k > 0$. It turns
out that $\mathcal{M}_X(E)$ can be given the structure of a
Hausdorff topological space. In general, $\mathcal{M}_X(E)$ will
be singular as a differentiable manifold, but $\mathcal{M}^*_X(E)$
can always be given the structure of a smooth
Riemannian manifold.

We start by explaining the notion of an $L^2_p$ vector
bundle. Recall that $L^2_p(\mathbb{R}^n)$ denotes the completion
of the space of smooth functions $f : \mathbb{R}^n \to \mathbb{C}$ with
respect to the norm:

$$\|f\|^2_{L^2_p} = \int_{\mathbb{R}^n} \left( |f|^2 + |df|^2 + \cdots + |d^{(p)} f|^2 \right)$$

In dimension n = 4 and for p > 2, by virtue of the
Sobolev embedding theorem, $L^2_p$ consists of continuous
functions, i.e., $L^2_p(\mathbb{R}^n) \subset C^0(\mathbb{R}^n)$. So we define
the notion of an $L^2_p$ vector bundle as a topological
vector bundle whose transition functions are in $L^2_p$,
where p > 2.

Now for a fixed $L^2_p$ vector bundle E over X, we can
consider the metric space $\mathcal{A}_p(E)$ of all connections on
E which can be represented locally on an open subset
$U \subset X$ as an $L^2_p(U)$ 1-form. In this topology, the subset
of irreducible connections $\mathcal{A}^*_p(E)$ becomes an open
dense subset of $\mathcal{A}_p(E)$. Since any topological vector
bundle admits a compatible smooth structure, we may
regard $L^2_p$ connections as those that differ from a
smooth connection by an $L^2_p$ 1-form. In other words,
$\mathcal{A}_p(E)$ becomes an affine space modeled over the
Hilbert space of $L^2_p$ 1-forms with values in the
endomorphisms of E. The curvature of a connection
in $\mathcal{A}_p(E)$ then becomes an $L^2_{p-1}$ 2-form with values in
the endomorphism bundle End(E).

Moreover, let $\mathcal{G}_{p+1}(E)$ be defined as the topological
group of all $L^2_{p+1}$ bundle automorphisms. By
virtue of the Sobolev multiplication theorem,
$\mathcal{G}_{p+1}(E)$ has the structure of an infinite-dimensional
Lie group modeled on a Hilbert space; its Lie
algebra is the space of $L^2_{p+1}$ sections of End(E).

The Sobolev multiplication theorem is once again
invoked to guarantee that the action
$\mathcal{G}_{p+1}(E) \times \mathcal{A}_p(E) \to \mathcal{A}_p(E)$ is a smooth map of Hilbert manifolds.
The quotient space $\mathcal{B}_p(E) = \mathcal{A}_p(E)/\mathcal{G}_{p+1}(E)$
inherits a topological structure; it is a metric (hence
Hausdorff) topological space. Therefore, the subspace
$\mathcal{M}_X(E)$ of $\mathcal{B}_p(E)$ is also a Hausdorff topological
space; moreover, one can show that the
topology of $\mathcal{M}_X(E)$ does not depend on p.

The quotient space $\mathcal{B}_p(E)$ fails to be a Hilbert
manifold because in general the action of $\mathcal{G}_{p+1}(E)$ on
$\mathcal{A}_p(E)$ is not free. Indeed, if A is a connection on a
rank-r complex vector bundle E over a connected
base manifold X, which is associated with a
principal G-bundle, then the isotropy group of A
within the gauge group

$$\Gamma_A = \{ g \in \mathcal{G}_{p+1}(E) \mid g(A) = A \}$$

is isomorphic to the centralizer of the holonomy
group of A within G.

This means that the subspace of irreducible connections
$\mathcal{A}^*_p(E)$ can be equivalently defined as the open
dense subset of $\mathcal{A}_p(E)$ consisting of those connections
whose isotropy group is minimal, that is,

$$\mathcal{A}^*_p(E) = \{ A \in \mathcal{A}_p(E) \mid \Gamma_A = \mathrm{center}(G) \}$$

Now $\mathcal{G}_{p+1}(E)$ acts with constant isotropy on $\mathcal{A}^*_p(E)$;
hence, the quotient $\mathcal{B}^*_p(E) = \mathcal{A}^*_p(E)/\mathcal{G}_{p+1}(E)$ acquires
the structure of a smooth Hilbert manifold.


Remark The analysis of neighborhoods of points
in $\mathcal{B}_p(E) \setminus \mathcal{B}^*_p(E)$ is very relevant for applications of
the instanton moduli spaces to differential topology.
The simplest situation occurs when A is an SU(2)
connection on a rank-2 complex vector bundle E
which reduces to a pair of U(1) connections, and such that [A] occurs
as an isolated point in $\mathcal{B}_p(E) \setminus \mathcal{B}^*_p(E)$. Then a
neighborhood of [A] in $\mathcal{B}_p(E)$ looks like a cone on
an infinite-dimensional complex projective space.


Alternatively, the instanton moduli space $\mathcal{M}_X(E)$
can also be described by first taking the subset of all
anti-self-dual connections and then taking the
quotient under the action of the gauge group.
More precisely, consider the map

$$\rho : \mathcal{A}_p(E) \to L^2_{p-1}(\mathrm{End}(E) \otimes \Omega^{2,+}_X), \qquad \rho(A) = F_A^+ \qquad [17]$$

Thus, $\rho^{-1}(0)$ is exactly the set of all anti-self-dual
connections. It is $\mathcal{G}_{p+1}(E)$-invariant, so we can take
the quotient to get

$$\mathcal{M}_X(E) = \rho^{-1}(0) / \mathcal{G}_{p+1}(E)$$

It follows that the subspace $\mathcal{M}^*_X(E) = \mathcal{B}^*_p(E) \cap \mathcal{M}_X(E)$
has the structure of a smooth Hilbert
manifold. Index theory comes into play to show
that $\mathcal{M}^*_X(E)$ is finite dimensional. Recall that if D is
an elliptic operator on a vector bundle over a
compact manifold, then D is Fredholm (i.e., ker D
and coker D are finite dimensional) and its index

$$\mathrm{ind}\, D = \dim \ker D - \dim \mathrm{coker}\, D$$

can be computed in terms of topological invariants,
as prescribed by the Atiyah-Singer index theorem.
The goal here is to identify the tangent space of
$\mathcal{M}^*_X(E)$ with the kernel of an elliptic operator.

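It is clear that, for each $A \in \mathcal{A}_p(E)$, the tangent
space $T_A \mathcal{A}_p(E)$ is just $L^2_p(\mathrm{End}(E) \otimes \Omega^1_X)$. We define
the pairing

$$\langle a, b \rangle = \int_X a \wedge * b \qquad [18]$$

and it is easy to see that this pairing defines a
Riemannian metric (the so-called $L^2$-metric) on $\mathcal{A}_p(E)$.

The derivative of the map $\rho$ in [17] at the point A
is given by

$$d_A^+ : L^2_p(\mathrm{End}(E) \otimes \Omega^1_X) \to L^2_{p-1}(\mathrm{End}(E) \otimes \Omega^{2,+}_X), \qquad a \mapsto (d_A a)^+$$

so that for each $A \in \rho^{-1}(0)$ we have

$$T_A\, \rho^{-1}(0) = \{ a \in L^2_p(\mathrm{End}(E) \otimes \Omega^1_X) \mid d_A^+ a = 0 \}$$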

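Now for a gauge equivalence class $[A] \in \mathcal{B}^*_p(E)$, the
tangent space $T_{[A]} \mathcal{B}^*_p(E)$ consists of those 1-forms
which are orthogonal to the fibers of the principal
$\mathcal{G}_{p+1}(E)$ bundle $\mathcal{A}^*_p(E) \to \mathcal{B}^*_p(E)$. At a point $A \in \mathcal{A}_p(E)$,
the derivative of the action by some $g \in \mathcal{G}_{p+1}(E)$ is

$$-d_A : L^2_{p+1}(\mathrm{End}(E)) \to L^2_p(\mathrm{End}(E) \otimes \Omega^1_X)$$

The usual Hodge decomposition gives us an
orthogonal decomposition:

$$L^2_p(\mathrm{End}(E) \otimes \Omega^1_X) = \mathrm{im}\, d_A \oplus \ker d_A^*$$

which means that:

$$T_{[A]} \mathcal{B}^*_p(E) = \{ a \in L^2_p(\mathrm{End}(E) \otimes \Omega^1_X) \mid d_A^* a = 0 \}$$

Thus, the pairing [18] also defines a Riemannian
metric on $\mathcal{B}^*_p(E)$. Putting these together, we conclude
that the space $T_{[A]} \mathcal{M}^*_X(E)$ tangent to $\mathcal{M}^*_X(E)$ at an
equivalence class [A] of anti-self-dual connections
can be described as follows:

$$T_{[A]} \mathcal{M}^*_X(E) = \{ a \in L^2_p(\mathrm{End}(E) \otimes \Omega^1_X) \mid d_A^* a = d_A^+ a = 0 \} \qquad [19]$$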
It turns out that the so-called deformation operator
$\delta_A = d_A^* \oplus d_A^+$:

$$\delta_A : L^2_p(\mathrm{End}(E) \otimes \Omega^1_X) \to L^2_{p-1}(\mathrm{End}(E)) \oplus L^2_{p-1}(\mathrm{End}(E) \otimes \Omega^{2,+}_X)$$

is elliptic. Moreover, if A is anti-self-dual then coker
$\delta_A$ vanishes, so that $T_{[A]} \mathcal{M}^*_X(E) = \ker \delta_A$. The
dimension of the tangent space $T_{[A]} \mathcal{M}^*_X(E)$ is then
simply given by the index of the deformation
operator $\delta_A$. Using the Atiyah-Singer index theorem,
we have for SU(r) bundles with $c_2(E) = k$:

$$\dim \mathcal{M}^*_X(E) = 4rk - (r^2 - 1)(1 - b_1(X) + b_+(X))$$

The dimension formula for arbitrary gauge group G
can be found in Atiyah et al. (1978).

For example, the moduli space of SU(2) instantons
on $\mathbb{R}^4$ of charge k is a smooth Riemannian manifold
of dimension 8k - 3. These parameters are interpreted
as the 5k parameters describing the positions
and sizes of k separate instantons, plus 3(k - 1)
parameters describing their relative SU(2) phases.
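
The parameter count can be reproduced directly from the index formula. The snippet below is a minimal sketch; the function name and the default values $b_1 = b_+ = 0$ (appropriate for $\mathbb{R}^4$, treated like the 4-sphere) are assumptions made for the example.

```python
def instanton_moduli_dimension(r, k, b1=0, b_plus=0):
    """Index formula 4rk - (r^2 - 1)(1 - b1 + b_plus) for SU(r) bundles with c_2 = k."""
    return 4 * r * k - (r * r - 1) * (1 - b1 + b_plus)

for k in range(1, 5):
    dim = instanton_moduli_dimension(2, k)
    # positions and sizes (5k) plus relative SU(2) phases (3(k - 1))
    print(k, dim, 5 * k + 3 * (k - 1))
```

For r = 2 both columns agree with 8k - 3, so the 't Hooft family (5k parameters) exhausts the moduli space only for k = 1 and k = 2 is already larger by the relative phases.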

The detailed construction of the instanton moduli
spaces can be found in Donaldson and Kronheimer
(1990). An alternative source is Morgan's lecture
notes (Friedman and Morgan 1998). It is interesting
to note that $\mathcal{M}^*_X(E)$ inherits many of the geometrical
properties of the original manifold X. Most notably,
if X is a Kähler manifold, then $\mathcal{M}^*_X(E)$ is also
Kähler; if X is a hyper-Kähler manifold, then $\mathcal{M}^*_X(E)$
is also hyper-Kähler. One expects that other
geometric structures on X can also be transferred
to the instanton moduli spaces $\mathcal{M}_X(E)$.


See also: Characteristic Classes; Finite-Dimensional 
Algebras and Quivers; Gauge Theoretic Invariants 

of 4-Manifolds; Gauge Theory: Mathematical 
Applications; Integrable Systems: Overview; Index 
Theorems; Moduli Spaces: An Introduction; Solitons and 
Other Extended Field Configurations; Twistor Theory: 
Some Applications [in Integrable Systems, Complex 
Geometry and String Theory]. 
Introduction


The notion of integrability plays many different rôles 
in quantum field theory (QFT). In this article we 
interpret it in a narrow sense and describe some QFTs 
that are completely integrable, in the sense that there 
are as many integrals of motion as degrees of freedom. 
Necessarily this implies, since we are talking about 
field theories, that there is an infinite number of 
conserved quantities. The existence of such a tower of 
conserved quantities of increasing Lorentz spin 
implies, via the Coleman—Mandula theorem, that the 
theories are trivial in spacetime dimensions greater
than 2. On the other hand, in 1 + 1 dimensions there is
a rich menagerie of such integrable quantum field 
theories (IQFTs). These theories are fascinating in their 
own right as nontrivial QFTs for which data like the 
S-matrix and spectrum can be determined exactly. We 
will describe these exact S-matrices for a series of 
seminal examples. In addition, we briefly describe the 
applications of these theories to statistical systems in 
two dimensions. 


Classical Integrable Systems and 
Field Theories 


For a field theory to be integrable it must have an 
infinite number of conserved charges. Necessarily 
these must be spacetime symmetries which extend the 
Poincaré symmetry in some way. It turns out that, due 
to a theorem of Coleman and Mandula, such
extensions are very restrictive: they are only possible in
1 + 1 dimensions (one dimension of space and one of 
time) apart from noninteracting theories. Below we 
describe some of the most important examples. 


Affine Toda Theories 


These theories describe the interactions of a set of
scalar fields which we write as a vector $\phi$. The action is

$$S = \int d^2x \left( \tfrac{1}{2} (\partial_\mu \phi)^2 - V(\phi) \right) \qquad [1]$$

The potential has to be very specially chosen in
order that the resulting theory is integrable. The
resulting theories are classified by affine Lie algebras.
We shall describe only the theories related to a
simply laced Lie algebra g (so of ADE type). In this
case, for the affine version of the theory,

$$V(\phi) = \frac{m^2}{\beta^2} \sum_{a=0}^{r} n_a\, e^{\beta \alpha_a \cdot \phi} \qquad [2]$$

where $\phi$ is an r = rank g vector and $\alpha_a$, a = 1, ..., r, are
a set of simple roots of g. The fact that we are
considering the affine version of the theory means
that we include the term involving the extended root
(the lowest root) $\alpha_0 = -\sum_{a=1}^{r} n_a \alpha_a$, which defines
the integers $n_a$ ($n_0 = 1$). If this term is absent then the
potential does not have a minimum. Such nonaffine
theories are interesting in their own right since they
include the Liouville theory, but we shall not
describe them here.

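One way to expose the infinite set of conserved
charges at the classical level is to write the equations
of motion in Lax form. This has the form of the
vanishing of the field strength, or zero-curvature
condition, of an auxiliary gauge connection in g
with components $(A_x, A_t)$:

$$A_x = \frac{\beta}{2}\, \partial_t \phi \cdot H + \frac{m}{2} \sum_{a=0}^{r} \sqrt{n_a}\, e^{\beta \alpha_a \cdot \phi / 2} (e_a + f_a)$$
$$A_t = \frac{\beta}{2}\, \partial_x \phi \cdot H + \frac{m}{2} \sum_{a=0}^{r} \sqrt{n_a}\, e^{\beta \alpha_a \cdot \phi / 2} (e_a - f_a) \qquad [3]$$

Here, $\{e_a, f_a\}$ are related to generators of g in a
Cartan-Weyl basis, via

$$e_a = E_{\alpha_a}, \quad f_a = E_{-\alpha_a} \quad (a = 1, \ldots, r), \qquad e_0 = z^{h} E_{\alpha_0}, \quad f_0 = z^{-h} E_{-\alpha_0} \qquad [4]$$

where z is an auxiliary variable known as the spectral
parameter and h is the Coxeter number of g. Using
the following commutators of g,

$$[E_{\alpha_a}, E_{-\alpha_b}] = \delta_{ab}\, \alpha_a \cdot H, \qquad [H, E_\alpha] = \alpha E_\alpha, \qquad [E_{\alpha_a}, E_{-\alpha_0}] = 0 \quad (a \neq 0) \qquad [5]$$

it is straightforward to verify that the zero-curvature
condition

$$F_{xt} = \partial_x A_t - \partial_t A_x + [A_x, A_t] = 0 \qquad [6]$$

is equivalent to the equations of motion which
follow from extremizing the action [1].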

The fact that there exists a flat connection which
depends on an auxiliary parameter z is sufficient to
ensure integrability. In brief, the idea is that the
gauge connection can be "abelianized" by a gauge
transformation:

$$a_\mu = U \partial_\mu U^{-1} + U A_\mu U^{-1} \quad \text{with} \quad [a_x, a_t] = 0 \qquad [7]$$

Hence, $\partial_x a_t - \partial_t a_x = 0$. This can be done in two
inequivalent ways, such that $a_\mu$ are polynomials in z
and $z^{-1}$, respectively. The corresponding coefficients
are then conserved currents whose integrals give
conserved charges. It can be shown that for the
Toda theories these conserved charges have Lorentz
spin given by an exponent $\{s_j\}$ of g modulo its
Coxeter number h:

$A_n$: h = n + 1, {1, 2, 3, ..., n}
$D_n$: h = 2n - 2, {1, 3, 5, ..., 2n - 3, n - 1}
$E_6$: h = 12, {1, 4, 5, 7, 8, 11}                  [8]
$E_7$: h = 18, {1, 5, 7, 9, 11, 13, 17}
$E_8$: h = 30, {1, 7, 11, 13, 17, 19, 23, 29}

This spectrum of conserved quantities seems to be a
ubiquitous feature of IQFTs. These theories can be
generalized by replacing g, or rather its (untwisted)
affinization, with any affine algebra.


The Sinh/Sine-Gordon Theory 


These theories are the simplest of the Toda theories
described above, associated to the Lie algebra $A_1$. In
this case there is a single field and the potential has
the form

$$V(\phi) = \frac{m^2}{2\beta^2} \left( e^{\beta\phi} + e^{-\beta\phi} \right) \qquad [9]$$

We have rescaled the field by $1/\sqrt{2}$ relative to the
normalization in [2]. This potential defines the
"sinh-Gordon theory." However, we can also take $\beta \to i\beta$ to
give the sine-Gordon theory with an action

$$S = \int d^2x \left( \tfrac{1}{2} (\partial_\mu \phi)^2 + \frac{m^2}{\beta^2} \cos(\beta\phi) \right) \qquad [10]$$

The sine-Gordon theory is a useful paradigm for
IQFTs because it exhibits most of the features of
more complicated examples. To start with, it illustrates
another important property of some integrable
systems; namely, the existence of solitons. In the
sine-Gordon case, the minima of the potential lie at
$\phi = 2\pi n/\beta$, for an integer n, so there is a topological
kink that separates a vacuum n on the left and n + 1
on the right, as well as an antikink. The explicit
solution for the kink moving with velocity v is

$$\phi(x, t) = \frac{4}{\beta} \tan^{-1} \exp\big( m (x \cosh\theta - t \sinh\theta - \xi) \big) \qquad [11]$$

where $\xi$ is a constant and, since we are working in
1 + 1 dimensions, we have introduced the rapidity $\theta$,
in terms of which the velocity is

$$v = \tanh\theta, \qquad -\infty < \theta < \infty \qquad [12]$$

The antikink solution is simply the negative of the
above. The kinks have a mass

$$M = \frac{8m}{\beta^2} \qquad [13]$$
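
The kink can be checked numerically against the sine-Gordon equation of motion. The sketch below assumes the normalization in which the action is $\frac{1}{2}(\partial_\mu\phi)^2 + (m^2/\beta^2)\cos(\beta\phi)$, so that the field equation reads $\partial_t^2\phi - \partial_x^2\phi + (m^2/\beta)\sin(\beta\phi) = 0$; the function names, step sizes and integration window are illustrative choices. It also integrates the energy density of the static kink, which under the same assumption should approach $8m/\beta^2$.

```python
import math

def kink(x, t, m=1.0, beta=1.0, theta=0.0, xi=0.0):
    """Kink profile (4 / beta) * arctan(exp(m (x cosh(theta) - t sinh(theta) - xi)))."""
    u = m * (x * math.cosh(theta) - t * math.sinh(theta) - xi)
    return (4.0 / beta) * math.atan(math.exp(u))

def sine_gordon_residual(x, t, m=1.0, beta=1.0, theta=0.0, xi=0.0, h=1e-3):
    """phi_tt - phi_xx + (m^2 / beta) sin(beta phi), using central differences."""
    f = lambda X, T: kink(X, T, m, beta, theta, xi)
    phi_tt = (f(x, t + h) - 2 * f(x, t) + f(x, t - h)) / h**2
    phi_xx = (f(x + h, t) - 2 * f(x, t) + f(x - h, t)) / h**2
    return phi_tt - phi_xx + (m**2 / beta) * math.sin(beta * f(x, t))

def static_kink_energy(m=1.0, beta=1.0, half_width=20.0, n=40000):
    """Midpoint-rule integral of (1/2) phi_x^2 + (m^2 / beta^2)(1 - cos(beta phi))."""
    dx = 2 * half_width / n
    total = 0.0
    for step in range(n):
        x = -half_width + (step + 0.5) * dx
        dphi = (kink(x + dx / 2, 0.0, m, beta) - kink(x - dx / 2, 0.0, m, beta)) / dx
        potential = (m**2 / beta**2) * (1 - math.cos(beta * kink(x, 0.0, m, beta)))
        total += (0.5 * dphi**2 + potential) * dx
    return total

print(sine_gordon_residual(0.3, 0.7, m=1.5, beta=2.0, theta=0.4))
print(static_kink_energy(m=1.5, beta=2.0), 8 * 1.5 / 2.0**2)
```

The potential is shifted by a constant so that the vacua have zero energy; this does not affect the equation of motion.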


The existence of topological solitons is not a
consequence of integrability per se; for example, the
$\phi^4$ theory in 1 + 1 dimensions also has kinks.
However, in the integrable setting, the solitons have
special properties that survive in the quantum theory.
The first property is that multisoliton solutions can be
found exactly using a variety of different techniques.
They are most easily written down using the tau
function, which is related to the field via

$$\phi = \frac{2i}{\beta} \log \frac{\tau^*}{\tau} \qquad [14]$$

The N-soliton solution can then be written compactly as

$$\tau = \sum_{\{\mu_p\} = 0,1} \exp\left[ \sum_{p=1}^{N} \mu_p \Phi^{(p)} + \sum_{p<q}^{N} \mu_p \mu_q \gamma^{(pq)} \right] \qquad [15]$$

The sum is over the $2^N$ possibilities for which $\mu_p = 0$
or 1, for each p, and we have introduced

$$\Phi^{(p)} = m (x \cosh\theta_p - t \sinh\theta_p - \xi_p) \pm \frac{i\pi}{2} \qquad [16]$$

The rapidity of the pth soliton is $\theta_p$, and the choice
of sign corresponds to the kink and antikink,
respectively. The "interaction coefficient" is

$$\exp \gamma^{(pq)} = \tanh^2\left( \tfrac{1}{2} (\theta_p - \theta_q) \right) \qquad [17]$$

For example, the two-soliton solution is

$$\tau = 1 + e^{\Phi^{(1)}} + e^{\Phi^{(2)}} + e^{\Phi^{(1)} + \Phi^{(2)} + \gamma^{(12)}} \qquad [18]$$

The multisoliton solutions have a natural physical
interpretation as the histories of a set of solitons
which scatter off each other. To make this more
precise, consider the two-soliton solution [18] in
more detail. Suppose that $\xi_1 < \xi_2$, $v_1 > v_2$. Focus on
the solution in the vicinity of the first soliton, that is,
$x \sim v_1 t + \xi_1$. In the limit $t \to -\infty$, the solution is
approximately

$$\tau \simeq 1 + e^{\Phi^{(1)}} \qquad [19]$$

while, as $t \to \infty$, it is approximately

$$\tau \simeq e^{\Phi^{(2)}} \left( 1 + e^{\Phi^{(1)} + \gamma^{(12)}} \right) \qquad [20]$$

In both limits, the solution represents an isolated
soliton; the only difference is that the final "position
offset" has been displaced: $\xi_1 \to \xi_1 - \gamma^{(12)}/m$. It is a
consequence of integrability that the solitons interact
in such a simple way. There were two solitons in
the initial configuration and two in the final
configuration traveling with the same velocities.
The only effect is to introduce a time delay of

$$\Delta t = \frac{\gamma^{(12)}(\theta)}{m \sinh(\theta/2)} \qquad [21]$$

in the center-of-mass frame with $\theta_1 = -\theta_2 = \theta/2$,
which we illustrate in Figure 1. We shall see that this
kind of simple scattering is a characteristic feature of
integrable field theories which extends to the
quantum theory. It reflects the enormous restriction
that the existence of the infinite set of integrals of
motion puts on the dynamics.

Figure 1 Classical scattering of a kink and an antikink. The
final velocities equal the initial velocities and the only effect is to
introduce a velocity-dependent time delay as shown.


Integrability at the Quantum Level


In this section we turn to the particular implications
of integrability for the field theories at the quantum
level. In discussing theories in 1 + 1 dimensions it is
convenient, as in [12], to use the rapidity. The
energy and momentum of a particle of mass m are
$E = m \cosh\theta$ and $p = m \sinh\theta$, respectively.

The sinh- and sine-Gordon theories, and their affine
Toda generalizations, are scalar field theories with a
well-behaved potential and as such they can be
quantized in the conventional manner. It can be
shown that integrability survives quantization and we
now address its consequences. The key observation is
that having an infinite set of higher-spin conserved
quantities is very restrictive on the possible quantum
processes. Assuming that the theory has a mass gap,
the asymptotic states $|a, \theta\rangle$ are particles with rapidity
$\theta$; any additional quantum numbers needed to specify
the state are indicated by the label a. These states are
eigenstates of the conserved charges,

$$Q_s |a, \theta\rangle = q_s(a)\, e^{s\theta}\, |a, \theta\rangle \qquad [22]$$


Here, s is the spin of the charge which ranges over
some infinite subset of the integers. Since the charges
must commute with the S-matrix, it follows immediately
that if an incoming state of n particles has a
set of rapidities $\{\theta_1, \ldots, \theta_n\}$ then the outgoing state
must also have n particles with the same set
$\{\theta_1, \ldots, \theta_n\}$: there is consequently no particle
creation! For example, we have illustrated the scattering
of two particles in Figure 2. The two-particle
S-matrix element will be denoted as

$$S_{ab}^{cd}(\theta_1 - \theta_2) : |a, \theta_1; b, \theta_2\rangle \to |c, \theta_2; d, \theta_1\rangle \qquad [23]$$

Note that the masses of the incoming particles must match
the outgoing ones: $m_a = m_d$ and $m_b = m_c$. We have
already seen this kind of behavior with the classical
scattering of solitons in the sine-Gordon theory. In
spite of the fact that the scattering is purely elastic, it
can be nontrivial for two reasons: if there are mass
degeneracies in the theory, the quantum numbers
$\{a_1, \ldots, a_n\}$ can change and, in addition, the S-matrix
element can depend nontrivially on the momenta.
The fact that the incoming and outgoing states
have the same set of momenta leads to the notion of
factorizability. To see what this means, consider the
case of three particles. Let us imagine that we
prepare the initial state to consist of three fairly
narrow wave packets in position space with
momenta smeared in accordance with the uncertainty
principle. The key to the following argument
is the fact that the infinite set of higher-spin
conserved charges (which commute with the S-matrix)
allow one to move the positions of the three
particles relative to each other in an arbitrary way.
In addition, the theory has a mass gap, so interactions
have a finite range. By using this freedom, we
can arrange for particles 1 and 2 to interact first,
well before they come within interaction range of
the third. Subsequently, the first two particles
interact with the third as on the right-hand side of
Figure 3. This ability to move the wave packets
around using the symmetries means that the
three-particle S-matrix element must "factorize" into a
product of three two-particle elements:

$$S_{abc}^{def}(\theta_1, \theta_2, \theta_3) = \sum_{ghi} S_{ab}^{gh}(\theta_1 - \theta_2)\, S_{hc}^{if}(\theta_1 - \theta_3)\, S_{gi}^{de}(\theta_2 - \theta_3) \qquad [24]$$

Figure 2 The two-particle S-matrix with particles a and b in the
initial state and c and d in the final state. For consistency,
$m_a = m_d$ and $m_b = m_c$.

Figure 3 The scattering of three particles can factorize in two
distinct ways as illustrated, leading to a nontrivial condition: the
Yang-Baxter equation.

However, we could also use the symmetries afforded
by the conserved charges to shift the positions of the
particles so that particles 2 and 3 interact first, as on
the left-hand side of Figure 3. Since the charges
commute with the S-matrix, the result must be the
same; hence, there is a nontrivial consistency
condition:

$$\sum_{ghi} S_{ih}^{ef}(\theta_1 - \theta_2)\, S_{ag}^{di}(\theta_1 - \theta_3)\, S_{bc}^{gh}(\theta_2 - \theta_3) = \sum_{ghi} S_{ab}^{gh}(\theta_1 - \theta_2)\, S_{hc}^{if}(\theta_1 - \theta_3)\, S_{gi}^{de}(\theta_2 - \theta_3) \qquad [25]$$

This is the celebrated Yang-Baxter equation. Notice
that it is only nontrivial if there are mass degeneracies,
otherwise the particles on internal lines are
determined by the external particles.
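
To see the two factorization orders agree in a concrete case, the sketch below evaluates both for the rational (Yang) solution $S_{ab}^{cd}(\theta) = \theta\, \delta_{ad}\delta_{bc} + \lambda\, \delta_{ac}\delta_{bd}$ with two degenerate labels. This S-matrix is an assumption chosen only because it is a well-known solution of the Yang-Baxter equation; it is unnormalized and is not one of the examples discussed in this article. The index contractions follow the convention of [23], in which the outgoing labels are listed in order of increasing rapidity position (c carries $\theta_2$, d carries $\theta_1$).

```python
from itertools import product

LABELS = range(2)  # two degenerate particle labels (assumption for illustration)

def two_body(theta, coupling):
    """Components S_{ab}^{cd}(theta) of the rational solution, keyed by (a, b, c, d)."""
    return {
        (a, b, c, d): theta * int(a == d and b == c) + coupling * int(a == c and b == d)
        for a, b, c, d in product(LABELS, repeat=4)
    }

def factorized_12_first(t1, t2, t3, coupling):
    """Particles 1 and 2 scatter first, then 1 and 3, then 2 and 3."""
    s12, s13, s23 = (two_body(t, coupling) for t in (t1 - t2, t1 - t3, t2 - t3))
    return {
        (a, b, c, d, e, f): sum(
            s12[a, b, g, h] * s13[h, c, i, f] * s23[g, i, d, e]
            for g, h, i in product(LABELS, repeat=3)
        )
        for a, b, c, d, e, f in product(LABELS, repeat=6)
    }

def factorized_23_first(t1, t2, t3, coupling):
    """Particles 2 and 3 scatter first, then 1 and 3, then 1 and 2."""
    s12, s13, s23 = (two_body(t, coupling) for t in (t1 - t2, t1 - t3, t2 - t3))
    return {
        (a, b, c, d, e, f): sum(
            s23[b, c, g, h] * s13[a, g, d, i] * s12[i, h, e, f]
            for g, h, i in product(LABELS, repeat=3)
        )
        for a, b, c, d, e, f in product(LABELS, repeat=6)
    }

# Integer rapidities keep the arithmetic exact.
lhs = factorized_23_first(5, 2, -1, coupling=3)
rhs = factorized_12_first(5, 2, -1, coupling=3)
print(lhs == rhs)
```

With a single label (`LABELS = range(1)`) every sum collapses to a product of three numbers and the equality holds trivially, which is the sense in which the equation is only nontrivial when there are mass degeneracies.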

The factorization of the S-matrix extends readily to
the case of more particles in an obvious way. An
n-body element factorizes into a two-body element
for each pair of particles. One might think that
considerations of the n-particle S-matrix would lead
to additional constraints; however, it can readily be
shown that this is not the case and that the Yang-Baxter
equation acts as a basic "move" which allows
one to reorder the n-particle S-matrix into an
arbitrary order. Further conditions on the S-matrix
come from the axioms of analytic S-matrix theory:


(i) Unitarity

$$\sum_{ef} S_{ab}^{ef}(\theta)\, S_{ef}^{cd}(-\theta) = \delta_{ac}\, \delta_{bd} \qquad [26]$$

(ii) Crossing symmetry Each particle a has an
antiparticle $\bar{a}$ and

$$S_{ab}^{cd}(\theta) = S_{b\bar{d}}^{\bar{a}c}(i\pi - \theta) \qquad [27]$$

(iii) Analyticity The S-matrix is a meromorphic
function of $\theta$ on the physical strip, $0 < \mathrm{Im}\,\theta < \pi$.
Singularities in most instances occur along the
imaginary axis and the simple poles correspond to
direct or cross-channel resonances. In this case, if
the S-matrix for a and b has a simple pole at $\theta = i u_{ab}^{c}$ (necessarily a
nonphysical rapidity difference) in the direct channel,
there exists a bound state of a and b of mass

$$m_c^2 = m_a^2 + m_b^2 + 2 m_a m_b \cos u_{ab}^{c} \qquad [28]$$

The situation is illustrated in Figure 4. The new
particle must itself be included in the particle spectrum.
The S-matrix elements at the pole have the form


= Pg T ee foe [29] 


where P4, can be thought of as a kind of projection 
Operator with 


2 Pa Doe [30] 


Unitarity of the QFT requires that r, is real and 
positive, although there are also examples of
nonunitary theories with exact S-matrices. If
$ab \to c$ can occur then so can $a\bar{c} \to \bar{b}$ and $b\bar{c} \to \bar{a}$.
From [28], we deduce the following identity:

$$u_{ab}^{c} + u_{bc}^{a} + u_{ca}^{b} = 2\pi \qquad [31]$$

The data $\{u_{ab}^{c}\}$ for any given scattering theory are
known as the fusing angles.

(iv) The bootstrap equations These give a nonlinear
relation between S-matrix elements. The basic
idea is that if particle c appears as a resonance in the
scattering of a and b then the S-matrix element of c
with another state d can be deduced in terms of the
scattering of d with a and b. This is illustrated in
Figure 5. Using [30], we can write the resulting
expression for the S-matrix element of c and d directly:


=) PGs (9 - 


mh J0 +i) Ph (32 
ghi 




Figure 4 Near a direct channel pole, the scattering of a and b 
is dominated by the bound state c. 





Figure 5 The bootstrap equations result from considering the 
interaction of a particle d with the bound state c of a and b in two 
distinct ways as illustrated. 


The bootstrap constraints are very powerful because 
they allow one to extract the S-matrix elements of new 
particles that appear as bound states. This leads to the 
philosophy of the “bootstrap program” where one 
attempts to build consistent S-matrices starting from 
the S-matrix for a subset of particles which act as a 
seed for the algorithm. The process is quite an art, but 
at the end one has to be satisfied that the complete 
analytic structure is consistent with all the axioms. The 
key is to be able to account for all the poles in a 
consistent way, either in terms of bound states, as 
above, or in terms of the Coleman—Thun mechanism. 
This allows some poles to be interpreted in ways other 
than the existence of a bound state. The bootstrap 
algorithm is very complicated in general and at the 
present time a complete classification of solutions is 
not known. However, there are a large number of 
known solutions which appear to be intimately related 
to Lie algebras and associated structures known as 
Yangians and quantum groups. Below we describe 
some of the simplest known solutions. 


Minimal S-Matrices 


These scattering theories are in some sense the 
simplest. The particle spectrum is generally non- 
degenerate and so the Yang—Baxter equation is 
trivial. As is ubiquitous in the subject of IQFT, the 
classification of the theories is related to Lie 
algebras, although what seems to be important is 
not so much the algebra in question but rather the 
details of the associated root system. In this case the 
appropriate algebras are the simply laced algebras of 
ADE type. The number of particles is equal to the 
rank r of the Lie algebra and the masses are given by 
the r elements of one of the eigenvectors of the 
Cartan matrix of the algebra g: 


$\sum_b C_{ab}\, m_b = \left(2 - 2\cos\frac{\pi}{h}\right) m_a$ [33]


where $h$ is the Coxeter number of g. The conserved
charges have spins corresponding to the exponents
of g modulo $h$. We briefly explain how the complete
S-matrix can be written down in terms of properties
of the root system of g. Let $\Phi$ be the set of roots of g,
and $\alpha_a$, $a = 1,\ldots,r$, a set of simple roots, as in the
last section. In terms of these, $C_{ab} = 2\alpha_a\cdot\alpha_b/\alpha_b^2$. Let
$\omega_a$, $a = 1,\ldots,r$, be a corresponding set of fundamental
weights, $\alpha_a\cdot\omega_b = \delta_{ab}$.

Key to defining the theories is the notion of the
Weyl group of g, the group generated by reflections
in the simple roots:


$R_a(\alpha) = \alpha - 2\,\frac{\alpha_a\cdot\alpha}{\alpha_a^2}\,\alpha_a$ [34]


The element $w = R_1 R_2 \cdots R_r$ is known as a Coxeter
element of the Weyl group, and it has special
properties that are significant in the present context.
In particular, its eigenvalues are of the form
$\exp(2\pi i s_a/h)$, where $h$ is the Coxeter number of g
and the integers $s_a$ are the exponents of the algebra
as in [8]. Note that there is always a pair with $s_1 = 1$
and $s_r = h - 1$. Clearly, $w$ acts as a rotation in the
two-dimensional space spanned by the two corresponding
eigenvectors. We can define an antisymmetric function
$u(\alpha,\beta)$ on roots to be $h/\pi$ times the
(signed) angle between the projections of $\alpha$ and $\beta$
onto this two-dimensional eigenspace. In preparation
for what follows, it is useful to also define the
roots


$\alpha_{(a)} = R_r R_{r-1} \cdots R_{a+1}(\alpha_a)$ [35]


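We can now present P Dorey's amazingly compact
formula for the complete S-matrix. For the scattering
of particle $a$ with particle $b$,


$S_{ab}(\theta) = \prod_{\beta\in\Gamma_b} \{1 + u(\alpha_{(a)},\beta)\}^{\omega_a\cdot\beta}$ [36]


In this formula $\Gamma_b$ is the set of positive roots of g
which lie in the orbit of $\alpha_{(b)}$ under $w$. We have also
defined the building block


$\{x\} = (x+1)(x-1)$, $(x) = \frac{\sinh\left(\frac{\theta}{2} + \frac{i\pi x}{2h}\right)}{\sinh\left(\frac{\theta}{2} - \frac{i\pi x}{2h}\right)}$ [37]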


The fusing rules are also particularly elegant in
the language of root systems. There is a three-point
coupling between $a_i$, $i = 1,2,3$, if there exist three
roots $\alpha^{(i)} \in \Gamma_{a_i}$ such that $\alpha^{(1)} + \alpha^{(2)} + \alpha^{(3)} = 0$.
Furthermore, the fusing occurs in the $a_1$, $a_2$ channel
at rapidity difference


$u_{a_1 a_2}^{\bar{a}_3} = u(\alpha^{(1)},\alpha^{(2)})$ [38]


This is Dorey's fusing rule.
For the case of $A_{n-1}$, the S-matrices are particularly
simple. The mass spectrum is

$m_a = m\sin\frac{\pi a}{n}$, $a = 1,\ldots,n-1$ [39]

and Dorey's rule gives the possible fusings as
$ab \to (a+b) \bmod n$, which occur at the rapidity
values


$\theta = i u_{ab}^{a+b} = \begin{cases} i\pi\,\dfrac{a+b}{n}, & a+b<n \\ i\pi\left(2 - \dfrac{a+b}{n}\right), & a+b>n \end{cases}$ [40]


The charge conjugation operator maps $a \to \bar{a} = n - a$
and the explicit form for the S-matrix elements is


$S_{ab}(\theta) = \{a+b-1\}\{a+b-3\}\cdots\{|a-b|+1\}$ [41]


The element $S_{ab}(\theta)$ has one direct-channel pole at
$\theta = i u_{ab}^{a+b}$, corresponding to the exchange of the
particle $a + b \bmod n$, and a cross-channel pole at
$\theta = i\pi - i u_{a\bar{b}}^{a-b}$, corresponding to the exchange of particle
$a - b \bmod n$.
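
The $A_{n-1}$ formulas above lend themselves to a quick numerical check. The following is a minimal sketch, not taken from the article: it assumes the conventional form $(x) = \sinh(\theta/2 + i\pi x/2h)/\sinh(\theta/2 - i\pi x/2h)$ with $h = n$ for the elementary factor in [37], and all function names are hypothetical. It builds $S_{ab}(\theta)$ from [41] and tests unitarity, $S_{ab}(\theta)S_{ab}(-\theta) = 1$, together with the mass triangle $m_c^2 = m_a^2 + m_b^2 + 2m_am_b\cos u$ implied by the fusing angles in [40].

```python
import cmath
import math


def block(x, theta, h):
    """Elementary factor (x); ASSUMED conventional form, see lead-in."""
    shift = 1j * math.pi * x / (2 * h)
    return cmath.sinh(theta / 2 + shift) / cmath.sinh(theta / 2 - shift)


def brace(x, theta, h):
    """Minimal building block {x} = (x+1)(x-1) of [37]."""
    return block(x + 1, theta, h) * block(x - 1, theta, h)


def s_matrix(a, b, theta, n):
    """S_ab(theta) = {a+b-1}{a+b-3}...{|a-b|+1} for A_{n-1}, eq. [41]."""
    result = 1.0 + 0.0j
    x = a + b - 1
    while x >= abs(a - b) + 1:
        result *= brace(x, theta, n)
        x -= 2
    return result


def mass(a, n, m=1.0):
    """Mass spectrum [39]."""
    return m * math.sin(math.pi * a / n)


def fusing_angle(a, b, n):
    """Fusing angle u for ab -> (a+b) mod n, eq. [40]; requires a+b != n."""
    if a + b < n:
        return math.pi * (a + b) / n
    return math.pi * (2 - (a + b) / n)


def check(n, theta=0.37, tol=1e-12):
    """Return True if unitarity and the mass triangle hold for all a, b."""
    for a in range(1, n):
        for b in range(1, n):
            unitarity = s_matrix(a, b, theta, n) * s_matrix(a, b, -theta, n)
            if abs(unitarity - 1) > tol:
                return False
            if (a + b) % n == 0:
                continue  # a + b = n is not a particle of the theory
            c = (a + b) % n
            u = fusing_angle(a, b, n)
            lhs = mass(c, n) ** 2
            rhs = (mass(a, n) ** 2 + mass(b, n) ** 2
                   + 2 * mass(a, n) * mass(b, n) * math.cos(u))
            if abs(lhs - rhs) > tol:
                return False
    return True
```

Calling `check(5)` or `check(8)` exercises every pair of particles; the rapidity value 0.37 is arbitrary.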


Affine Toda Theories 


The bootstrap program has been solved for all the 
affine Toda theories. For the simply laced theories 
described earlier, the result is directly related to 
the minimal S-matrices constructed above. The 
only difference is that there are additional factors 
which depend on the coupling $\beta$ of the Toda
theory but which do not introduce any additional 
poles onto the physical strip. These CDD factors 
are included by simply changing the basic building 
block [37]: 


$\{x\} \to \{x\}_{\rm Toda} = \frac{(x+1)(x-1)}{(x-1+B)(x+1-B)}$ [42]

where

$B = \frac{1}{2\pi}\,\frac{\beta^2}{1 + \beta^2/4\pi}$ [43]


The S-matrix structure for the Toda theories 
based on the nonsimply laced algebras is a good 
deal more complicated. Integrability is only main- 
tained in the quantum theory if the ratios of the 
physical masses of the particles depend on the 
coupling constant $\beta$ in some very special way.


The Sine-Gordon Theory 


Figure 6 Soliton scattering processes. $s$ and $\bar{s}$ are the kink
and antikink, respectively, or vice versa.


We have seen that the sine-Gordon theory has
solitons at the classical level. At the quantum level,
we expect that these kinks become bona fide particle
states, in addition to the particle corresponding to
the quantum fluctuations of the field $\phi$. Focusing on
the solitons, we expect a degenerate doublet
corresponding to the kink and antikink. For the
scattering of two solitons, there are six allowed
processes illustrated in Figure 6. Unitarity [26] leads
to the constraints


[44] 


while crossing symmetry [27] (using the fact that the 
soliton and antisoliton are antiparticles) gives 


$S(i\pi - \theta) = S_T(\theta)$, $S_R(i\pi - \theta) = S_R(\theta)$ [45]

By themselves, these constraints are rather mild;
however, the complete soliton S-matrix must also
satisfy the Yang-Baxter equation [25]. The solution
to all the constraints is not unique; however,
the Zamolodchikovs conjectured that the exact
answer is


5(@) =~sinh (= = o) U(0) 


17T 


Sr(0) = &sinh (= o) U(6) 46 


sinh? @ + isin (<1 
Spi (0) = ; = 
sinh” 6 — isin {| —— 

16 


. > [{k-l-2j 
„sin (= 
ae n(O)R, (im)
(2m i) 
Y pi 
r(Qn ne ae i=) 
T Y 
[ (1 -+ m” + i) 
T q 
T fi


where $\gamma = \beta^2(1 - \beta^2/8\pi)^{-1}$. The reason for
confidence in the conjecture is that from the soliton
S-matrix one can complete the bootstrap program
and account for all the poles in terms of particles in
the theory. In particular, there is a finite set of
bound states of the soliton and antisoliton, called
breathers, with masses


$m_k = 2M\sin\frac{k\gamma}{16}$, $k = 1, 2, \ldots < \frac{8\pi}{\gamma}$ [48]

Here, M is the soliton mass. The bootstrap 
equations give the S-matrix for the scattering of a 
soliton or antisoliton with the kth breather, 


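$S^{(k)}(\theta) = \frac{\sinh\theta + i\cos\frac{k\gamma}{16}}{\sinh\theta - i\cos\frac{k\gamma}{16}} \prod_{j=1}^{k-1} \frac{\sin^2\left(\frac{k-2j}{32}\gamma - \frac{\pi}{4} + i\frac{\theta}{2}\right)}{\sin^2\left(\frac{k-2j}{32}\gamma - \frac{\pi}{4} - i\frac{\theta}{2}\right)}$ [49]

while, for the scattering of breather $k$ with $l$,

$S^{(k,l)}(\theta) = \frac{\sinh\theta + i\sin\left(\frac{k+l}{16}\gamma\right)}{\sinh\theta - i\sin\left(\frac{k+l}{16}\gamma\right)} \, \frac{\sinh\theta + i\sin\left(\frac{k-l}{16}\gamma\right)}{\sinh\theta - i\sin\left(\frac{k-l}{16}\gamma\right)} \prod_{j=1}^{l-1} \frac{\sin^2\left(\frac{k-l-2j}{32}\gamma + i\frac{\theta}{2}\right)\cos^2\left(\frac{k+l-2j}{32}\gamma + i\frac{\theta}{2}\right)}{\sin^2\left(\frac{k-l-2j}{32}\gamma - i\frac{\theta}{2}\right)\cos^2\left(\frac{k+l-2j}{32}\gamma - i\frac{\theta}{2}\right)}$ [50]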


where we assume, without loss of generality, that
$k \ge l$. The remarkable thing is that the scattering of
the lowest-mass breather $m_1$ with itself,


$S^{(1,1)}(\theta) = \frac{\sinh\theta + i\sin\frac{\gamma}{8}}{\sinh\theta - i\sin\frac{\gamma}{8}}$ [51]


is precisely the Toda S-matrix for $A_1$ with $\beta \to i\beta/\sqrt{2}$
(the origin of the factor of $\sqrt{2}$ is mentioned after eqn
[9]). This uniquely identifies the lowest-mass breather
as being the quantum of the $\phi$ field.
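
The breather spectrum [48] and the amplitude [51] can be explored numerically. The sketch below is only an illustration under stated assumptions: it takes $\gamma = \beta^2/(1 - \beta^2/8\pi)$, $m_k = 2M\sin(k\gamma/16)$ for integer $k < 8\pi/\gamma$, and $S^{(1,1)}$ in the form of [51]; the function names are hypothetical. Two checks follow directly: $S^{(1,1)}(\theta)S^{(1,1)}(-\theta) = 1$, and the pole of $S^{(1,1)}$ at $\theta = i\gamma/8$ sits at the fusing angle for which two lightest breathers bind into the second one, $m_2 = 2m_1\cos(\gamma/16)$.

```python
import cmath
import math


def gamma_from_beta(beta):
    """gamma = beta^2 / (1 - beta^2 / (8 pi)); needs beta^2 < 8 pi."""
    return beta ** 2 / (1 - beta ** 2 / (8 * math.pi))


def breather_masses(soliton_mass, gamma):
    """Masses m_k = 2 M sin(k gamma / 16) for k = 1, 2, ... < 8 pi / gamma."""
    masses = []
    k = 1
    while k < 8 * math.pi / gamma:
        masses.append(2 * soliton_mass * math.sin(k * gamma / 16))
        k += 1
    return masses


def s11(theta, gamma):
    """Scattering of the lightest breather with itself, eq. [51]."""
    s = math.sin(gamma / 8)
    return (cmath.sinh(theta) + 1j * s) / (cmath.sinh(theta) - 1j * s)
```

For example, `breather_masses(1.0, 2.0)` returns an increasing list of masses, each below the two-soliton threshold $2M$, because $k\gamma/16 < \pi/2$ for every allowed $k$.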

The quantum structure that we have described
above can be directly related to the classical
scattering of solitons. In order to implement the
classical limit, we can reintroduce $\hbar$, which is
achieved by replacing $\beta^2$ by $\beta^2\hbar$. In this limit, the
S-matrix elements have the form


$S(\theta) = \exp\left\{\frac{2i}{\hbar}\left(\delta(\theta) + O(\hbar)\right)\right\}$ [52]

The phase $\delta(\theta)$ is related via the WKB approximation
to the time delay in the classical theory of
soliton scattering via


$\delta(\theta) = \text{const.} + \int_0^{\theta} d\theta'\, M\sinh(\theta'/2)\,\Delta t(\theta')$ [53]


where $\Delta t(\theta)$ is the time delay in the center of mass
(21). It is possible to verify [53] for the processes
$S(\theta)$ and $S_T(\theta)$. Note that the reflection process has
no classical analogue.


IQFT, Conformal Field Theories and 
Statistical Systems 


We have described some IQFTs and their factoriz- 
able S-matrices in theories with a mass gap. We can 
ask the question, “what happens at very high 
energies compared with all the mass scales?” For a 
generic QFT such a limit may not exist; however,
for a special class of theories the limit is a massless 
scale-invariant theory corresponding to a fixed point 
of the renormalization group. The massive theory 
can be thought of as a deformation of the massless 
theory by a particular relevant operator. At the fixed 
point, the Poincaré symmetry is enhanced to the full 
conformal group in the appropriate number of 
dimensions and the resulting theory is known as a 
conformal field theory (CFT). In 1+ 1 dimensions 
the conformal group is infinite dimensional and so 
many CFTs are themselves integrable, in the sense 
that the complete spectrum of fields is known and 
their correlation functions can be constructed. 
Hence, an alternative way of thinking about many 
IQFTs is as a perturbation of a CFT by a specific
relevant operator:


$S_{\rm IQFT} = S_{\rm CFT} + \lambda\int d^2x\, \mathcal{O}(x)$ [54]


We will suppose that the operator has conformal
dimensions $(\Delta, \Delta)$. This description of the theory
can be turned around to ask the following question:
which relevant deformations of a given CFT lead to
IQFTs? Remarkably, since CFTs are so well understood,
the question can often be answered exactly.
The idea is that the conserved quantities of a CFT
are all (anti-)holomorphic with respect to a holomorphic
coordinate $z = x + it$. Conserved quantities
include the stress tensor of spin 2 but include, in
addition, an infinite tower of currents of ever
increasing spin $\{T_s\}$. After perturbation, one has


$\partial_{\bar{z}} T_s = \lambda R^{(1)}_{s-1} + \lambda^2 R^{(2)}_{s-1} + \cdots$ [55]


The conformal dimensions of the $R^{(n)}$ are $(s - n(1-\Delta),
1 - n(1-\Delta))$. Since the conformal dimensions of
fields in a CFT are bounded below by zero, it follows
that the series on the right-hand side truncates. The
question of whether $T_s$ remains conserved away from
the CFT boils down to the question as to whether the
right-hand side has the form $\partial_z\Theta$ for some $\Theta$.
Zamolodchikov found an ingenious counting argument
which showed in certain circumstances that the
right-hand side has precisely this form for some $s > 2$.
This is sufficient to establish that the perturbed theory
is an IQFT. In certain cases the spectrum of spins of
the conserved quantities that are established by the
counting argument is enough to make a connection
with a known factorizable S-matrix.

This way of viewing IQFTs as perturbations of CFTs
is especially fruitful when we make the connection 
of the Euclidean QFT with the classical statistical 
mechanics of a two-dimensional system. In this 
connection, the Feynman path integral is reinterpreted 
as the sum over the configurations in the canonical 
ensemble with the Euclidean action interpreted as the 
energy. Usually, we consider statistical systems which 
are discrete, so typically defined on a lattice. The 
Euclidean QFTs are to be thought of as these statistical 
systems in the continuum limit where the lattice spacing 
is taken to zero keeping the long-range physics fixed. 
CFTs which have no massive degrees of freedom are 
identified with points of second-order phase transitions 
in the statistical system where correlation lengths are 
infinite. Perturbations of CFTs by relevant operators 
correspond to taking the statistical system away from 
criticality by changing some external parameter. 

The prototypical example of such a statistical 
system is the Ising model. In the lattice version of 
this model, there is a set of spins $\{\sigma_i\}$, one at each lattice
site, which can take the discrete values $\pm 1$. The
partition function of the theory is


$Z(T,H) = \sum_{\{\sigma_i\}} \exp\left(-\frac{1}{T}\sum_{\langle i,j\rangle}\sigma_i\sigma_j - H\sum_i \sigma_i\right)$ [56]


The Ising model is the simplest model of a ferromagnet,
where $T$ is the temperature and $H$ is the
external applied field. The theory has a second-order
phase transition for $T = T_c$, the Curie temperature,
and $H = 0$, where the energy, which favors aligning
the spins, and the entropy, which favors disorder,
exactly balance. In the two-dimensional neighborhood
of the critical point, the lattice
theory admits a continuum limit which can be
described as the perturbation of a CFT, describing
the critical Ising model, by a pair of relevant operators
with couplings $T - T_c$ and $H$. In the case of the Ising
model, the CFT is simply the theory of a free massless
fermion in two-dimensional Euclidean space.

It turns out that in the two-dimensional space
of relevant perturbations, there are two directions
which lead to IQFTs. The most obvious is changing
the temperature away from $T_c$ while keeping $H = 0$.
This leads to a particularly simple IQFT, that of a
free massive fermion. More unexpectedly, the direction
for which $H$ varies away from 0, but $T = T_c$,
also leads to an IQFT. In this case, Zamolodchikov's
counting argument shows that there are higher-spin
conserved charges of spin including


$s = 1, 7, 11, 13, 17, 19, \ldots$ [57]


This is remarkable because, as we have described
previously, there is a minimal solution of the
bootstrap program that describes the scattering of
eight particles which has a spectrum of conserved
charges that includes these spins. It is the minimal
scattering theory related to the algebra $E_8$.

The fact that the scattering theory of the off-critical
Ising model in the magnetic field direction
has been identified is remarkable. From the S-matrix
one can proceed to investigate the off-critical correlation
functions using a technique known as the
“form factor program.” Detailed simulation of the
original lattice model [56] has provided strong
support for the veracity of the $E_8$ scattering theory.
For instance, the two lightest masses in the scattering
theory determine the ratio of the two longest
correlation lengths $m_2/m_1 = 2\cos(\pi/5)$.
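
The quoted mass ratio can be recovered from the statement, made for the minimal S-matrices, that the masses are the components of the Cartan-matrix eigenvector with eigenvalue $2 - 2\cos(\pi/h)$. The sketch below is an illustration only and its names are hypothetical. It assumes the $E_8$ Dynkin diagram is a chain of seven nodes with an eighth node attached to the third node of the chain, and it uses the fact that the lowest eigenvector of $C = 2 - A$ is the Perron-Frobenius eigenvector of the adjacency matrix $A$, which plain power iteration on $A + 1$ finds.

```python
import math

# ASSUMED labelling of the E8 Dynkin diagram: chain 0-1-2-3-4-5-6,
# with node 7 attached to node 2.
E8_EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 7)]


def perron_vector(edges, size, iterations=2000):
    """Power iteration on (adjacency + identity).

    The shift by the identity removes the sign-alternating mode of a
    bipartite graph, so the iteration converges to the positive
    eigenvector belonging to the largest adjacency eigenvalue.
    """
    neighbours = [[] for _ in range(size)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    v = [1.0] * size
    for _ in range(iterations):
        w = [v[i] + sum(v[j] for j in neighbours[i]) for i in range(size)]
        norm = max(w)
        v = [x / norm for x in w]
    return v


def e8_mass_ratios():
    """Masses in units of the lightest one, sorted in increasing order."""
    v = sorted(perron_vector(E8_EDGES, 8))
    return [x / v[0] for x in v]
```

The second entry of `e8_mass_ratios()` is the ratio $m_2/m_1$, to be compared with $2\cos(\pi/5)$.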

In general, the identification of an IQFT and the CFT 
at its ultraviolet limit can be more difficult to establish. 
One way to proceed is to use the thermodynamic Bethe 
ansatz. This technique involves considering the ther- 
modynamics of a gas of the particles in a periodic box. 
Since the scattering is purely elastic, thermodynamic 


quantities can be calculated, albeit in terms of a set of 
coupled nonlinear integral equations. If the box is small 
enough, ultraviolet effects dominate and various 
features of the CFT can be recovered. 


Other IQFTs 


There is a rich menagerie of other IQFTs that we 
have no space to discuss in detail. One class is the sigma
models, whose fields take values in a Riemannian
target space $\mathcal{M}$ with an action


$S = \int d^2x\, g_{ab}(X)\,\partial_\mu X^a \partial^\mu X^b$ [58]


where $g_{ab}\,dX^a dX^b$ is the metric of $\mathcal{M}$. These theories
are integrable at the classical level if the target space 
is either a group manifold of a compact simple 
group G or a symmetric space coset G/H, where H 
is a suitable subgroup of G. The former are known 
as the “principal chiral models.” There are two 
kinds of conserved quantities, both local and 
nonlocal. At the quantum level, the conserved 
currents which imply classical integrability can be 
subject to quantum anomalies. An analysis of these 
anomalies proves that the principal chiral models 
are all integrable at the quantum level, while only 
the subset of symmetric space coset models, namely 


$SO(n+1)/SO(n)$, $SU(n)/SO(n)$,
$SU(2n)/Sp(n)$, $SO(2n)/SO(n)\times SO(n)$, [59]
$Sp(2n)/Sp(n)\times Sp(n)$


are quantum integrable. S-matrices have been proposed 
for all these integrable sigma models. They have a more 
complicated structure than most of the cases discussed 
here, because the particles fall into representations of the 
associated Lie groups and the Yang-Baxter equation,
as for the sine-Gordon solitons, is now nontrivial.
Remarkably, gross features of the S-matrices, such as the
mass spectrum and fusing rules, are identical to the Toda
theories or the minimal S-matrices.

Returning to IQFTs that are associated with
deformations of CFTs, there are more general
classes which are associated with the renormaliza- 
tion group trajectories between two nontrivial fixed 
points. These theories have both massless and 
massive degrees of freedom. Even more remarkable 
are the staircase models of Zamolodchikov that 
exhibit an infinite series of crossover behavior where 
the renormalization group trajectory passes close to 
an infinite series of fixed points in sequence. 

For all of the theories described above, one might 
have thought more generally that integrability is a 
very rigid property of a theory. In general, for 
example, the number of external coupling constants 
is very limited and the mass ratios are all fixed. For 


example, in Toda theories there is only an overall 
mass scale $m$ and the coupling $\beta$. If the form of the
potential is altered in any way then integrability is 
lost. However, in certain circumstances, integrability 
appears to be a looser constraint that allows more 
flexibility. One class of such theories is known as 
the homogeneous sine-Gordon theories. These are 
integrable deformations of gauged WZW models 
associated with the coset $G/U(1)^r$, where $r$ is the
rank of a simple compact group G. In these theories 
there is a rich spectrum of both stable and unstable 
particles with masses and an S-matrix that depends 
continuously on a set of r coupling constants. 


See also: Algebraic Approach to Quantum Field Theory; 
Bethe Ansatz; Constructive Quantum Field Theory; Eight 
Vertex and Hard Hexagon Models; Functional Equations 
and Integrable Systems; Integrable Systems: Overview; 
Quantum Field Theory: A Brief Introduction; Quantum 
Field Theory in Curved Spacetime; Sine-Gordon 
Equation; Symmetries in Quantum Field Theory of Lower 
Spacetime Dimensions; Two-Dimensional Models; Yang- 
Baxter Equations. 
Discrete Dynamical Systems


The expression “dynamical system” usually refers to 
a coupled system of ordinary differential equations 
(ODEs), namely, 


$\dot{x}_j(t) = f_j(x_1,\ldots,x_N;t)$, $j = 1,\ldots,N$ [1]


where $t$ belongs to some set of nonzero measure $I$ of
the real line $\mathbb{R}$, typically an interval $[a,b]$ or a
semiline or the whole line, and $x_j$ are sufficiently
smooth functions from $I$ to $\mathbb{R}$ or to $\mathbb{C}$.

The system [1] is complemented by initial or
boundary conditions that make it into an “initial-value”
or a “boundary-value” problem. Under suitable
regularity assumptions on the RHS, the existence and
uniqueness of the solution of the initial-value problem
is guaranteed, but in most cases the solution can be
known only “approximately” either through perturbation
theory or just through numerical integration. This
is not the proper place to discuss finite-difference
schemes for systems of ODEs: what is relevant is that
such numerical schemes (think, e.g., of Euler or
Runge-Kutta schemes) “discretize” the continuous
independent variable $t$ by replacing it by an integer
variable $n \in \mathbb{Z}$: in the simplest case, the interval $[a,b]$
is replaced by a set of $L$ equally spaced points
$t_n = a + n(b-a)/L$ ($n = 1,\ldots,L$), the first derivative is
approximated by a (forward) difference, and the
system [1] is converted into a system of “difference”
equations of the form


$x_j(n+1) = x_j(n) + h f_j(x_1(n),\ldots,x_N(n);t_n)$, $j = 1,\ldots,N$ [2]


where $h$ denotes the time step $(b-a)/L$.
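
The passage from the ODE system to a difference system can be made concrete with the forward Euler scheme mentioned above. The following is a minimal sketch rather than a recommended integrator; the function names and the convention that the right-hand side is a callable `f(x, t)` returning a list are assumptions made for the illustration.

```python
import math


def euler_map(f, x, n, a, h):
    """One step x(n) -> x(n+1) = x(n) + h * f(x(n), t_n), t_n = a + n*h."""
    t_n = a + n * h
    fx = f(x, t_n)
    return [xj + h * fj for xj, fj in zip(x, fx)]


def iterate(f, x0, a, b, steps):
    """Apply the explicit, first-order map `steps` times on [a, b]."""
    h = (b - a) / steps
    x = list(x0)
    for n in range(steps):
        x = euler_map(f, x, n, a, h)
    return x


def decay(x, t):
    """Right-hand side of the test equation dx/dt = -x."""
    return [-x[0]]
```

For the test equation `decay`, `iterate(decay, [1.0], 0.0, 1.0, 1000)` reproduces $(1 - h)^{1000}$ with $h = 10^{-3}$, which approaches $e^{-1}$ as the number of steps grows. The map is explicit because the updated values depend only on the values at the previous discrete time.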

The coupled system [2] is an example of a “discrete 
dynamical system,” explicit (because the updated 
variables only depend upon the values taken 


60 Integrable Discrete Systems 


at previous discrete times), first order (only “nearest- 
neighbor” discrete times, n, n+ 1 are involved), but 
nonautonomous, as the RHS is allowed to depend 
explicitly upon the independent variable n, analo- 
gously to its continuum counterpart. 

In the following, “autonomous” but not necessa- 
rily explicit discrete dynamical systems of a special 
type will be considered: in fact, we will require them 
to be equipped with a Hamiltonian structure, and 
we will define the notion of complete integrability 
(in the Arnol’d—Liouville sense) for such systems. 

This article emphasizes some aspects and
properties of integrable discrete systems, neglecting
others that could be equally important. In particular,
as no nonautonomous discrete systems will be
considered, discrete analogs of Painlevé equations
will not be discussed in this article, and consequently
the intriguing issues concerning “singularity
confinement” in the discrete setting and “algebraic entropy”
will not be touched upon (see, e.g., Grammaticos
et al. (2004)). Similarly, neither the integrability of
discrete systems in multidimensional space nor
“quantum integrable mappings” will be discussed.


Lagrangian and Hamiltonian 
Formulations 


Following the historical path along which modern 
classical mechanics has been developed, first the 
concept of a Lagrangian map is introduced, and then 
Hamiltonian (in fact, symplectic) maps are defined 
through a proper discrete version of the Legendre 
transformation. 

Let $x_j(n)$ ($j = 1,\ldots,N$, $n \in \mathbb{Z}$) be $N$ sequences of
real numbers and let $L(x,y)$ be a smooth function
from $\mathbb{R}^N \times \mathbb{R}^N$ into the reals, $x$ denoting the $N$-tuple
$x_1,\ldots,x_N$. $L$ is regarded as a “discrete Lagrange
function”: to each discrete time $n$ there is assigned
a certain value $L_n := L(x(n),x(n+1))$.
The corresponding discrete action functional $S[x]$ is
defined in a natural way:


$S = \sum_n L_n$ [3]


The actual “discrete trajectory” will be given by
the sequence $x(n)$ that corresponds to a “critical
point” of the action [3] subject to the constraints
$\delta x(N_i) = \delta x(N_f) = 0$. Note that the values $N_i$ ($N_f$)
may well coincide with $-\infty$ ($+\infty$). Such
“critical points” are given by the solution of the
discrete Euler-Lagrange equations:

$\left.\frac{\partial L}{\partial x_j}\right|_{x = x(n),\ y = x(n+1)} + \left.\frac{\partial L}{\partial y_j}\right|_{x = x(n-1),\ y = x(n)} = 0$ [4]


It is worthwhile to remark on the intrinsic nature of
eqns [4], whose form turns out to be independent of
the choice of a coordinate chart. In fact, by omitting
the explicit dependence on $n$ and simply denoting
$x(n) = x$, $x(n+1) = \tilde{x}$, $x(n-1) = \underset{\sim}{x}$, [4] can be cast
in the form


$\nabla_1 L(x,\tilde{x}) + \nabla_2 L(\underset{\sim}{x},x) = 0$ [5]


which makes its “implicit” nature for the updated
variable $\tilde{x}$ more transparent. Clearly, as a map from
the pair $(\underset{\sim}{x},x)$ to the pair $(x,\tilde{x})$, it is in general a
multivalued map, or a “correspondence”, as it is
called in the literature (Suris 2003, Veselov 1991).
In order that [5] be solvable for $\tilde{x}$, the Hessian
matrix $H_{jk} = \partial^2 L/\partial x_j \partial y_k$ should be nondegenerate.

As will be noted shortly, the Lagrangian map [4]
(or [5]) is in fact a canonical, or better a symplectic,
transformation on a suitably defined cotangent
bundle $T^*X$ to the configuration space $X \subseteq \mathbb{R}^N$.
Namely, one defines the conjugate momentum to $x$ as


$p := \nabla_2 L(\underset{\sim}{x},x)$ [6]

so that [5] can be rewritten as the following system:

$p = -\nabla_1 L(x,\tilde{x})$ [7]
$\tilde{p} = \nabla_2 L(x,\tilde{x})$ [8]


This system defines a correspondence $(x,p) \mapsto (\tilde{x},\tilde{p})$,
which is indeed a “symplectic” one, as it preserves
the standard symplectic form $\omega(x,p) = \sum_j dp_j \wedge dx_j$,
and, of course, the associated Poisson brackets.
The simplest way to recognize this property is by
constructing the generating function of the corresponding
canonical transformation. To this end, let
us introduce


$S(x,\tilde{p}) = -L(x,\tilde{x}) + \sum_{j=1}^{N} \tilde{p}_j(\tilde{x}_j - x_j)$ [9]


The discrete Euler-Lagrange equation then takes the
form


$\tilde{x}_j - x_j = \frac{\partial S}{\partial \tilde{p}_j}$ [10]
$\tilde{p}_j - p_j = -\frac{\partial S}{\partial x_j}$ [11]


which is canonically generated by $S + \sum_j x_j\tilde{p}_j$. A
strict analog of the Hamiltonian formulation for
continuous-time Lagrangian systems does not, in fact,
exist in the discrete-time case. One of the main
consequences, well known to the specialists but
worth emphasizing in the present context, is that
even a symplectic map in one degree of freedom
(two-dimensional $T^*X$) is generically not integrable:
the existence of an invariant function $F(\tilde{x},\tilde{p}) =
F(x,p)$ is not entailed by the symplectic structure,
so that, as discussed later, integrable maps of the
standard type are indeed exceptional. On the other
hand, note that invariant functions do exist whenever
a Lagrangian has some additional symmetry:
this is the case when a Lie group acts on the
configuration space $X$ and the Lagrange function is
invariant under its induced action on $X \times X$, so that
a discrete version of the Noether theorem applies
(Suris 2003).


Complete Integrability 


The definition of a “completely integrable” discrete-time
system is now in order. Let $\Phi$ be a symplectic
map on the $2N$-dimensional phase space
$M := (\mathbb{R}^{2N}, dp \wedge dq)$, equipped with $N$ smooth
invariant functions $F_k$, such that


• $F_1,\ldots,F_N$ are functionally independent, that is,
their gradients $\nabla F_k$ are linearly independent on $M$;
• $F_1,\ldots,F_N$ are in involution:
$\{F_j,F_k\} = 0$, $j,k = 1,\ldots,N$.


Let $\mathcal{T}$ be a connected component of the common
level set


$\{(x,p) \in M : F_k(x,p) = c_k,\ k = 1,\ldots,N\}$


Then $\mathcal{T}$ is diffeomorphic to $\mathbb{T}^l \times \mathbb{R}^{N-l}$, for some $0 \le
l \le N$; if $\mathcal{T}$ is compact, then it is diffeomorphic to an
$N$-dimensional torus $\mathbb{T}^N$.

In the compact case, there exists an open ball $\Omega \subset
\mathbb{R}^N$ such that, in $\mathcal{T} \times \Omega$, there exist new canonical
coordinates $(I_k,\phi_k)$, $k = 1,\ldots,N$; $I_k \in \Omega$, $\phi_k \in \mathcal{T}$, the
so-called action-angle coordinates, enjoying the
following properties:


• the actions $I_k$ depend just on the $F_j$'s;
• in action-angle coordinates the map is a linear
shift on the $N$-dimensional torus:


$\tilde{I}_k := \Phi(I_k) = I_k$
$\tilde{\phi}_k := \Phi(\phi_k) = \phi_k + \nu_k(I_1,I_2,\ldots,I_N)$


Hence, in action-angle variables a completely integrable
map is a canonical transformation from $(I,\phi)$ to
$(\tilde{I}(= I),\tilde{\phi})$, whose generating function $W$ only
depends on the action variables. It takes the form


$\tilde{I}_k - I_k = 0$ [12]
$\tilde{\phi}_k - \phi_k = \frac{\partial W}{\partial I_k} = \nu_k(I)$ [13]
Integrable Maps of the Standard Type


As the simplest integrable models, first consider
some highly nontrivial examples of “standard
maps,” that is, scalar discrete second-order difference
equations of the following type (Suris 2003):

$x_{n+1} - 2x_n + x_{n-1} = G(x_n;h)$ [14]


with $h$ a real parameter, which exhibit an invariant
function, say


$J(x_{n-1},x_n) = J(x_n,x_{n+1})$ [15]


Clearly, [14] can serve as a discretization of the
Newtonian equation:


$\ddot{x} = f(x)$ [16]


if $\lim_{h\to 0} h^{-2}G(x;h)$ exists and is equal to $f(x)$.
All “standard maps” are Lagrangian, being
stationary points of the discrete action:


$S = \sum_{n\in\mathbb{Z}} \left(\frac{1}{2}(x_{n+1} - x_n)^2 - V(x_n;h)\right)$ [17]


with $G(x;h) = -\partial V(x;h)/\partial x$. A point in the phase
space is a pair $x_n$, $p_n = x_n - x_{n-1}$, and [14] is
symplectic for $dp \wedge dx$, reading


$x_{n+1} - x_n = p_{n+1}$ [18]
$p_{n+1} - p_n = G(x_n;h)$ [19]


The corresponding generating function is given by
$S = V(x;h) + (1/2)p_{n+1}^2$. Integrability of [19] means
the existence of a function $F$ on $M$ such that


$F(x_{n+1},p_{n+1}) = F(x_n,p_n)$ [20]


where [15] and [20] are equivalent provided
$J(x,x-y) = F(x,y)$.
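
A deliberately elementary instance shows what an invariant of a standard map looks like in practice. The sketch below is an illustration only and is not one of the integrable families discussed in this section: it takes the linear choice $G(x) = -kx$ (an assumption made for simplicity), for which $J(x_{n-1},x_n) = x_{n-1}^2 + x_n^2 - (2-k)x_{n-1}x_n$ is conserved, as one checks by direct substitution of $x_{n+1} = (2-k)x_n - x_{n-1}$. The function names are hypothetical.

```python
def standard_map_step(x, p, G):
    """One iteration in phase-space form: p' = p + G(x), then x' = x + p'."""
    p_next = p + G(x)
    x_next = x + p_next
    return x_next, p_next


def linear_invariant(x, p, k):
    """Invariant F(x, p) = J(x - p, x) for the linear force G(x) = -k x."""
    y = x - p  # previous position x_{n-1}, since p_n = x_n - x_{n-1}
    return x * x + y * y - (2.0 - k) * x * y


def max_drift(x0, p0, k, steps):
    """Largest deviation of the invariant from its initial value."""
    G = lambda x: -k * x
    x, p = x0, p0
    reference = linear_invariant(x, p, k)
    worst = 0.0
    for _ in range(steps):
        x, p = standard_map_step(x, p, G)
        worst = max(worst, abs(linear_invariant(x, p, k) - reference))
    return worst
```

The Jacobian of one step is $\begin{pmatrix} 1 + G'(x) & 1 \\ G'(x) & 1 \end{pmatrix}$, with determinant 1 for any $G$, so every map of this form preserves $dp \wedge dx$; the existence of a conserved function is the extra, exceptional property.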

Suris has found three families of functions $G$ that
ensure integrability: a rational family, a trigonometric
family, and a hyperbolic family. There is no room here
to display the relevant formulas, nor to explain why,
under natural analyticity assumptions both in $h$ and $x$,
no other integrable family exists. However, it is worth
mentioning that they turn out to be integrable
discretizations of the scalar second-order differential
equations [16] for the following “force” functions $f(x)$:


$f_{\rm rat}(x) = A + Bx + Cx^2 + Dx^3$ [21]

$f_{\rm trig}(x) = A\sin(\omega x) + B\cos(\omega x) + C\sin(2\omega x) + D\cos(2\omega x)$ [22]

$f_{\rm hyp}(x) = A\exp(x) + B\exp(-x) + C\exp(2x) + D\exp(-2x)$ [23]


A curious fact is that those Newton forces that 
one can “discretize” in order to get integrable maps 
are exactly the external forces that one can add to
the internal two-body interactions of the Calogero- 
Moser or Calogero—Sutherland models to preserve 
complete integrability. 


Integrable Discrete Systems and 
the Lax Approach 


Since Lax (1968) introduced it in a seminal paper
for the Korteweg-de Vries (KdV) equation, the
search for a “Lax representation” has played a crucial
role in the construction of integrable systems, both
finite and infinite dimensional. In particular, the
continuous-time dynamical system [1] (assumed to
be autonomous) is said to be equipped with a Lax
representation if there exist two matrices $L$, $M$
whose entries depend upon the coordinates $x_j$,
and hence upon the time $t$, such that the time
evolution [1] can be cast in the form


$\dot{L}(t) = [L(t), M(t)]$ [24]


Hence, the one-parameter family of matrices $L(t)$
undergoes the “isospectral” deformation:


$L(t) = U(t)L(0)(U(t))^{-1}$ [25]


$U(t)$ being the unique solution of the linear matrix
differential equation:


$\dot{U}(t) = -M(t)U(t)$ [26]


with the initial condition $U(0) = I$. Then, the
existence of a Lax representation in terms of, say,
$k \times k$ matrices entails the existence of $k$ integrals of
motion, given, for instance, by the eigenvalues of
$L(t)$, or by the traces $t_j := \mathrm{tr}(L(t))^j$.

Some remarks are in order: 


• In the case of a Hamiltonian system, the matrices $L$,
$M$ depend, of course, on the point in the phase space.
• No guarantee exists, a priori, that the eigenvalues
of $L$, or equivalently the traces $t_j$, be “sufficiently
many” and in involution. Note, however, that in
many examples the Lax matrices $L$, $M$ depend on
an extra scalar parameter $\lambda$ (so that they are
elements of an affine or “loop” Lie algebra),
which might increase the number of integrals of
motion well beyond the dimension of the matrix.


The N-body systems of Calogero type and Toda 
type are celebrated examples of integrable dynami- 
cal systems equipped with a Lax representation. 
How can this description be adapted to the
discrete-time case? The isospectral equation [25]
suggests the proper way. One has to look for two
matrices depending on the coordinates (or on the
phase-space variables) $x$ (again, they can be called $L$,
$M$), such that the discrete-time evolution, modeled,
for instance, by [2], can be cast in the form of a
similarity transformation:


$\tilde{L} = M L M^{-1}$ [27]


where $L = L(x)$, $\tilde{L} = L(\tilde{x})$, and $M = M(x,\tilde{x})$. As
usual, by denoting by $n$ the discrete time (i.e., the
number of iterations), so that $x = x(n)$, $\tilde{x} = x(n+1)$,
eqn [27] implies that a discrete version of [25]
holds:


$L(n) = U(n)L(0)[U(n)]^{-1}$ [28]


where $U(n) := M(n)M(n-1)\cdots M(1)$.
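
The isospectral character of a map of the form [27] can be seen in a small numerical experiment. The sketch below is only an illustration: it uses a classical triangular-factorization step on $2 \times 2$ matrices as a stand-in for a discrete Lax map, which is an assumption of this example and not a system taken from the text. Writing $L = AB$ with $A$ unit lower triangular and $B$ upper triangular, the update $\tilde{L} = BA = A^{-1}LA$ is a similarity transformation with $M = A^{-1}$, so the trace and determinant are conserved. The function names are hypothetical.

```python
def lr_step(L):
    """One similarity step L -> A^{-1} L A = B A for a 2x2 matrix L = A B.

    A = [[1, 0], [l, 1]] is unit lower triangular and
    B = [[a, b], [0, u22]] is upper triangular; requires L[0][0] != 0.
    """
    (a, b), (c, d) = L
    l = c / a
    u22 = d - l * b
    return [[a + b * l, b], [u22 * l, u22]]


def trace(L):
    return L[0][0] + L[1][1]


def det(L):
    return L[0][0] * L[1][1] - L[0][1] * L[1][0]


def iterate_lr(L, steps):
    """Apply the step repeatedly and return the final matrix."""
    for _ in range(steps):
        L = lr_step(L)
    return L
```

Starting from `[[2.0, 1.0], [1.0, 3.0]]`, the iterates change at every step while `trace` and `det` stay fixed up to rounding. As the text notes, such conservation by itself says nothing about completeness or involutivity of the invariants.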

As in the continuous case, the existence of a 
discrete Lax representation entails the existence of 
conserved quantities (invariants of the map or of the 
correspondence) but by itself it does not say 
anything about completeness and involutivity of 
such invariants. There is, however, an approach that 
incorporates the involutivity property in the very 
construction of Lax equations, both discrete and 
continuous, namely the “R-matrix approach.” 
Indeed, from the experimental observation of a 
number of examples, both finite and infinite dimen- 
sional, one can assert that the matrix M taking part 
in the “continuous” Lax representation [24] may be 
presented in the form (Suris 2003) 


$M = R(f(L))$ [29]


In [29], $L$, $M$ are elements of some matrix Lie algebra
g, $R$ is a linear map from g into itself, and $f$ is a
conjugation-covariant function, namely


$f(ALA^{-1}) = Af(L)A^{-1}$ [30]


$A$ being an arbitrary element of the group $G$ with
Lie algebra g.

Polynomials in the variable L with scalar coeffi- 
cients are typical examples of conjugation-covariant 
functions. Moreover, in a matrix Lie algebra, one 
can identify g with its dual space g* through the 
nondegenerate bilinear form provided by the trace:
$\langle L_1, L_2\rangle := \mathrm{tr}(L_1 L_2)$. Then, the trace $F$ of a
conjugation-covariant function $f$ will be a typical example of
a conjugation-invariant function, and, conversely,
the gradient of a conjugation-invariant function $F$,
defined as


$$\langle \nabla F(L), X\rangle = \frac{d}{d\epsilon}F(L + \epsilon X)\Big|_{\epsilon = 0} \qquad [31]$$


will be a typical example of a conjugation-covariant
function. In the above setting, one can define the
following Lie–Poisson bracket on $\mathfrak g$:


$$\{F, G\}(L) := \langle L, [\nabla F(L), \nabla G(L)]\rangle \qquad [32]$$

where $F$, $G$ are arbitrary (i.e., not necessarily
invariant) functions from $\mathfrak g$ into $\mathbb C$, so that the
Hamilton equation


$$\dot L = \{H, L\} \qquad [33]$$


takes the Lax form


$$\dot L = [L, \nabla H] \qquad [34]$$

It is immediate to check that invariant functions of
$L$ are Casimir functions of [32], so that they will not
generate any nontrivial flow.

Assume now that the linear mapping $R$, usually
called $r$-matrix, introduced in [29], is such that it
defines a new Lie bracket on $\mathfrak g$, through the formula


$$[L_1, L_2]_R = \tfrac12\left([L_1, R(L_2)] + [R(L_1), L_2]\right) \qquad [35]$$

and consequently a new Lie–Poisson bracket

$$\{F, G\}_R(L) := \langle L, [\nabla F(L), \nabla G(L)]_R\rangle \qquad [36]$$

Then the following theorem holds:


Let $H$ be an invariant function on $\mathfrak g$. Then:


(i) The Hamilton equations on $\mathfrak g$ generated by $H$ with
respect to the Poisson bracket [36] have the Lax
form


$$\dot L = [L, R(\nabla H)] \qquad [37]$$


(ii) The invariants of $\mathfrak g$, that is, the Casimir functions
of the standard Lie–Poisson bracket [32], are in
involution for [36], so that the corresponding
flows are mutually commuting.


A particular realization of such an $R$ operator, very
important for applications, arises in the so-called
Adler–Kostant–Symes (AKS) construction (Adler
1979, Kostant 1979, Symes 1980), where the Lie
algebra $\mathfrak g$ admits a decomposition into two subalgebras,
$\mathfrak g_+$ and $\mathfrak g_-$, so that, as linear spaces,


$$\mathfrak g = \mathfrak g_+ \oplus \mathfrak g_- \qquad [38]$$


Denoting by $\pi_\pm$ the corresponding projections, the
linear mapping


$$R := \pi_+ - \pi_- \qquad [39]$$


defines a new Lie bracket on $\mathfrak g$, and the corresponding
Lax equations take the two equivalent forms:


$$\dot L = [L, \pi_+(f(L))] = -[L, \pi_-(f(L))] \qquad [40]$$


For the present purposes, it is of paramount
importance that the AKS construction has a
discrete-time version (Suris 2003).

In fact, let $G$ be a Lie group with Lie algebra $\mathfrak g$, and let
$G_+$, $G_-$ be its subgroups having $\mathfrak g_+$, $\mathfrak g_-$ as Lie algebras.
Then, in a certain component of the identity element $I$,
any element $g$ of $G$ is uniquely factorizable as


$$g = \Pi_+(g)\,\Pi_-(g), \qquad \Pi_\pm(g) \in G_\pm \qquad [41]$$


Moreover, let $F: \mathfrak g \to G$ be a conjugation-covariant
function. Consider now the map


$$L \mapsto \tilde L := \Pi_+^{-1}(F(L))\cdot L\cdot\Pi_+(F(L)) = \Pi_-(F(L))\cdot L\cdot\Pi_-^{-1}(F(L)) \qquad [42]$$


and regard it as a difference equation, yielding
$\tilde L = L(n+1)$ in terms of $L = L(n)$. Then, the following
properties hold:


• For whatever function $F$, the map [42] commutes
with any continuous flow [40], mapping solutions
into solutions.

• It can be “explicitly integrated” with respect to
the discrete time $n$, yielding


$$L(n) = \Pi_+^{-1}(F^n(L_0))\cdot L_0\cdot\Pi_+(F^n(L_0)) \qquad [43]$$


or the equivalent expression in terms of the
complementary factor $\Pi_-$.

• It is interpolated by the continuous flow [40] with
time step $h$ if


$$\exp(hf(L)) = F(L) \;\Rightarrow\; f(L) = h^{-1}\log(F(L)) \qquad [44]$$


In other words, the discrete-time systems that one
derives through this approach are just a sequence
of pictures taken at equally spaced times of some
continuous flow pertaining to the hierarchy [40]:
so, by construction they are Poisson maps with an
involutive family of integrals given by the
conjugation-invariant functions of $L$ (typically, $\mathrm{tr}\,L^k$).

• As long as


$$F(L) = I + hf(L) + O(h^2) \qquad [45]$$


the map [42] serves as an integrable exact
discretization of the flow [40], sharing both its
Poisson structure and its constants of the
motion.


An Integrable Discretization of the 
Toda Lattice 


Consider a simple but illuminating example of
the above construction, showing an integrable
discretization of the “open-end Toda lattice,”
which is described (Suris 2003) by the Newtonian
equations of motion:


$$\ddot x_j = \exp(x_{j+1} - x_j) - \exp(x_j - x_{j-1}), \qquad 1 \le j \le N \qquad [46]$$

and can be cast into a Hamiltonian form by setting
$p_j = \dot x_j$, $q_j = x_j$. If, according to H Flaschka (1974),
one introduces the variables


$$b_j = \dot x_j, \qquad a_j = \exp(x_{j+1} - x_j) \qquad [47]$$


eqn [46] takes the form


$$\dot b_j = a_j - a_{j-1}, \qquad \dot a_j = a_j(b_{j+1} - b_j) \qquad [48]$$


and enjoys the Lax representation [24] in terms of
the $N \times N$ matrices:


$$L(a, b) = \sum_{k=1}^{N} a_k E_{k,k+1} + \sum_{k=1}^{N} b_k E_{k,k} + \sum_{k=1}^{N} E_{k+1,k} \qquad [49]$$


$$M(a, b) = -B := -\left(\sum_{k=1}^{N} b_k E_{k,k} + \sum_{k=1}^{N} E_{k+1,k}\right) \qquad [50]$$


In the above formulas, $E_{jk}$ is the matrix having 1 in the
$jk$ position and 0 elsewhere, so that, obviously,
$E_{N,N+1} = E_{N+1,N} = 0$. An inspection of [49] and [50]
shows that $A := L(a,b) - B$ is just the strictly upper triangular part of
$L(a, b)$, while $B$ is its lower triangular part. The pair
$(A, B)$ constitutes the so-called LU decomposition of
$L(a, b)$. One is clearly in the AKS setting, the Lie algebra
$\mathfrak g$ being just the algebra of $N \times N$ matrices, and the Lie
subalgebras $\mathfrak g_\pm$ being the strictly upper and lower
triangular matrices. The tridiagonal matrix $L(a, b)$
belongs to a Poisson submanifold of $\mathfrak g$, invariant under
the flows [40], and a complete family of commuting
integrals of motion is given, for instance, by $I_k = \mathrm{tr}\,L^k$.
Now, the elements of the group $GL_N$, realized as
the group of invertible $N \times N$ matrices, uniquely
factorize into a product of an invertible lower-triangular
matrix times an upper-triangular matrix
with units on the diagonal, and the Lie algebras of
those subgroups are just the aforementioned
subalgebras $\mathfrak g_\pm$. Then, one is naturally tempted to look
for an integrable discretization provided by a
conjugation-covariant function of the type [45],
starting with the simplest possible choice, namely


$$F(L) = I + hL$$

Setting

$$\tilde L(a, b) := L(\tilde a, \tilde b) = \Pi_+^{-1}(I + hL)\cdot L\cdot\Pi_+(I + hL) = \Pi_-(I + hL)\cdot L\cdot\Pi_-^{-1}(I + hL) \qquad [51]$$


it turns out that the matrix equation [51] is
equivalent to the map


$$(a, b) \mapsto (\tilde a, \tilde b)$$


described by the following equations:


$$\tilde b_k = b_k + h\left(\frac{a_k}{\beta_k} - \frac{a_{k-1}}{\beta_{k-1}}\right), \qquad \tilde a_k = a_k\,\frac{\beta_{k+1}}{\beta_k}$$


where $\beta_k$, which are the “field variables” entering
into the LU factorization [51], are explicitly and
uniquely defined by the recurrence relation (amounting
to a finite continued fraction):


$$\beta_k = 1 + hb_k - \frac{h^2 a_{k-1}}{\beta_{k-1}}, \qquad 1 \le k \le N \qquad [52]$$

As $a_0 = 0$, the initial condition is simply $\beta_1 = 1 + hb_1$.
It follows from the general results of the previous
section that [51] is an integrable Poisson map, sharing
with the continuous Toda hierarchy both the Poisson
structure and the integrals of motion. Its initial-value
problem can be uniquely solved in terms of the LU
factorization of the group element $(I + hL_0)^n$, the
initial condition $L_0$ being any matrix pertaining to
the tridiagonal submanifold [49]. According to [44],
the interpolating Hamiltonian flow is provided by the
function $f(L) = h^{-1}\log(I + hL)$. To make contact
with the discussion in the section “Lagrangian and
Hamiltonian formulations,” we observe that, in terms
of the canonical variables $x_j$, $p_j$, the discrete Toda
lattice [51] becomes the following symplectic map:


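$$1 + hp_j = \exp(\tilde x_j - x_j) + h^2\exp(x_j - \tilde x_{j-1}) \qquad [53]$$
$$1 + h\tilde p_j = \exp(\tilde x_j - x_j) + h^2\exp(x_{j+1} - \tilde x_j) \qquad [54]$$


It can evidently be written in the discrete Newtonian
form (with $\underset{\sim}{x}_j := x_j(n-1)$ denoting the previous iterate):


$$\exp(\tilde x_j - x_j) - \exp(x_j - \underset{\sim}{x}_j) = h^2\left[\exp(\underset{\sim}{x}_{j+1} - x_j) - \exp(x_j - \tilde x_{j-1})\right] \qquad [55]$$


whose Lagrangian function is given by


$$\mathcal L = \sum_{k=1}^{N}\phi(\tilde x_k - x_k) - h\sum_{k=1}^{N-1}\exp(x_{k+1} - \tilde x_k) \qquad [56]$$


with


$$\phi(\xi) = h^{-1}\left(\exp(\xi) - 1 - \xi\right) \qquad [57]$$


The variables $\beta_j$ acquire the following extremely
simple expression in the Lagrangian coordinates $x_j$, $\tilde x_j$:


$$\beta_j = \exp(\tilde x_j - x_j)$$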
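
The map [51], written through the recurrence [52], is easy to try out numerically. The following is only a minimal sketch, assuming the Flaschka variables are stored as plain Python lists (`a` holding $a_1,\dots,a_{N-1}$, `b` holding $b_1,\dots,b_N$, with $a_0 = a_N = 0$); the function names `discrete_toda_step`, `lax_matrix`, and `trace_powers` are illustrative choices and do not come from Suris (2003). It performs one step $(a, b) \mapsto (\tilde a, \tilde b)$ and compares the integrals $\mathrm{tr}\,L^k$ before and after the step.

```python
def discrete_toda_step(a, b, h):
    """One step (a, b) -> (a~, b~) of the discrete Toda map via recurrence [52].

    a: [a_1, ..., a_{N-1}], b: [b_1, ..., b_N]; a_0 = a_N = 0 (open-end lattice).
    """
    n = len(b)
    a_ext = [0.0] + list(a) + [0.0]      # a_ext[k] = a_k for k = 0..N
    beta = [0.0] * (n + 1)               # beta[k] = beta_k for k = 1..N
    beta[1] = 1.0 + h * b[0]
    for k in range(2, n + 1):
        beta[k] = 1.0 + h * b[k - 1] - h * h * a_ext[k - 1] / beta[k - 1]

    new_b = []
    for k in range(1, n + 1):
        right = a_ext[k] / beta[k]
        left = a_ext[k - 1] / beta[k - 1] if k > 1 else 0.0   # a_0 = 0
        new_b.append(b[k - 1] + h * (right - left))
    new_a = [a_ext[k] * beta[k + 1] / beta[k] for k in range(1, n)]
    return new_a, new_b


def lax_matrix(a, b):
    """Tridiagonal matrix [49]: b on the diagonal, a above it, ones below it."""
    n = len(b)
    mat = [[0.0] * n for _ in range(n)]
    for k in range(n):
        mat[k][k] = b[k]
    for k in range(n - 1):
        mat[k][k + 1] = a[k]
        mat[k + 1][k] = 1.0
    return mat


def trace_powers(mat):
    """Return [tr L, tr L^2, ..., tr L^N] using plain matrix products."""
    n = len(mat)
    power = [row[:] for row in mat]
    traces = []
    for _ in range(n):
        traces.append(sum(power[i][i] for i in range(n)))
        power = [[sum(power[i][m] * mat[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
    return traces


a0, b0 = [1.0, 0.5], [0.3, -0.2, 0.1]
a1, b1 = discrete_toda_step(a0, b0, h=0.1)
before = trace_powers(lax_matrix(a0, b0))
after = trace_powers(lax_matrix(a1, b1))
```

Comparing `before` and `after` entry by entry illustrates that the step is an isospectral (similarity) transformation of $L$; the sample values of `a0`, `b0`, and `h` are arbitrary and chosen only so that all $\beta_k$ stay away from zero.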


For integrable Hamiltonian systems with long- 
range two-body interaction, such as Calogero- 
Moser type systems, and their so-called relativistic 
version (Ruijsenaars systems), an exact integrable 
discretization has also been found. However, at least 


in the more natural Lax representation, the related 
R-matrix is dynamical (namely, it depends on the 
phase-space coordinates), and the simple factoriza- 
tion scheme holding for the Toda lattice system (and 
for the related ones) is not available. 

Further knowledge on the intriguing subject of 
“discrete integrable systems” can be acquired by 
looking at the monographs and papers listed in the 
“Further Reading” section. In particular, the excellent 
book by Y B Suris, which also provides an exhaustive 
list of references (updated to 2003), is recommended. 


See also: Billiards in Bounded Convex Domains; 
Boundary Value Problems for Integrable Equations; 
Calogero—Moser—Sutherland Systems of Nonrelativistic 
and Relativistic Type; Integrable Systems and Discrete 
Geometry; Integrable Systems and the Inverse 
Scattering Method; Integrable Systems: Overview; 
Painleve Equations; Quantum Calogero—Moser Systems; 
Toda Lattices; Yang–Baxter Equations.


Historical Overview


The relevance of algebraic geometry in the theory of 
dynamical systems has a long history. Three models 
may serve as guiding threads from old to the current 
state of the theory. Each time algebraic geometry is 
used to integrate an evolution equation; this is 
achieved by an underlying addition rule. The very 
origin for this seems to be Fagnano’s addition 
rule for the arc of a lemniscate (see Siegel (1969)). 
In analogy to the addition of two arcs on a circle
$x^2 + y^2 = 1$, or the duplication formula for


$$\arcsin r = \int_0^r \frac{dr}{\sqrt{1 - r^2}},$$


$$\int_0^r \frac{dr}{\sqrt{1 - r^2}} = 2\int_0^u \frac{du}{\sqrt{1 - u^2}}$$


if $r = 2u\sqrt{1 - u^2}$ (a restatement of the trigonometric
identity $r = \sin(2x) = 2\sin x\cos x$), Fagnano found,
and proved, by substitution, a geometric rule for
duplicating the arc of a lemniscate, namely the curve


$$x^4 + 2x^2y^2 + y^4 = x^2 - y^2$$


The length of the arc is now given by


$$s = \int_0^r \frac{dr}{\sqrt{1 - r^4}}$$


and later Gauss designated the limit of integration
by $r = \mathrm{sinlemn}(s)$. Fagnano was able to show that


$$\int_0^r \frac{dr}{\sqrt{1 - r^4}} = 2\int_0^u \frac{du}{\sqrt{1 - u^4}}$$


with the substitution

$$r^2 = \frac{4u^2(1 - u^4)}{(1 + u^4)^2}$$


which is remarkable not only because it doubles the 
length, but also because it does so by rational 
functions, and in fact shows that the arc of the 
lemniscate can be halved by straightedge and compass. 
Gauss showed that the constructible fractions of an arc 
of a lemniscate are the same as the ones for the circle. 
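
Fagnano's duplication rule can be checked numerically. The sketch below is an illustration only: it assumes a small value of $u$ (here $u = 0.3$, so that $r$ stays on the first branch, $0 < r < 1$) and uses a hand-written composite Simpson rule; the helper names `simpson`, `lemniscate_arc`, and `fagnano_double` are hypothetical.

```python
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * step)
    return total * step / 3


def lemniscate_arc(r):
    """Arc length s(r) = integral from 0 to r of dr / sqrt(1 - r^4), 0 <= r < 1."""
    return simpson(lambda t: 1.0 / math.sqrt(1.0 - t ** 4), 0.0, r)


def fagnano_double(u):
    """Non-negative root r of r^2 = 4 u^2 (1 - u^4) / (1 + u^4)^2."""
    return 2.0 * u * math.sqrt(1.0 - u ** 4) / (1.0 + u ** 4)


u = 0.3
r = fagnano_double(u)
# Difference between s(r) and 2 s(u); it should vanish up to quadrature error.
gap = abs(lemniscate_arc(r) - 2.0 * lemniscate_arc(u))
```

The quantity `gap` measures how far the two sides of the duplication identity are apart for the chosen $u$; note that $r$ is obtained from $u$ by a rational expression and one square root, which is the point of the straightedge-and-compass remark.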

Thanks to subsequent work by Euler, and to the 
theory of abelian functions due to Abel, Jacobi, and 
others in the nineteenth century, we now realize that 
Fagnano’s discovery revealed the algebraic group 
structure of the singular quartic curve (or of a 
smooth cubic, if preferred, an elliptic curve). 

This is the key fact that provides the “integration
by quadratures” for the simple pendulum. We
follow McKean and Moll (1997) to sketch this
prototype example of a system which is algebraically
completely integrable (ACI), defined in the section
“Hitchin systems.” Newton’s law gives the equation
of motion $\ddot\theta + \sin\theta = 0$, where $\theta$ parametrizes the
position of the bob in terms of the angle the
pendulum makes with the vertical axis, as it rotates
about its pivot (the length has been normalized so as
to match the gravitational constant). The energy is a
first integral, $I = \cos\theta - \tfrac12\dot\theta^2$, and the substitution


$$\sin(\theta/2) = kx$$


linearizes the motion:


$$\dot x = \sqrt{(1 - x^2)(1 - k^2x^2)}$$


with $k^2 = (1 - I)/2$ between 0 and 1, precisely
because of Fagnano’s and Euler’s addition rule.

The second striking example of an addition rule,
yielding solutions to a nonlinear partial differential
equation (PDE), together with this first will provide
the two themes of this article, and embed into an
infinite-dimensional family of conservation laws that
will accommodate the representation-theoretic
aspect of the symmetries. In their 1895 article,
Korteweg and de Vries (KdV) gave official status to
the (then controversial) representation of solitary
waves in shallow water:


$$u_t - 6uu_x + u_{xxx} = 0$$


(again up to normalization) is by now the well-known
KdV equation, where $u$ represents the amplitude of the
wave and $x$ the direction along a canal. It so happens
that by integrating twice the ordinary differential
equation (ODE) obtained by the one-wave ansatz,
$z = x - ct$ (where $c$ is the constant velocity), one sees
that the solution $u$ and its derivative $u_z = u'$ satisfy
identically an algebraic equation:


$$-cu' - 6uu' + u''' = 0$$
$$(-cu - 3u^2 + u'' + a)\,u' = 0$$
$$\frac{(u')^2}{2} = u^3 + c\,\frac{u^2}{2} - au + b$$


$$u = 2\wp + \text{const.} \quad \text{(up to a linear transformation)}$$


$$(\wp')^2 = 4\wp^3 - g_2\wp - g_3 = 4(\wp - e_1)(\wp - e_2)(\wp - e_3)$$
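
As a concrete instance of the twice-integrated equation, one can take both integration constants equal to zero, $a = b = 0$. The sketch below assumes the standard one-soliton profile $u(z) = -\tfrac{c}{2}\,\mathrm{sech}^2(\tfrac{\sqrt c}{2}z)$ for this normalization of KdV (the profile is not written out in the text above and is introduced here only for illustration) and checks numerically that $(u')^2/2 - u^3 - cu^2/2$ vanishes; the function names are hypothetical.

```python
import math


def soliton(z, c):
    """One-wave profile u(z) = -(c/2) sech^2(sqrt(c) z / 2), with z = x - c t."""
    return -0.5 * c / math.cosh(0.5 * math.sqrt(c) * z) ** 2


def first_integral_residual(z, c, eps=1e-5):
    """(u')^2 / 2 - u^3 - c u^2 / 2 for a = b = 0, with u' by central difference."""
    u = soliton(z, c)
    du = (soliton(z + eps, c) - soliton(z - eps, c)) / (2.0 * eps)
    return 0.5 * du * du - u ** 3 - 0.5 * c * u * u


# Sample the residual at a few points along the wave, for speed c = 4.
residuals = [first_integral_residual(z, 4.0) for z in (-1.5, -0.3, 0.0, 0.7, 2.0)]
```

The derivative is taken by finite differences on purpose, so that the check does not merely restate the algebra; the residuals are therefore zero only up to discretization and rounding error.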


In disguise, then, the PDE and the Hamiltonian
evolutions are the same; the motion becomes linear
(and quasiperiodic) on the torus $\mathbb C/\Lambda$, where $\Lambda$ is the
period lattice of the $\wp$ function. It took considerably
greater effort to generalize this correspondence to 
higher genus. This article is devoted to such a 
correspondence as well as some of the surprising 
connections between complete integrability and 
other areas of mathematics such as: representation 


theory (the corresponding geometric objects are 
Grassmann manifolds as opposed to Jacobians); 
differential algebras (Weyl algebras, commutative 
rings of differential operators, and differential 
Galois theory); and reduction in symplectic 
geometry. 

It is often helpful to highlight the relevant features
in the simplest example, even if it is of a special kind.
The KdV equation and, as Hamiltonian counterpart,
Neumann’s system (see Neumann (1859)) will serve
best. The abelian sum identified by Fagnano cannot
be defined on points of a curve $X$ of genus $g > 1$;
what one can add are points of the $g$-fold symmetric
product $X^{(g)}$ up to linear equivalence, defining (up to
noncanonical isomorphism) an abelian variety, the
Jacobian $\mathrm{Jac}(X) = \mathbb C^g/\Lambda$; analytically, the Jacobian is
described by abelian coordinates $z_1,\dots,z_g$: if
$\alpha_1,\dots,\alpha_g,\beta_1,\dots,\beta_g$ is a basis of 1-cycles on $X$
with standard intersection matrix and $\omega_1,\dots,\omega_g$ is
the dual basis of holomorphic differentials, then
$z_j = \sum_{i=1}^{g}\int_{P_0}^{P_i}\omega_j$ is defined in terms of a fixed base
point $P_0 \in X$ and of $(P_1,\dots,P_g) \in X^{(g)}$ up to the
period lattice $\Lambda$. It is in these coordinates that the
Hamiltonian flows become linear. In canonical
coordinates $q_1,\dots,q_{g+1},p_1,\dots,p_{g+1}$, the harmonic
oscillator


$$\dot q_i = p_i, \qquad \dot p_i = -e_iq_i$$

when constrained to the unit sphere $\sum_i q_i^2 = 1$ has
equations


$$\dot q_i = p_i, \qquad \dot p_i = -e_iq_i + q_i\sum_j\left(e_jq_j^2 - p_j^2\right)$$
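
The constrained equations can be explored numerically. The following is a rough sketch under stated assumptions: a fixed-step classical Runge–Kutta integrator (not a symplectic scheme), sample frequencies `e`, and an initial point on the unit sphere with tangent momentum; all names are illustrative. It monitors the constraint $\sum q_i^2 = 1$, the tangency $\sum q_ip_i = 0$, and the energy $\tfrac12\sum(e_iq_i^2 + p_i^2)$ along the flow.

```python
def neumann_rhs(q, p, e):
    """Right-hand side: dq_i = p_i, dp_i = (-e_i + lam) q_i,
    with lam = sum_j (e_j q_j^2 - p_j^2)."""
    lam = sum(ej * qj * qj - pj * pj for ej, qj, pj in zip(e, q, p))
    return list(p), [(-ei + lam) * qi for ei, qi in zip(e, q)]


def rk4_step(q, p, e, dt):
    """One classical fourth-order Runge-Kutta step of size dt."""
    def move(base, slope, s):
        return [x + s * v for x, v in zip(base, slope)]

    def combine(base, k1, k2, k3, k4):
        return [x + dt / 6.0 * (s1 + 2 * s2 + 2 * s3 + s4)
                for x, s1, s2, s3, s4 in zip(base, k1, k2, k3, k4)]

    k1q, k1p = neumann_rhs(q, p, e)
    k2q, k2p = neumann_rhs(move(q, k1q, dt / 2), move(p, k1p, dt / 2), e)
    k3q, k3p = neumann_rhs(move(q, k2q, dt / 2), move(p, k2p, dt / 2), e)
    k4q, k4p = neumann_rhs(move(q, k3q, dt), move(p, k3p, dt), e)
    return combine(q, k1q, k2q, k3q, k4q), combine(p, k1p, k2p, k3p, k4p)


def energy(q, p, e):
    return 0.5 * sum(ei * qi * qi + pi * pi for ei, qi, pi in zip(e, q, p))


e = [1.0, 2.0, 3.5]                       # sample values of e_1, e_2, e_3 (g = 2)
q, p = [1.0, 0.0, 0.0], [0.0, 0.3, -0.2]  # |q| = 1 and <q, p> = 0 initially
h0 = energy(q, p, e)
for _ in range(2000):
    q, p = rk4_step(q, p, e, 1e-3)
sphere_drift = abs(sum(x * x for x in q) - 1.0)
tangency_drift = abs(sum(x * y for x, y in zip(q, p)))
energy_drift = abs(energy(q, p, e) - h0)
```

The three drift values quantify how well the numerical trajectory stays on the tangent bundle of the sphere; they are small for this step size but are not exactly zero, since the integrator does not enforce the constraint.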


This system is completely integrable in the sense that 
there exist enough involutory invariants, g gener- 
ically (in the (q;, pi) variables) independent functions 
on the 2g-dimensional tangent bundle of the unit 
sphere with canonical symplectic structure; in fact 
the coefficients of the polynomial 


= i g+1 q? g+1 p? 
fà =] [a ei) (St Net! 


= 











are invariant and the hyperelliptic Riemann surface
$X$ whose model in the affine plane is given by
$\mu^2 = f(\lambda)$ is called the spectral curve of the system.
Since the polynomial $f(\lambda)$ is monic of degree
$2g + 1$ and has generically simple roots, $X$ has
genus $g$. A change of variables permits integration
by quadratures,


$$q_i(t) = \frac{\vartheta[\eta_{2i-1}](0)\,\vartheta[\eta_{2i-1}](z_0 - \Delta + 2\sqrt{-1}\,tU)}{\vartheta[0](0)\,\vartheta[0](z_0 - \Delta + 2\sqrt{-1}\,tU)}$$

where $z_0, U \in \mathbb C^g$ are constant vectors, $\vartheta$ denotes the
Riemann theta function of $X$, $\eta_k$ ($k = 1,\dots,2g$) are
theta characteristics and $\Delta$ is the Riemann constant.
While these are technical objects of classical
Riemann function theory whose detailed definition
is best found in a textbook (see, e.g., Mumford
(1984)), the point here is that the motion is
linearized along the line with direction $U$, on the
hyperelliptic Jacobian $\mathrm{Jac}(X)$, which is a $2^{g+1}:1$
cover of the phase space.

A yet deeper fact links the integrable Hamiltonian
motion and the (soliton) PDE, namely the statement
that $\sum_i (e_iq_i^2 + p_i^2) = u(t_1, t_3)$ solves the KdV
equation, where the variables are renamed as
$x = t_1$, $t = t_3$, to denote two of the $g$ commuting
Hamiltonian flows.

The Neumann system also allows us to uncover
another deep relation between dynamics and geometry,
namely the moduli aspect: on the one hand,
Mumford (1984) used the Neumann system to recover
the equation of the spectral curve from a vanishing
property of theta functions with characteristics,
thereby giving the first characterization of the moduli
subvariety of hyperelliptic curves in terms of thetanulls
(for any genus). On the other hand, Françoise (1987)
explored the relevance of the integrable system to the
Picard–Fuchs equations. The fundamental link is
provided by Arnol’d’s theory, according to which a
set of action-angle variables $(q_i, p_i)$, $i = 1,\dots,n$, for a
completely integrable Hamiltonian system can be
calculated in terms of a basis $\gamma_j$ of the first homology
of the fibers, which are $n$-dimensional tori,
$\oint_{\gamma_j} dq_i = \delta_{ij}$; hence, in the case of an algebraically
integrable system such as the Neumann example (or,
in Françoise’s paper, the Kowalevski top), in principle
one can express the (coefficients of the) differential
equations satisfied by the periods in terms of the
commuting Hamiltonians, despite the fact that
periods and Hamiltonians are transcendental functions
of one another. A more general family of
period matrices is subject to the Gauss–Manin
connection, and the question of whether its general
abelian variety is Lagrangian with respect to a
holomorphic symplectic structure on the family yields
a cubic condition on the periods (Donagi and Markman
1996).

These are two major applications of PDEs to 
algebraic geometry: characterizing subvarieties of 
moduli spaces (of curves) and expressing the
Gauss–Manin connection acting on sections of a
Hodge-theoretic bundle over the moduli space in 
terms of the evolution equations of a completely 
integrable system. In the former case, the flows of 
the system act on the theta functions of a (fixed) 
curve; in the latter, the Hamiltonians are related, 
via the action variables, to computing the mono- 
dromy over the branch points of the base of the 
system. The generalization of specific (e.g., hyper- 
elliptic) cases is very difficult to work out and 
remains largely open 40 years after the field of 
integrable equations started being actively 
investigated. 

Before concluding this historical overview, a 
beautiful theory that escaped attention is worth 
mentioning. In the late nineteenth century, for 
example, Baker (1907) constructed the first genus-2 
solutions of the KdV equation, although he was 
apparently not aware of the equation itself; in the 
process, he also defined what is known as the Hirota 
bilinear operator, a device introduced by R Hirota 
in the 1970s to capture an equivalent version of the 
KdV, or the more general Kadomtsev—Petviashvili 
(KP) equation, 


$$(u_t - 6uu_x + u_{xxx})_x = u_{yy}$$


Just as the Lax pair allows for a linearization of the 
isospectral deformations, Hirota’s bilinear form 
reveals the representation-theoretic (and algebro- 
geometric) nature of the equations, via the vanishing 
of a natural pairing on a pair of solutions, besides 
providing a formula for exact solutions; the defini- 
tion of the bilinear operation is the following: for 
functions F and G, 


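$$D_t^n F\cdot G = \left(\frac{\partial}{\partial t} - \frac{\partial}{\partial t'}\right)^n F(t)\,G(t')\Big|_{t' = t}$$

so that Hirota’s direct method gives the following
solution: set $u = 2(\partial^2/\partial x^2)\log F$, then

$$\mathrm{KdV} \iff (D_xD_t + D_x^4)\,F\cdot F = 0$$


$$\mathrm{KP} \iff \partial_x^2\left[\frac{(D_x^4 + 3D_y^2 - 4D_xD_t)\,F\cdot F}{2F^2}\right] = 0$$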

Baker was intent on generalizing the properties of
the Weierstrass $\wp$ function. He focused on genus 2
(and obtained partial results for general genus), in
which case any curve is hyperelliptic,


fs pr = XET + angNE +--+ ao 


and used a suitable basis of holomorphic differentials
particular to the hyperelliptic case, whose
integrals give abelian coordinates $z_j$ that happen to
be dual to the KdV flows,


Da AA aA 
X E ye 





to characterize the genus-2 theta function by
differential equations (equivalent to the KdV hierarchy),
as well as give the quartic equation for the
Kummer surface in $\mathbb P^3$, namely the 2:1 image of the
Jacobian of the curve mapped by the divisor $2\Theta$,
that is, by a basis of the space of theta functions
with second-order characteristics, simply as the
determinant of


$$\begin{pmatrix} -a_0 & \tfrac12 a_1 & 2\wp_{11} & -2\wp_{12} \\ \tfrac12 a_1 & -(a_2 + 4\wp_{11}) & \tfrac12 a_3 + 2\wp_{12} & 2\wp_{22} \\ 2\wp_{11} & \tfrac12 a_3 + 2\wp_{12} & -(a_4 + 4\wp_{22}) & 2 \\ -2\wp_{12} & 2\wp_{22} & 2 & 0 \end{pmatrix}$$

where

$$\wp_{ij}(z) = -\frac{\partial^2}{\partial z_i\,\partial z_j}\log\sigma(z)$$


and the $\sigma$ function, defined in analogy to the genus-1
case, is proportional to the Riemann theta function.

To summarize this introduction, the exchange 
between algebraic geometry (the classification of 
algebraic varieties) and dynamical systems has been 
extremely fruitful in either direction: algebraic 
geometry surprisingly provides exact solutions to 
evolution equations that have special algebraic 
symmetries (and arise in nature!), and conversely 
those very evolutionary equations yield the structure 
of particularly complicated varieties, by characteriz- 
ing their (rational) functions. 


Isospectral Deformations


The isospectral deformations in question have been 
encoded by Lax-pair equations, which take their 
name from Peter D Lax, who gave a version of the 
KdV equation in such form. 

Lax pairs enter in two essentially different ways in
the theory of integrable systems. The evolution
equations take the form $\partial_t L = [B, L]$, where
$t_1, t_2, t_3, \dots$ is a sequence of commuting time flows,
$L$ is an operator whose coefficients depend on time,
and $B$ is another operator of the same kind; since
heuristically this is the infinitesimal version of the
equation $L(t) = U(t)^{-1}L(0)U(t)$ (with $B = -U^{-1}\partial_tU$),
the spectrum of $L$ is preserved and provides
conserved quantities; in fact, Moser (1980) speculated
that every completely integrable system might
have such a form.


In the form that immediately yields a hierarchy of
PDEs, the (hierarchy of) deformations pertain to a
ring $\mathcal P$ of (formal) pseudodifferential operators, where
the variable $x = t_1$ is singled out and $\partial$ denotes
differentiation with respect to $x$:


$$L(t) \in \mathcal D = \left\{\sum_{j=0}^{n} u_j(x)\,\partial^j,\ u_j \text{ analytic near } x = 0\right\}$$


The multiplication rule that makes $\mathcal P$ into a ring (in
fact, a $\mathbb C$-algebra) is composition:


$$\partial\circ u = u\,\partial + u'$$
$$\partial^{-1}\circ u = u\,\partial^{-1} - u'\,\partial^{-2} + u''\,\partial^{-3} - \cdots$$


We normalize $L$ by an automorphism of $\mathcal D$
(generated by a change of variable and conjugation
by a function)


$$L = \partial^n + u_{n-2}(x)\,\partial^{n-2} + \cdots + u_0(x)$$


In $\mathcal P$ any (normalized) $L$ has a unique $n$th root,
$n = \mathrm{ord}\,L$, of the form $\mathcal L = \partial + u_{-1}(x)\,\partial^{-1} +
u_{-2}(x)\,\partial^{-2} + \cdots$. Finally, the deformation equations,


$$\partial_{t_j}L = [(\mathcal L^j)_+, L]$$


where $(\cdot)_+$ denotes the differential-operator part,
define the KP hierarchy, which takes its name from
the first nontrivial deformation equation, known as
the KP equation encountered above, if we set
$x = t_1$, $y = t_2$, $t = t_3$ (notice that this reduces to KdV,
up to rescaling, when the solution is independent
of $y$). The algebro-geometric solutions are those
with the property that only a finite number of time
evolutions are independent. This turned out to be
equivalent to a classical problem of elementary
differential algebra, known as the Burchnall–Chaundy
problem after the two co-authors who
solved it in the 1920s.

The Burchnall–Chaundy problem: which $L(x)$’s
have centralizer $C_{\mathcal D}(L)$ that is larger than a
polynomial ring $\mathbb C[L_1]$, $L_1 \in \mathcal D$? The key to the
solution is the following fact (which clearly does
not hold for operators in more than one variable,
or finite-dimensional operators such as matrices):
if $\mathrm{ord}\,L > 0$ and $A, B \in \mathcal D$ both commute with $L$,
then $[A, B] = 0$; in particular, $C_{\mathcal D}(L)$ is commutative,
hence every maximal-commutative subalgebra
of $\mathcal D$ is a centralizer. It was proved in the early
1900s by I Schur that $C_{\mathcal D}(L) = \{\sum_{j=-\infty}^{N} c_j\mathcal L^j,\ c_j \in \mathbb C\} \cap
\mathcal D$. It follows that centralizers are rings of affine
curves: their transcendence degree over the field of
coefficients is 1, and $\mathrm{Spec}\,C_{\mathcal D}(L)$ can be regarded as
an affine curve $X_0$ (with natural compactification


X by a smooth point at infinity). Burchnall and 
Chaundy proceeded to show that the rings of 
operators whose orders are not all multiples of a 
fixed integer >1, and having the same spectral 
curve X (up to isomorphism), correspond to line 
bundles over X (more precisely, rank-1 torsion- 
free sheaves); thus, the hierarchy of evolutions 
linearizes on Jac X, as indicated by the examples 
treated above. 

In this setting, it has been very challenging to 
generalize the integrable flows, both to the higher- 
rank and to the higher-dimensional case. When all 
the operators in the commutative ring have order 
divisible by an integer r > 1, their common kernel 
defines a rank-r vector bundle over the spectral 
curve, and although the theory in principle is 
similar to the case of line bundles, there are no 
explicit formulas for solution. On the other hand, 
in order that the spectrum be a variety X of 
dimension d > 1 rather than a curve, it is natural to 
seek commutative rings of partial differential
operators in $d$ variables; but again, while some
constructions work in principle, explicit formulas 
are elusive. 

The form in which Lax pairs occur for finite-dimensional
Hamiltonian systems is quite different:
here what is preserved is the spectrum of a
finite-dimensional linear operator, a matrix. The
first examples, from which the theory took off,
were inspired guesses. The Neumann system
described above fits in the following theory:
Moser (1980) showed that the Neumann system
together with other important classical examples
are special cases of rank-2 perturbations (since
$2 = \dim(p, q)$) which preserve the spectrum of a
matrix


$$L = A + a\,q\otimes q + b\,q\otimes p + c\,p\otimes q + d\,p\otimes p$$


where $A$ is a fixed constant matrix which can be
normalized to a diagonal, $\mathrm{diag}(e_1,\dots,e_{g+1})$,


$$\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} \neq 0$$


and $u\otimes v$ denotes the matrix $[u_iv_j]$. The symplectic
structure is the standard $\omega = \sum_i dp_i\wedge dq_i$ so that a
Hamiltonian $H$ defines a flow








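$$\dot q_i = \frac{\partial H}{\partial p_i}, \qquad \dot p_i = -\frac{\partial H}{\partial q_i}$$

and

$$\frac{\partial G}{\partial t} = \sum_i\left(\frac{\partial H}{\partial p_i}\frac{\partial G}{\partial q_i} - \frac{\partial G}{\partial p_i}\frac{\partial H}{\partial q_i}\right)$$

The Hamiltonian flow of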


H -5 (aa q) + (b + c)(Bq, p) + d(Bp, p) 








ad — bc «~ b; — b; 
=a ` - (qib; — ap’) 


— ei— ej 
iff SF 


(where $B = \mathrm{diag}(b_1,\dots,b_{g+1})$ is any fixed diagonal
matrix) is equivalent to the Lax-pair equation


$\dot L = [M, L]$, where $M$ is a suitable matrix:


M => (b—0)[bi6i] + (ad — be) É (qip; — ais) 


ei — ej 





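The Weinstein–Aronszajn formula


$$\det\left(I_n - \sum_{i=1}^{r}\xi_i\otimes\eta_i\right) = \det\left(I_r - \left(\langle\xi_i, \eta_j\rangle\right)_{i,j=1}^{r}\right)$$


(where each of the $\xi_1,\dots,\xi_r,\eta_1,\dots,\eta_r$ is a
$(g + 1 = n)$-vector) gives for the spectral invariants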


K _ det(A — L) 
e(A)  det(\ — A) 
= det(I — ((A— A) ‘q) Q (aq + bp) 
—((A— A) ‘p) ® (cq + dp)) 
= det(I, = Wlq, p)) 
with 
A— A) q, q) ((A — A)” *q,p) 
W, (q, p) = 
adai A-A) q, p) (A-A) `P, p) 


and det (I — W)(q,p)) =1—tr W) + det W) =1 — 
alq, p), defining the rational function ¢). 
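
The Weinstein–Aronszajn formula used above, $\det(I_n - \sum_i \xi_i\otimes\eta_i) = \det(I_r - (\langle\xi_i,\eta_j\rangle))$, reduces an $n\times n$ determinant to an $r\times r$ one (here $r = 2$). A small numerical check, given as an illustrative sketch with arbitrary sample vectors and hypothetical helper names (`det`, `big_side`, `small_side`), might look as follows.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def big_side(xis, etas):
    """det(I_n - sum_i xi_i (x) eta_i), where (u (x) v)[a][b] = u[a] * v[b]."""
    n = len(xis[0])
    mat = [[(1.0 if a == b else 0.0) - sum(x[a] * y[b] for x, y in zip(xis, etas))
            for b in range(n)] for a in range(n)]
    return det(mat)


def small_side(xis, etas):
    """det(I_r - (<xi_i, eta_j>)), an r x r determinant."""
    r = len(xis)
    mat = [[(1.0 if i == j else 0.0) - dot(xis[i], etas[j])
            for j in range(r)] for i in range(r)]
    return det(mat)


xis = [[0.5, -1.0, 2.0, 0.25], [1.5, 0.3, -0.7, 1.0]]
etas = [[0.2, 0.4, -0.1, 0.9], [-0.6, 0.8, 0.3, 0.1]]
difference = abs(big_side(xis, etas) - small_side(xis, etas))
```

In the application to Moser's matrix one would take $\xi_1 = (\lambda - A)^{-1}q$, $\eta_1 = aq + bp$, $\xi_2 = (\lambda - A)^{-1}p$, $\eta_2 = cq + dp$; the sample vectors above are unrelated to any particular system.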

Moser also showed that the system is completely 
integrable and linearizes on the (generalized) Jaco- 
bian of the curve u? = e7(A)d,(x, y). Letting a= — 1, 
b= —c=1,d=0 gives the Neumann system. 

The dilation q> àq gives a Lax pair with 
a parameter, At>A+)*q@qi rX(q@®p—p®4q), 
which makes the spectral curve look more natural. 
Indeed, 


Remark (Adler and van Moerbeke 1980). The 
Neumann flow is equivalent to the Lax pair: 
Lı =[M1, Lı], where Lı = Ap? + uq @ p —p ®q) + 
qg&q and M,j=Au+q®p—p®q. Moreover, the 
Hamiltonians are of Adler-Kostant-Symes (AKS) 
type, namely projections (with respect to an ad- 
invariant inner product) of gradients of orbit-invariant 
functions to half of the splitting of a Lie algebra.
Specifically, {X^ Aju’ |A; € gl(n,C)} =K N, with
K={5) Ai} and N={5D}_ Ajp/}; if the inner 
product is (A, B} = X ;;--1 tr A;B;, the dual of N 
can be identified with K = K+, and the Hamiltonian 
for the Neumann flow can be taken to be 
H=((1/2)(Lip)*, pIg+1) under the Lie-Poisson 
brackets and suitable reduction. The flows linearize 
on the Jacobian of the (hyperelliptic) curve 
det(L; — 7) =0. 


It is possible to recover the link between the finite
and infinite integrable systems (Neumann and KdV)
mentioned in the introductory overview, if we notice
that squared eigenfunctions for the Lax operator
$L = \mathcal L^2 = \partial^2 + u$ become algebraic on the spectral
curve: Dubrovin et al. (2001) introduced the Baker
function, namely the unique function $\psi(x, P)$ with
the following properties:


(i) For |x| sufficiently small it is meromorphic on X \ 
{Pæ}, with pole divisor bounded by 6= P; + ---+ 
P,, independent of x, such that h°(6 — Pœ) =0, 
and near P.w(x,P)e** =1+ O(z) is holo- 
morphic, with z chosen to be \!/* in our case. 

(ii) We let Q be the unique meromorphic differential 
with zeros on 6 and a double pole of the 
form (—A+holomorphic)dz! at P.. Note: 
(1) that Riemann-Roch show that Q is unique. 
(2) We also get a characterization of the dual 
Baker function, defined as wW(x,vP) in the 
hyperelliptic case where v is the involution 
(A, 4) (A, —u), as meromorphic on X \ {Px} 
with poles bounded by 6’ and behavior e**(1 + 
O(z7!)) near Pœ, where 6 + 6’ are the 2g zeros of 2. 
(3) Furthermore, Q=dA/W(w, p), where W is 
the Wronskian (with respect to the variable x). 
Then, upon fixing a meromorphic function h, 
normalized at P,,)=2'/* + entire, with g + 1 
fixed poles distinct from 6, we have: 


If pj = Rese hQ, aj = \/pj¥(x, ej), Pi = VOjO(X, ej), 


1 1 1 
then 2, gq; =1, Si qib; = 9, Uy (ej; +p) = 


u(x) and {qj pi} satisfy the Neumann system. 


Indeed, the constraints follow from the “residue 
theorem” applied to the differential hOwd (it has 
a residue of —1 at Pa); the differential equations 
q;=e;jq;— uq; follow from the assumption 
LUZA 

The function u= —2 D (2 ia ei)qț, evolving 
under suitable abelian flows, is a solution of the 
KdV equation; the “times” of the KdV hierarchy are 
linear combinations of the Neumann Hamiltonians; 
more precisely, of the invariant vector fields deter- 
mined by the tangent directions to the image of X in 


JacX, with Abel map normalized at Pœ, at some 
point P: Dp= 5 -AP Di 
The other way around 

McKean-van Moerbeke), 


(Moser-Trubowitz, 


If L=0* +u(x) is a finite-gap operator and 
€1,---5€g11 are among the 2g+1 edges of the 
gaps, there exist constants p1,...,Pg41 so that 
the functions pj(x)=,/pjw(x, e;) satisfy Di 
p; (x) =1. Since Lyj=ej, the p;j(x) solve the 


Neumann system. 


The squared eigenfunctions also provide a natural 
interpretation for Moser’s Lax pair. If V) is the 
kernel of L — A, then the Baker function w(x, P) and 
its dual $(x,P) give a basis of V) except at the 
branch points (e;,0) where ~=¢. But then the 
normalized basis of V) is related to y, by a 


constant matrix: 
Yo Y 
=C 
l= cfs] 


rela 6 ls 
p 0 —ujle 
if B is the differential operator of the Burchnall- 


Chaundy ring corresponding to multiplication by u, 
so that 


while 


T 
-V U = _ u 0 —1 
w v| = Ma=cl& 8 Jc 


By evaluating at x =0, we find: 


1f ¢ 7 | 
C=— 
wW $ Y x=0 
with W=w¢' — y'¢. Finally, we calculate: 


cfi 0 Jo hb KEM its | 
0 —u w| -2y -=o +o) 
so that U(A) =Y + Yo, V(A) = —2H6, W(A)=2Y o" 
are polynomials like the entries of W, (q, p) -e7(A), 
and the fact that UW + V? does not depend on x 
expresses the fact that W = constant. 

An object that links the two distinct occurrences 
of Lax pairs is Sato’s infinite-dimensional Grass- 
mann manifold. One particular model will serve as 
illustration, with more general settings covered by 
Dickey (2003). Sato defined a one-to-one correspon- 
dence between cyclic D-submodules Z of P, namely 
of the type T = DS (which turns out to be equivalent 
to the property: P=Z @P'")), and subspaces of a 
ring of formal power series, which make up an 
infinite-dimensional Grassmann manifold, more 





precisely elements of Gr’, the “big cell.” This way, 
KP can be viewed as deformation of D modules. 

There are two ways to set up the Grassmannian: 
(1) more direct as a limit of finite-dimensional 
Grassmannians; (2) more intrinsic, using the rings 
DCP, 


1. Let dimV=m+n=N,Gr(m, V) ={m — frames 
in V}/GL(m) = P( A” V) via €,...,€" Yr EO A 
cae a evn) 

If we fix a basis eé,...,e€n-1 of V, and 
write a frame in coordinates, &) = £o, ;eo +: + 
EnN-1,i€N-1, then 


O<lo <i <mn- <N 


Cikao A 4 


gereg 


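A point in the ambient $\mathbb P(\wedge^m V)$ lies in the embedded
$\mathrm{Gr}(m, V)$ if and only if its projective coordinates $\xi_{l_0\dots l_{m-1}}$ ($0 \le
l_i < N$) satisfy the Plücker relations (PRs):


$$\sum_{i=0}^{m}(-1)^i\,\xi_{k_0\dots k_{m-2}\,l_i}\,\xi_{l_0\dots\widehat{l_i}\dots l_m} = 0$$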
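
The Plücker relations can be verified directly on the coordinates of an actual frame. The sketch below is illustrative only: it assumes the frame is stored as an $N\times m$ list of rows (row $l$ holding the coordinates $\xi_{l,i}$ of the $m$ frame vectors along $e_l$), takes $\xi_{l_0\dots l_{m-1}}$ to be the $m\times m$ minor on the listed rows, and uses hypothetical names (`pluecker`, `max_pluecker_violation`).

```python
from itertools import combinations


def det(matrix):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0.0
    for j, entry in enumerate(matrix[0]):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def pluecker(frame, indices):
    """Minor of the N x m coordinate matrix on the given rows, in the given order.
    A repeated index gives two equal rows and hence the value 0."""
    return det([frame[i] for i in indices])


def max_pluecker_violation(frame):
    """Largest |sum_i (-1)^i xi_{k, l_i} xi_{l with l_i removed}| over all
    index sets k (size m - 1) and l (size m + 1)."""
    n, m = len(frame), len(frame[0])
    worst = 0.0
    for ks in combinations(range(n), m - 1):
        for ls in combinations(range(n), m + 1):
            total = 0.0
            for i, li in enumerate(ls):
                rest = ls[:i] + ls[i + 1:]
                total += (-1) ** i * pluecker(frame, ks + (li,)) * pluecker(frame, rest)
            worst = max(worst, abs(total))
    return worst


# A sample 3-frame in a 5-dimensional space (N = 5, m = 3).
frame = [[1.0, 0.0, 2.0],
         [0.0, 1.0, -1.0],
         [3.0, 1.0, 0.0],
         [2.0, -2.0, 1.0],
         [0.5, 4.0, 1.0]]
violation = max_pluecker_violation(frame)
```

For $m = 2$, $N = 4$ the only nontrivial relation is the familiar $\xi_{01}\xi_{23} - \xi_{02}\xi_{13} + \xi_{03}\xi_{12} = 0$; the general loop above simply enumerates all index choices.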


Therefore, 
Gr(m, V) = (Gr(m, V)\{0})/GL(1) 


where 

Gr(m, V) = {(Ey)yca,,, Satisfying the PRs} is a line 
bundle over Gr(m, V), Y is a Young diagram con- 
sisting of rows 


| = (m = 1) 


4-1 


so it is contained in the rectangle Amv. 
For the commutative diagram: 


Gr(m',N’) Pet Gr(m, N) 
| identity 


Gr(m', N’) ane Gr(m, N) 


| identity 


the following facts can be checked. Let 


m<m,n<n',N=m' +n.: 


(i) if (Ey)yca,,,, satisfies PRs, so does its restric- 
tion to Y’s within Amy; 

(ii) if (Ey)ycA,» satisfies PRs, so does (Ey)ycA „y 
where € =0 unless Y C Any. 
These facts make it possible to define: Gr =


$(\widetilde{\mathrm{Gr}} \setminus \{0\})/GL(1)$, where $\widetilde{\mathrm{Gr}} = \{(\xi_Y)_{Y \text{ all Young diagrams}}$ satisfying all PRs$\}$


Gr Paet Gr(m,N) 
Î dense 


Gr” 


| identity 
ene Gr( m, N) 
and 


$$\mathrm{Gr}^{\mathrm{fin}} = \{(\xi) \in \mathrm{Gr} : \xi_Y = 0 \text{ for almost all } Y\} = \bigcup_{m, N} \mathrm{Gr}(m, N)$$


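The KP time deformations are defined as follows:

$$\xi_Y(t) := \sum_{\text{all } Y'} \chi_{Y'/Y}(t)\, \xi_{Y'}, \quad \text{where } \chi_{Y'/Y}(t) = \det\big(p_{l'_i - l_j}(t)\big)$$

$$p_0(t) = 1, \qquad p_n(t) := \sum_{\nu_1 + 2\nu_2 + 3\nu_3 + \cdots = n} \frac{t_1^{\nu_1} t_2^{\nu_2} t_3^{\nu_3} \cdots}{\nu_1!\, \nu_2!\, \nu_3! \cdots}$$

Write $\chi_{Y/\emptyset}$ as $\chi_Y$, where $\chi_Y(t) = \det\big(p_{y_i - i + j}(t)\big)$ are the Schur functions.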
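The polynomials $p_n(t)$ and the Schur functions can be computed exactly for small cases. The sketch below assumes the standard generating function $\exp(\sum_k t_k z^k) = \sum_n p_n(t) z^n$, which yields the recurrence $n\,p_n = \sum_{k=1}^{n} k\,t_k\,p_{n-k}$, and the Jacobi-Trudi determinant $\det(p_{y_i - i + j})$; all function names are ours.

```python
from fractions import Fraction


def elementary_schur(times, n_max):
    """Return [p_0, ..., p_n_max] using n * p_n = sum_{k=1..n} k * t_k * p_{n-k}."""
    t = [Fraction(0)] + [Fraction(v) for v in times]
    t += [Fraction(0)] * max(0, n_max + 1 - len(t))
    p = [Fraction(1)]
    for n in range(1, n_max + 1):
        p.append(sum(k * t[k] * p[n - k] for k in range(1, n + 1)) / n)
    return p


def det(matrix):
    """Determinant by Laplace expansion along the first row (adequate for small sizes)."""
    if not matrix:
        return Fraction(1)
    total = Fraction(0)
    for col, entry in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * det(minor)
    return total


def schur(partition, times):
    """Jacobi-Trudi determinant det(p_{y_i - i + j}) for a partition y_1 >= y_2 >= ..."""
    size = len(partition)
    p = elementary_schur(times, max(partition, default=0) + size)

    def entry(i, j):
        index = partition[i] - i + j
        return p[index] if index >= 0 else Fraction(0)

    return det([[entry(i, j) for j in range(size)] for i in range(size)])


times = (2, 3, 5)  # sample values of t_1, t_2, t_3
print(elementary_schur(times, 3))   # p_0, ..., p_3
print(schur((1, 1), times))         # equals t_1^2 / 2 - t_2
print(schur((2, 1), times))         # equals t_1^3 / 3 - t_3
```

Exact rational arithmetic keeps the comparison with the closed formulas free of rounding.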
To connect with the KP hierarchy, let 


n Aei (x T t) 

Ea(x + t) 
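where $x + t = (x + t_1, t_2, \ldots)$, and $S := 1 + w_1(x, t)\partial^{-1} + \cdots$. Then $L = S \partial S^{-1}$ satisfies the KP hierarchy, namely $\partial_{t_n} S = B_n S - S \partial^n$, where $B_n := (S \partial^n S^{-1})_+$ $\iff$ $[\partial_{t_n} - B_n, \partial_{t_m} - B_m] = 0$ $\iff$ $\partial_{t_n} L = [(L^n)_+, L]$.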


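Note The Plücker coordinate $\xi_\emptyset(t) = \sum_Y \chi_Y(t)\, \xi_Y = \tau(\xi, t)$ is a generating function for the Plücker coordinates, $\xi_Y(t) = \chi_Y(\tilde\partial_t)\, \xi_\emptyset(t)$, where

$$\tilde\partial_t = \left(\frac{\partial}{\partial t_1}, \frac{1}{2}\frac{\partial}{\partial t_2}, \frac{1}{3}\frac{\partial}{\partial t_3}, \ldots\right)$$

Now by reducing to Gr(m, N) and checking that every $\xi_Y(t)$ satisfies PRs, we have a dynamical system on Gr.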


Wy (x,t) := (—1) 


Conclusion (Sato). Although any $f(t) \in \mathbb{C}[[t_1, t_2, \ldots]]$ admits a formal expression of the form $\sum_Y c_Y \chi_Y(t)$, where the coefficients are

$$c_Y = \chi_Y(\tilde\partial_t) f(t)\big|_{t=0}$$

it represents the $\tau$ function for some $\xi \in \mathrm{Gr}$ $\iff$ its coefficients satisfy the following PRs:


< 1 O; O 
S-(-1) Xko..-Rin_1&; (2) Xlo.. b; Ln (- 3) TT T= 0 


i=0 


which is the KP hierarchy in Hirota bilinear form. 
2. Let


P 
V = py = Peonst a l 


` að, aj € c) 


—OO<1<<00 


equipped with the induced filtration V" by order, 
induced by 


Pas l NO apă, ap € c| 
—oo<k<i 


and define 


Gr = {vector subspaces W of V 
s.t. dim(W NV) = dim V/(W + VO) < co} 
“same size” as the reference subspace {>} „<o crer: 
velam, 
The correspondence between such a W and a 
cyclic submodule of P is given as follows: 


T= W = SVO = {v € V : Tv c VO} 
W=>T={AEP:AW c VO} 


Generic points of particular interest in construct- 
ing KP solutions make up the “big cell”: 


Ge c Gre=V = W @ V) 


open dense 
<=> & #0 and a7 function can 


be defined as above 


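In the standard basis of V, $e_i := \partial^{-i-1} \bmod \mathcal{P}x$, $i \in \mathbb{Z}$, the action

$$x e_i = (i + 1) e_{i+1}, \qquad \partial e_i = e_{i-1}$$

gives V a $\mathcal{P}$-module structure. Let $\Lambda$ be the shift operator: $\Lambda e_i = e_{i-1}$; then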


a ee 
so, this linearizes the flows! 


This survey would not be complete without an example of the formula that links the $\tau$ function and the theta function; more general statements and groups of symmetries can be found in Dickey (2003). A solution of the KP hierarchy can be expressed in terms of the $\tau$ function $\tau_W$ associated with an element W of Gr(H), in the model Gr(H), where $H = L^2(S^1)$, $H = H_+ \oplus H_-$ with standard basis $H_+ = \langle 1, z, z^2, \ldots \rangle$, $H_- =$
(z+, z7,...) and p+ the projections, Tw(g) = det (tg 0 
P+ 2 [fg 0 (paly) ), where g=e~’*. The associated 
Baker function ww/(g, z) is a function of the form 


af 
w(g,z) = gz) ( + > ad) 


1=— 00 


with a; € C[[t1, t2,...]] for each 7, such that the map 
z+ vw(g, z) is an element of g'W. If d=1+ 
a _, 4iz', then L= 40¢ is a solution of the KP 


hierarchy. 
1 
re((s~ae)} 


Moreover, 
TW((te))o 


This is the analog of the expression for the Baker function in terms of the theta function, when W corresponds to an element of the Jacobian of the spectral curve $\Gamma$ via the Krichever map


Pe PF Pa, 


(Ux + A(P) — A(D) — A)0(A(D) + A) 
” WAP) — A(D) — A)O(Ux — A(D) — A) 


g ‘dw(g,¢) = 


where $P \in \Gamma$, $A(-)$ is the Abel map, $\Delta$ the Riemann constant, $U \in \mathbb{C}^g$ a suitable vector, D a generic divisor of points $P_1, \ldots, P_g \in \Gamma$, $\eta$ a differential of the second kind, and a a constant depending on the
curve. For the KdV solutions, the condition on W € 
Gr’ is that z2W c W and the solution is 


$$u_W(x, t_2, t_3, \ldots) = 2\partial^2 \log \tau_W(x, t_2, t_3, \ldots)$$


In the Grassmannian formulation, the Hirota 
bilinear operator mentioned in the introductory 
overview makes its third and most general appear- 
ance (we regard Baker’s and Hirota’s definitions as 
the first two — the one based on a residue formula in 
algebraic geometry, the other on the vanishing of a 
differential form): 


Definitions 


(i) In P, it is possible to conjugate any 
L=dO+u4(x)O'+--- into ð by a K=1+ 
v4(x)O-!+---, determined up to elements 
of C/O] =Cp(0): KLK =O. 

(ii) We define a formal Baker function for £ as the 
element of the module M (the free, rank-1 
P-module = space of formal expressions f = e**f 
where f = am f;(x)z’, with generator e**) such 
that LY = zw; so that Y = Ke% for K as in (i). 

(iii) We say that the formal adjoint At of a (formal 
pseudo) differential operator A = ss 5 Uj(x)O! 
is A= Da _ œ (—8Yu;(x), and that the dual 
Baker function yf to w=Ke*4* is the Baker 
function of (L'); the operator which corresponds 
to K in (i) is (Kt) +, that is, (Kt) '2'Kt = — ð. 


Then, the KP hierarchy is equivalent to the 
following formula: 


Res,o(¢',z)" (t,z) = 0 


Moreover, as proved in Dickey (2003), if yı and %2 
are formal power series of the form 
by = Ke” ab. =Je™**,, for K, J € 1+ PP"), satis- 
fying the condition 


Res, (an ay? = any) 6=0 


then there exists an operator $\mathcal{L}$ satisfying the Lax equations, whose wave function and adjoint wave function are $\psi_1$, $\psi_2$, respectively.

To conclude this overview of Lax equations, we point out that they can be viewed as a zero-curvature condition for a (formal) connection (on the trivial bundle over the formal deformation space whose fiber is $\mathcal{P}$), rephrasing the fact that the time flows commute and hence define time deformations; such a formulation can be found in Mulase (1984).


Symplectic Reduction and r Matrices 


While the Lax-pair presentation provides natural spectral invariants, the group/representation-theoretic nature of integrability (sometimes referred to as hidden symmetries) is best seen in the context of Marsden-Weinstein reduction. We perform it in the example of a generalization of Moser's rank-2 perturbation; we extract the basic construction from Adams et al. (1988). A more comprehensive treatment can be found in Babelon et al. (2003).


Definition We let $M_{n,r}$ denote the space of $n \times r$ complex matrices, with $n > r$, and give $M = M_{n,r} \times M_{n,r}$ the symplectic structure $\omega(F, G) = \operatorname{tr}(dF \wedge dG^T)$ for $F, G \in M$. A rank-r perturbation of a fixed $n \times n$ matrix A is $L = A + FG^T$.


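Definition We split the formal loop algebra $\widetilde{\mathfrak{gl}}(r) = \widetilde{\mathfrak{gl}}(r)^+ \oplus \widetilde{\mathfrak{gl}}(r)^-$, where $\widetilde{\mathfrak{gl}}(r)^+$ consists of $r \times r$ matricial polynomials in $\lambda$ and $\widetilde{\mathfrak{gl}}(r)^-$ of strictly negative formal power series. Under the pairing $\langle X(\lambda), Y(\lambda) \rangle = \operatorname{tr}(X(\lambda)Y(\lambda))_{-}$ (where the subscript − means the coefficient of $\lambda^{-1}$), the dual of $\widetilde{\mathfrak{gl}}(r)^+$ is identified with $\widetilde{\mathfrak{gl}}(r)^-$, which therefore admits a Lie-Poisson structure.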


In sketch, we consider an action on M whose moment map lands in $\widetilde{\mathfrak{gl}}(r)^-$; we check that the AKS flows on $\widetilde{\mathfrak{gl}}(r)^-$ correspond to isospectral deformations of $L = A + FG^T$ for flows on $M_A$; finally, we perform a Marsden-Weinstein reduction for an (equivariant) GL(r) action to obtain a completely integrable system on a symplectic leaf, whose flows are linear on the Jacobian of the spectral curve. We recall very briefly the general definitions.
Moment Map


1. A smooth group action of G on a symplectic manifold $(M, \omega)$ is said to be Hamiltonian if there exists a "moment map" $J: M \to \mathfrak{g}^*$ such that the Hamiltonian vector field associated with J and a fixed element $\xi \in \mathfrak{g}$ is the same as the infinitesimal action associated with $\xi$. However, an infinitesimal definition is given because in the formal setup the group of a Lie algebra is often delicate to define. We recall that:

2. The Lie-Poisson structure of $\mathfrak{g}^*$ is defined by

$$\{\phi, \psi\}(\mu) = \langle \mu, [d\phi(\mu), d\psi(\mu)] \rangle \quad \text{for } \phi, \psi \in C^\infty(\mathfrak{g}^*),\ \mu \in \mathfrak{g}^*$$

where $d\phi: \mathfrak{g}^* \to \mathfrak{g}^{**}$ (which in our situation will always be identified with $\mathfrak{g}$) is defined by

$$\langle d\phi(\mu), \nu \rangle = \frac{d}{dt}\, \phi(\mu + t\nu)\Big|_{t=0}, \quad \forall \nu \in \mathfrak{g}^*$$

Now we say that $J: M \to \mathfrak{g}^*$ is a moment map if

3. its linear dual $J^*: \mathfrak{g} \to C^\infty(M)$ is a Lie-algebra homomorphism; or if

4. it is a Poisson map with respect to the Lie-Poisson structure: $\phi, \psi \in C^\infty(\mathfrak{g}^*) \Rightarrow \{\phi \circ J, \psi \circ J\} = \{\phi, \psi\} \circ J$.
In case we do have a Hamiltonian G-action, then the subspace $C^\infty_G(M)$ of G-invariant functions is a Lie subalgebra of $C^\infty(M)$. If G acts freely and properly on M, then M/G is a manifold with a Poisson structure inherited from the one on M through the identification $C^\infty(M/G) = C^\infty_G(M)$. The symplectic leaves of M/G have the form $M_\mu = J^{-1}(\mu)/G_\mu = J^{-1}(O_\mu)/G$, where $\mu \in \mathfrak{g}^*$, $G_\mu$ is the isotropy group of $\mu$ in G and $O_\mu$ is the G-orbit through $\mu$. The reduced manifold $M_\mu$ has a natural symplectic structure $\omega_\mu$, such that $i^*\omega = \pi^*\omega_\mu$, where $i: J^{-1}(\mu) \to M$ is inclusion and $\pi: J^{-1}(\mu) \to M_\mu$ is the natural projection taking points to their $G_\mu$-orbits.


This class of examples can be treated with the technique of a (classical) r-matrix, as follows. Given a linear map $R: \mathfrak{g} \to \mathfrak{g}$, the alternating bilinear form $[X, Y]_R = (1/2)([RX, Y] + [X, RY])$ satisfies the Jacobi identity $\iff$ certain quadratic conditions on R are satisfied. Assuming they are, for all pairs of invariant functions I, J on $\mathfrak{g}^*$, we have $\{I, J\}_R = 0$ (where $\{, \}_R$ is the attendant (Lie-Poisson) structure). Indeed, $\{I, J\}_R(\mu) = \langle [dI(\mu), dJ(\mu)]_R, \mu \rangle = (1/2)\langle [R\,dI(\mu), dJ(\mu)], \mu \rangle + (1/2)\langle [dI(\mu), R\,dJ(\mu)], \mu \rangle$, but, for example, $\langle [R\,dI(\mu), dJ(\mu)], \mu \rangle = \langle R\,dI(\mu), \operatorname{ad}^*_{dJ(\mu)}(\mu) \rangle = 0$.


Remark As is clear from the proof above, our definition of invariant need only be infinitesimal, that is, $f \in I(\mathfrak{g}^*)$ iff $\langle \mu, [df(\mu), X] \rangle = 0$ $\forall \mu \in \mathfrak{g}^*$, $X \in \mathfrak{g}$. Of course, when we have a corresponding Lie group the invariants are the functions which are invariant under the natural action, such as the symmetric functions of the eigenvalues of a matrix.


AKS Flows 


For a splitting $\mathfrak{g} = K \oplus N$, as given above, with $\mathfrak{g}^* = N^* \oplus K^*$, an example of r-matrix is given by $R(X) = X_+ - X_-$ (where +, − denote projection to K, N): the Jacobi identity is straightforward to check. As a consequence, invariants on $\mathfrak{g}^*$ are in involution with respect to $\{,\}_R$, and the flows they generate are called AKS flows, after work done independently by Adler, Kostant, and Symes: $\dot{X} = [df(X)_+, X] = -[df(X)_-, X]$, given here for the special case in which we can identify K with $K^*$ and X is the element in $K^*$ that corresponds to $X \in K$.
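The claim that the Jacobi identity is straightforward can be tested in a small finite-dimensional case. The sketch below is an illustration under our own assumptions, not the loop-algebra setting of the text: we take $\mathfrak{g} = \mathfrak{gl}(3)$, K the upper-triangular matrices (diagonal included) and N the strictly lower-triangular ones, so that both are subalgebras, and verify the identity for $[X, Y]_R$ exactly on random integer matrices.

```python
import random
from fractions import Fraction

N = 3  # matrix size; gl(3) is an arbitrary small choice


def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]


def add(a, b, sign=1):
    return [[a[i][j] + sign * b[i][j] for j in range(N)] for i in range(N)]


def bracket(a, b):
    """Ordinary commutator [a, b]."""
    return add(mul(a, b), mul(b, a), sign=-1)


def r_matrix(a):
    """R(X) = X_+ - X_-: X_+ is the upper-triangular part, X_- the strictly lower part."""
    return [[a[i][j] if j >= i else -a[i][j] for j in range(N)] for i in range(N)]


def r_bracket(a, b):
    """[X, Y]_R = (1/2)([RX, Y] + [X, RY])."""
    total = add(bracket(r_matrix(a), b), bracket(a, r_matrix(b)))
    return [[Fraction(entry) / 2 for entry in row] for row in total]


def jacobiator(a, b, c):
    """Cyclic sum [[a, b]_R, c]_R + [[b, c]_R, a]_R + [[c, a]_R, b]_R."""
    terms = [r_bracket(r_bracket(a, b), c),
             r_bracket(r_bracket(b, c), a),
             r_bracket(r_bracket(c, a), b)]
    return add(add(terms[0], terms[1]), terms[2])


def random_matrix(rng):
    return [[rng.randint(-5, 5) for _ in range(N)] for _ in range(N)]


rng = random.Random(0)
x, y, z = (random_matrix(rng) for _ in range(3))
zero = [[0] * N for _ in range(N)]
print(jacobiator(x, y, z) == zero)  # True
```

The check succeeds because $[X, Y]_R = [X_+, Y_+] - [X_-, Y_-]$ when K and N are subalgebras, so the Jacobi identity for the R-bracket follows from the one in each summand.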

We now proceed to the appropriate moment maps. We generalize the constant matrix A introduced above (isospectral deformations) by allowing multiple eigenvalues $\alpha_i$ of multiplicities $n_i < r$, $n_1 + \cdots + n_k = n$, so that $\det(A - \lambda I) = \prod_{i=1}^{k} (\alpha_i - \lambda)^{n_i}$. Let $a(\lambda) = \prod_{i=1}^{k} (\alpha_i - \lambda)$. We split an $n \times r$ matrix F into k blocks $F_i$ accordingly.


Definition/statement 


iG) iesin m =) anG) is the 
moment map of the action [(g1,...,8n) 
(F, G)];=(Fig;', Gigi), where g;€ GL(r) so 
that under standard identifications J?(F,G)= 
—(G!F,, ein G'F,,) and restricting the action to 
the diagonal subgroup {(g,...,2)}, J (F, G) = 
AGIP, _ 

For X(A) € gl(r)” we define a(X(A)) = (X(a1),.--, 
X(a,)) and obtain the exact sequence 


— 
pi 0 
pi 0 
— 


0 a(A)gl(r)* > gl(r)* > g + 0 
By dualizing, and identifying g? to its dual by 
using the trace componentwise, we get 


k y. 
a*(Y1,..., Yn) a 
j=1 f 


and finally check that J, =a* o J” is a moment 
map. By combining (i) and (ii), we get a 
moment map 








~ Neny T A 
I(F,G) = 3 = GI(A-A) F 


which becomes injective on M/H, where M is 
a suitable open submanifold of M and 
H = GL{(n1) x --- x GL(ng) acts blockwise by 
(Pah Gi: 

We also notice that the "Moser space" $M_A = \{A + FG^T \mid F, G \in M\}$ of rank-r perturbations can be identified with the orbit space $M/G_r$, $G_r = GL(r)$ acting as in (i).


xr 


(iii 


To finish, we turn on the obvious AKS flows on $\widetilde{\mathfrak{gl}}(r)^-$: the key observation is that they are isospectral for the rank-r perturbation $A + FG^T$. We see that the Poisson-commutative ring of projected invariants defines, by composition with the moment map, a Poisson-commutative ring of isospectral flows on $M_{n,r} \times M_{n,r}$.


Hitchin Systems 


The Hitchin system, introduced in the late 1980s, 20 years later still encompasses the most general class of "algebraically completely integrable" systems, which we now discuss. In its most basic form, the concept of an "algebraically completely integrable" (ACI) Hamiltonian system is an extra condition on the integrability of classical mechanics, in the following sense.

A Hamiltonian system with n degrees of freedom, that is, defined on a symplectic manifold M of (real) dimension 2n, is (Arnol'd-Liouville) completely integrable if it admits n functions in involution whose differentials are linearly independent (possibly, generically on M). When M is a component of the set of real points of an algebraic variety $M_{\mathbb{C}}$ and the symplectic form $\omega$ and Hamiltonian function H are rational without poles on M, the concept of algebraic complete integrability can be introduced. For this condition to hold, we require that the vector fields corresponding to the Hamiltonians in involution still have no poles on a compactification of the fibers on $M_{\mathbb{C}}$.


Nonexample (Mumford 1984, 84). Consider

$$M = \mathbb{R}^2, \quad \omega = dx \wedge dy, \quad H = x^4 + y^4$$

Here a compactification of the fiber, the affine curve $x^4 + y^4 = c$, is the projective curve $X^4 + Y^4 = cZ^4$, which is smooth (provided $c \ne 0$) and has four points at infinity. The vector field $X_H$ defined by H, $X_H \lrcorner\, \omega = -dH$, is tangent to the fiber in the affine plane, but has a pole at infinity as can be checked by a change of coordinates; 4 is the lowest exponent for which this simple nonexample works!


Note In the algebraically completely integrable situation, the fibers are abelian varieties or extensions of such by $(\mathbb{C}^*)^k$ for some power k. This gives rise to the issues of variations of periods over the base mentioned in the introductory overview.


The Neumann system is ACI, with integral tori 
given by the Jacobians of the spectral curves: 
2g+1 
T: p =g(d) = | [A-4e) =UW+ yv? 
1 
where











g 

U= [a — ài), (At,..-,Ag) “elliptical spherical 
i=1 

coordinates” 


Y = (1 =) eigenvector: Ly = pw 
p 


g 
divisor: X (A; VAs) 
i=1 

Hitchin (1982) devised a geometrical model of the spectral curve, a compact algebraic curve contained in the surface $T^*\mathbb{P}^1$, and its line bundles. He also provided subsequently (1987) a dramatic generalization.

Hitchin’s construction, in the Neumann-system 
example, highlights the following objects: 


e L€ H(P',End(E) 89 O(g+1)), E rank (r=)2 
bundle over Pt; 

e T=total space of the line bundle O(g + 1) over Pt; 
e ,,=tautological section: P! + T, where L — ñI € 
H°(T, End(E) ® Olg + 1)) (tildes denote pullback); 
T: det (L — fil) =0. The line bundle y% (eigenvec- 
tors) is defined as the kernel of L — ñI; and 

the moduli space of spectral curves is a linear 
system on the surface T. Fixing {e1,...,@g41} in 
the above example gives constraints that define it 
as subsystem of a complete linear system, as well 
as providing a Poisson structure on the whole 
((2g — 1) + g)-dimensional manifold (base = 
curves, fiber = Jacobians) which reduces to the 
standard *` dp; ^ dq;. Equivalent to choosing a 
section s € H®(P', O(g—1) ® Ki), 


T SS) (Cie) 
[r:1 E>SE@Op(gt+IoL 
p’ (A :1) €P! 


Generalizations 


• $\mathbb{P}^1 \to$ Riemann surface X of genus g > 1;
• E stable rank-r vector bundle over X. To give a concrete example, we will take r = 2 and fix $\det E = \mathcal{O}_X$.


Hitchin’s Abelianization Program 


Fact (Hitchin). Every such bundle E over X can be realized as the direct image of a line bundle over a spectral curve $\Gamma \to X$.


We introduce the moduli space $M = SU_X(2, \mathcal{O}_X)$ = S-equivalence classes of E's, E semistable rank-2 bundle over X, $\det E = \mathcal{O}_X$. The dimension of M is 3g − 3.

Hitchin (1987) proved that $T^*M$ is ACI (generically, there exist 3g − 3 regular functions in involution with respect to the standard symplectic structure, with invariant manifolds isomorphic to Prym $\Gamma$, where $\Gamma$ = spectral curve).

To recognize the analog of the features highlighted above, we recall that Kodaira-Spencer deformation theory gives the following description of the cotangent bundle: since a rank-r vector bundle over X is determined by a 1-cocycle with values in $GL(r, \mathcal{O}_X)$, a first-order deformation of E is given by a 1-cocycle with values in the associated bundle of Lie algebras, hence by a class in $H^1(X, \operatorname{End}(E))$, so the cotangent bundle has Serre-dual fiber $H^0(X, E \otimes E^* \otimes K)$.


Hitchin map $(E, \phi) \in T^*M$ (Higgs field, trace zero, $\phi \in H^0(X, \operatorname{End}_0(E) \otimes K)$):


H:¢+ det@ (more generally for any 
tA oS H(X, K 42. cust 

ure —u defines Prym IT, p* = deto € H°(X, K®?) 
defines T. 


Co 2, 


Explicit Hamiltonians for the Hitchin System 


The cases in which X is genus 0 and 1 were solved 
explicitly by Nekrasov (1996) using explicit parame- 
trizations of the moduli spaces; this includes the case of 
insertions (singular curves), yielding (elliptic) Gaudin 
models. We report the solution for the genus-2 case 
(van Geemen and Previato 1996). 


Remark The map H projectivizes,

$$H: \mathbb{P}H^0(X, \operatorname{End}_0(E) \otimes K) \to \mathbb{P}H^0(X, K^{\otimes 2}), \qquad \det(c\phi) = c^2 \det\phi$$


Coordinates on 7*M can be given as follows: 
© C Pic’ !X = canonical theta divisor 
A: M > |20| =P?! 
Ew Dg = {€ € Pic? 'X : h’ (E 8 £) > 0} 


X hyperelliptic = A is 2:1 except for g=2 (every 
point of M is fixed under the hyperelliptic 
involution), where $M \cong \mathbb{P}^3$. For a vector space V
the Euler sequence gives 


PT*PV = I= {(x,b) € PV x PV*:xeEb} 
In our case, 
PV x PV* = |20| x |200| 


Define six polynomial functions H; on P? x P* 
by the requirement: for generic q € P°,(H;=0)N 
PTP? = 4; U E, the six pairs of A to KA 
PT*P, where K is the Kummer surface (the 
nine 16 bitangents are cut out by the tropes.) 

Recall that the Grassmannian of lines in $\mathbb{P}^3$, Gr(2, 4), is defined by an equation $\sum_{i=1}^{6} X_i^2 = 0$ in Klein's coordinates

$$(X_1 : \ldots : X_6) \in \mathbb{P}^5$$
X1 = po + p23, X2 = i(po1 — p23) 
X3 = i(po2 — p13), %4= po + p13 
X5 = po3 + pia, X6 = i(po3 — p12) 


where $p_{ij} = Z_i W_j - W_i Z_j$ are Plücker's coordinates on the line

$$\langle (Z_0 : \ldots : Z_3), (W_0 : \ldots : W_3) \rangle \subset \mathbb{P}^3$$


Using coordinates on the incidence variety I given 
by the sections ®; of the bundle projection PT*P* > 


P*, gi: P? > PT“P? =I c P? x P”,q > (q, alq) = 

(q4, Xilq, —)), explicitly given, for g=(x:y:z:t), by 
=(y:-x:t:-z), & =(y:—x:—t: z) 
st t:—x:-y), «4, =(z%:-t:-x:y) 
y:—-x), € =(t:-z:y: -x) 


= 
= Xela) 


Fact For a point q € P’,pe PTP p Z lq), the 
ith Klein coordinate of the line m is zero and 


p € GULE & Hi(p,q) = 





a 


with xj 


= Xj((e(q), P))- 


Conclusion In an affine patch C? x C* 
(x:y:2:1),(u:vi:w: —(xu+ yv + zw))) 


> (4, p) = 


X; 1 ’ i 
moa- yro 
a MTN 


give six Hitchin Hamiltonians, any three of which are generically independent. The $H_i$ have degree $\le 4$ in x, y, z and are homogeneous of degree 2 in u, v, w; they Poisson-commute with respect to $dx \wedge du + dy \wedge dv + dz \wedge dw$.


An example is constituted by 


— 4)(* — 9) 
—(xu + yv + zw))) 


Example 


po = (X — 1)(%° 
(es vig? Lae niw: 
c A’ x A” 


Hy =uv(—70xy — 32x°y — 18xy? — 10z — 32x7z 
+ 18y*z) +v7(—9 — 307? — 16x*y” — 9y* — 32x" 
— 162?) +. u7(—16 — 40x? — 16x* — 9x57 + 18xyz 
— 92?) + vw(—18x + 10xy* + 10yz — 32x7yz 
— 18y°z — 32x2?) + uw(32y + 10x*y — 10xz 
— 32x7z — 18xy?z + 18 yz) + w?(—9x? — 16y? 
+ 10xyz — 16x27 — 9y*z") 


The concepts of reduction and r-matrix have been generalized to Hitchin systems. Notably, Hitchin later showed that the Hamiltonians of the system appear as symbols of a heat operator that corresponds to a projectively flat connection, the quantization of the moduli space of bundles, obtained by changing the complex structure of the Riemann surface X.


Other Aspects 
Special Functions 


Special functions have also been traditionally significant in both algebraic geometry and integrable systems. Within the examples presented, elliptic functions gave rise to surprisingly sophisticated theories. The 1-wave solution encountered in the introduction, $u = 2\wp + \text{const.}$, in the limit when one or both periods of the Weierstrass function go to zero, becomes exponential or rational, respectively. The higher-genus analogs give rise to solitons, or rational solutions. On the other hand, the KP solutions which are doubly periodic in the x variable ("elliptic solitons") were classified by Krichever (cf. Dubrovin et al. (2001)), as forming an ACI Hamiltonian system ("elliptic Calogero-Moser"), which, 25 years later, is still generating important work, with Hamiltonian

H= yi +50 (4 - 


iF] 


(where $\wp$ is the Weierstrass function of a lattice L with associated elliptic curve X = C/L, $q \in X$ the origin) and $u = 2\sum_{i=1}^{n} \wp(x - x_i(t_2, t_3, \ldots))$ is a solution of the KP hierarchy for suitable time flows $t_j$ of the system ($t_1 = x$) and KP Baker function


W(x; a) = 


The associated spectral curves have been classified in 
moduli by Treibich and Verdier (cf. Treibich 
(2001)); Krichever produced a two-field model as 


well as a universal Poisson structure for the system; 
Donagi and Markman (1996) realized it as a 
generalized Hitchin system. 

More classically, elliptic potentials were the subject of much study, in particular by Lamé and Hermite in the nineteenth century and Ince in the twentieth; a sample result due to Ince makes one feel like Alice in Wonderland, who "knelt down and looked along the passage into the loveliest garden you ever saw": the Lamé operator $L = -\partial^2 + a(a+1)\wp(x - x_0)$ with real, smooth potential is finite gap (namely, almost all the periodic eigenvalues are double) iff $a \in \mathbb{Z}$ (if a is positive the number of gaps is a). A generalization to several variables (due to Chalykh and Veselov),


$$L = -\Delta + \sum_{\alpha \in R_+} g_\alpha\, \wp((\alpha, x))$$

where $R_+$ is the set of positive roots for a simple complex Lie algebra of rank n, (−, −) is some scalar product in $\mathbb{R}^n$, invariant under the action of the Weyl group, and $g_\alpha = m_\alpha(m_\alpha + 1)(\alpha, \alpha)$ for some $m_\alpha \in \mathbb{Z}$, provides one of the few known examples of quantum completely integrable rings of differential operators in several variables. Roughly speaking, this means that the centralizer of L contains n operators with functionally independent symbols, where n is the number of variables.

What is more, Chalykh et al. (2003) combine 
differential Galois theory and elliptic function 
theory to characterize (under some mild assump- 
tions) the generalized Lamé operators that are 
algebraically completely integrable: the differential 
Galois group of the solutions is abelian. 


Duality, Fourier-Mukai Transform, and Bispectrality 


Duality is a concept imported from mathematical 
physics; as a mathematical phenomenon, it has not 
reached theoretical maturity. First observed in exam- 
ples, as in Fock et al. (2000), where different definitions 
of dual ACI Hamiltonian systems were given (action- 
angle, action—action, and quantum), it resurfaced for 
the Hitchin system, in more than one guise, whether it 
be an interchange of position and momentum variables 
(Gawedzki and Tran-Ngoc-Bich 1998) or a duality 
between the Lagrangian tori that fiber two such 
systems, coming from a Fourier-Mukai transform, 
namely a twist by the (universal) Picard line bundle: 


P 


l 
Jac(X) x (H? (X, K) = T*Jac(X)) 

Notably, the Picard bundle was used by Nakayashiki to give a spectacular generalization of the Burchnall-Chaundy result for a genus-2 curve X (more generally, Jac(X) is replaced by a generic abelian variety in the statement): the coordinate ring of $\mathrm{Jac}(X) - \Theta$ is the
common spectrum of a ring of commuting (g! x g!)
matrix partial differential operators in g variables. The Fourier transform allowed him to extend Sato's correspondence $\partial^{-1} \leftrightarrow z$ and give $\mathcal{F}$ a unique (free, rank-g!) $\mathcal{D}_{\mathrm{Jac}(X)}$-module structure, where $\mathcal{F}$ is a suitable coherent sheaf over Jac(X) generalizing the Baker function.

In this model, the interchange of the x and z variables is known as bispectrality (cf. Grünbaum (2001)): a somewhat narrower question is a characterization of the differential operators L in x for which there exists a differential operator B in k and a common eigenfunction:

$$L\psi(x, k) = f(k)\,\psi(x, k)$$
$$B\psi(x, k) = \theta(x)\,\psi(x, k)$$

for some functions $f, \theta$, typically polynomial. This question proved to be related to the KP hierarchy and isomonodromy deformations. When to a hierarchy there is associated an ACI Hamiltonian system (as in the Neumann case shown above), bispectrality may produce a dual system, in a sense related to the ones discussed, but somewhat mysteriously so.
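A minimal worked instance may help fix ideas. The following is a standard example from the bispectral literature and is not taken from this article; the eigenfunction and the two operators are chosen by us for illustration, with $f(k) = k^2$ and $\theta(x) = x^2$. Both eigenvalue equations can be checked by direct differentiation.

```latex
\begin{aligned}
\psi(x,k) &= e^{xk}\left(1 - \frac{1}{xk}\right), \qquad
L = \partial_x^2 - \frac{2}{x^2}, \qquad
B = \partial_k^2 - \frac{2}{k^2}, \\
\partial_x^2 \psi &= e^{xk}\left(k^2 - \frac{k}{x} + \frac{2}{x^2} - \frac{2}{x^3 k}\right), \qquad
\frac{2}{x^2}\,\psi = e^{xk}\left(\frac{2}{x^2} - \frac{2}{x^3 k}\right), \\
L\psi &= e^{xk}\left(k^2 - \frac{k}{x}\right) = k^2\,\psi, \qquad
B\psi = x^2\,\psi \quad \text{(by the symmetry } x \leftrightarrow k\text{)}.
\end{aligned}
```

Since $\psi$ is symmetric under the exchange of x and k, the second equation requires no separate computation.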


Conclusion 


Many important mathematical topics and individual contributions regrettably have to go unmentioned in an article of this length. The aim was to illustrate by the simplest examples the geometric nature of integrable systems and equations, in the areas of spectral curves, moduli of vector bundles over them, Grassmann manifolds, special functions, Poisson geometry, and representation theory, as well as to mention constructions that are not yet complete, such as spectral varieties of higher dimension, dualities sweeping vaster moduli spaces, and quantization.


See also: Billiards in bounded convex domains; $\bar\partial$-Approach to Integrable Systems; Functional Equations and Integrable Systems; Integrable Systems and Discrete Geometry; Integrable Systems and Recursion Operators on Symplectic and Jacobi Manifolds; Integrable Systems and the Inverse Scattering Method; Integrable Systems in Random Matrix Theory; Integrable Systems: Overview; Multi-Hamiltonian Systems; Recursion Operators in Classical Mechanics; Riemann-Hilbert Methods in Integrable Systems; Solitons and Kac-Moody Lie Algebras.
Introduction


Although the main subject of this article is the 
connection between integrable discrete systems and 
geometry, we feel obliged to begin with the 
differential part of the relation. 


Classical Differential Geometry 
and Integrable Systems 


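The oldest (1840) integrable nonlinear partial differential equation recorded in literature is the Lamé system

$$\frac{\partial^2 H_i}{\partial u_j \partial u_k} - \frac{1}{H_j}\frac{\partial H_j}{\partial u_k}\frac{\partial H_i}{\partial u_j} - \frac{1}{H_k}\frac{\partial H_k}{\partial u_j}\frac{\partial H_i}{\partial u_k} = 0, \quad i, j, k \text{ distinct} \quad [1]$$

$$\frac{\partial}{\partial u_k}\left(\frac{1}{H_k}\frac{\partial H_j}{\partial u_k}\right) + \frac{\partial}{\partial u_j}\left(\frac{1}{H_j}\frac{\partial H_k}{\partial u_j}\right) + \frac{1}{H_i^2}\frac{\partial H_j}{\partial u_i}\frac{\partial H_k}{\partial u_i} = 0 \quad [2]$$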

describing orthogonal coordinates in the three-dimensional Euclidean space $E^3$ (indices i, j, k range from 1 to 3). Already in 1869, it was found by Ribaucour that the nonlinear Lamé system possesses a discrete symmetry that makes it possible to construct, in a linear way, new solutions of the system from the old ones. He also gave a geometric interpretation of this symmetry in terms of certain spheres tangent to the coordinate surfaces of the triply orthogonal system. In 1918, Bianchi showed that the result of superposition of the Ribaucour transformations is, in a certain sense, independent of the order of their composition.

Such properties of a nonlinear equation are hallmarks of its integrability, and indeed, the Lamé system was solved using soliton techniques in 1997-98. The above example illustrates the close connection between the modern theory of integrable partial differential equations and the differential geometry of the turn of the nineteenth and twentieth centuries. A remarkable property of certain parametrized submanifolds (and then of the corresponding equations) studied at that time is that they allow for transformations which exhibit the so-called "Bianchi permutability property." Such transformations, called, depending on the context, the Darboux, Calapso, Christoffel, Bianchi, Bäcklund, Laplace, Koenigs, Moutard, Combescure, Lévy, Goursat, Ribaucour, or the fundamental transformation of Jonas, can be geometrically described in terms of certain families of lines called line congruences.

In the connection between integrable systems and differential geometry, a distinguished role is played by the multidimensional conjugate nets, described by the Darboux system, which is just the first part [1] of the Lamé system with indices ranging from 1 to N > 3. On the level of integrable systems, this dominant role has the following explanation: the Darboux system, together with equations describing isoconjugate deformations of the net, forms the multicomponent Kadomtsev-Petviashvili (KP) hierarchy, which is viewed as a master system of equations in soliton theory. In fact, in appropriate variables, the whole multicomponent KP hierarchy can be rewritten as an infinite system of the Darboux equations.


Transition to the Discrete Domain 


The recent progress in studying discrete integrable 
systems showed that, in many respects, they should be 
considered as more fundamental than their differential 
counterparts. Consequently, the natural problem of 
extending the geometric interpretation of integrable 
partial differential equations to the discrete domain 
arose, leading not only to the transition to the discrete 
domain of many results on the connection between the 
differential geometry and integrable systems, but also — 
and this seems to be even more important — to the 
description of integrability in a very elementary and 
purely geometric way. 

At the level of integrable equations, the transition 
“from differential to discrete” often makes formulas 
more complicated and longer. On the contrary, at the 
geometric level, in such a transition the properties of 
discrete submanifolds, relevant to their integrability, 
become simpler and more transparent. Indeed, the 
mathematics necessary to understand the basic ideas of 
the integrable discrete geometry does not exceed the 
“ruler and compass constructions,” and many proofs 
can be performed using elementary incidence geometry. 

We will concentrate our attention on the multi- 
dimensional lattice made from planar quadrilaterals, 
which is the discrete analog of a conjugate net. Together 
with the discussion of its properties, which are the core 
of the geometric integrability, we briefly present the 
analytic methods of construction of these lattices and 
we also describe some basic multidimensional integr- 
able reductions of them. Then we discuss integrable 
discrete surfaces; some of them have been found in the 
early period of the “case-by-case” studies. We shall 
however try to present them, from a unifying perspec- 
tive, as reductions of the quadrilateral lattice (QL). 
Multidimensional Integrable Lattices
The Quadrilateral Lattice 


An N-dimensional lattice $x: \mathbb{Z}^N \to \mathbb{R}^M$ is a lattice made from planar quadrilaterals, or a quadrilateral lattice (QL) in short, if its elementary quadrilaterals $\{x, T_i x, T_j x, T_i T_j x\}$ are planar; that is, iff the following system of discrete Laplace equations is satisfied:

$$\Delta_i \Delta_j x = (T_i A_{ij})\, \Delta_i x + (T_j A_{ji})\, \Delta_j x, \quad i \ne j, \quad i, j = 1, \ldots, N \quad [3]$$

where $A_{ij}: \mathbb{Z}^N \to \mathbb{R}$ are functions of the discrete variable; here $T_i$ is the translation operator in the ith direction, and $\Delta_i = T_i - 1$ is the corresponding difference operator. For simplicity, we work here in the affine setting neglecting projective geometric aspects of the theory.
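The equivalence between planarity and the discrete Laplace equation can be made concrete on a single quadrilateral. The sketch below, with function names and sample data of our own choosing, solves [3] for the fourth vertex $T_i T_j x$ given the other three vertices and the two shifted coefficients, and then tests coplanarity in $\mathbb{R}^3$ with a determinant.

```python
from fractions import Fraction


def fourth_vertex(x, x_i, x_j, shifted_a_ij, shifted_a_ji):
    """Solve the discrete Laplace equation [3] for T_i T_j x.

    shifted_a_ij stands for T_i A_ij and shifted_a_ji for T_j A_ji.
    """
    d_i = [p - q for p, q in zip(x_i, x)]  # Delta_i x
    d_j = [p - q for p, q in zip(x_j, x)]  # Delta_j x
    return [
        p_i + p_j - p + shifted_a_ij * u + shifted_a_ji * v
        for p, p_i, p_j, u, v in zip(x, x_i, x_j, d_i, d_j)
    ]


def det3(u, v, w):
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def is_planar(x, x_i, x_j, x_ij):
    """Four points of R^3 are coplanar iff the three edge vectors from x are dependent."""
    edges = [[p - q for p, q in zip(point, x)] for point in (x_i, x_j, x_ij)]
    return det3(*edges) == 0


x, x_i, x_j = [0, 0, 0], [1, 0, 2], [0, 1, 3]
x_ij = fourth_vertex(x, x_i, x_j, Fraction(1, 2), Fraction(-2))
print(x_ij, is_planar(x, x_i, x_j, x_ij))   # the constructed vertex is coplanar
print(is_planar(x, x_i, x_j, [1, 1, 0]))    # an arbitrary fourth point is not
```

Rearranging [3] gives $T_i T_j x - x = (1 + T_i A_{ij})\Delta_i x + (1 + T_j A_{ji})\Delta_j x$, which is why any choice of the two coefficients produces a planar quadrilateral.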


The geometric integrability scheme In the case N = 2 the definition [3] allows one to uniquely construct, given two discrete curves intersecting in a common vertex and two functions $A_{12}, A_{21}: \mathbb{Z}^2 \to \mathbb{R}$, a quadrilateral surface. For N > 2 the planarity constraints [3] are instead compatible if and only if the geometric data $A_{ij}$ satisfy the nonlinear system

$$\Delta_k A_{ij} + (T_k A_{ij}) A_{ik} = (T_j A_{jk}) A_{ij} + (T_k A_{kj}) A_{ik}, \quad i, j, k \text{ distinct} \quad [4]$$


This constraint has a very simple interpretation: in 
building the elementary cube (see Figure 1), the 
seven points x, T;x, T;x, Tx, Tj;Tjx,T;T,x, and 
Tj Tx (i,j,k are distinct) determine the eighth point 
T;T; Tx as the unique intersection of three planes in 
the three-dimensional space. 
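
The cube construction described above can be tried numerically. The following is a minimal sketch in $\mathbb{R}^3$, assuming NumPy is available; the helper names (`plane`, `point_in_plane`, `eighth_vertex`, `is_planar`) and all numerical data are illustrative choices, not part of the original text.

```python
import numpy as np

def plane(p, q, r):
    """Return (normal, offset) of the plane through p, q, r in R^3."""
    n = np.cross(q - p, r - p)
    return n, n @ p

def point_in_plane(x, xa, xb, alpha, beta):
    """A fourth vertex in the plane of x, xa, xb (alpha, beta are free data)."""
    return x + alpha * (xa - x) + beta * (xb - x)

def eighth_vertex(xi, xj, xk, xij, xik, xjk):
    """Intersect the three planes that must contain T_i T_j T_k x."""
    planes = [plane(xi, xij, xik), plane(xj, xij, xjk), plane(xk, xik, xjk)]
    normals = np.array([n for n, _ in planes])
    offsets = np.array([d for _, d in planes])
    return np.linalg.solve(normals, offsets)

def is_planar(p, q, r, s, tol=1e-9):
    """True when the four points are coplanar (vanishing triple product)."""
    return abs(np.linalg.det(np.array([q - p, r - p, s - p]))) < tol

# Seven initial vertices: x, three neighbors, three planar "initial" faces.
x = np.array([0.0, 0.0, 0.0])
xi = np.array([1.0, 0.1, 0.0])
xj = np.array([0.0, 1.0, 0.2])
xk = np.array([0.1, 0.0, 1.0])
xij = point_in_plane(x, xi, xj, 1.1, 0.9)
xik = point_in_plane(x, xi, xk, 0.8, 1.2)
xjk = point_in_plane(x, xj, xk, 1.0, 1.3)

# The eighth vertex is forced by planarity of the three remaining faces.
xijk = eighth_vertex(xi, xj, xk, xij, xik, xjk)
print(xijk)
```

In $\mathbb{R}^M$ with M > 3 the same idea should apply inside the three-dimensional affine space spanned by the seven initial points; the sketch above does not cover degenerate (parallel) configurations.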

The connection of this elementary geometric point 
of view with the classical theory of integrable 
systems is transparent: the planarity constraint 
corresponds to the set of linear spectral problems 
[3] and the resulting QL is characterized by the 
nonlinear equations [4], arising as the compatibility 
conditions for such spectral problems. Since the QL 
equations [4] are a master system in the theory of 
integrable equations, planarity can be viewed as the 
elementary geometric root of integrability. The idea that integrability be associated with the consistency of a geometric (and/or algebraic) property when increasing the dimensionality of the system is recurrent in the theory of integrable systems.

Figure 1 The geometric integrability scheme.


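Other forms of the Darboux system The $j \leftrightarrow k$ symmetry of the RHS of eqns [4] implies the existence of the potentials $H_i : \mathbb{Z}^N \to \mathbb{R}$ (the Lamé coefficients) such that

$$A_{ij} = \frac{\Delta_j H_i}{H_i}, \qquad i \neq j \qquad [5]$$

and then eqns [4] take the form

$$\Delta_j \Delta_k H_i - \left(T_j \frac{\Delta_k H_j}{H_j}\right)\Delta_j H_i - \left(T_k \frac{\Delta_j H_k}{H_k}\right)\Delta_k H_i = 0, \qquad i,j,k \ \text{distinct} \qquad [6]$$

which is the discrete version of the first part [1] of the Lamé system.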
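
The passage from eqns [4] to eqns [6] can be checked by direct substitution. The short computation below is a sketch that assumes eqn [4] in the form $\Delta_k A_{ij} + (T_k A_{ij})A_{ik} = (T_j A_{jk})A_{ij} + (T_k A_{kj})A_{ik}$ and eqn [5] in the form $A_{ij} = \Delta_j H_i / H_i$; it uses only $T_k = 1 + \Delta_k$.

```latex
% Left-hand side of [4], using 1 + A_{ik} = (T_k H_i)/H_i:
\begin{align*}
\Delta_k A_{ij} + (T_k A_{ij})\,A_{ik}
  &= (T_k A_{ij})\,(1 + A_{ik}) - A_{ij} \\
  &= \frac{T_k \Delta_j H_i}{T_k H_i}\cdot\frac{T_k H_i}{H_i}
     - \frac{\Delta_j H_i}{H_i}
   = \frac{\Delta_j \Delta_k H_i}{H_i}.
\end{align*}
% Right-hand side of [4]:
\begin{align*}
(T_j A_{jk})\,A_{ij} + (T_k A_{kj})\,A_{ik}
  &= \left(T_j \frac{\Delta_k H_j}{H_j}\right)\frac{\Delta_j H_i}{H_i}
   + \left(T_k \frac{\Delta_j H_k}{H_k}\right)\frac{\Delta_k H_i}{H_i}.
\end{align*}
% Multiplying both sides by H_i and collecting terms gives eqn [6].
```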

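The Lamé coefficients allow one to define the suitably normalized tangent vectors $X_i : \mathbb{Z}^N \to \mathbb{R}^M$ by equations

$$\Delta_i x = (T_i H_i)\,X_i \qquad [7]$$

and the functions $Q_{ij} : \mathbb{Z}^N \to \mathbb{R}$, $i \neq j$ (the rotation coefficients), by equations

$$\Delta_i H_j = (T_i H_i)\,Q_{ij}, \qquad i \neq j \qquad [8]$$

Then eqns [3] and [6] can be rewritten in the first-order form

$$\Delta_j X_i = (T_j Q_{ij})\,X_j, \qquad i \neq j \qquad [9]$$

$$\Delta_k Q_{ij} = (T_k Q_{ik})\,Q_{kj}, \qquad i,j,k \ \text{distinct} \qquad [10]$$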

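The discrete Darboux system [10] implies the existence of other potentials $\rho_i$ defined by the compatible equations

$$\frac{T_j \rho_i}{\rho_i} = 1 - (T_i Q_{ji})(T_j Q_{ij}), \qquad i \neq j \qquad [11]$$

The $i \leftrightarrow j$ symmetry of the RHS of eqns [11] implies the existence of yet another potential $\tau : \mathbb{Z}^N \to \mathbb{R}$,

$$\rho_i = \frac{T_i \tau}{\tau} \qquad [12]$$

which is called the τ-function of the QL. In terms of the τ-function, and of the functions

$$\tau_{ij} = \tau\,Q_{ij}, \qquad i \neq j \qquad [13]$$

whose geometric interpretation will be given in a later section, the discrete Darboux equations take the following Hirota-type form:

$$(T_i T_j \tau)\,\tau = (T_i \tau)(T_j \tau) - (T_i \tau_{ji})(T_j \tau_{ij}), \qquad i \neq j \qquad [14]$$

$$(T_k \tau_{ij})\,\tau = (T_k \tau)\,\tau_{ij} + (T_k \tau_{ik})\,\tau_{kj}, \qquad i,j,k \ \text{distinct} \qquad [15]$$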
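
The Hirota-type form follows from the earlier equations by substitution. The sketch below assumes eqn [10] as $\Delta_k Q_{ij} = (T_k Q_{ik})Q_{kj}$, eqn [11] as $T_j\rho_i/\rho_i = 1 - (T_iQ_{ji})(T_jQ_{ij})$, and the definitions $\rho_i = T_i\tau/\tau$, $\tau_{ij} = \tau Q_{ij}$.

```latex
% From [11] with \rho_i = T_i\tau/\tau and Q_{ij} = \tau_{ij}/\tau:
\begin{align*}
\frac{T_j\rho_i}{\rho_i}
  = \frac{(T_iT_j\tau)\,\tau}{(T_i\tau)(T_j\tau)}
  = 1 - \frac{(T_i\tau_{ji})(T_j\tau_{ij})}{(T_i\tau)(T_j\tau)}
  \;\Longrightarrow\;
  (T_iT_j\tau)\,\tau = (T_i\tau)(T_j\tau) - (T_i\tau_{ji})(T_j\tau_{ij}).
\end{align*}
% From [10] with Q_{ij} = \tau_{ij}/\tau:
\begin{align*}
\frac{T_k\tau_{ij}}{T_k\tau} - \frac{\tau_{ij}}{\tau}
  = \frac{T_k\tau_{ik}}{T_k\tau}\cdot\frac{\tau_{kj}}{\tau}
  \;\Longrightarrow\;
  (T_k\tau_{ij})\,\tau = (T_k\tau)\,\tau_{ij} + (T_k\tau_{ik})\,\tau_{kj}.
\end{align*}
```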


Analytic Methods


We will show how one can construct large classes of solutions of the discrete Darboux equations and the corresponding QLs using two basic analytical methods of the soliton theory: the $\bar\partial$-dressing method and the algebro-geometric techniques.


The $\bar\partial$-dressing method Consider the nonlocal $\bar\partial$-problem

$$\bar\partial \chi(z) + (R\chi)(z) = \bar\partial \nu(z), \qquad \lim_{|z| \to \infty} \bigl(\chi(z) - \nu(z)\bigr) = 0 \qquad [16]$$

where $\bar\partial = \partial/\partial\bar z$, R is the integral operator

$$(R\chi)(z) = \int_{\mathbb{C}} R(z, z')\,\chi(z')\, dz' \wedge d\bar z'$$

and $\nu(z)$ is a given rational function of z.

Let $Q_i^\pm \in \mathbb{C}$, i = 1,...,N, be pairs of distinct points of the complex plane, which define the dependence of the kernel R on the discrete variable $n \in \mathbb{Z}^N$:


N LTA” 


i=1 i 





We consider only kernels $R_0(z, z')$ such that the nonlocal $\bar\partial$-problem is uniquely solvable. If $\chi(z; n)$ is the unique solution with the canonical normalization $\nu = 1$, then the function


vem) = xterm] < - = 


i=1 





satisfies the system of the Laplace equations [3] with 
the Lamé coefficients given by 


H,(n) = lim (C Qi ) we ”)) 


2707 \\z — OF 


By construction, the system of such Laplace equations is compatible, therefore the Lamé coefficients satisfy eqns [6]. To various n-independent measures $d\mu_a$ on $\mathbb{C}$ there correspond coordinates

$$x^a(n) = \int_{\mathbb{C}} \psi(z; n)\, d\mu_a(z)$$

of a QL x, having $H_i(n)$ as the Lamé coefficients. To have real lattices, the kernel $R_0$, the points $Q_i^\pm$, and the measures $d\mu_a$ should satisfy certain additional conditions.

One can find a similar interpretation of the normalized tangent vectors $X_i$ and of the rotation coefficients $Q_{ij}$. If $\chi_i(z; n)$ are the unique solutions of the nonlocal $\bar\partial$-problem [16] with the normalizations


non) = oe x Oe Q; 
i; )= ( z— OF Ig gt)" 


then the functions $\psi_i(z; n)$, defined by


N =A 7 
Z= O; 
pilz;n) = ( Xi(z; n) 
Ueo; 
satisfy the direct analog of the linear problem [9],

$$\Delta_j \psi_i(z; n) = \bigl(T_j Q_{ij}(n)\bigr)\,\psi_j(z; n), \qquad i \neq j \qquad [17]$$

Again, by construction, eqns [17] are compatible and the functions $Q_{ij}(n)$ satisfy the discrete Darboux equations [10]. The functions

$$X_i^a(n) = \int_{\mathbb{C}} \psi_i(z; n)\, d\mu_a(z)$$

are coordinates of the normalized tangent vectors $X_i$ of the QL x constructed above.








The algebro-geometric techniques Given a compact Riemann surface $\mathcal{R}$ of genus g, consider a nonspecial divisor $D = \sum_{\alpha=1}^{g} P_\alpha$. Choose N pairs of points $Q_i^\pm \in \mathcal{R}$ and the normalization point $Q_\infty$. Given $n \in \mathbb{Z}^N$, there exists a unique Baker-Akhiezer function $\psi(n)$, defined as a meromorphic function on $\mathcal{R}$, with the following analytical properties: (1) as a function of $P \in \mathcal{R} \setminus \bigcup_{i=1}^{N} Q_i^\pm$, $\psi(n)$ may have as singularities only simple poles in the points of the divisor D; (2) in the points $Q_i^\pm$ the function $\psi(n)$ has poles of the order $\pm n_i$; and (3) in the point $Q_\infty$ the function $\psi(n)$ is normalized to 1. When $z_i^\pm(P)$ is a local coordinate on $\mathcal{R}$ centered at $Q_i^\pm$, then condition (2) implies that the function $\psi(n)$ in a neighborhood of the point $Q_i^\pm$ is of the form


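$$\psi(P; n) = \bigl(z_i^\pm(P)\bigr)^{\mp n_i} \sum_{s=0}^{\infty} \xi^{i}_{s,\pm}(n)\, \bigl(z_i^\pm(P)\bigr)^{s} \qquad [18]$$


The Baker-Akhiezer function, as a function of the discrete variable $n \in \mathbb{Z}^N$, satisfies the system of Laplace equations [3] with the Lamé coefficients $H_i(n) = \xi^{i}_{0,+}(n)$.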

Again, by construction, the Lamé coefficients satisfy eqns [6]. To various n-independent measures $d\mu_a$ on $\mathcal{R}$ there correspond coordinates

$$x^a(n) = \int_{\mathcal{R}} \psi(P; n)\, d\mu_a(P)$$

of a QL x.
We present the expression of the Baker-Akhiezer function and of the τ-function of the QL in terms of the Riemann theta functions. Let us choose on $\mathcal{R}$ the canonical basis of cycles $\{a_1,\dots,a_g, b_1,\dots,b_g\}$ and the dual basis $\{\omega_1,\dots,\omega_g\}$ of holomorphic differentials on $\mathcal{R}$, that is, $\oint_{a_j} \omega_k = \delta_{jk}$. Then the matrix B of b-periods defined as $B_{jk} = \oint_{b_j} \omega_k$ is symmetric and has positive-definite imaginary part. Denote by $\omega_{PQ}$ the unique differential holomorphic in $\mathcal{R} \setminus \{P, Q\}$ with poles of the first order in P, Q and residues, correspondingly, 1 and −1, which is normalized by conditions $\oint_{a_j} \omega_{PQ} = 0$. The Riemann theta function $\theta(z; B)$, $z \in \mathbb{C}^g$, is defined by its Fourier expansion

$$\theta(z; B) = \sum_{m \in \mathbb{Z}^g} \exp\{\pi i \langle m, Bm \rangle + 2\pi i \langle m, z \rangle\}$$

where $\langle \cdot, \cdot \rangle$ denotes the standard bilinear form in $\mathbb{C}^g$.

Finally, the Abel map A is given by $A(P) = \left(\int_{P_0}^{P} \omega_1, \dots, \int_{P_0}^{P} \omega_g\right)$, where $P_0 \in \mathcal{R}$, and the Riemann constants vector K is given by


K= S E Pair) 


ki 
(=A, a8 


The explicit form of the vacuum Baker-Akhiezer function $\psi$ can be written down with the help of the theta functions as follows:


9(A(P) + Eka me(A(Qz) - ACOH) +Z) 
9(A(Qx) + Eka me(A(Qz) - A(O4)) +Z) 
6(A(Qx0) + Z) DPE 
WAP +Z) OP (Som L asai) 
where Z = — 5 , A(P;) =K: 


Denote by 7, and sẹ the constants in the 
decomposition of the abelian integrals near the 


point QF 


b(n, P) = 





J woor = Fôy loget (P) + rf + O(z*(P)) 
0 





J woo; = —óys logg (P) + sf + O(z#(P)) 
0 


Then the expression of the τ-function of the QL within the subclass of algebro-geometric solutions reads


T(n) 
= 0 (Znue ny, (A 


x IT a TI H 


k j=1 k=1 


-A(Qf)) + A(Qx) + z) 
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— + 
lki — fki 
Aki = exp (3#) — Nik 


_ 1 e(A(Q;) +Z) 


Finally, we remark that the geometric integrability 
scheme and the algebro-geometric methods work 
also in the finite fields setting, giving solutions of the 
corresponding integrable cellular automata. 


where 


Lk exp (Szk — Sep) 


The Darboux-Type Transformations 


We present the basic ideas and results of the theory 
of the Darboux-type transformations of the multi- 
dimensional QL. 


Line congruences and the fundamental transformation To define the transformations we first need to define N-dimensional line congruences (or, simply, congruences), which are families of lines in $\mathbb{R}^M$ labeled by points of $\mathbb{Z}^N$ with the property that any two neighboring lines $l$ and $T_i l$, i = 1,...,N, are coplanar and therefore (possibly in the projective extension $\mathbb{P}^M$ of $\mathbb{R}^M$) intersect.

The QL $\mathcal{F}(x)$ is a fundamental transform of the QL x if the lines connecting the corresponding points of the lattices form a congruence. The superposition of a number of fundamental transformations can be compactly formulated in the vectorial fundamental transformation. The data of the vectorial fundamental transformation are: (1) the solution $Y_i : \mathbb{Z}^N \to V$, V being a linear space, of the linear system [9]; (2) the solution $Y_i^* : \mathbb{Z}^N \to V^*$, $V^*$ being the dual of V, of the linear system [8]. These allow one to construct the linear operator-valued potential $\Omega(Y, Y^*) : \mathbb{Z}^N \to L(V)$, defined by the following analog of eqn [7]:

$$\Delta_i \Omega(Y, Y^*) = Y_i \otimes (T_i Y_i^*), \qquad i = 1,\dots,N \qquad [19]$$

Similarly, one defines $\Omega(X, Y^*) : \mathbb{Z}^N \to L(V, \mathbb{R}^M)$ and $\Omega(Y, H) : \mathbb{Z}^N \to V$. The transforms of the lattice x and other related functions are given by

$$\mathcal{F}(x) = x - \Omega(X, Y^*)\,\Omega(Y, Y^*)^{-1}\,\Omega(Y, H)$$

$$\mathcal{F}(H_i) = H_i - Y_i^*\,\Omega(Y, Y^*)^{-1}\,\Omega(Y, H), \qquad i = 1,\dots,N$$

$$\mathcal{F}(X_i) = X_i - \Omega(X, Y^*)\,\Omega(Y, Y^*)^{-1}\,Y_i, \qquad i = 1,\dots,N$$

$$\mathcal{F}(Q_{ij}) = Q_{ij} - Y_j^*\,\Omega(Y, Y^*)^{-1}\,Y_i, \qquad i,j = 1,\dots,N, \quad i \neq j$$

$$\mathcal{F}(\rho_i) = \rho_i\,\bigl(1 + (T_i Y_i^*)\,\Omega(Y, Y^*)^{-1}\,Y_i\bigr), \qquad i = 1,\dots,N$$




Figure 2 The fundamental transformation as the binary 
transformation. 


Notice that, by the coplanarity of any two neighbor- 
ing lines of the congruence, also the quadrilaterals 
{x, T;x, F(x), F(T;x)} are planar (see Figure 2). Then 
the construction of the transformed lattice mimics 
the geometric integrability scheme. In consequence, 
any quadrilateral 


$$\{x, \mathcal{F}_1(x), \mathcal{F}_2(x), \mathcal{F}_1(\mathcal{F}_2(x)) = \mathcal{F}_2(\mathcal{F}_1(x))\}$$


is planar as well. Therefore, on the discrete level, 
there is no difference between the lattice coordinate 
directions and the fundamental transformation direc- 
tions. The distinction becomes visible in the limit 
from the QL to the conjugate net. Therefore, the 
vectorial description of the superposition of the 
fundamental transformations not only implies their 
permutability but also provides the explanation of the 
validity of the practical rule of “integrable discretiza- 
tion by Darboux transformations.” 


The Lévy and Combescure transformations It is easy to see that the family $t_i$ of lines passing through the points x and $T_i x$ of a QL forms a congruence, called the ith tangent congruence of the lattice. When the congruence of the transformation is the ith tangent congruence of the lattice x, then the corresponding reduction of the fundamental transformation is called the “Lévy transformation” $\mathcal{L}_i$.

It turns out that, for a generic congruence $l$, the lattice made from intersection points of the lines $l$ and $T_i^{-1} l$ is a QL, called the ith focal lattice of the congruence. When the fundamental transform of the lattice x is the ith focal lattice of the transformation congruence, then the corresponding reduction of the fundamental transformation is called the “adjoint Lévy transformation” $\mathcal{L}_i^*$.

Both Lévy transformations use only a half of the fundamental transformation data, and the corresponding reduction formulas (in the scalar case) for the lattice points read as follows:

$$\mathcal{L}_i(x) = x - X_i\,(Y_i)^{-1}\,\Omega(Y, H)$$

$$\mathcal{L}_i^*(x) = x - \Omega(X, Y^*)\,(Y_i^*)^{-1}\,H_i$$

Notice that the composition of the Lévy and the adjoint Lévy transformations gives (see Figure 2) the fundamental transformation, also called, for this reason, the binary transformation.

Another reduction of the fundamental transformation, important from a technical point of view, is the “Combescure transformation,” in which the tangent lines of the transformed lattice $\mathcal{C}(x)$ are parallel to those of the lattice x. The transformation formula reads

$$\mathcal{C}(x) = x - \Omega(X, Y^*)$$

where only the solution $Y_i^*$ of the adjoint linear system [8], necessary to build the transformation congruence, is needed.


The Laplace transformations and the geometric meaning of the Hirota equation The Laplace transform $\mathcal{L}_{ij}(x)$, $i \neq j$, of the QL x is the jth focal lattice of its ith tangent congruence (see Figure 3). It is uniquely determined once the lattice x is given. The transformation formulas of the lattice points and of the τ-function read as follows:

$$\mathcal{L}_{ij}(x) = x - \frac{1}{A_{ji}}\,\Delta_i x \qquad [20]$$

$$\mathcal{L}_{ij}(\tau) = \tau_{ij} \qquad [21]$$
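
That the point given by the lattice-point formula lies on two neighboring lines of the ith tangent congruence can be seen from the Laplace equation. The sketch below assumes eqn [3] in the form $\Delta_i\Delta_j x = (T_iA_{ij})\Delta_i x + (T_jA_{ji})\Delta_j x$ and eqn [20] in the form $\mathcal{L}_{ij}(x) = x - A_{ji}^{-1}\Delta_i x$.

```latex
% Apply T_j^{-1} to [3] and use T_j^{-1}\Delta_j x = x - T_j^{-1}x:
\begin{align*}
\Delta_i x - T_j^{-1}\Delta_i x
  &= (T_i T_j^{-1} A_{ij})\, T_j^{-1}\Delta_i x + A_{ji}\,(x - T_j^{-1}x).
\end{align*}
% Dividing by A_{ji} and rearranging:
\begin{align*}
x - \frac{1}{A_{ji}}\,\Delta_i x
  &= T_j^{-1}x - \frac{1 + T_i T_j^{-1} A_{ij}}{A_{ji}}\; T_j^{-1}\Delta_i x .
\end{align*}
% The left side is a point of the line through x and T_i x; the right side is
% a point of the line through T_j^{-1}x and T_j^{-1}T_i x. Hence the point is
% the intersection of the tangent line t_i with its shift T_j^{-1} t_i.
```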


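The superpositions of Laplace transformations satisfy the following identities

$$\mathcal{L}_{ij} \circ \mathcal{L}_{ji} = \mathrm{id}, \qquad \mathcal{L}_{jk} \circ \mathcal{L}_{ij} = \mathcal{L}_{ik}, \qquad \mathcal{L}_{ki} \circ \mathcal{L}_{ij} = \mathcal{L}_{kj}$$

which allow one to identify them with the Schlesinger transformations of the monodromy theory.

In the simplest case N = 2 one obtains the so-called Laplace sequence of two-dimensional QLs

$$x_\ell = \mathcal{L}_{12}^{\,\ell}(x), \qquad \mathcal{L}_{12}^{-1} = \mathcal{L}_{21}, \qquad \ell \in \mathbb{Z}$$

Equations [14] and [21] imply that the τ-functions of the Laplace sequence satisfy the celebrated Hirota equation (the fully discrete Toda system)

$$\tau_\ell\, T_1 T_2 \tau_\ell = (T_1 \tau_\ell)(T_2 \tau_\ell) - (T_1 \tau_{\ell-1})(T_2 \tau_{\ell+1})$$

Figure 3 The Laplace transformation $\mathcal{L}_{ij}$.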
Distinguished Integrable Reductions


We will present here basic reductions of the multi- 
dimensional QL. The geometric criterion for their 
integrability is the compatibility with the geometric 
integrability scheme. 


The circular lattices and the Ribaucour congruences QLs $x : \mathbb{Z}^N \to \mathbb{E}^M$ for which each quadrilateral is inscribed in a circle are called “circular” lattices. They are the integrable discrete analogs of submanifolds parametrized by curvature coordinates (e.g., the orthogonal coordinate systems described by the Lamé equations [1]-[2]).

The integrability of circular lattices is the consequence of the fact that if the three “initial” quadrilaterals $\{x, T_i x, T_j x, T_i T_j x\}$, $\{x, T_i x, T_k x, T_i T_k x\}$, $\{x, T_j x, T_k x, T_j T_k x\}$ are circular, then the three new quadrilaterals constructed by adding the vertex $T_i T_j T_k x$ are circular as well (see Figure 4). In fact, all the eight vertices belong to a sphere, and, in consequence, all the vertices of any K-dimensional, K = 2,...,N, elementary cell belong to a (K − 1)-dimensional sphere.

There are various equivalent algebraic descriptions of the circular lattices:


1. the normalized tangent vectors $X_i$ satisfy the constraint

$$X_i \cdot T_i X_j + X_j \cdot T_j X_i = 0, \qquad i \neq j$$

2. the scalar function $x \cdot x : \mathbb{Z}^N \to \mathbb{R}$ satisfies the Laplace equations [3] of the lattice x;

3. the functions $X_i^\circ = (x + T_i x) \cdot X_i : \mathbb{Z}^N \to \mathbb{R}$ satisfy the same linear system [9] as the normalized tangent vectors $X_i$; and

4. the functions $X_i \cdot X_i : \mathbb{Z}^N \to \mathbb{R}$ satisfy eqns [11] and thus can serve as the potentials $\rho_i$.
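
As an illustration of how these descriptions are linked, the following sketch shows that description 1 implies description 4. It assumes the linear system [9] in the form $\Delta_j X_i = (T_jQ_{ij})X_j$ and eqn [11] in the form $T_j\rho_i/\rho_i = 1 - (T_iQ_{ji})(T_jQ_{ij})$.

```latex
% By [9], T_j X_i = X_i + (T_j Q_{ij}) X_j, so the constraint
%   X_i \cdot T_i X_j + X_j \cdot T_j X_i = 0
% is equivalent to
%   2\, X_i\cdot X_j = -(T_i Q_{ji})\, X_i\cdot X_i - (T_j Q_{ij})\, X_j\cdot X_j .
\begin{align*}
T_j (X_i\cdot X_i)
 &= X_i\cdot X_i + 2\,(T_j Q_{ij})\, X_i\cdot X_j + (T_j Q_{ij})^2\, X_j\cdot X_j \\
 &= \bigl(1 - (T_i Q_{ji})(T_j Q_{ij})\bigr)\, X_i\cdot X_i .
\end{align*}
% Hence \rho_i = X_i \cdot X_i satisfies eqn [11].
```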


The Ribaucour transformation $\mathcal{R}$ is the restriction of the fundamental transformation to the class of circular lattices such that also the “side” quadrilaterals $\{x, T_i x, \mathcal{R}(x), \mathcal{R}(T_i x)\}$ are circular. Again there is no geometric difference between the lattice directions and the Ribaucour transformation direction. Moreover, the quadrilaterals $\{x, \mathcal{R}_1(x), \mathcal{R}_2(x), \mathcal{R}_1(\mathcal{R}_2(x)) = \mathcal{R}_2(\mathcal{R}_1(x))\}$ are circular as well. In consequence, the vertices of the elementary K-cells, K = 2,...,N, of the circular lattice and the corresponding vertices of its Ribaucour transform are contained in a K-dimensional sphere. Finally, for K = N, one obtains a special $\mathbb{Z}^N$ family of N-dimensional spheres, called the Ribaucour congruence of spheres.

Figure 4 The geometric integrability of circular lattices.

Algebraically, the Ribaucour transformation needs only a half of the data (necessary to build the congruence) of the fundamental transformation. The data of the vectorial Ribaucour transformation consists of the solution $Y_i^* : \mathbb{Z}^N \to V^*$ of the linear system [8]. Then, because of the circularity constraint, $Y_i : \mathbb{Z}^N \to V$ given by

$$Y_i = \bigl(\Omega(X, Y^*) + T_i \Omega(X, Y^*)\bigr)^{T} X_i$$

is a solution of the linear system [9], and the constraints

$$\Omega(Y, H) + \Omega(X^\circ, Y^*)^{T} = 2\,\Omega(X, Y^*)^{T} x$$

$$\Omega(Y, Y^*) + \Omega(Y, Y^*)^{T} = 2\,\Omega(X, Y^*)^{T}\,\Omega(X, Y^*)$$

are admissible.

We remark that the above constraints have a simple geometric meaning when one considers the circular lattices in $\mathbb{E}^M$ as the stereographic projections of QLs in the Möbius sphere $S^M$; that is, as a special case of QLs subjected to quadratic constraints.


The symmetric lattice Given a QL x with rotation coefficients $Q_{ij}$ and potentials $\rho_i$ given by [11], then the functions $\tilde{Q}_{ij}$ defined by equation

$$\rho_i\, T_i \tilde{Q}_{ji} = \rho_j\, T_j Q_{ij}, \qquad i \neq j$$

and called, because of their geometric interpretation, the backward rotation coefficients, satisfy the Darboux system [10] as well. A QL is called symmetric if its forward rotation coefficients $Q_{ij}$ are also its backward rotation coefficients. Again the constraint is compatible with the geometric integrability scheme, that is, it propagates in the construction of the lattice. One can show that a QL is symmetric if and only if its rotation coefficients satisfy the following trilinear constraint:

$$(T_i Q_{ji})(T_j Q_{kj})(T_k Q_{ik}) = (T_j Q_{ij})(T_i Q_{ki})(T_k Q_{jk}), \qquad i,j,k \ \text{distinct}$$
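
The necessity of the trilinear constraint can be seen in a few lines. The sketch below assumes that the defining relation of the backward rotation coefficients reads $\rho_i T_i\tilde{Q}_{ji} = \rho_j T_j Q_{ij}$; the index placement in that relation is a reconstruction and should be checked against the original.

```latex
% For a symmetric lattice \tilde{Q}_{ij} = Q_{ij}, so for every pair i \neq j
%   \rho_i\, T_i Q_{ji} = \rho_j\, T_j Q_{ij}
%   \quad\Longleftrightarrow\quad
%   \frac{\rho_i}{\rho_j} = \frac{T_j Q_{ij}}{T_i Q_{ji}} .
% Multiplying the relations for the pairs (i,j), (j,k), (k,i):
\begin{align*}
1 = \frac{\rho_i}{\rho_j}\cdot\frac{\rho_j}{\rho_k}\cdot\frac{\rho_k}{\rho_i}
  = \frac{(T_j Q_{ij})\,(T_k Q_{jk})\,(T_i Q_{ki})}
         {(T_i Q_{ji})\,(T_j Q_{kj})\,(T_k Q_{ik})} ,
\end{align*}
% which is the trilinear constraint; the potentials \rho_i have dropped out.
```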


To obtain the corresponding reduction of the fundamental transformation we again need only half of the data. Given a solution $Y_i^* : \mathbb{Z}^N \to V^*$ of the linear system [8], then, because of the symmetric constraint, $Y_i : \mathbb{Z}^N \to V$, defined by

$$Y_i = \rho_i\,(T_i Y_i^*)^{T}$$

is the solution of the linear system [9]; notice that, equivalently, we could start from $Y_i$. The constraint

$$\Omega(Y, Y^*) = \Omega(Y, Y^*)^{T}$$

is then admissible and gives a new symmetric lattice.

There are other multidimensional reductions of 
the QL like, for example, the D-invariant and 
Egorov lattices or discrete versions of immersions 
of spaces of constant negative curvature. We remark 
that the transformations and reductions discussed 
above have also a clear interpretation on the level of 
the analytic methods. 


Integrable Discrete Surfaces 


In this section we present some distinguished examples 
of discrete integrable surfaces. Notice that, although 
the geometric integrability scheme is meaningless for 
N =2, it can be applied indirectly, by considering the 
discrete surfaces, together with their transformations, 
as sublattices of multidimensional lattices. 

We remark also that one can consider integrable 
evolutions of discrete curves, which give equations 
of the Ablowitz—Ladik hierarchy, and the corre- 
sponding integrable spin chains. 


Discrete Isothermic Nets 


An isothermic lattice is a two-dimensional circular lattice $x : \mathbb{Z}^2 \to \mathbb{E}^M$ with harmonic quadrilaterals; that is, given x, $T_1 x$ and $T_2 x$, then the point $T_1 T_2 x$ is the intersection of the circle (passing through x, $T_1 x$ and $T_2 x$) and the line passing through x and the meeting point of the tangents to the circle at $T_1 x$ and $T_2 x$ (see Figure 5). Therefore, given two discrete curves intersecting in the common vertex $x_0$, the unique isothermic lattice can be found using the above “ruler and compass” construction.

Algebraically the reduction looks as follows. Any oriented plane in $\mathbb{E}^M$ can be identified with the complex plane $\mathbb{C}$. Given any four complex points $z_1$, $z_2$, $z_3$, and $z_4$, their complex cross-ratio is defined by

$$q(z_1, z_2, z_3, z_4) = \frac{(z_1 - z_2)(z_3 - z_4)}{(z_2 - z_3)(z_4 - z_1)}$$

Figure 5 Elementary quadrilaterals of the isothermic lattice.

One can show that the cross-ratio is real if and only if the four points are cocircular or collinear. In particular, a harmonic quadrilateral with vertices numbered anticlockwise has cross-ratio equal to −1. Therefore, abusing the notation (it can be formalized using Clifford algebras), the isothermic lattice is defined by the condition

$$q(x, T_1 x, T_1 T_2 x, T_2 x) = -1$$
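
The cross-ratio condition can be turned into a small planar computation. The sketch below works in the complex plane with plain Python complex numbers; the function names (`cross_ratio`, `fourth_vertex`, `circumcenter`) and the sample points are illustrative assumptions. It solves the condition $q(x, T_1x, T_1T_2x, T_2x) = -1$ for the fourth vertex and computes the circumcenter so that concircularity can be checked.

```python
def cross_ratio(z1, z2, z3, z4):
    """Complex cross-ratio q(z1, z2, z3, z4) as defined in the text."""
    return (z1 - z2) * (z3 - z4) / ((z2 - z3) * (z4 - z1))

def fourth_vertex(x, x1, x2, q=-1.0):
    """Solve q(x, x1, x12, x2) = q for x12 (linear in x12)."""
    a = x - x1
    b = x2 - x
    return (q * b * x1 + a * x2) / (a + q * b)

def circumcenter(z1, z2, z3):
    """Center of the circle through three non-collinear complex points."""
    w = (z3 - z1) / (z2 - z1)
    return z1 + (z2 - z1) * (w - abs(w) ** 2) / (w - w.conjugate())

# Generic (hypothetical) initial data x, T1 x, T2 x in the plane.
x = 0.3 + 0.1j
x1 = 2.0 - 0.4j
x2 = 0.5 + 1.7j

x12 = fourth_vertex(x, x1, x2)   # candidate for T1 T2 x
c = circumcenter(x, x1, x2)      # center of the circle through x, x1, x2

print(cross_ratio(x, x1, x12, x2))       # close to -1
print(abs(x12 - c), abs(x - c))          # equal radii: x12 lies on the circle
```

The same `fourth_vertex` helper with `q` set to a real parameter other than −1 gives a concircular quadrilateral with prescribed real cross-ratio, which is the kind of condition used for the side quadrilaterals of the Darboux transformation.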


We remark that the definition of isothermic lattices 
can be slightly generalized allowing for the above 
cross-ratio to be a ratio of two real functions of 
single discrete variables. 

The restriction of the Ribaucour transformation to the class of isothermic lattices, named after Darboux who constructed it for isothermic surfaces, has as its data a real parameter $\lambda$ and the starting point $\mathcal{D}(x_0)$, and can be described as follows. Given the elementary quadrilateral $\{x, T_1 x, T_2 x, T_1 T_2 x\}$ of the isothermic lattice, and given the point $\mathcal{D}(x)$, then the points $\mathcal{D}(T_1 x)$ and $\mathcal{D}(T_2 x)$ belong to the corresponding planes and are constructed from equations

$$q(x, \mathcal{D}(x), \mathcal{D}(T_1 x), T_1 x) = \lambda$$

$$q(x, \mathcal{D}(x), \mathcal{D}(T_2 x), T_2 x) = -\lambda$$

It turns out that the point $\mathcal{D}(T_1 T_2 x)$, constructed by the application of the geometric integrability scheme, is such that the quadrilateral $\{\mathcal{D}(x), \mathcal{D}(T_1 x), \mathcal{D}(T_2 x), \mathcal{D}(T_1 T_2 x)\}$ is harmonic. Moreover, the construction of the Darboux transformation is compatible; that is, the new side quadrilaterals have the correct cross-ratios $\lambda$ and $-\lambda$.

There are various integrable reductions of the 
isothermic lattice, for example, the constant mean 
curvature lattice and the minimal lattice. 


Asymptotic Lattices and Their Reductions 


An asymptotic lattice is a mapping $x : \mathbb{Z}^2 \to \mathbb{R}^3$ such that any point x of the lattice is coplanar with its four nearest neighbors $T_1 x$, $T_2 x$, $T_1^{-1} x$, $T_2^{-1} x$ (see Figure 6). Such a plane is called the tangent plane of the asymptotic lattice in the point x.

It can be shown that any asymptotic lattice x can be recovered from its suitably rescaled normal (to the tangent plane) field $N : \mathbb{Z}^2 \to \mathbb{R}^3$ via the discrete analog of the Lelieuvre formulas

$$\Delta_1 x = (T_1 N) \times N, \qquad \Delta_2 x = N \times (T_2 N) \qquad [22]$$

Figure 6 Asymptotic lattices.

By the compatibility of the Lelieuvre formulas, the normal field N satisfies the discrete Moutard equation

$$T_1 T_2 N + N = F\,(T_1 N + T_2 N) \qquad [23]$$

for some potential $F : \mathbb{Z}^2 \to \mathbb{R}$.
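
The link between the Lelieuvre formulas and the Moutard equation can be checked numerically on a single elementary quadrilateral. The sketch below assumes NumPy and uses random (hypothetical) normal vectors; the name `closure_defect` is illustrative. It compares the two edge paths from x to $T_1T_2x$: the paths agree when $T_1T_2N$ is given by the Moutard equation, and generically disagree otherwise.

```python
import numpy as np

def closure_defect(N, N1, N2, N12):
    """Difference between the two edge paths from x to T1 T2 x under [22]."""
    d1x = np.cross(N1, N)        # Delta_1 x      = (T1 N) x N
    d2x = np.cross(N, N2)        # Delta_2 x      = N x (T2 N)
    T2_d1x = np.cross(N12, N2)   # T2 (Delta_1 x) = (T1 T2 N) x (T2 N)
    T1_d2x = np.cross(N1, N12)   # T1 (Delta_2 x) = (T1 N) x (T1 T2 N)
    return (d1x + T1_d2x) - (d2x + T2_d1x)

rng = np.random.default_rng(0)
N, N1, N2, N_generic = rng.normal(size=(4, 3))

F = 0.7                          # arbitrary value of the potential at this site
N12 = F * (N1 + N2) - N          # T1 T2 N from the Moutard equation [23]

moutard_defect = closure_defect(N, N1, N2, N12)        # vanishes
generic_defect = closure_defect(N, N1, N2, N_generic)  # generically nonzero

print(moutard_defect, generic_defect)
```

Algebraically the defect equals $(T_1N + T_2N) \times (N + T_1T_2N)$, so it vanishes exactly when $N + T_1T_2N$ is proportional to $T_1N + T_2N$, which is the content of the Moutard equation.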
Given a scalar solution $\theta$ of the Moutard equation [23], a new solution $\mathcal{M}(N)$ of the Moutard equation, with the new potential

$$\mathcal{M}(F) = \frac{(T_1 \theta)(T_2 \theta)}{(T_1 T_2 \theta)\,\theta}\,F$$

can be found via the Moutard transformation equations

$$\mathcal{M}(T_1 N) \mp N = \frac{\theta}{T_1 \theta}\,\bigl(\mathcal{M}(N) \mp T_1 N\bigr) \qquad [24]$$

$$\mathcal{M}(T_2 N) \pm N = \frac{\theta}{T_2 \theta}\,\bigl(\mathcal{M}(N) \pm T_2 N\bigr) \qquad [25]$$

Now, via the Lelieuvre formulas [22], one can construct a new asymptotic lattice $\mathcal{M}(x) = x \pm \mathcal{M}(N) \times N$. The lines connecting corresponding points of the asymptotic lattices x and $\mathcal{M}(x)$ are tangent to both lattices. Such a $\mathbb{Z}^2$-family of lines in $\mathbb{R}^3$ is called a Weingarten (or W for short) congruence. Notice that this is not a congruence as considered earlier.

Various integrable reductions of asymptotic lat- 
tices are known in the literature: pseudospherical 
lattices, asymptotic Bianchi lattices and isothermally 
asymptotic (or Fubini-Ragazzi) lattices, and discrete 
(proper and improper) affine spheres. 

Formally, the Moutard transformation is a reduction of the (projective version of the) fundamental transformation for the Moutard reduction of the Laplace equation. However, the geometric relation between asymptotic lattices and QLs is more subtle and the geometric scenery of this connection is the line geometry of Plücker. Straight lines in $\mathbb{R}^3 \subset \mathbb{P}^3$ are considered there as points of the so-called Plücker quadric $Q_P \subset \mathbb{P}^5$. A discrete asymptotic net in $\mathbb{P}^3$, viewed as the envelope of its tangent planes, corresponds to a congruence of isotropic lines in $Q_P$, whose focal lattices represent the asymptotic directions. The discrete W-congruences are represented by two-dimensional QLs in the Plücker quadric.


The Koenigs Lattice 


A two-dimensional QL $x : \mathbb{Z}^2 \to \mathbb{P}^M$ is called a
Koenigs lattice if, for every point x of the lattice,
Figure 7 The Koenigs lattice.


the six points x4, Tjx1y,T?xy,i=1,2, of its 
Laplace transforms belong to a conic (see Figure 7). 
The nonlinear constraint in definition of the Koenigs 
lattice can be linearized, with the help of the Pascal 
“mystic hexagon” theorem, to the form that the line 
passing through x and TıTzx, the line passing 
through x; and T?x1, and the line passing through 
x and Tx, intersect in a point. 

Algebraically, the geometric Koenigs lattice condition means that the Laplace equation of the lattice in homogeneous coordinates $x : \mathbb{Z}^2 \to \mathbb{R}^{M+1}$ can be gauged into the form

$$T_1 T_2 x + x = T_1(F x) + T_2(F x) \qquad [26]$$

It turns out that, if N is a solution of the Moutard equation [23], then $x = T_1 N + T_2 N$ satisfies the Koenigs lattice equation. Therefore, the algebraic theory of the discrete Koenigs lattice equation [26], its (Koenigs) transformation, and the permutability of the superpositions of such transformations is based on the corresponding theory for the Moutard equation [23].
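
The statement that $x = T_1N + T_2N$ solves the Koenigs lattice equation is a one-line consequence of the Moutard equation. The sketch below assumes eqn [23] in the form $T_1T_2N + N = F(T_1N + T_2N)$ and eqn [26] in the form $T_1T_2x + x = T_1(Fx) + T_2(Fx)$.

```latex
% With x = T_1 N + T_2 N the Moutard equation reads  T_1 T_2 N + N = F x.  Then
\begin{align*}
T_1 T_2 x + x
  &= \bigl(T_1^2 T_2 N + T_1 N\bigr) + \bigl(T_1 T_2^2 N + T_2 N\bigr) \\
  &= T_1\bigl(T_1 T_2 N + N\bigr) + T_2\bigl(T_1 T_2 N + N\bigr) \\
  &= T_1 (F x) + T_2 (F x).
\end{align*}
```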

Geometrically, the Koenigs lattices are selected from the QLs as follows. Consider a two-dimensional QL $x : \mathbb{Z}^2 \to \mathbb{P}^M$ and a congruence $l$ with lines passing through the corresponding points of the lattice. Denote by $y_i = T_i^{-1} l \cap l$, i = 1, 2, the points of the focal lattices of the congruence. For every line $l$, denote by $\mathfrak{i}$ the unique projective involution exchanging $y_i$ with $T_i y_i$. If, for every congruence $l$, the lattice $\mathcal{K}(x) : \mathbb{Z}^2 \to \mathbb{P}^M$, with points $\mathcal{K}(x) = \mathfrak{i}(x)$, is a QL, then the lattice x is a Koenigs lattice. The above construction gives also the corresponding reduction of the fundamental transformation.

A distinguished reduction of the Koenigs lattice is 
the quadrilateral Bianchi lattice. The natural con- 
tinuous limit of the corresponding equation is 
equivalent to the Bianchi (or hyperbolic Ernst) 
system describing the interaction of planar gravita- 
tional waves. 


Discrete Two-Dimensional Schrödinger Equation


In the previous sections we have discussed examples of integrable discrete geometries described by equations of hyperbolic type. Below we present some results associated with the elliptic case; it is remarkable that the QL provides a way to connect these two subjects.

Consider a solution $N : \mathbb{Z}^2 \to \mathbb{R}^3$ of the general self-adjoint five-point scheme on the star of the $\mathbb{Z}^2$ lattice

$$a\,T_1 N + T_1^{-1}(aN) + b\,T_2 N + T_2^{-1}(bN) - cN = 0 \qquad [27]$$

then the lattice $x : \mathbb{Z}^2 \to \mathbb{R}^3$ obtained by the Lelieuvre-type formulas

$$\Delta_1 x = -(T_2^{-1} b)\, N \times T_2^{-1} N, \qquad \Delta_2 x = (T_1^{-1} a)\, N \times T_1^{-1} N \qquad [28]$$

is a QL having N as normal (to the planes of elementary quadrilaterals) vector field.
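
The compatibility of the Lelieuvre-type formulas can be verified directly. The sketch below assumes eqn [27] in the form $aT_1N + T_1^{-1}(aN) + bT_2N + T_2^{-1}(bN) - cN = 0$ and eqns [28] in the form $\Delta_1 x = -(T_2^{-1}b)\,N \times T_2^{-1}N$, $\Delta_2 x = (T_1^{-1}a)\,N \times T_1^{-1}N$.

```latex
% Shifting [28]:  T_2\Delta_1 x = -b\,(T_2 N)\times N = b\,N\times T_2 N ,
%                 T_1\Delta_2 x =  a\,(T_1 N)\times N = -a\,N\times T_1 N .
\begin{align*}
\Delta_2\Delta_1 x &= N \times \bigl(b\,T_2 N + T_2^{-1}(bN)\bigr), \\
\Delta_1\Delta_2 x &= -\,N \times \bigl(a\,T_1 N + T_1^{-1}(aN)\bigr), \\
\Delta_2\Delta_1 x - \Delta_1\Delta_2 x
  &= N \times \bigl(a\,T_1 N + T_1^{-1}(aN) + b\,T_2 N + T_2^{-1}(bN)\bigr)
   = N \times (cN) = 0 .
\end{align*}
% All four edges of the quadrilateral {x, T_1x, T_2x, T_1T_2x} are cross
% products with N, hence orthogonal to N: the quadrilateral is planar.
```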

The following gauge-equivalent form of eqn 27, 
namely 


I T T 
zm a | —— mol 
TT ivt+T; (Ta) +a 2Y 


I 
+T;' (EF v) — qy =0 [29] 


an integrable discretization of the Schrödinger 
equation 


y Ow 
a pe tonn — 0 
art a a 


is also the Lax operator associated with an integrable 
generalization of the Toda law to the square lattice. 

The five-point scheme [27] is also a distinguished illustrative example of the sublattice theory. Indeed, it can be obtained by restricting to the even sublattice $\mathbb{Z}^2_e$ the discrete Cauchy-Riemann equations

$$T_1 T_2 \phi - \phi = iG\,(T_1 \phi - T_2 \phi) \qquad [30]$$


Because of the equivalence (on the discrete level!) 
between eqn [30] and the discrete Moutard equation 
[23], the five-point scheme [27] inherits integrability 
properties (Darboux-type transformations, superpo- 
sition formulas, analytic methods of solution) from 
the corresponding (and simpler) integrability proper- 
ties of the discrete Moutard equation. 


See also: Bäcklund Transformations; $\bar\partial$-Approach to Integrable Systems; Integrable Discrete Systems; Integrable Systems and Algebraic Geometry; Integrable Systems and the Inverse Scattering Method; Integrable Systems: Overview; Nonlinear Schrödinger Equations; Sine-Gordon Equation; Stability Theory and KAM; Toda Lattices.






Integrable Systems and Recursion Operators on Symplectic and Jacobi Manifolds


R Caseiro and J M Nunes da Costa, Universidade 
de Coimbra, Coimbra, Portugal 


© 2006 Elsevier Ltd. All rights reserved. 


Introduction 


Let $(M, \omega)$ be a symplectic manifold of dimension 2n. We denote by $\sharp$ the natural isomorphism between $T^*M$ and $TM$, defined by the equation

$$i_{\sharp\alpha}\,\omega = \alpha, \qquad \alpha \in T^*M \qquad [1]$$

We say that $\sharp\,df$ is the Hamiltonian vector field defined by the Hamiltonian $f : M \to \mathbb{R}$. Associated with the nondegenerate closed 2-form $\omega$ there is also a Poisson bracket on $C^\infty(M)$, the space of real differentiable functions on M, defined by

$$\{\cdot,\cdot\}_\omega : C^\infty(M) \times C^\infty(M) \to C^\infty(M), \qquad (f, g) \mapsto \{f, g\}_\omega = \omega(\sharp\,df, \sharp\,dg)$$

We say that two smooth functions $F, G : M \to \mathbb{R}$ are in involution if

$$\{F, G\}_\omega = 0 \qquad [2]$$


Suppose we have n independent smooth functions in involution $H_1,\dots,H_n$, such that the associated Hamiltonian vector fields $X_1,\dots,X_n$ are complete on the level manifold

$$M_a = \{x \in M : H_i(x) = a_i,\ i = 1,\dots,n\} \qquad [3]$$
The classical theorem of Arnol’d—Liouville states that 


1. the submanifold $M_a$ is invariant with respect to each one of the Hamiltonian commuting flows generated by $H_1,\dots,H_n$;

2. every connected component of $M_a$ is diffeomorphic to a product of a Euclidean space by a torus, $\mathbb{R}^{n-k} \times T^k$;

3. there exist coordinates fi, ...,fn-k, Yi5--+5 Vp in 
M, such that the Hamiltonian systems in M,, 
associated with the Hamiltonians H;, have the form 


f, = Cpm =w, (w=w(a),c=const.) [4] 
4. if $M_a$ is compact then it is diffeomorphic to $T^n$ and there exists a neighborhood of $M_a$ on M, symplectically diffeomorphic to $B^n \times T^n$.


A completely integrable Hamiltonian system is a
Hamiltonian vector field $X_H$ that admits $n$ integrals
$H_1, \dots, H_n$ satisfying the hypotheses of the
Arnol'd-Liouville theorem.

It may happen that a system has more than $n$
independent integrals of motion. In this case it is
called superintegrable and not all the integrals are in
involution. Supposing that

$M_a = \{x \in M : H_i(x) = a_i,\ i = 1, \dots, n+k\}$

is compact and connected and that $H_1, \dots, H_{n-k}$
commute with all the $n+k$ integrals, then $M_a$ is
diffeomorphic to the torus $T^{n-k}$. In particular, if the
system is maximally superintegrable, that is,
$k = n-1$, $M_a$ is diffeomorphic to $T^1 = S^1$ and all
the trajectories are closed.

To prove that a system is completely integrable, we 
have to find a sufficient number of integrals of the 
system in involution. The Lax pair is an extremely 
powerful tool in this task, although it does not 
guarantee the involution of the integrals found. 

A Lax pair of a vector field $X$ on a smooth
manifold $M$ is a pair of operators $(L, M)$ such that

$\dot L = [M, L] = ML - LM$ [5]

This equation is equivalent to

$U^{-1} L U = L_0$ [6]

where $U$ is the solution operator of the Cauchy
problem

$\dot U = MU, \quad U(0) = I$ [7]

So, the eigenvalues of $L$ are integrals of $X$. Notice
that all the pairs $(L^k, M)$, $k \in \mathbb{N}$, are Lax pairs of the
system and we may conclude that the functions
$\operatorname{tr} L^k$, $k \in \mathbb{N}$, are integrals of $X$.
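
The conservation of $\operatorname{tr} L^k$ along a Lax flow can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the article: it takes the finite nonperiodic Toda lattice in Flaschka form as the concrete example, with $L$ symmetric tridiagonal and the second operator of the pair chosen as the strictly upper part of $L$ minus its strictly lower part (the standard Flaschka choice). The integrator is a plain fourth-order Runge-Kutta scheme, and all function names and the initial matrix are hypothetical.

```python
def skew_part(L):
    """Strictly upper part of L minus its strictly lower part (second operator of the pair)."""
    n = len(L)
    return [[L[i][j] if j > i else -L[i][j] if i > j else 0.0
             for j in range(n)] for i in range(n)]


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def lax_rhs(L):
    """Right-hand side [M, L] = ML - LM of the Lax equation."""
    M = skew_part(L)
    ML, LM = matmul(M, L), matmul(L, M)
    n = len(L)
    return [[ML[i][j] - LM[i][j] for j in range(n)] for i in range(n)]


def add_scaled(A, B, s):
    n = len(A)
    return [[A[i][j] + s * B[i][j] for j in range(n)] for i in range(n)]


def rk4_step(L, dt):
    """One classical Runge-Kutta step for dL/dt = [M(L), L]."""
    k1 = lax_rhs(L)
    k2 = lax_rhs(add_scaled(L, k1, dt / 2))
    k3 = lax_rhs(add_scaled(L, k2, dt / 2))
    k4 = lax_rhs(add_scaled(L, k3, dt))
    out = L
    for k, w in ((k1, 1), (k2, 2), (k3, 2), (k4, 1)):
        out = add_scaled(out, k, w * dt / 6)
    return out


def trace_power(L, k):
    P = L
    for _ in range(k - 1):
        P = matmul(P, L)
    return sum(P[i][i] for i in range(len(L)))


# Hypothetical initial data: symmetric tridiagonal 3 x 3 matrix.
L0 = [[0.5, 1.0, 0.0], [1.0, -0.2, 0.7], [0.0, 0.7, -0.3]]
L = L0
for _ in range(1000):          # integrate up to t = 1
    L = rk4_step(L, 1e-3)
drift = [abs(trace_power(L, k) - trace_power(L0, k)) for k in (1, 2, 3)]
print(drift)
```

The matrix entries change appreciably over the integration interval, while the traces of the first three powers stay constant up to the integration error, as [5]-[7] predict.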

The first goal of this article is to relate 
integrable Hamiltonian systems and recursion 
operators, where some of the most important 
properties of the latter are exhibited. Very naturally,
the Poisson-Nijenhuis manifolds appear in
this context and the Toda lattice is the example 
chosen in order to show the whole theory working 
in practice. Also, we see how recursion operators 
can help in the construction of quadratic algebras 
of integrals of motion and, in the last section, we 
present the generalization to Jacobi manifolds of 
the Nijenhuis structures defined for Poisson 
manifolds. 


Integrable Systems on Poisson-Nijenhuis 
Manifolds 


Let $X$ be a vector field on a smooth manifold $M$.
A recursion operator of $X$ is a (1,1)-tensor $R$
invariant under $X$:

$\mathcal{L}_X R = 0$ [8]


The (1,1)-tensors, and in particular the recursion
operators, may be regarded as fiber endomorphisms
of $TM$. So, given a (1,1)-tensor $R$, we denote by
${}^tR: T^*M \to T^*M$ the transpose of $R: TM \to TM$,
that is,

$\langle {}^tR(\alpha), X \rangle = \langle \alpha, R(X) \rangle, \quad \alpha \in T^*M,\ X \in TM$ [9]

where $\langle \cdot,\cdot \rangle$ denotes the canonical pairing between
$T^*M$ and $TM$.

Recursion operators also generate symmetries. If $R$
is a recursion operator and $Y$ is a symmetry of $X$, that
is, $[X, Y] = 0$, then $RY$ is also a symmetry of $X$. So,
given a recursion operator $R$ of $X$, we may construct a
sequence of symmetries of $X$, $R^kY$, $k \in \mathbb{N}$.

The Nijenhuis torsion of a (1,1)-tensor $R$ is the
(1,2)-tensor $T(R)$ defined by

$T(R)(X, Y) = [RX, RY] - R([X, RY] + [RX, Y] - R[X, Y]), \quad X, Y \in \mathfrak{X}(M)$ [10]


A Nijenhuis operator is a (1,1)-tensor, $R$, with
vanishing Nijenhuis torsion, that is,

$\mathcal{L}_{RX} R = R\,\mathcal{L}_X R$ [11]
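
A standard worked example of [10], added here as an illustration rather than taken from the article: in local coordinates $(x_1, \dots, x_m)$ consider a diagonal tensor $R = \sum_i \lambda_i\,\partial_i \otimes dx_i$ with smooth coefficients $\lambda_i$ (the coordinates and the functions $\lambda_i$ are assumptions of the example). Evaluating the torsion on coordinate vector fields gives

```latex
\begin{aligned}
T(R)(\partial_i,\partial_j)
 &= [\lambda_i\partial_i,\lambda_j\partial_j]
    - R\big([\partial_i,\lambda_j\partial_j] + [\lambda_i\partial_i,\partial_j]
    - R[\partial_i,\partial_j]\big) \\
 &= \lambda_i(\partial_i\lambda_j)\,\partial_j - \lambda_j(\partial_j\lambda_i)\,\partial_i
    - R\big((\partial_i\lambda_j)\,\partial_j - (\partial_j\lambda_i)\,\partial_i\big) \\
 &= (\lambda_i - \lambda_j)\big[(\partial_i\lambda_j)\,\partial_j
    + (\partial_j\lambda_i)\,\partial_i\big].
\end{aligned}
```

Hence the torsion vanishes whenever each $\lambda_i$ depends on $x_i$ alone: for $i \neq j$ both derivatives are zero, and for $i = j$ the prefactor is zero. Since $T(R)$ is a tensor, checking it on coordinate fields suffices.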


These operators can generate sequences of closed
1-forms. If $R$ is a Nijenhuis operator and $\alpha$ is a
closed 1-form such that $d({}^tR\alpha) = 0$, then
$d({}^tR^k\alpha) = 0$, $k \in \mathbb{N}$. In the particular case of $\alpha$
being exact, that is, $\alpha = df$, and the first cohomology
group being trivial, we have a sequence of
local integrals of motion $df_k = {}^tR^k(df)$.

A Nijenhuis recursion operator $R$ and a symmetry
$Y$ of a vector field $X$ lead to a sequence of
commuting symmetries $R^kY$, $k \in \mathbb{N}$,

$[R^iY, R^jY] = 0, \quad i, j \in \mathbb{N}$ [12]


To define the integrability in terms of a (1,1)- 
tensor is of special relevance when we try to extend 
everything to the infinite-dimensional case. 

Notice that in coordinates $(q_1, \dots, q_n)$, the condition
[8] is equivalent to

$\dot R = [A, R]$ [13]

where $A$ is the $n \times n$ matrix defined by

$A_{ij} = \frac{\partial X^i}{\partial q_j}$

and $X^j = X(q_j) = \dot q_j$, $j = 1, \dots, n$. So, the pair
$(R, A)$ is a local Lax pair of the system and the
eigenvalues of $R$ are integrals of $X$.
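
The passage from [8] to [13] can be spelled out in coordinates. The following short derivation is a sketch using the standard component formula for the Lie derivative of a (1,1)-tensor; the index placement $A^i{}_k = \partial X^i/\partial q_k$ is the reading of the matrix $A$ assumed here.

```latex
\begin{aligned}
(\mathcal{L}_X R)^i{}_j
  &= X^k\,\partial_k R^i{}_j - R^k{}_j\,\partial_k X^i + R^i{}_k\,\partial_j X^k , \\
\dot R^i{}_j
  &= X^k\,\partial_k R^i{}_j \quad \text{(derivative along the flow of } X\text{)} , \\
\mathcal{L}_X R = 0
  \;&\Longleftrightarrow\;
  \dot R^i{}_j = A^i{}_k R^k{}_j - R^i{}_k A^k{}_j = [A, R]^i{}_j ,
  \qquad A^i{}_k = \frac{\partial X^i}{\partial q_k} .
\end{aligned}
```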

If a recursion operator $R$ of a vector field $X$ on a
manifold $M$ has vanishing Nijenhuis torsion and $n$
doubly degenerate eigenvalues $\lambda_i$ with nowhere-vanishing
differentials, $(d\lambda_i)_x \neq 0$, then $X$ defines a
completely integrable Hamiltonian system.

Now suppose $X$ defines a completely integrable
Hamiltonian system with Hamiltonian $H$ on a
symplectic manifold $(M,\omega)$. Let $(I_1, \dots, I_n, \varphi_1, \dots, \varphi_n)$
be the action-angle variables in a neighborhood
of an invariant torus. Two cases may happen:


1. The Hamiltonian $H$ is separable in the action
variables, that is,

$H = \sum_k H_k(I_k)$ [14]

In this case, the (1,1)-tensor

$R = \sum_k \lambda_k(I_k)\left(\frac{\partial}{\partial I_k} \otimes dI_k + \frac{\partial}{\partial \varphi_k} \otimes d\varphi_k\right)$ [15]

where $\lambda_k$ are functions with nowhere-vanishing
differentials, is a recursion operator of $X$, and has
vanishing Nijenhuis torsion and doubly degenerate
eigenvalues.

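2. The Hamiltonian has nonvanishing Hessian

$\det\left(\frac{\partial^2 H}{\partial I_i\,\partial I_j}\right) \neq 0$ [16]

In this case we may define new coordinates

$\nu_k = \frac{\partial H}{\partial I_k}$ [17]

and a new symplectic structure

$\omega_1 = \sum_k d\varphi_k \wedge d\nu_k = \sum_{j,k} \frac{\partial^2 H}{\partial I_j\,\partial I_k}\, d\varphi_k \wedge dI_j$ [18]

The vector field $X$ is Hamiltonian with respect to
$\omega_1$, with Hamiltonian

$H = \frac{1}{2}\sum_k \nu_k^2$ [19]

and the (1,1)-tensor

$R = \sum_k \lambda_k(\nu_k)\left(\frac{\partial}{\partial \nu_k} \otimes d\nu_k + \frac{\partial}{\partial \varphi_k} \otimes d\varphi_k\right)$ [20]

is a recursion operator of $X$.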


Nijenhuis operators also allow the construction of 
master symmetries from conformal ones. 

A conformal symmetry of a tensor field $T$ is a
vector field $Z$ such that

$\mathcal{L}_Z T = \lambda T$

for some constant $\lambda$.

A master symmetry of a vector field $X$ is a vector
field $Y$ such that

$[X, [X, Y]] = 0$, but $[X, Y] \neq 0$

Let $R$ be a recursion operator of $X_0$ and $Z_0$ be a
conformal symmetry of $X_0$ and $R$ such that

$\mathcal{L}_{Z_0} X_0 = \lambda X_0$ and $\mathcal{L}_{Z_0} R = \mu R$ [21]

for some constants $\lambda$, $\mu$.

If $R$ is also a Nijenhuis operator, then defining the
sequences of commuting symmetries $X_k = R^kX_0$
and of conformal symmetries $Z_k = R^kZ_0$, $k \in \mathbb{N}$,
we have, for all $k, j \in \mathbb{N}_0$,

$\mathcal{L}_{Z_k} R = \mu R^{k+1}$ [22]
$[Z_k, Z_j] = \mu (j - k) Z_{j+k}$ [23]
$[Z_k, X_j] = (\lambda + j\mu) X_{k+j}$ [24]

A bi-Hamiltonian manifold is a smooth manifold
$M$ endowed with two linearly independent Poisson
tensors $\Lambda_0$, $\Lambda_1$, compatible in the sense that their
Schouten bracket vanishes, $[\Lambda_0, \Lambda_1] = 0$.

A vector field is said to be bi-Hamiltonian if it is 
Hamiltonian with respect to both Poisson structures. 
The equation that rules the flow of this vector field 
is said to be a bi-Hamiltonian system. 

When one of the Poisson structures is obtained
from the other by means of a Nijenhuis operator, we
obtain a Poisson-Nijenhuis manifold. Hence, a
Poisson-Nijenhuis manifold is a differentiable manifold
$M$ endowed with a Poisson tensor $\Lambda$ and a
(1,1)-tensor $R$ such that

$R\Lambda^\sharp = \Lambda^\sharp\,{}^tR$, $[R\Lambda, \Lambda] = 0$ and $[R\Lambda, R\Lambda] = 0$

A classical example is the one of a bi-Hamiltonian
manifold $(M, \Lambda_0, \Lambda_1)$ where $\Lambda_0$ is nondegenerate. In
this case we may define the Nijenhuis operator
$R = \Lambda_1^\sharp (\Lambda_0^\sharp)^{-1}$ and the manifold $M$ is a
Poisson-Nijenhuis one.

The characteristics of the Poisson-Nijenhuis
manifold guarantee that all the bivectors $\Lambda_k = R^k\Lambda$
are compatible Poisson tensors and the manifold is
not just bi-Hamiltonian but multi-Hamiltonian.

From what we saw, a Hamiltonian system is
completely integrable if and only if it is bi-Hamiltonian
in a neighborhood of an invariant torus, with the
eigenvalues of the existing recursion operator providing
its complete integrability. These Poisson-Nijenhuis
manifolds appear quite frequently in dynamics and
allow us to obtain some interesting properties easily.
We finish this section with the Toda lattice. This
system is a good illustration of what has been said so
far.

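Consider $\mathbb{R}^{2n-1}$ with coordinates $(a_1, \dots, a_{n-1}, b_1, \dots, b_n)$
equipped with the following compatible
Poisson tensors:

$\Lambda_0 = \frac{1}{4}\sum_{i=1}^{n-1} a_i\,\frac{\partial}{\partial a_i} \wedge \left(\frac{\partial}{\partial b_i} - \frac{\partial}{\partial b_{i+1}}\right)$ [25]

$\Lambda_1 = \sum_{i=1}^{n-1} a_i^2\,\frac{\partial}{\partial b_{i+1}} \wedge \frac{\partial}{\partial b_i} + \frac{1}{2}\sum_{i=1}^{n-1} a_i\,\frac{\partial}{\partial a_i} \wedge \left(b_i\frac{\partial}{\partial b_i} - b_{i+1}\frac{\partial}{\partial b_{i+1}}\right) - \frac{1}{4}\sum_{i=1}^{n-2} a_i a_{i+1}\,\frac{\partial}{\partial a_i} \wedge \frac{\partial}{\partial a_{i+1}}$ [26]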


Not only are these two Poisson tensors degenerate,
but there is also no Nijenhuis operator that
transforms $\Lambda_0$ into $\Lambda_1$. This can be seen by considering
the 1-form $\sum_{i=1}^{n} db_i$. This 1-form belongs to the
kernel of $\Lambda_0$ but not to the kernel of $\Lambda_1$. So, the
bi-Hamiltonian manifold $(\mathbb{R}^{2n-1}, \Lambda_0, \Lambda_1)$ is not a
Poisson-Nijenhuis one.

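The Toda lattice is the bi-Hamiltonian system in
$\mathbb{R}^{2n-1}$

$X_1 = \Lambda_0^\sharp(dH_1) = \Lambda_1^\sharp(dH_0)$ [27]

defined by the Hamiltonians

$H_0 = 2\sum_{i=1}^{n} b_i, \qquad H_1 = 4\sum_{i=1}^{n-1} a_i^2 + 2\sum_{i=1}^{n} b_i^2$ [28]

that is,

$\dot a_i = a_i(b_{i+1} - b_i)$, if $1 \le i \le n-1$
$\dot b_1 = 2a_1^2$
$\dot b_i = 2(a_i^2 - a_{i-1}^2)$, if $2 \le i \le n-1$
$\dot b_n = -2a_{n-1}^2$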


Since we do not have a Nijenhuis operator in this
setting, we are going to consider a new system in
$\mathbb{R}^{2n}$ that reduces to the Toda lattice, derive a
hierarchy of Hamiltonians, symmetries, Poisson
tensors, conformal symmetries and the associated
relations and then transport everything to $\mathbb{R}^{2n-1}$ by
reduction.


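Consider the Flaschka transformation

$T: \mathbb{R}^{2n} \to \mathbb{R}^{2n-1}$

$(q_1, \dots, q_n, p_1, \dots, p_n) \mapsto (a_1, \dots, a_{n-1}, b_1, \dots, b_n)$

where

$a_i = \frac{1}{2}\,e^{\frac{1}{2}(q_i - q_{i+1})}, \quad b_j = -\frac{1}{2}\,p_j, \qquad i = 1, \dots, n-1,\ j = 1, \dots, n$ [29]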
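
The reduction by the Flaschka transformation can be checked numerically. The sketch below rests on two assumptions that should be read as such: the map is taken as $a_i = \frac{1}{2}e^{(q_i - q_{i+1})/2}$, $b_j = -\frac{1}{2}p_j$, and the system on $\mathbb{R}^{2n}$ is taken to be the standard Toda Hamiltonian $\frac{1}{2}\sum p_i^2 + \sum e^{q_i - q_{i+1}}$ with the canonical equations $\dot q_i = \partial H/\partial p_i$, $\dot p_i = -\partial H/\partial q_i$. The code pushes this vector field forward by the chain rule and compares it with the equations of motion in the $(a, b)$ variables; all function names and sample values are hypothetical.

```python
import math


def flaschka(q, p):
    """Assumed Flaschka map (q, p) -> (a, b)."""
    a = [0.5 * math.exp(0.5 * (q[i] - q[i + 1])) for i in range(len(q) - 1)]
    b = [-0.5 * pj for pj in p]
    return a, b


def toda_qp_field(q, p):
    """Canonical vector field of the ASSUMED Hamiltonian sum p^2/2 + sum exp(q_i - q_{i+1})."""
    n = len(q)
    dq = list(p)
    dp = []
    for i in range(n):
        left = math.exp(q[i - 1] - q[i]) if i > 0 else 0.0
        right = math.exp(q[i] - q[i + 1]) if n - 1 > i else 0.0
        dp.append(left - right)
    return dq, dp


def toda_ab_field(a, b):
    """Toda equations in the (a, b) variables, with a_0 = a_n = 0."""
    n = len(b)
    da = [a[i] * (b[i + 1] - b[i]) for i in range(n - 1)]
    db = []
    for i in range(n):
        right = a[i] ** 2 if n - 1 > i else 0.0
        left = a[i - 1] ** 2 if i > 0 else 0.0
        db.append(2.0 * (right - left))
    return da, db


q = [0.3, -0.4, 1.1, 0.2]
p = [0.5, -1.2, 0.7, 0.1]
a, b = flaschka(q, p)
dq, dp = toda_qp_field(q, p)

# Push the (q, p) field forward through the map by the chain rule.
da_push = [0.5 * a[i] * (dq[i] - dq[i + 1]) for i in range(len(a))]
db_push = [-0.5 * x for x in dp]

da, db = toda_ab_field(a, b)
max_err = max(abs(x - y) for x, y in zip(da_push + db_push, da + db))

# The quadratic Hamiltonian 4*sum(a^2) + 2*sum(b^2) pulls back to the assumed one.
H_qp = sum(0.5 * x * x for x in p) + sum(math.exp(q[i] - q[i + 1]) for i in range(len(q) - 1))
H_ab = 4 * sum(x * x for x in a) + 2 * sum(x * x for x in b)
print(max_err, abs(H_qp - H_ab))
```

Both printed numbers are at rounding level, which is consistent with the map sending the flow on $\mathbb{R}^{2n}$ onto the Toda flow on $\mathbb{R}^{2n-1}$.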


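This map is a Poisson morphism between
$(\mathbb{R}^{2n}, \tilde\Lambda_0, \tilde\Lambda_1)$ and $(\mathbb{R}^{2n-1}, \Lambda_0, \Lambda_1)$, where

$\tilde\Lambda_0 = \sum_{i=1}^{n} \frac{\partial}{\partial p_i} \wedge \frac{\partial}{\partial q_i}$ [30]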





n—1 
~ o O 
2 Opi+ı Ôp; 


ð 
+S (og ng Dp, + Dag’ =| [31] 


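The Poisson tensor $\tilde\Lambda_0$ is nondegenerate and we
may define the Nijenhuis operator $R = \tilde\Lambda_1^\sharp (\tilde\Lambda_0^\sharp)^{-1}$. So,
$(\mathbb{R}^{2n}, \tilde\Lambda_0, \tilde\Lambda_1)$ is a Poisson-Nijenhuis manifold and
the bivectors of the sequence $(\tilde\Lambda_k = R^k\tilde\Lambda_0)$, $k \in \mathbb{N}$,
are compatible Poisson tensors.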

The Toda lattice is the reduced bi-Hamiltonian
system, by means of the Flaschka transformation, of
the bi-Hamiltonian system

$\tilde X_1 = \tilde\Lambda_0^\sharp(d\tilde H_1) = \tilde\Lambda_1^\sharp(d\tilde H_0)$ [32]


where 
[33] 


We may define the sequence of commuting
vector fields $\tilde X_k = R^{k-1}\tilde X_1$, $k \in \mathbb{N}$, and the sequence
of Hamiltonians $d\tilde H_k = {}^tR^k(d\tilde H_0)$, $k \in \mathbb{N}$, first integrals
of all the vector fields $\tilde X_j$ and in involution
with respect to all the Poisson structures $\tilde\Lambda_j$.

Moreover, considering the conformal symmetry of
$\tilde\Lambda_0$, $\tilde\Lambda_1$, and $\tilde H_0$ defined by


Zo = s(n m2) + pipe [34] 


f=] 


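we have the following relations on $\mathbb{R}^{2n}$:

$\mathcal{L}_{Z_k} R = R^{k+1}$ [35]
$[Z_m, Z_k] = (k - m) Z_{m+k}$ [36]
$[Z_k, \tilde X_m] = m \tilde X_{k+m}$ [37]
$\mathcal{L}_{Z_k} \tilde\Lambda_m = (m - k - 1) \tilde\Lambda_{k+m}$ [38]
$Z_k \cdot \tilde H_m = (m + k + 1) \tilde H_{k+m}$ [39]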


Although we do not have a Nijenhuis operator on
$(\mathbb{R}^{2n-1}, \Lambda_0, \Lambda_1)$, the deformation relations [35]-[39],
obtained for the Poisson-Nijenhuis manifold
$(\mathbb{R}^{2n}, \tilde\Lambda_0, \tilde\Lambda_1)$, may be reduced to the bi-Hamiltonian
manifold $(\mathbb{R}^{2n-1}, \Lambda_0, \Lambda_1)$ by means of the Flaschka
transformation $T$.


Recursion Operators and Algebras 
of Integrals of Motion 


A master integral of a vector field $X$ is a differentiable
function $g$ such that

$\mathcal{L}_X\mathcal{L}_X g = 0$ and $\mathcal{L}_X g \neq 0$ [40]

So, a master integral $g$ generates an integral of
motion $\mathcal{L}_X g$ of the system $X$. It is worth noticing that
if $f$ and $g$ are master integrals, then not only are $\mathcal{L}_X f$ and
$\mathcal{L}_X g$ integrals, but $(\mathcal{L}_X f)g - f(\mathcal{L}_X g)$ is also an
integral of the system. This means that several master
integrals may lead to extra integrals of motion. This
procedure often leads to the construction of the
integrals which provide the superintegrability of the
system under consideration. This is the case of, for
instance, the generalized rational Calogero-Moser
system or the geodesic flow on the sphere.
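
A minimal worked example of this mechanism, added for illustration and not taken from the article: for the free particle in the plane, with coordinates $(q_1, q_2, p_1, p_2)$ and vector field $X$ as below, the position functions are master integrals and their combination reproduces the angular momentum.

```latex
\begin{aligned}
X &= p_1\,\partial_{q_1} + p_2\,\partial_{q_2}, \qquad
\mathcal{L}_X q_i = p_i \neq 0, \qquad
\mathcal{L}_X\mathcal{L}_X q_i = X(p_i) = 0, \\
(\mathcal{L}_X q_1)\,q_2 - q_1\,(\mathcal{L}_X q_2) &= p_1 q_2 - q_1 p_2, \qquad
X(p_1 q_2 - q_1 p_2) = p_1 p_2 - p_1 p_2 = 0 .
\end{aligned}
```

In general, $X\big((Xf)g - f(Xg)\big) = (XXf)g + (Xf)(Xg) - (Xf)(Xg) - f(XXg) = 0$ whenever $XXf = XXg = 0$.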

Recursion operators are often used to construct 
sequences of master symmetries of vector fields. The 
obvious connection between master symmetries and 
master integrals carries the recursion operators to 
this level. In many cases, the integrals of motion 
generated by the master integrals constructed on the 
basis of the existence of a recursion operator close in 
a quadratic algebra with respect to the Poisson 
structure we are considering (by quadratic algebra 
we mean that the brackets between the generators 
are polynomials of degree 2 in the generators). 

Let $X$ be a vector field on a manifold $M$, $R$ a
Nijenhuis operator which is also a recursion
operator of $X$, and $P$ a (1,1)-tensor such that

$\mathcal{L}_X P = a(R)$

and

$\mathcal{L}_{PX} R = b(R)$

where $a$ and $b$ are polynomials with constant
coefficients. The sequences $X_i = R^iX$, $Y_i = R^i(PX)$,
$i \in \mathbb{N}_0$, $X_{-1} = Y_{-1} = 0$ satisfy

$[X_i, X_j] = 0$ [41]
$[X_i, Y_j] = a(R)X_{i+j} - i\,b(R)X_{i+j-1}$ [42]
$[Y_i, Y_j] = (j - i)\,b(R)Y_{i+j-1}$ [43]


If $(M, \Lambda)$ is a nondegenerate Poisson manifold
with trivial first cohomology group, $R\Lambda$ is a bivector
and $X$ and $Y$ are Hamiltonian vector fields with
respect to $\Lambda$ and $R\Lambda$, that is, there exist functions
$H_0$, $H_1$, $G_0$, and $G_1$ satisfying

$X = \Lambda^\sharp(dH_1) = R\Lambda^\sharp(dH_0)$
$Y = \Lambda^\sharp(dG_1) = R\Lambda^\sharp(dG_0)$

then the sequences of exact differentials

${}^tR^i(dH_1) = dH_{i+1}$ and ${}^tR^i(dG_1) = dG_{i+1}$

may be constructed. In this case, the functions $G_i$ are
master integrals of all the vector fields $X_j$ and the
integrals $X_j(G_i)$ and $L^j_{ik} = X_j(G_k)G_i - X_j(G_i)G_k$,
$j, k \in \mathbb{N}_0$, close in a quadratic algebra with respect
to the Poisson bracket associated with $\Lambda$.

If $M$ is not a Poisson manifold but we can find a
master integral $G$ of all the vector fields $X_j$ of the
sequence, then the functions $G_i = Y_i(G)$ are also
master integrals of the same vector fields and the
functions $X_j(G_i)$ and $L^j_{ik} = X_j(G_k)G_i - X_j(G_i)G_k$
are integrals of $X_j$.

Now let us consider the completely integrable 
bi-Hamiltonian system case. In a neighborhood of 
an invariant torus, a completely integrable 
bi-Hamiltonian system may be written in the form 

Ron,- 


Ya) = yray [44] 


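with

$\Lambda_0 = \sum_{i=1}^{n} \frac{\partial}{\partial y_i} \wedge \frac{\partial}{\partial \varphi_i}, \qquad \Lambda_1 = \sum_{i=1}^{n} y_i\,\frac{\partial}{\partial y_i} \wedge \frac{\partial}{\partial \varphi_i}$

the compatible Poisson tensors that provide the
complete integrability of the bi-Hamiltonian system.
In this case, we may define the recursion operator

$R = \sum_{i=1}^{n} y_i\left(\frac{\partial}{\partial y_i} \otimes dy_i + \frac{\partial}{\partial \varphi_i} \otimes d\varphi_i\right)$

for which $\Lambda_1 = R\Lambda_0$, and the bi-Hamiltonian vector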





field 
x= Ai (dH) = = A‘ ld T Yi | 
The (1, 1)-tensor 
o 
P= X (dedó + 5-2 dyi) 
Shez OO; ot OY; - 


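satisfies $\mathcal{L}_X P = \mathrm{Id}$ and $\mathcal{L}_{PX} R = 0$. So, the vector fields

$Y_k = R^k(PX) = \sum_{i=1}^{n} y_i^k\,\varphi_i\,\frac{\partial}{\partial \varphi_i}$

and the function $G = \sum_{i=1}^{n} y_i\varphi_i$ help define the
functions $G_i = Y_i(G)$, $i \in \mathbb{N}_0$.
The integrals of $X_k$

$X_k(G_i)$ and $L^k_{ij} = X_k(G_i)G_j - G_iX_k(G_j)$ [45]

happen to close in a quadratic algebra with respect
to the bracket defined by $\Lambda_0$.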


Recursion Operators on Jacobi 
Manifolds 


In this section we extend the notion of
Poisson-Nijenhuis manifold to the Jacobi setting.

Let $M$ be a smooth manifold with a bivector field
$\Lambda$ and a vector field $E$. We equip the space $C^\infty(M)$
with the bracket

$\{f, g\} = \Lambda(df, dg) + fE(g) - gE(f)$

which is bilinear and skew-symmetric, and satisfies
the Jacobi identity if and only if

$[\Lambda, \Lambda] = -2E \wedge \Lambda$ and $[E, \Lambda] = 0$ [46]

When these conditions are satisfied, $(M, \Lambda, E)$ is
called a Jacobi manifold with Jacobi bracket $\{\,,\}$.
The pair $(C^\infty(M), \{\,,\})$ is a local Lie algebra in the
sense of Kirillov. If the vector field $E$ identically
vanishes on $M$, eqns [46] reduce to $[\Lambda, \Lambda] = 0$ and
$(M, \Lambda)$ is just a Poisson manifold. But there are other
examples of Jacobi manifolds that are not Poisson,
for example, contact manifolds.

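We denote by $(\Lambda, E)^\sharp: T^*M \times \mathbb{R} \to TM \times \mathbb{R}$ the
vector bundle map associated with $(\Lambda, E)$, that is, for
all sections $\alpha$ of $T^*M$ and $f \in C^\infty(M)$,

$(\Lambda, E)^\sharp(\alpha, f) = (\Lambda^\sharp(\alpha) + fE, -i_E\alpha)$

Let $R: \mathfrak{X}(M) \times C^\infty(M) \to \mathfrak{X}(M) \times C^\infty(M)$ be a
$C^\infty(M)$-linear map defined by

$R(X, f) = (NX + fY, i_X\gamma + gf)$ [47]

where $N$ is a tensor field of type (1,1) on $M$, $Y \in \mathfrak{X}(M)$,
$\gamma \in \Omega^1(M)$ and $g \in C^\infty(M)$. Let us denote by
$T(R)$ the Nijenhuis torsion of $R$ with respect to the
Lie bracket on $\mathfrak{X}(M) \times C^\infty(M)$ given by

$[(X, f), (Z, h)] = ([X, Z], X(h) - Z(f))$ [48]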


As in the case of Poisson manifolds, if R has a 
vanishing Nijenhuis torsion, we call R a Nijenhuis 
operator. 

Suppose now that $M$ is equipped with a Jacobi
structure $(\Lambda_0, E_0)$ and a Nijenhuis operator $R$. Then,
we may define a bivector field $\Lambda_1$ and a vector field
$E_1$ on $M$, by setting

$(\Lambda_1, E_1)^\sharp = R \circ (\Lambda_0, E_0)^\sharp$

If one looks for the conditions that imply that the
pair $(\Lambda_1, E_1)$ defines a new Jacobi structure on $M$
compatible with $(\Lambda_0, E_0)$, in the sense that
$(\Lambda_0 + \Lambda_1, E_0 + E_1)$ is again a Jacobi structure, one
finds that $\Lambda_1$ is skew-symmetric if and only if
$R \circ (\Lambda_0, E_0)^\sharp = (\Lambda_0, E_0)^\sharp \circ {}^tR$. When $\Lambda_1$ is
skew-symmetric, $(\Lambda_1, E_1)$ defines a Jacobi structure on
$M$ if and only if, for all $(\alpha, f), (\beta, h) \in \Omega^1(M) \times C^\infty(M)$,

$T(R)\big((\Lambda_0, E_0)^\sharp(\alpha, f), (\Lambda_0, E_0)^\sharp(\beta, h)\big) = R \circ (\Lambda_0, E_0)^\sharp\big(C((\Lambda_0, E_0), R)((\alpha, f), (\beta, h))\big)$

where $C((\Lambda_0, E_0), R)$ is the Magri concomitant of
$(\Lambda_0, E_0)$ and $R$. In the case where $(\Lambda_1, E_1)$ is a Jacobi
structure, it is compatible with $(\Lambda_0, E_0)$ if and only
if, for all $(\alpha, f), (\beta, h) \in \Omega^1(M) \times C^\infty(M)$,

$(\Lambda_0, E_0)^\sharp\big(C((\Lambda_0, E_0), R)((\alpha, f), (\beta, h))\big) = 0$

A Jacobi-Nijenhuis manifold $(M, (\Lambda_0, E_0), R)$ is a
Jacobi manifold $(M, \Lambda_0, E_0)$ with a Nijenhuis operator
$R$ such that: (1) $R \circ (\Lambda_0, E_0)^\sharp = (\Lambda_0, E_0)^\sharp \circ {}^tR$
and (2) the map $(\Lambda_0, E_0)^\sharp \circ C((\Lambda_0, E_0), R)$ identically
vanishes. $R$ is called the recursion operator of
$(M, (\Lambda_0, E_0), R)$.

A recursion operator on a Jacobi-Nijenhuis manifold
displays a hierarchy of Jacobi-Nijenhuis structures
on the manifold. In fact, if $((\Lambda_0, E_0), R)$ is a
Jacobi-Nijenhuis structure on $M$, there exists a hierarchy
$((\Lambda_k, E_k), k \in \mathbb{N})$ of Jacobi structures on $M$, which are
pairwise compatible. For all $k \in \mathbb{N}$, $(\Lambda_k, E_k)$ is the
Jacobi structure associated with the vector bundle map
$(\Lambda_k, E_k)^\sharp$ given by $(\Lambda_k, E_k)^\sharp = R^k \circ (\Lambda_0, E_0)^\sharp$. Moreover,
for all $k, l \in \mathbb{N}$, the pair $((\Lambda_k, E_k), R^l)$ defines a
Jacobi-Nijenhuis structure on $M$.


See also: Bi-Hamiltonian Methods in Soliton Theory;
Classical r-Matrices, Lie Bialgebras, and Poisson Lie
Groups; Contact Manifolds; Integrable Systems and
Algebraic Geometry; Integrable Systems: Overview;
Multi-Hamiltonian Systems; Recursion Operators in
Classical Mechanics.


Introduction


A British experimentalist, JS Russell, first observed
a soliton in 1834 while riding on horseback beside a
narrow barge channel. He challenged the theoreticians
of the day “to predict the discovery after it
happened, that is to give an a priori demonstration
a posteriori.” This work created a controversy
which, in fact, lasted almost 50 years, and which
involved such distinguished scientists as Stokes and
Airy. It was resolved by Korteweg and de Vries in
1895, who derived the KdV equation as an
approximation to water waves,

$\frac{\partial q}{\partial t} + 6q\,\frac{\partial q}{\partial x} + \frac{\partial^3 q}{\partial x^3} = 0$ [1]


This equation is a nonlinear partial differential
equation (PDE) of the evolution type, where $t$ and
$x$ are related to time and space respectively, and
$q(x, t)$ is related to the height of the wave above the
mean water level. Korteweg and de Vries were able
to show that equation [1] supports a particular
solution that exhibits the behavior described by
Russell. This solution, which was later called the
1-soliton solution, is given by

$q_1(x, t) = \frac{p^2/2}{\cosh^2\left[\frac{1}{2}(px - p^3t) + c\right]}$ [2]

where $p$, $c$ are constants. The location of this soliton
at time $t$, that is, its maximum position, is given by
$p^2t - 2c/p$, its velocity is given by $p^2$, and its
amplitude by $p^2/2$. Thus, faster solitons are higher
and narrower. It should be noted that $q_1$ is a
traveling-wave solution, that is, $q_1$ depends only on
the variable $X = x - p^2t$, thus in this case the PDE [1]
reduces (after integration) to the second-order
ordinary differential equation (ODE)

$-p^2 q(X) + 3q^2(X) + \frac{d^2q}{dX^2} = 0$

Under the assumption that $q$ and $dq/dX$ tend to
zero as $|X| \to \infty$, this ODE yields the 1-soliton
solution [2].
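
The reduction to the traveling-wave ODE can be checked numerically. The sketch below assumes the 1-soliton in the form $q_1 = (p^2/2)/\cosh^2[\frac{1}{2}(px - p^3t) + c]$ and evaluates the residual of $-p^2q + 3q^2 + d^2q/dX^2 = 0$ with a central second difference; the parameter values, step size and function names are illustrative choices, not part of the article.

```python
import math


def soliton(x, t, p=1.5, c=0.3):
    """Assumed 1-soliton profile of the KdV equation q_t + 6 q q_x + q_xxx = 0."""
    return 0.5 * p * p / math.cosh(0.5 * (p * x - p ** 3 * t) + c) ** 2


def ode_residual(X, p=1.5, c=0.3, h=1e-3):
    """Residual of -p^2 q + 3 q^2 + q'' at the point X (profile taken at t = 0)."""
    q = lambda s: soliton(s, 0.0, p, c)
    q_XX = (q(X + h) - 2.0 * q(X) + q(X - h)) / h ** 2
    return -p * p * q(X) + 3.0 * q(X) ** 2 + q_XX


p, c, t = 1.5, 0.3, 0.8
residuals = [abs(ode_residual(X)) for X in (-2.0, -0.4, 0.0, 1.3)]
peak_position = p * p * t - 2.0 * c / p          # claimed location of the maximum
peak_value = soliton(peak_position, t, p, c)     # should equal the amplitude p^2 / 2
print(max(residuals), peak_value)
```

The residuals are at the level of the finite-difference error, and the value at the claimed peak position coincides with the amplitude $p^2/2$.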

The problem of finding a solution describing the
interaction of two 1-soliton solutions is much more
difficult and was not addressed by Korteweg and
de Vries. This question was studied by M Kruskal
and N Zabusky in 1965. Studying numerically the
interaction of two solutions of the form [2] (i.e., two
solutions corresponding to two different $p_1$ and $p_2$),
Kruskal and Zabusky discovered the defining property
of solitons: after interaction, these waves
regained exactly the shapes they had before. This 
posed a new challenge to mathematicians, namely to 
explain analytically the interaction properties of 
such coherent waves. In order to resolve this 
challenge one needs to develop a larger class of 
solutions than the 1-soliton solution. We note that 
eqn [1] is nonlinear and no effective method to solve 
such nonlinear equations existed at that time. 

Gardner et al. (1967) not only derived an explicit
solution describing the interaction of an arbitrary
number of solitons, but also discovered what was to
evolve into a new method of mathematical physics.
The 2-soliton solution is given by

$q_2(x, t) = \frac{2\left(p_1^2 e^{\eta_1} + p_2^2 e^{\eta_2}\right) + 4 e^{\eta_1 + \eta_2}(p_1 - p_2)^2 + 2A_{12}\left(p_1^2 e^{\eta_2} + p_2^2 e^{\eta_1}\right) e^{\eta_1 + \eta_2}}{\left(1 + e^{\eta_1} + e^{\eta_2} + A_{12}\,e^{\eta_1 + \eta_2}\right)^2}$ [3]

where

$A_{12} = \frac{(p_1 - p_2)^2}{(p_1 + p_2)^2}, \qquad \eta_j = p_j x - p_j^3 t + \eta_j^0, \quad j = 1, 2$

and $\eta_j^0$ are constants. A snapshot of this solution
with $p_1 = 1$, $p_2 = 2$ is given in Figure 1. After some
time the taller soliton will overtake the shorter one
and the only effect of the interaction will be a “phase
shift,” that is, a change in the position the two
solitons would have reached without interaction.
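
The 2-soliton formula can be tested against the KdV equation directly. The sketch below assumes the KdV equation in the form $q_t + 6qq_x + q_{xxx} = 0$ and the 2-soliton written with exponentials $e^{\eta_j}$, $\eta_j = p_jx - p_j^3t + \eta_j^0$, and coupling $A_{12} = (p_1 - p_2)^2/(p_1 + p_2)^2$. Derivatives are approximated by central finite differences, so the residual is only expected to vanish up to the discretization error; the sample points and step size are arbitrary choices.

```python
import math


def two_soliton(x, t, p1=1.0, p2=2.0, e1=0.0, e2=0.0):
    """Assumed explicit 2-soliton of q_t + 6 q q_x + q_xxx = 0."""
    E1 = math.exp(p1 * x - p1 ** 3 * t + e1)
    E2 = math.exp(p2 * x - p2 ** 3 * t + e2)
    A12 = ((p1 - p2) / (p1 + p2)) ** 2
    num = (2.0 * (p1 ** 2 * E1 + p2 ** 2 * E2)
           + 4.0 * E1 * E2 * (p1 - p2) ** 2
           + 2.0 * A12 * (p1 ** 2 * E2 + p2 ** 2 * E1) * E1 * E2)
    den = (1.0 + E1 + E2 + A12 * E1 * E2) ** 2
    return num / den


def kdv_residual(x, t, h=1e-3):
    """Finite-difference residual of q_t + 6 q q_x + q_xxx for the 2-soliton."""
    q = two_soliton
    q_t = (q(x, t + h) - q(x, t - h)) / (2.0 * h)
    q_x = (q(x + h, t) - q(x - h, t)) / (2.0 * h)
    q_xxx = (q(x + 2 * h, t) - 2.0 * q(x + h, t)
             + 2.0 * q(x - h, t) - q(x - 2 * h, t)) / (2.0 * h ** 3)
    return q_t + 6.0 * q(x, t) * q_x + q_xxx


points = [(-1.0, 0.0), (0.5, 0.1), (2.0, 0.3), (0.0, -0.5)]
max_residual = max(abs(kdv_residual(x, t)) for x, t in points)
print(max_residual)
```

The individual terms of the equation are of order one at these points, so a residual several orders of magnitude smaller supports the reconstruction of the formula under the stated assumptions.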

Figure 1 A snapshot of the 2-soliton solution of the KdV equation.

Regarding the general method introduced in
Gardner et al. (1967), we note that if eqn [1] is
formulated on the infinite line, then the most interesting
problem is the solution of the initial-value
problem: given initial data $q(x, 0) = q_0(x)$ which
decay as $|x| \to \infty$, find $q(x, t)$. If $q_0$ is small and $qq_x$
can be neglected, then eqn [1] becomes linear and
$q(x, t)$ can be found using the Fourier transform,

$q(x, t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ikx + ik^3t}\,\hat q_0(k)\,dk$ [4a]

where

$\hat q_0(k) = \int_{-\infty}^{\infty} e^{-ikx}\,q_0(x)\,dx$ [4b]


The remarkable discovery of Gardner et al. (1967)
is that for eqn [1] there exists a “nonlinear analog” of
the Fourier transform capable of solving the initial-value
problem even if $q_0$ is not small. Although this
nonlinear Fourier transform cannot in general be
written in closed form, $q(x, t)$ can be expressed
through the solution of a linear integral equation, or
more precisely through the solution of a linear $2 \times 2$
matrix Riemann-Hilbert (RH) problem (see the
section “A nonlinear Fourier transform”). This linear
integral equation is uniquely specified in terms of
$q_0(x)$. For particular initial data, $q(x, t)$ can be written
explicitly. For example, if $q_0(x) = q_1(x)$, where $q_1(x)$ is
obtained by evaluating eqn [2] at $t = 0$, then
$q(x, t) = q_1(x - p^2t)$. Similarly, if $q_0(x) = q_2(x, 0)$,
where $q_2(x, 0)$ is obtained by evaluating eqn [3] at
$t = 0$, then $q(x, t) = q_2(x, t)$.

The most important question, both physically and
mathematically, is the description of the long-time
behavior of the solution of the initial-value problem
mentioned above. If the nonlinear term of eqn [1] can
be neglected, one finds a linear dispersive equation. In
this case different waves travel with different wave
speeds, these waves cancel each other out and the
solution decays to zero as $t \to \infty$. Indeed, using
the stationary-phase method to compute the large
$t$ behavior of the integral appearing in eqn [4a],
it can be shown that $q(x, t)$ decays like $O(1/\sqrt{t})$
as $t \to \infty$, $x/t = O(1)$. The situation with the KdV
equation is more interesting: dispersion is balanced by
nonlinearity and $q(x, t)$ has a “nontrivial” asymptotic
behavior as $t \to \infty$. Indeed, using a nonlinear analog
of the steepest descent method discovered by Deift and
Zhou (1993) to analyze the RH problem mentioned
earlier, it can be shown that $q(x, t)$ asymptotes to
$q_N(x, t)$, where $q_N(x, t)$ is the exact N-soliton solution.
This underlines the physical and mathematical
significance of solitons: they are the coherent structures
emerging from any initial data as $t \to \infty$. This
implies that if a nonlinear phenomenon is modeled
by the KdV equation on the infinite line, then one
can immediately predict the structure of the solution
as $t \to \infty$, $x/t = O(1)$: it will consist of $N$ ordered
single solitons, where the highest soliton occurs to
the right; the number $N$ and the parameters $p_j$ and $\eta_j^0$
depend on the particular initial data $q_0(x)$. It should


Integrable Systems and the Inverse Scattering Method 95 


be noted that this result can be obtained only using 
the machinery of the theory of integrability, and 
until now cannot be obtained using standard PDE 
techniques. 

So far we have concentrated on the KdV equation. 
However, there exist numerous other equations 
which exhibit similar behavior. Such equations are 
called “integrable” and the method of solving their 
initial-value problem is called the “inverse-scattering” 
or “inverse-spectral” method. 

The following section presents a brief historical 
review of some of the important developments of 
soliton theory. Next, typical solitons, lumps, and 
dromions are given. The inverse-spectral method is 
discussed in the penultimate section. Finally, the 
extension of this method to boundary-value prob- 
lems is briefly discussed. 


Important Analytical Developments in 
Soliton Theory 


Lax (1968) introduced the so-called Lax pair
formulation of the KdV. In an example, he showed
that eqn [1] can be written as the compatibility
condition of the following pair of linear eigenvalue
equations for the eigenfunction $\psi(x, t, k)$:

$\psi_{xx} + (q + k^2)\psi = 0$ [5a]

$\psi_t + (2q - 4k^2)\psi_x - (q_x + \nu)\psi = 0, \quad k \in \mathbb{C}$ [5b]

where $\nu$ is an arbitrary constant. The nonlinear
Fourier transform mentioned earlier can be obtained
by performing the spectral analysis of eqn [5a]. The
time evolution of the associated nonlinear Fourier
data, which are now called spectral data, is linear
and can be determined using eqn [5b]. Following
Lax’s formulation, Zakharov and Shabat (1972)
solved the nonlinear Schrödinger (NLS) equation

$iq_t + q_{xx} - 2\lambda|q|^2q = 0, \quad \lambda = \pm 1$ [6]


which has ubiquitous physical applications including 
nonlinear optics. Soon thereafter the sine-Gordon 
equation 


$q_{xx} - q_{tt} = \sin q$ [7]

and the modified KdV equation

$q_t + 6q^2q_x + q_{xxx} = 0$ [8]


were solved. Since then, numerous nonlinear equations 
have been solved. Thus, the mathematical technique 
introduced by Gardner et al. (1967) for the solution 
of a particular physical equation gave rise to a new 
method in mathematical physics, the so-called inverse- 
scattering (spectral) method. Among the most 


important equations solved by this method are a 
particular two-dimensional reduction of Einstein’s 
equation and the self-dual Yang-Mills equations. 
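
The statement that the KdV equation is the compatibility condition of its Lax pair can be verified by a direct computation, sketched here for illustration. The pair is assumed in the form $\psi_{xx} = -(q + k^2)\psi$ and $\psi_t = A\psi + B\psi_x$ with $A = q_x + \nu$, $B = 4k^2 - 2q$; the abbreviations $A$ and $B$ are introduced only for this computation.

```latex
\begin{aligned}
(\psi_t)_{xx} &= \big[A_{xx} - 2B_x(q + k^2) - Bq_x - A(q + k^2)\big]\psi
               + \big[2A_x + B_{xx} - B(q + k^2)\big]\psi_x , \\
(\psi_{xx})_t &= -q_t\,\psi - (q + k^2)\big(A\psi + B\psi_x\big) , \\
\psi_x\text{-terms:}\quad & 2A_x + B_{xx} = 2q_{xx} - 2q_{xx} = 0 , \\
\psi\text{-terms:}\quad & q_t + A_{xx} - 2B_x(q + k^2) - Bq_x
   = q_t + q_{xxx} + 4q_x(q + k^2) - (4k^2 - 2q)q_x \\
 &\qquad = q_t + 6qq_x + q_{xxx} = 0 .
\end{aligned}
```

The spectral parameter $k$ and the constant $\nu$ drop out, leaving exactly the KdV equation.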

The next important development in the analysis of 
integrable equations was the study of the KdV with 
space-periodic initial data. This occurred in the 
mid-1970s in the USA and in the USSR. This method 
involves algebraic-geometric techniques; in particular 
there exists a periodic analog of the N-soliton 
solution which can be expressed in terms of a certain 
Riemann-theta function of genus N. 

In the mid-1970s, it was also realized that there
exist integrable ODEs. For example, a stationary
reduction of some of the equations introduced in
connection with the space-periodic problem mentioned
above led to the integration of some classical
tops. Furthermore, the similarity reduction of some
of the integrable PDEs led to the classical Painlevé
equations. For example, letting $q = t^{-1/3}u(\xi)$,
$\xi = xt^{-1/3}$ in the modified KdV equation [8], and
integrating we find

$\frac{d^2u}{d\xi^2} + 2u^3 - \frac{1}{3}\xi u = \alpha$ [9]

where $\alpha$ is a constant. This is Painlevé II, that is, the
second equation in the list of six classical ODEs
introduced by Painlevé and his school around 1900.
These equations are nonlinear analogs of the linear
special functions such as Airy, Bessel, etc. The connection
between integrable PDEs and ODEs of the Painlevé
type was established by Ablowitz and Segur (1977).
Their work marked a new era in the theory of these 
equations. Indeed, soon thereafter Flaschka and Newell 
(1980) introduced an extension of the inverse-spectral 
method, the so-called isomonodromy method, capable 
of integrating these equations. The most remarkable 
achievement of this new development is the construction 
of nonlinear analogs of the classical connection formulas 
that exist for the linear special functions. These 
formulas, although rather complicated, are as explicit 
as the corresponding linear ones (Fokas et al. 2005). 
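
The similarity reduction of the modified KdV equation to Painlevé II can be carried out explicitly; the following short computation is a sketch added for illustration, with $\alpha$ denoting the integration constant (its sign is a matter of convention).

```latex
\begin{aligned}
q &= t^{-1/3}u(\xi), \quad \xi = x\,t^{-1/3}: \qquad
q_t = -\tfrac{1}{3}\,t^{-4/3}\big(u + \xi u'\big), \quad
q_x = t^{-2/3}u', \quad q_{xxx} = t^{-4/3}u''' , \\
q_t + 6q^2q_x + q_{xxx}
  &= t^{-4/3}\Big[u''' + 6u^2u' - \tfrac{1}{3}(\xi u)'\Big] = 0
  \;\Longrightarrow\;
  u'' + 2u^3 - \tfrac{1}{3}\,\xi u = \alpha .
\end{aligned}
```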

It was mentioned earlier that the inverse-spectral 
method gives rise to a matrix RH problem. An RH 
problem involves the determination of a function 
analytic in given sectors of the complex plane, from 
the knowledge of the jumps of this function across the 
boundaries of these sectors. The algebraic-geometric 
method for solving the space-periodic initial-value 
problem can be interpreted as formulating an RH 
problem which can be analyzed using functions defined 
on a Riemann surface. Also, it was noted by Fokas and 
Ablowitz (1983a) and later rigorously established by 
Fokas and Zhou (1992) that the isomonodromy 
method also gives rise to a novel RH problem. This
implies the following interesting unification: self-similar,
decaying, and periodic initial-value problems
for integrable evolution equations in one space variable 
lead to the study of the same mathematical object, 
namely to the RH problem. 

Every integrable nonlinear evolution equation in 
one spatial dimension has several integrable versions in 
two spatial dimensions. Two such integrable physical 
generalizations of the Korteweg-de Vries equation are
the so-called Kadomtsev—Petviashvili I (KPI) and II 
(KPII) equations. In the context of water waves, they 
arise in the weakly nonlinear, weakly dispersive, weakly 
two-dimensional limit, and in the case of KPI when 
the surface tension is dominant. The NLS equation also 
has two physical integrable versions known as the 
Davey-Stewartson I (DSI), and II (DSII) equations. They 
can be derived from the classical water-wave problem in 
the shallow-water limit and govern the time evolution of 
the free surface envelope in the weakly nonlinear, 
weakly two-dimensional, nearly monochromatic limit. 
The KP and DS equations have several other physical 
applications. 

A method for solving the Cauchy problem for
decaying initial data for integrable evolution equations
in two spatial dimensions emerged in the early 1980s.
This method is sometimes referred to as the $\bar\partial$ (d-bar)
method. We recall that the inverse-spectral method
for solving nonlinear evolution equations on the line
is based on a matrix RH problem. This problem
expresses the fact that there exist solutions of the
associated x-part of the Lax pair which are sectionally
analytic. Analyticity survives in some multidimensional
problems: it was shown formally by Fokas and
Ablowitz (1983b) that KPI gives rise to a nonlocal RH
problem. However, for other multidimensional problems,
such as the KPII, the underlying eigenfunctions
are nowhere analytic and the RH problem must be
replaced by the $\bar\partial$ problem. Actually, a $\bar\partial$ problem had
already appeared in the work of Beals and Coifman
(1982) where the RH problem appearing in the analysis
of one-dimensional systems was considered as a special
case of a $\bar\partial$ problem. Soon thereafter, it was shown in
Ablowitz et al. (1983) that KPII required the essential
use of the $\bar\partial$ problem. The situation for the DS equations
is analogous to that of the KP equations.

Multidimensional integrable PDEs can support
localized solutions. Actually there exist two types
of localized coherent structures associated with 
integrable evolution equations in two spatial vari- 
ables: the “lumps” and the “dromions.” The spectral 
meaning, and therefore the genericity of these 
solutions was established by Fokas and Ablowitz 
(1983b) and Fokas and Santini (1990). 

The analysis of integrable singular integro-differential 
equations and of integrable discrete equations, although 


conceptually similar to the analysis reviewed above, has 
certain novel features. 

The fact that integrable nonlinear equations 
appear in a wide range of physical applications is 
not an accident but a consequence of the fact that 
these equations express a certain physical coherence 
which is natural, at least asymptotically, to a variety 
of nonlinear phenomena. Indeed, Calogero (1991) 
has emphasized that large classes of nonlinear 
evolution PDEs, characterized by a dispersive linear 
part and a largely arbitrary nonlinear part, after 
rescaling yield asymptotically equations (for the 
amplitude modulation) having a universal character. 
These “universal” equations are, therefore, likely to 
appear in many physical applications. Many integr- 
able equations are precisely these “universal” models. 


Solitons, Lumps, and Dromions 


Solitons, lumps, and dromions are important not
because they are exact solutions, but because they
characterize the long-time behavior of integrable
evolution equations in one and two space dimensions.
Solving the initial-value
problem of a given integrable PDE and then
extracting the long-time behavior of the solution is
quite complicated. It involves spectral analysis, the
formulation of either an RH problem or a $\bar\partial$
problem, and rigorous asymptotic techniques. On
the other hand, having established the importance of
solitons, lumps, and dromions, it is natural to
develop methods for obtaining these particular
solutions directly, avoiding the difficult approaches
of spectral theory. There exist several such direct
methods, including the so-called Bäcklund transformations,
the dressing method of Zakharov-Shabat,
the direct linearizing method of Fokas-Ablowitz,
and the bilinear approach of Hirota.


Solitons 


Using the bilinear approach, multisoliton solutions 
for a large class of integrable nonlinear PDEs in 
one space dimension are given in Hietarinta 
(2002). Here we only note that the 1-soliton 
solution of the NLS [6], of the sine-Gordon [7], 
and of the modified KdV equation [8] are given, 
respectively, by 


$$q(x,t) = \frac{p_R\, e^{i[p_I x + (p_R^2 - p_I^2)t + \eta]}}{\cosh[p_R(x - 2p_I t) + \eta]}$$   [10]

$$q(x,t) = 4\arctan\left[e^{px + qt + \eta}\right], \qquad p^2 = 1 + q^2$$   [11]

$$q(x,t) = \pm\frac{p}{\cosh[px - p^3 t + \eta]}$$   [12]

where $p_R$, $p_I$, $\eta$, $p$, $q$ are real constants.
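As a quick numerical sanity check of the sine-Gordon formula above, the following sketch estimates the residual of the equation by central differences. It assumes that eqn [7] is written as $q_{xx} - q_{tt} = \sin q$ (the equation itself is stated earlier in the article, not in this section); the function names, the sample parameters, and the step size are illustrative choices made here.

```python
import math

def kink(x, t, q, eta=0.0):
    """1-soliton (kink) 4*arctan(exp(p*x + q*t + eta)) with p**2 = 1 + q**2."""
    p = math.sqrt(1.0 + q * q)
    return 4.0 * math.atan(math.exp(p * x + q * t + eta))

def sine_gordon_residual(x, t, q, h=1e-3):
    """Central-difference estimate of u_xx - u_tt - sin(u) for the kink above.

    Assumption: the sine-Gordon equation is taken in the form u_xx - u_tt = sin(u).
    """
    def u(xx, tt):
        return kink(xx, tt, q)

    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / h**2
    u_tt = (u(x, t + h) - 2.0 * u(x, t) + u(x, t - h)) / h**2
    return u_xx - u_tt - math.sin(u(x, t))

if __name__ == "__main__":
    # The residual should be small (limited by the finite-difference error).
    for x, t in [(-1.0, 0.0), (0.3, 0.5), (1.2, -0.7)]:
        print(x, t, sine_gordon_residual(x, t, q=0.8))
```

The constraint $p^2 = 1 + q^2$ is exactly what makes the residual vanish: with $\theta = px + qt + \eta$ and $f(\theta) = 4\arctan e^{\theta}$ one has $f'' = \sin f$, so $q_{xx} - q_{tt} = (p^2 - q^2)\sin f$.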
Lumps 
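The KPI equation is

$$\partial_x\left[q_t + 6qq_x + q_{xxx}\right] = 3q_{yy}$$   [13]

The 1-lump solution of this equation is given by

$$q(x,y,t) = 2\,\partial_x^2 \ln\left[|L(x,y,t)|^2 + \frac{1}{4\lambda_I^2}\right],$$
$$L = x - 2\lambda y + 12\lambda^2 t + a, \qquad \lambda = \lambda_R + i\lambda_I, \quad \lambda_I > 0$$   [14]

where $\lambda$ and $a$ are complex constants.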
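The focusing DSII equation is

$$iq_t + q_{zz} + q_{\bar z\bar z} - 2q\left(\partial_{\bar z}^{-1}|q|^2_z + \partial_z^{-1}|q|^2_{\bar z}\right) = 0$$   [15]

where $z = x + iy$, and the operator $\partial_{\bar z}^{-1}$ is defined by

$$\left(\partial_{\bar z}^{-1} f\right)(z,\bar z) = \frac{1}{2\pi i}\int_{\mathbb{R}^2} \frac{f(\zeta,\bar\zeta)}{\zeta - z}\, d\zeta\wedge d\bar\zeta$$

The 1-lump solution of this equation is given by

$$q(z,\bar z,t) = \frac{\beta\, e^{i(p^2 + \bar p^2)t + pz - \bar p\bar z}}{|z + \alpha + 2ipt|^2 + |\beta|^2}$$   [16]

where $\alpha$, $\beta$, $p$ are complex constants. A typical
1-lump solution is depicted in Figure 2.

Figure 2 A typical 1-lump solution.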





Dromions 


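The DSI equation is

$$iq_t + (\partial_x^2 + \partial_y^2)q + qu = 0, \qquad u_{xy} = 2(\partial_x^2 + \partial_y^2)|q|^2$$   [17]

The 1-dromion solution of this equation is given by

$$q(x,y,t) = \frac{\rho\, e^{X - \bar Y}}{\alpha e^{X + \bar X} + \beta e^{-Y - \bar Y} + \gamma e^{X + \bar X - Y - \bar Y} + \delta},$$
$$X = px + ip^2 t, \qquad Y = qy + iq^2 t, \qquad |\rho|^2 = 4p_R q_R(\alpha\beta - \gamma\delta)$$   [18]

where $p$, $q$ are complex constants and $\alpha$, $\beta$, $\gamma$, $\delta$ are
positive constants.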


A Nonlinear Fourier Transform 


The solution of the initial-value problem of an 
integrable nonlinear evolution equation on the 
infinite line is based on the spectral analysis of the 
x-part of the Lax pair. Thus, for the KdV equation
one must analyze eqn [5a]. This equation is the
famous time-independent Schrödinger equation. We
now give a physical interpretation of the relevant 
spectral analysis. Let KdV describe the propagation 
of a water wave and suppose that this wave is frozen 
at a given instant of time. By bombarding this water 
wave with quantum particles, one can reconstruct its 
shape from knowledge of how these particles 
scatter. In other words, the scattering data provide 
an alternative description of the wave at fixed time. 


The mathematical expression of this description
takes the form of a linear integral equation found
by Faddeev (the so-called Gel’fand-Levitan-Marchenko
equation) or equivalently the form of a $2 \times 2$
matrix RH problem uniquely specified by the
scattering data. This alternative description of the
shape of the wave will be useful if the evolution of
the scattering data is simple. This is indeed the case:
using eqn [5b], it can be shown that the
scattering data evolve linearly. Thus, this highly
nontrivial change of variables from the physical to
scattering space provides a linearization of the KdV
equation.

In what follows we will describe some of the
relevant mathematical formulas. We first
“assume” that there exists a real solution $q(x,t)$
of the initial-value problem which has sufficient
smoothness and which decays for all $t$ as $|x| \to \infty$.
We then discuss how this assumption can be
eliminated.

As mentioned earlier, most of the analysis
of the inverse-scattering transform is carried out
on the x-part of the Lax pair, that is, on eqn [5a].
Hence, we first concentrate on eqn [5a] and for
convenience of notation we suppress the time
dependence.


The Direct Problem 


As $|x| \to \infty$, $q \to 0$; thus there exist solutions of eqn
[5a] which tend to $\exp[\pm ikx]$ as $|x| \to \infty$. Let
$\psi(k,x)$ and $\bar\psi(k,x)$ denote solutions of eqn [5a]
with the following asymptotic property:

$$\psi \to e^{ikx}, \qquad \bar\psi \to e^{-ikx} \qquad \text{as } x \to \infty, \quad k \in \mathbb{R}$$   [19]

Under the transformation $k \to -k$, eqn [5a] remains
invariant and the boundary condition for $\psi$ is mapped
to the boundary condition for $\bar\psi$. Hence

$$\bar\psi(k,x) = \psi(-k,x)$$   [20]

We denote by $\phi(k,x)$ the solution of eqn [5a] which
tends to $\exp[-ikx]$ as $x \to -\infty$,

$$\phi \to e^{-ikx} \qquad \text{as } x \to -\infty, \quad k \in \mathbb{R}$$   [21]

It is more convenient to work with eigenfunctions
(i.e., solutions of [5a]) normalized to unity at infinity;
thus we introduce $M(k,x)$ and $N(k,x)$ as follows:

$$M = \phi\, e^{ikx}, \qquad N = \psi\, e^{-ikx}$$   [22]

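The functions $M$ and $N$ can be expressed in terms of
$q$ through the solution of linear Volterra integral
equations. Indeed, $M$ satisfies

$$M_{xx} - 2ikM_x = -qM, \quad k \in \mathbb{R}; \qquad M \to 1, \quad x \to -\infty$$   [23]

The homogeneous version of [23] has solutions 1
and $e^{2ikx}$. Thus,

$$M = c_1 + c_2 e^{2ikx} + M_p$$   [24]

where $c_1$, $c_2$ are constants and $M_p$ is given by

$$M_p = u_1(x) + u_2(x)e^{2ikx}$$   [25]

The functions $u_1$, $u_2$ satisfy

$$u_1' + e^{2ikx}u_2' = 0, \qquad 2ik\,e^{2ikx}u_2' = -qM$$

Thus,

$$u_1(x) = \frac{1}{2ik}\int_{-\infty}^{x} d\xi\, q(\xi)M(k,\xi), \qquad u_2(x) = -\frac{1}{2ik}\int_{-\infty}^{x} d\xi\, e^{-2ik\xi} q(\xi)M(k,\xi)$$   [26]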
Substituting [25] and [26] into [24] and using the
boundary condition [23], we find

$$M(k,x) = 1 + \frac{1}{2ik}\int_{-\infty}^{x} d\xi\,\left(1 - e^{2ik(x-\xi)}\right) q(\xi)M(k,\xi)$$   [27]

Similarly, one may establish that $N$ satisfies

$$N(k,x) = 1 + \frac{1}{2ik}\int_{x}^{\infty} d\xi\,\left(1 - e^{-2ik(x-\xi)}\right) q(\xi)N(k,\xi)$$   [28]

The kernel of eqn [27], as a function of $k$, is
bounded and analytic for $\mathrm{Im}\,k > 0$. Thus, if $q \in L_1$,
$M(k,x)$ as a function of $k$ is holomorphic for
$\mathrm{Im}\,k > 0$. Similarly, $N(k,x)$ as a function of $k$ is
holomorphic for $\mathrm{Im}\,k > 0$.

Thus, we have found particular solutions of eqn
[5a] which are holomorphic for $\mathrm{Im}\,k > 0$. Furthermore,
these solutions are simply related for $k$ real.
Indeed, the linear independence of solutions of the
second-order ODE [5a] implies

$$\phi(k,x) = a(k)\bar\psi(k,x) + b(k)\psi(k,x), \qquad k \in \mathbb{R}$$

Using [20] and replacing $\phi$ and $\psi$ in terms of $M$ and
$N$, we find

$$\frac{M(k,x)}{a(k)} = N(-k,x) + \rho(k)e^{2ikx}N(k,x), \qquad \rho(k) = \frac{b(k)}{a(k)}, \qquad k \in \mathbb{R}$$   [29]

The functions $a(k)$ and $b(k)$ are given by

$$a(k) = 1 + \frac{1}{2ik}\int_{-\infty}^{\infty} d\xi\, q(\xi)M(k,\xi), \qquad k \in \mathbb{R}$$
$$b(k) = -\frac{1}{2ik}\int_{-\infty}^{\infty} d\xi\, q(\xi)M(k,\xi)e^{-2ik\xi}, \qquad k \in \mathbb{R}$$   [30]

Indeed as $x \to \infty$, $N \to 1$; thus, eqn [29] implies

$$M \to a(k) + b(k)e^{2ikx} \qquad \text{as } x \to \infty$$   [31]

On the other hand, eqn [27] implies that

$$M \to 1 + \frac{1}{2ik}\int_{-\infty}^{\infty} d\xi\,\left(1 - e^{2ik(x-\xi)}\right) q(\xi)M(k,\xi), \qquad x \to \infty$$   [32]

Comparing eqns [31] and [32], we find eqns [30].
The expression for $a(k)$ implies that this function
is also holomorphic for $\mathrm{Im}\,k > 0$.
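The direct problem can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original analysis: it integrates the ODE in [23] with a fixed-step Runge-Kutta scheme from $x = -L$, where $M \approx 1$, to $x = L$, and then reads off $a(k)$ and $b(k)$ from the asymptotic form [31]. The test potential $q(x) = 2\,\mathrm{sech}^2 x$, the truncation length `L`, the step count, and all function names are choices made here. For this potential, and taking [5a] to be $\psi_{xx} + (q + k^2)\psi = 0$ as [23] implies, the eigenfunction is known in closed form and gives $a(k) = (k - i)/(k + i)$, $b(k) = 0$.

```python
import cmath
import math

def scattering_data(q, k, L=12.0, steps=4000):
    """Return (a, b) for real k != 0 by integrating M'' - 2ik M' = -q M.

    Boundary condition: M -> 1 as x -> -infinity, imposed at x = -L.
    At x = +L we use M ~ a + b*exp(2ikx), hence M' ~ 2ik*b*exp(2ikx).
    """
    h = 2.0 * L / steps

    def rhs(x, M, P):
        # First-order system: M' = P, P' = 2ik P - q(x) M
        return P, 2j * k * P - q(x) * M

    x, M, P = -L, 1.0 + 0j, 0j
    for _ in range(steps):
        k1 = rhs(x, M, P)
        k2 = rhs(x + h / 2, M + h / 2 * k1[0], P + h / 2 * k1[1])
        k3 = rhs(x + h / 2, M + h / 2 * k2[0], P + h / 2 * k2[1])
        k4 = rhs(x + h, M + h * k3[0], P + h * k3[1])
        M += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        P += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        x += h

    b = P * cmath.exp(-2j * k * x) / (2j * k)
    a = M - P / (2j * k)
    return a, b

def sech2_potential(x):
    """Reflectionless test potential q(x) = 2 sech^2(x) (an assumed example)."""
    return 2.0 / math.cosh(x) ** 2

if __name__ == "__main__":
    k = 1.0
    a, b = scattering_data(sech2_potential, k)
    print("a =", a, "closed form:", (k - 1j) / (k + 1j))
    print("|b| =", abs(b))
```

The zero of $a(k)$ at $k = i$ in this example is an instance of the simple zeros on the positive imaginary axis that enter the inverse problem.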
In summary, in the “direct problem,” we have
found particular solutions of eqn [5a] which are
sectionally holomorphic:

$$M(k,x) \quad \text{and} \quad N(-k,x)$$

are holomorphic for $\mathrm{Im}\,k > 0$ and $\mathrm{Im}\,k < 0$, respectively.
These solutions, which are characterized in
terms of $q$ by eqns [27] and [28], are simply related
by eqn [29].


The Inverse Problem 


Equation [28] expresses $N$ in terms of $q$. Is it possible
to find an alternative expression for $N$ in terms of
some appropriate “spectral data”? The answer is
positive and is a direct consequence of the fact that
eqn [29] defines the “jump condition” of an RH
problem. Indeed, it can be shown that $a(k)$ may have
simple zeros $k_1, \ldots, k_n$ on the positive imaginary axis
of the complex $k$-plane, $k_j = ip_j$. Hence, in general, $M/a$ can
be expressed in the form

$$\frac{M(k,x)}{a(k)} = \tilde M(k,x) + \sum_{j=1}^{n}\frac{A_j(x)}{k - ip_j}, \qquad p_j > 0$$

where $\tilde M(k,x)$ as a function of $k$ is holomorphic for
$\mathrm{Im}\,k > 0$. It can also be shown that $A_j(x) = C_j
\exp[-2p_j x]N(ip_j, x)$. Hence eqn [29] becomes

$$\tilde M(k,x) - N(-k,x) = -\sum_{j=1}^{n}\frac{C_j e^{-2p_j x}N(ip_j,x)}{k - ip_j} + \rho(k)e^{2ikx}N(k,x), \qquad k \in \mathbb{R}$$

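Taking the ($-$) projection of this equation, and
using the fact that both $\tilde M$ and $N$ tend to 1 as $k \to \infty$,
we find

$$N(k,x) - \frac{1}{2\pi i}\int_{-\infty}^{\infty} dl\, \frac{\rho(l)e^{2ilx}N(l,x)}{l + k + i0} = 1 - \sum_{j=1}^{n}\frac{C_j e^{-2p_j x}N(ip_j,x)}{k + ip_j}$$   [33]

In summary, this equation expresses $N(k,x)$ in
terms of the scattering data $(\rho(k), \{C_j, p_j\}_1^n)$.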

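Since both eqns [28] and [33] are associated with
the same $q$, these equations can be used to obtain
the following expression for $q$:

$$q(x) = -\frac{\partial}{\partial x}\left[\frac{1}{\pi}\int_{-\infty}^{\infty} dk\, \rho(k)e^{2ikx}N(k,x) - 2i\sum_{j=1}^{n} C_j e^{-2p_j x}N(ip_j,x)\right]$$   [34]

Indeed, eqn [28] implies

$$N(k,x) \sim 1 + \frac{1}{2ik}\int_{x}^{\infty} d\xi\, q(\xi), \qquad k \to \infty$$

Comparing this expression with the large-$k$ behavior
of eqn [33], we find [34].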


Time Dependence of the Scattering Data 


We now use eqn [5b] to compute the time
dependence of the scattering data. Evaluating
eqn [5b] as $x \to -\infty$, we find $\nu = 4ik^3$. Then,
evaluating it as $x \to \infty$ and using

$$\phi \to a\,e^{-ikx} + b\,e^{ikx}, \qquad x \to +\infty$$

we find

$$a_t = 0, \qquad b_t = 8ik^3 b$$

Hence,

$$a(t,k) = a(0,k), \qquad \rho(t,k) = \rho(0,k)e^{8ik^3 t}$$   [35]

Thus,

$$p_j(t) = p_j(0), \qquad C_j(t) = C_j(0)e^{8p_j^3 t}$$   [36]
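The linear evolution of the scattering data is simple enough to state in a few lines of code. The following sketch is only an illustration: the function names are hypothetical, and the exponents $8ik^3t$ and $8p_j^3t$ are the ones that follow from $b_t = 8ik^3 b$ evaluated at real $k$ and at $k = ip_j$. It also shows why a discrete eigenvalue $p_j$ corresponds to a structure moving with speed $4p_j^2$: the discrete data enter only through the product $C_j \exp[-2p_j x]$, as in the expression for $A_j(x)$ above.

```python
import cmath
import math

def evolve_reflection(rho0, k, t):
    """rho(t, k) = rho(0, k) * exp(8 i k^3 t) for real k (cf. [35])."""
    return rho0 * cmath.exp(8j * k**3 * t)

def evolve_norming(C0, p, t):
    """C_j(t) = C_j(0) * exp(8 p_j^3 t) for p_j > 0 (cf. [36])."""
    return C0 * math.exp(8.0 * p**3 * t)

def soliton_speed(p):
    """Speed at which the combination C_j(t) * exp(-2 p_j x) is transported."""
    return 4.0 * p**2

if __name__ == "__main__":
    p, C0, t, x0 = 0.5, 2.0, 1.5, 0.3
    before = C0 * math.exp(-2 * p * x0)
    after = evolve_norming(C0, p, t) * math.exp(-2 * p * (x0 + soliton_speed(p) * t))
    print(before, after)  # the two values agree up to rounding
```

For real $k$ the modulus of $\rho$ is conserved; only its phase rotates.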


The above formal results motivate the following
definitions (for simplicity, we assume that $a(k)$
has no zeros). Given a decaying real function
$q_0(x)$, $x \in \mathbb{R}$, define $M_0(k,x)$ as the solution of the
linear Volterra integral equation

$$M_0(k,x) = 1 + \frac{1}{2ik}\int_{-\infty}^{x} d\xi\,\left(1 - e^{2ik(x-\xi)}\right) q_0(\xi)M_0(k,\xi), \qquad \mathrm{Im}\,k \ge 0$$

Given $M_0(k,x)$, define $a_0(k)$ and $b_0(k)$ by

$$M_0(k,x) \to a_0(k) + b_0(k)e^{2ikx}, \qquad x \to \infty, \quad k \in \mathbb{R}$$

Given $a_0$ and $b_0$, define $N(k,x,t)$ as the solution of
the linear integral equation

$$N(k,x,t) = 1 + \frac{1}{2\pi i}\int_{-\infty}^{\infty} dl\, \frac{b_0(l)}{a_0(l)}\,\frac{e^{8il^3 t + 2ilx}N(l,x,t)}{l + k + i0}, \qquad k \in \mathbb{R}$$

A theorem of Gohberg and Krein implies that this
equation has a unique global solution. Given
$a_0$, $b_0$, $N$, define $q(x,t)$ by

$$q(x,t) = -\frac{1}{\pi}\frac{\partial}{\partial x}\int_{-\infty}^{\infty} dk\, \frac{b_0(k)}{a_0(k)}\,e^{8ik^3 t + 2ikx}N(k,x,t)$$

Then it can be shown that $q(x,t)$ satisfies the KdV
equation and $q(x,0) = q_0(x)$.


A Unification 


After the emergence of a method for solving the 
initial-value problem for nonlinear integrable evolu- 
tion equations in one and two space variables, the 
most outstanding open problem in the analysis of 
these equations became the solution of initial 
boundary-value problems. A general approach for 
solving such problems for evolution equations in one 
space dimension was provided by Fokas (1997). 
This approach has already been used for the study of 
nonlinear integrable evolution PDEs on the half-line 
(Fokas 2002, 2005), on the interval, and in a time- 
dependent domain. An important advantage of this 
new method is that it yields the formulation of a 
matrix RH problem (or a 0 problem in the case of a 
convex time-dependent domain), which although has 
more complicated jump matrices than the analogous 
problem on the infinite line, it still has an explicit 
exponential (x,t) dependence. This fact allows one to 
describe effectively the asymptotic properties of the 
solution, using the powerful Deift-Zhou method 
(Deift and Zhou 1993). For example, the long-time 
asymptotics of boundary-value problems on the half 
line are discussed in Fokas and Its (1996). 

It is remarkable that the above results have 
motivated the discovery of a new method for solving 


boundary-value problems, not only for linear evolu- 
tion PDEs, but also for linear elliptic PDEs in two 
dimensions. This includes the Laplace, the biharmonic 
and the Helmholtz equations in a convex polygon 
(Dassios and Fokas 2005). In a most recent develop- 
ment, this method has also been applied to certain 
classes of linear PDEs with variable coefficients. This 
highly unexpected development unifies and extends 
several classical branches of mathematics. In particu- 
lar, it unifies the classical transform methods for 
simple linear PDEs as well as the method of images, 
the treatment of linear PDEs via certain ingenious 
techniques such as the Wiener-Hopf technique, the 
formulation of Ehrenpreis type integral representa- 
tions, and the solution of integrable nonlinear PDEs 
via the inverse-scattering transform. Furthermore, it 
extends these results to arbitrary domains and to 
certain classes of PDEs with variable coefficients. 

Regarding linear equations we note the following: 

Almost as soon as linear two-dimensional PDEs 
made their appearance, d’Alembert and Euler discov- 
ered a general approach for constructing large classes 
of their solutions. This approach involved separating 
variables and superimposing solutions of the resulting 
ODEs. The method of separation of variables natu- 
rally led to the solution of PDEs by a transform pair. 
The prototypical such pair is the direct and the inverse 
Fourier transforms; variations of this fundamental 
transform include the Laplace, Mellin, sine, cosine 
transforms, and their discrete analogs. 

The proper transform for a given boundary-value 
problem is specified by the PDE, by the domain, and 
by the given boundary conditions. For some simple 
boundary-value problems, there exists an algorithmic 
procedure for deriving the associated transform. This 
procedure involves constructing the Green’s function 
of a single eigenvalue equation, and integrating this 
Green’s function in the k-complex plane, where 
k denotes the eigenvalue. 

The transform method has been enormously 
successful for solving a great variety of initial- and 
boundary-value problems. However, for sufficiently 
complicated problems the classical transform method 
fails. For example, there does not exist a proper analog 
of the sine transform for solving a third-order evolution 
equation on the half-line. Similarly, there do not exist 
proper transforms for solving boundary-value pro- 
blems for elliptic equations even of second order and in 
simple domains. The failure of the transform method 
led to the development of several ingenious but 
ad hoc techniques, which include: conformal mappings 
for the Laplace and the biharmonic equations; the 
Jones method and the formulation of the Wiener-Hopf 
factorization problem; the use of some integral 
representation, such as that of Sommerfeld; the
formulation of a difference equation, such as the
Malyuzhinets equation. The use of these techniques
has led to the solution of several classical problems in 
acoustics, diffraction, electromagnetism, fluid 
mechanics, etc. The Wiener-Hopf technique played a 
central role in the solution of many of these problems. 

A crucial role in the new method is played by the 
global equation satisfied by the boundary values of q 
and of its derivatives. For evolution equations and for 
elliptic equations with simple boundary conditions, this 
involves the solution of a system of algebraic equations, 
while for elliptic equations with arbitrary boundary 
conditions, it involves the solution of an RH problem. 
For simple polygons, this RH problem is formulated on 
the infinite line, thus it is equivalent to a Wiener-Hopf
problem. This explains the central role played by the
Wiener-Hopf technique in many earlier works.

For linear PDEs, the explicit x1,x2 dependence of 
q(x1, X2) is consistent with the Ehrenpreis formulation 
of the solution. Thus, this method provides the 
concrete implementation as well as the generalization 
to concave domains of this fundamental principle. For 
nonlinear equations, it provides the extension of the 
Ehrenpreis principle to integrable nonlinear PDEs. 


See also: Boundary-Value Problems for Integrable
Equations; $\bar\partial$-Approach to Integrable Systems; Integrable
Systems and Algebraic Geometry; Integrable Discrete
Systems; Integrable Systems and Discrete Geometry;
Integrable Systems in Random Matrix Theory; Integrable
Systems: Overview; Korteweg-de Vries Equation and
Other Modulation Equations; Partial Differential
Equations: Some Examples; Riemann-Hilbert Methods in
Integrable Systems; Sine-Gordon Equation; Toda
Lattices; Twistor Theory: Some Applications [in Integrable
Systems, Complex Geometry and String Theory].


Random Matrix Models


A random matrix model is a probability space
$(\Omega, P, \mathcal{F})$ where the sample space $\Omega$ is a set of
matrices. There are three classic finite $N$ random
matrix models (see, e.g., Mehta (1991)):


1. Gaussian orthogonal ensemble ($\beta = 1$):

(a) $\Omega$ = $N \times N$ real symmetric matrices;

(b) $P$ = “unique” measure that is invariant under
orthogonal transformations and the matrix
elements are i.i.d. random variables; explicitly,
the density is

$$c_N \exp(-\mathrm{tr}(A^2))\, dA$$   [1]

where $c_N$ is a normalization constant and
$dA = \prod_i dA_{ii} \prod_{i<j} dA_{ij}$ is the product Lebesgue
measure on the independent matrix elements.

2. Gaussian unitary ensemble ($\beta = 2$):

(a) $\Omega$ = $N \times N$ Hermitian matrices;

(b) $P$ = “unique” measure that is invariant
under unitary transformations and the (independent)
real and imaginary matrix elements
are i.i.d. random variables; and

3. Gaussian symplectic ensemble ($\beta = 4$) (see Mehta
(1991) for a definition).


Generally speaking, the interest lies in the
$N \to \infty$ limit of these models. Here we concentrate
on one aspect of this limit. In all three models the
eigenvalues, which are random variables, are real
and with probability 1 they are distinct. If $\lambda_{\max}(A)$
denotes the largest eigenvalue of the random
matrix $A$, then for each of the three Gaussian
ensembles we introduce the corresponding distribution
function

$$F_{N,\beta}(t) = P_\beta(\lambda_{\max} < t), \qquad \beta = 1, 2, 4$$

The basic limit laws (see Tracy and Widom
(1996) and references therein) state that

$$F_\beta(s) = \lim_{N\to\infty} F_{N,\beta}\left(2\sigma\sqrt{N} + \frac{\sigma s}{N^{1/6}}\right), \qquad \beta = 1, 2, 4$$   [2]


exist and are given explicitly by

$$F_2(s) = \det(I - K_{\mathrm{Airy}}) = \exp\left(-\int_s^\infty (x - s)\,q^2(x)\,dx\right)$$

where

$$K_{\mathrm{Airy}} = \frac{\mathrm{Ai}(x)\mathrm{Ai}'(y) - \mathrm{Ai}'(x)\mathrm{Ai}(y)}{x - y} \quad \text{acting on } L^2(s,\infty) \text{ (Airy kernel)}$$

and $q$ is the unique solution to the Painlevé II
equation

$$q'' = sq + 2q^3$$

satisfying the condition

$$q(s) \sim \mathrm{Ai}(s) \qquad \text{as } s \to \infty$$

$\sigma$ in eqn [2] is the standard deviation of the
Gaussian distribution on the off-diagonal matrix
elements. For the normalization we have chosen,
$\sigma = 1/\sqrt{2}$; however, for subsequent comparisons, the
normalization $\sigma = \sqrt{N}$ is perhaps more natural.

The orthogonal and symplectic distribution functions are

$$F_1(s) = \exp\left(-\frac{1}{2}\int_s^\infty q(x)\,dx\right)(F_2(s))^{1/2}$$
$$F_4(s/\sqrt{2}) = \cosh\left(\frac{1}{2}\int_s^\infty q(x)\,dx\right)(F_2(s))^{1/2}$$

Graphs of the densities $dF_\beta/ds$ are in the adjacent
figure and some statistics of $F_\beta$ can be found in
Figure 1.

The Airy kernel is an example of an integrable 
integral operator and a general theory is developed in 
Tracy and Widom (1994). A vertex operator approach 
to these distributions (and many other closely related 
distribution functions in random matrix theory) was 
initiated by Adler, Shiota, and van Moerbeke (see the
review article van Moerbeke (2001) for further
developments of this latter approach).

Historically, the discovery of the connection
between Painlevé functions ($P_{III}$ in this case) and
Toeplitz/Fredholm determinants appears in work
of Wu et al. (1976) on the spin-spin correlation
functions of the two-dimensional Ising model. Painlevé
functions first appear in random matrix theory in
Jimbo et al. (1980), where they prove that the Fredholm
determinant of the sine kernel is expressible in terms of
$P_V$. Gaudin (using Mehta’s then newly invented
method of orthogonal polynomials (Porter 1965))
was the first to discover the connection between
random matrix theory and Fredholm determinants.


$\beta$    $\mu_\beta$       $\sigma_\beta$     $S_\beta$    $K_\beta$

1    -1.20653    1.2680    0.293    0.165
2    -1.77109    0.9018    0.224    0.093
4    -2.30688    0.7195    0.166    0.050


Probability densities

Figure 1 The mean ($\mu_\beta$), standard deviation ($\sigma_\beta$), skewness
($S_\beta$), and kurtosis ($K_\beta$) of $F_\beta$.


Universality Theorems 


A natural question is to ask whether the above limit
laws depend upon the underlying Gaussian assumption
on the probability measure. To investigate this for
unitarily invariant measures ($\beta = 2$), one replaces in [1]

$$\exp(-\mathrm{tr}(A^2)) \to \exp(-\mathrm{tr}(V(A)))$$

Bleher and Its (1999) choose

$$V(A) = gA^4 - A^2, \qquad g > 0$$

and subsequently a large class of potentials $V$ was
analyzed by Deift et al. (1999). These analyses
require proving new Plancherel-Rotach type formulas
for nonclassical orthogonal polynomials. The
proofs use Riemann-Hilbert methods. It was shown
that the generic behavior is GUE; hence, the limit
law for the largest eigenvalue is $F_2$. However, by
finely tuning the potential, new universality classes
will emerge at the edge of the spectrum. For $\beta = 1, 4$
a universality theorem was proved by Stojanovic
(2000) for the quartic potential.

In the case of noninvariant measures, Soshnikov
(1999) proved that for real symmetric Wigner matrices
(complex Hermitian Wigner matrices), the limiting
distribution of the largest eigenvalue is $F_1$ (respectively,
$F_2$). (A symmetric Wigner matrix is a random matrix
whose entries on and above the main diagonal are
independent and identically distributed random variables
with distribution function $F$. Soshnikov assumes
that $F$ is even and all moments are finite.) The
significance of this result is that non-Gaussian Wigner
measures lie outside the “integrable class” (e.g., there
are no Fredholm determinant representations for the
distribution functions) yet the limit laws are the same as
in the integrable cases.


Appearance of $F_\beta$ in Limit Theorems


In this section we briefly survey the appearances of
the limit laws $F_\beta$ in widely differing areas.


Combinatorics 


A major breakthrough occurred with the work of
Baik, Deift, and Johansson (see Baik et al. (2000) and
references therein) when they proved that the limiting
distribution of the length of the longest increasing
subsequence in a random permutation is $F_2$. Precisely,
if $\ell_N(\sigma)$ is the length of the longest increasing
subsequence in the permutation $\sigma \in S_N$, then

$$P\left(\frac{\ell_N - 2\sqrt{N}}{N^{1/6}} \le s\right) \to F_2(s)$$

as $N \to \infty$. Here the probability measure on the
permutation group $S_N$ is the uniform measure.
Further discussion of this result can be found in
Johansson (2000b).

Baik and Rains (2001) showed by restricting the set
of permutations (and these restrictions have natural
symmetry interpretations) that $F_1$ and $F_4$ also appear.
Even the distributions $F_1^2$ and $F_2^2$ (Tracy and Widom
1999) arise. By the Robinson-Schensted-Knuth correspondence,
the Baik-Deift-Johansson result is equivalent
to the limiting distribution on the number of boxes
in the first row of random standard Young tableaux.
(The measure is the push-forward of the uniform
measure on $S_N$.) These same authors conjectured that
the limiting distributions of the number of boxes in the
second, third, etc., rows were the same as the limiting
distributions of the next-largest, next-next-largest,
etc., eigenvalues in GUE. Since these eigenvalue
distributions were also found in Tracy and Widom
(1996), they were able to compare the then unpublished
numerical work of Odlyzko and Rains (2000)
with the predicted results of random matrix theory.
Subsequently, Baik et al. (2000) proved the conjecture
for the second row. The full conjecture was proved by
Okounkov (2000) using topological methods and by,
among others, Johansson (2001) using analytical
methods. For an interpretation of the Baik-Deift-Johansson
result in terms of the card game patience
sorting, see the very readable review paper by Aldous
and Diaconis (1999).
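The patience-sorting interpretation also gives a fast way to experiment with this limit law. The sketch below is an illustration only (the function names, the sample sizes, and the seed are arbitrary choices made here): it computes the length of the longest strictly increasing subsequence with the standard pile-based algorithm, in $O(N \log N)$ time, and produces samples of the centered and scaled variable $(\ell_N - 2\sqrt{N})/N^{1/6}$ for uniformly random permutations.

```python
import bisect
import math
import random

def lis_length(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    # piles[i] holds the smallest possible last element of an
    # increasing subsequence of length i + 1 seen so far.
    piles = []
    for v in seq:
        i = bisect.bisect_left(piles, v)
        if i == len(piles):
            piles.append(v)
        else:
            piles[i] = v
    return len(piles)

def scaled_lis_samples(n, trials, seed=0):
    """Samples of (l_N - 2 sqrt(N)) / N^(1/6) for uniform random permutations."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        samples.append((lis_length(perm) - 2.0 * math.sqrt(n)) / n ** (1.0 / 6.0))
    return samples

if __name__ == "__main__":
    samples = scaled_lis_samples(n=2000, trials=200)
    print("sample mean:", sum(samples) / len(samples))
```

For large $N$ the empirical distribution of these samples should approach $F_2$, although the convergence is slow, so moderate values of $N$ give only a rough picture.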


Growth Processes 


Growth processes have an extensive history both in
the probability literature and the physics literature
(see, e.g., Meakin (1998) and references therein), but
it was only recently that Johansson (2002b) proved
that the fluctuations about the limiting shape in a
certain growth model (“corner growth model”) are
$F_2$. Johansson further pointed out that certain
symmetry constraints (inspired by the Baik and
Rains (2001) work) lead to $F_1$ fluctuations (see
Growth Processes in Random Matrix Theory).

Subsequently, Baik and Rains (2000) and Gravner
et al. (2002) have shown the same distribution
functions appearing in closely related lattice growth
models. Prähofer and Spohn (2000) reinterpreted the
work of Baik et al. in terms of the physicists’ polynuclear
growth (PNG) model, thereby clarifying the role
of the symmetry parameter $\beta$. For example, $\beta = 2$
describes growth from a single droplet, whereas $\beta = 1$
describes growth from a flat substrate. They also
related the distribution functions $F_\beta$ to fluctuations of
the height function in the KPZ equation (Kardar et al.
1986, Meakin 1998). (The connection with the KPZ
equation is heuristic.) Thus, one expects on physical
grounds that the fluctuations of any growth process
falling into the 1+1 KPZ universality class will be
described by the distribution functions $F_\beta$ or one of the
generalizations by Baik and Rains (2000). Such a
physical conjecture can be tested experimentally. Earlier,
Myllys et al. established experimentally that a slow,
flameless burning process in a random medium (paper!)
is in the 1+1 KPZ universality class. This sequence of
events is a rare instance in which new results in
mathematics inspire new experiments in physics.

In the context of the PNG model, Prähofer and
Spohn have given a process interpretation, the Airy
process, of $F_2$.

There is an extension of the growth model in
Gravner et al. (2002) to growth in a random
environment. In Gravner et al. (2002) the following
model of interface growth in two dimensions is
considered by introducing a height function on the
sites of a one-dimensional integer lattice with the
following update rule: the height above the site $x$
increases to the height above $x - 1$, if the latter
height is larger; otherwise, the height above $x$
increases by 1 with probability $p_x$. It is assumed
that the $p_x$ are chosen independently at random with
a common distribution function $F$, and that the initial
state is such that the origin is far above the other sites.
In the pure regime, Gravner, Tracy, and Widom identify
an asymptotic shape and prove that the fluctuations
about that shape, normalized by the square root of
the time, are asymptotically normal. This contrasts
with the quenched version: conditioned on the
environment and normalized by the cube root of
time, the fluctuations almost surely approach the
distribution function $F_2$. We mention that these same
authors find, under some conditions on $F$ at the right
edge, a composite regime where now the interface
fluctuations are governed by the extremal statistics of
$p_x$ in the annealed case while the fluctuations are
asymptotically normal in the quenched case.


Random Tilings 


The Aztec diamond of order $n$ is a tiling by dominoes of
the lattice squares $[m, m+1] \times [\ell, \ell+1]$, $m, \ell \in \mathbb{Z}$,
that lie inside the region $\{(x,y) : |x| + |y| \le n + 1\}$. A
domino is a closed $1 \times 2$ or $2 \times 1$ rectangle in $\mathbb{R}^2$ with
corners in $\mathbb{Z}^2$. A typical tiling is shown in Figure 2. One
observes that near the center the tiling appears random,
called the temperate zone, whereas near the edges the
tiling is frozen, called the polar zones. As $n \to \infty$ the
boundary between the temperate zone and the polar
zones (appropriately scaled) converges to a circle
(“arctic circle theorem”). Johansson (2002a) proved
that the fluctuations about this limiting circle are $F_2$.
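To make the definition of the region concrete, the following sketch enumerates the unit lattice squares that make up the Aztec diamond of order $n$. It assumes that the defining inequality is read as $|x| + |y| \le n + 1$ and that a square belongs to the region when all four of its corners do; the function name is hypothetical. Under this reading there are $2n(n+1)$ squares, so every domino tiling uses $n(n+1)$ dominoes.

```python
def aztec_squares(n):
    """Lower-left corners (m, l) of the unit lattice squares inside |x| + |y| <= n + 1."""
    squares = []
    for m in range(-n - 1, n + 1):
        for l in range(-n - 1, n + 1):
            # The maximum of |x| + |y| over the square [m, m+1] x [l, l+1]
            # is attained at the corner farthest from the origin.
            far_x = max(abs(m), abs(m + 1))
            far_y = max(abs(l), abs(l + 1))
            if far_x + far_y <= n + 1:
                squares.append((m, l))
    return squares

if __name__ == "__main__":
    for n in range(1, 6):
        print(n, len(aztec_squares(n)), 2 * n * (n + 1))
```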


Statistics 


Johnstone (2001) considers the largest principal
component of the covariance matrix $X^t X$ where $X$
is an $n \times p$ data matrix all of whose entries are
independent standard Gaussian variables and proves
that for appropriate centering and scaling, the
limiting distribution equals $F_1$ in the limit $n, p \to \infty$
with $n/p \to \gamma \in \mathbb{R}^+$. Soshnikov has removed the
Gaussian assumption but requires that $n - p = O(p^{1/3})$.
Thus, we can anticipate applications of
the distributions $F_\beta$ (and particularly $F_1$) to the
statistical analysis of large data sets.





Figure 2 Random tilings. 


Queuing Theory 


Glynn and Whitt (1991) consider a series of $n$ single-server
queues, each with unlimited waiting space
and a first-in, first-out service discipline. Service times are
i.i.d. with mean one and variance $\sigma^2$ with distribution
$V$. The quantity of interest is $D(k,n)$, the
departure time of customer $k$ (the last customer to
be served) from the last queue $n$. For a fixed number
of customers, $k$, they prove that

$$\frac{D(k,n) - n}{\sigma\sqrt{n}}$$

converges in distribution to a certain functional $D_k$
of $k$-dimensional Brownian motion. They show that
$D_k$ is independent of the service time distribution $V$.
It was shown in, for example, Gravner et al. (2002)
that $D_k$ is equal in distribution to the largest
eigenvalue of a $k \times k$ GUE random matrix. This
fascinating connection has been greatly clarified in
recent work of O’Connell and Yor (2002).
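The departure times are easy to simulate. The sketch below relies on the standard recursion for queues in series, $D(i,j) = \max(D(i-1,j), D(i,j-1)) + (\text{service time of customer } i \text{ at queue } j)$; this recursion, the assumption that all customers wait at the first queue at time zero, and the function names are brought in here for illustration and are not spelled out in the text above.

```python
import math
import random

def departure_time(k, n, service):
    """D(k, n): departure time of customer k from queue n in a series of n queues.

    service(i, j) returns the service time of customer i at queue j.
    Assumption: all k customers wait at the first queue at time 0
    (first-in, first-out service, unlimited waiting space).
    """
    prev = [0.0] * (n + 1)  # prev[j] = D(i - 1, j); the row i = 0 is all zeros
    for i in range(1, k + 1):
        cur = [0.0] * (n + 1)  # cur[0] = 0: customer i is available at time 0
        for j in range(1, n + 1):
            # Service at queue j starts once the customer has left queue j - 1
            # and the previous customer has left queue j.
            cur[j] = max(prev[j], cur[j - 1]) + service(i, j)
        prev = cur
    return prev[n]

if __name__ == "__main__":
    rng = random.Random(1)
    k, n = 3, 400
    # Exponential service times with mean one, hence sigma = 1.
    d = departure_time(k, n, lambda i, j: rng.expovariate(1.0))
    print("normalized departure time:", (d - n) / math.sqrt(n))
```

With constant unit service times the recursion gives $D(k,n) = k + n - 1$, which is a convenient check of the implementation.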

From Johansson (2002), it follows for $V$ Poisson that

$$P\left(\frac{D(\lfloor xn\rfloor, n) - c_1 n}{c_2 n^{1/3}} \le s\right) \to F_2(s)$$

as $n \to \infty$ for some explicitly known constants $c_1$
and $c_2$ (depending upon $x$).


Superconductors 


Vavilov et al. (2001) have conjectured (based upon
certain physical assumptions supported by numerical
work) that the fluctuation of the excitation gap
in a metal grain or quantum dot induced by the
proximity to a superconductor is described by $F_1$ for
zero magnetic field and by $F_2$ for nonzero magnetic
field. They conclude their paper with the remark:


The universality of our prediction should offer ample
opportunities for experimental observation.


Introduction


This section introduces some elementary notions 
and sets the (mathematically low brow) tone of this 
presentation. 

A dynamical system is characterized by an evolution
equation, the general structure of which reads

$$Q_t = F$$   [1]

Here $Q = Q(x,t)$ is the dependent variable, and it
might be a scalar, a vector, a matrix, you name it.
The focus of interest is on its evolution as a function
of the (real, scalar) “time” variable $t$. The a priori
unknown quantity $Q$ might moreover depend on
another independent “space” variable (scalar or
vector) $x$, $Q = Q(x,t)$. The appended variable $t$ in
the left-hand side of the above equation denotes
partial differentiation, and this notation will be used
throughout, although when $t$ is the only independent
variable differentiation with respect to it might be
instead denoted by a superimposed dot:

$$Q_t \equiv \dot Q$$

The quantity in the right-hand side of the evolution
equation (1), which has of course the same (scalar,
vector, matrix) character as $Q$, is an assigned
function of $t$, $x$ and $Q$, $F = F(x,t,Q)$ (more generally,
its dependence on $Q$ might be functional, see
below). A typical example of the dynamical systems
we shall consider is the N-body problem characterized
by the Newtonian equations of motion


$$\ddot q_n = -\omega^2 q_n + 2g^2 \sum_{m=1,\, m\neq n}^{N} (q_n - q_m)^{-3}, \qquad n = 1, 2, \ldots, N$$   [2]

where the dependent variable is the N-vector $q = (q_1, \ldots, q_N)$,
the components of which are the “particle
coordinates” $q_n = q_n(t)$. Note however that these
equations of motion are of second order in time
(contrary to (1)); but they can of course be reformulated
as first-order ODEs: indeed their Hamiltonian version,
derived in the standard manner from the Hamiltonian

$$H = \frac{1}{2}\sum_{n=1}^{N}\left(p_n^2 + \omega^2 q_n^2\right) + \frac{g^2}{2}\sum_{m,n=1;\, m\neq n}^{N}(q_n - q_m)^{-2}$$   [3a]

reads

$$\dot q_n = p_n$$   [3b]

$$\dot p_n = -\omega^2 q_n + 2g^2\sum_{m=1,\, m\neq n}^{N}(q_n - q_m)^{-3}, \qquad n = 1, 2, \ldots, N$$   [3c]


Other typical examples are the (“Korteweg-de
Vries”, “Burgers”, “Nonlinear Schrödinger”, “sine
Gordon”) PDEs satisfied by the scalar dependent
variable q = q(x,t),


$$q_t = -q_{xxx} + 2q_x q = (-q_{xx} + q^2)_x$$ [4]

$$q_t = -q_{xx} + 2q_x q = (-q_x + q^2)_x$$ [5]

$$q_t = i\left[q_{xx} + s\,|q|^2 q\right], \quad s = \pm$$ [6]

$$q_t - q_x = s, \quad s_t + s_x = \sin q$$ [7]


as well as the integrodifferential (“Benjamin-Ono”)
equation 


E ii Dyy(y) 
a= Pf dyP aq [8] 


and the (“Kadomtsev-Petviashvili”) PDE satisfied by
the scalar dependent variable q = q(x,y,t),


dix = (axxx + Ix) .+ Sayy, s=xt [9] 


This last equation should of course be reformulated 
as an integrodifferential equation to fit with (1). 
These are all examples of integrable systems (see 
below). In this presentation we restrict attention to 
dynamical systems of these general types, without 
considering evolutions in which the space variable, 
and/or the time variable, and/or the dependent 
variable, only take discrete values, forsaking thereby 
the discussion of discrete evolution equations, 
cellular automata and functional equations, see 
other entries of this Encyclopedia. We shall consider 
mainly the “initial-value problem” in which the 
solution is assigned at the initial time, say at t= 0, 


$$Q(x,0) = Q_0(x)$$


and the subsequent evolution of the dependent 
variable, namely the values taken by Q(x, t) for t > 
0, is the focus of attention. Note however that, 
except when there is no dependence at all on the 
space variable x (see for instance (2)), the functional 
class to which Q(x,t) belongs as regards its
x-dependence should be specified (and the assigned
initial value $Q_0(x)$ should of course belong to this
functional class). A typical class of functions are 
those vanishing (adequately fast) at (spatial) infinity; 
another typical class are those characterized by 
periodicity properties as functions of x; and still 
another class are those restricted to a finite spatial 
domain (for instance, the positive x-axis, x > 0, or a 
finite interval, a < x < b), in which cases the initial- 
value problem must be supplemented by assigning 
boundary conditions. This latter class of problems,
called initial/boundary-value problems, is generally
more difficult; even the identification of which
boundary conditions are adequate to identify 
uniquely the solution may be a nontrivial task. In 
the following we will always focus on the simpler 
class of problems characterized by solutions defined 
in the entire space region and vanishing (sufficiently 
fast) asymptotically (far away). 

Thus, in the spirit of the initial-value problem, a 
dynamical system is generally characterized by 
assigning its evolution equation, the functional 
class to which its solutions are required to belong, 
and possibly some additional restriction on the set
of initial data.
Let us finally mention that, aside from considering
the initial-value problem, the study of dynamical 
systems may focus on the identification of special 
(classes of) solutions, for instance those obtained by 
using symmetry properties of the evolution equation 
under consideration (yielding, say, “similarity solu- 
tions”), and, in the integrable case, “solitonic” and 
“multisolitonic” solutions (see below). 


Integrable dynamical systems 


The solution of a dynamical system, however simple 
the equation that defines its time evolution, see (1), may 
be extremely complicated, indeed its time-dependence 
might feature one or more of the characteristics of 
deterministic chaos, such as a sensitive dependence on 
the initial data. But there are “exceptional” dynamical 
systems, the behavior of which is instead, in some 
sense, simple. Such systems are termed — in the least 
technical sense of the word — “integrable”. 

This characterization can be made precise for 
Hamiltonian systems with a finite number N of degrees 
of freedom, the equations of motion of which read 

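$$\dot q_n = \frac{\partial H(p,q)}{\partial p_n}, \quad \dot p_n = -\frac{\partial H(p,q)}{\partial q_n}, \quad n = 1, \ldots, N$$

Such a system is integrable if there exist, in addition
to the Hamiltonian $H(p,q) = H^{(1)}(p,q)$ itself,
N - 1 other (nontrivial and functionally independent)
constants of motion $H^{(n)}(p,q)$ in involution,
namely such that their Poisson brackets vanish:

$$\{H^{(n)}, H^{(m)}\} \equiv \sum_{\ell=1}^{N}\left[\frac{\partial H^{(n)}(p,q)}{\partial q_\ell}\,\frac{\partial H^{(m)}(p,q)}{\partial p_\ell} - \frac{\partial H^{(m)}(p,q)}{\partial q_\ell}\,\frac{\partial H^{(n)}(p,q)}{\partial p_\ell}\right] = 0, \quad n, m = 1, \ldots, N$$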


Let us however emphasize the crucial role of the words
“there exist”, as used just above. For definiteness let us
require that the constants of motion $H^{(n)}(p,q)$ be
analytic functions of their 2N arguments, and not
excessively multivalued: they might feature some
branch points, but not so many as to nullify their
effectiveness in constraining the time evolution of the
dynamical variables $q_n(t)$, $p_n(t)$ sufficiently to prevent
their behavior from being too complicated. On the
other hand it is of course not necessary that these
functions $H^{(n)}(p,q)$ be explicitly known.

When these conditions hold it is in principle 
possible (“Liouville theorem”) to identify a 


108 Integrable Systems: Overview 


canonical transformation from the canonical coordinates
and momenta $q_n$ and $p_n$ to action-angle
variables $\theta_n$ and $I_n$ such that

$$I_n = H^{(n)}(p,q)$$ [10]

Then these variables evolve trivially,

$$I_n(t) = I_n(0), \quad \theta_n(t) = \theta_n(0) + t\,\frac{\partial H}{\partial I_n}, \quad n = 1, \ldots, N$$


Note that, once these new canonical variables are
identified, the solution of the initial-value problem for
the original Hamiltonian problem is provided directly
by the expressions of the action-angle variables $\theta_n$ and
$I_n$ in terms of the original variables $q_n$ and $p_n$, as well
as the expressions of the latter in terms of the former.
The second step of this procedure requires inverting
the expressions (10), and the corresponding expressions
of the angle variables $\theta_n$ in terms of the original
variables $q_n$ and $p_n$; a necessary condition in order that
this step allow one to identify uniquely, at least in
principle, the original canonical variables $q_n$ and $p_n$
in terms of the action-angle variables $I_n$ and $\theta_n$ (hence
imply a simple time-evolution of these original
variables) is the requirement, as mentioned above, that
the expressions of the constants of the motion
$H^{(n)}(p,q)$ in terms of their arguments $q_n$ and $p_n$ not
be excessively multivalued.

The statements outlined above can be rigorously 
formulated for finite-dimensional Hamiltonian sys- 
tems, and they can be heuristically extended to all 
analogous dynamical systems with a finite number of 
degrees of freedom, even if they are not Hamiltonian. 

A system with N degrees of freedom might possess 
more than N constants of motion. Such a system 
that possesses 2N — 1 (nontrivial and functionally 
independent) constants of motion (the maximal 
number, to avoid the evolution being frozen) is 
called superintegrable, and its evolution is in some 
sense analogous to that of a system with a single 
degree of freedom, in particular all its confined and 
nonsingular motions are then completely periodic, 


$$q_n(t + T) = q_n(t), \quad p_n(t + T) = p_n(t), \quad n = 1, \ldots, N$$


The period T depends generally on the initial data. If it 
does not, at least for an open set of such data having 
full dimensionality in phase space, the system is called 
isochronous: all its motions in that phase space region 
are then completely periodic with the same period. 

A dynamical system might be integrable in a region 
of its “natural” phase space, and nonintegrable in 
another region. Sometimes such systems are referred to 
as partially integrable. There even are systems which 
are isochronous (hence superintegrable) in a region of 
their phase space, and behave instead chaotically in 
another region. These regions are generally separated 
by boundaries where the evolution of the system runs 


into singularities, and the constants of motion asso- 
ciated with the integrable behavior become excessively 
multivalued in the regions where the behavior is 
chaotic (see Isochronous Systems).

Dynamical systems featuring an additional space 
variable x (see Section 1) can be interpreted as infinite- 
dimensional dynamical systems (by considering the 
variable x as a continuous label for the dependent 
variable Q). Accordingly, a necessary condition in
order that such systems be considered integrable is the 
requirement that they possess an infinite number of 
constants of the motion. But — even for such systems 
that allow a Hamiltonian formulation — this condition 
cannot be considered sufficient (due to the inherent 
ambiguities in the counting of infinities), and in fact a 
completely cogent, universally accepted definition of 
integrability for infinite-dimensional dynamical sys- 
tems is still lacking (various definitions can of course 
be given in special contexts). It is nevertheless rather 
well understood by practitioners what is meant by 
such a term at least for integrable equations such as 
those indicated at the end of the previous section, 
which generally give rise to the solitonic phenomenol- 
ogy — as explained below. 

The study of integrable systems has an illustrious 
history, to which many eminent mathematicians and 
mathematical physicists contributed after the 
Newtonian revolution: Euler, Jacobi, Poincaré, Pain- 
levé, Kowalewskaya, Kolmogorov, Moser ... Below 
we report — most tersely — on the bloom that this topic 
has witnessed over the last 3-4 decades, without being 
generally able, due to space constraints, to attribute 
the appropriate credit to the many colleagues, most of 
them still living, who contributed to this endeavor. For 
more detailed treatments of the topics outlined below, 
of related developments not mentioned here, and of 
such credits, the interested reader is referred to the 
bibliography given below, including the additional 
references traceable from there. 


Integrable many-body problems 


An important class of integrable dynamical systems 
is provided by N-body problems characterized by 
Hamiltonians such as 


$$H(p,q) = \frac{1}{2}\sum_{n=1}^{N} p_n^2 + V(q)$$ [11]


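with a potential energy V(q) that includes “external”
and “two-body” forces,

$$V(q) = \sum_{n=1}^{N} V^{(1)}(q_n) + \frac{1}{2}\sum_{m,n=1;\, m\neq n}^{N} V^{(2)}(q_n - q_m)$$ [12]

The corresponding Hamiltonian and Newtonian
equations of motion read

$$\dot q_n = p_n, \quad \dot p_n = -\frac{\partial V(q)}{\partial q_n} = \ddot q_n$$ [13]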


The Lax pair and the constants of motion Suppose
that two N × N matrices L = L(p,q) and M =
M(p,q) could be found such that the matrix “Lax
equation”

$$\dot L = [L, M]$$ [14]
be equivalent to the Hamiltonian equations of 
motion (13). Here and throughout the notation 
[A, B] denotes the commutator: 


[A,B] =AB-BA 


Because this matrix equation clearly entails that 
the N traces 


$$T_n = \mathrm{trace}\,[L^n], \quad n = 1, \ldots, N$$


are constants of the motion,


$$\dot T_n = 0, \quad n = 1, \ldots, N$$

the possibility to write the Hamiltonian equations 
(13) in the Lax form (14) yields as a bonus N 
constants of the motion, namely it entails that the 
Hamiltonian system under consideration is integr- 
able. (One must moreover show that these constants 
of motion are in involution; this is usually the case). 

Hence a route to identify integrable N-body pro- 
blems is via the search of Lax pairs L, M of matrices 
such that (14) correspond to (13), with an appropriate 
assignment of the potential energy (12). For N > 2 this 
is a nontrivial task, because (13) is a system of 2N 
ODEs in 2N unknowns, while the matrix Lax 
equation (14) amounts to a system of $N^2$ ODEs.
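
The conservation of the traces can be checked numerically. The following is only a minimal sketch under stated assumptions, not a construction taken from the text: it uses the Hamiltonian (3a) with ω = 0, takes the Lax matrix to have diagonal entries p_n and off-diagonal entries i g/(q_n - q_m) (a choice for which trace(L^2) equals 2H), integrates the Newtonian equations with a fixed-step Runge-Kutta scheme, and compares the traces of the powers of L before and after. The function names, the initial data and the step size are all hypothetical.

```python
# Sketch: numerical check that trace(L^k), k = 1..N, is constant in time.
# Assumptions: omega = 0 in (3a); L_nn = p_n, L_nm = i*g/(q_n - q_m) for n != m.

def accelerations(q, g):
    """Right-hand side of (2) with omega = 0: 2 g^2 sum_{m != n} (q_n - q_m)^-3."""
    n = len(q)
    return [2.0 * g * g * sum((q[i] - q[j]) ** -3 for j in range(n) if j != i)
            for i in range(n)]

def rk4_step(q, p, dt, g):
    """One classical Runge-Kutta step for dq/dt = p, dp/dt = accelerations(q)."""
    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]
    k1q, k1p = p, accelerations(q, g)
    k2q, k2p = shifted(p, k1p, dt / 2), accelerations(shifted(q, k1q, dt / 2), g)
    k3q, k3p = shifted(p, k2p, dt / 2), accelerations(shifted(q, k2q, dt / 2), g)
    k4q, k4p = shifted(p, k3p, dt), accelerations(shifted(q, k3q, dt), g)
    new_q = [q[i] + dt / 6 * (k1q[i] + 2 * k2q[i] + 2 * k3q[i] + k4q[i])
             for i in range(len(q))]
    new_p = [p[i] + dt / 6 * (k1p[i] + 2 * k2p[i] + 2 * k3p[i] + k4p[i])
             for i in range(len(p))]
    return new_q, new_p

def lax_matrix(q, p, g):
    """Assumed Lax matrix: p_n on the diagonal, i*g/(q_n - q_m) elsewhere."""
    n = len(q)
    return [[complex(p[i]) if i == j else 1j * g / (q[i] - q[j]) for j in range(n)]
            for i in range(n)]

def traces(q, p, g):
    """Return [trace(L), trace(L^2), ..., trace(L^N)]."""
    n = len(q)
    lax = lax_matrix(q, p, g)
    power, out = lax, []
    for _ in range(n):
        out.append(sum(power[i][i] for i in range(n)))
        power = [[sum(power[i][k] * lax[k][j] for k in range(n)) for j in range(n)]
                 for i in range(n)]
    return out

def max_trace_drift(q0, p0, g=1.0, dt=1e-3, steps=2000):
    """Largest change of any trace between t = 0 and t = steps * dt."""
    q, p = list(q0), list(p0)
    start = traces(q, p, g)
    for _ in range(steps):
        q, p = rk4_step(q, p, dt, g)
    return max(abs(a - b) for a, b in zip(start, traces(q, p, g)))

print(max_trace_drift([-1.0, 0.2, 1.5], [0.3, -0.1, -0.4]))
```

The printed drift should be at the level of the integration error, whereas the individual coordinates and momenta change by quantities of order one over the same time interval.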


Functional equations and the identification of 
integrable many-body problems A convenient 
ansatz to identify a Lax pair suitable for the purpose 
outlined above reads as follows: 


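$$L_{nm} = p_n \ \text{for } n = m, \qquad L_{nm} = \alpha(q_n - q_m) \ \text{for } m \neq n,$$

$$M_{nm} = \sum_{\ell=1,\, \ell\neq n}^{N} \beta(q_n - q_\ell) \ \text{for } n = m, \qquad M_{nm} = \gamma(q_n - q_m) \ \text{for } m \neq n$$


where α(q), β(q) and γ(q) are 3 functions to be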
determined. It is then easily seen that these functions 
may be assigned so that the corresponding Lax
equation (14) be equivalent to the Hamiltonian
equations (13) with


$$V^{(1)}(q) = 0$$ [15a]


$$V^{(2)}(q) = \alpha(q)\,\alpha(-q)$$ [15b]


provided the function α(x) satisfies the functional
equation


$$\alpha(x)\,\alpha'(y) - \alpha(y)\,\alpha'(x) = \alpha(x+y)\,[\beta(x) - \beta(y)]$$

The general solution of this functional equation
yields via (15b) the two-body potential


$$V^{(2)}(q) = g^2 a^2\,\wp(aq\,|\,\omega,\omega')$$ [16]


where g and a are two arbitrary constants and
℘(x|ω,ω′) is the Weierstrass elliptic function (with
semiperiods ω and ω′, also arbitrary). One
concludes therefore that the N-body problem
characterized by the Hamiltonian (11) with (12), (15a)
and (16) is integrable.

This Hamiltonian system has played, since the mid- 
seventies, a seminal role in the developments of finite- 
dimensional integrable systems that occurred over the 
last few decades. However, since the Weierstrass 
function is doubly-periodic, from a “physical” point 
of view this N-body problem is rather unrealistic, or 
perhaps rather suited for the study of crystalline 
configurations, including their statistical mechanics. 
But there are two special cases, obtained by assigning 
an infinite value to one or both of the semiperiods of 
the Weierstrass function in (16), that qualify $V^{(2)}(q)$ as
a physical two-body potential:


$$V^{(2)}(q) = \frac{g^2 a^2}{\sinh^2(aq)}$$ [17a]

$$V^{(2)}(q) = \frac{g^2}{q^2}$$ [17b]


(Of course the second of these two-body potentials,
(17b), is merely the special case of the first, (17a),
corresponding to a = 0). These Hamiltonian models
are then naturally interpretable as one-dimensional 
many-body problems with repulsive two-body forces 
singular at zero separation and vanishing at large 
distances. Actually the fact that these systems are 
integrable is far from remarkable, since it is 
generally true that any many-body problem char- 
acterized by repulsive forces vanishing at large 
distances (hence causing unconfined motions) is 
integrable: indeed in such models the particles 
eventually separate and move freely, so that their 
trajectories cannot display the extreme complication 
characterizing a chaotic (i.e., nonintegrable) behavior.
But these models are in fact superintegrable
and they (as well as various integrable extensions of 
them) feature many (physically and mathematically) 
interesting properties. For instance the asymptotic 
behavior of their trajectories, 


$$q_n(t) = p_n^{(\pm)}\,t + q_n^{(\pm)} + o(1), \quad p_n(t) = p_n^{(\pm)} + o(1) \quad \text{as } t \to \pm\infty, \quad n = 1, \ldots, N$$ [18]


is characterized by the simple rules 


pP = ph) _n=1,...,N, 


N 
=d DE Alp —pOsea) n 


m=1, m+n 


Tog [1 + (ga/p)"| 
A(p;g,a) = sign(p) —— = 


The formula (19) indicates that the shift $q_n^{(+)} - q_n^{(-)}$
among the asymptotic positions of the particles (see
(18)) is merely a sum of two-body shifts Δ (which
incidentally vanish altogether if a = 0, namely in the
(17b) case), and it only depends on the velocities
$p_n^{(-)}$ of the particles in the remote past (not on the
corresponding asymptotic positions $q_n^{(-)}$, in spite of
their relevance in determining the order in which the
different particles approach each other through the
motion).

A generalization of the above model in the (17b) 
case — nontrivial inasmuch as it yields confined 
motions — is characterized by the additional presence 
in the potential (12) of the one-body potential 


$$V^{(1)}(q) = \tfrac{1}{2}\,\omega^2 q^2$$ [20]


yielding the Hamiltonian (3a). This model is integrable,
indeed superintegrable, indeed isochronous, all
its (real) solutions being completely periodic with
period

$$T = \frac{2\pi}{\omega}$$ [21]

A neat way to understand this result is by noting
that, if $\tilde q(t)$ is a (possibly complex) solution of the
model discussed above (in this subsection, with the
two-body potential (17b) and no one-body potential,
see (15a)), then

$$q_n(t) = \exp(-i\omega t)\,\tilde q_n(\tau), \quad \tau = \frac{\exp(2i\omega t) - 1}{2i\omega}$$

provides a (possibly real) solution of the Newtonian
equations of motion (2), namely of the same model
but with the additional one-body potential (20).
Remarkably this model was first solved in the
quantal case (at the beginning of the seventies), and
only a few years later in the classical case considered
here (by J. Moser, who, for the ω = 0 case,
introduced the special version of the Lax matrix
appropriate for this case).

Another class of many-body problems, introduced 
in the mid-sixties by M. Toda, played a seminal role 
in the study of integrable dynamical systems, indeed 
the first application (independently by H. Flaschka 
and S. Manakov) of the Lax approach to integrable 
many-body problems occurred in that context. This 
model is often referred to as the Toda lattice, 
because its (two-body) interaction (of exponential 
type) is only assumed to act among “nearest 
neighbors”. 

A particularly interesting, and just as integrable, 
generalization of this class of Hamiltonian many- 
body problems features an extra parameter, say c, 
which might be considered to play the role of “speed 
of light”. These models reduce to those considered 
above for c = ∞, and for finite c they are invariant
under the Poincaré group of coordinate transforma- 
tions (while of course the many-body problems 
described above are invariant under the Galilei 
group). They are sometimes termed RS models, to 
recognize those who first introduced them 
(S. Ruijsenaars and H. Schneider) as well as the 
possibility to interpret them in some sense as 
“relativistic” generalizations of the “nonrelativistic” 
models described above. 


Reduction of the solution to algebraic opera- 
tions The solution of the models described above 
can actually be reduced to purely algebraic opera- 
tions. For instance for the model characterized by 
the Newtonian equations of motion (2) such a 
solution of the initial-value problem is provided by 
the following prescription: the particle coordinates 
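$q_n(t)$ coincide with the N eigenvalues of the N × N
matrix:


$$Q_{nm}(t) = \delta_{nm}\left[q_n(0)\cos(\omega t) + \dot q_n(0)\,\frac{\sin(\omega t)}{\omega}\right] + (1 - \delta_{nm})\,\frac{i g \sin(\omega t)}{\omega\,[q_n(0) - q_m(0)]}$$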
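
This prescription lends itself to a direct numerical check. The sketch below rests on assumptions that are not in the text: the matrix is taken with diagonal entries q_n(0) cos(ωt) + p_n(0) sin(ωt)/ω and off-diagonal entries i g sin(ωt)/(ω[q_n(0) - q_m(0)]), the equations of motion (2) are integrated with a fixed-step Runge-Kutta scheme, and, to avoid an eigenvalue solver, the comparison is made through the power sums (the trace of the k-th power of the matrix against the sum of the k-th powers of the integrated coordinates, k = 1, ..., N), which determine the eigenvalues as a set. All names and initial data are hypothetical.

```python
import math

def accelerations(q, g, omega):
    """Right-hand side of (2): -omega^2 q_n + 2 g^2 sum_{m != n} (q_n - q_m)^-3."""
    n = len(q)
    return [-omega * omega * q[i]
            + 2.0 * g * g * sum((q[i] - q[j]) ** -3 for j in range(n) if j != i)
            for i in range(n)]

def integrate(q0, p0, t, g, omega, steps=4000):
    """Fixed-step RK4 integration of dq/dt = p, dp/dt = accelerations(q)."""
    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]
    q, p, dt = list(q0), list(p0), t / steps
    for _ in range(steps):
        k1q, k1p = p, accelerations(q, g, omega)
        k2q, k2p = shifted(p, k1p, dt / 2), accelerations(shifted(q, k1q, dt / 2), g, omega)
        k3q, k3p = shifted(p, k2p, dt / 2), accelerations(shifted(q, k2q, dt / 2), g, omega)
        k4q, k4p = shifted(p, k3p, dt), accelerations(shifted(q, k3q, dt), g, omega)
        q = [q[i] + dt / 6 * (k1q[i] + 2 * k2q[i] + 2 * k3q[i] + k4q[i]) for i in range(len(q))]
        p = [p[i] + dt / 6 * (k1p[i] + 2 * k2p[i] + 2 * k3p[i] + k4p[i]) for i in range(len(p))]
    return q

def solution_matrix(q0, p0, t, g, omega):
    """Matrix whose eigenvalues are claimed to be the coordinates q_n(t)."""
    c, s = math.cos(omega * t), math.sin(omega * t) / omega
    n = len(q0)
    return [[complex(q0[i] * c + p0[i] * s) if i == j else 1j * g * s / (q0[i] - q0[j])
             for j in range(n)] for i in range(n)]

def power_sums(matrix):
    """Return [trace(A), trace(A^2), ..., trace(A^N)] for an N x N matrix A."""
    n = len(matrix)
    power, sums = matrix, []
    for _ in range(n):
        sums.append(sum(power[i][i] for i in range(n)))
        power = [[sum(power[i][k] * matrix[k][j] for k in range(n)) for j in range(n)]
                 for i in range(n)]
    return sums

def max_mismatch(q0, p0, t, g=1.0, omega=1.0):
    """Largest gap between the power sums of the matrix and of the integrated q_n(t)."""
    q = integrate(q0, p0, t, g, omega)
    from_ode = [sum(x ** k for x in q) for k in range(1, len(q) + 1)]
    from_matrix = power_sums(solution_matrix(q0, p0, t, g, omega))
    return max(abs(a - b) for a, b in zip(from_matrix, from_ode))

print(max_mismatch([-1.0, 0.2, 1.5], [0.3, -0.1, -0.4], 1.0))
```

At t = 2π/ω the sine vanishes and the cosine returns to one, so the matrix reduces to the diagonal matrix of the initial coordinates, in agreement with the period (21).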


Many-body problems related to the motion of the 
zeros of linear PDEs Another convenient approach 
to manufacture and investigate integrable many- 
body problems is by identifying the motion of the 
particles with that of the zeros of (polynomial) 


solutions of linear (hence solvable) evolution PDEs. 
Assume for instance that the monic polynomial 


$$\psi(z,t) = z^N + \sum_{m=1}^{N} c_m(t)\,z^{N-m} = \prod_{n=1}^{N}\,[z - z_n(t)]$$ [22]


satisfies the (compatible) linear PDE 


$$\left[A_0 + A_1 z + A_2 z^2 + A_3 z^3\right]\psi_{zz} + \left[B_0 + B_1 z - 2(N-1)A_3 z^2\right]\psi_z + C\,\psi_{tt} + \left[E - (N-1)D_2 z\right]\psi_t + \left[D_0 + D_1 z + D_2 z^2\right]\psi_{zt} - \left[N(N-1)(A_2 - A_3 z) + N B_1\right]\psi = 0$$ [23]


where the letters $A_0, A_1, A_2, A_3, B_0, B_1, C, D_0, D_1,
D_2, E$ denote 11 arbitrary constants. Then the zeros
$z_n(t)$ evolve according to the system of ODEs


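$$C\,\ddot z_n + E\,\dot z_n = B_0 + B_1 z_n - 2(N-1)A_3 z_n^2 + \sum_{m=1,\, m\neq n}^{N} (z_n - z_m)^{-1}\left[2C\,\dot z_n \dot z_m - (\dot z_n + \dot z_m)(D_0 + D_1 z_n) - D_2 z_n(\dot z_n z_m + \dot z_m z_n) + 2\left(A_0 + A_1 z_n + A_2 z_n^2 + A_3 z_n^3\right)\right]$$ [24]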


interpretable as the Newtonian equations of motion 
of an N-body problem with one- and two-body 
(velocity-dependent) forces. This problem is integrable,
indeed its solution can be reduced to the
algebraic problem of finding the zeros of the
polynomial ψ(z,t), see (22), whose time evolution
can be ascertained by solving the linear PDE (23),
itself a purely algebraic problem as it amounts to
solving the system of (constant coefficients, linear)
ODEs implied via (22) by this PDE (23) for the N
coefficients $c_m(t)$.

This class of many-body problems is rather rich, 
thanks to the arbitrariness of the 11 constants it 
features. Several subcases, characterized by special 
choices of these constants, are suitable to display a 
gamut of different phenomenological behaviors: 
confined and nonconfined motions, periodic and 
nonperiodic evolutions, limit cycles, Hamiltonian 
cases,.... 


Solvable many-body problems in the plane The 
many-body problems considered above were all 
essentially one-dimensional. But via a simple trick 
it is possible to obtain from some of them many- 
body problems in the plane (which should of course 
be rotation-invariant to be certified as such). 
Consider for instance the special case of the above 
model, (24), with C = 1 and with $A_0 = A_1 = A_3 =
B_0 = D_0 = D_2 = 0$, so that its equations of motion,


$$\ddot z_n + E\,\dot z_n = B_1 z_n + \sum_{m=1,\, m\neq n}^{N} (z_n - z_m)^{-1}\left[2\dot z_n \dot z_m - D_1(\dot z_n + \dot z_m)\,z_n + 2A_2 z_n^2\right]$$ [25]


are invariant under rescaling of the dependent
variables ($z_n \Rightarrow c\,z_n$). Let us then work
in the complex rather than the real domain, and let us set


$$E = \gamma + i\omega, \quad A_2 = \alpha + i\tilde\alpha, \quad B_1 = \beta + i\tilde\beta, \quad D_1 = \delta + i\tilde\delta$$


where the Greek letters now indicate real constants,
and let us moreover relate the N complex coordinates
$z_n$ to N two-vectors $\vec r_n$ in the horizontal plane
via the self-evident positions


$$z_n = x_n + i y_n, \quad \vec r_n = (x_n, y_n, 0); \quad \hat k = (0, 0, 1)$$ [26]


It is then easily seen that the integrable equations of 
motion (25) become the following rotation-invariant 
Newtonian equations of motion identifying a (no 
less integrable) N-body problem in the plane: 


7, + (7 + wka) F, 


Here and below we use the short-hand notation
$\vec r_{nm} \equiv \vec r_n - \vec r_m$, entailing $r_{nm}^2 = r_n^2 + r_m^2 - 2\,\vec r_n \cdot \vec r_m$; the
symbol ∧ denotes the three-dimensional vector product,
so that $\hat k \wedge \vec r_n = (-y_n, x_n, 0)$ (see (26)), and the rest
of the notation is self-evident. Note that these rotation-
invariant Newtonian equations of motion are also
translation-invariant if $\beta = \tilde\beta = \delta = \tilde\delta = \alpha = \tilde\alpha = 0$.


The “goldfish” model The attribute of “goldfish”
has been given to the special case of the above
model with all “coupling constants” vanishing,
thanks to the neatness of its equations of motion,
which in their complex version read

$$\ddot z_n = 2\sum_{m=1,\, m\neq n}^{N} \frac{\dot z_n \dot z_m}{z_n - z_m}$$

and in their real (“physical”) version as Newtonian
equations of motion of an N-body problem in the
horizontal plane read


$$\ddot{\vec r}_n = 2\sum_{m=1,\, m\neq n}^{N} r_{nm}^{-2}\left[\dot{\vec r}_n\,(\dot{\vec r}_m \cdot \vec r_{nm}) + \dot{\vec r}_m\,(\dot{\vec r}_n \cdot \vec r_{nm}) - \vec r_{nm}\,(\dot{\vec r}_n \cdot \dot{\vec r}_m)\right]$$


(This name has also been attributed to some 
extensions of this model, see the entry Isochronous 
Systems in this Encyclopedia). This model is 
invariant under time rescaling ($t \Rightarrow c\,t$), in its
physical version it is translation- and rotation- 
invariant, it only features two-body forces and in 
spite of their velocity-dependence it is Hamiltonian 
(it is in fact a simple instance of the RS models 
mentioned above). The solution of its initial-value 
problem (in its complex version) is given by a 
remarkably neat rule: the N coordinates $z_n(t)$ are the
N roots of the following algebraic equation in z:


$$\sum_{n=1}^{N} \frac{\dot z_n(0)}{z - z_n(0)} = \frac{1}{t}$$ [28]
The phenomenology of its generic solution is also 
remarkable, corresponding to the “game of musical 
chairs”: in the remote past all particles but one are 
almost at rest in N — 1 positions (“sitting in N — 1 
chairs”) and one particle comes in from infinity, 
moving initially as a free particle; as it approaches, 
all the particles begin to move around (“dancing”); 
in the remote future one particle goes away (moving 
eventually with the same speed as the incoming 
particle), and all the others settle down in the same 
N — 1 positions (“of the N — 1 chairs”), but with 
the possibility that the outgoing particle be different 
from the incoming one, and that the other particles 
have reshuffled their “seating”. 
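
The neat rule for the solution can be tested numerically. The sketch below makes several assumptions that are not in the text: the complex equations of motion are taken in the form z_n'' = 2 Σ_{m≠n} z_n' z_m'/(z_n - z_m), the rule (28) is read as Σ_n z_n'(0)/(z - z_n(0)) = 1/t, the integration uses a fixed-step Runge-Kutta scheme, and the initial data are arbitrary illustrative values. Rather than solving the algebraic equation, it checks that each integrated coordinate z_n(t) satisfies it.

```python
def accelerations(z, v):
    """Assumed goldfish right-hand side: 2 sum_{m != n} v_n v_m / (z_n - z_m)."""
    n = len(z)
    return [2 * sum(v[i] * v[j] / (z[i] - z[j]) for j in range(n) if j != i)
            for i in range(n)]

def integrate(z0, v0, t, steps=2000):
    """Fixed-step RK4 for dz/dt = v, dv/dt = accelerations(z, v), complex data."""
    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]
    z, v, dt = list(z0), list(v0), t / steps
    for _ in range(steps):
        k1z, k1v = v, accelerations(z, v)
        z2, v2 = shifted(z, k1z, dt / 2), shifted(v, k1v, dt / 2)
        k2z, k2v = v2, accelerations(z2, v2)
        z3, v3 = shifted(z, k2z, dt / 2), shifted(v, k2v, dt / 2)
        k3z, k3v = v3, accelerations(z3, v3)
        z4, v4 = shifted(z, k3z, dt), shifted(v, k3v, dt)
        k4z, k4v = v4, accelerations(z4, v4)
        z = [z[i] + dt / 6 * (k1z[i] + 2 * k2z[i] + 2 * k3z[i] + k4z[i]) for i in range(len(z))]
        v = [v[i] + dt / 6 * (k1v[i] + 2 * k2v[i] + 2 * k3v[i] + k4v[i]) for i in range(len(v))]
    return z

def residual(z, z0, v0, t):
    """Left-hand side minus right-hand side of the assumed form of (28)."""
    return sum(vm / (z - zm) for zm, vm in zip(z0, v0)) - 1 / t

def max_residual(z0, v0, t):
    """Largest |residual| over the integrated coordinates z_n(t)."""
    return max(abs(residual(z, z0, v0, t)) for z in integrate(z0, v0, t))

z_start = [0j, 1 + 0.5j, -1 + 1j]          # hypothetical initial positions
v_start = [0.5 + 0.2j, -0.3j, 0.1 - 0.4j]  # hypothetical initial velocities
print(max_residual(z_start, v_start, 0.5))
```

The design choice of evaluating the residual, instead of finding the roots, keeps the sketch free of any polynomial root finder; the two checks are equivalent as long as the integrated coordinates are distinct.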

Another remarkable version (also translation- and 
rotation-invariant, as well as Hamiltonian) of the 
N-body model in the plane (27) obtains if all the 
“coupling constants” vanish except w. Then all its 
nonsingular solutions (which are given by the same
prescription indicated just above, except for the
replacement of 1/t with $i\omega/[1 - \exp(-i\omega t)]$ in the
right-hand side of (28)) are completely periodic with
periods which are an integer multiple (no larger than a
number depending on N, generally (much) smaller than N!)
of T (see (21)), the domains of phase space that give
rise to solutions with different periodicity being 
separated from each other by boundaries character- 
ized by lower-dimensional sets of initial data 
yielding trajectories that run into singularities 
corresponding to particle collisions (note that when 


two or more particles collide their individuality gets 
lost, and their velocities diverge). 


Integrable many-body problems in spaces with 
arbitrary dimensions Integrable, or even solvable, 
many-body problems in spaces with more than two 
dimensions — with rotation-invariant equations of 
motion of Newtonian type — can be manufactured 
by starting from an appropriate integrable, or 
solvable, second-order matrix evolution equation, 
and by then parametrizing the evolving matrix in 
terms of multidimensional vectors so as to transform 
the matrix evolution equation into a covariant — 
hence rotation-invariant — system of evolution 
equations for these vectors, interpretable as New- 
tonian equations of motion of a many-body problem 
in multidimensional space. 
For instance the matrix equation 


M=AM+MA+ M? 


is integrable. Here M = M(t) is a square matrix of 
arbitrary order and A is an arbitrary constant 
matrix. By parametrizing appropriately these two 
matrices one concludes that either one of the 
following two Newtonian systems of ODEs is 
integrable: 


N M N 
t= Y Cnt) Tayo) 
v=1 u=1 y= 
n= 1,...,N,m = 1,...,M, 
N M N 
Pe tA S a a Ta] 
v=1 u=1 y=1 
M= l N mM = Lla M: 


Here N and M are arbitrary positive integers, the 
NM constants Qy, are also arbitrary, the NM 
“particle coordinates” fym = fnm(t) are S-vectors, 
with S an arbitrary positive integer, and the dots 
sandwiched among these S-vectors denote the 
standard scalar product in S-dimensional space. 

Let us emphasize the physical relevance of this 
class of many-body problems, characterized by 
linear and cubic forces. This is reinforced by the 
fact that these models are Hamiltonian. 


Nonlinear harmonic oscillators Two classes of 
integrable systems obtain from the classes written 
above by first setting to zero all the constants Qj, 
and by then performing the change of variables 


$$\vec w_{nm}(t) = \exp(i\omega t)\,\vec r_{nm}(\tau), \quad \tau = \frac{\exp(i\omega t) - 1}{i\omega}$$ [29]


with ω > 0. The corresponding Newtonian equations
of motion read


n= 1,...,N,m = 1,...,M, 


E ' M N 


u=1; v=1 


n= 1,...,N,m= 1,...,M 


These equations of motion cause the NM evolving
S-vectors $\vec w_{nm} = \vec w_{nm}(t)$ to be complex (see the
second term in their left-hand sides), but a real
system (with double the number of dependent
variables) can be easily obtained by setting


$$\vec w_{nm} = \vec u_{nm} + i\,\vec v_{nm}$$


Remarkably (but clearly suggested by (29)), all the 
nonsingular solutions of each of these two many- 
body problems are completely periodic, with a 
period which is an integer multiple of the period T, 
see (21). This justifies the title given to this 
subsection. It also shows that these are isochronous 
systems (see Isochronous Systems). 


Integrable nonlinear PDEs 


As indicated in Section 1 another class of integrable 
systems are nonlinear evolution PDEs. In this 
section we outline (some of) their properties, 
focussing mainly on the Korteweg-de Vries PDE 
(4), the solution of which by C. S. Gardner, 
J. M. Greene, M. D. Kruskal and R. M. Miura in 
the mid-sixties was the opening shot of a major 
scientific development which is still blooming. 
Other important early steps of this development 
were, in the late sixties, the introduction by P. D. 
Lax of what is now called the Lax pair technique, 
and at the beginning of the seventies the solution by 
V. E. Zakharov and A. B. Shabat of the Nonlinear 
Schrödinger equation (6) — an evolution PDE of 
great applicative importance. Subsequently many 
researchers developed various techniques to iden- 
tify, classify and investigate integrable nonlinear 
PDEs, a continuing activity for an overall appraisal 
of which the interested reader is referred to the 
bibliography reported below. 

Here we outline one of the approaches to 
obtaining these results; other approaches are tersely 
mentioned below. 
Identification and investigation of integrable
PDEs via the inverse spectral transform 
technique 


The class of linear dispersive evolution PDEs reads

$$u_t(x,t) = -i\,\omega\!\left(-i\,\frac{\partial}{\partial x}\right)u(x,t), \quad -\infty < x < \infty$$ [30]


where the “dispersion function” ω(z) is, say, a (real)
polynomial (which must be odd to guarantee that
this PDE be real). The solution of this PDE is
achieved via the introduction of the Fourier transform
$\hat u(k,t)$,


$$u(x,t) = (2\pi)^{-1}\int_{-\infty}^{+\infty} dk\,\exp(ikx)\,\hat u(k,t)$$ [31a]


$$\hat u(k,t) = \int_{-\infty}^{+\infty} dx\,\exp(-ikx)\,u(x,t)$$ [31b]


whose evolution corresponding to (30) is then given
by the simple linear ODE


$$\hat u_t(k,t) = -i\,\omega(k)\,\hat u(k,t), \quad -\infty < k < \infty$$ [32a]

which can be immediately integrated:

$$\hat u(k,t) = \hat u(k,0)\,\exp[-i\,\omega(k)\,t]$$ [32b]


Thus the solution of the initial-value problem of (30)
is achieved via three steps: (i) at the initial time one
obtains the initial value of the Fourier transform,
$\hat u(k,0)$, from the initial datum u(x,0) (via (31b)); (ii)
one then obtains $\hat u(k,t)$ (via (32b)); (iii) one finally
obtains u(x,t) (via (31a)). From these formulas the
main features of the resulting phenomenology are
easily evinced (even when the above integrals cannot
be explicitly performed).
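
The three steps translate almost literally into code. The sketch below is a discrete stand-in, under assumptions that are not in the text: the infinite line is replaced by a periodic grid of n points on [0, 2π), so that the integrals (31) become finite sums over integer wavenumbers, and the function names are hypothetical. The dispersion function ω is passed in as an ordinary Python function.

```python
import cmath
import math

def wavenumber(m, n):
    """Integer wavenumber attached to index m on a grid of n points (period 2*pi)."""
    return m if m < n // 2 else m - n

def evolve_linear(u0, t, omega):
    """Solve u_t = -i omega(-i d/dx) u on a periodic grid via the three steps."""
    n = len(u0)
    ks = [wavenumber(m, n) for m in range(n)]
    xs = [2 * math.pi * j / n for j in range(n)]
    # (i) Fourier coefficients of the initial datum, discrete analogue of (31b).
    hat0 = [sum(u0[j] * cmath.exp(-1j * ks[m] * xs[j]) for j in range(n)) / n
            for m in range(n)]
    # (ii) explicit time evolution of each coefficient, see (32b).
    hat_t = [hat0[m] * cmath.exp(-1j * omega(ks[m]) * t) for m in range(n)]
    # (iii) back to physical space, discrete analogue of (31a).
    return [sum(hat_t[m] * cmath.exp(1j * ks[m] * xs[j]) for m in range(n))
            for j in range(n)]

n_points = 16
grid = [2 * math.pi * j / n_points for j in range(n_points)]
initial = [math.exp(math.cos(x)) for x in grid]

# omega(k) = k gives u_t = -u_x, i.e. rigid transport to the right.
shifted = evolve_linear(initial, 2 * math.pi * 3 / n_points, lambda k: k)
print(max(abs(shifted[j] - initial[(j - 3) % n_points]) for j in range(n_points)))

# omega(k) = -k^3 gives u_t = -u_xxx; a single mode cos(x) becomes cos(x + t).
wave = evolve_linear([math.cos(x) for x in grid], 0.7, lambda k: -k ** 3)
print(max(abs(wave[j] - math.cos(grid[j] + 0.7)) for j in range(n_points)))
```

Both printed numbers should be at rounding level: in the first case the profile is merely translated by three grid points, in the second the single Fourier mode travels with the phase velocity dictated by the dispersion function.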

A class of integrable nonlinear evolution PDEs
reads


$$u_t(x,t) = \alpha(R)\,u_x(x,t)$$ [33]


where the assigned function α(z) is again, say, a
(real) polynomial, while R is now the integrodifferential
“recursion operator” defined by the following
formula that specifies its action on a generic
function f(x,t) (vanishing asymptotically so as to
allow all integrations to converge):


$$R\,f(x,t) = f_{xx}(x,t) - 4u(x,t)\,f(x,t) + 2u_x(x,t)\int_x^{\infty} dy\,f(y,t)$$ [34]


Note that the presence of the time variable t plays
no relevant role (it is merely parametric). A
remarkable property of this operator, which
depends on u(x,t), is that any power of it acting
on $u_x(x,t)$ yields a nonlinear combination of u(x,t)
and its x-derivatives without any left-over integration,
in fact yielding a result which is itself an exact
x-derivative, ready for exact integration in case of a
further application of R, see the last term in the
right-hand side of (34). For instance

$$R\,u_x = u_{xxx} - 6u_x u = (u_{xx} - 3u^2)_x$$

$$R^2 u_x = u_{xxxxx} - 10u_{xxx}u - 20u_{xx}u_x + 30u_x u^2 = (u_{xxxx} - 10u_{xx}u - 5u_x^2 + 10u^3)_x$$


and so on. Hence the simplest nonlinear evolution
equation contained in the class (33) is the Korteweg-
de Vries (KdV) equation


$$u_t + u_{xxx} = 6u_x u$$ [35]


(corresponding to α(z) = -z; and note the identity
with (4), via the trivial rescaling q(x,t) = 3u(x,t)).
Note that, if one neglects all nonlinear contributions,
the class (33) reduces to (30) with


$$\omega(z) = -z\,\alpha(-z^2)$$
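
As a worked check of the statements above, and assuming that the recursion operator (34) is taken in the form R f = f_xx - 4u f + 2u_x ∫_x^∞ dy f(y,t) with u vanishing at infinity (the coefficient of the integral term and its limits are inferred from the first example, not read off independently), the first application of R and the linear limit can be written out as:

```latex
\begin{align*}
R\,u_x &= u_{xxx} - 4u\,u_x + 2u_x \int_x^{\infty} dy\, u_y(y,t) \\
       &= u_{xxx} - 4u\,u_x + 2u_x\,\bigl[\,0 - u(x,t)\,\bigr] \\
       &= u_{xxx} - 6u_x u = \left(u_{xx} - 3u^2\right)_x , \\[4pt]
\alpha(z) = -z \;&\Longrightarrow\; u_t = -R\,u_x = -u_{xxx} + 6u_x u , \\[4pt]
\omega(z) &= -z\,\alpha(-z^2) = -z \cdot z^2 = -z^3
  \;\Longrightarrow\; u_t = -i\,\omega\!\left(-i\,\partial_x\right)u = -u_{xxx} .
\end{align*}
```

The second line shows why no integral survives: the integrand is an exact derivative, so the integration merely returns -u(x,t). The last line confirms that dropping the nonlinear term of the KdV equation leaves exactly the linear dispersive equation with this ω.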


The solution of this class of nonlinear PDEs, (33), 
is given by a somewhat analogous procedure to that 
described above for the class of linear dispersive 
PDEs (30). 

Firstly, one introduces the spectral transform, a 
nonlinear generalization of the Fourier transform 
which indeed reduces to it if nonlinear effects are 
altogether neglected. That relevant for the class of 
PDEs (33) is based on the spectral problem 
associated with the linear Schrödinger operator 


$$L = -\left(\frac{\partial}{\partial x}\right)^2 + u(x,t), \quad -\infty < x < \infty$$ [36]


Via it, the spectral transform 


$$S[u(x,t)] = \{R(k,t),\ -\infty < k < \infty;\ p_n,\ \rho_n(t),\ n = 1, \ldots, N\}$$ [37]


is introduced. Here the function R(k,t) is the
“reflection coefficient” associated to the eigenvalue
$k^2$ of the continuous spectrum of L, while the
nonnegative number N gives the number of discrete
eigenvalues of L, and the positive quantities $p_n$ and
$\rho_n(t)$ are associated to these discrete eigenvalues,
specifically $-p_n^2$ are the “binding energies”, and
$\rho_n(t)$ the “normalization coefficients”, associated to
the “bound states” possessed by the “potential”
u(x,t). (All this terminology comes from the
interpretation of the above spectral problem in quantum-
mechanical terms). And it can be shown not only
that there is a one-to-one correspondence among a
function u(x,t) and its spectral transform S[u(x,t)],


but moreover that both the direct spectral problem 
to compute S[u(x,t)] from u(x,t) (arbitrarily 
assigned within an appropriate class), and the 
inverse spectral problem to compute u(x,t) from 
S[u(x,t)] (arbitrarily assigned within an appropriate
class), only entail solving linear equations (an ODE 
in the former case, a Fredholm integral equation in 
the latter case). 

Note that, in the above definition of the spectral transform, the time variable $t$ plays merely a parametric role. But the usefulness of this spectral transform to solve the PDE (33) resides in the fact that, if $u(x,t)$ evolves in time according to this PDE, the corresponding evolution of the spectral transform is quite simple: the number $N$ and the positive numbers $p_n$ are time-independent (as already implied by our notation), while the time evolution of the reflection coefficient $R(k,t)$ and of the normalization coefficients $\rho_n(t)$ is given by the simple linear ODEs


$$R_t(k,t) = 2ik\,\alpha(-4k^2)\,R(k,t), \qquad -\infty < k < \infty$$ [38a]

$$\dot{\rho}_n(t) = -2p_n\,\alpha(4p_n^2)\,\rho_n(t), \qquad n = 1, \ldots, N$$ [38b]

which can be readily integrated:

$$R(k,t) = R(k,0)\,\exp[2ik\,\alpha(-4k^2)\,t]$$ [39a]

$$\rho_n(t) = \rho_n(0)\,\exp[-2p_n\,\alpha(4p_n^2)\,t]$$ [39b]


Hence the solution of the initial-value problem for 
the class of nonlinear PDEs (33) can now be 
achieved via the following three steps: (i) at the 
initial time, via the solution of the direct spectral 
problem, the spectral transform S[u(x,0)] (see (37)) 
is obtained (from u(x, 0), arbitrarily assigned within 
an appropriate class); (ii) the spectral transform at 
time $t$ is then obtained via (39); (iii) by solving the
inverse spectral problem, $u(x,t)$ is obtained from
$S[u(x,t)]$ (see (37)).
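
Step (ii) is the only place where time enters, and it is purely algebraic. The following minimal Python sketch evaluates (39) for the KdV case $\alpha(z) = -z$. The function names and the sample spectral data are hypothetical, introduced here only for illustration; steps (i) and (iii), the direct and inverse spectral problems, are not implemented.

```python
import cmath
import math

def alpha_kdv(z):
    """alpha(z) = -z, the choice that selects the KdV equation (35)."""
    return -z

def evolve_spectral_data(R0, bound_states, t, alpha=alpha_kdv):
    """Advance the spectral transform from time 0 to time t via (39a) and (39b).

    R0           -- dict mapping real k to the reflection coefficient R(k, 0)
    bound_states -- list of (p_n, rho_n(0)) pairs for the discrete spectrum
    """
    Rt = {k: R * cmath.exp(2j * k * alpha(-4.0 * k * k) * t) for k, R in R0.items()}
    rho_t = [(p, rho * math.exp(-2.0 * p * alpha(4.0 * p * p) * t))
             for p, rho in bound_states]
    return Rt, rho_t

def soliton_position(p, rho):
    """xi = (1/2p) log(rho / 2p), the position associated with one bound state."""
    return math.log(rho / (2.0 * p)) / (2.0 * p)

# Illustrative (made-up) spectral data: two sampled values of R and one bound state.
R0 = {0.5: 0.3 + 0.1j, 1.0: -0.2j}
bound = [(1.0, 2.0)]
Rt, rho_t = evolve_spectral_data(R0, bound, t=0.25)
```

In this sketch the modulus of each reflection coefficient is unchanged by the evolution (only its phase rotates), while each normalization coefficient grows or decays exponentially, which is what moves the corresponding soliton at constant speed.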

The analogy of this procedure to that outlined 
above for the class of linear dispersive PDEs (30) is 
clear, and the fact that in this manner the solution 
of the initial-value problem for the nonlinear PDEs 
(33) can be achieved via a sequence of steps 
involving only the solution of linear problems is 
an indication of the integrable character of this 
class of nonlinear evolution PDEs. It also yields
considerable insight into the behavior of
these solutions, and allows one to construct classes of
explicit solutions of these equations, as we now
indicate.


Solitons 


The integrable nonlinear PDE (33) possesses the 
single-soliton solution 


$$u(x,t) = -\frac{2p^2}{\cosh^2\{p\,[x - \xi(t)]\}}$$ [40a]

$$\xi(t) = \frac{1}{2p}\,\log\left[\frac{\rho(t)}{2p}\right] = \xi(0) + vt, \qquad v = -\alpha(4p^2)$$ [40b]


to which corresponds the simple spectral transform 


$$S[u(x,t)] = \{R(k,t) = 0;\ p_1 = p,\ \rho_1(t) = \rho(t) = \rho(0)\,\exp[-2p\,\alpha(4p^2)\,t];\ N = 1\}$$ [41]


This solution, (40), describes a localized wave of 
constant shape moving with the constant speed v: 
the “soliton”. It is characterized by two (real) 
parameters, $\xi(0)$ and $p$. The first identifies the initial location of the soliton; its arbitrariness corresponds to the translation-invariant character of (33). The second, $p$, the spectral significance of which is clear from (41), determines the shape of the soliton (both its “height” $2p^2$ and its “width” $1/p$) as well as its speed $v$ (see (40b)); note that the shape is identical for all the nonlinear evolution PDEs of the class (33), while the speed depends on the function $\alpha(z)$, see (40b), namely it depends on which specific equation of the class (33) one is considering. For instance, for the KdV equation (35), corresponding to $\alpha(z) = -z$, the speed of the soliton is


$$v = 4p^2$$ [42]


thus all solitons of the KdV equation move from left to right, and taller and thinner solitons move faster than shorter and fatter ones.
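
That the profile (40) with speed (42) indeed solves the KdV equation (35) can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $\alpha(z) = -z$, approximates the derivatives by central finite differences (so the residual is small but not exactly zero), and uses helper names (`soliton`, `kdv_residual`) that are not part of the original text.

```python
import math

def soliton(x, t, p, xi0=0.0):
    """Single-soliton solution (40) for KdV: alpha(z) = -z, hence v = 4 p^2."""
    v = 4.0 * p * p
    return -2.0 * p * p / math.cosh(p * (x - xi0 - v * t)) ** 2

def kdv_residual(u, x, t, h=5e-3, dt=1e-5):
    """Central-difference estimate of u_t + u_xxx - 6 u u_x at the point (x, t)."""
    u_t = (u(x, t + dt) - u(x, t - dt)) / (2.0 * dt)
    u_x = (u(x + h, t) - u(x - h, t)) / (2.0 * h)
    u_xxx = (u(x + 2 * h, t) - 2.0 * u(x + h, t)
             + 2.0 * u(x - h, t) - u(x - 2 * h, t)) / (2.0 * h ** 3)
    return u_t + u_xxx - 6.0 * u(x, t) * u_x

p = 0.8
sample_points = [-2.0, -0.5, 0.0, 0.7, 1.5, 3.0]
worst = max(abs(kdv_residual(lambda x, t: soliton(x, t, p), x, 0.3))
            for x in sample_points)
```

The quantity `worst` is the largest residual over the sample points; it is limited by the truncation error of the finite differences rather than by the formula itself.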

More generally, every PDE of the class (33) 
possesses the N-soliton solution 


$$u(x,t) = -2\left(\frac{\partial}{\partial x}\right)^2 \log\det[I + C(x,t)]$$ [43a]


Here $I$ is the $N \times N$ unit matrix and $C(x,t)$ is the $N \times N$ matrix


$$C_{nm}(x,t) = [\rho_m(t)\,\rho_n(t)]^{1/2}\,\frac{\exp[-(p_n + p_m)\,x]}{p_n + p_m}$$ [43b]

where the time-evolution of the $\rho_n(t)$'s is given by (39b). Indeed the spectral transform of this solution is given by (37) with $R(k,t) = 0$ and $\rho_n(t)$ given by (39b). To discuss the multisolitonic phenomenology, let us focus on the KdV equation, so that the speed of each soliton is given by the simple formula (42), and let us order the $N$ positive numbers $p_n$ in increasing order,


$$p_1 < p_2 < \cdots < p_N$$


so that the corresponding soliton velocities, $v_n = 4p_n^2$, are likewise ordered in increasing order:


$$v_1 < v_2 < \cdots < v_N$$
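
The determinant formula (43) is easy to evaluate numerically. The following sketch assumes the KdV case $\alpha(z) = -z$, so that (39b) gives $\rho_n(t) = \rho_n(0)\exp(8p_n^3 t)$, and takes the second $x$-derivative by central differences. The helper names (`det`, `log_det`, `n_soliton`) and the step size are choices made here for illustration only.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting (small matrices)."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(a[r][i]))
        if a[piv][i] == 0.0:
            return 0.0
        if piv != i:
            a[i], a[piv] = a[piv], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d

def log_det(x, t, p, rho0):
    """log det[I + C(x,t)], with C from (43b) and rho_n(t) from (39b), alpha(z) = -z."""
    n = len(p)
    rho = [rho0[i] * math.exp(8.0 * p[i] ** 3 * t) for i in range(n)]
    m = [[(1.0 if i == j else 0.0)
          + math.sqrt(rho[i] * rho[j]) * math.exp(-(p[i] + p[j]) * x) / (p[i] + p[j])
          for j in range(n)] for i in range(n)]
    return math.log(det(m))

def n_soliton(x, t, p, rho0, h=1e-3):
    """u = -2 d^2/dx^2 log det[I + C]; the derivative is a central difference."""
    f = lambda y: log_det(y, t, p, rho0)
    return -2.0 * (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

# Two-soliton profile at t = 0 on a coarse grid (parameters are illustrative).
profile = [n_soliton(0.5 * k, 0.0, [0.5, 1.0], [1.0, 2.0]) for k in range(-10, 11)]
```

For $N = 1$ the matrix reduces to the single entry $\rho(t)\exp(-2px)/(2p)$ and the sketch reproduces (40a); for larger $N$ the same three functions apply unchanged.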


The N-soliton solution (43) is not so transparent, 
especially if N is large, but it becomes quite simple 
in the remote past and future: 


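$$u(x,t) \to -2\sum_{n=1}^{N} \frac{p_n^2}{\cosh^2\{p_n\,[x - \xi_n^{(\pm)}(t)]\}}, \qquad \xi_n^{(\pm)}(t) = \xi_n^{(\pm)} + v_n t, \qquad t \to \pm\infty$$


with the $2N$ (real) constants $\xi_n^{(\pm)}$ related to one
another (see below). It is thus seen that, both in the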
remote past and future, the N-soliton solution (43) 
splits into the sum of N separated solitons. In the 
remote past the solitons are ranged, from left to 
right, in order of decreasing amplitude, and they 
move to the right with speeds ordered in decreasing 
magnitude; then the taller and faster solitons 
gradually catch up and eventually “overtake” the 
fatter and slower ones (the quotation marks under- 
score the fact that whenever two, or possibly more, 
solitons get together, their individuality is in fact 
lost: for a while the solution might have just one 
peak, or instead the “overtaking” of two solitons 
may rather appear as an “exchange of identity”, 
with the taller soliton becoming fatter and the fatter 
becoming taller as they get close together until they 
separate again because the one in front, having 
become taller, speeds up while the one behind, 
having become fatter, slows down). The final out- 
come is of course that the order of the solitons gets 
altogether reversed, with the taller and faster head- 
ing the escape to the right. The most remarkable 
aspect of this phenomenology is that precisely the 
same solitons that existed in the remote past are 
found in the remote future, the only effect of their 
“interaction” having been to shift the position of the 
n-th soliton, relative to what it would have been if it 
had been moving in isolation, by the amount 


$$\delta_n = \xi_n^{(+)} - \xi_n^{(-)}$$


These $N$ shifts are moreover determined (while either the $N$ quantities $\xi_n^{(-)}$ or the $N$ quantities $\xi_n^{(+)}$ can be arbitrarily assigned), being given by the simple rule


$$\delta_n = \sum_{m=1}^{n-1} \Delta(p_n, p_m) - \sum_{m=n+1}^{N} \Delta(p_n, p_m)$$ [44a]

$$\Delta(p_n, p_m) = \frac{1}{p_n}\,\log\left|\frac{p_n + p_m}{p_n - p_m}\right|$$ [44b]


Of course in (44a) a sum vanishes if its lower limit 
exceeds its upper limit. 
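
The rule (44) is simple enough to tabulate directly. The sketch below assumes that the argument of the logarithm in (44b) is taken in absolute value, so that each pairwise shift is a positive amount for either ordering of its arguments; the function names are hypothetical.

```python
import math

def pair_shift(pn, pm):
    """Delta(p_n, p_m) of (44b): size of the shift of soliton n due to soliton m."""
    return math.log(abs((pn + pm) / (pn - pm))) / pn

def total_shifts(p):
    """delta_n of (44a); p must be sorted in increasing order, 0 < p_1 < ... < p_N."""
    n = len(p)
    return [sum(pair_shift(p[i], p[m]) for m in range(i))
            - sum(pair_shift(p[i], p[m]) for m in range(i + 1, n))
            for i in range(n)]

p = [0.5, 1.0, 1.5]
delta = total_shifts(p)

# Each pair (n, m) contributes +log|...| to p_n*delta_n and -log|...| to p_m*delta_m,
# so the weighted sum of the shifts cancels.
weighted_sum = sum(pn * dn for pn, dn in zip(p, delta))
```

The slowest soliton only receives negative contributions and the fastest only positive ones, in agreement with the description of the two-body encounters given in the text.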

This formula, (44), has a simple phenomenological significance. From the two-soliton case ($N = 2$) it is seen that in a two-body encounter the taller and faster soliton gets advanced by the amount $\Delta(p_2, p_1)$, while the slower and fatter one gets delayed by the amount $\Delta(p_1, p_2)$. Hence the overall shift (44) experienced by the $n$-th soliton in the $N$-soliton case is the sum of the $n - 1$ positive shifts derived from its “overtaking” $n - 1$ slower solitons and the $N - n$ negative shifts derived from its being “overtaken” by $N - n$ faster solitons. This outcome is obvious when each two-soliton encounter occurs separately, but is quite nontrivial in the general case when, at some intermediate time, several solitons might all meet simultaneously.

This soliton phenomenology strongly suggests
ascribing to each soliton an individuality, even 
though in configuration space it only shows up as 
a separate entity in the remote past and future. The 
separated identity of each soliton is instead quite 
clear in the spectral transform context, since each of 
them corresponds to a (time-independent) discrete 
eigenvalue of the spectral problem. Indeed in the 
spectral context this identity is clear also for the 
generic solution of the class of integrable nonlinear 
PDEs (33) which, in contrast to the purely solitonic 
solution (43), is not characterized by a vanishing
reflection coefficient R(k,t). And indeed, even in 
configuration space, the soliton phenomenology 
described above is still featured by a generic solution 
(each of which is characterized, via its spectral 
transform (37), by the number N of its solitons), up 
to the additional presence of a “background” 
component of this solution (corresponding to the 
nonvanishing reflection coefficient R(k,t)), which 
however behaves in a manner analogous to the 
solution of the linear, dispersive part of the PDE 
under consideration, becoming eventually locally 
small due to its dispersive character. 


Kinks, breathers, boomerons and trappons, dromions

The solitonic phenomenology described
above for the class of integrable PDEs (33), and in 
particular for the KdV equation (35), is more or less 
common to all integrable nonlinear evolution PDEs — 
of which many other classes exist besides (33). But 
there also are some significant differences, some of 
which we now review tersely. 

For certain integrable PDEs the typical shape of 
the soliton is not localized, but it rather has the form 


of a “kink”. Some integrable PDEs also feature 
additional kinds of localized “solitons” which, in 
isolation, move overall with constant speed as 
ordinary solitons, but feature in addition a time- 
dependent amplitude modulation and are therefore 
called “breathers”. For integrable matrix nonlinear 
evolution PDEs — or, equivalently, for integrable 
systems of coupled PDEs — the new phenomenology 
may emerge of solitons that, even in isolation, move 
with a variable speed, the change of which over 
time is correlated with the variable interplay of 
the amplitudes of the different components of the 
solution: typically such solitons come in from one 
side in the remote past and boomerang back to that 
side in the remote future (“boomerons”), or they 
may be trapped to oscillate around some fixed 
position (“trappons”); and there are integrable 
evolution equations in which both these types of 
solitons are simultaneously present in a generic 
solution. All these phenomenologies refer to the 
simpler class of integrable evolution PDEs in 1+ 1 
(one space and one time) variables, with asympto- 
tically vanishing boundary conditions (at large space 
distances; or perhaps asymptotically constant, as in 
the case of kinks). There also exist integrable 
evolution PDEs in 2+ 1 dimensions (such as the 
KP equation (9)) the generic solution of which may 
feature localized soliton-like components, although 
in this case appropriate boundary conditions play a 
crucial role (for this reason such solitons have been 
called “dromions”, hinting at their being to some 
extent driven by the boundary conditions, as objects 
moving in a stadium). 

While there are quite many (classes of) integrable 
PDEs in 1+ 1 dimensions, there are only a few in 
2+ 1 dimensions, and there is a widespread belief 
that no integrable PDEs exist in D+ 1 dimensions 
with D > 2. But already in the early days of soliton 
theory it was pointed out that there do exist quite 
many (classes of) integrable PDEs in 1+ D dimen- 
sions (namely, one space and D time variables) and 
that it is quite possible via a different formulation of 
the initial-value problem to interpret such equations 
as (no less integrable) PDEs in D + 1 dimensions (D 
space and one time variables); and integrable PDEs 
in D +1 dimensions have also been identified and 
investigated in the context of (the simpler class of) 
C-integrable PDEs (see below). 


Other properties of integrable PDEs 


For the linear evolution equations (30) the main message implied by their solvability via the Fourier transform is that the time-evolution is much simpler in Fourier space (see (32)) than in configuration space. This has a profound impact on the understanding of all phenomena describable by such equations, to the extent of determining the kind of experimental tools better suited to understand the underlying physics (for instance, the use of monochromatic beams of light, the use of high-energy
particle accelerators, and so on). The same kind of 
message is as well relevant for the class of integrable 
nonlinear PDEs solvable via the spectral transform 
technique — even more so inasmuch as the time- 
evolution is in this case so much simpler in the 
spectral space (being actually linear there, see (38) 
and (39)) than in configuration space (where the 
evolution is nonlinear, see (33)). It is indeed the 
basis for the possession by the class of integrable 
nonlinear PDEs (33) of several other remarkable 
properties as outlined tersely in the following 
subsections. 


Bäcklund transformations

A Bäcklund transformation is a formula relating two functions, say $u^{(0)}(x,t)$ and $u^{(1)}(x,t)$, so that, if one of them satisfies a (generally nonlinear) PDE, the other one satisfies the same PDE. In the context of the class (33) of integrable PDEs, such a (class of) Bäcklund transformations is provided by the formula


$$g(\Lambda)\,[u^{(0)}(x,t) - u^{(1)}(x,t)] + h(\Lambda)\,\Gamma \cdot 1 = 0$$ [45]


where $g(z)$ and $h(z)$ are two (a priori arbitrary) entire functions (say, two polynomials), while $\Lambda$ and $\Gamma$ are two integrodifferential operators the effect of which on a function $f(x,t)$ (such that all relevant integrations are convergent) reads


$$\Gamma f(x,t) = [u_x^{(0)}(x,t) + u_x^{(1)}(x,t)]\,f(x,t) + [u^{(0)}(x,t) - u^{(1)}(x,t)] \int_x^{\infty} dy\,[u^{(0)}(y,t) - u^{(1)}(y,t)]\,f(y,t)$$ [46a]

$$\Lambda f(x,t) = f_{xx}(x,t) - 2\,[u^{(0)}(x,t) + u^{(1)}(x,t)]\,f(x,t) + \Gamma \int_x^{\infty} dy\,f(y,t)$$ [46b]


Note that here the variable $t$ plays no relevant role (its presence is merely parametric), and that $\Gamma$ and $\Lambda$ depend (in a symmetrical way) on $u^{(0)}(x,t)$ and $u^{(1)}(x,t)$, whose presence causes the Bäcklund transformation (45) to be nonlinear in these functions. Also important is the observation that, for $u^{(0)}(x,t) = u^{(1)}(x,t) = u(x,t)$, the operator $\Lambda$ becomes the recursion operator $R$, see (34).

The reason why the formulas (45) constitute a class of Bäcklund transformations is that, as a property of the spectral transform based on the linear Schrödinger operator $L$, see (36), if two “potentials” $u^{(0)}(x,t)$ and $u^{(1)}(x,t)$ are related by (45), the corresponding “reflection coefficients” $R^{(0)}(k,t)$ and $R^{(1)}(k,t)$ are related algebraically, as follows:

$$g(-4k^2)\,[R^{(0)}(k,t) - R^{(1)}(k,t)] + 2ik\,h(-4k^2)\,[R^{(0)}(k,t) + R^{(1)}(k,t)] = 0$$ [47a]

entailing

$$R^{(1)}(k,t) = R^{(0)}(k,t)\,\frac{g(-4k^2) + 2ik\,h(-4k^2)}{g(-4k^2) - 2ik\,h(-4k^2)}$$ [47b]


Clearly this formula entails that, if $R^{(0)}(k,t)$ satisfies (38a), so does $R^{(1)}(k,t)$. Hence, as the fact that $R^{(0)}(k,t)$ satisfies (38a) is a consequence of the fact that $u^{(0)}(x,t)$ satisfies (33), likewise the fact that $R^{(1)}(k,t)$ satisfies (38a) provides the basis for concluding that $u^{(1)}(x,t)$ also satisfies (33).

The simplest version of the Bäcklund transformation (45) obtains by setting $g(z) = -2p\,h(z)$ with $p$ an arbitrary constant, hence it reads


$$w_x^{(0)}(x,t) + w_x^{(1)}(x,t) = 2p\,[w^{(0)}(x,t) - w^{(1)}(x,t)] - \frac{1}{2}\,[w^{(0)}(x,t) - w^{(1)}(x,t)]^2$$ [48]


Here and below we use for convenience the functions $w^{(j)}(x,t)$ related to $u^{(j)}(x,t)$ as follows:


$$w^{(j)}(x,t) = \int_x^{\infty} dy\,u^{(j)}(y,t), \qquad u^{(j)}(x,t) = -w_x^{(j)}(x,t)$$ [49]


A convenient application of Bäcklund transformations is to yield new solutions of (33) from known solutions; for instance, from the trivial solution $u^{(0)}(x,t) = w^{(0)}(x,t) = 0$ the single-soliton solution (40) can be readily obtained via (48) and (49) (of course an appropriate time-dependence must be attributed to the $x$-independent “integration constant” that obtains from the integration of (48), which is an ODE in the independent variable $x$).
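
The step from the trivial solution to the soliton can be made explicit. The following worked sketch assumes the simple Bäcklund transformation in the form $w_x^{(0)} + w_x^{(1)} = 2p\,[w^{(0)} - w^{(1)}] - \frac{1}{2}[w^{(0)} - w^{(1)}]^2$ with $w^{(0)} = 0$; the auxiliary function $y$ and the $x$-independent quantity $x_0(t)$ are notation introduced here only for illustration.

```latex
% Set w^{(0)} = 0 in (48) and write w = w^{(1)}:
w_x = -2p\,w - \tfrac{1}{2}\,w^2 .
% Substitute w = -2p\,(1 - y):
2p\,y_x = 4p^2\,(1 - y) - 2p^2\,(1 - y)^2 = 2p^2\,(1 - y^2)
\quad\Longrightarrow\quad y_x = p\,(1 - y^2) .
% The bounded solution is y = \tanh\{p\,[x - x_0(t)]\}; hence, via (49),
w(x,t) = -2p\,\bigl\{1 - \tanh\{p\,[x - x_0(t)]\}\bigr\}, \qquad
u(x,t) = -w_x(x,t) = -\frac{2p^2}{\cosh^2\{p\,[x - x_0(t)]\}} .
```

This coincides with (40a) once the “integration constant” is given the time-dependence $x_0(t) = \xi(0) + vt$ of (40b); the unbounded solution of the same ODE, with $\coth$ in place of $\tanh$, is real but singular.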

Another important property of Bäcklund transformations is their commutativity. Consider two sets of two polynomials, $g^{(m)}(z)$ and $h^{(m)}(z)$, $m = 1, 2$, and the two Bäcklund transformations (45) they generate, say BT1 and BT2. Take as starting point some function $u^{(0)}(x)$ and associate to it two functions, $u^{(1)}(x)$ respectively $u^{(2)}(x)$, obtained from $u^{(0)}(x)$ via these two Bäcklund transformations, BT1 respectively BT2. Then obtain a new function, say $u^{(12)}(x)$, from $u^{(1)}(x)$ via BT2; and likewise obtain $u^{(21)}(x)$ from $u^{(2)}(x)$ via BT1. The property of commutativity entails that, provided an appropriate choice is made of integration constants (see (45)),


$$u^{(12)}(x) = u^{(21)}(x)$$ [50]


This property is highly nontrivial when viewed, as 
we just did, in configuration space; it is instead 
rather obvious in the spectral space, indeed the 
corresponding property for the “reflection coeffi- 
cients” reads (in self-evident notation, see (47b)) 


$$R^{(12)}(k) = R^{(21)}(k) = R^{(0)}(k)\,B^{(1)}(k)\,B^{(2)}(k)$$ [51a]

$$B^{(m)}(k) = \frac{g^{(m)}(-4k^2) + 2ik\,h^{(m)}(-4k^2)}{g^{(m)}(-4k^2) - 2ik\,h^{(m)}(-4k^2)}$$ [51b]


hence it corresponds simply to the commutativity of 
the ordinary product. 


Nonlinear superposition principle

Another remarkable property of the class of evolution equations (33) is a straightforward consequence of the commutativity property, (50), of Bäcklund transformations. It reads (hereafter with a slight abuse of language we refer to “solutions” $w^{(j)}$ even though the actual solutions are the functions $u^{(j)}$ related to the $w^{(j)}$ by (49))


$$w^{(12)} = w^{(0)} - \frac{2\,(p_1 + p_2)\,(w^{(1)} - w^{(2)})}{2\,(p_1 - p_2) + w^{(1)} - w^{(2)}}$$ [52]

where $w^{(0)} = w^{(0)}(x,t)$ is an arbitrary solution of (33), $w^{(1)} = w^{(1)}(x,t)$ respectively $w^{(2)} = w^{(2)}(x,t)$ are likewise the solutions of the same PDE related to $w^{(0)}$ by the Bäcklund transformation (48) with $p = p_1$ respectively $p = p_2$, and $w^{(12)}(x,t) = w^{(21)}(x,t)$ is another solution of the same PDE. Note that this formula, for which the title of this subsection seems appropriate, provides a completely explicit, rational expression of a new solution of (33) in terms of three other solutions of the same equation: an arbitrary solution $w^{(0)}$, and the two solutions $w^{(1)}$ and $w^{(2)}$ related to it by a simple Bäcklund transformation, see (48).


Soliton ladder

A simple application of the preceding formula is to start from the trivial solution


$$w^{(0)} = 0$$ [53]


so that (see (48))

$$w^{(j)}(x,t) = -2p_j\,\bigl\{1 - \tanh[\,p_j\,(x - x^{(j)} + \alpha(4p_j^2)\,t)\,]\bigr\}, \qquad j = 1, 2$$ [54a]


where, in order that this function be real, either


$$\mathrm{Im}\,x^{(j)} = 0$$ [54b]

or

$$\mathrm{Im}\,x^{(j)} = \frac{\pi}{2p_j}$$ [54c]


Via (49), the expression (54a) with (54b) yields, for each value of $j$, a version of the single-soliton solution (40). Insertion of (53) and (54a) in (52) yields, via (49), the two-soliton solution of (33), provided $0 < p_1 < p_2$ and $x^{(1)}$ satisfies (54b) while $x^{(2)}$ satisfies (54c) (otherwise the solution produced by (52) is complex or singular).
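
The first rung of the ladder can be tried out numerically. The sketch below assumes the KdV case $\alpha(z) = -z$ and the superposition formula in the form $w^{(12)} = w^{(0)} - 2(p_1 + p_2)(w^{(1)} - w^{(2)})/[2(p_1 - p_2) + w^{(1)} - w^{(2)}]$ with $w^{(0)} = 0$. An imaginary shift of $\pi/2$ in the argument turns $\tanh$ into $\coth$, so the second seed is written directly with $\coth$. The residual being tested, $w_t + w_{xxx} + 3w_x^2$, is the $x$-integrated form of (35) for $u = -w_x$ with $w$ vanishing as $x \to +\infty$. All function names are hypothetical.

```python
import math

def w_single(x, t, p, x0, kind):
    """Seed w^(j) for KdV: tanh branch (real x^(j)) or coth branch (shifted x^(j))."""
    th = math.tanh(p * (x - x0 - 4.0 * p * p * t))
    # The coth branch is singular where its argument vanishes; avoid that point.
    return -2.0 * p * (1.0 - (th if kind == "tanh" else 1.0 / th))

def w_two_soliton(x, t, p1, p2, x1=0.0, x2=0.0):
    """Nonlinear superposition with w^(0) = 0 and 0 < p1 < p2."""
    w1 = w_single(x, t, p1, x1, "tanh")   # regular seed
    w2 = w_single(x, t, p2, x2, "coth")   # singular seed
    return -2.0 * (p1 + p2) * (w1 - w2) / (2.0 * (p1 - p2) + w1 - w2)

def potential_kdv_residual(w, x, t, h=5e-3, dt=1e-5):
    """Central-difference estimate of w_t + w_xxx + 3 w_x^2 at (x, t)."""
    w_t = (w(x, t + dt) - w(x, t - dt)) / (2.0 * dt)
    w_x = (w(x + h, t) - w(x - h, t)) / (2.0 * h)
    w_xxx = (w(x + 2 * h, t) - 2.0 * w(x + h, t)
             + 2.0 * w(x - h, t) - w(x - 2 * h, t)) / (2.0 * h ** 3)
    return w_t + w_xxx + 3.0 * w_x * w_x

two = lambda x, t: w_two_soliton(x, t, 0.5, 1.0)
worst = max(abs(potential_kdv_residual(two, x, 0.3)) for x in [-2.0, 0.5, 3.0])
```

Although the second seed is singular, the combination returned by `w_two_soliton` is smooth; the singularity of the seed is removable in the superposed function, which is why the sample points only need to avoid the exact zero of the $\coth$ argument.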

Having thus obtained the two-soliton solution, one can apply the nonlinear superposition formula (52) to get the three-soliton solutions, by inserting in place of $w^{(0)}$ the single-soliton expression (54a) (with parameter, say, $p_1$) and in place of $w^{(1)}$ and $w^{(2)}$ the two-soliton expression (with parameters $p_1$ and $p_2$ respectively $p_1$ and $p_3$); and the process can be continued, as suggested by the title of this subsection. In this manner the multisolitonic solution can be constructed by a sequence of purely algebraic operations; and simple rules can be given, detailing the restrictions on the soliton parameters $p_n$ and the reality properties of the constants $x^{(n)}$ ((54b) or (54c)), to ensure that the solution so arrived at be real and nonsingular, and thus coincide with (43).


Conservation laws

As mentioned above, integrable evolution PDEs are interpretable as infinite-dimensional dynamical systems. It is therefore natural that they possess an infinite number of conserved quantities. For instance every PDE of the class (33) possesses the following infinite sequence of conserved quantities:


$$C_n = \frac{(-1)^n}{2n+1} \int_{-\infty}^{+\infty} dx\, R^n\,[x\,u_x(x,t) + 2u(x,t)], \qquad n = 0, 1, 2, \ldots$$ [55a]


where $R$ is the recursion operator (34). An alternative definition for this sequence is


$$C_n = \frac{(-1)^n}{2n+1} \int_{-\infty}^{+\infty} dx\, \tilde{R}^n\,u(x,t), \qquad n = 0, 1, 2, \ldots$$ [55b]


where the integrodifferential operator $\tilde{R}$ is in some sense the adjoint of $R$, being defined by the formula


$$\tilde{R} f(x,t) = f_{xx}(x,t) - 4u(x,t)\,f(x,t) + 2\int_{-\infty}^{x} dy\,u_y(y,t)\,f(y,t)$$ [55c]


that specifies its action on a generic function $f(x,t)$ (such that the integration converges). The first three of these conserved quantities read as follows:


$$C_0 = \int_{-\infty}^{+\infty} dx\,u(x,t), \qquad C_1 = \int_{-\infty}^{+\infty} dx\,u^2(x,t), \qquad C_2 = \int_{-\infty}^{+\infty} dx\,[2u^3(x,t) + u_x^2(x,t)]$$
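
As a concrete illustration, the first three conserved quantities can be evaluated on the KdV single soliton, for which the integrals are elementary: $C_0 = -4p$, $C_1 = 16p^3/3$, $C_2 = -64p^5/5$. The sketch below computes them by the trapezoidal rule at two different times; the truncation interval, the grid size and the function names are choices made here for illustration.

```python
import math

def soliton(x, t, p):
    """KdV single soliton: u = -2 p^2 / cosh^2[p (x - 4 p^2 t)]."""
    return -2.0 * p * p / math.cosh(p * (x - 4.0 * p * p * t)) ** 2

def conserved(u, t, a=-40.0, b=40.0, n=8000):
    """Trapezoidal estimates of C_0, C_1, C_2 on [a, b]; u_x by central differences."""
    h = (b - a) / n
    c0 = c1 = c2 = 0.0
    for i in range(n + 1):
        x = a + i * h
        weight = 0.5 if i in (0, n) else 1.0
        ux = (u(x + h, t) - u(x - h, t)) / (2.0 * h)
        v = u(x, t)
        c0 += weight * v
        c1 += weight * v * v
        c2 += weight * (2.0 * v ** 3 + ux * ux)
    return c0 * h, c1 * h, c2 * h

p = 0.8
at_start = conserved(lambda x, t: soliton(x, t, p), 0.0)
later = conserved(lambda x, t: soliton(x, t, p), 1.5)
closed_form = (-4.0 * p, 16.0 * p ** 3 / 3.0, -64.0 * p ** 5 / 5.0)
```

For a single soliton the time-independence is of course trivial (the profile merely translates); the same `conserved` routine can be applied to any numerically available solution that decays within the chosen interval.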


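These constants of the motion (55) are functionally independent and, in the context of a Hamiltonian formulation characterized by the Poisson bracket


$$\{A, B\} = \int_{-\infty}^{+\infty} dx\, \frac{\delta A}{\delta u(x)}\,\frac{\partial}{\partial x}\,\frac{\delta B}{\delta u(x)}$$


(where $A$ and $B$ are functionals of $u(x)$ and $\delta/\delta u(x)$ denotes the functional derivative), they are in involution,


$$\{C_n, C_m\} = 0$$


Note that, in this context, the KdV PDE (35) coincides with the Hamiltonian equation


$$u_t(x,t) = \{u(x,t), H\} = \frac{\partial}{\partial x}\,\frac{\delta H}{\delta u(x,t)}$$

with

$$H = \frac{1}{2}\,C_2 = \frac{1}{2}\int_{-\infty}^{+\infty} dx\,[2u^3(x,t) + u_x^2(x,t)]$$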

Several alternative sequences of constants of 
motion also exist. For instance another infinite 
sequence is provided by the two equivalent formulas 


Cn = =g dx” -1 [56a] 


cn = (-1} J ETE [56b] 


with the integrodifferential operators R and $\Lambda_0$
defined by the formulas

Rfi) =f) f ETT

$$\Lambda_0 f(x,t) = f_{xx}(x,t) - 2u(x,t)\,f(x,t) + u_x(x,t)\int_x^{\infty} dy\,f(y,t) + u(x,t)\int_x^{\infty} dy\,u(y,t)\int_y^{\infty} dz\,f(z,t)$$


Note that the integrodifferential operator $\Lambda_0$ is just
$\Lambda$, see (46), with $u^{(0)}(x,t) = 0$ and $u^{(1)}(x,t) = u(x,t)$.
The constants $c_n$ are also all independent of each other, but there is a relationship between the constants of the two sequences, (55) and (56),


$$\sum_{n=0}^{\infty} c_n\,z^{2n+1} = \sin\left[\sum_{n=0}^{\infty} C_n\,z^{2n+1}\right]$$


which is to be understood by expanding the right-hand side in powers of $z$ and then equating the coefficients of equal powers of $z$:


$$c_0 = C_0, \qquad c_1 = C_1 - \frac{1}{6}\,C_0^3, \qquad c_2 = C_2 - \frac{1}{2}\,C_0^2\,C_1 + \frac{1}{120}\,C_0^5$$


and so on.

Of course all these conservation laws are applicable to the class of solutions of (33) defined for all (real) values of $x$ and vanishing asymptotically (as $x \to \pm\infty$). But they can also be reformulated as local “continuity equations”. And, rather remarkably, all these results hold as well for the explicitly time-dependent class of PDEs that obtains if one allows the polynomial $\alpha(z)$ in the right-hand side of (33) to feature an arbitrary time-dependence, say


$$\alpha(z,t) = \sum_{m=0}^{M} \alpha_m(t)\,z^m$$ [57]


Finally let us note that there is an additional conserved quantity for this (generalized) class of PDEs,


$$C = \int_{-\infty}^{+\infty} dx \left[ x\,u(x,t) + \int_0^t dt'\,\alpha(\tilde{R}, t')\,u(x,t) \right]$$


with $\tilde{R}$ defined by (55c). This implies that, for the generic solution of this (generalized) class of PDEs, the center of mass


$$X(t) = \frac{\int_{-\infty}^{+\infty} dx\,x\,u(x,t)}{\int_{-\infty}^{+\infty} dx\,u(x,t)}$$

moves according to the formula


$$X(t) = X_0 + \sum_{m=0}^{M} (2m+1)\,(-1)^{m+1}\,\frac{C_m}{C_0} \int_0^t dt'\,\alpha_m(t'), \qquad X_0 = \frac{C}{C_0}$$


Hence for all the autonomous evolution PDEs of the class (33) (with $\alpha(z,t) = \alpha(z)$, $\alpha_m(t) = \alpha_m$, see (57)) the center of mass of the generic solution moves uniformly,


$$X(t) = X_0 + Vt$$


with the (constant) speed


$$V = \sum_{m=0}^{M} (2m+1)\,(-1)^{m+1}\,\frac{C_m}{C_0}\,\alpha_m$$

Other techniques to identify, classify 
and investigate integrable PDEs 


The spectral transform approach on which we 
focussed above is just one of the various techniques 
used to identify and investigate integrable nonlinear 
evolution PDEs. (Incidentally, because the less
standard aspect of this approach is the inverse 
transformation to reconstruct, in the framework of 
the spectral problem, the “potential” u(x) from its 
spectral transform, this approach is often called the 
Inverse Spectral, or Scattering, Transform method — 
abbreviated as IST). In this subsection we tersely 
mention some other approaches, referring to the 
literature indicated below for more adequate 
treatments. 

One approach starts from a trivially integrable
PDE - say, linear and autonomous, see for instance 
(30) - and performs a nonlinear change of 
dependent, and possibly as well of independent, 
variables. The PDE thus obtained is generally 
integrable, indeed the term C-integrable is used to 
denote such equations (to distinguish them from 
the S-integrable equations solvable via IST: the 
letter C refers to the Change of variables, the letter 
S to the Spectral, or Scattering, transform). A 
simple instance of C-integrable equations is the 
Burgers equation (5), which is linearized via the 
change of dependent variable 


aœ) =q exp- | dyad) 


q(x, t) 


eT OA 


entailing the linear PDE 
at + qxx =0 
A second example is the “Liouville equation”


$$u_{xt} = \exp(u)$$ [58a]


or equivalently, in “light-cone coordinates” ($\xi = x + t$, $\tau = -x + t$),


$$u_{\tau\tau} - u_{\xi\xi} = \exp(u)$$ [58b]


the general solution of which reads

$$u(x,t) = f(x) + g(t) - 2\log\left\{ a \int_{x_0}^{x} dx'\,\exp[f(x')] + \frac{1}{2a} \int_{t_0}^{t} dt'\,\exp[g(t')] \right\}$$


with $f(x)$ and $g(t)$ arbitrary functions and $x_0$, $t_0$, $a$
arbitrary constants. And a third example is the
Eckhaus equation


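$$q_t = i\left[q_{xx} + 2\,(|q|^2)_x\,q + |q|^4\,q\right]$$ [59]

which is linearized by the transformation

$$\varphi(x,t) = q(x,t)\,\exp\left[\int_{-\infty}^{x} dy\,|q(y,t)|^2\right], \qquad q(x,t) = \frac{\varphi(x,t)}{\left[1 + 2\int_{-\infty}^{x} dy\,|\varphi(y,t)|^2\right]^{1/2}}$$


entailing the linear PDE


$$\varphi_t = i\,\varphi_{xx}$$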
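
The mechanics of a C-integrable change of variable can be illustrated on this last example. The sketch below assumes the transformation in the form $\varphi = q\,\exp[\int_{-\infty}^{x} dy\,|q|^2]$ with inverse $q = \varphi/[1 + 2\int_{-\infty}^{x} dy\,|\varphi|^2]^{1/2}$, truncates the integrals to a finite grid, and checks that the two maps undo each other on a sample profile. The profile, the grid and the function names are arbitrary choices made here for illustration; the time evolution itself is not simulated.

```python
import cmath
import math

def cumulative_trapezoid(values, h):
    """Running integral from the left end of the grid (trapezoidal rule)."""
    out, acc = [0.0], 0.0
    for a, b in zip(values, values[1:]):
        acc += 0.5 * h * (a + b)
        out.append(acc)
    return out

def to_linear(q, h):
    """phi = q * exp(integral of |q|^2 from the left end of the grid up to x)."""
    integral = cumulative_trapezoid([abs(v) ** 2 for v in q], h)
    return [v * math.exp(s) for v, s in zip(q, integral)]

def from_linear(phi, h):
    """q = phi / sqrt(1 + 2 * integral of |phi|^2 from the left end up to x)."""
    integral = cumulative_trapezoid([abs(v) ** 2 for v in phi], h)
    return [v / math.sqrt(1.0 + 2.0 * s) for v, s in zip(phi, integral)]

h = 0.005
xs = [-12.0 + i * h for i in range(4801)]
q = [0.9 * cmath.exp(0.7j * x) / math.cosh(x) for x in xs]   # sample profile
phi = to_linear(q, h)
q_back = from_linear(phi, h)
err = max(abs(a - b) for a, b in zip(q, q_back))
```

The round trip works because $|\varphi|^2$ is an exact $x$-derivative, $|\varphi|^2 = \frac{1}{2}\,\partial_x \exp[2\int_{-\infty}^{x} dy\,|q|^2]$, so that $1 + 2\int_{-\infty}^{x} dy\,|\varphi|^2 = \exp[2\int_{-\infty}^{x} dy\,|q|^2]$; the residual `err` only reflects the quadrature error.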

Thanks to the simplicity of the technique to 
solve them, C-integrable PDEs provide a conveni- 
ent tool to investigate the phenomenology asso- 
ciated with nonlinear PDEs. For instance the 
Burgers equation (5), which possesses kink-like 
solitons, is a simple nonlinear generalization of the 
heat equation; and the “relativistic invariance” of 
the Liouville equation, see (58b), makes it a 
convenient “toy model” in the context of relati- 
vistic field theory. The Eckhaus equation, (59), 
provides an interesting theoretical tool because of 
its similarity with the phenomenologically impor- 
tant NLS equation (6), as well as the fact that, 
thanks to its C-integrability, the structure of its 
solutions — which feature a remarkable solitonic 
zoology, including the possibility of “anelastic” 
solitonic reactions — can be studied in considerable 
detail, entailing an understanding of why such 
anelastic reactions are unlikely to be featured by 
solutions obtained in the context of the initial- 
value problem. 


C-integrable PDEs are generally as well S-integrable, 
being generally associable with a spectral problem that 
can be explicitly solved; the converse, instead, is not 
generally true. Hence C-integrability represents a 
higher level of integrability than S-integrability; a 
ranking that is quite useful in spite of its lack of strict 
cogency caused by the possibility to consider also the 
transformation from a function to its spectral trans- 
form as a change of (dependent) variable. 

The Lax approach, described in some detail above 
in the context of finite-dimensional integrable 
dynamical systems, was in fact originally invented 
in the context of integrable PDEs. For instance the 
KdV equation (35) corresponds to the (operator) 
Lax equation (to be compared with the matrix Lax 
equation (14)) 


$$L_t = [L, M]$$


where now the Schrödinger operator $L$ is defined by
(36) (so that $L_t = u_t(x,t)$) and the operator $M$ is
defined as follows:


M = —4 - ae ae (X7) 
7 Ox Ot ON ~~ 
Closely connected with this approach is the AKNS method (due to M. J. Ablowitz, D. J. Kaup, A. C. Newell and H. Segur), based on the observation that the KdV equation (35) coincides with the integrability condition


$$\psi_{xxt} = \psi_{txx}$$ [60]


for the following pair of linear PDEs (the first of which is just the eigenvalue equation for the Schrödinger operator $L$, see (36)) satisfied by the function $\psi(x,k,t)$:


$$\psi_{xx} = [u(x,t) - k^2]\,\psi$$ [61a]

$$\psi_t = [-u_x(x,t) + 4ik^3]\,\psi + 2\,[u(x,t) + 2k^2]\,\psi_x$$ [61b]


and, more generally, that every equation of the class (33) coincides with the integrability condition (60) for the eigenvalue equation (61a) and the equation


$$\psi_t = a(x,k,t)\,\psi + b(x,k,t)\,\psi_x$$ [61c]


with an appropriate choice of the two functions $a(x,k,t)$ and $b(x,k,t)$. Indeed this ansatz, (61c), with $a(x,k,t)$ and $b(x,k,t)$ low-order polynomials in $k$, provides a quite straightforward technique to identify the simpler equations of the class (33); ditto for the extension of this approach based on more general eigenvalue problems than (61a).
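
The compatibility computation behind this statement is short. The following worked sketch assumes the pair in the form $\psi_{xx} = (u - k^2)\,\psi$ and $\psi_t = a\,\psi + b\,\psi_x$ with $a = -u_x + 4ik^3$, $b = 2u + 4k^2$; the shorthand $q = u - k^2$ is introduced here only for brevity.

```latex
% Differentiate \psi_{xx} = q\,\psi in t, and \psi_t = a\,\psi + b\,\psi_x twice in x,
% using \psi_{xx} = q\,\psi and \psi_{xxx} = q_x\,\psi + q\,\psi_x:
\psi_{xxt} = u_t\,\psi + q\,(a\,\psi + b\,\psi_x) ,
\psi_{txx} = (a_{xx} + a\,q + 2\,b_x\,q + b\,q_x)\,\psi + (2\,a_x + b_{xx} + b\,q)\,\psi_x .
% Coefficient of \psi_x:  2a_x + b_{xx} = -2u_{xx} + 2u_{xx} = 0  (satisfied identically).
% Coefficient of \psi:
u_t = a_{xx} + 2\,b_x\,q + b\,q_x
    = -u_{xxx} + 4\,u_x\,(u - k^2) + (2u + 4k^2)\,u_x
    = -u_{xxx} + 6\,u\,u_x .
```

All powers of $k$ cancel, leaving $u_t + u_{xxx} = 6\,u_x\,u$, namely (35); the $x$-independent term $4ik^3$ in $a$ drops out of the condition and only fixes the normalization of $\psi$.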

Another powerful approach suitable to identify 
and investigate integrable PDEs is the so-called 
“dressing method” (introduced by V. E. Zakharov 
and A. B. Shabat and pursued by many others), in 
which one starts again (as in the approach leading to 
C-integrable equations) from an easily solvable 
evolution equation and then performs transforma- 
tions (less elementary than just a change of 
variables) that modify (“dress”) the original equa- 
tion, obtaining thereby new (nontrivial and interest- 
ing) evolution equations, the integrability of which 
hinges on the control one has on the (dressing) 
transformation relating (both ways) the solutions of 
the new equations with those of the original 
equation. Of course many specific techniques are 
accommodated within this (admittedly vague) 
description; we must confine our remarks here to 
noting the crucial role that the Riemann-Hilbert 
problem generally plays in this context (indeed the 
Riemann-Hilbert problem also lies at the core of the 
solvability of the inverse spectral problem, although 
techniques not explicitly relying on it are also 
available). 

Algorithmic approaches, particularly suitable to manufacture multisolitonic solutions and to identify nonlinear PDEs that are integrable inasmuch as they feature such solutions, were developed already at the beginning of the 1970s. The pioneer of this approach was R. Hirota; less than a decade later a more sophisticated and general development, the so-called “tau-function” method, was invented by M. Sato and his pupils/collaborators.

Finally let us mention that many remarkable 
connections exist among integrable PDEs and 
integrable finite-dimensional dynamical systems 
such as those discussed above; for instance the 
time-evolution (generally taking place in the complex
plane) of the poles of rational solutions of
certain integrable PDEs obeys the equations of
motion of integrable dynamical systems interpretable
as many-body problems.


Why are certain nonlinear PDEs both integrable 
and widely applicable? 


Several integrable PDEs play a key role in various 
applicative contexts, justifying the question figuring 
as title of this subsection. A metamathematical but 
enlightening, and heuristically quite useful, reply to 
this question reads as follows. 

Consider as starting point a large class of non- 
linear PDEs, and associate to it via some kind of 
asymptotic limit procedure a single nonlinear
PDE, to which it is then justified to attribute a
certain universal character. If this procedure corre- 
sponds to a physically (or, more generally, applica- 
tively) significant limit, it stands to reason that this 
universal PDE play a role in several applicative 
contexts (because the original class of PDEs, being 
large, certainly contains several equations of appli- 
cative relevance). And if the limit procedure is in 
some sense asymptotically exact, and it therefore 
preserves the property of integrability, it is also 
likely that this universal PDE be integrable, because 
for this it is sufficient that the original, large class of 
PDEs contain just one integrable PDE. 

For instance most phenomena characterized by a 
dominant dispersive plane wave in a weakly non- 
linear context can be shown, via an asymptotically 
exact multiscale expansion, to be modeled by the 
Nonlinear Schrödinger equation (6), the solution of
which provides then the evolution, in appropriately 
rescaled “slow” and “coarse-grained” time and 
space variables, of the amplitude modulation of the 
dominant dispersive wave. This explains why this 
nonlinear PDE plays a key role in so many, disparate 
applicative contexts, and it also implies, in the light 
of the above argument, its integrability. 

The reasoning outlined above is quite robust, 
and it allows one to infer that, if instead the universal
limit equation is not integrable, then the large class 
of PDEs from which it originates cannot contain 
any integrable equation, providing thereby the 
point of departure to obtain (quite useful) neces- 
sary conditions for integrability. Indeed these 
conditions are adequate to distinguish between
different levels of integrability, for instance between
C-integrability and S-integrability; with the
Eckhaus equation (59) playing in this context a 
somewhat analogous role for C-integrable PDEs to 
that played by the Nonlinear Schrödinger equation 
(6) for S-integrable PDEs. 


Outlook 


Many more important developments than could be 
covered in this overview have occurred in the last 
few decades; for these we refer to the books listed 
below (and there are many more), and to the 
literature cited there. 

Let us end this entry by emphasizing that both the 
study of integrable systems and its application to 
phenomenologically interesting situations (including 
technological innovations, for instance in nonlinear 
optics and telecommunications) are still at the 
forefront of current research, although perhaps the 
“heroic era” of this field of study is over. 
Waves; Calogero–Moser–Sutherland Systems of 


Introduction 


We present the theory of hydrodynamic behavior of 
interacting particle systems in the context of exclu- 
sion processes, in which no more than one particle 
per site is allowed. 

Denote by $\mathbb{T}_N = \mathbb{Z}/N\mathbb{Z}$ the discrete torus with $N$ 
points and let $\mathbb{T}_N^d = (\mathbb{T}_N)^d$. The state space 
$E_N = \{0,1\}^{\mathbb{T}_N^d}$ consists of all configurations obtained by 
distributing particles on the discrete torus $\mathbb{T}_N^d$ respecting 
the exclusion rule which prevents more than one 
particle per site. The configurations are denoted by the 
Greek letter $\eta$, so that $\eta(x)$ is equal to 0 or 1 if site 
$x \in \mathbb{T}_N^d$ is vacant or occupied for the configuration $\eta$. 

Denote by $\{\tau_x : x \in \mathbb{Z}^d\}$ the group of translations 
in $E_N$: $(\tau_x\eta)(z) = \eta(x+z)$ for each $x$, $z$ in $\mathbb{Z}^d$. Here 
and below summations are performed modulo $N$. A 
function $f : \{0,1\}^{\mathbb{Z}^d} \to \mathbb{R}$ with finite support is called 
a cylinder function. 

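Fix a family of non-negative cylinder functions 
$c_j$, $1 \le j \le d$. Let $c_{x,x+e_j}(\eta) = c_j(\tau_x\eta)$ and consider the 
Markov process $\{\eta_t : t \ge 0\}$ on $E_N$ with generator $L_N$ 
given by 

$$(L_N f)(\eta) = \sum_{j=1}^{d} \sum_{x \in \mathbb{T}_N^d} c_{x,x+e_j}(\eta)\,\big[f(\sigma^{x,x+e_j}\eta) - f(\eta)\big] \qquad [1]$$

Here, $\{e_1,\dots,e_d\}$ stands for the canonical basis of $\mathbb{R}^d$ 
and $\sigma^{x,y}\eta$ for the configuration obtained from $\eta$ by 
exchanging the occupation variables $\eta(x)$ and $\eta(y)$: 

$$(\sigma^{x,y}\eta)(z) = \begin{cases} \eta(z) & \text{if } z \ne x, y \\ \eta(y) & \text{if } z = x \\ \eta(x) & \text{if } z = y \end{cases} \qquad [2]$$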


In this dynamics at each bond $\{x, x+e_j\}$ the 
occupation variables $\eta(x)$, $\eta(x+e_j)$ are exchanged 
at rate $c_{x,x+e_j}(\eta)$. This happens simultaneously and 
independently at each bond. 
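
The exchange dynamics just described can be sketched in a few lines of Python. This is a minimal illustration restricted to the one-dimensional torus, and the function names (`exchange`, `simulate_exclusion`) and the Gillespie-style time stepping are choices made here, not part of the original text. The rate function passed in plays the role of $c_{x,x+1}(\eta)$.

```python
import random


def exchange(eta, x, y):
    """Return sigma^{x,y} eta: a copy of eta with eta(x) and eta(y) swapped."""
    new = list(eta)
    new[x], new[y] = new[y], new[x]
    return new


def simulate_exclusion(eta, rate, t_max, rng):
    """Simulate the exchange dynamics on the torus Z/NZ up to time t_max.

    rate(eta, x) is the exchange rate of the bond {x, x+1} (hypothetical
    signature). Bonds ring independently; the next ringing bond is chosen
    with probability proportional to its rate (Gillespie algorithm).
    """
    n = len(eta)
    eta = list(eta)
    t = 0.0
    while True:
        rates = [rate(eta, x) for x in range(n)]
        total = sum(rates)
        if total == 0.0:
            return eta
        t += rng.expovariate(total)  # waiting time until the next exchange
        if t > t_max:
            return eta
        u = rng.random() * total
        acc = 0.0
        for x, r in enumerate(rates):
            acc += r
            if u < acc:
                break
        eta = exchange(eta, x, (x + 1) % n)
```

Because the only moves are exchanges, the number of particles `sum(eta)` is the same before and after any call, whatever rate function is supplied.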


Notice that the total number of particles is 
conserved by the dynamics since only exchanges are 
allowed. Denote by $\Sigma_{N,K}$ $(0 \le K \le N^d)$ the hyperplane 
of all configurations $\eta$ of $E_N$ with $K$ particles. 
Assume that the rates $c_j$ are nondegenerate for $\eta_t$ to 
be an irreducible Markov process on each $\Sigma_{N,K}$. 

For $0 \le \alpha \le 1$, denote by $\nu_\alpha^N$ the Bernoulli 
product measure of parameter $\alpha$ on $E_N$. Under $\nu_\alpha^N$, 
the variables $\{\eta(x), x \in \mathbb{T}_N^d\}$ are independent, with 
marginals given by 

$$\nu_\alpha^N\{\eta(x) = 1\} = \alpha = 1 - \nu_\alpha^N\{\eta(x) = 0\}$$

Assume that the measures $\nu_\alpha^N$, $0 \le \alpha \le 1$, are stationary 
for the Markov process $\eta_t$. An elementary 
computation shows that this is the case if each function 
$c_j$ does not depend on $\eta(0)$, $\eta(e_j)$, in which case the 
process is in fact reversible with respect to $\nu_\alpha^N$. 

Let $\mathcal{M}_+(\mathbb{T}^d)$ be the space of finite positive 
measures on the torus $\mathbb{T}^d$ endowed with the 
weak topology. For each configuration $\eta$, let 
$\pi^N = \pi^N(\eta, du)$ be the positive measure on $\mathbb{T}^d$ 
obtained by assigning mass $N^{-d}$ to each particle: 

$$\pi^N = N^{-d} \sum_{x \in \mathbb{T}_N^d} \eta(x)\,\delta_{x/N}(du) \qquad [3]$$

where $\delta_u$ stands for the Dirac measure on $u$. The 
measure $\pi^N$ is called the empirical measure associated 
to the configuration $\eta$. The integral of a 
continuous function $G : \mathbb{T}^d \to \mathbb{R}$ with respect to $\pi^N$ 
is denoted by 

$$\langle \pi^N, G \rangle = N^{-d} \sum_{x \in \mathbb{T}_N^d} G(x/N)\,\eta(x)$$


Fix a density profile $\rho_0 : \mathbb{T}^d \to [0,1]$. A sequence 
of probability measures $\mu^N$ on $E_N$ is said to be 
associated to $\rho_0$ if $\pi^N$ converges in probability to 
$\rho_0(u)\,du$ under $\mu^N$: 

$$\lim_{N\to\infty} \mu^N\Big\{ \Big| \langle \pi^N, G \rangle - \int_{\mathbb{T}^d} G(u)\,\rho_0(u)\,du \Big| > \delta \Big\} = 0$$

for all continuous functions $G : \mathbb{T}^d \to \mathbb{R}$ and all $\delta > 0$. 
For a continuous profile $\rho_0$ consider, for instance, the 
product measure $\nu^N_{\rho_0(\cdot)}$ on $E_N$ whose marginals are 
given by 

$$\nu^N_{\rho_0(\cdot)}\{\eta(x) = 1\} = \rho_0(x/N)$$

It is easy to check that the sequence of probability 
measures $\nu^N_{\rho_0(\cdot)}$ is associated to $\rho_0$. 
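
A small numerical sketch may make the notion of a sequence "associated to a profile" concrete. The code below works on the one-dimensional torus only; the function names and the particular profile used in the usage note are illustrative assumptions, not taken from the text.

```python
import math
import random


def empirical_integral(eta, G):
    """<pi^N, G> = N^{-1} sum_x G(x/N) eta(x) on the one-dimensional torus."""
    n = len(eta)
    return sum(G(x / n) * eta[x] for x in range(n)) / n


def sample_product_measure(rho0, n, rng):
    """Sample eta with independent occupations, P{eta(x) = 1} = rho0(x/N)."""
    return [1 if rng.random() < rho0(x / n) else 0 for x in range(n)]
```

For example, with the assumed profile $\rho_0(u) = 0.5 + 0.4\sin(2\pi u)$ and test function $G(u) = \sin(2\pi u)$, the integral $\int G\rho_0\,du$ equals $0.2$, and `empirical_integral(sample_product_measure(rho0, n, rng), G)` should be close to that value for large `n`, with fluctuations of order $N^{-1/2}$.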

Denote by $W_{x,x+e_j}$ the instantaneous current of 
particles from $x$ to $x+e_j$. This is the rate at which a 
particle jumps from $x$ to $x+e_j$ minus the rate at 
which a particle jumps from $x+e_j$ to $x$: 

$$W_{x,x+e_j} = \{\eta(x) - \eta(x+e_j)\}\,c_{x,x+e_j}(\eta)$$

Suppose that the mean value of the current vanishes 
under all stationary states $\nu_\alpha^N$. This means that the 
average displacement of each particle vanishes in the 
mean. In particular, in view of the central limit 
theorem, to observe an evolution of the density in 
the macroscopic scale, a diffusive rescaling of time is 
needed. On the other hand, if there is a net flux of 
particles, the evolution has to be examined in the 
Euler scale $tN$. 

Denote by $\theta(N)$ the time rescaling: $N^2$ if the mean 
displacement of particles vanishes and $N$ otherwise. 
For each probability measure $\mu^N$ on $E_N$, let $\mathbb{P}_{\mu^N}$ be 
the probability measure on the path space 
$D(\mathbb{R}_+, E_N)$ induced by $\mu^N$ and the Markov process 
$\eta_t$ speeded up by $\theta(N)$. Expectation with respect to 
$\mathbb{P}_{\mu^N}$ is denoted by $\mathbb{E}_{\mu^N}$. 
Denote by $\pi_t^N(du) = \pi^N(\eta_{t\theta(N)}, du)$ the empirical 
measure at time $t$. Fix a density profile $\rho_0 : \mathbb{T}^d \to [0,1]$ 
and a sequence of probability measures $\mu^N$ on 
$E_N$ associated to $\rho_0$. The goal of the theory of 
hydrodynamic limit of interacting particle systems is to 
show that for each $t \ge 0$, $\pi_t^N$ converges, as $N \uparrow \infty$, to 
a deterministic path $\pi(t, du) = \rho(t,u)\,du$ whose density 
$\rho$ is the solution of some partial differential equation, 
called the hydrodynamic equation. 

The main tools available are entropy production 
and Dirichlet forms. Denote by $H_N(\mu^N \mid \nu^N)$ the 
entropy of a probability measure $\mu^N$ on $E_N$ with 
respect to a reference probability measure $\nu^N$: 

$$H_N(\mu^N \mid \nu^N) = \sup_f \Big\{ \int_{E_N} f\,d\mu^N - \log \int_{E_N} e^f\,d\nu^N \Big\}$$

where the supremum is carried over all functions 
$f : E_N \to \mathbb{R}$. 
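
On a finite state space the variational formula above can be checked directly: the supremum is attained at $f = \log(d\mu/d\nu)$, where it equals the familiar sum $\sum \mu \log(\mu/\nu)$. The following sketch uses hypothetical function names and represents measures as lists of weights on a common finite set.

```python
import math


def relative_entropy(mu, nu):
    """H(mu | nu) = sum_i mu_i log(mu_i / nu_i) for measures on a finite set."""
    return sum(m * math.log(m / v) for m, v in zip(mu, nu) if m > 0)


def variational_functional(f, mu, nu):
    """The quantity int f dmu - log int e^f dnu appearing inside the supremum."""
    mean_f = sum(fi * m for fi, m in zip(f, mu))
    log_moment = math.log(sum(math.exp(fi) * v for fi, v in zip(f, nu)))
    return mean_f - log_moment
```

Evaluating `variational_functional` at any other `f` gives a value no larger than `relative_entropy(mu, nu)`, which is the content of the supremum.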

It follows from the general theory of Markov 
processes that the entropy of the state of the process 
with respect to an invariant state decreases in time. 
The rate at which the entropy decreases 
can be estimated by the Dirichlet form: let $S_t^N$ be the 
semigroup associated to the generator $L_N$ defined in 
[1] speeded up by $\theta(N)$. An elementary computation 
gives that 

$$H_N(\mu^N S_t^N \mid \nu_\alpha^N) + 2\,\theta(N) \int_0^t ds\, I_N(\mu^N S_s^N) \le H_N(\mu^N \mid \nu_\alpha^N)$$

Here, $I_N(\mu)$ is the convex and lower semicontinuous 
functional given by 

$$I_N(\mu^N) = -\big\langle f^{1/2}, L_N f^{1/2} \big\rangle_{\nu_\alpha^N}$$

where $f$ stands for the Radon–Nikodym derivative 
$d\mu^N/d\nu_\alpha^N$ and $\langle\cdot,\cdot\rangle_{\nu_\alpha^N}$ for the scalar product in 
$L^2(\nu_\alpha^N)$. 

Therefore, if the initial state $\mu^N$ has entropy with 
respect to a reference measure $\nu_\alpha^N$ bounded by $C_0 N^d$, 
by convexity of $I_N$, 

$$N^{-d} H_N(\mu^N S_t^N \mid \nu_\alpha^N) + 2t\,\theta(N)\,N^{-d}\, I_N\Big( t^{-1} \int_0^t ds\, \mu^N S_s^N \Big) \le C_0 \qquad [4]$$

for all $t \ge 0$. This elementary estimate plays a 
fundamental role in the following sections. 


The Entropy Method 


Consider an exclusion process with generator given 
by [1]. Fix $T > 0$, a density profile $\rho_0 : \mathbb{T}^d \to [0,1]$ 
and a sequence of probability measures $\mu^N$ associated 
to $\rho_0$. Let $Q_{\mu^N}$ be the measure on the path 
space $D([0,T], \mathcal{M}_+(\mathbb{T}^d))$ induced by the process $\pi_t^N$ 
and the initial state $\mu^N$. 

To prove that $\pi_t^N$ converges to $\rho(t,u)\,du$ in 
probability, we first show that the sequence $Q_{\mu^N}$ 
converges to the probability measure $Q^*$ concentrated 
on the deterministic trajectory $\rho(t,u)\,du$, 
whose density is the solution of some partial 
differential equation with initial condition $\rho_0$. It 
follows from this result and general arguments that 
$\pi_t^N$ converges to $\rho(t,u)\,du$ for each $0 \le t \le T$. 

To prove that $Q_{\mu^N}$ converges to $Q^*$, assume that 
we are able to prove tightness of the sequence $Q_{\mu^N}$. 
Since there is at most one particle per site, all limit 
points $Q^*$ of the sequence $Q_{\mu^N}$ are concentrated on 
trajectories $\pi(t, du) = \rho(t,u)\,du$, which are absolutely 
continuous with respect to the Lebesgue measure. 

To characterize the limit points $Q^*$, fix a smooth 
function $G : \mathbb{T}^d \to \mathbb{R}$ and consider the martingale 

$$M_t^{G,N} = \langle \pi_t^N, G \rangle - \langle \pi_0^N, G \rangle - \int_0^t \theta(N)\, L_N \langle \pi_s^N, G \rangle\, ds \qquad [5]$$

An elementary computation of its quadratic variation 
shows that $M_t^{G,N}$ vanishes in $L^2(\mathbb{P}_{\mu^N})$ as $N \uparrow \infty$. 

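Denote by $C_0$ the space of cylinder functions 
which have zero mean with respect to all invariant 
states $\nu_\alpha^N$. Assume that the currents $W_{0,e_j}$, $1 \le j \le d$, 
belong to $C_0$ so that a diffusive scaling $\theta(N) = N^2$ is 
in force. Notice that 

$$L_N \eta(x) = \sum_{j=1}^{d} \big\{ W_{x-e_j,x} - W_{x,x+e_j} \big\}$$

In particular, after a summation by parts, the 
integral term on the right-hand side of [5] can be 
written as 

$$\int_0^t N^{1-d} \sum_{j=1}^{d} \sum_{x \in \mathbb{T}_N^d} (\partial_{u_j}^N H)(x/N)\, W_{x,x+e_j}(s)\, ds \qquad [6]$$

where $H$ stands for the test function $G$ of [5] and 
$(\partial_{u_j}^N H)(x/N) = N\{H((x+e_j)/N) - H(x/N)\}$. 
Notice that this sum is in principle of order $N$. 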
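
The identity $L_N\eta(x) = \sum_j \{W_{x-e_j,x} - W_{x,x+e_j}\}$ used above is a discrete continuity equation, and it can be verified by brute force on a small one-dimensional torus. The sketch below picks, purely as an example, the rate $c(\eta) = 1 + \eta(-e_1) + \eta(2e_1)$ that is studied later in this section; the function names are hypothetical.

```python
def rate(eta, x):
    """Example rate c_{x,x+1}(eta) = 1 + eta(x-1) + eta(x+2) on the torus."""
    n = len(eta)
    return 1 + eta[(x - 1) % n] + eta[(x + 2) % n]


def current(eta, x):
    """Instantaneous current W_{x,x+1} = {eta(x) - eta(x+1)} c_{x,x+1}(eta)."""
    n = len(eta)
    return (eta[x] - eta[(x + 1) % n]) * rate(eta, x)


def generator_on_occupation(eta, x):
    """(L_N f)(eta) for f(eta) = eta(x), summed bond by bond as in [1]."""
    n = len(eta)
    total = 0
    for y in range(n):
        z = (y + 1) % n
        swapped = list(eta)
        swapped[y], swapped[z] = swapped[z], swapped[y]
        total += rate(eta, y) * (swapped[x] - eta[x])
    return total
```

Only the two bonds touching `x` contribute to the sum, which is why the result reduces to the difference of the incoming and outgoing currents.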

To illustrate the entropy method, consider the 
symmetric simple exclusion process obtained by 
taking $c_j = 1/2$ in [1] and observe that the current 
$W_{0,e_j} = (1/2)\{\eta(0) - \eta(e_j)\}$. A second summation by 
parts permits us to rewrite the martingale [5] as 

$$\langle \pi_t^N, G \rangle - \langle \pi_0^N, G \rangle - \frac{1}{2} \int_0^t \langle \pi_s^N, \Delta_N G \rangle\, ds$$

where $\Delta_N$ is the discrete Laplacian. 

Since the martingale $M_t^{G,N}$ vanishes in $L^2(\mathbb{P}_{\mu^N})$ 
as $N \uparrow \infty$, all limit points $Q^*$ are concentrated on 
weak solutions of the linear heat equation. It remains 
to recall that there is a unique weak solution of the 
Cauchy problem for the heat equation to conclude 
that the sequence $Q_{\mu^N}$ converges to $Q^*$, the 
measure concentrated on the deterministic path 
$\pi_t(du) = \rho(t,u)\,du$ whose density $\rho$ is the solution of 
the heat equation with initial condition $\rho_0$. 
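
For the symmetric simple exclusion process the convergence can be seen at the level of mean values, because $L_N\eta(x)$ is linear in $\eta$: with $c_j = 1/2$ and diffusive time scaling, the mean occupation $\rho_t^N(x)$ solves the discrete heat equation $\partial_t \rho_t^N(x) = \tfrac{1}{2}N^2[\rho_t^N(x+1) + \rho_t^N(x-1) - 2\rho_t^N(x)]$ exactly. The sketch below integrates this equation in one dimension with an explicit Euler scheme (a numerical choice made here, not in the text) so that it can be compared with the solution of $\partial_t\rho = \tfrac{1}{2}\partial_u^2\rho$.

```python
import math


def evolve_mean_density(rho, t, dt_factor=0.2):
    """Integrate d/dt rho(x) = (N^2 / 2) * (discrete Laplacian of rho)(x).

    rho is the list of mean occupations on the torus Z/NZ and t is the
    macroscopic time. The step dt = dt_factor / N^2 keeps the explicit
    Euler scheme stable.
    """
    n = len(rho)
    dt = dt_factor / n ** 2
    steps = int(round(t / dt))
    rho = list(rho)
    for _ in range(steps):
        rho = [
            rho[x] + dt * 0.5 * n ** 2
            * (rho[(x + 1) % n] + rho[(x - 1) % n] - 2.0 * rho[x])
            for x in range(n)
        ]
    return rho
```

Starting from the assumed profile $0.5 + 0.4\sin(2\pi u)$, the continuum solution is $0.5 + 0.4\,e^{-2\pi^2 t}\sin(2\pi u)$, and the discrete evolution stays close to it for moderate `n`.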

The symmetric simple exclusion process has the 
very special property that the martingale $M_t^{G,N}$ can 
be written as a function of the empirical measure. 
This is not the case for all the other models, for 
which a further argument is needed to close eqn [5] 
in terms of the empirical measure. 

To present the additional arguments needed, 
assume that $c_j(\eta) = 1 + [\eta(-e_j) + \eta(2e_j)]$. In this 
case, the current $W_{0,e_j}$ is equal to 

$$\{\eta(0) - \eta(e_j)\} + \{\eta(0)\eta(-e_j) - \eta(e_j)\eta(2e_j)\} + \{\eta(0)\eta(2e_j) - \eta(-e_j)\eta(e_j)\}$$

A second summation by parts in [6] permits us to 
rewrite it as 

$$\int_0^t N^{-d} \sum_{j=1}^{d} \sum_{x \in \mathbb{T}_N^d} (\partial_{u_j}^2 H)(x/N)\, \tau_x h(\eta_{sN^2})\, ds + o_N(1) \qquad [7]$$

where $h(\eta) = \eta(0) + 2\eta(0)\eta(-e_j) - \eta(0)\eta(2e_j)$. The 
remainder $o_N(1)$ appears because we replaced discrete 
space derivatives by continuous ones. 

In contrast with the symmetric simple exclusion 
process, the martingale $M_t^{G,N}$ defined in [5] is not a 
function of the empirical measure and an argument 
is needed to close the equation. 

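For each positive integer $\ell$ and $d$-dimensional 
integer $x$, denote by $\eta^\ell(x)$ the empirical density of 
particles in a box of length $2\ell + 1$ centered at $x$: 

$$\eta^\ell(x) = \frac{1}{(2\ell+1)^d} \sum_{|y - x| \le \ell} \eta(y)$$

For a cylinder function $h : E_N \to \mathbb{R}$, let $\tilde h(\alpha)$ be the 
expected value of $h$ with respect to the invariant 
state $\nu_\alpha^N$: $\tilde h(\alpha) = E_{\nu_\alpha^N}[h(\eta)]$. For $\ell \ge 1$ and a cylinder 
function $h$, let 

$$V_{\ell,h}(\eta) = \Big| \frac{1}{(2\ell+1)^d} \sum_{|y| \le \ell} (\tau_y h)(\eta) - \tilde h(\eta^\ell(0)) \Big|$$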


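Theorem 1 Consider a sequence of probability 
measures $m^N$ on $E_N$ such that $I_N(m^N) \le C_0 N^{d-2}$ for 
some $0 < \alpha < 1$ and some finite constant $C_0$. Then, 

$$\limsup_{\varepsilon \to 0}\, \limsup_{N \to \infty}\, E_{m^N}\Big[ N^{-d} \sum_{x \in \mathbb{T}_N^d} \tau_x V_{\varepsilon N, h}(\eta) \Big] = 0$$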


This statement, due to Guo et al. (1988), permits 
the replacement of a local function h by a function 
of the density of particles over a macroscopic cube. 
It is the main step in the proof of the hydrodynamic 
behavior of gradient systems, defined below, and its 
proof can be found in Kipnis and Landim (1999, 
chapter 5). 

Assume that the sequence $\mu^N$ has entropy with 
respect to a reference invariant state $\nu_\alpha^N$ bounded by 
$C_0 N^d$ for some finite constant $C_0$. It follows from 
[4] that the sequence of measures $T^{-1}\int_0^T ds\, \mu^N S_s^N$ 
satisfies the assumptions of Theorem 1. Therefore, 
due to the presence of the time integral, we may 
replace the cylinder function $h$ in [7] by $\tilde h(\eta^{\varepsilon N}(x))$. 
Since $\eta^{\varepsilon N}(0)$ can be written as $\langle \pi^N, \iota_\varepsilon \rangle$, where 
$\iota_\varepsilon = (2\varepsilon)^{-d}\,\mathbf{1}\{[-\varepsilon,\varepsilon]^d\}$, we now have expressed the 
martingale [5] in terms of the empirical measure. 

Repeating the arguments presented for the symmetric 
simple exclusion process, we may conclude 
that all limit points $Q^*$ of the sequence $Q_{\mu^N}$ are 
concentrated on paths $\pi_t(du) = \rho(t,u)\,du$, whose 
density $\rho$ is a weak solution of the parabolic 
equation 

$$\begin{cases} \partial_t \rho = \Delta(\rho + \rho^2) \\ \rho(0,\cdot) = \rho_0(\cdot) \end{cases}$$

because $\tilde h(\alpha) = \alpha + \alpha^2$ for $h(\eta) = \eta(0) + 2\eta(0)\eta(-e_j) - \eta(0)\eta(2e_j)$. 
It remains to show the uniqueness of 
weak solutions of this differential equation to 
conclude. 
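
The value $\tilde h(\alpha) = \alpha + \alpha^2$ quoted above follows from independence under the Bernoulli product measure: $E[\eta(0)] = \alpha$ and each product of two distinct sites has mean $\alpha^2$, so $\tilde h(\alpha) = \alpha + 2\alpha^2 - \alpha^2$. A short enumeration confirms it; the helper names below are hypothetical and the configuration is represented as a dictionary keyed by the site offsets $-1, 0, 2$ along $e_j$.

```python
from itertools import product


def h(eta):
    """h(eta) = eta(0) + 2 eta(0) eta(-e) - eta(0) eta(2e)."""
    return eta[0] + 2 * eta[0] * eta[-1] - eta[0] * eta[2]


def bernoulli_expectation(func, sites, alpha):
    """Exact mean of func under the product Bernoulli(alpha) measure on sites."""
    total = 0.0
    for values in product((0, 1), repeat=len(sites)):
        weight = 1.0
        for v in values:
            weight *= alpha if v == 1 else 1.0 - alpha
        total += weight * func(dict(zip(sites, values)))
    return total
```

Calling `bernoulli_expectation(h, (-1, 0, 2), alpha)` returns $\alpha + \alpha^2$ up to rounding for any `alpha` in $[0,1]$.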

The second integration by parts in [6] was possible 
because the currents could be written as the difference 
of local functions and their translations, a very special 
property not shared by most interacting particle 
systems. Processes with this attribute are called 
gradient systems. 


Nongradient Models 


Consider an exclusion process with rates $c_j(\eta) = 1 + \eta(-e_j)$, 
in which case the current is given by 

$$W_{0,e_j} = \{\eta(0) - \eta(e_j)\} + \{\eta(0) - \eta(e_j)\}\,\eta(-e_j)$$

a cylinder function in $C_0$. 

Fix $T > 0$, a density profile $\rho_0 : \mathbb{T}^d \to [0,1]$ and a 
sequence of probability measures $\mu^N$ associated to 
$\rho_0$ and having entropy with respect to a reference 
invariant state $\nu_\alpha^N$ bounded by $C_0 N^d$ for some finite 
constant $C_0$. Recall the definition of the sequence of 
measures $Q_{\mu^N}$, assumed to be tight. 

To characterize the limit points of $Q_{\mu^N}$, fix a 
smooth function $G : \mathbb{T}^d \to \mathbb{R}$ and examine the 
martingale $M_t^{G,N}$ introduced in [5]. After an 
integration by parts, the integral term of the 
martingale becomes [6]. While a second integration 
by parts is possible for the first part of the current, 
$\eta(0) - \eta(e_j)$, the second piece remains 

$$\int_0^t N^{1-d} \sum_{j=1}^{d} \sum_{x \in \mathbb{T}_N^d} (\partial_{u_j}^N H)(x/N)\, \tau_x w_j(\eta_{sN^2})\, ds \qquad [8]$$

where $w_j = \{\eta(0) - \eta(e_j)\}\,\eta(-e_j)$. Notice the extra 
factor $N$ multiplying the sum and that $w_j$ belongs 
to $C_0$. The next result and Theorem 4 are due to 
Varadhan (1994). 


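Theorem 2 Consider a sequence of probability 
measures $m^N$ on $E_N$ such that $H_N(m^N \mid \nu_\alpha^N) \le C_0 N^d$ 
for some $0 < \alpha < 1$ and some finite constant $C_0$. Fix a 
smooth function $G : \mathbb{T}^d \to \mathbb{R}$ and a cylinder function 
$\Psi$ in $C_0$. There exists a seminorm $\|\cdot\|_\alpha$ such that 

$$\limsup_{N \to \infty} \Big\{ E_{m^N}\Big[ \Big| \int_0^T ds\, N^{1-d} \sum_{x \in \mathbb{T}_N^d} G(x/N)\, \tau_x \Psi(\eta_{sN^2}) \Big| \Big] \Big\}^2 \le C_0\, T\, \|G\|_2^2 \sup_{0 \le \alpha \le 1} \|\Psi\|_\alpha^2 \qquad [9]$$
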
The explicit form of the seminorm $\|\cdot\|_\alpha$ can be 
found in Kipnis and Landim (1999, chapter 7). The 
proof of Theorem 2 requires a sharp estimate on the 
spectral gap of the generator $L_N$. Denote by $\Lambda_\ell$ 
the cube $\{-\ell,\dots,\ell\}^d$ and by $L_{\Lambda_\ell}$ the restriction of 
the generator $L_N$ to the cube $\Lambda_\ell$, obtained by 
suppressing all jumps from $\Lambda_\ell$ (resp. $\Lambda_\ell^c$) to $\Lambda_\ell^c$ 
(resp. $\Lambda_\ell$). For $0 \le K \le |\Lambda_\ell|$, let $\nu_{\Lambda_\ell,K}$ be the uniform 
measure on the configurations of $\{0,1\}^{\Lambda_\ell}$ with $K$ 
particles. The following estimate is needed in the 
proof of Theorem 2: 


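Theorem 3 There exists a finite constant $C_0$ such 
that 

$$\langle f, f \rangle_{\nu_{\Lambda_\ell,K}} \le C_0\, \ell^2\, \langle f, (-L_{\Lambda_\ell}) f \rangle_{\nu_{\Lambda_\ell,K}}$$

for all $\ell \ge 1$, $0 \le K \le |\Lambda_\ell|$ and zero-mean function $f$ 
in $L^2(\nu_{\Lambda_\ell,K})$. 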


This result is due to Quastel (1992) for symmetric 
simple exclusion processes. Yau developed a general 
method to prove sharp estimates for the spectral gap 
of the generator for conservative dynamics (see Lu 
and Yau (1993) and Yau (1997)). 

Since the parallelogram identity is easy to check, 
by polarization we can define a semi-inner product 
$\ll\cdot,\cdot\gg_\alpha$ from the seminorm $\|\cdot\|_\alpha$. Denote by $\mathcal{H}_\alpha$ the 
Hilbert space induced by $C_0$ and the semi-inner 
product $\ll\cdot,\cdot\gg_\alpha$. 

Denote by $L$ the generator [1] extended to $\mathbb{Z}^d$. 
Notice that $Lf$ belongs to $C_0$ for any cylinder 
function $f$, and that the gradients $\eta(e_j) - \eta(0)$ and 
the currents $w_j$, $1 \le j \le d$, also belong to $C_0$. The 
next result states that all functions in $\mathcal{H}_\alpha$ can be 
written as a linear combination of gradients and 
cylinder functions in the image of the generator. 


Theorem 4 Denote by $LC_0$ the space $\{Lg : g \in C_0\}$. 
For each $0 \le \alpha \le 1$, 

$$\mathcal{H}_\alpha = LC_0 \oplus \{\eta(e_j) - \eta(0) : 1 \le j \le d\}$$

In particular, there exists a matrix $\{D_{i,j}(\alpha) : 1 \le i, j \le d\}$ 
and a sequence of functions $\{f_{i,k}(\alpha,\cdot) \in C_0 : k \ge 1\}$, 
$1 \le i \le d$, for which 

$$w_i + \sum_{j=1}^{d} D_{i,j}(\alpha)\{\eta(e_j) - \eta(0)\} - L f_{i,k}(\alpha,\cdot)$$

vanishes in $\mathcal{H}_\alpha$ as $k \uparrow \infty$. For reversible systems (and 
more generally for generators satisfying a sector 
condition), it can be shown that the sequence of 
local functions $f_{i,k}(\alpha,\eta)$ can be taken independent of 
$\alpha$: $f_{i,k}(\alpha,\eta) = f_{i,k}(\eta)$. Moreover, with a little extra 
effort, one obtains a bound uniform in $\alpha$: 

$$\inf_{f \in C_0}\, \sup_{0 \le \alpha \le 1} \Big\| w_i + \sum_{j=1}^{d} D_{i,j}(\alpha)\{\eta(e_j) - \eta(0)\} - Lf \Big\|_\alpha = 0 \qquad [10]$$

This estimate together with some algebraic relations 
in $\mathcal{H}_\alpha$ gives a variational formula for the matrix $D_{i,j}$: 
for every vector $v$ in $\mathbb{R}^d$, 

$$v \cdot D(\alpha)\, v = \frac{1}{\alpha(1-\alpha)} \inf_{f \in C_0} \Big\| \sum_{i=1}^{d} v_i w_i - Lf \Big\|_\alpha^2 \qquad [11]$$

It can also be shown that the matrix $D$ is continuous 
and strictly elliptic. 

We may now complete the proof of the hydrodynamic 
behavior. Recall that the main difficulty 
was to express formula [8] in terms of the empirical 
measure. Fix $1 \le i \le d$ and consider a sequence of 
cylinder functions $\{f_{i,k} : k \ge 1\}$ satisfying [10] 
asymptotically as $k \uparrow \infty$. Adding and subtracting the 
expression $\sum_{j=1}^{d} D_{i,j}(\eta^{\varepsilon N}(0))\{\eta^{\varepsilon N}(e_j) - \eta^{\varepsilon N}(0)\} - Lf_{i,k}$, 
[8] becomes the sum of three terms. 

The first one is just the expression which appears 
inside the expectation in [9] with $G = (\partial_{u_i} H)$ and $\Psi$ 
given by 

$$w_i + \sum_{j=1}^{d} D_{i,j}(\eta^{\varepsilon N}(0))\{\eta^{\varepsilon N}(e_j) - \eta^{\varepsilon N}(0)\} - Lf_{i,k}$$

Since the sequence of measures $\mu^N$ satisfies the 
assumptions of Theorem 2, a modification of the 
proof of this theorem, to take into account 
the dependence of $\Psi$ on $N$ and $\varepsilon$, shows that the limit 
of the expectation of the absolute value of the first 
term in the decomposition, as $N \uparrow \infty$ and then $\varepsilon \downarrow 0$, 
is bounded by 

$$C_0\, T\, \|\partial_{u_i} H\|_2^2 \sup_{0 \le \alpha \le 1} \|\Psi_{i,k}\|_\alpha^2$$

where 

$$\Psi_{i,k} = w_i + \sum_{j=1}^{d} D_{i,j}(\alpha)\{\eta(e_j) - \eta(0)\} - Lf_{i,k}$$

By [10], the penultimate expression vanishes as $k \uparrow \infty$. 
The second term in the decomposition is 

$$\int_0^t ds\, N^{1-d} \sum_{j=1}^{d} \sum_{x \in \mathbb{T}_N^d} (\partial_{u_j}^N H)(x/N)\, \tau_x L f_{j,k}(\eta_{sN^2})$$

The presence of the generator $L$ and the diffusive 
rescaling of time permit us to show that the expectation 
of the absolute value of this expression is of 
order $N^{-1}$ for each fixed $k$. 

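Finally, the third term is equal to 

$$-\int_0^t ds\, N^{1-d} \sum_{j,k=1}^{d} \sum_{x \in \mathbb{T}_N^d} (\partial_{u_j}^N H)(x/N)\, D_{j,k}(\eta_{sN^2}^{\varepsilon N}(x))\, \big\{ \eta_{sN^2}^{\varepsilon N}(x + e_k) - \eta_{sN^2}^{\varepsilon N}(x) \big\}$$

A second integration by parts is now possible and 
one obtains that the previous expression is equal to 

$$\int_0^t ds\, N^{-d} \sum_{j,k=1}^{d} \sum_{x \in \mathbb{T}_N^d} (\partial_{u_j,u_k}^2 H)(x/N)\, d_{j,k}(\eta_{sN^2}^{\varepsilon N}(x)) + o_N(1)$$

where $d_{j,k}' = D_{j,k}$. We have already seen in the 
derivation of the hydrodynamic equation for gradient 
systems that this sum can be expressed as a 
function of the empirical measure. Since all limit 
points are concentrated on paths $\pi_t(du)$ which are 
absolutely continuous, this integral converges to 

$$\int_0^t ds \int_{\mathbb{T}^d} du \sum_{j,k=1}^{d} (\partial_{u_j,u_k}^2 H)(u)\, d_{j,k}(\rho(s,u))$$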


Since the martingale [5] vanishes, all limit points 
are concentrated on trajectories $\pi_t(du) = \rho(t,u)\,du$ 
which are weak solutions of 

$$\partial_t \rho = \sum_{j,k=1}^{d} \partial_{u_j} \big\{ [\delta_{j,k} + D_{j,k}(\rho)]\, \partial_{u_k} \rho \big\}$$

where $D$ is the strictly elliptic and continuous matrix 
given by the variational formula [11]. Here, the 
identity matrix $\delta_{j,k}$ comes from the first piece of the 
current which permitted a second integration by 
parts. A uniqueness result of weak solutions of the 
Cauchy problem with initial condition $\rho_0$ concludes 
the proof of the hydrodynamic behavior of this 
nongradient system. 


Hyperbolic Equations 


Consider the asymmetric simple exclusion process 
obtained by setting $c_j(\eta) = \eta(0)[1 - \eta(e_j)]$ in formula 
[1]. Notice that the current $W_{0,e_j} = \eta(0)[1 - \eta(e_j)]$ 
has mean $\alpha(1-\alpha)$ with respect to the invariant state 
$\nu_\alpha^N$, suggesting the Euler rescaling of time $\theta(N) = N$. 

Let $\le$ be the partial order on $E_N$ defined by $\eta \le \xi$ 
if $\eta(x) \le \xi(x)$ for every $x$ in $\mathbb{T}_N^d$. The asymmetric 
exclusion process is attractive: there exists a 
stochastic evolution on $E_N \times E_N$ with the following 
two properties: (1) it preserves the order, in the 
sense that $\eta_t \le \xi_t$ for all $t \ge 0$ if $\eta_0 \le \xi_0$ and (2) each 
coordinate evolves according to the original asymmetric 
exclusion dynamics. This coupling, which 
may be constructed by letting particles jump 
together as much as possible, is the main tool in 
the derivation of the hydrodynamic equation of 
asymmetric processes. 
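
The coupling described above can be illustrated with a short sketch for the one-dimensional process with rate $\eta(0)[1-\eta(e_1)]$. Both configurations use the same sequence of bond clocks, and each one performs the jump from $x$ to $x+1$ whenever its own exclusion rule permits. The discrete sequence of uniformly chosen bonds stands in for the continuous-time clocks (an assumption of this sketch), and the function names are hypothetical.

```python
import random


def coupled_step(eta, xi, x):
    """Ring the clock of bond {x, x+1} in both configurations (in place).

    In each configuration a particle at x jumps to x+1 if x+1 is empty.
    """
    n = len(eta)
    y = (x + 1) % n
    for conf in (eta, xi):
        if conf[x] == 1 and conf[y] == 0:
            conf[x], conf[y] = 0, 1


def run_basic_coupling(eta, xi, n_rings, rng):
    """Apply n_rings common clock rings to copies of eta and xi."""
    eta, xi = list(eta), list(xi)
    for _ in range(n_rings):
        coupled_step(eta, xi, rng.randrange(len(eta)))
    return eta, xi
```

If the initial configurations satisfy $\eta_0 \le \xi_0$ site by site, a case check on the four possible occupations of the bond shows that the order survives every ring, so the returned pair is still ordered.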

Fix a smooth function $G : \mathbb{T}^d \to \mathbb{R}$ and recall 
definition [5] of the martingale $M_t^{G,N}$. An elementary 
computation shows that the quadratic variation 
of this martingale vanishes as $N \uparrow \infty$. On the other 
hand, after an integration by parts, the integral term 
of the martingale becomes 

$$\int_0^t N^{-d} \sum_{j=1}^{d} \sum_{x \in \mathbb{T}_N^d} (\partial_{u_j}^N H)(x/N)\, \eta_{sN}(x)\, [1 - \eta_{sN}(x + e_j)]\, ds$$

Assume that the state of the process at any 
macroscopic time $s$ is close to a product measure 
associated to some profile $\rho(s,\cdot)$. Since the martingale 
vanishes asymptotically, taking expectations in 
[5], we obtain that the density profile should be a 
weak solution of the quasilinear hyperbolic 
equation 

$$\partial_t \rho + \sum_{j=1}^{d} \partial_{u_j} F(\rho) = 0 \qquad [12]$$

where $F(\alpha) = \alpha(1-\alpha)$. 

It is well known that solutions of this equation 
may develop shocks even if the initial profile $\rho_0(\cdot)$ is 
smooth and that there is no uniqueness of weak 
solutions. Several criteria have been introduced to 
select the relevant solution among the weak solutions. 
Kruzkov (1970), for instance, in the case 
where the density profile $\rho_0 : \mathbb{T}^d \to \mathbb{R}$ is bounded, 
proved that there exists a unique measurable 
function $\rho$ which satisfies the entropy condition 

$$\partial_t |\rho - c| + \sum_{i=1}^{d} \partial_{u_i} |F(\rho) - F(c)| \le 0 \qquad [13]$$

in the sense of distributions on $(0,\infty) \times \mathbb{T}^d$, for 
every $c \in \mathbb{R}$, and which converges to the initial 
condition in $L^1(\mathbb{T}^d)$ as $t \downarrow 0$: $\lim_{t \to 0} \|\rho_t - \rho_0\|_1 = 0$. 

Fix $T > 0$ and a density profile $\rho_0 : \mathbb{T}^d \to [0,1]$. 
To couple the original process with another one 
starting from a different initial state, we need to 
require the initial distribution to be of product form. 
Consider, therefore, a sequence of “product” probability 
measures $\mu^N$ associated to $\rho_0$ and recall the 
definition of the sequence of measures $Q_{\mu^N}$ given in 
the section “The entropy method,” assumed to be 
tight. 

We have to prove that all limit points are 
concentrated on entropy solutions of [12]. Coupling 
the original process $\eta_t$ with another one, denoted by 
$\xi_t$, starting from the Bernoulli product measure with 
density $\alpha$, and examining the time evolution of 
$\sum_{x \in \mathbb{T}_N^d} |\eta_{tN}(x) - \xi_{tN}(x)|$, we derive an entropy 
inequality at the microscopic level: let $\bar\mu^N$ be a 
sequence of probability measures on the product 
space $E_N \times E_N$ whose first coordinate is $\mu^N$. 
Denote by $\bar{\mathbb{P}}_{\bar\mu^N}$ the measure on the path space 
$D([0,T], E_N \times E_N)$ induced by $\bar\mu^N$ and the coupling 
informally described at the beginning of this section. 
Rezakhanlou (1991) proved the following theorem: 


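Theorem 5 For every smooth positive function $H$ 
with compact support in $(0,\infty) \times \mathbb{T}^d$ and every 
$\varepsilon > 0$, 

$$\lim_{\ell \to \infty}\, \lim_{N \to \infty}\, \bar{\mathbb{P}}_{\bar\mu^N}\Big[ \int_0^\infty dt\, N^{-d} \sum_{x \in \mathbb{T}_N^d} \Big\{ \partial_t H(t, x/N)\, \big| \eta_{tN}^\ell(x) - \xi_{tN}^\ell(x) \big| + \sum_{i=1}^{d} (\partial_{u_i} H)(t, x/N)\, \big| F(\eta_{tN}^\ell(x)) - F(\xi_{tN}^\ell(x)) \big| \Big\} \ge -\varepsilon \Big] = 1$$
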


If we now assume that the second coordinate $\xi_t$ is 
initially distributed according to the stationary state 
$\nu_\alpha^N$, it is not difficult to replace $\xi^\ell$ in the above 
formula by $\alpha$, obtaining a microscopic version of the 
entropy inequality. 

In the one-dimensional nearest-neighbor case, by 
coupling arguments, we may replace the average 
$\eta^\ell(0)$ over a large microscopic box by an average 
$\eta^{\varepsilon N}(0)$ over a small macroscopic box, deriving the 
entropy inequality [13]. To conclude the proof it 
remains to show, by means of coupling arguments 
again, that the density profile at time $t$ converges in 
$L^1(\mathbb{T})$ to the initial condition as $t \downarrow 0$. 

In higher dimensions or in the one-dimensional 
non-nearest-neighbor case, it has not been proved 
that replacement of $\eta^\ell(0)$ by $\eta^{\varepsilon N}(0)$ is allowed. One 
is thus forced to consider measure-valued solutions 
of eqn [12]. Details can be found in Kipnis and 
Landim (1999, chapter 8). 


Relative Entropy Method 


The relative entropy method, due to Yau (1991), is 
based on the analysis of the time evolution of the 
entropy of the state of the process with respect to 
the product measure associated to the solution of the 
hydrodynamic equation. 

While the entropy method requires uniqueness of 
weak solutions and proves the existence of weak 
solutions, the relative entropy method requires the 
existence of a smooth solution and proves the 
uniqueness of such smooth solutions. 

Consider the exclusion process with rates $c_j(\eta) = 1 + [\eta(-e_j) + \eta(2e_j)]$. 
We have seen that the hydrodynamic 
equation of this model is given by the 
nonlinear parabolic equation 

$$\partial_t \rho = \Delta\{\rho + \rho^2\} \qquad [14]$$

Fix a profile $\rho_0 : \mathbb{T}^d \to [0,1]$ bounded away from 
0 and 1: $0 < \delta \le \rho_0(u) \le 1 - \delta$. Let $\rho(t,u)$ be the 
solution of the hydrodynamic equation [14] with 
initial condition $\rho_0$ and denote by $\nu^N_{\rho(t,\cdot)}$ the product 
measure with slowly varying parameter associated 
to the profile $\rho(t,\cdot)$: 

$$\nu^N_{\rho(t,\cdot)}\{\eta(x) = 1\} = \rho(t, x/N), \quad \text{for } x \in \mathbb{T}_N^d$$


Theorem 6 Let $\{\mu^N : N \ge 1\}$ be a sequence of 
probability measures on $E_N$ whose entropy with 
respect to $\nu^N_{\rho_0(\cdot)}$ is of order $o(N^d)$: 

$$H_N(\mu^N \mid \nu^N_{\rho_0(\cdot)}) = o(N^d)$$

Then, the relative entropy of the state of the process 
at the macroscopic time $t$ with respect to $\nu^N_{\rho(t,\cdot)}$ is 
also of order $o(N^d)$: 

$$H_N(\mu^N S_t^N \mid \nu^N_{\rho(t,\cdot)}) = o(N^d) \quad \text{for every } t \ge 0$$


It is not difficult to deduce from this result a 
strong version of the hydrodynamic limit behavior 
of the interacting particle system: 


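Corollary 1 Under the assumptions of the theorem, 
for every cylinder function $\Psi$ and every continuous 
function $H : \mathbb{T}^d \to \mathbb{R}$, 

$$\lim_{N \to \infty} E_{\mu^N S_t^N}\Big[ \Big| N^{-d} \sum_{x \in \mathbb{T}_N^d} H(x/N)\, \tau_x \Psi(\eta) - \int_{\mathbb{T}^d} H(u)\, \tilde\Psi(\rho(t,u))\, du \Big| \Big] = 0$$
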

The relative entropy method can be extended to 
nongradient systems and to asymmetric processes, 
whose macroscopic evolution is described by quasi- 
linear hyperbolic equations, up to the first shock. 

The hydrodynamic behavior of an interacting 
particle system corresponds to a law of large 
numbers for the empirical measure. The central 
limit theorem is well understood in equilibrium, but 
remains to this date an important open question in 
nonequilibrium. The large deviations for diffusive 
systems have also been investigated, as well as the 
hydrodynamic behavior of systems in contact with 
reservoirs. The Navier-Stokes equations have been 
derived as a correction of the hydrodynamic 
equation of asymmetric particle systems. We refer 
to Kipnis and Landim (1999) for further details. 


See also: Boltzmann Equation (Classical and Quantum); 
Bose-Einstein Condensates; Breaking Water Waves; 
Fourier Law; Interacting Stochastic Particle Systems; 
Macroscopic Fluctuations and Thermodynamic 
Functionals; Multi-Scale Approaches. 
Introduction 


According to the basic principles of mechanics, the 
motion of atoms and molecules is governed, in the 
semiclassical approximation, by the deterministic 
Hamiltonian equations of motion. While all evi- 
dence points in this direction, for many problems 
this Hamiltonian approach is so complicated that it 
hardly yields any useful results. A simple example 
is a large number ($10^9$) of polystyrene balls (size 1 μm) 
immersed in water. The Hamiltonian description 
would have to deal with the degrees of freedom of 
all the fluid molecules and all the polystyrene balls. 
Clearly, a more useful approach is to collect the 
incessant bombardment of a polystyrene ball by 
water molecules into a stochastic force acting on the 
ball with postulated statistical properties. For 
example, following Einstein, one could regard 
successive collisions as independent and occurring 
after an exponentially distributed waiting time. In 
addition to such stochastic forces, the polystyrene 
balls are charged and interact with each other 
through the screened Coulomb force. 

On the one-particle level, stochastic models have a 
long tradition within statistical physics. A considerable 
part of the classical theory of Markov processes is the 
mathematical response to this type of description. 
The aspect of interaction is more recent. Its origin can 
be traced back to the Metropolis algorithm in early 
computer simulations (1953). It was recognized 
that the Hamiltonian dynamics is a rather slow tool 
to statistically sample the Gibbs equilibrium distribution 
$Z^{-1}\exp[-H/k_B T]$. A more efficient route is to 
devise a stochastic algorithm which has as its unique 
stationary measure the Gibbs distribution. Such 
schemes are now known as Markov chain Monte 
Carlo and are in extremely wide use, not only in 
statistical physics but also in quantum chromody- 
namics (QCD) and other quantum field theories. The 
time appearing in the stochastic algorithm has no 
physical significance; it merely counts how often a 
certain operation is performed. 

The second clearly identifiable push toward the 
use of interacting stochastic particle systems came 
from the study of critical dynamics. Close to a point 
of second-order phase transition, the equilibrium 
properties are very effectively handled by means of 
statistical field theories. Thus, it was natural to 


search for an extension into the time domain, which 
then led to time-dependent Ginzburg-Landau the- 
ories, where now time refers to physical time. These 
are interacting stochastic models, where one keeps 
only a few basic fields, together with their behavior 
under time reversal, their vector character, and 
whether they are dynamically conserved or not. 

In probability theory, interacting stochastic particle 
systems date back to the seminal papers by M Kac in 
1956 and independently by R L Dobrushin and by 
F Spitzer in 1970. Spitzer was motivated by spin-flip 
and spin-exchange dynamics, while Dobrushin had 
the vision of many locally interacting components. In 
the early days, one of the prime goals was the 
construction of the stochastic process in infinite 
volume, an enterprise which had important mathe- 
matical spin-off, for example, the theory of Dirichlet 
forms on function spaces. Physical models offer a rich 
menu to the probabilist, but there is also considerable 
input from other areas. To give just one example: in 
queueing theory one considers queues in series, that 
is, a customer served at one counter immediately 
moves on to the next one. If one regards as field the 
number of customers at each counter, one has an 
interacting stochastic particle system, the interaction 
being mediated through the servers. 

This article is split into two sections. In the first 
one, we list and explain a few prototypical interact- 
ing stochastic particle systems. Of course, the list is 
hardly exhaustive and we restrict ourselves from the 
outset to models from statistical physics. In the 
second part, we summarize prominent lines of recent 
research. Again the wealth of material is over- 
whelming and we draw the line according to the 
rules of mathematical physics. 


Model Systems 


Our list is determined by the intrinsic mathematical 
properties of the stochastic particle system. Alter- 
natively, a classification is possible according to the 
physical system, which would, however, be less 
transparent for our purposes. We restrict ourselves 
to models with only position-like degrees of free- 
dom, but if needed velocity-like fields may be 
included. The most basic distinction is the behavior 
under time reversal. A model is called (statistically) 
“time reversible” if a particular history and its time- 
reversed image have the same probability. Techni- 
cally, one imposes this through the condition of 
detailed balance. Nonreversible systems are much 
less explored, but currently a very active area of 
research. 


Reversible Models 


1. Spin-flip, Glauber dynamics. One considers
spins attached to the sites of a regular lattice,
which for simplicity we take as the hypercubic
lattice $\mathbb{Z}^d$. The spin at site $x \in \mathbb{Z}^d$ is denoted by
$\sigma_x = \pm 1$ and the whole spin configuration is
denoted by $\sigma$. Thus, the state space of the Markov
process is $\{-1,1\}^{\mathbb{Z}^d} = \Omega$. Spin configurations
evolve in time through random spin flips, that is,
through a change from $\sigma_x$ to $-\sigma_x$ according to
configuration-dependent rates $c_x(\sigma)$. $c_x(\sigma)$ is local,
in the sense that it depends only on the spins close
to x, and is translation invariant, that is, if $\tau_y$ is
the shift by y, then $c_{x+y}(\tau_y\sigma) = c_x(\sigma)$. If the current
spin configuration is $\sigma(t)$, then after a short
time dt


$\sigma_x(t+dt) = \begin{cases} -\sigma_x(t) & \text{with probability } c_x(\sigma(t))\,dt \\ \sigma_x(t) & \text{with probability } 1 - c_x(\sigma(t))\,dt \end{cases}$


The update is performed independently at each
lattice site. Technically, it is more concise to specify
the generator, L, of the Markov process. It acts on
local functions $f:\Omega \to \mathbb{R}$ and is given by


$Lf(\sigma) = \sum_{x\in\mathbb{Z}^d} c_x(\sigma)\,\big(f(\sigma^x) - f(\sigma)\big)$ [1]


where $\sigma^x$ denotes the configuration $\sigma$ with the spin
at site x reversed. The transition probability from
the configuration $\sigma$ to the configuration $\sigma'$ in time
t > 0 is given by the matrix element $(e^{Lt})_{\sigma\sigma'}$ of the
Markov semigroup $e^{Lt}$.

To impose time reversibility, one needs an energy
function $H(\sigma)$ constructed according to the rules of
equilibrium statistical mechanics. The condition of
detailed balance then reads


$c_x(\sigma) = c_x(\sigma^x)\, e^{-\beta (H(\sigma^x) - H(\sigma))}$ [2]


with $\beta = 1/k_B T$ the inverse temperature. Note that
on the right only energy differences appear, which
are always well defined. In finite volume the
unique invariant measure is the Gibbs measure
$Z^{-1} e^{-\beta H}$.
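The detailed balance condition [2] can be checked numerically for a concrete choice of rates. The sketch below is only an illustration under stated assumptions: it uses a nearest-neighbor Ising energy on a small ring and the heat-bath form $c_x(\sigma) = 1/(1 + e^{\beta \Delta H})$, which is one admissible choice among many; the function names are hypothetical and not taken from the article.

```python
import math
from itertools import product

def energy(sigma, J=1.0):
    """Assumed energy: nearest-neighbor Ising model on a ring, H = -J sum_x s_x s_{x+1}."""
    n = len(sigma)
    return -J * sum(sigma[x] * sigma[(x + 1) % n] for x in range(n))

def flip(sigma, x):
    """Return sigma^x, the configuration with the spin at site x reversed."""
    return sigma[:x] + (-sigma[x],) + sigma[x + 1:]

def heat_bath_rate(sigma, x, beta, J=1.0):
    """One admissible flip rate c_x(sigma) = 1 / (1 + exp(beta * dH))."""
    dH = energy(flip(sigma, x), J) - energy(sigma, J)
    return 1.0 / (1.0 + math.exp(beta * dH))

def max_detailed_balance_violation(n=4, beta=0.7):
    """Largest |c_x(s) - c_x(s^x) exp(-beta dH)| over all configurations and sites."""
    worst = 0.0
    for sigma in product((-1, 1), repeat=n):
        for x in range(n):
            flipped = flip(sigma, x)
            dH = energy(flipped) - energy(sigma)
            lhs = heat_bath_rate(sigma, x, beta)
            rhs = heat_bath_rate(flipped, x, beta) * math.exp(-beta * dH)
            worst = max(worst, abs(lhs - rhs))
    return worst
```

Any other rate function can be substituted for `heat_bath_rate`; the check only uses energy differences, as noted after eqn [2].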

2. Spin-exchange, Kawasaki dynamics, stochastic
lattice gases. We model particles hopping on the
lattice $\mathbb{Z}^d$ and switch to the occupation variables $\eta_x$,
where $\eta_x = 0$ stands for site x empty and $\eta_x = 1$
stands for site x occupied. The state space is
$\Omega = \{0,1\}^{\mathbb{Z}^d}$. Since the number of particles is conserved,
the basic dynamical process is a random
jump of a particle from x to a nearby site y,
provided $\eta_y = 0$. Therefore, we specify the exchange
rates $c_{xy}(\eta)$ between x and y. They are local,
translation invariant and symmetric, that is,
$c_{xy}(\eta) = c_{yx}(\eta)$. The generator now reads


$Lf(\eta) = \frac{1}{2}\sum_{x,y\in\mathbb{Z}^d} c_{xy}(\eta)\,\big(f(\eta^{xy}) - f(\eta)\big)$ [3]


where $\eta^{xy}$ is the configuration $\eta$ with the occupancies
at sites x and y exchanged.

The condition of detailed balance refers to the
exchange and reads


$c_{xy}(\eta) = c_{xy}(\eta^{xy})\, e^{-\beta (H(\eta^{xy}) - H(\eta))}$ [4]


In [4] we can freely add to H the chemical potential
$-\mu \sum_x \eta_x$. Thus for stochastic lattice gases there is a
one-parameter family of invariant measures, labeled
by the chemical potential $\mu$.

3. Interacting Brownian motions. These motions
model, for example, suspensions as mentioned in the
“Introduction”. One considers a box $\Lambda \subset \mathbb{R}^d$ containing
N Brownian particles. The jth Brownian
particle has position $x_j \in \Lambda$. Thus, the state space of
the Markov process is $\Lambda^N$. We assume that the
Brownian particles interact through a (sufficiently
local) even pair potential U. Then the total potential
energy is


$H(x) = \frac{1}{2}\sum_{i \neq j = 1}^{N} U(x_i - x_j), \quad x = (x_1, \ldots, x_N)$ [5]


The dynamics of the Brownian particles is given
through the stochastic differential equations


$dx_j(t) = -\sum_{i=1,\, i\neq j}^{N} \nabla U(x_j(t) - x_i(t))\,dt + \sqrt{2D_0}\,dW_j(t), \quad j = 1, \ldots, N$ [6]


$W_j(t)$, $j = 1, \ldots, N$, are a collection of independent
Brownian motions and $D_0$ is the diffusion coefficient
of a single Brownian particle. Equation [6] has
to be supplemented with suitable boundary conditions
at the surface $\partial\Lambda$. Since the forces in [6] are the
gradient of a potential, time reversibility is automatically
satisfied with the invariant measure being
$Z^{-1} \exp(-H(x)/D_0)\, dx_1 \cdots dx_N$.
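A single time step of eqn [6] can be sketched with the Euler-Maruyama scheme. This is a minimal illustration, not the article's method: it assumes one space dimension, no boundary (so the boundary conditions at $\partial\Lambda$ are ignored), and a hypothetical harmonic pair potential $U(r) = k r^2/2$; all function names are illustrative.

```python
import math
import random

def grad_U(r, k=1.0):
    """Derivative of the assumed even pair potential U(r) = k * r**2 / 2."""
    return k * r

def euler_maruyama_step(x, dt, D0, rng):
    """Advance all particle positions by one step of eqn [6] (1D, no boundary)."""
    n = len(x)
    new_x = []
    for j in range(n):
        # Drift: minus the sum of pair-force gradients over all other particles.
        drift = -sum(grad_U(x[j] - x[i]) for i in range(n) if i != j)
        # Noise: Brownian increment with variance 2 * D0 * dt.
        noise = math.sqrt(2.0 * D0 * dt) * rng.gauss(0.0, 1.0)
        new_x.append(x[j] + drift * dt + noise)
    return new_x

def simulate(x0, dt, D0, steps, seed=0):
    """Run several steps from the initial positions x0 with a seeded generator."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        x = euler_maruyama_step(x, dt, D0, rng)
    return x
```

With $D_0 = 0$ the update reduces to a deterministic gradient flow, which is a convenient way to check the sign of the drift.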

4. Ginzburg-Landau models. Ginzburg-Landau
models should be viewed as discretized versions of
stochastic partial differential equations. At every
lattice site $x \in \mathbb{Z}^d$, there is a real-valued field
$\phi_x \in \mathbb{R}$, a field configuration being denoted by $\phi$.
Formally, the state space is $\mathbb{R}^{\mathbb{Z}^d}$. Since the single-site
space is noncompact, some growth condition at
infinity must be imposed. Next we give ourselves an
energy, $H(\phi)$, one standard example being


$H(\phi) = \frac{1}{2}\sum_{x,y\in\mathbb{Z}^d,\, |x-y|=1} (\phi_x - \phi_y)^2 + \sum_{x\in\mathbb{Z}^d} V(\phi_x)$ [7]


The on-site potential increases sufficiently rapidly, so as
to make large field values unlikely. The $\phi$-field evolves
according to the set of stochastic differential equations


$d\phi_x(t) = -\frac{\partial H}{\partial \phi_x}(\phi(t))\,dt + \sqrt{2/\beta}\,dW_x(t), \quad x \in \mathbb{Z}^d$ [8]


where $\{W_x(t), x \in \mathbb{Z}^d\}$ is a collection of independent
Brownian motions. If $V(\phi_x) = \phi_x^2$, then $\phi(t)$ is
a Gaussian field theory. To have an Ising-type phase
transition, one would have to choose $V(\phi_x) = \lambda\phi_x^2 + \phi_x^4$.

It is rather simple to modify [8] so as to incorporate
a conservation law. To each directed bond (x, y),
$|x - y| = 1$, one associates the current $j_{xy} = -j_{yx}$. If
e is a unit vector, $|e| = 1$, then


$d\phi_x(t) + \sum_{e,\, |e|=1} j_{x,x+e}(t)\,dt = 0, \quad x \in \mathbb{Z}^d$ [9]


The current has both a deterministic part, given
through the gradient of a chemical potential, and a
random part:


$j_{xy}(t)\,dt = -\Big(\frac{\partial H}{\partial \phi_y} - \frac{\partial H}{\partial \phi_x}\Big)(\phi(t))\,dt + \sqrt{2/\beta}\,dW_{xy}(t), \quad |x - y| = 1$ [10]


where $W_{xy}(t) = -W_{yx}(t)$ is a collection of independent
Brownian motions labeled by nearest-neighbor
bonds. The conserved quantity is $\sum_x \phi_x$. Again, the
dynamics has a one-parameter family of stationary
measures labeled by the “magnetic field”. Since in
[8] and [10] the drift is the gradient of a potential,
Ginzburg-Landau models are reversible.

5. Interface dynamics. The scalar field $\phi$ describes
the location of an interface. The energy of an
interface does not depend on its absolute displacements.
Thus, interface models are special Ginzburg-Landau
models, which have an energy $H(\phi)$
invariant under the global shift $\phi_x \to \phi_x + a$ for all
$x \in \mathbb{Z}^d$. An example is


$H(\phi) = \sum_{x,y\in\mathbb{Z}^d,\, |x-y|=1} V(\phi_x - \phi_y)$ [11]


with even V. Note that in order to have a normalizable
equilibrium measure, the interface must be
pinned somewhere.

6. Several components. For lattice gases, there may 
be several components. In a Ginzburg-Landau theory 


instead of a scalar, Ising-like field, one could consider a 
vector-valued, Heisenberg-like, field and require the 
energy to be invariant under global rotations of the field 
variables. The construction is as before and we do not 
have to repeat it. 

7. Constrained, glassy dynamics. The constraint is 
enforced by setting some of the rates equal to zero. 
For example, in the case of standard Glauber 
dynamics, one could allow for a spin-flip only if at 
least two neighboring spins have the opposite sign. 
The Gibbs measure is still invariant, but the approach 
to equilibrium will be slowed down due to the 
constraint. It may even happen that the configuration 
space splits into several invariant subsets. 

After this long and still incomplete list, let us turn 
to the nonreversible models. 


Nonreversible Models 


Mathematically, one merely has to drop the condition 
of detailed balance. To have a more concrete example, 
let $L_i$ be the generator for the Glauber dynamics
satisfying detailed balance with inverse temperature
$\beta_i$, i = 1, 2. Then $L = L_1 + L_2$ generates a nonreversible
dynamics provided $\beta_1 \neq \beta_2$. Physically, it corresponds
to coupling the spins to two bulk thermal reservoirs of 
different temperatures. Our example leads to a general 
point which should be noted: While reversible models 
have a wide range of physical applicability, for 
nonreversible models nonequilibrium conditions have 
to be maintained over sufficiently long time spans, 
which poses considerable difficulties experimentally. 
Thus on a theoretical level, the efforts go into exploring 
properties of, say, semirealistic models. 

Very roughly there are two broad classes of 
nonreversible models. 


Boundary-driven models We consider a finite
volume $\Lambda$. Inside $\Lambda$ the dynamics is reversible as
explained before. At the boundary $\partial\Lambda$ the system is
coupled to particle, resp. energy, reservoirs. In case the
boundary chemical potential, resp. temperature, is not
uniform, the dynamics is nonreversible. To be more
concrete let us reconsider the lattice gas discussed in
item (2) (see the discussion following eqn [2]). Inside
$\Lambda$ the generator $L_\Lambda$ is given by [3] and satisfies
detailed balance [4]. The boundary generator is


$L_{\partial\Lambda} f(\eta) = \sum_{x\in\partial\Lambda} c_x(\eta)\,\big(f(\eta^x) - f(\eta)\big)$ [12]


where the notation is as in [1] with {-1, 1}
substituted by {0, 1}. $c_x(\eta)$ satisfies [2] with the
same $\beta$ as in the bulk, but a chemical potential $\mu_x$
depending on $x \in \partial\Lambda$. $\mu_x$ controls the injection/absorption
of particles at x. The generator for the
nonreversible dynamics is then


$L = L_\Lambda + L_{\partial\Lambda}$ [13]


Bulk-driven models A prototype is the two- 
temperature model mentioned above. More widely 
studied is a nonconservative force acting globally. 
Here the standard example is particles moving in $\Lambda$
with periodic boundary conditions and subject to an
additional uniform force field of strength F, which 
clearly cannot be written as the gradient of a 
potential. In the case of Brownian particles, by 
changing to a comoving frame of reference, one 
would be back to the reversible case F=0. For 
lattice gases the lattice provides a fixed frame and 
the driven model has properties very different from 
the undriven one. This leads us to: 

8. Driven lattice gases. The generator L is still
given by [3]. Formally, we insert in [4] instead of H
the Hamiltonian $H(\eta) - \sum_x (F\cdot x)\eta_x$. The exchange
rates then satisfy the condition of “local” detailed
balance as


$c_{xy}(\eta) = c_{xy}(\eta^{xy})\, e^{-\beta (H(\eta^{xy}) - H(\eta))} \times e^{-\beta (F\cdot(x-y))(\eta_x - \eta_y)}$ [14]


This means that particles preferentially jump in the
direction of F. On the infinite lattice the dynamics
admits two classes of stationary measures. First,
there is the Gibbs measure with particles piling up
along F and formally given by

$Z^{-1} \exp\big[-\beta\big(H(\eta) - \sum_x (F\cdot x)\eta_x\big)\big]$ [15]

With respect to this measure the dynamics is
reversible. Second, there are translation invariant
measures with nonzero steady-state current. This
cannot happen for reversible models. A very widely
studied particular case is the asymmetric simple
exclusion process for which d = 1, $H(\eta) = 0$, and
jumps are only to nearest-neighbor sites.
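For the asymmetric simple exclusion process the local detailed balance condition [14] fixes only the ratio of forward and backward exchange rates. The sketch below assumes d = 1, $H(\eta) = 0$, and one symmetric-looking choice of rates, $\exp[-\tfrac{1}{2}\beta F (x-y)(\eta_x - \eta_y)]$; this choice and the function names are illustrative assumptions, not the article's prescription.

```python
import math

def exchange_rate(eta, x, y, beta, F):
    """Assumed exchange rate c_xy(eta) for H = 0 in d = 1."""
    if eta[x] == eta[y]:
        return 0.0  # exchanging equal occupations does not change eta
    return math.exp(-0.5 * beta * F * (x - y) * (eta[x] - eta[y]))

def exchanged(eta, x, y):
    """Return eta^{xy}, the configuration with occupancies at x and y exchanged."""
    new = list(eta)
    new[x], new[y] = new[y], new[x]
    return tuple(new)

def local_detailed_balance_gap(eta, x, y, beta, F):
    """|c_xy(eta) - c_xy(eta^{xy}) exp(-beta F (x - y)(eta_x - eta_y))| for H = 0."""
    lhs = exchange_rate(eta, x, y, beta, F)
    rhs = exchange_rate(exchanged(eta, x, y), x, y, beta, F) * math.exp(
        -beta * F * (x - y) * (eta[x] - eta[y]))
    return abs(lhs - rhs)
```

With this choice a particle jumping one site in the direction of F does so at rate $e^{\beta F/2}$ and against F at rate $e^{-\beta F/2}$, so the ratio of the two is $e^{\beta F}$.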


Items of Interest 


As there are thousands of research papers in 
mathematical physics alone, it is literally impossible 
to provide any sort of summary. On the other hand, 
the type of questions investigated are generic. Thus, 
we just explain what one would like to understand 
without paying much attention to the fractal 
boundary between “proven” and “unproven.” For 
the construction of the stochastic processes listed 
above, there is a well-developed probabilistic theory 
available. Thus, the main focus is on “qualitative 
properties” of the stochastic particle system. As in
the previous section, we distinguish between reversible
and nonreversible models.


Reversible Models 


1. Equilibrium state. The most basic question 
concerns the classification of invariant measures in 
infinite volume. By construction, they are the Gibbs 
measures for the Hamiltonian appearing in the condi- 
tion of detailed balance. In principle there could be 
more, which so far has been excluded only in dimension 
1 or 2. Properties of the invariant measure belong to the 
domain of equilibrium statistical mechanics. 

Thus we can turn directly to: 

2. Spectral analysis of the generator L. We fix
some extreme Gibbs measure $\mu$ stationary for L.
By detailed balance, $e^{Lt}$ is a symmetric Markov
semigroup in $L^2(\Omega, \mu)$. Hence, L is self-adjoint and
$L \le 0$. Furthermore, it has a nondegenerate eigenvalue 0.
The rate of approach to equilibrium is
determined by the spectral gap of L. Related are log-Sobolev
inequalities which serve as a stronger
notion. For models with a conservation law, there
is no spectral gap. Thus, the more appropriate
question is to study how fast the gap vanishes as
the volume $\Lambda$ increases. In the case of independent
components, the spectral subspaces for L are
organized as single excitation, double excitation
etc. Such a structure persists as the interaction is
turned on which, on a mathematical level, is similar
to the particle spectrum of a quantum field theory.

Physically more directly relevant are: 

3. Spacetime correlations. To be concrete, let us
consider a Ginzburg-Landau field theory $\phi_x(t)$
starting with a translation invariant Gibbs measure
$\mu$. Then $\phi_x(t)$ is a spacetime stationary process. The
two-point correlation function is the covariance


$\langle \phi_x(t)\phi_0(0)\rangle - \langle \phi_0(0)\rangle^2$ [16]


Its Fourier transform is directly linked to energy- 
momentum resolved scattering intensity from a probe 
which is modeled by the respective Ginzburg—Landau 
theory. For t=0, the expression [16] is the static 
correlation, again belonging to the domain of equili- 
brium statistical mechanics. The time decay depends 
on whether the field is dynamically conserved or not. 
Correlation functions do not always capture the 
physics of the system well. This is certainly true for: 
4. Dynamics at low temperatures. Let us consider
the Glauber dynamics for the ferromagnetic Ising
model in the finite but large volume $\Lambda$. Then there is
a very high free energy barrier between configurations
typical for the + phase and those typical for the -
phase. If one starts the spin system in the + phase, one
may study through which configurations the system
moves to the - phase and how much time such a
process will take. If the two phases are symmetric with
the external magnetic field h = 0, the spin system
tunnels, while for h < 0 and small the + phase is
metastable. Another widely studied situation, also
experimentally, is the quenching from high to low
temperatures. In our context this means that the initial
measure is Bernoulli, while the Glauber dynamics runs
at low temperatures. Then spin clusters coarsen as time
proceeds developing well-defined interfaces which are
governed through motion by mean curvature.

Close to a point of second-order phase transition, 
one has to deal with: 

5. Critical dynamics. The usual Glauber dynamics 
becomes very slow at the critical point and reliable 
equilibrium is hard to achieve. It is thus a challenge 
to design faster algorithms. One proposal is the 
Swendsen—Wang algorithm which is based on the 
Fortuin—Kasteleyn representation and flips a whole 
cluster of spins simultaneously. 

So far we concentrated on statistical properties. 
Researchers have been fascinated by the observation 
that for stochastic particle systems, the transition to a 
deterministic macroscopic evolution can be handled 
with full rigor. Such a program has been baptized: 

6. Hydrodynamic limit, which is meaningful only
for particle systems with one or several conservation
laws. Let us discuss then a reversible lattice gas with
Hamiltonian H. We start the dynamics with a state
of local equilibrium which is Gibbs with a slowly
varying chemical potential, that is,


$Z^{-1} \exp\big[-\beta\big(H(\eta) - \sum_x \mu(\epsilon x)\eta_x\big)\big], \quad \epsilon \ll 1$ [17]


Such a measure is almost time invariant. For small $\epsilon$,
at least approximately, such a structure should
persist in the course of time at the expense of
properly regulating the chemical potential. For our
example, the correct timescale is $\epsilon^{-2}t$ in microscopic
units, and the evolution equation for the density,
related thermodynamically to the chemical potential,
is a nonlinear diffusion equation of the form


$\frac{\partial}{\partial t}\rho_t = \nabla\cdot D(\rho_t)\nabla\rho_t$ [18]


We turn to the nonreversible models. 


Nonreversible Models 


While for reversible models the study of the 
stationary Gibbs measure is its own field of inquiry, 
here the first entry must be: 


7. Nonequilibrium steady state. This steady state is 
determined through the dynamics, since the stationary 
measure $\mu$ has to satisfy $\mu(Lf) = 0$ for a sufficiently
large class of functions f. As in equilibrium, phase 
transitions may occur. In the nonconservative case it 
would mean that the infinitely extended system has 
several extreme stationary measures. In the conserva- 
tive case, say with the density as locally conserved field, 
it would mean that there is an interval of densities for 
which there is no extreme stationary measure. Given 
the nonequilibrium steady state, one may wonder 
about its typical fluctuations and large deviations. In 
contrast to thermal equilibrium, weak long-range 
correlations are the rule. 

8. Spacetime correlations in the steady state.
Through the bulk drive the power-law decay of time
correlations may change. For example for the symmetric
and asymmetric exclusion process, the steady
states are Bernoulli with density $\rho$, denoted by $\langle\cdot\rangle_\rho$. For
the on-site density-density correlation, one finds, for
large t,


$\langle \eta_0(t)\eta_0(0)\rangle_\rho - \rho^2 \cong \begin{cases} t^{-1/2} & \text{for } F = 0 \\ t^{-2/3} & \text{for } F \neq 0 \end{cases}$ [19]

9. Hydrodynamic limit. The concept of slowly
varying conserved fields remains valid; only local
equilibrium must be replaced by local stationarity.
Generically, there are nonzero currents in the steady
state. Therefore, the macroscopic fields change on
the timescale $\epsilon^{-1}t$ (cf. item (6)) and are governed by
a hyperbolic conservation law of the form


$\frac{\partial}{\partial t}\rho_t + \mathrm{div}\, j(\rho_t) = 0$ [20]


in the case of a single conservation law. Here, $j(\rho)$ is
the average steady-state current in the stationary measure at
density $\rho$. Several conservation laws have an intriguingly
rich variety of solutions. Even on the level of
continuum partial differential equations, such systems
of hyperbolic conservation laws still pose
unresolved basic problems.


See also: Ginzburg—Landau Equation; Glassy Disordered 
Systems: Dynamical Evolution; Interacting Particle 
Systems and Hydrodynamic Equations; Macroscopic 
Fluctuations and Thermodynamic Functionals; Stochastic 
Differential Equations.


Introduction


Many important industrial problems involve flows 
with multiple constitutive components. Examples 
include extractors, separators, reactors, sprays, poly- 
mer blends, and microfluidic applications such as DNA 
analysis, and protein crystallization. Due to inherent 
nonlinearities, topological changes, and the complexity 
of dealing with unknown, active, and moving surfaces, 
multiphase flows are challenging. Much effort has been
put into studying such flows through analysis, asymptotics,
and numerical simulation. Here, we review
studies of multicomponent fluids using
continuum numerical methods.

There are many ways to characterize moving 
interfaces. The two main approaches to simulating 
multiphase and multicomponent flows are interface 
tracking and interface capturing. In interface-tracking 
methods (examples include boundary-integral, 
volume-of-fluid, front-tracking, immersed-boundary, 
and immersed-interface methods), Lagrangian (or 
semi-Lagrangian) particles are used to track the 
interfaces. In boundary-integral methods (BIMs), the flow equations are mapped
from the immiscible fluid domains to the sharp 
interfaces separating them thus reducing the dimen- 
sionality of the problem (the computational mesh 
discretizes only the interface). In interface-capturing 
methods such as level-set and phase-field methods, 
the interface is implicitly captured by a contour of a 
particular scalar function. 

The equations governing the motion of an
unsteady, viscous, incompressible, immiscible two-fluid
system are the Navier-Stokes equations (the
subscript i denotes the ith flow component):


$\rho_i\Big(\frac{\partial u_i}{\partial t} + u_i\cdot\nabla u_i\Big) = \nabla\cdot\sigma_i + \rho_i g, \quad i = 1, 2$ [1]


$\sigma_i = -p_i I + 2\eta_i D_i$ [2]


where $\rho_i$ is the density, $u_i$ is the fluid velocity, $p_i$ is
the pressure, $\eta_i$ is the viscosity, and g is the
gravitational acceleration vector. In eqn [2], $\sigma_i$ is
the stress tensor, I is the identity matrix, and $D_i$ is
the rate of deformation tensor, defined as
$D_i = (1/2)(\nabla u_i + \nabla u_i^T)$. The velocity field is subject
to the incompressibility constraint,


$\nabla\cdot u_i = 0$ [3]


We let $\Gamma$ denote the fluid interface. The effect of
surface tension is to balance the jump of the normal
stress along the fluid interface. This gives rise to a
Laplace-Young condition for the discontinuity of
the normal stress across $\Gamma$:


$[\sigma n]_\Gamma = \tau\kappa n$ [4]


where $[\sigma]_\Gamma$ denotes the jump $\sigma_2 - \sigma_1$ across $\Gamma$, $\kappa$ is
the curvature of $\Gamma$ (positive for a spherical interface),
$\tau$ is the surface tension coefficient which is assumed
to be constant, and n is the unit normal vector along
$\Gamma$ directed toward fluid 2. The fluid velocity is
continuous across $\Gamma$.

In order to circumvent the problems associated 
with implementing the Laplace-Young calculation 
at the exact interface boundary, Brackbill and 
collaborators developed a method referred to as 
the continuum surface force (CSF) method. See the 
review by Scardovelli and Zaleski (1999). In this 
method, the surface tension jump condition is 
converted into an equivalent singular volume force 
that is added to the Navier-Stokes equations. 
Typically, the singular force is smoothed and acts 
only in a finite transition region across the interface. 
The system of equations [1]-[2] and the boundary
condition, eqn [4], can be combined into the
following distribution formulation that holds in
both phases:


$\rho(u_t + u\cdot\nabla u) = -\nabla p + \nabla\cdot(2\eta D) + \rho g + F_{\mathrm{sing}}, \quad \nabla\cdot u = 0$ [5]


where the subscript i is dropped (i.e., it is understood
that $u = u_i$ in fluid i, etc.) and $F_{\mathrm{sing}}$ is the singular
surface tension force that is given by $F_{\mathrm{sing}} = -\tau\kappa\delta_\Gamma n$,
where $\delta_\Gamma$ is the surface delta-function.


Numerical Methods for Multicomponent 
Fluid Flows 


Interface-Tracking Methods 


Boundary-integral methods (BIMs) BIMs can be 
highly accurate for modeling free surface flows 
with relatively regular interface topologies. The 
BIM was apparently first used by Rosenhead in 
1932 to study vortex sheet roll-up. In this 
approach, the interface is explicitly tracked, but 
the flow solution in the entire domain is deduced 
solely from information possessed by discrete 
points along the interface. 

BIMs have been used for both inviscid and Stokes 
flows. For a review of Stokes flow computations, see 
Pozrikidis (2001), and for a review of computations 
of inviscid flows, see Hou et al. (2001). For flows 
with both inertia and viscosity, volume integrals 
must be incorporated into the formulation. 

When inertial forces are negligible (left-hand side 
term of eqn [1] is dropped), the velocity u(xo) at a 
given point xo on the interface can be obtained by 
means of the boundary-integral formulation, 


200 ( (xo) -5 | f G(xo, x 
n(x) ds(x) [6] 


(A + 1)u(xo) = 


a i u(x) - T(x, x) - n(x) ds(x) [7] 


where $\lambda$ is the viscosity ratio, $u_\infty$ is an imposed
velocity prevailing in the absence of the interfaces, and
f(x) is the capillary force function $f = \tau\kappa$. The tensors
G and T are the Stokeslet and stresslet, respectively:


$G(x_0, x) = \frac{I}{r} + \frac{\hat{x}\hat{x}}{r^3}, \quad T(x_0, x) = \frac{6\,\hat{x}\hat{x}\hat{x}}{r^5}$ [8]


where $\hat{x} = x - x_0$; $r = |\hat{x}|$ [9]
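The two kernels of eqn [8] are simple to evaluate pointwise. The sketch below is an illustration only: it works in three dimensions with plain nested lists, follows the forms as printed above (sign and normalization conventions for the stresslet differ between references, so the sign used here is an assumption), and the function names are hypothetical.

```python
import math

def separation(x0, x):
    """Return x_hat = x - x0 and r = |x_hat| for two points in 3D."""
    xh = [a - b for a, b in zip(x, x0)]
    r = math.sqrt(sum(c * c for c in xh))
    return xh, r

def stokeslet(x0, x):
    """G_ij = delta_ij / r + xh_i xh_j / r^3."""
    xh, r = separation(x0, x)
    return [[(1.0 if i == j else 0.0) / r + xh[i] * xh[j] / r ** 3
             for j in range(3)] for i in range(3)]

def stresslet(x0, x):
    """T_ijk = 6 xh_i xh_j xh_k / r^5 (sign convention assumed as printed)."""
    xh, r = separation(x0, x)
    return [[[6.0 * xh[i] * xh[j] * xh[k] / r ** 5
              for k in range(3)] for j in range(3)] for i in range(3)]
```

Both kernels are singular as $r \to 0$, which is the origin of the quadrature difficulties for integrals with singular kernels.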


The boundary conditions at the interface, that is, the 
stress balance equation [4] and continuity of the 
velocity across the interface, are automatically 
satisfied by the boundary-integral formulation. 

The normal velocity of the interface $\Gamma(x, t)$ is
given by


$\frac{dx}{dt}\cdot n(x) = u(x, t)\cdot n(x)$ [10]


The shape of the interface does not depend on the 
tangential velocity and there are many possible 
choices that can be taken, see Hou et al. (2001). 

The principal advantages gained by using BIMs 
are the reduction of the flow problem by one 
dimension since the formulation involves quantities 
defined on the interface only and the potential for 
highly accurate solutions if the flow has topologi- 
cally regular interfaces. In addition, highly efficient 
adaptive surface mesh refinement algorithms have 
recently been developed to improve the performance 
and accuracy of the methods (Cristini et al. 2001). 
The main disadvantages are the development of 
accurate quadratures of integrals with singular 
kernels (particularly in 3D) and the need for local 
surgery of the interface in the event of topological 
changes. 

BIMs have been successfully used for simulations 
of complex multiphase flows: drop deformation and 
breakup; jets; capillary waves; mixing; drop-to-drop 
interaction; suspension of liquid drops in viscous 
flow (e.g., see Cristini et al. (2001), Hou et al. 
(2001), and Pozrikidis (2001) and the references 
therein). 


Volume-of-fluid (VOF) method In the VOF
method (see Scardovelli and Zaleski (1999) for a
recent review), the location of the interface is
determined by the volume fraction $c_{ij}$ of fluid 1 in
the computational cell $\Omega_{ij}$. In cells containing the
interface $0 < c_{ij} < 1$, $c_{ij} = 1$ in cells containing only fluid 1,
and $c_{ij} = 0$ in cells containing only fluid 2, as shown in
Figure 1b.

A VOF algorithm is divided into two parts: a
reconstruction step and a propagation step. A
typical interface reconstruction is shown in
Figure 1c. In the piecewise linear interface construction
(PLIC) method, the true interface, as shown in
Figure 1a, is approximated by a straight line
perpendicular to an interface normal vector $n_{ij}$ in
each cell $\Omega_{ij}$. The normal vector $n_{ij}$ is determined
from the volume fraction gradient using data from
neighboring cells. Given a volume fraction $c_{ij}$
and a normal vector $n_{ij}$, the interface is given by the
straight line with normal $n_{ij}$ such that the fractional area beneath
the line in cell $\Omega_{ij}$ is equal to $c_{ij}$. More recently,
parabolic reconstructions of the interface have been
used to gain higher-order accuracy for the surface
tension force (e.g., the “parabolic reconstruction of
surface tension” or PROST algorithm).

Figure 1 VOF representation of an interface: (a) actual
interface, (b) volume fraction, and (c) an approximation to the
interface produced using an interface reconstruction method
such as the piecewise linear approximation shown.

Once the interface has been reconstructed, its 
motion by the underlying flow field must be 
modeled by a suitable advection algorithm. The 
key here is that the explicit interface reconstruction 
enables fluxes to be developed that exactly conserve 
mass and do not diffuse the interface. 

Capillary effects may be represented by the 
continuous surface stress (Scardovelli and Zaleski 
1999), 


$T = -\tau(I - n\otimes n)\,|\nabla\tilde{c}|, \quad F_{\mathrm{sing}} = -\nabla\cdot T$ [11]


where $\tilde{c}$ is a smoothed version of the volume
fraction. For the flows in which the capillary force 
is the dominant physical mechanism, the PROST 
algorithm discussed above can be used to signifi- 
cantly reduce spurious currents due to inaccurate 
representation of surface tension terms and asso- 
ciated pressure jump in normal stress. 

The distribution form of the fluid equations [5] is 
typically solved using a variant of the projection 
method for incompressible single phase flows. 

VOF methods are popular and have been used in 
commercial multiphase flow codes, in models of 
inkjet printers, flows with surfactants and in many 
other applications (e.g., see Scardovelli and Zaleski 
(1999) and James and Lowengrub (2004) and the 
references therein). The principal advantage of VOF
methods is their inherent volume-conserving property.
Nevertheless, spurious bubbles and drops may
be created. The reconstruction of the interface from
the volume fractions and the computation of
geometric quantities such as curvature are typically
less accurate than in the other methods discussed here,
since the curvature and normal vectors are obtained
by differentiating a nearly discontinuous function
(the volume fraction).


Front-tracking methods The basic idea behind the 
original front-tracking method is the use of two 
grids as illustrated in Figure 2. One is a standard, 
Eulerian finite difference mesh that is used to solve 
the fluid equations. The other is a discretized 
interface mesh that is used to explicitly track the 
interface and compute surface tension force which is 
then transferred to the finite difference mesh via a 
discrete delta-function. Front tracking was first 
proposed by Richtmyer and Morton and further 
developed by Glimm and co-workers. 

A similar approach was taken by Unverdi and 
Tryggvason (see Tryggvason et al. (2001) and Peskin 
(2002) for recent reviews), who combined a moving 
grid description of the interface with flow computa- 
tions on a fixed grid. In this immersed-boundary 
approach, all the fluid phases are treated together by 
solving a single set of governing equations. This 
method has its roots in the original marker-and-cell 
(MAC) method, where marker particles are used to 
identify each fluid and the immersed-boundary 
method of Peskin and McQueen, that was designed 
to track moving elastic boundaries in homogeneous 
fluids. 

The interface is represented discretely by Lagran- 
gian markers that are connected to form a front 
which lies within and moves through a stationary 
Eulerian mesh. 

In Tryggvason’s original implementation, the 
basic structural unit is a line segment. Since the 
interface moves and deforms during the computa- 
tion, interface elements must occasionally be added 
or deleted to maintain regularity and stability. In the 
event of merging/breakup, elements must be relinked 
to effect a change in topology. 

Figure 2 (a) The basic idea in the front-tracking method is to use two grids: a stationary finite difference mesh and a moving
Lagrangian mesh, which is used to track the interface. (b) Blow-up of the subgrid control volume in (a). (c) Control volume for the
Eulerian mesh.

The interface is represented using an ordered list
of marker particles $x_k = ((x_1)_k, (x_2)_k)$, $1 \le k \le N$.
The first step in this algorithm is the advection of the
marker particles. A simple bilinear interpolation is
used to find the velocity inside each grid cell (indicated
in Figure 2c). The marker particles are then advected in
a Lagrangian manner. Once the points have been
advected, a list of connected polynomials $(p_k^1(s), p_k^2(s))$
is constructed using the marker particles. This gives a
parametric representation of the interface, with s
typically an approximation of the arclength. Both
lists are ordered and thus identify the topology of the
interface. In later works, higher-order polynomials
have been used (e.g., cubic splines) and semi-Lagrangian
evolutions have been implemented where other
tangential velocities have been used.
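The marker advection step can be sketched as follows. This is a simplified illustration under assumptions not stated in the article: velocities are stored at the nodes $(ih, jh)$ of a uniform grid with spacing h (a collocated layout rather than a staggered one), markers stay inside the grid with nonnegative coordinates, and time stepping is forward Euler; the function names are hypothetical.

```python
def bilinear(u, x, y, h):
    """Interpolate nodal values u[i][j], located at (i*h, j*h), to the point (x, y)."""
    i = min(int(x // h), len(u) - 2)      # index of the cell containing x
    j = min(int(y // h), len(u[0]) - 2)   # index of the cell containing y
    a = x / h - i                         # local coordinates in [0, 1]
    b = y / h - j
    return ((1 - a) * (1 - b) * u[i][j] + a * (1 - b) * u[i + 1][j]
            + (1 - a) * b * u[i][j + 1] + a * b * u[i + 1][j + 1])

def advect_markers(markers, u1, u2, h, dt):
    """Move each marker (x, y) with the interpolated velocity (forward Euler)."""
    moved = []
    for (x, y) in markers:
        vx = bilinear(u1, x, y, h)
        vy = bilinear(u2, x, y, h)
        moved.append((x + dt * vx, y + dt * vy))
    return moved
```

Bilinear interpolation reproduces any velocity field that is linear in x and y exactly, which gives a simple consistency check for an implementation.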

As the interface evolves, the markers drift along 
the interface following tangential velocities and 
more markers may be needed if the interface is 
stretched by the flow. Typically, the markers are 
redistributed along the interface to maintain an 
accurate interface representation. 

Next, we compute the surface tension force,


$F_{\mathrm{sing}}(x, t) = \int_{\Gamma(t)} \tau\kappa_f\,\delta(x - x_f(s))\,n_f\,ds$ [12]


where the subscript f means values evaluated at the
interface $\Gamma(t)$ and s is arclength. The discrete
numerical implementation of this distribution onto
the fixed grid is in the form of a sum over interface
elements, $x_{f,k}$:


$F_{ij}(x) = \sum_k f_k\,\delta(x - x_{f,k})\,\Delta s_k$ [13]


where $\Delta s_k$ is the average of the straight line
distances from the point $x_{f,k}$ to the two neighboring
points $x_{f,k+1}$ and $x_{f,k-1}$ as indicated by the subgrid
control volume shown in Figures 2a and 2b. The
delta-function is typically taken to be Peskin’s
discrete Dirac delta-function:


$\delta(x - x_{f,k}) = \begin{cases} \prod_{i=1}^{2} \frac{1}{4h}\Big(1 + \cos\frac{\pi\,(x_i - (x_i)_{f,k})}{2h}\Big) & \text{if } |x - x_{f,k}| < 2h \\ 0 & \text{otherwise} \end{cases}$ [14]


where h is the grid spacing of the Eulerian mesh.

Other higher-order alternative forms of the regular- 
ized delta-function using the product formula have 
recently been proposed. 
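A useful property of the cosine form of Peskin's delta-function is that its weights on the grid sum to one wherever the marker sits, so the total force is preserved when it is spread to the Eulerian mesh. The sketch below checks this numerically. It assumes a uniform grid of spacing h and applies the cutoff component by component (the usual tensor-product reading, taken here as an assumption); the function names are hypothetical.

```python
import math

def delta_1d(r, h):
    """One-dimensional cosine kernel: (1 + cos(pi r / (2h))) / (4h) for |r| < 2h."""
    if abs(r) >= 2.0 * h:
        return 0.0
    return (1.0 + math.cos(math.pi * r / (2.0 * h))) / (4.0 * h)

def delta_2d(dx, dy, h):
    """Tensor-product kernel in two dimensions."""
    return delta_1d(dx, h) * delta_1d(dy, h)

def total_weight(xf, yf, h, n=16):
    """Sum of delta * h^2 over the n x n grid nodes (i*h, j*h) for a marker at (xf, yf)."""
    return sum(delta_2d(i * h - xf, j * h - yf, h) * h * h
               for i in range(n) for j in range(n))
```

The marker must lie at least 2h away from the edge of the grid for the full support of the kernel to be covered.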

Using the Frenet relation, the surface tension force
on a short segment of the front is given by


$f_k = \int_A^B \tau\kappa_f n_f\,ds = \tau\int_A^B \frac{\partial t}{\partial s}\,ds = \tau(t_B - t_A)$ [15]


where A and B are the segment endpoints that lie
on the boundary of the subgrid control volume
(Figures 2a and 2b), and t is a tangent vector
computed by fitting a polynomial to the endpoints
of each element.
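Because eqn [15] expresses each segment force as a difference of tangent vectors, the forces telescope and sum to zero on a closed interface. The sketch below illustrates this for markers on a circle. It is a simplification under stated assumptions: the tangents $t_A$ and $t_B$ are taken as the unit chord directions to the neighboring markers instead of coming from a polynomial fit, and the function names are hypothetical.

```python
import math

def unit(v):
    """Normalize a 2D vector."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

def marker_forces(markers, tau):
    """f_k = tau * (t_B - t_A) with chord tangents on a closed, ordered marker list."""
    n = len(markers)
    forces = []
    for k in range(n):
        prev_pt, pt, next_pt = markers[k - 1], markers[k], markers[(k + 1) % n]
        t_a = unit((pt[0] - prev_pt[0], pt[1] - prev_pt[1]))
        t_b = unit((next_pt[0] - pt[0], next_pt[1] - pt[1]))
        forces.append((tau * (t_b[0] - t_a[0]), tau * (t_b[1] - t_a[1])))
    return forces

def circle_markers(radius, n):
    """Counterclockwise markers on a circle centered at the origin."""
    return [(radius * math.cos(2.0 * math.pi * k / n),
             radius * math.sin(2.0 * math.pi * k / n)) for k in range(n)]
```

For the circle each force points toward the center, as expected for surface tension acting on a drop, and no curvature needs to be computed explicitly.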

In the case of flows with varying density and/or
viscosity between the fluid components, there is a
need to calculate the phase indicator function $I(\mathbf{x},t)$
(defined by interface geometry and position), which
has the value 0 in fluid 1 and 1 in fluid 2. The
indicator function can be determined via the
solution of the equation

$$\Delta I(\mathbf{x},t) = \nabla\cdot\int_{\Gamma(t)} \mathbf{n}_f\,\delta(\mathbf{x}-\mathbf{x}_f(s,t))\,ds \qquad [16]$$

This equation is discretized on the Eulerian mesh
and a discrete delta-function (e.g., eqn [14]) is used.
The fluid properties such as density and viscosity are
determined via the indicator function, that is,
$\rho(\mathbf{x},t) = \rho_1 + (\rho_2 - \rho_1)\,I(\mathbf{x},t)$, etc.

As in the volume-of-fluid algorithm, the distribution
form of the Navier-Stokes equations [5] is
typically solved using a version of Chorin's projection
method.

An alternative flow solver that can be used to 
integrate the flow equations in the presence of an 
interface is the immersed-interface method (IIM). 
The IIM was developed by Leveque and Li (see the 
review Li 2003), and can be used together with 
front-tracking as well as level-set methods. 

The IIM directly incorporates jump conditions for 
the normal stress into the finite difference stencil. The 
key idea of this method is to use the jump conditions 
in Taylor series expansions of pressure and velocity 
near interfaces to derive difference equations that 
achieve pointwise second-order accuracy. 

The principal advantage of front-tracking algo- 
rithms is their inherent accuracy, due in part to the 
ability to use a large number of grid points on the 
interface. Front-tracking methods can be compli- 
cated to implement, particularly in 3D, but give the 
precise location and geometry of the interface. In 
addition, explicit front tracking permits more than 
one interface to be present in a single computational 
cell without coalescence, which can be important in 
dense bubbly flows, emulsions, etc. One of the major
handicaps of front-tracking methods is the difficulty
of modeling topological changes of the interface,
such as breakup and coalescence, without ad hoc
cut-and-connect procedures for reconnecting the
parameterized interface (particularly in 3D).


Interface-Capturing Methods 


Level-set method

Level-set methods, introduced by Osher and Sethian
(see the recent review papers (Osher and Fedkiw 2001,
Sethian and Smereka 2003) and the recent texts
(Osher and Fedkiw 2002, Sethian 1999)), are popular
computational techniques for tracking moving interfaces.
These methods rely on an implicit representation of the
interface as the zero set of an auxiliary function
(the level-set function). The application of these methods
to incompressible, multiphase flows started with
the work of Osher, Merriman, Sussman, Smereka,
Hou, and their collaborators.

Figure 3 (a) Zero contour of $\phi$ representing the interface $\Gamma$.
(b) Surface of $\phi$ with zero contour.

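In the level-set method, the level-set function
$\phi(\mathbf{x},t)$ is defined as follows (see Figure 3):

$$\phi(\mathbf{x},t)\ \begin{cases} > 0 & \text{if } \mathbf{x} \in \text{fluid 1} \\ = 0 & \text{if } \mathbf{x} \in \Gamma \text{ (the interface between fluids)} \\ < 0 & \text{if } \mathbf{x} \in \text{fluid 2} \end{cases}$$

and the evolution of $\phi$ is given by

$$\frac{\partial \phi}{\partial t} + \mathbf{u}\cdot\nabla\phi = 0 \qquad [17]$$

which means that the interface moves with the fluid.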
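
As a minimal illustration of eqn [17], the sketch below advects a one-dimensional level-set function with a constant velocity using a first-order upwind difference. The scheme, the grid, and the name `advect_upwind` are assumptions made for this example only; practical level-set codes generally use higher-order schemes.

```python
import numpy as np

def advect_upwind(phi, u, dx, dt, n_steps):
    """First-order upwind update of phi_t + u phi_x = 0 for constant u > 0."""
    phi = phi.copy()
    for _ in range(n_steps):
        dphi = np.empty_like(phi)
        dphi[1:] = (phi[1:] - phi[:-1]) / dx   # backward (upwind) difference
        dphi[0] = dphi[1]                      # extrapolate at the inflow boundary
        phi = phi - dt * u * dphi
    return phi

x = np.linspace(0.0, 1.0, 101)
dx = x[1] - x[0]
phi0 = x - 0.3                      # zero level set (the interface) at x = 0.3
u, dt, n_steps = 0.5, 0.004, 100    # CFL number u*dt/dx = 0.2, final time 0.4
phi = advect_upwind(phi0, u, dx, dt, n_steps)

# Locate the zero crossing; phi is increasing in x, as np.interp requires.
x_interface = np.interp(0.0, phi, x)
```

The interface starts at x = 0.3 and should be found near x = 0.3 + u t = 0.5, which is the statement that the zero level set moves with the fluid.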

To keep the interface geometry well resolved, the
level-set function $\phi$ should be a distance function near
the interface. However, under the evolution [17] it
will not necessarily remain one. We note that
special velocity extensions $\mathbf{v}$ off the interface (i.e.,
$\mathbf{v} = \mathbf{u}$ at the interface, $\mathbf{v} \neq \mathbf{u}$ away from the interface)
have recently been developed to better maintain $\phi$ as
a distance function (e.g., Sethian and Smereka (2003)
and Macklin and Lowengrub (2005)). Typically, a
reinitialization step (solving a Hamilton-Jacobi type
equation, eqn [18] below) is performed to keep $\phi$ as
a distance function near the interface while keeping
the original zero-level set unchanged. More specifically,
given a level-set function, $\phi$, at time t, the contours
are redistributed according to the steady-state
solution of the equation


$$\frac{\partial d}{\partial \tau} = S_\epsilon(\phi)\,(1 - |\nabla d|), \qquad d(\mathbf{x},0) = \phi(\mathbf{x}) \qquad [18]$$

where $S_\epsilon$ is the smoothed sign function defined as

$$S_\epsilon(\phi) = \frac{\phi}{\sqrt{\phi^2 + \epsilon^2}} \qquad [19]$$

where $\epsilon$ is usually one or two grid lengths. After
solving eqn [18] to steady state, $\phi(\mathbf{x},t)$ is
replaced by $d(\mathbf{x},\tau_{\mathrm{steady}})$. Note that $d(\mathbf{x},\tau_{\mathrm{steady}})$ is
typically a good approximation of the signed
distance function.
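
The reinitialization step of eqns [18] and [19] can be sketched in one dimension as follows. The Godunov-type upwind gradient, the explicit pseudo-time stepping, the boundary treatment, and the name `reinitialize` are assumptions chosen for illustration; the article does not prescribe a particular discretization.

```python
import numpy as np

def reinitialize(phi0, dx, eps, dtau, n_steps):
    """Iterate d_tau = S_eps(phi0) (1 - |d_x|) toward steady state (eqn [18])."""
    d = phi0.copy()
    sign = phi0 / np.sqrt(phi0**2 + eps**2)          # smoothed sign, eqn [19]
    for _ in range(n_steps):
        back = np.empty_like(d)
        fwd = np.empty_like(d)
        back[1:] = (d[1:] - d[:-1]) / dx             # backward differences
        back[0] = back[1]
        fwd[:-1] = back[1:]                          # forward differences
        fwd[-1] = fwd[-2]
        # Godunov upwinding: information travels away from the zero level set.
        grad_pos = np.sqrt(np.maximum(np.maximum(back, 0.0)**2,
                                      np.minimum(fwd, 0.0)**2))
        grad_neg = np.sqrt(np.maximum(np.minimum(back, 0.0)**2,
                                      np.maximum(fwd, 0.0)**2))
        grad = np.where(sign > 0.0, grad_pos, grad_neg)
        d = d + dtau * sign * (1.0 - grad)
    return d

x = np.linspace(-1.0, 1.0, 201)
dx = x[1] - x[0]
phi0 = 4.0 * x                       # correct zero crossing, wrong slope
d = reinitialize(phi0, dx, eps=dx, dtau=0.5 * dx, n_steps=200)
near = np.abs(x) <= 0.2
max_error = np.max(np.abs(d[near] - x[near]))
```

In this test the initial function has slope 4, and after reinitialization the values near the interface approach the signed distance x while the zero crossing stays at x = 0.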

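The density and viscosity are defined as

$$\rho(\phi) = \rho_2 + (\rho_1 - \rho_2)\,H_\epsilon(\phi)$$

and

$$\eta(\phi) = \eta_2 + (\eta_1 - \eta_2)\,H_\epsilon(\phi) \qquad [20]$$

where $H_\epsilon(\phi)$ is the smoothed Heaviside function
given by

$$H_\epsilon(\phi) = \begin{cases} 0 & \text{if } \phi < -\epsilon \\ \dfrac{1}{2}\left[1 + \dfrac{\phi}{\epsilon} + \dfrac{1}{\pi}\sin(\pi\phi/\epsilon)\right] & \text{if } |\phi| \le \epsilon \\ 1 & \text{if } \phi > \epsilon \end{cases}$$

The mollified delta-function is $\delta_\epsilon(\phi) = dH_\epsilon/d\phi$. The
surface tension force is given as

$$\mathbf{F}_{\mathrm{sing}} = -\tau\,\nabla\cdot\left(\frac{\nabla\phi}{|\nabla\phi|}\right)\delta_\epsilon(\phi)\,\frac{\nabla\phi}{|\nabla\phi|} \qquad [21]$$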
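
The smoothed Heaviside function, its derivative, and the density interpolation of eqn [20] translate directly into code. The sketch below is illustrative only; the function names and the sample values of the densities and of the smoothing width are assumptions.

```python
import numpy as np

def heaviside_smooth(phi, eps):
    """Smoothed Heaviside function H_eps(phi)."""
    phi = np.asarray(phi, dtype=float)
    inner = 0.5 * (1.0 + phi / eps + np.sin(np.pi * phi / eps) / np.pi)
    return np.where(phi < -eps, 0.0, np.where(phi > eps, 1.0, inner))

def delta_smooth(phi, eps):
    """Mollified delta-function, the derivative of H_eps with respect to phi."""
    phi = np.asarray(phi, dtype=float)
    inner = 0.5 * (1.0 + np.cos(np.pi * phi / eps)) / eps
    return np.where(np.abs(phi) <= eps, inner, 0.0)

def density(phi, rho1, rho2, eps):
    """Density of eqn [20]: rho1 where phi > eps, rho2 where phi < -eps."""
    return rho2 + (rho1 - rho2) * heaviside_smooth(phi, eps)

eps = 0.1
phi = np.linspace(-0.3, 0.3, 601)
dphi = phi[1] - phi[0]
delta_integral = np.sum(delta_smooth(phi, eps)) * dphi   # should be close to 1
rho_profile = density(phi, rho1=1.0, rho2=1000.0, eps=eps)
```

The mollified delta-function integrates to one across the transition layer, which is what allows it to replace the singular delta-function in the surface tension force of eqn [21].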


The fluid equations [5] are solved using projection 
methods, the IIM or the ghost-fluid (GF) method 
(e.g., Osher and Fedkiw (2001, 2002) and Fedkiw 
et al. (2003)). The GF method is similar to the IIM
in that jump discontinuities are incorporated in the 
finite difference stencil. In the GF algorithm, subcell 
resolution is used to mark the interface position and 
the values of discontinuous quantities are artificially 
extended to grid points neighboring the interface via 
extrapolation. A fully second order accurate GF 
method for moving interfaces has recently been 
developed (Macklin and Lowengrub 2005). 

Applications of the level-set method include 
multiphase flows, viscoelastic fluid flows and fluid- 
structure interactions (e.g., see the reviews Osher and 
Fedkiw (2001, 2002), Sethian (1999), and Sethian 
and Smereka (2003)). 

Advantages of the level-set algorithm include the 
simplicity with which it can be implemented, the 
ability to capture merging and breakup of interfaces 
automatically, and the ease with which the interface 
geometry can be described using the level-set 
function. A disadvantage of the level-set method is 
that mass is not conserved. 

Accurate numerical simulations of multiphase
flow and topology transitions require the computational
mesh to resolve both the macroscales (e.g.,
droplet size, flow geometry) and the microscales, in order to
accurately capture local interface geometries near
contact regions, van der Waals forces, surfactant
distribution, and Marangoni stresses. Adaptive mesh
algorithms have recently been used to greatly
increase accuracy and computational efficiency in
level-set methods. Typically, the methods involve
Cartesian adaptive mesh refinement. Problems
tackled using this approach include droplet formation
in inkjet printers and wake development behind
a ship. Another approach, recently developed, is to
use adaptive unstructured mesh refinement (Zheng
et al. 2005), as shown in Figure 4, in which the
impact of a drop onto a fluid interface is captured.

Figure 4 Each of the first three figures has a boxed region that is magnified in the next figure. The rates of magnification are 5, 10,
and 40/3, respectively. The meshes in the figure are used to simulate the drop-impacting interface problem. Source: Zheng X, Anderson A,
Lowengrub JS, and Cristini V (unpublished).


Hybrid Methods 


More recently, a number of hybrid methods, which 
combine good features of each algorithm, have been 
developed. These include coupled level-set volume- 
of-fluid (CLSVOF) algorithms, particle level-set 
methods, marker-VOF methods and level-contour 
front-tracking methods. 

Level-set and VOF methods have recently been 
combined. The volume fraction is used to maintain 
volume conservation, while the level-set function is 
used to describe the interface geometry. After every 
time step, the volume-fraction function and level-set 
function are made compatible. The coupling 
between the level-set function @ and the VOF 
function c occurs through the normal of the 
reconstructed interface and through the fact that 
the level-set function is reset to the exact signed 
normal distance to the reconstructed interface 
(where the area below the reconstructed interface is 
given by the volume-fraction function). 

In the particle level-set method, Lagrangian 
disconnected marker particles are randomly posi- 
tioned near the interface and are passively advected 
by the flow in order to rebuild the level-set function 
in under-resolved zones, such as high-curvature 
regions and near filaments. In these regions, the 
standard nonadaptive level-set method excessively
regularizes the interface structure and mass is lost.
The use of marker particles significantly ameliorates 
these difficulties. 


Recently, a hybrid method has been developed, 
which uses both marker particles, to reconstruct and 
move the interface, and the volume-fraction function 
to conserve volume. In this approach, a smooth 
motion of the interface, typical of marker methods is 
obtained together with volume conservation, as in 
standard VOF methods. This work improves both 
the accuracy of interface tracking, when compared 
to standard VOF methods, and the conservation of 
mass, with respect to the original marker method. 

Finally, a hybrid method that combines a level 
contour reconstruction technique with front-tracking 
methods has recently been developed to auto- 
matically model the merging and breakup of inter- 
faces in three-dimensional flows. 


Phase-Field Method 


Phase-field, or diffuse-interface, models are an 
increasingly popular choice for modeling the motion 
of multiphase fluids (see Anderson et al. (1998) for a 
recent review). In the phase-field model, sharp fluid 
interfaces are replaced by thin but nonzero thickness 
transition regions where the interfacial forces are 
smoothly distributed. The basic idea is to introduce 
a conserved order parameter (e.g., mass concentra- 
tion) that varies continuously over thin interfacial 
layers and is mostly uniform in the bulk phases (see 
Figure 5). 

For density-matched binary liquids (let p=1 
for simplicity), the coupling of the convective 
Cahn-Hilliard equation for the mass concentration 
with a modified momentum equation that includes a 
phase-field-dependent surface force is known as 
Model H (Hohenberg and Halperin 1977). In the 
case of fluids with different densities a phase-field 
model has been proposed by Lowengrub and 
Truskinovsky. Complex flow morphologies and 
topological transitions such as coalescence and 
interface breakup can be captured naturally and in 
a mass-conservative and energy-dissipative fashion 
since there is an associated free energy functional.

Figure 5 A concentration profile across an interface with
interface thickness $\xi$.


The phase field is governed by the following
advective Cahn-Hilliard equation:

$$\frac{\partial c}{\partial t} + \mathbf{u}\cdot\nabla c = \nabla\cdot(M(c)\nabla\mu) \qquad [22]$$

$$\mu = F'(c) - \epsilon^2\Delta c \qquad [23]$$

where $M(c) = c(1-c)$ is the mobility, $F(c) = (1/4)\,c^2(1-c)^2$
is a Helmholtz free energy that
describes the coexistence of immiscible phases, and
$\epsilon$ is a measure of interface thickness, with $\xi \sim \epsilon$ (see
Figure 5). It can be shown that in the sharp interface
limit $\epsilon \to 0$, the classical Navier-Stokes
equations and jump conditions are recovered.

The singular surface tension force is
$\mathbf{F}_{\mathrm{sing}} = -6\sqrt{2}\,\tau\epsilon\,\nabla\cdot(\nabla c\otimes\nabla c)$, where $\tau$ is the surface
tension coefficient. An alternative surface tension force
formulation based on the CSF is
$\mathbf{F}_{\mathrm{sing}} = -6\sqrt{2}\,\tau\epsilon\,\nabla\cdot(\nabla c/|\nabla c|)\,|\nabla c|\,\nabla c$.
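
A quick numerical check helps to connect the pieces of the binary model. The sketch below assumes the quartic free energy $F(c) = (1/4)c^2(1-c)^2$ and the chemical potential $\mu = F'(c) - \epsilon^2 \Delta c$, takes the hyperbolic tangent profile used for the initial data of Figure 6 as a flat one-dimensional interface, and evaluates the chemical potential and the integral that multiplies the surface tension coefficient. The grid, the value of `eps`, and the function name are assumptions for illustration.

```python
import numpy as np

eps = 0.05
x = np.linspace(-1.0, 1.0, 401)
dx = x[1] - x[0]
# Flat-interface profile with the same tanh shape as the Figure 6 initial data
c = 0.5 * (1.0 - np.tanh(x / (2.0 * np.sqrt(2.0) * eps)))

def chemical_potential(c, dx, eps):
    """mu = F'(c) - eps^2 c_xx with F(c) = c^2 (1 - c)^2 / 4, in one dimension."""
    lap = np.zeros_like(c)
    lap[1:-1] = (c[2:] - 2.0 * c[1:-1] + c[:-2]) / dx**2   # central difference
    dF = 0.5 * c * (1.0 - c) * (1.0 - 2.0 * c)             # F'(c)
    return dF - eps**2 * lap

mu = chemical_potential(c, dx, eps)
max_mu = np.max(np.abs(mu))           # close to zero for an equilibrium profile

# The factor 6*sqrt(2)*eps normalizes the integral of |c_x|^2 across the layer.
c_x = np.gradient(c, dx)
tension_integral = 6.0 * np.sqrt(2.0) * eps * np.sum(c_x**2) * dx
```

Under these assumptions the chemical potential vanishes up to discretization error, and the normalized integral is close to one, which is consistent with the factor $6\sqrt{2}$ in the surface tension force.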

Recently, very efficient nonlinear multigrid meth- 
ods have been developed to solve implicit discretiza- 
tions of the Cahn—Hilliard equation (e.g., Kim et al. 
(2004)). These schemes have been combined with 
projection methods to solve the Navier-Stokes 
equations to perform simulations of multiphase 
flows. 

An example of simulation of liquid thread breakup 
using a phase-field method is shown in Figure 6. 
A long cylindrical thread of a viscous fluid 1 is in an 
infinite mass of another viscous fluid 2. If the thread 
becomes varicose with wavelength $\lambda$, the equilibrium
of the column is unstable, provided $\lambda$ exceeds the
circumference of the cylinder. This is the Rayleigh 
capillary instability that results in surface-tension- 
driven breakup of the thread. 

An advantage of the phase-field approach is that it
is straightforward to include more complex physical
effects. For example, the binary model can be
straightforwardly extended to describe three-component
flows as follows.

Figure 6 Time evolution leading to multiple pinch-offs. The
evolution is from top to bottom and left to right. The domain is
axisymmetric, the initial velocities are zero everywhere, and the
concentration field is given by
$c(r,z) = 0.5\,(1 - \tanh((r - 0.5 - 0.05\cos(z))/(2\sqrt{2}\,\epsilon)))$
on $\Omega = (0,\pi)\times(0,2\pi)$. Densities are
matched and viscosity ratio is 0.5.

Consider a ternary mixture and denote the
composition of components 1, 2, and 3, expressed
as mass fractions, by $c_1$, $c_2$, and $c_3$, respectively.
Therefore,

$$\sum_{i=1}^{3} c_i = 1, \qquad 0 \le c_i \le 1 \qquad [24]$$

The composition of a ternary mixture (A, B, and C) 
can be mapped onto an equilateral triangle (the 
Gibbs triangle (Porter and Easterling 1993)) whose 
corners represent 100% concentration of A, B, or C 
as shown in Figure 7a. Mixtures with components 
lying on lines parallel to BC contain the same 
percentage of A, those with lines parallel to AC have 
the same percentage of B concentration, and 
analogously for the C concentration. In Figure 7a, 
the mixture at the position marked ‘o’ contains 60% A, 
10% B, and 30% C. Because the concentrations sum
to unity, only two of them need to be determined,
say $c_1$ and $c_2$.
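
The mapping from a ternary composition to a point in the Gibbs triangle can be sketched as follows. The placement of the corners (B at the origin, C at (1, 0), and A at the apex) and the function name are assumptions made for this illustration; Figure 7a may be oriented differently.

```python
import math

def gibbs_to_cartesian(c_a, c_b, c_c):
    """Map mass fractions (c_a, c_b, c_c) to a point in an equilateral triangle.

    Assumed corner placement: B = (0, 0), C = (1, 0), A = (0.5, sqrt(3)/2).
    """
    if abs(c_a + c_b + c_c - 1.0) > 1e-12:
        raise ValueError("mass fractions must sum to one")
    x = c_c + 0.5 * c_a
    y = 0.5 * math.sqrt(3.0) * c_a   # height above side BC is proportional to c_a
    return x, y

# The mixture quoted in the text: 60% A, 10% B, and 30% C
x_o, y_o = gibbs_to_cartesian(0.6, 0.1, 0.3)
# A second mixture with the same percentage of A lies on the same line parallel to BC
x_p, y_p = gibbs_to_cartesian(0.6, 0.3, 0.1)
```

With this placement, all mixtures with the same percentage of A share the same height above the side BC, which is the property described in the text.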

The evolution of $c_1$ and $c_2$ is governed by the
following advective ternary Cahn-Hilliard equations:

$$\frac{\partial c_1}{\partial t} + \mathbf{u}\cdot\nabla c_1 = \nabla\cdot(M(c_1,c_2)\nabla\mu_1) \qquad [25]$$

$$\frac{\partial c_2}{\partial t} + \mathbf{u}\cdot\nabla c_2 = \nabla\cdot(M(c_1,c_2)\nabla\mu_2) \qquad [26]$$

$$\mu_1 = \frac{\partial F}{\partial c_1} - \epsilon^2\Delta c_1 - 0.5\,\epsilon^2\Delta c_2 \qquad [27]$$

$$\mu_2 = \frac{\partial F}{\partial c_2} - 0.5\,\epsilon^2\Delta c_1 - \epsilon^2\Delta c_2 \qquad [28]$$

where $M(c_1,c_2) = \sum_{i<j} c_i c_j$ is the mobility and
$F(c_1,c_2)$ is the Helmholtz free energy that can be
used to model the miscibility of the components. An
example of a free energy (used in the simulation
shown in Figure 8 below) for which fluids 1 and 3
are immiscible and fluid 2 is preferentially miscible
with fluid 3 is:

Figure 7 (a) Gibbs triangle. (b) Contour plot of the free energy
$F(c_1,c_2)$ on the Gibbs triangle.


$$F(c_1,c_2) = 2c_1^2(1-c_1-c_2)^2 + (c_1+0.2)(c_2-0.2)^2 + (1.2-c_1-c_2)(c_2-0.4)^2$$

The contours of F on the Gibbs triangle are shown
in Figure 7b.
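The singular surface tension force is
$\mathbf{F}_{\mathrm{sing}} = -6\sqrt{2}\,\epsilon\sum_{i=1}^{3}\tau_i\,\nabla\cdot(\nabla c_i\otimes\nabla c_i)$, where the physical
surface tension coefficients $\tau_{ij}$ between two fluids i
and j are decomposed into the phase-specific surface
tensions $\tau_i$ such that $\tau_{ij} = \tau_i + \tau_j$.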











Figure 8 Evolution of concentration of fluid 1 (top row),
2 (middle row), and 3 (bottom row). The contours of $c_1$, $c_2$, and $c_3$ are
visualized in gray-scale where darker regions denote larger values of $c_1$, $c_2$, and $c_3$, respectively.

As a demonstration of the evolution possible in
partially miscible liquid systems, we present an
example in which there is a gravity-driven
(Rayleigh-Taylor) instability that enhances the
transfer of a preferentially miscible contaminant
from one immiscible fluid to another in 2D. In this
system, the ternary Cahn-Hilliard system is solved
using nonlinear multigrid methods and a projection
method (Kim and Lowengrub (in press)) is used to
solve the flow equations [5].

In Figure 8 (first column), the top half of the domain
initially consists of a mixture of fluids 1 and 2,
and the bottom half consists of fluid 3, which is
immiscible with fluid 1. The contours of $c_1$, $c_2$, and $c_3$
are visualized in gray-scale where darker regions
denote larger values of $c_1$, $c_2$, and $c_3$, respectively.
In the top row, the contours of fluid 1 are shown; the
middle and bottom rows correspond to fluids 2 and 3,
respectively.

Fluid 2 is preferentially miscible with fluid 3.
Fluid 1 is assumed to be the lightest and fluid 2 the
heaviest. The 1/2 mixture is denser
than fluid 3, so the density gradient induces
the Rayleigh-Taylor instability.

The evolution of the three phases is shown in
Figure 8. As the simulation begins, the 1/2 mixture
falls and fluid 2 diffuses into fluid 3. A characteristic
Rayleigh-Taylor (inverted) mushroom forms, the
surface area of the 1/3 interface increases, and
vorticity is generated and shed into the bulk.
As fluid 2 diffuses out of fluid 1, the pure fluid
1 rises to the top as shown in Figure 8. Imagining
that fluid 2 is a contaminant in fluid 1, this
configuration provides an efficient means of cleansing
fluid 1 since the buoyancy-driven flow enhances
the diffusional transfer of fluid 2 from fluid 1 to
fluid 3.

The advantages of the phase-field method are: 
(1) topology changes are automatically described; 
(2) the composition field c has a physical meaning 
not only near interface but also in the bulk phases; 
(3) complex physics can easily be incorporated into 
the framework, the methods can be straightforwardly 
extended to multicomponent systems, and miscible, 
immiscible, partially miscible, and lamellar phases 
can be modeled. 

Associated with diffuse interfaces is a small scale
$\epsilon$, proportional to the width of the interface. In real
physical systems describing immiscible fluids, $\epsilon$ can
be vanishingly small. However, for numerical
accuracy $\epsilon$ must be at least a few grid lengths in
size. This can make computations expensive. One
way of ameliorating this problem is to adaptively 
refine the grid only near the transition layer. Such 
methods are under development by various research 
groups. 

Phase-field methods have been used to model
viscoelastic flow, thermocapillary flow, spinodal
decomposition, mixing and interfacial stretching
in a shear flow, droplet breakup,
wave-breaking and sloshing, the fluid motion near
a moving contact line, and the nucleation and
annihilation of an equilibrium droplet (see the
references in the review paper Anderson et al.
(1998)).


Conclusions and Future Directions 


In this paper we have reviewed the basic ideas of 
interface-tracking and interface-capturing methods 
that are critical in simulating the motion of inter- 
faces in multicomponent fluid flows. The differences 
between these various formulations lie in the 
representation and the reconstruction of interfaces. 
The advantages and disadvantages of the algorithms 
have been discussed. While there has been much 
progress on the development of robust multifluid 
solvers, there is much more work to be done. 
Promising future directions for research include the 
incorporation of adaptive mesh refinement into the 
algorithms and the development of efficient hybrid 
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schemes that combine the best features of individual 
methods. 


See also: Breaking Water Waves; Capillary Surfaces;
Fluid Mechanics: Numerical Methods; Incompressible
Euler Equations: Mathematical Theory; Inviscid
Flows; Non-Newtonian Fluids; Partial Differential
Equations: Some Examples; Viscous Incompressible
Fluids: Mathematical Theory; Vortex Dynamics.

Zheng X, Lowengrub J, Anderson A, and Cristini V (2005)
Adaptive unstructured volume remeshing II. Application to
two- and three-dimensional level-set simulations of multiphase
flow. Journal of Computational Physics 208: 626-650.


Introduction


Intermittency has several meanings in turbulence.
The oldest one, now most often labeled "external"
or "large-scale" intermittency, refers to the coexistence
of turbulent and laminar regions in inhomogeneous
turbulent flows, such as in boundary
layers or in free shear layers. In those cases, the
interface between laminar irrotational flow and
turbulent vortical fluid is typically sharp and
corrugated. An observer sitting near the edge of
the layer is immersed in turbulent fluid only part of
the time.

The intermittency coefficient $\gamma$ measures the
fraction of turbulent fluid over the sampling
universe over which the statistics are taken. For
example, in a boundary layer such as that in
Figure 1, the intermittency coefficient as a function
of wall distance measures the fraction of turbulent
fluid at a given distance from the wall. External
intermittency is important in any attempt to model
realistic turbulent flows, which are almost always
inhomogeneous. Consider, for example, the classical
homogeneous relation in eqn [1] between the mean
kinetic energy K of the turbulent fluctuations and
the energy dissipation rate $\varepsilon$:

$$\varepsilon = C_\varepsilon\,\frac{K^{3/2}}{L} \qquad [1]$$

where L is the length scale of the largest eddies, and
$C_\varepsilon \approx 0.1$ is an experimentally determined constant.
Such relations are often implicit in turbulent models,
and they have to be modified to account for
intermittency. Equation [1] only holds within the
turbulent regions where the energy and the dissipation
rates are $K_T$ and $\varepsilon_T$, while the overall mean
values used in the modeling conservation equations
are $K = \gamma K_T$ and $\varepsilon = \gamma\varepsilon_T$. Substituting
$K_T = K/\gamma$ and $\varepsilon_T = \varepsilon/\gamma$ into eqn [1], the true overall relation
should therefore be

$$\varepsilon = C_\varepsilon\,\gamma^{-1/2}\,\frac{K^{3/2}}{L} \qquad [2]$$

which may differ substantially from eqn [1],
especially near the edge of the layer. Experimental
values and rough theoretical estimates for the
distribution of the intermittency coefficient are
available for most practical turbulent flows.

Figure 1 Sketch of a turbulent boundary layer, and of the
associated intermittency factor. An observer such as A, at a
distance y from the wall, only sees turbulent flow for a fraction $\gamma$
of the time.
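
The correction for external intermittency can be checked with a few lines of arithmetic. In the sketch below the function names and the numerical values of $\gamma$, $K_T$, and L are arbitrary assumptions; it applies the homogeneous relation [1] inside the turbulent region and compares the resulting overall means with a relation in which $K^{3/2}/L$ is multiplied by $\gamma^{-1/2}$.

```python
def dissipation_homogeneous(K, L, c_eps=0.1):
    """Homogeneous relation [1]: eps = C_eps K^(3/2) / L."""
    return c_eps * K**1.5 / L

def dissipation_intermittent(K, L, gamma, c_eps=0.1):
    """Overall relation with the intermittency factor: eps = C_eps gamma^(-1/2) K^(3/2) / L."""
    return c_eps * gamma**-0.5 * K**1.5 / L

gamma, K_T, L = 0.25, 2.0, 1.5           # illustrative values only
eps_T = dissipation_homogeneous(K_T, L)  # relation [1] holds in the turbulent fluid
K_mean = gamma * K_T                     # overall mean kinetic energy
eps_mean = gamma * eps_T                 # overall mean dissipation

# Using [1] directly with the overall means underestimates the dissipation
ratio = eps_mean / dissipation_homogeneous(K_mean, L)
```

For $\gamma = 0.25$ the homogeneous relation applied to the overall means is off by a factor $\gamma^{-1/2} = 2$, which illustrates why the difference matters most near the edge of the layer, where $\gamma$ is small.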


Internal Intermittency 


While the external intermittency just described is 
probably the most important one from the point of 
view of applications, it is not the most interesting 
from the theoretical point of view. Turbulence is a 
multiscale phenomenon which is inhomogeneous 
at all length scales, from the largest ones to the 
inner viscous cutoff (see Turbulence Theories). 
Moreover, this inhomogeneity goes beyond what 
could be expected just from the statistics of a 
random process. Consider, for example, the velocity
difference $\Delta u$ between two points separated
by a distance r. The original Kolmogorov formulation
of the energy cascade assumes that the
probability density function (PDF), $p(\Delta u)$, is a
universal function in the inertial range of scales,
whose only parameter is a velocity scale depending
on r. It then follows from Kolmogorov's analysis
that

$$p(\Delta u) = F\left[\Delta u/(\langle\varepsilon\rangle r)^{1/3}\right] \qquad [3]$$

where $\langle\varepsilon\rangle$ is the average energy transfer rate across
scales per unit mass, and the average $\langle\,\rangle$ is taken
either over the whole flow or over a suitably designed
ensemble of experiments. In an equilibrium system,
global energy conservation implies that $\langle\varepsilon\rangle$ is equal to
the average viscous dissipation per unit mass:

$$\langle\varepsilon\rangle = \nu\,\langle|\nabla \mathbf{u}|^2\rangle \qquad [4]$$


In eqn [4], the kinematic viscosity of the fluid is $\nu$, and
$|\nabla \mathbf{u}|$ is the L2-norm of the velocity gradient tensor.
Equation [3] is valid as long as the separation r is
much larger than the Kolmogorov viscous cutoff
$\eta = (\nu^3/\langle\varepsilon\rangle)^{1/4}$ and much smaller than the integral
scale of the largest eddies $L_\varepsilon = u'^3/\langle\varepsilon\rangle$, where $u'$ is the
root-mean-square value of the fluctuations of one
velocity component. The extent of this inertial range
is a function of the Reynolds number $Re_L = u' L_\varepsilon/\nu$:

$$L_\varepsilon/\eta = Re_L^{3/4} \qquad [5]$$
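
The relation between the extent of the inertial range and the Reynolds number follows from the definitions of the viscous cutoff and the integral scale, and it can be confirmed numerically. The function name and the sample values below are assumptions used only for illustration.

```python
def kolmogorov_scales(u_rms, eps_mean, nu):
    """Return the viscous cutoff eta, the integral scale L_eps, and Re_L."""
    eta = (nu**3 / eps_mean) ** 0.25   # Kolmogorov viscous cutoff
    L_eps = u_rms**3 / eps_mean        # integral scale of the largest eddies
    Re_L = u_rms * L_eps / nu          # Reynolds number based on u' and L_eps
    return eta, L_eps, Re_L

# Illustrative values: u' in m/s, dissipation in m^2/s^3, viscosity in m^2/s
eta, L_eps, Re_L = kolmogorov_scales(u_rms=1.2, eps_mean=0.8, nu=1.5e-5)
scale_ratio = L_eps / eta              # equals Re_L ** 0.75 by construction
```

Since the scale ratio grows only as the 3/4 power of the Reynolds number, a wide inertial range requires a very large Reynolds number.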


The strict similarity hypothesis in eqn [3] is not well
satisfied by experiments. While the velocity distribution
at a given point is approximately Gaussian,
Figure 2a shows that the velocity increments become
increasingly non-Gaussian as the spatial separation
is made much smaller than $L_\varepsilon$. It was also soon
noted that the dependence of eqn [3] on a single
parameter such as $\langle\varepsilon\rangle$ was theoretically suspect, since
it is difficult to see how the PDFs of a whole set of
local properties, such as the $\Delta u$ for different
intervals, could depend only on a single global
property. Kolmogorov himself sought to bypass that
difficulty by replacing eqn [3] with a "refined
similarity" hypothesis,

$$p(\Delta u) = F\left[\Delta u/(\varepsilon_r r)^{1/3}\right] \qquad [6]$$

where $\varepsilon_r$ is no longer a global average, but the mean
value of the dissipation over a ball of radius of order
r centered at the midpoint of the interval. This
refined similarity is better satisfied by experiments
(see Figure 2b), although, from the practical point of
view, it just transfers the problem of characterizing
$\Delta u$ to that of characterizing the statistics of $\varepsilon_r$.

It has become customary to measure the behavior
of $p(\Delta u)$ in terms of its structure functions,

$$S(n) = \int \Delta u^n\,p(\Delta u)\,d\Delta u \qquad [7]$$

which can be normalized as generalized flatness
factors,

$$\sigma(n) = S(n)/S(2)^{n/2} \qquad [8]$$

It follows from the strict similarity hypothesis [3] that

$$S(n) \sim r^{n/3} \qquad [9]$$

and that all the $\sigma(n)$ should be independent of the
separation.

For reference, the fourth-order flatness of a
Gaussian distribution is $\sigma(4) = 3$. Figure 3 shows
that the measured flatness is not independent of the
separation. The flatness increases as the
separation decreases, and it only levels off at lengths
of the order of the Kolmogorov viscous scale. For
separations in that viscous range the flow is smooth,
$\Delta u \approx (\partial_x u)\,r$, and

$$\sigma(n) = \frac{\langle(\partial_x u)^n\rangle}{\langle(\partial_x u)^2\rangle^{n/2}} \qquad [10]$$

It follows from eqn [10] and from Figure 3 that the
velocity gradients become increasingly non-Gaussian
as $L_\varepsilon$ and $\eta$ separate at high Reynolds numbers. The
velocity differences across intervals which are large
with respect to $\eta$ also become very non-Gaussian
when $r \ll L_\varepsilon$.
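
The generalized flatness factors can be estimated directly from samples of the velocity differences. The sketch below uses synthetic random samples rather than turbulence data, and the function names are assumptions; it only illustrates that a Gaussian sample gives a fourth-order flatness close to 3, while a distribution with heavier tails (here a Laplace distribution) gives a larger value.

```python
import numpy as np

def structure_function(du, n):
    """Sample estimate of S(n), the n-th moment of the velocity differences."""
    return np.mean(du**n)

def flatness(du, n):
    """Generalized flatness sigma(n) = S(n) / S(2)^(n/2)."""
    return structure_function(du, n) / structure_function(du, 2) ** (n / 2)

rng = np.random.default_rng(0)
gaussian = rng.standard_normal(400_000)   # stand-in for Gaussian increments
laplace = rng.laplace(size=400_000)       # stand-in for heavy-tailed increments

sigma4_gaussian = flatness(gaussian, 4)   # close to 3
sigma4_laplace = flatness(laplace, 4)     # close to 6, the Laplace value
```

High-order moments are dominated by rare large samples, so the estimate for the heavy-tailed case converges more slowly than the Gaussian one.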

Because the velocity difference between two
points which are not too close to each other can be
expressed as the sum of velocity differences over
subintervals, a loose application of the central limit
theorem would suggest that its PDF should be
roughly Gaussian. The key conditions for that to
happen are that the summands should be mutually
independent, that their magnitudes should be comparable,
and that each of them has a probability
distribution with a finite variance. The first of those
three conditions is probably a good approximation
if the separation is much longer than the viscous
cutoff, but the second one depends on the structure
of the flow. The experimental non-Gaussian behavior
suggests the existence of occasional very strong
velocity jumps. In the viscous range of scales, those
structures have been identified both experimentally
and numerically as very strong linear vortices, in
whose neighborhoods the strongest gradients are
generated. An example of a tangle of such structures
is shown in Figure 4.

Figure 2 PDFs of the differences of the velocity component in the direction of the separation (for separations in the inertial range of
scales). $r/L_\varepsilon = 0.02$-$0.36$, increasing by factors of 2; equivalent to $r/\eta = 180$-$3000$. Nominally isotropic turbulence at Reynolds
number Re, = 10°. (a) $\Delta u$ is normalized with the global energy dissipation rate $\langle\varepsilon\rangle$; distributions are wider as the separation decreases.
(b) $\Delta u$ is scaled with the locally averaged dissipation over the separation interval. Data courtesy of H Willaime and P Tabeling.

Figure 3 Fourth-order flatness of the differences of the
velocity component in the direction of the separation, for
separations in the inertial range of scales, $r/L_\varepsilon = 0.5$ to
$r/\eta = 2$. The Reynolds numbers of the different flows range
from Re, = 1800 to 10°. Data in part courtesy of H Willaime,
P Tabeling, and R A Antonia.

In another example, the vorticity in decaying
two-dimensional turbulence concentrates very
quickly into relatively few strong compact vortices,
which are stable except when they interact with
each other. The velocity field is dominated by them,
and the flatness of the velocity increments reaches
values of the order of $\sigma(4) \approx 50$-$100$, even at
moderate Reynolds numbers. That case is interesting
because something can be said about the
probability distribution of the velocity gradients.
We have noted that the PDF of a sum of mutually
comparable independent random variables with
finite variances tends to Gaussian when the number
of summands is large. This well-known theorem is a
particular case of a more general result about sums
of random variables whose incomplete second
moments diverge as

$$\mu_2(s) = \int_{-s}^{s} x^2 p(x)\,dx \sim s^{2-\alpha} \quad \text{when } s \to \infty \qquad [11]$$

When $0 < \alpha < 2$, the sums of such variables tend
to a family of "stable" distributions parametrized by
$\alpha$. The Gaussian case is the limit of that family when
$\alpha = 2$. In the case of two-dimensional vortices with
very small cores, the velocity gradients at a distance
R from the center of the vortex behave as $1/R^2$. If
we take s in eqn [11] to be one of those velocity
derivatives, its probability distribution is proportional
to the area covered by gradients with a given
magnitude, and

$$\mu_2(s) \sim \int_{s^{-1/2}}^{\infty} R^{-4}\,2\pi R\,dR \sim s \qquad [12]$$

Figure 4 Intense vortex tangle in the logarithmic layer of a
turbulent channel. The vortex diameters are of the order of $10\eta$,
and the size of the bounding box is of the order of the channel
width. Reproduced with permission of J C del Alamo.


The velocity derivatives at any point, which are
the sums of the velocity derivatives induced by all
the randomly distributed neighboring vortices,
should therefore be distributed according to the
stable distribution with $\alpha = 1$, which is Cauchy's

$$p(s) = \frac{c}{\pi(c^2 + s^2)} \qquad [13]$$

This distribution has no moments for $n \ge 1$. Its
tails decay as $s^{-2}$, and the distribution of the
gradients essentially reflects the properties of the
closest vortex. In real two-dimensional turbulent
flows, the distribution [13] is followed fairly well,
but its extreme tails only reach to the maximum
values of the velocity gradient found within the
viscous vortex cores, which are not exactly point
vortices.
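
The divergence of the incomplete second moment for a Cauchy distribution can be verified numerically. The sketch below assumes the density $p(s) = c/(\pi(c^2+s^2))$ with scale $c = 1$, uses a simple trapezoid rule, and compares the result with the closed form $(2c/\pi)(s - c\arctan(s/c))$; the function names are illustrative assumptions.

```python
import numpy as np

def cauchy_pdf(s, c=1.0):
    """Cauchy density with scale parameter c."""
    return c / (np.pi * (c**2 + s**2))

def incomplete_second_moment(s_max, c=1.0, n=200_001):
    """Trapezoid estimate of the integral of x^2 p(x) over [-s_max, s_max]."""
    x = np.linspace(-s_max, s_max, n)
    y = x**2 * cauchy_pdf(x, c)
    dx = x[1] - x[0]
    return dx * (np.sum(y) - 0.5 * (y[0] + y[-1]))

mu2_1000 = incomplete_second_moment(1000.0)
mu2_2000 = incomplete_second_moment(2000.0)
exact_1000 = (2.0 / np.pi) * (1000.0 - np.arctan(1000.0))
growth = mu2_2000 / mu2_1000   # close to 2: linear growth in the cutoff
```

Doubling the cutoff roughly doubles the incomplete second moment, in agreement with the scaling $s^{2-\alpha}$ for $\alpha = 1$.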


Other similar general results can be derived that 
link the behavior of the structure functions with the 
properties of the stable distributions corresponding 
to the type of flow singularities expected in the limit 
of infinite Reynolds number. 

The common feature of the two cases just 
described is the presence of strong structures that 
live for long times because viscosity stabilizes 
them. They are therefore more common than 
what could be expected on purely statistical 
grounds. They are responsible for the tails of the 
probability distributions of the velocity derivatives, 
but they are not the only intermittent features of 
turbulent flows. The increase of the flatness in 
Figure 3 below $r \approx 50\eta$ is clearly connected with
the presence of the coherent vortices, but even for
larger separations there is a smooth evolution of
$\sigma(4)$ that suggests that the formation of intense
structures is a gradual process that takes place 
across the inertial range. Much less is known 
about those hypothetical inertial structures than 
about the viscous ones. 

We can now recast the problem of intermittency 
in Navier-Stokes turbulence into geometric terms. 
The defining empirical observation for that system is 
that the energy dissipation given by eqn [4] does not 
vanish even in the infinite Reynolds number limit in 
which $\nu \to 0$. This means that the flow has to
become singular as $|\nabla \mathbf{u}|\,L_\varepsilon/u' \sim Re_L^{1/2}$. The strict
similarity approximation assumes that those singu- 
larities are uniformly distributed across the flow, but 
the experimental evidence just discussed shows that 
this is not true. The singularities are distributed 
inhomogeneously, and the inhomogeneity develops 
across the inertial cascade. The problem of inter- 
mittency is to characterize the geometry of the 
support of the flow singularities in the limit of 
infinite Reynolds number. 

In the absence of detailed physical mechanisms 
for the dynamics of the inertial range, most 
intermittency models are based on plausible pro- 
cesses compatible with the invariances of the 
inviscid Euler equations. The precise power law 
given in eqn [9] for the structure functions depends 
on the strict similarity hypothesis [3], but the fact 
that it is a power law only depends on the scaling 
invariances of the equations of motion. The 
energies and sizes of the eddies in the inertial 
range are too small for the integral scales of the 
flow to be relevant, and too large for the viscosity 
to be important. They therefore have no intrinsic 
velocity or length scales. Under those conditions, 
any function of the velocity which depends on 
a length has to be a power. Consider a quantity 
with dimensions of velocity, such as $u(r) = S(n)^{1/n}$, 
which is a function of a distance such as r. 
On dimensional grounds we should be able to 
write it as 


$$u(r) = U F(\rho)$$ [14] 


where $\rho = r/L$, and L and U(L) are arbitrary length 
and velocity scales. The value of u(r) should not 
depend on the choice of units, and we can 
differentiate eqn [14] with respect to L to give 


$$\partial_L u = (dU/dL)\, F(\rho) - U \rho L^{-1} (dF/d\rho) = 0$$ [15] 

which can only be satisfied if 


$$\frac{dF}{d\rho} = \zeta \frac{F}{\rho} \quad \Rightarrow \quad F \sim \rho^{\zeta}$$ [16] 

and $\zeta = L(dU/dL)/U$ is constant. This suggests 
generalizing eqn [9] to 


$$S(n) \sim r^{\zeta(n)}$$ [17] 


where the exponents are empirically adjusted. Only 
$\zeta(3) = 1$ can be derived directly from the Navier-Stokes 
equations. Equation [17] implies that $\sigma(n)$ 
satisfies a power law with exponent $\zeta(n) - n\zeta(2)/2$. 
In Figure 3, for example, the flatness follows a 
reasonably good power law outside the viscous 
range, consistent with $\zeta(4) - 2\zeta(2) \approx -0.12$. The 
anomalous behavior near the viscous limit, and 
similar limitations at the largest scales, mean that 
only very high Reynolds number flows can be used 
to measure the scaling exponents, and that the range 
over which they are measured is never very large. 
Moreover, the integrand of the higher-order struc- 
ture functions peaks at the extreme tails of the 
probability distributions of the velocity differences, 
which implies that very long experimental samples 
have to be used to accumulate enough statistics to 
measure the high-order exponents. For these and for 
other reasons, the scaling exponents above n = 8-10 
are poorly known. This is unfortunate because we 
will see later that some of the most interesting 
intermittency properties of the velocity field, such as 
the nature of the flow singularities in the infinite 
Reynolds number limit, depend on the behavior of 
the $\zeta(n)$ for large n. 

Experimental values for the scaling exponents are 
given in Table 1. They are generally smaller than the 
ones predicted by the strict similarity approxima- 
tion, implying that the moments of the velocity 
differences decrease with the separation more slowly 
than they would if they were self-similar, and 
suggesting that new stronger structures become 
important as the scale decreases. 
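
The size of this departure can be read directly off the tabulated exponents. The short Python sketch below is only an illustration: it hard-codes the mean experimental values of Table 1 (the variable names are arbitrary choices, and the quoted uncertainties are ignored), compares them with the strict-similarity values n/3, and evaluates the flatness exponent $\zeta(4) - 2\zeta(2)$ discussed above.

```python
# Mean experimental longitudinal exponents, copied from Table 1 (uncertainties omitted).
zeta_exp = {2: 0.70, 3: 1.00, 4: 1.30, 5: 1.56, 6: 1.79, 7: 1.99, 8: 2.22}


def zeta_similarity(n):
    """Strict-similarity value listed in the third column of Table 1: n/3."""
    return n / 3.0


# Departure from strict similarity for each tabulated order.
anomaly = {n: z - zeta_similarity(n) for n, z in zeta_exp.items()}

# Exponent of the flatness sigma(4), i.e. zeta(4) - 2*zeta(2).
flatness_exponent = zeta_exp[4] - 2.0 * zeta_exp[2]

for n in sorted(anomaly):
    print(f"n={n}: experimental={zeta_exp[n]:.2f}  "
          f"similarity={zeta_similarity(n):.3f}  difference={anomaly[n]:+.3f}")
print(f"zeta(4) - 2*zeta(2) = {flatness_exponent:+.2f}")
```

With the tabulated means the difference is negative for every order above 3 and grows in magnitude with n, while the flatness exponent comes out close to the value quoted for Figure 3.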

Table 1 Longitudinal scaling exponents 

Order   Experimental    Strict similarity 
2       0.70 ± 0.01     0.667 
3       1.00            1 
4       1.30 ± 0.03     1.333 
5       1.56 ± 0.04     1.667 
6       1.79 ± 0.03     2.000 
7       1.99 ± 0.10     2.333 
8       2.22 ± 0.05     2.667 

The values in the second column are averages from different 
experiments, and the standard deviations reflect scatter among 
experiments. The third column is the value from the strict 
similarity equation [9]. 

Note that we have included in the table values for 
odd-order powers. Up to now we have not specified 
which velocity component is being analyzed, but 
most experiments refer to the one in the direction 
of the separation. That is the easiest case to 
measure, especially if time is used as a surrogate 
for distance, and those PDFs are not symmetric 
even in isotropic turbulence. Negative increments 
are more common than positive ones because of the 
extra energy required to stretch a vortex, and the 
effect is clearly visible in the distributions in 
Figure 2. Those longitudinal odd-order structure 
functions do not vanish, and their scaling expo- 
nents are the ones given in the table. The transverse 
structure functions are those in which the velocity 
component is normal to the separation, and their 
odd-order moments vanish by symmetry in iso- 
tropic turbulence. There has been a lot of discus- 
sion about whether the longitudinal scaling 
exponents of even orders differ from the transverse 
ones. Early results suggested that the latter are 
lower than the former, undermining the case for 
intermittency theories based on similarity argu- 
ments, and suggesting that a more mechanistic 
approach was needed. The present consensus 
seems to be that both sets of exponents are 
equal, but that there are residual effects of low 
Reynolds numbers and of flow anisotropy that are 
difficult to avoid experimentally. The question is 
still open. 


Multiplicative Models 


The most successful phenomenological models for 
the geometry of intermittency are based on the 
concept of a multiplicative cascade. Consider some 
flow property v, such as the locally averaged 
energy transfer rate by eddies of size $r_k$, which 
cascades into smaller eddies of size $r_{k+1}$, which is 
some fraction of $r_k$. Denote by $p_k(v_k)$ the 
probability distribution of the value of v at the 
step k of the cascade. 

Assume that the cascade is Markovian in the sense 
that the probability distribution of $v_k$ depends only 
on its value in the previous step, 


$$p_{k+1}(v_{k+1}) = \int p_T(v_{k+1} | v_k; k)\, p_k(v_k)\, dv_k$$ [18] 


This is in contrast to some more complicated 
functional dependence, such as on the values of $v_k$ 
in some extended spatial neighborhood, or on 
several previous cascade stages. This assumption 
intuitively implies that $v_{k+1}$ evolves faster, or on a 
smaller scale, than $v_k$, and that it is in some kind of 
equilibrium with its precursor. If the cascade is 
deterministic in that sense, $v_k$ can be represented as 
a product 


$$v_k / v_0 = q_k q_{k-1} \cdots q_1$$ [19] 


in which the factors $q_k = v_k / v_{k-1}$ are statistically 
independent of each other. 

If the underlying process is invariant to scaling 
transformations, the transition probability density 
function has to have the form 


$$p_T(v_{k+1} | v_k) = v_k^{-1}\, w(q_{k+1}; k)$$ [20] 


The multiplicative model works most naturally 
for positive variables, and we will assume that 
to be the case in the following, but most results 
can be generalized to arbitrary distributions. We 
will also assume for simplicity that all the 
cascade steps are equivalent, so that the distribu- 
tion w(q) of the multiplicative factors is indepen- 
dent of k, and depends only on our choice for 
$r_{k+1}/r_k$. 

Local deterministic self-similar cascades lead 
naturally to intermittent distributions, in the sense 
that the high-order flatness factors for $v_k$ become 
arbitrarily large as k increases. It follows from eqns 
[18]-[20] that the nth order moment for $p_k$ can be 
written as 


$$S_k(n) = \int \xi^n p_k(\xi)\, d\xi = S_0(n)\, S_w(n)^k$$ [21] 


where $S_w(n)$ is the nth order moment of the 
multiplicative factor q, and n is any real number 
for which the integral exists. If we define flatness 
factors as in eqn [7], we can rewrite eqn [21] as 


$$\sigma_k(n) = \sigma_0(n)\, \sigma_w(n)^k$$ [22] 

It follows from Chebyshev’s inequality that 


$$S(n) > S(n-2)\, S(2) > S(n-4)\, S(2)^2 \ldots$$ [23] 


from where 

$$1 < \sigma(4) < \sigma(6) \ldots$$ [24] 


which is true for any distribution of positive 
numbers. Equality only holds for trivial distributions 
concentrated on a single value. The product in eqn 
[22] therefore increases without bound with the 
number of cascade steps, and the flatness factors 
diverge. 
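
The growth of the flatness with the number of cascade steps is easy to check numerically. The following Python sketch is an illustration under stated assumptions, not a model taken from the text: the multiplier is a hypothetical two-valued variable (q = 0.6 or 1.4 with equal probability), the initial value is taken as deterministic so that its flatness is 1, and the flatness is assumed to be S(n)/S(2)^(n/2). It compares a brute-force enumeration of all products of k multipliers with the k-th power of the single-step flatness.

```python
from itertools import product

# Hypothetical two-valued multiplier: q = 0.6 or 1.4 with equal probability.
Q_VALUES = (0.6, 1.4)
Q_PROBS = (0.5, 0.5)


def moment_w(n):
    """n-th moment S_w(n) of a single multiplier."""
    return sum(p * q**n for q, p in zip(Q_VALUES, Q_PROBS))


def flatness_w(n):
    """Flatness of a single multiplier, S_w(n) / S_w(2)^(n/2)."""
    return moment_w(n) / moment_w(2) ** (n / 2)


def flatness_cascade(n, k):
    """Flatness of v_k/v_0 after k steps, enumerating all 2^k multiplier sequences."""
    s_n = 0.0
    s_2 = 0.0
    for indices in product(range(len(Q_VALUES)), repeat=k):
        prob = 1.0
        value = 1.0
        for i in indices:
            prob *= Q_PROBS[i]
            value *= Q_VALUES[i]
        s_n += prob * value**n
        s_2 += prob * value**2
    return s_n / s_2 ** (n / 2)


for k in (1, 4, 8, 12):
    print(k, flatness_cascade(4, k), flatness_w(4) ** k)
```

The two printed columns agree, and both grow geometrically with k because the single-step flatness is larger than 1.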

It is tempting to substitute k in [21] by a 
continuous variable, in which case the PDFs form 
a continuous semigroup generated by infinitesimal 
scaling steps. This leads to beautiful theoretical 
developments, but it is not necessarily a good idea 
from the physical point of view. For example, while 
it might be reasonable to assume that the properties 
of an eddy of size r depend only on those of the 
eddy of size 2r from which it derives, the same 
argument is weaker when applied to eddies of 
almost equal sizes. We will restrict ourselves here 
to the discrete case. 


Limiting Distributions 


The multiplicative process just described can be 
summarized as a family of distributions $p_k(v)$ such 
that the probability density for the product of two 
variables is 


$$p(v_{k_1} v_{k_2}) = p_{k_1 + k_2}(v_{k_1 + k_2})$$ [25] 


and it is natural to ask whether there is a limiting 
distribution for large k. We know that, in the case of 
sums, rather than products, such distributions tend 
to be Gaussian under fairly general conditions, and 
the first attempt to analyze [25] was to reduce it to a 
sum by defining 


$$z = k^{-1} \log(v_k/v_0)$$ [26] 


The argument was that z would tend to a Gaussian 
distribution, and that the limiting distribution for $v_k$ 
would be lognormal. This was soon shown to be 
incorrect. The central part of the distribution 
approaches lognormality, but the tails do not, 
because the central limit theorem says nothing 
about their behavior. The family of lognormal 
distributions is a fixed point of eqn [25], but it is 
unstable, and it is only attained if the individual 
generating distributions are themselves lognormal. 
The lognormal distribution has moments 


$$S_w(n) = \exp(a n + b n^2)$$ [27] 


which are conserved under [21], so that the product 
of lognormally distributed variables stays lognormal. 
The moments in eqn [27] are generated by the 
recursive relation 

$$\Omega_w(n) \equiv \frac{S_w^3(n+1)\, S_w(n+3)}{S_w(n)\, S_w^3(n+2)} = 1$$ [28] 

with suitable conditions for $n \le 2$. Under [21], 
$\Omega_k(n) = \Omega_w^k(n)$, and it is clear that only when all 
the $\Omega_w(n)$ are exactly equal to 1 do they continue to 
be so under multiplication. Otherwise, any $\Omega_w$ 
initially larger than 1 tends to infinity after enough 
cascade steps, while any one initially smaller than 1 
tends to 0. Only an exactly lognormal distribution 
of the generating factors results in a lognormal 
limiting distribution, and even small errors lead to 
very different patterns of moments. This contrasts 
with the situation for sums of random variables, in 
which the Gaussian distribution is not only a fixed 
point, but also has a very large basin of attraction. 
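
One way to see this sensitivity numerically is to note that the logarithm of the lognormal moments in eqn [27] is exactly quadratic in n, so that its third finite difference vanishes, while for any other multiplier it does not. The Python sketch below is only an illustration: the parameters a and b, the two-valued comparison multiplier, and the choice of the third difference as a diagnostic are assumptions made here. It uses the fact that, by eqn [21] with $S_0 = 1$, the log-moments after k steps are k times those of a single step.

```python
import math


def log_moment_lognormal(n, a=-0.1, b=0.05):
    """log S_w(n) for a lognormal multiplier, eqn [27]: a*n + b*n**2 (a, b are arbitrary)."""
    return a * n + b * n * n


def log_moment_binary(n, q_lo=0.6, q_hi=1.4):
    """log S_w(n) for a hypothetical multiplier equal to q_lo or q_hi with probability 1/2."""
    return math.log(0.5 * q_lo**n + 0.5 * q_hi**n)


def third_difference(f, n):
    """Third finite difference of f; it vanishes identically when f is quadratic in n."""
    return f(n + 3) - 3.0 * f(n + 2) + 3.0 * f(n + 1) - f(n)


def cascade_departure(log_moment, n, k):
    """Same diagnostic after k steps, using log S_k(n) = k * log S_w(n) (eqn [21], S_0 = 1)."""
    return k * third_difference(log_moment, n)


for k in (1, 10, 100):
    print(k,
          cascade_departure(log_moment_lognormal, 2, k),
          cascade_departure(log_moment_binary, 2, k))
```

The lognormal column stays at zero up to rounding error, whereas the departure of the two-valued multiplier grows linearly with k in the logarithm, that is, geometrically in the moments themselves.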


Multifractals 


The problem with using the transformation [26] to 
find the limiting distribution of a multiplicative 
process is not so much the technique of analyzing 
the statistics of products in terms of those of sums, 
but the inappropriate use of the central limit 
theorem. It can be bypassed by using instead the 
theory of large deviations of sums of random 
variables. The key result is obtained by expanding 
the characteristic function of $p_k$ when $k \gg 1$, and 
states that 


$$p_k(z) \approx \left( -\frac{k \phi''(z)}{2\pi} \right)^{1/2} \exp[k \phi(z)]$$ [29] 


where z is defined as in [26] and $\phi$, which plays the 
role of an entropy, is a smooth function of z. Primes 
stand for derivatives with respect to z. Let us define 
$z_n$ as the point where 


$$\phi'_n \equiv \phi'(z_n) = -n$$ [30] 


which corresponds to the location of the maximum 
of $\phi + nz$. The entropy $\phi$ can be computed from the 
moments of the transition probability density. Using 
Laplace’s method to expand the nth moment of $p_k$, 
we obtain 


$$\frac{S_k(n)}{S_0(n)} = \int e^{knz}\, p_k(z)\, dz \approx \exp\{k[\phi(z_n) + n z_n]\}$$ [31] 

from where, using [21], 

$$\lambda_n = \log S_w(n) = \phi(z_n) + n z_n$$ [32] 


The essence of Laplace’s approximation is that, for 
$k \gg 1$, most of the contribution to the integral in 
eqn [31] comes from the neighborhood of $z_n$, so that 
it makes sense to consider each such neighborhood 
as a separate “component” of the cascade. 

The geometric interpretation of this classification 
into components as a multifractal was developed in 
the context of three-dimensional homogeneous 
turbulence. We have up to now assumed very little 
about the nature of each cascade step, but it is 
natural in turbulence to interpret it as the process in 
which eddies decay to a smaller geometric scale. The 
argument works for any variable for which scale 
similarity can be invoked, but we have seen that 
most experiments are done for the magnitude of the 
velocity increments across a distance r. If we 
assume for simplicity that $r_k/r_{k+1} = e$, so that 
$r_k/r_0 = \exp(-k)$, eqns [26] and [29] can be written as 


$$v_k/v_0 = (r_k/r_0)^{-z}, \qquad p_k(z_n) \sim (r_k/r_0)^{-\phi_n}$$ [33] 


The multifractal interpretation is that the “component” 
indexed by n, whose velocity increments are 
“singular” in terms of r with exponent $z_n$, lies on a 
fractal whose volume is proportional to its probability, 
and which therefore has a dimension 
$D(z_n) = 3 + \phi_n$. 

Note that eqn [32] implies that the scaling 
exponents in eqn [17] can now be expressed as 


$$\zeta(n) = -\log S_w(n) = -\lambda_n$$ [34] 


The scaling exponents, the multifractal spectrum, the 
distribution of the multipliers, and the limiting 
distribution $p_k(v)$ are equivalent descriptions that 
univocally determine each other. Note however that 
different quantities have different scaling exponents. 
For example, it follows from eqn [6] that, if the 
scaling exponents for the local dissipation are 
$\zeta_\varepsilon(n)$, the exponents for $\Delta u$ would be 
$\zeta_{\Delta u}(n) = n/3 + \zeta_\varepsilon(n/3)$. 
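
The chain from multipliers to exponents to dimensions can be traced explicitly for a simple case. The Python sketch below is an illustration under stated assumptions: the multiplier is a hypothetical two-valued variable (0.6 or 1.4 with equal probability), and the cascade ratio between successive scales is e, as assumed above. It uses eqn [32] together with the stationarity condition [30], which imply that the singularity exponent is the derivative of $\lambda_n$ with respect to n, and then evaluates the entropy and the dimension 3 plus the entropy for several orders.

```python
import math

Q_LO, Q_HI = 0.6, 1.4  # hypothetical two-valued multiplier, each value with probability 1/2


def lam(n):
    """lambda_n = log S_w(n), as in eqn [32]."""
    return math.log(0.5 * Q_LO**n + 0.5 * Q_HI**n)


def singularity_exponent(n):
    """z_n = d(lambda_n)/dn, which follows from eqns [30] and [32]."""
    w_lo, w_hi = Q_LO**n, Q_HI**n
    return (w_lo * math.log(Q_LO) + w_hi * math.log(Q_HI)) / (w_lo + w_hi)


def entropy(n):
    """phi(z_n) = lambda_n - n * z_n, from eqn [32]."""
    return lam(n) - n * singularity_exponent(n)


def dimension(n):
    """Dimension 3 + phi(z_n) of the component indexed by n."""
    return 3.0 + entropy(n)


for n in (0, 1, 2, 4, 8, 50):
    print(n, round(singularity_exponent(n), 4), round(dimension(n), 4), round(-lam(n), 4))
```

For n = 0 the component fills space (dimension 3), and the dimension decreases as n grows. The last printed column is the scaling exponent implied by eqn [34] for this particular multiplier.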

Some properties can be easily derived from the 
previous discussion. If we assume, for example, that 
the multiplicative factor q is bounded above by $q_b$, 
which is reasonable for many physical systems, eqn 
[26] implies that $z_n \le \log q_b$. In fact, if the transition 
probability behaves near $q_b$ as $w(q) \sim (q_b - q)^\beta$, the 
scaling exponents tend to 


$$\lambda_n = n \log q_b - (\beta + 1) \log n + O(1)$$ [35] 


for $n \gg 1$. In the case in which w(q) has a 
concentrated component at $q = q_b$, the log n is 
missing in eqn [35]. In all cases, the singularity 
exponent of the set associated with $n \to \infty$ is 
$z_\infty = \log q_b$, because the very high moments are 
dominated by the largest possible multiplier. In the 
case of a concentrated distribution the dimension of 
this set approaches a finite limit, but otherwise 


$$D(n) \approx -(\beta + 1) \log n$$ [36] 


which becomes infinitely negative. This should not 
be considered a flaw. The set of events which only 
happen at isolated points and at isolated instants has 
dimension D = —1 in three-dimensional space, and 
those which only happen at isolated instants, and 
only under certain circumstances, have still lower 
negative dimensions. Sets with very negative dimen- 
sions are however extremely sparse, and are difficult 
to characterize experimentally. 
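
The asymptotic form [35] can be checked on a distribution for which the moments are known in closed form. The Python sketch below is an illustration only: it assumes a multiplier density proportional to $(q_b - q)^\beta$ on the whole interval from 0 to $q_b$ (a hypothetical choice, with arbitrary values of $q_b$ and $\beta$), for which the moment integral is a Beta function, and verifies that what remains of $\lambda_n$ after subtracting the two leading terms of eqn [35] tends to a constant.

```python
import math


def lam_power_law_edge(n, q_b=1.5, beta=1.0):
    """lambda_n = log S_w(n) for w(q) = (beta+1) * (q_b - q)**beta / q_b**(beta+1), 0 <= q <= q_b.

    The moment integral is a Beta function; log-gamma is used to avoid overflow at large n.
    """
    return (n * math.log(q_b) + math.lgamma(beta + 2.0)
            + math.lgamma(n + 1.0) - math.lgamma(n + beta + 2.0))


def remainder(n, q_b=1.5, beta=1.0):
    """lambda_n minus the two leading terms of eqn [35]; this is the O(1) term."""
    return lam_power_law_edge(n, q_b, beta) - n * math.log(q_b) + (beta + 1.0) * math.log(n)


for n in (10.0, 1e2, 1e4, 1e6):
    print(n, remainder(n))
```

For this particular density the remainder converges to the logarithm of the Gamma function evaluated at $\beta + 2$, which confirms the logarithmic correction in eqn [35].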

The multifractal spectrum of the velocity differ- 
ences in three-dimensional Navier-Stokes turbulence 
has been measured for several flows in terms of the 
scaling exponents, and appears to be universal. The 
probability distribution w(q) of the multipliers has 
also been measured directly, and agrees well with 
the values implied by the exponents. It is also 
approximately independent of r, although not 
completely, perhaps due to the same experimental 
problems of anisotropy and limited Reynolds 
number which plague the measurement of the 
scaling exponents. There has been extensive theore- 
tical work on the consequences of imposing various 
physical constraints on the multipliers, especially the 
conservation requirement that the average value of 
the dissipation has to be conserved across each 
cascade step. Several simple models have been 
proposed for the transition distribution which 
approximate the experimental exponents well, but 
the relation lacks specificity. Models that are very 
different give very similar results, and it is impos- 
sible to choose among them using the available data. 

Multiplicative cascades and the resulting inter- 
mittency are not limited to Navier-Stokes turbu- 
lence. The equations of motion have only entered 
the discussion in this section through the assumption 
of scaling invariance. Multifractal models have in 
fact been proposed for many chaotic systems, from 
social sciences to economics, although the geometric 
interpretation is hard to justify in most of them. It is 
also important to realize that the fact that a given 
process can in principle be described as a cascade 
does not necessarily mean that such a description is 
a good one. Neither does a cascade imply a 
multiplicative process. For each particular case, we 
need to provide a dynamical mechanism that 
implements both the cascade and the transition 
multipliers. In three-dimensional Navier-Stokes 
turbulence, the basic transport of energy to smaller 
scales and to higher gradients is vortex stretching. 
The differential strengthening and weakening of the 
vorticity under axial stretching and compression 
also provide a natural way of introducing the self- 
similar transition probabilities of the local dissipation. 

Examples of nonintermittent cascades abound. 
We have already mentioned that the vorticity in 
decaying two-dimensional turbulence gets concentrated 
into stable vortex cores which eventually 
block the decay. The resulting enstrophy distribu- 
tion is highly intermittent, but it is not well 
described by a multifractal. Conversely, forced 
two-dimensional turbulence is dominated by an 
inverse energy cascade to larger scales, which is not 
intermittent. 

In addition, the intermittency of some systems is 
not a small-scale effect. Turbulent mixing of a 
passive scalar, which is the key process in 
turbulent heat transfer and in the atmospheric 
dispersion of pollutants, is an extremely intermit- 
tent phenomenon. The gradients of the scalar tend 
to be very localized, but they concentrate in sheets, 
narrow in thickness but otherwise extended. Some 
progress has recently been made on a simplified 
model due to Kraichnan for this problem, which is 
the linear stirring of a passive scalar by a random 
noise with delta correlation. Its statistics have been 
computed analytically, but the constraints of 
linearity and of uncorrelated forcing are strong, 
and the same methods do not appear to be 
extensible to mixing by real turbulence (see 
Lagrangian Dispersion (Passive Scalar)). Another 
problem in which intermittency is confined to 
large-scale surfaces is the motion of a three- 
dimensional pressureless gas, which has been used 
as a model for hypersonic turbulence and for the 
large-scale evolution of dark matter in the early 
universe. 

In summary, intermittency is a fascinating property 
of many random systems, including three-dimensional 
Navier-Stokes turbulence, which interferes, sometimes 
strongly, with their description by simple cascade 
models. Significant advances have been made in its 
quantitative kinematic analysis. In some cases we also 
have a qualitative understanding of its roots. But in very 
few cases do we understand it well enough to make 
quantitative predictions. 


See also: Ergodic Theory; Incompressible Euler 
Equations: Mathematical Theory; Lagrangian Dispersion 
(Passive Scalar); Turbulence Theories; Vortex Dynamics; 
Wavelets: Applications. 


Introduction 


Intersection theory is the theory that governs the 
rigorous definition of intersections of cycles. This 
can take place in a variety of mathematical contexts, 
for instance, the intersections of two cycles on an 
oriented manifold in algebraic topology, of two 
currents on a differentiable manifold in differential 
geometry, or of two subvarieties on a nonsingular 
algebraic variety in algebraic geometry. 

In algebraic geometry the theory is especially well 
developed (Fulton 1998). A cycle on an algebraic 
variety (or scheme) is a formal linear combination of 
irreducible closed subvarieties. These are subject to 
an equivalence relation called rational equivalence. 
For every rational function on every subvariety, its 
zero set is deemed rationally equivalent to its poles 
(with appropriate multiplicities). 

As an example, in the complex projective plane 
$CP^2$, any two lines are rationally equivalent since 
the ratio of two linear forms will vanish on one line 
and have a pole along the other. Similarly, a curve 
of degree d is rationally equivalent to d lines. Any 
two points in $CP^2$ can be joined by a line (a copy of 
$CP^1$), and a rational function on $CP^1$ can be chosen 
to vanish at one point and have a pole at the other. 
The groups of cycles modulo rational equivalence, 
known as Chow groups, are 


$CH_2(CP^2) = \mathbb{Z}$, generated by the fundamental class $[CP^2]$ 
$CH_1(CP^2) = \mathbb{Z}$, generated by the class of a line 
$CH_0(CP^2) = \mathbb{Z}$, generated by the class of a point 


Two distinct lines $\ell_1$ and $\ell_2$ meeting at a point p have 
this point as their intersection-theoretic product: 


$$[\ell_1] \cdot [\ell_2] = [p]$$ [1] 


Intersection theory must also provide a self-intersection 
$[\ell_1] \cdot [\ell_1]$. Because $\ell_1$ and $\ell_2$ are rationally equivalent, 
this must also be the class of a point, but symmetry 
precludes the choice of a distinguished point on $\ell_1$. 
Instead, $[\ell_1] \cdot [\ell_1]$ is declared to be the rational 
equivalence class of a point on $\ell_1$, an element of 
$CH_0(\ell_1)$ rather than a specific cycle. This example 
illustrates that intersections cannot generally be defined 
on the level of cycles. 
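
The bookkeeping in this example can be made concrete with a few lines of code. The Python sketch below is an illustration only: the class name and its representation of a class in the projective plane as three integers are assumptions made here, and the product rule it implements uses the fundamental class as the unit together with the relation that the product of two line classes is the class of a point, as in eqn [1]. Since a curve of degree d is rationally equivalent to d lines, the product of two curve classes reproduces the count of intersection points.

```python
class ChowClassP2:
    """Class plane*[CP^2] + line*[line] + point*[point] in the Chow groups of the plane."""

    def __init__(self, plane=0, line=0, point=0):
        self.plane = plane
        self.line = line
        self.point = point

    def __mul__(self, other):
        # [CP^2] acts as the unit and [line].[line] = [point];
        # all other products vanish for dimension reasons.
        return ChowClassP2(
            plane=self.plane * other.plane,
            line=self.plane * other.line + self.line * other.plane,
            point=(self.plane * other.point + self.point * other.plane
                   + self.line * other.line),
        )


def curve(degree):
    """A plane curve of the given degree is rationally equivalent to `degree` lines."""
    return ChowClassP2(line=degree)


conic_times_cubic = curve(2) * curve(3)
print(conic_times_cubic.point)  # degree of the resulting class of points
```

The result is an element of the group generated by the class of a point, not a specific set of points, in keeping with the remark that intersections are only defined up to rational equivalence.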


Algebraic Intersection Products 
Refined Intersections 


For a general nonsingular variety X, say of dimen- 
sion m, if U and V are subvarieties of X of respective 
dimensions c and d, then there is a refined 
intersection product 


$$[U] \cdot [V] \in CH_{c+d-m}(U \cap V)$$ [2] 


The traditional definition of the intersection 
product is based on two ideas. First, given two 
cycles that intersect properly, which by definition 
means that no component of their intersection has 
codimension less than the sum of the codimensions 
of the given cycles, the intersection product should 
be a formal sum of these components, each with a 
multiplicity that correctly reflects the geometry of 
the intersection. Second, given two arbitrary cycles, 
it should be possible to replace one of them by a 
rationally equivalent cycle which intersects the other 
properly. 

While these ideas are simple, it took several 
decades for them to be carried out successfully. 
The case of curves on a surface meeting at a point 
was understood in the nineteenth century. General- 
izing the classically understood canonical divisor 
class on a variety, work in the 1930s by Severi, 
Todd, and others showed that there are groups of 
equivalence classes of cycles in which canonical 
invariants of higher degrees can be defined (in 
modern language, higher Chern classes of the 
tangent bundle). Weil’s foundations for algebraic 
geometry of the 1940s included a study of intersec- 
tions of cycles. It was not until the 1950s that the 
notion of Chow groups was formalized and inter- 
section theory was properly developed in this 
context. Chevalley, Chow, Samuel, Severi, and 
others contributed essential components of the 
theory. In an interesting parallel development, an 
intersection theory based on intersection multipli- 
cities in algebraic topology was put forth by 
Alexander and Lefschetz in the 1920s, a decade 
before the introduction of the cup product in 
cohomology. 


Deformation to the Normal Cone 


In the 1970s, Fulton and MacPherson established a 
construction of the intersection product in algebraic 
intersection theory that does not require moving 
cycles into general position. To accomplish this, they 
used an elegant geometric construction known as 
deformation to the normal cone. 

Let $i: X \to Y$ be an embedding of codimension d 
of nonsingular varieties. Let V be a subvariety of Y 
of dimension k whose intersection with X is of 
interest. We may view X as the zero set of a section s 
of some algebraic vector bundle E on Y. By 


$$(y, \lambda) \mapsto (\lambda^{-1} s(y), \lambda)$$ 


we have a map of the product of Y with the 
punctured affine line, $Y \times (\mathbb{A}^1 \setminus \{0\})$, into $E \times \mathbb{A}^1$. 
We denote the closure of the image by $M^\circ_X Y$. An 
alternative, more intrinsic description is in terms of 
the blowup construction of algebraic geometry: 


$$M^\circ_X Y = Bl_{X \times \{0\}} (Y \times \mathbb{A}^1)$$ 


Geometrically, $M^\circ_X Y$ has a copy of Y over each $\lambda \ne 0$ 
and a copy of the normal bundle $N_X Y$ over $\lambda = 0$. This 
is the key construction that Fulton and MacPherson 
make use of. The same construction applied to V, that 
is, the closure of $V \times (\mathbb{A}^1 \setminus \{0\})$ in $M^\circ_X Y$, has over 0 a 
sort of singular normal bundle known as the normal 
cone 


$$C_{X \cap V} V \subset N_X Y |_{X \cap V}$$ 


One of the properties of Chow groups is that they 
are unchanged upon pullback to the total space of a 
vector bundle (apart from the obvious dimension 
shift). The refined intersection of V with X, denoted 
$i^![V]$, is defined to be the unique element of 
$CH_{k-d}(X \cap V)$ whose pullback to $N_X Y$ is equal to 
$[C_{X \cap V} V]$. 


This single construction encompasses and inter- 
polates between two extreme cases of intersections: 


$$i^![V] = [X \cap V]$$ when X and V meet transversely [3] 

$$i^![V] = c_d(N_X Y) \cap [V]$$ when $V \subset X$ [4] 


Equation [3] makes reference to transverse inter- 
section, a notion that is stronger than proper 
intersection. In situations when it applies, for 
example, in eqn [1], it signifies that intersection 
operations behave as one might expect. Equation 
[4] includes the self-intersection formula which says 
that $[X] \cdot [X]$ is equal to the top Chern class of 
$N_X Y$. 

With this construction, which is well documented 
in Fulton (1998), the general refined intersection in 
eqn [2] is obtained by reduction to the diagonal. Let 
$\Delta_X$ denote the diagonal inclusion $X \to X \times X$ of the 
nonsingular variety X. For subvarieties U and V of 
X, we define 


$$[U] \cdot [V] = \Delta_X^! [U \times V]$$ [5] 


Equation [5] makes the Chow groups of X into a 
ring, the Chow ring CH*(X), which is graded by 
codimension by setting 


$$CH^p(X) = CH_{m-p}(X)$$ 


Links with Topology 
Cycle Map to Homology 


For algebraic varieties over the complex numbers, 
there is a cycle map which links the Chow groups 
with a topological homology group. If X is an 
algebraic variety over C, then let $H_*(X)$ denote the 
Borel-Moore homology of X, that is, the homology 
of locally finite singular chains on X (viewed as a 
topological space with the classical topology). If X is 
embedded as a closed subset of an oriented 
differentiable manifold M, then there are 
identifications 

$$H_i(X) = H^{n-i}(M, M \setminus X)$$ [6] 


where n is the dimension of M. There is a cycle class 
map 


$$CH_k(X) \to H_{2k}(X)$$ 


which sends the class of each irreducible subvariety 
Z of dimension k in X to its fundamental class 
$[Z] \in H_{2k}(X)$. 

Let M be an oriented differentiable manifold of 
dimension n and let X and Y be closed subsets of M. 
Then the cup product $H^i(M, M \setminus X) \otimes H^j(M, M \setminus Y)$ 
$\to H^{i+j}(M, M \setminus (X \cap Y))$ induces, via eqn [6], an 
intersection product 


$$H_i(X) \otimes H_j(Y) \to H_{i+j-n}(X \cap Y)$$ 


which is the topological analog of the refined 
intersection product of eqn [5]. The products are 
compatible via the cycle class map. The topology of 
complex algebraic varieties and the compatibilities 
between algebraic and topological intersections are 
discussed in Fulton (1998). An interesting applica- 
tion of this interplay of intersection theories is the 
convolution product in Borel-Moore homology, 
which is important in geometric representation 
theory (see Chriss and Ginzburg (1997)). 


Riemann-Roch Theorems 


The classical Riemann-Roch theorem relates the 
dimensions of linear systems on an algebraic curve 
(algebraic quantities) with their degrees and 
the curve’s genus (topological quantities). The 
Hirzebruch-Riemann-Roch theorem states that on 
a nonsingular projective variety X, if E is an 
algebraic vector bundle on X and $\chi(E)$ denotes its 
Euler characteristic (the alternating sum of the ranks 
of the sheaf-theoretic cohomology groups), then 


$$\chi(E) = \int_X ch(E) \cdot td(T_X)$$ [7] 


where $\int_X$ denotes the degree of the zero-dimensional 
component of the quantity that follows, and the Chern 
character ch(E) and Todd class td(Tx) are certain 
standard universal polynomials of Chern classes. 
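
For a line bundle on a curve, eqn [7] reduces to the classical statement, and the computation is short enough to spell out. The Python sketch below is an illustration under standard assumptions that are not stated in the text: on a curve of genus g the Chern character of a line bundle L is $1 + c_1(L)$, the Todd class of the tangent bundle is $1 + c_1(T_X)/2$, and the degree of $c_1(T_X)$ is 2 - 2g. The function name and the pair representation of classes are arbitrary choices.

```python
from fractions import Fraction


def euler_characteristic_on_curve(degree, genus):
    """chi(L) from eqn [7] for a line bundle L of given degree on a curve of given genus.

    Classes are stored as (codimension-0 part, degree of the codimension-1 part);
    products of two codimension-1 parts vanish on a curve.
    """
    ch = (Fraction(1), Fraction(degree))            # ch(L) = 1 + c_1(L)
    td = (Fraction(1), Fraction(2 - 2 * genus, 2))  # td(T_X) = 1 + c_1(T_X)/2
    # Zero-dimensional component of the product ch(L) * td(T_X).
    return ch[0] * td[1] + ch[1] * td[0]


for d, g in ((0, 0), (3, 2), (5, 1)):
    print(d, g, euler_characteristic_on_curve(d, g))
```

The result is the degree plus 1 minus the genus, which is the form in which the Riemann-Roch theorem for curves is usually quoted.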

Grothendieck had the inspired idea that eqn [7] 
could be generalized to a covariance property for the 
Chern character times the Todd class. If X and Y are 
nonsingular varieties and $f: X \to Y$ is a projective 
morphism (or, more generally, a proper morphism), 
then there is a well-defined push-forward $f_*$ on 
Chow groups. There is also a kind of push-forward 
for vector bundles. The Grothendieck group of 
vector bundles on X, denoted $K^0(X)$, is the group 
of formal linear combinations of vector bundles, 
modulo the relations $[E] = [E'] + [E'']$ whenever $E'$ is 
a sub-bundle of E with quotient bundle $E''$. Every 
coherent sheaf F has a well-defined class in $K^0(X)$, 
namely, the alternating sum of $[E_i]$ where $E_\bullet$ is any 
finite resolution of F by vector bundles (locally free 
sheaves). The push-forward $f_*[E]$ is defined as the 
alternating sum of the classes in $K^0(Y)$ of the higher 
direct images $R^i f_* E$. The Grothendieck-Riemann-Roch 
theorem states that 


$$ch(f_*[E]) \cdot td(T_Y) = f_*(ch(E) \cdot td(T_X))$$ [8] 

in $CH_*(Y) \otimes \mathbb{Q}$. Notice that eqn [7] represents the 
case that Y is a point. 

There is an even more general formulation valid 
for singular varieties. It is necessary to work with a 
homology version of the Grothendieck group, 
namely, the Grothendieck group $K_0(X)$ of coherent 
sheaves on X. The Baum-Fulton-MacPherson version 
of the Grothendieck-Riemann-Roch theorem 
prescribes transformations 


$$\tau_X : K_0(X) \to CH_*(X) \otimes \mathbb{Q}$$ [9] 


which are covariant for proper morphisms. When 
X is nonsingular, $\tau_X$ is given by the “Chern 
character” times the “Todd class”, and covariance 
becomes eqn [8]. 

In the case of varieties over the complex numbers, 
there is also a transformation from the algebraic 
Grothendieck group $K_0(X)$ to a topological analog, 
satisfying various compatibilities. The composition 
with the homology Chern character gives Riemann-Roch 
transformations $K_0(X) \to H_*(X; \mathbb{Q})$ satisfying 
properties akin to those of eqn [9]. 


The Analytic Setting 


The Atiyah-Singer index theorem stands as 
an important generalization of the Hirzebruch- 
Riemann-Roch theorem. The index of an elliptic 
differential operator on a differentiable manifold 
plays the role of the Euler characteristic, and is 
equated with a topological quantity. One of the 
consequences of the index theorem is the validity of 
eqn [7] for general compact complex manifolds. 

More in the domain of pure analysis is the 
question of intersecting two currents on a differenti- 
able manifold. Currents arise naturally out of 
Chern-Weil theory. To each current is associated a 
wave front, a subset of the cotangent bundle that 
reflects the geometry of the singular set of the 
current. A current can be pulled back to an 
embedded submanifold whenever the embedding is 
transverse to the wave front. By reduction to the 
diagonal, this gives an intersection of two currents 
with transverse wave fronts which reduces to the 
usual wedge product in the case of smooth differential 
forms (see Hörmander (1990)). 


Applications of Intersection Theory 
Enumerative Geometry 


Intersection theory has proved to be a useful tool in 
diverse areas such as enumerative geometry, singular- 
ity theory, and moduli problems. Enumerative pro- 
blems have intrigued generations of geometers. 
Chasles, Maillard, Schubert, and Zeuthen are among 
the geometers of the second half of the nineteenth 


century who solved an impressive array of problems, 
including, as a notable example, Steiner’s five conics 
problem to determine the number of plane conics 
tangent to five given conics in general position. 

In modern terms, the successful solution to an 
enumerative problem involves setting up a space which 
parametrizes the geometric objects being counted, 
suitably compactified, and carrying out an intersec- 
tion-theoretic computation on this space. Steiner’s 
problem illustrates how “excess intersection” can 
occur and cause difficulty. Inside the $CP^5$ of plane 
conics, including degenerate conics, those tangent to a 
given conic constitute a sextic hypersurface. So 
$6^5 = 7776$ would appear plausible; this was, in fact, 
the originally proposed solution. However, the most 
degenerate conics, the double lines, all appear as limits 
of families of conics tangent to any given conic. The 
refined intersection of five of these sextics has a cycle 
class of degree 4512 supported on the Veronese 
surface of double lines. This leaves 3264, the correct 
answer given by Chasles in 1864. The issue of 
providing rigorous foundations for these kinds of 
calculations was recognized by Hilbert, who set it as 
the 15th of his 23 major mathematical problems 
outlined in 1900. A good survey of early and modern 
efforts in enumerative geometry can be found in 
Kleiman and Thorup (1987). 
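
The bookkeeping in Steiner's problem can be checked in a few lines. The sketch below only reproduces the arithmetic quoted above (the degree 6 of each tangency hypersurface and the excess contribution 4512); the variable names are illustrative, and the excess class itself is taken from the text rather than computed.

```python
# Arithmetic behind Steiner's five conics problem, using the degrees quoted in the text.
naive_count = 6 ** 5          # five sextic hypersurfaces in CP^5 (Bezout-style count)
excess_contribution = 4512    # degree of the cycle class supported on the Veronese surface
chasles_count = naive_count - excess_contribution

print(naive_count, chasles_count)
```

The difference is the number of conics that are honestly tangent to the five given conics, once the spurious contribution of the double lines is removed.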


Singularity Theory and Degeneracy Loci 


In any situation where a geometric object is 
described by parameters, there will be values of the 
parameter at which the geometry changes qualitatively. 
The significance of this is evident in the space 
of conics above. Singularity theory is concerned with 
the loci in parameter spaces on which these 
transitions can occur. Let $\pi: Y \to P$ be a map of 
differential manifolds, or of nonsingular algebraic 
varieties, which is generally (but not everywhere) 
submersive, so that there are singular fibers. Let d 
denote the dimension of P, which can be considered 
as a parameter space, and let c be the dimension of 
Y. Consider the loci 

$$S_k(\pi) = \{\, y \in Y \mid \operatorname{rk}(T_y Y \to T_{\pi(y)} P) \le d - k \,\}$$

of singularity theory. Thom made an influential 
study of these in the 1950s, and Porteous in 1971 
gave the following formula, now called the 
Thom-Porteous formula: 

$$[S_k(\pi)] = s_{(k+c-d)^k}(\pi^* TP - TY)$$ [10] 

The symbol on the right is shorthand for 
$s_{k+c-d,\ldots,k+c-d}$, the case $a_1 = \cdots = a_k = k + c - d$ of 
the Schur determinant $s_{(a_1,\ldots,a_k)} = \det(s_{a_i+j-i})_{1 \le i,j \le k}$, 
and for vector bundles E and F the $s_i(F - E)$ are 
defined by the formula 
$s(F - E) = \sum_i (-1)^i c_i(E) \,/\, \sum_i (-1)^i c_i(F)$. 
In algebraic intersection theory, eqn 
[10] has the precise meaning that when $S_k(\pi)$ has the 
expected codimension k(k + c − d) in Y (or is empty), 
its cycle class is equal to the given polynomial in 
Chern classes. The Thom-Porteous formula applies 
to the degeneracy loci of arbitrary maps of vector 
bundles E → F. Degeneracy loci constitute an active 
area of research in intersection theory, and there are 
generalizations, for example, to cases where there 
are more bundles or bundle maps with symmetry 
(see Fulton and Pragacz (1998)). 


Moduli Spaces 


The parameter spaces that have appeared often admit 
interpretations as moduli spaces. Moduli problems 
start with geometric objects to be classified, and ask for 
families of these objects over an arbitrary base space to 
be represented as faithfully as possible by maps from 
the base space to some space called a moduli space. For 
enumerative applications it is most useful for the 
moduli space to be compact. One of the principal 
examples is the moduli of algebraic curves of given 
genus g: for g ≥ 2, the moduli space of smooth curves 
$M_g$ has a compactification $\overline{M}_g$ by stable curves, as 
defined and studied by Deligne and Mumford. While 
the $\overline{M}_g$ are singular, the singularities are mild enough 
to permit the definition of an intersection theory for 
$M_g$ and $\overline{M}_g$, as was done by Mumford in the 1980s. 
More generally, if X is a complex projective variety, 
Kontsevich's spaces of stable maps $\overline{M}_{g,n}(X, \beta)$ 
compactify the moduli of genus g curves with n marked 
points together with algebraic maps to X having image 
in homology class $\beta \in H_2(X)$. These spaces, and some 
high-powered intersection theory that takes place on 
them, are vitally important in Gromov-Witten theory. 
K-theory also provides an alternative approach to 
intersection products in algebraic geometry. 


Extensions and Related Theories 
Motives and Higher Chow Groups 


Intersection theory has evolved into a mature theory 
with numerous extensions and offshoots. Many of 
these are a result of endeavors to forge links with 
other branches of mathematics. One of the extensions, 
higher Chow groups, has its roots in a basic 
property of intersection theory, the excision property, 
which states that if X is a variety and $U \subset X$ an 
open subvariety, with $Z = X \setminus U$, then the inclusion 
and restriction maps fit into a right exact sequence 

$$CH_* Z \to CH_* X \to CH_* U \to 0$$

This is reminiscent of the long exact homology 
sequence of a pair in algebraic topology. Indeed, 
there is a corresponding long exact sequence of 
Borel-Moore homology groups, but the elementary 
algebraic theory lacks such a long exact sequence. 
Bloch introduced higher Chow groups in the 1980s 
to fill this gap. The theory, which is quite 
complicated, provides groups $CH_*(X, i)$, with 
$CH_*(X, 0) = CH_* X$, such that there is a long exact 
sequence 

$$\cdots \to CH_*(Z,1) \to CH_*(X,1) \to CH_*(U,1) \to CH_* Z \to CH_* X \to CH_* U \to 0$$

These groups are closely connected to algebraic K- 
theory and also to a related theory called motivic 
cohomology. 

Motives, a sort of universal cohomology theory 
envisaged by Grothendieck, conjecturally form a 
category which can be extended to a bigger category 
of mixed motives that reflects mixed structures in 
cohomology, such as mixed Hodge structures. 
Recently, Voevodsky et al. (2000) have introduced 
motivic cohomology groups which form an integral 
part of a homotopy theory for algebraic varieties. 
Voevodsky’s work, including a proof of the Milnor 
conjecture of K-theory, earned him a Fields Medal 
in 2002. 


Arithmetic Intersection Theory 


There is an arithmetic version of intersection theory 
which applies to an arithmetic scheme X, which is, 
informally, a scheme defined over every prime field 
(all finite fields $\mathbb{F}_p$ and also $\mathbb{Q}$) in a consistent way. 
This means that X can be base-extended to any 
field. In situations where the complex variety $X(\mathbb{C})$ 
is nonsingular, there is an arithmetic Chow ring 
$\widehat{CH}^*(X)$, introduced by Gillet and Soulé in 1990. 
Elements of $\widehat{CH}^*(X)$ are equivalence classes of pairs 
(Z, g) where Z is an algebraic cycle on X and g is 
known as a Green current for Z, a current on $X(\mathbb{C})$ 
satisfying the relation 

$$dd^c g + \delta_{Z(\mathbb{C})} = \omega$$ [11] 

for some smooth differential form ω satisfying some 
conditions. Here, $\delta_{Z(\mathbb{C})}$ denotes the current of 
integration along $Z(\mathbb{C})$. The point to notice is that 
eqn [11] relates analysis (the Green current) and 
algebra (the cycle) on X on one side with topology 
on the other, as ω will be a closed form whose class 
in de Rham cohomology is Poincaré dual to $[Z(\mathbb{C})]$. 
Arithmetic intersection theory is used to define 
arithmetic height functions. Height functions have 
important applications to Diophantine problems, and 
were an essential component of the proof by Faltings of 
the Mordell conjecture, which earned him a Fields 
Medal in 1986. Arithmetic intersection theory grew 
out of an earlier theory of Arakelov, in which $X(\mathbb{C})$ is 
endowed with a Kähler metric, and the form ω in eqn 
[11] is required to be harmonic. The Arakelov Chow 
group is only a ring when harmonic forms are closed 
under wedge product, which is not the case generally 
but which is true in some interesting cases, for example, 
for Grassmannian varieties. Arakelov treated the case 
of arithmetic surfaces, that is, the case when $X(\mathbb{C})$ is an 
algebraic curve ("surface" refers to a second dimension 
in the arithmetic direction), and introduced a pairing of 
arithmetic divisors, in analogy with the usual pairing of 
divisors on an algebraic surface. Arakelov's work, its 
subsequent generalizations, and more recent developments 
are covered in Faltings (1992). 


Equivariant Theories and Stacks 


Moduli problems such as those mentioned previously 
are often best represented not by traditional varieties, 
but by a more sophisticated sort of object called a 
stack. Taking inspiration from Mumford's intersection 
theory on $\overline{M}_g$, intersection theory on algebraic 
stacks has grown into a mature theory in its own 
right. Examples of stacks include orbifolds, for which 
there is the Chen—Ruan (orbifold) cohomology theory 
as well as an algebraic analog due to Abramovich, 
Graber, and Vistoli (see Abramovich, et al. (2002)). 
Another class of examples are quotient stacks of a 
variety by the action of an algebraic group. In these 
cases the Chow groups of the stack are equivariant 
Chow groups, part of a rich theory modeled on 
equivariant cohomology in algebraic topology. 
Behrend (2002) provides a nice survey of stacks, 
equivariant intersection theory, and their uses in 
Gromov-Witten theory. The Bott residue formula is 
an important tool in equivariant intersection theory 


which is particularly well suited to making concrete 
calculations, for example, in enumerative geometry. 
A description with nice examples can be found in 
Ellingsrud and Strømme (1996). 


See also: Cohomology Theories; Hamiltonian Group 
Actions; Index Theorems; K-Theory; Moduli Spaces: 
An Introduction. 
Formulation of the Problem 


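Consider the Newton equation 

$$\ddot{x} = F(x), \quad F(x) = -\nabla v(x), \quad x \in \mathbb{R}^d$$ [1] 

where 

$$v \in C^2(\mathbb{R}^d, \mathbb{R}), \quad |\partial_x^j v(x)| \le c_{|j|} (1 + |x|)^{-(\alpha + |j|)}$$ [2] 

for $x \in \mathbb{R}^d$, $|j| \le 2$, and some $\alpha > 1$, $c_{|j|} > 0$ 
(where j is the multi-index $j \in (\mathbb{N} \cup \{0\})^d$, 
$|j| = \sum_{n=1}^{d} j_n$). In classical mechanics, eqn [1] describes 
the dynamics of a particle with the mass m = 1 in the 
force field F with the potential v. For eqn [1] the 
energy $E = (1/2)|\dot{x}(t)|^2 + v(x(t))$ is an integral of 
motion. 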

Under the assumptions [2], it follows that (Reed 
and Simon 1979): for any $(p_-, x_-) \in \mathbb{R}^{2d}$, $p_- \ne 0$, 
eqn [1] has a unique solution $x \in C^2(\mathbb{R}, \mathbb{R}^d)$ such 
that 

$$x(t) = p_- t + x_- + y_-(t), \quad \dot{y}_-(t) \to 0, \ y_-(t) \to 0 \quad \text{as } t \to -\infty$$ [3] 

in addition, for almost any $(p_-, x_-)$ 

$$x(t) = a(p_-, x_-)\,t + b(p_-, x_-) + y_+(t), \quad a(p_-, x_-) \ne 0, \ \dot{y}_+(t) \to 0, \ y_+(t) \to 0 \quad \text{as } t \to +\infty$$ [4] 

furthermore, the set D of all $(p_-, x_-) \in \mathbb{R}^{2d}$, $p_- \ne 0$, 
for which [4] holds for fixed v, is an open subset of 
$\mathbb{R}^{2d}$ and $\mathrm{Mes}(\mathbb{R}^{2d} \setminus D) = 0$. 

We say that a, b arising in [4] (and defined on D) are 
the scattering data for eqn [1]. In addition, the scattering 
data a, b at fixed energy E > 0 means a, b on 
$\{(p_-, x_-) \in D \mid |p_-|^2/2 = E\}$. Roughly speaking, for a particle moving 
according to [1], the functions a, b relate the free motion 
at time $t \to -\infty$ with the free motion at time $t \to +\infty$. 

Note that 


$$a(p_-, x_- + t_0 p_-) = a(p_-, x_-)$$
$$b(p_-, x_- + t_0 p_-) = b(p_-, x_-) + t_0\, a(p_-, x_-)$$ [5] 
$$(p_-, x_-) \in D, \quad t_0 \in \mathbb{R}$$


Formulas [5] imply that a, b on D are uniquely 
determined by a, b on $\{(p_-, x_-) \in D \mid p_- x_- = 0\}$, 
where $p_- x_-$ is the scalar product of $p_-$ and $x_-$. 

If $v(x) \equiv 0$, then $a(p_-, x_-) = p_-$, $b(p_-, x_-) = x_-$, 
$(p_-, x_-) \in \mathbb{R}^{2d}$, $p_- \ne 0$. Therefore, it is convenient 
to use for a, b the following representation: 

$$a(p_-, x_-) = p_- + a_{sc}(p_-, x_-)$$
$$b(p_-, x_-) = x_- + b_{sc}(p_-, x_-), \quad (p_-, x_-) \in D$$ [6] 

where the subscript sc is an abbreviation of the word 
"scattering." 

The direct scattering problem for eqn [1], under 
the assumptions [2], consists in the following: given 
v, find a, b. 

The inverse-scattering problem for eqn [1], under 
the assumptions [2], consists in the following: given 
a,b (or some partial information about a, b), find v. 

In the present article, we discuss, mainly, the 
aforementioned inverse-scattering problem. 
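
The direct problem formulated above can be explored numerically. The following is a minimal sketch, not taken from the article: it assumes d = 2 and the illustrative potential v(x) = (1 + |x|^2)^(-1) (which satisfies [2] with α = 2), starts the particle on its incoming free line at a large negative time, integrates eqn [1] with a classical Runge-Kutta step, and reads off approximations of a and b. The function names `force` and `scatter` and all numerical parameters are assumptions made for this example.

```python
import math

def force(x, y):
    """F = -grad v for the illustrative potential v(x) = 1 / (1 + |x|^2)."""
    r2 = x * x + y * y
    k = 2.0 / (1.0 + r2) ** 2
    return k * x, k * y

def scatter(p, x0, t_max=50.0, dt=0.01):
    """Integrate x'' = F(x) from t = -t_max to t = +t_max (RK4).

    The state starts on the free line x(t) = p t + x0 with velocity p,
    which approximates the condition [3]. Returns estimates of (a, b) in [4].
    """
    def deriv(s):
        fx, fy = force(s[0], s[1])
        return (s[2], s[3], fx, fy)

    t0 = -t_max
    s = (p[0] * t0 + x0[0], p[1] * t0 + x0[1], p[0], p[1])
    n = int(round(2.0 * t_max / dt))
    for _ in range(n):
        k1 = deriv(s)
        k2 = deriv(tuple(si + 0.5 * dt * ki for si, ki in zip(s, k1)))
        k3 = deriv(tuple(si + 0.5 * dt * ki for si, ki in zip(s, k2)))
        k4 = deriv(tuple(si + dt * ki for si, ki in zip(s, k3)))
        s = tuple(si + dt * (q1 + 2 * q2 + 2 * q3 + q4) / 6.0
                  for si, q1, q2, q3, q4 in zip(s, k1, k2, k3, k4))
    t_end = t0 + n * dt
    a = (s[2], s[3])                                    # outgoing velocity
    b = (s[0] - a[0] * t_end, s[1] - a[1] * t_end)      # intercept; converges slowly in t_max
    return a, b

a, b = scatter((2.0, 0.0), (0.0, 1.0))
print("a =", a, " b =", b)
print("|a| =", math.hypot(a[0], a[1]))   # energy conservation forces |a| = |p_-|
```

Because the energy is an integral of motion and v vanishes at infinity, |a| must equal |p_-|; the printed value gives a quick sanity check on the integration. The estimate of b is cruder, since y_+(t) decays only slowly for a potential of this type.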


Abel’s Result of 1826 


Consider the Newton equation [1] in dimension 
d = 1 for $x \in\, ]-\infty, x_1]$, $x_1 > 0$, where 

$$v \in C^2(]-\infty, x_1], \mathbb{R}), \quad v(x) = 0 \ \text{for } x \le 0, \quad \frac{dv(x)}{dx} > 0 \ \text{for } 0 < x \le x_1$$ [7] 

Under the assumptions [7], for any $p_- > 0$, where 
$E = p_-^2/2 < v(x_1)$, eqn [1] has a unique solution 
$x \in C^2(\mathbb{R}, ]-\infty, x_1])$ such that 

$$x(t) = p_- t \quad \text{for } t \le 0$$ [8] 

in addition, 

$$x(t) = -p_- t + b(p_-) \quad \text{as } t \to +\infty$$ [9] 

where 

$$b(p_-) = p_- T(E), \quad 0 < E < v(x_1), \quad p_- = \sqrt{2E} > 0$$ [10] 

(T(E) is the time during which a particle starting 
at x = 0 with the impulse $p_- = \sqrt{2E}$ returns to 
x = 0). 

Let x(v), $v \in [0, v(x_1)]$, be the inverse function to 
v(x), $x \in [0, x_1]$. Then (under the assumptions [7]), 

$$T(E) = \sqrt{2} \int_0^E (E - v)^{-1/2}\, \frac{dx(v)}{dv}\, dv, \quad 0 < E < v(x_1)$$ [11] 

$$x(v) = \frac{1}{\sqrt{2}\,\pi} \int_0^v (v - E)^{-1/2}\, T(E)\, dE, \quad 0 < v < v(x_1)$$ [12] 


Actually, the formulas [11], [12] relating the travel 
time T and the potential v are the results from 
Abel (1826) (see also Keller (1976) for a discussion 
of this result). Formula [11] is a result on 
direct scattering, whereas [12] is a result on 
inverse scattering. In addition, if T(E), $0 < E < v(x_1)$, 
is given, then [11] is the Abel integral 
equation for x(v), $0 < v < v(x_1)$, and [12] solves 
this equation. 
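
Formula [12] can be tried out numerically on potentials whose travel time is known in closed form. The sketch below is only an illustration under stated assumptions: the helper name `abel_inverse` and the two test potentials are not from the article, and both test cases are idealised (they are not C^2 at x = 0) and are used only because T(E) is elementary. For v(x) = x on x > 0 one has T(E) = 2 sqrt(2E); for v(x) = x^2/2 on x > 0 the particle spends half an oscillator period in the well, so T(E) = π.

```python
import math

def abel_inverse(T, v, n=2000):
    """Evaluate x(v) = (1 / (sqrt(2) * pi)) * int_0^v T(E) (v - E)^(-1/2) dE.

    The substitution E = v * sin(theta)^2 removes the endpoint singularity:
    dE / sqrt(v - E) = 2 * sqrt(v) * sin(theta) dtheta, theta in [0, pi/2].
    The remaining smooth integral is approximated by the midpoint rule.
    """
    h = (math.pi / 2.0) / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        energy = v * math.sin(theta) ** 2
        total += T(energy) * 2.0 * math.sqrt(v) * math.sin(theta)
    return total * h / (math.sqrt(2.0) * math.pi)

# Linear wall v(x) = x: travel time T(E) = 2 * sqrt(2 E), expected x(v) = v.
x_linear = abel_inverse(lambda E: 2.0 * math.sqrt(2.0 * E), 0.8)

# Half oscillator v(x) = x^2 / 2: travel time T(E) = pi, expected x(v) = sqrt(2 v).
x_quadratic = abel_inverse(lambda E: math.pi, 0.5)

print(x_linear, x_quadratic)
```

In both cases the recovered x(v) should agree with the inverse function of the chosen potential, which is exactly the statement that [12] solves the Abel equation [11].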

Concerning further results on inverse scattering 
for the one-dimensional Newton equation, see Keller 
(1976) and Astaburuaga et al. (1991). Note that for 
the one-dimensional case the scattering data a, b do 
not in general determine v uniquely. 

The Abel integral equation and the Abel 
formula solving this equation were used also, in 
particular, by Firsov (1953) and Keller et al. 
(1956), where inverse scattering was considered 
for the three-dimensional Newton equation at 
fixed energy for the case of a spherically symmetric 
potential that is monotonically decreasing in |x|. 

Note also that the Abel method for solving the 
integral equation [11] was used by Radon (1917) 
for finding the inversion formula for the Radon 
transformation. In the next section, we reduce the 
inverse-scattering problem for the Newton equation 
[1] in dimension d ≥ 2, under the assumptions 
[2], to the inversion problem for the X-ray 
transformation (i.e. the Radon transformation 
along straight lines). 
Inverse Scattering for the 
Multidimensional Newton Equation 


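Consider 

$$TS^{d-1} = \{(\theta, x) \in S^{d-1} \times \mathbb{R}^d \mid \theta x = 0\}$$ [13] 

Consider the X-ray transformation P defined by the 
formula 

$$Pf(\theta, x) = \int_{-\infty}^{+\infty} f(t\theta + x)\, dt, \quad (\theta, x) \in TS^{d-1}$$ [14] 

where 

$$f \in C(\mathbb{R}^d, \mathbb{R}^m), \quad f(x) = O(|x|^{-\beta}) \ \text{as } |x| \to \infty \ \text{for some } \beta > 1$$ [15] 

Consider the functions $a_{sc}$, $b_{sc}$ of [6]. 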
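
As a concrete illustration of the definition [14], the sketch below approximates the X-ray transformation of a Gaussian in dimension d = 2 and compares it with the closed-form value. The function name `xray_transform`, the test function, and the truncation of the integral to a finite interval are assumptions of this example; for x orthogonal to θ one has |tθ + x|^2 = t^2 + |x|^2, so Pf(θ, x) = sqrt(π) exp(-|x|^2) for f(x) = exp(-|x|^2).

```python
import math

def xray_transform(f, theta, x, t_max=10.0, n=4000):
    """Approximate Pf(theta, x) = int f(t * theta + x) dt over the line
    through x with direction theta, using the midpoint rule on [-t_max, t_max]."""
    h = 2.0 * t_max / n
    total = 0.0
    for i in range(n):
        t = -t_max + (i + 0.5) * h
        total += f(t * theta[0] + x[0], t * theta[1] + x[1])
    return total * h

def gauss(x, y):
    return math.exp(-(x * x + y * y))

angle = 0.3
theta = (math.cos(angle), math.sin(angle))
x = (-0.7 * math.sin(angle), 0.7 * math.cos(angle))   # orthogonal to theta, |x| = 0.7

value = xray_transform(gauss, theta, x)
exact = math.sqrt(math.pi) * math.exp(-0.49)
print(value, exact)
```

The pair (θ, x) with θx = 0 labels an oriented line, which is why the transformation is naturally defined on the set [13].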


Theorem 1 (Novikov 1999). For the Newton 
equation [1], under the assumptions [2], the following 
formulas hold: 

$$PF(\theta, x) = \lim_{s \to +\infty} s\, a_{sc}(s\theta, x), \quad (\theta, x) \in TS^{d-1}$$ [16] 

$$Pv(\theta, x) = \lim_{s \to +\infty} s^2\, \theta b_{sc}(s\theta, x), \quad (\theta, x) \in TS^{d-1}$$ [17] 

in addition, the differences 

$$|PF(\theta, x) - s\, a_{sc}(s\theta, x)|$$ [18] 

$$|Pv(\theta, x) - s^2\, \theta b_{sc}(s\theta, x)|$$ [19] 

are bounded by explicit expressions in d, c, α, |x|, and s 
(see Novikov (1999)) for $(\theta, x) \in TS^{d-1}$, $s > z(d, c, \alpha, |x|)$, 
where $\theta b_{sc}$ is the scalar product of θ and $b_{sc}$, z is the root 
in $]\sqrt{2}, +\infty[$ of an equation [20] whose coefficients 
depend on d, c, α, and |x|, and 
$c = \max(c_1, c_2)$ (and α, $c_1$, $c_2$ are the constants of [2]). 


Theorem 1 gives a method for finding PF and Pv 
from $a_{sc}$ and $b_{sc}$ at high energies. It has been proved in 
Novikov (1999) by means of analysis of the following 
nonlinear integral equation for the function $y_-$ of [3]: 

$$y_-(t) = A_{p_-, x_-}(y_-)(t)$$

where 

$$A_{p_-, x_-}(u)(t) = \int_{-\infty}^{t} \int_{-\infty}^{\tau} F(p_- s + x_- + u(s))\, ds\, d\tau, \quad p_- \ne 0$$

In dimension d ≥ 2, Theorem 1 and methods for 
the reconstruction of f from Pf (Gelfand et al. 1980, 
Natterer 1986, Novikov 1999) give a method for the 
reconstruction of F and v from the scattering data a, 
b at high energies. Note that for d = 1 Theorem 1 
is valid but f cannot be uniquely reconstructed 
from Pf. 

Theorem 1 is an analog of the Born formula for 
the Schrödinger equation at high energies (see, e.g., 
Faddeev (1956), Enss and Weder (1995), and 
Novikov (1998) as regards this Born formula and 
its variations). On the other hand, Theorem 1 was 
preceded by a result of Gerver and Nadirashvili 
(1983) on the high-energy asymptotics for the travel 
time between boundary points for the Newton 
equation in a bounded strictly convex domain with 
smooth boundary. There is a considerable similarity 
between this result and Theorem 1. 

We continue our review on inverse scattering for 
the multidimensional Newton equation, and make 
the following well-known observation. 


Observation 1 Suppose that v(x) > E > 0 for $x \in U$, 
where U is a compact subset of $\mathbb{R}^d$. Then the scattering 
data a, b for energies smaller than or equal to E contain 
no information about v(x) for $x \in U$. 


In addition to Theorem 1 and Observation 1, one 
has the following conjecture. 


Conjecture 1 (Novikov 1999). Suppose that v 
satisfies [2], d ≥ 2, and the energy E is sufficiently 
large, E > E(v). Then the scattering data a, b at 
fixed energy E uniquely determine v. 


Gerver and Nadirashvili (1983) proved a result 
similar to Conjecture 1 for the case of the Newton 
equation in a bounded strictly convex domain G 
with smooth boundary. Their proof of this result 
contains no reconstruction method but does contain 
a stability estimate. It is based on the Maupertuis 
principle and the results of Muhometov and Romanov 
(1978), Beylkin (1979), and Bernstein and 
Gerver (1980). For the case $v \in C^2(\mathbb{R}^d, \mathbb{R})$, $\operatorname{supp} v \subset G$ 
(where G has the properties mentioned above), 
in Novikov (1999) a connection between the 
boundary-value data of Gerver and Nadirashvili 
(1983) and the scattering data a, b is given and it is 
shown that for d ≥ 2 the scattering data a, b and the 
domain G uniquely determine v at fixed sufficiently 
large energy E > E(v, G). 

For more information concerning results mentioned 
above, see Novikov (1999) and Gerver and 
Nadirashvili (1983). One can see from the review 
of this section that very few results on inverse 
scattering for the multidimensional Newton equation 
are given in the literature, at present. It should 
be remarked that the inverse-scattering theory in 
multidimensions is much more developed for the 
Schrödinger equation than for the Newton 
equation. 


Inverse Scattering for the Schrödinger 
Equation in Multidimensions 


The inverse-scattering theory for the multidimen- 
sional Schrödinger equation has been developed by 
many authors (see, e.g., surveys given in Grinevich 
(2000) and Novikov (2001)). 

Quantum-mechanical analogs of Theorem 1 
appear, for example, in Faddeev (1956), Enss 
and Weder (1995), Novikov (1998) (see also 
references therein). Similarly, the quantum-mechanical 
analogs of Conjecture 1 have been proved, for 
example, in Novikov (1992, 1994) and Grinevich 
and Novikov (1995) (see also references therein). On 
the other hand, as a rule, classical-mechanical analogs 
of results of the works on inverse Schrödinger 
scattering in multidimensions are unknown. This 
leads to many open problems. For the one-dimensional 
case some results on finding classical limits of 
results on inverse Schrödinger scattering are given in 
Lax and Levermore (1983) and Bogdanov (1985). 
Note that inverse scattering for the two-dimensional 
Schrödinger equation at fixed energy (see Novikov 
(1992), Grinevich and Novikov (1995), and 
Grinevich (2000) and references therein) has 
considerable similarity with inverse scattering for the 
one-dimensional Schrödinger equation. Therefore, 
an interesting open problem consists in extending 
the aforementioned study of Lax and Levermore 
(1983) and Bogdanov (1985) to the case of inverse 
scattering for the two-dimensional Schrödinger 
equation at fixed energy. Perhaps, in this way one 
can find proper two-dimensional analogs of the Abel 
formulas [11] and [12]. 
Introduction 


The equations governing the motion of an ideal 
(inviscid) fluid were derived by Euler in 1755. They 
were, together with the equation of vibrating strings, 
the first partial differential equations introduced in the 
field of mathematical physics. While several partial 
differential equations, coming from the modeling of 
physical phenomena, have had a satisfactory mathe- 
matical solution, it is piquant to note that the old Euler 
equations remain essentially unsolved. Together with 
the Navier-Stokes equations of viscous fluids, the 
Euler equations play a central role in the modern 
analysis of partial differential equations. 

The mathematical difficulties encountered in the 
study of Euler equations seem to be deeply linked with 
the understanding of turbulence, which remains one of 
the great open problems in the field of macroscopic 
physics. 

The relevance of Euler equations as a model of 
fluid flow is rather subtle, and the discussion is far 
from closed. On the one hand, Euler equations have 
disturbing aspects, which, in their most visible form, 
yield paradoxes. On the other hand, the systematic 
recourse to some viscosity seems to put a serious 
obstacle to a proper understanding of turbulence. In 
this article we will try to give some insight into this 
issue. 

To be rigorous, every fluid has some compressibility, 
that is to say the density varies with the 
pressure. Compressibility gives rise to pressure 
waves, which propagate in the fluid with some finite 
speed. When the velocity of the fluid particles is 
slow relative to the speed of the pressure waves, it is 
legitimate to make the approximation that the flow 
is incompressible; it is the case for meteorological 
flows, for example. Then, there are no more 
pressure waves; nevertheless the motion can be 
very unstable and intricate (turbulent). Although 
very often in physical flows these two features 
coexist, following the tradition, we clearly separate 
the compressible and incompressible cases. 


The Equations of the Perfect Fluid 


Until now a rigorous derivation of the fluid 
equations from a system of interacting particles 
governed by Newton’s laws is not known. Thus, 
the mathematical models of fluid motion result 
from heuristic considerations. 

Let us specify some notations. 

The fluid motion is supposed to take place in 
some domain (not necessarily bounded) Ω of the 
physical space $\mathbb{R}^3$. 

We shall use the so-called Eulerian description of 
the fluid motion: ρ(t, x) denotes the local density of 
the fluid at time t and position x, and u(t, x) the 
velocity of the fluid particle located at x at time t. 

The first equation (conservation equation) expresses 
the conservation of mass: 

$$\frac{\partial \rho}{\partial t} + \operatorname{div}(\rho u) = 0$$ [1] 

The second equation (momentum equation) 
expresses Newton's law (in the absence of internal 
friction): 

$$\rho \left( \frac{\partial u}{\partial t} + (u \cdot \nabla) u \right) = -\nabla p$$ [2] 

where the scalar function p(t, x) is the pressure 
inside the fluid, and 

$$(u \cdot \nabla) u = \sum_j u_j \partial_j u$$

With [1] and [2], we have five scalar unknown 
functions $(\rho, u_j, p)$ and only four equations. To get a 
closed set of equations, we need to add a supplementary 
relationship: 

$$\operatorname{div}(u) = 0, \quad \text{for the incompressible flows}$$ [3] 

In the case of compressible flows, eqns [1] and [2] must 
be completed by a thermodynamical description of the 
fluid, which yields a relationship between p, ρ, the 
internal energy, the specific entropy, etc. We will only 
consider here the simple case of an isentropic gas 
which is modeled by the relationship 

$$p = p(\rho)$$ [4] 

with $p(\rho) = c\rho^\gamma$ for a perfect gas (c > 0, γ > 1). 


Condition at the Boundary ∂Ω of the Domain 


In the case of a perfect fluid, we simply have to 
write that the velocities of the fluid particles at the 
boundary are tangent to the boundary, that is, 

$$u \cdot n = 0 \quad \text{on } \partial\Omega$$ [5] 

where n denotes the unit normal vector to the 
boundary (pointing outward). 


The Incompressible Perfect Fluid: Main 
Properties of Smooth Flows 


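We shall suppose ρ = 1. Equations [1]-[3] and [5] 
then yield the classical Euler system: 

$$\frac{\partial u}{\partial t} + (u \cdot \nabla) u = -\nabla p \quad \text{on } \Omega$$
$$\operatorname{div} u = 0, \quad u \cdot n = 0 \quad \text{on } \partial\Omega$$ [6] 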


The Constants of the Motion 


Let us examine the constants of the motion of the 
dynamical system defined by [6], that is, the functionals 
which are conserved by the motion of the fluid. 

First we have the classical constants of motion 
associated with the natural symmetries by Noether’s 
theorem. 

The time translational invariance of the system 
implies that the kinetic energy is conserved: 

$$E = \frac{1}{2} \int_\Omega |u|^2\, dx$$

In the case $\Omega = \mathbb{R}^3$, the homogeneity of space implies 
the conservation of the impulsion: 

$$\int_\Omega u\, dx$$

The space isotropy, on the other hand, yields the 
conservation of the angular momentum: 

$$\int_\Omega x \wedge u\, dx$$
There is a more hidden constant of the motion, 
called helicity, which was discovered in 1961 by 
J J Moreau (1961) (see, e.g., Serre (1979)). 

Let us define the vorticity of the flow: 

$$\omega = \operatorname{curl} u$$

then the helicity is 

$$\int_\Omega \omega \cdot u\, dx$$

Of course, here, we suppose u to be vanishing at 
infinity in such a manner that the above integrals 
make sense. 

One may wonder about the existence of other 
constants of the motion of the form (first-order 
functionals): 

$$\int_\Omega F(u(x), \nabla u(x))\, dx$$

The answer, due to Serre (1979), is that any 
functional of the above form which is conserved by 
the flow is a linear function of the energy, the 
impulsion, the angular momentum, the helicity plus 
a trivial term (i.e., taking the same value for any 
field u such that div u = 0). 


Beltrami Equation and Kelvin’s Theorem 


Another important issue is to know how the vorticity 
field evolves in a regular flow. If we apply the operator 
curl to the equation [6] in order to eliminate the 
pressure term, we get: 

$$\frac{\partial \omega}{\partial t} + (u \cdot \nabla)\omega - (\omega \cdot \nabla) u = 0$$ [7] 

which is the Beltrami equation. 
To exploit the Beltrami equation, we need the 
Lagrangian flow φ(t, x), associated with the field u, 
which is defined by the differential equation: 

$$\frac{\partial \varphi}{\partial t}(t, x) = u(t, \varphi(t, x)), \quad \varphi(0, x) = x$$

Then we can state the following proposition. 


Proposition During the smooth motion of an 
incompressible perfect fluid, we have: 

$$\omega(t, \varphi(t, x)) = D\varphi(t, x)\,[\omega(0, x)], \quad \text{for all } t, x$$

where Dφ(t, x) denotes the derivative at the point x 
(t fixed) of the mapping x → φ(t, x). 


The first consequence of this result is to point 
out the class of irrotational flows, for which 
ω(t, x) = 0. Indeed, if the vorticity vanishes initially, 
it follows from the proposition that it will 
vanish for ever. 

Another consequence is the behavior of vortex 
lines. By definition, a vortex line is any integral 
curve of the vorticity field. More precisely, a 
vortex line at time t, C(s), is defined by the 
differential equation 

$$\frac{dC}{ds}(s) = \omega(t, C(s))$$

Now we can check that vortex lines are merely 
transported by the flow: if C(s) is a vortex line at 
time t = 0, φ(t, C(s)) is a vortex line at time t. 

We end this section with the famous Kelvin’s 
circulation theorem (1869) (see, e.g., Marchioro and 
Pulvirenti (1994)). 


Theorem Let L be a closed (oriented) contour drawn 
inside the fluid. We suppose that L is transported by 
the flow; $\varphi_t(L)$ denotes the contour at time t. Then the 
circulation of the velocity field u(t, x) along $\varphi_t(L)$ is 
independent of t. 


Stationary Solutions: D’Alembert’s Paradox 


Let us focus now on the flow around a bounded 
body Ω, whose complement $\Omega^c$ will be supposed to 
be simply connected. 

A stationary solution u(x), p(x) satisfies: 

$$(u \cdot \nabla) u = -\nabla p$$
$$\operatorname{div} u = 0, \quad u \cdot n = 0 \quad \text{on } \partial\Omega$$

But since $(u \cdot \nabla) u = \nabla(\tfrac{1}{2} u^2) + (\operatorname{curl} u) \wedge u$, any 
stationary field u(x) satisfying curl u = 0, div u = 0, 
u · n = 0 on ∂Ω, defines a stationary solution with 
associated pressure $p = -\tfrac{1}{2} u^2$. 

We also need to specify a condition at infinity 
for the field u. We impose that the velocity is equal 
(at infinity) to some constant value U. Since $\Omega^c$ is 
simply connected, the condition curl u = 0 implies 
that the flow is potential, that is, there is a scalar 
function F(x) such that u = U + ∇F. 

Thus, the determination of an irrotational flow 
around an obstacle amounts to solving the following 
exterior Neumann problem. 


Find F satisfying: 

$$\Delta F = 0 \quad \text{in } \Omega^c$$
$$\frac{\partial F}{\partial n} = -U \cdot n \quad \text{on } \partial\Omega$$
$$\nabla F \to 0 \quad \text{at infinity}$$

This problem is well known and has a unique 
solution, which satisfies, at infinity: 

$$F(x) = O(1/|x|^2), \quad \nabla F(x) = O(1/|x|^3)$$

Then a classical calculation (integration by parts) gives 
the resulting force exerted by the flow on the body: 

$$R = -\int_{\partial\Omega} p\, n\, d\sigma = \int_{\partial\Omega} \frac{1}{2} u^2\, n\, d\sigma = 0$$
ao an 2 


This property of inviscid potential flows was first 
noticed by Jean Le Rond d’Alembert (1717-1783). 
Furthermore, d’Alembert performed a series of 
experiments to measure the drag on a sphere in a 
flowing fluid and he expected that the force would 
go to zero as the viscosity of the fluid approached 
zero. But this was not the case: the drag seemed 
to converge toward a nonzero value. Hence, this 
property was called d’Alembert’s paradox. 
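
The vanishing of the force can be observed directly in the simplest case. The sketch below is an illustration under stated assumptions, not part of the article: it takes the body to be a sphere and uses the standard potential-flow result that the surface speed is (3/2) U sin θ, where θ is the polar angle measured from the direction of U; the pressure is p = -u^2/2 as above, and the function name `drag_on_sphere` is hypothetical.

```python
import math

def drag_on_sphere(U=1.0, radius=1.0, n=2000):
    """Component along U of R = -int p n d(sigma) for potential flow past a sphere.

    Surface speed is assumed to be 1.5 * U * sin(theta) (classical potential-flow
    solution); the surface integral is reduced to the polar angle by axisymmetry
    and approximated with the midpoint rule.
    """
    h = math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        speed = 1.5 * U * math.sin(theta)
        pressure = -0.5 * speed ** 2                 # p = -u^2 / 2
        normal_along_U = math.cos(theta)             # outward normal, component along U
        ring_area = 2.0 * math.pi * radius ** 2 * math.sin(theta) * h
        total += -pressure * normal_along_U * ring_area
    return total

drag = drag_on_sphere()
print(drag)
```

The pressure distribution is symmetric between the upstream and downstream halves of the sphere, so the two contributions cancel; this fore-aft symmetry is the concrete mechanism behind the paradox in this example.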

Of course, d’Alembert’s paradox tells us that some- 
thing is going wrong: this model of flow around a body 
is not physically relevant. But it is not obvious to 
identify precisely what is going wrong. 

Physics tells us that in a flow around a flying 
airplane, the viscous term (as measured by the inverse of a 
dimensionless number called the Reynolds number) is 
very small. The main effect of the viscosity is then 
to alter the limit condition at the boundary of the 
body. The relevant boundary condition is no longer 
u · n = 0, but the purely viscous condition u = 0, 
or more realistically a condition of friction type 
(turbulent boundary condition). 

A common approach is to disqualify the perfect- 
fluid model in arguing that this modification of the 
boundary condition has important consequences on 
the flow near the body (giving rise to a turbulent 
boundary layer, for example). 

It seems to us that such a disqualification of the 
perfect-fluid model discards prematurely interesting 
issues. Indeed, we must notice first that the 
stationary solution on which d’Alembert’s reasoning 
is based is highly unstable and not acceptable 
physically. Thus, a realistic solution would necessa- 
rily be either nonstationary or with some vorticity. 
On this basis, we can imagine other scenarios to 
explain the existence of a resulting force exerted 
on the body. For example, we may imagine a 
stationary solution with a discontinuous velocity 
field (i.e. with a vortex sheet). The process 
conducive to such a stationary solution is called 
Prandtl’s scenario (Batchelor 1967). The mathema- 
tical proof that Prandtl’s scenario does exist is a 
difficult (open) issue, which seems closely related to 
the (probable) nonuniqueness of weak solutions of 
the Cauchy problem. 


The Cauchy Problem for the 
Incompressible Perfect Fluid 


The Case $\Omega \subset \mathbb{R}^3$ 


In the Cauchy problem, given an initial velocity field 
$u_0(x)$, we want to determine the corresponding 
solution u(t, x) of [6] at each time t. 

The first significant result on the Cauchy problem 
for three-dimensional Euler equations was given by 
Kato (1975). 


Theorem For $u_0$ in the Sobolev space $H^s(\mathbb{R}^3)$, for 
s > 5/2, there is T > 0 and a unique classical solution 
(of the Cauchy problem) u(t, x) on $[0, T] \times \mathbb{R}^3$. u 
depends continuously on t in the space $H^s$. 


By a classical solution we mean that the field 
u(t, x) is differentiable in the variables t, x and 
satisfies the equations in the usual sense. 

Here $H^s(\mathbb{R}^3)$ denotes the Sobolev space of the 
fields u, which are square integrable and with spatial 
derivatives up to order s (in the case where s is an 
integer) also square integrable. 


Remark These results have been generalized to some 
extent during the last few decades, but the following 
issues are still open: 


1. Do singularities occur at a finite time for such 
regular solutions? 

2. For a less regular initial datum, do weak solutions 
exist (in the sense of distributions)? 


The Case $\Omega \subset \mathbb{R}^2$ 


This case is better understood; the first mathematical 
results trace back to Lichtenstein (1925) and Wolibner 
(1933); they take a plain form with the famous theorem 
of Youdovitch (1963) (see, e.g., Chemin (1995)). 

In two dimensions, the vorticity ω = curl u identifies 
with a scalar function, and the Beltrami equation 
becomes 

$$\frac{\partial \omega}{\partial t} + \operatorname{div}(\omega u) = 0$$ [8a] 
$$\operatorname{curl} u = \omega$$ [8b] 
$$\operatorname{div} u = 0, \quad u \cdot n = 0 \quad \text{on } \partial\Omega$$ [8c] 

This formulation, which appears as a transport 
equation [8a] for ω, coupled with the elliptic system 
[8b]-[8c], which determines u from ω, is particularly 
convenient. 

The constants of motion associated with the usual 
symmetries, of course, persist; notice, however, that 
the helicity degenerates since, in two dimensions, 
ω · u = 0. But now from [8a] we see that ω is merely 
convected by the incompressible velocity field u. We 
deduce that, for any continuous function f, the 
functional 

$$\int_\Omega f(\omega(t, x))\, dx$$

is a constant of motion. 

Thus, a specific feature of the two-dimensional case 
is to introduce an infinite set of constants of motion. 
By a skilful exploitation of this fact, Youdovitch 
succeeded in proving the following result. 


Theorem For a given $\omega_0$ in the space $L^\infty(\Omega)$, there
is a unique weak solution $\omega(t,x)$ of [8], such that
$\omega(t,x)$ is in $L^\infty(\Omega)$ for all $t$, and $\omega$ depends
continuously on $t$ in the space $L^p$, $1 \le p < \infty$.


$L^p$ denotes, in a standard way, the Lebesgue space
of the functions $f$ such that $|f|^p$ is integrable over
$\Omega$, and $L^\infty(\Omega)$ the space of measurable bounded
functions on $\Omega$.

Thus, if we limit ourselves to initial data with
bounded scalar vorticity, the Cauchy problem for
the two-dimensional incompressible perfect fluid is
satisfactorily solved. The situation is much more
intricate if we consider a less regular initial datum
(e.g., if $\omega_0$ is a measure supported by a curve (vortex
sheet)).


Arnol’d’s Work on Two-Dimensional Inviscid Flows 


Youdovitch’s theorem implies that the incompressible
Euler equations, with $\omega_0$ in $L^\infty(\Omega)$, are a satisfactory
model of two-dimensional flows; an important issue
is then to study further the properties of this model.

A famous result due to Arnol’d (see Arnol’d and 
Khesin (1998) and Marchioro and Pulvirenti (1994)) 
deals with the nonlinear stability of the stationary 
solutions. 

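Let us determine the smooth stationary solutions
of the two-dimensional Euler equations in a bounded
domain $\Omega$ of the plane. We have to solve:

$$(u \cdot \nabla)\omega = 0 \qquad [9a]$$
$$\operatorname{curl} u = \omega \qquad [9b]$$
$$\operatorname{div} u = 0, \quad u \cdot n = 0 \ \text{on } \partial\Omega \qquad [9c]$$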


Since we have $\operatorname{div} u = 0$, we may introduce the stream
function $\psi$ of $u$, which is given by the Dirichlet
problem:

$$-\Delta\psi = \omega, \quad \psi = 0 \ \text{on } \partial\Omega$$

so that $u = \operatorname{curl} \psi$.
The system [9] becomes:

$$\nabla\psi \wedge \nabla\omega = 0, \quad -\Delta\psi = \omega, \quad \psi = 0 \ \text{on } \partial\Omega$$

Let us focus on solutions which are characterized by a
relationship $\omega = f(\psi)$, where $f$ is a smooth function.
Such solutions are given by the resolution of the
following nonlinear elliptic problem:

$$-\Delta\psi = f(\psi), \quad \psi = 0 \ \text{on } \partial\Omega \qquad [10]$$


This problem always has at least one solution if, for
example, $f$ is a bounded function of $\psi$.

Let $\psi^*$ be a solution of [10], and $\omega^* = f(\psi^*)$
the corresponding vorticity function. We shall say that
the stationary solution $u^*$ is stable in the $L^2$-norm if:

For all $\varepsilon > 0$, there is a $\delta > 0$ such that, for every initial
datum $\omega_0$ in $L^\infty(\Omega)$ satisfying

$$\int_\Omega (\omega^* - \omega_0)^2\,dx < \delta,$$

we have

$$\int_\Omega (\omega^* - \omega(t))^2\,dx < \varepsilon \quad \text{for all } t,$$

where $\omega(t)$ denotes the solution of the Cauchy problem
associated with the initial datum $\omega_0$ by Youdovitch’s
theorem.


Now we can state the following result. 


Theorem (Arnol’d) Let $\omega^*$ be a stationary solution
given by [10]. We assume that one of the following
assumptions holds:

(C1) There are positive constants $c_1$, $c_2$ such that

$$c_1 \le f' \le c_2$$

(C2) There are positive constants $c_1$, $c_2$, with $c_2 < \lambda_1$
(first eigenvalue of the Dirichlet problem on the
domain $\Omega$), such that:

$$c_1 \le -f' \le c_2$$

Then $\omega^*$ is stable in the $L^2$-norm.


Remarks 


(i) This result was the first nonlinear stability result 
for stationary flows. 

(ii) The proof makes use of the conservation of the 
functionals of the vorticity field. 


Another significant contribution of Arnol’d to 
hydrodynamics was to reveal the geometrical aspect 
of the instability of the perfect-fluid motion. We give 
a brief insight into this issue. 

Let us come back to the Lagrangian description
of motion. We want to determine the function
$\varphi(t,x)$. Each mapping $\varphi_t(x) = \varphi(t,x)$ is, for $t$ fixed,
a diffeomorphism of $\Omega$ preserving the Lebesgue
measure and the orientation (equivalently stated, it
is an element of $\mathrm{SDiff}(\Omega)$).

In other words, a fluid motion is a curve $t \mapsto \varphi_t$
drawn on the “manifold” $M = \mathrm{SDiff}(\Omega)$ (the configuration
space of the system).

At time $t$, the relationship

$$\frac{\partial \varphi}{\partial t}(t,x) = u(t, \varphi(t,x))$$

states that the velocity field $u(t, \varphi_t(x))$ belongs to the
space tangent to $M$ at $\varphi_t$. The tangent space at $\varphi$
to $M$ is the space of vector fields $v(\varphi(x))$, where $v(x)$
is an incompressible vector field on $\Omega$ satisfying
$v \cdot n = 0$ on $\partial\Omega$. This space is naturally endowed
with a norm given by the kinetic energy

$$\|v\|^2 = \int_\Omega v(x)^2\,dx$$

and thus $M$ is endowed with a Riemannian
structure.

It is easy to check that the perfect-fluid motions
correspond to the curves $\varphi_t$ drawn on $M$ which are
the critical points of the action integral:

$$\int_{t_1}^{t_2}\!\!\int_\Omega \frac{1}{2}\left|\frac{\partial \varphi}{\partial t}(t,x)\right|^2 dx\,dt, \quad \text{for all } t_1 < t_2$$

(with the constraints $\varphi(t_1, \cdot) = \varphi_1$, $\varphi(t_2, \cdot) = \varphi_2$).


That is to say, the perfect-fluid motions are the 
geodesics of the Riemannian manifold M. 

The main interest of this geometric framework is
to bring back, at least formally, the perfect-fluid
motions to well-known objects. Indeed, we know
that the Riemannian curvature of a manifold has a
profound impact on the behavior of geodesics on it.
If the Riemannian curvature is positive, then nearby
geodesics oscillate about one another, and if the
curvature is negative, geodesics rapidly diverge from
one another. More precisely, the stability of geodesics
is expressed in terms of the curvature by means
of Jacobi’s equation [1]. If $\varphi_t$ is a geodesic curve
starting from $\varphi_0$, with velocity field $v(t)$ (whose
norm is supposed equal to 1), and if the sectional
curvature of the manifold in all the 2-planes
containing $v(t)$ is less than $-c$ ($<0$), a perturbation
of the initial datum will increase at least as $\exp(ct)$:

$$d(\varphi_t, \tilde{\varphi}_t) \ge d(\varphi_0, \tilde{\varphi}_0)\exp(ct)$$

where $\tilde{\varphi}_0$ denotes the perturbed initial datum and $d$
the geodesic distance on the manifold. Moreover, if
the curvature at every point and for all the sections
is less than $-c$, and if $M$ is compact, then the geodesic
flow, that is, the one-parameter group of transformations
$(\varphi_0, v(0)) \mapsto (\varphi_t, v(t))$, is mixing (in the usual
meaning of ergodic theory). Arnol’d succeeded in
calculating the sectional curvature for flows on the
two-dimensional torus; he showed that the
curvature is negative for “most” of the sections. This
gives an enlightening geometrical picture of the
instability of Lagrangian flows.

It was tempting to connect the above considera- 
tions on the instability of two-dimensional flows 
with the problem of weather forecasting. In 1963
E N Lorenz stated that a two-week forecast would be
a theoretical bound for predicting the atmospheric 
motion. Lorenz’s assertion was based on numerical 
simulations. He took as model for the large-scale 
atmospheric motion the two-dimensional Euler 
equations on the torus, which he truncated to a 
small number of Fourier modes (about 20). This 
model is highly unstable and displays exponential 
sensitivity with respect to the initial datum. How- 
ever, the parallel between the behavior of this 
system and the instability of the Lagrangian flow is 
misleading. On the one hand, if we again do the 
Lorenz computations on Euler equations, taking 
into account a large number of Fourier modes, we 
note a striking phenomenon: the flow has a tendency 
to self-organize into large vortices, called coherent 
structures, and simultaneously the exponential 
sensitivity, as measured in terms of the energy 
norm of the velocity field, disappears. On the other 
hand, the problem of predicting the Lagrangian flow 
is very different: the Lagrangian flow can be
exponentially unstable, while the corresponding 
velocity field quietly converges, in the energy norm, 
towards some equilibrium. We must keep in mind 
that the meteorologist aims to predict the values of 
the velocity field at some future time and not the 
trajectories of the fluid particles. In fact, it appears 
that Lorenz has ignored phenomena of a statistical 
nature which occur when a large number of degrees 
of freedom are considered; thus, his theoretical 
bound for the prediction of the atmospheric motion 
has no definite basis. More detailed reflections on 
this issue can be found in Robert and Rosier (2001). 


The Cauchy Problem for the Euler 
Equations for Compressible Inviscid 
Fluids 


As remarked in the introduction, compressible flows 
yield pressure waves. The equations of motion being 
nonlinear, these waves interact in an intricate 
manner giving rise to shocks. This is the main 
feature of compressible fluid flows. Compressible
flows are situated in the more general domain of
nonlinear hyperbolic systems, which were intensively
studied during the last decades. We only give
here an example of the kind of result which can be
obtained.

The following theorem, which states that for a set
of regular initial data, shocks do not occur till some
finite time, is a consequence of a more general result
on hyperbolic systems due to Majda (1984).

We consider $\Omega = \mathbb{R}^3$ and the system [1], [2], [4].

Theorem Assume $\rho_0, u_0 \in H^s \cap L^\infty(\mathbb{R}^3)$, with
$s > 5/2$ and $\rho_0(x) > 0$. Then there is a finite time
$T > 0$, depending on the $H^s$ and $L^\infty$ norms of the
initial data, such that the Cauchy problem for [1],
[2], [4] has a unique bounded smooth solution $\rho$,
$u \in C^1([0,T] \times \mathbb{R}^3)$, with $\rho(t,x) > 0$ for all $t, x$.


Inviscid Flows and Turbulence 


Loosely speaking, turbulence is the intricate motion 
of a slightly viscous flow. Going back to the first 
half of the last century, there are two main 
approaches to turbulence. The first is due to Leray. 
The dissipation of energy is a characteristic feature 
of three-dimensional turbulence, and Leray thought 
that, even if very small, the viscosity of the fluid 
plays an important role, so that to understand 
turbulence the first step is to study the Navier- 
Stokes equations. A radically different approach is 
due to Onsager. Onsager (1949) started with the 
fundamental remark that the 4/5 law of turbulence, 
which relates the dissipation of energy to the 
increments of the velocity field, does not involve 
viscosity. Furthermore, he observed that the proof of 
the conservation of energy for the solutions of Euler 
equations uses an integration by parts which 
supposes some regularity of the velocity field. He 
then imagined that an inviscid dissipation mechan- 
ism, due to a lack of regularity of the solutions, was 
at work in Euler equations. In modern terminology, 
he suggested to model turbulent flows by nonregular 
(weak) solutions satisfying the Euler equations in the 
sense of distributions. He also conjectured that if a 
solution satisfies a Holder regularity condition of 
order >1/3, then the energy would be conserved. 
Onsager’s views were revolutionary and forgotten 
for a long time. Recent works, such as the proof of 
Onsager’s conjecture, the construction of weak 
solutions with energy dissipation, and the discovery 
of the explicit local form of the energy dissipation 
for weak solutions, show a renewed interest in these 
views (see, e.g., Constantin and Titi (1994), Eyink 
(1994), Robert (2003), and Shnirelman (2003)). 


See also: Compressible flows: Mathematical Theory; 
Dissipative Dynamical Systems of Infinite Dimension; 
Hyperbolic Dynamical Systems; Incompressible Euler 
Equations: Mathematical Theory; Non-Newtonian Fluids; 
Partial Differential Equations: Some Examples; Chaos 
and Attractors; Turbulence Theories.


Introduction


This paper reviews recent developments, following 
closely (sometimes verbatim) the review paper 
Calogero F (2004c) (see the Bibliography below); 
for more traditional investigations of isochronous 
systems see other entries of this Encyclopedia (and 
for the mathematical investigation of isochronous 
centers in the plane, related to the 16th Hilbert 
problem, see for instance the survey paper referred 
to at the end of this entry). 

The isochronous systems treated herein are char- 
acterized by the property to possess an open domain 
having full dimensionality in their phase space such 
that all the motions evolving from a set of initial 
data in it are completely periodic with the same 
fixed period. The natural measure of this open 
domain might, or it might not, be infinite when the 
measure of the entire phase space is itself infinite: 
for instance, if the entire phase space is the two- 
dimensional Euclidian plane, such a domain might 


be the exterior, or the interior, of a circle of finite 
radius. 

It is justified to call such systems superintegrable, 
or perhaps partially superintegrable inasmuch as the 
property of isochronicity of all their motions holds 
only in a subregion of the entire phase space. This 
terminology is justified by the observation that, 
roughly speaking, all confined motions of a super- 
integrable system — in which all but one of the 
degrees of freedom are constrained by the existence 
of the maximal possible number of constants of 
motion — are completely periodic, although not 
necessarily all with a fixed period — entailing that 
isochronicity entails superintegrability, while the 
converse is not the case (see the entry Integrable 
systems in this Encyclopedia). 

A simple trick, amounting essentially to a
change of independent, and possibly as well of
dependent, variables, allows one to deform a largely
arbitrary dynamical system so that the deformed
system obtained from it is isochronous. This
“trick”, which is now explained, entails therefore
that isochronous systems are not rare. Below we
provide several examples; others can be found in
the further reading suggested at the end of this
entry, and/or can be manufactured ad libitum using
the trick.


The Trick 


We now show that, given a largely arbitrary
dynamical system, it is possible to introduce a
deformed version of it featuring a real constant $\omega$
that has the following properties: for $\omega = 0$, it
coincides with the original, undeformed system; for
$\omega > 0$, it possesses an open region having full
dimensionality in its phase space such that all
solutions evolving from an initial datum in it are
completely periodic with a period $\tilde{T}$ which is a finite
integer multiple, or perhaps a simple fraction, of the
basic period

$$T = \frac{2\pi}{\omega} \qquad [1]$$

Let us indeed consider a quite general dynamical
system, which we write as follows:

$$\zeta' = F(\zeta; \tau) \qquad [2]$$


Here $\zeta \equiv \zeta(\tau)$ is the dependent variable, which
might be a scalar, a vector, a tensor, a matrix, you
name it. The independent variable is $\tau$, and the main
limitation on the dynamical system [2] is that it be
permissible to treat this variable as complex; this
requires that the derivative with respect to this
complex variable $\tau$ that appears in the left-hand side
of the evolution equation [2] make sense, namely
that this dynamical system be analytic, entailing that
the dependent variable $\zeta$ be an analytic function of
the complex variable $\tau$ (but this does not require
$\zeta(\tau)$ to be a holomorphic nor a meromorphic
function of $\tau$; $\zeta(\tau)$ might feature all sorts of
singularities, including branch points, in the complex
$\tau$-plane, indeed this will generally happen since
we generally assume the evolution equation [2] to
be nonlinear). The quantity $F$ in the right-hand side
of [2], which has of course the same scalar, vector,
matrix... character as $\zeta$, might depend (arbitrarily
but analytically) on $\zeta$ as well as on $\tau$. (Let us also
emphasize that this approach is as well applicable to
more general dynamical systems that also feature
other, “spacelike”, independent variables, for
instance systems of PDEs rather than ODEs;
the interested reader is referred to the literature cited
below.)

In spite of the generality of this dynamical system,
[2], there generally holds a result (“Theorem of
existence, uniqueness and analyticity”) that characterizes
the solution $\zeta(\tau)$ of its initial-value problem
determined by the assignment

$$\zeta(0) = \zeta_0$$

Here, for notational simplicity, we assign the initial
datum $\zeta_0$ at $\tau = 0$; and we assume of course that the
right-hand side of [2] is not singular for $\tau = 0$ and
$\zeta = \zeta_0$. The relevant result guarantees, not only for
the initial datum $\zeta_0$, but for a (sufficiently small but
open) set of initial data in its neighborhood, the
existence of a circular disk in the complex $\tau$-plane,
centered at $\tau = 0$ (where the initial data are assigned)
and having a nonvanishing radius $\rho$, such that the
solutions $\zeta(\tau)$ corresponding to these initial data are
holomorphic in it, namely for $|\tau| < \rho$ (and note that
if $\zeta(\tau)$ is a multicomponent object, the property to
be holomorphic is featured by each and every one of
its components).

Let us now introduce the following changes of
dependent and independent variables:

$$z(t) = \exp(i\lambda\omega t)\,\zeta(\tau) \qquad [3a]$$
$$\tau \equiv \tau(t) = \frac{\exp(i\omega t) - 1}{i\omega} \qquad [3b]$$

This transformation is called “the trick”. The
essential part of it is the change of independent
variable [3b]; and let us re-emphasize that, here and
hereafter, the new independent variable $t$ is considered
as the real, “physical time” variable. Note
that [3b] entails

$$\tau(0) = 0, \quad \dot{\tau}(0) = 1$$

and, most importantly, that $\tau(t)$ is a periodic
function of $t$ with period $T$, see [1]. More specifically,
as the time $t$ increases from zero onwards, the
complex variable $\tau$ travels counterclockwise round
and round on the circle $C$ the diameter of which, of
length $2/\omega$, lies on the imaginary axis in the complex
$\tau$-plane, with one extreme at the origin, $\tau = 0$, and
the other at the point $\tau = 2i/\omega$, making a full circle in
the time interval $T$. As for the prefactor $\exp(i\lambda\omega t)$
that multiplies $\zeta(\tau)$ in the right-hand side of [3a], its
purpose is to allow, via an appropriate choice of the
parameter $\lambda$, the deformed system, see below, to
have a neater look; however this choice is hereafter
restricted by the condition that $\lambda$ be real and
rational, say

$$\lambda = \frac{p}{q}$$

with $p$ and $q$ two coprime integers and $q > 0$. This
restriction is essential to guarantee, via [3], that if
$\zeta(\tau)$ is holomorphic in $\tau$ in the (closed) disk
encircled by the circle $C$, then $z(t)$ is completely
periodic (namely, each and every one of its components
is periodic) with the period

$$\tilde{T} = qT \qquad [4]$$

The deformed dynamical system is the one that
obtains from [2] via the trick [3]. It clearly reads as
follows:

$$\dot{z} = i\lambda\omega z + \exp[i(\lambda+1)\omega t]\, F\!\left(\exp(-i\lambda\omega t)\,z;\ \frac{\exp(i\omega t)-1}{i\omega}\right) \qquad [5]$$
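For readers who want to see where the deformed system comes from, the following short derivation spells out the chain-rule step. It assumes the trick in the form $z(t) = \exp(i\lambda\omega t)\,\zeta(\tau)$, $\tau(t) = [\exp(i\omega t) - 1]/(i\omega)$, with $\lambda = p/q$ and basic period $T = 2\pi/\omega$; the last line is a sketch of why $qT$ is a period when $\zeta$ is holomorphic in the closed disk encircled by $C$.

```latex
\begin{align*}
\dot{\tau}(t) &= \frac{d}{dt}\,\frac{e^{i\omega t}-1}{i\omega} = e^{i\omega t},\\
\dot{z}(t) &= i\lambda\omega\, z + e^{i\lambda\omega t}\,\zeta'(\tau)\,\dot{\tau}
            = i\lambda\omega\, z + e^{i(\lambda+1)\omega t}\,F(\zeta;\tau),\\
\zeta = e^{-i\lambda\omega t} z
 \ &\Longrightarrow\
 \dot{z} = i\lambda\omega\, z + e^{i(\lambda+1)\omega t}\,
 F\!\left(e^{-i\lambda\omega t} z;\ \frac{e^{i\omega t}-1}{i\omega}\right),\\
z(t+qT) &= e^{i\lambda\omega (t+qT)}\,\zeta\bigl(\tau(t+qT)\bigr)
         = e^{2\pi i p}\,e^{i\lambda\omega t}\,\zeta\bigl(\tau(t)\bigr) = z(t).
\end{align*}
```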


And it is plain, on the basis of the arguments we just
gave, that this system is isochronous, a sufficient
condition for the complete periodicity with period
$\tilde{T}$, see [4], of its solutions being provided by the
inequality

$$\rho > \frac{2}{\omega}$$

which can clearly be satisfied by initial data situated
inside an open domain of such data, at least
provided $\omega$ is sufficiently large (actually, in all the
examples reported below no restriction on the value
of $\omega$ is required, namely such an open domain exists
for any arbitrary value of $\omega > 0$).
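A small numerical experiment may make the trick concrete. The sketch below is an illustration chosen for this purpose, not an example from the text: it takes the scalar system $\zeta' = \zeta^3$, for which the choice $\lambda = 1/2$ makes the deformed equation autonomous, $\dot{z} = \tfrac{1}{2} i\omega z + z^3$. The values of $\omega$ and of the initial datum, and all function names, are assumptions. For small $|z(0)|$ the branch point of $\zeta(\tau) = \zeta_0 (1 - 2\zeta_0^2\tau)^{-1/2}$ lies far outside the circle $C$, so $z(t)$ should repeat after $qT = 2T$.

```python
import cmath
import math

OMEGA = 1.0                 # deformation parameter (illustrative value)
LAM = 0.5                   # lambda = p/q = 1/2, hence q = 2
T = 2 * math.pi / OMEGA     # basic period T = 2*pi/omega

def tau(t):
    """Change of independent variable: tau = (exp(i*omega*t) - 1) / (i*omega)."""
    return (cmath.exp(1j * OMEGA * t) - 1) / (1j * OMEGA)

def rhs(z):
    """Deformed system obtained from zeta' = zeta**3 with lambda = 1/2."""
    return 0.5j * OMEGA * z + z ** 3

def rk4(z0, t_end, steps):
    """Classical fourth-order Runge-Kutta integration from t = 0 to t_end."""
    h = t_end / steps
    z = z0
    for _ in range(steps):
        k1 = rhs(z)
        k2 = rhs(z + 0.5 * h * k1)
        k3 = rhs(z + 0.5 * h * k2)
        k4 = rhs(z + h * k3)
        z += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return z

def exact(z0, t):
    """z(t) = exp(i*lambda*omega*t) * zeta(tau(t)) for the undeformed solution."""
    zeta = z0 / cmath.sqrt(1 - 2 * z0 ** 2 * tau(t))
    return cmath.exp(1j * LAM * OMEGA * t) * zeta

z0 = 0.1 + 0.05j            # small initial datum: branch point far from C
z_T = rk4(z0, T, 4000)      # after one basic period the solution changes sign
z_2T = rk4(z0, 2 * T, 8000) # after q*T = 2*T it returns to the initial datum
print(abs(z_T + z0), abs(z_2T - z0))
```

Both printed numbers should be at the level of the integration error, and `tau(t)` stays on the circle of radius $1/\omega$ centered at $i/\omega$.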


Examples 


In this subsection we report tersely several examples 
of isochronous dynamical systems; in each case we 
also provide the reference where more information 
can be found. Except when explicitly otherwise 
mentioned, these dynamical systems are to be 
considered in the complex context. 

The first example we report is a Hamiltonian 
N-body problem which is a generalization of a well- 
known integrable (indeed, superintegrable) system 
(see Integrable Systems: Overview). It is characterized 
by the (normal) Hamiltonian 


N 
zp ATA 
n=1 


1 Š Š fm 
T 4 f Jk [6a] 
m,n=1;men k=1 R(Zn E Zm) 








and correspondingly by the Newtonian equations of 
motion 


N K (k) 


Ën EW Zn = >, > 7: ae [6b] 
ml msn k= 1 (Zn = Z) 








Here the $N(N-1)K$ “coupling constants” $f_{nm}^{(k)}$ are
arbitrary, except for the symmetry restriction
$f_{nm}^{(k)} = f_{mn}^{(k)}$ (see [6a]).

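The next example we report is a real N-body
problem in the horizontal plane, characterized by
the Newtonian equations of motion

$$\ddot{\vec{r}}_n = \omega\,\hat{k} \wedge \dot{\vec{r}}_n + 2 \sum_{m=1,\, m \neq n}^{N} r_{nm}^{-2}\,(\alpha_{nm} + \beta_{nm}\,\hat{k}\wedge)\left[\dot{\vec{r}}_n\,(\dot{\vec{r}}_m \cdot \vec{r}_{nm}) + \dot{\vec{r}}_m\,(\dot{\vec{r}}_n \cdot \vec{r}_{nm}) - \vec{r}_{nm}\,(\dot{\vec{r}}_n \cdot \dot{\vec{r}}_m)\right] \qquad [7]$$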


Here $\vec{r}_n \equiv (x_n, y_n, 0)$ is a real two-vector in the
horizontal plane, $\hat{k} \equiv (0,0,1)$ is the unit vector
orthogonal to the horizontal plane, the symbol $\wedge$
denotes the (three-dimensional) vector product so
that $\hat{k} \wedge \vec{r}_n \equiv (-y_n, x_n, 0)$, and we use the short-hand
notation $\vec{r}_{nm} \equiv \vec{r}_n - \vec{r}_m$ entailing
$r_{nm}^2 = r_n^2 + r_m^2 - 2\,\vec{r}_n \cdot \vec{r}_m$.
Note that these equations are translation- and
rotation-invariant; and they are Hamiltonian,
although the corresponding Hamiltonian function
is not of normal type (kinetic plus potential
energy).

The $N(N-1)$ “coupling constants” $\alpha_{nm}$ and $\beta_{nm}$
are of course real, but they are otherwise arbitrary
except for the symmetry restrictions $\alpha_{nm} = \alpha_{mn}$,
$\beta_{nm} = \beta_{mn}$, which are required in order that this
system be Hamiltonian. If all these coupling
constants vanish, this dynamical system has a
clear physical interpretation: it describes the
motion of $N$ equal, electrically charged, point
particles, moving in the horizontal plane under
the effect of a magnetic field orthogonal to that
plane (in the approximation in which the electrostatic
interparticle interaction is neglected). In that
case each particle moves on a circle, the center and
radius of which depend on the initial data, while
the time taken to go round it is, in all cases, $T$, see
[1]. If the $\frac{1}{2}N(N-1)$ coupling constants $\beta_{nm}$
vanish, $\beta_{nm} = 0$, and the $\frac{1}{2}N(N-1)$ coupling constants
$\alpha_{nm}$ all equal unity, $\alpha_{nm} = 1$, the system is a
well-known integrable (indeed solvable) system;
and this is as well the case if the $\frac{1}{2}N(N-1)$
coupling constants $\beta_{nm}$ vanish, $\beta_{nm} = 0$, and the
$\frac{1}{2}N(N-1)$ coupling constants $\alpha_{nm}$ equal minus one
half, and only act among “nearest neighbors”,
$\alpha_{nm} = -\frac{1}{2}(\delta_{m,n+1} + \delta_{m,n-1})$ (see the entry Integrable
systems in this Encyclopedia).

Because of its many interesting features as well as 
the neatness of its equations of motion (especially in 
their complex version, see below) the honorary title 
of “goldfish?” has been attributed to this model, 
characterized by the Newtonian equations of motion 
in the plane [7]. A more detailed discussion of it — in 
particular of its behavior for initial data outside of 
the region yielding isochronous motions — is made in 
the next section. 

Several interesting classes of isochronous dyna- 
mical systems are reported in Calogero F. (2004b). 


We only mention here a remarkably general 
example, characterized by the Newtonian equa- 
tions of motion 


K 
Zt iwż = yr re z+ iwz) 
k=l 


where z = (z1,...,2N) is the N-vector whose com- 
plex components z, = Z,(t) are the dependent vari- 
ables, while the “forces” a (* (z, 3) are required to be 
waite in all their neime and to satisfy the 
scaling properties 
e kf (z, 2) 

which however entail no restriction on the velocity- 
dependence a bay forces, namely on the depen- 
dence of fir on the (components of the) 
second, 2, of K N-vector arguments. 

The next example we report is characterized by 
the Newtonian equations of motion 


$$\ddot{\vec{r}}_n + i\omega\,\dot{\vec{r}}_n + 2\omega^2\,\vec{r}_n = \sum_{m=1,\, m\neq n}^{N} \frac{M_m\,\vec{r}_{mn}}{r_{mn}^3}$$

where we assume the $N$ dependent variables $\vec{r}_n \equiv
\vec{r}_n(t)$ to be three-vectors (although the property of
isochronicity of this deformed system would hold no
less if these were S-vectors, with $S$ an arbitrary
positive integer) and we use the short-hand notation
$\vec{r}_{mn} \equiv \vec{r}_m - \vec{r}_n$. This system is (perhaps) remarkable
inasmuch as it represents a (complex) deformation
of the classical N-body gravitational problem, to
which it clearly reduces for $\omega = 0$.

The next example we report is characterized by 
the following (first-order) equations of motion of 
oscillator type: 


$$\dot{x}_n - i p_n \omega x_n = f_n(x, y), \quad n = 1, \ldots, N$$
$$\dot{y}_m + i q_m \omega y_m = g_m(x, y), \quad m = 1, \ldots, M \qquad [8]$$

Here the N-vector $x$, respectively the M-vector $y$,
have as components the $N + M$ complex dependent
variables $x_n \equiv x_n(t)$, $y_m \equiv y_m(t)$; the $N + M$ parameters
$p_n$, $q_m$ are all nonnegative integers (or they
could be nonnegative rational numbers); and the
$N + M$ complex functions $f_n$, $g_m$ are restricted by
the following conditions (which are sufficient to
guarantee the isochronicity of this dynamical
system):


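(1) $f_n(x, y)$ and $g_m(x, y)$ are holomorphic at
$x = 0$, $y = 0$;

(2) $\lim_{\varepsilon \to 0}[\varepsilon^{-1} f_n(\varepsilon x, \varepsilon y)] = 0$, $\lim_{\varepsilon \to 0}[\varepsilon^{-1} g_m(\varepsilon x, \varepsilon y)] = 0$;

(3) $f_n(x, y)$ and $g_m(x, y)$ are polynomial in the $y_m$;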


(4a) lims ~ ole” falx, e 4y)] = nondivergent, n = 
1,..., N; 

(4b) lim. ole7! 4" gn(e2x,e 1y)]=nondivergent, m= 
1,...,M. 


In the conditions (4a) and (4b) the notation £x indicates 
of course the N-vector of components ¢?"x,, and 
likewise £~ 2y is the M-vector of components ¢7"y,,. 
Note that this dynamical system, [8], includes the 
Hamiltonian case characterized by the restrictions 


OV(x,y) 
N = M: þh = anala y) = By, PE y) 
_ V(x, y) 
Ox, 


which imply that the equations of motion [8] are 
just the Hamiltonian equations entailed by the 
Hamiltonian function 


N 


n=1 


isochronicity being now guaranteed by the following 
conditions on the function V(x, y): 


(1) V(x, y) is holomorphic at x =0,y =0; 
(2) limeso [e* V (ex, ey)] = 05 

(3) V(x,y) is pena in the yy; 

(4) lim. „olet V(x, c7 Py)] = nondivergent. 


The last two examples we report can be char- 
acterized as assemblies of non-linear harmonic 
oscillators, inasmuch as these two dynamical sys- 
tems (which are actually special cases of more 
general systems) have the remarkable property that 
their generic solutions (namely, all their solutions, 
except for a lower-dimensional set of singular 
solutions in which one or more of the “moving 
particles” shoot off to infinity at a finite time) are 
completely periodic with the fixed period T, see [1]. 
Their Newtonian equations of motion read 


Lam — I am — Qu Zam =C 


am a DU ym —_ Des Zhe =C 


These are two (different) systems of $NM$ Newtonian
equations of motion satisfied by the $NM$ complex
S-vectors $\vec{z}_{nm}$ (with $S$ an arbitrary positive integer);
hence here the index $n$ runs from 1 to $N$, and the
index $m$ runs from 1 to $M$, with $N$ and $M$ two
arbitrary positive integers, while $c$ is of course an
arbitrary complex constant (which might actually be
rescaled away). The dot sandwiched between two
S-vectors denotes the standard (Euclidian) scalar
product, entailing the rotation-invariant character,
in S-dimensional space, of these equations of
motion. Since these systems only feature linear and
cubic forces, these models are remarkably close to
physics; and they become even more applicable if
they are written in their real versions, that obtain in
an obvious manner by setting

$$\vec{z}_{nm} = \vec{x}_{nm} + i\,\vec{y}_{nm}, \quad c = a + ib$$


In contrast to what we did for the previous examples,
let us outline here the derivation of these results.
Actually the two systems of Newtonian equations
written above are merely special subcases, corresponding
to appropriate parametrizations of a square
matrix $M$ (of appropriate rank) in terms of S-vectors, of
the following nonlinear matrix evolution equation:

$$\ddot{M} - 3i\omega\,\dot{M} - 2\omega^2 M = c\,M^3 \qquad [9]$$

Hence the findings reported above are merely special
cases of the more general result according to which
the generic solution of this nonlinear matrix evolution
equation, with $M \equiv M(t)$ a square matrix of
arbitrary rank, is periodic with period $T$, see [1]:

$$M(t + T) = M(t)$$

And this result is an immediate consequence, via the
following matrix version of the trick

$$M(t) = \exp(i\omega t)\,\mathcal{M}(\tau), \quad \tau = \frac{\exp(i\omega t) - 1}{i\omega} \qquad [10]$$

of a previous result due to V. I. Inozemtsev,
according to which the matrix evolution equation

$$\mathcal{M}'' = c\,\mathcal{M}^3$$

which clearly corresponds to [9] via [10], is
integrable and all its solutions $\mathcal{M}(\tau)$ are meromorphic
functions of the independent variable $\tau$.


The Transition to Deterministic Chaos 


In this section we illustrate, using the real N-body 
problem in the plane characterized by the New- 
tonian equations of motion [7], the behavior of an 
isochronous system of the kind described above 
when the initial data fall outside of the region 
yielding isochronous motions. 

To do this it is convenient to use the complex
version of the equations of motion [7], which obtain
from [7] by setting

$$z_n \equiv x_n + i y_n, \quad \vec{r}_n \equiv (x_n, y_n, 0), \quad \hat{k} \equiv (0, 0, 1), \quad a_{nm} = \alpha_{nm} + i\beta_{nm} \qquad [11]$$

and read as follows:

$$\ddot{z}_n = i\omega\,\dot{z}_n + 2 \sum_{m=1,\, m\neq n}^{N} \frac{a_{nm}\,\dot{z}_n\,\dot{z}_m}{z_n - z_m} \qquad [12]$$

The main tool of our analysis is the (particularly
simple) version of the trick appropriate to this
model,

$$z_n(t) = \zeta_n(\tau), \quad \tau = \frac{\exp(i\omega t) - 1}{i\omega} \qquad [13a]$$

entailing

$$z_n(0) = \zeta_n(0), \quad \dot{z}_n(0) = \zeta_n'(0) \qquad [13b]$$

that relates our equations of motion [12] to the
equations of motion

$$\zeta_n'' = 2 \sum_{m=1,\, m\neq n}^{N} \frac{a_{nm}\,\zeta_n'\,\zeta_m'}{\zeta_n - \zeta_m} \qquad [14]$$

These equations of motion, together with the initial
data $\zeta_n(0)$, $\zeta_n'(0)$ (see [13b]), define the solutions $\zeta_n \equiv
\zeta_n(\tau)$ in the complex $\tau$-plane. The “physical”
evolution of the points $z_n \equiv z_n(t)$ as functions of
the real time variable $t$ is then given by the evolution
of the corresponding coordinates $\zeta_n(\tau)$, see [13a], as
the complex variable $\tau$ travels round and round on
the circle $C$ in the complex $\tau$-plane, the diameter of
which, of length $2/\omega$, has one extreme at the origin
$\tau = 0$ and the other on the positive imaginary axis at
$\tau = 2i/\omega$. It is therefore clear that the behavior of
$z_n(t)$ as a function of the real, “physical time”
variable $t$ depends on the analytic structure of $\zeta_n(\tau)$
as function of the complex variable $\tau$, in particular
on the singularities, if any, of this function $\zeta_n(\tau)$ that
fall in the disk $D$ encircled by the circle $C$ in the
complex $\tau$-plane.

Let us tersely review the relevant analysis. We
recall first of all that (it can be proven that) there
exists in phase space an open region of initial data
$z_n(0)$, $\dot{z}_n(0)$, characterized by large values of the
moduli $|z_n(0) - z_m(0)|$ of the initial interparticle
distances and by small values of the moduli of the
initial particle velocities $|\dot{z}_n(0)|$ (see [14] and [13b]),
that guarantees (all components $\zeta_n(\tau)$ of) the
corresponding solution $\zeta(\tau)$ of [14] to be holomorphic
in (a disk of radius $\rho$ centered at the origin
$\tau = 0$ of the complex $\tau$-plane that includes) the circle
$C$, hence the corresponding solution $z(t)$ to be
completely periodic with period $T$, see [13a] and
[1]. This result guarantees the isochronous character
of this model, [12], for any arbitrarily given assignment
of the coupling constants $a_{nm}$.
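The existence of such an isochronous region can be probed numerically. The sketch below assumes that the complex equations of motion [12] read $\ddot{z}_n = i\omega\dot{z}_n + 2\sum_{m\neq n} a_{nm}\dot{z}_n\dot{z}_m/(z_n - z_m)$; the coupling constants, the initial data (well-separated particles with small velocities) and all function names are hypothetical choices made for this illustration.

```python
import math

OMEGA = 1.0                      # illustrative value of the deformation parameter
T = 2 * math.pi / OMEGA          # basic period T = 2*pi/omega

# Hypothetical symmetric coupling constants a[n][m]; diagonal entries are unused.
A = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 2.0],
     [0.5, 2.0, 0.0]]

def acceleration(z, v):
    """Assumed right-hand side of [12] for every particle."""
    acc = []
    for n in range(len(z)):
        total = 1j * OMEGA * v[n]
        for m in range(len(z)):
            if m != n:
                total += 2 * A[n][m] * v[n] * v[m] / (z[n] - z[m])
        acc.append(total)
    return acc

def axpy(x, y, s):
    """Return the componentwise combination x + s*y."""
    return [xi + s * yi for xi, yi in zip(x, y)]

def rk4_step(z, v, h):
    """One classical Runge-Kutta step for the first-order system (z, v)."""
    k1z, k1v = v, acceleration(z, v)
    z2, v2 = axpy(z, k1z, h / 2), axpy(v, k1v, h / 2)
    k2z, k2v = v2, acceleration(z2, v2)
    z3, v3 = axpy(z, k2z, h / 2), axpy(v, k2v, h / 2)
    k3z, k3v = v3, acceleration(z3, v3)
    z4, v4 = axpy(z, k3z, h), axpy(v, k3v, h)
    k4z, k4v = v4, acceleration(z4, v4)
    z_new = [z[i] + h * (k1z[i] + 2 * k2z[i] + 2 * k3z[i] + k4z[i]) / 6
             for i in range(len(z))]
    v_new = [v[i] + h * (k1v[i] + 2 * k2v[i] + 2 * k3v[i] + k4v[i]) / 6
             for i in range(len(v))]
    return z_new, v_new

def evolve(z0, v0, t_end, steps):
    """Integrate from t = 0 to t_end and return final positions and velocities."""
    z, v = list(z0), list(v0)
    h = t_end / steps
    for _ in range(steps):
        z, v = rk4_step(z, v, h)
    return z, v

z0 = [0j, 10 + 0j, 10j]                    # large interparticle distances
v0 = [0.1 + 0j, -0.05j, 0.08 + 0.02j]      # small initial velocities
zT, vT = evolve(z0, v0, T, 2000)
drift = max(abs(a - b) for a, b in zip(zT + vT, z0 + v0))
print(drift)
```

For data of this kind the printed drift after one period should be at the level of the integration error; moving the particles closer together or increasing the velocities is a simple way to leave the isochronous region and observe different behavior.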


Next, let us restrict, for simplicity, our consideration
to models [12] in which the coupling constants
$a_{nm}$ are real and nonnegative,

$$a_{nm} \ge 0 \qquad [15]$$

Then the singularities of the generic solution $\zeta(\tau)$ of
[14], which occur at values $\tau_b$ of $\tau$ where two
coordinates $\zeta_n(\tau)$ coincide, say $\zeta_\mu(\tau_b) = \zeta_\nu(\tau_b) = b$
(see the right-hand side of [14]), are branch points
characterized by the exponent, say,

$$\gamma \equiv \gamma_{\mu\nu} = \frac{1}{1 + a_{\mu\nu}} \qquad [16]$$

so that in their neighborhood, namely for $\tau \approx \tau_b$,


G(T) =b+c(t — T) +T — Tp) 


~ k+ly+m(1- 
+ 2o imanna) MO 


k=1 l,m=0l+m>1 


S=p,V [17a] 
Cal T) =bn + Un(T — Tp) 
ae ` `^ ^ po r E yo 
k=1 L=6p1 m=0 
n+ U,V [17b] 


The sign in front of $c$ in the right-hand side of the
first, [17a], of these formulas indicates that one sign
must be chosen for $s = \mu$, the opposite for $s = \nu$.
Note that here the $4 + 2(N-2) = 2N$ constants
$\tau_b, b, c, v, b_n, v_n$ are a priori arbitrary, except for
the obvious restrictions $b_n \neq b$, $b_n \neq b_m$, while the
coefficients of the series in [17] can be computed from these
constants, recursively, by inserting this ansatz, [17],
in the equations of motion [14]. The fact that the
number, $2N$, of a priori undetermined
constants equals the number of arbitrary initial data
for this system of ODEs, [14], indicates that this
kind of branch point, characterized by the exponents
$\gamma_{nm}$, see [16], is the typical singularity featured
by the generic solution $\zeta(\tau)$ of [14]. (Branch points
with different exponents may appear, but only in
nongeneric solutions $\zeta(\tau)$ which, at some value $\tau_b$ of
$\tau$, feature the coincidence of more than two
components, say $\zeta_\mu(\tau_b) = \zeta_\nu(\tau_b) = \zeta_\lambda(\tau_b)$.)
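The value of the branch-point exponent can be made plausible by a leading-order balance in the two-body case. The sketch below assumes that [14] reads $\zeta_n'' = 2\sum_{m\neq n} a_{nm}\zeta_n'\zeta_m'/(\zeta_n - \zeta_m)$ and keeps only the two colliding coordinates; it is a heuristic check rather than the full expansion [17].

```latex
\begin{align*}
& s = \zeta_1+\zeta_2,\quad d = \zeta_1-\zeta_2,\quad a = a_{12}:\qquad
s'' = 0,\qquad d'' = \frac{4a\,\zeta_1'\zeta_2'}{d} = \frac{a\,(s'^2 - d'^2)}{d},\\
& d \simeq c\,(\tau-\tau_b)^{\gamma},\ 0<\gamma<1
\ \Longrightarrow\ d\,d'' \simeq -a\,d'^2
\ \Longrightarrow\ \gamma(\gamma-1) = -a\,\gamma^2
\ \Longrightarrow\ \gamma = \frac{1}{1+a}.
\end{align*}
```

Near the collision $d'$ diverges while $s'$ stays bounded, which is why the $s'^2$ term drops out of the balance. For $a = 1$ this gives the square-root branch points ($\gamma = 1/2$) of the solvable case.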

We conclude therefore that the generic solution $\zeta(\tau)$
of [14] features a, generally infinite, number of branch
points, that generally affect each of its components
$\zeta_n(\tau)$, and which are characterized, for the class of
models to which we are restricting attention here (see
[15]), by (real) exponents $\gamma_{nm}$, see [16], which are then
clearly characterized by the inequalities

$$0 < \gamma_{nm} < 1$$

What does this tell us about the generic solution $z(t)$
of the equations of motion of primary interest to
us, [12], in particular about its evolution as function
of the real “time” variable $t$?

To the solution $\zeta(\tau)$ is associated a Riemann surface, the structure of which is determined by the character and distribution of the branch points of $\zeta(\tau)$ in the complex $\tau$-plane (each of which is generally featured by each component $\zeta_n(\tau)$ of $\zeta(\tau)$, although generally not in the same way: see [17]). We know from [13a] that the values taken by $z(t)$ as $t$ evolves from $t = 0$ towards $t = \infty$ coincide with the values taken by $\zeta(\tau)$ as the independent variable $\tau$ travels, on the Riemann surface associated with $\zeta(\tau)$, counterclockwise round and round on the circle $C$ defined above (the diameter of which lies on the imaginary axis in the complex $\tau$-plane, with one end at $\tau = 0$ and the other at $\tau = 2i/\omega$), taking a period $T$, see [1], to make each full round. Hence the behavior of the solution $z(t)$ of [12] depends on the structure of the Riemann surface associated with the corresponding solution $\zeta(\tau)$ of [14], and specifically on the number of different sheets of that surface that are visited as one travels on it before returning, if ever, to the main sheet from which the travel started at $t = \tau = 0$.

If no other sheet is visited besides the main one, the corresponding solution $z(t)$ is of course periodic with period $T$, see [1] and [13a],

$$z(t + T) = z(t) \qquad [18]$$

This happens provided no branch point is featured by $\zeta(\tau)$ on its main sheet inside the circle $C$; and, as already indicated above, it has been proven (even in the more general case with arbitrary coupling constants $a_{nm}$) that there is an open region having full dimensionality in the phase space of initial data, see [13b], that yields such an outcome, implying the isochronicity of the model characterized by the Newtonian equations of motion [12]. This region $R$ of initial data has a boundary, a lower-dimensional domain in the phase space of initial data, out of which emerge motions leading, at a time $t_c$ smaller than $T$, to a “particle collision”, say $z_\nu(t_c) = z_\mu(t_c)$.

The character of the solution $z(t)$ yielded by initial data outside of the region $R$ depends on the structure of the Riemann surface associated with the corresponding solution $\zeta(\tau)$. This is mainly determined by the values of the branch point exponents $\gamma_{nm}$, which are themselves determined by the values of the coupling constants $a_{nm}$, see [17] and [16]. Let us focus on the (more interesting) case in which these constants $a_{nm}$ are rational numbers, entailing that the coefficients $\gamma_{nm}$ determining the character of the branch points are rational as well, see [16], so that each of the cuts associated with them opens the way, in the Riemann surface, to a finite number of sheets. There are then two possibilities, each generally characterized by open regions of initial data having full dimensionality in phase space, the boundaries of which are always (lower-dimensional) domains out of which emerge motions leading, in a time $t_c$ smaller than $T$, to a “particle collision”.

One possibility is that the number $B$ of sheets visited before returning to the main sheet is finite, $B < \infty$; the corresponding solutions $z(t)$ are then completely periodic with period $\tilde{T} = (B+1)T$, that is, $z(t + \tilde{T}) = z(t)$.
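
The counting of rounds can be illustrated with a toy computation. The sketch below is not the goldfish model itself: it assumes a single branch point, the model function $(\tau - \tau_b)^\gamma$, and the parametrization $\tau(t) = (e^{i\omega t} - 1)/(i\omega)$ of the circle $C$ (chosen here because it has its diameter on the imaginary axis between $0$ and $2i/\omega$; the function name `rounds_to_return` is hypothetical). It continues the function analytically along $C$ and reports after how many rounds the starting value recurs.

```python
import cmath
import math

def rounds_to_return(tau_b, gamma, omega=1.0, steps=2000, max_rounds=50):
    """Rounds on the circle C needed before (tau - tau_b)**gamma,
    continued analytically along C, returns to its starting value."""
    def tau(t):
        # Assumed parametrization of C: counterclockwise, period 2*pi/omega.
        return (cmath.exp(1j * omega * t) - 1) / (1j * omega)

    period = 2 * math.pi / omega
    prev = tau(0.0) - tau_b
    arg = cmath.phase(prev)                       # continuously tracked argument
    start = cmath.exp(gamma * (math.log(abs(prev)) + 1j * arg))
    for k in range(1, max_rounds + 1):
        for j in range(1, steps + 1):
            z = tau((k - 1) * period + j * period / steps) - tau_b
            arg += cmath.phase(z / prev)          # small increment, no jumps
            prev = z
        value = cmath.exp(gamma * (math.log(abs(prev)) + 1j * arg))
        if abs(value - start) < 1e-9:
            return k
    return None
```

With $\gamma = 1/3$ and the branch point inside $C$, the value recurs only after three rounds, which in the language of the text corresponds to $B = 2$ additional sheets and period $3T$; with the branch point outside $C$ it recurs after a single round.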

Another possibility is that the number of new sheets visited is unlimited, namely that the structure of the Riemann surface is such that, by traveling round and round on it along the circle $C$, one never returns to the main sheet. This can happen, even if the exponents $\gamma_{nm}$ are all rational (so that via the cuts associated with each of them access is gained to only a finite number of new sheets), because of the possibility that an infinity of branch points is located inside the circle $C$ on the infinitely many sheets associated with these branch points, via a never-ending mechanism of branch point nesting. Whenever this happens the corresponding solution $z(t)$ is aperiodic; and it is moreover likely that it is then chaotic, in the sense of displaying a sensitive dependence on its initial data. Indeed this will happen whenever some of this infinity of branch points fall arbitrarily close to the contour $C$, because then a minute change in the initial data, to which there will correspond a minute change in the pattern of these branch points of $\zeta(\tau)$ in the complex $\tau$-plane, will cause some relevant branch point to cross over from outside the circle $C$ to inside it, or vice versa, and this will eventually affect quite significantly the time evolution of $z(t)$, by causing a change in the sequence of sheets that get visited by traveling along the circle $C$ on the Riemann surface associated with the corresponding $\zeta(\tau)$.

This phenomenology has a clear “physical inter- 
pretation”, which can be qualitatively understood 
as follows. The N-body problem characterized by 
the Newtonian equations of motion [12] generally 
yields confined motions, the trajectory of each 
particle tending to wind round and round - it 
would indeed reduce to a circle were it not for the 
interaction with the other particles. A possibility, as 
we know, is that this N-body motion be completely 
periodic, with the same period T that characterizes 
the circular motion of each particle when the two-body interparticle interaction is altogether missing ($a_{nm} = 0$). Another possibility, in the case discussed
above with rational coupling constants, is that there 
exist other motions which are as well completely 
periodic, but with periods which are integer multi- 
ples of T. A third possibility, which cannot a priori 
be excluded, is that there also exist motions which 
are aperiodic but in some way overall ordered, 
perhaps featuring trajectories that eventually wind 
up around limit cycles. And still another possibility 
is that the motions described by the solution z(t) be 
aperiodic and disordered. In this case the physical 
mechanism causing a sensitive dependence on the 
initial data can be understood as follows. Such 
disordered motions necessarily feature near misses, 
in which, typically, two particles pass quite close to 
each other (while the probability that an actual 
collision occur among point particles moving in a 
plane is of course a priori nil). Such a near miss in 
the motion described by $z(t)$ corresponds (see the discussion above) to a branch point of the corresponding solution $\zeta(\tau)$ occurring quite close to the circle $C$ in the complex $\tau$-plane (which is the one-dimensional region of the two-dimensional complex $\tau$-plane in which the values of $\zeta(\tau)$ correspond to the values $z(t)$ describing the motion of physical particles moving as functions of the time $t$); and in the generic case of a two-body near
miss, there is a correspondence between the fact 
that such a branch point occur just inside, or just 
outside, the circle C, and the way the particles pass, 
on one or the other side, by each other. Likewise, 
the tiny change in the initial data that causes, in the context of the solutions $\zeta(\tau)$ (see the discussion above), a branch point of $\zeta(\tau)$ to pass from inside to outside the circle $C$, or vice versa, corresponds, in
the context of the “physical” solutions z(t), to a 
change occurring in the corresponding near miss, 
from the case in which the two particles involved in 
it slide by each other on one side to the case in 
which they instead slide by each other on the other 
side — entailing a significant change in the sub- 
sequent motion (indeed, the closer a near miss, the 
more it affects the motion, due to the singularity 
of the two-body interaction at zero separation, 
see [12]). 

The phenomenology outlined here does indeed 
occur in this goldfish model. It also occurs — rather 
similarly if more simply, since in this case only 
square-root branch points occur, irrespective of the 
values of the coupling constants — in the model [6] 
with K=1. Indeed, it is clear that this phenomen- 
ology provides a paradigm of rather general applic- 
ability for the transition from isochronicity to 
deterministic chaos, indeed perhaps for the generic 
onset of deterministic chaos. 


See also: Bifurcations of Periodic Orbits; 
Calogero—Moser—Sutherland Systems of Nonrelativistic 
and Relativistic Type; Integrable Systems: Overview; 
Quantum Calogero—Moser Systems; Synchronization of 
Chaos. 
Introduction 


In this article we consider families of linear differential equations whose monodromy data do not depend on the parameters. Such a family is called an isomonodromic deformation of any of its member equations (for the definitions of regular and Fuchsian linear systems and of their monodromy groups, see Riemann–Hilbert Problem).


Schlesinger’s Equation 


The best-studied example of an isomonodromic deformation is the Fuchsian system on the Riemann sphere $\mathbb{CP}^1 = \mathbb{C} \cup \{\infty\}$ considered by L Schlesinger:

$$\frac{dX}{dt} = \left( \sum_{j=1}^{p+1} \frac{A_j(a)}{t - a_j} \right) X \qquad [1]$$

Here the poles $a_j \in \mathbb{C}$ are free parameters and the matrices-residua $A_j$ depend analytically on $a := (a_1, \ldots, a_{p+1})$; therefore, system [1] is in fact a family of linear systems which is an analytic deformation of the system obtained for $a_j = a_j^0$.



One can think of system [1] as defined by the Pfaffian system

$$dX = \omega_s X, \qquad \omega_s = \sum_{j=1}^{p+1} \frac{A_j(a)}{t - a_j}\, d(t - a_j) \qquad [2]$$


Suppose first that the poles $a_j$ vary within small nonintersecting disks around the points $a_j^0$, so small that the standard system of generators of the monodromy group can be defined by one and the same set of contours for all values of the parameters $a_j$ (see Figure 1 of Riemann–Hilbert Problem). Suppose also that one chooses $\infty$ as base point and that one has

$$X|_{t=\infty} = I \qquad [3]$$

(where $I$ is the identity matrix) for all values of the parameters $a_j$. Finally, suppose that all matrices $A_j$ are nonresonant, that is, without two eigenvalues differing by a nonzero integer. Then the following conditions are necessary and sufficient for system [1] to be isomonodromic:


$$dA_i(a) = -\sum_{j=1,\, j \neq i}^{p+1} \frac{[A_i(a), A_j(a)]}{a_i - a_j}\, d(a_i - a_j), \qquad i = 1, \ldots, p+1 \qquad [4]$$

This system (called Schlesinger’s equations) results from the Frobenius integrability condition $d\omega_s = \omega_s \wedge \omega_s$ of system [2].
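
Two structural consequences of Schlesinger’s equations can be checked numerically. The sketch below assumes the equations in the standard form $\partial A_i/\partial a_k = [A_i, A_k]/(a_i - a_k)$ for $k \neq i$ and $\partial A_i/\partial a_i = -\sum_{j \neq i} [A_i, A_j]/(a_i - a_j)$; the helper names are hypothetical. It verifies that every partial derivative is traceless (each one is a sum of commutators) and that $\sum_i \partial A_i/\partial a_k = 0$, so that the sum of the residue matrices stays constant along the deformation.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    n = len(A)
    return [[AB[i][j] - BA[i][j] for j in range(n)] for i in range(n)]

def schlesinger_partial(A, a, i, k):
    """Partial derivative of A[i] with respect to the pole a[k]."""
    n = len(A[0])
    if i != k:
        C = commutator(A[i], A[k])
        return [[C[r][c] / (a[i] - a[k]) for c in range(n)] for r in range(n)]
    out = [[0j] * n for _ in range(n)]
    for j in range(len(A)):
        if j != i:
            C = commutator(A[i], A[j])
            for r in range(n):
                for c in range(n):
                    out[r][c] -= C[r][c] / (a[i] - a[j])
    return out

def check_schlesinger(A, a):
    """Return (largest |trace| of a partial, largest entry of sum_i dA_i/da_k)."""
    n, m = len(A[0]), len(A)
    worst_trace = worst_sum = 0.0
    for k in range(m):
        total = [[0j] * n for _ in range(n)]
        for i in range(m):
            P = schlesinger_partial(A, a, i, k)
            worst_trace = max(worst_trace, abs(sum(P[r][r] for r in range(n))))
            for r in range(n):
                for c in range(n):
                    total[r][c] += P[r][c]
        worst_sum = max(worst_sum, max(abs(x) for row in total for x in row))
    return worst_trace, worst_sum
```

Both returned numbers should vanish up to rounding error for any choice of square matrices `A[i]` and pairwise distinct poles `a[i]`.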
Remarks 1 


(i) To find the matrices-residua $A_j$ as functions of $a$, given their values $A_j|_{a=a^0}$, is a Cauchy problem. It is solvable for $a$ close to $a^0$, and the matrices $A_j$ are analytic in $a$.

(ii) The differential of $A_j$ being a commutator $[A_j, \cdot\,]$, the matrix $A_j$ remains within its conjugacy class throughout the deformation.

(iii) Schlesinger’s equations are the necessary and sufficient conditions for isomonodromy also in the case when system [1] has a logarithmic pole at $\infty$ whose matrix-residuum does not change throughout the deformation. In this case the solution to system [1] in its Levelt decomposition at $\infty$ (see Riemann–Hilbert Problem) equals $U_\infty(1/t)\, t^{-D_\infty} t^{-E_\infty} G$, where $D_\infty$ is a diagonal matrix with integer entries, $E_\infty$ is an upper-triangular constant matrix, and $U_\infty$ is holomorphic at $\infty$ and such that $U_\infty|_{t=\infty} = I$.


Definition 2 The deformation satisfying condition 
[4] with initial condition [3] for the solution to 
system [1] is called the normalized Schlesinger 
deformation. 


Remark 3 When the matrices-residua $A_j$ are nonresonant, every isomonodromic deformation of system [1] with $a_j = a_j^0$ is either the normalized Schlesinger deformation or a nonnormalized Schlesinger deformation, that is, one obtained from the normalized one by a change of variables $X \mapsto C(a)X$, $C(a) \in GL(n, \mathbb{C})$. In this way, one has $X|_{t=\infty} = C(a)$ instead of [3], and the deformation is described by a Pfaffian system with a form of the kind $\omega_n = \omega_s + \sum_{j=1}^{p+1} \gamma_j(a)\, da_j$.


Example 4 The following one-parameter Fuchsian family is an isomonodromic Schlesinger deformation:

$$\frac{dX}{dt} = \left( \sum_{j=1}^{p+1} \frac{A_j}{t - b a_j} \right) X$$

Here the matrices $A_j$ are constant and the parameter $b$ takes nonzero values. Indeed, one either checks directly that condition [4] holds or one makes the change of time (which does not change the monodromy) $t \mapsto bt$, after which the parameter $b$ disappears.


A A Bolibrukh has shown that in the resonant case every isomonodromic deformation of a Fuchsian system is described by an integrable Pfaffian system with 1-form $\omega = \omega_n + \omega_m$, where the meromorphic 1-form $\omega_m$ vanishes at $\infty$ and has poles of orders $\leq r_i$ along the hyperplanes $\{t - a_i = 0\}$; here $r_i$ is the largest nonzero integer difference between two eigenvalues of the matrix $A_i$.

Consider now Schlesinger’s equation in the global situation, that is, when the poles $a_j$ belong to the universal covering $Z$ of the space $\mathbb{C}^{p+1} \setminus \Delta$, where $\Delta$ is the “diagonal,” that is, the union of all sets $\{a_i = a_j\}$, $i \neq j$. Suppose that the matrices $A_j$ are nonresonant. There are values of $a$ (their set is denoted by $\Theta$) for which some entries of some of the matrices-residua $A_j$ tend to $\infty$. Typically, at such points the matrices $A_j$ have poles of second order; this is a result due to Bolibrukh. Indeed, set $A_j = Q_j^{-1} J_j Q_j$, where $J_j$ is the Jordan normal form of $A_j$ (hence a constant matrix); we assume that $Q_j \in SL(n, \mathbb{C})$. Typically, at points of $\Theta$ the matrices $Q_j$ and $Q_j^{-1}$ have simple poles, which makes a pole of second order for $A_j$.

B Malgrange and, independently, T Miwa have proved that system [4] is completely integrable and that it has the Painlevé property: “The only movable singularities of its solutions are poles.” (The fixed singularities of the solutions are, by definition, the points of $Z$ which lie over $\Delta$. The positions of the movable singularities depend on the initial condition, that is, on the values of the matrices $A_j$ for $a = a^0$.) In other words, the solutions to Schlesinger’s equation are matrices meromorphic on $Z$.


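Theorem 5 The set $\Theta$ of movable singular points of the Schlesinger equation is the set of zeros of a function $\tau$ (the Miwa $\tau$-function) holomorphic on $Z$ and such that

$$d \ln \tau = \frac{1}{2} \sum_{i=1}^{p+1} \sum_{j=1,\, j \neq i}^{p+1} \frac{\mathrm{tr}\,(A_i(a) A_j(a))}{a_i - a_j}\, d(a_i - a_j)$$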

Some improvements of this result are due to 
Malgrange and Bolibrukh. 


Isomonodromy and Confluence 


The idea to consider a linear system of ordinary differential equations with a pole of order higher than 1 as embedded into a family of Fuchsian systems with confluence of the poles was proposed by V I Arnol’d in 1984 and independently by J-P Ramis in 1988. The idea has been used by A Duval, B Khesin, A A Glutsyuk, and other authors. In particular, it is interesting to relate the Stokes multipliers (defined in the next section) of the system obtained as a result of a confluence to the monodromy groups of the Fuchsian systems obtained for values of the parameters before the confluence occurs.


Example 6 Consider the one-parameter family of linear systems:

$$(t^2 - \lambda)\, \frac{dX}{dt} = (A(\lambda) t + B(\lambda)) X \qquad [5]$$

Here the matrices $A$, $B$, and $X$ are $n \times n$. Suppose that $t \in \mathbb{C}$ (i.e., we do not consider the singularity at $\infty$) and $\lambda \in (\mathbb{C}, 0)$. Then for $\lambda \neq 0$ the system is Fuchsian: it has two logarithmic poles at $\pm\lambda^{1/2}$ whose confluence for $\lambda = 0$ gives as a result a pole at 0, which might be of order 2 if $B(0) \neq 0$ or of order 1 if $B(0) = 0$.


In this section we consider only the situation 
when the family producing the confluence is 
isomonodromic for values of the parameters before 
the confluence. 


Example 7 This is the case of family [5] with $B \equiv 0$ and $A$ being a constant nonresonant $n \times n$ matrix. Indeed, the change of time $t \mapsto \lambda^{1/2} t$ $(*)$ transforms the family into the family $(t^2 - 1)\, dX/dt = tAX$ (independent of $\lambda$), which is a Fuchsian system (at $\infty$ as well).
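
The change of time in Example 7 can be spelled out in one line. In the sketch below the new time variable is called $s$ (a name introduced only for this computation), with $t = \lambda^{1/2} s$ and $\lambda \neq 0$:

```latex
\begin{align*}
(t^2 - \lambda)\,\frac{dX}{dt}
  &= \lambda\,(s^2 - 1)\,\lambda^{-1/2}\,\frac{dX}{ds}
   = \lambda^{1/2}\,(s^2 - 1)\,\frac{dX}{ds}, \\
t A X &= \lambda^{1/2}\, s A X
\quad\Longrightarrow\quad
(s^2 - 1)\,\frac{dX}{ds} = s A X .
\end{align*}
```

The common factor $\lambda^{1/2}$ cancels, which is why the transformed family no longer depends on $\lambda$.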


Suppose now that $t \in \mathbb{CP}^1$ (i.e., we consider the singularity at $\infty$ as well). Hence, the monodromy operator $M_\infty$ around $\infty$ is independent of $\lambda$ up to conjugacy (it is conjugate to $\exp(-2\pi i A)$). On the other hand, consider the monodromy operator $M'$ defined by a contour going counterclockwise around both poles at $\pm\lambda^{1/2}$ (one can choose as such a contour a circle centered at the origin and of sufficiently large radius). It equals $M_\infty^{-1}$, and it is well defined for $\lambda = 0$ as well. (This is not the case for the monodromy operators defined by contours going around only one of the poles at $\pm\lambda^{1/2}$.) Hence, up to conjugacy $M'$ is independent of $\lambda$. As $M'$ is in a sense the only monodromy operator that can be defined by a contour depending continuously on $\lambda$ for all $\lambda \in (\mathbb{C}, 0)$ and not passing through a pole of the system, one can say that the family is strongly isomonodromic.


Example 8 Consider now family [5] with n= 2, 


where $d \in \mathbb{C}$. For $\lambda \neq 0$ the family is isomonodromic: the change of time $(*)$ followed by the change of variables


1/2 
Xt C5 1) XC) 
brings the family to the form 


eo ( )eo( 


which is independent of $\lambda$, hence, isomonodromic. However, the change of variables $(**)$ is not defined for $\lambda = 0$. The monodromy operator $M'$ (defined as above) is scalar for $\lambda = 0$ and conjugate to a Jordan block of size 2 for $\lambda \neq 0$. Hence, the family is not strongly isomonodromic.


The following example is closely connected 
to singularity theory. It has been suggested by 
F Pham. 


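Example 9 Consider the Abelian integrals

$$I_1 = \int_\gamma \frac{dx}{x^3 + sx + t} \quad \text{and} \quad I_2 = \int_\gamma \frac{x\, dx}{x^3 + sx + t}$$

taken over a closed contour $\gamma$ belonging to a nonsingular fiber of the function $f(x) = x^3 + sx + t$. Suppose that $x^3 + sx + t \neq 0$ on $\gamma$. Obviously, $I_1$ and $I_2$ depend only on $[\gamma]$, the homotopy class of $\gamma$. Set

$$x^3 + sx + t = (x - x_1)(x - x_2)(x - x_3), \qquad x_j = x_j(s, t)$$

Then one has

$$I_k = 2\pi i \sum_{j=1}^{3} \delta_j\, \frac{x_j^{k-1}}{3x_j^2 + s}, \qquad k = 1, 2$$

where the integers $\delta_j$ depend only on $[\gamma]$ (the contour $\gamma$ is homotopy equivalent to a linear combination with integer coefficients of small loops around the roots of $f$; the integral along such a loop is computed using residua). Note that

$$\frac{\partial x_j}{\partial t} = -\frac{1}{3x_j^2 + s}$$

An easy computation shows that the integrals $I_1$, $I_2$ satisfy the following Picard–Fuchs system of differential equations (the prime denotes $\partial/\partial t$):

$$-t I_1' - \frac{2s}{3} I_2' = \frac{2}{3} I_1$$
$$\frac{2s^2}{9} I_1' - t I_2' = \frac{1}{3} I_2$$

The system admits also a presentation of the form

$$\left( t^2 + \frac{4s^3}{27} \right) \begin{pmatrix} I_1' \\ I_2' \end{pmatrix} = \begin{pmatrix} -2t/3 & 2s/9 \\ -4s^2/27 & -t/3 \end{pmatrix} \begin{pmatrix} I_1 \\ I_2 \end{pmatrix}$$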


Here the unknown variables form a vector column of length 2; to obtain a $2 \times 2$ matrix, one has to choose another contour (linearly independent of $\gamma$ as a linear combination of the loops around the roots $x_j$), which gives the second column of the matrix. The system is strongly isomonodromic: its matrix-residuum at $\infty$ equals $\mathrm{diag}(2/3, 1/3)$; hence, the monodromy operator $M'$ up to conjugacy equals $\mathrm{diag}(\exp(-4\pi i/3), \exp(-2\pi i/3))$.
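
The Picard–Fuchs system of Example 9 can be sanity-checked numerically one residue term at a time. The sketch below assumes the residue formula with integrand $x^{k-1}/(x^3 + sx + t)$, drops the common factor $2\pi i\,\delta_j$, and picks a root $x$ and a value of $s$ freely, defining $t$ so that $x$ is a root; the function names are hypothetical.

```python
def residue_terms(x, s):
    """One root's contribution to I_1, I_2 and to their t-derivatives."""
    fp = 3 * x * x + s                 # f'(x) at the root
    x_t = -1 / fp                      # dx/dt, from f(x(t)) = 0
    I1 = 1 / fp
    I2 = x / fp
    dI1 = -6 * x * x_t / fp**2         # d/dt of 1 / f'(x)
    dI2 = x_t / fp - 6 * x * x * x_t / fp**2   # d/dt of x / f'(x)
    return I1, I2, dI1, dI2

def picard_fuchs_residuals(x, s):
    """Residuals of the two Picard-Fuchs equations and of the matrix form."""
    t = -(x**3 + s * x)                # makes x a root of x^3 + s x + t
    I1, I2, dI1, dI2 = residue_terms(x, s)
    r1 = -t * dI1 - (2 * s / 3) * dI2 - (2 / 3) * I1
    r2 = (2 * s * s / 9) * dI1 - t * dI2 - I2 / 3
    det = t * t + 4 * s**3 / 27
    r3 = det * dI1 - (-(2 * t / 3) * I1 + (2 * s / 9) * I2)
    r4 = det * dI2 - (-(4 * s * s / 27) * I1 - (t / 3) * I2)
    return r1, r2, r3, r4
```

All four residuals should vanish up to rounding error for any complex `x` and `s` with $3x^2 + s \neq 0$; since $I_1$ and $I_2$ are integer combinations of such terms, the same then holds for the integrals themselves.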


A A Bolibrukh has considered the possibility of 
confluence of poles in Schlesinger’s equation 
(i.e., the possibility to have equalities of the form 
$a_i = a_j$ in system [1]). He has considered the so-called
normalized isomonodromic  confluences, that 
is, isomonodromic confluences defined by Pfaffian 
systems with coefficient forms w=w,+w, alone 
(see the previous section). He has shown that 
a normalized isomonodromic confluence of singular 
points of Fuchsian systems of linear differential 
equations on Riemann’s sphere can only lead to 
a system with regular singular points. This is a 
partial answer to a problem stated by V I Arnol’d: 
how to express a system with regular singular 
points as a limit of Fuchsian systems? 


Other Results 


In the case of a linear system with irregular singular 
point, isomonodromy means that the formal mono- 
dromy and the Stokes multipliers do not change 
throughout the deformation. The formal mono- 
dromy can be computed from the formal normal 
form (the latter can be found algorithmically; this is 
due to H Turrittin). Consider, for simplicity, the 
nonresonant case, that is, the case when the leading 
matrix in the Laurent series of the system at the 
singular point has distinct eigenvalues (this defini- 
tion differs from the one in the case of a Fuchsian 
singular point). The Stokes multipliers are linear operators acting on the solution space. They are defined as follows: there exist sectors of maximal opening centered at the singular point on each of which the solution is uniquely defined by its asymptotic development. Two solutions $X_1$, $X_2$ having one and the same asymptotic development in two overlapping sectors are related by $X_1 = X_2 C$, where $C$ is a Stokes multiplier. The monodromy
operator is expressed as a product of the operator 
of formal monodromy and the Stokes multipliers. 
Isomonodromic deformations of systems with irregu- 
lar singular points have been constructed by B 
Malgrange. Isomonodromic deformations have been 
used by Y Sibuya and C H Lin and by Y Sibuya and 
T J Tabara to investigate Stokes multipliers. 

At the beginning of the twentieth century, P Painlevé and B Gambier classified the differential equations of second order,

$$u_{xx} = R(x, u, u_x) \qquad [6]$$

(where $R$ is analytic in $x$ and rational in $u$ and $u_x$) whose solutions do not have branch-type movable singularities. Of the 50 equations (up to local transformation) discovered by them, only six cannot be reduced to linear ones. These are the so-called Painlevé equations. They often appear as isomonodromy conditions for families of linear differential equations, and this has given rise to the isomonodromic deformation method. It consists in associating with eqn [6] a linear system

$$\frac{d\Psi}{d\lambda} = A(\lambda, x, u, u_x)\, \Psi \qquad [7]$$

with matrix-valued coefficients rational in $\lambda$. The deformation of the coefficients in $x$ is described by eqn [6] in such a way that the monodromy data of system [7] remain the same. Thus, the monodromy data of system [7] are first integrals of eqn [6].


Example 10 The Painlevé II equation

$$u_{xx} - x u - 2u^3 = \nu$$

is associated with the system
AD oo ge WV 
dw —41r° — ix — 2iu^ 4iAu — 2u, — Fi 
ee l Ww 
OX. \ Aii +Ý 4i? + ix 2in? 

The idea to present the Painlevé equations as isomonodromy conditions originates from the works of Fuchs (1907) and Garnier (1912). It has been used, for example, in the papers of Flaschka and Newell (1980), Jimbo and Miwa (1981), and Its and Novokshenov (1986).


See also: Holonomic Quantum Fields; Integrable 
Systems: Overview; Painlevé Equations; 
Riemann-Hilbert Problem; WDVV Equations and 
Frobenius Manifolds. 


Further Reading 


Arnol’d VI and Ilyashenko YuS (1988) Ordinary differential 
equations. In: Dynamical Systems I, Encyclopedia of Mathe- 
matical Sciences, t. 1. Berlin: Springer. 

Bolibrukh AA (1997) On isomonodromic deformations of 
Fuchsian systems. Journal of Dynamical and Control Systems 
3(4): 589-604. 

Bolibrukh AA (1998) On isomonodromic confluences of Fuchsian 
singularities. Proceedings of the Steklov Institute of Mathematics 
221: 117-132 (translation from Trudy Matematicheskogo 
Instituta Imeni Steklova 221: 127-142 (1998)). 

Bolibrukh AA (2000) On orders of movable poles in Schlesinger’s 
equation. Journal of Dynamical and Control Systems 6(1): 
57-73. 

Bolibrukh AA (2001) Regular singular points as isomonodromic confluences of Fuchsian singular points. Russian Mathematical Surveys 56(4): 745-746 (translation from Uspekhi Matematicheskikh Nauk 56(4): 135-136 (2001)).

Flaschka H and Newell AC (1980) Monodromy and spectrum 
preserving deformations I. Communications in Mathematical 
Physics 76: 67-116. 

Fokas AS and Ablowitz MJ (1982) On a unified approach to 
transformations and elementary solutions of Painlevé equa- 
tions. Journal of Mathematical Physics 23(11): 2033-2042. 

Fuchs R (1907) Mathematical Annals 63: 301-321. 

Garnier R (1912) Annales Scientifiques de l’Ecole Normale 
Supérieure 29: 1-126. 

Its AR and Novokshenov VYu (1986) The Isomonodromic Defor- 
mation Method in the Theory of Painlevé Equations, Lecture 
Notes in Mathematics, vol. 1191, p. 313. Berlin: Springer. 

Jimbo M and Miwa T (1981) Monodromy preserving deforma- 
tions of linear ordinary differential equations with rational 
coefficients, II. Physica D 2: 407-448. 




Lin C-H and Sibuya Y (1990) Some applications of isomono- 
dromic deformations to the study of Stokes multipliers. 
Journal of the Faculty of Sciences, University of Tokyo, 
Section IA 36(3): 649-663. 

Malgrange B (1983) Sur les déformations isomonodromiques. I. Singularités régulières, pp. 401-426. II. Singularités irrégulières. Mathematics and Physics (Paris, 1979/1982), Progress in Mathematics, vol. 37. Boston, MA: Birkhäuser.

Schlesinger L (1912) Über eine Klasse von Differentialsystemen beliebiger Ordnung mit festen kritischen Punkten. J. Reine Angew. Math 141: 96-145.

Sibuya Y (1990) Linear Differential Equations in the Complex 
Domain: Problems of Analytic Continuation. Providence, RI: 
American Mathematical Society. 

Ueno K (1980) Monodromy preserving deformations of linear 
differential equations with irregular singular points. Proceed- 
ings of the Japanese Academy Series A Mathematical Sciences 
56(3): 97-102. 





The Jones Polynomial 


V F R Jones, University of California at Berkeley, 
Berkeley, CA, USA 


© 2006 Published by Elsevier Ltd. 


Introduction 


A “link” is a finite family of disjoint, smooth, oriented or unoriented, closed curves in $\mathbb{R}^3$ or equivalently $S^3$. A “knot” is a link with one component. The “Jones polynomial” $V_L(t)$ is a Laurent polynomial in the variable $\sqrt{t}$ which is defined for every oriented link $L$ but depends on that link only up to orientation-preserving diffeomorphism, or equivalently isotopy, of $\mathbb{R}^3$. Links can be represented by diagrams in the plane and the Jones polynomials of the simplest links are given below.


[Figure: diagrams of the simplest links with their Jones polynomials, among them $V = t + t^3 - t^4$ and $V = -\sqrt{t}\,(1 + t^2)$.]


The Jones polynomial of a knot (and generally a link with an odd number of components) is a Laurent polynomial in $t$.

The most elementary ways to calculate $V_L(t)$ use the “linear skein theory” ideas of Conway (1970). Indeed, it is not hard to see by induction that $V_L(t)$ is defined by its invariance under isotopy, the normalization $V_{\bigcirc}(t) = 1$ and the skein formula

$$\frac{1}{t}\, V_{L_+} - t\, V_{L_-} = \left( \sqrt{t} - \frac{1}{\sqrt{t}} \right) V_{L_0}$$


which holds for any three oriented links having diagrams which are identical except near one crossing, where they differ as below.

[Figure: the three local pictures $L_+$, $L_-$, and $L_0$ near the crossing.]

As such the Jones polynomial resembles the Alexander (1928) polynomial $\Delta_L(t)$, which can be calculated in exactly the same manner as $V_L(t)$ except that the skein relation becomes

$$\Delta_{L_+} - \Delta_{L_-} = \left( \sqrt{t} - \frac{1}{\sqrt{t}} \right) \Delta_{L_0}$$


A two-variable generalization $P_L$ of both $\Delta_L$ and $V_L$, sometimes called the HOMFLYPT polynomial, was found in Freyd et al. (1985) and Przytycki and Traczyk (1988). It satisfies the most general skein relation

$$x P_{L_+} + y P_{L_-} + z P_{L_0} = 0$$

for homogeneous variables $x$, $y$, and $z$.
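
As a small worked illustration of the skein formula for $V_L$ (a sketch assuming the relation in its standard form $t^{-1} V_{L_+} - t V_{L_-} = (\sqrt{t} - 1/\sqrt{t})\, V_{L_0}$), apply it at the single crossing of an unknot diagram with one kink. Changing that crossing again gives an unknot, while smoothing it gives the unlink of two components, written $\bigcirc\bigcirc$ here (a notation chosen only for this example):

```latex
\begin{align*}
t^{-1} \cdot 1 - t \cdot 1
  &= \left( \sqrt{t} - \frac{1}{\sqrt{t}} \right) V_{\bigcirc\bigcirc}(t), \\
V_{\bigcirc\bigcirc}(t)
  &= \frac{t^{-1} - t}{t^{1/2} - t^{-1/2}}
   = -\left( \sqrt{t} + \frac{1}{\sqrt{t}} \right).
\end{align*}
```

The result involves odd powers of $\sqrt{t}$, as expected for a link with an even number of components.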

The other skein-like definition of $V_L$ was found in Kauffman (1987). Begin with unoriented link diagrams up to planar isotopy. The Kauffman bracket $\langle L \rangle$ of such a diagram is calculated using


$$\langle \times \rangle = A\, \langle \asymp \rangle + A^{-1} \langle\, )( \,\rangle$$


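where the $\langle \cdot \rangle$ notation means that the relation may be applied to that part of the link diagrams inside the bracket, the rest of the diagrams being identical. If $\langle L \rangle$ were to be an invariant of three-dimensional isotopy it is easy to see that

$$\langle \bigcirc \cup L \rangle = (-A^2 - A^{-2})\, \langle L \rangle$$

which further implies

$$\langle \text{kink} \rangle = -A^{3}\, \langle \text{arc} \rangle$$

(here “kink” stands for a strand with a single curl and “arc” for the same strand with the curl removed).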

Thus, $\langle L \rangle$ cannot be a three-dimensional isotopy invariant as such. However, if $L$ is given an orientation (then called $\vec{L}$), a simple renormalization solves the problem and it is true that


(x) VL(Aî) = A~? writhe HNL 


where writhe($\vec{L}$) is the sum over the crossings of $\vec{L}$ of $+1$ for a positive crossing and $-1$ for a negative crossing.

The formula $(*)$ is readily proved by induction but a more structural proof will be discussed later on, connected with physics. If the crossings in a link alternate between over and under as one follows the string around, the highest and lowest degree terms in the Kauffman bracket can readily be located. This led to the proof of some old conjectures about alternating knots in Murasugi (1987), Kauffman (1987), and Thistlethwaite (1987).

The Kauffman two-variable polynomial $F_L(a, x)$ is defined in Kauffman (1990) by considering the linear skein relation involving all four possibilities at a crossing:

[Figure: the four local pictures at a crossing.]

This polynomial contains $V_L(t)$ as a specialization but not the Alexander polynomial.

The above polynomials are quite powerful at distinguishing links one from another, including links from their mirror images, which corresponds for the Jones polynomial to replacing $t$ by $t^{-1}$. More power can be added to the polynomials if simple geometric operations are allowed. “Cabling” entails replacing a single strand with several parallel copies, and the polynomials of cables of a link are also isotopy invariants if attention is paid to the writhe of a diagram.

The following problem, however, is open at the time of writing this article: “Does there exist a knot in $\mathbb{R}^3$, different from the unknot, whose Jones polynomial is equal to 1?”

For links with more than one component, it is 
known (Thistlethwaite 2001, Eliahou et al. 2003) 
that the answer to the corresponding question is yes, 
the simplest example being: 


One of the reasons that the question above has 
not been answered is presumably that, unlike with 
the Alexander polynomial, we have little intuitive 
understanding of the meaning of the “t” in V(t). 
Perhaps, the most promising theory in this context is 
in Khovanov (2000) where a complex is constructed 
whose Euler characteristic, in an appropriately 
graded sense, is the Jones polynomial. The homol- 
ogy of the complex is a finer invariant of links 
known as “Khovanov homology.” 


Braids 


A braid (see Birman (1974)) on $n$ strings is a collection of curves in $\mathbb{R}^3$ joining $n$ points in a horizontal plane to the $n$ points directly below them on another horizontal plane. If the endpoints of the braid are on a straight line, the braid can be drawn as in the example below (where $n = 4$).

[Figure: a braid on four strings.]


The crucial property of a braid is that the tangent 
vector to the curves can never be horizontal. Braids 
are considered up to isotopies which are supported 
between the top and bottom planes. 

Braids on $n$ strings form a group, called $B_n$, under concatenation (plus some isotopy) as below:

[Figure: concatenation of two braids.]


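Artin’s presentation (Birman 1974) of the braid group is on the generators $\sigma_1, \sigma_2, \ldots, \sigma_{n-1}$ with the relations

$$\sigma_i \sigma_{i+1} \sigma_i = \sigma_{i+1} \sigma_i \sigma_{i+1} \quad \text{for } 1 \leq i \leq n-2$$

$$\sigma_i \sigma_j = \sigma_j \sigma_i \quad \text{if } |i - j| \geq 2$$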


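Thus, to find linear representations of $B_n$, it suffices
to find matrices $\rho_1, \rho_2, \ldots, \rho_{n-1}$ satisfying the above
relations (with $\sigma$ replaced by $\rho$). One such representation
(of dimension n) called the (nonreduced) Burau
representation is given by the row-stochastic matrices

\[
\rho_1 = \begin{pmatrix}
1-t & t & 0 & \cdots & 0 \\
1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}, \quad
\rho_2 = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1-t & t & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}, \quad \ldots, \quad
\rho_{n-1} = \begin{pmatrix}
1 & \cdots & 0 & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots \\
0 & \cdots & 1 & 0 & 0 \\
0 & \cdots & 0 & 1-t & t \\
0 & \cdots & 0 & 1 & 0
\end{pmatrix}
\]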


This representation is known not to be faithful for
$n \ge 5$ but faithful for $n \le 3$. The case n = 4 remains
open. (See Moody (1991), Long and Paton (1993),
and Bigelow (1999).)
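
As a quick sanity check of the matrices above, the following sketch builds the nonreduced Burau matrices at a numerical value of t and tests Artin's relations with exact rational arithmetic. The function names (`burau`, `satisfies_braid_relations`) are illustrative choices, not notation from the article, and a check at finitely many values of t is evidence rather than a proof.

```python
from fractions import Fraction


def burau(i, n, t):
    """Nonreduced Burau matrix rho_i (1-based i) for B_n at a numeric t."""
    m = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
    k = i - 1
    # Replace the 2x2 diagonal block at rows/columns k, k+1 by [[1-t, t], [1, 0]].
    m[k][k], m[k][k + 1] = 1 - t, t
    m[k + 1][k], m[k + 1][k + 1] = Fraction(1), Fraction(0)
    return m


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def satisfies_braid_relations(n, t):
    """Check both families of Artin relations for rho_1, ..., rho_{n-1}."""
    rho = {i: burau(i, n, t) for i in range(1, n)}
    for i in range(1, n - 1):
        lhs = matmul(matmul(rho[i], rho[i + 1]), rho[i])
        rhs = matmul(matmul(rho[i + 1], rho[i]), rho[i + 1])
        if lhs != rhs:
            return False
    for i in range(1, n):
        for j in range(i + 2, n):
            if matmul(rho[i], rho[j]) != matmul(rho[j], rho[i]):
                return False
    return True


if __name__ == "__main__":
    print(satisfies_braid_relations(4, Fraction(3, 7)))
```

Each row of every matrix sums to 1, which is the row-stochastic property mentioned in the text.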

Braids can be viewed in several ways, which lead to 
several generalizations. For instance, identifying the 
vertical axis for a braid with time and taking the 
intersection of horizontal planes with the braids shows 
that elements of $B_n$ can be thought of as motions of n
distinct points in the plane. Thus, it is natural that

\[ B_n = \pi_1\big( (\mathbb{C}^n \setminus \Delta)/S_n \big) \]

where $\Delta$ is the set $\{(z_1, \ldots, z_n) \mid z_i = z_j \text{ for some } i \ne j\}$
and the symmetric group $S_n$ acts freely on $\mathbb{C}^n \setminus \Delta$ by
permuting coordinates. But $\Delta$ is the zero-set of the
frequently encountered function

\[ \prod_{i<j} (z_i - z_j) \]

so the braid group may naturally be generalized as
the fundamental group of $\mathbb{C}^n$ minus the singular
set of some algebraic function (Birman 1974). Or,
motions of points can be extended to motions of the
whole plane and a braid defines a diffeomorphism of
the plane minus n points. Thus, the braid group may
be generalized as the “mapping class group” of a
surface with marked points (Birman 1974).


The Temperley-Lieb Algebra 


If $\tau \in \mathbb{C}$ one may define the algebra $TL(n, \tau)$ with
identity 1 and generators $e_1, e_2, \ldots, e_{n-1}$ subject to
the following relations:

\[ e_i^2 = e_i \]

\[ e_i e_{i \pm 1} e_i = \tau e_i \]

\[ e_i e_j = e_j e_i \quad \text{if } |i-j| \ge 2 \]


Counting reduced words on the $e_i$’s shows that

\[ \dim TL(n, \tau) \le \frac{1}{n+1} \binom{2n}{n} \]

and in Jones (1983) it is shown that these numbers,
the Catalan numbers, are indeed the dimensions of
the Temperley-Lieb algebras. In the obvious way,
$TL(n, \tau) \subset TL(n+1, \tau)$. If $\tau^{-1}$ is not in the set
$\{4 \cos^2 q\pi ; q \in \mathbb{Q}\}$, $TL(n, \tau)$ is semisimple and its
structure is given by the following Bratteli diagram:


[Figure: Bratteli diagram of the algebras $TL(n, \tau)$.]


where the integers on each row are the dimensions
of the irreducible representations of $TL(n, \tau)$ and the
diagonal lines give the restriction of representations
of $TL(n, \tau)$ to $TL(n-1, \tau)$. These representations
are naturally indexed by Young diagrams with n
boxes and at most two rows, with the
diagonal lines in the Bratteli diagram corresponding
to removal/addition of a box. The dimension of the
representation corresponding to the diagram whose
second row has r boxes ($r \le n/2$) is

\[ \binom{n}{r} - \binom{n}{r-1} \]

One may attempt to make $TL(n, \tau)$ into a
C*-algebra and look for Hilbert space representations
(with $e_i \ne 0$), by imposing $e_i^* = e_i$. From
(Wenzl 1987), this is only possible (for all n) when


1. $\tau \in \mathbb{R}$, $0 < \tau \le 1/4$, or
2. $\tau^{-1} \in \{4 \cos^2 \pi/m, \; m = 3, 4, 5, \ldots\}$.


The proof uses the fact that $f_n$, inductively defined by

\[ f_{n+1} = f_n - \frac{[2]_q [n+1]_q}{[n+2]_q} \, f_n e_{n+1} f_n \]

must be an orthogonal projection with $e_i f_n = f_n e_i = 0$
for $i \le n$. These $f_n$ are sometimes called Jones-Wenzl
idempotents. (Here $\tau^{-1} = 2 + q^2 + q^{-2}$ and for this
and later formulas we define the quantum integer
$[r]_q = (q^r - q^{-r})/(q - q^{-1})$.)

When $\tau^{-1} = 4 \cos^2 (\pi/m)$, the Hilbert space representations
decompose according to Bratteli diagrams
obtained by truncating, that is, eliminating the 1 on the
mth row, and all representations below and to the
right of it, so that for m = 7 we would obtain


[Figure: truncated Bratteli diagram for m = 7.]


In terms of Young diagrams, this corresponds to 
only taking those diagrams whose row lengths differ 
by at most m — 2. The existence of these Hilbert 
space representations is from Jones (1983). 

The Temperley-Lieb algebras arose in Jones (1983)
as orthogonal projections onto subfactors of II$_1$ factors.
As such the Hilbert space structure was manifest. The
trace on a II$_1$ factor also yielded a trace on the $TL(n, \tau)$.

To be precise, there is for each n a unique linear
map tr: $TL(n, \tau) \to \mathbb{C}$ with:


1. tr(1) = 1
2. tr(ab) = tr(ba)
3. tr$(x e_{n+1}) = \tau$ tr(x) for $x \in TL(n+1, \tau)$.


This trace may be calculated either from (1), (2),
and (3), or using the representations, as a weighted
sum of ordinary matrix traces. The weight for the
representation of $TL(n, \tau)$, the second row of whose
Young diagram has r boxes, is

\[ \frac{[n - 2r + 1]_q}{([2]_q)^n} \]

Thus, if $x \in TL(n, \tau)$ and $\pi_r$ is the $\binom{n}{r} - \binom{n}{r-1}$
dimensional irreducible representation, then

\[ \mathrm{tr}(x) = \frac{1}{([2]_q)^n} \sum_{r=0}^{[n/2]} [n - 2r + 1]_q \, \mathrm{trace}(\pi_r(x)) \]

One also has

\[ \mathrm{tr}(f_n) = \frac{[n+2]_q}{([2]_q)^{n+1}} \]

so that the disappearance of the “1” from the
Bratteli diagram is mirrored by the vanishing of the
trace of the corresponding projection.
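
Two consequences of the formulas in this section can be checked numerically: the squares of the dimensions $\binom{n}{r} - \binom{n}{r-1}$ add up to the Catalan number, and the weighted sum of traces gives tr(1) = 1. The sketch below does this with exact rationals; the helper names are illustrative, and the value of q is an arbitrary generic rational chosen for the test.

```python
from fractions import Fraction
from math import comb


def irrep_dim(n, r):
    """Dimension binom(n, r) - binom(n, r-1) of the irreducible
    representation whose Young diagram has r boxes in its second row."""
    return comb(n, r) - (comb(n, r - 1) if r >= 1 else 0)


def catalan(n):
    """n-th Catalan number (1/(n+1)) * binom(2n, n)."""
    return comb(2 * n, n) // (n + 1)


def qint(a, q):
    """Quantum integer [a]_q = (q^a - q^-a) / (q - q^-1), for q != +-1."""
    return (q**a - q**(-a)) / (q - q**(-1))


def trace_of_identity(n, q):
    """Weighted sum of ordinary traces of the identity of TL(n, tau)."""
    weights = (qint(n - 2 * r + 1, q) * irrep_dim(n, r)
               for r in range(n // 2 + 1))
    return sum(weights) / qint(2, q)**n


if __name__ == "__main__":
    for n in range(1, 7):
        total = sum(irrep_dim(n, r)**2 for r in range(n // 2 + 1))
        print(n, total, catalan(n), trace_of_identity(n, Fraction(3, 2)))
```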

Positivity of tr, tr(a*a) > 0, is responsible for all the 
Hilbert space structures. To explicitly construct the 
Hilbert space representations, one may use the GNS 
construction: take the quotient of the *-algebra by the
kernel of the form $\langle a, b \rangle = \mathrm{tr}(b^* a)$ which makes this
quotient a Hilbert space on which $TL(n, \tau)$ will act
with the $e_i$’s as orthogonal projections. Explicit bases
can be obtained easily if desired, using paths on the 
Bratteli diagram, or Young tableaux. 

A useful diagrammatic presentation of $TL(n, \tau)$
was discovered in Kauffman (1987). A (Kauffman)
TL diagram (for non-negative integers m and n) is a
rectangle with n marked points on the top and m on
the bottom with nonintersecting smooth curves
inside the rectangle connecting the boundary points
as illustrated below.


[Figure: a (5, 7)-diagram.]


Two Kauffman TL diagrams are considered the same 
if they connect the same pairs of boundary points. 

The vector space $TL(m, n, \delta)$ with basis the set of
(m, n) diagrams, and $\delta \in \mathbb{C}$, becomes a category with
this concatenation together with the rule that closed
curves may be removed, each one counting a
(multiplicative) factor of $\delta$. We illustrate their
product in $TL(m, n, \delta)$ below:


[Figure: product of two diagrams, with each closed curve contributing a factor of $\delta$.]


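Of special interest is the algebra $TL(n, n, \delta)$. If we
define $E_i$ to be the diagram below:


[Figure: the diagram $E_i$, joining the adjacent boundary points i and i + 1 on the top and on the bottom.]


then $E_i^2 = \delta E_i$, $E_i E_{i \pm 1} E_i = E_i$, and $E_i E_j = E_j E_i$ for
$|i - j| \ge 2$. Thus, provided $\delta \ne 0$, we have an
isomorphism between $TL(n, \delta^{-2})$ and $TL(n, n, \delta)$ by
mapping $e_i$ to $(1/\delta) E_i$.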

One of the nicest features of the Kauffman 
diagrams is that they yield simple explicit bases for 
the irreducible representations. To see this, call a 
curve in a diagram a “through-string” if it connects 
the top of the rectangle to the bottom. Then all 
(m,n) diagrams are filtered by the number of 
through-strings and if we let $TL(m, n, k, \delta)$ be
the span of (m, n) diagrams with at most k
through-strings, we have $TL(l, n, \delta) \, TL(n, m, k, \delta) \subseteq
TL(l, m, k, \delta)$. Thus, $V_{n,m} = TL(n, m, m, \delta) / TL(n, m,
m - 1, \delta)$ is a $TL(n, \delta^{-2})$-module, a basis of which is
given by (m, n)-diagrams with m through-strings
($m \le n$). The number of such diagrams is
$\binom{n}{(n-m)/2} - \binom{n}{(n-m)/2 - 1}$ and it follows from Jones (1983) that all these
representations are irreducible for “generic” $\delta$ (i.e.,
$\delta \notin \{2 \cos \mathbb{Q}\pi\}$) and that they may be identified with
those indexed by Young diagrams as below:


[Figure: the two-row Young diagram corresponding to $V_{n,m}$.]


The invariant inner product on $V_{n,m}$ is defined by
$\langle v, w \rangle = w^* v$ for the natural identification of $V_{m,m}$
with $\mathbb{C}$ (* is the obvious involution from (m, n)
diagrams to (n, m) diagrams).


The Original Definition of $V_L(t)$


Given a braid $\beta \in B_n$ one may form an oriented link
$\hat{\beta}$ called the closure of $\beta$ by tying the top of the braid
to the bottom as illustrated below:


[Figure: the closure of a braid.]


All oriented links occur in this way (Birman 1974)
but if $\alpha \in B_n$, $\alpha \beta \alpha^{-1}$ and $\beta \sigma_n^{\pm 1}$ (in $B_{n+1}$) have the
same closure.


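Theorem 1 (Markov) (Birman 1974). Let ~ be the
equivalence relation on $\coprod_{n=1}^{\infty} B_n$ (all braids on any
number of strings) generated by the two “moves”
$\beta \sim \beta \sigma_n^{\pm 1}$ and $\beta \sim \alpha \beta \alpha^{-1}$. Then $\beta_1 \sim \beta_2$ if and only
if the links $\hat{\beta}_1$ and $\hat{\beta}_2$ are the same.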


It is easily checked that, if $1, e_1, e_2, e_3, \ldots$ satisfy
the TL relations of the section “The Temperley-Lieb
algebra,” then sending $\sigma_i$ to $(t + 1) e_i - 1$ (with $\tau^{-1} =
2 + t + t^{-1}$) defines a representation $\rho_n$ of $B_n$ inside
$TL(n, \tau)$ for each n. The representation is unitary for
the C*-algebra structure when $\tau^{-1} = 4 \cos^2 \pi/n$,
n = 3, 4, 5, ... (and $t = e^{2\pi i/n}$). It is an open question
whether $\rho_n$ is faithful for all n. It contains the Burau
representation as a direct summand.
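
The "easily checked" claim can be tested on a small example. The sketch below uses one concrete pair of 2 x 2 matrices satisfying the TL relations for n = 3 (this particular pair is an illustrative choice, not taken from the article), forms $g_i = (t+1) e_i - 1$ with $\tau = t/(1+t)^2$, and verifies the braid relation together with the quadratic relation $g_i^2 = (t-1) g_i + t$ that follows from $e_i^2 = e_i$.

```python
from fractions import Fraction


def mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def scale_shift(e, a, b):
    """Return a*e + b*identity for a 2x2 matrix e."""
    return [[a * e[r][c] + (b if r == c else 0) for c in range(2)]
            for r in range(2)]


def tl_pair(tau):
    """Two idempotents with e1 e2 e1 = tau e1 and e2 e1 e2 = tau e2."""
    e1 = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(0)]]
    e2 = [[tau, tau], [1 - tau, 1 - tau]]
    return e1, e2


def braid_images(t):
    """Images g1, g2 of sigma_1, sigma_2 under sigma_i -> (t+1) e_i - 1."""
    tau = t / (1 + t)**2          # equivalent to 1/tau = 2 + t + 1/t
    e1, e2 = tl_pair(tau)
    return scale_shift(e1, t + 1, -1), scale_shift(e2, t + 1, -1)


if __name__ == "__main__":
    g1, g2 = braid_images(Fraction(2, 5))
    print(mul(mul(g1, g2), g1) == mul(mul(g2, g1), g2))
```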

Combining the properties of the trace tr defined
on TL with Markov’s theorem, one obtains immediately
that, for $\alpha \in B_n$, the following function of t
depends only on $\hat{\alpha}$:

\[ \left( -\frac{t+1}{\sqrt{t}} \right)^{n-1} (\sqrt{t})^{e} \, \mathrm{tr}(\rho_n(\alpha)) \]

(here $e \in \mathbb{Z}$ is the “exponent sum” of $\alpha$ as a word on
$\sigma_1, \sigma_2, \ldots, \sigma_{n-1}$).

A simple check using the (oriented) skein-theoretic
definition of the Jones polynomial shows that this
function of t is precisely $V_{\hat{\alpha}}(t)$. This is how $V_L(t)$
was first discovered in Jones (1985).

Although less elementary, this approach to $V_L(t)$
does have some advantages. Let us mention a few.


1. One may use representation theory to do calculations.
For instance, using the weighted sum of
ordinary traces to calculate tr as in the section
“The Temperley-Lieb algebra,” one obtains readily
the Jones polynomial of a torus knot (i.e., $\hat{\alpha}$
where $\alpha = (\sigma_1 \sigma_2 \cdots \sigma_{p-1})^q \in B_p$ if p and q are
relatively prime). It is

\[ \frac{t^{(p-1)(q-1)/2}}{1 - t^2} \left( 1 - t^{p+1} - t^{q+1} + t^{p+q} \right) \]

2. If one restricts attention to links realizable as $\hat{\alpha}$ for
$\alpha \in B_n$ for fixed n, the computation of $V_{\hat{\alpha}}(t)$ can be
performed in polynomial time as a function of the
number of crossings in $\alpha$. Thus, one has computational
access to rather complicated families of links.

3. Unitarity of the representation when $t = e^{2\pi i/n}$ can
be used to bound the size of $|V_{\hat{\alpha}}(t)|$. For instance,
if $\alpha \in B_k$ and $V_{\hat{\alpha}}(t) = (-\sqrt{t} - (1/\sqrt{t}))^{k-1}$, then $\alpha$
is in the kernel of $\rho_k$, and $|V_{\hat{\beta}}(e^{2\pi i/n})| <
(2 \cos \pi/n)^{k-1}$ for any other $\beta \in B_k$.
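
The torus knot formula in item 1 can be turned into explicit coefficients. The sketch below expands $t^{(p-1)(q-1)/2} (1 - t^{p+1} - t^{q+1} + t^{p+q})/(1 - t^2)$ as a polynomial, assuming p and q are coprime positive integers; the function name and the dictionary output format (exponent to coefficient) are illustrative choices.

```python
def torus_knot_jones(p, q):
    """Coefficients {exponent: coefficient} of the Jones polynomial of the
    (p, q) torus knot, from the closed formula (p, q coprime, positive)."""
    numerator = {0: 1}
    for exponent, coeff in ((p + 1, -1), (q + 1, -1), (p + q, 1)):
        numerator[exponent] = numerator.get(exponent, 0) + coeff
    degree = p + q
    # Divide by (1 - t^2): the quotient satisfies c[k] = numerator[k] + c[k-2].
    quotient = [0] * (degree + 1)
    for k in range(degree + 1):
        quotient[k] = numerator.get(k, 0) + (quotient[k - 2] if k >= 2 else 0)
    # The division is exact, so the two top coefficients must vanish.
    if quotient[degree] != 0 or quotient[degree - 1] != 0:
        raise ValueError("division by 1 - t^2 is not exact")
    shift = (p - 1) * (q - 1) // 2
    return {k + shift: c for k, c in enumerate(quotient) if c != 0}


if __name__ == "__main__":
    # (2, 3) torus knot (a trefoil): t + t^3 - t^4
    print(torus_knot_jones(2, 3))
```

The sum of the coefficients is the value at t = 1, which equals 1 for every knot.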

The representation of the braid group inside the TL
algebra should be thought of as an extension of the 
Jones polynomial to “special knots with boundary.” 
The coefficients of the words in the e;’s (or equivalently 
the Kauffman TL diagrams) are all invariants of the 
braid. We can further remove the braid restriction and 
consider arbitrary knots and links with boundary, 
known as “tangles” (Conway 1970). 


A 3-tangle 


Tangles may be oriented or not and their 
invariants may be evaluated either by reduction to 
a system of elementary tangles using skein relations 
or by organizing the tangle and representing it in an 
algebra. See Turaev (1994). 

A similar algebraic approach is available for the
HOMFLYPT and Kauffman two-variable polynomials.
The algebra playing the role of the TL algebra
is the Hecke algebra for HOMFLYPT (Freyd et al. 
1985, Jones 1987) and the BMW algebra (Birman and 
Wenzl 1989, Murakami 1990) for the Kauffman 
polynomial. The BMW algebra was discovered after 
the Kauffman polynomial in order to provide an 
analog of the TL and Hecke algebras. For detailed 
analysis of the Hilbert space and other structures for 
both Hecke and BMW algebras, see Wenzl (1988) and 
Wenzl (1990). 


Connections with Statistical Mechanics 


One might say that turning a knot into a braid 
organizes the knot by “putting it on a lattice,” 
thereby creating a physical model with the crossings 
of the knot as interactions. Taking the trace of the 
braid is evaluating the partition function with 
periodic (vertical) boundary conditions. 

This is more than wishful thinking. The Temperley- 
Lieb algebra arose from transfer matrices in both 
the Potts and ice-type models in two dimensions 
(Temperley and Lieb 1971) and each “e;” implements 
the addition of one more interaction to the system. 
(The same e;’s as in the ice-type models were 
rediscovered in the subfactor context in Pimsner and 
Popa (1986)). Thus, the Jones polynomial of a closed 
braid is the partition function for a statistical mechanical
model on the braid. In Jones (1983), it is observed
that knowledge of the Jones polynomial for a family of
links called French sinnets would constitute a solution 
of the Potts model in two dimensions. 

In Temperley and Lieb (1971), the TL relations 
are used to establish the mathematical equivalence 
of the Potts and ice-type (six-vertex) models. In 
Baxter (1982, chapter 12), this equivalence is shown 
for Potts models on an arbitrary planar graph. In 
view of this, it is not surprising that statistical 
mechanical models can be defined directly on link 
diagrams to give explicit formulas for Vz(t) (and 
other invariants) as partition functions. This works 
most easily for the Q-state Potts model.

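Given an unoriented link diagram D, shade the
regions of the plane black and white and form the
planar graph $\Gamma$ whose vertices are the black regions
and whose edges are the crossings as below:


[Figure: a shaded link diagram D and its planar graph $\Gamma$.]


Assign + and - to each edge according to the
following scheme:


[Figure: sign convention for the edge at a crossing.]


Fix $Q \in \mathbb{N}$ and two symmetric matrices $w_\pm(a, b)$
for $1 \le a, b \le Q$. The partition function of the
diagram is then

\[ Z_D = \sum_{\text{states}} \; \prod_{\text{edges of } \Gamma} w_\pm(\sigma, \sigma') \]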


where a “state” is a function from the vertices of $\Gamma$
to $\{1, 2, \ldots, Q\}$ and, given an edge of $\Gamma$ and a state,
$\sigma$ and $\sigma'$ denote the values of the state at the ends of
that edge ($w_+$ and $w_-$ are used according to the sign
of the edge).

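The “Potts model” is defined by the property that
the “Boltzmann weights” $w_\pm(\sigma, \sigma')$ depend only on
whether $\sigma = \sigma'$ or not. It is a miracle that the choice
(with $Q = 2 + t + t^{-1}$)

\[ w_\pm(\sigma, \sigma') = \begin{cases} t^{\pm 1} & \text{if } \sigma = \sigma' \\ -1 & \text{otherwise} \end{cases} \]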

gives the Jones polynomial of the link defined by D 
as its partition function (up to a simple normal- 
ization). See Jones (1989) for details. 
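
The state sum above is easy to evaluate by brute force on small graphs. The following sketch, under stated assumptions, takes the number of vertices of $\Gamma$, a list of signed edges, and an integer Q, solves $Q = 2 + t + t^{-1}$ for one root t, and sums the product of Boltzmann weights over all states. The input format and function names are hypothetical, and the normalization needed to recover the Jones polynomial is not included.

```python
import cmath
from itertools import product


def potts_parameter(Q):
    """One root t of t + 1/t = Q - 2, so that Q = 2 + t + 1/t."""
    b = Q - 2
    return (b + cmath.sqrt(b * b - 4)) / 2


def partition_function(num_vertices, signed_edges, Q):
    """Sum over all states of the product of edge weights.

    signed_edges is a list of (u, v, sign) with vertices 0..num_vertices-1
    and sign +1 or -1; the weight is t**sign if the spins at u and v agree
    and -1 otherwise.
    """
    t = potts_parameter(Q)
    total = 0
    for state in product(range(Q), repeat=num_vertices):
        term = 1
        for u, v, sign in signed_edges:
            if state[u] == state[v]:
                term *= t if sign > 0 else 1 / t
            else:
                term *= -1
        total += term
    return total


if __name__ == "__main__":
    # One positive edge between two vertices, Q = 4 (so t = 1).
    print(partition_function(2, [(0, 1, +1)], 4))
```

For a single edge the sum has Q states with equal spins and Q(Q - 1) states with different spins, which gives a simple hand check.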

It is natural to look for other choices of $w_\pm$ which
give knot invariants. The Fateev-Zamolodchikov 
(1982) model gives a classical knot invariant but 
besides that (and some variants on the Jones 
polynomial) there is only one other known choice of 
any interest, discovered in Jaeger (1992). In this case, 
Q = 100 and the Boltzmann weights are symmetric
under the action of the Higman—Sims group on the 
Higman-Sims graph with 100 vertices. The knot 
invariant is a special value of the Kauffman two- 
variable polynomial. 

The other side of Temperley—Lieb equivalence is 
the “ice-type” model which is a “vertex model.” 
That is to say the “spins” reside on the edges of a 
graph and the interactions occur at the vertices. To 
use vertex models in knot theory, the knot projection
D itself is the (4-valent) graph. The ice-type
model has two spin states per edge so that a state of
the system is a function from the edges of the graph
to the set $\{\pm\}$; the Boltzmann weights are given by
two 4 x 4 matrices $w_\pm(\sigma_1, \sigma_2, \sigma_3, \sigma_4)$ where the $\sigma$’s
are $\pm 1$ and $w_+$ and $w_-$ are the contributions of


[Figure: the two types of crossing, with adjacent edges labeled $\sigma_1, \sigma_2, \sigma_3, \sigma_4$.]


to the partition function, respectively. Furthermore,
we may think of a state as a locally constant
function $\sigma$ on D so for any $f : \{\pm 1\} \to \mathbb{R}$ we may
form the term $\int_D f(\sigma) \, d\theta$ corresponding to interaction
with an external field ($d\theta$ is the curvature or
change of angle form on D). Then the partition
function is

\[ Z_D = \sum_{\text{states}} \left( \prod_{\text{crossings of } D} w_\pm(\sigma_1, \sigma_2, \sigma_3, \sigma_4) \right) e^{\int_D f(\sigma) \, d\theta} \]

A (nonphysical) specialization of the six-vertex 
model yields values of f and $w_\pm$ for which $Z_D$ is a
link invariant equal to $V_L(t)$. See Jones (1989).

As with the Potts model, one may try to generalize
to more general $w_\pm$ and f. This is much more
successful for these “vertex” models than it was for
models like the Potts model. The theory of quantum
groups (Jimbo 1986, Drinfeld 1987, Rosso 1988)
allows one to obtain link invariants (as partition
functions for vertex models) for each simple finite-
dimensional Lie algebra $\mathfrak{g}$ and each assignment of an
irreducible representation of $\mathfrak{g}$ to the components of
the link. The images of the braid generators $\sigma_i$ in the
corresponding braid group representations are called
“R-matrices.” It is the Yang-Baxter equation that
gives isotopy invariance of the partition function. In
this way, one obtains (by an infinite family of one-
variable specializations) the HOMFLYPT polynomial
($\mathfrak{sl}_n$) and the Kauffman polynomial (orthogonal and
symplectic algebras) and more polynomials. The
geometric operation of cabling corresponds to the 
tensor product of representations. 


Connections with Quantum Field Theory 
Conformal Field Theory 


If $\varphi$ is a (multicomponent) field in one chiral half of
a two-dimensional conformal field theory (CFT), the
correlation functions

\[ \langle \varphi(z_1) \varphi(z_2) \cdots \varphi(z_n) \rangle \]

(where $z_i \in \mathbb{C}$) are expected to be singular if $z_i = z_j$
for some $i \ne j$, holomorphic otherwise and satisfy a
linear differential equation. Thus, analytic continuation
should determine a unitary monodromy representation
of $\pi_1(\mathbb{C}^n \setminus \{(z_1, z_2, \ldots, z_n) \mid z_i = z_j$ for some
$i \ne j\})$ on the vector space of solutions to the
differential equation near a point. In Tsuchiya and
Kanie (1988), these representations were calculated
for the SU(2) WZW (Wess-Zumino-Witten) model,
where the differential equation is known as the
Knizhnik-Zamolodchikov equation. The corresponding
braid group representations were shown
to be those obtained in the section “The original
definition of $V_L(t)$” and cablings thereof.


Topological Quantum Field Theory 
In Witten (1989), the following formula appears:

\[ V_L(e^{2\pi i/n}) = \int_{\mathcal{A}} \exp\left\{ \frac{i}{\hbar} \int_{S^3} \mathrm{tr}\left( A \wedge dA + \tfrac{2}{3} A \wedge A \wedge A \right) \right\} \times \prod_j \mathrm{tr}\left( P\exp \oint_j A \right) [DA] \]

where A ranges over all functions from $S^3$ to the Lie
algebra su(2), modulo the action of the gauge group
SU(2). Also $\hbar = \pi/k$ and j runs over the components
of the link L, to each of which is assigned an
irreducible representation of SU(2). Parallel transport
around a component j using A yields the linear
map $P\exp \oint_j A$ whose trace is constant modulo gauge
transformations. And [DA] is a fictitious diffeomorphism
invariant measure on all A’s modulo
gauge transformation.




There are at least two ways to interpret this 
formula. 


1. As a solvable topological quantum field theory 
(TQFT) in 2 + 1 dimensions, according to Witten 
(1988) and Atiyah (1988, 1989). One is then 
obliged to expand the context and conclude that 
$V_L(e^{2\pi i/n})$ is defined for (possibly empty) links in
an arbitrary 3-manifold. The TQFT axioms then 
provide an explicit formula for the invariant if the 
3-manifold is obtained from surgery on a link. In 
particular, the invariant of a 3-manifold without a 
link is a statistical mechanics type sum over 
assignments of irreducible representations of 
SU(2) to the components of the surgery link. The 
key condition making this sum finite is that only 
representations up to a certain dimension (determined
by n) are allowed. This is the vanishing of
the Jones-Wenzl idempotent of the section “The
Temperley-Lieb algebra.” This explicit formula
was rigorously shown to be a manifold invariant
in Reshetikhin and Turaev (1991). For a
simpler treatment, see Lickorish (1997) and for the
whole TQFT treatment, see Blanchet et al. (1995). 

2. As a perturbative QFT. The stationary-phase 
Feynman diagram technique may be applied to 
obtain the coefficients of the expansion of Witten’s
formula in powers of $\hbar$ or equivalently 1/n. These
coefficients are known to be “finite type” or 
Vassiliev invariants and have expressions as 
integrals over configurations of points on the link, 
see Vassiliev (1990) and Bar-Natan (1995). 


Algebraic Quantum Field Theory 


In the Haag—Kastler operator algebraic framework 
of quantum field theory (Haag 1996), statistics of 
quantum systems were interpreted in Doplicher 
et al. (1971, 1974) (DHR) in terms of certain 
representations of the symmetric group correspond- 
ing to permuting regions of spacetime. To obtain the 
symmetric group, the dimension of spacetime needs 
to be sufficiently large. It was proposed in 
Fredenhagen et al. (1989) that the DHR theory 
should also work in low dimensions with the braid 
group replacing the symmetric group, and that 
unitary braid group representations defined above 
should be the ones occurring in quantum field 
theory. The “statistical dimension” of the DHR 
theory turns up as the square root of the index of a 
subfactor (this connection was clearly established in 
Longo (1989, 1990)). The mathematical issue of the
existence of quantum fields with braid statistics was 
established in Wassermann (1998) using the language 
of loop group representations. Actual physical systems 
with nonabelian braid statistics have not yet been
found but have been proposed in Freedman (2003)
as a mechanism for quantum computing. 
Introduction


Kolmogorov—Arnol’d—Moser (KAM) theory deals 
with the construction of quasiperiodic trajectories 
in nearly integrable Hamiltonian systems and it was 
motivated by classical problems in celestial 
mechanics such as the n-body problem. Notwith- 
standing the formidable bulk of results, ideas and 
techniques produced by the founders of the modern 
theory of dynamical systems, most notably by 
H Poincaré and G D Birkhoff, the fundamental 
question about the persistence under small perturba- 
tions of invariant tori of an integrable Hamiltonian 
system remained completely open until 1954. In that 
year, A N Kolmogorov stated what is now usually 
referred to as the KAM theorem (in the real-analytic 
setting) and gave a precise outline of its proof, 
presenting a strikingly new and powerful method to 
overcome the so-called small-divisor problem (reso- 
nances in Hamiltonian dynamics produce, in the 
perturbation series, divisors which may become 
arbitrarily small, making convergence arguments
extremely delicate). Subsequently, KAM theory has
been extended and applied to a large variety of 
different problems, including infinite-dimensional 
dynamical systems and partial differential equations 
with Hamiltonian structure. However, establishing 
the existence of quasiperiodic motions in the n-body 
problem turned out to be a longer story, which only 
very recently has reached a satisfactory level; the 
point being that the n-body problem presents strong
degeneracies, which violate the main hypotheses of 
the KAM theorem. 

This article gives an account of the ideas and 
results concerning the construction of quasiperiodic 


solutions in the planetary n-body problem. The 
synopsis of the article is the following. 

The next section gives the analytical description of
the planetary (1 + n)-body problem.

In the subsection “Kolmogorov’s theorem and the
RPC3BP (1954),” the original version of the KAM
theorem is recalled, giving an outline of its proof
and showing its implications for the simplest many- 
body case, namely, the restricted, planar, and 
circular three-body problem. 

In the section “Arnol’d’s theorem,” the existence 
of a positive measure set of initial data in phase 
space giving rise to quasiperiodic motions near 
coplanar and nearly circular unperturbed Keplerian 
trajectories is presented. The rest of the section is 
devoted to the proof of Arnol’d’s theorem following 
the historical developments: Arnol’d’s proof (1963a) 
for the planar three-body case is presented, the 
extension to the spatial three-body case due to 
Laskar and Robutel (1995) is discussed, and Her- 
man’s proof — in the form given by Féjoz in 2004 — 
of the general spatial (1 + n)-case is presented. 

In the section “Lower dimensional tori,” a brief 
discussion of the construction of lower-dimensional 
elliptic tori bifurcating from the Keplerian unper- 
turbed motions is given (these results have been 
established in the early 2000s). 

Finally, the problem of taking into account real 
astronomical parameter values is considered and a 
recent result on an application of (computer- 
assisted) KAM techniques to the solar subsystem 
formed by Sun, Jupiter, and the asteroid Victoria is 
briefly mentioned. 


The Planetary (1 + n)-Body Problem 


The evolution of (1 + n)-body systems (assimilated
to point masses) interacting only through gravitational
attraction is governed by Newton’s equations.
If $u^{(i)} \in \mathbb{R}^3$ denotes the position of the ith body in a
given reference frame and if $m_i$ denotes its mass,
then Newton’s equations read

\[ \frac{d^2 u^{(i)}}{dt^2} = - \sum_{\substack{0 \le j \le n \\ j \ne i}} m_j \frac{u^{(i)} - u^{(j)}}{|u^{(i)} - u^{(j)}|^3}, \quad i = 0, 1, \ldots, n \qquad [1] \]

Here the gravitational constant is taken to be equal
to 1 (which amounts to rescaling the time t).
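Equations [1] are equivalent to the standard
Hamilton’s equations corresponding to the Hamiltonian
function

\[ H_{\rm New} := \sum_{i=0}^{n} \frac{|U^{(i)}|^2}{2 m_i} - \sum_{0 \le i < j \le n} \frac{m_i m_j}{|u^{(i)} - u^{(j)}|} \qquad [2] \]

where $(U^{(i)}, u^{(i)})$ are standard symplectic variables
and the phase space is the “collisionless domain”
$\mathcal{M} := \{U^{(i)}, u^{(i)} \in \mathbb{R}^3 : u^{(i)} \ne u^{(j)}, \; 0 \le i \ne j \le n\}$; the
symplectic form is the standard one: $\sum_{i=0}^{n} dU^{(i)} \wedge
du^{(i)} := \sum_{i=0}^{n} \sum_{k=1}^{3} dU^{(i)}_k \wedge du^{(i)}_k$; $|\cdot|$ denotes the standard
Euclidean norm. Introducing the symplectic coordinate
change $(U, u) = \phi_{\rm hel}(R, r)$,

\[ \phi_{\rm hel} : \quad
\begin{cases}
u^{(0)} = r^{(0)}, & u^{(i)} = r^{(0)} + r^{(i)} \quad (i = 1, \ldots, n) \\
U^{(0)} = R^{(0)} - \sum_{i=1}^{n} R^{(i)}, & U^{(i)} = R^{(i)} \quad (i = 1, \ldots, n)
\end{cases} \qquad [3] \]

one sees that the Hamiltonian $H_{\rm hel} := H_{\rm New} \circ \phi_{\rm hel}$
does not depend upon $r^{(0)}$ (recall that a local
diffeomorphism is called symplectic if it preserves
the symplectic form). This means that $R^{(0)}$ (= total
linear momentum) is a global integral of motion.
Without loss of generality, one can restrict attention
to the invariant manifold $\mathcal{M}_0 := \{R^{(0)} = 0\}$ (invariance
of eqn [1] by changes of inertial reference
frames).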

In the “planetary” case, one assumes that one of
the bodies, say i = 0 (the Sun), has mass much larger
than that of the other bodies (this accounts for the 
index “hel,” which stands for “heliocentric”). To 
make the perturbative character of the problem 
transparent, one may introduce the following rescal- 
ings. Let 


R® rÈ 
(i) — = 
mi = EMi, X JER JE 
EMY Mo 
(¢=1,...,7) [4] 


and rescale time by a factor em)! 2 (which amounts 


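to dividing the new Hamiltonian by such a
factor); then, the flow of the Hamiltonian $H_{\rm hel}$ on
$\mathcal{M}_0$ is equivalent to the flow of the Hamiltonian

\[ H_{\rm plt} := \sum_{i=1}^{n} \left( \frac{|X^{(i)}|^2}{2 \mu_i} - \frac{\mu_i M_i}{|x^{(i)}|} \right) + \varepsilon \sum_{1 \le i < j \le n} \left( X^{(i)} \cdot X^{(j)} - \frac{\bar{m}_i \bar{m}_j}{m_0^2 \, |x^{(i)} - x^{(j)}|} \right) \qquad [5] \]

on the phase space $\mathcal{M} := \{X^{(i)}, x^{(i)} \in \mathbb{R}^3 : 1 \le i \le n$
and $0 \ne x^{(i)} \ne x^{(j)}\}$ with respect to the standard
symplectic form $\sum_{i=1}^{n} dX^{(i)} \wedge dx^{(i)}$; the mass parameters
are defined as

\[ M_i := 1 + \varepsilon \frac{\bar{m}_i}{m_0}, \qquad \mu_i := \frac{\bar{m}_i}{m_0 + \varepsilon \bar{m}_i} = \frac{\bar{m}_i}{m_0 M_i} \]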
The following observations can be made:

1. The Hamiltonian

\[ H^{(0)}_{\rm plt} := \sum_{i=1}^{n} \left( \frac{|X^{(i)}|^2}{2 \mu_i} - \frac{\mu_i M_i}{|x^{(i)}|} \right) \]

is integrable and represents the sum of n two-
body systems formed by the Sun and the ith
planet (disregarding the interaction with the
other planets).

2. The transformation $\phi_{\rm hel}$ in eqn [3] preserves
the total angular momentum $\hat{C} := \sum_{i=0}^{n} U^{(i)} \times
u^{(i)}$, which is a vector-valued integral for
$H_{\rm New}$. Thus, the three components, $C_k$, of
$C := \sum_{i=1}^{n} X^{(i)} \times x^{(i)}$ (which is proportional to
$\hat{C}$ and is termed the “total angular momentum”),
are integrals for $H_{\rm plt}$. The integrals $C_k$
do not commute: if $\{\cdot, \cdot\}$ denotes the standard
Poisson bracket, then $\{C_1, C_2\} = C_3$ (and, cyclically,
$\{C_2, C_3\} = C_1$, $\{C_3, C_1\} = C_2$). Nevertheless,
one can form two (independent) commuting
integrals, for example, $|C|^2$ and $C_3$. This shows
that the (spatial) (1 + n)-body problem has
(3n - 2) degrees of freedom.

3. An important special case is the planar (1 + n)-
body problem. In such a case, one assumes that
all the “single” angular momenta $C^{(i)} := X^{(i)} \times x^{(i)}$
are parallel. In this case, the motion takes place
on a fixed plane orthogonal to C and (up to a
rotation of the reference frame) one can take, as
symplectic variables, $X^{(i)}, x^{(i)} \in \mathbb{R}^2$. The Hamiltonian
$H_{\rm pln}$ governing the dynamics of the planar
(1 + n)-body problem is, then, given on the right-
hand side of eqn [5] with $X^{(i)}, x^{(i)} \in \mathbb{R}^2$. Notice
that the planar (1 + n)-body problem has 2n
degrees of freedom.

4. For a deeper understanding of the perturbation 
theory of the planetary many-body problem, it is 
necessary to find “good” sets of symplectic 
coordinates, which the founders of celestial 


mechanics (most notably, Jacobi, Delaunay, and
Poincaré) have done. In particular, Delaunay
introduced an analytic set of symplectic “action-
angle” variables. Recall the Delaunay variables
for the two-body “reduced Hamiltonian”

\[ H_{\rm Kep} := \frac{|X|^2}{2 \mu} - \frac{\mu M}{|x|} \]

Let $\{k_1, k_2, k_3\}$ be a standard orthonormal basis
in the x-configuration space; let the angular
momentum $C = X \times x$ be nonparallel to $k_3$ and
let the energy $E = H_{\rm Kep} < 0$. In such a case, x(t)
describes an ellipse lying in the plane orthogonal
to C, with focus in the origin and fixed symmetry
axes. Let a be the semimajor axis of the ellipse
spanned by x; i (the inclination) be the angle
between $k_3$ and C; $G = |C|$; $\Theta = G \cos i = C \cdot k_3$;
$L = \mu \sqrt{M a}$; $\ell$ be the mean anomaly of x (:= $2\pi$
times the normalized area spanned by x measured
from the perihelion P, which is the point
of the ellipse closest to the origin); $\theta$ be the
angle between $k_1$ and $N := k_3 \times C$ (:= oriented
“node”); and g be the argument of the perihelion
(:= the angle between N and (O, P)). Then
(letting $\mathbb{T} := \mathbb{R}/(2\pi\mathbb{Z})$)

\[ (L, G, \Theta) \in \{L > 0\} \times \{G > \Theta > 0\}, \qquad (\ell, g, \theta) \in \mathbb{T}^3 \]

are conjugate symplectic coordinates and if $\phi_{\rm Del}$
is the corresponding symplectic map, then
$H_{\rm Kep} \circ \phi_{\rm Del} = -\mu^3 M^2/(2 L^2)$.

Note that the Delaunay variables become
singular when C is vertical (the node is no longer
defined) and in the circular limit (the perihelion
is not unique). In these cases different variables
have to be used.

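5. Let $(X^{(i)}, x^{(i)}) = \phi_{\rm Del}((L_i, G_i, \Theta_i), (\ell_i, g_i, \theta_i))$. Then
$H_{\rm plt}$ expressed in the Delaunay variables
$\{(L_i, G_i, \Theta_i), (\ell_i, g_i, \theta_i) : 1 \le i \le n\}$ becomes

\[ H_{\rm Del} = H^{(0)}_{\rm Del} + \varepsilon H^{(1)}_{\rm Del}, \qquad H^{(0)}_{\rm Del} := - \sum_{i=1}^{n} \frac{\mu_i^3 M_i^2}{2 L_i^2} \qquad [8] \]

Note that the number of action variables on
which the integrable Hamiltonian $H^{(0)}_{\rm Del}$ depends
is strictly less than the number of degrees of
freedom. This “proper degeneracy,” as we shall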
see in next sections, brings in an essential 
difficulty one has to face in the perturbative 
approach to the many-body problem. In fact, this 
feature of the many-body problem is common to 
several other problems of celestial mechanics. 
Maximal KAM Tori
Kolmogorov’s Theorem and the RPC3BP (1954) 


Kolmogorov’s invariant tori theorem deals with 
the persistence, in nearly integrable Hamiltonian 
systems, of Lagrangian (maximal) tori, which, in 
general, foliate the integrable limit. Kolmogorov 
(1954) stated his theorem and gave a precise 
outline of the proof. Let us briefly recall this 
milestone of the modern theory of dynamical 
systems. 

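Let $\mathcal{M} := B^d \times \mathbb{T}^d$ ($B^d$ being a d-dimensional ball
in $\mathbb{R}^d$ centered at the origin) be endowed with the
standard symplectic form $dy \wedge dx := \sum_{i=1}^{d} dy_i \wedge dx_i$
($y \in B^d$, $x \in \mathbb{T}^d$). A Hamiltonian function N on $\mathcal{M}$
having a Lagrangian invariant d-torus of energy E
on which the N-flow is conjugated to the linear
dense translation $x \to x + \omega t$, $\omega \in \mathbb{R}^d \setminus \mathbb{Q}^d$, can be
put in the form

\[ N := E + \omega \cdot y + Q(y, x), \qquad \partial_y^{\alpha} Q(0, x) = 0, \quad \forall \alpha \in \mathbb{N}^d, \; |\alpha| \le 1 \qquad [9] \]

(as usual, $|\alpha| = \alpha_1 + \cdots + \alpha_d$, $\omega \cdot y := \sum_{i=1}^{d} \omega_i y_i$,
and $\partial_y^{\alpha} = \partial_{y_1}^{\alpha_1} \cdots \partial_{y_d}^{\alpha_d}$); in such a case, the Hamiltonian
N is said to be in Kolmogorov normal form. The
vector $\omega$ is called the “frequency vector” of the
invariant torus $\{y = 0\} \times \mathbb{T}^d$. The Hamiltonian N is
said to be nondegenerate if

\[ \det \langle \partial_y^2 Q(0, \cdot) \rangle \ne 0 \qquad [10] \]

where the brackets denote average over $\mathbb{T}^d$ and $\partial_y^2$
the Hessian with respect to the y-variables.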

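We recall that a vector $\omega \in \mathbb{R}^d$ is said to be
“Diophantine” if there exist $\kappa > 0$ and $\tau \ge d - 1$
such that

\[ |\omega \cdot k| \ge \frac{\kappa}{|k|^{\tau}}, \qquad \forall k \in \mathbb{Z}^d \setminus \{0\} \qquad [11] \]

The set $\mathcal{D}^d$ of all Diophantine vectors in $\mathbb{R}^d$ is a set of
full Lebesgue measure. We also recall that a Hamiltonian
trajectory is called quasiperiodic with (rationally
independent) frequency $\omega \in \mathbb{R}^d$ if it is conjugate to
the linear translation $\theta \in \mathbb{T}^d \to \theta + \omega t \in \mathbb{T}^d$.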
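
The Diophantine inequality [11] can be probed numerically on a finite range of integer vectors. The sketch below is only a finite test, not a proof, and it assumes the norm $|k| = |k_1| + \cdots + |k_d|$ (the text does not fix a norm); the function name and the cutoff parameter `max_entry` are illustrative.

```python
from itertools import product
from math import sqrt


def diophantine_up_to(omega, kappa, tau, max_entry):
    """Check |omega . k| >= kappa / |k|^tau for all nonzero integer vectors k
    whose entries lie in [-max_entry, max_entry], using the 1-norm for |k|."""
    d = len(omega)
    for k in product(range(-max_entry, max_entry + 1), repeat=d):
        norm = sum(abs(ki) for ki in k)
        if norm == 0:
            continue
        small_divisor = abs(sum(w * ki for w, ki in zip(omega, k)))
        if small_divisor < kappa / norm**tau:
            return False
    return True


if __name__ == "__main__":
    golden = (1 + sqrt(5)) / 2
    # (1, golden mean) passes the finite test with tau = d - 1 = 1.
    print(diophantine_up_to((1.0, golden), kappa=0.5, tau=1, max_entry=30))
    # A resonant vector fails at once: k = (2, -1) gives omega . k = 0.
    print(diophantine_up_to((1.0, 2.0), kappa=0.5, tau=1, max_entry=30))
```

Resonances, the integer vectors k with $\omega \cdot k = 0$, are exactly what the condition excludes, which is why the second example fails.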


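Theorem (Kolmogorov 1954) Consider a one-parameter family of real-analytic Hamiltonian functions $H_\varepsilon := N + \varepsilon P$ where $N$ is in Kolmogorov normal form (as in eqn [9]) and $\varepsilon \in \mathbb{R}$. Assume that $\omega$ is Diophantine and that $N$ is nondegenerate. Then, there exist $\varepsilon_0 > 0$ and, for any $|\varepsilon| \le \varepsilon_0$, a real-analytic symplectic transformation $\phi_\varepsilon : M \to M$ putting $H_\varepsilon$ in Kolmogorov normal form, $H_\varepsilon \circ \phi_\varepsilon = N_\varepsilon$, with $N_\varepsilon := E_\varepsilon + \omega\cdot y' + Q_\varepsilon(y',x')$. Furthermore, $|E_\varepsilon - E|$, $\|\phi_\varepsilon - {\rm id}\|_{C^2}$, and $\|Q_\varepsilon - Q\|_{C^2}$ are small with $\varepsilon$.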
In other words, the Lagrangian unperturbed torus $\mathcal{T}_0 := \{y=0\} \times \mathbb{T}^d$ persists under small perturbation and is smoothly deformed into the $H_\varepsilon$-invariant torus $\mathcal{T}_\varepsilon := \phi_\varepsilon(\{y'=0\} \times \mathbb{T}^d)$; the dynamics on such torus, for all $|\varepsilon| \le \varepsilon_0$, consists of dense quasiperiodic trajectories. Note that the $H_\varepsilon$-flow on $\mathcal{T}_\varepsilon$ is analytically conjugated by $\phi_\varepsilon$ to the translation $x' \to x' + \omega t$ with the same frequency vector of $N$, while the energy of $\mathcal{T}_\varepsilon$, namely $E_\varepsilon$, is in general different from the energy $E$ of $\mathcal{T}_0$.

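Kolmogorov’s proof is based on an iterative (Newton) scheme. The map $\phi_\varepsilon$ is obtained as $\lim_{k\to\infty} \phi^{(1)} \circ \cdots \circ \phi^{(k)}$, where the $\phi^{(j)}$’s are ($\varepsilon$-dependent) symplectic transformations of $M$ successively closer to the identity. It is enough to describe the construction of $\phi^{(1)}$; $\phi^{(2)}$ is then obtained by replacing $H_\varepsilon$ with $H_\varepsilon \circ \phi^{(1)}$ and so on. The map $\phi^{(1)}$ is $\varepsilon$-close to the identity and it is generated by $g(y',x) := y'\cdot x + \varepsilon(b\cdot x + s(x) + y'\cdot a(x))$, where $s$ and $a$ are (resp. scalar- and vector-valued) real-analytic functions on $\mathbb{T}^d$ with zero average and $b \in \mathbb{R}^d$; this means that the symplectic map $\phi^{(1)} : (y',x') \to (y,x)$ is implicitly given by the relations $y = \partial_x g$ and $x' = \partial_{y'} g$. It is easy to see that there exists a unique $g$ of the above form such that, for a suitable $\varepsilon_0 > 0$,

$$H_\varepsilon \circ \phi^{(1)} = E_1 + \omega\cdot y' + Q_1(y',x') + \varepsilon^2 P_1, \qquad \forall\, |\varepsilon| \le \varepsilon_0 \qquad [12]$$

with $\partial_{y'}^\alpha Q_1(0,x') = 0$, for any $\alpha \in \mathbb{N}^d$ and $|\alpha| \le 1$; here, $E_1$, $Q_1$, and $P_1$ depend on $\varepsilon$ and, for a suitable $c_1 > 0$ and for $|\varepsilon| \le \varepsilon_0$, $|E - E_1| \le c_1|\varepsilon|$, $\|Q - Q_1\|_{C^2} \le c_1|\varepsilon|$, and $\|P_1\|_{C^2} \le c_1$.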

Notice that the symplectic transformation $\phi^{(1)}$ is actually the composition of two “elementary” transformations: $\phi^{(1)} = \phi_1^{(1)} \circ \phi_2^{(1)}$, where $\phi_2^{(1)} : (y',x') \to (\eta,\xi)$ is the symplectic lift of the $\mathbb{T}^d$-diffeomorphism given by $x' = \xi + \varepsilon a(\xi)$ (i.e., $\phi_2^{(1)}$ is the symplectic map generated by $y'\cdot\xi + \varepsilon y'\cdot a(\xi)$), while $\phi_1^{(1)} : (\eta,\xi) \to (y,x)$ is the angle-dependent action translation generated by $\eta\cdot x + \varepsilon(b\cdot x + s(x))$; $\phi_2^{(1)}$ acts in the “angle direction” and straightens out the flow up to order $O(\varepsilon^2)$, while $\phi_1^{(1)}$ acts in the “action direction” and is needed to keep the frequency of the torus fixed.
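The implicit definition of $\phi^{(1)}$ through the generating function $g$ can be made concrete in one degree of freedom ($d=1$). The sketch below is only an illustration: the choices $s(x)=\sin x$, $a(x)=\cos x$, and the values of $\varepsilon$ and $b$ are hypothetical and are not the functions determined in Kolmogorov’s construction. It solves $x' = \partial_{y'}g$ for $x$ by Newton’s method, evaluates $y = \partial_x g$, and checks area preservation with a finite-difference Jacobian.

```python
import math

EPS, B = 0.05, 0.3  # hypothetical values of epsilon and b

def ds(x):
    return math.cos(x)   # derivative of the illustrative choice s(x) = sin(x)

def a(x):
    return math.cos(x)   # illustrative zero-average function a(x)

def da(x):
    return -math.sin(x)

def phi1(y_new, x_new, eps=EPS, b=B):
    """Map (y', x') -> (y, x) defined implicitly by y = d_x g, x' = d_{y'} g,
    with g(y', x) = y'*x + eps*(b*x + s(x) + y'*a(x))."""
    # Solve x + eps*a(x) = x' for x by Newton's method.
    x = x_new
    for _ in range(50):
        x -= (x + eps * a(x) - x_new) / (1.0 + eps * da(x))
    y = y_new + eps * (b + ds(x) + y_new * da(x))
    return y, x

def jacobian_det(y_new, x_new, h=1e-6):
    """Central-difference determinant of d(y, x)/d(y', x')."""
    yp, xp = phi1(y_new + h, x_new)
    ym, xm = phi1(y_new - h, x_new)
    dy_dy, dx_dy = (yp - ym) / (2 * h), (xp - xm) / (2 * h)
    yp, xp = phi1(y_new, x_new + h)
    ym, xm = phi1(y_new, x_new - h)
    dy_dx, dx_dx = (yp - ym) / (2 * h), (xp - xm) / (2 * h)
    return dy_dy * dx_dx - dy_dx * dx_dy

y, x = phi1(0.2, 1.0)
area_factor = jacobian_det(0.2, 1.0)
```

In this example the map moves each point by an amount of order $\varepsilon$ and the Jacobian determinant equals 1 up to discretization error, as expected for a map obtained from a generating function.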

Since $H_\varepsilon \circ \phi^{(1)} =: N_1 + \varepsilon^2 P_1$ is again a perturbation of a nondegenerate Kolmogorov normal form (with same frequency vector $\omega$), one can repeat the construction by obtaining a new Hamiltonian of the form $N_2 + \varepsilon^4 P_2$. Iterating, after $k$ steps, one gets a Hamiltonian $N_k + \varepsilon^{2^k} P_k$. Carrying out the (straightforward but lengthy) estimates, one can check that $\|P_k\|_{C^2} \le c_k \le c^{2^k}$, for a suitable constant $c > 1$ independent of $k$ (the fast growth of the constant $c_k$ is due to the presence of the small divisors appearing in the explicit construction of the symplectic transformations $\phi^{(k)}$). Thus, it is clear that taking $\varepsilon_0$ small enough the iterative procedure converges (superexponentially fast) yielding the thesis of the above theorem.
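The competition between the quadratic gain $\varepsilon^{2^k}$ and the growth $c^{2^k}$ of the constants can be tabulated directly. The sketch below is an illustration under the stated bounds only; the function name and the sample values of $\varepsilon$ and $c$ are assumptions, and the entry $k=0$ further assumes $\|P\| \le c$ for the initial perturbation.

```python
def remainder_bounds(eps, c, steps):
    """Upper bounds eps^(2^k) * c^(2^k) = (c*eps)^(2^k), k = 0..steps,
    on the size of the perturbation after k Newton steps."""
    return [(c * eps) ** (2 ** k) for k in range(steps + 1)]

# Hypothetical values: c*eps < 1, so the bounds collapse superexponentially.
converging = remainder_bounds(eps=0.01, c=10.0, steps=5)

# Hypothetical values: c*eps > 1, so the same bounds give no information.
useless = remainder_bounds(eps=0.2, c=10.0, steps=5)
```

Each bound is the square of the previous one, so the scheme converges as soon as $c\,\varepsilon_0 < 1$; this is the elementary mechanism behind the requirement that $\varepsilon_0$ be small enough.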


6. While the statement of the invariant tori theorem 
and the outline of the proof are very clearly 
explained in Kolmogorov (1954), Kolmogorov 
neither filled in the details nor gave any estimates. 
Some years later, Arnol’d (1963a) published a 
detailed proof, which, however, did not follow 
Kolmogorov’s idea. In the same year, J K Moser 
published his invariant curve theorem (for area- 
preserving twist diffeomorphisms of the annulus) 
in the smooth setting. The bulk of techniques and 
theorems stemming from these works is 
normally referred to as KAM theory; for reviews, 
see Arnold (1988) or Bost (1984-85). A very 
complete version of the “KAM theorem” both in 
the real-analytic and in the smooth case (with 
optimal smoothness assumptions) is given in 
Salamon (2004); the proof of the real-analytic 
part is based on Kolmogorov’s scheme. The 
KAM theory of M Herman, used in his approach 
to the planetary problem, is based on the abstract 
functional theoretical approach of R Hamilton 
(which, in turn, is a development of the Nash-Moser 
implicit function theorem; see Bost (1984-85) for 
references); it is interesting, however, to note that 
the heart of Herman’s KAM method is based on 
the above-mentioned Kolmogorov’s transformation 
$\phi^{(1)}$ (compare Féjoz (2002)). 

7. In the nearly integrable case, one considers a one-parameter family of Hamiltonians $H_0(I) + \varepsilon H_1(I,x)$ with $(I,x) \in M := U \times \mathbb{T}^d$ standard symplectic action-angle variables, $U$ being an open subset of $\mathbb{R}^d$. When $\varepsilon = 0$, the phase space $M$ is foliated by $H_0$-invariant tori $\{I_0\} \times \mathbb{T}^d$, on which the flow is given by $x \to x + \partial_y H_0(I_0)t$. If $I_0$ is such that $\omega := \partial_y H_0(I_0)$ is Diophantine and if $\det \partial_y^2 H_0(I_0) \ne 0$, then from Kolmogorov’s theorem it follows that the torus $\{I_0\} \times \mathbb{T}^d$ persists under perturbation. In fact, introduce the symplectic variables $(y,x)$ with $y = I - I_0$ and let $N(y) := H_0(I_0 + y)$, which by Taylor’s formula can be written as $H_0(I_0) + \omega\cdot y + Q(y)$ with $Q(y)$ quadratic in $y$ and $\partial_y^2 Q(0) = \partial_y^2 H_0(I_0)$ invertible. One can then apply Kolmogorov’s theorem with $P(y,x) := H_1(I_0 + y, x)$.

Notice that Kolmogorov’s nondegeneracy condition $\det \partial_y^2 H_0(I_0) \ne 0$ simply means that the frequency map

$$I \in B^d \subset U \to \omega(I) := \partial_y H_0(I) \qquad [13]$$

is a local diffeomorphism ($B^d$ being a ball around $I_0$).

8. The symplectic structure implies that if n denotes 
the number of degrees of freedom (i.e., half of the 
dimension of the phase space) and d is the 
number of independent frequencies of a quasiperiodic 
motion, then $d \le n$; if $d = n$, the quasiperiodic 
motion is called maximal. Kolmogorov’s 
theorem gives sufficient conditions in order to get 
maximal quasiperiodic solutions. In fact, Kolmo- 
gorov’s nondegeneracy condition is an open 
condition and the set of Diophantine vectors is 
a set of full Lebesgue measure. Thus, in general, 
Kolmogorov’s theorem yields a positive invariant 
measure set spanned by maximal quasiperiodic 
trajectories. 


As mentioned above, the planetary many-body 
models are properly degenerate and violate 
Kolmogorov’s nondegeneracy conditions and, 
hence, Kolmogorov’s theorem — clearly motivated 
by celestial mechanics — cannot be applied. 

There is, however, an important case to which a 
slight variation of Kolmogorov’s theorem can be 
applied (Kolmogorov did not mention this in 1954). 
The case referred to here is the simplest nontrivial 
three-body problem, namely, the restricted, planar, 
and circular three-body problem (RPC3BP for short). 
This model, largely investigated by Poincaré, deals 
with an asteroid of “zero mass” moving on the plane 
containing the trajectory of two unperturbed major 
bodies (say, Sun and Jupiter) revolving on a Keplerian 
circle. The mathematical model for the restricted 
three-body problem is obtained by taking $n=2$ and setting $m_2 = 0$ in eqn [1]: the equations for the two major bodies ($i = 0, 1$) decouple from the equation for the asteroid ($i = 2$) and form an integrable two-body system; the problem then consists in studying the evolution of the asteroid $u^{(2)}(t)$ in the given gravitational field of the primaries. In the circular and planar cases, the motion of the two primaries is assumed to be circular and the motion of the asteroid is assumed to take place on the plane containing the motion of the two primaries; in fact (to avoid collisions), one considers either inner or outer (with respect to the circle described by the relative motion of the primaries) asteroid motions. To describe the Hamiltonian $H_{\rm rcp}$ governing the motion of the RPC3BP, introduce planar Delaunay variables $((L,G),(\ell,\hat g))$ for the asteroid (better, for the reduced heliocentric Sun-asteroid system). Such variables, which are closely related to the above (spatial) Delaunay variables, have the following physical interpretation: $G$ is proportional to the absolute value of the angular momentum of the asteroid, $L$ is proportional to the square root of the semimajor axis of the instantaneous Sun-asteroid ellipse, $\ell$ is the mean anomaly of the asteroid, while $\hat g$ is the argument of the perihelion. Then, in suitably normalized units, the Hamiltonian governing the RPC3BP is given by


$$H_{\rm rcp}(L,G,\ell,g;\varepsilon) := -\frac{1}{2L^2} - G + \varepsilon H_1(L,G,\ell,g;\varepsilon) \qquad [14]$$


where $g := \hat g - \tau$, $\tau \in \mathbb{T}$ being the longitude of Jupiter; the variables $((L,G),(\ell,g))$ are symplectic coordinates (with respect to the standard symplectic form); the normalizations have been chosen so that the relative motion of the primary bodies is $2\pi$-periodic and their distance is 1; the parameter $\varepsilon$ is (essentially) the ratio between the masses of the primaries; the perturbation $H_1$ is the function $x^{(2)}\cdot x^{(1)} - 1/|x^{(2)} - x^{(1)}|$ expressed in the above variables, $x^{(2)}$ being the heliocentric coordinate of the asteroid and $x^{(1)}$ that of the planet (Jupiter): such a function is real-analytic on $\{0 < G < L\} \times \mathbb{T}^2$ and for small $\varepsilon$ (for complete details, see, e.g., Celletti and Chierchia (2003)).
The integrable limit

$$H^{(0)}_{\rm rcp} := H_{\rm rcp}|_{\varepsilon=0} = -\frac{1}{2L^2} - G$$

has vanishing Hessian determinant and, hence, violates 
Kolmogorov’s nondegeneracy condition (as 
described in item (7) above). However, there is 
another nondegeneracy condition which leads to a 
simple variation of Kolmogorov’s theorem, as 
explained briefly below. 

Kolmogorov’s nondegeneracy condition $\det \partial_y^2 H_0(I_0) \ne 0$ allows one to fix $d$ parameters, namely, the $d$ components of the (Diophantine) frequency vector $\omega = \partial_y H_0(I_0)$. Instead of fixing such parameters, one may fix the energy $E = H_0(I_0)$ together with the direction $\{s\omega : s \in \mathbb{R}\}$ of the frequency vector: for example, in a neighborhood where $\omega_d \ne 0$, one can fix $E$ and $\omega_i/\omega_d$ for $1 \le i \le d-1$. Notice also that if $\omega$ is Diophantine, then so is $s\omega$ for any $s \ne 0$ (with same $\tau$ and rescaled $\kappa$). Now, it is easy to check that the map $I \in H_0^{-1}(E) \to (\omega_1/\omega_d, \ldots, \omega_{d-1}/\omega_d)$ is (at fixed energy $E$) a local diffeomorphism if and only if the $(d+1) \times (d+1)$ matrix

$$\begin{pmatrix} \partial_y^2 H_0 & \partial_y H_0 \\ \partial_y H_0 & 0 \end{pmatrix}$$

evaluated at $I_0$ is invertible (here the vector $\partial_y H_0$ in the upper right corner has to be interpreted as a column while the vector $\partial_y H_0$ in the lower left corner has to be interpreted as a row). Such “isoenergetic nondegeneracy” condition, rephrased in terms of Kolmogorov’s normal forms, becomes

$$\det \begin{pmatrix} \langle \partial_y^2 Q(0,\cdot) \rangle & \omega \\ \omega & 0 \end{pmatrix} \ne 0 \qquad [15]$$


Kolmogorov’s theorem can be easily adapted to the fixed energy case. Assuming that $\omega$ is Diophantine and that $N$ is isoenergetically nondegenerate, the same conclusion as in Kolmogorov’s theorem holds with $N_\varepsilon := E + \omega_\varepsilon\cdot y' + Q_\varepsilon(y',x')$, where $\omega_\varepsilon = \alpha_\varepsilon\omega$ and $|\alpha_\varepsilon - 1|$ is small with $\varepsilon$.

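In the RPC3BP case, the isoenergetic nondegeneracy is met, since

$$\det \begin{pmatrix} \partial^2_{(L,G)} H^{(0)}_{\rm rcp} & \partial_{(L,G)} H^{(0)}_{\rm rcp} \\ \partial_{(L,G)} H^{(0)}_{\rm rcp} & 0 \end{pmatrix} = \frac{3}{L^4}$$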


Therefore, one can conclude that on each negative 
energy level, the RPC3BP admits a positive measure 
set of phase points, whose time evolution lies on two- 
dimensional invariant tori (on which the flow is 
analytically conjugate to linear translation by a 
Diophantine vector), provided the mass ratio of the 
primary bodies is small enough; such persistent tori 
are a slight deformation of the unperturbed “Kepler- 
ian” tori corresponding to the asteroid and the Sun 
revolving on a Keplerian ellipse on the plane where 
the Sun and the major planet describe a circular orbit. 
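The two nondegeneracy conditions can be compared directly on the integrable limit $-1/(2L^2) - G$. The sketch below uses hand-computed derivatives of that function; the helper names are illustrative assumptions. It evaluates the $2\times 2$ Hessian determinant (Kolmogorov’s condition) and the $3\times 3$ bordered determinant (the isoenergetic condition).

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a tuple of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def kepler_rotating_derivatives(L):
    """Gradient and Hessian of H0(L, G) = -1/(2 L^2) - G with respect to (L, G)."""
    grad = (L ** -3, -1.0)
    hess = ((-3.0 * L ** -4, 0.0),
            (0.0, 0.0))
    return grad, hess

def hessian_det(L):
    """Kolmogorov's condition: determinant of the 2x2 Hessian."""
    _, hess = kepler_rotating_derivatives(L)
    return hess[0][0] * hess[1][1] - hess[0][1] * hess[1][0]

def isoenergetic_det(L):
    """Isoenergetic condition: determinant of the Hessian bordered by the gradient."""
    grad, hess = kepler_rotating_derivatives(L)
    bordered = ((hess[0][0], hess[0][1], grad[0]),
                (hess[1][0], hess[1][1], grad[1]),
                (grad[0], grad[1], 0.0))
    return det3(bordered)

L_value = 1.3  # hypothetical value of the action L
kolmogorov = hessian_det(L_value)
isoenergetic = isoenergetic_det(L_value)
```

The Hessian determinant vanishes identically because the Hamiltonian is linear in $G$, while the bordered determinant equals $3/L^4$ and is therefore nonzero for every admissible $L$.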

In fact, one can say more. The phase space for the 
RPC3BP is four dimensional, the energy levels are 
three dimensional, and Kolmogorov’s invariant tori 
are two dimensional. Thus, a Kolmogorov torus 
separates the energy level, on which it lies, into two 
invariant components, and two Kolmogorov’s tori 
form the boundary of a compact invariant region so 
that any motion starting in such region will never 
leave it. Thus, the RPC3BP is “totally stable”: in a 
neighborhood of any phase point of negative energy, if 
the mass ratio of the primary bodies is small enough, 
the asteroid stays forever on a nearly Keplerian ellipse 
with nearly fixed orbital elements L and G. 


Arnol’d’s Theorem 


Consider again the planetary $(1+n)$-body problem governed by the Hamiltonian $H_{\rm plt}$ in eqn [5]. In the integrable approximation, governed by the Hamiltonian $H^{(0)}_{\rm plt}$, the $n$ planets describe Keplerian ellipses with a focus at the Sun. Arnol’d (1963b) stated the following theorem.


Theorem (Arnol’d 1963b) Let $\varepsilon > 0$ be small enough. Then, there exists a bounded, $H_{\rm plt}$-invariant set $\mathcal{F}(\varepsilon) \subset M$ of positive Lebesgue measure corresponding to planetary motions with bounded relative distances; $\mathcal{F}(0)$ corresponds to Keplerian ellipses with small eccentricities and small relative inclinations.


This theorem represents a major achievement in 
celestial mechanics, solving a mathematical problem more than 
three centuries old. Arnol’d (1963b) gave a 
complete proof of this result only in the planar 
three-body case and gave some indications of how to 
extend his approach to the general situation. 
However, to give a full proof of Arnol’d’s theorem 
in the general case turned out to be more than a 
technical problem and new ideas were needed: the 
complete proof (due, essentially, to M Herman) has 
been given only in 2004. 

In the following subsections, we briefly review 
the history and the ideas related to the proof of 
Arnol’d’s theorem. As for credits: the proof of Arnol’d’s 
theorem in the planar 3BP case is due to Arnol’d himself 
(Arnol’d 1963b); the spatial 3BP case is due to Laskar 
and Robutel (1995) and Robutel (1995); the general 
case is due to Herman (1998) and Féjoz (2004). The 
exposition we have given does not always follow the 
original references. 


The planar three-body problem Recall the Hamiltonian $H_{\rm pln}$ of the planar $(1+n)$-body problem given in item (3) of the section “The planetary $(1+n)$-body problem.” A convenient set of symplectic variables for nearly circular motions are the “planar Poincaré variables.” To describe such variables, consider a single, planar two-body system with Hamiltonian

$$\frac{|X|^2}{2\mu} - \frac{\mu M}{|x|}, \qquad X \in \mathbb{R}^2,\ 0 \ne x \in \mathbb{R}^2 \quad (\text{with respect to } dX \wedge dx) \qquad [16]$$


and introduce (as done before formula [14] for the RPC3BP) planar Delaunay variables $((L,G),(\ell,g))$ (here, $g = \hat g$ = argument of the perihelion). To remove the singularity of the Delaunay variables near zero eccentricities, Poincaré introduced variables $((\Lambda,\eta),(\lambda,\xi))$ defined by the following formulas:

$$\Lambda = L, \quad H = L - G, \qquad \lambda = \ell + g, \quad h = -g, \qquad \sqrt{2H}\cos h = \eta, \quad \sqrt{2H}\sin h = \xi \qquad [17]$$

As Poincaré showed, such variables are symplectic and analytic in a neighborhood of $(0,\infty) \times \mathbb{T} \times \{0,0\}$; notice that the symplectic map $((\Lambda,\eta),(\lambda,\xi)) \to (X,x)$ depends on the parameters $\mu$, $M$, and $\varepsilon$. In Poincaré variables, the two-body Hamiltonian in eqn [16] becomes $-\kappa/(2\Lambda^2)$, with $\kappa := (\mu/m_0)^3/M$. Now, re-insert the index $i$, let $\phi_i : ((\Lambda_i,\eta_i),(\lambda_i,\xi_i)) \to (X^{(i)}, x^{(i)})$ and $\phi(\Lambda,\eta,\lambda,\xi) = (\phi_1(\Lambda_1,\eta_1,\lambda_1,\xi_1), \ldots, \phi_n(\Lambda_n,\eta_n,\lambda_n,\xi_n))$. Then, the Hamiltonian for the planar $(1+n)$-body problem takes the form


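$$H_{\rm pln}(\Lambda,\lambda,\eta,\xi) = H_0(\Lambda) + \varepsilon H_1(\Lambda,\lambda,\eta,\xi)$$

$$H_0 := -\frac12 \sum_{i=1}^n \frac{\kappa_i}{\Lambda_i^2}, \qquad \kappa_i := \Big(\frac{\mu_i}{m_0}\Big)^3 \frac{1}{M_i} \qquad [18]$$

$$H_1 := H_1^{\rm compl} + H_1^{\rm princ}$$

where the so-called “complementary part” $H_1^{\rm compl}$ and the “principal part” $H_1^{\rm princ}$ of the perturbation are, respectively, the functions

$$\sum_{1 \le i < j \le n} X^{(i)}\cdot X^{(j)} \qquad \text{and} \qquad -\sum_{1 \le i < j \le n} \frac{m_i m_j}{m_0^2\, |x^{(i)} - x^{(j)}|}$$

expressed in Poincaré variables.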
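The change of variables from planar Delaunay to planar Poincaré variables can be written out explicitly. The sketch below follows the relations $\Lambda = L$, $H = L - G$, $\lambda = \ell + g$, $h = -g$, $\eta = \sqrt{2H}\cos h$, $\xi = \sqrt{2H}\sin h$; the function name and the sample values are illustrative assumptions, and the input check $0 < G \le L$ is added only for this example.

```python
import math

def delaunay_to_poincare(L, G, l, g):
    """Planar Delaunay ((L, G), (l, g)) -> planar Poincare ((Lambda, eta), (lam, xi))."""
    if not 0.0 < G <= L:
        raise ValueError("expected 0 < G <= L")
    H = L - G                        # vanishes exactly on circular orbits
    h = -g
    Lambda = L
    lam = (l + g) % (2.0 * math.pi)  # mean longitude, reduced to [0, 2*pi)
    eta = math.sqrt(2.0 * H) * math.cos(h)
    xi = math.sqrt(2.0 * H) * math.sin(h)
    return Lambda, eta, lam, xi

# Circular orbit: the perihelion argument g is undefined, yet (eta, xi) = (0, 0)
# whatever value of g is supplied.
circular = delaunay_to_poincare(2.0, 2.0, 0.3, 1.1)

# Slightly eccentric orbit (hypothetical values).
eccentric = delaunay_to_poincare(2.0, 1.5, 0.3, 1.1)
```

The example shows why these variables remove the singularity at zero eccentricity: the ill-defined angle $g$ enters only through the pair $(\eta,\xi)$, which collapses to the origin when $G = L$.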

The scheme of proof of Arnol’d’s theorem in the 
planar, three-body case (one star, n =2 planets) is as 
follows. The Hamiltonian is given by eqn [18] with 
n=2; the phase space is eight dimensional (four 
degrees of freedom). This system, as mentioned several 
times, is properly degenerate and Kolmogorov’s 
theorem cannot be applied directly; furthermore, a 
full (four-dimensional) set of action variables needs 
to be identified. 

A first observation is that, in the planetary model, there are “fast variables” (the $\lambda$’s describing the revolutions of the planets) and “secular variables” (the $\eta$’s and $\xi$’s describing the variations of position and shape of the instantaneous Keplerian ellipses). By averaging theory (see, e.g., Arnol’d (1998)), one can “neglect,” in nonresonant regions, the fast-angle dependence up to high order in $\varepsilon$, obtaining an effective Hamiltonian, which, up to $O(\varepsilon^2)$, is given by the “secular” Hamiltonian

$$H_{\rm sec} := H_0(\Lambda) + \varepsilon \bar H_1(\Lambda,\eta,\xi), \qquad \bar H_1(\Lambda,\eta,\xi) := \int_{\mathbb{T}^2} H_1 \,\frac{d\lambda}{(2\pi)^2} \qquad [20]$$

“Nonresonant region” means, here, an open $\Lambda$-set where $\partial_\Lambda H_0\cdot k \ne 0$ for $k \in \mathbb{Z}^2$, $0 < |k_1| + |k_2| \le K$ and for a suitable $K \ge 1$.

In order to analyze the secular Hamiltonian, we shall briefly consider $\bar H_1$ as a function of the symplectic variables $\eta$ and $\xi$, regarding the “slow actions” $\Lambda_i$ as parameters.

For symmetry reasons, $\bar H_1$ is even in $(\eta,\xi)$ and the point $(\eta,\xi) = (0,0)$ is an elliptic equilibrium for $\bar H_1$: the eigenvalues of the matrix $S\,\partial^2_{(\eta,\xi)} \bar H_1(\Lambda,0,0)$, $S$ being the standard symplectic matrix, are purely imaginary numbers $\{\pm i\Omega_1, \pm i\Omega_2\}$. The real numbers $\{\Omega_j\}$ are symplectic invariants of the secular Hamiltonian and are usually called first (or linear) Birkhoff invariants. In a neighborhood of an elliptic equilibrium, one can use Birkhoff’s normal form theory (see, e.g., Siegel (1971)): if the linear invariants $(\Omega_1,\Omega_2)$ are nonresonant up to order $r$ (i.e., if $\Omega\cdot k := \Omega_1 k_1 + \Omega_2 k_2 \ne 0$ for any $k \in \mathbb{Z}^2$ such that $0 < |k_1| + |k_2| \le r$), then one can find a symplectic transformation $\phi_{\rm Bir}$ so that

$$\bar H_1 \circ \phi_{\rm Bir} = F(J_1,J_2;\Lambda) + o_r, \qquad J_j := \frac{\eta_j^2 + \xi_j^2}{2} \qquad [21]$$

where $F$ is a polynomial of degree $[r/2]$ of the form $\Omega_1 J_1 + \Omega_2 J_2 + \frac12 MJ\cdot J + \cdots$, $M = M(\Lambda)$ being a $(2 \times 2)$ matrix (and $o_r/|J|^{r/2} \to 0$ as $|J| \to 0$). Arnol’d, using computations performed by Le Verrier, checked the nonresonance condition up to order $r = 6$ in the asymptotic regime $a_1/a_2 \to 0$ (where $a_i$ denote the semimajor axes of the approximate Keplerian ellipses of the two planets); these computations represent one of the most delicate parts of the paper.
Thus, combining averaging theory and Birkhoff normal form theory, one can construct a symplectic change of variables defined on an open subset of the phase space (avoiding some linear resonances) $(\Lambda,\lambda,\eta,\xi) \to (\Lambda',\lambda',J,\varphi)$, where $\eta_j + i\xi_j = \sqrt{2J_j}\,\exp(i\varphi_j)$, casting the three-body Hamiltonian into the form

$$H_0(\Lambda') + \varepsilon\big(\Omega(\Lambda')\cdot J + \tfrac12 M(\Lambda')J\cdot J\big) + \varepsilon^2 \tilde F_1(\Lambda',J) + \varepsilon^p \tilde F_2(\Lambda',\lambda',J,\varphi) =: \tilde H_0(\Lambda',J;\varepsilon) + \varepsilon^p \tilde F_2(\Lambda',\lambda',J,\varphi) \qquad [22]$$

for a suitable prefixed order $p \ge 3$; notice that the nonresonance condition needed to apply averaging theory is not particularly hard to check since it involves the unperturbed and completely explicit Kepler Hamiltonian $H_0$. The idea is now to consider $\varepsilon^p \tilde F_2$ as a perturbation of the completely integrable Hamiltonian $\tilde H_0$ and to apply Kolmogorov’s theorem. Finally, one can check Kolmogorov’s nondegeneracy condition, which, since

$$\det \partial^2_{(\Lambda',J)} \tilde H_0(\Lambda',J;\varepsilon) = \varepsilon^2\big((\det H_0'')\det M + O(\varepsilon)\big)$$

amounts to checking the invertibility of the matrix $M$. Such a condition is also checked in Arnol’d (1963b) with the aid of Le Verrier’s tables and in the asymptotic regime $a_1/a_2 \to 0$.


The spatial three-body problem In order to extend 
the previous argument to the spatial case, Arnol’d 
suggested connecting the planar and spatial case 
through a limiting procedure. Such strategy presents 
analytical problems (the symplectic variables for the 
spatial case become singular in the planar limit), 
which have not been overcome. However, the 
particular structure of the three-body case allows 
one to derive a four-degree-of-freedom Hamiltonian, 
to which the proof of the planar case can be easily 
adapted. The procedure described below is based on 
the classical Jacobi’s reduction of the nodes. 

First, we introduce a convenient set of symplectic variables. Let, for $i = 1, 2$, $((L_i,G_i,\Theta_i),(\ell_i,g_i,\theta_i))$ denote the Delaunay variables introduced in items (5) and (6) above: these are the Delaunay variables associated to the two-body system, Sun-$i$th planet. Then, as Poincaré showed, the variables $((\Lambda_i^*,\lambda_i^*),(\eta_i^*,\xi_i^*),(\Theta_i,\theta_i))$, where

$$\Lambda_i^* = L_i, \qquad \lambda_i^* = \ell_i + g_i, \qquad \eta_i^* = \sqrt{2(L_i - G_i)}\cos g_i, \qquad \xi_i^* = -\sqrt{2(L_i - G_i)}\sin g_i \qquad [23]$$

are symplectic and analytic near circular, noncoplanar motions; for a detailed discussion of these and other sets of interesting classical variables, see, for example, Biasco et al. (2003) and references therein; the asterisk is introduced to avoid confusion with a closely related but different set of Poincaré variables (see below). Let us denote by


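$$H_{\rm 3bp} := H^{(0)}(\Lambda^*) + \varepsilon H^{(1)}(\Lambda^*,\lambda^*,\eta^*,\xi^*,\Theta,\theta)$$

the Hamiltonian (eqn [8], with $n = 2$) expressed in terms of the symplectic variables $((\Lambda^*,\lambda^*),(\eta^*,\xi^*),(\Theta,\theta))$, $\Lambda^* = (\Lambda_1^*,\Lambda_2^*)$, etc. Recalling the physical meaning of the Delaunay variables, one realizes that $\Theta_1 + \Theta_2$ is the vertical component, $C_3 = C\cdot k_3$, of the total angular momentum $C = C^{(1)} + C^{(2)}$, where $C^{(i)}$ denotes the angular momentum of the $i$th planet with respect to the origin of an inertial heliocentric frame $\{k_1,k_2,k_3\}$. This suggests introducing the symplectic variables

$$(\Lambda^*,\lambda^*,\eta^*,\xi^*,\Psi,\psi) = \hat\phi(\Lambda^*,\lambda^*,\eta^*,\xi^*,\Theta,\theta)$$

with $(\Psi_1,\Psi_2,\psi_1,\psi_2) = (\Theta_1,\ \Theta_1 + \Theta_2,\ \theta_1 - \theta_2,\ \theta_2)$. Let

$$\mathcal{H}_{\rm 3bp} := H_{\rm 3bp} \circ \hat\phi^{-1}$$

denote the Hamiltonian of the spatial three-body problem in these symplectic variables. Since the Poisson bracket of $\Psi_2 = \Theta_1 + \Theta_2$ and $\mathcal{H}_{\rm 3bp}$ vanishes ($C_3$ being an integral for the $\mathcal{H}_{\rm 3bp}$-flow), the conjugate angle $\psi_2$ is cyclic for $\mathcal{H}_{\rm 3bp}$, that is,

$$\mathcal{H}_{\rm 3bp} = \mathcal{H}_{\rm 3bp}(\Lambda^*,\lambda^*,\eta^*,\xi^*,\Psi_1,\Psi_2,\psi_1)$$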

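Now (because the total angular momentum $C$ is preserved), one may restrict attention to the ten-dimensional invariant (and symplectic) submanifold $M_{\rm ver}$ defined by fixing the total angular momentum to be vertical. Such submanifold is easily described in terms of Delaunay variables; in fact, $C\cdot k_1 = 0 = C\cdot k_2$ is equivalent to

$$\theta_1 - \theta_2 = \pi \qquad \text{and} \qquad G_1^2 - \Theta_1^2 = G_2^2 - \Theta_2^2 \qquad [24]$$

Thus, $M'_{\rm ver} := \hat\phi(M_{\rm ver})$ is given by

$$M'_{\rm ver} = \{\psi_1 = \pi,\ \Psi_1 = \Psi_1(\Lambda^*,\eta^*,\xi^*;\Psi_2)\}$$

with

$$\Psi_1 := \frac{\Psi_2}{2} + \frac{(\Lambda_1^* - H_1^*)^2 - (\Lambda_2^* - H_2^*)^2}{2\Psi_2}, \qquad H_i^* := \frac{(\eta_i^*)^2 + (\xi_i^*)^2}{2}$$

Since $M'_{\rm ver}$ is invariant for the flow $\phi^t$ of $\mathcal{H}_{\rm 3bp}$, $\psi_1(t) \equiv \pi$ and $\dot\psi_1 \equiv 0$ for motions starting on $M'_{\rm ver}$, which implies that $(\partial_{\Psi_1}\mathcal{H}_{\rm 3bp})|_{M'_{\rm ver}} = 0$. This fact allows one to introduce, for fixed values of the vertical angular momentum $\Psi_2 = c \ne 0$, the following reduced Hamiltonian

$$\mathcal{H}^c_{\rm red}(\Lambda^*,\lambda^*,\eta^*,\xi^*) := \mathcal{H}_{\rm 3bp}\big(\Lambda^*,\lambda^*,\eta^*,\xi^*,\Psi_1(\Lambda^*,\eta^*,\xi^*;c),c,\pi\big)$$

on the eight-dimensional phase space $M_{\rm red} := \{\Lambda_i^* > 0,\ \lambda^* \in \mathbb{T}^2,\ (\eta^*,\xi^*) \in B^4\}$ endowed with the standard symplectic form $d\Lambda^* \wedge d\lambda^* + d\eta^* \wedge d\xi^*$ ($B^4$ being a ball around the origin in $\mathbb{R}^4$). In fact, the (standard) Hamilton’s equations for $\mathcal{H}^c_{\rm red}$ are immediately recognized to be a subsystem of the full (standard) Hamilton’s equations for $\mathcal{H}_{\rm 3bp}$ when the initial data are restricted on $M'_{\rm ver}$ and the constant value of $\Psi_2$ is chosen to be $c$. More precisely, if the Hamiltonian flow of $\mathcal{H}^c_{\rm red}$ on $M_{\rm red}$ is denoted by $\phi_c^t$, then

$$\phi^t\big(z^*, \Psi_1(z^*;c), c, \pi, \psi_2\big) = \big(\phi_c^t(z^*), \Psi_1(t), c, \pi, \psi_2(t)\big) \qquad [25]$$

where we have used the shorthand notations: $z^* = (\Lambda^*,\lambda^*,\eta^*,\xi^*) \in M_{\rm red}$; $\Psi_1(t) = \Psi_1 \circ \phi_c^t(z^*)$; $\psi_2(t) = \psi_2 + \int_0^t \partial_{\Psi_2}\mathcal{H}_{\rm 3bp}\big(\phi_c^s(z^*), \Psi_1(s), c, \pi\big)\,ds$. At this point, the scheme used for the planar case may be easily adapted to the present situation. The nondegeneracy conditions have been checked in Robutel (1995) where indications, based on a computer program, have been given for the validity of the theorem in a wider set of initial data.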

Notice that the dimension of the reduced phase 
space of the spatial case is 8, which is also the 
dimension of the phase space of the planar case. 


Therefore, the Lagrangian tori obtained with 
this procedure also have the same dimension as the tori 
obtained in the planar case (i.e., four). 


The general case Consider the general case follow- 
ing the strategy of M Herman as presented by Féjoz 
(2004), to which the reader is referred for complete 
proofs and further references. 

The symplectic variables used in Féjoz (2004), to cope with the spatial planetary $(1+n)$-body problem (Sun and $n$ planets), are closely related to the variables defined in eqn [23]. For $1 \le i \le n$, let $((L_i,G_i,\Theta_i),(\ell_i,g_i,\theta_i))$ denote the Delaunay variables associated with the two-body system, Sun-$i$th planet. Then (as shown by Poincaré) the variables $((\Lambda_i,\lambda_i),(\eta_i,\xi_i),(p_i,q_i))$, where $\Lambda_i = L_i$, $\lambda_i = \ell_i + g_i + \theta_i$, and

$$\eta_i = \sqrt{2(L_i - G_i)}\cos(g_i + \theta_i), \qquad \xi_i = -\sqrt{2(L_i - G_i)}\sin(g_i + \theta_i)$$

$$p_i = \sqrt{2(G_i - \Theta_i)}\cos\theta_i, \qquad q_i = -\sqrt{2(G_i - \Theta_i)}\sin\theta_i \qquad [26]$$

are symplectic and analytic near circular, noncoplanar motions (see, e.g., Biasco et al. (2003)). Let

$$H_{\rm nbp} := H^{(0)}(\Lambda) + \varepsilon H^{(1)}(\Lambda,\lambda,\eta,\xi,p,q) \qquad [27]$$

denote the Hamiltonian (eqn [8]) expressed in terms of the Poincaré symplectic variables $((\Lambda,\lambda),(\eta,\xi),(p,q))$, $\Lambda = (\Lambda_1,\ldots,\Lambda_n)$, etc.

As the number of the planets increases, the 
degeneracies become stronger and stronger. Further- 
more, a clean reduction, such as the reduction of the 
nodes, is no longer available if n > 2. To overcome 
these problems Herman proposed a new approach, 
which is described below. 

Instead of Kolmogorov’s nondegeneracy assumption (which says that the frequency map [13], $I \to \omega(I)$, is a local diffeomorphism), one may consider weaker nondegeneracy conditions. In particular, in Féjoz (2004), one considers nonplanar frequency maps. A smooth curve $u \in A \to \omega(u) \in \mathbb{R}^d$, where $A$ is an open nonempty interval, is called “nonplanar” at $u_0 \in A$ if all the $u$-derivatives up to order $(d-1)$ at $u_0$, $\omega(u_0), \omega'(u_0), \ldots, \omega^{(d-1)}(u_0)$, are linearly independent in $\mathbb{R}^d$; a smooth map $u \in A \subset \mathbb{R}^p \to \omega(u) \in \mathbb{R}^d$, $p \le d$, is called nonplanar at $u_0 \in A$ if there exists a smooth curve $\varphi : \hat A \to A$ such that $\omega \circ \varphi$ is nonplanar at $t_0 \in \hat A$ with $\varphi(t_0) = u_0$. A S Pyartli has proved (see, e.g., Féjoz (2004)) that if the map $u \in A \subset \mathbb{R}^p \to \omega(u) \in \mathbb{R}^d$ is nonplanar at $u_0$, then there exists a neighborhood $B \subset A$ of $u_0$ and a subset $C \subset B$ of full Lebesgue measure (i.e., ${\rm meas}(C) = {\rm meas}(B)$) such that $\omega(u)$ is Diophantine for any $u \in C$. The nonplanarity condition is weaker than Kolmogorov’s nondegeneracy conditions; for example, the map

$$\omega(I) := \partial_I\Big(\frac{I_1^4}{4} + I_1^2 I_2 + I_1 I_3 + I_4\Big) = \big(I_1^3 + 2I_1 I_2 + I_3,\ I_1^2,\ I_1,\ 1\big)$$

violates both Kolmogorov’s nondegeneracy and the isoenergetic nondegeneracy conditions but is nonplanar at any point of the form $(I_1,0,0,0)$, since $\omega(I_1,0,0,0) = (I_1^3, I_1^2, I_1, 1)$ is a nonplanar curve (at any point).
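The three claims made about this example can be verified with exact integer arithmetic. The sketch below is illustrative: the derivative tables are computed by hand from the formula for $\omega(I)$, and the helper names are assumptions. It checks that the curve $t \to (t^3, t^2, t, 1)$ has linearly independent derivatives up to order 3, that the Hessian of the underlying Hamiltonian is singular, and that the bordered (isoenergetic) matrix is singular as well.

```python
def det(matrix):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0
    for j, entry in enumerate(matrix[0]):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def curve_derivatives(t):
    """omega(t) = (t^3, t^2, t, 1) together with its first three t-derivatives."""
    return [
        [t ** 3, t ** 2, t, 1],
        [3 * t ** 2, 2 * t, 1, 0],
        [6 * t, 2, 0, 0],
        [6, 0, 0, 0],
    ]

def omega(I1, I2, I3, I4):
    """Frequency map omega(I) = (I1^3 + 2 I1 I2 + I3, I1^2, I1, 1)."""
    return [I1 ** 3 + 2 * I1 * I2 + I3, I1 ** 2, I1, 1]

def hessian(I1, I2, I3, I4):
    """Jacobian of omega, i.e. the Hessian of I1^4/4 + I1^2 I2 + I1 I3 + I4."""
    return [
        [3 * I1 ** 2 + 2 * I2, 2 * I1, 1, 0],
        [2 * I1, 0, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]

def bordered(I):
    """(d+1) x (d+1) isoenergetic matrix: Hessian bordered by the frequency vector."""
    w = omega(*I)
    return [row + [w[i]] for i, row in enumerate(hessian(*I))] + [w + [0]]

point = (2, 0, 0, 0)                              # a point of the form (I1, 0, 0, 0)
nonplanar_det = det(curve_derivatives(point[0]))  # nonzero: derivatives independent
kolmogorov_det = det(hessian(*point))             # zero: Kolmogorov condition fails
isoenergetic_det = det(bordered(point))           # zero: isoenergetic condition fails
```

The determinant of the derivative matrix does not depend on $t$, which reflects the statement that the curve is nonplanar at every point.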

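As in the three-body case, the frequency map is that associated with the averaged secular Hamiltonian

$$H_{\rm sec} := H^{(0)}(\Lambda) + \varepsilon \bar H^{(1)}(\Lambda,\eta,\xi,p,q), \qquad \bar H^{(1)}(\Lambda,\eta,\xi,p,q) := \int_{\mathbb{T}^n} H^{(1)}\,\frac{d\lambda}{(2\pi)^n} \qquad [28]$$

which has an elliptic equilibrium at $\eta = \xi = p = q = 0$ (as above, $\Lambda$ is regarded as a parameter). It is a remarkable, well-known fact that the quadratic part of $\bar H^{(1)}$ does not contain “mixed terms,” namely,

$$\bar H^{(1)} = \bar H^{(1)}_0 + \big(Q_{\rm pln}\,\eta\cdot\eta + Q_{\rm pln}\,\xi\cdot\xi + Q_{\rm spt}\,p\cdot p + Q_{\rm spt}\,q\cdot q\big) + O_4 \qquad [29]$$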

where the function $\bar H^{(1)}_0$ and the symmetric matrices $Q_{\rm pln}$ and $Q_{\rm spt}$ depend upon $\Lambda$ while $O_4$ denotes terms of order 4 in $(\eta,\xi,p,q)$. The eigenvalues of the matrices $Q_{\rm pln}$ and $Q_{\rm spt}$ are the first Birkhoff invariants of $\bar H^{(1)}$ (with respect to the symplectic variables $(\eta,\xi,p,q)$). Let $\sigma_1,\ldots,\sigma_n$ and $\varsigma_1,\ldots,\varsigma_n$ denote, respectively, the eigenvalues of $Q_{\rm pln}$ and $Q_{\rm spt}$; then the frequency map for the $(1+n)$-body problem will be defined as (recall eqn [18])

$$\Lambda \to (\hat\omega, \varepsilon\Omega) \qquad [30]$$

with

$$\hat\omega := \Big(\frac{\kappa_1}{\Lambda_1^3},\ldots,\frac{\kappa_n}{\Lambda_n^3}\Big), \qquad \Omega := (\sigma,\varsigma) := \big((\sigma_1,\ldots,\sigma_n),(\varsigma_1,\ldots,\varsigma_n)\big) \qquad [31]$$

Herman pointed out, however, that the frequencies $\sigma$ and $\varsigma$ satisfy two independent linear relations, namely (up to renumbering the indices),

$$\varsigma_n = 0, \qquad \sum_{i=1}^n (\sigma_i + \varsigma_i) = 0 \qquad [32]$$

which clearly prevents the frequency map from being nonplanar; the second relation in eqn [32] is usually called “Herman resonance” (while the first relation is a well-known consequence of rotation invariance).

The degeneracy due to rotation invariance may be easily taken care of by considering (as in the three-body case) the $(6n-2)$-dimensional invariant symplectic manifold $M_{\rm ver}$, defined by taking the total angular momentum $C$ to be vertical, that is, $C\cdot k_1 = 0 = C\cdot k_2$. But, when $n > 2$, Jacobi’s reduction of the nodes is no longer available and, to get rid of the second degeneracy (Herman’s resonance), the authors bring in a nice trick, originally due (once more!) to Poincaré. In place of considering $H_{\rm nbp}$ restricted on $M_{\rm ver}$, Féjoz considers the modified Hamiltonian

$$H^\delta_{\rm nbp} := H_{\rm nbp} + \delta C_3, \qquad C_3 := C\cdot k_3 = |C| \qquad [33]$$


where $\delta \in \mathbb{R}$ is an extra artificial parameter. By an analyticity argument, it is then possible to prove that the (rescaled) frequency map

$$(\Lambda,\delta) \to (\hat\omega,\sigma,\varsigma)$$

of $H^\delta_{\rm nbp}$ is nonplanar on an open dense set of full measure and this is enough to find a positive measure set of Lagrangian maximal $(3n-1)$-dimensional invariant tori for $H^\delta_{\rm nbp}$; but, since $H^\delta_{\rm nbp}$ and $H_{\rm nbp}$ commute, a classical Lagrangian intersection argument allows one to conclude that such tori are invariant also for $H_{\rm nbp}$, yielding the complete proof of Arnol’d’s theorem in the general case. Notice that this argument yields $(3n-1)$-dimensional tori, which in the three-body case means five dimensional. Instead, the tori found in the section “The spatial three-body problem” are four dimensional. The point is that in the reduced phase space, the motion of the nodeline (denoted as $\psi_2(t)$ in eqn [25]) does not appear.

We conclude this discussion by mentioning that the KAM theory used in Féjoz (2004) is a modern and elegant function-theoretic reformulation of the classical theory and is based on a $C^\infty$ local inversion theorem (F Sergeraert and R Hamilton) on “tame” Fréchet spaces (which, in turn, is related to the Nash-Moser implicit function theorem; see Bost (1984-85)).


Lower Dimensional Tori 


The maximal tori for the many-body problems 
described above are found near the elliptic equilibria 
given by the decoupled Keplerian motions. It is 
natural to ask what happens to such elliptic 
equilibria when the interaction among planets is 
taken into account. Even though no complete 
answer has yet been given to such a question, it 


appears that, in general, the Keplerian elliptic 
equilibria “bifurcate” into elliptic n-dimensional 
tori. This section presents a short and nontechnical 
account of the existing results on the matter (the 
general theory of lower-dimensional tori is, mainly, 
due to J K Moser and S M Graff for the hyperbolic 
case and V K Melnikov, H Eliasson, and S B Kuksin 
for the technically more difficult elliptic case; for 
references, see, e.g., Chierchia et al. (2004)). 

The normal form of a Hamiltonian admitting an $n$-dimensional elliptic invariant torus $\mathcal{T}$ of energy $E$, proper frequencies $\hat\omega \in \mathbb{R}^n$, and “normal frequencies” $\Omega \in \mathbb{R}^p$ in a $2d$-dimensional phase space with $d = n + p$ is given by

$$N := E + \hat\omega\cdot y + \sum_{j=1}^p \Omega_j\,\frac{\eta_j^2 + \xi_j^2}{2} \qquad [34]$$

Here the symplectic form is given by $dy \wedge dx + d\eta \wedge d\xi$, $y \in \mathbb{R}^n$, $x \in \mathbb{T}^n$, $(\eta,\xi) \in \mathbb{R}^{2p}$; $\mathcal{T}$ is then given by $\mathcal{T} := \{y = 0\} \times \{\eta = \xi = 0\}$. Under suitable assumptions, a set of such tori persists under the effect of a small enough perturbation $P(y,x,\eta,\xi)$. Clearly, the union of the persistent tori (if $n < d$) forms a set of zero measure in phase space; however, in general, $n$-parameter families persist.

In the many-body case considered in this article, the proper frequencies are the Keplerian frequencies given by the map $\Lambda \to \hat\omega(\Lambda)$ (eqn [31]), which is a local diffeomorphism of $\mathbb{R}^n$. The normal frequencies $\Omega$, instead, are proportional to $\varepsilon$ and are the first Birkhoff invariants around the elliptic equilibria as discussed above. Under these circumstances, the main nondegeneracy hypothesis needed to establish the persistence of the Keplerian $n$-dimensional elliptic tori boils down to the so-called Melnikov condition:

$$\Omega_j \ne 0 \ne \Omega_i - \Omega_j, \qquad \forall\, i \ne j \qquad [35]$$


Such condition has been checked for the planar 
three-body case in Féjoz (2002), for the spatial 
three-body case in Biasco et al. (2003) and for the 
planar n-body case in Biasco et al. (2004). The 
general spatial case is still open: in fact, while it is 
possible to establish lower-dimensional elliptic tori 
for the modified Hamiltonian $H^\delta_{\rm nbp}$ in [33], it is not 
clear how to conclude the existence of elliptic tori 
for the actual Hamiltonian Hnbp since the argument 
used above works only for Lagrangian (maximal) 
tori; on the other hand, the direct asymptotics 
techniques used in Biasco et al. (2003) do not 
extend easily to the general spatial case. 

Clearly, the lower-dimensional tori described in 
this section are not the only ones that arise in 
n-body dynamics. For more lower-dimensional tori 
in the planar three-body case, see Féjoz (2002). 


Physical Applications 


The above results show that, in principle, there may 
exist “stable planetary systems” exhibiting quasiper- 
iodic motions around coplanar, circular Keplerian 
trajectories — in the Newtonian many-body approx- 
imation — provided the masses of the planets are 
much smaller than the mass of the central star. 

A quite different question is: in the Newtonian
many-body approximation, is the solar system or,
more generally, a solar subsystem stable?

Clearly, even a precise mathematical reformula- 
tion of such a question might be difficult. However, 
it might be desirable to develop a mathematical 
theory for important physical models, taking into 
account observed parameter values. 

As a very preliminary step in this direction, consider
one of the results of Celletti and Chierchia (see Celletti 
and Chierchia (2003), and references therein). 

In Celletti and Chierchia (2003), the (isolated) 
subsystem formed by the Sun, Jupiter, and asteroid 
Victoria (one of the main objects in the Asteroidal 
belt) is considered. Such a system is modeled by an 
order-10 Fourier truncation of the RPC3BP, whose 
Hamiltonian has been described in the section 
“Kolmogorov’s theorem and the RPC3BP (1954).” 
The Sun–Jupiter motion is therefore approximated by
a circular one, the asteroid Victoria is considered 
massless, and the motions of the three bodies are 
assumed to be coplanar; the remaining orbital 
parameters (Jupiter/Sun mass ratio, which is approx- 
imately 1/1000; eccentricity and semimajor axis of the 
osculating Sun—Victoria ellipse; and “energy” of the 
system) are taken to be the actually observed values. 
For such a system, it is proved that there exists an 
invariant region, on the observed fixed energy level, 
bounded by two maximal two-dimensional Kolmo- 
gorov tori, trapping the observed orbital parameters of 
the osculating Sun—Victoria ellipse. 

As mentioned above, the proof of this result is 
computer assisted: a long series of algebraic compu- 
tations and estimates is performed on computers, 
keeping a rigorous track of the numerical errors 
introduced by the machines.


Introduction


In most physical cases, the evolution of a system of N
indistinguishable interacting particles $X_N = (x_1, x_2, \ldots, x_N)$
with velocities $V_N = (v_1, v_2, \ldots, v_N)$ is described by
a Hamiltonian system

$$\frac{dX_N}{dt} = \frac{\partial H(X_N,V_N)}{\partial V_N}, \qquad \frac{dV_N}{dt} = -\frac{\partial H(X_N,V_N)}{\partial X_N} \qquad [1]$$

in the phase space $\mathbb{R}^{dN} \times \mathbb{R}^{dN}$. When N becomes
large, it is natural to consider replacing the above
discrete phase space by a continuous phase space
$\mathbb{R}^d_x \times \mathbb{R}^d_v$ of dimension $1 \le d \le 3$, and to introduce
a measure f(x,v,t) that describes the density of
particles which, at the point $x \in \mathbb{R}^d$ and at time t,
have velocity v. This measure may also be
interpreted as a generalization of the empirical
measure

$$\mu_N(x,v,t) = \frac{1}{N} \sum_{1\le i\le N} \delta_{x_i(t)} \otimes \delta_{v_i(t)}$$

defined in the phase space $\mathbb{R}^d_x \times \mathbb{R}^d_v$ by the above
system of N particles. In this way, one constructs a
link between the microscopic and the macroscopic
descriptions. The macroscopic physical quantities
are, for instance, the first moments of this density:

$$\rho(x,t) = \int_{\mathbb{R}^d_v} f(x,v,t)\,dv \quad \text{(density)}$$

$$\rho(x,t)\,u(x,t) = \int_{\mathbb{R}^d_v} v\,f(x,v,t)\,dv \quad \text{(momentum)}$$

$$\rho(x,t)\,E(x,t) = \int_{\mathbb{R}^d_v} \frac{|v|^2}{2}\,f(x,v,t)\,dv \quad \text{(energy)}$$
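The link between the empirical measure and the first moments can be sketched numerically. The following is a minimal illustration, not taken from the article: the function name `empirical_moments` and the sample velocities are hypothetical, and all particles are assumed to sit in one small cell of x-space, so that only the velocity integrals are evaluated.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def empirical_moments(velocities: List[Vector]) -> Tuple[float, List[float], float]:
    """Moments of the empirical measure (1/N) * sum_i delta_{v_i}.

    Returns (rho, rho*u, rho*E): total mass, momentum and kinetic energy.
    Assumption: every particle lies in the same cell of x-space.
    """
    n = len(velocities)
    dim = len(velocities[0])
    rho = 1.0  # N particles, each of mass 1/N
    momentum = [sum(v[k] for v in velocities) / n for k in range(dim)]
    energy = sum(0.5 * sum(c * c for c in v) for v in velocities) / n
    return rho, momentum, energy


# Four hypothetical particles in two velocity dimensions.
rho, rho_u, rho_E = empirical_moments(
    [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0), (0.0, 0.0)]
)
```

With a continuous density f in place of the sum of Dirac masses, the same three averages become the integrals listed above.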


Kinetic theory studies the intermediate stage shown 
in Figure 1. 

Hamiltonian systems ⇒ Kinetic equations ⇒ Macroscopic equations

Figure 1 Illustration of the role of kinetic equations in linking
microscopic and macroscopic properties.

Its first successes were related to classical thermodynamics
and in particular to the molecular hypothesis.
The contributions of Maxwell (1860, 1872)
and of Boltzmann (1867) led to the “Boltzmann”
equation, described in the companion article of
Mario Pulvirenti (see Boltzmann Equation (Classical
and Quantum)). In 1905, Lorentz used the same 
point of view to describe the motion of electrons in a 
metal. However, the different physical context leads 
to some basic differences between the Boltzmann 
equation and the Lorentz equation. The Boltzmann 
equation is derived under the assumption that the 
driving forces result from collisions between pairs of 
molecules. Therefore, the problem is nonlinear with 
a quadratic nonlinearity. In the Lorentz model the 
driving force is the interaction of the electrons with 
the atoms of the metal, which remain fixed. 
Collisions between electrons are ignored, so that 
the Lorentz equation is linear. 

The most general form of a kinetic equation is as 
follows: 


$$\partial_t f(x,v,t) + \nabla_v H \cdot \nabla_x f(x,v,t) - \nabla_x H \cdot \nabla_v f(x,v,t) = C(f) \qquad [2]$$


The term C(f) represents the effect of interactions
either between particles or with the background.
Without this term, eqn [2] reduces to the
classical Liouville equation

$$\partial_t f(x,v,t) + \nabla_v H \cdot \nabla_x f(x,v,t) - \nabla_x H \cdot \nabla_v f(x,v,t) = 0 \qquad [3]$$


which says that the function f is transported by the
flow of the Hamiltonian H(x,v). This Hamiltonian
depends on the model and may involve the unknown
function f itself. In the simplest case $H(x,v) = |v|^2/2$,
eqn [3] and its solutions are given by

$$\partial_t f(x,v,t) + v \cdot \nabla_x f(x,v,t) = 0, \qquad f(x,v,t) = f(x - vt, v, 0) \qquad [4]$$
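The explicit solution of the free transport equation can be checked numerically. The sketch below works in one space dimension with a hypothetical Gaussian initial datum `f0` (an assumption made only for illustration) and evaluates the residual of the transport equation by centered finite differences.

```python
import math


def f0(x: float, v: float) -> float:
    """Hypothetical smooth initial datum f(x, v, 0)."""
    return math.exp(-x * x) * math.exp(-0.5 * v * v)


def f(x: float, v: float, t: float) -> float:
    """Free-streaming solution: the datum is carried along straight lines."""
    return f0(x - v * t, v)


def transport_residual(x: float, v: float, t: float, h: float = 1e-5) -> float:
    """Centered finite-difference approximation of d_t f + v * d_x f."""
    dt = (f(x, v, t + h) - f(x, v, t - h)) / (2 * h)
    dx = (f(x + h, v, t) - f(x - h, v, t)) / (2 * h)
    return dt + v * dx


residual = transport_residual(0.3, 1.7, 0.8)
```

The residual vanishes up to discretization and rounding error, which reflects the fact that f is constant along the characteristics x = x0 + vt.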


Nowadays kinetic equations appear in a variety of 
sciences and applications, such as astrophysics, 
aerospace engineering, nuclear engineering, particle- 
fluid interactions, semiconductor technology, social 
sciences, and biology, for example in chemotaxis 
and immunology. 

They are used first to model phenomena and then 
to obtain a qualitative and quantitative description 
of situations involving sufficiently many particles so 
as to prohibit any computation at the level of 
particles, and yet the medium is still too rarefied to 
allow the use of macroscopic equations. As detailed 
in the next section, a macroscopic description 
requires that the function f(x,v,t) be close to local 
thermodynamical equilibrium. For classical and 
quantum Boltzmann equations (see Boltzmann 


Equation (Classical and Quantum)) these equilibria 
are either Maxwellian, Bose-Einstein, or Fermi- 
Dirac distributions. 

Several effects, especially the influence of the 
boundary, may prevent the system from reaching 
local thermodynamical equilibrium and, therefore, 
even in macroscopic descriptions, kinetic equations 
may still be used to take into account the effect of 
the boundary. In this case, the term “Knudsen 
boundary layer” is currently used. 

Finally, one should keep in mind that there exist 
some macroscopic phenomena which cannot be 
deduced from the corresponding microscopic phys- 
ics by the mediation of a kinetic equation. Once 
again, returning to the companion article (see 
Boltzmann Equation (Classical and Quantum)) one 
observes that, since the only equilibria are Maxwel- 
lian, the macroscopic equations are those describing 
perfect gases. A real gas with a nontrivial van der 
Waals law is “too dense” to be explained by this 
theory. The alternative seems to go directly from the 
microscopic direction to the macroscopic descrip- 
tion. This is a subject which is still under investiga- 
tion and for which the reader may consult Olla et al. 
(1993). 


Kinetic Equations, Entropy, and Irreversibility


At the level of particles, the basic laws of physics are 
reversible. Yet these same laws are not reversible 
when seen at the level of a macroscopic description. 
This lack of reversibility is measured by the decay of 
entropy (mathematicians prefer convex functions; 
therefore, the mathematical entropy considered in 
this contribution is the negative of the physical 
entropy, and with irreversibility it decays). The 
kinetic equations lie in between, as shown in 
Figure 1; the decay of entropy should appear along 
one of the two arrows of this diagram. 

Since the appearance of irreversibility is related to 
loss of information and averaging, it should be 
driven by a “mixing” process. 

In general two mechanisms are responsible for 
such effects: 


1. an ergodic or a relaxation mechanism by which a 
process averages itself; and 

2. the introduction of some external random param- 
eter. Observable quantities are then defined as 
averages over that parameter. 


It seems important to compare these two “pro- 
cesses.” This will be illustrated below with the most 
classical examples of the theory.


The Diffusion Limit for the Neutron Transport Equation


Equations very similar to the one introduced by 
Lorentz are used to describe the interaction of neutrons 
with atoms in a nuclear reactor: this is the reason why 
these types of equations are often called neutron 
transport equations. An important issue is the derivation
of a macroscopic diffusion equation. Assuming
that neutrons are not subject to acceleration effects,
considering the problem with constant modulus of
velocity ($|v| = 1$), and introducing a “small” parameter $\varepsilon$
which here corresponds to the absorption of the
medium, one can study the following simplified model:

$$\varepsilon\,\partial_t f_\varepsilon + v\cdot\nabla_x f_\varepsilon + \frac{\sigma(x)}{\varepsilon}\Big(f_\varepsilon - \int_{|v'|=1} k(v,v')\,f_\varepsilon(v')\,dv'\Big) = 0 \qquad [5]$$


In [5] one assumes, for the kernel k(v,v'), the
following properties:

$$\forall\, v, v': \quad k(v,v') = k(v',v), \quad 0 < k(v,v'), \quad \int_{|v'|=1} k(v,v')\,dv' = 1 \qquad [6]$$


and denotes by K the operator

$$f \mapsto Kf = \int_{|v'|=1} k(v,v')\,f(v')\,dv'$$

In the simplest case (say without boundary) eqn [5]
is well-posed both for positive and negative time,
but hypothesis [6] has the following important
consequences:


1. For positive time, it defines, for each $\varepsilon > 0$, a
contraction semigroup in any $L^p$ space and, therefore,
the sequence of solutions or a subsequence
thereof converges, say weakly, to a limit f(x,v,t).

2. One also observes that $v \mapsto 1$ is (up to a multiplicative
constant) the only solution of the equation

$$f - Kf = f(v) - \int_{|v'|=1} k(v,v')\,f(v')\,dv' = 0 \qquad [7]$$

Therefore, the $\varepsilon^{-1}$ in front of the collision term
forces the limit f(x,v,t) to be independent of v.
In this simple problem, this is the thermodynamical
equilibrium.


Dividing by $\varepsilon$ and integrating over $|v| = 1$ gives the
relation

$$\partial_t \int_{|v|=1} f_\varepsilon(x,v,t)\,dv + \frac{1}{\varepsilon}\,\nabla_x\cdot\int_{|v|=1} v\,f_\varepsilon(x,v,t)\,dv = 0 \qquad [8]$$

The Fredholm alternative now implies the
existence and uniqueness of a function $v \mapsto D(v)$ such
that

$$(I - K)D(v) = v, \qquad \int_{|v|=1} D(v)\,dv = 0 \qquad [9]$$

Multiply eqn [5] by D(v) and integrate over $|v| = 1$ to
obtain

$$\lim_{\varepsilon\to 0} \frac{\sigma(x)}{\varepsilon}\int_{|v|=1} v\,f_\varepsilon\,dv = \lim_{\varepsilon\to 0} \frac{\sigma(x)}{\varepsilon}\int_{|v|=1} \big((I-K)D\big)(v)\,f_\varepsilon\,dv = \lim_{\varepsilon\to 0} \frac{\sigma(x)}{\varepsilon}\int_{|v|=1} D(v)\,(I-K)f_\varepsilon\,dv = -\lim_{\varepsilon\to 0} \nabla_x\cdot\int_{|v|=1} D(v)\otimes v\,f_\varepsilon(x,v,t)\,dv \qquad [10]$$

Since the operator (I − K) is self-adjoint non-negative,
with 0 as the leading eigenvalue, the
matrix

$$\mathbb{D} = \frac{1}{|S^{d-1}|}\int_{|v|=1} D(v)\otimes v\,dv$$

(where $|S^{d-1}|$ denotes the total mass of dv on the unit sphere)
is positive definite, and one finally obtains the
diffusion equation

$$\partial_t f - \nabla_x\cdot\Big(\frac{\mathbb{D}}{\sigma(x)}\,\nabla_x f\Big) = 0 \qquad [11]$$
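As a worked illustration, consider the isotropic kernel, an extra assumption not made in the text. Write D(v) for the vector-valued function provided by the Fredholm alternative, that is, the solution of (I − K)D = v with zero average on the unit sphere, and $|S^{d-1}|$ for the total mass of dv. Under this assumption the diffusion matrix can be computed by hand:

```latex
% Assumption (illustration only): isotropic scattering kernel.
\begin{align*}
k(v,v') &= \frac{1}{|S^{d-1}|}
  \;\Longrightarrow\;
  Kf = \frac{1}{|S^{d-1}|}\int_{|v'|=1} f(v')\,dv' , \qquad Kv = 0, \\
(I-K)\,v &= v
  \;\Longrightarrow\;
  D(v) = v , \qquad \int_{|v|=1} D(v)\,dv = 0, \\
\frac{1}{|S^{d-1}|}\int_{|v|=1} D(v)\otimes v\,dv
  &= \frac{1}{|S^{d-1}|}\int_{|v|=1} v\otimes v\,dv
   = \frac{1}{d}\,I , \\
\partial_t f - \nabla_x\cdot\Big(\frac{1}{d\,\sigma(x)}\,\nabla_x f\Big) &= 0 .
\end{align*}
```

In dimension d = 3 the diffusion coefficient is 1/(3σ), the familiar value of the diffusion approximation in radiative transfer.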


The above derivation is an example of what is called 
the “moments method.” It is implicit even in the 
papers of Maxwell. It has been systematically used 
in several domains: 


• To understand the relation between the Boltzmann
equation and the Euler and Navier-Stokes
equations (Golse 2005);

• To compute the critical size of a nuclear assembly.
One shows that this size is well approximated by
the size of the domain for which the Laplacian,
with appropriate boundary conditions, has leading
eigenvalue 0. It is for the spectral analysis of
this problem that the averaging lemma (see the
section “Some specific mathematical tools”) was
derived.

• To analyze the macroscopic limit for the solution
of the radiative transfer equations, which describe
the propagation of the intensity of photons in a
large class of phenomena ranging from stellar
atmospheres to the cooling of glass, including
optical tomography in biomedical imaging. In a
simplified form, the so-called “grey model,” these
equations can be reduced to

$$\varepsilon\,\partial_t I_\varepsilon(x,v,t) + v\cdot\nabla_x I_\varepsilon(x,v,t) + \frac{1}{\varepsilon}\,\sigma\Big(\frac{1}{4\pi}\int_{|v'|=1} I_\varepsilon(x,v',t)\,dv'\Big)\Big(I_\varepsilon(x,v,t) - \frac{1}{4\pi}\int_{|v'|=1} I_\varepsilon(x,v',t)\,dv'\Big) = 0 \qquad [12]$$

In contrast to the previous example, the problem
is, in many cases, nonlinear. The opacity $\sigma$ is a
positive function that depends on the intensity
through the average

$$\tilde{I}_\varepsilon(x,t) = \frac{1}{4\pi}\int_{|v'|=1} I_\varepsilon(x,v',t)\,dv'$$

and which goes to $\infty$ with $\tilde{I}_\varepsilon$ going to zero. The
moments method can be applied with the averaging
lemma, and one shows that the limit of $I_\varepsilon$ is
a function I that is independent of v and satisfies
the following degenerate parabolic equation:

$$\partial_t I - \nabla_x\cdot\Big(\frac{1}{3\sigma(I)}\,\nabla_x I\Big) = 0 \qquad [13]$$

This equation is similar to the one obtained in the
description of porous media and contains the
following information: for initial data I(x,0) with
compact support, in contrast to the behavior of
solutions of the standard diffusion equation, the
solution I(x,t) remains compactly supported in x.
The boundary of this support is the thermal front
and for a finite time, up to saturation (by water in
porous media, by reacted deuterium in laser-confined
fusion), this front remains fixed.


What made the analysis of the above macroscopic
limit simple was the existence of an $\varepsilon$-dependent
process which, for vanishing $\varepsilon$, forces the solution to
converge to a “thermodynamical” equilibrium. The
irreversibility was already present in the first arrow 
of Figure 1. This is what made the analysis of the 
second arrow simple. The subtleties of the appear- 
ance of the irreversibility in the first arrow may be 
well explained by the next examples. 


The Linear Billiard Model 


In the absence of an external electric field, the model 
proposed by Lorentz could be viewed as a limit of a 
system of particles evolving freely between spherical 
obstacles and reflecting on these obstacles according 
to the law of geometric optics. Along these lines, 
two types of results have been proved in two space 
variables. 


In 1973, Gallavotti considered the case where the 
obstacles are randomly spaced under a Poisson 
configuration and proved the following theorem: 


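Theorem 1 Consider obstacles (balls) of radius $\varepsilon$
and center $c_i$. Assume that the probability of finding
exactly N such obstacles in a bounded measurable
set $A \subset \mathbb{R}^2$ is given by the “Poisson law”

$$P(dc_N) = e^{-\mu_\varepsilon |A|}\,\frac{\mu_\varepsilon^N}{N!}\,dc_1\,dc_2 \cdots dc_N \qquad [14]$$

with

$$c_N = (c_1, c_2, \ldots, c_N) \quad \text{and} \quad \mu_\varepsilon = \frac{\mu}{\varepsilon} \qquad [15]$$

Denote by $E^\varepsilon$ the expectation with respect to the
above Poisson distribution. For given $\varepsilon$ and $c_N$
introduce

$$\Omega_{c_N,\varepsilon} = \mathbb{R}^2 \setminus \bigcup_{1\le i\le N} \{|x - c_i| \le \varepsilon\} \qquad [16]$$

and $f_{c_N,\varepsilon}$, the solution of the problem

$$\partial_t f_{c_N,\varepsilon}(x,v,t) + v\cdot\nabla_x f_{c_N,\varepsilon}(x,v,t) = 0 \quad \text{in } \Omega_{c_N,\varepsilon} \times S^1 \qquad [17]$$

with specular reflection on the boundary and
v-independent initial data:

$$f_{c_N,\varepsilon}(x,v,0) = \phi(x) \quad \text{in } \Omega_{c_N,\varepsilon} \times S^1 \qquad [18]$$

Then

$$E^\varepsilon\big[f_{c_N,\varepsilon}(x,v,t)\big] \qquad [19]$$

converges weakly for t > 0 to the solution of the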
transport equation 


Of (x, v, t) +u-VF(x,v,t) + py 2f tx v) 
-3 [few - vid = [20] 


$$f(x,v,0) = \phi(x) \quad \text{in } \mathbb{R}^2 \times S^1 \qquad [21]$$

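The scaling in Theorem 1 (obstacle radius ε, obstacle density μ/ε) keeps the mean free path fixed as ε goes to zero. A minimal sketch of this bookkeeping follows; the function names are hypothetical, and the region swept by a particle over a length ℓ is approximated by a tube of area 2εℓ, neglecting its end caps.

```python
import math


def prob_exactly_n(density: float, area: float, n: int) -> float:
    """Poisson law integrated over A^n: exp(-density*|A|) * (density*|A|)^n / n!."""
    return math.exp(-density * area) * (density * area) ** n / math.factorial(n)


def free_flight_survival(mu: float, eps: float, length: float) -> float:
    """Probability that no obstacle centre lies in the tube of half-width eps
    around a straight path of the given length (end caps neglected)."""
    tube_area = 2.0 * eps * length
    return prob_exactly_n(mu / eps, tube_area, 0)


# The survival probability exp(-2*mu*length) does not depend on eps.
s_coarse = free_flight_survival(mu=2.0, eps=0.1, length=1.5)
s_fine = free_flight_survival(mu=2.0, eps=0.001, length=1.5)
total = sum(prob_exactly_n(3.0, 2.0, n) for n in range(60))
```

In this approximation the free path length is exponentially distributed with mean 1/(2μ), which is the kind of finite mean free path that a collisional kinetic equation encodes.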
The situation is completely different when the 
obstacles are periodically spaced, a situation which 
seems closer to Lorentz’s original idea. Golse (2003) 
(and previous contributions quoted in this article) 
obtained the following result: 


Theorem 2 Assume that the obstacles are periodically
spaced and conveniently scaled, defining the
domain

$$\Omega_\varepsilon = \mathbb{R}^2 \setminus \bigcup_{k\in\mathbb{Z}^2} \{x : |x - \varepsilon k| \le \varepsilon^2\} \qquad [22]$$

Then there exists a family of continuous uniformly
bounded initial data such that no subsequence
extracted from the family of solutions of

$$\partial_t f_\varepsilon + v\cdot\nabla_x f_\varepsilon = 0 \quad \text{in } \Omega_\varepsilon \times S^1 \qquad [23]$$

with specular reflections on the boundary, converges
to solutions of equations of the type [20].


This pathology is related to the existence of 
particles that can travel freely for a very long time 
before meeting the obstacles, and the proof with 
some arithmetic (Diophantine approximations and 
continued fractions) relies on the analysis of such 
trajectories. 

A comparison between Theorems 1 and 2
shows that the ergodic property of the free flow on 
the periodic lattice is not strong enough to lead to a 
collisional kinetic equation unless some complemen- 
tary randomness is introduced. 

The examples of this section should be compared 
with the rigorous derivation of the Boltzmann 
equation by Lanford (see Boltzmann Equation 
(Classical and Quantum)). The reader should 
observe that this derivation corresponds to the 
same type of scaling (finite mean free path). 
However, no extra randomness is needed in this 
case. The proof uses the fact that configurations 
leading only to a finite number of binary collisions 
are of full measure. This corresponds to an 
ergodicity property which is enforced by the fact 
that the problem is genuinely nonlinear. 


Mean-Field Scaling and Vlasov Equations 


The neutron transport equation is devoted to the 
interaction with obstacles and the Boltzmann 
equation to binary collisions. A simpler situation 
from the mathematical point of view corresponds 
to the case where each particle is under the action 
of the average of all other particles. Then the name 
“mean field limit” is used. The simplest example is 
the derivation of a Vlasov-type equation from a
system of N classical particles interacting with a $C^2$
potential V(|x|). The following Hamiltonian is
used:

$$H_N(x_1,\ldots,x_N,v_1,\ldots,v_N) = \sum_{1\le k\le N} \frac{|v_k|^2}{2} + \frac{1}{2N}\sum_{1\le l\neq k\le N} V(|x_l - x_k|) \qquad [24]$$

and the name mean-field scaling is related to the
factor $N^{-1}$ before the potential. Assuming that the
particles are indistinguishable, one introduces
the joint probability density $F_N = F_N(x_1,\ldots,x_N, v_1,\ldots,v_N)$
in the N-particle phase space, which
satisfies the Liouville equation

$$\partial_t F_N + \{H_N, F_N\} := \partial_t F_N + \sum_{1\le k\le N} v_k\cdot\nabla_{x_k}F_N - \frac{1}{N}\sum_{1\le l\neq k\le N} \nabla_{x_k}V(|x_k - x_l|)\cdot\nabla_{v_k}F_N = 0 \qquad [25]$$
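The effect of the mean-field factor 1/N on the particle dynamics can be made concrete. In the sketch below, the potential V(r) = exp(−r²) and the function names are hypothetical choices made only for illustration; the force on particle k is −(1/N) Σ_{l≠k} ∇V(|x_k − x_l|), as obtained from the Hamiltonian with the 1/(2N) prefactor in front of the double sum.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def dV(r: float) -> float:
    """Radial derivative of the hypothetical potential V(r) = exp(-r^2)."""
    return -2.0 * r * math.exp(-r * r)


def mean_field_forces(xs: List[Point]) -> List[Point]:
    """Force on each particle k: -(1/N) * sum over l != k of grad V(|x_k - x_l|)."""
    n = len(xs)
    forces = []
    for k, (xk, yk) in enumerate(xs):
        fx = fy = 0.0
        for l, (xl, yl) in enumerate(xs):
            if l == k:
                continue
            dx, dy = xk - xl, yk - yl
            r = math.hypot(dx, dy)
            if r == 0.0:
                continue  # for this smooth V the gradient vanishes at r = 0
            g = dV(r) / r
            fx -= g * dx / n
            fy -= g * dy / n
        forces.append((fx, fy))
    return forces


forces = mean_field_forces([(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)])
total = (sum(f[0] for f in forces), sum(f[1] for f in forces))
```

Each particle feels an average over the other N − 1 particles, so individual pair interactions are of size 1/N, and the pairwise forces cancel in the total momentum balance.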


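From [25], with the notations

$$X_n = (x_1, x_2, \ldots, x_n), \qquad V_n = (v_1, v_2, \ldots, v_n)$$

$$X^n = (x_{n+1}, x_{n+2}, \ldots, x_N), \qquad V^n = (v_{n+1}, v_{n+2}, \ldots, v_N)$$

one deduces an infinite hierarchy of equations for
the marginals

$$F_N^n(X_n, V_n, t) = \int F_N(X_n, X^n, V_n, V^n, t)\,dX^n\,dV^n$$

for $1 \le n \le N$, $F_N^n = 0$ for $N < n$:

$$\partial_t F_N^n(X_n,V_n,t) + \sum_{1\le i\le n} v_i\cdot\nabla_{x_i}F_N^n(X_n,V_n,t) - \frac{1}{N}\sum_{1\le i\neq j\le n} \nabla_{v_i}\cdot\big(\nabla_{x_i}V(|x_i - x_j|)\,F_N^n(X_n,V_n,t)\big) - \frac{N-n}{N}\sum_{1\le i\le n} \nabla_{v_i}\cdot\int \nabla_{x_i}V(|x_i - x_*|)\,F_N^{n+1}(X_n,V_n,x_*,v_*,t)\,dx_*\,dv_* = 0 \qquad [26]$$

Letting N go to infinity, one obtains “formally,” for
the distribution functions

$$F^n = \lim_{N\to\infty} F_N^n$$

the Vlasov hierarchy:

$$\partial_t F^n(X_n,V_n,t) + \sum_{1\le i\le n} v_i\cdot\nabla_{x_i}F^n(X_n,V_n,t) - \sum_{1\le i\le n} \nabla_{v_i}\cdot\int \nabla_{x_i}V(|x_i - x_*|)\,F^{n+1}(X_n,V_n,x_*,v_*,t)\,dx_*\,dv_* = 0 \qquad [27]$$

Observe that for any density F(x,v,t) that satisfies

$$\iint F(x,v,t)\,dx\,dv = 1, \qquad F(x,v,t) \ge 0 \qquad [28]$$

and is a solution of the Vlasov equation with potential V:

$$\partial_t F(x,v,t) + v\cdot\nabla_x F(x,v,t) - \Big(\iint \nabla_x V(|x - x_*|)\,F(x_*,v_*,t)\,dx_*\,dv_*\Big)\cdot\nabla_v F(x,v,t) = 0 \qquad [29]$$

the factorization formula

$$F^n(X_n,V_n,t) = \prod_{1\le i\le n} F(x_i,v_i,t) \qquad [30]$$

defines a solution of the above Vlasov hierarchy.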

A uniqueness argument implies that any solution 
of the Vlasov hierarchy which is factorized at time 
zero will remain factorized at any subsequent time. 
Such a property, also observed for the hierarchy 
leading to the Boltzmann equation, is called the 
propagation of chaos. To make the proof rigorous, 
one has to analyze the limiting process in the 
hierarchy and prove the uniqueness of the solution 
of the infinite hierarchy. For a smooth potential, this 
has been done by Braun and Hepp in 1977 and by 
Spohn in 1981. An interesting approach consists, 
following Dobrushin, in introducing the Wasserstein 
distance; see Golse (2003) for a detailed exposition. 

In the case of the Vlasov-Poisson equation [29]
with $V(|x|) = 1/(4\pi|x|)$ the potential turns out to be
too singular for the above derivation. In particular, 
the corresponding solution of the N-particle pro- 
blem is not uniformly defined. However, for the 
corresponding equation (and for variants thereof, 
including the effect of the magnetic field, the 
Vlasov-Maxwell system) a series of mathematical 
results concerning existence and stability of solu- 
tions have been obtained. An excellent recent 
exposition of these results can be found in the 
book of Glassey (1996). 

Equation [29] as well as the original system turns 
out to be fully reversible. Neither irreversibility nor 
averaging has appeared in the limit process which 
corresponds to the first arrow of Figure 1; this is due 
to the “weak coupling.” Therefore, irreversibility 
should now appear on the second arrow. Integrating 
eqn [29] with respect to v gives the relation (often 
called Fick’s law): 


$$\partial_t \rho(x,t) + \nabla_x\cdot\int v\,F(x,v,t)\,dv = 0 \qquad [31]$$


But now expressing the current $j = \int v\,F(x,v,t)\,dv$ in
terms of macroscopic variables turns out to be a 
difficult issue in the absence of a “relaxation” effect. 
Up to now there has been no derivation of such 
macroscopic equations from first principles. 

The same type of problem exists for the two-
dimensional Euler equation, which is in some sense 
very similar to the Vlasov equation. It has been 
observed that these equations develop for “turbulent 
initial data” a kind of “mixing process” leading to 
coherent structures that would play the role of 
thermodynamical equilibrium (in the absence of 
relaxation). The Jupiter red spot is the most 


well-known example of such a structure. These 
coherent structures are obtained by maximizing an 
entropy which does not come directly from the 
dynamics but which is inspired by similar problems 
in statistical mechanics. Finally, one has to take into 
account in this construction the existence of an 
infinite set of conserved quantities: for each regular 
function G, vanishing at infinity, one has 


$$\frac{d}{dt}\iint G(F(x,v,t))\,dx\,dv = 0$$


This approach was already started by Onsager in 1945 
and pursued by many scientists. A recent reference is 
the article of Chavanis and Sommeria (1998). 


Derivation of Kinetic Equations from the 
Schrödinger Equation 


Oscillatory solutions of the Schrödinger equation, 
with wavelength of the order of the Planck constant, 
tend to behave like particles. This is described in 
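detail by different tools of high-frequency approximation.
In particular, the limit of the Wigner
transform of the density $\psi(x,t) \otimes \overline{\psi}(y,t)$:

$$W_\hbar(x,\xi,t) = \frac{1}{(2\pi)^d}\int e^{-iy\cdot\xi}\,\psi\Big(x + \frac{\hbar}{2}y, t\Big)\,\overline{\psi}\Big(x - \frac{\hbar}{2}y, t\Big)\,dy \qquad [32]$$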


is a solution of a Liouville equation. Therefore, one 
should expect that in the presence of “many” 
obstacles (“many potentials”) the limit should be 
given by a kinetic equation. As shown by the 
previous section the introduction of randomness 
seems compulsory in reaching this goal. 

Consider a big cube $\Lambda = \Lambda_L$ of size L in $\mathbb{R}^3$. Let
$\omega = (x_\alpha)$, $\alpha = 1, 2, \ldots, N$, denote the configuration of
random obstacles distributed uniformly in $\Lambda$. The
density of obstacles is $\rho = N/L^3$ and the expectation
with respect to this uniform measure is denoted by

$$\mathbb{E} = \prod_{1\le\alpha\le N}\Big(\frac{1}{|\Lambda|}\int_\Lambda dx_\alpha\Big)$$

With V(|x|) a smooth, short-range potential, the
random potential created by the obstacles is

$$V_\omega(x) = \sum_{1\le\alpha\le N} V(|x - x_\alpha|)$$

One of the typical results obtained (the low-density limit,
which is the quantum version of
Gallavotti's classical result) reads as follows:


Theorem 3 (Erdős and Yau 1998) Assume that the
density of obstacles is $\rho = \rho_0\varepsilon$ with a fixed $\rho_0$.
Denote by $\psi^\varepsilon_\omega(t)$ the solution of the Schrödinger
equation

$$i\,\partial_t \psi^\varepsilon_\omega = -\frac{1}{2}\Delta\psi^\varepsilon_\omega + V_\omega\,\psi^\varepsilon_\omega \qquad [33]$$

with initial condition localized and oscillating at the
scale $\varepsilon$, that is, with h and S smooth

$$\psi^\varepsilon(0) = \varepsilon^{3/2}\,h(\varepsilon x)\,\exp\Big(i\,\frac{S(\varepsilon x)}{\varepsilon}\Big) \qquad [34]$$

Consider the density matrix $\rho^\varepsilon_\omega(t,x,y) = \psi^\varepsilon_\omega(t,x)\,\overline{\psi^\varepsilon_\omega}(t,y)$
and its Wigner transform


W(x, § t) 


1 2 ey ey [35] 
= IY 9€ 2 Wie 
E (bet 5) 4y 





Then for any t > 0, $\mathbb{E}W^\varepsilon_\omega(t)$ converges weakly, with $\varepsilon$
going to zero, to a solution F(t) of the kinetic equation

$$\partial_t F(t,x,\xi) + \xi\cdot\nabla_x F(t,x,\xi) = \int |T(\xi,\xi')|^2\,\delta(|\xi|^2 - |\xi'|^2)\,\big(F(t,x,\xi') - F(t,x,\xi)\big)\,d\xi' \qquad [36]$$

where T is the amplitude of the scattering operator
associated with the Schrödinger equation with the
short-range potential V.


The proof uses several ingredients including
scattering theory with expansion in terms of Dyson
series; see Erdős and Yau (1998).


Semiconductor Modeling 


In modern computers, the electronic devices are so 
small that the electric current may have no space/time 
to reach a thermodynamical equilibrium. Therefore, 
this turns out to be a field where the kinetic equations 
are the most naturally used. Details of what can be 
deduced from a mathematical analysis can be found 
in Poupaud (1994). The equations involve the
distribution of electrons $f_e(x,k,t)$ and holes
$f_h(x,k,t)$ and have the following form:


cOifelt, x, k) + Ve(R)Vxfe(t, x, k) 


+2 V U(t, x) - Vafelt, xk) 


7 KONATE k) + Re(fesfn(t,*,k)) [37] 


cOifalt, x, k) Sig Dil) V ality k) 


-F Vr U(t, x) - Vefalts%k) 


=~ Olfa) (t,x, k) + Ralfa f), k) [38] 

The variable k ranges over a torus B of $\mathbb{R}^3$ which, in
physics books, carries the name of Brillouin zone.
The velocities of propagation of electrons and holes
are determined in terms of the energy band by the
formula

$$v_{e,h}(k) = \frac{1}{\hbar}\,\nabla_k E_{e,h}(k) \qquad [39]$$


The potential U is determined in terms of the doping 
profile C(x), the conductivity €, and the density of 
electrons and holes according to the formula 


) — yf flex. kak 


+ F fat dk) [40] 


Finally $Q_{e,h}$ and $R_{e,h}$ are binary integral operators in
the variable $k \in B$ which model collisions and
generation-recombination processes. Concerning
the “mathematical approach,” the situation is as
follows.

The relations [39] can be deduced from the high- 
frequency analysis of the solution of the Schrödinger 
equation 


—A,U(t,x) = = Nas 


2 
bob- Aps v(E)u 41] 

2 h 
with V a periodic potential constructed on the dual 
lattice of B. The method uses the Bloch decom- 
position of the solution and the Wigner series 
(Poupaud 1994). No mathematical derivation of 
the collisions operator is currently available. The 
situation should be compared to what is said in the 
section “Derivation of kinetic equations from the 
Schrödinger equation,” but in a much more 
complicated setting. 

On the other hand, the collision operators $Q_{e,h}$
and $R_{e,h}$, as given by phenomenological arguments,
have enough good relaxation properties to allow a
rigorous limit of the system [37]-[38] for $\varepsilon$ going to
zero (Poupaud 1994). This leads to the justification
of the so-called drift-diffusion models and to the
possibility of constructing correctors (with respect to
$\varepsilon$) and of treating the effect of heterojunctions by
boundary layer analysis.


Some Specific Mathematical Tools 


Few proofs were given in the above exposition and 
details would not be suitable for a review article. 
However, the mathematical approach to kinetic 
equations has generated some new tools, and it 
may be useful to give the most prominent ones. 


The Averaging Lemma 


Compactness results appear in spectral theory and in 
the construction of solutions of nonlinear equations 
(whenever strong convergence is needed for the 
limit). Being hyperbolic, the transport operator
$v\cdot\nabla_x$ propagates singularities along characteristics.
Therefore, at first sight it seems hopeless that one
might obtain any regularizing effect from the free
streaming part of a kinetic model. The key to
obtaining regularizing effects from the transport
operator $v\cdot\nabla_x$ is to seek those effects not on the
number density itself, but on velocity averages 
thereof; in other words, on the macroscopic densities. 

Here is the prototype of all velocity averaging 
results. 


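Theorem 4 Let $F_\varepsilon$ be a bounded family in $L^2(\mathbb{R}^d_x \times \mathbb{R}^d_v)$.
Assume that the family $v\cdot\nabla_x F_\varepsilon$ is also bounded
in $L^2(\mathbb{R}^d_x \times \mathbb{R}^d_v)$. Then, for each $\phi \in L^2(\mathbb{R}^d_v)$, the
family of moments $\rho_\varepsilon(x)$ defined by

$$\rho_\varepsilon(x) = \int F_\varepsilon(x,v)\,\phi(v)\,dv$$

is relatively compact in $L^2_{\mathrm{loc}}(\mathbb{R}^d_x)$.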


For the proof one starts with the expression
$G_\varepsilon = F_\varepsilon + v\cdot\nabla_x F_\varepsilon$, takes the Fourier transform with
respect to x of this relation, and writes for $\hat{\rho}_\varepsilon(\xi)$ the
expression

$$\hat{\rho}_\varepsilon(\xi) = \int \frac{\hat{G}_\varepsilon(\xi,v)\,\phi(v)}{1 + i\,v\cdot\xi}\,dv \qquad [42]$$

Then use the Cauchy-Schwarz inequality to obtain

$$|\hat{\rho}_\varepsilon(\xi)|^2 \le \Big(\int \frac{|\phi(v)|^2}{1 + |v\cdot\xi|^2}\,dv\Big)\Big(\int |\hat{G}_\varepsilon(\xi,v)|^2\,dv\Big) \qquad [43]$$

and complete the proof by standard arguments.

The averaging lemma was first observed by
Agoshkov (1984) for abstract results concerning
the regularity of solutions of kinetic equations in
domains with boundary. Independently, it was
rediscovered in the improved form given above by
Golse, Perthame, and Sentis (1985) and used for the
spectral theory in the diffusion approximation. The
extensions to $L^p$, p > 1, spaces and to $L^1$ (with use of
an entropy estimate) were instrumental in proving the
validity of the Rosseland approximation for the
radiative transfer equations and for the proof of
existence by Lions and DiPerna of renormalized
solutions of the Boltzmann equation. A more refined
result needs to be used to establish the incompressible
limit of the solutions of the Boltzmann
equations; see Golse (2005) for details and a
complete list of references.


The Dispersive Property 


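Consider, for the solutions in $\mathbb{R}^d_x \times \mathbb{R}^d_v$ of the
elementary kinetic equation

$$\partial_t f + v\cdot\nabla_x f = 0, \qquad f(x,v,0) = \phi(x,v) \qquad [44]$$

the local density

$$\rho(x,t) = \int_{\mathbb{R}^d_v} f(x,v,t)\,dv \qquad [45]$$

From the relation

$$\rho(x,t) = \int_{\mathbb{R}^d_v} f(x,v,t)\,dv = \int_{\mathbb{R}^d_v} \phi(x - vt, v)\,dv \le \int_{\mathbb{R}^d_v} \sup_{w\in\mathbb{R}^d} \phi(x - vt, w)\,dv \qquad [46]$$

deduce with an elementary change of variable the
following estimate, which carries the name of
dispersion lemma,

$$|\rho(x,t)| \le \frac{1}{|t|^d}\,\|\phi\|_{L^1(\mathbb{R}^d_x;\,L^\infty(\mathbb{R}^d_v))} \qquad [47]$$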
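The “elementary change of variable” behind the dispersion lemma can be spelled out as follows. Here φ denotes the initial datum f(x,v,0), and the substitution y = x − vt is used at fixed x and t ≠ 0, so that dv = |t|^{-d} dy:

```latex
\begin{align*}
|\rho(x,t)|
  &\le \int_{\mathbb{R}^d} \sup_{w\in\mathbb{R}^d} |\phi(x - vt, w)|\,dv \\
  &= \frac{1}{|t|^{d}} \int_{\mathbb{R}^d} \sup_{w\in\mathbb{R}^d} |\phi(y, w)|\,dy
     \qquad (y = x - vt,\ dv = |t|^{-d}\,dy) \\
  &= \frac{1}{|t|^{d}}\,\|\phi\|_{L^1(\mathbb{R}^d_x;\,L^\infty(\mathbb{R}^d_v))} .
\end{align*}
```

The decay rate $|t|^{-d}$ comes entirely from the Jacobian of the substitution: particles with different velocities spread out in space at a rate proportional to t.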


From interpolation and duality arguments follows:


Proposition 1 The macroscopic density $\rho$ defined
by [45] satisfies the inequality

$$\|\rho\|_{L^q(\mathbb{R}_t;\,L^p(\mathbb{R}^d_x))} \le C(d)\,\|\phi\|_{L^a(\mathbb{R}^d_x \times \mathbb{R}^d_v)} \qquad [48]$$

for any choice of real numbers a, p, and q such that

$$1 \le p \le \frac{d}{d-1}, \qquad \frac{2}{q} = d\Big(1 - \frac{1}{p}\Big), \qquad 1 \le a = \frac{2p}{p+1} \le \frac{2d}{2d-1} \qquad [49]$$


The values a = 1, p = 1, and q = ∞ are obvious.
The other limiting values are the interesting ones.
They are given by p = d/(d − 1), that is, p = d'; then
q = 2 and a = 2d/(2d − 1).
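The admissible exponents can be tabulated with exact rational arithmetic. The sketch below assumes that the relations in [49] read 2/q = d(1 − 1/p) and a = 2p/(p + 1), which is consistent with the limiting values quoted in the text; the function name is hypothetical.

```python
from fractions import Fraction
from typing import Optional, Tuple


def strichartz_exponents(p: Fraction, d: int) -> Tuple[Fraction, Optional[Fraction]]:
    """Return (a, q) with a = 2p/(p+1) and 2/q = d(1 - 1/p).

    q is returned as None when 2/q = 0, that is, q = infinity.
    """
    a = 2 * p / (p + 1)
    two_over_q = d * (1 - 1 / p)
    q = None if two_over_q == 0 else 2 / two_over_q
    return a, q


trivial = strichartz_exponents(Fraction(1), 3)        # p = 1
endpoint = strichartz_exponents(Fraction(3, 2), 3)    # p = d/(d-1) with d = 3
```

For d = 3 this returns a = 1, q = ∞ at p = 1 and a = 6/5, q = 2 at p = 3/2, in agreement with the values a = 2d/(2d − 1) and q = 2 stated above.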

These inequalities carry the name of Strichartz
inequalities because they are very similar to classical
inequalities obtained by Strichartz for the solution of
the free Schrödinger equation. This should not be
surprising since the Wigner transform of the densities

$$f(x,v,t) = \frac{1}{(2\pi)^d}\int e^{-iy\cdot v}\,\psi\Big(x + \frac{1}{2}y, t\Big) \otimes \overline{\psi}\Big(x - \frac{1}{2}y, t\Big)\,dy \qquad [50]$$

then turns out to be a solution of the transport
equation

$$\partial_t f + v\cdot\nabla_x f = 0 \qquad [51]$$

However, the estimates for kinetic equations are not
easily translated into estimates for the Schrödinger
equation because the properties of the initial data in
terms of norms cannot be simply estimated in terms of
the inverse Wigner transform. Spaces with Fourier
transform in $L^p$, $p \neq 2$, are not easy to characterize and
not natural for the Schrödinger equation. The above
estimates have been very useful in analyzing the large- 
time behavior of solutions and also in proving the 
regularity of the three-dimensional Vlasov equation. 


The Entropy and Entropy Dissipation 


For solutions of the Boltzmann equation the
Boltzmann H function

$$H(f) = \iint f(x,v)\,\log f(x,v)\,dx\,dv$$

decreases in time and the same is true for the
relative entropy to an absolute Maxwellian
$M(v) = (2\pi)^{-3/2}\,e^{-|v|^2/2}$:

$$H(f|M) = \iint \Big(f\,\log\frac{f}{M} - f + M\Big)\,dx\,dv$$


This leads to the systematic introduction in the theory 
of the notion of relative entropy. It turned out to be 
instrumental in proving relaxation toward equilib- 
rium of solutions of kinetic (or similar) equations 
and for the analysis of hydrodynamical limits. 
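As a small numerical illustration of the relative entropy, consider a simplified setting that is an assumption of this sketch and not the one of the text: one velocity dimension, no x-dependence, a centred Gaussian f of variance s², and the standard Gaussian as Maxwellian M. In that case the relative entropy has the closed form (s² − 1 − log s²)/2, which a midpoint quadrature reproduces.

```python
import math


def gaussian(v: float, var: float) -> float:
    """Centred Gaussian density with variance var."""
    return math.exp(-v * v / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def relative_entropy(var: float, n: int = 4000, vmax: float = 12.0) -> float:
    """Midpoint quadrature of the integral of f*log(f/M) - f + M over [-vmax, vmax],
    with f Gaussian of variance var and M the standard Gaussian."""
    h = 2.0 * vmax / n
    total = 0.0
    for i in range(n):
        v = -vmax + (i + 0.5) * h
        f = gaussian(v, var)
        m = gaussian(v, 1.0)
        total += (f * math.log(f / m) - f + m) * h
    return total


h_wide = relative_entropy(2.0)
h_equal = relative_entropy(1.0)
h_narrow = relative_entropy(0.5)
```

The relative entropy is non-negative and vanishes only when f coincides with M, which is what makes it a natural measure of the distance to equilibrium.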

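A striking example considered by Desvillettes and
Villani is the linearized Fokker-Planck equation in
any space dimension:

$$\partial_t F + v\cdot\nabla_x F - \nabla_x V(x)\cdot\nabla_v F = \nabla_v\cdot(\nabla_v F + F v) \qquad [52]$$

When $x \mapsto V(x)$ is a smooth potential strictly convex
at infinity, this system has a unique steady state
given by the relation

$$M(x,v) = \frac{1}{Z}\,e^{-(V(x) + |v|^2/2)} \qquad [53]$$

where Z is a normalization constant. For any solution of [52] one has

$$\frac{d}{dt}H(F|M) + \iint_{\mathbb{R}^d_x \times \mathbb{R}^d_v} F\,\Big|\nabla_v \log\frac{F}{M}\Big|^2\,dx\,dv = 0 \qquad [54]$$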
which says that the entropy dissipation is the 
relative Fisher information (with respect to v) of F. 
Now, to study the relaxation to equilibrium, one 
uses the logarithmic Sobolev inequality: 


1 Pe 
oo ce 
H(F|M) < ofa FIV, log =| dx dv [55] 


Details, references, and extensions can be found in 


Arnold et al. (2004). 




Conclusions 


Kinetic equations have been studied since the end of 
the nineteenth century, both from the physical and 
mathematical points of view, but it seems that since 
the middle of the last century the interest in this 
approach has considerably increased. 

The fact that these equations are well adapted to the 
description of media which have not “thermalized” 
(because they are too rarefied or because the domain 
where they evolve is too small) has been a basic reason 
for their use in many applied fields; to the ones already 
quoted one may add the analysis of the air between the 
reading head and a compact disk, the computations of 
the characteristics of an ionic motor, and many others. 

As a consequence, mathematical progress has 
been very important. Without going into the details, 
this contribution is focused on this, and in particular 
on what can be obtained by the deterministic 
approach and where the introduction of randomness 
seems compulsory. 

The kinetic formulation turned out to be well 
adapted to large-scale computers, in particular with 
Monte Carlo simulations. One should observe that 
the point of view of modern functional analysis 
contributes stability estimates to the understanding 
and improvement of numerical methods. For an 
introduction to such numerical methods, the reader 
should first concentrate on the Boltzmann equation 
itself, which has been one of the basic motivations; 
consult the book of Sone (2002), the references 
therein, and in particular the book of Bird (1994). 


See also: Boltzmann Equation (Classical and Quantum); 
Breaking Water Waves; Einstein’s Equations with Matter; 
Fourier Law; Interacting Stochastic Particle Systems; 
Nonequilibrium Statistical Mechanics: Dynamical 
Systems Approach; Partial Differential Equations: Some 
Examples; Quantum Dynamical Semigroups. 


Introduction 


A knot homology is a theory which assigns to a knot 
$K$ (or link $L$) in $S^3$ a graded homology group whose 
graded Euler characteristic is a knot polynomial 
associated to $K$. In all known examples, the knot 
polynomials in question are specializations of the 
HOMFLY polynomial $P_K(a,q)$, which we take to be 
determined by the skein relation 

$$a P(L') - a^{-1} P(L'') = (q - q^{-1}) P(L_0)$$ [1]

where $L'$ and $L''$ are oriented diagrams that differ only by 
switching a single crossing and $L_0$ is the oriented resolution 
of that crossing, and normalized so that $P$ of the unknot is 
equal to 1. Let $P_N(K)$ be the specialization of $P_K$ given by 

$$P_N(K) = P_K(q^N, q)$$ [2]

Then for each $N \ge 0$, there is a bigraded knot 
homology $H_N^{i,j}(K)$, which satisfies 

$$P_N(K) = \sum_{i,j} (-1)^i q^j \dim H_N^{i,j}(K)$$ [3]


We refer to the first grading i as the homological 
grading, and the second grading j as the polynomial 
or q-grading. 

The idea of a knot homology was introduced by 
Khovanov (2000) in a seminal paper, in which he 
defined the homology theory corresponding to the 
Jones polynomial ($N = 2$). In subsequent work, he 
defined such a theory for $N = 3$, and then, in 
collaboration with Rozansky, for any $N > 0$. 
Recently, the two authors have introduced a triply 
graded homology theory $H^{i,j,k}(K)$ whose graded 
Euler characteristic gives the entire HOMFLY 
polynomial: 

$$P_K(a,q) = \sum_{i,j,k} (-1)^i q^j a^k \dim H^{i,j,k}(K)$$ [4]

All of these theories are combinatorial in nature. 

In contrast, the knot homology for $N = 0$ arises 
from a very different source: the Heegaard Floer 
homology of Ozsváth and Szabó. This theory traces 
its roots back to invariants of 3- and 4-manifolds 
defined using Seiberg–Witten and Donaldson theory. 
The definition of $H_0(K)$ is not combinatorial, but 
because of its connections with these invariants, the 
theory is known to carry a good deal of geometric 
information about the knot $K$. The interplay 
between the two apparently different sorts of knot 
homologies ($N > 0$ and $N = 0$) has enhanced our 
understanding of both sides. 

This article will mostly focus on the cases $N = 0$ 
and $N = 2$, which are the oldest and best-studied 
examples of knot homologies and are related to the 
two best-known specializations of the HOMFLY 
polynomial: the Alexander and Jones polynomials. 
We have chosen to use a uniform notation to 
emphasize the similarities between theories, but the 
reader should be aware that other notation is more 
common in the literature. $H_0$ is often referred to as 
the knot Floer homology (written $\widehat{HFK}$), and is 
usually normalized with a polynomial grading equal 
to $j/2$, corresponding to the substitution $t = q^2$, 
which gives the standard normalization of the 
Alexander polynomial. $H_2$ is generally called the 
reduced Khovanov homology, and often denoted by 
$\widetilde{Kh}$ or $Kh_{\mathrm{red}}$. 


Construction 


Seen from a distance, all knot homologies are 
defined in much the same way. Given a knot $K$, we 
must first choose some additional data $D$ which 
give a concrete geometric presentation of the knot. 
Using this data, we write down a bigraded chain 
complex $(C_N^{i,j}(D), d_N)$. This complex depends on 
our initial choice of $D$, but when we take 
homology, we are left with groups $H_N^{i,j}(K)$ which 
are invariants of the knot $K$ (cf. the simplicial 
homology of a topological space $X$, where the 
chain groups depend on the choice of some initial 
geometric data, a triangulation of $X$, but the 
homology groups are invariants of $X$). 
In all cases, the generators of $C_N(D)$ correspond 
naturally to terms which appear in a classical model 
for computing $P_N(K)$. In other words, we can write 

$$P_N(K) = \sum_{\sigma \in S} (-1)^{i(\sigma)} q^{j(\sigma)}$$ [5]

where the sum runs over a set of states $S$ determined 
by $D$, and the functions $i$ and $j$ are also determined 
by $D$. $C_N^{i,j}(D)$ is the free abelian group generated by 
$\{\sigma \in S \mid i(\sigma) = i,\ j(\sigma) = j\}$ and the differential $d_N$ is 
chosen to preserve the $j$-grading: $j(d_N x) = j(x)$. It 
follows that $C_N(D)$ decomposes into a direct sum 
of complexes, one for each value of $j$, and 
[3] is a consequence of [5]. 

Beyond these global similarities, the definition of 
$C_N(D)$ varies with the value of $N$. In the second half 
of the article, we give explicit details of the 
constructions for $N = 0$ and $N = 2$. 
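
The passage from [5] to [3] rests on the fact that the graded Euler characteristic is the same whether it is computed from the chain groups or from the homology. A minimal Python sketch of that bookkeeping follows; the function name, the dictionary representation of ranks, and the sample ranks are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def graded_euler_characteristic(ranks):
    """Return {j: coefficient of q^j} for the sum of (-1)^i q^j rank(i, j).

    `ranks` maps a bigrading (i, j) to the rank of the group in that bigrading.
    """
    poly = defaultdict(int)
    for (i, j), rank in ranks.items():
        sign = 1 if i % 2 == 0 else -1   # avoids float results for negative i
        poly[j] += sign * rank
    return {j: c for j, c in poly.items() if c != 0}

# Hypothetical chain groups: in j-grading 0 a rank-2 group in i = 0 maps onto
# a rank-1 group in i = 1, so one generator survives in homology.
chain_ranks = {(0, 0): 2, (1, 0): 1, (0, 2): 1}
homology_ranks = {(0, 0): 1, (0, 2): 1}

chi_chain = graded_euler_characteristic(chain_ranks)
chi_homology = graded_euler_characteristic(homology_ranks)
```

Both dictionaries describe the same polynomial, which is the sense in which the homology categorifies the state sum.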


Filtered Complexes and Deformations 


An important characteristic shared by all the $C_N$’s is 
the existence of deformations with homology $\mathbb{Z}$. 
Recall that $(C_N(D), d_N)$ is a graded chain complex: 
$j(d_N x) = j(x)$. By a deformation of such a complex, 
we mean a new chain complex $(C_N(D), d_N + d_N')$ in 
which the underlying group remains the same, but 
the differential has been perturbed by the addition of 
a new term $d_N'$ which strictly raises the $j$-grading: 
$j(d_N' x) > j(x)$. 

Any deformation of a graded complex is naturally 
a filtered complex, and as such, gives rise to a 
spectral sequence. The $E_0$ term of this spectral 
sequence is the original unperturbed complex 
$(C_N(D), d_N)$, so the underlying group of the $E_1$ 
term is just $H_N(K)$. Thus, it is independent of the 
choice of initial data $D$. In fact, it can often be 
shown that all terms in the spectral sequence beyond 
the first one are invariants of $K$. This is known to 
be the case for $N = 0$ and $N = 2$, and is most likely 
true for all other $N$ as well (cf. the Leray–Serre 
sequence associated to a fibration, where the first 
two terms depend on a choice of geometric data but 
the $E_2$ and higher terms are all invariants of the 
fibration). 

For each value of $N$, $C_N(D)$ admits a natural 
deformation whose homology is $\mathbb{Z}$ in homological 
grading 0, and zero in every other grading. When 
$N = 0, 2$, the filtration grading of this generator is 
known to be an invariant of $K$. (This is probably the 
case for $N > 2$ as well.) Equivalently, this is the 
$j$-grading of the surviving copy of $\mathbb{Z}$ in the spectral 
sequence. When $N = 0$, this invariant is conventionally 
normalized to be half the $j$-grading of the 
generator, and is called $\tau(K)$. When $N = 2$, it is 
called $s(K)$. 


Geometric Properties 


Some elementary properties of the $H_N$’s generalize 
those of the HOMFLY polynomial. If $K_1 \# K_2$ 
denotes the connected sum of $K_1$ and $K_2$, then over $\mathbb{Q}$ 

$$H_N(K_1 \# K_2) \cong H_N(K_1) \otimes H_N(K_2)$$ [6]

and if $\overline{K}$ is the mirror image of $K$, 

$$H_N^{i,j}(\overline{K}) \cong H_N^{-i,-j}(K)$$ [7]

Moreover, $H_0$ satisfies an additional symmetry 

$$H_0^{i,j}(K) \cong H_0^{i-j,-j}(K)$$ [8]

generalizing the symmetry of the Alexander polynomial: 
$P_0(q) = P_0(q^{-1})$. (With integer coefficients, 
these equalities all hold at the chain level. The 
correct statements about the homology can be 
obtained from the Künneth formula and universal 
coefficient theorem.) 

$H_N(K)$ also contains deeper information related to 
the genus of surfaces bounding $K$. If $K$ is a knot in 
$S^3$, recall that $g(K)$, the Seifert genus of $K$, is the 
minimal genus of an orientable surface smoothly 
embedded in $S^3$ and bounding $K$. If we view $S^3$ as 
the boundary of the 4-ball $B^4$, we can define a 
second quantity $g_*(K)$, the slice genus, by relaxing 
the requirement that the surface be embedded in $S^3$ 
and instead requiring it to be embedded in $B^4$. 

Both $s(K)$ and $\tau(K)$ give lower bounds on the slice 
genus of $K$: 

$$|\tau(K)| \le g_*(K)$$ [9]

$$|s(K)| \le 2 g_*(K)$$ [10]

These bounds are far from independent. In fact, in 
all known examples, $s(K) = 2\tau(K)$. It is an open 
problem to determine whether this is true for all 
knots. 

From [6], it follows that $s$ and $\tau$ are additive 
under connected sum. Thus, both invariants define 
homomorphisms from the concordance group of 
knots in $S^3$ to $\mathbb{Z}$. The inequalities in eqns [9] and [10] 
are not always sharp, but there is one case where 
equality is known to hold. This is when $K$ is 
represented by a diagram with all positive crossings 
(or, more generally, when $K$ is quasipositive). In this case, 
the slice genus is also equal to the Seifert genus, and 
all three are easily computed using Seifert’s 
algorithm. 

The proof of [10] depends on the fact that 
for $N > 0$, $H_N$ is functorial in the following sense. 
If $S \subset S^3 \times [0,1]$ is a smoothly embedded, orientable 
cobordism between links $L_1$ and $L_2$, then for each 
$N > 0$, there is an induced map $\phi_S : H_N(L_1) \to 
H_N(L_2)$. $\phi_S$ is a graded map: it preserves the 
homological grading, and lowers the $j$-grading by 
$(N - 1)\chi(S)$. Under deformation, it becomes a 
filtered map which induces a rational isomorphism 
on the deformed homologies. 


$H_0$ and Heegaard Floer Homology 


The proof of [9] depends on the close connection 
between the knot Floer homology and the Heegaard 
Floer homology. Roughly speaking, the Heegaard 
Floer groups of 3-manifolds obtained by surgery on 
$K$ are determined by the groups $H_0^{i,j}(K)$ together 
with additional differentials obtained by relaxing the 
requirement that $n_z(\phi) = n_w(\phi) = 0$. The relation 
with the slice genus again arises by studying maps 
induced by cobordisms, but in this case, the relevant 
cobordism is the surgery cobordism between $S^3$ and 
the 0-surgery on $K$. 

This connection also leads to another important 
property of $H_0$: it detects the Seifert genus. If we let 
$M(K)$ be the largest value of $j$ for which the group 
$H_0^{*,j}(K)$ is nontrivial, then 

$$M(K) = 2g(K)$$ [11]

This fact generalizes a well-known inequality involving 
the degree of the Alexander polynomial: if 
$m(K)$ is the largest power of $q$ appearing in $P_0(K)$, 
then $m(K) \le 2g(K)$. 


Computations 


The difficulty of computing $H_N(K)$ varies with the 
value of $N$. When $N = 1$, the theory is essentially 
trivial: $H_1^{0,0}(K) \cong \mathbb{Z}$ for any knot $K$, and all other 
groups vanish. Of the remaining knot homologies, 
$H_2(K)$ is the easiest to compute. The theory for 
alternating knots was worked out by E S Lee, and 
extensive calculations have also been made for 
nonalternating knots using computer programs 
written by Bar-Natan and Shumakovitch. 
Computing $H_0$ is more difficult, on account of the 
noncombinatorial nature of $d_0$. Three families of 
knots for which $H_0$ is well understood are alternating 
knots, (1,1) knots (described in the next section), 
and knots which admit lens space surgeries. Beyond 
this, there is an array of techniques which may or 
may not work in any given case. The best of these is 
probably a setup introduced by Ozsváth and Szabó, 
in which the generators of $C_0(D)$ correspond to 
states in the Kauffman state model of the Alexander 
polynomial. Combining this method with the known 
results for alternating knots and (1,1) knots gives a 
fairly good understanding of $H_0(K)$ for knots with 
10 or fewer crossings; for larger knots, relatively 
little is known. 

Few computations of $H_N$ for $N > 2$ have been 
made, although the definition in this case is purely 
combinatorial. 


Thin and Thick Knots 


For simple knots, both $H_0$ and $H_2$ are thin. This 
means that there exists a constant $c_N(K)$ ($N = 0, 2$) 
such that $H_N^{i,j}(K)$ is trivial unless $j - 2i = c_N(K)$. In 
such cases, we necessarily have $c_0(K) = 2\tau(K)$ (resp. 
$c_2(K) = s(K)$), and $H_N(K)$ is completely determined 
by $c_N(K)$ and $P_N(K)$. The relationship is best 
expressed in terms of the Poincaré polynomial of 
$H_N(K)$: 

$$\mathcal{P}_N(K) = \sum_{i,j} t^i q^j \dim H_N^{i,j}(K) = (-t)^{-c_N(K)/2}\, P_N(K)\big(q(-t)^{1/2}\big)$$ [12]

If $K$ is an alternating knot, both $H_0(K)$ and $H_2(K)$ 
are thin, and $c_0(K) = c_2(K) = \sigma(K)$. (Note that in this 
case the bound on $g_*(K)$ coming from $\tau$ and $s$ 
coincides with the classical bound coming from the 
signature.) Many nonalternating knots are thin as 
well; in all examples in which both groups have 
been computed, either both $H_0(K)$ and $H_2(K)$ are 
thin, or neither is. In addition, all such knots appear 
to have $c_0(K) = c_2(K) = \sigma(K)$. 
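
Equation [12] says that for a thin knot the ranks can be read off from the polynomial once the constant $c_N(K)$ is known: a coefficient of $q^j$ sits in homological grading $i = (j - c_N(K))/2$. The following Python sketch illustrates this under stated assumptions: the function name and the dictionary encoding are hypothetical, only ranks (not torsion) are tracked, and the value $c = 2$ used in the example is chosen for illustration rather than taken from the text.

```python
def thin_homology_ranks(poly, c):
    """Ranks of a thin homology from its Euler characteristic.

    poly : {j: coefficient of q^j in P_N(K)}
    c    : the constant c_N(K), so that the homology lives where j - 2i = c
    Returns {(i, j): rank}.
    """
    ranks = {}
    for j, coeff in poly.items():
        if coeff == 0:
            continue
        if (j - c) % 2 != 0:
            raise ValueError("j - c must be even when the homology is thin")
        i = (j - c) // 2
        sign = 1 if i % 2 == 0 else -1
        if sign * coeff < 0:
            raise ValueError("coefficient sign is incompatible with thinness")
        ranks[(i, j)] = abs(coeff)
    return ranks

# Polynomial q^2 - 1 + q^-2 with an assumed constant c = 2.
example_ranks = thin_homology_ranks({2: 1, 0: -1, -2: 1}, 2)
```

The sign check reflects the fact that, in a thin theory, each coefficient of the polynomial comes from a single homological grading, so no cancellation can occur.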

Those knots whose homologies are not thin are 
called thick. There are a dozen such knots with ten 
or fewer crossings: using the standard numbering in 
the knot tables (see, e.g., Rolfsen (1976)) these are 
$8_{19}$, $9_{42}$, $10_{124}$, $10_{128}$, $10_{132}$, $10_{136}$, $10_{139}$, $10_{145}$, $10_{152}$, 
$10_{153}$, $10_{154}$, and $10_{161}$. It is a curious and as yet 
unexplained coincidence that, for all of these knots, 
the ranks of $H_0(K)$ and $H_2(K)$ are equal. 

There is an analogous notion of thinness when 
$N > 2$, but there exist alternating knots for which 
$H_N$ cannot be thin for $N \gg 0$ (this can be seen from 
the HOMFLY polynomials). 


Construction of $H_0$ 


We now turn to a more detailed description of the 
definition of $H_0(K)$. The geometric data $D$ used to 
define $C_0$ is a Heegaard diagram for the complement 
of $K$. One convenient way to specify such a diagram 
is by a doubly pointed Heegaard diagram of $S^3$. The 
data for such a diagram consist of a surface $\Sigma$ of 
genus $g$, two $g$-tuples of attaching circles $\{\alpha_1, \ldots, \alpha_g\}$ 
and $\{\beta_1, \ldots, \beta_g\}$ on $\Sigma$, and two points $z, w \in \Sigma$ 
which are disjoint from all the $\alpha$’s and $\beta$’s. Each set 
of attaching circles is composed of $g$ disjoint simple 
closed curves, arranged so that when $\Sigma$ is cut along 
them the result is a sphere with $2g$ holes. Any such 
set of attaching circles determines a unique genus-$g$ 
handlebody $H$ with boundary $\Sigma$ and the property 
that each attaching circle bounds a disk in $H$. 

Figure 1 Heegaard splitting of $S^3$ corresponding to the 
standard decomposition of $S^3$ into two solid tori. 

The choice of $\alpha$ and $\beta$ curves determines the 
underlying 3-manifold in which the knot is 
embedded. Starting with $\Sigma \times [0,1]$, we fill in 
one component of the boundary with the handlebody 
determined by the $\alpha$-curves, and the other 
component with the handlebody determined by the 
$\beta$-curves to obtain a closed 3-manifold. By hypothesis, 
this manifold is required to be $S^3$. A simple 
Heegaard diagram of $S^3$ with $g = 1$ is shown in 
Figure 1. 

To go from a doubly pointed Heegaard diagram 
to a diagram of the knot complement, we remove 
neighborhoods of $z$ and $w$ and replace them with a 
tube to get a surface $\Sigma'$ of genus $g + 1$. We also add 
an additional $\alpha$-handle $\alpha_{g+1}$, which runs from $z$ to $w$ 
in $\Sigma$ in such a way that it does not intersect the 
other $\alpha$’s, and then comes back over the tube. This 
process is illustrated in Figure 2. 

Figure 2 Going from a doubly pointed diagram to a Heegaard 
diagram of the knot complement. 

A Heegaard diagram of $S^3 - K$ determines a 
presentation of $\pi_1(S^3 - K)$ with one generator $x_i$ 
for each $\alpha$-circle and one relator $w_j$ for each $\beta$-circle. 
To find the relator $w_j$, one travels along $\beta_j$, 
recording each intersection with some $\alpha_i$ by appending 
$x_i^{\pm 1}$ to the relator. The sign is determined by the 
sign of the intersection. As an example, consider the 
two doubly pointed diagrams of Figure 3, both of 
which correspond to the same Heegaard diagram of 
$S^3$. (It is isotopic to the one shown in Figure 1.) The 
fundamental groups of the associated knot complements 
can be read off from the corresponding genus-2 
Heegaard splittings. Starting from the point where 
$\beta_1$ intersects the left-hand side of the square and 
moving to the right, we get 

$$\pi_1(S^3 - K_1) = \langle x_1, x_2 \mid x_1 x_1^{-1} x_1 = 1 \rangle$$

$$\pi_1(S^3 - K_2) = \langle x_1, x_2 \mid x_2 x_1 x_2^{-1} x_1^{-1} x_2^{-1} x_1 = 1 \rangle$$

Figure 3 Doubly pointed Heegaard diagrams for (a) the unknot 
and (b) the trefoil. Opposite sides of the square are identified to form 
a torus. The dotted line represents $\alpha_2$. 

The first group is isomorphic to $\mathbb{Z}$, and the knot in 
Figure 3a is the unknot. The second is isomorphic to 
$\pi_1$ of the complement of the trefoil knot, and in fact 
the knot in Figure 3b is the left-handed trefoil. 

The definition of $C_0(D)$ is based on a classical 
method for computing the Alexander polynomial 
known as the Fox calculus, which takes as its input 
a presentation of $\pi_1(S^3 - K)$. According to Fox 
calculus, 

$$P_0(K) = \pm q^{k} \det\big(d_{x_i} w_j\big)_{1 \le i,j \le g}$$ [13]

Here $d_{x_i} w_j$ is an element of the group ring 
$\mathbb{Z}[H_1(S^3 - K)] \cong \mathbb{Z}[q^{\pm 2}]$. 
It is determined by the following rules: 

$$d_{x_i} x_j = \delta_{ij}$$ [14]
$$d_{x_i}(ab) = d_{x_i} a + |a|\, d_{x_i} b$$ [15]
$$d_{x_i} x_i^{-1} = -|x_i|^{-1}$$ [16]

where 

$$|\cdot| : \pi_1(S^3 - K) \to H_1(S^3 - K) \cong \mathbb{Z} = \langle q^2 \rangle$$ [17]

is the abelianization map. The factor of $\pm q^{k}$ is chosen 
so that $P_0(K)(1) = 1$ and $P_0(K)(q) = P_0(K)(q^{-1})$. 

As an example, consider the two presentations 
above. In the first presentation, $|\cdot|$ sends $x_1$ to 1 and 
$x_2$ to $q^2$, so 

$$d_{x_1}(x_1 x_1^{-1} x_1) = 1 - |x_1 x_1^{-1}| + |x_1 x_1^{-1}| = 1 - 1 + 1 = 1$$ [18]

which is the Alexander polynomial of the unknot. If 
we abelianize the relator in the second presentation, 
we see that $|x_1| = |x_2| = q^2$, so 

$$d_{x_1}(x_2 x_1 x_2^{-1} x_1^{-1} x_2^{-1} x_1) = |x_2| - |x_2 x_1 x_2^{-1} x_1^{-1}| + |x_2 x_1 x_2^{-1} x_1^{-1} x_2^{-1}|$$ [19]
$$= q^2 - 1 + q^{-2}$$ [20]

which is the Alexander polynomial of the trefoil. 
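
The rules [14]–[16] translate directly into a short procedure: read the relator from left to right, keep track of the abelianized prefix, and record a signed monomial each time the chosen generator appears. The Python sketch below does this for the two relators discussed above. The function name and the encoding of words as lists of (generator, exponent) pairs are hypothetical choices made for illustration; Laurent polynomials in $q$ are stored as dictionaries from exponents to coefficients.

```python
from collections import defaultdict

def fox_derivative(word, gen, weights):
    """Abelianized Fox derivative of `word` with respect to the generator `gen`.

    word    : list of (generator, exponent) letters, exponent is +1 or -1
    weights : {generator: power of q it is sent to by the abelianization map}
    Returns {power of q: integer coefficient}.
    """
    result = defaultdict(int)
    prefix = 0  # power of q of the abelianized prefix read so far
    for g, e in word:
        if g == gen:
            if e == 1:
                result[prefix] += 1                # rules [14] and [15]
            else:
                result[prefix - weights[g]] -= 1   # rule [16]: d(x^-1) = -|x|^-1
        prefix += e * weights[g]
    return {k: v for k, v in result.items() if v != 0}

# First presentation: relator x1 x1^-1 x1, with |x1| = 1 and |x2| = q^2.
unknot_relator = [("x1", 1), ("x1", -1), ("x1", 1)]
unknot_poly = fox_derivative(unknot_relator, "x1", {"x1": 0, "x2": 2})

# Second presentation: relator x2 x1 x2^-1 x1^-1 x2^-1 x1, with |x1| = |x2| = q^2.
trefoil_relator = [("x2", 1), ("x1", 1), ("x2", -1),
                   ("x1", -1), ("x2", -1), ("x1", 1)]
trefoil_poly = fox_derivative(trefoil_relator, "x1", {"x1": 2, "x2": 2})
```

The three monomials produced for each relator are the terms on the right-hand sides of [18] and [19], so their exponents are also the $j$-gradings of the corresponding generators.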

When $g = 1$, the complex $C_0(D)$ is generated by 
the points of $\alpha_1 \cap \beta_1$. These intersection points may 
be naturally identified with the appearances of the 
generator $x_1$ in $w_1$, and thus with the monomials 
appearing in $d_{x_1} w_1$. For example, the three monomials 
which appear on the right-hand sides of eqns 
[18] and [19] correspond, respectively, to the points 
labeled $p_1$, $p_2$, and $p_3$ in Figure 3. The $j$-grading of 
each generator is given by the exponent of $q$ which 
the corresponding monomial contributes to the 
Alexander polynomial. Thus, all three generators in 
Figure 3a have $j$-grading 0, while in Figure 3b, the 
generators $p_1$, $p_2$, and $p_3$ have $j$-gradings 2, 0, and $-2$ 
respectively. 

For general $g$, the monomials appearing in the 
determinant of eqn [13] correspond to intersection 
points of the two totally real tori $\mathbb{T}_\alpha = \alpha_1 \times \cdots \times \alpha_g$ 
and $\mathbb{T}_\beta = \beta_1 \times \cdots \times \beta_g$ inside the symmetric product 
$\mathrm{Sym}^g \Sigma$. The knot Floer homology is the Lagrangian 
Floer homology of $\mathbb{T}_\alpha$ and $\mathbb{T}_\beta$ inside the 
symplectic manifold $\mathrm{Sym}^g(\Sigma - z - w)$. The generators 
of $C_0(D)$ are the points of $\mathbb{T}_\alpha \cap \mathbb{T}_\beta$; the 
differential is defined by counting holomorphic 
disks with boundary on $\mathbb{T}_\alpha$ and $\mathbb{T}_\beta$. To be precise, for 
$x \in \mathbb{T}_\alpha \cap \mathbb{T}_\beta$, 

$$d_0 x = \sum_{y \in \mathbb{T}_\alpha \cap \mathbb{T}_\beta}\ \sum_{\substack{\phi \in \pi_2(x,y),\ \mu(\phi) = 1 \\ n_z(\phi) = n_w(\phi) = 0}} \#\widehat{M}(\phi)\, y$$ [21]

Here $\pi_2(x, y)$ denotes the set of homotopy classes of 
maps of the strip $D = \{a + ib \mid a \in [0, 1]\}$ into $\mathrm{Sym}^g \Sigma$ 
which take the right-hand boundary to $\mathbb{T}_\alpha$ and the 
left-hand boundary to $\mathbb{T}_\beta$, and which limit to $x$ as 
$b \to -\infty$ and to $y$ as $b \to \infty$. $\mu(\phi)$ denotes the formal 
dimension of the space of pseudoholomorphic disks 
in this homotopy class. There is a natural action by 
translation on the space of such maps, so when 
$\mu(\phi) = 1$ we can divide out by this action and obtain 
an oriented zero-dimensional moduli space $\widehat{M}(\phi)$. 
Finally, by $n_z(\phi)$ and $n_w(\phi)$ we denote the intersection 
number of such a strip with the divisors 
determined by $z$ and $w$ inside of $\mathrm{Sym}^g \Sigma$. The 
requirement that they vanish forces the strip to lie 
in $\mathrm{Sym}^g(\Sigma - z - w)$. It can be shown that, for 
$\phi \in \pi_2(x, y)$, 

$$j(x) - j(y) = n_z(\phi) - n_w(\phi)$$ [22]

so $j(d_0 x) = j(x)$. 

When $g = 1$, computing the differential amounts 
to counting maps of the strip into the Heegaard 
torus. This can be done algorithmically using the 
Riemann mapping theorem, so computation of $H_0$ is 
purely combinatorial. Knots of this form are called 
(1,1) knots. They are one of our few windows into 
the behavior of $H_0$ for large knots. 

As an example, consider the diagram of Figure 3a. 
The two shaded regions represent the domains 
of classes $\phi_1 \in \pi_2(p_1, p_2)$ and $\phi_2 \in \pi_2(p_3, p_2)$. 
The Riemann mapping theorem implies that up 
to reparametrization, there is a unique holomorphic 
map of the strip into each region, so 
$\#\widehat{M}(\phi_1) = \pm 1 = \#\widehat{M}(\phi_2)$. The differential in 
$C_0(D_1)$ is given by 

$$d_0(p_1) = p_2 = d_0(p_3)$$
$$d_0(p_2) = 0$$

and $H_0(U) \cong \mathbb{Z}$. This reflects the fact that we could 
have chosen the more efficient diagram of $S^3 - U$ 
shown in Figure 1, simply by moving $\beta_1$ to remove 
two of the intersection points. 

For comparison, consider the diagram for the 
trefoil shown in Figure 3b. All three generators of 
$C_0(D_2)$ have different $j$-gradings, so we must have 
$d_0 = 0$. Thus, $H_0(T) \cong \mathbb{Z}^3$. The two disks $\phi_1$ and $\phi_2$ 
are still present, but now $n_z(\phi_1) = n_w(\phi_2) = 1$, so 
neither disk contributes to the differential. This is 
reflected in the fact that $\beta_1$ cannot be moved to 
reduce the number of intersection points without 
passing through either $z$ or $w$. 
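
The ranks quoted for the two diagrams can be checked by elementary linear algebra: if $d$ is an $n \times n$ matrix with $d^2 = 0$, the total rank of the homology over $\mathbb{Q}$ is $n - 2\,\mathrm{rank}(d)$. The Python sketch below applies this to the three-generator complexes above. The helper names are hypothetical, the computation is done over $\mathbb{Q}$ (so torsion is ignored), and the signs $\pm 1$ in the differential are taken to be $+1$, an assumption that does not affect the rank.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank over Q of a matrix given as a list of rows (Gaussian elimination)."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def total_homology_rank(d):
    """Total rank of the homology of (Q^n, d), where d is n x n and d*d = 0."""
    return len(d) - 2 * matrix_rank(d)

# Basis (p1, p2, p3); column k holds the coordinates of d0(p_k).
d_unknot = [[0, 0, 0],
            [1, 0, 1],   # d0(p1) = p2 and d0(p3) = p2
            [0, 0, 0]]
d_trefoil = [[0, 0, 0],
             [0, 0, 0],
             [0, 0, 0]]  # all generators lie in different j-gradings

rank_unknot = total_homology_rank(d_unknot)
rank_trefoil = total_homology_rank(d_trefoil)
```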


Deformations 


In this case, finding an appropriate deformation of 
$C_0(D)$ is simple: we just drop the condition that 
$n_z(\phi) = 0$ in the definition of the differential. If a 
homotopy class $\phi \in \pi_2(x, y)$ contributes nontrivially 
to the sum, it must have a holomorphic representative, 
which necessarily intersects the divisor in $\mathrm{Sym}^g \Sigma$ 
defined by $z$ non-negatively. Thus, $n_z(\phi) \ge 0$. From 
[22], it follows that $j(x) - j(y) = n_z(\phi) \ge 0$, so this 
new differential has the form $d_0 + d_0'$, where $d_0'$ 
strictly lowers the $j$-grading. 

The fact that the homology of $C_0(D)$ with respect 
to the perturbed differential is $\mathbb{Z}$ goes back to the 
knot Floer homology’s roots in Heegaard Floer 
homology. By dropping the condition that 
$n_z(\phi) = 0$, we have effectively forgotten about the 
basepoint $z$, and thus about the knot. The new 
complex simply computes the Heegaard Floer group 
$\widehat{HF}(S^3)$, which is isomorphic to $\mathbb{Z}$. When $g = 1$, this 
can be seen directly: if we remove the basepoint $z$, 
any genus-1 Heegaard diagram of $S^3$ can be isotoped 
into the standard diagram of Figure 1. 


Construction of $H_2$ 


In this case, the geometric data $D$ needed to define 
the chain complex $C_2(D)$ is a planar diagram of 
the knot, and the classical model on which the 
construction of $C_2(D)$ is based is the Kauffman state 
model for the Jones polynomial. There is a related 
homology theory $\overline{H}_2(D)$, known as the unreduced 
Khovanov homology, whose graded Euler characteristic 
is $(q + q^{-1})P_2(K)$. This is the original categorification 
of the Jones polynomial defined in 
Khovanov (2000). 

To construct $\overline{C}_2(D)$, we consider complete resolutions 
of the planar diagram $D$. As shown in Figure 4, 
there are two different ways to resolve each crossing 
of $D$. If $D$ has $n$ crossings, there will be $2^n$ ways to 
resolve all $n$, one for each vertex of the cube $[0, 1]^n$. 
To a vertex $v$, we associate the crossingless planar 
diagram $D_v$ obtained from the corresponding resolution 
of $D$. Thus, each vertex of the cube is 
decorated by a 1-manifold $D_v$. 

If $e$ is an edge joining vertices $v_0$ and $v_1$ (where $v_0$ 
has one more 0 coordinate than $v_1$), we write 
$e : v_0 \to v_1$, and decorate $e$ with a two-dimensional 
cobordism $S_e$ from $D_{v_0}$ to $D_{v_1}$. $S_e$ is a product 
cobordism outside a neighborhood of a single 
crossing, where it is the one-handle cobordism 
between the 0-resolution and the 1-resolution. The 
resulting cobordism is necessarily composed of 
a union of product cobordisms (cylinders) together 
with a single nontrivial cobordism (a pair of pants). 
Thus, starting from $D$, we have constructed an 
$n$-dimensional cube whose vertices are decorated by 
1-manifolds and whose edges are decorated by 
cobordisms between them. This is the cube of 
resolutions of $D$. 

The next step in the construction of $\overline{C}_2(D)$ is to 
apply a graded (1 + 1)-dimensional TQFT $\mathcal{A}$ to the 
cube of resolutions. $\mathcal{A}$ is a functor which associates 
to each 1-manifold $X$ a group $\mathcal{A}(X)$, and to each 
two-dimensional cobordism $W : X_1 \to X_2$ a homomorphism 
$\mathcal{A}(W) : \mathcal{A}(X_1) \to \mathcal{A}(X_2)$. If we apply $\mathcal{A}$ to 
all the manifolds and cobordisms of the cube of 
resolutions, we obtain a new cube, decorated with 
groups and homomorphisms between them. This process 
is summarized in Table 1. 

Figure 4 0- and 1-resolutions of a crossing. 

Table 1 Summary of cube of resolutions 

Vertex $v$  →  1-manifold $D_v$  →  Group $\mathcal{A}(D_v)$ 
Edge $e : v_0 \to v_1$  →  Cobordism $S_e : D_{v_0} \to D_{v_1}$  →  Homomorphism $\mathcal{A}(S_e) : \mathcal{A}(D_{v_0}) \to \mathcal{A}(D_{v_1})$ 

We can now describe the chain complex $\overline{C}_2(D)$. 
As a group, 

$$\overline{C}_2(D) = \bigoplus_v \mathcal{A}(D_v)$$ [23]

where the sum runs over all vertices of the cube of 
resolutions. For $x \in \mathcal{A}(D_v)$, the differential is given by 

$$d_2 x = \sum_{e : v \to v'} (-1)^{s(e)} \mathcal{A}(S_e)(x)$$ [24]

The signs in this sum are determined by assigning a 
sign $(-1)^{s(e)}$ to each edge $e$ in such a way that every 
two-dimensional face of the cube has an odd 
number of $-$ signs on its edges. (This ensures that 
$d_2^2 = 0$.) There are many ways to do this, but they all 
result in isomorphic complexes. 
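
One common way to meet the sign condition, offered here as an illustration rather than as the choice made in the original construction, is to give the edge that changes coordinate $k$ of a vertex $v$ the sign $(-1)^{v_1 + \cdots + v_{k-1}}$. The Python sketch below enumerates the two-dimensional faces of the cube and counts the minus signs on each; all names are hypothetical and coordinates are indexed from 0.

```python
from itertools import combinations, product

def edge_sign(v, k):
    """Sign of the edge leaving vertex v by changing coordinate k from 0 to 1."""
    return -1 if sum(v[:k]) % 2 else 1

def face_minus_counts(n):
    """Number of minus signs on each 2-dimensional face of the cube {0,1}^n."""
    counts = []
    for v in product((0, 1), repeat=n):
        zeros = [k for k in range(n) if v[k] == 0]
        # A face is fixed by its lowest vertex v and two coordinates a < b to flip.
        for a, b in combinations(zeros, 2):
            va = v[:a] + (1,) + v[a + 1:]   # v with coordinate a flipped
            vb = v[:b] + (1,) + v[b + 1:]   # v with coordinate b flipped
            signs = [edge_sign(v, a), edge_sign(v, b),
                     edge_sign(va, b), edge_sign(vb, a)]
            counts.append(signs.count(-1))
    return counts

counts_n3 = face_minus_counts(3)
counts_n4 = face_minus_counts(4)
```

With this rule every face carries an odd number of minus signs, so the commuting squares of the cube become anticommuting and the signed sum in [24] squares to zero.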

The homological grading $i$ on $\overline{C}_2(D)$ is easily 
determined. For $x \in \mathcal{A}(D_v)$, we set $i(x) = i(v) - c(D)$, 
where $i(v)$ is the sum of all the coordinates of $v$, and 
$c(D)$ is a constant. Clearly, $i(d_2 x) = i(x) + 1$. In order 
to have invariance, it turns out that $c(D)$ must be 
chosen to be equal to the number of negative 
crossings in $D$. 

It remains to specify the TQFT $\mathcal{A}$. At the level of 
groups, $\mathcal{A}(S^1)$ is a free abelian group of rank 2: 

$$\mathcal{A}(S^1) = A = \langle 1, X \rangle$$ [25]

General principles then imply that 

$$\mathcal{A}\Big(\coprod^{n} S^1\Big) = A^{\otimes n}$$ [26]

To specify the maps induced by cobordisms, it is 
enough to describe the maps associated to the two 
pairs of pants shown in Figure 5. They are given by 

$$m(1 \otimes 1) = 1, \qquad \Delta(1) = 1 \otimes X + X \otimes 1$$ [27]
$$m(1 \otimes X) = m(X \otimes 1) = X, \qquad \Delta(X) = X \otimes X$$ [28]
$$m(X \otimes X) = 0$$ [29]

Figure 5 Maps induced by pairs of pants: $m : A \otimes A \to A$ and $\Delta : A \to A \otimes A$. 

Note that the multiplication $m$ makes $A$ into a 
commutative ring isomorphic to $\mathbb{Z}[X]/(X^2)$. 

$\mathcal{A}$ is a graded TQFT. In other words, there is a 
grading $q$ on $A$ and its tensor products, determined by 

$$q(1) = 1, \qquad q(a \otimes b) = q(a) + q(b)$$ [30]
$$q(X) = -1$$ [31]

From eqns [27]–[29], it is easy to see that 

$$q(m(a \otimes b)) = q(a \otimes b) - 1, \qquad q(\Delta(a)) = q(a) - 1$$ [32]


If we define $j(x) = k(D) + q(x) + i(x)$, it follows that 
$j(d_2 x) = j(x)$. Taking the graded Euler characteristic 
gives 

$$\chi(\overline{C}_2(D)) = \sum_v (-1)^{i(v) - c(D)}\, q^{k(D) + i(v) - c(D)}\, (q + q^{-1})^{n_v}$$ [33]

where $n_v$ is the number of components of $D_v$. If we 
define $k(D)$ to be the writhe of $D$, this is precisely 
Kauffman’s formula for the unnormalized Jones 
polynomial. 

Figure 6 illustrates $\overline{C}_2(D)$ for a simple two-crossing 
link. The figure shows the original link (in 
the center), the cube of resolutions, and basis vectors 
for $\overline{C}_2(D)$, together with their $j$-gradings. We leave it 
to the reader to check that the homology $\overline{H}_2(L)$ is 
four dimensional, supported in $j$-gradings 1 and 3 at 
the vertex labeled 00, and in gradings 5 and 7 at the 
vertex labeled 11. 

Figure 6 The cube of resolutions for the Hopf link. 


To get the reduced chain complex $C_2(D)$, we must 
divide the graded Euler characteristic by a factor of 
$(q + q^{-1})$. This is accomplished by choosing a 
marked point on $K$ and requiring that for each 
resolution $D_v$, the vector associated to the circle 
containing the marked point lie in the subspace of $A$ 
spanned by $X$. If $D$ is a diagram of a knot, the 
resulting homology $H_2(K)$ is independent of the 
choice of marked point. For links, $H_2(L)$ depends on 
the component of the link on which the marked 
point lies. 


Deformations 


Deformations in the $N = 2$ theory are constructed 
using a technique introduced by E S Lee. The idea is 
to replace the graded TQFT $\mathcal{A}$ with a filtered TQFT 
$\mathcal{A}'$. As a group, we still have $\mathcal{A}'(S^1) = A$, but the 
multiplication and comultiplication maps are perturbations 
of those for $\mathcal{A}$: 

$$m'(1 \otimes 1) = 1, \qquad \Delta'(1) = 1 \otimes X + X \otimes 1 - r\, 1 \otimes 1$$ [34]
$$m'(1 \otimes X) = m'(X \otimes 1) = X, \qquad \Delta'(X) = X \otimes X + s\, 1 \otimes 1$$ [35]
$$m'(X \otimes X) = rX + s$$ [36]

The new terms involving $r$ and $s$ have $q$ gradings 
strictly greater than the terms which are shared with 
eqns [27]–[29]. Thus, the differential defined by 
replacing $m$ and $\Delta$ by $m'$ and $\Delta'$ will be a 
perturbation of the original differential on $\overline{C}_2(D)$. 
The simplicity of the homology with respect to the 
new differential depends on the fact that when the 
polynomial $X^2 - rX - s$ has simple roots, the TQFT 
$\mathcal{A}'$ decomposes as a direct sum of two one-dimensional 
TQFTs. This implies that for a knot, 
the deformed homology $\overline{H}_2'(K)$ decomposes as a 
direct sum of two copies of $H_1(K)$. This group is 
always isomorphic to $\mathbb{Z}$, so $\overline{H}_2'(K) \cong \mathbb{Z} \oplus \mathbb{Z}$. If $s = 0$, 
the same strategy can be used to define deformations 
of the reduced chain complex $C_2(D)$. In this case, we 
find that the deformed homology is isomorphic to a 
single copy of $\mathbb{Z}$. 
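
A standard consistency check on maps such as [34]–[36], not spelled out in the text, is the Frobenius relation $\Delta' \circ m' = (m' \otimes \mathrm{id}) \circ (\mathrm{id} \otimes \Delta')$, which is what makes the maps assigned to composite cobordisms well defined. The Python sketch below verifies it on basis elements for a few integer values of $r$ and $s$. The encoding (index 0 for $1$, index 1 for $X$) and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

def mult(a, b, r, s):
    """m'(e_a ⊗ e_b) as {basis index: coeff}; index 0 is 1, index 1 is X."""
    if a == 0:
        return {b: 1}
    if b == 0:
        return {a: 1}
    return {1: r, 0: s}                      # X * X = r X + s, as in [36]

def comult(a, r, s):
    """Delta'(e_a) as {(index, index): coeff}, following [34] and [35]."""
    if a == 0:
        return {(0, 1): 1, (1, 0): 1, (0, 0): -r}
    return {(1, 1): 1, (0, 0): s}

def _clean(d):
    return {k: v for k, v in d.items() if v != 0}

def delta_after_m(a, b, r, s):
    """Delta'(m'(e_a ⊗ e_b))."""
    out = defaultdict(int)
    for c, coeff in mult(a, b, r, s).items():
        for key, coeff2 in comult(c, r, s).items():
            out[key] += coeff * coeff2
    return _clean(out)

def m_id_after_id_delta(a, b, r, s):
    """(m' ⊗ id)(e_a ⊗ Delta'(e_b))."""
    out = defaultdict(int)
    for (b1, b2), coeff in comult(b, r, s).items():
        for c, coeff2 in mult(a, b1, r, s).items():
            out[(c, b2)] += coeff * coeff2
    return _clean(out)

def frobenius_holds(r, s):
    return all(delta_after_m(a, b, r, s) == m_id_after_id_delta(a, b, r, s)
               for a, b in product((0, 1), repeat=2))
```

Setting $r = s = 0$ recovers the undeformed maps of [27]–[29], so the same check covers the graded theory.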


See also: Floer Homology; Gauge Theory: Mathematical 
Applications; The Jones Polynomial; Knot Theory and 
Physics; Topological Quantum Field Theory: Overview. 


Introduction 


As in all other physical theories, one expects that 
gravitational phenomena will ultimately be ruled by 
quantum mechanics. This requires considering the 
quantization of the best available theory of gravity, 
namely Einstein’s general relativity. This problem has 
been considered since the 1930s (see Loop Quantum 
Gravity). The application of the rules of quantum 
mechanics to general relativity is immediately problem- 
atic. Unlike other physical interactions, general 
relativity describes gravitational phenomena through a 
distortion of spacetime rather than through a field living 
in spacetime. Therefore, its quantization is bound to be 
very different from that of other physical theories. In 
particular, the well-established framework of perturba- 
tive quantum field theory, used with remarkable success 
in describing electroweak and strong interactions (in the 
latter case at least in certain regimes), runs into trouble 
when applied to general relativity. At present, it is not 
clear if this is a fundamental problem or if there might 
exist an implementation of perturbative quantum field 
theory that works well in the gravitational case. On the 
other hand, there exist examples of field theories where 
perturbative methods fail but that nevertheless can be 
quantized. This suggests that the consideration of 
nonperturbative techniques in the quantization of the 
gravitational field could be a promising avenue. 

In particular, canonical quantization methods 
appear attractive for attempting a nonperturbative 
quantization of gravity. Canonical methods force 
the introduction, in a clear way, of a Hilbert space 
of states and definition of the quantum operators of 
interest. The application of canonical methods to 
classical general relativity was pioneered by Dirac 
and Bergmann in the late 1950s. During the 1960s, 
the resulting canonical theories were considered in a 
quantum setting by DeWitt. At the time it appeared 
that making progress in the canonical quantization 
of general relativity was going to be quite a 
challenge. In particular, the canonical theory has 
constraints, which have to be implemented as 
operator identities quantum mechanically. The 
wave functions were functionals of the spatial metric 
of spacetime. One of the operator identities to 
be satisfied implies that the wave functions only 
depend on properties of the spatial metric that 
are invariant under spatial diffeomorphisms. This 
is a direct consequence of general relativity being 
a theory that is independent of coordinate choice 
since a diffeomorphism changes the assignment of 
coordinates to points in the manifold. Finding such 
wave functions already presented a challenge, since 
there is no well-grounded mathematical theory of 
functionals of diffeomorphism-invariant classes of 
metrics. Moreover, the other operator identity to be 
imposed, known as the Hamiltonian constraint or 
Wheeler-DeWitt equation, is a complicated, 
nonpolynomial operator equation that does not admit 
a simple geometrical interpretation and needs to be 
regularized. Since one does not have a background 
metric to rely upon, traditional regularization 
techniques of quantum field theory are not suitable 
to deal with the Hamiltonian constraint. 

These difficulties severely hampered development 
of canonical methods for the quantization of general 
relativity for approximately two decades. The 
situation started to change when Ashtekar noticed 
that one could choose a different set of variables 
to describe general relativity canonically. Instead of 
using as variable the spatial metric $q_{ab}$, Ashtekar 
chooses to use a set of (densitized) frame fields $E^a_i$. 
The relationship between the metric and the 
densitized frames is $\det(q)\, q^{ab} = E^a_i E^b_i$, and we are 
assuming the Einstein summation convention, that 
is, the index $i$ is summed from 1 to 3 (such an index 
labels which vector in the triad one is referring to). 
The resulting theory has an additional symmetry 
with respect to usual general relativity, in the sense 
that it is invariant under the choice of frame. This 
symmetry operates on the index $i$ as if it were 
an SO(3) symmetry. As canonical momenta the 
usual choice is to pick the extrinsic curvature of the 
3-geometry. Ashtekar chooses a variable related to it 
that behaves under frame transformations as an 
SO(3) connection, $A_a^i$. The resulting theory is therefore 
cast in terms of a canonical pair $(E^a_i, A_a^i)$, with $i$ 
an SO(3) index. One can therefore consider the 
canonical pair as that of a Yang-Mills theory 
associated with the SO(3) group. In fact, associated 
with the extra symmetry under triad rotations the 
theory has a new set of constraints that take 
the form of a Gauss law, $D_a E^a_i = 0$, with $D_a$ the 
covariant derivative formed with the connection $A_a^i$. 
This allows us to view the phase space of a Yang-Mills 
theory as the kinematical arena on which to 
discuss quantum gravity. The theory is of course 
different from the Yang-Mills theory. In particular, 
it still has constraints that imply that it is invariant 
under spacetime diffeomorphisms. In the canonical 
picture, these constraints appear asymmetrically as 
one constraint is associated with time evolution 
(“Hamiltonian constraint”) and a set of three 
constraints is associated with spatial diffeomorphisms 
(“diffeomorphism constraint”). 
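
The relation between the metric and the densitized frames, and the symmetry under a change of frame, can be checked numerically. The following is only a minimal sketch: it assumes the common convention $E^a_i = \sqrt{\det q}\, e^a_i$ with $q^{ab} = e^a_i e^b_i$, and the triad values and helper names (`det3`, `densitized_triad`, `rotate_frame`) are hypothetical, chosen for illustration.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def matmul(x, y):
    """Product of two 3x3 matrices."""
    return [[sum(x[r][k] * y[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def densitized_triad(e):
    """Assumed convention: E^a_i = sqrt(det q) e^a_i with q^{ab} = e^a_i e^b_i."""
    q_inv = matmul(e, transpose(e))   # q^{ab}
    det_q = 1.0 / det3(q_inv)         # determinant of q_{ab}
    return [[math.sqrt(det_q) * e[a][i] for i in range(3)] for a in range(3)]

def rotate_frame(E, theta):
    """Rotate the frame index about the 3-axis: E^a_i -> E^a_j R_{ji}."""
    c, s = math.cos(theta), math.sin(theta)
    R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    return matmul(E, R)

# Hypothetical triad e[a][i]: row = spatial index a, column = frame index i.
e = [[1.0, 0.2, 0.0],
     [0.0, 1.5, 0.3],
     [0.1, 0.0, 2.0]]

E = densitized_triad(e)
q_inv = matmul(e, transpose(e))
det_q = 1.0 / det3(q_inv)

lhs = [[det_q * q_inv[a][b] for b in range(3)] for a in range(3)]  # det(q) q^{ab}
rhs = matmul(E, transpose(E))                                      # E^a_i E^b_i
E_rot = rotate_frame(E, 0.7)
rhs_rot = matmul(E_rot, transpose(E_rot))                          # after a frame rotation

max_err = max(abs(lhs[a][b] - rhs[a][b]) for a in range(3) for b in range(3))
max_err_rot = max(abs(lhs[a][b] - rhs_rot[a][b]) for a in range(3) for b in range(3))
print(max_err, max_err_rot)
```

Both errors vanish up to rounding: the metric is recovered from the frames, and rotating the frame index leaves it unchanged, which is the extra symmetry described above.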

If one quantizes the theory starting from the 
Ashtekar formulation, given the resemblance with 
Yang-Mills theory, the natural choice for a represen- 
tation of the quantum wave functions is to consider 
wave functions of the connection $\Psi[A]$ that are 
invariant under SO(3) transformations. Such a repre- 
sentation is known as “connection representation.” 
There is significant experience in Yang-Mills theory in 
constructing such wave functions. In particular, it is 
known that if one considers the parallel transport 
operator defined by a connection around a closed 
curve (holonomy) and one takes its trace (“Wilson 
loop”), the resulting object is invariant under SO(3) 
transformations. What is more important, the set of 
traces of holonomies along all possible closed loops is 
an overcomplete basis for all gauge-invariant func- 
tions. More recently, it has been shown that one can 
construct a less redundant complete basis using 
techniques from spin networks. We will discuss later 
on how to do this. 
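
The gauge invariance of the Wilson loop can be illustrated on a discretized closed curve. This is a sketch under stated assumptions, not the construction used in the literature: the holonomy is approximated by a product of four SU(2) link matrices around a square (SU(2) being the group named later in the text), a gauge transformation is assumed to act as $U_k \to g_k U_k g_{k+1}^{-1}$, and all function names are illustrative.

```python
import math
import random

def su2(rng):
    """Random SU(2) matrix built from a unit quaternion (a, b, c, d)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    n = math.sqrt(sum(x * x for x in v))
    a, b, c, d = (x / n for x in v)
    return [[complex(a, b), complex(c, d)],
            [complex(-c, d), complex(a, -b)]]

def mul(x, y):
    """Product of two 2x2 complex matrices."""
    return [[sum(x[r][k] * y[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def dagger(x):
    """Conjugate transpose, which is the inverse for SU(2)."""
    return [[x[c][r].conjugate() for c in range(2)] for r in range(2)]

def wilson_loop(links):
    """Trace of the ordered product of link matrices around a closed curve."""
    h = [[1, 0], [0, 1]]
    for u in links:
        h = mul(h, u)
    return (h[0][0] + h[1][1]).real  # the trace of an SU(2) matrix is real

rng = random.Random(0)
links = [su2(rng) for _ in range(4)]   # link k runs from site k to site k+1 (mod 4)
gauge = [su2(rng) for _ in range(4)]   # one group element per site

# Assumed transformation law: U_k -> g_k U_k g_{k+1}^{-1}
transformed = [mul(mul(gauge[k], links[k]), dagger(gauge[(k + 1) % 4]))
               for k in range(4)]

w_before = wilson_loop(links)
w_after = wilson_loop(transformed)
print(w_before, w_after)
```

The intermediate group elements cancel pairwise along the curve, and the remaining conjugation by the element at the base point drops out of the trace.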

Since any gauge-invariant functional can be 
expanded in the basis of Wilson loops, one can 
choose to represent it through the coefficients of 
such an expansion. These coefficients are functions 
of the curve upon which the corresponding element 
of the basis of Wilson loops is based. The 
representation of wave functions in terms of such 
coefficients is called the “loop representation.” Wave 
functions in the loop representation are functions of 
a closed curve (more precisely of families of closed 
curves, or spin networks, as we will discuss below). 

We still have to deal with the diffeomorphism 
and Hamiltonian constraints. The diffeomorphism 
constraint when written in the loop representation 
implies that the wave functions are not functions of 
loops but rather of topologically invariant properties of 
the loops under general diffeomorphisms of the spatial 
manifold containing the loops. Such functions are 
technically known in the mathematical literature as 
“knot invariants.” This is the first point of connection 
between knot invariants and quantum gravity; they 
constitute the kinematical arena of the theory. One still 
has to deal with the Hamiltonian constraint, which has 
to be imposed as an operator equation. We shall see that 
knot theory also seems to have a lot to say about 
solutions of the Hamiltonian constraint. This is quite 
remarkable, since the Hamiltonian constraint embodies 
in detail the specific dynamics of Einstein’s theory of 
gravitation, and to our knowledge this is an input that 
has never gone into the ideas of knot theory. 

In terms of the Ashtekar variables, the Hamiltonian 
constraint takes the form 


$H = \vec{E}^a \cdot \vec{E}^b \times (\vec{B}^c + \Lambda \vec{E}^c)\, \epsilon_{abc}$   [1] 


where we have used a conventional vector notation 
for the frame indices and kept explicit the spatial 
indices. $\epsilon_{abc}$ is the Levi-Civita totally antisymmetric 
tensor. We have included a possible cosmological 
constant $\Lambda$. The Ashtekar formulation can be 
constructed in different ways. In the original 
formulation, the connection $A_a^i$ was a complex 
variable and the Hamiltonian took the form we 
listed above. However, the resulting theory was only 
equivalent to real general relativity if the variables 
satisfied certain reality conditions. One can choose 
to use a real connection instead, but then the 
Hamiltonian constraint has additional terms. At 
the moment, we will concentrate on the constraint 
as listed above. The constraint has to be implemen- 
ted as a quantum operator acting on wave functions. 
Since it involves the product of operators, it needs to 
be regularized. Most regularization methods are 
problematic in this context, since they use a metric, 
and here the metric is a quantum operator, not an 
external fixed quantity. If we ignore these difficul- 
ties, one observes that, if one were to choose a 
quantum state, for instance in the connection 
representation, for which, 


$\Lambda \vec{E}^a \Psi[A] = -\vec{B}^a \Psi[A]$   [2] 


the state would be annihilated by the Hamiltonian 
constraint, and this would be true no matter what 
regularization was chosen. Classically, the condition 
$\vec{E}^a \sim \vec{B}^a$ is satisfied for the de Sitter geometry, so one 
could envision the state as a quantum state 
associated with such geometry. The exact solution 
of the above equation is given by a state that is the 
exponential of the integral on the spatial slice of the 
Chern-Simons form built from the connection 


$\Psi_{CS}[A] = \exp\left( k \int d^3x\, \mathrm{tr}\left( A \wedge dA + \tfrac{2}{3}\, A \wedge A \wedge A \right) \right)$   [3] 


and the constant $k$ needs to be chosen as $k = 6/\Lambda$ for 
the state to be a solution. 

One can ask, “what is the expression of this state 
in the loop representation?” To answer this, one 
needs to compute the coefficients of its expansion in 
the basis of Wilson loops $W_\gamma[A]$, where, as we stated 
earlier, $\gamma$ should be a collection of (intersecting) 
loops (later we will discuss the generalization to spin 
networks). The expression for the coefficients will 
be a function only of the loops $\gamma$ and is given by 


$\Psi_{CS}[\gamma] = \int DA\, W_\gamma[A]\, \Psi_{CS}[A]$   [4] 


This expression is invariant under diffeomorph- 
isms of the manifold or, equivalently, under smooth 
deformations of the curve $\gamma$. That is, it is what in the 
mathematical literature is called a “knot invariant.” In 
fact, this integral has been studied by Witten in the 
context of Chern-Simons theory and has been 
shown to be related to the Kauffman bracket knot 
polynomial, which in turn is related to the cele- 
brated Jones polynomial. Therefore, the implication 
of these results is that the Kauffman bracket knot 
polynomial appears to be the representation in the 
loop representation of a state of quantum gravity 
that solves the quantum Einstein equations (with a 
cosmological constant). The reader may be intrigued 
by the word “polynomial” in this context. It should 
be noted that the Chern-Simons state $\Psi_{CS}[A]$ 
depended on a parameter k, which had to take a 
certain value for it to solve the quantum Einstein 
equations. The resulting knot invariant is a poly- 
nomial in exp (k). If one expands out the result, an 
infinite power series in k results. There will be 
infinitely many coefficients in the series, but they are just 
combinations of the finite number of coefficients of 
the polynomial. Knot polynomials are a powerful 
tool for analyzing and distinguishing knots. The 
coefficients of the polynomials are all knot invari- 
ants. Typically, for “simple” knots, the first few 
coefficients of the knot polynomial are nonzero. As 
one considers more complicated knottings, higher 
coefficients become nonvanishing. The ultimate goal 
of knot theory is to be able to consider two arbitrary 
knots and to unambiguously determine if the two 
knots are related by a smooth transformation. The 
knot polynomials appear as promising tools for 
achieving this task that has remained elusive up 
to now. 

Returning to quantum gravity, to have a well-known 
knot polynomial as a solution of the quantum Einstein 
equations is a remarkable fact. The first connection we 
outlined between knot theory and quantum gravity was 
less unexpected: if one describes a theory that is 
diffeomorphism invariant in terms of loops, the 
appearance of knots is inevitable. But we are now 
finding that knot invariants from the mathematical 
literature, which were constructed without any knowl- 
edge of the details of the dynamics of the Einstein 
equations, seem to manage to solve such equations. This 
is either a big coincidence or a pointer to some 
unexplained deep connection yet to be understood. 
Notice, for instance, that other theories of gravity would 
not have the Kauffman bracket as a quantum state. 

There is a certain technicality about the Kauffman 
bracket that makes it difficult to argue with precision 
that it is a state of quantum gravity. To understand 
this technicality better, it is perhaps best to concen- 
trate on the form of the quantum state written above 
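if the connection is an abelian connection. In that 
case, the integral in question, 


$\Psi_{CS}^{\rm abelian}[\gamma] = \int DA\, \exp\left( i \oint_\gamma dy^a A_a \right) \exp\left( k \int d^3x\, \epsilon^{abc} A_a \partial_b A_c \right)$   [5] 


can be computed by turning it into a Gaussian integral. The result is 


$\Psi_{CS}^{\rm abelian}[\gamma] = \oint_\gamma dx^a \oint_\gamma dy^b\, \epsilon_{abc}\, \frac{(x - y)^c}{|x - y|^3}$   [6] 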


This integral has problems, since the integrand is 
ill-defined when x=y. Notice that the integral 
would be well defined if the two contour integrals 
were evaluated on different, nonintersecting curves. 
The result would be the well-known formula for 
the Gauss linking number of the curves, yielding 
zero if they are not linked and an integer multiple 
of $4\pi$ if they are. So the integral we were trying to 
compute was actually the Gauss linking number of 
the curve with itself. Such a quantity is not well 
defined for ordinary curves. To deal with this 
problem, mathematicians introduced the concept of 
framed knots. A framed knot is a curve with a 
prescription to determine a second curve from it. 
One way to see it is to construct another curve that 
is “infinitesimally close” in space to the original 
one. It is clear that there is no canonical way to 
compute such a second curve. Then, when one 
considers quantities like the self-linking number, 
one makes them well defined by evaluating the two 
integrals on the two curves, the original one and 
the one yielded by the prescription. In reality, the 
notion of framing is a bit more elaborate than what 
we hint at here, since one could consider invariants 
constructed with more than two integrals, which could 
still be ill defined if one only considers two curves. 
The notion has to be extended as well to handle 
intersections in the curves. We will ignore these 
subtleties in this discussion. 
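
The statement that the double contour integral yields zero or a multiple of $4\pi$ for two distinct curves can be checked numerically. The sketch below is illustrative only: it discretizes the integrand $\epsilon_{abc}\, dx^a dy^b (x-y)^c / |x-y|^3$ with a simple midpoint-style sum over two unit circles, and the curve placements and function names are assumptions made for the example.

```python
import math

def circle(center, u, v, n):
    """Sample n points, and tangent increments, of center + cos(t) u + sin(t) v."""
    pts, dts = [], []
    h = 2 * math.pi / n
    for k in range(n):
        t = h * k
        c, s = math.cos(t), math.sin(t)
        pts.append(tuple(center[i] + c * u[i] + s * v[i] for i in range(3)))
        dts.append(tuple(h * (-s * u[i] + c * v[i]) for i in range(3)))
    return pts, dts

def gauss_integral(curve1, curve2):
    """Discrete double sum of (dx x dy) . (x - y) / |x - y|^3 over two curves."""
    (xs, dxs), (ys, dys) = curve1, curve2
    total = 0.0
    for x, dx in zip(xs, dxs):
        for y, dy in zip(ys, dys):
            r = (x[0] - y[0], x[1] - y[1], x[2] - y[2])
            cross = (dx[1] * dy[2] - dx[2] * dy[1],
                     dx[2] * dy[0] - dx[0] * dy[2],
                     dx[0] * dy[1] - dx[1] * dy[0])
            dist3 = (r[0] ** 2 + r[1] ** 2 + r[2] ** 2) ** 1.5
            total += (cross[0] * r[0] + cross[1] * r[1] + cross[2] * r[2]) / dist3
    return total

N = 200
ex, ey, ez = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
base = circle((0.0, 0.0, 0.0), ex, ey, N)       # unit circle in the xy-plane
linked = circle((1.0, 0.0, 0.0), ex, ez, N)     # passes through the first disk once
separate = circle((3.0, 0.0, 0.0), ex, ez, N)   # stays outside the first disk

hopf = gauss_integral(base, linked)
unlinked = gauss_integral(base, separate)
print(hopf / (4 * math.pi), unlinked / (4 * math.pi))
```

The sign of the linked result depends on the orientations chosen for the two circles. Evaluating the same sum with both arguments equal to one curve hits the $x = y$ singularity, which is the problem that framing is introduced to resolve.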

The Kauffman bracket knot invariant is an 
invariant of framed knots, just like the self-linking 
number. It is not well defined for a single curve. It 
requires a framing of the knot. In quantum gravity, 
there is no compelling reason to consider framed 
curves. It is true that framed curves arise naturally in 
q-deformed field theories and perhaps a q-deformed 
version of quantum gravity is what needs to be 
considered to accommodate the Chern-Simons state, 
but at the moment there are no proposals along 
these lines that have widespread consensus. 

So, it appears the Kauffman bracket does not have 
a natural role to play as a state of quantum gravity. 
However, it is known that the frame dependence of 
the Kauffman bracket knot polynomial can be 
captured in an overall factor that depends on the 
self-linking number. If one strips the polynomial of 
this factor, one gets the Jones polynomial, which is a 
knot invariant of single curves. Could it be that this 
polynomial has a chance of being a solution of the 
quantum Einstein equations? 

To determine this, the analogy with Chern- 
Simons theory is no longer useful, since there is no 
straightforward way to transform the relation 
between the Kauffman and Jones polynomials into 
relations between states in the connection represen- 
tation. To analyze if the Jones polynomial could be 
a solution of the quantum Einstein equations, one 
needs to write the quantum Einstein equations 
directly in terms of loops. 

There have been several attempts to rewrite the 
quantum Einstein equations directly in the loop 
representation. In one of these attempts, the curva- 
ture that appears in the Hamiltonian constraint was 
represented by the “loop derivative.” This is a 
differential operator that can be introduced in the 
space of loops by considering that two loops that 
differ by a small element of area are “close.” One 
can build an attractive differential calculus in loop 
space that actually encodes many of the kinematical 
properties that are useful to formulate Yang-Mills 
theory. 


The Hamiltonian constraint in terms of the loop 
derivative is an operator that has an explicit form. 
The coefficients of the Jones polynomial can also be 
given an explicit form by computing perturbatively 
the integral in the Chern—Simons theory. The results 
are generalizations of the types of integrals that arise 
in the self-linking number, but involving a larger 
number of integrals. One can therefore envisage 
carrying out an explicit computation in which one 
checks if the coefficients of the Jones polynomial are 
annihilated or not by the Hamiltonian constraint of 
quantum gravity in the loop representation. Such a 
calculation has been carried out for the first few 
coefficients. It turns out that the second coefficient 
(the first coefficient is normalized to unity, so it 
trivially satisfies the constraint) is indeed annihilated 
by the Hamiltonian constraint of vacuum quantum 
gravity (with zero cosmological constant). It has 
been shown that the third coefficient is not, and 
there are good arguments to indicate that other 
coefficients will not be states of quantum gravity. 

So, a remarkable result has been found in that one 
of the coefficients of the Jones polynomial (related 
to the Arf and Casson invariants) is annihilated by a 
version of the quantum Hamiltonian constraint of 
general relativity. The result is quite nontrivial; it 
requires a fair amount of calculation to actually 
show that the coefficient is annihilated. The mean- 
ing of this quantum state and the deep reason why it 
is annihilated remain at present a mystery. 

The quantum Hamiltonian constraint based on the 
loop derivative makes certain assumptions about the 
space of functions one is using to quantize the theory. 
In quantum field theory, not all classical operators 
have a well-defined quantum counterpart. The choice 
being made is to assume that the curvature $F_{ab}$ is a 
well-defined quantum operator defined by the loop 
derivative. Differentiability of knot polynomials is 
not a new idea. It is the core idea of the Vassiliev knot 
invariants, which are defined by a set of identities, 
one of them acting as a “derivative in knot space.” It 
can be shown that the loop derivative is a concrete 
implementation of the Vassiliev derivative and, there- 
fore, Vassiliev invariants are the “arena” in which this 
version of quantum gravity takes place. 

The Hamiltonian based on the loop derivative has 
problems, in the sense that it is obtained by a 
regularization procedure that requires extra external 
geometric structures. This is common practice in 
Yang-Mills theory, where one has at hand a fixed 
external background metric. However, in gravity the 
geometry is a dynamical object and, if one con- 
structs expressions that resort to some fixed external 
geometry, one gets inconsistencies. In particular, it is 
expected that the Hamiltonian based on the loop 
derivative will not reproduce the correct Poisson 
algebra of canonical general relativity. This sort of 
problem plagued early attempts to construct a 
quantum version of the Hamiltonian constraint in 
the early 1990s. 

A point that we mentioned earlier but did not 
elaborate upon, is that the Wilson loops constitute an 
overcomplete basis of states. Therefore, if one takes a 
quantum state and expands it on such a basis, one gets 
that the coefficients of the expansion satisfy certain 
identities, called the Mandelstam identities. These are 
nonlinear identities that states in the loop representa- 
tion have to satisfy. These identities are very incon- 
venient at the time of constructing quantum states. The 
identities stem from the fact that if one chooses a 
matrix representation of the group of interest, the fact 
that one is in a given representation is indicated by 
certain identities the matrices satisfy. To break free 
from these constraints, one possibility is to consider 
multiple representations when constructing Wilson 
loops. To do this, one considers piecewise-continuous 
graphs with intersections (the nonintersecting case is 
a trivial subcase). Along the lines connecting the 
intersections one considers holonomies in a given 
representation for a given line. In the case of the group 
SU(2), which is the one of interest in quantum gravity, 
such representations are labeled by a (half-) integer. 
One then considers invariant tensors in the group to 
“tie the holonomies together” at intersections. The 
resulting object is a gauge-invariant object for a given 
connection based on a “spin network.” The latter 
is an embedded piecewise-continuous graph with an 
assignment of integers to each of its lines and an 
assignment of “intertwiners” at each intersection (if 
the intersections are trivalent or lower, one can choose 
canonical intertwiners and forget about them). 
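
The text does not write the Mandelstam identities out. One standard example for 2x2 matrices of unit determinant, quoted here as an illustration rather than taken from the source, is $W(\alpha)\,W(\beta) = W(\alpha\beta) + W(\alpha\beta^{-1})$, which follows from $B + B^{-1} = \mathrm{tr}(B)\, 1$. A minimal numerical sketch, with hypothetical helper names, in which the two holonomies are simply random SU(2) matrices:

```python
import math
import random

def su2(rng):
    """Random SU(2) matrix built from a unit quaternion (a, b, c, d)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    n = math.sqrt(sum(x * x for x in v))
    a, b, c, d = (x / n for x in v)
    return [[complex(a, b), complex(c, d)],
            [complex(-c, d), complex(a, -b)]]

def mul(x, y):
    """Product of two 2x2 complex matrices."""
    return [[sum(x[r][k] * y[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def dagger(x):
    """Conjugate transpose, which is the inverse for SU(2)."""
    return [[x[c][r].conjugate() for c in range(2)] for r in range(2)]

def trace(x):
    return (x[0][0] + x[1][1]).real

rng = random.Random(1)
hol_alpha, hol_beta = su2(rng), su2(rng)   # stand-ins for two holonomies

# Nonlinear relation among Wilson loops in the fundamental representation.
lhs = trace(hol_alpha) * trace(hol_beta)
rhs = trace(mul(hol_alpha, hol_beta)) + trace(mul(hol_alpha, dagger(hol_beta)))
print(lhs, rhs)
```

Because a product of two Wilson loops can be rewritten as a sum of others, the loops are not independent, which is why the basis is overcomplete.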

One can then consider the “spin network represen- 
tation” in which one expands gauge-invariant states 
in terms of the basis of Wilson nets. Knot polynomials 
for these types of graphs have been considered in the 
mathematical literature (“polynomials of colored 
graphs”). The construction with the Chern—Simons 
state can be repeated, and there exist suitable general- 
izations of the Kauffman bracket and Jones polyno- 
mials. The Hamiltonian based on the loop derivative 
can also be introduced in this context; again, its action 
is well defined on suitable generalizations of Vassiliev 
invariants for these kinds of graphs. This opens the 
possibility of encoding the quantum dynamics of 
general relativity as a combinatorial action in the 
space of Vassiliev invariants. 

An alternative Hamiltonian based on assuming that 
the holonomies and the volume operators are well 
defined quantum mechanically (but not the curvature) 
has been introduced that has the advantage of not 
requiring external structures for its regularization. In 
fact, it can be explicitly checked that it satisfies the 
correct Poisson algebra without anomalies at the 
quantum level. The exploration of the action of this 
Hamiltonian constraint on knot polynomials has not 
been carried out as systematically as for the one based 
on the loop derivative, but it has been explicitly shown 
that the first coefficient in the expansion of the Jones 
polynomial is annihilated by this Hamiltonian con- 
straint. The first coefficient, written in terms of loops, 
was simply the numeral 1 and was automatically 
annihilated. In terms of spin network states, the first 
coefficient is the “chromatic evaluation” of the net- 
work (the result of computing the Wilson loop on a 
connection that is pure gauge). It is somewhat 
nontrivial to show that this quantity is actually 
annihilated by the Hamiltonian constraint in question. 
At the moment, the issue of what the correct 
Hamiltonian constraint is that describes a realistic 
and physically correct theory of quantum gravity is 
still open to debate. There are certain concerns that 
the action of the operators considered up to now is 
too simple to encompass the true dynamics of 
general relativity. Constructing a semiclassical the- 
ory that could confirm or deny the viability of the 
proposals is a complicated task, since one has to 
make contact with physics that is not diffeomor- 
phism invariant in the context of a theory that is. 
Moreover, in canonical quantum gravity, there 
exists the “problem of time.” Since the Hamiltonian 
vanishes, the dynamics implied by it is trivial, and 
one has to disentangle the true dynamics by 
relational constructions among the variables of the 
theory. One then needs to compare the resulting 
predictions with classical general relativity. 
Whether the current proposals are viable, and 
whether knot theory will play a role only at a “kinematical 
level” or will actually play a key role in the detailed 
dynamics of quantum general relativity, is yet to be 
seen. It is reassuring that in partial constructions, 
celebrated knot polynomials have appeared to have 
some knowledge of the dynamics of the Einstein 
equations. 

Quantum gravity being an unfinished symphony, 
we cannot entirely conclude how great an impact 
knot theory will have on it in the end. One can only 
note that beautiful mathematical results seem to tie 
in naturally with the partial constructions that have 
been carried out thus far. 


Knot Theory and Physics 


L H Kauffman, University of Illinois at Chicago, 
Chicago, IL, USA 


© 2006 Elsevier Ltd. All rights reserved. 


Introduction 


This article is an introduction to some of the relationships 
between knot theory and theoretical physics. 
Knots themselves are macroscopic physical phenomena 
in three-dimensional space, occurring in rope, vines, 
telephone cords, polymer chains, DNA, certain species 
of eel, and many other places in the natural and man-made 
world. The study of topological invariants of 
knots leads to relationships with statistical mechanics 
and quantum physics. This is a remarkable and deep 
situation where the study of certain (topological) 
aspects of the macroscopic world is entwined with 
theories developed for the subtleties of the microscopic 
world. The present article is an introduction to the 
mathematical side of these connections, with some 
hints and references to the related physics. 

We begin with a short introduction to knots, 
links, braids, and the bracket polynomial invariant 
of knots and links. The article then discusses 
Vassiliev invariants of knots and links, and how 
these invariants are naturally related to Lie algebras 
and to Witten’s gauge-theoretic approach. This part 
of the article is an introduction to how Vassiliev 
invariants in knot theory arise naturally in the 
context of Witten’s functional integral. 

The article is divided into several sections beyond 
the introduction. Section two is a quick introduction 
to the topology of knots and links. The third one 
discusses Vassiliev invariants and invariants of rigid 
vertex graphs. The fourth section introduces the 
basic formalism and shows how Witten’s functional 
integral is related directly to Vassiliev invariants. 
The fifth section discusses the loop transform and 
loop quantum gravity in this context. The final 
section is an introduction to topological quantum 
field theory, and to the use of these techniques in 
producing unitary representations of the braid 
group, a topic of intense interest in quantum 
information theory. 


Knots, Braids, and Bracket Polynomial 


The purpose of this section is to give a quick 
introduction to the diagrammatic theory of knots, 
links, and braids. A knot is an embedding of a circle in 
three-dimensional space, taken up to ambient isotopy. 
That is, two knots are regarded as equivalent if one 
embedding can be obtained from the other through a 
continuous family of embeddings of circles in 3-space. 
A link is an embedding of a disjoint collection of 
circles, taken up to ambient isotopy. Figure 1 illus- 
trates a diagram for a knot. The diagram is regarded 
both as a schematic picture of the knot, and as a plane 
graph with extra structure at the nodes (indicating 
how the curve of the knot passes over or under itself 
by standard pictorial conventions). 

Ambient isotopy is mathematically the same as 
the equivalence relation generated on diagrams by 
the Reidemeister moves. These moves are illustrated 
in Figure 2. Each move is performed on a local part 
of the diagram that is topologically identical to the 
part of the diagram illustrated in this figure (these 
figures are representative examples of the types of 
Reidemeister moves) without changing the rest of 
the diagram. The Reidemeister moves are useful in 
doing combinatorial topology with knots and links, 
notably in working out the behavior of knot 
invariants. A knot invariant is a function defined 
from knots and links to some other mathematical 
object (such as groups or polynomials or numbers) 
such that equivalent diagrams are mapped to 
equivalent objects (isomorphic groups, identical 
polynomials, identical numbers). 

Figure 1 A knot diagram. 

Figure 2 The Reidemeister moves. 

Another significant structure related to knots and 
links is the Artin braid group. A braid is an 
embedding of a collection of strands that have 
their ends in two rows of points that are set one 
above the other with respect to a choice of vertical. 
The strands are not individually knotted and they 
are disjoint from one another. See Figures 3-5 for 
illustrations of braids and moves on braids. Braids 
can be multiplied by attaching the bottom row of 
one braid to the top row of the other braid. Taken 
up to ambient isotopy, fixing the endpoints, the 
braids form a group under this notion of multi- 
plication. In Figure 3 we illustrate the form of the 
basic generators of the braid group, and the form of 
the relations among these generators. Figure 4 
illustrates how to close a braid by attaching the 
top strands to the bottom strands by a collection of 
parallel arcs. A key theorem of Alexander states that 
every knot or link can be represented as a closed 
braid. Thus, the theory of braids is critical to the 
theory of knots and links. Figure 5 illustrates the 
famous Borromean rings (a link of three unknotted 
loops such that any two of the loops are unlinked) 
as the closure of a braid. 

Figure 3 Braid generators and relations: $s_1^{-1} s_1 = 1$, $s_1 s_2 s_1 = s_2 s_1 s_2$, $s_1 s_3 = s_3 s_1$. 

Figure 4 Closing braids to form knots and links (example shown: the figure-eight knot). 
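
The braid relations and the closure operation can be explored with a small computation. The sketch below is not taken from the article: it assumes the unreduced Burau matrices as a convenient matrix representation of the generators, encodes a braid word as a list of signed integers (`k` for $s_k$, `-k` for $s_k^{-1}$), and uses the standard braid words $(s_1 s_2^{-1})^2$ for the figure-eight knot and $(s_1 s_2^{-1})^3$ for the Borromean rings. All function names are illustrative.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]

def matmul(x, y):
    n = len(x)
    return [[sum(x[r][k] * y[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def burau_generator(i, n, t, inverse=False):
    """Unreduced Burau matrix of s_i (1-based) on n strands, or of its inverse."""
    m = identity(n)
    if inverse:
        block = [[Fraction(0), Fraction(1)], [1 / t, 1 - 1 / t]]
    else:
        block = [[1 - t, t], [Fraction(1), Fraction(0)]]
    for a in range(2):
        for b in range(2):
            m[i - 1 + a][i - 1 + b] = block[a][b]
    return m

def burau(word, n, t):
    """Matrix of a braid word, multiplying generators from left to right."""
    m = identity(n)
    for g in word:
        m = matmul(m, burau_generator(abs(g), n, t, inverse=g < 0))
    return m

def closure_components(word, n):
    """Components of the closed braid = cycles of the braid's permutation."""
    perm = list(range(n))
    for g in word:
        i = abs(g) - 1
        perm[i], perm[i + 1] = perm[i + 1], perm[i]
    seen, cycles = set(), 0
    for start in range(n):
        if start not in seen:
            cycles += 1
            k = start
            while k not in seen:
                seen.add(k)
                k = perm[k]
    return cycles

T = Fraction(3, 2)  # arbitrary nonzero value of the Burau parameter
print(burau([1, 2, 1], 4, T) == burau([2, 1, 2], 4, T))   # braid relation
print(burau([1, 3], 4, T) == burau([3, 1], 4, T))         # distant generators commute
print(closure_components([1, -2] * 2, 3))                 # figure-eight knot
print(closure_components([1, -2] * 3, 3))                 # Borromean rings
```

Exact rational arithmetic is used so that the matrix comparisons are equalities rather than floating-point approximations. The component count only sees the permutation induced by the braid, so it distinguishes knots from links but not one knot from another.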

We now discuss a significant example of an 
invariant of knots and links, the bracket polynomial. 
The bracket polynomial can be normalized to 
produce an invariant of all the Reidemeister moves. 
This normalized invariant is known as the Jones 
(1985) polynomial. The Jones polynomial was 
originally discovered by a different method than 
the one given here. 

The bracket polynomial, $\langle K \rangle = \langle K \rangle(A)$, assigns to 
each unoriented link diagram $K$ a Laurent polynomial 
in the variable $A$, such that 


1. If $K$ and $K'$ are regularly isotopic diagrams (that is, 
related by Reidemeister moves II and III only), then 
$\langle K \rangle = \langle K' \rangle$. 

2. If $K \sqcup O$ denotes the disjoint union of $K$ with an 
extra unknotted and unlinked component $O$ (also 
called “loop” or “simple closed curve” or 
“Jordan curve”), then 


$\langle K \sqcup O \rangle = \delta \langle K \rangle$ 

where 


$\delta = -A^2 - A^{-2}$ 


3. $\langle K \rangle$ satisfies the following formulas: 


$\langle \chi \rangle = A \langle \asymp \rangle + A^{-1} \langle\, )( \,\rangle$ 

$\langle \bar{\chi} \rangle = A^{-1} \langle \asymp \rangle + A \langle\, )( \,\rangle$ 


where the small diagrams represent parts of 
larger diagrams that are identical except at the 
site indicated in the bracket. We take the 
convention that the letter chi, $\chi$, denotes a 
crossing where the curved line is crossing over 
the straight segment. The barred letter denotes 
the switch of this crossing, where the curved line 
is undercrossing the straight segment. 


Figure 5 Borromean rings as a braid closure. 


In computing the bracket, one finds the following 
behavior under Reidemeister move I: 


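$\langle \gamma \rangle = -A^{3} \langle \smile \rangle$ 
and 

$\langle \bar{\gamma} \rangle = -A^{-3} \langle \smile \rangle$ 
where $\gamma$ denotes a curl of positive type as indicated 
in Figure 6, and $\bar{\gamma}$ indicates a curl of negative type, 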
as also seen in this figure. The type of a curl is the 
sign of the crossing when we orient it locally. Our 
convention of signs is also given in Figure 6. Note 
that the type of a curl does not depend on the 
orientation we choose. The small arcs on the right- 
hand side of these formulas indicate the removal of 
the curl from the corresponding diagram. 
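
The curl formula follows from expanding the single crossing of the curl with the bracket rules. A short sketch, assuming the curl is drawn so that its $A$-smoothing splits off a free loop $O$, writing $\smile$ for the arc with the curl removed and using $\delta = -A^{2} - A^{-2}$:

```latex
\begin{aligned}
\langle \gamma \rangle
  &= A\, \langle \smile \sqcup\, O \rangle + A^{-1} \langle \smile \rangle \\
  &= \left( A\,\delta + A^{-1} \right) \langle \smile \rangle \\
  &= \left( -A^{3} - A^{-1} + A^{-1} \right) \langle \smile \rangle
   = -A^{3} \langle \smile \rangle .
\end{aligned}
```

Exchanging the roles of $A$ and $A^{-1}$ gives the factor $-A^{-3}$ for the curl of negative type.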

The bracket is invariant under regular isotopy and 
can be normalized to an invariant of ambient 
isotopy by the definition 


$f_K(A) = (-A^3)^{-w(K)} \langle K \rangle(A)$ 


where we chose an orientation for $K$, and where 
$w(K)$ is the sum of the crossing signs of the oriented 
link $K$. $w(K)$ is called the writhe of $K$. The 
convention for crossing signs is shown in Figure 6. 
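
Why the writhe factor compensates for Reidemeister move I can be seen in one line. As a sketch, assume $K'$ is $K$ with one extra curl of positive type, so that $w(K') = w(K) + 1$ and $\langle K' \rangle = -A^{3} \langle K \rangle$:

```latex
f_{K'}(A) = (-A^{3})^{-(w(K)+1)} \langle K' \rangle
          = (-A^{3})^{-w(K)-1}\, (-A^{3})\, \langle K \rangle
          = (-A^{3})^{-w(K)} \langle K \rangle
          = f_K(A).
```

Reidemeister moves II and III change neither the writhe nor the bracket, so $f_K$ is unchanged by all three moves.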


The State Summation 


In order to obtain a closed formula for the bracket, 
we now describe it as a state summation. Let $K$ be 
any unoriented link diagram. Define a state, $S$, of $K$ 
to be a choice of smoothing for each crossing of $K$. 
There are two choices for smoothing a given 
crossing, and thus there are $2^N$ states of a diagram 
with $N$ crossings. In a state we label each smoothing 
with $A$ or $A^{-1}$ as in the expansion formula for the 
bracket. The label is called a vertex weight of the 
state. There are two evaluations related to a state. 
The first one is the product of the vertex weights, 
denoted $\langle K|S \rangle$. The second evaluation is the number 
of loops in the state $S$, denoted $\|S\|$. 

Figure 6 Crossing signs and curls. 

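Define the state summation, $\langle K \rangle$, by the formula 


$\langle K \rangle = \sum_S \langle K|S \rangle\, \delta^{\|S\| - 1}$ 

It follows from this definition that $\langle K \rangle$ satisfies the 
equations 


$\langle \chi \rangle = A \langle \asymp \rangle + A^{-1} \langle\, )( \,\rangle$ 

$\langle K \sqcup O \rangle = \delta \langle K \rangle$ 

$\langle O \rangle = 1$ 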

The first equation expresses the fact that the entire 
set of states of a given diagram is the union, with 
respect to a given crossing, of those states with an 
$A$-type smoothing and those with an $A^{-1}$-type 
smoothing at that crossing. The second and the 
third equation are clear from the formula defining 
the state summation. Hence, this state summation 
produces the bracket polynomial as we have 
described it at the beginning of the section. 
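
To make the state summation concrete, the following Python sketch evaluates it by brute force over all $2^{N}$ states. It is an illustration under stated assumptions rather than part of the original exposition: the diagram is encoded as a list of 4-tuples of edge labels (a planar-diagram code, read counterclockwise around each crossing starting from the incoming under-strand), the A-smoothing is assumed to join the first two and the last two labels, each loop beyond the first contributes $\delta = -A^{2} - A^{-2}$, and all function names are hypothetical.

```python
from itertools import product


def count_loops(pairs):
    """Count closed loops formed by joining edge labels pairwise (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    return len({find(x) for x in list(parent)})


def poly_mul(p, q):
    """Multiply Laurent polynomials stored as {exponent of A: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}


DELTA = {2: -1, -2: -1}  # delta = -A^2 - A^(-2)


def bracket(pd):
    """State sum <K> = sum over S of <K|S> * delta^(||S|| - 1)."""
    total = {}
    for state in product((+1, -1), repeat=len(pd)):
        pairs = []
        for (i, j, k, l), s in zip(pd, state):
            # Assumed convention: the A-smoothing joins (i, j) and (k, l);
            # the A^(-1)-smoothing joins (i, l) and (j, k).
            pairs += [(i, j), (k, l)] if s == +1 else [(i, l), (j, k)]
        term = {sum(state): 1}  # product of vertex weights <K|S>
        for _ in range(count_loops(pairs) - 1):
            term = poly_mul(term, DELTA)
        for e, c in term.items():
            total[e] = total.get(e, 0) + c
    return {e: c for e, c in total.items() if c != 0}


curl_one = bracket([(1, 2, 2, 1)])
curl_two = bracket([(1, 1, 2, 2)])
trefoil = bracket([(1, 4, 2, 5), (3, 6, 4, 1), (5, 2, 6, 3)])
```

With this encoding the two one-crossing curls evaluate to $-A^{-3}$ and $-A^{3}$, in agreement with the behavior under Reidemeister move I, and the three-crossing trefoil diagram evaluates to $A^{7} - A^{3} - A^{-5}$. The mirror diagram gives the same polynomial with A replaced by $A^{-1}$.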


Remark By a change of variables one obtains the
original Jones polynomial, $V_K(t)$, for oriented knots
and links from the normalized bracket:

$$V_K(t) = f_K(t^{-1/4})$$
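
As a small worked instance of this normalization and change of variables, the sketch below takes as given (an assumption, not computed here) that a three-crossing trefoil diagram with writhe $-3$ has bracket $A^{7} - A^{3} - A^{-5}$. It forms $f_K(A) = (-A^{3})^{-w(K)}\langle K \rangle$ and then substitutes $A = t^{-1/4}$. Polynomials are stored as dictionaries from exponents to coefficients, and the helper names are hypothetical.

```python
from fractions import Fraction


def normalize(bracket_poly, writhe):
    """f_K(A) = (-A^3)^(-w) <K>(A), with polynomials as {exponent of A: coeff}."""
    sign = -1 if writhe % 2 else 1  # (-1)^(-w) equals (-1)^w
    return {e - 3 * writhe: sign * c for e, c in bracket_poly.items()}


def jones(f_poly):
    """Substitute A = t^(-1/4): the monomial A^e becomes t^(-e/4)."""
    return {Fraction(-e, 4): c for e, c in f_poly.items()}


trefoil_bracket = {7: 1, 3: -1, -5: -1}  # assumed input
f_trefoil = normalize(trefoil_bracket, writhe=-3)
V_trefoil = jones(f_trefoil)
```

Here `f_trefoil` represents $-A^{16} + A^{12} + A^{4}$ and `V_trefoil` represents $-t^{-4} + t^{-3} + t^{-1}$, which is the Jones polynomial of the left-handed trefoil.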


Remark The bracket polynomial provides a con- 
nection between knot theory and physics, in that the 
state summation expression for it exhibits it as a 
generalized partition function defined on the knot 
diagram. Partition functions are ubiquitous in 
statistical mechanics, where they express the sum- 
mation over all states of the physical system of 
probability weighting functions for the individual 
states. Such physical partition functions contain 
large amounts of information about the correspond- 
ing physical system. Some of this information is 
directly present in the properties of the function, 
such as the location of critical points and phase 
transition. Some of the information can be obtained 
by differentiating the partition function, or perform- 
ing other mathematical operations on it. 


In fact, by defining a generalization of the bracket 
polynomial, defined on knot diagrams but not 
invariant under the Reidemeister moves, we can 
capture significant partition functions that are 
physically meaningful. There is no room in this 
survey to detail how this generalization can be used 
to express the Potts model for planar graphical 
configurations, and how it expresses the relationship 
between the Potts model and the Temperley–Lieb
algebra in diagrammatic form. There is much more
in this connection with statistical mechanics in that
the local weights in a partition function are often
expressed in terms of solutions to a matrix equation
called the Yang–Baxter equation, that turns out to
fit perfectly with invariance under the third Reidemeister
move. As a result, there are many ways to define 
partition functions of knot diagrams that give rise to 
invariants of knots and links. The subject is 
intertwined with the algebraic structure of Hopf 
algebras and quantum groups, useful for producing 
systematic solutions to the Yang–Baxter equation.
In fact, Hopf algebras are deeply connected with 
the problem of constructing invariants of three- 
dimensional manifolds in relation to invariants of 
knots. We have chosen, in this survey article, not to 
discuss the details of these approaches, but rather to 
proceed to Vassiliev invariants and the relationships 
with Witten’s functional integral. The reader is 
referred to Kauffman (1987, 1994, 2002), Jones 
(1985), and Reshetikhin and Turaev (1991) for 
more information about relationships of knot theory 
with statistical mechanics, Hopf algebras, and 
quantum groups. For topology, the key point is 
that Lie algebras can be used to construct invariants 
of knots and links. This is shown nowhere more 
clearly than in the theory of Vassiliev invariants that 
we take up in the next section. 


Vassiliev Invariants and Invariants 
of Rigid Vertex Graphs 


In this section we study the combinatorial topology 
of Vassiliev invariants. As we shall see, by the end of 
this section, Vassiliev invariants are directly con- 
nected with Lie algebras, and representations of Lie 
algebras can be used to construct them. This aspect 
of link invariants is one of the most fundamental for 
connections with physics. Just as symmetry con- 
siderations in physics lead to a fundamental rela- 
tionship with Lie algebras, topological invariance 
leads to a fundamental relationship of the theory of 
knots and links with Lie algebras. 

If V(K) is a (Laurent polynomial valued or, more 
generally, commutative ring valued) invariant of 
knots, then it can be naturally extended to an 
invariant of rigid vertex graphs by defining the 
invariant of graphs in terms of the knot invariant via 
an “unfolding of the vertex.” That is, we can regard 
the vertex as a “black box” and replace it by any 
tangle of our choice. Rigid vertex motions of the 
graph preserve the contents of the black box, and 
hence implicate ambient isotopies of the link 
obtained by replacing the black box by its contents. 
Invariants of knots and links that are evaluated on
these replacements are then automatically rigid vertex 
invariants of the corresponding graphs. If we set up a 
collection of multiple replacements at the vertices 
with standard conventions for the insertions of the 
tangles, then a summation over all possible replace- 
ments can lead to a graph invariant with new 
coefficients corresponding to the different replace- 
ments. In this way, each invariant of knots and links 
implicates a large collection of graph invariants. 

The simplest tangle replacements for a 4-valent 
vertex are the two crossings, positive and negative, and 
the oriented smoothing. Let V(K) be any invariant of 
knots and links. Extend V to the category of rigid 
vertex embeddings of 4-valent graphs by the formula 


$$V(K_{*}) = a\,V(K_{+}) + b\,V(K_{-}) + c\,V(K_{0})$$


where $K_{+}$ denotes a knot diagram K with a specific
choice of positive crossing, $K_{-}$ denotes a diagram
identical to the first with the positive crossing
replaced by a negative crossing, $K_{0}$ denotes the
diagram with that crossing replaced by the oriented
smoothing, and $K_{*}$ denotes a diagram identical to
the first with the positive crossing replaced by a
graphical node.

There is a rich class of graph invariants that can 
be studied in this manner. The Vassiliev invariants 
(Bar-Natan 1995) constitute the important special 
case of these graph invariants where a = +1, b = −1,
and c = 0. Thus, V(G) is a Vassiliev invariant if


$$V(K_{*}) = V(K_{+}) - V(K_{-})$$


Call this formula the exchange identity for the 
Vassiliev invariant V. See Figure 7. 

V is said to be of finite type k if V(G)=0 
whenever |G| > k, where |G| denotes the number of 
(4-valent) nodes in the graph G. The notion of finite 
type is of extraordinary significance in studying 
these invariants. One reason for this is the following 
basic lemma. 


Lemma If a graph G has exactly k nodes, then the
value of a Vassiliev invariant $v_k$ of type k on G,
$v_k(G)$, is independent of the embedding of G.


Proof Omitted. □


Figure 7 Exchange identity for Vassiliev invariants.

Figure 8 Chord diagrams.


The upshot of this lemma is that Vassiliev 
invariants of type k are intimately involved with 
certain abstract evaluations of graphs with k nodes. 
In fact, there are restrictions (the four-term relations) 
on these evaluations demanded by the topology and 
it follows from results of Kontsevich (see Bar-Natan
(1995)) that such abstract evaluations actually determine
the invariants. The knot invariants derived from
classical Lie algebras are all built from Vassiliev 
invariants of finite type. All of this is directly related 
to Witten’s functional integral (Witten 1989). 

In the next few figures we illustrate some of these 
main points. In Figure 8 we show how one 
associates a so-called chord diagram to represent 
the abstract graph associated with an embedded 
graph. The chord diagram is a circle with arcs 
connecting those points on the circle that are welded 
to form the corresponding graph. In Figure 9 we 
illustrate how the four-term relation is a conse- 
quence of topological invariance. 

Figure 9 The four-term relation from topology.

Figure 10 The four-term relation from categorical Lie algebra.

In Figure 10 we show how the four-term relation is a
consequence of the abstract pattern of the commutator
identity for a matrix Lie algebra. That is, we show how
a diagrammatic version of the formula

$$T^{a}T^{b} - T^{b}T^{a} = \sum_{c} f^{abc}\,T^{c}$$

fits directly with the four-term relation. The formula
we have quoted here states that the commutator of
the matrices $T^{a}$ and $T^{b}$ is equal to a sum of the
matrices $T^{c}$ with coefficients (the structure coefficients
of the Lie algebra) $f^{abc}$. Such a relation is the
most concrete way to define a matrix Lie algebra.
There are other levels of abstraction that can be
employed here. The same diagrammatics can be
interpreted directly in terms of the Jacobi identity
that defines a Lie algebra. We shall content
ourselves with this matrix point of view here, and
add that it is assumed here that the structure
coefficients are invariant under cyclic permutation,
an assumption that is not needed in the general case.
The four-term relation is directly related to a
categorical generalization of Lie algebras.
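
For a concrete instance of the commutator formula, one can take the Lie algebra su(2) with the conventional (here assumed) choice $T^{a} = -(i/2)\sigma^{a}$ in terms of the Pauli matrices. The structure coefficients are then the totally antisymmetric symbol $\epsilon^{abc}$, which is in particular invariant under cyclic permutation. The sketch below checks the relation numerically and also evaluates the sum $\sum_a T^{a}T^{a}$, the kind of matrix insertion that enters Lie algebra weights. The function names are hypothetical.

```python
# Pauli matrices as nested lists of complex numbers.
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
# Assumed convention: T^a = -(i/2) sigma^a, an anti-Hermitian basis of su(2).
T = [[[-0.5j * x for x in row] for row in s] for s in SIGMA]


def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def eps(a, b, c):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (a - b) * (b - c) * (c - a) // 2


def commutator_defect():
    """Largest entry of T^a T^b - T^b T^a - sum_c eps(a,b,c) T^c over all a, b."""
    worst = 0.0
    for a in range(3):
        for b in range(3):
            ab, ba = matmul(T[a], T[b]), matmul(T[b], T[a])
            for i in range(2):
                for j in range(2):
                    rhs = sum(eps(a, b, c) * T[c][i][j] for c in range(3))
                    worst = max(worst, abs(ab[i][j] - ba[i][j] - rhs))
    return worst


def casimir():
    """Sum over a of T^a T^a."""
    total = [[0, 0], [0, 0]]
    for a in range(3):
        sq = matmul(T[a], T[a])
        total = [[total[i][j] + sq[i][j] for j in range(2)] for i in range(2)]
    return total
```

In this convention the defect vanishes and the sum $\sum_a T^{a}T^{a}$ is $-(3/4)$ times the identity matrix, so inserting it along a strand only rescales the trace.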

Figure 11 Calculating Lie algebra weights.

Figure 11 illustrates how the weights are assigned
to the chord diagrams in the Lie algebra case: by
inserting Lie algebra matrices into the circle and
taking a trace of a sum of matrix products. The
relationship between Vassiliev invariants and Lie
algebras has been known since Bar-Natan’s thesis
(see also Kauffman (1995)). In Bar-Natan (1995) the
reader will find a good account of Kontsevich’s 
theorem, showing how Lie algebra weight systems, 
and in fact any weight system satisfying the four- 
term relation, can be used to construct knot 
invariants. Conceptually, the ideas behind the 
Kontsevich theorem are directly related to Witten’s 
approach to knot invariants via quantum field 
theory. We give an exposition of this approach in 
the next section of this article. 


Example Let $P_K(t) = f_K(e^{t})$ (with $A = e^{t}$), where $f_K(A)$ is
the normalized bracket polynomial invariant discussed
in the last section. Then $P_K(t)$ is expressed as
a power series in t with coefficients $v_n(K)$,
n = 0, 1, 2, ..., that are invariants of the knot or
link K. It is not hard to show that these coefficient
invariants (extended to graphs so that the Vassiliev
exchange identity is satisfied) are Vassiliev invariants
of finite type. In fact, most of the so-called
polynomial invariants of knots and links (relatives 
of the bracket and Jones polynomials) give rise to 
Vassiliev invariants in just this way. Thus, Vassiliev 
invariants of finite type are ubiquitous in this area 
of knot theory. One can think of Vassiliev 
invariants as building blocks for the other invar- 
iants, or that these invariants are sources of 
Vassiliev invariants. 
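
The coefficients in this example are easy to compute once $f_K(A)$ is known, since each monomial $A^{e}$ becomes $e^{et} = \sum_n e^{n} t^{n}/n!$. The sketch below does this for the normalized bracket $f_K(A) = -A^{16} + A^{12} + A^{4}$ of a left-handed trefoil diagram, which is taken as given here (an assumption of the illustration, as is the function name).

```python
from fractions import Fraction
from math import factorial


def vassiliev_coefficients(f_poly, order):
    """Coefficients v_n of t^n in f_K(e^t), for n = 0, ..., order.

    f_poly maps exponents of A to integer coefficients.
    """
    return [
        sum(c * Fraction(e ** n, factorial(n)) for e, c in f_poly.items())
        for n in range(order + 1)
    ]


f_trefoil = {16: -1, 12: 1, 4: 1}  # assumed input
v = vassiliev_coefficients(f_trefoil, 3)
```

For this input the first coefficients are $v_0 = 1$, $v_1 = 0$, $v_2 = -48$, and $v_3 = -384$. The vanishing of $v_1$ is consistent with the standard fact that knots carry no nontrivial invariant of type 1.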


Vassiliev Invariants and Witten’s 
Functional Integral 


Edward Witten (1989) proposed a formulation 
of a class of 3-manifold invariants as generalized 
Feynman integrals taking the form Z(M), where 


$$Z(M) = \int DA\, e^{(ik/4\pi)\,S(M,A)}$$


Here M denotes a 3-manifold without boundary and 
A is a gauge field (also called a gauge potential or 
gauge connection) defined on M. The gauge field is a 
1-form on a trivial G-bundle over M with values in a 
representation of the Lie algebra of G. The group G 
corresponding to this Lie algebra is said to be the 
gauge group. In this integral, the action S(M, A) is 
taken to be the integral over M of the trace of the 
Chern–Simons 3-form $A \wedge dA + (2/3)\,A \wedge A \wedge A$.
(The product is the wedge product of differential 
forms.) Z(M) integrates over all gauge fields modulo 
gauge equivalence. 

The formalism and internal logic of Witten’s 
integral supports the existence of a large class of 
topological invariants of 3-manifolds and associated 
invariants of knots and links in these manifolds. 
The invariants associated with this integral have
been given rigorous combinatorial descriptions but 
questions and conjectures arising from the integral 
formulation are still outstanding. Specific conjec- 
tures about this integral take the form of just how it 
implicates invariants of links and 3-manifolds, and 
how these invariants behave in certain limits of the 
coupling constant k in the integral. Many conjec- 
tures of this sort can be verified through the 
combinatorial models. On the other hand, the really 
outstanding conjecture about the integral is that it 
exists! At the present time there is no measure 
theory or generalization of measure theory that 
supports it in full generality. Here is a formal 
structure of great beauty. It is also a structure 
whose consequences can be verified by a remarkable 
variety of alternative means. 

The formalism of the Witten integral implicates 
invariants of knots and links corresponding to each 
classical Lie algebra. In order to see this, we need to 
introduce the Wilson loop. The Wilson loop is an 
exponentiated version of integrating the gauge field 
along a loop K in three space that we take to be an 
embedding (knot) or a curve with transversal self-intersections.
For this discussion, the Wilson loop
will be denoted by $W_K(A)$, to indicate the dependence
on the loop K and the field A. It is usually indicated
by the symbolism $\mathrm{tr}\big(P e^{\oint_K A}\big)$. Thus,

$$W_K(A) = \mathrm{tr}\big(P e^{\oint_K A}\big)$$

Here the P denotes path-ordered integration: we
are integrating and exponentiating matrix-valued
functions, and so must keep track of the order of the
operations. The symbol tr denotes the trace of the
resulting matrix. This Wilson loop integration exists 
by normal means and does not require functional 
integration. 

With the help of the Wilson loop functional on 
knots and links, Witten writes down a functional 
integral for link invariants in a 3-manifold M: 


$$Z(M, K) = \int DA\, e^{(ik/4\pi)\,S(M,A)}\, \mathrm{tr}\big(P e^{\oint_K A}\big) = \int DA\, e^{(ik/4\pi)\,S}\, W_K(A)$$

Here S(M, A) is the Chern–Simons Lagrangian, as in
the previous discussion. We abbreviate S(M, A) as S
and write $W_K(A)$ for the Wilson loop. Unless
otherwise mentioned, the manifold M will be the
three-dimensional sphere $S^{3}$.

An analysis of the formalism of this functional 
integral reveals quite a bit about its role in knot 
theory. One can determine how the Witten integral 
behaves under a small deformation of the loop K. 


Theorem 


(i) Let $Z(K) = Z(S^{3}, K)$ and let $\delta Z(K)$ denote the
change of Z(K) under an infinitesimal change in
the loop K. Then

$$\delta Z(K) = (4\pi i/k) \int DA\, e^{(ik/4\pi)S}\, [\mathrm{Vol}]\, T_a T_a\, W_K(A)$$

where $\mathrm{Vol} = \epsilon_{rst}\, dx_r\, dx_s\, dx_t$.

The sum is taken over repeated indices, and
the insertion is taken of the matrices $T_a T_a$ at the
chosen point x on the loop K that is regarded
as the center of the deformation. The volume
element $\mathrm{Vol} = \epsilon_{rst}\, dx_r\, dx_s\, dx_t$ is taken with regard
to the infinitesimal directions of the loop
deformation from this point on the original
loop.

(ii) The same formula applies, with a different
interpretation, to the case where x is a double
point of transversal self-intersection of a loop K,
and the deformation consists in shifting one of
the crossing segments perpendicularly to the
plane of intersection so that the self-intersection
point disappears. In this case, one $T_a$ is inserted
into each of the transversal crossing segments so
that $T_a T_a W_K(A)$ denotes a Wilson loop with
a self-intersection at x and insertions of $T_a$ at
$x + \epsilon_1$ and $x + \epsilon_2$, where $\epsilon_1$ and $\epsilon_2$ denote small
displacements along the two arcs of K that
intersect at x. In this case, the volume form is
nonzero, with two directions coming from the
plane of movement of one arc, and the perpendicular
direction is the direction of the other arc.


Remark One shows that the result of a topological 
variation has an analytic expression that is zero if 
the topological variation does not create a local 
volume. Thus, we have shown that the integral of 
$e^{(ik/4\pi)S(A)}\, W_K(A)$ is topologically invariant as long as
the curve K is moved by the local equivalent of 
regular isotopy. 


In the case of switching a crossing, the key point is 
to write the crossing switch as a composition of first 
moving a segment to obtain a transversal intersec- 
tion of the diagram with itself, and then to continue 
the motion to complete the switch. Up to the choice 
of our conventions for constants, the switching 
formula is as shown below (see Figure 12).

Figure 12 The difference formula.

$$Z(K_{+}) - Z(K_{-}) = (4\pi i/k) \int DA\, e^{(ik/4\pi)S}\, T_a T_a\, W_{K_{**}}(A) = (4\pi i/k)\, Z(T^{a}T^{a}K_{**})$$

where $K_{**}$ denotes the result of replacing the
crossing by a self-touching crossing. We distinguish
this from adding a graphical node at this crossing by
using the double-star notation.

A key point is to notice that the Lie algebra 
insertion for this difference is exactly what is done 
(in chord diagrams) to make the weight systems for 
Vassiliev invariants (without the framing compensa- 
tion). Thus, the formalism of the Witten functional 
integral takes one directly to these weight systems in 
the case of the classical Lie algebras. In this way, the 
functional integral is central to the structure of the 
Vassiliev invariants. 


The Loop Transform and Quantum 
Gravity 


Suppose that $\psi(A)$ is a (complex-valued) function
defined on gauge fields. Then we define formally the
loop transform $\hat{\psi}(K)$, a function on embedded loops
in three-dimensional space, by the formula

$$\hat{\psi}(K) = \int DA\, \psi(A)\, W_K(A)$$

If $\Delta$ is a differential operator defined on $\psi(A)$, then
we can use this integral transform to shift the effect
of $\Delta$ to an operator on loops via integration by
parts:

$$\hat{\Delta}\hat{\psi}(K) = \int DA\, \Delta\psi(A)\, W_K(A) = \int DA\, \psi(A)\, \hat{\Delta} W_K(A)$$

When $\hat{\Delta}$ is applied to the Wilson loop, the result can
be an understandable geometric or topological
operation. One can illustrate this situation with
operators G and H:

$$G = -F^{a}_{ij}\, dx^{i}\, \delta/\delta A^{a}_{j}(x)$$

$$H = -\epsilon_{abc}\, F^{a}_{ij}\, \delta/\delta A^{b}_{i}\, \delta/\delta A^{c}_{j}$$

with summation over the repeated indices. Each of
these operators has the property that its action on
the Wilson loop has a geometric or topological
interpretation. One has


$$\hat{G}\hat{\psi}(K) = \delta\hat{\psi}(K)$$

where this variation refers to the effect of varying K.
As we saw in the previous section, this means that if
$\hat{\psi}(K)$ is a topological invariant of knots and links,
then $\hat{G}\hat{\psi}(K) = 0$ for all embedded loops K. This
condition is a transform analog of the equation
$G\psi(A) = 0$. This equation is the differential analog
of an invariant of knots and links. It may happen
that $\delta\hat{\psi}(K)$ is not strictly zero, as in the case of our
framed knot invariants. For example, with

$$\psi(A) = \exp\Big( (ik/4\pi) \int \mathrm{tr}\big( A \wedge dA + (2/3)\, A \wedge A \wedge A \big) \Big)$$

we conclude that $\hat{G}\hat{\psi}(K)$ is zero for flat deformations
(in the sense of the previous section) of the loop K,
but can be nonzero in the presence of a twist or curl.
In this sense, the loop transform provides a subtle
variation on the strict condition $G\psi(A) = 0$.

In Ashtekar et al. (1992) and other publications by 
Ashtekar, Rovelli, Smolin, and their colleagues, the 
loop transform is used to study a reformulation and 
quantization of Einstein gravity. The differential- 
geometric gravity theory of Einstein is reformulated 
in terms of a background gauge connection and in the 
quantization, the Hilbert space consists in functions 
$\psi(A)$ that are required to satisfy the constraints
$G\psi = 0$ and $H\psi = 0$. Thus, we see that $\hat{G}(K)$ can be
partially zero in the sense of producing a framed knot
invariant, and that $\hat{H}(K)$ is zero for non-self-intersecting
loops. This means that the loop transforms
of G and H can be used to investigate a subtle
variation of the original scheme for the quantization 
of gravity. This program is being actively pursued by 
a number of researchers. The Vassiliev invariants 
arising from a topologically invariant loop transform 
are of significance to this theory. 


Braiding, Topological Quantum Field 
Theory, and Quantum Computing 


The purpose of this section is to discuss in a very 
general way how braiding is related to topological 
quantum field theory and to the enterprise 
(Freedman et al. 2002) of using this sort of theory 
as a model for anyonic quantum computation. The 
ideas in the subject of topological quantum field 
theory are well expressed by Michael Atiyah (1990) 
and Edward Witten (1989). The simplest case of this
idea is C N Yang’s original interpretation of the
Yang–Baxter equation. Yang articulated a quantum
field theory in one dimension of space and one
dimension of time, in which the R-matrix gives the
scattering amplitudes for an interaction of two
particles whose (let us say) spins correspond to
the matrix indices, so that $R^{cd}_{ab}$ is the amplitude for
particles of spin a and spin b to interact and produce
particles of spin c and d. Since these interactions are
between particles in a line, one takes the convention 
that the particle with spin a is to the left of the 
particle with spin b, and the particle with spin c is to 
the left of the particle with spin d. If one follows the 
concatenation of such interactions, then there is an 
underlying permutation that is obtained by follow- 
ing strands from the bottom to the top of the 
diagram (thinking of time as moving up the page). 
Yang designed the Yang–Baxter equation for R so
that the amplitudes for a composite process depend 
only on the underlying permutation corresponding 
to the process and not on the individual sequences of 
interactions. 
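
The permutation property can be checked on a deliberately simple example. In the sketch below, which is an illustration and not Yang's or any model-specific R-matrix, two particles with labels a and b exchange places and pick up a phase `phase(a, b)` chosen arbitrarily. Vectors in the three-particle space are stored as dictionaries from basis labels to amplitudes, and the function names are hypothetical. The check is that the two ways of realizing the same permutation of three particles, $R_1 R_2 R_1$ and $R_2 R_1 R_2$, give the same amplitudes.

```python
import cmath
from itertools import product


def phase(a, b):
    """Hypothetical two-particle scattering phase, chosen arbitrarily."""
    return cmath.exp(1j * (0.3 * a + 0.7 * b + 0.2 * a * b))


def apply_R(vec, pos):
    """Apply R to tensor factors pos and pos + 1.

    R sends the basis vector e_a (x) e_b to phase(a, b) * e_b (x) e_a.
    """
    out = {}
    for basis, amp in vec.items():
        a, b = basis[pos], basis[pos + 1]
        new = basis[:pos] + (b, a) + basis[pos + 2:]
        out[new] = out.get(new, 0) + phase(a, b) * amp
    return out


def yang_baxter_defect(dim=3):
    """Largest difference between R1 R2 R1 and R2 R1 R2 on basis vectors."""
    worst = 0.0
    for basis in product(range(dim), repeat=3):
        v = {basis: 1.0}
        lhs = apply_R(apply_R(apply_R(v, 0), 1), 0)
        rhs = apply_R(apply_R(apply_R(v, 1), 0), 1)
        keys = set(lhs) | set(rhs)
        worst = max([worst] + [abs(lhs.get(k, 0) - rhs.get(k, 0)) for k in keys])
    return worst
```

For this family the defect vanishes up to rounding for any choice of phase function, because both composites multiply the same three pairwise phases. More interesting solutions, such as those arising from quantum groups, mix basis states as well.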

In taking over the Yang–Baxter equation for
topological purposes, we can use the same inter- 
pretation, but think of the diagrams with their 
under- and over-crossings as modeling events in a 
spacetime with two dimensions of space and one 
dimension of time. The extra spatial dimension is 
taken in displacing the woven strands perpendicular 
to the page, and allows the use of braiding operators
$R$ and $R^{-1}$ as scattering matrices. Taking this picture
to heart, one can add other particle properties to the 
idealized theory. In particular, one can add fusion 
and creation vertices where, in fusion, two particles 
interact to become a single particle and, in creation, 
one particle changes (decays) into two particles. 
Matrix elements corresponding to trivalent vertices 
can represent these interactions (see Figure 13). 

Once one introduces trivalent vertices for fusion 
and creation, there is the question how these 
interactions will behave in respect to the braiding 
operators. There will be a matrix expression for the 
compositions of braiding and fusion or creation as 
indicated in Figure 14. Here we will restrict
ourselves to showing the diagrammatics with the
intent of giving the reader a flavor of these
structures. It is natural to assume that braiding
intertwines with creation as shown in Figure 15
(similarly with fusion). This intertwining identity is
clearly the sort of thing that a topologist will love,
since it indicates that the diagrams can be interpreted
as embeddings of graphs in three-dimensional
space. Figure 16 illustrates the Yang–Baxter equation.
The intertwining identity is an assumption like
the Yang–Baxter equation itself, which simplifies the
mathematical structure of the model.

Figure 13 Creation and fusion.

Figure 14 Braiding.

Figure 15 Intertwining.

Figure 16 Yang–Baxter equation.

Figure 17 Recoupling.

Figure 18 Pentagon identity.

It is to be expected that there will be an operator
that expresses the recoupling of vertex interactions
as shown in Figure 17 and labeled by O. The actual
formalism of such an operator will parallel the
mathematics of recoupling for angular momentum
(see, e.g., Kauffman (1994)). If one just considers
the abstract structure of recoupling then one sees
that for trees with four branches (each with a single
root) there is a cycle of length 5, as shown in
Figure 17. One can start with any pattern of three
vertex interactions and go through a sequence of five
recouplings that bring one back to the same tree
from which one started. It is a natural simplifying
axiom to assume that this composition is the identity
mapping. This axiom is called the pentagon identity
(Figure 18).
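
The cycle of length 5 can be confirmed by direct enumeration. The sketch below is only an illustration: trees with four branches are modeled as nested pairs over the leaves a, b, c, d, a recoupling is taken to be a single move $((x\,y)\,z) \leftrightarrow (x\,(y\,z))$ applied at any node, and the function names are hypothetical.

```python
def trees(leaves):
    """All binary bracketings of an ordered tuple of leaves."""
    if len(leaves) == 1:
        return [leaves[0]]
    out = []
    for i in range(1, len(leaves)):
        for left in trees(leaves[:i]):
            for right in trees(leaves[i:]):
                out.append((left, right))
    return out


def recouplings(t):
    """Trees reachable from t by one move ((x y) z) <-> (x (y z))."""
    if not isinstance(t, tuple):
        return set()
    left, right = t
    out = set()
    if isinstance(left, tuple):
        out.add((left[0], (left[1], right)))
    if isinstance(right, tuple):
        out.add(((left, right[0]), right[1]))
    out |= {(l2, right) for l2 in recouplings(left)}
    out |= {(left, r2) for r2 in recouplings(right)}
    return out


def cycle_length(start):
    """Walk through recouplings without backtracking until start recurs."""
    prev, cur, steps = None, start, 0
    while True:
        nxt = [t for t in sorted(recouplings(cur), key=repr) if t != prev][0]
        prev, cur, steps = cur, nxt, steps + 1
        if cur == start:
            return steps


all_trees = trees(tuple("abcd"))
```

There are five such trees, each admits exactly two recouplings, and a walk that never backtracks returns to its starting tree after five steps. This is the pentagon on which the identity is imposed.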

Finally, there is a hexagonal cycle of interactions 
between braiding, recoupling and the intertwining 
identity as shown in Figure 19. One says that the 
interactions satisfy the hexagon identity if this 
composition is the identity. 

A three-dimensional topological quantum field
theory is an algebra of interactions that satisfies the
Yang–Baxter equation, the intertwining identity, the
pentagon identity and the hexagon identity. There is
no room in this summary to detail the way that
these properties fit into the topology of knots and
three-dimensional manifolds, but a sketch is in
order. For the case of topological quantum field
theory related to the group SU(2) there is a
construction based entirely on the combinatorial
topology of the bracket polynomial (see the section
“Knots, braids, and bracket polynomial”). For more
information on this approach, the reader is referred
to Kauffman (1994, 2002).

Figure 19 Hexagon identity.

Figure 20 Decomposition of a surface into trinions.

It turns out that the algebraic properties of a
topological quantum field theory give it enough
power to rigorously model 3-manifold invariants
described by the Witten integral. This is done
by regarding the 3-manifold as a union of two
handlebodies with boundary an orientable surface
$S_g$ of genus g. The surface is divided up into
trinions as illustrated in Figure 20. A trinion is a
surface with boundary that is topologically equivalent
to a sphere with three punctures. In Figure 20
we illustrate two trinions, the second shown as a
neighborhood of a trivalent vertex, and a surface
of genus 3 that is decomposed into four trinions.
It turns out that there is a way to associate a
vector space $V(S_g)$ to a surface with a trinion
decomposition, defined in terms of the associated
topological quantum field theory, such that the
isomorphism class of the vector space $V(S_g)$ does
not depend upon the choice of decomposition.
This independence is guaranteed by the braiding,
hexagon, and pentagon identities in such a way
that one can associate a well-defined vector $|M\rangle$ in
$V(S_g)$ whenever M is a 3-manifold whose boundary is
$S_g$. Furthermore, if a closed 3-manifold $M^{3}$ is decomposed
along a surface $S_g$ into the union of $M_{-}$ and $M_{+}$,
where these parts are otherwise disjoint 3-manifolds
with boundary $S_g$, then the inner product
$I(M) = \langle M_{-}|M_{+} \rangle$ is, up to normalization, an invariant of the
3-manifold $M^{3}$. With the definition of topological
quantum field theory given above, knots and links can
be incorporated as well, so that one obtains a source of
invariants $I(M^{3}, K)$ of knots and links in orientable
3-manifolds.

The invariant $I(M^{3}, K)$ can be formally compared
with the Witten integral

$$Z(M, K) = \int DA\, e^{(ik/4\pi)\,S(M,A)}\, W_K(A)$$

It can be shown that up to limits of the heuristics,
$Z(M, K)$ and $I(M^{3}, K)$ are essentially equivalent for
appropriate choice of gauge groups.

This point of view leads to more abstract 
formulations of topological quantum field theories 
as ways to associate vector spaces and linear 
transformations to manifolds and cobordisms of 
manifolds. (A cobordism of surfaces is a 3-manifold 
whose boundary consists of these surfaces.) 

As the reader can see, a three-dimensional TQFT is, 
at base, a highly simplified theory of point-particle 
interactions in (2 + 1)-dimensional spacetime. It can be 
used to articulate invariants of knots and links and 
invariants of 3-manifolds. The reader interested in the 
SU(2) case of this structure and its implications for 
invariants of knots and 3-manifolds can consult 
Kauffman (1994, 2002) and Crane (1991). One expects 
that physical situations involving 2 + 1 spacetime will 
be approximated by such an idealized theory. It is 
thought, for example, that aspects of the quantum Hall 
effect will be related to topological quantum field 
theory (Wilczek 1990). One can imagine a physics 
where the geometrical space is two dimensional and the 
braiding of particles corresponds to their interactions 
through circulating around one another in the plane. 
Anyons are particles that do not just change their wave 
functions by a sign under interchange, but rather by a 
complex phase or even a linear combination of states. It 
is hoped that TQFT models will describe applicable 
physics. One can think about the possible applications 
of anyons to quantum computing. The TQFTs then 
provide a class of anyonic models where the braiding is 
essential to the physics and to the quantum 
computation. 

The key point in the application and relationship
of TQFT and quantum information theory is, in our
opinion, contained in the structure illustrated in
Figure 21. There we show a more complex braiding
operator, based on the composition of recoupling
with the elementary braiding at a vertex. (This
structure is implicit in the hexagon identity of
Figure 19.) The new braiding operator is a source of
unitary representations of the braid group in situations
(which exist mathematically) where the recoupling
transformations are themselves unitary. This kind of
pattern is utilized in the work of Freedman et al.
(2002) and in the case of classical angular momentum
formalism has been dubbed a “spin-network quantum
simulator” by Rasetti and collaborators (see, e.g.,
Marzuoli and Rasetti (2002)). Kauffman and Lomonaco
(2006) show how certain natural deformations
(Kauffman 1994) of Penrose (1969) spin networks can
be used to produce the Freedman–Kitaev model
for anyonic topological quantum computation. It is
legitimate to speculate that networks of this kind are
present in physical reality.

Figure 21 A more complex braiding operator.

Quantum computing can be regarded as a study of 
the structure of the preparation, evolution, and 
measurement of quantum systems. In the quantum 
computation model, an evolution is a composition of 
unitary transformations (usually finite-dimensional 
over the complex numbers). The unitary transforma- 
tions are applied to an initial state vector that has been 
prepared prior to this process. Measurements are 
projections to elements of an orthonormal basis of 
the space upon which the evolution is applied. The 
result of measuring a state $|\psi\rangle$, written in the given
basis, is probabilistic. The probability of obtaining a 
given basis element from the measurement is equal to 
the absolute square of the coefficient of that basis 
element in the state being measured. 
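
The three ingredients just described (preparation, unitary evolution, and measurement) fit in a few lines of code. The sketch below is a minimal illustration on a single two-level system. The choice of the Hadamard matrix as the unitary and the function names are assumptions made for the example.

```python
import math


def apply_unitary(U, state):
    """Evolve a state vector by the matrix U (both given as plain lists)."""
    return [sum(U[i][j] * state[j] for j in range(len(state)))
            for i in range(len(U))]


def measurement_probabilities(state):
    """Absolute squares of the coefficients in the measurement basis."""
    return [abs(c) ** 2 for c in state]


s = 1 / math.sqrt(2)
HADAMARD = [[s, s], [s, -s]]  # a standard 2x2 unitary, chosen for illustration

prepared = [1.0, 0.0]                        # preparation: first basis state
evolved = apply_unitary(HADAMARD, prepared)  # evolution
probs = measurement_probabilities(evolved)   # measurement statistics
```

Each basis element is obtained with probability 1/2, and applying the same unitary a second time returns the prepared state, so the intermediate randomness reflects the choice of measurement basis rather than any loss of information.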

It is remarkable that the above lines constitute an 
essential summary of quantum theory. All applications 
of quantum theory involve filling in details of unitary 
evolutions and specifics of preparations and measure- 
ments. Such unitary evolutions can be seen as approxi- 
mated arbitrarily closely by representations of the Artin 
braid group. The key to the anyonic models of quantum 
computation via topological quantum field theory, or 
via deformed spin networks, is that all unitary evolu- 
tions can be approximated by a single coherent method 
for producing representations of the braid group. This 
beautiful mathematical fact points to a deep role for 
topology in the structure of quantum physics. 

The future of knots, links, and braids in relation 
to physics will be very exciting. There is no question 
that unitary representations of the braid group and 
quantum invariants of knots and links play a 
fundamental role in the mathematical structure of 
quantum mechanics, and we hope that time will 
show us the full meaning of this relationship. 
Introduction


The Kontsevich integral was invented by Kontsevich 
(1993) as a tool to prove the fundamental theorem of 
the theory of finite-type (Vassiliev) invariants (see Bar- 
Natan (1995a)). It provides an invariant exactly as 
strong as the totality of all Vassiliev knot invariants. 

The Kontsevich integral is defined for oriented
tangles (either framed or unframed) in $\mathbb{R}^3$; therefore,
it is also defined in the particular cases of knots,
links, and braids (see Figure 1).

As a starter, we give two examples where simple
versions of the Kontsevich integral have a
straightforward geometrical meaning. In these
examples, as well as in the general construction of
the Kontsevich integral, we represent 3-space $\mathbb{R}^3$ as
the product of a real line $\mathbb{R}$ with coordinate t and a
complex plane $\mathbb{C}$ with complex coordinate z.


Example 1 The number of twists in a braid with
two strings $z_1(t)$ and $z_2(t)$ placed in the slice $0 \le t \le 1$
(see Figure 2) is equal to

$$\frac{1}{2\pi i}\int_0^1 \frac{dz_1 - dz_2}{z_1 - z_2}.$$


Figure 1 A tangle, a braid, a link, and a knot. 


Figure 2 Counting the number of twists. 
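
The integral of Example 1 can be checked numerically. The following is a minimal sketch, assuming that the two strings are given as Python functions of t and that a "twist" means one full turn of one string around the other; the name `twist_number` and the sample braid are illustrative choices, not part of the source.

```python
import cmath
import math


def twist_number(z1, z2, steps=2000):
    """Approximate (1 / 2*pi*i) * integral over [0, 1] of d(z1 - z2) / (z1 - z2).

    The integrand is d log(z1 - z2), so the integral is accumulated as a sum of
    principal logarithms of ratios between consecutive sample points.
    """
    total = 0j
    previous = z1(0.0) - z2(0.0)
    for k in range(1, steps + 1):
        t = k / steps
        current = z1(t) - z2(t)
        total += cmath.log(current / previous)
        previous = current
    return total / (2j * math.pi)


# Hypothetical sample braid: the two strings turn around each other 3 full times.
def string_1(t):
    return 0.5 * cmath.exp(2j * math.pi * 3 * t)


def string_2(t):
    return -string_1(t)


twists = twist_number(string_1, string_2)
```

Summing logarithms of ratios avoids differentiating the strings explicitly; the step count only has to be large enough that consecutive samples differ by an angle smaller than π.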





Figure 3 Counting the linking number. 


Example 2 The linking number of two spatial
curves K and K’ (see Figure 3) can be computed as

$$\mathrm{lk}(K, K') = \frac{1}{2\pi i}\int_{m<t<M} \sum_j \varepsilon_j\,
\frac{d(z_j(t) - z'_j(t))}{z_j(t) - z'_j(t)}$$

where m and M are the minimum and the maximum
values of t on the link $K \cup K'$, j is the index that
enumerates all possible choices of a pair of strands
of the link as functions $z_j(t)$, $z'_j(t)$ corresponding to K
and K’, respectively, and $\varepsilon_j = \pm 1$ according to the
parity of the number of chosen strands that are
oriented downwards.

The Kontsevich integral can be regarded as a far-reaching
generalization of these formulas. It aims at
encoding all information about how the horizontal
chords on the knot (or tangle) rotate when moved in
the vertical direction. From a more general viewpoint,
the Kontsevich integral represents the monodromy
of the Knizhnik-Zamolodchikov connection
in the complement to the union of diagonals in $\mathbb{C}^n$
(see Bar-Natan (1995a) and Ohtsuki (2002)).


Chord Diagrams and Weight Systems 
Algebras A(p) 


The Kontsevich integral of a tangle T takes values in 
the space of chord diagrams supported on T. 

Let X be an oriented one-dimensional manifold, 
that is, a collection of p numbered oriented lines and 
q numbered oriented circles. A chord diagram of 
order n supported on X is a collection of n pairs 
of unordered points in X, considered up to an 
orientation- and component-preserving  diffeo- 
morphism. In the vector space formally generated 
by all chord diagrams of order n, we distinguish the 
subspace spanned by all four-term relations 


where thin lines designate chords, while thick lines are
pieces of the manifold X. Apart from the fragments
shown, all the four diagrams are identical. The
quotient space over all such combinations is denoted
by $\mathcal{A}_n(X) = \mathcal{A}_n(p,q)$. Let $\mathcal{A}(p,q) = \bigoplus_{n=0}^{\infty} \mathcal{A}_n(p,q)$
and let $\hat{\mathcal{A}}(p,q)$ be the graded completion of $\mathcal{A}(p,q)$
(i.e., the space of formal infinite series $\sum_{i=0}^{\infty} a_i$ with
$a_i \in \mathcal{A}_i(p,q)$). If, moreover, we divide $\mathcal{A}(p,q)$ by all
“framing independence” relations (any diagram with
an isolated chord, i.e., a chord joining two adjacent
points of the same connected component of X, is set to
0), then the resulting space is denoted by $\mathcal{A}'(p,q)$, and
its graded completion by $\hat{\mathcal{A}}'(p,q)$.

The spaces $\mathcal{A}(p,0) = \mathcal{A}(p)$ have the structure of an
algebra (the product of chord diagrams is defined by
concatenation of underlying manifolds in agreement
with the orientation). Closing a line component into a
circle, we get a linear map $\mathcal{A}(p,q) \to \mathcal{A}(p-1,q+1)$
which is an isomorphism when p = 1. In particular,
$\mathcal{A}(S^1) \cong \mathcal{A}(\mathbb{R}^1)$ has the structure of an algebra; this
algebra is denoted simply by $\mathcal{A}$; the Kontsevich integral
of knots takes its values in its graded completion
$\hat{\mathcal{A}}$. Another algebra of special importance is
$\hat{\mathcal{A}}(3) = \hat{\mathcal{A}}(3,0)$, because it is where the Drinfeld
associators live.





Hopf Algebra Structure 


The algebra $\mathcal{A}(p)$ has a natural structure of a Hopf
algebra with the coproduct $\delta$ defined by all ways to
split the set of chords into two disjoint parts. To give
a convenient description of its primitive space, one
can use generalized chord diagrams. We now allow
trivalent vertices not belonging to the supporting
manifold and use STU relations (Bar-Natan 1995a)


[Figure: the STU relation]


to express the generalized diagrams as linear combinations
of conventional chord diagrams, for example,


[Figure: a generalized chord diagram expanded via STU into conventional chord diagrams]


Then the primitive space coincides with the sub- 
space of A(p) spanned by all connected generalized 
chord diagrams (“connected” means that they remain 
connected when the supporting manifold X is 
disregarded). 


Weight Systems 


A “weight system” of degree n is a linear function
on the space $\mathcal{A}_n$. Every Vassiliev invariant v of
degree n defines a weight system symb(v) of the
same degree called its “symbol.”


Algebras B(p)


Apart from the spaces of chord diagrams modulo four- 
term relations, there are closely related spaces of Jacobi 
diagrams. A Jacobi diagram is defined as a unitrivalent 
graph, possibly disconnected, having at least one 
vertex of valency 1 in each connected component and 
supplied with two additional structures: a cyclic order 
of edges in each trivalent vertex and a labeling of 
univalent vertices taking values in the set {1,2,..., p}. 
The space $\mathcal{B}(p)$ is defined as the quotient of the vector
space formally generated by all p-colored Jacobi
diagrams modulo the two types of relations,
antisymmetry and IHX:


[Figure: the antisymmetry and IHX relations]


The disjoint union of Jacobi diagrams makes the
space $\mathcal{B}(p)$ into an algebra.

The symmetrization map $\chi_p : \mathcal{B}(p) \to \mathcal{A}(p)$, defined
as the average over all ways to attach the legs of color i
to the ith connected component of the underlying manifold,


[Figure: the symmetrization map applied to a Jacobi diagram]


is an isomorphism of vector spaces (the formal
PBW isomorphism; see Bar-Natan (1995a) and Le and
Murakami (1995)), which is not compatible with
the multiplication. The relation between $\mathcal{A}(p)$ and
$\mathcal{B}(p)$ very much resembles the relation between
the universal enveloping algebra and the symmetric
algebra of a Lie algebra. The algebra
$\mathcal{B} = \mathcal{B}(1)$ is used to write out the explicit formula
for the Kontsevich integral of the unknot (see
Bar-Natan et al. (2003) and below).


The Construction 
Kontsevich’s Formula 


We will explain the construction of the Kontsevich
integral in the classical case of (closed) oriented
knots; for an arbitrary tangle T, the formula is the
same; only the result is interpreted as an element of
$\hat{\mathcal{A}}(T)$. As above, represent three-dimensional space
$\mathbb{R}^3$ as a direct product of a complex line $\mathbb{C}$ with
coordinate z and a real line $\mathbb{R}$ with coordinate t.

The integral is defined for Morse knots, that is,
knots K embedded in $\mathbb{R}^3 = \mathbb{C}_z \times \mathbb{R}_t$ in such a way
that the coordinate t restricted to K has only
nondegenerate (quadratic) critical points. (In fact,
this condition can be weakened, but the class of
Morse knots is broad enough and convenient to
work with.)

The Kontsevich integral Z(K) of the knot K is the
following element of the completed algebra $\hat{\mathcal{A}}'$:





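$$Z(K) = \sum_{m=0}^{\infty} \frac{1}{(2\pi i)^m}
\int\limits_{\substack{t_{\min} < t_1 < \cdots < t_m < t_{\max} \\ t_j \text{ are noncritical}}}
\ \sum_{P = \{(z_j, z'_j)\}} (-1)^{\downarrow_P}\, D_P
\bigwedge_{j=1}^{m} \frac{dz_j - dz'_j}{z_j - z'_j}$$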


Explanation of the Constituents 


The real numbers $t_{\min}$ and $t_{\max}$ are the minimum and
the maximum of the function t on K.

The integration domain is the m-dimensional
simplex $t_{\min} < t_1 < \cdots < t_m < t_{\max}$ divided by the
critical values into a certain number of “connected
components.” For example, Figure 4 shows an
embedding of the unknot where, for m = 2, the
integration domain has six connected components.

The number of summands in the integrand is
constant in each connected component of the
integration domain, but can be different for different
components. In each plane $\{t = t_j\} \subset \mathbb{R}^3$ choose an
unordered pair of distinct points $(z_j, t_j)$ and $(z'_j, t_j)$ on
K, so that $z_j(t_j)$ and $z'_j(t_j)$ are continuous branches of
the knot. We denote by $P = \{(z_j, z'_j)\}$ the collection of
such pairs for j = 1, ..., m. The integrand is the sum
over all choices of the pairing P. In the example
above for the component $\{t_{\min} < t_1 < c_1,\ c_2 < t_2 < t_{\max}\}$,
we have only one possible pair of points on
the levels $\{t = t_1\}$ and $\{t = t_2\}$. Therefore, the sum
over P for this component consists of only one
summand. By contrast, in the component
$\{t_{\min} < t_1 < c_1,\ c_1 < t_2 < c_2\}$, we still have only one
possibility for the level $\{t = t_1\}$, but the plane $\{t = t_2\}$
intersects our knot K in four points. So we have
$\binom{4}{2} = 6$ possible pairs $(z_2, z'_2)$, and the total number
of summands is six (see Figure 5).

Figure 4 Connected components.

For a pairing P, the symbol “$\downarrow_P$” denotes the
number of points $(z_j, t_j)$ or $(z'_j, t_j)$ in P where the
coordinate t decreases along the orientation of K.

Fix a pairing P. Consider the knot K as an oriented
circle and connect the points $(z_j, t_j)$ and $(z'_j, t_j)$ by a
chord. Up to a diffeomorphism, this chord does not
depend on the value of $t_j$ within a connected
component. We obtain a chord diagram with m
chords. The corresponding element of the algebra $\mathcal{A}'$
is denoted by $D_P$. Figure 5, for each connected
component in our example, shows one of the possible
pairings, the corresponding chord diagram with
the sign $(-1)^{\downarrow_P}$, and the number of summands of the
integrand (some of which are equal to zero in $\mathcal{A}'$ due
to the framing independence relation).

Over each connected component, $z_j$ and $z'_j$ are
smooth functions of $t_j$.

By

$$\bigwedge_{j=1}^{m} \frac{dz_j - dz'_j}{z_j - z'_j}$$

we mean the pullback of this form to the integration
domain of variables $t_1, \ldots, t_m$. The integration
domain is considered with the orientation of the
space $\mathbb{R}^m$ defined by the natural order of the
coordinates $t_1, \ldots, t_m$.

By convention, the term in the Kontsevich integral
corresponding to m = 0 is the (only) chord diagram
of order 0 with coefficient 1. It represents the unit of
the algebra $\hat{\mathcal{A}}'$.








Framed Version of the Kontsevich Integral 


Let K be a framed oriented Morse knot with writhe
number w(K). Denote the corresponding knot
without framing by $\bar{K}$. The framed version of the
Kontsevich integral can be defined by the formula

$$Z^{fr}(K) = e^{\frac{w(K)}{2}\Theta} \cdot Z(\bar{K}) \in \hat{\mathcal{A}}$$

where $\Theta$ is the chord diagram with one chord and the
integral $Z(\bar{K}) \in \hat{\mathcal{A}}'$ is understood as an element of the
completed algebra $\hat{\mathcal{A}}$ (without one-term relations) by
virtue of a natural inclusion $\hat{\mathcal{A}}' \to \hat{\mathcal{A}}$ defined as identity
on the primitive subspace of $\mathcal{A}'$ (see Goryunov
(1999) and Le and Murakami (1996)).


Basic Properties 
Constructing the Universal Vassiliev Invariant 
The Kontsevich integral Z(K) 


1. converges for any Morse knot K, 

2. is invariant under deformations of the knot in the 
class of Morse knots, and 

3. behaves in a predictable way under the deformation
that adds a pair of new critical points to a
Morse knot:


[Figure: Z(knot with an added hump) = Z(H) · Z(the same knot without the hump)]


Here the first and the third pictures depict two
embeddings of an arbitrary knot, differing only in
the shown fragment, H is the “hump” (unknot
embedded in $\mathbb{R}^3$ in the specified way), and the
product is the product in the completed algebra $\hat{\mathcal{A}}'$
of chord diagrams. The last equality allows one to
define a genuine knot invariant by the formula

$$I(K) = \frac{Z(K)}{Z(H)^{c/2}}$$

where c denotes the number of critical points of K and
the ratio means the division in the algebra $\hat{\mathcal{A}}'$ according
to the rule $(1 + a)^{-1} = 1 - a + a^2 - a^3 + \cdots$.

Figure 5 Pairings and chord diagrams.

The expression I(K) is sometimes referred to as 
the “final” Kontsevich integral as opposed to the 
“preliminary” Kontsevich integral Z(K). It repre- 
sents a universal Vassiliev invariant in the following 
sense: Let w be a weight system, that is, a linear 
functional on the algebra A’. Then the composition 
w(I(K)) is a numerical Vassiliev invariant, and any 
Vassiliev invariant can be obtained in this way. 

The final Kontsevich integral for framed knots is 
defined in the same way, using the hump H with 
zero writhe number. 


Is Universal Vassiliev Invariant Universal? 


At present, it is not known whether the Kontsevich
integral separates knots, or even whether it can detect the
orientation of a knot. However, the corresponding
problem is solved, in the affirmative, in the case of
braids and string links (theorem of Kohno and
Bar-Natan; see Bar-Natan (1995b) and Kohno (1987)).


Omitting Long Chords 


We will state a technical lemma which is highly 
important in the study of the Kontsevich integral. It 
is used in the proof of the multiplicativity, in the 
combinatorial construction, etc. 

Suppose we have a Morse knot K with a
distinguished tangle T (Figure 6). Let m and M be
the minimal and maximal values of t on the tangle T.
In the horizontal planes between the levels m and M,
we can distinguish two kinds of chords: “short”
chords that lie either inside T or inside $K \setminus T$, and
“long” chords that connect a point in T with a point
in $K \setminus T$. Denote by $Z_T(K)$ the expression defined by
the same formula as the Kontsevich integral Z(K)
where only short chords are taken into consideration.
More exactly, if C is a connected component of the
integration domain whose projection on the coordinate
axis $t_j$ is entirely contained in the segment [m, M],
then in the sum over the pairings P we include only
those pairings whose chord at the level $t_j$ is short.

Figure 6 Short and long chords.


Lemma “Long” chords can be omitted when
computing the Kontsevich integral: $Z_T(K) = Z(K)$.


Kontsevich’s Integral and Operations on Knots 


The Kontsevich integral behaves in a nice way with
respect to the natural operations on knots, such as
mirror reflection, changing the orientation of the
knot, mutation of knots (see Chmutov and Duzhin
(2001)), and cabling (see Willerton (2002)). We give
some details regarding the first two operations.


Fact 1 Let R be the operation that sends a knot
to its mirror image. Define the corresponding
operation R on chord diagrams as multiplication
by $(-1)^n$, where n is the order of the diagram. Then
the Kontsevich integral commutes with the operation
R: Z(R(K)) = R(Z(K)), where by R(Z(K)) we
mean simultaneous application of R to all the chord
diagrams participating in Z(K).


Corollary The Kontsevich integral Z(K) and the 
universal Vassiliev invariant I(K) of an amphicheiral 
knot K consist only of even order terms. (A knot K is 
called “amphicheiral,” if it is equivalent to its mirror 
image: K = R(K).) 


Fact 2 Let S be the operation on knots which 
inverts their orientation. The same letter will also 
denote the analogous operation on chord diagrams 
(inverting the orientation of the outer circle or, 
which is the same thing, axial symmetry of the 
diagram). Then the Kontsevich integral commutes 
with the operation S of inverting the orientation: 
Z(S(K)) = S(Z(K)). 


Corollary The following two assertions are
equivalent:


(i) Vassiliev invariants do not distinguish the 
orientation of knots and 

(ii) all chord diagrams are symmetric: D=S(D) 
modulo four-term relations. 


The calculations of Kneissler (1997) show that up 
to order 12 all chord diagrams are symmetric. For 
bigger orders, the problem is still open. 


Multiplicative Properties 
The Kontsevich integral for tangles is multiplicative:

$$Z(T_1) \cdot Z(T_2) = Z(T_1 \cdot T_2)$$

whenever the product $T_1 \cdot T_2$, defined by vertical
concatenation of tangles, exists. Here, the product
on the left-hand side is understood as the image of
the element $Z(T_1) \otimes Z(T_2)$ under the natural map
$\hat{\mathcal{A}}(T_1) \otimes \hat{\mathcal{A}}(T_2) \to \hat{\mathcal{A}}(T_1 \cdot T_2)$.

This simple fact has two important corollaries:


1. For any knot K, the Kontsevich integral Z(K) is
a group-like element of the Hopf algebra $\hat{\mathcal{A}}'$,
that is,

$$\delta(Z(K)) = Z(K) \otimes Z(K)$$

where $\delta$ is the comultiplication in $\mathcal{A}$ defined
above.

2. The final Kontsevich integral, taken in a different
normalization

$$I'(K) = \frac{Z(K)}{Z(H)^{c/2 - 1}}$$

is multiplicative with respect to the connected
sum of knots:

$$I'(K_1 \# K_2) = I'(K_1)\, I'(K_2)$$


Arithmetical Properties 


For any knot K the coefficients in the expansion of 
Z(K) over an arbitrary basis consisting of chord 
diagrams are rational (see Kontsevich (1993), Le 
and Murakami (1996), and below). 


Combinatorial Construction of the 
Kontsevich Integral 


Sliced Presentation of Knots 


The idea is to cut the knot into a number of 
standard simple tangles, compute the Kontsevich 
integral for each of them, and then recover the 
integral of the whole knot from these simple 
pieces. 

More exactly, we represent the knot by a family
of plane diagrams continuously depending on a
parameter $\varepsilon \in (0, \varepsilon_0)$ and cut by horizontal planes
into a number of slices with the following
properties.


1. At every boundary level of a slice (dashed lines
in the pictures below), the distances between
various strings are asymptotically proportional
to different whole powers of the
parameter $\varepsilon$.

2. Every slice contains exactly one special event
and several strictly vertical strings which
are farther away (at lower powers of $\varepsilon$) from
any string participating in the event than its
width.

3. There are three types of special events:

min/max: m, M
braiding: $B_+$, $B_-$
associativity: $A_+$, $A_-$
[Figure: pictures of these special events]

where, in the two last cases, the strings may be
replaced by bunches of parallel strings which
are closer to each other than the width of this
event.


Recipe of Computation of the Kontsevich Integral 


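Given such a sliced representation of a knot, the
combinatorial algorithm to compute its Kontsevich
integral consists in the following:


1. Replace each special event by a series of chord
diagrams supported on the corresponding tangle
according to the rule

$$m, M \mapsto 1, \qquad B_+ \mapsto R, \quad B_- \mapsto R^{-1}, \qquad A_+ \mapsto \Phi, \quad A_- \mapsto \Phi^{-1}$$

where

$$R = X \cdot \exp\!\left(\frac{a}{2}\right), \qquad
\Phi = 1 - \frac{\zeta(2)}{(2\pi i)^2}[a,b]
- \frac{\zeta(3)}{(2\pi i)^3}\bigl([a,[a,b]] + [b,[a,b]]\bigr) + \cdots$$

(in the formula for R, X denotes the crossing of the two
strands and a the single chord joining them;
$\Phi \in \hat{\mathcal{A}}(3)$ is the Knizhnik-Zamolodchikov
Drinfeld associator defined below; it is an infinite
series in two variables a and b, the chord diagrams on
three strands with a single chord joining strands 1 and 2,
respectively strands 2 and 3).

2. Compute the product of all these series from
top to bottom taking into account the connection
of the strands of different tangles, thus
obtaining an element of the algebra $\hat{\mathcal{A}}'$.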


To accomplish the algorithm, we need two 
auxiliary operations on chord diagrams: 


1. $S_i : \mathcal{A}(p) \to \mathcal{A}(p)$ defined as multiplication by
$(-1)^k$ on a chord diagram containing k endpoints
of chords on the string number i. This is
the correction term in the computation of R
and $\Phi$ in the case when the tangle contains
some strings oriented downwards (the upwards
orientation is considered as positive).

2. $\Delta_i : \mathcal{A}(p) \to \mathcal{A}(p+1)$ acts on a chord diagram D
by doubling the ith string of D and taking the
sum over all possible lifts of the endpoints of
chords of D from the ith string to one of the two
new strings. The strings are counted by their
bottom points from left to right. This operation can
be used to express the combinatorial Kontsevich
integral of a generalized associativity tangle
(with strings replaced by bunches of strings) in
terms of the combinatorial Kontsevich integral
of a simple associativity tangle.


Example 


Using the combinatorial algorithm, we compute the
Kontsevich integral of the trefoil knot $3_1$ to the
terms of degree 2. A sliced presentation for this knot
shown in Figure 7 implies that
$Z(3_1) = S_3(\Phi)\, R^{-3}\, S_3(\Phi^{-1})$ (here the product from left to right
corresponds to the multiplication of tangles from
top to bottom). Up to degree 2, we have

$$\Phi = 1 + \tfrac{1}{24}[a,b] + \cdots, \qquad
R = X\bigl(1 + \tfrac{1}{2}a + \tfrac{1}{8}a^2 + \cdots\bigr)$$

where X means that the two strands in each term of
the series must be crossed over at the top. The
operation $S_3$ changes the orientation of the third
strand, which means that $S_3(a) = a$ and $S_3(b) = -b$.
Therefore,

$$S_3(\Phi) = 1 - \tfrac{1}{24}[a,b] + \cdots, \qquad
S_3(\Phi^{-1}) = 1 + \tfrac{1}{24}[a,b] + \cdots, \qquad
R^{-3} = X\bigl(1 - \tfrac{3}{2}a + \tfrac{9}{8}a^2 + \cdots\bigr)$$

and

$$\begin{aligned}
Z(3_1) &= \bigl(1 - \tfrac{1}{24}[a,b] + \cdots\bigr)\,
X\bigl(1 - \tfrac{3}{2}a + \tfrac{9}{8}a^2 + \cdots\bigr)
\bigl(1 + \tfrac{1}{24}[a,b] + \cdots\bigr) \\
&= 1 - \tfrac{3}{2}Xa - \tfrac{1}{24}abX + \tfrac{1}{24}baX
+ \tfrac{1}{24}Xab - \tfrac{1}{24}Xba + \tfrac{9}{8}Xa^2 + \cdots
\end{aligned}$$

Figure 7 A sliced presentation of the trefoil.

Closing these diagrams into the circle, we see that in
the algebra $\mathcal{A}'$ we have Xa = 0 (by the framing
independence relation), then baX = Xab = 0 (by the
same relation, because these diagrams consist of two
parallel chords) and $abX = Xba = Xa^2 = \otimes$, where
$\otimes$ here stands for the chord diagram with two
intersecting chords. The
result is $Z(3_1) = 1 + \tfrac{25}{24}\otimes + \cdots$. The final
Kontsevich integral of the trefoil (in the multiplicative
normalization) is thus equal to

$$I'(3_1) = \frac{Z(3_1)}{Z(H)}
= \frac{1 + \tfrac{25}{24}\otimes + \cdots}{1 + \tfrac{1}{24}\otimes + \cdots}
= 1 + \otimes + \cdots$$
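
The rational arithmetic of this example is easy to re-check by machine. The sketch below assumes only the degree-2 data quoted in the text (the coefficient 1/24 of [a, b] in the associator, the exponential form of R, and the degree-2 coefficient 1/24 of Z(H)); the helper names `exp_series` and `quotient_degree_2` are introduced here for illustration.

```python
from fractions import Fraction
from math import factorial


def exp_series(scale, order):
    """Exact coefficients of exp(scale * a) up to the power a**order."""
    return [Fraction(scale) ** k / factorial(k) for k in range(order + 1)]


# Coefficient of [a, b] in the associator up to degree 2, as quoted in the text.
PHI_DEGREE_2 = Fraction(1, 24)

# Series part of R^{-3}: exp(-3a/2) = 1 - (3/2) a + (9/8) a^2 + ...
r_inv_cubed = exp_series(Fraction(-3, 2), 2)

# After closing up, only abX, Xba and Xa^2 survive; each equals the crossed diagram.
# Their coefficients in the expansion of Z(3_1) are -1/24, -1/24 and 9/8.
cross_coefficient = -PHI_DEGREE_2 - PHI_DEGREE_2 + r_inv_cubed[2]


def quotient_degree_2(numerator, denominator):
    """(1 + n*D) / (1 + d*D) = 1 + (n - d)*D modulo degree > 2, for D of degree 2."""
    return numerator - denominator


# Degree-2 coefficient of I'(3_1) = Z(3_1) / Z(H), with Z(H) = 1 + (1/24) D + ...
final_coefficient = quotient_degree_2(cross_coefficient, Fraction(1, 24))
```

Exact fractions are used so that the values 25/24 and 1 come out without rounding.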


Drinfeld Associator and Rationality 


The Drinfeld associator used as a building block in
the combinatorial construction of the Kontsevich
integral can be defined as the limit

$$\Phi_{KZ} = \lim_{\varepsilon \to 0} \varepsilon^{-\frac{b}{2\pi i}} \cdot Z(AT_\varepsilon) \cdot \varepsilon^{\frac{a}{2\pi i}}$$

where a and b are the one-chord diagrams on three strands
joining strands 1, 2 and strands 2, 3, respectively, and
$AT_\varepsilon$ is the positive associativity
tangle (special event $A_+$ shown above) with
the distance between the vertical strands constant 1
and the distance between the close endpoints equal
to $\varepsilon$. An explicit formula for $\Phi_{KZ}$ was found by Le
and Murakami (1996); it is written as a nested
summation over four variable multi-indices and
therefore does not provide an immediate insight
into the structure of the whole series; we confine
ourselves to quoting the beginning of the series
(note that $\Phi_{KZ}$ is a group-like element in the free
associative algebra with two generators; hence, its
logarithm belongs to the corresponding free Lie
algebra):

$$\begin{aligned}
\log(\Phi_{KZ}) = {} & -\zeta(2)[x,y] - \zeta(3)\bigl([x,[x,y]] + [y,[x,y]]\bigr) \\
& - \frac{\zeta(2)^2}{10}\bigl(4[x,[x,[x,y]]] + [y,[x,[x,y]]] + 4[y,[y,[x,y]]]\bigr) \\
& - \zeta(5)\bigl([x,[x,[x,[x,y]]]] + [y,[y,[y,[x,y]]]]\bigr) \\
& + (\zeta(2)\zeta(3) - 2\zeta(5))\bigl([y,[x,[x,[x,y]]]] + [y,[y,[x,[x,y]]]]\bigr) \\
& + \bigl(\tfrac{1}{2}\zeta(2)\zeta(3) - \tfrac{1}{2}\zeta(5)\bigr)[[x,y],[x,[x,y]]] \\
& + \bigl(\tfrac{1}{2}\zeta(2)\zeta(3) - \tfrac{3}{2}\zeta(5)\bigr)[[x,y],[y,[x,y]]] + \cdots
\end{aligned}$$
where $x = \frac{1}{2\pi i}a$ and $y = \frac{1}{2\pi i}b$. In general, $\Phi_{KZ}$
is an infinite series whose coefficients are “multiple
zeta values” (Le and Murakami 1996, Zagier 1994)

$$\zeta(a_1, \ldots, a_n) = \sum_{0 < k_1 < k_2 < \cdots < k_n} k_1^{-a_1} \cdots k_n^{-a_n}$$
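
Truncated multiple zeta values are straightforward to evaluate numerically. The following sketch assumes the summation convention displayed above (strictly increasing indices, exponents listed in the same order); the function name `multiple_zeta` and the cutoff parameter are illustrative, and the comparison with ζ(3) relies on Euler's classical identity ζ(1, 2) = ζ(3) in this convention.

```python
import math


def multiple_zeta(exponents, cutoff):
    """Truncated sum over 0 < k_1 < ... < k_n <= cutoff of k_1^(-a_1) ... k_n^(-a_n)."""
    previous = None  # previous[k]: partial sum over chains whose last index equals k
    for a in exponents:
        current = [0.0] * (cutoff + 1)
        running = 0.0  # sum of previous[1 .. k-1]
        for k in range(1, cutoff + 1):
            weight = 1.0 if previous is None else running
            current[k] = weight * k ** (-a)
            if previous is not None:
                running += previous[k]
        previous = current
    return sum(previous)


zeta_2 = multiple_zeta([2], 100000)
zeta_1_2 = multiple_zeta([1, 2], 100000)
zeta_3 = multiple_zeta([3], 100000)

# First coefficient of the associator: -zeta(2) / (2*pi*i)^2, to be compared with 1/24.
first_coefficient = (-zeta_2 / (2j * math.pi) ** 2).real
```

The dynamic-programming layout keeps the cost linear in the cutoff for each exponent; the truncated ζ(1, 2) converges slowly, so only a few digits should be trusted.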


There are other equivalent definitions of $\Phi_{KZ}$, in
particular one in terms of the asymptotical behavior
of solutions of the simplest Knizhnik-Zamolodchikov
equation

$$\frac{dG}{dz} = \left(\frac{a}{z} + \frac{b}{z-1}\right) G$$

where G is a function of a complex variable taking
values in the algebra of series in two noncommuting
variables a and b (see Drinfeld (1991)).

It turns out (theorem of Le and Murakami (1996))
that the combinatorial Kontsevich integral does not
change if $\Phi_{KZ}$ is replaced by another series in $\hat{\mathcal{A}}(3)$
provided it satisfies certain axioms (among which
the pentagon and hexagon relations are the most
important; see Drinfeld (1991) and Le and
Murakami (1996)).

Drinfeld (1991) proved the existence of an
associator $\Phi_{\mathbb{Q}}$ with rational coefficients. Using it
instead of $\Phi_{KZ}$ in the combinatorial construction, we
obtain the following:


Theorem (Le and Murakami 1996). The coefficients
of the Kontsevich integral of any knot (tangle)
are rational when Z(K) is expanded over an
arbitrary basis consisting of chord diagrams.





Explicit Formulas for the Kontsevich 
Integral 


The Wheels Formula 


Let O be the unknot; the expression $I(O) = Z(H)^{-1}$
is referred to as the “Kontsevich integral of the
unknot.” A closed form formula for I(O) was
proved in Bar-Natan et al. (2003):


Theorem

$$I(O) = \exp \sum_{n=1}^{\infty} b_{2n} w_{2n}
= 1 + \Bigl(\sum_{n=1}^{\infty} b_{2n} w_{2n}\Bigr)
+ \frac{1}{2}\Bigl(\sum_{n=1}^{\infty} b_{2n} w_{2n}\Bigr)^2 + \cdots$$

Here $b_{2n}$ are modified Bernoulli numbers, that is,
the coefficients of the Taylor series

$$\sum_{n=1}^{\infty} b_{2n} x^{2n} = \frac{1}{2} \ln \frac{e^{x/2} - e^{-x/2}}{x}$$

($b_2 = 1/48$, $b_4 = -1/5760$, $b_6 = 1/362\,880$, ...), and
$w_{2n}$ are the “wheels,” that is, Jacobi diagrams of the
form

[Figure: the wheels $w_2$, $w_4$, $w_6$, ...]

The sums and products are understood as operations
in the algebra of Jacobi diagrams $\mathcal{B}$, and the result is
then carried over to the algebra of chord diagrams $\mathcal{A}$
along the isomorphism $\chi$.
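
The quoted values of the modified Bernoulli numbers can be reproduced from the generating function. The sketch below assumes the Taylor series stated above, rewritten as one half of the logarithm of sinh(x/2)/(x/2); the function name `modified_bernoulli` and the use of a power-series logarithm recursion are choices made for this illustration.

```python
from fractions import Fraction
from math import factorial


def modified_bernoulli(count):
    """Return [b_2, b_4, ..., b_{2*count}] as exact fractions."""
    # f(u) = sinh(x/2) / (x/2) written as a power series in u = x^2:
    # f_k = 1 / (4^k * (2k + 1)!), with f_0 = 1.
    f = [Fraction(1, 4 ** k * factorial(2 * k + 1)) for k in range(count + 1)]
    # g = log f, from f * g' = f':  n*g_n = n*f_n - sum_{k=1}^{n-1} k*g_k*f_{n-k}.
    g = [Fraction(0)] * (count + 1)
    for n in range(1, count + 1):
        acc = n * f[n] - sum(k * g[k] * f[n - k] for k in range(1, n))
        g[n] = acc / n
    # The generating function carries an overall factor 1/2.
    return [g[n] / 2 for n in range(1, count + 1)]


first_three = modified_bernoulli(3)
```

Working in the variable u = x^2 keeps only the even powers, which are the only ones that occur.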


Generalizations 


There are several generalizations of the wheels 
formula. 


1. Rozansky’s rationality conjecture (Rozansky
2003), proved by Kricker (2000), affirms that the
Kontsevich integral of any (framed) knot can be
written in a form resembling the wheels formula.
Let us call the “skeleton” of a Jacobi diagram the
regular 3-valent graph obtained by “shaving off”
all univalent vertices. Then the wheels formula
says that all diagrams in the expansion of I(O)
have one and the same skeleton (circle), and the
generating function for the coefficients of diagrams
with n legs is a certain analytic function,
more or less rational in $e^x$. In the same way, the
theorem of Rozansky and Kricker states that the
terms in $I(K) \in \hat{\mathcal{B}}$, when arranged by their
skeleta, have the generating functions of the
form $p(e^x)/A_K(e^x)$, where $A_K$ is the Alexander
polynomial of K and p is some polynomial
function. Although this theorem does not give
an explicit formula for I(K), it provides a lot of
information about the structure of this series.

2. Marché gives a closed form formula for the 
Kontsevich integral of torus knots T(p, q). 


The formula of Marché, although explicit, is 
rather intricate, and here, by way of example, we 
only write out the first several terms of the final 
Kontsevich integral I’ for the trefoil (torus knot of 
type (2,3)), following Willerton (2002): 


31 5 1 
"(Q) =O-8+8- 578+ 8+ 50+: 


First Terms of the Kontsevich Integral 


A Vassiliev invariant v of degree n is called 
“canonical” if it can be recovered from the 
Kontsevich integral by applying a homogeneous 
weight system, that is, if v=symb(v) o I. Canonical 
invariants define a grading in the filtered space of 
Vassiliev invariants which is consistent with the 
filtration. If the Kontsevich integral is expanded 
over a fixed basis in the space of chord diagrams $\mathcal{A}'$,
then the coefficient of every diagram is a canonical 
invariant. According to Stanford (2001) and Willerton 
(2002), the expansion of the final Kontsevich integral 
up to degree 4 can be written as follows: 


I'(K) =O -e2(K) @ — Z73(K)B 
+ $(4j4(K) + 36c4(K) — 36c5(K) + 3c2(K))® 
+ 34(—12c4(K) -+ 6c (K) — c2(K))@ 
+4G(KB+- 


where $c_n$ are coefficients of the Conway polynomial
$\nabla_K(t) = \sum c_n(K) t^n$ and $j_n$ are modified coefficients of
the Jones polynomial $J_K(e^t) = \sum j_n(K) t^n$. Therefore, up
to degree 4, the basic canonical Vassiliev invariants of
unframed knots are $c_2$, $j_3$, $j_4$, $c_4 + (1/12)c_2$, and $c_2^2$.
Modulation equations are simplified equations
used to model complicated physical systems. Typically
they are derived from the fundamental partial
differential equations that describe the system via
asymptotic analysis. Furthermore, the modulation
equations are in a sense “universal” in that many
different physical systems are described by the same
modulation equation. This comes about because
the form of the modulation equation depends on
only a very few qualitative features of the original
partial differential equation. Thus, they serve as a sort
of “normal form” for these partial differential
equations and as such justify greater study than
their apparently special character might otherwise
merit.

The Korteweg-de Vries (KdV) equation

$$\partial_t u = \partial_x^3 u + 6u\,\partial_x u, \qquad u = u(x,t),\ x \in \mathbb{R},\ t > 0 \qquad [1]$$

was one of the earliest modulation equations to be
intensively studied. It was derived in an attempt to
understand the propagation of solitary waves on the
surface of water in a channel of finite depth. The
KdV equation was first derived by Boussinesq but
then independently rederived and studied in detail
by Korteweg and de Vries. (For an interesting
discussion of the early history of the KdV equation
see Pego and Weinstein (1997).)


Derivation of the KdV Equation 


As mentioned above, the KdV equation is a sort of 
normal form describing the propagation of small- 
amplitude, long-wavelength disturbances in a variety 
of different physical systems. In this section we 
describe in detail how it arises as an approximation 
to the Fermi-Pasta-Ulam (FPU) model of coupled,
nonlinear oscillators. Although the KdV equation is 
most commonly encountered as an approximation 
to water waves, its study as an approximation to the 
FPU model was extremely important historically 
because it was in this context that its complete 
integrability was discovered by Miura (1968) and 
Gardner et al. (1974). 

Consider an infinite set of particles of mass
m = 1 at positions $q_j(t)$, $j \in \mathbb{Z}$, interacting with
their nearest neighbors via a potential V(q).
Newton’s equations for the motion of such
particles are:

$$\frac{d^2 q_j}{dt^2} = V'(q_{j+1}(t) - q_j(t)) - V'(q_j(t) - q_{j-1}(t)), \qquad j \in \mathbb{Z} \qquad [2]$$

If we rewrite these equations in terms of the
difference variables $r(j,t) = q_{j+1}(t) - q_j(t)$, then [2]
becomes

$$\frac{d^2 r}{dt^2}(j,t) = V'(r(j+1,t)) + V'(r(j-1,t)) - 2V'(r(j,t)), \qquad j \in \mathbb{Z} \qquad [3]$$
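
The passage from [2] to [3] can be sanity-checked on a finite piece of the chain. The sketch below is only an illustration: the force law V'(r) = r + r^2, the random initial positions, and the function names are assumptions made here, not part of the source, and boundary particles are simply left out of the comparison.

```python
import random


def v_prime(r):
    # Hypothetical FPU-type force law V'(r) = r + r^2, chosen only for illustration.
    return r + r * r


def accelerations_q(q):
    """Right-hand side of [2] for the interior particles j = 1, ..., n-2."""
    return {
        j: v_prime(q[j + 1] - q[j]) - v_prime(q[j] - q[j - 1])
        for j in range(1, len(q) - 1)
    }


def accelerations_r(q):
    """Right-hand side of [3] in the difference variables r_j = q_{j+1} - q_j."""
    r = [q[j + 1] - q[j] for j in range(len(q) - 1)]
    return {
        j: v_prime(r[j + 1]) + v_prime(r[j - 1]) - 2 * v_prime(r[j])
        for j in range(1, len(r) - 1)
    }


def max_mismatch(q):
    """Largest difference between q''_{j+1} - q''_j (from [2]) and [3]."""
    a = accelerations_q(q)
    rhs = accelerations_r(q)
    return max(abs((a[j + 1] - a[j]) - rhs[j]) for j in rhs)


random.seed(0)
positions = [random.uniform(-0.1, 0.1) for _ in range(12)]
mismatch = max_mismatch(positions)
```

The two expressions agree up to floating-point rounding for every interior index, as the algebra predicts.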


We are interested in small-amplitude, long-wavelength
solutions of [3]. One way of studying
such motions is to change the lattice spacing in [3]
from 1 to h and then let h tend to zero. A nice
derivation of the KdV equation from that point of
view is contained in Ablowitz and Segur (1981).
Here, following Schneider and Wayne (1999), we
will keep the lattice spacing fixed at 1 and rescale
the spatial variable in the KdV equation. This is
closer to the approximation method used in the
water wave problem.

Since we want to focus on small-amplitude, long-wavelength
solutions of [3], we begin by making the
hypothesis that there exists some real-valued function
R(x,t) such that the solution of [3] can be
written as

$$r(j,t) = \varepsilon^2 R(\varepsilon j, \varepsilon t) \qquad [4]$$

The prefactor $\varepsilon^2$ ensures that the solution is of small
amplitude, while rescaling $j \to \varepsilon j$ means that phenomena
that occur on length scales of O(1) in the
equation for R will occur on length scales of $O(1/\varepsilon)$
in the original equation; that is, they will be long-wavelength
solutions. The differing powers of $\varepsilon$
chosen for rescaling the amplitude and the spatial
scale are chosen so that the dispersive and nonlinear
effects will balance each other. Inserting [4] into [3]
and expanding in $\varepsilon$, we find that the
nonvanishing terms of lowest order in $\varepsilon$ are

$$\partial_t^2 R = V''(0)\, \partial_x^2 R \qquad [5]$$

This is just the wave equation, and thus to leading
order we expect solutions of [3] to split into left-
and right-moving waves, each moving with speed
$c = \pm\sqrt{V''(0)}$. (We assume that $c^2 = V''(0) > 0$.)
Thus, we make a refinement of the hypothesized
form of the solution and replace [4] by

$$r(j,t) = \varepsilon^2 U(\varepsilon(j + ct), \varepsilon^3 t)
+ \varepsilon^2 V(\varepsilon(j - ct), \varepsilon^3 t) + \varepsilon^4 \psi(\varepsilon j, \varepsilon t) \qquad [6]$$


The presence of the term $\varepsilon^4\psi$ may be somewhat
surprising. We will discuss the reason for its
appearance in more detail below, but for the
moment we mention merely that its presence does
not affect the fact that to leading order the solution
is approximated by the left- and right-moving waves
represented by the $\varepsilon^2 U$ and $\varepsilon^2 V$ terms, respectively.
We also note that the additional time dependence
$\varepsilon^3 t$ in U and V is chosen, as is typical in the
multiscale method, to incorporate the higher-order
terms omitted in [5] into the evolution.

Substituting [6] into [3] and expanding the
resulting equation in $\varepsilon$, we find that the lowest
order in $\varepsilon$ that occurs is $O(\varepsilon^4)$ and these terms all
cancel exactly because of the form of our hypothesized
solution. The terms of $O(\varepsilon^6)$ are:

$$\{2c\,\partial_X\partial_T U - 2c\,\partial_X\partial_T V + \partial_\tau^2\psi\}
= c^2\Bigl\{\tfrac{1}{12}\partial_X^4 U + \tfrac{1}{12}\partial_X^4 V + \partial_\xi^2\psi\Bigr\}
+ \tfrac{1}{2}V'''(0)\{\partial_X^2(U^2 + V^2 + 2UV)\} \qquad [7]$$

Here, X, T, $\xi$, and $\tau$ represent the rescaled independent
variables, that is, U = U(X,T), V = V(X,T),
and $\psi = \psi(\xi, \tau)$.
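
The origin of the individual terms in [7] can be traced with two standard expansions. The following is a sketch under the assumption that V is smooth near 0; the symbol f denotes a generic smooth function introduced only for this illustration.

$$\begin{aligned}
f(\varepsilon(j+1)) + f(\varepsilon(j-1)) - 2f(\varepsilon j)
&= \varepsilon^2 f''(\varepsilon j) + \frac{\varepsilon^4}{12} f''''(\varepsilon j) + O(\varepsilon^6), \\
V'(r) &= V'(0) + V''(0)\, r + \tfrac{1}{2} V'''(0)\, r^2 + O(r^3), \\
\frac{d^2}{dt^2}\Bigl[\varepsilon^2 U(\varepsilon(j + ct), \varepsilon^3 t)\Bigr]
&= \varepsilon^4 c^2\, \partial_X^2 U + 2c\,\varepsilon^6\, \partial_X\partial_T U + \varepsilon^8\, \partial_T^2 U .
\end{aligned}$$

The constant V'(0) drops out of the combination on the right-hand side of [3]. Since r is of order $\varepsilon^2$ and $c^2 = V''(0)$, the linear part of V' produces $\varepsilon^4 c^2 \partial_X^2 U$, which cancels the first term of the third line, and then $\varepsilon^6 \tfrac{c^2}{12} \partial_X^4 U$; the quadratic part produces $\varepsilon^6 \tfrac{1}{2} V'''(0)\, \partial_X^2 (U + V)^2$. The same computation for V changes the sign of the mixed derivative, and the $\varepsilon^4 \psi$ term contributes $\varepsilon^6 \partial_\tau^2 \psi$ on the left and $\varepsilon^6 c^2 \partial_\xi^2 \psi$ on the right, which together give [7].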
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Note that if it were not for the presence of the term 
2UV on the right-hand side of this last equation the 
equations for U and V would completely decouple, 
that is, there would be no interaction between the 
left- and right-moving parts of the solution to this 
order. At this point, we can take advantage of the 
(heretofore) arbitrary function y. If we assume that U 
and V are given, we can choose y to satisfy the 
inhomogeneous wave equation: 


$$\partial_\tau^2\psi = c^2\,\partial_\xi^2\psi + V'''(0)\,\partial_\xi^2(UV) \qquad [8]$$


Then, provided $\psi$ remains of $O(1)$ over the timescales
of interest (which one can verify a posteriori),
we see that all terms of $O(\epsilon^6)$ in the expansion of [3]
will vanish provided


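$$2c\,\partial_T U = \frac{c^2}{12}\,\partial_X^3 U + \frac{1}{2}V'''(0)\,\partial_X(U^2)$$

$$-2c\,\partial_T V = \frac{c^2}{12}\,\partial_X^3 V + \frac{1}{2}V'''(0)\,\partial_X(V^2) \qquad [9]$$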


This means that the left- and right-moving parts of the 
solution satisfy a pair of uncoupled KdV equations. 


Remark 1 To rewrite [9] in the standard form [1]
one can make a simple rescaling: for instance,
choose $X = \alpha x$, $T = t$ and $u(x,t) = \beta U(\alpha x, t)$, with
$\alpha = (c/24)^{1/3}$ and $\beta = V'''(0)/(12c\alpha)$.
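
The rescaling in Remark 1 can be checked in a few lines. The sketch below works with the $U$ equation in [9] and assumes that the standard form [1] is $\partial_t u = \partial_x^3 u + 6u\,\partial_x u$, which is the form implied by the left-hand side of [15]; the grouping of terms is ours.

```latex
% Divide the U equation in [9] by 2c and use \partial_X(U^2) = 2U\partial_X U:
\partial_T U = \frac{c}{24}\,\partial_X^3 U + \frac{V'''(0)}{2c}\,U\,\partial_X U .
% With u(x,t) = \beta U(\alpha x, t):
\partial_t u = \beta\,\partial_T U, \qquad
\partial_x^3 u = \beta\alpha^3\,\partial_X^3 U, \qquad
u\,\partial_x u = \beta^2\alpha\,U\,\partial_X U .
% Hence
\partial_t u - \partial_x^3 u - 6u\,\partial_x u
  = \beta\Bigl[\Bigl(\frac{c}{24} - \alpha^3\Bigr)\partial_X^3 U
  + \Bigl(\frac{V'''(0)}{2c} - 6\alpha\beta\Bigr)U\,\partial_X U\Bigr],
% which vanishes exactly when
\alpha^3 = \frac{c}{24}, \qquad \beta = \frac{V'''(0)}{12\,c\,\alpha} .
```

Both conditions reproduce the values of $\alpha$ and $\beta$ quoted in the remark.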


We can now comment on the reasons we chose
the particular scalings of the amplitude and of the
independent variables used in [6]. The terms $\partial_X^2 U^2$
and $\partial_X^2 V^2$ are the lowest-order contributions from
the nonlinear part of [3], while the terms $\partial_X^4 U$ and
$\partial_X^4 V$ represent the lowest-order contributions from
the linear part of the equation, except for the
“trivial” translation that comes from [5]. In particular,
in the absence of nonlinear effects the terms $\partial_X^4 U$
and $\partial_X^4 V$ (or equivalently, the terms $\partial_X^3 U$ and $\partial_X^3 V$ in
[9]) would cause traveling waves to “disperse” and
thus the KdV equation represents a balance
between nonlinear and dispersive effects. It is this
balance between dispersion and nonlinearity which
permits traveling-wave solutions to propagate without
change of form (see the section “Integrability of
the KdV equation”).

More generally, we expect the KdV equation to
arise as a modulation equation whenever a small-amplitude,
long-wavelength linear wave is simultaneously
perturbed by dispersive and nonlinear
effects of the same order of magnitude. This is, of
course, oversimplified. For instance, the original
equation may have no quadratic terms in the
nonlinearity, which means that the
term $\partial_X U^2$ in the modulation equation will be
replaced by a term like $\partial_X U^p$, for $p > 2$; this
leads to the modified KdV equation as the appropriate
modulation equation. Or, for certain parameter
values in the original equation, the coefficient
in front of the leading-order dispersive term may
vanish, in which case a fifth-order modulation
equation known as the Kawahara equation is more
appropriate. However, both of these cases are in
some sense nongeneric and the relatively weak
hypotheses needed to obtain the KdV equation as
the appropriate modulation equation indicate why it
is encountered in so many diverse circumstances. We
note, however, that the multiscale method used
above to derive the KdV equation does not give a
unique choice for the appropriate modulation
equation at any given order of approximation and
we discuss in a later section some other equations
that could be used as models in the situation above.


Validity of the KdV Approximation 


While the above derivation of the KdV equation is
simple and intuitive, one may wonder how accurate
an approximation it actually provides to the true
solutions of [3] (or to the evolution of water waves,
probably the most important physical situation in
which the KdV approximation is used). In particular,
note that in the notation of [9] the phenomena
intrinsic to the KdV equation occur on timescales
$T = O(1)$. However, this corresponds to a very
long timescale $t = O(1/\epsilon^3)$ in the original FPU
model and it could easily be the case that although
the error made in the derivation of the KdV approximation
at any given time is quite small, over these
very long timescales the errors could accumulate
in such a way as to destroy the accuracy of the
approximation.

The KdV and other modulation equations have 
been used since the nineteenth century but only 
relatively recently have rigorous estimates of the 
accuracy of this approximation been proved. In 
fact, the first estimates demonstrating that the 
KdV equation actually provided an accurate 
approximation to the true motion of water 
waves over the timescales expected from the 
heuristic derivation were not proved until Craig 
(1985). More recently, powerful general methods 
have been developed to justify not just the KdV 
equation but other modulation equations like the 
nonlinear Schrödinger equation and Ginzburg-Landau
equation as well.

For instance, the following method, introduced
in Kirrmann et al. (1992), has been used to justify
the use of modulation equations in the water-wave
problem, the evolution of Taylor-Couette patterns
in viscous fluids, and a number of other
circumstances. We will explain it in the context of
a general, abstract evolution equation to indicate
its generality. Suppose that one wishes to approximate
the small-amplitude solutions of a general
evolution equation (or system of such equations) of
the form


$$\partial_t u = Lu + N(u) \qquad [10]$$


where $L$ is a linear operator and $N$ represents the
nonlinear terms. Suppose that via some formal
analysis like that in the previous section we have
derived a function $\epsilon^\alpha\psi$ that is believed to be a good
approximation to a true solution of [10]. In that
example, for instance, $\epsilon^\alpha\psi$ would be the sum of the
solutions of the two KdV equations in [9], and in
general it will be given by the solution of the
modulation equation that is expected to approximate
[10]. We must show that the difference
between $\epsilon^\alpha\psi$ and a true solution of [10] remains
small over the timescales of interest. We write this
difference as $u - \epsilon^\alpha\psi = \epsilon^\beta R$ so that if $\beta > \alpha$, and if
$R = O(1)$, $\epsilon^\alpha\psi$ does provide the leading-order
approximation to the true solution. We can make
$R|_{t=0}$ small by choosing the initial conditions of
our modulation equation appropriately and thus
we need to follow how $R$ evolves in time. If we use
the equation satisfied by $u$ we see that $R$ evolves as


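$$\partial_t R = LR + \epsilon^{-\beta}\left[N(\epsilon^\alpha\psi + \epsilon^\beta R) - N(\epsilon^\alpha\psi)\right] + \epsilon^{-\beta}\,\mathrm{Res}(\epsilon^\alpha\psi) \qquad [11]$$


where $\mathrm{Res}(\epsilon^\alpha\psi) = L(\epsilon^\alpha\psi) + N(\epsilon^\alpha\psi) - \partial_t(\epsilon^\alpha\psi)$, the
“residual” of our approximation, is simply the
amount by which the approximation fails to satisfy
the original equation at any given time. In the
example in the previous section the residual would
include the terms $O(\epsilon^8)$ that we ignored in our
expansion.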
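
The evolution equation [11] follows from [10] by direct substitution. The short derivation below spells out that step; it uses only the definitions above, and the linearization at the end is included only to connect with [12].

```latex
% Insert u = \epsilon^\alpha\psi + \epsilon^\beta R into \partial_t u = Lu + N(u):
\partial_t(\epsilon^\alpha\psi) + \epsilon^\beta\partial_t R
  = L(\epsilon^\alpha\psi) + \epsilon^\beta LR + N(\epsilon^\alpha\psi + \epsilon^\beta R).
% Add and subtract N(\epsilon^\alpha\psi), then divide by \epsilon^\beta:
\partial_t R = LR
  + \epsilon^{-\beta}\bigl[N(\epsilon^\alpha\psi + \epsilon^\beta R) - N(\epsilon^\alpha\psi)\bigr]
  + \epsilon^{-\beta}\bigl[L(\epsilon^\alpha\psi) + N(\epsilon^\alpha\psi) - \partial_t(\epsilon^\alpha\psi)\bigr].
% Taylor expansion of the bracket (N assumed smooth):
\epsilon^{-\beta}\bigl[N(\epsilon^\alpha\psi + \epsilon^\beta R) - N(\epsilon^\alpha\psi)\bigr]
  = DN(\epsilon^\alpha\psi)R + O(\epsilon^{\beta}).
```

The last bracket in the second display is the residual, and the first term of the Taylor expansion is the nonconstant-coefficient linear term that appears in [12].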

One must now, in any given example, consider
three points:


1. The linear evolution of $R$:

$$\partial_t R = LR + DN(\epsilon^\alpha\psi)R \qquad [12]$$

Controlling the solutions of this linear, but
nonconstant coefficient partial differential equation
is often the most difficult step in proving
that solutions of the modulation equation give
accurate approximations to the true solution.
One can frequently find norms that are preserved
by solutions of the leading-order equation
$\partial_t R = LR$. However, the term $DN(\epsilon^\alpha\psi) = O(\epsilon^\alpha)$
if $N$ is a quadratic nonlinearity. Over the very
long timescales (i.e., $O(\epsilon^{-3})$) of interest in these
approximation problems this $O(\epsilon^\alpha)$ term can
cause uncontrolled growth of $R$, leading to a
breakdown in the approximation. In order to
control [12] one must typically make use of some
special features of the problem under consideration.
For instance, it is sometimes possible to
make a coordinate transformation which eliminates
the terms of $O(\epsilon^\alpha)$ on the right-hand side
of [12], after which relatively standard methods
suffice to control the solutions of [12].

2. The nonlinear terms in [11]: these terms are of the
form $\epsilon^{-\beta}[N(\epsilon^\alpha\psi + \epsilon^\beta R) - N(\epsilon^\alpha\psi)] - DN(\epsilon^\alpha\psi)R$.
From Taylor’s theorem we see that, if the nonlinear
term is reasonably smooth, these terms are
of $O(\epsilon^\beta)$. If $\beta > 3$, these terms are small and can
be controlled over the timescales of interest by a
straightforward application of Gronwall’s inequality
or standard “energy estimates.”

3. Finally, one must consider the influence of the
inhomogeneous terms $\epsilon^{-\beta}\,\mathrm{Res}(\epsilon^\alpha\psi)$. Note that if
this term is small enough, say $O(\epsilon^\beta)$, with $\beta > 3$,
this term can also be controlled over the relevant
timescales by an application of the Gronwall
inequality. In order to make this term small, we
need to be sure that our approximation $\epsilon^\alpha\psi$ fails
to solve the true equation at any given time by a
small amount. In doing so, we can exploit the
fact that we can add to our leading-order
approximation terms of higher order without
affecting the fact that to leading order the true
solution is still approximated by the solution of the
modulation equation. This is the role of the term
$\epsilon^4\psi$ in the approximation [6] in the previous
section. The leading-order approximation is given
by the functions $U$ and $V$ which solve the KdV
equations, but by adding the additional term $\epsilon^4\psi$ to
the approximation we cancel the remaining terms of
$O(\epsilon^6)$ in [7], thereby reducing the size of the residual
in that example to $O(\epsilon^8)$. This method works in
other examples as well so that the inhomogeneous
term in [11] can usually be treated by this means.
However, in each case, we must prove that the
additional terms one adds to the approximation
remain bounded over the timescales of interest and
demonstrating this fact may not be as easy as it was
in the case of the FPU model where the additional
term satisfied a simple wave equation.
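
To see why exponents larger than 3 are what matter in points 2 and 3, consider a schematic version of the final estimate. The differential inequality below is an illustrative assumption, not a statement taken from the cited references: $E(t)$ stands for a norm of $R$ that the linear evolution [12] is assumed to control, and $C$ is a hypothetical constant independent of $\epsilon$.

```latex
% Schematic Gronwall argument under the assumed inequality:
\frac{d}{dt}E(t) \le C\epsilon^{3}\bigl(E(t) + 1\bigr)
\quad\Longrightarrow\quad
E(t) + 1 \le \bigl(E(0) + 1\bigr)\,e^{C\epsilon^{3}t},
% so on the KdV timescale the bound is uniform in \epsilon:
\sup_{0 \le t \le T_0/\epsilon^{3}} E(t) \le \bigl(E(0) + 1\bigr)\,e^{C T_0} - 1 .
```

If the prefactor were only $\epsilon^{\gamma}$ with $\gamma < 3$, the same argument would give the factor $e^{C\epsilon^{\gamma-3}T_0}$, which is not bounded as $\epsilon \to 0$.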


Using this approach one can show that the
approximation derived heuristically in the previous
section does accurately model the behavior of
solutions of the FPU model over the expected
timescales. More precisely, if $r(j,t)$ is the solution
of [3] and if $U$ and $V$ are the solutions of the
modulation equations [9] (with appropriately
chosen, small-amplitude, long-wavelength initial
conditions), one can prove (see Schneider and
Wayne (1999)) that for any $T_0 > 0$ there is an $\epsilon_0 > 0$
and $C > 0$ such that for all $0 < \epsilon < \epsilon_0$,


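$$\sup_{t\in[0,T_0/\epsilon^3]} \left\| r(\cdot,t) - \left(\epsilon^2 U(\epsilon(\cdot + ct), \epsilon^3 t) + \epsilon^2 V(\epsilon(\cdot - ct), \epsilon^3 t)\right) \right\|_{\ell^\infty} \le C\epsilon^{7/2}$$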


One can also use this method to show that the 
solution of the water-wave problem with general 
small-amplitude, long-wavelength, initial data can 
be approximated by the sum of the solutions of a 
pair of uncoupled KdV equations (Schneider and 
Wayne 2000), one representing the left-moving part 
of the solution and one representing the right- 
moving part of the solution, though in this context 
the technical difficulties associated with the exis- 
tence theory for the water-wave problem mean the 
details are quite a lot more complicated. 


Integrability of the KdV Equation 


One reason that normal forms for systems of
ordinary differential equations are so useful is that
they are frequently integrable, that is, they possess
sufficiently many integrals, or constants of motion,
that essentially explicit formulas for their solutions
can be obtained. Remarkably, the same is true for
the KdV equation and for many other modulation
equations. An argument for why this is so has been put
forth by Calogero and Eckhaus based on the universality
of these equations; see Calogero and Eckhaus
(1987) and references therein, as well as the article
Integrable Systems: Overview for more on this point.

Recall that Boussinesq and Korteweg and de Vries 
introduced the KdV equation to study solitary 
traveling waves on a fluid surface. For [1], one has 
an explicit family of such solutions given by: 


$$u(x,t) = 2A^2\,\mathrm{sech}^2\!\left(A[x + 4A^2 t]\right), \qquad A > 0$$


Note that from this formula one sees that waves of 
large amplitude are narrower and travel faster than 
waves of small amplitude. 
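
The solitary-wave formula can be sanity-checked numerically. The sketch below assumes that [1] has the form $\partial_t u = \partial_x^3 u + 6u\,\partial_x u$ (the form implied by [15]) and estimates the residual of that equation with centered finite differences; the function names and the step size `h` are illustrative choices, not part of the article.

```python
import math


def soliton(x, t, A):
    """Solitary wave u(x, t) = 2 A^2 sech^2(A (x + 4 A^2 t))."""
    return 2.0 * A**2 / math.cosh(A * (x + 4.0 * A**2 * t)) ** 2


def kdv_residual(x, t, A, h=1e-3):
    """Finite-difference estimate of u_t - u_xxx - 6 u u_x at (x, t).

    Assumes the KdV normalization u_t = u_xxx + 6 u u_x; all stencils
    are centered and second-order accurate in h.
    """
    u = lambda xx, tt: soliton(xx, tt, A)
    u_t = (u(x, t + h) - u(x, t - h)) / (2.0 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2.0 * h)
    u_xxx = (u(x + 2 * h, t) - 2.0 * u(x + h, t)
             + 2.0 * u(x - h, t) - u(x - 2 * h, t)) / (2.0 * h**3)
    return u_t - u_xxx - 6.0 * u(x, t) * u_x


if __name__ == "__main__":
    for A in (0.5, 1.0):
        worst = max(abs(kdv_residual(0.1 * k, 0.3, A)) for k in range(-50, 51))
        print(f"A = {A}: peak height {soliton(0.0, 0.0, A):.3f}, "
              f"speed {4 * A**2:.3f}, max |residual| = {worst:.2e}")
```

The peak height $2A^2$ and speed $4A^2$ both grow with $A$, while the width scales like $1/A$, which is the amplitude-width-speed relation noted above.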

In a famous numerical study, Zabusky and 
Kruskal made a remarkable discovery. They con- 
sidered solutions of the KdV equation in which a 
solitary wave of large amplitude overtook one of 
smaller amplitude. They found that after a highly 
nonlinear interaction the two solitary waves re- 
emerged with their original amplitudes and speeds 
and the only reminder of their interaction was a 
phase shift in their relative positions. Their discovery
began a search for a mathematical explanation
of this remarkable “nonlinear superposition principle”
which culminated with the solution of the KdV
equation via the method of inverse scattering and
the identification of the KdV equation as an infinite-dimensional,
completely integrable Hamiltonian
system.

We begin by describing how a transformation 
discovered by Miura (1968) and then generalized by 
Gardner et al. (1974) leads very easily to the 
conclusion that there are infinitely many conserved 
quantities for the KdV equation. The basic idea is 
that given a transformation which maps solutions of 
one equation to solutions of a second, the existence 
of simple or “obvious” conserved quantities for the 
first equation may lead, via the transformation, to 
more complicated conserved quantities for the 
second. 

Given u=u(x,t), define w(x,t) implicitly via the 
formula 


$$u(x,t) = w(x,t) + i\epsilon\,\partial_x w(x,t) + \epsilon^2\,(w(x,t))^2 \qquad [13]$$


Note that if $w$ is smooth enough and $\epsilon$ is small, we
can invert this relation recursively to obtain $w$ in
terms of $u$ via the formula


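$$w = u - i\epsilon\,\partial_x u - \epsilon^2(u^2 + \partial_x^2 u) + i\epsilon^3(\partial_x^3 u + 4u\,\partial_x u) + \epsilon^4\left(2u^3 + 5(\partial_x u)^2 + 6u\,\partial_x^2 u + \partial_x^4 u\right) + O(\epsilon^5) \qquad [14]$$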
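
The recursive inversion behind [14] can be made explicit. The sketch below writes $w = \sum_n \epsilon^n w_n$ (the coefficient names $w_n$ are our notation) and solves [13] order by order.

```latex
% Rewrite [13] as a fixed-point relation and expand w = \sum_{n \ge 0} \epsilon^n w_n:
w = u - i\epsilon\,\partial_x w - \epsilon^2 w^2
\quad\Longrightarrow\quad
w_0 = u, \qquad
w_n = -i\,\partial_x w_{n-1} - \sum_{k=0}^{n-2} w_k\,w_{n-2-k} \quad (n \ge 1).
% First few orders:
w_1 = -i\,\partial_x u,
\qquad
w_2 = -i\,\partial_x w_1 - w_0^2 = -(u^2 + \partial_x^2 u),
\\
w_3 = -i\,\partial_x w_2 - 2w_0 w_1 = i\,(\partial_x^3 u + 4u\,\partial_x u),
\\
w_4 = -i\,\partial_x w_3 - (2w_0 w_2 + w_1^2)
    = 2u^3 + 5(\partial_x u)^2 + 6u\,\partial_x^2 u + \partial_x^4 u .
```

These coefficients agree term by term with [14].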


Now compute 


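$$\partial_t u - \partial_x^3 u - 6u\,\partial_x u = \{\partial_t w - 6w\,\partial_x w - 6\epsilon^2 w^2\,\partial_x w - \partial_x^3 w\}$$
$$\qquad + 2\epsilon^2 w\,\{\partial_t w - 6w\,\partial_x w - 6\epsilon^2 w^2\,\partial_x w - \partial_x^3 w\}$$
$$\qquad + i\epsilon\,\partial_x\{\partial_t w - 6w\,\partial_x w - 6\epsilon^2 w^2\,\partial_x w - \partial_x^3 w\} \qquad [15]$$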


From this we see immediately that if w satisfies the 
modified KdV equation 


$$\partial_t w = 6\left(w\,\partial_x w + \epsilon^2 w^2\,\partial_x w\right) + \partial_x^3 w \qquad [16]$$


then $u$, defined by [13], satisfies the KdV equation.
However, one also sees immediately that the integral
of $w$ is a conserved quantity of [16] for all values of
$\epsilon$, that is, if we define $I_\epsilon(t) = \int w(x,t)\,dx$, then $I_\epsilon$ is a
constant for all values of $\epsilon$. (We will assume here
that $w$ is defined on the real line, and that $w$ and its
derivatives go to zero as $|x|$ tends to infinity. Similar
results hold for $x$ running over a finite interval with
periodic boundary conditions.) But this in turn
immediately implies that if we use [14] to expand
$I_\epsilon$ in powers of $\epsilon$ the coefficients in this expansion
must also be constants in time. Since these coefficients
will be expressed as integrals of $u$ and its
derivatives, they will give us (infinitely many)
conserved quantities for the KdV equation! Looking
at the first few of these we find:


1. $K_0 = \int u(x,t)\,dx$. The conservation of this quantity
follows immediately from the form of the
KdV equation.

2. $K_1 = \int \partial_x u(x,t)\,dx = 0$, if we assume that $u$ and
its derivatives tend to zero as $|x|$ tends to infinity.
Thus, we gain no new information from this
quantity and in fact, all the integrals coming
from the odd powers of $\epsilon$ turn out to be “trivial”
so we ignore them and focus just on the even
powers of $\epsilon$.

3. $K_2 = \int (u^2 + \partial_x^2 u)\,dx = \int u^2\,dx$. That this is a conserved
quantity is again easy to see directly from
the KdV equation, just by multiplying the
equation by $u$ and integrating with respect to $x$.

4. $K_4 = \int \left(2u^3 + 5(\partial_x u)^2 + 6u\,\partial_x^2 u + \partial_x^4 u\right) dx = \int \left(2u^3 - (\partial_x u)^2\right) dx$.
The origin of this integral is not so obvious
and we comment further on its meaning below.
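
As a numerical illustration, the first few integrals can be evaluated on the solitary wave $u = 2A^2\,\mathrm{sech}^2(A[x + 4A^2 t])$ at two different times. This is only a consistency check on a single traveling wave, not a test of conservation for general data. The closed forms $4A$, $16A^3/3$ and $64A^5/5$ used for comparison are computed here from the standard integrals of $\mathrm{sech}^{2k}$, and the helper names and quadrature parameters are illustrative assumptions.

```python
import math


def soliton(x, t, A):
    """u(x, t) = 2 A^2 sech^2(A (x + 4 A^2 t))."""
    return 2.0 * A**2 / math.cosh(A * (x + 4.0 * A**2 * t)) ** 2


def soliton_x(x, t, A):
    """Exact x-derivative of the solitary wave: u_x = -2 A u tanh(theta)."""
    theta = A * (x + 4.0 * A**2 * t)
    return -2.0 * A * soliton(x, t, A) * math.tanh(theta)


def trapezoid(f, L=40.0, n=8000):
    """Composite trapezoid rule on [-L, L] with n subintervals."""
    h = 2.0 * L / n
    total = 0.5 * (f(-L) + f(L))
    total += sum(f(-L + k * h) for k in range(1, n))
    return h * total


def conserved(t, A):
    """Return (K0, K2, K4) = (int u, int u^2, int (2 u^3 - u_x^2))."""
    k0 = trapezoid(lambda x: soliton(x, t, A))
    k2 = trapezoid(lambda x: soliton(x, t, A) ** 2)
    k4 = trapezoid(lambda x: 2.0 * soliton(x, t, A) ** 3
                   - soliton_x(x, t, A) ** 2)
    return k0, k2, k4


if __name__ == "__main__":
    A = 1.0
    print("t = 0.0:", conserved(0.0, A))
    print("t = 1.5:", conserved(1.5, A))
    print("closed forms:", (4 * A, 16 * A**3 / 3, 64 * A**5 / 5))
```

The values at the two times agree with each other and with the closed forms up to quadrature error, as expected for a profile that is simply translated.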


Clearly by continuing this procedure we can generate
an infinite number of conserved quantities for the KdV
equation. Indeed, if one chose another conserved
quantity for the modified KdV equation, [16], say
$\int w^2(x,t)\,dx$, one could generate another sequence of
conserved quantities via this same procedure. However,
Kruskal, Miura, Gardner, and Zabusky proved
that in fact, all of the conserved quantities that can be
written as polynomials in $u$ and its derivatives are
already obtained by the procedure above.

The constant of the motion $K_4$ found above is of
particular interest because one can write the KdV
equation as

$$\partial_t u = \frac{1}{2}\,\partial_x\,\frac{\delta K_4}{\delta u} \qquad [17]$$

where $\delta/\delta u$ denotes the variational derivative of $K_4$
with respect to $u(x)$. One can interpret this equation
as a Hamiltonian system where $\partial_x$ defines the
(nonstandard) symplectic structure and remarkably,
Zakharov and Faddeev (1971) proved that the KdV
equation is actually a completely integrable Hamiltonian
system. In particular, there exists a canonical
transformation such that with respect to the new
coordinates the Hamiltonian is a function only of
the action variables (and hence in particular, the
action variables remain constant in time). The
transform which brings the Hamiltonian into its
action-angle form is known as the inverse spectral
transform and its details would take us beyond the
limits of this article. However, very briefly, by
observing that the Miura transformation [13]
defines a Riccati differential equation, and using
the transformation that converts the Riccati
equation to a linear ordinary differential equation,
one can relate the solution of the KdV equation to
an eigenvalue problem for a linear Schrödinger
operator. The potential term in the Schrödinger
operator is given by the solution $u(x,t)$ of the KdV
equation. Remarkably, it turns out that the eigenvalues
of this Schrödinger operator are constants of
the motion if $u$ is a solution of the KdV equation
and are very closely related to the action variables
for the Hamiltonian system. For more details on the
inverse-scattering method and its use in solving the
KdV equation we refer the reader to the monographs
of Ablowitz and Segur (1981), Newell
(1985), or the recent book by Kappeler and Pöschel
(2003) which develops the theory for the KdV
equation on a finite interval with periodic boundary
conditions in a particularly elegant fashion.
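
The Hamiltonian form [17] can be verified directly from the reduced expression $K_4 = \int (2u^3 - (\partial_x u)^2)\,dx$. The computation below assumes decay of $u$ at infinity (or periodic boundary conditions) so that boundary terms vanish; the factor $\tfrac{1}{2}$ is the normalization that matches $\partial_t u = \partial_x^3 u + 6u\,\partial_x u$, the form of the KdV equation implied by [15].

```latex
% First variation of K_4, integrating the gradient term by parts:
\delta K_4 = \int \bigl(6u^2\,\delta u - 2\,\partial_x u\,\partial_x \delta u\bigr)\,dx
           = \int \bigl(6u^2 + 2\,\partial_x^2 u\bigr)\,\delta u\,dx
\quad\Longrightarrow\quad
\frac{\delta K_4}{\delta u} = 6u^2 + 2\,\partial_x^2 u .
% Applying (1/2)\partial_x recovers the right-hand side of the KdV equation:
\frac{1}{2}\,\partial_x\,\frac{\delta K_4}{\delta u}
  = 6u\,\partial_x u + \partial_x^3 u .
```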


Other Mathematical Aspects of the 
KdV Equation 


In addition to the inverse-scattering transform 
approach, more traditional approaches to the exis- 
tence and uniqueness of solutions have also been 
studied, starting with Temam’s proof of the well- 
posedness of solutions of the KdV equation with 
periodic boundary conditions in the Sobolev space 
H*. Noting that the Hamiltonian for the 
KdV equation described in the preceding section 
is closely related to the $H^1$ norm, this might seem a
natural space in which to study well-posedness, but
surprisingly Kenig, Ponce, and Vega, and Bourgain
showed that the equation is also well posed in
Sobolev spaces $H^s$, with $s < 1$, and more recent
work has extended the global well-posedness results
to Sobolev spaces of small negative order. Aside from
their intrinsic interest, these results have other
physical implications. If one wishes to study statistical
aspects of the behavior of ensembles of solutions
of these equations, statistical mechanics suggests that
the natural invariant measure for these equations is
given by the Gibbs measure. However, the Gibbs
measure is typically supported on functions less
regular than $H^1$, so that in order to define and
study this measure one needs to know that solutions
of the equation are well behaved in such spaces.
Another natural mathematical question arises 
from the fact that the KdV equation is only an 
approximation to the original physical equation. 
Viewed from another perspective, the original 
system can be seen as a perturbation of the KdV 
equation. It then becomes natural to ask whether the 
special features of the KdV equation are preserved 
under perturbation. Viewing the KdV equation as a
completely integrable Hamiltonian system, this is
closely analogous to the questions studied by the
Kolmogorov-Arnol’d-Moser (KAM) theory and
has led to a development of KAM-like results for a
number of different partial differential equations
like the KdV equation. The results are somewhat
technical in nature but roughly speaking they say
that if one considers the KdV equation with periodic
boundary conditions, temporally periodic or quasi-periodic
solutions will persist under small perturbations.
The situation is more complicated and less
well understood for the equation on the whole line
due to the presence of a continuum of scattering
states. For a very thorough review of the problem
with periodic boundary conditions see Kappeler and
Pöschel (2003).


Other Modulation Equations 


As we stressed in its derivation, the KdV equation is
an appropriate modulation equation for small-amplitude,
long-wavelength solutions in dispersive
nonlinear partial differential equations. However, as
mentioned in the section “Derivation of the KdV
equation”, the method of multiple scales does not give
a unique modulation equation even in this specific
physical regime. Already in his original studies
Boussinesq derived at least three different model
equations for small-amplitude, long-wavelength
water waves and a variety of such models continue
to be studied today. For instance, an easy variation in
the derivation of the KdV equation leads to the
regularized long wave, or Benjamin-Bona-Mahony,
equation in which the $\partial_x^3 u$ term in the KdV equation
is replaced by the term $\partial_x^2\partial_t u$. The validity of these
alternatives to the KdV equation can also be studied
with the aid of the methods described in the section
“Validity of the KdV approximation.”

There have been many discussions of which of these
modulation equations is the “correct” one. While they
may all yield equivalent approximations to the original
physical problem, the KdV equation has at least two
advantages: it is independent of the expansion parameter
$\epsilon$, and it is completely integrable. None of the
other equations that have been proposed as approximations
to these small-amplitude, long-wavelength
phenomena share both of these properties.

If we think in terms of the Fourier transforms of
the long-wavelength functions studied above, they
are solutions whose Fourier transform is concentrated
near zero. One can also ask about modulation
equations for solutions whose Fourier transform is
concentrated about nonzero wave numbers. Such
solutions represent a wave train with some fixed
underlying wavelength, $\lambda_0$, modulated on a much
longer length scale, $\lambda_0/\epsilon$.


If we make the ansatz that the solution has the 
form 


$$u(x,t) \approx \epsilon A(\epsilon(x - c_g t), \epsilon^2 t)\,e^{i(k_0 x - \omega_0 t)} + \text{complex conjugate} \qquad [18]$$


and insert this hypothesized form of the solution into
the original equation, then under mild assumptions
on the form and properties of the original equation,
similar to those under which we derived the KdV
equation in an earlier section, we find that to the
lowest nontrivial order in $\epsilon$, the amplitude $A$ evolves
according to the nonlinear Schrödinger equation


$$-i\,\partial_T A = c_1\,\partial_X^2 A + c_2\,A|A|^2 \qquad [19]$$


If $c_1$ and $c_2$ are both real, the nonlinear Schrödinger
equation can also be solved via the inverse-scattering
method and it represents another completely integrable
modulation equation.

In this article, we have discussed modulation
equations only for Hamiltonian, or conservative,
systems. However, similar equations have also played
an important role in the study of dissipative
equations like the Navier-Stokes equation. The
most common modulation equation in that setting is
the Ginzburg-Landau equation, which can be derived
as a modulation equation for Taylor-Couette rolls or
for the convection rolls in the Rayleigh-Bénard
problem. Like the nonlinear Schrödinger equation,
the Ginzburg-Landau equation describes how slow
variations of the amplitude of an underlying periodic
pattern evolve and as such it arises in a host of other
situations in addition to the fluid dynamics examples
mentioned above. For an extensive review of the
applications of the Ginzburg-Landau equation, as
well as its mathematical properties and some special
solutions, see the recent article of Mielke (2002).


See also: Bi-Hamiltonian Methods in Soliton Theory; 
Central Manifolds, Normal Forms; Hamiltonian Fluid 
Dynamics; Infinite-Dimensional Hamiltonian Systems; 
Integrable Systems and the Inverse Scattering Method; 
Integrable Systems: Overview; KAM Theory and 
Celestial Mechanics; Multiscale Approaches; Partial 
Differential Equations: Some Examples; WDVV 
Equations and Frobenius manifolds.

K-theory was invented in the category of algebraic
vector bundles over algebraic varieties by
A Grothendieck, who was directly motivated by
the Hirzebruch-Riemann-Roch theorem which he
subsequently greatly generalized. He also defined
K-homology in terms of coherent sheaves and
established the basic properties of K-theory
and K-homology including Poincaré duality for
nonsingular varieties. The origin for the choice of
the letter K in K-theory was apparently the German
word “Klasse.”

Using the formalism of Grothendieck, MF Atiyah
and F Hirzebruch (cf. Karoubi 1978) developed
topological K-theory in the category of topological
(complex) vector bundles over topological spaces. It
is this theory that will be the first principal focus of
this article. A topological (complex) vector bundle
over a compact topological space $X$ is a topological
space $E$ together with a continuous map $p: E \to X$
that is onto, such that $p^{-1}(x)$ is a vector space that is
isomorphic to $\mathbb{C}^n$ for all $x \in X$, and there is an open
cover $\{U\}$ of $X$ together with homeomorphisms
$h_U: p^{-1}(U) \cong U \times \mathbb{C}^n$ called “local trivializations”
with the property that $h_U \circ h_V^{-1}: (U \cap V) \times \mathbb{C}^n \to (U \cap V) \times \mathbb{C}^n$
is of the form $(\mathrm{Id}, g_{UV})$, where $g_{UV}: U \cap V \to GL(n,\mathbb{C})$
are continuous maps satisfying the
following cocycle condition on triple overlaps,
$g_{UV}\,g_{VW}\,g_{WU} = 1$. $X \times \mathbb{C}^n$ is called the trivial vector
bundle. Two vector bundles $p: E \to X$ and $q: F \to X$
over $X$ are said to be isomorphic if there is a
homeomorphism $\phi: E \to F$ with the property that
$p = q \circ \phi$, and which is a linear isomorphism when
restricted to each fiber.

The direct sum and tensor
product of vector spaces carry over to vector
bundles. There are canonical isomorphisms $E \oplus F \cong F \oplus E$
and $E \otimes F \cong F \otimes E$, making the set $\mathrm{Vect}(X)$ of
isomorphism classes of complex vector bundles over
$X$ into a commutative semiring. $\mathrm{Vect}(X)$ can be
made into the commutative ring $K^0(X)$ as follows.
$K^0(X)$ is generated by pairs $([E],[F])$, together with
the relation $([E],[F]) = ([E'],[F'])$ if
$E \oplus F' \oplus G \cong E' \oplus F \oplus G$ for some $[G] \in \mathrm{Vect}(X)$. Also $K^1(X)$ is
defined to be the group of homotopy classes of
continuous maps from $X$ to the infinite unitary
group. Around the same time, R Bott proved his
celebrated periodicity theorem, which says that the
odd homotopy groups of the (infinite) unitary group
are the integers, whereas the even homotopy groups
are all trivial. Incorporating Bott’s periodicity
theorem for the unitary group into K-theory, Atiyah
and Hirzebruch proved that topological K-theory
$K^*(X) = K^0(X) \oplus K^1(X)$ is a periodic generalized
cohomology theory, and in what follows, the
notation $K^n(X)$ means $n$ modulo 2.

If $M$ is not
compact, then we can compactify $M$ by adding to it
a point $+$ “at infinity,” and denote it by $M^+$. Let
$\iota: + \to M^+$ be the inclusion, inducing the pullback
map $\iota^!: K^*(M^+) \to K^*(+) = \mathbb{Z}$. Then $K^*(M)$ is defined
to be $\ker(\iota^!)$, also called the reduced K-theory. If $X_1$
is a closed subset of $X$, the K-theory of the pair
$(X, X_1)$ is defined as the reduced K-theory of the
quotient space $X/X_1$. A fundamental computation
of Bott is the computation of the K-theory of
Euclidean space, $K^n(\mathbb{R}^n) \cong \mathbb{Z}$ with canonical generator
called the Bott class $b \in K^n(\mathbb{R}^n)$, and
$K^{n+1}(\mathbb{R}^n) \cong \{0\}$.
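
The passage from the semiring $\mathrm{Vect}(X)$ to the ring $K^0(X)$ by formal differences can be illustrated in the simplest case, where $X$ is a point. There a vector bundle is just a vector space, classified by its dimension, so $\mathrm{Vect}(\mathrm{pt})$ is the natural numbers and the construction yields the integers. The sketch below models only this special case; the class name `FormalDifference` and its methods are hypothetical illustrations, not standard library objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormalDifference:
    """Pair (e, f) standing for [E] - [F], where e, f are ranks over a point."""
    e: int
    f: int

    def __post_init__(self):
        if self.e < 0 or self.f < 0:
            raise ValueError("ranks must be non-negative")

    def equivalent(self, other):
        # ([E],[F]) = ([E'],[F']) iff E + F' + G is isomorphic to E' + F + G
        # for some G. For ranks (natural numbers) cancellation holds, so
        # taking G = 0 already decides the relation.
        return self.e + other.f == other.e + self.f

    def __add__(self, other):
        # Direct sum acts componentwise on formal differences.
        return FormalDifference(self.e + other.e, self.f + other.f)

    def __mul__(self, other):
        # (E - F)(E' - F') = (EE' + FF') - (EF' + FE'), using tensor product.
        return FormalDifference(self.e * other.e + self.f * other.f,
                                self.e * other.f + self.f * other.e)

    def negate(self):
        # The additive inverse swaps the two entries.
        return FormalDifference(self.f, self.e)

    def to_int(self):
        # Identification of K^0(point) with the integers (virtual rank).
        return self.e - self.f


if __name__ == "__main__":
    a = FormalDifference(3, 1)
    b = FormalDifference(5, 3)
    print(a.equivalent(b))                                # same class: both have virtual rank 2
    print((a + a.negate()).equivalent(FormalDifference(0, 0)))
    print((a * FormalDifference(1, 4)).to_int())          # 2 * (-3)
```

Over a general base the cancellation used in `equivalent` can fail for bundles, which is why the relation in the text allows an auxiliary summand $G$.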

Some of the basic properties of K-theory are listed 
as follows. Details can be found in Karoubi (1978). 


1. Pullback If $f: N \to M$ is a continuous map, then
given a vector bundle $\pi: E \to M$ over $M$, the
pullback vector bundle is defined as $f^*(E) = \{(x, v) \in N \times E : f(x) = \pi(v)\}$
over $N$. This induces a pullback
homomorphism, $f^!: K^*(M) \to K^*(N)$.

2. Push-forward Let $f: N \to M$ be a smooth proper
map between compact manifolds which is
K-oriented, that is, $TN \oplus f^*TM$ is a spin$^C$ vector
bundle over $N$. Then there is a pushforward
homomorphism, also called a Gysin map,
$f_!: K^*(N) \to K^{*+d}(M)$, where $d = \dim M - \dim N$,
whose construction will be explained in the next
section.

3. Homotopy If $f: N \to M$ and $g: N \to M$ are
homotopic maps, then the pullback maps $f^! = g^!$
are equal. If in addition, $f$ and $g$ are K-oriented,
proper maps which are homotopic via proper
maps, then the Gysin maps $f_! = g_!$ are equal.

4. Excision Let $M_1$ be a closed subset of $M$ and $U$
be an open subset of $M$ such that $U$ is contained
in the interior of $M_1$. Then the inclusion of pairs
$(M \setminus U, M_1 \setminus U) \to (M, M_1)$ induces an isomorphism
in K-theory, $K^*(M, M_1) \cong K^*(M \setminus U, M_1 \setminus U)$.

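5. Exactness Let $M_1$ be a closed subset of $M$. Then
there is a six-term exact sequence in K-theory,

$$\begin{array}{ccccc}
K^0(M, M_1) & \longrightarrow & K^0(M) & \longrightarrow & K^0(M_1) \\
\uparrow & & & & \downarrow \\
K^1(M_1) & \longleftarrow & K^1(M) & \longleftarrow & K^1(M, M_1)
\end{array}$$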

6. Cup product There is a canonical map given by
external tensor product, $K^i(M) \otimes K^j(N) \to K^{i+j}(M \times N)$.
When $N = M$, one can compose this
with the homomorphism induced by the diagonal
map $M \to M \times M$ given by $x \mapsto (x, x)$, to get a cup
product, $K^p(M) \otimes K^q(M) \to K^{p+q}(M)$.

7. Bott periodicity This is arguably the most important
property of K-theory. It says that the zero-section
embedding $\iota: M \to M \times \mathbb{R}^n$ induces a
Gysin isomorphism, $\iota_!: K^*(M) \cong K^{*+n}(M \times \mathbb{R}^n)$,
which is given as follows. Let $\pi_M: M \times \mathbb{R}^n \to M$
and $\pi_{\mathbb{R}^n}: M \times \mathbb{R}^n \to \mathbb{R}^n$ denote the projections
onto the factors, and $b = \iota_! 1 \in K^n(\mathbb{R}^n)$ the Bott
element, where $\iota: \{0\} \hookrightarrow \mathbb{R}^n$ is the inclusion of the
origin. Then the Bott periodicity isomorphism is
given by $\iota_!(x) = \pi_M^!(x) \cup \pi_{\mathbb{R}^n}^!(b) \in K^{*+n}(M \times \mathbb{R}^n)$
for all $x \in K^*(M)$.


Using the fact that any vector bundle over a
contractible space is trivial, together with Bott’s
periodicity theorem, one deduces the calculation
of the K-theory of spheres. The calculation for the
odd-dimensional spheres gives $K^0(S^{2n-1}) \cong \mathbb{Z} \cong K^1(S^{2n-1})$,
and for the even-dimensional spheres
$K^0(S^{2n}) \cong \mathbb{Z}^2$ and $K^1(S^{2n}) = \{0\}$, for all $n \ge 1$.

There is a natural homomorphism of rings called
the Chern character, $\mathrm{Ch}: K^*(X) \to H^*(X, \mathbb{Q})$, which
is characterized by the following axioms:


1. Naturality If $f: N \to M$ is a smooth map, and if
$E$ is a vector bundle over $M$, then $\mathrm{Ch}(f^!(E)) = f^*(\mathrm{Ch}(E))$.

2. Additivity $\mathrm{Ch}(E \oplus F) = \mathrm{Ch}(E) + \mathrm{Ch}(F)$.

3. Normalization If $L$ is the canonical line bundle
over $\mathbb{CP}^n$ which restricts to the Hopf line bundle
over $\mathbb{CP}^1$, then $\mathrm{Ch}(L) = \exp(x)$, where $x$ is the
generator of $H^2(\mathbb{CP}^n, \mathbb{Z}) \cong \mathbb{Z}$.


Atiyah and Hirzebruch, cf. Karoubi (1978), also
proved that the Chern character induces an isomorphism
of the rings $K^*(X) \otimes \mathbb{Q}$ and $H^*(X, \mathbb{Q})$. The
Chern-Weil representative of the Chern character is
$\mathrm{tr}(\exp((i/2\pi)\Omega_E))$, where $\Omega_E$ is the curvature of a
Hermitian connection on $E$.

There are many variants of K-theory, such as 
KO-theory, where the unitary group is replaced 
by the orthogonal group, which is periodic of 
order eight, and G-equivariant K-theory, where G 
is a compact Lie group. K-theory and its variants 
have many interesting applications such as deter- 
mining the maximum number of linearly inde- 
pendent vector fields on spheres, which is due to 
Adams, cf. Karoubi (1978). We will content 
ourselves with the description of two important 
applications. 





Grothendieck-Riemann-Roch Theorem 
for Smooth Manifolds 


Recall that an oriented real vector bundle $E$ over $M$ is
said to be a spin$^C$ vector bundle if the bundle of
oriented frames on $E$, $SO(E)$, has a circle bundle
$\mathrm{Spin}^C(E)$ such that the restriction to each fiber yields
the central extension $0 \to U(1) \to \mathrm{Spin}^C(n) \to SO(n) \to 0$
that defines the group $\mathrm{Spin}^C(n)$, where $n$
is the rank of $E$. It turns out that the obstruction to the
existence of a spin$^C$ structure on $E$ is the third integral
Stiefel-Whitney class of $E$, $W_3(E) \in H^3(M, \mathbb{Z})$.

A generalization of Bott periodicity is the Thom
isomorphism in K-theory. It says that if $\pi: E \to M$ is
a rank-$n$ spin$^C$ vector bundle over $M$, then the zero-section
embedding $\iota_E: M \hookrightarrow E$ induces a Gysin isomorphism,
$(\iota_E)_!: K^*(M) \cong K^{*+n}(E)$, which is given as
follows. There is a canonical element $(\iota_E)_! 1 \in K^n(E)$
called the Thom class in K-theory, which is characterized
by the property that $(\iota_E)_! 1$ restricts to give the Bott
class on each fiber. Then the Thom isomorphism in K-theory
is given by $(\iota_E)_!(x) = \pi^!(x) \cup (\iota_E)_! 1 \in K^{*+n}(E)$ for
all $x \in K^*(M)$. For canonical representatives of the
Thom class, cf. Mathai-Quillen Formalism, or Mathai
and Quillen (1986).

Recall the definition of the Gysin map for smooth
embeddings. Let $X$ be a smooth, compact manifold,
and $Y$ a smooth manifold. Let $h: X \to Y$ be a smooth
embedding that is K-oriented. Since $TX \oplus TX$ has a
canonical almost-complex structure, it follows that
the normal bundle $N_Y X = h^*(TY)/TX$ is a spin$^c$
vector bundle. If $\iota: X \hookrightarrow N_Y X$ is the zero-section
embedding, then we have the Thom isomorphism
$\iota_!: K^\bullet(X) \cong K^{\bullet+n}(N_Y X)$, where $n = \dim(Y) - \dim(X)$
is the codimension of the embedding. Upon choosing a
Riemannian metric on $Y$, there is a diffeomorphism $\Phi$
from a tubular neighborhood $U$ of $h(X)$ onto a
neighborhood of the zero section in the normal bundle
$N_Y X$. That is, $\Phi^!: K^\bullet(N_Y X) \cong K^\bullet(U)$. For any open
subset $j: U \hookrightarrow Y$, the extension by zero defines a
homomorphism $j_!: K^\bullet(U) \to K^\bullet(Y)$. Then the Gysin
map of the embedding $h$ is defined as
$h_! = j_! \circ \Phi^! \circ \iota_!: K^\bullet(X) \to K^{\bullet+n}(Y)$,
which turns out to be independent of the choices made.

Next recall the definition of the Gysin map for
smooth submersions. Let $\pi: Y \to Z$ be a smooth
submersion of smooth manifolds, which is K-oriented
and a proper map. Since every smooth
compact manifold can be smoothly embedded in
$\mathbb{R}^{2q}$ for $q$ sufficiently large, a parametrized version
yields an embedding $\kappa: Y \hookrightarrow Z \times \mathbb{R}^{2q}$ that is spin$^c$.
Therefore the Gysin map is a homomorphism
$\kappa_!: K^\bullet(Y) \to K^{\bullet+a}(Z \times \mathbb{R}^{2q})$, where
$a = \dim(Z) + 2q - \dim(Y)$. Let $\iota: Z \hookrightarrow Z \times \mathbb{R}^{2q}$ denote the
zero-section embedding. Then we have the Thom isomorphism
$\iota_!: K^\bullet(Z) \cong K^{\bullet+2q}(Z \times \mathbb{R}^{2q})$. Then the Gysin map
of the submersion $\pi$ is defined as $\pi_! = (\iota_!)^{-1} \circ \kappa_!:
K^\bullet(Y) \to K^{\bullet-b}(Z)$, where $b = \dim(Y) - \dim(Z)$, and
turns out to be independent of the choices made.

Let $f: N \to M$ be a smooth proper map that is
K-oriented. Then $f$ can be canonically factored, first
into the smooth embedding $\mathrm{gr}(f): N \hookrightarrow N \times M$,
which is the graph of the function, that is,
$\mathrm{gr}(f)(x) = (x, f(x))$, and which is K-oriented. The
Gysin map is $\mathrm{gr}(f)_!: K^\bullet(N) \to K^{\bullet+\dim(M)}(N \times M)$.
Second, the projection $p_M: N \times M \to M$ is a
K-oriented proper submersion, when restricted to
the image of $\mathrm{gr}(f)$. The Gysin map is
$p_{M!}: K^\bullet(N \times M) \to K^{\bullet+b}(M)$, where $b = \dim(N)$. The Gysin map
of $f$ is defined as $f_! = p_{M!} \circ \mathrm{gr}(f)_!: K^\bullet(N) \to K^{\bullet+d}(M)$,
where $d = \dim(M) + \dim(N)$.

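Given such a smooth proper K-oriented map $f: N \to M$,
there are Gysin maps in
cohomology, $f_*: H^\bullet(N, \mathbb{Q}) \to H^{\bullet+d}(M, \mathbb{Q})$ (where
we consider the $\mathbb{Z}_2$-grading given by even and odd
degree), and in K-theory, $f_!: K^\bullet(N) \to K^{\bullet+d}(M)$,
which increases the degree by $d = \dim(M) + \dim(N)$.
The Grothendieck-Riemann-Roch theorem
due to Atiyah and Hirzebruch, cf. Karoubi (1978), in
the smooth category can be phrased as the
commutativity of the diagram,

\[
\begin{array}{ccc}
K^\bullet(N) & \xrightarrow{\;f_!\;} & K^{\bullet+d}(M) \\
{\scriptstyle \mathrm{Todd}(TN)\,\cup\,\mathrm{Ch}} \downarrow & & \downarrow {\scriptstyle \mathrm{Todd}(TM)\,\cup\,\mathrm{Ch}} \\
H^\bullet(N, \mathbb{Q}) & \xrightarrow{\;f_*\;} & H^{\bullet+d}(M, \mathbb{Q})
\end{array}
\]

That is,

\[
\mathrm{Ch}(f_!(\xi)) \cup \mathrm{Todd}(TM) = f_*\big(\mathrm{Ch}(\xi) \cup \mathrm{Todd}(TN)\big)
\]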


for all $\xi \in K^\bullet(N)$, where $\mathrm{Todd}(E)$ is the Todd genus
characteristic class of a Hermitian vector bundle $E$
over $M$. The Chern-Weil representative of the Todd
genus is

\[
\det\left( \frac{(i/2\pi)\,\Omega_E}{1 - \exp\big(-(i/2\pi)\,\Omega_E\big)} \right)
\]

where $\Omega_E$ is the curvature of a Hermitian connection
on $E$. There are many useful variants of this
beautiful formula.
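As a sanity check of the formula above, consider the simplest case in which $M$ is a point and $N = \mathbb{CP}^n$, so that $f_*$ is integration over $\mathbb{CP}^n$. The sketch below assumes the standard facts that $\mathrm{Todd}(T\mathbb{CP}^n) = (x/(1 - e^{-x}))^{n+1}$ and that the pushforward of the $k$-th power of the hyperplane bundle is its holomorphic Euler characteristic $\binom{n+k}{n}$ (the Hirzebruch-Riemann-Roch special case). The function names are illustrative and not from the article.

```python
from fractions import Fraction
from math import comb, factorial


def series_mul(a, b, order):
    """Multiply two truncated power series given as coefficient lists."""
    out = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= order:
                out[i + j] += ai * bj
    return out


def series_inv(a, order):
    """Invert a power series with nonzero constant term, up to x^order."""
    inv = [Fraction(0)] * (order + 1)
    inv[0] = 1 / a[0]
    for n in range(1, order + 1):
        s = sum(a[k] * inv[n - k] for k in range(1, n + 1))
        inv[n] = -s / a[0]
    return inv


def todd_series(order):
    """Coefficients of x / (1 - exp(-x)) up to x^order."""
    # (1 - exp(-x)) / x = sum_{m >= 0} (-1)^m x^m / (m + 1)!
    denom = [Fraction((-1) ** m, factorial(m + 1)) for m in range(order + 1)]
    return series_inv(denom, order)


def exp_series(k, order):
    """Coefficients of exp(k x), the Chern character of the k-th power bundle."""
    return [Fraction(k ** m, factorial(m)) for m in range(order + 1)]


def euler_char_cpn(n, k):
    """Coefficient of x^n in exp(kx) * (x / (1 - exp(-x)))^(n+1).

    This is the integral over CP^n of Ch(L^k) ∪ Todd(T CP^n).
    """
    td = todd_series(n)
    total = exp_series(k, n)
    for _ in range(n + 1):
        total = series_mul(total, td, n)
    return total[n]


if __name__ == "__main__":
    for n in range(1, 4):
        for k in range(0, 3):
            print(n, k, euler_char_cpn(n, k), comb(n + k, n))
```

Exact rational arithmetic is used so that the comparison with the binomial coefficient is an equality rather than a floating-point approximation.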


The Atiyah-Singer Index Theorem 


The 2004 Abel Prize citation mentions the Atiyah- 
Singer (1971) index theorem as being one of the 
greatest achievements of twentieth-century mathe- 
matics. It has stimulated considerable interaction 
between mathematicians and mathematical physi- 
cists. We content ourselves here with a rudimentary 
description of the results. 

Let F be the space of all Fredholm operators on 
an infinite-dimensional complex Hilbert space H. 
Recall that an operator A is said to be Fredholm if 
both the kernel and cokernel of A are finite 
dimensional. The index of such a Fredholm operator 
is index(A) = dim(ker(A)) — dim(coker(A)) € Z. The 
index map is continuous, so it induces a map on the 
connected components of F, which turns out to be 
an isomorphism. 
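A finite-dimensional analog may help fix the definition. For a linear map between finite-dimensional spaces, rank-nullity forces the index to equal the difference of the dimensions, whatever the map is, which is the simplest instance of the stability of the index under perturbation. The sketch below is only this toy case, not the infinite-dimensional theory; the helper names are illustrative.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def index(matrix):
    """dim ker(A) - dim coker(A) for A: Q^n_cols -> Q^n_rows."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    rk = rank(matrix)
    dim_ker = n_cols - rk
    dim_coker = n_rows - rk
    return dim_ker - dim_coker


if __name__ == "__main__":
    # A surjection Q^3 -> Q^2 and the zero map have the same index.
    print(index([[1, 0, 0], [0, 1, 0]]), index([[0, 0, 0], [0, 0, 0]]))
```

The kernel and cokernel dimensions change with the matrix, but their difference does not; in infinite dimensions the index is no longer forced by the dimensions, which is what makes it an interesting invariant.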


K-theory is naturally related to the space of all
Fredholm operators $\mathcal{F}$ endowed with the norm
topology. Any continuous map $A: X \to \mathcal{F}$ from a
compact space to $\mathcal{F}$ has an index in $K^0(X)$, which
is given by $\mathrm{index}(A) = [\ker(A)] - [\mathrm{coker}(A)]$ in the
special case when $\dim \ker(A(x))$ is constant in
$x \in X$. In general, one uses the fact that the index is
stable under compact perturbation, and shows that
one can always achieve the special case after a
compact perturbation. It is again the case that the
index map is continuous, and so induces a map,
$\mathrm{index}: [X, \mathcal{F}] \to K^0(X)$, which turns out to be an
isomorphism, thanks to a fundamental theorem
of Kuiper which proves that the group of all
invertible operators on an infinite-dimensional
complex Hilbert space is contractible in the norm
topology.

Now let $\pi: N \to Z$ be a fiber bundle with typical
fiber a smooth compact manifold $M$, where $N$ and $Z$
are also smooth compact manifolds. Consider a
smooth family of elliptic operators $D = \{D_z\}_{z \in Z}$
along the fibers of $\pi$, parametrized by $Z$, where
$D_z: C^\infty(\pi^{-1}(z), E|_{\pi^{-1}(z)}) \to C^\infty(\pi^{-1}(z), F|_{\pi^{-1}(z)})$ and
$E$, $F$ are vector bundles over $N$. Such a family of
elliptic operators has a symbol

\[
\sigma(D): p^*(E) \to p^*(F)
\]

where $p: T^*(N/Z) \to N$ is the projection and
$T^*(N/Z)$ is the vertical cotangent bundle. Ellipticity
for the family is the condition that $\sigma(D)$ is an
isomorphism outside the zero section, so that the
triple $(p^*(E), p^*(F), \sigma(D))$ determines an element in
$K^0(T^*(N/Z))$ denoted by $\sigma(D)$.

The analytic index of the family $D$ is $\mathrm{index}(D) \in K^0(Z)$,
and it turns out that it only depends on the
class of the symbol $\sigma(D) \in K^0(T^*(N/Z))$, so the
analytic index can be viewed as a homomorphism,

\[
\mathrm{index}: K^0(T^*(N/Z)) \to K^0(Z)
\]

Consider an embedding $\iota: N \to Z \times \mathbb{R}^n$ that is
compatible with the projection $\pi: N \to Z$. The
fiberwise differential is an embedding
$d\iota: T(N/Z) \to Z \times \mathbb{R}^{2n}$, which induces a Gysin map

\[
d\iota_!: K^0(T(N/Z)) \to K^0(Z \times \mathbb{R}^{2n})
\]

upon identifying $T^*(N/Z)$ with $T(N/Z)$. Let
$j: Z \to Z \times \mathbb{R}^{2n}$ be the inclusion $j(z) = (z, 0)$. It
induces the Bott isomorphism $j_!: K^0(Z) \cong K^0(Z \times \mathbb{R}^{2n})$.
The topological index of the family $D$ is, by
definition,

\[
\mathrm{index}_t = j_!^{-1} \circ d\iota_!: K^0(T^*(N/Z)) \to K^0(Z)
\]

The Atiyah-Singer (1971) index theorem
for families of elliptic operators $D$ asserts the
equality of the analytic index and the topological
index,

\[
\mathrm{index}(D) = \mathrm{index}_t(\sigma(D)) \in K^0(Z)
\]


Combined with the Grothendieck-Riemann-Roch
theorem, one has the following exquisite formula in
$H^\bullet(Z, \mathbb{Q})$:

\[
\mathrm{Ch}(\mathrm{index}(D)) = \pi_* p_* \{\mathrm{Todd}(T_{\mathbb{C}}(N/Z)) \cup \mathrm{Ch}(\sigma(D))\}
\]

where $p: T(N/Z) \to N$ is the projection.

The map sending a complex vector bundle $E$ over
$Z$ to its determinant line bundle $\det(E) = \Lambda^{\max} E$
induces a homomorphism, $\det: K^0(Z) \to \pi_0(\mathrm{Pic}(Z))$,
where $\pi_0(\mathrm{Pic}(Z))$ denotes the isomorphism classes of
complex line bundles over $Z$. Then

\[
c_1(\det(\mathrm{index}(D))) = \{\pi_* p_* (\mathrm{Todd}(T_{\mathbb{C}}(N/Z)) \cup \mathrm{Ch}(\sigma(D)))\}^{[2]}
\]

where $\{\cdot\}^{[2]}$ denotes the degree-2 component, and the
left-hand side denotes the first Chern class of the
determinant line bundle of the index class. This
formula is often used in the study of anomalies in
physics.


K-Theory of C*-Algebras 


The Gelfand—Naimark theorem asserts that unital 
abelian C*-algebras A can be identified with the 
space of continuous functions C(X), where X is the 
compact Hausdorff space known as the spectrum of 
A, consisting of characters of A. Conversely, given a 
compact Hausdorff space X, the characters of C(X) 
consist of the evaluation maps at points of X. 

Let $E$ be a vector bundle over $X$. Then there is a
vector bundle $F$ over $X$ such that $E \oplus F \cong X \times \mathbb{C}^n$.
Setting $A = C(X)$, $M = C(X, E)$, $N = C(X, F)$, we
see that $M \oplus N \cong A^n$, showing that each vector
bundle $E$ over $X$ determines a canonical finite
projective module $M$ over $A$. The converse is also
true and is a result of Serre and Swan, cf. Blackadar 
(1986), which asserts that every finite projective 
module M over A is the space of all continuous 
sections of a vector bundle over X. So we have an 
equivalence of the category of vector bundles over X 
and the category of finite projective modules over A. 
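
The complement construction above can be made concrete with a projection-valued function. The sketch below uses the rank-one projection onto the line spanned by $(1, z)$ in $\mathbb{C}^2$, for $z$ in the complex plane (one chart of the Riemann sphere); this choice of example and all function names are assumptions for illustration, not taken from the article. It checks numerically that $p(z)$ and $1 - p(z)$ are complementary projections, so the two line bundles they define sum to the trivial rank-2 bundle.

```python
def bott_projection(z):
    """Orthogonal projection onto the line spanned by (1, z) in C^2."""
    z = complex(z)
    norm_sq = 1.0 + abs(z) ** 2
    return [[1.0 / norm_sq, z.conjugate() / norm_sq],
            [z / norm_sq, abs(z) ** 2 / norm_sq]]


def matmul(a, b):
    """Product of two 2x2 complex matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def complement(p):
    """The complementary projection 1 - p."""
    return [[(1.0 if i == j else 0.0) - p[i][j] for j in range(2)]
            for i in range(2)]


def close(a, b, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


def trace(p):
    return p[0][0] + p[1][1]


if __name__ == "__main__":
    p = bott_projection(1 + 2j)
    q = complement(p)
    print(close(matmul(p, p), p), close(matmul(q, q), q), abs(trace(p) - 1) < 1e-12)
```

In module language, the image of $p$ acting on $A^2$ is the module $M$ of sections and the image of $1 - p$ is the complementary module $N$ with $M \oplus N \cong A^2$.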

This motivates the following generalization of
topological K-theory for a general unital C*-algebra
$A$. Let $\mathrm{Proj}(A)$ denote the isomorphism classes of
finite projective modules over $A$. It is a commutative
semigroup under the operation of direct sum, which
can be made into the commutative group $K_0(A)$ as
follows: $K_0(A)$ is generated by pairs $([M], [N])$,
together with the relation $([M], [N]) = ([M'], [N'])$
if $M \oplus N' \oplus G \cong M' \oplus N \oplus G$ for some $[G] \in \mathrm{Proj}(A)$.
Also $K_1(A) = \pi_0(GL(\infty, A))$, where $GL(\infty, A)$
denotes the direct limit of $GL(n, A)$, where
$GL(n, A)$ embeds in $GL(n+1, A)$ as $1 \oplus GL(n, A)$.
Then, defining $K_j(A) = \pi_{j-1}(GL(\infty, A))$ for $j \geq 1$,
together with generalized Bott periodicity which
asserts that there is a canonical isomorphism
$\pi_{j-1}(GL(\infty, A)) \cong \pi_{j+1}(GL(\infty, A))$, we see that
$K_\bullet(A) = K_0(A) \oplus K_1(A)$ is a generalized periodic
cohomology theory. If $A$ is a C*-algebra without
unit, then consider $A^+ = A \oplus \mathbb{C}$, with product given
by $(a, \lambda)(b, \mu) = (ab + a\mu + b\lambda, \lambda\mu)$, with unit $(0, 1)$.
The projection $p: A^+ \to \mathbb{C}$ defined as $p(a, \lambda) = \lambda$
induces a map $p_*: K_\bullet(A^+) \to K_\bullet(\mathbb{C})$. In the nonunital
case, $K_\bullet(A)$ is defined as $\ker(p_*)$. Observe that
$K_1(A) = K_1(A^+)$, but this is often not the case with
$K_0$. It is easy to see that when $A$ has a unit, then the
two definitions of $K_0$ agree. An important caveat in
the case of noncommutative C*-algebras is that the
K-theory is often not a ring as there is no analog of
the tensor product operation.
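
The passage from the semigroup $\mathrm{Proj}(A)$ to the group $K_0(A)$ is the usual group-completion construction. A minimal sketch for the simplest case $A = \mathbb{C}$ follows: finite projective modules are the spaces $\mathbb{C}^k$, classified by $k$, so the semigroup is the non-negative integers under addition and the completion is $\mathbb{Z}$. The class name and methods are hypothetical, chosen only for this illustration.

```python
class FormalDifference:
    """Class of a pair (m, n), standing for [C^m] - [C^n] in K_0(C)."""

    def __init__(self, m, n):
        if m < 0 or n < 0:
            raise ValueError("module ranks must be non-negative")
        self.m = m
        self.n = n

    def __eq__(self, other):
        # (m, n) ~ (m', n') iff m + n' + g == m' + n + g for some g >= 0.
        # Cancellation holds for non-negative integers, so g = 0 suffices.
        return self.m + other.n == other.m + self.n

    def __hash__(self):
        return hash(self.m - self.n)

    def __add__(self, other):
        # Direct sum of modules adds ranks componentwise.
        return FormalDifference(self.m + other.m, self.n + other.n)

    def __neg__(self):
        # The inverse swaps the two entries of the pair.
        return FormalDifference(self.n, self.m)

    def as_integer(self):
        """Image under the isomorphism K_0(C) -> Z."""
        return self.m - self.n


if __name__ == "__main__":
    x = FormalDifference(3, 1)
    print(x == FormalDifference(5, 3), (x + (-x)).as_integer())
```

For a general $A$ the cancellation law can fail in $\mathrm{Proj}(A)$, which is why the stabilizing summand $G$ appears in the relation; the shortcut taken in `__eq__` is specific to this toy case.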

Some of the basic properties of K-theory are listed 
as follows. Details can be found in Blackadar 
(1986). 








1. Cup product A continuous bilinear map of
C*-algebras, $A \times B \to C$, induces a cup product,
$K_i(A) \otimes K_j(B) \to K_{i+j}(C)$.

In particular, the continuous product $A \times A \to A$
induces a cup product homomorphism,
$K_i(A) \otimes K_j(A) \to K_{i+j}(A)$.

2. Induced homomorphism If $f: A \to B$ is a
homomorphism of C*-algebras, then there is an
induced homomorphism, $f_*: K_\bullet(A) \to K_\bullet(B)$.

3. Homotopy If $f: A \to B$ and $g: A \to B$ are
homomorphisms of C*-algebras that are homotopic,
the induced homomorphisms on K-theory,
$f_* = g_*$, are equal.

4. Excision If $I$ is a closed two-sided ideal in $A$,
then there is a six-term exact sequence in
K-theory,

\[
\begin{array}{ccccc}
K_0(I) & \longrightarrow & K_0(A) & \longrightarrow & K_0(A/I) \\
\uparrow & & & & \downarrow \\
K_1(A/I) & \longleftarrow & K_1(A) & \longleftarrow & K_1(I)
\end{array}
\]

5. Morita invariance The inclusion homomorphism
of $A$ into the top left of the diagonal in
$M_n(A)$ induces an isomorphism in K-theory,
$K_\bullet(A) \cong K_\bullet(M_n(A))$.

6. Continuity Let $A = \lim_{n \to \infty} A_n$ be a C*-direct
limit. Then, $K_\bullet(A) = \lim_{n \to \infty} K_\bullet(A_n)$.

7. Stability Let $\mathcal{K}$ be the C*-algebra of all compact
operators on an infinite-dimensional complex
Hilbert space. Then since $\mathcal{K} = \lim_{n \to \infty} M_n(\mathbb{C})$ is
a C*-direct limit, we see that $K_\bullet(A \otimes \mathcal{K}) \cong K_\bullet(A)$.

8. Bott periodicity The continuous product
$A \times \mathbb{C} \to A$ induces the cup product
$K_i(A) \otimes K_j(\mathbb{C}) \to K_{i+j}(A)$. The computation by Bott asserts
that there is a canonical element $b \in K_2(\mathbb{C})$ that
gives an isomorphism $K_2(\mathbb{C}) \cong \mathbb{Z}$, and
Bott periodicity asserts that the cup product
with $b$ gives rise to an isomorphism
$K_j(A) \cong K_{j+2}(A)$.


We mention in passing that Connes has defined a
Chern character homomorphism, $\mathrm{Ch}: K_\bullet(A) \to HE_\bullet(A)$,
mapping into the entire cyclic homology
of $A$, having similar properties as the ordinary
Chern character. Due to space constraints, it will
not be defined here.


A C*-Algebra Generalization of the 
Atiyah-Singer Index Theorem and 
the Baum—-Connes Conjecture 


We content ourselves here with a rudimentary 
account of the C*-algebra generalization of the 
Atiyah-Singer index theorem and the Baum—Connes 
conjecture, and its relevance to the quantum Hall 
effect and strict deformation quantization. Let A be 
a C*-algebra. 

Let $H_A = A \otimes H$, which is the analog of a Hilbert
space. Let $\mathcal{F}_A$ be the space of all A-Fredholm
operators on $H_A$. Recall that an operator $T$ is said to
be A-Fredholm if there is an A-compact operator $K$
such that both the kernel and cokernel of $T + K$
are closed and finitely generated projective modules.
The space of
A-compact operators is by definition the closure of
the A-finite rank operators. The index of $T$ is

\[
\mathrm{index}(T) = [\ker(T + K)] - [\mathrm{coker}(T + K)] \in K_0(A)
\]

The index map turns out to be well defined and
independent of the choice of A-compact perturbation
$K$. It is continuous, so it induces a map on the
connected components of $\mathcal{F}_A$, which turns out to
be an isomorphism, by a theorem of Mingo
(cf. Rosenberg (1983, 1989)).

Now let $M$ be a smooth compact manifold. An
A-vector bundle over $M$ is a locally trivial Banach
vector bundle $E$ over $M$ whose fibers have the
structure of finitely generated left A-modules, with
morphisms respecting the A-module structure. The
isomorphism classes of A-vector bundles over $M$
form a commutative semigroup under direct sums,
and the associated commutative group is easily
identified with $K_0(C(M) \otimes A)$. Let
$D: C^\infty(M, E) \to C^\infty(M, F)$ be an elliptic A-operator acting between
smooth sections of A-vector bundles $E$, $F$ over $M$. It
turns out that by elliptic regularity, such an operator
is A-Fredholm, and has an analytic index,

\[
\mathrm{index}(D) \in K_0(A)
\]

Associated to each such operator is a symbol

\[
\sigma(D): \pi^*(E) \to \pi^*(F)
\]

where $\pi: T^*M \to M$ is the projection. Ellipticity is
the condition that $\sigma(D)$ is an isomorphism outside
the zero section, so that the triple $(\pi^*(E), \pi^*(F), \sigma(D))$
determines an element in $K_0(C_0(T^*M) \otimes A)$ denoted
by $\sigma(D)$. It turns out that the analytic index of $D$
depends only on the class $\sigma(D) \in K_0(C_0(T^*M) \otimes A)$.
Therefore, the analytic index can be viewed as a
homomorphism,

\[
\mathrm{index}: K_0(C_0(T^*M) \otimes A) \to K_0(A)
\]

Consider an embedding $\iota: M \to \mathbb{R}^n$, which induces
an embedding $d\iota: TM \to \mathbb{R}^{2n}$. The associated Gysin
map is $d\iota_!: K_0(C_0(T^*M) \otimes A) \to K_0(C_0(\mathbb{R}^{2n}) \otimes A)$.
Let $j: \{0\} \to \mathbb{R}^{2n}$ denote inclusion of the origin in $\mathbb{R}^{2n}$.
It induces a Gysin map $j_!: K_0(A) \to K_0(C_0(\mathbb{R}^{2n}) \otimes A)$
which is the Bott periodicity isomorphism. Then the
topological index is the homomorphism

\[
\mathrm{index}_t = j_!^{-1} \circ d\iota_!: K_0(C_0(T^*M) \otimes A) \to K_0(A)
\]

The C*-generalization of the Atiyah-Singer
index theorem due to Mishchenko-Fomenko, cf.
Kasparov (1988), asserts the equality of the
analytic index and the topological index,

\[
\mathrm{index}(D) = \mathrm{index}_t(\sigma(D)) \in K_0(A)
\]


Now let $M$ be a compact even-dimensional
spin$^c$ manifold. Then there is a spin$^c$ Dirac
operator $D: C^\infty(M, S^+) \to C^\infty(M, S^-)$, where $S^\pm$ is
the bundle of half-spinors on $T^*M \otimes L$, where $L$ is
a line bundle over $M$ with the property that the
first Chern class of $L$ modulo 2, $c_1(L) \bmod 2$, is
equal to the second Stiefel-Whitney class of $M$,
$w_2(M)$. Let $\Gamma$ be a torsion-free discrete group, and
$B\Gamma$ be its classifying space. It is a paracompact
space with the property that it is the quotient of $\Gamma$
acting freely on a contractible space $E\Gamma$. Let $C^*_r(\Gamma)$
denote the reduced group C*-algebra, and consider
the canonical flat $C^*_r(\Gamma)$-bundle $\mathcal{V}$ over $B\Gamma$ defined
as follows:

\[
\mathcal{V} = \{E\Gamma \times C^*_r(\Gamma)\}/\Gamma
\]

where $\Gamma$ acts on the left on $C^*_r(\Gamma)$ and on the right on
$E\Gamma$. Let $f: M \to B\Gamma$ be a continuous map. Then $f^*\mathcal{V}$
is a flat $C^*_r(\Gamma)$-bundle over $M$. Upon choosing a flat
connection on $f^*\mathcal{V}$, we can couple the spin$^c$ Dirac
operator $D_{\mathcal{V}}$ to act on sections of $S^\pm \otimes f^*\mathcal{V}$. The
ellipticity of $D_{\mathcal{V}}$ ensures that it is a $C^*_r(\Gamma)$-Fredholm
operator, so it has an analytic index,
$\mathrm{index}(D_{\mathcal{V}}) \in K_0(C^*_r(\Gamma))$, by the earlier discussion, which is
also equal to the topological index
$\mathrm{index}_t(\sigma(D_{\mathcal{V}})) \in K_0(C^*_r(\Gamma))$.

By Baum, Connes, and Douglas, the K-homology
of $B\Gamma$, $K_0(B\Gamma)$, is generated by the triples $(M, E, f)$ as
described above, modulo relations that we will not
present here because of space constraints. The
assembly map

\[
\mu: K_0(B\Gamma) \to K_0(C^*_r(\Gamma))
\]

is a homomorphism given by $\mu([(M, E, f)]) = \mathrm{index}(D_{\mathcal{V}})$.
The Baum-Connes conjecture asserts
that $\mu$ is an isomorphism. There are variants of
this conjecture when $\Gamma$ has torsion. The
Baum-Connes conjecture has been verified when $\Gamma$ is an
amenable group or, for instance, a word hyperbolic
group. There are also variants of this conjecture for
certain foliations and groupoids, and this is an extremely
active area of research. The injectivity of the
assembly map is related to the Novikov conjecture
on the homotopy invariance of the higher signatures
(Kasparov 1988), and the obstructions to the
existence of Riemannian metrics of positive scalar
curvature on compact spin manifolds (Rosenberg
1983, 1989). A variant of the Baum-Connes
conjecture, where the reduced group C*-algebra is
replaced by the twisted reduced group C*-algebra, is
used in the analysis of the noncommutative geometry
approach to the integer and fractional quantum
Hall effect, and also the gaps in the spectrum of
magnetic Schrödinger operators (Bellissard et al.
1994, Marcolli and Mathai 2001).


Twisted K-theory and the Chern 
Character 


We begin by reviewing some results due to Dixmier
and Douady (1963). Let $X$ be a smooth manifold, let
$H$ denote an infinite-dimensional, separable Hilbert
space and let $\mathcal{K}$ be the C*-algebra of compact
operators on $H$. Let $U(H)$ denote the group of
unitary operators on $H$ endowed with the strong
operator topology and let $PU(H) = U(H)/U(1)$ be the
projective unitary group with the quotient space
topology, where $U(1)$ consists of scalar multiples of
the identity operator on $H$ of norm equal to 1. Since
$U(H)$ is contractible in the operator norm topology, it
follows that $PU(H) = BU(1)$ is an Eilenberg-MacLane
space $K(\mathbb{Z}, 2)$. Therefore, $BPU(H)$ is an
Eilenberg-MacLane space $K(\mathbb{Z}, 3)$. That is, principal $PU(H)$
bundles $P$ over $X$ are classified up to isomorphism by
the Dixmier-Douady class $DD(P)$ in $H^3(X, \mathbb{Z})$ and
conversely.

For $g \in U(H)$, let $\mathrm{Ad}(g)$ denote the automorphism
$T \mapsto gTg^{-1}$ of $\mathcal{K}$. As is well known, $\mathrm{Ad}$ is a
continuous homomorphism of $U(H)$, given the
strong operator topology, onto $\mathrm{Aut}(\mathcal{K})$ with kernel
the circle of scalar multiples of the identity, where
$\mathrm{Aut}(\mathcal{K})$ is given the point-norm topology. Under this
homomorphism we may identify $PU(H)$ with
$\mathrm{Aut}(\mathcal{K})$. Define an Azumaya bundle to be a locally
trivial bundle $\mathcal{E}$ over $X$ with fiber $\mathcal{K}$ and structure
group $\mathrm{Aut}(\mathcal{K})$. They are of the form
$\mathcal{K}_P = \{P \times \mathcal{K}\}/PU(H)$, and isomorphism classes of Azumaya bundles
are also parametrized by their Dixmier-Douady
class $DD(P)$ in $H^3(X, \mathbb{Z})$ and conversely.

Since $\mathcal{K} \otimes \mathcal{K} \cong \mathcal{K}$, the isomorphism classes of
locally trivial bundles over $X$ with fiber $\mathcal{K}$ and
structure group $\mathrm{Aut}(\mathcal{K})$ form a group under the
tensor product, where the inverse of such a bundle
is the conjugate bundle. This group is known as
the infinite Brauer group and is denoted by $\mathrm{Br}^\infty(X)$.
So, a restatement of the Dixmier-Douady theorem
is that $\mathrm{Br}^\infty(X) \cong H^3(X, \mathbb{Z})$. $H^3(X, \mathbb{Z})$ can also
be described in terms of bundle gerbes (Murray
1996).

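The twisted K-theory, $K^\bullet(X, P)$, is defined as the
K-theory of the C*-algebra of continuous sections of
the Azumaya bundle $\mathcal{K}_P$, $K_\bullet(C(X, \mathcal{K}_P))$. It was
studied in the torsion case by Donovan and Karoubi,
where one can replace the compact operators $\mathcal{K}$ by
finite-dimensional matrices, and was studied in the
general case by Rosenberg (1983, 1989). Let $\mathcal{F}$ be
the space of all Fredholm operators endowed with
the norm topology. Then, one can form the bundle
of Fredholm operators $\mathcal{F}_P = \{P \times \mathcal{F}\}/PU(H)$, where
$PU(H)$ acts on $\mathcal{F}$ via the adjoint action. Consider the
fibration $\mathcal{K}_P \to \mathcal{F}_P \to GL(\mathcal{C}_P)$, where
$\mathcal{C}_P = \{P \times \mathcal{C}\}/PU(H)$ and $\mathcal{C} = B(H)/\mathcal{K}$ is the Calkin algebra. Since
$\pi_0(C(X, \mathcal{K}_P)) = \{0\}$, we see that
$\pi_0(C(X, \mathcal{F}_P)) = \pi_0(C(X, GL(\mathcal{C}_P)))$. Consider the short exact sequence
of C*-algebras,

\[
0 \to C(X, \mathcal{K}_P) \to C(X, \mathcal{B}_P) \to C(X, \mathcal{C}_P) \to 0
\]

where $\mathcal{B}_P = \{P \times B(H)\}/PU(H)$ and where $PU(H)$
acts on $B(H)$ via the adjoint action. It gives rise to
a six-term exact sequence

\[
\begin{array}{ccccc}
K_0(C(X, \mathcal{K}_P)) & \longrightarrow & K_0(C(X, \mathcal{B}_P)) & \longrightarrow & K_0(C(X, \mathcal{C}_P)) \\
{\scriptstyle \mathrm{index}} \uparrow & & & & \downarrow \\
K_1(C(X, \mathcal{C}_P)) & \longleftarrow & K_1(C(X, \mathcal{B}_P)) & \longleftarrow & K_1(C(X, \mathcal{K}_P))
\end{array}
\]

By definition, $K_1(C(X, \mathcal{C}_P)) = \pi_0(C(X, GL(\infty, \mathcal{C}_P)))$
and a standard argument shows that this is also
equal to $\pi_0(C(X, GL(\mathcal{C}_P)))$. By Kuiper's theorem, it is
not difficult to see that

\[
K_\bullet(C(X, \mathcal{B}_P)) = \{0\}.
\]

Therefore,

\[
\mathrm{index}: \pi_0(C(X, \mathcal{F}_P)) \to K^0(X, P)
\]

is an isomorphism. Let $X_1$ be a closed subset of $X$,
and $I_{X_1}$ be the closed ideal of sections of $\mathcal{K}_P$ that
vanish on $X_1$. Then $K^\bullet(X, X_1, P)$ is by definition
$K_\bullet(I_{X_1})$. A geometric description of twisted K-theory
in terms of modules for bundle gerbes is described in
Bouwknegt et al. (2002).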

Some of the basic properties of twisted K-theory 
are listed as follows. Many of these properties 
follow from the corresponding properties for the 
K-theory of C*-algebras. See Atiyah and Segal and 
Bouwknegt et al. (2002). 


1. Normalization If $P$ is trivial, then
$K^\bullet(M, P) = K^\bullet(M)$.

2. Module property $K^\bullet(M, P)$ is a module over
$K^0(M)$.

3. Pullback If $f: N \to M$ is a continuous map,
and $P$ a principal $PU(H)$ bundle over $M$, then
there is a pullback homomorphism
$f^!: K^\bullet(M, P) \to K^\bullet(N, f^*(P))$.

4. Push-forward Let $f: N \to M$ be a smooth proper
map between compact manifolds which is
K-oriented, that is, $TN \oplus f^*TM$ is a spin$^c$ vector
bundle over $N$. Let $P$ be a principal $PU(H)$ bundle
over $M$. Then there is a pushforward homomorphism,
also called a Gysin map,
$f_!: K^\bullet(N, f^*(P)) \to K^{\bullet+d}(M, P)$, where $d = \dim M - \dim N$.

5. Homotopy If $f: N \to M$ and $g: N \to M$ are
homotopic maps, then the pullback maps $f^! = g^!$
are equal. If in addition, $f$ and $g$ are K-oriented,
then the pushforward maps $f_! = g_!$ are equal.

6. Excision Let $M_1$ be a closed subset of $M$ and $U$
be an open subset of $M$ such that $U$ is contained
in the interior of $M_1$. Then the inclusion of
pairs $(M \setminus U, M_1 \setminus U) \to (M, M_1)$ induces an
isomorphism in K-theory,
$K^\bullet(M, M_1, P) \cong K^\bullet(M \setminus U, M_1 \setminus U, P|_{M \setminus U})$.

7. Exactness Let $M_1$ be a closed subset of $M$ and
$\iota: M_1 \to M$ be the inclusion. Let $P$ be a principal
$PU(H)$ bundle over $M$. Then the short exact
sequence

\[
0 \to I_{M_1} \to C(M, \mathcal{K}_P) \to C(M_1, \mathcal{K}_{P|_{M_1}}) \to 0
\]

gives rise to the six-term exact sequence in K-theory,

\[
\begin{array}{ccccc}
K^0(M, M_1, P) & \longrightarrow & K^0(M, P) & \longrightarrow & K^0(M_1, \iota^*(P)) \\
\uparrow & & & & \downarrow \\
K^1(M_1, \iota^*(P)) & \longleftarrow & K^1(M, P) & \longleftarrow & K^1(M, M_1, P)
\end{array}
\]

8. Cup product Let $P$ be a principal $PU(H)$ bundle
over $M$ and $Q$ be a principal $PU(H)$ bundle over $N$.
An identification $H \otimes H \cong H$ gives rise to a principal
$PU(H)$ bundle $P \otimes Q$ over $M \times N$ whose
Dixmier-Douady invariant is
$DD(P \otimes Q) = p_1^*(DD(P)) + p_2^*(DD(Q))$, where $p_j$ denote projections onto the
$j$th factor, $j = 1, 2$. Then there is a canonical map
given by external tensor product,

\[
K^i(M, P) \otimes K^j(N, Q) \to K^{i+j}(M \times N, P \otimes Q)
\]

called the cup product.

9. Bott periodicity Let $P$ be a principal $PU(H)$
bundle over $M$. Bott periodicity says that there is
a canonical isomorphism

\[
K^\bullet(M, P) \cong K^{\bullet+n}(M \times \mathbb{R}^n, \pi^*(P))
\]

where $\pi: M \times \mathbb{R}^n \to M$ is the projection onto the
first factor. Let $b \in K^n(\mathbb{R}^n)$ be the Bott element.
Then the isomorphism above is given by
$\pi^!(x) \cup b \in K^{\bullet+n}(M \times \mathbb{R}^n, \pi^*(P))$ for all $x \in K^\bullet(M, P)$.


There is a natural homomorphism of rings called the
twisted Chern character, which depends both on a
choice of $P$ and a de Rham representative $H$ of $DD(P)$,

\[
\mathrm{Ch}_P: K^\bullet(M, P) \to H^\bullet(M, H)
\]

Here $H^\bullet(M, H)$ denotes the twisted cohomology,
which is by definition the cohomology of the
complex $(\Omega^\bullet(M), d - H\wedge)$. The twisted Chern
character is characterized by the following axioms:

1. Naturality If $f: N \to M$ is a smooth map, and if
$x \in K^\bullet(M, P)$, then $\mathrm{Ch}_{f^*(P)}(f^!(x)) = f^*(\mathrm{Ch}_P(x))$.

2. Additivity If $x, y \in K^\bullet(M, P)$, then
$\mathrm{Ch}_P(x \oplus y) = \mathrm{Ch}_P(x) + \mathrm{Ch}_P(y)$.

3. $\mathrm{Ch}_P$ respects the $K^0(M)$-module structure of
$K^\bullet(M, P)$.

4. Normalization If $P$ is trivial, then $\mathrm{Ch}_P$ reduces
to the ordinary Chern character $\mathrm{Ch}$.

It turns out that the twisted Chern character
induces an isomorphism of the rings $K^\bullet(M, P) \otimes \mathbb{Q}$
and $H^\bullet(M, H)$. The Chern-Weil representative of the
twisted Chern character is derived in Bouwknegt
et al. (2002).
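
For the twisted cohomology above to make sense, the operator $d - H\wedge$ must square to zero. A short check, assuming only that $H$ is a closed 3-form (as a de Rham representative of a degree-3 class) and the graded Leibniz rule, runs as follows for any form $\omega$:

```latex
\begin{aligned}
(d - H\wedge)^2 \omega
  &= d^2\omega - d(H \wedge \omega) - H \wedge d\omega + H \wedge H \wedge \omega \\
  &= 0 - \big(dH \wedge \omega - H \wedge d\omega\big) - H \wedge d\omega + 0 \\
  &= -\,dH \wedge \omega \;=\; 0 .
\end{aligned}
```

Here $d(H \wedge \omega) = dH \wedge \omega + (-1)^3 H \wedge d\omega$ because $H$ has odd degree, $H \wedge H = 0$ for the same reason, and the last step uses $dH = 0$. Since $d - H\wedge$ changes the degree by both 1 and 3, only the parity of the degree survives, which matches the $\mathbb{Z}_2$-grading of K-theory.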


Twisted K-Theory and Duality in Type II
String Theories


Let $E$ be an oriented $S^1$-bundle over $M$,

\[
\begin{array}{ccc}
S^1 & \longrightarrow & E \\
 & & \downarrow {\scriptstyle \pi} \\
 & & M
\end{array}
\]

characterized by its first Chern class $c_1(E) \in H^2(M, \mathbb{Z})$,
in the presence of (possibly nontrivial)
H-flux $H \in H^3(E, \mathbb{Z})$. We will argue that the T-dual
of $E$ is again an oriented $S^1$-bundle over $M$, denoted
by $\hat{E}$,

\[
\begin{array}{ccc}
S^1 & \longrightarrow & \hat{E} \\
 & & \downarrow {\scriptstyle \hat{\pi}} \\
 & & M
\end{array}
\]

supporting H-flux $\hat{H} \in H^3(\hat{E}, \mathbb{Z})$, such that

\[
c_1(\hat{E}) = \pi_* H, \qquad c_1(E) = \hat{\pi}_* \hat{H},
\]

where $\pi_*: H^k(E, \mathbb{Z}) \to H^{k-1}(M, \mathbb{Z})$ and, similarly, $\hat{\pi}_*$
denote the pushforward maps. Then we can form
the following commutative diagram:

\[
\begin{array}{ccccc}
 & & E \times_M \hat{E} & & \\
 & \swarrow & & \searrow & \\
E & & & & \hat{E} \\
 & {\scriptstyle \pi} \searrow & & \swarrow {\scriptstyle \hat{\pi}} & \\
 & & M & &
\end{array}
\]

The correspondence space $E \times_M \hat{E}$ is a circle bundle
over $E$ with first Chern class $\pi^*(c_1(\hat{E}))$, and it is also
a circle bundle over $\hat{E}$ with first Chern class
$\hat{\pi}^*(c_1(E))$, by the commutativity of the diagram
above. If $\hat{E} = E$ or if $\hat{E} = M \times S^1$, then the
correspondence space $E \times_M \hat{E}$ is diffeomorphic to $E \times S^1$.

T-duality gives an isomorphism of the twisted
K-theories of $E$ and $\hat{E}$ as well as an isomorphism
between the twisted cohomologies of $E$ and $\hat{E}$, and
can be expressed in the following commutative
diagram:

\[
\begin{array}{ccc}
K^\bullet(E, P) & \xrightarrow{\;T_!\;} & K^{\bullet+1}(\hat{E}, \hat{P}) \\
{\scriptstyle \mathrm{Ch}_P} \downarrow & & \downarrow {\scriptstyle \mathrm{Ch}_{\hat{P}}} \\
H^\bullet(E, H) & \xrightarrow{\;T_*\;} & H^{\bullet+1}(\hat{E}, \hat{H})
\end{array}
\]

where the horizontal arrows are isomorphisms. Here
$P$ is a principal $PU(H)$ bundle over $E$ such that
$DD(P) = H$ and $\hat{P}$ is a principal $PU(H)$ bundle over
$\hat{E}$ such that $DD(\hat{P}) = \hat{H}$. We refer to Bouwknegt
et al. (2004) for details. The T-duality isomorphism
above gives compelling evidence that a type IIA
string theory A on a circle bundle of radius $R$ in the
presence of a background H-flux, and a type IIB
string theory B on a "T-dual" circle bundle of radius
$1/R$ in the presence of a "T-dual" background H-flux,
are equivalent in the sense that the string states of
string theory A are in canonical one-to-one
correspondence with the string states of string theory B.

We briefly mention two other applications of 
twisted K-theory. Consider the adjoint action of a 
compact connected simple Lie group G on itself, 
and the corresponding twisted G-equivariant 
K-theory, twisted by a multiple of the generator
of $H^3(G, \mathbb{Z})$. The relevance of the equivariant case
to conformal field theory was highlighted by the 
result of Freed, Hopkins and Teleman (see Freed 
(2002)) that it is graded isomorphic to the 
Verlinde algebra of G, with a shift given by the 
dual Coxeter number. Here the Verlinde algebra 
consists of equivalence classes of positive-energy 
representations of the loop group of G which was 
originally shown to be a ring in a rather nontrivial 
way. On the other hand, the ring structure of the 
twisted G-equivariant K-theory of G is just 
induced by the product on G, which makes this 
result all the more remarkable. 

Fractional analytic index theory, developed in
Mathai et al., is a generalization of Atiyah-Singer
index theory, assigning a fractional-valued analytic
index to each projective elliptic operator on a compact
manifold; this index is a fraction that need not be an integer.
These projective elliptic operators act on projective
vector bundles, where the usual compatibility condition
on triple overlaps, needed to give a global vector bundle,
may fail by a scalar factor. These are the geometric
objects in twisted K-theory when the twist is torsion.
In Mathai et al., a fractional index theorem is
proved, computing the fractional-valued analytic
index of projective elliptic operators essentially in
terms of topological data. The Dirac operator in
the absence of a spin structure is also defined there
for the first time, resolving a long-standing mystery,
and its index is computed.

Some topics not covered in this brief account of
K-theory include KK-theory, cf. Blackadar (1986)
and Kasparov (1988), which is the natural setting for
the Atiyah-Singer index theorem and its
generalizations, as well as higher algebraic K-theory.


See also: C*-Algebras and Their Classification; 
Characteristic Classes; Cohomology Theories; 
Equivariant Cohomology and the Cartan Model; Gerbes 
in Quantum Field Theory; Index Theorems; Intersection 
Theory; Mathai-Quillen Formalism; Spectral Sequences.


Introduction


To describe transport by a random flow, one needs 
to apply the statistical methods to the motion of 
fluid particles, that is, to the Lagrangian dynamics. 
We first present the propagators describing evolving 
probability distributions of different configurations 
of fluid particles. We then use those propagators to 
describe decay and steady states of a passive scalar 
field transported by random flows. 

Consider the evolution of a passive scalar tracer
$\theta(r, t)$ in a random flow. The mean value of the
scalar tracer at a given point is an average over
values brought by different trajectories:

\[
\langle \theta(r, s) \rangle = \int P(r, s; R, 0)\, \theta(R, 0)\, dR \tag*{[1]}
\]

Here, $P(r, s; R, t)$ is the probability density function
(PDF) to find the particle at time $t$ at position $R$
given its position $r$ at time $s$. That PDF is called the
propagator or the Green function. Multipoint
correlation functions of the tracer

\[
\begin{aligned}
C_N(\underline{r}, s) &\equiv \langle \theta(r_1, s) \cdots \theta(r_N, s) \rangle \\
&= \int P_N(\underline{r}, s; \underline{R}, 0)\, \theta(R_1, 0) \cdots \theta(R_N, 0)\, d\underline{R}
\end{aligned} \tag*{[2]}
\]

are expressed via the multiparticle Green functions
$P_N$, which are the joint PDFs of the equal-time
positions $\underline{R} = (R_1, \ldots, R_N)$ of $N$ fluid trajectories.

The trajectory of the fluid particle that passes at
time $s$ through the point $r$ is described by the vector
$R(t; r, s)$, which satisfies $R(t; r, t) = r$ and the
stochastic equation

\[
\dot{R} = v(R, t) + u(t) \tag*{[3]}
\]

Here, $u(t)$ describes the molecular Brownian motion
with zero average and covariance
$\langle u^i(t) u^j(t') \rangle = 2\kappa\, \delta^{ij} \delta(t - t')$. We also consider the macroscopic
velocity $v$ as random with various statistical properties
in space and time. There is a clear scale separation
between macroscopic velocity $v$ and molecular
diffusion $u$ that allows one to treat them separately.
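
Equation [3] can be explored numerically with a simple Euler-Maruyama discretization. The sketch below is one-dimensional and assumes a user-supplied velocity function (zero by default); the function names, parameters, and step sizes are illustrative choices, not part of the article. With $v = 0$ the mean square displacement should approach $2\kappa t$ per dimension.

```python
import math
import random


def simulate_displacements(kappa, t_final, n_steps, n_particles,
                           velocity=None, seed=0):
    """Integrate dR = v(R, t) dt + sqrt(2 kappa) dW in one dimension.

    Returns the list of final positions of particles started at R = 0.
    """
    rng = random.Random(seed)
    dt = t_final / n_steps
    noise_scale = math.sqrt(2.0 * kappa * dt)
    finals = []
    for _ in range(n_particles):
        x = 0.0
        t = 0.0
        for _ in range(n_steps):
            drift = velocity(x, t) if velocity is not None else 0.0
            x += drift * dt + noise_scale * rng.gauss(0.0, 1.0)
            t += dt
        finals.append(x)
    return finals


def mean_square(values):
    """Sample average of the squares."""
    return sum(v * v for v in values) / len(values)


if __name__ == "__main__":
    kappa, t_final = 0.5, 2.0
    finals = simulate_displacements(kappa, t_final, n_steps=40, n_particles=4000)
    # Compare the sampled value with the prediction 2 * kappa * t.
    print(mean_square(finals), 2.0 * kappa * t_final)
```

Passing a nonzero `velocity` function, for example a linear shear or a random smooth field, turns the same loop into a crude sampler of the propagator $P(r, s; R, t)$ for a fixed realization of $v$.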

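Using [3], one can write the Green's function as
an integral over paths that satisfy $q(s) = r$ and
$q(t) = R$:

\[
\begin{aligned}
P(r, s; R, t)
&= \Big\langle \int \mathcal{D}p\, \mathcal{D}q\, \exp\Big( -\int_s^t i\, p(\tau) \cdot \big[ \dot{q}(\tau) - v(q(\tau), \tau) - u(\tau) \big]\, d\tau \Big) \Big\rangle_{v,u} && \text{[4]} \\
&= \Big\langle \int \mathcal{D}p\, \mathcal{D}q\, \exp\Big( -\int_s^t \big[ i\, p(\tau) \cdot \big( \dot{q}(\tau) - v(q(\tau), \tau) \big) + \kappa\, p^2(\tau) \big]\, d\tau \Big) \Big\rangle_{v} && \text{[5]} \\
&= \Big\langle \int \mathcal{D}q\, \exp\Big( -\frac{1}{4\kappa} \int_s^t \big| \dot{q}(\tau) - v(q(\tau), \tau) \big|^2\, d\tau \Big) \Big\rangle_{v} \\
&= \big\langle P(r, s; R, t \mid v) \big\rangle_{v} && \text{[6]}
\end{aligned}
\]

The integration over the auxiliary field $p$ in [4]
enforces the delta function of [3]. One passes from
[4] to [5] by averaging over the Gaussian Brownian
noise, and from [5] to [6] by calculating the Gaussian
integral over $p$.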

Generally, exact calculations are only possible for 
Gaussian random processes short-correlated in time- 
like in [5]. The simplest case is the Brownian motion 
when the advection is absent. One then obtains from 
[6] the Gaussian PDF of the displacement: 


P(R,t) = (4rrt) t eTR / 40) [7] 


which satisfies the heat equation (0, — KV?) P(r, t) =0. 
The short-correlated case is far from being an exotic 
exception but rather presents a long-time limit of an 
integral of any finite-correlated random function. 
Indeed, such an integral can be presented as a sum of 
many independent equally distributed random numbers, 
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the statistics of such sums is a subject of the central limit 
theorem. One can move beyond the central limit 
theorem considering the correlation time finite (yet 
small comparing to the time of evolution). Such 
generalization is the subject of the large deviation 
theory. Consider some quantity X which is an integral 
of some random function over time t much larger than 
the correlation time 7. At t >> T, X behaves as a sum of 
many independent identically distributed random 
numbers y;: X = ay y; with N œx t/r. The generating 
function (e**) of the moments of X is the product, 
(e**\ — eNS@), where we have denoted (e”)= e$t) 
(assuming that the generating function (e*”) exists for 
all complex z). The PDF P(X) is given by the inverse 
Laplace transform (2mi)' [e*X+NS® dz with the 
integral over any axis parallel to the imaginary one. 
For X x N, the integral is dominated by the saddle point 
zo such that S’(z9) = X/N and 


P(X) x e NA(X/N~(y)) [8] 


Here $H = -S(z_0) + z_0 S'(z_0)$ is a function of the
variable $X/N - \langle y\rangle$; it is called the entropy function, as it
also appears in the thermodynamic limit in statistical
physics. A few important properties of $H$ (also
called the rate or Cramér function) may be established
independently of the distribution $P(y)$. It is a convex
function which takes its minimum at zero, that is,
for $X$ equal to the mean value $\langle X\rangle = NS'(0)$. The
minimal value of $H$ vanishes since $S(0)=0$. The
entropy is quadratic around its minimum with
$H''(0)=\Delta^{-1}$, where $\Delta=S''(0)$ is the variance of $y$.
We thus see that the mean value $\langle X\rangle = N\langle y\rangle$ grows
linearly with $N$. The fluctuations $X - \langle X\rangle$ on the
scale $O(N^{1/2})$ are governed by the central limit
theorem, which states that $(X - \langle X\rangle)/N^{1/2}$ becomes for
large $N$ a Gaussian random variable with variance
$\langle y^2\rangle - \langle y\rangle^2 = \Delta$, as in [7]. Finally, the fluctuations of $X$ on
the larger scale $O(N)$ are governed by the large
deviation form [8]. The possible non-Gaussianity of
the $y$'s leads to a nonquadratic behavior of $H$
for (large) deviations from the mean, starting from
$(X - \langle X\rangle)/N \sim \Delta/S'''(0)$. Note that if $y$ is Gaussian,
then $X$ is Gaussian too for any $t$, but the universal
formula [8] with $NH=(X-N\langle y\rangle)^2/2N\Delta$ is valid
only for $t \gg \tau$.
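
As a small illustration of the entropy function, the sketch below assumes that $y$ is a fair $\pm1$ variable (an assumption made only for this example), so that $S(z)=\ln\cosh z$ and $\Delta=S''(0)=1$. It evaluates $H$ through the saddle-point condition $S'(z_0)=x$ and compares it with the closed form obtained by hand for this particular distribution. The function names are illustrative.

```python
import math

def S(z):
    # Cumulant generating function of a fair +-1 variable: <e^{zy}> = cosh z
    return math.log(math.cosh(z))

def S_prime(z):
    return math.tanh(z)

def entropy_H(x):
    """H(x) = z0*S'(z0) - S(z0), with z0 fixed by S'(z0) = x, for |x| < 1."""
    z0 = math.atanh(x)  # solves tanh(z0) = x
    return z0 * S_prime(z0) - S(z0)

def entropy_closed_form(x):
    # Legendre transform of ln cosh z, worked out by hand
    return 0.5 * ((1 + x) * math.log(1 + x) + (1 - x) * math.log(1 - x))

def second_derivative(f, x, h=1e-3):
    # Central finite difference
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

if __name__ == "__main__":
    for x in (0.0, 0.3, 0.8):
        print(x, entropy_H(x), entropy_closed_form(x))
    print("H''(0) ~", second_derivative(entropy_H, 0.0))
```

For this choice the curvature at the minimum should come out close to $1=\Delta^{-1}$, while the growth of $H$ near $|x|\to1$ is visibly nonquadratic.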


Single-Particle Diffusion 


For pure advection without noise, the displacement
of a single Lagrangian trajectory is
$R(t) - R(0) = \int_0^t V(s)\,ds$, with $V(t) = v(R(t),t)$ being
the Lagrangian velocity. One can show that $V(t)$ is
statistically stationary in the frame of reference with
no mean flow and under statistical homogeneity and
stationarity of the incompressible Eulerian velocities.
For $\kappa = 0$, the mean square displacement satisfies the
equation


$$\frac{d}{dt}\langle [R(t) - R(0)]^2\rangle = 2\int_0^t \langle V(0)\cdot V(s)\rangle\,ds \qquad [9]$$


The behavior of the displacement is crucially
dependent on the Lagrangian correlation time $\tau$ of
$V(t)$ defined by


$$\int_0^\infty \langle V(0)\cdot V(s)\rangle\,ds = \langle v^2\rangle\tau \qquad [10]$$


No general relation between the Eulerian and
the Lagrangian correlation times has been established,
except for the case of short-correlated
velocities. For times $t \ll \tau$, the two-point function
in [9] is approximately equal to $\langle V(0)^2\rangle=\langle v^2\rangle$.
The fluid particle transport is then ballistic with
$\langle|R(t) - R(0)|^2\rangle \simeq \langle v^2\rangle t^2$, and the PDF $P(R,t)$ is
determined by the whole single-time velocity PDF.
When the correlation time of $V(t)$ is finite (a generic
situation in a turbulent flow, where $\tau$ is of the order of a
large-scale turnover time), an effective diffusive regime
is expected to arise for $t \gg \tau$ with $\langle(R(t) - R(0))^2\rangle \simeq
2\langle v^2\rangle\tau t$. Indeed, the particle displacements over time
segments much larger than $\tau$ are almost independent.
At long times, the displacement $\delta R(t)$ then behaves as a
sum of many independent variables and falls into the
class of stationary processes treated in the previous
section. In other words, $R(t)$ for $t \gg \tau$ becomes
a Brownian motion in $d$ dimensions, normally
distributed with $\langle\delta R^i(t)\,\delta R^j(t)\rangle \simeq 2D_e^{ij}t$, where the
so-called eddy diffusivity tensor is as follows:


$$D_e^{ij} = \frac{1}{2}\int_0^\infty \langle V^i(0)V^j(s) + V^j(0)V^i(s)\rangle\,ds \qquad [11]$$

The symmetric second-order tensor $D_e^{ij}$ is the only
characteristic of the velocity which matters in this
limit of $t \gg \tau$. The trace of the tensor is equal to
$\langle v^2\rangle\tau$, that is, equal to the large-time value of the
integral in [9], while its tensorial properties reflect
the rotational symmetries of the advecting velocity
field. If the latter is isotropic, the tensor reduces to a
diagonal form characterized by a single scalar value
$D_e$. The main problem of turbulent diffusion is to
obtain the effective diffusivity tensor given the
velocity field $v$ and the value of the molecular
diffusivity $\kappa$.
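
The crossover from ballistic to diffusive transport can be made explicit in a simple case. The sketch below assumes an exponentially decaying Lagrangian correlation, $\langle V(0)\cdot V(s)\rangle=\langle v^2\rangle e^{-s/\tau}$ (a model choice made only for this illustration; it is consistent with the definition [10] of $\tau$), and integrates [9] in closed form. The function name is illustrative.

```python
import math

def mean_square_displacement(t, v2, tau):
    """<[R(t)-R(0)]^2> for <V(0).V(s)> = v2*exp(-s/tau), obtained by integrating [9].

    d/dt MSD = 2*v2*tau*(1 - exp(-t/tau)), hence
    MSD(t)   = 2*v2*tau*(t - tau*(1 - exp(-t/tau))).
    """
    # expm1 keeps precision when t << tau: 1 - exp(-x) = -expm1(-x)
    return 2.0 * v2 * tau * (t + tau * math.expm1(-t / tau))

if __name__ == "__main__":
    v2, tau = 2.0, 1.0
    for t in (1e-3, 1e-1, 1.0, 1e2, 1e4):
        msd = mean_square_displacement(t, v2, tau)
        print(t, msd / (v2 * t**2), msd / (2 * v2 * tau * t))
```

The first printed ratio should approach 1 for $t\ll\tau$ (ballistic regime), and the second should approach 1 for $t\gg\tau$ (diffusive regime with total diffusivity $\langle v^2\rangle\tau$).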


Two-Particle Dispersion in Smooth 
Flows 


Even when the velocity $v(R,t)$ is a smooth function of
the coordinates, Lagrangian dynamics can be quite
complicated. Indeed, $d$ ordinary differential equations
$\dot R=v(R,t)$ generally produce chaotic dynamics (for
$d \ge 3$ already for steady flows and for $d=2$ for
time-dependent flows). The tools for the description of what
is called chaotic advection are similar to those of the
theory of dynamical chaos. The description consistently
exploits two simple ideas: to single out the variables
that can be represented by the sum of a large number of
independent random quantities and to separate
variables that fluctuate on different timescales.

The distance, $R_{12}=R_1-R_2$, between two fluid
particles with trajectories $R_i(t)=R(t;r_i)$ passing at
$t=0$ through points $r_i$ satisfies the equation


$$\dot R_{12} = v(R_1, t) - v(R_2, t) \qquad [12]$$


If the velocity field can be considered smooth on
the scale $R_{12}$, then one expands $v(R_1, t) - v(R_2, t) =
\sigma(t, R_1)R_{12}$, introducing the strain matrix $\sigma$ which
can be treated as independent of $R_{12}$. The distance
thus satisfies locally a linear system of ordinary
differential equations (we omit subscripts, replacing
$R_{12}$ by $R$)


$$\dot R(t) = \sigma(t)R(t) \qquad [13]$$


This equation, with the strain treated as given and
$R(0)=r$, may be explicitly solved for arbitrary $\sigma(t)$
only in the 1D case


$$\ln[R(t)/r] = \ln W(t) = \int_0^t \sigma(s)\,ds \equiv X \qquad [14]$$


When $t$ is much larger than the correlation time $\tau$ of
the strain, the variable $X$ is a sum of $N$ independent
equally distributed random numbers with $N \propto t/\tau$
and one can apply [8]. In the multidimensional case,
to use the large deviation theory, one introduces the
evolution matrix $W$ such that $R(t) = W(t)R(0)$. The
modulus $R$ is expressed via the positive symmetric
matrix $W^TW$. In almost every realization of the
strain, the matrix $t^{-1}\ln W^TW$ stabilizes as $t \to \infty$,
that is, its eigenvectors tend to $d$ fixed orthonormal
eigenvectors $f_i$. To understand that intuitively,
consider some fluid volume, say a sphere, which
evolves into an elongated ellipsoid at later times. As
time increases, the ellipsoid is more and more
elongated and it is less and less likely that the
hierarchy of the ellipsoid axes will change. The
limiting eigenvalues


$$\lambda_i = \lim_{t\to\infty} t^{-1}\ln|Wf_i| \qquad [15]$$


are called Lyapunov exponents. The major property
of the Lyapunov exponents is that they are
realization independent if the flow is ergodic (i.e., spatial
and temporal averages coincide). The relation [15]
states that two fluid particles separated initially by $r$
pointing into the direction $f_i$ will separate (or
converge) asymptotically as $\exp(\lambda_i t)$. The
incompressibility constraints $\det(W)=1$ and $\sum_i\lambda_i=0$
imply that a positive Lyapunov exponent will exist
whenever at least one of the exponents is nonzero.
Consider indeed

$$E(n) = \lim_{t\to\infty} t^{-1}\ln\langle[R(t)/r]^n\rangle \qquad [16]$$

whose derivative at the origin gives the largest
Lyapunov exponent $\lambda_1$. The function $E(n)$ obviously
vanishes at the origin. Furthermore, $E(-d) = 0$, that
is, incompressibility and isotropy make $\langle R^{-d}\rangle$
time independent as $t \to \infty$. Apart from $n=0, -d$,
the convex function $E(n)$ cannot have other zeroes if
it does not vanish identically. It follows that $dE/dn$
at $n=0$, and thus $\lambda_1$, is positive. A simple way to
appreciate intuitively the existence of a positive
Lyapunov exponent is to consider the saddle-point
2D flow $v_x=\lambda x$, $v_y=-\lambda y$ with the axes randomly
rotating after time interval $T$. A vector initially at
the angle $\phi$ with the x-axis will be stretched after
time $T$ if $\cos\phi \ge [1+\exp(2\lambda T)]^{-1/2}$, that is, the
measure of the stretching directions is larger
than 1/2.
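
The stretching criterion for the saddle-point flow can be checked directly. The sketch below assumes an initial direction distributed uniformly in angle (by symmetry, $\phi\in[0,\pi/2]$ suffices) and computes the fraction of directions whose length has grown after time $T$; the function names are illustrative.

```python
import math

def length_after(phi, lam, T):
    """Length at time T of a unit vector at angle phi in v_x = lam*x, v_y = -lam*y."""
    return math.hypot(math.exp(lam * T) * math.cos(phi),
                      math.exp(-lam * T) * math.sin(phi))

def critical_angle(lam, T):
    # Directions with cos(phi) >= [1 + exp(2*lam*T)]^(-1/2) are stretched
    return math.acos((1.0 + math.exp(2.0 * lam * T)) ** -0.5)

def stretched_fraction(lam, T):
    """Fraction of directions in [0, pi/2] that end up longer than they started."""
    return critical_angle(lam, T) / (math.pi / 2.0)

if __name__ == "__main__":
    for lam_T in (0.01, 0.5, 2.0):
        print(lam_T, stretched_fraction(1.0, lam_T))
```

Since the threshold cosine is always below $1/\sqrt2$, the critical angle exceeds $\pi/4$ and the printed fractions should all lie above 1/2, approaching 1/2 only as $\lambda T\to0$.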

A major consequence of the existence of a positive
Lyapunov exponent for any random incompressible
flow is the exponential growth of the interparticle
distance $R(t)$. In a smooth flow, it is also possible to
analyze the statistics of the set of vectors $R(t)$ and to
establish a multidimensional analog of [8]. The idea is
to reduce the $d$-dimensional problem to a set of $d$
scalar problems for slowly fluctuating stretching
variables, excluding the fast fluctuating angular degrees
of freedom. Consider the matrix $I(t) = W(t)W^T(t)$,
representing the tensor of inertia of a fluid element
such as the above-mentioned ellipsoid. The matrix is
obtained by averaging $R^i(t)R^j(t)\,d/\ell^2$ over the initial
vectors of length $\ell$, and $I(0)=1$. Introducing the
variables that describe stretching as the lengths of the
ellipsoid axes $e^{2\rho_1},\ldots,e^{2\rho_d}$, one can deduce similarly to
[8] the asymptotic PDF:


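$$P(\rho_1,\ldots,\rho_d;t) \propto \exp[-t\,H(\rho_1/t-\lambda_1,\ldots,\rho_{d-1}/t-\lambda_{d-1})]
\times \theta(\rho_1-\rho_2)\cdots\theta(\rho_{d-1}-\rho_d)
\times \delta(\rho_1+\cdots+\rho_d) \qquad [17]$$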


The entropy function $H$ depends on the statistics
of $\sigma$. In the $\delta$-correlated case, $H$ is everywhere
quadratic:


\xd(d—2i+1) [18] 


Two-Particle Dispersion in Nonsmooth
Flows 


To consider dispersion in the inertial interval of
turbulence, one should assume $\delta v(r, t) \propto r^{\alpha}$, where
generally $\alpha < 1$. Rewriting then eqn [12] for the
distance between two particles as $\dot R= \delta v(R,t)$, we
infer that $dR^2/dt=2R\cdot\delta v(R,t) \propto R^{1+\alpha}$. It suggests


$$R(t)^{1-\alpha} - R(0)^{1-\alpha} \propto t \qquad [19]$$


For large $t$, $R(t) \propto t^{1/(1-\alpha)}$, with the dependence on
the initial separation quickly forgotten. Of course,
for the random process $R(t)$, relation [19] is of the
mean-field type and should pertain (if true) to the
large-time behavior of the averages ($\langle R(t)^p\rangle \propto
t^{p/(1-\alpha)}$ for $p> 0$), implying their super-diffusive
growth, faster than the diffusive one $\propto t^{p/2}$. The
power-law scaling may be amplified to the scaling
behavior of the PDF of the interparticle distance,
$P(R,t)=\lambda P(\lambda R, \lambda^{1-\alpha}t)$. The power-law growth of
the second moment, $\langle R(t)^2\rangle \propto t^3$, is the celebrated
Richardson dispersion relation, which was the first
quantitative phenomenological prediction in
developed turbulence. It seems to be confirmed by
experimental data and numerical simulations. It
is important to remark that, even assuming the
validity of the Richardson relation, it is impossible
to establish general large-time properties of the PDF
$P(R;t)$ such as those obtained above for the single-particle
PDF or for the distance between two particles in a smooth
flow. This is because
the correlation time of the Lagrangian velocity
difference, $R/\delta v(R) \propto \langle R^2\rangle^{1/3} \propto t$, is comparable
with the total time of the process.
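
The loss of memory of the initial separation in the mean-field relation [19] is easy to see numerically. The sketch below assumes the deterministic model $\dot R=cR^{\alpha}$ with a hypothetical constant $c$ (set to 1 here) and the Kolmogorov value $\alpha=1/3$; it is a caricature of the random process, not a simulation of turbulence.

```python
def separation(t, r0, alpha=1.0 / 3.0, c=1.0):
    """Mean-field solution of [19]: R^(1-alpha) - r0^(1-alpha) = c*(1-alpha)*t."""
    return (r0 ** (1.0 - alpha) + c * (1.0 - alpha) * t) ** (1.0 / (1.0 - alpha))

if __name__ == "__main__":
    for t in (1.0, 1e2, 1e4, 1e6):
        near = separation(t, 1e-3)
        far = separation(t, 1e-1)
        # The two separations converge to each other, and R^2 grows like t^3
        print(t, near, far, near**2 / t**3)
```

For large $t$ the two initial separations give nearly the same $R(t)$, and $R^2/t^3$ tends to a constant, which is the Richardson scaling in this simplified setting.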

It is instructive to contrast the exponential growth
[16] of the distance between the trajectories with the
power-law growth [19]. In a smooth flow, the closer
two trajectories are initially, the more time is needed
to effectively separate them. In a nonsmooth
turbulent flow, the trajectories separate in a finite
time independent of their initial distance $R(0)$,
provided that the latter is also in the inertial range.
This explosive separation of trajectories results in a
breakdown of the deterministic Lagrangian flow
since the trajectories cannot be labeled by the
initial conditions. That agrees with the fundamental
theorem stating that the ordinary differential
equation $\dot R=v(R,t)$ need not have a unique solution
if $v(r,t)$ is non-Lipschitz. As shown by the
example of the equation $\dot x = |x|^{\alpha}$ with two solutions
$x=[(1 - \alpha)t]^{1/(1-\alpha)}$ and $x =0$, both starting at zero,
one should expect multiple Lagrangian trajectories
starting or ending at the same point for velocity
fields with $\alpha <1$. Even though the deterministic
Lagrangian description breaks down, the statistical
description is still possible and one can make
sense of propagators like $P(r,s;R,t|v)$. They are
expected to be weak solutions of the equation
$[\partial_t - \nabla\cdot v(R, t)]P(r,s;R,t|v)=0$ in the nonsmooth
case. According to this assumption, the Lagrangian
trajectories behave stochastically already in a
given velocity field and for negligible molecular
diffusivity, and not only due to a random noise or
to random fluctuations of the velocities.
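
The non-uniqueness example can be verified numerically. The sketch below checks, by a central finite difference, that both $x(t)=[(1-\alpha)t]^{1/(1-\alpha)}$ and $x(t)=0$ satisfy $\dot x=|x|^{\alpha}$ while sharing the initial value $x(0)=0$. The helper names and the choice $\alpha=1/3$ are illustrative.

```python
def nontrivial_solution(t, alpha):
    """x(t) = [(1-alpha)*t]^(1/(1-alpha)), which starts at x(0) = 0."""
    return ((1.0 - alpha) * t) ** (1.0 / (1.0 - alpha))

def residual(x_of_t, t, alpha, h=1e-6):
    """Central-difference estimate of dx/dt - |x|^alpha at time t."""
    dxdt = (x_of_t(t + h) - x_of_t(t - h)) / (2.0 * h)
    return dxdt - abs(x_of_t(t)) ** alpha

if __name__ == "__main__":
    alpha = 1.0 / 3.0
    moving = lambda t: nontrivial_solution(t, alpha)
    resting = lambda t: 0.0
    for t in (0.5, 1.0, 2.0):
        print(t, residual(moving, t, alpha), residual(resting, t, alpha))
```

Both residuals should vanish up to discretization error, so the initial condition $x(0)=0$ does not select a unique trajectory.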

The general conjecture about the existence and
diffusive nature of propagators is known to be true for
the Gaussian ensemble of velocities decorrelated in
time (Kraichnan 1968):


$$\langle v^i(r, t)\,v^j(r', t')\rangle = 2\delta(t - t')\,D^{ij}(r - r') \qquad [20]$$


Here the Lagrangian velocity $v(R,t)$ has the same
white noise temporal statistics as the Eulerian
velocity $v(r,t)$ for fixed $r$, and the displacement
along a Lagrangian trajectory $R(t)- R(0)$ is a
Brownian motion for all times. To model the
nonsmooth velocity field of turbulence, we choose
$D^{ij}(r) = D_0\delta^{ij} - (1/2)d^{ij}(r)$ and


$$d^{ij}(r) = D_1[(d - 1 + \xi)\delta^{ij}r^{\xi} - \xi r^ir^jr^{\xi-2}] \qquad [21]$$


Here $D_0$ gives the eddy diffusivity of a single fluid
particle (discussed earlier), whereas $d^{ij}(r)$ describes
the statistics of the velocity differences. For $0 < \xi < 2$,
the Kraichnan ensemble is supported on the
velocities that are Hölder continuous in space with a
fixed exponent $\alpha$ arbitrarily close to $\xi/2$. In this way it
mimics the main property of turbulent velocities.
The rough (distributional) behavior of Kraichnan
velocities in time, although not very physical, is not
expected to modify essentially the qualitative
properties of propagators (it is the spatial regularity, not
the temporal one, of a vector field that is crucial for
the uniqueness of its trajectories).

In exactly the same way as one derives [6] and [7]
from [4], one gets $P(R, t) = |\hat D|^{-1/2}(4\pi t)^{-d/2}\exp[-(\hat D^{-1})_{ij}R^iR^j/4t]$,
where $\hat D_{ij} = D_{ij}(0) + \kappa\delta_{ij}$. In much the same way
one can examine the two-particle PDF. The PDF
$P_2(r,s;R,t)$ of the distance $R$ between two particles
satisfies the equation


$$(\partial_t + M_2)P_2(r, s;R,t) = \delta(t - s)\,\delta(r - R) \qquad [22]$$


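where $M_2 = -D_1(d - 1)\,r^{1-d}\partial_r r^{d-1+\xi}\partial_r$, and [22] can
be readily solved:


$$\lim_{r\to0} P_2(r,s;R,t) \propto \frac{R^{d-1}}{|t-s|^{d/(2-\xi)}}
\exp\left[-\mathrm{const}\,\frac{R^{2-\xi}}{|t-s|}\right] \qquad [23]$$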


That confirms the diffusive character of the limiting
process describing the Lagrangian trajectories in
fixed non-Lipschitz velocities: the endpoints of the
process stay at finite distance when the initial points
converge. The PDF [23] changes from Gaussian to
log-normal when $\xi$ changes from 0 to 2. The
Richardson dispersion $\langle R^2(t)\rangle \propto t^3$ is reproduced
for $\xi=4/3$.


Multiparticle Propagators 


In studying multiparticle statistics, an important 
question is what memory of the initial configuration 
remains when final distances far exceed initial 
ones. To answer this question, one must analyze 
the conservation laws of turbulent diffusion. 
Many-particle evolution in nonsmooth velocities 
exhibits nontrivial statistical integrals of motion 
(martingales) that are proportional to the positive 
powers of the distances. The integrals involve 
geometry in such a way that the distance growth is 
balanced by the decrease of the shape fluctuations. 
The existence of multiparticle conservation laws 
indicates the presence of a long-time memory and is 
a reflection of the coupling among the particles due 
to the simple fact that they are all in the same 
velocity field. The conserved quantities may be easily 
built for the limiting cases. Already for a smooth
velocity, the $d$-volume $\epsilon_{i_1i_2\ldots i_d}R^{i_1}_{12}\cdots R^{i_d}_{1,d+1}$ is indeed
preserved for $(d + 1)$ Lagrangian trajectories. In the
opposite case of a very irregular velocity, the fluid
particles undergo a Brownian motion. The distances
between the Brownian particles grow according to
$\langle R^2_{nm}(t)\rangle = R^2_{nm}(0)+ Dt$. The statistical integrals
of motion are $\langle R^2_{nm} - R^2_{pr}\rangle$, $\langle 2(d + 2)R^2_{nm}R^2_{pr} -
d(R^4_{nm} + R^4_{pr})\rangle$, and an infinity of similarly built
harmonic polynomials (zero modes of the Laplacian).

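The statistics of the relative motion of $N$ particles
is described by the joint PDF averaged over rigid
translations: $\mathcal P_N^{\rm rel}(r, s;R,t) = \int \mathcal P_N(r,s;R + \rho,t)\,d\rho$.
For smooth velocities,


$$\mathcal P_N^{\rm rel}(r;R;t) = \int\Big\langle\prod_{n=1}^{N}\delta(R_n+\rho-W(t)r_n)\Big\rangle\,d\rho \qquad [24]$$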
n=1 


Such a PDF depends only on the statistics of the
evolution matrix $W(t)$ discussed earlier. Under the
evolution governed by $W(t)$, all distances between
points grow exponentially for large times while their
ratios $R_{nm}/R_{kl}$ tend to a constant. Whatever the initial
positions, asymptotically in time the points tend to be
situated on a line. Note that the existence of
deterministic trajectories leads to the collapse property
$\lim_{r_N\to r_{N-1}} \mathcal P_N^{\rm rel}(r; R; t) = \mathcal P_{N-1}^{\rm rel}(r'; R'; t)\,\delta(R_{N-1} - R_N)$,
where $R' = (R_1,\ldots, R_{N-1})$.

The long-time asymptotics of the propagators in
the nonsmooth case can be found explicitly for the
Kraichnan ensemble of velocities:


$$(\partial_t + M_N)\,\mathcal P_N^{\rm rel}(r,s;R,t) = \delta(t-s)\,\delta(R-r) \qquad [25]$$


$$M_N = \sum_{n<m} d^{ij}(r_{nm})\nabla_{r_n^i}\nabla_{r_m^j} \qquad [26]$$


When initial points get close or final points far
apart and time gets large, the multiparticle PDF is
factorized:


$$\lim_{\lambda\to0} \mathcal P_N^{\rm rel}(\lambda r, 0;R, t) \propto \sum_a \lambda^{\zeta_a} f_a(r)\,g_a(R, t) \qquad [27]$$


where $f_a$ must be taken as zero modes of $M_N$ and its
powers while $\partial_t g_a = -M_N g_a$. The remarkable
feature of the zero modes of $M_N$ is that they are
conserved in mean by the Lagrangian evolution:


$$\partial_t\langle f(R(t))\rangle = -\int f(R)\,M_N\,\mathcal P_N^{\rm rel}(r, 0; R, t)\,dR
= -\int \mathcal P_N^{\rm rel}(r, 0; R, t)\,M_N f(R)\,dR = 0$$


The scaling exponents of the zero modes depend, in 

a nontrivial way, on the number of particles N. For 

€< 1 and d > 1, one finds 
N 


ae a 


2 (d+ 2) i P 


Passive Scalar 


For practical applications, for example, in the
diffusion of pollution, the most relevant quantity is
the average $\langle\theta(r,t)\rangle$, which can be expressed via the
single-particle propagator. As discussed earlier, for
times longer than the Lagrangian correlation time,
the particle diffuses and $\langle\theta\rangle$ obeys the effective heat
equation


$$\partial_t\langle\theta(r,t)\rangle = (D_e^{ij} + \kappa\delta^{ij})\nabla_i\nabla_j\langle\theta(r, t)\rangle \qquad [29]$$


with the eddy diffusivity $D_e^{ij}$ given by [11]. The
simplest decay problem is that of a uniform scalar
spot of size $L$ released in the fluid. Its averaged
spatial distribution at later times is given by the
solution of [29] with the appropriate initial
condition. On the other hand, the decay of the scalar in
the spot is governed by the multipoint Lagrangian
propagators. Taking the point of measurement
inside the spot, consider the single-point moment
$\langle\theta^N\rangle(t)$ described by [2]. If there is no molecular
diffusion and the trajectories are unique (spatially
smooth velocity), particles that end at the same
point remained together throughout the evolution
and all the moments are preserved. On the contrary,
when the velocity is nonsmooth and the propagator is
diffusive, we expect decay even in the limit $\kappa \to 0$.
This is an example of the so-called dissipative
anomaly: the symmetry $t\to -t$ remains broken
even when the symmetry-breaking factor $\kappa$ goes
to zero. Consider a spherical spot of $\theta$ released in
a spatially smooth incompressible 3D flow with
$\lambda_1 > \lambda_2 >0>\lambda_3$. During times less than
$t_d=|\lambda_3|^{-1}\ln(L/r_d)$, diffusion is unimportant and $\theta$
inside the spot does not change. At larger times, the
dimensions of the spot with negative Lyapunov
exponents are frozen at $r_d$, while the rest keep
growing exponentially, resulting in an exponential
growth of the total volume as $\exp(\rho_1 + \rho_2)$. That leads
to an exponential decay of scalar moments averaged
over velocity statistics: $\langle|\theta(t)|^N\rangle \propto \exp(-\gamma_N t)$. The
decay rates $\gamma_N$ can be expressed via the PDF [17] of
the stretching variables $\rho_i$. Since $\theta$ decays as the inverse
volume,


$$\langle\theta^N\rangle \propto \int d\rho_1\,d\rho_2\,\exp[-tH(\rho_1/t - \lambda_1, \rho_2/t - \lambda_2)
- N(\rho_1 + \rho_2)] \qquad [30]$$


At large $t$, the integral is determined by the saddle
point. At small $N$, the saddle point lies within the
parabolic domain of $H$, so $\gamma_N$ increases with $N$
quadratically. At large $N$, the main contribution is
due to the realization with the smallest possible spot of
size $L$, so $\gamma_N$ saturates.
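
The two regimes of $\gamma_N$ can be seen in a one-variable caricature of [30]. The sketch below assumes a single stretching rate $u=\rho/t\ge0$ with Lyapunov exponent `lam` and an everywhere quadratic entropy $H(x)=x^2/2\Delta$ (both are simplifying assumptions, not the full two-variable problem), so that $\gamma_N=\min_{u\ge0}[(u-\lambda)^2/2\Delta+Nu]$.

```python
def decay_rate(N, lam, delta):
    """Toy decay rate: min over u >= 0 of (u - lam)^2/(2*delta) + N*u."""
    # Unconstrained saddle point is u = lam - N*delta; clip it at u = 0,
    # which corresponds to realizations where the spot is not stretched at all.
    u_star = max(lam - N * delta, 0.0)
    return (u_star - lam) ** 2 / (2.0 * delta) + N * u_star

if __name__ == "__main__":
    lam, delta = 1.0, 0.5
    for N in (0, 0.5, 1, 1.5, 2, 3, 5):
        print(N, decay_rate(N, lam, delta))
```

For $N<\lambda/\Delta$ the result is the quadratic function $N\lambda-N^2\Delta/2$, and for larger $N$ it saturates at $\lambda^2/2\Delta$, mirroring the behavior described above.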

For the decay in an incompressible nonsmooth flow,
using the Kraichnan model one gets


$$\langle\theta^{2n}(t)\rangle = \int \mathcal P_{2n}(0; R; -1)\,C_{2n}(t^{1/(2-\xi)}R,0)\,dR \qquad [31]$$


When $J_0= \int C_2(r,t)\,dr \ne 0$, the function $t^{d/(2-\xi)}
C_2(t^{1/(2-\xi)}r,0)$ tends to $J_0\delta(r)$ in the long-time limit
and [31] is reduced to


$$\langle\theta^{2n}(t)\rangle \approx (2n - 1)!!\,\frac{J_0^n}{t^{nd/(2-\xi)}}
\int \mathcal P_{2n}(0; R_1, R_1, \ldots, R_n, R_n; -1)\,dR \qquad [32]$$


The decay is self-similar: $P(t, \theta) = t^{d/2(2-\xi)}
Q(t^{d/2(2-\xi)}\theta)$. That means that the PDF of $\theta/\sqrt{\bar\epsilon}$
is asymptotically time independent, with $\bar\epsilon(t)=
\kappa\langle(\nabla\theta)^2\rangle$ being the time-dependent (decreasing)
dissipation rate. This should be contrasted with the lack of
self-similarity for the smooth case.


One can also consider the steady state of $\theta$ pumped by
a source $\varphi(r,t)$:


$$\partial_t\theta+(v\cdot\nabla)\theta-\kappa\Delta\theta= \varphi \qquad [33]$$


Assuming that the pumping is white Gaussian with
zero mean and variance $\langle\varphi(r_1, t_1)\varphi(r_2, t_2)\rangle = \chi(r_{12})\delta
(t_2 - t_1)$, $r_{ij} =r_i - r_j$, one can express the correlation
functions via the multiparticle propagators. For
example, assuming zero conditions in the distant
past and space homogeneity, one gets


Ort) = [ d f P(Rr t')x(R) dR [34] 


The function $\chi(R)$ is nonzero within the correlation
scale $L$ of the pumping, which restricts integration to
$R(t)< L$. For smooth velocity, this gives the pair correlation
function $F_2(r) =|\lambda_3|^{-1}\chi(0)\ln(L/r)$ at $r< L$. For nonsmooth
velocity, the statistics of scalar fluctuations at
small scales is described by the set of structure
functions $S_N(r) = \langle[\theta(r) - \theta(0)]^N\rangle \propto r^{\zeta_N}$ with the
scaling exponents determined by the zero
modes (see Falkovich et al. (2001)). Therefore, the
existence of Lagrangian statistical invariants
explains the anomalous scaling of the passive scalar
(here, anomaly means that the scale invariance broken
by pumping is not restored even when the pumping
scale goes to infinity).


See also: Anomalies; Intermittency in Turbulence; Large 
Deviations in Equilibrium Statistical Mechanics; Lyapunov 
Exponents and Strange Attractors; Random Walks in 
Random Environments; Stochastic Differential Equations; 
Turbulence Theories. 


Introduction


Large deviation theory (LDT) deals with the study
of probabilities of extremely rare events. As an
example, consider the case of independent identically
distributed random variables $\sigma_1,\ldots,\sigma_N$ with
the mean value $E(\sigma_i)=m$. Then the typical
deviations of the sum $M_N =\sigma_1 +\cdots+\sigma_N$ from its mean
value $Nm$ are of the order of $\sqrt N$, while in LDT we
study the probabilities of the deviations which are
linear in $N$. In “good” cases we know that for $b > 0$


$$\Pr\{M_N - Nm > bN\} \sim \exp\{-I(b)N\} \quad\text{as } N \to \infty \qquad [1]$$


where $I(\cdot) \ge 0$ is the “rate” function.
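
The asymptotics [1] can be checked exactly in the simplest "good" case. The sketch below assumes fair $\pm1$ spins (so $m=0$), for which the rate function is known in closed form, and compares it with $-N^{-1}\ln\Pr\{M_N>bN\}$ computed from exact binomial counts. The function names are illustrative.

```python
import math

def rate_function(b):
    """Cramer rate function I(b) for fair +-1 spins (mean m = 0), for |b| < 1."""
    return 0.5 * ((1 + b) * math.log(1 + b) + (1 - b) * math.log(1 - b))

def log_tail_probability(N, b):
    """Exact ln Pr{M_N > b*N} for N fair +-1 spins."""
    # With k spins equal to +1 the sum is M_N = 2k - N
    count = sum(math.comb(N, k) for k in range(N + 1) if 2 * k - N > b * N)
    return math.log(count) - N * math.log(2)

def empirical_rate(N, b):
    return -log_tail_probability(N, b) / N

if __name__ == "__main__":
    b = 0.2
    for N in (50, 200, 2000):
        print(N, empirical_rate(N, b), rate_function(b))
```

The empirical rate should approach $I(b)$ from above as $N$ grows, with a correction of order $(\ln N)/N$ coming from the subexponential prefactor that the symbol $\sim$ in [1] ignores.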

Questions of LDT are very natural in statistical 
mechanics, and they have deep physical meaning, 
notwithstanding the fact that the corresponding 
events are rare. One reason is that (some) rare 
events in the grand canonical ensemble become 
typical events in the canonical ensemble. 

An interesting feature of LDT in statistical
mechanics is that the behavior [1] of LD is not
universal, and sometimes is replaced by a
nonclassical one:


$$\Pr\{M_N - Nm > bN\} \sim \exp\{-I(b)N^{\nu}\} \qquad [2]$$


with $\nu < 1$. That usually happens in the “phase
transition” regime, and then the quantity $I(b)$, as
well as the exponent $\nu$, have very much to do with
the geometry of a droplet of one phase formed inside
the other.

Below, we will illustrate all these features on the 
example of the Ising model. 


The Ising Model in the Finite Box 


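Our random variables $\sigma_x$ will take values $\pm1$, with
$x \in \mathbb Z^d$. They are called spins. For every finite box
$\Lambda \subset \mathbb Z^d$, we will define Gibbs states in $\Lambda$. To do this
we need the Hamiltonians


$$H_{\Lambda,\xi}(\sigma) = -\sum_{\substack{x,y\ \mathrm{n.n.}\\ x,y\in\Lambda}} \sigma_x\sigma_y
- \sum_{\substack{x,y\ \mathrm{n.n.}\\ x\in\Lambda,\,y\notin\Lambda}} \sigma_x\xi_y$$


Here, $\xi$ is some spin configuration on $\mathbb Z^d$, which is
called the “boundary condition,” while $\sigma \in \Omega_\Lambda$ is any
spin configuration in $\Lambda$.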


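The “grand canonical Gibbs measure” $\mu_{\Lambda,\xi,T}$ in $\Lambda$
with boundary condition $\xi$ at inverse temperature
$\beta=T^{-1}$ is given by


$$\mu_{\Lambda,\xi,T}(\sigma) = Z_{\Lambda,\xi,T}^{-1}\exp(-\beta H_{\Lambda,\xi}(\sigma)) \qquad [3]$$


where


$$Z_{\Lambda,\xi,T} = \sum_{\sigma\in\Omega_\Lambda} \exp(-\beta H_{\Lambda,\xi}(\sigma))$$


is called the “partition function”; it makes the measure
[3] a probability distribution.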
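
For a very small box the Gibbs measure [3] can be computed by brute-force enumeration. The sketch below assumes $d=2$, a square box of side `L`, a constant boundary condition $\xi\equiv\pm1$, and the ferromagnetic sign convention for the Hamiltonian; all function names are illustrative, and the enumeration is only feasible for a handful of spins.

```python
import itertools
import math

def hamiltonian(sigma, L, boundary):
    """Energy of configuration sigma (tuple of +-1, row-major) in an L x L box."""
    energy = 0.0
    for i in range(L):
        for j in range(L):
            s = sigma[i * L + j]
            # Right and down neighbours: each interior bond is counted once
            for di, dj in ((0, 1), (1, 0)):
                ni, nj = i + di, j + dj
                if ni < L and nj < L:
                    energy -= s * sigma[ni * L + nj]
                else:
                    energy -= s * boundary  # neighbour lies outside the box
            # Left and up neighbours contribute only when they are outside the box
            if i == 0:
                energy -= s * boundary
            if j == 0:
                energy -= s * boundary
    return energy

def gibbs_measure(L, beta, boundary):
    """Return all configurations and their Gibbs probabilities."""
    configs = list(itertools.product((-1, 1), repeat=L * L))
    weights = [math.exp(-beta * hamiltonian(c, L, boundary)) for c in configs]
    Z = sum(weights)  # partition function
    return configs, [w / Z for w in weights]

def mean_spin(L, beta, boundary, site):
    configs, probs = gibbs_measure(L, beta, boundary)
    return sum(p * c[site] for c, p in zip(configs, probs))

if __name__ == "__main__":
    centre = 4  # middle site of a 3 x 3 box
    print(mean_spin(3, 0.5, +1, centre), mean_spin(3, 0.5, -1, centre))
```

The two printed values should be equal in magnitude and opposite in sign, a finite-box reflection of the spin-flip symmetry that relates the (+) and (−) boundary conditions.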

The boundary condition $\xi\equiv+1$ ($-1$) will be
denoted by $+$ ($-$). For every value of $T$, the Gibbs
measures $\mu_{\Lambda,\pm,T}$ with $(\pm)$-boundary condition in
the cubic box $\Lambda(l)$ of size $l$ converge, as $l \to \infty$, to
the probability measures that we will denote by
$\mu_{\pm,T}$. If the two happen to be different, then $\mu_{+,T}$ is
called the (+)-phase, and $\mu_{-,T}$ the (−)-phase. That
happens to be the case iff the temperature $T$ is lower
than the critical temperature $T_c = T_c(d)$. The critical
temperature depends on dimension; $T_c(1) =0$, while
$T_c(d) > 0$ for $d \ge 2$. The expectation


$$E_{\mu_{+,T}}(\sigma_0) = m(\beta)$$


is called the spontaneous magnetization; $m(\beta) > 0$ iff
$\beta > T_c^{-1}$.


LD Properties of the Gibbs States $\mu_{\Lambda,-,T}$


In what follows, we will discuss the LD properties of
the sum $M_\Lambda=\sum_{x\in\Lambda}\sigma_x$, where the spins
$\sigma_x$, $x \in \Lambda$, are distributed according to the Gibbs
state $\mu_{\Lambda,-,T}$. Note that $E_{\mu_{-,T}}(\sigma_0) = -m(\beta)$.


Classical Case 


If we look at the LDs of the sum $M_\Lambda$ when the
temperature $T$ is high enough (in which case the
limiting states $\mu_{+,T}$ and $\mu_{-,T}$ coincide), or else if the
temperature is low and the deviations are negative,
that is, we consider the events $M_\Lambda + |\Lambda|m(T^{-1}) < b|\Lambda|$
with $b < 0$, then their probabilities behave classically:

There exists a (high) temperature $T_0$ such that if
$T > T_0$, then


$$\lim_{\Lambda\to\mathbb Z^d} \frac{1}{|\Lambda|}\ln\Pr\{M_\Lambda + |\Lambda|m(T^{-1}) < b|\Lambda|\}
= -I_T(b) \quad\text{for } b<0 \qquad [4]$$

$$\lim_{\Lambda\to\mathbb Z^d} \frac{1}{|\Lambda|}\ln\Pr\{M_\Lambda + |\Lambda|m(T^{-1}) > b|\Lambda|\}
= -I_T(b) \quad\text{for } b>0 \qquad [5]$$
where the function $I_T(b) \ge 0$ is strictly concave on
the segment $(m(T^{-1}) - 1, m(T^{-1}) +1)$. It vanishes
at only one point, $b=0$.

There exists a (low) temperature $T_1$ such that if
$T < T_1$, then the relation [4] holds with the function
$I_T(b) \ge 0$ strictly concave on the segment $(m(T^{-1}) -
1,0)$. The limit [5] also exists, but it can vanish
once we are in the phase transition region. In order
to see some nontrivial behavior, we have to change
the normalization $1/|\Lambda|$ in [5].


Nonclassical Case 


The proper normalization happens to be the surface
term, $1/|\Lambda|^{(d-1)/d}$.

There exists a temperature $T_1$ such that if $T < T_1$,
then


$$\lim_{\Lambda\to\mathbb Z^d} \frac{1}{|\Lambda|^{(d-1)/d}}\ln\Pr\{M_\Lambda + |\Lambda|m(T^{-1}) > b|\Lambda|\}
= -W_T(b) \quad\text{for } b>0 \qquad [6]$$


The function $W_T(b)$ obeys $W_T(b) = b^{(d-1)/d}w_T$, with
$w_T > 0$, provided the value $b > 0$ is not too large:
$b < b(d)$, where $b(d)$ is some constant, depending on
the dimension and temperature; one can show that
$b(d) > 1/2^d$. For larger $b$'s the dependence is more
complex.

The key object here is the constant $w_T$. To obtain
it, one has to solve the following variational
problem. Let $\tau_T(n)$, $n \in S^{d-1}$, be the surface tension
between the (+)-phase and the (−)-phase of the Ising
model at the temperature $T$. Then, for every closed
compact (hyper)surface $M^{d-1} \subset \mathbb R^d$, we define its
surface energy as


$$W_T(M) = \int_M \tau_T(n_s)\,ds$$


where $n_s$ is the normal vector to $M$ at $s \in M$. The
functional $W_T(M)$ has the meaning of the energy of
the $M$-shaped droplet of the (+)-phase floating in the
(−)-phase. It is called the “Wulff functional.” Let
$\mathcal W_T$ be the surface which minimizes $W_T(\cdot)$ over all
the surfaces enclosing the unit volume. Such a
minimizer does exist and is unique up to translation.
It is called the “Wulff shape.” The value $w_T$ is just
the surface energy of the Wulff shape:


$$w_T = W_T(\mathcal W_T)$$


The value $b(d)$ is defined as the maximal value of
$b$'s for which the dilatation $b^{1/d}\mathcal W_T$ can fit into the
unit cube. For higher values of $b$, the shape of the
(+)-phase droplet in the cube with (−)-boundary
condition is deformed by its walls, so its surface
energy is given by a more complicated variational
problem.
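
As an orientation for these definitions, the sketch below works out the case of a constant (isotropic) surface tension in $d=2$. This is an assumption made only for illustration, since the Ising surface tension depends on direction; for constant $\tau$ the isoperimetric inequality makes the minimizer a disc of unit area. The function names are illustrative.

```python
import math

def isotropic_wulff_2d(tau):
    """Unit-area disc: returns (radius, surface energy w, maximal b that fits)."""
    radius = 1.0 / math.sqrt(math.pi)   # pi * r^2 = 1
    w = tau * 2.0 * math.pi * radius    # surface energy = tau * perimeter
    # A disc of area b has diameter 2*sqrt(b/pi); it fits the unit square iff <= 1
    b_max = math.pi / 4.0
    return radius, w, b_max

def droplet_cost(b, tau):
    """W(b) = b^((d-1)/d) * w with d = 2, valid for b below b_max."""
    return math.sqrt(b) * isotropic_wulff_2d(tau)[1]

if __name__ == "__main__":
    radius, w, b_max = isotropic_wulff_2d(1.0)
    print("w =", w, " b(2) =", b_max, " 1/2^d =", 0.25)
    print("W(0.25) =", droplet_cost(0.25, 1.0))
```

In this simplified setting $w=2\sqrt{\pi}\,\tau$ and $b(2)=\pi/4$, which indeed exceeds $1/2^d=1/4$.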


Moderate Deviations and the Droplet 
Condensation 


The reason behind the different order of the
probabilities of the events $M_\Lambda +|\Lambda|m(T^{-1}) <
b|\Lambda|$, $b <0$, and $M_\Lambda + |\Lambda|m(T^{-1}) > b|\Lambda|$, $b > 0$, at
low temperatures is the following. A typical
configuration contributing to the first event contains many
small droplets of (−)-spins, of size $< \ln|\Lambda|$, floating
in the sea of (+)-spins. On the contrary, in the case
of the second event a typical configuration
contains, in addition to small droplets, one large
droplet of the size of $\Lambda$. It has a random shape,
but in the limit $\Lambda \to \mathbb Z^d$ that shape converges to a
nonrandom one, which happens to be the Wulff
shape $\mathcal W_T$. (The precise meaning of that statement
depends on dimension; in the case $d=2$ the
convergence holds in the Hausdorff metric, while in
higher dimensions it is known only in the $L^1$ sense.)
That statement makes the following question
natural: consider the event


$$M_\Lambda - E(M_\Lambda) > |\Lambda|^a, \quad 0<a<1$$

For which $a$ should we expect, in addition to
microscopic (+)-droplets of size $< \ln|\Lambda|$, the
formation of a large droplet, of volume $\sim |\Lambda|^a$, in a
corresponding typical configuration? In other
words, how many extra (+)-spins should we pump
into our system in order for the microscopic
droplets to condense into a macroscopic one? (In
the formulation of this question, we have to use the
expectation $E(M_\Lambda)$ instead of the asymptotically
equivalent quantity $-|\Lambda|m(T^{-1})$. The difference,
$E(M_\Lambda) + |\Lambda|m(T^{-1}) \sim O(|\partial\Lambda|)$, being irrelevant in
the LD case, becomes significant here.)

The answer is the following:


• if $a<d/(d+1)$, then a typical configuration
contains only microscopic droplets;

• if $a>d/(d+1)$, then any typical configuration
contains, in addition to microscopic droplets, one
large droplet of volume $\sim |\Lambda|^a$.


Therefore, the condensation happens at the value
$a = d/(d + 1)$. This picture has its counterpart in the
behavior of the probabilities of “moderate deviations”
(MD), that is, events when $M_\Lambda + |\Lambda|m(T^{-1}) > |\Lambda|^a$:


• if $a<d/(d+1)$, then the deviation is due to
independent fluctuations of sizes of many small
droplets, and the usual Gaussian behavior holds:


$$\Pr\{M_\Lambda - E(M_\Lambda) > |\Lambda|^a\}
\sim \exp\left\{-c\,\frac{(|\Lambda|^a)^2}{|\Lambda|}\right\} = \exp\{-c|\Lambda|^{2a-1}\}$$


• if $a>d/(d+1)$, then the deviation is due to the
formation of a large droplet, and so


$$\Pr\{M_\Lambda - E(M_\Lambda) > |\Lambda|^a\} \sim \exp\{-c\,(|\Lambda|^a)^{(d-1)/d}\}$$


Note that the two estimates match at $a=d/(d + 1)$.
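
The matching of the two estimates is a one-line computation on the exponents of $|\Lambda|$. The sketch below compares the Gaussian exponent $2a-1$ with the droplet exponent $a(d-1)/d$ using exact rational arithmetic; the function names are illustrative, and the unknown constants $c$ are ignored.

```python
from fractions import Fraction

def gaussian_exponent(a):
    """Power of |Lambda| in the Gaussian estimate exp{-c |Lambda|^(2a-1)}."""
    return 2 * a - 1

def droplet_exponent(a, d):
    """Power of |Lambda| in the droplet estimate exp{-c |Lambda|^(a(d-1)/d)}."""
    return a * Fraction(d - 1, d)

def threshold(d):
    return Fraction(d, d + 1)

if __name__ == "__main__":
    for d in (2, 3, 5):
        a_c = threshold(d)
        eps = Fraction(1, 100)
        for a in (a_c - eps, a_c, a_c + eps):
            print(d, a, gaussian_exponent(a), droplet_exponent(a, d))
```

Below the threshold the Gaussian exponent is the smaller one, so independent small droplets are the cheaper mechanism; above it the single large droplet costs less; at $a=d/(d+1)$ the two exponents coincide.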


Other Questions 


There are many related questions; some are partially 
solved, others are widely open, if considered on a 
rigorous mathematical level. 

One can ask about the asymptotic behavior of
probabilities of events like


$$M_\Lambda - E(M_\Lambda) = b_\Lambda$$


where the values $b_\Lambda$ lie in the LD or MD region. The
difference between such questions and those treated
above is of the same nature as the difference between
the integral and the local limit theorems. Partial answers
to them are given in Dobrushin and Shlosman (1994).

Many results about the Wulff shape and its
relation to the Ising model are known, starting with
Dobrushin et al. (1992). Some are still challenging.
One such question concerns the so-called roughening 
phase transition. It is known rigorously that the 
Wulff shape $\mathcal W_T$ in the $d \ge 3$ Ising model has flat
facets at low temperatures $T$. It is believed that such a
feature holds true only for $T < T_R$, where the
roughening temperature $T_R$ is strictly less than the
critical temperature $T_c(d)$ for $d=3$. At the
temperatures $T \in (T_R, T_c(3))$, the Wulff shape $\mathcal W_T$ does not
have facets. This conjecture seems to be very difficult.

The question about the typical behavior of
the MD of the Ising model at the threshold
value $M_\Lambda - E(M_\Lambda) \sim |\Lambda|^{d/(d+1)}$ was recently
answered in Biskup et al. (2003).


Introduction


Topological strings have been well studied since
they were introduced in the early 1990s. Essentially,
they are simplified string theories that
capture the information about a sector of the full
(or “physical”) string theory. Thus, while sharing
many of the structural features of usual string
theory, they hold out the possibility of being
amenable to explicit calculations. This is especially
true with regard to stringy quantum corrections
(the higher genus contributions from the point of
view of the string world sheet), which are normally
rather intractable in the full physical string theory.
This has allowed them to play a useful role in
enhancing the understanding of string theory and
many of its mysterious quantum properties, such as
the various dualities.

In particular, in the last several years, topological
strings have served as an important laboratory for 
testing and understanding the connection between the 
large-N expansion of gauge theories and closed- 
string theories. In this article we will sketch how 
this connection is illustrated in a duality between 
large-N Chern-Simons gauge theory and closed 
topological string theories. We will survey the origin
and current status of these developments and
indicate some of their remarkable mathematical
ramifications.


Background 


In order to appreciate the conjecture relating the 
Chern-Simons theory and topological string the- 
ories, we need to go back to the seminal work of 
’t Hooft, who pointed to the connection between the 
large-N expansion of gauge field theories and string 
theories. 

The starting point is a gauge field theory (with, 
say, gauge group U(N)), where we take the limit of 
the rank N of the gauge group to infinity (see Brezin 
and Wadia (1993) for a collection of papers on the 
topic). The idea is then to make an expansion in 
inverse powers of N for various observables such as 
the free energy and correlation functions. For 
definiteness, let us take a gauge theory containing 
only gauge fields A in the adjoint representation of 
U(N). The quantum theory is (schematically) defined 
by the path integral 


$$Z = \int [DA]\, e^{iS(A)} \qquad [1]$$


For now, the action S(A) for the gauge fields is left 
unspecified. It could be either the usual Yang-Mills 
functional or of the Chern-Simons form which we 
describe below. S(A) is normalized in such a 
way that the gauge coupling constant, denoted 
by κ, only appears via an overall multiplicative 
factor of 1/κ. 

Then the expression, for instance, for the free 
energy F = ln Z has an expansion in a power series 
in κ, whose individual terms are given by the usual 
Feynman diagrammatic rules. Namely, F is 
a sum over connected vacuum diagrams (those 
without any external legs) formed from the 
vertices determined by the action S(A). Even 
without going into the details of the action, we 
can write down the dependence on N and κ 
coming from a diagram with h faces, V vertices, 
and E edges. Every edge is associated with a 
propagator (arising from the inverse of the quadratic 
term in S(A)) and thus comes with a weight 
of κ. Every vertex, coming from the cubic and 
higher-order terms in S(A), comes with a factor of 
κ^{-1}. There is a factor of N coming from summing 
over the color indices that circulate in every loop 
(face). We thus get a weight of N^h κ^{E-V} and so the 
total contribution to the free energy can be 
organized as 

$$F = \sum_{g=0,\,h=1}^{\infty} C_{g,h}\, N^{2-2g} \lambda^{2g-2+h} \qquad [2]$$

Here we have defined λ = κN, the ’t Hooft 
coupling, as the combination that will be kept 
fixed when taking the limit of large N. We have 
also used the fact that V - E + h = 2 - 2g, where g 
is the number of handles of the closed two-dimensional 
surface one can associate with the 
Feynman diagram. (It is best to visualize the 
Feynman diagram as a “fatgraph” which forms 
the skeleton of a closed Riemann surface.) The 
coefficients C_{g,h} represent the sum of the 
contributions from all genus g diagrams with h 
boundaries and depend on the details of the 
theory. 
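
The power counting above can be sketched in a few lines of Python. This is only an illustration: the names `DiagramWeight` and `fatgraph_weight` are hypothetical, and the function assumes the diagram is a connected vacuum fatgraph on an orientable surface, so that V - E + h = 2 - 2g holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagramWeight:
    genus: int      # number of handles g of the associated surface
    n_power: int    # exponent of N after substituting kappa = lambda / N
    lam_power: int  # exponent of the 't Hooft coupling lambda = kappa * N


def fatgraph_weight(vertices: int, edges: int, faces: int) -> DiagramWeight:
    """Rewrite the raw weight N^h kappa^(E - V) as N^(2 - 2g) lambda^(2g - 2 + h)."""
    euler = vertices - edges + faces
    if euler > 2 or euler % 2 != 0:
        raise ValueError("not the Euler characteristic of a closed orientable surface")
    genus = (2 - euler) // 2
    lam_power = edges - vertices   # kappa^(E - V) = (lambda / N)^(E - V)
    n_power = faces - lam_power    # N^h * N^(-(E - V))
    return DiagramWeight(genus, n_power, lam_power)


# Two cubic vertices joined by three propagators:
planar = fatgraph_weight(vertices=2, edges=3, faces=3)     # drawn on a sphere
nonplanar = fatgraph_weight(vertices=2, edges=3, faces=1)  # drawn on a torus
print(planar, nonplanar)
```

The two example diagrams have the same vertices and edges; only the number of index loops differs, which is what shifts the nonplanar one down by a factor of 1/N^2.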

We note that the reorganization of the contributions 
to the free energy is reminiscent of the genus 
expansion in a string theory. In fact, eqn [2] as it 
stands looks like an open-string expansion on world 
sheets with g handles and h boundaries. Indeed, in 
many cases the gauge theory arises as a limit of an 
open-string theory. (Recall that a massless nonabelian 
gauge boson is one of the low-lying excitations 
of an open-string theory.) So the double expansion 
in terms of g and h is not too surprising. 

However, the interesting conjecture of ’t Hooft 
is in the relation to closed-string theory. Note 
that the expansion in inverse powers of N depends 
only on the number of handles g. In fact, 1/N 
seems to play the role of closed-string coupling in 
that it suppresses higher genus diagrams. The total 
contribution to a given genus g comes from 
summing over all the holes h in eqn [2], for 
example, 

$$F = \sum_{g=0}^{\infty} N^{2-2g} F_g(\lambda) \qquad [3]$$

The conjecture is to identify this with a closed-string 
expansion in which F_g(λ) is a closed-string amplitude 
on a genus g Riemann surface. (In carrying out 
the sum over the holes, we have assumed the 
existence of a radius of convergence. This is 
plausible since the number of planar diagrams 
(g = 0), for instance, grows only exponentially with 
the number of holes.) The question, since ’t Hooft, 
has been: what is this closed-string theory? In other 
words, what is the background on which the closed 
string propagates? 

A breakthrough came from Maldacena’s identification 
of the background for the particular case of 
U(N) 𝒩 = 4 supersymmetric Yang-Mills theory. 
His conjecture was that this theory is dual to type 
IIB closed-string theory on AdS_5 × S^5 with a 
curvature scale set by λ and with closed-string 
coupling ∝ λ/N. This proposal passed a number of 
nontrivial checks and is widely held to be true. It 
also stimulated the search for closed-string duals to 
other large-N gauge theories. 

In what follows, we explain how the conjecture of 
’t Hooft has a nice realization in the case of three-dimensional 
U(N) Chern-Simons gauge theory on 
S^3. The dual closed-string theory, obtained by 
summing over the holes, turns out to be the 
A-model topological string on the (six-dimensional) 
resolved conifold background. The parameter λ 
maps into a Kähler parameter in the closed-string 
geometry and once again the closed-string coupling 
is ∝ λ/N. 


The Large-N Expansion of Chern-Simons 
Theory 


Nonabelian Chern-Simons theory is based on the 
following action functional for the U(N) gauge 
connection A: 


$$S_{CS}(A) = \frac{k}{4\pi} \int_M \mathrm{tr}\left(A \wedge dA + \frac{2}{3}\, A \wedge A \wedge A\right) \qquad [4]$$

Here M is a three-dimensional manifold. k is called the 
level and is integer quantized for the path integral in 
eqn [1] to be single valued. Note that, classically, 
κ as defined earlier is proportional to 1/k. One of the 
nice properties of S_CS(A) is that it is independent of the 
metric on M, unlike the Yang-Mills functional. Thus, 
it is a prototype of a topological field theory. In fact, 
the observables in this theory capture topological 
information about the 3-manifold M. 

Witten succeeded in quantizing the Chern-Simons 
theory by relating its Hilbert space to the 
space of conformal blocks in the two-dimensional 
U(N) WZW theory. (For more details on the 
quantization, see Chern-Simons Models: Rigorous 
Results.) Here, merely the answers for various 
observables in the theory will be quoted. In 
particular, the free energy for the theory on S^3 
can be written in a completely explicit form: 


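$$Z(S^3, N, k) = \exp F(S^3, N, k) = \frac{1}{(k+N)^{N/2}} \prod_{j=1}^{N-1} \left[2 \sin\frac{j\pi}{k+N}\right]^{N-j} \qquad [5]$$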

One of the features one observes in the quantization 
is the shift (“finite renormalization”) of the effective 
level from k to k + N. This can also be seen in 
perturbation theory. Consequently, while taking the 
large-N limit, the natural quantity to be held fixed 
as the ’t Hooft coupling is λ = 2πN/(k + N). 

We can then carry out the ’t Hooft expansion in 
powers of λ and 1/N, of expressions, for example, 
for the free energy in eqn [5]: 


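$$F = \frac{N^2}{2}\left(\log\lambda - \frac{3}{2}\right) - \frac{1}{12}\log N + \zeta'(-1) + \sum_{g=2}^{\infty} N^{2-2g}\, \frac{B_{2g}}{2g(2g-2)} + \sum_{g=0}^{\infty} \sum_{h=2}^{\infty} F_{g,h}\, N^{2-2g} \lambda^{2g-2+h} \qquad [6]$$

The coefficients F_{g,h} are nonzero only for even h and 
are given by 

$$F_{0,h} = -\frac{2\zeta(h-2)}{(2\pi)^{h-2}(h-2)\,h\,(h-1)}, \qquad F_{1,h} = \frac{1}{6}\, \frac{\zeta(h)}{(2\pi)^{h}\, h}, \qquad F_{g,h} = \frac{2\zeta(2g-2+h)}{(2\pi)^{2g-2+h}} \binom{2g-3+h}{h} \frac{B_{2g}}{2g(2g-2)} \qquad [7]$$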


where the last line is for g > 1. B_{2g} are the Bernoulli 
numbers. The first few terms in eqn [6] are 
nonperturbative contributions which do not have a 
Feynman-diagram interpretation. The power series in 
λ is, on the other hand, of the same form as eqn [2]. 
In fact, there is an open-string interpretation for these 
terms which will be considered later. 
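
The leading (order N^2) part of this expansion can be checked numerically. The sketch below is illustrative only. It assumes the standard closed form Z = (k+N)^(-N/2) ∏_{j=1}^{N-1} [2 sin(jπ/(k+N))]^(N-j) for eqn [5], and it assumes that the planar coefficient is (1/2)(log λ - 3/2) plus the genus-zero series with F_{0,h} = -2ζ(h-2)/((2π)^(h-2) (h-2) h (h-1)), started at h = 4, where the expression is first well defined. The function names are hypothetical.

```python
import math


def cs_free_energy(n: int, k: int) -> float:
    """F = log Z for U(N) Chern-Simons theory on S^3 (closed form assumed above)."""
    total = -0.5 * n * math.log(k + n)
    for j in range(1, n):
        total += (n - j) * math.log(2.0 * math.sin(math.pi * j / (k + n)))
    return total


def planar_coefficient(lam: float) -> float:
    """Coefficient of N^2 in F, with the genus-zero series truncated at h = 8."""
    zeta = {2: math.pi**2 / 6, 4: math.pi**4 / 90, 6: math.pi**6 / 945}
    value = 0.5 * (math.log(lam) - 1.5)
    for h in (4, 6, 8):
        f0h = -2.0 * zeta[h - 2] / ((2 * math.pi) ** (h - 2) * (h - 2) * h * (h - 1))
        value += f0h * lam ** (h - 2)
    return value


n, k = 200, 1800
lam = 2 * math.pi * n / (k + n)          # 't Hooft coupling held fixed at large N
exact = cs_free_energy(n, k) / n**2
planar = planar_coefficient(lam)
print(exact, planar, exact - planar)
```

The residual difference is the size of the subleading terms (the log N, ζ'(-1) and genus-one pieces divided by N^2), which is why it shrinks as N grows at fixed λ.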

Given the explicit form of the answer, we can 
carry out the summation over the holes h. Using 
some resummation techniques, we find 

$$F = \sum_{g=0}^{\infty} N^{2-2g} (-it)^{2g-2} F_g(t) \qquad [8]$$

with t = iλ and 

$$F_g(t) = \frac{(-1)^g\, |B_{2g} B_{2g-2}|}{2g(2g-2)(2g-2)!} + \frac{|B_{2g}|}{2g(2g-2)!} \sum_{n=1}^{\infty} n^{2g-3} e^{-nt} \qquad [9]$$

(This expression is for g > 1. There are very similar 
expressions for genus 0 and 1 as well.) With the 
identification of the string coupling g_s = -it/N, the 
F_g(t) actually turn out to be the genus g amplitudes 
of a closed topological string, in line with the 
general expectation of the previous section. This is 
explained in the following. 


Topological Strings 


Physical strings are defined in terms of a two- 
dimensional sigma model (the theory on the world 
sheet) made reparametrization invariant by coupling 
to two-dimensional gravity. Topological strings are 
simpler versions of this, where the world-sheet 
theory is a two-dimensional topological sigma 
model. The latter is defined in terms of a sigma 
model (usually with N=2 superconformal symme- 
try) with an additional twist which drastically cuts 
down the physical states to a subset of the low-lying 
modes. There are actually two inequivalent twists 
denoted by A and B, respectively, but we will 
restrict to the A twist in this article. One of the 
simplifications of the A twisted sigma model is that 
the path integral localizes to contributions from only 
holomorphic maps from the world sheet to the 
target space (which will be taken to be a Calabi-Yau 
3-fold). Also, all the observables in the theory 
depend only on the Kähler parameters of the target 
space and not the complex structure parameters (see 
Topological Sigma Models as well as the book by 
Hori et al. (2003) for more details). 

The topological string theory is defined by an 
appropriate integration of the observables of the 
topological sigma model over the moduli space of 
the world-sheet Riemann surface. For instance, the 
free energy of the string theory at genus g is given by 


$$F_g^{\mathrm{top}}(t) = \int_{\mathcal{M}_g} \left\langle \prod_{i=1}^{6g-6} (b, \mu_i) \right\rangle_X \qquad [10]$$

Here b is one of the reparametrization ghost fields 
on the world sheet and μ_i are Beltrami differentials. 
The averaging is with respect to the world-sheet 
sigma model for the Calabi-Yau target X, as the 
subscript indicates. We have also shown the dependence 
of F_g on the Kähler parameters of X, 
collectively denoted by t. The localization to the 
holomorphic maps in the path integral implies that 
F_g^top(t) takes the generic form 

$$F_g^{\mathrm{top}}(t) = \sum_{\beta} N_{g,\beta}\, q^{\beta}, \qquad q^{\beta} \equiv \prod_i q_i^{n_i} \qquad [11]$$

Here q_i = e^{-t_i} and n_i are the integer coefficients 
labeling the element β ∈ H_2(X). This is in the same 
basis of 2-cycles of H_2(X) in terms of which the 
complex Kähler parameters t_i are expressed. (Recall 
that in string theory the Kähler parameters are 
complexified because of the presence of an additional 
2-form field.) The N_{g,β} are the Gromov-Witten 
invariants for X and are in general rational 
numbers. For nonzero β, the corresponding terms 
are often called world-sheet instanton contributions 
since they correspond to topologically nontrivial 
maps from the world sheet to 2-cycles in the target 
space. The all-genus free energy of the topological 
string is also defined to be 

$$F^{\mathrm{top}}(t, g_s) = \sum_{g=0}^{\infty} g_s^{2g-2} F_g^{\mathrm{top}}(t) \qquad [12]$$

with g_s being the string coupling. 

Since topological strings are related to physical 
strings by a twist on the world sheet, it is natural 
that topological string computations are related to 
computations in the physical string theory. In fact, 
as shown by Antoniadis, Gava, Narain, and Taylor 
as well as Bershadsky, Cecotti, Ooguri, and Vafa, 
observables such as F_g^top(t) are related to special 
superpotential terms in the type II string compactification 
on the Calabi-Yau X. Using duality to 
M-theory, these answers were reinterpreted by 
Gopakumar and Vafa in terms of contributions 
coming from BPS states of wrapped D-branes. This 
gives a completely different perspective on topological 
strings. For instance, the all-genus free energy 
can naturally be reorganized as 


$$F^{\mathrm{top}}(t, g_s) = \sum_{g=0}^{\infty} \sum_{\beta} \sum_{d=1}^{\infty} n_{\beta}^{g}\, \frac{1}{d} \left(2 \sin\frac{d\, g_s}{2}\right)^{2g-2} q^{d\beta} \qquad [13]$$

where the n^g_β are integer invariants (Gopakumar-Vafa) 
since they count the number of BPS states. 
This will prove to be useful in extracting all-genus 
answers for topological string amplitudes, which is 
normally quite difficult using the perturbative 
definition given earlier. 


The Large-N Dual to Chern-Simons 
Theory 


We are now in a position to state the duality 
(Gopakumar and Vafa 1999) between large-N 
Chern-Simons theory and topological strings in a 
precise way. The conjecture is that the closed 
topological string theory on the S^2 resolved conifold 
geometry is exactly dual to the U(N) Chern-Simons 
theory on S^3. The resolved conifold geometry is a 
noncompact Calabi-Yau 3-fold described by the 
equation 

$$xy - zw = 0 \qquad [14]$$

where the singularity is resolved by a 2-sphere 
x = ρz, w = ρy. The resulting space can thus be 
characterized as an O(-1) ⊕ O(-1) bundle over P^1. 
It has a single Kähler parameter t for the nontrivial 
2-cycle of the S^2. In addition, the string theory is 
characterized by the string coupling g_s. These 
parameters map on the gauge theory side to the 
’t Hooft parameter λ and N via the dictionary 


$$t = i\lambda, \qquad g_s = \frac{\lambda}{N} \qquad [15]$$

This conjecture can be checked by comparing 
various exact calculations in the Chern-Simons 
theory with corresponding calculations in the topological 
string on this conifold background. The use 
of the duality to M-theory enables us to make exact 
computations on this side as well. One of the 
nontrivial checks of this duality comes from a 
comparison of the free energies. In eqns [8] and 
[9], we already have carried out the sum over the 
holes in the Chern-Simons theory and organized it 
as a closed-string genus expansion. Note that these 
expressions are already of the form [11] expected of 
a closed topological string. One simply has to check 
that it is indeed that on the S^2 resolved conifold. 

In the language of the integer invariants n^g_β, the S^2 
resolved conifold is particularly simple. The only 
nonzero invariant is n^0_1 = 1. Physically, this corresponds 
to a single brane wrapped on the genus-zero 
S^2. Putting this into eqn [13], and making the 
expansion in powers of g_s, we find exactly eqn [9] 
for the genus-g contribution to the free energy. This 
is quite a remarkable agreement and represents a 
triumph for the ideas of large-N duality. 
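
The expansion in powers of g_s can be illustrated numerically. The sketch below makes several assumptions: it takes eqn [13] with the single invariant n^0_1 = 1, uses a real Kähler parameter t > 0 so that all sums converge, and compares only the t-dependent (world-sheet instanton) part of the genus amplitudes, with coefficients 1, 1/12, and |B_{2g}|/(2g (2g-2)!) for g > 1 multiplying Σ_d d^(2g-3) e^(-dt). The function names and the cutoff `d_max` are hypothetical.

```python
import math

# |B_{2g}| for g = 2, 3 (B_4 = -1/30, B_6 = 1/42)
ABS_BERNOULLI = {2: 1.0 / 30.0, 3: 1.0 / 42.0}


def gv_free_energy(t: float, g_s: float, d_max: int = 40) -> float:
    """All-genus sum over multicovers d for one genus-zero BPS state."""
    total = 0.0
    for d in range(1, d_max + 1):
        total += math.exp(-d * t) / (d * (2.0 * math.sin(0.5 * d * g_s)) ** 2)
    return total


def genus_amplitude(g: int, t: float, d_max: int = 40) -> float:
    """t-dependent part of the genus-g amplitude, for g = 0, 1, 2, 3."""
    if g == 0:
        prefactor = 1.0
    elif g == 1:
        prefactor = 1.0 / 12.0
    else:
        prefactor = ABS_BERNOULLI[g] / (2 * g * math.factorial(2 * g - 2))
    return prefactor * sum(d ** (2 * g - 3) * math.exp(-d * t)
                           for d in range(1, d_max + 1))


t, g_s = 2.0, 0.1
lhs = gv_free_energy(t, g_s)
rhs = sum(g_s ** (2 * g - 2) * genus_amplitude(g, t) for g in range(4))
print(lhs, rhs, lhs - rhs)
```

Truncating at genus 3 leaves a remainder of order g_s^6, so the two numbers agree far better than the size of any individual genus term.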


Geometric Transitions and Large-N 
Duality 


To understand the reason for this duality a bit 
better, we utilize an old observation of Witten that 
Chern-Simons theory is an open topological string 
theory. As mentioned earlier, the expansion [2] (or 
[6]) is suggestive of an open-string expansion in 
terms of handles and holes. Witten observed that 
the open topological string theory on the noncompact 3-fold 
T*M (with Dirichlet boundary conditions on M for 
the end points of the string) is Chern-Simons theory 
on M. In fact, in the modern language of D-branes, 
we would say that U(N) Chern-Simons theory is the 
world-volume theory of N D-branes wrapped on M, 
for the topological A-model on T*M. 

In particular, Chern-Simons theory on S^3 is the 
theory of branes wrapped on S^3 inside T*S^3. The 
latter is the conifold geometry but now deformed by 
a nonzero size S^3. It is described by the equation 

$$xy - zw = \mu \qquad [16]$$

where μ is the deformation which parametrizes the 
size of the S^3. 

The above large-N duality can be considered as an 
open-closed string duality. Namely, the theory 
of open A-model topological strings on the S^3 
deformed conifold (with N D-branes) is dual to closed 
A-model topological strings on the S^2 resolved 
conifold. Cast in this way, we see that the duality 
involves a transition in the background geometry in 
going from the open-string to the closed-string 
description. The sum over the holes changes the 
background. The S^3, as it were, shrinks to zero size 
and a transverse S^2 opens up. This geometric 
transition makes the connection between the 
Chern-Simons theory and the closed topological string 
somewhat less mysterious. Maldacena’s conjecture for 
super Yang-Mills involves a similar passage from 
D-branes in flat space to a closed-string theory on 
anti-de Sitter space. In fact, it appears as if the best way 
to understand ’t Hooft’s idea in generality is to think of 
it as an open-closed string duality. 


Further Checks and Consequences 


The free energy is not the only gauge-invariant 
observable in Chern-Simons theory. One important 
class of observables, which played an important role 
in the connection with knot invariants, are the 
Wilson loop expectation values. Given a knot K in 
S^3, we can define, in terms of an arbitrary 
representation R of U(N), the trace of the holonomy 
around the knot averaged with respect to the Chern-Simons 
path-integral measure: 

$$W_R(K) = \left\langle \mathrm{tr}_R \left( P \exp i \oint_K A \right) \right\rangle \qquad [17]$$


P denotes path ordering. Similarly, we can also 
define the expectation values of links: products of 
traces of holonomies around various interlinked 
paths. The nonperturbative solution of Chern- 
Simons theory gives exact answers for the expecta- 
tion values of these Wilson loops. The discussion 
below is, however, confined to knots. 

Since the trace of holonomies is being considered 
in different representations, it makes sense to study 
the generating functional 


$$Z(U, V) = \sum_R \mathrm{tr}_R(U)\, \mathrm{tr}_R(V) = \exp\left[ \sum_{n=1}^{\infty} \frac{1}{n}\, \mathrm{tr}\, U^n\, \mathrm{tr}\, V^n \right] \qquad [18]$$

The source V here is a U(M) matrix, unrelated to the 
U(N) holonomy U around K. The second equality in 
[18] follows from use of the Frobenius formula. It 
was shown by Ooguri and Vafa that this generating 
functional is the natural object from the point of 
view of the open-closed string duality. 
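
The exponential form of the generating functional can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: U and V are taken diagonal, their eigenvalues are treated as formal variables of modulus less than 1 (rather than phases) so that the sums converge, and the sum over representations is replaced by the Cauchy identity for Schur functions, Σ_R s_R(x) s_R(y) = ∏_{i,j} 1/(1 - x_i y_j). The function names are hypothetical.

```python
import math
from itertools import product


def power_sum(eigenvalues, n):
    """tr U^n for a diagonal matrix with the given eigenvalues."""
    return sum(x ** n for x in eigenvalues)


def exponential_form(u_eigs, v_eigs, n_max=200):
    """exp( sum_n (1/n) tr U^n tr V^n ), truncated at n_max."""
    exponent = sum(power_sum(u_eigs, n) * power_sum(v_eigs, n) / n
                   for n in range(1, n_max + 1))
    return math.exp(exponent)


def cauchy_product(u_eigs, v_eigs):
    """prod_{i,j} 1 / (1 - x_i y_j), the closed form of the sum over representations."""
    result = 1.0
    for x, y in product(u_eigs, v_eigs):
        result /= (1.0 - x * y)
    return result


u_eigs = [0.3, -0.2]        # stand-in for U, here 2 x 2
v_eigs = [0.5, 0.1, -0.4]   # stand-in for V, here 3 x 3
print(exponential_form(u_eigs, v_eigs), cauchy_product(u_eigs, v_eigs))
```

The two matrices deliberately have different sizes, reflecting that the source V and the holonomy U belong to unrelated groups.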

We have already mentioned that the U(N) Chern-Simons 
theory can be thought of as the theory of N 
topological D-branes wrapped on the Lagrangian S^3 
cycle inside T*S^3. For a knot K in the S^3, we consider 
another Lagrangian 3-cycle C_K in T*S^3 which 
intersects the S^3 exactly in K. A canonical construction 
for C_K is 

$$C_K = \left\{ (q(s), p) \in T^*S^3 \;\middle|\; p_i\, \dot{q}^i = 0 \right\} \qquad [19]$$

where the knot K is parametrized by the closed curve 
q(s). By construction, C_K intersects the S^3 in K. 
Now consider M D-branes wrapped on C_K. One now 
has to consider the fields coming from the strings 
stretching between the two sets of branes. One can 
show that integrating out these fields (which are in the 
bifundamental of the product group U(N) × U(M)) 
modifies the original Chern-Simons action to 


$$S_{\mathrm{eff}}(A) = S_{CS}(A) + \sum_{n=1}^{\infty} \frac{1}{n}\, \mathrm{tr}\, U^n\, \mathrm{tr}\, V^n \qquad [20]$$

Here V is the holonomy around K of the U(M) 
gauge field on the probe branes. Thus, this configuration of M probe 
branes gives rise exactly to the generating function 
eqn [18] for Wilson loops of K. 

The geometric transition which relates the Chern-Simons 
theory to the closed-string theory now 
suggests what one needs to do to compute this 
generating function on the closed-string side. We 
have to follow the configuration of the M probe 
branes on C_K through the conifold transition in 
which the S^3 shrinks and one blows up the S^2. It is 
not easy in general to figure out the Lagrangian 
cycle C̃_K which results from following C_K through 
the transition. It has only been done for a class of 
knots including the simple unknot. But assuming we 
know C̃_K, the generating function for Wilson loops 
is given by the free energy on the S^2 resolved 
conifold in the presence of M probe branes on C̃_K. 
This requires one to know more than the closed-string 
partition function computed earlier. We now 
also need to compute amplitudes for world sheets 
with boundary on C̃_K. These are called open-string 
Gromov-Witten invariants and the study of this 
subject is in its infancy. For simple knots such as the 
unknot, for which C̃_K is known, these can be 
computed. One finds again a remarkable agreement 
with the nonperturbative answers of Chern-Simons 
theory. Thus, the computation of knot invariants 
gets related to open-string Gromov-Witten invariants. 
There have been a number of other tests 
involving more general knots and links. One also 
has to be careful of subtleties such as in the choice 
of framing. The reader is referred to the articles 
by Mariño (2002, 2004) for these topics. 


Conclusions 


The large-N duality of ’t Hooft is realized in Chern-Simons 
theory in a very explicit way. Thanks to the 
analytic control we have over both Chern-Simons 
theory and closed topological strings, the 
conjecture passes very nontrivial checks that extend 
to all genera. This is more than we can do in the 
AdS/CFT conjecture where most computations are 
at tree level in the supergravity limit. In contrast, 
here we see the essential stringiness of the closed-string 
dual to Chern-Simons theory. 

Also, by viewing it as an open-closed string 
duality, many aspects of the correspondence were 
clarified. It, therefore, provides a useful toy model 
for a general understanding of open-closed string 
duality. Indeed, a proof of this duality using world 
sheet techniques has been proposed by Ooguri and 
Vafa. One would like to carry over some of the 
intuition that operates in this duality to the case of 
other physically interesting gauge theories. 

From the mathematical point of view, as already 
indicated, this duality leads to previously unsuspected 
relations between Gromov—Witten invariants and 
invariants of 3-manifolds, including those of knots. 
In fact, by considering more general geometric 
transitions and using this duality locally, one can 
learn about all-genus topological string amplitudes 
for a wide class of noncompact toric geometries. This 
line of development culminated in the formulation of 
the topological vertex by Aganagic, Klemm, Mariño, 
and Vafa, which captures the essence of the 
topological closed-string amplitudes for noncompact 
toric geometries. As in the case of the general 
correspondence between the gauge theory and grav- 
ity, this duality sheds new light on both sides of the 
equation. We learn to see new integrality properties 
in knot and 3-manifold invariants which have an 
interpretation in terms of enumerative problems in 
3-folds. The surprises that such a deep connection 
presages have not yet been exhausted. 
Introduction 


Gopakumar and Vafa (1999) conjectured that U(N) 
Chern-Simons gauge theory on S^3 is dual, for large 
values of N, to a closed topological string theory 
on a suitable Calabi-Yau 3-fold X. They suggested 
that this duality is realized by a geometric “transition,” 
a topological surgery which can be realized by 
birational contractions followed by complex 
deformations of Calabi-Yau varieties. Here we will 
give some general comments on the history of this 
conjecture and then present some of its mathematical 
implications; we will focus on the geometric 
transition and the novel mathematics that it has 
generated. 

A duality relating gauge theories and string 
theories (with gravity) was first conjectured by 
’t Hooft (1974). In 1998 Maldacena conjectured a 
duality between Yang-Mills gauge theory with 
𝒩 = 4 SUSY on a four-dimensional manifold M 
and type IIB closed string on the anti-de Sitter space 
AdS_5 × S^5. Chern-Simons string theory is a three-dimensional 
theory and purely topological, hence it 
is in principle simpler than four-dimensional Yang-Mills 
theory, which also involves a metric. 

In this survey, we discuss the IIA open/closed 
dualities: we will mostly be concerned with the partition 
function, that is we will be working in the context of 
“topological strings.” The duality has been extended to 
a duality of strings, adding fluxes on the closed sector 
and branes on the open sector. There is much 
mathematical evidence supporting the conjecture. 


Overview 


The conjecture says that U(N) Chern-Simons gauge 
theory on S^3 is dual, for large values of N, to type 
IIA closed topological string theory on a suitable 
Calabi-Yau manifold X. A starting point for the 
geometry, and its mathematical implications, is that 
S^3 can be thought of as a vanishing cycle in a local 
Calabi-Yau manifold Y = T*S^3, which deforms to a 
singular Calabi-Yau Y_0; X is a Calabi-Yau birational 
resolution of Y_0. X and Y are related by a 
geometric transition. In fact, Witten showed that 
quantum Chern-Simons theory on S^3 can be thought 
of as open IIA (with U(N) branes) on Y = T*S^3; thus, 
a more general conjecture says, loosely speaking, 
that open IIA theory on a Calabi-Yau manifold Y is 
dual, for large N, to closed IIA on a Calabi-Yau X 
which is related to Y via a geometric transition. A 
consequence of a physics “duality” is a matching of 
the free energies of the dual theories. In this 
particular case, if the conjecture is true, the Chern-Simons 
free energy Z(S^3, U(N)) should determine, 
and be determined by, the closed prepotential 
F_A(X, t). Note that Z(S^3, U(N)) is purely topological, 
and that F_A(X, t) includes all genera, as we will 
discuss later. A mathematical application is computing 
Gromov-Witten invariants for higher genus via 
large-N dualities (Mariño 2004). Another consequence 
involves the matching of the observables in S^3 
and X. 

This conjecture is now supported by a vast 
amount of evidence. Vafa, Gopakumar and Ooguri 
noted, via a string-theory analysis, that topological 
and knot invariants of S^3 (computed through U(N) 
Chern-Simons theory on S^3) determine and are 
determined by, for large N, the Gromov-Witten 
invariants of X in a neighborhood of the exceptional 
locus of the birational contraction X → Y_0. 

The extension to the full string theory would say 
that open string of type IIA compactified on a 
Calabi-Yau manifold Y with branes is conjectured 
to be dual to closed string of type IIA compactified 
on a Calabi-Yau manifold X with fluxes, if X and Y 
are related by a geometric transition. 

A mathematical consequence of this statement 
is that the closed Gromov-Witten invariants of X 
agree, with a suitable identification of the parameters, 
with combinations of open Gromov-Witten 
invariants and knot invariants of Y. This has been 
shown to hold for some classes of examples. 

This circle of ideas has stimulated much work in 
physics and mathematics on the nature of the 
mathematical correspondence behind this duality, as 
well as the property of the enumerative and topo- 
logical invariants involved. The “mirrors” of the above 
transitions have been studied in a series of papers, 
starting with the work of Dijkgraaf and Vafa (2002). 

The mathematics behind the open/closed dualities 
is still not understood: it is reasonable to speculate 
that the natural setup is a framework of symplectic 
field theory. 

We shall start by discussing the principal topics 
of this large-N duality: Chern—Simons quantum field 
theory, IIA closed prepotential (and Gromov—Witten 
invariants), and Chern-Simons as open string (and 
IIA open prepotential). Next we shall study the 
geometric transitions and conclude with some 
mathematical predictions of the duality. 
We shall not discuss some other interesting 
implications of this duality. For example, we shall 
not discuss its mirror IIB duality: it is known that 
the part of the closed prepotential in IIA corresponding 
to rational curves can be expressed as its IIB 
mirror dual with periods over certain suitable cycles; 
the IIA open contribution corresponding to open 
discs is expressed in terms of integrals over chains 
and the Abel-Jacobi map. We only remark that this 
large-N duality has also been interpreted as a duality 
between seven-dimensional manifolds with G_2 
holonomy. 


History 


The chronology of various important contributions 
in the field of large-N duality is as follows: 


• 1974: ’t Hooft’s conjecture 

• 1988: Clemens introduces transitions 

• 1988: Witten introduces quantum Chern-Simons 
theory on 3-manifolds 

• 1992: Witten discusses Chern-Simons theory as 
open string 

• 1998: Gopakumar-Vafa-Ooguri 

• 2001: Verification for unknot, Katz-Liu, Li, and 
Song 

• 2001: Lift to manifolds with G_2 holonomy 

• 2002: The conjecture verified for many examples 
of conifold transitions, including compact case; 
the topological vertex is introduced 

• 2003: Relations with Donaldson-Thomas invariants 


Background 


The varieties of interest in the physical theory 
must satisfy certain “supersymmetry” conditions; in 
particular, a complex algebraic manifold is required 
to be Calabi-Yau, a real seven-dimensional Riemannian 
manifold is required to have G_2 holonomy 
group. Also of particular interest are the Lagrangian 
real submanifolds of the Calabi-Yau 3-folds. By a 
Calabi-Yau manifold X we mean a manifold with 
c_1(X) = 0, h^0(Ω^k) = 0, where Ω^k is the sheaf of 
holomorphic k-forms, and 0 < k < dim(X). If 
dim X > 2, we also assume that X is simply 
connected, but not necessarily compact. For example, 
if dim(X) = 1, X is a torus, if dim(X) = 2, X is 
a K3 surface, if dim(X) ≥ 3, X is simply called a 
Calabi-Yau manifold. A compact Kähler manifold 
(M, g, J) of complex dimension m ≥ 3 is a Calabi-Yau 
variety if and only if its holonomy is SU(m). A 
subvariety L of a symplectic manifold (X, ω) is 
Lagrangian if ω|_L = 0 and dim L = (1/2) dim X. 
Sometimes we consider noncompact manifolds, 
thought of as neighborhoods of a compact projective 
Calabi-Yau manifold. Typically, our symplectic 
manifold is a Calabi-Yau 3-fold (X, ω) together 
with its Kähler form ω. If there exists an antiholomorphic 
involution, then the fixed locus is a 
Lagrangian submanifold. 


The Dualities 


We will take the point of view that dualities in 
physics imply relations between geometric invari- 
ants, without dwelling on the physics of the dualities 
themselves. A consequence of a physics “duality” is 
the matching of the prepotential of two dual string 
theories. 


A Few Comments on Chern-Simons Theory: Free 
Energy (Partition Function) 


Let L be a closed oriented manifold together with a 
principal G-bundle. The Chern-Simons 
action is defined as S(L, A) = ∫_L α(A), where α is a 
3-form on L which depends on a connection A ∈ 𝒜 and a 
suitable bilinear invariant form on the Lie algebra 𝔤. 
It is well defined under gauge transformations 
modulo the integers; e^{2πiS(A)} is well defined. In 
the large-N dualities considered here, the groups of 
interest are SU(N) and U(N). The first check of the 
duality was found with G = SU(N) and M = S^3; later 
it was discovered that the correct group for the 
matching of the observables must be U(N), while 
both can be used for the free energies. We shall 
consider G = SU(N) and M = S^3. Without loss of 
generality, the bundle can be taken to be the product 
SU(N) × S^3; any bilinear invariant form on the Lie 
algebra 𝔰𝔲(N) is necessarily an integer multiple k of 
the Cartan-Killing form on the Lie algebra. Then 
S = S(k, A) and 

$$S(k, A) = \frac{k}{8\pi^2} \int_{S^3} \mathrm{tr}\left(A \wedge dA + \frac{2}{3}\, A \wedge A \wedge A\right)$$

where k is the “level” of the theory. Witten defines 
the quantum Chern-Simons theory by taking the 
integral of the exponentiated Chern-Simons action over all possible 
connections A modulo gauge equivalence 𝒢: 

$$Z(S^3, SU(N)) = \int_{\mathcal{A}/\mathcal{G}} (DA)\, e^{2\pi i S(A)} = \int_{\mathcal{A}/\mathcal{G}} (DA)\, \exp\left( \frac{ik}{4\pi} \int_{S^3} \mathrm{tr}\left(A \wedge dA + \frac{2}{3}\, A \wedge A \wedge A\right) \right)$$

Witten shows how to calculate the free energy 
Z(S^3, SU(N)) through topological surgery, assuming 
Z(S^2 × S^1) = 1. Witten also defines the partition 
function of knots and links in L (the “expectation 
values”), which are knot and link invariants. The 
expectation values are computed by evaluating the 
trace of the holonomy transformation of a U(N) 
connection around the knot, and then taking a 
suitable average of the U(N) connections. These 
invariants depend on a choice of the framing of the 
knot (or link). 

The explicit computations involve physics, representation 
theory, and topology. If L = S^3, then: 


$$Z(S^3, SU(N)) = (k+N)^{-N/2} \sqrt{\frac{k+N}{N}} \prod_{j=1}^{N-1} \left[2 \sin\frac{j\pi}{k+N}\right]^{N-j}$$


Reshetikhin and Turaev, among others, described 
mathematically the Chern-Simons free energy and 
the expectation values. 


A Few Comments on Closed-String Theory: Free 
Energy (Prepotential) 


In IIA closed-string theory on X, a Calabi-Yau 
manifold, one considers holomorphic stable maps 
of closed Riemann surfaces of genus g,¢:Lg— X, 
with ¢,(X,) =[G] € H2(X,Z), for all genera g and 
homology classes 8 € H2(X, Z). 

Then one forms the closed prepotential F(X, t),
which encodes the enumerative invariants of
such maps to X, and which depends on the
Kähler parameters t of X. In the physics literature
the prepotential is sometimes also called the “free
energy” or the Gromov-Witten prepotential, as it
contains the Gromov-Witten invariants of X.
Setting $F_g(q) = \sum_{\beta \in H_2(X, \mathbb{Z})} C_{g,\beta}\, q^\beta$, the closed
prepotential is defined as

$$ F_{cl}(X, q) = \sum_{g \ge 0} g_s^{2g-2} F_g(q) $$

Here q is a formal variable such that $q^{\beta_1 + \beta_2} = q^{\beta_1} q^{\beta_2}$
(for $\beta_1, \beta_2 \in H_2(X, \mathbb{Z})$) and $g_s$ is the string coupling
constant. The $C_{g,\beta}$ are the genus-g Gromov-Witten
invariants of X corresponding to the class $\beta$;
they are defined as

$$ C_{g,\beta} = \int_{[\overline{M}_{g,0}(X, \beta)]^{\mathrm{vir}}} 1 $$


It is difficult to compute the invariants $C_{g,\beta}$
explicitly; in particular, there is no known general
method for calculating them. They are
computed mostly via “localization” methods, in the
presence of a suitable torus action. In the case
g = 0 the invariants are often computed via IIA-IIB
duality, calculating certain periods in the mirror
manifold W.


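Example (Faber-Pandharipande). Let $X = \mathcal{O}_{\mathbb{P}^1}(-1) \oplus \mathcal{O}_{\mathbb{P}^1}(-1)$;
X is a neighborhood of a rigid
rational curve, which can be thought of as a local
Calabi-Yau manifold; then all the effective curves
$\beta \in H_2(X, \mathbb{Z})$ must be of the form $\beta = d[\mathbb{P}^1]$, $d \in \mathbb{N}$.
Faber and Pandharipande showed that

$$ F_{cl}(X, q) = \sum_{d=1}^{\infty} \frac{q^d}{d \left(2 \sin(d g_s / 2)\right)^2} \qquad [1] $$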


This formula was proved with localization methods 
after it was conjectured by Gopakumar and Vafa using 
large-N dualities. In fact, a consequence of a duality 
between two theories is the matching of the free energies 
of two dual string theories. In this particular case, the 
conjectures imply that Chern-Simons free energy 
determines, and is determined by, the all-genus closed 
prepotential of a suitable Calabi-Yau manifold X: 


$$ Z(S^3, U(N)) \longleftrightarrow F_{cl}(X, t) $$


Note that the left-hand side is purely topological, 
as we saw in the previous section, while the right- 
hand side is holomorphic. 
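
As a consistency check on formula [1], one can expand it in powers of the string coupling. The sketch below assumes that [1] has the form $F_{cl}(X, q) = \sum_{d \ge 1} q^d / (d\,(2\sin(d g_s/2))^2)$ and uses the elementary expansion $1/(2\sin(x/2))^2 = 1/x^2 + 1/12 + x^2/240 + O(x^4)$; the labels $F_0$, $F_1$, $F_2$ for the coefficients of $g_s^{-2}$, $g_s^0$, $g_s^2$ are introduced here for illustration only.

```latex
\begin{aligned}
F_{cl}(X, q) &= \sum_{d \ge 1} \frac{q^d}{d}
  \left[ \frac{1}{d^2 g_s^2} + \frac{1}{12} + \frac{d^2 g_s^2}{240} + O(g_s^4) \right] \\
&= g_s^{-2} \sum_{d \ge 1} \frac{q^d}{d^3}
  + \frac{1}{12} \sum_{d \ge 1} \frac{q^d}{d}
  + \frac{g_s^2}{240} \sum_{d \ge 1} d\, q^d + O(g_s^4), \\
F_0 &= \mathrm{Li}_3(q), \qquad
F_1 = -\tfrac{1}{12} \log(1 - q), \qquad
F_2 = \frac{1}{240} \frac{q}{(1 - q)^2}.
\end{aligned}
```

Under this assumption the genus-zero term carries the $1/d^3$ weighting of degree-d multiple covers of the rigid rational curve.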

The trait d’union between the two prepotentials is
given by the interpretation of Chern-Simons theory
on $S^3$ as open-string theory on $T^*S^3$ and the
geometric transition.


A Few Comments on Open-String Theory 
with Branes: Open Prepotential 


Let Y be a Calabi-Yau manifold together with $\cup L_i$,
Lagrangian submanifolds; to each submanifold
$L_i$ is assigned a gauge group $G_i$: $L_i$ is wrapped
with $G_i$-branes. Here we shall focus on the case
$G_i = U(N_i)$ and we will write $(Y; L_i, U(N_i))$.

Witten shows that the open prepotential
$F_{op}(Y; \lambda_i, t_{op}, g_s)$ depends on 't Hooft's coupling
constants $\lambda_i$ associated to Chern-Simons theory on the
Lagrangian submanifolds $(L_i, U(N_i))$, together with
the open Kähler parameters $t_{op} \in H_2(Y, \cup L_i; \mathbb{Z})$, and
the string coupling constant $g_s$. To describe the open
prepotential, Witten argues, we consider all maps
of Riemann surfaces with boundary to Y, with
the condition that the boundaries are mapped to the
Lagrangian submanifolds $L_i$; one should also include
all the “highly degenerate holomorphic maps,” in
particular those which contract $\Sigma_{g,h}$ to a “ribbon
graph” on the Lagrangian $\cup L_i$. The contribution of
these highly degenerate maps is captured by the
quantum Chern-Simons theory of the Lagrangians
$(L_i, U(N_i))$.

Application 1 (Chern-Simons free energy as open
prepotential). Let us consider open IIA on
$Y = T^*S^3$ with U(N)-branes wrapped on $L = S^3$:
L is a Lagrangian submanifold with the standard
symplectic structure; note that in $T^*L$ there are no
nontrivial homology curves. Then, according to
Witten, the corresponding open prepotential
$F_{op}(Y, \cup L_i)$ must only depend on the “highly
degenerate” maps and must consist of the Chern-Simons
term $F_{CS}$ on $L = S^3$. In particular,

$$ F_{CS} = \log Z(S^3) = F_{op}(Y, \lambda, g_s) $$

where $\lambda = 2\pi N/(k+N)$ is the 't Hooft coupling
constant. Periwal (1993) showed that, for large N,
$\log Z(S^3)$ could be expanded as a closed-string
expansion:

$$ F_{CS}(\lambda) = \sum_{g \ge 0} g_s^{2g-2} F_g(\lambda) $$

where $g_s = 2\pi/(k+N)$ is the Chern-Simons coupling
constant. In 1998 Gopakumar and Vafa, using
physics arguments, deduced that the expansion
would have the closed form [1], which was later
proved by Faber and Pandharipande.


The explicit description of the open prepotential 
in the presence of homology classes is not known; 
one would need to combine the enumerative 
invariants of open maps together with the quantum 
Chern-Simons factor. We shall discuss an approach 
at the end of this note, but consider first the 
geometric transition. 


The Transition 


The conjecture says that U(N) Chern-Simons gauge
theory on $S^3$ is dual, for large values of N, to IIA
closed topological string theory on a suitable
Calabi-Yau manifold X. A starting point to find
such X is that $S^3$ is a Lagrangian 3-cycle in the
manifold $Y = T^*S^3$; performing a topological surgery
by replacing $S^3$ with $S^2$ one obtains a (local)
Calabi-Yau manifold X, on which the dual IIA theory is
compactified. The key observation is that Y can be
identified with the algebraic variety of equation
$\{xy - zw = \mu\} \subset \mathbb{C}^4$ and that this is a complex
smoothing (in fact the Milnor fiber) of $Y_0$ with
equation $\{xy - zw = 0\} \subset \mathbb{C}^4$. On the other hand, X
is a small resolution of this singularity, where $\mathbb{P}^1$ is
the exceptional locus of the birational contraction.
The origin is an “ordinary double point” singularity
and the nontrivial sphere $S^3 \subset Y$ is the vanishing
cycle of the degeneration. The manifolds involved
are noncompact: the exceptional curve $[\mathbb{P}^1] = t$ is
the only nontrivial homology class in X, and the
enumerative invariants in X can be thought of as the
contribution of the exceptional curve in a neighborhood
of a Calabi-Yau manifold. We shall present
the steps leading to this construction and the
evidence for the conjecture.


The Local Construction of X 
Let $Y_\mu = \{(w_1, \ldots, w_4) \in \mathbb{C}^4 \text{ such that } \sum_{j=1}^{4} w_j^2 = \mu\}$.


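Proposition 1 Let $\mu$ be a nonzero real positive
parameter; then:

• $L = S^3 \subset T^*S^3$ is a Lagrangian submanifold of
  $T^*S^3$ with its standard symplectic structure;
• $T^*S^3 \cong Y_\mu$ and $L \cong L_\mu := \{\sum_j \mathrm{Re}(w_j)^2 = \mu\} \subset Y_\mu$.

In fact, we can embed $T^*S^3$ in $\mathbb{R}^8$ as

$$ \sum_{j=1}^{4} q_j^2 = 1, \qquad \sum_{j=1}^{4} q_j p_j = 0 $$

where $S^3 = \{p_j = 0\}$; consider then the morphism
$\mathbb{C}^4 \to \mathbb{R}^8$ defined by setting

$$ q_j = \frac{\mathrm{Re}(w_j)}{\sqrt{\mu + \sum_k \mathrm{Im}(w_k)^2}}, \qquad p_j = \mathrm{Im}(w_j) $$

which induces the diffeomorphism $Y_\mu \cong T^*S^3$ of the
statement.


Remark 1 Let $Y_0 = \{\sum_{j=1}^{4} w_j^2 = 0\} \subset \mathbb{C}^4$; then:

• $Y_0$ is singular at the origin,
• $Y_\mu$ is a complex deformation of $Y_0$, and
• $L_\mu$ is called a “vanishing cycle.”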


With a change of coordinates we can write the
equation of $Y_0$ as $\{xy - zw = 0\}$; the singularity is
still at the origin. This singularity is an ordinary
double point, which is often referred to in the physics
literature as “the conifold singularity.” Let
$X \subset \mathbb{C}^4 \times \mathbb{P}^1$ be defined by:

$$ \lambda z + \nu y = 0, \qquad \lambda x + \nu w = 0, \qquad [\lambda, \nu] \in \mathbb{P}^1 $$

Remark 2 X is smooth and the morphism

$$ \phi: X \to Y_0, \qquad ((x, y, z, w), [\lambda, \nu]) \mapsto (x, y, z, w) $$

restricts to an isomorphism $(X \setminus \mathbb{P}^1) \cong (Y_0 \setminus \{0\})$ and
maps $\mathbb{P}^1$ to $(0, 0, 0, 0) \in \mathbb{C}^4$. $\phi$ is a small (nondivisorial)
birational resolution of the singularity at the origin.
$Y_\mu$ is a deformation (smoothing) of $Y_0$. Note that
topologically $S^3 \cong L_\mu \subset Y_\mu$ has been replaced by
$\mathbb{P}^1 \cong S^2 \subset X$. The algebraic properties of the
topological surgery between $Y_\mu$ and X were first studied
by Clemens in 1988.
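
One possible change of coordinates relating the two descriptions of the conifold can be written out explicitly. The choice below is an illustrative assumption (other linear changes work equally well); here $w_1, \ldots, w_4$ are the coordinates of the quadric $\sum_j w_j^2 = \mu$ and x, y, z, w are the coordinates in which the equation reads $xy - zw = \mu$.

```latex
\begin{aligned}
x &= w_1 + i w_2, & y &= w_1 - i w_2, \\
z &= -(w_3 + i w_4), & w &= w_3 - i w_4, \\
xy - zw &= (w_1^2 + w_2^2) + (w_3^2 + w_4^2) = \sum_{j=1}^{4} w_j^2 .
\end{aligned}
```

The same coordinates make it easy to check that the small resolution maps into the singular quadric: on the chart $\nu \neq 0$ of $\mathbb{P}^1$, the equations $\lambda z + \nu y = 0$ and $\lambda x + \nu w = 0$ give $y = -(\lambda/\nu) z$ and $w = -(\lambda/\nu) x$, so that $xy - zw = -(\lambda/\nu)(xz - zx) = 0$.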


Transitions in Geometry 


A transition between X and Y is a birational
contraction from a smooth Calabi-Yau X to a
singular variety $Y_0$ followed by a complex deformation
to another smooth Calabi-Yau manifold Y:

$$ \begin{array}{ccc} & & X \\ & & \downarrow \\ Y & \rightsquigarrow & Y_0 \end{array} $$

The vanishing cycles of the complex deformation,
$\cup L_i$, are always Lagrangian submanifolds of Y. The
transition makes sense if $\dim(X) = \dim(Y) \ge 2$ and
it is nontrivial if $\dim(X) = \dim(Y) \ge 3$, when the
topology of X is different from the topology of Y.
The possible transitions among Calabi-Yau 3-folds
have been classified.


Conjecture 1 Let X and Y be Calabi-Yau manifolds
related by a geometric transition: then IIA
open theory with $U(N_i)$ branes compactified on
$(Y, \cup L_i)$ is dual to IIA closed theory compactified
on X (with fluxes).


As a consequence: 


Conjecture 2 Let X and Y be Calabi-Yau manifolds
related by a geometric transition: then
$F_{op}(Y, \lambda_i, g_s, t_{op}) = F(X, q, g_s)$ for a suitable
identification of the parameters.


The results stated in the previous section can be
summarized in the following statement, which
proves the above conjecture in the special case
of a local conifold transition:


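Theorem 1 Let $X \cong \mathcal{O}_{\mathbb{P}^1}(-1) \oplus \mathcal{O}_{\mathbb{P}^1}(-1)$ and
$Y = T^*S^3$ with U(N) branes wrapped on $L = S^3$.
Then X and Y are related by a conifold transition
and $F_{CS}(S^3) = F_{op}(Y, \lambda) = F_{cl}(X, q)$, with the
identification

$$ \lambda = \frac{2\pi N}{k+N}, \qquad g_s = \frac{2\pi}{k+N} $$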





This matching of the free energies is supporting 
evidence for the large-N conjecture. At this moment, 
we still do not know if Conjectures 1 and 2 hold for 
more general transitions. 


A Few Comments on Knots and Links 


Later, Ooguri and Vafa extended the conjecture to
the observables, that is, by adding knots and links in
$S^3$; the guiding principle is that a knot (or link) $C \subset S^3$
should determine a noncompact Lagrangian submanifold
$L_C \subset X$; it is conjectured that the knot
(and link) invariants, expressed as expectation




values, should determine and be determined by the
enumerative invariants of morphisms of bounded
Riemann surfaces, with boundaries mapped onto
$L_C$. We refer to these invariants as open Gromov-Witten
invariants. While both statements have been
verified with mathematical techniques only when C
is the unknot, there is much supporting evidence for
the conjecture in general. We will not describe these
aspects here but only make a few remarks.

The expectation values of a knot C are computed 
by taking first the trace of a holonomy matrix of a 
U(N) connection A along C and then integrating over 
all connections (modulo gauge equivalence). As for 
the case of the Chern-Simons free energy, the 
definition of expectation values has been worked 
out both in the realm of physics and of mathematics. 
The expectation values are knot and link invariants, 
and depend on a choice of the framing of the knot (or 
link). The open Gromov-Witten invariants have not
yet been constructed, as we shall discuss in the
following section; however, starting with the work of
Katz and Liu, and of Li and Song, open invariants have been
successfully calculated in the presence of a torus 
action. The resulting invariants do depend on the 
choice of the torus action, which has been shown to 
match the choice of the framing of the knot (or link). 


More on the Open Prepotential 


The open Gromov-Witten invariants, in analogy with
the closed case, should “count” in an appropriate sense
open morphisms; at this point, it is not known how to
define this quantity. To proceed in analogy with the
closed case, one would need to define the appropriate
moduli space of open maps and its virtual fundamental
class. On the other hand, open invariants have been
successfully calculated in the presence of a torus
action, assuming the existence of the moduli space and
virtual fundamental class and that the Atiyah-Bott
localization theorems can be applied. We shall follow
this approach in sketching how the IIA prepotential
has been computed in many examples.


Open Invariants 


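Let $\beta \in H_2(Y, \cup L_i; \mathbb{Z})$ be the relative homology
class of Riemann surfaces in Y with boundary on the
union of the Lagrangian 3-cycles $\cup_i L_i$, and let
$\gamma \in H_1(\cup L_i; \mathbb{Z})$ be a class.

If $\Sigma_{g,h}$ is a Riemann surface of genus g with h
boundary components, let $\phi: \Sigma_{g,h} \to Y$ be a morphism
with

$$ \phi_*[\Sigma_{g,h}] = \beta \in H_2(Y, \cup L_i; \mathbb{Z}) $$

The open generating function is

$$ F_{op}(Y, \cup L_i, t_{op}, g_s) = \sum_{g, h \ge 0} g_s^{2g-2+h} F_{g,h}(t_{op}) $$

with

$$ F_{g,h}(t_{op}) = \sum_{\beta, \gamma} C_{g,h,\beta,\gamma}\, q^\beta y^\gamma $$

Here q and y are formal variables such that
$q^{\beta_1 + \beta_2} = q^{\beta_1} q^{\beta_2}$ and $y^{\gamma_1 + \gamma_2} = y^{\gamma_1} y^{\gamma_2}$, for
$\beta_1, \beta_2 \in H_2(Y, \cup L_i; \mathbb{Z})$ and $\gamma_1, \gamma_2 \in H_1(\cup L_i; \mathbb{Z})$; $t_{op}$ is the open
Kähler parameter, $g_s$ is the string coupling constant,
and $C_{g,h,\beta,\gamma}$ should “count” in an appropriate sense
the maps $\phi$.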


Example (Ooguri-Vafa; Katz-Liu; Li-Song). If
$Y = \mathcal{O}_{\mathbb{P}^1}(-1) \oplus \mathcal{O}_{\mathbb{P}^1}(-1)$, then t is the class of $\mathbb{P}^1 \cong S^2$
and t/2 represents the class of the lower hemisphere in
$S^2$. The Lagrangian L is the Lagrangian $L_C$ of the
previous sections, which corresponds to the unknot in
$S^3 \subset Y$; it is the fixed locus of an antiholomorphic
involution on X and it intersects $S^2$ in an equator.
Then, for a suitable choice of the torus action:


$$ F_{op}(Y, L, t_{op}, g_s) = \sum_{d \ge 1} \frac{y^d\, e^{-dt/2}}{2d \sin(d g_s / 2)} $$


There is a complete form for more general torus 
actions. The above formula was first computed by 
Ooguri and Vafa, using string-theory arguments, 
and then computed by the mathematicians, Katz and 
Liu, and Li and Song. 


More on the Open IIA Prepotential 


If there is only one rigid open curve in Y, say a disk
C, with boundary on $L \subset Y$, then, as Witten
showed, the open prepotential is a combination of
the open enumerative invariants as described above
with $\beta = d[C]$ and $\gamma = \partial C$ and the expectation values
of the unknot $\partial C$. The variable y is replaced by the
trace of the holonomy of a connection.

In the presence of a torus action, one can treat the
fixed locus as if it were rigid and proceed accordingly.


With these techniques, Conjecture 2 has been
verified for many cases of conifold transitions, with
$t_{op}$ nontrivial, for a suitable identification of the
parameters, including when both X and Y are
compact manifolds (Diaconescu-Florea 2003).


See also: AdS/CFT Correspondence; Chern-Simons
Models: Rigorous Results; Large-N and Topological
Strings; Mirror Symmetry: A Geometric Survey; String
Field Theory.

Introduction


As a prototype of lattice gauge theory, quantum 
chromodynamics (QCD) will be considered in this 
article. All statements about QCD can easily be 
extended to other theories, with different gauge 
group and different content of particles. 

QCD is a gauge theory with gauge group SU(3) 
(color group), coupled to spin-1/2 particles (quarks) 
belonging to the fundamental representation of the 
color group. There exist in Nature six different
species (flavors) of quarks, with masses ranging
from $m_{up} \approx 5$ MeV to $m_{top} \approx 180$ GeV: the values of
these masses are determined by other interactions
and can be treated as input parameters of the theory,
as can the number of quark flavors. In standard
notation, the Lagrangian reads


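$$ L = -\frac{1}{2} \mathrm{tr}\left(G_{\mu\nu} G_{\mu\nu}\right) + \sum_f \bar\psi_f \left(i \gamma_\mu D_\mu - m_f\right) \psi_f \qquad [1] $$

The sum runs over the six quark flavors f.
$G_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu + ig[A_\mu, A_\nu]$ is the field strength
tensor, $A_\mu = \sum_a T^a A^a_\mu$ the (gluon) gauge field,
$T^a$ (a = 1, ..., 8) are the eight generators of the
gauge group in the fundamental representation,
normalized as $\mathrm{tr}(T^a T^b) = (1/2)\delta^{ab}$. $\psi_f$ is a color
triplet of fields. Under a gauge transformation U(x),

$$ \psi_f(x) \to \psi'_f(x) = U(x)\, \psi_f(x) \qquad [2] $$

$$ A_\mu(x) \to A'_\mu(x) = U(x) A_\mu(x) U^\dagger(x) + i U(x) \partial_\mu U^\dagger(x) \qquad [3] $$

$D_\mu$ is the covariant derivative of $\psi_f$,

$$ D_\mu \psi_f = \left(\partial_\mu - i g A_\mu\right) \psi_f \qquad [4] $$

and transforms like $\psi_f$ by construction.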

L is invariant under the gauge transformation 
equations [2] and [3]. As a consequence of gauge 
invariance, the theory has one single coupling 
constant g. 

To make connection with the observations, one 
has to solve the theory, that is, one has to construct 
a Hilbert space on which the fields act as operators 
obeying the equations of motion and the canonical 
commutation relations. In textbook field theory, 
this is done by splitting the Lagrangian L into two
parts:

$$ L = L_0 + L_I \qquad [5] $$

with $L_0$ the part of L which is bilinear in the fields
and $L_I$ the rest. $L_0$ can be solved exactly since it
describes free particles and the corresponding
equations of motion are linear. The resulting Hilbert
space is the Fock space of free particles. $L_I$ is treated
as a perturbation producing scattering between the
fundamental particles. This approach works well 
in quantum electrodynamics, where the observed 
particles (electrons and photons) coincide with the 
excitations of the fundamental fields of the 
Lagrangian. 

In QCD, the fundamental excitations (the quarks
and the gluons) are not observed as particles, either in
Nature or as products of high-energy collisions
between elementary particles. This feature is known
as confinement of color. The conjecture is that 
excitations with nontrivial color are forbidden to 
propagate as free particles. However, if hadrons are 
probed at short distances by photons or by leptons, 
everything works as if they were composite states of 
quarks. The accepted explanation relies on asymp- 
totic freedom: the effective coupling constant 
becomes small at short distances (high momentum 
transfers) and the constituents behave as free 
particles. 

At large distances, the fundamental excitations are 
not observed, the interaction is strong and the 
perturbative picture describing scattering between 
quarks and gluons is not adequate for the real 
world. 

An alternative quantization procedure is needed 
which does not rely on perturbation theory. A 
formally exact quantization procedure is the Feynman 
path integral. The solution of the theory is given in 
terms of a functional integral Z[J], which generates 
the correlators of the fields in the ground state 
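(vacuum). Indicating symbolically the Lagrangian
coordinates, namely the fields, by a single symbol $\Phi$,
one has

$$ Z[J] = \int \prod_x d\Phi(x)\, \exp\left(-S_E[\Phi] - \int J(x) \Phi(x)\, dx\right) \qquad [6] $$

The connected Euclidean vacuum correlators are
given in terms of functional derivatives of Z[J]:

$$ \langle 0 | T(\Phi(x_1) \Phi(x_2) \cdots \Phi(x_n)) | 0 \rangle_{conn} = \left. \frac{(-1)^n}{Z[J]} \frac{\delta^n Z[J]}{\delta J(x_1) \cdots \delta J(x_n)} \right|_{J(x)=0} \qquad [7] $$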

“Euclidean” means that they are analytic continuations
to imaginary times. Going to the Euclidean
system is necessary to isolate the vacuum state. The
amplitudes can be analytically continued back to
Minkowski space. The Hilbert space and all the
physical observables can be constructed in terms of
the correlators, a property known as the reconstruction
theorem. Formally (i.e., with the understanding that everything
makes sense only if the functional integral exists),

$$ \langle 0 | T(\Phi(x_1) \Phi(x_2) \cdots \Phi(x_n)) | 0 \rangle_{conn} = \frac{1}{Z} \int \prod_x d\Phi(x)\, \exp(-S_E[\Phi])\, \Phi(x_1) \Phi(x_2) \cdots \Phi(x_n) \qquad [8] $$

The continuation to imaginary time changes the sign of
the kinetic energy, and Z formally becomes the partition
function of a four-dimensional statistical model with
Hamiltonian $S_E[\Phi]$, a general fact in Feynman integrals.

By definition of functional integral, Z is defined 
by discretizing a finite volume V of spacetime to a 
finite set of points and then sending their number to 
infinity, making the set dense in V. If the limit exists, a
$Z_V$ is obtained. The volume V is then sent to infinity,
to cover the whole spacetime (thermodynamical
limit) and $Z_V$ eventually converges to Z. A rigorous
proof of the existence of these limits does not exist 
for QCD, but there are qualitative arguments that 
this is the case, which will be presented below. 

In the lattice formulation of field theory, a regular 
lattice, usually cubic, is taken as a discretization of 
spacetime. 

From the very definition of Feynman integral, it 
follows that the formulation of field theory on the 
lattice is nothing but an approximation to the limit 
which defines Z. It will provide a good approxima- 
tion if the lattice spacing is small enough with 
respect to the physical lengths involved and if the 
lattice is large compared to them. 

Perturbation theory amounts to splitting the action
into a bilinear term $S_0$ and an interaction term $S_I$
containing the higher powers of the fields. The Z
integral is then computed by expanding the weight
in a power series of $S_I$:

$$ \int \prod_x d\Phi(x)\, \exp(-S_0 - S_I) = \int \prod_x d\Phi(x)\, \exp(-S_0) \sum_n \frac{(-S_I)^n}{n!} \qquad [9] $$

The Feynman integral thus becomes Gaussian, can
be computed, and gives the usual perturbative
expansion. The two limits (integral and series
expansion) do not commute in general. For QCD,
there are indeed arguments that the renormalized
perturbative expansion does not converge and is
plagued by singularities known as renormalons.


Wilson’s Formulation 


For field theories of scalar particles, the lattice 
discretization is performed by assigning a value of 
the field to each site of the lattice. The Wilson 
formulation for gauge theories is not made in terms of 
the fields $A_\mu$, which are defined in the Lie algebra of
the gauge group, but in terms of parallel transports,
which are elements of the group itself. The building
blocks are parallel transports along links parallel to
spacetime axes connecting neighboring sites

$$ U_\mu(x) = P \exp\left(i g \int_x^{x+\hat\mu} A_\mu\, dx^\mu\right) \approx \exp\left(i g a A_\mu(x)\right) \qquad [10] $$

where $\hat\mu$ indicates the vector of length a in the $\mu$
direction and P the ordered product. The last
approximate equality is valid in the limit of small
lattice spacing a. g is the coupling constant.

Under a gauge transformation V(x),

$$ U_\mu(x) \to V(x)\, U_\mu(x)\, V^\dagger(x + \hat\mu) \qquad [11] $$

It follows from eqn [11] that the trace of the parallel transport
along a closed path is gauge invariant. The density
of action can be written in terms of the parallel
transport along the elementary square of links in the
$\mu\nu$ plane, $\Pi_{\mu\nu}$, known as the plaquette:

$$ \Pi_{\mu\nu} = \mathrm{tr}\left[U_\mu(x)\, U_\nu(x + \hat\mu)\, U_\mu^\dagger(x + \hat\nu)\, U_\nu^\dagger(x)\right] \qquad [12] $$

By expanding in powers of a, one easily finds

$$ \Pi_{\mu\nu} = N_c - \frac{1}{2} a^4 g^2\, \mathrm{tr}\left(G_{\mu\nu} G_{\mu\nu}\right) + O(a^6) \qquad [13] $$

with $N_c$ the number of colors, 3 for QCD. The
lattice action can be defined as

$$ S = \beta \sum_{x,\, \mu < \nu} \left(1 - \frac{1}{N_c}\, \mathrm{Re}\, \Pi_{\mu\nu}\right) \qquad [14] $$

with $\beta = 2N_c/g^2$, and tends to the continuum action
as $a \to 0$, up to corrections $O(a^2)$. An infinite number of higher-order
terms in a exist, which come from the expansion of
the links, but they are expected to be irrelevant in
the continuum limit $a \to 0$.
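
The gauge invariance of the plaquette action can be checked numerically on a toy model. The sketch below is not the SU(3) theory in four dimensions: as a simplifying assumption it uses the group U(1) (so $N_c = 1$ and each link is a single phase) on a small periodic two-dimensional lattice. The names `plaquette`, `wilson_action` and `gauge_transform`, the lattice size and the value of beta are all illustrative choices.

```python
import cmath
import math
import random

L = 4  # sites per direction on a small periodic 2D lattice (illustrative size)


def random_phase(rng):
    """A random element of U(1)."""
    return cmath.exp(1j * rng.uniform(-math.pi, math.pi))


def shift(x, y, mu):
    """Neighbouring site in direction mu, with periodic boundary conditions."""
    return ((x + 1) % L, y) if mu == 0 else (x, (y + 1) % L)


def plaquette(U, x, y):
    """Ordered product of the four links around the elementary square at (x, y)."""
    x0, y0 = shift(x, y, 0)
    x1, y1 = shift(x, y, 1)
    return (U[0][x][y] * U[1][x0][y0]
            * U[0][x1][y1].conjugate() * U[1][x][y].conjugate())


def wilson_action(U, beta):
    """beta * sum over plaquettes of (1 - Re plaquette), with N_c = 1 for U(1)."""
    return beta * sum(1.0 - plaquette(U, x, y).real
                      for x in range(L) for y in range(L))


def gauge_transform(U, V):
    """Apply U_mu(x) -> V(x) U_mu(x) V(x + mu)^* to every link."""
    out = [[[0j] * L for _ in range(L)] for _ in range(2)]
    for mu in range(2):
        for x in range(L):
            for y in range(L):
                xs, ys = shift(x, y, mu)
                out[mu][x][y] = V[x][y] * U[mu][x][y] * V[xs][ys].conjugate()
    return out


rng = random.Random(0)
# U[mu][x][y] is the link leaving site (x, y) in direction mu.
U = [[[random_phase(rng) for _ in range(L)] for _ in range(L)] for _ in range(2)]
V = [[random_phase(rng) for _ in range(L)] for _ in range(L)]

S_before = wilson_action(U, beta=2.0)
S_after = wilson_action(gauge_transform(U, V), beta=2.0)

# "Cold" configuration: all links equal to the identity, so every plaquette is 1.
cold = [[[1 + 0j] * L for _ in range(L)] for _ in range(2)]
S_cold = wilson_action(cold, beta=2.0)
```

In this toy model `S_before` and `S_after` agree up to rounding, because the site factors V(x) cancel pairwise around each closed square, and the cold configuration has vanishing action.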

The measure of the Feynman integral is assumed 
to be the Haar measure of the gauge group for each 
link, which again can be shown to tend to the 
continuum measure in the continuum limit. 


Everything is gauge invariant, contrary to the 
perturbative formulation, where a gauge fixing is 
required to define the vector meson propagator. 

By Weierstrass theorem, the integral is finite for any 
finite number of links, the gauge group being compact. 

Any other choice of the lattice action differing from 
the Wilson action of eqn [14] by terms of higher order in 
a will have the same continuum limit: there is significant 
freedom in the choice of the action. 

In the language of statistical mechanics, the 
Euclidean lattice formulation is a spin model. 
Different choices of the action correspond to different 
spin models. In the vicinity of a second-order phase 
transition, however, the correlation length becomes 
large with respect to the lattice spacing and all the 
irrelevant terms become negligible. All the spin 
models at the critical point belong to the same 
universality class and define the same field theory. 

This is what happens for QCD because of
asymptotic freedom. By renormalization group
arguments, the lattice spacing behaves as

$$ a(\beta) = \frac{1}{\Lambda} \exp(-b_0 \beta) \qquad [15] $$

at sufficiently large $\beta$, where $-b_0$ is the coefficient of the
lowest-order term of the $\beta$-function, $b_0$ is positive and $\Lambda$
is a physical scale. As $\beta \to \infty$, a tends exponentially to
zero in physical units and the coarse structure of the
lattice becomes unimportant, indicating that the short-distance
limit in the definition of the Feynman integral
exists. The theory also develops a mass scale $\Lambda$ which
ensures the existence of a finite correlation length and
hence of the thermodynamical limit. In practice, when $\beta$
is increased, the lattice spacing becomes exponentially
small in physical units. As a consequence, however, the
physical scale becomes exponentially large in lattice
units, and an exponentially large lattice is needed to
ensure the large-distance convergence. This makes life
difficult if the Feynman integral has to be computed
numerically.


Quarks 


Fermion fields are defined on lattice sites. The
naive lattice transcription of the fermion term
in eqn [1] consists in replacing the covariant
derivatives by finite differences with parallel
transports to make the result gauge covariant. In
principle, $D_\mu \psi(x) = U_\mu(x) \psi(x + \hat\mu) - \psi(x)$ is a correct
definition. In practice, a more symmetric difference
is used which is correct to $O(a^2)$, namely

$$ D_\mu \psi(x) = \frac{1}{2} \left[U_\mu(x)\, \psi(x + \hat\mu) - U_\mu^\dagger(x - \hat\mu)\, \psi(x - \hat\mu)\right] \qquad [16] $$




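The fermionic Lagrangian then reads

$$ \sum_x \bar\psi(x) \left(i \gamma_\mu D_\mu - m\right) \psi(x) = \sum_{x, x', \alpha, \beta} \bar\psi_\alpha(x)\, M_{x\alpha,\, x'\beta}\, \psi_\beta(x') \qquad [17] $$

It is convenient to indicate this expression in the
form $S_f = \bar\psi M \psi$, where $\psi$ is a large column whose
elements are labeled by the site x and by the
component $\alpha$. The functional integral over $\psi$ can
explicitly be done by using the standard rules of
integration on Grassmann variables, since the action
is bilinear,

$$ Z = \int \prod_{x,\mu} dU_\mu(x)\, d\bar\psi(x)\, d\psi(x)\, \exp\left(-S_E[U] - \bar\psi M \psi\right) \qquad [18] $$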


The result is 


$$ Z = \int \prod_{x,\mu} dU_\mu(x)\, \exp(-S_E[U])\, \det M \qquad [19] $$


The effect of fermions is to multiply the weight by a 
functional determinant which depends on the gauge 
field configuration. 

A problem exists, however, in this procedure
already at the level of free fermions, that is, putting
U = 1 in the action and in the determinant of
eqn [18]. The equation of motion reads, in Fourier
transform,

$$ \left[\sum_\mu \gamma_\mu \sin\left(\frac{2\pi k_\mu}{L}\right) - m\right] \psi(k) = 0 \qquad [20] $$

With respect to the continuum, the momentum
$p_\mu = 2\pi k_\mu / L$ has been replaced by its sine. At
small values of $p_\mu$, eqn [20] coincides with the
Dirac equation. However, an alternative solution
exists at $p_\mu \approx \pi$, for each $\mu$ independently. The new
equation differs from the other by a change of sign
of $\gamma_\mu$. Changing the sign of one of the gammas means
changing the sign of $\gamma^5 = \gamma^1 \gamma^2 \gamma^3 \gamma^4$, which is the
chirality of the fermion. Instead of one fermion,
we then have $2^4 = 16$ fermion species, organized in
pairs with opposite chiralities. It is impossible to
have a single fermion with a given chirality. A
number of recipes have been proposed to circumvent
this artifact of the lattice regulation, for
example: introduce by hand a term in the action
which removes the spurious particles in the limit of
zero lattice spacing (Wilson's fermions); or double the
lattice spacing by constructing two sublattices on
even and odd sites, respectively, which propagate
fermions of opposite chirality (staggered fermions),
so that the argument of the sine in the derivative is
doubled. More recently, an idea which goes back to
Ginsparg and Wilson has been implemented, which
consists in replacing a strictly local equation of
motion like eqn [20] by an equation with the
same continuum limit which is nonlocal, but with a
nonlocality falling off exponentially at large
distances, a recipe which makes propagation of
chiral fermions possible. This is an important
improvement, even if very demanding in computer
power.
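
The counting of the spurious fermion species can be made concrete with a short enumeration. This is only an illustrative sketch in lattice units: it lists the corners of the Brillouin zone where every component of the lattice momentum sin(p) vanishes, and assigns to each corner the sign that the chirality picks up, assuming one sign flip per component sitting at pi. The helper names are hypothetical.

```python
import itertools
import math


def lattice_momentum(p):
    """Naive lattice replacement of a continuum momentum component (lattice units)."""
    return math.sin(p)


# Each of the four momentum components can sit near 0 or near pi,
# since sin(p) vanishes at both points.
corners = list(itertools.product((0.0, math.pi), repeat=4))
massless_poles = [c for c in corners
                  if all(abs(lattice_momentum(p)) < 1e-12 for p in c)]


def chirality_sign(corner):
    """Relative chirality: one sign flip for every component located at pi."""
    flips = sum(1 for p in corner if p != 0.0)
    return (-1) ** flips


n_plus = sum(1 for c in massless_poles if chirality_sign(c) == +1)
n_minus = sum(1 for c in massless_poles if chirality_sign(c) == -1)
print(len(massless_poles), n_plus, n_minus)
```

The enumeration gives sixteen corners, split evenly between the two chiralities, which is the pairing described in the text.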


Numerical Simulations 


Solving analytically the lattice version of QCD
would allow one to follow constructively all the
steps which lead to the definition of Z, that is, the
ultraviolet and the infrared limit, as explained
earlier. Presently that is out of reach. An
attempt by Wilson to solve the lattice renormalization
group equations by techniques of decimation was
also not conclusive.

The problem can be attacked numerically. One 
way would be to compute the integral numerically. 
That is, however, prohibitive: it would be like 
solving exactly the equations of motion for the 
molecules of a gas. The lattice theory is in fact a
four-dimensional statistical mechanics with the
Boltzmann factor $\beta = 2N_c/g^2$ and Hamiltonian
equal to the Euclidean action. As in statistical
mechanics the way out is to create a significant
sample of configurations with weight $\exp(-\beta S_E)$
and to determine the field correlators which describe
physics by an average on this ensemble. This is done
by Monte Carlo techniques.

The basic principle is to start from an arbitrary
field configuration and make a sequence of
random changes, normally on a single link at a
time, with uniform probability in the group
measure so as to converge toward the equilibrium
distribution $\exp(-\beta S_E)$. For that purpose, the
probability $P_{C \to C'}$ to change from a configuration C
to another C' is constrained to obey the detailed
balance relation

$$ P_{C \to C'} \exp(-\beta S[C]) = P_{C' \to C} \exp(-\beta S[C']) \qquad [21] $$

A common algorithm is known as Metropolis. The
way to implement the condition (eqn [21]) is to accept
the new trial configuration C' if S[C'] < S[C], and to
accept it with probability $\exp(-\beta[S(C') - S(C)])$ if
S[C'] > S[C]. An alternative method is known as
“heat-bath”. If the probability of the configuration for
one link at a fixed value of the other variables is
explicitly known, the change can be accepted with that
probability.
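
The Metropolis acceptance rule and its relation to detailed balance can be sketched in a few lines. The functions below are illustrative and act only on the values of the action before and after a proposed change; they assume that the trial change is proposed symmetrically (the proposal C to C' is as likely as C' to C), and the numerical values of the actions and of beta are hypothetical.

```python
import math
import random


def acceptance_probability(S_old, S_new, beta):
    """Metropolis rule: always accept a lower action, else accept with exp(-beta * dS)."""
    delta = S_new - S_old
    return 1.0 if delta <= 0.0 else math.exp(-beta * delta)


def metropolis_accept(S_old, S_new, beta, rng):
    """Draw the accept/reject decision for one trial change."""
    return rng.random() < acceptance_probability(S_old, S_new, beta)


# Check the detailed balance relation for a pair of hypothetical configurations.
beta = 2.0
S_C, S_Cprime = 0.3, 1.1
lhs = acceptance_probability(S_C, S_Cprime, beta) * math.exp(-beta * S_C)
rhs = acceptance_probability(S_Cprime, S_C, beta) * math.exp(-beta * S_Cprime)

# Empirical acceptance rate for the uphill move, for comparison with exp(-beta * dS).
rng = random.Random(0)
trials = 20000
rate = sum(metropolis_accept(S_C, S_Cprime, beta, rng) for _ in range(trials)) / trials
```

Both sides of the relation reduce to $\exp(-\beta S[C'])$ when S[C'] > S[C], which is why this acceptance rule satisfies eqn [21] for symmetric proposals.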

In the presence of dynamical quarks, the integral
eqn [18] is converted into an integral on bosonic
variables by inverting the matrix M:

$$ Z = \int \prod_{x,\mu} dU_\mu(x)\, d\phi(x)\, d\phi^\dagger(x)\, \exp\left(-S_E[U] - \phi^\dagger [M^\dagger M]^{-1} \phi\right) \qquad [22] $$

The property has been used that
$\int \prod_x d\phi(x)\, d\phi^\dagger(x)\, \exp(-\phi^\dagger [M^\dagger M]^{-1} \phi) = |\det M|^2$. A
Metropolis updating is then performed on the
combined $U_\mu$ and $\phi$ variables. To have a choice
of the trial uniform in the measure, an algorithm is
commonly used which is based on ergodicity,
known as hybrid molecular dynamics. A fictitious
conjugate momentum is associated with all
variables, and a fictitious Hamiltonian is defined
by adding to the action, considered as a potential
energy, the sum of the squares of the conjugate
momenta. A classical evolution is then performed in
time by small steps which should displace the state
in phase space ergodically: the evolution is called a
trajectory. After a number of steps, a Metropolis test
is made as explained above.
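
The structure of a hybrid molecular dynamics update (fictitious momenta, a classical trajectory in small steps, then a Metropolis test on the fictitious Hamiltonian) can be illustrated on a toy problem. The sketch below is not a gauge-theory code: as a stand-in it uses a single real variable with the Gaussian action S(x) = x^2/2, a leapfrog integrator, and illustrative values for the step size and trajectory length. Half the squared momentum is used as the kinetic term, and all function names are hypothetical.

```python
import math
import random


def action(x):
    """Toy action playing the role of the potential energy."""
    return 0.5 * x * x


def force(x):
    """Minus the derivative of the action."""
    return -x


def leapfrog(x, p, step, n_steps):
    """Classical evolution in small steps; reversible and area preserving."""
    p += 0.5 * step * force(x)
    for _ in range(n_steps - 1):
        x += step * p
        p += step * force(x)
    x += step * p
    p += 0.5 * step * force(x)
    return x, p


def hybrid_update(x, rng, step=0.2, n_steps=10):
    """One trajectory followed by a Metropolis test on the fictitious Hamiltonian."""
    p = rng.gauss(0.0, 1.0)  # fresh fictitious conjugate momentum
    h_old = 0.5 * p * p + action(x)
    x_new, p_new = leapfrog(x, p, step, n_steps)
    h_new = 0.5 * p_new * p_new + action(x_new)
    if h_new <= h_old or rng.random() < math.exp(h_old - h_new):
        return x_new, True
    return x, False


rng = random.Random(1)
x, samples, accepted = 0.0, [], 0
for _ in range(5000):
    x, ok = hybrid_update(x, rng)
    samples.append(x)
    accepted += ok

mean_x2 = sum(s * s for s in samples) / len(samples)
acceptance = accepted / len(samples)
```

For this toy action the exact expectation value of x^2 is 1, so `mean_x2` should come out close to 1 within statistical errors; the Metropolis test at the end of each trajectory corrects for the small violation of energy conservation introduced by the finite step size.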

Typically, the computer time needed to produce a 
significant configuration is proportional to the 
volume V of the lattice for pure gauge systems, and to
$V^{5/4}$ in the hybrid algorithm for full QCD.

As explained before, in order to have a good 
approximation to the Feynman integral the lattice 
spacing has to be small compared to the physical 
scales, for example, with respect to the Compton 
wavelength of the heaviest quark. On the other 
hand, to control volume effects it has to be large 
compared to the biggest physical length, for 
example, with respect to the Compton wavelength 
of the lightest quark. Since there is a factor
$m_{top}/m_{up} \approx 3 \times 10^4$ between these two lengths, the
lattice size needed would be prohibitive from a
numerical point of view. In practice, lattices of
size $L^4$ are affordable with $L \le 64$-$128$. For
this reason, only the light quarks u, d, s are kept,
which have mass smaller than the typical scale of 
the theory, which can be identified as the square 
root of the string tension. In the limit in which light 
quark masses are small compared to QCD scale, 
the Lagrangian is invariant under any unitary 
mixing of them. A global SU(3) invariance exists, 
which is known as flavor symmetry, and is broken 
by the difference of quark masses. Heavier 
quarks can be described by an effective theory, 
since they have negligible dynamical effects at low 
energies. 
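
The size of the mismatch can be made explicit with a rough count. The sketch below uses the quark masses quoted in the Introduction and makes the simplifying assumption that the number of sites per direction must be at least the ratio of the two Compton wavelengths; a realistic requirement would be stricter still.

```python
m_up_gev = 5e-3    # lightest quark mass quoted in the text, in GeV
m_top_gev = 180.0  # heaviest quark mass quoted in the text, in GeV

# Ratio of the longest to the shortest Compton wavelength.
ratio = m_top_gev / m_up_gev

# Minimal number of sites of a four-dimensional lattice resolving both scales,
# compared with the largest lattice quoted as affordable (L = 128).
sites_needed = ratio ** 4
sites_affordable = 128 ** 4
shortfall = sites_needed / sites_affordable
```

With these inputs the ratio is about 3.6 x 10^4, so a lattice covering both scales would need more than 10^18 sites, against roughly 2.7 x 10^8 sites for L = 128.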


A Selection of Physics Results 
String Tension 


A big excitement followed the first numerical 
calculations by M Creutz at the beginning of the 
1980s in which the static potential V(r) between a 
quark and an antiquark was computed in pure- 
gauge theory on the lattice. One way to measure it is 
to measure the correlator of two Polyakov lines at a 
distance r on a significant ensemble of field config- 
urations. The Polyakov line is the parallel transport 
in the fundamental representation along the time 
axis across the lattice: with periodic boundary 
conditions it is a closed loop, and hence it is gauge 
invariant. It can be proved that the log of this 
correlator is equal to —V(r)aL; with L; the extension 
of the lattice in the time direction. It was found that 


V(r) =or [23] 


The parameter ø is known as string tension. A 
potential of the form eqn [23] means confinement: 
an infinite amount of energy is required to pull apart 
the particles at infinite distance. The parameter o 
can be determined phenomenologically from the 
mass spectrum of the mesons and o27 ~ 1 GeV. 
What is measured on the lattice is 


oa(3)>n? [24] 


where n is the distance of the two Polyakov lines in
lattice spacings and a(β) the lattice spacing in
physical units. In fact, the computer only produces
pure numbers. If the lattice QCD belongs to the
same universality class of QCD at the critical point,
that is, if the lattice really defines QCD, the
dependence of a(β) on β is dictated by the
β-function of the renormalization group. At sufficiently
large β = 6/g²,





a(β) = (1/Λ_latt) exp(−b₀ β)   [25]

with b₀ = (11/3) N_c/16π². Λ_latt is the energy scale of
the theory. The measurement of the potential gives
indeed a dependence of the lattice spacing on β
consistent with eqn [25] and allows one to determine
σ/Λ_latt². The absolute value of the lattice
spacing can be determined by comparison with the 
physical value of the string tension. The theory is 
able to produce a physical scale. The correlation 
length is finite and as a consequence the infrared 
limit of the Feynman integral exists. 


Mass Spectrum 


Any operator with the quantum numbers of a 
particle can be used as an interpolating field for it.
The correlator of the operator at large distances
behaves like a sum of exponentials exp(−mr), with
m the masses of the particles with the same quantum
numbers. At large distances the lightest particle
dominates, especially if the operator has a good
overlap, that is, if its matrix element between
vacuum and the state of the particle is the biggest.
From the correlators mr can be determined. On the
lattice r = n a(β) so that, by eqn [25], what is really
determined is the ratio m/Λ_latt. If Λ_latt has been
determined, for example, from the string tension,
the mass of the particle results in physical units.
Alternatively, the ratios of any two masses can be 
determined and the scale fixed by the value of one of 
them. A good agreement is obtained already in pure 
gauge (quenched approximation) indicating that the 
quark loops are relevant at the level of 10% 
typically. This fact supports the idea that the large-N_c
limit is a good approximation to reality, quark
loops being nonleading in that limit. The light
particle masses are more difficult to compute, 
being sensitive to the masses of light quarks which 
cannot be taken at realistic values due to computa- 
tional difficulties: large lattices are required and big 
fluctuations are present near the chiral point. The 
spectrum of particles made of heavy quarks can be 
computed using effective theories, and nicely fits 
experiment. A byproduct is a precise determination 
of the gauge coupling constant, competitive with 
phenomenological determinations from short dis- 
tance perturbative QCD. 
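
The statement that the lightest particle dominates the correlator at large distances can be illustrated with a small numerical sketch. It assumes a synthetic two-state correlator in lattice units; the masses, the overlaps and the function names are hypothetical choices made for illustration, not values taken from any simulation.

```python
import math

def correlator(t, masses=(0.5, 1.2), overlaps=(1.0, 0.8)):
    """Synthetic correlator C(t) = sum_i A_i exp(-m_i t), in lattice units.

    The masses and overlaps are illustrative assumptions, not lattice data.
    """
    return sum(a * math.exp(-m * t) for a, m in zip(overlaps, masses))

def effective_mass(t):
    """m_eff(t) = log[C(t) / C(t+1)]; it tends to the lightest mass at large t."""
    return math.log(correlator(t) / correlator(t + 1))

if __name__ == "__main__":
    for t in (1, 5, 10, 20):
        print(t, round(effective_mass(t), 4))
```

The effective mass decreases monotonically towards the lightest mass as the heavier state dies out; multiplying the plateau value by 1/a(β) would give the mass in physical units.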


Weak Interaction Matrix Elements 


There exist matrix elements of currents (or products 
thereof) entering in weak amplitudes which involve 
large distances and are not computable in perturba- 
tion theory. Lattice can be used to evaluate them. 
Renormalization problems can appear in this 
approach when the cutoff is removed, which, 
however, are not difficulties of principle but only 
of technical nature. This activity is of fundamental 
importance to have precise predictions in order to 
understand the limits of the standard model. 


Finite-Temperature QCD and the Deconfinement 
Transition 


The static thermodynamics of a system of fields is 
described by the partition function 


Z_T = Tr[exp(−H/T)]   [26]


It is easy to show that Z_T is equal to the Euclidean
Feynman integral on the imaginary time interval
(0, 1/T) with boundary conditions in time periodic
for bosons and antiperiodic for fermions. Indeed, the
Boltzmann factor is formally an imaginary time
evolution by 1/T. A lattice of extension L_t L_s³ with
L_s ≫ L_t provides the partition function at a temperature
T = 1/(a L_t), if a is the lattice spacing in
physical units.
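
As a worked illustration of the relation T = 1/(a L_t), the short sketch below converts a lattice spacing in fm and a temporal extent in lattice units into a temperature in MeV. It assumes natural units restored through ħc ≈ 197.327 MeV fm; the function name and the sample numbers are hypothetical.

```python
HBAR_C_MEV_FM = 197.327  # hbar * c in MeV * fm, used to restore physical units

def temperature_mev(a_fm, n_t):
    """Temperature T = hbar*c / (a * L_t) for spacing a (fm) and temporal extent L_t."""
    return HBAR_C_MEV_FM / (a_fm * n_t)

if __name__ == "__main__":
    # Illustrative values only: a = 0.1 fm and L_t = 8 sites.
    print(temperature_mev(0.1, 8))
```

At fixed L_t the temperature is varied by changing β, and hence a(β).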

Finite-temperature simulations are important to 
investigate the transition from the phase in which 
color is confined to a phase in which quarks and 
gluons can propagate as free particles. This phase is 
called deconfined phase or quark gluon plasma. 

Big experiments at Brookhaven and at CERN are 
looking for this phase transition in high-energy 
collisions between heavy nuclei, but no definite 
evidence has yet been produced for it. Lattice 
simulations instead definitely prove that such a 
transition exists. For pure SU(3) gauge theory
(quenched) at T ≈ 270 MeV, a first-order phase
transition is observed, at which the string tension
vanishes. In a more realistic theory with
dynamical quarks, a transition is also observed at
T ≈ 160 MeV, where chiral symmetry, which is
spontaneously broken at zero temperature, is
restored. This transition is also associated with deconfinement
even if, in the presence of light quarks, the
string tension does not exist. Indeed, when pulling
apart a quark and an antiquark, an instability for 
production of quark—antiquark pairs sets in when 
the potential energy becomes large enough, which 
physically manifests itself as a production of light 
mesons. An alternative order parameter is needed. 
The possibility of defining alternative order parameters
is discussed in the next section.

The equation of state can also be studied relating 
internal energy to pressure, which is useful to 
understand heavy ion collisions. 

From the features of the deconfinement transition, 
information can be extracted on the mechanisms by 
which QCD confines color. 

A connected issue is the behavior of QCD at 
nonzero baryon density or chemical potential. The 
corresponding thermodynamics is described by a 
grand canonical ensemble 


Z_μ = Tr{exp[−(H + μN)/T]}   [27]


where N = ∫ d³x ψ†ψ is the baryon number operator
and μ the chemical potential. In the process of
converting the partition function Z_μ into a Feynman
integral, the term H at the exponent of eqn [27]
generates the Euclidean action, which is real. The
term proportional to N becomes imaginary. The
integral is well defined, but the analogy with a four-
dimensional statistical mechanics is broken: the
effective Hamiltonian is non-Hermitian and no
sampling can be made. Approximate methods have
been developed, but the problem is open. Exploring
numerically the region of phase space with μ ≠ 0
would be interesting, since a rich structure is
expected, which could be relevant to dense systems
such as neutron stars.


Mechanisms of Color Confinement 


Understanding how QCD manages to confine color 
is one of the most fascinating problems in field 
theory. 

To prove confinement, one should, in principle, 
prove that, at zero temperature, no gauge-invariant 
quasilocal operator exists, carrying nontrivial color 
and obeying cluster property at large distances. This 
proof is not known. There exists evidence from
lattice simulations that a string tension exists, as 
discussed before. In any case, a guess can be made of 
the physical mechanism of confinement. If confine- 
ment is an absolute property reflecting a symmetry 
property of the vacuum, an order parameter should 
exist which discriminates between confined and 
deconfined phase, and the transition between the 
two phases has to be a true transition. Observing a 
crossover in some part of the boundary between the 
two phases would disprove this view. A lattice 
determination of the order of the deconfining 
transition is therefore of fundamental importance. 

A possible mechanism of confinement proposed by
G 't Hooft is dual superconductivity of the vacuum:
dual means interchange of electric with magnetic 
with respect to ordinary superconductors. In the same 
way as the magnetic field is constrained into 
Abrikosov flux tubes in an ordinary superconductor, 
the chromoelectric field acting between a quark and 
an antiquark would be constrained into flux tubes by 
a dual Meissner effect producing an energy propor- 
tional to the distance, or a string tension. 

This mechanism can be investigated by lattice 
simulations, by checking if any magnetically charged 
operator exists whose vacuum expectation value is 
nonzero in the confined phase signaling condensation 
of magnetic charges and zero in the deconfined phase. 
Progress has been made in this direction which, 
however, is not yet conclusive. Chromoelectric flux
tubes between q–q̄ pairs are observed in lattice field
configurations.


Topology 


Euclidean QCD admits classical solutions with finite 
action and with a nontrivial topology which makes 
them stable. These solutions, known as instantons 
or multi-instantons, realize a mapping of the three- 
dimensional sphere at infinity on the gauge group, and 
the topological charge is the winding number of this 
mapping. The Jacobian of this mapping is the Chern
current K_μ, and its divergence ∂_μ K_μ(x) = Q(x) is the
density of topological charge. Q = ∫ d⁴x Q(x) is the
topological charge, which has integer values.
Explicitly,

Q(x) = (g²/32π²) G^a_μν G*^a_μν   [28]

with G*_μν = (1/2) ε_μνρσ G_ρσ the dual field strength tensor.
Q(x) plays an important role in hadron physics,
being related to the anomaly of the flavor singlet
axial current J⁵_μ = Σ_f ψ̄_f γ_μ γ₅ ψ_f. J⁵_μ is conserved at the
classical level in the chiral limit m_f = 0, but this
symmetry does not survive quantization. In fact,


∂_μ J⁵_μ = 2 N_f Q(x)   [29]


A consequence of eqn [29] is the high mass m_η′ ≈
1 GeV of the flavor singlet partner η′ of the
pseudoscalar flavor octet. An N_c → ∞ argument by
Witten and Veneziano relates m_η′ to the response of
the quenched (no quark) vacuum to topological
excitation, the topological susceptibility χ = ∫ d⁴x
⟨0|T Q(x)Q(0)|0⟩. The relation is


χ = (f_π²/2N_f) [m_η² + m_η′² − 2m_K² + O(1/N_c)]   [30]

This approximate relation has been checked on the
lattice. χ has been determined by different methods
which agree in confirming it. This is an important
verification of QCD.

Instantons are stable solutions in the continuum, 
approximately stable in the lattice discretized ver- 
sion. A cooling procedure which locally freezes 
short-distance quantum fluctuations would leave 
the instantons untouched if they were stable. On 
the lattice the instanton is stable anyhow if the
distance in correlation reached by the local cooling
procedure is small compared to the size of the 
instanton: cooling is indeed a diffusion process and 
the distance involved grows as the square root of the 
number of cooling iterations. Instanton configura- 
tions can nicely be exposed by cooling. 


See also: Anomalies; Quantum Chromodynamics; 
Renormalization: General Theory; Spin Foams; 
Symmetry Breaking in Field Theory.
Introduction


The Leray–Schauder theory gives a powerful and
versatile continuation method for proving the
existence, multiplicity, and bifurcation of solutions
of nonlinear operator, differential, and integral
equations. Let X and Y be topological spaces, A ⊂ X,
f: X → Y a continuous mapping, and y ∈ Y. The
fundamental idea of a continuation method to solve
the equation f(x) = y in A consists in embedding it into
a one-parameter family of equations


F(x, λ) = z(λ)   [1]


where the continuous functions F: X × [0,1] → Y,
z: [0,1] → Y are chosen in such a way that F(·, 1) =
f, z(1) = y and


1. the equation F(x, 0) = z(0) has a nonempty set of
solutions in A;

2. at least one of those solutions can be continued
into a solution in A of [1] for each λ ∈ [0, 1].


Simple examples show that Assertion 2 can be
violated when all solutions of [1] leave A after some
λ* ∈ ]0,1[. A way to avoid such a situation consists
in “closing the boundary,” through the “boundary
condition”:


F(x, λ) ≠ z(λ) for each (x, λ) ∈ ∂A × [0,1]


When this condition is satisfied, Assertion 2 can
still fail when two existing solutions for λ small
disappear after coalescing at some λ₀ < 1. Losing all
solutions through this process can be eliminated by
reinforcing Assumption 1 into


1′. Equation F(x, 0) = z(0) has a “robust” nonempty
set of solutions in A.


This statement can be made precise through the 
concept of topological degree of a mapping, an 
“algebraic” count of the number of its zeros. In a 
finite-dimensional setting, this concept was intro- 
duced by Kronecker for smooth mappings and 
by Brouwer for continuous mappings. Its extension 
by Leray and Schauder to some classes of mappings 
in Banach spaces made much wider applications 
to nonlinear differential and integral equations 
possible. 


Topological Degree of a Mapping 


If U ⊂ R^n is a bounded open set, z ∈ R^n and
F: Ū → R^n is a C¹ mapping such that z ∉ F(∂U)
and det F′(x) ≠ 0 on F⁻¹(z), the Brouwer degree
deg_B[F, U, z] is defined (analytically) by


deg_B[F, U, z] := Σ_{x ∈ F⁻¹(z)} sign det F′(x) = Σ_{x ∈ F⁻¹(z)} (−1)^{σ(x)}


where σ(x) is the sum of the multiplicities of the
negative eigenvalues of F′(x). The case of a
continuous F such that z ∉ F(∂U) is treated by
approximating F through mappings of the above
type, and showing that the corresponding degrees
stabilize to a unique value, defining deg_B[F, U, z] in
the general case. This number remains the same
under sufficiently small perturbations of F and/or z,
which expresses the “robustness” mentioned above.
When n = 2 and U is bounded by a closed Jordan
curve, then deg_B[F, U, 0] is nothing but the winding
number of F/||F|| along ∂U.
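
The two descriptions of the Brouwer degree in the plane, the signed count of preimages and the winding number along the boundary, can be compared numerically. The following is a minimal sketch, assuming U is the open unit disk and identifying R² with C; the function names and the test maps are hypothetical illustrations.

```python
import cmath
import math

def winding_number(f, samples=2000):
    """Winding number of f/|f| around 0 along the unit circle.

    Assumes f does not vanish on the circle and that `samples` is large
    enough for each angle increment to stay well below pi.
    """
    total = 0.0
    prev = f(complex(1.0, 0.0))
    for k in range(1, samples + 1):
        z = cmath.exp(2j * math.pi * k / samples)
        cur = f(z)
        total += cmath.phase(cur / prev)  # signed angle increment
        prev = cur
    return round(total / (2 * math.pi))

def degree_from_preimages(jacobian_dets):
    """Analytic formula: sum of sign det F'(x) over the preimages of a regular value."""
    return sum(1 if d > 0 else -1 for d in jacobian_dets)

if __name__ == "__main__":
    # F(z) = z^2 - 1/4 has zeros at +-1/2, each with det F' = |2z|^2 = 1 > 0.
    print(winding_number(lambda z: z * z - 0.25), degree_from_preimages([1.0, 1.0]))
    # Complex conjugation reverses orientation: single zero with det F' = -1.
    print(winding_number(lambda z: z.conjugate()), degree_from_preimages([-1.0]))
```

Both routes give the same integer for each map, as the definition requires.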

Leray and Schauder have extended Brouwer 
degree to the important class of compact perturba- 
tions of identity in a normed space. A compact 
mapping f: A → B between metric spaces is a
continuous mapping on A such that f(A) is relatively
compact. If f: A → B is continuous and compact on
each bounded B ⊂ A, f is called “completely
continuous” on A.

If X is a real normed space, U ⊂ X an open bounded
set, f: Ū → X compact, and z ∉ (I − f)(∂U), the
Leray–Schauder degree deg_LS[I − f, U, z] of I − f in
U over z is constructed from Brouwer degree by
approximating the compact mapping f over Ū by
mappings f_ε with range in a finite-dimensional subspace
X_ε of X containing z. One shows that the values
of the Brouwer degrees deg_B[(I − f_ε)|_{X_ε}, U ∩ X_ε, z]
stabilize for sufficiently small positive ε to a common
value which defines deg_LS[I − f, U, z].

Again, this topological degree is an algebraic
count of the number of elements of (I − f)⁻¹(z),
equal to 0 when z ∉ (I − f)(U). When f is of class
C¹, and I − f′(x) is invertible at each x ∈
(I − f)⁻¹(z), the set (I − f)⁻¹(z) is finite and the
Leray–Schauder formula holds:


deg_LS[I − f, U, z] = Σ_{x ∈ (I − f)⁻¹(z)} (−1)^{σ(x)}   [2]

where σ(x) is the sum of the algebraic multiplicities
of the eigenvalues of f′(x) contained in ]1, +∞[.
Let I = [0,1]. For A ⊂ X × I and λ ∈ I, we write
A_λ = {x ∈ X : (x, λ) ∈ A}. The Leray–Schauder degree
inherits the basic properties of Brouwer degree:


1. Additivity. If U = U₁ ∪ U₂, where U₁ and U₂
are open and disjoint, and if z ∉ (I − f)(∂U₁) ∪
(I − f)(∂U₂), then
deg_LS[I − f, U, z] = deg_LS[I − f, U₁, z] + deg_LS[I − f, U₂, z]

2. Existence. If deg_LS[I − f, U, z] ≠ 0, then
z ∈ (I − f)(U).

3. Homotopy invariance. Let Ω ⊂ X × I be a
bounded open set, and let F: Ω̄ → X be compact.
If x − F(x, λ) ≠ z for each (x, λ) ∈ ∂Ω, then
deg_LS[I − F(·, λ), Ω_λ, z] is independent of λ.


In particular, if a is an isolated fixed point of f, 
and B(a,r) denotes the open ball of center a and 
radius r, deg_LS[I − f, B(a, r), 0] is defined and independent
of r for sufficiently small r > 0. Its value is
called the “Leray–Schauder index” of I − f at a, and
denoted by ind_LS[I − f, a].


Fixed-Point Theorems for Compact 
Perturbations of Identity in a Normed 
Space 


An important application of Leray–Schauder degree
is the derivation of general fixed-point theorems for
compact mappings in normed spaces based on
continuation along a parameter. If F: A ⊂
X × I → X, we denote by Σ^A the (possibly empty)
solution set defined by


Σ^A = {(x, λ) ∈ A : x = F(x, λ)}


Let Ω ⊂ X × I be a bounded open set and
F: Ω̄ → X be a compact mapping. The general
Leray–Schauder fixed-point theorem goes as follows:


Theorem If the following conditions hold:


(i) Σ^Ω̄ ∩ ∂Ω = ∅ (a priori estimate)

(ii) deg_LS[I − F(·, 0), Ω₀, 0] ≠ 0 (degree condition),

then Σ^Ω̄ contains a continuum C along which
λ takes all values in I. In other words, Σ^Ω̄
contains a compact connected subset C connecting
its slice at λ = 0 to its slice at λ = 1. If one refines
Assumption (ii) into

(iii) the slice of Σ^Ω̄ at λ = 0 is a finite nonempty set
{a₁, ..., a_m} and ind_LS[I − F(·, 0), a₁] ≠ 0,

the conclusion takes the form of an “alternative”: if assumptions
(i) and (iii) hold, then (a₁, 0) belongs either to
a continuum in Σ^Ω̄ containing one of the points
(a₂, 0), ..., (a_m, 0), or to a continuum in Σ^Ω̄
along which λ takes all the values in I.


Condition (iii) automatically holds in the following
important special case: If Σ^Ω̄ ∩ ∂Ω = ∅, F(·, 0) = 0,
and 0 ∈ Ω₀, then Σ^Ω̄ contains a continuum C ∋ (0, 0)
along which λ takes all values in I. When dealing with
the fixed-point problem x = f(x) with f: Ū ⊂ X → X
compact, U open and bounded, a natural choice is
F(x, λ) = λf(x), Ω = U × I, giving the statement: If
0 ∈ U and if x ≠ λf(x) for each (x, λ) ∈ ∂U × I, then
{(x, λ) ∈ Ū × I : x = λf(x)} contains a continuum C ∋
(0, 0) along which λ takes all values in I.
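
The homotopy F(x, λ) = λf(x) can be followed numerically in the simplest setting X = R. The sketch below is only an illustration under stated assumptions: it takes f = cos, for which every solution of x = λf(x) satisfies |x| ≤ 1 and so stays away from the boundary of U = ]−2, 2[, and it tracks the branch from (0, 0) with Newton's method. The function name and step counts are hypothetical.

```python
import math

def continue_fixed_point(f, df, steps=50, newton_iters=20):
    """Follow solutions of x = lam * f(x) from (x, lam) = (0, 0) up to lam = 1.

    Each parameter step is corrected with Newton's method on
    g(x) = x - lam * f(x), started from the previous point of the branch.
    """
    x = 0.0
    branch = [(0.0, x)]
    for k in range(1, steps + 1):
        lam = k / steps
        for _ in range(newton_iters):
            g = x - lam * f(x)
            dg = 1.0 - lam * df(x)
            x -= g / dg
        branch.append((lam, x))
    return branch

if __name__ == "__main__":
    branch = continue_fixed_point(math.cos, lambda x: -math.sin(x))
    lam, x = branch[-1]
    print(lam, x)  # end of the branch: a fixed point of cos
```

The list `branch` is a discrete picture of the continuum C: it starts at (0, 0) and reaches λ = 1, where x is a fixed point of f.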

Condition (i) requires the a priori knowledge of
the localization of the solution set Σ^Ω̄ and is in
general very difficult to check. An important special
case occurs when Σ^{X×I} is a priori bounded: if F is
completely continuous on X × I, F(·, 0) = 0, and
Σ^{X×I} ⊂ B(r) × I for some r > 0, then Σ^{X×I} contains a
continuum C ∋ (0, 0) along which λ takes all values
in I. Its special case with F(x, λ) = λf(x) can be
stated as Schaefer’s alternative: Let f: X → X be
completely continuous. Then either there exists, for
each λ ∈ [0,1], at least one x ∈ X such that
x = λf(x), or the set {x ∈ X :
x = λf(x), 0 < λ < 1} is unbounded in X. Schaefer’s
alternative is equivalent to the following Schauder
fixed-point theorem:


Theorem Any compact mapping f: B̄(r) → B̄(r) of a closed ball into itself has
a fixed point.
A simple consequence of Schauder’s theorem is
that, for any continuous and bounded g: R → R,
any open bounded D ⊂ R^n, and any λ different from an
eigenvalue of −Δ on D with Dirichlet boundary
conditions, the nonlinear Dirichlet problem

Δu + λu + g(u) = h(x) in D
u = 0 on ∂D


has a weak solution for each h ∈ L²(D).

An interesting consequence of the Leray–Schauder
theorem with Σ^{X×I} a priori bounded is that, for any
bounded domain D ⊂ R^n with ∂D of class C², the
Dirichlet problem for the equation of surfaces with
constant mean curvature Λ


(1 + ||∇u||²) Δu − Σ_{i,j=1}^{n} ∂_i u ∂_j u ∂_i ∂_j u = nΛ (1 + ||∇u||²)^{3/2}


has a unique solution for arbitrary smooth boundary
data if and only if the mean curvature of the boundary
∂D is everywhere greater than [n/(n − 1)]|Λ|.

The use of auxiliary continuous functionals gives 
a fixed-point theorem in the absence of a priori 
bounds: 


Theorem (Capietto–Mawhin–Zanolin). Let Ω ⊂
X × I be an open set and F: Ω̄ → X be completely
continuous. If the slice of Σ^Ω̄ at λ = 0 is bounded, deg_LS[I − F(·, 0),
U₀, 0] ≠ 0 for some open bounded neighborhood
U₀ of this slice, and if there exists a continuous
mapping φ: X × I → R₊, proper on Σ^Ω̄, and c₋ <
min,a y(-,0) < maxya y(-,0) < c} such that X? g 
{c_,c+} and X d [c_,c}], then XÈ contains a 
continuum C along which λ takes all values in I.


This result implies, for example, that for g: R → R
continuous, odd, and superlinear (lim_{|u|→∞} g(u)/u
= +∞), and p: [0,1] × R² → R with at most linear
growth in u and u′ at infinity, the two-point
boundary-value problem


u″ + g(u) = p(t, u, u′),   u(0) = u(1) = 0


has, for all sufficiently large j, at least one solution
u_j having exactly j + 1 zeros on [0,1], and
||u_j|| → ∞ if j → ∞.


Extensions of Leray–Schauder Degree


Fixed-point theorems for operators between suitable
nonlinear spaces can also be proved using topological
continuation arguments. For example, if C ⊂ X
is a nonempty convex set, one has the following
extension of a result of the previous section to
mappings in C: if U ⊂ C is open and bounded,
F: cl_C U × I → C compact and such that x ≠ F(x, λ)
for each (x, λ) ∈ ∂_C U × I, F(·, 0) = x₀ ∈ U, then
F(·, λ) has a fixed point in U for each λ ∈ I. The
special case where C is a wedge is useful in finding
positive solutions of nonlinear differential or integral
equations. For nonlinear spaces, the degree has
to be replaced by the fixed-point index,
which generalizes both the “Hopf–Lefschetz number”
and Leray–Schauder degree.

The Leray-Schauder degree also has been 
extended to other classes of operators. Compact 
operators can be replaced by k-set-contractive or 
condensing mappings f, with respect to various 
measures of noncompactness, and fixed-point problems
can be replaced by problems of the form x ∈
F(x) for multivalued mappings F. Equivariant degree
theories have been developed when U is invariant
and f equivariant with respect to the action of some
compact Lie group G on X. The special case of
G = S¹ is of special importance in the study of
periodic solutions of autonomous differential systems.
Degree theories have also been constructed for
various classes of mappings between two different 
Banach spaces or manifolds, which include mono- 
tone-like and nonlinear Fredholm operators. We just 
describe a simple but useful situation in this 
direction. 

Many differential equations, when expressed as
equations in an abstract space, do not have the
fixed-point form but can be written as Lx = Nx with
L: D(L) ⊂ X → Z linear, N: Ū → Z, X and Z real
normed spaces. If L is invertible, the equation is
trivially equivalent to the fixed-point problem
x = L⁻¹Nx, to which Leray–Schauder theory can be
applied when L⁻¹N is compact. The situation is
more delicate when L has no inverse. If L is a linear
Fredholm mapping of index zero (its range R(L) is
closed and has a finite codimension equal to the
dimension of its null space N(L)), the set F(L) of
linear continuous mappings of finite rank A: X → Z
such that L + A: D(L) → Z is a bijection is nonempty,
and the compactness of (L + A)⁻¹G for a mapping
G: E ⊂ X → Z does not depend upon the choice of
A ∈ F(L). G is then called “L-compact” on E, and
“L-completely continuous” on E when it is L-compact
on each bounded set of E.

The following continuation theorem for perturbed
Fredholm mappings of index zero holds.


Theorem Let Ω ⊂ X × I be open and bounded,
L: D(L) ⊂ X → Z linear Fredholm of index zero,
N: Ω̄ → Z L-compact, and let Σ = {(x, λ) ∈ (D(L) × I)
∩ Ω̄ : Lx = N(x, λ)}. If


(i) Σ ∩ ∂Ω = ∅ (a priori estimate),
(ii) N(Ω₀ × {0}) ⊂ Y, with Y ⊕ R(L) = Z (transversality
condition), and
(iii) deg_B[N(·, 0)|_{ker L}, Ω₀ ∩ ker L, 0] ≠ 0 (degree
condition)


then Σ contains a continuum C along which λ takes
all values in I.


When dealing with the equation Lx = f(x) with f
L-completely continuous, an interesting special case
of the above result follows from the choice
N(x, λ) = λf(x) + (1 − λ)Qf(x), with Q: Z → Z a
projector such that N(Q) = R(L). In this case, the
homotopy is equivalent to


Lx = λf(x)   (λ ∈ ]0, 1])
Qf(x) = 0, x ∈ N(L)   (λ = 0)

An application (among many) of this result,
for g: R → R continuous such that −∞ <
lim sup_{u→−∞} g(u) < lim inf_{u→+∞} g(u) < +∞, D ⊂ R^n
open, bounded, and λ_k an eigenvalue of the Dirichlet
problem for −Δ on D, is the weak solvability of the
nonlinear problem

Δu + λ_k u + g(u) = h(x) in D
u = 0 on ∂D


for each h ∈ L²(D) such that


J Heel) di < [lim sup g(u)| 
D 


uUu—> — CoO 


x J ote dx — [lim inf a(u)| [ee dx 
D u— +00 D 

for all eigenfunctions φ associated with λ_k. The
addition of the nonlinearity g “widens” the range
{h ∈ L²(D) : ∫_D hφ = 0} of the corresponding linear
problem.


Bifurcation Theory 


Leray–Schauder degree is a powerful tool in bifurcation
theory, where, given a family F of solutions,
one tries to detect and analyze other ones branching
or bifurcating from F. Consider the equation


x = λLx + R(x, λ)   [3]


in a real normed space X, where L: X → X, linear,
and R: X × R → X are completely continuous, and
R(0, λ) = 0 for each λ ∈ R. Thus, {(λ, 0): λ ∈ R} is
the trivial solution set of [3]. A bifurcation point
(λ*, 0) for [3] is the limit of a sequence (λ_n, x_n) of
solutions of [3] in R × (X \ {0}).


If

lim_{x→0} ||R(x, λ)|| / ||x|| = 0 uniformly on bounded λ-sets   [4]


it is easy to prove that if (λ*, 0) is a bifurcation point for
[3], then λ* is a characteristic value (reciprocal of an
eigenvalue) of L. Leray–Schauder theory gives a partial


converse to this result, known as Krasnosel’skii’s
bifurcation theorem:


Theorem For each real characteristic value λ* of L
with odd algebraic multiplicity, (λ*, 0) is a bifurcation
point of [3].

Of fundamental importance in the proof is
the special case of [2] with f = L and N(I − L) = {0}.
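
A one-dimensional example makes the theorem concrete. The sketch below assumes X = R, L the identity and R(x, λ) = −x³ (a hypothetical choice that satisfies condition [4]); the only characteristic value is λ* = 1, of multiplicity one, and the nontrivial solutions x = ±(λ − 1)^{1/2} branch off the trivial line exactly there.

```python
import math

def residual(x, lam):
    """Residual x - lam*L(x) - R(x, lam) for L = identity and R(x, lam) = -x**3."""
    return x - (lam * x + (-x ** 3))

def nontrivial_branch(lam):
    """Nonzero solutions of x = lam*x - x**3, i.e. x**2 = lam - 1 (empty for lam <= 1)."""
    if lam <= 1.0:
        return []
    r = math.sqrt(lam - 1.0)
    return [r, -r]

if __name__ == "__main__":
    for lam in (0.5, 1.0001, 1.01, 2.0):
        print(lam, nontrivial_branch(lam))
```

As λ decreases to 1 the nontrivial solutions shrink to 0, so (1, 0) is a limit of nontrivial solutions, which is the definition of a bifurcation point used above.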


Another fruitful concept is Krasnosel’skii’s bifurcation
from infinity. We say (λ*, ∞) is a bifurcation
point for [3] if there exists a sequence (λ_n, x_n) of
solutions of [3] such that λ_n → λ* and ||x_n|| → ∞.
The corresponding bifurcation result goes as follows
(Krasnosel’skii): if


lim_{||x||→∞} ||R(x, λ)|| / ||x|| = 0 uniformly on bounded λ-sets   [5]


then, for each real characteristic value λ* of L with
odd algebraic multiplicity, (λ*, ∞) is a bifurcation
point of [3].

Global versions of Krasnosel’skii’s theorems can be
given, whose statements are reminiscent of the
Leray–Schauder alternative theorem. Let S denote the
closure in R × X of the set of (λ, x) ∈ R × (X \ {0})
satisfying [3]. For bifurcation from zero, one has
Rabinowitz’s global bifurcation theorem:


Theorem If [4] holds and λ* is a real characteristic
value of L with odd algebraic multiplicity, then S
contains a component C which either is unbounded,
or contains (λ**, 0), where λ** ≠ λ* is a characteristic
value of L.


As an application, one can show that the nonlinear
Sturm–Liouville problem


−(p(x)u′)′ + q(x)u = λa(x)u + h(x, u, u′, λ)   (x ∈ ]0,1[)
a₀u(0) + b₀u′(0) = a₁u(1) + b₁u′(1) = 0


with p ∈ C¹ positive, q, a, h continuous, a positive,
(a₀² + b₀²)(a₁² + b₁²) ≠ 0 and h(x, u, v, λ) = o(|u| + |v|) as
|u| + |v| → 0 uniformly on compact λ-intervals, has,
for each k ∈ N, an unbounded component of
solutions C_k in R × C¹([0,1]) emanating from (λ_k, 0),
with λ_k an eigenvalue of the problem with h = 0
(Rabinowitz).

One also has global bifurcation from infinity: if
[5] holds and if λ* is a real characteristic value of L
with odd algebraic multiplicity, then [3] has an
unbounded component of solutions D which contains
(λ*, ∞).


See also: Bifurcation Theory; Bifurcations in Fluid 
Dynamics; Bifurcations of Periodic Orbits; Minimal 
Submanifolds; Minimax Principle in the Calculus of 
Variations; Partial Differential Equations: Some 
Examples; Riemann–Hilbert Problem; Topological
Defects and Their Homotopy Classification; Viscous
Incompressible Fluids: Mathematical Theory.
Introduction


Local continuous transformations were introduced 
by Lie as a tool for solving ordinary differential 
equations. In this program, he followed the spirit of 
Galois, who used finite groups to develop algo- 
rithms for solving algebraic equations (the general 
quadratic, cubic, and quartic), or else to prove that 
some equations (the generic quintic) could not be 
solved by quadrature. 

Lie’s work led eventually to the definition and 
study of Lie groups. Lie groups are beautiful in their 
own right — so beautiful that they have been studied 
independently of their origin as a tool for solving 
differential equations and studying the special 
functions determined by certain classes of these 
equations. 


Lie Groups 


Lie groups exist at the interface of the two great 
divisions of mathematics: algebra and topology. 
Their algebraic properties derive from the group 
axioms. Their geometric properties arise from the 
parametrization of the group elements by points in a 
differentiable manifold. The rigidity of these struc- 
tures arises from the continuity requirements 
imposed on the group composition and inversion 
maps. 
The algebraic axioms are standard.


Definition A group G consists of a set
g_i, g_j, g_k, ... ∈ G together with a combinatorial
operation ∘ that satisfy the four axioms:


(i) Closure. If g_i ∈ G, g_j ∈ G, then g_i ∘ g_j ∈ G.

(ii) Associativity. If g_i, g_j, g_k ∈ G, then (g_i ∘ g_j) ∘
g_k = g_i ∘ (g_j ∘ g_k).

(iii) Identity. There is a unique operation e ∈ G that
satisfies e ∘ g_i = g_i = g_i ∘ e.

(iv) Inverse. Every group operation g_i ∈ G has an
inverse, denoted g_i⁻¹, that satisfies g_i ∘ g_i⁻¹ = e =
g_i⁻¹ ∘ g_i.

Lie groups have more structure than groups. In
particular, each g_i ∈ G is a point in an n-dimensional
manifold M^n. That is, the subscript i
actually identifies a point x ∈ M^n, so that we
can write g_i = g(x) or most simply g_i = x.
The group multiplication can be expressed in the
form g_i ∘ g_j = g_k → g(x) ∘ g(y) = g(z), where x ∈ M^n,
y ∈ M^n, z = φ(x, y) ∈ M^n. The group inversion map
can be expressed in the form g(x) → g(x)⁻¹ = g(y),
y = ψ(x) ∈ M^n. The topological axioms for
Lie groups can be taken as:


(v) Continuity of composition. The mapping
z = φ(x, y) defined by the group composition
law is differentiable.

(vi) Continuity of inversion. The mapping y = ψ(x)
defined by the group inversion law is
differentiable.


The dimension of the Lie group is the dimension 
of the manifold that parametrizes the operations in 
the group. 

The most familiar examples of Lie groups consist
of n × n nonsingular matrices over the fields R, C, Q
of real numbers, complex numbers, and quaternions.
For example, the set of 2 × 2 real unimodular
matrices


\begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad ad - bc = 1

is a three-dimensional submanifold embedded in
R^{2×2} = R⁴.
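
The group axioms can be checked directly on this example. The sketch below is a minimal illustration using exact integer arithmetic on two sample unimodular matrices; the helper names and the sample matrices are hypothetical.

```python
def matmul2(a, b):
    """Product of two 2 x 2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def det2(m):
    """Determinant ad - bc of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inverse_sl2(m):
    """Inverse of a unimodular 2 x 2 matrix: swap the diagonal, negate the off-diagonal."""
    (a, b), (c, d) = m
    return ((d, -b), (-c, a))

IDENTITY = ((1, 0), (0, 1))

if __name__ == "__main__":
    A = ((2, 1), (1, 1))  # ad - bc = 1
    B = ((1, 3), (0, 1))  # ad - bc = 1
    print(det2(matmul2(A, B)))          # closure: the product is again unimodular
    print(matmul2(A, inverse_sl2(A)))   # inverse: recovers the identity
```

The entries of the product and of the inverse are polynomial in the entries of the factors, which is why the composition and inversion maps φ and ψ are differentiable on this manifold.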


Matrix Lie Groups 


Not every Lie group is a matrix group. Yet, it is a 
surprising and useful result that almost every Lie 
group encountered in physics is a matrix Lie 
group. These are all subgroups of the general 
linear groups GL(n; F) of n × n nonsingular
matrices over the field F (R, C, Q). These groups
have real dimension n² × (1, 2, 4), respectively. The
special linear subgroups SL(n; F) are defined as the
subgroups of n × n matrices with determinant
+1: M ∈ SL(n; F) if det M = +1. This definition is
problematic for quaternions, as they do not 
commute. To avoid this problem, it is useful to 
map quaternions into 2 x2 complex matrices in 
the same way complex numbers can be mapped 
into 2 x 2 real matrices: 


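a + ib \to \begin{bmatrix} a & b \\ -b & a \end{bmatrix}

q_0 + I q_1 + J q_2 + K q_3 \to \begin{bmatrix} q_0 + i q_3 & i q_1 + q_2 \\ i q_1 - q_2 & q_0 - i q_3 \end{bmatrix}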

Here (1, i) are basis vectors for C¹ considered as
a real two-dimensional linear vector space,
(1, I, J, K) are basis vectors for Q¹ considered as a
real four-dimensional linear vector space, and (a, b)
and (q₀, q₁, q₂, q₃) are all real. The squares of the
imaginary quantities i and I, J, K are all −1: i² = −1;
I² = J² = K² = −1, and the imaginary quaternion
basis elements anticommute: {I, J} = {J, K} =
{K, I} = 0. The unimodular subgroup
SL(n; Q) of GL(n; Q) is obtained by replacing each
quaternion matrix element by a 2 × 2 complex
matrix, setting the determinant of the resulting 2n ×
2n matrix group to +1, and then mapping each of the
n² complex 2 × 2 matrices back to quaternions.
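
The stated properties of the 2 × 2 complex images of I, J, K can be verified by direct multiplication. The sketch below assumes the matrix assignment q₀ + I q₁ + J q₂ + K q₃ → [[q₀ + i q₃, i q₁ + q₂], [i q₁ − q₂, q₀ − i q₃]]; the helper names are hypothetical.

```python
def quat_to_matrix(q0, q1, q2, q3):
    """2 x 2 complex matrix representing q0 + I*q1 + J*q2 + K*q3."""
    return ((complex(q0, q3), complex(q2, q1)),
            (complex(-q2, q1), complex(q0, -q3)))

def matmul(a, b):
    """Product of two 2 x 2 complex matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def add(a, b):
    """Entrywise sum of two 2 x 2 matrices."""
    return tuple(tuple(a[i][j] + b[i][j] for j in range(2)) for i in range(2))

def det(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

ZERO = quat_to_matrix(0, 0, 0, 0)
MINUS_ONE = quat_to_matrix(-1, 0, 0, 0)
QI = quat_to_matrix(0, 1, 0, 0)
QJ = quat_to_matrix(0, 0, 1, 0)
QK = quat_to_matrix(0, 0, 0, 1)

if __name__ == "__main__":
    print(matmul(QI, QI) == MINUS_ONE)                    # I squared is -1
    print(add(matmul(QI, QJ), matmul(QJ, QI)) == ZERO)    # {I, J} = 0
    print(det(quat_to_matrix(1, 2, 3, 4)))                # squared norm of the quaternion
```

The determinant of the image equals q₀² + q₁² + q₂² + q₃², which is what makes a determinant condition on the 2n × 2n complex matrices meaningful for quaternionic entries.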

Many other important groups are defined by
imposing linear or quadratic constraints on the n²
matrix elements of GL(n; F) or SL(n; F). The
compact metric-preserving groups U(n; F) leave
lengths invariant (preserve a positive-definite metric
g = I_n) in linear vector spaces. The matrices M ∈
U(n; F) satisfy M†I_nM = I_n. These conditions define
the orthogonal groups O(n) = U(n; R) and the unitary
groups U(n) = U(n; C). Their noncompact
counterparts O(p, q) and U(p, q) leave invariant
nonsingular indefinite metrics


g = I_{p,q} = \begin{bmatrix} I_p & 0 \\ 0 & -I_q \end{bmatrix}


in real and complex n = (p + q)-dimensional linear
vector spaces: M†I_{p,q}M = I_{p,q}.

Intersections of matrix Lie groups are also Lie
groups. The special metric-preserving groups are
intersections of the special linear groups SL(n; F) ⊂
GL(n; F) (with F = Q, SL(n; Q) is defined as
described above) and the metric-preserving subgroups
U(n; F) ⊂ GL(n; F):


SL(n; R) ∩ U(n; R) = SO(n),              n(n − 1)/2
SL(n; C) ∩ U(n; C) = SU(n),              n² − 1
SL(n; Q) ∩ U(n; Q) = Sp(n) = USp(2n),    n(2n + 1)


The real dimensions of these groups are given in the
right-hand column. Under the replacement of quaternions
by 2 × 2 complex matrices, the group of
n × n metric-preserving and unimodular matrices
Sp(n) over Q is identified as USp(2n), an isomorphic
group of 2n × 2n matrices over C.

Noncompact forms SO(p, q), SU(p, q), and Sp(p, q) =
USp(2p, 2q) are defined similarly.

The Lie group SU(2) rotates spin states to spin
states in a complex two-dimensional linear vector
space. It leaves lengths, inner products, and
probabilities invariant. If an interaction is spin
independent, only an invariant (“Casimir invariant”)
constructed from the spin operators can
appear in the Hamiltonian. The same group can act
in isospin space, rotating proton to neutron states.
The Lie group SU(3) similarly rotates quark states
or color states into quark states or color states,
respectively. The Lie group SU(4) rotates spin-isospin
states into themselves. The conformal group
SO(4, 2) leaves angles but not lengths in spacetime
invariant. It is the largest group that leaves the
source-free Maxwell equations invariant. It is also
the largest group that transforms all the (bound,
scattering, and parabolic) hydrogen atom states
into themselves.

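Lie groups such as the Poincaré group (inhomogeneous
Lorentz group) and the Galilei group have
the matrix structures

\[
\left[\begin{array}{c|c}
O(3,1) & \begin{matrix} t_1 \\ t_2 \\ t_3 \\ t_4 \end{matrix} \\ \hline
0 & 1
\end{array}\right]
\begin{bmatrix} x \\ y \\ z \\ ct \\ 1 \end{bmatrix},
\qquad
\left[\begin{array}{c|c|c}
O(3) & \begin{matrix} v_1 \\ v_2 \\ v_3 \end{matrix} & \begin{matrix} t_1 \\ t_2 \\ t_3 \end{matrix} \\ \hline
0 & 1 & t_4 \\ \hline
0 & 0 & 1
\end{array}\right]
\begin{bmatrix} x \\ y \\ z \\ t \\ 1 \end{bmatrix}
\]

respectively. In these transformations $t = (t_1, t_2, t_3)$
describes translations in the space (x-, y-, and z-)
directions, $v = (v_1, v_2, v_3)$ describes boosts, and $t_4$
resets clocks. The matrices in these defining matrix
representations are reducible.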

The Heisenberg covering group $H_4$ is a four-dimensional
Lie group with a simple 3 × 3 matrix
structure:

\[
\text{Heisenberg covering group} = H_4 =
\begin{bmatrix} 1 & l & d \\ 0 & n & r \\ 0 & 0 & 1 \end{bmatrix},
\qquad n \neq 0
\]

This matrix representation of $H_4$ is faithful but
nonunitary.


“Linearization” of a Lie Group 


At the topological level, a Lie group is homogeneous.
That is, every point in a manifold that
parametrizes a Lie group looks like every other
point. At the algebraic level, this is not true: the
identity group operation e is singled out as an
exceptional group element. At the analytic level, the
group composition law z = φ(x, y) is nonlinear, and
can therefore be arbitrarily complicated.

The study of Lie groups is enormously simplified
by exploiting these three observations. Specifically,
it is useful to “linearize” the group multiplication
law in the neighborhood of the identity. The
linearization leads to a local Lie group. This is a
linear vector space on which there is an additional
structure. Once the local Lie group properties are
known in the neighborhood of the identity, they are
known everywhere else in the group, since the group
is homogeneous.

A Lie group is linearized in the neighborhood of
the identity by expressing an operator near the
identity in the form g(ε) = I + εX, where the local
Lie group operator $\epsilon X = \delta x^i X_i$, the $X_i$ are n
linearly independent vector fields on the manifold
$M^n$, and the small coordinates $\delta x^i$ measure the
distance (in some rough sense) of g(ε) from the
point that parametrizes the identity group operation
e = g(0). For another group operation
g(δY) = I + δY in the neighborhood of the identity,
the following holds.


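1. The product g(εX)g(δY) = (I + εX)(I + δY) = I +
(εX + δY) + h.o.t. is in the local Lie group.

2. The commutator $g_i \circ g_j \circ g_i^{-1} \circ g_j^{-1}$ in the group
leads to

\[
g(\epsilon X)\, g(\delta Y)\, g(\epsilon X)^{-1} g(\delta Y)^{-1}
= I + \epsilon\delta\,(XY - YX) + \text{h.o.t.}
= I + \epsilon\delta\,[X, Y] + \text{h.o.t.}
\]

in the local Lie group.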
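
The second statement can be illustrated numerically. This is a minimal sketch, not the article's own construction: it assumes two 3 × 3 rotation generators as X and Y, small parameters ε and δ chosen arbitrarily, and hypothetical helper names. It forms the group commutator of I + εX and I + δY and compares its leading term, divided by εδ, with the matrix commutator [X, Y].

```python
# Sketch only: matrices are plain nested lists; names are illustrative.

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def combine(s, a, t, b):
    """Entrywise s*a + t*b for scalars s, t and equal-size matrices a, b."""
    return [[s * x + t * y for x, y in zip(r, q)] for r, q in zip(a, b)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def g(x, eps):
    """Group operation near the identity: I + eps*x."""
    return combine(1.0, identity(len(x)), eps, x)

def g_inverse(x, eps, terms=20):
    """(I + eps*x)^(-1) summed as the Neumann series sum_k (-eps*x)^k."""
    total = power = identity(len(x))
    step = combine(-eps, x, 0.0, x)
    for _ in range(terms):
        power = matmul(power, step)
        total = combine(1.0, total, 1.0, power)
    return total

X = [[0, 0, 0], [0, 0, 1], [0, -1, 0]]   # assumed example generator
Y = [[0, 0, -1], [0, 0, 0], [1, 0, 0]]   # assumed example generator
eps, delta = 1e-4, 2e-4

group_comm = matmul(matmul(g(X, eps), g(Y, delta)),
                    matmul(g_inverse(X, eps), g_inverse(Y, delta)))
# (group commutator - I) / (eps*delta) should approach [X, Y].
leading = combine(1 / (eps * delta), group_comm, -1 / (eps * delta), identity(3))
lie_bracket = combine(1, matmul(X, Y), -1, matmul(Y, X))
error = max(abs(p - q) for r, s in zip(leading, lie_bracket) for p, q in zip(r, s))
```

The residual `error` is of the order of ε and δ themselves, which is the size expected from the neglected higher-order terms.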


The first condition shows that the local Lie
group is a linear vector space. The n vector fields
$X_i$ can be chosen as a set of basis vectors in this
space.

The second condition shows that the commutator 
of two vectors in this linear vector space is also in 
this linear vector space. The commutator endows 
this linear vector space with an additional combina- 
torial operation (“vector multiplication”) and pro- 
vides it with the structure of an algebra, called a Lie 
algebra. 


Definition A Lie algebra la consists of a set of
operators X, Y, Z, ..., together with the operations
of vector addition, scalar multiplication, and commutation
[X, Y] that satisfy the following three
axioms:

(i) Closure (linear vector space). If X, Y ∈ la, then αX +
βY ∈ la and [X, Y] ∈ la.
(ii) Antisymmetry. [X, Y] = −[Y, X].
(iii) Jacobi identity. [X, [Y, Z]] + [Y, [Z, X]] +
[Z, [X, Y]] = 0.


The structure of a Lie algebra, or local Lie group,
is summarized by the structure constants, defined in
terms of the basis vectors $X_i$ by

\[
[X_i, X_j] = c_{ij}^{\ k} X_k \qquad \text{(summation convention)}
\]

The structure constants $c_{ij}^{\ k}$ are components of a
third-order tensor, covariant and antisymmetric
in two indices ($c_{ij}^{\ k} = -c_{ji}^{\ k}$) and contravariant in
the third. These components obey the Jacobi
identity, which places a quadratic constraint on
them:

\[
c_{ij}^{\ s} c_{sk}^{\ t} + c_{jk}^{\ s} c_{si}^{\ t} + c_{ki}^{\ s} c_{sj}^{\ t} = 0
\]
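
As a small worked check of these two properties, the sketch below assumes the three-dimensional algebra with commutation relations $[J_1, J_2] = -J_3$ (and cyclic), so that $c_{ij}^{\ k} = -\epsilon_{ijk}$; the function and variable names are illustrative only. It tests antisymmetry in the lower indices and the quadratic Jacobi constraint for every index combination.

```python
from itertools import product

def levi_civita(i, j, k):
    """Totally antisymmetric symbol for indices 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

# c[i][j][k] stands for c_ij^k in [J_i, J_j] = c_ij^k J_k.
# Assumption: [J_1, J_2] = -J_3 and cyclic, so c_ij^k = -epsilon_ijk.
c = [[[-levi_civita(i, j, k) for k in range(3)] for j in range(3)]
     for i in range(3)]

antisymmetric = all(c[i][j][k] == -c[j][i][k]
                    for i, j, k in product(range(3), repeat=3))

def jacobi_sum(i, j, k, t):
    """Left-hand side of the quadratic constraint for fixed i, j, k, t."""
    return sum(c[i][j][s] * c[s][k][t]
               + c[j][k][s] * c[s][i][t]
               + c[k][i][s] * c[s][j][t] for s in range(3))

jacobi_holds = all(jacobi_sum(i, j, k, t) == 0
                   for i, j, k, t in product(range(3), repeat=4))
```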


Linearization of a Lie group generates a Lie
algebra. A Lie group can be recovered by the
inverse process. This is the exponential operation.
A group operation a finite distance from the origin
(the point identified with the identity group operation)
of the manifold that parametrizes the Lie
group can be obtained from the limiting procedure
(ε = 1/K → 0):

\[
g(X) = \lim_{K \to \infty} \left( I + \frac{1}{K} X \right)^{K} = e^{X} = \mathrm{EXP}(X)
\]

The exponential operation is well defined for real
numbers, complex numbers, quaternions, n × n
matrices over these fields, and vector fields.
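
The limiting procedure can be tried on a single rotation generator. The sketch below is illustrative only: it assumes X = θL with L the 3 × 3 antisymmetric matrix generating rotations in the 1-2 plane, takes K to be a power of two so that the K-th power can be formed by repeated squaring, and compares the result with the closed-form rotation matrix. The names `exp_by_limit` and `doublings` are hypothetical.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_by_limit(x, doublings=20):
    """Approximate EXP(x) by (I + x/K)^K with K = 2**doublings."""
    n, k = len(x), 2 ** doublings
    g = [[(1.0 if i == j else 0.0) + x[i][j] / k for j in range(n)]
         for i in range(n)]
    for _ in range(doublings):   # squaring `doublings` times gives the K-th power
        g = matmul(g, g)
    return g

theta = 0.7
X = [[0.0, theta, 0.0], [-theta, 0.0, 0.0], [0.0, 0.0, 0.0]]
rotation = [[math.cos(theta), math.sin(theta), 0.0],
            [-math.sin(theta), math.cos(theta), 0.0],
            [0.0, 0.0, 1.0]]
error = max(abs(p - q)
            for r, s in zip(exp_by_limit(X), rotation) for p, q in zip(r, s))
```

The discrepancy shrinks roughly like 1/K, in line with ε = 1/K → 0.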

A 1:1 correspondence between Lie groups and Lie
algebras does not exist. Isomorphic Lie groups have
isomorphic Lie algebras. But nonisomorphic Lie
groups may also possess isomorphic Lie algebras.
The best known examples of nonisomorphic Lie
groups and their isomorphic Lie algebras are

\[
\begin{array}{ll}
SO(3) \neq SU(2), & \mathfrak{so}(3) = \mathfrak{su}(2) \\
SO(4) \neq SU(2) \times SU(2), & \mathfrak{so}(4) = \mathfrak{su}(2) + \mathfrak{su}(2) \\
SO(5) \neq Sp(2) = USp(4), & \mathfrak{so}(5) = \mathfrak{sp}(2) = \mathfrak{usp}(4)
\end{array}
\]

There is a 1:1 correspondence between Lie algebras
and “locally” isomorphic Lie groups. This has been
extended to global Lie groups by a beautiful
theorem due to É. Cartan.


Theorem (Cartan) There is a 1:1 correspondence 
between Lie algebras and simply connected Lie 
groups. Every Lie group with the same Lie algebra 
is either the simply connected (“universal covering”) 
group or is the quotient of this universal covering 
group by one of its discrete invariant subgroups. 


This relation is summarized in Figure 1. 

Figure 1 Cartan’s theorem states that there is a 1:1 correspondence between Lie algebras and simply connected Lie groups. All
other Lie groups with this Lie algebra are quotients of the covering group by one of its discrete invariant subgroups $D_i \subseteq D_{\max}$. There is
a relation between the discrete invariant subgroup $D_i$ and the homotopy group of $SG/D_i$. Reproduced with permission from Gilmore R
(1974) Lie Groups, Lie Algebras, and Some of Their Applications. New York: Wiley.

As a concrete example, the Lie algebra of
SO(3), which is the group of real 3 × 3 matrices
satisfying $M^t I_3 M = I_3$ and det(M) = +1, is
spanned by the three “angular momentum vector
fields” $L_i(x) = \epsilon_{ijk} x^j \partial_k$ or the three angular
momentum matrices

\[
L_1 = L_{23} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & +1 \\ 0 & -1 & 0 \end{bmatrix},
\quad
L_2 = L_{31} = \begin{bmatrix} 0 & 0 & -1 \\ 0 & 0 & 0 \\ +1 & 0 & 0 \end{bmatrix},
\quad
L_3 = L_{12} = \begin{bmatrix} 0 & +1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]

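The Lie group SU(2) is the group of complex 2 × 2
matrices satisfying $M^\dagger I_2 M = I_2$ and det(M) = +1. Its
Lie algebra is spanned by the three spin matrices
$S_j = (i/2)\sigma_j$, which are multiples of the Pauli spin
matrices $\sigma_j$:

\[
S_1 = \frac{i}{2}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},
\quad
S_2 = \frac{i}{2}\begin{bmatrix} 0 & -i \\ +i & 0 \end{bmatrix},
\quad
S_3 = \frac{i}{2}\begin{bmatrix} +1 & 0 \\ 0 & -1 \end{bmatrix}
\]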

The two Lie algebras are isomorphic as they share
isomorphic commutation relations $[J_1, J_2] = -J_3$ (and
cyclic), $J_i = L_i$ or $J_i = S_i$. The group SU(2) is simply
connected. Its maximal discrete invariant subgroup D
consists of all multiples of the identity, $\alpha I_2$, so that
α = ±1. According to Cartan’s theorem, SO(3) =
SU(2)/$D_2$, $D_2 = \{I_2, -I_2\}$. The group SO(3) is doubly
connected, with a two-element homotopy group.
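
The shared commutation relations can be confirmed by direct matrix multiplication. The following sketch assumes the angular momentum matrices $L_1 = L_{23}$, $L_2 = L_{31}$, $L_3 = L_{12}$ (antisymmetric 3 × 3 matrices with a single +1 and −1) and the spin matrices $S_j = (i/2)\sigma_j$; the helper names are illustrative.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def comm(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r, s)] for r, s in zip(ab, ba)]

def neg(a):
    return [[-x for x in row] for row in a]

# Angular momentum matrices L_1 = L_23, L_2 = L_31, L_3 = L_12.
L = [[[0, 0, 0], [0, 0, 1], [0, -1, 0]],
     [[0, 0, -1], [0, 0, 0], [1, 0, 0]],
     [[0, 1, 0], [-1, 0, 0], [0, 0, 0]]]

# Spin matrices S_j = (i/2) * sigma_j built from the Pauli matrices.
PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
S = [[[0.5j * v for v in row] for row in sigma] for sigma in PAULI]

def obeys_relations(J):
    """True if [J_1, J_2] = -J_3 and its cyclic permutations all hold."""
    return all(comm(J[i], J[(i + 1) % 3]) == neg(J[(i + 2) % 3])
               for i in range(3))

so3_ok = obeys_relations(L)
su2_ok = obeys_relations(S)
```

Both sets of matrices satisfy the same relations even though one set is real 3 × 3 and the other complex 2 × 2, which is the sense in which the two algebras are isomorphic.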


Matrix Lie Algebras 


A deep theorem of Ado guarantees that every Lie 
algebra is equivalent to a matrix Lie algebra, even 
though the same is not true of Lie groups. 

Sets of n × n matrices that close under vector
addition, scalar multiplication, and commutation
($M_1 \in$ la, $M_2 \in$ la ⇒ $[M_1, M_2] = M_1 M_2 - M_2 M_1 \in$ la)
form matrix Lie algebras. The antisymmetry properties
and Jacobi identity are guaranteed by matrix
multiplication.

Lie algebras for the general linear groups
GL(n; F) consist of n × n matrices over F. Lie
algebras for the special linear groups SL(n; F)
consist of traceless n × n matrices. The Lie algebras
of the unitary groups consist of anti-Hermitian
matrices. The Lie algebras of U(p, q; F) consist of
matrices that obey

\[
M^\dagger I_{p,q} + I_{p,q} M = 0, \qquad M \in \mathfrak{u}(p, q; F)
\]
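The matrix Lie algebras of other matrix Lie groups
are obtained by constructing the most general Lie
group operation in the neighborhood of the identity
by linearization. For example, the Lie algebra of the
Heisenberg covering group $H_4$ is

\[
\begin{bmatrix} 1 & l & d \\ 0 & n & r \\ 0 & 0 & 1 \end{bmatrix}
\longrightarrow
\begin{bmatrix} 1 & \delta l & \delta d \\ 0 & 1 + \delta n & \delta r \\ 0 & 0 & 1 \end{bmatrix}
= I_3 + \delta n\, N + \delta r\, R + \delta l\, L + \delta d\, D
\]

\[
N = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix},
\quad
R = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix},
\quad
L = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},
\quad
D = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]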

The four 3 × 3 matrices N, R, L, D that span the Lie
algebra $\mathfrak{h}_4$ of $H_4$ satisfy commutation relations
isomorphic with the commutation relations satisfied
by the photon operators ($a^\dagger a$, $a^\dagger$, $a$, $I = [a, a^\dagger]$). The
3 × 3 matrix representations of the group $H_4$ and
the algebra $\mathfrak{h}_4$ are faithful. The representation of $H_4$
is nonunitary and that of $\mathfrak{h}_4$ is non-Hermitian.
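
A short check of this isomorphism, offered as a sketch: the four matrices are taken to be the single-entry 3 × 3 matrices obtained from the linearization of the Heisenberg covering group (N at position (2,2), R at (2,3), L at (1,2), D at (1,3)), and are matched with the photon operators as N ↔ $a^\dagger a$, R ↔ $a^\dagger$, L ↔ $a$, D ↔ $I$. The helper names are illustrative.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def comm(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r, s)] for r, s in zip(ab, ba)]

def neg(a):
    return [[-x for x in row] for row in a]

N = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # plays the role of the number operator
R = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]   # plays the role of the creation operator
L = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]   # plays the role of the annihilation operator
D = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]   # plays the role of the identity operator
ZERO = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

# Photon relations: [n, a+] = a+, [n, a] = -a, [a, a+] = I, I commutes with all.
relations_ok = (comm(N, R) == R
                and comm(N, L) == neg(L)
                and comm(L, R) == D
                and all(comm(D, m) == ZERO for m in (N, R, L)))
```

None of N, R, L, D is anti-Hermitian, which is consistent with the remark that this faithful representation is non-Hermitian.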
There is a simple way to relate a large class 
of operator Lie algebras to matrix Lie algebras. 


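If A, B, C, ... belong to a Lie algebra of n × n
matrices with [A, B] = C, the matrix-to-operator
mapping

\[
A \longrightarrow \mathcal{A} = x^i A_i^{\ j} \partial_j
\]

preserves commutation relations, for

\[
\begin{aligned}
[\mathcal{A}, \mathcal{B}] &= [x^i A_i^{\ j} \partial_j,\; x^r B_r^{\ s} \partial_s] \\
&= x^i A_i^{\ j} B_j^{\ s} \partial_s - x^r B_r^{\ s} A_s^{\ j} \partial_j \\
&= x^i [A, B]_i^{\ j} \partial_j = \mathcal{C}
\end{aligned}
\]

This relation depends on the bilinear products $x^i \partial_j$
satisfying commutation relations

\[
[x^i \partial_j, x^r \partial_s] = x^i \partial_s\, \delta_j^{\ r} - x^r \partial_j\, \delta_s^{\ i}
\]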


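These commutation relations are satisfied by products
of creation and annihilation operators $a_i^\dagger a_j$ for
either bosons ($b_i^\dagger b_j$) or fermions ($f_i^\dagger f_j$). These matrix-to-operator
mappings can be extended to include
bilinear products such as $x^i x^j$, $x^i \partial_j$, $\partial_i \partial_j$ and their
boson and fermion counterparts $a_i^\dagger a_j^\dagger$, $a_i^\dagger a_j$, $a_i a_j$. For
example, the vector fields associated with the
operator $J_1$ for SO(3) and SU(2) are $x^i (L_1)_i^{\ j} \partial_j =
x^2 \partial_3 - x^3 \partial_2$ and $u^i (S_1)_i^{\ j} \partial_j = (i/2)(u^1 \partial_2 + u^2 \partial_1)$.

Boson and fermion bilinear products $a_i^\dagger a_j$ (1 ≤ i,
j ≤ n) are isomorphic to u(n). Boson bilinear products
$b_i^\dagger b_j$, $b_i^\dagger b_j^\dagger$, $b_i b_j$ are isomorphic to sp(2n; R), while
fermion bilinear products $f_i^\dagger f_j$, $f_i^\dagger f_j^\dagger$, $f_i f_j$ are isomorphic
to so(2n).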


Structure of Lie Algebras 


The study of Lie algebras is greatly facilitated by 
studying their structure. The structure is determined 
by the commutation properties of the Lie algebra. 


Invariant Subalgebra 


If a Lie algebra has an invariant subalgebra, then
the commutator of anything in the algebra with
anything in the subalgebra is in the subalgebra.
Suppose a is a linear vector subspace of g.
If [g, a] ⊆ a, then a is an invariant subspace of g.
In particular, [a, a] ⊆ a and a is therefore also
a subalgebra of g: it is an invariant subalgebra
in g.


Example The Lie algebra iso(3) consists of the
three rotation operators $L_{ij} = x^i \partial_j - x^j \partial_i$ and the
three displacement operators $P_k = \partial_k$. The subset
of displacement operators is an invariant subspace
in iso(3), since it is mapped into itself by all
commutators. It is also a subalgebra in iso(3). This
particular invariant subalgebra is commutative.


Solvable Algebra 


If g is a Lie algebra, the linear vector space obtained
by taking all possible commutators of the operators
in g is called the “derived” algebra: $[g, g] = g^{(1)} \subseteq g$.
If $g^{(1)} = g$, there is no point in continuing this
process. If $g^{(1)} \subset g$, it is useful to define $g = g^{(0)}$
and to continue this process by defining $g^{(2)}$ as the
derived algebra of $g^{(1)}$: $g^{(2)} = [g^{(1)}, g^{(1)}]$. We can
continue in this way, defining $g^{(n+1)}$ as the algebra
derived from $g^{(n)}$. Ultimately (for finite-dimensional
Lie algebras), either $g^{(n+1)} = 0$ or $g^{(n+1)} = g^{(n)}$ for
some n. If the former case occurs,

\[
g = g^{(0)} \supset g^{(1)} \supset g^{(2)} \supset \cdots \supset g^{(n)} \supset g^{(n+1)} = 0
\]

the Lie algebra g is called solvable. Each algebra
$g^{(i+1)}$ is an invariant subalgebra of $g^{(i)}$.


Example The Lie algebra spanned by the boson
number, creation, annihilation, and identity operators
is solvable. The series of derived algebras has
dimensions 4, 3, 1, 0.

g^(0)      g^(1)      g^(2)      g^(3)
a†a        -          -          -
a†         a†         -          -
a          a          -          -
I          I          I          -
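
The dimensions 4, 3, 1, 0 can be reproduced mechanically. The sketch below is one possible way to do it, under stated assumptions: the four operators are represented by 3 × 3 single-entry matrices (number ↔ position (2,2), creation ↔ (2,3), annihilation ↔ (1,2), identity ↔ (1,3)), the span of all commutators is reduced by exact Gaussian elimination, and the function names are hypothetical. A three-dimensional rotation algebra is included for contrast, since its derived algebra is the algebra itself.

```python
from fractions import Fraction

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def comm(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r, s)] for r, s in zip(ab, ba)]

def independent(vectors):
    """Return a basis of the span of `vectors` by Gaussian elimination."""
    basis = []
    for v in vectors:
        v = [Fraction(x) for x in v]
        for b in basis:
            pivot = next(i for i, x in enumerate(b) if x != 0)
            if v[pivot] != 0:
                factor = v[pivot] / b[pivot]
                v = [x - factor * y for x, y in zip(v, b)]
        if any(v):
            basis.append(v)
    return basis

def derived_series_dimensions(basis):
    """Dimensions of g(0), g(1), ... until the series hits 0 or stops shrinking."""
    n = len(basis[0])
    dims = [len(basis)]
    while basis:
        flat = [[x for row in comm(a, b) for x in row]
                for a in basis for b in basis]
        derived = [[v[i * n:(i + 1) * n] for i in range(n)]
                   for v in independent(flat)]
        if len(derived) == len(basis):
            break
        basis = derived
        dims.append(len(basis))
    return dims

N = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
R = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
L = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
D = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
h4_dims = derived_series_dimensions([N, R, L, D])

L1 = [[0, 0, 0], [0, 0, 1], [0, -1, 0]]
L2 = [[0, 0, -1], [0, 0, 0], [1, 0, 0]]
L3 = [[0, 1, 0], [-1, 0, 0], [0, 0, 0]]
so3_dims = derived_series_dimensions([L1, L2, L3])
```

The function expects a linearly independent input basis; it does not check this.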


Semidirect Sum Algebra 


When a Lie algebra g has an invariant subalgebra a,
the linear vector space of the Lie algebra g can be
written as the direct sum of the linear vector
subspace of the subalgebra a plus a complementary
subspace b. The subspace b is generally not by itself
a Lie algebra. The Lie algebra g is written as a
semidirect sum of the two subspaces. The semidirect
sum structure satisfies the commutation relations
shown:

\[
\mathfrak{g} = \mathfrak{b} \wedge \mathfrak{a}, \qquad
[\mathfrak{a}, \mathfrak{a}] \subseteq \mathfrak{a}, \quad
[\mathfrak{b}, \mathfrak{a}] \subseteq \mathfrak{a}, \quad
[\mathfrak{b}, \mathfrak{b}] \subseteq \mathfrak{b} + \mathfrak{a}
\]

The subspace b can be given the structure of an
algebra modulo the component of the commutator
in a: b = g mod a.


Example The three-dimensional Lie algebra spanned
by the photon operators $a^\dagger$, $a$, $I$ has a semidirect sum
decomposition where b is spanned by $a^\dagger$, $a$ and a is
spanned by I. The subspace b is not closed under
commutation, and a is commutative. The Lie algebra
iso(3) also has the structure of a semidirect sum, with
b = so(3) and the invariant subalgebra a
spanned by the three displacement operators $P_k$.


Nonsemisimple Algebra 


A Lie algebra is nonsemisimple if it has a solvable 
invariant subalgebra. 


Example The Lie algebra spanned by bilinear
products of photon creation and annihilation operators
$a_i^\dagger a_j$, creation operators $a_i^\dagger$, annihilation operators
$a_j$, and the identity operator I (1 ≤ i, j ≤ n)
is nonsemisimple. The solvable invariant subalgebra
is spanned by the 2n + 2 operators consisting of the
single photon operators $a_i^\dagger$, $a_j$, the identity operator
I, and the total number operator $\hat{n} = \sum_{i=1}^{n} a_i^\dagger a_i$.


Semisimple Algebra 


A Lie algebra is semisimple if it has no solvable 
invariant subalgebras. 


Example The Lie algebra so(4) is semisimple. This
Lie algebra has two invariant subalgebras, both
isomorphic to so(3). The direct sum decomposition

\[
\mathfrak{so}(4) = \mathfrak{so}(3) + \mathfrak{so}(3)
\]


is well known to physical chemists and is respon- 
sible for the dualities that exist between rotating and 
laboratory frame descriptions of molecular systems. 


Simple Algebra 


A Lie algebra is simple if it has no invariant 
subalgebras at all. The prettiest page in the theory 
of Lie groups is the classification theory of the 
simple Lie algebras. We turn to this subject now. 


Lie Algebra Tools 


Two powerful tools have been developed for study- 
ing the structure of a Lie algebra. These are the 
regular representation and the Cartan—Killing form. 
Regular Representation


This representation assigns the structure constants to
a set of n n × n matrices according to

\[
X_\alpha \longrightarrow R(X_\alpha), \qquad R(X_\alpha)_\beta^{\ \gamma} = c_{\alpha\beta}^{\ \ \gamma}
\]


The matrices of the regular representation contain 
exactly as much information as the components of 
the structure tensor. They can be studied by 
standard linear algebra methods. For example, a 
secular equation can be used to put the commuta- 
tion relations into canonical form. 

The structure of the matrices of the regular
representation determines the structure of the Lie
algebra. The identification is carried out according to
the usual rules of representation theory, as shown in
Figure 2. If a basis $X_\alpha$ can be found in which all the
matrices of the regular representation are simultaneously
reducible, the algebra possesses an invariant
subalgebra. If the representation is not fully reducible,
the invariant subalgebra is solvable. If the regular
representation is fully reducible, the algebra consists
of the direct sum of two (or more) smaller, mutually
commuting subalgebras. If the regular representation
is irreducible, the algebra is simple.

If a Lie algebra is solvable (solv), all matrices in
the regular representation can be transformed to
upper triangular matrices. If the Lie algebra is
nilpotent (nil ⊂ solv), the diagonal matrix elements
in the upper triangular matrices are zero. The
converses are also true.


Cartan-Killing Form 


The Cartan-Killing form is a second-order symmetric
tensor that is constructed from the third-order
antisymmetric tensor $c_{\alpha\beta}^{\ \ \gamma}$ by cross-contraction

\[
g_{\alpha\beta} = c_{\alpha\gamma}^{\ \ \delta}\, c_{\beta\delta}^{\ \ \gamma} = g_{\beta\alpha}
= \mathrm{tr}\, R(X_\alpha) R(X_\beta) = (X_\alpha, X_\beta) = (X_\beta, X_\alpha)
\]

The metric $g_{\alpha\beta}$ can be used to place an inner product
$(X_\alpha, X_\beta)$ on this linear vector space. This inner
product is not necessarily positive definite.
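
As a brief illustration of the cross-contraction, the sketch below evaluates the Cartan-Killing metric for the three-dimensional rotation algebra, assuming structure constants $c_{\alpha\beta}^{\ \ \gamma} = -\epsilon_{\alpha\beta\gamma}$ (that is, $[J_1, J_2] = -J_3$ and cyclic). The variable names are illustrative. The result is a negative multiple of the identity, an example of an inner product that is definite but not positive.

```python
def levi_civita(i, j, k):
    """Totally antisymmetric symbol for indices 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

# Assumed structure constants: c[a][b][g] = c_ab^g = -epsilon_abg.
c = [[[-levi_civita(a, b, g) for g in range(3)] for b in range(3)]
     for a in range(3)]

# Cross-contraction g_ab = c_ag^d * c_bd^g (sum over g and d).
killing = [[sum(c[a][g][d] * c[b][d][g] for g in range(3) for d in range(3))
            for b in range(3)] for a in range(3)]
```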



Figure 2 When the regular matrix representation of a Lie 
algebra is reducible, fully reducible, or irreducible, the Lie 
algebra is nonsemisimple, semisimple, or simple. 
The matrix $g_{\alpha\beta}$ can also be treated by standard
linear algebra methods. Since it is real and
symmetric, it can be diagonalized. If there are
$n_-$ negative eigenvalues, $n_+$ positive eigenvalues,
and $n_0$ vanishing eigenvalues ($n = n_- + n_+ + n_0$), the
Lie algebra has a corresponding linear vector space
decomposition of the form

\[
\mathfrak{g} = \mathfrak{g}_- + \mathfrak{g}_+ + \mathfrak{g}_0
\]

The inner product is positive definite on the
subspace $\mathfrak{g}_+$ and negative definite on $\mathfrak{g}_-$. We call
$\mathfrak{g}_0$ the singular subspace. The subspace $\mathfrak{g}_0$ is closed
under commutation and in fact is a nilpotent
invariant subalgebra of g.


Decomposition of Lie Algebras 


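The most general Lie algebra g is the semidirect sum
of a semisimple Lie algebra ss and a solvable
invariant subalgebra solv:

\[
\mathfrak{g} = \mathfrak{ss} \wedge \mathfrak{solv}, \qquad
\begin{aligned}
[\mathfrak{ss}, \mathfrak{ss}] &= \mathfrak{ss} \\
[\mathfrak{ss}, \mathfrak{solv}] &\subseteq \mathfrak{solv} \\
[\mathfrak{solv}, \mathfrak{solv}] &\subset \mathfrak{solv}
\end{aligned}
\]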


The decomposition of g into its component parts 
is accomplished by a simple two-step algorithm. 


1. Compute the Cartan-Killing metric for g and
determine the singular subspace. If there is none,
stop. If the dimension of $\mathfrak{g}_0$ is greater than 0, nil = $\mathfrak{g}_0$ is the
maximal nilpotent invariant subalgebra of g.

2. Compute the structure constants of the Lie
algebra g′ = g − nil = g mod nil = g/nil, the
Cartan-Killing metric tensor on g′, and the
decomposition $\mathfrak{g}' = \mathfrak{g}'_- + \mathfrak{g}'_+ + \mathfrak{g}'_0$. Then a = $\mathfrak{g}'_0$ is
abelian and invariant in g′. In fact, a is the largest
abelian invariant subalgebra in g′.


The algorithm stops here, for the algebra
$\mathfrak{g}'' = \mathfrak{g}' \bmod \mathfrak{a} = \mathfrak{g}'/\mathfrak{a} = \mathfrak{g}'_- + \mathfrak{g}'_+$ has no singular subspace
under its Cartan-Killing metric.

Under this algorithm, the decomposition of g into
its semisimple part and its maximal solvable
invariant subalgebra is

\[
\mathfrak{g} = (\mathfrak{g}'_- + \mathfrak{g}'_+) \wedge (\mathfrak{g}'_0 \wedge \mathfrak{g}_0)
\]

The maximum solvable invariant subalgebra solv
in g is the semidirect sum of a and nil: solv = $\mathfrak{g}'_0 \wedge
\mathfrak{g}_0$ = a ∧ nil. In addition, ss = g mod solv =
g/solv = $\mathfrak{g}'_- + \mathfrak{g}'_+$. The subspace $\mathfrak{g}'_-$ is closed
under commutation and exponentiates into a
compact subgroup of G′. The subspace $\mathfrak{g}'_+$
exponentiates to a noncompact coset in G′ that is
simply connected.

Every element in a semisimple Lie algebra can be 
expressed as the commutator of two elements in the 
Lie algebra. In this sense, a semisimple algebra 
reproduces itself under commutation. 

To illustrate this algorithm, we tear apart the
eight-dimensional Lie algebra spanned by the photon
operators $a_i^\dagger a_j$, 1 ≤ i, j ≤ 2 and $a_3^\dagger a_3$, $a_3^\dagger$, $a_3$, $I$,
where the photon operators obey $[a_i, a_j^\dagger] = \delta_{ij} I$. The
regular representation of the general linear combination
$X = \sum_{i,j} m_{ij}\, a_i^\dagger a_j + n\, a_3^\dagger a_3 + r\, a_3^\dagger + l\, a_3 + \delta I$ is


0 =n m1 
0 m2 —m) 
=Ma Moi +m, =m 0 
miz =mi2 0 —m11 + M2 


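The Cartan-Killing inner product is the trace of the
square of this matrix:

\[
(X, X) = \mathrm{tr}\, R(X)^2 = 2(m_{11} - m_{22})^2 + 8 m_{12} m_{21} + 2n^2
\]

The subspace $\mathfrak{g}_0$ is spanned by $a_1^\dagger a_1 + a_2^\dagger a_2$, $a_3^\dagger$, $a_3$, $I$,
leaving the four operators $a_1^\dagger a_1 - a_2^\dagger a_2$, $a_1^\dagger a_2$,
$a_2^\dagger a_1$, $a_3^\dagger a_3$ to span g′. A simple calculation shows
that $\mathfrak{g}'_0$ is spanned by $a_3^\dagger a_3$. As a result: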





Subspace     Spanned by

g′_+         a_1†a_1 − a_2†a_2,  (1/2)(a_1†a_2 + a_2†a_1)

g′_−         (1/2)(a_1†a_2 − a_2†a_1)

g′_0         a_3†a_3

g_0          a_1†a_1 + a_2†a_2,  a_3†,  a_3,  I


The Lie algebra is the direct sum g = sl(2; R) +
u(1) + h_4.
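
The inner product quoted for this example can be cross-checked with a matrix model. The sketch below is an illustration under explicit assumptions, not the article's own calculation: the operators $a_i^\dagger a_j$ (i, j = 1, 2) are modelled by single-entry matrices in a 2 × 2 block, and $a_3^\dagger a_3$, $a_3^\dagger$, $a_3$, $I$ by the single-entry matrices of the 3 × 3 Heisenberg representation placed in a separate block of a 5 × 5 matrix. The basis labels, ordering, and sample coefficients are hypothetical choices. The regular representation is built from $[E_{ij}, E_{kl}] = \delta_{jk} E_{il} - \delta_{li} E_{kj}$.

```python
# Position (row, column; 0-based) of the single nonzero entry of each basis
# matrix in the assumed 5 x 5 block-diagonal model.
BASIS = {"a1+a1": (0, 0), "a1+a2": (0, 1), "a2+a1": (1, 0), "a2+a2": (1, 1),
         "a3+a3": (3, 3), "a3+": (3, 4), "a3": (2, 3), "I": (2, 4)}
NAMES = list(BASIS)
INDEX = {pos: idx for idx, pos in enumerate(BASIS.values())}

def regular(a):
    """Matrix R with [X_a, X_b] = sum_c R[b][c] X_c."""
    i, j = BASIS[NAMES[a]]
    R = [[0] * 8 for _ in range(8)]
    for b, (k, l) in enumerate(BASIS.values()):
        if j == k:
            R[b][INDEX[(i, l)]] += 1
        if l == i:
            R[b][INDEX[(k, j)]] -= 1
    return R

REG = [regular(a) for a in range(8)]
# Cartan-Killing metric g_ab = tr R(X_a) R(X_b).
KILLING = [[sum(REG[a][c][d] * REG[b][d][c] for c in range(8) for d in range(8))
            for b in range(8)] for a in range(8)]

def inner(x, y):
    return sum(x[a] * KILLING[a][b] * y[b] for a in range(8) for b in range(8))

# Arbitrary sample coefficients in the order of NAMES.
m11, m12, m21, m22, n, r, l, delta = 3, 2, 5, -1, 4, 7, -2, 9
X = [m11, m12, m21, m22, n, r, l, delta]
formula = 2 * (m11 - m22) ** 2 + 8 * m12 * m21 + 2 * n ** 2

# Candidate singular subspace: a1+a1 + a2+a2, a3+, a3, I.
singular = [[1, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0],
            [0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 1]]
unit = [[int(a == b) for b in range(8)] for a in range(8)]
singular_ok = all(inner(v, e) == 0 for v in singular for e in unit)
```

In this model the four candidate vectors are orthogonal to every basis element, which is what membership in the singular subspace requires.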


Structure of Semisimple Lie Algebras 


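The Cartan-Killing metric $g_{\alpha\beta}$ is nonsingular on a
semisimple Lie algebra. The metric and its inverse, $g^{\alpha\beta}$,
can be used to raise and lower indices. In particular, the
tensor whose components are $c_{\alpha\beta\gamma} = c_{\alpha\beta}^{\ \ \delta} g_{\delta\gamma}$ is third-order
antisymmetric: $c_{\alpha\beta\gamma} = c_{\beta\gamma\alpha} = c_{\gamma\alpha\beta} = -c_{\beta\alpha\gamma} = -c_{\alpha\gamma\beta} = -c_{\gamma\beta\alpha}$.
Classification of semisimple Lie algebras is equivalent to
classifying such tensors.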

Another useful way to describe semisimple Lie 
algebras is to search for a canonical structure for the 
commutation relations. A useful canonical form is 
an eigenvalue form 


\[
[X, Y] = \lambda Y
\]

In a basis $X_i$ with $X = x^i X_i$ and $Y = y^j X_j$, this
equation reduces to a standard eigenvalue equation
for the regular representation

\[
\sum_j \sum_k y^j \left( R(x^i X_i)_j^{\ k} - \lambda\, \delta_j^{\ k} \right) X_k = 0
\]

Thus, the search for a standard form for the commutation
relations reduces to a study of the secular equation

\[
\det\bigl(R(X) - \lambda I_n\bigr) = \sum_{j=0}^{n} (-\lambda)^{n-j} \phi_j(X) = 0 \qquad [1]
\]

The coefficients $\phi_j(X)$ are homogeneous polynomials
of degree j in the coefficients $x^i$ of $X = x^i X_i$.

In order to extract maximum information from
this secular equation, a generic vector X ∈ g is
chosen. Such a choice minimizes all degeneracies.
With a generic choice of X ∈ g, it is useful to define
the rank, l, of the Lie algebra g as:


1. the number of functionally independent coefficients
$\phi_j(X)$ in the secular equation;

2. the number of independent roots, $\alpha_1, \alpha_2, \ldots, \alpha_l$,
of the secular equation;

3. the dimension of the subspace H ⊂ g that
commutes with X; and

4. the number of independent (Casimir) operators
that commute with all $X_i$: $C_j(X) = \phi_j(x^i \to X_i)$,
$[C_j(X), X_i] = 0$.


For example, for so(3) or su(2), the secular
equation for $X = x^i X_i$ is

\[
\det\left( \begin{bmatrix} 0 & x_3 & -x_2 \\ -x_3 & 0 & x_1 \\ x_2 & -x_1 & 0 \end{bmatrix} - \lambda I_3 \right)
= (-\lambda)^3 + (-\lambda)\,\phi_2(x) = 0
\]

where $\phi_2(x) = x_1^2 + x_2^2 + x_3^2$. The rank is l = 1. There
is one independent coefficient $\phi_2(x)$ and one
independent root of this equation, $\alpha_1 = \sqrt{-\delta_{ij} x^i x^j} =
i\sqrt{x \cdot x}$. The only linear operators that commute
with X are scalar multiples of X. There is one
independent homogeneous operator that commutes
with all generators $X_i$, obtained by the substitutions
$x^i \to L_i$ (for so(3)) or $x^i \to S_i$ (for su(2)):

\[
C(L) = \phi_2(x \to L) = L_1^2 + L_2^2 + L_3^2
\]
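
Both claims in this example are easy to test with integer arithmetic. The following sketch assumes the 3 × 3 antisymmetric angular momentum matrices as the basis, builds $x^i L_i$ for a few arbitrary sample points, compares the determinant with $(-\lambda)^3 + (-\lambda)\, x \cdot x$, and verifies that the quadratic operator commutes with every generator. Function names and sample values are illustrative.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

L = [[[0, 0, 0], [0, 0, 1], [0, -1, 0]],
     [[0, 0, -1], [0, 0, 0], [1, 0, 0]],
     [[0, 1, 0], [-1, 0, 0], [0, 0, 0]]]

def secular_lhs(x, lam):
    """det(x^i L_i - lam * I_3)."""
    m = [[sum(x[i] * L[i][r][c] for i in range(3)) - (lam if r == c else 0)
          for c in range(3)] for r in range(3)]
    return det3(m)

def secular_rhs(x, lam):
    """(-lam)^3 + (-lam) * phi_2(x) with phi_2 = x1^2 + x2^2 + x3^2."""
    return (-lam) ** 3 + (-lam) * sum(v * v for v in x)

secular_ok = all(secular_lhs(x, lam) == secular_rhs(x, lam)
                 for x in [(1, 2, 3), (0, -4, 5)] for lam in range(-3, 4))

# Quadratic Casimir operator C = L_1^2 + L_2^2 + L_3^2.
squares = [matmul(m, m) for m in L]
casimir = [[sum(s[r][c] for s in squares) for c in range(3)] for r in range(3)]
casimir_commutes = all(matmul(casimir, m) == matmul(m, casimir) for m in L)
```

In this three-dimensional representation the Casimir operator comes out as a multiple of the identity, so its commuting with every generator is immediate.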


The secular equation [1] is over the field of real
numbers. This is not an algebraically closed field.
There is no guarantee that the number of independent
functions $\phi_j(x)$ in the secular equation is equal
to the number of (real) roots of this equation until
we extend the field from R to C, which is
algebraically closed. As a result, the classification
of semisimple Lie algebras is done over complex
numbers. After the complex extensions of the simple
Lie algebras have been classified, their different
inequivalent real forms can be determined.


Root Spaces 


When the secular equation for the regular representation
of a generic element in a Lie algebra is solved,
the commutation relations can be put into a simple
and elegant canonical form. This canonical form
depends on the rank, l, of the Lie algebra, not the
dimension, n, of the Lie algebra. This provides a
very useful simplification, as $n \sim l^2$.

For this canonical form, the independent roots
$\alpha_1(x), \alpha_2(x), \ldots, \alpha_l(x)$ are gathered into a single
vector α with l components. The vectors α =
$(\alpha_1, \alpha_2, \ldots, \alpha_l)$ are called root vectors. The root
vectors exist in an l-dimensional space on which a
positive-definite inner product can be defined. The
root vectors for a rank-l semisimple Lie algebra g
span this Euclidean space. The basis vectors of g
can be identified with the roots in the root space.

The roots in a root space have the following
properties:


1. A positive-definite metric can be placed on the
root space.

2. The vector 0 is a root.

3. The root 0 is l-fold degenerate.

4. If α is a root and cα is a root, c = ±1, 0.

5. If α and β are roots,

\[
\beta' = \beta - \frac{2\,\alpha\cdot\beta}{\alpha\cdot\alpha}\,\alpha
\]

is also a root and $2\alpha\cdot\beta/\alpha\cdot\alpha$ is an integer. In
fact, β′ is the root obtained by reflecting β in the
hyperplane orthogonal to α.

6. The set of reflections generated by nonzero roots
itself forms a group, the Weyl group of the Lie
algebra.

7. The angle between roots α and β is determined by

\[
\cos^2(\alpha, \beta) = \frac{(\alpha\cdot\beta)(\alpha\cdot\beta)}{(\alpha\cdot\alpha)(\beta\cdot\beta)} = \frac{n_1}{2}\,\frac{n_2}{2}
\]

The integers $n_1, n_2$ for noncolinear roots are
constrained by $|n_1 n_2| < 4$.

8. The relative lengths of the roots are determined
by the angles between them:

cos^2 θ(α, β)     θ(α, β)        α·α / β·β
3/4               30°, 150°      3^(±1)
2/4               45°, 135°      2^(±1)
1/4               60°, 120°      1

9. When the roots are normalized so that

\[
\sum_{\alpha \neq 0} \alpha_i \alpha_j = \delta_{ij}
\quad \text{or} \quad
\sum_{\alpha \neq 0} \alpha \cdot \alpha = l
\]

the commutation relations can be placed in the
canonical form presented in the next section.



It is possible to build up all possible root space
diagrams using an “Aufbau” construction. We start
with a rank-1 root space. This consists of three roots
in $R^1$: α, 0, −α.

To construct rank-2 root spaces, a new noncolinear
root β is adjoined to the two nonzero roots. The new
root and the old roots span $R^2$. The new root can only
have a limited set of angles with the roots already
present. The set of roots α, β is completed by reflection
in hyperplanes orthogonal to all roots present. If any
pair of roots violates the angle conditions, the result
is not a root space. In this way, the rank-2 root
spaces $G_2$ (30°), $B_2 = C_2$ (45°), $A_2$ (60°), and $D_2 = A_1 +
A_1$ (90°) are constructed from $A_1$. Proceeding in this
way, it is possible to construct rank-3 root spaces
($B_3$, $C_3$, $A_3 = D_3$) from the rank-2 root spaces, the
rank-4 root spaces from the rank-3 root spaces, and so
forth. Ultimately, there are four unending chains
$A_n, B_n, C_n, D_n$ and five exceptional root spaces
$G_2, F_4, E_6, E_7, E_8$. The rank-2 root spaces are shown
in Figure 3 and the rank-3 root spaces are shown in
Figure 4. The normalization factors (cf. point (9) above)
are shown for the rank-2 root spaces in Figure 3.

Figure 3 Rank-2 root spaces: $G_2$ 30°, $B_2 = C_2$ 45°, $A_2$ 60°, $D_2 = A_1 + A_1$ 90°.

Figure 4 Rank-3 root spaces: $A_3$, $B_3$, $C_3$, $D_3 = A_3$.
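
The completion-by-reflection step of the Aufbau construction can be sketched in a few lines. The code below is illustrative only: the pairs of starting roots are one convenient integer-coordinate choice for three rank-2 root spaces (an assumption of this sketch, not the coordinates used in Figure 3), and the function names are hypothetical. Each root set is closed under reflection in the hyperplanes orthogonal to all roots present, and every pair of roots is then tested against the allowed values of cos² of the angle between them.

```python
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reflect(beta, alpha):
    """Reflect beta in the hyperplane orthogonal to alpha."""
    c = Fraction(2 * dot(alpha, beta), dot(alpha, alpha))
    return tuple(b - c * a for a, b in zip(alpha, beta))

def close_under_reflections(start):
    """Nonzero roots obtained by repeatedly reflecting roots in one another."""
    roots = {tuple(Fraction(x) for x in r) for r in start}
    while True:
        new = {reflect(b, a) for a in roots for b in roots} - roots
        if not new:
            return roots
        roots |= new

# Assumed starting pairs (integer coordinates chosen for convenience).
START = {"A2": [(1, -1, 0), (0, 1, -1)],    # 120 degrees between the pair
         "B2": [(1, -1), (0, 1)],           # 135 degrees
         "G2": [(1, -1, 0), (-2, 1, 1)]}    # 150 degrees

systems = {name: close_under_reflections(pair) for name, pair in START.items()}
sizes = {name: len(roots) for name, roots in systems.items()}

# cos^2 of the angle between any two roots must be 0, 1/4, 2/4, 3/4 or 1.
ALLOWED = {Fraction(k, 4) for k in range(5)}
angles_ok = all(Fraction(dot(a, b) ** 2, dot(a, a) * dot(b, b)) in ALLOWED
                for roots in systems.values() for a in roots for b in roots)
```

The counts exclude the l-fold degenerate zero root; adding the rank 2 to each gives the dimensions 8, 10, and 14 of the corresponding algebras.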


Canonical Commutation Relations 


The canonical commutation relations are expressed
in terms of root vectors. The l operators in g with
the l-fold degenerate root vector 0 are $H_1, H_2, \ldots,
H_l$. These l operators mutually commute. In a
matrix Lie algebra, they can be taken as simultaneously
commuting diagonal matrices. Associated
with each nonzero root α ≠ 0, there is exactly one
basis vector, $E_\alpha$, in g. The canonical commutation
relations are expressed in terms of the roots as
follows:

\[
\begin{aligned}
[H_i, H_j] &= 0 \\
[H_i, E_\alpha] &= \alpha_i E_\alpha \\
[E_\alpha, E_{-\alpha}] &= \alpha \cdot H \\
[E_\alpha, E_\beta] &=
\begin{cases}
N_{\alpha\beta} E_{\alpha+\beta} & \alpha + \beta \text{ a root} \\
0 & \alpha + \beta \text{ not a root}
\end{cases}
\end{aligned}
\]

The structure constants $N_{\alpha\beta}$ are determined from a
recursion relation derived from a chain of roots
β − mα, β − (m − 1)α, ..., β + (n − 1)α, β + nα,
where β − (m + 1)α and β + (n + 1)α are not roots
(cf. Figure 5). The structure constants are

\[
N_{\alpha\beta}^2 = \tfrac{1}{2}\, n (1 + m)(\alpha \cdot \alpha)
\]


The operators $H_i$ and $E_\alpha$ are often called diagonal
and shift operators, respectively. They are generalizations
of the operators $J_z$ and $J_\pm$ of angular
momentum theory. The general idea is as follows.
Since the operators $H_i$ mutually commute, the
matrices $\Gamma(H_i)$ representing these operators can be
chosen as diagonal in any matrix representation.
The action of any of these operators on a basis
vector in this representation is $H_i |m\rangle = m_i |m\rangle$. The
operator $E_\alpha$ shifts the eigenvalue of H according to

\[
H (E_\alpha |m\rangle) = ([H, E_\alpha] + E_\alpha H)|m\rangle = (\alpha + m)(E_\alpha |m\rangle)
\]

In this sense the operators $E_\alpha$ act on basis vectors
$|m\rangle$ in such a way that the eigenvalue m is shifted by
α to m + α.

Figure 5 An α chain containing β.

For the simple classical Lie algebras, the roots can 
be expressed in terms of an orthogonal Euclidean 
basis set as shown in Table 1 and Figures 3 and 4 
for the rank-2 and rank-3 root spaces. The roots for 
the five remaining inequivalent simple Lie algebras 
(“exceptional” algebras) are shown in Table 2. 

The diagonal and shift operators for several of the
classical Lie algebras can be related to bilinear
products of boson or fermion creation and annihilation
operators. For u(n), the bilinear products $a_i^\dagger a_j$
are related to $E_\alpha$ with $\alpha = e_i - e_j$, 1 ≤ i ≠ j ≤ n, and
$H_i = a_i^\dagger a_i$. This holds for either boson or fermion
operators. For sp(2n; R), we have the identifications
with bilinear products of boson operators as
follows: $+e_i + e_j \leftrightarrow b_i^\dagger b_j^\dagger$, $+e_i - e_j \leftrightarrow b_i^\dagger b_j$, $-e_i -
e_j \leftrightarrow b_i b_j$, and $H_i = b_i^\dagger b_i$. In particular, $+2e_i \leftrightarrow b_i^{\dagger 2}$
and $-2e_i \leftrightarrow b_i^2$. For so(2n), we have the identifications
with bilinear products of fermion operators as
follows: $+e_i + e_j \leftrightarrow f_i^\dagger f_j^\dagger$, $+e_i - e_j \leftrightarrow f_i^\dagger f_j$, $-e_i - e_j \leftrightarrow
f_i f_j$, and $H_i = f_i^\dagger f_i$. In particular, $f_i^{\dagger 2} = f_i^2 = 0$. These
identifications make it a relatively simple matter to
construct unitary matrix representations of the
compact Lie groups SU(n) that are symmetric or
antisymmetric, of USp(2n) that are symmetric, and
of SO(2n) that are antisymmetric (bosons ↔ symmetric,
fermions ↔ antisymmetric).


Dynkin Diagrams


Every root in a rank-l root space can be represented as
a linear combination of l “basis roots.” These basis
roots can be chosen in such a way that all coefficients
are integers. In fact, the basis roots can be chosen so
that all linear combinations that are roots involve only
positive integers (and zero) or only negative integers
and zero. This comes about because every shift
operator $E_\delta$ can be written as a multiple commutator

\[
E_\delta \sim [E_\alpha, [E_\beta, E_\gamma]], \qquad \delta = \alpha + \beta + \gamma
\]

One simple way to construct such a basis set of
fundamental roots is to construct an (l − 1)-dimensional
plane through the origin of the root space that
contains no nonzero roots, and choose as l fundamental
roots the l roots on one side of this
hyperplane that are closest to it. For the classical
simple Lie algebras, the fundamental roots are:

Root space   α_1          α_2          ...   α_(l−1)            α_l
A_(l−1)      e_1 − e_2    e_2 − e_3    ...   e_(l−1) − e_l
D_l          e_1 − e_2    e_2 − e_3    ...   e_(l−1) − e_l      +e_(l−1) + e_l
B_l          e_1 − e_2    e_2 − e_3    ...   e_(l−1) − e_l      +e_l
C_l          e_1 − e_2    e_2 − e_3    ...   e_(l−1) − e_l      +2e_l

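Table 1 Roots for the simple classical Lie groups and algebras

Group              Algebra                                        Root space   Rank    Roots                            Conditions
SU(l)              $\mathfrak{su}(l)$                             $A_{l-1}$    $l-1$   $+e_i - e_j$                     $1 \le i \ne j \le l$
SO(2l)             $\mathfrak{so}(2l)$                            $D_l$        $l$     $\pm e_i \pm e_j$                $1 \le i < j \le l$
SO(2l+1)           $\mathfrak{so}(2l+1)$                          $B_l$        $l$     $\pm e_i \pm e_j$, $\pm e_k$     $1 \le i < j,\ k \le l$
Sp(l) = USp(2l)    $\mathfrak{sp}(l) = \mathfrak{usp}(2l)$        $C_l$        $l$     $\pm e_i \pm e_j$, $\pm 2e_k$    $1 \le i < j,\ k \le l$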
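
The root lists in Table 1 can be checked by counting: for a simple Lie algebra the dimension equals the rank plus the number of nonzero roots. The following is a minimal sketch, not part of the original article; the function names `classical_roots` and `dimension` are illustrative, and the roots are stored as integer tuples in the orthogonal basis $e_1, \ldots, e_l$.

```python
from itertools import combinations


def classical_roots(series, l):
    """Nonzero roots of Table 1 as integer tuples in the basis e_1..e_l.

    series is one of "A" (su(l)), "D" (so(2l)), "B" (so(2l+1)), "C" (sp(l)).
    """
    roots = set()
    for i, j in combinations(range(l), 2):
        if series == "A":
            # su(l): only the differences e_i - e_j and e_j - e_i
            sign_pairs = [(1, -1), (-1, 1)]
        else:
            # so(2l), so(2l+1), sp(l): all four combinations +-e_i +- e_j
            sign_pairs = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
        for si, sj in sign_pairs:
            v = [0] * l
            v[i], v[j] = si, sj
            roots.add(tuple(v))
    if series in ("B", "C"):
        # extra roots +-e_k for B_l, +-2e_k for C_l
        c = 1 if series == "B" else 2
        for k in range(l):
            for s in (c, -c):
                v = [0] * l
                v[k] = s
                roots.add(tuple(v))
    return roots


def dimension(series, l):
    """Dimension = rank + number of nonzero roots."""
    rank = l - 1 if series == "A" else l
    return rank + len(classical_roots(series, l))


if __name__ == "__main__":
    print(dimension("A", 3))  # su(3)
    print(dimension("D", 4))  # so(8)
    print(dimension("B", 3))  # so(7)
    print(dimension("C", 3))  # sp(3) = usp(6)
```

The results agree with the familiar closed forms $l^2 - 1$, $l(2l-1)$, $l(2l+1)$, and $l(2l+1)$ for the four series.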


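Table 2 Roots for the simple exceptional Lie algebras

Root space   Rank   Dimension   Roots                                                                          Conditions
$G_2$        2      14          $+e_i - e_j$                                                                   $1 \le i \ne j \ne k \le 3$
                                $\pm[(e_i + e_j) - 2e_k]$
$F_4$        4      52          $\pm e_i \pm e_j$, $\pm 2e_i$                                                  $1 \le i < j \le 4$
                                $\pm e_1 \pm e_2 \pm e_3 \pm e_4$
$E_6$        6      78          $\pm e_i \pm e_j$                                                              $1 \le i < j \le 5$
                                $\frac{1}{2}(\pm e_1 \pm e_2 \pm e_3 \pm e_4 \pm e_5) \pm \frac{\sqrt{3}}{2} e_6$   (a)
$E_7$        7      133         $\pm e_i \pm e_j$, $\pm\sqrt{2}\, e_7$                                         $1 \le i < j \le 6$
                                $\frac{1}{2}(\pm e_1 \pm e_2 \pm e_3 \pm e_4 \pm e_5 \pm e_6) \pm \frac{1}{\sqrt{2}} e_7$   (b)
$E_8$        8      248         $\pm e_i \pm e_j$                                                              $1 \le i < j \le 8$
                                $\frac{1}{2}(\pm e_1 \pm e_2 \pm e_3 \pm e_4 \pm e_5 \pm e_6 \pm e_7 \pm e_8)$   (a)

(a) Even number of + signs.
(b) Even number of + signs within bracket.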
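
As a consistency check on the $E_8$ row of Table 2, the roots can be generated explicitly and counted. This is a hedged sketch rather than anything from the original text: the helper name `e8_roots` is an assumption, and exact rational arithmetic is used so the squared lengths can be compared without rounding.

```python
from fractions import Fraction
from itertools import combinations, product


def e8_roots():
    """E8 roots in the orthogonal basis e_1..e_8, following Table 2."""
    roots = set()
    # Roots of the form +-e_i +- e_j with i < j.
    for i, j in combinations(range(8), 2):
        for si, sj in product((1, -1), repeat=2):
            v = [Fraction(0)] * 8
            v[i], v[j] = Fraction(si), Fraction(sj)
            roots.add(tuple(v))
    # Roots of the form (1/2)(+-e_1 ... +-e_8) with an even number of + signs.
    half = Fraction(1, 2)
    for signs in product((1, -1), repeat=8):
        if signs.count(1) % 2 == 0:
            roots.add(tuple(half * s for s in signs))
    return roots


if __name__ == "__main__":
    roots = e8_roots()
    rank = 8
    print(len(roots), rank + len(roots))
```

All roots produced this way have the same squared length, and the rank plus the number of roots reproduces the dimension listed in the table.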


All roots in the rank-2 root spaces have been 
expressed in terms of both two orthogonal vectors 
and two fundamental roots in Figure 3. 

If $\alpha_i$ and $\alpha_j$ are fundamental roots, their inner
product is zero or negative:

\[ \cos(\alpha_i, \alpha_j) = 0,\ -\sqrt{1/4},\ -\sqrt{2/4},\ -\sqrt{3/4} \]

This information has been used to classify the root
spaces of the inequivalent simple Lie algebras (over
C). The procedure is as follows. Each of the $l$
fundamental roots in a rank-$l$ root space is represented
by a dot in a plane. Dots representing roots
$\alpha_i$ and $\alpha_j$ are connected by $n_{ij}$ lines, where
$\cos(\alpha_i, \alpha_j) = -\sqrt{n_{ij}/4}$. Orthogonal roots are not
connected by any lines. Such diagrams are called
Dynkin diagrams. Disconnected Dynkin diagrams
describe semisimple Lie algebras. Connected Dynkin
diagrams classify simple Lie algebras.

The properties of Dynkin diagrams arise from two 
simple observations: 


O1: The root space is positive definite.
O2: If $u$ is a unit vector and $v_i$ are an orthonormal
set of vectors,

\[ \sum_i (u \cdot v_i)^2 \le 1 \]

These two observations lead to three important
properties of Dynkin diagrams.

D1: There are no loops. If $\alpha_i$ ($i = 1, 2, \ldots, k$) are in a
loop, then there are at least as many lines as
vertices. With $u_i = \alpha_i/|\alpha_i|$,

\[ \Big( \sum_{i=1}^{k} u_i, \sum_{j=1}^{k} u_j \Big) = k + 2 \sum_{i<j} u_i \cdot u_j > 0 \]

Since $2 u_i \cdot u_j \le -1$ if $u_i \cdot u_j \ne 0$, there cannot be
as many lines as vertices.

D2: The number of lines connected to any node is
less than 4. If $\alpha_i$ are connected to $v$, then with
$u_i = \alpha_i/|\alpha_i|$,

\[ \sum_i (u_i \cdot v)^2 = \sum_i n_i/4 < 1 \]

since $v$ is linearly independent of the $\alpha_i$.

D3: A simple chain connecting any two nodes can be
shrunk. If the original diagram is allowed, the
shrunk diagram is also allowed, and conversely.
Since the shrunk diagram in Figure 6 violates D2,
the original is not an allowed Dynkin diagram.


According to these results, the maximum number
of lines that can be attached to a vertex is three. If a
vertex is attached to three lines, it can be connected
to three (one line each) other vertices, two (two plus
one) other vertices, or only one other vertex (all
three lines). This last case describes Dynkin diagram
$G_2$ (cf. Figures 3 and 5).

Figure 6 A chain with single links can be removed from a
diagram. If the original is an allowed Dynkin diagram, the shrunk
diagram is also allowed, and conversely.

The only remaining possibilities are shown in
Figure 7.

For diagrams of type (B, C, F) we define vectors

\[ u = \sum_{i=1}^{p} i\, u_i, \qquad v = \sum_{j=1}^{q} j\, v_j \]

where as usual $u_i$, $v_j$ are unit vectors $\alpha_*/|\alpha_*|$. The
Schwarz inequality applied to $u$ and $v$ leads to the
inequality

\[ \Big(1 + \frac{1}{p}\Big)\Big(1 + \frac{1}{q}\Big) > 2 \]

The solutions with $p \ge q$ are

p           q    Root space      Constraint
arbitrary   1    $B_l$, $C_l$    $l = p + 1$
2           2    $F_4$

For diagrams of type (D, E), we define vectors

\[ u = \sum_{i=1}^{p-1} i\, u_i, \qquad v = \sum_{j=1}^{q-1} j\, v_j, \qquad w = \sum_{k=1}^{r-1} k\, w_k \]

where as usual $u_i$, $v_j$, $w_k$ are unit vectors $\alpha_*/|\alpha_*|$.
With similar arguments, we obtain the inequality

\[ \frac{1}{p} + \frac{1}{q} + \frac{1}{r} > 1 \]

Figure 7 The only remaining candidate Dynkin diagrams have
either two vertices (B, C, F) or one vertex (D, E) connected to
three lines.

The solutions with $p \ge q \ge r$ are

p           q    r    Root space     Regular Euclidean solid
arbitrary   2    2    $D_{p+2}$
3           3    2    $E_6$          Tetrahedron
4           3    2    $E_7$          Cube-octahedron
5           3    2    $E_8$          Icosahedron-dodecahedron

All allowed Dynkin diagrams are shown in Figure 8.
In these diagrams roots making an angle of 120 degrees
with each other (joined by single lines) have equal
length. Roots joined by double lines or triple lines
have different lengths. The arrows on double lines
indicate the shorter and longer roots. Arrows point to
longer roots. The root spaces $G_2$ and $F_4$ are self-dual, so
it does not matter which way the arrow points.

Figure 8 Four infinite series ($A_l$, $D_l$, $B_l$, $C_l$) of Dynkin diagrams
exist and correspond to the classical simple Lie groups (SU(l+1),
SO(2l), SO(2l+1), USp(2l)). The five exceptional Dynkin
diagrams include a short finite series ($E_l$, $l = 6, 7, 8$), $F_4$, and $G_2$.

Coxeter-Dynkin diagrams also appear in classical 
geometry and catastrophe theory. 


Real Forms 


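The metric tensor $g_{\mu\nu}$ for a simple Lie algebra (over C)
in the canonical basis $H_i$, $E_\alpha$ is

\[ (H_i, H_j) = \delta_{ij}, \qquad (E_\alpha, E_\beta) = \delta_{\alpha+\beta,0}, \qquad (H_i, E_\alpha) = 0 \qquad [2] \]

In this basis, the Lie algebra decomposes into
positive- and negative-definite subspaces according to

\[ \mathfrak{g} = \mathfrak{g}_+ + \mathfrak{g}_- \]

$\mathfrak{g}_+$ spanned by $H_i$, $(E_{+\alpha} + E_{-\alpha})/\sqrt{2}$
$\mathfrak{g}_-$ spanned by $(E_{+\alpha} - E_{-\alpha})/\sqrt{2}$

The choice of basis suggested above diagonalizes the
Cartan-Killing form in eqn [2]: $g \to I_{p,q}$, with
$p = l + (1/2)(n - l)$ positive values +1 on the diagonal
and $q = (1/2)(n - l)$ values -1 on the diagonal.
The trace of this matrix is the trace of $g$: $+l$.

An arbitrary element in this (complex) Lie algebra
is a linear superposition of the form

\[ X = \sum_i h^i H_i + \sum_{\alpha \ne 0} e^\alpha E_\alpha \qquad [3] \]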


where all $n$ coefficients $h^i$, $e^\alpha$ are complex. If all
these coefficients are taken real, the resulting Lie
algebra closes under commutation and describes a
noncompact Lie group. The subalgebra describing
the maximal compact subgroup is spanned by the
linear combinations $(E_{+\alpha} - E_{-\alpha})/\sqrt{2}$. The remaining
operators exponentiate to a noncompact coset

\[ \mathrm{EXP}\{ h^i H_i + e^\alpha (E_{+\alpha} + E_{-\alpha})/\sqrt{2} \} \]

which is topologically equivalent to $R^K$, $K = l +
(1/2)(n - l) = (1/2)(n + l)$. Of all the real forms of
the complex Lie algebra described by this set of
canonical commutation relations (or root space, or
Dynkin diagram), this is the least compact real form.

The compact real form is obtained from [3] by
taking linear combinations

\[ X = \sum_i i h^i H_i + \sum_{\alpha > 0} i e_+^{\alpha} (E_{+\alpha} + E_{-\alpha})/\sqrt{2}
     + \sum_{\alpha > 0} e_-^{\alpha} (E_{+\alpha} - E_{-\alpha})/\sqrt{2} \]

where $h^i$, $e_+^{\alpha}$, $e_-^{\alpha}$ are real. The compact real forms of
the simple Lie algebras are:

Root space    Group
$A_{l-1}$     SU(l)
$D_l$         SO(2l)
$B_l$         SO(2l+1)
$C_l$         USp(2l) = Sp(l)

If the imaginary factor $i$ is absorbed into the
Cartan-Killing metric, this metric is diagonal, all
matrix elements are -1, the trace of this form is $-n$,
and the linear combinations for $X$ are real.

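Every complex simple Lie algebra (i.e., simple Lie
algebra over C) has a spectrum of inequivalent real
forms. These can all be obtained from the compact
real form by an analog of Minkowski's "rotation
trick," derived by Cartan. Cartan introduced a
metric-preserving linear mapping ("involutive
automorphism") $T: \mathfrak{g} \to \mathfrak{g}$ with the property $T^2 = I$ and
$(TX, TY) = (X, Y)$, with $X, Y \in \mathfrak{g}$. The operator $T$
has eigenvalues $\pm 1$ and induces a decomposition
("Cartan decomposition") in $\mathfrak{g}$ as follows:

\[ \mathfrak{g} = \mathfrak{k} + \mathfrak{p}, \qquad
   T(\mathfrak{g}) = T(\mathfrak{k}) + T(\mathfrak{p}) = \mathfrak{k} - \mathfrak{p} \]

As a result, the subspaces $\mathfrak{k}$ and $\mathfrak{p}$ are orthogonal.
The subspaces obey the following commutation and
inner-product properties:

\[ [\mathfrak{k}, \mathfrak{k}] \subseteq \mathfrak{k}, \qquad (\mathfrak{k}, \mathfrak{k}) < 0 \]
\[ [\mathfrak{k}, \mathfrak{p}] \subseteq \mathfrak{p}, \qquad (\mathfrak{k}, \mathfrak{p}) = 0 \]
\[ [\mathfrak{p}, \mathfrak{p}] \subseteq \mathfrak{k}, \qquad (\mathfrak{p}, \mathfrak{p}) < 0 \]

Under the analytic continuation $\mathfrak{p} \to i\mathfrak{p}$, the compact
Lie algebra $\mathfrak{g}$ is rotated to a noncompact Lie
algebra $\mathfrak{g}'$ whose commutation relations and
inner-product properties are

\[ \mathfrak{g} = \mathfrak{k} + \mathfrak{p} \ \to\ \mathfrak{g}' = \mathfrak{k} + i\mathfrak{p} = \mathfrak{k} + \mathfrak{p}' \]
\[ [\mathfrak{k}, \mathfrak{k}] \subseteq \mathfrak{k}, \qquad (\mathfrak{k}, \mathfrak{k}) < 0 \]
\[ [\mathfrak{k}, \mathfrak{p}'] \subseteq \mathfrak{p}', \qquad (\mathfrak{k}, \mathfrak{p}') = 0 \]
\[ [\mathfrak{p}', \mathfrak{p}'] \subseteq \mathfrak{k}, \qquad (\mathfrak{p}', \mathfrak{p}') > 0 \]

The maximal compact subalgebra of $\mathfrak{g}'$ is $\mathfrak{k}$. The
subspace $\mathfrak{p}'$ exponentiates to a simply connected
submanifold on which the Cartan-Killing metric is
positive definite. This manifold is topologically
equivalent to $R^K$, $K = \dim \mathfrak{p}$. It is not geometrically
equivalent to $R^K$ once an invariant metric is placed
on it.

Three linear mappings that satisfy $T^2 = I$ suffice
to generate all real forms of all the simple classical
Lie algebras.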


Block Matrix Decomposition 


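The compact Lie algebra $\mathfrak{u}(n; F)$ has a block
submatrix decomposition ($n = p + q$):

\[ \mathfrak{u}(n; F) = \begin{pmatrix} A_p & B \\ -B^\dagger & A_q \end{pmatrix} \]

where $A_p^\dagger = -A_p$, $A_q^\dagger = -A_q$, and $B$ is an arbitrary
$p \times q$ matrix over $F$. Under the map

\[ T(\mathfrak{g}) = I_{p,q}\, \mathfrak{g}\, I_{p,q}, \qquad
   I_{p,q} = \begin{pmatrix} I_p & 0 \\ 0 & -I_q \end{pmatrix} \]

the diagonal subspace

\[ \begin{pmatrix} A_p & 0 \\ 0 & A_q \end{pmatrix} \]

has eigenvalue +1 and the off-diagonal subspace

\[ \begin{pmatrix} 0 & +B \\ -B^\dagger & 0 \end{pmatrix} \]

has eigenvalue -1. Under the Cartan rotation

\[ \mathfrak{u}(n; F) \to \mathfrak{u}(p, q; F) =
   \begin{pmatrix} A_p & 0 \\ 0 & A_q \end{pmatrix} +
   \begin{pmatrix} 0 & +B \\ +B^\dagger & 0 \end{pmatrix} \]

The real forms of the classical Lie groups obtained
in this way are

$D_n$, $B_n$:   SO(2n), SO(2n+1) -> SO(p, q)
$A_{n-1}$:      SU(n) -> SU(p, q)
$C_n$:          Sp(n) -> Sp(p, q)
                USp(2n) -> USp(2p, 2q)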


Subfield Restriction 


The Lie algebra $\mathfrak{su}(n)$ of complex traceless
anti-Hermitian matrices has a subalgebra $\mathfrak{so}(n)$ of real
antisymmetric matrices. The algebra $\mathfrak{su}(n)$ can be
expressed in terms of real $n \times n$ antisymmetric
matrices $A_n$ and traceless symmetric matrices $S_n$:

\[ \mathfrak{su}(n) = \mathfrak{so}(n) + [\mathfrak{su}(n) - \mathfrak{so}(n)] = A_n + i S_n \]

The Cartan rotation is

\[ \mathfrak{su}(n) \to \mathfrak{sl}(n; R) = \mathfrak{so}(n) + i[\mathfrak{su}(n) - \mathfrak{so}(n)] = A_n + S_n \]

The classical Lie group generated by this transformation
is SL(n; R).

A similar rotation can be carried out on unitary
matrices over the quaternion field, $\mathfrak{u}(n; Q) = \mathfrak{sp}(n)$.
This algebra contains the subalgebra $\mathfrak{u}(n)$ in which
quaternions $q = q_0 + \mathcal{I} q_1 + \mathcal{J} q_2 + \mathcal{K} q_3$ are restricted
to complex numbers $q = q_0 + i q_1$. There is a natural
decomposition

\[ \mathfrak{sp}(n) = \mathfrak{u}(n) + [\mathfrak{sp}(n) - \mathfrak{u}(n)] \]

It is useful at this point to replace each quaternion
matrix element by a $2 \times 2$ complex matrix: $\mathfrak{sp}(n) \to
\mathfrak{usp}(2n)$. This is a unitary representation of the
symplectic algebra. Replacing the complex matrix
elements in $\mathfrak{u}(n)$ by $2 \times 2$ real matrices simultaneously
generates a real matrix representation of $\mathfrak{u}(n)$ named
$\mathfrak{ou}(2n)$. This is an orthogonal representation of the
unitary algebra. The decomposition above is

\[ \mathfrak{sp}(n) \to \mathfrak{u}(n) + [\mathfrak{sp}(n) - \mathfrak{u}(n)]
   \to \mathfrak{ou}(2n) + [\mathfrak{usp}(2n) - \mathfrak{ou}(2n)] = A_{2n} + i S_{2n} \]

where as before $A_{2n}$ and $S_{2n}$ are $2n \times 2n$ antisymmetric
and symmetric matrices. The Cartan rotation
maps this to $\mathfrak{sp}(2n; R)$,

\[ \mathfrak{usp}(2n) \to \mathfrak{sp}(2n; R) = A_{2n} + S_{2n} \]

The classical Lie group generated in this way is
Sp(2n; R). Matrices in this group satisfy the quadratic
constraint $M^t G M = G$, $G^t = -G$, $\det(G) \ne 0$. The real
symplectic groups leave invariant Hamilton's equations
of motion: $dp_i/dt = -\partial H/\partial q_i$; $dq_i/dt = +\partial H/\partial p_i$.


Field Embeddings


The image of $\mathfrak{u}(n) \to \mathfrak{ou}(2n)$ consists of a set of
$2n \times 2n$ antisymmetric matrices of dimension $n^2$.
These matrices form a subset of $\mathfrak{so}(2n)$, which
consists of $2n \times 2n$ antisymmetric matrices of
dimension $2n(2n - 1)/2$. As a result, $\mathfrak{ou}(2n)$ is a
subalgebra in $\mathfrak{so}(2n)$. Thus, $\mathfrak{ou}(2n) \sim \mathfrak{k}$ and
$\mathfrak{so}(2n) \sim \mathfrak{g}$ and we have a Cartan decomposition

\[ \mathfrak{so}(2n) = \mathfrak{ou}(2n) + [\mathfrak{so}(2n) - \mathfrak{ou}(2n)]
   \ \to\ \mathfrak{ou}(2n) + i[\mathfrak{so}(2n) - \mathfrak{ou}(2n)] = \mathfrak{so}^*(2n) \]

In the same way, the image of $\mathfrak{sp}(n) \to \mathfrak{usp}(2n)$
consists of an $n(2n + 1)$-dimensional set of $2n \times 2n$
anti-Hermitian matrices. This is a subset of $\mathfrak{su}(2n)$,
which has dimension $(2n)^2 - 1$. It is also a subalgebra
of $\mathfrak{su}(2n)$. Thus, $\mathfrak{usp}(2n) \sim \mathfrak{k}$ and $\mathfrak{su}(2n) \sim \mathfrak{g}$,
so we have a Cartan decomposition

\[ \mathfrak{su}(2n) = \mathfrak{usp}(2n) + [\mathfrak{su}(2n) - \mathfrak{usp}(2n)]
   \ \to\ \mathfrak{usp}(2n) + i[\mathfrak{su}(2n) - \mathfrak{usp}(2n)] = \mathfrak{su}^*(2n) \]

These real forms are summarized in Table 3.

Table 3 Real forms of the simple classical Lie algebras

Mapping                Real form                 Maximal compact subalgebra   Root space    Condition
Block submatrix        so(p, q)                  so(p) + so(q)                $D_n$         p + q = 2n
                       so(p, q)                  so(p) + so(q)                $B_n$         p + q = 2n + 1
                       su(p, q)                  u(1) + su(p) + su(q)         $A_{n-1}$     p + q = n
                       sp(p, q) = usp(2p, 2q)    usp(2p) + usp(2q)            $C_n$         p + q = n
Subfield restriction   sl(n; R)                  so(n)                        $A_{n-1}$
                       sp(2n; R)                 u(n)                         $C_n$
Field embedding        so*(2n)                   u(n)                         $D_n$
                       su*(2n)                   sp(n) = usp(2n)              $A_{2n-1}$
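
The quadratic constraint $M^t G M = G$ that defines Sp(2n; R), and its link to Hamilton's equations, can be illustrated in the smallest case $n = 1$. The sketch below is an illustration added here, not the article's construction: it assumes the harmonic-oscillator Hamiltonian $H = (p^2 + q^2)/2$, whose time-$t$ flow on $(q, p)$ is a rotation, and the helper names are hypothetical.

```python
import math

# Antisymmetric, nonsingular matrix G for one degree of freedom (q, p).
G = [[0.0, 1.0], [-1.0, 0.0]]


def matmul(a, b):
    """Product of two small dense matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


def oscillator_flow(t):
    """Time-t map (q, p) -> (q(t), p(t)) for H = (p^2 + q^2)/2.

    Hamilton's equations give dq/dt = p and dp/dt = -q, so the flow is a rotation.
    """
    c, s = math.cos(t), math.sin(t)
    return [[c, s], [-s, c]]


def is_symplectic(m, tol=1e-12):
    """Check the quadratic constraint M^t G M = G."""
    lhs = matmul(matmul(transpose(m), G), m)
    return all(abs(lhs[i][j] - G[i][j]) < tol for i in range(2) for j in range(2))


if __name__ == "__main__":
    print(is_symplectic(oscillator_flow(0.7)))        # Hamiltonian flow
    print(is_symplectic([[2.0, 0.0], [0.0, 1.0]]))    # stretches q only
```

A Hamiltonian flow passes the test, while a map that rescales $q$ without compensating in $p$ does not, since it changes the phase-space area.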


Table 4 Equivalence among real forms of the simple classical
Lie algebras

$A_1 = B_1 = C_1$                                       $\chi$
su(2) = so(3) = sp(1) = usp(2)                          -3
su(1,1) = sl(2; R) = so(2,1) = sp(2; R)                 +1

$D_2 = A_1 + A_1$                                       $\chi$
so(4) = so(3) + so(3)                                   -6
so*(4) = so(3) + so(2,1)                                -2
so(3,1) = sl(2; C)                                       0
so(2,2) = so(2,1) + so(2,1)                             +2

$B_2 = C_2$                                             $\chi$
so(5) = sp(2) = usp(4)                                  -10
so(4,1) = sp(1,1) = usp(2,2)                            -2
so(3,2) = sp(4; R)                                      +2

$D_3 = A_3$                                             $\chi$
so(6) = su(4)                                           -15
so(5,1) = su*(4)                                        -5
so*(6) = su(3,1)                                        -3
so(4,2) = su(2,2)                                       +1
so(3,3) = sl(4; R)                                      +3

The root spaces $A_1$[SU(2)], $B_1$[SO(3)], and
$C_1$[U(1; Q) ~ USp(2; C)] are equivalent. As a result,
the different real forms of their complex extensions
are related to each other. Similar remarks hold for
the real forms of $B_2 = C_2$, $D_2 = A_1 + A_1$, and
$D_3 = A_3$. The relations among these real forms are
summarized in Table 4. This table is useful in
inferring "spinor representations" among classical
groups. Thus, SO(3) has spinor representations
based on SU(2) and Sp(1); SO(4) has spinor
representations based on SU(2) x SU(2); SO(5) has
spinor representations based on USp(4); and SO(6)
has spinor representations based on SU(4).

For completeness, the real forms for the excep- 
tional Lie algebras are collected in Table 5. 

Real forms of the complex extension of a simple
Lie algebra are almost uniquely distinguished by an
index. This is the trace of the Cartan-Killing form
[2], once the appropriate factors of $i$ have been
absorbed into it. If $n_c$ is the dimension of the
maximal compact subgroup, $\chi = \mathrm{tr}(g) = +1(n - n_c)
- 1(n_c) = n - 2n_c$. The index ranges from $-n$ for the
compact real form (for which $n_c = n$) to $+l$ for the
least compact real form.
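
The index formula $\chi = n - 2n_c$ is easy to evaluate for the orthogonal real forms so(p, q), whose maximal compact subalgebra is so(p) + so(q). The following sketch is illustrative; the function names `so_dim`, `index`, and `so_index` are assumptions introduced here.

```python
def so_dim(n):
    """Dimension of so(n): the number of independent n x n antisymmetric matrices."""
    return n * (n - 1) // 2


def index(n, n_c):
    """Index chi = n - 2 n_c from the total and maximal-compact dimensions."""
    return n - 2 * n_c


def so_index(p, q):
    """Index of the real form so(p, q), with maximal compact subalgebra so(p) + so(q)."""
    return index(so_dim(p + q), so_dim(p) + so_dim(q))


if __name__ == "__main__":
    for p, q in [(6, 0), (5, 1), (4, 2), (3, 3)]:
        print("so(%d,%d): chi = %+d" % (p, q, so_index(p, q)))
```

For the real forms of $D_3$ this gives the values listed for so(6), so(5,1), so(4,2), and so(3,3) in Table 4; the form so*(6) is not of the type so(p, q), but `index(15, 9)` with the 9-dimensional maximal compact subalgebra u(3) gives its entry as well.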


Riemannian Symmetric Spaces 


Exponentiation lifts Lie algebras to Lie groups and
subspaces in Lie algebras into submanifolds in Lie
groups. In particular, exponentiation of a Cartan
decomposition

\[ \begin{array}{ccccc}
   \mathfrak{g} & = & \mathfrak{k} & + & \mathfrak{p} \\
   \downarrow & & \downarrow & & \downarrow \\
   G & = & K & \times & (P = G/K)
   \end{array} \]

lifts the subspace $\mathfrak{p}$ to the quotient ($P = G/K$).

Table 5 Real forms of the exceptional Lie algebras

                                          Maximal compact subgroup
Root space   Class_rank(character)        Root space       Dimension
$G_2$        $G_{2(-14)}$                 $G_2$            14
             $G_{2(+2)}$                  $A_1 + A_1$      6
$F_4$        $F_{4(-52)}$                 $F_4$            52
             $F_{4(-20)}$                 $B_4$            36
             $F_{4(+4)}$                  $C_3 + A_1$      24
$E_6$        $E_{6(-78)}$                 $E_6$            78
             $E_{6(-26)}$                 $F_4$            52
             $E_{6(-14)}$                 $D_5 + D_1$      46
             $E_{6(+2)}$                  $A_5 + A_1$      38
             $E_{6(+6)}$                  $C_4$            36
$E_7$        $E_{7(-133)}$                $E_7$            133
             $E_{7(-25)}$                 $E_6 + D_1$      79
             $E_{7(-5)}$                  $D_6 + A_1$      69
             $E_{7(+7)}$                  $A_7$            63
$E_8$        $E_{8(-248)}$                $E_8$            248
             $E_{8(-24)}$                 $E_7 + A_1$      136
             $E_{8(+8)}$                  $D_8$            120

A metric may be defined on the Lie group $G$ as
follows. Define the distance between the identity $I$
and some nearby point $g(\epsilon) = \mathrm{EXP}(\epsilon X) =
\mathrm{EXP}(\delta x^i X_i)$ by

\[ ds^2(0) = G_{rs}\, \delta x^r\, \delta x^s \]

Move $I$ and $g(\epsilon)$ to the neighborhood of any point
$g(x) \in G$ by left multiplication: $I \to g(x)$,
$g(x) g(\delta x^i X_i) \to g((x + dx)^i X_i)$. The infinitesimals
$dx^i(x)$ at $x$ (defined by $g(x)$) and $\delta x^i = dx^i(0)$ at $I$
are linearly related,

\[ \delta x^r = M^r{}_i(x)\, dx^i(x) \]

Requiring that the distance $ds$ between $I$ and
$g(\delta x^i X_i)$ at the identity be the same as the
distance between $g(x^i X_i) I$ and $g(x^i X_i) g(\delta x^i X_i) =
g((x + dx)^i X_i)$ at $g(x^i X_i)$ leads to the condition

\[ ds^2 = G_{rs}(0)\, \delta x^r\, \delta x^s
        = G_{rs}(0) M^r{}_i(x) M^s{}_j(x)\, dx^i(x)\, dx^j(x)
        = G_{ij}(x)\, dx^i(x)\, dx^j(x) \]

An invariant metric $G(x)$ over the Lie group $G$ is
defined by

\[ G_{ij}(x) = G_{rs}(0) M^r{}_i(x) M^s{}_j(x), \qquad G(x) = M^t(x)\, G(0)\, M(x) \]

It is useful to identify $G(0)$ with the Cartan-Killing
inner product on $\mathfrak{g}$. Since $M(x)$ is nonsingular, the
signature of $G(x)$ is invariant over the group.

The invariant metric on $G$ can be restricted to
subspaces $K \subset G$ and $P = G/K \subset G$. The signature
on these subspaces is the same as the signature on
the subspaces $\mathfrak{k}$ and $\mathfrak{p}$ in $\mathfrak{g}$. Thus, if $G$ is compact,
the invariant metric is negative definite on $K$ and on
$P = G/K$ and positive definite on the analytically
continued space $P' = G'/K$. In short, it is definite
(negative, positive) on $P$, $P'$. These spaces are
Riemannian spaces and they are globally symmetric.
They have been investigated by studying the properties
of the secular equation of the Lie algebra $\mathfrak{g}$,
restricted to the subspace $\mathfrak{p}$:

\[ \det[R(p^i P_i) - \lambda I] = \sum_j (-\lambda)^{n-j}\, \hat{\phi}_j(p) = 0 \qquad [4] \]

where the $P_i$ are basis vectors that span $\mathfrak{p}$. The
coefficients $\hat{\phi}_j(p)$ in the secular equation [4] for
Riemannian symmetric spaces are related to the
coefficients $\phi_j(x)$ in the secular equation [1] for Lie
algebras. A rank for the Riemannian symmetric
space $P = \mathrm{EXP}(\mathfrak{p})$ can be defined from the secular
equation following exactly the prescription followed
for the Lie algebra $\mathfrak{g}$. The rank of the Riemannian
symmetric space $P = \mathrm{EXP}(\mathfrak{p})$ is

1. the number of functionally independent coefficients
   $\hat{\phi}_j(p)$ in the secular equation;

2. the number of independent roots of the secular
   equation;

3. the dimension of the maximal Euclidean subspace
   in $P$; and

4. the number of independent (Laplace-Beltrami)
   operators that commute with all displacement
   operators $P_i$: $\Delta_j(P) = \hat{\phi}_j(p^i \to P_i)$.

Rank-1 Riemannian symmetric spaces are isotropic
as well as homogeneous.


Tables 3 and 5 contain all the information required
to enumerate all the classical and exceptional Riemannian
symmetric spaces. All the classical Riemannian
symmetric spaces are tabulated in Table 6. The
exceptional Riemannian symmetric spaces can be
constructed from the information in Table 5 following
the procedure used to construct Table 6 from Table 3.

Table 6 All classical Riemannian symmetric spaces

Root space            Quotient                          Dimension                  Rank        $\chi$
$A_{p+q-1}$           SU(p, q)/S[U(p) x U(q)]           $2pq$                      min(p, q)   $1 - (p - q)^2$
$A_{n-1}$             SL(n; R)/SO(n)                    $\frac{1}{2}(n+2)(n-1)$    $n - 1$     $n - 1$
$A_{2n-1}$            SU*(2n)/USp(2n)                   $(2n+1)(n-1)$              $n - 1$     $-2n - 1$
$B_{(p+q-1)/2}$       SO(p, q)/SO(p) x SO(q)            $pq$                       min(p, q)   $pq - \frac{1}{2}p(p-1) - \frac{1}{2}q(q-1)$
$D_{(p+q)/2}$         SO(p, q)/SO(p) x SO(q)            $pq$                       min(p, q)   $pq - \frac{1}{2}p(p-1) - \frac{1}{2}q(q-1)$
$D_n$                 SO*(2n)/U(n)                      $n(n-1)$                   $[n/2]$     $-n$
$C_{p+q}$             USp(2p, 2q)/USp(2p) x USp(2q)     $4pq$                      min(p, q)   $-2(p - q)^2 - (p + q)$
$C_n$                 Sp(2n; R)/U(n)                    $n(n+1)$                   $n$         $+n$

As particular examples of Riemannian symmetric
spaces we consider the compact spaces SO(p + q)/
[SO(p) x SO(q)] and their noncompact counterparts
SO(p, q)/[SO(p) x SO(q)]. These spaces have rank
min(p, q), dimension $pq$, and can be represented
explicitly in matrix form as

\[ \begin{pmatrix} 0 & X \\ \sigma X^t & 0 \end{pmatrix} \ \to\
   \mathrm{EXP} \begin{pmatrix} 0 & X \\ \sigma X^t & 0 \end{pmatrix}
   = \begin{pmatrix} D_p & Y \\ \sigma Y^t & D_q \end{pmatrix} \]

Here $X$ is a $p \times q$ matrix and $\sigma = +1$ for the
noncompact case and $-1$ for the compact case. The
block diagonal matrices $D_p$ and $D_q$ are defined from
the metric-preserving conditions ($M^t I_{p+q} M = I_{p+q}$,
$M^t I_{p,q} M = I_{p,q}$)

\[ D_p^2 = I_p + \sigma Y Y^t, \qquad D_q^2 = I_q + \sigma Y^t Y \]

The $pq$ coordinates in the Riemannian symmetric
spaces can be taken as the $pq$ elements of the
submatrix $Y$.

These Riemannian symmetric spaces can be
treated as algebraic submanifolds in $R^K$, $K = pq +
(1/2)q(q + 1)$. The $K$ coordinates on $R^K$ can be
identified with the $pq$ matrix elements of $Y$ and the
$(1/2)q(q + 1)$ matrix elements of the real symmetric
matrix $D_q$. These coordinates obey the $(1/2)q(q + 1)$
algebraic constraints defined by

\[ D_q^2 - \sigma Y^t Y = I_q \]

For SO(3)/SO(2) and SO(2,1)/SO(2), this condition
is determined from the matrix

\[ \begin{pmatrix} * & * & x \\ * & * & y \\ \sigma x & \sigma y & z \end{pmatrix}, \qquad
   z^2 - \sigma(x^2 + y^2) = 1 \]

For $\sigma = -1$, the space is the sphere $S^2$ defined by $z^2 +
(x^2 + y^2) = 1$. For $\sigma = +1$, the space is the two-sheeted
hyperboloid $H^2_{2+}$ defined by $z^2 - (x^2 + y^2) = 1$. More
specifically, it is the upper sheet containing (0, 0, 1) of
the two-sheeted hyperboloid. The second sheet occurs
in the coset O(2,1)/SO(2). The symmetric spaces
SO(n + 1)/SO(n) and SO(n, 1)/SO(n) are the sphere
$S^n$ and the upper sheet of the two-sheeted hyperboloid
$H^n_{2+}$. Both have dimension $n$ and rank 1. The spaces
are simply connected, homogeneous, and isotropic.
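
The rank-1 examples can be checked numerically by exponentiating a coset generator and reading off the last column of the result. The sketch below is an illustration under stated assumptions: it uses a truncated power series for the matrix exponential (adequate for small parameters only), and the names `expm`, `coset_point`, and `constraint_defect` are introduced here.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm(a, terms=30):
    """Matrix exponential by a truncated power series (fine for small entries)."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes a^k / k!
        term = [[v / k for v in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def coset_point(a, b, sigma):
    """Coordinates (x, y, z) of EXP[[0, X], [sigma X^t, 0]] with X = (a, b)^t.

    sigma = -1 gives SO(3)/SO(2); sigma = +1 gives SO(2,1)/SO(2).
    """
    generator = [
        [0.0, 0.0, a],
        [0.0, 0.0, b],
        [sigma * a, sigma * b, 0.0],
    ]
    m = expm(generator)
    return m[0][2], m[1][2], m[2][2]


def constraint_defect(a, b, sigma):
    """Value of z^2 - sigma (x^2 + y^2) - 1, which should vanish."""
    x, y, z = coset_point(a, b, sigma)
    return z * z - sigma * (x * x + y * y) - 1.0


if __name__ == "__main__":
    print(constraint_defect(0.3, 0.4, -1))  # sphere
    print(constraint_defect(0.3, 0.4, +1))  # upper sheet of the hyperboloid
```

In both cases the defect vanishes to rounding error, and for $\sigma = +1$ the coordinate $z$ stays at or above 1, so the exponential only reaches the upper sheet.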

For SO(4, 2)/SO(4) x SO(2), the eight-dimensional
algebraic manifold is defined by the three constraints
in $R^{11}$:

\[ \begin{pmatrix} y_9 & y_{10} \\ y_{10} & y_{11} \end{pmatrix}^2
   - \sigma \begin{pmatrix} y_1 & y_2 & y_3 & y_4 \\ y_5 & y_6 & y_7 & y_8 \end{pmatrix}
   \begin{pmatrix} y_1 & y_5 \\ y_2 & y_6 \\ y_3 & y_7 \\ y_4 & y_8 \end{pmatrix}
   = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]

The compact analytically continued space
SO(6)/SO(4) x SO(2) is obtained by setting $\sigma = -1$.
These spaces have dimension 8 and rank 2. They are
homogeneous but not isotropic. For each, there are
"two inequivalent directions." There are two independent
Laplace-Beltrami operators on these spaces,
one quadratic and one quartic.

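The complete list of globally symmetric
pseudo-Riemannian symmetric spaces can be constructed
almost as easily. Two linear operators, $T_1$ and $T_2$,
are introduced that obey $T_1^2 = I$, $T_2^2 = I$, $T_1 T_2 =
T_2 T_1 \ne I$. The two are used to split $\mathfrak{g}$ into
subspaces

\[ T_1 \mathfrak{g}_{\sigma\tau} = \sigma\, \mathfrak{g}_{\sigma\tau}, \qquad
   T_2 \mathfrak{g}_{\sigma\tau} = \tau\, \mathfrak{g}_{\sigma\tau} \]

where $\sigma = \pm 1$, $\tau = \pm 1$. The decomposition and
double rotation

\[ \mathfrak{g} = \mathfrak{g}_{++} + \mathfrak{g}_{+-} + \mathfrak{g}_{-+} + \mathfrak{g}_{--} \]
\[ \downarrow T_1 \]
\[ \mathfrak{g}' = \mathfrak{g}_{++} + \mathfrak{g}_{+-} + i(\mathfrak{g}_{-+} + \mathfrak{g}_{--}) \]
\[ \downarrow T_2 \]
\[ \mathfrak{g}'' = \mathfrak{g}_{++} + i\mathfrak{g}_{+-} + i(\mathfrak{g}_{-+} + i\mathfrak{g}_{--}) \]

generates a noncompact subgroup $K''$ as well as a
pseudo-Riemannian symmetric space $P''$:

\[ K'' = \mathrm{EXP}(\mathfrak{g}_{++} + i\mathfrak{g}_{+-}), \qquad
   P'' = \mathrm{EXP}(i\mathfrak{g}_{-+} + \mathfrak{g}_{--}) \]

These have also been classified.

The simplest example of a pseudo-Riemannian
symmetric space is SO(2,1)/SO(1,1):

\[ \mathfrak{so}(2,1) \to
   \begin{pmatrix} 0 & \theta_3 & \theta_2 \\ -\theta_3 & 0 & \theta_1 \\ \theta_2 & \theta_1 & 0 \end{pmatrix}
   = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & \theta_1 \\ 0 & \theta_1 & 0 \end{pmatrix}
   + \begin{pmatrix} 0 & \theta_3 & \theta_2 \\ -\theta_3 & 0 & 0 \\ \theta_2 & 0 & 0 \end{pmatrix}
   \ \to\ M = \begin{pmatrix} z & * & * \\ -x & * & * \\ y & * & * \end{pmatrix} \]

The metric-preserving condition $M^t I_{2,1} M = I_{2,1}$
leads to the constraint equation $z^2 + x^2 - y^2 = 1$.
This space is the single-sheeted hyperboloid $H^2_1$. It is
two dimensional and has rank 1, but it is not
isotropic. Intersections with the plane $x = 0$ are
hyperbolas and with the planes $y = \mathrm{const.}$ are circles.
This space is not simply connected.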


Summary 


Lie groups are among the most powerful mathema- 
tical tools available to physicists. They play a major 
role in physics because they occur as transformation 
groups from coordinate system to coordinate system 
in real space (rotation group SO(3), Lorentz group 
O(3,1), Galilei group, Poincaré group ISO(3,1)) or 
in spaces describing internal degrees of freedom 
(SU(2) for spin or isospin, SU(3) for quarks and 
color, SU(4) for spin—isospin, etc.). 

It is remarkable that a beautiful classification 
theory for simple (the building blocks) Lie groups 
exists, because of the rather amorphous nature of the 
definition of a Lie group. In a search for structure, 
the first step in the analysis of Lie groups is 
linearization of the group multiplication law in the 
neighborhood of the identity to a linear vector space 
on which there is a Lie algebra structure. This in itself 
is sufficient to create a strong connection to quantum 
mechanics. Although there is not a 1:1 correspon- 
dence between Lie groups and their Lie algebras, 
there is a very beautiful connection between them. 
This relates algebra (discrete invariant subgroups) 
and topology (homotopy groups) in an elegant way. 

The structure of Lie algebras is described using 
tools from linear algebra: secular equations and 
inner products. Together, these tools are used to 
reduce Lie algebras to their basic units: nilpotent 
and solvable invariant subalgebras, and semisimple 
and simple Lie algebras. The commutation relations 
for simple Lie algebras can be put into a canonical 
form using another miracle of this theory: a positive- 
definite root space that summarizes the properties of 
the secular equation and the Cartan-Killing inner
product. As the secular equation can only be solved
exactly over an algebraically closed field, the 
classification of simple Lie algebras covers complex 
Lie algebras. Each complex extension has several 
real forms, which are easily classified. 

Even more remarkable is the connection between 
simple Lie groups and Riemannian spaces that “look 
the same everywhere.” All Riemannian symmetric 
spaces are quotients of a simple Lie group by a 
subgroup that is maximal in some precise sense 
(Cartan decomposition sense). Cartan was able to 
classify all Riemannian symmetric spaces as a 
consequence of his classification of all the real 
forms of all the simple Lie groups. The algebraic 
tools used to classify Lie algebras (secular equations, 
Dynkin diagrams) were used again to classify these 
spaces (Dynkin diagrams -> Araki-Satake diagrams).
These spaces are classified by a root space, group- 
subgroup pair, dimension, rank, and character. 
Construction of invariant operators (Casimir invar- 
iants, Laplace—Beltrami operators) is algorithmic. 

Nonsemisimple Lie groups/algebras can be con- 
structed from simple Lie algebras by carefully 
introducing singular change of basis transforma- 
tions. This leads to “group contraction,” not 
discussed above. In this way, the Poincaré group 
can be constructed systematically from the groups 
SO(3, 2) or SO(4, 1): SO(3, 2) -> ISO(3, 1), SO(4, 1) ->
ISO(3, 1) in the limit of "large R." Here, R is the
"radius" of some universe of hyperbolic nature, with
signature (3, 2) or (4, 1). The Galilei group can be
constructed by contraction from the Poincaré group in
the limit $c = 3 \times 10^{10}\ \mathrm{cm\,s^{-1}} \to \infty$.

We have not discussed here the theory of the 
representations of Lie groups. A beautiful theorem by 
Wigner and Stone guarantees that the tensor represen- 
tations of a compact group are complete. Gel’fand has 
given expressions for the complete set of tensor 
representations of the classical compact Lie groups. 
They are expressed by “dressing” the appropriate 
Dynkin diagrams or else in terms of irreducible 
representations of the symmetric group $S_n$. Gel'fand
has also given explicit, analytic, closed-form expres- 
sions for the matrix elements of any of the shift 
operators in any of these representations. For the 
noncompact real forms, most of the unitary irreducible 
representations can be obtained from these expressions 
for matrix elements (“master analytic representation”) 
by appropriate analytic continuation. 


Since Lie groups exist at the interface of algebra 
and topology, it is to be expected that there is a very 
close relation with the theory of special functions. In 
fact, the theory of special functions forms an 
important chapter in the theory of Lie groups. On 
the topological side, the shift operators $E_\alpha$ (think $J_\pm$)
have coordinate representations $\langle x'|E_\alpha|x\rangle$ involving
first-order differential operators. On the algebraic
side, the matrix elements $\langle n'|E_\alpha|n\rangle$ are square roots
of products of integers (divided by products of
integers). These topological and algebraic expressions
are related to each other in a myriad of ways.
All of the standard properties of special functions
(Rodrigues formulas, recursion relations in
coordinates and indices, differential equations, generating
functions, etc.) occur in a systematic way in a Lie- 
theoretic formulation of this subject. 

Finally, no review or even book could do justice 
to the applications that Lie group theory finds in 
physics. 

The rich interplay that exists between freedom 
and rigidity of structure found in Lie group theory 
can be found in only the purest works of art — for 
example, the fugues of Bach. 


See also: Classical Groups and Homogeneous Spaces; 
Compact Groups and their Representations; Cosmology: 
Mathematical Aspects; Equivariant Cohomology and the 
Cartan Model; Finite-Type Invariants of 3-Manifolds; 
Functional Equations and Integrable Systems; Lie 
Superalgebras and Their Representations; Lie, 
Symplectic, and Poisson Groupoids and Their Lie 
Algebroids; Measure on Loop Spaces; Quasiperiodic 
Systems; Symmetry and Symplectic Reduction; 
Symmetry Classes in Random Matrix Theory; Toda 
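Lattices.


Basic Definitions


Let $\mathcal{A}$ be an algebra over a field $K$ of characteristic
zero (usually $K = R$ or $C$) with internal laws $+$ and $*$.
One sets $Z_2 = Z/2Z = \{\bar 0, \bar 1\}$. $\mathcal{A}$ is called a
superalgebra or $Z_2$-graded algebra if it can be written as
a direct sum of two spaces $\mathcal{A} = \mathcal{A}_{\bar 0} \oplus \mathcal{A}_{\bar 1}$, such that

\[ \mathcal{A}_{\bar 0} * \mathcal{A}_{\bar 0} \subset \mathcal{A}_{\bar 0}, \qquad
   \mathcal{A}_{\bar 0} * \mathcal{A}_{\bar 1} \subset \mathcal{A}_{\bar 1}, \qquad
   \mathcal{A}_{\bar 1} * \mathcal{A}_{\bar 1} \subset \mathcal{A}_{\bar 0} \]

Elements of $\mathcal{A}_{\bar 0}$ are called even or of degree 0 while
elements of $\mathcal{A}_{\bar 1}$ are called odd or of degree 1.
A superalgebra $\mathcal{A}$ is called associative if $(X * Y) *
Z = X * (Y * Z)$ for all $X, Y, Z \in \mathcal{A}$. It is called
commutative if $X * Y = (-1)^{\deg X \cdot \deg Y}\, Y * X$ for all
$X, Y \in \mathcal{A}$, where $\deg X$ is the degree of the element $X$.

A homomorphism $\Phi$ from a superalgebra $\mathcal{A}$ into a
superalgebra $\mathcal{A}'$ is a linear application from $\mathcal{A}$ into
$\mathcal{A}'$ which respects the $Z_2$-gradation, that is, $\Phi(\mathcal{A}_{\bar 0}) \subset
\mathcal{A}'_{\bar 0}$ and $\Phi(\mathcal{A}_{\bar 1}) \subset \mathcal{A}'_{\bar 1}$.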

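A Lie superalgebra $\mathcal{G}$ over a field $K$ of characteristic
zero (usually $K = R$ or $C$) is a superalgebra in
which the product, denoted $[\![\,,\,]\!]$, satisfies the
following properties:

$Z_2$-gradation

\[ [\![ \mathcal{G}_i, \mathcal{G}_j ]\!] \subset \mathcal{G}_{i+j} \qquad (i, j \in Z_2) \]

Graded-antisymmetry

\[ [\![ X_i, X_j ]\!] = -(-1)^{\deg X_i \cdot \deg X_j}\, [\![ X_j, X_i ]\!] \]

Generalized Jacobi identity

\[ (-1)^{\deg X_i \cdot \deg X_k} [\![ X_i, [\![ X_j, X_k ]\!] ]\!]
 + (-1)^{\deg X_j \cdot \deg X_i} [\![ X_j, [\![ X_k, X_i ]\!] ]\!]
 + (-1)^{\deg X_k \cdot \deg X_j} [\![ X_k, [\![ X_i, X_j ]\!] ]\!] = 0 \]

Note that $\mathcal{G}_{\bar 0}$ is a Lie algebra, called the even or
bosonic part of $\mathcal{G}$, while $\mathcal{G}_{\bar 1}$, called the odd or
fermionic part of $\mathcal{G}$, is not an algebra.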

An associative superalgebra $\mathcal{G} = \mathcal{G}_{\bar 0} \oplus \mathcal{G}_{\bar 1}$ over the
field $K$ acquires the structure of a Lie superalgebra by
taking for the product $[\![\,,\,]\!]$ of two elements $X, Y \in \mathcal{G}$
the Lie superbracket (also called supercommutator or
graded commutator)

\[ [\![ X, Y ]\!] = X * Y - (-1)^{\deg X \cdot \deg Y}\, Y * X \]

The notation $[\![\,,\,]\!]$ for the supercommutator is used to
avoid confusion with the usual commutator $[X, Y] =
X * Y - Y * X$.
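
The superbracket built from an associative product can be tested on a small matrix example. The sketch below is illustrative and not taken from the article: it assumes the $2 \times 2$ matrices of gl(1|1) with the usual grading (diagonal units even, off-diagonal units odd), represents a homogeneous element as a hypothetical pair `(matrix, degree)`, and checks the generalized Jacobi identity on all triples of basis elements.

```python
from itertools import product


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def superbracket(x, y):
    """[[X, Y]] = XY - (-1)^(deg X deg Y) YX for homogeneous (matrix, degree) pairs."""
    (mx, dx), (my, dy) = x, y
    sign = (-1) ** (dx * dy)
    xy, yx = matmul(mx, my), matmul(my, mx)
    n = len(mx)
    bracket = [[xy[i][j] - sign * yx[i][j] for j in range(n)] for i in range(n)]
    return bracket, (dx + dy) % 2


def jacobi_defect(x, y, z):
    """Left-hand side of the generalized Jacobi identity; should be the zero matrix."""
    terms = [
        ((-1) ** (x[1] * z[1]), superbracket(x, superbracket(y, z))[0]),
        ((-1) ** (y[1] * x[1]), superbracket(y, superbracket(z, x))[0]),
        ((-1) ** (z[1] * y[1]), superbracket(z, superbracket(x, y))[0]),
    ]
    n = len(x[0])
    return [[sum(s * m[i][j] for s, m in terms) for j in range(n)] for i in range(n)]


# Homogeneous basis of gl(1|1): two even and two odd matrix units.
H1 = ([[1, 0], [0, 0]], 0)
H2 = ([[0, 0], [0, 1]], 0)
Q = ([[0, 1], [0, 0]], 1)
QBAR = ([[0, 0], [1, 0]], 1)
BASIS = [H1, H2, Q, QBAR]

if __name__ == "__main__":
    print(superbracket(Q, QBAR))  # two odd elements: an anticommutator
    print(all(jacobi_defect(x, y, z) == [[0, 0], [0, 0]]
              for x, y, z in product(BASIS, repeat=3)))
```

For two odd elements the superbracket reduces to the anticommutator, which is why the bracket of `Q` with `QBAR` is the (even) identity matrix while the bracket of `Q` with itself vanishes.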

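A Lie superalgebra $\mathcal{G}$ is $Z$-graded if it can be
written as a direct sum of finite-dimensional $Z_2$-graded
subspaces $\mathcal{G}_i$ such that

\[ \mathcal{G} = \bigoplus_{i \in Z} \mathcal{G}_i, \qquad \text{where } [\![ \mathcal{G}_i, \mathcal{G}_j ]\!] \subset \mathcal{G}_{i+j} \]

The $Z$-gradation is said to be consistent with the $Z_2$-gradation
if

\[ \mathcal{G}_{\bar 0} = \sum_{i \in Z} \mathcal{G}_{2i} \qquad \text{and} \qquad
   \mathcal{G}_{\bar 1} = \sum_{i \in Z} \mathcal{G}_{2i+1} \]

It follows that $\mathcal{G}_0$ is a Lie subalgebra and that each
$\mathcal{G}_i$ ($i \ne 0$) is a $\mathcal{G}_0$-module.

A subalgebra $\mathcal{K} = \mathcal{K}_{\bar 0} \oplus \mathcal{K}_{\bar 1}$ of a Lie superalgebra $\mathcal{G}$
is a subset of elements of $\mathcal{G}$ which forms a vector
subspace of $\mathcal{G}$ that is closed with respect to the Lie
product of $\mathcal{G}$ such that $\mathcal{K}_{\bar 0} \subset \mathcal{G}_{\bar 0}$ and $\mathcal{K}_{\bar 1} \subset \mathcal{G}_{\bar 1}$.
A subalgebra $\mathcal{K}$ of $\mathcal{G}$ is called a proper subalgebra of
$\mathcal{G}$ if $\mathcal{K} \ne \mathcal{G}$. An ideal $\mathcal{I}$ of $\mathcal{G}$ is a subalgebra of $\mathcal{G}$ such
that $[\![ \mathcal{G}, \mathcal{I} ]\!] \subset \mathcal{I}$, that is, $X \in \mathcal{G}, Y \in \mathcal{I} \Rightarrow [\![ X, Y ]\!] \in \mathcal{I}$.
An ideal $\mathcal{I}$ of $\mathcal{G}$ is called a proper ideal of $\mathcal{G}$ if $\mathcal{I} \ne \mathcal{G}$.
If $\mathcal{I}$ and $\mathcal{I}'$ are two ideals of $\mathcal{G}$, $[\![ \mathcal{I}, \mathcal{I}' ]\!]$ is an ideal of $\mathcal{G}$.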

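The definitions of the centralizer, the center, and
the normalizer of a Lie superalgebra follow those of
a Lie algebra. Let $S$ be a subset of elements in the
Lie superalgebra $\mathcal{G}$. The centralizer $C_{\mathcal{G}}(S)$ is the
subset of $\mathcal{G}$ given by

\[ C_{\mathcal{G}}(S) = \{ X \in \mathcal{G} \mid [\![ X, Y ]\!] = 0,\ \forall Y \in S \} \]

The center $Z(\mathcal{G})$ of $\mathcal{G}$ is the set of elements of $\mathcal{G}$
which commute with any element of $\mathcal{G}$ (in other
words, it is the centralizer of $\mathcal{G}$ in $\mathcal{G}$):

\[ Z(\mathcal{G}) = \{ X \in \mathcal{G} \mid [\![ X, Y ]\!] = 0,\ \forall Y \in \mathcal{G} \} \]

The normalizer $N_{\mathcal{G}}(S)$ is the subset of $\mathcal{G}$ given by

\[ N_{\mathcal{G}}(S) = \{ X \in \mathcal{G} \mid [\![ X, Y ]\!] \in S,\ \forall Y \in S \} \]

The Lie superalgebra $\mathcal{G}$ is said to be nilpotent if,
considering the series $[\![ \mathcal{G}, \mathcal{G}^{[i-1]} ]\!] = \mathcal{G}^{[i]}$ with $\mathcal{G}^{[0]} = \mathcal{G}$,
there exists an integer $n$ such that $\mathcal{G}^{[n]} = \{0\}$.

The Lie superalgebra $\mathcal{G}$ is said to be solvable if,
considering the series $[\![ \mathcal{G}^{(i-1)}, \mathcal{G}^{(i-1)} ]\!] = \mathcal{G}^{(i)}$ with $\mathcal{G}^{(0)} = \mathcal{G}$,
there exists an integer $n$ such that $\mathcal{G}^{(n)} = \{0\}$. A
Lie superalgebra $\mathcal{G}$ is solvable if and only if $\mathcal{G}_{\bar 0}$ is
solvable.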

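Let $\mathcal{G}$ be a noncommutative Lie superalgebra.
The Lie superalgebra $\mathcal{G}$ is called simple if it does
not contain any nontrivial ideal. The Lie superalgebra
$\mathcal{G}$ is called semisimple if it does not
contain any nontrivial solvable ideal. Let us
recall that if $\mathcal{A}$ is a semisimple Lie algebra, it
can be written as the direct sum of simple Lie
algebras $\mathcal{A}_i$: $\mathcal{A} = \oplus_i \mathcal{A}_i$. This is not the case for
superalgebras.

Let $\mathcal{G} = \mathcal{G}_{\bar 0} \oplus \mathcal{G}_{\bar 1}$ be a Lie superalgebra and $V =
V_{\bar 0} \oplus V_{\bar 1}$ be a $Z_2$-graded vector space. Consider
the algebra $\mathrm{End}\,V$ of endomorphisms of $V$,
which naturally acquires a superalgebra structure
by $\mathrm{End}\,V = \mathrm{End}_{\bar 0} V \oplus \mathrm{End}_{\bar 1} V$, where $\mathrm{End}_i V = \{ \phi \in
\mathrm{End}\,V \mid \phi(V_j) \subset V_{i+j} \}$. A linear representation $\pi$ of $\mathcal{G}$
is a homomorphism of $\mathcal{G}$ into $\mathrm{End}\,V$, that is,

\[ \pi(\alpha X + \beta Y) = \alpha \pi(X) + \beta \pi(Y) \]
\[ \pi([\![ X, Y ]\!]) = [\![ \pi(X), \pi(Y) ]\!] \]
\[ \pi(\mathcal{G}_{\bar 0}) \subset \mathrm{End}_{\bar 0} V \quad \text{and} \quad \pi(\mathcal{G}_{\bar 1}) \subset \mathrm{End}_{\bar 1} V \]

for all $X, Y \in \mathcal{G}$ and $\alpha, \beta \in C$. The vector space $V$ is the
representation space. The vector space $V$ has the
structure of a $\mathcal{G}$-module by $X(v) = \pi(X)v$ for $X \in \mathcal{G}$
and $v \in V$. The dimension (resp. superdimension) of the
representation $\pi$ is the dimension (resp. graded dimension)
of the vector space $V$: $\dim \pi = \dim V_{\bar 0} + \dim V_{\bar 1}$
and $\mathrm{sdim}\, \pi = \dim V_{\bar 0} - \dim V_{\bar 1}$. In particular, the
representation $\mathrm{ad}: \mathcal{G} \to \mathrm{End}\,\mathcal{G}$ ($\mathcal{G}$ being considered as a
$Z_2$-graded vector space) such that $\mathrm{ad}(X)Y = [\![ X, Y ]\!]$
is called the adjoint representation of $\mathcal{G}$.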

In the basis $(e_1, \dots, e_m, e_{m+1}, \dots, e_{m+n})$ of $V = V_{\bar 0} \oplus V_{\bar 1}$ (called homogeneous basis), where $\dim V_{\bar 0} = m$ and $\dim V_{\bar 1} = n$, an element of $\mathcal{G}$ is represented by the matrix

\[ M = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \]

where $A$, $B$, $C$, and $D$ are $m \times m$, $m \times n$, $n \times m$, and $n \times n$ matrices, respectively. Even elements correspond to block diagonal matrices (i.e., $B = C = 0$), odd elements to block antidiagonal matrices (i.e., $A = D = 0$). One defines the supertrace function denoted by str:

\[ \mathrm{str}(M) = \mathrm{tr}(A) - \mathrm{tr}(D) \]
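As an illustration of the block conventions above, the following minimal sketch computes the supertrace of a matrix in gl(m|n) and checks, on the two odd generators of gl(1|1), that the supertrace of a superbracket vanishes even though its ordinary trace does not. The function names and the choice of gl(1|1) are assumptions made for this example only.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def supertrace(M, m):
    """str(M) = tr(A) - tr(D), where A is the top-left m x m block of M."""
    return sum(M[i][i] for i in range(m)) - sum(M[i][i] for i in range(m, len(M)))


def superbracket(X, Y, deg_x, deg_y):
    """[X, Y] = XY - (-1)^(deg X * deg Y) YX for homogeneous X and Y."""
    sign = -1 if (deg_x * deg_y) % 2 else 1
    XY, YX = matmul(X, Y), matmul(Y, X)
    n = len(X)
    return [[XY[i][j] - sign * YX[i][j] for j in range(n)] for i in range(n)]


# gl(1|1): two odd (block antidiagonal) matrices, m = n = 1.
Q_PLUS = [[0, 1], [0, 0]]
Q_MINUS = [[0, 0], [1, 0]]

# For two odd elements the superbracket is the anticommutator.
BRACKET = superbracket(Q_PLUS, Q_MINUS, 1, 1)
ORDINARY_TRACE = BRACKET[0][0] + BRACKET[1][1]
SUPERTRACE = supertrace(BRACKET, 1)
```

Here the bracket of the two odd generators is the identity matrix, whose ordinary trace is 2 while its supertrace is 0.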





To a given representation $\pi$ of $\mathcal{G}$, one can associate a bilinear form $B_\pi$ on $\mathcal{G}$ as

\[ B_\pi(X, Y) = \mathrm{str}(\pi(X) \pi(Y)), \quad \forall\, X, Y \in \mathcal{G} \]

where $\pi(X)$ are the matrices of the generators $X$ in the representation $\pi$ and str denotes the supertrace. A bilinear form $B$ on $\mathcal{G}$ is called

1. consistent if $B(X, Y) = 0$ for all $X \in \mathcal{G}_{\bar 0}$ and all $Y \in \mathcal{G}_{\bar 1}$,
2. supersymmetric if, for all homogeneous $X, Y \in \mathcal{G}$,

\[ B(X, Y) = (-1)^{\deg X \cdot \deg Y} B(Y, X) \]

3. invariant if, for all $X, Y, Z \in \mathcal{G}$,

\[ B([X, Y], Z) = B(X, [Y, Z]) \]

The bilinear form associated to the adjoint representation of $\mathcal{G}$ is called the Killing form on $\mathcal{G}$: $K(X, Y) = \mathrm{str}(\mathrm{ad}(X)\, \mathrm{ad}(Y))$. It is consistent, supersymmetric, and invariant.


Classification of Simple Lie 
Superalgebras 


The simple Lie superalgebras have been classified by 
V G Kac. One distinguishes two general families: the 
classical Lie superalgebras and the Cartan type 
superalgebras. 


Classical Lie Superalgebras 





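A simple Lie superalgebra $\mathcal{G} = \mathcal{G}_{\bar 0} \oplus \mathcal{G}_{\bar 1}$ is called classical if the representation of the even subalgebra $\mathcal{G}_{\bar 0}$ on the odd part $\mathcal{G}_{\bar 1}$ is completely reducible. The superalgebra is said to be of type I if the representation of $\mathcal{G}_{\bar 0}$ on $\mathcal{G}_{\bar 1}$ is the direct sum of two irreducible representations of $\mathcal{G}_{\bar 0}$. In that case, one has $\mathcal{G}_{\bar 1} = \mathcal{G}_{-1} \oplus \mathcal{G}_{1}$ with

\[ [\mathcal{G}_{-1}, \mathcal{G}_{1}] = \mathcal{G}_{0} \quad \text{and} \quad [\mathcal{G}_{\pm 1}, \mathcal{G}_{\pm 1}] = 0 \]

The superalgebra is said to be of type II if the representation of $\mathcal{G}_{\bar 0}$ on $\mathcal{G}_{\bar 1}$ is irreducible.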

A classical Lie superalgebra $\mathcal{G}$ is called basic if there exists a nondegenerate invariant bilinear form on $\mathcal{G}$. The basic Lie superalgebras split into four infinite families: $A(m, n)$ or $sl(m+1|n+1)$ for $m \neq n$ and $A(n, n)$ or $sl(n+1|n+1)/\mathcal{Z} = psl(n+1|n+1)$, where $\mathcal{Z}$ is a one-dimensional center, for $m = n$ (unitary series); $B(m, n)$ or $osp(2m+1|2n)$, $C(n+1)$ or $osp(2|2n)$, $D(m, n)$ or $osp(2m|2n)$ (orthosymplectic series); and three exceptional superalgebras $F(4)$, $G(3)$, and $D(2, 1; \alpha)$, the last one being actually a one-parameter family of superalgebras. The classical Lie superalgebras which are not basic are called strange, and correspond to two infinite families denoted by $P(n)$ and $Q(n)$.

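A basic Lie superalgebra $\mathcal{G} = \mathcal{G}_{\bar 0} \oplus \mathcal{G}_{\bar 1}$ admits a consistent $\mathbb{Z}$-gradation $\mathcal{G} = \oplus_{i \in \mathbb{Z}}\, \mathcal{G}_i$ (called distinguished), such that (see Tables 1 and 2)

• for superalgebras of type I, $\mathcal{G}_i = 0$ for $|i| > 1$ and $\mathcal{G}_{\bar 0} = \mathcal{G}_0$, $\mathcal{G}_{\bar 1} = \mathcal{G}_{-1} \oplus \mathcal{G}_{1}$, and

• for superalgebras of type II, $\mathcal{G}_i = 0$ for $|i| > 2$ and $\mathcal{G}_{\bar 0} = \mathcal{G}_{-2} \oplus \mathcal{G}_0 \oplus \mathcal{G}_2$, $\mathcal{G}_{\bar 1} = \mathcal{G}_{-1} \oplus \mathcal{G}_{1}$.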


Cartan Type Superalgebras


The Cartan type Lie superalgebras are the simple Lie superalgebras in which the representation of the even subalgebra on the odd part is not completely reducible. They are classified into four infinite families called $W(n)$ with $n \geq 2$, $S(n)$ with $n \geq 3$, $\tilde S(n)$, and $H(n)$ with $n \geq 4$. $S(n)$ and $\tilde S(n)$ are called special Cartan type Lie superalgebras and $H(n)$ Hamiltonian Cartan type Lie superalgebras.


Table 1 $\mathbb{Z}_2$-gradation of the classical Lie superalgebras

Superalgebra $\mathcal{G}$    Even part $\mathcal{G}_{\bar 0}$    Odd part $\mathcal{G}_{\bar 1}$

$A(m-1, n-1)$    $A_{m-1} \oplus A_{n-1} \oplus U(1)$    $(m, \bar n) \oplus (\bar m, n)$
$A(n-1, n-1)$    $A_{n-1} \oplus A_{n-1}$    $(n, \bar n) \oplus (\bar n, n)$
$C(n+1)$    $C_n \oplus U(1)$    $(2n) \oplus (2n)$
$B(m, n)$    $B_m \oplus C_n$    $(2m+1, 2n)$
$D(m, n)$    $D_m \oplus C_n$    $(2m, 2n)$
$F(4)$    $A_1 \oplus B_3$    $(2, 8)$
$G(3)$    $A_1 \oplus G_2$    $(2, 7)$
$D(2, 1; \alpha)$    $A_1 \oplus A_1 \oplus A_1$    $(2, 2, 2)$
$P(n)$    $A_n$    $[2] \oplus [1^{n-1}]$
$Q(n)$    $A_n$    $\mathrm{ad}(A_n)$


Classical Lie Superalgebras 


The classical Lie superalgebras are described as matrix superalgebras as follows. Let $V = V_{\bar 0} \oplus V_{\bar 1}$ be a $\mathbb{Z}_2$-graded vector space, with $\dim V_{\bar 0} = m$, $\dim V_{\bar 1} = n$. The Lie superalgebra $gl(m|n)$ is defined as the superalgebra $\mathrm{End}\,V = \mathrm{End}_{\bar 0} V \oplus \mathrm{End}_{\bar 1} V$ supplied with the Lie superbracket.

The unitary superalgebra $A(m-1, n-1) = sl(m|n)$ is defined as the superalgebra of matrices $M \in gl(m|n)$ satisfying the supertrace condition $\mathrm{str}(M) = 0$. In the case $m = n$, $sl(n|n)$ contains a one-dimensional ideal $\mathcal{I}$ generated by the identity matrix $\mathbb{I}_{2n}$, and one sets $A(n-1, n-1) = sl(n|n)/\mathcal{I} = psl(n|n)$.

The orthosymplectic superalgebra $osp(m|2n)$ is defined as the superalgebra of matrices $M \in gl(m|2n)$ satisfying the conditions

\[ A^t = -A, \quad D^t G = -G D, \quad B = -C^t G \]

where $t$ denotes the usual transposition and the matrix $G$ is given by

\[ G = \begin{pmatrix} 0 & \mathbb{I}_n \\ -\mathbb{I}_n & 0 \end{pmatrix} \]

The strange superalgebra $P(n)$ is defined as the superalgebra of matrices $M \in gl(n|n)$ satisfying the conditions

\[ A^t = -D, \quad B^t = B, \quad C^t = -C, \quad \mathrm{tr}(A) = 0 \]

The strange superalgebra $\tilde Q(n)$ is defined as the superalgebra of matrices $M \in gl(n|n)$ satisfying the conditions

\[ A = D, \quad B = C, \quad \mathrm{tr}(B) = 0 \]

The superalgebra $\tilde Q(n)$ has a one-dimensional center $\mathcal{Z}$. The simple superalgebra $Q(n)$ is given by

\[ Q(n) = \tilde Q(n)/\mathcal{Z} \]


Structure of the Classical Lie 
Superalgebras 


Let $\mathcal{G} = \mathcal{G}_{\bar 0} \oplus \mathcal{G}_{\bar 1}$ be a classical Lie superalgebra. A Cartan subalgebra $\mathcal{H}$ of $\mathcal{G}$ is defined as a Cartan subalgebra of $\mathcal{G}_{\bar 0}$, that is, the maximal nilpotent subalgebra of $\mathcal{G}_{\bar 0}$ coinciding with its own normalizer: $\mathcal{H} = \{ X \in \mathcal{G}_{\bar 0} \mid [X, \mathcal{H}] \subset \mathcal{H} \}$. It follows that the Cartan subalgebras of a Lie superalgebra are conjugate, since the Cartan subalgebras of a Lie algebra are conjugate and any inner automorphism of the even part $\mathcal{G}_{\bar 0}$ can be extended to an inner automorphism of $\mathcal{G}$; hence, they all have the same dimension. By definition, the dimension of a Cartan subalgebra $\mathcal{H}$ is the rank of $\mathcal{G}$: $\mathrm{rank}\, \mathcal{G} = \dim \mathcal{H}$.

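A classical Lie superalgebra $\mathcal{G}$ with Cartan subalgebra $\mathcal{H}$ can be decomposed as $\mathcal{G} = \oplus_{\alpha \in \mathcal{H}^*}\, \mathcal{G}_\alpha$ ($\mathcal{H}^*$ is the dual of $\mathcal{H}$), where

\[ \mathcal{G}_\alpha = \{ x \in \mathcal{G} \mid [h, x] = \alpha(h)\, x, \ h \in \mathcal{H} \} \]

The set

\[ \Delta = \{ \alpha \in \mathcal{H}^* \mid \mathcal{G}_\alpha \neq 0 \} \]

is by definition the root system of $\mathcal{G}$. A root $\alpha$ is called even (resp. odd) if $\mathcal{G}_\alpha \cap \mathcal{G}_{\bar 0} \neq \emptyset$ (resp. $\mathcal{G}_\alpha \cap \mathcal{G}_{\bar 1} \neq \emptyset$). The set of even roots $\Delta_{\bar 0}$ is the root system of the even part $\mathcal{G}_{\bar 0}$ of $\mathcal{G}$. The set of odd roots $\Delta_{\bar 1}$ is the weight system of the representation of $\mathcal{G}_{\bar 0}$ in $\mathcal{G}_{\bar 1}$. One has $\Delta = \Delta_{\bar 0} \cup \Delta_{\bar 1}$. A root can be both even and odd (however, this only occurs in the case of the superalgebra $Q(n)$). The vector space spanned by all the possible roots is called the root space. It is the dual $\mathcal{H}^*$ of the Cartan subalgebra $\mathcal{H}$ as vector space.


Table 2 $\mathbb{Z}$-gradation of the classical basic Lie superalgebras

Superalgebra $\mathcal{G}$    $\mathcal{G}_0$    $\mathcal{G}_1 \oplus \mathcal{G}_{-1}$    $\mathcal{G}_2 \oplus \mathcal{G}_{-2}$

$A(m-1, n-1)$    $A_{m-1} \oplus A_{n-1} \oplus U(1)$    $(m, \bar n) \oplus (\bar m, n)$    $0$
$A(n-1, n-1)$    $A_{n-1} \oplus A_{n-1}$    $(n, \bar n) \oplus (\bar n, n)$    $0$
$C(n+1)$    $C_n \oplus U(1)$    $(2n)_+ \oplus (2n)_-$    $0$
$B(m, n)$    $B_m \oplus A_{n-1} \oplus U(1)$    $(2m+1, n) \oplus (2m+1, \bar n)$    $[2] \oplus [2^{n-1}]$
$D(m, n)$    $D_m \oplus A_{n-1} \oplus U(1)$    $(2m, n) \oplus (2m, \bar n)$    $[2] \oplus [2^{n-1}]$
$F(4)$    $B_3 \oplus U(1)$    $8_+ \oplus 8_-$    $1_+ \oplus 1_-$
$G(3)$    $G_2 \oplus U(1)$    $7_+ \oplus 7_-$    $1_+ \oplus 1_-$
$D(2, 1; \alpha)$    $A_1 \oplus A_1 \oplus U(1)$    $(2, 2)_+ \oplus (2, 2)_-$    $1_+ \oplus 1_-$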

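Except for $A(1, 1)$, $P(n)$, and $Q(n)$, using a nondegenerate invariant bilinear form $B$ on the superalgebra $\mathcal{G}$, one can define a bilinear form $(\cdot, \cdot)$ on the root space $\mathcal{H}^*$ by $(\alpha_i, \alpha_j) = B(H_i, H_j)$, where the $H_i$ form a basis of $\mathcal{H}$. The following properties hold:

1. $\mathcal{G}_{(\alpha = 0)} = \mathcal{H}$ except for $Q(n)$.
2. $\dim \mathcal{G}_\alpha = 1$ when $\alpha \neq 0$ except for $A(1, 1)$, $P(2)$, $P(3)$, and $Q(n)$.
3. Except for $A(1, 1)$, $P(n)$, $Q(n)$, one has
(a) $[\mathcal{G}_\alpha, \mathcal{G}_\beta] \neq 0$ if and only if $\alpha, \beta, \alpha + \beta \in \Delta$,
(b) $(\mathcal{G}_\alpha, \mathcal{G}_\beta) = 0$ for $\alpha + \beta \neq 0$,
(c) if $\alpha \in \Delta$ (resp. $\Delta_{\bar 0}$, $\Delta_{\bar 1}$), then $-\alpha \in \Delta$ (resp. $\Delta_{\bar 0}$, $\Delta_{\bar 1}$), and
(d) $\alpha \in \Delta \Rightarrow 2\alpha \in \Delta$ if and only if $\alpha \in \Delta_{\bar 1}$ and $(\alpha, \alpha) \neq 0$.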


In the rest of this section, we restrict to the case of a basic Lie superalgebra $\mathcal{G}$ of rank $r$, with Cartan subalgebra $\mathcal{H}$ and root system $\Delta = \Delta_{\bar 0} \cup \Delta_{\bar 1}$. Then $\mathcal{G}$ admits a Borel decomposition $\mathcal{G} = \mathcal{N}^+ \oplus \mathcal{H} \oplus \mathcal{N}^-$, where $\mathcal{N}^\pm$ are subalgebras such that $[\mathcal{H}, \mathcal{N}^\pm] \subset \mathcal{N}^\pm$ with $\dim \mathcal{N}^+ = \dim \mathcal{N}^-$. If $\mathcal{G} = \mathcal{H} \oplus \bigoplus_\alpha \mathcal{G}_\alpha$ is the root decomposition of $\mathcal{G}$, a root $\alpha$ is called positive if $\mathcal{G}_\alpha \cap \mathcal{N}^+ \neq \emptyset$ and negative if $\mathcal{G}_\alpha \cap \mathcal{N}^- \neq \emptyset$. A root is called simple if it cannot be decomposed into a sum of positive roots. The set of all simple roots is called a simple root system of $\mathcal{G}$ and is denoted here by $\Delta^0$. The set $\mathcal{B} = \mathcal{H} \oplus \mathcal{N}^+$ is called a Borel subalgebra of $\mathcal{G}$. Such a Borel subalgebra is solvable but not maximal solvable. Indeed, when one adds to $\mathcal{B}$ a negative simple isotropic root generator (i.e., a generator associated to an odd root of zero length), the subalgebra obtained is still solvable, since the superalgebra $sl(1|1)$ is solvable. However, $\mathcal{B}$ contains a maximal solvable subalgebra $\mathcal{B}_{\bar 0}$ of the even part $\mathcal{G}_{\bar 0}$.

In general, for a basic Lie superalgebra $\mathcal{G}$, there are many inequivalent conjugacy classes of Borel subalgebras (while for the simple Lie algebras, all Borel subalgebras are conjugate).

To each conjugacy class of Borel subalgebras of $\mathcal{G}$ is associated a simple root system $\Delta^0$. Hence, contrary to the Lie algebra case, to a given basic Lie superalgebra $\mathcal{G}$ will be associated in general many inequivalent simple root systems, up to a transformation of the Weyl group $W(\mathcal{G})$ of $\mathcal{G}$ (the Weyl group of a basic Lie superalgebra being generated by the Weyl reflections with respect to the even roots; under a transformation of $W(\mathcal{G})$, a simple root system will be transformed into an equivalent one with the same Dynkin diagram). The generalization of the Weyl group for a basic Lie superalgebra $\mathcal{G}$ gives a method for constructing all the simple root systems of $\mathcal{G}$ and hence all the inequivalent Dynkin diagrams of $\mathcal{G}$. For $\alpha \in \Delta_{\bar 1}$, one defines

\[ w_\alpha(\beta) = \beta - 2\, \frac{(\alpha, \beta)}{(\alpha, \alpha)}\, \alpha \quad \text{if } (\alpha, \alpha) \neq 0 \]
\[ w_\alpha(\beta) = \beta + \alpha \quad \text{if } (\alpha, \alpha) = 0, \ (\alpha, \beta) \neq 0 \]
\[ w_\alpha(\beta) = \beta \quad \text{if } (\alpha, \alpha) = (\alpha, \beta) = 0 \]
\[ w_\alpha(\alpha) = -\alpha \]

Note that the transformation associated to an odd root $\alpha$ of zero length cannot be lifted to an automorphism of the superalgebra since $w_\alpha$ transforms even roots into odd ones, and vice versa, and the $\mathbb{Z}_2$-gradation would not be respected. A simple root system $\Delta^0$ being given, from any root $\alpha \in \Delta^0$ such that $(\alpha, \alpha) = 0$, one constructs the simple root system $w_\alpha(\Delta^0)$, where $w_\alpha$ is the generalized Weyl reflection with respect to $\alpha$, and one repeats the procedure on the obtained system until no new basis arises.
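The generalized Weyl reflection can be tried out on a small case. The sketch below applies it to the distinguished simple root system of A(1,0) = sl(2|1), written in the basis (ε1, ε2, δ1). It assumes the bilinear form (ε_i, ε_j) = δ_ij, (δ1, δ1) = -1; this normalization, the coordinate convention, and all function names are assumptions of the example rather than something fixed by the text.

```python
from fractions import Fraction

# Coordinates in the basis (eps1, eps2, delta1).
# Assumed form: (eps_i, eps_j) = delta_ij and (delta1, delta1) = -1.
SIGNATURE = (1, 1, -1)


def form(a, b):
    """Bilinear form on the root space."""
    return sum(s * x * y for s, x, y in zip(SIGNATURE, a, b))


def add(a, b, scale=1):
    """Return a + scale * b, componentwise."""
    return tuple(x + scale * y for x, y in zip(a, b))


def reflect(alpha, beta):
    """Generalized Weyl reflection w_alpha(beta)."""
    if beta == alpha:
        return tuple(-x for x in alpha)
    aa, ab = form(alpha, alpha), form(alpha, beta)
    if aa != 0:
        return add(beta, alpha, -Fraction(2 * ab, aa))
    if ab != 0:
        return add(beta, alpha)
    return beta


# Distinguished simple roots of sl(2|1): delta1 - eps1 (odd, zero length)
# and eps1 - eps2 (even).
DISTINGUISHED = [(-1, 0, 1), (1, -1, 0)]
ODD_SIMPLE_ROOT = DISTINGUISHED[0]

# Reflecting with respect to the isotropic simple root gives another,
# inequivalent simple root system.
NEW_SYSTEM = [reflect(ODD_SIMPLE_ROOT, beta) for beta in DISTINGUISHED]
```

Under these assumptions the new system consists of ε1 - δ1 and δ1 - ε2, both of zero length, so the resulting Dynkin diagram has two odd isotropic simple roots instead of one.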

In the set of all inequivalent simple root systems of a basic Lie superalgebra, there is one simple root system that plays a particular role, the distinguished simple root system, for which the number of odd roots is equal to one. It is constructed as follows. Consider the distinguished $\mathbb{Z}$-gradation of $\mathcal{G}$, $\mathcal{G} = \oplus_{i \in \mathbb{Z}}\, \mathcal{G}_i$. The even simple roots are given by the simple root system of the Lie subalgebra $\mathcal{G}_0$ and the odd simple root is the lowest weight of the representation $\mathcal{G}_1$ of $\mathcal{G}_0$. See Table 3 for the root systems and Table 4 for the distinguished simple root systems of the basic Lie superalgebras.

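Let $\Delta^0 = (\alpha_1, \dots, \alpha_r)$ be a simple root system of $\mathcal{G}$, normalized such that $(\alpha_i, \alpha_j) \in \mathbb{Z}$ and $\min |(\alpha_i, \alpha_j)| = 1$ over the pairs with $(\alpha_i, \alpha_j) \neq 0$. Then one defines the symmetric Cartan matrix $a$ with integer entries as $a_{ij} = (\alpha_i, \alpha_j)$. One associates to $\Delta^0$ a Dynkin diagram according to the following rules:

1. One associates to each simple even root a white dot, to each simple odd root of nonzero length ($a_{ii} \neq 0$) a black dot, and to each simple odd root of zero length ($a_{ii} = 0$) a gray dot.

2. The $i$th and $j$th dots are joined by $\eta_{ij}$ lines, where

\[ \eta_{ij} = \frac{2 |a_{ij}|}{\min(|a_{ii}|, |a_{jj}|)} \quad \text{if } a_{ii}\, a_{jj} \neq 0 \]
\[ \eta_{ij} = \frac{2 |a_{ij}|}{\min(|a_{ii}|, 2)} \quad \text{if } a_{ii} \neq 0 \text{ and } a_{jj} = 0 \]
\[ \eta_{ij} = |a_{ij}| \quad \text{if } a_{ii} = a_{jj} = 0 \]

3. We add an arrow on the lines connecting the $i$th and $j$th dots when $\eta_{ij} > 1$, pointing from $i$ to $j$ if $a_{ii}\, a_{jj} \neq 0$ and $|a_{ii}| > |a_{jj}|$, or if $a_{ii} = 0$, $a_{jj} \neq 0$, $|a_{jj}| < 2$, and pointing from $j$ to $i$ if $a_{ii} = 0$, $a_{jj} \neq 0$, $|a_{jj}| > 2$.

4. For $D(2, 1; \alpha)$, $\eta_{ij} = 1$ if $a_{ij} \neq 0$ and $\eta_{ij} = 0$ if $a_{ij} = 0$. No arrow is put on the Dynkin diagram.

The distinguished Dynkin diagrams of the basic Lie superalgebras are listed in Table 5.


Table 3 Root systems $\Delta_{\bar 0}$, $\Delta_{\bar 1}$ of the basic Lie superalgebras

Superalgebra $\mathcal{G}$    Even roots $\Delta_{\bar 0}$    Odd roots $\Delta_{\bar 1}$

$A(m-1, n-1)$    $\varepsilon_i - \varepsilon_j,\ \delta_k - \delta_l$    $\pm(\varepsilon_i - \delta_k)$
$B(m, n)$    $\pm\varepsilon_i \pm \varepsilon_j,\ \pm\varepsilon_i,\ \pm\delta_k \pm \delta_l,\ \pm 2\delta_k$    $\pm\varepsilon_i \pm \delta_k,\ \pm\delta_k$
$B(0, n)$    $\pm\delta_k \pm \delta_l,\ \pm 2\delta_k$    $\pm\delta_k$
$C(n+1)$    $\pm\delta_k \pm \delta_l,\ \pm 2\delta_k$    $\pm\varepsilon \pm \delta_k$
$D(m, n)$    $\pm\varepsilon_i \pm \varepsilon_j,\ \pm\delta_k \pm \delta_l,\ \pm 2\delta_k$    $\pm\varepsilon_i \pm \delta_k$
$F(4)$    $\pm\delta,\ \pm\varepsilon_i \pm \varepsilon_j,\ \pm\varepsilon_i$    $\frac{1}{2}(\pm\varepsilon_1 \pm \varepsilon_2 \pm \varepsilon_3 \pm \delta)$
$G(3)$    $\pm 2\delta,\ \pm\varepsilon_i,\ \varepsilon_i - \varepsilon_j$    $\pm\delta,\ \pm\varepsilon_i \pm \delta$
$D(2, 1; \alpha)$    $\pm 2\varepsilon_i$    $\pm\varepsilon_1 \pm \varepsilon_2 \pm \varepsilon_3$

$1 \leq i \neq j \leq m$, $1 \leq k \neq l \leq n$ for $A(m-1, n-1)$, $B(m, n)$, $C(n+1)$, $D(m, n)$. $1 \leq i \neq j \leq 3$ for $F(4)$, $G(3)$, $D(2, 1; \alpha)$, with $\varepsilon_1 + \varepsilon_2 + \varepsilon_3 = 0$ in the case of $G(3)$. For $A(n-1, n-1)$, one has to add the condition $\varepsilon_1 + \cdots + \varepsilon_n = \delta_1 + \cdots + \delta_n$.


Table 4 Distinguished simple root systems of the basic Lie superalgebras

Superalgebra $\mathcal{G}$    Distinguished simple root system $\Delta^0$

$A(m-1, n-1)$    $\delta_1 - \delta_2, \dots, \delta_{n-1} - \delta_n,\ \delta_n - \varepsilon_1,\ \varepsilon_1 - \varepsilon_2, \dots, \varepsilon_{m-1} - \varepsilon_m$
$B(m, n)$    $\delta_1 - \delta_2, \dots, \delta_{n-1} - \delta_n,\ \delta_n - \varepsilon_1,\ \varepsilon_1 - \varepsilon_2, \dots, \varepsilon_{m-1} - \varepsilon_m,\ \varepsilon_m$
$B(0, n)$    $\delta_1 - \delta_2, \dots, \delta_{n-1} - \delta_n,\ \delta_n$
$C(n+1)$    $\varepsilon - \delta_1,\ \delta_1 - \delta_2, \dots, \delta_{n-1} - \delta_n,\ 2\delta_n$
$D(m, n)$    $\delta_1 - \delta_2, \dots, \delta_{n-1} - \delta_n,\ \delta_n - \varepsilon_1,\ \varepsilon_1 - \varepsilon_2, \dots, \varepsilon_{m-1} - \varepsilon_m,\ \varepsilon_{m-1} + \varepsilon_m$
$F(4)$    $\frac{1}{2}(\delta - \varepsilon_1 - \varepsilon_2 - \varepsilon_3),\ \varepsilon_3,\ \varepsilon_2 - \varepsilon_3,\ \varepsilon_1 - \varepsilon_2$
$G(3)$    $\delta + \varepsilon_3,\ \varepsilon_1,\ \varepsilon_2 - \varepsilon_1$
$D(2, 1; \alpha)$    $\varepsilon_1 - \varepsilon_2 - \varepsilon_3,\ 2\varepsilon_2,\ 2\varepsilon_3$


Representation Theory of Basic Lie Superalgebras


We restrict in the following to the basic Lie superalgebras. We assume that $\mathcal{G} \neq psl(n|n)$, but the following results still hold for $sl(n|n)$. Let $\mathcal{G} = \mathcal{N}^+ \oplus \mathcal{H} \oplus \mathcal{N}^-$ be a Borel decomposition of $\mathcal{G}$, where $\mathcal{N}^+$ (resp. $\mathcal{N}^-$) is spanned by the positive (resp. negative) root generators of $\mathcal{G}$, $\mathcal{H}$ is a Cartan subalgebra, and $\mathcal{H}^*$ is the dual of $\mathcal{H}$. A representation $\pi: \mathcal{G} \to \mathrm{End}\,\mathcal{V}$ with representation space $\mathcal{V}$ is called a highest-weight representation with highest weight $\Lambda \in \mathcal{H}^*$ if there exists a nonzero vector $v_\Lambda \in \mathcal{V}$ such that

\[ \mathcal{N}^+ v_\Lambda = 0 \]
\[ h(v_\Lambda) = \Lambda(h)\, v_\Lambda \quad (h \in \mathcal{H}) \]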

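The $\mathcal{G}$-module $\mathcal{V}$ is called a highest-weight module, denoted by $\mathcal{V}(\Lambda)$, and the vector $v_\Lambda \in \mathcal{V}$ a highest-weight vector. From now on, $\mathcal{H}$ is the distinguished Cartan subalgebra of $\mathcal{G}$ with basis of generators $(H_1, \dots, H_r)$, where $r = \mathrm{rank}\, \mathcal{G}$ and $H_s$ denotes the Cartan generator associated to the odd simple root. The Kac-Dynkin labels are defined by

\[ a_i = 2\, \frac{(\Lambda, \alpha_i)}{(\alpha_i, \alpha_i)} \quad \text{for } i \neq s \quad \text{and} \quad a_s = (\Lambda, \alpha_s) \]

A weight $\Lambda \in \mathcal{H}^*$ is called a dominant weight if $a_i \geq 0$ for all $i \neq s$, integral if $a_i \in \mathbb{Z}$ for all $i \neq s$, and integral dominant if $a_i \in \mathbb{Z}_{\geq 0}$ for all $i \neq s$. A necessary condition for the highest-weight representation of $\mathcal{G}$ with highest weight $\Lambda$ to be finite dimensional is that $\Lambda$ be an integral dominant weight.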

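One then defines the Kac module. Consider the distinguished $\mathbb{Z}$-gradation $\mathcal{G} = \oplus_{i \in \mathbb{Z}}\, \mathcal{G}_i$ of $\mathcal{G}$ and let $\mathcal{K} = \mathcal{G}_0 \oplus \mathcal{N}^+$, where $\mathcal{N}^+ = \oplus_{i > 0}\, \mathcal{G}_i$, be a subalgebra of $\mathcal{G}$. Denote by $\mathcal{U}(\mathcal{G})$ and $\mathcal{U}(\mathcal{K})$ the corresponding universal enveloping superalgebras. Let $\Lambda \in \mathcal{H}^*$ be an integral dominant weight and $\mathcal{V}_0(\Lambda)$ be the $\mathcal{G}_0$-module with highest weight $\Lambda$, which is extended to a $\mathcal{K}$-module by setting $\mathcal{N}^+ \mathcal{V}_0(\Lambda) = 0$. From this $\mathcal{K}$-module, it is possible to construct a $\mathcal{G}$-module in the following way. One considers the factor space $\mathcal{U}(\mathcal{G}) \otimes_{\mathcal{U}(\mathcal{K})} \mathcal{V}_0(\Lambda)$ consisting of elements of $\mathcal{U}(\mathcal{G}) \otimes \mathcal{V}_0(\Lambda)$ such that the elements $h \otimes v$ and $1 \otimes h(v)$ have been identified for $h \in \mathcal{K}$ and $v \in \mathcal{V}_0(\Lambda)$. This space acquires the structure of a $\mathcal{G}$-module by setting $g(u \otimes v) = gu \otimes v$ for $u \in \mathcal{U}(\mathcal{G})$, $g \in \mathcal{G}$, and $v \in \mathcal{V}_0(\Lambda)$. This $\mathcal{G}$-module is called the induced module from the $\mathcal{K}$-module $\mathcal{V}_0(\Lambda)$ and denoted by $\mathrm{Ind}_{\mathcal{K}}^{\mathcal{G}}\, \mathcal{V}_0(\Lambda)$. For example, in the case of type I basic Lie superalgebras, if $\{f_1, \dots, f_d\}$ denotes a basis of odd generators of $\mathcal{G}/\mathcal{K}$, then

\[ \mathrm{Ind}_{\mathcal{K}}^{\mathcal{G}}\, \mathcal{V}_0(\Lambda) = \bigoplus_{1 \leq i_1 < \cdots < i_k \leq d} f_{i_1} \cdots f_{i_k}\, \mathcal{V}_0(\Lambda) \]


Table 5 Distinguished Dynkin diagrams of the basic Lie superalgebras (o: white dot, (x): gray dot, (*): black dot; --: single line, => or <=: double line with arrow, <≡: triple line with arrow)

Superalgebra $\mathcal{G}$    Distinguished Dynkin diagram

$A(m-1, n-1)$    o-- ... --o--(x)--o-- ... --o
$B(m, n)$    o-- ... --o--(x)--o-- ... --o=>o
$B(0, n)$    o-- ... --o=>(*)
$C(n+1)$    (x)--o-- ... --o<=o
$D(m, n)$    o-- ... --o--(x)--o-- ... --o, the last dot being joined to two further white dots (fork)
$F(4)$    (x)--o<=o--o
$G(3)$    (x)--o<≡o
$D(2, 1; \alpha)$    (x) joined by single lines to each of two white dots


The Kac module $\bar{\mathcal{V}}(\Lambda)$ is defined as follows: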


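1. For a superalgebra $\mathcal{G}$ of type I (the odd part is the direct sum of two irreducible representations of the even part), the Kac module is the induced module

\[ \bar{\mathcal{V}}(\Lambda) = \mathrm{Ind}_{\mathcal{K}}^{\mathcal{G}}\, \mathcal{V}_0(\Lambda) \]

2. For a superalgebra $\mathcal{G}$ of type II (the odd part is an irreducible representation of the even part), the induced module $\mathrm{Ind}_{\mathcal{K}}^{\mathcal{G}}\, \mathcal{V}_0(\Lambda)$ contains a submodule $\mathcal{M}(\Lambda) = \mathcal{U}(\mathcal{G})\, \mathcal{G}_{-\psi}^{\,b+1}\, \mathcal{V}_0(\Lambda)$, where $\psi$ is the longest simple root of $\mathcal{G}_{\bar 0}$ which is hidden behind the odd simple root (that is, the longest simple root of $sp(2n)$ in the case of $osp(m|2n)$ and the simple root of $sl(2)$ in the case of $F(4)$, $G(3)$, and $D(2, 1; \alpha)$) and $b = 2(\Lambda, \psi)/(\psi, \psi)$ is the component of $\Lambda$ with respect to $\psi$. The Kac module is defined as the quotient of the induced module $\mathrm{Ind}_{\mathcal{K}}^{\mathcal{G}}\, \mathcal{V}_0(\Lambda)$ by the submodule $\mathcal{M}(\Lambda)$:

\[ \bar{\mathcal{V}}(\Lambda) = \mathrm{Ind}_{\mathcal{K}}^{\mathcal{G}}\, \mathcal{V}_0(\Lambda) \, / \, \mathcal{U}(\mathcal{G})\, \mathcal{G}_{-\psi}^{\,b+1}\, \mathcal{V}_0(\Lambda) \]

In the case where the Kac module is not simple, it contains a maximal submodule $\mathcal{I}(\Lambda)$ and the quotient module $\mathcal{V}(\Lambda) = \bar{\mathcal{V}}(\Lambda)/\mathcal{I}(\Lambda)$ is a simple module.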

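The fundamental result concerning the representations of basic Lie superalgebras is the following:

1. Any finite-dimensional irreducible representation of $\mathcal{G}$ is of the form $\mathcal{V}(\Lambda) = \bar{\mathcal{V}}(\Lambda)/\mathcal{I}(\Lambda)$, where $\Lambda$ is an integral dominant weight.

2. Any finite-dimensional simple $\mathcal{G}$-module is uniquely characterized by its integral dominant weight $\Lambda$: two $\mathcal{G}$-modules $\mathcal{V}(\Lambda)$ and $\mathcal{V}(\Lambda')$ are isomorphic if and only if $\Lambda = \Lambda'$.

3. The finite-dimensional simple $\mathcal{G}$-module $\mathcal{V}(\Lambda) = \bar{\mathcal{V}}(\Lambda)/\mathcal{I}(\Lambda)$ has the weight decomposition

\[ \mathcal{V}(\Lambda) = \bigoplus_{\lambda \leq \Lambda} \mathcal{V}_\lambda \]

with

\[ \mathcal{V}_\lambda = \{ v \in \mathcal{V} \mid h(v) = \lambda(h)\, v, \ h \in \mathcal{H} \} \]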


The presence of odd roots has another important consequence in the representation theory of superalgebras. Indeed, one might find that in certain representations, weight vectors, different from the highest one specifying the representation, are annihilated by all the generators corresponding to positive roots. Such vectors have, of course, to be decoupled from the representation. Representations of this kind are called atypical, while the other irreducible representations not suffering this pathology are called typical. For a basic Lie superalgebra $\mathcal{G}$ with root system $\Delta$, one defines $\bar\Delta_{\bar 0} = \{ \alpha \in \Delta_{\bar 0} \mid \alpha/2 \notin \Delta_{\bar 1} \}$ and $\bar\Delta_{\bar 1} = \{ \alpha \in \Delta_{\bar 1} \mid 2\alpha \notin \Delta_{\bar 0} \}$. Let $\rho_0$ be the half-sum of the roots of $\Delta_{\bar 0}^+$, $\rho_1$ the half-sum of the roots of $\Delta_{\bar 1}^+$, and $\rho = \rho_0 - \rho_1$. The representation $\pi$ with highest weight $\Lambda$ is called typical if

\[ (\Lambda + \rho, \alpha) \neq 0 \quad \text{for all } \alpha \in \bar\Delta_{\bar 1}^+ \]

The highest weight $\Lambda$ is then called typical. If there exists some $\alpha \in \bar\Delta_{\bar 1}^+$ such that $(\Lambda + \rho, \alpha) = 0$, the representation $\pi$ and the highest weight $\Lambda$ are called atypical. The number of distinct elements $\alpha \in \bar\Delta_{\bar 1}^+$ for which $\Lambda$ is atypical is the degree of atypicality of the representation $\pi$. If there exists one and only one $\alpha \in \bar\Delta_{\bar 1}^+$ such that $(\Lambda + \rho, \alpha) = 0$, the representation $\pi$ and the highest weight $\Lambda$ are called singly atypical.
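The typicality test can be made concrete for sl(2|1) with the distinguished choice of positive roots. The sketch below is only illustrative: it works in gl(2|1) coordinates (ε1, ε2, δ1), assumes the form (ε_i, ε_j) = δ_ij, (δ1, δ1) = -1, and uses hypothetical helper names. For this superalgebra no odd root α has 2α among the even roots, so every positive odd root enters the test.

```python
from fractions import Fraction

# Basis (eps1, eps2, delta1); assumed form (eps_i, eps_j) = delta_ij, (delta1, delta1) = -1.
SIGNATURE = (1, 1, -1)

EVEN_POSITIVE = [(1, -1, 0)]              # eps1 - eps2
ODD_POSITIVE = [(-1, 0, 1), (0, -1, 1)]   # delta1 - eps1, delta1 - eps2


def form(a, b):
    """Bilinear form on the root space."""
    return sum(s * x * y for s, x, y in zip(SIGNATURE, a, b))


def half_sum(roots):
    """Half-sum of a list of roots."""
    return tuple(Fraction(sum(column), 2) for column in zip(*roots))


# rho = rho_0 - rho_1
RHO = tuple(x - y for x, y in zip(half_sum(EVEN_POSITIVE), half_sum(ODD_POSITIVE)))


def atypical_roots(weight):
    """Positive odd roots alpha with (weight + rho, alpha) = 0."""
    shifted = tuple(x + y for x, y in zip(weight, RHO))
    return [alpha for alpha in ODD_POSITIVE if form(shifted, alpha) == 0]


TRIVIAL = atypical_roots((0, 0, 0))     # highest weight 0
EXAMPLE = atypical_roots((1, 0, 0))     # highest weight eps1
```

With these conventions ρ = ε1 - δ1; the zero weight is singly atypical (only δ1 - ε1 is orthogonal to Λ + ρ), while the weight ε1 is typical.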

The Kac module $\bar{\mathcal{V}}(\Lambda)$ is a simple $\mathcal{G}$-module if and only if the highest weight $\Lambda$ is typical. All the finite-dimensional representations of $B(0, n)$ are typical. All the finite-dimensional representations of $C(n+1)$ are either typical or singly atypical.

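The dimension of a typical finite-dimensional representation $\mathcal{V}(\Lambda)$ of $\mathcal{G}$ is given by

\[ \dim \mathcal{V}(\Lambda) = 2^{\dim \Delta_{\bar 1}^+} \prod_{\alpha \in \Delta_{\bar 0}^+} \frac{(\Lambda + \rho, \alpha)}{(\rho_0, \alpha)} \]

where $\dim \mathcal{V}_{\bar 0}(\Lambda) = \dim \mathcal{V}_{\bar 1}(\Lambda)$ if $\mathcal{G} \neq B(0, n)$, and if $\mathcal{G} = B(0, n)$,

\[ \dim \mathcal{V}_{\bar 0}(\Lambda) - \dim \mathcal{V}_{\bar 1}(\Lambda) = \prod_{\alpha \in \bar\Delta_{\bar 0}^+} \frac{(\Lambda + \rho, \alpha)}{(\rho_0, \alpha)} \]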
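As a worked instance of the dimension formula for typical representations, the sketch below evaluates 2^(number of positive odd roots) times the product over positive even roots of (Λ + ρ, α)/(ρ_0, α) for sl(2|1). The coordinates (ε1, ε2, δ1), the form (ε_i, ε_j) = δ_ij, (δ1, δ1) = -1, and the function names are assumptions of this example.

```python
from fractions import Fraction

# Basis (eps1, eps2, delta1); assumed form (eps_i, eps_j) = delta_ij, (delta1, delta1) = -1.
SIGNATURE = (1, 1, -1)

EVEN_POSITIVE = [(1, -1, 0)]              # eps1 - eps2
ODD_POSITIVE = [(-1, 0, 1), (0, -1, 1)]   # delta1 - eps1, delta1 - eps2


def form(a, b):
    """Bilinear form on the root space."""
    return sum(s * x * y for s, x, y in zip(SIGNATURE, a, b))


def half_sum(roots):
    """Half-sum of a list of roots."""
    return tuple(Fraction(sum(column), 2) for column in zip(*roots))


RHO_0 = half_sum(EVEN_POSITIVE)
RHO = tuple(x - y for x, y in zip(RHO_0, half_sum(ODD_POSITIVE)))


def typical_dimension(weight):
    """Dimension of the typical module with the given highest weight."""
    shifted = tuple(x + y for x, y in zip(weight, RHO))
    if any(form(shifted, alpha) == 0 for alpha in ODD_POSITIVE):
        raise ValueError("the weight is atypical; the formula does not apply")
    dimension = Fraction(2 ** len(ODD_POSITIVE))
    for alpha in EVEN_POSITIVE:
        dimension *= form(shifted, alpha) / form(RHO_0, alpha)
    return dimension


DIM_EPS1 = typical_dimension((1, 0, 0))   # highest weight eps1
```

For the highest weight ε1 the even part contributes the factor 2 (the two-dimensional sl(2) module) and the two positive odd roots contribute 2^2, giving dimension 8.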
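The atypicality conditions are the following, the highest weight $\Lambda$ being specified by its Kac-Dynkin labels $a_1, a_2, \dots$ attached to the dots of the distinguished Dynkin diagram.

• For $A(m-1, n-1)$ with $\Lambda = (a_1, \dots, a_{m+n-1})$:

\[ \sum_{k=i}^{n-1} a_k - \sum_{k=n+1}^{j} a_k + a_n = i + j - 2n \]

where $1 \leq i \leq n \leq j \leq m+n-1$.

• For $B(m, n)$ with $\Lambda = (a_1, \dots, a_{m+n})$ ($m \neq 0$):

\[ \sum_{q=i}^{n} a_q - \sum_{q=n+1}^{j} a_q + 2n - i - j = 0 \]
\[ \sum_{q=i}^{n} a_q - \sum_{q=n+1}^{j} a_q - 2 \sum_{q=j+1}^{m+n-1} a_q - a_{m+n} = 2m + i - j - 1 \]

where $1 \leq i \leq n \leq j \leq m+n-1$.

• For $C(n+1)$ with $\Lambda = (a_1, \dots, a_{n+1})$:

\[ a_1 - \sum_{q=2}^{i} a_q - i + 1 = 0 \]
\[ a_1 - \sum_{q=2}^{i} a_q - 2 \sum_{q=i+1}^{n+1} a_q - 2n + i - 1 = 0 \]

where $1 \leq i \leq n$.

• For $D(m, n)$ with $\Lambda = (a_1, \dots, a_{m+n})$:

\[ \sum_{q=i}^{n} a_q - \sum_{q=n+1}^{j} a_q + 2n - i - j = 0 \]

where $1 \leq i \leq n \leq j \leq m+n-1$,

\[ \sum_{q=i}^{n} a_q - \sum_{q=n+1}^{m+n-2} a_q - a_{m+n} = m - n + i - 1 \]

where $1 \leq i \leq n$,

\[ \sum_{q=i}^{n} a_q - \sum_{q=n+1}^{j} a_q - 2 \sum_{q=j+1}^{m+n-2} a_q = a_{m+n-1} + a_{m+n} + 2m + i - j - 2 \]

where $1 \leq i \leq n \leq j \leq m+n-2$.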

See also: Lie Groups: General Theory; Lie, Symplectic,
and Poisson Groupoids and Their Lie Algebroids.
Introduction 


Groupoids are mathematical structures able to describe 
symmetry properties more general than those described 
by groups. They were introduced (and named) by 
H Brandt in 1926. Around 1950, Charles Ehresmann 
used groupoids with additional structures (topological 
and differentiable) as essential tools in topology and 
differential geometry. In recent years, Mikhail Karasev,
Alan Weinstein, and Stanisław Zakrzewski independently
discovered that symplectic groupoids can be used
for the construction of noncommutative deformations
of the algebra of smooth functions on a manifold, with
potential applications to quantization. Poisson groupoids
were introduced by Alan Weinstein as generalizations
of both Poisson Lie groups and symplectic
groupoids.

We present here the main definitions and first 
properties relative to groupoids, Lie groupoids, Lie 
algebroids, symplectic and Poisson groupoids and 
their Lie algebroids. 


Groupoids 
What is a Groupoid? 


Before stating the formal definition of a groupoid, let us explain, in an informal way, why it is a very natural concept. The easiest way to understand that concept is to think of two sets, $\Gamma$ and $\Gamma_0$. The first one, $\Gamma$, is called the “set of arrows” or “total space” of the groupoid, and the other one, $\Gamma_0$, the “set of objects” or “set of units” of the groupoid. One may consider an element $x \in \Gamma$ as an arrow going from an object (a point in $\Gamma_0$) to another object (another point in $\Gamma_0$). The word “arrow” is used here in a very general sense: it means a way of going from a point in $\Gamma_0$ to another point in $\Gamma_0$. One should not consider an arrow as a line drawn in the set $\Gamma_0$ joining the starting point of the arrow to its endpoint: this happens only for some special groupoids. Rather, one should think of an arrow as living outside $\Gamma_0$, with only its starting point and its endpoint in $\Gamma_0$, as shown in Figure 1.

The following ingredients enter the definition of a 
groupoid. 


1. Two maps $\alpha: \Gamma \to \Gamma_0$ and $\beta: \Gamma \to \Gamma_0$, called the “target map” and the “source map” of the groupoid. If $x \in \Gamma$ is an arrow, $\alpha(x) \in \Gamma_0$ is its endpoint and $\beta(x) \in \Gamma_0$ its starting point.

2. A “composition law” on the set of arrows; we can compose an arrow $y$ with another arrow $x$, and get an arrow $m(x, y)$, by following first the arrow $y$, then the arrow $x$. Of course, $m(x, y)$ is defined if and only if the target of $y$ is equal to the source of $x$. The source of $m(x, y)$ is equal to the source of $y$, and its target is equal to the target of $x$, as illustrated in Figure 1. It is only by convention that we write $m(x, y)$ rather than $m(y, x)$: the arrow which is followed first is on the right, by analogy with the usual notation $f \circ g$ for the composition of two maps $g$ and $f$. When there is no risk of confusion, we write $x \circ y$, or $x.y$, or even simply $xy$ for $m(x, y)$. The composition of arrows is associative.

3. An “embedding” $\varepsilon$ of the set $\Gamma_0$ into the set $\Gamma$, which associates a unit arrow $\varepsilon(u)$ with each $u \in \Gamma_0$. That unit arrow is such that both its source and its target are $u$, and it plays the role of a unit when composed with another arrow, either on the right or on the left: for any arrow $x$, $m(\varepsilon(\alpha(x)), x) = x$ and $m(x, \varepsilon(\beta(x))) = x$.

4. Finally, an “inverse map” $\iota$ from the set of arrows onto itself. If $x \in \Gamma$ is an arrow, one may think of $\iota(x)$ as the arrow $x$ followed in the reverse sense. We often write $x^{-1}$ for $\iota(x)$.


Now we are ready to state the formal definition of a groupoid.

Definition 1 A groupoid is a pair of sets $(\Gamma, \Gamma_0)$ equipped with the structure defined by the following data:

(i) an injective map $\varepsilon: \Gamma_0 \to \Gamma$, called the unit section of the groupoid;

(ii) two maps $\alpha: \Gamma \to \Gamma_0$ and $\beta: \Gamma \to \Gamma_0$, called, respectively, the target map and the source map; they satisfy

\[ \alpha \circ \varepsilon = \beta \circ \varepsilon = \mathrm{id}_{\Gamma_0} \qquad [1] \]

(iii) a composition law $m: \Gamma_2 \to \Gamma$, called the product, defined on the subset $\Gamma_2$ of $\Gamma \times \Gamma$, called the set of composable elements,

\[ \Gamma_2 = \{ (x, y) \in \Gamma \times \Gamma;\ \beta(x) = \alpha(y) \} \qquad [2] \]

which is associative, in the sense that whenever one side of the equality

\[ m(x, m(y, z)) = m(m(x, y), z) \qquad [3] \]

is defined, the other side is defined too, and the equality holds; moreover, the composition law $m$ is such that for each $x \in \Gamma$,

\[ m(\varepsilon(\alpha(x)), x) = m(x, \varepsilon(\beta(x))) = x \qquad [4] \]

(iv) a map $\iota: \Gamma \to \Gamma$, called the inverse, such that, for every $x \in \Gamma$, $(x, \iota(x)) \in \Gamma_2$ and $(\iota(x), x) \in \Gamma_2$, and

\[ m(x, \iota(x)) = \varepsilon(\alpha(x)), \qquad m(\iota(x), x) = \varepsilon(\beta(x)) \qquad [5] \]

The sets $\Gamma$ and $\Gamma_0$ are called, respectively, the total space and the set of units of the groupoid, which is itself denoted by $\Gamma \rightrightarrows \Gamma_0$ (the two arrows standing for the maps $\alpha$ and $\beta$).

Figure 1 Two arrows $x$ and $y \in \Gamma$, with the target of $y$, $\alpha(y) \in \Gamma_0$, equal to the source of $x$, $\beta(x) \in \Gamma_0$, and the composed arrow $m(x, y)$.


Identification and Notations 


In what follows, by means of the injective map $\varepsilon$, we will identify the set of units $\Gamma_0$ with the subset $\varepsilon(\Gamma_0)$ of $\Gamma$. Therefore, $\varepsilon$ will be the canonical injection in $\Gamma$ of its subset $\Gamma_0$.

For $x$ and $y \in \Gamma$, we will sometimes write $x.y$, or even simply $xy$ for $m(x, y)$, and $x^{-1}$ for $\iota(x)$. In addition, we will write “the groupoid $\Gamma$” for “the groupoid $\Gamma \rightrightarrows \Gamma_0$.”


Properties and Comments 


The above definitions have the following consequences. 


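Involutivity of the inverse map. The inverse map $\iota$ is involutive:

\[ \iota \circ \iota = \mathrm{id}_\Gamma \qquad [6] \]

We have indeed, for any $x \in \Gamma$,

\[ \iota \circ \iota(x) = m(\iota \circ \iota(x), \beta(\iota \circ \iota(x))) = m(\iota \circ \iota(x), \beta(x)) = m(\iota \circ \iota(x), m(\iota(x), x)) \]
\[ = m(m(\iota \circ \iota(x), \iota(x)), x) = m(\alpha(x), x) = x \]


Unicity of the inverse. Let $x$ and $y \in \Gamma$ be such that

\[ m(x, y) = \alpha(x) \quad \text{and} \quad m(y, x) = \beta(x) \]

Then we have

\[ y = m(y, \beta(y)) = m(y, \alpha(x)) = m(y, m(x, \iota(x))) = m(m(y, x), \iota(x)) \]
\[ = m(\beta(x), \iota(x)) = m(\alpha(\iota(x)), \iota(x)) = \iota(x) \]

Therefore, for any $x \in \Gamma$, the unique $y \in \Gamma$ such that $m(y, x) = \beta(x)$ and $m(x, y) = \alpha(x)$ is $\iota(x)$.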


The fibers of $\alpha$ and $\beta$ and the isotropy groups. The target map $\alpha$ (resp. the source map $\beta$) of a groupoid $\Gamma \rightrightarrows \Gamma_0$ determines an equivalence relation on $\Gamma$: two elements $x$ and $y \in \Gamma$ are said to be $\alpha$-equivalent (resp. $\beta$-equivalent) if $\alpha(x) = \alpha(y)$ (resp. if $\beta(x) = \beta(y)$). The corresponding equivalence classes are called the $\alpha$-fibers (resp. the $\beta$-fibers) of the groupoid. They are of the form $\alpha^{-1}(u)$ (resp. $\beta^{-1}(u)$), with $u \in \Gamma_0$.

For each unit $u \in \Gamma_0$, the subset

\[ \Gamma_u = \alpha^{-1}(u) \cap \beta^{-1}(u) = \{ x \in \Gamma;\ \alpha(x) = \beta(x) = u \} \qquad [7] \]

is called the “isotropy group” of $u$. It is indeed a group, with the restrictions of $m$ and $\iota$ as composition law and inverse map.


A way to visualize groupoids. We have seen (Figure 1) a way in which groupoids may be visualized, by using arrows for elements in $\Gamma$ and points for elements in $\Gamma_0$. There is another very useful way to visualize groupoids, shown in Figure 2.

The total space $\Gamma$ of the groupoid is represented as a plane, and the set $\Gamma_0$ of units as a straight line in that plane. The $\alpha$-fibers (resp. the $\beta$-fibers) are represented as parallel straight lines, transverse to $\Gamma_0$.


Examples of Groupoids 


The groupoid of pairs. Let $E$ be a set. The “groupoid of pairs” of elements in $E$ has, as its total space, the product space $E \times E$. The diagonal $\Delta_E = \{ (x, x);\ x \in E \}$ is its set of units, and the target and source maps are

\[ \alpha: (x, y) \mapsto (x, x), \qquad \beta: (x, y) \mapsto (y, y) \]

Its composition law $m$ and inverse map $\iota$ are

\[ m((x, y), (y, z)) = (x, z) \]
\[ \iota((x, y)) = (x, y)^{-1} = (y, x) \]

Groups. A group $G$ is a groupoid with set of units $\{e\}$, with only one element $e$, the unit element of the group. The target and source maps are both equal to the constant map $x \mapsto e$.

Figure 2 A way to visualize groupoids.


Definition 2 A topological groupoid is a groupoid $\Gamma \rightrightarrows \Gamma_0$ for which $\Gamma$ is a (maybe non-Hausdorff) topological space, $\Gamma_0$ a Hausdorff topological subspace of $\Gamma$, $\alpha$ and $\beta$ surjective continuous maps, $m: \Gamma_2 \to \Gamma$ a continuous map, and $\iota: \Gamma \to \Gamma$ a homeomorphism.

A Lie groupoid is a groupoid $\Gamma \rightrightarrows \Gamma_0$ for which $\Gamma$ is a smooth (maybe non-Hausdorff) manifold, $\Gamma_0$ a smooth Hausdorff submanifold of $\Gamma$, $\alpha$ and $\beta$ smooth surjective submersions (which implies that $\Gamma_2$ is a smooth submanifold of $\Gamma \times \Gamma$), $m: \Gamma_2 \to \Gamma$ a smooth map, and $\iota: \Gamma \to \Gamma$ a smooth diffeomorphism.


Properties of Lie Groupoids 


Dimensions. Let $\Gamma \rightrightarrows \Gamma_0$ be a Lie groupoid. Since $\alpha$ and $\beta$ are submersions, for any $x \in \Gamma$, the $\alpha$-fiber $\alpha^{-1}(\alpha(x))$ and the $\beta$-fiber $\beta^{-1}(\beta(x))$ are submanifolds of $\Gamma$, both of dimension $\dim \Gamma - \dim \Gamma_0$. The inverse map $\iota$, restricted to the $\alpha$-fiber through $x$ (resp. the $\beta$-fiber through $x$), is a diffeomorphism of that fiber onto the $\beta$-fiber through $\iota(x)$ (resp. the $\alpha$-fiber through $\iota(x)$). The dimension of the submanifold $\Gamma_2$ of composable pairs in $\Gamma \times \Gamma$ is $2 \dim \Gamma - \dim \Gamma_0$.


The tangent bundle of a Lie groupoid. Let $\Gamma \rightrightarrows \Gamma_0$ be a Lie groupoid. Its tangent bundle $T\Gamma$ is a Lie groupoid, with $T\Gamma_0$ as set of units, $T\alpha: T\Gamma \to T\Gamma_0$ and $T\beta: T\Gamma \to T\Gamma_0$ as target and source maps. Let us denote by $\Gamma_2$ the set of composable pairs in $\Gamma \times \Gamma$, by $m: \Gamma_2 \to \Gamma$ the composition law, and by $\iota: \Gamma \to \Gamma$ the inverse. Then the set of composable pairs in $T\Gamma \times T\Gamma$ is simply $T\Gamma_2$, the composition law on $T\Gamma$ is $Tm: T\Gamma_2 \to T\Gamma$, and the inverse is $T\iota: T\Gamma \to T\Gamma$.

When the groupoid $\Gamma$ is a Lie group $G$, the Lie groupoid $TG$ is a Lie group too. We will see that the cotangent bundle of a Lie groupoid is a Lie groupoid, and more precisely a symplectic groupoid.


Isotropy groups. For each unit $u \in \Gamma_0$ of a Lie groupoid, the isotropy group $\Gamma_u$ (defined earlier) is a Lie group.


Examples of Topological and Lie Groupoids 


Topological groups and Lie groups A topological 
group (resp. a Lie group) is a topological groupoid 
(resp. a Lie groupoid) whose set of units has only 
one element e. 


Vector bundles. A smooth vector bundle $\pi: E \to M$ on a smooth manifold $M$ is a Lie groupoid, with the base $M$ as set of units (identified with the image of the zero section); the source and target maps both coincide with the projection $\pi$; the product and the inverse maps are the addition $(x, y) \mapsto x + y$ and the opposite map $x \mapsto -x$ in the fibers.


The fundamental groupoid of a topological space. Let $M$ be a topological space. A “path” in $M$ is a continuous map $\gamma: [0, 1] \to M$. We denote by $[\gamma]$ the homotopy class of a path $\gamma$ and by $\Pi(M)$ the set of homotopy classes of paths in $M$ (with fixed endpoints). For $[\gamma] \in \Pi(M)$, we set $\alpha([\gamma]) = \gamma(1)$, $\beta([\gamma]) = \gamma(0)$, where $\gamma$ is any representative of the class $[\gamma]$. The concatenation of paths determines a well-defined composition law on $\Pi(M)$, for which $\Pi(M) \rightrightarrows M$ is a topological groupoid, called the “fundamental groupoid” of $M$. The inverse map is $[\gamma] \mapsto [\gamma^{-1}]$, where $\gamma$ is any representative of $[\gamma]$ and $\gamma^{-1}$ is the path $t \mapsto \gamma(1 - t)$. The set of units is $M$, if we identify a point in $M$ with the homotopy class of the constant path equal to that point.

When $M$ is a smooth manifold, the same construction can be made with piecewise smooth paths, and the fundamental groupoid $\Pi(M) \rightrightarrows M$ is a Lie groupoid.


Symplectic and Poisson Groupoids 
Symplectic and Poisson Geometry 


Let us recall some definitions and results in 
symplectic and Poisson geometry, used in the next 
sections. 


Symplectic manifolds. A “symplectic form” on a smooth manifold $M$ is a differential 2-form $\omega$, which is closed, that is, which satisfies

\[ d\omega = 0 \qquad [8] \]

and nondegenerate, that is, such that for each point $x \in M$ and each nonzero vector $v \in T_x M$, there exists a vector $w \in T_x M$ such that $\omega(v, w) \neq 0$. Equipped with the symplectic form $\omega$, a smooth manifold $M$ is called a “symplectic manifold” and denoted by $(M, \omega)$.

The dimension of a symplectic manifold is always 
even. 


The Liouville form on a cotangent bundle. Let $N$ be a smooth manifold, and $T^*N$ be its cotangent bundle. The Liouville form on $T^*N$ is the 1-form $\theta$ such that, for any $\eta \in T^*N$ and $v \in T_\eta(T^*N)$,

\[ \theta(v) = \langle \eta, T\pi_N(v) \rangle \qquad [9] \]

where $\pi_N: T^*N \to N$ is the canonical projection.

The 2-form $\omega = d\theta$ is symplectic, and is called the “canonical symplectic form” on the cotangent bundle $T^*N$.


Poisson manifolds. A Poisson manifold is a smooth manifold $P$ equipped with a bivector field (i.e., a smooth section of $\bigwedge^2 TP$) $\Pi$ which satisfies

\[ [\Pi, \Pi] = 0 \qquad [10] \]

the bracket on the left-hand side being the Schouten bracket. The bivector field $\Pi$ will be called the Poisson structure on $P$. It allows us to define a composition law on the space $C^\infty(P, \mathbb{R})$ of smooth functions on $P$, called the Poisson bracket and denoted by $(f, g) \mapsto \{f, g\}$, by setting, for all $f$ and $g \in C^\infty(P, \mathbb{R})$ and $x \in P$,

\[ \{f, g\}(x) = \Pi(df(x), dg(x)) \qquad [11] \]

That composition law is skew-symmetric and satisfies the Jacobi identity, and therefore turns $C^\infty(P, \mathbb{R})$ into a Lie algebra.


Hamiltonian vector fields Let $(P, \Pi)$ be a Poisson
manifold. We denote by $\Pi^\sharp: T^*P \to TP$ the vector
bundle map defined by


$$\langle \eta, \Pi^\sharp(\zeta) \rangle = \Pi(\zeta, \eta) \qquad [12]$$


where $\zeta$ and $\eta$ are two elements in the same fiber of
$T^*P$. Let $f: P \to \mathbb{R}$ be a smooth function on P. The
vector field $X_f = \Pi^\sharp(df)$ is called the Hamiltonian
vector field associated to f. If $g: P \to \mathbb{R}$ is another
smooth function on P, the Poisson bracket $\{f, g\}$ can
be written as


$$\{f, g\} = \langle dg, \Pi^\sharp(df) \rangle = -\langle df, \Pi^\sharp(dg) \rangle \qquad [13]$$
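
In coordinates, these formulas reduce to matrix arithmetic. The sketch below assumes a bivector with constant components pi[i][j] = Π(dx^i, dx^j) on the plane with coordinates (q, p); the particular matrix `PI` is the one obtained from ω = dp ∧ dq under the sign conventions of this article, which is worth re-deriving before reuse elsewhere. The function names and the sample Hamiltonian are hypothetical.

```python
def poisson_bracket(pi, df, dg):
    """{f, g}(x) = Pi(df(x), dg(x)), with differentials given by components."""
    n = len(pi)
    return sum(pi[i][j] * df[i] * dg[j] for i in range(n) for j in range(n))

def hamiltonian_field(pi, df):
    """Components of X_f = Pi^sharp(df), defined by <eta, X_f> = Pi(df, eta)."""
    n = len(pi)
    return [sum(pi[i][j] * df[i] for i in range(n)) for j in range(n)]

# Coordinates (q, p): Pi(dq, dp) = -1 and Pi(dp, dq) = +1 (assumed convention).
PI = [[0, -1],
      [1, 0]]

def dH(q, p):
    """Differential of the sample Hamiltonian H = (p**2 + q**2) / 2."""
    return [q, p]

dq, dp = [1, 0], [0, 1]        # differentials of the coordinate functions
point = (0.3, -1.2)
X_H = hamiltonian_field(PI, dH(*point))

print(X_H)                              # components (p, -q) at the point
print(poisson_bracket(PI, dp, dq))      # bracket of the coordinate functions p, q
```

With this choice the Hamiltonian vector field has components (∂H/∂p, −∂H/∂q), that is, Hamilton's equations in their usual form.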


The canonical Poisson structure on a symplectic
manifold Every symplectic manifold $(M, \omega)$ has a
Poisson structure, associated to its symplectic
structure, for which the vector bundle map
$\Pi^\sharp: T^*M \to TM$ is the inverse of the vector bundle
isomorphism $v \mapsto -i(v)\omega$. We will always consider
that a symplectic manifold is equipped with that
Poisson structure, unless otherwise specified.


The KKS Poisson structure Let $\mathcal{G}$ be a finite-dimensional
Lie algebra. Its dual space $\mathcal{G}^*$ has a
natural Poisson structure, for which the bracket of
two smooth functions f and g is


$$\{f, g\}(\xi) = \langle \xi, [df(\xi), dg(\xi)] \rangle \qquad [14]$$


with $\xi \in \mathcal{G}^*$, the differentials $df(\xi)$ and $dg(\xi)$ being
considered as elements in $\mathcal{G}$, identified with its
bidual $\mathcal{G}^{**}$. It is called the Kirillov, Kostant, and
Souriau (KKS) Poisson structure on $\mathcal{G}^*$.
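
A standard low-dimensional instance may help fix ideas. The sketch below takes the Lie algebra so(3), identified with R^3 with the cross product as bracket, and identifies the dual with R^3 through the dot product; both identifications are assumptions made for the example. Formula [14] then becomes a triple product.

```python
def cross(a, b):
    """Cross product on R^3, playing the role of the Lie bracket of so(3)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Duality pairing between the dual space and the Lie algebra."""
    return sum(x * y for x, y in zip(a, b))

def kks_bracket(xi, df, dg):
    """{f, g}(xi) = <xi, [df(xi), dg(xi)]>, differentials given as vectors."""
    return dot(xi, cross(df, dg))

xi = (2, -1, 5)                                   # a sample point of the dual
e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)      # differentials of coordinates

# Brackets of the linear coordinate functions xi_1, xi_2, xi_3.
print(kks_bracket(xi, e1, e2), kks_bracket(xi, e2, e3), kks_bracket(xi, e3, e1))

# C(xi) = |xi|^2 / 2 has differential xi, so its bracket with any g vanishes.
print(kks_bracket(xi, xi, (3, 4, 7)))
```

The brackets of the coordinate functions reproduce the structure constants of the Lie algebra, and the function C is constant along every Hamiltonian flow (it is a Casimir function).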


Poisson maps Let $(P_1, \Pi_1)$ and $(P_2, \Pi_2)$ be two
Poisson manifolds. A smooth map $\varphi: P_1 \to P_2$ is
called a Poisson map if, for every pair (f, g) of
smooth functions on $P_2$,


$$\{\varphi^* f, \varphi^* g\}_1 = \varphi^* \{f, g\}_2 \qquad [15]$$


Product Poisson structures The product $P_1 \times P_2$
of two Poisson manifolds $(P_1, \Pi_1)$ and $(P_2, \Pi_2)$ has
a natural Poisson structure: it is the unique
Poisson structure for which the bracket of
functions of the form $(x_1, x_2) \mapsto f_1(x_1) f_2(x_2)$ and
$(x_1, x_2) \mapsto g_1(x_1) g_2(x_2)$ (where $f_1$ and $g_1 \in C^\infty(P_1, \mathbb{R})$,
$f_2$ and $g_2 \in C^\infty(P_2, \mathbb{R})$) is


$$(x_1, x_2) \mapsto \{f_1, g_1\}_1(x_1)\, f_2(x_2)\, g_2(x_2) + f_1(x_1)\, g_1(x_1)\, \{f_2, g_2\}_2(x_2)$$


The same property holds for the product of any
finite number of Poisson manifolds.


Symplectic orthogonality Let $(V, \omega)$ be a symplectic
vector space, that is, a real, finite-dimensional
vector space V with a skew-symmetric nondegenerate
bilinear form $\omega$. Let W be a vector subspace of V.
The “symplectic orthogonal” of W is


$$\operatorname{orth} W = \{v \in V;\ \omega(v, w) = 0 \text{ for all } w \in W\} \qquad [16]$$


It is a vector subspace of V, which satisfies

$$\dim W + \dim(\operatorname{orth} W) = \dim V, \qquad \operatorname{orth}(\operatorname{orth} W) = W$$


The vector subspace W is said to be isotropic if
$W \subset \operatorname{orth} W$, coisotropic if $\operatorname{orth} W \subset W$, and
Lagrangian if $W = \operatorname{orth} W$. In any symplectic vector
space, there are many Lagrangian subspaces; therefore,
the dimension of a symplectic vector space is
always even; if $\dim V = 2n$, the dimension of an
isotropic (resp. coisotropic, resp. Lagrangian) vector
subspace is $\le n$ (resp. $\ge n$, resp. $= n$).
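
These notions are easy to experiment with numerically. The following sketch computes the symplectic orthogonal of a few subspaces of R^4 with exact arithmetic; the matrix `OMEGA` (the standard form in coordinates q1, q2, p1, p2), the helper names, and the sample subspaces are assumptions chosen for the illustration.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over fractions; returns (matrix, pivot columns)."""
    a = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    ncols = len(a[0]) if a else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if pivot is None:
            continue
        a[r], a[pivot] = a[pivot], a[r]
        a[r] = [v / a[r][c] for v in a[r]]
        for i in range(len(a)):
            if i != r and a[i][c] != 0:
                factor = a[i][c]
                a[i] = [vi - factor * vr for vi, vr in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
    return a, pivots

def rank(rows):
    return len(rref(rows)[1])

def null_space(rows, ncols):
    """Basis of the vectors v with row . v = 0 for every row."""
    a, pivots = rref(rows)
    basis = []
    for f in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[f] = Fraction(1)
        for r, p in enumerate(pivots):
            v[p] = -a[r][f]
        basis.append(v)
    return basis

OMEGA = [[0, 0, -1, 0],
         [0, 0, 0, -1],
         [1, 0, 0, 0],
         [0, 1, 0, 0]]

def orth(basis):
    """Basis of orth W, where W is spanned by `basis`."""
    n = len(OMEGA)
    # omega(v, w) = sum_ij v_i OMEGA[i][j] w_j: one linear equation on v per w.
    rows = [[sum(OMEGA[i][j] * w[j] for j in range(n)) for i in range(n)]
            for w in basis]
    return null_space(rows, n)

def contains(big, small):
    """True if span(small) is contained in span(big)."""
    return rank(big + small) == rank(big)

W1 = [[1, 0, 0, 0]]                                  # a line
W2 = [[1, 0, 0, 0], [0, 1, 0, 0]]                    # the plane of the q's
W3 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]      # a hyperplane

for W in (W1, W2, W3):
    O = orth(W)
    print(len(W), len(O), contains(O, W), contains(W, O))
```

Each printed row shows dim W, dim(orth W), whether W is isotropic, and whether W is coisotropic; the two dimensions always add up to dim V = 4.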


Coisotropic and Lagrangian submanifolds A submanifold
N of a Poisson manifold $(P, \Pi)$ is said to be
coisotropic if the bracket of two smooth functions,
defined on an open subset of P and which vanish on
N, vanishes on N too. A submanifold N of a
symplectic manifold $(M, \omega)$ is coisotropic if and only
if for each point $x \in N$, the vector subspace $T_xN$ of
the symplectic vector space $(T_xM, \omega(x))$ is coisotropic.
Therefore, the dimension of a coisotropic
submanifold in a 2n-dimensional symplectic manifold
is $\ge n$; when it is equal to n, the submanifold N
is said to be Lagrangian.


Poisson quotients Let $\varphi: M \to P$ be a surjective
submersion of a symplectic manifold $(M, \omega)$ onto a
manifold P. The manifold P has a Poisson structure
$\Pi$ for which $\varphi$ is a Poisson map if and only if
$\operatorname{orth}(\ker T\varphi)$ is integrable. When that condition is
satisfied, that Poisson structure on P is unique.


Poisson Lie groups A Poisson Lie group is a Lie
group G with a Poisson structure $\Pi$, such that the
product $(x, y) \mapsto xy$ is a Poisson map from $G \times G$,
endowed with the product Poisson structure, into
$(G, \Pi)$. The Poisson structure of a Poisson Lie group
$(G, \Pi)$ always vanishes at the unit element e of G.
Therefore, the Poisson structure of a Poisson Lie
group never comes from a symplectic structure on
that group.


Definition 3 A symplectic groupoid (resp. a Poisson
groupoid) is a Lie groupoid $\Gamma \rightrightarrows \Gamma_0$ with a
symplectic form $\omega$ on $\Gamma$ (resp. with a Poisson
structure $\Pi$ on $\Gamma$) such that the graph of the
composition law m


$$\{(x, y, z) \in \Gamma \times \Gamma \times \Gamma;\ (x, y) \in \Gamma_2 \text{ and } z = m(x, y)\}$$


is a Lagrangian submanifold (resp. a coisotropic
submanifold) of $\Gamma \times \Gamma \times \overline{\Gamma}$ with the product
symplectic form (resp. the product Poisson structure),
the first two factors $\Gamma$ being endowed with the
symplectic form $\omega$ (resp. with the Poisson structure $\Pi$),
and the third factor $\overline{\Gamma}$ being $\Gamma$ with the symplectic form
$-\omega$ (resp. with the Poisson structure $-\Pi$).


The next theorem states important properties of 
symplectic and Poisson groupoids. 


Theorem 4 Let $\Gamma \rightrightarrows \Gamma_0$ be a symplectic groupoid
with symplectic 2-form $\omega$ (resp. a Poisson groupoid
with Poisson structure $\Pi$). We have the following
properties.


(i) For a symplectic groupoid, given any point
$c \in \Gamma$, each one of the two vector subspaces of
the symplectic vector space $(T_c\Gamma, \omega(c))$,


$$T_c(\beta^{-1}(\beta(c))) \quad \text{and} \quad T_c(\alpha^{-1}(\alpha(c)))$$


is the symplectic orthogonal of the other one.
For a symplectic or Poisson groupoid, if f is
a smooth function whose restriction to each
$\alpha$-fiber is constant, and g a smooth function
whose restriction to each $\beta$-fiber is constant,
then the Poisson bracket $\{f, g\}$ vanishes
identically.
(ii) The submanifold of units $\Gamma_0$ is a Lagrangian
submanifold of the symplectic manifold $(\Gamma, \omega)$
(resp. a coisotropic submanifold of the Poisson
manifold $(\Gamma, \Pi)$).
(iii) The inverse map $\iota: \Gamma \to \Gamma$ is an antisymplectomorphism
of $(\Gamma, \omega)$, that is, it satisfies $\iota^*\omega = -\omega$
(resp. an anti-Poisson diffeomorphism of $(\Gamma, \Pi)$,
i.e., it satisfies $\iota_*\Pi = -\Pi$).


Corollary 5 Let $\Gamma \rightrightarrows \Gamma_0$ be a symplectic groupoid
with symplectic 2-form $\omega$ (resp. a Poisson groupoid
with Poisson structure $\Pi$). There exists on $\Gamma_0$ a
unique Poisson structure $\Pi_0$ for which $\alpha: \Gamma \to \Gamma_0$
is a Poisson map, and $\beta: \Gamma \to \Gamma_0$ an anti-Poisson
map (i.e., $\beta$ is a Poisson map when $\Gamma_0$ is equipped
with the Poisson structure $-\Pi_0$).


Examples of Symplectic and Poisson Groupoids 


The cotangent bundle of a Lie groupoid Let $\Gamma \rightrightarrows \Gamma_0$
be a Lie groupoid.

We have seen above that its tangent bundle $T\Gamma$
has a Lie groupoid structure, determined by that of
$\Gamma$. Similarly (but much less obviously), the cotangent
bundle $T^*\Gamma$ has a Lie groupoid structure
determined by that of $\Gamma$. The set of units is the
conormal bundle to the submanifold $\Gamma_0$ of $\Gamma$,
denoted by $\mathcal{N}^*\Gamma_0$. We recall that $\mathcal{N}^*\Gamma_0$ is the vector
sub-bundle of $T^*_{\Gamma_0}\Gamma$ (the restriction to $\Gamma_0$ of the
cotangent bundle $T^*\Gamma$), whose fiber $\mathcal{N}^*_p\Gamma_0$ at a
point $p \in \Gamma_0$ is


$$\mathcal{N}^*_p\Gamma_0 = \{\eta \in T^*_p\Gamma;\ \langle \eta, v \rangle = 0 \text{ for all } v \in T_p\Gamma_0\}$$


To define the target and source maps of the
Lie groupoid $T^*\Gamma$, we introduce the notion of
“bisection” through a point $x \in \Gamma$. A bisection
through x is a submanifold A of $\Gamma$, with $x \in A$,
transverse both to the $\alpha$-fibers and to the $\beta$-fibers,
such that the maps $\alpha$ and $\beta$, when restricted to A,
are diffeomorphisms of A onto open subsets $\alpha(A)$
and $\beta(A)$ of $\Gamma_0$, respectively. For any point $x \in \Gamma$,
there exist bisections through x. A bisection A
allows us to define two smooth diffeomorphisms
between open subsets of $\Gamma$, denoted by $L_A$ and $R_A$
and called the left and right translations by A,
respectively. They are defined by


$$L_A: \alpha^{-1}(\beta(A)) \to \alpha^{-1}(\alpha(A)), \qquad L_A(y) = m\bigl(\beta|_A^{-1} \circ \alpha(y),\ y\bigr)$$

and

$$R_A: \beta^{-1}(\alpha(A)) \to \beta^{-1}(\beta(A)), \qquad R_A(y) = m\bigl(y,\ \alpha|_A^{-1} \circ \beta(y)\bigr)$$


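The definitions of the target and source maps for
$T^*\Gamma$ rest on the following properties. Let x be a
point in $\Gamma$ and A be a bisection through x. The two
vector subspaces, $T_{\alpha(x)}\Gamma_0$ and $\ker T_{\alpha(x)}\beta$, are
complementary in $T_{\alpha(x)}\Gamma$. For any $v \in T_{\alpha(x)}\Gamma$, $v - T\beta(v)$
is in $\ker T_{\alpha(x)}\beta$. Moreover, $R_A$ maps the fiber
$\beta^{-1}(\alpha(x))$ into the fiber $\beta^{-1}(\beta(x))$, and its restriction
to that fiber does not depend on the choice of A; it
depends only on x. Therefore, $TR_A(v - T\beta(v))$ is in
$\ker T_x\beta$ and does not depend on the choice of A. We
can define the map $\hat\alpha$ by setting, for any $\xi \in T^*_x\Gamma$ and
any $v \in T_{\alpha(x)}\Gamma$,


$$\langle \hat\alpha(\xi), v \rangle = \langle \xi, TR_A(v - T\beta(v)) \rangle$$


Similarly, we define $\hat\beta$ by setting, for any $\xi \in T^*_x\Gamma$
and any $w \in T_{\beta(x)}\Gamma$,


$$\langle \hat\beta(\xi), w \rangle = \langle \xi, TL_A(w - T\alpha(w)) \rangle$$


We see that $\hat\alpha$ and $\hat\beta$ are unambiguously defined,
smooth, and take their values in the submanifold
$\mathcal{N}^*\Gamma_0$ of $T^*\Gamma$. They satisfy


$$\pi_\Gamma \circ \hat\alpha = \alpha \circ \pi_\Gamma, \qquad \pi_\Gamma \circ \hat\beta = \beta \circ \pi_\Gamma$$


where $\pi_\Gamma: T^*\Gamma \to \Gamma$ is the cotangent bundle
projection.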

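Let us now define the composition law $\hat m$ on $T^*\Gamma$.
Let $\xi \in T^*_x\Gamma$ and $\eta \in T^*_y\Gamma$ be such that $\hat\beta(\xi) = \hat\alpha(\eta)$.
This implies $\beta(x) = \alpha(y)$. Let A be a bisection
through x and B a bisection through y. There exist
a unique $\xi_{h\alpha} \in T^*_{\alpha(x)}\Gamma_0$ and a unique $\eta_{h\beta} \in T^*_{\beta(y)}\Gamma_0$
such that


$$\xi = (L_A^{-1})^*(\hat\beta(\xi)) + \alpha_x^*\, \xi_{h\alpha}$$

$$\eta = (R_B^{-1})^*(\hat\beta(\xi)) + \beta_y^*\, \eta_{h\beta}$$


Then $\hat m(\xi, \eta)$ is given by


$$\hat m(\xi, \eta) = \alpha_{xy}^*\, \xi_{h\alpha} + \beta_{xy}^*\, \eta_{h\beta} + (R_B^{-1})^* (L_A^{-1})^* (\hat\beta(\xi))$$


We observe that in the last term of the above expression
we can replace $\hat\beta(\xi)$ by $\hat\alpha(\eta)$, since these two expressions
are equal, and that $(R_B^{-1})^* (L_A^{-1})^* = (L_A^{-1})^* (R_B^{-1})^*$, since
$R_B$ and $L_A$ commute.

Finally, the inverse $\hat\iota$ in $T^*\Gamma$ is $\iota^*$.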

With its canonical symplectic form, $T^*\Gamma \rightrightarrows \mathcal{N}^*\Gamma_0$ is
a symplectic groupoid. When the Lie groupoid $\Gamma$ is a
Lie group G, the Lie groupoid $T^*G$ is not a Lie
group, contrary to what happens for TG. This shows
that the introduction of Lie groupoids is not at all
artificial: when dealing with Lie groups, Lie groupoids
are already with us! The set of units of the
Lie groupoid $T^*G$ can be identified with $\mathcal{G}^*$ (the
dual of the Lie algebra $\mathcal{G}$ of G), identified itself with
$T^*_eG$ (the cotangent space to G at the unit element e).
The target map $\hat\alpha: T^*G \to T^*_eG$ (resp. the source
map $\hat\beta: T^*G \to T^*_eG$) associates to each $g \in G$
and $\xi \in T^*_gG$, the value at the unit element e of the
right-invariant 1-form (resp. the left-invariant
1-form) whose value at g is $\xi$.


Poisson Lie groups as Poisson groupoids Poisson 
groupoids were introduced by Alan Weinstein as a 
generalization of both symplectic groupoids and Poisson 
Lie groups. Indeed, a Poisson Lie group is a Poisson 
groupoid with a set of units reduced to a single element. 


Lie Algebroids 


The notion of a Lie algebroid, due to Jean Pradines, is 
related to that of a Lie groupoid in the same way as the 
notion of a Lie algebra is related to that of a Lie group. 


Definition 6 A Lie algebroid over a smooth
manifold M is a smooth vector bundle $\pi: A \to M$
with base M, equipped with


(i) a composition law $(s_1, s_2) \mapsto \{s_1, s_2\}$ on the space
$\Gamma^\infty(\pi)$ of smooth sections of $\pi$, called the bracket,
for which that space is a Lie algebra; and

(ii) a vector bundle map $\rho: A \to TM$, over the identity
map of M, called the anchor map, such that, for all
$s_1$ and $s_2 \in \Gamma^\infty(\pi)$ and all $f \in C^\infty(M, \mathbb{R})$,


$$\{s_1, f s_2\} = f \{s_1, s_2\} + ((\rho \circ s_1) \cdot f)\, s_2 \qquad [17]$$
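
The Leibniz-type rule [17] can be tested on the simplest Lie algebroid, the tangent bundle of the real line with the identity as anchor. In the sketch below, a section a(x) d/dx is stored as the list of coefficients of the polynomial a; this representation, the function names, and the sample polynomials are assumptions made for the example.

```python
from itertools import zip_longest

def trim(p):
    """Drop trailing zero coefficients (the zero polynomial is [])."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    return trim(a + b for a, b in zip_longest(p, q, fillvalue=0))

def sub(p, q):
    return trim(a - b for a, b in zip_longest(p, q, fillvalue=0))

def mul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def deriv(p):
    return trim(i * c for i, c in enumerate(p) if i > 0)

def bracket(a1, a2):
    """{s1, s2} for s_k = a_k(x) d/dx: the field (a1 a2' - a1' a2) d/dx."""
    return sub(mul(a1, deriv(a2)), mul(deriv(a1), a2))

def anchor_action(a, f):
    """(rho o s) . f with rho the identity map: the function a f'."""
    return mul(a, deriv(f))

a1 = [1, 0, 2]     # 1 + 2 x^2
a2 = [0, 3]        # 3 x
f = [5, 1]         # 5 + x

lhs = bracket(a1, mul(f, a2))
rhs = add(mul(f, bracket(a1, a2)), mul(anchor_action(a1, f), a2))
print(lhs == rhs)
```

The same helpers can be used to check the Jacobi identity on three sample sections, which is the other half of the definition.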


Examples 


Lie algebras A finite-dimensional Lie algebra is a 
Lie algebroid (with a base reduced to a point and the 
zero map as anchor map). 


Tangent bundles and their integrable sub-bundles A
tangent bundle $\tau_M: TM \to M$ to a smooth manifold
M is a Lie algebroid, with the usual bracket of
vector fields on M as composition law, and the
identity map as anchor map. More generally, any
integrable vector sub-bundle F of a tangent bundle
$\tau_M: TM \to M$ is a Lie algebroid, still with the
bracket of vector fields on M with values in F as
composition law and the canonical injection of F
into TM as anchor map.


The cotangent bundle of a Poisson manifold Let
$(P, \Pi)$ be a Poisson manifold. Its cotangent bundle
$\pi_P: T^*P \to P$ has a Lie algebroid structure, with
$\Pi^\sharp: T^*P \to TP$ as anchor map. The composition law
is the bracket of 1-forms. It will be denoted by
$(\eta, \zeta) \mapsto [\eta, \zeta]$ (in order to avoid any confusion with
the Poisson bracket of functions). It is given by the
formula, in which $\eta$ and $\zeta$ are 1-forms and X a
vector field on P:


$$\langle [\eta, \zeta], X \rangle = \Pi(\eta, d\langle \zeta, X \rangle) + \Pi(d\langle \eta, X \rangle, \zeta) + (\mathcal{L}(X)\Pi)(\eta, \zeta) \qquad [18]$$


We have denoted by $\mathcal{L}(X)\Pi$ the Lie derivative of
the Poisson structure $\Pi$ with respect to the vector
field X. Another equivalent formula for that
composition law is


$$[\zeta, \eta] = \mathcal{L}(\Pi^\sharp \zeta)\eta - \mathcal{L}(\Pi^\sharp \eta)\zeta - d(\Pi(\zeta, \eta)) \qquad [19]$$


The bracket of 1-forms is related to the Poisson
bracket of functions by


$$[df, dg] = d\{f, g\} \quad \text{for all } f \text{ and } g \in C^\infty(P, \mathbb{R}) \qquad [20]$$
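
As a consistency check, relation [20] can be recovered from formula [19], read as $[\zeta, \eta] = \mathcal{L}(\Pi^\sharp\zeta)\eta - \mathcal{L}(\Pi^\sharp\eta)\zeta - d(\Pi(\zeta, \eta))$, by taking $\zeta = df$ and $\eta = dg$. The short derivation below is a sketch that uses only Cartan's formula $\mathcal{L}(X) = i(X)\,d + d\,i(X)$ together with $d \circ d = 0$, the identity $i(X_f)\,dg = \{f, g\}$ from [13], and the skew-symmetry of the Poisson bracket:

```latex
\begin{aligned}
[df, dg]
  &= \mathcal{L}(\Pi^\sharp df)\, dg - \mathcal{L}(\Pi^\sharp dg)\, df
     - d\bigl(\Pi(df, dg)\bigr) \\
  &= d\bigl(i(X_f)\, dg\bigr) - d\bigl(i(X_g)\, df\bigr) - d\{f, g\} \\
  &= d\{f, g\} - d\{g, f\} - d\{f, g\} \\
  &= d\{f, g\} .
\end{aligned}
```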


Properties of Lie Algebroids 


Let $\pi: A \to M$ be a Lie algebroid with anchor map
$\rho: A \to TM$.


A Lie algebra homomorphism For any pair $(s_1, s_2)$
of smooth sections of $\pi$,


$$\rho \circ \{s_1, s_2\} = [\rho \circ s_1, \rho \circ s_2]$$


which means that the map $s \mapsto \rho \circ s$ is a Lie algebra
homomorphism from the Lie algebra of smooth
sections of $\pi$ into the Lie algebra of smooth vector
fields on M.


The generalized Schouten bracket The composition
law $(s_1, s_2) \mapsto \{s_1, s_2\}$ on the space of sections of
$\pi$ extends into a composition law on the space of
sections of exterior powers of $(A, \pi, M)$, which is
called the “generalized Schouten bracket.” Its
properties are the same as those of the usual
Schouten bracket. When the Lie algebroid is a
tangent bundle $\tau_M: TM \to M$, that composition law
reduces to the usual Schouten bracket. When the Lie
algebroid is the cotangent bundle $\pi_P: T^*P \to P$ to a
Poisson manifold $(P, \Pi)$, the generalized Schouten
bracket is the bracket of forms of all degrees on the
Poisson manifold P, introduced by J-L Koszul,
which extends the bracket of 1-forms used earlier.


The dual bundle of a Lie algebroid Let $\varpi: A^* \to M$
be the dual bundle of the Lie algebroid $\pi: A \to M$.
There exists on the space of sections of its exterior
powers a graded endomorphism $d_\rho$, of degree 1 (that
means that if $\eta$ is a section of $\bigwedge^k A^*$, $d_\rho(\eta)$ is a section
of $\bigwedge^{k+1} A^*$). That endomorphism satisfies


$$d_\rho \circ d_\rho = 0$$


and its properties are essentially the same as those of
the exterior derivative of differential forms. When
the Lie algebroid is a tangent bundle $\tau_M: TM \to M$,
$d_\rho$ is the usual exterior derivative of differential
forms.

On the spaces of sections of the exterior powers of 
a Lie algebroid and of its dual bundle we can 
develop a differential calculus very similar to the 
usual differential calculus of vector and multivector
fields and differential forms on a manifold. Operators
such as the interior product, the exterior
derivative, and the Lie derivative can still be defined 
and have properties similar to those of the corre- 
sponding operators for vector and multivector fields 
and differential forms on a manifold. 

The total space $A^*$ of the dual bundle of a Lie
algebroid $\pi: A \to M$ has a natural Poisson structure:
a smooth section s of $\pi$ can be considered as a
smooth real-valued function on $A^*$ whose restriction
to each fiber $\varpi^{-1}(x)$ ($x \in M$) is linear; this property
allows us to extend the bracket of sections of $\pi$
(defined by the Lie algebroid structure) to obtain a
Poisson bracket of functions on $A^*$. When the Lie
algebroid A is a finite-dimensional Lie algebra $\mathcal{G}$, the
Poisson structure on its dual space $\mathcal{G}^*$ is the KKS
Poisson structure discussed earlier.


The Lie Algebroid of a Lie Groupoid 


Let $\Gamma \rightrightarrows \Gamma_0$ be a Lie groupoid. Let $A(\Gamma)$ be the
intersection of $\ker T\alpha$ and $T_{\Gamma_0}\Gamma$ (the tangent bundle
$T\Gamma$ restricted to the submanifold $\Gamma_0$). We see that $A(\Gamma)$
is the total space of a vector bundle $\pi: A(\Gamma) \to \Gamma_0$,
with base $\Gamma_0$, the canonical projection $\pi$ being the map
which associates a point $u \in \Gamma_0$ to every vector in
$\ker T_u\alpha$. In this section, we define a composition law
on the set of smooth sections of that bundle, and a
vector bundle map $\rho: A(\Gamma) \to T\Gamma_0$, for which
$\pi: A(\Gamma) \to \Gamma_0$ is a Lie algebroid, called the Lie
algebroid of the Lie groupoid $\Gamma \rightrightarrows \Gamma_0$.

We observe first that for any point $u \in \Gamma_0$ and any
point $x \in \beta^{-1}(u)$, the map $L_x: y \mapsto L_x y = m(x, y)$ is
defined on the $\alpha$-fiber $\alpha^{-1}(u)$, and maps that fiber
into the $\alpha$-fiber $\alpha^{-1}(\alpha(x))$. Therefore, $T_uL_x$ maps the
vector space $A_u = \ker T_u\alpha$ onto the vector space
$\ker T_x\alpha$, tangent at x to the $\alpha$-fiber $\alpha^{-1}(\alpha(x))$. Any
vector $w \in A_u$ can therefore be extended into the
vector field along $\beta^{-1}(u)$, $x \mapsto \hat w(x) = T_uL_x(w)$. More
generally, let $w: U \to A(\Gamma)$ be a smooth section of
the vector bundle $\pi: A(\Gamma) \to \Gamma_0$, defined on an open
subset U of $\Gamma_0$. By using the above-described
construction for every point $u \in U$, we can extend
the section w into a smooth vector field $\hat w$, defined
on the open subset $\beta^{-1}(U)$ of $\Gamma$, by setting, for all
$u \in U$ and $x \in \beta^{-1}(u)$:


$$\hat w(x) = T_uL_x(w(u))$$


We have defined an injective map $w \mapsto \hat w$ from the
space of smooth local sections of $\pi: A(\Gamma) \to \Gamma_0$, into
a subspace of the space of smooth vector fields
defined on open subsets of $\Gamma$. The image of that map
is the space of smooth vector fields $\hat w$, defined on
open subsets $\hat U$ of $\Gamma$ of the form $\hat U = \beta^{-1}(U)$, where
U is an open subset of $\Gamma_0$, which satisfy the two
properties:


1. $T\alpha \circ \hat w = 0$,
2. for every x and $y \in \hat U$ such that $\beta(x) = \alpha(y)$,
$T_yL_x(\hat w(y)) = \hat w(xy)$.


These vector fields are called “left-invariant vector
fields” on $\Gamma$.

The space of left-invariant vector fields on $\Gamma$ is
closed under the bracket operation. We can therefore
define a composition law $(w_1, w_2) \mapsto \{w_1, w_2\}$ on the
space of smooth sections of the bundle $\pi: A(\Gamma) \to \Gamma_0$
by defining $\{w_1, w_2\}$ as the unique section such that


$$\widehat{\{w_1, w_2\}} = [\hat w_1, \hat w_2]$$


Finally, we define the anchor map $\rho$ as the map $T\beta$
restricted to $A(\Gamma)$. With that composition law and
that anchor map, the vector bundle $\pi: A(\Gamma) \to \Gamma_0$ is
a Lie algebroid, called the Lie algebroid of the
Lie groupoid $\Gamma \rightrightarrows \Gamma_0$.

We could exchange the roles of $\alpha$ and $\beta$ and use
right-invariant vector fields instead of left-invariant
vector fields. The Lie algebroid obtained remains the
same, up to an isomorphism.

When the Lie groupoid $\Gamma \rightrightarrows \Gamma_0$ is a Lie group, its Lie
algebroid is simply its Lie algebra.


The Lie Algebroid of a Symplectic Groupoid 


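Let $\Gamma \rightrightarrows \Gamma_0$ be a symplectic groupoid, with symplectic
form $\omega$. As we have seen above, its Lie algebroid
$\pi: A \to \Gamma_0$ is the vector bundle whose fiber, over
each point $u \in \Gamma_0$, is $\ker T_u\alpha$. We define a linear
map $\omega^\flat_u: \ker T_u\alpha \to T^*_u\Gamma_0$ by setting, for each
$w \in \ker T_u\alpha$ and $v \in T_u\Gamma_0$,


$$\langle \omega^\flat_u(w), v \rangle = \omega_u(v, w)$$


Since $T_u\Gamma_0$ is Lagrangian and $\ker T_u\alpha$ complementary
to $T_u\Gamma_0$ in the symplectic vector space
$(T_u\Gamma, \omega(u))$, the map $\omega^\flat_u$ is an isomorphism from
$\ker T_u\alpha$ onto $T^*_u\Gamma_0$. By using that isomorphism for
each $u \in \Gamma_0$, we obtain a vector bundle isomorphism
of the Lie algebroid $\pi: A \to \Gamma_0$ onto the cotangent
bundle $\pi_{\Gamma_0}: T^*\Gamma_0 \to \Gamma_0$.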

As seen in Corollary 5, the submanifold of units $\Gamma_0$
has a unique Poisson structure $\Pi$ for which $\alpha: \Gamma \to \Gamma_0$
is a Poisson map. Therefore, the cotangent bundle
$\pi_{\Gamma_0}: T^*\Gamma_0 \to \Gamma_0$ to the Poisson manifold $(\Gamma_0, \Pi)$ has a
Lie algebroid structure, with the bracket of 1-forms as
composition law. That structure is the same as the
structure obtained as a direct image of the Lie
algebroid structure of $\pi: A(\Gamma) \to \Gamma_0$, by the above-defined
vector bundle isomorphism of $\pi: A \to \Gamma_0$
onto the cotangent bundle $\pi_{\Gamma_0}: T^*\Gamma_0 \to \Gamma_0$. The Lie
algebroid of the symplectic groupoid $\Gamma \rightrightarrows \Gamma_0$ can
therefore be identified with the Lie algebroid
$\pi_{\Gamma_0}: T^*\Gamma_0 \to \Gamma_0$, with its Lie algebroid structure of
cotangent bundle to the Poisson manifold $(\Gamma_0, \Pi)$.


The Lie Algebroid of a Poisson Groupoid 


The Lie algebroid $\pi: A(\Gamma) \to \Gamma_0$ of a Poisson groupoid
has an additional structure: its dual bundle
$\varpi: A(\Gamma)^* \to \Gamma_0$ also has a Lie algebroid structure,
compatible in a certain sense (indicated below) with
that of $\pi: A(\Gamma) \to \Gamma_0$.

The compatibility condition between the two Lie
algebroid structures on the two vector bundles in
duality $\pi: A \to M$ and $\varpi: A^* \to M$ can be written as
follows:


$$d_*[X, Y] = \mathcal{L}(X)\, d_*Y - \mathcal{L}(Y)\, d_*X \qquad [21]$$


where X and Y are two sections of $\pi$, or, using the
generalized Schouten bracket of sections of exterior
powers of the Lie algebroid $\pi: A \to M$,


$$d_*[X, Y] = [d_*X, Y] + [X, d_*Y] \qquad [22]$$


In these formulas $d_*$ is the generalized exterior
derivative, which acts on the space of sections of
exterior powers of the bundle $\pi: A \to M$, considered
as the dual bundle of the Lie algebroid $\varpi: A^* \to M$.

These conditions are equivalent to the similar
conditions obtained by exchange of the roles of A
and $A^*$.

When the Poisson groupoid $\Gamma \rightrightarrows \Gamma_0$ is a symplectic
groupoid, we have seen that its Lie algebroid is
the cotangent bundle $\pi_{\Gamma_0}: T^*\Gamma_0 \to \Gamma_0$ to the Poisson
manifold $\Gamma_0$ (equipped with the Poisson structure for
which $\alpha$ is a Poisson map). The dual bundle is the
tangent bundle $\tau_{\Gamma_0}: T\Gamma_0 \to \Gamma_0$, with its natural Lie
algebroid structure defined earlier.

When the Poisson groupoid is a Poisson Lie group
$(G, \Pi)$, its Lie algebroid is its Lie algebra $\mathcal{G}$. Its dual
space $\mathcal{G}^*$ has a Lie algebra structure, compatible with
that of $\mathcal{G}$ in the above-defined sense, and the pair
$(\mathcal{G}, \mathcal{G}^*)$ is called a Lie bialgebra.

Conversely, if the Lie algebroid of a Lie groupoid 
is a Lie bialgebroid (i.e., if there exists on the dual 
vector bundle of that Lie algebroid a compatible 
structure of Lie algebroid, in the above-defined 
sense), that Lie groupoid has a Poisson structure 
for which it is a Poisson groupoid. 


Integration of Lie Algebroids 


According to Lie’s third theorem, for any given
finite-dimensional Lie algebra, there exists a Lie
group whose Lie algebra is isomorphic to that
Lie algebra. The same property is not true for Lie
algebroids and Lie groupoids. The problem of
finding necessary and sufficient conditions under
which a given Lie algebroid is isomorphic to the Lie
algebroid of a Lie groupoid remained open for more
than 30 years, although partial results were
obtained. A complete solution of that problem was
recently obtained by M Crainic and R L Fernandes.
Let us briefly sketch their results.

Let $\pi: A \to M$ be a Lie algebroid and $\rho: A \to TM$ its
anchor map. A smooth path $a: I = [0, 1] \to A$ is said to
be admissible if, for all $t \in I$, $\rho \circ a(t) = (d/dt)(\pi \circ a)(t)$.
When the Lie algebroid A is the Lie algebroid of a Lie
groupoid $\Gamma$, it can be shown that each admissible path
in A is, in a natural way, associated to a smooth path in
$\Gamma$ starting from a unit and contained in an $\alpha$-fiber.
When we do not know whether A is the Lie algebroid
of a Lie groupoid or not, the space of admissible paths
in A still can be used to define a topological groupoid
G(A) with connected and simply connected $\alpha$-fibers,
called the Weinstein groupoid of A. When G(A) is a Lie
groupoid, its Lie algebroid is isomorphic to A, and
when A is the Lie algebroid of a Lie groupoid $\Gamma$, G(A) is
a Lie groupoid and is the unique (up to an isomorphism)
Lie groupoid with connected and simply connected
$\alpha$-fibers with A as Lie algebroid; moreover, G(A)
is a covering groupoid of an open sub-groupoid of $\Gamma$.
Crainic and Fernandes have obtained computable
necessary and sufficient conditions under which the
topological groupoid G(A) is a Lie groupoid, that is,
necessary and sufficient conditions under which A is
the Lie algebroid of a Lie groupoid.
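
The admissibility condition itself is simple to test numerically in the two basic examples of Lie algebroids given earlier. The sketch below is only an illustration of the condition $\rho \circ a(t) = (d/dt)(\pi \circ a)(t)$, not of the Crainic and Fernandes construction; it approximates the derivative by central differences, and all function names, tolerances, and sample paths are assumptions.

```python
import math

def is_admissible(base, fiber, anchor, samples=50, h=1e-6, tol=1e-6):
    """Check rho(a(t)) == d/dt (pi o a)(t) at sample times in (0, 1).

    base(t)       -- the projected path pi o a, as a tuple of coordinates
    fiber(t)      -- the fiber component of a(t)
    anchor(x, xi) -- the anchor map applied to xi in the fiber over x
    """
    for k in range(1, samples):
        t = k / samples
        velocity = [(p - m) / (2 * h)
                    for p, m in zip(base(t + h), base(t - h))]
        image = anchor(base(t), fiber(t))
        if any(abs(u - w) > tol for u, w in zip(velocity, image)):
            return False
    return True

# Tangent bundle of R^2: the anchor is the identity map.
identity = lambda x, xi: xi
circle = lambda t: (math.cos(t), math.sin(t))
circle_velocity = lambda t: (-math.sin(t), math.cos(t))
zero_fiber = lambda t: (0.0, 0.0)

# Lie algebra seen as a Lie algebroid over a point: the anchor is zero.
point = lambda t: ()                    # the base has dimension zero
any_path = lambda t: (t, t * t, 1.0)    # arbitrary path in the Lie algebra
zero_anchor = lambda x, xi: ()

print(is_admissible(circle, circle_velocity, identity))
print(is_admissible(circle, zero_fiber, identity))
print(is_admissible(point, any_path, zero_anchor))
```

For a tangent bundle the admissible paths are exactly the velocity lifts of paths in the base, while for a Lie algebra every path is admissible.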


See also: Classical r-Matrices, Lie Bialgebras, and
Poisson Lie Groups; Lie Superalgebras and Their


Liquid crystals represent an important state of matter,
intermediate between regular solids with long-range 
positional order of atoms or molecules (often accom- 
panied by the orientational order, as in the case of 
molecular crystals) and isotropic fluids with neither 
positional nor orientational long-range order. The 
basic feature of liquid crystals is orientational order of 
building units, which might be individual molecules or 
their aggregates, and complete or partial absence of the 
long-range positional order. Molecular interactions
responsible for orientational order in liquid crystals are
relatively weak (most liquid crystals melt into the
isotropic phase at around 100-150°C). As a result,
the structural organization of liquid crystals, most 
importantly, the direction of molecular orientation, 
is very sensitive to the external factors, such as 
electromagnetic field and boundary conditions. This 
sensitivity opened the doors for applications of 
liquid crystals, including in information displays 
and flat-panel TVs. 

Liquid crystals, discovered more than 100 years 
ago, represent nowadays one of the best studied 
classes of soft matter, along with colloids, polymer 
solutions and melts, gels and foams. There is 
an extensive literature on physical phenomena in 
liquid crystals, their chemical structure and material 
parameters, display applications, etc. 


Thermotropic and Lyotropic Systems 


Depending on the way the liquid crystalline state 
(also known as “mesophase”) is produced, one 
distinguishes thermotropic and lyotropic liquid 
crystals. Thermotropic liquid crystalline state can 
exist in a certain temperature range for the materials 
made of strongly anisometric molecules, either 
elongated (calamitic molecules) or disk-like (discotic 
molecules). Upon heating, many substances of this
type yield the following phase sequence: solid
crystal → liquid crystal → isotropic fluid.

Lyotropic liquid crystals form only in the presence of 
a solvent, such as water or oil. Most commonly, 
lyotropic mesophases are formed by solutions of 
anisometric amphiphilic molecules (such as soaps, 
phospholipids, and surfactants). Amphiphilic molecules 
have two distinct parts: a (polar) hydrophilic head and a 
(nonpolar) hydrophobic tail (generally, an aliphatic 
chain). This feature gives rise to a special “self- 
organization” of amphiphilic molecules in solvents. 
Mesomorphic states also might be formed in the 
solutions of certain polymers; polymers might also 
form thermotropic (solvent-free) liquid crystals. 

There are four basic types of liquid crystalline phases, 
classified according to the dimensionality of the trans- 
lational correlations of building units: nematic (no 
translational correlations), smectic (1D correlations), 
columnar (2D correlations), and various 3D-correlated 
structures, such as cubic phases and blue phases. 

“Uniaxial nematic,” denoted UN, is an optically
uniaxial fluid phase. The unit vector along the optic
axis is called the director $\mathbf{n}$, $\mathbf{n}^2 = 1$; it indicates the
average orientation of the molecular axes (see
Figure 1). Even when the molecules are polar,
head-to-head overlapping and flip-flops establish
centrosymmetric arrangement in the nematic bulk.
Thus, $\mathbf{n}$ and $-\mathbf{n}$ are equivalent notations. It is
important to realize that $\mathbf{n}$ specifies only the
direction of orientation but not the degree of
orientational order. In biaxial nematics (BN), the
symmetry point group is that of a prism. A BN
phase is characterized by three directors, $\mathbf{n}$, $\mathbf{l}$, and
$\mathbf{m} = \mathbf{n} \times \mathbf{l}$, such that $\mathbf{n} \equiv -\mathbf{n}$, $\mathbf{l} \equiv -\mathbf{l}$, and $\mathbf{m} \equiv -\mathbf{m}$.

Figure 1 (a) Nematic (uniaxial) type of ordering in thermotropic
liquid crystals; the molecular long axes are on average aligned
along the director $\mathbf{n}$; (b) a molecule of octylcyanobiphenyl, a
typical thermotropic liquid crystalline material capable of both
nematic and SmA types of ordering.

When the building unit (molecule or aggregate) is 
chiral, that is, not equal to its mirror image, UN 
might show a helicoidal structure. It is then called a 
cholesteric phase denoted Ch or N*. Note that UN, 
BN, and N* phases are liquid phases (no long-range 
correlations in molecular positions). 

“Smectics” are layered phases with a quasi-long-range
1D translational order of centers of molecules
in a direction normal to the layers (see Figure 2).
This positional order is not exactly the long-range
order as in regular 3D crystals: as shown by Landau
and Peierls, the fluctuative displacements of layers in
a 1D lattice diverge logarithmically with the size of
the sample. However, for regular materials with
smectic period of the order of 1 nm, the effect is
noticeable only on scales of 1 mm and larger. In
smectic A (SmA), the molecules within the layers
show fluid-like arrangement, with no long-range
in-plane positional order; it is a uniaxial medium
with the optic axis m perpendicular to the layers (see
Figure 2). Some materials, such as octylcyanobiphenyl
(see Figure 1b), show both UN and SmA phases
(the latter at somewhat lower temperatures). In the lyotropic
version of SmA, the so-called lamellar $L_\alpha$ phase, the
amphiphilic molecules arrange into bilayers. If the
solvent is water, the exterior surfaces of the bilayer
are formed by polar heads; the hydrophobic tails are
hidden in the middle of the bilayer (note that
membranes of many biological cells are organized
in a similar way). The periodic structure of
alternating surfactant and water layers gives rise to
the $L_\alpha$ phase (see Figure 2). Interestingly, the
structure might retain its smectic ordering even
when strongly diluted, being stabilized by thermal
fluctuations of bilayers.

Figure 2 SmA type of ordering in the thermotropic SmA liquid
crystal (left) and the lyotropic analog, $L_\alpha$ phase (right), formed by
equidistant arrangement of amphiphilic bilayers in water.

Other types of smectics show in-plane order, 
caused, for example, by a collective tilt of the rod- 
like molecules with respect to the normals to the 
layers (the so-called SmC). In chiral materials, the 
tilt of the molecules might lead to the helicoidal 
structure; we do not consider them here, although 
the chiral SmC phase is of considerable interest for 
applications in fast-switching optical devices. 

“Columnar phases” are most frequently formed 
by hexagonal packing of cylindrical aggregates, as in 
the case of thermotropic materials formed by disc- 
like molecules. The positional order is 2D only, as 
the intermolecular distances along the axes of the 
aggregates are not regular. 

“3D-correlated structures” demonstrate a periodic 
structure along all three coordinates, but they are 
still different from the 3D crystals, as the periodicity 
is caused by the repetition of molecular orientations 
rather than by regular repetition of the molecular 
centers of mass. For example, in cubic lyotropic 
phases, the 3D network is formed by periodically 
curved layers of amphiphilic molecules; the mol- 
ecules are free to move within the layers. 


Order Parameter 


The concept of an order parameter (OP) has 
emerged in its modern form in the Landau model 
of phase transitions and has been later expanded to 
describe other features such as topologically stable 
defects in the ordered media. The OP of the liquid 
crystal can be related to the anisotropy of macro- 
scopic properties such as diamagnetic or dielectric 
susceptibility. Measuring these anisotropies allows 
one to determine the degree of orientational order. 
The magnetic measurements are especially conveni- 
ent compared with their electric counterparts, as in 
this case the local field acting on the molecules 
differs very little from the external field. In UN, the 
components of the (symmetric) magnetic suscepti- 
bility tensor y read in the frame in which the z-axis 
is parallel to the director n, as 


B xı 0 0 
x=) 0 xX 0 [1] 
0 0 XII 


The quantity $\chi_a = \chi_\parallel - \chi_\perp$ is called the anisotropy
of the magnetic susceptibility. In most thermotropic
UNs, $\chi_\parallel < 0$ and $\chi_\perp < 0$ (diamagnetism), and $\chi_a > 0$,
so that n orients along the applied magnetic field. In
the isotropic phase, $\chi_a = 0$; in UN, $\chi_a$ is determined by
(1) molecular susceptibilities of individual molecules
and (2) degree of molecular order. For the latter, one
can choose the temperature-dependent quantity
$s(T) = \frac{1}{2}\langle 3\cos^2\theta - 1\rangle$, where $\theta$ is the angle
between the axis of an individual molecule and the
director n and $\langle\ldots\rangle$ means an average over molecular
orientations. The OP is thus the traceless symmetric
tensor Q with the components that vanish in the
isotropic phase, and are proportional to $\chi_a$ in the UN
phase:


$$Q = Q_0 \begin{pmatrix} -\chi_a/3 & 0 & 0 \\ 0 & -\chi_a/3 & 0 \\ 0 & 0 & 2\chi_a/3 \end{pmatrix} \qquad [2]$$


One can choose the constant $Q_0$ in such a way
that in an arbitrary coordinate system, where


$$\chi_{ij} = \chi_\perp \delta_{ij} + \chi_a n_i n_j,$$

$$Q_{ij} = s(T)\left(n_i n_j - \tfrac{1}{3}\delta_{ij}\right) \qquad [3]$$


The tensor OP allows one to describe the biaxial
nematic phase as well:


$$Q_{ij} = s(T)\left(n_i n_j - \tfrac{1}{3}\delta_{ij}\right) + b(T)\left(l_i l_j - m_i m_j\right) \qquad [4]$$


where n, l, and m are three orthogonal directors and
b is the “biaxiality parameter”; b = 0 in UN.
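
The tensor forms [3] and [4] can be checked numerically. The following is a minimal sketch, not part of the original article: the function name `q_tensor` and the sample values s = 0.6 and b = 0.1 are illustrative assumptions. It builds Q from the directors and confirms that Q is traceless and unchanged under n → −n.

```python
import numpy as np

def q_tensor(n, s, l=None, m=None, b=0.0):
    """Tensor order parameter of eqs [3]-[4].

    n, l, m are assumed to be mutually orthogonal unit directors;
    s is the scalar order parameter and b the biaxiality parameter.
    """
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    q = s * (np.outer(n, n) - np.eye(3) / 3.0)
    if b != 0.0:
        l = np.asarray(l, dtype=float)
        m = np.asarray(m, dtype=float)
        q = q + b * (np.outer(l, l) - np.outer(m, m))
    return q

# Uniaxial nematic with the director along z (illustrative s = 0.6).
q_un = q_tensor([0, 0, 1], s=0.6)
# The same state described with the opposite director sign.
q_flip = q_tensor([0, 0, -1], s=0.6)
# Biaxial nematic with an illustrative biaxiality b = 0.1.
q_bi = q_tensor([0, 0, 1], s=0.6, l=[1, 0, 0], m=[0, 1, 0], b=0.1)

print(np.trace(q_un), np.trace(q_bi))
print(np.linalg.eigvalsh(q_un), np.linalg.eigvalsh(q_bi))
```

In the uniaxial case two eigenvalues coincide; a nonzero b splits them, which is how the tensor distinguishes the two phases.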


Elasticity of the Nematic Phase 


In real samples of liquid crystals, the average 
molecular orientation changes from point to point 
because of the external fields, boundary conditions, 
presence of foreign particles, etc. The OP becomes
spatially nonuniform, $Q_{ij}(r)$. In most problems of
practical interest, the typical scale of distortions is 
much larger than the molecular scale; the deforma- 
tions are weak in the sense that the scalar part of the 
OP, s(T), remains constant despite the spatial 
gradients of the director field n(r). 

Figure 3 Basic types of director distortions in the bulk of the
uniaxial nematic.

The free-energy density associated with the (small)
deformations of the UN, classified as splay, twist,
and bend of the director (see Figure 3), is written in
terms of the director gradients $n_{i,j} = \partial n_i/\partial x_j$ as


$$f_{FO} = \tfrac{1}{2}K_1(\operatorname{div} n)^2 + \tfrac{1}{2}K_2(n\cdot\operatorname{curl} n)^2 + \tfrac{1}{2}K_3(n\times\operatorname{curl} n)^2 \qquad [5]$$


and is known as the Frank–Oseen energy density with
Frank elastic constants of splay ($K_1$), twist ($K_2$), and
bend ($K_3$); all three are necessarily positive definite; the
dimensionality is that of a force. The elastic constants
can be estimated as the typical energy of molecular
interactions responsible for the orientational order
divided by the characteristic length (a molecular size):
$K \sim U/l \sim k_B T/l \sim 4\times 10^{-21}\,\mathrm{J}/10^{-9}\,\mathrm{m} \sim 4$ pN, which
yields a good estimate for many thermotropic UNs,
as the experimental values are between 1 and 10 pN.
The energy density [5] is often supplemented with the 
so-called divergence terms: 


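$$f_{13} + f_{24} = K_{13}\operatorname{div}(n\operatorname{div} n) - K_{24}\operatorname{div}(n\operatorname{div} n + n\times\operatorname{curl} n) \qquad [6]$$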


The $K_{24}$ term can be re-expressed as a quadratic
form of the first derivatives whereas the $K_{13}$ term is
proportional to the second derivatives $n_{i,jk}$ and thus
might in principle be comparable to $f_{FO} \sim n_{i,j} n_{k,l}$.
The volume integrals of these terms can be
re-expressed as surface integrals by virtue of
the Gauss theorem (but only when the elastic moduli
$K_{13}$ and $K_{24}$ are constant, which might not be the
case at certain interfaces and at the core of defects).
Therefore, when one seeks equilibrium director
configurations by minimizing the total free-energy
functional $\int (f_{FO} + f_{13} + f_{24})\,dV$, the $K_{13}$ and $K_{24}$
terms do not enter the Euler-Lagrange variational
derivative for the bulk. However, they can
contribute to the energy and influence the equilibrium
director through boundary conditions at the
surface. Usually, the $K_{24}$ term is retained when the
system experiences a topological change of the
director field. The $K_{13}$ term is often neglected;
very little is known about the value of $K_{13}$.
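
The splay, twist, and bend contributions to the Frank–Oseen density [5] can be separated numerically. The sketch below is an illustration under stated assumptions, not the article's method: it estimates the director gradients by central finite differences (the step `h` and the three test fields are hypothetical choices) and evaluates [5] for a pure splay, a pure twist, and a pure bend field.

```python
import numpy as np

def frank_oseen_density(n_field, r, k1, k2, k3, h=1e-5):
    """Evaluate eq [5] at point r for a unit director field n_field(r)."""
    r = np.asarray(r, dtype=float)
    n = n_field(r)
    grad = np.zeros((3, 3))  # grad[i, j] approximates d n_i / d x_j
    for j in range(3):
        dr = np.zeros(3)
        dr[j] = h
        grad[:, j] = (n_field(r + dr) - n_field(r - dr)) / (2 * h)
    div_n = np.trace(grad)
    curl_n = np.array([grad[2, 1] - grad[1, 2],
                       grad[0, 2] - grad[2, 0],
                       grad[1, 0] - grad[0, 1]])
    twist = np.dot(n, curl_n)
    bend = np.cross(n, curl_n)
    return 0.5 * k1 * div_n**2 + 0.5 * k2 * twist**2 + 0.5 * k3 * np.dot(bend, bend)

def splay_field(r):
    """Director pointing radially outward in the xy-plane (pure splay)."""
    rho = np.hypot(r[0], r[1])
    return np.array([r[0], r[1], 0.0]) / rho

def twist_field(r, q=1.5):
    """Director rotating uniformly about z with wavenumber q (pure twist)."""
    return np.array([np.cos(q * r[2]), np.sin(q * r[2]), 0.0])

def bend_field(r):
    """Director circulating around the z-axis (pure bend)."""
    rho = np.hypot(r[0], r[1])
    return np.array([-r[1], r[0], 0.0]) / rho

point = [2.0, 0.0, 0.3]
for field in (splay_field, twist_field, bend_field):
    print(field.__name__, frank_oseen_density(field, point, 1.0, 2.0, 3.0))
```

With the three constants set to different values, each test field picks out exactly one term: $K_1/(2\rho^2)$ for splay, $K_2 q^2/2$ for twist, and $K_3/(2\rho^2)$ for bend, where $\rho$ is the distance from the z-axis.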

In the presence of an external field, the free-energy
density acquires additional terms. For example, for
the magnetic field B, the energy density [5], [6] should
be supplemented by the term $-\tfrac{1}{2}\mu_0^{-1}\chi_a (B\cdot n)^2$,
where $\mu_0 = 4\pi\times 10^{-7}$ H/m is the magnetic permeability
of free space (magnetic constant).

The possibility to orient the director by an applied 
electric or magnetic field leads to numerous practical 
applications. Any actual liquid crystal cell is 
confined; say, by a pair of parallel glass plates. The 
molecular interactions between the liquid crystal 
and the boundary substrates are anisotropic. This 
anisotropy establishes one (sometimes more) pre- 
ferred orientation of n at the boundary, the so-called 
“easy axis.” The phenomenon is called the “surface 
anchoring.” Orienting action of the substrates 
usually keeps the director uniform if the external 
field is absent. However, the external field can 
overcome both the “anchoring” at the surfaces and 
the elasticity of the nematic bulk and reorient the 
director. This is the “Frederiks effect,” first dis- 
covered for the magnetic case. When the field is 
removed, the surface anchoring restores the original 
director structure. Thus, one can use the external 
field and surface anchoring to switch the liquid 
crystal orientation back and forth. The dielectric 
version of the effect is used in electrooptic devices, 
including displays. The liquid crystal is usually 
sandwiched between two transparent electroconduc- 
tive plates (e.g., glass covered with indium tin oxide) 
coated with a suitable alignment layer. The voltage 
across the cell controls the director configuration 
and thus the optical properties of the cell. 


Elasticity of the Smectic A Phase 


For the SmA phase, the elastic free-energy density 
should be modified to take into account (1) 
restrictions that the layered structure imposes onto 
the director twist and bend, and (2) elastic cost of 
changes in the thickness of the layers: 


$$f = \tfrac{1}{2}K_1(\operatorname{div} n)^2 + \tfrac{1}{2}B\gamma^2 \qquad [7]$$


where B is the Young modulus (layers compressibility
modulus) and $\gamma = (d - d_0)/d_0$ is the relative
difference between the equilibrium period $d_0$ and
the actual layer thickness d measured along the
director n. The ratio of $K_1$ to B defines an important
length scale


$$\lambda = \sqrt{K_1/B} \qquad [8]$$


called “the penetration length”; $\lambda$ is of the order
of the layer separation but diverges when the
system approaches the SmA-nematic transition.
The splay constant $K_1$ in the SmA phase is of the
same order as in a nematic phase stable at higher
temperatures. With $\lambda \sim d_0 \sim (1$–$3)$ nm, one finds
$B \sim 10^6$–$10^7$ N/m², a value that is $10^3$ to $10^4$ times
smaller than the compressibility modulus in a solid.
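
The order-of-magnitude estimate for B follows from inverting eqn [8]. The sketch below is illustrative only: it assumes $K_1$ = 10 pN, a value taken from the nematic range quoted earlier rather than stated for any particular smectic, and the function names are hypothetical.

```python
import math

def compressibility_modulus(k1, lam):
    """Invert eq [8], lambda = sqrt(K1 / B), for B in N/m^2."""
    return k1 / lam**2

def penetration_length(k1, b):
    """Eq [8]: penetration length in metres."""
    return math.sqrt(k1 / b)

K1 = 1e-11  # N (10 pN), an assumed typical splay constant

b_upper = compressibility_modulus(K1, 1e-9)  # lambda = 1 nm
b_lower = compressibility_modulus(K1, 3e-9)  # lambda = 3 nm
print(f"B ranges from {b_lower:.1e} to {b_upper:.1e} N/m^2")
```

Under these assumptions the two ends of the range land at roughly $10^6$ and $10^7$ N/m², consistent with the interval quoted in the text.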

The SmA elastic free-energy density is often
written in terms of the mean curvature
$H = (1/2)(\sigma_1 + \sigma_2)$ and the Gaussian curvature
$G = \sigma_1\sigma_2$ of the layers:


$$f = \tfrac{1}{2}K_1(\sigma_1 + \sigma_2)^2 + \bar{K}\sigma_1\sigma_2 + \tfrac{1}{2}B\gamma^2 \qquad [9]$$


As compared with eqn [7], it is supplemented by the
divergence saddle-splay term $\bar{K}$; $-2K_1 < \bar{K} < 0$ (for
the system of flat layers to be energetically stable);
$\sigma_1 = 1/R_1$ and $\sigma_2 = 1/R_2$ are the local values of the
principal curvatures of the smectic layers.


Dynamics 


Liquid crystals are fluids; they can flow preserving 
the orientational order. Flow imposes an orienta- 
tional torque on the liquid crystals. Most often, the 
director tends to realign along the direction of flow. 
There is also an inverse effect: director distortions 
can cause the flow. This “backflow” effect is of 
importance in liquid crystal displays. In the approxi- 
mation of a constant scalar OP, the hydrodynamics 
of liquid crystals is described in terms of seven 
unknown variables: (1) mass density $\rho$(r,t), (2) three
components of the velocity field v(r,t), (3) energy 
density, and (4) two components of the director field 
n(r,t). These variables are found from seven 
equations 


1. conservation of mass, 

2. three equations for the conserved components of 
the linear momentum, 

3. entropy balance equation, and 

4. two director dynamics equations. 


In contrast to an isotropic fluid, the stress tensor 
depends not only on the gradients of the velocity, 
but also on the director components. UN phase 
should be characterized by five different viscosity 
constants. The number of viscosities reduces to 
three, when the director distortions are small. 
These three can be chosen as the effective viscosities 
for three idealized geometries of flow, also known as 
Miesowicz geometries, in which one assumes that
the director is fixed (e.g., by a strong magnetic field)
(see Figure 4):

When n = (1,0,0) is perpendicular to both the
flow direction and the velocity gradient, the UN
behaves as an isotropic fluid with a viscosity $\eta_a$;
however, director fluctuations coupled with
certain values of the viscosity coefficients might
destabilize the initial director orientation (see
Figure 4a). When n is parallel to the flow
(Figure 4b) or parallel to the velocity gradient
(Figure 4c), the corresponding viscosities $\eta_b$ and $\eta_c$
are generally different from $\eta_a$ and from each other;
$\eta_b < \eta_a < \eta_c$ for a typical thermotropic UN material
composed of rod-like elongated molecules. The
result $\eta_b < \eta_c$ can be explained by assuming that
the friction correlates with the cross section of the
molecules seen by the flow.

Figure 4 Miesowicz geometries for effective viscosities of the
uniaxial nematic.


Topological Defects 
Experimental Observations 


When a thick UN sample (say, 100 μm thick) with
no special aligning layers is viewed under the 
microscope, one usually observes a number of 
mobile flexible lines, the so-called disclinations. 
The disclinations are seen as thin and thick threads 
(see Figure 5). Thin threads strongly scatter light and 
show up as sharp lines. These are truly topologically 
stable defect lines, along which the nematic sym- 
metry of rotation is broken. The disclinations are 
topologically stable in the sense that no continuous 
deformation can transform them into a uniform 
state, n(r)=const. Thin disclinations are singular in 
the sense that the director is not defined along the 
core of the defect line. Thick threads are line 
defects only in appearance; they are not singular 
disclinations. The director is smoothly curved and 
well defined everywhere, except, perhaps, at a 
number of point defects, the so-called hedgehogs 
(see Figure 5). 

In thin UN samples (1–50 μm) with the director
tangential to the bounding plates, the disclinations 
are often perpendicular to the plates. Under 
a microscope with two crossed polarizers, one 
can see the ends of the disclinations as centers 
with emanating pairs of dark brushes (see Figure 6) 
giving rise to the so-called “Schlieren texture.” The
dark brushes display the areas where n is either in
the plane of polarization of light or in the perpendicular
plane. The director rotates by an angle $\pm\pi$
when one goes around the end of the disclination at
the surface. Centers with four emanating brushes are
also observed; they correspond to point defects
located at the surface, the so-called boojums (see
Figure 6). The director undergoes a $\pm 2\pi$ rotation
around these four-brush centers. The principal
difference between the centers with two brushes
(ends of singular lines) and centers with four brushes
(surface point defects) can be seen after a gentle shift
of one of the bounding plates with respect to the
other. Upon shear-induced separation in the plane of
observation, the centers with two brushes are clearly
seen as connected by a singular trace (a disclination),
while the centers with four brushes separate without
a visible singularity between them.

Figure 5 (a) Thin singular disclinations and thick nonsingular
threads in the nematic (n-pentylcyanobiphenyl (5CB)) bulk.
Crossed polarizers; (b, c) typical director configurations
associated with thin and thick lines; thick lines are often associated
with point defects in the nematic bulk (hedgehogs).

The intensity of linearly polarized light coming
through a uniform UN slab depends on the angle $\beta$
between the polarization direction and the projection
of the director n onto the slab’s plane:


$$I = I_0 \sin^2 2\beta\, \sin^2\left[\frac{\pi h}{\lambda}\left(n_{e,\mathrm{eff}} - n_o\right)\right] \qquad [10]$$


where $I_0$ is the intensity of incident light, h is the
thickness of the slab, $\lambda$ is the wavelength of the light,
and $n_{e,\mathrm{eff}}$ is the effective
refractive index that depends on the ordinary index
$n_o$, extraordinary index $n_e$, and the director orientation.
Equation [10] allows one to relate the number
|k| of director rotations by $\pm 2\pi$ around the defect
core to the number B of brushes:


$$|k| = B/4 \qquad [11]$$


Taken with a sign that specifies the direction of
rotation, k is called the “strength of disclination,”
and is related to a more general concept of a
topological charge (but does not coincide with it).
Note that I = 0 when n is perpendicular to the plates
(so-called homeotropic state), as $n_{e,\mathrm{eff}} = n_o$. The
homeotropic state is used as one of the ground
states in modern flat-panel TV sets. By applying the
electric field, one tilts the director so that $n_{e,\mathrm{eff}} \neq n_o$
and the cell (or the corresponding pixel in the liquid
crystal panel) becomes transparent.

Figure 6 Schlieren texture of a thin (13 μm) slab of 5CB.
Centers with two and four brushes are the ends of singular
disclinations and point defects (boojums), respectively.
Tangential director orientation. Crossed polarizers. Scale bar: 100 μm.
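
The relation [11] between the strength k and the number of brushes can be illustrated by counting the zeros of the factor $\sin^2 2\beta$ in [10] on a circle around the defect. The sketch below assumes a planar director that makes the angle $\beta = k\varphi + c$ with the polarizer at azimuth $\varphi$; the function names, the offset c = 0.3, and the sample count are illustrative choices, not part of the article.

```python
import numpy as np

def transmitted_intensity(beta, thickness, wavelength, delta_n, i0=1.0):
    """Eq [10], with delta_n standing for n_e,eff - n_o."""
    phase = np.pi * thickness * delta_n / wavelength
    return i0 * np.sin(2 * beta) ** 2 * np.sin(phase) ** 2

def count_brushes(k, c=0.3, samples=3600):
    """Count dark brushes on a closed loop around a defect of strength k."""
    phi = np.linspace(0.0, 2 * np.pi, samples, endpoint=False)
    amplitude = np.sin(2 * (k * phi + c))  # intensity is proportional to amplitude**2
    neighbour = np.roll(amplitude, -1)     # next sample, wrapping around the loop
    # Each sign change of the amplitude marks one extinction direction.
    return int(np.sum(amplitude * neighbour < 0))

for k in (0.5, -0.5, 1.0, -1.0):
    print(k, count_brushes(k))
```

For these four strengths the count is 4|k|, that is, two brushes for half-integer disclination ends and four for the integer-strength surface defects.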


Nematic Droplets 


When left intact, textures with defects in flat samples 
relax into a more or less uniform state. Disclinations 
with positive and negative k find each other and 
annihilate. There are, however, situations when the 
equilibrium state requires topological defects. 
Nematic droplets suspended in an isotropic matrix 
such as glycerin, water, polymer, etc., (see Figure 7) 
and inverted systems, such as water droplets in a 
nematic matrix are the most evident examples. 
Consider a spherical nematic droplet of
radius R and the balance of the surface anchoring
energy $\sim W_a R^2$ ($W_a$ is the surface anchoring
coefficient) and the elastic energy $\sim KR$; K is
some averaged Frank constant. Small droplets
with $R \ll K/W_a$ avoid spatial variations of n at
the expense of violated boundary conditions. In
contrast, large droplets, $R \gg K/W_a$, satisfy
boundary conditions by aligning n along the
preferred direction(s) at the surface. Since the
surface is a sphere, the result is the distorted
director in the bulk, for example, a radial hedgehog
when the surface orientation is normal (see Figure 7).
The characteristic radius $R \sim K/W_a$ is macroscopic (microns),
as $K \sim 10$ pN and $W_a \sim 10^{-5}$–$10^{-6}$ J m$^{-2}$. Point
defects in large nematic droplets must satisfy restrictions
on their topological characteristics that have
their roots in the Poincaré and Gauss theorems of
differential geometry.

Figure 7 Polarizing-microscope texture of spherical nematic
droplets suspended in glycerin. (a) The director configuration is
radial and normal to the spherical surface; the inset shows the
point-defect hedgehog in the center of the droplet. (b) Tangential
director orientation at the interface results in the bipolar structure
with two defects (boojums) at the poles. The director is twisted
because of the smallness of the twist elastic constant as
compared to the splay and bend constants.
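
The crossover between the two droplet regimes can be made concrete with the numbers quoted above. This is only an order-of-magnitude sketch: the scaling forms $KR$ and $W_a R^2$ omit numerical prefactors, and the function names are hypothetical.

```python
def anchoring_length(k, w_a):
    """Crossover radius K / W_a, in metres."""
    return k / w_a

def droplet_regime(radius, k, w_a):
    """Compare the elastic (~K R) and anchoring (~W_a R^2) energy scales."""
    elastic = k * radius
    anchoring = w_a * radius**2
    if anchoring > elastic:
        return "anchoring-dominated"
    return "elasticity-dominated"

K = 1e-11  # N, i.e. 10 pN
for w_a in (1e-5, 1e-6):  # J/m^2, the range quoted in the text
    print(w_a, anchoring_length(K, w_a))

print(droplet_regime(1e-4, K, 1e-5))  # a 100 micron droplet
print(droplet_regime(1e-8, K, 1e-5))  # a 10 nm droplet
```

The crossover radius comes out between 1 and 10 μm, which is why droplets visible under an optical microscope typically sit in the regime where the boundary conditions are satisfied and defects appear.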


Topological Classification 
of Defects in UN 


The language of topology, or, more precisely, of 
homotopy theory, allows one to associate the 
character of ordering of a medium and the types of 
defects arising in it, to find the laws of decay, 
merger and crossing of defects, to trace out their 
behavior during phase transitions, etc. The key role
is played by the concept of a “topological invariant,”
also called a “topological charge,” which is
inherent in every defect. The stability of the defect is 
guaranteed by the conservation of its charge. 
Homotopy classification of defects includes three 
steps. 

First, one defines the OP of the system. In a 
nonuniform state, the OP is a function of 
coordinates. 

Second, one determines the OP (or degeneracy)
space R, that is, the manifold of all possible values
of the OP that do not alter the thermodynamical
potentials of the system. In the UN, R is a unit
sphere denoted $S^2/Z_2$ (also called the projective
plane $RP^2$) with pairs of diametrically opposite
points being identical. Every point of $S^2/Z_2$
represents a particular orientation of n. Since
$n \equiv -n$, any two diametrically opposite points at
$S^2/Z_2$ describe the same state.

The function n(r) maps the points of the nematic
volume into $S^2/Z_2$. The mappings of interest are
those of i-dimensional “spheres” enclosing defects.
A line defect is enclosed by a linear contour, i = 1; a
point defect is enclosed by a sphere, i = 2, etc.

Third, one defines the homotopy groups $\pi_i(R)$.
The elements of these groups are mappings of
i-dimensional spheres enclosing the defect in real
space into the OP space. To classify the defects of
dimensionality $t'$ in a t-dimensional medium, one
has to know the homotopy group $\pi_i(R)$ with
$i = t - t' - 1$.

Each element of $\pi_i(R)$ corresponds to a class of
topologically stable defects; all these defects are
equivalent to one another under continuous
deformations. The elements of homotopy groups
are topological charges of the defects. For UN,
the homotopy group $\pi_1(S^2/Z_2) = Z_2 = \{0, 1/2\}$ is
composed of two elements; there is thus only one
class of topologically stable defects (that appear
as thin singular lines under the microscope, see
Figure 5) with the addition rules 1/2 + 1/2 = 0
and 1/2 + 0 = 1/2 describing interaction of disclinations.
The topological point defects in the
bulk (hedgehogs) are described by the second
homotopy group, $\pi_2(S^2/Z_2) = Z = \{0, 1, 2, \ldots\}$, and
can be labeled by integer topological charges. The
simplest point defect is a “radial” hedgehog, seen
in the center of the radial droplet (see Figure 7a).
Boojums are special point defects that, in contrast
to hedgehogs, can exist only at the boundary of
the medium (see Figure 7b).

The relative stability of stable disclinations 
depends on the Frank elastic constants of splay 
(K11), twist (K22), bend (K33) and saddle-splay 
(K24) in the Frank—Oseen elastic free-energy 
density functional; the role of the elastic constant
$K_{13}$ in the structure of defects is not clarified yet.

Consider the simplest case of “planar” disclinations
with n perpendicular to the line. In this case,
the $K_{24}$-term in the line’s energy is zero. Assuming
$K_{11} = K_{22} = K_{33} = K$, by minimizing the bulk integral
of [5], one finds the equilibrium director configuration
around the line of strength k


$$n = \{\cos[k\varphi + c],\ \sin[k\varphi + c],\ 0\} \qquad [12]$$


where $\varphi = \arctan(y/x)$, x and y are Cartesian coordinates
normal to the line, and c is a constant. The energy
per unit length of a straight planar disclination is


$$F_{1l} = \pi K k^2 \ln\frac{L}{r_c} + F_c \qquad [13]$$


where L is the characteristic size of the system, and $r_c$
and $F_c$ are, respectively, the radius and the energy of
the disclination core, a region in which the distortions
are too strong to be described by a phenomenological
theory.
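
The logarithmic term of [13] can be recovered by integrating the one-constant elastic density of the field [12] over an annulus between the core radius and the system size. The following sketch is illustrative rather than the article's derivation: it uses finite differences and a midpoint rule on a grid that is uniform in $\ln r$ and $\varphi$, and all numerical parameters (grid sizes, step, sample values of K, $r_c$, L) are assumptions.

```python
import numpy as np

def director(x, y, k, c=0.0):
    """In-plane components of the director field of eq [12]."""
    theta = k * np.arctan2(y, x) + c
    return np.cos(theta), np.sin(theta)

def line_energy(k, K, r_core, size, n_r=200, n_phi=64, c=0.0):
    """Elastic energy per unit length, excluding the core term F_c."""
    du = np.log(size / r_core) / n_r
    dphi = 2 * np.pi / n_phi
    u = np.log(r_core) + (np.arange(n_r) + 0.5) * du
    # Midpoints in phi keep every stencil away from the arctan2 branch cut.
    phi = -np.pi + (np.arange(n_phi) + 0.5) * dphi
    r, p = np.meshgrid(np.exp(u), phi, indexing="ij")
    x, y = r * np.cos(p), r * np.sin(p)
    h = 1e-4 * r  # finite-difference step scaled with the local radius
    nx_xp, ny_xp = director(x + h, y, k, c)
    nx_xm, ny_xm = director(x - h, y, k, c)
    nx_yp, ny_yp = director(x, y + h, k, c)
    nx_ym, ny_ym = director(x, y - h, k, c)
    div = (nx_xp - nx_xm) / (2 * h) + (ny_yp - ny_ym) / (2 * h)
    curl_z = (ny_xp - ny_xm) / (2 * h) - (nx_yp - nx_ym) / (2 * h)
    # One-constant density for a planar field: (K/2)(div^2 + curl^2).
    density = 0.5 * K * (div**2 + curl_z**2)
    # Area element: r dr dphi = r^2 d(ln r) dphi.
    return float(np.sum(density * r**2) * du * dphi)

K, r_core, size = 4e-12, 1e-8, 1e-5  # assumed sample values (SI units)
numeric = line_energy(0.5, K, r_core, size)
analytic = np.pi * K * 0.5**2 * np.log(size / r_core)
print(numeric, analytic)
```

Because the energy scales as $k^2$, a line with |k| = 1 costs four times as much as one with |k| = 1/2 in this planar model.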

The restriction of planar director distortions does 
not allow the model to grasp the crucial difference 
between half-integer and integer k’s. The lines of 
integer k, as already discussed, are fundamentally 
unstable, as the director can be reoriented along the 
axis. This “escape in the third dimension,” is usually 
energetically favorable, since the singular core is 
eliminated. When opposite directions of the 
“escape” meet, a point defect hedgehog is formed, 
as illustrated in Figure 5c.

Unlike point defects such as vacancies in
solids, topological point defects in nematics
cause disturbances over the whole volume.
The curvature energy of the point defect is
proportional to the size R of the system. For
example, for the radial hedgehog with
$n = (x, y, z)/\sqrt{x^2 + y^2 + z^2}$, and the hyperbolic
hedgehog with $n = (-x, -y, z)/\sqrt{x^2 + y^2 + z^2}$,
one finds, respectively,


$$F_{rh} = 8\pi R\,(K_{11} - K_{24}) + F_c \quad\text{and}\quad F_{hh} = 8\pi R\left(\frac{K_{11}}{5} + \frac{2K_{33}}{15} + \frac{K_{24}}{3}\right) + F_c \qquad [14]$$
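
As a consistency check on the radial-hedgehog expression in [14], the bulk terms can be integrated directly. The short derivation below is a sketch under stated assumptions: a spherical volume of radius R centered on the defect, the core energy left aside, the $K_{13}$ term omitted, and $K_{11}$ identified with the splay constant $K_1$ of [5].

```latex
% Radial hedgehog: n = \hat{r}, so div n = 2/r and curl n = 0.
\begin{aligned}
\int f_{FO}\,dV &= \int_0^R \tfrac{1}{2}K_{11}\left(\frac{2}{r}\right)^{2} 4\pi r^{2}\,dr
                 = 8\pi K_{11} R,\\
\int f_{24}\,dV &= -K_{24}\int_0^R \operatorname{div}\!\left(\frac{2\hat{r}}{r}\right) 4\pi r^{2}\,dr
                 = -K_{24}\int_0^R \frac{2}{r^{2}}\,4\pi r^{2}\,dr
                 = -8\pi K_{24} R .
\end{aligned}
```

The sum of the two contributions is $8\pi R\,(K_{11} - K_{24})$, linear in R as stated in the text.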


Defects in Smectics 


Layered structure of smectics leads to linear 
defects of positional order, dislocations, in addi- 
tion to disclinations. There is also a special class 
of distortions known as focal conic domains 
(FCDs) that are associated with large-scale cur- 
vatures of layers. Imagine that because of the 
boundary conditions, flow, or the external fields, 
the smectic layers are curved over the scale much 
larger than the thickness of the layers. It is easy 
to see from eqn [9] that the curved layers will 
prefer to maintain their equidistance, as the 
curvature energy is much smaller than the layers 
dilation energy at the large scales of deforma- 
tions. Generally, the family of equidistant curved 
surfaces is associated with the focal surfaces at 
which the principal curvatures diverge. These 
focal surfaces are thus energetically very costly. 
A radical way to reduce the elastic energy would 
be to decrease the dimensionality of the focal 
surfaces, say, by transforming them into lines and 
points. The latter case corresponds simply to a 
system of concentric spherical layers. The former
is more complicated and corresponds to FCDs in
which the focal surfaces are represented by pairs
of confocal lines: ellipse and hyperbola (limiting
case: circle and straight line), and the pair of
confocal parabolae. Experiments confirm that the
FCDs are the most frequent type of structural
deformations in smectic materials (see Figure 8).

Figure 8 SmA phase with FCDs based on the confocal pairs
of ellipses and hyperbolas; the scheme on the right shows
the arrangement of the elliptic bases and smectic layers
wrapped around the confocal pairs of defects. Reproduced
from Lavrentovich OD (2003) In: Arodz et al. (eds.) Patterns of
Symmetry Breaking. Dordrecht: Kluwer Academic Publishers,
with kind permission of Springer Science and Business Media.


Conclusion 


To summarize, over the last few decades, liquid 
crystals transformed from a mysterious and 
curious form of condensed matter into a key 
technological material, thanks to the progress in 
the understanding of their elastic, optical, and 
viscous properties. However, the intrinsic com- 
plexity of these materials still leaves plenty of 
room for further studies, not only of an applied 
nature, but also fundamental. In the field of 
thermotropic liquid crystals, researchers continue 
to discover new types of structural organization, 
such as the phases formed by “banana-shaped” 
molecules that are dramatically different from the 
phases formed by “regular” rod-like and disk-like 
molecules. Work continues to sharpen
our understanding of even the “old” problems, such
as mechanisms of surface alignment, nature and
quantitative values of the elastic constants $K_{13}$, $K_{24}$,
and $\bar{K}$. Even in the case of the electric Frederiks
effect that is at the heart of modern applications, the 
search continues as the corresponding process of 
director reorientation is generally very complex. In 
addition to the dielectric torque, it is controlled by 
various factors, for example, a nonlocal character of 
the electric field in the anisotropic medium, finite 
electric conductivity, flexoelectric effect (i.e., electric 
polarization brought about by the director deforma- 
tions), surface electric polarization at the bounding 
plates, dependence of the dielectric and other 
material properties on the frequency of the applied 
field which might be comparable with the 
characteristic frequency of dielectric relaxation, coupling
of the director reorientation and the material’s
flows, appearance of topological defects, etc. Many 
research efforts nowadays are focused on composite 
systems, such as liquid crystal colloids and polymer- 
liquid crystal composites. Over the next decade or so, 
one would expect that the emphasis in fundamental 
studies will gradually shift from the thermotropic 
liquid crystals to their lyotropic counterparts, as the 
lyotropic type of orientational order is featured by 
many systems of biological significance, such as 
solutions of DNA, F-actin, etc.


See also: Non-Newtonian Fluids; Topological Defects 
and Their Homotopy Classification. 
Introduction


Using Lagrange multipliers, the smallest and
the largest eigenvalue of a symmetric quadratic form


$$Q(u) = \sum_{j,k=1}^{n} a_{jk} u_j u_k \qquad (a_{jk} = a_{kj})$$


can be obtained by minimizing and maximizing Q
on the unit sphere $S^{n-1} = \{u \in \mathbb{R}^n : \|u\| = 1\}$. If the
corresponding extremum is reached at $u^*$, then $u^*$ is
an associated eigenvector.

In the setting of integral or partial differential
equations, a “recursive variational method” has
been proposed to determine all the eigenvalues
$\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ and corresponding eigenvectors
$u^1, u^2, \ldots, u^n$ of Q or, in modern terms, of the
associated symmetric matrix $A = (a_{jk})$:


$$\lambda_1 = \min_{\|u\|=1} Q(u) \quad (= Q(u^1)),$$

$$\lambda_j = \min_{\|u\|=1,\ (u,u^k)=0,\ 1\le k\le j-1} Q(u) \quad (= Q(u^j)) \qquad (j = 2,\ldots,n)$$
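
The recursive scheme can be imitated numerically by minimizing Q on the unit sphere while staying orthogonal to the minimizers already found. The sketch below is an assumption-laden illustration, not the historical method: it uses a fixed-step projected gradient iteration (equivalent to a shifted power iteration), and the function name, iteration count, random seed, and test matrix are arbitrary choices.

```python
import numpy as np

def recursive_eigenpairs(a, iters=5000, seed=0):
    """Approximate all eigenpairs of a symmetric matrix by recursive minimization."""
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    # Any shift above the largest eigenvalue makes u -> shift*u - a@u a
    # descent step for Q(u) = u.a.u once u is renormalized.
    shift = np.linalg.norm(a, 2) + 1.0
    rng = np.random.default_rng(seed)
    values, vectors = [], []
    for _ in range(n):
        u = rng.standard_normal(n)
        for _ in range(iters):
            u = shift * u - a @ u           # gradient step on Q (gradient is 2 a u)
            for v in vectors:               # enforce (u, u^k) = 0 for earlier minimizers
                u = u - np.dot(u, v) * v
            u = u / np.linalg.norm(u)       # project back onto the unit sphere
        values.append(float(u @ a @ u))
        vectors.append(u)
    return np.array(values), np.array(vectors)

# Small symmetric test matrix (a discrete Laplacian) with distinct eigenvalues.
A = np.array([[2.0, -1.0, 0.0, 0.0],
              [-1.0, 2.0, -1.0, 0.0],
              [0.0, -1.0, 2.0, -1.0],
              [0.0, 0.0, -1.0, 2.0]])
lam, vecs = recursive_eigenpairs(A)
print(lam)
print(np.linalg.eigvalsh(A))
```

The constrained minima come out in increasing order, since each step minimizes over a smaller subspace than the previous one. Convergence of this simple iteration relies on the eigenvalues being distinct; repeated eigenvalues would still give correct values but not unique vectors.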
Further considerations have led to a nonrecursive
minimum–maximum principle:


$$\lambda_j = \min_{\{X_j \subset \mathbb{R}^n :\ \dim X_j = j\}}\ \max_{\{u \in X_j :\ \|u\| = 1\}} Q(u) \qquad (1 \le j \le n)$$


and to a dual maximum–minimum principle
(Weyl):


$$\lambda_j = \max_{\{p^1,\ldots,p^{j-1} \in \mathbb{R}^n\}}\ \min_{\{\|u\| = 1,\ u\cdot p^i = 0,\ 1 \le i \le j-1\}} Q(u) \qquad (1 \le j \le n)$$

These principles have been widely used in various 
existence and approximation questions of mathema- 
tical physics, and extensions have been made to the 
abstract setting of symmetric bilinear forms in 
Hilbert spaces. 

Around 1930, Ljusternik and Schnirelman
extended this theory beyond the frame of quadratic
forms, replacing Q by a differentiable real-valued
function f and the unit sphere by a finite-dimensional
compact differentiable manifold M.
Their aim was to obtain the “critical points”
of f on M, that is, the points $u \in M$ where the
differential $f'(u)$ of f at u (as a linear functional on
the tangent space $T_uM$ to M) is equal to zero, and
the corresponding critical values, that is, the values
of f at critical points. When M is a sphere, the
critical points are nontrivial solutions of the
equation


$$\nabla f(u) = \lambda u \qquad [1]$$


for some $\lambda \in \mathbb{R}$ (nonlinear eigenvalue problem).
Ljusternik and Schnirelman replaced the
dimension of the vector spaces occurring in
the minimum–maximum principle for eigenvalues
by the concept of “category” of a closed set A in a
topological space X. An early success of their
approach was the proof of the existence of three geometrically
distinct closed geodesics without self-intersections
on any compact surface of genus zero. In 1960,
their theory was extended to infinite-dimensional
manifolds and to measures of
the “size” of a set other than the category, allowing many
theoretical developments as well as various
applications to nonlinear differential equations.


Ljusternik—Schnirelman Category 


Let X be a topological space (e.g., a normed vector
space, or a differentiable manifold, or a metric
space), and A a closed subset of X. The category of
A in X, $\mathrm{cat}_X(A)$, is the least integer k such that A
can be written as $\bigcup_{j=1}^{k} A_j$, with $A_j$ closed and
contractible in X, that is, continuously deformable
in X into a single point. If no such k exists, one sets
$\mathrm{cat}_X(A) = +\infty$. We write cat(X) for $\mathrm{cat}_X(X)$. For
example, if X is contractible (in itself), cat(X) = 1.
This is the case for any normed space X. For the
hypersphere, $\mathrm{cat}_{\mathbb{R}^n}(S^{n-1}) = 1$, but $\mathrm{cat}(S^{n-1}) = 2$.

The Ljusternik–Schnirelman category satisfies the
following properties, which are not too difficult to
prove. If $A, B \subset X$ are closed,


1. $\mathrm{cat}_X(A) = 0$ if and only if $A = \emptyset$;

2. if $A \subset B$, $\mathrm{cat}_X(A) \le \mathrm{cat}_X(B)$;

3. $\mathrm{cat}_X(A \cup B) \le \mathrm{cat}_X(A) + \mathrm{cat}_X(B)$;

4. if $\eta : [0,1] \times X \to X$ is a continuous deformation
of X ($\eta(0, A) = A$), $\mathrm{cat}_X(A) \le \mathrm{cat}_X(\eta(1, A))$; and

5. if X is a finite-dimensional manifold and $A \subset X$
is compact, there is a neighborhood B of A such
that $\mathrm{cat}_X(B) = \mathrm{cat}_X(A)$.


Computing or even estimating the category of a
given set is in general difficult, requiring techniques
of algebraic topology. In particular, one can show
that, for the n-torus $T^n = S^1 \times S^1 \times \cdots \times S^1$ (n times),
$\mathrm{cat}(T^n) = n + 1$, and for the n-dimensional projective
space $P^n = S^n/Z_2$, obtained by identifying the antipodal
points of $S^n$, $\mathrm{cat}(P^n) = n + 1$. It is clear that a
set of category p must contain at least p points. If X
is connected, any compact subset of category p + 1
has (topological) dimension greater than or equal to p.


Ljusternik-Schnirelman Theory 329 


Ljusternik—Schnirelman Minimax Method 


The Ljusternik–Schnirelman category of M provides a
lower bound for the number of critical points of a
smooth function f on suitable finite-dimensional
manifolds M. Namely, if M is a compact Riemannian
$C^2$-manifold without boundary, any $f \in C^2(M, \mathbb{R})$
has at least cat(M) distinct critical points, with
critical values


$$c_k = \inf_{A \in \mathcal{A}_k}\ \sup_{u \in A} f(u) \qquad (1 \le k \le \mathrm{cat}(M)) \qquad [2]$$


where

$$\mathcal{A}_k = \{A \subset M : A \text{ closed},\ \mathrm{cat}_M(A) \ge k\} \qquad (1 \le k \le \mathrm{cat}(M)) \qquad [3]$$


A fundamental technique in the proof is a deformation
lemma along the trajectories of the gradient system
associated to f (method of steepest descent). If $\nabla f$
denotes the gradient of f in the Riemannian structure
of M, the Cauchy problem for the gradient system


$$\frac{d\eta}{dt} = -\nabla f(\eta), \qquad \eta(0) = u \qquad [4]$$


has a unique globally defined continuous solution
$\eta(t, u)$, which is such that


$$f(\eta(1,u)) - f(u) = \int_0^1 \frac{d}{dt} f(\eta(t,u))\,dt = -\int_0^1 \|\nabla f(\eta(t,u))\|^2\,dt \qquad [5]$$

Notice that, by property (4) of the category, each
deformation by $\eta$ of a set in $\mathcal{A}_j$ remains in $\mathcal{A}_j$. For
$c \in \mathbb{R}$, define


$$f^c := \{u \in M : f(u) \le c\}$$


$$K_c := \{u \in M : \nabla f(u) = 0,\ f(u) = c\} \qquad [6]$$


From [5] it follows that given $c \in \mathbb{R}$ and an open
neighborhood $U_c$ of $K_c$, one has $\eta(1, f^{c+\varepsilon} \setminus U_c) \subset f^{c-\varepsilon}$
for all sufficiently small $\varepsilon > 0$. This implies that if
$c_j = c_{j+1} = \cdots = c_{j+q} = c$ for some $q \ge 0$, then
$\mathrm{cat}_M(K_c) \ge q + 1$. Assume, by contradiction, that
$\mathrm{cat}_M(K_c) \le q$, let $U_c$ be an open neighborhood of $K_c$
such that $\mathrm{cat}_M(U_c) = \mathrm{cat}_M(K_c)$ ($U_c = \emptyset$ if q = 0), $\varepsilon > 0$
such that $\eta(1, f^{c+\varepsilon} \setminus U_c) \subset f^{c-\varepsilon}$, and $A \in \mathcal{A}_{j+q}$ such
that $\sup_A f \le c + \varepsilon$, that is, $A \subset f^{c+\varepsilon}$. Then


$$\mathrm{cat}_M(\eta(1, A \setminus U_c)) \ge \mathrm{cat}_M(A \setminus U_c) \ge \mathrm{cat}_M(A) - \mathrm{cat}_M(U_c) \ge j$$


giving the contradiction $c \le \sup_{\eta(1, A \setminus U_c)} f \le c - \varepsilon$.
Notice that, for each j, $c_j = \inf\{c \in \mathbb{R} : \mathrm{cat}_M(f^c) \ge j\}$,
which shows that the $c_j$ are precisely those levels of f
where $\mathrm{cat}_M(f^c)$ changes. The presence of critical
values is detected by changes in the topology of the
sublevel sets $f^c$ when c varies, a common feature of
many techniques for finding critical points of
functions.

A direct consequence is that for each even
$f \in C^2(\mathbb{R}^n, \mathbb{R})$, system [1] has at least n pairs of
solutions (u, −u) with $\|u\| = 1$. Indeed, the solutions
of [1] are the critical points of f on $S^{n-1}$. As f
takes the same values at antipodal points, it is well
defined on the projective space $P^{n-1}$, and
$\mathrm{cat}(P^{n-1}) = n$.

The Ljusternik–Schnirelman theorem can be extended
to the $C^1$-situation. The category of M gives a lower
bound for the number of critical points of f on the closed
manifold M. If Crit(M) denotes the minimum of
the number of critical points of all $C^1$-functions on M,
so that Crit(M) ≥ cat(M), an interesting question is
to estimate the gap Crit(M) − cat(M). For M closed
connected, Crit(M) ≤ dim(M) + 1 (Takens). If
Crit(M) = 2, M is homeomorphic to a sphere, so that
the equality Crit(S) = cat(S) for homotopy spheres S is
equivalent to Poincaré’s conjecture! Manifolds with
Crit(M) = cat(M) + 1 are known, but not with
Crit(M) > cat(M) + 1.


Ljusternik—Schnirelman Theory 
in Infinite-Dimensional Manifolds 


The main difficulty in extending the results of the
previous section to functions defined on infinite-dimensional
manifolds lies in the lack of compactness.
J T Schwartz and Palais have shown that such
an extension is possible for functions f satisfying on
M a compactness property (allowing an infinite-dimensional
deformation lemma), now referred to as
the Palais–Smale condition: each sequence $(u_k)$ with
$(f(u_k))$ bounded and $\lim_{k\to\infty} \nabla f(u_k) = 0$ has a convergent
subsequence. Such a condition can be
localized at level c by replacing the boundedness of
$(f(u_k))$ by $\lim_{k\to\infty} f(u_k) = c$. The infinite-dimensional
extension of Ljusternik–Schnirelman’s theorem goes
as follows: Let M be an infinite-dimensional Riemannian
(or even Finsler) connected complete manifold
of class $C^1$ without boundary. Any $f \in C^1(M, \mathbb{R})$
bounded from below and satisfying the Palais–Smale
condition has at least cat(M) distinct critical points.

A simple application can be given to the periodic 
solutions of period T (T-periodic solutions) of 
Lagrangian systems 


$$u'' + \nabla V(u) = h(t) \qquad [7]$$

where $V \in C^1(\mathbb{R}^n,\mathbb{R})$ is $2\pi$-periodic in each component
$u_j$ $(1 \le j \le n)$, h is continuous, T-periodic and
has mean value $\bar h$ equal to zero. By the least action
principle, the T-periodic solutions of [7] are the
critical points of the action functional

$$f(u) = \int_0^T \left[\frac{\|u'(t)\|^2}{2} - V(u(t)) + h(t)\cdot u(t)\right] dt$$

on the Hilbert space $H^1_T$ obtained by completion of
the space of T-periodic $C^1$ functions for the norm
associated with the inner product

$$(u,v) = \int_0^T u(t)\cdot v(t)\,dt + \int_0^T u'(t)\cdot v'(t)\,dt$$


It follows easily from the condition $\bar h = 0$ that f is bounded
from below and that $f(u + 2\pi e_j) = f(u)$ for all $u \in H^1_T$,
with $e_j$ the jth unit vector in $\mathbb{R}^n$ $(1 \le j \le n)$. Consequently,
we can see f as defined on the Riemannian
manifold $\mathbb{T}^n \times \tilde H^1_T$, where $\tilde H^1_T = \{u \in H^1_T : \bar u = 0\}$. It is
easy to show that $\mathrm{cat}(\mathbb{T}^n \times \tilde H^1_T) = \mathrm{cat}(\mathbb{T}^n) = n + 1$ and
that f satisfies the Palais-Smale condition on $\mathbb{T}^n \times \tilde H^1_T$.
Consequently, system [7] has at least n + 1 geometrically
distinct T-periodic solutions. The same result
holds for the more general systems


$$M u'' + A u + \nabla F(u) = h(t)$$


occurring in the theory of multipoint Josephson 
junctions or in space discretizations of the 
sine-Gordon equation. In particular, the classical 
forced pendulum equation 


$$u'' + a\sin u = h(t)$$


has at least two geometrically distinct T-periodic
solutions when h is T-periodic and $\bar h = 0$, a result
first proved, in a different way, by Mawhin and
Willem.
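
The periodicity of the action in the directions $e_j$, used above to pass to the torus, can be checked in one line. The computation below is only a sketch: it assumes the action functional of [7] has the form $f(u) = \int_0^T [\|u'\|^2/2 - V(u) + h\cdot u]\,dt$, and $h_j$, $\bar h_j$ denote the jth component of h and its mean value (notation introduced here for illustration).

```latex
\begin{aligned}
f(u + 2\pi e_j) - f(u)
  &= \int_0^T \bigl[ V(u(t)) - V(u(t) + 2\pi e_j) \bigr]\,dt
     + 2\pi \int_0^T h_j(t)\,dt \\
  &= 0 + 2\pi T\,\bar h_j = 0 .
\end{aligned}
```

The first term vanishes because V is $2\pi$-periodic in each component, the second because h has mean value zero; the kinetic term is unchanged since $(u + 2\pi e_j)' = u'$.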

Another way to study nonlinear eigenvalue pro- 
blems of the form 


$$f'(u) = \lambda g'(u)$$


in a Hilbert or a suitable reflexive Banach space X
is based upon a Rayleigh–Ritz approximation
through a sequence of finite-dimensional problems,
where the classical theory is applied. Conditions
upon $f, g \in C^1(X,\mathbb{R})$ are given, generalizing
those of Ljusternik and Schnirelman, which ensure the
existence of infinitely many solutions. Again, some
compactness is needed to justify the limit process;
it is expressed by assumptions upon f and g that are
too lengthy to be reproduced here. The following
application is exemplary. Let $\Omega \subset \mathbb{R}^N$ be a bounded
domain and $X = W_0^{1,p}(\Omega)$, p > 1, be the Sobolev
space of functions $u : \Omega \to \mathbb{R}$ obtained as the completion
of the smooth functions with compact support
in Ω for the norm $\|u\|_{1,p} = \left(\int_\Omega \|\nabla u(x)\|^p\,dx\right)^{1/p}$.
Define the functionals f and g on $W_0^{1,p}(\Omega)$ by


$$f(u) = \int_\Omega \|\nabla u(x)\|^p\,dx, \qquad g(u) = \int_\Omega |u(x)|^p\,dx$$


The critical points of f on $\{u \in X : g(u) = 1\}$ correspond
to the nontrivial solutions of the Dirichlet
eigenvalue problem


$$-\Delta_p u = \lambda |u|^{p-2}u \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega \qquad [8]$$


for the p-Laplacian operator $\Delta_p$ defined by

$$\Delta_p u(x) := \nabla\cdot\left(\|\nabla u(x)\|^{p-2}\,\nabla u(x)\right)$$


which occurs in the modeling of various
problems in a porous medium. An eigenvalue is
any $\lambda \in \mathbb{R}$ such that problem [8] has a nontrivial
solution. The Ljusternik-Schnirelman technique
implies the existence of a sequence of eigenvalues
going to infinity, with the usual minimax characterization.
When N = 1, direct computations show that
this sequence gives all eigenvalues, but the problem
remains open for $N \ge 2$. The corresponding forced
problem


$$-\Delta_p u - \lambda |u|^{p-2}u = h(x) \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega$$


is always solvable (although not uniquely) when λ is
not an eigenvalue, but solvability conditions at the
higher eigenvalues (Fredholm alternative) remain
almost terra incognita.


Index Theories and Critical Points 
of Symmetric Functionals on 
a Banach Space 


Closely related to the Ljusternik–Schnirelman category
is the concept of index associated to the action of a
compact topological group G on a normed space X,
that is, to a continuous map $G \times X \to X$, $[g,u] \mapsto gu$,
such that $1\cdot u = u$, $(gh)u = g(hu)$, and $u \mapsto gu$ is linear.
The action is isometric if $\|gu\| = \|u\|$, $A \subset X$
is invariant if gA = A for all $g \in G$, $f : X \to \mathbb{R}$ is
invariant if $f \circ g = f$ for all $g \in G$, and $h : X \to X$
is equivariant if $g \circ h = h \circ g$ for each $g \in G$. Let
$\mathrm{Fix}(G) = \{u \in X : gu = u \text{ for all } g \in G\}$. The aim of an
index is to measure the size of invariant sets.
Explicitly, an index theory associates to each closed
invariant subset A of X a non-negative (possibly
infinite) integer G-ind(A), its G-index, such that
1. G-ind(A) = 0 if and only if $A = \emptyset$;
2. if $h : A \to B$ is equivariant and continuous,
$G\text{-ind}(A) \le G\text{-ind}(B)$;
3. $G\text{-ind}(A \cup B) \le G\text{-ind}(A) + G\text{-ind}(B)$; and
4. if A is compact, there is a closed invariant
neighborhood U of A such that G-ind(U) =
G-ind(A).


A first example of index is Krasnosel'skii's genus
or $\mathbb{Z}_2$-index, which corresponds to the action
$0\cdot u = u$, $1\cdot u = -u$ of $G = \mathbb{Z}_2$. The invariant sets
are the ones symmetric with respect to the origin
and $\mathbb{Z}_2$-ind(A) is defined by $\mathbb{Z}_2\text{-ind}(\emptyset) = 0$ and, for
$A \ne \emptyset$, as the smallest integer k such that there
exists an odd $h \in C(A, \mathbb{R}^k \setminus \{0\})$. A consequence of
the Borsuk–Ulam theorem in algebraic topology is
that the boundary of any symmetric bounded neighborhood of the
origin in $\mathbb{R}^n$ has $\mathbb{Z}_2$-index equal to n. Furthermore,
for a compact $A \subset \mathbb{R}^n \setminus \{0\}$ symmetric with respect
to the origin, and $\tilde A = A/\mathbb{Z}_2$ (A with antipodal
points identified), one has $\mathbb{Z}_2\text{-ind}(A) = \mathrm{cat}_{P^{n-1}}(\tilde A)$.

A second example, the $S^1$-index, is important in
the study of periodic solutions of autonomous
Hamiltonian systems. $S^1\text{-ind}(\emptyset) = 0$ and for a nonempty
closed invariant $A \subset X$, $S^1$-ind(A) is defined
as the smallest integer k such that there exists a
positive integer n and $h \in C(A, \mathbb{C}^k \setminus \{0\})$
with $h \circ g = g^n \circ h$ for all $g \in S^1$. A Borsuk–Ulam-type
theorem for $S^1$-equivariant mappings
implies that if Z is a finite-dimensional invariant
subspace of X such that $\mathrm{Fix}(S^1) \cap Z = \{0\}$ and D is
an open bounded invariant neighborhood of 0 in Z,
then $S^1\text{-ind}(\partial D) = (1/2)\dim Z$.

As the category of a Banach space X is equal to 1, the
classical Ljusternik–Schnirelman approach does not
provide any information about the multiplicity of
the unconstrained critical points of $f \in C^1(X,\mathbb{R})$. If f
is invariant under the action on X of a compact
group G and satisfies the Palais-Smale condition, a
Ljusternik–Schnirelman minimax method associated
to a G-index provides multiplicity results for
unconstrained critical points. Letting


$$\mathcal{A}_j = \{A \subset X : A \text{ is compact, invariant, and } G\text{-ind}(A) \ge j\}, \qquad c_j = \inf_{A \in \mathcal{A}_j} \max_A f$$


one shows as in classical Ljusternik–Schnirelman
theory that if $c = c_j = c_{j+1} = \cdots = c_{j+q}$ for some j
and some $q \ge 0$, then $G\text{-ind}(K_c) \ge q + 1$. The proof
uses an equivariant deformation lemma.


$\mathbb{Z}_2$- and $S^1$-Invariant Functionals


In the case of the $\mathbb{Z}_2$-action, the following multiplicity
result holds for possibly unbounded even $f \in C^1(X,\mathbb{R})$
satisfying the Palais-Smale condition and having the
mountain pass geometry: if $Y \cap \{u \in X : f(u) \ge 0\}$ is
bounded for each finite-dimensional subspace Y of X,
f(0) = 0, and $f(u) \ge \alpha > 0$ on $\partial B[r]$, then f has
infinitely many pairs of critical points. As an
application, the semilinear Dirichlet problem


$$\Delta u + \lambda u + |u|^{p-1}u = 0 \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega \qquad [9]$$


has infinitely many solutions when $\Omega \subset \mathbb{R}^N$ is
bounded, $1 < p < (N+2)/(N-2)$, and $\lambda < \lambda_1$, the
smallest eigenvalue of $-\Delta$ with Dirichlet boundary
conditions. The corresponding energy functional,
defined on $W_0^{1,2}(\Omega)$ by


$$f(u) = \int_\Omega \left[\frac{\|\nabla u(x)\|^2}{2} - \frac{\lambda\,u(x)^2}{2} - \frac{|u(x)|^{p+1}}{p+1}\right] dx$$


satisfies the Palais-Smale condition. This condition
fails in the critical case where $p = (N+2)/(N-2)$, at
least at some levels c, and this lack of compactness
creates both difficulties and interesting phenomena.
This situation, which occurs in many important
problems of geometry and physics (harmonic maps,
Yang-Mills connections, Yamabe problem, equations
of constant mean curvature, closed geodesics problems,
etc.), reveals, in physical terms, “phase
transitions” or “particle creations” at the levels where
the Palais-Smale condition fails. In the special case of
eqn [9] with $p = (N+2)/(N-2)$, if $N \ge 4$, a positive
solution exists when $\lambda \in\, ]0, \lambda_1[$, and, if N = 3, the
same is true for $\lambda \in\, ]\lambda^*, \lambda_1[$ and some $\lambda^* \in\, ]0, \lambda_1[$,
with the optimal value $\lambda^* = \lambda_1/4$ when Ω is a ball. For
$N \ge 4$, [9] has at least cat(Ω) nontrivial solutions
when $\lambda \in\, ]0, \lambda^{**}[$ for some $\lambda^{**} < \lambda_1$. Such a lack of
compactness, which can also occur for eqn [9] in $\mathbb{R}^N$
(nonlinear Schrödinger equation), is associated to the
invariance of f with respect to the action of some
noncompact group, coming, for example, from scale or
gauge invariance. P L Lions' concentration–compactness
method is useful to analyze those problems.

The following multiplicity theorem holds for an
$S^1$-invariant $f \in C^1(X,\mathbb{R})$ satisfying the Palais-Smale
condition. Let $\mathrm{Fix}(S^1) = \{0\}$ and Z be a closed
invariant vector subspace of X of positive finite
dimension. If f is bounded from below, $f(u) \le c < 0$
whenever $u \in Z$ and $\|u\| = r$, and $f(u) \ge 0$ for
$u \in \mathrm{Fix}(S^1) \cap (f')^{-1}(0)$, then f has at least (dim Z)/2
distinct $S^1$-orbits of critical points with critical
values less than or equal to c. This abstract theorem
provides multiplicity results for the periodic solutions
(closed orbits) of autonomous Hamiltonian
systems in $\mathbb{R}^{2n}$


$$J u' + \nabla H(u) = 0 \qquad [10]$$


where J is the symplectic matrix, $H \in C^1(\mathbb{R}^{2n},\mathbb{R})$,
and $c \in \mathbb{R}$ is such that $\nabla H(u) \ne 0$ for $u \in H^{-1}(c)$. If
$H^{-1}(c)$ bounds a strictly convex compact set C such
that $B[r] \subset C \subset B[R]$ for some $0 < r \le R < \sqrt{2}\,r$,
then [10] has at least n closed orbits on $H^{-1}(c)$. The
problem is reduced to finding the critical points of a
suitable dual action functional acting on some space
X of $2\pi$-periodic functions having mean value zero.
The $S^1$-action on X is defined by time translations
$[\gamma,u] \mapsto u_\tau = u(\cdot + \tau)$ for all $\gamma = e^{i\tau} \in S^1$. One takes,
in the abstract result above, $Z = \{(\cos t)e + (\sin t)Je : e \in \mathbb{R}^{2n}\}$,
so that dim Z = 2n. The complete
proof is quite involved, and, although some
improvements of the Ekeland–Lasry conditions have
been obtained, it remains an open problem whether
some pinching condition of the energy surface
between spheres or ellipsoids is necessary.


Some Extensions 


When dealing with unbounded functionals, it may
be convenient to replace the Ljusternik–Schnirelman
category $\mathrm{cat}_X(A)$ by a relative category $\mathrm{cat}_{X,Y}(A)$
with respect to a closed subset Y where, in the
covering of A occurring in the classical definition, a
set $A_0 \supset Y$ is added, which is continuously deformable
in X into a subset of Y in such a way that
points of Y remain in Y during the deformation.
Clearly $\mathrm{cat}_{X,\emptyset}(A) = \mathrm{cat}_X(A)$. This allows us to prove,
under some restrictions on the coefficients and the
period, the existence of at least four periodic
solutions for the double pendulum with periodic
forcing of mean value zero. The classical Ljusternik–Schnirelman
category gives at least three periodic
solutions without restrictions, and whether those
restrictions are necessary to obtain four solutions is open.

The relative category also gives a simpler proof of
Conley–Zehnder's version of the Arnol'd conjecture
(the existence of at least 2n + 1 geometrically distinct
1-periodic solutions for the Hamiltonian system


$$J u' + \nabla H(t,u) = 0$$


with H 1-periodic in each variable), under minimal
regularity assumptions upon H. The general conjecture,
namely that the minimum number of fixed
points of all Hamiltonian symplectomorphisms of a
closed symplectic manifold M is at least the
minimum number of critical points of smooth
functions f on M, remains open.

In another direction, a Ljusternik—Schnirelman 
theory for functionals defined on closed convex sets 
of a Banach space has been developed, which is 
specially well suited for the study of the Plateau 
problem for minimal surfaces, for surfaces of 
constant mean curvature, as well as for variational 
inequalities. 


See also: Bifurcations of Periodic Orbits; Compact 
Groups and Their Representations; Floer Homology; 
Ginzburg—Landau Equation; Inequalities in Sobolev Spaces; 
Minimal Submanifolds; Minimax Principle in the Calculus 
of Variations; Saddle Point Problems; Sine-Gordon 
Equation; Spectral Theory for Linear Operators. 
Introduction


Discrete Schrödinger operators with quasiperiodic
potentials are operators acting on $\ell^2(\mathbb{Z}^d)$ and defined by

$$H_\lambda = \Delta + \lambda V \qquad [1]$$

where Δ is the lattice tight-binding Laplacian

$$\Delta(n,m) = \begin{cases} 1, & \mathrm{dist}(n,m) = 1 \\ 0, & \text{otherwise} \end{cases}$$

and $V(n,m) = V_n\,\delta(n,m)$ is a potential given by
$V_n = f(T_1^{n_1} \cdots T_d^{n_d}\theta)$, $\theta \in \mathbb{T}^b$, where $T_i\theta = \theta + \omega_i$, and
ω is an incommensurate vector. In certain cases Δ
may also be replaced by a long-range Laplacian
$L(n,m) = L(n-m)$ with $L(n) \to 0$ sufficiently fast.
The questions of interest in the study of quasiperiodic
and other ergodic operators are the nature and
structure of the spectrum, behavior of the eigenfunctions,
and the quantum dynamics: properties of the
time evolution $\Psi_t = e^{itH}\Psi_0$ of an initially localized
wave packet $\Psi_0$.
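
As a purely numerical illustration of these objects, the sketch below restricts a one-dimensional operator of type [1] to a finite box and follows a wave packet started at a single site. Everything specific here is an assumption made for the example: the cosine sampling function, the box size, and the function names `box_hamiltonian` and `mean_square_displacement` are hypothetical and not taken from the text; a finite box only approximates the operator on the full lattice up to the time the packet reaches the boundary.

```python
import numpy as np


def box_hamiltonian(size, lam, omega, theta):
    """Restriction of Delta + lam*cos(2*pi*(theta + n*omega)) to `size` sites.

    Sites are labelled n = -size//2, ..., so that 0 sits in the middle.
    The cosine potential is an assumed example of a sampling function f.
    """
    n = np.arange(size) - size // 2
    potential = lam * np.cos(2.0 * np.pi * (theta + n * omega))
    hop = np.ones(size - 1)
    # Tight-binding Laplacian: matrix element 1 between nearest neighbours.
    H = np.diag(potential) + np.diag(hop, 1) + np.diag(hop, -1)
    return n, H


def mean_square_displacement(size, lam, omega, theta, t):
    """<x^2> at time t for the packet Psi_0 = delta at site 0, Psi_t = exp(itH) Psi_0."""
    n, H = box_hamiltonian(size, lam, omega, theta)
    evals, evecs = np.linalg.eigh(H)        # H is real symmetric
    psi0 = np.zeros(size)
    psi0[size // 2] = 1.0                   # site n = 0
    coeffs = evecs.T @ psi0                 # expansion in eigenvectors
    psi_t = evecs @ (np.exp(1j * t * evals) * coeffs)
    return float(np.sum(n ** 2 * np.abs(psi_t) ** 2))
```

With, say, `omega = (np.sqrt(5) - 1) / 2`, comparing `mean_square_displacement(201, 4.0, omega, 0.1, 20.0)` with the same call at coupling `0.5` gives a rough picture of the difference between confined and spreading packets, though a finite-time, finite-box computation is of course no substitute for the spectral statements discussed in this article.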

Of particular importance is the phenomenon of
Anderson localization, which usually refers to
the property of having pure point spectrum with
exponentially decaying eigenfunctions. A stronger
property of dynamical localization (see the section
“Dynamical localization”) indicates insulator
behavior, while ballistic transport, which for d = 1
follows from absolutely continuous spectrum,
indicates metallic behavior.

Operators with ergodic potentials always have
spectra (and pure point spectra, understood as closures
of the set of eigenvalues) constant for a.e. realization of
the potential. The individual eigenvalues, however,
depend very sensitively on the phase. Moreover, the
pure point spectrum of operators with ergodic
potentials never contains isolated eigenvalues, so pure
point spectrum in such models is dense in a certain
closed set. An easy example of an operator with dense
pure point spectrum is $H_\infty$, which is operator [1] with
$\lambda^{-1} = 0$, or pure diagonal. It has a complete set of
eigenfunctions, characteristic functions of lattice
points, with eigenvalues $V_j$. $H_\lambda$ may be viewed as a
perturbation of $H_\infty$ for small $\lambda^{-1}$. However, since the $V_j$
are dense, small denominators $(V_i - V_j)$ make any
perturbation theory difficult, for example, requiring
intricate KAM-type schemes.

Various methods developed for the Anderson
model (where the $V_n$ are i.i.d. random variables), such as the
Fröhlich–Spencer multiscale analysis and its enhancements, or
the Aizenman–Molchanov method, do not work for
quasiperiodic potentials as, among other reasons,
quasiperiodicity does not allow for nice perturbations.
The situation here is more difficult and the
theory is far less developed than for the random
case. With a few exceptions, the results are confined
to the one-dimensional setting, and also the case of
one frequency (b = 1) has been much better understood
than that of higher frequencies.

One might expect that $H_\lambda$ with λ small can be
treated as a perturbation of $H_0 = \Delta$, and therefore
have absolutely continuous spectrum. This is not the
case, though, for random potentials in d = 1, where
Anderson localization holds for all λ. The same is
expected for random potentials in d = 2 (but not
higher). Moreover, in the one-dimensional case, there
is strong evidence (numerical, analytical, as well as
rigorous) that even models with very mild stochasticity
in the underlying dynamics (and sufficiently
nice sampling functions) have point spectrum for
all values of λ, as in the random case (e.g.,
$V_n = \lambda f(n^\sigma\alpha + \theta)$, for any σ > 1). At the same time,
for quasiperiodic potentials, one can in many cases
show absolutely continuous spectrum for λ small
as well as pure point spectrum for λ large (see
below), and therefore there is a metal–insulator
transition in the coupling constant. It is an
interesting question whether quasiperiodic potentials
are the only ones with a metal–insulator
transition in 1D.


Perturbative and Nonperturbative 
Approaches 


It is probably fair to say that much of the theory of
quasiperiodic operators was first developed
around the almost-Mathieu operator, which is


$$H_{\lambda,\omega,\theta} = \Delta + \lambda f(\theta + n\omega) \qquad [2]$$


acting on $\ell^2(\mathbb{Z})$, with $f : \mathbb{T} \to \mathbb{R}$; $f(\theta) = \cos(2\pi\theta)$.
Several KAM-type approaches, starting with the
pioneering work of Dinaburg–Sinai in 1975, were
developed, in the 1980s and 1990s, for this or similar
models in both large and small coupling regimes. Of
those, the most robust and detailed is the reducibility
result of Eliasson (1998) that settled the case
of small couplings for sufficiently regular potentials.

The common feature of those perturbative 
approaches is that, besides all of them being rather 


intricate multistep procedures, they rely extensively 
on eigenvalue and eigenfunction parametrization 
and perturbation arguments. 

The common feature of the perturbative results in
the quasiperiodic setting is that they typically provide
no explicit estimates on how large (or small) the
parameter λ should be, and, more importantly, λ
clearly depends on ω at least through the constants in
the Diophantine characterization of ω.

In contrast, the nonperturbative results allow
effective (in many cases even optimal) and, most
importantly, independent of ω, estimates on λ. The
latter property (uniform in ω estimates on λ) has
often been taken as a definition of a nonperturbative result.

Recently developed nonperturbative methods are 
also quite different from the perturbative ones in that 
they do not employ multiscale schemes: usually only 
a few (from one to three) sufficiently large scales are 
involved, do not use the eigenvalue parametrization, 
and rely instead on direct estimates of the Green’s 
function. They are also significantly less involved, 
technically. One may think that in these latter 
respects they resemble the Aizenman—Molchanov 
method for random localization. It is, however, a 
superficial similarity, as, on the technical side, they 
are still closer to and do borrow certain ideas from 
the multiscale analysis proofs of localization. 


Lyapunov Exponents 


Here for simplicity we consider the quasiperiodic 
case, although the definition of the Lyapunov 
exponents and some of the mentioned facts apply 
more generally to the one-dimensional ergodic case. 

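Let d = 1. For an energy $E \in \mathbb{R}$ the Lyapunov
exponent γ(E) is defined as


$$\gamma(E) = \lim_{k\to\infty} \frac{1}{k} \int_{\mathbb{T}^b} \ln\|M_k(\theta,E)\|\,d\theta \qquad [3]$$

where

$$M_k(\theta,E) = \prod_{n=k-1}^{0} \begin{pmatrix} E - \lambda f(\omega n + \theta) & -1 \\ 1 & 0 \end{pmatrix}$$


is the k-step transfer matrix for the eigenvalue
equation $H\Psi = E\Psi$.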
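
A finite-step estimate of the Lyapunov exponent can be obtained by multiplying transfer matrices and renormalizing at each step. The sketch below is illustrative only: it assumes the cosine sampling function $f(\theta) = \cos(2\pi\theta)$ with one frequency, evaluates $(1/k)\ln\|M_k(\theta,E)\|$ at a single phase rather than averaging over θ, and uses the Frobenius norm; the name `lyapunov_estimate` and its default step count are hypothetical.

```python
import math


def lyapunov_estimate(energy, lam, omega, theta, steps=20000):
    """Estimate (1/k) ln ||M_k(theta, E)|| for the assumed potential lam*cos(2*pi*(theta+n*omega)).

    The running product [[a, b], [c, d]] is renormalised after every step
    so that it never overflows; the logarithms of the scale factors add up
    to ln ||M_k|| in the Frobenius norm.
    """
    a, b, c, d = 1.0, 0.0, 0.0, 1.0   # start from the identity matrix
    log_norm = 0.0
    for n in range(steps):
        v = lam * math.cos(2.0 * math.pi * (theta + n * omega))
        e = energy - v
        # Left-multiply the product by the one-step matrix [[e, -1], [1, 0]].
        a, b, c, d = e * a - c, e * b - d, a, b
        scale = math.sqrt(a * a + b * b + c * c + d * d)
        a, b, c, d = a / scale, b / scale, c / scale, d / scale
        log_norm += math.log(scale)
    return log_norm / steps
```

For example, with `omega = (math.sqrt(5) - 1) / 2` one can compare `lyapunov_estimate(0.0, 4.0, omega, 0.1)` with `lyapunov_estimate(0.0, 1.0, omega, 0.1)` to see the contrast between large and small coupling. A single-phase, finite-k value is only a proxy for the phase-averaged limit in the definition.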

In physics literature, positivity of the Lyapunov
exponent is often taken as an implicit definition of
localization, as the Lyapunov exponent is often called
the inverse localization length. Thus, we will be
interested in the regime when Lyapunov exponents
are positive for all energies in a certain interval
intersecting the spectrum. If this condition holds for
all $E \in \mathbb{R}$, there is no absolutely continuous
component in the spectrum for all θ. Positivity of
Lyapunov exponents, however, does not imply
localization or exponential decay of eigenfunctions
(in particular, neither for the Liouville ω nor for the
resonant $\theta \in \mathbb{T}^b$).

Nonperturbative methods, at least in their original 
form, stem to a large extent from estimates invol- 
ving the Lyapunov exponents and exploiting their 
positivity. 

The general theme of the results on positivity of
γ(E), as suggested by perturbation arguments, is that
the Lyapunov exponents are positive for large λ.
This subject has had a rich history. The strongest
result in this general context to date is the
following theorem (Bourgain 2003):


Theorem 1 Let f be a nonconstant real-analytic
function on $\mathbb{T}^b$, and H given by [1]. Then, for
$\lambda > \lambda(f)$, we have $\gamma(E) > (1/2)\ln\lambda$ for all E and all
incommensurate vectors ω.


Corollaries of Positive Lyapunov Exponents 


The almost-Mathieu operator. On the one hand the
almost-Mathieu operator, while simple looking,
seems to represent most of the nontrivial properties
expected to be encountered in the more general case.
On the other hand it has a very special feature: the
duality (essentially a Fourier) transform maps $H_\lambda$ to
$H_{4/\lambda}$; hence λ = 2 is the self-dual point. Aubry and
Andre, in 1980, conjectured that for this model, for
irrational ω, a sharp metal–insulator transition in the
coupling constant λ occurs at the critical value of
coupling λ = 2: the spectrum is pure point for λ > 2
and purely absolutely continuous for λ < 2. This
conjecture was modified based on later discoveries
of singular-continuous spectrum in this context for
frequencies or phases with certain arithmetic properties.
The modified conjecture stated pure point
spectrum for Diophantine ω and a.e. θ for λ > 2
and pure absolutely continuous spectrum for λ < 2
for all ω, θ. The spectrum at λ = 2 is singular
continuous for all ω and a.e. θ (this follows from a
combination of works by Gordon, Jitomirskaya,
Last, Simon, Avila, and Krikorian).
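
The duality can be made visible by a formal computation. The following is only a sketch: it assumes the sums converge (for instance for an absolutely summable eigenfunction ψ), and the auxiliary phase η and the sequence φ are notation introduced here for illustration.

```latex
\begin{aligned}
&\psi_{n+1} + \psi_{n-1} + \lambda\cos\bigl(2\pi(\theta + n\omega)\bigr)\,\psi_n = E\,\psi_n ,
\qquad
\varphi_m := \sum_{n} e^{2\pi i m(\theta + n\omega)}\, e^{2\pi i n\eta}\,\psi_n \\[4pt]
&\Longrightarrow\quad
\frac{\lambda}{2}\,(\varphi_{m+1} + \varphi_{m-1})
 + 2\cos\bigl(2\pi(\eta + m\omega)\bigr)\,\varphi_m = E\,\varphi_m \\[4pt]
&\Longleftrightarrow\quad
\varphi_{m+1} + \varphi_{m-1}
 + \frac{4}{\lambda}\cos\bigl(2\pi(\eta + m\omega)\bigr)\,\varphi_m
 = \frac{2E}{\lambda}\,\varphi_m .
\end{aligned}
```

The hopping term of the original equation turns into the cosine potential of the transformed one and vice versa, so the coupling λ is exchanged with 4/λ and the energy is rescaled by 2/λ; the two couplings coincide exactly at λ = 2.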

As with the KAM methods, the almost-Mathieu 
operator was the first model where the positivity of 
Lyapunov exponents was effectively exploited 
(Jitomirskaya 1999): 


Theorem 2 Suppose ω is Diophantine and $\gamma(E,\omega) > 0$
for all $E \in [E_1, E_2]$. Then the almost-Mathieu operator
has Anderson localization in $[E_1, E_2]$ for a.e. θ.


The condition on θ can be made explicit (arithmetic)
and close to optimal. This, combined with the
mentioned results on the Lyapunov exponents,
critical value λ = 2, and duality, gives the following
description in the Diophantine case:


Corollary 3 The almost-Mathieu operator $H_{\lambda,\omega,\theta}$
has


1° for λ > 2, Diophantine $\omega \in \mathbb{R}$ and almost every
$\theta \in \mathbb{R}$, only pure point spectrum with exponentially
decaying eigenfunctions.

2° for λ = 2, all $\omega \notin \mathbb{Q}$, and a.e. $\theta \in \mathbb{R}$, purely
singular-continuous spectrum.

3° for λ < 2, Diophantine $\omega \in \mathbb{R}$ and a.e. $\theta \in \mathbb{R}$,
purely absolutely continuous spectrum.


Precise arithmetic descriptions of ω, θ are available.
Thus, the Aubry–Andre conjecture is settled at
least for almost all ω, θ. One should mention,
however, that while 1° can be made optimal by
existing methods, both 2° and 3° are expected to
hold for all θ and all $\omega \notin \mathbb{Q}$, and such an extension
remains a challenging problem (see Simon (2000)).

The method in the above work, while so far the 
only nonperturbative method available allowing 
precise arithmetic conditions, uses some specific 
properties of the cosine. It extends to certain other 
situations, for example, quasiperiodic operators 
arising from Bloch electrons in a perpendicular 
magnetic field, where the lattice is triangular or 
has next-nearest-neighbor interactions. However, it 
does not extend easily to the multifrequency or even 
general analytic potentials. A much more robust 
method was developed by Bourgain—Goldstein 
(2000), which allowed them to extend (a measure- 
theoretic version of) the above localization result to 
the general real analytic as well as the multi- 
frequency case. Note that essentially no results 
were previously available for the multifrequency 
case, even perturbative. 


Theorem 4 Let f be nonconstant real analytic on
$\mathbb{T}^b$ and H given by [2]. Suppose $\gamma(E,\omega) > 0$ for
all $E \in [E_1, E_2]$ and a.e. $\omega \in \mathbb{T}^b$. Then for any θ,
H has Anderson localization in $[E_1, E_2]$ for a.e. ω.


Combining this with Theorem 1, Bourgain (2003)
obtained that for $\lambda > \lambda(f)$, H as above satisfies
Anderson localization for a.e. ω. Those results were
recently extended by S Klein to potentials belonging
to certain Gevrey classes. One very important
ingredient of this method is the theory of semialgebraic
sets that allows one to obtain polynomial
algebraic complexity bounds for certain “exceptional”
sets. Combined with measure estimates
coming from the large deviation analysis of
$(1/n)\ln\|M_n(\theta)\|$ (using subharmonic function theory
and involving approximate Lyapunov exponents),
this theory provides necessary information on the
geometric structure of those exceptional sets. Such
algebraic complexity bounds also exist for the
almost-Mathieu operator and are actually sharp,
albeit trivial, in this case due to the specific nature
of the cosine.

Further corollaries of positive Lyapunov exponents
for analytic sampling functions f and b = 1
include Hölder regularity of the integrated density
of states, zero-dimensionality of spectral measures
for all ω, θ, almost Lipschitz continuity of spectral
gaps, continuity of measure of the spectrum (in
frequency), and vanishing of lower transport
exponents for all ω, θ. Some weaker statements are
available for b > 1 or f belonging to certain Gevrey
classes.


Without Lyapunov Exponents 


While having led to significant advances, Lyapunov 
exponents have obvious limitations, as any method 
based on them is restricted to one-dimensional 
nearest-neighbor Laplacians. It turns out that the 
above methods can be extended to obtain nonper- 
turbative results in certain quasi-one-dimensional 
situations where Lyapunov exponents do not exist. 
For example, nonperturbative localization results 
extend to the strip (of arbitrary dimension). 

The following nonperturbative theorem deals with 
the case of small coupling: 


Theorem 5 Let H be an operator [2], where f is
real analytic on $\mathbb{T}$ and ω is Diophantine. Then, for
$\lambda < \lambda(f)$, H has purely absolutely continuous spectrum
for a.e. θ.


We note that an analog of this theorem does not 
hold in the multifrequency case (see next section). 
The results of this type are obtained by a method 
(developed by Bourgain and Jitomirskaya in 
2000-02) that studies large deviations for the 
quantities of the form (1/m)In|det(H — E),| and 
path-determinant expansion for the matrix elements 
of the resolvent. Those techniques apply also to 
certain other situations with long-range Laplacians, 
for example, the kicked-rotor model. Theorem 5 is a 
result on nonperturbative localization in disguise as 
it was obtained using duality from a localization 
theorem for a dual model which has in general a 
long-range Laplacian and a cosine potential, and 
was in turn obtained by an extension of the method 
of Jitomirskaya (1999). A certain measure-theoretic 
version of it allowing nonlocal Laplacians but 
leading only to continuous spectrum is also available 
(see Bourgain (2004)). 


Multidimensional Case: d > 1 


As mentioned above, there are very few results in
the multidimensional lattice case (d > 1). Essentially,
the only result that existed before the recent
developments was a perturbative theorem: an
extension by Chulaevsky–Dinaburg of Sinai's
method to the case of operator [1] on $\ell^2(\mathbb{Z}^d)$ with
$V_n = \lambda f(n\cdot\omega)$, $\omega \in \mathbb{R}^d$, where f is a cos-type function
on $\mathbb{T}$. This also holds nonperturbatively for any real-analytic
f (see Bourgain (2004)). Note that since
b = 1, this avoids most serious difficulties and is
therefore significantly simpler than the general
multidimensional case. We therefore have:


Theorem 6 For any ε > 0 there is λ(f, ε), and, for
λ > λ(f, ε), $\Omega(\lambda, f) \subset \mathbb{T}^d$ with mes(Ω) < ε, so that for
$\omega \notin \Omega$, operator [1] with $V_n$ as above has Anderson
localization.


This should be contrasted with the following
theorem of Bourgain:


Theorem 7 Let d = 2 and $f(\theta) = \cos 2\pi\theta$ in $H = H_\omega$
defined as above. Then for any λ, the measure of the set of ω such that
$H_\omega$ has some continuous spectrum is positive.


Therefore, for large A there will be both w with 
complete localization as well as those with at least 
some continuous spectrum. This shows that non- 
perturbative results do not hold in general in the 
multidimensional case! Perturbative results, how- 
ever, had been obtained, see next section. 

A similar (in fact, dual) situation is observed for 
one-dimensional multifrequency (d=1;b> 1) case 
at small disorder. One has, by duality: 


Theorem 8 Let H be given by [2] with $\theta, \omega \in \mathbb{T}^b$
and f real analytic on $\mathbb{T}^b$. Then for any ε > 0 there is
λ(f, ε) such that for λ < λ(f, ε) there is $\Omega(\lambda, f) \subset \mathbb{T}^b$ with
mes(Ω) < ε so that for $\omega \notin \Omega$, H has purely absolutely
continuous spectrum.


And also


Theorem 9 Let d = 1, b = 2 and f be a trigonometric
polynomial on $\mathbb{T}^2$ with a nondegenerate maximum.
Then for any λ, the measure of the set of ω such that $H_\omega$ has some point
spectrum, dense in a set of positive measure, is positive.


Therefore, unlike the b=1 case (see Theorem 5), 
nonperturbative results do not hold for absolutely 
continuous spectrum at small disorder. 


Perturbative Localization by 
Nonperturbative Methods 


While the above demonstrates the limitations of 
the nonperturbative results, the nonperturbative 


methods have been applied to significantly simplify 
the proofs and obtain new perturbative results that 
previously had been completely beyond reach. 

Many such applications, that are outside the scope of 
this article, are described in Bourgain (2004). In 
particular, new results on the construction of quasiper- 
iodic solutions in Melnikov problems and nonlinear 
PDEs, obtained by using certain ideas developed for 
nonperturbative quasiperiodic localization (e.g., the 
theory of semialgebraic sets), are presented there. 
Other results in this group contain localization for the 
skew-shift model by Bourgain—Goldstein—-Schlag, almost 
periodicity for the quantum kicked-rotor model by 
Bourgain and Bourgain—Jitomirskaya, and localization 
for potentials in higher Gevrey classes by S Klein. 

The main goal in a nonperturbative method is to 
obtain exponential off-diagonal decay for the matrix 
elements of the Green’s function of box-restricted 
operators along with subexponential bounds on the 
distance from the spectrum of such box restrictions 
to a given energy. From that result one can obtain 
localization through elimination of energy via an 
argument involving complexity bounds on semialge- 
braic sets (see Bourgain (2004)). 

A nonperturbative way to achieve the desired 
Green’s function estimates uses Cramer’s rule to 
represent the matrix elements of the resolvent. Then, 
in the one-dimensional (in space) case it is often 
possible to obtain the estimates from the positivity of 
Lyapunov exponents: uniformly for the numerator, 
and from large deviation bounds for the subharmonic 
functions for the denominator. This is done in one 
step for a sufficiently large scale (see the subsection 
“Corollaries of positive Lyapunov exponents”).
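
In the one-dimensional nearest-neighbor case the Cramer's rule representation can be written out explicitly. The following is a sketch under the conventions of [1] and [3] with a single frequency; the determinants $P_k$, the box $[x_1,x_2]$ and the Green's function symbol $G_{[x_1,x_2]}$ are notation introduced here for illustration.

```latex
\begin{aligned}
P_k(\theta) &= \det\bigl[(E - H_\theta)\big|_{[0,k-1]}\bigr],
\qquad P_0 = 1,\quad P_{-1} = 0, \\[4pt]
M_k(\theta,E) &=
\begin{pmatrix}
 P_k(\theta) & -P_{k-1}(\theta+\omega) \\
 P_{k-1}(\theta) & -P_{k-2}(\theta+\omega)
\end{pmatrix}, \\[4pt]
\bigl|G_{[x_1,x_2]}(x_1,y)\bigr| &=
\frac{\bigl|P_{x_2-y}\bigl(\theta+(y+1)\omega\bigr)\bigr|}
     {\bigl|P_{x_2-x_1+1}\bigl(\theta+x_1\omega\bigr)\bigr|},
\qquad x_1 \le y \le x_2 .
\end{aligned}
```

Since every entry of $M_k$ is bounded by $\|M_k\|$, the numerator is at most of order $e^{(\gamma(E)+\epsilon)(x_2-y)}$ for long blocks, while a lower bound of order $e^{(\gamma(E)-\epsilon)(x_2-x_1+1)}$ on the denominator, valid outside a small set of phases, would give roughly exponential decay of the ratio in $y - x_1$ when γ(E) > 0.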

A perturbative way consists of establishing the 
desired estimates in a multiscale scheme: namely, the 
estimates are proved outside a set of parameters of 
(subexponentially) decaying (in the size of the box) 
measure. Moreover, this set should be shown to have 
a semialgebraic description, in order to make possible 
sublinear upper bounds on the number of times a 
trajectory of a given phase (under the underlying 
rotation or other ergodic transformation of the torus) 
hits the “forbidden” set. This, plus certain subhar- 
monic function arguments, allows passage to a larger 
scale through a repeated use of the resolvent identity. 

An application that is most relevant to the current 
article is localization for a “true” d>1 situation. 
The best currently available result is the following 
very recent theorem (Bourgain 2005): 


Theorem 10 Let d = b and let f be real analytic on
$\mathbb{T}^d$ such that for all i = 1, ..., d and
$(\theta_1,\ldots,\theta_{i-1},\theta_{i+1},\ldots,\theta_d) \in \mathbb{T}^{d-1}$ the map


$$\theta_i \mapsto f(\theta_1,\ldots,\theta_i,\ldots,\theta_d)$$

is a nonconstant function of $\theta_i \in \mathbb{T}$. Then for any
ε > 0 there is λ(f, ε) such that for λ > λ(f, ε) there is
$\Omega(\lambda, f) \subset \mathbb{T}^d$ with mes(Ω) < ε so that for $\omega \notin \Omega$
operator [1] with $V_n = \lambda f(n_1\omega_1, \ldots, n_d\omega_d)$ has Anderson
(and dynamical) localization.


This result was obtained previously, for d=2 
only, by Bourgain, Goldstein, and Schlag. There 
were some serious purely arithmetic difficulties that 
prevented an extension of this result to higher 
dimensions. In the previous results on localization 
there were two major steps: estimations on the 
Green’s function for fixed energy and elimination of 
energy. The main difficulty in the multidimensional 
case lies in establishing the sublinear bound 
described above, that enters in the first step. It is 
for this bound that an arithmetic condition on w was 
needed. The condition used was to guarantee that 
the number of $(n_1,n_2) \in [1,N]^2$ such that $(n_1\omega_1,
n_2\omega_2) \pmod{\mathbb{Z}^2} \in S$ is bounded from above by $N^\alpha$ for
some $\alpha < 1$, uniformly for all semialgebraic sets $S$ of
degree D, with D’/D=o(1/N) and with the 
measure of all horizontal and vertical sections Sx 
satisfying logmesS,=o(log1/N). This condition 
roughly means that too many points close to an 
algebraic curve of a bounded degree would force it 
to oscillate more than it should. Such a statement is 
essentially two dimensional and not extendable to 
$d \geq 3$. In Theorem 10, Bourgain circumvents it by
using from the beginning the theory of semialgebraic 
sets to eliminate energy and the translation variable 
to get conditions on w (that depend on the potential) 
already in the first step. 


Dynamical Localization 


Anderson localization does not in itself guarantee 
absence of quantum transport, or nonspread of an 
initially localized wave packet, as characterized, for 
example, by boundedness in time of moments of the 
position operator. This was first observed in del Rio 
et al. (1996), where a rather artificial example of 
coexistence of exponential localization and quantum 
transport was constructed. However, such phenom- 
ena also happen in models of interest to physicists 
such as the random dimer model. Considering for 
simplicity the second moment 


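$$\langle x^2\rangle_T = \frac{1}{T}\int_0^T dt \sum_n |n|^2\,|\Psi_t(n)|^2$$

(with $\Psi_t = e^{-itH}\Psi_0$ the evolution of an initially localized wave packet $\Psi_0$),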


we will say that $H$ exhibits dynamical localization
if $\langle x^2\rangle_T < \mathrm{const}$. We will say that the family
$\{H_\theta\}_{\theta\in\mathbb{T}^b}$ exhibits strong dynamical localization if
$\int d\theta\, \sup_T \langle x^2\rangle_T < \mathrm{const}$. We note that the results
mentioned below will hold with more restrictive
definitions of dynamical localization (involving the
higher moments of the position operator) as well.
Dynamical localization implies pure point spectrum
by the RAGE theorem, so it is a strictly stronger notion.
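
The time-averaged second moment can be explored numerically. The following is a minimal sketch, not taken from the article: it assumes the almost-Mathieu normalization $V_n = 2\lambda\cos 2\pi(n\omega+\theta)$ with unit hopping, truncates the lattice to a finite box, and starts from a wave packet concentrated at the origin. The function name and all parameters are illustrative choices.

```python
import numpy as np


def averaged_second_moment(coupling, omega, theta=0.0,
                           half_width=100, t_max=20.0, n_times=200):
    """Approximate (1/T) * integral_0^T <x^2>(t) dt on the box [-half_width, half_width]."""
    sites = np.arange(-half_width, half_width + 1)
    # Assumed normalization: V_n = 2 * coupling * cos(2 pi (n omega + theta)).
    potential = 2.0 * coupling * np.cos(2.0 * np.pi * (sites * omega + theta))
    hopping = np.ones(sites.size - 1)
    hamiltonian = np.diag(potential) + np.diag(hopping, 1) + np.diag(hopping, -1)

    # Spectral decomposition gives exp(-i t H) psi0 for every t at once.
    energies, modes = np.linalg.eigh(hamiltonian)
    psi0 = np.zeros(sites.size)
    psi0[half_width] = 1.0            # wave packet initially at the origin
    coefficients = modes.T @ psi0

    moments = []
    for t in np.linspace(0.0, t_max, n_times):
        psi_t = modes @ (np.exp(-1j * energies * t) * coefficients)
        moments.append(np.sum(sites**2 * np.abs(psi_t) ** 2))
    # The mean over a uniform time grid approximates the time average.
    return float(np.mean(moments))


golden_mean = (np.sqrt(5.0) - 1.0) / 2.0
localized = averaged_second_moment(coupling=3.0, omega=golden_mean)
extended = averaged_second_moment(coupling=0.3, omega=golden_mean)
print(f"coupling 3.0: {localized:.3f}   coupling 0.3: {extended:.3f}")
```

With these assumed parameters the averaged moment stays small at large coupling and grows with the averaging time at small coupling. A finite box and a finite time window can only suggest, never prove, boundedness in time.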

It turns out that nonperturbative methods allow 
for such dynamical upgrades as well. For the almost- 
Mathieu operator, strong dynamical localization 
holds throughout the regime of localization. It was 
shown by Bourgain and Jitomirskaya that in 
Theorems 4 and 6 as well as some other localization 
results, dynamical localization also holds (see 
Bourgain (2004)). However, methods that require 
elimination of certain frequencies based on implicit 
conditions currently do not provide sufficient infor- 
mation to obtain strong (i.e., averaged) dynamical 
localization, like what was done in the almost- 
Mathieu case. 


Quasiperiodic Localization and Cantor 
Spectrum 


A remarkable feature of quasiperiodic operators 
with b=d=1 is their tendency to have Cantor 
spectrum. In particular, it was conjectured that all 
almost-Mathieu operators (for all nonzero couplings 
and all irrational frequencies) have Cantor spec- 
trum. This conjecture became known as the Ten 
Martini problem. In a significant recent develop- 
ment (Puig 2004), it was shown that for Diophan- 
tine frequencies Cantor structure of the spectrum 
follows from localization for phase $\theta=0$, with
corresponding eigenvalues being the boundaries of
noncollapsed gaps. The key idea here is that for
energies dual to eigenvalues of $H_0$, corresponding to
localized eigenfunctions, the rotation number of the
transfer-matrix cocycle is of the form $k\omega \pmod{\mathbb{Z}}$,
thus they are the ends of the gaps (possibly 
collapsed). However, a collapsed gap in this case 
would correspond to reducibility of the system to 
the identity which can be shown to contradict the 
simplicity of pure point spectrum for the dual 
model. Since those energies form a dense subset of 
the spectrum the result follows. The same idea 
works, thus establishing Cantor spectrum, for 
potentials that are generic in a certain sense. Localization
also played an important role in the final proof
of the Ten Martini conjecture, for all irrationals
(Avila and Jitomirskaya 2005). It can be shown that
proving localization for a large set of phases allows 
one to conclude reducibility of the transfer-matrix 
cocycle for the dual model, for a large set of 
energies, and this in turn can be shown to contradict 
the presence of an interval in the spectrum.


Introduction


Loop quantum gravity (LQG) is a mathematical 
formalism that defines a tentative quantum theory 
of spacetime. Equally, the formalism provides a 
description of the gravitational field in regimes in 
which its quantum properties cannot be neglected. 
The distinctive feature of LQG is to be a quantum 
field theory consistent with general relativity. 

According to general relativity, the physical fields 
that form the world do not live on a background 
spacetime. Rather, these fields make up spacetime 
themselves (“background independence”). Accord- 
ingly, the quanta of a quantum field theory compatible 
with this principle — the s-knots described below — do 
not live on a background spacetime: rather, they 
themselves form physical spacetime. 

This physical idea is realized in the formalism by 
the gauge invariance under active diffeomorphisms 
of the manifold on which the fields are originally 
defined (“diffeomorphism invariance”). Such gauge 
invariance renders the localization of the field’s 
excitations on the manifold physically irrelevant. 

LQG implements these physical motivations by 
merging two traditional lines of thinking in theoretical 
physics. The first is the long-standing idea that gauge 
fields are naturally understood in terms of variables 
associated to lines (holonomies of the gauge connection,
Wilson loops, Faraday lines, ...). This idea can be
traced to Faraday’s initial intuition that gave birth to 
modern field theory: physical fields are real entities 
formed by lines. The second is the background- 
independent canonical or covariant quantization of 
general relativity developed by following the ideas of 
Wheeler, DeWitt, and Hawking. Each of these two 
lines of research has encountered serious obstructions, 
but the two turn out to solve each other’s difficulties:
the formulation in terms of holonomies renders the old 
ill-defined background-independent quantum gravity 
well defined; conversely, background independence 
cures the divergences associated to the Wilson loop 
basis. 

The formalism of LQG can be separated into two
parts: a kinematics, describing the quantum properties
of space, and a dynamics, describing its
evolution. Here we outline the LQG kinematics,
and we give only the main result of the LQG
dynamics.

LQG can be extended to include standard matter
couplings such as fermions and Yang-Mills fields. It 
finds numerous applications, for instance, in early 
cosmology, astrophysics and black hole thermo- 
dynamics (see Black Hole Mechanics, Quantum 
Cosmology). 

So far no empirical evidence supports the physical 
correctness of this — nor of any other — tentative 
theory of quantum gravity. 


General Relativity in Canonical Form 


Classical general relativity is the field theory 
describing the gravitational field and the structure 
of physical spacetime. It is a well-established 
physical theory, strongly supported empirically. 

In its Riemannian version, the theory can be 
written in canonical form in terms of two fields on a 
three-dimensional (3D) manifold $\Sigma$ with coordinates
$x^a$ ($a=1,2,3$): a 2-form $E = E^a \epsilon_{abc}\, dx^b dx^c$, called the
“triad field” and a 1-form $A = A_a\, dx^a$, called the
“gravitational connection” ($\epsilon_{abc}$ is the totally antisymmetric
tensor density). Both take values in the
su(2) algebra, and they satisfy the three “constraint”
equations


$$G = D_a E^a = 0 \qquad [1]$$
$$C_a = \mathrm{tr}[F_{ab} E^b] = 0 \qquad [2]$$
$$C = \mathrm{tr}[F_{ab} E^a E^b] = 0 \qquad [3]$$


$D_a$ is the SU(2) covariant derivative defined by the
connection $A$, $F_{ab}$ is the SU(2) curvature of $A$, and
the trace is on su(2).

$E$ and $A$ are canonically conjugate: their Poisson
brackets are $\{E^a(x), A_b(y)\} = 8\pi G c^{-3}\, \delta^a_b\, \delta^3(x,y)$, where
$G$ is the Newton constant, $c$ is the speed of light, $\delta^a_b$ is
the Kronecker delta, and $\delta^3(x,y)$ is the Dirac delta on
$\Sigma$, which is a scalar density in $x$. The Poisson brackets
of $G$ with the fields define their SU(2) gauge
transformations: $E$ transforms in the adjoint representation
and $A$ transforms as a connection. The
Poisson brackets of $C_a$ (more precisely, of an
appropriate linear combination of $C_a$ and $G$) with
the fields determine their transformation under a
diffeomorphism of $\Sigma$: $E$ transforms as a 2-form and $A$
as a 1-form. The Poisson brackets of $C$ with the fields
generate their coordinate time evolution. If the $t$
derivatives of the fields $E(x^a,t)$ and $A(x^a,t)$ are
given by their Poisson brackets with (the 3D integral
of) $C$, then (assuming that the determinant
$E = \sqrt{\det \mathrm{tr}[E^a E^b]}$ does not vanish) the metric field
$g^{tt} = 1$, $g^{ta} = 0$, $g^{ab} = \mathrm{tr}[E^a E^b]/E$ is a general solution
of the Riemannian Einstein equations in a fixed gauge.
The physical Lorentzian theory can be obtained in
this formalism in two ways: either by adding an
appropriate term to eqn [3], or by taking $A$ in
sl(2,C) and imposing a suitable reality condition.
(For more details, see Canonical General Relativity.) 


Spin Network and s-Knot States 


LQG can be defined as a Schrödinger quantization 
of the canonical formalism described above. The 
space of the quantum states is defined as a Hilbert 
space $\mathcal{K}$ of Schrödinger wave functionals $\Psi[A]$ of the
gravitational connection. The nontrivial aspect of
this construction is the definition of a scalar product
invariant under the two kinematical gauge invariances
of the theory: the local SU(2) and the
diffeomorphism transformations generated by the
constraints [1] and [2]. The state space $\mathcal{K}$ is defined
as follows (see Quantum Geometry and its Applications
for an essentially equivalent construction).
Given an su(2) connection $A$ and an oriented path
$\gamma: s \in [0,1] \mapsto x^a(s) \in \Sigma$, recall that the “holonomy”
$U[A,\gamma]$ of $A$ along $\gamma$ is the element of SU(2) defined by


$$\frac{d}{ds} U[A,\gamma](s) + \dot\gamma^a(s)\, A_a(\gamma(s))\, U[A,\gamma](s) = 0 \qquad [4]$$


$$U[A,\gamma](0) = 1, \qquad U[A,\gamma] = U[A,\gamma](1) \qquad [5]$$


where $\dot\gamma^a(s) = dx^a(s)/ds$ is the tangent to the path.
The solution of this equation is usually written in
the form


$$U[A,\gamma] = P\, e^{-\int_\gamma A} \qquad [6]$$


where the path ordering $P$ is understood as acting on
the power series expansion of the exponential.
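
The defining equation [4] can be integrated numerically by multiplying many short-step exponentials, which is one way to read the path-ordered exponential. The sketch below is only an illustration under stated assumptions: the su(2) basis $\tau_i = -i\sigma_i/2$, the sample connections, the paths, and the function names are all hypothetical choices that do not come from the article.

```python
import numpy as np

# Pauli matrices and the su(2) basis tau_i = -i sigma_i / 2 (a common convention, assumed here).
SIGMA = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)
TAU = -0.5j * SIGMA


def exp_su2(x):
    """Closed-form exponential of a traceless anti-Hermitian 2x2 matrix."""
    rho = np.sqrt(max(np.linalg.det(x).real, 0.0))   # x @ x = -rho**2 * identity
    if rho < 1e-14:
        return np.eye(2, dtype=complex) + x
    return np.cos(rho) * np.eye(2) + (np.sin(rho) / rho) * x


def holonomy(connection, path, n_steps=2000):
    """Integrate dU/ds + (dx^a/ds) A_a U = 0 with U(0) = 1; later steps multiply from the left."""
    u = np.eye(2, dtype=complex)
    knots = np.linspace(0.0, 1.0, n_steps + 1)
    for s0, s1 in zip(knots[:-1], knots[1:]):
        dx = path(s1) - path(s0)
        a_mid = connection(path(0.5 * (s0 + s1)))    # shape (3, 2, 2): A_a at the midpoint
        step = np.einsum("a,aij->ij", dx, a_mid)
        u = exp_su2(-step) @ u
    return u


def constant_connection(x):
    # Hypothetical connection: A = 0.7 tau_3 dx^1.
    return np.array([0.7 * TAU[2], 0.0 * TAU[0], 0.0 * TAU[0]])


def sample_connection(x):
    # Hypothetical non-abelian connection: A = x^2 tau_1 dx^1 + x^1 tau_2 dx^2 + x^3 tau_3 dx^3.
    return np.array([x[1] * TAU[0], x[0] * TAU[1], x[2] * TAU[2]])


def straight_line(s):
    return np.array([s, 0.0, 0.0])


def unit_circle(s):
    return np.array([np.cos(2 * np.pi * s), np.sin(2 * np.pi * s), 0.0])


u_line = holonomy(constant_connection, straight_line)
u_loop = holonomy(sample_connection, unit_circle)
print(np.round(u_line, 6))
print(np.round(u_loop, 6))
```

For the constant connection the path ordering is irrelevant and the product reduces to a single exponential, which gives a simple check. For the loop, the result should be unitary with unit determinant up to rounding.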

Let $\mathcal{A}$ be the space of the smooth connections $A$ on
$\Sigma$. (For technical reasons, it is convenient to consider
smooth fields $A$ defined everywhere in $\Sigma$ except at
most at a finite number of points, and the group
$\mathrm{Diff}^*$ of the “extended diffeomorphisms” defined by
the continuous invertible maps $\phi: \Sigma \to \Sigma$ that are
smooth everywhere in $\Sigma$ except at most at a finite
number of points.) A graph $\Gamma$ is an ordered collection
of smooth oriented paths $\gamma_l$, denoted as links, with
$l=1,\ldots,L$, where the links overlap only at their
endpoints, called nodes. Given a graph $\Gamma$ and a
smooth, Haar-integrable complex function $f: U \in
(SU(2))^L \mapsto f(U) \in \mathbb{C}$, the couple $(\Gamma,f)$ defines the
(“cylindrical”) functional of $A$


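$$\Psi_{\Gamma,f}[A] = f(U[A,\Gamma]) \qquad [7]$$


$$U[A,\Gamma] = (U[A,\gamma_1], \ldots, U[A,\gamma_L]) \qquad [8]$$


Let $\mathcal{L}$ be the linear space of all functionals $\Psi_{\Gamma,f}[A]$,
for all $\Gamma$ and $f$. $\mathcal{L}$ is dense (in an appropriate sense) in
the space of all continuous functionals on $\mathcal{A}$.

An SU(2) and $\mathrm{Diff}^*$ invariant scalar product can
be defined in $\mathcal{L}$ as follows. If two functionals
$\Psi_{\Gamma,f}[A]$ and $\Psi_{\Gamma,g}[A]$ are defined by the same graph
$\Gamma$, define


$$\langle \Psi_{\Gamma,f} | \Psi_{\Gamma,g} \rangle = \int dU\, \overline{f(U)}\, g(U) \qquad [9]$$


where $dU$ is the Haar measure on $(SU(2))^L$. The
extension to functionals defined on different graphs
is obtained by observing that $(\Gamma,f)$ and $(\Gamma',f')$ define
the same functional if $\Gamma$ contains $\Gamma'$ and $f$ is
independent of the variables in $\Gamma$ but not in $\Gamma'$. It
follows that any two given functionals $\Psi_{\Gamma',f'}$ and
$\Psi_{\Gamma'',g''}$ can be written as functionals $\Psi_{\Gamma,f}$ and $\Psi_{\Gamma,g}$
with the same graph $\Gamma$, where $\Gamma$ is obtained from the
union of $\Gamma'$ and $\Gamma''$. Using this, the scalar product [9]
is defined for any two functionals in $\mathcal{L}$:


$$\langle \Psi_{\Gamma',f'} | \Psi_{\Gamma'',g''} \rangle = \langle \Psi_{\Gamma,f} | \Psi_{\Gamma,g} \rangle \qquad [10]$$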


Standard completion in the Hilbert norm defines the 
kinematical Hilbert space $\mathcal{K}$ of LQG. $\mathcal{L}$ is dense in $\mathcal{K}$
and defines the Gelfand triple $\mathcal{L} \subset \mathcal{K} \subset \mathcal{L}^*$. $\mathcal{K}$ carries
a natural unitary representation of the group of local
SU(2) transformations and a natural unitary representation
$U_\phi$ of the group of the extended diffeomorphisms
of $\Sigma$. These two properties are nontrivial;
they represent the main physical motivation for the
definition of the scalar product. The SU(2)-invariant
subspace of $\mathcal{K}$ is a proper subspace $\mathcal{K}_0$.

An orthonormal basis in $\mathcal{K}_0$ can be defined using
the Peter-Weyl theorem. The basis states are labeled
by a graph $\Gamma$, by the assignment of a nonvanishing
spin $j_\gamma$ to each link $\gamma \in \Gamma$ and by the assignment of a
basis element $i_n$ in the space of the intertwiners
(invariant tensors in the tensor product of the
representation spaces of the adjacent links) at each
node $n$ of $\Gamma$. The triple $S=(\Gamma, j_\gamma, i_n)$ is called an
imbedded spin network. The quantum state
$\Psi_S[A] = \langle A|S\rangle$ in $\mathcal{K}_0$ labeled by the spin network
$S=(\Gamma, j_\gamma, i_n)$ is the cylindrical function obtained by
contracting the representation matrices of the
holonomies $U[A,\gamma]$, in the representations $j_\gamma$, with
the invariant tensors at the nodes.
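
The simplest instance of this basis can be checked by direct integration. For a graph made of a single closed link, the gauge-invariant cylindrical functions are class functions of one SU(2) element, and the spin-network states reduce to the characters $\chi_j$. The sketch below verifies their orthonormality in the scalar product [9], assuming the Weyl integration formula for class functions, $dU \to (2/\pi)\sin^2\varphi\, d\varphi$ on $[0,\pi]$, where $e^{\pm i\varphi}$ are the eigenvalues of $U$. The function names are illustrative.

```python
import numpy as np


def character(j, phi):
    """Spin-j character of the SU(2) element with eigenvalues exp(+i phi), exp(-i phi)."""
    m = np.arange(-j, j + 1)                     # m = -j, -j+1, ..., j
    return np.real(np.exp(2j * np.outer(m, phi)).sum(axis=0))


def haar_inner_product(j, k, n_points=20000):
    """<chi_j | chi_k> using the Weyl formula dU -> (2/pi) sin^2(phi) dphi on [0, pi]."""
    phi = (np.arange(n_points) + 0.5) * np.pi / n_points   # midpoint rule
    integrand = np.sin(phi) ** 2 * character(j, phi) * character(k, phi)
    # (2/pi) * integral over [0, pi] equals 2 * (mean of the integrand).
    return 2.0 * integrand.mean()


for j, k in [(0.5, 0.5), (1.0, 1.0), (0.5, 1.0), (1.0, 2.0)]:
    print(j, k, round(haar_inner_product(j, k), 8))
```

Characters are real, so the complex conjugate in [9] plays no role here. Graphs with several links and nontrivial intertwiners need the full Peter-Weyl decomposition rather than this one-variable shortcut.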

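The diffeomorphism-invariant state space $\mathcal{K}_{\mathrm{diff}}$ is
the SU(2) and diffeomorphism invariant subspace of
$\mathcal{L}^*$. It is the (closure of the) image of the map
$P_{\mathrm{diff}}: \mathcal{L} \to \mathcal{L}^*$ defined by


$$(P_{\mathrm{diff}}\Psi)(\Psi') = \sum_{\Psi''=U_\phi\Psi} \langle \Psi'', \Psi'\rangle, \qquad \Psi, \Psi' \in \mathcal{K} \qquad [11]$$


The sum is over all states $\Psi''$ in $\mathcal{L}$ for which there
exists a diffeomorphism $\phi$ such that $\Psi'' = U_\phi\Psi$; this
is a finite sum. The scalar product on this image is
naturally defined by


$$\langle P_{\mathrm{diff}}\Psi_S, P_{\mathrm{diff}}\Psi_{S'}\rangle_{\mathrm{diff}} = (P_{\mathrm{diff}}\Psi_S)(\Psi_{S'}) \qquad [12]$$


The space $\mathcal{K}_{\mathrm{diff}}$ obtained in this manner is separable.
The images $|s\rangle = P_{\mathrm{diff}}|S\rangle$ of the spin network states
are called s-knot states. They span $\mathcal{K}_{\mathrm{diff}}$. They are
determined only by the diffeomorphism equivalence
class $s$ of the spin network $S$, namely, by an abstract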
(non-imbedded) knotted graph, colored with spins 
and intertwiners. These colored knots are called 
s-knots or abstract spin networks. The s-knot states 
have a straightforward physical interpretation as 
quantum excitations of space, discussed below. 


Operators and Quanta of Space 


The state space defined above carries a quantum 
representation of classical observables of general 
relativity. The classical quantity $U[A,\gamma]$, a function
of the field variable $A$, acts naturally as a multiplicative
operator on $\mathcal{K}$. Thus, $\mathcal{K}$ provides a
Schrödinger functional representation $\Psi[A]$ of quantum
gravity, which diagonalizes the (holonomy of
the) gravitational connection. The two constraints
[1] and [2] generate SU(2) gauge and diffeomorphism
transformations on $A$. The corresponding
transformations on the Schrödinger functional states
$\Psi[A]$ are given by the unitary representations
mentioned above. The quantum implementation of
the two constraint equations [1] and [2], following
Dirac’s theory of constrained quantum systems, is
the requirement of invariance under these transformations.
The space $\mathcal{K}_{\mathrm{diff}}$ is the solution to these
requirements.

The triad field operator E can be defined only if 
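suitably smeared. Since $E$ is a 2-form, its geometrically
natural smearing is with a 2D surface. (The
1-form field $A$ is smeared over a line in $U[A,\gamma]$.)
Given a finite 2D surface $S: \sigma = (\sigma^1,\sigma^2) \mapsto x^a(\sigma) \in \Sigma$,
the smeared field


$$E[S] = \int_S E = \int d^2\sigma\, \epsilon_{abc}\, \frac{\partial x^b}{\partial\sigma^1}\frac{\partial x^c}{\partial\sigma^2}\, E^a(x(\sigma)) \qquad [13]$$


is quantized by the functional derivative operator


$$E[S] = -i\hbar\, 8\pi G c^{-3} \int d^2\sigma\, \epsilon_{abc}\, \frac{\partial x^b}{\partial\sigma^1}\frac{\partial x^c}{\partial\sigma^2}\, \frac{\delta}{\delta A_a(x(\sigma))} \qquad [14]$$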


This operator is well defined on $\mathcal{K}$ and the quantum
operators $E[S]$ and $U[A,\gamma]$ define a linear representation
of the Poisson algebra of the corresponding
classical quantities. Thus, they define a quantization
of the kinematics of general relativity. Notice that in
a general covariant quantum field theory field 
operators can be well defined even if smeared on 
low-dimensional regions, while in conventional 
quantum field theory, these operators need to be 
smeared over 3D or 4D regions. 

A simple calculation shows that if $S$ and $\gamma$
intersect once,


$$E_v[S]\, U[A,\gamma] = \pm i\, 8\pi\hbar G c^{-3}\; U[A,\gamma_1]\, v\, U[A,\gamma_2] \qquad [15]$$


where $v \in su(2)$, we have written $E_v = \mathrm{tr}[vE]$, $\gamma_{1,2}$
are the two paths into which $\gamma$ is partitioned by the
surface, and the sign is determined by the relative
orientation of $S$ and $\gamma$. More generally, $E[S]U[A,\gamma]$
is a sum of one such term per intersection between $S$
and $\gamma$.

Composite operators can be constructed in terms 
of these operators. In particular, using standard 
formulas in classical general relativity, the area of 
the surface S can be written as a Riemann sum 


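$$A[S] = \lim_{N\to\infty} \sum_{n=1}^{N} \sqrt{\mathrm{tr}\big[E[S_n]\,E[S_n]\big]} \qquad [16]$$


where $S_n$, $n=1,\ldots,N$, is a Riemann partition of
the surface. A straightforward calculation based on
eqn [15] shows that, if $S$ cuts $n$ links of a spin
network carrying spins $(j_1,\ldots,j_n)=j$, then the spin
network state $|S\rangle$ is an eigenstate of $A[S]$ with
eigenvalue


$$A_j = 8\pi\hbar G c^{-3} \sum_{i=1,n} \sqrt{j_i(j_i+1)} \qquad [17]$$


where $j_i = 1/2, 1, 3/2, 2, \ldots$ These are therefore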
discrete eigenvalues of the area. All eigenvalues of 
the area operator A[S] are real and discrete and 
A[S] is a self-adjoint operator. Similar results are 
obtained for the volume operator. This gets a 
discrete contribution for each node of a spin 
network. 
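
As a small worked illustration of the area spectrum, the sketch below evaluates the sum $8\pi\sum_i\sqrt{j_i(j_i+1)}$ in units of $\hbar G c^{-3}$ for a given list of spins. The function name and the input validation are illustrative choices, and the Immirzi parameter is not included.

```python
import math
from fractions import Fraction


def area_eigenvalue(spins):
    """Area eigenvalue in units of hbar*G/c^3 for the spins of the links cut by the surface."""
    total = 0.0
    for spin in spins:
        j = Fraction(spin)
        # Admissible spins are the positive half-integers 1/2, 1, 3/2, 2, ...
        if j <= 0 or (2 * j).denominator != 1:
            raise ValueError(f"spin must be a positive half-integer, got {spin}")
        total += math.sqrt(float(j * (j + 1)))
    return 8.0 * math.pi * total


# Smallest nonzero eigenvalue: a single link of spin 1/2 crossing the surface.
smallest = area_eigenvalue([Fraction(1, 2)])
# A surface cut by one spin-1/2 link and one spin-1 link.
two_links = area_eigenvalue([Fraction(1, 2), 1])
print(smallest, two_links)
```

Because the eigenvalue is a sum over the links that cut the surface, the contributions are additive, and the smallest nonzero value comes from a single spin-1/2 link.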

These spectral properties of the area and volume 
operators determine the physical interpretation of 
the spin network states: the nodes of the spin 
network represent quanta of space with quantized 
volume; the nodes are connected by links represent- 
ing quanta of surface with quantized area. The 
graph $\Gamma$ determines the adjacency relations between
the individual quanta of space; the intertwiners $i_n$
are volume quantum numbers; the spins $j_\gamma$ are area
quantum numbers.

The interpretation carries over to the s-knots, which
represent the same quantum excitations of space, up to
its manifold coordinatization, which is physically
irrelevant because of the gauge invariance under
diffeomorphisms of $\Sigma$. An s-knot state $|s\rangle$ with $N$
nodes represents a quantum excitation of space with $N$
quanta of space adjacent to one another according to
the connectivity of $\Gamma$ (see Figure 1).


Figure 1 The graph of an s-knot, namely an abstract spin network,
and the set of quanta of space it represents. Each node $n$ of the
graph defines a quantum of space. The associated intertwiner $i_n$ is
the corresponding volume quantum number. Two quanta of space
are adjacent if the corresponding nodes are linked. A link $\gamma$ cuts
the elementary surface separating the two quanta and its spin $j_\gamma$ is
the area quantum number of this surface.

Notice that the quantum states $|s\rangle$ do not
represent quantum excitations living in the physical
space: they represent quantum excitations of the
physical space. For instance, the state $|0\rangle$ defined by
the empty graph does not represent an “empty” 
physical space, but the absence of any physical 
space. A generic quantum state of the physical space 
is represented by a normalizable linear superposition 
of these discrete quantized spacetimes (see Knot 
Invariants and Quantum Gravity). 

In a nongeneral covariant context, the kinematical 
quantization predictions of quantum theory (such as 
the quantization of the angular momentum) are 
obtained from the spectral properties of operators 
that represent measurements at a given time. In the 
general covariant Hamiltonian formalism, the corre- 
sponding kinematical quantization predictions are 
given by spectral properties of “partial observables” 
operators, which in general are not gauge invariant in 
the sense of Dirac. Area and volume are partial 
observables of this kind. Their spectra are therefore 
interpreted as physical predictions of LQG (up to an 
overall numerical factor, called the Immirzi parameter, 
which is obtained in certain variants of the theory). 


Dynamics 


The dynamics of the theory is obtained in terms of a 
“Hamiltonian constraint” operator C that quantizes 
the constraint [3]. Different variants of the operator 
C, and of its Lorentzian version, have been 
constructed. The operator is defined via a suitable 
regularization procedure. The description of these 
constructions exceeds the scope of this article, and
we limit ourselves here to mentioning the main result
and a few general comments.

The main result of the LQG dynamics is that C 
turns out to be well defined and ultraviolet-finite 
when restricted to $\mathcal{K}_{\mathrm{diff}}$. Finiteness holds also when
standard matter couplings, such as Yang-Mills fields
and fermions, are added.

The reason for this finiteness can be understood as a
consequence of the discrete nature of space implied by
the spectral properties of the geometric operators
described above. The limit in which the ultraviolet
cutoff, introduced to regulate $C$, is removed turns
out to be trivial on the diffeomorphism-invariant states
in $\mathcal{K}_{\mathrm{diff}}$. This is because this limit probes the short-distance
regime, but there is no physical (gauge-invariant)
short distance in a theory in which
geometry turns out to be quantized at the Planck
scale. Since the physical states in $\mathcal{K}_{\mathrm{diff}}$ define a physical
geometry only at scales larger than the Planck scale
$\sqrt{\hbar G c^{-3}}$, the “short-distance” modes in the coordinate
manifold $\Sigma$ turn out to be pure gauge. This interplay
between quantum field-theoretical and general-relativistic
physics is the distinctive character of LQG.

Finally, we sketch the formal structure that 
dynamics can take in the general covariant 
Hamiltonian formalism of LQG. The operator C 
defines a linear operator $P \sim \delta(C)$, usually (improperly)
denoted the “projector,” which sends states in
$\mathcal{K}_{\mathrm{diff}}$ into the kernel of $C$, formed by the generalized
$\mathcal{K}_{\mathrm{diff}}$ vectors that solve the Wheeler-De Witt equation
$C\Psi = 0$ (see Wheeler-De Witt Theory). Matrix
elements of $P$ are interpreted as transition amplitudes
between quantum states of space.

Physical predictions for processes that take place 
in a finite spacetime region R can be obtained, in 
principle, as follows. One considers a state $|\Psi\rangle$
representing the result of the measurement of partial
observables of the 3D boundary of a spacetime
region $R$. $|\Psi\rangle$ codes the nonrelativistic notions of
initial, boundary and final conditions. Then $\langle 0|P|\Psi\rangle$
can be interpreted as a relative probability amplitude
associated to this result. A formal expansion of
this amplitude in powers of C generates a spinfoam 
sum (see Spin Foams) that can be understood as the 
“quantum gravity sum over histories” in R. 

A systematic technique for computing physical 
transition amplitudes from the background- 
independent and nonperturbative formalism of 
LQG has not yet been developed. 


See also: BF Theories; Black Hole Mechanics; Canonical 
General Relativity; Knot Invariants and Quantum Gravity; 
Knot Theory and Physics; Quantum Cosmology; Quantum 
Dynamics in Loop Quantum Gravity; Quantum Geometry 
and its Applications; Spin Foams; Wheeler-De Witt Theory.


Introduction


Einstein’s (1916) use of differential geometry as an 
essential tool in his theory of general relativity has 
long been a motivation for the study of Lorentzian 
geometry. More recently, the influential mono- 
graphs of R Penrose (1972) and of S Hawking and 
G Ellis (1973), the latter still cited by some as the 
Bible of general relativity, so fascinated differential 
geometers that Lorentzian geometry took its place 
alongside of global Riemannian geometry as a 
worldwide research area. 

Let $M$ be a smooth $n$-dimensional manifold, $n \geq 2$,
with a countable basis. A Lorentz metric $g = \langle\ ,\ \rangle$
on $M$ is a symmetric nondegenerate (0, 2) tensor field
on $M$ of signature $(-, +, \ldots, +)$. The existence of such
a tensor field implies that $M$ admits a (non-oriented)
line field; hence, some compact manifolds like S* do 
not admit such metrics. A nonzero tangent vector $v$ in
$TM$ is then timelike (resp., nonspacelike, null, spacelike)
according to whether $g(v,v) < 0$ (resp.,
$\leq 0$, $= 0$, $> 0$). A Lorentzian manifold $(M,g)$ is a
pair consisting of a smooth manifold together with a 
choice of Lorentz metric. In this article, we use the 
convention that a spacetime (M,g) is a Lorentzian 
manifold together with a choice of time orientation, 
that is, a continuous timelike vector field X on M. 
Then a tangent vector v based at p may be 
consistently defined to be future (resp., past) directed 
if g(X(p),v) <0 (resp.,>0). (Some authors also 
require that (M, g) be space oriented.) If a Lorentzian 
manifold happens not to be time orientable, then a 
2-fold covering manifold with the induced pullback 
metric will be time orientable. Also basic are the
notations $p \ll q$ (resp., $p \leq q$) if there is a future-directed
timelike (resp., nonspacelike) curve from $p$
to $q$ and the corresponding chronological (resp.,
causal) future of $p$ given by $I^+(p) = \{q \in M;\ p \ll q\}$
and $J^+(p) = \{q \in M;\ p \leq q\}$.

For a Riemannian manifold $(N, g_0)$, the Riemannian
distance function


$$d_0 : N \times N \to [0, +\infty) \qquad [1]$$


is given by $d_0(p,q) = \inf\{L(c)\}$, the infimum being taken over
all piecewise smooth curves $c: [0,1] \to N$ with $c(0)=p$ and $c(1)=q$. A
fundamental result in global Riemannian geometry 
is the celebrated Hopf-Rinow theorem. 


Hopf-Rinow Theorem For any Riemannian
manifold $(N,g_0)$, the following conditions are
equivalent:


(i) metric completeness: $(N,d_0)$ is a complete
metric space;

(ii) geodesic completeness: for any $v$ in $TN$, the
geodesic $c_v(t)$ in $N$ with initial condition
$c_v'(0)=v$ is defined for all values of an affine
parameter $t$;

(iii) for some point $p$ in $N$, the exponential map
$\exp_p$ is defined on all of $T_pN$;

(iv) finite compactness: every subset $K$ of $N$ that is
$d_0$ bounded has compact closure.

Moreover, if any one of (i)-(iv) holds, then
$(N, g_0)$ also satisfies

(v) minimal geodesic connectedness: given any $p, q$
in $N$, there exists a smooth geodesic segment
$c:[0,1] \to N$ with $c(0)=p$, $c(1)=q$ and
$L(c) = d_0(p,q)$.


A Riemannian metric for a smooth manifold is 
then said to be complete if it satisfies any of the 
above properties (i) through (iv). The Heine—Borel 
property of basic topology implies (via (iv)) that all 
Riemannian metrics for a compact manifold are 
automatically complete and many of the examples 
studied in basic Riemannian geometry are complete.
Also, if Riem(N) denotes the space of all Riemannian
metrics for a smooth manifold $N$, both geodesic
completeness (property (ii) above) and geodesic
incompleteness (the failure of property (ii) to hold
for all geodesics) are $C^0$ stable properties on
Riem(N), that is, given a complete (resp., incomplete)
metric $g$ for $N$, there exists an open neighborhood
$U(g)$ of $g$ in Riem(N) in the Whitney $C^0$ fine
topology such that all Riemannian metrics $h$ in $U(g)$
are complete (resp., incomplete).

For spacetimes (M,g), however, many basic 
examples furnished by general relativity fail to be 
geodesically complete and compactness of the 
underlying smooth manifold M does not imply that 
the given Lorentz metric $g$ (let alone all Lorentz
metrics for $M$) is complete. Also, the stability of
geodesic completeness and incompleteness is more 
complicated than in the Riemannian case, necessi- 
tating concepts like pseudoconvex geodesic systems 
and disprisonment as studied by Beem and Parker. 
To summarize, for spacetimes and their associated 
Lorentzian distance functions, no naive analogs 
for the Hopf-Rinow theorem are valid. Under 
additional hypotheses, geodesic completeness may 
be guaranteed. Marsden noted that a compact 
spacetime with a homogeneous Lorentz metric is
geodesically complete. Then Carrière showed that a
compact spacetime whose curvature tensor vanishes 
is geodesically complete. Later Kamishima (assum- 
ing constant curvature) and then Romero and 
Sanchez more generally showed that a compact 
Lorentzian manifold which admits a timelike Killing 
field is geodesically complete. 

At any point p in a given spacetime, emanating 
from p are three families of geodesics: timelike, 
spacelike, and null. It was hoped in the 1960s that 
possibly continuity arguments could be obtained for 
different types of geodesic completeness. However, a 
series of examples showed by the mid-1970s that 
timelike geodesic completeness, null geodesic com- 
pleteness, and spacelike geodesic completeness are 
logically inequivalent. (Here, a given geodesic is said 
to be complete if it may be extended to be defined 
for all values of an affine parameter.) Nomizu and
Ozeki showed for Riemannian manifolds that any
given Riemannian metric $g_0$ for the smooth manifold
$N$ could be made geodesically complete by
making a conformal change of metric $\Omega g_0$, where
$\Omega: N \to (0, +\infty)$ is a smooth function. Especially in
general relativity, such conformal changes are
natural because the causal character of tangent
vectors and curves (and hence the basic causality
conditions) is preserved. For spacetimes, while
nonspacelike geodesic completeness could not
generally be produced by conformal changes, for some
subclasses of spacetimes, such as the strongly causal
ones, it was possible with a global conformal
change.

For a large class of spacetimes, the warped or 
multiwarped products (originally inspired by several 
cosmological models in general relativity and a basic 
construction from Riemannian geometry), explicit 
integral criteria involving the warping functions
have been given for timelike or null geodesic 
completeness. Several early examples of this type 
of result are discussed in Beem et al. (1996, 
pp. 111-112). 


Lorentz Distance and the Nonspacelike 
Cutlocus 


For an arbitrary, not necessarily complete, Riemannian 
manifold $(N, g_0)$, the Riemannian distance function
given in eqn [1] is continuous, the metric topology
induced by $d_0$ coincides with the given manifold
topology, and $d_0(p,q)$ is finite for all $p,q$ in $N$.
Now, for an arbitrary spacetime $(M,g)$, and $p,q$
in $M$, if there is no future-directed nonspacelike
curve from $p$ to $q$, set $d(p,q) = 0$; if there is such a
curve, let


$$d(p,q) = \sup\{L(c);\ c:[0,1]\to(M,g) \text{ is a piecewise smooth future-directed nonspacelike curve with } c(0)=p \text{ and } c(1)=q\} \qquad [2]$$


(Unlike the Riemannian case, [2] does not bound
$d(p,q)$ from above by $L(c)$ for any selected curve $c$
and hence the Lorentz distance may assume the
value $+\infty$.)

This then defines what some authors term the 
“Lorentzian distance function” 


$$d = d(g): M \times M \to [0, +\infty] \qquad [3]$$


and other authors term “proper time.” It is linked to
the causal structure of the given spacetime since


$$d(p,q) > 0 \text{ iff } q \text{ is in } I^+(p) \qquad [4]$$


and in place of the triangle inequality for the
Riemannian distance function, a reverse triangle
inequality holds:


$$\text{if } p \leq r \leq q, \text{ then } d(p,q) \geq d(p,r) + d(r,q) \qquad [5]$$


Also in the context of eqn [2], a future-directed 
nonspacelike curve $c:[0,1] \to M$ from $c(0)=p$ to
$c(1)=q$ is defined to be maximal if $L(c)=d(p,q)$.
Corresponding to the Riemannian theory, a maximal
nonspacelike curve turns out to be a smooth
null or timelike geodesic segment.
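
In Minkowski space the supremum in [2] is attained by the straight segment, so the Lorentzian distance has a closed form that makes properties [4] and [5] easy to check. The sketch below is an illustration for that flat case only, with signature $(-,+,\ldots,+)$ and points given as tuples $(t, x_1, \ldots)$. The function name is an illustrative choice.

```python
import math


def lorentz_distance(p, q):
    """Lorentzian distance in Minkowski space, signature (-,+,...,+); points are (t, x1, ...)."""
    dt = q[0] - p[0]
    dx2 = sum((b - a) ** 2 for a, b in zip(p[1:], q[1:]))
    interval = dt * dt - dx2
    if dt < 0 or interval < 0:      # q is not in the causal future of p
        return 0.0
    return math.sqrt(interval)


p, r, q = (0.0, 0.0), (2.0, 1.0), (5.0, 0.0)
direct = lorentz_distance(p, q)
via_r = lorentz_distance(p, r) + lorentz_distance(r, q)
print(direct, via_r)                                  # the direct distance is the larger one
print(lorentz_distance((0.0, 0.0), (1.0, 3.0)))       # spacelike separation gives zero
```

The detour through $r$ is shorter in proper time than the straight segment, which is the reverse triangle inequality at work. Distances to points outside the causal future vanish, as in the definition preceding [2].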


As mentioned earlier, geodesic completeness is 
generally not a natural requirement to place on 
a spacetime. But what emerges from [4] in place of 
Riemannian completeness is an interplay between 
the causal properties of the given spacetime and 
the continuity (and other properties) of the 
Lorentzian distance function (cf. Beem et al. (1996, 
chapter 4)). At the extreme of totally vicious 
spacetimes, the Lorentz distance is always $+\infty$. 
Less drastically, if (M,g) contains a closed timelike 
curve passing through p, then $d(p,q) = +\infty$ for all 
q in $J^+(p)$. Also, certain cosmological models 
contain pairs of points at infinite distance. In 
general, Lorentzian distance is only lower semicon- 
tinuous. Adding upper semicontinuity forces a 
distinguishing spacetime to be causally continuous. 
A spacetime is chronological iff d(p,p)=0 for all p 
in M. At the other extreme from totally vicious 
spacetimes are globally hyperbolic spacetimes, 
which share many properties somewhat analogous 
to complete Riemannian manifolds. The Lorentzian 
distance function of a globally hyperbolic spacetime 
is both continuous and finite valued. (Indeed, a 
strongly causal spacetime is globally hyperbolic iff 
all Lorentz metrics g' in the conformal class 
C(M,g) also have finite-valued distance functions 
d(g').) Second, corresponding to property (v) of the 
Hopf–Rinow Theorem, these spacetimes all satisfy 
maximal nonspacelike geodesic connectability: 
given any p, q in M with $p \le q$, there exists a 
future nonspacelike geodesic segment $c\colon [0,1] \to M$ 
with c(0) = p, c(1) = q and L(c) = d(p,q). 

A basic concept from the calculus of variations is 
that of a pair of conjugate points along a geodesic 
segment $c\colon [0,a] \to (M,g)$. A smooth vector field 
J(t) along c is said to be a “Jacobi field” if J satisfies 
the Jacobi differential equation 


$$J'' + R(J, c')\,c' = 0 \qquad [6]$$


where R denotes the curvature tensor. Then 
c(t), c(s) are said to be conjugate points along c if 
there exists a nonzero Jacobi field J along c with 
J(t) = J(s) = 0. Many of the basic comparison techniques 
in global Riemannian geometry involving 
lengths of geodesics in manifolds satisfying curva- 
ture inequalities, such as the “Rauch comparison 
theorems,” the “Toponogov triangle comparison 
theorem,” and volume comparison theorems, were 
first obtained through Jacobi field techniques 
(cf. Petersen (1998) for a contemporary account). 
Later, Riccati equation techniques became more 
popular (cf. Karcher (1989)). For spacetimes, espe- 
cially in the globally hyperbolic case, analogous 
results have been obtained for nonspacelike geodesic 
segments, with a key breakthrough in 1979 being 
Harris’s version of the “Toponogov triangle com- 
parison theorem” for timelike geodesic triangles in 
globally hyperbolic spacetimes. The Raychaudhuri 
equation used earlier in general relativity corre- 
sponds for spacetimes to this passage in the 
Riemannian setting from the Jacobi equation to the 
Riccati equation. The basic conjugate point theory 
and the Morse index theory for an arbitrary timelike 
or null geodesic segment in a general spacetime are 
reasonably close to the earlier Riemannian theory, if 
vector fields of the form $J(t) = f(t)\beta'(t)$ are accounted 
for in the case of a null geodesic segment 
$\beta\colon [0,1] \to (M,g)$. But spacelike geodesics and 
conjugate points are more problematic, as was first 
established using symplectic techniques by Helfer in 
1994. More recently, progress has been made in 
applying important ideas of Gromov (1999) for 
Riemannian manifolds to the spacetime context 
(cf. Noldus (2004) for an example). 
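
To make the Jacobi equation [6] and the notion of conjugate points concrete, here is a minimal numerical sketch in the simplest Riemannian model case. It assumes a unit-speed geodesic on a surface of constant Gauss curvature K, where a normal Jacobi field has the form J(t) = f(t)E(t) with E parallel and [6] reduces to the scalar equation f'' + K f = 0. The function name, step size, and time horizon are illustrative choices.

```python
import math

def first_conjugate_time(K, t_max=10.0, dt=1e-3):
    """Integrate f'' + K f = 0 with f(0) = 0, f'(0) = 1 by classical RK4.

    Returns the first t > 0 at which f vanishes (located by linear
    interpolation between steps), or None if f has no zero before t_max.
    """
    def deriv(f, g):
        return g, -K * f

    f, g, t = 0.0, 1.0, 0.0
    while t < t_max:
        k1f, k1g = deriv(f, g)
        k2f, k2g = deriv(f + 0.5 * dt * k1f, g + 0.5 * dt * k1g)
        k3f, k3g = deriv(f + 0.5 * dt * k2f, g + 0.5 * dt * k2g)
        k4f, k4g = deriv(f + dt * k3f, g + dt * k3g)
        f_new = f + dt * (k1f + 2 * k2f + 2 * k3f + k4f) / 6
        g_new = g + dt * (k1g + 2 * k2g + 2 * k3g + k4g) / 6
        if f > 0.0 and f_new <= 0.0:
            return t + dt * f / (f - f_new)
        f, g, t = f_new, g_new, t + dt
    return None

print(first_conjugate_time(1.0))   # close to pi on the unit sphere
print(first_conjugate_time(0.0))   # None: no conjugate points in the plane
```

For K > 0 the first zero sits at $\pi/\sqrt{K}$, while for K ≤ 0 the solution never returns to zero, so no conjugate points occur along the geodesic.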

Inspired by fundamental concepts in global 
Riemannian geometry, Beem and Ehrlich in 1979 
introduced the concept of nonspacelike cut 
point, again most tractable for globally hyperbolic 
spacetimes. Let $\gamma\colon [0,a) \to (M,g)$ be a future- 
inextendible, future-directed nonspacelike geodesic 
in an arbitrary spacetime. Define 


$$t_0 = \sup\{\, t \in [0,a) : d(\gamma(0), \gamma(t)) = L(\gamma|_{[0,t]}) \,\} \qquad [7]$$


(If there is a closed timelike curve through $\gamma(0)$, 
then $d(\gamma(0), \gamma(0)) = +\infty$ and $t_0$ will not exist. If $\gamma$ is 
a nonspacelike geodesic ray and hence 
$d(\gamma(0), \gamma(t)) = L(\gamma|_{[0,t]})$ for all t, then $t_0 = a$.) How- 
ever, if $0 < t_0 < a$, then $\gamma(t_0)$ is said to be the future 
nonspacelike cut point of $p = \gamma(0)$ along $\gamma$. For 
general spacetimes, it may be shown that: 


1. for $0 < s < t < t_0$, $\gamma|_{[s,t]}$ is the unique 
maximal nonspacelike geodesic in all of (M,g) 
between $\gamma(s)$ and $\gamma(t)$; 

2. $\gamma|_{[0,t]}$ is maximal for all t with $0 < t \le t_0$; and 

3. for all t with $t_0 < t < a$, there is a longer 
nonspacelike curve in (M,g) than $\gamma|_{[0,t]}$ between 
$\gamma(0)$ and $\gamma(t)$. 


A nonspacelike cut point is a subtler concept than 
a nonspacelike conjugate point since the existence of 
a cut point is not necessarily captured by the 
behavior of families of future nonspacelike curves 
(or geodesics) close to the given geodesic segment $\gamma$, 
the basic viewpoint of the calculus of variations. But 
since calculus of variations arguments show that 
past a nonspacelike conjugate point, longer “neigh- 
boring curves” join $\gamma(0)$ to $\gamma(t)$, the future cut point 
of $p = \gamma(0)$ along $\gamma$ comes no later than the first 
future conjugate point to p along $\gamma$ in either the 
timelike or null geodesic case. 

In a startling result which contradicted erroneous 
arguments in all the standard textbooks, Margerin 
in 1993 gave examples to show that even for 
compact Riemannian manifolds, the first conjugate 
locus of a point (i.e., the set of all first conjugate 
points along all geodesics issuing from a given point) 
need not be closed, even though elementary argu- 
ments correctly show that the cut locus of any point 
(i.e., the set of all cut points along all geodesics 
issuing from the given point) is always closed. The 
timelike first conjugate locus of a point in a 
spacetime will generally not be closed, but because 
a nonspacelike geodesic in a globally hyperbolic 
spacetime must escape from any compact subset in 
finite affine parameter, the future (or past) first 
nonspacelike conjugate locus of any point in such a 
spacetime is a closed subset. In a result analogous to 
the Riemannian characterization, nonspacelike cut 
points in globally hyperbolic spacetimes may be 
characterized as follows: let $q = \gamma(t_0)$ be the future 
cut point of $p = \gamma(0)$ along the timelike (resp., null) 
geodesic segment $\gamma$ from p to q. Then either one or 
both of the following conditions hold: (1) q is the 
first future conjugate point to p along $\gamma$, or (2) there 
exist at least two maximal timelike (resp., null) 
geodesic segments from p to q. 

Now given p in an arbitrary spacetime (M, g), the 
future timelike (resp., null) cut locus of p is defined 
to be the set of all timelike (resp., null) cut points 
along all future timelike (resp., null) geodesics 
issuing from p and the future nonspacelike cut 
locus of p is defined as the union of the future 
timelike and null cut loci. Employing alternatives 
(1) and (2) in the preceding paragraph, it may be 
shown for globally hyperbolic spacetimes that the 
null and nonspacelike cut loci are closed subsets 
of M. 

The null cut locus has a privileged status 
by virtue of a phenomenon not encountered for 
Riemannian manifolds. Under a conformal change 
of background spacetime metric, null geodesics 
remain null pregeodesics (i.e., may be reparame- 
trized to be null geodesics in the deformed Lorentz 
metric) while such deformations fail to preserve 
timelike or spacelike geodesics, or to preserve 
geodesics in the Riemannian case. Even though 
null conjugate points along a null geodesic will not 
remain invariant under conformal change of space- 
time metric, it is remarkable that elementary 
arguments involving the spacetime distance func- 
tion show that global conformal diffeomorphisms 
do preserve null cut points and hence the null cut 
locus of any point. 


Geodesic Incompleteness and the 
Lorentzian Splitting Theorem 


In global Riemannian geometry, an important concept 
is that of a geodesic ray. In a complete Riemannian 
manifold $(N, g_0)$, a unit geodesic $c\colon [0,+\infty) \to (N, g_0)$ 
is said to be a (geodesic) ray if $d_0(c(0), c(t)) = t$ 
for all $t \ge 0$. By the triangle inequality, c is 
minimal between every pair of its points. By making a 
limit construction, it may be shown that for each p in 
N, there exists a geodesic ray c(t) with c(0) = p. An 
allied concept is that of a (geodesic) line $c\colon \mathbb{R} \to (N, g_0)$; 
here $d_0(c(t), c(s)) = |t - s|$ for all t, s is required, 
that is, c is minimal between every pair of its points. 
The existence of a line is much stronger than the 
existence of a ray. If $(N, g_0)$ has positive Ricci 
curvature everywhere, then $(N, g_0)$ contains no lines 
despite the fact that it contains a ray issuing from 
each point. A helpful tool in this setting is the 
compactness of sets of tangent vectors of the form 


$$\{\, w \in T_pN : g_0(w,w) = 1 \,\} \qquad [8]$$


for any p in N; hence, any infinite sequence of 
tangent vectors based at p automatically has a 
convergent subsequence. 

For spacetimes, geodesic completeness cannot 
generally be assumed. Yet a future nonspacelike 
geodesic ray $\gamma\colon [0,b) \to (M,g)$ may be defined to be 
a future-directed, future-inextendible nonspacelike 
geodesic with $d(\gamma(0), \gamma(t)) = L(\gamma|_{[0,t]})$ for all t in 
[0, b). The reverse triangle inequality implies that $\gamma$ 
is maximal between any pair of its points. Similarly, 
a nonspacelike geodesic line $\gamma\colon (a,b) \to (M,g)$ is a 
past- and future-inextendible nonspacelike geodesic 
with $d(\gamma(s), \gamma(t)) = L(\gamma|_{[s,t]})$ for all $s \le t$. Hence, $\gamma$ is 
maximal between any pair of its points. If nonspace- 
like geodesic completeness is assumed, $a = -\infty$ and 
$b = +\infty$ above. Constructions here are more delicate 
than in the Riemannian case because the sets 


$$\{\, v \in T_pM : g(v,v) = -1 \,\} \qquad [9]$$


of unit timelike tangent vectors, while closed in the 
tangent space, are noncompact. Despite this techni- 
cality, using the limit curve machinery of general 
relativity in place of the compactness in [8], it has 
been shown that a strongly causal spacetime admits 
a past and future nonspacelike geodesic ray issuing 
from every point (cf. Beem et al. (1996, chapter 8)). 
(If the spacetime is not nonspacelike geodesically 
complete, these rays will not necessarily be past or 
future complete.) As in the Riemannian case, the 
existence of a complete line is a stronger geometric 
condition. For that reason, in 1977 Beem and 
Ehrlich introduced the concept of a spacetime 
causally disconnected by a compact set K and 


showed that a strongly causal spacetime which is 
causally disconnected by a compact set contains a 
nonspacelike geodesic line which intersects the 
compact set. (Again, unless the spacetime is non- 
spacelike geodesically complete, this line need not be 
future or past complete.) 

A pattern common to many results in global 
Riemannian geometry especially since the 1950s is 
the following: the existence of a complete Riemannian 
metric on a smooth manifold which also satisfies a 
global curvature inequality implies a topological or 
geometric conclusion. A celebrated early example 
from the 1950s and 1960s, obtained by separate 
results of Rauch, Berger, and Klingenberg, is the 
topological sphere theorem. 


Topological Sphere Theorem Suppose $(N, g_0)$ is a 
complete, simply connected Riemannian n-manifold 
whose sectional curvatures satisfy $1/4 < K \le 1$. 
Then N is homeomorphic to $S^n$. 


By contrast, for spacetimes, the assumption of 
geodesic completeness is generally unwarranted. 
Here is an example of one of the celebrated 
singularity theorems of general relativity, published 
in 1970 as originally stated: 


Hawking–Penrose Singularity Theorem No spacetime 
(M, g) of dimension $n \ge 3$ can satisfy all of the 
following three requirements together: 


(i) (M, g) contains no closed timelike curves; 
(ii) Every inextendible nonspacelike geodesic in 
(M, g) contains a pair of conjugate points; and 
(iii) There exists a future- or past-trapped set S in 
(M, g). 


This theorem may be reinterpreted more akin to 
the Riemannian pattern above as follows: suppose 
(M,g) is a chronological spacetime of dimension 
$n \ge 3$ which satisfies the timelike convergence 
condition ($\mathrm{Ric}(v,v) \ge 0$ for all timelike tangent 
vectors) and the generic condition (every inextend- 
ible nonspacelike geodesic contains a point which 
has some appropriate nonzero sectional curvature). 
If (M, g) contains a future- or past-trapped set, then 
(M,g) is nonspacelike geodesically incomplete. 
Hence, this result models the pattern: global 
curvature inequalities (reflecting the physical 
assumptions that gravity is attractive 
and every inextendible nonspacelike geodesic experi- 
ences tidal acceleration) and a further physical or 
geometric assumption (the first and third conditions) 
imply the existence of an incomplete timelike or 
null geodesic. 

An influential concept in global Riemannian 
geometry formulated during the 1960s and 1970s 
is that of curvature rigidity, which first became 
widely known through the introduction to the text 
Cheeger and Ebin (1975). The above statement of 
the “sphere theorem” contains one hypothesis that 
the sectional curvature is strictly greater than 1/4. 
In curvature rigidity, the hypothesis of strict 
inequality is relaxed to include the possibility of 
equality as well, and then one tries to show that 
either the old conclusion is still valid, or if it fails, it 
fails in an isometric (hence “rigid”) manner. Thus 
in the example of the sphere theorem, if the 
sectional curvature is now allowed to satisfy $1/4 \le 
K \le 1$, then either the given Riemannian manifold 
remains homeomorphic to the n-sphere, or if not, it 
is isometric to a Riemannian symmetric space of 
rank 1. 

Already in an article in 1970, Geroch had 
expressed the opinion that most spacetimes should 
be nonspacelike geodesically incomplete and also 
that a spacetime should fail to be nonspacelike 
geodesically incomplete only under special circum- 
stances. Apparently by the early 1980s, S T Yau had 
formulated the idea that timelike geodesic incom- 
pleteness of spacetimes ought to display a curvature 
rigidity. In the paragraph following the statement of 
the Hawking–Penrose singularity theorem, there are 
two curvature conditions mentioned: the timelike 
convergence condition and the generic condition. 
Now the timelike convergence condition already 
allows for the case of equality (i.e., zero timelike 
Ricci curvature) in its formulation; hence, curvature 
rigidity here would imply dropping the generic 
condition that each inextendible nonspacelike geo- 
desic contains a point of nonzero sectional curva- 
tures as a hypothesis. This notion seems first to have 
been published by Yau’s Ph.D. student R Bartnik in 
1988 as follows: 


Conjecture Let (M,g) be a spacetime of dimension 
$\ge 3$ which 


(i) contains a compact Cauchy surface and 
(ii) satisfies the timelike convergence condition 
$\mathrm{Ric}(v,v) \ge 0$ for all timelike v. 


Then either (M, g) is timelike geodesically incom- 
plete, or (M,g) splits isometrically as a product 
$(\mathbb{R} \times V, -dt^2 + h)$ where (V,h) is a compact 
Riemannian manifold. 


This conjecture has been proven in many cases 
with the following proof scheme. From the physical 
or geometric assumptions made, produce an 
inextendible nonspacelike geodesic line. Further, 
prove that the line happens to be timelike rather 
than null. Then if the spacetime were timelike 
geodesically complete, it would contain a complete 
timelike line. But then the desired splitting may be 
obtained using the Lorentzian splitting theorem. 


Lorentzian Splitting Theorem Let (M,g) be a 
spacetime of dimension $\ge 3$ which satisfies each of 
the following conditions: 


(i) (M,g) is either globally hyperbolic or timelike 
geodesically complete; 
(ii) (M, g) satisfies the timelike convergence condi- 
tion; and 
(iii) (M, g) contains a complete timelike line. 


Then (M, g) splits isometrically as a product $(\mathbb{R} \times V, 
-dt^2 + h)$ where (V,h) is a complete Riemannian 
manifold. 


This result, which corresponds to obtaining the 
spacetime analog of a celebrated splitting theorem of 
Cheeger and Gromoll for lines in complete Riemannian 
manifolds of non-negative Ricci curvature, published 
in 1971, was posed as a problem by S T Yau in a 
problem list stemming from the conference 
Special Year in Differential Geometry held at the 
Institute for Advanced Study in Princeton during the 
1979-80 academic year. Early progress was made 
using maximal hypersurface methods by Gerhardt in 
1983, Bartnik in 1984, and Galloway in 1984. Then 
in 1985, Beem, Ehrlich, Markvorsen, and Galloway 
introduced the methodology of employing the 
Busemann function of the complete timelike line, 
motivated by techniques from Riemannian geome- 
try, and succeeded in obtaining a splitting under the 
hypothesis of global hyperbolicity and everywhere 
nonpositive timelike sectional curvatures. In separate 
publications, Eschenburg and Galloway extended 
the result to the desired curvature hypothesis of 
nonnegative timelike Ricci curvatures. Finally, 
Newman in 1990 achieved the originally desired 
goal of obtaining the splitting under the assumption 
of timelike geodesic completeness, rather than global 
hyperbolicity. This is a more delicate setting, since 
timelike geodesic completeness does not imply 
maximal nonspacelike geodesic connectability, a 
fairly basic geometric tool in many standard 
constructions. But the idea emerged with 
Newman’s solution that the existence of a timelike 
geodesic line or segment in a nonglobally hyper- 
bolic spacetime implies an adequate level of control 
in a tubular neighborhood of the given line to 
enable the proof to work. Galloway and Horta in 
1996 published a much simplified working out of 
these concepts. A fuller exposition of these developments 
may be found in Beem et al. (1996, 
chapter 14). In addition, in 2000, Galloway 
published a version of the splitting theorem for a 
null maximal geodesic line. 


Two-Dimensional Spacetimes 


Two-dimensional spacetimes, sometimes termed 
Lorentz surfaces, are especially tractable because 
given (M,g) with dim M =2, then (M, —g) is also a 
spacetime. Hence, it may be shown that any 
Lorentzian 2-manifold (M,g) homeomorphic to $\mathbb{R}^2$ 
may be made geodesically complete (not just 
nonspacelike geodesically complete) by a conformal 
change of metric. Also, any simply connected two- 
dimensional Lorentzian manifold is strongly causal. 
In Weinstein (1996), an extensive study is made of 
Lorentz surfaces generally and particularly, of a 
conformal boundary for such surfaces first given by 
Kulkarni in 1985. 

One of the prettiest classical results linking the 
geometry and topology of a Riemannian surface is 
the Gauss–Bonnet theorem. Let $(N, g_0)$ be a 
Riemannian manifold of dimension 2 and let P be a 
polygonal subregion with piecewise smooth bound- 
ing curves $c_i$, $1 \le i \le k$. Let K denote the Gauss 
curvature of $(N, g_0)$ and $\kappa$ the geodesic curvature of 
the smooth curves $c_i$ (which vanishes if $c_i$ happens to 
be a geodesic). If $\alpha_i$ denote the corresponding 
interior angles between the successive boundary 
curves $c_i$ and $c_{i+1}$, then the Gauss–Bonnet formula 
over P is 


$$\iint_P K\, dA + \int_{\partial P} \kappa\, ds + \sum_{i=1}^{k} (\pi - \alpha_i) = 2\pi \qquad [10]$$


By considering a triangulation of N itself and 
summing up the corresponding terms in [10], it 
follows for a compact oriented Riemannian mani- 
fold $(N, g_0)$ of dimension 2 that 


$$\iint_N K\, dA = 2\pi\chi(N) \qquad [11]$$


where $\chi(N)$ denotes the Euler characteristic. Also 
lurking in the background here is a formula for 
computing the angle between unit tangent vectors v, 
w as 


$$\cos\theta = g_0(v, w) \qquad [12]$$
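
A quick arithmetic check of the Gauss–Bonnet formula [10] may help fix the roles of the three terms. The sketch below uses an example that is not in the text: one octant of the unit sphere, a geodesic triangle with K = 1, area equal to one eighth of $4\pi$, geodesic sides (so the geodesic curvature term vanishes), and three right interior angles. The variable names are illustrative.

```python
import math

# Octant of the unit sphere: a geodesic triangle with three right angles.
K = 1.0                                   # Gauss curvature of the unit sphere
area = 4.0 * math.pi / 8.0                # one eighth of the total area
interior_angles = [math.pi / 2.0] * 3     # the alpha_i in [10]
geodesic_curvature_term = 0.0             # the sides are geodesics

lhs = (K * area
       + geodesic_curvature_term
       + sum(math.pi - a for a in interior_angles))
print(lhs, 2.0 * math.pi)
```

The curvature term contributes $\pi/2$ and the exterior angles contribute $3\pi/2$, so the left-hand side of [10] equals $2\pi$ as required.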


In the spacetime setting, different versions of a 
Gauss-Bonnet formula for subregions of a two- 
dimensional spacetime (M, g) corresponding to [10] 
have been given in 1974 by Helzer and in 1984 by 
Birman and Nomizu. First, the angle computation is 
a bit trickier for spacetimes than in the Riemannian 
case; eqn [12] has to be replaced by techniques 
which use the hyperbolic functions cosh u and 
sinh u to define the angle u (sometimes called the 
“hyperbolic angle”) between two unit vectors and 
then to allow for null vectors. Birman and Nomizu 
obtained an analog of [10] assuming that the 
boundary curves for P are successive smooth unit 
timelike curves: 


$$\int_{\partial P} \kappa\, ds - \iint_P K\, dA + \sum_i \theta_i = 0$$


Helzer in his formulation allows the different 
boundary curves to be either unit timelike, unit 
spacelike or null separately. Since the only compact, 
orientable smooth surface which admits a spacetime 
metric is the 2-torus, which has zero Euler char- 
acteristic, the Riemannian formula [11] above 
translates into the uniform constraint on the Gauss 
curvature of the spacetime: 


$$\iint_M K\, dA = 0$$


See also: General Relativity: Overview; Geometric 
Analysis and General Relativity; Pseudo-Riemannian 
Nilpotent Lie Groups; Spacetime Topology, Causal 
Structure and Singularities. 
Lyapunov Exponents 


The Lyapunov exponents of a sequence $\{A^n, n \ge 1\}$ 
of square matrices of dimension $d \ge 1$ are the values 
of 


$$\lambda(v) = \limsup_{n\to\infty} \frac{1}{n} \log \|A^n \cdot v\| \qquad [1]$$


over all nonzero vectors $v \in \mathbb{R}^d$. For completeness, 
set $\lambda(0) = -\infty$. It is easy to see that $\lambda(cv) = \lambda(v)$ and 
$\lambda(v + v') \le \max\{\lambda(v), \lambda(v')\}$ for any nonzero scalar c 
and any vectors v, v'. It follows that, given any 
constant a, the set of vectors satisfying $\lambda(v) \le a$ is a 
vector subspace. Consequently, there are at most d 
Lyapunov exponents, henceforth denoted by 
$\lambda_1 < \cdots < \lambda_{k-1} < \lambda_k$, and there exists a filtration 
$F_1 < \cdots < F_{k-1} < F_k = \mathbb{R}^d$ into vector subspaces, 
such that 


$$\lambda(v) = \lambda_i \quad \text{for all } v \in F_i \setminus F_{i-1}$$


and every $i = 1, \dots, k$ (write $F_0 = \{0\}$). In particular, 
the largest exponent is 


$$\lambda_k = \limsup_{n\to\infty} \frac{1}{n} \log \|A^n\| \qquad [2]$$


One calls $\dim F_i - \dim F_{i-1}$ the multiplicity of each 
Lyapunov exponent $\lambda_i$. 
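
The definition [1] and the associated filtration can be seen in the simplest possible case. The sketch below assumes the sequence consists of the powers of the fixed matrix diag(2, 1/2), so that $A^n$ = diag($2^n$, $2^{-n}$); this example and the function name `finite_time_exponent` are illustrative assumptions, not taken from the text.

```python
import math

def finite_time_exponent(v, n):
    """Return (1/n) log |A^n v| for the powers A^n = diag(2^n, 2^-n)."""
    x = (2.0 ** n) * v[0]
    y = (0.5 ** n) * v[1]
    return math.log(math.hypot(x, y)) / n

n = 200
generic = finite_time_exponent((1.0, 1.0), n)     # any v off the second axis
special = finite_time_exponent((0.0, 1.0), n)     # v on the second axis
print(generic, special, math.log(2.0))
```

In this example there are two exponents, $-\log 2$ and $\log 2$, each of multiplicity 1. The filtration consists of the second coordinate axis (where the smaller exponent is attained) inside the whole plane, and every vector outside that axis picks up the largest exponent.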

There are corresponding notions for continuous 
families of matrices $A^t$, $t \in (0,\infty)$, taking the limit as 
t goes to $\infty$ in the relations [1] and [2]. The theories 
for the two types of families, discrete and contin- 
uous, are analogous and so at each point in what 
follows we refer to either one or the other. 
Lyapunov Stability 


Consider the linear differential equation 


$$\dot v(t) = B(t) \cdot v(t) \qquad [3]$$


where B(t) is a bounded function with values in the 
space of d × d matrices, defined for all $t \in \mathbb{R}$. The 
theory of differential equations ensures that there 
exists a fundamental matrix $A^t$, $t \in \mathbb{R}$, such that 


$$v(t) = A^t \cdot v_0$$


is the unique solution of [3] with initial condition 
$v(0) = v_0$. 

If the Lyapunov exponents of the family $A^t$, $t > 0$, 
are all negative then the trivial solution $v(t) \equiv 0$ is 
asymptotically stable, and even exponentially stable. 
The stability theorem of Lyapunov asserts that, 
under an additional regularity condition, stability is 
still valid for nonlinear perturbations 


$$\dot w(t) = B(t) \cdot w + F(t, w) \qquad [4]$$


with $\|F(t,w)\| \le \mathrm{const}\, \|w\|^{1+c}$, $c > 0$. That is, the 
trivial solution $w(t) \equiv 0$ is still exponentially asymp- 
totically stable. 

The regularity condition means, essentially, that 
the limit in [1] does exist, even if one replaces 
vectors v by elements $v_1 \wedge \cdots \wedge v_l$ of any lth exterior 
power of $\mathbb{R}^d$, $1 \le l \le d$. By definition, the norm of an 
l-vector $v_1 \wedge \cdots \wedge v_l$ is the volume of the parallele- 
piped determined by the vectors $v_1, \dots, v_l$. This 
condition is usually tricky to check in specific 
situations. However, the multiplicative ergodic 
theorem of V I Oseledets asserts that, for very 
general matrix-valued stationary random processes, 
regularity is an almost sure property. This result sets 
the foundation for the modern theory of Lyapunov 
exponents. We are going to discuss the precise 
statement of the theorem in the slightly broader setting 
of linear cocycles, or vector bundle morphisms. 


Linear Cocycles 


Let $\mu$ be a probability measure on some space M and 
$f\colon M \to M$ be a measurable transformation that 
preserves $\mu$. Let $\pi\colon \mathcal{E} \to M$ be a finite-dimensional 
vector bundle, endowed with a Riemannian metric 
$\|\cdot\|_x$ on each fiber $\mathcal{E}_x = \pi^{-1}(x)$. Let $A\colon \mathcal{E} \to \mathcal{E}$ be a 
linear cocycle over f. What we mean by this is that 


$$\pi \circ A = f \circ \pi$$


and the action $A(x)\colon \mathcal{E}_x \to \mathcal{E}_{f(x)}$ of A on each fiber is 
a linear isomorphism. Notice that the action of the 
nth iterate $A^n$ is given by 


$$A^n(x) = A(f^{n-1}(x)) \cdots A(f(x)) \cdot A(x)$$


for every $n \ge 1$. 
Assume the function $\log^+ \|A(x)\|_x$ is integrable: 


$$\log^+ \|A(x)\|_x \in L^1(\mu) \qquad [5]$$


(we write $\log^+ \phi = \log \max\{\phi, 1\}$, for any $\phi > 0$). 
It is clear that the sequence of functions 
$a_n(x) = \log \|A^n(x)\|_x$ satisfies 


$$a_{m+n}(x) \le a_m(x) + a_n(f^m(x))$$


for every m, n, and x. It follows from J Kingman’s 
subadditive ergodic theorem that the limit 


$$\lim_{n\to\infty} \frac{1}{n}\, a_n(x)$$


exists for $\mu$-almost all x. In view of [2], this means 
that the largest Lyapunov exponent $\lambda(x)$ of the 
sequence $A^n(x)$, $n \ge 1$, is a limit, and not just a lim 
sup, at almost every point. 


Multiplicative Ergodic Theorem 


The Oseledets theorem states that the same holds 
for all Lyapunov exponents. Namely, for $\mu$-almost 
every $x \in M$ there exists $k = k(x) \in \{1, \dots, d\}$, a 
filtration 


$$\{0\} = F_x^0 < F_x^1 < \cdots < F_x^{k-1} < F_x^k = \mathcal{E}_x$$


and numbers $\lambda_1(x) < \cdots < \lambda_k(x)$ such that 


$$\lim_{n\to\infty} \frac{1}{n} \log \|A^n(x) \cdot v\|_{f^n(x)} = \lambda_i(x) \qquad [6]$$


for all $v \in F_x^i \setminus F_x^{i-1}$ and $i \in \{1, \dots, k\}$. 

The Lyapunov exponents $\lambda_i(x)$, and their number 
k(x), are measurable functions of x and they are 
constant on orbits of the transformation f. In 
particular, if the measure $\mu$ is ergodic then k and 
the $\lambda_i$ are constant on a full $\mu$-measure set of 
points. The subspaces $F_x^i$ also depend measurably 
on the point x and are invariant under the linear 
cocycle: 


$$A(x) \cdot F_x^i = F_{f(x)}^i$$


It is in the nature of things that, usually, these 
objects are not defined everywhere and they depend 
discontinuously on the base point x. 

When the transformation f is invertible, one 
obtains a stronger conclusion, by applying the 
previous kind of result also to the inverse of the 
cocycle. Namely, assuming that $\log^+ \|A^{-1}\|$ is also 
in $L^1(\mu)$, one gets that there exists a 
decomposition 


$$\mathcal{E}_x = E_x^1 \oplus \cdots \oplus E_x^k$$


defined at almost every point and such that 
$A(x) \cdot E_x^i = E_{f(x)}^i$ and 


$$\lim_{n\to\pm\infty} \frac{1}{n} \log \|A^n(x) \cdot v\|_{f^n(x)} = \lambda_i(x) \qquad [7]$$


for all $v \in E_x^i$ different from zero and all $i \in 
\{1, \dots, k\}$. These Oseledets subspaces $E_x^i$ are related 
to the subspaces $F_x^i$ through 


$$F_x^j = \bigoplus_{i=1}^{j} E_x^i$$


Hence, $\dim E_x^i = \dim F_x^i - \dim F_x^{i-1}$ is the multipli- 
city of the Lyapunov exponent $\lambda_i(x)$. 

The angles between any two Oseledets subspaces 
decay subexponentially along orbits of f: 


$$\lim_{n\to\pm\infty} \frac{1}{n} \log \operatorname{angle}\bigl(E^i_{f^n(x)}, E^j_{f^n(x)}\bigr) = 0$$


for every $i \ne j$ and almost every point. These facts 
imply the regularity condition mentioned previously 
and, in particular, 


$$\lim_{n\to\pm\infty} \frac{1}{n} \log |\det A^n(x)| = \sum_{i=1}^{k} \lambda_i(x) \dim E_x^i \qquad [8]$$


Consequently, for cocycles with values in SL(d, R), 
the sum of all Lyapunov exponents, counted with 
multiplicity, is identically zero. 
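
Relation [8] can be checked numerically. The following minimal sketch estimates both Lyapunov exponents of a random product of 2 × 2 matrices by repeated Gram–Schmidt re-orthonormalisation, a standard numerical technique that the text does not itself describe. The sample process (a rotation by a uniformly random angle composed with diag(2, 1/2), which has determinant 1) and all names are assumptions made for illustration.

```python
import math
import random

def lyapunov_spectrum_2x2(matrices):
    """Estimate both exponents of the product of the given 2x2 matrices.

    Each matrix is a tuple (a, b, c, d) standing for [[a, b], [c, d]].
    An orthonormal frame is pushed forward and re-orthonormalised at each
    step; the logs of the stretching factors are averaged.
    Returns (smaller exponent, larger exponent).
    """
    q1, q2 = (1.0, 0.0), (0.0, 1.0)
    s1 = s2 = 0.0
    for a, b, c, d in matrices:
        w1 = (a * q1[0] + b * q1[1], c * q1[0] + d * q1[1])
        w2 = (a * q2[0] + b * q2[1], c * q2[0] + d * q2[1])
        r11 = math.hypot(w1[0], w1[1])
        q1 = (w1[0] / r11, w1[1] / r11)
        r12 = q1[0] * w2[0] + q1[1] * w2[1]
        u2 = (w2[0] - r12 * q1[0], w2[1] - r12 * q1[1])
        r22 = math.hypot(u2[0], u2[1])
        q2 = (u2[0] / r22, u2[1] / r22)
        s1 += math.log(r11)
        s2 += math.log(r22)
    n = len(matrices)
    return s2 / n, s1 / n

def random_sl2_sample(n, seed=0):
    """Hypothetical SL(2, R) process: random rotation times diag(2, 1/2)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        t = rng.uniform(0.0, 2.0 * math.pi)
        c, s = math.cos(t), math.sin(t)
        sample.append((2.0 * c, -0.5 * s, 2.0 * s, 0.5 * c))
    return sample

lam_small, lam_large = lyapunov_spectrum_2x2(random_sl2_sample(5000))
print(lam_small, lam_large, lam_small + lam_large)
```

Since every factor has determinant 1, the product of the two stretching factors equals 1 at each step, and the two estimated exponents sum to zero up to rounding error, in agreement with [8].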

As we are dealing with almost certain properties, 
we may generally restrict the vector bundle to some 
full measure subset over which it is trivial. Then 
each fiber $\mathcal{E}_x$ is identified with the space $\mathbb{R}^d$, and we 
may think of A(x) as a d × d matrix. Then 
$A_n(x) = A(f^n(x))$ is a stationary random process 
relative to (f, u). Thus, in this context it is no 
serious restriction to view a linear cocycle as a 
stationary random process with values in the linear 
group GL(d, R) of invertible d x d matrices. 

Furthermore, given any such random process 
$A_n$, $n \ge 0$, one may consider its normalization 
$B_n = A_n / |\det A_n|^{1/d}$. The Lyapunov exponents of the 
two random processes $A_n$, $n \ge 0$, and $B_n$, $n \ge 0$, differ 
by the time average 


$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \frac{1}{d} \log |\det A_j(x)|$$


of the determinant. The Birkhoff ergodic theorem 
ensures that the time average is well defined almost 
everywhere, as long as the function $\log |\det A|$ is in 
$L^1(\mu)$; this is the case, for instance, if both 
$\log^+ \|A^{\pm 1}\|$ are integrable. This relates the general 
case to random processes with values in the special 
linear group SL(d, R) of d × d matrices with 
determinant +1. 
The Oseledets theorem was extended by D Ruelle 
to certain linear cocycles in infinite dimensions. He 
assumes that the A(x) are compact operators on a 
Hilbert space H and $\log^+ \|A\|$ is in $L^1(\mu)$. The 
conclusion is the same as in finite dimensions, 
except that the filtration 


+ <Fi<...-<FL=H 


may involve infinitely many subspaces, and the 
Lyapunov exponents may be $-\infty$. There is also a 
version for cocycles over invertible transforma- 
tions, where one assumes each A(x) to be invertible 
and the sum of a unitary operator with a compact 
operator, such that both $\log^+ \|A^{\pm 1}\|$ are integrable. 
The conclusion is that there exists an Oseledets 
decomposition $H = E_x^1 \oplus \cdots \oplus E_x^i \oplus \cdots$ at almost 
every point, with finitely or countably many 
factors. 


Random Matrices 


Relation [8] implies that, for SL(d,R) cocycles, if 
there is only one Lyapunov exponent (with full 
multiplicity) then it must be zero. When this 
happens, the theory contains no information on the 
behavior of the iterates $A^n(x) \cdot v$, apart from the fact 
that there is no exponential growth nor decay of 
their norms. Thus, the question naturally arises 
under which conditions is there more than one 
Lyapunov exponent or, equivalently, under which 
conditions is the largest Lyapunov exponent strictly 
positive. 

This problem was first addressed by H Furstenberg 
for products of independent random variables, 
corresponding to the following class of linear 
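cocycles. Let $\nu$ be a probability measure on the 
group $G = GL(d, \mathbb{R})$. Consider $M = G^{\mathbb{N}}$ and $\mu = \nu^{\mathbb{N}}$ 
(or $M = G^{\mathbb{Z}}$ and $\mu = \nu^{\mathbb{Z}}$), and let $f\colon M \to M$ be the 
shift map 


$$f((\alpha_j)_j) = (\alpha_{j+1})_j$$

It is clear that $\mu$ is invariant and also ergodic for the 
transformation f. Consider the cocycle $A\colon \mathcal{E} \to \mathcal{E}$ 
defined by $\mathcal{E} = M \times \mathbb{R}^d$ and 


$$A((\alpha_j)_j) \cdot v = \alpha_0 \cdot v$$


Clearly, 

$$A^n((\alpha_j)_j) \cdot v = \alpha_{n-1} \cdots \alpha_1 \alpha_0 \cdot v$$


Corresponding to the hypothesis of the multiplicative 
ergodic theorem, assume that $\log^+ \|\alpha\|$ (and 
$\log^+ \|\alpha^{-1}\|$) are $\nu$-integrable functions of the matrix $\alpha$. 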

Furstenberg’s theorem states that if the closed 
group $G(\nu)$ generated by the support of $\nu$ is 
noncompact and strongly irreducible in $\mathbb{R}^d$ then 
the largest Lyapunov exponent of the cocycle A 
is strictly positive. Strong irreducibility means 
that there exists no finite union of proper nonzero subspaces of 
$\mathbb{R}^d$ that is invariant under all elements of the 
group. Improvements, extensions, and alternative 
proofs have been obtained by several authors since 
then. 
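
Furstenberg’s theorem can be observed in a small simulation. The sketch below assumes a specific measure $\nu$ that is not discussed in the text: the two shear matrices [[1, 1], [0, 1]] and [[1, 0], [1, 1]], each chosen with probability 1/2. They generate SL(2, Z), which is noncompact and strongly irreducible, so the theorem predicts a strictly positive largest exponent. The function name, seed, and number of steps are illustrative.

```python
import math
import random

def top_exponent(n_steps, seed=0):
    """Estimate the largest Lyapunov exponent of a random product of shears.

    A vector is pushed through the random product and renormalised at
    every step; the logs of the norms are averaged over n_steps.
    """
    rng = random.Random(seed)
    x, y = 1.0, 0.0
    total = 0.0
    for _ in range(n_steps):
        if rng.random() < 0.5:
            x, y = x + y, y        # apply [[1, 1], [0, 1]]
        else:
            x, y = x, x + y        # apply [[1, 0], [1, 1]]
        norm = math.hypot(x, y)
        total += math.log(norm)
        x, y = x / norm, y / norm
    return total / n_steps

estimate = top_exponent(20000)
print(estimate)
```

Each individual shear has only polynomial growth, so on its own it has exponent zero; the positive value found for the random product comes entirely from mixing the two matrices.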

Especially, Y Guivarc’h and A Raugi provided 
conditions under which there are exactly d distinct 
Lyapunov exponents or, in other words, the 
multiplicity of every Lyapunov exponent is equal 
to 1. A matrix semigroup has the contraction 
property if there exists a sequence of elements $h_n$ 
and a probability measure m on the projective space 
of $\mathbb{R}^d$ that gives zero weight to any projective 
subspace, such that the images $(h_n)_* m$ of m under 
the $h_n$ converge to a Dirac mass in the projective 
space. They proved that if the closed semigroup 
$H(\nu)$ generated by the support of the probability $\nu$ 
is strongly irreducible and has the contraction 
property then the largest Lyapunov exponent has 
multiplicity 1. Applying this to the exterior 
powers of the cocycle, one obtains sufficient 
conditions for simplicity of the other Lyapunov 
exponents as well. 

This statement has been improved by I Ya 
Gol’dsheid and G A Margulis, who formulated the 
hypotheses in terms of the algebraic closure $G(\nu)$ of 
the semigroup $H(\nu)$. They assumed that $G(\nu)$ has the 
contraction property and the connected component 
of the identity inside $G(\nu)$ is irreducible in $\mathbb{R}^d$, 
meaning that its elements do not have any common 
invariant subspace. Then the largest Lyapunov 
exponent is simple. 


Schrödinger Cocycles 


The one-dimensional discrete Schrödinger equation 
is the second-order difference equation 


$$-(u_{n+1} + u_{n-1}) + V_n u_n = E u_n \qquad [9]$$


derived from the stationary Schrödinger equation in 
dimension 1 by space discretization. Here the energy 
E is a constant and V,=V(f"(@)), where the 
potential V(-) is a bounded scalar function and 
f:M—M is a transformation preserving some 
probability measure u on M. In what follows, we 
take u to be ergodic. Equation [9] may be rewritten 
as a first-order relation, 


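$$\begin{pmatrix} u_{n+1} \\ v_{n+1} \end{pmatrix} = \begin{pmatrix} V_n - E & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} u_n \\ v_n \end{pmatrix}, \qquad v_n = u_{n-1}.$$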


Hence, it may also be interpreted as a linear cocycle
$A$ over $f$, where the vector bundle is $\mathcal{E} = M \times \mathbb{R}^2$ and


$$A(\theta) = \begin{pmatrix} V(\theta) - E & -1 \\ 1 & 0 \end{pmatrix} \qquad [10]$$


takes values in $SL(2, \mathbb{R})$. By ergodicity, the Lyapunov
exponents are essentially independent of the
base point $\theta$. Let $\lambda(E)$ denote the largest exponent:
by the relation [8], the other one is $-\lambda(E)$.

The Lyapunov exponent $\lambda(E)$ is related to the
spectral theory of the linear operators $L_\theta$,


$$(L_\theta u)_n = -(u_{n+1} + u_{n-1}) + V_n u_n$$


on the space $\ell^2(\mathbb{Z})$ of complex square-integrable
sequences $u_n$, $n \in \mathbb{Z}$. These are bounded Hermitian
operators and so the spectra are compact subsets of $\mathbb{R}$.
Using the assumption that $\mu$ is ergodic, one can prove
that the spectrum $\mathrm{spec}(L_\theta)$ is constant almost everywhere.
If the transformation $f$ is minimal, the spectrum
is even independent of the point $\theta$. Moreover, for all
energies,


$$\lambda(E) \ge \text{const} \cdot \mathrm{dist}(E, \mathrm{spec}(L_\theta))$$


In particular, $\lambda(E)$ is always positive on the complement
of the spectrum.
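
The relation between $\lambda(E)$ and the spectrum can be explored numerically. The following is a minimal sketch, not a method taken from the text: it estimates the largest exponent by multiplying the transfer matrices of the first-order form of [9] along one orbit and renormalizing at each step. The function name `lyapunov_exponent`, the starting vector, the step count, and the sample potentials are all illustrative assumptions. For the zero potential the spectrum is the interval $[-2, 2]$, and outside it the exponent equals $\operatorname{arccosh}(|E|/2)$, which gives a simple check.

```python
import math
import numpy as np

def lyapunov_exponent(E, potential, n_steps=10000):
    """Estimate the largest Lyapunov exponent of the Schrodinger cocycle.

    potential(n) returns V_n; the transfer matrix at step n is
    [[V_n - E, -1], [1, 0]], as in the first-order form of equation [9].
    """
    v = np.array([1.0, 0.0])          # arbitrary starting vector (assumption)
    log_growth = 0.0
    for n in range(n_steps):
        A = np.array([[potential(n) - E, -1.0], [1.0, 0.0]])
        v = A @ v
        norm = np.linalg.norm(v)
        log_growth += math.log(norm)  # accumulate the growth rate
        v /= norm                     # renormalize to avoid overflow
    return log_growth / n_steps

# Free case V = 0: the spectrum is [-2, 2].
free = lambda n: 0.0
lam_outside = lyapunov_exponent(3.0, free)   # E outside the spectrum
lam_inside = lyapunov_exponent(1.0, free)    # E inside the spectrum

# Hypothetical quasiperiodic potential V_n = 2*kappa*cos(2*pi*(theta0 + n*alpha)).
kappa, alpha, theta0 = 2.0, (math.sqrt(5.0) - 1.0) / 2.0, 0.1
quasi = lambda n: 2.0 * kappa * math.cos(2.0 * math.pi * (theta0 + n * alpha))
lam_quasi = lyapunov_exponent(0.0, quasi)
```

In the free case the estimate is positive off the spectrum and close to zero on it, in line with the inequality above. The quasiperiodic run is only a finite-time estimate along a single orbit, so it should be read as indicative.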

A fundamental problem (Anderson localization) is 
to decide when the spectrum is pure-point. This is 
reasonably well understood for a few classes of base 
dynamics only, for example, the very chaotic systems 
such as Bernoulli and Markov processes (random 
potentials) or uniformly hyperbolic maps and flows, 
or the irrational rotations on the d-dimensional torus 
(quasiperiodic potentials). In the latter case, the 
results are more complete when there is only one 
frequency ($d = 1$). It was shown by K Ishii and by L
Pastur that if $\lambda(E)$ is positive for almost all values of
$E$ in some Borel set then the absolutely continuous
part of the spectrum is essentially disjoint from that
set. The converse is also true (due to S Kotani). Thus,
checking that $\lambda(E)$ is positive is an important step
towards proving localization.

A very general criterion for positivity of the 
Lyapunov exponent was obtained by Kotani. Namely, 
he proved that if the potential is not deterministic then 
$\lambda(E)$ is positive for almost all $E$. In particular, for
nondeterministic potentials the absolutely continuous
spectrum is empty, almost surely. In simple terms, the
hypothesis means that from the values of the potential
for negative $n$ one cannot determine the values for
positive $n$. More formally, one calls the potential
deterministic if every $V_n$, $n \ge 0$, is almost everywhere a
measurable function of $\{V_n : n < 0\}$. For instance,
quasiperiodic potentials are deterministic, whereas 
Bernoulli potentials are not. 


Subharmonicity Method 


Let $\mathbb{D}^m$ be the set of complex vectors $(z_1, \ldots, z_m) \in \mathbb{C}^m$
such that $|z_j| \le 1$ for all $j$ and let $\mathbb{T}^m$ be the subset
defined by $|z_j| = 1$ for all $j$. Let $f : \mathbb{T}^m \to \mathbb{T}^m$ and
$A : \mathbb{T}^m \to SL(d, \mathbb{R})$ be continuous maps that admit
holomorphic extensions to the interior of $\mathbb{D}^m$ with
$f(0) = 0$. Assume that $f$ preserves the natural (Haar)
measure $\mu$ on $\mathbb{T}^m$. Let


$$\lambda(A, \mu) = \int \lambda(z) \, d\mu$$


where $\lambda(z)$ denotes the largest Lyapunov exponent
for the cocycle defined by $A$ over $f$. It also follows
from the subadditive ergodic theorem that


$$\lambda(A, \mu) = \lim_{n \to \infty} \frac{1}{n} \int \log \|A^n(z)\| \, d\mu$$


M Herman observed that, since the function
$\log \|A^n(z)\|$ is plurisubharmonic on $\mathbb{D}^m$, one may
use the maximum principle to conclude that


$$\frac{1}{n} \int \log \|A^n(z)\| \, d\mu \ge \frac{1}{n} \log \|A^n(0)\|$$


Then, taking the limit when $n \to \infty$ one obtains that

$$\lambda(A, \mu) \ge \log \rho(A) \qquad [11]$$


where $\rho(A)$ denotes the spectral radius of the matrix
$A(0)$. Starting from this observation, he developed a
very effective method for bounding Lyapunov
exponents from below, which received several applications
and extensions, in particular, to the theory of
Schrödinger cocycles with quasiperiodic potentials.
The best-known application is the following bound
for integrated Lyapunov exponents of two-dimensional
cocycles. Let $f : M \to M$ be a continuous
transformation on a compact metric space, preserving
some probability measure $\mu$, and $A : M \to SL(2, \mathbb{R})$ be
a continuous map. For each fixed $\theta$, let $A R_\theta$ be the
cocycle obtained by multiplying $A(x)$, at every point
$x$, by the rotation of angle $\theta$. Herman proved that


$$\frac{1}{2\pi} \int_0^{2\pi} \lambda(A R_\theta, \mu) \, d\theta \ge \int N(x) \, d\mu$$


(A Avila and J Bochi later showed that the equality
holds) where


$$N(x) = \log \frac{\|A(x)\| + \|A(x)\|^{-1}}{2}$$


Apart from the exceptional case when $A$ acts by
rotation at every point in the support of $\mu$, the right-hand
side of the inequality is positive, and so the
Lyapunov exponent of the cocycle $A R_\theta$ is positive
for many values of $\theta$.
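
The averaged bound is easy to test in the simplest possible setting. The sketch below assumes a constant cocycle (the matrix $A$ does not depend on $x$), in which case the Lyapunov exponent of $A R_\theta$ is the logarithm of the spectral radius of that single matrix; this simplification, the helper names, and the two sample matrices are illustrative choices rather than part of the text. The average over the rotation angle is computed with a midpoint rule and compared with $\log\big((\|A\| + \|A\|^{-1})/2\big)$.

```python
import math
import numpy as np

def rotation(theta):
    """Rotation of angle theta in the plane."""
    c, s = math.cos(theta), math.sin(theta)
    return np.array([[c, -s], [s, c]])

def log_spectral_radius(M):
    """Lyapunov exponent of a constant cocycle: log of the spectral radius."""
    return math.log(max(abs(np.linalg.eigvals(M))))

def averaged_exponent(A, n_angles=20000):
    """Midpoint-rule average of the exponent of A R_theta over theta in [0, 2*pi]."""
    thetas = (np.arange(n_angles) + 0.5) * 2.0 * math.pi / n_angles
    return sum(log_spectral_radius(A @ rotation(t)) for t in thetas) / n_angles

def herman_bound(A):
    """Right-hand side N = log((||A|| + 1/||A||) / 2) for a constant matrix A."""
    norm = np.linalg.norm(A, 2)   # operator (spectral) norm
    return math.log((norm + 1.0 / norm) / 2.0)

A_diag = np.array([[2.0, 0.0], [0.0, 0.5]])    # hyperbolic element of SL(2, R)
A_shear = np.array([[1.0, 1.0], [0.0, 1.0]])   # parabolic element of SL(2, R)

avg_diag, bound_diag = averaged_exponent(A_diag), herman_bound(A_diag)
avg_shear, bound_shear = averaged_exponent(A_shear), herman_bound(A_shear)
```

For both matrices the two numbers agree to the accuracy of the quadrature, consistent with the equality case. Note that the shear has zero exponent at $\theta = 0$ yet a positive average, which illustrates how rotating the cocycle produces positive exponents for many angles.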
Nonuniform Hyperbolicity


The prototypical example of a linear cocycle is the 
derivative of a smooth transformation on a mani- 
fold. More precisely, let M be a finite-dimensional 
manifold and $f : M \to M$ be a diffeomorphism, that
is, a bijective smooth map whose derivative $Df(x)$
depends continuously on $x$ and is an isomorphism at
every point. Let $\mathcal{E} = TM$ be the tangent bundle to the
manifold and $A = Df$ be the derivative. If $M$ is
compact or, more generally, if the norms of both $Df$
and its inverse are bounded, then the hypothesis in
the Oseledets theorem is automatically satisfied for any
$f$-invariant probability $\mu$. Lyapunov exponents yield
deep geometric information on the dynamics of the
diffeomorphism, especially when they do not vanish.
For most results that we mention in the sequel, one
needs the derivative $Df$ to be Hölder continuous:


$$\|Df(x) - Df(y)\| \le \text{const} \cdot d(x, y)^{\nu}$$


Let $E^s_x$ be the sum of the Oseledets subspaces
corresponding to negative Lyapunov exponents.
Pesin’s stable manifold theorem states that there
exists a family of embedded disks $W^s_{\mathrm{loc}}(x)$ tangent to
$E^s_x$ at almost every point and such that the orbit of
every $y \in W^s_{\mathrm{loc}}(x)$ is exponentially asymptotic to the
orbit of $x$. This lamination $\{W^s_{\mathrm{loc}}(x)\}$ is invariant, in
the sense that


$$f(W^s_{\mathrm{loc}}(x)) \subset W^s_{\mathrm{loc}}(f(x))$$


and has an “absolute continuity” property. There
are analogous results for the sum $E^u_x$ of the Oseledets
subspaces corresponding to positive Lyapunov
exponents.

The entropy of a partition $\mathcal{P}$ of $M$ is defined by


$$h_\mu(f, \mathcal{P}) = \lim_{n \to \infty} \frac{1}{n} H_\mu(\mathcal{P}^n)$$


where $\mathcal{P}^n$ is the partition into sets of the form
$P = P_0 \cap f^{-1}(P_1) \cap \cdots \cap f^{-n}(P_n)$ with $P_j \in \mathcal{P}$ and


$$H_\mu(\mathcal{P}^n) = \sum_{P \in \mathcal{P}^n} -\mu(P) \log \mu(P)$$


The Kolmogorov-Sinai entropy $h_\mu(f)$ of the system
is the supremum of $h_\mu(f, \mathcal{P})$ over all partitions $\mathcal{P}$
with finite entropy. The Ruelle-Margulis inequality
says that $h_\mu(f)$ is bounded above by the average sum
of the positive Lyapunov exponents. A major result
of the theory, Pesin’s entropy formula, asserts that if
the invariant measure $\mu$ is smooth (e.g., a volume
element) then the two invariants coincide:


$$h_\mu(f) = \int \Big( \sum_{j=1}^{k} \lambda_j^{+} \Big) \, d\mu$$
A complete characterization of the invariant measures
for which the entropy formula is true was
given by F Ledrappier and L S Young.

The invariant measure $\mu$ is called hyperbolic if all
Lyapunov exponents are nonzero at almost every
point. Hyperbolic measures are exact dimensional:
the pointwise dimension


$$d(x) = \lim_{r \to 0} \frac{\log \mu(B_r(x))}{\log r}$$


exists at almost every point, where $B_r(x)$ is the
neighborhood of radius $r$ around $x$. This fact was
proved by L Barreira, Ya Pesin, and J Schmeling. Note
that it means that the measure $\mu(B_r(x))$ of neighborhoods
scales as $r^{d(x)}$ when the radius $r$ is small.

Another remarkable feature of hyperbolic measures,
proved by A Katok, is that periodic motions
are dense in their supports. More than that,
assuming the measure is nonatomic, there exist
Smale horseshoes $H_n$ with topological entropy
arbitrarily close to the entropy $h_\mu(f)$ of the system.
In this context, the topological entropy $h(f, H_n)$ may
be defined as the exponential rate of growth,


$$\lim_{k \to \infty} \frac{1}{k} \log \#\{x \in H_n : f^k(x) = x\}$$


of the number of periodic points on $H_n$.


Generic Systems 


Given any area-preserving diffeomorphism on any 
surface M, one may find another whose first 
derivative is arbitrarily close to the initial one and 
which has Lyapunov exponents identically zero at 
almost every point, or else is globally uniformly 
hyperbolic (Anosov). This surprising fact was 
discovered by R Mañé, and a complete proof was 
given by J Bochi. Uniform hyperbolicity means that 
the tangent bundle admits a Df-invariant splitting 


$$TM = E^s \oplus E^u$$


such that the line bundle $E^s$ is uniformly contracted
and $E^u$ is uniformly expanded by the derivative. It is
well known that Anosov diffeomorphisms can only
occur if the surface is the torus $\mathbb{T}^2$.

In fact, the theorem of Mañé-Bochi is stronger:
for a residual subset (a countable intersection of 
open dense sets) of all once-differentiable area- 
preserving diffeomorphisms on any surface, either 
the Lyapunov exponents vanish almost everywhere 
or the diffeomorphism is Anosov. This shows that
zero Lyapunov exponents are actually quite common
for surface diffeomorphisms that are only once-differentiable.
Moreover, this theorem has been
extended to diffeomorphisms on manifolds with
arbitrary dimension, in a suitable formulation, by
J Bochi and M Viana.

However, this phenomenon should be specific to
systems with low differentiability. Indeed, already
for Hölder-continuous linear cocycles over chaotic
transformations it is known that vanishing Lyapunov
exponents can only occur with infinite codimension.
That is, unless the cocycle satisfies an infinite
number of independent constraints, there exists
some positive exponent. By “chaotic” we mean
here that the invariant probability $\mu$ of the base
transformation is assumed to be hyperbolic and to
have local product structure: it is locally equivalent
to a product of two measures, respectively, along
stable and unstable sets.

Under additional assumptions, one can even prove 
that all Lyapunov exponents have multiplicity 1 
outside an infinite-codimension subset. This follows 
from extensions of the Guivarc’h-Raugi criterion for
certain linear cocycles over chaotic transformations, 
obtained by A Avila, C Bonatti, and M Viana. 


Strange Attractors 


This expression was coined by D Ruelle and 
F Takens in their celebrated study on the nature of 
fluid turbulence. E Hopf and also L D Landau and 
E M Lifshitz had suggested that turbulent motion 
arises from the existence in the phase space of 
invariant tori carrying quasiperiodic flows with 
a large number of frequencies. Ruelle and Takens
observed that dissipative systems such as viscous 
fluids do not generally have such quasiperiodic tori, 
and concluded that turbulence must be credited to a 
different mechanism: the presence of some “strange” 
attractor. 

While they did not propose a precise definition, 
two main features were mentioned: 


1. Complex geometry: a strange attractor is not 
reduced to an equilibrium point or a periodic 
solution of the system and, generally, should 
have a fractal structure. 

2. Chaotic dynamics: solutions accumulating on the 
attractor should be sensitive to their initial states. 


As more examples were found, it became appar- 
ent that the above two features do not always come 
together. This led to two types of definitions in the 
literature, depending on whether one emphasizes the 
geometry or the dynamics. We adopt the second 
point of view, and propose to define the strange 
attractor as one carrying an invariant ergodic 
physical measure which has some positive Lyapunov 
exponent. The notion of physical measure will be
defined near the end. The condition on the Lyapunov
exponent ensures that the dynamics near the
attractor is (exponentially) sensitive to the initial
states.


Lorenz-Like Attractors 


The uniformly hyperbolic attractors introduced by 
S Smale provided an interesting class of examples of 
strange attractors, both chaotic and fractal. Perhaps 
more striking, given that they originated from a 
concrete problem in fluid dynamics, were the 
strange attractors introduced by E N Lorenz. The 
Lorenz system of differential equations, 


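$$\begin{aligned} \dot{x} &= -\sigma x + \sigma y, & \sigma &= 10 \\ \dot{y} &= \rho x - y - xz, & \rho &= 28 \qquad [12] \\ \dot{z} &= xy - bz, & b &= 8/3 \end{aligned}$$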


was derived from Lord Rayleigh’s model for 
thermal convection, by Fourier expansion of the 
stream function and temperature, and truncation of 
all but three modes. Lorenz observed that its 
solutions depend sensitively on their initial states. 
Consequently, predictions based on the numerical 
integration of the equations may turn out to be 
very inaccurate, given that the initial data obtained 
from experimental measurements are never com- 
pletely precise. This remarkable observation 
brought the issue of predictability in deterministic 
systems to a whole new light and motivated intense 
investigation of this and many other chaotic 
systems. 
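
Lorenz's observation can be reproduced with a few lines of code. The sketch below integrates the Lorenz system at the classical parameter values with a fixed-step fourth-order Runge-Kutta scheme and follows two solutions whose initial states differ by $10^{-8}$ in one coordinate. The integrator, the step size, the time horizon, and the initial states are illustrative assumptions, not choices made in the text.

```python
import numpy as np

SIGMA, RHO, B = 10.0, 28.0, 8.0 / 3.0   # classical Lorenz parameter values

def lorenz(state):
    """Right-hand side of the Lorenz system."""
    x, y, z = state
    return np.array([SIGMA * (y - x), RHO * x - y - x * z, x * y - B * z])

def rk4_trajectory(state, dt, n_steps):
    """Integrate with the classical fourth-order Runge-Kutta scheme."""
    traj = np.empty((n_steps + 1, 3))
    traj[0] = state
    for k in range(n_steps):
        s = traj[k]
        k1 = lorenz(s)
        k2 = lorenz(s + 0.5 * dt * k1)
        k3 = lorenz(s + 0.5 * dt * k2)
        k4 = lorenz(s + dt * k3)
        traj[k + 1] = s + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return traj

# Two initial states that differ by 1e-8 in the first coordinate.
traj_a = rk4_trajectory(np.array([1.0, 1.0, 1.0]), dt=0.01, n_steps=4000)
traj_b = rk4_trajectory(np.array([1.0 + 1e-8, 1.0, 1.0]), dt=0.01, n_steps=4000)

# Distance between the two solutions at each time step.
separation = np.linalg.norm(traj_a - traj_b, axis=1)
```

Plotting `separation` on a logarithmic scale shows a roughly exponential growth until the distance saturates at the size of the attractor, after which the two numerical solutions are effectively unrelated even though both remain bounded.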

The dynamical behavior of the eqns [12] was first 
interpreted through certain geometric models where 
the presence of strange attractors, both chaotic and 
fractal, could be proved rigorously. It was much 
harder to prove that the original eqns [12] them- 
selves have such an attractor. This was achieved just 
a few years ago, by W Tucker, by means of a 
computer-assisted rigorous argument. At about the 
same time, a mathematical theory of Lorenz-like 
attractors in three-dimensional space was developed 
by C Morales, M J Pacifico, and E Pujals. In 
particular, this theory shows that uniformly hyper- 
bolic attractors and Lorenz-like attractors are the 
only ones which are robust under all small mod- 
ifications of the vector field. 


Hénon-Like Attractors


Starting from the work of Lorenz, many models of 
strange attractors have been found and described to 
some extent, often related to concrete problems. 
From a mathematical point of view, it is usually
hard to give even a rough description of the 
dynamics in the chaotic regime. However, this was 
especially successful for the family of strange 
attractors introduced by M Hénon. He considered 
a very simple nonlinear system, particularly suited 
for numerical experimentation: the transformation 


$$f(x, y) = (1 - a x^2 + b y, \; x) \qquad [13]$$


where a and b are constant parameters. In a 
breakthrough, M Benedicks and L Carleson were 
able to prove that, for a set of parameter values with 
positive probability, this transformation has some 
nonhyperbolic attractor such that the orbits accu- 
mulating on it are sensitive to the starting point. The 
system [13] is also a model for many other 
situations, including the phenomenon of creation of 
homoclinic motions as parameters unfold, and the 
conclusions of Benedicks and Carleson have been 
extended to such situations, starting from the work 
of L Mora and M Viana. 

Moreover, a detailed theory of Hénon-like attractors
has been developed by M Benedicks, M Viana,
D Wang, L S Young, and other authors. It follows
from this theory that these attractors carry an
invariant ergodic probability measure $\mu$ which
describes the statistical behavior of almost all
trajectories $f^j(x)$, $j \ge 1$, that accumulate on the
attractor:


$$\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} \varphi(f^j(x)) = \int \varphi \, d\mu$$

for any continuous function $\varphi$. This property
implies that, despite the fact that it is supported 
on a zero-volume set, the measure $\mu$ is, in some
sense, physically observable. For this reason, one
calls it a physical measure. In other words, time
averages along typical orbits in the domain of
attraction coincide with the space averages determined
by the probability $\mu$. Another property with
physical relevance is that $\mu$ is the zero-noise limit of
the stationary measures associated to the Markov
chains obtained by adding random noise to $f$. One
says that the system $(f, \mu)$ is stochastically stable.
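
The statement about time averages can be illustrated numerically. The sketch below iterates the transformation [13] and compares the time averages of a test function along two orbits with different starting points. The parameter values $a = 1.4$, $b = 0.3$ are the ones commonly used in numerical experiments and are an assumption here: the rigorous results quoted above concern a positive-measure set of parameters and do not single out this pair. The function names, starting points, and burn-in length are likewise illustrative.

```python
import numpy as np

def henon_orbit(x0, y0, n, a=1.4, b=0.3):
    """Iterate f(x, y) = (1 - a*x^2 + b*y, x) and return the x-coordinates."""
    xs = np.empty(n)
    x, y = x0, y0
    for j in range(n):
        x, y = 1.0 - a * x * x + b * y, x
        xs[j] = x
    return xs

def time_average(phi, xs, burn_in=1000):
    """Average of phi along the orbit, discarding an initial transient."""
    return float(np.mean(phi(xs[burn_in:])))

phi = lambda x: x   # test function (hypothetical choice)

orbit_1 = henon_orbit(0.0, 0.0, 100000)
orbit_2 = henon_orbit(0.1, 0.1, 100000)

avg_1 = time_average(phi, orbit_1)
avg_2 = time_average(phi, orbit_2)
```

In such experiments the two averages agree to within the statistical error of a finite orbit, which is the numerical signature one expects of a physical measure. The computation is suggestive only, since floating-point orbits are not true orbits of the map.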


See also: Chaos and Attractors; Dissipative Dynamical 
Systems of Infinite Dimension; Ergodic Theory; Fractal 
Dimensions in Dynamics; Generic Properties of 
Dynamical Systems; Gravitational N-Body Problem 
(Classical); Homoclinic Phenomena; Hyperbolic 
Dynamical Systems; Lagrangian Dispersion (Passive 
Scalar); Nonequilibrium Statistical Mechanics: Interaction 
between Theory and Numerical Simulations; Random 
Dynamical Systems; Synchronization of Chaos. 
Introduction


There is no theory so far of irreversible processes that 
is of the same generality as equilibrium statistical 
mechanics and presumably it may not exist. While in 
equilibrium the Gibbs distribution provides all the 
information and no equation of motion has to be 
solved, the dynamics plays the major role in nonequilibrium.
The theory illustrated below refers to
stationary states that are not restricted to being close 
to equilibrium, and for a wide class of models it can be 
shown to be exact. In this case one begins to see the 
appearance of some general principles. 

In equilibrium statistical mechanics, there is a well- 
defined relationship, established by Boltzmann, 
between the probability of a state and its entropy. 
This fact was exploited by Einstein to study thermo- 
dynamic fluctuations. When we are out of equilibrium, 
for example, in a stationary state of a system in contact 
with two reservoirs, it is not completely clear how to 
define thermodynamic quantities such as the entropy 
or the free energy. One possibility is to use fluctuation 
theory to define their nonequilibrium analogs. In fact 
in this way, extensive quantities can be obtained, 
although not necessarily simply additive due to the 
presence of long-range correlations which seem to be a 
rather generic feature of nonequilibrium. This possibil- 
ity has been pursued in recent years leading to a 
considerable number of interesting results. One can 
recognize two main lines. 


1. Exact calculations in simplified models. This is 
well exemplified by the work of Derrida et al. 
(2002). 

2. A general treatment of a class of continuous time
Markov chains for which the simplified models
provide examples. This is the point of view
developed by Bertini et al. (2002, 2004).


Both approaches have been very effective and of course 
give the same results when a comparison is possible. 


The second approach seems to encompass a wide class 
of systems and has the advantage of leading to 
equations which apply to very different situations. 
This is the point of view we shall adopt in the 
following. The question whether there are alternative 
more natural ways of defining nonequilibrium entro- 
pies or free energies is, for the moment, open. 


Boltzmann-Einstein Formula 


The Boltzmann-Einstein theory of equilibrium thermodynamic
fluctuations, as described for example in
the book Physique Statistique by Landau-Lifshitz,
states that the probability of a fluctuation from
equilibrium in a macroscopic region of fixed volume
$V$ is proportional to $\exp\{V \Delta S / k\}$, where $\Delta S$ is the
variation of entropy density in the region calculated
along a reversible transformation creating the
fluctuation and $k$ is the Boltzmann constant.

This formula was derived by Einstein simply by 
inverting the Boltzmann relationship between entropy 
and probability. He considered this relationship as a 
phenomenological definition of the probability of a 
state. 

Einstein’s theory refers to fluctuations from an
equilibrium state, that is from a stationary state of a
system isolated or in contact with reservoirs characterized
by the same chemical potentials so that there is no
flow of heat, electricity, chemical substances, etc.,
across the system. When in contact with reservoirs, $\Delta S$
is the variation of the total entropy (system +
reservoirs) which, for fluctuations of constant volume
and temperature, is equal to $-\Delta F / T$, where $\Delta F$ is the
variation of the free energy of the system and $T$ the
temperature. In the following, we refer to $\Delta F / T$, our
main object of study, as the entropy and use the letter $S$
for it; no confusion should arise.

The important question we address is then: what 
happens if the system is stationary but not in 
equilibrium, that is, flows of physical quantities are 
present due to external fields and/or different chemical 
potentials at the boundaries? To start with it is not 
always clear whether a closed macroscopic dynamical 
description is possible. If the system admits such a 
description of the kind provided by hydrodynamic 
equations, a fact which can be rigorously established in
simplified models, a reasonable goal is to find an 
explicit connection between time-independent thermo- 
dynamic quantities (e.g., the entropy) and dynamical 
macroscopic properties (e.g., transport coefficients). 
As we shall see, the study of large fluctuations provides 
such a connection. It leads in fact to a dynamical 
theory of the entropy which is shown to satisfy a 
Hamilton-Jacobi equation (HJE) in infinitely many 
variables requiring the transport coefficients as input. 
Its solution is straightforward in the case of homo- 
geneous equilibrium states and highly nontrivial in 
stationary nonequilibrium states (SNSs). In the first 
case we recover a well-known relationship widely used 
in the physical and physico-chemical literature. There 
are several one-dimensional models, where the HJE 
reduces to a nonlinear ordinary differential equation 
which, even if it cannot be solved explicitly, leads to 
the important conclusion that the nonequilibrium 
entropy is a nonlocal functional of the thermodynamic 
variables. This implies that correlations over macro- 
scopic scales are present. The existence of long-range 
correlations is probably a generic feature of SNSs and 
more generally of situations where the dynamics is not 
time-reversal invariant. As a consequence if we divide 
a system into two subsystems, the entropy is not 
necessarily simply additive. 

The first step toward the definition of a non- 
equilibrium entropy is the study of fluctuations in 
macroscopic evolutions described by hydrodynamic 
equations. In a dynamical setting, a typical question 
one may ask is the following: what is the most 
probable trajectory followed by the system in the 
spontaneous emergence of a fluctuation or in its 
relaxation to an equilibrium or a stationary state? To 
answer this question, one first derives a generalized 
Boltzmann-Einstein formula from which the most 
probable trajectory can be calculated by solving a 
variational principle. The entropy is related to the 
logarithm of the probability of such a trajectory and 
satisfies the HJE associated to the variational principle. 

For states near equilibrium, an answer to this type of 
questions was given by Onsager and Machlup in 1953. 
The Onsager—Machlup theory gives the following 
result under the assumption of time reversibility of 
the microscopic dynamics. In the situation of a linear 
hydrodynamic equation and small fluctuations, that is, 
close to equilibrium, the most probable creation and 
relaxation trajectories of a fluctuation are time 
reversals of one another. This conclusion holds also 
in nonlinear hydrodynamic regimes and without the 
assumption of small fluctuations. This follows from 
the study of concrete models. In SNSs, on the other 
hand, time-reversal invariance is broken and the 
creation and relaxation trajectories of a fluctuation 
are not time reversals of one another. 


In the following we refer to boundary-driven 
stationary nonequilibrium states, for example, a 
thermodynamic system in contact with reservoirs 
characterized by different temperatures and chemi- 
cal potentials, but there is no difficulty in including 
an external field acting in the bulk. 


Microscopic and Macroscopic Dynamics 


We consider many-body systems in the limit of 
infinitely many degrees of freedom. The basic general 
assumption of the theory is Markovian evolution. 
Microscopically, we assume that the evolution is
described by a Markov process $X_\tau$ which represents
the state of the system at time $\tau$. This hypothesis
is probably not so restrictive, because the dynamics of
Hamiltonian systems interacting with thermostats
is ultimately also reduced to the analysis of a Markov
process. Several examples are discussed in the literature.
To be more precise, $X_\tau$ represents the set of
variables necessary to specify the state of the microscopic
constituents interacting among themselves and
with the reservoirs. The SNS is described by a
stationary, that is, invariant with respect to time shifts,
probability distribution $P_{st}$ over the trajectories of $X_\tau$.
Macroscopically, the usual interpretation of
Markovian evolution is that the time derivatives
of thermodynamic variables $\rho_i$ at a given instant of
time depend only on the $\rho_i$'s and the affinities
(thermodynamic forces) $\partial S / \partial \rho_i$ at the same instant
of time. Our next assumption can then be
formulated as follows: the system admits a
macroscopic description in terms of density fields
which are the local thermodynamic variables. For
simplicity of notation, we assume that there is
only one thermodynamic variable (e.g., $\rho$, the
density). The evolution of the field $\rho = \rho(t, u)$,
where $t$ and $u$ are the macroscopic time and
space coordinates (see below), is given by diffusion-type
hydrodynamic equations of the form


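$$\partial_t \rho = \frac{1}{2} \nabla \cdot \big( D(\rho) \nabla \rho \big) = \frac{1}{2} \sum_{1 \le i, j \le d} \partial_{u_i} \big( D_{i,j}(\rho) \, \partial_{u_j} \rho \big) = \mathcal{D}(\rho) \qquad [1]$$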


The interaction with the reservoirs appears as 
boundary conditions to be imposed on solutions of 
[1]. We assume that there exists a unique stationary
solution $\bar\rho$ of [1], that is, a profile $\bar\rho(u)$, which
satisfies the appropriate boundary conditions and is
such that $\mathcal{D}(\bar\rho) = 0$. This holds if the diffusion matrix
$D_{i,j}(\rho)$ in [1] is strictly elliptic, namely there exists a
constant $c > 0$ such that $D(\rho) \ge c$ (in matrix sense).
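
To make the role of the boundary conditions concrete, here is a minimal one-dimensional sketch of equation [1] on the interval $[0, 1]$, with the two reservoirs represented by fixed densities at the end points. The explicit finite-volume scheme, the grid size, the time step rule, and the names `relax_to_stationary`, `rho_left`, `rho_right`, and `d_max` are all illustrative assumptions. For a constant diffusion coefficient the stationary profile is the straight line joining the two boundary values, which provides a check.

```python
import numpy as np

def relax_to_stationary(D, rho_left, rho_right, d_max=1.0, n_cells=50, t_final=5.0):
    """Explicit scheme for d_t rho = (1/2) d_u (D(rho) d_u rho) on [0, 1].

    D is a vectorized function of the density, d_max an upper bound for D used
    to pick a stable time step; rho_left and rho_right are the reservoir densities.
    """
    du = 1.0 / n_cells
    dt = 0.4 * du ** 2 / d_max                 # within the explicit stability limit
    u = np.linspace(0.0, 1.0, n_cells + 1)
    rho = np.full(n_cells + 1, 0.5 * (rho_left + rho_right))
    rho[0], rho[-1] = rho_left, rho_right      # boundary conditions from the reservoirs
    for _ in range(int(round(t_final / dt))):
        rho_mid = 0.5 * (rho[1:] + rho[:-1])   # density at cell interfaces
        flux = -0.5 * D(rho_mid) * (rho[1:] - rho[:-1]) / du
        rho[1:-1] -= dt * (flux[1:] - flux[:-1]) / du
    return u, rho

constant_D = lambda r: np.ones_like(r)         # D = 1, the simplest elliptic case
u_grid, rho_bar = relax_to_stationary(constant_D, rho_left=1.0, rho_right=2.0)
```

With unequal boundary densities the stationary profile carries a nonzero current, which is the simplest example of a boundary-driven stationary nonequilibrium state. A density-dependent `D` can be passed in the same way, provided `d_max` really bounds it.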

These equations derive from the underlying 
microscopic dynamics through an appropriate 
scaling limit in which the microscopic time and
space coordinates $\tau, x$ are rescaled as follows:
$t = \tau / N^2$, $u = x / N$, where $N$ represents the linear
size of the system. For lattice systems, $N$ is an
integer. The hydrodynamic equation [1] represents
a law of large numbers with respect to the
probability measure $P_{st}$ conditioned on an initial
state $X_0$. The initial conditions for [1] are
determined by $X_0$. Of course, many microscopic
configurations give rise to the same value of
$\rho(0, u)$. In general, $\rho = \rho(t, u)$ is an appropriate
limit of a local observable $\rho_N(X_\tau)$ as the number
$N$ of degrees of freedom diverges.

The hypothesis of Markovian evolution is also the
basis of Onsager’s 1931 theory of irreversible
processes near equilibrium. Onsager, however, did not
rely on any microscopic model and assumed, near the 
equilibrium, linear hydrodynamic equations or regres- 
sion equations as he called them. His equations, 
ignoring space dependence, were of the form 


$$\dot{\rho}_i = -\sum_j D_{i,j} \, \rho_j \qquad [2]$$


The diffusion matrix $D$ is related to the Onsager
transport matrix $\chi$ and the entropy by the
relationship


$$D = \chi s \qquad [3]$$


where the elements of $s$ are $\partial^2 S / \partial \rho_i \partial \rho_j$. The matrix
$\chi$ is defined by the relationship between flows and
affinities


$$\dot{\rho}_i = -\sum_j \chi_{i,j} \frac{\partial S}{\partial \rho_j} \qquad [4]$$
The indices $i, j$ here label different thermodynamic
variables. The matrix $\chi$ is symmetric, a property
known as Onsager reciprocity. Equations [2] and [3]
follow by developing the entropy near an equilibrium
state, that is, by taking a quadratic expression
as an approximation. The minus sign in eqn [4] is
due to our convention in which the entropy has the
same sign as the free energy.

Equation [3] makes it possible to reconstruct the entropy
from the knowledge of the coefficients $D$ and $\chi$ and
has been widely used especially in physical chemistry.
In SNSs, eqn [3] is replaced by a Hamilton-Jacobi-type
equation for the entropy.
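
The way eqn [3] is used in practice can be sketched with two thermodynamic variables. In the example below the matrices `chi` and `s` are made-up numbers chosen only to be symmetric and positive definite; everything else follows the near-equilibrium relations above: $D = \chi s$, the quadratic entropy $S(\rho) = \tfrac{1}{2} \rho^{T} s \rho$, and the regression equation [2], integrated here with a simple Euler step.

```python
import numpy as np

# Hypothetical near-equilibrium data for two thermodynamic variables.
chi = np.array([[2.0, 0.5], [0.5, 1.0]])   # Onsager transport matrix (symmetric)
s = np.array([[1.0, 0.3], [0.3, 2.0]])     # second derivatives of the entropy

D = chi @ s                                # relation [3]

# Reconstruct the entropy matrix from D and chi, as eqn [3] allows.
s_recovered = np.linalg.solve(chi, D)

def entropy(rho):
    """Quadratic approximation S(rho) = (1/2) rho^T s rho near equilibrium."""
    return 0.5 * rho @ s @ rho

def relax(rho0, t_final=10.0, dt=1e-3):
    """Euler integration of the regression equation [2]: d rho / dt = -D rho."""
    rho = np.array(rho0, dtype=float)
    for _ in range(int(round(t_final / dt))):
        rho = rho - dt * (D @ rho)
    return rho

rho_start = np.array([1.0, -1.0])
rho_end = relax(rho_start)
```

Note that $D$ itself need not be symmetric even though $\chi$ and $s$ are. With the sign convention of the text, the entropy decreases along the relaxation, since its time derivative is minus a quadratic form in the affinities with matrix $\chi$.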


Dynamical Boltzmann-Einstein Formula 


The basic assumption is that the stationary ensemble
$P_{st}$ admits a principle of large deviations describing
the fluctuations of the thermodynamic variables
appearing in the hydrodynamic equation. This
means the following. The probability that, for large
$N$, the evolution of the random variable $\rho_N$ deviates
from the solution of the hydrodynamic equation and
is close to some trajectory $\hat\rho(t)$ is exponentially small
and of the form


$$P_{st}\big( \rho_N(X_{N^2 t}) \sim \hat\rho(t), \; t \in [t_1, t_2] \big) \approx e^{-N^d [ S(\hat\rho(t_1)) + J_{[t_1, t_2]}(\hat\rho) ]} = e^{-N^d I_{[t_1, t_2]}(\hat\rho)} \qquad [5]$$


where $d$ is the dimensionality of the system, $J(\hat\rho)$ is a
functional which vanishes if $\hat\rho(t)$ is a solution of [1]
and $S(\hat\rho(t_1))$ is the entropy cost to produce the initial
density profile $\hat\rho(t_1)$. We normalize $S$ so that
$S(\bar\rho) = 0$. Therefore, $J(\hat\rho)$ represents the extra cost
necessary to follow the trajectory $\hat\rho(t)$. Finally,
$\rho_N(X_{N^2 t}) \sim \hat\rho(t)$ means closeness in some metric
and $\approx$ denotes logarithmic equivalence as $N \to \infty$.
Equation [5] is the dynamical generalization of the
Boltzmann-Einstein formula. Experience with many
models justifies this assumption.

To understand how [5] leads to a dynamical
theory of the entropy, we discuss its properties
under time reversal. Let us denote by $\theta$ the time
inversion operator defined by $\theta X_\tau = X_{-\tau}$. The probability
measure $P^*_{st}$ describing the evolution of the
time-reversed process $X^*_\tau$ is given by the composition
of $P_{st}$ and $\theta^{-1}$, that is,


$$P^*_{st}(X^*_\tau = \phi_\tau, \; \tau \in [\tau_1, \tau_2]) = P_{st}(X_\tau = \phi_{-\tau}, \; \tau \in [-\tau_2, -\tau_1]) \qquad [6]$$


Let $L$ be the generator of the microscopic
dynamics. We recall that $L$ induces the evolution
of observables (functions on the state space) according
to the equation $\partial_\tau E_{X_0}[f(X_\tau)] = E_{X_0}[(Lf)(X_\tau)]$,
where $E_{X_0}$ stands for the expectation with respect to
$P_{st}$ conditioned on the initial state $X_0$.

The time-reversed dynamics, that is, the dynamics 
which inverts the direction of the fluxes through the 
system, for example, heat flows under this dynamics 
from lower to higher temperatures, is generated by 
the adjoint $L^*$ of $L$ with respect to the invariant
measure $\mu$:


$$E^\mu[f L g] = E^\mu[(L^* f) g] \qquad [7]$$


The measure $\mu$, which is the same for both processes, is
a distribution over the configurations of the system
and formally satisfies $\mu L = 0$. The expectation with
respect to $\mu$ is denoted by $E^\mu$ and $f$, $g$ are observables.
We note that the probability $P_{st}$, and therefore $P^*_{st}$,
depends on the invariant measure $\mu$. The finite-dimensional
distributions of $P_{st}$ are in fact given by


$$P_{st}(X_{\tau_1} = \phi_{\tau_1}, \ldots, X_{\tau_n} = \phi_{\tau_n}) = \mu(\phi_{\tau_1}) \, p_{\tau_2 - \tau_1}(\phi_{\tau_1} \to \phi_{\tau_2}) \cdots p_{\tau_n - \tau_{n-1}}(\phi_{\tau_{n-1}} \to \phi_{\tau_n}) \qquad [8]$$
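where $p_\tau(\phi_1 \to \phi_2)$ is the transition probability.
According to [6] the finite-dimensional distributions
of $P^*_{st}$ are


$$\begin{aligned} P^*_{st}(X^*_{\tau_1} = \phi_{\tau_1}, \ldots, X^*_{\tau_n} = \phi_{\tau_n}) &= \mu(\phi_{\tau_1}) \, p^*_{\tau_2 - \tau_1}(\phi_{\tau_1} \to \phi_{\tau_2}) \cdots p^*_{\tau_n - \tau_{n-1}}(\phi_{\tau_{n-1}} \to \phi_{\tau_n}) \\ &= \mu(\phi_{\tau_n}) \, p_{\tau_n - \tau_{n-1}}(\phi_{\tau_n} \to \phi_{\tau_{n-1}}) \cdots p_{\tau_2 - \tau_1}(\phi_{\tau_2} \to \phi_{\tau_1}) \end{aligned} \qquad [9]$$


In particular, the transition probabilities $p_\tau(\phi_1 \to \phi_2)$
and $p^*_\tau(\phi_1 \to \phi_2)$ are related by


$$\mu(\phi_1) \, p^*_\tau(\phi_1 \to \phi_2) = \mu(\phi_2) \, p_\tau(\phi_2 \to \phi_1) \qquad [10]$$


This relationship reduces to the well-known detailed
balance condition if $p_\tau(\phi_1 \to \phi_2) = p^*_\tau(\phi_1 \to \phi_2)$.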
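
Relation [10] can be checked directly on a finite state space. The sketch below uses discrete-time Markov chains with three states as a stand-in for the continuous-time processes of the text; this substitution, the function names, and the two transition matrices are illustrative assumptions. The time-reversed transition probabilities are built from the stationary distribution exactly as [10] prescribes, for a chain with a net circulation and for a chain satisfying detailed balance.

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to a probability vector."""
    w, v = np.linalg.eig(P.T)
    mu = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return mu / mu.sum()

def adjoint_chain(P):
    """Time-reversed transition matrix: p*(i -> j) = mu(j) p(j -> i) / mu(i)."""
    mu = stationary_distribution(P)
    return (mu[None, :] * P.T) / mu[:, None]

# A chain with a net circulation 0 -> 1 -> 2 -> 0 (no detailed balance).
P_cycle = np.array([[0.0, 0.9, 0.1],
                    [0.1, 0.0, 0.9],
                    [0.9, 0.1, 0.0]])

# A birth-death chain, which satisfies detailed balance.
P_rev = np.array([[0.50, 0.50, 0.00],
                  [0.25, 0.50, 0.25],
                  [0.00, 0.50, 0.50]])

P_cycle_star = adjoint_chain(P_cycle)
P_rev_star = adjoint_chain(P_rev)
mu_cycle = stationary_distribution(P_cycle)
```

For the circulating chain the adjoint dynamics runs the cycle in the opposite direction while keeping the same stationary distribution, which is the finite-state analog of inverting the fluxes through the system. For the birth-death chain the adjoint coincides with the original chain.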

We require that also the evolution generated by 
L* admits a hydrodynamic description, that we call 
the adjoint hydrodynamics, which, however, is not 
necessarily of the same form as [1]. In fact, we 
consider models in which the adjoint hydrodynamics 
is nonlocal in space. 

In order to avoid confusion, we emphasize that what 
is usually called an equilibrium state for a reversible 
dynamics, as distinguished from an SNS, corresponds 
to the special case L* = L, that is, the detailed balance 
principle holds. In such a case, $P_{st}$ is invariant under
time reversal and the two hydrodynamics coincide. 

We now derive a first consequence of our 
assumptions, that is, the relationship between the 
functionals I and I* associated to the dynamics L 
and L* by [5]. From eqn [6], it follows that 


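$$I_{[t_1, t_2]}(\hat\rho) = I^*_{[-t_2, -t_1]}(\theta\hat\rho) \qquad [11]$$


with obvious notations. More explicitly, this equation
reads


$$S(\hat\rho(t_1)) + J_{[t_1, t_2]}(\hat\rho) = S(\hat\rho(t_2)) + J^*_{[-t_2, -t_1]}(\theta\hat\rho) \qquad [12]$$


where $\hat\rho(t_1)$, $\hat\rho(t_2)$ are the initial and final points of
the trajectory and $S(\hat\rho(t_i))$ the entropies associated
with the creation of the fluctuations $\hat\rho(t_i)$ starting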
from the SNS. The functional J* vanishes on the 
solutions of the adjoint hydrodynamics. To compute 
J*, it is necessary to know the entropy S. 

We consider now the following physical situation. 
The system is macroscopically in the stationary state
$\bar\rho$ at $t = -\infty$, but at $t = 0$ we find it in the state $\rho$. We
want to determine the most probable trajectory
followed in the spontaneous creation of this fluctuation.
According to [5], this trajectory is the one that
minimizes $J$ among all trajectories $\hat\rho(t)$ connecting $\bar\rho$
to $\rho$ in the time interval $[-\infty, 0]$. From [12],
recalling that $S(\bar\rho) = 0$, we have that


$$J_{[-\infty, 0]}(\hat\rho) = S(\rho) + J^*_{[0, \infty]}(\theta\hat\rho) \qquad [13]$$


The right-hand side is minimal if $J^*_{[0, \infty]}(\theta\hat\rho) = 0$, that
is, if $\theta\hat\rho$ is a solution of the adjoint hydrodynamics.
The existence of such a relaxation solution is due to
the fact that the stationary solution $\bar\rho$ is attractive
also for the adjoint hydrodynamics. We have therefore
the following consequences:


In a SNS the spontaneous emergence of a macroscopic 
fluctuation takes place most likely following a trajec- 
tory which is the time reversal of the relaxation path 
according to the adjoint hydrodynamics. 


This implies that the entropy is related to J by

$$S(\rho) = \inf_{\hat\rho} J_{[-\infty,0]}(\hat\rho)$$ [14]

where the minimum is taken over all trajectories $\hat\rho(t)$
connecting $\bar\rho$ to $\rho$.

We note that the reversibility of the microscopic
process $X_\tau$, which we call microscopic reversibility,
is not needed in order to deduce the Onsager-Machlup
result (i.e., that the trajectory which
creates the fluctuation is the time reversal of the
relaxation trajectory). In fact, the Onsager-Machlup
result holds if and only if the hydrodynamics
coincides with the adjoint hydrodynamics, which
we call macroscopic reversibility. Indeed, it is
possible to construct microscopic nonreversible
models, $L \neq L^*$, in which the hydrodynamics and
the adjoint hydrodynamics coincide.

Spontaneous fluctuations, including Onsager- 
Machlup time-reversal symmetry, have been 
observed in stochastically perturbed reversible elec- 
tronic devices. In nonreversible systems, an asym- 
metry between the emergence and the relaxation of 
fluctuations has been observed. The above discus- 
sion provides the explanation. 


The Hamilton-Jacobi Equation and Its 
Consequences 


We assume that the functional J has a density (which
plays the role of a Lagrangian), that is,

$$J_{[t_1,t_2]}(\hat\rho) = \int_{t_1}^{t_2} dt\, \mathcal{L}(\hat\rho(t), \partial_t\hat\rho(t))$$ [15]

Let us introduce the Hamiltonian $\mathcal{H}(\rho, H)$ as the
Legendre transform of $\mathcal{L}(\rho, \partial_t\rho)$, that is,

$$\mathcal{H}(\rho, H) = \sup_\xi \{\langle \xi, H\rangle - \mathcal{L}(\rho, \xi)\}$$ [16]

where $\langle\cdot,\cdot\rangle$ denotes integration with respect to the
macroscopic space coordinates u.

Noting that $\mathcal{H}(\rho, 0) = 0$, the Hamilton-Jacobi
equation associated to [14] is

$$\mathcal{H}\left(\rho, \frac{\delta S}{\delta\rho}\right) = 0$$ [17]

This is an equation for the functional derivative
$C(\rho) = \delta S/\delta\rho$, but not all the solutions of the
equation $\mathcal{H}(\rho, C(\rho)) = 0$ are the derivatives of some
functional. Of course, only those which are the
derivative of a functional are relevant for us.

We now specify the Hamilton-Jacobi equation
[17] for boundary-driven lattice gases. For models
with purely diffusive hydrodynamics [1], we expect
a quadratic large deviation functional of the form

$$J_{[t_1,t_2]}(\hat\rho) = \frac{1}{2}\int_{t_1}^{t_2} dt\, \left\langle \nabla^{-1}(\partial_t\hat\rho - \mathcal{D}(\hat\rho)),\ \chi(\hat\rho)^{-1}\nabla^{-1}(\partial_t\hat\rho - \mathcal{D}(\hat\rho))\right\rangle$$ [18]

where $\mathcal{D}(\rho)$ is the right-hand side of the
hydrodynamic equation [1], and by $\nabla^{-1}f$ we mean a vector
field whose divergence equals f. The form [18], which
can be derived for several models, is expected to be
very general: the functional $J(\hat\rho)$ measures how much
$\hat\rho$ differs from a solution of the hydrodynamics [1].
The matrix $\chi(\rho)$ plays the same role, in
our more general context, as the Onsager matrix in
[4]. This form of J is also typical for diffusion
processes described by finite-dimensional Langevin
equations (Freidlin-Wentzell theory).

In this case, the Lagrangian $\mathcal{L}$ is quadratic in
$\partial_t\hat\rho(t)$ and the associated Hamiltonian is given by

$$\mathcal{H}(\rho, H) = \frac{1}{2}\langle \nabla H, \chi(\rho)\nabla H\rangle + \langle H, \mathcal{D}(\rho)\rangle$$ [19]

so that the Hamilton-Jacobi equation [17] takes the
form

$$\frac{1}{2}\left\langle \nabla\frac{\delta S}{\delta\rho},\ \chi(\rho)\nabla\frac{\delta S}{\delta\rho}\right\rangle + \left\langle \frac{\delta S}{\delta\rho},\ \mathcal{D}(\rho)\right\rangle = 0$$ [20]

As is well known in mechanics, the Hamilton-Jacobi
equation has many solutions and we must give a
criterion to select the correct one. The criterion
which the correct solution has to satisfy is that it
must be a Lyapunov function with respect to the
unique stationary state.
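The step from the quadratic functional [18] to the
Hamiltonian [19] is the Legendre transform of a quadratic
form. The sketch below spells it out under one assumption
that is ours rather than the article's: the vector field
$\nabla^{-1}f$ is chosen as $-\chi(\rho)\nabla K^{-1}f$ with
$Kf = -\nabla\cdot(\chi(\rho)\nabla f)$, and H is taken to vanish at
the boundary so that $\langle H, KH\rangle = \langle\nabla H, \chi(\rho)\nabla H\rangle$.

```latex
% Assumption: \nabla^{-1} f := -\chi(\rho)\nabla K^{-1} f,
%             K f := -\nabla\cdot(\chi(\rho)\nabla f), H = 0 at the boundary.
\begin{align*}
\mathcal{L}(\rho,\xi) &= \tfrac12\big\langle \xi-\mathcal{D}(\rho),\,
      K^{-1}(\xi-\mathcal{D}(\rho))\big\rangle ,\\
\mathcal{H}(\rho,H) &= \sup_\xi\big\{\langle \xi,H\rangle-\mathcal{L}(\rho,\xi)\big\}
      \quad\Longrightarrow\quad \xi^\ast=\mathcal{D}(\rho)+KH ,\\
\mathcal{H}(\rho,H) &= \langle \mathcal{D}(\rho)+KH,\,H\rangle-\tfrac12\langle KH,\,H\rangle
      =\tfrac12\langle\nabla H,\chi(\rho)\nabla H\rangle+\langle H,\mathcal{D}(\rho)\rangle .
\end{align*}
```

Setting $H = \delta S/\delta\rho$ and requiring the result to vanish
gives the quadratic Hamilton-Jacobi equation.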

It is a simple calculation to show that eqn [3] follows 
from HJE, if we look for a solution which is a local 
function of p. This is the right choice in equilibrium 
where correlations over macroscopic distances are not 
expected if the microscopic forces are short range. 

Out of equilibrium, it has been shown by direct
calculation that for a special model, the symmetric
simple exclusion, the entropy is a nonlocal function
of the thermodynamic variables, that is, space
correlations extend to macroscopic distances. This
result can be derived in a simple way from HJE as
we will discuss later.

Lattice gases which do not conserve the number 
of particles do not give rise in general to a purely 
diffusive hydrodynamics but rather to a reaction 
diffusion equation. In this case, the large deviation 
functional will not have the quadratic form [18] and 
also the HJE will not be quadratic. An example in 
which particles can be created and destroyed is the 
so-called Kawasaki-Glauber dynamics. In this case, 
HJE has exponential nonlinearities. 


Nonequilibrium Fluctuation Dissipation Relation 


We now derive a twofold generalization of the 
celebrated fluctuation dissipation relationship: it is 
valid in nonequilibrium states and in nonlinear 
regimes. 

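Such a relationship will hold provided the rate
function J* of the time-reversed process is of the form
[18] with $\mathcal{D}$ replaced by $\mathcal{D}^*$, the adjoint hydrodynamics,

$$\partial_t\rho = \mathcal{D}^*(\rho)$$ [21]

with the same boundary conditions as [1].

If J* has the form

$$J^*_{[t_1,t_2]}(\hat\rho) = \frac{1}{2}\int_{t_1}^{t_2} dt\, \left\langle \nabla^{-1}(\partial_t\hat\rho - \mathcal{D}^*(\hat\rho)),\ \chi(\hat\rho)^{-1}\nabla^{-1}(\partial_t\hat\rho - \mathcal{D}^*(\hat\rho))\right\rangle$$ [22]

by taking the variation of eqn [12], we get

$$\mathcal{D}(\rho) + \mathcal{D}^*(\rho) = \nabla\cdot\left(\chi(\rho)\nabla\frac{\delta S}{\delta\rho}\right)$$ [23]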

This relation can be verified explicitly for the
nonequilibrium zero-range process which we discuss
later and holds for several other models. It is also
easy to check that the linearization of [23] around
the stationary profile $\bar\rho$ yields a fluctuation
dissipation relationship which reduces to the usual one in
equilibrium.

The fluctuation dissipation relation [23] can be used
to obtain the adjoint hydrodynamics from $\mathcal{D}(\rho)$ and
$\delta S/\delta\rho$; the first is usually known and the second can be
calculated from the Hamilton-Jacobi equation.


H Theorem 


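We show that the functional S is decreasing along
the solutions of both the hydrodynamic equation [1]
and the adjoint hydrodynamics

$$\partial_t\rho = \mathcal{D}^*(\rho) = \nabla\cdot\left(\chi(\rho)\nabla\frac{\delta S}{\delta\rho}\right) - \mathcal{D}(\rho)$$ [24]

Let $\rho(t)$ be a solution of [1] or [24]; by using the
Hamilton-Jacobi equation [20], we get

$$\frac{d}{dt}S(\rho(t)) = -\frac{1}{2}\left\langle \nabla\frac{\delta S}{\delta\rho}(\rho(t)),\ \chi(\rho(t))\nabla\frac{\delta S}{\delta\rho}(\rho(t))\right\rangle \le 0$$ [25]

In particular, we have that $(d/dt)S(\rho(t)) = 0$ if and
only if $(\delta S/\delta\rho)(\rho(t)) = 0$.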

We remark that the right-hand side of [25] 
vanishes in the stationary state, that is, there is no 
internal entropy production due to the evolution. 
On the other hand, there is a steady entropy 
production due to the differences in the chemical 
potentials of the reservoirs. This is not discussed in 
this article. 


Decomposition of Hydrodynamics 


There is a structural property of hydrodynamics
which follows from the HJE. The hydrodynamic
equation can be decomposed as the sum of a
gradient vector field and a vector field $\mathcal{A}$ orthogonal
to it in the metric induced by the operator $K^{-1}$,
where $Kf = -\nabla\cdot(\chi(\rho)\nabla f)$, namely

$$\mathcal{D}(\rho) = \frac{1}{2}\nabla\cdot\left(\chi(\rho)\nabla\frac{\delta S}{\delta\rho}\right) + \mathcal{A}(\rho)$$ [26]

Similarly, using the fluctuation dissipation
relationship [23] for the adjoint hydrodynamics, we
have

$$\mathcal{D}^*(\rho) = \frac{1}{2}\nabla\cdot\left(\chi(\rho)\nabla\frac{\delta S}{\delta\rho}\right) - \mathcal{A}(\rho)$$ [27]

Since $\mathcal{A}$ is orthogonal to $\delta S/\delta\rho$, it does not contribute
to the entropy production. The vector field $\mathcal{A}$ is odd
under time reversal like a magnetic force.
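A finite-dimensional toy analogue may help to visualize the
decomposition. In the sketch below the "entropy" is
$S(x) = |x|^2/2$ on the plane, K is the identity, and the
nondissipative part is a rotation; all of these choices are
illustrative assumptions, not a model from the article. The
drift and its adjoint differ only in the sign of the rotation,
and S decreases at the same rate along both.

```python
import math

OMEGA = 2.0  # strength of the nondissipative part (illustrative value)


def grad_S(x):
    """Gradient of the toy entropy S(x) = |x|^2 / 2 (minimum at the stationary state 0)."""
    return (x[0], x[1])


def A(x, omega=OMEGA):
    """Rotation field, orthogonal to grad_S at every point."""
    return (-omega * x[1], omega * x[0])


def D(x):
    """Toy drift: -(1/2) K grad_S + A with K = identity, mirroring the form of [26]."""
    g, a = grad_S(x), A(x)
    return (-0.5 * g[0] + a[0], -0.5 * g[1] + a[1])


def D_star(x):
    """Toy adjoint drift: same gradient part, A with the opposite sign, mirroring [27]."""
    g, a = grad_S(x), A(x)
    return (-0.5 * g[0] - a[0], -0.5 * g[1] - a[1])


def S(x):
    return 0.5 * (x[0] ** 2 + x[1] ** 2)


def entropy_along(field, x0, dt=1e-3, steps=5000):
    """Explicit Euler integration; returns S evaluated along the trajectory."""
    x, values = x0, [S(x0)]
    for _ in range(steps):
        d = field(x)
        x = (x[0] + dt * d[0], x[1] + dt * d[1])
        values.append(S(x))
    return values


forward = entropy_along(D, (1.0, 0.0))
backward = entropy_along(D_star, (1.0, 0.0))
```

In this toy case $dS/dt = -S$ along both flows, so the rotation
changes the shape of the trajectory but not the decay of S.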

Both terms of the decomposition vanish in the
stationary state, that is, when $\rho = \bar\rho$. Whereas in
equilibrium the hydrodynamics is the gradient flow of
the entropy S, the term $\mathcal{A}(\rho)$ is characteristic of
nonequilibrium states. Note that, for small fluctuations
$\rho \approx \bar\rho$ and small differences in the chemical
potentials at the boundaries, $\mathcal{A}(\rho)$ becomes a
second-order quantity and Onsager theory is a consistent
approximation.

Equation [26] is interesting because it separates
the dissipative part of the hydrodynamic evolution
associated to the thermodynamic force $\delta S/\delta\rho$ and
therefore provides important physical information.
Notice that the thermodynamic force $\delta S/\delta\rho$
appears linearly in the hydrodynamic equation
even when this is nonlinear in the macroscopic
variables.

In general, the two terms of the decomposition
[26] are nonlocal in space even if $\mathcal{D}$ is a local
function of $\rho$. This is the case for the simple
exclusion process discussed later. Furthermore,
while the form of the hydrodynamic equation does
not depend explicitly on the chemical potentials,
$\delta S/\delta\rho$ and $\mathcal{A}$ do.

To understand how the decomposition [26] arises 
microscopically, let us consider a stochastic lattice 
gas. Let 


$$L = \frac{1}{2}(L + L^*) + \frac{1}{2}(L - L^*)$$ [28]


be its Markov generator, where L* is the adjoint of 
L with respect to the invariant measure, namely the 
generator of the  time-reversed microscopic 
dynamics. The term L — L* behaves like a Liouville 
operator, that is, it is anti-Hermitian and, in the 
scaling limit, produces the term A in the hydro- 
dynamic equation. This can be verified explicitly in 
the boundary-driven zero-range model introduced in 
the next section. 

Since the adjoint generator can be written as
$L^* = (L + L^*)/2 - (L - L^*)/2$, the adjoint
hydrodynamics must be of the form [27]. In particular, if
the microscopic generator is self-adjoint, we get $\mathcal{A} = 0$
and thus $\mathcal{D}(\rho) = \mathcal{D}^*(\rho)$. On the other hand, it may
happen that microscopic nonreversible processes,
namely those for which $L \neq L^*$, can produce macroscopic
reversible hydrodynamics if $L - L^*$ does not
contribute to the hydrodynamic limit.

The decompositions [26] and [27] are reminiscent of
electrical conduction in the presence of a magnetic
field. Consider the motion of electrons in a
conductor: a simple model is given by the effective
equation

$$\dot p = -e\left(E + \frac{1}{mc}\, p \wedge H\right) - \frac{1}{\tau}\, p$$ [29]

where p is the momentum, e the electron charge, E
the electric field, H the magnetic field, m the mass,
c the velocity of light, and $\tau$ the relaxation time.
The dissipative term $p/\tau$ is orthogonal to the
Lorentz force $p \wedge H$. We define time reversal as the
transformation $p \mapsto -p$, $H \mapsto -H$. The adjoint
evolution is given by

$$\dot p = e\left(E + \frac{1}{mc}\, p \wedge H\right) - \frac{1}{\tau}\, p$$ [30]

where the signs of the dissipation and the
electromagnetic force transform in analogy to [26] and
[27].

Let us consider in particular the Hall effect where 
we have conduction along a rectangular plate 
immersed in a perpendicular magnetic field H with 
a potential difference across the longer side. The 
magnetic field determines a potential difference 
across the other side of the plate. In our setting on 
the contrary, it is the difference in chemical 
potentials at the boundaries that introduces in the 
equations a “magnetic-like” term. There is therefore 
a kind of equivalence between certain externally 
applied fields and driving the system at the 
boundaries. 


Minimum Dissipation Principle 


In 1931 Onsager formulated, within his near
equilibrium theory, a variational principle which
shows that the hydrodynamic evolution minimizes
at each instant of time a quadratic functional of $\dot\rho$.
He called this the "minimum dissipation principle."
We now show that the decomposition of the
previous subsection leads to a natural exact
generalization of this principle. We want to construct a
functional of the variables $\rho$ and $\dot\rho$ such that the
Euler equation associated to the vanishing of the
first variation under arbitrary changes of $\dot\rho$ is the
hydrodynamic equation [1]. We define the "dissipation
function"

$$F(\rho, \dot\rho) = \left\langle (\dot\rho - \mathcal{A}(\rho)),\ K^{-1}(\dot\rho - \mathcal{A}(\rho))\right\rangle$$ [31]

and the functional

$$\Phi(\rho, \dot\rho) = \dot S(\rho) + F(\rho, \dot\rho) = \left\langle \frac{\delta S}{\delta\rho}, \dot\rho\right\rangle + \left\langle (\dot\rho - \mathcal{A}(\rho)),\ K^{-1}(\dot\rho - \mathcal{A}(\rho))\right\rangle$$ [32]

which generalize the corresponding Onsager
definitions (Onsager 1931a, b). The operator K has been
defined in the previous subsection.

It is easy to verify that

$$\delta_{\dot\rho}\Phi = 0$$ [33]

is equivalent to the hydrodynamic equation [1].
Furthermore, a simple calculation gives

$$F\big|_{\dot\rho = \mathcal{D}(\rho)} = \frac{1}{4}\left\langle \nabla\frac{\delta S}{\delta\rho},\ \chi(\rho)\nabla\frac{\delta S}{\delta\rho}\right\rangle$$ [34]

that is, 2F on the hydrodynamic trajectories equals
the entropy production rate as in Onsager's near
equilibrium approximation.


The dissipation function for the adjoint hydro- 
dynamics is obtained by changing the sign of A 
in [31]. 
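The two claims about the dissipation function can be checked
in a few lines. The sketch below assumes that K is
self-adjoint and uses the decomposition of the hydrodynamics
into $-\tfrac12 K\,\delta S/\delta\rho$ plus the orthogonal field
$\mathcal{A}(\rho)$; the symbols follow the definitions of F and
$\Phi$ given above.

```latex
% Assumption: K = -\nabla\cdot(\chi(\rho)\nabla\,\cdot\,) is self-adjoint.
\begin{align*}
\frac{\delta\Phi}{\delta\dot\rho}
  &= \frac{\delta S}{\delta\rho}+2K^{-1}\big(\dot\rho-\mathcal{A}(\rho)\big)=0
  \;\Longleftrightarrow\;
  \dot\rho=-\tfrac12 K\frac{\delta S}{\delta\rho}+\mathcal{A}(\rho)
  =\tfrac12\nabla\cdot\Big(\chi(\rho)\nabla\frac{\delta S}{\delta\rho}\Big)+\mathcal{A}(\rho)
  =\mathcal{D}(\rho),\\
F\big|_{\dot\rho=\mathcal{D}(\rho)}
  &= \Big\langle \tfrac12 K\frac{\delta S}{\delta\rho},\,
     K^{-1}\,\tfrac12 K\frac{\delta S}{\delta\rho}\Big\rangle
  =\tfrac14\Big\langle\nabla\frac{\delta S}{\delta\rho},\,
     \chi(\rho)\nabla\frac{\delta S}{\delta\rho}\Big\rangle .
\end{align*}
```

Since the entropy decreases along the hydrodynamics at the
rate $\tfrac12\langle\nabla\,\delta S/\delta\rho,\ \chi\nabla\,\delta S/\delta\rho\rangle$,
this is indeed one half of the entropy production rate.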


Entropy and Optimal Control 


There is an interesting interpretation of the entropy 
as a minimal cost to produce a fluctuation by 
externally acting on the system. The idea is to show 
that there exists a cost function which on the optimal 
control trajectory coincides with the entropy differ- 
ence with respect to the stationary state. 

We add an external perturbation v to the
hydrodynamic equation

$$\partial_t\rho = \frac{1}{2}\nabla\cdot(D(\rho)\nabla\rho) + v = \mathcal{D}(\rho) + v$$ [35]

We want to choose v so as to drive, with minimal
cost, the system from its stationary state $\bar\rho$ to an
arbitrary state $\rho$. A simple cost function is

$$\frac{1}{2}\int_{t_1}^{t_2} ds\, \left\langle v(s),\ K^{-1}(\rho(s))\, v(s)\right\rangle$$ [36]

where $\rho(s)$ is the solution of [35] and we recall that
$K(\rho)f = -\nabla\cdot(\chi(\rho)\nabla f)$. More precisely, given
$\rho(t_1) = \bar\rho$, we want to drive the system to $\rho(t_2) = \rho$
by an external field v which minimizes [36]. This is
a standard problem in control theory. Let

$$V(\rho) = \inf \frac{1}{2}\int_{t_1}^{t_2} ds\, \left\langle v(s),\ K^{-1}(\rho(s))\, v(s)\right\rangle$$ [37]

where the infimum is taken with respect to all fields
v which drive the system to $\rho$ in an arbitrary time
interval $[t_1, t_2]$. The optimal field v can be obtained
by solving the Bellman equation which reads

$$\min_v \left\{ \frac{1}{2}\left\langle v, K^{-1}(\rho)\, v\right\rangle - \left\langle \mathcal{D}(\rho) + v,\ \frac{\delta V}{\delta\rho}\right\rangle \right\} = 0$$ [38]

It is easy to express the optimal v in terms of V; we
get

$$v = K\frac{\delta V}{\delta\rho}$$ [39]

Hence, [38] now becomes

$$\frac{1}{2}\left\langle \frac{\delta V}{\delta\rho},\ K\frac{\delta V}{\delta\rho}\right\rangle + \left\langle \mathcal{D}(\rho),\ \frac{\delta V}{\delta\rho}\right\rangle = 0$$ [40]

By identifying the cost functional $V(\rho)$ with $S(\rho)$, eqn
[40] coincides with the Hamilton-Jacobi equation [20].

By inserting the optimal v [39] in [35] and
identifying V with S, we get that the optimal
trajectory $\rho(t)$ solves the time-reversed adjoint
hydrodynamics, namely

$$\partial_t\rho = -\mathcal{D}^*(\rho)$$ [41]

The trajectory of the spontaneous emergence of a
fluctuation coincides therefore with the trajectory of
minimal cost for the optimal control. The optimal
field v does not depend on the nondissipative part $\mathcal{A}$
of the hydrodynamics.
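For readers who want the intermediate algebra, the following
sketch carries out the minimization over v in the Bellman
equation and then inserts the optimal field into the
controlled equation. It assumes K self-adjoint and uses the
fluctuation dissipation relation [23] in the last line.

```latex
\begin{align*}
\frac{\delta}{\delta v}\Big[\tfrac12\langle v,K^{-1}v\rangle
   -\big\langle\mathcal{D}(\rho)+v,\tfrac{\delta V}{\delta\rho}\big\rangle\Big]
 &= K^{-1}v-\frac{\delta V}{\delta\rho}=0
 \;\Longrightarrow\; v=K\frac{\delta V}{\delta\rho},\\
\tfrac12\big\langle K\tfrac{\delta V}{\delta\rho},\tfrac{\delta V}{\delta\rho}\big\rangle
 -\big\langle\mathcal{D}(\rho),\tfrac{\delta V}{\delta\rho}\big\rangle
 -\big\langle K\tfrac{\delta V}{\delta\rho},\tfrac{\delta V}{\delta\rho}\big\rangle &= 0
 \;\Longleftrightarrow\;
 \tfrac12\big\langle\tfrac{\delta V}{\delta\rho},K\tfrac{\delta V}{\delta\rho}\big\rangle
 +\big\langle\mathcal{D}(\rho),\tfrac{\delta V}{\delta\rho}\big\rangle=0,\\
\partial_t\rho &= \mathcal{D}(\rho)+K\frac{\delta S}{\delta\rho}
 =\mathcal{D}(\rho)-\nabla\cdot\Big(\chi(\rho)\nabla\frac{\delta S}{\delta\rho}\Big)
 =-\mathcal{D}^\ast(\rho).
\end{align*}
```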


Models 


The general theory will now be illustrated by briefly 
describing models where it has been successfully 
applied. We consider examples of different nature in 
order to emphasize the generality and flexibility of 
the point of view developed in the previous section. 

We have chosen three examples in which the 
theory is used in different ways. The first one, the 
zero-range process, can be solved in a simple way so 
that the theory can be verified in detail. In the second 
one, the symmetric simple exclusion, we derive from 
the HJE a nonlinear ordinary differential equation 
first obtained by Derrida, Lebowitz, and Speer 
through a direct rather complex calculation. This 
equation implies the nonlocality of the entropy in the 
SNS of this model. The third model, the Kawasaki- 
Glauber dynamics, provides the illustration of two 
aspects. Nonlocality of the entropy, that is, long- 
range correlations, can appear in isolated equilibrium 
states if the microscopic dynamics is not time-reversal 
invariant. This means that long-range correlations as 
a signature of time-reversal violation are not 
restricted to SNSs. The second aspect to be under- 
lined is the effectiveness of the HJE in a more 
complex case: in fact in this model, the number of 
particles is not conserved which leads to a very 
complicated structure of the HJE. 

As a general comment, we emphasize that 
dynamics microscopically different but leading to 
the same macroscopic description, in particular the 
same hydrodynamics and large deviation functional, 
are indistinguishable for the theory which is purely 
macroscopic. 


Zero Range 


We consider the so-called zero-range process
which models a nonlinear diffusion of a lattice
gas. The model is described by a non-negative integer
variable $\eta_\tau(x)$ representing the number of particles
at site x and time $\tau$ of a finite lattice which for
simplicity we assume one dimensional. The particles
jump with rates $g(\eta(x))$ to one of the
nearest-neighbor sites x + 1, x - 1 with probability 1/2.
The function g(k) is nondecreasing and g(0) = 0.
We assume that our system interacts with two
reservoirs of particles in positions N and -N with
rates $p_+$ and $p_-$, respectively. This model can be
solved exactly and the previous theory can be
checked in full detail.
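The microscopic rules above can be turned into a short
continuous-time simulation. The sketch below is only one
possible reading of the boundary dynamics, and that reading is
an assumption: the left and right reservoirs inject a particle
at the first and last site with rates `p_minus` and `p_plus`,
and a particle that jumps off either end of the lattice is
absorbed. The function name and parameters are hypothetical.

```python
import random


def simulate_zero_range(n_sites, g, p_minus, p_plus, t_max, seed=0):
    """Gillespie-type simulation of a boundary-driven zero-range process.

    eta[x] is the occupation of site x; a particle leaves site x with rate
    g(eta[x]) and moves left or right with probability 1/2.
    Assumed boundary rule: injection at the end sites with rates p_minus and
    p_plus (p_minus + p_plus > 0), absorption of particles that jump out.
    """
    rng = random.Random(seed)
    eta = [0] * n_sites
    t = 0.0
    while True:
        jump_rates = [g(k) for k in eta]
        total_jump = sum(jump_rates)
        total = total_jump + p_minus + p_plus
        t += rng.expovariate(total)          # waiting time to the next event
        if t > t_max:
            return eta
        r = rng.random() * total
        if total_jump == 0.0 or r < p_minus + p_plus:
            # Injection from one of the two reservoirs.
            site = 0 if r < p_minus else n_sites - 1
            eta[site] += 1
        else:
            # A particle leaves site x, chosen with probability ~ g(eta[x]).
            x = rng.choices(range(n_sites), weights=jump_rates)[0]
            if eta[x] == 0:
                continue                     # guard against rounding artefacts
            eta[x] -= 1
            y = x + rng.choice((-1, 1))
            if 0 <= y < n_sites:
                eta[y] += 1                  # otherwise the particle is absorbed


# Example: g(k) = k corresponds to independent random walkers.
final_eta = simulate_zero_range(11, lambda k: float(k), 1.0, 3.0, t_max=50.0, seed=1)
```

Averaging `final_eta` over many seeds and long times would
give an estimate of the stationary density profile for the
chosen boundary rule.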

Let us introduce the macroscopic coordinates,
time $t = \tau/N^2$ and space $u = x/N$. To describe the
macroscopic dynamics, we introduce the empirical
density

$$\rho^N_t(u) = \frac{1}{N}\sum_{x=-N}^{N} \eta_{N^2 t}(x)\,\delta(u - x/N)$$ [42]

where $\delta(u - x/N)$ is the Dirac $\delta$. One can prove that in
the limit $N \to \infty$, the empirical density [42] tends in
probability to a continuous function $\rho_t(u)$, which
satisfies the following hydrodynamic equation:

$$\partial_t\rho = \frac{1}{2}\Delta\phi(\rho) = \mathcal{D}(\rho)$$ [43]

where $\phi(\rho)$ can be explicitly defined in terms of the
rates $g(\eta)$. The boundary conditions for [43] are
$\phi(\rho(t, \pm 1)) = p_\pm$.
The adjoint hydrodynamics is

$$\partial_t\rho = \frac{1}{2}\Delta\phi(\rho) - \alpha\nabla\left(\frac{\phi(\rho)}{\lambda(u)}\right) = \mathcal{D}^*(\rho)$$ [44]

with

$$\alpha = \frac{1}{2}(p_+ - p_-)$$

and

$$\lambda(u) = \frac{1}{2}\left[p_+(1 + u) + p_-(1 - u)\right]$$

The boundary conditions for [44] are the same as
for [43]. The second term on the right-hand side of
[44] is proportional to the difference of the chemical
potentials and produces an inversion of the particle
flux. The action functionals $J(\hat\rho)$ and $J^*(\hat\rho)$ for this
model have been computed and have the form [18]
and [22], respectively, with $\chi(\rho) = \phi(\rho)$. The entropy
$S(\rho)$ can be easily computed directly from the
expression of the invariant measure which is of
product type and is known explicitly:

$$S(\rho) = \int du \left[\rho(u)\log\frac{\phi(\rho(u))}{\lambda(u)} - \log\frac{Z(\phi(\rho(u)))}{Z(\lambda(u))}\right]$$ [45]

where

$$Z(\phi) = 1 + \sum_{k\ge 1}\frac{\phi^k}{g(1)\cdots g(k)}$$

It is easy to verify that it solves the HJE. Due to the
special zero-range character of the interaction in this
model, there are no long-range correlations in
nonequilibrium states.
Simple Exclusion


The simple exclusion process is a model of a lattice 
gas with an exclusion principle: a particle can move 
to a neighboring site, with rate 1/2 for each side, 
only if this is empty. We consider again a
one-dimensional case and we denote by $\eta_\tau(x) \in \{0, 1\}$ the
number of particles at the site x at (microscopic)
time $\tau$. The system is in contact with particle
reservoirs at the boundaries + N where a particle is 
created with rates p+ if the boundary site is empty 
and is destroyed 1 — p if it is occupied. In contrast 
to the zero-range model, the invariant measure 
carries long-range correlations making the entropy 
nonlocal. 

The hydrodynamic equation for the simple exclu- 
sion process can be derived as for the zero-range 
process; in fact, it is easier in this case because a 
simple computation leads directly to a closed 
equation for the empirical density which is defined 
as in [42] except that the variable $\eta$ now takes only
the values 0 or 1. We find that the limiting density
evolves according to the linear heat equation

$$\partial_t\rho(t, u) = \frac{1}{2}\Delta\rho(t, u) = \mathcal{D}(\rho)$$ [46]
with boundary conditions 
P+ 
a) = 
pt, £1) = 7 m 


In this case, the density of particles $\rho$ takes values
in [0, 1]. We use the HJE to calculate the entropy.
For this model, we have $\chi(\rho) = \rho(1 - \rho)$. We show
that the solution of the HJE for $S(\rho)$ (which is a
functional derivative equation) can be reduced to the
solution of an ordinary differential equation.
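Because the hydrodynamics of this model is the linear heat
equation, its stationary profile is simply the linear
interpolation between the two boundary densities. The sketch
below illustrates this with an explicit finite-difference
scheme on [-1, 1]; the boundary densities `rho_minus` and
`rho_plus`, the grid size, and the initial flat profile are
illustrative assumptions.

```python
def relax_density(rho_minus, rho_plus, n=21, t_final=20.0):
    """Integrate d(rho)/dt = (1/2) * Laplacian(rho) on [-1, 1] with fixed
    boundary densities, using an explicit Euler scheme."""
    h = 2.0 / (n - 1)                 # grid spacing
    dt = 0.4 * h * h                  # keeps 0.5 * dt / h**2 = 0.2 (stable)
    rho = [0.5] * n                   # arbitrary initial profile
    rho[0], rho[-1] = rho_minus, rho_plus
    for _ in range(int(t_final / dt)):
        new = rho[:]
        for i in range(1, n - 1):
            lap = (rho[i - 1] - 2.0 * rho[i] + rho[i + 1]) / (h * h)
            new[i] = rho[i] + 0.5 * dt * lap
        rho = new
    return rho


profile = relax_density(0.2, 0.8)
# Linear interpolation between the boundary values, for comparison.
linear = [0.2 + (0.8 - 0.2) * i / 20 for i in range(21)]
```

The nonlocality of the entropy discussed in this subsection
is therefore a property of the fluctuations, not of the
stationary profile itself, which stays linear.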

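The Hamilton-Jacobi equation for the simple
exclusion process is

$$\frac{1}{2}\left\langle \nabla\frac{\delta S}{\delta\rho},\ \rho(1 - \rho)\nabla\frac{\delta S}{\delta\rho}\right\rangle + \left\langle \frac{\delta S}{\delta\rho},\ \frac{1}{2}\Delta\rho\right\rangle = 0$$ [47]

We look for a solution of the form

$$\frac{\delta S}{\delta\rho} = \log\frac{\rho(u)}{1 - \rho(u)} - \phi(u; \rho)$$ [48]

for some functional $\phi(u; \rho)$ to be determined
satisfying the boundary conditions

$$\phi(\pm 1) = \log\frac{\rho_\pm}{1 - \rho_\pm}$$

in the space variable, where $\rho_\pm = \rho(t, \pm 1)$ are the
boundary densities fixed by the boundary conditions of
[46]. The first term on the right-hand side is the
derivative of the equilibrium entropy, that is for
boundary conditions $p_- = p_+$.
Inserting [48] into [47], we get (note that
$\rho - e^\phi/(1 + e^\phi)$ vanishes at the boundary)

$$0 = -\left\langle \nabla\left(\log\frac{\rho}{1 - \rho} - \phi\right),\ \rho(1 - \rho)\nabla\phi\right\rangle$$
$$= -\langle \nabla\rho, \nabla\phi\rangle + \langle \rho(1 - \rho), (\nabla\phi)^2\rangle$$
$$= -\left\langle \nabla\left(\rho - \frac{e^\phi}{1 + e^\phi}\right), \nabla\phi\right\rangle + \left\langle \rho(1 - \rho) - \frac{e^\phi}{(1 + e^\phi)^2},\ (\nabla\phi)^2\right\rangle$$
$$= -\left\langle \rho - \frac{e^\phi}{1 + e^\phi},\ (\nabla\phi)^2\left(\rho - \frac{1}{1 + e^\phi}\right) - \Delta\phi\right\rangle$$

We obtain a nontrivial solution of the Hamilton-Jacobi
equation if we solve the following ordinary differential
equation, corresponding to the vanishing of the right
side of the scalar product, which relates the
functional $\phi(u) = \phi(u; \rho)$ to $\rho$:

$$\frac{\Delta\phi(u)}{(\nabla\phi(u))^2} + \frac{1}{1 + e^{\phi(u)}} = \rho(u), \quad u \in (-1, 1), \qquad \phi(\pm 1) = \log\frac{\rho_\pm}{1 - \rho_\pm}$$ [49]

It is clear that $\phi$ is a nonlocal functional of $\rho$. A
computation shows that the derivative of the
functional

$$S(\rho) = \int du \left\{ \rho\log\rho + (1 - \rho)\log(1 - \rho) + (1 - \rho)\phi - \log(1 + e^\phi) + \log\frac{\nabla\phi}{\nabla\bar\rho} \right\}$$

is given by [48] when $\phi(u; \rho)$ solves [49].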


Kawasaki-Glauber Dynamics 


The model consists of particles on a lattice evolving 
according to two basic dynamical processes: 


1. a particle can move to a neighboring site if this is 
empty as in the simple exclusion and 

2. a particle can disappear in an occupied site or be 
created if this is empty, the rate depending on the 
nearby configuration. 


The first process is conservative while the second is 
not. 

As before the object of our study is the empirical
density [42]. It is possible to show that as N goes to
infinity, $\rho(t, u)$ is a solution of

$$\partial_t\rho = \frac{1}{2}\Delta\rho + B(\rho) - D(\rho)$$ [50]

with

$$B(\rho) = E_{\nu_\rho}\left(c(\eta)(1 - \eta(0))\right)$$ [51]

$$D(\rho) = E_{\nu_\rho}\left(c(\eta)\eta(0)\right)$$ [52]

where $\nu_\rho$ is the Bernoulli product distribution with
parameter $\rho$. Typically, $B(\rho)$ and $D(\rho)$ are
polynomials in $\rho$. For this model we consider equilibrium
states so that we can take periodic boundary
conditions. An equilibrium state corresponds to a
density $\rho$ which is the solution of the equation
aa he e gives a minimum of the potential 

= |D (p')|dp'. We admit potentials 
a unr minima. The Hamiltonian associated 
to the large deviation functional for this model is not 
quadratic: 


$$\mathcal{H}(\rho, H) = \int du \left[ \frac{1}{2}H\Delta\rho + \frac{1}{2}\rho(1 - \rho)(\nabla H)^2 - B(\rho)(1 - \exp H) - D(\rho)(1 - \exp(-H)) \right]$$ [53]

where H has the role of the conjugate momentum.
The Hamilton-Jacobi equation

$$\mathcal{H}\left(\rho, \frac{\delta S}{\delta\rho}\right) = 0$$ [54]

is therefore very complicated but can be solved by
successive approximations using as an expansion
parameter $\rho - \bar\rho$, where $\bar\rho$ is a solution of
$B(\rho) = D(\rho)$, that is, a stationary solution of
hydrodynamics. For $\rho = \bar\rho$, we have $\delta S/\delta\rho = 0$. We
are looking for an approximate solution of [54] of the form

$$S(\rho) = \frac{1}{2}\int du \int dv\, (\rho(u) - \bar\rho)\, k(u, v)\, (\rho(v) - \bar\rho) + o((\rho - \bar\rho)^2)$$ [55]

The kernel k(u, v) is the inverse of the density
correlation function c(u, v):

$$\int c(u, y)\, k(y, v)\, dy = \delta(u - v)$$ [56]


By inserting [55] in [54], one can show that k(u, v) 
satisfies the following equation: 
15(1 — p)A,k(u,v) — 


xp bok(u, v) 
— 5A,,6(u — v) + 


(dı —bi)6(u—v) =0 [57] 


and 
bo = B(p) = D(p) = do [58] 


If the entropy is a local functional of the density,
k(u, v) must be of the form $k(u, v) = f(\bar\rho)\delta(u - v)$
which inserted in [57] gives

$$f(\bar\rho) = [\bar\rho(1 - \bar\rho)]^{-1}$$ [59]

and

$$b_0[\bar\rho(1 - \bar\rho)]^{-1} - (d_1 - b_1) = 0$$ [60]

Therefore if $b_0$, $b_1$, $d_1$ do not satisfy the last
equation, the entropy cannot be a local functional
of the density. It can be shown that in this case time- 
reversal invariance is violated and the adjoint 
hydrodynamics is different from [50]. This calcula- 
tion supports the conjecture that macroscopic 
correlations are a generic feature of equilibrium 
states of nonreversible lattice gases. 


See also: Interacting Particle Systems and 
Hydrodynamic Equations; Interacting Stochastic Particle 
Systems; Nonequilibrium Statistical Mechanics 
(Stationary): Overview; Quantum Central-Limit Theorems. 
Introduction


Nuclear magnetic resonance (NMR) is a subtle 
quantum-mechanical phenomenon that, through 
magnetic resonance imaging (MRI), has played a 
major role in the revolution in medical imaging over 
the last 30 years. Before being conceived for use in 
imaging, NMR was employed by chemists to do 
spectroscopy, and it remains a very important tech- 
nique for determining the structure of complex 
chemical compounds like proteins. In this article we 
explain how NMR is used to create an image of a 
three-dimensional object. Scant attention is paid to 
both NMR spectroscopy, and the quantum descrip- 
tion of NMR. Those seeking a more complete 
introduction to these subjects should consult the 
article Nuclear Magnetic Resonance in this Encyclo- 
pedia, as well as the monographs of Abragam (1983) 
or Ernst et al. (1987), for spectroscopy, and that of 
Callaghan (1993) for imaging. All three books 
consider the quantum-mechanical description of 
these phenomena. Comprehensive discussions of 
MRI can be found in Bernstein et al. (2004) and 
Haacke et al. (1999), and a historical appreciation of 
the development of MRI is given in Wehrli (1995). 


The Bloch Equation 


We begin with the Bloch phenomenological equa- 
tion, which provides a model for the interactions 
between applied magnetic fields and the nuclear 
spins in the objects under consideration. This is a 
macroscopic averaged model that describes the 
interaction of aggregates of spins, called isochro- 
mats, with applied magnetic fields. An isochromat is 
a collection of “like” spins, which is spatially large 
on the atomic scale, but very small on the scale of 
the variations present in the applied magnetic fields. 
Spins are alike if they belong to the same species and 
are in the same chemical environment. There may be 
several different classes of spins, but, in this article, 
it is assumed that they are noninteracting and so it 
suffices to consider each separately. Henceforth, we
suppose that there is a single class of like spins. The
distribution of isochromats for these spins is
described macroscopically by the spin density
function, which we denote by $\rho(x, y, z)$. In most
medical applications, one is imaging the distribution 
of spins arising from hydrogen protons in water 
molecules. 

The state of the isochromat at spatial location 
(x,y,z) is given by a 3-vector: 


$$M(x, y, z) = (m_1(x, y, z),\ m_2(x, y, z),\ m_3(x, y, z))$$


which is interpreted as the magnetic moment per 
unit volume. It is an ensemble mean of the quantum 
dipoles caused by the spins within the isochromat. In 
most applications of NMR to imaging, the applied
magnetic field is described as the sum of a large,
time-independent field, $B_0(x, y, z)$, and smaller
time-dependent fields, $B'(x, y, z; t)$. In the presence of a
static field, thermal fluctuations cause the nuclear
spins to slightly prefer an orientation aligned with
the field. Using the Boltzmann distribution, one
obtains that the nuclear paramagnetic susceptibility
of water protons is given by

$$\chi = \frac{\gamma^2\hbar^2}{4k_B T}$$ [1]

Here $\hbar$ is Planck's constant, $k_B$ the Boltzmann
constant, and T the absolute temperature (see Levitt
(2001)). The constant $\gamma$ is called the gyromagnetic
(or magnetogyric) ratio. For a proton,

$$\gamma \approx 2\pi \times 42.5764 \times 10^6\ \mathrm{rad\ s^{-1}\ T^{-1}}$$ [2]


For water molecules at 
y 3.6 x10. 

If the sample is held stationary in the field Bo for a 
sufficiently long time, then the spins become 
polarized and a bulk magnetic moment appears; 
this is called the equilibrium magnetization: 


room temperature, 


$$M_0(x, y, z) = \chi\rho(x, y, z)\, B_0(x, y, z)$$ [3]


The Bloch equation describes the evolution of M
under the influence of the applied field $B = B_0 + B'$:

$$\frac{dM(x, y, z; t)}{dt} = \gamma M(x, y, z; t) \times B(x, y, z; t) - \frac{1}{T_2}M^\perp(x, y, z; t) + \frac{1}{T_1}\left(M_0(x, y, z) - M^\parallel(x, y, z; t)\right)$$ [4]

Here $\times$ is the vector cross-product, $M^\perp(x, y, z; t)$
the component of $M(x, y, z; t)$ perpendicular to
$B_0(x, y, z)$ (called the transverse component), and
$M^\parallel$ the component of M parallel to $B_0$ (called the
longitudinal component). For hydrogen protons in
other molecules, the gyromagnetic ratio is expressed
in the form $\gamma(1 - \sigma)$. The coefficient $\sigma$ is called the
nuclear shielding; it is typically between −10~* and
+1074. The difference in the nuclear shielding causes 
a shift in the resonance frequency by yo. 

The second and third terms in eqn [4] are
relaxation terms. They provide a phenomenological
model for the averaged interactions of the spins
with one another and their environment. The
coefficient $1/T_1(x, y, z)$ is the spin-lattice relaxation
rate; it describes the rate at which the magnetization
returns to equilibrium. The coefficient
$1/T_2(x, y, z)$ is the spin-spin relaxation rate; it
describes the rate at which the transverse components
of M decay. The physical processes causing
these relaxation phenomena are different and so
are the rates themselves, with $T_2$ less than $T_1$. The
relaxation rates largely depend on the localized
thermal fluctuations of the molecules and provide
a useful contrast mechanism in MR imaging.
Spin-spin relaxation occurs very rapidly in solids
(<1 ms) and, therefore, we usually assume that we
are imaging liquid-like materials such as water
protons in soft mammalian tissues. In this case, $T_2$
takes values in the 40 ms to 4 s range. Notice that
this model does not include any explicit interac- 
tion between isochromats at different spatial 
locations. A variety of such interactions exist, 
but, at least in liquid-like materials, they lead only 
to small corrections in the Bloch equation model. 
A derivation of the Bloch equation from the 
Schrodinger equation can be found in Abragam 
(1983) and Slichter (1990). For coupled systems, 
the Bloch equation formalism breaks down and a 
full quantum-mechanical treatment is necessary 
(see Nuclear Magnetic Resonance and Ernst et al. 
(1983)). 
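With a static field only, the relaxation terms of the Bloch
equation decouple: the magnitude of the transverse component
decays exponentially with time constant $T_2$, while the
longitudinal component recovers toward $M_0$ with time
constant $T_1$. The sketch below evaluates this closed-form
solution; the function name and the numerical values of
$T_1$ and $T_2$ are illustrative assumptions.

```python
import math


def relax(m_perp0, m_par0, m0, t, t1, t2):
    """Free relaxation in a static field.

    m_perp0: magnitude of the transverse magnetization at t = 0
    m_par0:  longitudinal magnetization at t = 0
    m0:      equilibrium magnetization
    Returns (|M_perp(t)|, M_par(t)).
    """
    m_perp = m_perp0 * math.exp(-t / t2)
    m_par = m0 + (m_par0 - m0) * math.exp(-t / t1)
    return m_perp, m_par


# Magnetization tipped fully into the transverse plane at t = 0,
# with assumed T1 = 1.0 s and T2 = 0.1 s.
T1, T2 = 1.0, 0.1
after_t2 = relax(1.0, 0.0, 1.0, T2, T1, T2)   # transverse part down by 1/e
after_t1 = relax(1.0, 0.0, 1.0, T1, T1, T2)   # longitudinal part at 1 - 1/e
```

Tissues with different $T_1$ and $T_2$ therefore give different
signal levels at a fixed delay, which is the contrast
mechanism mentioned above.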

Much of the analysis in NMR imaging amounts to
understanding the behavior of solutions to eqn [4]
with different choices of B. We now consider some
important special cases. The simplest case occurs if
B has no time-dependent component; then this
equation predicts that the sample becomes polarized
with the transverse part of M decaying as $e^{-t/T_2}$,
and the longitudinal component approaching the
equilibrium magnetization, $M_0$, as $1 - e^{-t/T_1}$. To
simplify the subsequent discussion, we assume that
the field $B_0$ is homogeneous with $B_0 = (0, 0, b_0)$. If
$B = B_0$ and we omit the relaxation terms (set
$T_1 = T_2 = \infty$ in [4]), then an initial magnetization
$M(x, y, z; 0)$ simply precesses about $B_0$ at angular
frequency $\omega_0 = \gamma b_0$: $M(x, y, z; t) = U(t)M(x, y, z; 0)$,
with

$$U(t) = \begin{pmatrix} \cos\omega_0 t & -\sin\omega_0 t & 0 \\ \sin\omega_0 t & \cos\omega_0 t & 0 \\ 0 & 0 & 1 \end{pmatrix}$$ [5]

The frequency $\omega_0$ is called the Larmor frequency;
this precession of M about the axis of $B_0$ is the
resonance phenomenon referred to as NMR. In
typical medical imaging systems, $b_0$ is between 1
and 3 T and the corresponding resonance frequency
is between 40 and 120 MHz.
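As a quick numerical illustration, the sketch below converts
a field strength into a Larmor frequency using the proton
value of $\gamma$ quoted in [2], and applies the rotation matrix
of [5] to a magnetization vector. The function names and the
1.5 T field are illustrative choices.

```python
import math

GAMMA_PROTON = 2 * math.pi * 42.5764e6  # rad s^-1 T^-1, value quoted in [2]


def larmor_frequency_hz(b0_tesla):
    """Resonance frequency omega_0 / (2 pi) = gamma * b0 / (2 pi), in Hz."""
    return GAMMA_PROTON * b0_tesla / (2 * math.pi)


def precess(m, b0_tesla, t):
    """Apply the rotation U(t) of [5] to the vector m = (m1, m2, m3)."""
    w = GAMMA_PROTON * b0_tesla * t
    c, s = math.cos(w), math.sin(w)
    return (c * m[0] - s * m[1], s * m[0] + c * m[1], m[2])


f_1p5 = larmor_frequency_hz(1.5)            # about 63.9 MHz at 1.5 T
quarter_period = 1.0 / (4 * f_1p5)
# After a quarter period a vector along the first axis points along the second.
m_quarter = precess((1.0, 0.0, 0.0), 1.5, quarter_period)
```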

Typically, the field B takes the form

$$B = B_0 + G + B_1$$ [6]

where G is a gradient field and $B_1$ is a
radio-frequency (RF) field. Usually, the gradient fields are
"piecewise time-independent" fields, small relative
to $B_0$. By piecewise time-independent field, we
mean a collection of static fields that, in the course
of the experiment, are turned on and off. The $B_1$
component is a time-dependent RF field, nominally
at right angles to $B_0$. It is usually taken to be
spatially homogeneous, with time dependence of
the form

$$B_1(t) = U(t)\begin{pmatrix} \alpha(t) \\ \beta(t) \\ 0 \end{pmatrix}$$ [7]

The functions $\alpha$ and $\beta$ define an envelope that
modulates the time-harmonic field, $[\cos\omega_0 t,
\sin\omega_0 t, 0]$. They are supported in a finite interval
$[t_0, t_1]$, that is, the $B_1$ field is "turned on" for a finite
period of time. The change in the state of the
magnetization between $t_0$ and $t_1$ is called the RF
excitation. It may be spatially dependent.

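In light of [5] it is convenient to introduce the
rotating reference frame. We replace M with m,
where $m(x,y,z;t) = U(t)^{-1}M(x,y,z;t)$. It is a classical
result of Larmor that if M satisfies [4], then m
satisfies


$$\frac{dm}{dt}(x,y,z;t) = \gamma\, m(x,y,z;t) \times B_{\mathrm{eff}}(x,y,z;t) - \frac{1}{T_2}\, m^{\perp} - \frac{1}{T_1}\left(m^{\parallel} - M_0(x,y,z)\right)$$ [8]


where $m^{\perp}$ and $m^{\parallel}$ denote the transverse and
longitudinal parts of m, and


$$B_{\mathrm{eff}} = U(t)^{-1}B - \left(0, 0, \frac{\omega_0}{\gamma}\right)$$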


As G is much smaller than $B_0$ and quasistatic, it turns
out that one can ignore the components of G
orthogonal to $B_0$. Indeed, in imaging applications,
one usually assumes that the components of G
depend linearly on (x, y, z) with the z-component
given by $\langle (x,y,z), (g_1,g_2,g_3) \rangle$. The constant vector
$G = (g_1, g_2, g_3)$ is called the gradient vector. With
$B_0 = (0,0,b_0)$ and $B_1$ given by [7], we see that $B_{\mathrm{eff}}$
can be taken to equal $(0, 0, \langle (x,y,z), G \rangle) + (\alpha, \beta, 0)$.
In the remainder of this article, we assume that $B_{\mathrm{eff}}$
takes this form.

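If $G = 0$ and $\beta \equiv 0$, then the solution operator for
Bloch’s equation, without relaxation terms, is


$$V(t) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta(t) & \sin\theta(t) \\ 0 & -\sin\theta(t) & \cos\theta(t) \end{pmatrix}$$ [9]


where


$$\theta(t) = \gamma \int_0^t \alpha(s)\, ds$$ [10]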


This is simply a rotation about the x-axis through
the angle $\theta(t)$. If $B_1 \neq 0$ for $t \in [0, \tau]$, then the
magnetization is rotated through the angle $\theta(\tau)$.
Thus, RF excitation can be used to move the
magnetization out of its equilibrium state. As we
shall soon see, this is crucial for obtaining a
measurable signal. Note that the equilibrium
magnetization is a tiny perturbation of the very large
field $B_0$ and is, therefore, in practice not directly
measurable. Only the precessional motion of the
transverse components of M produces a measurable
signal. More general $B_1$ fields, that is, with both $\alpha$
and $\beta$ nonzero, have more complicated effects on the
magnetization. In general, the angle between M and
$M_0$ at the conclusion of the RF excitation is called
the flip angle.
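The rotation described by [9] and [10] can be made concrete with a small numerical sketch. It assumes a constant envelope $\alpha$ over $[0, \tau]$, the proton value of $\gamma$, and a hypothetical pulse duration of 1 ms; none of these values come from the article.

```python
import math

# Proton gyromagnetic ratio in rad/s/T (assumption: 1H nuclei).
GAMMA = 2.0 * math.pi * 42.577e6


def flip_angle(alpha_tesla: float, duration_s: float) -> float:
    """theta(tau) = gamma * integral of alpha over [0, tau], constant envelope."""
    return GAMMA * alpha_tesla * duration_s


def apply_v(theta: float, m):
    """Apply the rotation V of eqn [9] to the vector m = (m1, m2, m3)."""
    c, s = math.cos(theta), math.sin(theta)
    m1, m2, m3 = m
    return (m1, c * m2 + s * m3, -s * m2 + c * m3)


tau = 1.0e-3  # pulse duration in seconds (hypothetical)
alpha_90 = (math.pi / 2.0) / (GAMMA * tau)  # amplitude giving a 90 degree flip
theta = flip_angle(alpha_90, tau)
m_after = apply_v(theta, (0.0, 0.0, 1.0))

print(f"alpha = {alpha_90 * 1e6:.2f} microtesla")
print(f"theta = {math.degrees(theta):.1f} degrees")
print("m after the pulse:", m_after)
```

Starting from equilibrium along the z-axis, the 90° pulse leaves the magnetization in the transverse plane, which is the situation used in the basic imaging experiment.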

If, on the other hand, $B_1 = 0$ and $G = (0, 0, l(x,y,z))$,
where $l(\cdot)$ is a function, then V depends on
(x, y, z), and is given by


$$V(x,y,z;t) = \begin{pmatrix} \cos\gamma l(x,y,z)t & -\sin\gamma l(x,y,z)t & 0 \\ \sin\gamma l(x,y,z)t & \cos\gamma l(x,y,z)t & 0 \\ 0 & 0 & 1 \end{pmatrix}$$ [11]


This is precession about $B_0$ at an angular
frequency that depends on the local field strength
$b_0 + l(x,y,z)$. If both $B_1$ and G are simultaneously
nonzero, then, starting from equilibrium, the
solution of the Bloch equation, at the conclusion
of the RF pulse, has a nontrivial spatial dependence.
In other words, the flip angle becomes a
function of the spatial variables. We return to this
in a later section.
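To give a sense of scale for the position-dependent precession, the following sketch computes the frequency offset $\gamma \langle (x,y,z), G \rangle / 2\pi$ produced by a linear gradient. The gradient strength (10 mT/m) and the position are hypothetical values chosen for illustration, and the proton value of $\gamma$ is assumed.

```python
# Proton gyromagnetic ratio divided by 2*pi, in Hz per tesla (assumption: 1H).
GAMMA_BAR_HZ_PER_T = 42.577e6


def offset_frequency_hz(position_m, gradient_t_per_m) -> float:
    """Frequency offset gamma * <position, G> / (2*pi) for a linear gradient."""
    dot = sum(p * g for p, g in zip(position_m, gradient_t_per_m))
    return GAMMA_BAR_HZ_PER_T * dot


gradient = (0.0, 0.0, 10.0e-3)  # hypothetical 10 mT/m gradient along z
position = (0.0, 0.0, 0.05)     # a point 5 cm from the isocenter

print(f"offset = {offset_frequency_hz(position, gradient):.1f} Hz")
```

The offset is tiny compared with the Larmor frequency itself, in keeping with the assumption that the gradient fields are small relative to $B_0$.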


A Basic Imaging Experiment 


With these preliminaries, we can describe the basic
measurements in magnetic resonance imaging. When
exposed to $B_0$, the sample becomes polarized at a
rate determined by $T_1$. Once the sample is polarized,
a $B_1$-field, of the form given in [7] (with $\beta \equiv 0$), is
turned on for a finite time $\tau$. This is called an RF
excitation. For the purposes of this discussion, we
suppose that the time is chosen so that $\theta(\tau) = 90°$, see
eqn [10]. As $B_0$ and $B_1$ are spatially homogeneous,
the magnetization vectors within the object remain
parallel throughout the RF excitation. At the conclusion
of the RF excitation, M is orthogonal to $B_0$.

After the RF is turned off, the vector field
$M(x,y,z;t)$ precesses about $B_0$, in phase, with
angular velocity $\omega_0$. The transverse component of M
decays exponentially. If we normalize the time so
that $t = 0$ corresponds to the conclusion of the RF
pulse, then, in the laboratory frame,


$$M(x,y,z;t) = C\omega_0\,\rho(x,y,z)\left[e^{-t/T_2}\cos\omega_0 t,\ e^{-t/T_2}\sin\omega_0 t,\ \left(1 - e^{-t/T_1}\right)\right]$$ [12]


Recall Faraday’s law: a changing magnetic field
induces an electromotive force (EMF) in a loop of
wire according to the relation


$$\mathrm{EMF}_{\mathrm{loop}} \propto \frac{d\Phi_{\mathrm{loop}}}{dt}$$ [13]


Here $\Phi_{\mathrm{loop}}$ denotes the flux of the field through the
loop of wire (see Introductory Articles: Electromagnetism).
The transverse components of M are a
rapidly varying magnetic field, which, according to
Faraday’s law, induces a current in a loop of wire. In
fact, by placing several such loops close to the sample
we can measure a signal of the form


$$S_0(t) = C'\omega_0^2\, e^{i\omega_0 t} e^{-t/T_2} \int_{\mathrm{sample}} \rho(x,y,z)\, b_{1\mathrm{rec}}(x,y,z)\, dx\, dy\, dz$$ [14]


Here $b_{1\mathrm{rec}}(x,y,z)$ quantifies the sensitivity of the
detector to the precessing magnetization located at
(x, y, z). From $S_0(t)$ we easily obtain a measurement
of the integral of the function $\rho\, b_{1\mathrm{rec}}$. By using a
carefully designed detector, $b_{1\mathrm{rec}}$ can be taken to be
a constant, and therefore we can determine the total
spin density within the object of interest. For the rest
of this article, we assume that $b_{1\mathrm{rec}}$ is a constant.
Note that the size of the measured signal is
proportional to $\omega_0^2$, which is, in turn, proportional
to $\|B_0\|^2$. This explains, in part, why it is so useful to
have a very strong $B_0$-field. Even with a
1.5 T magnet, however, the measured signal is only in the
microwatt range (see Hoult and Lauterbur (1979)
and Edelstein et al. (2004)).

Suppose that, at the end of the RF excitation, we
turn on the gradient G. As the magnetic field
$B = B_0 + G$ now has a nontrivial spatial dependence,
the precessional frequency of the spins, which equals
$\gamma\|B\|$, also has a spatial dependence. In fact,
assuming that $T_2$ is spatially independent, it follows
from [11] that the measured signal would now be
given by


$$S_G(t) \approx C'\omega_0^2\, b_{1\mathrm{rec}}\, e^{i\omega_0 t} e^{-t/T_2} \int_{\mathrm{sample}} \rho(x,y,z)\, e^{it\gamma\langle (x,y,z),\, G \rangle}\, dx\, dy\, dz$$ [15]


Up to a constant, $e^{-i\omega_0 t} e^{t/T_2} S_G(t)$ is simply the
Fourier transform of $\rho$ at $k = -t\gamma G/2\pi$. By sampling
in time and using a variety of different gradient
vectors, we can sample the three-dimensional Fourier
transform of $\rho$ in a neighborhood of 0. This suffices
to reconstruct an approximation to $\rho$. In medical
applications, $T_2$ is spatially dependent, which, as
described later in the section “Contrast and Resolution,”
provides a useful contrast mechanism.

Imagine that we collect samples of $\hat\rho(k)$ on a
rectangular grid


$$\left\{ (j_x \Delta k_x,\ j_y \Delta k_y,\ j_z \Delta k_z) : -\frac{N_x}{2} \le j_x \le \frac{N_x}{2},\ -\frac{N_y}{2} \le j_y \le \frac{N_y}{2},\ -\frac{N_z}{2} \le j_z \le \frac{N_z}{2} \right\}$$

Since we are sampling in the Fourier domain, the
Nyquist sampling theorem implies that the sample
spacing determines the spatial field of view from which
we can reconstruct an artifact-free image: in order to
avoid aliasing artifacts, the support of $\rho$ must lie in a
rectangular region with side lengths $[\Delta k_x^{-1}, \Delta k_y^{-1},
\Delta k_z^{-1}]$, see Haacke et al. (1999), Epstein (2003), and
Barrett and Myers (2004). In typical medical applications,
the support of $\rho$ is much larger in one dimension
than the others, and so it turns out to be impractical to
use the simple data collection technique described
above. Instead, the RF excitation takes place in the
presence of nontrivial gradient fields, which allows for
a spatially selective excitation: the magnetization in
one region of space obtains a transverse component,
while that in the complementary region is left in the
equilibrium state. In this way, we can collect data from
an essentially two-dimensional slice. This is described
in the next section.
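The link between Fourier samples, sample spacing, and field of view can be illustrated with a one-dimensional toy model. The sketch below is not the article's reconstruction procedure: it assumes a discretized object on N points, approximates the Fourier transform by a Riemann sum, and takes the sample spacing $\Delta k = 1/\mathrm{FOV}$. All names and values are illustrative.

```python
import cmath


def kspace_samples(rho, fov):
    """Riemann-sum samples of the 1D Fourier transform at k_m = m * dk, dk = 1/fov."""
    n = len(rho)
    dx = fov / n
    dk = 1.0 / fov
    xs = [(j - n // 2) * dx for j in range(n)]
    ks = [(m - n // 2) * dk for m in range(n)]
    samples = [
        sum(r * cmath.exp(-2j * cmath.pi * k * x) for r, x in zip(rho, xs)) * dx
        for k in ks
    ]
    return xs, ks, samples


def reconstruct(samples, xs, ks):
    """Invert the sampled transform by summing the Fourier series over k."""
    dk = ks[1] - ks[0]
    return [
        sum(s * cmath.exp(2j * cmath.pi * k * x) for s, k in zip(samples, ks)) * dk
        for x in xs
    ]


# Hypothetical object: a box of spin density well inside a 20 cm field of view.
rho = [1.0 if 5 <= j < 11 else 0.0 for j in range(16)]
xs, ks, samples = kspace_samples(rho, fov=20.0)
recon = reconstruct(samples, xs, ks)

max_error = max(abs(r.real - p) for r, p in zip(recon, rho))
print(f"largest reconstruction error: {max_error:.2e}")
```

Because the object lies inside the field of view $1/\Delta k$, the samples determine it without aliasing; an object wider than the field of view would wrap around in the reconstruction.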


Selective Excitation 


As remarked above, practical imaging techniques do
not excite all the spins in an object and directly
measure samples of the three-dimensional Fourier
transform. Rather, the spins lying in a slice are
excited and samples of the two-dimensional Fourier
transform are then measured. This process is called
selective excitation and may be accomplished by
applying the RF excitation with a gradient field
turned on. With this arrangement, the strength of
the static field, $B_0 + G$, varies with spatial position,
hence the response to the RF excitation does as
well. Suppose that $G = (0, 0, \langle (x,y,z), G \rangle)$ and set
$f = [2\pi]^{-1}\gamma\langle (x,y,z), G \rangle$. This is called the offset
frequency, as it is the amount by which the local
resonance frequency differs from the resonance
frequency $\omega_0$ of the $B_0$-field. The result of a selective
RF excitation is described by a magnetization profile
$m^{\infty}(f)$, which is a unit 3-vector-valued function of
the offset frequency. A typical case would be


$$m^{\infty}(f) = \begin{cases} [0, 0, 1] & \text{for } f \notin [f_0, f_1] \\ [\sin\theta, 0, \cos\theta] & \text{for } f \in [f_0, f_1] \end{cases}$$ [16]


The magnetization is flipped through an angle $\theta$ in
regions of space where the offset frequency lies in
the interval $[f_0, f_1]$ and is left in the equilibrium state
otherwise.

Typically, the excitation step takes a few milliseconds
and is much shorter than either $T_1$ or $T_2$;
therefore, one generally uses the Bloch equation,
without relaxation, in the discussion of selective
excitation. In the rotating reference frame, the Bloch
equation, without relaxation, takes the form


$$\frac{dm}{dt}(f;t) = \begin{pmatrix} 0 & 2\pi f & -\gamma\beta \\ -2\pi f & 0 & \gamma\alpha \\ \gamma\beta & -\gamma\alpha & 0 \end{pmatrix} m(f;t)$$ [17]


The problem of designing a selective pulse is
nonlinear. Indeed, the selective excitation problem
can be rephrased as a classical inverse-scattering
problem: one seeks a function $\alpha(t) + i\beta(t)$ with
support in an interval $[t_0, t_1]$ so that, if $m(f;t)$ is
the solution to [17] with $m(f;t_0) = [0, 0, 1]$, then
$m(f;t_1) = m^{\infty}(f)$. If one restricts attention to flip
angles close to 0, then there is a simple linear model
that can be used to find approximate solutions.

If the flip angle is close to zero, then $m_3 \approx 1$
throughout the excitation. Using this approximation,
we derive the low-flip-angle approximation to
the Bloch equation, without relaxation:


$$\frac{d(m_1 + i m_2)}{dt} = -2\pi i f\,(m_1 + i m_2) + i\gamma(\alpha + i\beta)$$ [18]


From this approximation, we see that


$$\alpha(t) + i\beta(t) \approx \frac{\mathcal{F}(m_1^{\infty} + i m_2^{\infty})(t)}{\gamma i}, \quad \text{where} \quad \mathcal{F}(h)(t) = \int_{-\infty}^{\infty} h(f)\, e^{-2\pi i f t}\, df$$ [19]


Figure 1 A selective 90° pulse and profile designed using the
linear approximation. (a) Profile of a 90° sinc-pulse. (b) The
magnetization profile produced by the pulse in (a).

Figure 2 A selective 90° pulse and profile designed using the
inverse scattering approach. (a) Profile of a 90° inverse-scattering
pulse. (b) The magnetization profile produced by the pulse in (a).


For an example such as in [16], with $\theta$ close to zero and
$f_0 = -f_1$, we obtain


$$\alpha + i\beta \approx \frac{\sin\theta\, \sin(2\pi f_1 t)}{i\,\pi\gamma t}$$ [20]

A pulse of this sort is called a sinc-pulse. A
sinc-pulse is shown in Figure 1a, and the result of
applying it in Figure 1b. A more accurate pulse can
be designed using the Shinnar-Le Roux algorithm
(see Pauly et al. (1991) and Shinnar and Leigh
(1989)), or the inverse scattering approach (see
Epstein (2004)). An inverse-scattering 90°-pulse is
shown in Figure 2a and the response in Figure 2b.
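The behaviour of a small-flip-angle sinc-pulse can be checked numerically by integrating the Bloch equation without relaxation, eqn [17], for several offset frequencies. The sketch below is an illustration under stated assumptions, not the design procedure behind Figures 1 and 2: it takes $\alpha = 0$ and $\gamma\beta(t) = -\sin\theta\,\sin(2\pi f_1 t)/(\pi t)$, which is what the linear model gives for a rectangular profile, truncates the pulse to a hypothetical 8 ms window, and uses a classical fourth-order Runge-Kutta integrator. The values of $\theta$, $f_1$ and the duration are arbitrary.

```python
import math


def gamma_beta(t, f1, theta):
    """gamma * beta(t) for the small-flip-angle sinc envelope (alpha is zero)."""
    if abs(t) < 1e-12:
        return -math.sin(theta) * 2.0 * f1  # limit of the sinc at t = 0
    return -math.sin(theta) * math.sin(2.0 * math.pi * f1 * t) / (math.pi * t)


def bloch_rhs(m, f, gb):
    """Right-hand side of eqn [17] with alpha = 0 and gamma * beta = gb."""
    w = 2.0 * math.pi * f
    return (w * m[1] - gb * m[2], -w * m[0], gb * m[0])


def simulate(f, f1, theta, duration, steps=4000):
    """Integrate eqn [17] over [-duration/2, duration/2] from m = (0, 0, 1)."""
    m = (0.0, 0.0, 1.0)
    dt = duration / steps
    t = -duration / 2.0
    for _ in range(steps):
        g0 = gamma_beta(t, f1, theta)
        gh = gamma_beta(t + dt / 2.0, f1, theta)
        g1 = gamma_beta(t + dt, f1, theta)
        k1 = bloch_rhs(m, f, g0)
        k2 = bloch_rhs(tuple(a + 0.5 * dt * b for a, b in zip(m, k1)), f, gh)
        k3 = bloch_rhs(tuple(a + 0.5 * dt * b for a, b in zip(m, k2)), f, gh)
        k4 = bloch_rhs(tuple(a + dt * b for a, b in zip(m, k3)), f, g1)
        m = tuple(
            a + dt / 6.0 * (p + 2.0 * q + 2.0 * r + s)
            for a, p, q, r, s in zip(m, k1, k2, k3, k4)
        )
        t += dt
    return m


theta = math.radians(10.0)  # target flip angle (small, so the linear model applies)
f1 = 1000.0                 # half-width of the excited band in Hz (hypothetical)
duration = 8.0e-3           # pulse length in seconds (hypothetical)

for f in (0.0, 500.0, 2000.0, 3000.0):
    m = simulate(f, f1, theta, duration)
    print(f"f = {f:6.0f} Hz: transverse magnitude = {math.hypot(m[0], m[1]):.4f}")
```

Inside the band the transverse magnitude comes out close to $\sin\theta$, and well outside the band it is close to zero. The residual ripple comes from truncating the sinc, which is one reason more accurate design methods are used in practice.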


Spin-Warp Imaging 


In an earlier section we showed how NMR
measurements could be used to measure the
three-dimensional Fourier transform of $\rho$. In this section,
we consider a more practical technique, that of
measuring the two-dimensional Fourier transform of
a “slice” of $\rho$. Applying a selective RF pulse, as
described in the previous section, we can flip the
magnetization in a region of space $z_0 - \Delta z < z <
z_0 + \Delta z$, while leaving it in the equilibrium state
outside a slightly larger region. Observing that a
signal near the resonance frequency is only produced
by isochromats whose magnetization has a nonzero
transverse component, we can now measure samples
of the two-dimensional Fourier transform of the
function


$$\bar\rho_{z_0}(x,y) = \frac{1}{2\Delta z} \int_{z_0 - \Delta z}^{z_0 + \Delta z} \rho(x,y,z)\, dz$$ [21]


If $\Delta z$ is sufficiently small then $\bar\rho_{z_0}(x,y) \approx \rho(x,y,z_0)$.

In order to be able to use the fast Fourier
transform (FFT) algorithm to do the reconstruction,
it is very useful to sample $\hat{\bar\rho}_{z_0}$ on a uniform grid. To
that end, we use the gradient fields as follows: after
the RF excitation we apply a gradient field of the
form $G_{ph} = (0, 0, -g_2 y + g_1 x)$ for a certain period of
time $T_{ph}$. This is called a phase encoding gradient.
At the conclusion of the phase encoding gradient,
the transverse components of the magnetization
from the excited spins have the form

$$e^{-2\pi i (k_x x + k_y y)}\, \bar\rho_{z_0}(x,y)$$ [22]
where $(k_x, k_y) = [2\pi]^{-1}\gamma T_{ph}(-g_1, g_2)$. At time $T_{ph}$,
we turn off the y-component of $G_{ph}$ and reverse the
polarity of the x-component. At this point, we begin
to measure the signal. We get samples of $\hat{\bar\rho}_{z_0}(k, k_y)$
where k varies from $-k_{x\max}$ to $k_{x\max}$. By repeating
this process with the strength of the y-phase
encoding gradient being stepped through a sequence
of uniformly spaced values, $g_2 \in \{n\Delta g_y\}$, and
collecting samples at a uniformly spaced set of times,
we collect the set of samples


$$\left\{ \hat{\bar\rho}_{z_0}(m\Delta k_x,\ n\Delta k_y) : -\frac{N_x}{2} \le m \le \frac{N_x}{2},\ -\frac{N_y}{2} \le n \le \frac{N_y}{2} \right\}$$ [23]


The gradient $G_{fr} = (0, 0, -g_1 x)$, left “on” during
signal acquisition, is called a frequency encoding
gradient. While there is no difference, mathematically,
between the phase encoding and frequency
encoding steps, there are significant practical differ- 
ences. This approach to sampling is known as spin- 
warp imaging; it was introduced in Edelstein et al. 
(1980). The steps of this experiment are summarized 
in a pulse sequence timing diagram, shown in 
Figure 3. This graphical representation for the 
steps followed in a magnetic resonance imaging 
experiment is ubiquitous in the literature. 

To avoid aliasing artifacts, the sample spacings
$\Delta k_x$ and $\Delta k_y$ must be chosen so that the excited
portion of the sample is contained in a region of size
$\Delta k_x^{-1} \times \Delta k_y^{-1}$. This is called the field of view or
FOV. Since we can only collect the signal for a finite
period of time, the Fourier transform $\hat{\bar\rho}_{z_0}(k_x, k_y)$ is
sampled at frequencies lying in a rectangle with
vertices $(\pm k_{x\max}, \pm k_{y\max})$, where


$$k_{x\max} = \frac{N_x \Delta k_x}{2}, \qquad k_{y\max} = \frac{N_y \Delta k_y}{2}$$ [24]





Figure 3 Pulse timing diagram for spin-warp imaging. During
the positive lobe of the frequency encoding gradient, the
analog-to-digital converter (ADC) collects samples of the signal
produced by the rotating transverse magnetization.


The maximum frequencies sampled effectively determine
the resolution available in the reconstructed
image. Heuristically, this resolution limit equals half
the shortest measured wavelength:


$$\Delta x \approx \frac{1}{2k_{x\max}} = \frac{\mathrm{FOV}_x}{N_x}, \qquad \Delta y \approx \frac{1}{2k_{y\max}} = \frac{\mathrm{FOV}_y}{N_y}$$ [25]


Whether one can actually resolve objects of this size in
the reconstructed image depends on other factors such
as the available contrast and the signal-to-noise ratio
(SNR). We consider these factors in the final sections.
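The relations between sample spacing, field of view, maximum sampled frequency, and nominal resolution are simple enough to tabulate. The sketch below assumes a square field of view of 20 cm and matrix sizes of 128 and 512, values chosen to match the brain images discussed later; the function name is illustrative.

```python
def sampling_geometry(fov_cm: float, n: int):
    """Return (dk, k_max, dx) for a field of view fov_cm sampled on n points.

    dk    = 1 / FOV          sample spacing in k-space (cycles per cm)
    k_max = n * dk / 2       largest spatial frequency sampled
    dx    = 1 / (2 * k_max)  nominal resolution, equal to FOV / n
    """
    dk = 1.0 / fov_cm
    k_max = n * dk / 2.0
    dx = 1.0 / (2.0 * k_max)
    return dk, k_max, dx


for n in (128, 512):
    dk, k_max, dx = sampling_geometry(20.0, n)
    print(f"N = {n}: dk = {dk:.3f} 1/cm, k_max = {k_max:.1f} 1/cm, dx = {dx * 10:.3f} mm")
```

Quadrupling the matrix size at fixed field of view leaves the sample spacing unchanged and shrinks the nominal pixel size by a factor of 4 in each direction.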


Signal-to-Noise Ratio 


At a given spatial resolution, image quality is largely 
determined by SNR and the contrast between the 
different materials making up the imaging object. SNR 
in MRI is defined as the voxel signal amplitude divided 
by the noise standard deviation. The noise in the NMR 
signal, in general, is Gaussian distributed with zero 
mean. Ignoring contributions from quantization, for 
example, due to limitations of the analog-to-digital 
converter, the noise voltage of the signal can be 
ascribed to random thermal fluctuations in the receive 
circuit (see Edelstein (1986)). The variance is given by 


$$\sigma^2_{\mathrm{thermal}} = 4 k_B T R\, \Delta\nu$$ [26]


where $k_B$ is Boltzmann’s constant, T the absolute
temperature, R the effective resistance (resulting from
both the receive coil, $R_c$, and the object, $R_o$), and $\Delta\nu$ the
receive bandwidth. Both $R_c$ and $R_o$ are frequency
dependent, with $R_c \propto \omega^{1/2}$ and $R_o \propto \omega^2$. Their relative
contributions to overall circuit resistance depend in
a complicated manner on coil geometry and
the imaging object’s shape, size, and conductivity
(see Chen and Hoult (1989)). Hence, at high magnetic
field, and for large objects, as in most medical
applications, the resistance from the object dominates
and the noise scales linearly with frequency. Since the
signal is proportional to $\omega_0^2$, in MRI, the SNR increases
in proportion to the field strength.
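For orientation, eqn [26] can be evaluated directly. The sketch below assumes body temperature (310 K), an effective resistance of 50 Ω, and a receive bandwidth of 32 kHz; these numbers are hypothetical and serve only to show the order of magnitude and the square-root scaling of the noise voltage.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant in J/K


def thermal_noise_std(temperature_k: float, resistance_ohm: float, bandwidth_hz: float) -> float:
    """Standard deviation of the thermal noise voltage, sqrt(4 k_B T R dnu)."""
    return math.sqrt(4.0 * K_B * temperature_k * resistance_ohm * bandwidth_hz)


sigma = thermal_noise_std(310.0, 50.0, 32.0e3)
print(f"noise standard deviation: {sigma * 1e9:.1f} nV")

# Quadrupling the resistance or the bandwidth doubles the noise voltage.
print(thermal_noise_std(310.0, 200.0, 32.0e3) / sigma)
```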

As the reconstructed image is complex valued, it is
customary to display the magnitude rather than the
real component. This, however, has some consequences
for the noise properties. In regions where the
signal is much larger than the noise, the Gaussian
approximation is valid. However, in regions where the
signal is low, rectification causes the noise to assume a
Rayleigh distribution. Mean and standard deviation can
be calculated from the joint probability distribution:


$$P(N_r, N_i) = \frac{1}{2\pi\sigma^2}\, e^{-(N_r^2 + N_i^2)/2\sigma^2}$$ [27]


where $N_r$ and $N_i$ are the noise in the real and
imaginary channels, respectively. When the signal is
large compared to noise, one finds that the variance
$\sigma_m^2 = \sigma^2$. In the other extreme of nearly zero signal,
one obtains for the mean:


$$\bar S = \sigma\sqrt{\pi/2} \approx 1.253\,\sigma$$ [28]


and, for the variance:


$$\sigma_m^2 = 2\sigma^2(1 - \pi/4) \approx 0.429\,\sigma^2, \quad \text{that is,} \quad \sigma_m \approx 0.655\,\sigma$$ [29]
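The zero-signal statistics of a magnitude image can be checked both in closed form and by simulation. The sketch below evaluates the Rayleigh mean and variance factors and compares them with a seeded Monte Carlo estimate built from independent Gaussian noise in the real and imaginary channels. The sample size and seed are arbitrary choices.

```python
import math
import random


def rayleigh_mean_factor() -> float:
    """Mean of the magnitude noise in units of sigma: sqrt(pi / 2)."""
    return math.sqrt(math.pi / 2.0)


def rayleigh_variance_factor() -> float:
    """Variance of the magnitude noise in units of sigma**2: 2 * (1 - pi / 4)."""
    return 2.0 * (1.0 - math.pi / 4.0)


def simulate_magnitude_noise(sigma: float, n: int, seed: int = 0):
    """Monte Carlo mean and variance of |N_r + i N_i| for zero underlying signal."""
    rng = random.Random(seed)
    mags = [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]
    mean = sum(mags) / n
    var = sum((x - mean) ** 2 for x in mags) / n
    return mean, var


mean_mc, var_mc = simulate_magnitude_noise(1.0, 100_000)
print(f"mean:     analytic {rayleigh_mean_factor():.4f}, simulated {mean_mc:.4f}")
print(f"variance: analytic {rayleigh_variance_factor():.4f}, simulated {var_mc:.4f}")
print(f"standard deviation factor: {math.sqrt(rayleigh_variance_factor()):.4f}")
```

The simulation makes the bias of magnitude images visible: in a signal-free region the displayed mean is not zero but about 1.25 times the channel noise level.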


Of particular practical significance is the SNR
dependence on the imaging parameters. The voxel
noise variance is reduced by the total number of
samples collected during the data acquisition process,
that is,


$$\sigma^2_{\mathrm{eff}} = \sigma^2_{\mathrm{thermal}} / N$$ [30]


where $N = N_x N_y$ in a two-dimensional spin-warp
experiment. Incorporating the contributions to
thermal noise variance, other than bandwidth, into
a constant


$$\mu = 4 k_B T R$$ [31]


we obtain for the noise variance:


$$\sigma^2_{\mathrm{eff}} = \frac{\mu\, \Delta\nu}{N_x N_y N_{\mathrm{avg}}}$$ [32]


Here $N_{\mathrm{avg}}$ is the number of signal averages collected
at each phase encoding step. We obtain a simple
formula for SNR per voxel of volume $\Delta V$:


$$\mathrm{SNR} = C\rho\, \Delta V \sqrt{\frac{N_x N_y N_{\mathrm{avg}}}{\mu\, \Delta\nu}} = C\rho\, \Delta x\, \Delta y\, d_z \sqrt{\frac{N_x N_y N_{\mathrm{avg}}}{\mu\, \Delta\nu}}$$ [33]


where $\Delta x$, $\Delta y$ are defined in [25], $d_z$ is the thickness
of the slab selected by the slice-selective RF pulse,
and $\rho$ denotes the spin density weighted by effects
determined by the (spatially varying) relaxation
times $T_1$ and $T_2$ and the pulse sequence timing
parameters. Figure 4 shows two images of the
human brain obtained from the same anatomic
location but differing in SNR.

Figure 4 $T_1$-weighted sagittal images through the midline of
the brain: Image (b) has twice the SNR of image (a), showing
improved conspicuity of small anatomic and low-contrast detail.
The two images were acquired at 1.5 T field strength using
two-dimensional spin-warp acquisition and identical scan parameters,
except for $N_{\mathrm{avg}}$, which was 1 in (a) and 4 in (b).
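The scaling behaviour in eqn [33] is easy to explore numerically. The sketch below drops the constants C, $\rho$ and $\mu$, so it returns SNR only up to a common factor. The parameter values (field of view, slice thickness, bandwidth) are hypothetical; only the ratios are meaningful.

```python
import math


def relative_snr(dx, dy, dz, nx, ny, navg, bandwidth):
    """SNR per voxel up to a constant: dx * dy * dz * sqrt(nx * ny * navg / bandwidth)."""
    return dx * dy * dz * math.sqrt(nx * ny * navg / bandwidth)


fov, dz, bw = 20.0, 0.5, 32.0e3  # cm, cm, Hz (hypothetical values)

# Effect of signal averaging at fixed resolution (compare Figure 4).
snr_1 = relative_snr(fov / 256, fov / 256, dz, 256, 256, 1, bw)
snr_4 = relative_snr(fov / 256, fov / 256, dz, 256, 256, 4, bw)
print(f"N_avg 1 -> 4 changes SNR by a factor of {snr_4 / snr_1:.2f}")

# Effect of matrix size at fixed field of view.
snr_128 = relative_snr(fov / 128, fov / 128, dz, 128, 128, 1, bw)
snr_512 = relative_snr(fov / 512, fov / 512, dz, 512, 512, 1, bw)
print(f"N = 128 -> 512 changes SNR by a factor of {snr_512 / snr_128:.2f}")
```

Averaging four acquisitions doubles the SNR, whereas a fourfold finer grid at the same field of view costs a factor of 4, because the voxel volume shrinks faster than the number of samples grows.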


Contrast and Resolution 


The single most distinctive feature of MRI is its 
extraordinarily large innate contrast. For two soft 
tissues, it can be on the order of several hundred 
percent. By comparison, contrast in X-ray imaging is 
a consequence of differences in the attenuation 
coefficients for two adjacent structures and is 
typically on the order of a few percent. 

We have seen in the preceding sections that the
physical principles underlying MRI are radically
different from those of X-ray computed tomography,
in that the signal elicited is generated by the
spins themselves in response to an external perturbation.
The contrast between two regions, A and B,
with signals $S_A$ and $S_B$, respectively, is defined as

$$C_{AB} = \frac{S_A - S_B}{S_A}$$ [34]

If the only contrast mechanism were differences in
the proton spin density of various tissues, then
contrast would be on the order of 5-20%. In reality,
it can be several hundred percent. The reason for
this discrepancy is that the MR signal is acquired
under nonequilibrium conditions. At the time of
excitation, the spins have typically not recovered
from the effect of the previous cycle’s RF pulses, nor
is the signal usually detected immediately after its
creation.

Typically, in spin-warp imaging, a spin-echo is
detected as a means to alleviate spin coherence
losses from static field inhomogeneity. A spin-echo
is the result of applying an RF pulse that has the
effect of taking $(m_1, m_2, m_3)$ to $(m_1, -m_2, -m_3)$. As
such a pulse effects a 180° rotation of the z-axis, it is
also called a $\pi$-pulse. If, after such a pulse, the spins
continue to evolve in the same environment then,
following a certain period of time, the transverse
components of the magnetization vectors throughout
the sample become aligned. Hence a pulse of
this type is also called a refocusing pulse. The time
when all the transverse components are rephased is
called the echo time, $T_E$.

The spin-echo signal amplitude for an RF pulse
sequence $\pi/2 - \tau - \pi - \tau$, repeated every $T_R$ seconds,
is approximately given by


$$S(t = 2\tau) \approx \rho\left(1 - e^{-T_R/T_1}\right) e^{-T_E/T_2}$$ [35]


This is a good approximation as long as $T_E \ll T_R$
and $T_2 \ll T_R$, in which case the transverse
magnetization decays essentially to zero between successive
pulse sequence cycles. In eqn [35], $\rho$ is voxel spin
density and the echo time $T_E = 2\tau$. Empirically, it
is known that tissues differ in at least one of
the intrinsic quantities, $T_1$, $T_2$, or $\rho$. It, therefore,
suffices to acquire images in such a manner that
contrast is sensitive to one particular parameter. For
example, a “$T_2$-weighted” image would be acquired
with $T_E \sim T_2$ and $T_R \gg T_1$, and, similarly, a
“$T_1$-weighted” image with $T_R < T_1$ and $T_E \ll T_2$,
with $T_1$, $T_2$ representing typical tissue proton relaxation
times. Figure 5 shows two images obtained with
the same scan parameters except for $T_R$ and $T_E$,
illustrating the fundamentally different image contrasts
that are achievable.
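The effect of the timing parameters on contrast can be illustrated by evaluating the spin-echo amplitude $\rho(1 - e^{-T_R/T_1})e^{-T_E/T_2}$ for two tissues. The tissue parameters and timings below are hypothetical round numbers, not measured values, and the contrast is normalized as $(S_A - S_B)/S_A$.

```python
import math


def spin_echo_signal(rho, t1, t2, tr, te):
    """Spin-echo amplitude rho * (1 - exp(-TR/T1)) * exp(-TE/T2); times in seconds."""
    return rho * (1.0 - math.exp(-tr / t1)) * math.exp(-te / t2)


def contrast(s_a, s_b):
    """Relative contrast (S_A - S_B) / S_A between two regions."""
    return (s_a - s_b) / s_a


# Hypothetical tissues: (rho, T1, T2), with times in seconds.
tissue_a = (0.80, 0.90, 0.10)
tissue_b = (0.70, 0.60, 0.08)

# Hypothetical timings: (TR, TE) in seconds.
protocols = {
    "T1-weighted": (0.5, 0.015),
    "T2-weighted": (3.0, 0.090),
    "density-weighted": (3.0, 0.015),
}

for name, (tr, te) in protocols.items():
    s_a = spin_echo_signal(*tissue_a, tr, te)
    s_b = spin_echo_signal(*tissue_b, tr, te)
    print(f"{name:17s}: S_A = {s_a:.3f}, S_B = {s_b:.3f}, C_AB = {contrast(s_a, s_b):+.3f}")
```

With these made-up numbers the sign of the contrast flips between the short-$T_R$ and long-$T_E$ protocols, which is the kind of qualitative change that timing parameters can produce.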

Figure 5 Dependence of image contrast on pulse sequence
timing parameters: (a) $T_1$-weighted; (b) proton density-weighted.


It is noteworthy that object visibility is not just
determined by the contrast between adjacent
structures but is also a function of the noise. It is,
therefore, useful to define the contrast-to-noise ratio as


$$\mathrm{CNR}_{AB} = \frac{C_{AB}}{\sigma_{\mathrm{eff}}}$$ [36]


where $\sigma_{\mathrm{eff}}$ is the effective standard deviation of the
signal. Finally, it may be useful to reconstruct
parametric images in which the pixel signal values
represent any one of the intrinsic parameters. A
$T_2$-image can be computed from eqn [35], for
example, either analytically from two image data
sets acquired with two different echo times, or from a
series of $T_E$ values, obtained from a Carr-Purcell
spin-echo train, using regression techniques (see Nuclear
Magnetic Resonance and Haacke et al. (1999)).
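Both routes to a $T_2$ estimate mentioned above can be sketched in a few lines. The code assumes noise-free signals that decay as $e^{-T_E/T_2}$ at fixed $T_R$; the two-point formula $T_2 = (T_{E,2} - T_{E,1})/\ln(S_1/S_2)$ follows by taking the ratio of two such signals, and the regression variant fits a straight line to $\ln S$ against $T_E$. The echo times and the true $T_2$ are hypothetical.

```python
import math


def t2_from_two_echoes(s1, te1, s2, te2):
    """Two-point estimate: T2 = (TE2 - TE1) / ln(S1 / S2)."""
    return (te2 - te1) / math.log(s1 / s2)


def t2_from_echo_train(tes, signals):
    """Least-squares fit of ln(S) = c - TE / T2; returns T2 = -1 / slope."""
    n = len(tes)
    ys = [math.log(s) for s in signals]
    t_bar = sum(tes) / n
    y_bar = sum(ys) / n
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(tes, ys)) / sum(
        (t - t_bar) ** 2 for t in tes
    )
    return -1.0 / slope


true_t2 = 0.08                          # seconds (hypothetical)
echo_times = [0.02, 0.04, 0.06, 0.08]   # seconds (hypothetical)
signals = [0.9 * math.exp(-te / true_t2) for te in echo_times]

print(t2_from_two_echoes(signals[0], echo_times[0], signals[3], echo_times[3]))
print(t2_from_echo_train(echo_times, signals))
```

With noisy data the regression over a longer echo train is generally the more robust of the two, since it averages over several measurements.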

We have previously shown that the limiting
resolution is given by $k_{\max}$, the largest spatial
frequency sampled, see [25]. In reality, however,
the actual resolution is always lower. For example,
spin-spin ($T_2$) relaxation causes the signal to decay
during the acquisition. In spin-warp imaging, this
causes the high spatial frequencies to be further
attenuated.

A further consequence of finite sampling is a
ringing or Gibbs artifact that is most prominent at
sharp intensity discontinuities. In practice, these
artifacts are mitigated by applying an appropriate
apodizing filter to the data. Figure 6 shows a portion
of a brain image obtained at two different resolutions.
In Figure 6b, the total k-space area covered
was 16 times larger than for the acquisition of the
image in Figure 6a. Artifacts from finite sampling and
blurring of fine detail such as cortical blood vessels
are clearly visible in the low-resolution image. SNR,
according to eqn [33], is reduced in the high-resolution
image (b) by a factor of 4.





Figure 6 Effect of k-space coverage on spatial resolution in
axial image of the brain: the field of view in both images was
20 cm and all scan parameters were the same except that (a)
was acquired with $N_x = N_y = 128$ and (b) with $N_x = N_y = 512$.


See also: Nuclear Magnetic Resonance; Stochastic
Resonance.


The Basic Modeling


Magnetohydrodynamics (MHD) is the study of the 
interaction of (electro-) magnetic fields and con- 
ducting fluids. When a conducting fluid (e.g., a 
liquid metal, a weakly ionized gas, or a plasma) is 
placed within a magnetic field, two coupling 
phenomena appear: the electric currents modify the 
magnetic field, and the Lorentz forces due to the 
magnetic field modify the motion of the fluid. At the 
mathematical level, two sets of equations, very 
different in nature, are involved. The usual descrip- 
tion of the hydrodynamics phenomena is most often 
that provided by the continuum mechanics for 
fluids, while the description of electromagnetic 
phenomena essentially proceeds from the Maxwell 
equations. 

Either category of equations comes in a
variety of models. The coupling between the two
categories might also be accounted for at different
levels of accuracy. For the sake of conciseness in
such an expository survey, it is neither desirable nor
feasible to present all the possible sets of equations
and their possible couplings. The difficulty stems
from the incredibly large spectrum of physical
phenomena where MHD plays a role. A list of
such phenomena includes


• astrophysical and geophysical applications (modeling
of stars in the galactic field, of pulsars, of
solar spots, of the flows in the earth’s core, ...),

• advanced “terrestrial” applications such as the
magnetic confinement of plasmas in controlled
fusion, MHD propulsion engines for rockets, and

• industrial applications in the engineering world
(electromagnetic pumping, metal forming, aluminum
electrolysis, and many other metallurgical
applications).


Due to this variety of physical situations, no
unified setting can be presented with a satisfactory
degree of detail. We therefore mostly concentrate
throughout this article on the MHD of conducting
fluids that are homogeneous, incompressible, viscous,
and Newtonian. This is often the case for
liquid metals in many industrial processes. The
equations manipulated will first be given in their
most general form and then immediately adapted to
the above context. For other contexts, the modeling
follows the same pattern, but other variants of the
general equations must be employed. The bibliography
of this article contains such general
information.


The Hydrodynamics Description


The usual description for fluids follows from
continuum mechanics. In this setting, the governing
equation is the equation for the conservation of
momentum

$$\frac{\partial(\rho u)}{\partial t} + \mathrm{div}(\rho u \otimes u) - \mathrm{div}\,\tau = f$$ [1]
where $\rho$ denotes the density of the fluid, u its
velocity, $\tau$ the stress tensor, and f the density of
body forces per unit volume applied to
the fluid. For incompressible viscous Newtonian
fluids, the stress/velocity relation reads


$$\tau = \eta\left(\nabla u + (\nabla u)^T\right) - p\,\mathrm{Id}$$ [2]

together with the constraint

$$\mathrm{div}\, u = 0$$ [3]


on the velocity. Here, $\eta$ denotes the viscosity of the
fluid, p the pressure, and $A^T$ denotes the transpose
of the matrix A. A third usual assumption is
that the incompressible fluid is in addition
homogeneous, that is,


$$\rho = \bar\rho = \text{constant}$$ [4]


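Equations [1]-[4] lead to the equations for
conservation of momentum in the case of an
incompressible homogeneous viscous Newtonian fluid,
that is, the incompressible Navier-Stokes equations


$$\begin{aligned} \rho\frac{\partial u}{\partial t} + \rho\, u \cdot \nabla u - \eta\Delta u + \nabla p &= f \\ \mathrm{div}\, u &= 0 \end{aligned}$$ [5]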
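The passage from [1]-[4] to [5] takes three short steps, sketched here in components. The sketch assumes that the viscosity $\eta$ is constant in space, which is not stated explicitly above, and uses $\partial_j$ for the partial derivative with respect to the j-th coordinate with summation over repeated indices.

```latex
% Step 1: constant density, eqn [4], pulls rho out of the time derivative.
\frac{\partial(\rho u_i)}{\partial t} = \rho\,\frac{\partial u_i}{\partial t}

% Step 2: the convective term, using div u = 0 from eqn [3].
\partial_j(\rho\, u_i u_j) = \rho\, u_j\,\partial_j u_i + \rho\, u_i\,\partial_j u_j
                           = \rho\,(u\cdot\nabla u)_i

% Step 3: the stress term from eqn [2], with eta constant (assumption)
% and again div u = 0.
\partial_j \tau_{ij} = \eta\,\partial_j\partial_j u_i + \eta\,\partial_i(\partial_j u_j) - \partial_i p
                     = \eta\,\Delta u_i - \partial_i p

% Substituting the three results into eqn [1] gives the first line of eqn [5]:
\rho\,\frac{\partial u}{\partial t} + \rho\, u\cdot\nabla u - \eta\,\Delta u + \nabla p = f
```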


These equations are supplied with initial and
boundary conditions on the velocity u. At initial
time, the velocity is assumed to be known,
$u(t = 0, \cdot) = u_0$, on the whole domain $\Omega$ occupied by
the fluid, a domain that is supposed here not to
vary in time (see, nevertheless, the section “The
industrial production of aluminum” for a different
setting). On the other hand, the boundary conditions
on the boundary $\partial\Omega$ of $\Omega$ can be of various forms.
For simplicity, the boundary is supposed regular, so
that its unit outward normal $n_{\partial\Omega}$ can be
unambiguously defined. The standard choice is to
set Dirichlet conditions on the velocity, $u = u_{\mathrm{given}}$. In
the following, we will assume for simplicity that the
boundary condition is the homogeneous Dirichlet
boundary condition $u = 0$, as a superposition of the
nonpenetration condition $u \cdot n_{\partial\Omega} = 0$ and the no-slip
boundary condition $u \times n_{\partial\Omega} = 0$. One can also
impose alternative boundary conditions, for example,
involving the pressure.


The Electromagnetic Description 


Classical electromagnetism is described by the
Maxwell equations. For the sake of consistency, we
recall here that these are:


The Maxwell-Ampère equation


$$-\frac{\partial D}{\partial t} + \mathrm{curl}\, H = j$$ [6]


The Maxwell-Coulomb equation


$$\mathrm{div}\, D = \rho_c$$ [7]


The Maxwell-Faraday equation


$$\frac{\partial B}{\partial t} + \mathrm{curl}\, E = 0$$ [8]


The Maxwell-Gauss equation

$$\mathrm{div}\, B = 0$$ [9]


In the above equations, the three-dimensional vector
fields D, B, E, H denote the electric and magnetic
inductions, and the electric and magnetic fields,
respectively. On the other hand, the three-dimensional
vector field j denotes the current density, and the scalar
field $\rho_c$ denotes the charge density. Inside an
electrically conducting medium, the standard assumption
of a perfect medium consists in assuming the following
relations:


$$D = \varepsilon E, \qquad H = \frac{B}{\mu}$$ [10]


often called “constitutive laws,” where $\varepsilon$ and $\mu$,
respectively, denote the (electric) permittivity and
the (magnetic) permeability of the medium. In the
simple isotropic homogeneous case, both these
parameters are scalar and constant. They are often
expressed as


$$\varepsilon = \varepsilon_r \varepsilon_0, \qquad \mu = \mu_r \mu_0$$ [11]


where $\varepsilon_0$, $\mu_0$ are the permittivity and the permeability
of the vacuum (that satisfy $\varepsilon_0 \mu_0 = 1/c^2$, with
c denoting the speed of light), and $\varepsilon_r$, $\mu_r$ are the
permittivity and the permeability relative to vacuum,
or relative permittivity and relative permeability.
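When collecting [6]-[9], together with [10], [11],
one obtains the following general system of
Maxwell equations in a continuum (dielectric)
medium:


$$\begin{aligned} -\frac{\partial(\varepsilon E)}{\partial t} + \mathrm{curl}\left(\frac{B}{\mu}\right) &= j \\ \mathrm{div}(\varepsilon E) &= \rho_c \\ \frac{\partial B}{\partial t} + \mathrm{curl}\, E &= 0 \\ \mathrm{div}\, B &= 0 \end{aligned}$$ [12]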


This system is supplied with initial conditions on the
fields B and E. On the other hand, boundary
conditions might be necessary when the equations
are restricted to a bounded domain. The latter
question, quite delicate, is postponed until the next
section.


The MHD Coupling 


For coupling systems [5] and [12], a threefold task is
in order.

On the one hand, the body force term in [5] needs
to be made precise, and this is completed by setting


$$f = j \times B + f_{\mathrm{ext}}$$ [13]


The first term in the right-hand side is the Lorentz
force, a consequence of the electric current j running
within the magnetic field B, a force that influences
the motion, along the velocity field u, of the
particles of the conducting fluid. The second term
is due to possible external forces. A typical case for
such forces is that of the gravity forces


$$f_{\mathrm{ext}} = \rho g$$ [14]


On the other hand, in order to be a mathematically
closed system, the Maxwell system [12] needs
to be complemented by Ohm’s law, another type of
constitutive relation, like [10], that now relates the
current density j with the other fields. When dealing
with MHD phenomena, Ohm’s law most often
reads in the form


$$j = \sigma(E + u \times B)$$ [15]


where $\sigma$ denotes the electric conductivity of the
fluid. The second term of [15] explicitly accounts for
the deviation of the lines of electric current by the
hydrodynamics flow. In some oversimplified situations,
it can be neglected, leading to Ohm’s law in
the more usual form $j = \sigma E$, that is also valid for
solid media. Most of the time the term $u \times B$
contains crucial information, and thus is not
neglected.
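The role of the $u \times B$ term can be seen in a small numerical example combining Ohm’s law, $j = \sigma(E + u \times B)$, with the Lorentz force density $j \times B$. The values below (conductivity, velocity, field) are hypothetical and only loosely of the order found in liquid metals; the helper names are illustrative.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def ohm_current(sigma, e, u, b):
    """Current density j = sigma * (E + u x B)."""
    return tuple(sigma * (ei + ci) for ei, ci in zip(e, cross(u, b)))


def lorentz_force(j, b):
    """Lorentz force density j x B."""
    return cross(j, b)


sigma = 1.0e6         # electric conductivity in S/m (hypothetical)
e = (0.0, 0.0, 0.0)   # no applied electric field
u = (1.0, 0.0, 0.0)   # fluid velocity in m/s, along x
b = (0.0, 0.0, 0.1)   # magnetic field in T, along z

j = ohm_current(sigma, e, u, b)
f = lorentz_force(j, b)
print("j =", j, "A/m^2")
print("f =", f, "N/m^3")
```

Even without an applied electric field, motion across the field lines induces a current, and the resulting force points against the velocity. Dropping the $u \times B$ term would remove this braking effect entirely.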
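System [5]-[12] now reads


$$\begin{aligned} \rho\frac{\partial u}{\partial t} + \rho\, u \cdot \nabla u - \eta\Delta u + \nabla p &= j \times B + f_{\mathrm{ext}} \\ \mathrm{div}\, u &= 0 \\ -\frac{\partial(\varepsilon E)}{\partial t} + \mathrm{curl}\left(\frac{B}{\mu}\right) &= j \\ \mathrm{div}(\varepsilon E) &= \rho_c \\ \frac{\partial B}{\partial t} + \mathrm{curl}\, E &= 0 \\ \mathrm{div}\, B &= 0 \\ j &= \sigma(E + u \times B) \end{aligned}$$ [16]


A third task is then in order.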

Apart from the constitutive laws [10] and Ohm’s
law [15], the specificity of the Maxwell equations for
conducting fluids, as opposed to the same equations
written, for example, in the vacuum, resides in the
possible need for supplying the system with ad hoc
boundary conditions. Indeed, in their most general
form, the Maxwell equations are valid in the whole
physical space $\mathbb{R}^3$. On the other hand, as the goal here
is to simulate an MHD fluid that most often occupies
only a bounded domain $\Omega$ in $\mathbb{R}^3$, there is the need to
adequately define the simulation domain.

A first possibility is to set the Maxwell equations 
in the whole space, while solving the hydrodynamics 
equation on the domain Q occupied by the fluid. 
Regarding only the Maxwell equations [12], this 
seems to be the method of choice. But then there is 
the need for an extension of Ohm’s law [15] outside 
the fluid domain. Notice indeed that u appears in 
[15]. In addition to this, the fact that the physical 
confinement device for the fluid is then embedded in 
the domain where the Maxwell equations are set 
may be the source of various difficulties, as such a 
device is often delicate to model and treat. There- 
fore, alternative tracks may be followed. 

A second possibility is to restrict the Maxwell
equations to a bounded domain. In turn, this option
divides into two: taking as the domain for the Maxwell
equations that occupied by the fluid, or choosing a
domain larger than $\Omega$. We cannot discuss this choice
in full generality here, and refer the reader to the
literature (see e.g., Gerbeau et al. (2005)). In either
situation, boundary conditions are needed. We only
consider the former for the sake of brevity.

A standard choice for the boundary conditions for
[12] is the following:


$$
\begin{aligned}
&E \times n_{\partial\Omega} = k \times n_{\partial\Omega} \\
&B \cdot n_{\partial\Omega} = q
\end{aligned}
\qquad [17]
$$


where k and q, respectively, are given vector and
scalar functions on the boundary.

A fact that needs to be emphasized is that it is not
so easy to design accurate boundary conditions, that
is, evaluations of k or q, in particular because accurate
experimental measurements of magnetic quantities are
often delicate to obtain, especially in industrial
environments.


A Commonly Used Simplified MHD Coupling 


For the terrestrial MHD applications that are the
focus of the present article, a commonly used
assumption is to neglect the first term $\partial(\varepsilon E)/\partial t$,
often called the displacement current, in the
Maxwell-Ampère equation [6], that is, the first
equation of [12] or the third of [16] above.
Then system [16] can be reorganized, eliminating
E and j, and leaving aside the Maxwell-Faraday
equation [8], Ohm’s law [15], and the Maxwell-Coulomb
equation [7]. The latter equations
amount to defining, respectively, E from B, j
from E and B, and $\rho_e$ from E. One is left with
the following system with the triple of unknown
fields (u, p, B):


$$
\begin{aligned}
&\rho \frac{\partial u}{\partial t} + \rho\, u \cdot \nabla u - \eta \Delta u + \nabla p = \frac{1}{\mu} \operatorname{curl} B \times B + f_{ext} \\
&\operatorname{div} u = 0 \\
&\frac{\partial B}{\partial t} + \operatorname{curl}\left(\frac{1}{\sigma} \operatorname{curl} \frac{1}{\mu} B\right) = \operatorname{curl}(u \times B) \\
&\operatorname{div} B = 0
\end{aligned}
\qquad [18]
$$


Correspondingly, the initial conditions are now 
only on the pair (u, B). Regarding the boundary 
conditions on B, they can be derived from [17] 
using, for example, a homogeneous Dirichlet bound- 
ary condition on u: 


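$$
\begin{aligned}
&\operatorname{curl} B \times n_{\partial\Omega} = k \times n_{\partial\Omega} \\
&B \cdot n_{\partial\Omega} = q
\end{aligned}
\qquad [19]
$$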

Other simplifications of system [16] can be 
adopted, such as steady-state approximations. In 
particular, it is often considered that electromagnetic 
phenomena have characteristic times that are so 
short in comparison with the characteristic time 
of hydrodynamics phenomena that the Maxwell 
equations in their stationary form may be coupled to 
the time-dependent hydrodynamics equations, such 
as [5]. We refer to the “Further reading” section 
for further information along these lines (see e.g., 

Gerbeau et al. (2005)). 


The Mathematical Nature of the 
Equations 


With a view to understanding the mathematical
nature of systems [16] and [18], we first briefly 
recall some mathematical facts concerning hydro- 
dynamics, before focusing on the coupling with 
electromagnetics. 

Regarding the incompressible Navier-Stokes 
equation, we recall that the state of the art of the 
mathematical knowledge heavily depends on the 
dimension of the ambient space. In dimension 2, 
solutions are unique and regular (they are said to be 
strong), for regular enough data of course. Unfortu- 
nately, as the focus is here on MHD and electro- 
magnetism is fundamentally a three-dimensional 
phenomenon, only the three-dimensional case for 
the Navier-Stokes equation is relevant. Now, in the 
context of the Navier-Stokes equations alone, only 
the existence of weak solutions for large times, and 
the existence and uniqueness of strong solutions for 
small times are known. Whether or not there exists a 
unique strong solution for all time (of course again 
for sufficiently regular data) is an open problem, of 
outstanding difficulty, (see Temam 1995). 

In the coupled setting examined here, there is no 
reason to expect a better situation. At best, one may 
hope for the same situation as that for the 
uncoupled case (Navier-Stokes equations alone). 
Regarding the existence and uniqueness of solutions, 
a commonly used strategy is that of regularization: 
the Cauchy problem is studied for regularized data, 
and then one passes to the limit in the regulariza- 
tion. In this latter step, the linear terms cause no 
difficulty, since they pass to the limit only using 
weak convergence. On the other hand, the main 
concern is always the treatment of the nonlinear 
terms, which require strong convergence. Here, for 
the Navier-Stokes equation in the MHD setting, the
additional difficulty stems from the presence of
the nonlinear term $j \times B$ on the right-hand side. The
mathematical treatment of this nonlinear term calls
for a compactness argument, which in turn requires
obtaining some information on the fields j and B,
and their derivatives, from the Maxwell equations.
In this respect, the situation is radically different for
system [16] and for system [18]. Likewise, these
two systems behave differently regarding the other
nonlinear term of electromagnetic nature, namely
$u \times B$ in Ohm’s law, or $\operatorname{curl}(u \times B)$ on the
right-hand side of the equation in B, respectively.


The Hyperbolic Variant 


Due to the presence of the Maxwell equations [12]
in their general form, that is a hyperbolic form,
system [16] is indeed very difficult, from the
standpoint of mathematical analysis.

In order to realize this, it suffices to recall that the
first step in the proof of the existence of a solution to
such a system of equations is to write down an
a priori energy estimate. It is a simple manipulation
on [16] to show that, formally, a solution to [16]
satisfies


$$\frac{1}{2}\frac{d}{dt}\int_\Omega \rho |u|^2 + \eta \int_\Omega |\nabla u|^2 = \int_\Omega (j \times B) \cdot u \qquad [20]$$


multiplying the Navier-Stokes equation by u and
integrating over the domain $\Omega$, while, on the other
hand,


$$\frac{1}{2}\frac{d}{dt}\int_\Omega \left( \varepsilon |E|^2 + \frac{1}{\mu} |B|^2 \right) = -\int_\Omega j \cdot E \qquad [21]$$


multiplying the Maxwell-Ampère equation by $-E$,
the Maxwell-Faraday equation by $(1/\mu)B$, integrating
over $\Omega$, and summing up the two. Next, the
right-hand side of [21] can be modified, accounting
for Ohm’s law:


$$\frac{1}{2}\frac{d}{dt}\int_\Omega \left( \varepsilon |E|^2 + \frac{1}{\mu} |B|^2 \right) = -\int_\Omega \frac{|j|^2}{\sigma} - \int_\Omega (j \times B) \cdot u \qquad [22]$$


Summing up [20] and [22] yields the energy
estimate:


$$\frac{1}{2}\frac{d}{dt}\int_\Omega \left( \rho |u|^2 + \varepsilon |E|^2 + \frac{1}{\mu} |B|^2 \right) + \int_\Omega \frac{|j|^2}{\sigma} + \eta \int_\Omega |\nabla u|^2 = 0 \qquad [23]$$


Notice that, in the above, we set the external forces
and all boundary conditions to zero, for the sake of
simplicity.

Estimate [23] clearly indicates that we have at our disposal
$L^\infty([0,T], L^2(\Omega))$ bounds on the vector fields E and
B together with an $L^2([0,T] \times \Omega)$ bound on the
current j, and with the (classical)
$L^\infty([0,T], L^2(\Omega)) \cap L^2([0,T], H^1(\Omega))$ bounds on the velocity
u. In addition, div B and, when assuming $\rho_e$
bounded, div E are bounded in $L^\infty([0,T] \times \Omega)$.
Unfortunately, these bounds do not allow for
passing to the limit in the nonlinear term $j \times B$ on
the right-hand side of the Navier-Stokes equation.
In addition, there seems to be no way of deriving
further energy estimates on system [16] that would
provide more a priori regularity on the fields
E, B, and j. To date, system [16] presents an
unsolved mathematical difficulty.


The Parabolic Variant


On the other hand, system [18] is radically different
in mathematical nature, because the Maxwell
equations then reduce to a parabolic-type equation.
The same manipulations as above, in order to
establish a priori estimates on the solution of [18],
now lead to


$$\frac{1}{2}\frac{d}{dt}\int_\Omega \left( \rho |u|^2 + \frac{1}{\mu} |B|^2 \right) + \int_\Omega \frac{1}{\sigma} \left| \operatorname{curl} \frac{1}{\mu} B \right|^2 + \eta \int_\Omega |\nabla u|^2 = 0 \qquad [24]$$


which, together with the divergence-free constraint
on B, yields $L^\infty([0,T], L^2(\Omega)) \cap L^2([0,T], H^1(\Omega))$
bounds on both the velocity u and the magnetic
field B. These bounds now allow for passing to the
limit in the terms $\operatorname{curl} B \times B$ and $\operatorname{curl}(u \times B)$ on the
right-hand side of the equations. This being established,
the rest of the mathematical analysis is
straightforward, and a theorem of existence and
uniqueness of solutions can be proved. As in the
case of the Navier-Stokes equations alone, we have
(in dimension 3) the existence of a global-in-time
weak solution (i.e., for any T, u and B both in
$L^\infty([0,T], L^2(\Omega)) \cap L^2([0,T], H^1(\Omega))$ and satisfying the
divergence-free constraint). No uniqueness of this
weak solution is known. On the other hand, for
sufficiently regular data, we have the existence of a
local-in-time strong solution (i.e., for T sufficiently
small, u and B both in $L^\infty([0,T], H^1(\Omega)) \cap L^2([0,T], H^2(\Omega))$),
and uniqueness of this strong solution in
the class of weak solutions as long as it exists. We
refer to Sermange and Temam (1983) and Gerbeau
et al. (2005).

At this stage, it is to be remarked that there is a
formal similarity, at first sight at least, between
the parabolic form of the Maxwell equations,
namely


$$
\begin{aligned}
&\frac{\partial B}{\partial t} + \operatorname{curl} \operatorname{curl} B = \operatorname{curl} h \\
&\operatorname{div} B = 0
\end{aligned}
\qquad [25]
$$


and the incompressible Navier-Stokes equation [5].
Note that indeed the curl curl operator in the first
equation of [25] can be replaced by (minus) the
Laplacian operator $-\Delta$, since div B = 0. Actually,
this formal similarity cannot be translated into
mathematical arguments, simply because there is
no pressure in [25]. In other terms, the divergence-free
constraint div B = 0 simply propagates in time in
[25] (note that the right-hand side curl h is also
divergence-free by construction), while on the other
hand div u = 0 is enforced as a constraint in [5], the
pressure playing the role of a Lagrange multiplier
that adjusts itself in time in order to allow for u to
be divergence-free.

Of course, as in the purely hydrodynamic case,
much more can be said on the equations than simply
establishing the existence and uniqueness of solutions.
For instance, the long-time limit of the
solutions can be studied. For this and other
issues, we refer to the “Further reading” section
(Duvaut and Lions 1972a, b, Sermange and Temam
1983, Gerbeau et al. 2005).


Numerical Issues 


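We concentrate again on system [18]. It is illustrative
to mention that this system, when written in
nondimensional variables, reads


$$
\begin{aligned}
&\frac{\partial u}{\partial t} + u \cdot \nabla u - \frac{1}{\mathrm{Re}} \Delta u + \nabla p = S \operatorname{curl} B \times B + f_{ext} \\
&\operatorname{div} u = 0 \\
&\frac{\partial B}{\partial t} + \frac{1}{\mathrm{Re}_{mag}} \operatorname{curl}(\operatorname{curl} B) = \operatorname{curl}(u \times B) \\
&\operatorname{div} B = 0
\end{aligned}
$$


where S is the coupling parameter, Re is the
(hydrodynamic) Reynolds number, and $\mathrm{Re}_{mag}$
denotes the magnetic Reynolds number.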

As expected, the numerical simulation of a system 
such as [18] superposes the difficulties of the 
hydrodynamics simulation of incompressible viscous 
fluids, and those faced when simulating the para- 
bolic form of the Maxwell equations. Therefore, the 
goal is to efficiently combine the techniques 
employed to overcome either of them. 

For incompressible fluid mechanics, the method
of choice is the finite-element method for the
discretization of differential operators in space. A
typical discretization of eqn [5], called the “mixed”
finite-element method, makes use of a pair of finite
elements, one for the velocity, and one for the
pressure. Other possibilities exist, which amount
more or less to eliminating one unknown in a
first stage and calculating the second one as a
postprocessing task. The mixed formulation in the
pair of unknowns (u, p) is however the most widely
employed method to date, at least in the present
setting. The finite-element space for the velocity is
taken richer than that for the pressure: a possibility
is, for example, to take the degree of the finite
element for the velocity equal to the degree of the
finite element for the pressure plus one. The
heuristics for this is the fact that the velocity is
differentiated twice in [5] while the pressure is only
differentiated once. Of course, a mathematical ground
for this is available, and a key issue is the “inf-sup”
condition (also called compatibility condition, or
stability condition) that dictates the possible choice
for finite-element pairs, so that problem [5] is well
posed at the discrete level. Typically, Q2 finite
elements for the velocity can be combined with
(continuous) Q1 finite elements for the pressure.
An alternative choice is to ignore the inf-sup
condition, adopting, for example, Q1 finite elements
for both fields u and p, but this requires
a so-called stabilized formulation of [5] at the
discrete level. The “Further reading” section
provides details on the broad variety of techniques
available in the field: Quarteroni and Valli (1997),
Gerbeau et al. (2005).

On the other hand, the parabolic equation on B in 
[18] may be discretized with the same finite elements 
as those used for the velocity. The enforcement of 
the divergence constraint div B = 0 at the discrete
level deserves some attention. Recall indeed that 
at the continuous level the divergence-free constraint 
is spontaneously propagated by the equation. At 
the discrete level, a crucial role in this respect is 
played by the weak formulation of the parabolic 
equation and an ad hoc account for the boundary 
condition [17]. 

For the sake of completeness, let us mention that 
an alternative strategy to the use of the finite 
elements that have been mentioned above (and that 
are called Lagrangian finite elements), is to use 
“edge elements.” In some sense, the use of such 
elements simplifies the treatment of the boundary 
conditions [17], since they are very well adapted to 
their mathematical nature. 

Note also that, in the vein of what is done for 
purely hydrodynamics flow simulations, stabilized 
finite-elements techniques have been developed for 
the MHD system [18], that allow for a discretization 
of the three unknown fields (u, p, B) over the same 
finite elements, for example, Q1.

When coupling the two discrete formulations for 
simulating the whole system [18], two main strate- 
gies can be adopted: one can either treat each of the 
two equations separately, independently describing 
the propagation of u and B forward in time, or one 
can address directly the coupled system of equa- 
tions, describing the propagation of u and B in 
parallel. 

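The first option aims in particular at obtaining in
the end small algebraic systems. An instance of such
a segregated algorithm reads, formally and setting
all constants to unity for simplicity,


$$
\begin{aligned}
&\frac{u^{n+1} - u^n}{\Delta t} + u^n \cdot \nabla u^{n+1} - \Delta u^{n+1} + \nabla p^{n+1} = \operatorname{curl} B^n \times B^n + f_{ext} \\
&\operatorname{div} u^{n+1} = 0 \\
&\frac{B^{n+1} - B^n}{\Delta t} + \operatorname{curl} \operatorname{curl} B^{n+1} = \operatorname{curl}(u^n \times B^{n+1}) \\
&\operatorname{div} B^{n+1} = 0
\end{aligned}
\qquad [26]
$$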

At each time step, the two independent subsystems
are solved, providing $u^{n+1}$ and $B^{n+1}$ for the
next time step. The difficulty is that it is not
possible, with such segregated algorithms, to reproduce
the energy estimate [24] at the discrete level.
Note that, at the continuous level, the estimate [24]
is based upon a proper cancelation of the term
$\int_\Omega (j \times B) \cdot u$ present on the two right-hand sides.
Such a cancelation basically stems from a nonlinear
interplay that cannot be present in a segregated
iteration. Consequently, some spurious energy is
created in the system simply by an inadequate
iteration between the two equations. More precisely,
the scheme obtained is at best only conditionally
stable, that is, stable for small enough time steps, a
condition that might be prohibitive when it is
needed to simulate the MHD coupling over large
times.

On the other hand, the other option consists in
attacking the full system [18] directly:


$$
\begin{aligned}
&\frac{u^{n+1} - u^n}{\Delta t} + u^n \cdot \nabla u^{n+1} - \Delta u^{n+1} + \nabla p^{n+1} = \operatorname{curl} B^{n+1} \times B^n + f_{ext} \\
&\operatorname{div} u^{n+1} = 0 \\
&\frac{B^{n+1} - B^n}{\Delta t} + \operatorname{curl} \operatorname{curl} B^{n+1} = \operatorname{curl}(u^{n+1} \times B^n) \\
&\operatorname{div} B^{n+1} = 0
\end{aligned}
\qquad [27]
$$


Note that $B^{n+1}$ is present in the equation yielding
$u^{n+1}$, while conversely $u^{n+1}$ is present in that
yielding $B^{n+1}$. Then the coupled system admits at
the discrete level an energy estimate analogous to
the energy estimate [24], and the scheme is much
more stable than the previous one, and even
unconditionally stable. The price to pay is that the
system is, at the algebraic level, of very large size.
Being sparse, it may however be treated, for
example, via a GMRES-type iterative solver.
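The contrast between the two strategies can be illustrated on a scalar caricature. The sketch below is not a discretization of [18]: it assumes a hypothetical two-variable model, u' = -nu*u + a*b and b' = -eta*b - a*u, whose coupling terms cancel in the energy balance in the same way as the term $\int_\Omega (j \times B) \cdot u$ does in [24]. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def step_segregated(u, b, dt, nu, eta, a):
    # Each equation is advanced on its own; the coupling term uses
    # old values only, so the energy cancelation is lost.
    u_new = (u + dt * a * b) / (1.0 + dt * nu)
    b_new = (b - dt * a * u) / (1.0 + dt * eta)
    return u_new, b_new

def step_coupled(u, b, dt, nu, eta, a):
    # Both unknowns are solved for together (implicit coupling), so the
    # coupling terms cancel exactly in the discrete energy balance.
    M = np.array([[1.0 + dt * nu, -dt * a],
                  [dt * a, 1.0 + dt * eta]])
    u_new, b_new = np.linalg.solve(M, np.array([u, b]))
    return u_new, b_new

def energy_history(step, n_steps=50, dt=0.5, nu=0.01, eta=0.01, a=10.0):
    # Track the toy energy (u^2 + b^2) / 2 along the iteration.
    u, b = 1.0, 0.0
    energies = [0.5 * (u * u + b * b)]
    for _ in range(n_steps):
        u, b = step(u, b, dt, nu, eta, a)
        energies.append(0.5 * (u * u + b * b))
    return np.array(energies)

e_seg = energy_history(step_segregated)  # grows: spurious energy is created
e_cpl = energy_history(step_coupled)     # never increases, whatever dt is
```

For this deliberately large time step the segregated iteration amplifies the energy at every step, whereas the coupled solve satisfies a discrete energy inequality for any step size, at the price of solving a (here 2 x 2) coupled linear system.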

Let us make a final remark on these numerical
issues. In full generality, the numerical
simulation of viscous fluids raises the question of
large Reynolds numbers, that is, the question of the
difficulties encountered in the numerical approximation
for viscosities $\eta$ small with respect to the
other dimensionalized parameters of the problem
(density, velocity, and dimension of the domain).
For such small viscosities, the flow becomes
turbulent rather than laminar, and the broad
range of length and energy scales in the flow turns
out to be too difficult to capture numerically. A
commonly used technique in
such difficult cases is turbulence modeling.
Schematically, an averaged, or homogenized, model
is derived on the basis of the Navier-Stokes
equation, with the help of simplifying hypotheses,
for example, in the form of closure relations. The
quality of the simulation of the averaged model,
and its relation to the true flow, heavily depends on
these simplifying assumptions, which are in turn
based upon a very deep understanding of the
various physical phenomena at play. In the context
of MHD flows, the situation is not clear regarding
such assumptions. It seems that there are no well-established
models for turbulent MHD to date, at
least from a rigorous viewpoint. In the absence of
those, only a direct simulation of the Navier-Stokes
equation seems possible.


The Industrial Production of Aluminium 


A prototypical example of an application of MHD 
to the industrial context is the production of 
aluminum in electrolysis cells. The numerical simu- 
lation of the process involves the simulation of the 
evolution of two layers of nonmiscible incompres- 
sible viscous fluids, separated by an interface, and 
covered by a free surface. A schematic description of 
an industrial cell indeed is the following. An electric 
current of $10^5$ A, or more, runs through two
horizontal layers of conducting fluids: a bath of 
aluminum oxide above, and a layer of liquid 
aluminum below. The aluminum is produced by 
the reduction of the aluminum oxide, a reaction that 
only occurs at a temperature where aluminum is 
liquid. The high magnetic field induced by such a 
huge current produces in turn high Lorentz forces 
that influence the motion of either fluid. A key issue 
in the modeling, as well as in the technological 
control of the cell, is to understand the motion of 
the interface separating the two fluids. In a rough 
picture, this interface may be seen as a mobile 
cathode, moving below a fixed anode. The equations
describing the interior of the cell are basically
of the type [18], with an important modification 
though: one needs to account for the presence of 
two fluids. They read: 


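$$
\begin{aligned}
&\frac{\partial (\rho u)}{\partial t} + \operatorname{div}(\rho u \otimes u) - \operatorname{div}\left( \eta (\nabla u + (\nabla u)^T) \right) = -\nabla p + \rho g + \frac{1}{\mu} \operatorname{curl} B \times B \\
&\operatorname{div} u = 0 \\
&\frac{\partial \rho}{\partial t} + \operatorname{div}(\rho u) = 0 \\
&\frac{\partial B}{\partial t} + \operatorname{curl}\left( \frac{1}{\mu\sigma} \operatorname{curl} B \right) = \operatorname{curl}(u \times B) \\
&\operatorname{div} B = 0
\end{aligned}
\qquad [28]
$$


where g denotes the gravity field. These equations are
supplied with the boundary conditions


$$
\begin{aligned}
&u = 0 \\
&\frac{1}{\mu\sigma} \operatorname{curl} B \times n_{\partial\Omega} = k \times n_{\partial\Omega} \\
&B \cdot n_{\partial\Omega} = q
\end{aligned}
\qquad [29]
$$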

As opposed to [18], the density $\rho$ in [28] is no longer
a single constant, but is only piecewise constant, that is,
constant in each (moving) subdomain occupied by
each fluid. Likewise, the viscosity $\eta$ and the
conductivity $\sigma$ are taken constant in each fluid, but with
different values from one fluid to the other. While the
density and the viscosity are only slightly different, the
conductivity varies by many orders of magnitude, a
discrepancy which results in some numerical stiffness
of the equations. On the other hand, the permeability
$\mu$ can be considered as constant throughout the
domain, within a good level of approximation.
Mathematically, system [28] is an order of magnitude
more difficult than [18]. We refer to Lions
(1996) and Gerbeau and Le Bris (1997) for some
mathematical ingredients. A first major difficulty 
stems from the fact that the domain occupied by the 
fluids is no longer fixed. Notice that this difficulty 
already arises when simulating the MHD of one 
conducting fluid with a free surface. A second major 
difficulty is the discontinuity of the physical para- 
meters at the interface, which causes a loss of 
regularity at the interface for the solution fields. The 
best result known to date is the existence of a global- 
in-time weak solution to [28]. Both mathematical 
difficulties above of course have significant numerical 
counterparts. A notable issue in such a simulation is 
how to handle the motion of the free interface, while 
ensuring that each fluid remains of constant mass (or 


volume) throughout the simulation. One of the most
efficient methods in such a context, introduced three
decades ago, is the arbitrary Lagrangian-Eulerian
(ALE) method. We refer to Brackbill and Pracht
(1973) and Gerbeau et al. (2003a, b, 2005).

Apart from the direct numerical attack of system
[28], which carries significant analytical and geometrical
nonlinearities, there is the possibility, in
particular in the industrial context, to derive a set
of linearized equations in the vicinity of some
equilibrium configuration of the system. This track
has been extensively followed in the past and
provides information that efficiently complements
that provided by the much more satisfactory, but
also more costly, nonlinear approach.


See also: Compressible Flows: Mathematical Theory; 
Computational Methods in General Relativity: The Theory; 
Fluid Mechanics: Numerical Methods; Newtonian Fluids 
and Thermohydraulics; Partial Differential Equations: 
Some Examples; Stability of Flows; Symmetric 
Hyperbolic Systems and Shock Waves; Topological Knot 
Theory and Macroscopic Physics.


Introduction


Malliavin calculus was initiated in 1976 with the 
work by P Malliavin (1978) and is essentially an 
infinite-dimensional differential calculus on the 
Wiener space. Its initial goal was to give conditions 
ensuring that the law of a random variable has a 
density with respect to Lebesgue measure as well as 
estimates for this density and its derivatives. When 
the random variables are solutions of stochastic
differential equations (SDEs), these densities are heat
kernels and Malliavin used Hörmander-type
assumptions on the corresponding operators, thus
providing a probabilistic proof of a Hörmander-type
theorem for hypoelliptic operators.

The theory was much developed in the 1980s by 
Stroock, Bismut, and Watanabe, among others (the 
reader is referred to Nualart (1995) and Malliavin 
(1997)). In recent years, Malliavin calculus had 
great success in probabilistic numerical methods, 
mainly in the field of stochastic finance (Malliavin 
and Thalmaier 2005). However, the theory has also 
been applied to other fields of mathematics and 
physics, notably in statistical mechanics and statistical 
hydrodynamics (see Stochastic Hydrodynamics). In 
addition, one should remember that Wiener measure 
can be viewed as an “imaginary time” (but well- 
defined) counterpart of Feynman’s “measure” for 
quantum systems. A stochastic calculus of variations 
for Wiener functionals could not be irrelevant to the 
path-integral approach to quantum theory. 

Another field of application worth mentioning is 
the study of representations of stochastic oscillatory 
integrals with quadratic phase function and their 
stationary phase estimation. For this, complexifica- 
tion of the Wiener space must be properly defined 
(Malliavin and Taniguchi (1997)).

In order to give a flavor of what Malliavin
calculus is all about, let us consider a second-order
differential operator in $\mathbb{R}^d$ of the form


$$A = \frac{1}{2} \sum_{i,j} a^{ij} \partial^2_{ij} + \sum_i b^i \partial_i$$


with smooth bounded coefficients and such that
the matrix a is symmetric and non-negative, admitting
a square root $\sigma$. The corresponding Cauchy
problem consists in finding a smooth solution
u(t, x) of


$$\frac{\partial u}{\partial t} = Au, \qquad u(0, \cdot) = \phi(\cdot) \qquad [1]$$


Then there exists a transition probability function
$p(t, x, \cdot)$ such that


$$u(t, x) = \int \phi(y)\, p(t, x, dy)$$


When $p(t, x, dy) = p(t, x, y)\,dy$, the function p is the
heat kernel associated to the operator A, and
from eqn [1] one may deduce the Fokker-Planck
equation for p.

Since Kolmogorov we know that it is possible to
associate with such a second-order operator a stochastic
family of curves, just as a deterministic flow is
associated with a vector field. This stochastic family
is a Markov process, $\xi_x(t)$, which is adapted to the
increasing family $P_\tau$, $\tau \in [0,1]$, of sigma-fields generated
by the past events, that is, $\xi_x(\tau) \in P_\tau$ for every $\tau$.

Itô calculus allows us to write the SDE
satisfied by $\xi$:


$$d\xi_x(t) = \sigma(\xi_x(t))\, dW(t) + b(\xi_x(t))\, dt, \qquad \xi_x(0) = x \qquad [2]$$


where W(t) stands for $\mathbb{R}^d$-valued Brownian motion
(see Stochastic Differential Equations). Then p is the
image of the Wiener measure $\mu$ (the law of
Brownian motion), namely $p(t, x, \cdot) = \mu \circ \xi_x^{-1}(t)(\cdot)$,
and we have the representation


$$u(t, x) = E_\mu\left( \phi(\xi_x(t)) \right)$$


The following criterion for absolute continuity of
measures in finite dimensions holds:


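Lemma If $\gamma$ is a probability measure on $\mathbb{R}^d$ and,
for every $f \in C_b^\infty$,


$$\left| \int \partial_i f \, d\gamma \right| \le c_i \|f\|_\infty$$


where $c_i$, $i = 1, \ldots, d$, are constants, then $\gamma$ is absolutely
continuous with respect to Lebesgue measure.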


Now one can think about Wiener measure as an
infinite (actually continuous) product of finite-dimensional
Gaussian measures. Considering the
toy model of the above-mentioned situation in
one dimension, we replace Wiener measure by
$\gamma(dx) = (1/\sqrt{2\pi})\, e^{-x^2/2}\, dx$ and look at the process at
a fixed time as a function g on $\mathbb{R}$. In order to apply
the lemma and study the law of g, one would write


$$\int (f' \circ g)\, d\gamma = \int \frac{(f \circ g)'}{g'}\, d\gamma$$


and then integrate by parts to obtain $\int (f \circ g)\, \rho\, d\gamma$. A
simple computation shows that $\rho(x) = (g'' + x g')/(g')^2$
and, in particular, that the nondegeneracy of the
derivative of g plays a role in the existence of the
density.
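The one-dimensional integration by parts above can be checked numerically. The sketch below assumes a hypothetical nondegenerate map g(x) = x + 0.5 sin x and the test function f = sin (neither comes from the article), and compares both sides of the identity $\int (f' \circ g)\, d\gamma = \int (f \circ g)\, \rho\, d\gamma$ with $\rho = (g'' + x g')/(g')^2$, using a plain Riemann sum on a truncated grid.

```python
import numpy as np

# Hypothetical choices, made only for this check.
g = lambda x: x + 0.5 * np.sin(x)
dg = lambda x: 1.0 + 0.5 * np.cos(x)   # g' >= 0.5, so g is nondegenerate
d2g = lambda x: -0.5 * np.sin(x)
f, df = np.sin, np.cos

# Uniform grid; the Gaussian tails beyond |x| = 10 are negligible.
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
gauss = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def integrate(values):
    # Riemann-sum approximation of the integral against the Gaussian measure.
    return np.sum(values * gauss) * dx

rho = (d2g(x) + x * dg(x)) / dg(x) ** 2
lhs = integrate(df(g(x)))          # integral of f'(g) d(gamma)
rhs = integrate(f(g(x)) * rho)     # integral of (f o g) * rho d(gamma)
```

The weight $\rho$ involves a division by $(g')^2$, which is where the nondegeneracy of the derivative of g enters.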

To work with functionals on the Wiener space,
one needs an infinite-dimensional calculus. Of
course, other (Gâteaux, Fréchet) calculi on infinite-dimensional
settings are already available but the
typical functionals we are dealing with, solutions of 
SDEs, are not continuous with respect to the 
underlying topology, nor even defined at every 
point, but only almost everywhere. Malliavin calcu- 
lus, as a Sobolev differential calculus, requires very 
little regularity, given that there is no Sobolev 
imbedding theory in infinite dimensions. 




Differential Calculus on the 
Wiener Space 


We restrict ourselves to the classical Wiener space, 
although the theory may be developed in abstract 
Wiener spaces, in the sense of Gross. For a 
description of this theory as well as of Segal’s 
model developed in the 1950s for the needs of 
quantum field theory, the reader is referred to 
Malliavin (1997). 

Let H be the Cameron-Martin space,
$H = \{ h : [0,1] \to \mathbb{R}^d$ such that $\dot h$ is square integrable
and $h(t) = \int_0^t \dot h(\tau)\, d\tau \}$, which is a separable Hilbert
space with scalar product $\langle h_1, h_2 \rangle = \int_0^1 \dot h_1(\tau) \cdot \dot h_2(\tau)\, d\tau$.
The classical Wiener measure will be
denoted by $\mu$; it is realized on the Banach space X
of continuous paths on the time interval [0,1]
starting from zero at time zero, a space where H is
densely imbedded. In finite dimensions, Lebesgue
measure can be characterized by its invariance under
the group of translations. In infinite dimensions
there is no Lebesgue measure and this invariance
must be replaced by quasi-invariance for translations
of Wiener measures (Cameron-Martin admissible
shifts). We recall that, if $h \in H$, the Cameron-Martin
theorem states that


$$E_\mu\left( F(\omega + h) \right) = E_\mu\left( F(\omega) \exp\left( \int_0^1 \dot h(\tau)\, d\omega(\tau) - \frac{1}{2} \int_0^1 |\dot h(\tau)|^2\, d\tau \right) \right)$$


where $d\omega$ denotes Itô integration.
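A one-dimensional analog of this quasi-invariance can be verified directly. In the sketch below the Wiener measure is replaced by the standard Gaussian law on the real line, the shift h by a number a, and the density by exp(a x - a^2 / 2); the functional F = cos and the value of a are hypothetical choices, for which the exact value cos(a) exp(-1/2) is known.

```python
import numpy as np

# Uniform grid; the Gaussian tails beyond |x| = 12 are negligible.
x = np.linspace(-12.0, 12.0, 4801)
dx = x[1] - x[0]
density = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def expect(values):
    # Riemann-sum approximation of E[values(X)] for X ~ N(0, 1).
    return np.sum(values * density) * dx

F = np.cos   # hypothetical test functional
a = 0.7      # hypothetical shift

shifted = expect(F(x + a))                               # E[F(X + a)]
reweighted = expect(F(x) * np.exp(a * x - 0.5 * a**2))   # E[F(X) exp(aX - a^2/2)]
exact = np.cos(a) * np.exp(-0.5)                         # closed form for F = cos
```

The three numbers agree to quadrature accuracy: shifting the variable is equivalent to reweighting the unshifted law by the exponential density.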

For a cylindrical “test” functional
$F(\omega) = f(\omega(\tau_1), \ldots, \omega(\tau_m))$, where $f \in C_b^\infty(\mathbb{R}^m)$ and
$0 \le \tau_1 \le \cdots \le \tau_m \le 1$, the derivative operator is
defined by


$$D_\tau F(\omega) = \sum_{k=1}^m 1_{\tau < \tau_k}\, \partial_k f(\omega(\tau_1), \ldots, \omega(\tau_m)) \qquad [3]$$


This operator is closed in $W^2_1(X; \mathbb{R})$, the completion
of the space of cylindrical functionals with
respect to the Sobolev norm


$$\|F\|^2_{W^2_1} = E_\mu |F|^2 + E_\mu \int_0^1 |D_\tau F|^2\, d\tau$$


Define F to be H-differentiable at $\omega \in X$ when there
exists a linear operator $\nabla F(\omega)$ such that, for all $h \in H$,


$$F(\omega + h) - F(\omega) = \langle \nabla F(\omega), h \rangle + o(\|h\|_H) \quad \text{as } \|h\|_H \to 0$$


Then $D_\tau$ disintegrates the derivative in the sense that


$$D_h F(\omega) \equiv \langle \nabla F(\omega), h \rangle = \int_0^1 D_\tau F(\omega) \cdot \dot h(\tau)\, d\tau \qquad [4]$$


Higher (r)-order derivatives, as r-linear functionals,
can be considered as well in suitable Sobolev spaces.

Denote by $\delta$ the $L^2_\mu$ adjoint of the operator $\nabla$, that
is, for a process $u : X \to H$ in the domain of $\delta$, the
divergence $\delta(u)$ is characterized by


$$E_\mu\left( F\, \delta(u) \right) = E_\mu\left( \int_0^1 D_\tau F \cdot \dot u(\tau)\, d\tau \right) \qquad [5]$$


For an elementary process u of the form
$u(\tau) = \sum_j F_j\, (\tau \wedge \tau_j)$, where the $F_j$ are smooth random
variables and the sum is finite, the divergence is


$$\delta(u) = \sum_j F_j\, \omega(\tau_j) - \sum_j \int_0^{\tau_j} D_\tau F_j\, d\tau$$


The characterization of the domain of $\delta$ is delicate,
since both terms in this last expression are not
independently closable. It can be shown that
$W^2_1(X; H)$ is in the domain of $\delta$ and that the
following “energy” identity holds:


$$E_\mu (\delta(u))^2 = E_\mu \|u\|^2_H + E_\mu \int_0^1 \int_0^1 D_\tau \dot u_\sigma \cdot D_\sigma \dot u_\tau\, d\sigma\, d\tau$$


Notice that when u is adapted to $P_\tau$, the
Cameron-Martin-Girsanov theorem implies that the divergence
coincides with the Itô stochastic integral $\int_0^1 \dot u(\tau)\, d\omega(\tau)$
and, in this adapted case, the last term of the energy
identity vanishes. We recover the well-known Itô
isometry which is at the foundation of the construction
of this integral. When the process is not adapted, the
divergence turns out to coincide with a generalization
of the Itô integral, first defined by Skorohod.

The relation [5] is an integration-by-parts formula
with respect to the Wiener measure $\mu$, one of the
basic ingredients of Malliavin calculus. This formula
is easily generalized when the base measure is
absolutely continuous with respect to $\mu$.

Considering all functionals of the form
$P(\omega) = Q(\omega(\tau_1), \ldots, \omega(\tau_n))$ with Q a polynomial on
$\mathbb{R}^n$, the Wiener chaos of order n, $C_n$, is defined as
$C_n = P_n \ominus P_{n-1}$, where $P_n$ denotes the polynomials
on X of degree $\le n$. The Wiener-chaos decomposition
$L^2_\mu(X) = \bigoplus_n C_n$ holds. Denoting by $\Pi_n$ the orthogonal
projection onto the chaos of order n, we have


$$\langle \nabla(\Pi_{n+1} F), h \rangle = \Pi_n\left( \langle \nabla F, h \rangle \right)$$


The derivative $D_h$ corresponds to the annihilation
operator $A(h)$ and the divergence $\delta(u)$ to the creation
operator $A^+(u)$ on bosonic Fock spaces.

An important result, known as the Clark-Bismut-Ocone
formula, states that any functional
$F \in W^2_1(X; \mathbb{R})$ can be represented as


$$F = E_\mu(F) + \int_0^1 E_\tau(D_\tau F)\, d\omega(\tau)$$


where $E_\tau$ denotes the conditional expectation with
respect to the events prior to time $\tau$ (or, for short,
the past $P_\tau$ of $\tau$).

The Ornstein-Uhlenbeck generator (or minus
number operator) is defined by $LF = -\delta \nabla F$. On
cylindrical functionals $F(\omega) = f(\omega(\tau_1), \ldots, \omega(\tau_m))$, it
has the form


$$LF(\omega) = \sum_{i,j} (\tau_i \wedge \tau_j)\, \partial^2_{ij} f(\omega(\tau_1), \ldots, \omega(\tau_m)) - \sum_i \omega(\tau_i) \cdot \partial_i f(\omega(\tau_1), \ldots, \omega(\tau_m))$$


where i, j denote multi-dimensional (d) indexes.

As a multiplicative operator on the Wiener-chaos
decomposition, $LF = -\sum_n n\, \Pi_n F$. It is the generator
of a positive $\mu$-self-adjoint semigroup, the Ornstein-Uhlenbeck
semigroup, formally given by
$T_t F = \sum_n e^{-nt}\, \Pi_n F$. Another familiar representation
of this semigroup is the Mehler formula,


$$T_t F(\omega) = \int_X F\left( e^{-t} \omega + \sqrt{1 - e^{-2t}}\; y \right) d\mu(y)$$


Considering the map $X \to \mathbb{R}^m$, $\omega \mapsto (\omega(\tau_1), \ldots, \omega(\tau_m))$,
the image of this operator is the Ornstein-Uhlenbeck
generator (corresponding to the Langevin
equation) on $\mathbb{R}^m$ with Euclidean metric defined by
the matrix $\tau_i \wedge \tau_j$.
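The agreement between the chaos expansion $T_t F = \sum_n e^{-nt} \Pi_n F$ and the Mehler formula can be checked in a one-dimensional analog, where the standard Gaussian law replaces $\mu$ and the probabilists' Hermite polynomial $He_n$ plays the role of an element of the chaos of order n. The sketch below is only this finite-dimensional illustration; the helper names `ou_semigroup` and `hermite` are hypothetical, while `hermegauss` and `hermeval` are NumPy routines.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

# Gauss-Hermite rule for the weight exp(-y^2 / 2), normalised so that the
# weights sum to one (expectation under the standard Gaussian law).
nodes, weights = hermegauss(40)
weights = weights / np.sqrt(2.0 * np.pi)

def ou_semigroup(F, t, x):
    """Mehler formula in one dimension: E[F(e^{-t} x + sqrt(1 - e^{-2t}) Y)]."""
    scale = np.sqrt(1.0 - np.exp(-2.0 * t))
    return np.sum(weights * F(np.exp(-t) * x + scale * nodes))

def hermite(n):
    # Probabilists' Hermite polynomial He_n as a callable.
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return lambda y: hermeval(y, coeffs)

t, x, n = 0.3, 1.2, 3
lhs = ou_semigroup(hermite(n), t, x)       # Mehler formula applied to He_n
rhs = np.exp(-n * t) * hermite(n)(x)       # eigenvalue e^{-nt} times He_n(x)
```

Since the integrand is a polynomial of degree 3, the quadrature is exact up to rounding, and the two values coincide: $He_n$ is an eigenfunction of the semigroup with eigenvalue $e^{-nt}$.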

The fundamental theorem concerning existence of 
the density laws of Wiener functionals is the following: 


Theorem Let $F$ be an $\mathbb{R}^d$-valued Wiener functional 
such that $F^i$ and $LF^i$ belong to D for every 
$i=1,\ldots,d$. If the covariance matrix 

$$\big( (\nabla F^i, \nabla F^j) \big)_{i,j=1,\ldots,d}$$

is almost surely invertible, then the law of $F$ is 
absolutely continuous with respect to the Lebesgue 
measure on $\mathbb{R}^d$. 


Under more regularity assumptions, smoothness 
of the density is also derived. On the other hand, the 
integrability assumptions on $L$ can be replaced by 
integrability of the second derivatives, due to Krée-Meyer 
inequalities on the Wiener space. 

We remark that, although equivalent, the initial 
formulation (Malliavin 1978) of Malliavin calculus 
was different, relying on the construction of the 
two-parameter process associated to $L$ and on its 
properties. In the early 1980s, the theory was 
elaborated, the main applications being the study 
of heat kernels (cf., e.g., Stroock (1981), Ikeda and 
Watanabe (1989), and Bismut (1984)). Starting from 
an SDE [2], it is possible to apply these techniques to 
obtain existence and smoothness of the transition 
probability function $p(t,x,y)$ if the vector fields 
$Z_i=\sum_j \sigma_i^j\,(\partial/\partial x_j)$ together with their Lie brackets 
generate the tangent space for “sufficiently many” 
(in terms of probability) paths. These results shed 
new light on Hörmander's theorem for partial 
differential equations. 


Quasi-Sure Analysis 


Quasi-sure analysis is a refinement of classical 
probability theory and, generally speaking, replaces 
the fact that, due to Sobolev imbedding theorems, 
functions in finite dimensions belonging to Sobolev 
classes are in fact smooth. We work in classical 
probability up to sets of probability zero; in quasi- 
sure analysis negligible sets are smaller and are those 
of capacity zero. This is the class of sets which are 
not charged by any measure of finite energy. 

Under a nondegenerate map, Wiener measure and 
more general Gaussian measures may be disinte- 
grated through a co-area formula. This principle, 
developed by Malliavin and co-authors (cf. 
Malliavin (1997) and references therein), implies 
that a property which is true quasi-surely will also 
hold true almost surely under conditioning by such 
a map. One can use this principle to study 
finer properties of SDEs. It was also used in 
M P Malliavin and P Malliavin (1990) to transfer 
properties from path to loop groups (see Measure on 
Loop Spaces). A pinned Brownian motion, for 
example, is well defined in quasi-sure analysis. It is 
possible to treat anticipative problems using quasi- 
sure analysis by solving the adapted problem after 
restriction of the solution to the finite-codimensional 
manifold which describes the anticipativity. These 
methods have also been applied to the computation 
of Lyapunov exponents of stochastic dynamical 
systems (Imkeller 1998). With a geometry of finite- 
codimensional manifolds of Wiener spaces well 
established, it is reasonable to think about applica- 
tions to cases where such submanifolds correspond to 
level surfaces of invariant quantities for infinite- 
dimensional dynamical systems (cf. Cipriano (1999) 
for an example of such a situation in hydrodynamics). 

The $(p,r)$-capacity of an open subset $O$ of the 
Wiener space is defined by 

$$\mathrm{cap}_{p,r}(O) = \inf\{\|\varphi\|_{W_{r,p}} : \varphi \ge 0,\ \varphi \ge 1\ \mu\text{-a.s. on } O\}$$

and, for a general set $B$, $\mathrm{cap}_{p,r}(B)= \inf\{\mathrm{cap}_{p,r}(O): B \subset O,\ O \text{ open}\}$. 
A set is said to be slim if all its 
$(p,r)$-capacities are zero. For $\Phi \in W^\infty$, the space of 
functionals with every Malliavin derivative belonging 
to all $L^p$, there exists a redefinition of $\Phi$, 
denoted by $\Phi^*$, which is smooth and defined on the 
complement of a slim set. 

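Following Airault and Malliavin (1988), let 
$\Phi \in W^\infty(X; \mathbb{R}^d)$ be of maximal rank and nondegenerate 
in the sense that the inverse of 

$$(\det\Phi)(\omega) = \det\big((\nabla\Phi^i(\omega), \nabla\Phi^j(\omega))\big)$$

belongs to $W^\infty$. Then for every functional $G \in W^\infty$, 
the measures $\mu\circ\Phi^{-1}$ and $(G\mu)\circ\Phi^{-1}$ are absolutely 
continuous with respect to Lebesgue measure on $\mathbb{R}^d$ 
and have $C^\infty$ Radon-Nikodym derivatives. If 

$$\rho(\lambda) = \frac{d\,\mu\circ\Phi^{-1}}{d\lambda} \quad\text{and}\quad \rho_G(\lambda) = \frac{d\,(G\mu)\circ\Phi^{-1}}{d\lambda}$$

the function $\lambda \to \rho_G(\lambda)/\rho(\lambda)$ will be smooth in the 
open set $O=\{\lambda: \rho(\lambda) > 0\}$. 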

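For every $\lambda \in O$, it is possible to define (up to slim 
sets) a submanifold of the Wiener space of codimension 
$d$, $S_\lambda=(\Phi^*)^{-1}(\lambda)$, as well as a measure $\mu_S$ 
satisfying 

$$\int_{S_\lambda} G^*(\omega)\, d\mu_S(\omega) = E^{\Phi=\lambda}(G) = \frac{\rho_G(\lambda)}{\rho(\lambda)}$$

for every $G \in W^\infty$. This measure does not charge 
slim sets. 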
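The area measure $N$ on the submanifold $S_\lambda$ is 
defined by 

$$\int F^*\, dN = \rho(\lambda) \int F^*(\omega)\, \det\big((\nabla\Phi^i(\omega), \nabla\Phi^j(\omega))\big)^{1/2}\, d\mu_S(\omega)$$

The following co-area formula on the Wiener 
space 

$$\int f(\Phi(\omega))\, F(\omega)\, (\det\Phi)^{1/2}(\omega)\, d\mu(\omega) = \int_{\mathbb{R}^d} f(\lambda) \int_{S_\lambda} F^*(\omega)\, dN(\omega)\, d\lambda$$

was proved in Airault and Malliavin (1988). 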


Calculus of Variations in a 
Non-Euclidean Setting 


Let $M$ be a $d$-dimensional compact Riemannian 
manifold with metric $ds^2= \sum_{i,j} g_{i,j}\, dm^i\, dm^j$. The 
Laplace-Beltrami operator is expressed in the local 
chart by 

$$\Delta_M f = g^{i,j}\Big(\frac{\partial^2 f}{\partial m^i \partial m^j} - \Gamma^k_{i,j}\frac{\partial f}{\partial m^k}\Big)$$

where $\Gamma^k_{i,j}$ are the Christoffel symbols associated 
with the Levi-Civita connection. The corresponding 
Brownian motion $p$ is locally expressed as a 
solution of the SDE: 

$$dp^i(t) = a^{i,j}(p(t))\, dW_j(t) - \tfrac{1}{2}\, g^{j,k}\,\Gamma^i_{j,k}(p(t))\, dt$$

with $p(0)=m_0 \in M$ and where $a=\sqrt{g}$. Its law on 
the space of paths $P(M)=\{p:[0,1] \to M,\ p \text{ continuous},\ p(0) = m_0\}$ 
will be denoted by $\nu$. 

How can we develop differential calculus and 
geometry on the space P(M)? An infinite-dimensional 
local chart approach is delicate, due to the difficulty 
of finding an atlas in which the changes of charts 
preserve the measures. A possibility, developed in 
Cruzeiro and Malliavin (1996), consists in replacing 
the local chart approach by the Cartan-like methodology 
of moving frames. The canonical moving 
frame in this framework is provided by Itô stochastic 
parallel transport. Nevertheless, a new difficulty 
arises: the parallel transport will not be differentiable 
in the Cameron-Martin sense described before. 

Recall that a frame above $m$ is a Euclidean 
isometry $r:\mathbb{R}^d \to T_m(M)$ onto the tangent space. 
$O(M)$ denotes the collection of all frames above $M$ 
and $\pi(r) = m$ the canonical projection. $O(M)$ can be 
viewed as a parallelized manifold, for there exist 
canonical differential forms $(\theta, \omega)$ realizing for every $r$ 
an isomorphism between $T_r(O(M))$ and $\mathbb{R}^d \times so(d)$. 

If $A_\alpha$, $\alpha=1,\ldots,d$, denote the horizontal vector 
fields, which are defined by $\langle\theta, A_\alpha\rangle=\varepsilon_\alpha$, 
$\langle\omega, A_\alpha\rangle=0$, where $\varepsilon_\alpha$ are the vectors of the canonical 
basis of $\mathbb{R}^d$, then the horizontal Laplacian in $O(M)$ 
is the operator 

$$\Delta_{O(M)} = \sum_{\alpha=1}^{d} A_\alpha^2$$

and we have $\Delta_{O(M)}(f\circ\pi)=(\Delta_M f)\circ\pi$. With the 
Laplacians on $M$ and on $O(M)$ inducing two 
probability measures, the canonical projection realizes 
an isomorphism between the corresponding 
probability spaces. 

The Stratonovich SDE 

$$dr_\omega = \sum_{\alpha=1}^{d} A_\alpha(r_\omega)\circ d\omega^\alpha, \qquad r_\omega(0)=r_0$$

with $\pi(r_0) = m_0$ defines the lifting to $O(M)$ of the Itô 
parallel transport along the Brownian curve and we 
write $t^p_{\tau\leftarrow 0}\circ r_0 = r_\omega(\tau)$. The Itô map was defined by 
Malliavin as the map $I: X \to P(M)$ given by 

$$I(\omega)(\tau) = \pi(r_\omega(\tau))$$

This map is a.s. bijective and we have $\nu = \mu\circ I^{-1}$; 
therefore, it provides an isomorphism of measures 
from the curved path space to the “flat” Wiener 
space. 

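For a cylindrical functional $F=f(p(\tau_1),\ldots,p(\tau_m))$ 
on $P(M)$, the derivatives are defined by 

$$D_{\tau,\alpha}F(p) = \sum_{k=1}^{m} 1_{\tau<\tau_k}\big(t^p_{0\leftarrow\tau_k}(\partial_k F)\,\big|\,\varepsilon_\alpha\big)$$

The derivative operator is closable in a suitable 
Sobolev space. 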

It would be reasonable to think that the differentiable 
structure considered in the Wiener space 
would be conserved through the isomorphism $I$ and 
that the tangent space of $P(M)$ would consist of 
transported vectors from the tangent space to $X$, 
namely Cameron-Martin vectors. Let us take a map 
$Z_p(\tau) \in T_{p(\tau)}(M)$ such that $z(\tau) = t^p_{0\leftarrow\tau}(Z_p(\tau))$ belongs 
to the Cameron-Martin space $H$. 




In order to transfer derivatives to the Wiener 
space, we need to differentiate the Itô map. We have 
(Cruzeiro and Malliavin (1996)): 


Theorem The Jacobian matrix of the flow $r_0 \to r_\omega(\tau)$ 
is given by the linear map $J_{\omega,\tau} = (J^1_{\omega,\tau}, J^2_{\omega,\tau}) \in GL(\mathbb{R}^d \times so(d))$ 
defined by the system of Stratonovich SDEs 


dor = 3 (JL) odwa(r) 


d 
die = D Qira odwal) 


where Q denotes the curvature tensor of the under- 
lying manifold read on the frame bundle. 


From this result we can deduce the behavior of the 
derivatives transferred to the Wiener space, a result 
whose origin is due to B Driver. We have, for a 
“vector field” $Z_p(\tau)$ on $P(M)$ as above, 

$$(D_Z F)\circ I = D_\xi(F\circ I)$$

with $\xi$ solving 

$$d\xi(\tau) = \dot z(\tau)\, d\tau + \rho\circ d\omega(\tau), \qquad d\rho(\tau) = \Omega(\circ\, d\omega(\tau), z(\tau))$$


The process $\xi$ is no longer Cameron-Martin space 
valued. Nevertheless, it satisfies an SDE with an 
antisymmetric diffusion coefficient (given by the 
curvature) and therefore, by Lévy's theorem, it still 
corresponds to a transformation of the Wiener space 
that leaves the measure quasi-invariant. We extend, 
accordingly, the notion of tangent space in the 
Wiener space to include processes of the form 
$d\xi^\alpha = a^\alpha_\beta\, d\omega^\beta + c^\alpha\, d\tau$, with $a^\alpha_\beta + a^\beta_\alpha = 0$. These 
were called “tangent processes” in Cruzeiro and 
Malliavin (1996). 

Another important consequence of the last theorem 
is the integration-by-parts formula in the curved 
setting, initially proved by Bismut (1984): 

$$E_\nu(D_Z F) = E_\mu\Big((F\circ I)\int_0^1 \big[\dot z + \tfrac{1}{2}\,\mathrm{Ricci}(z)\big]\, d\omega(\tau)\Big)$$

where Ricci is the Ricci tensor of $M$ read on the 
frame bundle. 


Some Applications 


We already mentioned that Malliavin calculus has 
been applied to various domains connected with 
physics. We shall describe here some of its relations 
with elementary quantum mechanics. 
Feynman gave a path space formulation of 
quantum theory whose fundamental tool is the 
concept of transition element of a functional $F(\omega)$ 
between any two $L^2$-states $\psi_s$ and $\phi_u$, for paths $\omega$ 
defined on a time interval $[s, u]$: 

$$\langle F\rangle_S = \langle\phi|F|\psi\rangle_S = \iiint \psi_s(x)\,\exp\Big(\frac{i}{\hbar}\,S_L(\omega;u-s)\Big)\,F(\omega)\,\bar\phi_u(z)\,\mathcal{D}\omega\, dx\, dz \qquad [6]$$


This is a shorthand for the time discretization 
version along broken paths $\omega$ interpolating 
linearly between points $x_j=\omega(t_j)$, $t_j = j(u-s)/N$, 
$j=0, 1,\ldots, N$. In [6] $\hbar$ is Planck's constant and 
$S=S_L$ denotes the action functional with Lagrangian 
$L$ of the underlying classical system. For a 
particle with mass $m$ in a scalar potential $V$ on the 
real line, 

$$S_L(\omega;u-s) = \int_s^u \Big(\frac{m}{2}\,|\dot\omega(\tau)|^2 - V(\omega(\tau))\Big)\, d\tau \qquad [7]$$


The “$\mathcal{D}\omega$” of [6] is used as a Lebesgue measure, 
although there is no such thing in infinite dimensions. 
More generally, the construction of measures 
or integrals on the various path spaces required for 
general quantum systems is still a field of active 
investigation. 

When $F=1$ and $\bar\phi_u$ (the complex conjugate of $\phi_u$) 
reduces to a Dirac mass at $z$, [6] is the path-integral 
representation of the solution $\psi(x,u)$ of the initial-value 
problem in $L^2$: 

$$i\hbar\,\frac{\partial\psi}{\partial t} = H\psi, \qquad \psi(x,s) = \psi_s(x) \qquad [8]$$

where $H = -(\hbar^2/2m)\Delta + V$ and when $S_L$ is as in [7]. 


Feynman's framework is time symmetric on $[s,u]$: when 
$\psi_s=\delta_x$ (still for $F=1$), [6] provides a path-integral 
representation of the solution of the final-value 
problem for $\bar\phi(z,s)$. 

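According to Feynman, “it would be possible to 
use the integration-by-parts formula 

$$\Big\langle\frac{\delta F}{\delta\omega(s)}\Big\rangle_S = -\frac{i}{\hbar}\Big\langle F\,\frac{\delta S}{\delta\omega(s)}\Big\rangle_S \qquad [9]$$

as a starting point to define the laws of quantum 
mechanics” (Feynman and Hibbs 1965, p. 173). The 
functional derivative corresponds to variations of 
the underlying paths in directions $\delta\omega$ and 

$$\delta F = \int \frac{\delta F}{\delta\omega(s)}\,\delta\omega(s)\, ds$$

to an $L^2$ analog of [4]. 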


Its first consequence, when $F=1$, is the path 
space counterpart of Newton's law, in the elementary 
case [7], 

$$\langle m\ddot\omega\rangle_S = -\langle\nabla V(\omega)\rangle_S \qquad [10]$$


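where the left-hand side involves a time discretization 
of the second derivative. When $F(\omega)= \omega(t)$, 
Feynman obtains the path space version of 
Heisenberg commutation relation between position 
and momentum observables: 

$$\Big\langle \omega(t)\,\frac{\omega(t)-\omega(t-\epsilon)}{\epsilon}\Big\rangle_S - \Big\langle \frac{\omega(t+\epsilon)-\omega(t)}{\epsilon}\,\omega(t)\Big\rangle_S = \frac{i\hbar}{m} \qquad [11]$$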

and from this the crucial fact that “quantum 
mechanical paths are very irregular. However, these 
irregularities average out over a reasonable length of 
time to produce a reasonable drift or average 
velocity” (Feynman and Hibbs 1965, p. 177). 

A probabilistic interpretation (cf. Cruzeiro 
and Zambrini (1991)) of Feynman's calculus uses 
(Bernstein) diffusion processes solving the SDE 

$$dz(t) = \Big(\frac{\hbar}{m}\Big)^{1/2} dW(t) + \frac{\hbar}{m}\,\nabla\log\eta(z(t),t)\, dt \qquad [12]$$

where the drift stems from a positive solution of the 
Euclidean version of the above final-value problem 
for $\bar\phi$, 

$$\hbar\,\frac{\partial\eta}{\partial t} = H\eta, \qquad \eta(x,u) = \eta_u(x) \qquad [13]$$

For any regular function $f$, we can make sense of 
the “continuous limit” 

$$Df(z(t),t) = \lim_{\epsilon\downarrow 0}\frac{1}{\epsilon}\,E_t\big[f(z(t+\epsilon),t+\epsilon) - f(z(t),t)\big] \qquad [14]$$

where $E_t$ denotes conditional expectation with 
respect to the past $P_t$ and check, indeed, that 

$$Dz(t) = \frac{\hbar}{m}\,\nabla\log\eta(z(t),t)$$

is Feynman's “reasonable drift.” Using the Feynman-Kac 
formula, one shows that the diffusions [12] 
have laws which are absolutely continuous with 
respect to the Wiener measure of parameter $\hbar/m$, 
with Radon-Nikodym density given by 

$$\frac{\eta(\omega(u),u)}{\eta(\omega(s),s)}\,\exp\Big(-\frac{1}{\hbar}\int_s^u V(\omega(\tau))\, d\tau\Big)$$


We can, therefore, use Malliavin calculus on the 
path space of these diffusions and the associated 
integration-by-parts formula to make sense of [9] 
and all its consequences. 

The probabilistic counterpart of the time symmetry 
of Feynman's framework is interesting: Heisenberg's 
original argument to deny the existence of 
quantum trajectories (1927) was that any position 
can be associated with two velocities. Feynman's 
interpretation [11] and the definition [14] suggest 
that this has to do with a past or future conditioning 
at time $t$. Indeed, there is another description of 
diffusions $z(t)$ with respect to a family of future 
$\sigma$-fields, using the Euclidean version of the initial-value 
problem for $\psi$, underlying [6]. Another drift 
built on the model of the drift in [12] results, and 
Feynman's commutation relation [11] becomes 
rigorous (without, of course, the factor $i$). 
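
The mechanism behind the real-valued commutation relation can be seen in the simplest case. The following worked computation is a sketch under an assumption that is not in the text above: it uses the Wiener process itself, with covariance $E[\omega(a)\omega(b)] = (\hbar/m)\min(a,b)$, in place of the Bernstein diffusions [12], and plain expectations in place of transition elements.

```latex
\begin{aligned}
E\Big[\omega(t)\,\frac{\omega(t)-\omega(t-\epsilon)}{\epsilon}\Big]
  &= \frac{\hbar}{m}\,\frac{t-(t-\epsilon)}{\epsilon} = \frac{\hbar}{m},\\
E\Big[\frac{\omega(t+\epsilon)-\omega(t)}{\epsilon}\,\omega(t)\Big]
  &= \frac{\hbar}{m}\,\frac{t-t}{\epsilon} = 0,\\
E\Big[\omega(t)\,\frac{\omega(t)-\omega(t-\epsilon)}{\epsilon}\Big]
  - E\Big[\frac{\omega(t+\epsilon)-\omega(t)}{\epsilon}\,\omega(t)\Big]
  &= \frac{\hbar}{m}.
\end{aligned}
```

The backward increment is correlated with the present position while the forward increment is not, which is the asymmetry between past and future conditioning described above, and the result has the form of [11] without the factor $i$.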

We refer to Cruzeiro and Zambrini (1991) for a 
development of this approach using Malliavin 
calculus. 


See also: Euclidean Field Theory; Functional Integration 
in Quantum Physics; Measure on Loop Spaces; 
Stochastic Differential Equations; Stochastic 
Hydrodynamics. 
Introduction 


Characteristic classes play an essential role in the 
study of global properties of vector bundles. 
Particularly important is the Euler class of real 
orientable vector bundles. A de Rham representative 
of the Euler class (for tangent bundles) first 
appeared in Chern’s generalization of the Gauss- 
Bonnet theorem to higher dimensions. The repre- 
sentative is the Pfaffian of the curvature, whose 
cohomology class does not depend on the choice of 
connections. The Euler class of a vector bundle is 
also the obstruction to the existence of a nowhere- 
vanishing section. In fact, it is the Poincaré dual of 
the zero set of any section which intersects the zero 
section transversely. In the case of tangent bundles, 
it counts (algebraically) the zeros of a vector field on 
the manifold. That this is equal to the Euler 
characteristic number is known as the Hopf theo- 
rem. Also significant is the Thom class of a vector 
bundle: it is the Poincaré dual of the zero section in 
the total space. It induces, by a cup product, the 
Thom isomorphism between the cohomology of the 
base space and that of the total space with compact 
vertical support. Thom isomorphism also exists and 
plays an important role in K-theory. 

Mathai and Quillen (1986) obtained a represen- 
tative of the Thom class by a differential form on 
the total space of a vector bundle. Instead of 
having a compact support, the form has a nice 
Gaussian peak near the zero section and decays 
exponentially along the fiber directions. The pullback 
of Mathai-Quillen's Thom form by any 
section is a representative of the Euler class. By 
scaling the section, one obtains an interpolation 
between the Pfaffian of the curvature, which 
distributes smoothly on the manifold, and the 
Poincaré dual of the zero set, which localizes on 
the latter. This elegant construction proves to be 
extremely useful in many situations, from the 
study of Morse theory, analytic torsion in mathe- 
matics to the understanding of topological (coho- 
mological) field theories in physics. 

In this article, we begin with the construction of 
Mathai—Quillen’s Thom form. We also consider the 
case with group actions, with a review of equivariant 
cohomology and then Mathai-Quillen's construction 
in this setting. Next, we show that much of 
the above can be formulated as a “field theory” on a 
superspace of one fermionic dimension. Finally, 
we present the interpretation of topological field 
theories using the Mathai-Quillen formalism. 


Mathai-Quillen’s Construction 
Berezin Integral and Supertrace 


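Let $V$ be an oriented real vector space of dimension $n$ 
with a volume element $\nu \in \Lambda^n V$ compatible with 
the orientation. The “Berezin integral” of a form 
$\omega \in \Lambda^* V^*$ on $V$, denoted by $\int^B \omega$, is the pairing $(\nu, \omega)$. 
Clearly, only the top degree component of $\omega$ 
contributes. For example, if $\omega \in \Lambda^2 V^*$ is a 2-form, then 

$$\int^B \exp(\omega) = \begin{cases} \dfrac{(\nu, \omega^{n/2})}{(n/2)!}, & \text{if } n \text{ is even} \\ 0, & \text{if } n \text{ is odd} \end{cases}$$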


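If $V$ has a Euclidean metric $(\cdot,\cdot)$, then $\nu$ is chosen to 
be of unit norm. If $\Omega \in \mathrm{End}(V)$ is skew-symmetric, 
then $\tfrac{1}{2}(\cdot,\Omega\,\cdot)$ is a 2-form and, if $n$ is even, the 
Pfaffian of $\Omega$ is 

$$\mathrm{Pf}(\Omega) = \int^B \exp\Big(\frac{1}{2}(\cdot,\Omega\,\cdot)\Big)$$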
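
The identification of the Pfaffian with a Berezin integral can be checked by brute force in small dimension. The sketch below is an illustration only: it assumes an orthonormal basis, so that the 2-form $\tfrac{1}{2}(\cdot,\Omega\,\cdot)$ has components $\sum_{i<j} A_{ij}\, e^i\wedge e^j$ for a skew-symmetric matrix $A$, and it stores forms as dictionaries from sorted index tuples to coefficients. The names `wedge`, `berezin_exp`, `two_form_from_matrix` and `pfaffian` are hypothetical helpers, not notation from the text.

```python
def wedge(a, b):
    """Wedge product of two forms stored as {sorted index tuple: coefficient}."""
    out = {}
    for ia, ca in a.items():
        for ib, cb in b.items():
            if set(ia) & set(ib):
                continue  # a repeated covector makes the product vanish
            merged = ia + ib
            # Sign = parity of the permutation that sorts the merged indices.
            inversions = sum(
                1
                for p in range(len(merged))
                for q in range(p + 1, len(merged))
                if merged[p] > merged[q]
            )
            key = tuple(sorted(merged))
            out[key] = out.get(key, 0) + (-1) ** inversions * ca * cb
    return out

def berezin_exp(two_form, n):
    """Berezin integral of exp(two_form): top coefficient of two_form^(n/2) / (n/2)!."""
    if n % 2:
        return 0  # no top-degree component when n is odd
    power = {(): 1}
    factorial = 1
    for k in range(1, n // 2 + 1):
        power = wedge(power, two_form)
        factorial *= k
    return power.get(tuple(range(n)), 0) / factorial

def two_form_from_matrix(A):
    """Components of (1/2)(., A .) in an orthonormal basis: sum_{i<j} A[i][j] e^i ^ e^j."""
    n = len(A)
    return {(i, j): A[i][j] for i in range(n) for j in range(i + 1, n)}

def pfaffian(A):
    """Pfaffian of a skew-symmetric matrix by expansion along the first row."""
    n = len(A)
    if n == 0:
        return 1
    if n % 2:
        return 0
    total = 0
    for j in range(1, n):
        keep = [k for k in range(n) if k not in (0, j)]
        minor = [[A[r][c] for c in keep] for r in keep]
        total += (-1) ** (j - 1) * A[0][j] * pfaffian(minor)
    return total
```

For a $4\times 4$ skew-symmetric matrix both routines should return $A_{01}A_{23} - A_{02}A_{13} + A_{03}A_{12}$ (zero-based indices), which is the classical expression of the Pfaffian.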


The Berezin integral can be defined on elements in 
a graded tensor product $\Lambda^* V^* \hat\otimes A$, where $A$ is any 
$\mathbb{Z}_2$-graded commutative algebra. For example, if we 
consider the identity operator $x = \mathrm{id}_V$ as a $V$-valued 
function on $V$, then $dx$ is a 1-form on $V$ valued in $V$, 
and $(dx,\cdot)$ is a 1-form valued in $V^*$. Let $\{e_1,\ldots,e_n\}$ 
be an orthonormal basis of $V$ and write $x=x^i e_i$, 
where $x^i$ are the coordinate functions on $V$. We let 

$$u(x) = \frac{(-1)^{n(n+1)/2}}{(2\pi)^{n/2}} \int^B \exp\Big(-\frac{1}{2}(x,x) - (dx,\cdot)\Big)$$

The integrand is in $\Omega^*(V) \hat\otimes \Lambda^* V^*$. The result is 

$$u(x) = \frac{1}{(2\pi)^{n/2}} \exp\Big(-\frac{1}{2}(x,x)\Big)\, dx^1\wedge\cdots\wedge dx^n \qquad [1]$$

a Gaussian $n$-form whose (usual) integration on $V$ is 1. 

Let $\mathrm{Cl}(V)$ be the Clifford algebra of $V$. For any 
orthonormal basis $\{e_i\}$, let $\gamma^i$ be the corresponding 
generators of $\mathrm{Cl}(V)$ and let $\gamma = e_i \otimes \gamma^i \in V \otimes \mathrm{Cl}(V)$. 
For any $\omega \in \Lambda^k V^*$, we have 

1 | | 

Y) = gy ii CA myk E CI(V) 
If $n$ is even, the Clifford algebra has a unique 
$\mathbb{Z}_2$-graded irreducible spinor representation $S(V) = 
S^+(V)\oplus S^-(V)$. For any element $a \in \mathrm{Cl}(V)$, the 
supertrace is $\mathrm{str}\, a = \mathrm{tr}_{S^+(V)}\, a - \mathrm{tr}_{S^-(V)}\, a$. If $\Omega \in \mathrm{End}(V)$ 
is skew-symmetric, then 


À (£) PPE) 


N 
ct 
SS 
Oo 
$ 
as 
7N 
EN 
— 
> 
M 
= 
N5 
Il 


a 52 

Go (Es 5) 

More generally, supertrace can be defined on 
$\mathrm{Cl}(V) \hat\otimes A$ for any $\mathbb{Z}_2$-graded commutative algebra 
$A = A^+ \oplus A^-$. If $\Omega$ is skew-symmetric and $a \in V^* \otimes 
A^-$, then 


str exp (Gro. vy) + CI) a0) 


= AS" f Ge +a) [2] 


Representatives of the Euler and Thom Classes 


Let $M$ be a smooth manifold and let $\pi: E \to M$ 
be an oriented real vector bundle of rank $r$. Suppose 
$E$ has a Euclidean structure $(\cdot,\cdot)$ and $\nabla$ is a 
compatible connection. The curvature $R \in \Omega^2(M, \mathrm{End}(E))$ 
is skew-symmetric, and hence $(\cdot,R\,\cdot) \in 
\Omega^2(M, \Lambda^2 E^*)$. A de Rham representative of the Euler 
class of $E$ is 

$$e_\nabla(E) = \frac{1}{(2\pi)^{r/2}} \int^B \exp\Big(\frac{1}{2}(\cdot,R\,\cdot)\Big) = \mathrm{Pf}\Big(\frac{R}{2\pi}\Big) \qquad [3]$$

Here, the Berezin integration is fiberwise in $E$: it is 
the pairing between the integrand and the unit 
section $\nu$ of the trivial line bundle $\Lambda^r E$ that is 
consistent with the orientation of $E$. The de Rham 
cohomology class of [3] is independent of the choice 
of $(\cdot,\cdot)$ or $\nabla$. 

Let $s$ be a section of $E$. Following Berline et al. 
(1992) and Zhang (2001), we consider 

$$\mathcal{S}_{\nabla,s} = \frac{1}{2}(s,s) + (\nabla s,\cdot) + \frac{1}{2}(\cdot,R\,\cdot) \qquad [4]$$

a differential form on $M$ valued in $\Lambda^* E^*$. Mathai-Quillen's 
representative of the Euler class is 

$$e_{\nabla,s}(E) = \frac{(-1)^{r(r+1)/2}}{(2\pi)^{r/2}} \int^B \exp(-\mathcal{S}_{\nabla,s}) \qquad [5]$$

One can show that $e_{\nabla,s}(E)$ is closed and that as 
$s$ varies, the cohomology class of $e_{\nabla,s}(E)$ does not 
change. By taking $\beta \to 0$, the de Rham class of 
$e_{\nabla,\beta s}(E)$ is equal to that of $e_\nabla(E)$ when $r$ is even. The 
form $e_{\nabla,\beta s}(E)$ provides a continuous interpolation 
between [3] and the limit as $\beta \to \infty$, when the form 
is concentrated on the zero locus of the section $s$. In 
fact, the Euler class is the Poincaré dual to the 
homology class represented by $s^{-1}(0)$. Hence, if 
$n \ge r$ and if $\omega \in \Omega^{n-r}(M)$ is closed, we have 

$$\int_M \omega \wedge e_{\nabla,s}(E) = \int_{s^{-1}(0)} \omega \qquad [6]$$

when $s$ intersects the zero section transversely. 

To obtain Mathai-Quillen's representative of the 
Thom class, we consider the pullback of $E$ to $E$ itself. 
The bundle $\pi^*E \to E$ has a tautological section $x$. 
Applying [5] to this setting, we get 

$$\tau_\nabla(E) = \frac{(-1)^{r(r+1)/2}}{(2\pi)^{r/2}} \int^B \exp\Big(-\frac{1}{2}(x,x) - (\nabla x,\cdot) - \frac{1}{2}(\cdot,R\,\cdot)\Big) \qquad [7]$$

where $(\cdot,\cdot)$, $\nabla$, and $R$ are understood to be the 
pullbacks to $\pi^*E$. This is a closed form on the total 
space of $E$. Moreover, its restriction to each fiber 
is the Gaussian form [1]. The cohomology groups 
of differential forms with exponential decay along 
the fibers are isomorphic to those with compact 
vertical support or the relative cohomology groups 
$H^*(E, E\setminus M)$. Here $M$ is identified with its image 
under the inclusion $i: M \to E$ by the zero section. 
Under the above isomorphism, the cohomology 
class represented by $\tau_\nabla(E)$ coincides with the 
Thom class $\tau(E) = i_*1 \in H^r(E, E\setminus M)$ defined 
topologically. For any section $s \in \Gamma(E)$, we have 
$e_{\nabla,s}(E) = s^*\tau_\nabla(E)$. 


Character Form of the Thom Class in K-Theory 


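Let $E = E^+ \oplus E^-$ be a $\mathbb{Z}_2$-graded vector bundle over 
$M$. The spaces $\Omega^*(M, E)$, $\Gamma(\mathrm{End}(E))$ and $\Omega^*(M) \hat\otimes 
\Gamma(\mathrm{End}(E))$ are also $\mathbb{Z}_2$-graded. The action of $\alpha \otimes T \in 
\Omega^*(M) \hat\otimes \Gamma(\mathrm{End}(E))$ on $\beta \otimes s \in \Omega^*(M, E)$ is 

$$\alpha \otimes T : \beta \otimes s \mapsto (-1)^{|T||\beta|} (\alpha \wedge \beta) \otimes (Ts)$$

The supertrace of $A \in \Gamma(\mathrm{End}(E))$ is $\mathrm{str}\, A = \mathrm{tr}_{E^+} A - 
\mathrm{tr}_{E^-} A$; it extends $\Omega^*(M)$-linearly to $\mathrm{str}: \Omega^*(M) \hat\otimes 
\Gamma(\mathrm{End}(E)) \to \Omega^*(M)$. Let $\nabla$ be a connection on $E$ 
preserving the grading. $\nabla$ is an odd operator on 
$\Omega^*(M, E)$. If $L \in \Gamma(\mathrm{End}(E)^-)$ is odd, then $D = \nabla + L$ 
is called a “superconnection” on $E$; the “curvature” 
$D^2 = R + \nabla L + L^2 \in (\Omega^*(M) \hat\otimes \Gamma(\mathrm{End}(E)))^+$ is even. 
With the superconnection, the Chern character of 
the virtual vector bundle $E^+ \ominus E^-$ can be represented 
by 

$$\mathrm{ch}_D(E^+, E^-) = \mathrm{str}\, \exp\Big(\frac{\sqrt{-1}}{2\pi} D^2\Big) \qquad [8]$$

It is a closed form on $M$ and its de Rham 
cohomology class is independent of the choice of 
$\nabla$ or $L$. If $L$ is invertible everywhere on $M$ and the 
eigenvalues of $\sqrt{-1}L^2$ are negative, then [8] is exact: 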


chy aE E`) 


staf (eo Vive) Jz) d8 
Now let $E$ be an oriented real vector bundle of 
rank $r = 2m$ over $M$ with a Euclidean structure $(\cdot,\cdot)$. 
Suppose further that $E$ has a spin structure. The 
associated spinor bundle $S(E) = S^+(E) \oplus S^-(E)$ is a 
graded complex vector bundle over $M$. For any 
section $s \in \Gamma(E)$, let $c(s) \in \Gamma(\mathrm{End}(S(E)))$ be the Clifford 
multiplication on $S(E)$. Then for any $s, s' \in \Gamma(E)$, 
we have $\{c(s), c(s')\} = -2(s, s')$. Given a connection $\nabla$ 
on $E$ preserving $(\cdot,\cdot)$, the induced spinor connection 
$\nabla^S$ on $S(E)$ preserves the grading. If $R$ is the curvature 
of $\nabla$, that of $\nabla^S$ is $R^S = -(1/4)(\gamma, R\gamma)$, where $\gamma$ is now 
a section of $E \otimes \mathrm{Cl}(E)$. For any $s \in \Gamma(E)$, consider the 
superconnection 


D= v5 (E) as 


The Chern character form [8] of $S^+(E) \ominus S^-(E)$ is, 
using [2], 

$$\mathrm{ch}_{D_s}(S^+(E), S^-(E)) = (-1)^m\, \hat A\Big(\frac{R}{2\pi}\Big)^{-1} e_{\nabla,s}(E) \qquad [9]$$

where $e_{\nabla,s}(E)$ is given by [5]. In cohomology groups, 
[9] reduces to 

$$\mathrm{ch}(S^+(E)) - \mathrm{ch}(S^-(E)) = (-1)^m\, \hat A(E)^{-1}\, e(E)$$

If $M$ is noncompact and the norm of $s$ increases 
rapidly away from $s^{-1}(0)$, then both sides of [9] are 
differential forms that decay rapidly away from 
$s^{-1}(0)$ and can represent cohomology classes of such. 
As before, we take the pullback $\pi^*E$ with the 
tautological section $x$. Then [9] becomes 

$$\mathrm{ch}_{D_x}(\pi^*S^+(E), \pi^*S^-(E)) = (-1)^m\, \pi^*\hat A\Big(\frac{R}{2\pi}\Big)^{-1} \tau_\nabla(E) \qquad [10]$$

where $\tau_\nabla(E)$ is given by [7]. Both sides of [10] are 
forms on $E$ that decay exponentially in the fiber 
directions; hence, [10] descends to an equality in 
$H^*(E, E\setminus M)$. In the relative K-group $K(E, E\setminus M)$, 
the pair $\pi^*S^\pm(E)$ with the isomorphism $c(x)$ away 
from the zero section is, up to a factor of $(-1)^m$, the 
K-theoretic Thom class $i_!1 \in K(E, E\setminus M)$. Therefore, 
[10] reduces to the well-known formula 

$$\mathrm{ch}(i_!1) = \pi^*\hat A(E)^{-1}\, i_*1$$

in cohomology groups $H^*(E, E\setminus M)$. The refinement 
[10] as an equality of differential forms is 
due to Mathai and Quillen (1986). In fact, this is 
how [7] was derived originally. 


Equivariant Cohomology and Equivariant 
Vector Bundles 


Equivariant Cohomology 


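Let $G$ be a compact Lie group with Lie algebra $\mathfrak{g}$. 
Fixing a basis $\{e_a\}$ of $\mathfrak{g}$, the structure constants are 
given by $[e_a, e_b] = f^c_{ab}\, e_c$. Let $\{\vartheta^a\}$ and $\{\varphi^a\}$ be the dual 
bases of $\mathfrak{g}^*$ generating the exterior algebra $\Lambda(\mathfrak{g}^*)$ and 
the symmetric algebra $S(\mathfrak{g}^*)$, respectively. The Weil 
algebra is $W(\mathfrak{g}) = \Lambda(\mathfrak{g}^*) \otimes S(\mathfrak{g}^*)$. We define a grading 
on $W(\mathfrak{g})$ by specifying $\deg \vartheta^a = 1$, $\deg \varphi^a = 2$. The 
contraction $\iota_a$ and the exterior derivative $d$ are two 
odd derivations on $W(\mathfrak{g})$ defined by 

$$\iota_a \vartheta^b = \delta_a^b, \qquad \iota_a \varphi^b = 0$$
$$d\vartheta^a = -\tfrac{1}{2} f^a_{bc}\, \vartheta^b \vartheta^c + \varphi^a, \qquad d\varphi^a = -f^a_{bc}\, \vartheta^b \varphi^c \qquad [11]$$

The Lie derivative is $L_a = \{\iota_a, d\}$. These operators 
satisfy the usual (anti-)commutation relations 

$$d^2 = 0, \qquad L_a = \{\iota_a, d\}, \qquad [L_a, d] = 0 \qquad [12]$$

$$\{\iota_a, \iota_b\} = 0, \qquad [L_a, \iota_b] = f^c_{ab}\, \iota_c, \qquad [L_a, L_b] = f^c_{ab}\, L_c \qquad [13]$$

The cohomology of $(W(\mathfrak{g}), d)$ is trivial. 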

If $G$ acts smoothly on a manifold $M$ on the left, let 
$V_a$ be the vector field generated by the Lie algebra 
element $-e_a \in \mathfrak{g}$. Then, $[V_a, V_b] = f^c_{ab}\, V_c$. Denote 
$\iota_a = \iota_{V_a}$ and $L_a = L_{V_a}$, acting on $\Omega^*(M)$. In the Weil 
model of equivariant cohomology, one considers the 
graded tensor product $W(\mathfrak{g}) \hat\otimes \Omega^*(M)$, on which the 
operators 

$$\tilde\iota_a = \iota_a \otimes 1 + 1 \otimes \iota_a$$
$$\tilde d = d \otimes 1 + 1 \otimes d$$
$$\tilde L_a = L_a \otimes 1 + 1 \otimes L_a$$

act and satisfy the same relations [12] and [13]. 
An element $\omega \in W(\mathfrak{g}) \hat\otimes \Omega^*(M)$ is “basic” if it 
satisfies $\tilde\iota_a \omega = 0$, $\tilde L_a \omega = 0$ for all indices $a$. Let 
$\Omega^*_G(M) = (W(\mathfrak{g}) \hat\otimes \Omega^*(M))_{\mathrm{bas}}$ be the set of such. 
Elements of $\Omega^*_G(M)$ are equivariant differential 
forms on $M$. The operator $\tilde d$ preserves $\Omega^*_G(M)$ 
and its cohomology groups $H^*_G(M)$ are the equivariant 
cohomology groups of $M$. They are 
isomorphic to the singular cohomology groups of 
$EG \times_G M$ with real coefficients. 

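The BRST model of Kalkman (1993) is obtained 
by applying an isomorphism $\sigma = e^{\vartheta^a \otimes \iota_a}$ of $W(\mathfrak{g}) \hat\otimes 
\Omega^*(M)$. The operators become 

$$\sigma \circ \tilde\iota_a \circ \sigma^{-1} = \iota_a \otimes 1$$
$$\sigma \circ \tilde d \circ \sigma^{-1} = \tilde d - \varphi^a \otimes \iota_a + \vartheta^a \otimes L_a$$
$$\sigma \circ \tilde L_a \circ \sigma^{-1} = \tilde L_a$$

The subspace of basic forms in the Weil model 
becomes 

$$\sigma(\Omega^*_G(M)) = (S(\mathfrak{g}^*) \otimes \Omega^*(M))^G$$

This is precisely the Cartan model of equivariant 
cohomology, in which the exterior differential is 

$$\tilde d' = 1 \otimes d - \varphi^a \otimes \iota_a$$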


If $P$ is a principal $G$-bundle over a base space $B$, 
we can form an associated bundle $P \times_G M \to B$. 
Choose a connection on $P$ and let $\Theta = \Theta^a e_a \in 
\Omega^1(P) \otimes \mathfrak{g}$, $\Phi = \Phi^a e_a \in \Omega^2(P) \otimes \mathfrak{g}$ be the connection 
and curvature forms, respectively. The components 
$\Theta^a$, $\Phi^a$ satisfy the same relations [11]. Replacing 
$\vartheta^a$, $\varphi^a$ by $\Theta^a$, $\Phi^a$, we have a homomorphism that 
maps $\omega \in W(\mathfrak{g}) \hat\otimes \Omega^*(M)$ to $\hat\omega \in \Omega^*(P \times M)$. If $\omega$ is 
basic, then so is $\hat\omega$, and the latter descends to a form 
$\bar\omega$ on $P \times_G M$. Furthermore, the operator $\tilde d$ on $\Omega^*_G(M)$ 
descends to $d$ on $\Omega^*(P \times_G M)$. Thus, we get the 
Chern-Weil homomorphisms $\Omega^*_G(M) \to \Omega^*(P \times_G M)$ 
and $H^*_G(M) \to H^*(P \times_G M)$. For example, the vector 
space $\mathbb{R}^r$ has an obvious $SO(r)$ action. The Gaussian 
$r$-form [1] is invariant under $SO(r)$ and can be 
extended to an $SO(r)$-equivariant closed $r$-form, 
called the “universal Thom form.” Let $E$ be an 
orientable real vector bundle of rank $r$ with a 
Euclidean structure. $E$ determines a principal $SO(r)$-bundle 
$P$; the associated bundle $P \times_{SO(r)} \mathbb{R}^r$ is $E$ itself. 
By applying the Chern-Weil homomorphism to this 
setting, we get a closed $r$-form on $E$. This is another 
construction of the Thom form [7] by Mathai and 
Quillen (1986). Further information on equivariant 
cohomology can be found there, and in Berline et al. 
(1992) and Guillemin and Sternberg (1999). 


Equivariant Vector Bundles 


Recall that a connection on a vector bundle $E \to M$ 
determines, for any $k \ge 0$, a differential operator 

$$\nabla : \Omega^k(M, E) \to \Omega^{k+1}(M, E)$$

The curvature $R = \nabla^2 \in \Omega^2(M, \mathrm{End}(E))$ satisfies the 
Bianchi identity $\nabla R = 0$. If the connection preserves a 
Euclidean structure on $E$, then $R$ is skew-symmetric. 
If a Lie group $G$ acts on $M$ and the action can be 
lifted to $E$, then $G$ also acts on the spaces $\Gamma(E)$ and 
$\Omega^*(M, E)$. As before, the Lie derivatives $L_a$ on these 
spaces are the infinitesimal actions of $-e_a \in \mathfrak{g}$. We 
choose a $G$-invariant connection on $E$. The 
“moment” of the connection $\nabla$ under the $G$-action 
is $\mu_a = L_a - \nabla_{V_a}$, acting on $\Gamma(E)$. In fact, $\mu_a$ is a 
section of $\mathrm{End}(E)$, or $\mu \in \Gamma(\mathrm{End}(E)) \otimes \mathfrak{g}^*$. If a 
Euclidean structure on $E$ is preserved by both the 
connection and the $G$-action, then $\mu_a$ is skew-symmetric. 
On $\Omega^*(M, E)$, we have 

$$L_a = \{\iota_a, \nabla\} + \mu_a$$
$$\iota_a R = \nabla\mu_a, \qquad L_a \mu_b = f^c_{ab}\, \mu_c$$
$$[\mu_a, \mu_b] = f^c_{ab}\, \mu_c + R_{ab}$$

where $R_{ab} = R(V_a, V_b) \in \Gamma(\mathrm{End}(E))$. 

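On the graded tensor product $W(\mathfrak{g}) \hat\otimes \Omega^*(M, E)$, 
the contraction $\tilde\iota_a$ and the Lie derivative $\tilde L_a$ act and 
satisfy [13]. In the Weil model, equivariant differential 
forms on $M$ with values in $E$ are the basic 
elements in $W(\mathfrak{g}) \hat\otimes \Omega^*(M, E)$, which form a subspace 
$\Omega^*_G(M, E) = (W(\mathfrak{g}) \hat\otimes \Omega^*(M, E))_{\mathrm{bas}}$. The “equivariant 
covariant derivative” is 

$$\tilde\nabla = d \otimes 1 + 1 \otimes \nabla + \vartheta^a \otimes \mu_a \qquad [14]$$

One checks that $\{\tilde\iota_a, \tilde\nabla\} = \tilde L_a$ and hence $\tilde\nabla$ preserves 
the basic subspace $\Omega^*_G(M, E)$. The equivariant curvature 
$\tilde R = \tilde\nabla^2$ is 

$$\tilde R = R - \vartheta^a\, \nabla\mu_a + \varphi^a \mu_a + \tfrac{1}{2}\, \vartheta^a \vartheta^b R_{ab} \qquad [15]$$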
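
The expression for the equivariant curvature can be checked by squaring the equivariant covariant derivative directly. The following derivation is a sketch added for illustration; it assumes the sign conventions $d\vartheta^a = -\tfrac{1}{2} f^a_{bc}\vartheta^b\vartheta^c + \varphi^a$ and $[\mu_a,\mu_b] = f^c_{ab}\mu_c + R_{ab}$, and uses that $1\otimes\nabla$ anticommutes with the odd generators $\vartheta^a$ in the graded tensor product.

```latex
\begin{aligned}
\tilde\nabla^2
 &= (d\otimes 1 + 1\otimes\nabla + \vartheta^a\mu_a)^2 \\
 &= \nabla^2 + \{d\otimes 1,\ \vartheta^a\mu_a\}
    + \{1\otimes\nabla,\ \vartheta^a\mu_a\} + (\vartheta^a\mu_a)^2 \\
 &= R + (d\vartheta^a)\,\mu_a - \vartheta^a\,\nabla\mu_a
    + \tfrac{1}{2}\,\vartheta^a\vartheta^b\,[\mu_a,\mu_b] \\
 &= R + \big(-\tfrac{1}{2} f^c_{ab}\,\vartheta^a\vartheta^b + \varphi^c\big)\mu_c
    - \vartheta^a\,\nabla\mu_a
    + \tfrac{1}{2}\,\vartheta^a\vartheta^b\big(f^c_{ab}\,\mu_c + R_{ab}\big) \\
 &= R - \vartheta^a\,\nabla\mu_a + \varphi^a\mu_a
    + \tfrac{1}{2}\,\vartheta^a\vartheta^b R_{ab}.
\end{aligned}
```

The terms involving the structure constants cancel, which is why only the curvature components $R_{ab}$ survive at second order in $\vartheta$.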


It satisfies the equivariant Bianchi identity $\tilde\nabla\tilde R = 0$. 
Equivariant characteristic forms are invariant polynomials 
of $\tilde R$. They are equivariantly closed and 
their equivariant cohomology classes do not depend 
on the choice of the $G$-invariant connection. Hence, 
they represent the equivariant characteristic classes 
of $E$ in $H^*_G(M)$. 

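For the BRST model, we use a similar isomorphism 
$\sigma = e^{\vartheta^a \otimes \iota_a}$ on $W(\mathfrak{g}) \hat\otimes \Omega^*(M, E)$. The operators 
become 

$$\sigma \circ \tilde\iota_a \circ \sigma^{-1} = \iota_a \otimes 1$$
$$\sigma \circ \tilde\nabla \circ \sigma^{-1} = \tilde\nabla - \varphi^a \otimes \iota_a + \vartheta^a \otimes L_a$$
$$\sigma \circ \tilde L_a \circ \sigma^{-1} = \tilde L_a$$

and the basic subspace turns into 

$$\sigma(\Omega^*_G(M, E)) = (S(\mathfrak{g}^*) \otimes \Omega^*(M, E))^G$$

This is the Cartan model, which can be found in 
Berline et al. (1992). The equivariant covariant 
derivative is 

$$\tilde\nabla' = 1 \otimes \nabla - \varphi^a \otimes \iota_a$$

The equivariant curvature is $\tilde R' = (\tilde\nabla')^2 = R + \varphi^a \mu_a$ 
and the characteristic forms are defined similarly. 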

Let $P \to B$ be a principal $G$-bundle with a 
connection $\Theta$. Following [14], the bundle $P \times E \to 
P \times M$ has a connection 

$$\hat\nabla = d \otimes 1 + 1 \otimes \nabla + \Theta^a \mu_a$$

It descends to a connection $\bar\nabla$ on the vector bundle 
$P \times_G E \to P \times_G M$. The map $\nabla \mapsto \bar\nabla$ can be considered 
as the analog of the Chern-Weil homomorphism 
for connections. There is also a homomorphism 
$\Omega^*_G(M, E) \to \Omega^*(P \times_G M, P \times_G E)$, which commutes 
with the covariant derivatives $\tilde\nabla$, $\bar\nabla$. The curvature 
$\bar R = \bar\nabla^2$ is the image of the equivariant curvature $\tilde R$. 
Consequently, the equivariant characteristic forms 
descend to those of $P \times_G E \to P \times_G M$ by the usual 
Chern-Weil homomorphism. 

Now let $E = E^+ \oplus E^-$ be a graded vector bundle 
over $M$ with a $G$-action preserving all the structures. 
We have the $\Omega^*_G(M)$-linear supertrace map $\mathrm{str}: 
\Omega^*_G(M) \hat\otimes \Gamma(\mathrm{End}(E)) \to \Omega^*_G(M)$. If $\nabla$ is a $G$-invariant 
connection on $E$ preserving the grading and if 
$L \in \Gamma(\mathrm{End}(E)^-)^G$ is odd and $G$-invariant, then 
$\tilde D = \tilde\nabla + L$ is an “equivariant superconnection.” 
The equivariant counterpart of [8] is 

$$\mathrm{ch}_{G,D}(E^+, E^-) = \mathrm{str}\, \exp\Big(\frac{\sqrt{-1}}{2\pi} \tilde D^2\Big) \in \Omega^*_G(M)$$

representing the equivariant Chern character of 
$E^+ \ominus E^-$ in $H^*_G(M)$. 


Representatives of the Equivariant Euler 
and Thom Classes 


Consider an oriented real vector bundle $E \to M$ of 
rank $r$ with a Euclidean structure $(\cdot,\cdot)$. Choose a 
connection $\nabla$ on $E$ preserving $(\cdot,\cdot)$. We assume that 
a Lie group $G$ acts on $M$ and that the action can be 
lifted to $E$ preserving all the structures on $E$. We 
use the Weil model; the constructions in the Cartan 
model are similar. For any $\alpha \in \Omega^k_G(M, E)$ and 
$\beta \in \Omega^l_G(M, E)$, we obtain $(\alpha, \wedge\beta) \in \Omega^{k+l}_G(M)$ by 
taking the wedge product of forms as well as the 
pairing in $E$. The Berezin integral of $\omega \in \Omega^*_G(M, \Lambda^* E^*)$ 
along the fibers of $E$ is $\int^B \omega = (\nu, \omega) \in \Omega^*_G(M)$. Here, 
$\nu$ is the unit section of the canonically trivial 
determinant line bundle $\Lambda^r E$, compatible with the 
orientation of $E$. The equivariant Euler form 

$$e_{G,\nabla}(E) = \frac{1}{(2\pi)^{r/2}} \int^B \exp\Big(\frac{1}{2}(\cdot,\tilde R\,\cdot)\Big) = \mathrm{Pf}\Big(\frac{\tilde R}{2\pi}\Big) \qquad [16]$$

is equivariantly closed. It represents the equivariant 
Euler class $e_G(E) \in H^r_G(M)$. 


Given a G-invariant section s € ['(E)°, the equiv- 
ariant counterpart of [4] is 


Se,=4(s,s)+ (Vs, -) +4(, R>) [17] 
and that of Mathai—Quillen’s Euler form [5] is 
aye [ rs 
es (E) =~. | e°s 18 
sl ) (20)! | | 


It is also equivariantly closed, and its equivariant
cohomology class is $e_G(E)$. The equivariant extension
of Mathai–Quillen’s Thom form [7] is


4 r(r+1)/2 B 1 
T(E) = E. exp (—5 (0.3) 


(Wx) 5 (8) 19 


where x is the (G-invariant) tautological section of
$\pi^*E \to E$.

Finally, G acts on the (graded) spinor bundle S(E). 
Using the equivariant superconnection 


Da g (A) "a 


[9] generalizes to 


$$\mathrm{ch}_{G,\mathbb{D}}(S^+(E), S^-(E)) = (-1)^{r/2}\, \hat A_{G,\nabla}(E)^{-1}\, e_{G,\nabla}(E)$$


Now apply the construction to the bundle $\pi^*E \to E$
and its tautological section x. The pair $\pi^*S^\pm(E)$ with
an odd bundle map c(x) determines, up to a factor
of $(-1)^{r/2}$, the Thom class in the equivariant
K-group $K_G(E, E \setminus M)$. The equivariant analog of
[10] descends to


che (ii1g) = *AG(E) "ilg 


in equivariant cohomology. 


Superspace Formulation 


Mathai–Quillen Formalism and the
Superspace $\mathbb{R}^{0|1}$


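Let $\mathbb{R}^{0|1}$ be the superspace with one fermionic
coordinate $\theta$ but no bosonic coordinates. The
translation on $\mathbb{R}^{0|1}$ is generated by $D = \partial/\partial\theta$,
which satisfies $\{D, D\} = 0$. We consider a sigma
model on $\mathbb{R}^{0|1}$ whose target space is an (ordinary)
smooth manifold M of dimension n. A map
$X\colon \mathbb{R}^{0|1} \to M$ can be written as $X(\theta) = x + \sqrt{-1}\,\theta\psi$.
Here, $x = X|_{\theta=0} \in M$ and $\psi = -\sqrt{-1}\,DX|_{\theta=0} \in T_xM$;
the latter is fermionic. Under the translation
$\theta \mapsto \theta + \epsilon$, x and $\psi$ vary according to the
supersymmetry transformations

$$\delta x = \epsilon\, DX|_{\theta=0} = \sqrt{-1}\,\epsilon\psi, \qquad \delta\psi = -\sqrt{-1}\,\epsilon\, D(DX)|_{\theta=0} = 0 \qquad [20]$$

Clearly, $\delta^2 = 0$, which is also a consequence of $D^2 = 0$.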
For any p-form $\omega \in \Omega^p(M)$, we have an observable

$$O_\omega(X) = \frac{1}{p!}\,(X^*\omega)(D, \dots, D)\big|_{\theta=0}$$


In local coordinates, 


and 


Using $C(\cdot)$ to denote the set of function(al)s on a
space, we can identify $C(\mathrm{Map}(\mathbb{R}^{0|1}, M))$ with $\Omega^*(M)$.
Under [20], $\delta O_\omega(X) = \epsilon\, O_{d\omega}(X)$. So, $O_\omega(X)$ is invariant
under supersymmetry if and only if $\omega$ is closed.
The cohomology of $\delta$ is the de Rham cohomology of
M. Consider the measure $[dX] = [dx][d\psi]$. In local
coordinates, $[dx] = dx^1 \cdots dx^n$ is the standard (bosonic)
measure and $[d\psi] = d\psi^1 \cdots d\psi^n$ is a fermionic
measure such that


[lava PP vga 


For any $\omega \in \Omega^n(M)$, the superfield integral
$\int [dX]\, O_\omega(X)$ is equal to the usual integral $\int_M \omega$ if
the latter exists.

Let $E \to M$ be a real vector bundle of rank r with
an inner product $(\cdot,\cdot)$, and let $\nabla$ be a compatible
connection whose curvature is R. Consider a theory
whose fields are $X \in \mathrm{Map}(\mathbb{R}^{0|1}, M)$ and a fermionic
section $\Xi \in \Gamma(X^*E)$. Let $\mathcal{D} = (X^*\nabla)_D$ be the covariant
derivative along D in the pullback bundle
$X^*E \to \mathbb{R}^{0|1}$. Then, $\chi = \Xi|_{\theta=0} \in E_x$ is fermionic
and $f = \mathcal{D}\Xi|_{\theta=0} \in E_x$ is bosonic.

Given a fixed section $s \in \Gamma(E)$, we write a superspace
action


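$$S_{MQ}[X, \Xi] = \int_{\mathbb{R}^{0|1}} d\theta\, \left(\Xi, \tfrac12 \mathcal{D}\Xi + \sqrt{-1}\, s \circ X\right)$$

$$= \tfrac12 (f, f) + \sqrt{-1}\,(f, s) - (\nabla_\psi s, \chi) + \tfrac14 (\chi, R(\psi, \psi)\chi) \qquad [21]$$

It is automatically supersymmetric. Performing the
Gaussian integral over f and replacing $\chi$ by $-\sqrt{-1}\,\chi$,
we get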
e naa pa]


=) .—5mM [X=] _.™ =d 
[dje Ma! = — 
2a” 


where 


$$S_{MQ}[x, \psi, \chi] = \tfrac12 (s, s) - \sqrt{-1}\,(\chi, \nabla_\psi s) - \tfrac14 (\chi, R(\psi, \psi)\chi) \qquad [23]$$


When r is even, [22] is equal to $O_{e_{\nabla,s}(E)}(X)$, where
$e_{\nabla,s}(E)$ is given by [5]. Furthermore, for any
closed form $\omega$ on M, the expectation value

$$\langle O_\omega(X) \rangle = \int [dX][d\Xi]\, O_\omega(X)\, e^{-S_{MQ}[X, \Xi]} \qquad [24]$$

is equal to [6].
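
The localization mechanism can be illustrated by a toy case: a trivial line bundle over the real line with section s. After the fermionic integral, a Gaussian-weighted integral of this kind reduces to $(2\pi)^{-1/2} \int e^{-t^2 s(x)^2/2}\, t\, s'(x)\, dx$, which counts the zeros of s with signs and does not depend on the scale t. The sketch below is only an analogue under stated assumptions: the function names, the scale parameter t, and the normalization are chosen here and do not reproduce the factor conventions of [22] to [24].

```python
import math

def mq_toy_integral(s, ds, a, b, t=1.0, n=100_000):
    """Trapezoid estimate of (1/sqrt(2 pi)) * integral over [a, b] of exp(-(t s)^2 / 2) * t * s'(x) dx."""
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(-0.5 * (t * s(x)) ** 2) * t * ds(x)
    return total * h / math.sqrt(2.0 * math.pi)

def cubic(x):
    return x ** 3 - x        # zeros at -1, 0, 1 with slopes of sign +, -, +

def d_cubic(x):
    return 3 * x ** 2 - 1

def quadratic(x):
    return x ** 2 - 1        # zeros at -1, 1 with slopes of sign -, +

def d_quadratic(x):
    return 2 * x

print(mq_toy_integral(cubic, d_cubic, -3.0, 3.0))
print(mq_toy_integral(cubic, d_cubic, -3.0, 3.0, t=3.0))
print(mq_toy_integral(quadratic, d_quadratic, -4.0, 4.0))
```

The first two values agree with the signed count 1 - 1 + 1 of zeros of the cubic, and the third with the count -1 + 1 for the quadratic.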


Equivariant Cohomology and Gauged Sigma
Model on $\mathbb{R}^{0|1}$


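Suppose G is a Lie group and P is a principal G-bundle
over $\mathbb{R}^{0|1}$. Since $\theta$ is nilpotent, we can choose a
“trivialization” of P such that the connection and
curvature are $A \in \Omega^1(\mathbb{R}^{0|1}) \otimes \mathfrak{g}$ and $F \in \Omega^2(\mathbb{R}^{0|1}) \otimes \mathfrak{g}$,
respectively. ($\mathfrak{g}$ is the Lie algebra of G.) In
components, $c = \sqrt{-1}\,\iota_D A \in \mathfrak{g}$ is fermionic and $\phi \in \mathfrak{g}$,
a multiple of $\iota_D^2 F$, is bosonic. The space $\mathcal{A}$ of connections
is the set of pairs $(c, \phi)$. Under $\theta \mapsto \theta + \epsilon$,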


$$\delta c = \epsilon \left(\phi + \frac{\sqrt{-1}}{2}\,[c, c]\right), \qquad \delta\phi = \sqrt{-1}\,\epsilon\,[c, \phi] \qquad [25]$$


Thus, the algebra $C(\mathcal{A})$ is isomorphic to the Weil
algebra $W(\mathfrak{g})$ and $\delta$ corresponds to the differential d
in [11]. This relation between gauge theory on a
fermionic space and the Weil algebra can be found
in Blau and Thompson (1997).

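With a trivialization of P, the group of gauge
transformations $\mathcal{G}$ can be identified with
$\mathrm{Map}(\mathbb{R}^{0|1}, G)$. Any group element is of the form
$\hat g = g\, e^{\sqrt{-1}\,\theta\xi}$, with $g = \hat g|_{\theta=0} \in G$ and $\xi = \sqrt{-1}\,\iota_D\, \hat g^*\omega \in \mathfrak{g}$
(fermionic), where $\omega$ is the Maurer–Cartan form on
G. The action of $\hat g$ is $A \mapsto A' = \mathrm{Ad}_{\hat g}(A - \hat g^*\omega)$, or
$c \mapsto c' = \mathrm{Ad}_g(c - \xi)$ and $\phi \mapsto \phi' = \mathrm{Ad}_g\, \phi$. By choosing
$\xi = c$, we obtain a new trivialization, called the
“Wess–Zumino gauge,” in which $c' = 0$. The residual
gauge redundancy is G, and $\mathcal{A}/\mathcal{G} = \mathfrak{g}/\mathrm{Ad}_G$. The
Wess–Zumino gauge is not preserved by the translation
on $\mathbb{R}^{0|1}$ unless we define $\delta'$ by composing $\delta$ with
a suitable (infinitesimal) gauge transformation. If so,
then $\delta'\phi = 0$.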

Suppose M is a manifold with a left G-action. As
before, let $\{e_a\}$ be a basis of $\mathfrak{g}$ and let the vector field $V_a$
be the infinitesimal action of $-e_a$. In the gauged sigma
model, we include another field $X \in \Gamma(P \times_G M)$. With
a trivialization of P, we can identify X with a map
$X\colon \mathbb{R}^{0|1} \to M$. The covariant derivative is given by
$\nabla X = dX - A^a V_a$, $\mathcal{D}X = \nabla_D X$. Let $x = X|_{\theta=0} \in M$
and $\psi = -\sqrt{-1}\,\mathcal{D}X|_{\theta=0} \in T_xM$. Then the supersymmetric
transformations are
bx! = V—le(y/ — cV’) 
syi = -(¢°V; ie v-1¢Vi,,) 
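In the Wess–Zumino gauge, the transformations
simplify to $\delta' x = \sqrt{-1}\,\epsilon\psi$, $\delta'\psi = -\epsilon\,\phi^a V_a$.
The observables form the $\mathcal{G}$-invariant part of the
space $C(\mathcal{A} \times \mathrm{Map}(\mathbb{R}^{0|1}, M))$. For any $\omega \in \Omega^p(M)$,
we have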
we have 


|26] 


DX)|9=0 


=— Wii (x) we 27 


O (X, A) is gauge covariant: O (X, A) => Og-.,(X, A), 
and the set of gauge-invariant observables is thus
identified with $(S(\mathfrak{g}^*) \otimes \Omega^*(M))^G$. Moreover, since


6O,,(X, A) = €(Og,(X, A) — V-11 Or... (X, A) 
—V—1470,,.,(X, A)) 


$\delta$ corresponds to the differential $d'$ in the BRST model.

Let $E \to M$ be an equivariant vector bundle and
let $\nabla$ be a G-invariant connection with curvature R
and moment $\mu$. Any $s \in \Gamma(E)^G$ defines a section of
$P \times_G E \to P \times_G M$, still denoted by s. Consider a
theory with superfields $X \in \Gamma(P \times_G M)$ and
$\Xi \in \Gamma(X^*(P \times_G E))$ (fermionic). Let $\mathcal{D}$ be the covariant
derivative of the pullback connection. With a
trivialization of P, we put $\chi = \Xi|_{\theta=0} \in E_x$ (fermionic)
and $f = \mathcal{D}\Xi|_{\theta=0} \in E_x$ (bosonic). The equivariant
extension of [21] is


$$S_{MQ}[X, \Xi, A] = \int_{\mathbb{R}^{0|1}} d\theta\, \left(\Xi, \tfrac12 \mathcal{D}\Xi + \sqrt{-1}\, s \circ X\right)$$

Similar to [22], we get, in the Wess–Zumino gauge,


v- 
=]e-Sma[XE,A] — 
where 


SMQ i Yp, Q, x] 
= 5 (s,s) — VZT, Vys) 


-FRH w)x) — rite Hax) [29] 


When r is even, [28] is equal to $O_{e_{\nabla,s}}(X, A)$, where
$e_{\nabla,s}$ is given by [18].


[dyjeSmalev.e.x] [28] 


The Atiyah-Jeffrey Formula 


Given the G-action on M, for any $x \in M$, there is a
linear map $C_x\colon \mathfrak{g} \to T_xM$ defined by $C_x(e_a) = V_a(x)$.
With an invariant inner product $(\cdot,\cdot)$ on $\mathfrak{g}$ and an
invariant Riemannian metric on M, the adjoint of
$C_x$ is $C_x^\dagger\colon T_xM \to \mathfrak{g}$, that is, $C^\dagger \in \Omega^1(M) \otimes \mathfrak{g}$. If G
acts on M freely, then $C_x$ is injective and $(C^\dagger C)_x$ is
invertible for all $x \in M$. The projection
$M \to \bar M = M/G$ is a principal G-bundle. It has a connection
such that the horizontal subspace is the orthogonal
complement of the G-orbits. The connection 1-form is
$\Theta = (C^\dagger C)^{-1} C^\dagger$, whereas the curvature is $\Omega = (C^\dagger C)^{-1}\, dC^\dagger$
on horizontal vectors.

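Let $\omega$ be an equivariant form on M. If G
acts on M freely, then $\omega$ descends to a form $\bar\omega$ on $\bar M$.
We look for a gauge-invariant, supersymmetric
quantity $\Upsilon(X, A)$ such that

$$\frac{1}{\mathrm{vol}(G)} \int [dX][dA]\, O_\omega(X, A)\, \Upsilon(X, A) = \int [d\bar X]\, O_{\bar\omega}(\bar X) \qquad [30]$$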


Mathematically, $\Upsilon$ corresponds to a closed equivariant
form $\upsilon$ on M such that


1 
a A f 20) 600) = 


which is [30] in the Wess–Zumino gauge. In fact, $\upsilon$ is
distribution valued in the sense of Kumar and Vergne
(1993) and can be understood as an equivariant
homology cycle, as in Austin and Braam (1995).

Let P be a G-bundle over $\mathbb{R}^{0|1}$ with a connection
and let $\mathrm{Ad}\,P = P \times_G \mathfrak{g} \to \mathbb{R}^{0|1}$ be the adjoint bundle.
Consider a (bosonic) superfield $\Lambda \in \Gamma(\mathrm{Ad}\,P)$. Set
$\lambda = \Lambda|_{\theta=0}$ (bosonic) and $\eta = -\sqrt{-1}\,\mathcal{D}\Lambda|_{\theta=0}$ (fermionic).
Choosing a trivialization of P, $\lambda$ and $\eta$ are both in $\mathfrak{g}$.
Under $\theta \mapsto \theta + \epsilon$, they transform as


6d = V—1e(n + [c, d]) 


31 
ôn = e(|, A| a v—1[c,n]) 


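The superspace action

$$S_{CMR}[X, A, \Lambda] = \sqrt{-1} \int_{\mathbb{R}^{0|1}} d\theta\, (\Lambda, C^\dagger \mathcal{D}X)$$

is invariant under [25], [26], and [31] and, in the
Wess–Zumino gauge, it is

$$S_{CMR}[x, \psi, \phi, \eta, \lambda] = -\sqrt{-1}\,(\eta, C^\dagger\psi) - \sqrt{-1}\,(\lambda, dC^\dagger(\psi, \psi)) + (\lambda, C^\dagger C\phi) \qquad [32]$$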


If G acts on M freely, then

$$\Upsilon(X, A) = \int [d\Lambda]\, e^{-S_{CMR}[X, A, \Lambda]} \qquad [33]$$

satisfies [30]. The factor $\Upsilon(X, A)$ in [30] is called
“projection” in Cordes et al. (1996).

Let $E \to M$ be a G-equivariant vector bundle with
a fixed G-invariant connection $\nabla$, moment $\mu$, and
an invariant section s. Consider the superspace
action

$$S_{AJ}[X, \Xi, A, \Lambda] = S_{MQ}[X, \Xi, A] + S_{CMR}[X, A, \Lambda]$$

In the Wess–Zumino gauge and after the Gaussian
integral over f, it becomes the Atiyah–Jeffrey action

$$S_{AJ}[x, \psi, \phi, \chi, \eta, \lambda] = S_{MQ}[x, \psi, \phi, \chi] + S_{CMR}[x, \psi, \phi, \eta, \lambda] \qquad [34]$$


If s intersects the zero section transversely and G acts
on $s^{-1}(0)$ freely, then $s^{-1}(0)/G$ is smooth and


/ J= J [dx] [dy] [dø] [dx] [dn] [4 
s-!(0)/G 
x O,,(x, yp, pje ebem [35] 


for any closed equivariant form $\omega$ on M. Equation
[35] is the formula of Atiyah and Jeffrey (1990) and
of Witten (1988a) in an infinite-dimensional setting.
When $s^{-1}(0)/G$ is not smooth, the right-hand side of
[35] can be regarded as a definition of the left-hand
side.

It is often convenient to add to $S_{AJ}$ another term


= VE lo m) EALAN B6 


Since [36] is $\delta$-exact and no new field is added, the
integral [35] does not change if $\Delta S$ is added to $S_{AJ}$.


Applications to Cohomological 
Field Theories 


We now apply the Mathai—Quillen construction 
formally to a number of cases in which both the 
rank of the vector bundle and the dimension of the 
base space are infinite. Thus, the (bosonic and 
fermionic) integrals in [24] or [35] become path 
integrals in quantum mechanics or quantum field 
theory. 


Supersymmetric Quantum Mechanics 


Let (M, g) be a Riemannian manifold and $LM = \mathrm{Map}(S^1, M)$,
the loop space. At each point $u \in LM$,
which is a map $u\colon S^1 \to M$, the tangent space is
$T_uLM = \Gamma(u^*TM)$. In particular, $\dot u = du/dt$, where t
is a parameter on $S^1$, is a tangent vector at u and
$u \mapsto \dot u$ is a vector field on LM. For any Morse
function h on M, $s(u) = \dot u + (\mathrm{grad}\,h) \circ u$ is another
vector field on LM.

Vector fields on LM can be identified as sections of
the bundle $\mathrm{ev}^*TM \to S^1 \times LM$, where
$\mathrm{ev}\colon S^1 \times LM \to M$ is the evaluation map. The Levi-Civita
connection $\nabla$ on TM pulls back to a connection on
$\mathrm{ev}^*TM$ and the covariant derivatives along LM define
a natural connection $\nabla^{LM}$ on T(LM). For example,
for any tangent vector $V \in T_uLM = \Gamma(u^*TM)$, we
have $\nabla^{LM}_V s(u) = \nabla^u_{\dot u} V + (\nabla_V\, \mathrm{grad}\,h) \circ u$, where $\nabla^u$ is
the pullback connection on $u^*TM$. The Riemann
curvature tensor R on M determines that on LM.

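The (infinite-dimensional) analog of [22] is

$$\int [du][d\psi][d\chi]\, \exp\left(-\int_{S^1} dt\, L[u, \psi, \chi]\right) \qquad [37]$$

where $\psi, \chi \in T_uLM = \Gamma(u^*TM)$ are fermionic and

$$L[u, \psi, \chi] = \tfrac12\, g(\dot u + \mathrm{grad}\,h,\ \dot u + \mathrm{grad}\,h) - \sqrt{-1}\, g(\chi, \nabla_{\dot u}\psi + \nabla_\psi\, \mathrm{grad}\,h) - \tfrac14\, g(\chi, R(\psi, \psi)\chi) \qquad [38]$$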


Here and below, factors of $\sqrt{-1}$ and $2\pi$ in [22] are
absorbed in the path-integral measure. [38] is, up to
a total derivative, the Lagrangian of the Euclidean
N=2 supersymmetric quantum mechanics on M.
The partition function [37] is equal to the Euler
characteristic of LM or M, which can
be confirmed by an (exact) stationary-phase
calculation.
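
In the stationary-phase evaluation, the contributions come from the critical points of the Morse function h, and the outcome is the classical Morse count: each critical point of index k contributes $(-1)^k$. A minimal sketch of that count follows; the function name and the lists of indices (standard height functions on the sphere, the torus, and a genus-2 surface) are illustrative assumptions, not data from the text.

```python
def euler_characteristic_from_morse(indices):
    """Sum of (-1)**k over the Morse indices k of the critical points."""
    return sum((-1) ** k for k in indices)

sphere = [0, 2]                 # one minimum, one maximum
torus = [0, 1, 1, 2]            # minimum, two saddles, maximum
genus_two = [0, 1, 1, 1, 1, 2]  # minimum, four saddles, maximum

for name, idx in (("sphere", sphere), ("torus", torus), ("genus 2", genus_two)):
    print(name, euler_characteristic_from_morse(idx))
```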


Topological Sigma Model 


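Let $\Sigma$ be a Riemann surface with complex structure
$\varepsilon$ and let $(M, \omega)$ be a symplectic manifold with a
compatible almost-complex structure J. Let $\mathcal{E}$ be a
vector bundle over $\mathrm{Map}(\Sigma, M)$ so that the fiber over
u is $\mathcal{E}_u = \Gamma(u^*TM \otimes T^*\Sigma)$. For any $u \in \mathrm{Map}(\Sigma, M)$,
$du \in \mathcal{E}_u$ and $u \mapsto du$ is a section of $\mathcal{E}$. The pullback
of the Levi-Civita connection on TM, tensored with
a connection on $T^*\Sigma$, defines a connection on $\mathcal{E}$.

The vector bundle to which we apply the Mathai–Quillen
formalism is the antiholomorphic part $\mathcal{E}^{01}$ of $\mathcal{E}$.
The fiber over $u \in \mathrm{Map}(\Sigma, M)$ is $\mathcal{E}^{01}_u = \Gamma((u^*TM \otimes T^*\Sigma)^{01})$.
The sub-bundle $\mathcal{E}^{01}$ has a connection $\nabla^{01}$ via
projection from $\mathcal{E}$. $\mathcal{E}^{01}$ has a natural section
$s\colon u \mapsto \bar\partial_J u = \tfrac12 (du + J \circ du \circ \varepsilon)$. Solutions to the
equation $\bar\partial_J u = 0$ are pseudoholomorphic (or
J-holomorphic) curves; let $\mathcal{M} = s^{-1}(0)$ be the space of
such curves. Its (virtual) dimension is

$$\dim \mathcal{M} = \tfrac12\, \chi(\Sigma) \dim M + 2 c_1(u^*TM) \qquad [39]$$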
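
As a quick arithmetic check of [39], the sketch below evaluates the formula for degree-d maps from the sphere to the complex projective plane, where the real dimension of the target is 4 and the first Chern number of the pulled-back tangent bundle is 3d. The function name and the example are assumptions made here; the result is the dimension of the space of parametrized maps, before dividing by reparametrizations.

```python
def gw_virtual_dimension(euler_char_sigma, dim_target, c1_number):
    """Formula [39]: (1/2) chi(Sigma) dim M + 2 c_1(u^* TM); dim_target is the (even) real dimension."""
    assert dim_target % 2 == 0, "an almost-complex manifold has even real dimension"
    return euler_char_sigma * dim_target // 2 + 2 * c1_number

# Degree-1 and degree-2 maps from the sphere (chi = 2) to CP^2 (real dimension 4, c_1 = 3d).
print(gw_virtual_dimension(2, 4, 3))
print(gw_virtual_dimension(2, 4, 6))
# Maps from a torus (chi = 0): only the Chern number contributes.
print(gw_virtual_dimension(0, 4, 3))
```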
Along any $V \in T_u\mathrm{Map}(\Sigma, M) = \Gamma(u^*TM)$, the covariant
derivative of $s = \bar\partial_J$ is calculated in Wu (1995):

$$\nabla^{01}_V(\bar\partial_J u) = \tfrac12 \left(\nabla^u V + J \circ \nabla^u V \circ \varepsilon\right) + \tfrac14\, \nabla_V J \circ (du \circ \varepsilon + J \circ du) \qquad [40]$$

where $\nabla^u$ is the pullback connection on $u^*TM$.

To write the Mathai–Quillen formalism for the
bundle $\mathcal{E}^{01} \to \mathrm{Map}(\Sigma, M)$, we let $\psi \in \Gamma(u^*TM)$ and
$\chi \in \Gamma((u^*TM \otimes T^*\Sigma)^{01})$ be fermionic fields. Equation
[23] becomes the Lagrangian


Llu,w,x] = tdu]? + (du, J o du o€) 
- V—1(x, V” + (VJ) o du oe) 
-4 (X (R, p- EVD [41] 


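It is precisely the Lagrangian of the topological
sigma model of Witten (1988b). Here, the pairing
$(\cdot,\cdot)$ is induced by the Riemannian metric $\omega(\cdot, J\cdot)$
on M and a metric on $\Sigma$ that is compatible with $\varepsilon$.
The second term in [41], integrated over $\Sigma$, is equal
to $\int_\Sigma u^*\omega = \langle [\omega], u_*[\Sigma] \rangle$.

For any differential form $\alpha \in \Omega^p(M)$, let $O_\alpha(u, \psi)$
be the observable obtained from $\mathrm{ev}^*\alpha \in \Omega^p(\Sigma \times \mathrm{Map}(\Sigma, M))$
by identifying $\Omega^*(\mathrm{Map}(\Sigma, M))$ with
$C(\mathrm{Map}(\mathbb{R}^{0|1}, \mathrm{Map}(\Sigma, M)))$. If $\alpha$ is closed and
$\gamma \in H_q(\Sigma)$ is a homology cycle, then $W_{\alpha,\gamma}(u, \psi) = \int_\gamma O_\alpha(u, \psi)$
is identified with a closed (p − q)-form
on $\mathrm{Map}(\Sigma, M)$. For closed $\alpha_i \in \Omega^{p_i}(M)$ and
$\gamma_i \in H_{q_i}(\Sigma)$ $(1 \le i \le r)$, the expectation values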


i=1 


= fiad I Wanted paz 


i=1 


are the Gromov–Witten invariants of $(M, \omega)$. Moreover,
[42] is nonzero only if $\sum_{i=1}^r (p_i - q_i) = \dim \mathcal{M}$.


Topological Gauge Theory 


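Let M be a compact, oriented 4-manifold, G a
compact, semisimple Lie group, and $P \to M$ a
principal G-bundle. Denote by $\mathcal{A}$ the space of
connections on P and by $\mathcal{G}$ the group of gauge
transformations. The Lie algebra of $\mathcal{G}$ is
$\mathrm{Lie}(\mathcal{G}) = \Gamma(\mathrm{ad}\,P) = \Omega^0(M, \mathrm{ad}\,P)$. At $A \in \mathcal{A}$, the tangent space is
$T_A\mathcal{A} = \Omega^1(M, \mathrm{ad}\,P)$. Both spaces have inner products
if we choose an invariant inner product $(\cdot,\cdot)$ on the Lie
algebra $\mathfrak{g}$ of G and a Riemannian metric g on M. The
infinitesimal action of $\mathcal{G}$ on $\mathcal{A}$ is
$C = \nabla_A\colon \mathrm{Lie}(\mathcal{G}) \to T_A\mathcal{A}$.

With a Riemannian metric, any 2-form on M
decomposes into self-dual and anti-self-dual parts:
$\Omega^2(M) = \Omega^2_+(M) \oplus \Omega^2_-(M)$. We consider a trivial
vector bundle $\mathcal{E} \to \mathcal{A}$ whose fiber is $\Omega^2_+(M, \mathrm{ad}\,P)$.
$\mathcal{G}$ acts on $\mathcal{E}$ and the bundle is $\mathcal{G}$-equivariant. The
trivial connection on $\mathcal{E}$ is $\mathcal{G}$-invariant; the moment is
given by $\phi \in \Gamma(\mathrm{ad}\,P)\colon \chi \in \Omega^2_+(M, \mathrm{ad}\,P) \mapsto [\phi, \chi]$. The
bundle $\mathcal{E}$ has a natural section $s\colon A \in \mathcal{A} \mapsto F_A^+$, the
self-dual part of the curvature. Its derivative along
$V \in \Omega^1(M, \mathrm{ad}\,P) = T_A\mathcal{A}$ is $L_V s = (\nabla_A V)^+$. The section
s is $\mathcal{G}$-invariant, the zero set $s^{-1}(0)$ is the space
of anti-self-dual connections, and the quotient
$\mathcal{M} = s^{-1}(0)/\mathcal{G}$ is the instanton moduli space. Its
(virtual) dimension is

$$\dim \mathcal{M} = 4 h(\mathfrak{g})\, k(P) - \tfrac12 \dim G\, (\chi(M) + \sigma(M))$$

where $h(\mathfrak{g})$ is the dual Coxeter number of $\mathfrak{g}$ and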


k(P) = (p1(AdP), [M]) € Z 


a 


is the instanton number of P. 
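
The dimension formula is easy to evaluate in familiar cases. The sketch below is an illustration under assumptions made here: the function name is hypothetical, the instanton number k is passed in directly as an integer, and the sample data (SU(2) with dual Coxeter number 2 and dimension 3; the 4-sphere with Euler characteristic 2 and signature 0; a K3 surface with Euler characteristic 24 and signature −16) are standard values not taken from the text.

```python
def instanton_virtual_dimension(dual_coxeter, k, dim_group, euler_char, signature):
    """4 h(g) k(P) - (1/2) dim G (chi(M) + sigma(M)); chi + sigma is even on a closed oriented 4-manifold."""
    assert (euler_char + signature) % 2 == 0
    return 4 * dual_coxeter * k - dim_group * (euler_char + signature) // 2

# SU(2) on the 4-sphere, k = 1 and k = 2.
print(instanton_virtual_dimension(2, 1, 3, 2, 0))
print(instanton_virtual_dimension(2, 2, 3, 2, 0))
# SU(2) on a K3 surface, k = 2.
print(instanton_virtual_dimension(2, 2, 3, 24, -16))
```

For SU(2) on the 4-sphere this reproduces the familiar count 8k − 3.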

We proceed with the Mathai–Quillen interpretation
of Atiyah and Jeffrey (1990). Let $\psi \in \Omega^1(M, \mathrm{ad}\,P)$,
$\chi \in \Omega^2_+(M, \mathrm{ad}\,P)$, $\eta \in \Gamma(\mathrm{ad}\,P)$ be fermionic fields and
$\phi, \lambda \in \Gamma(\mathrm{ad}\,P)$, bosonic fields. The combination of
[34] and [36] is given by the Lagrangian


LIA, Y, b, x, 7, Al 
1m T 
= 5 IFAII + (¢, V4 VAA) 


— V —1(1, VaV) — V-1(x, Va) 
— V —1(A, [, %]) 


e kdt nm) oA K3 


Here, $(\cdot,\cdot)$ is the pairing induced by a Riemannian
metric on M and an invariant inner product on $\mathfrak{g}$.
With an additional topological term proportional to
$(F_A, \wedge F_A)$, [43] is the Lagrangian of topological
gauge theory of Witten (1988a).

There is a tautological connection on the
G-bundle $\mathcal{A} \times P \to \mathcal{A} \times M$. It is invariant under the
$\mathcal{G}$-action. Identifying $\Omega^*(\mathcal{A})$ with $C(\mathrm{Map}(\mathbb{R}^{0|1}, \mathcal{A}))$
and using the Cartan model, the $\mathcal{G}$-equivariant
curvature is $\mathcal{F} = F_A + \sqrt{-1}\,\psi + \phi$. For any homology
cycle $\gamma \in H_q(M)$,





W,(A, b,¢)=— J (F, AF) [44 


4h(q) i 


corresponds to a closed $\mathcal{G}$-equivariant form on $\mathcal{A}$.
For $\gamma_i \in H_{q_i}(M)$ $(1 \le i \le r)$, the expectation values


i=1 


4 1 
(T] w, ) = gg | AAs 


x [[ WA, p, pje Sern 45] 
7=1 


are, up to a factor of $|Z(G)|$, Donaldson invariants
of M. Moreover, [45] is nonzero only if
$\sum_{i=1}^r (4 - q_i) = \dim \mathcal{M}$.

Other cohomological field theories can also be
understood or constructed by the Mathai–Quillen
formalism. Of such we mention only the topological
field theories of abelian and nonabelian monopoles
in Labastida and Mariño (1995), which are related
to the Seiberg–Witten invariants.


See also: Characteristic Classes; Donaldson–Witten
Theory; Equivariant Cohomology and the Cartan Model;
K-Theory; Topological Quantum Field Theory: Overview;
Topological Sigma Models.
Fundamental Concepts of the
Topological Theory of Knots and Links


The first known discovery relating to knots as 
mathematical objects was made by Gauss around 
1833 in a note that refers to the knotting together of 
closed curves. This investigation originated in his work 
on electromagnetic theory that led him to compute 
inductance in a system of two linked circular wires. In 
this note he had given an analytic formula for the 
linking number of a pair of knotted curves. This 
number is a combinatorial topological invariant (it is 
an integer number). Moreover, one can now show that 
this number is invariant under Reidemeister moves 
(discussed in a later section). The linking coefficient
can be generalized to the case of p- and q-dimensional
manifolds in $\mathbb{R}^{p+q+1}$. For parametrized
curves $\gamma_1(t)$ and $\gamma_2(t)$ with radius vectors $r_1(t)$, $r_2(t)$, the
linking number is given by the following formula:

$$\mathrm{lk}(\gamma_1, \gamma_2) = \frac{1}{4\pi} \int_{\gamma_1} \int_{\gamma_2} \frac{(r_1 - r_2,\ dr_1,\ dr_2)}{|r_1 - r_2|^3} \qquad [1]$$

where $(a, b, c)$ denotes the triple product $a \cdot (b \times c)$.

The linking coefficient allows us to distinguish some
two-component links. Another approach to the linking
coefficient is that involving Seifert surfaces. (On this
subject, see the section “Isotopies, Reidemeister
moves, torus knots, and the linking number.”)
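
The Gauss integral can be evaluated numerically. The following is a rough sketch, assuming a midpoint discretization of both curves; the helper names and the sample circles (a Hopf link and an unlinked pair) are illustrative choices made here. The overall sign depends on the chosen orientations, so only the absolute value is compared with 1.

```python
import math

def circle(center, u, v, n=200):
    """n points on the closed curve center + cos(t) u + sin(t) v."""
    pts = []
    for i in range(n):
        t = 2.0 * math.pi * i / n
        pts.append(tuple(center[k] + math.cos(t) * u[k] + math.sin(t) * v[k] for k in range(3)))
    return pts

def segments(curve):
    """Midpoint and displacement of each edge of the closed polygon through the points."""
    segs = []
    n = len(curve)
    for i in range(n):
        p, q = curve[i], curve[(i + 1) % n]
        mid = tuple((p[k] + q[k]) / 2.0 for k in range(3))
        step = tuple(q[k] - p[k] for k in range(3))
        segs.append((mid, step))
    return segs

def gauss_linking_number(curve1, curve2):
    """Midpoint-rule approximation of the double integral [1]."""
    segs2 = segments(curve2)
    total = 0.0
    for m1, d1 in segments(curve1):
        for m2, d2 in segs2:
            r = (m1[0] - m2[0], m1[1] - m2[1], m1[2] - m2[2])
            cross = (d1[1] * d2[2] - d1[2] * d2[1],
                     d1[2] * d2[0] - d1[0] * d2[2],
                     d1[0] * d2[1] - d1[1] * d2[0])
            triple = r[0] * cross[0] + r[1] * cross[1] + r[2] * cross[2]
            total += triple / (r[0] ** 2 + r[1] ** 2 + r[2] ** 2) ** 1.5
    return total / (4.0 * math.pi)

unit_circle = circle((0, 0, 0), (1, 0, 0), (0, 1, 0))
linked = circle((1, 0, 0), (1, 0, 0), (0, 0, 1))      # passes through the first disc
unlinked = circle((5, 0, 0), (1, 0, 0), (0, 0, 1))    # far away from the first circle

print(gauss_linking_number(unit_circle, linked))
print(gauss_linking_number(unit_circle, unlinked))
```

Reversing the orientation of one of the two curves flips the sign of the result.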

A systematic study of knots in $\mathbb{R}^3$, however, was
only begun in the second half of the nineteenth 
century by Tait and his followers. They were 
motivated by Kelvin’s theory of atoms modeled on 
knotted vortex tubes of ether. It was expected that 
physical and chemical properties of various atoms 
could be expressed in terms of properties of knots 
such as the knot invariants. Even though Kelvin’s 
theory did not work, the theory of knots grew as a 
subfield of combinatorial and algebraic topology. 
Recently, new invariants of knots have been 
discovered and they have led to the solution of 
long-standing problems in knot theory. Surprising 
connections between the theory of knots and 
statistical mechanics, quantum groups, and quantum 
field theory are emerging. Moreover, knot theory 
has been shown to be intimately connected with 
many problems in physics, chemistry, and biology. 

Tait classified the knots in terms of the crossing
number of a regular projection. A regular projection
of a knot on a plane is an orthogonal projection of
the knot such that, at any crossing in the projection,
exactly two strands intersect transversely. He made
a number of observations about some general
properties of knots which have come to be known
as the “Tait conjectures.” In its simplest form, the
classification problem for knots can be stated as
follows. Given a projection of a knot, is it possible
to decide in finitely many steps if it is equivalent to
an unknot? This question was answered affirmatively
by W Haken in 1961. (For details, see Burde
and Zieschang (1985).)


General Notions and Definitions 


Let M be a closed orientable 3-manifold. A smooth 
embedding of S! in M is called a knot in M. A link 
in M is a finite collection of disjoint knots. The 
number of disjoint knots in a link is called the 
number of components of the link. Thus, a knot can 
be considered as a link with one component. Two 
links L, L′ in M are said to be equivalent if there
exists a smooth orientation-preserving automorphism
$f\colon M \to M$ such that f(L) = L′. For links with
two or more components, we require f to preserve a
fixed given ordering of the components. Such a
function f is called an ambient isotopy and L and L′
are called ambient isotopic. Here, we shall take M to
be $S^3 \cong \mathbb{R}^3 \cup \{\infty\}$ and simply write “a link” instead
of “a link in $S^3$.” The diagrams of links are drawn as
links in $\mathbb{R}^3$. A link diagram of L is a plane projection
with crossings marked as over or under. The 
simplest combinatorial invariant of a knot K is the 
crossing number c(K). It is defined as the minimum 
number of crossings in any projection of the knot K. 
The classification of knots up to crossing number 17 
is now known. The crossing numbers of some 
special families of knots are known; however, the 
question of finding the crossing number of an 
arbitrary knot is still unanswered. Another combi- 
natorial invariant of a knot K that is easy to define is 
the unknotting number u(K). It is defined as the 
minimum number of crossing changes in any 
projection of the knot K which makes it into a 
projection of the unknot. Upper and lower bounds 
for u(K) are known for any knot K. An explicit 
formula for u(K) for a family of knots called torus 
knots, conjectured by Milnor nearly 40 years ago, 
has been proved recently by a number of different 
methods. The 3-manifold $S^3 \setminus K$ is called the knot
complement of K. The fundamental group $\pi_1(S^3 \setminus K)$
of the knot complement is an invariant of the knot
K. It is called the fundamental group of the knot and
is denoted by $\pi_1(K)$. Equivalent knots have homeomorphic
complements and conversely. However,
this result does not extend to links. (For details
and a proof, see Manturov (2004), chapter 4.)


The Fundamental Group of Knots and 
Its Role in Topology 


For a better understanding of the above consider- 
ations, we need to introduce briefly the important 
concept of fundamental group in topology. The 
fundamental group plays an essential role in 
topology; it is involved in the entire technical 
apparatus of the subject, and likewise in all 
applications of topological methods. In fact, for 
low-dimensional manifolds (i.e., of dimension 2 or 
3) the fundamental group underlies all nontrivial 
topological facts. 

Classical knot theory is concerned with the space
$S^3 \setminus K = M$, an open 3-manifold. There is a natural
embedding of the torus $T^2$ in M, namely as the
boundary of a small tubular neighborhood of the knot
K. Similarly, for a link we obtain a disjoint union of
2-tori in M. The principal topological invariant of a
knot K is the fundamental group $\pi_1(M)$ of the
complement M of K, with distinguished subgroup
the natural image of $\pi_1(T^2)$, $T^2 \subset M$, with the
obvious standard basis. The classical theorem of
Papakyriakopoulos of the 1950s asserts that a knot
is equivalent to the trivial one if and only if $\pi_1(M)$ is
abelian. It was shown by Haken in the early 1960s
that there is an algorithm for deciding whether or
not any knot is equivalent to the trivial knot.
However, while it appears to have been established
(by Waldhausen and others in the 1960s and 1970s)
that two knots are topologically equivalent if and
only if the corresponding fundamental groups with
labeled abelian subgroups are isomorphic, the
existence of an appropriate algorithm for deciding
such equivalence remains an open question. The
complexity of the knot group $\pi_1(M)$ has led to the
search for more effectively computable invariants to
distinguish knots and links. (On this subject, see the
section “Polynomial invariants of knots and links.”)

Starting with the oriented diagram of the knot or
link K on the plane, one calculates in the standard
manner (see Crowell and Fox (1963) and Neuwirth
(1965)) a presentation of the group $\pi_1(M)$ of the
knot ($M = S^3 \setminus K$), obtaining one generator for each
edge of the diagram (of a trefoil knot, say) and a pair of
relations for each crossing. Since one relation of
each such pair simply equates the pair of generators
corresponding to the edges forming the upper
branch of the crossing, the presentation reduces
immediately to the standard one involving the same
number of generators and relations. The 2-complex
L with exactly one 0-cell, and with 1-cells labeled by
generators and 2-cells labeled by the relations, is
then a deformation retract of M. Lifting to the
universal cover we obtain a boundary operator on a
complex of free $\mathbb{Z}[\pi]$-modules, which takes the form
of a square matrix with entries from this group ring,
and it is this matrix that is related to some
differentiation as follows. Denoting the generators by
$a_i$ and relators by $r_j$, one defines the operator $\partial_{a_i}$ by

$$\partial_{a_i}(a_j) = \delta_{ij}, \qquad \partial_{a_i}(bc) = \partial_{a_i}(b) + b\, \partial_{a_i}(c);$$

the matrix in question then has entries $q_{ij}$ given by

$$q_{ij} = \partial_{a_i}(r_j).$$

Mapping each generator $a_i$ to t, we obtain a
complex of modules over the ring of integer Laurent
polynomials, with boundary operator the corresponding
square matrix now with Laurent polynomials
as entries. The determinant of this matrix
turns out to be zero, and the highest common factor
of its cofactors, after multiplication by a suitable
power of t, turns out to be just the Alexander
polynomial $\Delta(t)$.
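
The differentiation rules above are straightforward to implement once every generator is mapped to t. The sketch below is an illustration under assumptions made here: it uses the reduced two-generator, one-relator presentation of the trefoil group with relator $a b a b^{-1} a^{-1} b^{-1}$ rather than the full diagram presentation described in the text, and for such a presentation either single entry of the resulting 1 × 2 matrix already gives the Alexander polynomial up to a unit $\pm t^k$. The function names are hypothetical.

```python
def fox_derivative_abelianized(word, gen):
    """Derivative of a word with respect to gen, followed by sending every generator to t.

    word is a list of (generator, exponent) pairs with exponent +1 or -1.
    The result maps powers of t to integer coefficients.
    """
    poly = {}
    power = 0  # image in t of the prefix read so far
    for g, e in word:
        if g == gen:
            if e == 1:
                poly[power] = poly.get(power, 0) + 1          # d(a)/da = 1
            else:
                poly[power - 1] = poly.get(power - 1, 0) - 1  # d(a^-1)/da = -a^-1
        power += e
    return {k: c for k, c in poly.items() if c != 0}

def normalize(poly):
    """Multiply by a unit so that the lowest term is a positive constant."""
    lowest = min(poly)
    sign = 1 if poly[lowest] > 0 else -1
    return {k - lowest: sign * c for k, c in poly.items()}

# Relator a b a b^-1 a^-1 b^-1 of the trefoil group <a, b | aba = bab>.
trefoil = [("a", 1), ("b", 1), ("a", 1), ("b", -1), ("a", -1), ("b", -1)]

print(normalize(fox_derivative_abelianized(trefoil, "a")))
print(normalize(fox_derivative_abelianized(trefoil, "b")))
```

Both entries normalize to the coefficients of $1 - t + t^2$.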

Let us say a bit more on this question, using slightly
different notation. Let $\Delta_q(K)$ and $J_q(K)$ be
the Alexander polynomial and the Jones polynomial,
respectively. One of the earliest problems in knot
theory was: to what extent does the topological type $\mathcal{X}$
of the complementary space $X = S^3 \setminus K$ and/or the
isomorphism class $\mathcal{G}$ of its fundamental group
$G(K) = \pi_1(X, x_0)$ suffice to classify knots? The trefoil
knot is the simplest example of a nontrivial knot, so it
seems remarkable that, not long after the discovery
of the fundamental group of a topological space,
Max Dehn (1914) succeeded in proving that the
trefoil knot and its mirror image had isomorphic
groups, but their knot types were distinct. Dehn’s
(ingenious) proof was the beginning of a long story,
with many contributions which reduced repeatedly
the number of distinct knot types that could have
homeomorphic complements and/or isomorphic
groups, until it was finally proved, quite recently,
that (1) $\mathcal{X}$ determines K and (2) if K is prime, then $\mathcal{G}$
determines K up to unoriented equivalence. Thus,
there are at most four distinct oriented prime knot
types which have the same knot group.

The knot group G is finitely presented; however,
it is infinite, torsion-free, and (if K is not the unknot)
nonabelian. Its isomorphism class is in general not
easily understood via a direct attack on the problem.
In such circumstances, the obvious thing to do is to
pass to the abelianized group, but unfortunately
$G/[G, G] \cong H_1(X; \mathbb{Z})$ is infinite cyclic for all knots,
so it is of no use in distinguishing knots. Passing to
the covering space $\tilde X$ that belongs to $[G, G]$, we note
that there is a natural action of the cyclic group
$G/[G, G]$ on $\tilde X$ via covering translations. The
action makes the homology group $H_1(\tilde X; \mathbb{Z})$ into a
$\mathbb{Z}[q, q^{-1}]$-module, where q is the generator of
$G/[G, G]$. This module turns out to be finitely
generated. It is the famous Alexander module. While
the ring $\mathbb{Z}[q, q^{-1}]$ is not a principal ideal domain
(PID), relevant aspects of the theory of modules over
a PID apply to $H_1(\tilde X; \mathbb{Z})$. In particular, it splits as a
direct sum of cyclic modules, the first nontrivial one
being $\mathbb{Z}[q, q^{-1}]/\Delta_q(K)$. Thus, $\Delta_q(K)$ is the generator
of the “order ideal,” and the smallest nontrivial
torsion coefficient in the module $H_1(\tilde X)$. In
particular, $\Delta_q(K)$ is very clearly an invariant of the
knot group.

We remark that when a knot is replaced by its
mirror image (i.e., the orientation on $S^3$ is reversed),
the Alexander and Jones polynomials $\Delta_q(K)$ and
$J_q(K)$ go over to $\Delta_{q^{-1}}(K)$ and $J_{q^{-1}}(K)$, respectively.
As noted earlier, $\Delta_q(K)$ is invariant under such a
change, but from the simplest example, the trefoil
knot, we see that $J_q(K)$ is not. Now recall that G
does not change under changes in the orientation of
$S^3$. This simple argument shows that $J_q(K)$ cannot be
a group invariant! Thus, it seems interesting indeed
to ask about the underlying topology behind the
Jones polynomial.
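
The mirror-image argument can be made concrete for the trefoil. In the sketch below, Laurent polynomials are stored as dictionaries from exponents to coefficients; the Alexander polynomial $1 - q + q^2$ and the Jones polynomial $q + q^3 - q^4$ of one of the two trefoils (which one depends on orientation conventions) are standard values supplied here as assumptions, and the helper names are hypothetical.

```python
def mirror(poly):
    """Substitute q -> 1/q in a Laurent polynomial {exponent: coefficient}."""
    return {-k: c for k, c in poly.items()}

def equal_up_to_unit(p, q):
    """True if q equals p multiplied by some unit +-q^k of the Laurent polynomial ring."""
    shift = min(q) - min(p)
    for sign in (1, -1):
        if {k + shift: sign * c for k, c in p.items()} == q:
            return True
    return False

alexander_trefoil = {0: 1, 1: -1, 2: 1}   # 1 - q + q^2
jones_trefoil = {1: 1, 3: 1, 4: -1}       # q + q^3 - q^4

print(equal_up_to_unit(alexander_trefoil, mirror(alexander_trefoil)))
print(equal_up_to_unit(jones_trefoil, mirror(jones_trefoil)))
```

The first comparison succeeds (the mirrored Alexander polynomial differs only by the unit $q^{-2}$), while the second fails, so the Jones polynomial distinguishes the trefoil from its mirror image.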


Isotopies, Reidemeister Moves, Torus 
Knots, and the Linking Number 


Because each knot is a smooth embedding of $S^1$ in
$\mathbb{R}^3$, it can be arbitrarily closely approximated by an
embedding of a closed broken line in $\mathbb{R}^3$. Here we
mean a good approximation such that after a very
small smoothing (in the neighborhood of all vertices)
we obtain a knot from the same isotopy class.
However, generally this might not be the case.


Definition 1 An embedding of a disjoint union of
n closed broken lines in $\mathbb{R}^3$ is called a polygonal
n-component link. A polygonal knot is a polygonal
one-component link.


Definition 2 A link is called tame if it is isotopic to
a polygonal link and wild otherwise.


All $C^1$-smooth knots are tame. In the sequel, all
knots are taken to be smooth, hence, tame.


Definition 3 Two polygonal links are isotopic if
one of them can be transformed to the other by
means of an iterated sequence of elementary
isotopies and reverse transformations. The
elementary isotopy, generally, is assumed to be a
replacement of an edge with two edges provided
that the triangle spanned by these three edges has no
intersection points with other edges of the link.


It can be proved that the isotopy of smooth links
corresponds to that of polygonal links; the proof is
technically complicated. Like smooth links, polygonal
links admit planar diagrams with overcrossings
and undercrossings; given such a diagram, one
can restore the link up to isotopy.


Definition 4 By a planar isotopy of a smooth-link 
planar diagram we mean a diffeomorphism of the 
plane onto itself not changing the combinatorial 
structure of the diagram. 


Obviously, planar isotopy is an isotopy, that is, it
does not change the link isotopy type in $\mathbb{R}^3$.


Theorem 1 (Reidemeister) Two diagrams $D_1$ and
$D_2$ of smooth links generate isotopic links if and
only if $D_1$ can be transformed into $D_2$ by using a
finite sequence of planar isotopies and the three
Reidemeister moves $\Omega_1$, $\Omega_2$, $\Omega_3$.


Theorem 2 Suppose that D and D' are regular
diagrams of two knots (or links) K and K',
respectively. Then $K \approx K' \Leftrightarrow D \approx D'$.


We may conclude from the above theorems that 
the problem of equivalence of knots, in essence, is 
just a problem of the equivalence of regular 
diagrams. Therefore, a knot (or link) invariant may 
be thought of as a quantity that remains unchanged 
when we apply any one of the Reidemeister moves 
to a regular diagram. 

Knots and links embedded in $\mathbb{R}^3$ can be considered
as curves (families of curves) in 2-surfaces,
where the latter surfaces are standardly embedded in
$\mathbb{R}^3$. In this section we shall briefly show that all
knots and links can be obtained in this manner.

Consider a handle surface $S_g$ standardly embedded
in $\mathbb{R}^3$ and a curve (knot) K in it. We can now ask the
following question: which knot isotopy classes can
appear for a fixed g? First, let us note that for g = 0
there exists only one knot embeddable in $S^2$, namely
the unknot. The case g = 1 (torus, torus knots) gives
us some interesting information. Consider the torus
as a Cartesian product $S^1 \times S^1$ with coordinates
$\phi, \psi \in [0, 2\pi]$, where $2\pi$ is identified with 0. In two
dimensions, the torus can be illustrated as a square
with opposite sides identified. Let us embed this torus
standardly in $\mathbb{R}^3$; more precisely,

$$(\phi, \psi) \mapsto \left((R + r\cos\psi)\cos\phi,\ (R + r\cos\psi)\sin\phi,\ r\sin\psi\right) \qquad [2]$$

Here R is the outer radius of the torus, r the small
radius (r < R), $\phi$ the longitude, and $\psi$ the meridian.
For the classification of torus knots we shall need
the classification of isotopy classes of nonintersecting
curves in $T^2$: obviously, two curves isotopic in
$T^2$ are isotopic in $\mathbb{R}^3$. Without loss of generality, we
can assume the considered closed curve to pass
through the point $(0, 0) = (2\pi, 2\pi)$. It can intersect
the edges of the square several times. In addition,
assume all these intersections to be transverse. Let us
calculate separately the algebraic number of intersections
with horizontal edges and those with
vertical edges. Here, passing through the right edge
or through the upper edge is said to be positive; that
through the left or the lower edge is negative. Thus,
for each curve of such type we obtain a pair of
integer numbers. So, each torus knot passes p times
the longitude of the torus, and q times its meridian,
where GCD(p, q) = 1. It is easy to see that for any
coprime p and q such a curve exists: one can just
take the geodesic line $\{q\phi - p\psi = 0 \pmod{2\pi}\}$. Let us
denote the torus knot by T(p, q). So, in order to
classify torus knots, one should consider pairs of
coprime numbers p, q and see which of them can be
isotopic in the ambient space $\mathbb{R}^3$. The simplest case
is when either p or q equals 1. The next simplest
example of a pair of coprime numbers is p = 3, q = 2
(or p = 2, q = 3). In each of these cases we obtain the
trefoil knot. Let us state the following important
result.


Theorem 3 For any coprime integers p and q, the
torus knots (p, q) and (q, p) are isotopic.


Proof For a proof of this theorem, see Rolfsen
(1990). Note that the (p, q) torus knot in one full
torus is just the (q, p) torus knot in the other one.
Thus, mapping one full torus to the other one, we
obtain an isotopy of (p, q) and (q, p) torus knots.
This homotopy of full tori can be expressed as a
continuous process in $S^3$. Indeed, torus knots of type
(p, q) can be represented by a series of planar
diagrams. Moreover, it is possible to demonstrate a
way of coding a knot (link) as a (p-strand) braid
closure.
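A minimal numerical sketch of the embedding [2] and of the curve $q\phi - p\psi = 0 \pmod{2\pi}$ may help fix the conventions. The function names, the default radii R = 2, r = 1 and the sampling parameter n are illustrative assumptions, not part of the text.

```python
import math

def torus_point(phi, psi, R=2.0, r=1.0):
    """Embedding [2]: map torus coordinates (phi, psi) to a point of R^3."""
    return ((R + r * math.cos(psi)) * math.cos(phi),
            (R + r * math.cos(psi)) * math.sin(phi),
            r * math.sin(psi))

def torus_knot(p, q, n=200, R=2.0, r=1.0):
    """Sample n + 1 points of the curve q*phi - p*psi = 0 (mod 2*pi).

    The curve is traversed as phi = p*s, psi = q*s for s in [0, 2*pi],
    so it winds p times along the longitude and q times along the meridian.
    """
    if math.gcd(p, q) != 1:
        raise ValueError("p and q must be coprime to obtain a single closed curve")
    return [torus_point(p * s, q * s, R, r)
            for s in (2 * math.pi * k / n for k in range(n + 1))]

# Sample points of T(3, 2), one of the two descriptions of the trefoil.
trefoil_points = torus_knot(3, 2)
```

Every sampled point satisfies $(\sqrt{x^2+y^2} - R)^2 + z^2 = r^2$, and the first and last points coincide, which is what one expects of a closed curve on the standard torus.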

Analogously to the case of torus knots, one can
define torus links, which are links embedded into the
torus standardly embedded in $\mathbb{R}^3$. We know the
construction of torus knots. So, in order to draw a
torus link, one should take a torus knot $K \subset T$ (one
can assume that it is represented by a straight linear
curve defined by the equation $q\phi - p\psi = 0 \pmod{2\pi}$)
and add to the torus T some closed nonintersecting
simple curves; each curve should be nonintersecting
and should not intersect K. Thus, these curves
should be embedded in $T \setminus K$, that is, in the open
cylinder. Each curve on the cylinder is either
contractible or passes the longitude of the cylinder
once. So, each curve in $T \setminus K$ is either contractible
inside $T \setminus K$, or “parallel” to K inside T, that is,
isotopic to the curve given by the equation
$q\phi - p\psi = \varepsilon \pmod{2\pi}$ inside $T \setminus K$. Thus, the following
theorem holds.


Theorem 4 Each torus link is isotopic to the
disconnected sum of a trivial link and a link that is
represented by a set of parallel torus knots of the
same type (p, q).


As we already know, a link invariant is a function 
defined on links that is invariant under isotopies. We 
shall represent links by using their planar diagrams. 
According to the Reidemeister theorem, in order to 
prove the invariance of some function on links, it is 
sufficient to check this invariance under the three 
Reidemeister moves. First, let us consider the 
simplest integer-valued invariant of two-component 
links. Let L be a link consisting of two oriented 
components A and B and let L’ be the planar 
diagram of L. Consider those crossings of the 
diagram L’ where the component A goes over the 
component B. There are two possible types of such 
crossings with respect to the orientation. For each 
positive crossing we assign the number (+1), for 
each negative crossing we assign the number (—1). 
Let us sum these numbers over all crossings
where the component A goes over the component B.
Thus, we obtain some integer number and, in fact,
this number is invariant under Reidemeister moves.
The link invariant so obtained is called the linking
coefficient.
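The counting rule just described can be sketched in a few lines. The encoding of a diagram as a list of (over component, under component, sign) triples is a hypothetical convention chosen for this illustration; the two sample diagrams are likewise written by hand.

```python
def linking_coefficient(crossings, over="A", under="B"):
    """Sum the signs of the crossings at which `over` passes over `under`.

    Each crossing is a tuple (over_component, under_component, sign),
    where sign is +1 for a positive crossing and -1 for a negative one.
    """
    total = 0
    for over_c, under_c, sign in crossings:
        if sign not in (1, -1):
            raise ValueError("crossing sign must be +1 or -1")
        if over_c == over and under_c == under:
            total += sign
    return total

# Hand-written encoding of a two-crossing Hopf link diagram with both
# crossings positive: A passes over B once, and B passes over A once.
HOPF = [("A", "B", +1), ("B", "A", +1)]

# Two unlinked circles after a Reidemeister II move: A passes over B
# twice, with opposite signs.
SPLIT = [("A", "B", +1), ("A", "B", -1)]
```

For the Hopf diagram the sum is 1 whichever component is taken as the upper one, while for the split diagram the two contributions cancel, in line with invariance under the second Reidemeister move.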


Polynomial Invariants of Knots and Links 


By changing a link diagram at one crossing we can
obtain three diagrams corresponding to links
$L_+$, $L_-$, and $L_0$ which are identical except for this
crossing. In the 1920s, Alexander gave an algorithm
for computing a polynomial invariant $\Delta_K(t)$
(a Laurent polynomial in t) of a knot K, called the
Alexander polynomial, by using its projection on a
plane. He also gave its topological interpretation as
an annihilator of a certain cohomology module
associated to the knot K. In the 1960s, Conway
defined his polynomial invariant and gave its
relation to the Alexander polynomial. This polynomial
is called the Alexander-Conway polynomial.
The Alexander-Conway polynomial of an oriented
link L is denoted by $\nabla_L(z)$ or simply by $\nabla(z)$ when L
is fixed. We denote the corresponding polynomials
of $L_+$, $L_-$, and $L_0$ by $\nabla_+$, $\nabla_-$, and $\nabla_0$, respectively.
The Alexander-Conway polynomial is uniquely
determined by the following axioms.


Axiom 1 Let L and L’ be two oriented links which 
are ambient isotopic. Then 


$\nabla_L(z) = \nabla_{L'}(z)$ [3]


Axiom 2 Let S be the standard unknotted circle
embedded in $S^3$. It is usually referred to as the
unknot and is denoted by O. Then


$\nabla_O(z) = 1$ [4]


Axiom 3 The polynomial satisfies the following 
skein relation: 


$\nabla_+(z) - \nabla_-(z) = z\nabla_0(z)$ [5]


We note that the original Alexander polynomial
$\Delta_L$ is related to the Alexander-Conway polynomial
of an oriented link L by the relation


$\Delta_L(t) = \nabla_L(t^{1/2} - t^{-1/2})$ [6]
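As an illustration of relation [6], the substitution $z = t^{1/2} - t^{-1/2}$ can be carried out mechanically. The sketch below stores a Laurent polynomial in $s = t^{1/2}$ as a dictionary from exponents to coefficients; this representation and the function names are assumptions made for the example. It uses the well-known Conway polynomials $1 + z^2$ of the trefoil and $1 - z^2$ of the figure-eight knot.

```python
def multiply(f, g):
    """Multiply two Laurent polynomials in s = t^(1/2), stored as {exponent: coefficient}."""
    out = {}
    for i, a in f.items():
        for j, b in g.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return {e: c for e, c in out.items() if c != 0}

def conway_to_alexander(conway_coeffs):
    """Substitute z = s - s^(-1) into sum_k c_k z^k, with conway_coeffs = [c_0, c_1, ...]."""
    z = {1: 1, -1: -1}
    result, power = {}, {0: 1}          # power holds z^k, starting with z^0 = 1
    for c in conway_coeffs:
        for e, a in power.items():
            result[e] = result.get(e, 0) + c * a
        power = multiply(power, z)
    return {e: c for e, c in result.items() if c != 0}

# Exponents are powers of s = t^(1/2), so exponent 2 means t and -2 means t^(-1).
TREFOIL_ALEXANDER = conway_to_alexander([1, 0, 1])        # from 1 + z^2
FIGURE_EIGHT_ALEXANDER = conway_to_alexander([1, 0, -1])  # from 1 - z^2
```

Read in the variable t, the two results are $t - 1 + t^{-1}$ and $-t + 3 - t^{-1}$ respectively.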


In the 1980s, Jones discovered his polynomial
invariant $V_L(t)$, called the Jones polynomial, while
studying von Neumann algebras and gave its
interpretation in terms of statistical mechanics. A
state model for the Jones polynomial was then
given by Kauffman (1987) using his bracket
polynomial. These new polynomial invariants have
led to the proofs of most of the Tait conjectures.
The Jones polynomial $V_K(t)$ of K is a Laurent
polynomial in t, which is uniquely determined by a
simple set of properties similar to the axioms for
the Alexander-Conway polynomials. More generally,
the Jones polynomial can be defined for any
oriented link L as a Laurent polynomial in $t^{1/2}$, so
that reversing the orientation of all components of
L leaves $V_L$ unchanged. In particular, $V_K$ does not
depend on the orientation of the knot K. For a
fixed link, we denote the Jones polynomial simply
by V. Recall that there are three standard ways to
change a link diagram at a crossing point. The
Jones polynomial is characterized by the following
properties:


1. Let L and L’ be two oriented links which are
ambient isotopic. Then


$V_{L'}(t) = V_L(t)$ [7]

2. Let O denote the unknot. Then

$V_O(t) = 1$ [8]

3. The polynomial satisfies the following skein
relation:


$t^{-1}V_+ - tV_- = (t^{1/2} - t^{-1/2})V_0$ [9]

An important property of the Jones polynomial that
is not shared by the Alexander-Conway polynomial
is its ability to distinguish between a knot and its
mirror image. More precisely, we have the following
result. Let $K_m$ be the mirror image of the knot K.
Then


$V_{K_m}(t) = V_K(t^{-1})$ [10]


Since the Jones polynomial is not symmetric in $t$ and
$t^{-1}$, it follows that in general


$V_{K_m}(t) \neq V_K(t)$ [11]


We note that a knot is called amphicheiral (achiral 
in biochemistry) if it is equivalent to its mirror 
image. We shall use the simpler biochemistry term. 
So, a knot that is not equivalent to its mirror image 
is called chiral. The condition expressed by [11] is 
sufficient but not necessary for chirality of a knot. 
The Jones polynomial did not resolve the following 
conjecture by Tait concerning chirality: if the cross- 
ing number of a knot is odd, then it is chiral. 
However, it has been demonstrated recently that a 
15-crossing knot provides a counterexample to the 
chirality conjecture. 
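The mirror-image test based on [10] and [11] is easy to sketch. Below, a Jones polynomial of a knot is stored as a dictionary from integer exponents of t to coefficients (an assumed representation). The two sample polynomials are the standard ones for a trefoil, $t + t^3 - t^4$ (which handedness this corresponds to depends on the sign convention), and for the figure-eight knot.

```python
def mirror(jones):
    """Jones polynomial of the mirror image: replace t by t^(-1), as in [10]."""
    return {-e: c for e, c in jones.items()}

def detects_chirality(jones):
    """True if V_K(t) differs from V_K(t^(-1)); by [11] this suffices for K to be chiral."""
    return mirror(jones) != jones

TREFOIL = {1: 1, 3: 1, 4: -1}                        # t + t^3 - t^4
FIGURE_EIGHT = {-2: 1, -1: -1, 0: 1, 1: -1, 2: 1}    # t^-2 - t^-1 + 1 - t + t^2
```

The trefoil polynomial is not symmetric under $t \mapsto t^{-1}$, so the trefoil is chiral. The figure-eight polynomial is symmetric, which is consistent with that knot being amphicheiral; symmetry alone would not prove it, since the condition is sufficient but not necessary.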


New Invariants and Their Applications 
in Mathematical Physics 


There was an interval of nearly 60 years between the 
discovery of the Alexander polynomial and the Jones 
polynomial. Since then a number of polynomials 
and other invariants of knots and links have been 
found. A particularly interesting one is the two-variable
polynomial generalizing V, called the
HOMFLY polynomial (a name formed from the
initials of the authors of the article (Freyd et al. 1985))
and denoted by P. The HOMFLY polynomial
P(a, z) satisfies the following skein relation:


$a^{-1}P_+ - aP_- = zP_0$ [12]


Both the Jones polynomial $V_L$ and the Alexander-Conway
polynomial $\nabla_L$ are special cases of the
HOMFLY polynomial. The precise relations are
given by the following theorem.


Theorem 5 Let L be an oriented link. Then the
polynomials $P_L$, $V_L$, and $\nabla_L$ satisfy the following
relations:


$V_L(t) = P_L(t, t^{1/2} - t^{-1/2})$ and $\nabla_L(z) = P_L(1, z)$ [13]
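As a worked check of [13], one can compute the HOMFLY polynomial of a trefoil directly from the skein relation [12]. The sketch assumes the normalization $P_O = 1$ for the unknot (not stated explicitly above) and uses the standard resolutions: changing one crossing of the Hopf link H gives the two-component unlink OO, and changing one crossing of the positive trefoil T gives the unknot, with the Hopf link as the smoothing.

```latex
\begin{align*}
a^{-1}P_O - aP_O &= zP_{OO}
  &&\Rightarrow\quad P_{OO} = \frac{a^{-1}-a}{z},\\
a^{-1}P_H - aP_{OO} &= zP_O
  &&\Rightarrow\quad P_H = az + \frac{a-a^{3}}{z},\\
a^{-1}P_T - aP_O &= zP_H
  &&\Rightarrow\quad P_T = 2a^{2} - a^{4} + a^{2}z^{2}.
\end{align*}
% Specializations according to [13]:
\begin{align*}
P_T(1,z) &= 1 + z^{2} = \nabla_T(z),\\
P_T\bigl(t,\,t^{1/2}-t^{-1/2}\bigr)
  &= 2t^{2} - t^{4} + t^{2}\bigl(t - 2 + t^{-1}\bigr) = t + t^{3} - t^{4} = V_T(t).
\end{align*}
```

Both specializations reproduce the familiar Conway and Jones polynomials of the trefoil, which illustrates how the single relation [12] contains the two skein relations [5] and [9].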


After defining his polynomial invariant, Jones also 
established the relation of some knot invariants with 
statistical mechanical models. Since then this has 
become a very active area of research. By
constructing a typical statistical mechanics model
(the star-triangle relations of the Yang-Baxter
equations are an example of such a model) one
obtains a state model for the Alexander or the Jones
polynomial of a knot, by associating to the knot a
statistical system, whose partition function


$Z_k := \sum_{s} E_k(s)\,w(s)$ [14]


gives the corresponding polynomial. (For details, see
Jones (1989).) In the function above, $w: F(X, S) \to \mathbb{R}$
is a weight function and the sum is taken over all
states $s \in F(X, S)$. The energy $E_k$ of the system (X, S)
is a functional,


$E_k: F(X, S) \to \mathbb{R}, \quad k \in K$ [15]


where the subscript k € K indicates the dependence 
of energy on the set K of auxiliary parameters, such 
as temperature, pressure, etc. 

However, these statistical models did not provide 
a geometrical or topological interpretation of the 
polynomial invariant. Such an interpretation was 
provided by Witten (1989) by applying ideas from 
quantum field theory to the Chern-Simons Lagran- 
gian. In fact, Witten’s model allows us to consider 
the knot and link invariants in any compact 
3-manifold M. 


Vassiliev Invariants and the Space 
of All Knots: New Generalizations 
of Knot Theory 


An entirely new collection of knot invariants, 
which arose out of techniques pioneered by Arnold 
in singularity theory, has been introduced by V A 
Vassiliev in the 1990s. The knot invariants, like 
the Alexander polynomial, associate a knot with 
some sort of mathematical quantity. A Vassiliev 
invariant, on the other hand, is an invariant that 
satisfies a set of conditions. In this sense, the
invariants introduced above (the Jones polynomial,
the HOMFLY and the Kauffman polynomials,
the Conway polynomial, and the Alexander
polynomial) can all be shown to yield Vassiliev
invariants. However, not all knot invariants are
Vassiliev invariants; for instance, the signature of a
knot is not a Vassiliev invariant. The new Vassiliev
invariants have a solid basis in a very interesting
new topology, where one studies not a single knot,
but a space of all knots. Vassiliev’s knot invariants
are rational numbers. They lie in vector spaces $V_i$ of
dimension $d_i$, i = 1, 2, 3, ..., with invariants in $V_i$
having “order” i. These invariants are built from
different families of crossing changes.


Considering Vassiliev’s invariants requires
introducing an important conceptual change: shifting
our attention from the knot K, which is the
image of $S^1$ under an embedding $\phi: S^1 \to S^3$, to the
embedding $\phi$ itself. A knot type K thus becomes an
equivalence class $\{\phi\}$ of embeddings of $S^1$ into $S^3$.
The space of all such equivalence classes of embeddings
is disconnected, with a component for each
smooth knot type. In this way, one passes from
embeddings to smooth maps, thereby admitting
maps which have various types of singularities. Let
$\tilde{M}$ be the space of all smooth maps from $S^1$ to $S^3$.
This space is connected and contains all knot types.
Our space will remain connected and will contain all
knot types if we place two mild restrictions on our
maps. Let M denote the collection of all $\phi \in \tilde{M}$
such that $\phi(S^1)$ passes through a fixed point a and is
tangent to a fixed direction at a. The space M has
some interesting properties, the main one being that
it can be approximated by certain affine spaces, and
these affine spaces contain representatives of all
knot types. The walls between distinct chambers in
M constitute the discriminant $\Sigma$, that is, $\Sigma$ is the set of
all $\phi \in M$ such that $\phi$ has a multiple point or a place where its
derivative vanishes or other singularities. The space
$M - \Sigma$ is our space of all knots.

The additive properties of the Alexander and 
Jones polynomials have a very attractive interpreta- 
tion in terms of Vassiliev invariants. By a result of 
Bar-Natan, all coefficients of the Alexander poly- 
nomial are Vassiliev invariants (see Bar-Natan 
(1995)). The same can be said of the Jones 
polynomial, as proved by a theorem of Birman and 
Lin (1993). There is an attractive formula due to 
Kontsevich expressing all Vassiliev invariants ana- 
lytically in terms of multiple integrals, assuming that 
the knot or link diagram comes with some generic 
Morse function (e.g., the projection of the planar 
diagram on the y-axis). Moreover, from the work of
Kontsevich it follows that it is possible to give a
purely combinatorial characterization of all Vassiliev
invariants (other than the one mentioned above)
by associating to an oriented knot K in $\mathbb{R}^3$ (given via
coordinates z = z(t) (= x(t) + iy(t)), t) a chord diagram,
which is just a circle with 2k distinct points labeled
$P_j, Q_j$, j = 1, 2, ..., k, marked on it, and by imposing
certain relations on the free abelian group freely
generated by all chord diagrams.


Theorem 6 Let $V_K(t)$ be the Jones polynomial of a
knot K. Let $V_K(q)$ be the infinite series obtained
from $V_K(t)$ by substituting $e^q$ ($= 1 + q + q^2/2! + \cdots =
\sum_{n=0}^{\infty} q^n/n!$) for t. So we may write


$V_K(q) = b_0 + b_1 q + b_2 q^2 + \cdots = \sum_{m=0}^{\infty} b_m q^m$

Then $J_m(K) = b_m$ is a Vassiliev invariant induced by
the Jones polynomial of order (at most) m.
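The coefficients $b_m$ of Theorem 6 can be computed exactly from the coefficients of the Jones polynomial: if $V_K(t) = \sum_k c_k t^k$, then $b_m = \sum_k c_k k^m / m!$. The sketch below assumes integer exponents (as for knots) and reuses the standard Jones polynomials of a trefoil and of the figure-eight knot; the function name is hypothetical.

```python
from fractions import Fraction
from math import factorial

def jones_series_coefficients(jones, order):
    """Return [b_0, ..., b_order] with V_K(e^q) = sum_m b_m q^m.

    `jones` maps integer exponents k to the coefficients c_k of t^k,
    so that b_m = sum_k c_k * k**m / m!.
    """
    return [sum(Fraction(c * k ** m, factorial(m)) for k, c in jones.items())
            for m in range(order + 1)]

TREFOIL_JONES = {1: 1, 3: 1, 4: -1}                        # t + t^3 - t^4
FIGURE_EIGHT_JONES = {-2: 1, -1: -1, 0: 1, 1: -1, 2: 1}    # t^-2 - t^-1 + 1 - t + t^2

TREFOIL_B = jones_series_coefficients(TREFOIL_JONES, 3)
FIGURE_EIGHT_B = jones_series_coefficients(FIGURE_EIGHT_JONES, 2)
```

In both examples $b_0 = 1$ and $b_1 = 0$, reflecting $V_K(1) = 1$ and $V_K'(1) = 0$ for knots; the first coefficient that distinguishes the two knots is $b_2$, which equals $-3$ for the trefoil and $3$ for the figure-eight knot.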


The structure and significance of the HOMFLY and 
Kauffman polynomials can be interpreted in the 
language of Vassiliev invariants, which are invariants 
of finite type. The notion of finite type is of 
extraordinary significance in studying these invariants. 
One reason for this is the following basic lemma: 


Lemma 7 If a graph G (an embedded 4-valent
graph) has exactly k nodes, then the value of a
Vassiliev invariant $v_k$ of type k on G, $v_k(G)$, is
independent of the embedding of G.


Let us show briefly this important result. Suppose
V is any invariant of oriented links taking values in
some abelian group. This V can be extended to be
an invariant of singular links in the following way
(Kauffman 2001): a singular link is an immersion
of simple closed curves in $S^3$ with finitely many
transverse double points. These self-intersections are
required to remain transverse in any isotopy
demonstrating the equivalence of such singular
links. If the definition of V has been extended over
singular links with n − 1 double points, define it on
a singular link $L_\times$ with n singularities by


$V(L_\times) = V(L_+) - V(L_-)$


where $L_\times$, $L_+$, and $L_-$ are identical except
near a point where $L_\times$ has a node. Note that
$L_+$ and $L_-$ each has n − 1 double points.
Then V is called a Vassiliev invariant of order n, or
an invariant of finite type n, if V(L) = 0 for every
L with n + 1 or more singularities. Recall the
Alexander-Conway polynomial invariant, $\nabla_L(z) \in
\mathbb{Z}[z]$, of oriented links defined by $\nabla_{\mathrm{unknot}}(z) = 1$ and


$\nabla_{L_+}(z) - \nabla_{L_-}(z) = z\nabla_{L_0}(z)$


Extend this over singular links by the above method.
Then if $L_\times$ is a link with r singularities, $\nabla_{L_\times}(z) =
z\nabla_{L_0}(z)$, where $L_0$ is a link with r − 1 singularities.
Thus, by induction on r, if L has r singularities then
$\nabla_L(z)$ has a factor of $z^r$. This implies at once that the
coefficient of $z^n$ in the Conway polynomial of a link
is a Vassiliev invariant of order n. Now suppose one
considers the HOMFLY polynomial and makes the
substitution $(l, m) = (i t^{N/2}, i(t^{-1/2} - t^{1/2}))$. The
characterizing skein relation becomes


$t^{N/2}P(L_+) - t^{-N/2}P(L_-) = (t^{1/2} - t^{-1/2})P(L_0)$


Note that this becomes the Jones polynomial when
N = 2. Now make the further substitution t = exp x.
Here exp x should be thought of as the classical
power series expansion. Of course, exp(x/2) and
exp(−x/2) have power series expansions; the power
series can be multiplied and added to give another
power series. Thus, P(L) has a power series
expansion in powers of x. It follows immediately
that $P(L_+) - P(L_-) = xS(x)$ for some power series
S(x). Hence, the proof used for the Conway
polynomial shows at once that the coefficient of $x^n$
in the power series expansion of P(L) is a Vassiliev
invariant of order n.

All present studies of Vassiliev invariants clearly 
indicate a major role of these invariants in the future 
developments of knot theory and topological quantum
field theories. Many questions in knot theory
remain open; nevertheless, in the future it will very likely
be one of the most fruitful and beautiful subjects of
research in mathematics and in mathematical physics.
Knot theory also attracts attention from the fact that 
it is revealing new astounding and profound links 
between geometry, algebra, and topology. 


See also: Finite-Type Invariants; The Jones Polynomial; 
Knot Invariants and Quantum Gravity; Knot Theory and 
Physics; Kontsevich Integral; String Topology: Homotopy 
and Geometric Perspectives; Topological Knot Theory 
and Macroscopic Physics; Topological Quantum Field
Theory: Overview.


Introduction and Models


Rarely has a paper with as simple a title as “A solvable
model of a spin glass” had such a tremendous impact
on both physics and mathematics as the seminal 
paper of 1972 by Sherrington and Kirkpatrick, 
which introduced what is now known as the 
Sherrington—Kirkpatrick (SK) mean-field spin glass 
model. As solvable as it might have appeared to the 
authors, it was soon found that the heuristic 
solution, based on the so-called replica method, 
was physically unacceptable. The reason was a tacit 
assumption, now known as replica symmetry, that 
proved unfounded. Several years later, Giorgio 
Parisi provided an ingenious way out through his 
continuous replica symmetry-breaking scheme, which
presented a solution that, through its complexity 
and intrinsic beauty, both stunned and fascinated 
the community. Unraveling the mysteries involved in 
this solution has presented a challenge and driving 
force for the last three decades of mathematical 
statistical mechanics, while the use of the method in 
theoretical physics opened the path to solving a wide 
variety of problems not only in the theory of 
disordered magnets, but also in neural networks 
and combinatorial optimization. In this article the 
focus is on the mathematical results obtained in the 
study of this and a number of related models. 


Mean-Field Models 


Mean-field models have played an important role in
statistical mechanics by providing simple, solvable
models in which some of the complex phenomena,
such as phase transitions, could be studied and understood.
For example, the Curie-Weiss model of a
ferromagnet describes N spin variables $\sigma_i$ (taking values
±1) in interaction. The simplifying assumption compared
to more realistic models, such as the Ising model,
is to ignore the spatial structure of the model and allow
all spins to interact with each other with equal strength.
This leads to a Hamiltonian function of the form


$H_N(\sigma) = -\frac{J}{2N}\sum_{i,j=1}^{N}\sigma_i\sigma_j + h\sum_{i=1}^{N}\sigma_i$ [1]


where J is a coupling constant and h a magnetic
field. This form of the interaction implies that the
Hamiltonian is in fact a function of the
empirical magnetization $m_N(\sigma) = N^{-1}\sum_{i=1}^{N}\sigma_i$, and
this allows one to use tools from the theory of
large deviations to analyze rather easily the corresponding
Gibbs measures


$\mu_{\beta,N}(\sigma) = \frac{e^{-\beta H_N(\sigma)}}{Z_{\beta,N}}$ [2]
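The statement that the Curie-Weiss Hamiltonian depends on the configuration only through the empirical magnetization can be checked by brute force for small N. The sketch assumes the form $H_N(\sigma) = -\frac{J}{2N}\sum_{i,j}\sigma_i\sigma_j + h\sum_i\sigma_i$; the sign convention of the field term and the function names are assumptions of this example.

```python
from itertools import product

def hamiltonian_pairwise(sigma, J=1.0, h=0.0):
    """Curie-Weiss energy written as a double sum over all pairs (i, j)."""
    N = len(sigma)
    pair_sum = sum(si * sj for si in sigma for sj in sigma)
    return -J / (2 * N) * pair_sum + h * sum(sigma)

def hamiltonian_from_magnetization(sigma, J=1.0, h=0.0):
    """The same energy as a function of m = m_N(sigma) only: N * (-J/2 * m^2 + h * m)."""
    N = len(sigma)
    m = sum(sigma) / N
    return N * (-J / 2 * m ** 2 + h * m)

def max_discrepancy(N, J=1.0, h=0.3):
    """Largest difference between the two expressions over all 2^N configurations."""
    return max(abs(hamiltonian_pairwise(s, J, h) - hamiltonian_from_magnetization(s, J, h))
               for s in product((-1, 1), repeat=N))
```

Since the energy takes only N + 1 distinct values, one per value of the magnetization, the partition function reduces to a sum over m weighted by binomial coefficients, which is the starting point for the large-deviation analysis.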


The SK Model 


This model was a straightforward attempt to 
introduce a mean-field version of models with 
randomly interacting spins. The interest in such 
models arose from the discovery of certain alloys of 
ferromagnets and conductors (e.g., AuFe and 
CuMn) that had been found to exhibit very unusual 
magnetic properties. Ruderman and co-workers had
proposed that in these models the magnetic ions
with magnetic moments $S_i$ and $S_j$ located at the
points $x_i$ and $x_j$ would interact via an exchange
interaction of the form

$J(x_i - x_j) \sim \frac{\cos(k_F(x_i - x_j))}{|x_i - x_j|^{3}}$

Since the positions of the magnetic ions in the alloy
are random, the signs of their interaction would be
oscillatory. Anderson proposed a simplified model,
in the spirit of the Ising model, where spins taking
values ±1 located on a regular lattice would interact
via nearest-neighbor couplings $J_{ij}$ modeled as i.i.d.
random variables uniformly distributed on an interval
[−J, J]. In the spirit of the Curie-Weiss model,
Sherrington and Kirkpatrick then proposed the
mean-field model where any two spins would
interact via i.i.d. Gaussian random variables $J_{ij}$ of
mean zero and variance one. The SK Hamiltonian is
thus given by


$H_N^{SK}(\sigma) = -\frac{1}{\sqrt{N}}\sum_{1\le i<j\le N} J_{ij}\sigma_i\sigma_j + h\sum_{i=1}^{N}\sigma_i$ [3]


where the normalization is chosen to ensure that the
variance of $H_N$ is an extensive quantity. Although
the two Hamiltonians superficially look similar, the 
main feature that allows one to solve the Curie- 
Weiss model is absent in the SK model: there is no 
way to write the Hamiltonian as a function of 
macroscopic variable(s) such as the magnetization. 
This implies that all methods known to solve the
Curie-Weiss model fail here. The approach used
systematically in the physics literature to overcome
this difficulty is to try to compute the mean free
energy $f_{\beta,N} = -(1/(\beta N))\,E\ln Z_{\beta,N}$ using the formal
identity $\ln x = \lim_{q\downarrow 0} q^{-1}(x^q - 1)$. For $q \in \mathbb{N}$, one
easily sees that (putting h = 0)


This expression looks already more like the parti- 
tion function of an ordinary mean-field model, and 
the computation with standard methods seemed 
feasible. However, passage to the limit $q \downarrow 0$ remains
a highly risky enterprise, and it took the genius of 
Parisi to develop an approach that provided at least 
a physically meaningful and convincing answer. 
The replica method being dealt with elsewhere in 
this encyclopedia, this approach is not explained 
any further here, although we will explain the 
nature of the result in the light of recent rigorous 
work later on. 
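For orientation, the integer moments of the partition function can be sketched as follows. The computation assumes the SK Hamiltonian with $1/\sqrt{N}$ normalization, couplings $J_{ij}$ that are i.i.d. standard Gaussians, and h = 0; it uses only $E\,e^{cJ} = e^{c^2/2}$ and writes $R_N(\sigma,\sigma') = N^{-1}\sum_i \sigma_i\sigma'_i$ for the overlap. The replica index a and the final rewriting in terms of overlaps are notation introduced for this sketch.

```latex
\begin{align*}
E\,Z_{\beta,N}^{\,q}
 &= \sum_{\sigma^1,\dots,\sigma^q}
    E\exp\Bigl(\frac{\beta}{\sqrt{N}}\sum_{1\le i<j\le N} J_{ij}
       \sum_{a=1}^{q}\sigma_i^a\sigma_j^a\Bigr)\\
 &= \sum_{\sigma^1,\dots,\sigma^q}
    \exp\Bigl(\frac{\beta^2}{2N}\sum_{1\le i<j\le N}
       \Bigl(\sum_{a=1}^{q}\sigma_i^a\sigma_j^a\Bigr)^{2}\Bigr)\\
 &= \sum_{\sigma^1,\dots,\sigma^q}
    \exp\Bigl(\frac{\beta^2 N}{4}\sum_{a,b=1}^{q}
       R_N(\sigma^a,\sigma^b)^{2} - \frac{\beta^2 q^2}{4}\Bigr)
\end{align*}
```

The last line depends on the q replicas only through their mutual overlaps, which is why it resembles the partition function of an ordinary mean-field model in the enlarged spin space.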


Site Disordered Models 


The difficulties encountered with the random-bond 
interactions led readily to proposals of mean-field 
models that were closer to the Curie-Weiss model — 
from the point of view that they allowed the 
Hamiltonian to be written as a function of macro- 
scopic variables. The most important of these 
models was introduced by Figotin and Pastur. Here
the disorder was introduced as an M-dimensional
vector $\xi_i$ for each site i. The components of this
vector are usually taken as i.i.d. random variables $\xi_i^\mu$
taking values ±1 with equal probability. One can
then introduce M-dimensional vectors as macroscopic
variables that generalize the magnetization
with components


$m_N^\mu(\sigma) = N^{-1}\sum_{i=1}^{N}\xi_i^\mu\sigma_i$


The Hamiltonian can then be written as


$H_N(\sigma) = -N\sum_{\mu=1}^{M}\bigl(m_N^\mu(\sigma)\bigr)^2 = -\frac{1}{N}\sum_{i,j=1}^{N}\sum_{\mu=1}^{M}\xi_i^\mu\xi_j^\mu\sigma_i\sigma_j$

These models were indeed found to be solvable with
tools similar to those used in the Curie-Weiss case;
however, they proved disappointing in that the
solution did not show the characteristic features
expected in a spin glass. In fact, it turns out that
these models behave very much like a mean-field
ferromagnet, except that they display not just
two equilibrium states at low temperatures, but 2M
of them, concentrated on spin configurations $\sigma$ for
which $m_N(\sigma)$ takes values close to one of the values
$\pm m^*(\beta)e_\mu$, where $e_\mu$ is the $\mu$-th unit vector in $\mathbb{R}^M$ and
$m^*(\beta)$ solves the equation $m = \tanh(\beta m)$ known
from the Curie-Weiss model. This model might
have been forgotten, had it not been rediscovered in
1982 by Hopfield in the context of neural networks.
Hopfield realized that if the $\sigma_i$ are interpreted as
the activation states (“firing” and “not firing”) of
neurons in the brain, the form of the interaction in
this model is exactly the one proposed earlier by
Hebb for synaptic interaction between neurons
having “learned” the M “patterns” $\xi^\mu$ in the past.
He went on to interpret $H_N(\sigma)$ as the Lyapounov
function of the retrieval algorithm by which the
brain would recognize the learned pattern. Naturally,
the fact that the configurations $\xi^\mu$ are minima
of $H_N$ then implies the functioning of the algorithm.
The important observation of Hopfield was that,
based on numerical experiments, the algorithm
failed when M became too large. In fact, he
observed a breakdown of the memory if M >
0.14N. This meant that the interesting asymptotics
in this model required considering M as an
increasing function of N. This regime was not
covered by large-deviation-type results and an
intensive program to investigate this model was
initiated. Again, the replica method could be
employed and yielded a very rich structure of the
model, including an explanation of the findings of
Hopfield. These models also turned out to be an
important starting point for the rigorous analysis.
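The two ways of writing the energy of this model, through the overlaps $m_N^\mu(\sigma)$ and through Hebbian pair couplings, can be compared on a toy example. The sketch assumes the normalization $H_N(\sigma) = -N\sum_\mu (m_N^\mu(\sigma))^2$ (a different constant prefactor would rescale both expressions equally); the function names and the two hand-picked orthogonal patterns with N = 8 are illustrative assumptions.

```python
def overlaps(sigma, patterns):
    """Components m^mu(sigma) = (1/N) * sum_i xi_i^mu * sigma_i."""
    N = len(sigma)
    return [sum(x * s for x, s in zip(xi, sigma)) / N for xi in patterns]

def energy_from_overlaps(sigma, patterns):
    """H_N(sigma) = -N * sum_mu m^mu(sigma)^2."""
    return -len(sigma) * sum(m ** 2 for m in overlaps(sigma, patterns))

def energy_from_couplings(sigma, patterns):
    """The same energy as -(1/N) * sum_{i,j} (sum_mu xi_i^mu xi_j^mu) sigma_i sigma_j."""
    N = len(sigma)
    total = 0.0
    for i in range(N):
        for j in range(N):
            coupling = sum(xi[i] * xi[j] for xi in patterns)  # Hebb-type coupling
            total += coupling * sigma[i] * sigma[j]
    return -total / N

def flip(sigma, k):
    """Return a copy of sigma with the k-th spin reversed."""
    return [-s if i == k else s for i, s in enumerate(sigma)]

# Two orthogonal toy patterns on N = 8 sites.
PATTERNS = [[1, 1, 1, 1, 1, 1, 1, 1],
            [1, 1, 1, 1, -1, -1, -1, -1]]
```

For these patterns the stored configuration $\xi^1$ has energy −8, and reversing any single spin raises the energy to −5, so $\xi^1$ is a local minimum under single-spin flips, in line with the retrieval interpretation described above.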


Gaussian Processes and Derrida’s Models 


While the models discussed so far were motivated
from the point of view of randomly interacting
spins, Derrida had the consequential idea to view
the Hamiltonian of such a model simply as a
random process indexed by the set of all spin
configurations. In the case of the SK model, this
process was, moreover, a Gaussian process and thus
characterized entirely by its mean and covariance. For
h = 0 we see that


$E\,H_N^{SK}(\sigma)\,H_N^{SK}(\sigma') = \frac{N}{2}\bigl(R_N(\sigma,\sigma')\bigr)^2 - \frac{1}{2}$

where $R_N(\sigma,\sigma') = N^{-1}\sum_{i=1}^{N}\sigma_i\sigma'_i$ is usually called the
overlap. This opened the view to a much larger
class of models. In particular, the simplest model
from this perspective corresponds to taking $H_N(\sigma)$ as
a process of i.i.d. random variables. Derrida called
this the random-energy model (REM). He also noted
that it could be seen as the limit of a sequence of the
so-called p-spin SK models corresponding to the
covariance of the Hamiltonian being $N(R_N(\sigma,\sigma'))^p$.
On the other hand, Derrida observed that another
class of models could be defined that were easier to
analyze while exhibiting many of the complex
properties of the SK model. These are obtained by
choosing the covariance not as a function of the
overlap (resp. the Hamming distance), but of an
ultrametric distance related to $d_N(\sigma,\sigma') = N^{-1}(\inf
\{i: \sigma_i \neq \sigma'_i\} - 1)$. These models, called generalized
random-energy models (GREM), were analyzed by
Derrida and Gardner in the 1980s and are now the
only models where the full predictions of the Parisi
theory can be rigorously justified. This is discussed
in some detail later.
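The quantity $d_N$ measures how long two configurations agree from the first site on, and $1 - d_N$ satisfies the strong triangle inequality that defines an ultrametric. The brute-force check below is a sketch for small N; the convention $d_N(\sigma,\sigma) = 1$ for identical configurations (infimum of the empty set taken as N + 1) and the function names are assumptions of this example.

```python
from itertools import product

def d_N(sigma, tau):
    """N^(-1) * (inf{i : sigma_i != tau_i} - 1), with sites numbered from 1."""
    N = len(sigma)
    for i, (s, t) in enumerate(zip(sigma, tau)):  # i is 0-based, i.e. (site number - 1)
        if s != t:
            return i / N
    return 1.0  # identical configurations: the infimum is taken to be N + 1

def is_ultrametric(N):
    """Check dist(a, c) <= max(dist(a, b), dist(b, c)) for dist = 1 - d_N on {-1, 1}^N."""
    configs = list(product((-1, 1), repeat=N))

    def dist(a, b):
        return 1 - d_N(a, b)

    return all(dist(a, c) <= max(dist(a, b), dist(b, c))
               for a in configs for b in configs for c in configs)
```

The inequality holds because if a and b agree on their first k sites and b and c on their first l sites, then a and c agree at least on the first min(k, l) sites. This tree-like structure is what makes the GREM easier to analyze than models whose covariance depends on the overlap.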


Further Models and Applications 


There is a wealth of problems that can be
interpreted in terms of disordered mean-field
models, and which may be analyzed using methods
developed here. Some of the most notable ones
that have received more attention lately are the following.
The perceptron, a feed-forward neural network,
was analyzed first by Gardner using the replica
method. Very recently, Shcherbina and Tirozzi gave
a rigorous justification of this result. The
p-satisfiability problem is an important problem in
computer science that also can be analyzed with the
replica method. Rigorous results are still very
limited. The number partitioning problem can be
formulated as a random-energy model. Also, the
most famous problem in combinatorial optimization,
the traveling salesman problem, can be solved
heuristically with the replica method. Another
emerging field is the application to coding theory.


Formulation of the Problem 


Given a model, that is, a Hamiltonian function
defined as a random process, the ultimate goal is
to describe the asymptotic properties of the
corresponding Gibbs measure, ideally identifying
a (random or deterministic) limiting measure, as a
function of the temperature, $\beta^{-1}$, and other
parameters, such as the magnetic field h.

The first steps in this direction concern global
properties:


• Does the ground-state energy density,


$\lim_{N\to\infty} N^{-1}\max_{\sigma\in S_N} H_N(\sigma)$


converge (in what sense?) and what is the limit?
• What is the limit of the free energy

$f_{\beta,N} = -\frac{1}{\beta N}\ln Z_{\beta,N}$ ?


It was noted in the mid-1990s that such
quantities are usually self-averaging, for example,
in the sense that

$\lim_{N\to\infty}\bigl(f_{\beta,N} - E f_{\beta,N}\bigr) = 0$, a.s.

due to the concentration of measure phenomenon.
However, until very recently, the existence of the
limits was considered an open problem in most of
the models described above. Guerra and Toninelli
(2002) discovered that a clever use of comparison
inequalities for convex functions of Gaussian
processes allows one to prove a priori the existence
of limits at least in the case of models based on
Gaussian processes (SK, GREM). The main task is
the computation of the values of the limit.

If the free energy is known as a function of 
sufficiently many parameters, one can frequently 
compute a number of correlation functions that 
characterize the limiting measure as well. What one 
should compute is somewhat model dependent. 


Geometry of Gibbs Measures 
and Multi-Overlap Distributions 


The problem of satisfactorily describing the asymptotic
geometric properties of random Gibbs
measures on $\{-1,1\}^N$ is rendered difficult as the
symmetries of the problem make the use of local
topologies seem unattractive. A reasonable way of
solving this problem is as follows. Let $D_N$ be a
distance on $S_N$ normalized so that $\max_{\sigma,\tau\in S_N}
D_N(\sigma,\tau) = 1$. Then consider the mass distribution
around any fixed point $\sigma$,


$m_\sigma(x) = \mu_{\beta,N}\bigl(D_N(\sigma,\sigma') \le x\bigr)$


and construct the biased empirical average


$K_{\beta,N} = \sum_{\sigma\in S_N}\mu_{\beta,N}(\sigma)\,\delta_{m_\sigma(\cdot)}$


The set of distributions of these random measures
is compact (with respect to the weak topology)
and thus we can expect to construct limits. The
law of $K_{\beta,N}$ is fully determined by the family of
averaged distributions of the distances between
independent copies of $\sigma$ drawn from the Gibbs
measures,


Egy (Dn(o"," o°), ..., ma” 6) 
In the SK models, one chooses
1 
Dy(o,T) =1- a 


so that these quantities can be expressed as distributions
of the overlaps $(1/N)\sum_i \sigma_i\tau_i$ between n
“replica” spin variables. In the GREM models, it is
natural to choose as distance the lexicographic distance
used in the construction of the models. In this case, the
limits of $K_{\beta,N}$ can be constructed explicitly and it was
shown that they can be expressed in terms of the size-biased
empirical family size distribution of a certain
continuous state branching process via a model-dependent
time change. Since this plays a key rôle
not only in the GREMs but in other models as well, we
will go into some detail to elucidate this structure.


Neveu’s Process and Random 
Genealogies 


The random structure of the limiting Gibbs 
measures of the GREM models (and presumably 
also the SK models, even though this is not proven) 
can be traced to a continuous-state branching 
process introduced by Neveu, and an induced 
associated random genealogy on the unit interval. 
Let $Z_t$ be a time-homogeneous continuous-time
Markov process with state space $\mathbb{R}_+$ characterized
by the Laplace transform of its transition kernel


$E\bigl(e^{-\lambda Z_t} \mid Z_0 = a\bigr) = \exp\bigl(-a\lambda^{e^{-t}}\bigr)$


Based on this process, construct a two-parameter
process Z(t, a) with the property that, for any a, b >
0, the processes Z(·, a) and Z(·, a + b) − Z(·, a)
are independent and have the same laws as $Z_t$ with
initial conditions a, resp. b. It follows that Z(t, ·) is a
stable subordinator with exponent $e^{-t}$. Now let
$\Theta_t(a) = Z(t, a)/Z(t, 1)$, as a function on [0, 1], $\Theta_t$
being a random probability distribution function (of
pure point type). Any such family $\Theta_t$ of distributions
defines in a natural way a genealogical structure on
[0, 1]. Define the ancestor of $a \in [0,1]$ at time $t \le 1$
to be $a_t(a) = \Theta_t(\Theta_t^{-1}(a))$, where $\Theta_t^{-1}$ is the right-continuous
inverse of the nondecreasing function $\Theta_t$.

We say that, for $a, a' \in [0,1]$, $\gamma(a, a') = t$ if and
only if $t = \sup(s: a_s(a) = a_s(a'))$. It is easy to see that
$1 - \gamma$ defines an ultrametric distance. We can
associate with this the size distribution of the offspring
of an ancestor at time t, $m_a(t) = |\{a': \gamma(a, a') \le t\}|$, and
its size-biased empirical distribution


$K = \int_0^1 da\,\delta_{m_a(\cdot)}$


In the GREM models, it can be shown that the
quantity $\mathcal{K}_{\beta,N}$ converges (weakly in law) to the
corresponding $\mathcal{K}$ obtained from a time change of
the family of measures $\theta_t$, namely


$$\theta^m_t = \theta_{\ln m(t)-\ln m(0)}$$


where m is a nondecreasing function that can be
computed explicitly. Namely, if $\mathbb{E}X_\sigma X_\tau =
A(D_N(\sigma,\tau))$, and $\bar a$ denotes the right-derivative
of the concave hull of A, then


$$m(x) = \min\left(\beta^{-1}\sqrt{2\ln 2/\bar a(x)},\ 1\right)$$


As explained below, similar results are expected in 
the SK models. 


Interpolation Methods and Guerra’s 
Integral Representation 


Among the very important tools for the analysis of 
Gaussian models in particular have been the inter- 
polation methods that allow one to compare 
functions of processes with different covariance. 
While these methods go back to early work on 
Gaussian processes (Slepian, Kahane), they have 
been employed with remarkable success in the 
present context. Mostly, they consist of introducing
an interpolating Hamiltonian
$H^t(\sigma) = \sqrt{t}\,H(\sigma)+\sqrt{1-t}\,K(\sigma)$,
where K is a reference Hamiltonian that has
certain properties. Given any function F of
the process (e.g., the free energy of the model), one
then represents


$$F(H) = F(K)+\int_0^1 dt\,\frac{d}{dt}F(H^t)$$


Often the derivative on the right-hand side can be 
controlled rather well, for example, because of some 
obvious positivity properties. 


Example 1 (Guerra and Toninelli). Choose 


N 


/ 
> Jya; 


i<j=M+1 


1 
0,0, = 
mÈ i+ INM 


VM = 1 


and consider the free energy F(H*) = fg y. Then, first 





$F(H^0_N) = F(H_M)+F(H_{N-M})$. On the other hand,
d 1 Goia J0i0; 
Ot = iO dij ij 
af (AN) -rin 32 P 
n 3 Jijia j 
ay =N =M) 


A key tool to be used at this stage is the so-called
Gaussian integration by parts formula,
$\mathbb{E}gf(g) = \mathbb{E}f'(g)$. Applied here, this gives


This proves superadditivity of $N\mathbb{E}f_{\beta,N}$,

$$N\mathbb{E}f_{\beta,N}\ge M\mathbb{E}f_{\beta,M}+(N-M)\mathbb{E}f_{\beta,N-M}$$


which, in turn, implies convergence of $\mathbb{E}f_{\beta,N}$ to
a limit $f_\beta$. Moreover, standard concentration
of measure estimates show then that $f_{\beta,N}$ also
converges almost surely.
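
The Gaussian integration by parts formula used above can be checked numerically. The following is only a minimal sketch: the test function $f=\tanh$, the truncation of the real line to $[-10,10]$, and the trapezoidal rule are choices made here for illustration and do not come from the text.

```python
import math

def gaussian_expectation(func, half_width=10.0, steps=20000):
    """Approximate E[func(g)] for a standard normal g with the trapezoidal rule.

    The integration range [-half_width, half_width] is an assumption of this
    sketch; the Gaussian weight is negligible outside it.
    """
    dx = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * dx
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * func(x) * math.exp(-0.5 * x * x)
    return total * dx / math.sqrt(2.0 * math.pi)

def f(x):
    # Illustrative smooth, bounded test function.
    return math.tanh(x)

def f_prime(x):
    # Derivative of tanh.
    return 1.0 - math.tanh(x) ** 2

# Left- and right-hand sides of E[g f(g)] = E[f'(g)].
lhs = gaussian_expectation(lambda x: x * f(x))
rhs = gaussian_expectation(f_prime)
print(lhs, rhs)
```

The two printed numbers should agree to quadrature accuracy; in the interpolation argument the same identity is applied to each Gaussian coupling separately.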


Example 2 (Guerra, Aizenman-Sims-Starr). A 
more complicated application of the interpolation 
method allows one to relate the free energy to 
Parisi’s solution. This was first found by Guerra 
(2003), but a different, and in some sense more 
intuitive formulation, was given later by Aizenman 
et al. (2003). It is based on the following construction.
We consider a centered Gaussian process $H_N(\sigma)$
on $S_N$ with covariance given by $Ng(R_N(\sigma,\sigma'))$ for
some even convex function $g:[-1,1]\to[0,1]$. Let
us take $F(H_N) = \ln\mathbb{E}_\sigma e^{\beta H_N(\sigma)}$ (the a priori
expectation $\mathbb{E}_\sigma$ need not be symmetric, but may incorporate
a magnetic field). Before using comparison, we now
want to go to a larger space. For this, introduce some set
$A$ equipped with some positive-definite quadratic form
$q$, normalized such that $q_{\alpha,\alpha}=1$ and $|q_{\alpha,\alpha'}|\le1$
for all $\alpha,\alpha'\in A$. Let $P_\alpha$ denote some probability measure
on $A$. Now introduce a centered Gaussian process
$\kappa_\alpha$ on $A$, independent of $H_N$, whose covariance is
given by $\mathbb{E}\kappa_\alpha\kappa_{\alpha'} = q_{\alpha,\alpha'}g'(q_{\alpha,\alpha'})-g(q_{\alpha,\alpha'})$.
Define


$$G(H_N+\sqrt{N}\kappa) = \ln\left(\mathbb{E}_\sigma\mathbb{E}_\alpha e^{\beta(H_N(\sigma)+\sqrt{N}\kappa_\alpha)}\right)$$


Obviously, $G(H_N+\sqrt{N}\kappa) = F(H_N)+F(\kappa)$, where
$F(\kappa) = \ln(\mathbb{E}_\alpha e^{\beta\sqrt{N}\kappa_\alpha})$. The amazing idea is now to
compare the process $(H_N+\sqrt{N}\kappa)$ with another process
$\eta_{\sigma,\alpha}$, whose covariance is a linear function of $R_N(\sigma,\sigma')$
(this is in some sense a Slepian's process), and that
otherwise is smaller than the covariance of
$(H_N+\sqrt{N}\kappa)$; to wit


$$\mathbb{E}\eta_{\sigma,\alpha}\eta_{\sigma',\alpha'} = N R_N(\sigma,\sigma')\,g'(q_{\alpha,\alpha'})$$


By these choices of covariances, one has that for
$x\in[-1,1]$, $y\in[0,1]$, since g is even and convex,


$$g(x)+yg'(y)-g(y)\ge xg'(y)$$


It is an immediate consequence of Kahane's theorem,
respectively the same interpolation argument
given above, that


$$\mathbb{E}G(H_N+\sqrt{N}\kappa)\le\mathbb{E}G(\eta)$$


which translates into

$$\mathbb{E}F(H_N)\le\mathbb{E}G(\eta)-\mathbb{E}F(\kappa)$$


It is clear that we can optimize this bound by
choosing $A$, $q$, and $P_\alpha$. Of course, the difficulty
would be to find such a minimum. A first
simplification of this optimization problem is to
consider, instead of a deterministic structure of $P_\alpha$
and $q$, random probability measures on the space of
probability measures and quadratic forms on $A$, to
average the preceding equation with respect to
their laws, and then take the infimum over all such
random structures. This gives a (still incalculable)
bound that Aizenman et al. (2003) have shown to be
asymptotically sharp, that is, they showed that


co eg) = eee) Bi) 


where $\mu$ is short for all probability measures on the
space of $(P_\alpha, q_{\alpha,\alpha'})$ on $A$ (called “random overlap
structures” (rosts) in Aizenman et al. (2003)). Guerra's
bound consists in restricting the infimum to a class of
rosts where the bound is calculable ‘explicitly’.
Maybe unsurprisingly, this is exactly the class of
asymptotic models that have already arisen in the
GREMs. In fact, we set $A=[0,1]$,
$\mathcal{M}=\{m:[0,1]\to[0,1],\ \text{nondecreasing}\}$,
let $q$ be the random genealogical distance
associated to the family of measures $\theta^m$,
and let $P_\alpha$ be the probability measure on $A$ whose
distribution function is $\theta^m_1(a)$. Then Guerra's bound
states that


lim EF(Hn) = Lee EG(n) — EF(«) 
where the expectations relate to all random quan- 
tities involved. By self-averaging, the same result 
holds almost surely. The right-hand side of this 
equation is known as (a particular formulation of) 
the famous Parisi solution. In fact, define the
function $f(q,y)$ as the solution of the nonlinear
partial differential equation


$$\partial_q f+\frac{1}{2}\left(\partial_y^2 f+m(q)(\partial_y f)^2\right) = 0$$

with final conditions


$$f(1,y) = \ln\cosh\beta y$$

These equations can be solved by elementary means
in the case when m is a step function. It turns out 
that, for given m, 


$$\mathbb{E}G(\eta)-\mathbb{E}F(\kappa) = f(0,h,m,\beta)-\frac{\beta^2}{2}\int_0^1 q\,m(q)\,dq$$

where $h=\beta^{-1}\tanh^{-1}(\mathbb{E}_\sigma\sigma_1)$. This solution was originally
obtained using the replica method. The preceding
construction gives, at the least, a clear mathematical
meaning to the objects involved. In particular, the
notion of “ultra-metric zero-dimensional matrices”
appears now to be equivalent to ultra-metric structures
on the unit interval.

In a recent paper, Talagrand (2003) has proven
that the converse inequality is also true in the preceding
equation, confirming that Parisi’s solution yields the 
correct free energy in a large class of models of the 
SK type. 


Ghirlanda—Guerra Relations 


The appearance of a universal probabilistic structure 
in the asymptotics of these models may appear 
surprising. A partial explanation can be found in a 
set of remarkable identities between multi-overlap 
distributions that has been discovered first by 
Ghirlanda and Guerra (1998) in the context of SK 
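models. If $\mu_{\beta,N}^{\otimes n}$ denotes the n-fold product Gibbs
measure, the Ghirlanda-Guerra relations assert a
recursion relation of the form


$$\mathbb{E}\mu_{\beta,N}^{\otimes n+1}\left(D_N(\sigma^k,\sigma^{n+1})\le t\mid\mathcal{B}_n\right)
= \frac{1}{n}\sum_{l\ne k}^{n}\mathbb{E}\mu_{\beta,N}^{\otimes n}\left(D_N(\sigma^k,\sigma^l)\le t\mid\mathcal{B}_n\right)
+\frac{1}{n}\mathbb{E}\mu_{\beta,N}^{\otimes 2}\left(D_N(\sigma^1,\sigma^2)\le t\right)+o(1)$$
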
These relations hold generically for Gaussian mean-field
models, with $D_N$ being the distance through
which the covariance is defined. The proof of these
relations is based on Gaussian integration-by-parts
formulas, and concentration of measure inequalities.
In the case of the GREM models, where $D_N$ is ultra-metric,
these recursions are sufficient to determine all
n-replica overlap distributions in terms of the 2-replica 
distribution. On the other hand, the set of n-replica 
overlap distributions determines the law of the process 
K and thus the geometry of the Gibbs measure. In 
particular, they leave time changes of Neveu’s process 
as the only candidates for limit processes. In the case of 
the SK models, the same does not hold a priori, since 
the Hamming distance is not an ultra-metric. How- 
ever, since the Parisi solution is correct, this suggests 


very strongly that asymptotically the overlap distances 
are almost surely (with respect to the Gibbs measure) 
ultra-metric. Then, the Ghirlanda—Guerra identities 
also imply that the geometry of the Gibbs measures is 
described by the same structure. 
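
The contrast between the two distances can be made concrete on a toy example. The sketch below is illustrative only: the normalized Hamming distance $(1-R_N)/2$ and the prefix-based "lexicographic" distance (one minus the fraction of leading coordinates on which two configurations agree) are conventions assumed here, not definitions taken from the text.

```python
from itertools import combinations, product

def overlap(sigma, tau):
    """Overlap (1/N) * sum_i sigma_i * tau_i of two spin configurations."""
    return sum(s * t for s, t in zip(sigma, tau)) / len(sigma)

def hamming_distance(sigma, tau):
    """Normalized Hamming distance; equals (1 - overlap) / 2."""
    return sum(s != t for s, t in zip(sigma, tau)) / len(sigma)

def lexicographic_distance(sigma, tau):
    """Assumed convention: 1 - (length of common prefix) / N."""
    prefix = 0
    for s, t in zip(sigma, tau):
        if s != t:
            break
        prefix += 1
    return 1.0 - prefix / len(sigma)

def is_ultrametric(points, dist, tol=1e-12):
    """A distance is ultrametric iff in every triple the two largest
    pairwise distances coincide."""
    for a, b, c in combinations(points, 3):
        d = sorted([dist(a, b), dist(b, c), dist(a, c)])
        if d[2] - d[1] > tol:
            return False
    return True

# All 16 spin configurations of length 4.
configs = list(product((-1, 1), repeat=4))
print(is_ultrametric(configs, hamming_distance))        # Hamming distance
print(is_ultrametric(configs, lexicographic_distance))  # prefix-based distance
```

On this small state space the Hamming distance fails the strong triangle inequality for some triples, whereas the prefix-based distance never does; the asymptotic statement in the text concerns typical configurations under the Gibbs measure, which such a finite check cannot address.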


From Mean-Field to Lattice Models 


One of the widely discussed issues in the theory of spin 
glasses is to what extent the results of mean-field 
theory are relevant for lattice models. This issue has 
been addressed elsewhere in this encyclopedia by 
Newman and Stein. Here, we will only mention a 
recent result of Franz and Toninelli (2004) that shows
that the free energy of the SK model can be represented
as the limit of the free energy of lattice models when
the range of the interaction tends to infinity while their
strength tends to zero in an appropriate way (the so-called
Kac models). This still leaves open many finer
questions, but hints at the fact that mean-field theory
bears at least some relevance for realistic spin glasses.


See also: Short-Range Spin Glasses: The Metastate 
Approach; Spin Glasses. 
Introduction


Loop spaces have been considered for their geometric
interest (Freed 1988), where the space
of based loops on a compact Lie group is endowed
with a Kählerian structure; see also the survey by
L Gross (1988). The harmonic analysis on loop
groups, developed by Pressley and Segal, is 
reviewed by Hsu (1997). Loop groups have also 
an impact in string theory (Bowick and Rajeev 
1987). They are related to Yang-Mills theory (Levy 
2003). A presentation of the history of measure on 
infinite-dimensional spaces has been given by 
P Malliavin (see Malliavin (1992) and references 
therein). The main problem is the construction of 
measures on the loop space which have quasi- 
invariance property. This has implications in 
representation theory (Neretin 1994, Jones 1995). 
Here we mainly concentrate on the nonlinear 
stochastic point of view and its interference with 
geometry. The geometrical study of the space of 
closed curves over a compact Riemannian manifold 
M, that is, the loop space over M, was initiated by 
Marston Morse in 1932. The loop space is itself a 
manifold where one can define a Laplace—Beltrami 
operator. A diffusion process can be considered on 
this manifold. Wiener defined the Brownian loop
by the Fourier series


$$u(\tau) = \sum_{k\ge1}\frac{\sin(k\tau)}{k}\,G_k \qquad [1]$$


where the $G_k$ are independent normal variables.
The time evolution of the Wiener loop and the
extension of the theory to the case of a compact
Riemannian manifold of finite dimension has been
considered by Airault and Malliavin (1996, and
references therein). The Brownian loop evolves in
the time parameter t as a Brownian sheet where
the independent random variables $G_k$ are functions
of t.
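
A truncated version of such a random Fourier series is easy to sample. The sketch below assumes the normalization $u(\tau)=\sum_{k\ge1}G_k\sin(k\tau)/k$ with standard normal $G_k$; the truncation level, grid and seed are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math
import random

def sample_loop(num_modes, grid_size, rng):
    """Sample the series truncated at num_modes on a uniform grid of [0, 2*pi]."""
    coeffs = [rng.gauss(0.0, 1.0) for _ in range(num_modes)]
    taus = [2.0 * math.pi * j / grid_size for j in range(grid_size + 1)]
    values = [
        sum(g * math.sin(k * tau) / k for k, g in enumerate(coeffs, start=1))
        for tau in taus
    ]
    return taus, values

def truncated_variance(tau, num_modes):
    """Variance of the truncated series at tau: sum_k sin(k*tau)^2 / k^2."""
    return sum(math.sin(k * tau) ** 2 / k ** 2 for k in range(1, num_modes + 1))

# One sample path: it starts at 0 and returns to 0 after a full turn.
taus, values = sample_loop(num_modes=200, grid_size=64, rng=random.Random(0))
print(values[0], values[-1])

# For 0 <= tau <= pi the full series has variance tau * (pi - tau) / 2.
print(truncated_variance(1.0, 20000), 1.0 * (math.pi - 1.0) / 2.0)
```

The variance check uses the classical identity $\sum_{k\ge1}\sin^2(k\tau)/k^2=\tau(\pi-\tau)/2$ on $[0,\pi]$, which has the shape of a Brownian bridge variance on that interval.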

Starting from the zero loop, one obtains at time t,
a random loop, and the law of this loop gives a 
measure on the loop space. A construction of this 
measure with functional analysis on infinite- 
dimensional manifold was done by Gaveau and 
Mazet (1979). The tools of stochastic analysis are 
important to the subject. The loop space of 
continuous maps from the circle to the multi- 
plicative group of complex numbers has a group 
structure, hence the term “loop group.” On the loop
group, we consider the multiplicative Brownian 
motion starting at one point of the circle and 
conditioned to come back at this point at time s. It 
defines a probability measure on the loop group. 
One can also consider the set of continuous maps 
from the circle to the set of complex numbers of 
modulus equal to 1. The loop group is the space of 
continuous closed paths on a Lie group. More 
generally, on a Riemannian manifold M, the 
Brownian motion on M defines a Wiener measure 
on the loops over M. To go from the path space to 
the loop space, an important tool is the quasisure 
analysis in infinite dimension. The quasisure analysis 
was developed by Airault and Malliavin (1996, and 
references therein) to obtain disintegrations of the 
Wiener measure and they have used this tool in 
1992 to construct measures on the loop group. The 
main problems are: 


1. The construction of heat kernel measures and the 
existence of a Brownian motion on the loop 
space, the existence of pinned Wiener measures 
obtained as the law of Brownian motions condi- 
tioned on the loops. 

2. The quasi-invariance of these transition prob- 
ability measures under translation, or multi- 
plication if we have a multiplicative structure, or 
under the infinitesimal action of suitable vector 
fields. For the path space over the n-dimensional
Euclidean space $\mathbb{R}^n$, the Cameron-Martin theorem
(1944) ensures the existence of a density
which shows the quasi-invariance of the Wiener
measure under translations. For the quasi-invariance,
an important fact is the choice of
the metric on the Cameron-Martin space. In the
case of the Wiener measure, one considers the
paths of finite energy, $\int_0^1|h'(s)|^2\,ds<+\infty$. This
corresponds to the metric “1.” P Malliavin
(1989, and references therein) discussed the
case of metrics $\alpha$ with $1/2<\alpha<1$.

3. To define the “good” Cameron subspace, that is, 
find the vector fields that yield integration- 
by-parts formulas. The question occurs whether 
the Cameron—Martin space depends on time. For 
the loop space, it has been proved by Driver 
(2003) that it is not the case. A time evolution of 
the tangent Cameron—Martin space could appear 
eventually. 

4. The determination of the support of the measures
(e.g., the Wiener measure is carried by the set of
Hölder functions of order $1/2-\epsilon$).

5. The absolute continuity of the measures with 
respect to each other. 
The Construction of Heat Measures
on the Loop Space and Their 
Quasi-Invariance 


The construction of measures giving a solution to 
the infinite-dimensional heat equation as well as the 
study of the quasi-invariance of the Wiener measure 
on the path space was started extensively in the 
work by Bismut, followed by Gross (1998), then by 
Aida and Elworthy (1995) where the loop group is a 
suitable manifold to extend to infinite-dimensional 
manifolds the log-Sobolev inequalities, by Malliavin 
and Malliavin (1992, and references therein) where 
the measures on the path space and the path group 
have been studied. Consider a compact Lie group G
with unit e and let $\mathcal{G}$ be its Lie algebra. From the
$\mathcal{G}$-valued Brownian motion, one can construct a
family of measures $(\mu_t)_{t>0}$ on the path space. These
measures $\mu_t$ are the images of the Wiener measure
on $\mathcal{G}$ through the Itô map


$$dg_x(\tau) = \sqrt{t}\,g_x(\tau)\,dx(\tau)\quad\text{with } g_x(0)=e \qquad [2]$$


The convolution of two measures $\mu_t$ and $\mu_s$ is equal
to $\mu_{t+s}$. By choosing the initial value of the path
randomly distributed according to the Haar measure
on G, it defines a family of measures $(\nu_t)_{t>0}$ on the
path space with


$$\int F\,d\nu_t = \int_G dg\int F(g\gamma)\,\mu_t(d\gamma)$$


The Laplacian on the path group is defined by


$$(\Delta_P F)(g) = \lim_{t\to0}\frac{1}{t}\left[\int F(g\gamma)\,\mu_t(d\gamma)-F(g)\right]$$


The heat equation is valid for the measures $(\nu_t)_{t>0}$
on the paths,


$$\frac{d}{dt}\int F(g)\,\nu_t(dg) = \int(\Delta_P F)(g)\,\nu_t(dg)$$


Moreover, there is a quasi-invariance density $k_{g_0}(g)$
defined on the path group ($g_0$ and $g$ are paths with
values in G) such that


$$\nu_t(g_0A) = \int_A k_{g_0}(g)\,\nu_t(dg)$$


where goA is the translated on the left of the subset 
A in the path space over G. This is a generalization 
to the path space of the classical Cameron—Martin 
theorem. Then, one can consider the loop space. The 
free loop space is the set of continuous maps g from 
[0,1] to G such that g(0) = g(1), and the loop space 
with a base point is the set of maps such that 
g(0) =g(1) =m is fixed. One can define the pinned 
Brownian motion on the group G to obtain the 


pinned Wiener measures $(\mu^e_t)_{t>0}$ on the loop group
(Malliavin and Malliavin 1992, Driver and
Srimurthy 2001). Denote by $p_t(g)$ the solution of
the heat equation on the group G. Let $g$ be a map
from [0,1] to the finite-dimensional Lie group G. For
$\tau_1<\tau_2<\cdots<\tau_n\in[0,1]$, consider the evaluations of the
map $g$, $g_{\tau_1},g_{\tau_2},\dots,g_{\tau_n}\in G$. Let $f$ be a real function
defined on $G^n$ and denote by $dg$ the Haar measure on
G. The measure $\mu^e_t$ on the loop group is given by


$$\int f(g_{\tau_1},\dots,g_{\tau_n})\,d\mu^e_t
= \frac{1}{p_t(e)}\int f(g_1,\dots,g_n)\,p_{t\tau_1}(g_1)\,p_{t(\tau_2-\tau_1)}(g_1^{-1}g_2)\cdots
p_{t(\tau_n-\tau_{n-1})}(g_{n-1}^{-1}g_n)\,p_{t(1-\tau_n)}(g_n)\,dg_1\cdots dg_n$$


From $\mu^e_t$, one defines a measure $\mu_t$ on the free
loops by taking the mean over G as


$$\int f\,d\mu_t = \int_G dg\int f(g\gamma)\,\mu^e_t(d\gamma)$$


The quasi-invariance property for the pinned Wiener 
measure was proved by Malliavin and Malliavin 
(1992). 

When the measures $(\mu_t)_{t>0}$ are obtained by
conditioning and quasisure analysis, we have heat 
kernel measures. The case of heat kernel measures 
defined on the loop group has been studied by 
Airault and Malliavin by disintegrating the measures 
on the path space and using the quasisure analysis. 
The Laplacian on the loop group is defined as it has 
been for the Laplacian on the path space, 


(AL P(e) = limt | f Flges)ub (dss) = Fe) 


but now the heat equation has a Kac’s potential ®; 
defined on the loops. On the loop group, the heat 
equation is 


a J On a= / KALADELE) [3 


where 


1 d 
a) = ~2 = log p:(e) 


1 
/ di(s)I(s)~' 
0 


-Żdimg 














2 
G 


The case of the circle, $G=\mathbb{R}/2\pi\mathbb{Z}$, is interesting.
The law of the functional


$$\int_0^1 dl(s)\,l(s)^{-1}$$


is given in Airault and Malliavin (1996, and
references therein). Moreover, the study of the heat
measures over the loop group of $\mathbb{R}/2\pi\mathbb{Z}$ brings new
identities on the classical Jacobi theta function


$$p_t(\theta) = 1+2\sum_{n\ge1}\cos(n\theta)\,e^{-n^2t/2}\quad\text{at }\theta=0$$

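
As a numerical illustration of this theta function, the sketch below evaluates the Fourier form $1+2\sum_{n\ge1}\cos(n\theta)e^{-n^2t/2}$ and compares it with the equivalent sum over periodic images of a Gaussian (Poisson summation). The normalization with respect to $d\theta/2\pi$ and the truncation levels are assumptions of this sketch.

```python
import math

def theta_fourier(theta, t, n_max=200):
    """Fourier form: 1 + 2 * sum_{n>=1} cos(n*theta) * exp(-n^2 * t / 2)."""
    return 1.0 + 2.0 * sum(
        math.cos(n * theta) * math.exp(-n * n * t / 2.0)
        for n in range(1, n_max + 1)
    )

def theta_images(theta, t, k_max=50):
    """Same function as a sum over Gaussian images on the covering line.

    This is the heat kernel of variance t wrapped around the circle, expressed
    as a density with respect to the normalized measure d(theta) / (2*pi).
    """
    prefactor = math.sqrt(2.0 * math.pi / t)
    return prefactor * sum(
        math.exp(-((theta - 2.0 * math.pi * k) ** 2) / (2.0 * t))
        for k in range(-k_max, k_max + 1)
    )

for theta, t in [(0.0, 0.5), (1.0, 0.5), (0.0, 2.0)]:
    print(theta, t, theta_fourier(theta, t), theta_images(theta, t))
```

The Fourier form converges quickly for large t and the image form for small t, which is the usual reason for keeping both representations at hand.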


Let 


d 1 

= —2 7 log p:(0) = 

The following system of differential equations is 

given by Airault-Malliavin (1996, and references 
therein): 


To pass from path space to loop space, it is
convenient to use the “tubular chart” introduced
by Gross and the quasisure analysis developed by
Airault-Malliavin. Let $\Phi:\gamma\mapsto\gamma(1)\gamma(0)^{-1}$ from the
path space to the group G; then the free loop
space over G is $\Phi^{-1}(e)$. There exists a neighborhood
V of the neutral element of G such that $\Phi^{-1}(V)$ is
diffeomorphic to $V\times L(G)$, the product of V with
the loop space over G. With this diffeomorphism,
one can disintegrate the measures on the path
space and obtain the measures on the loop space.
The Cameron-Martin formula on the path space
of the group G is obtained from the Cameron-Martin
formula for the Wiener space and the Itô
map. Let $\gamma$ be a differentiable path with finite
energy on G, that is,














it holds 


J TE J f(g) ky(g) e(dg) 


Let us denote by (|)g the Euclidean scalar product on 
the Lie algebra G; then the density is given by 


1 
k, (g) =exp ; [oo rOle) 


if i 
— — l s 
The previous approach relies on the heat equation 


on the loop space. Thus, the metric on the 
Cameron—Martin loop or path space is important. 


(sy Ls) 
The problem of quasi-invariance for metrics $\alpha$ with
$1/2<\alpha<1$ relates to the random series


$$u_\alpha(\tau) = \sum_{k\ge1}\frac{\sin k\tau}{k^\alpha}\,G_k \qquad [4]$$


where the $G_k$ are independent normal variables.
Driver (2003) solved the problem for $1/2<\alpha<1$
by Riemannian geometry in infinite dimension.
The Ricci curvature appears in the integration-by-parts
formulas on the loop space. The case of the
metric 1/2 is out of reach. Fang (1999) calculated
the Ricci curvature of the loop manifold for
metrics $\alpha>1/2$ and showed that when $\alpha\to1/2$,
these Ricci curvatures tend to a limit. Another
presentation of the problem is that of Pickrell
(1987), where he obtains a family of quasi-invariant
measures on Grassmannians.

Given a family of measures $(\mu_t)_{t>0}$ on the path
space of a Riemannian manifold, one defines a heat
operator as a family $(\mathcal{L}_t)_{t>0}$ of operators depending
upon $t\in[0,+\infty[$ such that


$$\frac{d}{dt}\int F\,d\mu_t = \int\mathcal{L}_tF\,d\mu_t$$


where F is a function defined on the path space. The 
heat equation with a potential as [3] gives an 
example of a heat operator. Heat operators have 
been constructed for the path space over $\mathbb{R}^n$ by
Airault—Malliavin, obtaining, after an integration by 
parts on the path space, a heat operator of first 
order. This introduces the notion of dilatation vector 
fields on the path space. In the case of the flat 
Wiener space, to each point x in the path space is 
associated the dilatation vector field Y such that 
(Yf)(x) = (x|(grad f)(x)). This gives a rescaling of the 
Wiener measure under dilatations. This idea has 
been exploited by Mancino (1999), who extended 
the method to free loop groups. 


Integration-by-Parts Formulas 


The Cameron—Martin space plays the role of the 
tangent space to the Wiener space. The integration- 
by-parts formulas are an infinitesimal version of the 
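Cameron-Martin quasi-invariance property. Let G
be a compact Lie group or any product of $\mathbb{R}^n$ by a
compact Lie group. For a vector field z, the
differentiation on the right $\partial_z^{\mathrm{right}}$ and differentiation
on the left $\partial_z^{\mathrm{left}}$ are given by


$$\partial_z^{\mathrm{left}}F(p) = \lim_{\epsilon\to0}\frac{F(\exp(\epsilon z)p)-F(p)}{\epsilon}$$

and


$$\partial_z^{\mathrm{right}}F(p) = \lim_{\epsilon\to0}\frac{F(p\exp(\epsilon z))-F(p)}{\epsilon}$$


The operator $\partial_z^{\mathrm{right}}$ commutes with the translation on
the left: for a left translation $T^{\mathrm{left}}$,
$\partial_z^{\mathrm{right}}[F\circ T^{\mathrm{left}}] = (\partial_z^{\mathrm{right}}F)\circ T^{\mathrm{left}}$,
and vice versa for $\partial_z^{\mathrm{left}}$.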

For the measures on the path space or loop space,
the problem is to prove the integration-by-parts
formulas. On the path space $P_e(G)$ on G, let $\mu_{P_e}$ be the
Wiener measure on the set of paths starting from e;
there exists a density $k_z$ such that $\mathbb{E}[\exp(ck_z)]$ is
finite and


$$\int_{P_e(G)}\partial_zF(g)\,d\mu_{P_e}(g) = \int_{P_e(G)}F(g)\,k_z(g)\,d\mu_{P_e}(g)$$


The density k, is defined on the path space by 


1 
ke(g) = / < gle) (Egl), du(t) > 


This was proved by a number of authors (see, e.g., 
Pickrell (1987) and, in a geometrical context, 
Cruzeiro and Malliavin (1996)). 

The existence of a density for the differentiation 
on the left is valid for any Lie group. This is not true 
for the differentiation on the right. If G is
noncompact or is not the product of $\mathbb{R}^n$ by a
compact Lie group, the existence of $k_z$ is not proved
on the right. This comes from the fact that the map 
Ad defined on the path group as a parallel transport 
does not preserve the Cameron—Martin subspace. In 
the case where G is not a product of a flat space by 
a compact Lie group, the Cameron space, which is a 
kind of “tangent space” to the infinite-dimensional 
loop manifold, is not closed under the Lie bracket of 
vector fields. 

The integration-by-parts formulas are obtained
with the stochastic calculus of variation. On a group
G, consider $Y_1,Y_2,\dots,Y_p$, p independent left-invariant
vector fields. Let $\mathcal{G}$ be the Lie algebra of
G. The second-order differential operator
$\Delta=\frac{1}{2}\sum_{k=1}^{p}Y_k^2$ defines a left-invariant diffusion $g_w(t)$ on
the group G with the stochastic equation
$dg_w(t) = g_w(t)\sum_k(Y_k)_e\circ dw^k(t)$, where $(w^k)$ are independent
Brownian motions on the Euclidean space
$\mathcal{G}$. In the work by Malliavin and Malliavin (1992,
and references therein), the stochastic calculus of 
variation is done with the right-invariant connection 
on the Lie group by setting 


ri d = 
PO = Te o EPOE 


where / is a differentiable function of t with 
values in the Lie algebra G, with finite energy 


i, Ih'(s)|? ds < +oo. By taking the derivative with 
respect to € in the Stratonovitch equation 


g(t) "odg (t) = dw(t) + &h'(t) dt 


and letting «=0, it turns out that "8" is a differenti- 
able function of t and its derivative is given by 


d _ 
© 8 (0) = g.(t)h'()g.(t) | 
The situation is not the same for 


g = d 


—1 
deje=0 Sy OlBe) 


where dd'(z) is a stochastic differential. This 
generalizes to an arbitrary Riemannian manifold 
using a coupling of connections (see Airault and 
Malliavin (1996), and references therein). The 
construction of the appropriate Cameron subspace, 
that is, the choice of the infinitesimal action of 
vector fields on the measure, is of importance. In the 
commutative case of the path space over $\mathbb{R}^n$, the
classical Cameron-Martin subspace of paths h such
that $\int_0^1|h'(s)|^2\,ds<+\infty$ is time invariant. To define
the vector fields acting on the path (or loop) space 
over M, it is necessary to consider the geometry of 
the manifold M. The infinitesimal transformations 
which preserve the Riemannian metric are called 
Riemannian connections. In the case where M is a 
group, the natural connections are those defined by 
the parallelism on the group. For a Riemannian 
manifold, Driver proved the existence of integration- 
by-parts formulas for the measures on the path 
space of M when M is endowed with a torsion skew- 
symmetric connection. The Levi-Civita connection, 
since it is torsionless, is of course a Driver (2003) 
connection. If the connection is not skew-symmetric, 
then two coupled connections permit study of the 
€-variation or “reduced variation” of a path, and one 
obtains a Cameron—Martin formula on the path and 
on the loop space of the Riemannian manifold M 
(Fang 1999). The method of reduced variation can be 
used to obtain the integration-by-parts formulas over 
path and loop spaces. Another approach to the quasi- 
invariance problem, using two-parameter processes, 


has been provided by Norris (1995). 


The Support of the Measures and 
Absolute Continuity with Respect 
to Each Other 


Given a Riemannian manifold M, let $(\mu_t)_t$ be the heat
kernel measures on the path space of M and let $(\rho_t)_t$
be heat kernel measures on the loop space of M; the
question arises whether $\rho_t$ is absolutely continuous
with respect to $\mu_s$. For a connected compact Lie
group G, consider the path and loop groups on G. 
The pinned Wiener measure on the loop group is 
defined as the law of a G-valued Brownian motion 
starting at e and conditioned to end at e, and the heat 
kernel measure is the endpoint distribution of 
Brownian motion on the loop group. 

It has been shown (Driver and Srimurthy 2001) 
that the heat kernel measure is absolutely continuous 
with respect to the pinned Wiener measure, and that 
the Radon-Nikodym derivative is bounded. This 
proof relies on the heat formula with a potential 
[3], which is satisfied by the heat kernel measure. 
They give a new proof of this heat formula. When the 
group G is simply connected, Aida and Driver (2000) 
prove that the heat kernel measure over a based loop 
group, constructed by using the Brownian motion is 
equivalent to the Brownian bridge measure over a 
based loop group. When G is the circle, the Radon- 
Nikodym derivative of the heat kernel measure with 
respect to the pinned Wiener measure can be 
calculated in terms of the Jacobi theta function 
(Driver and Srimurthy 2001). On the loop space of
$\mathbb{R}^n$, at time t, the two measures, “heat kernel” and
“pinned Wiener,” are the same.


See also: Abelian and Nonabelian Gauge Theories Using 
Differential Forms; Lie Groups: General Theory; Malliavin 
Calculus; Path Integrals in Noncommutative Geometry. 
Introduction


The theory of metastability studies states of
matter which “should not be there,” but which
can still be observed, albeit only for a short time.
One example is water cooled below zero
temperature. This supercooled water can stay liquid,
but not for long, and it then freezes abruptly.
Such states are called metastable. They are not
equilibrium states; at negative temperatures the only
equilibrium state of water is ice. Physically, these
metastable states are produced from the equilibrium
states by slowly changing the external parameters,
such as the temperature (or magnetic field): one
takes, for example, water (extremely purified) at low
positive temperature, $T>0$, and then lowers the
temperature slowly to negative values $T<0$. Thus,
the family of metastable states, $s_T$, $T<0$, should
be thought of as a continuation of the family $s_T$, $T>0$,
of equilibrium states through the point of phase
transition $T_c=0$, at which critical temperature these
states cease to exist as equilibrium states.

Below we will present rigorous results, which 
validate the above picture for the case of the 2D 
Ising model. They are contained in Schonmann and
Shlosman (1998). The relevant external parameter
in this case will be the magnetic field, $h$.

It turns out that the lifetime of metastable states is 
determined by the quantities given by the Wulff 
construction. 


Equilibrium States and Dynamics 


Let us denote the set $\{-1,+1\}^{\mathbb{Z}^2}$ of the Ising model
configurations $\sigma$ by $\Omega$. Two configurations are
especially relevant, the one with all spins −1 and the
one with all spins +1. We will use the simple
notation − and + to denote them.

Observables are just functions on $\Omega$. Local observables
are those which depend only on the values of
finitely many spins.

We will consider the formal Hamiltonian 


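$$H_h(\sigma) = -\sum_{x,y\ \mathrm{n.n.}}\sigma(x)\sigma(y)-h\sum_{x}\sigma(x) \qquad [1]$$


where $h\in\mathbb{R}^1$ is the external field and $\sigma\in\Omega$ is a generic
configuration. We define, for each set $\Lambda\subset\subset\mathbb{Z}^2$ and
each boundary condition $\xi\in\Omega$,


$$H_{\Lambda,\xi,h}(\sigma) = -\sum_{x,y\in\Lambda\ \mathrm{n.n.}}\sigma(x)\sigma(y)
-\sum_{x\in\Lambda,\,y\notin\Lambda\ \mathrm{n.n.}}\sigma(x)\xi(y)
-h\sum_{x\in\Lambda}\sigma(x)$$
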

The “grand canonical Gibbs measure” in $\Lambda$ with
boundary condition $\xi$ under external field $h$ and at
temperature $T$ is defined on $\Omega_\Lambda$ as


$$\mu_{\Lambda,\xi,T,h}(\sigma) = \frac{\exp(-\beta H_{\Lambda,\xi,h}(\sigma))}{Z_{\Lambda,\xi,T,h}}$$


where $\beta=T^{-1}$, and the partition function $Z_{\Lambda,\xi,T,h}$ is
a normalization, chosen such that $\mu_{\Lambda,\xi,T,h}(\Omega_\Lambda)=1$.
The equilibrium states are obtained by taking the
thermodynamic limit $\lim_{\Lambda\to\mathbb{Z}^2}\mu_{\Lambda,\xi,T,h}$. We will be
interested in the states


$$\mu_{\pm,T,h} = \lim_{\Lambda\to\mathbb{Z}^2}\mu_{\Lambda,\pm,T,h}$$


corresponding to (±)-boundary conditions. If $h\ne0$,
then $\mu_{-,T,h}=\mu_{+,T,h}$, so it will be denoted simply by
$\mu_{T,h}$. If $h=0$, the same is true if the temperature
is larger than or equal to a critical value $T_c$, and
is false for $T<T_c$, in which case one says that there is
phase coexistence. The measure $\mu_{+,T,0}=\mu_{+,T}$ is
called the (+)-phase, and $\mu_{-,T}$ the (−)-phase.

For an observable f we will denote by $\langle f\rangle_{\cdot}$ its
expected value in the state $\mu_{\cdot}$, that is, the integral
$\int f\,d\mu_{\cdot}$. In particular, the spontaneous magnetization
$m^*(T)$ equals by definition $\langle\sigma(0)\rangle_{+,T}$.

Next, we need to supply the Ising model with the 
time evolution. For this we will use the Glauber 
dynamics. It is a Markov process on Q, whose 
generator, L, acts on a generic local observable f as 


$$(Lf)(\sigma) = \sum_{x\in\mathbb{Z}^2}c(x,\sigma)\left(f(\sigma^x)-f(\sigma)\right)$$


where $\sigma^x$ is the configuration obtained from $\sigma$ by
flipping the spin at the site x to the opposite value,
and $c(x,\sigma)$ is the rate of the flip of the spin at the site
x when the system is in the state $\sigma$. In words, one
can say that the dynamics proceeds as follows: at
every site x the spin $\sigma(x)$ is flipped randomly,
independently of all others, with the rate $c(x,\sigma)$,
where $\sigma$ is the current configuration. Common
examples are “metropolis dynamics”:


$$c_{T,h}(x,\sigma) = \exp\left(-\beta(\Delta_xH_h(\sigma))^+\right)$$


or “heat bath dynamics”:


$$c_{T,h}(x,\sigma) = \left[1+\exp(\beta\Delta_xH_h(\sigma))\right]^{-1}$$

Here $(a)^+=\max\{a,0\}$, and $\Delta_xH_h(\sigma) = H_h(\sigma^x)-H_h(\sigma)$.
The spin flip system thus obtained will be
denoted by $(\sigma^\xi_{T,h;t})_{t\ge0}$, where $\xi$ is the initial
configuration at time $t=0$. If this initial configuration
is selected at random according to a probability
measure $\nu$, then the resulting process is denoted by
$(\sigma^\nu_{T,h;t})_{t\ge0}$. It is known that the Gibbs measures are
invariant with respect to the stochastic Ising models.
Moreover,


$$\sigma^-_{T,h;t}\to\mu_{-,T,h},\qquad\sigma^+_{T,h;t}\to\mu_{+,T,h},\qquad\text{as }t\to\infty$$


We will be interested in the case when $h$ is
positive, though small. Then there is only one
invariant state, $\mu_{+,T,h}$, so the state $\mu_{-,T,h}$ is equal
to $\mu_{+,T,h}$, and $\sigma^-_{T,h;t}\to\mu_{+,T,h}$ as $t\to\infty$. (One
should intuitively think about the state $\sigma^-_{T,h;t}$ for
t small as the supercooled but liquid water,
thinking about the state $\mu_{+,T,h}$ to be ice.) We
want to control the convergence of the temporal
state $\sigma^-_{T,h;t}$ to the equilibrium, $\mu_{+,T,h}$, and to see, if
possible, that during some (long) initial time the
state $\sigma^-_{T,h;t}$ looks very similar to the (−)-phase
$\mu_{-,T}$, while after some time threshold it changes
suddenly and looks quite similar to the state
$\mu_{+,T,h}$. It turns out that all the above features
can indeed be established rigorously.


If one simulates the above dynamics
on a computer, the picture observed is
the following: droplets of the (+)-phase are
created in the midst of the (−)-phase; they
survive for a while and then
disappear. This goes on until
a big enough (+)-droplet is born; this one then
starts to grow and eventually fills up the whole
display.
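Such a simulation can be sketched in a few lines of Python. The sketch below makes several assumptions that are not in the text: the infinite lattice is replaced by an n × n torus, continuous time is replaced by discrete single-site updates, the Metropolis rate is used, and the Hamiltonian is taken to be $H_h(\sigma) = -\sum_{\langle x,y\rangle}\sigma(x)\sigma(y) - h\sum_x\sigma(x)$. The function names are illustrative.

```python
import math
import random

def local_field(spins, i, j, h):
    """Sum of the four nearest neighbours on the torus, plus the field h."""
    n = len(spins)
    return (spins[(i - 1) % n][j] + spins[(i + 1) % n][j]
            + spins[i][(j - 1) % n] + spins[i][(j + 1) % n] + h)

def glauber_metropolis(n, temperature, h, steps, seed=0):
    """Discrete-time Metropolis spin-flip dynamics started from all minus.

    Assumed Hamiltonian: H_h = -sum_<xy> s(x) s(y) - h * sum_x s(x), so
    flipping s(x) changes the energy by 2 * s(x) * (neighbour sum + h).
    Returns the final configuration and the magnetization per site
    recorded after every n * n attempted flips.
    """
    rng = random.Random(seed)
    beta = 1.0 / temperature
    spins = [[-1] * n for _ in range(n)]
    history = []
    for step in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        delta_h = 2.0 * spins[i][j] * local_field(spins, i, j, h)
        # Accept the flip with probability exp(-beta * max(delta_h, 0)).
        if rng.random() < math.exp(-beta * max(delta_h, 0.0)):
            spins[i][j] = -spins[i][j]
        if (step + 1) % (n * n) == 0:
            history.append(sum(map(sum, spins)) / (n * n))
    return spins, history

if __name__ == "__main__":
    _, history = glauber_metropolis(32, 1.5, 0.3, 32 * 32 * 200, seed=1)
    print(history[::20])
```

Plotting `history` for small positive `h` at a temperature below critical is one way to look for the long plateau near the (−)-phase magnetization followed by a rapid switch; how long the plateau lasts in a finite box depends on the box size and is not covered by the infinite-volume statements of this article.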


The Life Span of Metastable States 
Let us define the “critical time exponent” $\lambda_c = \lambda_c(T)$ by

$$\lambda_c = \frac{w_T^2}{12\,T\,m^*(T)}$$  [2]

where $w_T$ is the value of the surface energy
of the Wulff curve $\mathfrak{W}_T$ of our 2D Ising model at the
temperature $T$:

$$w_T = \mathcal{W}_T(\mathfrak{W}_T)$$

Suppose now that $T < T_c$, $h > 0$. Let $\nu$ be either the
(−)-phase $\mu_{-,T}$ or $\delta_{-1}$. (In fact, any $\nu$ “between”
these two states would do.) Then the following
happens.

1. If $0 < \lambda < \lambda_c$, then for each $n \in \{1, 2, \ldots\}$ and for
each local observable $f$,

$$E\bigl(f(\sigma^{\nu}_{T,h;\exp\{\lambda/h\}})\bigr) = \sum_{j=0}^{n-1} b_j(f)\,h^j + O(h^n)$$  [3]

where

$$b_j(f) = \frac{1}{j!}\,\lim_{h\nearrow 0}\frac{d^j}{dh^j}\langle f\rangle_{T,h}$$

(We stress that in the last relation we are using
the Gibbs states corresponding to the negative
values of the magnetic field.) In particular,

$$E\bigl(\sigma^{\nu}_{T,h;\exp\{\lambda/h\}}(0)\bigr) = -m^*(T) + O(h)$$  [4]

2. If $\lambda > \lambda_c$, then for any finite positive $C$ there is a
finite positive $C_1$ such that for every local
observable $f$,

$$\bigl|E\bigl(f(\sigma^{\nu}_{T,h;\exp\{\lambda/h\}})\bigr) - \langle f\rangle_{T,h}\bigr| \le C_1\,\|f\|\,\exp\{-C/h\}$$  [5]

The relation [3] implies that the family of
nonequilibrium states $\langle\cdot\rangle^{\nu}_{T,h;\lambda}$, $h > 0$, defined for
every local observable $f$ by

$$\langle f\rangle^{\nu}_{T,h;\lambda} = E\bigl(f(\sigma^{\nu}_{T,h;\exp\{\lambda/h\}})\bigr)$$

is a $C^\infty$-continuation of the curve $\{\langle\cdot\rangle_{T,h},\ h < 0\}$ of
equilibrium states. This is true for every $0 < \lambda < \lambda_c$
and every $\nu$ as above. The states $\langle\cdot\rangle^{\nu}_{T,h;\lambda}$ are the
“metastable states” we are looking for. The relations
[3] and [4] should be interpreted in the sense that
before the time $\exp\{\lambda_c/h\}$ our temporal state is still
“liquid,” while [5] means that after the time
$\exp\{\lambda_c/h\}$ freezing happens. So one can think about
the quantity $\exp\{\lambda_c/h\}$ as being the life span of the
metastable state.

This theorem was obtained in Schonmann and
Shlosman (1998). Let us explain the heuristics
behind it. It has two ingredients. The first one is
that the transition to the equilibrium proceeds via
creation of droplets of the (+)-phase. The second
one is that once such a droplet is created by a
thermal fluctuation, with the size exceeding a certain
critical value, it does not die out, but grows further,
with a speed $v$ of the order of $h$. (This second belief
can be expected to be correct only in dimension 2.)
Let us see how these two hypotheses can give us the
right answer. To get to the equilibrium we have to
overcome the energy barrier, by creating a large
droplet of the (+)-phase. Subcritical droplets
are constantly created by thermal fluctuations in the
metastable phase, but they tend to shrink. On the
other hand, once a supercritical droplet is created
due to a larger fluctuation, it will grow and drive the
system to the stable phase. Indeed, the energy $\Phi(m)$
of an $m$-shaped droplet of the (+)-phase in the sea of
(−)-phase equals $\mathcal{W}_T(m) - 2m^*(T)\,h\,\mathrm{vol}(m)$. For
small $m$ the functional $\Phi(m)$ decreases as $m$ shrinks,
while for large $m$ the functional $\Phi(m)$ decreases as $m$
grows. Its saddle point $m_{cr}$ is precisely the Wulff
shape. Since the minimal height of the barrier is
$\Phi(m_{cr})$, one predicts the rate of creation of a critical
droplet with center at a given place to be

$$\exp\{-\Phi(m_{cr})/T\}$$

Comparing with [2], we see that we miss the
correct answer
by a factor of 1/3. The reason for that is the
following. Note that we are concerned with an
infinite system, and we are observing it through a
local function $f$, which depends on the spins in a
finite set supp($f$). For us, the system will have
relaxed to equilibrium once supp($f$) is covered by
a big droplet of the (+)-phase, which appeared
spontaneously somewhere and then grew, as
discussed above. We want to estimate how long
we have to wait for the probability of such an
event to be close to 1. If we suppose that the
radius of the supercritical droplet grows with a
speed $v$, then we can see that the region in
spacetime, where a droplet which covers supp($f$)
at time $t$ could have appeared, is, roughly speaking,
a cone with vertex in supp($f$) and which has
as base the set of points which have time
coordinate 0 and are at most at distance $tv$ from
supp($f$). The volume of such a cone is of the order
of $(vt)^2 t$. The order of magnitude of the relaxation
time, $t_{rel}$, at which the region supp($f$) starts to be
covered by a large droplet can now be obtained by
solving the equation

$$(v\,t_{rel})^2\,t_{rel}\,\exp\{-\Phi(m_{cr})/T\} \sim 1$$

where $\Phi(m_{cr})$ is the height of the energy barrier
discussed above. This gives us what we want:

$$t_{rel} \sim \frac{1}{v^{2/3}}\,\exp\Bigl\{\frac{\Phi(m_{cr})}{3T}\Bigr\} = \frac{1}{v^{2/3}}\,\exp\{\lambda_c/h\}$$


See also: Dynamical Systems in Mathematical Physics:
An Illustration from Water Waves; Large Deviations in
Equilibrium Statistical Mechanics; Wulff Droplets.


Further Reading


Schonmann RH and Shlosman S (1998) Wulff droplets and the
metastable relaxation of the kinetic Ising models.
Communications in Mathematical Physics 194: 389-462.


Minimal Submanifolds


T H Colding and W P Minicozzi II, University of
New York, New York, NY, USA


© 2006 Elsevier Ltd. All rights reserved.


Introduction


Soap films, soap bubbles, and surface tension were
extensively studied by the Belgian physicist and
inventor (the inventor of the stroboscope) Joseph
Plateau in the first half of the nineteenth century. At
least since his studies, it has been known that the
right mathematical model for soap films is that of minimal
surfaces: the soap film is in a state of minimum
energy when it is covering the least possible amount
of area. Minimal surfaces and equations like the
minimal surface equation have served as mathematical
models for many physical problems.

The field of minimal surfaces dates back to the
publication in 1762 of Lagrange’s famous memoir
“Essai d’une nouvelle méthode pour déterminer les
maxima et les minima des formules intégrales
indéfinies.” Euler had already, in a paper published
in 1744, discussed minimizing properties of the
surface now known as the catenoid, but he only
considered variations within a certain class of
surfaces. In the almost one-quarter of a millennium
that has passed since Lagrange’s memoir, the subject of
minimal surfaces has remained a vibrant area of
research and there are many reasons why. The study
of minimal surfaces was the birthplace of regularity
theory. It lies on the intersection of nonlinear elliptic
PDE, geometry, topology, and general relativity.


In what follows we give a quick tour through 
many of the classical results in the field of minimal 
submanifolds, starting at the definition. 

The field of minimal surfaces remains extremely 
active and has very recently seen major develop- 
ments that have solved many longstanding open 
problems and conjectures; for more on this, see the 
expanded version of this survey (Colding and 
Minicozzi II, 2005). See also the recent surveys 
(Meeks III and Perez 2004, Perez 2005), and the 
expository article (Colding and Minicozzi II 2003). 

Throughout this survey, we refer to Colding and 
Minicozzi II (1999) for references unless otherwise 
noted. 


Part 1. Classical and Almost 
Classical Results 


Let $\Sigma \subset \mathbb{R}^n$ be a smooth $k$-dimensional submanifold
(possibly with boundary) and $C_0^\infty(N\Sigma)$ the space of
all infinitely differentiable, compactly supported,
normal vector fields on $\Sigma$. Given $\Phi$ in $C_0^\infty(N\Sigma)$,
consider the one-parameter variation

$$\Sigma_{t,\Phi} = \{x + t\,\Phi(x) \mid x \in \Sigma\}$$  [1]

The so-called first variation formula of volume is the
equation (integration is with respect to $d\,\mathrm{vol}$)

$$\frac{d}{dt}\Big|_{t=0}\mathrm{Vol}(\Sigma_{t,\Phi}) = \int_\Sigma \langle\Phi, H\rangle$$  [2]

where $H$ is the mean curvature (vector) of $\Sigma$. (When
$\Sigma$ is noncompact, then $\Sigma_{t,\Phi}$ in [2] is replaced by
$\Gamma_{t,\Phi}$, where $\Gamma$ is any compact set containing the
support of $\Phi$.) The submanifold $\Sigma$ is said to be a
“minimal” submanifold (or just minimal) if

$$\frac{d}{dt}\Big|_{t=0}\mathrm{Vol}(\Sigma_{t,\Phi}) = 0 \quad\text{for all } \Phi \in C_0^\infty(N\Sigma)$$  [3]

or, equivalently by [2], if the mean curvature $H$ is
identically zero. Thus, $\Sigma$ is minimal if and only if it
is a critical point for the volume functional. (Since a
critical point is not necessarily a minimum, the term
“minimal” is misleading, but it is time honored. The
equation for a critical point is also sometimes called
the Euler-Lagrange equation.)

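Suppose now, for simplicity, that $\Sigma$ is an oriented
hypersurface with unit normal $n_\Sigma$. We can then
write a normal vector field $\Phi \in C_0^\infty(N\Sigma)$ as $\Phi = \phi\,n_\Sigma$,
where the function $\phi$ is in the space $C_0^\infty(\Sigma)$ of infinitely
differentiable, compactly supported functions on $\Sigma$.
Using this, a computation shows that if $\Sigma$ is
minimal, then

$$\frac{d^2}{dt^2}\Big|_{t=0}\mathrm{Vol}(\Sigma_{t,\phi n_\Sigma}) = -\int_\Sigma \phi\,L_\Sigma\phi$$  [4]

where

$$L_\Sigma\phi = \Delta_\Sigma\phi + |A|^2\phi$$  [5]

is the second variational (or Jacobi) operator. Here,
$\Delta_\Sigma$ is the Laplacian on $\Sigma$ and $A$ is the second
fundamental form. So $|A|^2 = \kappa_1^2 + \kappa_2^2 + \cdots + \kappa_{n-1}^2$,
where $\kappa_1, \ldots, \kappa_{n-1}$ are the principal curvatures of
$\Sigma$ and $H = (\kappa_1 + \cdots + \kappa_{n-1})\,n_\Sigma$. A minimal
submanifold $\Sigma$ is said to be stable if

$$\frac{d^2}{dt^2}\Big|_{t=0}\mathrm{Vol}(\Sigma_{t,\Phi}) \ge 0 \quad\text{for all } \Phi \in C_0^\infty(N\Sigma)$$  [6]

Integrating by parts in [4], we see that stability is
equivalent to the so-called stability inequality

$$\int_\Sigma |A|^2\phi^2 \le \int_\Sigma |\nabla_\Sigma\phi|^2$$  [7]

More generally, the “Morse index” of a minimal
submanifold is defined to be the number of negative
eigenvalues of the operator $L_\Sigma$. Thus, a stable
submanifold has Morse index zero.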


The Gauss Map 


Let $\Sigma^2 \subset \mathbb{R}^3$ be a surface (not necessarily minimal).
The Gauss map is a continuous choice of a
unit normal $n: \Sigma \to S^2 \subset \mathbb{R}^3$. Observe that there
are two choices of such a map, $n$ and $-n$,
corresponding to a choice of orientation of $\Sigma$. If
$\Sigma$ is minimal, then the Gauss map is an (anti)conformal
map since the eigenvalues of the
Weingarten map are $\kappa_1$ and $\kappa_2 = -\kappa_1$. Moreover,
for a minimal surface

$$|A|^2 = \kappa_1^2 + \kappa_2^2 = -2\,\kappa_1\kappa_2 = -2K_\Sigma$$  [8]

where $K_\Sigma$ is the Gauss curvature. It follows that the
area of the Gauss map is a multiple of the total
curvature.


Minimal Graphs 


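Suppose that $u: \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ is a $C^2$ function. The
graph of $u$

$$\mathrm{Graph}_u = \{(x, y, u(x, y)) \mid (x, y) \in \Omega\}$$  [9]

has area

$$\mathrm{Area}(\mathrm{Graph}_u) = \int_\Omega |(1, 0, u_x) \times (0, 1, u_y)| = \int_\Omega \sqrt{1 + u_x^2 + u_y^2} = \int_\Omega \sqrt{1 + |\nabla u|^2}$$  [10]

and the (upward pointing) unit normal is

$$n = \frac{(1, 0, u_x) \times (0, 1, u_y)}{|(1, 0, u_x) \times (0, 1, u_y)|} = \frac{(-u_x, -u_y, 1)}{\sqrt{1 + |\nabla u|^2}}$$  [11]

Therefore, for the graphs $\mathrm{Graph}_{u+t\eta}$, where $\eta|_{\partial\Omega} = 0$,
we get that

$$\mathrm{Area}(\mathrm{Graph}_{u+t\eta}) = \int_\Omega \sqrt{1 + |\nabla u + t\,\nabla\eta|^2}$$  [12]

Hence

$$\frac{d}{dt}\Big|_{t=0}\mathrm{Area}(\mathrm{Graph}_{u+t\eta}) = \int_\Omega \frac{\langle\nabla\eta, \nabla u\rangle}{\sqrt{1 + |\nabla u|^2}} = -\int_\Omega \eta\,\mathrm{div}\left(\frac{\nabla u}{\sqrt{1 + |\nabla u|^2}}\right)$$  [13]

It follows that the graph of $u$ is a critical point for
the area functional if and only if $u$ satisfies the
divergence form equation

$$\mathrm{div}\left(\frac{\nabla u}{\sqrt{1 + |\nabla u|^2}}\right) = 0$$  [14]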
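Expanding the divergence in [14] and multiplying by $(1 + |\nabla u|^2)^{3/2}$ gives the equivalent non-divergence form $(1 + u_y^2)\,u_{xx} - 2u_x u_y u_{xy} + (1 + u_x^2)\,u_{yy} = 0$. The Python sketch below evaluates this expression by central differences; the test functions (Scherk's surface and a paraboloid) and all names are chosen here for illustration and are not taken from the text.

```python
import math

def mse_residual(u, x, y, eps=1e-4):
    """Finite-difference value of (1+u_y^2)u_xx - 2u_x u_y u_xy + (1+u_x^2)u_yy.

    This is (1 + |grad u|^2)^(3/2) times the left-hand side of [14], so it
    vanishes exactly where u solves the minimal surface equation.
    """
    ux = (u(x + eps, y) - u(x - eps, y)) / (2 * eps)
    uy = (u(x, y + eps) - u(x, y - eps)) / (2 * eps)
    uxx = (u(x + eps, y) - 2 * u(x, y) + u(x - eps, y)) / eps**2
    uyy = (u(x, y + eps) - 2 * u(x, y) + u(x, y - eps)) / eps**2
    uxy = (u(x + eps, y + eps) - u(x + eps, y - eps)
           - u(x - eps, y + eps) + u(x - eps, y - eps)) / (4 * eps**2)
    return (1 + uy**2) * uxx - 2 * ux * uy * uxy + (1 + ux**2) * uyy

def scherk(x, y):
    """Scherk's surface u = log(cos y / cos x), a classical minimal graph."""
    return math.log(math.cos(y) / math.cos(x))

def paraboloid(x, y):
    """u = x^2 + y^2, which is not a minimal graph."""
    return x * x + y * y

if __name__ == "__main__":
    print(mse_residual(scherk, 0.3, 0.2))      # close to zero
    print(mse_residual(paraboloid, 0.3, 0.2))  # clearly nonzero
```

The residual is only as accurate as the finite differences, so "close to zero" here means small compared with the size of the individual second derivatives.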


Next we want to show that the graph of a
function on $\Omega$ satisfying the minimal surface
equation, that is, satisfying [14], is not just a critical
point for the area functional but is actually
area minimizing amongst surfaces in the cylinder
$\Omega \times \mathbb{R} \subset \mathbb{R}^3$. To show this, extend first the unit
normal $n$ of the graph in [11] to a vector field, still
denoted by $n$, on the entire cylinder $\Omega \times \mathbb{R}$. Let $\omega$ be
the 2-form on $\Omega \times \mathbb{R}$ given for $X, Y \in \mathbb{R}^3$ by

$$\omega(X, Y) = \det(X, Y, n)$$  [15]

An easy calculation shows that

$$d\omega = \frac{\partial}{\partial x}\left(\frac{-u_x}{\sqrt{1 + |\nabla u|^2}}\right) + \frac{\partial}{\partial y}\left(\frac{-u_y}{\sqrt{1 + |\nabla u|^2}}\right) = 0$$  [16]

since $u$ satisfies the minimal surface equation. In
sum, the form $\omega$ is closed and, given any $X$ and $Y$ at
a point $(x, y, z)$,

$$|\omega(X, Y)| \le |X \times Y|$$  [17]

where equality holds if and only if

$$X, Y \subset T_{(x, y, u(x,y))}\,\mathrm{Graph}_u$$  [18]

Such a form $\omega$ is called a “calibration.” From this,
we have that if $\Sigma \subset \Omega \times \mathbb{R}$ is any other surface with
$\partial\Sigma = \partial\,\mathrm{Graph}_u$, then by Stokes’ theorem since $\omega$ is
closed,

$$\mathrm{Area}(\mathrm{Graph}_u) = \int_{\mathrm{Graph}_u}\omega = \int_\Sigma \omega \le \mathrm{Area}(\Sigma)$$  [19]

This shows that $\mathrm{Graph}_u$ is area minimizing among
all surfaces in the cylinder and with the same
boundary. If the domain $\Omega$ is convex, the minimal
graph is absolutely area minimizing. To see this,
observe first that if $\Omega$ is convex, then so is $\Omega \times \mathbb{R}$ and
hence the nearest point projection $P: \mathbb{R}^3 \to \Omega \times \mathbb{R}$ is
a distance nonincreasing Lipschitz map that is equal
to the identity on $\Omega \times \mathbb{R}$. If $\Sigma \subset \mathbb{R}^3$ is any other
surface with $\partial\Sigma = \partial\,\mathrm{Graph}_u$, then $\Sigma' = P(\Sigma)$ has
$\mathrm{Area}(\Sigma') \le \mathrm{Area}(\Sigma)$. Applying [19] to $\Sigma'$, we see
that $\mathrm{Area}(\mathrm{Graph}_u) \le \mathrm{Area}(\Sigma')$ and the claim
follows.
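The pointwise inequality [17] is just the statement that $\det(X, Y, n) = \langle X \times Y, n\rangle$ with $|n| = 1$, with equality when $X \times Y$ is a non-negative multiple of $n$. A small Python sketch of this, using the normal [11] for a prescribed gradient (the helper names are illustrative):

```python
import math

def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def calibration(X, Y, grad_u):
    """omega(X, Y) = det(X, Y, n), with n the unit normal [11] for grad_u."""
    ux, uy = grad_u
    w = math.sqrt(1.0 + ux * ux + uy * uy)
    n = (-ux / w, -uy / w, 1.0 / w)
    return dot(cross(X, Y), n)

if __name__ == "__main__":
    ux, uy = 0.3, -1.2
    # Tangent vectors to the graph: equality holds in [17].
    X, Y = (1.0, 0.0, ux), (0.0, 1.0, uy)
    print(calibration(X, Y, (ux, uy)), norm(cross(X, Y)))
    # A generic pair: strict inequality.
    X, Y = (1.0, 2.0, 0.0), (0.0, 1.0, 1.0)
    print(calibration(X, Y, (ux, uy)), norm(cross(X, Y)))
```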

If $\Omega \subset \mathbb{R}^2$ contains a ball of radius $r$, then, since
$\partial B_r \cap \mathrm{Graph}_u$ divides $\partial B_r$ into two components at
least one of which has area at most equal to
$(\mathrm{Area}(S^2)/2)\,r^2$, we get from [19] the crude estimate

$$\mathrm{Area}(B_r \cap \mathrm{Graph}_u) \le \frac{\mathrm{Area}(S^2)}{2}\,r^2$$  [20]

As noted above, when the domain $\Omega$ is convex the
minimal graph is absolutely area minimizing.

Very similar calculations to the ones above show
that if $\Omega \subset \mathbb{R}^{n-1}$ and $u: \Omega \to \mathbb{R}$ is a $C^2$ function, then
the graph of $u$ is a critical point for the area
functional if and only if $u$ satisfies [14]. Moreover,
as in [19], the graph of $u$ is actually area
minimizing. Consequently, as in [20], if $\Omega$ contains
a ball of radius $r$, then

$$\mathrm{Vol}(B_r \cap \mathrm{Graph}_u) \le \frac{\mathrm{Vol}(S^{n-1})}{2}\,r^{n-1}$$  [21]


The Maximum Principle 


The first variation formula, [2], showed that a smooth
submanifold is a critical point for area if and only if
the mean curvature vanishes. We will next derive the
weak form of the first variation formula, which is the
basic tool for working with “weak solutions” (typically,
stationary varifolds). Let $X$ be a vector field on
$\mathbb{R}^n$. We can write the divergence $\mathrm{div}_\Sigma X$ of $X$ on $\Sigma$ as

$$\mathrm{div}_\Sigma X = \mathrm{div}_\Sigma X^T + \mathrm{div}_\Sigma X^N = \mathrm{div}_\Sigma X^T + \langle X, H\rangle$$  [22]

where $X^T$ and $X^N$ are the tangential and normal
projections of $X$. In particular, we get that, for a
minimal submanifold,

$$\mathrm{div}_\Sigma X = \mathrm{div}_\Sigma X^T$$  [23]

Moreover, from [22] and Stokes’ theorem, we see that
$\Sigma$ is minimal if and only if for all vector fields $X$ with
compact support and vanishing on the boundary of $\Sigma$,

$$\int_\Sigma \mathrm{div}_\Sigma X = 0$$  [24]

The key point is that [24] makes sense as long as we
can define the divergence on $\Sigma$. As a consequence of
[24], we will show the following proposition:


Proposition 1 $\Sigma^k \subset \mathbb{R}^n$ is minimal if and only if the
restrictions of the coordinate functions of $\mathbb{R}^n$ to $\Sigma$
are harmonic functions.


Proof Let $\eta$ be a smooth function on $\Sigma$ with
compact support and $\eta|_{\partial\Sigma} = 0$; then

$$\int_\Sigma \langle\nabla_\Sigma\eta, \nabla_\Sigma x_i\rangle = \int_\Sigma \langle\nabla_\Sigma\eta, e_i\rangle = \int_\Sigma \mathrm{div}_\Sigma(\eta\,e_i)$$  [25]

From this, the claim follows easily. □


Recall that if $\Xi \subset \mathbb{R}^n$ is a compact subset, then the
smallest convex set containing $\Xi$ (the convex hull,
$\mathrm{Conv}(\Xi)$) is the intersection of all half-spaces
containing $\Xi$. The maximum principle forces a
compact minimal submanifold to lie in the convex
hull of its boundary (this is the “convex hull
property”):


Proposition 2 If $\Sigma^k \subset \mathbb{R}^n$ is a compact minimal
submanifold, then $\Sigma \subset \mathrm{Conv}(\partial\Sigma)$.


Proof A half-space $H \subset \mathbb{R}^n$ can be written as

$$H = \{x \in \mathbb{R}^n \mid \langle x, e\rangle \le a\}$$  [26]

for a vector $e \in S^{n-1}$ and constant $a \in \mathbb{R}$. By
Proposition 1, the function $u(x) = \langle e, x\rangle$ is harmonic
on $\Sigma$ and hence attains its maximum on $\partial\Sigma$ by the
maximum principle. □


Another application of [23], with a different
choice of vector field $X$, gives that for a
$k$-dimensional minimal submanifold $\Sigma$

$$\Delta_\Sigma |x - x_0|^2 = 2\,\mathrm{div}_\Sigma(x - x_0) = 2k$$  [27]


Later, we will see that this formula plays a crucial 
role in the monotonicity formula for minimal 
submanifolds. 

The argument in the proof of the convex hull 
property can be rephrased as saying that as we 
translate a hyperplane towards a minimal surface, 
the first point of contact must be on the boundary. 
When © is a hypersurface, this is a special case of 
the strong maximum principle for minimal surfaces: 


Lemma 1 Let $\Omega \subset \mathbb{R}^{n-1}$ be an open connected
neighborhood of the origin. If $u_1, u_2: \Omega \to \mathbb{R}$ are
solutions of the minimal surface equation with $u_1 \le u_2$
and $u_1(0) = u_2(0)$, then $u_1 \equiv u_2$.


Since any smooth hypersurface is locally a graph 
over a hyperplane, Lemma 1 gives a maximum 
principle for smooth minimal hypersurfaces. 

Thus far, the examples of minimal submanifolds 
have all been smooth. The simplest nonsmooth 
example is given by a pair of planes intersecting 
transversely along a line. To get an example that is 
not even immersed, one can take three half-planes
meeting along a line with an angle of $2\pi/3$ between
each adjacent pair.


Monotonicity and the Mean-Value 
Inequality 


Monotonicity formulas and mean-value inequalities 
play a fundamental role in many areas of geometric 
analysis. 


Proposition 3 Suppose that $\Sigma^k \subset \mathbb{R}^n$ is a minimal
submanifold and $x_0 \in \mathbb{R}^n$; then for all $0 < s < t$,

$$t^{-k}\,\mathrm{Vol}(B_t(x_0) \cap \Sigma) - s^{-k}\,\mathrm{Vol}(B_s(x_0) \cap \Sigma) = \int_{(B_t(x_0)\setminus B_s(x_0)) \cap \Sigma} \frac{|(x - x_0)^N|^2}{|x - x_0|^{k+2}}$$  [28]

Notice that $(x - x_0)^N$ vanishes precisely when $\Sigma$ is
conical about $x_0$, that is, when $\Sigma$ is invariant under
dilations about $x_0$. As a corollary, we get the
following:


Corollary 1 Suppose that $\Sigma^k \subset \mathbb{R}^n$ is a minimal
submanifold and $x_0 \in \mathbb{R}^n$; then the function

$$\Theta_{x_0}(s) = \frac{\mathrm{Vol}(B_s(x_0) \cap \Sigma)}{\mathrm{Vol}(B_s \subset \mathbb{R}^k)}$$  [29]

is a nondecreasing function of $s$. Moreover,
$\Theta_{x_0}(s)$ is constant in $s$ if and only if $\Sigma$ is conical
about $x_0$.


Of course, if $x_0$ is a smooth point of $\Sigma$, then
$\lim_{s\to 0}\Theta_{x_0}(s) = 1$. We will later see that the converse
is also true; this will be a consequence of the Allard
regularity theorem.
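As a concrete illustration of Corollary 1, take $\Sigma$ to be the catenoid parametrized by $(\cosh s\cos t, \cosh s\sin t, s)$ and $x_0 = 0$, which does not lie on $\Sigma$. In this parametrization the area element is $\cosh^2 s\,ds\,dt$ and the ball $B_R(0)$ cuts out the band $|s| \le a$ with $\cosh^2 a + a^2 = R^2$, so the density ratio has a closed form. The Python sketch below evaluates it; the choice of surface and base point, and the function names, are made here for illustration.

```python
import math

def catenoid_half_height(radius):
    """Solve cosh(a)^2 + a^2 = radius^2 for a >= 0 by bisection (radius >= 1)."""
    lo, hi = 0.0, math.log(2.0 * radius) + 1.0  # cosh(hi) > radius
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.cosh(mid) ** 2 + mid ** 2 < radius ** 2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def density_ratio(radius):
    """Area(B_radius(0) cap catenoid) / (pi * radius^2), as in [29] with k = 2."""
    if radius <= 1.0:
        return 0.0  # the catenoid stays at distance >= 1 from the origin
    a = catenoid_half_height(radius)
    # Integral of cosh(s)^2 over [-a, a] is a + sinh(2a)/2; times 2*pi in t.
    area = 2.0 * math.pi * (a + 0.5 * math.sinh(2.0 * a))
    return area / (math.pi * radius ** 2)

if __name__ == "__main__":
    for radius in (0.5, 1.5, 3.0, 10.0, 100.0):
        print(radius, density_ratio(radius))
```

The printed values increase with the radius and stay below 2, which is consistent with the catenoid looking like two parallel planes from far away.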

The monotonicity of area is a very useful tool in 
the regularity theory for minimal surfaces — at least 
when there is some a priori area bound. For 
instance, this monotonicity and a compactness 
argument allow one to reduce many regularity 
questions to questions about minimal cones (this 
was a key observation of W Fleming in his work on 
the Bernstein problem; see the section “The 
theorems of Bernstein and Bers”). 

Arguing as in Proposition 3, we get a weighted 
monotonicity: 


Proposition 4 If $\Sigma^k \subset \mathbb{R}^n$ is a minimal submanifold,
$x_0 \in \mathbb{R}^n$, and $f$ is a function on $\Sigma$, then

$$t^{-k}\int_{B_t(x_0) \cap \Sigma} f - s^{-k}\int_{B_s(x_0) \cap \Sigma} f = \int_{(B_t(x_0)\setminus B_s(x_0)) \cap \Sigma} f\,\frac{|(x - x_0)^N|^2}{|x - x_0|^{k+2}} + \frac{1}{2}\int_s^t \tau^{-k-1}\int_{B_\tau(x_0) \cap \Sigma} (\tau^2 - |x - x_0|^2)\,\Delta_\Sigma f\,d\tau$$  [30]


We get immediately the following mean-value
inequality for the special case of non-negative
subharmonic functions:


Corollary 2 Suppose that $\Sigma^k \subset \mathbb{R}^n$ is a minimal
submanifold, $x_0 \in \mathbb{R}^n$, and $f$ is a non-negative
subharmonic function on $\Sigma$; then

$$s^{-k}\int_{B_s(x_0) \cap \Sigma} f$$  [31]

is a nondecreasing function of $s$. In particular, if
$x_0 \in \Sigma$, then for all $s > 0$,

$$f(x_0) \le \frac{\int_{B_s(x_0) \cap \Sigma} f}{\mathrm{Vol}(B_s \subset \mathbb{R}^k)}$$  [32]


Rado’s Theorem


One of the most basic questions is what the
boundary $\partial\Sigma$ tells us about a compact minimal
submanifold $\Sigma$. We have already seen that $\Sigma$ must
lie in the convex hull of $\partial\Sigma$, but there are many
other theorems of this nature. One of the first
is a beautiful result of Rado which says
that if $\partial\Sigma$ is a graph over the boundary of a convex
set in $\mathbb{R}^2$, then $\Sigma$ is also a graph (and hence
embedded). The proof of this uses basic properties
of nodal lines for harmonic functions.


Theorem 1 Suppose that $\Omega \subset \mathbb{R}^2$ is a convex subset
and $\sigma \subset \mathbb{R}^3$ is a simple closed curve which is
graphical over $\partial\Omega$. Then any minimal disk $\Sigma \subset \mathbb{R}^3$
with $\partial\Sigma = \sigma$ must be graphical over $\Omega$ and hence
unique by the maximum principle.


Proof (Sketch). The proof is by contradiction, so
suppose that $\Sigma$ is such a minimal disk and $x \in \Sigma$ is a
point where the tangent plane to $\Sigma$ is vertical.
Consequently, there exists $(a, b) \ne (0, 0)$ such that

$$\nabla_\Sigma(a x_1 + b x_2)(x) = 0$$  [33]

By Proposition 1, $a x_1 + b x_2$ is harmonic on $\Sigma$ (since
it is a linear combination of coordinate functions).
The local structure of nodal sets of harmonic
functions (see, e.g., Colding and Minicozzi II
(1999)) then gives that the level set

$$\{y \in \Sigma \mid (a x_1 + b x_2)(y) = (a x_1 + b x_2)(x)\}$$  [34]

has a singularity at $x$ where at least four different
curves meet. If two of these nodal curves were to
meet again, then there would be a closed nodal
curve which must bound a disk (since $\Sigma$ is a disk).
By the maximum principle, $a x_1 + b x_2$ would have
to be constant on this disk and hence constant on $\Sigma$
by unique continuation. This would imply that
$\sigma = \partial\Sigma$ is contained in the plane given by [34].
Since this is impossible, we conclude that all of
these curves go to the boundary without intersecting
again.


In other words, the plane in $\mathbb{R}^3$ given by [34]
intersects $\sigma$ in at least four points. However, since
$\Omega \subset \mathbb{R}^2$ is convex, $\partial\Omega$ intersects the line given by
[34] in exactly two points. Finally, since $\sigma$ is
graphical over $\partial\Omega$, $\sigma$ intersects the plane in $\mathbb{R}^3$
given by [34] in exactly two points, which gives
the desired contradiction. □


The Theorems of Bernstein and Bers 


A classical theorem of S Bernstein from 1916 says
that entire (i.e., defined over all of $\mathbb{R}^2$) minimal
graphs are planes. This remarkable theorem of
Bernstein was one of the first illustrations of the
fact that the solutions to a nonlinear PDE, like the
minimal surface equation, can behave quite differently
from solutions to a linear equation.


Theorem 2 If $u: \mathbb{R}^2 \to \mathbb{R}$ is an entire solution to the
minimal surface equation, then $u$ is an affine
function.


Proof (Sketch). We will show that the curvature of
the graph vanishes identically; this implies that the
unit normal is constant and, hence, the graph must
be a plane. The proof follows by combining two
facts. First, the area estimate for graphs [20] gives

$$\mathrm{Area}(B_r \cap \mathrm{Graph}_u) \le 2\pi r^2$$  [35]

This quadratic area growth allows one to construct
a sequence of non-negative logarithmic cutoff functions
$\phi_j$ defined on the graph with $\phi_j \to 1$ everywhere
and

$$\lim_{j\to\infty}\int_{\mathrm{Graph}_u} |\nabla\phi_j|^2 = 0$$  [36]

Moreover, since graphs are area minimizing, they
must be stable. We can therefore use $\phi_j$ in the
stability inequality [7] to get

$$\int_{\mathrm{Graph}_u} \phi_j^2\,|A|^2 \le \int_{\mathrm{Graph}_u} |\nabla\phi_j|^2$$  [37]

Combining these gives that $|A|^2$ is zero, as
desired. □
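The mechanism behind [36] is already visible in the flat plane, where a logarithmic cutoff between radii $r_1 < r_2$ has Dirichlet energy exactly $2\pi/\log(r_2/r_1)$, which tends to zero as $r_2/r_1 \to \infty$. On a minimal graph the same computation goes through with a constant from the area bound [35]. The Python sketch below treats only the flat model case; the function names and the choice of quadrature are assumptions made for this illustration.

```python
import math

def log_cutoff(r, r1, r2):
    """Logarithmic cutoff: 1 for r <= r1, 0 for r >= r2, linear in log r between."""
    if r <= r1:
        return 1.0
    if r >= r2:
        return 0.0
    return 1.0 - math.log(r / r1) / math.log(r2 / r1)

def planar_energy(r1, r2, samples=20000):
    """Midpoint-rule Dirichlet energy of log_cutoff on the plane R^2.

    In polar coordinates the energy is the integral over [r1, r2] of
    phi'(r)^2 * 2*pi*r dr, where phi'(r) = -1 / (r * log(r2 / r1)).
    """
    length = math.log(r2 / r1)
    dr = (r2 - r1) / samples
    total = 0.0
    for i in range(samples):
        r = r1 + (i + 0.5) * dr
        slope = -1.0 / (r * length)
        total += slope * slope * 2.0 * math.pi * r * dr
    return total

if __name__ == "__main__":
    for r2 in (10.0, 100.0):
        print(r2, planar_energy(1.0, r2), 2.0 * math.pi / math.log(r2))
```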


Rather surprisingly, this result very much
depends on the dimension. The combined efforts
of E De Giorgi, F J Almgren Jr., and J Simons finally
gave:


Theorem 3 If $u: \mathbb{R}^{n-1} \to \mathbb{R}$ is an entire solution to
the minimal surface equation and $n \le 8$, then $u$ is an
affine function.


However, in 1969, E Bombieri, De Giorgi, and
E Giusti constructed entire nonaffine solutions to
the minimal surface equation on $\mathbb{R}^8$ and an
area-minimizing singular cone in $\mathbb{R}^8$. In fact, they showed
that for $m \ge 4$, the cones

$$C_m = \{(x_1, \ldots, x_{2m}) \mid x_1^2 + \cdots + x_m^2 = x_{m+1}^2 + \cdots + x_{2m}^2\} \subset \mathbb{R}^{2m}$$  [38]

are area minimizing (and obviously singular at the
origin).

In contrast to the entire case, exterior solutions
of the minimal graph equation, that is, solutions
on $\mathbb{R}^2 \setminus B_1$, are much more plentiful. In this case, L
Bers proved that $\nabla u$ actually has an asymptotic
limit:

Theorem 4 If $u$ is a $C^2$ solution to the minimal
surface equation on $\mathbb{R}^2 \setminus B_1$, then $\nabla u$ has a limit at
infinity (i.e., there is an asymptotic tangent plane).


Bers’ theorem was extended to higher dimensions
by L Simon:


Theorem 5 If $u$ is a $C^2$ solution to the minimal
surface equation on $\mathbb{R}^n \setminus B_1$, then either

(i) $|\nabla u|$ is bounded and $\nabla u$ has a limit at infinity or
(ii) all tangent cones at infinity are of the form $\Sigma \times \mathbb{R}$
where $\Sigma$ is singular.


Bernstein’s theorem has had many other interest- 
ing generalizations, some of which will be discussed 
later. 


Simons Inequality 


In this section, we recall a very useful differential 
inequality for the Laplacian of the norm squared of 
the second fundamental form of a minimal hypersur- 
face X in R” and illustrate its role in a priori 
estimates. This inequality, originally due to J
Simons, is:


Lemma 2 If $\Sigma^{n-1} \subset \mathbb{R}^n$ is a minimal hypersurface,
then

$$\Delta_\Sigma |A|^2 = -2|A|^4 + 2|\nabla_\Sigma A|^2 \ge -2|A|^4$$  [39]
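The inequality [39] can be tested on the catenoid [49]. With the parametrization $(\cosh s\cos t, \cosh s\sin t, s)$ a direct computation gives principal curvatures $\pm 1/\cosh^2 s$, hence $|A|^2 = 2/\cosh^4 s$, and the parametrization is conformal with factor $\cosh^2 s$, so $\Delta_\Sigma = \cosh^{-2}s\,(\partial_s^2 + \partial_t^2)$. The Python sketch below uses these two facts, which are supplied here and not stated in the text, to evaluate $\Delta_\Sigma|A|^2 + 2|A|^4$ numerically.

```python
import math

def norm_a_squared(s):
    """|A|^2 on the catenoid at parameter s, equal to 2 / cosh(s)^4."""
    return 2.0 / math.cosh(s) ** 4

def laplacian_norm_a_squared(s, eps=1e-4):
    """Delta_Sigma |A|^2 by a central difference in s.

    |A|^2 does not depend on t and the conformal factor is cosh(s)^2, so
    the Laplacian is the second s-derivative divided by cosh(s)^2.
    """
    second = (norm_a_squared(s + eps) - 2.0 * norm_a_squared(s)
              + norm_a_squared(s - eps)) / eps ** 2
    return second / math.cosh(s) ** 2

def simons_gap(s):
    """Delta_Sigma |A|^2 + 2 |A|^4, which [39] says is 2 |grad A|^2 >= 0."""
    return laplacian_norm_a_squared(s) + 2.0 * norm_a_squared(s) ** 2

if __name__ == "__main__":
    for s in (0.0, 0.5, 1.0, 2.0):
        print(s, simons_gap(s), 32.0 * math.sinh(s) ** 2 / math.cosh(s) ** 8)
```

The second printed column is the closed form $32\sinh^2 s/\cosh^8 s$ obtained by differentiating by hand; it is non-negative and vanishes only on the waist $s = 0$.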


An inequality of the type [39] on its own does not 
lead to pointwise bounds on |A|? because of the 
nonlinearity. However, it does lead to estimates if a 
“scale-invariant energy” is small. For example, 
H Choi and Schoen used [39] to prove: 


Theorem 6 There exists $\epsilon > 0$ so that if $0 \in \Sigma \subset B_r(0)$
with $\partial\Sigma \subset \partial B_r(0)$ is a minimal surface with

$$\int_\Sigma |A|^2 \le \epsilon$$  [40]

then

$$|A|^2(0) \le r^{-2}$$  [41]

Heinz’s Curvature Estimate for Graphs 


One of the key themes in minimal surface theory is 
the usefulness of a priori estimates. A basic example 
is the curvature estimate of E Heinz for graphs. 
Heinz’s estimate gives an effective version of
Bernstein’s theorem; namely, letting the radius $r_0$ go
to infinity in [42] implies that $|A|$ vanishes, thus
giving Bernstein’s theorem.

Theorem 7 If $D_{r_0} \subset \mathbb{R}^2$ and $u: D_{r_0} \to \mathbb{R}$ satisfies
the minimal surface equation, then for $\Sigma = \mathrm{Graph}_u$
and $0 < \sigma \le r_0$

$$\sigma^2 \sup_{D_{r_0 - \sigma}} |A|^2 \le C$$  [42]

Proof (Sketch). Observe first that it suffices to
prove the estimate for $\sigma = r_0$, that is, to show that

$$|A|^2(0, u(0)) \le C\,r_0^{-2}$$  [43]

Recall that minimal graphs are automatically stable.
As in the proof of Theorem 2, the area estimate for
graphs [20] allows us to use a logarithmic cutoff
function in the stability inequality [7] to get that

$$\int_{B_{r_1} \cap \mathrm{Graph}_u} |A|^2 \le \frac{C}{\log(r_0/r_1)}$$  [44]

Taking $r_0/r_1$ sufficiently large, we can then apply
Theorem 6 to get [43]. □


Embedded Minimal Disks 
with Area Bounds 


In the early 1980s, Schoen and Simon extended the 
theorem of Bernstein to complete simply connected 
embedded minimal surfaces in R? with quadratic 
area growth. A surface $\Sigma$ is said to have quadratic
area growth if for all $r > 0$, the area of the intersection
of the surface with the ball in $\mathbb{R}^3$ of radius $r$ and center at
the origin is bounded by $C r^2$ for a fixed constant $C$
independent of $r$.


Theorem 8 Let $0 \in \Sigma^2 \subset B_{r_0} = B_{r_0}(0) \subset \mathbb{R}^3$ be an
embedded simply connected minimal surface with
$\partial\Sigma \subset \partial B_{r_0}$. If $\mu > 0$ and either

$$\mathrm{Area}(\Sigma) \le \mu\,r_0^2 \quad\text{or}\quad \int_\Sigma |A|^2 \le \mu$$  [45]

then for the connected component $\Sigma'$ of $B_{r_0/2} \cap \Sigma$
with $0 \in \Sigma'$ we have

$$\sup_{\Sigma'} |A|^2 \le C\,r_0^{-2}$$  [46]

for some $C = C(\mu)$.


The result of Schoen-Simon was generalized by
Colding-Minicozzi to quadratic area growth for
intrinsic balls (this generalization played an important
role in analyzing the local structure of
embedded minimal surfaces); here $\mathcal{B}$ denotes an intrinsic ball:


Theorem 9 Given a constant $C_I$, there exists $C_P$ so
that if $\mathcal{B}_{2s} \subset \Sigma \subset \mathbb{R}^3$ is an embedded minimal disk
satisfying either

$$\mathrm{Area}(\mathcal{B}_{2s}) \le C_I\,s^2 \quad\text{or}\quad \int_{\mathcal{B}_{2s}} |A|^2 \le C_I$$  [47]

then

$$\sup_{\mathcal{B}_s} |A|^2 \le C_P\,s^{-2}$$  [48]

As an immediate consequence, letting $r_0 \to \infty$
gives Bernstein-type theorems for embedded simply
connected minimal surfaces with either bounded
density or finite total curvature. Note that Enneper’s
surface is simply connected but neither flat nor
embedded; this shows that embeddedness is essential
for these estimates. Similarly, the catenoid shows
that the surface being simply connected is essential.
The catenoid is the minimal surface in $\mathbb{R}^3$ given by

$$\{(\cosh s\cos t, \cosh s\sin t, s) \mid s, t \in \mathbb{R}\}$$  [49]
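That [49] is minimal, and that it satisfies [8], can be checked numerically from the classical formulas for the mean and Gauss curvature in terms of the first and second fundamental forms. The Python sketch below approximates the derivatives by central differences; it is a generic illustration written for this purpose, with illustrative function names, and is not part of the original text.

```python
import math

def catenoid(s, t):
    """The parametrization [49]."""
    return (math.cosh(s) * math.cos(t), math.cosh(s) * math.sin(t), s)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def principal_curvatures(surface, s, t, eps=1e-4):
    """Principal curvatures (k1 <= k2) of a parametrized surface in R^3."""
    p = surface(s, t)
    ps, ms = surface(s + eps, t), surface(s - eps, t)
    pt, mt = surface(s, t + eps), surface(s, t - eps)
    pp, pm = surface(s + eps, t + eps), surface(s + eps, t - eps)
    mp, mm = surface(s - eps, t + eps), surface(s - eps, t - eps)
    xs = [(a - b) / (2 * eps) for a, b in zip(ps, ms)]
    xt = [(a - b) / (2 * eps) for a, b in zip(pt, mt)]
    xss = [(a - 2 * c + b) / eps**2 for a, b, c in zip(ps, ms, p)]
    xtt = [(a - 2 * c + b) / eps**2 for a, b, c in zip(pt, mt, p)]
    xst = [(a - b - c + d) / (4 * eps**2)
           for a, b, c, d in zip(pp, pm, mp, mm)]
    normal = cross(xs, xt)
    length = math.sqrt(dot(normal, normal))
    normal = [c / length for c in normal]
    E, F, G = dot(xs, xs), dot(xs, xt), dot(xt, xt)
    L, M, N = dot(xss, normal), dot(xst, normal), dot(xtt, normal)
    det_g = E * G - F * F
    mean = (E * N - 2 * F * M + G * L) / (2 * det_g)  # (k1 + k2) / 2
    gauss = (L * N - M * M) / det_g                   # k1 * k2
    root = math.sqrt(max(mean * mean - gauss, 0.0))
    return mean - root, mean + root

if __name__ == "__main__":
    k1, k2 = principal_curvatures(catenoid, 0.7, 1.3)
    print(k1 + k2)                       # mean curvature, close to 0
    print(k1**2 + k2**2, -2 * k1 * k2)   # both sides of [8]
```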


Stable Minimal Surfaces 


It turns out that stable minimal surfaces have a 
priori estimates. Since minimal graphs are stable, the 
estimates for stable surfaces can be thought of as 
generalizations of the earlier estimates for graphs. 
These estimates have been widely applied and are 
particularly useful when combined with existence 
results for stable surfaces (such as the solution of the 
Plateau problem). The starting point for these 
estimates is that, as we saw in [4], stable minimal 
surfaces satisfy the stability inequality 


$$\int |A|^2\phi^2 \le \int |\nabla\phi|^2$$  [50]


We will mention two such estimates. The first is
R Schoen’s curvature estimate for stable surfaces:


Theorem 10 There exists a constant $C$ so that if
$\Sigma \subset \mathbb{R}^3$ is an immersed stable minimal surface with
trivial normal bundle and $\mathcal{B}_{r_0} \subset \Sigma \setminus \partial\Sigma$, then

$$\sup_{\mathcal{B}_{r_0 - \sigma}} |A|^2 \le C\,\sigma^{-2}$$  [51]


The second, an estimate for the area and total
curvature of a stable surface, is due to Colding-Minicozzi;
for simplicity, we will state only the area
estimate:


Theorem 11 If $\Sigma \subset \mathbb{R}^3$ is an immersed stable
minimal surface with trivial normal bundle and
$\mathcal{B}_{r_0} \subset \Sigma \setminus \partial\Sigma$, then

$$\mathrm{Area}(\mathcal{B}_{r_0}) \le 4\pi r_0^2/3$$  [52]


As mentioned, we can use [52] to bound the
energy of a cutoff function in the stability inequality
and, thus, bound the total curvature of sub-balls.
Combining this with the curvature estimate of
Theorem 6 gives Theorem 10. Note that the bound
[52] is surprisingly sharp; even when $\Sigma$ is a plane,
the area is $\pi r_0^2$.


Regularity Theory 


In this section, we survey some of the key ideas in 
classical regularity theory, such as the role of 
monotonicity, scaling, e-regularity theorems (such 
as Allard’s theorem) and tangent cone analysis (such 
as Almgren’s refinement of Federer’s dimension 
reducing). We refer to the book by Morgan (1995) 
for a more detailed overview and a general 
introduction to geometric measure theory. 

The starting point for all of this is the monotonicity
of volume for a minimal $k$-dimensional
submanifold $\Sigma$. Namely, Corollary 1 gives that the
density

$$\Theta_{x_0}(s) = \frac{\mathrm{Vol}(B_s(x_0) \cap \Sigma)}{\mathrm{Vol}(B_s \subset \mathbb{R}^k)}$$  [53]

is a monotone nondecreasing function of $s$. Consequently,
we can define the density $\Theta_{x_0}$ at the point
$x_0$ to be the limit as $s \to 0$ of $\Theta_{x_0}(s)$. It also follows
easily from monotonicity that the density is
semicontinuous as a function of $x_0$.


ε-Regularity and the Singular Set


An ε-regularity theorem is a theorem giving that a
weak (or generalized) solution is actually smooth at
a point if a scale-invariant energy is small enough
there. The standard example is the Allard regularity
theorem:


Theorem 12 There exists $\delta(k, n) > 0$ such that if
$\Sigma \subset \mathbb{R}^n$ is a $k$-rectifiable stationary varifold (with
density at least one a.e.), $x_0 \in \Sigma$, and

$$\Theta_{x_0} = \lim_{r\to 0}\frac{\mathrm{Vol}(B_r(x_0) \cap \Sigma)}{\mathrm{Vol}(B_r \subset \mathbb{R}^k)} < 1 + \delta$$  [54]

then $\Sigma$ is smooth in a neighborhood of $x_0$.


Similarly, the small total curvature estimate of
Theorem 6 may be thought of as an ε-regularity
theorem; in this case, the scale-invariant energy is
$\int |A|^2$.

As an application of the ε-regularity theorem,
Theorem 12, we can define the singular set $S$ of $\Sigma$ by

$$S = \{x \in \Sigma \mid \Theta_x \ge 1 + \delta\}$$  [55]

It follows immediately from the semicontinuity of
the density that $S$ is closed. In order to bound the
size of the singular set (e.g., the Hausdorff measure),
one combines the ε-regularity with simple covering
arguments.


This preliminary analysis of the singular set can 
be refined by doing a so-called tangent cone 
analysis. 


Tangent Cone Analysis 


It is not hard to see that scaling preserves the space
of minimal submanifolds of $\mathbb{R}^n$. Namely, if $\Sigma$ is
minimal, then so is

$$\Sigma_{y,\lambda} = \{y + \lambda^{-1}(x - y) \mid x \in \Sigma\}$$  [56]

(To see this, simply note that this scaling multiplies
the principal curvatures by $\lambda$.) Suppose now
that we fix the point $y$ and take a sequence $\lambda_j \to 0$.
The monotonicity formula bounds the density of
the rescaled solution, allowing us to extract a
convergent subsequence and limit. This limit,
which is called a “tangent cone” at $y$, achieves
equality in the monotonicity formula and, hence,
must be homogeneous (i.e., invariant under dilations
about $y$).

The usefulness of tangent cone analysis in
regularity theory is based on two key facts. For
simplicity, we illustrate these when $\Sigma \subset \mathbb{R}^n$ is an
area-minimizing hypersurface. First, if any tangent
cone at $y$ is a hyperplane $\mathbb{R}^{n-1}$, then $\Sigma$ is smooth in a
neighborhood of $y$. This follows easily from the
Allard regularity theorem since the density at $y$ of
the tangent cone is the same as the density at $y$ of $\Sigma$.
The second key fact, known as “dimension reducing,”
is due to Almgren and is a refinement of an
argument of Federer. To state this, we first stratify
the singular set $S$ of $\Sigma$ into subsets

$$S_0 \subset S_1 \subset \cdots \subset S_{n-2}$$  [57]

where we define $S_i$ to be the set of points $y \in S$ so
that any linear space contained in any tangent cone
at $y$ has dimension at most $i$. (Note that $S_{n-1} = \emptyset$ by
Allard’s theorem.) The dimension reducing argument
then gives that

$$\dim(S_i) \le i$$  [58]

where dimension means the Hausdorff dimension.
In particular, the solution of the Bernstein problem
then gives codimension-7 regularity of $\Sigma$, that is,
$\dim(S) \le n - 8$.


Part 2. Constructing Minimal Surfaces 


Thus far, we have mainly dealt with regularity and 
a priori estimates but have ignored questions of 
existence. In this part, we survey some of the most 
useful existence results for minimal surfaces. The 
following section gives an overview of the classical
Plateau problem. Next, we recall the classical 
Weierstrass representation, including a few modern 
applications, and the Kapouleas desingularization 
method. Then we deal with producing area-mini- 
mizing surfaces and questions of embeddedness. 
Finally, we recall the min-max construction for 
producing unstable minimal surfaces and, in parti- 
cular, doing so while controlling the topology and 
guaranteeing embeddedness. 


The Plateau Problem 


The following fundamental existence problem for
minimal surfaces is known as the Plateau problem:
given a closed curve $\Gamma$, find a minimal surface with
boundary $\Gamma$. There are various solutions to this
problem depending on the exact definition of a
surface (parametrized disk, integral current, $\mathbb{Z}_2$
current, or rectifiable varifold). We shall consider
the version of the Plateau problem for parametrized
disks; this was solved independently by J Douglas
and T Radó. The generalization to Riemannian
manifolds is due to C B Morrey.


Theorem 13 Let $\Gamma \subset \mathbb{R}^3$ be a piecewise $C^1$ closed
Jordan curve. Then there exists a piecewise $C^1$ map
u from $D \subset \mathbb{R}^2$ to $\mathbb{R}^3$ with $u(\partial D) \subset \Gamma$ such that the
image minimizes area among all disks with boundary
$\Gamma$.


The solution u to the Plateau problem above can 
easily be seen to be a branched conformal immer- 
sion. R Osserman proved that u does not have true 
interior branch points; subsequently, R Gulliver and 
W Alt showed that u cannot have false branch 
points either. 

Furthermore, the solution u is as smooth as the 
boundary curve, even up to the boundary. A very 
general version of this boundary regularity was 
proved by S Hildebrandt; for the case of surfaces 
in $\mathbb{R}^3$, recall the following result of J C C Nitsche:


Theorem 14 If $\Gamma$ is a regular Jordan curve of class
$C^{k,\alpha}$ where $k \geq 1$ and $0 < \alpha < 1$, then a solution u
of the Plateau problem is $C^{k,\alpha}$ on all of $\bar{D}$.


The Weierstrass Representation 


The classical Weierstrass representation (see Osserman
(1986)) takes holomorphic data (a Riemann surface, a
meromorphic function, and a holomorphic 1-form)
and associates a minimal surface in $\mathbb{R}^3$. To be precise,
given a Riemann surface $\Omega$, a meromorphic function g
on $\Omega$, and a holomorphic 1-form $\phi$ on $\Omega$, then we
get a (branched) conformal minimal immersion
$F : \Omega \to \mathbb{R}^3$ by


$$F(z) = \mathrm{Re} \int_{\zeta \in \gamma_{z_0,z}} \left( \tfrac{1}{2}\left( g^{-1}(\zeta) - g(\zeta) \right),\ \tfrac{i}{2}\left( g^{-1}(\zeta) + g(\zeta) \right),\ 1 \right) \phi(\zeta)$$ [59]


Here, $z_0 \in \Omega$ is a fixed base point and the integration
is along a path $\gamma_{z_0,z}$ from $z_0$ to z. The choice of
$z_0$ changes F by adding a constant. In general, the
map F may depend on the choice of path (and hence
may not be well defined); this is known as “the
period problem.” However, when g has no zeros or
poles and $\Omega$ is simply connected, then F(z) does not
depend on the choice of path $\gamma_{z_0,z}$.

Two standard constructions of minimal surfaces
from Weierstrass data are


$$g(z) = z, \quad \phi(z) = dz/z, \quad \Omega = \mathbb{C} \setminus \{0\}$$
giving a catenoid [60]


$$g(z) = e^{iz}, \quad \phi(z) = dz, \quad \Omega = \mathbb{C}$$ giving a helicoid [61]
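As a numerical sanity check on [59], [60], and [61], the following Python sketch integrates the Weierstrass data along a straight segment and compares the result with closed-form parametrizations of the helicoid and the catenoid. The function name `weierstrass_point`, the trapezoid rule, the base points, and the sample points are choices made here for illustration only; they are not taken from the source.

```python
import numpy as np

def weierstrass_point(g, f, z0, z, n=4001):
    """Approximate F(z) from [59] along the straight segment from z0 to z.

    g is the meromorphic function and f is the coefficient of the 1-form,
    phi = f(zeta) d(zeta). The segment is assumed to avoid zeros and poles of g.
    """
    t = np.linspace(0.0, 1.0, n)
    zeta = z0 + t * (z - z0)
    gz, fz = g(zeta), f(zeta)
    integrand = np.stack([0.5 * (1.0 / gz - gz) * fz,
                          0.5j * (1.0 / gz + gz) * fz,
                          fz])
    h = 1.0 / (n - 1)
    # Composite trapezoid rule in t, then multiply by d(zeta)/dt = z - z0.
    total = h * (integrand[:, 1:-1].sum(axis=1)
                 + 0.5 * (integrand[:, 0] + integrand[:, -1]))
    return (total * (z - z0)).real

# Helicoid [61]: g = exp(i z), phi = dz, base point z0 = 0.
z = 0.7 + 0.5j
helicoid = weierstrass_point(lambda w: np.exp(1j * w),
                             lambda w: np.ones_like(w), 0.0, z)
x, y = z.real, z.imag
helicoid_exact = np.array([np.sin(x) * np.sinh(y),
                           -np.cos(x) * np.sinh(y),
                           x])

# Catenoid [60]: g = z, phi = dz / z, base point z0 = 1.
p = weierstrass_point(lambda w: w, lambda w: 1.0 / w, 1.0, 2.0 + 1.0j)
p = p + np.array([-1.0, 0.0, 0.0])   # undo the constant coming from the base point
catenoid_defect = p[0] ** 2 + p[1] ** 2 - np.cosh(p[2]) ** 2
```

With base point 0 the helicoid data integrate to (sin x sinh y, -cos x sinh y, x) for z = x + iy, and after the shift the catenoid point should satisfy x_1^2 + x_2^2 = cosh^2(x_3), so `catenoid_defect` should be close to zero up to quadrature error.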


The Weierstrass representation is particularly 
useful for constructing immersed minimal surfaces. 
Typically, it is rather difficult to prove that the 
resulting immersion is an embedding (i.e., is 1-1), 
although there are some interesting cases where this 
can be done. For the first modern example, 
D Hoffman and Meeks proved that the surface 
constructed by Costa was embedded; this was 
the first new complete finite topology properly 
embedded minimal surface discovered since the 
classical catenoid, helicoid, and plane. This led 
to the discovery of many more such surfaces 
(see Rosenberg (1992) for more discussion). 


Area-Minimizing Surfaces 


Perhaps the most natural way to construct minimal 
surfaces is to look for ones which minimize area, for 
example, with fixed boundary, or in a homotopy 
class, etc. This has the advantage that often it is 
possible to show that the resulting surface is 
embedded. We mention a few results along these 
lines. 

The first embeddedness result, due to Meeks and 
Yau, shows that if the boundary curve is embedded 
and lies on the boundary of a smooth mean convex 
set (and it is null-homotopic in this set), then it 
bounds an embedded least area disk. 


Theorem 15 (Meeks III and Yau 1982). Let $M^3$ be
a compact Riemannian 3-manifold whose boundary
is mean convex and let $\gamma$ be a simple closed curve in
$\partial M$ which is null-homotopic in M; then $\gamma$ is
bounded by a least area disk and any such least
area disk is properly embedded.


Note that some restriction on the boundary curve
$\gamma$ is certainly necessary. For instance, if the
boundary curve were knotted (e.g., the trefoil), then
it could not be spanned by any embedded disk
(minimal or otherwise). Prior to the work of Meeks
and Yau, embeddedness was known for extremal
boundary curves in $\mathbb{R}^3$ with small total curvature by
the work of R Gulliver and J Spruck.

If we instead fix a homotopy class of maps, then 
the two fundamental existence results are due to 
Sacks-Uhlenbeck and Schoen—Yau (with embed- 
dedness proved by Meeks—Yau and Freedman- 
Hass-Scott, respectively): 


Theorem 16 Given $M^3$, there exist conformal
(stable) minimal immersions $u_1, \ldots, u_m : S^2 \to M$
which generate $\pi_2(M)$ as a $\mathbb{Z}[\pi_1(M)]$ module.
Furthermore,


(i) if $u : S^2 \to M$ and $[u]_{\pi_2} \neq 0$, then
$\mathrm{Area}(u) \geq \min_i \mathrm{Area}(u_i)$,

(ii) each $u_i$ is either an embedding or a 2-1 map
onto an embedded two-sided $\mathbb{RP}^2$.


Theorem 17 If $\Sigma^2$ is a closed surface with genus
$g > 0$ and $i_0 : \Sigma \to M^3$ is an embedding which
induces an injective map on $\pi_1$, then there is a
least area embedding with the same action on $\pi_1$.


The Min—Max Construction 
of Minimal Surfaces 


Variational arguments can also be used to construct 
higher index (i.e., nonminimizing) minimal surfaces 
using the topology of the space of surfaces. There 
are two basic approaches: 


1. Applying Morse theory to the energy functional 
on the space of maps from a fixed surface $\Sigma$ to M.

2. Doing a min-max argument over families of 
(topologically nontrivial) sweep-outs of M. 


The first approach has the advantage that the 
topological type of the minimal surface is easily 
fixed; however, the second approach has been more 
successful at producing embedded minimal surfaces. 
We will highlight a few key results below but refer 
to Colding and De Lellis (2003) for a thorough 
treatment. 

Unfortunately, one cannot directly apply Morse 
theory to the energy functional on the space of maps 
from a fixed surface because of a lack of compact- 
ness (the Palais-Smale condition C does not hold). 
Figure 1 A one-parameter family of curves on a 2-sphere
which induces a map $F : S^2 \to S^2$ of degree 1. First published in
Surveys in Differential Geometry, volume IX, in 2004, published
by International Press.


To get around this difficulty, Sacks-Uhlenbeck 
introduce a family of perturbed energy functionals 
which do satisfy condition C and then obtain 
minimal surfaces as limits of critical points for the 
perturbed problems: 


Theorem 18 If $\pi_k(M) \neq 0$ for some $k > 1$, then
there exists a branched immersed minimal 2-sphere 
in M (for any metric). 


The basic idea of constructing minimal surfaces 
via min-max arguments and sweep-outs goes back 
to Birkhoff, who developed it to construct simple 
closed geodesics on spheres. In particular, when M is 
a topological 2-sphere, we can find a one-parameter 
family of curves starting and ending at point curves 
so that the induced map $F : S^2 \to S^2$ (see Figure 1)
has nonzero degree. The min-max argument pro- 
duces a nontrivial closed geodesic of length less than 
or equal to the longest curve in the initial one- 
parameter family. A curve-shortening argument 
gives that the geodesic obtained in this way is 
simple. 

J Pitts applied a similar argument and geometric 
measure theory to get that every closed Riemannian 
3-manifold has an embedded minimal surface (his 
argument was for dimensions up to seven), but he 
did not estimate the genus of the resulting surface. 
Finally, F Smith (under the direction of L Simon) 
proved (see Colding and De Lellis (2003)): 


Theorem 19 Every metric on a topological
3-sphere M admits an embedded minimal 2-sphere. 


The main new contribution of Smith was to 
control the topological type of the resulting minimal 
surface while keeping it embedded. 


Part 3. Some Applications of Minimal 
Surfaces 


In this part, we discuss very briefly a few applica- 
tions of minimal surfaces. As mentioned in the 
introduction, there are many to choose from and we 
have selected just a few. 
The Positive-Mass Theorem


The (Riemannian version of the) positive-mass 
theorem states that an asymptotically flat 
3-manifold M with non-negative scalar curvature 
must have positive mass. The Riemannian manifold 
M here arises as a maximal spacelike slice in a 
(3 + 1)-dimensional spacetime solution of Einstein’s 
equations. 

The asymptotic flatness of M arises because the 
spacetime models an isolated gravitational system 
and hence is a perturbation of the vacuum solution 
outside a large compact set. To make this precise, 
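suppose for simplicity that M has only one end; M
is then said to be asymptotically flat if there is a
compact set $\Omega \subset M$ so that $M \setminus \Omega$ is diffeomorphic
to $\mathbb{R}^3 \setminus B_R(0)$ and the metric on $M \setminus \Omega$ can be
written as


$$g_{ij} = \left( 1 + \frac{M}{2|x|} \right)^4 \delta_{ij} + p_{ij}$$ [62]

where

$$|x|^2 |p_{ij}| + |x|^3 |D p_{ij}| + |x|^4 |D^2 p_{ij}| \leq C$$ [63]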


The constant M is the so-called mass of M. Observe
that the metric $g_{ij}$ is a perturbation of the metric on
a constant-time slice in the Schwarzschild spacetime
of mass M; that is to say, the Schwarzschild metric
has $p_{ij} = 0$.

A tensor h is said to be $O(|x|^{-p})$ if
$|x|^p |h| + |x|^{p+1} |Dh| \leq C$. For example, an easy calculation
shows that


$$g_{ij} = (1 + 2M/|x|)\, \delta_{ij} + O(|x|^{-2}),$$
$$\sqrt{g} = \sqrt{\det g_{ij}} = 1 + 3M|x|^{-1} + O(|x|^{-2})$$ [64]
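The two expansions above can be checked numerically in the model case $p_{ij} = 0$. The Python sketch below uses a hypothetical mass value and sample radii chosen here for illustration; it evaluates the conformal factor and the volume density of the metric $(1 + M/(2|x|))^4 \delta_{ij}$ in dimension 3 and looks at the remainders after multiplying by $|x|^2$.

```python
import numpy as np

def schwarzschild_slice_factors(M, r):
    """Coefficient of delta_ij and sqrt(det g) for g_ij = (1 + M/(2r))^4 delta_ij in dimension 3."""
    phi = 1.0 + M / (2.0 * r)
    conformal = phi ** 4      # coefficient of delta_ij
    sqrt_det = phi ** 6       # sqrt(det g) = (phi^4)^(3/2)
    return conformal, sqrt_det

M = -0.3                      # hypothetical mass value; the sign plays no role in this check
r = np.logspace(1, 4, 7)      # sample radii |x| from 10 to 10^4
conformal, sqrt_det = schwarzschild_slice_factors(M, r)

# Remainders of the first-order expansions, rescaled by |x|^2; both should stay bounded.
rem_metric = (conformal - (1.0 + 2.0 * M / r)) * r ** 2
rem_volume = (sqrt_det - (1.0 + 3.0 * M / r)) * r ** 2
```

By the binomial expansion the rescaled remainders tend to $3M^2/2$ and $15M^2/4$ respectively as $|x| \to \infty$, which is the $O(|x|^{-2})$ behavior claimed in the text. A general perturbation $p_{ij}$ is not modeled here.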
The positive-mass theorem states that the mass M
of such an M must be non-negative:


Theorem 20 (Schoen and Yau 1979). With M as
above, $M \geq 0$.


There is a rigidity theorem as well which states that
the mass vanishes only when M is isometric to $\mathbb{R}^3$:


Theorem 21 (Schoen and Yau 1979). If $|\nabla^3 p_{ij}| = O(|x|^{-5})$
and $M = 0$ in Theorem 20, then M is
isometric to $\mathbb{R}^3$.


We will give a very brief overview of the proof of 
Theorem 20, showing in the process where minimal 
surfaces appear. 


Proof (Sketch). The argument will be by contradiction,
so suppose that the mass is negative. It is
not hard to prove that the slab between two parallel
planes is mean convex. That is, we have the
following:


Lemma 3 If $M < 0$ and M is asymptotically flat,
then there exist $R_0, h > 0$ so that for $r > R_0$ the sets


$$C_r = \{ |x|^2 \leq r^2,\ -h \leq x_3 \leq h \}$$ [65]
have strictly mean-convex boundary.


Since the compact set $C_r$ is mean convex, we can
solve the Plateau problem to get an area-minimizing
(and hence stable) surface $\Gamma_r \subset C_r$ with boundary


$$\partial \Gamma_r = \{ |x|^2 = r^2,\ x_3 = h \}$$ [66]


Using the disk $\{ |x|^2 \leq r^2,\ x_3 = h \}$ as a comparison
surface, we get uniform local area bounds for any
such $\Gamma_r$. Combining these local area bounds with the
a priori curvature estimates for minimizing surfaces,
we can take a sequence of r’s going to infinity and
find a subsequence of $\Gamma_r$’s that converge to a
complete area-minimizing surface


$$\Gamma \subset \{ -h \leq x_3 \leq h \}$$ [67]


Since $\Gamma$ is pinched between the planes $\{x_3 = \pm h\}$, the
estimates for minimizing surfaces imply that (outside
a large compact set) $\Gamma$ is a graph over the plane
$\{x_3 = 0\}$ and hence has quadratic area growth and
finite total curvature. Moreover, using the form of
the metric $g_{ij}$, we see that $|\nabla u|$ decays like $|x|^{-1}$ and


$$\int_{\sigma_s} k_g = (2\pi s + O(1))(s^{-1} + O(s^{-2})) = 2\pi + O(s^{-1})$$ [68]


where $\sigma_s = \{ x_1^2 + x_2^2 = s^2 \} \cap \Gamma$ and $k_g$ is the geodesic
curvature of $\sigma_s$ (as a curve in $\Gamma$).

To get the contradiction, one combines stability of
$\Gamma$ with the positive scalar curvature of M to see that
no such $\Gamma$ could have existed. (M was assumed only
to have non-negative scalar curvature. However, a
“rounding off” argument shows that the metric on
M can be perturbed to have positive scalar curvature
outside of a compact set and still have negative
mass.) Namely, substituting the Gauss equation into
the stability inequality (this is the stability inequality
in a general 3-manifold; see Colding and Minicozzi II
(1999)) gives


$$\int_\Gamma \left( |A|^2/2 + \mathrm{Scal}_M - K_\Gamma \right) \phi^2 \leq \int_\Gamma |\nabla \phi|^2$$ [69]


Since $\Gamma$ has quadratic area growth, we can choose a
sequence of (logarithmic) cutoff functions in [69] to
get


$$0 < \int_\Gamma \left( |A|^2/2 + \mathrm{Scal}_M \right) \leq \int_\Gamma K_\Gamma$$ [70]


since $K_\Gamma$ may not be positive, we also used that $\Gamma$
has finite total curvature. Moreover, we used that
$\mathrm{Scal}_M$ is positive outside a compact set to see that
the first integral in [70] was positive. Finally,
substituting [70] into the Gauss-Bonnet formula
gives that $\int_{\sigma_s} k_g$ is strictly less than $2\pi$ for s large,
contradicting [68].


Black Holes


Another way that minimal surfaces enter into 
relativity is through black holes. Suppose that we 
have a three-dimensional time slice M in a (3 + 1)- 
dimensional spacetime. For simplicity, assume that M 
is totally geodesic and hence has non-negative scalar 
curvature. A closed surface $\Sigma$ in M is said to be
trapped if its mean curvature is everywhere negative 
with respect to its outward normal. Physically, this 
means that the surface emits an outward shell of light 
whose surface area is decreasing everywhere on the 
surface. The existence of a closed trapped surface 
implies the existence of a black hole in the spacetime. 

Given a trapped surface, we can look for the 
outermost trapped surface containing it; this outer- 
most surface is called an apparent horizon. It is not 
hard to see that an apparent horizon must be a 
minimal surface and, moreover, a barrier argument 
shows that it must be stable. Since M has non- 
negative scalar curvature, stability in turn implies 
that it must be diffeomorphic to a sphere. See, for 
instance, Bray (2002) for references to some results 
on black holes, horizons, etc. 


Constant Mean Curvature Surfaces 


At least since the time of Plateau, minimal surfaces 
have been used to model soap films. This is because 
the mean curvature of the surface models the surface 
tension and this is essentially the only force acting 
on a soap film. Soap bubbles, on the other hand, enclose
a volume and thus the pressure gives a second 
counterbalancing force. It follows easily that these 
two forces are in equilibrium when the surface has 
constant mean curvature (cmc). 

For the same reason, cmc surfaces arise in the 
isoperimetric problem. Namely, a surface that mini- 
mizes surface area while enclosing a fixed volume must 
have cmc. It is not hard to see that such an
isoperimetric surface in $\mathbb{R}^n$ must be a round sphere.
There are two interesting partial converses to this.
First, by a theorem of Hopf, any cmc 2-sphere in $\mathbb{R}^3$
must be round. Second, using the maximum principle
(“the method of moving planes”), Alexandrov showed
that any closed embedded cmc hypersurface in $\mathbb{R}^n$
must be a round sphere. It turned out, however, that
not every closed immersed cmc surface is round. The
first examples were immersed cmc tori constructed by
H Wente. Kapouleas constructed many new examples,
including closed higher-genus cmc surfaces.


Figure 2 The sweep-out, the min-max surface, and the width
W. First published in the Journal of the American Mathematical
Society in 2005, published by the American Mathematical Society.

Many of the techniques developed for studying 
minimal surfaces generalize to general cmc surfaces. 


Finite Extinction for Ricci Flow 


We close this article by indicating how minimal 
surfaces can be used to show that on a homotopy 
3-sphere the Ricci flow becomes extinct in finite 
time (see Colding and Minicozzi II (2005) and 
Perelman (2003) for details). 

Let $M^3$ be a smooth closed orientable 3-manifold
and let g(t) be a one-parameter family of metrics on
M evolving by the Ricci flow, so


$$\partial_t g = -2\, \mathrm{Ric}_{M_t}$$ [71]


In an earlier section, we saw that there is a natural 
way of constructing minimal surfaces on many 
3-manifolds and that comes from the min-max 
argument where the minimal of all maximal slices of 
sweep-outs is a minimal surface. The idea is then to 
look at how the area of this min-max surface changes 
under the flow. Geometrically, the area measures a 
kind of width of the 3-manifold and as we will see for 
certain 3-manifolds (those, like the 3-sphere, whose 
prime decomposition contains no aspherical factors), 
the area becomes zero in finite time corresponding to 
the solution becoming extinct in finite time. 

Fix a continuous map $\beta : [0,1] \to C^0 \cap L_1^2(S^2, M)$
where $\beta(0)$ and $\beta(1)$ are constant maps so that $\beta$ is
in the nontrivial homotopy class $[\beta]$ (such $\beta$ exists
when M is a homotopy 3-sphere). We define the
width $W = W(g, [\beta])$ by


$$W(g) = \min_{\gamma \in [\beta]} \max_{s \in [0,1]} \mathrm{Energy}(\gamma(s))$$ [72]


The next theorem gives an upper bound for the 
derivative of W(g(t)) under the Ricci flow which forces 
the solution g(t) to become extinct in finite time. 
Theorem 22 Let $M^3$ be a homotopy 3-sphere
equipped with a Riemannian metric $g = g(0)$.
Under the Ricci flow, the width W(g(t)) satisfies


$$\frac{d}{dt} W(g(t)) \leq -4\pi + \frac{3}{4(t + C)} W(g(t))$$ [73]


in the sense of the limsup of forward difference
quotients. Hence, g(t) must become extinct in finite
time.


The $4\pi$ in [73] comes from the Gauss-Bonnet
theorem and the 3/4 comes from the bound on the 
minimum of the scalar curvature that the evolution 
equation implies. Both of these constants matter 
whereas the constant C depends on the initial metric 
and the actual value is not important. 

To see that [73] implies finite extinction time,
rewrite [73] as


$$\frac{d}{dt} \left( W(g(t)) (t + C)^{-3/4} \right) \leq -4\pi (t + C)^{-3/4}$$ [74]


and integrate to get

$$(T + C)^{-3/4} W(g(T)) \leq C^{-3/4} W(g(0)) - 16\pi \left[ (T + C)^{1/4} - C^{1/4} \right]$$ [75]


Since $W \geq 0$ by definition and the right-hand side of
[75] would become negative for T sufficiently large,
we get the claim.
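To make the last step concrete, the following Python sketch evaluates the bound obtained from [75] and the time at which that bound reaches zero. The function names and the sample values of the initial width and of the constant C are hypothetical, chosen here only for illustration.

```python
import math

def width_upper_bound(T, W0, C):
    """Upper bound for W(g(T)) obtained by solving [75] for W(g(T)), with W0 = W(g(0))."""
    return (T + C) ** 0.75 * (C ** -0.75 * W0
                              - 16.0 * math.pi * ((T + C) ** 0.25 - C ** 0.25))

def extinction_time_bound(W0, C):
    """Time at which the bound above vanishes; the flow must be extinct by then."""
    return (C ** 0.25 + W0 / (16.0 * math.pi * C ** 0.75)) ** 4 - C

W0, C = 50.0, 2.0             # hypothetical initial width and constant
T_star = extinction_time_bound(W0, C)
```

The bound is the solution of the equality case of [73] with initial value W0, so it decreases through zero at `T_star`; since the width is non-negative, the flow cannot exist past that time under the stated assumptions.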

As a corollary of this theorem we get finite 
extinction time for the Ricci flow. 


Corollary 3 Let $M^3$ be a homotopy 3-sphere
equipped with a Riemannian metric g =g(0). Under 
the Ricci flow g(t) must become extinct in finite time. 
Introduction


When studying a functional f on an infinite-dimensional
function space X, one is often interested
in finding critical points which are not local minima.
A simple yet powerful method to detect those
critical points is the minimax method. The idea
consists in detecting some complexity in the topology
of X, or in the structure of the sublevels of f, to
find a class $\Gamma$ of subsets of X which somehow
reveals such a topological complexity, and to show
that the number


$$c := \inf_{\gamma \in \Gamma} \sup_{x \in \gamma} f(x)$$


is finite (even if the functional may be unbounded
above and below). If the class $\Gamma$ is positively
invariant under the action of the negative-gradient
flow of f, and if a suitable compactness assumption
known as the Palais-Smale condition holds, c is
proved to be a critical value of f. Quite remarkably,
the minimax method also works when no topological
complexity is present, but the negative-gradient
flow of f exhibits some kind of rigidity.

In this article we shall describe these ideas, 
starting from the simplest minimax result, the 
“mountain-pass theorem.” We will show how to 


apply the minimax method by discussing the 
existence question of solutions of a nonlinear elliptic 
boundary value problem, of closed geodesics on 
compact manifolds, and of closed characteristics on 
compact energy hypersurfaces. 


The Mountain-Pass Theorem 


Let us start by considering the following familiar
fact. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a smooth coercive function
(i.e., its sublevels have compact closure). If a sublevel
$\{f < a\}$ is not connected, say $\{f < a\} = A \cup B$ with
A, B disjoint open sets, then f has a critical point x at
level


$$f(x) = c := \inf_{\gamma \in \Gamma} \max_{u \in \gamma} f(u)$$


where $\Gamma$ is the class of all continuous curves in $\mathbb{R}^n$
with one end point in A and the other in B. More
figuratively: if there are two valleys, then there
must be a mountain pass. Let us examine a possible
proof.

First notice that any curve in the class $\Gamma$ will have
to cross the level $\{f = a\}$, so $c \geq a$. If by contradiction
c is not a critical value of f, by the compactness of the
sublevels there is some $\epsilon > 0$ such that $|\nabla f| \geq \epsilon$ on
$\{c - \epsilon \leq f \leq c + \epsilon\}$. Then the negative-gradient flow
of f, that is, the solution of


$$\partial_t \phi(t, u) = -\nabla f(\phi(t, u)), \qquad \phi(0, u) = u$$


pulls the sublevel $\{f \leq c + \epsilon\}$ down into the sublevel
$\{f \leq c - \epsilon\}$ in finite time $2/\epsilon$. Indeed, if
$\phi([0, t], u) \subset \{c - \epsilon \leq f \leq c + \epsilon\}$, then the inequalities


$$2\epsilon \geq f(u) - f(\phi(t, u)) = -\int_0^t \frac{d}{ds} f(\phi(s, u))\, ds = \int_0^t |\nabla f(\phi(s, u))|^2\, ds \geq \epsilon^2 t$$


imply that $t \leq 2/\epsilon$. By definition of c, we can find a
continuous curve $\gamma \in \Gamma$ which is contained in
$\{f \leq c + \epsilon\}$. But then the curve $\gamma' := \phi(2/\epsilon, \gamma)$ still has one
end point in A, the other one in B, and lies in
$\{f \leq c - \epsilon\}$, contradicting the definition of c.
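The deformation step of this proof can be illustrated numerically. The Python sketch below uses a specific coercive function on the plane, $f(x, y) = (x^2 - 1)^2 + y^2$, with valleys at $(\pm 1, 0)$ and a saddle at the origin where $f = 1$; this function, the initial curve, the explicit Euler scheme, and the step sizes are all assumptions made here for illustration. Every curve joining the two valleys crosses the line $x = 0$, on which $f \geq 1$, so the minimax level is $c = 1$.

```python
import numpy as np

def f(p):
    """Coercive function on R^2 with two valleys at (-1, 0) and (1, 0)."""
    x, y = p[..., 0], p[..., 1]
    return (x ** 2 - 1.0) ** 2 + y ** 2

def grad_f(p):
    """Gradient of f, evaluated pointwise on an array of points."""
    x, y = p[..., 0], p[..., 1]
    return np.stack([4.0 * x * (x ** 2 - 1.0), 2.0 * y], axis=-1)

def flow(points, total_time, dt=0.01):
    """Explicit Euler approximation of the negative-gradient flow, applied to each point."""
    p = points.copy()
    for _ in range(int(round(total_time / dt))):
        p = p - dt * grad_f(p)
    return p

# A discretized curve joining the two valleys and passing over the point (0, 2).
x = np.linspace(-1.0, 1.0, 201)
curve = np.stack([x, 2.0 * np.sin(0.5 * np.pi * (x + 1.0))], axis=-1)

max_before = f(curve).max()                        # attained at (0, 2), where f = 5
max_after = f(flow(curve, total_time=5.0)).max()   # close to the saddle value 1
```

The flow lowers the maximum of f along the curve from 5 to a value just above 1 but cannot push it below the critical level $c = 1$, in line with the argument above. The discretized points near $x = 0$ drift apart toward the two valleys, which is the numerical trace of the saddle.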

If we try to generalize this result to functions
defined on an infinite-dimensional real Hilbert space
H, we encounter difficulties due to lack of compactness.
Indeed, a continuous function on an infinite-dimensional
Hilbert space can never have compact
sublevels (with respect to the norm topology). If we
look back at the proof, we see that we have used
coercivity to guarantee that if the level set $\{f = c\}$
contains no critical points, then $\nabla f$ is bounded away
from zero on the strip $\{c - \epsilon \leq f \leq c + \epsilon\}$, for some
small $\epsilon > 0$. A natural idea is then to replace the
coercivity assumption by a condition implying the
latter fact.


Definition Let $f : H \to \mathbb{R}$ be a continuously differentiable
function on a real Hilbert space H.
A sequence $(u_h) \subset H$ is said to be a Palais-Smale sequence
if $f(u_h)$ is bounded and $Df(u_h)$ tends to zero. The
function f is said to satisfy the Palais-Smale
condition if every Palais-Smale sequence has a
converging subsequence.


The Palais-Smale condition readily implies the
statement above. Assuming also that f is twice
continuously differentiable, the negative-gradient
flow of f (a well-defined local flow because $\nabla f$ is
continuously differentiable) pulls the sublevel
$\{f \leq c + \epsilon\}$ down into $\{f \leq c - \epsilon\}$ in finite time. These
observations lead to the following:


Theorem (Mountain pass). Let f be a twice continuously
differentiable function on a real Hilbert
space H, satisfying the Palais-Smale condition.
Assume that a sublevel $\{f < a\}$ is not connected,
and let A, B be two disjoint open sets such that
$A \cup B = \{f < a\}$. Then f has a critical point x at level


$$f(x) = c := \inf_{\gamma \in \Gamma} \max_{u \in \gamma} f(u) \geq a$$


where $\Gamma$ is the class of all continuous curves in H
with one end point in A and the other one in B.
If we are even more ambitious, and we wish to
consider functions defined on a real Banach space E,
we also encounter the problem of not having a
gradient vector field. Indeed, the differential of f at x,
Df(x), is an element of the dual space $E^*$, but in this
case we have no inner product on E by which we can
represent Df(x) as the product by some vector of E.
This problem can be overcome by the notion of a
pseudogradient vector field. In fact, it can be proved
that if f is continuously differentiable on E, then there
exists a locally Lipschitz vector field V defined on the
complement of the critical points of f, such that


$$\|V(u)\| \leq \min\{\|Df(u)\|, 1\},$$
$$Df(u)[V(u)] \geq \tfrac{1}{2} \min\{\|Df(u)\|, 1\}\, \|Df(u)\|$$


In other words, even if there is no direction of
steepest increase for f, we do have directions along
which the increase of f is steep enough, and these
directions can be selected in a locally Lipschitz way.
Notice that pseudogradients are useful also in the
case of a continuously differentiable function on a
Hilbert space: in this case the gradient of f is just
continuous, so it need not generate a flow. The
Palais-Smale condition, as stated above, makes
perfect sense on the Banach space E (with the only
difference that now $Df(u_h)$ tends to zero in the dual
norm of $E^*$), and the mountain-pass theorem holds
for functions of class $C^1$ on a Banach space.

Actually, the fact that the domain of f has a vector
structure is not relevant in this statement, and the
mountain-pass theorem holds also for functions
defined on connected infinite-dimensional manifolds.
Since the essential feature is to have a
pseudogradient vector field available, the right level of
generality is to consider a Banach manifold M (i.e., a
manifold modeled on a Banach space) endowed with a
complete Finsler structure (i.e., a Banach norm on
each tangent space of M, varying in a suitably regular
way, inducing a complete distance on M).


A Nonlinear Elliptic Boundary-Value 
Problem 


Let us consider a typical application of the mountain-pass
theorem to a semilinear elliptic boundary-value
problem. Let $\Omega$ be a smooth bounded domain in $\mathbb{R}^n$,
and for $\lambda \in \mathbb{R}$, $p > 2$, consider the problem


$$-\Delta u = \lambda u + u|u|^{p-2} \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega$$ [1]


Let $0 < \lambda_1 < \lambda_2 \leq \lambda_3 \leq \cdots$ be the eigenvalues of the
Laplace operator $-\Delta$, with domain $H^2 \cap H_0^1(\Omega)$, the
Sobolev space of $L^2$-functions on $\Omega$ with weak first
two derivatives in $L^2$, vanishing on $\partial\Omega$. We claim that,
if $n = 2$, or if $n \geq 3$ and $2 < p < 2^* := 2n/(n - 2)$, then
problem [1] with $\lambda < \lambda_1$ has a nontrivial solution.

By elliptic regularity, the solutions of [1] are
precisely the critical points of the functional


$$\mathcal{E}(u) = \frac{1}{2} \int_\Omega \left( |\nabla u(x)|^2 - \lambda u(x)^2 \right) dx - \frac{1}{p} \int_\Omega |u(x)|^p\, dx$$


We recall that $H_0^1(\Omega)$ continuously embeds into
$L^p(\Omega)$, for every $p < +\infty$ if $n = 2$, for every $p \leq 2^*$ if
$n \geq 3$. So the functional $\mathcal{E}$ is well defined, and
actually continuously differentiable, on $H_0^1(\Omega)$, a
Hilbert space with the inner product


$$\langle u, v \rangle_{H_0^1} := \int_\Omega \nabla u(x) \cdot \nabla v(x)\, dx$$


Since $p > 2$, near zero the quadratic part of
the functional $\mathcal{E}$ dominates over the part with the
$L^p$-norm. By the Rayleigh characterization of the
first eigenvalue of the Laplacian,


$$\lambda_1 = \min_{u \in H_0^1(\Omega) \setminus \{0\}} \frac{\int_\Omega |\nabla u(x)|^2\, dx}{\int_\Omega u(x)^2\, dx}$$


the assumption $\lambda < \lambda_1$ implies that the quadratic
part of $\mathcal{E}$ is positive definite. So we can find a small
$\rho > 0$ such that


$$a := \inf_{\|u\|_{H_0^1} = \rho} \mathcal{E}(u) > 0$$


On the other hand, the fact that $p > 2$ implies that


$$\lim_{\mu \to +\infty} \mathcal{E}(\mu u) = -\infty$$

for every $u \neq 0$. Therefore, the sublevel $\{\mathcal{E} < a\}$
is not connected, and if we can prove the
Palais-Smale condition, the mountain-pass theorem
will imply the existence of a critical point u with
$\mathcal{E}(u) \geq a > 0$, i.e., a nontrivial solution of [1].

In order to prove the Palais-Smale condition,
notice that the expression for the differential of $\mathcal{E}$,


$$D\mathcal{E}(u)[v] = \int_\Omega \nabla u(x) \cdot \nabla v(x)\, dx - \int_\Omega \left( \lambda u(x) + |u(x)|^{p-2} u(x) \right) v(x)\, dx$$


and the compactness of the embedding of $H_0^1(\Omega)$
into $L^p(\Omega)$ for $p < 2^*$ imply that the gradient of
$\mathcal{E}$ has the form

$$\nabla \mathcal{E}(u) = u + K(u)$$ [2]


where $K : H_0^1(\Omega) \to H_0^1(\Omega)$ is a compact map, that is,
it maps bounded sets into precompact ones. It is
readily seen that when $\nabla \mathcal{E}$ has such a form, bounded
Palais-Smale sequences are compact. Thus, it is
enough to show that every Palais-Smale sequence is
bounded. But this follows from the identity


$$p\, \mathcal{E}(u) - D\mathcal{E}(u)[u] = \left( \frac{p}{2} - 1 \right) \int_\Omega \left( |\nabla u(x)|^2 - \lambda u(x)^2 \right) dx$$


together with the fact that the right-hand side term
defines an equivalent norm on $H_0^1(\Omega)$, because $p > 2$
and $\lambda < \lambda_1$. This concludes the proof.
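The boundedness step can be spelled out as follows. This is a sketch of the standard estimate; the sequence index h, the constant $C_0$, and the notation $\lambda^+ = \max\{\lambda, 0\}$ are introduced here and are not taken from the source.

```latex
% For \lambda < \lambda_1, the Rayleigh characterization of \lambda_1 gives
\int_\Omega \bigl( |\nabla u|^2 - \lambda u^2 \bigr)\,dx
  \;\ge\; \Bigl( 1 - \frac{\lambda^+}{\lambda_1} \Bigr) \|u\|_{H_0^1}^2 ,
  \qquad \lambda^+ := \max\{\lambda, 0\} .
% If (u_h) is a Palais-Smale sequence, then |\mathcal{E}(u_h)| \le C_0 and
% \|D\mathcal{E}(u_h)\| \to 0, so the identity above yields
\Bigl( \frac{p}{2} - 1 \Bigr) \Bigl( 1 - \frac{\lambda^+}{\lambda_1} \Bigr) \|u_h\|_{H_0^1}^2
  \;\le\; p\,\mathcal{E}(u_h) - D\mathcal{E}(u_h)[u_h]
  \;\le\; p\,C_0 + \|D\mathcal{E}(u_h)\|\,\|u_h\|_{H_0^1} .
% A quadratic quantity bounded by an affine one forces
% \sup_h \|u_h\|_{H_0^1} < \infty .
```

Both factors on the left are positive exactly because $p > 2$ and $\lambda < \lambda_1$, which is where the two hypotheses enter.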

Actually, using the maximum principle one could 
show that under the same assumptions, problem [1] 
has a solution which is positive in Q. 

When $n \geq 3$ and $p = 2^* = 2n/(n - 2)$, the functional
$\mathcal{E}$ still exhibits a mountain-pass geometry, but
the Palais-Smale condition fails. In fact, the embedding
of $H_0^1(\Omega)$ into $L^{2^*}(\Omega)$ is not compact, so the
map K appearing in [2] is not compact, and
bounded Palais-Smale sequences need not have a
converging subsequence. We recall that the noncompactness
of the embedding of $H_0^1(\Omega)$ into $L^{2^*}(\Omega)$
is due to the fact that the quotient


$$S(u) = \frac{\int_\Omega |\nabla u(x)|^2\, dx}{\left( \int_\Omega |u(x)|^{2^*}\, dx \right)^{2/2^*}}$$


is invariant under rescaling $u \mapsto u_\mu(x) = u(\mu x)$.

When $\lambda = 0$, the Pohozaev identity (an integral
formula obtained by multiplying the equation by
$x \cdot \nabla u(x)$) can be used to prove that problem [1]
has no nontrivial solutions, when $\Omega$ is a star-shaped
domain other than the whole $\mathbb{R}^n$.

When $\lambda \neq 0$, the presence in the functional of an
$L^2$-norm, which rescales differently, breaks the
symmetry, and the existence of nontrivial solutions
is again possible. Indeed, Brezis and Nirenberg have
shown that problem [1] with $p = 2^*$ has a nontrivial
solution provided that $n \geq 4$ and $0 < \lambda < \lambda_1$, or
$n = 3$ and $\lambda^* < \lambda < \lambda_1$, for some $\lambda^* \in [0, \lambda_1)$ depending
on the domain $\Omega$.

The proof is based on the fact that there is a
certain threshold $s > 0$, related to the best Sobolev
constant obtained by taking the infimum of S(u)
over all $u \in H_0^1$ (the domain is irrelevant here),
below which the Palais-Smale condition holds. That
is, every sequence $(u_h)$ such that $\mathcal{E}(u_h)$ converges to
some b less than s, and $D\mathcal{E}(u_h)$ tends to zero, is
compact. The proof of the mountain-pass theorem
shows that the Palais-Smale condition is needed
only at the minimax level c. In order to conclude, it
is then enough to show that $c < s$. The value of
c can be estimated by using the fact that the
infimum of the quotient S over functions on the
whole $\mathbb{R}^n$ is attained at the family of functions


$$v_\epsilon(x) = \frac{\left[ n(n-2)\epsilon^2 \right]^{(n-2)/4}}{\left( \epsilon^2 + |x|^2 \right)^{(n-2)/2}}$$


which are then solutions of [1] with $p = 2^*$, $\lambda = 0$,
and $\Omega = \mathbb{R}^n$.
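The claim that these functions solve the critical equation on the whole space can be checked numerically. The Python sketch below assumes the standard normalization $v_\epsilon(x) = [n(n-2)\epsilon^2]^{(n-2)/4} / (\epsilon^2 + |x|^2)^{(n-2)/2}$ and uses central finite differences for the radial Laplacian; the function names, the value of $\epsilon$, the step size, and the sample radii are choices made here for illustration.

```python
import numpy as np

def v_eps(r, eps, n):
    """Radial profile [n(n-2) eps^2]^((n-2)/4) / (eps^2 + r^2)^((n-2)/2)."""
    return ((n * (n - 2) * eps ** 2) ** ((n - 2) / 4.0)
            / (eps ** 2 + r ** 2) ** ((n - 2) / 2.0))

def critical_equation_residual(r, eps, n, h=1e-4):
    """Finite-difference residual of -Laplace(v) - v^(2* - 1) for the radial function v_eps."""
    v0 = v_eps(r, eps, n)
    vp, vm = v_eps(r + h, eps, n), v_eps(r - h, eps, n)
    first = (vp - vm) / (2.0 * h)
    second = (vp - 2.0 * v0 + vm) / h ** 2
    laplacian = second + (n - 1) / r * first      # radial Laplacian in R^n
    critical_power = 2.0 * n / (n - 2) - 1.0      # 2* - 1 = (n + 2)/(n - 2)
    return -laplacian - v0 ** critical_power

r = np.linspace(0.5, 3.0, 6)
residuals = {n: np.abs(critical_equation_residual(r, eps=0.7, n=n)).max()
             for n in (3, 4, 6)}
```

The residuals should be at the level of the finite-difference error, far below the size of the two terms being compared, which supports the identity $-\Delta v_\epsilon = v_\epsilon^{2^* - 1}$ for the sampled dimensions.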

Another way to break the symmetry is to keep
$\lambda = 0$ but to consider domains with a rich topology.
For instance, Bahri and Coron have shown that if $\Omega$
is a domain with some nonzero singular homology
group $H_k(\Omega; \mathbb{Z}_2)$, $k \geq 1$, then problem [1] with
$p = 2^*$ and $\lambda = 0$ has a positive solution.

Elliptic equations having nonlinearities with the
critical exponent $2^*$ arise naturally in some geometric
problems. Consider a manifold M of dimension
$n \geq 3$, with a metric g having scalar curvature k.
The Yamabe problem calls for finding a metric $g_0$,
conformally equivalent to g, having constant scalar
curvature. If $g_0 = u^{4/(n-2)} g$, where the positive function
u gives the conformal factor, one finds that
u must solve the equation


$$-\frac{4(n - 1)}{n - 2} \Delta_g u = -k u + k_0 u |u|^{2^* - 2}$$


where $\Delta_g$ is the Laplace-Beltrami operator associated
with the metric g, and the constant $k_0$ is the scalar
curvature of $g_0$. Again, the corresponding functional
satisfies the Palais-Smale condition only below a
certain threshold (actually, the same number s as seen
earlier; this is because the lack of compactness is due to
local concentration phenomena, and the metric
structure of the whole ambient space becomes irrelevant).
The task is then to show that the minimax level is
below that threshold or, equivalently, that a certain
best Sobolev constant for (M, g) is less than the
corresponding constant for $\mathbb{R}^n$ with the flat metric
(the latter constant is again the infimum of S(u)). This
fact was proved by Aubin in the case $n \geq 6$ and (M, g)
not locally conformally flat. Schoen has then treated
the remaining cases, by means of the positive-mass
theorem, a deep result in differential geometry.

A General Minimax Principle 


Let us consider again a twice continuously differ- 
entiable function f on a real Hilbert space H. The 
vector field 


Via) = elo _ 
1+ IVF) 


has the same nice properties as the gradient vector
field of f, but in addition it is bounded. The
advantage is that the flow of $-V$ is globally defined.
When talking about the negative-gradient flow of
f, we will actually refer to such a flow. It will also be
useful to have at our disposal a negative-gradient flow
truncated below level b. This is the flow of the
vector field $-V_b$, where

$$V_b(u) := \chi(f(u))\, V(u)$$

with $\chi$ a smooth function on $\mathbb{R}$ which is identically
zero on $(-\infty, b]$, then increases up to the
value 1, and afterwards remains constantly equal to
1. This truncated negative-gradient flow keeps the
points in the sublevel $\{f \le b\}$ fixed, and behaves
like the negative-gradient flow above b (except for the
fact that trajectories slow down as the value of
f approaches b).
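The construction can be tried out in finite dimensions. The following is a minimal sketch, not the author's construction: it assumes the normalization $\nabla f / \sqrt{1 + |\nabla f|^2}$ for the bounded field, uses an illustrative double-well function on $\mathbb{R}^2$, replaces the smooth cutoff by a $C^1$ smooth-step of hypothetical width 0.1, and integrates with explicit Euler steps.

```python
import math

def f(u):
    # Illustrative double-well function on R^2 (an assumption, not from the text)
    x, y = u
    return (x * x - 1.0) ** 2 + y * y

def grad_f(u):
    x, y = u
    return (4.0 * x * (x * x - 1.0), 2.0 * y)

def V(u):
    # Bounded field: gradient divided by sqrt(1 + |gradient|^2), so |V| < 1
    gx, gy = grad_f(u)
    norm = math.sqrt(1.0 + gx * gx + gy * gy)
    return (gx / norm, gy / norm)

def cutoff(s, b, width=0.1):
    # 0 for s <= b, 1 for s >= b + width; only C^1, a stand-in for a smooth cutoff
    r = min(max((s - b) / width, 0.0), 1.0)
    return r * r * (3.0 - 2.0 * r)

def truncated_flow(u, t, b, dt=1e-3):
    # Explicit Euler integration of u' = -cutoff(f(u)) * V(u) up to time t
    for _ in range(int(round(t / dt))):
        c = cutoff(f(u), b)
        vx, vy = V(u)
        u = (u[0] - dt * c * vx, u[1] - dt * c * vy)
    return u
```

Points with $f \le b$ are left fixed because the cutoff vanishes there, while points above level b move in a direction that decreases f.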

After these preliminaries, let us consider again the
characterization of the critical level c appearing in the
mountain-pass theorem. This critical level was
obtained as the infimum over a certain class $\Gamma$ of
sets $\gamma$ (the curves with end points in different
components of $\{f < a\}$) of the maximum of f over $\gamma$.
But if we look back at the proof, we realize that the
fact that these sets were curves was not essential. The
important feature was that the negative-gradient flow
$\phi(t, \cdot)$ mapped a set of the class $\Gamma$ into a set still
belonging to the class $\Gamma$, for $t \ge 0$. This observation
leads to the following general minimax theorem, due
to Palais:


Theorem (General minimax). Let f be a twice
continuously differentiable function on a real
Hilbert space H, satisfying the Palais-Smale condition.
Let $\Gamma$ be a class of subsets of H which is
positively invariant under the action of the negative-gradient
flow $\phi$ of f (possibly truncated below level
b): that is, if the set $\gamma$ belongs to $\Gamma$, then the set $\phi(t, \gamma)$
belongs to $\Gamma$ for all $t \ge 0$. Then, if the number

$$c := \inf_{\gamma \in \Gamma} \sup_{u \in \gamma} f(u)$$

is finite (and larger than b), then c is a critical
value of f.


The proof goes along the same lines as the proof of
the mountain-pass theorem: if c is not a critical value
of f, the (possibly truncated) negative-gradient flow
$\phi(t_0, \cdot)$ pulls a sublevel $\{f \le c + \epsilon\}$ down into the
sublevel $\{f \le c - \epsilon\}$ (with $c - \epsilon > b$), for some large
$t_0$, by the Palais-Smale condition. Then we reach a
contradiction by choosing a set $\gamma \in \Gamma$ on which f does
not exceed $c + \epsilon$, and noticing that $\phi(t_0, \gamma)$ is a set
which still belongs to the class $\Gamma$, by positive
invariance, and on which f does not exceed $c - \epsilon$.
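A discrete analogue of the minimax level can be computed directly. The sketch below is only an illustration under stated assumptions: the double-well function, the grid, and the restriction to nearest-neighbour grid paths are all choices made here, not part of the theorem. It finds, among grid paths joining the two minima $(\pm 1, 0)$, the smallest possible value of the maximum of f along the path, using a Dijkstra-type search in which the cost of a path is its maximum rather than its sum.

```python
import heapq

def f(x, y):
    # Illustrative double-well function: minima at (-1, 0) and (1, 0), saddle at (0, 0)
    return (x * x - 1.0) ** 2 + y * y

def mountain_pass_level(n=15, h=0.1):
    # Grid points are (i * h, j * h) with -n <= i, j <= n
    m = round(1.0 / h)                      # grid index of x = 1
    start, goal = (-m, 0), (m, 0)
    best = {start: f(start[0] * h, start[1] * h)}
    heap = [(best[start], start)]
    while heap:
        level, (i, j) = heapq.heappop(heap)
        if (i, j) == goal:
            return level                    # minimal achievable maximum of f
        if level > best[(i, j)]:
            continue                        # stale heap entry
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            p = (i + di, j + dj)
            if abs(p[0]) > n or abs(p[1]) > n:
                continue
            new_level = max(level, f(p[0] * h, p[1] * h))
            if new_level < best.get(p, float("inf")):
                best[p] = new_level
                heapq.heappush(heap, (new_level, p))
    raise ValueError("goal not reachable on this grid")
```

For this particular function every path between the two wells must cross the line x = 0, where f is at least 1, and the straight path through the saddle attains exactly 1, so the computed level coincides with the critical value at the saddle point.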

As we shall see in the last section, the possibility 
of working with a truncated negative-gradient flow
(assuming in this case that c > b) makes the application
of this theorem easier. Again, an analogous
result holds for continuously differentiable functions 
on Banach spaces, or more generally on Banach 
manifolds with a complete Finsler structure. 

Trivial classes $\Gamma$ are the class of all points in H,
and the class consisting of the single set H, yielding
the infimum and the supremum of f, respectively.
More interesting classes are constructed by fixing a
topological space X and considering the images of
all continuous maps $h: X \to H$ belonging to a
certain relative homotopy class.


Closed Geodesics on Compact Manifolds 


A typical application of the general minimax
theorem is Birkhoff's proof of the existence of a
closed geodesic on the sphere $S^2$, endowed with an
arbitrary metric g. Closed geodesics are precisely the
critical points of the energy functional

$$S(x) = \frac{1}{2} \int_0^1 g(\dot{x}(t), \dot{x}(t))\, dt$$

on the Hilbert manifold $H^1(\mathbb{T}, S^2)$ consisting of all
one-periodic loops on $S^2$ of Sobolev regularity $H^1$
(here $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ denotes the circle parametrized by
[0,1]). This functional satisfies the Palais-Smale
condition and it is bounded below, but its minima
are just the trivial constant loops, on which S = 0.
Let us use angle coordinates $(\theta, \varphi)$ on $S^2$,
$-\pi/2 \le \theta \le \pi/2$, $0 \le \varphi < 2\pi$ ($\theta$ is the latitude, $\varphi$ the
longitude). A (suitably regular) map $h: S^2 \to S^2$ induces a
curve in $H^1(\mathbb{T}, S^2)$ parametrized by $\theta$: the value of
this curve at $\theta \in [-\pi/2, \pi/2]$ is the loop
$t \mapsto h(\theta, 2\pi t)$. It is a curve that joins two constant
loops. Let $\Gamma$ be the set of curves in $H^1(\mathbb{T}, S^2)$ which
are obtained from maps $h: S^2 \to S^2$ of topological
degree 1. This class is clearly positively invariant
under the action of the negative-gradient flow of
S (as under every homotopy fixing the constant loops).
If we can show that the minimax level

$$c := \inf_{\gamma \in \Gamma} \sup_{x \in \gamma} S(x)$$

is positive, we will get a positive critical value of S by
the general minimax theorem, hence a nontrivial
closed geodesic. By considering the fact that loops
with small energy also have a small diameter, it
is easy to construct a homotopy on $\{S < a\}$, for
some small a > 0, which shrinks every loop to a
point. If $h: S^2 \to S^2$ determines a curve $\gamma$ with
$\max_{x \in \gamma} S(x) < a$, composition with this homotopy
yields a homotopy of h to a map whose image is
a curve in $S^2$. A further homotopy then shows that
the map h is homotopic to a constant, which
is impossible if h has degree 1. This shows that
$c \ge a > 0$, concluding the proof.

Actually, Ljusternik and Fet have proved that
every compact manifold M has a nontrivial closed
geodesic. Indeed, if M has nonzero fundamental
group, it is enough to minimize S on some nontrivial
homotopy class of loops. Otherwise, the fact that
M is a compact manifold implies that some homotopy
group $\pi_{k+1}(M)$, $1 \le k < \dim M$, does not vanish.
A construction similar to the one described
above then allows one to associate with every noncontractible
map $h: S^{k+1} \to M$ a map
$u: (B^k, \partial B^k) \to (H^1(\mathbb{T}, M), \{S = 0\})$ which is not homotopically
trivial (here $B^k$ denotes the closed unit ball in $\mathbb{R}^k$,
and the notation means that u maps the boundary
of the ball $B^k$ into the set of constant loops). Taking
a minimax over the set of images of the maps
u associated with every noncontractible map
$h: S^{k+1} \to M$ yields the desired critical point of
S with positive energy.

It is conjectured that every compact manifold has
infinitely many closed geodesics. Morse theory
allows one to prove this fact for the vast majority of
manifolds, but not for the spheres. Bangert and
Franks have established the existence of infinitely
many closed geodesics on $S^2$ by proving that every
area-preserving homeomorphism of the open disk with
two fixed points must have infinitely many periodic
points. Proving the existence of infinitely many
closed geodesics on higher-dimensional spheres is a
challenging open problem.


A Rigidity Property of a Certain 
Class of Maps 


It is important that the class $\Gamma$ in the general
minimax theorem is only required to be invariant 
under the action of the negative-gradient flow, and 
not, say, under the action of any continuous 
homotopy on which the function f is nonincreasing. 
Indeed, too many undesirable things can be done on 
an infinite-dimensional Hilbert space by arbitrary 
continuous maps, whereas the maps arising from 
our negative-gradient flow might show some rigid- 
ity, forcing them to behave as maps on finite- 
dimensional spaces. 

Let us clarify this point by considering the follow- 
ing example, due to Benci and Rabinowitz. It may 
sound a bit artificial at this moment (simpler 
examples could be built), but we will find it useful 
in the next section. Assume that our Hilbert space is 









Figure 1 The sets $S$, $Q$, $\partial Q$.





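endowed with an orthogonal splitting $H = H^- \oplus H^+$,
fix a unit vector $u^+$ in $H^+$, and consider the sets

$$S = \{u \in H^+ \mid \|u\| = \rho\}$$
$$Q = \{u + \lambda u^+ \mid u \in H^-,\ \|u\| \le \sigma,\ 0 \le \lambda \le \tau\}$$
$$\partial Q = \{u + \lambda u^+ \in Q \mid \lambda \in \{0, \tau\} \text{ or } \|u\| = \sigma\}$$

for some positive numbers $\rho, \sigma, \tau$ such that $\tau > \rho$.
The latter inequality implies that the intersection
$Q \cap S$ is not empty (see Figure 1).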

If the linear subspace $H^-$ is finite dimensional, a
simple argument involving the topological degree
shows the following fact: the image of any continuous
map $h: Q \to H$ which is the identity on $\partial Q$ has
nonempty intersection with S.

When $H^-$ is infinite dimensional, this fact is
not true anymore. Indeed, it is not difficult to see
that the set Q is homeomorphic to an infinite-dimensional
closed ball B, by a homeomorphism $\psi$
mapping $\partial Q$ onto the infinite-dimensional sphere
$\partial B$. If B is the closed unit ball of an infinite-dimensional
Hilbert space, for instance, the space $\ell^2$ of all
square-summable sequences $(x_k)$ endowed with the
norm $|x|_2 = (\sum_{k \ge 0} |x_k|^2)^{1/2}$, the continuous map

$$g(x_0, x_1, x_2, \dots) = \left(\sqrt{1 - |x|_2^2},\, x_0, x_1, x_2, \dots\right)$$

maps B into $\partial B$ and is a shift operator on $\partial B$.
In particular, it is a continuous map on B without
fixed points, and it can be used to define a map
$h: B \to \partial B$ which is the identity on $\partial B$, by setting

$$h(x) = \mu(x)\, x + (1 - \mu(x))\, g(x), \quad \text{with } \mu(x) \ge 1 \text{ such that } |h(x)|_2 = 1$$

Conjugation by the homeomorphism $\psi$ produces
a continuous map from Q to $\partial Q$, which is the
identity on $\partial Q$, providing us with the desired
counterexample.
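The retraction can be checked numerically on finitely supported sequences. This is a sketch under assumptions made here: the names g, h, and mu are illustrative labels for the shift-type map, the retraction, and the scalar in the formula, and sequences are represented as Python lists padded with zeros. The scalar is obtained as the nonzero root of the quadratic equation $|g + \mu (x - g)|^2 = 1$, using $|g(x)| = 1$.

```python
import math

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def g(x):
    # Prepend sqrt(1 - |x|^2) and shift the entries of x one place to the right
    s = sum(t * t for t in x)
    return [math.sqrt(max(0.0, 1.0 - s))] + list(x)

def h(x):
    # Point where the ray from g(x) through x leaves the unit ball
    gx = g(x)
    xp = list(x) + [0.0]                      # pad x to the length of g(x)
    d = [a - b for a, b in zip(xp, gx)]       # direction from g(x) to x
    dd = sum(t * t for t in d)
    gd = sum(a * b for a, b in zip(gx, d))
    mu = -2.0 * gd / dd                       # nonzero root, uses |g(x)| = 1
    return [b + mu * t for b, t in zip(gx, d)], mu
```

For a point inside the ball the output has norm 1 and the scalar is at least 1, while a point already on the sphere is returned unchanged (up to rounding and zero padding).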

In other terms, when $H^-$ is infinite dimensional,
the sets $\partial Q$ and S can be unlinked by means of a
continuous map. The situation changes if we restrict
the class of maps $h: Q \to H$ to those of the form

$$h(u) = u + K(u)$$ [3]

where K is a continuous compact map. In this case,
indeed, the argument for a finite-dimensional $H^-$
can be applied, by replacing the topological degree
by the Leray-Schauder degree (which is invariant
precisely with respect to homotopies of the form
above), and one proves that $\partial Q$ and S cannot be
unlinked by means of continuous maps of this form.


Closed Characteristics on Compact 
Energy Hypersurfaces 


Consider $\mathbb{R}^{2n}$ with coordinates $(p_1, \dots, p_n, q_1, \dots, q_n)$,
endowed with the standard symplectic form

$$\omega = dp \wedge dq = \sum_{j=1}^{n} dp_j \wedge dq_j$$

Let $\Sigma$ be a compact connected hypersurface in $\mathbb{R}^{2n}$.
The restriction of $\omega$ to the tangent space $T_x\Sigma$ has a
one-dimensional kernel, which varies smoothly with x.
In other words, there is a smooth line bundle

$$L_\Sigma := \{(x, u) \in T\Sigma \mid \omega(u, v) = 0 \ \ \forall v \in T_x\Sigma\}$$

over $\Sigma$. We wish to discuss the classical problem
of finding a closed characteristic for $L_\Sigma$, that is,
a closed curve everywhere tangent to $L_\Sigma$.

This geometric problem has a dynamical interpretation.
Indeed, let H be a smooth real function on
$\mathbb{R}^{2n}$ such that $\Sigma$ is the inverse image of the regular
value 1. The function H, the Hamiltonian,
generates a vector field $X_H$ on $\mathbb{R}^{2n}$ by the formula

$$\omega(X_H(x), u) = -DH(x)[u], \quad \forall u \in \mathbb{R}^{2n}$$

or, equivalently,

$$X_H(x) = J \nabla H(x), \quad \text{with } J = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}$$

The Hamiltonian vector field $X_H$ is tangent to $\Sigma$ and
belongs to $L_\Sigma$. Therefore, the hypersurface $\Sigma$ is
invariant for the flow of $X_H$, and the flow orbits are
precisely the characteristics. So finding a closed
characteristic on $\Sigma$ is equivalent to finding a
periodic orbit of $X_H$ with energy H = 1.
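The defining relation and the tangency claim can be verified numerically for a sample Hamiltonian. The sketch below rests on assumptions made here: coordinates are ordered as $(p, q)$, the matrix J is taken to be the block matrix with $-I$ in the upper right and $I$ in the lower left (the choice compatible with $\omega(X_H, u) = -DH[u]$ for $\omega = dp \wedge dq$), and the quadratic Hamiltonian with n = 2 is purely illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def J(v):
    # J = [[0, -I], [I, 0]] acting on v = (p, q) in R^{2n}
    n = len(v) // 2
    p, q = v[:n], v[n:]
    return [-t for t in q] + list(p)

def omega(u, v):
    # omega = sum_j dp_j ^ dq_j evaluated on u = (p, q) and v = (p', q')
    n = len(u) // 2
    return sum(u[j] * v[n + j] - u[n + j] * v[j] for j in range(n))

def grad_H(x):
    # Gradient of the illustrative Hamiltonian
    # H = (p1^2 + p2^2) / 2 + (q1^2 + 4 q2^2) / 2
    p1, p2, q1, q2 = x
    return [p1, p2, q1, 4.0 * q2]

def X_H(x):
    # Hamiltonian vector field X_H = J grad H
    return J(grad_H(x))
```

Evaluating `omega(X_H(x), u) + dot(grad_H(x), u)` gives zero for any x and u, and `dot(grad_H(x), X_H(x))` vanishes, which is the statement that $X_H$ is tangent to the level sets of H.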

Up to changing the Hamiltonian, we may assume
that all the values in an interval $]1 - \delta_0, 1 + \delta_0[$ are
regular for H, and that the corresponding level sets
$\Sigma_\eta := \{H = \eta\}$ are all connected (hence diffeomorphic
to $\Sigma$). We would like to sketch Hofer and
Zehnder's proof of the fact that there is a dense set
of values $\eta \in\, ]1 - \delta_0, 1 + \delta_0[$ for which $\Sigma_\eta$ admits a
closed characteristic.

This proof is based on the fact that the one-periodic
orbits of $X_H$ are critical points of the action
functional

$$A_H(x) = \int_x (p\, dq - H\, dt) = \frac{1}{2} \int_0^1 \dot{x}(t) \cdot J x(t)\, dt - \int_0^1 H(x(t))\, dt$$

on the space of loops $x: \mathbb{T} \to \mathbb{R}^{2n}$.

Clearly, it is enough to show that for every $\delta > 0$
there is a closed characteristic on some $\Sigma_\eta$ with
$|\eta - 1| < \delta$. We can take advantage of the fact that
we are free to change the Hamiltonian, as long as
it has the level sets $\Sigma_\eta$, $|\eta - 1| < \delta$. Denoting by B
the bounded component of the complement of
$\{1 - \delta < H < 1 + \delta\}$, we may assume that B contains
the origin. We can modify H in such a way
that H vanishes identically on B, then it grows,
parametrizing all the hypersurfaces $\Sigma_\eta$, $|\eta - 1| < \delta$,
in a strictly increasing way, then it remains
constant in a large ball, and finally it smoothly
switches to the quadratic form $(3/2)\pi |x|^2$. By
choosing H in this way, one can ensure that all
the constant orbits and all the one-periodic orbits
which do not lie on $\Sigma_\eta$ for some $|\eta - 1| < \delta$ have
non-positive action. So it is enough to prove that
the functional $A_H$ has a positive critical value.

Using the Fourier series decomposition

$$x(t) = \sum_{k \in \mathbb{Z}} e^{2\pi k t J} \hat{x}_k, \quad \hat{x}_k \in \mathbb{R}^{2n}$$

one sees that the quadratic part of the action
functional has the form

$$\int_0^1 \dot{x}(t) \cdot J x(t)\, dt = 2\pi \sum_{k \in \mathbb{Z}} k\, |\hat{x}_k|^2$$ [4]

so it is positive on an infinite-dimensional linear
space, negative on an infinite-dimensional linear
space, and null on the 2n-dimensional space spanned
by the constant loops. The specific form of [4]
suggests choosing as domain of the action functional
the Sobolev space $H^{1/2}(\mathbb{T}, \mathbb{R}^{2n})$, the space of
square-integrable one-periodic curves x in $\mathbb{R}^{2n}$ with

$$\|x\|_{1/2}^2 := |\hat{x}_0|^2 + 2\pi \sum_{k \in \mathbb{Z}} |k|\, |\hat{x}_k|^2 < +\infty$$

This is indeed a Hilbert norm on $H^{1/2}(\mathbb{T}, \mathbb{R}^{2n})$. The
functional $A_H$ is smooth on this space, and its
gradient takes the form

$$\nabla A_H(x) = Lx + K(x)$$ [5]

where L is the self-adjoint Fredholm operator
representing the quadratic form [4] with respect to
the $H^{1/2}$-Hilbert product, and K is a compact map.
A gradient of the form [5] again implies that
bounded Palais-Smale sequences are compact. The
Palais-Smale condition then follows from the fact
that the Hamiltonian H is quadratic outside a large
ball, and has no one-periodic orbits there (the large
orbits are all periodic, but their period is 2/3).
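The Fourier expression for the quadratic form can be checked numerically in the simplest case n = 1. In this sketch (an illustration under assumptions chosen here) a loop is a finite sum of terms $e^{2\pi k t J} \hat{x}_k$ with $J$ the rotation by a quarter turn in the plane, the coefficients are stored in a hypothetical dictionary `coeffs`, and the integral of $\dot{x} \cdot Jx$ over one period is approximated by a uniform Riemann sum, which is exact up to rounding for trigonometric polynomials of low degree.

```python
import math

def rot(angle, v):
    # e^{angle J} v for J = [[0, -1], [1, 0]] acting on R^2
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def quadratic_form(coeffs, samples=64):
    # coeffs maps an integer k to the Fourier coefficient in R^2.
    # Returns the Riemann sum of x'(t) . J x(t) over [0, 1].
    total = 0.0
    for m in range(samples):
        t = m / samples
        x = [0.0, 0.0]
        dx = [0.0, 0.0]
        for k, v in coeffs.items():
            e = rot(2.0 * math.pi * k * t, v)
            x[0] += e[0]
            x[1] += e[1]
            # d/dt e^{2 pi k t J} v = 2 pi k J e^{2 pi k t J} v, with J(a, b) = (-b, a)
            dx[0] += 2.0 * math.pi * k * (-e[1])
            dx[1] += 2.0 * math.pi * k * e[0]
        jx = (-x[1], x[0])
        total += dx[0] * jx[0] + dx[1] * jx[1]
    return total / samples

def fourier_side(coeffs):
    # 2 pi * sum over k of k |x_k|^2
    return 2.0 * math.pi * sum(k * (v[0] ** 2 + v[1] ** 2) for k, v in coeffs.items())
```

The two functions agree on any finite set of modes, and the sign of each contribution is the sign of k, which is what makes the form positive on the modes with k > 0, negative on those with k < 0, and zero on constants.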

Consider the splitting $H^{1/2}(\mathbb{T}, \mathbb{R}^{2n}) = H^- \oplus H^+$,
with

$$H^- = \{x \mid \hat{x}_k = 0 \text{ for } k > 0\}$$
$$H^+ = \{x \mid \hat{x}_k = 0 \text{ for } k \le 0\}$$

Let S, Q, and $\partial Q$ be the sets defined in the previous
section, with

$$u^+(t) = \frac{1}{\sqrt{2\pi}}\, e^{2\pi t J} u_0, \quad u_0 \in \mathbb{R}^{2n},\ |u_0| = 1$$

and constants $\rho, \sigma, \tau$ to be determined. Since the
quadratic form [4] is positive on $H^+$ and the
Hamiltonian H vanishes near the origin, we can
find a small $\rho > 0$ such that

$$\inf_{x \in S} A_H(x) > 0$$

The fact that the quadratic form [4] is seminegative
on $H^-$ and the behavior of H(x) for large |x| imply
that if $\sigma$ and $\tau$ are suitably large (in particular
$\tau > \rho$), then

$$\sup_{x \in \partial Q} A_H(x) \le 0$$


Let $\Gamma$ be the set of all images of maps
$h: Q \to H^{1/2}(\mathbb{T}, \mathbb{R}^{2n})$
which are the identity on $\partial Q$ and are of the form

$$h(x) = e^{\alpha(x) L} (x + K(x))$$ [6]

with $\alpha$ a continuous real-valued function, and K a
continuous compact map. This class of maps is more
general than the one considered in the previous
section, but the fact that $e^{\alpha L}$ commutes with the
projections onto $H^-$ and $H^+$ ensures that $\partial Q$ and
S cannot be unlinked even inside this class. Therefore,
any $\gamma \in \Gamma$ has nonempty intersection with S, so

$$c := \inf_{\gamma \in \Gamma} \sup_{x \in \gamma} A_H(x) \ge \inf_{x \in S} A_H(x) > 0$$

We would like to apply the general minimax
theorem, and conclude that c is the desired positive
critical value.

The number c being clearly finite, it is enough to
show that $\Gamma$ is positively invariant under the action
of the negative-gradient flow $\phi$ of $A_H$, truncated
below level 0. Let $\gamma = h(Q) \in \Gamma$ and $t \ge 0$. Then
$\phi(t, \gamma)$ is the image of Q by the map $\phi(t, h(\cdot))$. This
map is the identity on $\partial Q$ because $\partial Q$ lies in
$\{A_H \le 0\}$ and $\phi$ is truncated below level 0. It is of
the form [6] because by [5] the truncated negative-gradient
flow of $A_H$ has the form

$$\phi(t, x) = e^{-\theta(t, x) L} (x + K(t, x))$$

for some continuous function $0 \le \theta(t, x) \le t$ and for
some continuous compact map K. This concludes
the proof.

This result was refined by Struwe, who proved the
existence of a closed characteristic on $\Sigma_\eta$ for almost
every $\eta$, in the sense of the Lebesgue measure. We
could try to use the abundance of closed characteristics
on energy levels near $\Sigma$ to get the existence of one on
$\Sigma$ by taking a limit. But this process produces a closed
characteristic on $\Sigma$ only if we can bound the periods of
the approximating closed orbits; otherwise a more
general invariant set results. Actually, Ginzburg, Herman,
and Gürel have produced examples of compact
hypersurfaces without any closed characteristic.

As conjectured by Weinstein and proved by
Viterbo, closed characteristics always exist on
contact-type compact hypersurfaces (i.e., hypersurfaces
$\Sigma$ on which the restriction of $\omega$ is the
differential of a 1-form $\lambda$ such that
$\lambda \wedge d\lambda \wedge \cdots \wedge d\lambda$ is a volume form). In this case, one should even
expect a multiplicity result. For hypersurfaces which
bound a strictly convex set in $\mathbb{R}^{2n}$, for instance, the
existence of n closed characteristics is conjectured.
The best result so far is due to Long, who could
prove the existence of $[n/2] + 1$ of them. Hofer,
Wysocki, and Zehnder have proved that, when n = 2,
there are either two or infinitely many closed
characteristics (for a generic contact-type hypersurface
diffeomorphic to $S^3$), by using the already
mentioned theorem by Franks on periodic points of
area-preserving homeomorphisms of the disk. Proving
an analogous result for $n \ge 3$ is an intriguing
open problem.


See also: Contact Manifolds; Floer Homology; 
Hamilton—Jacobi Equations and Dynamical Systems: 
Variational Aspects; Image Processing: Mathematics; 
Inequalities in Sobolev Spaces; Leray—Schauder Theory 
and Mapping Degree; Ljusternik—Schnirelman Theory; 
Saddle Point Problems.


Introduction


Mirror symmetry was discovered in the late 1980s
by physicists studying superconformal field theories
(SCFTs). One way to produce SCFTs is from closed
string theory; in the Riemannian (rather than
Lorentzian) theory the string's world sheet gives a
map of a Riemannian 2-manifold into the target
with an action which is conformally invariant, so
the 2-manifold can be thought of as a Riemann
surface with a complex structure. Making sense of
the infinities in the quantum theory (supersymmetry
and anomaly cancellation) forces the target to be
10-dimensional (Minkowski space times a
6-manifold X) and X to be (to first order) Ricci
flat and so to have holonomy in SU(3). That is, X is a
Calabi-Yau 3-fold $(X, \Omega, \omega)$. So SCFTs come from
$\sigma$-models (mapping Riemann surfaces into Calabi-Yau
3-folds) but, it turns out, in two different ways: the
A-model and the B-model. Deformations of the SCFT
and either $\sigma$-model are isomorphic, so over an open set
the two coincide. Thus, it was natural to conjecture
that almost all of the relevant SCFTs came from
geometry, that is, from an A or B $\sigma$-model. In particular,
the A-model of a Calabi-Yau X should, therefore,
give the same SCFT as the B-model on another
Calabi-Yau $\check{X}$. It turns out then that the A-model
on $\check{X}$ should also be isomorphic to the B-model on
X; thus, mirror symmetry should give an involution
on Calabi-Yau 3-folds. (The full picture is
slightly more complicated: it involves large
complex structure limits, multiple mirrors and
flops.) By studying the SCFTs, Greene and Plesser
predicted the mirror of the simplest Calabi-Yau
3-fold, the quintic in $\mathbb{P}^4$, and mirror symmetry
was born.

Topological observables, that is, certain path
integrals over the space of all maps, can be
calculated by the semiclassical approximation as
integrals over the space of classical minima, the (anti)
holomorphic curves in the Calabi-Yau (these minimize
volume in a fixed homology class). From the
zero homology class we get the constant maps, that is,
points in X, and so integrals over X. In some cases,
by Poincaré duality, these can be thought of as
intersections of cycles; we think of the string world
sheet lying at a point of intersection. When the
world sheet has a nontrivial homology class, it
allows more general "intersections" where the cycles
need not intersect but are connected by a
holomorphic curve, giving a perturbation of the
usual intersection product on cohomology called
quantum cohomology. Namely, there is a contribution
$(a \cdot \beta)(b \cdot \beta)(c \cdot \beta)\, e^{-\int_\beta \omega}$ to the quantum triple
product $a \cdot b \cdot c$ of three 4-cycles $a, b, c \in H^{1,1} \cong H^2 \cong H_4$
from each holomorphic curve $\beta$ (of genus 0, in
the 0-loop approximation to the physics) in X of
area $\int_\beta \omega$ (where $\omega$ is the Kähler form). The
A-model correlation functions can be determined
from these data; the B-model computation involves
no such quantum correction and can be computed
purely in terms of integrals over cycles ("periods")
and their derivatives (discussed in the next section).
So it is in some sense easier and, in a historic
tour de force, was calculated by Candelas et al. (1991)
for the Greene-Plesser mirror of the quintic.
Comparing with the A-model computation on
the quintic gave remarkable predictions about the
number of holomorphic rational curves on the
quintic. These were well beyond mathematical
capabilities at the time, and sparked enormous
mathematical interest. The predictions (and more)
have now been proved to be true by Givental and
Lian-Liu-Yau, while mirror symmetry has begun to
be understood geometrically. But, in some sense,
the mathematical reason for the relationship
between the Yukawa couplings and the quantum
cohomology of the mirror is still a little mysterious;
it is the hardest part of mirror symmetry to see in the
geometry, yet for the physics it was the easiest and the
first prediction.

We survey, nonchronologically, some of the 
geometry of mirror symmetry as it is now under- 
stood, mainly in dimension n=3. For the many 
topics omitted, the reader should consult the Further 
Reading section. 


The Geometric Setup 


A Calabi-Yau 3-fold $(X, \Omega, \omega)$ is a Kähler manifold
$(X, \omega)$ with a holomorphic trivialization $\Omega$ of its
canonical bundle

$$K_X = \Lambda^3 T^*X$$

(i.e., a nowhere-vanishing holomorphic volume form,
locally $dz_1 \wedge dz_2 \wedge dz_3$), and $b_1(X) = 0$. It follows that
the Hodge numbers $h^{0,1}, h^{0,2}$ vanish, and so
$H^2(X, \mathbb{C}) = H^{1,1}$ and $H^3(X, \mathbb{R}) \cong H^{2,1} \oplus H^{3,0}$. By
Yau's theorem the Kähler metric can be changed
within its $H^2(X, \mathbb{R})$ cohomology class to a unique
Ricci-flat Kähler metric; equivalently, $\Omega$ is parallel, so
the induced metric on $K_X$ is flat. Roughly speaking,
mirror symmetry swaps the symplectic or Kähler
structure $\omega$ on X with the complex structure (encoded
in $\Omega$, up to scaling by $\mathbb{C}^*$) on the (conjectural) mirror
$\check{X}$. Kähler deformations are unobstructed, forming an
open set $\mathcal{K}_X$ in $H^2(X, \mathbb{R})$. Its closure $\overline{\mathcal{K}}_X$ is sometimes
extended by adding the Kähler cones of all birational
models of X to give Kawamata's movable cone. This is
because the work of Aspinwall, Greene, Morrison, and
Witten suggested that all birational models of X are
indistinguishable in string theory and so are all mirrors
of $\check{X}$, corresponding to a different choice of (1, 1)-form
$\omega$ which is a Kähler form on one model only. $\mathcal{K}_X$ is also
complexified by including in the A-model data any
"B-field" $B \in H^2(X, \mathbb{R}/\mathbb{Z})$, and divided by holomorphic
automorphisms of X, to give a moduli space
of complex dimension $h^{1,1}(X)$. Deformations of
complex structure are also unobstructed by the
nontrivial Bogomolov-Tian-Todorov theorem; thus,
they form a smooth space with tangent space

$$H^1(TX) \cong H^1(\Lambda^2 T^*X) = H^{2,1}(X)$$

(Given a deformation of complex structure, the
above isomorphism takes the $H^{2,1}$-component of the
derivative of the (3,0)-form $\Omega$.) So, for the moduli
spaces to match up, we get the first and simplest
prediction of mirror symmetry:

$$h^{1,1}(X) = h^{2,1}(\check{X}) \quad \text{and} \quad h^{2,1}(X) = h^{1,1}(\check{X})$$ [1]

This is where mirror symmetry gets its name, the
above relation making the Hodge diamonds of X
and $\check{X}$ mirror images of each other.
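The exchange of Hodge numbers can be made concrete with a small table. The sketch below is an illustration under assumptions stated here: the diamond is stored as a dictionary indexed by $(p, q)$, all entries other than $h^{0,0}, h^{3,3}, h^{3,0}, h^{0,3}, h^{1,1}, h^{2,2}, h^{2,1}, h^{1,2}$ are taken to be zero, and the function names are hypothetical. The quintic values $h^{1,1} = 1$, $h^{2,1} = 101$ used in the usage note are the standard ones.

```python
def hodge_diamond(h11, h21):
    # Hodge numbers h^{p,q} of a Calabi-Yau 3-fold as a dict (p, q) -> value
    h = {(p, q): 0 for p in range(4) for q in range(4)}
    h[(0, 0)] = h[(3, 3)] = 1
    h[(3, 0)] = h[(0, 3)] = 1
    h[(1, 1)] = h[(2, 2)] = h11
    h[(2, 1)] = h[(1, 2)] = h21
    return h

def mirror(h):
    # Reflect the diamond: exchange h^{p,q} with h^{3-p,q}
    return {(p, q): h[(3 - p, q)] for (p, q) in h}

def euler_characteristic(h):
    # Alternating sum of Hodge numbers
    return sum((-1) ** (p + q) * v for (p, q), v in h.items())
```

For the quintic, `hodge_diamond(1, 101)` has Euler characteristic -200, and its reflection equals `hodge_diamond(101, 1)` with Euler characteristic 200, so the reflection swaps $h^{1,1}$ with $h^{2,1}$ and changes the sign of the Euler characteristic.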


As the complexified Kähler cone is a tube
domain, it has natural partial complex compactifications
(due to Looijenga, and suggested in the
context of mirror symmetry by Morrison (1993)).
The simplest case is where we ignore the movable
cone and automorphisms and assume that there is
an integral basis $e_1, \dots, e_n$ of both $\overline{\mathcal{K}}_X$ and
$H^2(X, \mathbb{Z})$/torsion. The complexified Kähler moduli
space is then

$$\mathcal{K}_X^{\mathbb{C}} := H^2(X, \mathbb{R})/H^2(X, \mathbb{Z}) + i\mathcal{K}_X = \{B + i\omega\}$$

with natural coordinates $x_i$, $y_i > 0$ pulled back from
the first and second factors, respectively, induced by
the $e_i$. The coordinate $x_i$ is multivalued with integer periods, so

$$z_i = \exp(2\pi i (x_i + i y_i))$$ [2]

is a well-defined holomorphic coordinate, giving an
isomorphism to the product of n punctured unit
disks in $\mathbb{C}$:

$$\mathcal{K}_X^{\mathbb{C}} \cong (\Delta^*)^n = \{(z_i) : 0 < |z_i| < 1\} \subset (\mathbb{C}^*)^n$$

The compactification $\Delta^n$ comes from adding in the
origins in the disks, which we reach by going to
infinity (in various directions) in $\mathcal{K}_X$. We call the
point $(0, \dots, 0) \in \Delta^n$ the large Kähler limit point
(LKLP) in this case. Moving along the ray generated
by $\sum k_i e_i \in \mathcal{K}_X$, $k_i > 0$, complexifies in the holomorphic
structure [2] to give the analytic curve

$$z_i^{k_j} = z_j^{k_i} \quad \forall i, j$$ [3]

in $\mathcal{K}_X^{\mathbb{C}}$. For $k_i \in \mathbb{Q}$ $\forall i$, this extends to a complete
curve in the compactification. Without loss of
generality, we can assume that the $k_i$ are integers with
no common factor; then the link of the curve winds
around the LKLP $(0, \dots, 0) \in \Delta^n$ with winding
number

$$(k_1, \dots, k_n) \in \pi_1\big(H^2(X, \mathbb{R})/H^2(X, \mathbb{Z}) + i\mathcal{K}_X\big) \cong H^2(X, \mathbb{Z}) \cong \mathbb{Z} e_1 \oplus \cdots \oplus \mathbb{Z} e_n$$

This is because multiplying the ray $\mathbb{R} \cdot \sum k_i e_i \subset \mathcal{K}_X$
by i gives the direction $\mathbb{R} \cdot \sum k_i e_i$ in the space
$H^2(X, \mathbb{R})/H^2(X, \mathbb{Z})$ of B-fields, with the given
winding number. For $k_i$ not rational we get an
analytic mess; the direction in the space of B-fields
does not close up to give a circle.
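The exponential coordinates and the curve attached to an integral ray can be checked with a few lines of complex arithmetic. This sketch rests on assumptions introduced here: the coordinate is taken to be $z_i = \exp(2\pi i (x_i + i y_i))$, the complexified ray is parametrized as $x_i + i y_i = w k_i$ with $\operatorname{Im} w > 0$, and the function names are hypothetical.

```python
import cmath
import math

def z_coords(x, y):
    # z_i = exp(2 pi i (x_i + i y_i)) for real x_i and y_i > 0
    return [cmath.exp(2j * math.pi * complex(xi, yi)) for xi, yi in zip(x, y)]

def curve_point(k, w):
    # Point of the complexified ray through sum k_i e_i: x_i + i y_i = w * k_i
    return [cmath.exp(2j * math.pi * w * ki) for ki in k]
```

Shifting any $x_i$ by an integer leaves `z_coords` unchanged, every coordinate has modulus $e^{-2\pi y_i} < 1$, and for integer weights such as k = [2, 3] the point `z = curve_point(k, w)` satisfies `z[0] ** 3 == z[1] ** 2` up to rounding, which is the relation between the coordinates along the curve.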

There is no obvious mirror to these rays since we
consider $\Omega$ only up to scale. So, mirror symmetry
predicts an isomorphism between $\mathcal{K}_X^{\mathbb{C}}$ and the
moduli space $M_{\check{X}}$ of complex structures on $\check{X}$, and
a distinguished limit in $M_{\check{X}}$, the large complex
structure limit point (LCLP), the mirror of the LKLP
$(0, \dots, 0) \in \Delta^n$ above. Morrison has given a rigorous
definition of LCLPs and the canonical coordinates
on $M_{\check{X}}$ dual to the $z_i$ on $\mathcal{K}_X^{\mathbb{C}}$; see the section
Monodromy around the LCLP. The holomorphic
curves in $(\Delta^*)^n$ described above, corresponding to
rational rays of Kähler forms, give degenerations of
(the complex structure on) $\check{X}$ to the LCLP whose
monodromy is discussed in this article (see "Lagrangian
Torus Fibrations").

LCLPs play a vital role in mirror symmetry; in
fact, mirror symmetry is really a statement about
LCLPs and families of Calabi-Yau manifolds near
LCLPs. Most predictions only really hold near or at
the LCLP, and the complex structure moduli space
only looks like $\Delta^n$ near the LCLP. For instance,
manifolds can have many LCLPs and accordingly
many mirrors. This also explains one obvious
paradox: rigid Calabi-Yau manifolds, those
with no complex structure deformations, $h^{2,1} = 0$,
and so no LCLP, can have no mirror, since a Kähler
(or symplectic) manifold has $h^2 = h^{1,1} \ne 0$.

The first predicted refinement of [1] is, as
discussed in the introduction, that the variation of
Hodge structure (VHS) on $\check{X}$ should be describable
in terms of Gromov-Witten invariants of X. Here
VHS is governed by how the line $\mathbb{C} \cdot \Omega_t = H^{3,0}(\check{X}_t)$
sits inside $H^3(\check{X}_t, \mathbb{C})$ as the complex structure on $\check{X}_t$
varies, parametrized by $t \in M_{\check{X}}$. By Poincaré
duality, it is sufficient to know how $\Omega_t$ pairs with
$H_3(\check{X})$, that is, to compute the period integrals

$$\int_{A_i} \Omega_t, \quad i = 1, \dots, 2k = 2h^{2,1} + 2$$

where the $A_i$ form a basis of $H_3(\check{X}, \mathbb{Z})$. (In fact we can
choose the $A_i$ to be a symplectic basis, $A_i \cdot A_j = \delta_{i+k,j}$,
and then knowledge of only the periods of the first k
$A_i$ suffices, locally in moduli space.) These periods
determine $\Omega_t$, and so the Yukawa coupling

$$H^1(T\check{X}_t)^{\otimes 3} \to H^3(\Lambda^3 T\check{X}_t) \xrightarrow{\ \Omega_t^{\otimes 2}\ } H^3(K_{\check{X}_t}) \cong \mathbb{C}$$

On X, we get the cubic form on $H^2(X)$ described
earlier in terms of numbers of rational curves in X.
These numbers are in fact independent of the
almost-complex structure on X (as long as it is
compatible with the symplectic form $\omega$), and, therefore,
give the symplectic invariants of Gromov
and Witten. The cubic form depends on $\omega = \omega_t$
as it moves in $\mathcal{K}_X$ (or in $\mathcal{K}_X^{\mathbb{C}}$, replacing $\omega_t$ by
$-i(B_t + i\omega_t)$). Under the predicted local isomorphism
$\mathcal{K}_X^{\mathbb{C}} \cong M_{\check{X}}$ near the LKLP and LCLP, the equality of
these cubic forms gives the predictions of numbers of
rational curves in X mentioned in the introduction.
This has been carried out, and the predictions
checked rigorously, in quite some generality, for
instance for mirror pairs produced by Batyrev's toric
methods.

There is, of course, a flat connection, the Gauss-Manin
connection, on the bundle over $M_{\check{X}}$ with
fiber $H^3(\check{X}_t, \mathbb{C})$ over $t \in M_{\check{X}}$, given by the local
system $H^3(\check{X}_t, \mathbb{Z}) \subset H^3(\check{X}_t, \mathbb{C})$. As mirror to this,
Dubrovin has shown how to put a flat connection
on the bundles with fibers H? (X,) and H” (X,) using 
Gromov-Witten invariants.


Homological Mirror Symmetry 


Building on the work of Witten, Kontsevich (1995)
proposed a remarkable conjecture that purported to
explain mirror symmetry, all the more surprising
because it appeared to have little to do with what
was thought to be mirror symmetry at the time. The
conjecture is now reasonably well understood, while
the link to Gromov-Witten invariants and Yukawa
couplings is more mysterious, although it is known
how both data should be encoded in the conjecture.

Kontsevich proposed that mirror symmetry should
be explained by a (noncanonical) equivalence of
triangulated categories between the derived Fukaya
category $D^F(X)$ of $(X, \omega)$ and the bounded derived
category of coherent sheaves $D^b(\check{X})$ on its mirror $\check{X}$.
This second category consists of chain complexes of
holomorphic bundles, with quasi-isomorphisms
(maps of chain complexes which induce isomorphisms
on cohomology) formally inverted, that
is, decreed to be isomorphisms. For zero B-field
the first category should be constructed from
Lagrangian submanifolds $L \subset X$ carrying flat unitary
connections A. That is, L is middle- (three-)
dimensional, and

$$\omega|_L = 0, \quad F_A = 0$$

For $B \ne 0$, this needs modifying to $F_A + 2\pi i B \cdot \mathrm{id} = 0$
(so, in particular, we require that L satisfies
$[B|_L] = 0 \in H^2(L, \mathbb{R}/\mathbb{Z})$). There are also various
technical conditions: we choose a relative
spin structure, the Maslov class of L must vanish
(i.e., the map $(\Omega|_L / \mathrm{vol}_L): L \to \mathbb{C}^*$ has winding
number zero), and we pick a grading on L
(a choice of logarithm of this map). Morphisms are
defined by Floer cohomology $HF^*$ of Lagrangian
submanifolds; roughly speaking, this assigns a vector
space to each intersection point (the homomorphisms
between the fibers of the two unitary bundles
carried by the Lagrangians at this point), made into
a chain complex by a certain counting of holomorphic
disks between intersection points. In-depth
work by Fukaya-Oh-Ohta-Ono shows that this
gives the structure of an $A_\infty$-category which can
then be "derived" into a triangulated category in a
formal way by taking "twisted cochains." The
construction is still very technical and difficult to
calculate with, but the key points are that we get a
category depending only on the symplectic structure,
that certain "unobstructed" Lagrangian submanifolds
give objects of this category, and that
Hamiltonian isotopic unobstructed Lagrangian submanifolds
give isomorphic objects.

Since the introduction of D-branes there is a 
physical interpretation of this conjecture in terms of 
open string theory; the objects of the two categories 
are boundary conditions for open strings, and 
morphisms correspond to strings beginning on one 
object and ending on the other. So, for instance, 
intersections of Lagrangians give morphisms corre- 
sponding to constant strings at the intersection 
point, while the Floer differential gives instanton 
tunneling corrections. 

One paradox this formulation immediately sheds light on concerns
automorphisms on both sides of mirror symmetry. While symplectomorphisms
of $(X,\omega)$ are abundant, there are few holomorphic automorphisms of
a Calabi-Yau $\check{X}$. The former induce autoequivalences of $D^F(X)$;
Kontsevich's suggestion is that as a mirror to this there should be an
autoequivalence of $D^b(\check{X})$; this need not be induced by an
automorphism of $\check{X}$. Motivated by this, groups of
autoequivalences of derived categories of sheaves of Calabi-Yau manifolds
have now been found that were predicted by mirror symmetry; a few are
mentioned below. Thus, homological mirror symmetry suggests that an SCFT
is equivalent to a triangulated category, and the ambiguities in
geometrizing an SCFT (finding a Calabi-Yau of which it is a
$\sigma$-model) are seen in the category: not all automorphisms come from
an automorphism of a Calabi-Yau (e.g., Calabi-Yau manifolds $\check{X}$
with equivalent derived categories give multiple mirrors to $X$), and not
all appropriate categories need even come from a Calabi-Yau. Supporting
this suggestion, Bondal-Orlov and Bridgeland have shown that indeed
birational Calabi-Yau manifolds $\check{X}$ have equivalent derived
categories.

Finally, Kontsevich explained how deformation 
theory of the categories should involve derived 
morphisms on the product from the diagonal 
(thought of as a Lagrangian in the A-model, its 
structure sheaf as a coherent sheaf in the B-model) 
to itself, giving quantum cohomology in the 
A-model and Hodge structure in the B-model. For 
instance, the holomorphic disks used to compute the 
Floer cohomology of the diagonal on the product 
$X \times X$ give holomorphic rational curves on $X$. So, 
one should be able to see some parts of “classical” 
mirror symmetry. 


Below, as we describe more of the geometry of 
mirror symmetry that has emerged since Kontse- 
vich’s conjecture, we will mention at each stage how 
his conjecture fits in with it. 


The Strominger-Yau-Zaslow Conjecture 


To recover more geometry from Kontsevich's conjecture, there are some
obvious objects of $D^b(\check{X})$ that reflect the geometry of
$\check{X}$: the structure sheaves $\mathcal{O}_p$ of points
$p \in \check{X}$. Calculating their self-Homs,
$\mathrm{Ext}^*(\mathcal{O}_p,\mathcal{O}_p) \cong \Lambda^* T_p\check{X} \cong \Lambda^*\mathbb{C}^3 \cong H^*(T^3,\mathbb{C})$,
shows that if they are mirror to Lagrangians $L$ in $X$ (with flat
connections $A$ on them) then we must have


$$HF^*((L,A),(L,A)) \cong H^*(T^3,\mathbb{C})$$


as graded vector spaces. Since the left-hand side is, 
modulo instanton corrections, H*(L,C)°’, where r is 
the rank of the bundle carried by L, this suggests 
that the mirror should be $L = T^3$ with a flat $U(1)$
connection $A$ over it. There are reasons why the
Floer cohomology of such an object should not be
quantum corrected, and so be isomorphic to
$\mathrm{Ext}^*(\mathcal{O}_p,\mathcal{O}_p)$.

For any Lagrangian $L$, the symplectic form gives an isomorphism between
$T^*L$ and its normal bundle $N_L$; thus, Lagrangian tori have trivial
normal bundles, and locally one can fiber $X$ by them. Thus, one might
hope that $X$ is fibered by Lagrangian tori, and the mirror $\check{X}$
is (at least over the locus of smooth tori) the dual fibration. This is
because the set of flat $U(1)$ connections on a torus is naturally the
dual torus.

This is the kind of philosophy that led to the Strominger-Yau-Zaslow
(SYZ) conjecture (Strominger et al. 1996), although Strominger et al.
were working with physical D-branes, and not Kontsevich's conjecture.
Therefore, their D-branes are not the “topological D-branes” of
Kontsevich, but those minimizing some action. That is, instead of
holomorphic bundles in the B-model, we deal with bundles with a
compatible connection satisfying an elliptic partial differential
equation (PDE) (e.g., the Hermitian-Yang-Mills (HYM) equations, or some
perturbation thereof); instead of Lagrangian submanifolds up to
Hamiltonian isotopy in the A-model, we consider special Lagrangians
(sLags) (see eqn [5]). The SYZ conjecture is that a Calabi-Yau $X$ should
admit a sLag torus fibration, and that the mirror $\check{X}$ should
admit a fibration which is dual, in some sense.

A sLag is a Lagrangian submanifold $L$ of a Calabi-Yau manifold $X$
satisfying the further equation that the unit norm complex function
(phase)

$$\frac{\Omega|_L}{\mathrm{vol}_L} = e^{i\theta} = \text{constant} \qquad [5]$$


(So, sLags have Maslov class zero, in particular.) This equation uses the
complex structure on $X$ as well as the symplectic structure, and the
resulting Ricci-flat metric of Yau, to define a metric on $L$ and so its
Riemannian volume form $\mathrm{vol}_L$. SLags are calibrated by
$\mathrm{Re}(e^{-i\theta}\Omega)$ and so minimize volume in their
homology class. This is similar to the HYM equations on the mirror
$\check{X}$, which are defined on holomorphic bundles on the complex
manifold $\check{X}$ via a Kähler form $\omega$, and minimize the
Yang-Mills action. The Donaldson-Uhlenbeck-Yau theorem states that for
holomorphic bundles that are polystable (defined using $[\omega]$; this
is true for the generic bundle), there is a unique compatible HYM
connection. Thus, modulo stability, HYM connections are in one-to-one
correspondence with holomorphic bundles. A similar correspondence is
conjectured, and proved in some special cases, by Thomas and Yau, for
(special) Lagrangians: that modulo issues of stability (which can be
formulated precisely), sLags are in one-to-one correspondence with
Lagrangian submanifolds up to Hamiltonian isotopy. That is, there should
be a unique sLag in the Hamiltonian isotopy class of a Lagrangian if and
only if it is stable. Currently, only the uniqueness part of this
conjecture has been worked out, but, in principle at least, we do not
lose much by considering only Lagrangian torus fibrations.

The SYZ conjecture is thought to hold only near the LCLPs and LKLPs of
$X$ and $\check{X}$; away from these, the sLag fibers may start to cross.
According to Joyce, the discriminant locus of the fibration on $X$ is
expected to be a codimension-one ribbon graph in a base $S^3$ near the
limit points, while the discriminant locus of the dual fibration on
$\check{X}$ may be different; that is, the smooth parts of the fibration
and its dual are compactified in different ways. In the limit of moving
to the limit points, however, both discriminant loci shrink onto the same
codimension-two graph. In this limit, the fibers shrink to zero size, so
that $X$ (with its Ricci-flat metric) tends, in the Gromov-Hausdorff
sense, to its base $S^3$ (with a singular metric). This formal picture
has been made precise in two dimensions, for K3 surfaces, by Gross and
Wilson. The limiting picture suggests that if we are only interested in
topological or Lagrangian torus fibrations then we might hope for
codimension-two discriminant loci, and such fibrations might make sense
well away from limit points. Gross and Ruan carry this out in examples
such as the quintic and its mirror, and make sense of dualizing the
fibration by dualizing monodromy around the discriminant locus and
specifying a canonical compactification over the discriminant locus. This
gives the correct topology for toric varieties and their mirrors, and
flips the Hodge numbers [1], for instance. Approaching the LCLP in a
different way (in the example of eqn [3] this corresponds to altering the
rational numbers $k_i$) can give a different graph and different
fibration on $X$; the dual fibration can then be a topologically
different manifold, giving a different birational model of the mirror
$\check{X}$.

We focus only on Lagrangian fibrations, as they 
are better behaved and understood. We can expect 
them to be $C^\infty$ fibrations with codimension-two 
discriminant loci, for instance. Below we see how 
to put a complex structure on the smooth part 
of the fibration, but extending this over the 
compactification is much harder and will involve 
“instanton corrections” coming from holomorphic 
disks. Fukaya (2005) has beautiful conjectures about 
this that will explain a great deal more of mirror 
symmetry, but they will not be discussed here. 


Lagrangian Torus Fibrations 


If $(X^{2n},\omega) \xrightarrow{\pi} B^n$ is a smooth Lagrangian
fibration with compact fibers, then the fibration is naturally an affine
bundle of torus groups (i.e., a bundle of groups once we pick a
Lagrangian 0-section, an identity in each fiber), and the base $B$
inherits a natural integral affine structure: it looks like a vector
space $V$ with an integral structure
$V \cong \Lambda \otimes_{\mathbb{Z}} \mathbb{R}$ up to translation by
elements of $V$. This is the classical theory of action-angle variables.
$T_b^*B$ acts on the fiber $X_b = \pi^{-1}(b)$: by pullback and
contraction with the symplectic form, $\sigma \in T_b^*B$ gives a vector
field $\hat{\sigma}$ tangent to $X_b$, and the time-one flow along
$\hat{\sigma}$ gives the action. By compactness and smoothness of $X_b$,
the kernel is a full-rank lattice $\Lambda_b \subset T_b^*B$, giving the
isomorphism


$$X_b \cong T_b^*B/\Lambda_b$$


We define the integral affine structure on $B$ by specifying the integral
affine functions $f$ (up to translation) to be those whose time-one flow
along $df$ is the identity (i.e., on the universal cover the time-one
flow is to a section of the bundle of lattices $\Lambda$).

The situation that concerns us is where $B$ is a 3-manifold $\bar{B}$
(usually $S^3$) minus a graph; then the monodromy around the graph
preserves the integral affine structure:


$$\pi_1(B) \to \mathbb{R}^3 \rtimes GL(3,\mathbb{Z}) \qquad [6]$$


A great deal of mirror symmetry can be seen from just this knowledge of
the smooth locus of the fibration; in particular, Gross (1998) has shown
how mild assumptions about the compactification (with singular fibers
over $\bar{B}\setminus B$) are enough to determine much of the topology
of $X$. The dual fibration $\check{\pi}$ should have the monodromy dual
to [6], and he shows how this implies the switching of the Hodge numbers
[1] by the Leray spectral sequence; the rough idea being the obvious
isomorphism


$$R^i\pi_*\mathbb{R} \cong \Lambda^i TB \cong \Lambda^{3-i}T^*B \cong R^{3-i}\check{\pi}_*\mathbb{R}$$


induced by a trivialization of $\Lambda^3 TB$. That is, morally speaking,
the flipping of Betti numbers arises by representing cycles by those with
linear intersection with the fibers, and replacing this linear space by
its annihilator in the dual torus. This also agrees with the equivalence
taking Lagrangians to coherent sheaves described in the next section.

The dual fibration $\check{\pi}$ has a natural complex structure; here
the affine structure is essential, as in general a tangent bundle $TB$
only has a natural almost complex structure along its 0-section. Since,
up to translation, locally $B \cong V$ is a vector space,
$TB \cong V \times V \cong V \otimes_{\mathbb{R}} \mathbb{C}$ has a
natural complex structure which descends to


$$\check{\pi}\colon \check{X} = TB/\Lambda^* \to B \qquad [7]$$


Gross suggests that the B-field on $X$ should lie in the piece


$$H^1(R^1\pi_*\mathbb{R}/\mathbb{Z}) = H^1(TB/\Lambda^*)$$


of the Leray spectral sequence converging to
$H^2(X,\mathbb{R}/\mathbb{Z})$. That is, it is represented by a Čech
cocycle $e$ on overlaps of an open cover of $B$ with values in the dual
bundle of groups $TB/\Lambda^*$. Using this to twist [7] and re-glue it
via transition functions translated by $e$, we get a new complex manifold
($e$ is locally constant, so translation by $e$ is holomorphic) which we
consider as mirror to $X$ with complexified form $B + i\omega$. In this
way, Gross manages to match up complexified symplectic deformations of
$X$ with complex structures on $\check{X}$.


The 2-Torus 


Mirror symmetry is nontrivial even for the simplest Calabi-Yau, the
2-torus. We write it as an SYZ fibration
$T^2 \xrightarrow{\pi} B = S^1$, and write $B$ as
$\mathbb{R}/a\mathbb{Z}$ with its standard integral affine structure
induced by $\mathbb{Z} \subset \mathbb{R}$. This trivializes
$T^*B \cong B \times \mathbb{R}$ and the lattice $\Lambda$ in it as
$B \times \mathbb{Z} \subset B \times \mathbb{R}$. So as a symplectic
manifold,


$$T^2 = \frac{T^*S^1}{\Lambda} = \frac{[0,a]\times[0,1]}{(0,p)\sim(a,p),\ (q,0)\sim(q,1)} \qquad [8]$$


with symplectic coordinates $(q,p)$ in which the symplectic form is
$\omega = dp \wedge dq$ (so $\int_{T^2}\omega = a$). Again, the B-field,
$b \in H^1(R^1\pi_*\mathbb{R}/\mathbb{Z}) = H^2(T^2,\mathbb{R}/\mathbb{Z})$,
is in $H^1$ of the locally constant sections of the dual fibration.

In our trivialization $B \cong \mathbb{R}/a\mathbb{Z}$,
$\Lambda^* \subset TB$ is also standard:
$B \times \mathbb{Z} \subset B \times \mathbb{R}$, so the mirror has the
same description as in [8] in which the complex structure is standard:
$J\partial_p = \partial_q$. That is, $p + iq$ gives a local holomorphic
coordinate.

For nonzero B-field $b \neq 0$, twisting the dual fibration by $b$ gives


$$\check{T}^2 = \frac{TS^1}{\Lambda^*} = \frac{[0,a]\times[0,1]}{(0,p)\sim(a,b+p),\ (q,0)\sim(q,1)} \qquad [9]$$


again with holomorphic structure given by $p + iq$ and SYZ fibration
$\check{\pi}$ being projection onto $q$. So, as a complex manifold the
mirror is $\mathbb{C}$ divided by the lattice


$$\Lambda = \langle 1,\ b + ia \rangle$$


Changing $b$ to $b+1$ does not alter this lattice, so the construction is
well defined for
$b \in \mathbb{R}/\mathbb{Z} = H^1(R^1\pi_*\mathbb{R}/\mathbb{Z})$, and
we have the standard description of an elliptic curve via its period
point $\tau = b + ia$ in the upper half plane (as $a > 0$). Mirror
symmetry has indeed swapped the complexified symplectic parameter
$b + ia = \int_{T^2}(b + i\omega)$ for the complex structure modulus
$\tau = b + ia$. $SL(2,\mathbb{Z})$ acts on both sides (in the standard
way on $\tau$, and as symplectomorphisms modulo those isotopic to the
identity on the A-side) permuting the choices of SYZ fibration. We note
that in this case the fibrations are special Lagrangian in the flat
metric, with no singular fibers.
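
As a rough numerical illustration of the statement that $b$ and $b+1$ give the same elliptic curve, the following sketch reduces a period point to the standard fundamental domain of $SL(2,\mathbb{Z})$. The helper names `mirror_modulus` and `reduce_tau` are hypothetical, introduced only for this example, and the comparison assumes floating-point arithmetic is accurate enough for the chosen values.

```python
import math


def mirror_modulus(b: float, a: float) -> complex:
    """Period point tau = b + i*a of the mirror curve C / <1, tau>.

    b is the B-field and a > 0 the symplectic area (notation of the text).
    """
    if a <= 0:
        raise ValueError("the symplectic area a must be positive")
    return complex(b, a)


def reduce_tau(tau: complex, max_steps: int = 1000) -> complex:
    """Move tau into the region |Re tau| <= 1/2, |tau| >= 1.

    Uses only the two generators of SL(2, Z): the translation
    tau -> tau + 1 (the shift b -> b + 1) and the inversion
    tau -> -1/tau.
    """
    if tau.imag <= 0:
        raise ValueError("tau must lie in the upper half plane")
    for _ in range(max_steps):
        # Translate so that the real part lies in [-1/2, 1/2).
        tau = complex(tau.real - math.floor(tau.real + 0.5), tau.imag)
        if abs(tau) >= 1.0:
            return tau
        tau = -1.0 / tau  # inversion raises the imaginary part
    raise RuntimeError("reduction did not terminate")


if __name__ == "__main__":
    # The B-fields b and b + 1 reduce to the same point of moduli space.
    print(reduce_tau(mirror_modulus(0.3, 2.0)))
    print(reduce_tau(mirror_modulus(1.3, 2.0)))
```

Reducing to a fundamental domain is only one convenient way to compare lattices; comparing $j$-invariants would serve equally well.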

Polishchuk and Zaslow have worked out in detail how Kontsevich's
conjecture works in this case. The general picture for any torus
fibration is an extension of the fiberwise duality that led to SYZ.
Namely, Lagrangian multisections $L$ of the fibration, of degree $r$ over
the base, give $r$ points on each fiber, and so $r$ flat $U(1)$
connections on the dual fiber. The resulting $U(1)^r$ connections can be
glued together and twisted by the flat connection on $L$, to give a
rank-$r$ vector bundle with connection on the mirror. Arinkin and
Polishchuk show that in general the Lagrangian condition implies the
integrability condition $F^{0,2} = 0$ of the resulting connection, giving
a holomorphic structure on the bundle. Leung-Yau-Zaslow show that the
special Lagrangian condition gives a perturbation of the HYM equations on
the connection. Branching of sections has been dealt with by Fukaya, and
requires instanton corrections from holomorphic disks. Other Lagrangians
with linear intersection with the fibers can be dealt with similarly.
$T^2$ is simpler because all Lagrangians with vanishing Maslov class can
be isotoped into straight lines (i.e., sLags in the flat metric) with no
branching. The upshot is that the slope of the sLag over the base
corresponds to the slope
$\left(\int c_1/\mathrm{rank}\right) \in [-\infty,\infty]$ of the mirror
sheaf.


The Large Complex Structure Limit 


The LKLP for $T^2$ is clearly the limit $a \to \infty$. On the 
mirror then, the LCLP is at $\tau = b + ia \to b + i\infty$, 
the nodal torus compactifying the moduli of elliptic 
curves. Metrically, however, in the (Ricci-) flat 
metric, things look different; if we rescale to have 
fixed diameter, the torus collapses to the base of its 
SYZ fibration, and all of its fibers contract. This is 
an important general feature of the difference 
between complex and metric descriptions of 
LCLPs; see the description of the quintic in the 
next section. 

We note that, as in the compactifications discussed in an earlier
section, the monodromy around this LCLP is given by rotating the B-field:
$b \mapsto b+1$. This gives back the same elliptic curve, but after a
monodromy diffeomorphism $T$, which, from [9], is seen to be


$$T\colon\ q \mapsto q, \quad p \mapsto p + q/a$$

On $H_1(T^2) = \mathbb{Z}[\text{fiber}] \oplus \mathbb{Z}[\text{section}]$
this acts as


$$T_* = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \qquad [10]$$


This is called a Dehn twist. Picking the 0-section $O = \{p = 0\}$ in the
mirror [9] when $b = 0$, this is taken to the section


$$T(O) = \{p = q/a\}$$

and $T$ is in fact the translation by this section $T(O)$ on $T^2$, using
the group structure on the fibers (now we have chosen a 0-section).
Again, Gross (1998) has shown that this is a general feature of LCLPs.

If we pick a Kähler structure on this family of complex tori, $T$ turns
out to be a symplectomorphism. Importantly, its mirror is not a
holomorphic automorphism, but an equivalence of the derived category of
coherent sheaves. As above, the section $T(O)$ corresponds to a slope-one
line bundle $L$ on the mirror, and the monodromy action corresponds to


$$\otimes L\colon D^b \to D^b \qquad [11]$$


on the derived category. Again, this is a more general feature of these
LCLPs, with $L$ such that $c_1(L)$ equals the symplectic form which
generated the ray along which the original LKLP was reached. In general,
the SYZ fiber is the invariant cycle under $T$, [10], and, on the mirror,
structure sheaves of points are invariant under $\otimes L$. On the
cohomology of $T^2$, cupping with
$\mathrm{ch}(L) = e^{c_1(L)} = 1 + c_1(L)$ has the same action [10] on
$H^{\mathrm{ev}} = \mathbb{Z}\langle c_1(L)\rangle \oplus \mathbb{Z}\langle 1\rangle$.

Notice we have used the choices of fibration and 0-section to produce the
equivalence of triangulated categories and to equate the monodromy
actions. Kontsevich's conjectural equivalence is not canonical, but is
fixed by a choice of fibration and 0-section. In turn, a fibration should
be fixed by a choice of LCLP or LKLP from the resulting collapse (in the
Ricci-flat metric) onto a half-dimensional $S^n$ base. The choice of
0-section is then rather arbitrary (as monodromy about the LCLP changes
it) but determines the equivalence of categories. Different choices of
section give different equivalences, differing, for instance, by the
monodromy transformation $\otimes L$ [11].

Another point of view is that a Lagrangian fibration and 0-section
determine a group structure on the fibers and so on the Fukaya category
(translating Lagrangian multisections by multiplication on each fiber).
This corresponds to a choice of tensor product on the derived category of
the mirror; the identity for this product is then the structure sheaf
$\mathcal{O}_{\check{X}}$ mirror to the 0-section, and an ample line
bundle is given by the action of the monodromy transformation
$L = T(\mathcal{O}_{\check{X}})$; $T$ then acts as
$\otimes T(\mathcal{O}_{\check{X}})$ [11]. Since $\check{X}$ is
determined by the graded ring


$$\bigoplus_{j \ge 0} H^0(L^j) \cong \bigoplus_{j \ge 0} \mathrm{Hom}^*\big(\mathcal{O}_{\check{X}},\, T^j(\mathcal{O}_{\check{X}})\big)$$


one might also try to construct $\check{X}$ purely from the 0-section $O$
and LCLP monodromy on $X$, as


$$\check{X} = \mathrm{Proj} \bigoplus_{j \ge 0} HF^*\big(O,\, T^j(O)\big)$$


A problem is to show that $\bigoplus_{j \ge 0} HF^*(O, T^j(O))$ is
finitely generated; a related problem is to show that, for $j \gg 0$, the
above Floer homologies vanish except for $* = 0$.

We now turn to quintic 3-folds, where we will 
see how to identify the (homology classes of the) 
0-section and fiber in general using Hodge theory. 


The Quintic 3-Fold 


The simplest Calabi-Yau 3-fold is given by the zeros $Q$ of a homogeneous
quintic polynomial on $\mathbb{P}^4$, that is, an anticanonical divisor
of $\mathbb{P}^4$. By adjunction, this has trivial canonical bundle, and
so is Calabi-Yau. By the Lefschetz hyperplane theorem, it has
$h^{1,1} = 1$, so computing its Euler number to be $e = -200$, we find
that $h^{2,1} = 101$ gives its number of complex deformations.
Alternatively, this can be seen by showing that all such deformations are
themselves quintics, then dividing the 126-dimensional space of quintic
polynomials by the 25-dimensional $GL(5,\mathbb{C})$. Thus, its mirror
has one complex structure deformation and 101 Kähler classes.
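
The two counts of $h^{2,1} = 101$ above can be checked with a few lines of arithmetic. The sketch below is only an illustration, with hypothetical function names: it counts quintic monomials in five variables, and computes the Euler number from the standard adjunction formula $c(Q) = (1+h)^5/(1+5h)$ for a quintic hypersurface, where $h$ is the hyperplane class and $\int_Q h^3 = 5$.

```python
from math import comb


def quintic_monomial_count(n_vars: int = 5, degree: int = 5) -> int:
    """Dimension of the space of homogeneous polynomials of given degree."""
    return comb(n_vars + degree - 1, degree)


def euler_number_quintic() -> int:
    """Euler number of a smooth quintic 3-fold Q in P^4.

    c(Q) = (1 + h)^5 / (1 + 5h); the Euler number is the integral of
    c_3(Q), and h^3 integrates to deg Q = 5.
    """
    numerator = [comb(5, k) for k in range(4)]   # (1 + h)^5 up to h^3
    inverse = [(-5) ** k for k in range(4)]      # 1 / (1 + 5h) up to h^3
    c3 = sum(numerator[k] * inverse[3 - k] for k in range(4))
    return 5 * c3


def h21_from_euler(euler: int, h11: int) -> int:
    """For a Calabi-Yau 3-fold, euler = 2 * (h11 - h21)."""
    return h11 - euler // 2


if __name__ == "__main__":
    print(quintic_monomial_count())                  # 126 monomials
    print(quintic_monomial_count() - 5 * 5)          # minus dim GL(5, C)
    print(euler_number_quintic())                    # Euler number
    print(h21_from_euler(euler_number_quintic(), 1))
```

Both routes give 101, in agreement with the text.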

Greene and Plesser prescribed the following mirror. Take the special
one-dimensional family of Fermat quintics


$$Q_\lambda = \left\{ \sum_{i=0}^4 x_i^5 - \lambda \prod_{i=0}^4 x_i = 0 \right\} \subset \mathbb{P}^4 \qquad [12]$$


with the action of
$\{(\alpha_0,\ldots,\alpha_4) \in (\mathbb{Z}/5)^5 : \prod_i \alpha_i = 1\} \cong (\mathbb{Z}/5)^4$
given by rescaling the $x_i$ by fifth roots of unity. Dividing by the
diagonal $\mathbb{Z}/5$ projective stabilizer, we get an effective
$(\mathbb{Z}/5)^3$ action; the mirror of the quintic is any crepant
($K = \mathcal{O}$) resolution of the quotient:


$$\check{Q}_\lambda = \widehat{Q_\lambda/(\mathbb{Z}/5)^3}$$


Different resolutions give different Kähler cones whose union is the
moveable cone; its complexification is locally isomorphic to the complex
structure moduli space of $Q$. $h^{1,1}(\check{Q}_\lambda) = 101$ for any
crepant resolution, and $h^{2,1}(\check{Q}_\lambda) = 1$ corresponds
locally to the one complex structure deformation [12]. In fact, for
$\alpha^5 = 1$, multiplying $x_0$ by $\alpha$ shows that
$\check{Q}_\lambda \cong \check{Q}_{\alpha\lambda}$, and $\lambda^5$
parametrizes the complex structure moduli.

The LCLP is at $\lambda = \infty$, that is, it is the quotient of the
union of hyperplanes


$$Q_\infty = \left\{ \prod_{i=0}^4 x_i = 0 \right\} = \{x_0 = 0\} \cup \cdots \cup \{x_4 = 0\} \qquad [13]$$


This is a union of toric varieties, each with a $T^3$ action inherited
from the toric $T^4$ action on $\mathbb{P}^4$. Much more generally,
Batyrev's construction considers the anticanonical divisors (and even
more generally, complete intersections) in toric varieties fibered over
the boundary of the moment polytope, and takes as mirror the
anticanonical divisor of the toric variety associated to the dual
polytope. However, most of the geometry is visible in this quintic
example.

Equation [13] is the analog of the nodal torus of the last section, and
we emphasize again that metrically it looks nothing like this; the
Ricci-flat metric collapses the $T^3$ toric fibers to the base $S^3$
(with a singular metric). General LCLPs look rather similar, with such
“as bad as possible” normal crossing singularities. Smoothing a local
model (in $x_0 = 1$) $\prod_{i=1}^4 x_i = 0$, we can see the tori in
$\{\prod_{i=1}^4 x_i = \epsilon\}$:


$$T = \left\{ |x_1| = \delta_1,\ |x_2| = \delta_2,\ |x_3| = \delta_3,\ x_4 = \frac{\epsilon}{x_1 x_2 x_3} \right\} \qquad [14]$$


These are even Lagrangian in the standard symplectic form on the local
model, and fiber the smoothing over the base
$\{(\delta_1,\delta_2,\delta_3)\}$. It turns out that, metrically, these
tori (which vanish into the normal crossings singularity at the LCLP)
actually form a large part of the smooth Calabi-Yau. This sheds light on
the apparent paradox between the SYZ conjecture and the Batyrev
construction, that is, why a vertex of the original moment polytope
(corresponding to the deepest type of singularity
$(0,0,0,0) \in \{\prod_{i=1}^4 x_i = 0\}$) can be replaced by the dual
three-dimensional face in the dual polytope. This was first suggested by
Leung and Vafa.

Gross and Siebert (2003) exploit this to extend SYZ 
and Batyrev’s construction to nontoric LCLP Calabi- 
Yau manifolds; it is only the local toric nature of the 
normal crossing singularities of the LCLP that they 
use. It seems possible that their construction will give 
the mirrors of all Calabi-Yau manifolds with LCLPs. 
Much of mirror symmetry should soon be reduced to 
graphs (the discriminant locus of a Lagrangian torus 
fibration) in spheres, and further graphs over which D- 
branes (such as holomorphic curves) fiber, as in recent 
conjectures of Kontsevich and Soibelman and Fukaya 
(2005). It may soon be possible to write down a 
triangulated category in terms of such data. The full 
geometric story (involving Joyce’s description of sLag 
fibrations, for instance) is still some way off, however; 
we cannot even write down an explicit Ricci-flat 
metric on a compact Calabi-Yau. 


Monodromy around the LCLP 


As well as the SYZ torus fiber [14] we can also see a 
Lagrangian 0-section on the quintic and its mirror as a 
component of the real locus of [12] for $\lambda > 5$. 
Remarkably, like the torus [14], this cycle was already 
described and used by Candelas et al. (1991), long 
before the relevance of torus fibrations was suspected. 

Gross and Ruan have been able to describe the 
quintic and its mirror (at least topologically or 
symplectically) very explicitly as a simple torus 
fibration over this $S^3$ with a natural integral affine 
structure and codimension-two graph discriminant 
locus (see, e.g., Gross et al. (2003)). 

Under monodromy about $\lambda = \infty$, the 0-section is moved to
another section $T(O)$, and $T$ is given by translation by $T(O)$ using
the group structure on the fibers. This is the analog of the Dehn twist
[10], and one can choose a basis of $H_3(\check{Q})$ (with first element
the invariant cycle, the $T^3$-fiber, second element a cycle fibered over
a curve in $S^3$, third fibered over a surface, and last the 0-section
itself) such that


$$T_* = \begin{pmatrix} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad [15]$$


Like the Dehn twist [10], it turns out that $T_*$ is maximally unipotent;
that is, we have in $n$ dimensions,


$$(T_* - 1)^{n+1} = 0 \quad \text{but} \quad (T_* - 1)^n \neq 0$$


Again, this is a general feature of LCLPs as formulated by Morrison
(1993) as part of the definition.
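
To make the maximal unipotency condition concrete, here is a minimal sketch in exact rational arithmetic. It checks the condition for the Dehn twist [10] and for an illustrative $4 \times 4$ matrix of the shape [15]. The entries chosen for the starred positions are an assumption made purely for illustration (the text does not specify them), and the helper names are hypothetical.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def minus_identity(t):
    """Return T - 1 with exact rational entries."""
    n = len(t)
    return [[Fraction(t[i][j]) - int(i == j) for j in range(n)]
            for i in range(n)]


def is_zero(m):
    return all(entry == 0 for row in m for entry in row)


def nilpotency_index(t):
    """Smallest k >= 1 with (T - 1)^k = 0, for a unipotent matrix T."""
    n = len(t)
    x = minus_identity(t)
    power = identity(n)
    for k in range(1, n + 1):
        power = mat_mul(power, x)
        if is_zero(power):
            return k
    raise ValueError("T is not unipotent")


def log_unipotent(t):
    """N = log T = sum_{k>=1} (-1)^(k+1) (T - 1)^k / k (a finite sum)."""
    n = len(t)
    x = minus_identity(t)
    result = [[Fraction(0)] * n for _ in range(n)]
    power = identity(n)
    for k in range(1, n):  # (T - 1)^n = 0 for an n x n unipotent matrix
        power = mat_mul(power, x)
        coeff = Fraction(1 if k % 2 == 1 else -1, k)
        result = [[result[i][j] + coeff * power[i][j] for j in range(n)]
                  for i in range(n)]
    return result


if __name__ == "__main__":
    dehn_twist = [[1, 1], [0, 1]]          # eqn [10], n = 1
    # Assumed sample of the shape [15]: ones on the superdiagonal, n = 3.
    sample = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
    print(nilpotency_index(dehn_twist))    # index n + 1 with n = 1
    print(nilpotency_index(sample))        # index n + 1 with n = 3
    print(log_unipotent(sample)[0])        # first row of N = log T
```

A nilpotency index of $n + 1$ for $T_* - 1$ is exactly the statement $(T_* - 1)^{n+1} = 0$ but $(T_* - 1)^n \neq 0$.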

This should be compared with the Lefschetz operator $L = \cup\,\omega$ on
the cohomology of the mirror, which also satisfies $L^n \neq 0$,
$L^{n+1} = 0$ (or, more relevantly, $\exp(L)$, which satisfies
$(e^L - 1)^n \neq 0$, $(e^L - 1)^{n+1} = 0$). Their similarity was
noticed by the Griffiths school working on VHS in the late 1960s! Now we
know that for Calabi-Yau manifolds at an LCLP dual to an LKLP along a ray
$\omega = c_1(L)$ on the mirror, they should be considered mirror
operators (up to some factors of the Todd class of the underlying
Calabi-Yau, to do with the relationship between the Chern character
$e^\omega$ of the line bundle $L$ (see [11]) and the Riemann-Roch
formula).

Both, by linear algebra of the nilpotent operator
$N = \log T_* = \sum_{k=1}^{n} \frac{(-1)^{k+1}}{k}(T_* - 1)^k$, induce a
natural filtration
$W_\bullet\colon 0 \le W_0 \le \cdots \le W_{2n} = H$ on the cohomology
on which they operate (which is $H = H^n$ for $N = \log T_*$ and
$H = H^{\mathrm{ev}}$ for $N = L = \cup\,\omega$):


$$0 \le \mathrm{im}(N^n) \le \mathrm{im}(N^{n-1}) \cap \ker(N) \le \cdots \le \ker(N^{n-1}) + \mathrm{im}(N) \le \ker(N^n) \le H \qquad [16]$$


For a discussion of the construction of this monodromy weight filtration,
the reader is referred to the further reading section. It plays a key
role in studying degenerations of varieties and Hodge structures, in this
case as we approach the LCLP. It is a beautiful result of Gross that this
filtration coincides with the Leray filtration on $H^n$ induced by the
fibration. That is, under Poincaré duality, the weight filtration on
cycles is by the minimal dimension (over all homologous cycles) of the
image in the base over which the cycle is fibered. So, the first graded
piece is spanned by the invariant cycle, the $T^3$ fiber, supported over
a point, and the last by the 0-section; cf. [15]. (Similarly on the
mirror, the filtration for the Lefschetz operator $\cup\,\omega$ has
first piece spanned by the cohomology class of a point, which is
invariant under the monodromy action $\otimes L$ of [11], etc.)

Letting $\gamma_0$ be the class of a fiber and $\gamma_1$ span $W_2/W_0$
(which is one-dimensional) over the integers, then
$T_*\gamma_1 = \gamma_1 + \gamma_0$. It follows that


$$q = \exp\left( 2\pi i\, \frac{\int_{\gamma_1}\Omega}{\int_{\gamma_0}\Omega} \right)$$


is invariant under monodromy. This is the higher-dimensional analog of
the coordinate $\exp(2\pi i\tau)$ on the moduli space of elliptic curves,
where $\tau$ is the period point. It is this coordinate $q$ that is
mirror to


$$\exp\left( 2\pi i \int (B + i\omega) \right)$$


on the Kähler moduli space on the mirror quintic, which allows one to
compute the correspondence between VHS and Gromov-Witten invariants
mentioned in the introduction.
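
The invariance of $q$ takes only one line to verify. The following short derivation is included as a worked step; it uses nothing beyond the relation $T_*\gamma_1 = \gamma_1 + \gamma_0$ and the invariance of the fiber class $\gamma_0$ stated above, with $\tau$ introduced here as shorthand for the ratio of periods.

```latex
\[
\tau := \frac{\int_{\gamma_1}\Omega}{\int_{\gamma_0}\Omega}
\;\longmapsto\;
\frac{\int_{\gamma_1+\gamma_0}\Omega}{\int_{\gamma_0}\Omega}
= \frac{\int_{\gamma_1}\Omega + \int_{\gamma_0}\Omega}{\int_{\gamma_0}\Omega}
= \tau + 1,
\qquad\text{so}\qquad
q = e^{2\pi i \tau} \;\longmapsto\; e^{2\pi i(\tau+1)} = q .
\]
```

This is the same mechanism by which $b \mapsto b + 1$ leaves $\exp(2\pi i\tau)$ unchanged for the 2-torus.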

More generally, following Morrison (1993), one can make a rigorous
definition of an LCLP using features noted above extended to the case of
$h^{2,1} > 1$ (see, e.g., Cox and Katz (1999)). Roughly, the upshot is
that $\mathcal{M}_X$ (of dimension $s = h^{2,1}(X)$) should be
compactified with $s$ divisors $(D_i)_{i=1}^s$ (parametrizing singular
varieties) forming a normal crossings divisor meeting at the LCLP, with
monodromies $T_i$ about them. There should be a unique (up to multiples)
integral cycle $\gamma_0$ (our torus fiber) invariant under all $T_i$,
and cycles $(\gamma_i)_{i=1}^s$ such that


$$\tau_i = \frac{\int_{\gamma_i}\Omega}{\int_{\gamma_0}\Omega}$$


is logarithmic at $D_i$; that is, $\tau_i = (1/(2\pi i))\log(z_i)$,
where $z_i$ is a local parameter for $D_i = \{z_i = 0\}$.

So, $z_i = \exp(2\pi i\tau_i)$ form local coordinates for 
moduli space, mirror to the polydisk coordinates [2] 
on KĘ. The direction of approach to the LKLP in that 
section corresponds to the holomorphic curve z,’ = gi 
[3] we take through the LCLP (z;=0Vi), and the 
monodromy *`N;T; varies accordingly, but the 
corresponding weight filtration $W_\bullet$ remains constant 
if $k_i \neq 0\ \forall i$, by a theorem of Cattani and Kaplan. 

Morrison then requires that the $(\gamma_i)_{i=1}^s$ should
form an integral basis for $W_2/W_0$ (with $\gamma_0$ a basis
of $W_0 = W_1$). Finally, part definition and part
conjecture, we should be able to make a choice
such that they satisfy the condition $\log T_i(\gamma_j) = \delta_{ij}\gamma_0$.







Of course, as has been emphasized, Morrison’s 
definition of an LCLP is really where the mathematics 
and geometry of mirror symmetry begin, and should 
have been the starting point of this article. But that 
would have required appreciable knowledge of 
abstract VHS that are best understood, in this context, 
through the new geometry of Lagrangian torus 
fibrations that mirror symmetry has inspired. 


See also: AdS/CFT Correspondence; Calibrated 
Geometry and Special Lagrangian Submanifolds; 
Derived Categories; Fourier—Mukai Transform in String 
Theory; Geometric Analysis and General Relativity; 
Geometric Flows and the Penrose Inequality; Geometric 
Measure Theory; Geometric Phases; Number Theory in 
Physics; Riemann Surfaces; Several Complex Variables: 
Compact Manifolds; Topological Gravity, Two- 
Dimensional; Topological Sigma Models; WDVV 
Equations and Frobenius Manifolds. 
The concept of a moduli space has been used by mathematicians for nearly
150 years, although it was not until the 1960s that Mumford (1965) gave
precise definitions of moduli spaces and methods for constructing them.
The use of the word “moduli” in this context goes back to Riemann in a
paper of 1857, in which he observed that an isomorphism class of compact
Riemann surfaces of genus $g$ “hängt ... von $3g - 3$ stetig
veränderlichen Grössen ab, welche die Moduln dieser Klasse genannt werden
sollen” (“depends ... on $3g - 3$ continuously varying quantities, which
shall be called the moduli of this class”).
The idea of moduli as parameters in some sense 
measuring or describing the variation of geometric 
objects has been of fundamental importance in 
geometry ever since. 

Moduli spaces arise naturally in classification 
problems in geometry, particularly in algebraic 
geometry (Mumford 1965, Newstead 1978, Popp 
1977, Seshadri 1975, Sundararaman 1980, Viehweg 
1995). Algebraic geometry is, roughly speaking, the 
study of solutions of systems of polynomial equa- 
tions in many variables; the solutions to such a 
system form an algebraic variety. A simple example 
of an algebraic variety is a hypersurface, consisting 
of the solutions to a single polynomial equation in 
some number of variables. We can try to classify 
hypersurfaces by their degree and their dimension; 
these are “discrete invariants” for the classification 
problem, but of course they do not determine 
hypersurfaces completely, even if we regard two 
hypersurfaces as equivalent when one is obtained 
from the other after making a change of coordinates. 
It is typical of classification problems in algebraic 
geometry (and other areas of geometry) that there 
are not enough discrete invariants to classify objects 
sufficiently finely, and this is where the concept of a 
moduli space arises. 

In complex algebraic geometry, discrete invariants 
often come from topology. For example, a non- 
singular complex curve (i.e., a complex algebraic 
variety which is a connected complex manifold of 
dimension 1, in other words a Riemann surface) 
which is projective (i.e., points have been added at 
infinity to make it compact) is topologically just a 
sphere with a number of handles attached to it; the 
number of handles is called the genus of the curve 
and is a discrete invariant. Nonsingular complex 
projective curves (or equivalently compact Riemann 
surfaces) are not classified completely by their genus 
g; they are determined by g when regarded simply as 
topological surfaces, but the genus does not deter- 
mine their complex structure when g > 0. 

A classification problem such as this one (the 
classification of nonsingular complex projective 
curves up to isomorphism, or, equivalently, compact 
Riemann surfaces up to biholomorphism), can be 
resolved into two basic steps. 


Step 1 is to find as many discrete invariants as possible 
(in the case of nonsingular complex projective 
curves the only discrete invariant is the genus). 

Step 2 is to fix the values of all the discrete invariants 
and try to construct a “moduli space”; that is, a 
complex manifold (or an algebraic variety) whose 
points correspond in a natural way to the 
equivalence classes of the objects to be classified. 


What is meant by “natural” here can be made 
precise (as we shall see shortly) given suitable notions 
of families of objects parametrized by base spaces and 
of equivalence of families. A “fine moduli space” is 
then a base space for a universal family of the objects 
to be classified (any family is equivalent to the 
pullback of the universal family along a unique map 
into the moduli space). If no universal family exists 
there may still be a “coarse moduli space” satisfying 
slightly weaker conditions, which are nonetheless 
strong enough to ensure that if a moduli space exists it 
will be unique up to canonical isomorphism. 

It is often the case that not even a coarse moduli 
space will exist. Typically, particularly “bad” objects 
must be left out of the classification in order for a 
moduli space to exist. For example, a coarse moduli 
space of nonsingular complex projective curves exists 
(although to have a fine moduli space we must give the 
curves some extra structure, such as a level structure), 
but if we want to include singular curves (which is 
often important so that we can understand how 
nonsingular curves can degenerate to singular ones) 
we must leave out the so-called “unstable curves” to 
get a moduli space. However all nonsingular curves 
are stable, so the moduli space of stable curves of genus 
g is then a compactification of the moduli space of 
nonsingular projective curves of genus g. 

Moduli spaces are often constructed and studied as 
orbit spaces for group actions (using Mumford’s 
geometric invariant theory or, more recently, ideas due 
to Kollár (1997) and Keel and Mori (1997)); geometric 
invariant theoretic quotients can also often be described 
naturally as symplectic reductions, and it is in this guise 
that many moduli spaces in physics appear. Another 
technique involves period maps, Torelli theorems and 
variations of Hodge structures, initiated by Griffiths 
(1984) and others. In the special case of moduli spaces 
of compact Riemann surfaces, Teichmüller theory can 
also be used (see, e.g., Lehto (1987)). 


Remark 1 Recall that a compact Riemann surface 
(i.e., a compact complex manifold of complex dimension 1) 
can be thought of as a nonsingular complex 
projective curve, in the sense that every compact 
Riemann surface can be embedded in some 
complex projective space 


$$\mathbb{P}_n = (\mathbb{C}^{n+1} - \{0\})/(\text{multiplication by nonzero complex scalars})$$


as the solution space of a set of homogeneous 
polynomial equations. Moreover, two nonsingular 
complex projective curves are biholomorphic if and 
only if they are algebraically isomorphic. So, there is 
a natural identification between the moduli space of 
compact Riemann surfaces of genus g up to 
biholomorphism and the moduli space of nonsingu- 
lar complex projective curves up to isomorphism. 


There are other situations where an “algebraic” 
moduli space can be naturally identified with the 
corresponding “complex analytic” moduli space, but 
this is not always the case. For example, if we 
consider K3 surfaces (compact complex manifolds 
of complex dimension 2 with first Betti number and 
first Chern class both zero), we find that the moduli 
space of all K3 surfaces has complex dimension 20, 
whereas the moduli spaces of algebraic K3 surfaces 
(which have one more discrete invariant, the degree, 
to be fixed) are 19-dimensional. 

This problem of algebraic moduli spaces versus 
nonalgebraic ones is one reason why the question of 
classifying n-folds (i.e., compact complex manifolds, 
or, in the algebraic category, nonsingular projective 
varieties, of dimension n) becomes much harder 
when n > 1 than in the case n = 1 (which is the case of 
compact Riemann surfaces or nonsingular projective 
curves). Another difficulty is that families of n-folds 
can be “blown up” along families of subvarieties to 
produce ever more complicated families. 


Remark 2 Recall that we blow up a complex 
manifold X along a closed complex submanifold Y 
by removing the submanifold Y from X and glueing 
in the projective normal bundle of Y in its place. We 
get a complex manifold $\tilde{X}$ with a holomorphic 
surjection $\pi: \tilde{X} \to X$ such that $\pi$ is an isomorphism 
over $X - Y$ and if $y \in Y$ then $\pi^{-1}(y)$ is the complex 
projective space associated to the normal space 
$T_yX/T_yY$ to Y in X at y. If $X = \mathbb{C}^{n+1}$ and $Y = \{0\}$ 
and we identify $\mathbb{P}_n$ with the set of one-dimensional 
linear subspaces of $\mathbb{C}^{n+1}$, then 


$$\tilde{X} = \{(v, w) \in \mathbb{C}^{n+1} \times \mathbb{P}_n : v \in w\}$$

with $\pi(v, w) = v$. 


Again this problem does not arise when n = 1, 
because blowing up a 1-fold makes no difference unless 
the 1-fold has singularities (in which case blowing up 
may help to “resolve” the singularities; for example, 
when we blow up the origin {0} in $\mathbb{C}^2$, the singular 
curve C in $\mathbb{C}^2$ defined by $y^2 = x^3 + x^2$ is transformed 
into a nonsingular curve $\tilde{C}$ with the origin in C replaced 
by two points, corresponding to the two complex 
“tangent directions” in C at 0). 
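
The resolution of this nodal cubic can be checked by hand in one chart of the blow-up. The following is a minimal sketch, assuming the standard chart $y = tx$ on the blow-up of $\mathbb{C}^2$ at the origin, in which the strict transform of $y^2 = x^3 + x^2$ is the smooth curve $t^2 = x + 1$. The function names are illustrative only, and rational sample points stand in for complex ones.

```python
from fractions import Fraction

def on_nodal_cubic(x, y):
    """True if (x, y) lies on the singular curve y^2 = x^3 + x^2."""
    return y * y == x ** 3 + x ** 2

def strict_transform_point(t):
    """Point (x, t) on the strict transform t^2 = x + 1 in the chart y = t*x."""
    return (t * t - 1, t)

def blow_down(x, t):
    """The blow-down map in this chart: (x, t) -> (x, y) = (x, t*x)."""
    return (x, t * x)

# Sample rational values of the chart coordinate t (an arbitrary choice).
samples = [Fraction(k, 3) for k in range(-6, 7)]

# Every sampled point of the strict transform maps onto the nodal cubic.
all_on_curve = all(
    on_nodal_cubic(*blow_down(*strict_transform_point(t))) for t in samples
)

# Points of the strict transform over the origin are those with x = 0,
# i.e. t^2 = 1: one point for each tangent direction y = x and y = -x.
over_origin = [t for t in samples if strict_transform_point(t)[0] == 0]
```

In this chart the two points over the origin are t = 1 and t = -1, matching the two tangent directions of the node.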

Thus, the classification of n-folds when n> 1 
requires a preliminary step before there is any hope 
of carrying out the two steps described above. 


Step 0 (the “minimal model programme” of Mori 
(1987) and others): Instead of all the objects to be 
classified, consider only specially “good” objects, 
such that every object is obtained from one of these 
specially good objects by a sequence of blow-ups 
(or similar carefully prescribed operations). 


How to carry out Mori’s minimal model program 
is well understood for algebraic surfaces and 3-folds, 
but in higher dimensions it is not yet complete (Kollár 
and Mori 1998). We shall ignore both step 0 and 
step 1 from now on, and concentrate on step 2, the 
construction of moduli spaces. 


Ingredients of a Moduli Problem 


Formally before posing a moduli problem, we need 
to fix the category in which we are working; that is, 
we need to specify what we mean by “space” and 
“map” in the description below. If, for example, we 
are working in complex analytic geometry then we 
might take “space” to mean a complex manifold (or 
more generally we might allow singularities) and 
take “map” to mean a complex analytic map, 
whereas in algebraic geometry “space” might mean 
an algebraic variety, or a scheme, or even a stack, 
with “map” interpreted as a morphism of algebraic 
varieties (or schemes, or stacks). 

Once this is fixed, the ingredients of a moduli 
problem are: 


1. a set A of objects to be classified, 

2. an equivalence relation ~ on A, 

3. the concept of a family of objects in A with base 
space S (or parametrized by S), and sometimes 

4. the concept of equivalence of families. 


These ingredients must satisfy: 


1. a family parametrized by a single point {p} is just 
an object in A (and equivalence of objects is 
equivalence of families over {p}) and 

2. given a family X parametrized by a space S and a 
map $\phi: \tilde{S} \to S$, there is a family $\phi^* X$ parametrized 
by $\tilde{S}$ (the “pullback of X along $\phi$”), with 
pullback being functorial and preserving 
equivalence. 


In particular, for any family X parametrized by S 
and any $s \in S$, there is an object $X_s$ given by pulling 
back X along the inclusion of {s} in S. We think of 
$X_s$ as the object in the family X whose parameter is 
the point s in the base space S. 


Example 1 A family of compact Riemann surfaces 
parametrized by a complex manifold S is a surjective 
holomorphic map 


$$\pi: T \to S$$


from a complex manifold T of (complex) dimension 
dim(T) = dim(S) + 1 to S, such that $\pi$ is 
proper (i.e., the inverse image $\pi^{-1}(C)$ of any 
compact subset C of S under $\pi$ is compact) and 
has maximal rank (i.e., its derivative is everywhere 
surjective). Then $\pi^{-1}(s)$ is a compact Riemann 
surface for each $s \in S$, and is the object in the 
family with parameter s. 

The family defined by $\pi$ is an algebraic family if 
$\pi$ is a morphism of nonsingular complex projective 
varieties. 


Example 2 A family of nonsingular complex 
projective varieties parametrized by a nonsingular 
complex variety S is a proper surjective morphism 


$$\pi: T \to S$$


with T nonsingular and $\pi$ having maximal rank. We 
can also allow T and S to be singular, but then we 
require an extra technical condition (that $\pi$ must be 
flat with reduced fibers). 


In the above example, equivalence of families 
$\pi_1: T_1 \to S_1$ and $\pi_2: T_2 \to S_2$ is given by isomorphisms 
$f: T_1 \to T_2$ and $g: S_1 \to S_2$ such that $g \circ \pi_1 = \pi_2 \circ f$. 
Equivalence of families in the first example is 
similar. 


Definition 1 A “deformation” of a nonsingular 
projective variety or compact complex manifold M 
is given by a family $\pi: T \to S$ together with an 
isomorphism 


$$\pi^{-1}(s_0) \cong M$$


for some $s_0 \in S$. 
Strictly speaking, the deformation is the germ at 
$s_0$ of such a $\pi$; that is, the restriction of $\pi$ over any 
open neighborhood of $s_0$ in S determines the same 
deformation of M as $\pi$ does. 

A study of deformations leads to information 
about the local structure of moduli spaces. Let 
$\pi: X \to S$ be a deformation of a compact complex 
manifold $M = \pi^{-1}(s_0)$ where $s_0 \in S$. We can cover M 
(thought of as a subset of X) with open subsets $W_i$ 
of X such that there exist isomorphisms 


$$h_i: W_i \to U_i \times V_i$$


where $V_i = \pi(W_i)$ is open in S and $U_i = M \cap W_i$ is 
open in $M = \pi^{-1}(s_0)$ and the projection of $h_i$ onto $V_i$ 
is just $\pi: W_i \to V_i$. For each $i \neq j$, we then get a 
holomorphic vector field $\theta_{ij}$ on $U_i \cap U_j$ by differentiating 
$h_i \circ h_j^{-1}$ in the direction of any tangent 
vector $v \in T_{s_0}S$. These holomorphic vector fields 
define a 1-cocycle in the tangent sheaf $\Theta$ of M. This 
gives us the “Kodaira-Spencer map” 


$$\rho_\pi: T_{s_0}S \to H^1(M, \Theta)$$


Theorem 1 (Kuranishi). If M is a compact complex 
manifold, then it has a deformation $\pi: X \to S$ 
with $\pi^{-1}(s_0) = M$ such that 


(i) the Kodaira-Spencer map $\rho_\pi: T_{s_0}S \to H^1(M, \Theta)$ 
is an isomorphism, 

(ii) $\pi$ has the local universal property for deformations 
(i.e., any deformation of M is locally the 
pullback of $\pi$ along a map f into S), 

(iii) if $H^0(M, \Theta) = 0$, then the map f in (ii) is unique, 
and 

(iv) if $H^2(M, \Theta) = 0$, then S is nonsingular at $s_0$ and 
so $\dim S = \dim H^1(M, \Theta)$. 


This deformation $\pi$ is called the “Kuranishi 
deformation” of M (its germ at $s_0$ is unique up to 
isomorphism), and S is called the “Kuranishi space” 
of M. 


Example 3 A family of holomorphic (or algebraic) 
vector bundles over a compact Riemann surface (or 
nonsingular complex projective curve) $\Sigma$ is a vector 
bundle over $\Sigma \times S$ where S is the base space (see, e.g., 
Verdier and Le Potier (1985)). A deformation of a 
vector bundle $E_0$ over $\Sigma$ is then given by a vector 
bundle E over a product $\Sigma \times S$ together with an 
isomorphism 


$$E|_{\Sigma \times \{s_0\}} \cong E_0$$


for some $s_0 \in S$ (strictly speaking it is the germ at $s_0$ 
of such a family of vector bundles). 


Fine and Coarse Moduli Spaces 


For definiteness, except when it is specified other- 
wise, let us consider moduli problems in algebraic 
geometry with “space” meaning algebraic variety 
(over some fixed field k which is usually C) and 
“map” meaning morphism of algebraic varieties. 


Definition 2 A “fine moduli space” for a given 
(algebro-geometric) moduli problem is an algebraic 
variety M with a family U parametrized by M 
having the following (universal) property: for every 
family X parametrized by a base space S, there exists 
a unique map $\phi: S \to M$ such that 


$$X \sim \phi^* U$$


U is then called a “universal family” for the given 
moduli problem. 


Many moduli problems have no fine moduli 
space, but nonetheless there may be a moduli space 
satisfying slightly weaker conditions, called a coarse 
moduli space. If a fine moduli space does exist, it 
will automatically satisfy the conditions to be a 
coarse moduli space. Both fine and coarse moduli 
spaces, when they exist, are unique up to canonical 
isomorphism. 


Definition 3 A “coarse moduli space” for a given 
moduli problem is an algebraic variety M with a 
bijection 


$$\alpha: A/\!\sim \; \to M$$


(where as before A is the set of objects to be 
classified up to the equivalence relation $\sim$) from the 
set $A/\!\sim$ of equivalence classes in A to M such that: 


(i) For every family X with base space S, the 
composition of the given bijection $\alpha: A/\!\sim \; \to M$ 
with the function 


$$\nu_X: S \to A/\!\sim$$


which sends $s \in S$ to the equivalence class $[X_s]$ 
of the object $X_s$ with parameter s in the family 
X, is a morphism. 

(ii) When N is any other variety with $\beta: A/\!\sim \; \to N$ 
such that for each family X parametrized by a 
base space S the composition $\beta \circ \nu_X: S \to N$ is a 
morphism, then 


$$\beta \circ \alpha^{-1}: M \to N$$


is a morphism. 


Remark 3 For some moduli problems, a family X 
with base space S which is connected and of 
dimension strictly greater than zero may exist such 
that for some $s_0 \in S$ we have 


(i) $X_s \sim X_t$ for all $s, t \in S - \{s_0\}$ and 
(ii) $X_s \not\sim X_{s_0}$ for all $s \in S - \{s_0\}$. 


This is the “jump phenomenon,” and when it 
occurs we cannot construct a moduli space including 
the equivalence class of the object $X_{s_0}$. Typically, to 
construct a moduli space, some objects (often called 
“unstable”) must be left out because of the jump 
phenomenon and we only get a moduli space of 
“stable” objects. This happens, for example, in the 
construction of moduli spaces of complex projective 
curves, if we want to include singular curves, or 
moduli spaces of vector bundles. 


Example 4 The Jacobian $J(\Sigma)$ of a compact Riemann 
surface $\Sigma$ is a fine moduli space for holomorphic 
line bundles (i.e., vector bundles of rank 1) 
of fixed degree over $\Sigma$ up to isomorphism. As a 
complex manifold 


$$J(\Sigma) = \mathbb{C}^g/\Lambda$$


where g is the genus of $\Sigma$ and $\Lambda$ is a lattice of maximal 
rank in $\mathbb{C}^g$ (in other words $J(\Sigma)$ is a complex torus). 
Since $J(\Sigma)$ is also a complex projective variety, it is an 
“abelian variety.” 

More precisely, $J(\Sigma)$ is the quotient of the 
complex vector space $H^0(\Sigma, K_\Sigma)$ of dimension g by 
the lattice $H^1(\Sigma, \mathbb{Z}) \cong \mathbb{Z}^{2g}$. Here $K_\Sigma$ is the complex 
cotangent bundle of $\Sigma$ and $H^0(\Sigma, K_\Sigma)$ is the space of 
its holomorphic sections, that is, the space of 
holomorphic differentials on $\Sigma$. If we choose a 
basis $\omega_1, \ldots, \omega_g$ of holomorphic differentials and a 
standard basis $\gamma_1, \ldots, \gamma_{2g}$ for $H_1(\Sigma, \mathbb{Z})$ such that 


$$\gamma_i \cdot \gamma_{i+g} = 1 = -\gamma_{i+g} \cdot \gamma_i$$


when $1 \le i \le g$ and all other intersection pairings 
$\gamma_i \cdot \gamma_j$ are zero, then we can associate to $\Sigma$ the $g \times 2g$ 
“period matrix” $P(\Sigma)$ given by integrating the 
holomorphic differentials $\omega_i$ around the 1-cycles $\gamma_j$. 
The Jacobian $J(\Sigma)$ can then be identified with the 
quotient of $\mathbb{C}^g$ by the lattice spanned by the columns 
of this period matrix. 

We can in fact always choose the basis $\omega_1, \ldots, \omega_g$ 
of holomorphic differentials so that the period 
matrix $P(\Sigma)$ is of the form 


$$(I_g \;\; Z)$$


where $I_g$ is the $g \times g$ identity matrix. This period 
matrix is called a “normalized period matrix.” The 
Riemann bilinear relations tell us that Z is symmetric 
and its imaginary part is positive definite. 


Example 5 The moduli space $\mathcal{A}_g$ of all (principally 
polarized) abelian varieties of dimension g was one of 
the first moduli spaces to be constructed. We have 


$$\mathcal{A}_g = \mathcal{H}_g/\mathrm{Sp}(2g; \mathbb{Z})$$


where $\mathcal{H}_g$ is Siegel’s upper half space, which consists 
of the symmetric $g \times g$ complex matrices with 
positive-definite imaginary part. 
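
Membership in Siegel’s upper half space is a finite check on a matrix. The sketch below uses illustrative function names that are not taken from the text. It tests symmetry and positive definiteness of the imaginary part (via a Cholesky factorization, which is one of several possible criteria), and illustrates the case g = 1, where Sp(2; Z) = SL(2; Z) acts on the usual upper half plane by $\tau \mapsto (a\tau + b)/(c\tau + d)$.

```python
import math

def is_positive_definite(a):
    """Attempt a Cholesky factorization of the real symmetric matrix a."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # non-positive pivot: not positive definite
                lower[i][i] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True

def in_siegel_upper_half_space(z):
    """True if z is a symmetric complex matrix with positive-definite imaginary part."""
    g = len(z)
    symmetric = all(z[i][j] == z[j][i] for i in range(g) for j in range(g))
    imag = [[z[i][j].imag for j in range(g)] for i in range(g)]
    return symmetric and is_positive_definite(imag)

def sl2z_action(a, b, c, d, tau):
    """Action of an element of SL(2; Z) on the upper half plane (the case g = 1)."""
    assert a * d - b * c == 1
    return (a * tau + b) / (c * tau + d)

z_good = [[2j, 1 + 0.5j], [1 + 0.5j, 3j]]  # Im = [[2, 0.5], [0.5, 3]]
z_bad = [[1j, 2j], [2j, 1j]]               # Im = [[1, 2], [2, 1]] is indefinite
tau_image = sl2z_action(0, -1, 1, 0, 2j)   # the inversion tau -> -1/tau
```

The inversion sends 2i to i/2, which again has positive imaginary part, as it must for the action to preserve the upper half plane.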


Example 6 One way to construct and study the 
moduli space $\mathcal{M}_g$ of compact Riemann surfaces of 
genus g is via the “Torelli map” 


$$\tau: \mathcal{M}_g \to \mathcal{A}_g$$

given by 

$$\Sigma \mapsto J(\Sigma)$$


Torelli’s theorem tells us that $\tau$ is injective (cf. 
Griffiths (1984)). Describing the image of $\mathcal{M}_g$ in $\mathcal{A}_g$ 
is known as the Schottky problem. 


We can calculate the dimension of the moduli 
space $\mathcal{M}_g$ using Kuranishi theory as in the previous 
section: we get 


$$\dim \mathcal{M}_g = \dim H^1(\Sigma, \Theta) = 3g - 3$$


for any compact Riemann surface $\Sigma$ of genus $g \ge 2$. 
In fact, if M is any compact complex manifold and 
there exists a fine moduli space of complex manifolds 
diffeomorphic to M, then the moduli space is 
locally isomorphic near [M] to the Kuranishi space 
near $s_0$. More often, there is only a coarse moduli 
space (as in the case of $\mathcal{M}_g$), and then the moduli 
space is locally isomorphic near [M] to the quotient 
of the Kuranishi space by the action of the group of 
automorphisms of M. 

For the Teichmüller approach to $\mathcal{M}_g$ (cf. Lehto 
(1987)), we consider the space of all pairs consisting 
of a compact Riemann surface $\Sigma$ of genus g and a basis 
$\gamma_1, \ldots, \gamma_{2g}$ for $H_1(\Sigma, \mathbb{Z})$ as above such that 


$$\gamma_i \cdot \gamma_{i+g} = 1 = -\gamma_{i+g} \cdot \gamma_i$$


if $1 \le i \le g$ and all other intersection pairings $\gamma_i \cdot \gamma_j$ 
are zero. If $g \ge 2$, this space (called Teichmüller 
space) is naturally homeomorphic to an open ball in 
$\mathbb{C}^{3g-3}$ (by a theorem of Bers). The mapping class 
group $\Gamma_g$ (which consists of the diffeomorphisms of 
the surface modulo isotopy) acts discretely on 
Teichmüller space, and the quotient can be identified 
with the moduli space $\mathcal{M}_g$. This gives us a 
description of $\mathcal{M}_g$ as a complex analytic space, but 
not as an algebraic variety. 

To construct the moduli space $\mathcal{M}_g$ as an algebraic 
variety, we can use the fact that every compact 
Riemann surface of genus g can be embedded 
canonically as a curve of degree 6(g - 1) in a 
projective space of dimension 5g - 6. The use of 
the word “canonical” here is a rather poor pun; it 
refers both to the canonical line bundle (or 
cotangent bundle) of the Riemann surface, although 
here “tricanonical” would be more accurate, and 
also to the fact that no choices are involved, except 
that a choice of basis is needed to identify the 
projective space with the standard one $\mathbb{P}_{5g-6}$. This 
enables us to identify $\mathcal{M}_g$ with the quotient of an 
algebraic variety by the group $\mathrm{PGL}(n + 1; \mathbb{C})$, where 
n = 5g - 6. However, here we do not have a discrete 
group action, and to construct the quotient we must use 
Mumford’s geometric invariant theory (see below), which 
was developed in the 1960s in order to provide 
algebraic constructions of this moduli space and 
others. 

In fact, geometric invariant theory also provides a 
beautiful compactification of $\mathcal{M}_g$ known as the 
Deligne-Mumford (1969) compactification $\overline{\mathcal{M}}_g$. 
This compactification is itself a moduli space: it is 
the moduli space of (Deligne-Mumford) stable 
curves, which are complex projective curves with 
only nodal singularities and at most finitely many 
automorphisms. $\overline{\mathcal{M}}_g$ is singular but in a relatively 
mild way; it is the quotient of a nonsingular variety 
by a finite group action. 

The moduli space $\mathcal{M}_{g,n}$ of nonsingular complex 
projective curves of genus g with n marked points 
has a similar compactification $\overline{\mathcal{M}}_{g,n}$ which is the 
moduli space of complex projective curves with n 
marked nonsingular points and with only nodal 
singularities and finitely many automorphisms. 
Finiteness of the automorphism group of such a 
curve $\Sigma$ is equivalent to the requirement that any 
irreducible component of genus 0 (respectively 1) 
has at least 3 (respectively 1) special points, where 
“special” means either marked or singular in $\Sigma$. 
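
The combinatorial stability criterion just stated is easy to apply mechanically. The sketch below is illustrative only: the representation of a curve as a list of `(genus, special_points)` pairs, one per irreducible component, is an assumption made here, as is the convention that special points are counted on the normalization of each component (so a node joining a component to itself contributes two).

```python
def is_stable(components):
    """Check the special-point criterion on each irreducible component.

    components: list of (genus, special_points) pairs, where special_points
    counts marked points and node branches lying on that component.
    """
    for genus, special in components:
        if genus == 0 and special < 3:
            return False  # a rational component needs at least 3 special points
        if genus == 1 and special < 1:
            return False  # a genus 1 component needs at least 1 special point
    return True

# Two rational components joined at one node, each carrying 2 marked points.
two_lines = [(0, 3), (0, 3)]

# A genus 2 component with a rational tail carrying only 1 marked point.
rational_tail = [(2, 1), (0, 2)]

# A nonsingular genus 1 curve with no marked points.
bare_elliptic = [(1, 0)]
```

Only the first of these three configurations passes the test; the other two have components with infinitely many automorphisms fixing the special points.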

The construction of $\mathcal{M}_g$ using the period matrices 
of curves and the Torelli theorem leads to a different 
compactification $\widetilde{\mathcal{M}}_g$ of $\mathcal{M}_g$ known as the Satake 
(or Satake-Baily-Borel) compactification. Like the 
Deligne-Mumford compactification, $\widetilde{\mathcal{M}}_g$ is a complex 
projective variety, but the boundary of $\mathcal{M}_g$ in 
$\widetilde{\mathcal{M}}_g$ has (complex) codimension 2 for $g \ge 3$ whereas 
the boundary $\Delta$ of $\mathcal{M}_g$ in $\overline{\mathcal{M}}_g$ has codimension 1. 
Each of the irreducible components $\Delta_0, \ldots, \Delta_{[g/2]}$ of 
$\Delta$ is the closure of a locus of curves with exactly one 
node (irreducible curves with one node in the case of 
$\Delta_0$, and in the case of any other $\Delta_i$ the union of two 
nonsingular curves of genus i and g - i meeting at a 
single point). The divisors $\Delta_i$ meet transversely in 
$\overline{\mathcal{M}}_g$, and their intersections define a natural decomposition 
of $\Delta$ into connected strata which parametrize 
stable curves of a fixed topological type. 

For a recent guide to many different aspects of the 
moduli spaces $\mathcal{M}_g$, see Harris and Morrison (1998). 


Example 7 Given any nonsingular complex projective 
variety X, we can study the moduli spaces of 
maps from curves to X considered by Kontsevich. 
Intersection theory on these moduli spaces leads to 
Gromov-Witten theory and the quantum cohomology 
of X, with many applications, for example, to 
enumerative geometry (cf. Cox and Katz (1999), 
Fulton and Pandharipande (1997), Dijkgraaf et al. 
(1995)). 

More precisely, if $2g - 2 + n > 0$ then for any 
$\beta \in H_2(X; \mathbb{Z})$ there is a moduli space $\mathcal{M}_{g,n}(X, \beta)$ of 
n-pointed nonsingular complex projective curves $\Sigma$ 
of genus g equipped with maps $f: \Sigma \to X$ satisfying 
$f_*[\Sigma] = \beta$. This moduli space has a compactification 
$\overline{\mathcal{M}}_{g,n}(X, \beta)$ which classifies “stable maps” of type $\beta$ 
from n-pointed curves of genus g into X (Fulton 
and Pandharipande 1997). Here, a map $f: \Sigma \to X$ 
from an n-pointed complex projective curve $\Sigma$ 
satisfying $f_*[\Sigma] = \beta$ is called stable if $\Sigma$ has only 
nodal singularities and $f: \Sigma \to X$ has only finitely 
many automorphisms, or equivalently every irreducible 
component of $\Sigma$ of genus 0 (respectively 
genus 1) which is mapped to a single point in X by 
f contains at least 3 (respectively 1) special 
points. The forgetful map from $\mathcal{M}_{g,n}(X, \beta)$ to $\mathcal{M}_{g,n}$ 
which sends $[\Sigma, p_1, \ldots, p_n, f: \Sigma \to X]$ to $[\Sigma, p_1, \ldots, p_n]$ 
extends to a forgetful map $\pi: \overline{\mathcal{M}}_{g,n}(X, \beta) \to \overline{\mathcal{M}}_{g,n}$ 
which collapses components of $\Sigma$ with genus 0 and 
at most two special points. 

Of course, when X is itself a single point, 
$\mathcal{M}_{g,n}(X, \beta)$ and $\overline{\mathcal{M}}_{g,n}(X, \beta)$ are simply the moduli 
spaces $\mathcal{M}_{g,n}$ and $\overline{\mathcal{M}}_{g,n}$. In general $\overline{\mathcal{M}}_{g,n}(X, \beta)$ has 
more serious singularities than $\overline{\mathcal{M}}_{g,n}$ and may indeed 
have many different irreducible components with 
different dimensions. In spite of this, $\overline{\mathcal{M}}_{g,n}(X, \beta)$ has 
a “virtual fundamental class” $[\overline{\mathcal{M}}_{g,n}(X, \beta)]^{\mathrm{vir}}$ lying in 
the expected dimension 


$$3g - 3 + n + (1 - g)\dim X + \int_\beta c_1(TX)$$

of $\overline{\mathcal{M}}_{g,n}(X, \beta)$. Gromov-Witten invariants (originally 
developed mainly in the case g = 0 when 
$\overline{\mathcal{M}}_{g,n}(X, \beta)$ is more tractable, but now also studied 
when g > 0) are obtained by evaluating cohomology 
classes on $\overline{\mathcal{M}}_{g,n}(X, \beta)$ against this virtual fundamental 
class. 
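
The expected dimension formula can be evaluated directly once the pairing of $c_1(TX)$ with $\beta$ is known. The following sketch is illustrative: the function names are hypothetical, and the second function assumes the standard fact that $c_1(T\mathbb{P}_2)$ pairs to 3 with the class of a line, so that it pairs to 3d with d times that class.

```python
def expected_dimension(g, n, dim_x, c1_beta):
    """Expected (complex) dimension 3g - 3 + n + (1 - g) dim X + c_1(TX).beta."""
    return 3 * g - 3 + n + (1 - g) * dim_x + c1_beta

def plane_rational_expected_dimension(d, n):
    """Genus 0 maps of degree d to the projective plane with n marked points."""
    return expected_dimension(0, n, 2, 3 * d)

# When X is a point the formula reduces to dim of the moduli space of curves.
dim_m_2_1 = expected_dimension(2, 1, 0, 0)            # 3*2 - 3 + 1 = 4

# Lines in the plane (d = 1, no marked points) form a 2-dimensional family.
dim_lines = plane_rational_expected_dimension(1, 0)   # 2

# For degree d with n = 3d - 1 marked points the dimension is 2n, which is
# the number of conditions imposed by sending each marked point to a given
# point of the plane.
dim_cubics = plane_rational_expected_dimension(3, 8)  # 16
```

In the last case the expected dimension equals twice the number of marked points, which is why counting rational plane curves of degree d through 3d - 1 general points is a natural enumerative question.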


Moduli Spaces as Orbit Spaces 


Example 8 As a simple example, let us consider the 
moduli space of “hyperelliptic” curves of genus g. 
By a hyperelliptic curve of genus g, we mean a 
nonsingular complex projective curve C with a 
double cover $f: C \to \mathbb{P}_1$ branched over 2g + 2 points 
in the complex projective line $\mathbb{P}_1$. 

Let S be the set of unordered sequences of 2g + 2 
distinct points in $\mathbb{P}_1$, which we can identify with an 
open subset of the complex projective space $\mathbb{P}_{2g+2}$ by 
associating to an unordered sequence $a_1, \ldots, a_{2g+2}$ of 
points in $\mathbb{P}_1$ the coefficients of the polynomial whose 
roots are $a_1, \ldots, a_{2g+2}$. Then, it is not hard to 
construct a family $\mathcal{X}$ of hyperelliptic curves of genus 
g with base space S such that the curve parametrized 
by $a_1, \ldots, a_{2g+2}$ is a double cover of $\mathbb{P}_1$ branched 
over $a_1, \ldots, a_{2g+2}$. This family is not quite a universal 
family, but it does have the following two properties. 


(i) The hyperelliptic curves $\mathcal{X}_s$ and $\mathcal{X}_t$ parametrized 
by elements s and t of the base space S are 
isomorphic if and only if s and t lie in the same 
orbit of the natural action of $G = \mathrm{SL}(2; \mathbb{C})$ on S. 

(ii) (Local universal property) Any family of hyperelliptic 
curves of genus g is locally equivalent to 
the pullback of $\mathcal{X}$ along a morphism to S. 


These properties (i) and (ii) imply that a (coarse) 
moduli space M exists if and only if there is an 
“orbit space” for the action of G on S (Newstead 
1978). Here, by an orbit space we mean a 
G-invariant morphism $\phi: S \to M$ such that every 
other G-invariant morphism $\psi: S \to \tilde{M}$ factors 
uniquely through $\phi$, and moreover $\phi^{-1}(m)$ is a single 
G-orbit for each $m \in M$. (We can think of an orbit 
space as the set of G-orbits endowed in a natural 
way with the structure of an algebraic variety.) 

This sort of situation arises quite often in moduli 
problems, and the construction of a moduli space is 
then reduced to the construction of an orbit space. 
Unfortunately, such orbit spaces do not in general 
exist. The main problem (which is closely related to 
the jump phenomenon discussed above) is that there 
may be orbits contained in the closures of other 
orbits, which means that the natural topology on the 
set of all orbits is not Hausdorff, so this set cannot 
be endowed naturally with the structure of a variety. 
This is the situation the geometric invariant theory 
of Mumford (1965) attempts to deal with, telling us 
how to throw out certain “unstable” orbits in order 
to be able to construct an orbit space. For more 
general constructions of orbit spaces which can be 
used for moduli problems where geometric invariant 
theory may not be of use, see Keel and Mori (1997) 
and Kollár (1997). 


Example 9 Let $G = \mathrm{SL}(2; \mathbb{C})$ act on $(\mathbb{P}_1)^4$ via 
Möbius transformations on the Riemann sphere 


$$\mathbb{P}_1 = \mathbb{C} \cup \{\infty\}$$


Then, 


$$\{(x_1, x_2, x_3, x_4) \in (\mathbb{P}_1)^4 : x_1 = x_2 = x_3 = x_4\}$$


is a single orbit which is contained in the closure of 
every other orbit. On the other hand, the open subset 


$$\{(x_1, x_2, x_3, x_4) \in (\mathbb{P}_1)^4 : x_1, x_2, x_3, x_4 \text{ distinct}\}$$


of $(\mathbb{P}_1)^4$ has an orbit space which can be identified 
with 


$$\mathbb{P}_1 - \{0, 1, \infty\}$$

via the cross ratio. 
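
That the cross ratio is constant on orbits can be verified exactly on rational sample points. The sketch below is illustrative: it assumes one common convention for the cross ratio (conventions differ by a permutation of the four points), restricts to finite points for simplicity, and uses hypothetical function names.

```python
from fractions import Fraction

def cross_ratio(x1, x2, x3, x4):
    """Cross ratio of four distinct finite points (one common convention)."""
    return ((x1 - x3) * (x2 - x4)) / ((x1 - x4) * (x2 - x3))

def mobius(a, b, c, d, x):
    """Mobius transformation x -> (a x + b)/(c x + d) with ad - bc = 1."""
    assert a * d - b * c == 1
    return (a * x + b) / (c * x + d)

# Four distinct rational points and a sample element of SL(2; Z).
points = [Fraction(0), Fraction(1), Fraction(2), Fraction(3)]
images = [mobius(2, 1, 1, 1, x) for x in points]

before = cross_ratio(*points)
after = cross_ratio(*images)
```

Here both cross ratios equal 4/3, a point of $\mathbb{P}_1 - \{0, 1, \infty\}$, so the two configurations define the same point of the orbit space.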


In order to describe Mumford’s geometric invariant 
theory, let X be a complex projective variety 
(i.e., a subset of a complex projective space defined 
by the vanishing of homogeneous polynomial 
equations), and let G be a complex reductive group 
acting on X. We also require a “linearization” of the 
action; that is, an ample line bundle L on X and a 
lift of the action of G to L. We lose very little 
generality in assuming that for some projective 
embedding $X \subseteq \mathbb{P}_n$ the action of G on X extends 
to an action on $\mathbb{P}_n$ given by a representation 


$$\rho: G \to \mathrm{GL}(n + 1)$$


and taking for L the hyperplane line bundle on $\mathbb{P}_n$. 
Algebraic geometry associates to $X \subseteq \mathbb{P}_n$ its homogeneous 
coordinate ring 


$$A(X) = \bigoplus_{k \ge 0} H^0(X, L^{\otimes k}) = \mathbb{C}[x_0, \ldots, x_n]/\mathcal{I}_X$$


which is the quotient of the polynomial ring 
$\mathbb{C}[x_0, \ldots, x_n]$ in n + 1 variables by the ideal $\mathcal{I}_X$ 
generated by the homogeneous polynomials vanishing 
on X. Since the action of G on X is given by a 
representation $\rho: G \to \mathrm{GL}(n + 1)$, we get an induced 
action of G on $\mathbb{C}[x_0, \ldots, x_n]$ and on A(X), and we 
can therefore consider the subring $A(X)^G$ of A(X) 
consisting of the elements of A(X) left invariant by 
G. This subring $A(X)^G$ is a graded complex algebra, 
and because G is reductive it is finitely generated 
(Mumford 1965). To any finitely generated graded 
complex algebra we can associate a complex 
projective variety, and so we can define X//G to 
be the variety associated to the ring of invariants 
$A(X)^G$. The inclusion of $A(X)^G$ in A(X) defines a 
“rational” map $\phi$ from X to X//G, but because 
there may be points of $X \subseteq \mathbb{P}_n$ where every 
G-invariant polynomial vanishes, this map will not 
in general be well defined everywhere on X (i.e., it 
will not be a morphism). 

We define the set $X^{ss}$ of “semistable” points in X 
to be the set of those $x \in X$ for which there exists 
some $f \in A(X)^G$ not vanishing at x. Then, the 
rational map $\phi$ restricts to a surjective G-invariant 
morphism from the open subset $X^{ss}$ of X to the 
quotient variety X//G. However, $\phi: X^{ss} \to X/\!/G$ is 
still not in general an orbit space: when x and y are 
semistable points of X, we have $\phi(x) = \phi(y)$ if and 
only if the closures $\overline{O_G(x)}$ and $\overline{O_G(y)}$ of the G-orbits 
of x and y meet in $X^{ss}$. Topologically, X//G is the 
quotient of $X^{ss}$ by the equivalence relation for which 
x and y in $X^{ss}$ are equivalent if and only if $\overline{O_G(x)}$ 
and $\overline{O_G(y)}$ meet in $X^{ss}$. 

We define a “stable” point of X to be a point x of 
$X^{ss}$ with a neighborhood in $X^{ss}$ such that every 
G-orbit meeting this neighborhood is closed in $X^{ss}$, 
and is of maximal dimension equal to the dimension 
of G. If U is any G-invariant open subset of the set 
$X^s$ of stable points of X, then $\phi(U)$ is an open subset 
of X//G and the restriction $\phi|_U: U \to \phi(U)$ of $\phi$ to U 
is an orbit space for the action of G on U in the sense 
described above, so that it makes sense to write U/G 
for $\phi(U)$. In particular, there is an orbit space $X^s/G$ 
for the action of G on $X^s$, and X//G can be thought 
of as a compactification of this orbit space. 


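$$\begin{array}{ccccc}
X^s & \underset{\text{open}}{\subseteq} & X^{ss} & \underset{\text{open}}{\subseteq} & X \\
\downarrow & & \downarrow & & \\
X^s/G & \underset{\text{open}}{\subseteq} & X^{ss}/\!\sim & = & X/\!/G
\end{array}$$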


Example 10 Let us return to hyperelliptic curves 
of genus g. We have seen that the construction of a 
moduli space reduces to the construction of an 
orbit space for the action of $G = \mathrm{SL}(2; \mathbb{C})$ on an 
open subset S of $\mathbb{P}_{2g+2}$. If we identify $\mathbb{P}_{2g+2}$ with the 
space of unordered sequences of 2g + 2 points in 
$\mathbb{P}_1$, then S is the subset consisting of unordered 
sequences of distinct points. When the action of G 
on $\mathbb{P}_{2g+2}$ is linearized in the obvious way, then an 
unordered sequence of 2g + 2 points in $\mathbb{P}_1$ is 
semistable if and only if at most g + 1 of the points 
coincide anywhere on $\mathbb{P}_1$, and is stable if and only 
if at most g of the points coincide anywhere on $\mathbb{P}_1$ 
(cf. Kirwan (1985), chapter 16). Thus, S is an open 
subset of $\mathbb{P}_{2g+2}^s$, so an orbit space S/G exists with 
compactification the projective variety $\mathbb{P}_{2g+2}/\!/G$. 
This orbit space is then the moduli space of 
hyperelliptic curves of genus g. 
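
The stability criterion for point configurations depends only on the largest multiplicity, so it can be sketched in a few lines. The function name and the representation of points as arbitrary hashable labels are assumptions made for illustration; “strictly semistable” is used here for configurations that are semistable but not stable.

```python
from collections import Counter

def classify_configuration(points, g):
    """Classify an unordered sequence of 2g + 2 points on the projective line.

    points: list of hashable labels, with repeats for coinciding points.
    """
    if len(points) != 2 * g + 2:
        raise ValueError("expected 2g + 2 points")
    largest = max(Counter(points).values())  # largest number of coinciding points
    if largest <= g:
        return "stable"
    if largest <= g + 1:
        return "strictly semistable"
    return "unstable"

# Genus 2: configurations of 6 points, with "inf" standing for the point at infinity.
distinct = classify_configuration([0, 1, 2, 3, 4, "inf"], 2)
double_points = classify_configuration([0, 0, 1, 1, 2, 3], 2)
triple_point = classify_configuration([0, 0, 0, 1, 2, 3], 2)
quadruple_point = classify_configuration([0, 0, 0, 0, 1, 2], 2)
```

Configurations of distinct points are always stable, consistent with S being contained in the stable locus, while the points added in the compactification come from configurations in which points are allowed to coincide.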


Other moduli spaces (such as moduli spaces of 
curves and of vector bundles; see e.g., Donaldson 
(1984), Gieseker (1983), Mumford (1965, 1977), 
and Newstead (1978)) can be constructed as orbit 
spaces via geometric invariant theory in a similar 
way. 


Symplectic Reduction and Moduli Spaces of Vector Bundles 


Geometric invariant theoretic quotients are closely 
related to the process of reduction in symplectic 
geometry, and thus many moduli spaces can be 
described as symplectic reductions. 

Suppose that a compact, connected Lie group K 
with Lie algebra $\mathfrak{k}$ acts smoothly on a symplectic 
manifold X and preserves the symplectic form $\omega$. Let 
us denote the vector field on X defined by the 
infinitesimal action of $a \in \mathfrak{k}$ by 


$$x \mapsto a_x$$


By a moment map for the action of K on X we mean 
a smooth map 


$$\mu: X \to \mathfrak{k}^*$$

which satisfies 


$$d\mu(x)(\xi) \cdot a = \omega_x(\xi, a_x)$$


for all $x \in X$, $\xi \in T_xX$ and $a \in \mathfrak{k}$. In other words, if 
$\mu_a: X \to \mathbb{R}$ denotes the component of $\mu$ along $a \in \mathfrak{k}$ 
defined for all $x \in X$ by the pairing 


$$\mu_a(x) = \mu(x) \cdot a$$


between $\mu(x) \in \mathfrak{k}^*$ and $a \in \mathfrak{k}$, then $\mu_a$ is a Hamiltonian 
function for the vector field on X induced by a. We 
shall assume that all our moment maps are equivariant 
moment maps; that is, $\mu: X \to \mathfrak{k}^*$ is K-equivariant 
with respect to the given action of K on X and the 
co-adjoint action of K on $\mathfrak{k}^*$. 

It follows directly from the definition of a 
moment map $\mu: X \to \mathfrak{k}^*$ that if the stabilizer $K_\zeta$ of 
any $\zeta \in \mathfrak{k}^*$ acts freely on $\mu^{-1}(\zeta)$, then $\mu^{-1}(\zeta)$ is a 
submanifold of X and the symplectic form $\omega$ induces 
a symplectic structure on the quotient $\mu^{-1}(\zeta)/K_\zeta$. 
With this symplectic structure, the quotient 
$\mu^{-1}(\zeta)/K_\zeta$ is called the Marsden-Weinstein reduction, 
or symplectic quotient, at $\zeta$ of the action of K 
on X. We can also consider the quotient $\mu^{-1}(\zeta)/K_\zeta$ 
when the action of $K_\zeta$ on $\mu^{-1}(\zeta)$ is not free, but in 
this case it is likely to have singularities. 


Example 11 Consider the cotangent bundle $T^*Y$ of
any $n$-dimensional manifold $Y$ with its canonical
symplectic form $\omega$, which is given by the standard
symplectic form

$$\omega = \sum_{j=1}^{n} dp_j \wedge dq_j \qquad [1]$$

with respect to any local coordinates $(q_1, \ldots, q_n)$ on
$Y$ and the induced coordinates $(p_1, \ldots, p_n)$ on its
cotangent spaces. If $Y$ is the configuration space of a
classical mechanical system, then $T^*Y$ is the phase
space of the system and the coordinates
$p = (p_1, \ldots, p_n) \in T_q^*Y$ are traditionally called the
momenta of the system.

If $Y$ is acted on by a Lie group $K$, the induced
action on $T^*Y$ preserves $\omega$ and there is a moment
map $\mu : T^*Y \to \mathfrak{k}^*$ whose components $\mu_a$ along
$a \in \mathfrak{k}$ are given by pairing the moment coordinates $p$
with the vector fields on $Y$ induced by the
infinitesimal action of $K$; that is,

$$\mu_a(p, q) = p.a_q$$

for all $q \in Y$ and $p \in T_q^*Y$. When $K = SO(3)$ acts by
rotations on $Y = \mathbb{R}^3$, then $\mu$ is the angular momentum,
or moment of momentum, about the origin.
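The last statement of the example can be checked numerically. The sketch below is only an illustration under stated assumptions: it identifies the Lie algebra of $SO(3)$ with $\mathbb{R}^3$ so that $a$ induces the vector field $a_q = a \times q$, and it uses the Euclidean dot product for all pairings. The function names are hypothetical and do not come from the article.

```python
import random

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Euclidean pairing of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def moment_component(a, p, q):
    """mu_a(p, q) = p . a_q, assuming a_q = a x q (infinitesimal rotation)."""
    return dot(p, cross(a, q))

def angular_momentum(p, q):
    """Angular momentum q x p about the origin."""
    return cross(q, p)

random.seed(0)
rand3 = lambda: tuple(random.uniform(-1.0, 1.0) for _ in range(3))
a, p, q = rand3(), rand3(), rand3()

# The component of the moment map along a is the pairing of a with q x p.
lhs = moment_component(a, p, q)
rhs = dot(a, angular_momentum(p, q))
```

Under the identification above, `lhs` and `rhs` agree for every choice of `a`, `p` and `q`, which is the scalar triple product identity $p \cdot (a \times q) = a \cdot (q \times p)$.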


The connection with geometric invariant theory
arises as follows. Let $X$ be a nonsingular complex
projective variety embedded in complex projective
space $\mathbb{P}_n$, and let $G$ be a complex Lie group acting
on $X$ via a complex linear representation
$\rho : G \to GL(n+1; \mathbb{C})$. A necessary and sufficient condition
for $G$ to be reductive is that it is the complexification
of a maximal compact subgroup $K$ (e.g.,
$G = GL(m; \mathbb{C})$ is the complexification of the unitary
group $U(m)$). By an appropriate choice of coordinates
on $\mathbb{P}_n$, we may assume that $\rho$ maps $K$ into the
unitary group $U(n+1)$. Then, the action of $K$
preserves the Fubini-Study form $\omega$ on $\mathbb{P}_n$, which
restricts to a symplectic form on $X$. There is a
moment map $\mu : X \to \mathfrak{k}^*$ defined (up to multiplication
by a constant scalar factor depending on
differences in convention on the normalization of
the Fubini-Study form) by

$$\mu(x).a = \frac{\overline{\hat{x}}^{\,t} \rho_*(a) \hat{x}}{2\pi i \|\hat{x}\|^2}$$

for all $a \in \mathfrak{k}$, where $\hat{x} \in \mathbb{C}^{n+1} - \{0\}$ is a representative
vector for $x \in \mathbb{P}_n$ and the representation $\rho : K \to U(n+1)$
induces $\rho_* : \mathfrak{k} \to \mathfrak{u}(n+1)$ and dually
$\rho^* : \mathfrak{u}(n+1)^* \to \mathfrak{k}^*$.

In this situation, we have two possible quotient
constructions, giving us the geometric invariant
theory quotient $X/\!/G$ if we want to work in
algebraic geometry and the symplectic reduction
$\mu^{-1}(0)/K$ if we want to work in symplectic geometry.
In fact, these give us the same quotient space, at
least up to homeomorphism (and diffeomorphism
away from the singularities). More precisely, any
$x \in X$ is semistable if and only if the closure of its
$G$-orbit meets $\mu^{-1}(0)$, and the inclusion of $\mu^{-1}(0)$
into $X^{ss}$ induces a homeomorphism

$$\mu^{-1}(0)/K \to X/\!/G$$

There are other quotient constructions closely related
to symplectic reduction and geometric invariant
theory, which are useful when working with Kähler
or hyper-Kähler manifolds.

In physics, moduli spaces are often described as 
symplectic reductions of infinite-dimensional sym- 
plectic manifolds by infinite-dimensional groups 
(although the moduli spaces themselves are usually 
finite-dimensional). One example is given by moduli 
spaces of holomorphic vector bundles, which 
can also be described using Yang-Mills theory 
(cf. Atiyah and Bott (1982)). 

The Yang-Mills equations arose in physics as 
generalizations of Maxwell’s equations. They have 
become important in differential and algebraic 
geometry formulated over arbitrary compact oriented 
Riemannian manifolds, and in particular over compact
Riemann surfaces and higher dimensional Kähler
manifolds. The fundamental theorem of Donaldson,
Uhlenbeck, and Yau that a holomorphic bundle over
a compact Kähler manifold admits an irreducible
Hermitian Yang-Mills connection if and only if it is
stable can be thought of as an infinite-dimensional 
illustration of the link between symplectic reduction 
and geometric invariant theory. 

Let $M$ be a compact oriented Riemannian manifold
and let $E$ be a fixed complex vector bundle over
$M$ with a Hermitian metric. Recall that a connection
$A$ on $E$ (or equivalently on its frame bundle) can be
defined by a covariant derivative
$d_A : \Omega^p_M(E) \to \Omega^{p+1}_M(E)$, where $\Omega^p_M(E)$ denotes the space of
$C^\infty$ sections of $\Lambda^p T^*M \otimes E$ (i.e., the space of
$p$-forms on $M$ with values in $E$). This covariant
derivative satisfies the extended Leibniz rule

$$d_A(\alpha \wedge \beta) = (d_A\alpha) \wedge \beta + (-1)^p \alpha \wedge d_A\beta$$

for $\alpha \in \Omega^p_M(E)$, $\beta \in \Omega^q_M(E)$, and therefore is determined
by its restriction $d_A : \Omega^0_M(E) \to \Omega^1_M(E)$. The
Leibniz rule implies that the difference of two
connections is given by an $E \otimes E^*$-valued 1-form
on $M$, and hence that the space of all connections on
$E$ is an infinite-dimensional affine space $\mathcal{A}$ based on
the vector space $\Omega^1_M(E \otimes E^*)$. Similarly, the space of
all unitary connections on $E$ (i.e., connections
compatible with the Hermitian metric on $E$) is an
infinite-dimensional affine space based on the space
of 1-forms with values in the bundle $\mathfrak{g}_E$ of skew-adjoint
endomorphisms of $E$. The Leibniz rule also
implies that the composition
$d_A \circ d_A : \Omega^0_M(E) \to \Omega^2_M(E)$ commutes with multiplication by smooth
functions, and thus we have

$$d_A \circ d_A(s) = F_A s$$

for all $C^\infty$ sections $s$ of $E$, where $F_A \in \Omega^2_M(\mathfrak{g}_E)$ is
defined to be the curvature of the unitary connection
$A$. The Yang-Mills functional on the space $\mathcal{A}$ of all
unitary connections on $E$ is defined as the $L^2$-norm
square of the curvature, given by the integral over $M$
of the product of the function $\|F_A\|^2$ and the volume
form on $M$ defined by the Riemannian metric and the
orientation. The Yang-Mills equations are the Euler-Lagrange
equations for this functional, given by

$$d_A * F_A = 0$$

where $d_A$ has been extended in a natural way to
$\Omega^p_M(\mathfrak{g}_E)$. The gauge group $\mathcal{G}$, that is, the group of
unitary automorphisms of $E$, preserves the Yang-Mills
functional and the Yang-Mills equations.

If $M$ is a complex manifold, we can identify the
space $\mathcal{A}^{1,1}$ of unitary connections on $E$ with
curvature of type (1,1) with the space of holomorphic
structures on $E$, by associating to a holomorphic
structure $\mathcal{E}$ the unitary connection whose (0,1)-component
is given by the $\bar\partial$-operator defined by $\mathcal{E}$.
This space $\mathcal{A}^{1,1}$ is an infinite-dimensional complex
subvariety of the infinite-dimensional complex affine
space $\mathcal{A}$, acted on by the complexified gauge group
$\mathcal{G}_c$ (the group of complex $C^\infty$ automorphisms of $E$),
and two holomorphic structures are isomorphic if
and only if they lie in the same $\mathcal{G}_c$-orbit.

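When $(M, \omega)$ is a compact Kähler manifold, there
is a $\mathcal{G}$-invariant Kähler form $\Omega$ on $\mathcal{A}$ defined by

$$\Omega(\alpha, \beta) = \frac{1}{8\pi^2} \int_M \mathrm{tr}(\alpha \wedge \beta) \wedge \omega^{n-1}$$

where $n$ is the complex dimension of $M$. The Lie
algebra of $\mathcal{G}$ is the space $\Omega^0_M(\mathfrak{g}_E)$ of sections of $\mathfrak{g}_E$,
and there is a moment map $\mu : \mathcal{A} \to (\Omega^0_M(\mathfrak{g}_E))^*$ for
the action of $\mathcal{G}$ on $\mathcal{A}$ given by the composition of

$$A \mapsto \frac{1}{8\pi^2} F_A \wedge \omega^{n-1} \in \Omega^{2n}_M(\mathfrak{g}_E)$$

with integration over $M$. On $\mathcal{A}^{1,1}$ the norm square
of this moment map agrees up to a constant factor
with the Yang-Mills functional, which is minimized
by the Hermitian Yang-Mills connections.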

As in the finite-dimensional situation, for a suitable 
definition of stability, the moduli space of stable 
holomorphic bundles of topological type E over M 
(which plays the role of the geometric invariant 
theory quotient) can be identified with the moduli 
space of (irreducible) Hermitian Yang-Mills connections
on $E$ (which plays the role of the symplectic
reduction). This was proved in general for vector
bundles over compact Kähler manifolds by Uhlenbeck
and Yau, with a different proof for nonsingular
complex projective varieties given by Donaldson.

Over a compact Riemann surface $M$ the situation is
relatively simple, as all connections on $E$ have
curvature of type (1,1) and so the infinite-dimensional
complex affine space $\mathcal{A}$ can be identified with the
space $\mathcal{C}$ of holomorphic structures on $E$. A moment
map for the action of the gauge group on $\mathcal{A}$ is given by
assigning to a connection $A \in \mathcal{A}$ its curvature
$F_A \in \Omega^2_M(\mathfrak{g}_E)$, and, after a suitable central constant has been
added, the Hermitian Yang-Mills connections are
exactly the zeros of the moment map.

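A holomorphic bundle $\mathcal{E}$ over a Riemann surface
$M$ is stable (respectively semistable) if $\mu(\mathcal{F}) < \mu(\mathcal{E})$
(respectively $\mu(\mathcal{F}) \le \mu(\mathcal{E})$) for every proper subbundle
$\mathcal{F}$ of $\mathcal{E}$, where

$$\mu(\mathcal{F}) = \deg(\mathcal{F})/\mathrm{rank}(\mathcal{F})$$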
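The slope inequality itself is elementary to evaluate once the degrees and ranks of the proper subbundles are known. The following sketch only illustrates that comparison; it assumes the caller supplies a list of (degree, rank) pairs for the subbundles, which is the genuinely hard part and is not addressed here. The function names are hypothetical.

```python
from fractions import Fraction

def slope(degree, rank):
    """Slope mu = deg / rank, kept exact as a rational number."""
    return Fraction(degree, rank)

def is_stable(bundle, subbundles):
    """True if mu(F) < mu(E) for every listed (degree, rank) of a proper subbundle F."""
    mu_e = slope(*bundle)
    return all(slope(d, r) < mu_e for d, r in subbundles)

def is_semistable(bundle, subbundles):
    """True if mu(F) <= mu(E) for every listed (degree, rank) of a proper subbundle F."""
    mu_e = slope(*bundle)
    return all(slope(d, r) <= mu_e for d, r in subbundles)

# A rank 2, degree 1 bundle whose line subbundles all have degree <= 0.
odd_degree_example = is_stable((1, 2), [(0, 1), (-1, 1)])
# A rank 2, degree 0 bundle containing a line subbundle of degree 0.
split_example = (is_stable((0, 2), [(0, 1)]), is_semistable((0, 2), [(0, 1)]))
```

In the second example the slopes are equal, so the bundle is semistable but not stable, which is the only way the two notions can differ.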


When the theory of stability of holomorphic vector
bundles was first introduced, Narasimhan and
Seshadri proved that a holomorphic vector bundle
over $M$ is stable if and only if it arises from an
irreducible representation of a certain central extension
of the fundamental group $\pi_1(M)$. Atiyah and
Bott (1982) translated this in terms of connections to
show that a holomorphic vector bundle over $M$ is
stable if and only if it admits a unitary connection
with constant central curvature. They deduced from
this the existence of a homeomorphism between the
moduli space $\mathcal{M}(n, d)$ of stable bundles of rank $n$ and
degree $d$ over $M$ and the moduli space of irreducible
connections with constant central curvature on a
fixed $C^\infty$ bundle $E$ of rank $n$ and degree $d$ over $M$.


See also: BF Theories; Calibrated Geometry and Special 
Lagrangian Submanifolds; Cohomology Theories; Floer 
Homology; Gauge Theoretic Invariants of 4-Manifolds; 
Gauge Theory: Mathematical Applications; Geometric 
Measure Theory; Geometric Phases; Hamiltonian Group 
Actions; Instantons: Topological Aspects; Intersection 
Theory; Riemann Surfaces; Several Complex Variables: 
Basic Geometric Theory; Several Complex Variables: 
Compact Manifolds; Topological Gravity, Two- 
Dimensional; WDVV Equations and Frobenius Manifolds. 
Introduction


Since the late 1970s, particular attention in the
theory of integrability has been paid to systems
admitting more than one Hamiltonian representation.
The first examples belonged to the class of
infinite-dimensional systems (i.e., partial differential
equations), like the Korteweg-de Vries (KdV)
equation, the Ablowitz-Kaup-Newell-Segur
system, and many other soliton equations (see
Bi-Hamiltonian Methods in Soliton Theory). It
was soon realized that finite-dimensional integrable
systems are also likely to possess a
bi-Hamiltonian representation. Moreover, a geometric
setting for the study of bi-Hamiltonian
systems was established, with the introduction of
the so-called bi-Hamiltonian manifolds. They are
Poisson manifolds with an additional Poisson
structure, fulfilling a suitable compatibility condition
with the initial Poisson bracket. An
important program for the study and the classification
of (finite-dimensional) bi-Hamiltonian
manifolds was started in the 1990s by Gelfand
and Zakharevich. They pointed out that the
geometry of such manifolds is extremely rich
and complicated.

In this article we present the basic facts 
concerning the bi-Hamiltonian geometry and its 
relations with the theory of integrable systems, 
referring to Recursion Operators in Classical 
Mechanics in this encyclopedia for the connections 
with separable systems of Jacobi. In the first 
section we give the definitions of bi-Hamiltonian 
manifold and bi-Hamiltonian system, and we 
present some properties of the former. The next 
section contains three concrete examples (the Euler 
top, the open Toda lattice, and a stationary KdV 
flow) and two important classes of bi-Hamiltonian 
manifolds, both related to Lie algebras. This is 
followed by a discussion of the iterative construc- 
tion of first integrals in involution for a given 
bi-Hamiltonian system. This procedure is particu- 
larly efficient in the case of Poisson—Nijenhuis 
manifolds, that is, those bi-Hamiltonian manifolds 
whose second Poisson structure can be obtained by 
composing the first one with a suitable recursion 
operator. 
Bi-Hamiltonian Systems


First of all, we recall some fundamental definitions
from the theory of Poisson manifolds, which are the
natural setting for the study of Hamiltonian systems.
Let $M$ be a finite-dimensional $C^\infty$-differentiable
manifold and let $C^\infty(M)$ be the space of $C^\infty$-functions
from $M$ to $\mathbb{R}$. A Poisson bracket on $M$ is
a skew-symmetric $\mathbb{R}$-bilinear map

$$\{\cdot,\cdot\} : C^\infty(M) \times C^\infty(M) \to C^\infty(M)$$

fulfilling the Jacobi identity

$$\{\{F, G\}, H\} + \{\{H, F\}, G\} + \{\{G, H\}, F\} = 0$$

and the Leibniz rule

$$\{FG, H\} = F\{G, H\} + \{F, H\}G$$


A Poisson manifold is a differentiable manifold
endowed with a Poisson bracket. Starting from a
Poisson bracket, one can introduce a tensor field $P$
of type (2,0), which we consider as a map from
$T^*M$ to $TM$, defined by

$$\langle dG, P\,dF \rangle = \{F, G\}$$

or, using coordinates on $M$, by $P^{ij} = \{x^i, x^j\}$. This
tensor field is called the Poisson tensor associated
with $\{\cdot,\cdot\}$. It is skew-symmetric, and its components
satisfy the cyclic condition

$$\sum_l \left( P^{il} \frac{\partial P^{jk}}{\partial x^l} + P^{jl} \frac{\partial P^{ki}}{\partial x^l} + P^{kl} \frac{\partial P^{ij}}{\partial x^l} \right) = 0$$

meaning that the Schouten bracket $[P, P]$ vanishes.
On a Poisson manifold, the vector field
$X_H = \{H, \cdot\} = P\,dH$ is called the Hamiltonian
vector field associated with $H$. In coordinates,
$X_H^i = P^{ji}\,\partial H/\partial x^j$. The Jacobi identity is equivalent
to the statement that the map $H \mapsto X_H$, assigning
to a function $H$ its Hamiltonian vector field $X_H$, is
a Lie algebra homomorphism:

$$X_{\{F,G\}} = [X_F, X_G] \qquad [1]$$


A Casimir function is a function $H$ such that
$X_H = 0$, that is, a function which is in involution
with any other function on $M$. In terms of the
Poisson tensor, a Casimir is a function whose
differential belongs to the kernel of $P$.

The most famous class of Poisson manifolds is 
certainly that of symplectic manifolds. They can be 
seen as nondegenerate Poisson manifolds. Indeed, if 
a Poisson tensor P is invertible, then its inverse 
defines a closed nondegenerate 2-form (i.e., a 
symplectic form). Moreover, any Poisson manifold 
turns out to be foliated in symplectic leaves. 
Let us now introduce the bi-Hamiltonian manifolds,
which can be considered as a geometric setting for the
study of integrable Hamiltonian systems. A manifold
$M$ endowed with two Poisson brackets, $\{\cdot,\cdot\}$ and $\{\cdot,\cdot\}'$,
is said to be bi-Hamiltonian if the brackets are
compatible, that is, if any linear combination (with
constant coefficients) of them is still a Poisson bracket.
Such a linear combination automatically satisfies all
properties of a Poisson bracket except the Jacobi
identity. This is fulfilled if and only if the following
compatibility condition holds:

$$\{F,\{G,H\}'\} + \{H,\{F,G\}'\} + \{G,\{H,F\}'\} + \{F,\{G,H\}\}' + \{H,\{F,G\}\}' + \{G,\{H,F\}\}' = 0 \qquad [2]$$

for any triple $(F, G, H)$ of functions on $M$. This
amounts to saying that the sum of the two Poisson
brackets is also a Poisson bracket. In this case the
two (compatible) Poisson brackets are said to form a
Poisson pair.

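There are some interesting equivalent forms of the
compatibility condition [2]. First of all, in terms of
the components of the Poisson tensors $P$ and $P'$, it
reads

$$\sum_l \left( P^{il} \frac{\partial P'^{jk}}{\partial x^l} + P^{jl} \frac{\partial P'^{ki}}{\partial x^l} + P^{kl} \frac{\partial P'^{ij}}{\partial x^l} + P'^{il} \frac{\partial P^{jk}}{\partial x^l} + P'^{jl} \frac{\partial P^{ki}}{\partial x^l} + P'^{kl} \frac{\partial P^{ij}}{\partial x^l} \right) = 0$$

that is, the Schouten bracket $[P, P']$ vanishes. Moreover,
if $X_F = P\,dF$ is the Hamiltonian vector field
associated with $F \in C^\infty(M)$ by means of $P$ and
$Y_F = P'\,dF$ is the one obtained by $P'$, the compatibility
condition takes the form

$$[X_F, Y_G] + [Y_F, X_G] = X_{\{F,G\}'} + Y_{\{F,G\}} \quad \forall F, G \in C^\infty(M) \qquad [3]$$

to be compared with [1]. Moreover, in terms of Lie
derivatives we have the equivalent condition

$$L_{X_F} P' + L_{Y_F} P = 0 \quad \forall F \in C^\infty(M) \qquad [4]$$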


Now we turn our attention to special vector fields
that can be selected on a bi-Hamiltonian manifold
$M$. Let $P$ and $P'$ be the Poisson tensors associated
with the (compatible) Poisson brackets of $M$. A
vector field $X$ on $M$ is said to be bi-Hamiltonian if it
is Hamiltonian with respect to both Poisson structures,
that is, if there exist two functions $H_0$ and $H_1$
such that

$$X = P\,dH_1 = P'\,dH_0 \qquad [5]$$


We will see in the following that such vector fields 
are likely to have a number of first integrals in 


involution, and thus they are good candidates for a 
geometric description of integrable systems. The next 
section is devoted to examples of bi-Hamiltonian 
(and multi-Hamiltonian) systems. 


Examples 


The first example is the Euler top, that is, free
motions of a rigid body with a fixed point. The
equations of motion are

$$\dot\Gamma_1 = \frac{I_2 - I_3}{I_2 I_3}\, \Gamma_2 \Gamma_3$$

and its cyclic permutations. They define a vector
field in $\mathbb{R}^3$, which is well known to be Hamiltonian
with respect to the Lie-Poisson structure on the
(dual of the) Lie algebra of $3 \times 3$ skew-symmetric
matrices. This means that

$$\dot\Gamma_j = \{H, \Gamma_j\}, \quad j = 1, 2, 3$$

where

$$H = \frac{1}{2}\left( \frac{\Gamma_1^2}{I_1} + \frac{\Gamma_2^2}{I_2} + \frac{\Gamma_3^2}{I_3} \right)$$

is the kinetic energy and the bracket $\{\cdot,\cdot\}$ is defined
by $\{\Gamma_1, \Gamma_2\} = \Gamma_3$ and its cyclic permutations. Another
Hamiltonian representation is given by

$$\dot\Gamma_j = \{K, \Gamma_j\}', \quad j = 1, 2, 3$$

where

$$K = \frac{1}{2}\left( \Gamma_1^2 + \Gamma_2^2 + \Gamma_3^2 \right)$$

and the new bracket $\{\cdot,\cdot\}'$ is defined by
$\{\Gamma_1, \Gamma_2\}' = -\Gamma_3/I_3$ and its cyclic permutations. Any linear
combination of the two brackets has the form of
the second one, and it is very easy to show that the
Jacobi identity is satisfied for such a bracket.
Therefore, the Euler top is a bi-Hamiltonian system.
Let us also notice that

$$\{K, \Gamma_j\} = \{H, \Gamma_j\}' = 0, \quad j = 1, 2, 3$$

that is, $K$ is a Casimir function for the Lie-Poisson
bracket and $H$ is a Casimir function for the new
Poisson bracket. Hence, we have the following
(recursion) relations:

$$\begin{aligned} \{K, \Gamma_j\} &= 0 \\ \{H, \Gamma_j\} &= \{K, \Gamma_j\}' \\ 0 &= \{H, \Gamma_j\}' \end{aligned} \qquad [6]$$

From a geometrical point of view, the situation is as 
follows. The symplectic leaves of {-,-} are the level 
surfaces of K, that is, spheres, while the symplectic 


leaves of {-, -}’ are the ellipsoids H = constant. Their 
intersections are Lagrangian submanifolds for both 
symplectic leaves (in the compact case they are the 
Arnol’d—Liouville tori of the integrable systems, that 
in this case coincide with the trajectories). 
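The two Hamiltonian representations of the Euler top can be compared numerically. The sketch below is an illustration, not part of the original text: it writes the angular momentum components as $\Gamma_1, \Gamma_2, \Gamma_3$, takes the brackets $\{\Gamma_1, \Gamma_2\} = \Gamma_3$ and $\{\Gamma_1, \Gamma_2\}' = -\Gamma_3/I_3$ (with cyclic permutations), and evaluates $\{F, \Gamma_j\} = \sum_i (\partial F/\partial \Gamma_i)\{\Gamma_i, \Gamma_j\}$. The moments of inertia and the helper names are arbitrary choices made for the example.

```python
import random

def euler_rhs(G, I):
    """Euler equations: dG1/dt = (I2 - I3)/(I2*I3) * G2*G3 and cyclic permutations."""
    G1, G2, G3 = G
    I1, I2, I3 = I
    return ((I2 - I3) / (I2 * I3) * G2 * G3,
            (I3 - I1) / (I3 * I1) * G3 * G1,
            (I1 - I2) / (I1 * I2) * G1 * G2)

def structure_matrix(c, G):
    """Matrix B[i][j] = {G_i, G_j} for the bracket {G1,G2} = c3*G3 and cyclic."""
    c1, c2, c3 = c
    G1, G2, G3 = G
    return [[0.0, c3 * G3, -c2 * G2],
            [-c3 * G3, 0.0, c1 * G1],
            [c2 * G2, -c1 * G1, 0.0]]

def hamiltonian_field(grad, c, G):
    """Components {F, G_j} of the Hamiltonian vector field of F, given grad F."""
    B = structure_matrix(c, G)
    return tuple(sum(grad[i] * B[i][j] for i in range(3)) for j in range(3))

random.seed(1)
I = (1.0, 2.0, 3.0)                                # hypothetical moments of inertia
G = tuple(random.uniform(-1.0, 1.0) for _ in range(3))

grad_H = tuple(g / i for g, i in zip(G, I))        # H = (1/2) sum G_k^2 / I_k
grad_K = G                                         # K = (1/2) sum G_k^2
lie_poisson = (1.0, 1.0, 1.0)                      # {G1, G2} = G3 and cyclic
second = tuple(-1.0 / i for i in I)                # {G1, G2}' = -G3/I3 and cyclic

X_from_H = hamiltonian_field(grad_H, lie_poisson, G)
X_from_K = hamiltonian_field(grad_K, second, G)
casimir_K = hamiltonian_field(grad_K, lie_poisson, G)
casimir_H = hamiltonian_field(grad_H, second, G)
```

Both `X_from_H` and `X_from_K` reproduce the right-hand side of the Euler equations, while `casimir_K` and `casimir_H` vanish, which are exactly the relations [6].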

Let us now consider the (three-particle) open
Toda lattice. It consists of three particles (with
masses equal to 1) moving on the line under a
nearest-neighbor interaction of exponential type.
The Hamiltonian is given by

$$H = \frac{1}{2}\left(p_1^2 + p_2^2 + p_3^2\right) + \exp(q_1 - q_2) + \exp(q_2 - q_3)$$

and the system is of course Hamiltonian with
respect to the canonical Poisson structure of $\mathbb{R}^6$,

$$P = \begin{pmatrix} 0 & 0 & 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & -1 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix}$$

But the Toda vector field can also be written as
$P'\,dK$, where $K = p_1 + p_2 + p_3$ is the total momentum
and

$$P' = \begin{pmatrix} 0 & 1 & 1 & -p_1 & 0 & 0 \\ -1 & 0 & 1 & 0 & -p_2 & 0 \\ -1 & -1 & 0 & 0 & 0 & -p_3 \\ p_1 & 0 & 0 & 0 & e^{q_1 - q_2} & 0 \\ 0 & p_2 & 0 & -e^{q_1 - q_2} & 0 & e^{q_2 - q_3} \\ 0 & 0 & p_3 & 0 & -e^{q_2 - q_3} & 0 \end{pmatrix}$$

is a Poisson tensor, which turns out to be compatible 
with P. The generalization to an arbitrary number of 
particles is straightforward. Hence, the open Toda 
lattice is a bi-Hamiltonian system. In the next section 
we will show that this property can be used to 
construct a maximal set of integrals of motion for the 
Toda lattice, which are automatically in involution. 
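The identity $P\,dH = P'\,dK$ for the three-particle lattice can be verified at a sample point. The sketch below assumes the coordinate ordering $(q_1, q_2, q_3, p_1, p_2, p_3)$ and the index convention $X^j = \sum_i (\partial F/\partial x^i) P^{ij}$, which matches $X_F = \{F, \cdot\}$ with $P^{ij} = \{x^i, x^j\}$; these conventions are our reading of the matrices, and the helper names are hypothetical.

```python
import math
import random

def toda_tensors(q, p):
    """Canonical tensor P and second tensor P' in coordinates (q1,q2,q3,p1,p2,p3)."""
    e1 = math.exp(q[0] - q[1])
    e2 = math.exp(q[1] - q[2])
    P = [[0, 0, 0, -1, 0, 0],
         [0, 0, 0, 0, -1, 0],
         [0, 0, 0, 0, 0, -1],
         [1, 0, 0, 0, 0, 0],
         [0, 1, 0, 0, 0, 0],
         [0, 0, 1, 0, 0, 0]]
    P2 = [[0, 1, 1, -p[0], 0, 0],
          [-1, 0, 1, 0, -p[1], 0],
          [-1, -1, 0, 0, 0, -p[2]],
          [p[0], 0, 0, 0, e1, 0],
          [0, p[1], 0, -e1, 0, e2],
          [0, 0, p[2], 0, -e2, 0]]
    return P, P2

def apply_tensor(T, dF):
    """Vector field X^j = sum_i dF_i T^{ij} (assumed index convention)."""
    n = len(dF)
    return [sum(dF[i] * T[i][j] for i in range(n)) for j in range(n)]

random.seed(2)
q = [random.uniform(-1.0, 1.0) for _ in range(3)]
p = [random.uniform(-1.0, 1.0) for _ in range(3)]
e1, e2 = math.exp(q[0] - q[1]), math.exp(q[1] - q[2])

dH = [e1, e2 - e1, -e2, p[0], p[1], p[2]]   # gradient of the Toda Hamiltonian
dK = [0, 0, 0, 1, 1, 1]                     # gradient of the total momentum

P, P2 = toda_tensors(q, p)
X_from_H = apply_tensor(P, dH)
X_from_K = apply_tensor(P2, dK)
toda_field = [p[0], p[1], p[2], -e1, e1 - e2, e2]  # (dq/dt, dp/dt)
```

Both `X_from_H` and `X_from_K` coincide with `toda_field`. With the opposite index convention both sides change sign together, so the equality $P\,dH = P'\,dK$ does not depend on that choice.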

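The third example, a stationary reduction of the
KdV equation, comes from the field of soliton
equations. Let us recall that the first members of the
KdV hierarchy are

$$\begin{aligned} \frac{\partial u}{\partial t_1} &= u_x \\ \frac{\partial u}{\partial t_2} &= \frac{1}{4}\left(u_{xxx} - 6uu_x\right) \quad \text{(KdV equation)} \\ \frac{\partial u}{\partial t_3} &= \frac{1}{16}\left(u_{xxxxx} - 10uu_{xxx} - 20u_xu_{xx} + 30u^2u_x\right) \end{aligned} \qquad [7]$$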
It is well known how to find finite-dimensional
reductions for the KdV equation, giving rise to explicit
solutions. Indeed, the set of singular points of a given
vector field of the hierarchy is a finite-dimensional
manifold which is invariant under the flows of the
other vector fields, due to the fact that the flows
commute. The (finite-dimensional) systems obtained
by restricting the KdV hierarchy to such invariant
manifolds are called the stationary reductions of KdV.
Let us consider explicitly the reduction corresponding
to the third vector field of the hierarchy. The set of its
critical points is given by

$$u_{xxxxx} - 10uu_{xxx} - 20u_xu_{xx} + 30u^2u_x = 0 \qquad [8]$$

and its dimension is 5, since we can use the values of
$u$, $u_x$, $u_{xx}$, $u_{xxx}$, and $u_{xxxx}$ at a fixed point $x_0$ (i.e., the
Cauchy data) as global coordinates. For the sake of
simplicity, we set

$$u_0 = u(x_0), \quad u_1 = u_x(x_0), \quad u_2 = u_{xx}(x_0), \quad u_3 = u_{xxx}(x_0), \quad u_4 = u_{xxxx}(x_0)$$

In order to compute the reduced equations of the
first flow of [7], we have to take its $x$-derivative and
to use the constraint [8] and its differential
consequences to eliminate all the derivatives of
order higher than 4. We obtain the equations

$$\begin{aligned} \frac{\partial u_0}{\partial t_1} &= u_1, \quad \frac{\partial u_1}{\partial t_1} = u_2, \quad \frac{\partial u_2}{\partial t_1} = u_3, \quad \frac{\partial u_3}{\partial t_1} = u_4 \\ \frac{\partial u_4}{\partial t_1} &= 10u_0u_3 + 20u_1u_2 - 30u_0^2u_1 \end{aligned} \qquad [9]$$
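In the same way, for the KdV equation we get

$$\begin{aligned} \frac{\partial u_0}{\partial t_2} &= \frac{1}{4}\left(u_3 - 6u_0u_1\right) \\ \frac{\partial u_1}{\partial t_2} &= \frac{1}{4}\left(u_4 - 6u_0u_2 - 6u_1^2\right) \\ \frac{\partial u_2}{\partial t_2} &= \frac{1}{4}\left(4u_0u_3 + 2u_1u_2 - 30u_0^2u_1\right) \\ \frac{\partial u_3}{\partial t_2} &= \frac{1}{4}\left(4u_0u_4 + 6u_1u_3 + 2u_2^2 - 30u_0^2u_2 - 60u_0u_1^2\right) \\ \frac{\partial u_4}{\partial t_2} &= \frac{1}{4}\left(10u_1u_4 + 10u_0^2u_3 + 10u_2u_3 - 100u_0u_1u_2 - 60u_1^3 - 120u_0^3u_1\right) \end{aligned} \qquad [10]$$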


There are two compatible Poisson structures giving
a bi-Hamiltonian formulation of both systems. The
corresponding Poisson tensors are

$$P = \begin{pmatrix} 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & -2 & 0 & -20u_0 \\ 0 & 2 & 0 & 20u_0 & 20u_1 \\ -2 & 0 & -20u_0 & 0 & -140u_0^2 - 20u_2 \\ 0 & 20u_0 & -20u_1 & 140u_0^2 + 20u_2 & 0 \end{pmatrix}$$

and

$$P' = \begin{pmatrix} 0 & \tfrac{1}{2} & 0 & 3u_0 & 6u_1 \\ -\tfrac{1}{2} & 0 & -3u_0 & -3u_1 & -4u_2 - 15u_0^2 \\ 0 & 3u_0 & 0 & u_2 + 15u_0^2 & u_3 + 30u_0u_1 \\ -3u_0 & 3u_1 & -u_2 - 15u_0^2 & 0 & u_4 - 40u_0u_2 + 30u_1^2 - 60u_0^3 \\ -6u_1 & 4u_2 + 15u_0^2 & -u_3 - 30u_0u_1 & -u_4 + 40u_0u_2 - 30u_1^2 + 60u_0^3 & 0 \end{pmatrix}$$

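In fact, if we call $X_1$ and $X_2$ the vector fields given
by [9] and [10], then the following recursion
relations hold:

$$\begin{aligned} P\,dH_0 &= 0 \\ X_1 = P\,dH_1 &= P'\,dH_0 \\ X_2 = P\,dH_2 &= P'\,dH_1 \\ 0 &= P'\,dH_2 \end{aligned} \qquad [11]$$

where

$$\begin{aligned} H_0 &= -u_4 + 10u_0u_2 + 5u_1^2 - 10u_0^3 \\ H_1 &= \tfrac{1}{4}\left(2u_0u_4 - 2u_1u_3 + u_2^2 - 20u_0^2u_2 + 15u_0^4\right) \\ H_2 &= \tfrac{1}{16}\left(2u_2u_4 - 6u_0^2u_4 - u_3^2 + 12u_0u_1u_3 - 16u_0u_2^2 - 12u_1^2u_2 + 60u_0^3u_2 - 36u_0^5\right) \end{aligned}$$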


Therefore, the vector fields $X_1$ and $X_2$ are
bi-Hamiltonian. The geometry of this bi-Hamiltonian
manifold is similar to the one of the first example. The
symplectic leaves of both Poisson structures
have dimension 4, and the Lagrangian foliation
(given by the level submanifolds of $H_0$, $H_1$, and $H_2$)
is contained in the intersections of such leaves. This
Lagrangian foliation is called by Gelfand and Zakharevich
the "axis" of the bi-Hamiltonian manifold.

We also notice that the relations [11] can be
collected in the statement that the function
$H(\lambda) = H_0\lambda^2 + H_1\lambda + H_2$ is a Casimir of the Poisson
pencil $P_\lambda = P' - \lambda P$, that is,

$$P_\lambda\,dH(\lambda) = 0$$
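The recursion relations [11] lend themselves to a direct numerical check. The sketch below is an illustration under stated assumptions: the entries of $P$ and $P'$, the normalizations $H_1 = \tfrac14(\ldots)$ and $H_2 = \tfrac1{16}(\ldots)$, and the index convention $X^j = \sum_i (\partial H/\partial u_i) P^{ij}$ are our reading of the formulas, chosen so that [9], [10] and [11] are mutually consistent. The gradients are written out by hand and the function names are hypothetical.

```python
import random

def tensors(u):
    """Poisson tensors P and P' of the stationary KdV reduction at the point u."""
    u0, u1, u2, u3, u4 = u
    a = 140 * u0**2 + 20 * u2
    b = 4 * u2 + 15 * u0**2
    c = u2 + 15 * u0**2
    d = u3 + 30 * u0 * u1
    w = u4 - 40 * u0 * u2 + 30 * u1**2 - 60 * u0**3
    P = [[0, 0, 0, 2, 0],
         [0, 0, -2, 0, -20 * u0],
         [0, 2, 0, 20 * u0, 20 * u1],
         [-2, 0, -20 * u0, 0, -a],
         [0, 20 * u0, -20 * u1, a, 0]]
    P2 = [[0, 0.5, 0, 3 * u0, 6 * u1],
          [-0.5, 0, -3 * u0, -3 * u1, -b],
          [0, 3 * u0, 0, c, d],
          [-3 * u0, 3 * u1, -c, 0, w],
          [-6 * u1, b, -d, -w, 0]]
    return P, P2

def gradients(u):
    """Gradients of H0, H1, H2 with respect to (u0, ..., u4)."""
    u0, u1, u2, u3, u4 = u
    dH0 = [10 * u2 - 30 * u0**2, 10 * u1, 10 * u0, 0, -1]
    dH1 = [x / 4 for x in (2 * u4 - 40 * u0 * u2 + 60 * u0**3,
                           -2 * u3, 2 * u2 - 20 * u0**2, -2 * u1, 2 * u0)]
    dH2 = [x / 16 for x in (
        -12 * u0 * u4 + 12 * u1 * u3 - 16 * u2**2 + 180 * u0**2 * u2 - 180 * u0**4,
        12 * u0 * u3 - 24 * u1 * u2,
        2 * u4 - 32 * u0 * u2 - 12 * u1**2 + 60 * u0**3,
        -2 * u3 + 12 * u0 * u1,
        2 * u2 - 6 * u0**2)]
    return dH0, dH1, dH2

def flows(u):
    """Right-hand sides of the reduced flows [9] and [10]."""
    u0, u1, u2, u3, u4 = u
    X1 = [u1, u2, u3, u4, 10 * u0 * u3 + 20 * u1 * u2 - 30 * u0**2 * u1]
    X2 = [x / 4 for x in (
        u3 - 6 * u0 * u1,
        u4 - 6 * u0 * u2 - 6 * u1**2,
        4 * u0 * u3 + 2 * u1 * u2 - 30 * u0**2 * u1,
        4 * u0 * u4 + 6 * u1 * u3 + 2 * u2**2 - 30 * u0**2 * u2 - 60 * u0 * u1**2,
        10 * u1 * u4 + 10 * u0**2 * u3 + 10 * u2 * u3
        - 100 * u0 * u1 * u2 - 60 * u1**3 - 120 * u0**3 * u1)]
    return X1, X2

def apply_tensor(T, dF):
    """Vector field X^j = sum_i dF_i T^{ij} (assumed index convention)."""
    return [sum(dF[i] * T[i][j] for i in range(5)) for j in range(5)]

def close(x, y, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(x, y))

random.seed(3)
u = [random.uniform(-1.0, 1.0) for _ in range(5)]
P, P2 = tensors(u)
dH0, dH1, dH2 = gradients(u)
X1, X2 = flows(u)
zero = [0.0] * 5

chain_holds = (close(apply_tensor(P, dH0), zero)
               and close(apply_tensor(P, dH1), X1)
               and close(apply_tensor(P2, dH0), X1)
               and close(apply_tensor(P, dH2), X2)
               and close(apply_tensor(P2, dH1), X2)
               and close(apply_tensor(P2, dH2), zero))
```

If `chain_holds` is true at generic points, the six relations in [11] are satisfied there, which is equivalent to $P_\lambda\,dH(\lambda) = 0$ for every $\lambda$.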


The importance of the stationary reductions of 
the KdV hierarchy lies in the fact that (as noticed 
in the early works on the subject) the reduced 
equations can be solved by means of the classical 
method of separation of variables. We mention 
that the separability of these systems is a par- 
ticular instance of a general result, which is 
valid for quite a wide class of bi-Hamiltonian 
manifolds. 




Next we present an important class of
bi-Hamiltonian manifolds. We recall that the
dual $\mathfrak{g}^*$ of a finite-dimensional Lie algebra $\mathfrak{g}$
possesses a canonical Poisson structure, called the
Lie-Poisson structure. It is defined as

$$\{F, G\}(X) = \langle X, [dF(X), dG(X)] \rangle \qquad [12]$$

where $F, G \in C^\infty(\mathfrak{g}^*)$ and their differentials at $X \in \mathfrak{g}^*$
are seen as elements of $\mathfrak{g}$. If $X_0$ is a fixed element in
$\mathfrak{g}^*$, the constant Poisson bracket

$$\{F, G\}'(X) = \langle X_0, [dF(X), dG(X)] \rangle \qquad [13]$$

is compatible with the Lie-Poisson bracket. In fact, the
Poisson pencil $\{\cdot,\cdot\}_\lambda = \{\cdot,\cdot\} - \lambda\{\cdot,\cdot\}'$ is obtained from
$\{\cdot,\cdot\}$ by applying the translation $X \mapsto X + \lambda X_0$;
hence, it is a Poisson bracket for every value of the
constant $\lambda$. The method of translation of the argument,
due to Manakov, provides a lot of bi-Hamiltonian
vector fields for this bi-Hamiltonian manifold. One has
to consider an $\mathrm{Ad}^*$-invariant function on $\mathfrak{g}^*$, that is, a
function $H \in C^\infty(\mathfrak{g}^*)$ such that

$$\langle X, [dH(X), x] \rangle = 0 \quad \forall x \in \mathfrak{g},\ X \in \mathfrak{g}^*$$

It is clearly a Casimir function for the Lie-Poisson
bracket, and this implies that the function
$X \mapsto H(X - \lambda X_0)$ is a Casimir of the Poisson pencil.
If this function can be developed as a Laurent series
in $\lambda$, its coefficients $H_j$ fulfill the recursion relations

$$\{\cdot, H_{j+1}\} = \{\cdot, H_j\}' \qquad [14]$$

and thus give rise to a sequence of bi-Hamiltonian
vector fields.
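A small instance of the translation method can be worked out for the rotation algebra. The sketch below is only an illustration under assumptions of our own: it identifies $\mathfrak{so}(3)$ and its dual with $\mathbb{R}^3$ (commutator given by the cross product, pairing by the dot product), takes the invariant function $H(X) = \tfrac12 |X|^2$, and expands $H(X - \lambda X_0) = H_0 + \lambda H_1 + \lambda^2 H_2$ with $H_0 = \tfrac12|X|^2$, $H_1 = -X \cdot X_0$, $H_2 = \tfrac12|X_0|^2$. With these identifications $\{G, F\}(X) = dG \cdot (dF \times X)$, so the field of $F$ is represented by the vector $dF \times X$.

```python
import random

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def lie_poisson_field(dF, X):
    """Vector representing G -> {G, F}(X) = <X, [dG, dF]> on so(3)* = R^3."""
    return cross(dF, X)

def constant_field(dF, X0):
    """Vector representing G -> {G, F}'(X) = <X0, [dG, dF]>."""
    return cross(dF, X0)

random.seed(4)
X = tuple(random.uniform(-1.0, 1.0) for _ in range(3))
X0 = tuple(random.uniform(-1.0, 1.0) for _ in range(3))

dH0 = X                              # gradient of (1/2)|X|^2
dH1 = tuple(-x for x in X0)          # gradient of -X . X0
dH2 = (0.0, 0.0, 0.0)                # (1/2)|X0|^2 is constant in X

start = lie_poisson_field(dH0, X)    # {., H0}: H0 is a Lie-Poisson Casimir
step_lhs = lie_poisson_field(dH1, X) # {., H1}
step_rhs = constant_field(dH0, X0)   # {., H0}'
end_lhs = lie_poisson_field(dH2, X)  # {., H2}
end_rhs = constant_field(dH1, X0)    # {., H1}'
```

Here `start` vanishes, `step_lhs` equals `step_rhs`, and `end_lhs` equals `end_rhs` (both zero), so the three coefficients satisfy the recursion [14] in this example.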

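The last example is a generalization of the
previous one. For the sake of simplicity, we consider
a Lie algebra $\mathfrak{g}$ of matrices such that the trace of the
product is nondegenerate, and the space
$M = \mathfrak{g}^2 = \mathfrak{g} \times \mathfrak{g}$. If $F \in C^\infty(M)$, its differential at a point
$(x_0, x_1)$ can be identified with the element
$(\partial F/\partial x_0, \partial F/\partial x_1)$ of $M$ given by

$$\frac{d}{d\epsilon}\Big|_{\epsilon=0} F(x_0 + \epsilon v_0, x_1 + \epsilon v_1) = \mathrm{tr}\left( \frac{\partial F}{\partial x_0} v_0 + \frac{\partial F}{\partial x_1} v_1 \right)$$

for all $v_0, v_1 \in \mathfrak{g}$. The manifold $M$ has a three-dimensional
family of pairwise compatible Poisson
brackets: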


OF OG 
{F, G}(x0, %1) = —tr (x acl 


OF OG 
{F, G } (xo, x1) = tr (x al | 


OF OG 


{F, G} (xo, x1) = trt (x al 
OF OG OF OG 

1% ( = il 7 = z )) 
Notice that the first two brackets restrict to the
submanifolds $x_0 =$ constant and give rise to the
bi-Hamiltonian structure presented in the previous
example (via the identification between $\mathfrak{g}$ and $\mathfrak{g}^*$
given by the trace of the product). This example can
be generalized to an arbitrary number $n$ of copies of
$\mathfrak{g}$. In this case there is an $(n+1)$-dimensional family
of pairwise compatible Poisson brackets, which can
be shown to be Lie-Poisson brackets with respect to
suitable Lie algebra structures on $\mathfrak{g}^n$. According to
Reyman and Semenov-Tian-Shansky, these brackets
can also be cast in the R-matrix formalism.

Also in this case, the Ad-invariant functions on g 
give rise to functions in involution on our multi- 
Hamiltonian manifold. For example, if He" denotes 
the A*-coefficient of tr(x1À + xo)“, then the recur- 
sion relations 


fae is 620,04 


hold, and they imply the existence of tri-Hamiltonian 
vector fields on M. 

Finally, we mention that the bi-Hamiltonian 
structure of the stationary flow of KdV — discussed 
above — can be obtained as a suitable reduction of 
the multi-Hamiltonian structure on q°, where 
q=$l(2,R). A similar statement holds for the other 
stationary flows of the Gelfand—Dickey hierarchies. 


Iterative Properties and Integrability 


In this section we show how to use the bi- 
Hamiltonian formulation of a given system to explain 
its integrability. In the cases similar to the open Toda 
lattice, where one of the Poisson structures is 
nondegenerate, one can introduce a recursion opera- 
tor and employ its powers in order to generate a 
chain of integrals of motion in involution. In the 
other examples, where the bi-Hamiltonian structure 
is degenerate, the conserved quantities turn out to be 
the coefficients of Casimir functions of the Poisson 
pencil. 
If $(M, \{\cdot,\cdot\}, \{\cdot,\cdot\}')$ is a bi-Hamiltonian manifold,
we call bi-Hamiltonian hierarchy a sequence $\{H_k\}_{k \ge 0}$
of functions on $M$ fulfilling the recursion relations

$$\{\cdot, H_{k+1}\} = \{\cdot, H_k\}', \quad k \ge 0 \qquad [15]$$

In terms of Poisson tensors we have that
$P\,dH_{k+1} = P'\,dH_k$. A bi-Hamiltonian hierarchy clearly
gives rise to an infinite sequence of bi-Hamiltonian
vector fields,

$$X_k = P\,dH_k = P'\,dH_{k-1}, \quad k \ge 1 \qquad [16]$$
The functions $H_k$ are in involution with respect to
both Poisson brackets. Indeed, for $k > j$, one has

$$\{H_j, H_k\} = \{H_j, H_{k-1}\}' = \{H_{j+1}, H_{k-1}\} = \cdots = \{H_k, H_j\}$$

so that $\{H_j, H_k\} = 0$ for all $j, k \ge 0$, and therefore
$\{H_j, H_k\}' = 0$ for all $j, k \ge 0$. If $\{H_j\}_{j \ge 0}$ and $\{K_j\}_{j \ge 0}$ are
two bi-Hamiltonian hierarchies, then all functions
are in (bi-)involution provided that one of the two
hierarchies starts from a Casimir of $\{\cdot,\cdot\}$. In fact,
suppose that $H_0$ is such a Casimir. Then

$$\{H_i, K_j\} = \{H_{i-1}, K_j\}' = \{H_{i-1}, K_{j+1}\} = \cdots = \{H_0, K_{j+i}\} = 0$$

and

$$\{H_i, K_j\}' = \{H_{i+1}, K_j\} = 0$$

We observe that these proofs of the involutivity do 
not use the compatibility condition [2] between the 
Poisson structures. The point is that this condition is 
important for the existence of bi-Hamiltonian hier- 
archies. Indeed, the problem of the existence and the 
construction of bi-Hamiltonian hierarchies is quite 
delicate. We tackle it first in the case of a particular 
class of bi-Hamiltonian manifolds, the so-called 
Poisson—Nijenhuis manifolds. In turn, they are a 
generalization of nondegenerate bi-Hamiltonian 
manifolds. 

Let $(M, P, P')$ be a bi-Hamiltonian manifold such
that $P$ is invertible. Then we can introduce the
tensor field $N = P'P^{-1}$, which is of type (1,1) and
will always be dealt with as an endomorphism of the
tangent bundle $TM$. This tensor field possesses some
remarkable properties. First of all, its Nijenhuis
torsion $T(N)$ vanishes; this means that

$$T(N)(X, Y) = [NX, NY] - N[X, Y]_N = 0$$

for any pair $(X, Y)$ of vector fields on $M$, where

$$[X, Y]_N = [NX, Y] + [X, NY] - N[X, Y]$$

Sometimes a tensor field with vanishing Nijenhuis
torsion is called a recursion operator. Since $P$ defines
a symplectic structure on $M$, such a bi-Hamiltonian
manifold is called an $\omega N$ manifold.

The tensor field $N$ satisfies two compatibility
conditions with $P$. The first one is simply the
skew-symmetry of $P'$ and reads $NP = PN^*$, while
the second one is a restatement of [3],

$$[X_F, X_G]_N = X_{\{F,G\}'} \quad \forall F, G \in C^\infty(M)$$

A manifold is said to be a Poisson-Nijenhuis manifold
(briefly, a PN manifold) if it is endowed with a Poisson
tensor $P$ and a torsionless (1,1) tensor field $N$ which
are compatible, in the sense that the two above-mentioned
conditions hold. We have just seen that
every nondegenerate bi-Hamiltonian manifold (i.e.,
such that one of the two Poisson tensors is invertible) is
a PN manifold. On the other hand, if $(M, P, N)$ is a
PN manifold, then it can be shown that $P' = NP$ is
a Poisson tensor, which is compatible with $P$. In
other words, PN manifolds are particular examples of
bi-Hamiltonian manifolds. Moreover, one has that
$P^{(j)} = N^jP$ and $P^{(k)} = N^kP$ are, for every $j, k \ge 0$,
compatible Poisson tensors.

Let us now consider a function $H_0$, on a PN
manifold $(M, P, N)$, such that $N^*dH_0 = dH_1$ is
exact, where $N^* : T^*M \to T^*M$ is the adjoint of the
recursion operator $N$. This implies that

$$X = P\,dH_1 = PN^*dH_0 = P'\,dH_0 \qquad [17]$$

is a bi-Hamiltonian vector field. By means of $N^*$ we can
define the 1-forms $\alpha_k = (N^*)^k dH_0$, which can be shown
to be all closed. If they are exact, that is, $\alpha_k = dH_k$, then
the functions $H_k$ form a bi-Hamiltonian hierarchy and
thus are in involution. This shows that on a (simply
connected) PN manifold every bi-Hamiltonian vector
field of the form [17], with $N^*dH_0 = dH_1$, belongs to a
bi-Hamiltonian hierarchy and that its first integrals (in
involution) can be iteratively constructed with the
recursion operator. (The integrability of this vector
field clearly depends on the number of independent
integrals of motion.) Moreover, the vector field
$X_k = P\,dH_k = P'\,dH_{k-1}$ of the hierarchy is Hamiltonian
with respect to all Poisson structures $P^{(j)}$ with $j \le k$,
because $X_k = P^{(j)}dH_{k-j}$.

The example of the Toda lattice presented earlier
can be cast in the PN (more precisely, $\omega N$)
framework. One can introduce the recursion operator
$N$ and, in the three-particle case, one can define
the third integral of motion as $dJ = N^*dH$. Since $K$,
$H$, and $J$ belong to a bi-Hamiltonian hierarchy, they
are in involution, and this (along with their
functional independence) proves the integrability of
the Toda lattice.


In this example something more happens: the
integrals of motion are (up to multiplicative constants)
the traces of the powers of the recursion
operator $N$. This is a general fact, since the
vanishing of the torsion of $N$ implies that
$N^*dI_k = dI_{k+1}$, where $I_k = (1/k)\,\mathrm{tr}\,N^k$.
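For the three-particle Toda lattice the first two traces can be compared with $K$ and $H$ at a sample point. The sketch below is an illustration, not the article's own computation: it forms $N = P'P^{-1}$ as a plain matrix product, using the fact that the canonical tensor satisfies $P^2 = -1$ so that $P^{-1} = -P$, and it assumes the coordinate ordering $(q_1, q_2, q_3, p_1, p_2, p_3)$. Traces of powers do not depend on whether $N$ or its transpose is used, so the index convention plays no role here.

```python
import math
import random

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def recursion_operator(q, p):
    """N = P' P^{-1} for the three-particle open Toda lattice (P^{-1} = -P)."""
    e1 = math.exp(q[0] - q[1])
    e2 = math.exp(q[1] - q[2])
    P = [[0, 0, 0, -1, 0, 0],
         [0, 0, 0, 0, -1, 0],
         [0, 0, 0, 0, 0, -1],
         [1, 0, 0, 0, 0, 0],
         [0, 1, 0, 0, 0, 0],
         [0, 0, 1, 0, 0, 0]]
    P2 = [[0, 1, 1, -p[0], 0, 0],
          [-1, 0, 1, 0, -p[1], 0],
          [-1, -1, 0, 0, 0, -p[2]],
          [p[0], 0, 0, 0, e1, 0],
          [0, p[1], 0, -e1, 0, e2],
          [0, 0, p[2], 0, -e2, 0]]
    P_inv = [[-x for x in row] for row in P]
    return matmul(P2, P_inv)

random.seed(5)
q = [random.uniform(-1.0, 1.0) for _ in range(3)]
p = [random.uniform(-1.0, 1.0) for _ in range(3)]

N = recursion_operator(q, p)
I1 = trace(N)                      # I_1 = tr N
I2 = trace(matmul(N, N)) / 2       # I_2 = (1/2) tr N^2

K = sum(p)
H = 0.5 * sum(x * x for x in p) + math.exp(q[0] - q[1]) + math.exp(q[1] - q[2])
```

In this normalization one finds $I_1 = 2K$ and $I_2 = 2H$, so the multiplicative constant mentioned in the text is 2 for both integrals.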

Next we deal with the case where the
bi-Hamiltonian manifold $(M, P, P')$ is not of the
Poisson-Nijenhuis type, that is, both $P$ and $P'$ are
degenerate. Let us suppose that their symplectic
leaves have codimension 1. We also want to discuss
in this case an iteration problem, namely the
problem of constructing a bi-Hamiltonian hierarchy
starting from a Casimir $H_0$ of $P$. Let us consider the
Hamiltonian vector field $X_1 = P'\,dH_0 = Y_{H_0}$ (using
the notations introduced earlier). Thanks to the
form [4] of the compatibility condition between $P$
and $P'$, we have that

$$L_{X_1} P = L_{Y_{H_0}} P = -L_{X_{H_0}} P' = 0$$

meaning that $X_1$ is an infinitesimal symmetry of $P$.
Moreover, $X_1$ is tangent to the symplectic leaves of $P$,
since $\langle dH_0, X_1 \rangle = \langle dH_0, P'\,dH_0 \rangle = 0$. Under some suitable
topological assumptions, we can conclude that
there exists a function $H_1$ such that $X_1 = P\,dH_1$, that
is, $X_1$ is a bi-Hamiltonian vector field. Now the
procedure can be iterated, that is, in the same way one
can show that, if $X_2 = P'\,dH_1 = Y_{H_1}$, then there exists
a function $H_2$ such that $X_2 = P\,dH_2$, and so on. Thus,
one obtains a bi-Hamiltonian hierarchy $\{H_k\}_{k \ge 0}$,
which can either be infinite or end with a Casimir of
$P'$. In any case, the function $H(\lambda) = \sum_{k \ge 0} H_k \lambda^{-k}$ is a
Casimir of the Poisson pencil $P_\lambda = P' - \lambda P$. As seen
earlier, the typical situation is that the chain terminates
with a Casimir $H_n$ of $P'$, where $\dim M = 2n + 1$. In
other words, there is a Casimir of the Poisson pencil
which is a polynomial of degree $n$ in the parameter $\lambda$.

As a general procedure for constructing
bi-Hamiltonian hierarchies, one can look for the
Casimir functions $H(\lambda)$ of the Poisson pencil which
are deformations of Casimir functions of $P$, but it is
not clear when such a deformation does exist in the
case where the corank of the bi-Hamiltonian structure
is greater than 2. Nevertheless, suppose that
$H(\lambda) = \sum_{k\ge 0} H_k \lambda^{-k}$ is a Casimir of
$P_\lambda = P' - \lambda P$, that is, that $\{H_k\}_{k\ge 0}$ is
a bi-Hamiltonian hierarchy. Then, for all $\lambda$, the
bi-Hamiltonian vector fields $X_{k+1} = P\,dH_{k+1} = P'\,dH_k$
are Hamiltonian with respect to $P_\lambda$, with Hamiltonian
function $H^{(k)}(\lambda) = \sum_{j=0}^{k} H_j \lambda^{k-j}$,


$X_{k+1} = P_\lambda\, dH^{(k)}(\lambda)$


Therefore, the vector fields $X_k$ are not only
bi-Hamiltonian, but they are Hamiltonian with
respect to any Poisson bracket of the pencil.
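
Both statements about the pencil can be checked by a short formal computation. The lines below are only a sketch, assuming the conventions $P\,dH_0 = 0$, $X_{k+1} = P\,dH_{k+1} = P'\,dH_k$, $P_\lambda = P' - \lambda P$, and a Casimir written as a formal series in inverse powers of $\lambda$; convergence of the series is not addressed.

```latex
\begin{align*}
P_\lambda\, dH(\lambda)
  &= \sum_{k\ge 0}\lambda^{-k}\,P'\,dH_k \;-\; \sum_{k\ge 0}\lambda^{1-k}\,P\,dH_k \\
  &= \sum_{k\ge 0}\lambda^{-k}\,X_{k+1} \;-\; \sum_{k\ge 1}\lambda^{1-k}\,X_k \;=\; 0, \\[4pt]
P_\lambda\, dH^{(k)}(\lambda)
  &= \sum_{j=0}^{k}\lambda^{k-j}\,X_{j+1} \;-\; \sum_{j=1}^{k}\lambda^{k-j+1}\,X_j \;=\; X_{k+1}.
\end{align*}
```

In the first computation the $k = 0$ term of the second sum drops out because $H_0$ is a Casimir of $P$; in the second, the two sums telescope and only the $j = k$ term of the first one survives.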


In this article we have described some basic 
properties of bi-Hamiltonian systems, defined on 
manifolds possessing a Poisson pair. There are other 
important vector fields on these manifolds (more 
precisely, on $\omega N$ manifolds). They are called cyclic
systems of Levi-Civita, and they give an intrinsic 
description of the separable systems of Jacobi. We 
refer to the article Recursion Operators in Classical 
Mechanics in this encyclopedia for these topics. 


See also: Bi-Hamiltonian Methods in Soliton Theory; 
Classical r-Matrices, Lie Bialgebras, and Poisson Lie 
Groups; Integrable Systems and Algebraic Geometry; 
Integrable Systems and Recursion Operators on 
Symplectic and Jacobi Manifolds; Integrable Systems: 
Overview; Recursion Operators in Classical Mechanics; 
Separation of Variables for Differential Equations; 
Solitons and Kac-Moody Lie Algebras; Toda Lattices.


Introduction: Multiple-Scale
and Multiscale Approaches


The multiscale, or more precisely multiple-scale,
method is a technique of perturbation theory
based on the introduction of additional rescaled
variables, say time variables, formally considered as
independent variables, each describing a different
timescale (for the sake of simplicity, we will
mainly consider a dynamic framework and timescales;
everything can be transposed to spatial dependences
and scales). It was first developed to handle
singular situations in which dynamic regimes of
different characteristic scales coexist and intermingle
in such a way that straightforward perturbation
expansions are not uniformly convergent in time
(hence of limited relevance and use) due to the
so-called secular terms, which grow unbounded with
time; the freedom introduced together with the
extra variables indeed allows one to impose conditions
preventing these secular divergences and improving
the convergence of the perturbation series. It yields
a global perturbation solution describing jointly the
behavior at small and large scales. This technique
belongs to the far more wide-ranging class of
multiscale approaches; these can be divided into
four main subclasses:


1. Mean-field techniques exploiting scale separation 
between fast and slow components of the 
dynamics. The influence of the slow variables 
onto the fast dynamics, if any, is treated in a 
decoupled way within a parametric approxima- 
tion, allowing an adiabatic elimination of fast 
variables (see the section “Slow/fast variables”). 

2. Singular perturbations, in which individual fast
components ultimately give rise to slow trends
and influence the large-scale features. Scale
separation here breaks down at long times and
the multiple-scale method is then a method of choice
(see the next section).

3. Matched expansions when regimes of different 
scales succeed (boundary-layer singularity; see 
the section “Boundary layers and matched 
expansions”). 
4. Renormalization techniques, in systems exhibiting
some kind of universality in the relations between 
their behaviors at different scales, for example, 
scale invariance (see the section “Renormalization: 
an iterated multiscale approach”). 


We will first present the principles of the multiple-scale
method, detail its technical implementation on
simple abstract examples and cite some typical 
applications. Then we will articulate this technique 
with more general multiscale methods in a brief 
overview (see the section “A brief overview of 
multiscale approaches”). The range of multiscale 
approaches and technical tools will then be illus- 
trated and compared in the context of diffusion, 
Brownian motion, and transport phenomena (see the 
section “Summary: the exemplary case of 
diffusion”). 


Multiple-Scale Method: Principles 


Context: Singular Perturbations and Secular 
Divergences 


Multiple-scale methods have been developed to
handle situations in which the dynamics involves a
small parameter $\epsilon$ (e.g., the ratio of the masses of
different subsystems, the strength of an additional
interaction, the amplitude of an applied field)
directly controlling the separation between the
different characteristic timescales of the evolution
and, specifically, such that the behavior for $\epsilon = 0$ is
qualitatively different from the behavior for $\epsilon$ small
($\epsilon \ll 1$ but finite); in other words, when a weak
influence, of strength controlled by $\epsilon \ll 1$, does not
have only weak consequences. Typically, this occurs
when $\epsilon$ represents the strength of a weak coupling
between otherwise independent subsystems or when
a vanishing value $\epsilon = 0$ changes a characteristic time,
the sign of a friction coefficient, the order of the
highest time derivative in the case of ordinary
differential equations (turning points), or the type of
partial differential equations in the case of spatially
extended systems. Accordingly, a naive perturbative
approach with respect to $\epsilon$, that is, an expansion
taking as a basic approximation the behavior for
$\epsilon = 0$, cannot bridge the qualitative gap with
behaviors observed for $\epsilon > 0$. It thus fails to give a
full account of the system evolution at all times: one
speaks of singular perturbation.

A historical example arose in celestial mechanics,
in the celebrated nonintegrable three-body problem,
involving the Sun, a big planet and a smaller one, of
respective masses $m_1$, $m_2 \ll m_1$ and $m_3 \ll m_2$. The
straightforward approach would be to consider the
presence of the small planet as a small perturbation
of the integrable two-body problem for the masses
$m_1$ and $m_2$. But when one tries to determine the
solution as a series in powers of the mass ratio
$\epsilon = m_3/m_2$, unbounded terms appear, the so-called
secular terms, increasing without bounds as fast as
powers of $t$, hence of ill-defined order and impairing
the very consistency of the perturbation approach at
long times $t > 1/\epsilon$. Accordingly, the perturbation
expansion is not uniformly convergent in time, which
prevents using it to investigate asymptotics and
determine the fate of the three-body system: the influence
of the small planet on the motion of the bigger one,
although seemingly a weak perturbation, might
ultimately modify its trajectory around the Sun, at
least in some resonant cases.

The origin of secular terms lies in a phenomenon
of resonance, which is best explained on an
example: the Duffing oscillator $\ddot{x} + x = -\epsilon x^3$ with
$\epsilon \ll 1$. When looking for a solution in the form
$x(t) = \sum_{n\ge 0} \epsilon^n x_n(t)$, each component $x_n(t)$ has to be
bounded in order to get a consistent perturbation
expansion, in which the hierarchy of terms of
different orders remains valid forever:
$\epsilon x_{n+1}(t) \ll x_n(t)$. These components should satisfy
the following sequence of equations:


$\ddot{x}_0 + x_0 = 0, \quad \ddot{x}_1 + x_1 = -x_0^3, \quad \ldots$
(linearized operator $Lx = \ddot{x} + x$) [1]


It gives $x_0(t) = a e^{it} + \mathrm{c.c.}$, from which follows a
secular contribution $(3i/2)\,a|a|^2\,t\,e^{it}$ in $x_1(t)$. In
general, solving perturbatively $\dot{z} = f(z, \epsilon)$ for an
expansion $z(\epsilon, t) = \sum_{n\ge 0} \epsilon^n z_n(t)$ yields a hierarchical
sequence of equations of the form
$\dot{z}_n = L z_n + \phi_n(z_0, z_1, \ldots, z_{n-1})$ for $n \ge 1$, where
$L = Df(z_0, \epsilon = 0)$ comes from the linearization in $z_0$ of
the unperturbed evolution law. A secular divergence arises
in $z_n$ as soon as $\phi_n$ contains an additive contribution
which is an eigenvector of $L$ (part of a mathematical result
known as the Fredholm alternative). The appearance
of secular terms reflects a singular feature of the
dynamics: the fact that the limits as $\epsilon \to 0$ and $t \to \infty$
do not commute. As a rule, such noninversion is
associated with generalized secular divergences: the
fast, short-term dynamics finally contributes to the
slow, long-term behavior. This feature is a clue
towards using the multiple-scale method.
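
The resonance mechanism can be observed numerically. The sketch below integrates the Duffing oscillator with a classical fourth-order Runge-Kutta scheme and compares the result with two approximations for the initial data $x(0) = A$, $\dot{x}(0) = 0$: the naive expansion $x_0 + \epsilon x_1$, which contains the secular term $-(3/8)A^3 t \sin t$, and the leading-order multiple-scale result $A\cos[(1 + 3\epsilon A^2/8)t]$. The latter is the standard textbook frequency shift and is not derived in the text above; the function names, the step size, and the parameter values are illustrative assumptions.

```python
import math


def duffing_rk4(eps, amp, t_end, dt=0.005):
    """Integrate x'' + x = -eps*x**3, x(0)=amp, x'(0)=0, with classical RK4."""
    def accel(x):
        return -x - eps * x**3

    steps = round(t_end / dt)
    x, v = amp, 0.0
    times, xs = [0.0], [x]
    for i in range(steps):
        k1x, k1v = v, accel(x)
        k2x, k2v = v + 0.5 * dt * k1v, accel(x + 0.5 * dt * k1x)
        k3x, k3v = v + 0.5 * dt * k2v, accel(x + 0.5 * dt * k2x)
        k4x, k4v = v + dt * k3v, accel(x + dt * k3x)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        times.append((i + 1) * dt)
        xs.append(x)
    return times, xs


def naive_first_order(eps, amp, t):
    """x0 + eps*x1 from the straightforward expansion (contains t*sin(t))."""
    x0 = amp * math.cos(t)
    x1 = (-(3.0 / 8.0) * amp**3 * t * math.sin(t)
          + (amp**3 / 32.0) * (math.cos(3 * t) - math.cos(t)))
    return x0 + eps * x1


def multiple_scale_leading(eps, amp, t):
    """Leading-order multiple-scale result: slow phase drift, bounded amplitude."""
    return amp * math.cos((1.0 + 3.0 * eps * amp**2 / 8.0) * t)


def max_errors(eps=0.1, amp=1.0, t_end=50.0):
    """Largest deviation of each approximation from the numerical solution."""
    times, xs = duffing_rk4(eps, amp, t_end)
    err_naive = max(abs(naive_first_order(eps, amp, t) - x)
                    for t, x in zip(times, xs))
    err_ms = max(abs(multiple_scale_leading(eps, amp, t) - x)
                 for t, x in zip(times, xs))
    return err_naive, err_ms


if __name__ == "__main__":
    print(max_errors())
```

Over times of order $1/\epsilon$ and beyond, the naive truncation drifts away from the bounded numerical solution, while the multiple-scale approximation stays close to it.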


Technical Principles 


The first step is to perform rescalings leading to
dimensionless variables and functions, which evidence
a small control parameter $\epsilon$, related to scale separation
and providing a natural parameter for a perturbation
approach. The basic principle of the multiple-scale
method is to introduce additional independent time
variables $t_1, t_2, \ldots, t_n$ such that the physical situation
corresponds in this extended time-variable space to
the line


$t_0 = t, \quad t_1 = \epsilon t, \quad t_2 = \epsilon^2 t, \quad \ldots$
$\frac{d}{dt} = \frac{\partial}{\partial t_0} + \epsilon \frac{\partial}{\partial t_1} + \epsilon^2 \frac{\partial}{\partial t_2} + \cdots$ [2]

It thus amounts to a perturbation expansion of the
time-derivative operator. This method can be traced
back to the Lindstedt-Poincaré technique, where the
time variable $t$ is expanded according to
$t = s(1 + \epsilon\omega_1 + \epsilon^2\omega_2 + \cdots)$ and the evolution described in
terms of the new variable $s$ and unknown frequencies
$(\omega_i)_{i\ge 1}$ to be determined self-consistently (Nayfeh
1973). By contrast, the multiple-scale approach puts
on a par $t_0 = t$ and the additional variables $(t_i)_{i\ge 1}$.
The perturbation approach is then carried out as
usual, plugging eqn [2] for $d/dt$ and the expansion
$z(\epsilon, t) = \sum_{n\ge 0} \epsilon^n z_n(t_0, t_1, \ldots)$ into the evolution
equation and identifying term-wise the coefficients
of the successive powers of $\epsilon$. The additional freedom
thus introduced when considering $(t_i)_{i\ge 0}$ as independent
variables will be compensated in the course of
the computation, by imposing “solubility conditions”
ensuring the vanishing of secular terms and the
consistency of the perturbation method. In particular,
it is possible to freely choose boundary conditions
outside the physical line $t_1 = \epsilon t_0, \ldots, t_n = \epsilon^n t_0$. The
resulting set of equations contains exactly the same
information as the original one, only expressed in a
different way: by construction, terms depending, say,
on $t_0$, describe a fast component with no emerging
slow trends that would intermix with the
$t_1$-dependence; fast variables contribute only to fast
modes. At the end, one restricts to the physical line,
thus turning back to the single “real” variable $t$. The
benefit of the method is to provide a joint access to
dependences at different scales, now expressed as
dependences on the different time variables
$t_0, t_1, \ldots, t_n$. One introduces as many new variables
as necessary to circumvent secular divergences. We
have implicitly supposed above that the behavior at
timescale $\Delta t = O(1)$ corresponds to the fastest
timescale of the evolution. If this were not the case,
the rescaled time variables would be $t_0 = \epsilon^{-\alpha_0} t$,
$t_1 = \epsilon^{1-\alpha_0} t, \ldots$ if the fastest timescale is
$\Delta t = O(\epsilon^{\alpha_0})$. More general time-derivative expansions,
associated with rescaled variables $t_n = \epsilon^{\alpha_n} t$, might be
considered to better account for the hierarchy of
characteristic timescales of the dynamics.


Multiple-Scale Method: Abstract Examples 


Let us first consider the simplest possible example
$\dot{x} = a(1 + \epsilon)x$, for which the exact solution is trivially
known, allowing one to appreciate the validity of the
multiscale approach compared to the straightforward
perturbation expansion. In the latter case, one looks
for a solution $x(t) = x_0(t) + \epsilon x_1(t) + O(\epsilon^2)$ and
identifies term-wise the powers of $\epsilon$. At order 0,
$\dot{x}_0 = a x_0$ yields $x_0(t) = c_0 e^{at}$. At order 1,
$\dot{x}_1 - a x_1 = a x_0(t)$ leads to a secular divergence:
$x_1(t) = c_1 e^{at} + c_0\, a t\, e^{at}$. Carrying on the
perturbation analysis yields the following expansion:


$x(t) = c\, e^{at} (1 + \epsilon a t + \epsilon^2 a^2 t^2/2 + \cdots)$ [3]


which is not uniformly convergent: for $t = O(1/\epsilon)$, all
terms are of the same magnitude. Using this recursive
method to obtain a finite-order approximate solution
(e.g., stopping, as here, after two steps of the
perturbation method) is only relevant at short times
$t \ll 1/\epsilon$. The straightforward perturbation analysis
captures the behavior of the exact solution only if all
terms are computed and taken into account (in less
trivial examples, the straightforward perturbation
series might even be divergent). In the multiple-scale
approach, one introduces two rescaled variables $t_0 = t$
and $t_1 = \epsilon t$ and looks for a solution of the form
$x(t) = x_0(t_0, t_1, \ldots) + \epsilon x_1(t_0, t_1, \ldots) + O(\epsilon^2)$. At order 0,
$\partial_{t_0} x_0 = a x_0$ yields $x_0(t_0, t_1, \ldots) = c_0(t_1, \ldots)\, e^{a t_0}$. At
order 1, we get $\partial_{t_0} x_1 + \partial_{t_1} x_0 = a x_0 + a x_1$. The
solubility condition reads $a c_0 - \partial_{t_1} c_0 = 0$, which allows us
to avoid the secular divergence and suppresses the
artificial freedom introduced with the additional
time variable $t_1$, yielding $c_0 = c\, e^{a t_1}$. The equation
$(\partial_{t_0} - a) x_1 = 0$ is here superfluous, but in less simple
situations, it remains at this stage a nontrivial
equation for $x_1$. One thus directly gets the solution,
uniformly valid at all times:


$x(t) = c\, e^{a t_0} e^{a t_1} = c\, e^{at} e^{\epsilon a t}$ [4]


As a rule in singular perturbation methods, the
difficulty here originates in the noncommuting limits
$\epsilon \to 0$ and $t \to \infty$; indeed, denoting $y_\epsilon(t) = x_\epsilon(t)\, e^{-at}$,
one has $\lim_{t\to\infty} \lim_{\epsilon\to 0^+} y_\epsilon(t) = c$, whereas
$\lim_{\epsilon\to 0^+} \lim_{t\to\infty} y_\epsilon(t) = \infty$.
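
This elementary example lends itself to a direct numerical check. The sketch below compares the exact solution $c\,e^{a(1+\epsilon)t}$ with the naive two-term expansion $c\,e^{at}(1 + \epsilon a t)$ and with the multiple-scale product $c\,e^{a t_0} e^{a t_1}$ evaluated on the physical line $t_0 = t$, $t_1 = \epsilon t$. The function names and the values $a = 1$, $\epsilon = 0.01$ are illustrative choices, not taken from the text.

```python
import math


def exact(t, a, eps, c=1.0):
    """Exact solution of dx/dt = a*(1 + eps)*x with x(0) = c."""
    return c * math.exp(a * (1.0 + eps) * t)


def naive_two_terms(t, a, eps, c=1.0):
    """Straightforward expansion truncated after order eps (secular term eps*a*t)."""
    return c * math.exp(a * t) * (1.0 + eps * a * t)


def multiple_scale(t, a, eps, c=1.0):
    """Multiple-scale solution restricted to the physical line t0 = t, t1 = eps*t."""
    t0, t1 = t, eps * t
    return c * math.exp(a * t0) * math.exp(a * t1)


def relative_error(approx, t, a, eps):
    """Relative deviation of an approximation from the exact solution."""
    reference = exact(t, a, eps)
    return abs(approx(t, a, eps) - reference) / abs(reference)


if __name__ == "__main__":
    a, eps = 1.0, 0.01
    for t in (1.0, 1.0 / eps, 5.0 / eps):
        print(t,
              relative_error(naive_two_terms, t, a, eps),
              relative_error(multiple_scale, t, a, eps))
```

The truncated naive expansion is accurate for $t \ll 1/\epsilon$ and fails once $t$ reaches $1/\epsilon$, whereas the multiple-scale expression agrees with the exact solution up to rounding at all times.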

Other training examples are the weakly damped
linear oscillator $\ddot{x} + x = -2\epsilon\dot{x}$, solved with multiple
scales $t_0 = t$, $t_1 = \epsilon t$, $t_2 = \epsilon^2 t$, or with the more
specific variables $\theta = \sqrt{1 - \epsilon^2}\, t$, $\tau = \epsilon t$; the Duffing
oscillator $\ddot{x} + x = -\epsilon x^3$ introduced above, whose
multiple-scale resolution requires three variables
$t_0 = t$, $t_1 = \epsilon t$, $t_2 = \epsilon^2 t$; and the Van der Pol oscillator


$\ddot{x} + x = \epsilon(1 - x^2)\dot{x}$.


An Illustration: Classical Lorentz Electron Gas
in a Weak Field


As a less abstract, hence more convincing, illustration
of the strength of the multiple-scale method, let us
consider the dynamics of a classical Lorentz electron
gas acted upon by an external electric field (associated
acceleration $\mathbf{a}$). This model considers the electrons
as charged hard spheres whose motion results from
the superposition of a driven classical motion in
the field and elastic collisions with immobile scatterers
(the atoms). It is implemented within a kinetic-theoretic
framework, based upon a Boltzmann-like
equation for the electron velocity distribution:


$(\partial_t + \mathbf{a}\cdot\partial_{\mathbf{v}})\, f(\mathbf{v}, t) = -(v/\lambda)\, O f(\mathbf{v}, t)$ [5]


where $v = |\mathbf{v}|$, and $\lambda$ is the mean free path of the
electrons. $Of = f - f_{\rm sph}$ is a projector accounting for
the effect of collisions through the deviation of the
distribution $f$ from spherical symmetry, namely
through the discrepancy between $f$ and its isotropic
counterpart $f_{\rm sph}(v) = (1/4\pi) \int f(\mathbf{v}, t)\, d\hat{\mathbf{v}}$ obtained as
an average over the velocity directions $\hat{\mathbf{v}}$. The
relevant small parameter is $\epsilon = ma\lambda/kT$, measuring
the ratio of the work $ma\lambda$ done by the field over the
mean free path to the thermal energy $kT$ in the
initial state. The condition $\epsilon \ll 1$ ensures
the separation of the characteristic timescales of
the two mechanisms experienced by an electron: the
thermal motion and the field-induced deterministic
motion. Denoting by $v_{\rm th} = \sqrt{kT/m}$ the thermal
velocity of the electrons, we have indeed
$\epsilon = (t_{\rm th}/t_{\rm acc})^2$, where $t_{\rm th} = \lambda/v_{\rm th}$ is the mean time
between two successive collisions with the scatterers
and $t_{\rm acc} = \sqrt{\lambda/a}$ is the acceleration time required for
the field to move the electron over the mean free
path starting from rest. The result of the plain
weak-field expansion is to evidence its own failure:
it shows that the perturbation is singular insofar as
the asymptotic state will be fully dominated by the
field, with no memory of the initial temperature.
The multiple-scale method is here implemented with
respect to the time variable, introducing new
independent variables $(\tau_i)$ such that the physical
situation corresponds to the line


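$\tau_0 = \tau, \quad \tau_1 = \epsilon\tau, \quad \tau_2 = \epsilon^2\tau, \quad \ldots, \quad \tau_n = \epsilon^n\tau$
($\epsilon = ma\lambda/kT$) [6]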


The time-derivative expansion [2] is supplemented
with an expansion of the velocity distribution:


$f(\mathbf{v}, t; \epsilon) = \sum_{i\ge 0} \epsilon^i f^{(i)}(\mathbf{v}; \tau_0, \tau_1, \ldots)$ [7]


The procedure is conducted as exposed in the
general case. Identifying term-wise the coefficients
of the expansion yields a hierarchy of equations for
the $f^{(i)}$, each supplemented with a solubility
condition preventing the appearance of secular
divergences. A detailed presentation can be found
in Piasecki (1993). The benefit of the multiple-scale
method is to yield jointly the different stages of the
gas evolution, starting from thermal equilibrium and
switching on the field at $t = 0$:


• at times $\tau = O(1)$, an initial transient with a drift
velocity $\langle v_z\rangle(t) = at - C_1 a t^2 v_{\rm th}/\lambda + \cdots$ in the
direction of the applied field ($C_1$ denoting some
numerical constant);

• at times $\tau = O(1/\epsilon)$, a linear-response regime with
a steady drift velocity $\langle v_z\rangle \sim a\lambda/v_{\rm th}$; and

• at times $\tau = O(1/\epsilon^2)$, a long-time field-dominated
heating of the gas, where the velocity distribution
is no longer Maxwellian, and the kinetic energy of
the electrons grows without bounds as $t^{2/3}$,
whereas the drift velocity slowly vanishes
asymptotically: $\langle v_z\rangle \sim (\lambda a/t)^{1/3}$.


Domains of Application of the Multiple-Scale 
Method 


The multiple-scale method was first developed in 
nonlinear mechanics. It is fruitful and is even 
required in any instance where plain perturbation 
expansion is not uniformly convergent, more gen- 
erally when it is necessary to account jointly for 
variations at different timescales: resonant wave 
interactions, for example, in plasmas, or in the case 
of oscillations with slowly varying coefficients. 
Multiple-timescale method was applied, around 
1960, to get kinetic equations (closed equations for 
the one-particle distribution) from molecular 
dynamics (Liouville equation) for dilute gases, 
plasmas, or to establish a microscopic theory of 
Brownian motion from molecular dynamics of a 
hard-sphere system (see the section “Microscopic 
theory of Brownian motion”). In the same spirit, it 
allows to relate constructively different mesoscopic 
descriptions, for example, in the case of Brownian 
motion, to relate the Kramers equation for the 
distribution P(r, v, t) to the Smoluchowski equation 
for P(r,t) (see the section “Mesoscopic theory of 
Brownian motion”). Other examples are the deter- 
mination of transport coefficients (friction, viscosity) 
from kinetic description or, at macroscopic scale, 
the determination of eddy viscosity and eddy 
diffusivity (see the section “Effective diffusivity for 
a passively advected scalar”). A last domain of
application concerns systems where relaxation processes
at different scales superimpose, which requires
handling jointly different time dependences. The
multiple-scale method then displays the physics of the
relaxation process and its associated hierarchical
structure (e.g., the application to the adiabatic
piston problem discussed in this Encyclopedia by
Gruber and Lesne; see Adiabatic Piston; see also
the section “Some typical applications”).


A Brief Overview of Multiscale 
Approaches 


Different Scales and Regimes 


Common to all multiscale approaches is the focus on 
the very existence of different scales, exploited 
through the use of rescaled variables, which makes 
explicit the presence of a small parameter $\epsilon$ controlling
the dynamics, responsible for the existence of
different timescales and related to the scale separa- 
tion. Technically, the first, very simple but essential, 
step is to replace the variables, fields, and param- 
eters by their dimensionless counterparts. So doing, 
small parameters reflecting scale separation (in time, 
space, energies, amplitudes,...) will naturally 
appear. Although it is thus possible to estimate the 
order of the different terms, it is to be underlined 
that it gives no clue on their actual contribution to 
the long-term behavior: in singular situations, pre- 
cisely those where multiscale approaches have to be 
developed, small terms can have a noticeable 
influence at all scales. As illustrated in the following 
sections, different rescalings of variables and func- 
tions allow us to discriminate features at different 
scales and to capture different regimes. More
specifically, the techniques for handling the
joint contributions of several regimes at different
timescales depend on the way these regimes intermix.
They can be:


• superimposed regimes, when fast and slow dependences
intermingle in the evolution of the same
variable. This is the framework of multiple-scale
analysis. The solution typically reads
$x(t, \epsilon t, \epsilon^2 t, \ldots)$; or

• coexisting regimes, namely a coexistence of fast
and slow evolutions. One might focus either on
the fast evolution and use a quasistatic approximation
(or parametric approximation) for the
slow evolution, or on the slow evolution and
use a quasistationary approximation or an averaging
of the fast evolution. The solution typically reads
$[x_{\rm fast}(t), x_{\rm slow}(\epsilon t)]$ (or $[x_{\rm fast}(\tau/\epsilon), x_{\rm slow}(\tau)]$
if the observation takes place at long timescales,
with a relevant time variable $\tau = \epsilon t$); or

• successive regimes, when initial conditions, bulk
behavior and asymptotics are not of the same
order with respect to $\epsilon$; this is a boundary-layer-like
issue, and the solution typically reads
$x_{\rm layer}(t/\epsilon)$ for $0 < t < t_0$, then $x_{\rm bulk}(t)$ for $t > t_0$,
with $t_0 = O(1)$.

Applications are innumerable; the most typical
and investigated ones are the climate (from “hours”
for the observed weather to “thousands of years” for
eras), population dynamics, coasts and sand dunes
(from “grains” to “country” scales), protein folding
(the vibration of covalent bonds occurs at the scale of
femtoseconds, while the whole folding may require
up to a few seconds), or trading markets (from
seconds to years). Let us finally give two typical
examples for the parameter $\epsilon$:


• The weak-damping and high-friction limits, best
explained on an example. The damped oscillator
$m\ddot{x} + \gamma\dot{x} + V'(x) = 0$ appears as a Hamiltonian
dynamics $m\ddot{x} + V'(x) = 0$ as soon as the damping
can be neglected, when the characteristic time
$\theta = [m/V''(0)]^{1/2}$ of the undamped oscillator is far
smaller than the damping time $\tau = m/\gamma$. The
weak-damping limit is thus defined as $\epsilon \to 0$,
where $\epsilon = \theta/\tau = [\gamma^2/mV''(0)]^{1/2}$. It leads to a
singular behavior when investigating the asymptotics,
as in the Duffing oscillator and weakly
damped oscillator mentioned in the last section.
On the contrary, the evolution appears as a
dissipative gradient dynamics $\dot{x} = -V'(x)/\gamma$
as soon as $\tau \ll \theta$. This leads to the high-friction
limit: $\tau/\theta = [mV''(0)/\gamma^2]^{1/2} \to 0$. This example
somehow reconciles conservative and dissipative
dynamics, showing that they might coexist in the
same system.

• The hydrodynamic limit involved in the derivation
of hydrodynamics equations (namely incompressible
Navier-Stokes equations) from the kinetic
Boltzmann equation. It reads $\epsilon = \lambda/L \to 0$, where
$\epsilon$ is the so-called Knudsen number, defined as the
ratio of the mean free path $\lambda$ (the average distance
traveled by a fluid molecule between two successive
collisions) to a characteristic spatial scale $L$ of
the system (e.g., the size of an obstacle).


Bridging the Scales: Mean-Field, Singular 
and Scaling Approaches 


The aim of multiscale approaches is to bridge 
different scales, through the determination of the 
large-scale behavior of the solution, or by establish- 
ing a constructive relation between the initial model 
and an effective model at higher scale. We have 
mentioned in the introduction a first classification of 
multiscale systems and associated approaches: they 
might exhibit (1) scale decoupling, (2) some singu- 
larity in the relation between the different scales, or 
(3) scale invariance. 


Mean-field approaches In case of scale decoupling, 
mean-field approaches apply. Let us briefly recall, 
within its usual spatial formulation, that a mean-field
approach amounts to identifying the local
environment, which is a priori fluctuating and 
spatially inhomogeneous (e.g., the local magnetic 
field generated by neighboring spins in a spin lattice 
model) with the average one, expressed as a function 
of the average order parameter (spatial average or 
equivalently a statistical average in the limit as the 
system size tends to infinity). Mean-field approaches 
can be implemented either in time (averaging), in 
real space (homogenization, coarse-graining), or in 
phase space (aggregation and projection techniques). 

In the present context, the best example of a mean-field
approach is provided by homogenization procedures.
They can be traced back to the method of
Lagrange to solve the three-body problem. The issue is
to describe the motion of a light body $B_2$ experiencing
the gravitational attraction of the Sun and a heavier
body $B_1$. The mass of $B_2$ is supposed to be small
enough to neglect its influence on the Sun and $B_1$ (the
so-called restricted three-body problem); $B_1$ will thus
obey the Keplerian laws of motion. The method of
Lagrange applies when $B_2$ is far more distant from the
Sun than $B_1$ ($r_2 \gg r_1$), which implies (due to the third
law of Kepler: $\omega^2 r^3 = \mathrm{const.}$) that the angular velocity
$\omega_1$ of $B_1$ is far larger than $\omega_2$: the large body $B_1$ moves
faster than $B_2$ around the Sun. In first approximation,
Lagrange replaced the rapidly oscillating influence of
$B_1$ on the motion of $B_2$ by the influence of a constant
distribution of mass, obtained by spreading the mass
$m_1$ of $B_1$ all over its orbit. The Gauss theorem thus
states that this influence can be accounted for by
simply adding the total mass of this distribution to the
mass of the Sun. The stability of the system would
follow: $B_2$ will remain trapped in the neighborhood of
the pair composed of the Sun and $B_1$.


Singular perturbations A typical instance of singular
multiscale behavior is associated with asymptotic
expansions


$x(t) = \sum_{r=0}^{n-1} \epsilon^r x_r + R_n(\epsilon, t)$ [8]


which are not convergent: $\lim_{n\to\infty} R_n(\epsilon, t) \neq 0$ at
$\epsilon$ fixed, but $\lim_{\epsilon\to 0} \epsilon^{-(n-1)} R_n(\epsilon, t) = 0$ at fixed $n$ and $t$.
Asymptotic expansions are ubiquitous in multiscale
approaches: the coexistence of different timescales,
superimposed and nontrivially coupled to give rise to
the observed phenomenon, prevents one from obtaining
uniformly convergent perturbative expansions; it
is only in this latter regular case that the
above-mentioned mean-field approaches and homogenization
techniques apply.
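
The difference between an asymptotic and a convergent expansion can be made concrete on a classical textbook example that does not appear in the text above: the Stieltjes integral $I(\epsilon) = \int_0^\infty e^{-t}/(1 + \epsilon t)\,dt$, whose formal expansion $\sum_r (-1)^r r!\,\epsilon^r$ diverges for every $\epsilon > 0$. The sketch below (function names, the quadrature cutoff, and the value $\epsilon = 0.1$ are illustrative assumptions) compares partial sums with a Simpson estimate of the integral.

```python
import math


def stieltjes_integral(eps, t_max=80.0, n_steps=16000):
    """Composite Simpson estimate of int_0^inf exp(-t)/(1 + eps*t) dt.

    The integral is truncated at t_max, where exp(-t) is negligible;
    n_steps must be even.
    """
    h = t_max / n_steps

    def integrand(t):
        return math.exp(-t) / (1.0 + eps * t)

    total = integrand(0.0) + integrand(t_max)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0


def partial_sum(eps, n):
    """First n terms of the formal series sum_r (-1)^r r! eps^r."""
    return sum((-1) ** r * math.factorial(r) * eps ** r for r in range(n))


def remainders(eps, n_max):
    """|I(eps) - S_n| for n = 1..n_max, S_n being the n-term partial sum."""
    reference = stieltjes_integral(eps)
    return [abs(reference - partial_sum(eps, n)) for n in range(1, n_max + 1)]


if __name__ == "__main__":
    for n, r in enumerate(remainders(0.1, 30), start=1):
        print(n, r)
```

At fixed $\epsilon$ the remainder first decreases, reaches a minimum for a number of terms of order $1/\epsilon$, and then grows without bound, whereas at fixed $n$ it vanishes faster than the last retained power of $\epsilon$ as $\epsilon \to 0$.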


Scale invariance, scaling theories and renormalization 
Self-similarity and associated criticality prevent scale 
decoupling, but allow us to develop scaling theories 
and renormalization methods. In contrast to scale- 
separation arguments, the guiding principle is now 
to focus on the links relating one scale to the others 
(scaling transformations, renormalization transformations).
The problem complexity is thus reduced in
a somewhat “transverse” way, by retaining only scale-
invariant features. We shall expose in the section
“Renormalization: an iterated multiscale approach” 
further links between multiscale approaches and 
renormalization methods, beyond the restricted 
scope of scale-invariant systems: in many instances, 
renormalization can be seen as an iterated multiscale 
approach. 


Scaling Limits 


Let us mention a specific instance of multiscale 
approach, which is associated with scaling limits. 
Scaling limit refers to a joint limiting procedure, in 
which several independent variables jointly converge 
towards given limits, with prescribed relative beha- 
viors; this latter condition is a key point in the 
frequent case when the different limits do not 
commute, and we shall see later that it is an 
essential ingredient of renormalization methods. 
Let us cite two acknowledged examples: 


• The thermodynamic limit for a system of $N$ particles
in a volume $V$; it amounts to letting $N \to \infty$, $V \to \infty$,
while $N/V = n = \mathrm{const.}$ (constant average number
density). It is a prerequisite to derive standard
thermodynamic behavior from the statistical-mechanical
description; it supports the use of
asymptotic results given by the law of large numbers
and the central-limit theorem provided the correlations
between the particles remain short-range.

• The Boltzmann-Grad limit for a system of $n$ hard
spheres of radius $\epsilon$ per unit volume. In dimension
$d$, it reads $\epsilon \to 0$, $n \to \infty$ (thus differing from the
thermodynamic limit) while $n\epsilon^{d-1} = z$ remains
constant. This limit is involved in kinetic theory
as a limiting instance where the Boltzmann ansatz
applies (identifying the two-particle distribution
function with the product of the corresponding
one-particle distributions). Indeed, the occupied
volume fraction $n\epsilon^d$ tends to 0 so that recollisions
and ensuing long-term correlations can be
neglected (rarefied gas). On the other hand, the
mean free path of a particle remains finite, so that
numerous collisions and associated molecular
chaos further support the Boltzmann decorrelation
ansatz.
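
The arithmetic of the Boltzmann-Grad limit can be tabulated in a few lines. The sketch below follows a sequence of decreasing radii at fixed $n\epsilon^{d-1}$ and reports the number density, the occupied volume fraction and the mean free path. Geometric prefactors (volume of the unit ball, collision cross-section) are deliberately set to 1, which is an assumption of this illustration, as are the function name and the chosen radii.

```python
def boltzmann_grad_sequence(z=1.0, d=3, radii=(1e-1, 1e-2, 1e-3, 1e-4)):
    """Follow n, n*eps**d and 1/(n*eps**(d-1)) along eps -> 0 at fixed z.

    Geometric prefactors are set to 1 (assumption), so only the scaling
    with eps is meaningful.
    """
    rows = []
    for eps in radii:
        n = z / eps ** (d - 1)                     # density keeping n*eps^(d-1) = z
        volume_fraction = n * eps ** d             # tends to 0 like z*eps
        mean_free_path = 1.0 / (n * eps ** (d - 1))  # stays equal to 1/z
        rows.append((eps, n, volume_fraction, mean_free_path))
    return rows


if __name__ == "__main__":
    for row in boltzmann_grad_sequence():
        print("eps=%g  n=%g  volume fraction=%g  mean free path=%g" % row)
```

The density diverges and the gas becomes infinitely rarefied in terms of occupied volume, while the mean free path, and hence the collision rate per particle, stays finite.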


Stochastic Multiscale Approaches 


Multiscale approaches are far less developed for 
stochastic processes. Let us mention the case of a 
Markov process. Scale separation reflects in a 
spectral gap in the transition matrix generating the 
dynamics. Identification of fast and slow modes is 
then straightforward: slow modes are associated 
with quasidegenerate eigenvalues ($\lambda \approx 0$ in a
time-continuous setting), whereas fast dynamics is
associated with damped modes and negative eigenvalues
($\lambda < 0$, $|\lambda| \gg 1$) (Gaveau et al. 1999). A basic
difficulty in extending methods developed in a 
deterministic context is the fact that the reduction 
(or projection) of a Markov process is a priori no 
longer Markovian. Closure relations and approx- 
imations should be introduced to circumvent mem- 
ory effects, for example, supported by arguments 
of decorrelation and ensuing fast temporal self- 
averaging of the fast dynamics. 

Note that the behavior upon rescaling of a
stochastic process differs from the transformation of
a deterministic evolution. The basic relation is the
scaling, upon a time rescaling $\theta = \epsilon t$, of the white
noise involved in stochastic differential equations
and defined from the Wiener process $W(t)$ through
the relation $dW(t) = \eta(t)\,dt$. It follows from the
definition $W(\theta) = W(t)$ that $dW(\theta) = \sqrt{\epsilon}\, dW(t)$ (in law). At
this point, it is important to notice the difference
with respect to the behavior of a plain deterministic
function $f(\theta) = f(t)$, for which $df(\theta) = \epsilon\, df(t)$. Using
the fact that $\delta(t) = \epsilon\,\delta(\theta)$ and the definition
$dW(\theta) = \eta(\theta)\,d\theta$, we obtain that $\eta(\theta)$ is a white noise
with respect to the rescaled time $\theta$, that is, a
stationary Gaussian process defined by its first two
moments


$\langle \eta(\theta) \rangle = 0, \quad \langle \eta(\theta)\eta(\theta') \rangle = \delta(\theta - \theta')$ [9]
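
The square-root scaling of the Wiener increments can be illustrated by sampling. The sketch below draws increments $dW$ of a Wiener process over a physical time $dt = d\theta/\epsilon$, multiplies them by $\sqrt{\epsilon}$, and estimates the variance of the result, which should be close to $d\theta$, as expected for a Wiener process in the rescaled time. The function name, sample size, seed and parameter values are illustrative assumptions.

```python
import math
import random


def rescaled_increment_variance(eps=0.01, d_theta=0.5, n_samples=20000, seed=0):
    """Sample variance of sqrt(eps)*dW, with dW a Wiener increment over dt = d_theta/eps."""
    rng = random.Random(seed)
    dt = d_theta / eps  # physical time elapsed while theta advances by d_theta
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        dw = rng.gauss(0.0, math.sqrt(dt))   # Wiener increment: variance dt
        db = math.sqrt(eps) * dw             # rescaled increment
        total += db
        total_sq += db * db
    mean = total / n_samples
    return total_sq / n_samples - mean * mean


if __name__ == "__main__":
    print(rescaled_increment_variance())
```

A deterministic increment would instead pick up a full factor $\epsilon$, which is the contrast stressed above.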


Slow/Fast Variables 
Slow/Fast Decomposition 


Dynamics of systems made of many interacting 
elements, for example, chemical reactions, or popu- 
lation dynamics, typically involves far too many 
degrees of freedom to be handled at the level of 
individual units, and requires a drastic reduction to 
make sense of it. A natural way of reduction is based 
upon the phenomenology, taking as relevant degrees 
of freedom those describing the slow evolution 
observed at macroscopic scales. Scale separation 
between microscopic and macroscopic worlds has to 
be turned into a constructive and quantitative 
argument to achieve this reduction. 

Solving this typical multiscale issue first requires 
identifying and explicitly constructing the slow 
variables, for example, collective variables obtained 
through aggregation or coarse-grainings. The second 
step is to eliminate or rather integrate the fast 
dynamics into a closed system of effective equations 
describing the large-scale evolution. The closure 
requirement generically involves an approximation, 
neglecting the remaining dynamic coupling between 
fast and slow variables. It is precisely here that scale- 
separation arguments and the very choice of the 
slow variables are crucial, ensuring that the influ- 
ence of fast dynamics is essentially accounted for in 
its effective or average contribution to the slow 
dynamics; remaining fluctuating influences can be 
either neglected or included in a noise term, required 
to be fully determined as a function of the slow 
variable only (otherwise the whole procedure would 
neither be consistent nor useful). In the following 
subsections, we shall briefly present the main 
techniques for carrying out this program, 
considering the simple abstract system: 


$$ \frac{dX}{dt} = f(X, Y), \qquad \frac{dY}{dt} = \varepsilon\, g(X, Y), \qquad (\varepsilon \ll 1) $$ [10]


Although involving only two variables for simplicity, 
it exhibits the typical multiscale structure: 
whereas X varies on scales O(1), Y appears as a 
slow variable of characteristic timescale O(1/ε). 


Parametric Approximation 


The preliminary step of the reduction is to get some 
knowledge on the fast dynamics, at least to choose 
the proper multiscale technique. A plain but 
nevertheless fruitful remark is that a parameter p can 
always be seen as a variable that does not evolve: 
dp/dt = 0 in a deterministic setting, or W_{p→q} = 
δ(p − q) in a stochastic one (transition probability 
W). Conversely, a slow variable can be transiently 
treated as a mere parameter in the fast dynamics. 
Supported by timescale separation, this parametric 
approximation (or quasistatic approximation) 
decouples the fast dynamics from the slow variable 
evolution, investigating the fast dynamics 
asymptotics (t → ∞) while considering that the slow 
variable remains constant, Y(t) = y. In the following, 
we shall distinguish two cases: (1) the fast dynamics 
oscillates with a period T ≪ 1/ε, and (2) the fast dynamics 
relaxes to a stable equilibrium point X*(y) slaved to 
the slow variable. 


Amplitude Equations 


A ubiquitous technique to account for slowly 
modulated oscillations was first introduced by 
Fresnel for light propagation and optical 
phenomena. The basic idea is to take advantage of the scale 
separation between the fundamental oscillation 
(frequency ω, wavelength λ = 2π/k) and a 
superimposed slow variation of the wave amplitude 


$$ \mathcal{A}(r, t) = A(r, t)\, e^{i(k \cdot r - \omega t)}, \qquad K = |\nabla A / A| \ll k, \qquad \Omega = |\partial_t A / A| \ll \omega $$ [11]


The evolution can be rewritten in terms of the 
slowly varying amplitude A; by construction, it is 
ruled by terms involving the small parameter ε ∼ 
K/k ∼ Ω/ω ≪ 1, but the resulting equation is now 
devoid of small or large parameter. Such a technique 
has been successfully applied and further developed, 
for example, in various situations involving electro- 
magnetic waves (e.g., diffraction of Hertzian waves), 
in plasma physics (resonant interaction between 
electromagnetic waves and acoustic modes) and in 
quantum mechanics, to investigate the deformation 
of a wave packet in a potential. 


Averaging 


Let us discuss further, in a general setting, the case 
when the fast dynamics is an oscillation of period T 
(either linear modes as in the last subsection or a 
stable limit cycle). It is a context where averaging 
techniques apply. We refer to the associated entry in 
this Encyclopedia by Neishtadt (see the article 
Averaging Methods) and only mention here the 
main principle: to exploit scale separation and the 
self-averaging property of the fast dynamics to replace 
X(t) by an average value 


$$ X_{av}(t) = \frac{1}{T} \int_{t}^{t+T} X(s)\, ds $$


The underlying idea is that averaging cancels out 
most of the fast variations so that X_av(t) is now 
slowly varying. When the fast dynamics is 
influenced by the slow variable Y, its value is kept 
constant in the averaging (see the section 
“Parametric approximation”). The resulting average 
behavior X_av[Y(t), t] is reinjected in the evolution 
of the slow component, leading to a closed equation, 


$$ \frac{dY}{dt} = \varepsilon\, g(X_{av}[Y(t), t], Y) $$

or rather

$$ \frac{d\tilde Y}{d\tau} = g(X_{av}[\tilde Y(\tau), \tau], \tilde Y) $$ [12]

in terms of the more relevant rescaled time variable 
τ = εt and Ỹ(τ) = Y(t). Denoting by Ȳ(τ) the solution 
of this approximate equation and by Y_ε(τ) the exact 
solution, the validity of the averaging procedure is 
assessed by theorems giving conditions ensuring that 
lim_{ε→0} Y_ε(τ) = Ȳ(τ). 
Note that such theorems (quite unusually) state 
the convergence, for a vanishing value of the 
perturbation parameter ε, of the exact solutions 
towards the approximate one (solution of the 
average equations). 
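
The convergence statement can be illustrated on a toy case. In the sketch below, everything specific is an assumption made for illustration and is not taken from the text: the fast variable is the prescribed oscillation X(t) = cos t (period T = 2π, so X_av = 0), the slow equation is dY/dt = ε(1 − Y + X), and the function names are hypothetical. The averaged equation dȲ/dτ = 1 − Ȳ then has the closed-form solution Ȳ(τ) = 1 + (y0 − 1)e^{−τ}, which is compared with a numerical solution of the exact equation at τ = 1.

```python
import math

def rk4_step(rhs, t, y, dt):
    """One classical Runge-Kutta step for a scalar ODE dy/dt = rhs(t, y)."""
    k1 = rhs(t, y)
    k2 = rhs(t + dt / 2, y + dt * k1 / 2)
    k3 = rhs(t + dt / 2, y + dt * k2 / 2)
    k4 = rhs(t + dt, y + dt * k3)
    return y + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def averaging_error(eps, y0=0.0, tau_end=1.0, dt=0.01):
    """|Y_eps(tau_end) - Ybar(tau_end)| for the toy system described above."""
    def exact_rhs(t, y):
        # Exact slow equation, driven by the fast oscillation X(t) = cos(t).
        return eps * (1.0 - y + math.cos(t))

    n_steps = int(round(tau_end / (eps * dt)))  # integrate up to t = tau_end / eps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = rk4_step(exact_rhs, t, y, dt)
        t += dt
    # Averaged equation dYbar/dtau = 1 - Ybar (X replaced by X_av = 0).
    y_avg = 1.0 + (y0 - 1.0) * math.exp(-tau_end)
    return abs(y - y_avg)

for eps in (0.1, 0.01):
    print(f"eps = {eps}: error at tau = 1 is {averaging_error(eps):.4f}")
```

In this example the discrepancy at fixed τ shrinks roughly in proportion to ε, which is the kind of behavior the averaging theorems make precise.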

To conclude, let us notice that one speaks of 
averaging in temporal context and homogenization 
in spatial or spatio-temporal contexts, when aver- 
aging is performed over space; as discussed in the 
section “Bridging the scales: mean-field, scalar, and 
scaling approaches,” averaging and homogenization 
belong to the general class of mean-field 
approximations. 


Quasistationary Approximation 


Let us now consider the case when the fast dynamics 
converges at fixed Y towards a stable fixed point 
X*(Y). Focusing on the slow dynamics, the relevant 
time variable is τ = εt, which turns the evolution 
[10] into 


$$ \varepsilon \frac{dX}{d\tau} = f(X, Y), \qquad \frac{dY}{d\tau} = g(X, Y) $$ [13]


(for the sake of simplicity, we use the same notation 
X for both X(t) and X(τ)). It is solved in two steps, by 
noticing that at lowest order in ε, the fast dynamics 
reduces to the asymptotic regime f(X, Y) = 0, slaved to 
the slow variable Y. The corresponding stable state 
X*(Y) is then plugged into the slow dynamics to get a 
closed equation for Y(τ): 

$$ \frac{dY}{d\tau} = g[X^*(Y), Y] \equiv G(Y) $$ [14]

This achieves the desired dimensional reduction. It 
works equally well when X is a string of variables 
X = (X_1, …, X_N). 

There is seemingly a paradox here, ubiquitous in 
many multiscale approaches: in order to determine 
the evolution of the slow variable Y, it is considered 
a constant! The solution lies in scale separation: the 
trick is to consider the ensuing approximate 
decoupling as an exact one (which it would be in the limit 
ε → 0). In other words, the constancy of Y is 
considered over a time length which is long at the 
level of fast dynamics (Δt ≫ 1), long enough for X 
to reach its equilibrium state X*(Y), but short at 
the macroscopic level (εΔt = Δτ ≪ 1). As in the 
so-called “quasistatic evolutions” encountered in 
thermodynamics, the large-scale evolution will be 
composed of a continued succession of local 
equilibrium states: at each time τ, X takes its 
instantaneous equilibrium value, slaved to Y(τ). 
Here one speaks equivalently of quasistationary 
approximation, quasisteady-state approximation, or 
adiabatic elimination of fast variables. 
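
As a minimal sketch of this two-step procedure, consider the following toy instance of the slow/fast system. The choice f(X, Y) = Y − X and g(X, Y) = −X, the initial data and the function names are assumptions made for illustration only. The slaved state is X*(Y) = Y, so the reduced equation is dY/dτ = G(Y) = −Y, with solution Y(τ) = y0 e^{−τ}. The full system is integrated in the fast time t and compared with the reduced prediction at τ = 1.

```python
import math

def rk4(rhs, state, dt, n_steps):
    """Integrate d(state)/dt = rhs(state) with the classical RK4 scheme."""
    for _ in range(n_steps):
        k1 = rhs(state)
        k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
        k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
        k4 = rhs([s + dt * k for s, k in zip(state, k3)])
        state = [s + dt * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state

def quasistationary_check(eps, y0=1.0, tau_end=1.0, dt=0.01):
    """Return (|Y_full - Y_reduced|, |X_full - X*(Y_full)|) at tau = tau_end."""
    def full(z):
        x, y = z
        return [y - x, -eps * x]  # dX/dt = f(X, Y), dY/dt = eps * g(X, Y)

    n_steps = int(round(tau_end / (eps * dt)))
    x, y = rk4(full, [y0, y0], dt, n_steps)   # start on the slaved state X*(y0)
    y_reduced = y0 * math.exp(-tau_end)       # solution of dY/dtau = -Y
    return abs(y - y_reduced), abs(x - y)

err_y, err_x = quasistationary_check(eps=0.01)
print(f"error on Y: {err_y:.4f}, distance of X to X*(Y): {err_x:.4f}")
```

Both discrepancies are of order ε in this example: the slow variable follows the closed equation, and X stays within O(ε) of its instantaneous equilibrium value.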


Slow Invariant Manifolds 


In the previous subsections, the decomposition 
between fast variables X and slow variables Y was 
given. But in practice, only the whole dynamics of 
the system is known and a main part of the issue is 
to find and construct explicitly the slow variables. 

A geometrical viewpoint on the dynamics 
appears to be fruitful: if the system evolution is 
to be reducible to the evolution of a few degrees 
of freedom, it means that the flow essentially lives 
in a low-dimensional region of the phase space, 
which can be parametrized by these degrees of 
freedom up to some fuzziness of order O(ε). 
Mathematical investigations have been conducted 
to assess this point, leading to the concept of 
invariant slow manifold: a manifold M of the 
phase space, invariant upon the dynamics and 
describing the slow dynamics once the system has 
reached it (Gorban et al. 2004). Starting from an 
arbitrary point z_0, the trajectory first exhibits a 
fast transient bringing the system state close to M, 
up to some tolerance of order O(ε), then sticks to 
M. Its evolution on M is ruled by a reduced 
dynamics, far slower than the fast relaxation to 
M as soon as the system actually exhibits a 
timescale separation. This latter self-consistent 
assertion should be considered as a working 
hypothesis, to be validated by the explicit deter- 
mination of M and associated reduced dynamics. 
This can be done numerically, by exploiting the 
presumed convergence property of any trajectory 
reaching M after some intrinsic transients. In 
other words, if the dynamics possesses a slow 
invariant manifold, an operational way to find M 
is to let the system evolve, starting from a sample 
of initial conditions, and to observe its stabiliza- 
tion on M. 
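
This operational recipe can be sketched on a toy system. The system dX/dt = Y² − X, dY/dt = −εY, the sample of initial conditions and all function names are illustrative assumptions, not taken from the text. The candidate slow manifold is x = y², and the script measures how far each trajectory ends up from it after the fast transient.

```python
def rk4(rhs, state, dt, n_steps):
    """Integrate d(state)/dt = rhs(state) with the classical RK4 scheme."""
    for _ in range(n_steps):
        k1 = rhs(state)
        k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
        k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
        k4 = rhs([s + dt * k for s, k in zip(state, k3)])
        state = [s + dt * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state

def distance_to_manifold(x0, y0=1.0, eps=0.01, t_end=20.0, dt=0.01):
    """Distance |x - y^2| to the candidate manifold after the transient."""
    def rhs(z):
        x, y = z
        return [y * y - x, -eps * y]  # fast relaxation of x, slow decay of y

    x, y = rk4(rhs, [x0, y0], dt, int(round(t_end / dt)))
    return abs(x - y * y)

# Sample of initial conditions, all with the same y0 but scattered in x.
distances = [distance_to_manifold(x0) for x0 in (-2.0, -1.0, 0.0, 1.0, 2.0)]
print(["%.5f" % d for d in distances])
```

In this example all trajectories end at the same small distance from x = y², of order ε, whatever their starting point: the memory of the initial condition is lost during the transient, and what remains is the O(ε) offset between the actual slow manifold and its lowest-order approximation.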

This framework obviously embeds the quasistationary 
approximation presented in the last subsection: 
in this case, the slow invariant manifold is 
M = {z = (x, y), f(z) = 0} = {(x*(y), y)} and the dynamics 
restricted to M is the slow dynamics dy/dτ = 
G[y(τ)], x(τ) = x*[y(τ)]. Here the manifold is 
invariant upon the approximate dynamics (for all 
t, f[z(t)] = 0, hence z(t) ∈ M) but not upon the 
original one: some rigorous mathematical work has 
to be done to show that the actual dynamics keeps 
the trajectory in a proper neighborhood of M of 
width O(ε). In other words, one has to control the 
discrepancy between the exact trajectory and the 
trajectory slaved on M. 


Central Manifold 


The notion of slow invariant manifold generalizes 
older results about central manifolds, exploited to 
reduce the dynamics near a bifurcation point. Let us 
consider a dynamical system ẋ = f(x, α) near a 
bifurcation point: at α = α_c, the fixed point x_0, 
stable for α < α_c, loses its stability. This reflects on 
the largest eigenvalue(s) of the stability matrix 
Df(x_0, α), namely λ_1(α) < 0 for α < α_c, λ_1(α) > 0 
for α > α_c, and λ_1(α_c) = 0. The small parameter ε 
then measures the distance to the bifurcation, 
vanishing at α = α_c. A main result was to show that, near the 
bifurcation point, slow modes coincide with 
unstable directions and fast modes with stable 
directions (Haken 1996). The decomposition into 
slow and fast variables is ruled by the central 
manifold theorem: the solutions can be expressed 
in terms of the amplitudes along the eigenvectors of 
the null space of the dynamics at ε = 0; these 
amplitudes appear as the relevant order parameters 
near the bifurcation. This is referred to as the slaving 
principle. Compared to the setting presented in the 
subsection “Slow invariant manifolds,” the slow 
invariant manifold M is given here by the central 
manifold. 


Projection Techniques 


The methods presented in the previous subsections 
to eliminate fast variables and construct a reduced 
slow dynamics can be unified into a common 
framework: Mori-Zwanzig projection techniques. 
The full state (x, y) of the system is projected onto 
the slow variable y and the functions w(x, y) are 
projected onto their conditional expectation 


$$ Pw(y) = \int w(x, y)\, p(x|y)\, dx $$ [15]


The core of the method lies in the choice of the 
conditional distribution p(x|y), for instance, 
p(x|y) = δ(x − x*(y)) when there is an 
invariant manifold x = x*(y), or p(x|y) = 1/2π 
when averaging over a rapidly varying phase x. We 
refer to Givon et al. (2004) for a review. 


Aggregation Techniques and Coarse-Grainings 


An intuitive guideline in the analysis of a multiscale 
dynamics is that collective variables or coherent 
states coincide with slow modes. The rationale is 
that numerous fast fluctuations at the level of agent 
dynamics self-average, so that only a slow trend is 
perceptible at large scale. Aggregation methods have 
been developed in this spirit to build reduced models 
governing the slow dynamics. Nevertheless, in 
generic situations, aggregation does not lead to 
closed equations for the collective variables and 
some level of approximation has to be introduced. 

Let us now consider a system of N coupled 
degrees of freedom, [x_i(t)]_{i=1,…,N} (e.g., a system of N 
interacting agents) evolving deterministically 
according to a two-scale dynamics (Auger and Bravo de la 
Parra 2000): 


$$ \frac{dx_i}{dt} = f_i(x_1, \ldots, x_N) + \varepsilon\, g_i(x_1, \ldots, x_N) $$ [16]


where f_i describes a fast evolution due to the 
coupling between species and g_i a slow evolution 
due to internal mechanisms. A natural choice for the 
slow variable is Y(x_1, …, x_N) = Σ_i x_i, but we shall 
write below the general case. The self-consistent 
requirement of the method is that this variable Y 
reflect a global and slow behavior. Considering t as 
a fast time variable, this condition amounts to 
requiring a quasistatic behavior for Y at this timescale. 
In other words, the consistency condition requires 
that there exists a manifold F_y such that 


$$ \sum_{i=1}^{N} \frac{\partial Y}{\partial x_i}(x_1, \ldots, x_N)\, f_i(x_1, \ldots, x_N) = 0 \quad \text{on} \quad F_y = \{(x_1, \ldots, x_N),\; Y(x_1, \ldots, x_N) = y\} $$ [17]


We, moreover, assume that the fast dynamics on this 
manifold F_y leads to a stable equilibrium 
(x_1*(y), …, x_N*(y)). We are then in a position to 
describe the slow evolution of the manifold itself, 
that is, the slow dynamics ruling the evolution of the 
aggregated variable y for ε small enough: 


$$ \frac{dy}{dt} = \varepsilon \sum_{i=1}^{N} \frac{\partial Y}{\partial x_i}\, g_i[x_1^*(y), \ldots, x_N^*(y)] + O(\varepsilon^2) \equiv \varepsilon\, G(y) + O(\varepsilon^2) $$ [18]


Internal support of the procedure is to check the 
structural stability of this resulting aggregated 
dynamics. Compared to the quasistationary approx- 
imation and slaving principle presented earlier, here 
the slow variable is not given independently but 
constructed as a function of the fast variables 
(aggregated variable). The same principles can also 
be implemented for discrete-time models. 
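
A two-patch toy model gives a concrete instance of this aggregation. Everything specific below is an assumption made for illustration: the linear migration terms f_1 = −a x_1 + b x_2 and f_2 = a x_1 − b x_2, the growth terms g_i = r_i x_i, the parameter values, the fast-time form dx_i/dt = f_i + ε g_i, and the function names. Since f_1 + f_2 = 0, the total Y = x_1 + x_2 is unchanged by the fast dynamics. The fast equilibrium at fixed y is x_1* = b y/(a + b), x_2* = a y/(a + b), so the aggregated equation reads dy/dt = ε r_eff y with r_eff = (r_1 b + r_2 a)/(a + b).

```python
import math

def rk4(rhs, state, dt, n_steps):
    """Integrate d(state)/dt = rhs(state) with the classical RK4 scheme."""
    for _ in range(n_steps):
        k1 = rhs(state)
        k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
        k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
        k4 = rhs([s + dt * k for s, k in zip(state, k3)])
        state = [s + dt * (p + 2 * q + 2 * r + w) / 6
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4)]
    return state

def aggregation_check(eps=0.01, a=1.0, b=2.0, r1=1.0, r2=-0.5,
                      tau_end=1.0, dt=0.01):
    """Return (y from the full two-patch model, y from the aggregated model)."""
    def full(z):
        x1, x2 = z
        f1, f2 = -a * x1 + b * x2, a * x1 - b * x2  # fast migration, f1 + f2 = 0
        g1, g2 = r1 * x1, r2 * x2                   # slow growth in each patch
        return [f1 + eps * g1, f2 + eps * g2]

    n_steps = int(round(tau_end / (eps * dt)))      # up to fast time tau_end / eps
    x1, x2 = rk4(full, [0.5, 0.5], dt, n_steps)     # y0 = 1, off the fast equilibrium
    r_eff = (r1 * b + r2 * a) / (a + b)             # growth rate of the aggregate
    return x1 + x2, math.exp(r_eff * tau_end)

y_full, y_aggregated = aggregation_check()
print(f"full model: {y_full:.4f}, aggregated model: {y_aggregated:.4f}")
```

The two values agree up to a correction of order ε in this example, even though the initial state is not on the fast equilibrium: the initial transient only shifts the aggregated variable by O(ε).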

Coarse-graining can be seen as the spatial analog 
of aggregation techniques developed in the phase 
space: the real space is split into cells considered as 
elementary units at macroscopic scale, and all the 
small-scale physics is averaged over each cell, 
yielding the apparent state of each unit (described 
by a few “coarse-grained” variables) and the 
effective interactions between them. 

Let us cite two hydrodynamic examples. Eddy 
viscosity refers to an effective viscosity involved in 
coarse-grained hydrodynamics equations; the 
contribution of small-scale turbulent structures is 
accounted for in an integrated way in this 
parameter, hence its name. It is typically lower than bare 
viscosity, even possibly reaching negative values at 
large enough Reynolds number, that is, at low 
enough bare viscosities. Cellular flows are space- 
periodic flows, thus exhibiting a natural spatial 
scale: the coarse-graining amounts to an intrinsic 
homogenization over each cell of the flow. 

Let us finally mention that coarse-grainings are 
involved in renormalization-group transformations 
once supplemented with the adequate rescalings (see 
the section “Renormalization: an iterated multiscale 
approach”). 

In conclusion, note that all these various 
multiscale approaches are closely related and can all 
be expressed as a specific projection technique in the 
extended phase space containing both fast and slow 
variables. For instance, aggregation techniques 
replacing the fast variables (x_1, …, x_N) by the slow 
collective variable y = Y(x_1, …, x_N) amount to the 
projection technique involving the slow invariant 
manifold M_y = {(x_1, …, x_N), y = Y(x_1, …, x_N)}. 


Numerical Aspects 


In the community of applied mathematics, multi- 
scale methods refer specifically to numerical homo- 
genization, involving multigrid algorithms as, for 
instance, multiscale finite-element method, multigrid 
Monte Carlo, multigrid optimization, or annealing. 
Basically, the idea of numerical homogenization is 
to avoid the numerical cost of using a mesh of size 
h ≪ ε, where ε is the scale of the smallest-scale 
features of the dynamics, and to use jointly: 


• a fine mesh, to compute local quantities 
independently (hence with a parallelized program); and 

• a coarse mesh, to compute global behavior using 
effective parameters and homogenized quantities 
determined in the prior fine-mesh computation. 


We refer to Gorban et al. (2004) for a review. 


Boundary Layers and Matched 
Expansions 


Purposes and Principles 


The multiscale approach to boundary layers was 
introduced in 1905 by Prandtl in fluid mechanics for 
situations where the solution of hydrodynamics 
equations far from the boundaries (“bulk” solution) 
does not match the conditions at the surface of the 
walls or obstacles. This typically originates in the 
presence of a multiplicative small factor ε in front of 
the highest-order derivative; accordingly, the flow 
exhibits two different scales in space: a thin 
boundary layer of width controlled by ε and the 
bulk domain. The idea is to perform two different 
perturbation methods in the layer and in the bulk, 
involving a different rescaling in order to focus on 
and give the ruling place to either the boundary 
conditions or the bulk dynamics (one also speaks of 
inner and outer expansions). Then these parallel 
perturbation expansions have to be bridged into a 
single global continuous solution. The matching 
principle is to identify the asymptotic behavior on 
the boundary side with the boundary condition of 
the bulk behavior (Nayfeh 1973): 


$$ \lim_{r \to 0} X_{bulk}(r) = \lim_{\zeta \to \infty} X_{layer}(\zeta) \quad \text{with} \quad \zeta = r/\varepsilon $$ [19]


Boundary layers of hydrodynamics have numer- 
ous analogs: initial layers in chemical kinetics, skin 
layers in electrodynamics and edge layers in solid- 
state physics (Nayfeh 1973). Adaptation of this 
technique is to be developed to determine the 
complete dynamics in the slow-invariant-manifold 
approach, matching the fast relaxation towards the 
manifold with the slow motion onto the manifold. 
Let us finally note that the matched-expansion 
approach can benefit, in each region, from all the 
above-mentioned multiscale techniques. 


Time Analog: Implementation for Initial Layers 


We shall now work out the time analog of a 
boundary-layer problem on the abstract example 
encountered in [10], in the case when X rapidly 
evolves to a slaved equilibrium state X*(Y) but with 
initial conditions Y(0) = y_0 and X(0) = x_0 ≠ X*(y_0). 
Obviously, the quasistationary approximation fails 
to describe the initial regime and its applicability 
has to be reconsidered. The general principle of 
boundary-layer analysis, namely the recourse to two 
different perturbation approaches, is implemented as 
follows: 


• For the initial regime, one solves the fast 
dynamics with initial conditions X(0) = x_0 while 
keeping Y(t) = y_0; this yields an approximate 
solution [X_layer(t), Y_layer(t)], satisfying the initial 
conditions and valid at short times, as long as Y 
has not evolved. 

• At longer times, the relevant variable is the 
rescaled time τ = εt and the quasistationary 
approximation described in the last section 
applies. 


The consistency of the two perturbative 
approaches is ensured by the matching conditions 

$$ \lim_{\tau \to 0} X_{bulk}(\tau) = \lim_{t \to \infty} X_{layer}(t), \qquad \lim_{\tau \to 0} Y_{bulk}(\tau) = \lim_{t \to \infty} Y_{layer}(t) = y_0 $$ [20]


These conditions are actually satisfied since X_bulk(τ) = 
X*[Y_bulk(τ)], hence lim_{τ→0} X_bulk(τ) = X*(y_0) and, by 
definition of X* (at fixed Y(t) = y_0), lim_{t→∞} 
X_layer(t) = X*(y_0). 
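
The matching can be made concrete on a toy system. All specifics below are assumptions for illustration: the choice f(X, Y) = Y − X and g(X, Y) = −X, the initial data x_0 = 0 and y_0 = 1, and the function names. Here X*(Y) = Y, the layer solution is X_layer(t) = y_0 + (x_0 − y_0)e^{−t}, and the bulk solution is X_bulk(τ) = Y_bulk(τ) = y_0 e^{−τ}. The sketch also uses the standard additive composite of matched asymptotics, X_layer + X_bulk minus their common limit X*(y_0); this construction is not stated in the text and is added here only to get a single curve that can be compared with the full numerical solution.

```python
import math

def composite_error(eps, x0=0.0, y0=1.0, tau_end=1.0, dt=0.01):
    """Maximum of |X_full(t) - X_composite(t)| for 0 <= t < tau_end / eps."""
    def rhs(x, y):
        return y - x, -eps * x  # dX/dt = f(X, Y), dY/dt = eps * g(X, Y)

    x, y = x0, y0
    max_err = 0.0
    for step in range(int(round(tau_end / (eps * dt)))):
        t = step * dt
        x_layer = y0 + (x0 - y0) * math.exp(-t)  # fast relaxation at frozen Y = y0
        x_bulk = y0 * math.exp(-eps * t)         # slaved state X*(Y_bulk(tau))
        x_comp = x_layer + x_bulk - y0           # subtract the common limit X*(y0)
        max_err = max(max_err, abs(x - x_comp))
        # One classical RK4 step for the full two-variable system.
        k1 = rhs(x, y)
        k2 = rhs(x + dt * k1[0] / 2, y + dt * k1[1] / 2)
        k3 = rhs(x + dt * k2[0] / 2, y + dt * k2[1] / 2)
        k4 = rhs(x + dt * k3[0], y + dt * k3[1])
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return max_err

print(f"max composite error for eps = 0.01: {composite_error(0.01):.4f}")
```

The quasistationary solution alone misses the initial value by |x_0 − X*(y_0)| = 1 in this example, whereas the composite stays within a few ε of the full solution over the whole time range.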


Some Typical Applications 


Enzymatic catalysis A matched singular 
perturbation approach is commonly encountered in chemical 
systems, for instance, in the derivation of the 
Michaelis-Menten kinetics for a single enzyme and 
the Hill cooperative kinetics for an allosteric 
enzyme (Murray 2002). Denoting by E the enzyme, 
by S the substrate, by ES the active complex, and by 
P the product, the single-enzyme catalytic transfor- 
mation of S into P is described by the following 
scheme: 


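$$ S + E \underset{k'}{\overset{k}{\rightleftharpoons}} ES \overset{k_{cat}}{\longrightarrow} P + E, \qquad [S] = s, \quad [E] = e, \quad [ES] = c $$ [21]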


where, as is well known, the enzyme is released at 
the end. Introducing dimensionless quantities 


$$ \tilde t = k e_0 t, \quad \tilde s = \frac{s}{s_0}, \quad \tilde c = \frac{c}{e_0}, \quad K_m = \frac{k' + k_{cat}}{k}, \quad \tilde K_m = \frac{K_m}{s_0}, \quad \lambda = \frac{k_{cat}}{k s_0}, \quad \varepsilon = \frac{e_0}{s_0} $$ [22]


the corresponding chemical kinetic equations can be 
written as 


$$ \frac{d\tilde s}{d\tilde t} = -\tilde s + (\tilde s + \tilde K_m - \lambda)\,\tilde c, \qquad \varepsilon \frac{d\tilde c}{d\tilde t} = \tilde s - (\tilde s + \tilde K_m)\,\tilde c $$ [23]


Noticing that ε ≪ 1 (the enzyme is present in 
infinitesimal quantities compared to the substrate), 
a quasistationary approximation applies for the 
variable c̃: it means that the intermediary species 
ES rapidly reaches a local equilibrium state c̃ = c̃*(s̃). 
This yields the substrate evolution 


$$ \frac{d\tilde s}{d\tilde t} = -\frac{\lambda\, \tilde s}{\tilde s + \tilde K_m} $$ [24]


The initial condition is set only on the substrate: 
s(0) = s_0, that is, s̃(0) = 1. It yields the well-known 
expression of the velocity V = |ds/dt|_{t=0} as a 
function of the initial substrate concentration: 
V(s_0) = e_0 k_cat s_0/(s_0 + K_m) (with a maximal value 
V_max = e_0 k_cat). The quasistationary value for the 
complex (dimensionless) concentration c̃*(s̃ = 1) = 
1/(1 + K̃_m) at t̃ = 0 obviously differs from the actual 
initial condition c̃(0) = 0: besides, it is quite 
foreseeable that the transients leading the complex ES to its 
stationary value cannot be described using a 
quasistationary approximation. At short times, the 
relevant time variable is the fast rescaled time 
θ = t̃/ε, leading to the equation describing the initial 
regime when supplemented with the actual initial 
condition c̃(0) = 0, s̃(0) = 1. The analysis is 
straightforwardly carried out, exactly as in the general 
abstract case, with a matching condition lim_{θ→∞} 
c̃(θ) = c̃*(t̃ = 0) = 1/(1 + K̃_m). 
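
A short numerical comparison illustrates how well the quasistationary reduction performs. The sketch assumes the standard dimensionless form of the single-enzyme kinetics obtained by mass action from the scheme above, namely ds/dt = −s + (s + K − λ)c and ε dc/dt = s − (s + K)c, with K the dimensionless Michaelis constant and λ the dimensionless catalytic rate. This notation, the parameter values and the function names are assumptions made for illustration. The reduced model is ds/dt = −λ s/(s + K).

```python
def rk4(rhs, state, dt, n_steps):
    """Integrate d(state)/dt = rhs(state) with the classical RK4 scheme."""
    for _ in range(n_steps):
        k1 = rhs(state)
        k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
        k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
        k4 = rhs([s + dt * k for s, k in zip(state, k3)])
        state = [s + dt * (p + 2 * q + 2 * r + w) / 6
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4)]
    return state

def michaelis_menten_check(eps=0.01, K=1.0, lam=0.5, t_end=2.0, dt=0.001):
    """Return (s_full, c_full, s_reduced) at dimensionless time t_end."""
    def full(z):
        s, c = z
        return [-s + (s + K - lam) * c,   # substrate balance
                (s - (s + K) * c) / eps]  # fast complex dynamics

    def reduced(z):
        (s,) = z
        return [-lam * s / (s + K)]       # quasistationary substrate equation

    n_steps = int(round(t_end / dt))
    s_full, c_full = rk4(full, [1.0, 0.0], dt, n_steps)  # actual initial data
    (s_red,) = rk4(reduced, [1.0], dt, n_steps)
    return s_full, c_full, s_red

s_full, c_full, s_red = michaelis_menten_check()
print(f"s (full) = {s_full:.4f}, s (quasistationary) = {s_red:.4f}")
print(f"c (full) = {c_full:.4f}, c*(s) = {s_full / (s_full + 1.0):.4f}")
```

With these illustrative values the substrate curves differ by an amount of order ε, and after the initial layer the complex concentration stays very close to its slaved value c*(s) = s/(s + K), even though it starts from c = 0.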


Kinetic theory Time-matched expansions have 
been developed in kinetic theory, for instance, to 
describe the fate of a tagged particle within a gas. In 
a first, short stage (kinetic stage) following the 
injection of the particle in the thermally equilibrated 
gas, the velocity distribution of the particle rapidly 
evolves due to collisions with gas molecules and 
associated momentum transfer. This stage lasts a 
few mean-free-times and it ends when the tagged- 
particle distribution is almost Maxwellian. Then, in 
a second stage (hydrodynamic stage), the distribu- 
tion slowly relaxes towards a spatially uniform 
distribution, ultimately equal to the equilibrium 
Maxwell–Boltzmann distribution; at each time, the 
velocity distribution is almost Maxwellian. The 
particle dynamics is described at the level of its 
distribution function by the Boltzmann equation, 
and the resolution (the so-called Chapman–Enskog 
method) is based on the above general principles. 


The adiabatic-piston problem A matched two- 
timescale perturbation approach has been developed 
for the adiabatic piston problem: an isolated cylinder 
filled with an ideal gas (noninteracting light particles 
of mass m) is separated in two compartments by a 
moving piston, of mass M, adiabatic in the sense that 
it has no internal degrees of freedom and does not 
conduct heat when fixed. The small parameter is the 
mass ratio ε = 2m/(M + m). It quantifies the 
ciency of energy transfer between the gas particles 
and the piston upon elastic collisions, and the 
strength of the indirect coupling of the two gas 
compartments through the collisions of their particles 
with one and the same piston. The matched 
perturbation approach gives access both to a fast 
deterministic relaxation towards mechanical equili- 
brium, at timescales O(1), with no heat transfer 
between the compartments, and a slow fluctuation- 
driven evolution towards thermal equilibrium, where 
the heat transfer is achieved by the collision-induced 
coupling between the gas and the piston fluctuating 
motion, thus occurring at timescales O(M/m) (see 
Adiabatic Piston). 


Renormalization: An Iterated 
Multiscale Approach 


It is not the place to expose or even summarize the 
implementation of renormalization techniques, for 
which we refer to the associated entries in this 
Encyclopedia. Here we will only stress the natural 
relations between renormalization group (RG) and 
multiscale approaches. The RG approach indeed 
shares many steps and guiding principles: joint 
rescalings, coarse-grainings and local averaging, 
effective parameters and effective terms, relevant 
and irrelevant contributions, with a focus on large- 
scale behavior. Moreover, far beyond the scope of 
the study of critical phenomena, RG has been 
extended into an iterated multiscale approach 
making it possible to determine, in a systematic and 
constructive way, the effective equation describing the 
universal large-scale features and asymptotics of a 
multiscale system (see, e.g., Chen et al. (1996) and 
Mazzino et al. (2004)). 

It is first to be underlined that different meanings 
are associated with the term “renormalization,” 
corresponding to very different statuses for the 
associated renormalization procedures. 

A renormalized quantity can be plainly a rescaled 
quantity (normalized, dimensionless or put to the 
scale of the considered sample): here arises a first 
connection with multiscale approaches, both involv- 
ing rescalings as an essential preliminary step. 

A renormalized quantity can be an effective 
quantity accounting in an integrated way of com- 
plicated underlying mechanisms (e.g., the renorma- 
lized mass of a body moving in a fluid, accounting 
for hydrodynamic effects); here arises another 
central notion of multiscale approaches: effective 
parameters or effective equations (following, e.g., 
from averaging or homogenization). 

Renormalization is also a mathematical technique 
developed first in celestial mechanics, and then 
mainly in quantum electrodynamics to regularize 
divergent expansions and perturbation series. It 
might proceed by means of resummation; the idea, 
implemented by Rayleigh in 1917, is to sum up 
correlations and interactions into a redefinition of 
the parameters. Alternatively, it might rely on the 
introduction of a cutoff in the space, time, and energy scales, 
then accounting in an effective way for the host 
of contributions at smaller space and time scales 
Δx < Λ, Δt < θ (or, equivalently, larger momentum 
and frequency scales: k > 2π/Λ, ω > 2π/θ) so as to 
take advantage of the physical cancellation of 
mathematical divergences. In any case, it turns the 
bare parameters of the original singular expansion 
into renormalized parameters and yields a 
renormalized regular expansion. Writing that the resulting 
large-scale behavior does not depend on the chosen 
cutoff (Λ, θ) yields renormalization equations, 
expressing quantitatively the very consistency of 
the procedure (“renormalizability” of the 
expansion). Renormalization provides alternative technical 
tools in instances treated above with the 
multiple-scale method. Its main advantage is its recursive 
structure: introducing a sequence (Λ_n, θ_n)_n of cutoffs 
(what is called momentum-shell RG), the whole 
procedure can be iterated to integrate recursively the 
influence of small-scale features on the asymptotic 
behavior, making it possible to handle situations exhibiting 
a hierarchy or even a continuum of scales. 

Renormalization also refers to an asymptotic 
analysis making it possible to classify critical behaviors, to 
determine quantitatively the critical exponents and to 
handle the associated divergences. Indeed, the 
above-mentioned multiscale approaches fail near bifurcation 
points or critical points. In this case, scale separation is 
replaced by scale invariance. The key idea underlying 
RG techniques is to shift the focus to the scaling 
procedure itself. The basic point is to construct a 
renormalization transformation, consisting in joint 
coarse-grainings and rescalings, thus relating the two 
models describing the same phenomenon at different 
scales (Lesne 1998); it puts forward their self-similar 
properties and associated scaling laws, while eliminat- 
ing specific small-scale details having no consequences 
on the asymptotic, large-scale behavior. The set of 
renormalization transformations has a semigroup 
structure with respect to the rescaling factor (or plainly 
with respect to iteration), which justifies speaking of an RG. It 
generates a flow in the space of models, whose fixed 
points correspond either to trivial or to critical 
situations according to their stability. It can be shown 
that the linear analysis of the renormalization trans- 
formation around a critical fixed point gives access to 
the critical exponents. Moreover, this analysis allows 
us to split the space of models into universality classes, 
each associated to the basin of attraction of a critical 
fixed point. Let us emphasize that scale invariance 
leads to a deep change in the modeling and investiga- 
tions, shifting from a “physics focusing on the 
prediction of amplitudes” to a “physics of the 
exponents,” focusing on less specific, but more 
universal and above all, more intrinsic features. 

Far more generally, RG is associated with a 
qualitative change in the questioning, since the 
study takes place in a space of models. Generalized 
renormalization transformations can be designed to 
extract not only self-similarity properties but any 
large-scale feature from a more microscopic model. 
In particular, RG can be specially designed to 
discriminate between essential and inessential terms 
in a model: the latter do not modify the asymptotics 
of the RG flow, meaning that they are of no 
consequence at large scales. In other words, generic 
properties of the renormalization flow in this space of 
models yield universal large-scale scaling properties. 
RG is thus essentially a multiscale approach, insofar 
as it only retains the relations between the different 
levels of descriptions, somehow ignoring the details at 
each given scale. It is actually designed to capture 
universal features of the multiscale organization. 


Summary: The Exemplary Case 
of Diffusion 


Bridging the Scales 


Our aim in this section is to present the whole range 
of multiscale approaches in use, allowing both to 
bridge models devised at different scales and to 
predict the large-scale features of the phenomenon 
they account for. We choose the context of diffu- 
sion, Brownian motion, and transport phenomena, 
where such a bridge is essential and has been much 
investigated. Indeed, transport coefficients are 
defined through phenomenological equations; it is 
thus necessary to relate such macroscopic equations 
with smaller-scale theories, so as to get an expres- 
sion of the coefficients in terms of the microscopic 
ingredients and to justify the validity of the 
phenomenological description. 

The exposition in the various subsections below, 
following increasing scales, will mark out the path- 
way from reversible molecular dynamics to macro- 
scopic diffusion equations. We shall thus come 
across the multiple-scale analysis of the Liouville 
equation describing at microscopic scales a Brownian 
grain suspended in a thermal bath of water 
molecules (see the next subsection) leading to the 
mesoscopic Kramers equation for the grain distribu- 
tion function P(r,v,t). Next, involving higher but 
still mesoscopic scales, we see that another multiple- 
scale analysis leads to the reduced Smoluchowski 
equation for its spatial distribution P(r, t). Random 
walks offer alternative mesoscopic models, involving 
effective diffusion coefficients in order to take into 
account underlying features like persistence length 
or other short-range correlations. Scaling limits or 
more systematic renormalization methods in real 
space make it possible to bridge discrete random-walk 
models with continuous descriptions. Another RG, based on 
a path-integral formulation in the framework of 
field theory, makes it possible to handle the case of 
self-avoiding walks with infinite memory. 
Homogenization is illustrated on the case of diffusion in a 
regular porous medium, whereas diffusion pro- 
cesses in fractal substrates provide a counterexam- 
ple, singular enough to exhibit anomalous scaling 
behavior. The issue of reducing the dynamics of the 
diffusion process to a simpler effective one is 
encountered in many other macroscopic instances, 
among which we shall mention diffusion in a 
periodic medium, leading to space averaging, and
advection of a passive scalar field in a two-scale 
velocity field, where a multiple-scale analysis yields 
the effective diffusivity at large scale. We shall give 
further technical guidelines for constructing these 
steps climbing from molecular up to large macro- 
scopic scales, thus providing additional illustrations 
of the multiscale approaches introduced in the 
previous sections on more general and abstract 
grounds. 


Microscopic Theory of Brownian Motion 


The first theoretical account of Brownian motion, 
namely the erratic movement of a micron-sized 
pollen grain suspended in a thermal bath, for 
example, water, dates back to 1905 and the famous 
paper by Einstein. It took almost 60 years before a 
microscopic theory was achieved; this theory has 
been further worked out using multiple-scale 
techniques (Cukier and Deutsch 1969). The chal- 
lenge is to start from the complete deterministic 
reversible dynamics of the system, described within 
a probabilistic framework by the Liouville equation
$\partial\rho/\partial t = L\rho$ for the probability distribution $\rho$ in
the whole phase space (positions and velocities of
the grain, of mass $M$, and of all water molecules, of
mass $m \ll M$). The small parameter is the mass
ratio $\epsilon = \sqrt{m/M}$, measuring the efficiency of the
energy transfer upon collisions between the grain
and the bath particles, assuming a binary interaction
potential $U = \sum_i u(|\mathbf{r}_i - \mathbf{r}|)$. The Liouville
operator is decomposed into $L = L_0 + \epsilon L_1$, and
one introduces rescaled time variables $\tau_n = \epsilon^n t$,
where $\tau_0 = t$ is the timescale of the fluid particle
dynamics. The multiple-scale method is carried out
according to the general scheme, leading to the
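so-called Kramers equation,

\[ \frac{\partial P}{\partial t} + \mathbf{v}\cdot\nabla_{\mathbf{r}} P = \zeta\,\nabla_{\mathbf{v}}\cdot\left(\mathbf{v} + \frac{kT}{M}\nabla_{\mathbf{v}}\right) P(\mathbf{r},\mathbf{v},t) \]   [25]

where the friction coefficient is explicitly given as

\[ \zeta = \frac{1}{3MkT}\int_0^{\infty} \langle \mathbf{F}_0\cdot\mathbf{F}_t \rangle\, dt \quad \text{where } \mathbf{F}_t = e^{tL_0^{*}}\mathbf{F}_0 \text{ and } \mathbf{F}_0 = -\nabla_{\mathbf{r}} U \]   [26]
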

We refer to the original, although very pedagogical, 
paper by Cukier and Deutsch (1969) for a thorough 
exposition and discussion of this derivation. 


Mesoscopic Theory of Brownian Motion 


The multiple-scale method is also relevant for
determining the high-friction limit of the above
Kramers equation. A standard perturbation technique
with respect to the inverse of the friction, $1/\zeta$, fails to
describe the asymptotic regime: there is not enough
freedom to fulfill all the solubility conditions
required to avoid the appearance of secular divergences
(Bocquet 1997). By contrast, the multiple-scale
technique yields a uniform expansion of the evolution
equation that remains valid at long times, thus making it
possible to bridge two mesoscopic levels of description,
namely the Kramers equation and the Smoluchowski
equation for the spatial density $p(\mathbf{r},t)$ of the
Brownian particle:


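\[ \frac{\partial}{\partial t}\, p(\mathbf{r},t) = \frac{1}{M\zeta}\,\frac{\partial}{\partial\mathbf{r}}\cdot\left(kT\,\frac{\partial}{\partial\mathbf{r}}\right) p(\mathbf{r},t) \]   [27]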


Introducing dimensionless variables $\tau = t\,v_{th}/l$, $\mathbf{R} = \mathbf{r}/l$,
$\mathbf{V} = \mathbf{v}/v_{th}$, where $l$ is the size and $v_{th} = \sqrt{kT/M}$
the thermal velocity of the grain, the relevant small
parameter appears to be the dimensionless inverse of
the friction coefficient, $\epsilon = v_{th}/(l\zeta)$; hence,


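\[ \left(\frac{\partial}{\partial\tau} + \mathbf{V}\cdot\frac{\partial}{\partial\mathbf{R}}\right) P(\mathbf{R},\mathbf{V},\tau) = \frac{1}{\epsilon}\,\frac{\partial}{\partial\mathbf{V}}\cdot\left(\mathbf{V} + \frac{\partial}{\partial\mathbf{V}}\right) P(\mathbf{R},\mathbf{V},\tau) \]   [28]
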
If the friction is high (i.e., $\epsilon \ll 1$), the velocity
relaxes very rapidly towards the equilibrium Maxwell
distribution, and it is then enough to describe
the (slow) evolution of the spatial distribution
$p(\mathbf{r},t)$. Nevertheless, the relaxation stage is essential
and accordingly the $\epsilon$-dependence is singular, as is the
rule when the small perturbation parameter multiplies
the time derivative.

According to the general procedure exposed in the
section “Multiple-scale method: principles,” we introduce
rescaled variables $\tau_0 = \tau$, $\tau_1 = \epsilon\tau$, $\tau_2 = \epsilon^2\tau$, ...,
considered as independent variables, and look for a
solution of the Kramers equation of the form
$P = P^{(0)} + \epsilon P^{(1)} + \epsilon^2 P^{(2)} + \cdots$, where the arguments
of all the components $P^{(i)}$ are $(\mathbf{R},\mathbf{V},\tau_0,\tau_1,\tau_2,\ldots)$.
Identifying term-wise the successive powers of $\epsilon$ yields
a hierarchy of equations. At order 0, we obtain
$P^{(0)} = \Phi(\mathbf{R},\tau_0,\tau_1,\tau_2,\ldots)\,e^{-V^2/2}$. The following
equations, for the $[P^{(i)}]_{i\geq 1}$, involve the linearized operator
$\mathcal{L} = \partial_V(V + \partial_V)$. For each of them, there appears a
solubility condition, requiring that none of the additive
contributions in the equation is an eigenvector of $\mathcal{L}$;
involving the components $P^{(j)}$ with $j < i$, it prevents the
appearance of a secular divergence in $P^{(i)}$. At order 1,
the solubility condition is $\partial\Phi/\partial\tau_0 = 0$, thus determining
the (trivial) $\tau_0$-dependence of $P^{(0)}$. In a similar way,
the solubility condition at order 2 determines
the $\tau_1$-dependence of $P^{(0)}$. This bridges the Kramers
and Smoluchowski equations in the high-friction limit,
when retaining only the first-order term in $\epsilon$. We refer
to Bocquet (1997) for a pedagogical account of the
derivation and a discussion of its relation with the
time-derivative expansion involved in the so-called
Chapman-Enskog solution of the Boltzmann equation.
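
The high-friction reduction can be checked numerically on a toy case. The sketch below is only an illustration under stated assumptions: it integrates the one-dimensional Langevin dynamics whose probability density obeys a Kramers equation with friction `zeta` (no external force), using a plain Euler-Maruyama scheme, and compares the long-time mean-square displacement with the value $2Dt$ expected from a Smoluchowski equation with $D = kT/(M\zeta)$. The function name, parameter values, and integration scheme are choices made here, not taken from the references.

```python
import numpy as np

def langevin_msd(zeta, kT_over_M, dt, n_steps, n_particles=10000, seed=0):
    """Mean-square displacement of free Brownian grains in 1D.

    Integrates dv = -zeta*v*dt + sqrt(2*zeta*kT/M)*dW and dx = v*dt
    with the Euler-Maruyama scheme (an assumed, simple integrator).
    """
    rng = np.random.default_rng(seed)
    # Start from the equilibrium (Maxwell) velocity distribution.
    v = rng.normal(0.0, np.sqrt(kT_over_M), size=n_particles)
    x = np.zeros(n_particles)
    noise_amp = np.sqrt(2.0 * zeta * kT_over_M * dt)
    for _ in range(n_steps):
        x += v * dt
        v += -zeta * v * dt + noise_amp * rng.normal(size=n_particles)
    return np.mean(x**2)

zeta, kT_over_M, dt, n_steps = 5.0, 1.0, 0.01, 2000
t_final = n_steps * dt
D = kT_over_M / zeta                      # diffusion coefficient kT/(M*zeta)
msd = langevin_msd(zeta, kT_over_M, dt, n_steps)
print(msd, 2.0 * D * t_final)             # the two values should be close
```

The agreement is only expected for times much longer than the velocity relaxation time $1/\zeta$, which is the regime in which the Smoluchowski description applies.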


Random-Walk Model and Weakly 
Correlated Diffusion 


Random walks are discrete-time mesoscopic models,
accounting for the diffusive motion of a particle
through the statistical properties of its successive
steps, when observed at a given timescale $\tau$. The
basic model (ideal random walk) assumes isotropic,
independent and identically distributed steps of variance
$a^2$. The central-limit theorem straightforwardly
gives the time dependence of the mean-square displacement
$R^2(t) = \langle |\mathbf{r}(t) - \mathbf{r}(0)|^2 \rangle = a^2 t/\tau$, showing
that the motion is a normal diffusion, with diffusion
coefficient $D = a^2/(2d\tau)$ in dimension $d$. Note (see
also the next subsection) that $D$ depends on $\tau$ and $a$ only
in a joint manner. Actually, the diffusion coefficient
associated with a diffusive motion observed at scale $a$
and modeled by a random walk on a lattice of
parameter $a$ can be written as $D = \alpha a^2$, where the rate
$\alpha$ depends on $a$ (effective rate at spatial resolution $a$):
this is a sort of renormalization that accounts, through the
rate $\alpha(a)$, for all the backward and forward microsteps of
length far smaller than $a$.
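
As an illustration of the normal diffusion law, the following sketch simulates ideal random walks and compares the measured mean-square displacement with $a^2 t/\tau = 2dDt$. It assumes Gaussian steps (any isotropic i.i.d. step law of variance $a^2$ would do); the function name and parameter values are illustrative choices.

```python
import numpy as np

def mean_square_displacement(n_walkers=5000, n_steps=100, a=1.0, d=2, seed=0):
    """R^2 after 1..n_steps steps, averaged over independent walkers."""
    rng = np.random.default_rng(seed)
    # Isotropic i.i.d. Gaussian steps: each of the d components has
    # variance a^2/d, so that the full step has variance a^2.
    steps = rng.normal(0.0, a / np.sqrt(d), size=(n_walkers, n_steps, d))
    positions = np.cumsum(steps, axis=1)
    return np.mean(np.sum(positions**2, axis=2), axis=0)

a, tau, d, n_steps = 1.0, 1.0, 2, 100
msd = mean_square_displacement(n_steps=n_steps, a=a, d=d)
times = tau * np.arange(1, n_steps + 1)
D = a**2 / (2 * d * tau)                  # diffusion coefficient a^2/(2 d tau)
print(msd[-1], 2 * d * D * times[-1])     # measured vs. predicted R^2
```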

In the case of short-range correlations between the
successive steps (namely if $\sum_{t=0}^{\infty} |C(t)| < \infty$, where
$C(t)$ is the statistical correlation function between
elementary steps separated by a time length $t$), direct
computations support a time-average-like result:
the asymptotic behavior is still described by a
normal diffusion law $R^2(t) \sim 2dD_{\mathrm{eff}}\,t$, with
$D_{\mathrm{eff}} = D\sum_{t=-\infty}^{+\infty} C(t)$. When $C(t) = e^{-|t|/\tau}$,

\[ D_{\mathrm{eff}} = \frac{D\,(1 + e^{-1/\tau})}{1 - e^{-1/\tau}} \]

hence $D_{\mathrm{eff}} \approx 2\tau D$ if $\tau \gg 1$.

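
The closed form for an exponential correlation function can be checked by summing the series directly. The short sketch below (function names are illustrative) compares the truncated two-sided sum of $e^{-|t|/\tau}$ over integer $t$ with the ratio $(1 + e^{-1/\tau})/(1 - e^{-1/\tau})$ and with the large-$\tau$ estimate $2\tau$.

```python
import math

def correlation_sum(tau, cutoff=None):
    """Truncated two-sided sum of exp(-|t|/tau) over integer t."""
    if cutoff is None:
        cutoff = int(60 * tau) + 1        # the neglected tail is ~exp(-60)
    return sum(math.exp(-abs(t) / tau) for t in range(-cutoff, cutoff + 1))

def closed_form(tau):
    """Ratio D_eff / D for C(t) = exp(-|t|/tau)."""
    q = math.exp(-1.0 / tau)
    return (1.0 + q) / (1.0 - q)

for tau in (0.5, 5.0, 50.0):
    print(tau, correlation_sum(tau), closed_form(tau), 2.0 * tau)
```
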
Renormalization Analysis in Case
of Markovian Diffusion 


Trying to bridge lattice random walks with a
continuous description brings out the following
difficulty: as the step size $a$ goes to 0, one obviously has to
decrease the duration $\tau$ accordingly, but
by what amount is not so obvious, since the walker
velocity is ill-defined (it depends on the observation
scale). The proper joint rescaling
can be guessed from knowledge about the system obtained by
other means; alternatively, it can
be obtained in a systematic way, thanks to RG
methods. Let us explain the basic principle.

Let us denote by $P_{a,\tau}(x,y,t)$ the transition probability
governing the random walk, namely the
probability density to jump from $x$ to $y$ in time
$t$, where $x$, $y$ are restricted to the lattice $(a\mathbb{Z})^d$ and
time to $\tau\mathbb{N}$. The renormalization transformation
$\mathcal{R}_{k,\alpha}$ should express the consequence for $P_{a,\tau}$ of a
joint rescaling of space (by a factor of $k$) and time
(by a factor of $k^{\alpha}$). Taking into account the Markov
character of the walks, we are thus led to define

\[ [\mathcal{R}_{k,\alpha} P_{a,\tau}](x,y,t) = k^d\, P_{a,\tau}(kx, ky, k^{\alpha} t) \quad \text{in dimension } d \]   [29]

The proper value of $\alpha$ is to be determined self-consistently
in order that the limit $\lim_{k\to\infty} \mathcal{R}_{k,\alpha} P_{a,\tau}$
exists (it is then a continuous transition probability
$P^*(x,y,t)$ defined on $\mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}$). The
root-mean-square displacement

\[ R(P,t) = \left[ \int |y - x|^2\, P(x,y,t)\, d^d y \right]^{1/2} \]

is transformed according to

\[ R(\mathcal{R}_{k,\alpha} P, t) = k^{-1} R(P, k^{\alpha} t) \]   [30]

Accordingly, it yields the diffusion law associated
with the fixed point $P^*$:

\[ \text{for any } k,\quad R(P^*,t) = k^{-1} R(P^*, k^{\alpha} t), \quad \text{hence } R(P^*,t) \sim t^{1/\alpha} \]   [31]

It is anomalous except if $\alpha = 2$. In the case of ideal
random walks, the proper exponent leading to a
nontrivial limit is $\alpha = 2$; this limit $P^*_2$ is the transition
probability of a Wiener process:

\[ W_D(x,y,t) = [4\pi D t]^{-d/2}\, e^{-(x-y)^2/4Dt} \quad \text{with } D = a^2/2d\tau \]   [32]

This shows that all ideal lattice random walks
belong to the same universality class, that of the
Wiener process. This approach has been fruitfully
applied to diffusion in disordered systems, the issue
being to determine whether or not the disorder,
accounted for as a noise term in the transition
probabilities, modifies the normal diffusion law
obtained in the unperturbed situation. Similar
reasoning can also be implemented for self-similar
anomalous diffusion processes, like fractional Brownian
motions and Lévy flights (Lesne 1998).
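
The fixed-point property can be verified numerically. The sketch below, with illustrative function names and arbitrary test values, applies a joint rescaling of the form $k^d P(kx, ky, k^{\alpha}t)$ to the Gaussian transition probability of a Wiener process and checks that it is left unchanged for $\alpha = 2$ but not for another exponent.

```python
import numpy as np

def wiener_kernel(x, y, t, D, d):
    """Gaussian transition density (4 pi D t)^(-d/2) exp(-|x-y|^2 / 4Dt)."""
    r2 = np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return (4.0 * np.pi * D * t) ** (-d / 2.0) * np.exp(-r2 / (4.0 * D * t))

def renormalize(P, k, alpha, d):
    """Return the rescaled kernel (x, y, t) -> k^d P(kx, ky, k^alpha t)."""
    def P_rescaled(x, y, t):
        return k**d * P(k * np.asarray(x, float), k * np.asarray(y, float),
                        k**alpha * t)
    return P_rescaled

d, D = 2, 0.4                              # arbitrary test values
W = lambda x, y, t: wiener_kernel(x, y, t, D, d)
x, y, t = [0.3, -0.2], [1.0, 0.5], 0.7
W_fixed = renormalize(W, k=3.0, alpha=2.0, d=d)   # expected: invariant
W_other = renormalize(W, k=3.0, alpha=1.5, d=d)   # expected: not invariant
print(W(x, y, t), W_fixed(x, y, t), W_other(x, y, t))
```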


Renormalization Analysis for Self-Avoiding Walks 


Let us only mention, for the sake of completeness, the 
renormalization techniques developed for determining 
the conformational statistics of linear polymer chains, 
whose three-dimensional shape can be represented as 
the trajectory of a self-avoiding random walk. These 
techniques belong to the RG corpus developed in 
statistical mechanics for critical phase transitions, 
within a field-theoretic framework. A formal but 
exact analogy can actually be worked out between 
self-avoiding walks and a spin lattice system with 
$n \to 0$, where $n$ is the number of spin components.

The multiscale nature of the system is so marked 
here that it should rather be qualified as an absence of 
characteristic scale. In this respect, standard RG 
methods developed for critical phenomena lie at the 
very boundary of multiscale approaches. Scale decoup- 
ling is replaced by scale invariance, which is somehow 
the conjugate situation: homogeneity in real space is 
replaced by homogeneity in the conjugate space (space 
of characteristic scales). Scale invariance is here reflected
in the self-similar property, $R(N) \sim N^{\nu}$, relating the
end-to-end distance $R$ of the chain to the number $N$ of
elementary steps (the monomers), with an anomalous
exponent $\nu$ (the Flory exponent $\nu \approx 3/5$ in dimension
$d = 3$) originating from the infinite memory of the
nonoverlapping chain. We refer to Lesne (1998) and
references therein for a more detailed exposition of the
concepts and techniques only alluded to here.


Effective Diffusion in a Porous Medium 
(Homogenization) 


Describing the diffusion in a porous medium appears
as a formidable task at the pore level: it would
require us to account for all the boundary conditions
at the border of the hollow domain $V \subset V_0$ actually
accessible to diffusion. When the pores have a finite
characteristic size $a$, a homogenization approach can
be developed at scales far larger than $a$. It
accounts for the slowing down of the motion due to
obstacles through an effective diffusion coefficient (in plain
words, the black and white medium made of matter
and holes of size $a$ appears as a grey homogeneous
medium at larger scales). More specifically, a diffusing
tracer of random trajectory $\mathbf{r}(t)$ experiences a
varying coefficient $D[\mathbf{r}(t)]$ (it equals $D$ inside the
pores, whereas it vanishes in the nonaccessible region
$V_0 - V$). The idea is to replace this fluctuating
realization of the transport coefficient by its spatial
average (independent of the trajectory), as far as
macroscopic properties are concerned:


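\[ D_{\mathrm{eff}} = \frac{1}{|V_0|}\int_{V_0} D[\mathbf{r}]\, d^d\mathbf{r} = \frac{D}{|V_0|}\int_{V_0} n_0(\mathbf{r})\, d^d\mathbf{r} \quad (\text{where } n_0(\mathbf{r}) = 1 \text{ iff } \mathbf{r} \in V) \]   [33]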


Rigorous mathematical theorems ensure that the 
large-scale motion can actually be described by a 
Fick law and associated plain diffusion equation 
(Bensoussan et al. 1978). 
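
The space average of eqn [33] is simple to evaluate on a discretized sample. The sketch below builds a hypothetical two-dimensional medium (a periodic array of square obstacles, an assumption made purely for illustration) and averages the local coefficient, which equals $D$ in the pores and 0 elsewhere; the result is $D$ times the accessible volume fraction.

```python
import numpy as np

def effective_diffusion(pore_mask, D):
    """Spatial average of D[r] over the whole sample.

    D[r] is D where pore_mask is True (accessible region) and 0 elsewhere.
    """
    local_D = np.where(pore_mask, D, 0.0)
    return local_D.mean()

# Hypothetical sample: square obstacles of side 4 in periodic cells of side 10.
n, cell, obstacle = 200, 10, 4
i, j = np.indices((n, n))
pore_mask = ~((i % cell < obstacle) & (j % cell < obstacle))
D = 1.0
print(effective_diffusion(pore_mask, D), D * pore_mask.mean())
```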


Anomalous Diffusion in a Fractal Medium 


The above homogenization for diffusion in a porous 
medium works well only if the pores have a finite 
characteristic size; by contrast, diffusion in a fractal 
substrate (e.g., a porous medium with pores of all 
sizes) generically leads to anomalous diffusion, associated
with a time dependence of the mean-square
displacement $R^2(t) \sim t^{\gamma}$ with $\gamma < 1$. In a fractal
substrate, the existence of obstacles and pores of all 
sizes introduces spatial fluctuations at all scales and 
long-range correlations in the spatial dependence of D. 
This case corresponds to a critical situation and 
homogenization fails to give a relevant description of 
the macroscopic behavior, in the same way as mean-field
methods fail to account for critical phase transitions.
This is reflected in the anomalous exponent $\gamma < 1$ of the
diffusion law, which can be related to the fractal
characteristics of the substrate ($\gamma = d_s/d_f$, where $d_s$ is
the spectral dimension and $d_f$ the fractal dimension).
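
As a worked instance of the relation between the anomalous exponent and the two dimensions, the sketch below evaluates it for the Sierpinski gasket, a standard example that is not discussed in the text; its fractal dimension $\ln 3/\ln 2$ and spectral dimension $2\ln 3/\ln 5$ are the commonly quoted values.

```python
import math

def anomalous_exponent(d_s, d_f):
    """Exponent gamma of R^2(t) ~ t^gamma from spectral and fractal dimensions."""
    return d_s / d_f

d_f = math.log(3) / math.log(2)        # fractal dimension of the Sierpinski gasket
d_s = 2 * math.log(3) / math.log(5)    # its spectral dimension
gamma = anomalous_exponent(d_s, d_f)
print(gamma)                           # subdiffusive: gamma < 1
```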


Effective Diffusion in a Periodic Potential 
(Averaging Method) 


In the case of a periodic medium, where $D[\mathbf{r}(t)]$ oscillates
with a small spatial period, an averaging procedure
can be developed as in the subsection “Effective
diffusion in a porous medium (homogenization),” to
determine an effective diffusion equation accounting
for the large-scale motion. Explicit computations
within a multiple-scale approach yield


\[ \frac{1}{D_{\mathrm{eff}}} = \left\langle \frac{1}{D} \right\rangle \]   [34]

where $\langle\,\cdot\,\rangle$ denotes a space average over the
elementary cell (Givon et al. 2004).

Let us rather detail the case of diffusion of a
Brownian particle in a periodic potential $U$, with
$U(x + L) = U(x)$ for any $x$ (restricting to dimension
1 for simplicity), at equilibrium at temperature $T$.
Let $D$ be the diffusion coefficient of this particle in the
absence of the potential. At large scales $\delta x \gg L$,
the substrate appears to be spatially uniform. The
influence of the periodic bias exerted by the
potential on the diffusive motion (superimposition
of a modulated deterministic drift) can be described
in an average way. The result is a normal diffusion
with a reduced effective diffusion coefficient

\[ D_{\mathrm{eff}}(U) = D\,\inf_{f}\int_0^L |1 - f'(x)|^2\, dm_U(x) \quad \text{with } dm_U(x) = \frac{e^{-U(x)/kT}\,dx}{\int_0^L e^{-U(x')/kT}\,dx'} \]   [35]

where the infimum is taken over the set of smooth
periodic functions of period $L$ and the average involves
the equilibrium distribution $m_U$ of the particle in the
potential landscape $U(\cdot)$. In particular, one sees
that no oriented motion can arise at
equilibrium, even if $U$ is asymmetric. The procedure
extends to dimension $d$ with only technical differences.
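
In one dimension the variational problem can be solved by hand: the Euler-Lagrange condition makes $(1 - f')\,e^{-U/kT}$ constant, and the periodicity of $f$ fixes the constant, giving $D_{\mathrm{eff}}/D = 1/(\langle e^{U/kT}\rangle\langle e^{-U/kT}\rangle)$ with averages over one period. This closed form is not stated in the text; the sketch below assumes it and evaluates it numerically for an illustrative cosine potential, for which both averages equal the modified Bessel function $I_0(U_0/kT)$.

```python
import numpy as np

def effective_diffusion_ratio(U, L, kT, n=4096):
    """D_eff/D = 1 / (<exp(U/kT)> <exp(-U/kT)>), averages over one period."""
    x = np.arange(n) * (L / n)          # uniform grid over one period
    boltz = np.exp(-U(x) / kT)
    return 1.0 / (np.mean(boltz) * np.mean(1.0 / boltz))

L, kT, U0 = 1.0, 1.0, 2.0               # illustrative values
U = lambda x: U0 * np.cos(2.0 * np.pi * x / L)
ratio = effective_diffusion_ratio(U, L, kT)
print(ratio, 1.0 / np.i0(U0 / kT) ** 2)  # numerical vs. Bessel-function value
```

By the Cauchy-Schwarz inequality the ratio never exceeds 1, in line with the reduced effective diffusion coefficient, and it depends on $U$ only through symmetric averages, so an asymmetric potential produces no net drift.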


Effective Diffusivity for a Passively 
Advected Scalar 


Still another fruitful implementation of the multiple-scale
method is encountered in the context of
diffusion and transport phenomena, in the study of
the advection by a given incompressible velocity
field $\mathbf{v}(\mathbf{r},t)$ of a passive scalar field $\theta(\mathbf{r},t)$, for
example, the density of small inert “tracer” particles
advected by the fluid flow without acting back on it.
We consider the case when the fluid motion
can be decomposed into a large-scale, slowly varying
component and a small-scale, rapidly varying fluctuation:
$\mathbf{v}(\mathbf{r},t) = \mathbf{U}(\mathbf{r},t) + \lambda\mathbf{u}(\mathbf{r},t)$. The parameter $\lambda$
controls the relative strength of these components.
Another small parameter, $\epsilon$, is involved in this
problem: the ratio $\epsilon = l/L \ll 1$ of the typical length
scales $L$ and $l$ of $\mathbf{U}$ and $\mathbf{u}$, respectively. Here the
issue is to bridge two macroscopic descriptions: the
full hydrodynamic equation describing the evolution
of the scalar field $\theta(\mathbf{r},t)$

\[ \frac{\partial}{\partial t}\theta(\mathbf{r},t) + \mathbf{v}(\mathbf{r},t)\cdot\nabla\theta(\mathbf{r},t) = D\,\Delta\theta(\mathbf{r},t) \]   [36]

and a large-scale effective transport equation for an
average scalar field $\theta_L(\mathbf{r},t)$,

\[ \frac{\partial}{\partial t}\theta_L(\mathbf{r},t) + \mathbf{U}(\mathbf{r},t)\cdot\nabla\theta_L(\mathbf{r},t) = \frac{\partial}{\partial r_i}\left[ D^{\mathrm{eff}}_{ij}(\mathbf{r},t)\,\frac{\partial}{\partial r_j}\theta_L(\mathbf{r},t) \right] \]   [37]

This procedure, which amounts to accounting in an average
way for the small-scale contributions to the
complete hydrodynamic description, relies on a
spatio-temporal generalization of the multiple-scale
method: it involves rescaled space and time variables,
$\mathbf{X} = \epsilon\mathbf{x}$, $\tau = \epsilon^2 t$, $T = \epsilon t$. The different characteristic
scales of the velocity components are directly
reflected in their arguments: $\mathbf{u}(\mathbf{x},t)$ and $\mathbf{U}(\mathbf{X},T)$. The
passive scalar field is now expressed as $\theta(\mathbf{x},t,\mathbf{X},\tau,T)$ and
it is expanded as $\theta = \theta^{(0)} + \epsilon\theta^{(1)} + \epsilon^2\theta^{(2)}$. The standard
multiple-scale procedure leads one to introduce an
auxiliary field $\chi$:

\[ \partial_t\chi_j + [(\mathbf{u} + \lambda\mathbf{U})\cdot\partial]\chi_j - D\,\partial^2\chi_j = -u_j \]   [38]

yielding the effective diffusivity tensor (where $\langle\,\rangle$ is a
space average)

\[ D^{\mathrm{eff}}_{ij} - D\,\delta_{ij} = D\sum_p \langle \partial_p\chi_i\,\partial_p\chi_j \rangle \]   [39]

Advection enhances transport, and eddy diffusivity
is larger than molecular diffusivity. In realistic cases,
there is a continuum of scales $\mathbf{u} = \sum_n \mathbf{u}_n$, where
$\mathbf{u}_n$ has a characteristic scale $l_n \sim 2^{-n} l_0$. The multiple-scale
method is to be iterated into an RG analysis,
achieving a recursive integration of the small and
fast scales into $D^{\mathrm{eff}}$ starting from the smallest and
fastest ones.


Conclusions 


Multiscale approaches make it possible to predict the large-scale
behavior generated by a given model; moreover,
they offer constructive tools to bridge models at
different scales for the same phenomenon. They
provide systematic and mathematically well-controlled
tools to turn faithful but intractable
models into effective reduced ones, thus lying at
the core of statistical mechanics, many-body dynamical
systems, and, more generally, all issues of
the still-in-progress complex systems science. Indeed,
in a complex system (that might be its very
definition), levels are so interrelated that it is
essential to investigate jointly all the scales, from
elementary units up to the whole system, and its
emergent properties; neither theoretical nor numerical
approaches can alone consider all the levels
together, showing the relevance, if not the necessity,
of multiscale approaches.

Basic preliminary issues are to determine the 
proper elementary level, the proper collective vari- 
ables, and the relevant small parameters. Let us 
remark that the implementation of a multiscale 
technique rapidly faces the fundamental issue of 
defining a macroscopic variable; it offers some clues, 
indicating that a macroscopic variable might be a 
phenomenological quantity observable at our scale,
a slow mode, or collective variable. 

Multiscale approaches take advantage of the separation
of scales involved in the different mechanisms
at work in the phenomenon under consideration.
The basic idea, seen above at work in various
instances and different ways, is to somehow decouple
the different scales and to solve several simpler
single-scale problems. Any multiscale implementation
actually involves, at some stage and more or
less explicitly, a limiting process in which the scale
separation ratio $1/\epsilon$ tends to $\infty$: this limiting process
has to be carefully controlled in order that the
method can be applied to real situations. Finally, to
be successful, multiscale approaches should achieve
a trade-off between:


• accuracy (minimizing the loss of information
involved in the reduction or projection technique),

• efficiency and tractability (this is, e.g., one of the
major successes of hydrodynamics),

• robustness of the resulting reduced model (to be
checked a posteriori),

• flexibility (extending to heterogeneous systems
involving different components), and

• scope (bridging many different levels in order to
capture the whole hierarchical structure).


Let us conclude by emphasizing a most fruitful
benefit of multiscale approaches: they make it possible to
investigate the structural stability of a model, in particular
to identify the relevant parameters and essential
mechanisms controlling large-scale features. In this
respect, they lead beyond the (necessarily restricted)
scope of a specific model and give an explicit account
of the observer's biased view, related to the scale of
observation. They hence contribute to a more
complete and controlled understanding of real
physical systems.

Finally, a brief bibliographic guide to multiscale
approaches may be useful. Technical details
and several applications of multiscale perturbative
expansions, in particular the multiple-timescale
method, with references to the original papers,
can be found in Nayfeh (1973). Applications of
the multiple-scale method, fully worked out in a very
pedagogical way, can be found in the work of
Cukier and Deutsch (1969), Piasecki (1993),
Bocquet (1997), and Mazzino et al. (2004). An
acknowledged reference on homogenization techniques
and multiscale analysis in periodic media
is Bensoussan et al. (1978); see also the monographs
by Lochak and Meunier (1988) and
Berdichevsky et al. (1999). Two recent review
papers on multiscale approaches and reduction
techniques are Givon et al. (2004) and Gorban et al.
(2004). Basic principles and technical aspects of
scaling theories and RG approaches from a multiscale
viewpoint can be found in Lesne (1998).


See also: Adiabatic Piston; Averaging Methods; 
Bifurcations in Fluid Dynamics; Boltzmann Equation 
(Classical and Quantum); Central Manifolds, Normal 
Forms; Interacting Particle Systems and Hydrodynamic 
Equations; Korteweg-de Vries Equation and Other 
Modulation Equations; Localization for Quasiperiodic 
Potentials; Singularity and Bifurcation Theory; Stability 
Problems in Celestial Mechanics; Stationary Phase 
Approximation; Universality and Renormalization. 
Introduction


The concept of negative refraction has caused a 
revolution in classical optics and electromagnetic 
theory in the past few years (Pendry 2004, 
Ramakrishna 2005). If a material has negative
dielectric permittivity ($\varepsilon$) and negative magnetic
permeability ($\mu$) simultaneously at a given frequency
$\omega$, then it can be said to have a negative refractive
index defined as

\[ n = -\sqrt{\varepsilon\mu} \]   [1]


Several peculiar consequences of Maxwell’s equations 
for the propagation of radiation in such a material 
were originally pointed out by Veselago (1968). But 
the lack of such natural materials failed to create much 
enthusiasm until recently when composite structured 
photonic materials have been shown to have negative 
refractive index (Smith et al. 2000, Shelby et al. 2001). 

The question then boils down to what constitutes
materials with negative $\varepsilon$ and $\mu$. Where the structure
varies spatially on a scale much less than the
wavelength of the incident radiation, composite
electromagnetic materials can be regarded effectively
as homogeneous media. A set of effective response
functions, the effective permittivity, $\varepsilon_{\mathrm{eff}}$, and the
effective permeability, $\mu_{\mathrm{eff}}$, can then be ascribed to
these materials. To develop a homogeneous view of
the electromagnetic properties of a medium composed
of discrete atoms and molecules was the
motivation for defining a permittivity $\varepsilon$ and permeability
$\mu$. The simplicity provided by such a description
cannot be overstated. Provided the radiation
cannot resolve the underlying structure, replicating
the atoms of a material with structure on a larger
scale therefore represents a straightforward extension
of the original concept.


If we consider arrays of structures defined by a
unit cell of dimension $d$, then our effective
description of the response of the medium to
electromagnetic radiation of angular frequency $\omega$
will be valid provided that

\[ d \ll \lambda = 2\pi c/\omega \]   [2]


This restriction ensures that the underlying structure 
of the medium will merely refract and not scatter the 
incident radiation, in which case an effective 
permittivity and permeability for the medium 
become valid. The above inequality defines 
the long wavelength or effective medium limit 
(Garland and Tanner 1978). Maxwell’s equations, 
written in the absence of free charges and external 
currents, 


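\[ \nabla\cdot\mathbf{D} = 0, \qquad \nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t} \]   [3]

\[ \nabla\cdot\mathbf{B} = 0, \qquad \nabla\times\mathbf{H} = \frac{\partial\mathbf{D}}{\partial t} \]   [4]

together with the constitutive relations:

\[ \mathbf{B}(\omega) = \mu_0\,\mu_{\mathrm{eff}}(\omega)\,\mathbf{H}(\omega) \]   [5]

\[ \mathbf{D}(\omega) = \varepsilon_0\,\varepsilon_{\mathrm{eff}}(\omega)\,\mathbf{E}(\omega) \]   [6]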


then provide us with a complete description of the 
electromagnetic properties of the material over the 
frequency range of interest. Note that the effective- 
medium parameters are a function of the frequency 
as the material polarization response depends on the 
time history of the applied fields (Landau et al. 
1984). These effective parameters were then general- 
ized to analytic complex functions to account for 
absorption, and to second-ranked tensors to describe 
anisotropic responses. 

The real parts of these effective material para- 
meters can always be negative; there is nothing 
fundamentally wrong about that. Provided that they 
are dispersive, that is, they vary as a function of 
frequency, and dissipative as a consequence of the 
famous Kramers—Kronig relations (Landau et al. 
1984), such materials are causally possible. Simultaneously
negative values of $\varepsilon_{\mathrm{eff}}$ and $\mu_{\mathrm{eff}}$ change the
nature of electromagnetic radiation in these media.


484 Negative Refraction and Subdiffraction Imaging 


For example, the wave vector in such isotropic 
media points opposite to the Poynting vector and 
gives rise to many new interesting effects such as 
modified refraction, negative Doppler shifts, etc. 
Such materials can support a variety of surface 
electromagnetic modes, which can have dramatic 
effects such as the possibility of a perfect lens which 
has unlimited image resolution (Pendry 2000) and is 
not subject to the traditional diffraction limit. 

New artificial electromagnetic composite struc- 
tures, often referred to as “meta-materials,” allow 
us to access values of these material parameters 
which are not found in naturally occurring materials.
We will show here how to obtain negative
values of $\varepsilon_{\mathrm{eff}}$ and $\mu_{\mathrm{eff}}$ in meta-materials using a
variety of resonance phenomena. Then we will
look at the problem of imaging with subdiffraction 
resolution using negative refractive index 
materials. 


Artificial Plasmas 


From the electromagnetic viewpoint, a plasma can 
be represented as a medium with dielectric permit- 
tivity whose real part is negative. The Coulomb 
force and the finite mass of the electrons combine to 
give an ideal plasma a dispersion in the relative
permittivity, $\varepsilon(\omega)$, given by

\[ \varepsilon(\omega) = 1 - \frac{\omega_p^2}{\omega^2} \]   [7]

where the plasma frequency is defined by $\omega_p^2 =
(\rho e^2)/(\varepsilon_0 m_e)$, $\rho$ is the number density of electrons,
$e$ is the electronic charge, and $m_e$ is the electron
mass. The permittivity of the plasma is negative at
frequencies below the plasma frequency.

A plasma-like behavior characterizes the electron 
gas in the noble and alkali metals, with a plasma 
frequency typically at ultraviolet frequencies. 
Because of the presence of dissipation, at lower 
frequencies resistive effects dominate and the plas- 
mons cannot be excited. To obtain materials with 
negative dielectric permittivity at low frequencies, a 
lower plasma frequency is required corresponding to 
more massive particles and a lower particle density
$\rho$. A structure consisting of a three-dimensional
lattice of very thin wires simulates a low-density
plasma of very heavy charged particles and is shown
in Figure 1 (Pendry et al. 1998). A simple model
allows us to describe the desired reduction in $\omega_p$ in
such a structure.

First consider a displacement of the electrons in
the wires along one of the cubic axes. Only the wires
directed along that axis are active and thus provide a
lowered effective density of electrons, $\rho_{\mathrm{eff}}$, given by
the fraction of the area occupied by the active wires (of radius $a$). Thus,

\[ \rho_{\mathrm{eff}} = \rho\,\frac{\pi a^2}{d^2} \]   [8]

Figure 1 A periodic structure composed of infinite conducting
wires arranged in a simple cubic lattice. Provided the factor a/d
is small enough, the structure responds to incident electromagnetic
waves as a plasma of very heavy charged particles.


An even more profound effect of constraining the
electrons to run along thin wires is a result of the
induced magnetic field which wraps the wires as
the electrons are in motion. Suppose a current $I$
flows in the wires. The magnetic field is

\[ H(R) = \frac{I}{2\pi R} = \frac{\pi a^2 \rho v e}{2\pi R} \]   [9]

where $R$ is the distance from the wire center, $v$ is the
electron drift velocity, and $\rho e$ is the charge density in
the wire. In terms of the magnetic vector potential,
the magnetic field is

\[ \mathbf{H}(R) = \mu_0^{-1}\,\nabla\times\mathbf{A}(R) \]   [10]

where

\[ A(R) = \frac{\mu_0\,\pi a^2 \rho v e}{2\pi}\,\ln(d/R) \]   [11]

and $d$ is the lattice spacing. The importance of the
divergence of the magnetic field with the wire radius
as seen in eqn [9] is the contribution to the canonical
electronic momentum given by $eA$. If we neglect the
variation of the fields with distance from the wire
center, we can view this contribution as defining a
new effective mass for the electrons given by

\[ m_{\mathrm{eff}} = \frac{\mu_0 e^2 \pi a^2 \rho}{2\pi}\,\ln(d/a) \]   [12]


Now the effective plasma frequency for the system

\[ \omega_p^2 = \frac{\rho_{\mathrm{eff}}\,e^2}{\varepsilon_0\,m_{\mathrm{eff}}} = \frac{2\pi c_0^2}{d^2\ln(d/a)} \]   [13]

is seen to be much reduced. As an example, the
plasma frequency of 1 μm aluminum wires spaced
by 10 mm is about 2 GHz, and the corresponding
electronic effective mass is almost 15 times that of
a proton! The factors of effective mass and charge
density cancel, leaving an expression comprising
only the macroscopic system parameters. This is to
be expected, as a circuit analysis in terms of a
capacitance and inductance can also be used to
formulate the problem. However, such an
approach can obscure the true nature of the
problem, which is encapsulated as a low-frequency
plasma oscillation. Inclusion of the finite resistivity
of the metal yields a finite lifetime for the plasmon
excitation. Experiments have shown that a reduction
in the plasma frequency of six orders of
magnitude from the ultraviolet to the microwave
region can be achieved in these thin-wire composites
(Pendry et al. 1998).
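
The cancellation of the microscopic factors can be made explicit with a few lines of code. The sketch below is an illustration under stated assumptions: it takes $a$ to be the wire radius, uses a free-electron density for aluminum of about $1.8 \times 10^{29}$ m$^{-3}$ (an assumed input, not given in the text), and evaluates the effective density, the effective mass, and the plasma frequency both from the ratio $\rho_{\mathrm{eff}} e^2/(\varepsilon_0 m_{\mathrm{eff}})$ and from the purely geometric expression $2\pi c_0^2/(d^2\ln(d/a))$.

```python
import math

MU0 = 4e-7 * math.pi          # vacuum permeability (H/m)
EPS0 = 8.8541878128e-12       # vacuum permittivity (F/m)
C0 = 299792458.0              # speed of light (m/s)
E = 1.602176634e-19           # elementary charge (C)
M_PROTON = 1.67262192369e-27  # proton mass (kg)

def wire_medium(a, d, rho):
    """Effective density, effective mass, and plasma frequency of a wire lattice.

    a: wire radius (m), d: lattice spacing (m), rho: electron density (m^-3).
    """
    rho_eff = rho * math.pi * a**2 / d**2
    m_eff = MU0 * E**2 * math.pi * a**2 * rho / (2 * math.pi) * math.log(d / a)
    omega_p_micro = math.sqrt(rho_eff * E**2 / (EPS0 * m_eff))
    omega_p_macro = math.sqrt(2 * math.pi * C0**2 / (d**2 * math.log(d / a)))
    return rho_eff, m_eff, omega_p_micro, omega_p_macro

# Assumed inputs: 1 micron radius, 10 mm spacing, aluminum electron density.
rho_eff, m_eff, w_micro, w_macro = wire_medium(a=1e-6, d=1e-2, rho=1.806e29)
print(m_eff / M_PROTON, w_micro / (2 * math.pi), w_macro / (2 * math.pi))
```

The two evaluations of the plasma frequency agree regardless of the value chosen for the electron density, which only enters the effective mass.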


Artificial Magnetism 


Although the Maxwell equations [3]-[4] are symmetric
in the electric and magnetic fields, we are yet
to discover a free magnetic pole. The magnetism we 
find in natural materials is limited to spin systems 
and restricts the values of $\mu_{\mathrm{eff}}$. Up to microwave
frequencies, magnetic activity is common and
certain insulating ferromagnets and antiferromagnetic
compounds such as MnF$_2$ and FeF$_2$ can
even exhibit a negative permeability at some
frequencies. However, large losses can accompany 
the magnetic activity in these materials. 

Recently, it has become clear that a wide variety 
of composite structures comprising resonant inclu- 
sions can display magnetic activity in the effective 
medium limit (Pendry et al. 1999). Efficient screen- 
ing of AC magnetic fields can be achieved using a 
thin cylindrical shell of metal or superconductor. In 
order to obtain a large magnetic response such that 
the modulus of the magnetic susceptibility, $|\chi_m| > 1$,
what we require is a resonant over-screening
material response. A collection of subwavelength-sized
structures that exhibits such an over-screening
response can constitute a negative-$\mu_{\mathrm{eff}}$ material.
One such resonant subwavelength structure is the
so-called split-ring resonator (SRR), which can be
scaled to form magnetic meta-materials from
microwave to optical frequencies (Pendry et al.
1999, O’Brien and Pendry 2002b). An SRR
structure which has been demonstrated experimentally
to have a resonant magnetic response at
microwave and THz frequencies is depicted in
Figure 2a (Smith et al.). It comprises two planar
rings of metal on an insulating backing. The rings
couple inductively to the magnetic field normal to
the plane of the rings. Because of the large
capacitance between the rings, the structure resonates
at some frequency. Driven by the back
electromotive force (emf), a large response is
expected in the vicinity of the resonance frequency
which is also antiphased in a small frequency range
above the resonant frequency. If the SRRs are much
smaller than the free-space wavelength, a collection
of such SRRs would behave as a negative-$\mu_{\mathrm{eff}}$
material at these frequencies.

Figure 2 (a) The split-ring resonator structure. The structure is
planar with an internal radius R. The metal rings are of width w
and are separated by a spacing g. (b) Generic dispersion
relationship, ω vs. k, for a resonant structure with an isotropic
effective permeability as in eqn [14].

Theoretical calculations (Pendry et al. 1999) 
assuming a nondispersive metal show that a periodic 
lattice of such structures is characterized by a 
magnetic permeability given by 


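$$\mu_{\mathrm{eff}} = 1 - \frac{f\omega^2}{\omega^2 - \omega_0^2 + i\Gamma\omega} \qquad [14]$$

where $f = \pi R^2/d^2$ is the filling factor,

$$\omega_0 = \sqrt{\frac{3lc_0^2}{\pi R^3\ln(2w/g)}} \qquad [15]$$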


is the resonant frequency, and the damping of the 
resonance is determined by the factor 
$$\Gamma = \frac{2l}{\sigma R\mu_0} \qquad [16]$$





Here $d$ is the lattice spacing, $R$ is the inner radius of
the ring, $w$ is the width of the rings, $l$ is the distance
between adjacent planes of SRRs, and $\sigma$ is the
conductance per unit length of the rings measured
along the circumference. Orientation of planar SRRs
along all three Cartesian axes allows for the creation
of an isotropic material. Figure 3 shows the generic
dispersion of $\mu(\omega)$ given by eqn [14]. A higher
resistivity for the material of the SRR would
broaden the resonance, and the frequency region
with $\mathrm{Re}(\mu) < 0$ might vanish altogether for large
resistivity.

Figure 3 The generic magnetic response of the SRR
structure. $\mathrm{Re}(\mu) < 0$ in a frequency band above the resonance
frequency.

For isotropic homogeneous materials with a
resonant effective permeability as in eqn [14] we
can illustrate a generic dispersion relationship, $\omega$ vs.
$k$, shown in Figure 2b. The solid lines represent
twofold degenerate transverse modes and the
dispersionless longitudinal magnetic plasmon
mode at the magnetic plasmon frequency ($\omega_{mp}$).
The dashed lines are a band of propagating states
with a linear dispersion determined by the
polarizability of the SRRs and a flat band of
resonant states at the magnetic resonance frequency
$\omega_0$. The gap in the dispersion can be
regarded as arising from the hybridization and
avoided crossing of these bands. The important
points to note are:

1. Wherever $\mu_{\mathrm{eff}}$ is negative there is a gap in the
dispersion relationship. This is the case for $\omega_0 < \omega < \omega_{mp}$,
where $\omega_{mp}$ is the frequency at which $\mu_{\mathrm{eff}} = 0$. Only
evanescent modes with imaginary wave vector
exist in this region.

2. A longitudinal magnetic plasma mode, which
shows no dispersion, appears at $\omega = \omega_{mp}$.
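
The two points above can be checked numerically. The following is a minimal sketch, assuming the resonant form of eqn [14] and purely illustrative dimensionless parameters (the values of `f`, `gamma`, and the frequency grid are hypothetical, not taken from any experiment). In the lossless limit the zero of the permeability lies at $\omega_{mp} = \omega_0/\sqrt{1-f}$, and the band with $\mathrm{Re}(\mu_{\mathrm{eff}}) < 0$ should span roughly $\omega_0 < \omega < \omega_{mp}$.

```python
import numpy as np

def mu_eff(omega, omega0, f, gamma):
    """Resonant permeability: 1 - f w^2 / (w^2 - w0^2 + i*Gamma*w)."""
    omega = np.asarray(omega, dtype=float)
    return 1.0 - f * omega**2 / (omega**2 - omega0**2 + 1j * gamma * omega)

def magnetic_plasma_frequency(omega0, f):
    """Zero of the lossless permeability: w_mp = w0 / sqrt(1 - f)."""
    return omega0 / np.sqrt(1.0 - f)

# Hypothetical, dimensionless parameters (frequencies in units of omega0).
omega0, f, gamma = 1.0, 0.3, 1e-3
omega = np.linspace(0.5, 2.0, 3001)

mu = mu_eff(omega, omega0, f, gamma)
negative = omega[mu.real < 0]            # frequencies inside the gap
band = (negative.min(), negative.max())  # numerical edges of the gap
omega_mp = magnetic_plasma_frequency(omega0, f)

print("gap edges:", band, "omega_mp:", omega_mp)
```

Increasing `gamma` in this sketch narrows the negative band and eventually removes it, in line with the remark on large resistivity.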


An alternative approach to obtaining a nonzero 
magnetic susceptibility in composite media is pro- 
vided by the zeroth-order transverse electric (TE) 
Mie resonance in dielectric particles. Ferroelectric 
and phonon polaritonic materials are promising 
candidates for providing the necessary large dielectric
constants up to infrared frequencies (O’Brien
and Pendry 2002a).


The high-frequency scaling properties of the SRR 
offer an interesting insight. The plasma-like dielec- 
tric permittivity of noble metals 


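$$\varepsilon(\omega) = \varepsilon_1 + i\varepsilon_2 = \varepsilon_\infty - \frac{\omega_p^2}{\omega(\omega + i\gamma)} \qquad [17]$$

is essentially a large negative real number for $\omega_p > \omega \gg \gamma$.
For a 2D array of simplified SRRs consisting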
of a single conducting ring with symmetrically 
placed small capacitive gaps, the quasistatic effective 
magnetic permeability for a magnetic field applied 
normal to the plane of the SRR is (O’Brien and 
Pendry 2002b) 


$$\mu_{\mathrm{eff}} = 1 - \frac{f'\omega^2}{\omega^2 - \omega_0^2 + i\Gamma\omega} \qquad [18]$$


where $f' = L_g f/(L_g + L_i)$, $\Gamma = L_i\gamma/(L_g + L_i)$, and
$\omega_0^2 = 1/[(L_g + L_i)C]$. In the above expressions,
$L_g = \mu_0\pi R^2$ is the geometrical inductance per unit
length of the structure and $C = \varepsilon_0\varepsilon\tau/(n d_c)$ is the
capacitance per unit length of the structure for series
connection. Here it has been assumed that the
thickness of the SRR ($\tau$) is small compared to the
skin depth $\delta \sim c_0/\omega_p$.

An additional inductive impedance in the structure,
the kinetic or inertial inductance,
$L_i = 2\pi R/(\varepsilon_0\omega_p^2\tau) = 2\pi\mu_0 R\delta^2/\tau$, determines the effective
filling fraction and damping of the resonance through
the ratio of the two contributions to the total
inductance. This contribution to the inductance arises
from the finite electron mass and implies that simply
decreasing the size of the resonators indefinitely will
not result in our being able to realize a strong
magnetic response at near-infrared or optical frequencies.
As the dimensions of the structure are
reduced, the fraction of the energy of the displacement
current associated with the inertial mass of the
electrons increases. A finite $\gamma$ then means that
dissipative losses increase. Thus, strong damping of
the resonance will be avoided if the quantity $L_g/L_i = R\tau/2\delta^2$
is large. We note here that with $\delta$ equal to the
London penetration depth, this ratio also determines 
the screening efficiency of low-frequency magnetic 
fields by a thin layer of superconductor. This result 
points to a broader similarity between the low- 
frequency electromagnetic properties of the super- 
conducting condensate and those of a perfect plasma. 
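
The scaling argument can be made concrete with a short calculation. This is a rough sketch only: it assumes the expressions $L_g = \mu_0\pi R^2$ and $L_i = 2\pi R/(\varepsilon_0\omega_p^2\tau)$ quoted in the text, and the plasma frequency, ring radii, and thickness below are hypothetical round numbers chosen for illustration. It shows that $L_g/L_i$ coincides with $R\tau/2\delta^2$ and that the ratio collapses as the ring shrinks.

```python
import math

MU0 = 4e-7 * math.pi        # vacuum permeability (H/m)
EPS0 = 8.8541878128e-12     # vacuum permittivity (F/m)
C0 = 299792458.0            # speed of light in free space (m/s)

def inductances(R, tau, omega_p):
    """Return (L_g, L_i, delta) for a ring of radius R and thickness tau."""
    delta = C0 / omega_p                                  # skin depth ~ c0 / omega_p
    L_g = MU0 * math.pi * R**2                            # geometrical inductance
    L_i = 2 * math.pi * R / (EPS0 * omega_p**2 * tau)     # inertial inductance
    return L_g, L_i, delta

def effective_parameters(f, gamma, L_g, L_i):
    """Effective filling fraction f' and damping Gamma of the resonance."""
    total = L_g + L_i
    return L_g * f / total, L_i * gamma / total

omega_p = 1.4e16            # hypothetical plasma frequency (rad/s)
tau = 20e-9                 # hypothetical ring thickness (m)
ratios = {}
for R in (1e-3, 1e-6, 50e-9):   # hypothetical radii: mm, micron, tens of nm
    L_g, L_i, delta = inductances(R, tau, omega_p)
    ratios[R] = (L_g / L_i, R * tau / (2 * delta**2))
    print(f"R = {R:.1e} m: L_g/L_i = {ratios[R][0]:.3g}")
```

For the smallest radius the inertial inductance is no longer negligible, so `effective_parameters` returns a reduced filling fraction and a damping that approaches the full $\gamma$.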

Other nanocomposites in addition to the SRR 
have been proposed which may lead to a magnetic 
response at optical frequencies. These include pairs 
of nanometer-sized metallic sticks where simulta- 
neous electric and magnetic dipole resonances lead 
to a strongly dispersive effective permittivity and 
permeability. 




Negative Refractive Index Media 


Interleaving the structures for a negative $\varepsilon_{\mathrm{eff}}$ and $\mu_{\mathrm{eff}}$
can create a composite with $\varepsilon_{\mathrm{eff}} < 0$ and $\mu_{\mathrm{eff}} < 0$ at a
common frequency ($\omega$) (Smith et al., Shelby et al.
2001), which as predicted by Veselago (1968) should 
give rise to a material with negative refractive index. 
Although this appears intuitively correct, it is actually 
nontrivial that the electromagnetic fields of the two 
composites do not interfere with each other’s function 
(Pokrovsky and Efros 2002) and this could depend 
crucially on the relative placement of the two 
structures (Marques and Smith 2004). However, 
there is now overwhelming experimental and numer- 
ical evidence that such composite structures possess 
negative refractive index (see Ramakrishna (2005, 
section 6)). Now consider a medium with predominantly
real $\varepsilon$ and $\mu$. For $\varepsilon > 0$ and $\mu > 0$, we have
our usual optical materials. If only one of $\varepsilon$ or $\mu$ is
negative while the other is positive, the
medium cannot support any propagating
modes. This is a consequence of Maxwell’s equations:

$$\mathbf{k}\cdot\mathbf{k} = \varepsilon(\omega)\mu(\omega)\frac{\omega^2}{c_0^2} \qquad [19]$$

which implies that only evanescently decaying waves
with an imaginary component of $\mathbf{k}$ are possible.
Common examples are ordinary metals with $\varepsilon < 0$
and $\mu > 0$. Now consider a medium with both $\varepsilon < 0$
and $\mu < 0$, or a negative refractive index medium.
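Maxwell’s equations for a time-harmonic plane
wave $\exp[i(\mathbf{k}\cdot\mathbf{r} - \omega t)]$ are:

$$\mathbf{k}\times\mathbf{E} = \frac{\omega}{c_0}\mu(\omega)\mathbf{H} \qquad [20]$$

$$\mathbf{k}\times\mathbf{H} = -\frac{\omega}{c_0}\varepsilon(\omega)\mathbf{E} \qquad [21]$$

The “left-handedness” of the triad ($\mathbf{E}$, $\mathbf{H}$, $\mathbf{k}$) is clear
from these equations for $\varepsilon(\omega), \mu(\omega) < 0$. A real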
refractive index means that waves propagate with the 
direction of energy flow given by the Poynting vector, 


S=ExH [22] 


opposite to the direction of the wave vector. Since 
the group velocity is in the direction of the energy 
flow, we conclude that in these left-handed materials 
(LHMs) the group velocity and the phase velocity 
are oppositely directed. The phase accumulated in 
propagating a distance $x$ is $\Delta\phi = -\sqrt{\varepsilon\mu}\,\omega x/c_0$. Thus,
the refractive index can be taken to be $n = -\sqrt{\varepsilon\mu}$,
that is, a negative quantity. Mathematically, it is
more reasonable to ask for the sign of the square
root that determines the wave vector given by eqn [19].
It can be shown by arguments of analytic continuity
in the complex plane that the negative sign has to be
chosen for propagating waves when $\mathrm{Re}(\varepsilon) < 0$ and
$\mathrm{Re}(\mu) < 0$ (Ramakrishna 2005).

Figure 4 Illustration of Snell’s law at an interface between
two media with (a) positive refractive index (VAC/RHM) and
(b) negative refractive index (VAC/LHM). The arrows indicate the
wave vectors and the energy flow is opposite to the wave vector
in the negative index medium.

The negative refractive index has real effects on 
the behavior of radiation even in basic processes 
such as refraction. Consider an interface between 
vacuum and a negative refractive index medium 
with n < 0 shown in Figure 4. Continuity conditions 
on the electromagnetic fields at the interface require 
for a plane wave incident from the vacuum side at 
an oblique angle that the parallel wave vector $k_\parallel$ is
conserved for the transmitted and reflected wave.
This is the origin of Snell’s law:

$$\sin(\theta_i) = \sin(\theta_r) = n\sin(\theta_t) \qquad [23]$$

where $\theta_i$, $\theta_r$, and $\theta_t$ are the angles of incidence,
reflection, and transmission, respectively. The flow
of energy across the interface determines the direction
of the group velocity in the material medium as
being away from the interface. Therefore, the
component of the phase velocity vector normal to
the interface must change sign as we pass from
vacuum into the material medium. We are then
forced to conclude that the ray is bent toward the
same side of the surface normal as the incident
wave. This picture is consistent with Snell’s law with
the interpretation that $n < 0 \Rightarrow \theta_t < 0$. Figure 4
illustrates this point, which has been experimentally
verified by several groups (Shelby et al. 2001, 
Parazzoli et al. 2003, Eleftheriades et al. 2002). 
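
As a quick illustration of the sign convention, the sketch below evaluates Snell's law for incidence from vacuum onto a medium of index `n`, which may be negative. The function name and the sample angles are illustrative choices, not part of the original treatment.

```python
import math

def refraction_angle(theta_i, n):
    """Transmission angle (radians) for incidence from vacuum: sin(theta_i) = n sin(theta_t).

    A negative n yields a negative theta_t, i.e. the transmitted ray lies on
    the same side of the surface normal as the incident ray.
    """
    s = math.sin(theta_i) / n
    if abs(s) > 1.0:
        raise ValueError("no propagating transmitted wave for this angle and index")
    return math.asin(s)

theta_i = math.radians(30.0)
for n in (1.5, -1.0, -1.5):
    theta_t = refraction_angle(theta_i, n)
    print(f"n = {n:+.1f}: theta_t = {math.degrees(theta_t):+.2f} deg")
```

For `n = -1` the transmitted angle is the mirror image of the incident one, which is the geometric basis of the flat lens discussed next.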

As a direct consequence of this, it is seen that a 
flat slab of negative refractive medium can act as a 
lens as shown in Figure 5. Provided that the slab is 
of sufficient thickness, the refracted rays from a 
point source come to a focus inside the slab and 
upon exiting the slab the rays are redirected again 
such that they come to a focus on the opposite side 
of the slab (Veselago 1968). Veselago also predicted 
a negative Doppler shift in such media and an 
obtuse angle cone for Cerenkov radiation. 


488 Negative Refraction and Subdiffraction Imaging 


Figure 5 Steady-state passage of rays (representing the energy
flow) of light from vacuum through a slab of thickness $d$ made of a LHM with
$n = -1$. The object plane and the image plane each lie a distance $d/2$
from the slab. The slab acts as a lens mapping a point on the object plane
to a point on the image plane.


Perfect Lens: Subwavelength Imaging 


A wave analysis of the Veselago lens revealed an 
extremely novel aspect: it did not suffer from the 
diffraction limit and the image resolution could be 
infinite (Pendry 2000), if the negative index 
material were perfectly nondispersive and nonab- 
sorbing. Before we analyze this, let us first briefly 
review the problem of imaging and the diffraction 
limit. 

Any object is visible because it emits or scatters 
light. The problem of imaging is then concerned 
with reproducing the electromagnetic field distribu- 
tion on a 2D object plane in the 2D image plane. If 
$\mathbf{E}(x,y,0)$ is the electric field on the object ($z = 0$)
plane, the fields in free space can be decomposed
into the Fourier components $k_x$ and $k_y$, and
polarization defined by $\sigma$:

$$\mathbf{E}(x,y,z;t) = \sum_{\sigma,k_x,k_y}\mathbf{E}_\sigma(k_x,k_y)\exp[i(k_x x + k_y y + k_z z - \omega t)] \qquad [24]$$

where

$$\mathbf{E}_\sigma(k_x,k_y) = \int\!\!\int \mathbf{E}_\sigma(x,y,0)\,e^{-i(k_x x + k_y y)}\,dx\,dy \qquad [25]$$

In the above expression, the source is assumed to be
monochromatic of frequency $\omega$, $k_x^2 + k_y^2 + k_z^2 = \omega^2/c_0^2$,
$c_0$ is the speed of light in free space, and
$z$ is the optical axis. A conventional lens acts by
applying a phase correction to each of the propagating
components so that they reassemble to a focus
at a point beyond the lens. For these components $k_z$
is real, thus a phase change is all that is required to
form an image containing these components. The
finer spatial details in an object, however, are
described by the nonpropagating near-field components
with an imaginary $k_z$, where $k_x^2 + k_y^2 > \omega^2/c_0^2$.
A conventional lens cannot restore these
components in the image plane as they decay
exponentially in amplitude as one moves away
from the source. Hence the resolution, $\Delta$, provided
by a conventional lens is limited to those components
with

$$k_x^2 + k_y^2 < \frac{\omega^2}{c_0^2}, \qquad \Delta \simeq \frac{2\pi c_0}{\omega} = \lambda \qquad [26]$$
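
The split between propagating and evanescent Fourier components can be sketched in a few lines. The example assumes only the free-space dispersion relation $k_x^2 + k_y^2 + k_z^2 = \omega^2/c_0^2$ from the text; the helper name and the sample wave vectors (in units where $\omega/c_0 = 1$) are illustrative.

```python
import numpy as np

def kz_free_space(kx, ky, k0):
    """Normal wave-vector component in free space, with Im(kz) >= 0 for evanescent waves."""
    kz2 = np.asarray(k0**2 - kx**2 - ky**2, dtype=complex)
    return np.sqrt(kz2)  # principal root: real for propagating, +i*kappa for evanescent

k0 = 1.0   # omega / c0, in arbitrary units
z = 1.0    # propagation distance in the same units

for kx in (0.0, 0.5, 2.0, 5.0):
    kz = kz_free_space(kx, 0.0, k0)
    amplitude = abs(np.exp(1j * kz * z))   # |exp(i kz z)|
    kind = "propagating" if kz.imag == 0 else "evanescent"
    print(f"kx = {kx}: kz = {kz:.3f} ({kind}), amplitude after z: {amplitude:.3e}")
```

Components with $k_x > \omega/c_0$ lose amplitude exponentially with distance, so a phase-correcting lens alone cannot bring them back to the image plane.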
Now consider the slab of medium with $\varepsilon = -1$
and $\mu = -1$ and of thickness $d$. It can be shown
(Pendry 2000) that the transmission and reflection
coefficients are

$$\lim_{\varepsilon,\mu\to-1} t = \exp(-ik_z d) \qquad [27]$$

$$\lim_{\varepsilon,\mu\to-1} r = 0 \qquad [28]$$

respectively, where $k_z$ is the component of the wave
vector normal to the interface. Thus, the slab
reverses the phase advance for the propagating
waves as revealed by the ray picture. Analytic
continuation to imaginary wave vectors $k_z = i\kappa_z$
implies that the transmittance $t \to \exp(+\kappa_z d)$, that
is, the slab also increases the amplitude of the 
evanescent waves in transmission at exactly the 
same rate as the rate of the decay in free space 
outside. Thus, each wave, propagating or evanes- 
cent, arrives at the image plane with its phase or 
amplitude restored exactly to the values at the object 
plane so as to perfectly reconstruct the image. The 
lens is also perfectly impedance matched and has 
zero reflection. These incredible properties have led 
the phenomenon to be called “perfect lensing.” 
Note that there is no energy flux associated with 
purely evanescent waves, and hence the amplifica- 
tion obtained in the steady state corresponds to local 
field enhancements which would imply the presence 
of localized resonances. In fact, the entire mechan- 
ism of the focusing of the near-field components is 
due to surface modes that reside on the surfaces of 
these negative index materials (Ramakrishna 2005). 
$\varepsilon = -1$ and $\mu = -1$ are precisely the conditions for
these surface modes of electric and magnetic nature,
respectively. These surface plasmon resonances
are excited resonantly by the evanescent
modes, and the secret to the perfect lens is that all
the surface modes are completely degenerate.
Although the conditions for realizing a perfect lens 
are easy to specify, in practice these are very difficult to 
meet. The requirement of negative values for $\varepsilon$ and $\mu$
implies that these quantities must disperse necessarily 
with frequency and be dissipative. Thus, the perfect- 
lens condition can only be met approximately at a 
single frequency. Any deviation from the ideal 


conditions can then result in the excitation of slab 
polariton resonances which can swamp the image. The 
effects of absorption, which are always present, can 
also seriously degrade the lens performance by damp- 
ing out the surface plasmon resonances (Ramakrishna 
2005). Consider the transmission for the P-polarized 
radiation through a negative index slab: 


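$$t(k_x) = \frac{4\,(k_{z+}/\varepsilon_+)(k_{z-}/\varepsilon_-)\exp(ik_{z-}d)}{D} \qquad [29]$$

where

$$D = (k_{z+}/\varepsilon_+ + k_{z-}/\varepsilon_-)^2 - (k_{z+}/\varepsilon_+ - k_{z-}/\varepsilon_-)^2\exp(2ik_{z-}d)$$

and the subscripts $+$ and $-$ refer to the media outside and inside the slab.
Under the perfect-lens conditions, the first term in
the denominator goes to zero for evanescent waves
and the exponential in the second term decays faster
than the exponential in the numerator. However, if
there is a mismatch in the conditions ($\varepsilon_+ = 1$ and
$\varepsilon_- = -1 + \delta$, say) then the first term in the denominator
no longer vanishes. In the large wave vector
limit ($k_x \gg \omega/c_0$), the two terms in the denominator
become approximately equal when

$$k_x = -\frac{1}{d}\ln\left|\frac{\delta}{2}\right| \qquad [30]$$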


thus yielding a criterion for the largest wave vector 
for which there is effective amplification. The 
dependence through the logarithm on the deviations 
(whether real or imaginary) from the resonant 
conditions underlines the fact that the perfect lens 
effect is indeed very sensitive. In practice, the
periodicity, $d$, of the structure of the meta-materials
comprising the negative index slab itself
imposes an upper wave vector cutoff $k_c = 2\pi/d$. The
material will become spatially dispersive for wave
vectors $k \to k_c$, and for $k > k_c$ the very description as
a homogeneous material will break down.
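
The sensitivity to a mismatch $\delta$ can be explored numerically. The following is a sketch under stated assumptions: it uses the standard P-polarized slab transmission of the form of eqn [29] for a slab in vacuum, takes $\mu_- = -1$ by default, and works in units where $\omega/c_0 = 1$ and the slab thickness is 1. The function names and the chosen value of $\delta$ are hypothetical.

```python
import numpy as np

def _kz(eps, mu, kx, k0):
    """Normal wave-vector component on the branch with Im(kz) >= 0."""
    kz = np.sqrt(complex(eps * mu * k0**2 - kx**2))
    return -kz if kz.imag < 0 else kz

def slab_transmission(kx, k0, d, eps_slab, mu_slab=-1.0):
    """P-polarized transmission coefficient across a slab of thickness d in vacuum."""
    a = _kz(1.0, 1.0, kx, k0) / 1.0            # k_z+ / eps_+  (vacuum outside)
    kz_slab = _kz(eps_slab, mu_slab, kx, k0)
    b = kz_slab / eps_slab                     # k_z- / eps_-
    phase = np.exp(1j * kz_slab * d)
    D = (a + b) ** 2 - (a - b) ** 2 * phase**2
    return 4 * a * b * phase / D

def restored_amplitude(kx, k0, d, eps_slab):
    """|t| times the free-space decay over a total vacuum path d (evanescent kx > k0 only)."""
    kappa = np.sqrt(kx**2 - k0**2)
    return abs(slab_transmission(kx, k0, d, eps_slab)) * np.exp(-kappa * d)

def cutoff_wavevector(delta, d):
    """Largest effectively amplified wave vector: k_x = -(1/d) ln|delta/2|."""
    return -np.log(abs(delta / 2)) / d

k0, d, delta = 1.0, 1.0, 1e-3
print("ideal slab, kx = 3:", slab_transmission(3.0, k0, d, -1.0))
print("cutoff estimate:", cutoff_wavevector(delta, d))
for kx in (4.0, 12.0):
    print(kx, restored_amplitude(kx, k0, d, -1.0 + delta))
```

Below the estimated cutoff the evanescent amplitude is restored almost completely; above it the restoration fails rapidly. A complex `delta` can be passed in the same way to model absorption.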

An important simplification of the perfect-lens 
conditions results when we consider a situation in 
which all length scales in the problem are much less 
than the wavelength of the light (the quasistatic 
approximation). Under these conditions, the electric 
and magnetic fields effectively decouple. If we 
consider the case of P-polarized fields, it can be 
shown (Pendry 2000) that in the quasistatic limit 
only the value of the permittivity is important, and 
there are essentially no conditions on the value of 
the permeability. This brings metals such as silver 
into the picture as the permittivity of silver becomes 
equal to —1 in the optical region of the spectrum 
and with relatively small losses (Pendry 2000). To 
overcome the losses, a series of refinements of the 
simple thin-slab picture have been proposed includ- 
ing dividing the lens into a series of layers and using 
optical amplification to act against the deleterious
effects of absorption (Ramakrishna 2005). 


The Generalized Perfect-Lens Theorem 


The negative refractive slab can be considered as 
“optical antimatter” in the sense that it cancels out the 
effects on radiation of the traversal through an equal 
amount of positive refractive index medium. This 
cancelation is applicable to the phase changes for the 
propagating modes and the amplitude changes to the 
evanescent modes. In fact, the focussing action can 
happen for more general situations where the require- 
ment of homogeneity of the slab material can be 
relaxed. Now consider the more general situation 
where the dielectric permittivity and the magnetic 
permeability are arbitrary functions of the spatial 
coordinates: 


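$$\varepsilon_+ = +\varepsilon(x,y), \quad \mu_+ = +\mu(x,y), \quad 0 < z < d \qquad [31]$$

$$\varepsilon_- = -\varepsilon(x,y), \quad \mu_- = -\mu(x,y), \quad d < z < 2d \qquad [32]$$

corresponding to Figure 6. We will consider the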
imaging axis to be the z-axis. Thus, we see that the 
system is antisymmetric with respect to the z=d 
plane. It turns out (Pendry and Ramakrishna 2003) 
that such a system also transfers the image of a 
source placed at the z=0 to the z=2d plane in the 
same exact sense that it includes both the propagat- 
ing and evanescent components. In general, the rays 
in spatially varying media will not be straight lines 
as shown in Figure 6, but the effect of propagating 
through the positive medium is nullified by the 
negative medium. Thus, to an observer on the right- 
hand side, it would appear as if the region between 
z=0 and z=2d did not exist. We will call such 
media with the same sense of transverse spatial 
variation but with opposite signs as optical com- 
plementary media, and the effect of any such pairs 
of complementary media on radiation is null. 




Figure 6 A pair of complementary optical media nullify the 
effect of each other for the passage of light. Spatially varying 
positive and negative refractive indices are schematically depicted 
by the white or shaded regions. 
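The most general conditions on the permittivity
and permeability tensors for such complementary
behavior are:

$$\varepsilon_+ = \begin{pmatrix} \varepsilon_{xx} & \varepsilon_{xy} & \varepsilon_{xz} \\ \varepsilon_{yx} & \varepsilon_{yy} & \varepsilon_{yz} \\ \varepsilon_{zx} & \varepsilon_{zy} & \varepsilon_{zz} \end{pmatrix}, \qquad \mu_+ = \begin{pmatrix} \mu_{xx} & \mu_{xy} & \mu_{xz} \\ \mu_{yx} & \mu_{yy} & \mu_{yz} \\ \mu_{zx} & \mu_{zy} & \mu_{zz} \end{pmatrix} \qquad [33]$$

and

$$\varepsilon_- = \begin{pmatrix} -\varepsilon_{xx} & -\varepsilon_{xy} & +\varepsilon_{xz} \\ -\varepsilon_{yx} & -\varepsilon_{yy} & +\varepsilon_{yz} \\ +\varepsilon_{zx} & +\varepsilon_{zy} & -\varepsilon_{zz} \end{pmatrix}, \qquad \mu_- = \begin{pmatrix} -\mu_{xx} & -\mu_{xy} & +\mu_{xz} \\ -\mu_{yx} & -\mu_{yy} & +\mu_{yz} \\ +\mu_{zx} & +\mu_{zy} & -\mu_{zz} \end{pmatrix} \qquad [34]$$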


and a perfect focus results whenever the two slabs of 
positive and negative media have such a behavior (see 
Pendry and Ramakrishna (2003) and Ramakrishna 
(2005) for the proof). This theorem clearly shows that 
the dependence along the x- and y-directions trans- 
verse to the imaging axis z is completely irrelevant as 
long as the two slabs are optically complementary. As 
an extension, it can be shown that any system of 
optically complementary media will also have a 
perfect focus as long as the system has a plane of 
antisymmetry normal to the optical axis. The above 
effects have also been numerically verified for several 
such spatially varying complementary media (Pendry 
and Ramakrishna 2003). 
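
The cancellation can be seen explicitly in the simplest special case. The sketch below is not the general theorem: it assumes homogeneous, isotropic layers and normal incidence, where the tangential fields $(E, Z_0H)$ obey $dE/dz = ik_0\mu H$ and $dH/dz = ik_0\varepsilon E$. The layer propagator is then $\exp(A d)$ with $A = ik_0\begin{pmatrix}0 & \mu\\ \varepsilon & 0\end{pmatrix}$, and changing the sign of both $\varepsilon$ and $\mu$ gives the inverse matrix. Function names and material values are illustrative.

```python
import numpy as np

def layer_matrix(eps, mu, k0, d):
    """Propagator of (E, Z0*H) across a homogeneous layer at normal incidence.

    Closed form of exp(A d), A = i k0 [[0, mu], [eps, 0]], using A^2 = -k0^2 eps mu I.
    """
    phi = k0 * d * np.sqrt(complex(eps * mu))
    sinc = 1.0 if phi == 0 else np.sin(phi) / phi
    coupling = np.array([[0.0, mu], [eps, 0.0]], dtype=complex)
    return np.cos(phi) * np.eye(2, dtype=complex) + 1j * k0 * d * sinc * coupling

k0, d = 2.0, 0.7                     # hypothetical wave number and layer thickness
eps, mu = 2.0 + 0.1j, 1.5            # hypothetical (slightly lossy) positive medium

positive = layer_matrix(eps, mu, k0, d)
complementary = layer_matrix(-eps, -mu, k0, d)
pair = positive @ complementary      # should be the identity: the pair is optically null

print(np.round(pair, 12))
```

The product is the identity for any thickness and any complex $\varepsilon$, $\mu$, which is the one-dimensional counterpart of the statement that complementary media nullify each other.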


Perfect Lens in Other Geometries 


The above generalized perfect-lens theorem along with 
a method of coordinate transformations can enable us 
to now generate a variety of superlenses in different 
geometries. In general, if we can find a geometric 
transformation that maps a given configuration into 
the geometry for the generalized slab lens, then we 
would have generated one more arrangement that will 
exhibit the property of transferring images of sources 
in a perfect sense. If we define the new coordinates 
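$q_1(x,y,z)$, $q_2(x,y,z)$, and $q_3(x,y,z)$ (assumed orthogonal),
then in the new frame, the material parameters
and fields are given by (Ward and Pendry 1996)

$$\tilde\varepsilon_i = \varepsilon_i\frac{Q_1Q_2Q_3}{Q_i^2}, \qquad \tilde\mu_i = \mu_i\frac{Q_1Q_2Q_3}{Q_i^2} \qquad [35]$$

$$\tilde E_i = Q_iE_i, \qquad \tilde H_i = Q_iH_i \qquad [36]$$

where

$$Q_i^2 = \left(\frac{\partial x}{\partial q_i}\right)^2 + \left(\frac{\partial y}{\partial q_i}\right)^2 + \left(\frac{\partial z}{\partial q_i}\right)^2 \qquad [37]$$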


Note that a distortion of space results in the change 
of £ and u tensors in general. Thus, in many cases, 
the transformed geometry would involve spatially 
varying (inhomogeneous) and anisotropic medium 
parameters. 

The change in geometry can also make it possible 
for us to realize lenses with curved surfaces. The 
original slab lens maps every point on the object plane 
to another point on the image plane. But the size of 
the image is identical to that of the source. This is due 
to the invariance in the transverse direction and the 
transverse wave vector (kx,ky) is preserved. In 
general, to change the size of the images, the 
translational symmetry would have to be broken and 
curved surfaces will necessarily be needed. The 
focussing action for the evanescent waves is crucially 
dependent on the near degeneracy of the surface 
plasmons in the case of the slab, and curved surfaces, 
in general, have a completely different dispersion for 
the surface plasmons. Thus, one should expect that 
inhomogeneous materials will be required for such 
curved lenses of negative refractive index. It can be 
shown (Ramakrishna 2005) that mapping the slab 
lens into cylindrical coordinates 


$$x = r_0e^{\ell/\ell_0}\cos\phi, \quad y = r_0e^{\ell/\ell_0}\sin\phi, \quad z = Z \qquad [38]$$

where $\ell_0$ is some scale factor ($= 1$), generates a
cylindrical annulus of inner and outer radii $a_1$
and $a_2$, respectively, with the material parameters
given by

$$\varepsilon_r = \mu_r = -1, \quad \varepsilon_\phi = \mu_\phi = -1, \quad \varepsilon_z = \mu_z = -1/r^2 \qquad [39]$$

for the annular region. The positive material outside
the annular region should vary as

$$\varepsilon_r = \mu_r = +1, \quad \varepsilon_\phi = \mu_\phi = +1, \quad \varepsilon_z = \mu_z = +1/r^2 \qquad [40]$$

where $r = r_0\exp(\ell/\ell_0)$. This system transfers images
in and out of the cylindrical annulus, and the image
of a source inside at $r = a_0$ will be formed on the
surface $a_3 = a_0(a_2/a_1)^2$. Thus, there will be a
magnification of the image by the factor

$$\frac{a_3}{a_0} = \left(\frac{a_2}{a_1}\right)^2 \qquad [41]$$

Note that these cylindrical lenses are also short-sighted
in the same manner as the slab lens. They
can focus sources from inside to the outside
only when $a_1^2/a_2 < r < a_1$, and the other way
around, from outside to the inner world, when the
source is located in $a_2 < r < a_2^2/a_1$.
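
The imaging rule for the cylindrical annulus reduces to simple arithmetic, sketched below. It assumes only the relations stated above: a source at radius $a_0$ inside the annulus is imaged at $a_0(a_2/a_1)^2$, provided $a_1^2/a_2 < a_0 < a_1$. The function name and the sample radii are hypothetical.

```python
def image_radius(a0, a1, a2):
    """Radius of the image of a source at a0 inside an annular lens (a1 < a2).

    Raises ValueError if the source lies outside the window a1^2/a2 < a0 < a1,
    where the lens cannot form an image outside the annulus.
    """
    if not (a1**2 / a2 < a0 < a1):
        raise ValueError("source outside the imaging window of the annular lens")
    return a0 * (a2 / a1) ** 2

a1, a2 = 1.0, 2.0            # hypothetical inner and outer radii
a0 = 0.8                     # hypothetical source radius
a3 = image_radius(a0, a1, a2)
print("image radius:", a3, "magnification:", a3 / a0)
```

The image always lands outside the annulus ($a_3 > a_2$) when the source satisfies the window condition, which is the "short-sightedness" noted in the text.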

Similarly the transformation into spherical coordinates
($r = r_0e^{\ell/\ell_0}, \theta, \phi$) can be used to generate a
spherical perfect lens wherein a spherical shell of
negative refractive material with $\varepsilon(r) \sim -1/r$ and
$\mu(r) \sim -1/r$ with arbitrary dependence along $\theta$ and $\phi$
(which could be constant too!) has the property of
perfectly transferring images of sources in and out of 
the shell (Pendry and Ramakrishna 2003). This 
spherical lens also has exactly the same magnifica- 
tion factor given by eqn [41]. In fact, the solutions in 
these two cases of a cylinder and sphere can also be 
obtained by a more conventional electromagnetic 
calculation in terms of the scattering modes 
(Ramakrishna 2005). One can obtain even more 
esoteric configurations such as one or two intersect- 
ing corners of negative refracting materials that 


behave as perfect lenses (Pendry and Ramakrishna 
2003). 


Other Approaches to Negative Refraction 


There is also an approach to negative refractive 
materials based on loaded transmission lines 
(Eleftheriades et al. 2002), which has been imple- 
mented at radio- and microwave frequencies using 
lumped circuit elements. These show all the hall- 
marks of a negative refractive material within an 
effective medium approach. 

Effects which can be interpreted as negative 
refraction have been observed in certain periodic 
photonic crystals (PCs) (Luo et al. 2003). An 
incident propagating plane wave from vacuum 
appears to undergo negative refraction inside the 
PC, and a slab of the PC can even work as a 
Veselago lens. The negative refraction in this case is 
a result of the curvature of the equifrequency surface 
and is present in spite of the right-handed nature of 
the propagation. In these instances, an effective 
permittivity and permeability cannot be easily 
ascribed to the crystal as the long wavelength 
condition is not met. It is difficult to homogenize 
the PC in the sense of meta-materials, and the 
energy transport in these PCs is very sensitive to the 
periodicity and the structural arrangements. Thus, it 
would be an over-simplification to characterize these 
effects in PC as merely due to an effective refractive
index. 
Introduction


Thermohydraulics is based on the hypothesis of a
continuous medium. This hypothesis is easily satisfied
since, for instance, one-thousandth of 1 mm$^3$
of a perfect gas at normal temperature and pressure
conditions (300 K, 1 atm) contains about $2.5\times10^{13}$
molecules. Instantaneous balances are made inside a
control volume fixed in the system of axes and
crossed by the flows. The limit where this volume
vanishes leads to the local formulation of the laws
governing the flows. The flow is described by
velocity $\vec v(\vec r,t)$, pressure $p(\vec r,t)$, temperature $T(\vec r,t)$,
and other fields, $\vec r$ being the position vector of a
point $M$, and $t$ the time. The material derivative of
$q(\vec r,t)$ is

$$\frac{Dq}{Dt} = \frac{\partial q}{\partial t} + \vec v\cdot\vec\nabla q$$


Let $Q$ ($\vec Q$) be one of the scalar (vectorial) extensive
quantities whose balance participates in the flow
dynamics. It can be a quantity of matter, heat,
impulse, or something else. Let $\Delta Q$ be the amount of
$Q$ contained in the volume $\Delta V$ localized around $M$,
and $q(\vec r,t)$ its local representative defined by

$$\rho(\vec r,t)\,q(\vec r,t) = \lim_{\Delta V\to0}\frac{\Delta Q}{\Delta V} = \frac{dQ}{dV} \qquad [1]$$

where $\rho$ is the density, similarly defined considering
the case where $Q$ is taken as the mass $m$:

$$\rho(\vec r,t) = \frac{dm}{dV} \qquad [2]$$

Table 1 gives examples of $q$ quantities.
The instantaneous local balance of $Q$ reads

$$\frac{\partial}{\partial t}(\rho q) + \vec\nabla\cdot(\vec j_Q + \rho q\vec v) = S_Q \qquad [3]$$

where $S_Q$ stands for any possible local source of $Q$,
and $\vec j_Q$ is the $Q$ conduction flux density. Figure 1


Table 1 Some quantities $q$. $T$ is the absolute temperature, $C_p$
the specific heat at constant pressure, and $C$ the solute mass
fraction

Q    Mass    Impulse    Kinetic energy    Heat     Mass fraction
q    1       v          v²/2              C_p T    C (0 ≤ C ≤ 1)





Figure 1 Q flux density and Q flux. 


Table 2 Physical dimension of fluxes, flux densities, and
∇·(flux density) for some $q$ quantities

Q                   q               Flux       Flux density    ∇·(flux density)
Volume              undefined       m³ s⁻¹     [velocity]      s⁻¹
Mass                1               kg s⁻¹     kg s⁻¹ m⁻²      kg s⁻¹ m⁻³
Energy, heat        [velocity]²     W          W m⁻²           W m⁻³
Electrical charge   Coulomb kg⁻¹    A          A m⁻²           A m⁻³
Impulse             [velocity]      [force]    [pressure]      [pressure] m⁻¹


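illustrates how these quantities allow us to evaluate the
flux $d\Phi_Q = \vec j_Q\cdot d\vec S$ of $Q$ that instantaneously crosses
a surface $d\vec S$. Table 2 gathers the physical dimension
of these notions for various $Q$’s.

For $\vec Q$, the flux densities are second-order tensors,
since $d\vec\Phi_{\vec Q} = \bar{\bar j}_{\vec Q}\cdot d\vec S$ is vectorial (Figure 1). Its
balance reads

$$\frac{\partial}{\partial t}(\rho\vec q) + \vec\nabla\cdot\left(\bar{\bar j}_{\vec Q} + \rho\,\vec q\otimes\vec v\right)^t = \vec S_{\vec Q} \qquad [4]$$

where $t$ indicates the transposition and $\otimes$ a dyadic
product. $\vec S_{\vec Q}$ and $\bar{\bar j}_{\vec Q}$ are given later.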

The governing equations of thermohydraulics are 
like [3] and [4]. They are completed by compatible 
initial and boundary conditions. The most general 


linear expression of the latter ones is of mixed type, 
for a scalar field, 


$$\alpha q + \beta(\hat n\cdot\vec\nabla)q = \gamma \quad \text{on the boundary} \qquad [5]$$

$\alpha$, $\beta$, and $\gamma$ being prescribed data, and $\hat n$ the outward
normal to the boundary. For a vectorial field, $\vec q$ and
$\vec\gamma$, respectively, replace $q$ and $\gamma$. The simplest cases
are Dirichlet and Neumann boundary conditions
with, respectively, $\beta = 0$ or $\alpha = 0$.


Governing Equations 


We consider nonisothermal flows of fluids in thermo- 
dynamic conditions far from the critical point where 
acoustic effects are involved. The fluid is possibly a 
binary mixture, the simplest non-pure-fluid case where 
modeling does not raise conceptual difficulties. The 


local composition is described by the solute (say) mass 
fraction, 


$$C(M,t) = \lim_{\Delta V\to0}\frac{\Delta m_{\mathrm{solute}}}{\Delta m} = \frac{\rho_{\mathrm{solute}}}{\rho}$$

with $0 \le C \le 1$. Only thermodiffusion is treated,
and the influence the solutal gradient has on the heat
flux is not considered, being negligible in liquid
mixtures. The coupling between the heat and species
molecular transports then comes only in the solutal
flux density relation

$$\vec j_{\mathrm{solute}} = -\rho\kappa_C(T,C)\left[\vec\nabla C + C(1-C)S_T\vec\nabla T\right] \qquad [6]$$

with $\kappa_C > 0$, and $S_T(T,C)$, the solute Soret
coefficient, which is positive or negative. The
order of magnitude of the Soret coefficient in
molecular solutions does not exceed a few $10^{-2}$ K$^{-1}$,
while for colloidal solutions (ferrofluids) $|S_T|$ can
be in the range 0.03-0.5 K$^{-1}$. Even if small, the
induced mass fraction separation, $\Delta C \sim S_T\Delta T$,
generates a solutal buoyancy of significant dynamical
influence.


Equation of State for the Density 


One must first describe the sensitivity of the density,
$\rho(p,T,C)$, upon pressure, temperature, and mass
fraction in static conditions. The pressure and
temperature effective ranges, $\Delta p$ and $\Delta T$, are
assumed small enough compared to their respective
mean values, $p_0$ and $T_0$, for the local (at
$\rho_0 = \rho(p_0,T_0,C_0)$) tangent to $\rho(p,T,C)$ to be a
good approximation in most cases,

$$\frac{\rho - \rho_0}{\rho_0} = \chi(p - p_0) - \alpha_T(T - T_0) + \alpha_C(C - C_0) \qquad [7]$$

where

$$\chi = \frac{1}{\rho_0}\frac{\partial\rho}{\partial p}, \qquad \alpha_T = -\frac{1}{\rho_0}\frac{\partial\rho}{\partial T}, \qquad \text{and} \qquad \alpha_C = \frac{1}{\rho_0}\frac{\partial\rho}{\partial C}$$

are the compressibility, thermal, and solutal
expansion positive coefficients, and $C_0$ is the solute
mean mass fraction. Thermodynamic properties of
some fluids are given in Table 3. Equation [7] is
valid if $\chi\Delta p$, $\alpha_T\Delta T$, and $\alpha_C|\Delta C|$ are $\ll 1$. Moreover,
in laboratory experiments and industrial
processes, one generally has $\Delta p/p_0 \ll \Delta T/T_0$. The
pressure term in [7] can thus be neglected in
thermohydraulics.
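
A short numerical sketch illustrates why the pressure term is usually dropped. It assumes the linearized form of eqn [7]; the coefficient values and the ranges $\Delta p$, $\Delta T$ are hypothetical, water-like orders of magnitude chosen only for illustration and are not read from Table 3.

```python
def relative_density_change(dp, dT, dC, chi, alpha_T, alpha_C):
    """Linearized equation of state: (rho - rho0)/rho0 = chi*dp - alpha_T*dT + alpha_C*dC."""
    return chi * dp - alpha_T * dT + alpha_C * dC

# Hypothetical water-like coefficients (SI units).
chi = 4.6e-10        # compressibility (1/Pa), assumed order of magnitude
alpha_T = 2.1e-4     # thermal expansion coefficient (1/K), assumed
alpha_C = 0.0        # pure fluid: no solutal contribution

dp, dT = 1.0e3, 10.0           # hypothetical pressure (Pa) and temperature (K) ranges

pressure_term = chi * dp       # contribution of compressibility
thermal_term = alpha_T * dT    # contribution of thermal expansion
total = relative_density_change(dp, dT, 0.0, chi, alpha_T, alpha_C)

print("chi*dp     =", pressure_term)
print("alpha_T*dT =", thermal_term)
print("relative density change =", total)
```

With these assumed numbers both terms are much smaller than one, so the linearization holds, and the pressure term is several orders of magnitude below the thermal one.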





Table 3 Some values of density, thermal expansion and
compressibility coefficients, specific heat at constant pressure,
and sound speed at $p = 1$ atm and $T = 293$ K; in SI units

Fluid       ρ        α_T       χ               C_p     c
Air         1.205    1/T       1/p             1005    344
Helium      0.167    1/T       1/p             5227    1010
CO₂         1.841    1/T       1/p             832     269
Water       1000     0.0607    4.91 × 10⁻¹⁰    4182    1461
Glycerol    1250     0.148     2.2 × 10⁻¹⁰     2333    2044
Mercury     13579              3.76 × 10⁻¹¹    1391    1409


Notice that water density exhibits a maximum
around 4°C. A quadratic term in $T$ must then be
added to [7].


The Boussinesq Approximations 


The parameter $\alpha_T\Delta T \ll 1$ is the primary source of
thermohydraulics. Therefore, the $\vec v$, $p$, $T$ and $C$ fields
can be expanded in series of terms of increasing
power in $\alpha_T\Delta T$. The leading term of each series
contains an important part of the interesting
dynamics. The forthcoming equations are given in
the corresponding approximation framework. They
contain many simplifications, due to Boussinesq. For
instance, the conductivities and diffusivities are
taken as constant, as well as $C(1 - C)S_T$ in eqn [6].
The next approximation step, the low-Mach model, 
keeps the leading compressibility and expansion 
effects, while discarding the associated acoustic 
waves. This gives access to thermo-soluto-acoustic 
phenomena. Expansion oscillations are indeed able 
to trigger, and sustain, acoustic waves provided 
phase agreements are fulfilled. This second-order 
model is not presented here. 

The compliance with the criteria $\alpha_T \Delta T \ll 1$ and
$\alpha_C |\Delta C| \ll 1$ must be checked case by case. The
section “Steady parallel-flow model” briefly illustrates
this point with an example of thermally driven
flow. Furthermore, the $T$- and $C$-sensitivity of $S_T$ is
an experimental fact that requires a generic
approach of the problem. The $C$-sensitivity of the
physical properties is generally more pronounced,
nonmonotonic, for instance, over $C \in [0,1]$, than
their $T$-sensitivity.
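The case-by-case check of these criteria is simple arithmetic. A minimal sketch follows, assuming a perfect gas (for which $\alpha_T = 1/T$) and an illustrative cutoff of 0.1 standing for “much smaller than one”; the function names and the cutoff are hypothetical and not taken from the article.

```python
def boussinesq_parameter(alpha, delta):
    """Return |alpha * delta|, e.g. alpha_T * Delta T or alpha_C * |Delta C|."""
    return abs(alpha * delta)


def perfect_gas_expansion(temperature_kelvin):
    """Thermal expansion coefficient of a perfect gas, alpha_T = 1/T."""
    return 1.0 / temperature_kelvin


def is_boussinesq(alpha, delta, cutoff=0.1):
    """Assumed criterion: the small parameter stays below an arbitrary cutoff."""
    return boussinesq_parameter(alpha, delta) < cutoff


if __name__ == "__main__":
    alpha_t = perfect_gas_expansion(293.0)
    for delta_t in (10.0, 100.0):
        eps = boussinesq_parameter(alpha_t, delta_t)
        print(f"Delta T = {delta_t:5.1f} K: alpha_T*Delta T = {eps:.3f}, "
              f"Boussinesq-like: {is_boussinesq(alpha_t, delta_t)}")
```

With these assumptions a 10 K difference in a gas at 293 K stays well inside the small-parameter regime, whereas a 100 K difference does not.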


Boussinesq Local Balances 


Mass It reads $\partial\rho/\partial t + \vec\nabla\cdot(\rho\vec v) = 0$, or equivalently
$(1/\rho)(D\rho/Dt) = -\vec\nabla\cdot\vec v$. The fluid particle density
varies along its trajectory by compressibility and
thermo-solutal expansion. At the leading order in
$\alpha_T \Delta T$ and $\alpha_C |\Delta C|$, the latter is negligible, whereas
the former is associated with acoustic effects, also
negligible when the fluid velocity is much smaller
than the sound speed. The mass balance equation
then reduces to

$$\vec\nabla\cdot\vec v = 0 \qquad [8]$$

Only transverse velocity waves (or shear waves) are
allowed by this equation, $\vec v \sim e^{i(\vec k\cdot\vec r + \omega t)}$ with $\vec k\cdot\vec v = 0$,
since acoustic contributions are discarded.


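Impulse The impulse molecular flux density is

$$\bar{\bar j}_I = p\,\bar{\bar 1} - \mu_I\left(\vec\nabla\otimes\vec v + (\vec\nabla\otimes\vec v)^t\right)$$

where $\mu_I$ is the impulse conductivity and $\bar{\bar 1}$ the
Kronecker tensor. A Newtonian fluid is defined as
having $\mu_I$ constant with respect to the rate-of-strain
tensor $\vec\nabla\otimes\vec v$. The impulse balance then
reads

$$\frac{\partial}{\partial t}(\rho\vec v) + \vec\nabla\cdot(\rho\,\vec v\otimes\vec v) + \vec\nabla\cdot\bar{\bar j}_I = \rho\,\vec\Gamma$$

In the source term $\rho\vec\Gamma$, $\vec\Gamma = \vec g$ for gravity-driven
buoyant flows.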
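With the aforementioned approximations, the
impulse balance becomes

$$\frac{D\vec v}{Dt} = -\frac{1}{\rho_0}\vec\nabla P + \frac{\rho - \rho_0}{\rho_0}\,\vec\Gamma + \nu\nabla^2\vec v \qquad [9]$$

with

$$\nu = \frac{\mu_I}{\rho_0}$$

the impulse diffusivity, and the pressure $P = p - p_{0,s}$,
$p_{0,s}$ satisfying the hydrostatic relation

$$\vec\nabla p_{0,s} = \rho_0\,\vec g$$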

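In the rotating frame of vector $\vec\Omega(t)$,

$$\frac{\rho - \rho_0}{\rho_0}\,\vec\Omega\wedge(\vec\Omega\wedge\vec r) + 2\,\vec\Omega\wedge\vec v + \frac{d\vec\Omega}{dt}\wedge\vec r$$

must be subtracted from the right-hand side of [9]
and $p_{0,s}$ redefined by

$$\vec\nabla p_{0,s} = \rho_0\left(\vec g - \vec\Omega\wedge(\vec\Omega\wedge\vec r)\right)$$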


On a free surface, a particular velocity boundary
condition is to be established. Let $d\vec S = dS\,\hat n$ be a
surface element located around M. The tangential
component ($\hat t\cdot\hat n = 0$) of the impulse flux across $d\vec S$,

$$\hat t\cdot d\vec F = \hat t\cdot\bar{\bar j}_I\cdot d\vec S = -\mu_I\,\hat t\cdot\left[\vec\nabla\otimes\vec v + (\vec\nabla\otimes\vec v)^t\right]\cdot\hat n\,dS$$

must be continuous. Surface tension $\sigma(T,C)$ inhomogeneities
make the free surface a source of
impulse which diffuses in the fluid core. A flow
occurs even with $\vec\Gamma = 0$. For the fluid located where
$d\vec S$ points to, the velocity boundary condition on the
free surface then reads


For most fluids, $\partial\sigma/\partial T < 0$. In the Boussinesq
framework $\partial\sigma/\partial T$ and $\partial\sigma/\partial C$ are constant.
Equation [10] couples the impulse balance with the heat
and composition ones.


Heat Local thermodynamic equilibrium is
assumed. The molecular heat flux density is
$\vec j_{heat} = -\mu_T\vec\nabla T$, with $\mu_T$ the thermal conductivity.
The approximate heat balance reads

$$\frac{DT}{Dt} = \kappa_T\nabla^2 T + S_{heat} \qquad [11]$$

where $\kappa_T = \mu_T/(\rho_0 C_p)$ is the heat diffusivity and
$S_{heat}$ a possible local (Joule, radioactive, ...) heat
source. Thermohydraulics can simply be driven by
nonuniform thermal conditions imposed along the
fluid boundary, and in this article we henceforth
take $S_{heat} = 0$.


Mass fraction Approximating [6] yields the mass
fraction balance,

$$\frac{DC}{Dt} = \kappa_C\left(\nabla^2 C + C_0(1 - C_0)S_T\nabla^2 T\right) \qquad [12]$$

where $\kappa_C$ and $S_T$ are evaluated at $T_0$ and $C_0$. The
normal flux condition

$$(\vec\nabla C\cdot\hat n) = -C_0(1 - C_0)S_T\,(\vec\nabla T\cdot\hat n)$$

is imposed on impervious boundaries.


The Hydrostatic State 


Knowing whether the fluid can be in a static state
with respect to its presupposed rigid container helps
for a first understanding of thermohydraulic
dynamics. This raises two problems: (1) the existence
of this state and (2) its stability, discussed
later. Point (1) requires the fulfilment of three
relations,

$$\vec\nabla p = \rho(p,T,C)\,\vec\Gamma \qquad [13]$$

$$\frac{\partial T}{\partial t} = \kappa_T\nabla^2 T, \qquad \frac{\partial C}{\partial t} = \kappa_C\left(\nabla^2 C + C_0(1 - C_0)S_T\nabla^2 T\right) \qquad [14]$$

The curl of [13] yields

$$\vec\nabla\rho(p,T,C)\wedge\vec\Gamma + \rho(p,T,C)\,\vec\nabla\wedge\vec\Gamma = 0$$

which has no reason to be generically satisfied since
$\rho(p,T,C)$ and $\vec\Gamma$ are totally uncorrelated. The
hydrostatic state cannot exist if $\vec\Gamma$ does not derive
from a scalar potential, as with

$$\vec\Gamma = \vec g - \vec\Omega\wedge(\vec\Omega\wedge\vec r) - \frac{d\vec\Omega}{dt}\wedge\vec r \quad\text{if}\quad \frac{d\vec\Omega}{dt} \neq 0$$

The Earth’s rotation axis is known to precess with a
period of about 26000 years. This generates a
component of 26000 years timescale in the atmospheric,
oceanic, and internal flows.


Considering now that

$$\vec\Gamma = -\vec\nabla\psi$$

the existence of a hydrostatic state only depends on
the simultaneous verification of [14] and

$$\vec\nabla\rho(p,T,C)\wedge\vec\nabla\psi = 0 \qquad [15]$$

Iso-$\psi$ surfaces must therefore coincide with isopycnal,
isobaric, iso-$T$, and iso-$C$ surfaces since the
$p$, $T$, and $C$ sensitivities of $\rho$ are uncorrelated. The
compatibility of this condition with [14] is the key
for concluding about the existence of the hydrostatic
state. Considering again our planet as an
example (forgetting about precession), the iso-$\psi$
surfaces are almost ellipsoidal. Such $T$ and $C$
distributions cannot satisfy [14]. Thus, the atmospheric
and oceanic dynamics, and thermohydraulics
as well, are due to a nonvanishing thermal
torque, $\vec\nabla T\wedge\vec\nabla\psi$.

A free surface in hydrostatic state is isothermal
and isocompositional, by eqn [10], whatever $\vec\Gamma$.

Dimensionless Local Balances 


Table 4 Orders of magnitude of the Prandtl number for the
usual fluids. Air and water are in normal conditions

Liquid metals                     Gases              Water   Oils
Several $10^{-3}$ to $10^{-2}$    ~1, 0.7 for air    6.7     >10

In buoyancy-driven thermohydraulics, we consider
four velocity scales: three of molecular origin, and
the fourth is the free-fall velocity in the buoyancy,

$$V_1 = \frac{\kappa_T}{L}, \qquad V_2 = \frac{\kappa_C}{L}, \qquad V_3 = \frac{\nu}{L}, \qquad V_4 = \sqrt{\alpha_T \Delta T\, g L}$$


$L$ being a fluid container size scale. Thence come the
Rayleigh, Prandtl and Lewis numbers,

$$Ra = \frac{V_4^2}{V_1 V_3} = \frac{\alpha_T \Delta T\, g L^3}{\nu\kappa_T}, \qquad Pr = \frac{V_3}{V_1} = \frac{\nu}{\kappa_T}, \qquad Le = \frac{V_2}{V_1} = \frac{\kappa_C}{\kappa_T}$$

Ra being the experimental control parameter, and
$Le \ll 1$. Table 4 gives Pr orders of magnitude for
usual fluids. Let $V$ be the fluid velocity amplitude.
The importance of the thermal, solutal, and impulse
convections with respect to the corresponding
diffusions is, respectively, estimated by the thermal,
compositional Péclet and Reynolds numbers,


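$$Pe_T = \frac{V}{V_1} = \frac{VL}{\kappa_T}, \qquad Pe_C = \frac{V}{V_2} = \frac{VL}{\kappa_C}, \qquad Re = \frac{V}{V_3} = \frac{VL}{\nu}$$

with

$$Pr = \frac{Pe_T}{Re}, \qquad Le = \frac{Pe_T}{Pe_C}, \qquad Ra = (Pe_T\,Re)_{V=V_4}$$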

Capillary thermohydraulics introduces one velocity
scale and the Marangoni number,

$$Ma = \frac{|\Delta\sigma|}{\mu_I V_1}$$

with $\Delta\sigma = (\partial\sigma/\partial T)\Delta T$ in pure fluid. A small
capillary number, $Ca = |\Delta\sigma|/\sigma$, indicates a weak
influence of the dynamics upon the free-surface
curvature.


Let $V_1$, $\Pi = \rho_0 V_1^2$, $\tau = L/V_1$, $\Delta T$ and
$\Delta C = -C_0(1 - C_0)S_T\Delta T$
be the velocity, pressure, time, temperature, and
mass fraction scales, with

$$\Theta = \frac{T - T_0}{\Delta T} \quad\text{and}\quad C = \frac{C - C_0}{\Delta C}$$

the reduced temperature and mass fraction, respectively.
The other quantities, coordinates included,
are similarly reduced and noted identically.
Equation [8] does not change and [9], [11] and [12]
become, respectively,

$$\frac{D\vec v}{Dt} = -\vec\nabla P + Pr\left[Ra\,(\Theta + \Psi_B C)\,\hat e_z + \nabla^2\vec v\right] \qquad [16]$$

$$\frac{D\Theta}{Dt} = \nabla^2\Theta \qquad [17]$$

$$\frac{DC}{Dt} = Le\left(\nabla^2 C - \nabla^2\Theta\right) \qquad [18]$$

where

$$\Psi_B = \frac{\alpha_C \Delta C}{\alpha_T \Delta T}$$

is the buoyancy separation ratio and $\hat e_z = -\vec g/|\vec g|$.


A $\Psi_B < 0$ ($> 0$) corresponds to opposite (cooperative)
thermal and solutal buoyancies. The reduced
mass fraction boundary condition on impervious
walls is

$$(\vec\nabla C\cdot\hat n) = (\vec\nabla\Theta\cdot\hat n) \qquad [19]$$


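In a rotating frame, scaling $\vec\Omega(t)$ by $\Omega_0$, $\vec\Omega(t) := \vec\Omega(t)/\Omega_0$,

$$Ra\,Fr\,(\Theta + \Psi_B C)\,\vec\Omega\wedge(\vec\Omega\wedge\vec r) - Ek^{-1}\left(2\,\vec\Omega\wedge\vec v + \frac{d\vec\Omega}{dt}\wedge\vec r\right)$$

must be added inside the square-bracket term of
[16]. The Froude and Ekman numbers appear as

$$Fr = \frac{\Omega_0^2 L}{g}, \qquad Ek = \frac{\nu}{\Omega_0 L^2}$$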


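The dimensionless capillarity stress condition [10]
reads

$$\hat t\cdot\left[\vec\nabla\otimes\vec v + (\vec\nabla\otimes\vec v)^t\right]\cdot\hat n = -Ma\left((\vec\nabla\cdot\hat t)\Theta + \Psi_C\,(\vec\nabla\cdot\hat t)C\right) \qquad [20]$$

with

$$\Psi_C = \frac{\partial\sigma/\partial C}{\partial\sigma/\partial T}\,\frac{\Delta C}{\Delta T}$$

the capillarity separation ratio, and

$$Ma = -\frac{\partial\sigma}{\partial T}\,\frac{\Delta T}{\mu_I V_1}$$

These equations show that, in the Boussinesq
framework, the flow physics does not depend on
$\rho_0$, $T_0$, and $C_0$, except through the material properties
which enter the numbers.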


Linear Stability 


Given a base state $S = (\vec v, \Theta, C)$, a solution of [8],
[16]-[18], how does it behave in presence of an
infinitesimal disturbance $(\delta\vec v, \delta\Theta, \delta C)$? Applying [8],
[16]-[18] to $(\vec v + \delta\vec v, \Theta + \delta\Theta, C + \delta C)$ and discarding
the quadratic terms in perturbation provide the
disturbance temporal evolution,

$$\vec\nabla\cdot(\delta\vec v) = 0 \qquad [21]$$


where $\vec F = (-\vec\nabla(\delta P), 0, 0)^t$, and
Bp, RaPre, RaPr Ypg ê, 
A=| 0 Bi 0 [23] 
0 -Ñ Bie 


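with $B_a = -(\vec v\cdot\vec\nabla) + a\nabla^2$. The perturbations $(\delta\vec v, \delta\Theta,
\delta C)$ have the $(\vec v, \Theta, C)$ boundary conditions, but
homogeneous. On a free surface, the perturbation
capillary stress condition is

$$\hat t\cdot\left[\vec\nabla\otimes\delta\vec v + (\vec\nabla\otimes\delta\vec v)^t\right]\cdot\hat n = -Ma\left((\vec\nabla\cdot\hat t)\,\delta\Theta + \Psi_C\,(\vec\nabla\cdot\hat t)\,\delta C\right) \qquad [24]$$

Recasting [21]-[23] provides

$$\frac{\partial}{\partial t}\begin{pmatrix}\delta\vec v\\ \delta\Theta\\ \delta C\end{pmatrix} = \mathcal{L}(S)\begin{pmatrix}\delta\vec v\\ \delta\Theta\\ \delta C\end{pmatrix} \qquad [25]$$

whose solution is

$$\begin{pmatrix}\delta\vec v(t)\\ \delta\Theta(t)\\ \delta C(t)\end{pmatrix} = e^{\mathcal{L}(S)\,t}\begin{pmatrix}\delta\vec v(t=0)\\ \delta\Theta(t=0)\\ \delta C(t=0)\end{pmatrix} \qquad [26]$$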

Direct System 


$\mathcal{L}(S)$ is made of $\vec\nabla$ acting on the initial perturbation.
Conclusions about $S$ stability depend on the sign of
$\lambda_{max}$, the real part of the leading eigenvalue of $\mathcal{L}$
found with all the possible perturbations. There is
stability if $\lambda_{max} < 0$. At $\lambda_{max} = 0$, the marginal
stability, the bifurcation threshold is located at
$Ra(Pr, Le, \Psi_B, \Psi_C, X) = Ra_c$, $Ra_c$ being the critical
value of the control parameter, $X$ containing all
the other parameters of the problem (container
aspect ratios, etc.). The nonlinear-stability analysis
in the vicinity of $Ra_c$ supplies $\epsilon$ in $\lambda_{max} \propto
(Ra - Ra_c)^{\epsilon}$, which is characteristic of the
bifurcation.





Figure 2 Leading axisymmetric thermal adjoint eigenvector 
(Courtesy of O Bouizi and C Delcarte). 


Adjoint System 


The complex conjugate of the leading left eigenmode
supplies the response field of the base state to the
most destabilizing localized disturbances.

The analytical determination of the $S$ state and of the
$\mathcal{L}$ eigenspace is often impossible. One must resort to
specifically designed numerical tools. A numerical
adjoint eigenvector is presented in Figure 2 for a
(Ma = 106, Pr=10~) side-heated cylindrical liquid 
bridge, with a free surface on the right and the axis 
on the left. 


Nonlinear Stability 


When $\lambda_{max} > 0$, the associated disturbance grows
exponentially with time, until nonlinearities become
essential. The flow progressively evolves from $S$
towards a new state, $S'$, which is a solution of [8],
[16]-[18]. How can one proceed analytically to
know how the nonlinearities control the bifurcation?
A large number of $S \to S'$ bifurcations exist,
with either both $S$ and $S'$ steady, or both unsteady but with
different flow structures, or one steady and the
other not. Bifurcations can also be reversible or
hysteretic, with respect to Ra. The symmetries of $S$
play an important role and non-Boussinesq effects
change the thresholds and the nature of the bifurcation.

Landau’s works have opened up the way to the
theory of nonlinear hydrodynamic stability. The
ruling equations are reduced, using an appropriate
expansion method, to a set of ordinary differential
equations describing the temporal evolution of
amplitudes, $A_i$, $i = 1, 2, \ldots, I$, characterizing the
perturbation eigenmodes,

$$\frac{dA_i}{dt} = \lambda_i A_i + N_i(A_j) \quad\text{for } i, j = 1, 2, \ldots, I \qquad [27]$$

where $N_i$ accounts for the nonlinear action of the $I$
modes on $A_i$, and the $\lambda_i$’s are the temporal growth
rates coming from the linear theory. The stability of
the steady solutions, $dA_i/dt = 0$, is determined by
local analysis. With one destabilizing mode, the
simplest model is $dA/dt = \lambda A - \alpha A|A|$, with $\alpha > 0$,
constant, specific of the bifurcation. Symmetry considerations
(some of them directly originate from the
Boussinesq framework) may impose $\alpha = 0$, whereby
the simplest model becomes $dA/dt = \lambda A + \beta A^3$, with
$\beta$ another constant.

When the flow is weakly confined in one or two
space directions, boundary effects can play a subtle
dynamical role, allowing, for instance, the existence
of multiple solutions, each one made of many
interacting modes. A large variety of flow regimes
is then observed, such as steady/traveling, extended/
localized wave packets, particularly in binary mixtures.
Spacetime models, close to [27], such as the
Ginzburg-Landau equation,

$$\frac{\partial A}{\partial t} = \lambda A + \gamma\frac{\partial^2 A}{\partial x^2} - \delta|A|^2 A$$

are derived for describing the dynamics of the wave
packet envelope (of complex amplitude $A$).
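The saturation described by the two single-mode models can be seen by integrating them in time. The sketch below is a minimal illustration, assuming a real amplitude, an explicit Euler scheme, and arbitrary coefficient values; in the cubic model $\beta$ is taken negative so that the amplitude saturates. Under these assumptions the steady states are $A = \lambda/\alpha$ and $A = \sqrt{-\lambda/\beta}$ respectively.

```python
def integrate_amplitude(rhs, a0, dt=1.0e-3, steps=40000):
    """Explicit Euler integration of dA/dt = rhs(A) for a real amplitude A."""
    a = a0
    for _ in range(steps):
        a += dt * rhs(a)
    return a


def quadratic_model(lam, alpha):
    """Right-hand side lam*A - alpha*A*|A| of the single-mode model."""
    return lambda a: lam * a - alpha * a * abs(a)


def cubic_model(lam, beta):
    """Right-hand side lam*A + beta*A**3 of the symmetric model (alpha = 0)."""
    return lambda a: lam * a + beta * a ** 3


if __name__ == "__main__":
    # Illustrative coefficients only.
    print(integrate_amplitude(quadratic_model(1.0, 2.0), 0.01))   # near lam/alpha
    print(integrate_amplitude(quadratic_model(-1.0, 2.0), 0.01))  # decays towards 0
    print(integrate_amplitude(cubic_model(1.0, -1.0), 0.01))      # near sqrt(-lam/beta)
```

A small initial disturbance first grows exponentially at the linear rate $\lambda$ and then levels off once the nonlinear term balances the linear one; for $\lambda < 0$ it simply decays.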


Hydrostatic State Stability 


The static-state stability is analytically tractable in
unbounded volume. Transverse wave (by [21])
solutions are the potentially destabilizing perturbations,
with wave vector $\vec k$ and complex frequency $\omega$.
The system [22]-[23] gets simplified, and $\mathcal{L}$ becomes
algebraic upon substituting $(i\vec k, i\omega)$ for $(\vec\nabla, \partial/\partial t)$.
Intuitively, the quiescent state loses its stability when
$\vec\nabla\rho(p,T,C)\cdot\vec\nabla\psi$ exceeds a threshold value (positive,
by the dissipative effects). This analysis supplies it,
together with the data of the oscillatory motions
emerging at onset from the rest-state instability.

In reality, the fluid is confined in three dimensions,
possibly with free surfaces, and wave solutions
are no longer usable. The first approach
consists in defining a simplified model confined in
one dimension. The perturbations must satisfy
homogeneous boundary conditions, and/or [24],
and they are waves in both other space directions.
The resulting problem may be analytically tractable.
The stability of many quiescent-state configurations
was studied, for fluid layers of infinite or very large
extension, of pure fluid or mixtures, with or without a free
surface. Nonetheless, many other configurations have
not yet been analyzed. Two- and three-dimensional cases
must be treated numerically.


Gravitational Buoyancy Convection 


Among the countless thermal situations to analyze,
research mainly favored the case where the
fluid is confined in simple geometries and submitted
to two distinct heating directions, $\vec\nabla T$ being either
aligned or normal to $\vec\Gamma$, that is vertical or horizontal
in the gravity field. Each case leads to specific
thermohydraulics. The rest-state stability is the first
analysis step of the former case, the first to be
experimentally studied by Bénard in 1900, with a
horizontal liquid layer. The latter is of more recent
interest, with Batchelor’s theoretical work on the
parallel convective regimes of pure fluid confined in
a tall slot. Since then, a large amount of work has
been published on those cases, tackling various 
confinement geometries, and involving high Ra 
values. This problem became the paradigm of the 
rich spatiotemporal behaviors arising in nonlinear 
systems driven away from equilibrium. In binary 
mixtures the complexity of the dynamics increases 
considerably. The literature is so far practically 
devoid of any three-dimensional results in mixtures. 
Ternary mixtures have so far been only scarcely 
considered. 


Steady Parallel-Flow Model 


This analytical approach comes from an interesting
remark by Batchelor, made about the vorticity but
here applied to the velocity of a confined flow. “A
number of flow fields are characterized by values of
the magnitude of the” velocity “in the neighborhood
of a certain line in the fluid which are much larger
than those elsewhere,” and (by $\vec\nabla\cdot\vec v = 0$) “this line
of necessity” is parallel to $\vec v$ and to the container
walls.

Buoyant forces may contradict this assertion,
particularly in the Rayleigh-Bénard configuration with
imposed temperatures. There, no parallel solution
exists. Nevertheless, steady parallel flows do exist in
containers. The thermally active walls (whatever
they be, the largest or smallest) are either
maintained at constant temperatures, or subjected
to a constant heat flux. Figure 3 sketches a cross
section (hereafter referred to as the vertical midplane)
of such a configuration, with active (uniform
heating $q$) vertical walls. The other sides are
adiabatic. No rest state is allowed here. Although
intrinsically three dimensional, the steady regime in
this cavity can be fairly well approximated as
two dimensional (in the vertical midplane), and
moreover mainly parallel to the active walls, in an
Ra range which increases with the aspect ratio, $H/L$.
The influence of the horizontal sides is of limited
range compared to the flow extension, $H$. The
parallel flow is then the one-dimensional approximation
of what occurs in the major part of the
cavity. This configuration is taken with a binary
mixture for illustrating an approach applicable with
minor variations in other situations.

Figure 3 Sketch of the cross section of a slender vertical
container ($L \ll H$).

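The problem becomes linear. Indeed, $\vec v = w(x)\,\hat e_z$
by $\vec\nabla\cdot\vec v = 0$. Taking $\Delta T = qL/\mu_T$ as temperature
scale, [16]-[18] imply

$$\Theta(x,z) = G_T z + \hat\Theta(x), \qquad C(x,z) = G_C z + \hat C(x)$$

with $G_T$, $G_C$ as constants. The impulse balance is

$$\frac{d^3 w}{dx^3} = -Ra\left(\frac{d\hat\Theta}{dx} + \Psi_B\frac{d\hat C}{dx}\right) \qquad [28]$$

and the ruling equations

$$\frac{d^4 w}{dx^4} = -Ra\left[G_T + \Psi_B\left(G_T + \frac{G_C}{Le}\right)\right] w, \qquad w\,G_T = \frac{d^2\hat\Theta}{dx^2}, \qquad w\,(Le\,G_T + G_C) = Le\,\frac{d^2\hat C}{dx^2} \qquad [29]$$

An internal length scale is predicted, of thickness

$$\left|Ra\left(G_T + \Psi_B\left(G_T + \frac{G_C}{Le}\right)\right)\right|^{-1/4}$$

By [28] and [19], the thermal flux condition yields

$$\left.\frac{d^3 w}{dx^3}\right|_{x=\pm 1/2} = -Ra\,(1 + \Psi_B)$$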
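The reduction from [16]-[18] to the linear system [28]-[29] can be retraced step by step. The derivation below is a sketch under the stated assumptions only: a steady flow $\vec v = w(x)\,\hat e_z$, the stratified fields $\Theta = G_T z + \hat\Theta(x)$ and $C = G_C z + \hat C(x)$, and a pressure depending on $z$ alone, which follows from the $x$-component of [16].

```latex
\begin{aligned}
&\text{[17]:}\quad w\,\partial_z\Theta = \nabla^2\Theta
  \;\Longrightarrow\; w\,G_T = \frac{d^2\hat\Theta}{dx^2},\\[4pt]
&\text{[18]:}\quad w\,\partial_z C = Le\left(\nabla^2 C - \nabla^2\Theta\right)
  \;\Longrightarrow\; w\,(Le\,G_T + G_C) = Le\,\frac{d^2\hat C}{dx^2},\\[4pt]
&\text{[16], $z$-component:}\quad
  0 = -\frac{dP}{dz} + Pr\left[Ra\,(\Theta + \Psi_B C) + \frac{d^2 w}{dx^2}\right]
  \;\overset{\partial_x}{\Longrightarrow}\;
  \frac{d^3 w}{dx^3} = -Ra\left(\frac{d\hat\Theta}{dx} + \Psi_B\frac{d\hat C}{dx}\right),\\[4pt]
&\text{one more $x$-derivative, with the first two lines:}\quad
  \frac{d^4 w}{dx^4} = -Ra\left[G_T + \Psi_B\left(G_T + \frac{G_C}{Le}\right)\right] w .
\end{aligned}
```

The last line is a constant-coefficient equation for $w$, which is why a single internal length scale, the inverse fourth root of the coefficient, appears.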





A last operation allows one to determine $G_T$ and $G_C$.
The overall heat and mass fraction balances are
performed in the cavity part (V), which is bounded
by a horizontal plane located within the parallel-flow
region. Since the walls are impervious, the
solute is transported only across the lower boundary
of (V), through which the net vertical convective
supply must be balanced, in steady regime, by
vertical diffusion. The heat balance works similarly,
since the walls are adiabatic or submitted to equal
fluxes. Whence the relations,

$$\frac{1}{Le}\int_{-1/2}^{1/2} w(x)\,\hat C(x)\,dx = G_C - G_T, \qquad \int_{-1/2}^{1/2} w(x)\,\hat\Theta(x)\,dx = G_T$$

The steady parallel flow is determined. Its stability
can be analyzed as indicated in the section “Linear
stability.”

Some caution must be taken for the Boussinesq
approximations to be valid here, with the temperature
and mass fraction increasing constantly (by
$G_T$, $G_C$) along the direction of largest cavity extension.
These gradients are at the origin of the
separation power of the “thermogravitational column,” a
device designed for isotope separation. Extremely
long columns can provide almost complete
separations, with $\alpha_C |\Delta C|$ no longer $\ll 1$, and then
non-Boussinesq effects occur.

As an illustration of the aforementioned notions, let
us consider the ($Pr = 1$, $Le = 0.1$) Rayleigh-Bénard-Soret
(RBS) problem where horizontal solid plates of
infinite extension are uniformly heated from above
($Ra < 0$) or below ($Ra > 0$). This configuration is
simply obtained by rotating the cavity in Figure 3 by
$\pm\pi/2$ with respect to $\vec g$ and to $(\hat e_x, \hat e_z)$. The steady
parallel-flow model can lead to the right-hand side
of an equation like [27] governing the time evolution
of $A$, the parallel-flow amplitude,


+o? (1 -7)| [30] 
fe 
where 
315 Ra ee | 
X= 478° r= F909” ro = |1 + Up(14+ Le n) 


Here $r_c$ is the critical value of $r$ where the rest state
loses its stability towards a steady parallel flow. The
roots of $dA/dt = 0$ are $A = A_0 = 0$, $A = A_1(r, Le, \Psi_B)$,
for the quiescent, convective states. Figure 4 shows
$A_0 = 0$ and the curves $A_1(r)$ for several
$\Psi_B(Le)$, $\Psi_B = -(1 + Le^{-1})^{-1}$ being the $r_c$ pole. The
solid (dotted) parts correspond to the stable
(unstable) steady states, emerging from direct (backward)
pitchfork bifurcations of the rest state at $r_c$.
Saddle-node bifurcations from unstable to stable
steady states are also predicted, on the dashed curve
of the equation

Figure 4 Bifurcation diagram of $A_0(r)$ and $A_1(r)$ for various
separation ratios $\Psi_B(Le)$.


A= +3 r— (1+ Le?) 


Fully Nonlinear Problem 


Numerical tools are required for solving the system 
[8], [16]-[18] and analyzing the stability of the 
flows obtained. 


The RBS Case Let us illustrate how the rest-state
loss of stability occurs in the two-dimensional RBS
case, with a ($Pr = 1$, $Le = 0.1$, $\Psi_B = -0.2$) mixture.
The flow lies in the meridian plane of an axisymmetric
container with the radius/height ratio equal
to 2. No-slip conditions are imposed on impervious
walls; the temperature on the bottom plate is higher
than on top, and the peripheral wall is adiabatic. At
$t = 0$, the quiescent state is given a small random
perturbation. The system evolves (Figure 5) towards
a stable periodic solution via a transient regime of
exponentially amplified amplitude (eqn [26]). One
speaks of a Hopf bifurcation for a steady (here
quiescent) state destabilization by oscillatory
disturbances.

Figure 5 Time evolution of a radial velocity nodal value for Ra = 2600. Reproduced from Millour, Labrosse, and Tric (2003) Physics
of Fluids 15(10): 2791-2802, with permission from American Institute of Physics.

Figure 6 Instantaneous angular frequency $\omega$, corresponding to Figure 5. Reproduced from Millour, Labrosse, and Tric (2003)
Physics of Fluids 15(10): 2791-2802, with permission from American Institute of Physics.

The “instantaneous” frequency (from the time
running between two successive identical passes of
the signal) evolves with time (Figure 6) from its
threshold value to its nonlinearly saturated one.
Accurate determination of the thresholds and identification
of the associated bifurcation is possible by
fitting the exponent $\epsilon$ of $\lambda_{max}(Ra)$ from the
exponential growth of Figure 5, in the $Ra_c$ vicinity.
Figure 7 shows (solid dots) $\lambda(Ra)$ measurements,
and the solid line (in Figure 8 also) is the linear law
given by the two points closest to the vanishing
growth rate. The local law announced in the
subsection “Direct system” is confirmed, with
an exponent $\epsilon = 1$ for the Hopf bifurcation, and
$\epsilon = 1/2$ for saddle-node (Figure 8) and pitchfork
bifurcations.
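The two-point linear law can be sketched in a few lines. The function below extrapolates a straight line through the two measurements closest to vanishing growth rate and returns the value of Ra where it crosses zero; it is applied to $\lambda$ when $\epsilon = 1$ and to $\lambda^2$ when $\epsilon = 1/2$. The sample numbers are synthetic and serve only to exercise the arithmetic; they are not the data of Figures 7 and 8.

```python
def threshold_from_two_points(ra_1, y_1, ra_2, y_2):
    """Zero crossing of the straight line through (ra_1, y_1) and (ra_2, y_2)."""
    slope = (y_2 - y_1) / (ra_2 - ra_1)
    return ra_1 - y_1 / slope


def hopf_threshold(ra_1, lam_1, ra_2, lam_2):
    """Exponent 1: the growth rate itself is linear in Ra near threshold."""
    return threshold_from_two_points(ra_1, lam_1, ra_2, lam_2)


def saddle_node_threshold(ra_1, lam_1, ra_2, lam_2):
    """Exponent 1/2: the squared rate is linear in Ra near threshold."""
    return threshold_from_two_points(ra_1, lam_1 ** 2, ra_2, lam_2 ** 2)


if __name__ == "__main__":
    # Synthetic measurements (hypothetical values).
    print(hopf_threshold(2550.0, 0.05, 2600.0, 0.10))
    print(saddle_node_threshold(2100.0, 1.0, 2400.0, 2.0))
```

Choosing which quantity to extrapolate is itself the identification step: the law that is linear in Ra close to threshold reveals the exponent and hence the type of bifurcation.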


Figure 7 Temporal growth rate, $\lambda$, of infinitesimal perturbations,
in the vicinity of the Hopf bifurcation of the quiescent state.
Reproduced from Millour, Labrosse, and Tric (2003) Physics of
Fluids 15(10): 2791-2802, with permission from American
Institute of Physics.

Figure 8 Squared temporal growth rate, $\lambda^2$, of transient
relaxation towards the stationary state close to the saddle-node
bifurcation. Reproduced from Millour, Labrosse, and Tric
(2003) Physics of Fluids 15(10): 2791-2802, with permission
from American Institute of Physics.

The Thermally Driven Cubic Cavity All flows are
obviously three dimensional. When do they possess
a two-dimensional approximation? How to qualify
it? Clearly, the flow that develops in the container of
Figure 3 might enjoy (in a given parameter domain,
D) the mirror-reflection symmetry property about
the vertical midplane. Is there a two-dimensional
approximation of the flow in this midplane? Is it
able to give a correct estimate of the two-dimensional
flow stability within D, and to predict the D
frontiers, where the mirror-reflection symmetry
property ceases to be valid? Only partial answers
are available so far, coming from the thermally
driven cubic cavity (Figure 9).

Figure 9 Sketch of the thermally driven cubic cavity.

Filled with a pure fluid, its left and right vertical
plates have fixed temperatures, $T_0$ ($\Theta = 0$ at $x = 0$)
and $T_0 + \Delta T$ ($\Theta = 1$ at $x = 1$), while the others are
adiabatic. Any $\Delta T \neq 0$ generates a flow, possibly
mirror-symmetric about the vertical (hatched)
midplane, and also centrosymmetric about $\hat e_y$. The
two-dimensional approximation was extensively analyzed,
numerically, with air as a fluid. A steady flow
is obtained for $Ra < Ra_{2D,c} = (1.82 \pm 0.01)\times 10^8$,
where an oscillatory regime appears. The numerical
three-dimensional flow is steady until $Ra_{3D,c} =
3.2\times 10^7$, where it hysteretically bifurcates towards
an oscillatory regime breaking the mirror symmetry
about the midplane. Let us assess the validity of the
two-dimensional approximate solutions. We define
dimensionless heat fluxes (Nusselt numbers) which
penetrate in one of the active walls,

$$Nu(y) = \int_0^1 \left.\frac{\partial\Theta}{\partial x}\right|_{x=0} dz$$

Three fluxes are interesting to compare: (1) in the
midplane, $Nu_{mp} = Nu(y = 1/2)$, (2) globally $Nu_{3D,w} =
\int_0^1 Nu(y)\,dy$, and (3) the two-dimensional
approximation

$$Nu_{2D,w} = \int_0^1 \left.\frac{\partial\Theta_{2D}}{\partial x}\right|_{x=0} dz$$

Figure 10 Relative 2D-3D Nusselt numbers. Reproduced with
permission from Tric E, Labrosse G, and Betrouni M (2000) A
first incursion into the 3D structure of natural convection of air in
a differentially heated cubic cavity from accurate numerical
solutions. International Journal of Heat and Mass Transfer 43:
4043-4056. © Elsevier Ltd.

Figure 10 shows how they compare,
as a function of Ra. Quantitatively, the two-dimensional
approximation is not too bad, but not
qualitatively, with a nonmonotonic evolution of the
discrepancies. These latter become quite negligible
when the three-dimensional flow gets unsteady and
paradoxically loses the symmetry property on
which its two-dimensional approximation is
founded.
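The wall Nusselt number of a computed field can be evaluated as sketched below. This is a minimal illustration, assuming a reduced temperature sampled on a uniform grid of one vertical plane, a second-order one-sided difference for the wall gradient and the trapezoidal rule for the integral over $z$; the function names and the grid layout are hypothetical. The pure-conduction field $\Theta = x$ serves as a check, since it gives a Nusselt number of exactly one.

```python
def wall_gradient(theta_row, dx):
    """Second-order one-sided estimate of dTheta/dx at x = 0."""
    return (-3.0 * theta_row[0] + 4.0 * theta_row[1] - theta_row[2]) / (2.0 * dx)


def trapezoid(values, h):
    """Trapezoidal rule on equally spaced samples."""
    return h * (0.5 * values[0] + sum(values[1:-1]) + 0.5 * values[-1])


def nusselt(theta, dx, dz):
    """Integral over z of dTheta/dx at x = 0; theta[k][i] has z index k, x index i."""
    return trapezoid([wall_gradient(row, dx) for row in theta], dz)


def conduction_field(n):
    """Pure-conduction solution Theta = x on an n-by-n uniform grid of the unit square."""
    h = 1.0 / (n - 1)
    return [[i * h for i in range(n)] for _ in range(n)], h


if __name__ == "__main__":
    field, h = conduction_field(11)
    print(nusselt(field, h, h))
```

Applied to each plane $y$ of a three-dimensional solution, the same routine yields $Nu(y)$, whose integral over $y$ gives the global flux.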


Thermocapillary Convection 


Two immiscible liquids, or a liquid and a gas, are
separated by a free surface, a region of small
thickness (some ten molecular sizes). From a
macroscopic viewpoint, it is considered as a singular
entity. Its location and geometry are part of the
solutions of the governing equations, themselves
supposed to satisfy [20] on the free surface. As a
first iteration, the free-surface shape can be imposed,
fixed, and often flat.

Numerous industrial processes involve thermoca- 
pillarity wherein thermohydraulics involves complex 
phenomena, such as phase-change kinetics. A rele- 
vant modeling of these situations is a research 
subject by itself. For thermohydraulics, some aca- 
demic configurations (Figure 11) have retained the 
attention of the scientific community. 

Figure 11 Open boat ((a) straight and (b) circular) and liquid
bridge (c) configurations.

Any thermohydraulic flow transfers heat
between hot and cold solid boundaries wherein
heat penetrates by conduction. Consequently, the
term $(\vec\nabla\cdot\hat t)\Theta$ of [20] never cancels at the solid
boundary/free surface junction, as in Figure 12.

Figure 12 Thermocapillary origin of vorticity singularity (cold
wall configuration).

A nonzero vorticity is thus generated by thermocapillarity
on the free surface up to the wall, while
flow adherence on the wall gives vorticity values of
opposite sign. The problem therefore presents a
vorticity singularity at the triple point. This is a deep
physical and modeling problem.


See also: Bifurcations in Fluid Dynamics; Capillary 
Surfaces; Compressible Flows: Mathematical Theory; 
Dynamical Systems and Thermodynamics; Dynamical 
Systems in Mathematical Physics: An Illustration from 
Water Waves; Fluid Mechanics: Numerical Methods; 
Magnetohydrodynamics; Non-Newtonian Fluids; Partial 
Differential Equations: Some Examples; Stability of 
Flows; Vortex Dynamics.
Introduction


The general theory of relativity (GRT) unifies special 
relativity theory (SRT) and Newton’s theory of 
gravitation (NGT). SRT and NGT describe success- 
fully large domains of physical phenomena; there- 
fore, one would like to understand how they survive 
as approximations in GRT. 

In GRT, spacetime is idealized as a four-dimen- 
sional Lorentz manifold whose curvature is related 
to the distribution of energy and momentum. In 
such a spacetime, the existence of the exponential 
map implies that the metric near any event (space- 
time point) x deviates from a flat metric only by 
terms given by the curvature there. Thus, if the 
gravitational tidal field, represented by the curvature 
tensor, is small near x, one may approximate the GR 
metric there by a flat Minkowski metric. This 
explains that SRT is a general local approximation 
to GRT. Apart from a remark at the end of the
subsection “Local laws” the relation GRT → SRT
will not be discussed further.

In its traditional formulation, Newton’s theory 
differs drastically from Einstein’s theory both in its 
spacetime structure and in its description of gravita- 
tion. The main purpose of this report is to show 
how NGT can nevertheless be understood as a kind 
of “limit” of GRT. More precisely, the structure of 
NGT can be viewed as a degenerate version of that 
of GRT, in parallel to the fact that the Galilei group 
can be obtained by contracting the Lorentz group. 

In the next section we state the laws of GRT.
We then reformulate these laws with slightly
different field variables such that, besides the
gravitational constant $k$, the speed of light appears
via $\lambda = c^{-2}$. The resulting laws remain meaningful
if $\lambda$ and/or $k$ are replaced by zero. They turn out
to give a common basis for GRT, SRT, and
NGT. The possibility of such a framework was 
indicated independently by Cartan (1923, 1924) and 
Friedrichs (1927) and extended by several authors; 
the complete formulation reviewed here was given 
by Ehlers (1981). 

The section “Newton’s theory in spacetime form”
shows that the laws of NGT and SRT are obtained,
with some additional restrictions, from the rescaled
laws of GRT by putting, respectively, $\lambda = 0$ or $k = 0$.
It is emphasized that Newton’s theory proper is a 


theory only of isolated systems. Its intrinsic, four- 
dimensional formulation explains how the distinc- 
tion between a vectorial gravitational field and 
inertial forces, as well as the existence of inertial 
frames, emerge as consequences of asymptotic 
flatness. These structures are lost in the so-called 
“Newtonian” cosmology whose dynamics is due to 
symmetry assumptions, whereas GR cosmology is a 
proper part of GRT. 

The penultimate section is concerned with rela- 
tions between solutions of GRT and NGT, and in 
the final section some results related to solutions are 
reported. They illustrate that the limit relation
GRT → NGT may sometimes be inverted to get
exact or approximate GR results from NGT.
Approximations are related to uniform convergence
in $\lambda$, as is indicated at the end of the final section.

The limit relations described here may be con- 
sidered as a model for other theory relations in 
physics such as quantization or dequantization. 

Notation Indices will be considered in general as 
“abstract” ones, characterizing the kind of objects 
independent of coordinate systems. Greek indices 
refer to spacetime, Latin ones to 3-space. Fields on 
spacetime will generally be taken to be smooth. 


Basic Concepts and Laws of GRT 


According to GRT, spacetime is a four-dimensional
manifold $M$ endowed with a Lorentzian metric $g_{\alpha\beta}$,
here taken to have signature $(+ + + -)$. Any kind
of matter including nongravitational fields is supposed
to determine an energy tensor $T^{\alpha\beta}$. Metric
and matter are interrelated by Einstein’s gravitational
field equation

$$R_{\alpha\beta} = \frac{8\pi k}{c^4}\left(T_{\alpha\beta} - \frac{1}{2}\,g_{\alpha\beta}T\right) \qquad [1]$$

In this equation, $T := T^{\alpha}{}_{\alpha}$ denotes the trace of the
energy tensor, $k$ and $c$ stand for Newton’s constant
of gravity and the speed of light, respectively, and
the Ricci tensor $R_{\alpha\beta}$ is obtained from Riemann’s
curvature tensor by contraction

$$R_{\alpha\beta} = R^{\gamma}{}_{\alpha\gamma\beta}$$

The curvature tensor is constructed from the
symmetric, linear connection $\Gamma^{\alpha}_{\beta\gamma}$, determined by
the metric.

Equation [1] implies the vanishing of the covariant
divergence of the energy tensor

$$T^{\alpha\beta}{}_{;\beta} = 0 \qquad [2]$$


504 Newtonian Limit of General Relativity 


the GRT analog of the laws of local conservation of 
energy and momentum. 

The energy tensor depends on the kind of matter
to be taken into account. In this article, only
vacuum fields ($T^{\alpha\beta} = 0$) and perfect fluids will be
considered. For such a fluid,

$$T^{\alpha\beta} = (\rho + c^{-2}p)\,U^{\alpha}U^{\beta} + p\,g^{\alpha\beta} \qquad [3a]$$

$\rho$ and $p$ denote the mass density and the pressure,
respectively, and the 4-velocity $U^{\alpha}$ is a timelike
vector obeying

$$g_{\alpha\beta}U^{\alpha}U^{\beta} = -c^2 \qquad [3b]$$

If thermodynamical relations are added to specify
the kind of fluid (the simplest cases are barotropic
equations $p = f(\rho)$), then eqns [1]-[3] admit a
well-posed initial value problem for the fields
$g_{\alpha\beta}$, $U^{\alpha}$, $\rho$.

Different matter models which could be treated in 
the context of this report are elastic bodies and ideal 
gases, but not point particles. Point particles fit into 
GRT even less than into electrodynamics. 


The Cartan-Friedrichs Formalism 


To obtain a spacetime formulation of NGT and a
limit relation GRT → NGT, we recall that the
metric structure of Newton’s spacetime consists of a
scalar t, absolute time, which foliates M into
instantaneous 3-spaces $S_{t}$, and Euclidean metrics
$\gamma_{ab}(t)$ on these spaces. If the inverses $\gamma^{ab}(t)$ are
pushed forward onto M via the embeddings $S_{t} \to M$,
a field $s^{\alpha\beta}$ on M results which is assumed to be
smooth. By construction,


$$s^{\alpha\beta}\, t_{,\beta} = 0 \qquad [4]$$


The pair $(t, s^{\alpha\beta})$ defines the “metric,” that is, times
and distances, in NGT.

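Such a structure can arise from a Lorentzian
metric, for example, the Minkowski metric $\eta_{\alpha\beta}$, by
taking, component-wise, the limits $c \to \infty$ in


$$-c^{-2}\, \eta_{\alpha\beta}\, dx^{\alpha} dx^{\beta} = dt^{2} - c^{-2}\, \delta_{ab}\, dx^{a} dx^{b} \;\to\; dt^{2}$$
$$\eta^{\alpha\beta}\, \partial_{\alpha} \otimes \partial_{\beta} = -c^{-2}\, \partial_{t} \otimes \partial_{t} + \delta^{ab}\, \partial_{a} \otimes \partial_{b} \;\to\; \delta^{ab}\, \partial_{a} \otimes \partial_{b} \qquad [5]$$

which can be interpreted geometrically as “opening
up the light cones” until they degenerate into
doubly covered, spacelike hyperplanes, the
Newtonian $S_{t}$’s.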
The relations [5] suggest writing the GRT laws in
terms of the rescaled temporal metric ($\lambda = c^{-2}$)


$$t_{\alpha\beta} := -\lambda\, g_{\alpha\beta} \qquad [6]$$


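and to write (presently only as a change of
notation) $s^{\alpha\beta}$ instead of $g^{\alpha\beta}$. Then the fields
$t_{\alpha\beta}$, $s^{\alpha\beta}$, $\Gamma^{\alpha}{}_{\beta\gamma}$, $T^{\alpha\beta}$, $\rho$, p, $U^{\alpha}$, called the basic fields
below, and constants k > 0, $\lambda$ > 0 satisfy the
following laws:

$$t_{\alpha\beta}\, s^{\beta\gamma} = -\lambda\, \delta_{\alpha}{}^{\gamma} \qquad [7a]$$
$$t_{\alpha\beta;\gamma} = 0, \qquad s^{\alpha\beta}{}_{;\gamma} = 0 \qquad [7b]$$
$$R^{\alpha}{}_{\beta}{}^{\gamma}{}_{\delta} = R^{\gamma}{}_{\delta}{}^{\alpha}{}_{\beta} \qquad [7c]$$
$$R_{\alpha\beta} = 8\pi k \left(t_{\alpha\mu} t_{\beta\nu} - \tfrac{1}{2}\, t_{\alpha\beta} t_{\mu\nu}\right) T^{\mu\nu} \qquad [7d]$$
$$T^{\alpha\beta}{}_{;\beta} = 0 \qquad [7e]$$
$$T^{\alpha\beta} = (\rho + \lambda p)\, U^{\alpha} U^{\beta} + p\, s^{\alpha\beta} \qquad [7f]$$
$$t_{\alpha\beta}\, U^{\alpha} U^{\beta} = 1 \qquad [7g]$$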


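The Lorentz signature of $g_{\alpha\beta}$ can be reexpressed
thus: at each event (that is, spacetime point), there exists
a “timelike” vector $V^{\alpha}$, that is,


$$t_{\alpha\beta}\, V^{\alpha} V^{\beta} > 0 \qquad [7h]$$


and $V^{\alpha} X_{\alpha} = 0$ for $X_{\alpha} \neq 0$ implies $s^{\alpha\beta} X_{\alpha} X_{\beta} > 0$.

The indices in eqn [7c] are raised, here and later,
by $s^{\alpha\beta}$.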

Given a set of basic fields on M as listed below
eqn [6], the laws [7] remain meaningful for all $\lambda \geq 0$
and $k \geq 0$. If $\lambda = 0$, the “metrics” $t_{\alpha\beta}$ and $s^{\alpha\beta}$
degenerate (and the pair $(t_{\alpha\beta}, s^{\alpha\beta})$ is then called a
Galilei metric). Nevertheless, the definition of
“timelike” will also be used in that case. Also, $X^{\alpha}$ will be
said to be “spacelike” if and only if it can be written
$X^{\alpha} = s^{\alpha\beta} \xi_{\beta}$ with $s^{\alpha\beta} \xi_{\alpha} \xi_{\beta} > 0$. While for $\lambda > 0$, some
of the relations [7] are redundant, this is not so for
$\lambda = 0$. For example, if $\lambda = 0$, the two eqns [7b] are
independent and do not determine the connection
$\Gamma^{\alpha}{}_{\beta\gamma}$ uniquely, in contrast to the case $\lambda > 0$. The
connection will always be assumed to be symmetric.

As will be discussed below, these formulas define 
a framework which serves to relate GRT to NGT 
and special relativity (SRT). First steps to formulate 
such a framework have been taken independently by 
E Cartan and KO Friedrichs. Therefore we call the 
structure defined by [7] the Cartan—Friedrichs 
formalism (CFF). We call it a “formalism” and not 
a “theory” since it is of interest solely as a tool to 
study relations between theories. 

Equations [7] remain unchanged if the basic fields 
and constants are rescaled according to a change of 
units for time, length, and mass. Here, two sets of 
basic fields related by such a rescaling will be 
considered as physically equivalent; they provide the 


same relations between observables. Thus, only the signs
of $\lambda$ and k have physical meaning, not their values:


$\lambda > 0$, k > 0: GRT
$\lambda = 0$, k > 0: NGT
$\lambda > 0$, k = 0: SRT


(The last two lines are not sufficient to specify the 
theories within CFF; in connection with eqn [9] and 
in Theorem 2 they will be completed.) For discuss- 
ing limit relations between theories, it is nevertheless 
useful to represent physical models in different 
scales. 

The physical interpretation of $t_{\alpha\beta}$, $s^{\alpha\beta}$ in terms of
time and distance and that of $\Gamma^{\alpha}{}_{\beta\gamma}$, through its
geodesics as world lines of freely falling test
particles, respectively, is the same in the three
theories and can be stated in terms of the common
framework CFF.

For an obvious reason, $\lambda$ may be called causality
constant. Note that $\lambda$ and k each occur in only one of
the general laws of the theory, apart from the $\lambda$ in [7f].

The laws [7] are invariant under diffeomorphisms 
of the spacetime manifold. Those diffeomorphisms 
which map the basic fields of a solution into 
themselves form the symmetry group of that solution. 


Newton’s Theory in Spacetime Form 
Local Laws 


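Remarkably, for $\lambda = 0$ and k > 0 the formulas [7]
reproduce almost all the laws on which Newton’s
theory of spacetime coupled to Euler’s fluid theory is
based. This is summarized in the following:


Theorem 1 Let eqn [7] hold on M with $\lambda = 0$.
Then there exists, for any event of M, a
neighborhood U with coordinates $(x^{a}, t)$ such that, on U, t
coincides with the absolute time, $t_{\alpha\beta} = t_{,\alpha} t_{,\beta}$, and on
the local slices $U \cap S_{t}$, $s^{\alpha\beta}$ defines Euclidean metrics
$\gamma_{ab}$ with orthonormal coordinates $x^{a}$, $\gamma_{ab} = \delta_{ab}$.
Vectors are spacelike iff they are tangent to $S_{t}$,
otherwise they are timelike. Moreover, the slices
are locally geodesic with respect to the connection
$\Gamma^{\alpha}{}_{\beta\gamma}$, and the induced connection on the slices is the
flat connection associated naturally to $\gamma_{ab}$. In
addition, in the coordinate chart given by $(x^{a}, t)$,
the connection components vanish except $\Gamma^{a}{}_{tt}$ and
$\Gamma^{a}{}_{tb}$ ($= -\Gamma^{b}{}_{ta}$). Therefore, t is an affine parameter
on timelike geodesics. Further, $U^{t} = 1$, and $U^{a} = v^{a}$
is the 3-velocity of the fluid. If one writes


$$-\Gamma^{a}{}_{tt} =: g^{a}, \qquad -\Gamma^{a}{}_{tb} =: \omega_{ab} \qquad [8a]$$

and uses 3-vector notation with $(g^{a}) = \mathbf{g}$,
$(\omega_{23}, \omega_{31}, \omega_{12}) = \boldsymbol{\omega}$, the timelike geodesics of $\Gamma^{\alpha}{}_{\beta\gamma}$
are given by


$$\ddot{\mathbf{x}} = \mathbf{g} + 2\, \dot{\mathbf{x}} \times \boldsymbol{\omega} \qquad [8b]$$

$\mathbf{g}$ and $\boldsymbol{\omega}$ satisfy

$$\nabla \cdot \boldsymbol{\omega} = 0, \qquad \nabla \times \mathbf{g} + 2\, \dot{\boldsymbol{\omega}} = 0 \qquad [8c]$$
$$\nabla \times \boldsymbol{\omega} = 0, \qquad \nabla \cdot \mathbf{g} - 2\, \boldsymbol{\omega}^{2} = -4\pi k \rho \qquad [8d]$$


and the fluid’s equations of motion are

$$\dot{\rho} + \nabla \cdot (\rho \mathbf{v}) = 0 \qquad [8e]$$
$$\rho\, (\dot{\mathbf{v}} + \mathbf{v} \cdot \nabla \mathbf{v} - \mathbf{g} - 2\, \mathbf{v} \times \boldsymbol{\omega}) + \nabla p = 0 \qquad [8f]$$


A solution $(\mathbf{g}, \boldsymbol{\omega}, \rho, p, \mathbf{v})$ of eqns [8] on a local
chart $(x^{a}, t)$ with $t_{\alpha\beta} = \mathrm{diag}(0,0,0,1)$ and
$s^{\alpha\beta} = \mathrm{diag}(1,1,1,0)$ provides, via eqn [8a], the general
local solution to eqns [7] for $\lambda = 0$.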


The proof consists of many, mostly elementary
steps which can be gathered from Künzle (1972) and
Ehlers (1981).

Given a solution to eqns [7] with $\lambda = 0$ and k > 0,
the coordinates $x^{\alpha} = (x^{a}, t)$ referred to in the theorem
are determined by the basic fields up to
time-dependent Euclidean motions, time translations, and
time reflections. Such a coordinate system corresponds
to a rigid reference frame. As the equation of motion
for freely falling particles, eqn [8b], shows, $\mathbf{g}$ and $\boldsymbol{\omega}$
are to be interpreted as the acceleration and rotation
fields which determine, relative to a rigid frame, the
combined influence of inertia and gravity on particles
encoded in the spacetime connection $\Gamma^{\alpha}{}_{\beta\gamma}$. (This role
of a connection in NGT was recognized by E Cartan.)
This interpretation is supported by the (generalized)
Euler equation [8f].

As claimed above already, eqns [7] almost
reproduce the local laws of the Newton-Euler
theory. Indeed, eqns [8] are those of the
Newton-Euler theory, provided $\boldsymbol{\omega}$ depends on time only.
Then and only then can the coordinate freedom be
used to get nonrotating rigid coordinates with
respect to which $\boldsymbol{\omega} = 0$. The existence of such
coordinates is indispensable for NGT since only
with respect to them is $-\mathbf{g}$ the gradient of a
potential U which obeys Poisson’s equation, as
shown by eqns [8c] and [8d].

The preceding argument shows that the CFF,
specialized to $\lambda = 0$, has to be restricted by a
condition which implies $\boldsymbol{\omega} = \boldsymbol{\omega}(t)$ in order to give
the local laws of NGT. One such condition is


$$R^{\alpha\beta}{}_{\gamma\delta} = 0 \qquad [9]$$

as can be verified by computing the curvature tensor
via eqn [8a].

Equation [9] for $\lambda = 0$ expresses that parallel
transport of spacelike vectors along arbitrary spacetime
curves is integrable, which corresponds to the behavior
of free gyroscopes in NGT (in contrast to GRT).

Of course, eqn [9] cannot be added to the CFF since it
is incompatible with GRT. If, however, the CFF with
$\lambda > 0$, k = 0 is restricted by the condition [9], the
spacetime and hydrodynamics of special relativity result.


Global Laws for Isolated Systems 


The laws [8] and [9] do not determine the time
evolution of the basic fields. Using nonrotating
coordinates we put $\mathbf{g} = -\nabla U$ and replace eqns [8c],
[8d] by Poisson’s equation


$$\Delta U = 4\pi k \rho \qquad [10]$$


In Newtonian dynamics, the potential only serves
to compute forces depending instantaneously on the
mass distribution. Traditionally, this is achieved by
assuming $\rho$ to have spatially compact support at
each time and solving eqn [10] by


$$\phi(\mathbf{x}, t) = -k \int \frac{\rho(\mathbf{x}', t)}{|\mathbf{x} - \mathbf{x}'|}\, d^{3}x' \qquad [11]$$

which implies the fall-off

$$\lim_{|\mathbf{x}| \to \infty,\ t = \mathrm{const.}} \phi(\mathbf{x}, t) = 0 \qquad [12]$$


($\phi$ will always be used for this solution of eqn [10]).

To relate the foregoing isolation assumptions
to corresponding assumptions in GRT as far as
presently possible, it seems necessary to go back
to the laws [7] restricted to $\lambda = 0$ or the equivalent
(3 + 1) version [8] without the restriction [9].

If some global assumptions are added to eqns [8],
eqns [10]-[12] can be deduced from the
four-dimensional formulation. One first introduces the
following two assumptions:


(1) The hypersurfaces $S_{t}$ of M (which, for $\lambda = 0$, are
the only spacelike hypersurfaces) are simply
connected, complete Euclidean spaces.

(2) On each $S_{t}$, the support of $\rho$ is compact.


Using coordinates $(x^{a}, t)$ as in the last subsection,
with $x^{a}$ now ranging over $\mathbb{R}^{3}$, eqns [8a] imply


2 
RgygR ge = —2 (Wap) tee [13] 
a,b 
Hence the sum is a 4-scalar, and since $t_{\alpha\beta}$ is
covariantly constant, it is possible to require


R” gR” ae 0 at spatial infinity [14] 


which expresses covariantly that $\omega_{ab,c} \to 0$. Since $\boldsymbol{\omega}$ is
harmonic on $S_{t}$ (by eqns [8c], [8d]), this in turn
implies $\omega_{ab,c} = 0$; thus, $\boldsymbol{\omega}$ depends on t only; the
asymptotic condition [14] and the local laws imply
eqn [9].

We may therefore employ rigid, nonrotating
coordinates, $\boldsymbol{\omega} = 0$. Then, by eqns [8a], [8c], [8d]
the connection coefficients take the form


$$\Gamma^{\alpha}{}_{\beta\gamma} = t_{,\beta}\, t_{,\gamma}\, s^{\alpha\delta}\, U_{,\delta} \qquad [15]$$
and 
R^ aup R” yrs = Lat pl yie ` Sa |16] 
ab a,b 


As before, we require 
R^ang R” yrs — 0 [17] 


and conclude $U_{,ab} \to 0$. Since the Newtonian
potential $\phi$ of $\rho$ also has this fall-off and $U - \phi$ is
harmonic on $S_{t} = \mathbb{R}^{3}$, the following conclusion can
be obtained:


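Lemma 1 The laws [8] and the global conditions
(1)-(2), [14], [17] imply: in rigid, nonrotating
coordinates, the connection


$$\mathring{\Gamma}^{\alpha}{}_{\beta\gamma} := \Gamma^{\alpha}{}_{\beta\gamma} - t_{,\beta}\, t_{,\gamma}\, s^{\alpha\delta}\, \phi_{,\delta} \qquad [18]$$


is flat ($\phi$ according to eqn [11] is a scalar, and the
$\phi$-term in eqn [18] is a tensor). In other words,
$\Gamma^{\alpha}{}_{\beta\gamma}$ is asymptotically flat since the $\phi$-term falls off
as $|\mathbf{x}|^{-2}$.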


Because of this lemma, one can further restrict the
coordinates $(x^{a}, t)$ by demanding that the components of
the flat connection [18] vanish. In physical terms this
means: by switching to a new,
“unaccelerated” frame of reference, one removes
from the equations of motion a spatially
homogeneous gravitational field which, in contrast to the
$\phi$-term in eqn [16], is not due to matter.

The resulting coordinates are defined up to
Galilean transformations,


$$t' = \pm t + c$$
$$x'^{a} = D^{a}{}_{b}\, x^{b} + u^{a} t + c^{a}$$

where $c$, $c^{a}$, $u^{a}$ are constants and D is a constant
orthogonal 3×3 matrix. These coordinates are
called inertial ones; with respect to them the usual
laws of Newtonian mechanics hold; see [8] with
$\boldsymbol{\omega} = 0$ and $U = \phi[\rho]$.


Theorem 2 (Ehlers 1981). The laws [7] of the
CFF restricted to $\lambda = 0$ and augmented by the global
and asymptotic conditions (1)-(2), [14], [17],
provide a generally covariant, four-dimensional
formulation for the Newtonian theory of space,
time, gravitation, and hydrodynamics.


The possibility to split the connection $\Gamma$ into a flat
part which is independent of matter and a tensorial
part depending on matter and given by the vector
field $g^{\alpha} = -s^{\alpha\beta} \phi_{,\beta}$ (with $\phi$ from eqn [11]) arises only
from supplementing the local laws [7] by the global,
resp. asymptotic, conditions (1)-(2), [14], [17]
stated above. The introduction of inertial
coordinates is then convenient, but not necessary. In
noninertial, rigid frames of reference, $\Gamma^{\alpha}{}_{\beta\gamma}$ gives
rise to inertial forces.

It should be possible to define spatial asymptotic 
flatness in the CFF, but that has not been done. 


Remarks on Newtonian Cosmology 


In cosmology, the conditions (2) and [17] of the
last subsection are not appropriate. Instead one
keeps the laws [7] and adds to them eqn [9], so
that with respect to nonrotating coordinates the
laws [8] with $\boldsymbol{\omega} = 0$ and eqn [10] remain valid.
Then, there are no longer inertial coordinate
systems, and the potential U is not a 4-scalar.
For a slightly different approach, see Rüede and
Straumann (1997).

For the purpose of this article, the term
“cosmological model” will be applied to those
solutions of the laws [7] and [9] which satisfy $\rho > 0$
and which have a symmetry group which acts
transitively on the set of world lines representing
the motion of the fluid. This strong symmetry
assumption determines the time-evolution even in
the “Newtonian” case $\lambda = 0$ in spite of the absence
of an evolution equation for the gravitational
field $\mathbf{g}$.


Newtonian Limits of Families 
of GR Solutions 


The discussion in the sections “The Cartan- 
Friedrichs formalism” and “Newton’s theory in 
spacetime form” suggests the following: 


Definition 1 Let a family $F(\lambda) = (t_{\alpha\beta}(\lambda), \ldots)$ of
basic fields parametrized by $\lambda$, obeying the laws [7]
of the CFF, be given for $0 < \lambda < a$. We assume the
underlying manifolds $M(\lambda)$ to be open submanifolds
of a fixed manifold M such that $M(\lambda_{1}) \supset M(\lambda_{2})$ if
$\lambda_{1} < \lambda_{2}$ and $\bigcup_{\lambda} M(\lambda) = M$. Then we write


$$\lim_{\lambda \to 0} F(\lambda) = F(0) \qquad [19]$$


if the fields of $F(\lambda)$ and their first derivatives
converge pointwise to those of F(0).

F(0) is then said to be a CF limit of the sequence of
($\lambda$-rescaled) solutions $F(\lambda)$ of GRT. If the fields of a
$\lambda$-family of GR solutions ($\lambda > 0$) and their first
derivatives converge for $\lambda \to 0$ locally uniformly,
then the limit fields satisfy eqns [7]. If F(0) has the
additional property [9], the limit is locally Newtonian.

On the basis of the section “The Cartan-Friedrichs
formalism” one may conjecture that if
eqn [19] holds and the $F(\lambda)$ for $\lambda > 0$ are spatially
asymptotically flat, F(0) will represent an
asymptotically flat Newtonian spacetime. Examples such as
Example 1 below are in agreement with this
conjecture, but a general proof is not known.


Example 1 The interior solution for a static,
spherically symmetric fluid ball of constant energy
density (Schwarzschild 1916) is given by


$$ds^{2} = \frac{dr^{2}}{a^{2}(r)} + r^{2} (d\vartheta^{2} + \sin^{2}\vartheta\, d\varphi^{2}) - \frac{1}{4}\, (3 a_{0} - a(r))^{2}\, c^{2}\, dt^{2}$$
$$\rho = \text{const.} > 0, \qquad p = \rho c^{2}\, \frac{a(r) - a_{0}}{3 a_{0} - a(r)}$$
$$a(r) := \left(1 - \frac{8\pi}{3}\, k\, c^{-2} \rho\, r^{2}\right)^{1/2}, \qquad a_{0} := a(r_{0})$$


Inserting into these expressions the parameter
$\lambda = c^{-2}$ and treating $\rho$ and $r_{0}$ as $\lambda$-independent
constants results in a $\lambda$-family with
$0 < \lambda < ((8\pi/3)\, k\, r_{0}^{2} \rho)^{-1}$. The limit solution represents a
Newtonian fluid ball of constant mass density $\rho$.
The Schwarzschild vacuum fields belonging to these
fluid balls also have the appropriate Newtonian
limits. The resulting complete spacetimes are
asymptotically flat. A dimensionless small parameter
which could be used instead of $\lambda$ to measure the
deviation of the GR solution from its Newtonian
limit is the ratio of Schwarzschild radius and the
geometric radius:


$$\frac{2kM}{c^{2} r_{0}} = \frac{8\pi}{3}\, \frac{k \rho r_{0}^{2}}{c^{2}}$$
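The limit in Example 1 can be illustrated numerically. The sketch below evaluates the central pressure of the relativistic fluid ball for decreasing $\lambda$ and compares it with the Newtonian hydrostatic value $p = (2\pi/3) k \rho^{2} (r_{0}^{2} - r^{2})$. It assumes the standard form of the interior Schwarzschild solution, $p = (\rho/\lambda)(a - a_{0})/(3a_{0} - a)$ with $a(r) = (1 - (8\pi/3) k \lambda \rho r^{2})^{1/2}$; the units k = $\rho$ = $r_{0}$ = 1 and all function names are choices made only for this example.

```python
import math

def a(r, lam, k=1.0, rho=1.0):
    """Metric function a(r) = sqrt(1 - (8*pi/3) * k * lam * rho * r**2)."""
    return math.sqrt(1.0 - (8.0 * math.pi / 3.0) * k * lam * rho * r * r)

def pressure_gr(r, r0, lam, k=1.0, rho=1.0):
    """Pressure of the relativistic constant-density ball, with lam = c**-2."""
    a_r, a_0 = a(r, lam, k, rho), a(r0, lam, k, rho)
    return (rho / lam) * (a_r - a_0) / (3.0 * a_0 - a_r)

def pressure_newton(r, r0, k=1.0, rho=1.0):
    """Newtonian hydrostatic pressure of a uniform ball of radius r0."""
    return (2.0 * math.pi / 3.0) * k * rho**2 * (r0**2 - r**2)

def relative_deviation(lam, r=0.0, r0=1.0):
    """Relative difference between relativistic and Newtonian pressure at radius r."""
    p_n = pressure_newton(r, r0)
    return abs(pressure_gr(r, r0, lam) - p_n) / p_n

if __name__ == "__main__":
    for lam in (1e-2, 1e-4, 1e-6):
        print(f"lambda = {lam:g}: relative deviation = {relative_deviation(lam):.3e}")
```

In these units the deviation shrinks roughly in proportion to $\lambda$, which is the behavior expected of a family with a Newtonian limit.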


Example 2 A Friedmann-Lemaitre cosmological
model of GR containing dust and radiation is given by


$$ds^{2} = R^{2}(t)\, \frac{\delta_{ab}\, d\xi^{a} d\xi^{b}}{\left(1 - \tfrac{1}{4} (E/c^{2})\, \delta_{ab}\, \xi^{a} \xi^{b}\right)^{2}} - c^{2}\, dt^{2}$$

where R(t) obeys

$$\dot{R}^{2} - \frac{8\pi}{3}\, k \left(\frac{M}{R} + \frac{S}{c^{2} R^{2}}\right) = E$$

M is a mass constant, $\rho = M/R^{3}$ is the mass density
of “dust,” S is an entropy constant, $\epsilon = S/R^{4}$ the
energy density and $p = (1/3)\epsilon$ the pressure of
radiation; and E is a constant of dimension
(speed)$^{2}$. The world lines of the fluid elements
are given by $\xi^{a}$ = const. (Lagrangian comoving
coordinates).


Taking E, M, S constant and $\lambda = c^{-2}$ as a parameter
provides a $\lambda$-family of GR models with Newtonian
limit. In the limit, t is the Newtonian time, and the
spatial metric $R^{2} \delta_{ab}\, d\xi^{a} d\xi^{b}$ describes an expanding
Euclidean space $\mathbb{R}^{3}$ (if E < 0) or an open ball of
radius 2R(t) in it (if E > 0). In the coordinates $(\xi^{a}, t)$
the connection does not have the “Newtonian”
components [8a]; instead its nonvanishing
components are $\Gamma^{a}{}_{tb} = (\dot{R}/R)\, \delta^{a}{}_{b}$. In local inertial coordinates
$x^{a} = R \xi^{a}$ centered on the particle with $\xi^{a} = 0$ (which
could be any particle because of the homogeneity of
the model), the spatial metric is $d\mathbf{x}^{2}$, and the
connection components are Newtonian, with
$U = (2\pi/3)\, k \rho\, \mathbf{x}^{2}$ and $\Delta U = 4\pi k \rho$. In the limit, the
radiation no longer influences the expansion; one gets
the Newtonian dust models (eqn [9] is satisfied). The
connection is, of course, not asymptotically flat. The
curvature tensor $R^{\alpha}{}_{\beta}{}^{\gamma}{}_{\delta} = (4\pi/3)\, k \rho\, t_{\beta\delta}\, s^{\alpha\gamma}$ exhibits
homogeneity and isotropy. The Gaussian sectional
curvature of the 3-space at time t is $K = -\lambda E/R^{2}$. As
a dimensionless smallness parameter one can take
$E/c^{2}$. In the “open” models, with E > 0, the
coordinates $\xi^{a}$ cover the whole 3-manifold of fluid
particles, while in the “closed” case, E < 0, one
particle, the antipode of $\xi^{a} = 0$ on the 3-sphere, is not
covered. That particle is missing in the Newtonian
limit model. In the Newtonian case the expanding
Euclidean space $\mathbb{R}^{3}$ can be replaced by a torus; in the
GR cases this is possible only for E = 0.
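The statement that the potential $U = (2\pi/3) k \rho\, \mathbf{x}^{2}$ of the homogeneous model satisfies Poisson's equation $\Delta U = 4\pi k \rho$ can be checked with a few lines of code. The following is a minimal sketch using a central-difference Laplacian; the units k = $\rho$ = 1, the step size and the sample point are arbitrary choices for the illustration.

```python
import math

def potential(x, y, z, k=1.0, rho=1.0):
    """U = (2*pi/3) * k * rho * |x|**2 in local inertial coordinates."""
    return (2.0 * math.pi / 3.0) * k * rho * (x * x + y * y + z * z)

def laplacian(f, x, y, z, h=1e-2):
    """Second-order central-difference Laplacian of f at the point (x, y, z)."""
    centre = f(x, y, z)
    total = 0.0
    for dx, dy, dz in ((h, 0.0, 0.0), (0.0, h, 0.0), (0.0, 0.0, h)):
        total += f(x + dx, y + dy, z + dz) - 2.0 * centre + f(x - dx, y - dy, z - dz)
    return total / (h * h)

if __name__ == "__main__":
    # The two printed numbers should agree up to rounding error.
    print(laplacian(potential, 0.3, -1.2, 2.0), 4.0 * math.pi)
```

Because U is quadratic, the finite-difference result is exact up to rounding, and it is the same at every point, reflecting the homogeneity of the model.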

Many examples of GR families with Newtonian
limits are known (see, e.g., Ehlers (1997) and
references therein). An example of a $\lambda$-family
which has an almost Newtonian limit which does
not satisfy eqn [9] is provided by NUT spacetimes
(see Ehlers 1997), interpreted as due to a
gravitomagnetic monopole (Lynden-Bell and
Nouri-Zonoz 1998).


Applications and Problems 


Can one construct, for a given Newtonian solution
N, a $\lambda$-family of GR solutions which converges to
N? Some answers are known and listed below.

U Heilig (1995) has shown: given a solution to 
the Euler—Poisson equations representing a station- 
ary, rigidly rotating, self-gravitating fluid body 
with its surrounding gravitational field, there exists 


a $\lambda$-family of corresponding solutions to the
Einstein-Euler system having the given solutions
as its limit.

The proof is based on the fact that one can 
reformulate eqns [1], [2] in terms of harmonic 
coordinates and new dependent gravitational
variables instead of $g_{\alpha\beta}$ such that the new equations
given in Lottermoser (1992) are analytic in $\lambda$ and
reduce, for $\lambda = 0$, to the Euler-Poisson system. In the
stationary case these equations are elliptic for $\lambda > 0$.
Using appropriate function spaces, Heilig shows, via
the implicit function theorem, that a solution for
$\lambda = 0$ can be extended to small, positive values of $\lambda$.
Since L Lichtenstein has constructed solutions as 
assumed in the theorem, the existence of GR 
solutions follows. 

The gravitational part of the system of equations
referred to above is hyperbolic for $\lambda > 0$, but
becomes elliptic for $\lambda = 0$, whereas the fluid
equations remain hyperbolic. In spite of this difficulty
Rendall (1994) has shown that $\lambda$-families of
time-dependent, asymptotically flat solutions to the
Einstein—Vlasov system representing gravitating 
systems of collisionless particles have Poisson- 
Vlasov limits, and that any Poisson—Vlasov solution 
can be so obtained. 

Lottermoser (1992) succeeded in proving the existence
of $\lambda$-families of solutions to the Einstein constraint
equations which have Newtonian initial data as limits.
Nothing seems to be known about solutions evolving 
from such data. Lottermoser has given an interesting 
discussion concerning possible extension of his work 
which apparently has gone unnoticed. 

Rendall (1992) has defined and analyzed
post-Newtonian expansions to Einstein’s equations and
their solvability, assuming $\lambda$-families whose $t_{\alpha\beta}$, $s^{\alpha\beta}$
are a few times differentiable in $\epsilon = \sqrt{\lambda}$ at $\epsilon = 0$. He
found that for low orders the equations have 
asymptotically flat solutions, but that at order e 
divergences occur for general Newtonian seed 
solutions. Modifications of the method to overcome 
these difficulties have been considered by Rendall 
and others; the problem is open. 

In cosmology, one uses homogeneous back- 
ground models and studies their perturbations. 
The latter are frequently based on Newtonian 
equations. This can perhaps be justified as follows. 
According to Example 2 the fields of Friedmann- 
Lemaitre models differ from their Newtonian limits 
by arbitrarily small amounts uniformly in
spacetime regions where the terms involving $\lambda$ are
small, that is,
<r), VA 


—~< Rit —— 
C 


Ma |x| << R(t) 
Additional conditions will be needed to ensure that
Newtonian perturbations approximate relativistic 
ones and that gravitational wave perturbations can 
be neglected. 


See also: Cosmology: Mathematical Aspects; Einstein 
Equations: Exact Solutions; General Relativity: Overview; 
Gravitational Lensing; Shock Wave Refinement of the 
Friedman—Robertson—Walker Metric. 
Introduction


The aim of this contribution is to explain how 
Connes derives the standard model of electromag- 
netic, weak, and strong forces from noncommuta- 
tive geometry. The reader is supposed to be aware of 
two other derivations in fundamental physics: the 
derivation of the Balmer—Rydberg formula for the 
spectrum of the hydrogen atom from quantum 
mechanics and Einstein’s derivation of gravity from 
Riemannian geometry. 

At the end of the nineteenth century, new physics
was discovered in atoms, namely their discrete
spectra. Balmer and Rydberg succeeded in bringing
order to the fast-growing set of experimental
results with the help of a phenomenological ansatz
for the frequencies $\nu$ of the spectral rays of, for
example, the hydrogen atom,


$$\nu = g\, (n_{2}^{q} - n_{1}^{q}), \qquad n_{j} \in \mathbb{N}, \quad q \in \mathbb{Z}, \quad g \in \mathbb{R} \qquad [1]$$

The integer variables $n_{1}$ and $n_{2}$ reflect the
discreteness of the spectrum. On the other hand,
the discrete parameter q and the continuous
parameter g were fitted by experiment: q = −2
and g = 3.289 × $10^{15}$ Hz, the famous Rydberg
constant. Later quantum mechanics was discovered
and made it possible to derive the Balmer-Rydberg
ansatz and to constrain its parameters:


$$q = -2 \quad \text{and} \quad g = \frac{m_{e}\, e^{4}}{4\pi \hbar^{3} (4\pi \epsilon_{0})^{2}}$$


in beautiful agreement with the anterior experi- 
mental fit. 
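The agreement between the quantum-mechanical expression and the fitted value can be checked directly. The sketch below evaluates $g = m_{e} e^{4} / (4\pi \hbar^{3} (4\pi \epsilon_{0})^{2})$ from CODATA 2018 values of the constants (inserted here as inputs, not taken from the text) and uses the ansatz with q = −2 to obtain one spectral line; the function names are illustrative only.

```python
import math

# CODATA 2018 values in SI units, quoted as inputs to this sketch.
M_E = 9.1093837015e-31       # electron mass, kg
E_CHARGE = 1.602176634e-19   # elementary charge, C
HBAR = 1.054571817e-34       # reduced Planck constant, J s
EPS_0 = 8.8541878128e-12     # vacuum permittivity, F/m

def rydberg_frequency():
    """g = m_e e^4 / (4 pi hbar^3 (4 pi eps_0)^2), in Hz."""
    return M_E * E_CHARGE**4 / (4.0 * math.pi * HBAR**3 * (4.0 * math.pi * EPS_0) ** 2)

def line_frequency(n1, n2, q=-2):
    """Balmer-Rydberg ansatz nu = g (n2^q - n1^q)."""
    return rydberg_frequency() * (n2**q - n1**q)

if __name__ == "__main__":
    print(f"g = {rydberg_frequency():.4e} Hz")
    print(f"line with n2 = 2, n1 = 3: {line_frequency(3, 2):.4e} Hz")
```

With $n_{2} = 2$ and $n_{1} = 3$ the ansatz gives the red Balmer line of hydrogen, at a frequency of about 4.57 × $10^{14}$ Hz.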


The Standard Model 


We propose to introduce the standard model (see 
Standard Model of Particle Physics) in analogy with 
the Balmer—Rydberg formula (Table 1). 


Table 1 An analogy between atomic and particle physics elements


                    Atomic physics                              Particle physics

New physics         Discrete spectra                            Forces mediated by gauge bosons
Ansatz              $\nu = g(n_{2}^{q} - n_{1}^{q})$            Yang-Mills-Higgs models
Experimental fit    q = −2, g = 3.289 × $10^{15}$ Hz            Standard model
Underlying theory   Quantum mechanics                           Noncommutative geometry

The Yang-Mills-Higgs Ansatz


The variables of this Lagrangian ansatz are spin-1
particles A, spin-(1/2) particles decomposed into left-
and right-handed components $\psi = (\psi_{L}, \psi_{R})$ and
spin-0 particles $\varphi$. There are four discrete parameters, a
compact real Lie group G, the “gauge group,” and
three unitary representations on complex Hilbert
spaces $H_{L}$, $H_{R}$, and $H_{S}$. The spin-1 particles come in a
multiplet living in the complexification of the Lie algebra
of G, $A \in \mathrm{Lie}(G)^{\mathbb{C}}$. The left- and right-handed spinors
come in multiplets living in the Hilbert spaces,
$\psi_{L} \in H_{L}$, $\psi_{R} \in H_{R}$, respectively. The (Higgs) scalar is
another multiplet, $\varphi \in H_{S}$. The Yang-Mills-Higgs
Lagrangian, together with its Feynman diagrams, is
spelled out in Table 2.

There are several continuous parameters: the
gauge coupling $g \in \mathbb{R}_{+}$, the Higgs self-couplings
$\lambda, \mu \in \mathbb{R}_{+}$, and several Yukawa couplings $g_{Y} \in \mathbb{C}$.


Table 2 The Yang—Mills—Higgs Lagrangian and its Feynman 
diagrams 


LIA, p, | = Atr(0,A,0"A" — 8,,A,0"A") 


+g tr(0,,AL[A", A”]) 


+9° tr([A,,, AvI[A", A”]) 
Let us choose $G = U(1) \ni e^{i\theta}$. Its irreducible unitary
representations are all one-dimensional, $H = \mathbb{C} \ni \psi$,
characterized by the charge $q \in \mathbb{Z}$: $\rho(e^{i\theta})\psi = e^{iq\theta}\psi$.
Then with $q_{L} = q_{R}$ and $H_{S} = \{0\}$, we get Maxwell’s
theory with the photon (or gauge boson or 4-potential)
A coupled to the Dirac theory of a massless spinor of
electric charge $q_{L}$ whose (relativistic) wave function is
$\psi$. The gauge coupling is given by $g = e/\sqrt{\epsilon_{0}}$. Gauge
invariance of the Yang-Mills-Higgs Lagrangian
implies, via Noether’s theorem, electric charge
conservation in this case (see Symmetries and
Conservation Laws).

Yang-Mills models are therefore simply nonabelian 
generalizations of electromagnetism where the abelian 
gauge group U(1) is replaced by any compact real Lie 
group. We insist on a compact group because all 
irreducible unitary representations of compact groups 
are finite dimensional. Finally, the Higgs scalar is 
added to give masses to spinors and gauge bosons via 
spontaneous symmetry breaking (see Symmetry 
Breaking in Field Theory). 

We use compact groups and unitary representations 
as (discrete) parameters. One motivation is Noether’s 
theorem and conserved quantities. The other comes 
from Wigner’s theorem: the irreducible unitary 
representations of the Poincaré group are classified 
by mass and spin. Its orthonormal basis vectors 
are classified by energy-momentum and by the 
z-component of angular momentum. This theorem 
leads to the widely accepted definition of a particle as 
an orthonormal basis vector in a Hilbert space 
H carrying a unitary representation p of a group G. 

A precious property of the Yang—Mills—Higgs 
ansatz is its perturbative renormalizability necessary 
for fine-structure calculations like the anomalous 
magnetic moment of the muon. 


The Experimental Fit 


Physicists have spent some 30 years and some 10? Swiss 
Francs to distill the fit (Particle Data Group 2004): 


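$$G = SU(2) \times U(1) \times SU(3) / (\mathbb{Z}_{2} \times \mathbb{Z}_{3}) \qquad [3]$$

$$H_{L} = \bigoplus_{1}^{3} \left[(2, \tfrac{1}{6}, 3) \oplus (2, -\tfrac{1}{2}, 1)\right] \qquad [4]$$

$$H_{R} = \bigoplus_{1}^{3} \left[(1, \tfrac{2}{3}, 3) \oplus (1, -\tfrac{1}{3}, 3) \oplus (1, -1, 1)\right] \qquad [5]$$

$$H_{S} = (2, -\tfrac{1}{2}, 1) \qquad [6]$$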


Here $(n_{2}, y, n_{3})$ denotes the tensor product of an
$n_{2}$-dimensional representation of SU(2), “(weak)
isospin,” an $n_{3}$-dimensional representation of SU(3),
“color,” and the one-dimensional representation of
U(1) with “hypercharge” y. For historical reasons, the
hypercharge is an integer multiple of 1/6. This is
irrelevant: in the abelian case, only the product of the
hypercharge with its gauge coupling is measurable, and
we do not need multivalued representations, which are
characterized by noninteger, rational hypercharges. In
the direct sum, we recognize the three generations of
fermions: the quarks, “up, down, charm, strange, top,
bottom,” are SU(3) triplets; the leptons, “electron,
$\mu$, $\tau$” and their neutrinos, are color singlets. The basis
of the fermion representation space is


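$$\begin{pmatrix} u \\ d \end{pmatrix}_{L},\ \begin{pmatrix} c \\ s \end{pmatrix}_{L},\ \begin{pmatrix} t \\ b \end{pmatrix}_{L}, \qquad \begin{pmatrix} \nu_{e} \\ e \end{pmatrix}_{L},\ \begin{pmatrix} \nu_{\mu} \\ \mu \end{pmatrix}_{L},\ \begin{pmatrix} \nu_{\tau} \\ \tau \end{pmatrix}_{L}$$
$$u_{R},\ c_{R},\ t_{R}, \qquad e_{R},\ \mu_{R},\ \tau_{R}$$
$$d_{R},\ s_{R},\ b_{R}$$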


The parentheses indicate isospin doublets. 

The eight gauge bosons associated with su(3) are
called gluons. Warning: the U(1) is not the one of electric
charge; it is called hypercharge. The electric charge is a
linear combination of hypercharge and weak isospin.
This mixing is necessary to give electric charges to the W
bosons. The $W^{+}$ and $W^{-}$ are pure isospin states, while
the $Z^{0}$ and the photon are (orthogonal) mixtures of the
third isospin generator and hypercharge.
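The linear combination mentioned above can be made concrete. The following sketch assumes the common normalization $Q = I_{3} + y$, with $I_{3}$ the third component of weak isospin, and the standard hypercharge assignments 1/6 and −1/2 for the left-handed quark and lepton doublets and 2/3, −1/3, −1 for the right-handed singlets; both the convention and the helper names are assumptions made for this illustration.

```python
from fractions import Fraction as F

def electric_charge(isospin3, hypercharge):
    """Assumed convention: Q = I_3 + y."""
    return isospin3 + hypercharge

# (name, third isospin component I_3, hypercharge y) for the first generation.
FIRST_GENERATION = [
    ("u_L", F(1, 2), F(1, 6)),
    ("d_L", F(-1, 2), F(1, 6)),
    ("nu_L", F(1, 2), F(-1, 2)),
    ("e_L", F(-1, 2), F(-1, 2)),
    ("u_R", F(0), F(2, 3)),
    ("d_R", F(0), F(-1, 3)),
    ("e_R", F(0), F(-1)),
]

def charges():
    """Map each first-generation fermion to its electric charge in units of e."""
    return {name: electric_charge(i3, y) for name, i3, y in FIRST_GENERATION}

if __name__ == "__main__":
    for name, q in charges().items():
        print(f"{name}: Q = {q}")
```

Left- and right-handed components of the same fermion come out with equal electric charge even though their isospin and hypercharge differ, which is what allows a vector-like photon coupling.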

As the group G contains three simple factors, 
there are three gauge couplings, 


$$g_{2} = 0.6518 \pm 0.0003$$
$$g_{1} = 0.3574 \pm 0.0001 \qquad [7]$$
$$g_{3} = 1.218 \pm 0.01$$


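The Higgs couplings are usually expressed in terms
of the W and Higgs masses:


$$m_{W} = \tfrac{1}{2}\, g_{2}\, v = 80.419 \pm 0.056\ \text{GeV} \qquad [8]$$


$$m_{H} = 2\sqrt{2}\, \sqrt{\lambda}\, v > 98\ \text{GeV} \qquad [9]$$


with the vacuum expectation value $v := (1/2)\, \mu / \sqrt{\lambda}$.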
Because of the high degree of reducibility of the spin- 
(1/2) representations there are 27 complex Yukawa 
couplings. They constitute the fermionic mass matrix 
which contains the fermion masses and mixings: 

$m_{e}$ = 0.510998902 ± 0.000000021 MeV

$m_{u}$ = 3 ± 2 MeV, $m_{d}$ = 6 ± 3 MeV

$m_{\mu}$ = 0.105658357 ± 0.000000005 GeV

$m_{c}$ = 1.25 ± 0.1 GeV, $m_{s}$ = 0.125 ± 0.05 GeV

$m_{\tau}$ = 1.77703 ± 0.00003 GeV

$m_{t}$ = 174.3 ± 5.1 GeV, $m_{b}$ = 4.2 ± 0.2 GeV


For simplicity, we have taken massless neutrinos. 
Then mixing only occurs for quarks and is given by 
a unitary matrix, the Cabibbo-Kobayashi-Maskawa 
matrix 


$$C_{KM} = \begin{pmatrix} V_{ud} & V_{us} & V_{ub} \\ V_{cd} & V_{cs} & V_{cb} \\ V_{td} & V_{ts} & V_{tb} \end{pmatrix} \qquad [10]$$


whose matrix elements in terms of absolute values are: 


0.222+0.003 0.9742+0.0008 0.040+0.003 


0.009+0.005 0.039+0.004 0.9992+0.0003 


[11] 


[ 0222 sey 0.223 +0.004 0.004 + 0.002 ) 


Mathematically, the Cabibbo-Kobayashi-Maskawa
matrix comes from a polar decomposition of the
mass matrix. The physical meaning of the quark
mixings is the following: when a sufficiently
energetic $W^{+}$ decays into a u quark, this u quark
is produced together with a d quark with
probability $|V_{ud}|^{2}$, an s quark with probability $|V_{us}|^{2}$,
and a b quark with probability $|V_{ub}|^{2}$.
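Since the mixing matrix is unitary, the probabilities in each row must add up to one. The sketch below checks this for the central values of the two rows of absolute values quoted above that belong to the c and t quarks (read here as $|V_{cd}|, |V_{cs}|, |V_{cb}|$ and $|V_{td}|, |V_{ts}|, |V_{tb}|$); the row labels and function name are assumptions of this example, and experimental uncertainties are ignored.

```python
# Central values of two rows of |V| as quoted in the text (uncertainties omitted).
ROW_C = (0.222, 0.9742, 0.040)   # |V_cd|, |V_cs|, |V_cb|
ROW_T = (0.009, 0.039, 0.9992)   # |V_td|, |V_ts|, |V_tb|

def total_probability(row):
    """Sum of squared moduli of one row; equals 1 for a unitary matrix."""
    return sum(v * v for v in row)

if __name__ == "__main__":
    for label, row in (("c", ROW_C), ("t", ROW_T)):
        print(f"row {label}: sum of |V|^2 = {total_probability(row):.5f}")
```

Both sums agree with 1 well within the quoted uncertainties.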

The phenomenological success of the standard 
model is phenomenal: with only a handful of 
parameters, it reproduces correctly some millions 
of experimental numbers: cross sections, lifetimes, 
branching ratios. 


Noncommutative Geometry 


Noncommutative geometry is an analytic geometry 
generalizing three other geometries that also had 
important impact on our understanding of forces 
and time. Let us start by briefly recalling the three 
forerunners (Table 3). Euclidean geometry underlies 
Newton’s mechanics as a geometry in the space of 
positions. Forces are described by vectors living in 
the same space and the Euclidean scalar product is 
needed to define work and potential energy. Time 
is not part of geometry — it is absolute. This point 
of view is abandoned in special relativity unifying 
space and time into Minkowskian geometry. This 
new point of view allows to derive the magnetic 


Table 3 Four nested analytic geometries 





Geometry Force Time 
Euclidean F= f Fdx Absolute 
Minkowskian Ew = Bmg e Universal 
Riemannian Coriolis — gravity Proper, 7 
Noncommutative Gravity = YMH, à= 495 Ar 10s 
field from the electric field as a pseudoforce
associated with a Lorentz boost. Although time 
becomes relative, one can still imagine a grid of 
synchronized clocks, that is, a universal time. The 
next generalization is “Riemannian geome- 
try=curved spacetime.” Here gravity can be 
viewed as the pseudoforce associated with a 
uniformly accelerated coordinate transformation. 
At the same time, universal time loses all meaning 
and we must content ourselves with proper time. 
With today’s precision in time measurement, this 
complication of life becomes a bare necessity, for 
example, the global positioning system (GPS). 
Our last generalization is “noncommutative 
geometry =curved space(time) with an uncertainty 
principle.” As in quantum mechanics, this uncertainty 
principle is introduced via noncommutativity. 




Quantum Mechanics 


Consider the classical harmonic oscillator. Its phase
space is $\mathbb{R}^{2}$ with points labeled by position x and
momentum p. A classical observable is a
differentiable function on phase space such as the total energy
$p^{2}/(2m) + kx^{2}$. Observables can be added and
multiplied, and they form the algebra $C^{\infty}(\mathbb{R}^{2})$, which is
associative and commutative. To pass to quantum
mechanics, this algebra is rendered noncommutative
by means of a noncommutation relation for the
generators x and p: $[x, p] = i\hbar 1$. Let us call A the
resulting algebra “of quantum observables.” It is still
associative, and has an involution $\cdot^{*}$ (the adjoint or
Hermitian conjugation) and a unit 1.
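The noncommutation relation can be verified in the familiar representation where x acts by multiplication and p by $-i\hbar\, d/dx$. The following minimal sketch does this on polynomials, stored as coefficient lists; the choice of representation, the unit $\hbar = 1$ and the helper names are assumptions made for the illustration.

```python
HBAR = 1.0  # units with hbar = 1, chosen only for this example

def apply_x(coeffs):
    """Multiply the polynomial sum(c_k x^k) by x."""
    return [0] + list(coeffs)

def apply_p(coeffs):
    """Apply p = -i hbar d/dx to the polynomial sum(c_k x^k)."""
    return [-1j * HBAR * k * c for k, c in enumerate(coeffs)][1:] or [0]

def commutator_xp(coeffs):
    """Coefficients of (x p - p x) acting on the polynomial."""
    xp = apply_x(apply_p(coeffs))
    px = apply_p(apply_x(coeffs))
    n = max(len(xp), len(px))
    xp += [0] * (n - len(xp))
    px += [0] * (n - len(px))
    return [a - b for a, b in zip(xp, px)]

if __name__ == "__main__":
    psi = [1, 2, 3]  # the polynomial 1 + 2x + 3x^2
    print(commutator_xp(psi))
```

For any polynomial the result is $i\hbar$ times the input coefficients, which is the statement $[x, p] = i\hbar 1$ in this representation.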

Of course, there is no space anymore of which A is
the algebra of functions. Nevertheless, we talk about
such a “quantum phase space” as a space that has no
points or a space with an uncertainty relation. Indeed,
the noncommutation relation implies Heisenberg’s
uncertainty relation $\Delta x\, \Delta p \geq \hbar/2$ and tells us that
points in phase space lose all meaning; we can only
resolve cells in phase space of volume $\hbar/2$, see Figure 1.
To define the uncertainty $\Delta a$ for an observable $a \in A$,
we need a faithful representation of the algebra on a
Hilbert space, that is, an injective homomorphism $\rho$
from A into the algebra of operators on H. For the
harmonic oscillator, this Hilbert space is $H = L^{2}(\mathbb{R})$.
Its elements are the wave functions $\psi(x)$,
square-integrable functions on configuration space. Finally,
the dynamics is defined by the Hamiltonian, a
self-adjoint observable $H = H^{*} \in A$, via Schrödinger’s
equation $(i\hbar\, \partial/\partial t - \rho(H))\, \psi(t, x) = 0$. Here time is an
external parameter; in particular, time is not an
observable. This is different in the special-relativistic
setting, where Schrödinger’s equation is replaced by
Dirac’s equation $\slashed{\partial}\psi = 0$. Now the wave function $\psi$ is

Figure 1 The first example of noncommutative geometry. 


the four-component spinor consisting of left- and right- 
handed, particle and antiparticle wave functions. 
Unlike the Hamiltonian, the Dirac operator does not 
lie in $\mathcal{A}$, but it is still an operator on $\mathcal{H}$. In Euclidean 
spacetime, the Dirac operator is also self-adjoint, 
$\not\partial^* = \not\partial$. 


Spectral Triples 


Noncommutative geometry (Connes 1994, 1995) 
does to a compact Riemannian spin manifold $M$ 
what quantum mechanics does to phase space. A 
noncommutative geometry is defined by the three 
purely algebraic items $(\mathcal{A}, \mathcal{H}, \mathcal{D})$, called a spectral 
triple. $\mathcal{A}$ is a real, associative, and possibly noncommutative 
involution algebra with unit, faithfully 
represented on a complex Hilbert space $\mathcal{H}$, and $\mathcal{D}$ is 
a self-adjoint operator on $\mathcal{H}$. Like the spectral triple itself, 
the axioms linking its three items are motivated 
by relativistic quantum mechanics. 

When $\mathcal{A} = C^\infty(M)$, the functions on a Riemannian 
spin manifold $M$, represented on spinors $\psi$, and $\mathcal{D}$ is 
the gravitational Dirac operator, one has a spectral 
triple. The converse is also true when A is a 
suitable commutative algebra (Connes 1996), but 
the axioms make sense even when A is not 
commutative. As for quantum phase space, Connes 
defines a noncommutative geometry by a spectral 
triple whose algebra is allowed to be noncommu- 
tative and he shows how important properties like 
dimensions, distances, differentiation, integration, 
general coordinate transformations, and direct 
products generalize to the noncommutative setting. 
As a bonus, the algebraic axioms of a spectral 
triple, commutative or not, include discrete, that is, 
zero-dimensional spaces that now are naturally 
equipped with a differential calculus. These spaces 
have finite-dimensional algebras and Hilbert 
spaces, meaning that their algebras are just matrix 
algebras. 

An “almost commutative geometry” is defined as a 
direct product of a four-dimensional commutative 
geometry, “ordinary spacetime,” by a zero-dimensional 
noncommutative geometry, the “internal space.” If the 
latter is also commutative, for example, the ordinary 
two-point space, then the direct product describes a 
two-sheeted universe or a Kaluza–Klein space whose 
fifth dimension is discrete (Madore 1995). In general, 
the axioms of spectral triples imply that the Dirac 
operator of the internal space is precisely the fermionic 
mass matrix. 

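As a generic example, here is the internal spectral 
triple underlying the standard model with one 
generation of quarks and leptons. The algebra 
$\mathcal{A} = \mathbb{H} \oplus \mathbb{C} \oplus M_3(\mathbb{C}) \ni (a, b, c)$ contains quaternions, 
that is, $2 \times 2$ matrices of the form 

$$a = \begin{pmatrix} x & -\bar{y} \\ y & \bar{x} \end{pmatrix}, \qquad x, y \in \mathbb{C},$$

complex numbers $b$ and complex $3 \times 3$ matrices $c$. 
The Hilbert space is 30-dimensional, where we 
count particles and antiparticles ($\cdot^c$) separately: 
$\mathcal{H} = \mathcal{H}_L \oplus \mathcal{H}_R \oplus \mathcal{H}_L^c \oplus \mathcal{H}_R^c = \mathbb{C}^8 \oplus \mathbb{C}^7 \oplus \mathbb{C}^8 \oplus \mathbb{C}^7$. The 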


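representation is block-diagonal, with the four 
blocks 

$$\rho_L(a) := \begin{pmatrix} a \otimes 1_3 & 0 \\ 0 & a \end{pmatrix}, \qquad
\rho_R(b) := \begin{pmatrix} b\,1_3 & 0 & 0 \\ 0 & \bar{b}\,1_3 & 0 \\ 0 & 0 & \bar{b} \end{pmatrix} \qquad [12]$$

$$\rho_L^c(b, c) := \begin{pmatrix} 1_2 \otimes c & 0 \\ 0 & \bar{b}\,1_2 \end{pmatrix}, \qquad
\rho_R^c(b, c) := \begin{pmatrix} c & 0 & 0 \\ 0 & c & 0 \\ 0 & 0 & \bar{b} \end{pmatrix} \qquad [13]$$
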

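The internal Dirac operator (= fermionic mass 
matrix) contains two quark masses $m_u$, $m_d$ and one 
lepton mass $m_e$, and no mixing: 

$$\mathcal{D} = \begin{pmatrix} 0 & M & 0 & 0 \\ M^* & 0 & 0 & 0 \\ 0 & 0 & 0 & \bar{M} \\ 0 & 0 & \bar{M}^* & 0 \end{pmatrix}, \qquad
M = \begin{pmatrix} \begin{pmatrix} m_u & 0 \\ 0 & m_d \end{pmatrix} \otimes 1_3 & 0 \\ 0 & \begin{pmatrix} 0 \\ m_e \end{pmatrix} \end{pmatrix} \qquad [14]$$
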

These matrices look rather ad hoc; they are not. 
They define an irreducible spectral triple and, for a 
given algebra, there is only a finite number of such 
triples. 
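
Two ingredients of this internal triple are easy to check by hand or by machine: the quaternion matrices close under multiplication without commuting, and the particle content adds up to the stated 30 dimensions. The sketch below does both; the sample entries and the function names are assumptions picked for the illustration.

```python
def quaternion(x: complex, y: complex):
    """2x2 complex matrix [[x, -conj(y)], [y, conj(x)]] representing a quaternion."""
    return [[x, -y.conjugate()], [y, x.conjugate()]]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def is_quaternion(m, tol=1e-12):
    """Check that m has the form [[x, -conj(y)], [y, conj(x)]]."""
    return (abs(m[0][1] + m[1][0].conjugate()) < tol
            and abs(m[1][1] - m[0][0].conjugate()) < tol)

q1 = quaternion(1 + 2j, 3 - 1j)
q2 = quaternion(-0.5 + 1j, 2 + 2j)
product = matmul2(q1, q2)

# One entry of q1 q2 - q2 q1 is enough to show that the algebra is noncommutative.
commutator_entry = matmul2(q1, q2)[0][0] - matmul2(q2, q1)[0][0]

# Dimension count for one generation.
left = 2 * 3 + 2     # quark doublet in three colors plus lepton doublet
right = 2 * 3 + 1    # right-handed up and down quarks in three colors plus the electron
hilbert_dim = 2 * (left + right)   # particles and antiparticles counted separately
print(is_quaternion(product), commutator_entry, hilbert_dim)
```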


The Spectral Action 


Chamseddine and Connes (1997) generalize general 
relativity to noncommutative spacetimes in two 
strokes, kinematics and dynamics. They explicitly 
compute this generalization for almost commutative 
geometries. 

Kinematics In noncommutative geometry, gen- 
eral coordinate transformations are algebra auto- 
morphisms lifted to the Hilbert space of spinors. 
For almost commutative geometries, these transfor- 
mations are precisely general coordinate trans- 
formations of ordinary spacetime and gauge 
transformations. Now remember how Einstein uses 
the equivalence principle to produce “gravity = 
curvature” starting from the flat metric, which in 
Connes’ language is the ordinary flat Dirac opera- 
tor. When applied to an almost commutative 
geometry (Connes 1996), the equivalence principle 
produces again a curved metric via the ordinary 
coordinate transformations on M, while the gauge 
transformations applied to the fermionic mass 
matrix produce a new field, the Higgs scalar $\varphi$. For 
the example above, this field is precisely the isospin 
doublet, color singlet with hypercharge $-1/2$ of eqn 
[6]. Gauge transformations also apply to the 
ordinary Dirac operator, thereby producing the 
gauge fields A. 

Dynamics The group of generalized coordinate 
transformations allowed us to construct the con- 
figuration space. In the almost commutative case it 
consists of Riemannian metrics, gauge fields, and 
Higgs scalars. We now want a dynamics on this 
configuration space. Of course, we want this 
dynamics to be invariant under the group of 
generalized coordinate transformations. Note that 
the spectrum of the Dirac operator is invariant 
under this group and Chamseddine and Connes 
(1997) define the spectral action as a regularized 
partition function of these eigenvalues. 
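
The invariance statement can be made concrete on a toy example. The spectral action has the form $\mathrm{Tr}\, f(\mathcal{D}^2/\Lambda^2)$, so it depends on $\mathcal{D}$ only through its eigenvalues and is unchanged when $\mathcal{D}$ is conjugated by a unitary. The sketch below checks this for a $2 \times 2$ Hermitian matrix. The exponential cutoff function, the sample values, and the function names are assumptions made for the illustration, not choices taken from the article.

```python
import cmath
import math

def eigenvalues_hermitian_2x2(d):
    """Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant."""
    tr = (d[0][0] + d[1][1]).real
    det = (d[0][0] * d[1][1] - d[0][1] * d[1][0]).real
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return ((tr - disc) / 2.0, (tr + disc) / 2.0)

def spectral_action(d, cutoff, f=lambda u: math.exp(-u)):
    """Sum of f(lambda^2 / cutoff^2) over the eigenvalues lambda of d."""
    return sum(f(lam * lam / cutoff ** 2) for lam in eigenvalues_hermitian_2x2(d))

def conjugate_by_unitary(u, d):
    """Return u d u^dagger for 2x2 matrices."""
    ud = [[sum(u[i][k] * d[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return [[sum(ud[i][k] * u[j][k].conjugate() for k in range(2)) for j in range(2)]
            for i in range(2)]

m = 0.3
dirac = [[0j, m + 0j], [m + 0j, 0j]]   # toy "mass matrix" with eigenvalues -m and +m
angle, phase = 0.7, 1.1
unitary = [[math.cos(angle), -math.sin(angle) * cmath.exp(1j * phase)],
           [math.sin(angle) * cmath.exp(-1j * phase), math.cos(angle)]]

action_before = spectral_action(dirac, cutoff=1.0)
action_after = spectral_action(conjugate_by_unitary(unitary, dirac), cutoff=1.0)
print(action_before, action_after)
```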

On almost commutative geometries, the spectral 
action is equal to the Einstein–Hilbert action plus 
the Yang–Mills–Higgs ansatz (Figure 2). In other 
words, almost commutative geometry explains the 
forces mediated by gauge bosons and Higgs scalars 
as pseudoforces accompanying the gravitational 
force in the same way that Minkowskian geometry 
(i.e., special relativity) explains the magnetic force as 
a pseudoforce accompanying the electric force. 

Figure 2 Deriving the Yang–Mills–Higgs ansatz from gravity. 

Figure 3 Constraints inside the ansatz. 


There are constraints on the discrete and continuous 
parameters in the Yang–Mills–Higgs ansatz 
deriving from the spectral action (Figure 3). 

In particular, if we consider only irreducible spectral 
triples and among them only those which produce 
nondegenerate fermion masses compatible with renor- 
malization, then we only get the standard model with 
one generation of quarks and leptons, with a massless 
neutrino and with an arbitrary number of colors, and a 
few submodels thereof. More than one generation and 
neutrino masses are possible but imply reducible 
triples. However, in at least one generation, the 
neutrino must remain purely left and massless. 

For the standard model with $N$ generations 
and $N_c$ colors, we have the constraints 
$g_{N_c}^2 = g_2^2 = (9/N)\lambda$ on the continuous parameters. If 
we put $N = N_c = 3$ and if we believe in the popular 
“big desert,” then these constraints yield a “unification 
scale” $\Lambda = 10^{17}$ GeV at which the uncertainty 
relation in spacetime should become manifest, 
$\Delta\tau \sim \hbar/\Lambda$, and a Higgs mass of $m_H = 171.6 \pm 
5$ GeV for $m_t = 174.3 \pm 5.1$ GeV (see Figure 4). 
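
For orientation, the time resolution $\hbar/\Lambda$ implied by the quoted unification scale can be converted to seconds and to a length. The short sketch below does the arithmetic; the variable names are assumptions, and the constants are standard values rather than numbers from the article.

```python
HBAR_GEV_S = 6.582119569e-25   # reduced Planck constant in GeV * s
C_M_PER_S = 299_792_458.0      # speed of light in m / s
unification_scale_gev = 1e17   # Lambda as quoted in the text

delta_tau_s = HBAR_GEV_S / unification_scale_gev   # time resolution in seconds
delta_length_m = C_M_PER_S * delta_tau_s           # corresponding length in meters
print(f"{delta_tau_s:.2e} s, {delta_length_m:.2e} m")
```

The result is far below anything accessible to direct time measurement, which is consistent with ordinary spacetime looking commutative at all tested scales.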

It is clear that almost commutative geometries 
only scratch the surface of a gold mine. May we 
hope that a genuinely noncommutative geometry 
will solve our present problems with quantum field 
theory and quantum gravity? 


Figure 4 Running coupling constants. 


See also: Compact Groups and Their Representations; 
Dirac Fields in Gravitation and Nonabelian Gauge 
Theory; Effective Field Theories; General Relativity: 
Overview; Hopf Algebras and q-Deformation Quantum 
Groups; Positive Maps on C*-Algebras; Quantum Hall 
Effect; Standard Model of Particle Physics; Symmetries 
and Conservation Laws; Symmetry Breaking in Field 
Theory; von Neumann Algebras: Introduction, Modular 
Theory, and Classification Theory. 


Noncommutative Geometry from 
String Theory 


The first use of noncommutative geometry in string 
theory appears in the work of Witten on open-string 
field theory where the noncommutativity is asso- 
ciated with the product of open-string fields. 
Noncommutative geometry appears in the recent 
development of string theory in the seminal work of 
Connes, Douglas, and Schwarz where they con- 
structed and identified the compactification of 
Matrix theory on a noncommutative torus. 


Matrix Theory Compactification and 
Noncommutative Geometry 


M-theory is an 11-dimensional 
quantum theory of gravity which is believed to 
underlie all superstring theories. Banks, Fischler, 
Shenker, and Susskind proposed that the large-$N$ 
limit of the supersymmetric matrix quantum 
mechanics of $N$ D0-branes should describe 
M-theory compactified on a lightlike circle. 
Compactification of M-theory on a torus can be 
easily achieved by considering the torus as the quotient 
space $\mathbb{R}^d/\mathbb{Z}^d$ with the quotient conditions 


$$U_i^{-1} X^j U_i = X^j + \delta_i^j\, 2\pi R_i, \qquad i = 1, \ldots, d \qquad [1]$$


Here $R_i$ are the radii of the torus. The unitary 
translation generators $U_i$ generate the torus. They 
satisfy $U_i U_j = U_j U_i$. T-dualizing the D0-brane 
system, eqn [1] leads to the dual description as a 
$(d+1)$-dimensional supersymmetric gauge theory 
on the dual toroidal D-brane. A noncommutative 
torus $T_\theta^d$ is defined by the modified relations 


$$U_i U_j = e^{i\theta_{ij}}\, U_j U_i \qquad [2]$$


where $\theta_{ij}$ specify the noncommutativity. Compactification 
on a noncommutative torus can be easily 
accommodated and leads to noncommutative gauge 
theory on the dual D-brane. The parameters $\theta_{ij}$ can 
be identified with the components $C_{-ij}$ of the 3-form 
potential in M-theory. 

Since M-theory compactified on a circle leads to 
IIA string theory, the components $C_{-ij}$ correspond to 
the Neveu–Schwarz (NS) $B$-field $B_{ij}$ in IIA string 
theory. The physics of the D0-brane system in the 
presence of an NS $B$-field can also be studied from 
the viewpoint of IIA string theory. This led Douglas 
and Hull to obtain the same result that a noncommutative 
field theory lives on the D-brane. 
Toroidally compactified IIA string theory has a 
T-duality group $SO(d, d; \mathbb{Z})$. The T-duality symmetry 
gets translated into an equivalence relation between 
gauge theories on the noncommutative torus: a gauge 
theory on the noncommutative torus $T_\theta^d$ is equivalent 
to that on the noncommutative torus $T_{\theta'}^d$ if their 
noncommutativity parameters and metrics are related 
by a T-duality transformation. For example, 


$$\theta' = (A\theta + B)(C\theta + D)^{-1}, \qquad \begin{pmatrix} A & B \\ C & D \end{pmatrix} \in SO(d, d; \mathbb{Z}) \qquad [3]$$

It is remarkable that the T-duality acts within the 
field theory level, rather than mixing up the field 
theory modes with the string winding states and 
other stringy excitations. Mathematically, eqn [3] is 
precisely the condition for the noncommutative tori 
$T_\theta^d$ and $T_{\theta'}^d$ to be Morita equivalent. 
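
The fractional linear action [3] can be tried out for $d = 2$ with exact rational arithmetic. The sketch below applies two simple group elements, an integer shift of $\theta$ and the inversion $\theta \to \theta^{-1}$, checks that each block matrix preserves the split-signature form $J$, and confirms that the image of an antisymmetric $\theta$ is again antisymmetric. The sample value of $\theta$ and the helper names are assumptions for the illustration; only the defining condition $g^T J g = J$ is checked in code, while the unit determinant of both elements is noted in a comment.

```python
from fractions import Fraction as Fr

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inverse2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def block(a, b, c, d):
    """Assemble the 4x4 matrix [[a, b], [c, d]] from 2x2 blocks."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]

ZERO = [[Fr(0), Fr(0)], [Fr(0), Fr(0)]]
ONE = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]
J = block(ZERO, ONE, ONE, ZERO)   # invariant form of O(d, d)

def preserves_form(g):
    return matmul(matmul(transpose(g), J), g) == J

def act(a, b, c, d, theta):
    """theta' = (A theta + B)(C theta + D)^{-1}, as in eqn [3]."""
    return matmul(add(matmul(a, theta), b), inverse2(add(matmul(c, theta), d)))

theta = [[Fr(0), Fr(2, 5)], [Fr(-2, 5), Fr(0)]]
shift = [[Fr(0), Fr(1)], [Fr(-1), Fr(0)]]   # integer antisymmetric matrix

# Both elements below have determinant +1 for d = 2.
theta_shifted = act(ONE, shift, ZERO, ONE, theta)    # theta -> theta + shift
theta_inverted = act(ZERO, ONE, ONE, ZERO, theta)    # theta -> theta^{-1}
print(theta_shifted, theta_inverted)
```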


Open-String in B-Field 


It was soon realized that the D-brane does not 
necessarily need to be toroidal in order to be 
noncommutative. A direct canonical quantization 
of the open-string system shows that a constant 
B-field on a D-brane leads to noncommutative 
geometry on the D-brane world volume. Consider 
an open string moving in a flat space with metric $g_{ij}$ 
and a constant NS $B$-field. In the presence of a D$p$-brane, 
the components of the $B$-field not along the 
brane can be gauged away; thus, the $B$-field can 
have effects only in the longitudinal directions along 
the brane. The world-sheet (bosonic) action for this 
part is 


$$S_B = \frac{1}{4\pi\alpha'} \int_\Sigma d^2\sigma \left( g_{ij}\, \partial_a x^i \partial^a x^j - 2\pi\alpha' B_{ij}\, \epsilon^{ab}\, \partial_a x^i \partial_b x^j \right) \qquad [4]$$


where $i, j = 0, 1, \ldots, p$ run along the brane. It is easy 
to see that the boundary condition $g_{ij}\, \partial_\sigma x^j + 
2\pi\alpha' B_{ij}\, \partial_\tau x^j = 0$ at $\sigma = 0, \pi$ is not compatible with 
the standard canonical quantization $[x^i(\tau, \sigma), 
x^j(\tau, \sigma')] = 0$ at the boundary. Taking the boundary 
condition as constraints and performing canonical 
quantization, one obtains the commutation 
relations 

$$[a_n^i, a_m^j] = n\, G^{ij}\, \delta_{n+m,0}, \qquad [x_0^i, p_0^j] = i G^{ij}, \qquad [x_0^i, x_0^j] = i\theta^{ij} \qquad [5]$$

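Here, the open-string mode expansion is 

$$x^i(\tau, \sigma) = x_0^i + 2\alpha' \left( p_0^i \tau - 2\pi\alpha' (g^{-1}B)^i{}_j\, p_0^j \sigma \right) + \sqrt{2\alpha'} \sum_{n \neq 0} \frac{e^{-in\tau}}{n} \left( i a_n^i \cos n\sigma - 2\pi\alpha' (g^{-1}B)^i{}_j\, a_n^j \sin n\sigma \right)$$


$G^{ij}$ and $\theta^{ij}$ are the symmetric and antisymmetric 
parts of the matrix $(g + 2\pi\alpha' B)^{-1}$ (the latter multiplied by $2\pi\alpha'$): 


$$G^{ij} = \left( \frac{1}{g + 2\pi\alpha' B}\, g\, \frac{1}{g - 2\pi\alpha' B} \right)^{ij}, \qquad \theta^{ij} = -(2\pi\alpha')^2 \left( \frac{1}{g + 2\pi\alpha' B}\, B\, \frac{1}{g - 2\pi\alpha' B} \right)^{ij} \qquad [6]$$


It follows from [5] that the boundary coordinates 
$x^i \equiv x^i(\tau, 0)$ obey the commutation relation 


$$[x^i, x^j] = i\theta^{ij} \qquad [7]$$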


Relation [7] implies that the D-brane world volume, 
where the open-string endpoints live, is a noncommutative 
manifold. One may also start with the 
closed-string Green function and let its arguments 
approach the boundary to obtain the open-string 
Green function 


$$\langle x^i(\tau)\, x^j(\tau') \rangle = -\alpha' G^{ij} \ln(\tau - \tau')^2 + \frac{i}{2}\, \theta^{ij}\, \epsilon(\tau - \tau') \qquad [8]$$


where $\epsilon(\tau)$ is the sign of $\tau$. From [8], one can 
again extract the commutator [7]. $G_{ij} = g_{ij} - (2\pi\alpha')^2 
(B g^{-1} B)_{ij}$ is called the open-string metric since it controls 
the short-distance behavior of open strings. In contrast, 
the short-distance behavior for closed strings is controlled 
by the closed-string metric $g_{ij}$. One may also treat 
the boundary $B$-term in [4] as a perturbation to the 
open-string conformal field theory, from which one 
may extract [8] from the modified operator product 
expansion of the open-string vertex operators. 
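
The relation between the closed-string data $(g, B)$ and the open-string data $(G, \theta)$ can be checked with a two-dimensional toy example in exact arithmetic. To avoid the irrational factor $2\pi$, the sketch works with the rescaled field $b = 2\pi\alpha' B$, so that it computes $\theta^{ij}/(2\pi\alpha')$ rather than $\theta^{ij}$ itself. The sample matrices, the Euclidean signature, and the helper names are assumptions made for the illustration.

```python
from fractions import Fraction as Fr

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inverse(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def combine(a, b, sign=1):
    """Entrywise a + sign * b."""
    return [[a[i][j] + sign * b[i][j] for j in range(2)] for i in range(2)]

def transpose(m):
    return [[m[j][i] for j in range(2)] for i in range(2)]

g = [[Fr(2), Fr(0)], [Fr(0), Fr(3)]]    # closed-string metric (toy values)
b = [[Fr(0), Fr(1)], [Fr(-1), Fr(0)]]   # b = 2*pi*alpha' * B, antisymmetric

inv_plus = inverse(combine(g, b))        # (g + b)^{-1}
inv_minus = inverse(combine(g, b, -1))   # (g - b)^{-1}

# G^{ij} of eqn [6]
G_upper = matmul(matmul(inv_plus, g), inv_minus)
# theta^{ij} / (2*pi*alpha') of eqn [6]
theta_rescaled = [[-v for v in row] for row in matmul(matmul(inv_plus, b), inv_minus)]
# Open-string metric with lower indices: G_ij = g - b g^{-1} b
G_lower = combine(g, matmul(matmul(b, inverse(g)), b), -1)

print(G_upper, theta_rescaled, G_lower)
```

In this example $G^{ij}$ and $\theta^{ij}/(2\pi\alpha')$ add up to $(g + b)^{-1}$, and the lower-index open-string metric is the matrix inverse of $G^{ij}$.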

D-branes in the Wess–Zumino–Witten model 
provide another example of noncommutative geometry. 
In this case, the background is not flat since 
there is a nonzero $H = dB \sim k^{-1/2}$, where $k$ is the 
level. Examining the vertex operator algebra, one 
obtains that D-branes are described by nonassociative 
deformations of fuzzy spheres with nonassociativity 
controlled by $1/k$. 


String Amplitudes and Effective Action 


The effect of the $B$-field on the open-string amplitudes 
is simple to determine since only the $x_0^i$ 
commutation relation is affected nontrivially. For 
example, the noncommutative gauge theory can be 
obtained from the tree-level string amplitudes readily. 
For tree and one loop, the vertex operator 
formalism can be used. Generally, the vertex 
operator can be inserted at either the $\sigma = 0$ or $\sigma = \pi$ 
boundary, where the string has zero mode parts $x_0^i$ 
and $y_0^i = x_0^i - (2\pi\alpha')^2 (g^{-1}B)^i{}_j\, p_0^j$, respectively. The 
commutation relations are 


$$[x_0^i, x_0^j] = i\theta^{ij}, \qquad [x_0^i, y_0^j] = 0, \qquad [y_0^i, y_0^j] = -i\theta^{ij} \qquad [9]$$

The difference in the commutation relation for $x_0$ 
and $y_0$ implies that the two boundaries of the open 
string have opposite noncommutativity. This fact is not 
so important for tree-level calculations since one can 
always choose to put all the interactions at, for 
example, the $\sigma = 0$ boundary. Collecting all these 
zero mode parts of the vertex operators, one obtains 
a phase factor 


$$e^{ip^1 x_0}\, e^{ip^2 x_0} \cdots e^{ip^N x_0} = \exp\Big( -\frac{i}{2} \sum_{a<b} p_i^a\, \theta^{ij}\, p_j^b \Big) \qquad [10]$$


where the external momenta $p^a$ are ordered cyclically 
on the circle, and momentum conservation has 
been used. The computation of the oscillator part of 
the amplitude is the same as in the B =0 case, except 
that the metric G is employed in the contractions. 
As a result, the effect of the B-field on the tree-level 
string amplitude is simply to multiply the amplitude 
at B=0 by the phase factor and to replace the 
metric by the metric G. A generic term in the tree- 
level effective action simply becomes 


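$$\int d^{p+1}x\, \sqrt{-\det g}\; \mathrm{tr}\, \partial^{n_1}\Phi_1 \cdots \partial^{n_k}\Phi_k \;\to\; \int d^{p+1}x\, \sqrt{-\det G}\; \mathrm{tr}\, \partial^{n_1}\Phi_1 * \cdots * \partial^{n_k}\Phi_k \qquad [11]$$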


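Here the star product, also called the Moyal 
product, is defined by 


$$(f * g)(x) = \exp\Big( \frac{i}{2}\, \theta^{ij}\, \frac{\partial}{\partial x^i} \frac{\partial}{\partial y^j} \Big) f(x)\, g(y) \Big|_{y = x} \qquad [12]$$


The star product is associative and noncommutative, 
and satisfies $\overline{f * g} = \bar{g} * \bar{f}$ under complex conjugation. 
Also, for functions that vanish rapidly enough at 
infinity, there holds 


$$\int f * g = \int g * f = \int f g \qquad [13]$$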
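
For polynomials the exponential in [12] terminates after finitely many terms, so the star product can be evaluated exactly. The sketch below implements it in two dimensions with $\theta^{12} = -\theta^{21} = \theta$ and checks that the coordinates satisfy $[x, y]_* = i\theta$ and that one sample triple product is associative. The dictionary representation of polynomials, the value of $\theta$, and the function names are assumptions made for the illustration.

```python
from math import comb, factorial

THETA = 2.0   # theta^{12} = -theta^{21}; illustrative value (assumption)

def derivative(poly, nx, ny):
    """Apply d^nx/dx^nx d^ny/dy^ny to {(i, j): coeff}, meaning coeff * x^i * y^j."""
    out = {}
    for (i, j), c in poly.items():
        if i >= nx and j >= ny:
            factor = (factorial(i) // factorial(i - nx)) * (factorial(j) // factorial(j - ny))
            out[(i - nx, j - ny)] = out.get((i - nx, j - ny), 0) + c * factor
    return out

def multiply(p, q):
    """Ordinary commutative product of two polynomials."""
    out = {}
    for (i, j), c in p.items():
        for (k, l), d in q.items():
            out[(i + k, j + l)] = out.get((i + k, j + l), 0) + c * d
    return out

def subtract(p, q):
    out = dict(p)
    for key, c in q.items():
        out[key] = out.get(key, 0) - c
    return {key: c for key, c in out.items() if abs(c) > 1e-12}

def star(f, g):
    """Moyal product of two polynomials, summed up to the order where the series stops."""
    if not f or not g:
        return {}
    order = min(max(i + j for i, j in f), max(i + j for i, j in g))
    out = {}
    for n in range(order + 1):
        prefactor = (0.5j * THETA) ** n / factorial(n)
        for k in range(n + 1):
            # n-th power of theta * (d_x (x) d_y - d_y (x) d_x), expanded binomially
            term = multiply(derivative(f, n - k, k), derivative(g, k, n - k))
            for key, c in term.items():
                out[key] = out.get(key, 0) + prefactor * comb(n, k) * (-1) ** k * c
    return {key: c for key, c in out.items() if abs(c) > 1e-12}

X = {(1, 0): 1}
Y = {(0, 1): 1}

commutator = subtract(star(X, Y), star(Y, X))                          # expected: i * theta
associativity_defect = subtract(star(star(X, Y), X), star(X, star(Y, X)))
print(commutator, associativity_defect)
```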


An interesting consequence of the nonlocality 
as expressed by the noncommutative geometry [7] 
is the existence of a dipole excitation whose extent is 
proportional to its momentum, $\Delta x = k\theta$. This relation 
is at the heart of the “IR/UV mixing phenomenon” 
(see below) of noncommutative field theory. 

At one- (and higher-) loop level, the different 
noncommutativities for the opposite boundaries of 
the open string become essential and give rise to new 
effects. In this case nonplanar diagrams require one 
to put vertex operators at the two different 
boundaries $\sigma = 0, \pi$. A more complicated phase 
factor, which involves internal as well as external 
momenta, results. The different noncommutativity 
for the opposite boundaries of the open string [9] 
is thus the basic reason for the IR/UV mixing in 
the noncommutative quantum field 
theory. The commutation relations [5] are valid at all 
loops; therefore, one can use them to construct the 
higher-loop string amplitudes from first principles. 
The effect of the B-field on the string interaction can 
easily be implemented into the Reggeon vertex and 
the complete higher loop amplitudes in the presence 
of the B-field have been constructed. 





Low-Energy Limit - The Seiberg-Witten Limit 
and the NCOS Limit 


The full open-string system is still quite complicated. 
One may try to decouple the infinite number of 
massive string modes to obtain a low-energy field-theoretic 
description by taking the limit $\alpha' \to 0$. 
Since open strings are sensitive to $G$ and $\theta$, one 
should take the limit such that $G$ and $\theta$ are fixed. 
For the magnetic case $B_{0i} = 0$, Seiberg and Witten 
showed that this can be achieved with the following 
double scaling limit: 


$$\alpha' \sim \epsilon^{1/2} \to 0, \qquad g_{ij} \sim \epsilon \to 0 \qquad [14]$$

with $B_{ij}$ and everything else kept fixed. Assuming $B$ 
is of rank $r$, then [6] becomes 


$$G_{ij} = -(2\pi\alpha')^2 (B g^{-1} B)_{ij}, \qquad \theta^{ij} = (B^{-1})^{ij}, \qquad i, j = 1, \ldots, r \qquad [15]$$


Otherwise $G_{ij} = g_{ij}$, $\theta^{ij} = 0$. One may also argue that 
the closed string decouples in this limit. As a result, 
in the low-energy limit a greatly simplified noncommutative 
Yang–Mills action $F * F$ is obtained 
(see below for more discussion of this field theory). 

For the case of a constant electric field background, 
say $B_{01} \neq 0$, there is a critical electric field 
beyond which the open string becomes unstable and 
the theory does not make sense. Due to the presence 
of this upper bound of the electric field, one can 
show that there is no decoupling limit where one 
can reduce the string theory to a field theory on a 
noncommutative spacetime. However, one can con- 
sider a different scaling limit where one takes the 
closed-string metric scale to infinity appropriately as 
the electric field approaches the critical value. In this 
limit, all closed-string modes decouple. One obtains 
a novel noncritical string theory living on a 
noncommutative spacetime known as the noncom- 
mutative open string (NCOS). 


Noncommutative Quantum Field Theory 


Field theories on noncommutative spacetime are 
defined by using the star product instead of the 
ordinary product of the fields. To illustrate the 
general ideas, let us consider a single real scalar field 
theory with the action 


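$$S = \int d^D x \left[ \frac{1}{2}\, \partial_\mu \phi * \partial^\mu \phi - \frac{m^2}{2}\, \phi * \phi - V(\phi) \right], \qquad V(\phi) = \frac{g}{4!}\, \phi * \phi * \phi * \phi \qquad [16]$$
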
Due to the property [13], free noncommutative field 
theory is the same as an ordinary field theory. 
Treating the interaction term as a perturbation, one 
can perform the usual quantization and obtain the 
Feynman rules: the propagator is unchanged and the 
interaction vertex in the momentum space is given 
by g times the phase factor 


$$\exp\Big( -\frac{i}{2} \sum_{1 \le a < b \le 4} p^a \times p^b \Big) \qquad [17]$$


Here $p \times q = p_\mu \theta^{\mu\nu} q_\nu$. The theory is nonlocal due to 
the infinite order of derivatives that appear in the 
interaction. 


Planar and Nonplanar Diagrams 


The factor [17] is cyclically symmetric but not 
permutation symmetric. This is analogous to the 
situation of an M-field theory. Using the same 
double-line notation as introduced by ’t Hooft, one 
can similarly classify the Feynman diagrams of 
noncommutative field theory according to their 
genus. In particular, the total phase factor of a 
planar diagram behaves quite differently from that 
of a nonplanar diagram. It is easy to show that a 
planar diagram will have the phase factor 


$$V_p(p^1, \ldots, p^n) = \exp\Big( -\frac{i}{2} \sum_{1 \le a < b \le n} p^a \times p^b \Big) \qquad [18]$$


where $p^1, \ldots, p^n$ are the (cyclically ordered) external 
momenta of the graph. Note that the phase factor 
[18] is independent of the internal momenta. This is 
not the case for a nonplanar diagram. One can easily 
show that a nonplanar diagram carries an additional 
phase factor 


$$V_{np} = V_p \exp\Big( -\frac{i}{2} \sum_{1 \le a < b \le n} C_{ab}\, p^a \times p^b \Big) \qquad [19]$$


where $C_{ab}$ is the signed intersection matrix of the 
graph, whose $ab$ matrix element counts the number 
of times the $a$th (internal or external) line crosses the 
$b$th line. The matrix $C_{ab}$ is not uniquely determined 
by the diagram as different ways of drawing the 
graph could lead to different intersections. However, 
the phase factor [19] is unique due to momentum 
conservation. 
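
The symmetry properties of the vertex phase are easy to confirm numerically. The sketch below evaluates the factor [17] for four two-dimensional momenta that sum to zero, then compares a cyclic rotation of the momenta with a transposition of two neighbors. The value of $\theta$, the sample momenta, and the function names are assumptions made for the illustration.

```python
import cmath
from itertools import combinations

THETA = 0.8   # theta^{12} for two noncommuting directions (illustrative value)

def cross(p, q):
    """p x q = p_mu theta^{mu nu} q_nu for two-dimensional momenta."""
    return THETA * (p[0] * q[1] - p[1] * q[0])

def vertex_phase(momenta):
    """exp(-(i/2) * sum over a < b of p^a x p^b) for an ordered list of momenta."""
    pairs = combinations(range(len(momenta)), 2)
    total = sum(cross(momenta[a], momenta[b]) for a, b in pairs)
    return cmath.exp(-0.5j * total)

p1, p2, p3 = (1.0, 0.5), (-0.3, 2.0), (0.7, -1.1)
p4 = tuple(-(a + b + c) for a, b, c in zip(p1, p2, p3))   # momentum conservation

original = vertex_phase([p1, p2, p3, p4])
cyclic = vertex_phase([p2, p3, p4, p1])    # cyclic rotation: same phase
swapped = vertex_phase([p2, p1, p3, p4])   # transposition: different phase
print(original, cyclic, swapped)
```

The cyclic invariance relies on momentum conservation: moving $p^1$ to the end changes the exponent by a multiple of $p^1 \times (p^2 + p^3 + p^4) = -p^1 \times p^1 = 0$.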

The different behaviors of the planar and non- 
planar phase factors have important consequences. 


1. Since the phase factor [18] is independent of the 
internal momenta, the divergences and renorma- 
lizability of the planar diagrams will be (simply) 
the same as in the commutative theory and can 
be handled with standard renormalization tech- 
niques. This is sharply different for the nonplanar 
diagrams. In fact, due to the extra oscillatory 
internal-momenta-dependent phase factor, one 
can expect the nonplanar diagrams to have an 
improved ultraviolet (UV) behavior. It turns out 
that planar and nonplanar diagrams also differ 
sharply in their infrared (IR) behavior due to the 
“IR/UV mixing effect” (see below). 

2. Moreover, at high energies one can expect that 
noncommutative field theory will generically 
become planar since the nonplanar diagrams will 
be suppressed due to the oscillatory phase factor. 

3. In the limit $\theta \to \infty$, the nonplanar sector will be 
totally suppressed since the rapidly oscillating 
phase factor will cause the nonplanar diagram to 
vanish upon integrating out the momenta. Thus, 
generically the large-$\theta$ limit is analogous to the 
large-$N$ limit where only the planar diagrams 
contribute. However, these expectations do not 
apply for noncommutative gauge theory since 
one needs to include “open Wilson lines” (see 
below) in the construction of gauge invariant 
observables, and the open Wilson line grows in 
extent with energy and $\theta$. 


IR/UV Mixing 


Due to the nonlocal nature of noncommutative field 
theory, there is generally a mixing of the UV and 
IR scales. The reason is roughly the following. 
Nonplanar diagrams generally have phase factors 
like $\exp(ik\theta p)$ with $k$ a loop momentum and $p$ an 
external momentum. Consider a nonplanar diagram 
which is UV divergent when $\theta = 0$; one can expect 
that for very high loop momenta the phase factor 
will oscillate rapidly and render the integral finite. 
However, this is only valid for a nonvanishing 
external momentum $\theta p$; the infinity will come back 
as $\theta p \to 0$, this time as an IR 
singularity. Thus, an IR divergence arises whose 
origin is the UV region of the momentum 
integration; this is known as the IR/UV mixing 
phenomenon. 

To be more specific, consider the $\phi^4$ scalar theory 
in $D = 4$ dimensions. The one-loop self-energy has a 
nonplanar contribution given by 


$$\Gamma_{np} = \frac{g}{6(2\pi)^4} \int \frac{d^4k}{k^2 + m^2}\, e^{ik\theta p} = \frac{g}{3(4\pi^2)^2} \left( \Lambda_{\rm eff}^2 + \cdots \right) \qquad [20]$$


where $\Lambda_{\rm eff} = (1/\Lambda^2 + (\theta p)^2)^{-1/2}$. One can see clearly 
the IR/UV mixing: $\Gamma_{np}$ is UV finite as long as $\theta p \neq 0$; 
when $\theta p = 0$, the quadratic UV divergence is 
recovered, $\Gamma_{np} \sim \Lambda^2$. For supersymmetric theory, 
one has at most logarithmic IR singularities from 
IR/UV mixing. 
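
The two regimes of the effective cutoff can be read off numerically. The sketch below evaluates $\Lambda_{\rm eff}^2 = 1/(1/\Lambda^2 + (\theta p)^2)$ in both orders of limits; the sample numbers and the function name are assumptions made for the illustration, and only the leading $\Lambda_{\rm eff}^2$ dependence of [20] is represented.

```python
def effective_cutoff_sq(cutoff: float, theta_p: float) -> float:
    """Lambda_eff^2 = 1 / (1/Lambda^2 + (theta p)^2)."""
    return 1.0 / (1.0 / cutoff ** 2 + theta_p ** 2)

# Fixed nonzero theta*p: removing the cutoff leaves the finite value 1/(theta p)^2.
finite_limit = effective_cutoff_sq(cutoff=1e12, theta_p=0.5)

# Fixed cutoff: sending theta*p -> 0 restores the quadratic divergence Lambda^2.
commutative_limit = effective_cutoff_sq(cutoff=100.0, theta_p=0.0)

# With the cutoff removed first, the same quantity grows like 1/(theta p)^2 in the infrared.
infrared_growth = [effective_cutoff_sq(1e12, tp) for tp in (1e-1, 1e-2, 1e-3)]
print(finite_limit, commutative_limit, infrared_growth)
```

The two limits do not commute, which is the numerical face of the statement that a UV divergence reappears as an IR singularity.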

IR/UV mixing has a number of interesting 
consequences. 


1. Due to the IR/UV mixing, noncommutative 
theory does not appear to have a consistent 
Wilsonian description, since such a description requires that 
correlation functions computed at finite $\Lambda$ differ 
from their limiting values by terms of order $1/\Lambda$ 
for all values of momenta. However, this is not 
true for theory with IR/UV mixing. For example, 
the two-point function [20] at finite value of $\Lambda$ 
differs from its value at $\Lambda = \infty$ by the amount 
$\Gamma_{np}^{\Lambda} - \Gamma_{np}^{\infty} \propto 1/(\theta p)^2$, for the range of momenta 
$(\theta p)^2 \ll 1/\Lambda^2$. It has been argued that the IR 
singularity may be associated with missing light 
degrees of freedom in the theory. With new 
degrees of freedom appropriately added, one may 
recover a conventional Wilsonian description. 
Moreover, it has been suggested to identify 
these degrees of freedom with the closed-string 
modes. However, the precise nature and origin of 
these degrees of freedom is not known. 

2. The renormalization of the planar diagrams is 
straightforward; however, the situation is more 
subtle for the nonplanar diagrams since the IR/ 
UV-mixed IR singularities may mix with other 
divergences at higher loops and render the proof 
of renormalizability much more difficult. IR/UV 
mixing renders certain large N noncommutative 
field theory nonrenormalizable. However, for 
theories with a fixed set of degrees of freedom 
to start with, it is believed that one can have 
sufficiently good control of the IR divergences 
and prove renormalizability. An example of 
renormalizable noncommutative quantum field 
theory is the noncommutative Wess–Zumino 
model where IR/UV mixing is absent. However, 
a general proof is still lacking. 

3. One can show that IR/UV mixing in timelike 
noncommutative theory ($\theta^{0i} \neq 0$) leads to breakdown 
of perturbative unitarity. For a theory 
without IR/UV mixing, unitarity will be respected 
even if the theory has a timelike noncommuta- 
tivity. Theory with lightlike noncommutativity is 
unitary. 


Noncommutative Gauge Theory 


Gauge theory on noncommutative space is defined 
by the action 


$$S = -\frac{1}{4g^2} \int dx\; \mathrm{tr} \left( F_{ij}(x) * F^{ij}(x) \right) \qquad [21]$$


where the gauge fields $A_i$ are $N \times N$ Hermitian 
matrices, $F_{ij}$ is the noncommutative field strength 
$F_{ij} = \partial_i A_j - \partial_j A_i - i[A_i, A_j]_*$, and tr is the ordinary 
trace over $N \times N$ matrices. The theory is invariant 
under the star-gauge transformation 


$$A_i \to g * A_i * g^\dagger - i\, (\partial_i g) * g^\dagger \qquad [22]$$


where the $N \times N$ matrix function $g(x)$ is unitary 
with respect to the star product, $g * g^\dagger = g^\dagger * g = I$. 
The solution is $g = e_*^{i\lambda}$, where $\lambda$ is Hermitian. In 
infinitesimal form, $\delta_\lambda A_i = \partial_i \lambda + i[\lambda, A_i]_*$. The noncommutative 
gauge theory has $N^2$ Hermitian gauge 
fields. Because of the star product, the U(1) sector of 
the theory is not free and does not decouple from 
the SU($N$) factor as in the commutative case. Note 
that this way of defining noncommutative gauge 
theory does not work for other Lie groups since the 
star commutator generally involves the commutator 
as well as the anticommutator of the Lie algebra; 
hence, the expressions above generally involve the 
enveloping algebra of the underlying Lie group. 
With the help of the “Seiberg-Witten map” (see 
below), one can construct an enveloping-algebra- 
valued gauge theory which has the same number of 
independent gauge fields and gauge parameters as 
the ordinary Lie-algebra-valued gauge theory. How- 
ever, the quantum properties of these theories are 
much less understood. One may also introduce 
certain automorphisms in the noncommutative 
U(N) theory to restrict the dependence of the 
noncommutative space coordinates of the field 
configurations and obtain a notion of noncommu- 
tative theory with orthogonal and symplectic star- 
gauge group. However, the theory does not reduce 
to the standard gauge theory in the commutative 
limit $\theta \to 0$. 


Open Wilson Line and Gauge-Invariant 
Observables 


One remarkable feature of noncommutative gauge 
theory is the mixing of noncommutative gauge 
transformations and spacetime translations, as can 
be seen from the following identity: 


$$e^{ik \cdot x} * f(x) * e^{-ik \cdot x} = f(x + k\theta) \qquad [23]$$


for any function $f$. This is analogous to the situation 
in general relativity where translations are also 
equivalent to gauge transformations (general coordinate 
transformations). Thus, as in general relativity, 
there are no local gauge-invariant observables in 
noncommutative gauge theory. The unification of 
spacetime and gauge fields in noncommutative 
gauge theory can also be seen from the fact 
that derivatives can be realized as commutators, 
$\partial_i f = -i[\theta^{-1}_{ij} x^j, f]_*$, and get absorbed into the vector 
potential in the covariant derivative 


$$D_i = \partial_i + iA_i \to -i\theta^{-1}_{ij} x^j + iA_i \qquad [24]$$


Equation [24] clearly demonstrates the unification 
of spacetime and gauge fields. Note that the field 
strength takes the form $F_{ij} = i[D_i, D_j] + \theta^{-1}_{ij}$. 

The Wilson line operator for a path $C$ running 
from $x_1$ to $x_2$ is defined by 


$$W(C) = P_* \exp\Big( i \int_C A \Big) \qquad [25]$$

$P_*$ denotes the path ordering with respect to the star 
product, with $A(x_2)$ at the right. It transforms as 


$$W(C) \to g(x_1) * W(C) * g(x_2)^\dagger \qquad [26]$$


In commutative gauge theory, the Wilson line 
operator for closed loop (or its Fourier transform) 
is gauge invariant. In noncommutative gauge 
theory, the closed Wilson loops are no longer 
gauge invariant. Noncommutative generalization 
of the gauge invariant Wilson loop operator can 
be constructed most readily by deforming the 
Fourier transform of the Wilson loop operator. It 
turns out that the closed loop has to open in a 
specific way to form an open Wilson line in order 
to be gauge invariant. To see this, let us consider a 
path $C$ connecting points $x$ and $x + l$. Using [23], it 
is easy to see that the operator 


$$W(k) = \int dx\; \mathrm{tr}\, W(C) * e^{ik \cdot x}, \qquad \text{with } l = k\theta, \qquad [27]$$


is gauge invariant. Just like Wilson loops in ordinary 
gauge theory, these operators also constitute an 
overcomplete set of gauge-invariant operators parametrized 
by the set of curves $C$. When $\theta = 0$, $C$ 
becomes a closed loop and we reobtain the (Fourier 
transformed) usual closed Wilson loop in commutative 
gauge theory. A noncommutative version of the 
loop equation for closed Wilson loops has been 
constructed and involves open Wilson lines. The 
open Wilson line is instrumental in the construction 
of gauge-invariant observables. An important appli- 
cation is in the construction of various couplings of 
the noncommutative D-brane to the bulk super- 
gravity fields. The equivalence of the commutative 
and noncommutative couplings to the RR fields 
leads to the exact expression for the Seiberg–Witten 
map. It is remarkable that the one-loop nonplanar 
effective action for noncommutative scalar theory and 
gauge theory, as well as the two-loop effective 
action for scalar theory, can be written compactly in terms 
of open Wilson lines. Based on this result, the 
physical origin of the IR/UV mixing has been 
elucidated. One may identify the open Wilson line 
with the dipole excitation generically present in 
noncommutative field theory and hence explain the 
presence of the IR/UV mixing. IR/UV mixing may 
also be identified with the instability associated with 
the closed-string exchange of the noncommutative 
D-branes. 


The Seiberg-Witten Map 


The open string is coupled to the 1-form $A_i$ living on
the D-brane through the coupling $\int_{\partial\Sigma} A$. For slowly
varying fields, the effective action for this gauge
potential can be determined from the S-matrix and
is given by the Dirac-Born-Infeld (DBI) action. In
the presence of a B-field, the discussion above (see 
eqn [11]) leads to the noncommutative DBI 
Lagrangian 


$$L_{\mathrm{NCDBI}}(\hat F) = G_s^{-1} \mu_p \sqrt{-\det(G + 2\pi\alpha' \hat F)}$$ [28]


where $\mu_p = (2\pi)^{-p} (\alpha')^{-(p+1)/2}$ is the D-brane tension
and $\hat F$ is the noncommutative field strength.
However, one may also exploit the tensor gauge
invariance on the D-brane (i.e., the string sigma
model is invariant under $A \to A - \Lambda$, $B \to B + d\Lambda$)
and consider the combination $F + B$ as a whole. In
this case, it is like having the open string coupled
to the boundary gauge field strength $F + B$ with
no separate B-field. One has the usual DBI
Lagrangian


$$L_{\mathrm{DBI}}(F) = g_s^{-1} \mu_p \sqrt{-\det(g + 2\pi\alpha'(F + B))}$$ [29]

In [28] and [29], $G_s$ and $g_s$ are the effective open-string
couplings in the noncommutative and commutative
descriptions. Although the two Lagrangians look quite
different, Seiberg and Witten showed that the
commutative and noncommutative DBI actions
are indeed equivalent if the open-string couplings
are related by $g_s = G_s \sqrt{\det(g + 2\pi\alpha' B)/\det G}$ and
there is a field redefinition that relates the
commutative and noncommutative gauge fields.
The map $\hat A = \hat A(A)$ is called the Seiberg-Witten
map. Moreover, the noncommutative gauge symmetry
is equivalent to the ordinary gauge symmetry
in the sense that they have the same set of
orbits under gauge transformations:


$$\hat A(A) + \hat\delta_{\hat\lambda} \hat A(A) = \hat A(A + \delta_\lambda A)$$ [30]


Here $\hat A_i$ and $\hat\lambda$ are, respectively, the noncommutative
gauge field and noncommutative gauge
transformation parameter, and $A_i$ and $\lambda$ are,
respectively, the ordinary gauge field and ordinary
transformation parameter. Equation
[30] can be solved only if the transformation
parameter $\hat\lambda = \hat\lambda(\lambda, A)$ is field dependent. The
Seiberg-Witten map is characterized by the Seiberg-Witten
differential equation


$$\delta \hat A_i(\theta) = -\frac{1}{4} \delta\theta^{kl} \left[ \hat A_k \star (\partial_l \hat A_i + \hat F_{li}) + (\partial_l \hat A_i + \hat F_{li}) \star \hat A_k \right]$$ [31]


An exact solution for the Seiberg-Witten map can
be written down with the help of the open Wilson
line. For the case of U(1) with constant F, we have
the exact solution $\hat F = (1 + F\theta)^{-1} F$.
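
The constant-field solution can be checked numerically. The sketch below assumes the closed form $\hat F = (1 + F\theta)^{-1} F$ and verifies, in two dimensions, the matrix identity $d\hat F/d\theta = -\hat F\,(d\theta)\,\hat F$ that this closed form satisfies. The function names and the restriction to $2 \times 2$ antisymmetric matrices are choices made only for this illustration.

```python
# Numerical check (2x2 case) that F_hat(t) = (1 + F theta)^(-1) F, with
# theta = t * eps and eps = [[0, 1], [-1, 0]], obeys
#     dF_hat/dt = -F_hat eps F_hat.

def matmul(a, b):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def antisym(x):
    """Antisymmetric 2x2 matrix x * eps."""
    return [[0.0, x], [-x, 0.0]]

def f_hat(f, t):
    """Closed-form map for constant F = f * eps and theta = t * eps."""
    F, theta = antisym(f), antisym(t)
    Ftheta = matmul(F, theta)
    one_plus = [[(1.0 if i == j else 0.0) + Ftheta[i][j] for j in range(2)]
                for i in range(2)]
    return matmul(inverse(one_plus), F)

def flow_residual(f, t, dt=1e-6):
    """Largest entry of dF_hat/dt + F_hat eps F_hat (central difference)."""
    plus, minus, here = f_hat(f, t + dt), f_hat(f, t - dt), f_hat(f, t)
    rhs = matmul(matmul(here, antisym(1.0)), here)
    return max(abs((plus[i][j] - minus[i][j]) / (2 * dt) + rhs[i][j])
               for i in range(2) for j in range(2))

print(flow_residual(0.3, 0.5))  # small, limited by finite-difference error
```

At $\theta = 0$ the function `f_hat` returns $F$ itself, so the commutative field strength is recovered as the initial condition of the flow.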

That there is a field redefinition that allows one to 
write the effective action in terms of different fields 
with different gauge symmetries may seem puzzling 
at first sight. However, it has a clear physical origin 
in terms of the string world sheet. In fact, there are 
different possible schemes to regularize the short- 
distance divergence on the world sheet. One can 
show that the Pauli-Villars regularization gives the
commutative description, while the point-splitting 
regularization gives the noncommutative descrip- 
tion. Since theories defined by different regulariza- 
tion schemes are related by a coupling-constant 
redefinition, this implies that the commutative and 
noncommutative descriptions are related by a field 
redefinition, because the couplings on the world 
sheet are just the spacetime fields. 

Despite this formal equivalence, the physics of the 
noncommutative theories is generally quite different 
from the commutative case. First, it is clear that 
generally the Seiberg-Witten map may take non- 
singular configurations to singular configurations. 
Second, the observables one is interested in are also 
generally different. Moreover, the two descriptions 
are generally good for different regimes: the con- 
ventional gauge theory description is simpler for 
small B and the noncommutative description is 
simpler for large B. 


Perturbative Gauge Theory Dynamics 


The noncommutative gauge symmetry [22] can be
fixed as usual by employing the Faddeev-Popov
procedure, resulting in Feynman rules that are
similar to the conventional gauge theory. The
important difference is that now the structure
constants in the phase factors [18] and [19] should
be amended. It turns out that the nonplanar U(N)
diagrams contribute (only) to the U(1) part of the
theory. As a result, unlike the commutative case, the
U(1) part of the theory is no longer decoupled and
free. Noncommutative gauge theory is one-loop
renormalizable. The $\beta$-function is determined solely
by the planar diagrams and, at one loop, is given by


for $N \geq 1$ [32]


Note that the $\beta$-function is independent of $\theta$; the
noncommutative U(1) is asymptotically free and
does not reduce to the commutative theory when
$\theta \to 0$. Noncommutative theory beyond the tree
level is generally not smooth in the limit $\theta \to 0$.
A discontinuity of this kind was also noted for the
Chern-Simons system.

Gauge anomalies can be similarly discussed and
satisfy the noncommutative generalizations of the
Wess-Zumino consistency conditions. In $d = 2n$
dimensions, the anomaly involves the combination
$\mathrm{tr}(T^{a_1} T^{a_2} \cdots T^{a_{n+1}})$ rather than the usual symmetrized
trace, since the phase factor is not permutation
symmetric. As a result, the usual cancellation of the
anomaly does not work; this is the main obstacle to
the construction of noncommutative chiral gauge
theory.

There are a number of interesting features to 
mention for the IR/UV mixing in noncommutative 
gauge theory. 


1. IR/UV mixing generically yields pole-like IR 
singularities. Despite the appearance of IR 
poles, gauge invariance of the theory is not 
endangered. 

2. One can show that only the U(1) sector is 
affected by IR/UV mixing. 

3. As a result of IR/UV mixing, noncommutative 
U(1) photons polarized in the noncommutative 
plane will have different dispersion relations 
from those which are not. Strange as it is, this 
is consistent with gauge invariance. 


Noncommutative Solitons, Instantons 
and D-Branes 


Solitons and instantons play important roles in the 
nonperturbative aspects of field theory. The non- 
locality of the star product gives noncommutative 
field theory a stringy nature. It is remarkable that 
this applies to the nonperturbative sector as well. 
Solitons and instantons in noncommutative
gauge theory reproduce, remarkably, the properties of
D-branes in string theory.


GMS Solitons 


Derrick's theorem says that commutative scalar field
theories in two or more spatial dimensions do not admit
any static finite-energy classical soliton solution. This follows
from a simple scaling argument, which fails
when the theory becomes noncommutative, since
noncommutativity introduces a fixed length scale
$\sqrt{\theta}$. Noncommutative solitons in pure scalar theory
can be easily constructed in the limit $\theta = \infty$. For
example, consider a (2 + 1)-dimensional single scalar
theory with a potential V and noncommutativity
$\theta^{12} = \theta$. In the limit $\theta = \infty$, the potential term
dominates and the noncommutative solitons are
determined by the equation


$$\partial V / \partial \phi = 0$$ [33]

Equation [33] can be easily solved in terms of
projectors. Assuming V has no linear term, the
general soliton (up to unitary equivalence) is


$$\phi = \sum_i \lambda_i P_i$$ [34]


where $\lambda_i$ are the roots of $V'(\lambda) = 0$ and $\{P_i\}$ is a set of
mutually orthogonal projectors. For real scalar field theory,
the sum is restricted to real roots only. These
solutions are known as the Gopakumar-Minwalla-Strominger
(GMS) solitons. A simple example of a
projector is given by $P = |0\rangle\langle 0|$, which corresponds
to a Gaussian profile in the $x^1, x^2$ plane with width
$\sqrt{\theta}$. The soliton continues to exist until $\theta$ decreases
below a certain critical value $\theta_c$.
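
The projector solution can be illustrated with finite matrices. The sketch below assumes the usual operator picture, in which fields on the noncommutative plane become operators on a harmonic-oscillator Fock space (truncated here to N states) and the star product becomes the operator product. The potential, with $V'(\phi) = \phi(\phi - 1)(\phi - 2)$, is a hypothetical choice made only so that the roots are easy to read off.

```python
# GMS-type solution phi = sum_i lambda_i P_i checked against V'(phi) = 0
# on a truncated Fock space. V = phi^4/4 - phi^3 + phi^2 (no linear term).

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def shifted(m, c):
    """Return m - c * identity."""
    return [[m[i][j] - (c if i == j else 0.0) for j in range(len(m))]
            for i in range(len(m))]

def v_prime(phi):
    """V'(phi) = phi (phi - 1)(phi - 2), using matrix products."""
    return matmul(matmul(phi, shifted(phi, 1.0)), shifted(phi, 2.0))

def projector(n, k):
    """Rank-one projector |k><k| on an n-dimensional space."""
    return [[1.0 if i == j == k else 0.0 for j in range(n)] for i in range(n)]

def combine(terms, n):
    """Sum of lambda_i * P_i for a list of (lambda_i, P_i) pairs."""
    out = [[0.0] * n for _ in range(n)]
    for lam, p in terms:
        for i in range(n):
            for j in range(n):
                out[i][j] += lam * p[i][j]
    return out

def max_abs(m):
    return max(abs(x) for row in m for x in row)

N = 6
# Roots of V' are 0, 1, 2; use the nonzero roots on orthogonal projectors.
soliton = combine([(1.0, projector(N, 0)), (2.0, projector(N, 3))], N)
# A coefficient that is not a root of V' does not give a solution.
not_a_soliton = combine([(0.5, projector(N, 0))], N)

print(max_abs(v_prime(soliton)))        # residual of V'(phi) = 0
print(max_abs(v_prime(not_a_soliton)))  # nonzero residual
```

The check works because orthogonal projectors satisfy $P_i P_j = \delta_{ij} P_i$, so any polynomial in $\phi$ acts root by root on each projector.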

New solutions can be generated from known ones
using the so-called solution-generating technique. If
$\phi$ is a solution of [33], then


$$\phi' = T^\dagger \phi T$$ [35]


is also a solution provided that $T T^\dagger = 1$. In an
infinite-dimensional Hilbert space, T is not necessarily
unitary, that is, $T^\dagger T \neq 1$. In this case, T is said
to be a partial isometry. The new solution $\phi'$ is
different from $\phi$ since they are not related by a
global transformation of basis.
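
A finite-dimensional caricature of the solution-generating technique is sketched below. It assumes T is modelled by a rectangular $n \times (n+1)$ shift matrix, which is only a stand-in for the shift operator on an infinite-dimensional Hilbert space: it satisfies $T T^\dagger = 1$ but $T^\dagger T \neq 1$. The potential with $V'(\phi) = \phi(\phi - 1)(\phi - 2)$ is again a hypothetical choice.

```python
# Solution-generating technique with a rectangular shift matrix T:
# T T^dagger = 1 (n x n), while T^dagger T = 1 - |0><0| ((n+1) x (n+1)).

def matmul(a, b):
    """Product of two (possibly rectangular) matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def shift(n):
    """T = sum_i |i><i+1| as an n x (n+1) real matrix (dagger = transpose)."""
    return [[1.0 if j == i + 1 else 0.0 for j in range(n + 1)]
            for i in range(n)]

def shifted(m, c):
    """Return m - c * identity."""
    return [[m[i][j] - (c if i == j else 0.0) for j in range(len(m))]
            for i in range(len(m))]

def v_prime(phi):
    """V'(phi) = phi (phi - 1)(phi - 2), using matrix products."""
    return matmul(matmul(phi, shifted(phi, 1.0)), shifted(phi, 2.0))

def max_abs(m):
    return max(abs(x) for row in m for x in row)

N = 5
T = shift(N)
Td = transpose(T)

# Seed solution: phi = 1 * |0><0| on the N-dimensional space.
phi = [[1.0 if i == j == 0 else 0.0 for j in range(N)] for i in range(N)]
phi_new = matmul(matmul(Td, phi), T)   # phi' = T^dagger phi T

print(matmul(T, Td) == identity(N))        # T T^dagger = 1
print(matmul(Td, T) == identity(N + 1))    # T^dagger T differs from 1
print(max_abs(v_prime(phi_new)))           # phi' still solves V'(phi') = 0
```

Because $T T^\dagger = 1$, powers obey $(T^\dagger \phi T)^n = T^\dagger \phi^n T$, which is why a polynomial equation without a constant term is preserved.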


Tachyon Condensation and D-Branes 


A beautiful application of the noncommutative
soliton is in the construction of D-branes as solitons
of the tachyon field in noncommutative open-string
theory. For the bosonic string theory, one may
regard the open-string theory as living on a
space-filling D25-brane. Integrating
out the massive string modes leads to an
effective action for the tachyon and the massless
gauge field $A_\mu$. It should be remarked that, contrary
to the pure scalar case, noncommutative solitons can
be constructed exactly for finite $\theta$ in a system with
gauge and scalar fields. Although the detailed form
of the effective action is unknown, one has enough
confidence to say what the true vacuum configuration
is according to the Sen conjecture. One can then
apply the solution-generating technique to generate
new soliton solutions. In this manner, with a B-field
of rank 2k, one can construct solutions which are
localized in $\mathbb{R}^{2k}$ and represent a D$(25 - 2k)$-brane.
This is supported by the matching of the tension
and the spectrum of fluctuations around the
soliton configuration. Similar ideas can also be
applied to construct D-branes in type II string
theory. Again the starting point is an unstable
brane configuration with tachyon field(s). There
are two types of unstable D-branes: non-BPS Dp-branes
(p odd in IIA theory and p even in IIB
theory) and BPS brane-antibrane Dp-$\overline{\mathrm{D}p}$
systems. A similar analysis allows one to identify
the noncommutative soliton with the lower-dimensional
BPS D-branes which arise from
tachyon condensation.

One main motivation for studying tachyon 
condensation in open-string theory is the hope 
that open-string theory may provide a fundamental 
nonperturbative formulation of string theory. It 
may not be too surprising that D-branes can be 
obtained in terms of open-string fields. However, 
to describe closed strings and NS branes in terms 
of open-string degrees of freedom remains an 
obstacle. 


Noncommutative Instanton and Monopoles 


Instantons on noncommutative $\mathbb{R}^4$ can be readily
constructed using the Atiyah-Drinfeld-Hitchin-Manin
(ADHM) formalism by modifying the
ADHM constraints with a constant additive
term. The result is that the self-dual (resp. anti-self-dual)
instanton moduli space depends only on
the anti-self-dual (resp. self-dual) part of $\theta$. The construction
goes through even in the U(1) case.
Consider a self-dual $\theta$; the ADHM constraints for
the self-dual instanton are the same as in the
commutative case, and there is no nonsingular
solution. On the other hand, the ADHM constraints
for the anti-self-dual instanton get modified
and admit nontrivial solutions. This
noncommutative instanton solution is nonsingular
with size $\sqrt{\theta}$. The noncommutative instanton
represents a D$(p-4)$-brane within a Dp-brane.
The ADHM constraints are just the D-flatness
condition for the D-brane world-volume gauge
theory. The additive constant in the ADHM
constraints also has a simple interpretation as a
Fayet-Iliopoulos parameter which appears in the
presence of a B-field. Although the ADHM
method does not give a self-dual instanton, a
direct construction can be applied to obtain non-ADHM
self-dual instantons. Recall that the gauge
field strength can be written as $F_{ij} = i[D_i, D_j] + \theta^{-1}_{ij}$,
where $D_i$ is given by the function on the right-hand
side of [24]. Thus, a simple self-dual solution
can be constructed with


$$D_i = i \theta^{-1}_{ij} T^\dagger x^j T$$ [36]


where T is a partial isometry which satisfies
$T T^\dagger = 1$, but $T^\dagger T = 1 - P$ is not necessarily the
identity. It is clear that P is a projector. The field
strength


$$F_{ij} = \theta^{-1}_{ij} P$$ [37]


is self-dual and has instanton number n, where n is
the rank of the projector.

On noncommutative $\mathbb{R}^3$ (say $\theta^{12} = \theta$), BPS monopoles
satisfy the Bogomolny equation:


$$\nabla_i \Phi = \pm B_i, \qquad i = 1, 2, 3$$ [38]


and can be obtained by solving the Nahm
equation


$$\partial_z T_i = \epsilon_{ijk} T_j T_k + \delta_{i3} \theta$$ [39]


Here $T_i$ are $k \times k$ Hermitian matrices depending on an
auxiliary variable z, and k gives the charge of the
monopole. Noncommutativity modifies the Nahm
equation by a constant term, which can be
absorbed by a constant shift of the generators.
Therefore, unlike the case of the instanton, the monopole
moduli space is not modified by the noncommutative
deformation. The Nahm construction has a
clear physical meaning in string theory. The monopole
(electric charge) can be interpreted as a D-string
(fundamental string) ending on a D3-brane. One can
also suspend k D-strings between a collection of N
parallel D3-branes; this would correspond to a
charge k monopole in a Higgsed U(N) gauge theory.
The matrices $X^i$ correspond to the matrix transverse
coordinates of the D-strings which lie within the D3-branes.


Further Topics 


Finally, in the following some further topics of 
interest are discussed briefly. 


1. The noncommutative geometry discussed here is
of canonical type. Other deformations exist, for
example, kappa-deformation and the fuzzy sphere,
which are of the Lie-algebra type, and quantum
group deformation, which is a quadratic-type
deformation: $x^i x^j = q^{-1} \hat R^{ij}_{kl} x^k x^l$, whose consistency
is guaranteed by the Yang-Baxter equation.
It is interesting to see whether these noncommutative
geometries arise from string theory.
Another natural generalization is to consider
noncommutative geometry of superspace. A
simple example is to consider the fermionic
coordinates to be deformed with the nonvanishing
relation


$$\{\theta^\alpha, \theta^\beta\} = C^{\alpha\beta}$$ [40]


where $C^{\alpha\beta}$ are constants. It has been shown that
[40] arises in certain Calabi-Yau compactifications
of type IIB string theory in the presence of an RR
background. The deformation [40] reduces the
number of supersymmetries by half. Therefore,
it is called $\mathcal{N} = 1/2$ supersymmetry. The
noncommutativity [40] can be implemented on
the superspace $(y^i, \theta^\alpha, \bar\theta^{\dot\alpha})$ as a star product for the
$\theta^\alpha$'s. Unlike the bosonic deformation, which
involves an infinite number of higher derivatives,
the star product for [40] stops at order $C^2$ due to
the Grassmannian nature of the fermionic coordinates.
Field theory with $\mathcal{N} = 1/2$ supersymmetry
is local and differs from the ordinary $\mathcal{N} = 1$ theory
by only a small number of supersymmetry-breaking
terms. The $\mathcal{N} = 1/2$ Wess-Zumino model is
renormalizable if extra $F$ and $F^2$ terms are added
to the original Lagrangian, where F is the auxiliary
field. The $\mathcal{N} = 1/2$ gauge theory is also
renormalizable.


2. Integrability of a theory provides valuable information
beyond the perturbative level. An integrable
field theory is characterized by an infinite
number of conserved charges in involution. It is
natural to ask whether integrability is preserved
by noncommutative deformation. Noncommutative
integrable field theories have been constructed.
In the commutative case, Ward has
conjectured that all (1 + 1)- and (2 + 1)-dimensional
integrable systems can be obtained from
the four-dimensional self-dual Yang-Mills equation
by reduction. The noncommutative
version of the Ward conjecture has held in the
cases examined so far. It will be interesting to see
whether it is true in general.


3. Locality and Lorentz symmetry form the cornerstones
of quantum field theory and the standard
model of particle physics. Noncommutative field
theory provides a theoretical framework where
one can discuss effects of nonlocality and Lorentz
symmetry violation. Possible phenomenological
signals have been investigated (mostly at the tree
level) and a bound has been placed on the extent
of noncommutativity. A proper understanding
and better control of the IR/UV mixing remains
the crux of the problem. Noncommutative
geometry may also be relevant for cosmology
and inflation.


4. Like the standard AdS/CFT correspondence, the
noncommutative gauge theory should also have
a gravity-dual description. The supergravity
background can be determined by considering
the decoupling limit of D-branes with an NS
B-field background. However, since the noncommutative
gauge theory does not permit any
conventional local gauge-invariant observable,
the usual AdS/CFT correspondence that relates
field theory correlators with bulk interaction
does not seem to apply. It has been argued that
generic properties such as the relation between
length and momentum for open Wilson lines
can be seen from the gravity side. A more precise
understanding of the duality map is called for.


See also: Brane Construction of Gauge Theories; 
Deformation Quantization; Gauge Theories from Strings; 
Noncommutative Tori, Yang-Mills, and String Theory; 
Positive Maps on C*-Algebras; Solitons and Other 
Extended Field Configurations; String Field Theory; 
Superstring Theories.


Introduction


Noncommutative tori are historically among the 
oldest and by now the most developed examples 
of noncommutative spaces. Noncommutative 
Yang-Mills theory can be obtained from string 
theory. This connection led to a cross-fertilization 
of research in physics and mathematics on Yang- 
Mills theory on noncommutative tori. One 
important result stemming from that work is the 
link between T-duality in string theory and 
Morita equivalence of associative algebras. In 
this article, we give an overview of the basic 
results in the differential geometry of noncommu- 
tative tori. Yang-Mills theory on noncommuta- 
tive tori, the duality induced by Morita 
equivalence and its link with T-duality are 
discussed. The noncommutative Nahm transform 
for instantons is introduced. 


Noncommutative Tori 
The Algebra of Functions 


The basic notions of noncommutative differential
geometry were introduced and illustrated on the
example of a two-dimensional noncommutative
torus by Connes (1980). To define an algebra of
functions on a d-dimensional noncommutative
torus, consider a set of linear generators $U_n$ labeled
by $n \in \mathbb{Z}^d$, a d-dimensional vector with integer
entries. The multiplication is defined by the
formula


$$U_n U_m = e^{\pi i n_j \theta^{jk} m_k} U_{n+m}$$ [1]


where $\theta^{jk}$ is an antisymmetric $d \times d$ matrix, and
summation over repeated indices is assumed. We
further extend the multiplication from finite linear
combinations to formal infinite series $\sum_n C(n) U_n$,
where the coefficients C(n) tend to zero faster than
any power of $\|n\|$. The resulting algebra constitutes
the algebra of smooth functions on a noncommutative
torus and will be denoted by $T^d_\theta$. Sometimes for
brevity we will omit the dimension label d in the
notation of the algebra. We introduce an involution
$*$ in $T^d_\theta$ by the rule $U_n^* = U_{-n}$. The elements $U_n$ are
assumed to be unitary with respect to this involution,
that is, $U_n^* U_n = U_{-n} U_n = 1 = U_0$. One can
further introduce a norm and take an appropriate
completion of the involutive algebra $T^d_\theta$ to obtain
the C*-algebra of functions on a noncommutative
torus. For our purposes, the norm structure will not
be important. A canonically normalized trace on $T^d_\theta$
is introduced by specifying


$$\mathrm{Tr}\, U_n = \delta_{n,0}$$ [2]
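
The multiplication rule, involution, and trace can be modelled directly on finite linear combinations of generators. The sketch below assumes the product reads $U_n U_m = e^{\pi i n_j \theta^{jk} m_k} U_{n+m}$; storing an element as a dictionary from integer vectors n to coefficients C(n) is a representation chosen only for this illustration.

```python
import cmath

def phase(n, m, theta):
    """exp(pi i n_j theta^{jk} m_k) for integer vectors n, m."""
    s = sum(n[j] * theta[j][k] * m[k]
            for j in range(len(n)) for k in range(len(m)))
    return cmath.exp(1j * cmath.pi * s)

def multiply(a, b, theta):
    """Product of two finite combinations sum_n C(n) U_n."""
    out = {}
    for n, ca in a.items():
        for m, cb in b.items():
            nm = tuple(x + y for x, y in zip(n, m))
            out[nm] = out.get(nm, 0) + ca * cb * phase(n, m, theta)
    return out

def star(a):
    """Involution: U_n^* = U_{-n}, extended antilinearly."""
    return {tuple(-x for x in n): complex(c).conjugate() for n, c in a.items()}

def trace(a):
    """Canonical trace: picks out the coefficient of U_0."""
    for n, c in a.items():
        if all(x == 0 for x in n):
            return c
    return 0

def generator(n):
    return {tuple(n): 1.0}

t = 2 ** 0.5 - 1                      # an irrational value of theta^{12}
theta = [[0.0, t], [-t, 0.0]]
n, m = (1, 2), (3, -1)

nm = multiply(generator(n), generator(m), theta)
mn = multiply(generator(m), generator(n), theta)
ratio = nm[(4, 1)] / mn[(4, 1)]       # U_n U_m = ratio * U_m U_n
print(ratio, phase(n, m, theta) ** 2)
print(multiply(star(generator(n)), generator(n), theta))  # U_n^* U_n = U_0
```

The printed ratio shows the commutation relation $U_n U_m = e^{2\pi i n_j \theta^{jk} m_k} U_m U_n$ that follows from the product rule and the antisymmetry of $\theta$.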


Projective Modules 


According to the general approach to noncommutative
geometry, finitely generated projective modules
over the algebra of functions are natural analogs of
vector bundles. Throughout this article, when speaking
of a projective module, we will assume a finitely
generated left projective module.

A free module $(T^d_\theta)^N$ is equipped with a $T^d_\theta$-valued
Hermitian inner product $\langle \cdot, \cdot \rangle_{T_\theta}$ defined by the
formula


$$\langle (a_1, \ldots, a_N), (b_1, \ldots, b_N) \rangle_{T_\theta} = \sum_{i=1}^{N} a_i^* b_i$$ [3]


A projective module E is by definition a direct
summand of a free module. Thus, it inherits the
inner product $\langle \cdot, \cdot \rangle_{T_\theta}$. Consider the endomorphisms
of the module E, that is, linear mappings $E \to E$
commuting with the action of $T^d_\theta$. These endomorphisms
form an associative unital algebra
denoted $\mathrm{End}_{T_\theta} E$. A decomposition $(T^d_\theta)^N = E \oplus E'$
determines an endomorphism $P: (T^d_\theta)^N \to (T^d_\theta)^N$ that
projects $(T^d_\theta)^N$ onto E. The algebra $\mathrm{End}_{T_\theta} E$ can then
be identified with a subalgebra of $\mathrm{Mat}_N(T^d_\theta)$, the
endomorphisms of the free module $(T^d_\theta)^N$. The latter
has a canonical trace that is the composition of the
matrix trace with the trace specified in [2]. By
restriction, it gives rise to a canonical trace tr on
$\mathrm{End}_{T_\theta} E$. The same embedding also provides a
canonical involution on $\mathrm{End}_{T_\theta} E$ by a composition of
the matrix transposition and the involution $*$ on $T^d_\theta$.

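A large class of examples of projective modules
over noncommutative tori is furnished by the
so-called Heisenberg modules. They are constructed
as follows. Let G be the direct sum of $\mathbb{R}^p$ and a
finitely generated abelian group, and let $G^*$ be its
dual group. In the most general situation,
$G = \mathbb{R}^p \times \mathbb{Z}^q \times F$, where F is a finite group. Then
$G^* \cong \mathbb{R}^p \times T^q \times F^*$.

Consider the linear space S(G) of functions on G
decreasing at infinity faster than any power. We
define operators $U_{(\gamma, \tilde\gamma)}: S(G) \to S(G)$ labeled by a
pair $(\gamma, \tilde\gamma) \in G \times G^*$ acting as follows:


$$(U_{(\gamma, \tilde\gamma)} f)(x) = \tilde\gamma(x) f(x + \gamma)$$ [4]


One can check that the operators $U_{(\gamma, \tilde\gamma)}$ satisfy the
commutation relations


$$U_{(\gamma, \tilde\gamma)} U_{(\mu, \tilde\mu)} = \tilde\mu(\gamma) \tilde\gamma^{-1}(\mu) \, U_{(\mu, \tilde\mu)} U_{(\gamma, \tilde\gamma)}$$ [5]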
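
The commutation relations can be checked numerically in the simplest case. The sketch below assumes $G = \mathbb{R}$, so that $G^* = \mathbb{R}$ with characters $\tilde\gamma(x) = e^{2\pi i \tilde\gamma x}$, and assumes the operators act as $(U_{(\gamma, \tilde\gamma)} f)(x) = \tilde\gamma(x) f(x + \gamma)$. The Gaussian test function and the numerical values are arbitrary choices for illustration.

```python
import cmath
import math

def character(gt):
    """Character of G = R labeled by the real number gt."""
    return lambda x: cmath.exp(2j * math.pi * gt * x)

def U(g, gt):
    """Operator f -> (x -> character(gt)(x) * f(x + g))."""
    chi = character(gt)
    def apply(f):
        return lambda x: chi(x) * f(x + g)
    return apply

def gaussian(x):
    """A rapidly decreasing test function in S(R)."""
    return math.exp(-math.pi * x * x)

def commutation_defect(g, gt, m, mt, x):
    """|U_(g,gt) U_(m,mt) f - mt(g) gt(m)^(-1) U_(m,mt) U_(g,gt) f| at x."""
    lhs = U(g, gt)(U(m, mt)(gaussian))(x)
    factor = character(mt)(g) / character(gt)(m)
    rhs = factor * U(m, mt)(U(g, gt)(gaussian))(x)
    return abs(lhs - rhs)

for x in (-0.7, 0.0, 0.4, 1.3):
    print(commutation_defect(0.5, 0.3, -0.2, 1.1, x))
```

The defect vanishes up to rounding because translating by $\gamma$ before or after multiplying by the character $\tilde\mu$ differs exactly by the constant $\tilde\mu(\gamma)$.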


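If $(\gamma, \tilde\gamma)$ run over a d-dimensional discrete subgroup
$\Gamma \subset G \times G^*$, $\Gamma \cong \mathbb{Z}^d$, then formula [4] defines a
module over a d-dimensional noncommutative
torus $T^d_\theta$ with

$$\exp(2\pi i \theta_{ij}) = \tilde\gamma_j(\gamma_i) \, \tilde\gamma_i^{-1}(\gamma_j)$$ [6]


for a given basis $(\gamma_i, \tilde\gamma_i)$ of the lattice $\Gamma$. This module is
projective if $\Gamma$ is such that $G \times G^* / \Gamma$ is compact.
If that is the case, then the projective $T^d_\theta$-module at
hand is called a Heisenberg module and denoted
by $E_\Gamma$.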

Heisenberg modules play a special role. If the
matrix $\theta_{ij}$ is irrational in the sense that at least one
of its entries is irrational, then any projective
module over $T^d_\theta$ can be represented as a direct sum
of Heisenberg modules. In that sense, Heisenberg
modules can be used as building blocks to construct
an arbitrary module.


Connections 


Next we would like to define connections on a
projective module over $T^d_\theta$. To this end, let us first
define a Lie algebra of shifts $L_\theta$ acting on $T^d_\theta$ by
specifying a basis consisting of derivations
$\delta_j: T^d_\theta \to T^d_\theta$, $j = 1, \ldots, d$, satisfying


$$\delta_j(U_n) = 2\pi i n_j U_n$$ [7]


These derivations span a d-dimensional abelian Lie
algebra that we denote by $L_\theta$.

A connection on a module E over $T^d_\theta$ is a set of
operators $\nabla_X: E \to E$, $X \in L_\theta$, depending linearly on
X and satisfying


$$[\nabla_X, U_n] = \delta_X(U_n)$$ [8]


where $U_n$ are operators $E \to E$ representing the
corresponding generators of $T^d_\theta$. In the standard
basis [7], this relation reads as


$$[\nabla_j, U_n] = 2\pi i n_j U_n$$ [9]


The curvature of the connection $\nabla_X$, defined as the
commutator $F_{XY} = [\nabla_X, \nabla_Y]$, is an exterior 2-form
on the adjoint vector space $L_\theta^*$ with values in
$\mathrm{End}_{T_\theta} E$.


K-Theory: Chern Character 


The K-groups of a noncommutative torus coincide
with those for commutative tori:


$$K_0(T^d_\theta) \cong \mathbb{Z}^{2^{d-1}} \cong K_1(T^d_\theta)$$


The Chern character of a projective module E
over a noncommutative torus $T^d_\theta$ can be defined as


$$\mathrm{ch}(E) = \mathrm{tr} \exp\left( \frac{F}{2\pi i} \right) \in \Lambda^{\mathrm{even}}(L_\theta^*)$$ [10]


where F is the curvature form of a connection on
E, $\Lambda^{\mathrm{even}}(L_\theta^*)$ is the even part of the exterior algebra
of $L_\theta^*$, and tr is the canonical trace on $\mathrm{End}_{T_\theta} E$. This
mapping gives rise to a noncommutative Chern
character


$$\mathrm{ch}: K_0(T^d_\theta) \to \Lambda^{\mathrm{even}}(L_\theta^*)$$ [11]


The component $\mathrm{ch}_0(E) = \mathrm{tr}\, 1 = \dim(E)$ is called the
dimension of the module E.

A distinctive feature of the noncommutative
Chern character [11] is that its image does not
consist of integral elements, that is, there is no
lattice in $L_\theta^*$ that generates the image of the Chern
character. However, there is a different integrality
statement that replaces the commutative one. Consider
a basis in $L_\theta^*$ in which the derivations
corresponding to basis elements satisfy [7]. Denote
the exterior forms corresponding to the basis
elements by $\alpha^1, \ldots, \alpha^d$. Then an arbitrary element
of $\Lambda(L_\theta^*)$ can be represented as a polynomial in the
anticommuting variables $\alpha^i$. Next let us consider the
subset $\Lambda^{\mathrm{even}}(\mathbb{Z}^d) \subset \Lambda^{\mathrm{even}}(L_\theta^*)$ that consists of polynomials
in $\alpha^i$ having integer coefficients. It was
proved by Elliott that the Chern character is
injective and its range on $K_0(T^d_\theta)$ is given by the
image of $\Lambda^{\mathrm{even}}(\mathbb{Z}^d)$ under the action of the operator


$$\exp\left( -\frac{1}{2} \frac{\partial}{\partial \alpha^j} \theta^{jk} \frac{\partial}{\partial \alpha^k} \right)$$


This fact implies that the K-group $K_0(T^d_\theta)$ can be
identified with the additive group $\Lambda^{\mathrm{even}}(\mathbb{Z}^d)$.

The K-theory class $\mu(E) \in \Lambda^{\mathrm{even}}(\mathbb{Z}^d)$ of a module E
can be computed from its Chern character by the
formula


$$\mu(E) = \exp\left( \frac{1}{2} \frac{\partial}{\partial \alpha^j} \theta^{jk} \frac{\partial}{\partial \alpha^k} \right) \mathrm{ch}(E)$$ [12]


Note that the anticommuting variables $\alpha^i$ and the
derivatives $\partial/\partial\alpha^j$ satisfy the anticommutation relation
$\{\alpha^i, \partial/\partial\alpha^j\} = \delta^i_j$.

The coefficients of $\mu(E)$ standing in front of
monomials in $\alpha^i$ are integers to which we will
refer as the topological numbers of the module E.
These numbers can also be interpreted as numbers
of D-branes of a definite kind, although in noncommutative
geometry it is difficult to talk about
branes as geometrical objects wrapped on torus
cycles.
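
As a worked illustration for $d = 2$, the following sketch assumes that [12] reads $\mu(E) = \exp\left( \frac{1}{2} \partial_j \theta^{jk} \partial_k \right) \mathrm{ch}(E)$ with $\partial_j = \partial/\partial\alpha^j$ acting as left derivatives, and writes $\theta^{12} = \theta$. The integers p and q are hypothetical labels for the two topological numbers; with the opposite sign or derivative convention the dimension would come out as $p - q\theta$.

```latex
\begin{aligned}
\mu(E) &= p + q\,\alpha^1\alpha^2, \qquad p, q \in \mathbb{Z},\\
\tfrac{1}{2}\,\partial_j \theta^{jk} \partial_k
  &= \tfrac{1}{2}\left(\theta^{12}\,\partial_1\partial_2 + \theta^{21}\,\partial_2\partial_1\right)
   = \theta\,\partial_1\partial_2,\\
\partial_1\partial_2\left(\alpha^1\alpha^2\right)
  &= \partial_1\left(-\alpha^1\right) = -1,\\
\mathrm{ch}(E) &= \exp\left(-\theta\,\partial_1\partial_2\right)\mu(E)
   = (p + q\theta) + q\,\alpha^1\alpha^2,\\
\dim(E) &= \mathrm{ch}_0(E) = p + q\theta .
\end{aligned}
```

For irrational $\theta$ the dimension is therefore not an integer, while the topological numbers p and q remain integers, which is the sense in which the integrality statement is shifted from the Chern character to $\mu(E)$.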

One can show that for noncommutative tori $T^d_\theta$
with irrational matrix $\theta_{ij}$ the set of elements of
$K_0(T^d_\theta)$ that represent a projective module (i.e., the
positive cone) consists exactly of the elements of
positive dimension. Moreover, if $\theta_{ij}$ is irrational, any
two projective modules which represent the same
element of $K_0(T^d_\theta)$ are isomorphic; that is, the
projective modules are essentially specified in this
case by their topological numbers.


The complex differential geometry of noncommutative
tori and its relation with mirror symmetry is
discussed in Polishchuk and Schwarz (2003).


Yang-Mills Theory on Noncommutative 
Tori 


Let E be a projective module over $T^d_\theta$. We call a
Yang-Mills field on E a connection $\nabla_X$ compatible
with the Hermitian structure, that is, a connection
satisfying


$$\langle \nabla_X \xi, \eta \rangle_{T_\theta} + \langle \xi, \nabla_X \eta \rangle_{T_\theta} = \delta_X(\langle \xi, \eta \rangle_{T_\theta})$$ [13]


for any two elements $\xi, \eta \in E$. Given a positive-definite
metric on the Lie algebra $L_\theta$, we can define
a Yang-Mills functional


$$S_{YM}(\nabla_i) = \frac{V}{4 g_{YM}^2} g^{ik} g^{jl} \, \mathrm{tr}(F_{ij} F_{kl})$$ [14]

Here $g^{ij}$ stands for the metric tensor in the canonical
basis [7], $V = \sqrt{|\det g|}$, $g_{YM}$ is the Yang-Mills
coupling constant, tr stands for the canonical trace
on $\mathrm{End}_{T_\theta} E$ discussed above, and summation over
repeated indices is assumed. Compatibility with the
Hermitian structure [13] can be shown to imply
the positive definiteness of the functional $S_{YM}$. The
extrema of this functional are given by the solutions
to the Yang-Mills equations


$$g^{ki} [\nabla_k, F_{ij}] = 0$$ [15]


A gauge transformation in the noncommutative
Yang-Mills theory is specified by a unitary endomorphism
$Z \in \mathrm{End}_{T_\theta} E$, that is, an endomorphism
satisfying $Z Z^* = Z^* Z = 1$. The corresponding gauge
transformation acts on a Yang-Mills field as


$$\nabla_j \mapsto Z \nabla_j Z^*$$ [16]


The Yang-Mills functional [14] and the Yang-Mills
equations [15] are invariant under these
transformations.

It is easy to see that Yang-Mills fields whose
curvature is a scalar operator, that is, $[\nabla_i, \nabla_j] =
\sigma_{ij} \cdot 1$ with $\sigma_{ij}$ a real-number-valued tensor, solve the
Yang-Mills equations [15]. A characterization of
modules admitting a constant curvature connection
and a description of the moduli spaces of constant
curvature connections (i.e., the space of such
connections modulo gauge transformations)
are reviewed in Konechny and Schwarz (2002).
Another interesting class of solutions to the Yang-Mills
equations is instantons (see below).

As in the ordinary field theory, one can construct
various extensions of the noncommutative Yang-Mills
theory [14] by adding other fields. To obtain a
supersymmetric extension of [14], one needs to
add a number of endomorphisms $X_I \in \mathrm{End}_{T_\theta} E$
that play the role of bosonic scalar fields in the
adjoint representation of the gauge group and a
number of odd Grassmann parity endomorphisms
$\psi^\alpha \in \Pi \mathrm{End}_{T_\theta} E$ endowed with an SO(d)-spinor
index $\alpha$. The latter ones are analogs of the usual
fermionic fields.

In string theory, one considers a maximally
supersymmetric extension of the Yang-Mills theory
[14]. In this case, the supersymmetric action depends
on $10 - d$ bosonic scalars $X_I$, $I = d, \ldots, 9$, and the
fermionic fields can be collected into an SO(9,1)
Majorana-Weyl spinor multiplet $\psi^\alpha$, $\alpha = 1, \ldots, 16$.
The maximally supersymmetric Yang-Mills action
takes the form


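$$S_{SYM} = \frac{V}{4 g^2} \mathrm{tr} \Big( F_{\mu\nu} F^{\mu\nu} + [\nabla_\mu, X_I][\nabla^\mu, X^I] + [X_I, X_J][X^I, X^J] - 2 \psi^\alpha \sigma^\mu_{\alpha\beta} [\nabla_\mu, \psi^\beta] - 2 \psi^\alpha \sigma^I_{\alpha\beta} [X_I, \psi^\beta] \Big)$$ [17]


Here the indices of the curvature $F_{\mu\nu}$, $\mu, \nu = 0, \ldots, d - 1$,
are assumed to be contracted with a Minkowski
signature metric, and $\sigma^A_{\alpha\beta}$ are blocks of the ten-dimensional
$32 \times 32$ gamma-matrices


$$\Gamma_A = \begin{pmatrix} 0 & \sigma_A^{\alpha\beta} \\ (\sigma_A)_{\alpha\beta} & 0 \end{pmatrix}, \qquad A = 0, \ldots, 9$$


This action is invariant under two kinds of supersymmetry
transformations denoted by $\delta_\epsilon$, $\tilde\delta_\epsilon$ and
defined as


$$\begin{aligned}
\delta_\epsilon \psi &= \tfrac{1}{2} \left( \sigma^{jk} F_{jk} \epsilon + \sigma^{jI} [\nabla_j, X_I] \epsilon + \sigma^{IJ} [X_I, X_J] \epsilon \right) \\
\delta_\epsilon \nabla_j &= \epsilon \sigma_j \psi, \qquad \delta_\epsilon X_J = \epsilon \sigma_J \psi \\
\tilde\delta_\epsilon \psi &= \epsilon, \qquad \tilde\delta_\epsilon \nabla_j = 0, \qquad \tilde\delta_\epsilon X_J = 0
\end{aligned}$$ [18]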

where $\epsilon$ is a constant 16-component Majorana-Weyl
spinor. Of particular interest for string theory 
applications are solutions to the equations of motion 
corresponding to [17] that are invariant under some 
of the above supersymmetry transformations. 
Further discussion can be found in Konechny and 
Schwarz (2002). 


Morita Equivalence 


The role of Morita equivalence as a duality
transformation in noncommutative Yang-Mills
theory was elucidated by Schwarz (1998). We will
adopt a definition of Morita equivalence for
noncommutative tori which can be shown to be
essentially equivalent to the standard definition of
strong Morita equivalence. We will say that two
noncommutative tori $T^d_\theta$ and $T^d_{\hat\theta}$ are Morita equivalent
if there exists a $(T_\theta, T_{\hat\theta})$-bimodule Q and a
$(T_{\hat\theta}, T_\theta)$-bimodule P such that


$$Q \otimes_{T_{\hat\theta}} P \cong T_\theta, \qquad P \otimes_{T_\theta} Q \cong T_{\hat\theta}$$ [19]


where $T_\theta$ on the right-hand side is considered as a
$(T_\theta, T_\theta)$-bimodule and analogously for $T_{\hat\theta}$. (It is
assumed that the isomorphisms are canonical.)
Given a $T_\theta$-module E, one obtains a $T_{\hat\theta}$-module
$\hat E$ as


$$\hat E = P \otimes_{T_\theta} E$$ [20]


One can show that this mapping is functorial.
Moreover, the bimodule Q provides us with an
inverse mapping $Q \otimes_{T_{\hat\theta}} \hat E \cong E$.

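We further introduce a notion of gauge Morita
equivalence (originally called "complete Morita
equivalence") that allows one to transport
connections along with the mapping of modules
[20]. Let L be a d-dimensional commutative Lie
algebra. We say that the $(T_{\hat\theta}, T_\theta)$ Morita equivalence
bimodule P establishes a gauge Morita
equivalence if it is endowed with operators
$\nabla^P_X$, $X \in L$, that determine a constant curvature
connection simultaneously with respect to $T_\theta$ and
$T_{\hat\theta}$, that is, satisfy


$$\begin{aligned}
\nabla^P_X(e a) &= (\nabla^P_X e) a + e (\delta_X a) \\
\nabla^P_X(\hat a e) &= \hat a (\nabla^P_X e) + (\hat\delta_X \hat a) e \\
[\nabla^P_X, \nabla^P_Y] &= 2\pi i \sigma_{XY} \cdot 1
\end{aligned}$$ [21]


Here $\delta_X$ and $\hat\delta_X$ are standard derivations on $T_\theta$ and
$T_{\hat\theta}$, respectively. In other words, we have two Lie
algebra homomorphisms

$$\delta: L \to L_\theta, \qquad \hat\delta: L \to L_{\hat\theta}$$ [22]

If a pair $(P, \nabla^P_X)$ specifies a gauge $(T_\theta, T_{\hat\theta})$-equivalence
bimodule, then there exists a correspondence
between connections on E and connections on
$\hat E$. The connection $\hat\nabla_X$ on $\hat E$ corresponding to a
given connection $\nabla_X$ on E is defined as


$$\nabla_X \mapsto \hat\nabla_X = 1 \otimes \nabla_X + \nabla^P_X \otimes 1$$ [23]


More precisely, an operator $1 \otimes \nabla_X + \nabla^P_X \otimes 1$ on
$P \otimes_{\mathbb{C}} E$ descends to a connection $\hat\nabla_X$ on $\hat E = P \otimes_{T_\theta} E$.
It is straightforward to check that under this
mapping gauge equivalent connections go to gauge
equivalent ones,


$$\widehat{Z \nabla_X Z^*} = \hat Z \hat\nabla_X \hat Z^*$$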


where $\hat Z = 1 \otimes Z$ is the endomorphism of $\hat E = P \otimes_{T_\theta} E$
corresponding to $Z \in \mathrm{End}_{T_\theta} E$.

The curvatures of $\hat\nabla_X$ and $\nabla_X$ are connected by
the formula


$$F^{\hat\nabla}_{XY} = \hat F^{\nabla}_{XY} + 2\pi i \sigma_{XY} \hat 1$$ [24]


which in particular shows that constant curvature
connections go to constant curvature ones.

Since noncommutative tori are labeled by an
antisymmetric $d \times d$ matrix $\theta$, gauge Morita equivalence
establishes an equivalence relation on the set
of such matrices. To describe this equivalence
relation, consider the action $\theta \mapsto h\theta = \hat\theta$ of
SO(d,d|Z) on the space of antisymmetric $d \times d$
matrices by the formula


$$\hat\theta = (M\theta + N)(R\theta + S)^{-1}$$ [25]


where the $d \times d$ matrices M, N, R, S are such that
the matrix

$$h = \begin{pmatrix} M & N \\ R & S \end{pmatrix}$$ [26]


belongs to the group SO(d,d|Z). The above action is
defined whenever the matrix $A = R\theta + S$ is invertible.
One can prove that two noncommutative tori
$T^d_\theta$ and $T^d_{\hat\theta}$ are gauge Morita equivalent if and only if
the matrices $\theta$ and $\hat\theta$ belong to the same orbit of the
SO(d,d|Z) action [25].
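
The fractional linear action can be explored numerically for $d = 2$. The sketch below assumes the action reads $\hat\theta = (M\theta + N)(R\theta + S)^{-1}$ and takes SO(d,d|Z) to consist of integer matrices h of determinant 1 with $h^T J h = J$, where J has identity blocks off the diagonal; this choice of J, the helper names, and the two sample group elements are assumptions made for the illustration.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))]
            for i in range(len(a))]

def inverse2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def antisym(t):
    return [[0, t], [-t, 0]]

I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]

def block(M, N, R, S):
    """Assemble the 4x4 matrix h from its 2x2 blocks."""
    return [M[i] + N[i] for i in range(2)] + [R[i] + S[i] for i in range(2)]

def split(h):
    """Recover the blocks M, N, R, S of a 4x4 matrix h."""
    M = [row[:2] for row in h[:2]]
    N = [row[2:] for row in h[:2]]
    R = [row[:2] for row in h[2:]]
    S = [row[2:] for row in h[2:]]
    return M, N, R, S

def act(h, theta):
    """theta -> (M theta + N)(R theta + S)^(-1)."""
    M, N, R, S = split(h)
    return matmul(add(matmul(M, theta), N),
                  inverse2(add(matmul(R, theta), S)))

def preserves_form(h):
    """Check h^T J h = J for the split-signature form J."""
    J = block(Z2, I2, I2, Z2)
    return matmul(matmul(transpose(h), J), h) == J

shift_h = block(I2, antisym(1), Z2, I2)   # theta -> theta + N, N integer
invert_h = block(Z2, I2, I2, Z2)          # theta -> theta^(-1)

theta = antisym(0.3)
print(preserves_form(shift_h), preserves_form(invert_h))
print(act(shift_h, theta))
print(act(invert_h, theta))
print(act(matmul(shift_h, invert_h), theta))
```

Both sample elements send antisymmetric matrices to antisymmetric matrices, and acting with the product `matmul(shift_h, invert_h)` agrees with acting first with `invert_h` and then with `shift_h`, as expected for a group action.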

The duality group SO(d,d|Z) also acts on the
topological numbers of modules $\mu \in \Lambda^{\mathrm{even}}(\mathbb{Z}^d)$. This
action can be shown to be given by a spinor
representation constructed as follows. First note
that the operators $a^i = \alpha^i$, $b_i = \partial/\partial\alpha^i$ act on $\Lambda(\mathbb{R}^d)$
and give a representation of the Clifford algebra
specified by the metric with signature (d,d). The
group O(d,d|C) can thus be regarded as a group of
automorphisms acting on the Clifford algebra
generated by $a^i$, $b_i$. Denote the latter action by $W_h$
for $h \in$ O(d,d|C). One defines a projective action $V_h$
of O(d,d|C) on $\Lambda(\mathbb{R}^d)$ according to

$$V_h a^i V_h^{-1} = W_{h^{-1}}(a^i), \qquad V_h b_i V_h^{-1} = W_{h^{-1}}(b_i)$$

This projective action can be restricted to yield a
double-valued spinor representation of SO(d,d|C)
on $\Lambda(\mathbb{R}^d)$ by choosing a suitable bilinear form on
$\Lambda(\mathbb{R}^d)$. The restriction of this representation to the
subgroup SO(d,d|Z) acting on $\Lambda^{\mathrm{even}}(\mathbb{Z}^d)$ gives the
action of Morita equivalence on the topological
numbers of modules.

The mapping [23] preserves the Yang-Mills equations of motion [15]. Moreover, one can define a modification of the Yang-Mills action functional [14] in such a way that the values of the functionals on $\nabla_X$ and $\hat\nabla_X$ coincide up to an appropriate rescaling of coupling constants. The modified action functional has the form

\[ S_{YM} = \frac{V}{4g^2_{YM}}\,\mathrm{tr}\,(F_{jk}+\Phi_{jk}\cdot 1)(F^{jk}+\Phi^{jk}\cdot 1) \qquad [27] \]

where $\Phi_{jk}$ is a scalar-valued tensor that can be thought of as some background field. Adding this term will allow us to compensate for the curvature shift by adopting the transformation rule

\[ \Phi_{XY} \mapsto \Phi_{XY} - \sigma_{XY} \]

Note that the new action functional [27] has the same equations of motion [15] as the original one.

To show that the functional [27] is invariant under gauge Morita equivalence, one has to take into account two more effects. Firstly, the values of trace change by a factor $c = \dim(E)(\dim(\hat E))^{-1}$ as $\hat{\mathrm{tr}}\,\hat X = c\,\mathrm{tr}\,X$. Secondly, the identification of $L_\theta$ and $L_{\hat\theta}$ is established by means of some linear transformation $A^k_j$, the determinant of which will rescale the volume $V$. Both effects can be absorbed into an appropriate rescaling of the coupling constant.

One can show that the curvature tensor, the metric tensor, the background field $\Phi_{ij}$, and the volume element $V$ transform according to

\[ F^{\hat\nabla}_{ij} = A^k_i F^{\nabla}_{kl} A^l_j + \sigma_{ij} \]
\[ \hat g_{ij} = A^k_i g_{kl} A^l_j \qquad [28] \]
\[ \hat\Phi_{ij} = A^k_i \Phi_{kl} A^l_j - \sigma_{ij} \]
\[ \hat V = V\,|\det A| \]

where $A = R\theta+S$ and $\sigma = -RA^t$. The action functional [27] is invariant under the gauge Morita equivalence if the coupling constant transforms according to

\[ \hat g^2_{YM} = g^2_{YM}\,|\det A|^{1/2} \qquad [29] \]


Supersymmetric extensions of Yang-Mills theory 
on noncommutative tori were shown to arise within 
string theory essentially in two situations. In the first 
case, one considers compactifications of the (BFSS 
or IKKT) matrix model of M-theory (Connes et al. 
1998). A discussion regarding the connection 
between T-duality and Morita equivalence in this 
case can be found in Seiberg and Witten (1999, 
section 7). Noncommutative gauge theories on tori 
can also be obtained by taking the so-called Seiberg— 
Witten zero slope limit in the presence of a Neveu- 
Schwarz B-field background (Seiberg and Witten 
1999). The emergence of noncommutative geometry 
in this limit is discussed in this article. Below we give 
some details on the relation between T-duality and 
Morita equivalence in this approach. Consider a number of Dp-branes wrapped on $T^p$ parametrized by coordinates $x^i \sim x^i + 2\pi r$ with a closed-string metric $G_{ij}$ and a B-field $B_{ij}$. The $SO(p,p|\mathbb{Z})$ T-duality group is represented by the matrices

\[ T = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \qquad [30] \]


that act on the matrix

\[ E = \frac{r^2}{\alpha'}\,(G + 2\pi\alpha' B) \]

by a fractional transformation

\[ T: E \mapsto E' = (aE+b)(cE+d)^{-1} \qquad [31] \]


The transformed metric and B-field are obtained by taking, respectively, the symmetric and antisymmetric parts of $E'$. The string coupling constant is transformed as

\[ g_s' = \frac{g_s}{\sqrt{\det(cE+d)}} \qquad [32] \]


The zero slope limit of Seiberg and Witten is obtained by taking

\[ \alpha' \sim \epsilon^{1/2} \to 0, \qquad G_{ij} \sim \epsilon \to 0 \qquad [33] \]

Sending the closed-string metric to zero implies that the B-field dominates in the open-string boundary conditions. In the limit [33], the compactification is parametrized in terms of open-string moduli


2 —1 
. 1 r 

E p 34 
= (B7) 34 
which remain finite. One can demonstrate that 0” is 
a noncommutativity parameter for the torus and the 
low-energy effective theory living on the Dp-brane is 
a noncommutative maximally supersymmetric gauge 

theory with a coupling constant 


det g Ki 
G; = g; | —— 35 
e (£ z) A 
From the transformation law [31], it is not hard to derive the transformation rules for the moduli [34] in the limit [33],

\[ T: g \mapsto g' = (a+b\theta)\,g\,(a+b\theta)^t \]
\[ T: \theta \mapsto \theta' = (c+d\theta)(a+b\theta)^{-1} \]

Furthermore, the effective gauge theory becomes a noncommutative Yang-Mills theory [17] with a coupling constant

\[ (g_{YM})^{-2} = \frac{(\alpha')^{(3-p)/2}}{(2\pi)^{p-2}\,G_s} \qquad [36] \]

which goes to a finite limit under [33] provided one simultaneously scales $g_s$ with $\epsilon$ as

\[ g_s \sim \epsilon^{(3-p+k)/4} \]

where $k$ is the rank of $B_{ij}$. The limiting coupling constant $g_{YM}$ transforms under the T-duality [31], [32] as

\[ T: g_{YM} \mapsto g'_{YM} = g_{YM}\,(\det(a+b\theta))^{1/4} \qquad [37] \]


We see that the transformation laws [31] and [37] 
have the same form as the corresponding transfor- 
mations in [25], [28], [29] provided one identifies 
matrix [26] with matrix [30] conjugated by 


t=(1 o) 


The need for conjugation reflects the fact that in the BFSS M(atrix) model, in the framework of which the Morita equivalence was originally considered, the natural degrees of freedom are D0 branes, versus the Dp branes considered in the above discussion of T-duality.

One can further check that the gauge field transfor- 
mations following from gauge Morita equivalence 
match with those induced by the T-duality. It is worth 
stressing that in the absence of a B-field background 
the effective action based on the square of the gauge 
field curvature is not invariant under T-duality. 


Instantons on Noncommutative $T^4_\theta$


Consider a Yang-Mills field $\nabla_X$ on a projective module $E$ over a noncommutative 4-torus $T^4_\theta$. Assume that the Lie algebra of shifts $L_\theta$ is equipped with the standard Euclidean metric such that the metric tensor in the basis [7] is given by the identity matrix. The Yang-Mills field $\nabla_X$ is called an instanton if the self-dual part of the corresponding curvature tensor is proportional to the identity operator,

\[ F^+_{jk} := \frac12\Big(F_{jk} + \frac12\,\epsilon_{jkmn}F^{mn}\Big) = i\,\omega_{jk}\cdot 1 \qquad [38] \]

where $\omega_{jk}$ is a constant matrix with real entries. An anti-instanton is defined the same way by replacing the self-dual part with the anti-self-dual one.

One can define a noncommutative analog of Nahm transform for instantons (Astashkevich et al. 2000) that has properties very similar to those of the ordinary (commutative) one. To that end, consider a triple $(P, \nabla_i, \hat\nabla_i)$ consisting of a (finite projective) $(T^4_\theta, T^4_{\hat\theta})$-bimodule $P$, a $T^4_\theta$-connection $\nabla_i$ and a $T^4_{\hat\theta}$-connection $\hat\nabla_i$ that satisfy the following properties. The connection $\nabla_i$ commutes with the $T^4_{\hat\theta}$-action on $P$ and the connection $\hat\nabla_i$ with that of $T^4_\theta$. The commutators $[\nabla_i,\nabla_j]$, $[\hat\nabla_i,\hat\nabla_j]$, $[\nabla_i,\hat\nabla_j]$ are proportional to the identity operator


530 Nonequilibrium Statistical Mechanics (Stationary): Overview 


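\[ [\nabla_i,\nabla_j] = \omega_{ij}\cdot 1 \]
\[ [\hat\nabla_i,\hat\nabla_j] = \hat\omega_{ij}\cdot 1 \qquad [39] \]
\[ [\nabla_i,\hat\nabla_j] = \sigma_{ij}\cdot 1 \]


The above conditions mean that $P$ is a $T_{\theta\oplus(-\hat\theta)}$ module and $\nabla_i\oplus\hat\nabla_i$ is a constant curvature connection on it. In addition, we assume that the tensor $\sigma_{ij}$ is nondegenerate.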

For a connection $\nabla^E$ on a right $T^4_\theta$-module $E$, we define a Dirac operator $D = \Gamma^i(\nabla^E_i + \nabla_i)$ acting on the tensor product

\[ (E\otimes_{T^4_\theta}P)\otimes S \]

where $S$ is the SO(4) spinor representation space and $\Gamma^i$ are four-dimensional Dirac gamma-matrices. The space $S$ is $\mathbb{Z}_2$-graded: $S = S^+\oplus S^-$ and $D$ is an odd operator so that we can consider

\[ D^+: (E\otimes_{T^4_\theta}P)\otimes S^+ \to (E\otimes_{T^4_\theta}P)\otimes S^- \]
\[ D^-: (E\otimes_{T^4_\theta}P)\otimes S^- \to (E\otimes_{T^4_\theta}P)\otimes S^+ \]

A connection $\nabla^E$ on a $T^4_\theta$-module $E$ is called P-irreducible if there exists a bounded inverse to the Laplacian

\[ \Delta = \sum_i(\nabla^E_i+\nabla_i)(\nabla^E_i+\nabla_i) \]

One can show that if $\nabla^E$ is a P-irreducible instanton, then $\ker D^+ = 0$ and $D^-D^+ = \Delta$. Denote by $\hat E$ the closure of the kernel of $D^-$. Since $D^-$ commutes with the $T^4_{\hat\theta}$-action on $(E\otimes_{T^4_\theta}P)\otimes S^-$, the space $\hat E$ is a right $T^4_{\hat\theta}$-module. One can prove that this module is finite projective. Let $\mathcal{P}: (E\otimes_{T^4_\theta}P)\otimes S^- \to \hat E$ be a Hermitian projector. Denote by $\nabla^{\hat E}$ the composition $\mathcal{P}\circ\hat\nabla$. One can show that $\nabla^{\hat E}$ is a Yang-Mills field on $\hat E$.

The noncommutative Nahm transform of a P-irreducible instanton connection $\nabla^E$ on $E$ is defined to be the pair $(\hat E, \nabla^{\hat E})$. One can further show that $\nabla^{\hat E}$ is an instanton.


See also: Electroweak Theory; Hopf Algebras and 
q-Deformation Quantum Groups; Noncommutative 
Geometry from Strings; Quantum Group Differentials, 
Bundles and Gauge Theory; Quantum Hall Effect; String 
Field Theory; von Neumann Algebras: Introduction, 
Modular Theory, and Classification Theory.


Nonequilibrium


Systems in stationary nonequilibrium are mechanical systems subject to nonconservative external forces and to thermostat forces which forbid indefinite increase of the energy and allow reaching statistically stationary states. A system $\Sigma$ is described by the positions and velocities of its $n$ particles $X, \dot X$, with the particle positions confined to a finite volume container $C_0$.

If $X = (x_1,\dots,x_n)$ are the particle positions in a Cartesian inertial system of coordinates, the equations of motion are determined by their masses $m_i > 0$, $i = 1,\dots,n$, by the potential energy of interaction $V(x_1,\dots,x_n) \equiv V(X)$, by the external nonconservative forces $F_i(X,\Phi)$, and by the thermostat forces $-\vartheta_i$ as

\[ m_i\ddot x_i = -\partial_{x_i}V(X) + F_i(X;\Phi) - \vartheta_i \qquad [1] \]

where $\Phi = (\varphi_1,\dots,\varphi_q)$ are strength parameters on which the external forces depend. All forces and potentials will be supposed smooth, that is, analytic, in their variables aside from possible impulsive elastic forces describing shocks, and with the property $F_i(X;0) = 0$. The impulsive forces are allowed here to model possible shocks with the walls of the container $C_0$ or between hard core particles.

A thermostat is a “reservoir” which may consist 
of one or more infinite systems which are asympto- 
tically in thermal equilibrium and are separated by 
boundary surfaces from each other as well as from 
the system: with the latter, they interact through 
short-range conservative forces, see Figure 1. 

The reservoirs occupy infinite regions of the space outside $C_0$, for example, sectors $C_a \subset \mathbb{R}^3$, $a = 1,2,\dots$, in space and their particles are in a configuration which is typical of an equilibrium state at temperature $T_a$. This means that the empirical probability of configurations in each $C_a$ is Gibbsian with some temperature $T_a$. In other words, the frequency with which a configuration $(\dot Y, Y+r)$ occurs in a region $\Lambda+r \subset C_a$ while a configuration $(\dot W, W+r)$ occurs outside $\Lambda+r$ (with $Y \subset \Lambda$, $W\cap\Lambda = \emptyset$), averaged over the translations $\Lambda+r$ of $\Lambda$ by $r$ (with the restriction that $\Lambda+r \subset C_a$), is

\[ \mathop{\mathrm{average}}_{\Lambda+r\subset C_a}\Big(f_{\Lambda+r}\big((\dot Y, Y+r); \dot W, W+r\big)\Big) = \frac{e^{-\beta_a((1/2m_a)|\dot Y|^2 + V_a(Y|W))}}{\mathrm{normalization}} \qquad [2] \]

Here $m_a$ is the mass of the particles in the $a$th reservoir and $V_a(Y|W)$ is the energy of the short-range potential between pairs of particles in $Y \subset C_a$ or with one point in $Y$ and one in $W$. Since the configurations in the system and in the thermostats are not random, [2] should be considered as an “empirical” probability in the sense that it is the frequency density of the events $\{(\dot Y, Y+r); \dot W, W+r\}$: in other words, the configurations $\omega_a$ in the reservoirs should be “typical” in the sense of probability theory of distributions which are asymptotically Gibbsian.

Figure 1 A symbolic drawing of the container $C_0$ for the system $\Sigma$ and of the surrounding regions containing the particles acting as thermostats at temperatures $T_1, T_2, \dots$.

The property of being “thermostats” means that 
[2] remains true for all times, if initially satisfied. 

Mathematically, there is a problem at this point: 
the latter property is either true or false, but a 
proof of its validity seems out of reach of the 
present techniques except in very simple cases. 
Therefore, here we follow an intuitive approach 
and assume that such thermostats exist and, 
actually, that any configuration which is typical of 
a stationary state of an infinite size system of 
interacting particles in the C,’s, with physically 
reasonable microscopic interactions, satisfies the 
property [2]. 

The above thermostats are examples of “determi- 
nistic thermostats” because, together with the 
system, they form a deterministic dynamical system. 
They are called “Hamiltonian thermostats” and are 
often considered as the most appropriate models of 
“physical thermostats.” 

A closely related thermostat model is obtained by 
assuming that the particles outside the system are 
not in a given configuration but they have a 
probability distribution whose conditional distribu- 
tions satisfy [2] initially. Also in this case, it is 
necessary to assume that [2] remains true for all 
times, if initially satisfied. Such thermostats are examples of “stochastic thermostats” because their action on the system depends on random variables $\omega_a$, which are the initial configurations of the particles belonging to the thermostats.

Other kinds of stochastic thermostats are “collision rules” with the container boundary $\partial C_0$ of $\Sigma$: every time a particle collides with $\partial C_0$ it is reflected with a momentum $p$ in $d^3p$ that has a probability distribution proportional to $e^{-\beta_a(1/2m)p^2}\,d^3p$ where $\beta_a$, $a = 1,2,\dots$, depends on which boundary portion (labeled by $a = 1,2,\dots$) the collision takes place and $T_a = (k_B\beta_a)^{-1}$ is its “temperature” if $k_B$ is Boltzmann’s constant. Which $p$ is actually chosen after each collision is determined by a random variable $\omega = (\omega_1,\omega_2,\dots)$.
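
A minimal Python sketch of such a collision rule is given below; it is an illustration and not taken from the original article. The text only specifies the Gaussian weight $e^{-\beta_a p^2/2m}$, so the step that flips the normal component to make the momentum point back into the container is an assumption of this sketch, as are the function name, the seed and the numerical values.

```python
import numpy as np

def sample_reflected_momentum(beta, mass, inward_normal, rng):
    """Draw p with density proportional to exp(-beta * |p|^2 / (2 * mass)),
    then flip it, if needed, so that it points into the container."""
    sigma = np.sqrt(mass / beta)            # standard deviation of each component
    p = rng.normal(0.0, sigma, size=3)
    n = np.asarray(inward_normal, dtype=float)
    n = n / np.linalg.norm(n)
    normal_part = p @ n
    if normal_part < 0.0:                   # assumption: send outgoing draws back inward
        p = p - 2.0 * normal_part * n
    return p

rng = np.random.default_rng(0)              # arbitrary seed
beta, mass = 2.0, 1.5                       # illustrative values, k_B T = 1 / beta
wall_normal = np.array([0.0, 0.0, 1.0])
samples = np.array([sample_reflected_momentum(beta, mass, wall_normal, rng)
                    for _ in range(20000)])
mean_kinetic = np.mean(np.sum(samples ** 2, axis=1)) / (2.0 * mass)
```

With this weight the mean kinetic energy of the emitted particles should be close to $(3/2)k_BT_a = 3/(2\beta_a)$, which gives a simple consistency check on the sampler.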

The distinction between stochastic and deter- 
ministic thermostats ultimately rests on what we 
call “system.” If reservoirs or the randomness 
generators are included in the system, then the 
system becomes deterministic (possibly infinite); 
and finite deterministic thermostats can also be 
regarded as simplified models for infinite reservoirs, 
see the section “Heat, temperature, and entropy 
production.”

It is also possible, and convenient, to consider “finite deterministic thermostats.” In the latter case, $\vartheta$ is a force only depending upon the configuration of the $n$ particles of $\Sigma$ in their finite container $C_0$.

Examples of finite deterministic reservoirs are forces obtained by imposing a nonholonomic constraint via some ad hoc principle like the Gauss principle. For instance, if a system of particles driven by a force $G_i \stackrel{\mathrm{def}}{=} -\partial_{x_i}V(X) + F_i(X)$ is enclosed in a box $C_0$ and $\vartheta$ is a thermostat enforcing an anholonomic constraint $\psi(X,\dot X) \equiv 0$ via Gauss’ principle, then

\[ \vartheta_i(X,\dot X) = \left[\frac{\sum_j \dot x_j\cdot\partial_{x_j}\psi(X,\dot X) + (1/m_j)\,G_j\cdot\partial_{\dot x_j}\psi(X,\dot X)}{\sum_j (1/m_j)\,(\partial_{\dot x_j}\psi(X,\dot X))^2}\right]\partial_{\dot x_i}\psi(X,\dot X) \qquad [3] \]

Gauss’ principle says that the force which needs to be added to the other forces $G_i$ acting on the system minimizes

\[ \sum_i \frac{(G_i - m_i a_i)^2}{m_i} \]

given $X, \dot X$, among all accelerations $a_i$ which are compatible with the constraint $\psi$.
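
The construction can be made concrete with a short Python sketch, offered as an illustration under stated assumptions rather than as the article's own procedure. It takes the constraint to be constancy of the total kinetic energy, $\psi = \sum_i (1/2)m_i\dot x_i^2 - \mathrm{const}$, for which the Gauss force reduces to $\vartheta_i = \alpha\, m_i \dot x_i$ with $\alpha = \sum_j G_j\cdot\dot x_j / \sum_j m_j \dot x_j^2$. The function names and the random test data are hypothetical.

```python
import numpy as np

def gauss_isokinetic_force(v, G, m):
    """Thermostat force theta_i = alpha * m_i * v_i that keeps the kinetic energy fixed.

    v, G: arrays of shape (n, 3) with velocities and active forces G_i;
    m: array of shape (n,) with the masses.
    """
    alpha = np.sum(G * v) / np.sum(m[:, None] * v * v)
    return alpha * m[:, None] * v

def accelerations(v, G, m):
    """Constrained accelerations from m_i a_i = G_i - theta_i."""
    return (G - gauss_isokinetic_force(v, G, m)) / m[:, None]

def gauss_cost(a, G, m):
    """The quantity sum_i (G_i - m_i a_i)^2 / m_i minimized by Gauss' principle."""
    return float(np.sum((G - m[:, None] * a) ** 2 / m[:, None]))

def kinetic_energy_rate(v, a, m):
    """dK/dt = sum_i m_i v_i . a_i, which the constraint forces to vanish."""
    return float(np.sum(m[:, None] * v * a))

rng = np.random.default_rng(1)   # arbitrary seed, illustrative data only
n = 5
m = rng.uniform(0.5, 2.0, size=n)
v = rng.normal(size=(n, 3))
G = rng.normal(size=(n, 3))
a = accelerations(v, G, m)
```

Any other acceleration that also keeps the kinetic energy constant gives a larger value of `gauss_cost`, which is the sense in which the thermostat force is the smallest admissible one.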

It should be kept in mind that the only known 
examples of mathematically treatable thermostats 
modeled by infinite reservoirs are cases in which the 
thermostat particles are either noninteracting parti- 
cles or linear (i.e., noninteracting) oscillators. For 
simplicity stochastic or infinite thermostats will not 
be considered here and we restrict attention to finite 
deterministic systems. 

In general, in order that a force $\vartheta$ can be considered a deterministic “thermostat force” a further property is necessary: namely that the system evolves according to [1] towards a stationary state. This means that for all initial particle configurations $(X,\dot X)$, except possibly for a set of zero phase-space volume, any smooth function $f(X,\dot X)$ evolves in time so that, if $S_t(X,\dot X)$ denotes the configuration into which the initial data evolve in time $t$ according to [1], then the limit

\[ \lim_{T\to\infty}\frac1T\int_0^T f(S_t(X,\dot X))\,dt = \int f(z)\,\mu(dz) \qquad [4] \]

exists and is independent of $(X,\dot X)$. The probability distribution $\mu$ is then called the SRB distribution for the system. The maps $S_t$ will have the group property $S_t\cdot S_{t'} = S_{t+t'}$ and the SRB distribution $\mu$ will be invariant under time evolution.
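
The content of [4] can be illustrated numerically in a discrete-time toy model. The sketch below is not from the article: it uses the Arnold cat map on the unit torus, a standard textbook example whose SRB distribution is the uniform measure, and compares time averages started from two different initial points. The observable, the initial data and the number of steps are arbitrary choices, and floating-point orbits only approximate true orbits.

```python
import math

def cat_map(x, y):
    """One step of the Arnold cat map on the unit torus."""
    return (2.0 * x + y) % 1.0, (x + y) % 1.0

def time_average(f, x, y, steps):
    """Discrete analog of (1/T) * integral of f along the orbit started at (x, y)."""
    total = 0.0
    for _ in range(steps):
        total += f(x, y)
        x, y = cat_map(x, y)
    return total / steps

def observable(x, y):
    # The average over the uniform measure on the torus is 0 + 1/2 = 0.5
    return math.cos(2.0 * math.pi * x) + y

avg_1 = time_average(observable, 0.1234, 0.5678, 100_000)
avg_2 = time_average(observable, 0.9, 0.05, 100_000)
```

Both time averages should land near the phase-space average 0.5, independently of the starting point, which is the discrete-time counterpart of the statement that the limit in [4] does not depend on the initial data.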

It is important to stress that the requirement that 
the exceptional configurations form just a set of zero 


phase volume (rather than a set of zero probability 
with respect to another distribution, singular with 
respect to the phase volume) is a strong assumption 
and it should be considered an axiom of the theory: 
it corresponds to the assumption that the initial 
configuration is prepared as a typical configuration 
of an equilibrium state, which, by the classical 
equidistribution axiom of equilibrium statistical 
mechanics, is a typical configuration with respect 
to the phase volume. 

For this reason, the SRB distribution is said to describe a “stationary nonequilibrium state” of the system. The SRB distribution depends on the parameters on which the forces acting on the system depend, for example, $|C_0|$ (volume), $\Phi$ (strength of the forcings), $\{\beta_a^{-1}\}$ (temperatures), etc. The collection of SRB distributions obtained by letting the parameters vary defines a “nonequilibrium ensemble.”

In the stochastic case, the distribution $\mu$ is required to be invariant in the sense that it can be regarded as a marginal distribution of an invariant distribution for the larger (deterministic) system formed by the thermostats and the system itself.

For more details, the reader is referred to Evans 
and Morriss (1990), Ruelle (1999), and Eckmann 
et al. (1999). 


Nonequilibrium Thermodynamics 


The key problem of nonequilibrium statistical 
mechanics is to derive a macroscopic “nonequili- 
brium thermodynamics” in a way similar to the 
derivation of equilibrium thermodynamics from 
equilibrium statistical mechanics. 

The first difficulty is that nonequilibrium thermo- 
dynamics is not well understood. For instance, there 
is no (agreed upon) definition of entropy of a 
nonequilibrium stationary state, while it should be 
kept in mind that the effort to find the microscopic 
interpretation of equilibrium entropy, as defined by 
Clausius, was a driving factor in the foundations of 
equilibrium statistical mechanics. 

The importance of entropy in classical equilibrium thermodynamics rests on the implication of universal, parameter-free relations which follow from its existence (e.g., $\partial_V(1/T) \equiv \partial_U(p/T)$ if $U$ is the internal energy, $T$ the absolute temperature, and $p$ the pressure of a simple homogeneous material).
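
As a quick check of this relation, one can take the monatomic ideal gas, a standard textbook example that is not discussed in the original text. The symbols $N$ (number of particles), $k_B$ (Boltzmann's constant) and $S(U,V)$ (entropy) are assumptions introduced here for the illustration:

```latex
\begin{align*}
S(U,V) &= N k_B\left(\tfrac{3}{2}\ln U + \ln V\right) + \text{const},\\
\frac{1}{T} &= \partial_U S = \frac{3 N k_B}{2U}, \qquad
\frac{p}{T} = \partial_V S = \frac{N k_B}{V},\\
\partial_V\!\left(\frac{1}{T}\right) &= 0 = \partial_U\!\left(\frac{p}{T}\right).
\end{align*}
```

Both sides vanish in this case. In general the identity is the equality of the mixed second derivatives of $S(U,V)$, which is why it holds without reference to any material parameter.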

Are there universal relations among averages of 
observables with respect to SRB distributions? 

The question has to be posed for systems “really” out of equilibrium, that is, for $\Phi \ne 0$ (see [1]): in fact, there is a well-developed theory of the derivatives with respect to $\Phi$ of averages of observables evaluated at $\Phi = 0$. The latter theory is
often called, and here we shall do so as well, 
“classical nonequilibrium thermodynamics” or 
“near-equilibrium thermodynamics” and it has 
been quite successfully developed on the basis of 
the notions of equilibrium thermodynamics, paying 
particular attention to the macroscopic evolution of 
systems described by macroscopic continuum equa- 
tions of motion. 

“Stationary nonequilibrium statistical mechanics” 
will indicate a theory of the relations between 
averages of observables with respect to SRB dis- 
tributions. Systems so large that their volume 
elements can be regarded as being in locally 
stationary nonequilibrium states could also be 
considered. This would extend the familiar “local 
equilibrium states” of classical nonequilibrium ther- 
modynamics: however, they are not considered here. 
This means that we shall not attempt to find the 
macroscopic equations regulating the time evolution 
of continua locally in nonequilibrium stationary 
states but we shall only try to determine the 
properties of their “volume elements” assuming 
that the timescale for the evolution of large 
assemblies of volume elements is slow compared to 
the timescales necessary to reach local stationarity. 

For more details, the reader is referred to 
de Groot and Mazur (1984), Lebowitz (1993), 
Ruelle (1999, 2000), Gallavotti (1998, 2004), and 
Goldstein and Lebowitz (2004). 


Chaotic Hypothesis 


In equilibrium statistical mechanics, the ergodic 
hypothesis plays an important conceptual role as it 
implies that the motions of ergodic systems have an 
SRB statistics and that the latter coincides with the 
Liouville distribution on the energy surface. 

An analogous role has been proposed for the 
“chaotic hypothesis,” which states that the 


motion of a chaotic system, developing on its attracting 
set, can be regarded as an Anosov flow. 


This means that the attracting sets of chaotic 
systems, physically defined as systems with at least 
one positive Lyapunov exponent, can be regarded as 
smooth surfaces on which motion is highly unstable: 


1. Around every point, a curvilinear coordinate system can be established which has three planes, varying continuously with $x$, which are covariant (i.e., they are coordinate planes at a point $x$ which are mapped, by the evolution $S_t$, into the corresponding coordinate planes around $S_t x$).

2. The planes are of three types, “stable,” “unstable,” and “marginal,” with respective positive dimensions $d_s$, $d_u$, and 1: infinitesimal lengths on the stable plane and on the unstable plane of any point contract at exponential rate as time proceeds towards the future or towards the past, respectively. The length along the marginal direction neither contracts nor expands (i.e., it varies around the initial value staying bounded away from 0 and $\infty$): its tangent vector is parallel to the flow. In cases in which time evolution is discrete, and determined by a map $S$, the marginal direction is missing.

3. The contraction over a time $t$, positive for lines on the stable plane and negative for those on the unstable plane, is exponential, i.e. lengths are contracted by a factor uniformly bounded by $Ce^{-\kappa|t|}$ with $C, \kappa > 0$.

4. There is a dense trajectory. 
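
The properties listed above can be checked by hand in the simplest discrete-time example, the Arnold cat map, whose linear part is the matrix with rows (2, 1) and (1, 1). This example is a textbook illustration and is not taken from the article. In the Python sketch below the stable and unstable directions are the eigenvectors of that matrix, and the contraction bound holds with $C = 1$ and $\kappa$ equal to the logarithm of the largest eigenvalue. The function names are hypothetical.

```python
import numpy as np

CAT = np.array([[2.0, 1.0],
                [1.0, 1.0]])

def hyperbolic_splitting(matrix):
    """Return (eigenvalue, eigenvector) pairs for the stable and unstable directions
    of a symmetric 2 x 2 hyperbolic matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)   # eigenvalues in ascending order
    stable = (eigenvalues[0], eigenvectors[:, 0])
    unstable = (eigenvalues[1], eigenvectors[:, 1])
    return stable, unstable

def length_ratio(matrix, direction, steps):
    """|matrix^steps . direction| / |direction|; negative steps use the inverse map."""
    image = np.linalg.matrix_power(matrix, steps) @ direction
    return float(np.linalg.norm(image) / np.linalg.norm(direction))

(lam_s, e_s), (lam_u, e_u) = hyperbolic_splitting(CAT)
kappa = float(np.log(lam_u))    # contraction rate
```

Lengths along `e_s` shrink by $e^{-\kappa t}$ under $t$ forward iterations, and lengths along `e_u` shrink by the same factor under $t$ backward iterations. Since the time is discrete, no marginal direction appears.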


It has to be stressed that the chaotic hypothesis 
concerns physical systems: mathematically, it is 
very easy to find dynamical systems for which it 
does not hold, at least as easy as it is to find 
systems in which the ergodic hypothesis does not 
hold (e.g., harmonic lattices or blackbody radia- 
tion). However, if suitably interpreted, the ergodic 
hypothesis leads, even for these systems, to physically correct results (the specific heats at high temperature, the Rayleigh-Jeans distribution at low frequencies). Moreover, the failures of the ergodic
hypothesis in physically important systems have led 
to new scientific paradigms (like quantum 
mechanics from the specific heats at low tempera- 
ture and Planck’s law). 

Since physical systems are almost always not 
Anosov systems, it is very likely that probing 
motions in extreme regimes will make visible the 
features that distinguish Anosov systems from non- 
Anosov systems, much as it happens with the 
ergodic hypothesis. 

The interest of the hypothesis is to provide a framework in which properties like the existence of an SRB distribution are a priori guaranteed, together with an expression for it which can be used to work with formal expressions of the averages of the observables: the role of Anosov systems in chaotic dynamics is similar to the role of harmonic oscillators in the theory of regular motions. They are the paradigm of chaotic systems, as the harmonic oscillators are the paradigm of order. Of course, the hypothesis is only a beginning and one has to learn how to extract information from it, as was the case with the use of the Liouville distribution, once the ergodic hypothesis guaranteed that it was the appropriate distribution for the study of the statistics of motions in equilibrium situations.

For more details, the reader is referred to Ruelle 
(1976), Gallavotti and Cohen (1995), Ruelle (1999), 
Gallavotti (1998), and Gallavotti et al. (2004). 


Heat, Temperature, and Entropy 
Production 


The amount of heat $\dot Q$ that a system produces while in a stationary state is naturally identified with the work that the thermostat forces $\vartheta$ perform per unit time

\[ \dot Q = \sum_i \vartheta_i\cdot\dot x_i \qquad [5] \]


A system may be in contact with several reservoirs: in models, this will be reflected by a decomposition

\[ \vartheta = \sum_a \vartheta^{(a)}(X,\dot X) \qquad [6] \]

where $\vartheta^{(a)}$ is the force due to the $a$th thermostat and depends on the coordinates of the particles which are in a region $\Lambda_a \subset C_0$ of a decomposition $\cup_{a=1}^{m}\Lambda_a = C_0$ of the container $C_0$ occupied by the system ($\Lambda_a\cap\Lambda_{a'} = \emptyset$ if $a \ne a'$).

From several studies based on simulations of finite thermostatted systems of particles arose the proposal to consider the average of the phase-space contraction $\sigma^{(a)}(X,\dot X)$ due to the $a$th thermostat

\[ \sigma^{(a)}(X,\dot X) \stackrel{\mathrm{def}}{=} \sum_j \partial_{\dot x_j}\cdot\vartheta^{(a)}_j(X,\dot X) \qquad [7] \]

and to identify it with the rate of entropy creation in the $a$th thermostat.

Another key notion in thermodynamics is the temperature of a reservoir; in the infinite deterministic thermostat case, of the section “Nonequilibrium,” it is defined as $(k_B\beta_a)^{-1}$ but in the finite deterministic thermostats considered here it needs to be defined. If there are $m$ reservoirs with which the system is in contact, one sets

\[ \sigma^{(a)}_+ \stackrel{\mathrm{def}}{=} \langle\sigma^{(a)}(X,\dot X)\rangle_{SRB} \equiv \int \sigma^{(a)}(X,\dot X)\,\mu(dX\,d\dot X), \qquad \dot Q_a \stackrel{\mathrm{def}}{=} \sum_j \vartheta^{(a)}_j\cdot\dot x_j \qquad [8] \]

where $\mu$ is the SRB distribution describing the stationary state. It is natural to define the absolute temperature of the $a$th thermostat to be

\[ T_a = \frac{\langle\dot Q_a\rangle_{SRB}}{k_B\,\sigma^{(a)}_+} \qquad [9] \]

It is not clear that $T_a > 0$: this happens in a rather general class of models and it would be desirable, for the interpretation that is proposed here, that it could be considered a property to be added to the requirements that the forces $\vartheta^{(a)}$ be thermostat models.

An important class of thermostats for which the property $T_a > 0$ holds can be described as follows. Imagine $N$ particles in a container $C_0$ interacting via a potential $V_0 = \sum_{i<j}\varphi(q_i - q_j) + \sum_j V'(q_j)$ (where $V'$ models external conservative forces like obstacles, walls, gravity, ...) and, furthermore, interacting with $M$ other systems $\Sigma_a$, of $N_a$ particles of mass $m_a$, in containers $C_a$ contiguous to $C_0$. The latter will model $M$ parts of the system in contact with thermostats at temperatures $T_a$, $a = 1,\dots,M$.

The coordinates of the particles in the $a$th system $\Sigma_a$ will be denoted $x^a_j$, $j = 1,\dots,N_a$, and they will interact with each other via a potential $V_a = \sum_{i<j}^{N_a}\varphi_a(x^a_i - x^a_j)$. Furthermore, there will be an interaction between the particles of each thermostat and those of the system via potentials $W_a = \sum_{i=1}^{N}\sum_{j=1}^{N_a} w_a(q_i - x^a_j)$, $a = 1,\dots,M$.

The potentials will be assumed to be either hard core or nonsingular potentials and the external $V'$ is supposed to be at least such that it forbids the existence of obvious constants of motion.

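The temperature of each $\Sigma_a$ will be defined by the total kinetic energy of its particles, that is, by $K_a = \sum_{j=1}^{N_a}(1/2)m_a(\dot x^a_j)^2 \stackrel{\mathrm{def}}{=} (3/2)N_a k_B T_a$: the particles of the $a$th thermostat will be kept at constant temperature by further forces $\vartheta^{(a)}_j$. The latter are defined by imposing via a Gaussian constraint that $K_a$ is a constant of motion (see [3] with $\psi \equiv K_a$). This means that the equations of motion are

\[ m\ddot q_j = -\partial_{q_j}\Big(V_0(Q) + \sum_{a=1}^{M} W_a(Q,x^a)\Big) \]
\[ m_a\ddot x^a_j = -\partial_{x^a_j}\big(V_a(x^a) + W_a(Q,x^a)\big) - \vartheta^{(a)}_j \qquad [10] \]

and an application of Gauss’ principle yields

\[ \vartheta^{(a)}_j = \frac{L_a - \dot V_a}{3N_a k_B T_a}\,\dot x^a_j \stackrel{\mathrm{def}}{=} \alpha^{(a)}\dot x^a_j \]

where $L_a$ is the work per unit time done by the particles in $C_0$ on the particles of $\Sigma_a$ and $V_a$ is their potential energy.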

In this case, the partial divergence $\sigma^{(a)} \stackrel{\mathrm{def}}{=} (3N_a - 1)\alpha^{(a)}$ is, up to a constant factor $(1 - (1/3N_a))$,

\[ \sigma^{(a)} = \frac{L_a}{k_B T_a} - \frac{\dot V_a}{k_B T_a} \]

and it will make [9] identically satisfied with $T_a > 0$ because $L_a$ can be naturally interpreted as heat $\dot Q_a$ ceded, per unit time, by the particles in $C_0$ to the subsystem $\Sigma_a$ (hence to the $a$th thermostat because the temperature of $\Sigma_a$ is constant), while the derivative of $V_a$ will not contribute to the value of $\sigma^{(a)}_+$. The phase-space contraction rate is, neglecting the total derivative terms (and $O(N_a^{-1})$),

\[ \sigma_{true}(X,\dot X) = \sum_{a=1}^{M}\frac{\dot Q_a}{k_B T_a} \qquad [11] \]

where the subscript “true” is to remind that an additive total derivative term distinguishes it from the complete phase-space contraction.


Remarks 


(i) The above formula provides the motivation of the name “entropy creation rate” attributed to the phase-space contraction $\sigma$. Note that in this way the definition of entropy creation is “reduced” to the equilibrium notion because what is being defined is the entropy increase of the thermostats which have to be considered in equilibrium. No attempt is made here to define either the entropy of the stationary state or the notion of temperature of the nonequilibrium system in $C_0$ (the $T_a$ are temperatures of the $\Sigma_a$, not of the particles in $C_0$). This is an important point as it leaves open
the possibility of envisaging the notion of “local 
equilibrium” which becomes necessary in the 
approximation (not considered here) in which 
the system is regarded as a continuum. 

(ii) In the above model, another viewpoint is possible: that is, to consider the system to consist of only the $N$ particles in $C_0$ and the $M$ systems $\Sigma_a$ to be thermostats. From this point of
view, it can be considered a model of a system 
subject to thermostats. The Gibbs distribution 
characterizing the infinite thermostats of the 
section “Nonequilibrium” becomes in this case 
the constraint that the kinetic energies K, are 
constants, enforced by the Gaussian forces. In 
the new viewpoint, the appropriate definition 
should be simply the right-hand side (RHS) of 
[11], i.e. the work per unit time done by the 
forces of the system on the thermostats divided 
by the temperature of the thermostats. This 
suggests a different and general definition of 
entropy creation rate, applying also to thermo- 
stats that are often considered “more physical” 
and that needs to be further investigated. In the 
example [10] the new definition differs from the 
phase space contraction rate by a total time 
derivative, i.e. rather trivially for the purposes of 
the following. 


For more details, the reader is referred to Evans 
and Morriss (1990), Gallavotti and Cohen (1995), 
Ruelle (1996, 1997), and Gallavotti (2004). 


Thermodynamic Fluxes and Forces 


Nonequilibrium stationary states depend upon external parameters $\varphi_i$ like the temperatures $T_a$ of the thermostats or the size of the force parameters $\Phi = (\varphi_1,\dots,\varphi_q)$, see [1]. Nonequilibrium thermodynamics is well developed at “low forcing”: strictly speaking, this means that it is widely believed that we understand the properties of the derivatives of the averages of observables with respect to the external parameters if evaluated at $\varphi_i = 0$. Important notions are the notions of thermodynamic fluxes $J_i$ and of thermodynamic forces $\varphi_i$; hence, it seems important to extend such notions to nonequilibrium systems (i.e., $\Phi \ne 0$).

A possible extension could be to define the thermodynamic flux $J_i$ associated with a force $\varphi_i$ as $J_i = \langle\partial_{\varphi_i}\sigma\rangle_{SRB}$ where $\sigma(X,\dot X;\Phi)$ is the volume contraction per unit time. This definition seems appropriate in several concrete cases that have been studied and it is appealing for its generality.

An interesting example is provided by the model 
of thermostatted system in [10]: if the container of 
the system is a box with periodic boundary condi- 
tions, one can imagine to add an extra constant 
force E acting on the particles in the container. 
Imagining the particles to be charged by a charge e 
and regarding such force as an electric field, the first 
equation in [10] is modified by the addition of a 
term eE. 

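The constraints on the thermostat temperatures imply that $\sigma$ depends also on $E$: in fact, if $J = e\sum_j \dot q_j$ is the electric current, energy balance implies $\dot U_{tot} = E\cdot J - \sum_a (L_a - \dot V_a)$ if $U_{tot}$ is the sum of all kinetic and potential energies. Then, the phase-space contraction

\[ \sum_a \frac{L_a - \dot V_a}{k_B T_a} \]

can be written, to first order in the temperature variations $\delta T_a$ with respect to a common value $T$ (that is, $T_a = T + \delta T_a$),

\[ -\sum_a \frac{L_a - \dot V_a}{k_B T}\,\frac{\delta T_a}{T} + \frac{E\cdot J - \dot U_{tot}}{k_B T} \]

hence $\sigma_{true}$, see [11], is

\[ \sigma_{true} = \frac{E\cdot J}{k_B T} - \sum_a \frac{\dot Q_a}{k_B T}\,\frac{\delta T_a}{T} \qquad [12] \]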

The definition and extension of the conjugacy
between thermodynamic forces and fluxes is compatible
with the key results of classical nonequilibrium
thermodynamics, at least as far as Onsager
reciprocity and Green-Kubo formulas are concerned.
It can be checked that if the equilibrium
system is reversible, that is, if there is an isometry $I$
on phase space which anticommutes with the
evolution ($I S_t = S_{-t} I$ in the case of continuous-time
dynamics $t \to S_t$, or $I S = S^{-1} I$ in the case of discrete-time
dynamics $S$), then, shortening $(X, \dot X)$ into $x$,

$$L_{ij} \stackrel{def}{=} \partial_{\varphi_j} J_i |_{\Phi=0} = \partial_{\varphi_j} \langle \partial_{\varphi_i} \sigma(x;\Phi) \rangle_{SRB} |_{\Phi=0} = \partial_{\varphi_i} J_j |_{\Phi=0} = L_{ji}$$

$$L_{ij} = \frac{1}{2} \int_{-\infty}^{\infty} \langle \partial_{\varphi_j} \sigma(S_t x;\Phi) \, \partial_{\varphi_i} \sigma(x;\Phi) \rangle_{SRB} |_{\Phi=0} \, dt$$  [13]

The $\sigma(x;\Phi)$ plays the role of “Lagrangian” generating
the duality between forces and fluxes. The
extension of the duality just considered might be of
interest in situations in which $\Phi \neq 0$.

For more details, the reader is referred to de Groot 
and Mazur (1984), Gallavotti (1996), and Gallavotti 
and Ruelle (1997). 


Fluctuations 


As in equilibrium, large statistical fluctuations of 
observables are of great interest and already there is, 
at the moment, a rather large set of experiments 
dedicated to the analysis of large fluctuations in 
stationary states out of equilibrium. 

If one defines the dimensionless phase-space
contraction

$$p(x) = \frac{1}{\tau} \int_0^\tau \frac{\sigma(S_t x)}{\sigma_+} \, dt$$  [14]

(see also [11]), then there exists $p^* > 1$ such that the
probability $P_\tau$ of the event $p \in [a,b]$ with $[a,b] \subset (-p^*, p^*)$
has the form

$$P_\tau(p \in [a,b]) = \mathrm{const.} \; e^{\tau \max_{p \in [a,b]} \zeta(p) + O(1)}$$  [15]

with $\zeta(p)$ analytic in $(-p^*, p^*)$. The function $\zeta(p)$ can
be conveniently normalized to have value 0 at $p = 1$
(i.e., at the average value of $p$).

Then, in Anosov systems which are reversible and 
dissipative (see the previous section), a general 
symmetry property, called the “fluctuation theorem” 
and reflecting the reversibility symmetry, yields the 
parameterless relation 


$$\zeta(-p) = \zeta(p) - p \sigma_+ \quad \text{for } p \in (-p^*, p^*)$$  [16]


This relation is interesting because it has no free
parameters; in other words, it is universal for
reversible dissipative Anosov systems. In connection
with the flux-force duality in the previous section, it
can be checked to reduce to the Green-Kubo
formula and to Onsager reciprocity, see [13], in the
case in which the evolution depends on several fields
$\Phi$ and $\Phi \to 0$ (of course the relation becomes trivial
as $\Phi \to 0$ because $\sigma_+ \to 0$, and to obtain the result
one has first to divide both sides by suitable powers
of the fields $\Phi$).
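
The symmetry [16] can be checked by hand on a toy example. The sketch below assumes a purely illustrative quadratic (Gaussian) large-deviation function, normalized so that it vanishes at $p = 1$; this form is a hypothetical choice and not a property of Anosov systems in general. Among such quadratic functions only the curvature $\sigma_+/2$ is compatible with [16]; the names `zeta_gaussian`, `zeta_wrong_curvature` and `symmetry_defect` are introduced here for illustration only.

```python
def zeta_gaussian(p, sigma_plus):
    """Hypothetical quadratic rate function with zeta(1) = 0 and curvature sigma_plus / 2."""
    return -sigma_plus * (p - 1.0) ** 2 / 4.0


def zeta_wrong_curvature(p, sigma_plus):
    """Quadratic function with a different curvature; it violates relation [16]."""
    return -sigma_plus * (p - 1.0) ** 2 / 2.0


def symmetry_defect(zeta, p, sigma_plus):
    """Return zeta(-p) - zeta(p) + p * sigma_plus, which [16] requires to vanish."""
    return zeta(-p, sigma_plus) - zeta(p, sigma_plus) + p * sigma_plus
```

For the first function the defect is zero for every $p$, while for the second it equals $-p\sigma_+$, which shows that the relation fixes the variance of a Gaussian fluctuation in terms of its mean.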

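A more informal (but imprecise) way of writing
[15] and [16] is

$$\frac{P_\tau(p)}{P_\tau(-p)} = e^{\tau p \sigma_+} \quad \text{for all } p \in (-p^*, p^*)$$  [17]

where $P_\tau(p)$ is the probability density of $p$. An
obvious but interesting consequence of [17] is that

$$\langle e^{-\tau p \sigma_+} \rangle_{SRB} = 1$$

in the sense that $(1/\tau) \log \langle e^{-\tau p \sigma_+} \rangle_{SRB} \to 0$ as $\tau \to \infty$.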

Occasionally, systems with singularities have to be
considered. In such cases, the relation [16] may
change in the sense that the function $\zeta(p)$ may not be
analytic: in such cases, one expects that the relation
holds in the largest analyticity interval symmetric
around the origin. In Anosov systems and also in
various cases considered in the literature, such an
interval appears to contain the interval $(-1, 1)$.

Note that in the theory of fluctuations of the time
averages $p$ we can replace $\sigma$ by any other quantity
which differs from it by the total time derivative of a
bounded quantity: hence, in the
example discussed above, it can be replaced by $\sigma_{true}$,
see [12], which has a natural physical meaning.

It is important to remark that the above fluctua- 
tion relation is the first representative of several 
consequences of the reversibility and chaotic 
hypotheses. For instance, given $F_1, \ldots, F_n$ arbitrary
observables which are (say) odd under time reversal
$I$ (i.e., $F(Ix) = -F(x)$) and given $n$ functions
$t \in [-\tau/2, \tau/2] \to \varphi_j(t)$, $j = 1, \ldots, n$, one can ask which
is the probability that $F_j(S_t x)$ “closely follows” the
“pattern” $\varphi_j(t)$ and at the same time

$$\frac{1}{\tau} \int_0^\tau \frac{\sigma(S_\theta x)}{\sigma_+} \, d\theta$$

has value $p$. Then calling $P_\tau(F_1 \sim \varphi_1, \ldots, F_n \sim \varphi_n, p)$
the probability of this event, which we write in the
imprecise form corresponding to [17] for simplicity,
and defining $I\varphi_j(t) \stackrel{def}{=} -\varphi_j(-t)$, it is

$$\frac{P_\tau(F_1 \sim \varphi_1, \ldots, F_n \sim \varphi_n, p)}{P_\tau(F_1 \sim I\varphi_1, \ldots, F_n \sim I\varphi_n, -p)} = e^{\tau \sigma_+ p}, \quad p \in (-p^*, p^*)$$  [18]

which is remarkable because it is parameterless and
at the same time surprisingly independent of the
choice of the observables $F_j$. The relation [18] has
far-reaching consequences: for instance, if $n = 1$ and
$F_1 = \partial_{\varphi_i} \sigma(x;\Phi)$ the relation [18] has been used to
derive the mentioned Onsager reciprocity and
Green-Kubo formulas at $\Phi = 0$.
Equation [18] can be read as follows: the
probability that the observables $F_j$ follow given
evolution patterns $\varphi_j$, conditioned to entropy creation
rate $p\sigma_+$, is the same as the probability that they
follow the time-reversed patterns if conditioned to entropy
creation rate $-p\sigma_+$. In other words, to change the sign of
time, it is sufficient to reverse the sign of the
entropy creation rate: no “extra effort” is needed.

For more details, the reader is referred to Sinai 
(1972, 1994), Evans et al. (1993), Gallavotti and 
Cohen (1995), Gallavotti (1996, 1999), Gallavotti 
and Ruelle (1997), Gallavotti et al. (2004), and 
Bonetto et al. (2005). 


Fractal Attractors, Pairing, 
and Time Reversal 


Attracting sets (i.e., sets which are the closure of 
attractors) are fractal in most dissipative systems. 
However, the chaotic hypothesis assumes that 
fractality can be neglected. Apart from the very 
interesting cases of systems close to equilibrium, in 
which the closure of an attractor is the whole phase 
space (under the chaotic hypothesis, i.e., if the 
system is Anosov), hence not fractal, serious 
problems arise in preserving validity of the fluctua- 
tion theorem. 

The reason is very simple: if the attractor closure 
is smaller than phase space, then it is to be expected 
that time reversal will change the attractor into a 
repeller disjoint from it. Thus, even if the chaotic 
hypothesis is assumed, so that the attracting set 
A can be considered a smooth surface, the motion 
on the attractor will not be time-reversal symmetric 
(as its time-reversal image will develop on the 
repeller). One can say that an attracting set with 
dimension lower than that of phase space in a time- 
reversible system corresponds to a spontaneous 
breakdown of time-reversal symmetry. 

It has been noted, however, that there are classes
of systems, forming a large set in the space of
evolutions depending on a parameter $\Phi$, in which
geometric reasons imply that if beyond a critical
value $\Phi_c$ the attracting set becomes smaller than
phase space, then a map $I_P$ is generated mapping the
attractor $A$ into the repeller $R$, and vice versa, such
that $I_P^2$ is the identity on $A \cup R$ and $I_P$ commutes
with the evolution: therefore, the composition $I \cdot I_P$
is a time-reversal symmetry (i.e., it anticommutes
with evolution) for the motions on the attracting set
$A$ (as well as on the repeller $R$).

In other words, the time-reversal symmetry in
such systems “cannot be broken”: if spontaneous
breakdown occurs (i.e., $A$ is not mapped into itself
under time reversal $I$), a new symmetry $I_P$ is
spawned and $I \cdot I_P$ is a new time-reversal symmetry
(an analogy with the spontaneous violation of time
reversal in quantum theory, where time reversal $T$ is
violated but $TCP$ is still a symmetry: so $T$ plays the
role of $I$ and $CP$ that of $I_P$).

Thus, a fluctuation relation will hold for the 
phase-space contraction of the motions taking place 
on the attracting set for the class of systems with the 
geometric property mentioned above (technically, 
the latter is called “axiom C” property). 

This is interesting but it still is quite far from
being checkable even in numerical experiments.
There are nevertheless systems in which a “pairing
property” also holds: this means that, considering
the case of discrete-time maps $S$, the Jacobian matrix
$\partial_x S(x)$ has $2N$ eigenvalues that can be labeled,
in decreasing order, $\lambda_N(x), \ldots, \lambda_{(1/2)N}(x), \ldots, \lambda_1(x)$,
with the remarkable property that
$(1/2)(\lambda_{N-j}(x) + \lambda_j(x)) \stackrel{def}{=} \alpha(x)$ is $j$-independent. In such systems, a
relation can be established between phase-space
contractions in the full phase space and on the
surface of the attracting set: the fluctuation theorem
for the motion on the attracting set can therefore
be related to the properties of the fluctuations of
the total phase-space contraction measured on the
attracting set (which includes the contraction transversal
to the attracting set) and if $2M$ is the
attracting set dimension and $2N$ is the total
dimension of phase space it is, in the analyticity
interval $(-p^*, p^*)$ of the function $\zeta(p)$,

$$\zeta(-p) = \zeta(p) - p \frac{M}{N} \sigma_+$$  [19]

which is an interesting relation. It is however very
difficult to test in mechanical systems because in
such systems it seems very difficult to make the field
strong enough to see an attracting set thinner than the
whole phase space and still observe large
fluctuations.


For more details, the reader is referred to Dettmann
and Morriss (1996) and Gallavotti (1999).


Nonequilibrium Ensembles 
and Their Equivalence 


Given a chaotic system, the collection of the SRB 
distributions associated with the various control 
parameters (volume, density, external forces,...) 
forms an “ensemble” describing the possible sta- 
tionary states of the system and their statistical 
properties. 

As in equilibrium, one can imagine that the
system can be described equivalently in several
ways, at least when the system is large (“in the
thermodynamic” or “macroscopic limit”). In nonequilibrium,
equivalence can be quite different and
more structured than in equilibrium because one can
imagine changing not only the control parameters
but also the thermostatting mechanism.

It is intuitive that a system may behave in the
same way under the influence of different thermostats:
the important phenomenon is the extraction
of heat and not the way in which it is extracted
from the system. Therefore, one should ask when
two systems are “physically equivalent,” that is,
when the SRB distributions associated with them
give the same statistical properties for the same
observables, at least for the very few observables
which are macroscopically relevant. The latter may
be a few more than the usual ones in equilibrium
(temperature, pressure, density, etc.) and include
currents, conductivities, viscosities, etc., but they
will always be very few compared to the (infinite)
number of functions on phase space.

As an example, consider a system of $N$ interacting
particles (say hard spheres) of mass $m$ moving in a
periodic box $C_L$ of side $L$ containing a regular array
of spherical scatterers (a basic model for electrons in
a crystal) which reflect particles elastically and are
arranged so that no straight line exists in $C_L$ which
avoids the obstacles (to eliminate obvious constants
of motion). An external field $E u$ also acts along the
$u$-direction: hence, the equations of motion are

$$m \ddot x_i = f_i + E u - \vartheta_i$$  [20]

where $f_i$ are the interparticle forces and those
between scatterers and particles, and $\vartheta_i$ are the
thermostatting forces. The following thermostat
models have been considered:


1. $\vartheta_i = \nu \dot x_i$ (viscosity thermostat),

2. immediately after elastic collision with an obstacle
the velocity is rescaled to a prefixed value
$\sqrt{3 k_B T / m}$ for some $T$ (Drude’s thermostat),

3. $\vartheta_i = \frac{E u \cdot \sum_j \dot x_j}{\sum_j \dot x_j^2} \, \dot x_i$ (Gauss’ thermostat).


The first two are not reversible, or at least not
manifestly so, because the natural time reversal,
that is, change of velocity sign, is not a symmetry
(there might, however, be more hidden, hitherto
unknown, symmetries which anticommute with
time evolution). The third is reversible and time
reversal is just the change of the velocity sign. The
third thermostat model generates a time evolution in
which the total kinetic energy $K$ is constant.
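
The conservation of kinetic energy under the Gauss thermostat can be illustrated numerically. The sketch below assumes the standard isokinetic form of the thermostat force, $\vartheta_i = (E u \cdot \sum_j \dot x_j / \sum_j \dot x_j^2)\, \dot x_i$, and considers only the motion between collisions, where for hard spheres the forces $f_i$ do no work; the function names and the sample velocities are hypothetical.

```python
def gauss_thermostat_forces(velocities, field):
    """theta_i = (field . sum_j v_j / sum_j v_j^2) v_i, with field = E u as a 3-vector."""
    total = [sum(v[k] for v in velocities) for k in range(3)]
    speed_sq = sum(v[k] ** 2 for v in velocities for k in range(3))
    alpha = sum(field[k] * total[k] for k in range(3)) / speed_sq
    return [tuple(alpha * c for c in v) for v in velocities]


def kinetic_power(velocities, field):
    """Power sum_i v_i . (field - theta_i) delivered by the field plus the thermostat."""
    thetas = gauss_thermostat_forces(velocities, field)
    return sum(
        v[k] * (field[k] - th[k])
        for v, th in zip(velocities, thetas)
        for k in range(3)
    )
```

The returned power is the rate of change of the total kinetic energy between collisions, and it vanishes identically for any choice of velocities with nonzero total kinetic energy, which is why this thermostat keeps $K$ fixed.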

Let $\mu'_\nu$, $\mu''_T$, $\mu'''_K$ be the SRB distributions for the
system in a container $C_L$ with volume $|C_L| = L^3$ and
density $\rho = N/L^3$ fixed. Imagine tuning the values
of the control parameters $\nu, T, K$ in such a way that
$\langle \text{kinetic energy} \rangle_\mu = \mathcal{E}$, with the same $\mathcal{E}$ for
$\mu = \mu'_\nu, \mu''_T, \mu'''_K$, and consider a local observable $F(X, \dot X) > 0$
depending only on the coordinates of the particles
located in a region $\Lambda \subset C_L$. Then a reasonable
conjecture is that

$$\lim_{L \to \infty,\; N/L^3 = \rho} \frac{\langle F \rangle_{\mu'_\nu}}{\langle F \rangle_{\mu''_T}} = \lim_{L \to \infty,\; N/L^3 = \rho} \frac{\langle F \rangle_{\mu'''_K}}{\langle F \rangle_{\mu''_T}} = 1$$  [21]

if the limits are taken at fixed $F$ (hence at fixed $\Lambda$
while $L \to \infty$). The conjecture is an open
problem: it illustrates, however, the kind of questions
arising in nonequilibrium statistical mechanics.

For more details, the reader is referred to Evans 
and Sarman (1993), Gallavotti (1999), and Ruelle 
(2000). 


Outlook 


The subject is (clearly) at a very early stage of 
development. 


1. The theory can be extended to stochastic thermo- 
stats quite satisfactorily, at least as far as the 
fluctuation theorem is concerned. 

2. Remarkable works have appeared on the theory 
of systems which are purely Hamiltonian and 
(therefore) with thermostats that are infinite: 
unfortunately, the infinite thermostats can be 
treated, so far, only if their particles are “free” at 
infinity (either free gases or harmonic lattices). 

3. The notion of entropy turns out to be extremely 
difficult to extend to stationary states and there 
are even doubts that it could be actually 
extended. Conceptually, this is certainly a major 
open problem. 

4. The statistical properties of stationary states out of 
equilibrium are still quite mysterious and surpris- 
ing: some exactly solvable models have appeared 
recently, and attempts have been made at unveil- 
ing the deep reasons for their solubility and at 
deriving from them general guiding principles. 

5. Numerical simulations have given a strong 
impulse to the subject; in fact, one can even say 
that they created it: introducing the model of 
thermostat as an extra microscopic force acting on 
the particles and providing the first reliable results 
on the properties of systems out of equilibrium. 
Simulations continue to be an essential part of the 
effort of research on the field. 

6. Approach to stationarity leads to many impor- 
tant questions: is there a Lyapunov function 
measuring the distance between an evolving 
state and the stationary state towards which it 
evolves? In other words, can one define an
analog of Boltzmann’s H-function? About this
question there have been proposals and the answer
seems affirmative, but it does not seem possible
to find a universal, system-independent
function of this kind (the search for it is related to the problem
of defining an entropy function for stationary
states: its existence is at least controversial, see the
sections “Nonequilibrium thermodynamics” and
“Chaotic hypothesis”).

7. Studying nonstationary evolution is much harder.
The problem arises when the control parameters
(force, volume,...) change with time and the
system “undergoes a process.” As an example one
can ask how irreversible a given irreversible process
is, in which the initial state $\mu_0$ is a
stationary state at time $t = 0$, and the external
parameters $\Phi_0$ start changing into functions $\Phi(t)$
of $t$ and tend to a limit $\Phi_\infty$ as $t \to \infty$. In this case,
the stationary distribution $\mu_0$ starts changing and
becomes a function $\mu_t$ of $t$ which is not stationary
but approaches another stationary distribution $\mu_\infty$
as $t \to \infty$. The process is, in general, irreversible
and the question is how to measure its “degree of
irreversibility”: for simplicity we restrict attention
to very special processes in which the only
phenomenon is heat production because the
container does not change volume and the energy
also remains constant, so that the motion can be
described at all times as taking place on a fixed
energy surface. A natural quantity $\mathcal{I}$ associated
with the evolution from an initial stationary state
to a final stationary state through a change in the
control parameters can be defined as follows.
Consider the distribution $\mu_t$ into which $\mu_0$ evolves
in time $t$, and consider also the SRB distribution
$\mu_{SRB,t}$ corresponding to the control parameters
“frozen” at the value at time $t$, that is, $\Phi(t)$. Let
the phase-space contraction, when the forces are
“frozen” at the value $\Phi(t)$, be $\sigma_t(x) = \sigma(x; \Phi(t))$. In
general $\mu_t \neq \mu_{SRB,t}$. Then,

$$\mathcal{I}(\{\Phi(t)\}, \mu_0, \mu_\infty) = \int_0^\infty \left( \mu_t(\sigma_t) - \mu_{SRB,t}(\sigma_t) \right)^2 dt$$  [22]

can be called the degree of irreversibility of the
process: it has the property that in the limit of
infinitely slow evolution of $\Phi(t)$, for example, if
$\Phi(t) = \Phi_0 + (1 - e^{-\gamma t}) \Delta$ (a quasistatic evolution
on timescale $\gamma^{-1}$ from $\Phi_0$ to $\Phi_\infty = \Phi_0 + \Delta$),
the irreversibility degree $\mathcal{I}_\gamma \to 0$ as $\gamma \to 0$ if (as in the case
of Anosov evolutions, hence under the chaotic
hypothesis) the approach to a stationary state is
exponentially fast at fixed external forces $\Phi$. The
quantity $\mathcal{I}$ is a time scale which could be
interpreted as the time needed for the process to
exhibit its irreversible nature.


The entire subject is dominated by the initial
insights of Onsager on classical nonequilibrium
thermodynamics, which concern the properties of
the infinitesimal deviations from equilibrium (i.e.,
averages of observables differentiated with respect
to the control parameters $\Phi$ and evaluated at $\Phi = 0$).
The present efforts are devoted to studying properties
at $\Phi \neq 0$. In this direction, the classical theory
certainly provides firm constraints (like Onsager
reciprocity or Green-Kubo relations or the fluctuation-
dissipation theorem) but at a technical level, it gives
little help to enter the terra incognita of nonequilibrium
thermodynamics of stationary states.

For more details, the reader is referred to 
Kurchan (1998), Lebowitz and Spohn (1999), 
Maes (1999), Eckmann et al. (1999), Bonetto 
et al. (2000, 2005), Eckmann and Young (2005), 
Derrida et al. (2001), Bertini et al. (2001), Evans 
and Morriss (1990), Evans et al. (1993), Goldstein 
and Lebowitz (2004), and Gallavotti (2004). 


See also: Adiabatic Piston; Chaos and Attractors; 
Ergodic Theory; Lie, Symplectic, and Poisson Groupoids 
and Their Lie Algebroids; Macroscopic Fluctuations and 
Thermodynamic Functionals; Nonequilibrium Statistical 
Mechanics: Dynamical Systems Approach; Quantum 
Dynamical Semigroups; Random Dynamical Systems. 
Time Evolution of Infinite-Particle
Systems 


A preliminary problem in the rigorous study of
nonequilibrium statistical mechanics is to give a
precise sense to the time evolution of infinitely
extended systems. In fact, statistical mechanics deals
with systems composed of a very large number of
bodies (of the order of $10^{23}$) and studies the
properties of such systems which are related to
their large number of degrees of freedom. Mathematically,
this aspect is stressed by introducing the
so-called “thermodynamical limit,” that is, by
defining and analyzing systems with infinite degrees
of freedom. For particle systems, the problem can be
formulated in the following way. A phase point of
the system is an infinite sequence $\{(x_i, v_i)\}_{i \in \mathbb{N}}$ of the
positions and velocities of the particles, and its time
evolution is characterized by the solutions of the
Newton equations:

$$m \ddot x_i(t) = \sum_{j \in \mathbb{N},\, j \neq i} F(x_i(t) - x_j(t)), \quad i \in \mathbb{N}$$  [1]

where $m$ is the mass of each particle, $F(x) = -\nabla \Phi(x)$,
and $\Phi$ is a two-body potential. Equation [1] must be
completed by the initial data $\{(x_i(0), v_i(0))\}_{i \in \mathbb{N}}$. The
time evolution of a phase point implies in a natural
way the time evolution of functions on the phase
space, which are the observables to be compared with
experiments.

The existence of a solution to eqn [1] is not
obvious, because the classical theorem of existence
and uniqueness for the Cauchy problem of the
Newton equations depends on the number of
degrees of freedom of the system. The main
difficulty is that a priori the time evolution can
bring infinitely many particles into a bounded region
within a finite time, so that the right-hand side of
eqn [1] becomes meaningless. Without any hypothesis
on the initial conditions, this can happen, as
shown by the following simple example. Consider a
system of free (noninteracting) particles moving
on the real line with initial conditions $x_i = i$, $v_i = -i$,
$i \in \mathbb{N}$. It is clear that at time $t = 1$ all the particles
are at the origin. To forbid this “collapse,” we must
restrict the allowed initial conditions, but we cannot
be too drastic. For instance, we could surely avoid
these pathologies by choosing the initial velocities
uniformly bounded and the initial distribution of
particles locally finite. But the set of such data is
exceptional with respect to the Gibbs state (as can be
easily shown using the fact that, at equilibrium, the velocities are
independent identically distributed Gaussian variables).
In conclusion, we must construct the dynamics for initial
conditions which are chosen in a set sufficiently large to
be the support of states of interest from a thermodynamical
point of view.
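
The collapse of free particles can be made concrete with a finite truncation. The sketch below assumes the initial data $x_i = i$, $v_i = -i$ for $i = 1, \ldots, n$, so that $x_i(t) = i(1 - t)$; the function names are hypothetical and the truncation to $n$ particles is only for illustration.

```python
def free_positions(n_particles, t):
    """Positions x_i(t) = x_i + v_i * t with x_i = i and v_i = -i, for i = 1..n."""
    return [i + (-i) * t for i in range(1, n_particles + 1)]


def count_in_ball(positions, radius):
    """Number of particles whose distance from the origin is at most radius."""
    return sum(1 for x in positions if abs(x) <= radius)
```

At $t = 0$ a ball of radius 1/2 around the origin contains no particle, while at $t = 1$ it contains all $n$ of them, for every $n$: in the infinite system the local density is therefore unbounded at $t = 1$.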

The difficulty of the problem increases with the
spatial dimension $d$, as shown by the following
example. Let the potential $\Phi$ be smooth enough and
short range and assume that, initially, the velocities
and the density are bounded, that is,

$$\sup_i |v_i| < \infty, \qquad \sup_{u \in \mathbb{R}^d,\, R > 1} \frac{N(X; u, R)}{R^d} < \infty$$  [2]

where $X = \{(x_i, v_i)\}_{i \in \mathbb{N}}$ is the particle configuration and
$N(X; u, R)$ is the number of particles in the ball of radius
$R$, centered at $u$. If $V(t)$ denotes the modulus of the
maximal velocity carried by the particles during the time
$[0, t]$ and $X(t)$ the evolved configuration, the conservation
of the number of particles yields

$$N(X(t); u, R_0) \leq N(X(0); u, R(t)) \leq \mathrm{const.}\, R(t)^d$$  [3]

where

$$R(t) = R_0 + \int_0^t ds\, V(s)$$  [4]

On the other hand, $V(s)$ is controlled by the force,
which turns out to be bounded by $\sup_u N(X(s); u, r)$,
where $r > 0$ is the range of the potential. By virtue of
eqns [3] and [4], we arrive at the integral inequality:

$$R(t) \leq R_0 + \mathrm{const.}\, t + \mathrm{const.} \int_0^t ds\, R(s)^d$$  [5]

which is solvable globally in time only if $d = 1$.
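
The role of the dimension in [5] can be seen on the comparison equation obtained by replacing the inequality with an equality and differentiating. The sketch below assumes, purely for illustration, that both constants are equal to a single value `c`, which gives $R' = c(1 + R^d)$; it uses the closed-form solutions for $d = 1$ and $d = 2$, and the function names are hypothetical.

```python
import math


def bound_d1(r0, c, t):
    """Solution of R' = c (1 + R), R(0) = r0: finite for every t."""
    return (r0 + 1.0) * math.exp(c * t) - 1.0


def blowup_time_d2(r0, c):
    """Time at which the solution of R' = c (1 + R^2), R(0) = r0, diverges."""
    return (math.pi / 2.0 - math.atan(r0)) / c


def bound_d2(r0, c, t):
    """Solution of R' = c (1 + R^2), valid only for t < blowup_time_d2(r0, c)."""
    if t >= blowup_time_d2(r0, c):
        raise ValueError("comparison solution has already blown up")
    return math.tan(c * t + math.atan(r0))
```

For $d = 1$ the bound grows exponentially but stays finite, whereas for $d = 2$ the comparison solution diverges at a finite time. This only means that the a priori estimate [5] stops giving information, not that the true dynamics necessarily breaks down.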

In the case of interest from a thermodynamical
point of view, we also need to allow fluctuations of
the density and velocities, which add further
difficulties. The existence, uniqueness, and locality
of the motion have been established in dimension $d = 1$ for
almost all relevant interactions (Lanford 1968,
Dobrushin and Fritz 1977), and in dimension $d = 2$
for interactions not too singular at the origin (Fritz
and Dobrushin 1977). (This does not cover, for
instance, the hard-core interactions, where it is still
an open problem to investigate whether the
dynamics evolves toward a close-packing situation.)
Finally, in dimension $d = 3$, the result has recently
been proved only for bounded, non-negative, finite-
range interactions (Caglioti et al. 2000).

We state the result for the three-dimensional case.
Let the interaction $\Phi$ depend only on the mutual
distance, be twice differentiable, positive at the
origin and, for the moment, also non-negative and
compactly supported. We assume that the initial
data have bounded local energies and densities, with
at most logarithmic divergences in velocities and
densities. More precisely, we define

$$Q(X; u, R) = \sum_{i \in \mathbb{N}} \chi(|x_i - u| < R) \left\{ \frac{m v_i^2}{2} + \frac{1}{2} \sum_{j: j \neq i} \Phi(x_i - x_j) + 1 \right\}$$  [6]

where $\chi(A)$ denotes the characteristic function of the
set $A$ so that eqn [6] gives the energy and density
contained in a ball centered at $u$ with radius $R$.
Define

$$Q_\alpha(X) = \sup_u \sup_{R: R > \psi_\alpha(u)} \frac{Q(X; u, R)}{R^3}$$  [7]

where $\alpha > 0$ and

$$\psi_\alpha(x) = [\log(e + |x|)]^\alpha, \quad x \in \mathbb{R}^3$$  [8]

We denote by $\mathcal{X}_\alpha$ the set of the phase points $X$ such
that $Q_\alpha(X) < \infty$. It is possible to prove that for any
$\alpha > 1/3$, $\mathcal{X}_\alpha$ has full measure with respect to any
Gibbs measure.

We define the partial dynamics $t \to X^{(n)}(t)$ as the
solutions to eqn [1] obtained by neglecting all the
particles which are initially outside the ball of radius
$n$ and centered at the origin.


Theorem If $X \in \mathcal{X}_\alpha$ there exists a unique flow
$t \to X(t) \in \mathcal{X}_{3\alpha/2}$ satisfying eqn [1] with $X(0) = X$.
Moreover, the partial dynamics locally converges to
$X(t)$ as $n \to \infty$.


The result has been extended to bounded super- 
stable long-range interactions. The (nontrivial) proof 
is based on several steps: we introduce a mollified
version of the local energy and study its evolution in
time under the partial dynamics. The energy 
conservation allows us to prove that the local energy 
grows at most as the cube of the maximal velocity. 
On the other hand, a suitable time average allows us 
to control the maximal velocity via the local energy 
in an appropriate way. The result is achieved by 
letting n — oo. 


Long-Time Behavior 


Existence and locality of the dynamics is only a first, 
preliminary, step. The next and much more subtle 
question concerns the asymptotic (in time) and the 
statistical properties of the motion. Here, the main 
problem is the absence of simple but nontrivial 
models. Let us explain this point by a comparison 
with the situation in equilibrium statistical 
mechanics. In this case, even the simplest model,
the free-particle system, exhibits all the relevant
thermodynamical properties of real systems away
from the critical regime. In fact, the effort is
often reduced to rigorously proving that the real
systems away from the critical region behave as a
free-particle system. The presence of the interaction
is instead essential to describe phase transitions.

In the case of nonequilibrium statistical mechanics
there are very few solvable models (free particles,
chain of oscillators, hard-core system in one dimension),
and typically they do not capture the essential
properties of the real systems. For example, let us
consider a system which is close to equilibrium and
ask whether it converges to the corresponding Gibbs
state. Two possible mechanisms usually come together:
the dispersive properties of the matter (by which
perturbations “escape” to infinity) and the mixing
properties (by which perturbations are “spread” and
disappear). The former is present also in the free-particle
system, being responsible for its ergodic properties. The
latter requires a deep analysis of the dynamics of
interacting-particle systems and it is too difficult to be
analyzed except in rare cases.

We just mention the case of systems with 
instantaneous interaction, which are simple enough 
to be studied but nevertheless exhibit a nontrivial 
long-time behavior. We recall in particular the
famous Sinai billiard: a particle moving freely in
a two-dimensional torus except for elastic collisions
with the boundary of a convex obstacle. As proved
by Sinai (1970), this system has strong ergodic
properties. Sinai’s billiard can be proved to be
equivalent to the “Lorentz gas” in which the
obstacles are placed in a periodic way.
Bunimovich and Sinai (1981) proved that when 
the obstacles are close enough to each other, the 
diffusive (weak) limit of the particle motion is the 
Wiener process. This remarkable result gives a 
rigorous derivation of Brownian motion from a 
Hamiltonian system. 

More recently, similar questions have been inves- 
tigated in the case of a charged particle subject to a 
constant electric field and interacting with a medium 
described by a particle system. Several rigorous 
results have been obtained on this subject. We only 
recall those by Boldrighini and Soloveitchik (1995, 
1997). In the context of a simplified model, the 
asymptotic motion of the charged particle is 
described as a drift plus a Brownian motion, and 
the Einstein relation between the drift and the 
diffusion constant is established. 


Mean-Field Limit 


The validity of any model is related to some 
approximation limit. In statistical mechanics, we 


encounter one of the most important ones, the 
“thermodynamical limit,” used to stress the effect of 
large number of particles. Here we briefly discuss the 
“mean-field limit.” For the kinetic, Boltzmann—Grad 
limit, see Boltzmann Equation (Classical and Quan- 
tum) and Kinetic Equations. 

We consider $N$ particles of mass $m$ mutually
interacting via the force $F$. The equations of motion are

$$m \ddot x_i(t) = \sum_{j = 1, \ldots, N;\, j \neq i} F(x_i(t) - x_j(t)), \qquad (x_i(0), \dot x_i(0)) = (x_i, v_i), \qquad i = 1, \ldots, N$$  [9]


We consider a system with $N$ very large, the mass $m$
of each particle very small, and the interaction very
weak. An interesting situation arises when the
quantities $N$, $m$, and $F$ are linked by the relations

$$m = \frac{M}{N}, \qquad F = \frac{G}{N^2}$$  [10]

for some function $G$. Of course, $M$ is the total mass
of the system.
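
In a mean-field scaling of this kind the acceleration of each particle is an average over all the others and therefore stays bounded as $N$ grows. The sketch below assumes units in which $M = 1$ and an acceleration of the form $(1/N) \sum_{j \neq i} G(x_i - x_j)$ in one space dimension; the bounded odd function `smooth_force` is a hypothetical choice of $G$ made only for illustration.

```python
import math


def smooth_force(r):
    """Hypothetical bounded, odd pair force G(r); any bounded G would do here."""
    return -r * math.exp(-r * r)


def mean_field_accelerations(positions, pair_force=smooth_force):
    """a_i = (1/N) * sum_{j != i} G(x_i - x_j) for a one-dimensional system."""
    n = len(positions)
    return [
        sum(pair_force(xi - xj) for j, xj in enumerate(positions) if j != i) / n
        for i, xi in enumerate(positions)
    ]
```

Doubling the number of particles by duplicating a configuration leaves the accelerations unchanged in this sketch, which is the finite-$N$ counterpart of the statement that the force field depends only on the empirical measure.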

We are interested in investigating the limit $N \to \infty$.
We assume that the initial data are chosen in a way
that the empirical measure $N^{-1} \sum_{j=1}^N \delta_{x_j} \delta_{v_j}$ weakly
converges (as $N \to \infty$) to the absolutely continuous
measure $f_0(x,v)\,dx\,dv$ with some smooth density
$f_0(x,v)$. We ask whether at some positive time $t > 0$
the empirical measure $N^{-1} \sum_{j=1}^N \delta_{x_j(t)} \delta_{v_j(t)}$ weakly
converges to $f(x,v,t)\,dx\,dv$ with a density $f(x,v,t)$
satisfying some limiting evolution equation.

Formally, it is easy to find this equation: by the
Liouville theorem, a continuous medium in which
each point moves under the action of an acceleration
field behaves as an incompressible fluid. The
continuity equation becomes

$$\partial_t f(x,v,t) + v \cdot \nabla_x f(x,v,t) + E(x,t) \cdot \nabla_v f(x,v,t) = 0, \qquad f(x,v,0) = f_0(x,v)$$  [11]

where

$$E(x,t) = \int dy\, G(x - y)\, \rho(y,t)$$  [12]

$$\rho(x,t) = \int dv\, f(x,v,t)$$  [13]


This equation can be studied by following the
characteristics, for which it suffices to look at the
pair of functions

$$(x,v) \mapsto (X(x,v,t), V(x,v,t)), \qquad f_0(x,v) \mapsto f(x,v,t)$$

where $(x,v) \in \mathbb{R}^d \times \mathbb{R}^d$ and $t \in \mathbb{R}$, solutions of

$$\dot X(x,v,t) = V(x,v,t), \qquad \dot V(x,v,t) = E(X(x,v,t), t)$$
$$X(x,v,0) = x, \qquad V(x,v,0) = v$$  [14]
$$f(X(x,v,t), V(x,v,t), t) = f_0(x,v)$$

This is a weak formulation of eqn [11], in the sense
that any smooth solution to eqn [11] satisfies eqn
[14], but this last equation is meaningful also for
nonsmooth functions. This is a weak version of the
Vlasov equation and its measure solutions will play
an important role in the sequel.

Equations [11]-[14] are called Vlasov equations, 
after Vlasov, who first introduced them in plasma 
physics. They have a Hamiltonian structure and 
conserve several quantities: the total mass, the total 
energy, the Liouville measure dx dv, and in general 
each moment of this measure. 

The existence and uniqueness of the solutions 
has been studied in many papers. Two cases have 
to be considered, depending on whether the total 
mass 


$$M = \int dx\, dv\, f_0(x,v)$$  [15]


is finite or not. We start with the first case. If the 
interaction G is bounded, the analysis is easy. On 
the other hand, in plasma physics one deals with 
the Coulomb interaction, which is singular at the 
origin. In this case (where eqn [11] is usually 
called the Vlasov—Poisson equation), existence and 
uniqueness can still be proved, but it is not 
straightforward, especially in dimension d=3. 
The case with the complete Lorentz force, also 
taking into account the relativistic effect, is much 
more difficult. 

For infinite total mass, the problem has been 
solved recently in three (or lower) dimensions for 
bounded, non-negative, finite-range interactions, 
and in two dimensions for singular Helmholtz 
interactions. 

Another way to relate the Vlasov equation to
particle systems is to consider the usual
transition from microscopic to macroscopic evolutions,
based on a separation between microscopic
and macroscopic scales. Here the force
between the particles is due to a long-range pair
interaction of the Kac type, in which the range
parameter tends to infinity as the ratio $\varepsilon^{-1}$
between the macroscopic and the microscopic spatial scales:
$F^{(\varepsilon)} = -\varepsilon^{2d+1} G(\varepsilon x_i - \varepsilon x_j)$. Finally, the mass of
the particles is proportional to $\varepsilon^d$: $m = \varepsilon^d$. After
rescaling space and time by a factor $\varepsilon$, in the
macroscopic variables $(\tau, r) = (\varepsilon t, \varepsilon x)$, the
equations of motion (eqn [9]) become





dr; 
a D8 Oi) [16] 
j:jFi 


Then eqn [14] is the limiting equation as $\varepsilon \to 0$.


Other Models 


We mention another model of larger interest. We 
introduce it in the simplest formulation, leaving 
possible generalizations to the reader. 

We consider an infinite chain of anharmonic 
oscillators, with Hamiltonian H given by 


\[ H(q,p) = \sum_{i \in \mathbb{Z}} \left[ \frac{p_i^2}{2} + a q_i^4 + b \sum_{j:\,|i-j|=1} (q_i - q_j)^2 + c q_i^2 + d \right] \qquad [17] \]

where $q_i, p_i \in \mathbb{R}$, a > 0, b, c, d > 0.

When a = 0, it reduces to the well-known chain of 
harmonic oscillators, which is integrable and widely 
studied in the literature. 

The time evolution defined by the Hamiltonian in 
eqn [17] exists and it is unique for initial data 
chosen in a set large enough to be the support of 
any reasonable thermodynamic (equilibrium or 
nonequilibrium) state. This can be achieved by 
proving integral inequalities for the “Lyapunov 
function” 


It is interesting to note that uniqueness holds only in
a class of data such that the position of the ith
oscillator does not increase too much as $|i| \to \infty$.
For example, besides the stationary solution
$q_i(t) = 0$, $i \in \mathbb{Z}$, we can construct a different solution
corresponding to the same initial conditions
$q_i(0) = 0$, $p_i(0) = 0$, $i \in \mathbb{Z}$. In fact, by imposing
go(t)=t* and q;,(t)=q_j(t), we can solve recursively 
the equations of motion and obtain a nonzero
solution $q_i(t)$, which however increases
superexponentially as $|i| \to \infty$.

Hamiltonian dynamical systems (classical or
quantum) are surely quite faithful descriptions of
real systems, but they are too difficult to study.
The main obstacle is that it is not known how to prove good
dynamical mixing for deterministic evolutions with
many degrees of freedom. Therefore, stochastic
evolutions have been introduced to model real
systems. More precisely, one renounces a full
description of the microscopic dynamics and introduces
simplified models in which the effects of the
“hidden degrees of freedom” are taken into account
by adding suitable stochastic forces. Many useful
results have been obtained, which show that these
stochastic model systems exhibit a macroscopic
behavior much closer to that observed in nature.
The main criticism concerns the role of stochasticity,
which in these models is introduced ab initio. In
other words, if one believes that the statistical
properties of the deterministic motion on the small
scale determine the collective behavior of systems
with many degrees of freedom, then these properties
have to be proved for a true understanding of
nonequilibrium phenomena.


See also: Adiabatic Piston; Boltzmann Equation 
(Classical and Quantum); Fourier Law; Kinetic Equations; 
Nonequilibrium Statistical Mechanics (Stationary): 
Overview; Nonequilibrium Statistical Mechanics: 
Interaction between Theory and Numerical Simulations. 
Introduction


Nonequilibrium statistical mechanics concerns a
wide range of fundamental problems and applications.
Perturbative methods are quite effective for
approaching weakly nonlinear problems, usually
relying upon effective coarse-grained equations.
The attempt to obtain a microscopic description
of genuinely nonlinear problems demands the
combined use of theoretical methods and numerical
simulations. The prototypical case is the numerical
experiment performed by Fermi, Pasta, and Ulam
in 1955. As we discuss in the following section, the
main questions that had inspired this experiment
remained unanswered for a long time,
while new puzzling problems emerged. Despite its
apparent failure, the Fermi-Pasta-Ulam (FPU)
experiment represents a remarkable example in
the history of science of how a good guess may be
the source of many fruitful achievements. Some of
them are discussed in the section on energy
relaxation in nonlinear chains, where we summarize
the present understanding of the very slow
relaxation mechanism that characterizes the dynamics
of nonlinear chains of oscillators, like the FPU
model, at low energies. Next, we report one further
success of the interplay between theory and
numerics, that is, the formulation of a generalized
fluctuation-dissipation relation for stationary
processes. Finally, we survey the main achievements
concerning the study of anomalous transport
properties in low-dimensional systems. In particular,
we focus our attention on heat conduction
in nonlinear lattices. In the absence of a general
hydrodynamic theory, computer simulations and
theoretical arguments have greatly contributed,
in this case too, to clarifying the general scenario,
unveiling surprising aspects which, up to a few years
ago, were completely unexpected.


The Numerical Experiment by Fermi, 
Pasta, and Ulam 


The impressive progress of electronic technology 
during World War II made possible the design of the 
first digital computers. The equally impressive 
budgets for their production and maintenance 
could only be justified by their employment in 
classified military research. Nonetheless, some of 
the outstanding scientists involved in these 
researches, like E Fermi, immediately realized the 
great potential of these new machines for tackling 
also some fundamental problems in basic science. 

Fermi had in his mind a crucial and still open 
physical problem. In 1914 the Dutch physicist 
P Debye had suggested that the finiteness of thermal 
conductivity in crystals should be due to the 
nonlinear forces acting among the constituent 
atoms. Forty years later a microscopic theory of 
transport processes, including nonlinear effects, was 
still lacking. Actually, technical difficulties pre- 
vented a theoretical approach based on analytic 
methods. Numerical integration of the equations of 
motion by a digital machine appeared to Fermi as 
an effective way for tackling this problem. In 
collaboration with the mathematician S Ulam and the 
physicist J Pasta, Fermi used MANIAC 1 (a proto- 
type digital computer installed at Los Alamos National 
Laboratories, USA) for integrating the dynamical 
equations of the simplest mathematical model of 
an anharmonic crystal: a chain of N harmonic oscilla- 
tors, coupled by nonlinear forces. Its Hamiltonian 
reads 


\[ H = \sum_{i=1}^{N} \left[ \frac{p_i^2}{2} + \frac{\omega^2}{2}(q_{i+1} - q_i)^2 + \frac{\alpha}{3}(q_{i+1} - q_i)^3 + \frac{\beta}{4}(q_{i+1} - q_i)^4 \right] \qquad [1] \]

where $\omega$ is the harmonic frequency, while $\alpha$ and $\beta$
are the positive coupling constants of the nonlinear
terms. The integer space index i labels the oscillators
along the chain, while $q_i$ and $p_i$ are the displacement
from the equilibrium position and the momentum of
the ith oscillator, respectively. The potential energy
is the general form taken by any nonlinear interaction
potential, when expanded, up to fourth order,
around its equilibrium position. This choice guarantees
the boundedness of trajectories for any finite
energy.


Accordingly, the model contains the minimal 
basic ingredients, needed for testing the conjecture 
about the finiteness of thermal conductivity. 

The equations of motion 


\[ \dot{q}_i = \frac{\partial H}{\partial p_i}, \qquad \dot{p}_i = -\frac{\partial H}{\partial q_i} \qquad [2] \]


were integrated numerically by an algorithm, where 
space and time derivatives were approximated by 
proper finite-difference expressions. 
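
As a rough modern illustration of such a finite-difference integration, the following Python sketch advances the chain with a velocity-Verlet step. This is our choice of scheme, not the algorithm used on MANIAC 1, whose details are not given here. We assume unit masses, fixed ends, and a bond potential $\omega^2 r^2/2 + \alpha r^3/3 + \beta r^4/4$ in the relative displacement r; the function names are hypothetical.

```python
def fpu_forces(q, alpha, beta, omega=1.0):
    # Force on each oscillator, with fixed walls q_0 = q_{N+1} = 0 (assumed).
    ext = [0.0] + list(q) + [0.0]
    forces = []
    for i in range(1, len(ext) - 1):
        right = ext[i + 1] - ext[i]   # stretch of the bond to the right
        left = ext[i] - ext[i - 1]    # stretch of the bond to the left
        forces.append(omega ** 2 * (right - left)
                      + alpha * (right ** 2 - left ** 2)
                      + beta * (right ** 3 - left ** 3))
    return forces

def fpu_energy(q, p, alpha, beta, omega=1.0):
    # Total energy: kinetic part plus the potential summed over all bonds.
    ext = [0.0] + list(q) + [0.0]
    kinetic = 0.5 * sum(pi * pi for pi in p)
    potential = 0.0
    for i in range(len(ext) - 1):
        r = ext[i + 1] - ext[i]
        potential += omega ** 2 * r ** 2 / 2 + alpha * r ** 3 / 3 + beta * r ** 4 / 4
    return kinetic + potential

def verlet_step(q, p, dt, alpha, beta, omega=1.0):
    # Velocity Verlet: half kick, drift, half kick.
    f = fpu_forces(q, alpha, beta, omega)
    p_half = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]
    q_new = [qi + dt * ph for qi, ph in zip(q, p_half)]
    f_new = fpu_forces(q_new, alpha, beta, omega)
    p_new = [ph + 0.5 * dt * fn for ph, fn in zip(p_half, f_new)]
    return q_new, p_new
```

A symplectic scheme is used here because it keeps the energy error bounded over long runs, which matters when one looks for very slow relaxation; monitoring `fpu_energy` along the trajectory is a simple sanity check on the time step.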

The choice of the initial conditions was motivated 
by a further basic question concerning Fermi and his 
collaborators. In fact, they aimed at verifying also a 
common belief that had never been proved rigor- 
ously: in an isolated mechanical system with many 
degrees of freedom (i.e., made of a large number of 
oscillators), a generic nonlinear interaction among 
them should eventually yield equilibrium through 
“thermalization” of the energy. On the basis of 
physical intuition, nobody would object to this 
expectation if the mechanical system would start 
its evolution from an initial state very close to 
thermodynamic equilibrium. Nonetheless, the same 
should be observed by considering an initial state, 
where energy is supplied to a small subset of 
oscillatory modes of the crystal. At variance with a 
finite system of linear oscillators, where each 
initially excited mode keeps its energy constant, 
nonlinear terms should make the energy flow 
towards all oscillatory modes, until thermal equili- 
brium is eventually reached. Thermalization corre- 
sponds to energy equipartition among all the modes. 
This statement has to be interpreted in a statistical 
sense: the time averages of the energies contained in 
the modes converge to the same constant value. But 
if this was the case, one further fundamental aspect 
concerning the evolution towards thermodynamic 
equilibrium could be checked. In the formulation of 
his transport equation, L Boltzmann had conjec- 
tured that thermodynamic irreversibility can emerge 
from microscopic reversible dynamics (which is 
the case of eqns [2]). The paradoxical implication 
of Boltzmann’s conjecture was pointed out by 
H Poincaré, who had proved that any isolated 
Hamiltonian system necessarily evolves towards an 
almost-recurrent dynamics. This is manifestly 
incompatible with the second law of thermody- 
namics, which implies that thermodynamic systems, 
in the absence of a supplied energy flux, have to 
evolve irreversibly towards their equilibrium state. 
In this perspective, the FPU numerical experiment 
was intended to test also if and how equilibrium is 
approached by a relatively large number of non- 
linearly coupled oscillators, obeying the classical 
laws of Newtonian mechanics. Furthermore, the
measurement of the time interval needed for
approaching the equilibrium state, that is, the
“relaxation time” of the chain of oscillators, would
have provided an indirect determination of thermal
conductivity. In fact, according to elementary kinetic
theory, the relaxation time, $\tau_r$, represents an
estimate of the timescale of energy exchanges inside
the crystal: Debye’s argument predicts that thermal
conductivity $\kappa$ is proportional to the specific heat at
constant volume of the crystal, $C_v$, and inversely
proportional to $\tau_r$, in formulas $\kappa \propto C_v/\tau_r$.

Fermi, Pasta, and Ulam considered relatively short 
chains, up to 64 oscillators — a size that already 
challenged the limits of the computational power of 
MANIAC 1. They imposed fixed boundary condi- 
tions (i.e., the particles at the chain boundaries 
interact with infinite mass walls) and the energy was 
initially stored just in one of the long-wavelength 
oscillatory modes. 
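
The statement that the energy is stored in a single long-wavelength mode can be made concrete by projecting the chain onto the normal modes of the harmonic part. The Python sketch below assumes fixed ends, unit masses, and the standard sine-mode basis with frequencies $\omega_k = 2\omega \sin[\pi k/(2(N+1))]$; these conventions and the function name are ours, introduced only for illustration.

```python
import math

def mode_energies(q, p, omega=1.0):
    # Harmonic energy E_k = (P_k^2 + w_k^2 Q_k^2) / 2 of each sine mode
    # for a chain with fixed ends (assumed convention).
    n = len(q)
    norm = math.sqrt(2.0 / (n + 1))
    energies = []
    for k in range(1, n + 1):
        shape = [math.sin(math.pi * i * k / (n + 1)) for i in range(1, n + 1)]
        q_k = norm * sum(qi * s for qi, s in zip(q, shape))
        p_k = norm * sum(pi * s for pi, s in zip(p, shape))
        w_k = 2.0 * omega * math.sin(math.pi * k / (2 * (n + 1)))
        energies.append(0.5 * (p_k ** 2 + w_k ** 2 * q_k ** 2))
    return energies
```

For an initial displacement proportional to $\sin[\pi i/(N+1)]$ with zero momenta, all the harmonic energy returned by this function sits in the first entry; following the entries in time is how equipartition, or its absence, would be diagnosed.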

A very surprising and unexpected scenario 
showed up. Contrary to any intuition, the energy 
did not flow to the higher modes, but was 
exchanged only among a small number of long- 
wavelength modes, before flowing back almost 
exactly to the initial state, thus yielding a recurrent 
behavior. 

Although nonlinearities were at work, neither a 
tendency towards thermalization, nor a mixing rate 
of the energy could be identified. The dynamics 
exhibited regular features very close to those of an 
integrable system. 

Fermi guessed that they were facing a very
important result, but he was also quite disappointed
by the difficulty of finding a convincing explanation.
Lacking one, he decided not to publish the
results in a scientific journal, and they remained
confined to a Los Alamos report for almost a
decade. In fact, Fermi died in November 1954, shortly
before the report was issued in 1955.

The results were finally published in 1965, in a
volume containing his collected papers (Fermi et al.
1965), and they immediately raised renewed
interest in the scientific community. Despite the
failure to answer all the questions that had been
raised, the FPU numerical experiment represents a
crucial scientific achievement, which stimulated
much subsequent scientific progress. The implications
for nonequilibrium statistical mechanics will be widely
discussed in the following sections. Here, we want to
conclude by mentioning the important developments,
inspired by the FPU experiment, that led to
the discovery of solitons by Zabusky and Kruskal
in 1965.


Slow and Fast Energy Relaxation 
in Nonlinear Chains 


The results of the FPU numerical experiment
indicate that the energy initially supplied to
long-wavelength oscillatory (Fourier) modes remains
localized for a very long time in a small subset of
long-wavelength modes. This time can be exceedingly
longer than any typical timescale of the model
(e.g., $\omega^{-1}$, i.e., the inverse of the harmonic frequency
in [1]). An explanation of this apparently bizarre
scenario has been sought by combining theoretical
approaches with numerical studies. Since a complete
account of the many contributions in this direction
is beyond the scope of this text, we shall
summarize the two main lines along which this
problem has been considered.


The Resonance-Overlap Criterion 


The almost-recurrent behavior of single-mode exci- 
tations studied in the FPU experiment can be 
explained by the resonance-overlap criterion, intro- 
duced in 1959 by the Russian scientist B Chirikov. 
Moreover, this criterion provides a quantitative 
estimate of the value of the energy density, above 
which the regular motion observed in the FPU 
experiment should be definitely lost. 

In order to provide the reader with an illustration 
of this criterion, we have to introduce a few simple 
mathematical ingredients. 

The Hamiltonian [1] can be rewritten in terms of
linear normal Fourier coordinates, $(Q_k(t), P_k(t))$, as
follows:

\[ H = \frac{1}{2} \sum_k \left( P_k^2 + \omega_k^2 Q_k^2 \right) + \alpha V_3(\{Q_k\}) + \beta V_4(\{Q_k\}) \qquad [3] \]

Here, we have used the shorthand notation $V_n(\{Q_k\})$
for the lengthy explicit expressions, in the new set of
coordinates, of the nonlinear potentials of [1].

Without loss of generality, we can impose
periodic boundary conditions on the FPU chain: the
frequency of the kth normal mode is given by the
expression $\omega_k = 2 \sin(\pi k/N)$. The coupling constants
$\alpha$ and $\beta$ control the energy exchange among the
normal modes, due to nonlinear interactions.

For the sake of space, we give here a brief sketch
of Chirikov’s criterion for the FPU $\beta$-model (this
model amounts to taking $\alpha = 0$ in [3], i.e., to excluding
the cubic part of the nonlinear potential).

By making reference to the initial conditions of
the FPU experiment, we can consider a single
excited mode, so that the Hamiltonian [3] can be
approximated by the expression in action-angle
variables

\[ H = H_0 + \beta H_1 \approx \omega_k J_k + \frac{\beta}{2N} (\omega_k J_k)^2 \qquad [4] \]

Here, $J_k = \omega_k Q_k^2$ is the action variable. In practice,
this amounts to approximating the original Hamiltonian
by the sum of the harmonic and nonlinear
self-energy of the initially excited mode. In this framework,
$H_0$ and $H_1$ are the unperturbed (integrable)
Hamiltonian and the perturbation, respectively.
Indeed, if the energy is initially attributed to mode
k, the following relations hold: $\omega_k J_k \approx H_0 = E$. From
the approximate Hamiltonian [4], one can compute
the nonlinear correction to the linear frequency
$\omega_k$, giving the renormalized frequency $\omega_k^r$:

\[ \omega_k^r = \frac{\partial H}{\partial J_k} = \omega_k + \frac{\beta}{N} \omega_k^2 J_k = \omega_k + \Omega_k \qquad [5] \]

For $N \gg k$ one has

\[ \Omega_k \approx \frac{\beta}{N} H_0\, \omega_k \approx \frac{\beta H_0 k}{N^2} \qquad [6] \]

The distance between two primary resonances, in
the harmonic limit, is given by the expression

\[ \Delta\omega_k = \omega_{k+1} - \omega_k \approx N^{-1} \qquad [7] \]

Consistently with [6], the last approximation is valid
only for small wave numbers ($k \ll N$), that is,
long-wavelength modes.

The “resonance overlap” criterion amounts to
comparing this distance with the frequency shift. In
formulas:

\[ \Omega_k \approx \Delta\omega_k \qquad [8] \]


This equation also yields an estimate of
the “critical” energy density, $\varepsilon_c$, above which
sizeable chaotic regions develop and fast diffusion
takes place in phase space:

\[ \varepsilon_c = \left( \frac{H_0}{N} \right)_c \approx \frac{1}{\beta k} \qquad [9] \]

with $k = O(1) \ll N$. Below $\varepsilon_c$, primary resonances
are weakly coupled and determine a slow relaxation
process to energy equipartition. Above $\varepsilon_c$, due to
“primary resonance” overlap, fast relaxation to
equipartition sets in (Izrailev and Chirikov 1966).
This prediction was later verified numerically by
Chirikov et al. (1973). The presence of a critical
energy density can be tested by measuring the
evolution of the finite time-averaged quantity
$\bar{E}_k(t) = t^{-1} \int_0^t E_k(\tau)\, d\tau$, where $E_k = (P_k^2 + \omega_k^2 Q_k^2)/2$ is
the harmonic energy of the kth mode. For energy
densities much smaller than $\varepsilon_c$, $\bar{E}_k(t)$ exhibits an
extremely slow relaxation towards the equipartition
condition, $\bar{E}_k$ = constant. Conversely, for $\varepsilon > \varepsilon_c$ such
a condition is rapidly approached on a relatively
short timescale. The slow relaxation below $\varepsilon_c$ can be
traced back to the overlap of higher-order resonances:
its typical timescale has been found to be
inversely proportional to a power of the energy
density (Shepelyansky 1997).
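
The overlap estimate can be checked numerically without the small-k expansion. The Python sketch below equates the frequency shift, taken here as $(\beta/N) H_0\, \omega_k$, with the exact spacing $\omega_{k+1} - \omega_k$ for $\omega_k = 2\sin(\pi k/N)$, and solves for $H_0/N$. It is only an illustration of the criterion under these assumptions; the function names are hypothetical.

```python
import math

def mode_frequency(k, n):
    # Harmonic frequency of mode k for periodic boundary conditions.
    return 2.0 * math.sin(math.pi * k / n)

def critical_energy_density(beta, k, n):
    # Overlap condition: (beta / N) * H0 * omega_k = omega_{k+1} - omega_k.
    spacing = mode_frequency(k + 1, n) - mode_frequency(k, n)
    h0 = spacing * n / (beta * mode_frequency(k, n))
    return h0 / n
```

For $k \ll N$ the ratio of spacing to frequency tends to 1/k, so the value returned approaches $1/(\beta k)$; the factors of $2\pi$ dropped in the order-of-magnitude estimates cancel in this form.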


Energy-Equipartition Thresholds 


The first paper reporting evidence of the existence of 
an energy threshold in chains of coupled anharmo- 
nic oscillators had already been published in 1970 
by Bocchieri et al. (1970). This pioneering numerical 
experiment concerned a chain of oscillators coupled 
through a Lennard-Jones interatomic potential. The 
Italian group observed an energy threshold, separat- 
ing a high-energy thermalized regime from a regular 
dynamics regime at low energies (like the one 
observed by Fermi, Pasta, and Ulam). The main 
point raised by this experiment concerns the 
consequences on ergodic theory: the ordered motion 
observed in the low-energy regime seems to violate 
ergodicity, although the model is known to be 
chaotic at any energy. 

This is quite a delicate and widely debated issue
because of its statistical implications. Actually, as we have
mentioned in the previous section, Fermi, Pasta,
and Ulam also expected that a nonlinear dynamical
system with a large number of degrees of
freedom should naturally evolve towards equilibrium.
Further confirmation of the seminal paper
by Bocchieri and co-workers came from more
refined numerical experiments, showing that, for
sufficiently high energies, regular behaviors disappear,
while equipartition among the Fourier modes
sets in rapidly. Later on, the presence of the energy
threshold was characterized by introducing an
appropriate entropy, $S = -\sum_k p_k \ln p_k$ with
$p_k = E_k(t)/E$, which counts the number of effective
Fourier modes involved in the dynamics: at
equipartition, this entropy is maximal (Livi et al. 1985).
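
A minimal Python sketch of this diagnostic is given below. It assumes the weights are the mode energies normalized by their sum, which is our reading of the definition; the function name is hypothetical.

```python
import math

def spectral_entropy(mode_energies):
    # S = -sum_k p_k ln p_k, with p_k the fraction of energy in mode k.
    total = sum(mode_energies)
    entropy = 0.0
    for energy in mode_energies:
        p_k = energy / total
        if p_k > 0.0:          # modes carrying no energy contribute nothing
            entropy -= p_k * math.log(p_k)
    return entropy
```

With all the energy in one mode the entropy vanishes, while for n equally populated modes it equals ln n, so that exp(S) can be read as an effective number of excited modes.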

Nowadays, we know that the approach to
equipartition below and above the energy threshold
is a matter of timescales, which turn out to be very
different in the two regimes. For instance, the
analytic estimate of the maximum Lyapunov exponent
$\lambda$ of the FPU $\beta$-model (Casetti et al. 1995) has
definitely pointed out that there is a threshold value
of the energy density, $\varepsilon_T$, at which its dependence on
$\varepsilon$ changes drastically:

\[ \lambda(\varepsilon) \sim \begin{cases} \varepsilon^{1/4} & \text{if } \varepsilon \gg \varepsilon_T \\ \varepsilon^{2} & \text{if } \varepsilon \ll \varepsilon_T \end{cases} \qquad [10] \]

This implies that the typical relaxation time, that is
$\lambda^{-1}$, may become exceedingly large for very small
values of $\varepsilon$ below $\varepsilon_T$. It is worth stressing that this
result holds in the thermodynamic limit, thus indicating
that the presence of $\varepsilon_T$ is statistically relevant.

A more controversial scenario emerges from the 
studies of the relaxation dynamics for specific 
classes of initial conditions. When a few
long-wavelength modes are initially excited, regular
motion may persist over times much longer than
$\lambda^{-1}$ (De Luca et al. 1995). The excitation of small-
wavelength modes yields an even more complex 
scenario: solitary wave dynamics is observed, fol- 
lowed by slow relaxation to equipartition. It is also 
worth mentioning that some regular features of the 
dynamics persist even at high energies. As we shall 
discuss in the section “Heat transport,” such 
regularities still play a crucial role in determining 
energy transport mechanisms, although they do not 
affect significantly the equilibrium statistical proper- 
ties of the FPU model at high energies. 


The Generalized Fluctuation-Dissipation 
Theorem 


Another fundamental problem of nonequilibrium 
statistical mechanics concerns the possibility of 
establishing a fluctuation—dissipation theorem, gen- 
eralizing the relation valid for equilibrium condi- 
tions. In fact, on this basis one might develop a 
large-deviation formalism, aiming at the identifica- 
tion of an explicit nonequilibrium statistical mea- 
sure, analogous to the equilibrium Boltzmann-Gibbs 
measure. Recently, significant progress has been made
in this direction.

A crucial numerical experiment, which drew
attention to the problem of formulating a
generalized fluctuation-dissipation relation for
stationary flows, was performed at the beginning of the
1990s (Evans et al. 1993). Stationary conditions for
momentum transport were obtained in the shear
flow of a fluid contained between moving walls. The
reversibility of the microscopic dynamics yields the
heuristic fluctuation relation:

\[ \frac{1}{t} \ln \frac{\Pr(R_t = A)}{\Pr(R_t = -A)} = A \qquad [11] \]

where $\Pr(R_t = A)$ is the probability that the average
entropy production rate, $R_t$, along a trajectory
segment of duration t, takes the value A. For
sufficiently large values of t, this relation was
confirmed by numerical analysis.

Gallavotti and Cohen (1995a,b) proved a theorem
meant to put eqn [11] on a rigorous mathematical
basis, that is, the proposed extension to
nonequilibrium steady states of the equilibrium
fluctuation-dissipation theorem. This theorem
concerns the phase-space contraction rate of the
dynamics, which equals the entropy production
rate in the case of particle systems whose internal
energy is a constant of the motion. The proof of
the theorem is based on restrictive hypotheses,
which include the existence of an average
nonvanishing phase-space contraction rate, the
time-reversal invariance of the dynamics, and a strong
form of chaos (the dynamics is assumed to be of
the Anosov type, that is, smooth and uniformly
hyperbolic). Nonetheless, the prediction of the
theorem, that is,


\[ \frac{1}{t} \ln \frac{\Pi_t(p)}{\Pi_t(-p)} = D \langle\sigma\rangle p \qquad [12] \]


is expected to hold much more generally. Here $\Pi_t(p)$
is the probability that a fluctuation variable takes
the value p. The theorem proved by Gallavotti and
Cohen states that $\Pi_t(p)$ has to satisfy the large
deviation relation [12], where $\langle\sigma\rangle$ is the average
phase-space contraction rate over a trajectory segment
of duration t and D is a suitable constant. It
must be pointed out that the rigorous derivation of
this relation provided strong motivation for
investigating its validity and generality in many other
contexts. The first numerical experiment in which
almost all the hypotheses of the Gallavotti-Cohen
theorem were satisfied was performed by
Bonetto et al. (1997). They studied a Lorentz gas
(massive pointlike noninteracting particles bouncing
elastically on circular scatterers placed on a
regular lattice without free horizon) of charged
particles moving in a uniform external electric
field. Numerical simulations were found to be in
very good agreement with [11] and [12] (which, in
this case, refer to the same quantity). One further test
of the fluctuation-dissipation relation was later
performed for a different setup (Lepri et al. 1998).
The FPU $\beta$-model is put in contact at its boundaries
with thermal heat baths of different temperatures $T_+$
and $T_-$ ($T_+ > T_-$). Numerical simulations have been
performed for sufficiently large applied thermal
gradients, which guarantee sizeable fluctuation
effects, suitable for verifying a relation like [11]. It
is worth noticing that many of the
hypotheses of the Gallavotti-Cohen theorem are
not valid for this setup, but eqn [12] is still expected
to hold, although in this case it does not refer to the
entropy production rate. Nonetheless, the extension
[11] of the fluctuation-dissipation theorem can be
tested, thanks to the following useful relation
between the heat flux j and the entropy production
rates, $\zeta_\pm$, at the chain boundaries:


\[ \langle\zeta_+\rangle + \langle\zeta_-\rangle = \langle j \rangle \left( \frac{1}{T_-} - \frac{1}{T_+} \right) \qquad [13] \]


This can be interpreted as a balance relation for the
global entropy production. In fact, according to the
principles of irreversible thermodynamics, the local
rate of entropy production $\sigma$ in the bulk is given by

\[ \sigma(x) = j(x)\, \frac{\partial}{\partial x} \left( \frac{1}{T(x)} \right) \qquad [14] \]

By integrating this equation, one straightforwardly
obtains the previous one, which then applies to the
entropy production from the heat baths. Careful
numerical simulations show that stationary conditions
hold over a wide range of
temperatures and gradients. Equation [13] indicates
that the heat flux is equivalent to the entropy
production rate, apart from a multiplicative constant
which depends on the amplitude of the applied
field.

Let us define the finite-time average of the global
heat flux

\[ J_\tau = \frac{1}{\tau} \int_0^{\tau} dt \sum_i j_i(t) \qquad [15] \]

The normalization of this quantity can be obtained
by computing the asymptotic average value

\[ J_\infty = \lim_{\tau \to \infty} J_\tau \qquad [16] \]

The quantity of statistical interest is the normalized
finite-time average global heat flux

\[ z = \frac{J_\tau}{J_\infty} \qquad [17] \]
Accordingly, the fluctuation-dissipation relation in
this case takes the form:

\[ \ln \frac{P_\tau(z)}{P_\tau(-z)} \to \tau z J_\infty \left( \frac{1}{T_-} - \frac{1}{T_+} \right) \qquad [18] \]

The conjecture that such a relation might be valid in
this case has been confirmed by numerical analysis.
It is worth stressing that, in this out-of-equilibrium
setup, the probability distribution, $P_\tau(z)$, is not
Gaussian and exhibits a peculiar asymmetric shape.
Nonetheless, for increasing values of $\tau$, the
asymmetry progressively reduces, while $P_\tau(z)$ approaches
a Gaussian shape. This observation indicates that, in
this case, large fluctuations deviate from the typical
statistics of independent events.
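
In practice, a relation of this kind is tested by histogramming many finite-time averages and comparing the counts in mirror-image bins. The following Python sketch shows one way to estimate the logarithmic ratio from a list of sampled values of z; the binning scheme and the function name are our assumptions, not a description of the procedure used in the cited simulations.

```python
import math
from collections import Counter

def fluctuation_log_ratio(samples, bin_width):
    # Estimate ln[P(z) / P(-z)] on bins [i*w, (i+1)*w) and their mirror images.
    counts = Counter(math.floor(z / bin_width) for z in samples)
    result = []
    for i in sorted(index for index in counts if index >= 0):
        mirror = -i - 1                      # bin covering [-(i+1)*w, -i*w)
        if counts.get(mirror, 0) > 0:        # skip bins with no negative events
            center = (i + 0.5) * bin_width
            result.append((center, math.log(counts[i] / counts[mirror])))
    return result
```

Plotting the returned pairs against the bin centers and checking for a straight line through the origin, with slope growing linearly in the averaging time, is the kind of test the text refers to. Negative fluctuations become rare as the averaging time grows, so the accessible range of z shrinks accordingly.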
It should be mentioned that generalized
fluctuation-dissipation relations, like those discussed in this
section, have been successfully checked in many other
situations, where the hypotheses of the Gallavotti-
Cohen theorem did not apply. The “robustness” of
relations such as [11] and [12] indicates that a more
general theory may be possible.


Heat Transport 


The validity of Debye’s conjecture about the
necessity of nonlinear forces for obtaining a finite
heat conductivity in crystals still remained an open
problem after the unsuccessful FPU numerical
experiment. The setup described in the previous
section for testing the generalized fluctuation-
dissipation relation in the FPU chain can also be used
for testing this conjecture.
Actually, the thermal conductivity, $\kappa$, of a chain of
oscillators can be measured from Fourier’s law

\[ J_Q = -\kappa \nabla T(x) \qquad [19] \]

where $J_Q$ is the heat current and $\nabla T(x)$ is the
temperature gradient.

This problem was solved analytically for a chain
of N harmonic oscillators (Rieder et al. 1967). The
bulk of the chain is found to reach thermal
equilibrium conditions at the average temperature
$T = (T_+ + T_-)/2$, corresponding to a constant
temperature profile. Only at the chain boundaries does the
harmonic chain exhibit a steep temperature gradient.
This implies that the heat current is proportional
to the temperature difference, rather than to
the temperature gradient, thus violating Fourier’s
law. Accordingly, a harmonic chain of N
oscillators in contact with two heat reservoirs at
different temperatures exhibits anomalous transport
properties, and the effective thermal conductivity
is found to diverge in the infinite-chain limit
as $\kappa \sim N$. This peculiar behavior is a consequence
of the integrability of the harmonic chain
dynamics. Actually, the Fourier modes propagate
with finite velocity through the harmonic chain, so
that any energy injected from the hot reservoir
flows ballistically to the cold one, rather than
diffusing, as required for the validity of [19]. It is
worth stressing that any integrable system should
exhibit a similar scenario. This is the case of the
equal-mass hard sphere gas in one dimension and
of the Toda chain, where the harmonic potential
$(\omega^2/2)(q_{i+1} - q_i)^2$ is replaced by the nonlinear
expression

\[ a \exp[-b(q_{i+1} - q_i)] \]

In the former case, integrability and ballistic 
propagation are straightforward consequences of 


550 Nonequilibrium Statistical Mechanics: Interaction between Theory and Numerical Simulations 


the conservation laws inherent in elastic collisions
between hard spheres. In the latter model, the
nonlinear normal modes, called “Toda solitons,”
are responsible for such anomalous behavior.

Debye’s conjecture should be modified accordingly:
nonintegrability of the equations of motion
has to be invoked as a necessary property for
explaining heat transport in real solids. Let us
observe that the FPU model is known not to be
integrable and it is expected to be a good candidate
for confirming Debye’s conjecture, at least in its
fully chaotic regime. Careful and extended numerical
simulations have shown, however, that the FPU chain
maintains anomalous properties (Lepri et al. 1997).
In particular, the thermal conductivity, $\kappa$, is found
to diverge in the infinite chain limit as

\[ \kappa \sim N^{\gamma} \qquad [20] \]

with $\gamma \approx 2/5$. This value agrees with independent
analytic estimates (e.g., see Lepri et al. (2003)),
although renormalization arguments indicate that
one should rather find $\gamma = 1/3$ (Narayan and
Ramaswamy 2002). This discrepancy could be due
to the peculiar features associated with the presence
of a quartic nonlinearity in the FPU problem and
also to the fact that in the FPU chain heat can be
transported only through longitudinal oscillations.
In any case, this is still an open problem, which
requires further theoretical advances to be solved.
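
An exponent such as the one in the power-law divergence above is typically extracted from simulation data by a linear fit in log-log coordinates. The Python sketch below shows an ordinary least-squares version; the function name is hypothetical, and any data passed to it in an example would be synthetic rather than taken from the cited simulations.

```python
import math

def fit_power_law_exponent(sizes, conductivities):
    # Least-squares slope of ln(kappa) versus ln(N).
    xs = [math.log(n) for n in sizes]
    ys = [math.log(k) for k in conductivities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / s_xx
```

With real data the fitted slope usually drifts with the range of chain lengths included, because finite-size corrections decay slowly; this is one reason why exponents as close as 2/5 and 1/3 are hard to tell apart numerically.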

In a more general perspective, the main outcome 
of these numerical studies indicates that a power- 
law divergence like [20] is found in all one- 
dimensional nonintegrable models. This general 
feature must be attributed to the combined effect 
of low-space dimensionality, with energy and 
momentum conservation. In such a situation, 
fluctuations are strongly constrained, so that the 
evolution of long-wavelength hydrodynamic modes 
is not sufficiently damped, to be ruled by diffusion 
(which is a necessary ingredient for the validity of 
[19]). It must be stressed that these numerical 
investigations have strongly revived the interest for 
this problem. In particular, they have also stimu- 
lated new theoretical efforts for explaining the 
power-law divergence of transport coefficients in 
d=1. One of the main results of these
theoretical approaches is that the power-law
divergence turns into a logarithmic one in d=2,
while the divergence should disappear in $d \geq 3$.
Despite the difficulty of performing the necessary
large-scale simulations for such systems in d > 1, it
seems that numerics essentially agree with such
predictions.

One can find normal transport properties even 
in d=1, if suitable models are considered. For 


instance, momentum conservation can be broken 
by adding to the Hamiltonian [1] a local interac- 
tion potential, U(g;), which breaks translation 
invariance, thus restoring finite heat conductivity 
(e.g., see Casati et al. 1984). The exception to this 
case is the harmonic chain with the addition of a 
local harmonic potential: in this case the dynamics 
is still integrable and there are as many conserved 
quantities as degrees of freedom. A further pecu- 
liar case is represented by the rotator model in 
d=1, which is known to be nonintegrable. Its 
Hamiltonian contains the interaction potential 
e[1 − cos($q_{i+1} - q_i$)], replacing the algebraic potentials 
of the FPU chain. Such a Hamiltonian 
still guarantees momentum conservation, 
since the nearest-neighbor form of the interaction 
is maintained. Notice that, for small oscillations 
around the equilibrium position, also the rotator 
potential admits a Taylor-series expansion, whose 
first three terms correspond to quadratic, cubic, 
and quartic contributions, as in the FPU chain. 
Nonetheless, at variance with the FPU problem, 
the potential of the rotator model is bounded also 
from above. Numerical investigations (Giardinà 
et al. 2000) have shown that, for any finite energy 
density and after a sufficiently long time, 
some previously oscillating rotators start to rotate, 
owing to local energy fluctuations that allow them to 
overcome the potential barrier. These dynamical 
configurations typically appear in the form of 
spatially localized, synchronous rotating clusters. 
Their time evolution is characterized by an 
intermittent behavior: they are eventually reab- 
sorbed by lattice fluctuations and may reappear 
afterwards at other lattice positions. In this way 
they play the role of scattering centers for 
hydrodynamic modes. It must be pointed out that 
such a qualitative argument is not sufficient for 
explaining the onset of a genuine diffusive beha- 
vior, compatible with the validity of Fourier’s law. 
A hydrodynamic theory, still to be developed, 
could provide a more convincing insight on these 
results. 

It is worth concluding this section by mentioning 
that the overall scenario described above is con- 
firmed by numerical studies, relying upon a different 
approach, based on equilibrium measurements. 
Actually, the linear response theory by Green and 
Kubo (see Kubo (1985)) provides an alternative, but 
essentially equivalent, definition of the thermal 
conductivity, according to the expression 


$$ \kappa = \lim_{t\to\infty}\lim_{N\to\infty}\frac{1}{NT^2}\int_0^t d\tau\,\langle J(\tau)J(0)\rangle $$  [21] 

The crucial quantity to be computed numerically is 
the heat-flux time-correlation function $C_J(\tau) = 
\langle J(\tau)J(0)\rangle$, where $\langle\cdot\rangle$ represents the thermodynamic 
equilibrium average. In practice, numerical simulations 
can be performed for a chain of N oscillators 
in contact with boundary heat reservoirs at the same 
temperature $T = T_+ = T_-$. The presence of anomalous 
transport coefficients can be singled out by 
analyzing the long-time behavior of $C_J(\tau)$. It has to 
decay at least as $\tau^{-(1+\epsilon)}$, with $\epsilon > 0$, to yield a finite 
heat conductivity. In one-dimensional models exhibiting 
the power-law divergence [20] one rather 
finds 

$$ C_J(\tau) \sim \tau^{-(1-\gamma)} $$  [22] 


where the positive exponent $\gamma$ is the same one appearing 
in [20]. This relation between space and time 
exponents can be easily explained by considering 
that space and time variables depend linearly on 
each other through a proportionality constant, 
which is the velocity of sound in the lattice. Since 
$0 < \gamma < 1$, the anomalous behavior observed in 
out-of-equilibrium conditions is recovered. 
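
The link between the time exponent in [22] and the size exponent in [20] can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes a pure power-law tail for the correlation function, cuts the Green-Kubo integral off at the sound transit time N divided by the sound speed, and uses hypothetical names (`gamma`, `sound_speed`, `t0`, `c0`) that do not come from the text.

```python
import math


def conductivity_from_tail(gamma, cutoff_time, t0=1.0, c0=1.0):
    """Integrate C_J(tau) = c0 * (tau / t0)**(-(1 - gamma)) from t0 to cutoff_time."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if cutoff_time <= t0:
        raise ValueError("cutoff_time must exceed t0")
    # Closed form of the integral of a pure power law.
    return c0 * t0 * ((cutoff_time / t0) ** gamma - 1.0) / gamma


def effective_size_exponent(gamma, n_small, n_large, sound_speed=1.0):
    """Estimate d(log kappa)/d(log N) between two chain lengths."""
    # Assumption: correlations are cut off at the sound transit time N / sound_speed.
    k_small = conductivity_from_tail(gamma, n_small / sound_speed)
    k_large = conductivity_from_tail(gamma, n_large / sound_speed)
    return math.log(k_large / k_small) / math.log(n_large / n_small)


if __name__ == "__main__":
    # For long chains the estimated exponent approaches gamma itself.
    print(effective_size_exponent(1.0 / 3.0, 1e6, 1e7))
```

For long chains the estimated exponent approaches the exponent of the correlation tail, which is the statement made above in words.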

One major problem in performing proper numer- 
ical studies concerns the control over finite-size 
effects, which demands a consistent increase of the 
integration time with the system size. This may 
yield very extended and expensive computations, 
mainly when very slow relaxation processes set in. 
This is the case of the low-energy regime originally 
studied by FPU in their pioneering computer 
simulations. Numerical analysis indicates that in 
this regime the expected behavior of $C_J(\tau)$, reported 
in eqn [22], sets in after a crossover time $t_c$, which 
grows as an inverse power of the energy density $\epsilon$ as $\epsilon$ decreases. 
This seems to be compatible with the studies 
described earlier. 

We conclude this section by pointing out that this 
result also contributes significantly to clarify one of 
the basic questions raised by the FPU numerical 
experiment. 


See also: Dynamical Systems and Thermodynamics; 
Ergodic Theory; Fourier Law; Gravitational N-Body 
Problem (Classical); Lyapunov Exponents and Strange 
Attractors; Nonequilibrium Statistical Mechanics: 
Dynamical Systems Approach. 
Historical Background 
Ginzburg-Landau Equations 


Nonlinear Schrödinger (NLS) equations have 
become one of the most important nonlinear systems 
studied in mathematics and physics. Actually, one 
can find the essence of NLS equations in the early 
work of Ginzburg and Landau (1950) and Ginzburg 
(1956) in their study of the macroscopic theory of 
superconductivity, and also of Ginzburg and Pitaevskii 
(1958), who subsequently investigated the theory of 
superfluidity. 

By minimizing the free energy of a superconductor 
near the superconducting transition, Ginzburg and 
Landau arrived at what are now called the 
Ginzburg-Landau equations: 


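$$ \frac{1}{2m}\left(-i\hbar\nabla - \frac{e}{c}A\right)^2\psi + \alpha\psi + \beta|\psi|^2\psi = 0 $$  [1] 

$$ J = -\frac{ie\hbar}{2m}\left(\psi^*\nabla\psi - \psi\nabla\psi^*\right) - \frac{e^2}{mc}|\psi|^2A $$  [2] 

where $\alpha$, $\beta$ are phenomenological parameters, $A$ the 
electromagnetic vector potential, and $\psi^*$ denotes the 
complex conjugate of $\psi$. The first equation determines 
the field $\psi$ based on the applied magnetic 
field. The second equation provides the superconducting 
current $J$. 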

The equation describing the behavior of super- 
fluid helium near the transition point in the 
stationary case derived in Ginzburg and Pitaevskii 
(1958) is completely analogous to eqn [1] in the 
phenomenological theory of superconductivity. 

Equation [1] contains all the ingredients of the 
NLS equations which are discussed below. How- 
ever, it was not until the 1960s that the wide 
physical importance of NLS equation became 
evident. The next section discusses how the NLS 
equation historically first appeared in the context of 
nonlinear optics. 


Nonlinear Optics: Self-Focusing of Optical Beams 
in Nonlinear Media 


In the mid-1960s, Chiao et al. (1964) and Talanov 
(1964) investigated the conditions under which an 


electromagnetic beam can produce its own dielectric 
waveguide and propagate without spreading. This is 
a reflection of the phenomenon of self-focusing. In 
fact, self-focusing of optical beams may occur in 
materials whose dielectric constant increases with 
field intensity. In the general situation, a beam of 
uniform intensity in a dielectric broadens due to 
diffraction. However, the refractive index of many 
physically important materials (the so-called Kerr 
materials, such as silica) depends on the field 
intensity as follows: 


$$ n = n_0 + n_2|E|^2 + \cdots $$ 


If the term in $|E|^2$ is large enough, the critical angle 
for total internal reflection at the beam’s boundary 
can be greater than the angular divergence due to 
diffraction; thus, spreading does not occur as a 
result of diffraction. As a consequence, a beam 
above a certain critical power level is trapped and 
does not spread. 

In a remarkable contribution, Kelley (1965) 
observed, using computational methods (years 
before computational methods became easy to 
implement and, consequently, so popular) that 
when the self-focusing effect due to the increase in 
the nonlinear index is not compensated by diffrac- 
tion, there is a buildup in intensity of part of the 
beam as a function of the distance in the direction 
of propagation. Consequently, the intensity of the 
self-focused regions tended to become “anoma- 
lously large,” that is, a singularity appeared to 
develop. 

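Consider as the starting equation the electromagnetic 
wave equation in the presence of nonlinearities 
derived earlier by Chiao et al. (1964): 

$$ \nabla^2E - \frac{\epsilon_0}{c^2}\partial_t^2E - \frac{\epsilon_2}{c^2}\partial_t^2\left[(E\cdot E)E\right] = 0 $$  [3] 


where $\epsilon_2|E|^2 \ll 1$. One assumes a linearly polarized 
wave of frequency $\omega$, propagating along the z-axis, 
so that 

$$ E = \tfrac{1}{2}\,\mathcal{E}\,e^{i(kz-\omega t)} + \text{c.c.} $$ 


where c.c. denotes complex conjugation, $k = \epsilon_0^{1/2}\omega/c$, 
the factor $\exp[i(kz - \omega t)]$ represents the propagating 
part, that is, the “carrier,” of the wave, and $\mathcal{E}$ is the 
slowly varying part. Substituting the above expression 
for $E$ into eqn [3], neglecting the third-harmonic 
term and the term $\partial_z^2\mathcal{E}$ from $\nabla^2E$ (assuming it to be 
small), yields 

$$ 2ik\,\partial_z\mathcal{E} + \left(\partial_x^2 + \partial_y^2\right)\mathcal{E} + \frac{3}{4}\frac{\epsilon_2k^2}{\epsilon_0}|\mathcal{E}|^2\mathcal{E} = 0 $$  [4] 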


or, with a suitable rescaling of the dependent and 
independent variables ($\mathcal{E} \to \psi/((3/8)k^2\epsilon_2/\epsilon_0)^{1/2}$, 
$z \to 2kz$), 

$$ i\psi_z + \nabla^2\psi + 2|\psi|^2\psi = 0 $$  [5] 


which is the NLS equation in standard nondimen- 
sional form. 

It should be remarked here that the name NLS 
equation for equations of the form of [5] is natural 
due to the formal analogy with the Schrödinger 
equation in quantum mechanics: 


$$ i\psi_t + \nabla^2\psi + V\psi = 0 $$  [6] 


If one sets $V = 2|\psi|^2$ in eqn [6], the result is the NLS 
equation. In the context of quantum mechanics, a 
nonlinear potential arises in the “mean-field” 
description of interacting particles. 

Modifications of [6] also arise as mean-field 
descriptions of Bose-Einstein condensates, which are 
of keen interest in physics (see Pethick and Smith 
(2002) and references therein). The normalized 
equation is 


ib- V+ (Væ y) +2) 7 


where V is an external potential. This is generally 
referred to as the Gross—Pitaevskii equation. 

Talanov (1965) (see also Zakharov et al. (1971)) 
investigated the behavior of stationary light beams 
in a self-focusing nonlinear medium and found that 
for a purely cubic nonlinearity, “collapse” of the 
beam can take place. The proof that there is a 
singularity in eqn [5] is remarkably straightforward. 
This is discussed in the section “Wave collapse.” In 
order to avoid wave collapse, other physical effects 
(e.g., saturable nonlinearity or dissipation) are 
required. 


Universal Character of the NLS Equation 


It turns out that almost any dispersive, energy- 
preserving system gives rise, in an appropriate limit, 
to the NLS equation. For instance, one can derive 
the NLS from other physically significant equations 
such as the Klein—Gordon equation 


Uy — Uyy +U + ku? =0 
and the Korteweg-de Vries (KdV) equation 
$$ u_t + 6uu_x + u_{xxx} = 0 $$ 


Actually, the NLS equation provides a “canonical” 
description for the envelope dynamics of a quasi- 
monochromatic plane wave (the carrier wave) 
propagating in a weakly nonlinear dispersive med- 
ium when dissipative processes are negligible. 


Nonlinear Schrödinger Equations 553 


Indeed, consider a scalar nonlinear wave equation 
written symbolically as 


$$ L(\partial_t, \nabla)u + G(u) = 0 $$ 


where $L$ is a linear differential operator with 
constant coefficients and $G$ a nonlinear function 
of $u$ and its derivatives. For a real, small-amplitude 
solution of magnitude $\epsilon \ll 1$, the nonlinear 
effects can first be neglected, and the 
equation admits approximate monochromatic 
wave solutions 

$$ u = \epsilon\psi\,e^{i(k\cdot x - \omega t)} + \text{c.c.} $$  [8] 


with small amplitude $\epsilon|\psi|$. Substituting [8] into the 
linear equation, one finds that the frequency $\omega$ 
and the wave vector $k$ are related by the dispersion 
relation 

$$ L(-i\omega, ik) = 0 $$ 

Let 

$$ \omega = \omega(k) $$ 


be one of the solutions of the previous equation. 
Suppose one is interested in a solution $\psi$ which is 
not constant, but slowly varying in space and time. 
This has the interpretation of $k$ having a “sideband” 
wave vector and $\omega$ a “sideband” frequency. More 
precisely, restricting discussion, for simplicity, to the 
(1 + 1)-dimensional case, the slowly varying amplitude 
assumption corresponds to letting 

$$ \psi(x,t) = \psi(X,T) = a\,e^{i(KX - \Omega T)} $$ 


where $X = \epsilon x$ and $T = \epsilon t$. Note that $\epsilon K$ and 
$\epsilon\Omega$ are sometimes referred to as the sideband 
wave number and frequency, respectively, because 
they correspond to a deviation from the central 
wave number $k$ and central frequency $\omega$. Looking at 
these deviations from the point of view of operators, 
whereby $\omega \to i\partial_t$, $k \to -i\partial_x$ and $\Omega \to i\partial_T$, $K \to -i\partial_X$, 
one has 


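$$ \omega_{\text{tot}} \sim \omega + \epsilon\Omega = \omega + i\epsilon\partial_T $$ 

$$ k_{\text{tot}} \sim k + \epsilon K = k - i\epsilon\partial_X $$ 


Then $\omega(k)$ can be expanded in a Taylor series 
around the central wave number as 

$$ \omega(k - i\epsilon\partial_X) \sim \omega(k) - i\epsilon\,\omega'(k)\partial_X - \epsilon^2\frac{\omega''(k)}{2}\partial_X^2 $$ 

Therefore, 

$$ \omega_{\text{tot}}(k)\psi \sim [\omega(k) + i\epsilon\partial_T]\psi \sim \left(\omega(k) - i\epsilon\,\omega'(k)\partial_X - \epsilon^2\frac{\omega''(k)}{2}\partial_X^2\right)\psi $$ 

which shows that, to the leading order, 

$$ i\epsilon\left(\frac{\partial\psi}{\partial T} + \omega'(k)\frac{\partial\psi}{\partial X}\right) + \epsilon^2\frac{\omega''(k)}{2}\frac{\partial^2\psi}{\partial X^2} = 0 $$  [9] 

In the moving frame $\xi = X - \omega'(k)T$, $\tau = \epsilon T = \epsilon^2t$, 
eqn [9] transforms to 

$$ \epsilon^2\left(i\psi_\tau + \frac{\omega''(k)}{2}\psi_{\xi\xi}\right) = 0 $$ 


which is the linear Schrödinger equation with the 
canonical $\omega''(k)/2$ coefficient. On the other hand, if 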
one considers rather general conservative nonlinear 
wave problems with leading quadratic or cubic 
nonlinearity, asymptotic analysis (e.g., multiple 
scale analysis which yields the so-called Stokes- 
Poincaré frequency shift) shows that a wave solution 
of the form 


$$ u(x,t) = \epsilon\psi(\tau)\,e^{i(kx-\omega t)} + \text{c.c.} $$ 


with $\tau = \epsilon^2t$ has $\psi(\tau)$ satisfying 

$$ i\frac{\partial\psi}{\partial\tau} + n|\psi|^2\psi = 0 $$  [10] 


where the constant coefficient $n$ depends on the 
particular equation under study. It should be 
remarked here that cubic nonlinearity yields an 
$O(\epsilon^3)$ contribution, which is balanced by a slow 
timescale of order $\epsilon^2$. Putting the linear and nonlinear 
effects together (i.e., eqns [9] and [10]) implies 
that an NLS equation of the form 

$$ i\frac{\partial\psi}{\partial\tau} + \frac{\omega''(k)}{2}\frac{\partial^2\psi}{\partial\xi^2} + n|\psi|^2\psi = 0 $$ 

naturally arises. The NLS equation is viewed as a 
“universal” equation as it generically governs the 
slowly varying envelope of a monochromatic wave 
train (see also Benney and Newell (1969)). 
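
The two linear ingredients of the envelope equation, the group velocity and the dispersion coefficient, follow directly from the dispersion relation. As a rough sketch, the snippet below estimates both derivatives by central differences. It assumes, for illustration, the linear part of the Klein-Gordon equation quoted earlier in this section, $u_{tt} - u_{xx} + u = 0$, for which $\omega^2 = 1 + k^2$; the function names and the step `h` are hypothetical choices, not part of the article.

```python
import math


def group_velocity_and_dispersion(omega, k, h=1e-4):
    """Return central-difference estimates of omega'(k) and omega''(k)."""
    w_minus, w_mid, w_plus = omega(k - h), omega(k), omega(k + h)
    first = (w_plus - w_minus) / (2.0 * h)            # group velocity omega'(k)
    second = (w_plus - 2.0 * w_mid + w_minus) / h**2  # curvature omega''(k)
    return first, second


def klein_gordon_omega(k):
    """Positive branch of omega^2 = 1 + k^2 (linearized Klein-Gordon)."""
    return math.sqrt(1.0 + k * k)


if __name__ == "__main__":
    v_g, w2 = group_velocity_and_dispersion(klein_gordon_omega, 1.0)
    # The envelope equation then carries the coefficient w2 / 2 in front of
    # the second derivative, in a frame moving with speed v_g.
    print(v_g, w2 / 2.0)
```

For this dispersion relation the exact values are $\omega' = k/\omega$ and $\omega'' = 1/\omega^3$, which the finite differences reproduce to several digits.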


Physical Applications 


The nonlinear propagation of wave packets is 
governed by NLS-type systems in several different 
branches of scientific and technological applications, 
beyond what has been mentioned earlier. Some of 
these applications are discussed below. 


NLS equation in Water Waves 


The NLS equation in the context of small-amplitude 
water waves was derived by Zakharov (1968) 
(infinite depth) and Benney and Roskes (1969) 
(finite depth). The procedure for deriving the NLS 
equation from the Euler-Bernoulli equations of fluid 
dynamics in one horizontal direction will now be 
discussed, under the assumption of small-amplitude 


waves and deep water. The interested reader can 
also find the details of the derivation in Ablowitz 
and Clarkson (2006). The relevant equations are 


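$$ \phi_{xx} + \phi_{zz} = 0, \qquad -\infty < z < \epsilon\eta(x,t) $$  [11] 

$$ \phi_z = 0, \qquad z \to -\infty $$  [12] 

$$ \phi_t + \frac{\epsilon}{2}\left(\phi_x^2 + \phi_z^2\right) + g\eta = 0, \qquad z = \epsilon\eta $$  [13] 

$$ \eta_t + \epsilon\eta_x\phi_x = \phi_z, \qquad z = \epsilon\eta $$  [14] 


where $\phi(x,z,t)$ is the velocity potential of an ideal 
(i.e., incompressible, irrotational, and inviscid) 
fluid and $\eta(x,t)$ is the free surface of the fluid, which 
is to be found, in addition to $\phi(x,z,t)$. 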

Equation [11] expresses the ideal nature of the 
fluid; the condition [12] expresses the requirement 
that there is no vertical flow at infinity; and eqn [13] 
is the Bernoulli equation of energy conservation. 
Finally, eqn [14] is a kinematic condition stating 
that no flow occurs transverse to the free surface. 

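At the free boundary, for small amplitudes, one 
can expand $\phi = \phi(t,x,\epsilon\eta)$ for $\epsilon \ll 1$ as 

$$ \phi = \phi(t,x,0) + \epsilon\eta\,\phi_z(t,x,0) + \frac{\epsilon^2\eta^2}{2}\phi_{zz}(t,x,0) + \cdots $$ 


and similarly for the derivatives. Second, one 
introduces slow temporal and spatial scales (one 
expects the slowly varying envelope of the wave to 
depend on slow variables $X = \epsilon x$, $Z = \epsilon z$, $T = \epsilon t$). 
Finally, because of the quadratic nonlinearity one 
expects second harmonics to be generated; hence, 

$$ \phi = \left(Ae^{i\theta + |k|z} + \text{c.c.}\right) + \epsilon\left(A_2e^{2(i\theta + |k|z)} + \text{c.c.} + \bar\phi\right) $$ 

$$ \eta = \left(Be^{i\theta} + \text{c.c.}\right) + \epsilon\left(B_2e^{2i\theta} + \text{c.c.} + \bar\eta\right) $$ 


where $A$, $A_2$, $\bar\phi$ depend on $X$, $Z$, $T$ and $B$, $B_2$, $\bar\eta$ 
depend on $X$, $T$ ($\bar\phi$ and $\bar\eta$ are mean contributions, 
which are real) and $\theta = kx - \omega t$ with the dispersion 
relation $\omega^2 = g|k|$. Substituting this ansatz into the 
equations, one obtains from the order-$\epsilon^2$ terms 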


$$ 2i\omega A_\tau - \left(v_g^2A_{\xi\xi} + 4k^4|A|^2A\right) = 0 $$  [15] 

where $v_g = \omega'(k) = g/2\omega$ is the group velocity and the 
new variables are $\tau = \epsilon T$, $\xi = X - v_gT$. 

Equation [15] is the typical formulation of the 
(1 + 1)-dimensional NLS equation found in water 
wave theory for large depth. 

In the section “NLS in nonlinear optics,” a 
special solution to (a rescaled version of) eqn [15], 
namely a soliton solution, is discussed in the 


context of nonlinear optics. It should be 
remarked here that the coefficients of both terms 
$A_{\xi\xi}$ and $|A|^2A$ have the same sign. This is necessary 
for a decaying soliton solution to exist (see, e.g., 
Lighthill (1965)). 


NLS in Nonlinear Optics 


The NLS equation also describes self-compression 
and self-modulation of electromagnetic wave pack- 
ets in weakly nonlinear media. Hasegawa and 
Tappert (1973a, b) first derived the NLS equation 
in the context of fiber optics. Light-wave propaga- 
tion in a fiber is mainly affected by: (1) group 
velocity dispersion (GVD), that is, the frequency 
dependence of the group velocity originating from 
the refractive index of the fiber and (2) fiber 
nonlinearity (the so-called Kerr effect), originating 
from the dependence of the refractive index on the 
intensity of the optical pulse. In the presence of 
GVD and Kerr nonlinearity, the refractive index is 
expressed as 


$$ n(\omega, E) = n_0(\omega) + n_2|E|^2 $$  [16] 


where $\omega$ and $E$ represent the frequency and 
electric field of the light wave, respectively, $n_0(\omega)$ 
is the frequency-dependent linear refractive index, 
and the constant $n_2$, referred to as the Kerr 
coefficient, is “small” but can have significant 
impact since the nonlinear effects accumulate over 
long distances. Normally, the electric field is 
modulated into a slowly varying amplitude of a 
carrier wave: 

$$ E(z,t) = \mathcal{E}(z,t)\,e^{i(k_0z - \omega_0t)} + \text{c.c.} $$  [17] 


where $z$ denotes the distance along the fiber, $t$ the 
time, $k_0 = k_0(\omega_0)$ the wave number, $\omega_0$ the frequency, 
and $\mathcal{E}(z,t)$ the envelope of the electromagnetic 
field. 

A Taylor series expansion of the dispersion 
relation (see also the section “Universal character 
of the NLS equation”) 


$$ k(\omega, E) = \frac{\omega}{c}\left(n_0(\omega) + n_2|E|^2\right) $$ 


around the carrier frequency $\omega = \omega_0$ yields 

$$ k - k_0 = k'(\omega_0)(\omega - \omega_0) + \frac{k''(\omega_0)}{2}(\omega - \omega_0)^2 + \frac{\omega_0n_2}{c}|E|^2 $$  [18] 

where the prime represents the derivative with respect to 
$\omega$ and $k_0 = k(\omega_0)$. Replacing $k - k_0$ and $\omega - \omega_0$ by 
their Fourier operator equivalents, $-i\partial_z$ and $i\partial_t$, respectively, 
using $k_0 = (\omega_0/c)n_0(\omega_0)$, and letting eqn [18] 
operate on $\mathcal{E}$ yields 


$$ i\left(\frac{\partial\mathcal{E}}{\partial z} + k'(\omega_0)\frac{\partial\mathcal{E}}{\partial t}\right) - \frac{k''(\omega_0)}{2}\frac{\partial^2\mathcal{E}}{\partial t^2} + \nu|\mathcal{E}|^2\mathcal{E} = 0 $$  [19] 

where $\nu = \omega_0n_2/(cA_{\text{eff}})$, with $A_{\text{eff}}$ being the effective 
cross-section area of the fiber (the factor $1/A_{\text{eff}}$ 
comes from a more detailed derivation which takes 
into account the finite size of the fiber; it is needed 
in order to account for the variation 
of field intensity in the cross section of the fiber). 
Note that $k'(\omega_0) = 1/v_g$, where $v_g$ represents the 
group velocity of the wave train. Introducing the dimensionless 
variables $t' = t_{\text{ret}}/t_*$, $z' = z/z_*$, $q = \mathcal{E}/\sqrt{P_*}$ 
yields the NLS equation 

$$ i\frac{\partial q}{\partial z'} + \frac{\operatorname{sgn}(-k''(\omega_0))}{2}\frac{\partial^2q}{\partial t'^2} + |q|^2q = 0 $$  [20] 

where $t_*$, $P_*$ are the characteristic time and power, 
respectively, and $t_{\text{ret}} = t - k'(\omega_0)z = t - z/v_g$, $z_* = 
1/(\nu P_*)$, with the constraint that the “nonlinear 
length” is balanced by the linear dispersion time, 
that is, $t_* = (z_*|k''(\omega_0)|)^{1/2}$. 

There are two cases of physical interest depending 
on the sign of $k''(\omega_0)$. The so-called focusing case occurs 
when $k''(\omega_0) < 0$; this is called “anomalous” dispersion. 
The defocusing case occurs when the dispersion is 
“normal”: $k''(\omega_0) > 0$. 

Now write eqn [20] in the form 


$$ iq_t + q_{xx} \pm 2|q|^2q = 0 $$  [21] 


with $\pm$ corresponding to the focusing (+) and 
defocusing (−) case, respectively. The focusing NLS 
equation admits special solutions called “bright” 
solitons (solutions that are traveling localized 
“humps”). A pure one-soliton solution in the 
focusing (+) case has the form 

$$ q(x,t) = \eta\,\operatorname{sech}[\eta(x + 2\xi t - x_0)]\,e^{-i\Theta} $$  [22] 


where $\Theta = \xi x + (\xi^2 - \eta^2)t + \Theta_0$. The parameters $\xi$ 
and $\eta$ are such that $\lambda = \xi/2 + i\eta/2$ is an eigenvalue 
from the inverse scattering transform analysis. 
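
A quick numerical sanity check of the one-soliton formula [22] can be made by inserting it into the focusing form of [21] and evaluating the residual with finite differences. The following is a minimal sketch, not a solver: the function names, the sample parameter values, and the step size `h` are illustrative assumptions.

```python
import cmath
import math


def bright_soliton(x, t, eta=1.0, xi=0.5, x0=0.0, theta0=0.0):
    """One-soliton profile eta*sech[eta(x + 2 xi t - x0)] * exp(-i Theta)."""
    theta = xi * x + (xi**2 - eta**2) * t + theta0
    envelope = eta / math.cosh(eta * (x + 2.0 * xi * t - x0))
    return envelope * cmath.exp(-1j * theta)


def focusing_nls_residual(q, x, t, h=1e-3):
    """Finite-difference value of i q_t + q_xx + 2|q|^2 q at the point (x, t)."""
    q_t = (q(x, t + h) - q(x, t - h)) / (2.0 * h)
    q_xx = (q(x + h, t) - 2.0 * q(x, t) + q(x - h, t)) / h**2
    value = q(x, t)
    return 1j * q_t + q_xx + 2.0 * abs(value) ** 2 * value


if __name__ == "__main__":
    worst = max(
        abs(focusing_nls_residual(bright_soliton, 0.5 * j, t))
        for j in range(-6, 7)
        for t in (0.0, 0.7)
    )
    print(worst)  # small, limited only by the finite-difference error
```

The residual is of the order of the discretization error, whereas changing the sign of the time dependence in the phase produces a residual of order one.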

The defocusing (−) NLS equation does not admit 
solitons that decay at infinity. However, it does admit 
soliton solutions which have a nontrivial background 
intensity (called “dark” and “gray” solitons). A dark-soliton 
solution has the form 

$$ q(x,t) = \eta\tanh(\eta x)\,e^{-2i\eta^2t} $$  [23] 


Note that $q \to \pm\eta$ as $x \to \pm\infty$. A gray-soliton 
solution is 

$$ q(x,t) = \eta\left[1 - B^2\operatorname{sech}^2(\eta B(x - x_0))\right]^{1/2}e^{i\phi(x,t)} $$  [24] 

with 

$$ \phi(x,t) = -\eta^2(3 - B^2)t + \eta\sqrt{1 - B^2}\,x + \tan^{-1}\left[\frac{B\tanh(\eta B(x - x_0))}{\sqrt{1 - B^2}}\right] + \phi_0 $$ 


and $|B| < 1$. Note that as $B \to 1^-$, the gray soliton 
becomes a dark soliton, taking $\phi_0 = -\pi/2$. 
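
The dark and gray profiles can be tested in the same way against the defocusing equation $iq_t + q_{xx} - 2|q|^2q = 0$. The sketch below assumes the time dependence $e^{-2i\eta^2t}$ for the dark soliton and the phase $-\eta^2(3 - B^2)t + \eta\sqrt{1 - B^2}\,x + \tan^{-1}[\ldots] + \phi_0$ for the gray soliton; these are the choices for which the residual vanishes in this normalization, and they should be read as assumptions of the example. All function names and parameter values are illustrative.

```python
import cmath
import math


def dark_soliton(x, t, eta=1.0):
    """Dark soliton eta*tanh(eta x) with an assumed phase exp(-2 i eta^2 t)."""
    return eta * math.tanh(eta * x) * cmath.exp(-2j * eta**2 * t)


def gray_soliton(x, t, eta=1.0, b=0.6, x0=0.0, phi0=0.0):
    """Gray soliton with depth parameter |b| < 1 (phase is an assumption)."""
    c = math.sqrt(1.0 - b * b)
    sech = 1.0 / math.cosh(eta * b * (x - x0))
    amplitude = eta * math.sqrt(1.0 - b * b * sech * sech)
    phase = (-eta**2 * (3.0 - b * b) * t + eta * c * x
             + math.atan(b * math.tanh(eta * b * (x - x0)) / c) + phi0)
    return amplitude * cmath.exp(1j * phase)


def defocusing_nls_residual(q, x, t, h=1e-3):
    """Finite-difference value of i q_t + q_xx - 2|q|^2 q at the point (x, t)."""
    q_t = (q(x, t + h) - q(x, t - h)) / (2.0 * h)
    q_xx = (q(x + h, t) - 2.0 * q(x, t) + q(x - h, t)) / h**2
    value = q(x, t)
    return 1j * q_t + q_xx - 2.0 * abs(value) ** 2 * value


def worst_residual(q):
    """Largest residual over a small grid of sample points."""
    return max(
        abs(defocusing_nls_residual(q, 0.5 * j, t))
        for j in range(-6, 7)
        for t in (0.0, 0.4)
    )


if __name__ == "__main__":
    print(worst_residual(dark_soliton), worst_residual(gray_soliton))
```

Both residuals stay at the level of the finite-difference error, and the gray-soliton background $|q| \to \eta$ is recovered far from the dip.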

Recall that the solutions [23] and [24] can be 
made to travel uniformly by a Galilean 
transformation, that is, by taking into account that if 
$q_1(x,t)$ is a solution of [21], then so is 

$$ q_2(x,t) = q_1(x - vt, t)\,e^{i(kx - \omega t)} $$ 


with $k = v/2$ and $\omega = k^2$. 

It should also be remarked that Ablowitz et al. 
(1997) have shown that, in quadratically nonlinear 
optical materials, more complicated NLS-type equa- 
tions arise. These equations are analogous to the 
finite-depth multidimensional nonlocal NLS-type 
systems derived in the context of water waves by 
Benney and Roskes (1969) and later by Davey and 
Stewartson (1974). 


Optical Communications 


Hasegawa and Tappert (1973) first suggested using 
solitons as the “bit” format for transmission of 
information in optical fiber systems. Motivated by 
this, in 1980, scientists at Bell Laboratories observed 
solitons (described by the NLS equation) in optical 
fibers (Mollenauer et al. 1980). The development of 
optical amplifiers (erbium-doped amplifiers) in the 
mid-1980s provided a mechanism to compensate 
fiber loss, and this permitted the transmission of 
information entirely optically over long distances. 
With damping and amplification included (see, e.g., 
Hasegawa and Kodama (1995)), the NLS equation 
[20] takes the form 


$$ i\frac{\partial q}{\partial z} + \frac{\operatorname{sgn}(-k''(\omega_0))}{2}\frac{\partial^2q}{\partial t^2} + g(z)|q|^2q = 0 $$  [25] 


where $g(z) = a_0^2\exp(-2\Gamma z/z_a)$, $0 < z < z_a$, and periodically 
extended thereafter, and $a_0$ is determined by 

$$ \frac{1}{z_a}\int_0^{z_a} g(z/z_a)\,dz = 1 $$ 


with $z_a = l_a/z_*$, $l_a$ being the amplifier length. 
Remarkably, asymptotic analysis ($z_a \ll 1$) shows 
that, to leading order, $q(z,t)$ still satisfies the NLS 
equation [20]. 

Amplifiers, however, introduce small amounts of 
noise to the system, which causes the temporal 
position of the soliton to fluctuate (cf. Gordon and 
Haus (1986)) and thus limits the distance over which signals can 
be reliably transmitted. Soliton control mechanisms 
were introduced in the early 1990s in order to 
deal with these difficulties (cf. Mecozzi et al. (1991) 
and Kodama and Hasegawa (1992)). 

By the mid-1990s, the development of all optical 
transmission systems began to take great advantage 
of wavelength-division-multiplexing (WDM), that 
is, the simultaneous transmission of multiple 
signals in different frequency (or equivalently 
wavelength) “channels” (Hasegawa 2000). How- 
ever, it was found that a serious problem affected 
WDM systems. Namely, the interactions of soli- 
tons traveling at different velocities cause resonant 
amplifier-induced instabilities in adjacent fre- 
quency channels (four-wave mixing (Mamyshev 
and Mollenauer 1996, Ablowitz et al. 1996)). In 
order to avoid these instabilities, researchers 
developed and analyzed dispersion-managed (DM) 
transmission systems (cf. Hasegawa (2000)). In a 
DM transmission system, the fiber is composed of 
alternating sections of positive (normal) and 
negative (anomalous) dispersion fibers. The 
(dimensionless) NLS equation that governs this 
phenomenon is 

$$ i\frac{\partial q}{\partial z} + \frac{d(z)}{2}\frac{\partial^2q}{\partial t^2} + g(z)|q|^2q = 0 $$  [26] 

where $d(z)$ is usually taken to be a periodic, large, 
rapidly varying function of the form $d(z) = \delta_a + 
\Delta(z)$, with $|\Delta(z)| \gg 1$ and having zero average in 
the period $z_a$ (generally the same as that of the 
amplifier). In fact, asymptotic analysis of [26] 
yields a nonlocal NLS-type equation (Gabitov and 
Turitsyn 1996, Ablowitz and Biondini 1998). It has 
also been shown that eqn [26] admits various types 
of optical pulses, such as DM solitons (Ablowitz 
and Biondini 1998), and quasilinear modes (Ablowitz 
et al. 2001). 


NLS Equation in Other Settings 


Many other interesting applications of the NLS 
equations exist in such different areas of physics as 
magnetic spin waves (see, e.g., the work by Zvezdin 
and Popkov (1983) and also by Kalinikos et al. 
(1997)), plasma physics (cf. the work by Zakharov 
(1972) on collapse of Langmuir waves), other areas 
of fluid dynamics, etc. (the interested reader can 
find an overview in the monograph by Ablowitz 
(1981)). 


Mathematical Framework 


Mathematically, the NLS equation has attained 
broad significance since it is integrable via the 
inverse-scattering transform (IST), admits multisoliton 
solutions, has an infinite number of conserved 
quantities, and possesses many other interesting 
properties. Some of these are discussed below. 


The Inverse-Scattering Transform 


The IST method allows one to linearize a large class 
of nonlinear evolution equations and can be considered 
as a nonlinear version of the Fourier transform. 
An essential prerequisite of the IST method is the 
association of the nonlinear evolution equation with 
a pair of linear problems (Lax pair), a linear 
eigenvalue problem, and a second associated linear 
problem, such that the given equation results as a 
compatibility condition between them. A key 
research breakthrough on NLS systems appeared in 
1972, in the papers of Zakharov and Shabat (1972, 
1973), who first analyzed the scalar NLS equation 
in the form 


$$ iq_t = q_{xx} \pm 2|q|^2q $$  [27] 


($\pm$ correspond to the focusing/defocusing case, 
respectively) and found the associated Lax pair 

$$ v_x = \begin{pmatrix} -ik & q \\ \mp q^* & ik \end{pmatrix} v $$  [28] 

$$ v_t = \begin{pmatrix} 2ik^2 \mp i|q|^2 & -2kq - iq_x \\ \pm 2kq^* \mp iq_x^* & -2ik^2 \pm i|q|^2 \end{pmatrix} v $$  [29] 


where $v(x,t)$ is a two-component vector. The 
compatibility of [28] and [29] yields eqn [27], 
assuming that the eigenvalue parameter $k$ is 
constant in time (so that [27] is often said to be 
isospectral). 

The solution of the initial-value problem of a 
nonlinear evolution equation by IST proceeds in 
three steps, as follows: 


1. the forward problem — the transformation of the 
initial data from the original “physical” variables 
to the transformed “scattering” variables; 

2. time dependence — the evolution of the trans- 
formed data according to simple, explicitly 
solvable evolution equations; and 

3. the inverse problem — the recovery of the evolved 
solution in the original variables from the 
evolved solution in the transformed variables. 


The implementation of steps 1-3 described above is 
more concretely carried out as follows. The initial 
(Cauchy) datum $q(x,0)$ for eqn [27] is mapped into 
scattering data $S(k,0)$ (comprising, in general, discrete 
eigenvalues and associated normalization constants, 
and reflection coefficients) by means of eqn [28]. The 
data $S(k,0)$ are evolved via eqn [29] to get $S(k,t)$ at an 
arbitrary time $t > 0$. Finally, by employing the 
methods of inverse scattering, eqn [28] allows one to 
reconstruct the evolved solution $q(x,t)$ from $S(k,t)$. 

One can easily note the “formal” resemblance to 
the well-known method of Fourier transform for 
linear differential equations. 
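
The three-step structure is easiest to see in the linear analogue. As a hedged illustration, the snippet below solves the linear part of [27], $iq_t = q_{xx}$, on a periodic interval with a plain discrete Fourier transform: forward transform, explicit evolution of each coefficient by the factor $e^{ik^2t}$, and inverse transform. This is not the IST itself; the helper names and the periodic setting are assumptions made for the example.

```python
import cmath
import math


def dft(samples):
    """Forward transform: physical samples -> Fourier coefficients."""
    n = len(samples)
    return [
        sum(samples[j] * cmath.exp(-2j * math.pi * m * j / n) for j in range(n)) / n
        for m in range(n)
    ]


def inverse_dft(coeffs):
    """Inverse transform: Fourier coefficients -> physical samples."""
    n = len(coeffs)
    return [
        sum(coeffs[m] * cmath.exp(2j * math.pi * m * j / n) for m in range(n))
        for j in range(n)
    ]


def wavenumber(m, n, length):
    """Signed wave number associated with DFT index m."""
    signed = m if m <= n // 2 else m - n
    return 2.0 * math.pi * signed / length


def evolve_linear(samples, t, length):
    """Solve i q_t = q_xx on a periodic grid by the three-step recipe."""
    n = len(samples)
    spectrum = dft(samples)                                   # 1. forward problem
    evolved = [
        c * cmath.exp(1j * wavenumber(m, n, length) ** 2 * t)  # 2. time dependence
        for m, c in enumerate(spectrum)
    ]
    return inverse_dft(evolved)                               # 3. inverse problem


if __name__ == "__main__":
    n, length, t = 16, 2.0 * math.pi, 0.3
    grid = [length * j / n for j in range(n)]
    initial = [cmath.exp(2j * x) for x in grid]
    final = evolve_linear(initial, t, length)
    exact = [cmath.exp(1j * (2.0 * x + 4.0 * t)) for x in grid]
    print(max(abs(a - b) for a, b in zip(final, exact)))
```

In the IST the role of the Fourier coefficients is played by the scattering data, and the inverse step is considerably harder than a sum over modes.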

There is considerable literature on the subject and 
the interested reader is encouraged to consult, for 
instance, some of the following references: Ablowitz 
and Segur (1981), Calogero and Degasperis (1982), 
Novikov et al. (1984), Ablowitz and Clarkson 
(1991), Ablowitz et al. (2004). 


Linear Stability Analysis 


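Consider a special solution of eqn [27] in the 
focusing (+ sign) case: $q = a\exp(-2ia^2t)$. If this 
solution is perturbed as 

$$ q(x,t) = a\,e^{-2ia^2t}\,(1 + \epsilon(x,t)) $$ 


where $|\epsilon| \ll 1$, it is found that $\epsilon$ satisfies the 
linearized equation 

$$ i\epsilon_t = \epsilon_{xx} + 2a^2(\epsilon + \epsilon^*) $$ 


On the periodic spatial domain $0 < x < L$, $\epsilon$ has the 
Fourier expansion 

$$ \epsilon(x,t) = \sum_{n=-\infty}^{\infty}\hat\epsilon_n(t)\,e^{i\mu_nx} $$ 

where 

$$ \mu_n = \frac{2\pi n}{L} $$  [30] 


Assuming a solution of the form 

$$ \begin{pmatrix}\hat\epsilon_n \\ \hat\epsilon_{-n}^*\end{pmatrix} = \begin{pmatrix}\alpha \\ \beta\end{pmatrix}e^{i\sigma_nt} $$ 


one finds that $\sigma_n$ satisfies 

$$ \sigma_n^2 = \mu_n^2\left(\mu_n^2 - 4a^2\right) $$  [31] 


It then turns out that when $aL/\pi > n$ the system is 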
unstable. Note that there are only a finite number of 
unstable modes (i.e., for fixed a, L, sufficiently high 
mode numbers n will not satisfy the above inequal- 
ity). In the context of water waves, this corresponds 
to the famous experimental and theoretical result by 
Benjamin and Feir that the Stokes water wave is 
unstable. Later, Benney and Roskes (1969) showed 
that all periodic wave solutions of the generalized 
nonlocal NLS equation resulting from water waves 
in (2 + 1)-dimensions are unstable. Also, in (2 + 1)- 
dimensions soliton solutions are unstable to weak 
transverse modulations. 
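
Reading [31] with the time dependence $e^{i\sigma_nt}$, a Fourier mode grows when $\sigma_n^2 < 0$, that is, when $\mu_n < 2a$. The short sketch below lists the unstable modes and their growth rates for given amplitude and period; the function name and the sample values are illustrative assumptions.

```python
import math


def unstable_modes(a, length):
    """Return (n, growth_rate) for every mode n >= 1 with sigma_n^2 < 0."""
    modes = []
    n = 1
    while True:
        mu = 2.0 * math.pi * n / length
        sigma_sq = mu * mu * (mu * mu - 4.0 * a * a)
        if sigma_sq >= 0.0:
            # mu_n only increases with n, so no higher mode can be unstable.
            break
        modes.append((n, math.sqrt(-sigma_sq)))
        n += 1
    return modes


if __name__ == "__main__":
    for n, rate in unstable_modes(a=1.0, length=10.0):
        print(n, rate)
```

With a = 1 and L = 10 the ratio $aL/\pi$ is about 3.18, so exactly the modes n = 1, 2, 3 are unstable, in line with the finite count noted in the text.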
Wave Collapse 


The equation 

$$ i\psi_t + \Delta\psi + |\psi|^2\psi = 0, \qquad x = (x,y) \in \mathbb{R}^2 $$  [32] 


has the following conserved quantities: 

$$ P = \int |\psi|^2\,dx $$ 

$$ M = \int \operatorname{Im}(\psi^*\nabla\psi)\,dx $$ 

$$ H = \int \left(|\nabla\psi|^2 - \tfrac{1}{2}|\psi|^4\right)dx $$ 


that is, mass (power), momentum, and energy 
(Hamiltonian) are conserved. Remarkably, Talanov 
(1965) showed that solutions of eqn [32] satisfy the following 
equation: 

$$ \frac{\partial^2V}{\partial t^2} = 8H $$  [33] 

where 

$$ V = \int (x^2 + y^2)|\psi|^2\,dx\,dy $$ 


Equation [33] is also known as the “virial” theorem. 
Hence, it follows that 

$$ V = 4Ht^2 + c_1t + c_0 $$ 


and if $H < 0$ initially, then a singularity in eqn [32] 
results since $V$ must be positive. Actually, one can 
further show (see, e.g., C Sulem and P L Sulem 
(1999), and references therein) that there exists a 
time $t^*$ such that 

$$ \int |\nabla\psi|^2\,dx $$ 


becomes infinite as $t \to t^*$, which in turn implies 
that $\psi$ also becomes infinite as $t \to t^*$ (blowup in 
finite time). 
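
The virial argument gives a concrete upper bound on the collapse time: the quadratic $4Ht^2 + c_1t + c_0$ must stay positive, so the solution cannot exist beyond its first positive root. The sketch below computes that root; the function name is hypothetical, and the bound says nothing about the actual blowup time $t^*$ beyond $t^*$ not exceeding it.

```python
import math


def virial_collapse_bound(hamiltonian, c1, c0):
    """Time at which V(t) = 4 H t^2 + c1 t + c0 reaches zero, or None if H >= 0."""
    if c0 <= 0.0:
        raise ValueError("the initial variance c0 must be positive")
    if hamiltonian >= 0.0:
        return None  # the argument only forces a singularity when H < 0
    discriminant = c1 * c1 - 16.0 * hamiltonian * c0  # positive since H < 0 < c0
    # With H < 0 the denominator is negative, so this is the positive root.
    return (-c1 - math.sqrt(discriminant)) / (8.0 * hamiltonian)


if __name__ == "__main__":
    print(virial_collapse_bound(-1.0, 0.0, 4.0))
```

Here $c_0$ is the initial value of $V$ and $c_1$ its initial time derivative, both fixed by the initial data.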

Note also that for the more general equation 

$$ i\psi_t + \Delta_d\psi + |\psi|^{2\sigma}\psi = 0, \qquad x \in \mathbb{R}^d $$ 


where $\Delta_d$ is the d-dimensional Laplacian, one has 
the following types of solutions: 


• Supercritical ($\sigma d > 2$): the solution blows up. 

• Critical ($\sigma d = 2$): blowup can occur or a global 
solution can exist. 

• Subcritical ($\sigma d < 2$): global solutions exist. 


Vector NLS Systems 


In many applications vector NLS (VNLS) systems are 
the key governing equations. Physically, the VNLS 


arise under conditions similar to those described by 
NLS with the additional proviso that there are 
multiple wave trains moving nearly with the same 
group velocities (Roskes 1976). Importantly, VNLS 
also models systems where the field has more than 
one component. For example, in optical fibers and 
waveguides, the propagating electric field has two 
components transverse to the direction of propaga- 
tion. The nondimensional system 


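$$ iq^{(1)}_z = q^{(1)}_{xx} + 2\left(|q^{(1)}|^2 + |q^{(2)}|^2\right)q^{(1)} $$  [34a] 

$$ iq^{(2)}_z = q^{(2)}_{xx} + 2\left(|q^{(1)}|^2 + |q^{(2)}|^2\right)q^{(2)} $$  [34b] 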


is an asymptotic model which governs the propaga- 
tion of the electric field in a waveguide, where z is 
the normalized distance along the waveguide and x 
a transversal spatial coordinate. It was first exam- 
ined by Manakov (1974) (see also Anastassiou et al. 
(1999) and Soljačić et al. (2003)). Subsequently, this 
system was derived as a key model for light-wave 
propagation in optical fibers. More precisely, in 
optical fibers with constant birefringence 
(i.e. constant phase and group velocities as a 
function of distance) Menyuk (1987) has shown 
that the two polarization components of the 
electromagnetic field $\mathbf{E} = (u,v)^T$ which are orthogonal 
to the direction of propagation, z, along the fiber 
asymptotically satisfy the following nondimensional 
equations (assuming anomalous dispersion): 


$$ i(u_z + \delta u_t) + \tfrac{1}{2}u_{tt} + \left(|u|^2 + \alpha|v|^2\right)u = 0 $$  [35a] 

$$ i(v_z - \delta v_t) + \tfrac{1}{2}v_{tt} + \left(\alpha|u|^2 + |v|^2\right)v = 0 $$  [35b] 

where $\delta$ represents the group velocity “mismatch” 
between the $u$, $v$ components of the electromagnetic 
field, $\alpha$ is a constant that depends on the polarization 
properties of the fiber, $z$ the distance along the fiber, and 
$t$ a retarded temporal frame. In deriving eqn [35], it is 
assumed that the electromagnetic field is slowly varying 
(as in the scalar problem); certain nonlinear (four-wave 
mixing) terms are neglected in the derivation of eqn 
[35], because the light wave is rapidly varying due to 
large, but constant, linear birefringence. In this context, 
birefringence means that the phase and group velocities 
of the electromagnetic wave in each polarization 
component are different. In a communications environ- 
ment, due to the distances involved (hundreds to 
thousands of kilometers), the polarization properties 
evolve rapidly and randomly as the light wave evolves 
along the propagation distance, $z$. The birefringence varies randomly on a
scale much shorter than the distances required for
communication transmission (the birefringence polarization
changes on a scale of 10-100 m). In this case, the
relevant nonlinear equation is eqn [35] above, but with
$\delta = 0$ and $\alpha = 1$. Indeed, this is the integrable VNLS
equation first derived by Manakov (1974).

It should be remarked that the VNLS equation 
[34] and its generalization to an arbitrary number of 
components, 


$$i\,\mathbf{q}_t = \mathbf{q}_{xx} + 2\|\mathbf{q}\|^2\, \mathbf{q}$$ [36]


where $\mathbf{q}$ is an $N$-component vector and $\|\cdot\|$ is the
Euclidean norm, are integrable by the IST. One has
to suitably extend the analysis discussed earlier in 
this article (cf. e.g., Ablowitz et al. (2004)). 


Discrete NLS Systems 


Both the NLS and the VNLS equations discussed 
above admit integrable discretizations which, 
besides being used as the basis for constructing 
numerical schemes for the continuous counterparts, 
also have physical applications as discrete systems. 

A natural discretization of NLS [27] is the 
following: 


$$i\,\frac{d}{dt} q_n = \frac{1}{h^2}\left(q_{n+1} - 2 q_n + q_{n-1}\right) + |q_n|^2 \left(q_{n+1} + q_{n-1}\right)$$ [37]


which is referred to as the integrable discrete NLS
(IDNLS). It is an $O(h^2)$ finite-difference approximation
of [27] which is integrable via the IST and has
soliton solutions on the infinite lattice (Ablowitz and
Ladik 1975, 1976). Note that if the nonlinear term in
[37] is changed to $2|q_n|^2 q_n$, the equation, which is
often called the discrete NLS (DNLS) equation, is
apparently no longer integrable. It should be
remarked that the (apparently nonintegrable) DNLS 
equation arises in many important physical contexts. 
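
As a minimal illustration of how [37] and its nonintegrable DNLS variant can serve as the basis of a numerical scheme, the following sketch evaluates the two right-hand sides. The function names and the periodic lattice are assumptions made here for the example (the text considers the infinite lattice), and no time integrator is included.

```python
def idnls_rhs(q, h):
    """Return dq_n/dt for the integrable discrete NLS [37].

    q is a list of complex lattice values; the lattice is closed
    periodically, which is an assumption of this sketch.
    """
    n = len(q)
    out = []
    for j in range(n):
        left, right = q[j - 1], q[(j + 1) % n]
        laplacian = (right - 2 * q[j] + left) / h**2
        nonlinear = abs(q[j]) ** 2 * (right + left)
        # [37] reads i dq/dt = laplacian + nonlinear, hence the factor -i.
        out.append(-1j * (laplacian + nonlinear))
    return out


def dnls_rhs(q, h):
    """Return dq_n/dt when the nonlinear term is replaced by 2|q_n|^2 q_n."""
    n = len(q)
    out = []
    for j in range(n):
        left, right = q[j - 1], q[(j + 1) % n]
        laplacian = (right - 2 * q[j] + left) / h**2
        nonlinear = 2 * abs(q[j]) ** 2 * q[j]
        out.append(-1j * (laplacian + nonlinear))
    return out
```

On a spatially uniform state the discrete Laplacian vanishes and the two nonlinear terms coincide, so both functions return the same values; they differ as soon as neighboring sites differ.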

Correspondingly, one can consider the discretiza- 
tion of VNLS given by the following system: 


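$$i\,\frac{d}{dt} \mathbf{q}_n = \frac{1}{h^2}\left(\mathbf{q}_{n+1} - 2 \mathbf{q}_n + \mathbf{q}_{n-1}\right) + \|\mathbf{q}_n\|^2 \left(\mathbf{q}_{n+1} + \mathbf{q}_{n-1}\right)$$ [38]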


where $\mathbf{q}_n$ is an $N$-component vector. Equation [38]
for $\mathbf{q}_n = \mathbf{q}(nh)$ in the limit $h \to 0$, $nh = x$ gives VNLS
[36]. The discrete vector NLS system [38] is also 
integrable (Ablowitz et al. 1999, Tsuchida et al. 
1999). The interested reader can find further details 
in Ablowitz et al. (2004). 


See also: Boundary-Value Problems for Integrable 
Equations; Dynamical Systems in Mathematical Physics: 
An Illustration from Water Waves; Evolution Equations: 
Linear and Nonlinear; Ginzburg—Landau Equation; 
Integrable Systems and Discrete Geometry; Integrable
Systems: Overview; Partial Differential Equations: Some 
Examples; Riemann—Hilbert Methods in Integrable 
Systems; Schrödinger Operators.
Introduction


The flow of a fluid, liquid or gas, is described by 
three conservation laws, the conserved physical 
quantities being the mass, the linear momentum, 
and the energy, and by constitutive equations. The 
constitutive equations are specific to each fluid, and 
link deformations to stresses. 
A fluid is said to be Newtonian if it satisfies the
simplest constitutive equation, which gives the stress
tensor $\sigma$ as a linear function of the rate of
deformation tensor $D = (1/2)(\nabla u + \nabla u^T)$, namely


$$\sigma = (\lambda \operatorname{tr} D - p)\, I + 2\eta D$$ [1]


where $u$ is the fluid velocity, $p$ is the hydrostatic
pressure ($p > 0$), and $\lambda$ and $\eta$ are the Lamé viscosity
coefficients of the fluid, satisfying $\eta > 0$ and
$\lambda + 2\eta/3 > 0$. The superscript $T$ designates the transpose
operation, the abbreviation “tr” the trace operator
of a tensor, and $I$ the unit tensor. Water and glycerin
are examples of Newtonian liquids.


Non-Newtonian fluids are fluids for which the 
behavior is not described by eqn [1]. Silicone oils, 
polymers (melted or in solution), egg yolks, and 
blood are examples of non-Newtonian liquids. 
Other examples include liquid crystals, rubbers, 
suspensions, paints, etc. 

In the following we shall first describe flows 
which show Newtonian or non-Newtonian 
behaviors. Then we shall describe the requirements 
a constitutive equation needs to satisfy to be 
considered, introducing the notions of continuum 
mechanics we need. After giving the most commonly 
used constitutive equations, we will give a few ideas 
about the mathematical study of the set of equa- 
tions, and their numerical study, in the particular 
case of viscoelastic fluids. 

Numerous kinds of materials are already known
to exist, and more might exist in the future. This
report, however, will be limited to the most
commonly used materials, which are
polymers, liquid crystals and polymeric liquid
crystals, and paints. Moreover, we shall only
consider isothermal flows, even though temperature
might be an important parameter in experiments
or in industry, because most theoretical
or numerical studies concern isothermal problems.

Non-Newtonian fluids will always be liquids, and
we shall use the terms liquid and fluid interchangeably.


Non-Newtonian Behaviors 


We describe a few experiments to show how 
differently both types of fluids, Newtonian or non- 
Newtonian, might react in some experimental 
situations. We also give some mechanical explana- 
tion when possible. 


Shear Thinning or Shear Thickening 


In a Poiseuille experiment, where a fluid flows in 
a tube under the action of a pressure drop, the 
volumetric flow rate of a Newtonian fluid is 
inversely proportional to the constant fluid viscosity. 
Under the same pressure-drop condition, a polymer 
melt flows much faster out of the tube, which means 
that there is a decreasing apparent viscosity with 
increasing shear rate: this is referred to as shear 
thinning effect. Other fluids might exhibit the 
opposite behavior and flow out of the tube more 
slowly: this is called the shear thickening effect. 
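
To make the shear thinning effect quantitative, the following sketch compares the classical Hagen-Poiseuille flow rate of a Newtonian fluid with that of a power-law fluid whose shear stress is $K \dot\gamma^n$. It assumes steady, fully developed laminar flow with no slip at the wall; the function names, the consistency $K$, and the index $n$ are notation introduced here for illustration, not taken from the text.

```python
import math


def newtonian_flow_rate(radius, length, pressure_drop, viscosity):
    """Hagen-Poiseuille law: Q = pi R^4 dP / (8 eta L)."""
    return math.pi * radius**4 * pressure_drop / (8.0 * viscosity * length)


def power_law_flow_rate(radius, length, pressure_drop, consistency, n):
    """Flow rate of a power-law fluid (shear stress = K * shear_rate**n).

    Q = pi n R^3 / (3n + 1) * (R dP / (2 K L))**(1/n); for n = 1 this
    reduces to the Hagen-Poiseuille law with viscosity K.
    """
    wall_stress = radius * pressure_drop / (2.0 * length)
    return (math.pi * n * radius**3 / (3.0 * n + 1.0)) * (
        wall_stress / consistency
    ) ** (1.0 / n)
```

With $n < 1$ the flow rate grows faster than linearly with the pressure drop, which is the behavior described above for polymer melts.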


Rod Climbing 


When a rotating rod is inserted in a beaker filled with 
a Newtonian fluid, it is observed that the liquid near 
the rotating rod is pushed outwards by centrifugal 
force and that a dip on the surface of the liquid near
the rod results. On the contrary, if we make the same 
experiment with a polymer, the fluid climbs along the 
rod. Moreover, for comparable rotation speeds, the
difference in behavior might be quantitatively considerable.
This is explained by totally different
pressure distributions in the two fluids, Newtonian and
non-Newtonian: in particular, the pressure in the
polymer along the rod is much larger than that along 
the beaker, so that this pressure difference fights the 
centrifugal force; this is in contrast with the situation 
in a Newtonian fluid. 


Extrudate Swell 


If a fluid is forced to flow from a large reservoir out 
of a circular tube of small diameter, the swell at the 
exit is much larger for a polymer solution than for a 
Newtonian fluid. A polymer flowing out of a die
might also show a delayed die swell, which means
that the swell is not at the exit but on the jet at a
certain distance from the exit. The explanation of this
phenomenon is not unique: it is due partly to 
memory effects (the fluid remembers its former 
shape, the one in the reservoir), partly to the release 
of normal stresses, to interfacial forces, compressi- 
bility, viscous heating, and the complicated flow 
near the die exit. 


Difference in Normal Stresses 


In a shearing flow of a Newtonian fluid, the two 
normal stress differences are both zero, whereas for 
a polymer the first normal stress difference might be 
very large, the second one being nearly zero. These 
differences in stresses in shearing flow might be a 
partial answer to the extrudate swell and to rod 
climbing experienced by polymers. 


Presence of a Yield Stress 


Some materials, when subjected to shear stress, 
flow only after a critical value is attained. Such 
fluids are referred to as Bingham fluids: some 
cements, slurries, paints, and biological fluids 
might exhibit such a behavior. It is actually a 
well-known property of paints: if put in large 
quantities on a vertical wall, the paint will flow, 
whereas if put as a very thin film on the same wall, 
the paint will not flow, but stay in place, and dry to 
form a nice colored covering. 


Preferred Orientation of the Particles of Fluid 


Fluids with properties as above, Newtonian or 
non-Newtonian, are isotropic in nature, even though 
they are constituted of atoms, or of long chains of 
material. They are the same everywhere, optically, 
magnetically, or electrically. Some fluids, liquid
crystals, or polymeric liquid crystals in particular,
have remarkable properties of anisotropy, being
able to orient themselves, on average, along a 
particular direction: this is the nematic phase, which 
is used in many devices (screens for clocks, hand 
calculators, and cell phones), because the average 
orientation may be changed by applying an electric 
field. Other phases of liquid crystals include smectic 
A, C, and C* phases, where one sees a preferred 
orientation (tilted for C phases) of the fluid, and also 
a layer-like structure. As an example, let us mention 
discotic nematic liquid crystals, which are precursors 
for carbon-based materials, such as fibers, compo- 
sites, and films, which possess excellent mechanical 
and thermal properties. Sails for race sailing boats are 
made of Kevlar, which is one of these new materials 
with remarkable properties. 


Modeling 


The flowing fluid will be described by its (Eulerian)
velocity at time $t$ and position $x$, say $u(x,t)$,
for $x$ belonging to the domain of the flow $\Omega$ and
the time $t$ to $\mathbb{R}_+$, by its mass density $\rho(x,t)$, its
pressure $p(x,t)$ ($p > 0$, defined up to an additive
constant), and its stress $\sigma(x,t)$, which is a
symmetric tensor.

The partial differential equations describing the 
flow are satisfied in the domain of the flow and read 
as follows: 


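$$\frac{\partial \rho}{\partial t} + \operatorname{div}(\rho u) = 0$$
$$\rho \left( \frac{\partial u}{\partial t} + (u \cdot \nabla) u \right) = \operatorname{div} \sigma + f$$ [2]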


where f denotes some external forces applied to the 
fluid. These equations describe the conservation of 
mass and the conservation of linear momentum. To 
close the system, we need a constitutive equation for 
the stress $\sigma$ as well as initial conditions and
boundary conditions. 

Moreover, most non-Newtonian fluids are practi- 
cally incompressible in most regions of the flow, so 
that we shall only consider this case: the first 
equation in [2] is replaced by condition div u =0 in 
the domain of the flow. 


Notions of Continuum Mechanics 


At time $t$, a body $S$ occupies a region $\Omega_t$ of the
Euclidean space $E^3$, called the configuration at time $t$
of the body. Points $p$ of $S$ are called material points
or particles of fluid. The configuration $\Omega_t$
is assumed to be regular in the following sense: $\Omega_t$
is closed, its interior is connected and dense
everywhere, its boundary is piecewise regular, $C^0$ at
least.

A mapping $\Phi : \Omega_0 \to \Omega_t$ is a deformation if $\Phi$ is a
bijection from $\Omega_0$ onto $\Omega_t$ and is a $C^1$-diffeomorphism
from the interior of $\Omega_0$ onto the interior of $\Omega_t$,
with positive Jacobian.

The motion of a body $S$ is given by a set of
deformations $\Pi(t, t') : \Omega_{t'} \to \Omega_t$, satisfying

$$\Pi(t, t) = \mathrm{Id}, \qquad \Pi(t, t') \circ \Pi(t', t'') = \Pi(t, t'')$$

The trajectory of the material point which is in $X$ at
$t_0$ is the set


$$\{\Pi(t, t_0)(X),\ t > t_0\}$$


A body is said to be rigid if the deformation $\Pi(t, t')$
is an isometry for all times $t$ and $t'$. A material point
$p$ is said to be attached to the rigid body $S$ if the
body $\{p\} \cup S$ is rigid.

The motion of a fluid might be described in terms
of the Lagrangian coordinates $X \in \Omega_0$ of each
particle of fluid: $\Omega_0$ is called the reference configuration
and is the fixed configuration occupied by
the body of fluid at the time of reference, say $t_0$. The
motion of the fluid might also be described in terms
of the Eulerian coordinates $x = \chi(X, t)$, which
represent the position at time $t$ of the particle which
has position $X$ at $t_0$. The Lagrangian and Eulerian
coordinates of the same particle of fluid are linked
by the differential equation


$$\frac{\partial \chi}{\partial t}(X, t) = u(\chi(X, t), t) \quad \text{for } t > t_0, \qquad \chi(X, t_0) = X$$
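
The link between Lagrangian and Eulerian coordinates can be explored numerically by integrating this differential equation along a particle path. The sketch below uses a classical fourth-order Runge-Kutta step; the simple shear velocity field, its shear rate, and all function names are hypothetical choices made for illustration.

```python
def shear_velocity(x, t, shear_rate=2.0):
    """Hypothetical Eulerian field: simple shear u = (shear_rate * y, 0)."""
    return (shear_rate * x[1], 0.0)


def trajectory_end(X, t0, t1, steps=100, u=shear_velocity):
    """Integrate dx/dt = u(x, t) from x(t0) = X up to t1 with RK4 steps."""
    h = (t1 - t0) / steps
    x, t = tuple(X), t0
    for _ in range(steps):
        k1 = u(x, t)
        k2 = u(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k1)), t + 0.5 * h)
        k3 = u(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k2)), t + 0.5 * h)
        k4 = u(tuple(xi + h * ki for xi, ki in zip(x, k3)), t + h)
        x = tuple(
            xi + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
        )
        t += h
    return x
```

For the simple shear field the exact path is a straight line at constant height, so the numerical end point can be compared with the closed-form position.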


For defining the constitutive equations, we shall
use a few tensors that we define now. The deformation
gradient is defined by $F(X,t) = \partial \chi(X,t)/\partial X$,
and the right Cauchy-Green tensor by
$C = F^T F$ (also called Cauchy strain). To define
relative tensors, we denote by $y = \chi_t(x, s)$ the
position at time $s \le t$ of the material point which
is at $x$ at time $t$. The relative tensors are defined in
the following way:


• the relative deformation gradient $F_t(s) = \nabla \chi_t(x, s)$,

• the relative right Cauchy-Green tensor $C_t(s) = F_t^T(s) F_t(s)$, and

• the relative Finger tensor $C_t(s)^{-1}$.


Note that the rate of deformation tensor is obtained
as the time derivative of the relative Cauchy strain
tensor:


$$D = \frac{1}{2} \left. \frac{\partial C_t(s)}{\partial s} \right|_{s=t}$$
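
As an illustrative check of the relation between the rate of deformation tensor and the relative Cauchy strain (a worked example added here, not taken from the text), consider the simple shear flow $u = (\dot\gamma x_2, 0, 0)$ with constant shear rate $\dot\gamma$, written in the $(x_1, x_2)$ plane. The symbol $\dot\gamma$ is notation introduced for this example.

```latex
\[
\begin{aligned}
\chi_t(x, s) &= \bigl(x_1 + \dot\gamma\,(s - t)\,x_2,\; x_2\bigr), \qquad
F_t(s) = \begin{pmatrix} 1 & \dot\gamma (s-t) \\ 0 & 1 \end{pmatrix}, \\
C_t(s) &= F_t^T(s)\, F_t(s)
  = \begin{pmatrix} 1 & \dot\gamma (s-t) \\ \dot\gamma (s-t) & 1 + \dot\gamma^2 (s-t)^2 \end{pmatrix}, \\
\frac{1}{2} \left.\frac{\partial C_t(s)}{\partial s}\right|_{s=t}
  &= \frac{1}{2}\begin{pmatrix} 0 & \dot\gamma \\ \dot\gamma & 0 \end{pmatrix}
   = \frac{1}{2}\bigl(\nabla u + \nabla u^T\bigr) = D .
\end{aligned}
\]
```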





Principle of Objectivity and Frame Invariance 


A frame of reference is defined in the spacetime
$E^3 \times \mathbb{R}$ attached to the observer by giving a
chronology and a system of reference. The chron- 
ology is a timescale, which will be assumed to be 
the same for all observers. The system of reference 
is a set of at least four points attached to a rigid 
body (this is the observer), which are not 
coplanar. 

The constitutive equation needs to satisfy the 
principle of frame invariance and of frame indiffer- 
ence (or objectivity), which means that the equation 
does not depend on rigid motions of the observer. In 
the mathematical framework, it means that the
equation has to be invariant under a change of
orthonormal frame of reference $x^* = Q(t)x$, where
$Q(t)$ is an orthogonal tensor: the transformed
equation has to have the same expression, and also
to be frame indifferent. We define a scalar quantity
$\varphi$, a vector field $u$, or a tensor field $\tau$, as being frame
indifferent if, under the change of variables
$x^* = Q(t)x$, they satisfy the relations
$\varphi(x,t) = \varphi^*(x^*, t)$, $u(x,t) = Q(t)^T u^*(x^*, t)$, and
$\tau(x, t) = Q(t)^T \tau^*(x^*, t)\, Q(t)$, respectively.

The velocity gradient $\nabla u$ is not frame indifferent,
but its symmetric part is. The vorticity, which is the
antisymmetric part $W = (\nabla u - \nabla u^T)/2$ of the velocity
gradient, satisfies the equation $W = Q^T W^* Q - Q^T \dot{Q}$,
where the dot denotes the convective derivative
$d/dt = \partial/\partial t + (u \cdot \nabla)$.

Note that the convective derivative of a
scalar function $\varphi$ is frame indifferent, which
means that


$$\frac{d\varphi}{dt}(x, t) = \frac{d\varphi^*}{dt}(x^*, t)$$


but the convective derivative of a vector or a tensor
is not frame indifferent.
It can be easily checked that the derivative

$$\frac{D_0 \tau}{Dt} = \frac{d\tau}{dt} + \tau W - W \tau$$ [3]

of a (frame-indifferent) tensor $\tau$ is frame indifferent,
which means that

$$\frac{D_0 \tau}{Dt} = Q^T\, \frac{D_0 \tau^*}{Dt}\, Q$$
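
The frame indifference of the derivative [3] can be verified in a few lines. The sketch below assumes only the transformation rules $\tau = Q^T \tau^* Q$ for a frame-indifferent tensor and $W = Q^T W^* Q - Q^T \dot{Q}$ for the vorticity, together with the identity $\dot{Q}^T Q + Q^T \dot{Q} = 0$, which follows from differentiating $Q^T Q = I$.

```latex
\[
\begin{aligned}
\frac{d\tau}{dt} &= \dot{Q}^T \tau^* Q + Q^T \frac{d\tau^*}{dt} Q + Q^T \tau^* \dot{Q}, \\
\tau W &= Q^T \tau^* W^* Q - Q^T \tau^* \dot{Q}, \\
W \tau &= Q^T W^* \tau^* Q - Q^T \dot{Q}\, Q^T \tau^* Q
        = Q^T W^* \tau^* Q + \dot{Q}^T \tau^* Q, \\
\frac{d\tau}{dt} + \tau W - W \tau
  &= Q^T \left( \frac{d\tau^*}{dt} + \tau^* W^* - W^* \tau^* \right) Q .
\end{aligned}
\]
```

The terms containing $\dot{Q}$ cancel pairwise, which is exactly why the combination $\tau W - W \tau$ has to be added to the convective derivative.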





To obtain another frame-indifferent derivative of a
tensor $\tau$, we need to start with the expression [3], to
which we may add other terms containing frame-indifferent
quantities, for example, combinations of
$\tau$ and $D$. A derivative which is often considered is
the Oldroyd derivative, as introduced by Oldroyd in
1958:

$$\frac{D_a \tau}{Dt} = \frac{d\tau}{dt} + \tau W - W \tau - a\,(D\tau + \tau D)$$ [4]

where $a$ is a real parameter, chosen in the interval
$[-1, 1]$. (This restriction on $a$ is necessary for
viscometric reasons, and obtained when simple
flows, such as Couette or Poiseuille flows, are
studied.)

The case a = 1 corresponds to the upper convected 
derivative, and the case a=—1 to the lower 
convected derivative. The case a=0 refers to the 
corotational or Jaumann derivative. Derivatives
corresponding to cases $a = -1$, $0$, or $1$ might
actually be obtained by differentiating $\tau$ in a frame
fixed locally to the body of fluid, and which rotates
and/or deforms with the body. Moreover, we shall
see that the derivatives corresponding to a=1 or —1 
have very simple integral expressions. 


Constitutive Equations 


The constitutive equation of a non-Newtonian fluid 
is a nonlinear relationship between the stress tensor 
and objective variables depending on the flow, such 
as the pressure, the rate of deformation, frame- 
indifferent derivatives of such quantities, etc. 

Analogously to the constitutive equation for an
incompressible Newtonian fluid, we may also write
the stress tensor in the form $\sigma = -pI + \tau$. The extra
stress tensor $\tau$ could be either a function of objective
variables, which characterize the flow, or defined by
a differential equation or by an integral equation.
The point here is to model the fact that the fluid 
might have some elasticity or some memory, or 
might experience, for example, yield stress or 
orientational properties. 


Shear dependent viscosity fluids A very simple
generalization of the incompressible Newtonian
fluid consists in making the viscosity dependent on
the rate of deformation tensor, $\eta = \eta(D)$. This
generalization was introduced by O A Ladyzhenskaya
in 1970 and, if the function is chosen
properly, this model reproduces the behavior of
existing fluids, at least in certain parts of their flow.
For power-law fluids, the viscosity depends on the
second invariant $II_D = (1/2) \operatorname{tr} D^2$ of the symmetric
tensor $D$ (the first invariant $\operatorname{tr} D$ is zero because of
incompressibility), and reads as


$$\eta(D) = \eta_0 + m\, II_D^{(n-1)/2}$$ [5]


where $\eta_0 > 0$, $m > 0$, and $n > 0$. If $n = 1$, we recover
the Newtonian case, whereas for $n < 1$ this equation
describes a shear thinning fluid, and for $n > 1$ a shear
thickening fluid. The power law is not valid for $II_D$
close to 0, so that the Carreau-Yasuda law is
preferred:

$$\frac{\eta(D) - \eta_\infty}{\eta_0 - \eta_\infty} = \left[ 1 + (\lambda^2\, II_D)^a \right]^{(n-1)/(2a)}$$ [6]

where $\eta_0$ is the zero-shear rate viscosity, $\eta_\infty$ is the
infinite-shear rate viscosity, $\lambda$ a time constant, $n$ a
dimensionless power-law index, $n > 0$, and $a > 0$ a
parameter (generally equal to 1 for a monomolecular
polymer).
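
The difference between the two laws near zero shear rate is easy to see numerically. The sketch below evaluates a pure power law and the Carreau-Yasuda law in their common shear-rate form, $\eta = \eta_\infty + (\eta_0 - \eta_\infty)[1 + (\lambda \dot\gamma)^a]^{(n-1)/a}$. Writing the laws in terms of a scalar shear rate $\dot\gamma$ rather than the invariant $II_D$ is an assumption of this example, as are the function and parameter names.

```python
def power_law_viscosity(shear_rate, m, n):
    """Pure power law eta = m * shear_rate**(n - 1).

    For n < 1 this diverges as shear_rate -> 0, which is why the law is
    not usable near zero shear rate.
    """
    return m * shear_rate ** (n - 1.0)


def carreau_yasuda_viscosity(shear_rate, eta0, eta_inf, lam, n, a):
    """Carreau-Yasuda law in shear-rate form (an assumed parametrization).

    Tends to eta0 as shear_rate -> 0 and to eta_inf as shear_rate grows
    when n < 1.
    """
    bracket = 1.0 + (lam * shear_rate) ** a
    return eta_inf + (eta0 - eta_inf) * bracket ** ((n - 1.0) / a)
```

Unlike the power law, the Carreau-Yasuda expression stays bounded between the two plateau viscosities for every shear rate.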


Oldroyd models and related models Oldroyd models
are differential models built with one of the
Oldroyd derivatives, and are very commonly used
for polymer solutions or melts. The stress tensor is
given as a solution of a differential equation in the
following way:


$$\tau + \lambda_1 \frac{D_a \tau}{Dt} + g(\tau, D) = 2\eta \left( D + \lambda_2 \frac{D_a D}{Dt} \right)$$ [7]

where $\lambda_1 > 0$ is a relaxation time, $\lambda_2$ is a retardation
time, $0 \le \lambda_2 \le \lambda_1$, and $g(\tau, D)$ is a tensor-valued
function, subject to certain restrictions due to
objectivity, and which is at least quadratic.

The Johnson-Segalman model has $g = 0$ and $-1 \le a \le 1$.
Other models of differential type often
suppose the parameter $a$ to be 1, because it has
been noticed that with $a$ close to 1 the model is able
to reproduce some experimental behavior, whereas
for $a = -1$ or close to $-1$, the model does not work
at all. Among the models with $a = 1$, the following
ones are fairly popular: the model of Phan-Thien and
Tanner has $g(\tau, D) = \alpha\, \tau \operatorname{tr} \tau$, where $\alpha$ is a constant;
this model can be generalized by defining
$g(\tau, D) = \alpha \tau^2 + \beta \tau$, $\alpha$ and $\beta$ being functions of the trace of $\tau$
and of its determinant; the model of Giesekus is the
particular case where $\alpha$ is a constant and $\beta = 0$. The
Oldroyd eight-constant model is given by


$$g(\tau, D) = \mu_0 (\operatorname{tr} \tau)\, D + \nu_1 \operatorname{tr}(\tau D)\, I + \mu_2 D^2 + \nu_2 \operatorname{tr}(D^2)\, I$$


where $\mu_0$, $\nu_1$, $\mu_2$, and $\nu_2$ are constants.

In [7], the limit case $\lambda_2 = 0$ corresponds to
Maxwell’s type models, where there is no Newtonian
viscosity, while the case $\lambda_2 > 0$ corresponds
to the Jeffreys’ type models. The cases where $a = 1$
and $g = 0$ are often considered in mathematical or
numerical studies: this is the upper convected
Maxwell (UCM) model for $\lambda_2 = 0$, and the Oldroyd
B model for $\lambda_2 > 0$.
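
As a worked illustration (added here, not part of the original text), the UCM model can be solved by hand in steady simple shear, $u = (\dot\gamma x_2, 0, 0)$, with a spatially uniform stress. The computation uses the fact that, for $a = 1$ and $\nabla u = D + W$, the Oldroyd derivative reduces to $d\tau/dt - (\nabla u)\tau - \tau(\nabla u)^T$; the shear rate $\dot\gamma$ and the normal stress differences $N_1$, $N_2$ are notation introduced for the example.

```latex
\[
\begin{aligned}
&\tau - \lambda_1 \left[ (\nabla u)\,\tau + \tau\,(\nabla u)^T \right] = 2\eta D,
\qquad (\nabla u)_{12} = \dot\gamma \ \text{(all other entries zero)}, \\
&\tau_{22} = \tau_{33} = 0, \qquad
\tau_{12} - \lambda_1 \dot\gamma\, \tau_{22} = \eta \dot\gamma
\ \Rightarrow\ \tau_{12} = \eta \dot\gamma, \\
&\tau_{11} - 2 \lambda_1 \dot\gamma\, \tau_{12} = 0
\ \Rightarrow\ \tau_{11} = 2 \eta \lambda_1 \dot\gamma^2, \\
&N_1 = \tau_{11} - \tau_{22} = 2 \eta \lambda_1 \dot\gamma^2, \qquad
N_2 = \tau_{22} - \tau_{33} = 0 .
\end{aligned}
\]
```

The model thus predicts a first normal stress difference growing quadratically with the shear rate and a vanishing second one, in line with the qualitative behavior of polymers in shearing flows.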

The parameters $\lambda_1$, $\lambda_2$, and $\eta$ might also depend
on $II_D$: such a model where the upper convected
derivative ($a = 1$) is chosen is referred to as the
White-Metzner model, and reads as follows:


Dair D,D 
ZA D ID ae 
T+A Di 2m + 1) ( + Aj Dr )) 


where 7. is also the Newtonian viscosity. 


Integral equations Other constitutive equations for 
viscoelastic fluids include integral equations. Actu- 
ally, some differential equations have integral 
counterparts: this is the case for the differential 
equations associated with the upper or lower 
convected frame-indifferent derivatives. For the 
upper convected derivative (a=1), the extra stress 
is given by the integral expression 


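$$\tau(x, t) = 2\eta\, \frac{\lambda_2}{\lambda_1}\, D(x, t) + \frac{2\eta}{\lambda_1} \left( 1 - \frac{\lambda_2}{\lambda_1} \right) \int_{-\infty}^{t} e^{-(t-s)/\lambda_1}\, (\nabla_X x)\, D(X, s)\, (\nabla_X x)^T\, ds$$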


where X is the position, at time s, of the point which 
is at x at time t. A similar expression might be 
obtained for the lower convected derivative. 

A very common integral equation is the K-BKZ 
equation (introduced independently by Kaye and 
Bernstein, Kearsley, and Zapas in 1962-63). In a 
simplified form, the extra-stress tensor is given as 
the integral of a combination of the relative Cauchy 
strain tensor C, and its inverse: 


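$$\tau(x, t) = \int_{-\infty}^{t} G(t - s) \left[ \frac{\partial W(I_1, I_2)}{\partial I_1}\, C_t(s)^{-1} - \frac{\partial W(I_1, I_2)}{\partial I_2}\, C_t(s) \right] ds$$


where $I_1 = \operatorname{tr} C_t(s)^{-1}$ and $I_2 = \operatorname{tr} C_t(s)$. The function $G$
is a given kernel, and $W$ a given scalar potential.
The upper convected Maxwell model is obtained
from the K-BKZ model by setting $W(I_1, I_2) = I_1$ and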
G(s) =(A1A2/2) es. 


Models issued from kinetic theories or micro-macro
models Polymeric fluids could also be modeled by
coupling a macroscopic viewpoint (that of
continuum mechanics, as described above) and
a microscopic viewpoint. A polymer is, in general,
made of long chains of molecules. Rather than trying to
represent the polymer behavior by a sophisticated
constitutive equation, one describes the mean behavior
of the molecules by using their microscopic description.

To take an example, we consider a dilute solution
of polymer, where each chain of polymer is modeled
as a collection of dumbbells, each of them consisting
of two beads connected by a spring. The configuration
of the spring, namely its length and orientation,
is described by a random vector field $Q \in \mathbb{R}^3$. The
dumbbells are convected and stretched by the flow.
The probability $\psi(x, Q, t)\, dQ$ of finding a dumbbell
with a configuration $Q$ at $(x, t)$ is governed by a
Fokker-Planck equation:


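$$\frac{d\psi}{dt} + \operatorname{div}_Q\big( (\nabla u)\, Q\, \psi \big) = \frac{2}{\zeta} \operatorname{div}_Q\big( (\nabla_Q W)\, \psi \big) + \frac{2kT}{\zeta}\, \Delta_Q \psi$$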


where $\zeta$ is the friction coefficient of the dumbbell
beads, $T$ the temperature, $k$ the Boltzmann constant,
and $W$ the spring potential. The extra stress is given
by the constitutive equation


$$\tau = \lambda \int (\nabla_Q W \otimes Q)\, \psi(x, Q, t)\, dQ$$


The simplest potential is the linear one (also called
Hookean potential) $W(Q) = H|Q|^2$, where $|Q|$ is
the length of $Q$, and $H$ the elasticity constant.
In fact, in the case of the Hookean potential, this
set of equations is equivalent to the Oldroyd B
model. Another potential corresponds to a finitely
extensible nonlinear elastic (FENE) chain of
dumbbells,

$$W(Q) = -\frac{H Q_0^2}{2} \log\left( 1 - \frac{|Q|^2}{Q_0^2} \right)$$

for $|Q| < Q_0$, and gives the FENE model, for which
there is no macroscopic constitutive equation known.

We have only made here a short incursion into these
micro-macro models: research is in progress, both
analytical and numerical (Öttinger 1996, Suen et al.
2002, Keunings 2004).


Liquid crystals and polymeric liquid crystals As an 
example, we present the constitutive equations for a 
uniaxial nematic liquid crystal. 

In the theory of Leslie and Ericksen, established in 
the 1960s and the 1970s, the stress tensor is given as 
a function of the orientation unit vector n, through 
the Oseen-Frank elastic energy, 


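$$2W(n, \nabla n) = \kappa_1 (\operatorname{div} n)^2 + \kappa_2 (n \cdot \operatorname{curl} n)^2 + \kappa_3 |n \times \operatorname{curl} n|^2$$


where $\kappa_1 > 0$, $\kappa_2 > 0$, and $\kappa_3 > 0$ are the elastic constants of the three basic
modes (splay, twist, and bend, respectively). The extra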
stress tensor is precisely given by the relation 


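$$\tau = -\frac{\partial W}{\partial \nabla n}\, (\nabla n)^T + \alpha_1 (n \cdot D n)\, n \otimes n + \alpha_2 N \otimes n + \alpha_3 n \otimes N + \alpha_4 D + \alpha_5 D n \otimes n + \alpha_6 n \otimes D n$$


where $N = \dot{n} - W n$ is the corotational derivative of
the director, and $\alpha_i$, $i = 1, \ldots, 6$, the six Leslie
viscosity coefficients.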

The director satisfies a differential equation 
derived from continuum mechanics, 


$$\rho_1 \ddot{n} = G + g + \operatorname{div} m$$


where $\rho_1$ is the moment of inertia per unit volume,
G the external director body force (torque per unit 
volume), m the director stress tensor, and g the 
intrinsic director body force. Precisely, 


Ow 
g= àn — (Vn)b — e WN — yDn 


7 OW 
eal OVn 
where $\beta$ is a Lagrange multiplier vector, and
$\lambda = -\gamma_2/\gamma_1$ is the reactive parameter, with $\gamma_1 = \alpha_3 - \alpha_2$
the rotational viscosity, and $\gamma_2 = \alpha_6 - \alpha_5 = \alpha_3 + \alpha_2$
the irrotational torque coefficient.

Polymeric liquid crystals might have other variables 
entering in the modeling, such as order parameters, 
order tensors, etc. 

Because of the complexity of modeling, most 
studies concern either very simple flows, such as 
Couette or Poiseuille flows, or steady flows, or 
flows for which the coefficients satisfy specific 
relationships. 

Reports about earlier studies, theoretical as well 
as numerical, can be found in Coron et al. (1991), 
and references therein. The study of polymeric liquid 
crystals, or of the smectic phase of liquid crystals is 
at its very early stage and one could look into it in 
specialized journals, such as the Journal of Non- 
Newtonian Fluid Mechanics, or see Liquid Crystals. 


Yield stress fluids Bingham materials have the 
property of flowing only when the stress magnitude 
is greater than a critical value, and being a solid 
otherwise. Precisely, in the simplest and the most 
widely used model, the Bingham model, the extra 
stress tensor $\tau$ is given by the relations


$$\tau = 2\eta D + \tau_y \frac{D}{\sqrt{II_D}} \quad \text{if } II_D \ne 0, \qquad |\tau| \le \tau_y \quad \text{if } II_D = 0$$ [8]

where $\tau_y > 0$ is the yield limit. The Bingham model
is generalized in taking the viscosity $\eta$ to be a
function of the shear stress: $\eta$ is given by the
relation
for the Casson law, and by the power law [5] for the
Herschel-Bulkley model.

The mathematical study was started by Duvaut 
and Lions (1976), and regained interest recently 
(Malek and Rajagopal 2005), especially in relation 
with other recent studies in polymeric liquids. 


Theoretical and Numerical Problems 
for Viscoelastic Flows 


The mathematical study of viscoelastic fluid flows 
amounts to studying systems of partial differential 
equations, which all include either the incompres- 
sible Euler equation or the incompressible Navier- 
Stokes equation as particular cases. In particular, it
means that the results obtained from such a study
are similar to the ones obtained for Euler or Navier-Stokes
equations, and, because of the complexity of
the system, the results are expected to be at best
as good as, and more often weaker than, those
for these equations. For example, the existence of
weak three-dimensional solutions to the Navier-Stokes
system is known, while for non-Newtonian
flows, this result will be true only in very specific
cases. Moreover, when a result is not known for
the Navier-Stokes problem, such as the uniqueness
of solution for all data in a three-dimensional
problem, there is no hope something similar could
be proved for non-Newtonian fluid flows.

As an example, we consider the case of Johnson-Segalman
fluids, which are described by constitutive
equation [7] with $g = 0$. Recall that the limit case
$\lambda_2 = 0$ corresponds to the purely elastic case, and
$\lambda_2 = \lambda_1$ to the purely Newtonian case. Equation [7]
is coupled with the equations of motion:


$$\rho \frac{du}{dt} + \nabla p = \operatorname{div} \tau + f, \qquad \operatorname{div} u = 0$$ [9]


Equations [7] and [9] have to be solved in the
domain of the flow, which might be the whole
space $\mathbb{R}^3$ (or $\mathbb{R}$ or $\mathbb{R}^2$ in case of symmetries), or a
domain $\Omega$, bounded or not, in $\mathbb{R}^n$, $n = 1$, 2, or 3.
These equations are supplemented by appropriate
boundary conditions and initial conditions for the
velocity $u$ and the extra stress $\tau$ (no boundary
condition on $\tau$ is needed if the homogeneous
no-slip boundary condition $u = 0$ is chosen).

We first make explicit the Newtonian contribution
to the stress by setting $\tau = \tau^s + \tau^p$ and
$\tau^s = 2\eta_s D$. The differential equation for $\tau^p$ is then


$$\tau^p + \lambda_1 \frac{D_a \tau^p}{Dt} = 2\eta_p D$$


where $\eta_p = (1 - \lambda_2/\lambda_1)\eta$ is the so-called polymeric
viscosity, and $\eta_s = (\lambda_2/\lambda_1)\eta$ the so-called Newtonian
viscosity (or solvent viscosity).
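
The step from [7] to the equation for the polymeric stress can be spelled out as follows. This is a sketch assuming $g = 0$, constant coefficients, and the linearity of the Oldroyd derivative; it uses only the splitting of the stress into a Newtonian part $2\eta_s D$ with $\eta_s = (\lambda_2/\lambda_1)\eta$ and a polymeric remainder $\tau^p$.

```latex
\[
\begin{aligned}
&\tau^s + \tau^p + \lambda_1 \frac{D_a \tau^s}{Dt} + \lambda_1 \frac{D_a \tau^p}{Dt}
   = 2\eta D + 2\eta \lambda_2 \frac{D_a D}{Dt}, \\
&\tau^s = 2\eta_s D, \quad \eta_s = \frac{\lambda_2}{\lambda_1}\,\eta
   \ \Rightarrow\ \lambda_1 \frac{D_a \tau^s}{Dt} = 2\eta \lambda_2 \frac{D_a D}{Dt}, \\
&\tau^p + \lambda_1 \frac{D_a \tau^p}{Dt} = 2(\eta - \eta_s)\, D = 2\eta_p D,
   \qquad \eta_p = \left(1 - \frac{\lambda_2}{\lambda_1}\right)\eta .
\end{aligned}
\]
```

The retardation term on the right-hand side of [7] is thus absorbed entirely by the Newtonian part of the stress.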

We then use nondimensional variables, so as to 
make explicit the characteristic parameters, which 
the flow depends on. The non-Newtonian fluid 
considered in this model will always be homoge- 
neous: its density p is a constant independent of x 
and t. The dimensional variables are now asterisked. 
We define quantities which are characteristic of the 
flow: a length L, a velocity magnitude U, a stress 
magnitude T, a force magnitude F, and a pressure P. 
We operate the change of variables and functions
$x = x^*/L$, $u = u^*/U$, $t = U t^*/L$, and also introduce the
nondimensional functions


$$\tau = \tau^{p\,*}/T, \qquad p = p^*/P, \qquad f = f^*/F$$


After choosing the parameters $T$, $P$, and $F$ in
an appropriate way, namely $T = P = \eta U/L$ and
$F = \eta U/L^2$, we obtain the following system





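$$\mathrm{Re}\, \frac{du}{dt} + \nabla p = (1 - \omega) \Delta u + \operatorname{div} \tau + f$$
$$\operatorname{div} u = 0$$ [10]
$$\mathrm{We}\, \frac{D_a \tau}{Dt} + \tau = 2\omega D$$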

Here the three nondimensional parameters which
the flow depends on are the usual Reynolds number
$\mathrm{Re} = \rho U L/\eta$ and two other numbers: the Weissenberg
number $\mathrm{We} = \lambda_1 U/L$ measures the elasticity per
unit time (sometimes also called the Deborah
number), and the parameter $\omega = \eta_p/\eta$ is the ratio of
elastic viscosity to total viscosity ($\omega = 0$ corresponds
to the Newtonian case, while $\omega = 1$ corresponds to
the purely elastic case).
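
For orientation, the three nondimensional groups can be computed directly from dimensional data. The following sketch is only illustrative: the function name and argument names are hypothetical, and it assumes the definitions of the Reynolds number, the Weissenberg number, and the viscosity ratio given in this section, with the polymeric viscosity equal to $(1 - \lambda_2/\lambda_1)\eta$.

```python
def dimensionless_groups(rho, U, L, eta, lambda1, lambda2):
    """Return Re, We and omega for a Johnson-Segalman type fluid.

    rho: density, U: velocity scale, L: length scale, eta: total viscosity,
    lambda1: relaxation time, lambda2: retardation time (0 <= lambda2 <= lambda1).
    """
    eta_p = (1.0 - lambda2 / lambda1) * eta  # polymeric viscosity
    return {
        "Re": rho * U * L / eta,   # inertia versus viscous forces
        "We": lambda1 * U / L,     # relaxation time versus flow time L/U
        "omega": eta_p / eta,      # elastic share of the total viscosity
    }
```

A slow flow of a viscous polymer solution typically gives a small Reynolds number together with a Weissenberg number of order one, which is the regime where elastic effects dominate inertia.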

System [10] couples a transport equation (the
equation for the stress $\tau$), and either a Navier-Stokes
type equation when $\omega < 1$, or an Euler type
equation when $\omega = 1$ (for the velocity $u$). This
system is not hyperbolic, parabolic, or elliptic.

Maxwell’s type models ($\omega = 1$) display two striking
phenomena. First, the Cauchy problem (with initial
data) can present Hadamard instabilities, that is,
instabilities to short waves. It means, in particular, that
the Cauchy problem is not well posed in any function class
other than the analytic one. Moreover, the partial differential system
for Maxwell’s type steady flows may experience a
change of type, analogous to the situation in gas
dynamics, if the “Mach number” Re We is larger than 1.

Jeffreys’ type models ($\omega < 1$), because of the
presence of a Newtonian viscosity, do not exhibit
such phenomena, but their study does not fall within
the theory of parabolic equations either, the type of
the system being composite.

Problems of interest for rheologists, as well as for 
mathematicians, include in particular the high 
Weissenberg asymptotics, the high Weissenberg 
boundary layers, the singularity of flows near a 
reentrant corner, and the stability of flows. 

We give a few details about stability questions. 
Instabilities are seen in experimental extrusion of 
melted polymers from a pipe: melt fracture designates 
different phenomena appearing at different stages of 
the experiment, when the speed of the extrusion is 
increased, such as sharkskin instability, slight distortions
of the extrudate, large distortions and waviness
of the extrudate. One may distinguish two kinds of
instabilities. First, constitutive instabilities are asso- 
ciated with nonmonotonicity of constitutive functions 
and loss of evolutionary property of the equations of 
motion. Other kinds of instabilities are close to 
classical hydrodynamic instabilities at increasing Re. 
Note that in viscoelastic flows the Re is usually very 
small, and might even be set to zero in some studies. 

Other mathematical questions for system [10]
include existence of weak solutions (for the very
special case of the Oldroyd model with the Jaumann
derivative, where $a = 0$ in [4]), existence of regular
solutions defined on some time interval, depending 
on the magnitude of the data, and existence of 
regular solutions for all times. Other studies concern 
the existence, uniqueness, and stability of steady 
solutions. Another field of study is the numerical 
simulation of such flows. 

In summary, there have been numerous computa- 
tions made in the field of steady or unsteady viscoelastic 
fluids, and especially models using continuum 
mechanics. Standard test problems include the cavity- 
driven flow, flows inside a 4: 1 contraction, extrusion 
flows, flows between eccentric cylinders, and flows in 
“wiggly” pipes. As mentioned already, the type of the 
system of partial differential equations is composite,
neither elliptic nor hyperbolic. The numerical codes 
have to take into account the precise nature of the set of 
partial differential equations, so as to be able to obtain 
noncatastrophic results. One of the main challenges has 
been to deal with the high-We problem: with increasing
We, the results would become totally incoherent, and 
the numerical algorithms would diverge. 

Nowadays, with the power of computers increasing, 
molecular simulations of flows are proposed, using the 
macro-micro modeling mentioned above. Also, simulations
of flows of colloidal suspensions and reacting
flows have been undertaken with success. 
See also: Compressible Flows: Mathematical Theory;
Fluid Mechanics: Numerical Methods; Incompressible 
Euler Equations: Mathematical Theory; Interfaces and 
Multicomponent Fluids; Inviscid Flows; Liquid Crystals; 
Newtonian Fluids and Thermohydraulics; Partial 
Differential Equations: Some Examples; Stability of 
Flows; Stochastic Hydrodynamics; Viscous 
Incompressible Fluids: Mathematical Theory. 
Introduction


Classical fields that enter a classical field theory 
provide a mapping from the “base” manifold on 
which they are defined (space or spacetime) to a 
“target” space over which they range. The base and 
target spaces, as well as the map, may possess 
nontrivial topological features, which affect the 
fixed-time description and the temporal evolution of 
the fields, thereby influencing the physical reality that 
these fields describe. Quantum fields of a quantum 
field theory are operator-valued distributions whose 
relevant topological properties are obscure. Never- 
theless, topological features of the corresponding 
classical fields are important in the quantum theory 
for a variety of reasons: (1) Quantized fields can 
undergo local (spacetime-dependent) transformations 
(gauge transformations, coordinate diffeomorphisms) 
that involve classical functions whose topological 
properties determine the allowed quantum field 
theoretic structures. (2) One formulation of the 
quantum field theory uses a functional integral over 
classical fields, and classical topological features 
become relevant. (3) Semiclassical (WKB) approxi- 
mations to the quantum theory rely on classical 
dynamics, and again classical topology plays a role in 
the analysis. 

Topological effects of gauge fields in quantum 
theory were first appreciated by Dirac in his study of 
the quantum mechanics for (hypothetical) magnetic 
point monopoles. Although here one is not dealing 
with a field theory, the consequences of his analysis 
contain many features that were later encountered in 
field theory models. 

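The Lorentz equations of motion for a charged (e)
massive (M) particle in a monopole magnetic field
($\mathbf{B} = g\,\mathbf{r}/r^3$) are unexceptional,

$$\dot{\mathbf{r}} = \frac{\mathbf{p}}{M}$$ [1a]

$$\dot{\mathbf{p}} = \frac{e}{M}\,\mathbf{p}\times\mathbf{B} \quad (c=1)$$ [1b]
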

and completely determine classical dynamics. But
knowledge of the Lagrangian $L$ and of the action
$I$ (the time integral of $L$: $I = \int dt\,L$) is further
needed for quantum mechanics, either in its functional
integral formulation or in its Hamiltonian
formulation, which requires the canonical momentum
$\boldsymbol{\pi} = \partial L/\partial\dot{\mathbf{r}}$. The Lorentz-force action
is expressed in terms of the vector potential
$\mathbf{A}$, $\mathbf{B} = \nabla\times\mathbf{A}$: $I_{\rm Lorentz} = e\int dt\,\dot{\mathbf{r}}\cdot\mathbf{A} = e\int d\mathbf{r}\cdot\mathbf{A}$. The
magnetic monopole vector potential is necessarily
singular because $\nabla\cdot\mathbf{B} = 4\pi g\,\delta^3(\mathbf{r}) \neq 0$. The singularity
(Dirac string) can be moved, but not removed, by
gauge transformations, which also are singular and
do not leave the Lorentz action invariant. Noninvariance
of the action can be tolerated provided its
change is an integral multiple of $2\pi$, since the
functional integrand involves $\exp(iI)$ (with $\hbar = 1$).
The quantal requirement, which is not seen in the
equations of motion, is met when


eg =N/2 [2] 


The topological background to this (Dirac) quantization
condition is the fact that $\Pi_1(U(1))$ is the
group of integers, that is, the map of the unit circle
into the gauge group, here U(1), is classified by
integers.
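As a minimal illustration of the statement that maps of the circle into U(1) are classified by integers, the following Python sketch counts how often a sampled closed loop of phases wraps the unit circle. The names `winding_number` and `sample_loop` are illustrative and do not come from the article; the sketch assumes the loop is sampled finely enough that consecutive phases differ by less than $\pi$.

```python
import cmath
import math


def winding_number(loop):
    """Integer number of times a closed loop of unit complex numbers wraps the circle.

    Assumes consecutive samples differ in phase by less than pi.
    """
    total = 0.0
    count = len(loop)
    for k in range(count):
        z0 = loop[k]
        z1 = loop[(k + 1) % count]      # close the loop
        total += cmath.phase(z1 / z0)   # phase increment in (-pi, pi]
    return round(total / (2.0 * math.pi))


def sample_loop(n, samples=400):
    """Sample the map t -> exp(i n t), t in [0, 2 pi), of the circle into U(1)."""
    return [cmath.exp(1j * n * 2.0 * math.pi * k / samples) for k in range(samples)]


if __name__ == "__main__":
    for n in (-2, 0, 1, 3):
        print(n, winding_number(sample_loop(n)))
```

Smoothly deforming the sampled loop changes the individual phase increments but not their sum, which is the sense in which the integer labels a homotopy class.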

Further analysis shows that only point magnetic
sources can be incorporated in particle quantum
mechanics, which is governed by the particle
Hamiltonian $H = p^2/2M$ (magnetic fields do no
work and are not seen in $H$). Quantum Lorentz
equations are regained by commutation with
$H$: $\dot{\mathbf{r}} = i[H, \mathbf{r}]$, $\dot{\mathbf{p}} = i[H, \mathbf{p}]$, provided

$$i[r^i, r^j] = 0$$ [3a]

$$i[p^i, r^j] = \delta^{ij}$$ [3b]

$$i[p^i, p^j] = -e\,\epsilon^{ijk}B^k$$ [3c]

But [3c] implies that the Jacobi identity is obstructed
by magnetic sources $\nabla\cdot\mathbf{B} \neq 0$:

$$\tfrac12\,\epsilon^{ijk}\,[p^i, [p^j, p^k]] = e\,\nabla\cdot\mathbf{B}$$ [4]
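The step from [3] to [4] can be written out in a few lines. The following is a sketch that uses only [3b] and [3c], together with the standard identity $\epsilon^{ijk}\epsilon^{jkl} = 2\delta^{il}$; the auxiliary function $f(\mathbf{r})$ is introduced here for illustration.

```latex
% From [3c]:  [p^j, p^k] = i e \epsilon^{jkl} B^l(\mathbf{r})
% From [3b]:  [p^i, f(\mathbf{r})] = -i \partial_i f(\mathbf{r})
\begin{aligned}
\tfrac12\,\epsilon^{ijk}\,[p^i,[p^j,p^k]]
  &= \tfrac12\,\epsilon^{ijk}\,(i e\,\epsilon^{jkl})\,[p^i, B^l] \\
  &= \tfrac12\,\epsilon^{ijk}\epsilon^{jkl}\,(i e)(-i)\,\partial_i B^l \\
  &= \tfrac{e}{2}\,(2\delta^{il})\,\partial_i B^l
   = e\,\nabla\cdot\mathbf{B} .
\end{aligned}
```

For a sourceless field the right-hand side vanishes and the Jacobi identity holds, as it must for operators acting associatively.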


This obstruction is better understood by examining
the unitary operator $U(\mathbf{a}) \equiv \exp(i\mathbf{a}\cdot\mathbf{p})$, which
according to [3b] implements finite translations
of $\mathbf{r}$ by $\mathbf{a}$. The commutator algebra [3] and
the failure of the Jacobi identity [4] imply
that these operators do not associate. Rather one
finds

$$U(\mathbf{a}_1)\,(U(\mathbf{a}_2)U(\mathbf{a}_3)) = e^{i\Phi}\,(U(\mathbf{a}_1)U(\mathbf{a}_2))\,U(\mathbf{a}_3)$$ [5]

where $\Phi = e\int d^3x\,\nabla\cdot\mathbf{B}$ is the total flux emerging
from the tetrahedron formed from the three vectors
$\mathbf{a}_i$ with vertex at $\mathbf{r}$ (see Figure 1). But quantum
mechanics realized by linear operators acting on a
Hilbert space requires that operator multiplication
be associative. This can be achieved, in spite of [5],
provided $\Phi$ is an integral multiple of $2\pi$, hence
invisible in the exponent. This then needs that (1)
$\nabla\cdot\mathbf{B}$ be localized at points, so that the volume
integral of $\nabla\cdot\mathbf{B}$ retain integrality for arbitrary $\mathbf{a}_i$
and (2) the strengths of the localized poles obey
Dirac quantization. The points at which $\nabla\cdot\mathbf{B}$ is
localized can now be removed from the manifold
and the Jacobi identity is regained. The above
argument, which rederives Dirac’s quantization,
makes no reference to gauge variance of magnetic
potentials.

Figure 1 Tetrahedron pierced by magnetic flux that obstructs
associativity.

In the remainder we shall discuss related phenom- 
ena for selected gauge field theories in four, three, 
and two dimensions that describe actual physical 
events occurring in nature. We shall encounter in 
generalized form, analogs to the above quantum 
mechanical system. 

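Some definitions and notational conventions:
Nonabelian gauge potentials $A^a_\mu$ carry a spacetime
index ($\mu$) (metric tensor $g_{\mu\nu} = \mathrm{diag}(1,-1,\ldots)$) and
an adjoint group index ($a$). When contracted with
anti-Hermitian matrices $T_a$ that represent the
group’s Lie algebra (structure constants $f_{ab}{}^{c}$)

$$[T_a, T_b] = f_{ab}{}^{c}\,T_c$$ [6]

they become Lie algebra-valued.

$$A_\mu \equiv A^a_\mu T_a$$ [7]

Gauge transformations transform $A_\mu$ by group
elements $U$:

$$A_\mu \to A^U_\mu = U^{-1}A_\mu U + U^{-1}\partial_\mu U$$ [8a]

For infinitesimal gauge transformations, $U \approx I + \lambda$,
$\lambda = \lambda^aT_a$; this leads to the covariant derivative $D_\mu$:

$$A_\mu \to A_\mu + \partial_\mu\lambda + [A_\mu, \lambda] \equiv A_\mu + D_\mu\lambda$$

$$A^a_\mu \to A^a_\mu + \partial_\mu\lambda^a + f_{bc}{}^{a}A^b_\mu\lambda^c \equiv A^a_\mu + (D_\mu\lambda)^a$$ [8b]

(In a quantum field theory, $A_\mu$ becomes an operator
but the gauge transformations $U$, $\lambda$ remain c-number
functions.) The field strength $F_{\mu\nu}$, given by

$$F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu + [A_\mu, A_\nu]$$ [9a]

is also given by

$$[D_\mu, D_\nu]\,\cdots = [F_{\mu\nu}, \cdots]$$ [9b]

(coupling strength $g$ has been scaled to unity). The
definition [9] implies the Bianchi identity

$$D_\alpha F_{\beta\gamma} + D_\beta F_{\gamma\alpha} + D_\gamma F_{\alpha\beta} = 0$$ [10]

Here $F_{\mu\nu}$ is gauge covariant

$$F_{\mu\nu} \to F^U_{\mu\nu} = U^{-1}F_{\mu\nu}U$$ [11a]

or, infinitesimally,

$$F_{\mu\nu} \to F_{\mu\nu} + [F_{\mu\nu}, \lambda]$$ [11b]

In the gauge invariant Yang-Mills action $I_{\rm YM}$, the
Yang-Mills Lagrange density $\mathcal{L}_{\rm YM}$ is integrated over
the base space,

$$\mathcal{L}_{\rm YM} = \tfrac12\,\mathrm{tr}\,F^{\mu\nu}F_{\mu\nu}, \qquad I_{\rm YM} = \int\mathcal{L}_{\rm YM} = \tfrac12\int\mathrm{tr}\,F^{\mu\nu}F_{\mu\nu}$$ [12]

The trace is evaluated with the convention

$$\mathrm{tr}\,T_aT_b = -\tfrac12\,\delta_{ab}$$ [13]

and henceforth there is no distinction between upper
and lower group indices. The Euler-Lagrange condition
for stationarizing $I_{\rm YM}$ gives the Yang-Mills equation

$$D_\mu F^{\mu\nu} = 0$$ [14a]

Should sources $J^\nu$ be present, [14a] becomes

$$D_\mu F^{\mu\nu} = J^\nu$$ [14b]

and $J^\nu$ must be covariantly conserved:

$$D_\nu J^\nu = D_\nu D_\mu F^{\mu\nu} = -\tfrac12\,[D_\mu, D_\nu]F^{\mu\nu} = -\tfrac12\,[F_{\mu\nu}, F^{\mu\nu}] = 0$$ [15]

All this is a nonabelian generalization of familiar
Maxwell electrodynamics.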


Gauge Theories in Four Dimensions 


Gauge theories in four-dimensional spacetime are at 
the heart of the standard particle physics model. 
Their topological features have physical conse- 
quences and merit careful study. 
Yang-Mills Theory


In four dimensions, we define nonabelian electric $\mathbf{E}^a$
and magnetic $\mathbf{B}^a$ fields,

$$E^{ia} = F^{a\,i0}, \qquad B^{ia} = -\tfrac12\,\epsilon^{ijk}F^a_{jk}$$ [16]

Canonical analysis and quantization are carried out in
the Weyl gauge ($A^a_0 = 0$), where the Lagrangian and
Hamiltonian (energy) densities read

$$\mathcal{L}_{\rm YM} = \tfrac12\,(\mathbf{E}^a\cdot\mathbf{E}^a - \mathbf{B}^a\cdot\mathbf{B}^a)$$ [17]

$$\mathcal{H}_{\rm YM} = \tfrac12\,(\mathbf{E}^a\cdot\mathbf{E}^a + \mathbf{B}^a\cdot\mathbf{B}^a)$$ [18]

The first term is kinetic, with $\mathbf{E}^a = -\partial_t\mathbf{A}^a$ also
functioning as the (negative) canonical momentum
$\boldsymbol{\pi}^a$, conjugate to the canonical variable $\mathbf{A}^a$; the
second magnetic term gives the potential. In the
Weyl gauge, the theory remains invariant against
time-independent gauge transformations. The time
component of equation [14] (Gauss law) is absent
(because there is no $A^a_0$ to vary); rather it is imposed
as a fixed-time constraint on the canonical variables
$\mathbf{E}^a$ and $\mathbf{A}^a$. This regains the Gauss law:

$$(\mathbf{D}\cdot\mathbf{E})^a = 0 \quad \text{(in the absence of sources)}$$ [19a]

In the quantum theory $\mathbf{D}\cdot\mathbf{E}$ annihilates “physical”
states. Explicitly, in a functional Schrödinger representation,
where states are functionals of the canonical
fixed-time variable $\mathbf{A}$, $|\Psi\rangle \to \Psi(\mathbf{A})$, [19a] requires

$$\left(\mathbf{D}\cdot\frac{\delta}{\delta\mathbf{A}}\right)^a\Psi(\mathbf{A}) = 0$$ [19b]

that is, physical states must be invariant against
infinitesimal gauge transformations, or equivalently,
against gauge transformations that are homotopic
(continuously deformable) to the identity (the so-called
“small” gauge transformations):

$$\Psi(\mathbf{A} + \mathbf{D}\lambda) = \Psi(\mathbf{A})$$ [20]


But homotopically nontrivial gauge transformation 
functions that cannot be deformed to the identity 
(the so-called “large” gauge transformations) may 
be present. Their effect is not controlled by Gauss’ 
law, and must be discussed separately. 

Fixed-time gauge transformation functions
depend on the spatial variable $\mathbf{r}$: $U(\mathbf{r})$. For a
topological classification, we require that $U$ tend to
a constant at large $r$. Equivalently, we compactify
the base space $R^3$ to $S^3$. Thus, the gauge functions
provide a mapping from $S^3$ into the relevant gauge
group $G$, and for nonabelian compact gauge groups
such mappings fall into disjoint homotopy classes
labeled by an integer winding number
$n$: $\Pi_3(G) = Z$. Gauge functions $U_n$ belonging to
different classes cannot be deformed into each
other; only those in the “zero” class are deformable
to the identity. An analytic expression for the
winding number $w(U)$ is

$$w(U) = \frac{1}{24\pi^2}\int d^3x\,\epsilon^{ijk}\,\mathrm{tr}\,(U^{-1}\partial_iU\,U^{-1}\partial_jU\,U^{-1}\partial_kU)$$ [21]

This is a most important topological entity for
gauge theories in four-dimensional spacetime, that is,
in 3-space, and we shall meet it again in a description
of gauge theories in three-dimensional spacetime,
that is, on a plane. Various features of $w$ expose its
topological character: (1) $w(U)$ does not involve a
metric tensor, yet it is diffeomorphism invariant.
(2) $w(U)$ does not change under local variations of $U$:

$$\delta w(U) = \frac{1}{8\pi^2}\int d^3x\,\partial_i\,\epsilon^{ijk}\,\mathrm{tr}\,(U^{-1}\delta U\,U^{-1}\partial_jU\,U^{-1}\partial_kU)$$

$$\phantom{\delta w(U)} = \frac{1}{8\pi^2}\int dS^i\,\epsilon^{ijk}\,\mathrm{tr}\,(U^{-1}\delta U\,U^{-1}\partial_jU\,U^{-1}\partial_kU) = 0$$ [22]


The last integral is over the surface (at infinity)
bounding the base space and vanishes for localized
variations $\delta U$. In fact, the entire $w(U)$, not only its
variation, can be presented as a surface integral, but
this requires parametrizing the group element $U$ on
$R^3$. For example, for SU(2),

$$U = \exp\lambda, \qquad \lambda = \lambda^a\sigma^a/2i \quad (\sigma^a = \text{Pauli matrices})$$

$$w(U) = \frac{1}{16\pi^2}\int dS^i\,\epsilon^{ijk}\epsilon_{abc}\,\hat\lambda^a\,\partial_j\hat\lambda^b\,\partial_k\hat\lambda^c\,(\sin|\lambda| - |\lambda|)$$

$$|\lambda| \equiv \sqrt{\lambda^a\lambda^a}, \qquad \hat\lambda^a \equiv \lambda^a/|\lambda|$$ [23]

Specifically, with $|\lambda| \to 2\pi n$ (so that $U \to \pm I$),
$w(U) = -n$. As befits a topological entity, $w(U)$ is
determined by global (here large distance) properties
of $U$.

Since all gauge transformations, small and large,
are symmetry operations for the theory, [20] should
be generalized to

$$\Psi(\mathbf{A}^{U_n}) = e^{in\theta}\,\Psi(\mathbf{A})$$ [24]


where $\theta$ is a universal constant. Thus, Yang-Mills
quantum states behave as Bloch waves in a periodic
lattice, with large gauge transformations playing the
role of lattice translations and the Yang-Mills vacuum
angle $\theta$ playing the role of the Bloch momentum. This
is further understood by noting that the profile of the
potential energy density, $\tfrac12\mathbf{B}^a\cdot\mathbf{B}^a$, possesses a periodic
structure symbolically depicted in Figure 2.
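The surface-integral form [23] of the winding number can be checked numerically in a simple case. The sketch below assumes a "hedgehog" configuration $\lambda^a = \hat r^a f(r)$, which is an illustrative choice and not taken from the article. For it the angular integral $\int dS^i\,\epsilon^{ijk}\epsilon_{abc}\,\hat r^a\partial_j\hat r^b\partial_k\hat r^c$ equals $8\pi$, so [23] evaluated on a sphere of radius $r$ gives $(\sin f - f)/2\pi$. The code integrates the radial derivative of this expression from $r = 0$ to large $r$; the function name and the sample profile are hypothetical.

```python
import math


def hedgehog_winding(profile, steps=20000):
    """Winding number from [23] for lambda^a = rhat^a f(r) with f(0) = 0.

    Integrates d[(sin f - f) / (2 pi)] = (cos f - 1) df / (2 pi) with a
    midpoint rule in f, on the grid r = t / (1 - t), t in [0, 1).
    """
    total = 0.0
    f_prev = profile(0.0)
    for k in range(1, steps):
        t = k / steps
        r = t / (1.0 - t)                 # maps [0, 1) onto [0, infinity)
        f_next = profile(r)
        f_mid = 0.5 * (f_prev + f_next)
        total += (math.cos(f_mid) - 1.0) * (f_next - f_prev)
        f_prev = f_next
    return total / (2.0 * math.pi)


def sample_profile(n):
    """Illustrative profile with f(0) = 0 and f -> 2 pi n at large r."""
    return lambda r: 2.0 * math.pi * n * r * r / (1.0 + r * r)


if __name__ == "__main__":
    for n in (1, 2):
        print(n, hedgehog_winding(sample_profile(n)))
```

Changing the interior shape of the profile while keeping its limiting value $2\pi n$ leaves the result at $-n$, in line with the statement that $w(U)$ depends only on large-distance properties of $U$.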
Figure 2 Schematic for energy periodicity of Yang-Mills fields. Axis label: energy density; marked feature: instanton.


Thanks to Gauss’ law, potentials $\mathbf{A}$ that differ by
small gauge transformations are identified, while
those differing by large gauge transformations give
rise to the periodicity. Zero energy troughs correspond
to pure gauge vector potentials in different
homotopy classes $n$: $\mathbf{A} = -U_n^{-1}\nabla U_n$.

The $\theta$ angle (Bloch momentum) arises from
quantum tunneling in A space. Usually, in field 
theory tunneling is suppressed by infinite energy 
barriers. (This gives rise to spontaneous symmetry 
breaking.) However, in Yang-Mills theory there are 
paths in field space that avoid such barriers. 
Quantum tunneling paths are exhibited in a semi- 
classical approximation by identifying classical 
motion in imaginary time (Euclidean space) that 
interpolates between classically degenerate vacua 
and possesses finite action. 

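In Yang-Mills theory, continuation to imaginary
time, $x^0 \to -ix^4$, places a factor of $i$ on $\mathbf{E}^a$. Zero
(Euclidean) energy is maintained when $\mathbf{E}^a = \pm\mathbf{B}^a$, or
with covariant notation in Euclidean space,

$${}^*F^{\mu\nu} \equiv \tfrac12\,\epsilon^{\mu\nu\alpha\beta}F_{\alpha\beta} = \pm F^{\mu\nu}$$ [25]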


Euclidean finite action field configurations that
satisfy [25] are called self-dual or anti-self-dual
instantons. By virtue of the Bianchi identity [10],
instantons also solve the field equation [14a] in
Euclidean space. Since the Euclidean action may also
be written as

$$I_{\rm YM} = -\frac14\int d^4x\,\mathrm{tr}\,(F^{\mu\nu} \mp {}^*F^{\mu\nu})(F_{\mu\nu} \mp {}^*F_{\mu\nu}) \mp \frac12\int d^4x\,\mathrm{tr}\,{}^*F^{\mu\nu}F_{\mu\nu}$$ [26]

and the first term vanishes for instantons, we see
that instantons are characterized by the last term,
the Chern-Pontryagin index,

$$P \equiv -\frac{1}{16\pi^2}\int d^4x\,\mathrm{tr}\,({}^*F^{\mu\nu}F_{\mu\nu}) = -\frac{1}{32\pi^2}\int d^4x\,\epsilon^{\mu\nu\alpha\beta}\,\mathrm{tr}\,(F_{\mu\nu}F_{\alpha\beta})$$ [27]


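This again is an important topological entity:

1. The diffeomorphism invariant $P$ does not involve
the metric tensor.
2. $P$ is insensitive to local variations of $A_\mu$:

$$\delta P = -\frac{1}{8\pi^2}\int d^4x\,\mathrm{tr}\,({}^*F^{\mu\nu}\,\delta F_{\mu\nu}) = -\frac{1}{4\pi^2}\int d^4x\,\mathrm{tr}\,({}^*F^{\mu\nu}D_\mu\delta A_\nu)$$

$$\phantom{\delta P} = \frac{1}{4\pi^2}\int d^4x\,\mathrm{tr}\,(D_\mu{}^*F^{\mu\nu}\,\delta A_\nu) = 0$$ [28]

3. $P$ may be presented as a surface integral owing to
the formula

$$\tfrac14\,\mathrm{tr}\,{}^*F^{\mu\nu}F_{\mu\nu} = \partial_\mu K^\mu$$ [29]

$$K^\mu \equiv \epsilon^{\mu\alpha\beta\gamma}\,\mathrm{tr}\,(\tfrac12A_\alpha\partial_\beta A_\gamma + \tfrac13A_\alpha A_\beta A_\gamma)$$ [30]

where $K^\mu$ is the Chern-Simons current,

$$P = -\frac{1}{4\pi^2}\int dS_\mu K^\mu$$ [31]

The integral [31] is over the base space boundary,
$S^3$. The Chern-Pontryagin index of any gauge field
configuration with finite (Euclidean) action (not
only instantons) is quantized. This is because finite
action requires $F_{\mu\nu}$ to vanish at large distances;
equivalently, $A_\mu \to U^{-1}\partial_\mu U$. Using this in [30]
renders [31] as

$$P = \frac{1}{24\pi^2}\int dS_\mu\,\epsilon^{\mu\alpha\beta\gamma}\,\mathrm{tr}\,(U^{-1}\partial_\alpha U\,U^{-1}\partial_\beta U\,U^{-1}\partial_\gamma U)$$ [32]

which is the same as [21] and, for the same reason,
is given by an integer [$\Pi_3(G) = Z$]. Alternatively, for
instantons in the (Euclidean) Weyl gauge ($A_4 = 0$),
which interpolate as $x^4$ passes from $-\infty$ to $+\infty$
between degenerate, classical vacua $\mathbf{A} = 0$ and
$\mathbf{A} = -U^{-1}\nabla U$, $P$ becomes

$$P = -\frac{1}{4\pi^2}\int dx^4\,d^3x\,(\partial_4K^4 + \nabla\cdot\mathbf{K}) = -\frac{1}{4\pi^2}\int d^3x\,K^4\,\Big|_{x^4=-\infty}^{x^4=\infty}$$

$$\phantom{P} = \frac{1}{24\pi^2}\int d^3x\,\epsilon^{ijk}\,\mathrm{tr}\,(U^{-1}\partial_iU\,U^{-1}\partial_jU\,U^{-1}\partial_kU) = w(U)$$ [33]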
We have assumed that the potentials decrease at
large arguments sufficiently rapidly so that the 
gradient term in the first integrand does not 
contribute. This rederivation of [32] relies on the 
“motion” of an instanton between vacuum config- 
urations of different winding numbers. 

An explicit 1-instanton SU(2) solution ($P = 1$) is

$$A_\mu = \frac{-2i\,\sigma_{\mu\nu}\,(x-\xi)^\nu}{(x-\xi)^2 + \rho^2}$$ [34]

(Upon reinserting the coupling constant $g$, which
has been scaled to unity, the field profiles acquire
the factor $g^{-1}$.) In [34], $\sigma_{\mu\nu} \equiv (1/4i)(\sigma^\dagger_\mu\sigma_\nu - \sigma^\dagger_\nu\sigma_\mu)$,
$\sigma_\mu \equiv (-i\boldsymbol{\sigma}, I)$. $\xi$ is the “location” of the
instanton, $\rho$ is its “size,” and there are three more
implicit parameters fixing the gauge, for a total of
eight parameters that are needed to specify a single
SU(2) instanton. One can show that there exist $N$
instanton/anti-instanton solutions ($P = N/{-N}$) and
in SU(2) they depend on $8N$ parameters. From [26]
we see that at fixed $N$, instantons minimize the
(Euclidean) action. Explicit formulas exist for the
most general $N = 2$ solution, while for $N \geq 3$
explicit formulas exhibit only $5N + 7$ parameters.
But algorithms have been found that construct
the most general $8N$-parameter instantons. The
1-instanton solution is unchanged by SO(5)
rotations, the maximal compact subgroup of the
SO(5,1) conformal invariance group for the
Euclidean 4-space Yang-Mills equation [14a].
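A quick numerical check of the statements that the 1-instanton has $P = 1$ and minimizes the action can be sketched as follows. The closed-form densities used here are the standard textbook expressions for the single SU(2) instanton and are not written out in the article: $F^a_{\mu\nu}F^a_{\mu\nu} = 192\rho^4/((x-\xi)^2+\rho^2)^4$, so that the Chern-Pontryagin density is $(6/\pi^2)\,\rho^4/((x-\xi)^2+\rho^2)^4$ and the action density is $48\,\rho^4/((x-\xi)^2+\rho^2)^4$ (with $g = 1$). The function name is illustrative.

```python
import math


def instanton_charge_and_action(rho=1.0, steps=20000):
    """Integrate the 1-instanton densities over Euclidean 4-space (g = 1).

    Uses spherical shells of area 2 pi^2 r^3, the substitution r = rho * tan(t)
    and a midpoint rule on t in (0, pi/2).
    """
    h = (math.pi / 2.0) / steps
    charge = 0.0
    action = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        r = rho * math.tan(t)
        dr_dt = rho / math.cos(t) ** 2
        shell = 2.0 * math.pi ** 2 * r ** 3            # surface area of a 3-sphere
        profile = rho ** 4 / (r * r + rho * rho) ** 4
        charge += (6.0 / math.pi ** 2) * profile * shell * dr_dt
        action += 48.0 * profile * shell * dr_dt
    return charge * h, action * h


if __name__ == "__main__":
    p, i_ym = instanton_charge_and_action(rho=0.7)
    print(p, i_ym, 8.0 * math.pi ** 2)
```

The result does not depend on the size $\rho$, and the action comes out as $8\pi^2$ times the index, which is the value at which the first term of [26] vanishes.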

The Chern-Pontryagin index also appears in the
Yang-Mills quantum action, for the following
reason. Since all physical states respond to gauge
transformations $U_n$ with the universal phase $n\theta$
[24], physical states may be presented in factorized
form,

$$\Psi(\mathbf{A}) = e^{i\theta W(\mathbf{A})}\,\Phi(\mathbf{A})$$ [35]

where $\Phi(\mathbf{A})$ is invariant against all gauge transformations,
small and large, while the phase response is
carried by $W(\mathbf{A})$,

$$W(\mathbf{A}^{U_n}) = W(\mathbf{A}) + n$$ [36]

An explicit expression for $W(\mathbf{A})$ is given by
$-(1/4\pi^2)\int d^3x\,K^0$, where $K^0$ is the time (fourth)
component of $K^\mu$, with dependence on the fourth
variable suppressed, that is, $K^0$ is defined on 3-space,

$$W(\mathbf{A}) = -\frac{1}{4\pi^2}\int d^3x\,\epsilon^{ijk}\,\mathrm{tr}\,(\tfrac12A_i\partial_jA_k + \tfrac13A_iA_jA_k)$$ [37]


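The gauge transformation properties of $W(\mathbf{A})$ are

$$W(\mathbf{A}^U) = W(\mathbf{A}) + \frac{1}{8\pi^2}\int d^3x\,\epsilon^{ijk}\,\partial_i\,\mathrm{tr}\,(\partial_jU\,U^{-1}A_k) + \frac{1}{24\pi^2}\int d^3x\,\epsilon^{ijk}\,\mathrm{tr}\,(U^{-1}\partial_iU\,U^{-1}\partial_jU\,U^{-1}\partial_kU)$$ [38]

The middle surface term does not contribute for
well-behaved $\mathbf{A}$; the last term is again $w(U)$, the
winding number of the gauge transformation $U$.
Thus, [36] is verified.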

The universal gauge-varying phase $e^{i\theta W(\mathbf{A})}$, which
multiplies all gauge-invariant functional states, may
be removed at the expense of subtracting from the
action

$$\theta\int dt\,\partial_tW(\mathbf{A}) = -\frac{\theta}{4\pi^2}\int d^4x\,\partial_\mu K^\mu = \theta P$$

(as in [33]). Thus, the Yang-Mills quantum action
extends [12] to

$$I_{\rm quantum} = \int d^4x\,\mathrm{tr}\left(\tfrac12F^{\mu\nu}F_{\mu\nu} + \frac{\theta}{16\pi^2}\,{}^*F^{\mu\nu}F_{\mu\nu}\right)$$ [39]

The additional Chern-Pontryagin term in [39]
does not contribute to equations of motion, but it is
needed to render all physical states invariant against
all gauge transformations, large and small. With this
transformation, one sees that the $\theta$-angle is a
Lorentz invariant, but CP noninvariant effect.
Evidently, specifying a classical gauge theory
requires fixing a group; a quantized gauge theory is
specified by a group and a $\theta$-angle, which arises
from topological properties of the gauge theory. The
energy eigenvalues depend on $\theta$, and distinct $\theta$’s
correspond to distinct theories.

Note that the reasoning leading to [24] and [39] 
relies on exact quantum-mechanical arguments, 
while the instanton-based tunneling discussion is 
semiclassical. 


Adding Fermions 


When fermions couple to the gauge fields, the
previously described topological effects are modified
by action of the chiral anomaly. Dirac fields, either
noninteracting but quantized, or unquantized but
interacting with a gauge potential through a
covariantly conserved current $J^\mu_a$, $\mathcal{L}_I = -J^\mu_aA^a_\mu$, also
possess a chiral current $j^\mu_5 = \bar\psi\gamma^\mu\gamma_5\psi$, which satisfies

$$\partial_\mu j^\mu_5 = 2im\,\bar\psi\gamma_5\psi$$ [40]
Here $m$ is the mass, if any, of the fermions. $j^\mu_5$ is
conserved for massless fermions, which therefore
enjoy a chiral symmetry: $\psi \to e^{\alpha\gamma_5}\psi$. However,
when the interacting fermions are quantized, there
arises a correction to [40]; this is the chiral anomaly:

$$\partial_\mu\langle j^\mu_5\rangle_A = 2im\,\langle\bar\psi\gamma_5\psi\rangle_A + C\,\mathrm{tr}\,{}^*F^{\mu\nu}F_{\mu\nu}$$ [41]

$C$ is determined by the fermion quantum numbers
and coupling strengths. (For a single charged ($e$)
fermion and a U(1) gauge potential, $C = e^2/8\pi^2$.)
$\langle\,\cdot\,\rangle_A$ signifies the fermionic vacuum matrix element
in the presence of $A_\mu$. The modified equation [41]
indicates that even in the massless limit chiral
symmetry remains broken due to the anomaly,
which arises with quantized fermions.
$\langle\bar\psi\gamma_5\psi\rangle_A$ may also be presented as


(is) = troyst (ip) 4 [42] 


In Euclidean space $\langle\psi\bar\psi\rangle_A$ is the coincident-point
limit of the resolvent $R(x, y; m)$ for the Dirac
equation,


We (x 
R(x, y3 2s exe. aa [43] 


Here $\psi_\epsilon$ is an eigenfunction of the massless,
Euclidean Dirac operator in the presence of the
gauge field $A_\mu$,

$$i\gamma^\mu(\partial_\mu + A_\mu)\,\psi_\epsilon = \epsilon\,\psi_\epsilon$$ [44]


The coincident-point limit is singular, so R must be 
regulated: R — R — Rreg (we do not specify the 
regularization procedure). It then follows that 


Pl (x)ys We(x 
AF) aa - eee — trys HO, RReg 
. yi (x) Ys Pe (x ) 
= Cea 45 
in Do r [45] 


The first term on the right-hand side is the (Euclidean
space) analog of the mass term in [40] or [41], while
the second survives even after the regulators are
removed, giving the anomaly $\mathrm{tr}\,{}^*F^{\mu\nu}F_{\mu\nu}$.

The anomaly formula [41], or more explicitly
[45], is also the local form of the Atiyah-Singer
index theorem, which follows after [45] is integrated
over all space: The left-hand side integrates to zero.
The integral of the first term on the right-hand side,
$\int d^4x\,\psi^\dagger_\epsilon\gamma_5\psi_\epsilon$, vanishes for $\epsilon \neq 0$ by orthogonality,
because $\gamma_5\psi_\epsilon$ is an eigenfunction of [44] with
eigenvalue $-\epsilon$. Only zero modes contribute to the $\epsilon$
sum since these can be chosen to be eigenfunctions
of $\gamma_5$, $n_\pm$ of them satisfying $\gamma_5\psi_0 = \pm\psi_0$. For a single
multiplet, the normalizations work out so that


$$n_+ - n_- = \frac{1}{16\pi^2}\int d^4x\,\mathrm{tr}\,{}^*F^{\mu\nu}F_{\mu\nu}$$ [46]


The result that the (signed) number of zero modes is the
Chern-Pontryagin index is an instance of the Atiyah-Singer
theorem. (In specific applications, one can
frequently show that $n_+$ or $n_-$ vanishes.) It, therefore,
follows that in the background field of instantons, the
Euclidean Dirac equation possesses zero modes.

Another viewpoint on the chiral anomaly arises
within the functional integral formulation, where the
exponentiated action is constructed from unquantized
fields, over which the functional integration is
performed. Here the classical action retains chiral
symmetry $\psi \to e^{\alpha\gamma_5}\psi$, but the Grassmann fermion
measure $d\psi\,d\bar\psi$, once it is properly regularized, loses
chiral invariance and acquires the anomaly,

$$d\psi\,d\bar\psi \to d\psi\,d\bar\psi\,\exp\left(iC\int d^4x\,\alpha\,\mathrm{tr}\,{}^*F^{\mu\nu}F_{\mu\nu}\right)$$ [47]

Evidently, the chiral anomaly involves the gauge-theoretic
topological entity, the Chern-Pontryagin
density. Not unexpectedly, the anomaly phenomenon
significantly affects the topological properties
of the gauge theory that are connected to $P$ and
were described previously.

When there is (at least) one massless fermion
coupling to the Yang-Mills fields, the Yang-Mills
$\theta$-angle loses physical relevance. This is because a
chiral transformation that redefines the massless
Dirac field does not modify the classical action, but
owing to the chiral noninvariance of the functional
measure, [47], an anomaly term is induced in the
(effective) quantum action. The strength of this
induced term can be fixed so that it cancels the
$\theta$-term in [39]. Since field redefinition cannot affect
physics, the elimination of the $\theta$-term indicates that it
had no physical relevance in the first place. In
particular, energy eigenvalues no longer depend on $\theta$.

An alternate argument for the same conclusion is
based on the functional determinant that arises
when the functional integral is performed over the
massless Dirac field: $\det[\gamma^\mu(\partial_\mu + A_\mu)]$. The semiclassical
tunneling analysis of the $\theta$-angle is based on
instantons, but in the presence of instantons the
Dirac equation has a zero mode [46]. Consequently
the determinant vanishes, tunneling is suppressed,
and so is the $\theta$-angle.

However, in the standard model for particle
physics, there are no massless fermions, so the
presence of the $\theta$-angle entails the following physical
consequences. The tunneling amplitude $T$ in leading
semiclassical approximation is determined by the
Euclidean action, namely the continuation of the quantum action
[39] to imaginary time. This results in the same
expression except that the topological $\theta$-term
acquires a factor of $i$. Only the 1-instanton and
anti-instanton give the dominant contribution,

$$T \propto \cos\theta\;e^{-8\pi^2/g^2}$$ [48]

where the coupling constant $g$ has been reinserted;
the proportionality constant has not been computed,
owing to infrared divergences. (Higher-instanton-number
configurations contribute at an exponentially
subdominant order and have thus far played
no role in physics.) The tunneling leads to baryon
decay, but fortunately at an exponentially small
rate. More useful is the fact that instanton tunneling
gives semiclassical evidence for the removal of an
unwanted chiral U(1) Goldstone symmetry, which
would be present in the standard model if the chiral
anomaly did not interfere. Furthermore, the chiral
anomaly facilitates the decay of the neutral pion to
two photons, a process forbidden by other apparent
chiral symmetries of the standard model, which in
fact are modified by the chiral anomaly. Gauge
fields in four dimensions must interact with anomaly-free
currents. This necessitates a precise adjustment
of fermion content and charges so that the anomaly
coefficients (analogs of “C” in [41]) vanish for
currents coupled to gauge fields. Finally, $\theta \neq 0$
provides a tantalizing source of CP violation in the
strong-interaction sector of the standard model. But
no experimental signal (e.g., neutron electric dipole
moment) for this effect has been seen. At present, we
do not know what mechanism is responsible for
keeping $\theta$ vanishingly small.
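To get a feeling for how strongly the tunneling in [48] is suppressed at weak coupling, the exponential factor can simply be evaluated. In the sketch below the uncomputed proportionality constant is set to 1, which is an assumption made purely for illustration, and the function name is hypothetical.

```python
import math


def tunneling_factor(g, theta=0.0):
    """cos(theta) * exp(-8 pi^2 / g^2), i.e. [48] with the prefactor set to 1."""
    return math.cos(theta) * math.exp(-8.0 * math.pi ** 2 / g ** 2)


if __name__ == "__main__":
    for g in (0.5, 1.0, 2.0, 4.0):
        print(g, tunneling_factor(g))
```

The factor $e^{-8\pi^2/g^2}$ and all of its derivatives with respect to $g$ vanish as $g \to 0$, so the effect is invisible at every order of a power series in the coupling.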

These are the physical consequences of topologi- 
cal effects in four-dimensional gauge theories. 
Although they have provided experimentalists with 
only a few numbers to measure (e.g., $\pi^0 \to 2\gamma$ decay
amplitude, prediction of anomaly-free arrangements 
of quarks and leptons in families), they have added 
enormously to our appreciation of the complexities 
of quantized gauge theories. 

That chiral anomalies are an obstruction to
consistent gauge interactions can be established
within perturbation theory. A similar, but nonperturbative
effect is seen in an SU(2) gauge theory with
$N$ Weyl fermion ($\gamma_5\psi = \pm\psi$) SU(2) doublets, which
lead upon functional integration to $\det^{N/2}[\gamma^\mu(\partial_\mu + A_\mu)]$.
But because $\Pi_4(SU(2)) = Z_2$, there exists a
single homotopy class of gauge transformations
which are not deformable to the identity. One
shows that the determinant changes sign when
such a gauge transformation is performed. Thus,
the theory is ill-defined for odd $N$. Consistent SU(2)
gauge theories must possess an even number of Weyl
fermion doublets, but such models have not found a
place in physical theory.


Adding Bosons 


Instantons are finite-action solutions to classical 
equations continued to imaginary time; they provide 
a semiclassical description of quantum-mechanical 
tunneling. A field theory may also possess finite- 
energy, time-independent (static) solutions to the 
real-time equations of motion. When these solutions 
are stable for topological reasons, they are called 
“solitons.” Solitons give semiclassical evidence for 
the existence in the quantum field theory of a 
particle sector disjoint from the particles obtained 
by quantizing field fluctuations around the vacuum 
state. The soliton particles are heavy for weak
coupling $g$. (Their energy is $O(1/g^2)$; the field
profiles are $O(1/g)$.) They do not decay owing to
the conservation of “charges” that do not arise from 
Noether’s theorem but are topological. 

Yang-Mills theory does not possess soliton solutions
(except in five-dimensional spacetime, where
the static solitons are just the four-dimensional
instantons discussed previously). However, when a
gauge theory based on a simple group is coupled to
a scalar field that undergoes symmetry breaking to
U(1), soliton solutions exist. These are the ’t Hooft-Polyakov
magnetic monopoles, found in an SU(2)
gauge theory with scalar fields in the adjoint
representation, as well as various generalizations.
The topological consideration that arises here concerns
finite energy of the static, scalar field multiplet
$\varphi$, which in the Weyl gauge is

$$E(\varphi) = \int d^3x\left(\tfrac12\,(\mathbf{D}\varphi)^a\cdot(\mathbf{D}\varphi)^a + V(\varphi)\right)$$ [49]


$V$ is non-negative and possesses nontrivial symmetry-breaking
zeroes. On the sphere $S^2$ at spatial infinity,
$\varphi$ must tend to such a zero. Thus, the fields belong
to $G/H$, where $G$ is the gauge group and $H$ the
unbroken subgroup. For the ’t Hooft-Polyakov
monopole these are SU(2) and U(1), respectively,
and the scalar field provides a mapping of the sphere
at infinity $S^2$ to $S^2 \approx SU(2)/U(1)$.

One now considers $\Pi_2(S^2) = \Pi_2(SU(2)/U(1)) =
\Pi_1(U(1)) = Z$, and one shows that the magnetic
flux is determined by the winding number. Hence,
the magnetic charge is quantized. Explicitly, the
electromagnetic U(1) gauge field is given by


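$$f_{\mu\nu} = \hat\varphi^aF^a_{\mu\nu} - \epsilon_{abc}\,\hat\varphi^a(D_\mu\hat\varphi)^b(D_\nu\hat\varphi)^c = \partial_\mu a_\nu - \partial_\nu a_\mu$$

$$a_\mu = \hat\varphi^aA^a_\mu - \cos\alpha\,\partial_\mu\beta$$ [50]

where $\hat\varphi^a$ is the unit isovector, parametrized as
$\hat\varphi^a = (\sin\alpha\cos\beta, \sin\alpha\sin\beta, \cos\alpha)$. The manifestly
conserved magnetic current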


jh, = or” Sa] 
is rearranged to read 
jt = ter O e 1 On 0o O,0° [51b] 


and is nonvanishing because $\varphi$ possesses zeroes,
where $\partial_\alpha\hat\varphi^a$ acquires localized singularities. The
magnetic charge


la =p i | 
odse d’xV.-b [52] 


(b'=U(1) magnetic field: —4e/*f, =*f") is given by 
the topological entity (Kronecker index of the 
mapping) 


1 : PEEN p 
ele ce 1 OD 3 One 

=| dS! Pe sre P Oj’ OpP° 
"ET pfas cko, cos Q0 8 [53] 


which readily evaluates the integer winding number. 
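The integer character of the Kronecker index can be illustrated numerically. The sketch below evaluates the degree of a map from the sphere at infinity to the unit isovector, $\frac{1}{4\pi}\int d\vartheta\,d\phi\;\hat\varphi\cdot(\partial_\vartheta\hat\varphi\times\partial_\phi\hat\varphi)$, by a midpoint rule with finite-difference derivatives. The normalization $1/4\pi$ is chosen here so that the result is the integer degree of the map. The test configuration $\alpha = \vartheta$, $\beta = n\phi$ and all function names are illustrative assumptions, not taken from the article.

```python
import math


def unit_isovector(alpha, beta):
    """Unit isovector (sin a cos b, sin a sin b, cos a)."""
    return (math.sin(alpha) * math.cos(beta),
            math.sin(alpha) * math.sin(beta),
            math.cos(alpha))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def kronecker_index(field, n_theta=200, n_phi=200, eps=1e-5):
    """Degree of the map (theta, phi) -> unit isovector, normalized by 1 / (4 pi)."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * d_theta
        for j in range(n_phi):
            ph = (j + 0.5) * d_phi
            # central finite differences for the two tangent vectors
            d_th = [(a - b) / (2.0 * eps)
                    for a, b in zip(field(th + eps, ph), field(th - eps, ph))]
            d_ph = [(a - b) / (2.0 * eps)
                    for a, b in zip(field(th, ph + eps), field(th, ph - eps))]
            total += dot(field(th, ph), cross(d_th, d_ph)) * d_theta * d_phi
    return total / (4.0 * math.pi)


def hedgehog(n):
    """Configuration winding n times in the azimuthal angle: alpha = theta, beta = n phi."""
    return lambda th, ph: unit_isovector(th, n * ph)


if __name__ == "__main__":
    for n in (1, 2, -1):
        print(n, kronecker_index(hedgehog(n)))
```

For these configurations the integrand reduces to $n\sin\vartheta$, so the numerical values cluster around the integers $n$ up to discretization error.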

The theory also supports charged magnetic monopole
solutions called “dyons.” Here the profiles
involve time-periodic gauge potentials, where the
time variation is just a gauge transformation
$\partial_t\mathbf{A} = \mathbf{D}\lambda$. (Gauge-equivalent, static expressions
have slow large-distance fall-off, which is removed
by the time-dependent gauge function.) For dyons,
the integer valued Chern-Pontryagin index, with the
integration taken over all space and in time over the
dyon period, reproduces the magnetic monopole
strength.

Regrettably, these fascinating structures are not 
found in nature. Nor do they arise in the standard 
model, whose structure group is not simple, 
although speculative grand unified models, with 
simple G and H=SU(3) x U(1), would support 
magnetic monopoles and dyons. While challenged 
physically, the magnetic monopole phenomena have 
produced extensive and interesting mathematical 
analysis. 


Gauge Theories in Two Dimensions 


Two-dimensional gauge theories have only a few 
physical applications; edge states of the planar 
quantum Hall effect can be described by excitations 
moving on a line. However, the abelian model with 
fermions is useful in that it provides a very accurate 


reflection of topological behavior in the physically 
important four-dimensional theory. 


Abelian Gauge Theory 


Take the spatial interval to be $[-L, L]$. Homotopically
nontrivial gauge transformations satisfy $\lambda(L) -
\lambda(-L) = 2\pi n$ ($\Pi_1(U(1)) = Z$). States $\Psi(A)$ of the free
gauge theory that satisfy Gauss’ law and respond
with a $\theta$-angle are

$$\Psi(A) = \exp\left(\frac{i\theta}{2\pi}\int dx\,A(x)\right)$$

$$\Psi(A + \partial_x\lambda) = e^{in\theta}\,\Psi(A)$$ [54]


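In this model, $\theta$ has the interpretation of a constant
background electric field $\mathcal{E} = -\theta/2\pi$,

$$E\,\Psi(A) = \mathcal{E}\,\Psi(A), \qquad E = F_{01}$$

$$i\frac{\delta}{\delta A(x)}\Psi(A) = -\frac{\theta}{2\pi}\,\Psi(A)$$ [55]

This also gives the energy eigenvalue:

$$\frac12\int dx\,E^2\,\Psi(A) = \frac12\int dx\,\mathcal{E}^2\,\Psi(A)$$ [56]

The phase may be removed by adding to the
Lagrangian $-(\theta/2\pi)\int dx\,\dot A$; equivalently, the
action becomes

$$I_{\rm quantum} = \int d^2x\left(-\tfrac14F_{\mu\nu}F^{\mu\nu} + \frac{\theta}{4\pi}\,\epsilon^{\mu\nu}F_{\mu\nu}\right)$$ [57a]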


which apart from a constant is also given by a 
formula with the background field: 
prentum _ ; J dx(E + £} [57b] 
Because of gauge invariance, there is only one state,
annihilated by $E$ and carrying energy $\frac{1}{2}\int dx\,\mathcal{E}^2$.
Distinct $\theta$ (different $\mathcal{E}$) correspond to distinct
theories.
We recognize in [57a] the two-dimensional
Chern-Pontryagin density, contributing a total
derivative to the action,


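$$\mathcal{P} = \frac{1}{4\pi}\,\epsilon^{\mu\nu}F_{\mu\nu} \qquad [58]$$
the Chern-Simons current, whose divergence is $\mathcal{P}$,
$$\mathcal{K}^\mu = \frac{1}{2\pi}\,\epsilon^{\mu\nu}A_\nu \qquad [59]$$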


and the Chern-Simons term, which carries the phase


$$W(A) = \int dx\,\mathcal{K}^0 = \frac{1}{2\pi}\int dx\,A(x) \qquad [60]$$
For Euclidean-space gauge potentials, which are
given at large distance by the pure gauge
$\partial_\mu\,(n\tan^{-1} y/x)$, $\int d^2x\,\mathcal{P} = n$. All this is
just as in the four-dimensional theory, except there are no
instantons and no tunneling.


Adding Fermions 


The addition of massless fermions to the U(1) gauge
theory results in the Schwinger model of massless
quantum electrodynamics in two-dimensional spacetime.
The equation of motion becomes


$$\partial_\mu F^{\mu\nu} = J^\nu \qquad [61]$$


with the vector current constructed from the Dirac
fields as $J^\mu = \bar\psi\gamma^\mu\psi$. This current remains conserved
in the quantized version because it couples to the
gauge field. But the axial vector current
$J_5^\mu = \bar\psi\gamma^\mu\gamma_5\psi$
acquires an anomaly that involves the
Chern-Pontryagin density in [58],
ae a 


The model is readily solved, and shows no $\theta$-angle
(background field) dependence in physical quantities.
The solution is directly obtained by combining
[61] with [62] into a second-order differential
equation and using the matrix identity of
two-dimensional Dirac (= Pauli) matrices:
$\epsilon^{\mu\nu}\gamma_\nu\gamma_5 = \gamma^\mu$.
It follows that


$$\left(\Box + \frac{1}{\pi}\right)E = 0 \qquad [63]$$


So the theory describes a free massive photon (mass
squared $= 1/\pi$ in units of $\hbar$ and the coupling
constant, which have been scaled to unity), with no
sign of a $\theta$-angle (background field).

However, in parallel with four-dimensional behavior,
the model with massive fermions regains a $\theta$
dependence in the particles' energy spectrum, a
result that is established perturbatively, because a
complete solution is not available.

Note that in the Schwinger model, the gauge 
particle (“photon”) acquires a mass, even though 
local gauge invariance is preserved. This happens 
essentially for topological/anomaly reasons. Such 
topological mass generation is met again in three 
dimensions. 


Adding Bosons 


Scalar electrodynamics with a negative mass-squared
term in (3 + 1)-dimensional spacetime leads to the
Higgs mechanism and short-range interactions due
to the massive photons. In (1 + 1) spacetime dimensions,
the model possesses instantons (scalar and
gauge field profiles that solve the imaginary-time
equations of motion) labeled by $\Pi_1(U(1)) = \mathbb{Z}$.
These disorder the Higgs condensate so that the
force between charged particles remains long-range,
as in the positive mass-squared case. This is a vivid
example of how excitations arising from nontrivial
topology significantly affect physical
content.


Gauge Theories in Three Dimensions 


Gauge theories on three-dimensional spacetime, that 
is, evolving on a plane, have physical application to 
planar phenomena, like the quantum Hall effect. 
Also, the high-temperature limit of four-dimensional 
field theories is governed by the corresponding field 
theory in three Euclidean dimensions. 

In three (more generally, odd) dimensions, there
are no Chern-Pontryagin quantities, no Chern-Simons
currents, no axial vector currents or anomalies
(there is no $\gamma_5$ matrix). These are replaced by
odd-dimensional entities that can modify
Yang-Mills dynamics.


Yang-Mills and Other Gauge Theories 


Using the three-index Levi-Civita tensor, one can
construct a gauge-covariant, covariantly conserved
vector, which can be added to the Yang-Mills
equation. Thus, [14] can be modified to


$$D_\mu F^{\mu\nu} + \frac{m}{2}\,\epsilon^{\nu\alpha\beta}F_{\alpha\beta} = J^\nu \qquad [64a]$$
or, equivalently, in terms of the dual-field strength
${}^*F^\mu = \frac{1}{2}\epsilon^{\mu\alpha\beta}F_{\alpha\beta}$,


aD *R, +wF = j” [64b] 


For dimensional balance, m carries dimension of 
mass. Indeed, in the source-free case [64] implies 


(D° Dane) B= Sieg FO) [65] 


This shows that excitations are massive, even 
though local gauge invariance is preserved. Other- 
wise, as in the Dirac monopole case, the equations 
of motion are unexceptional. 

However, for the quantum theory we need the
action, whose variation produces the mass term in
[64]. This is just the Chern-Simons term $W(A)$ in
[37], multiplied by $-8\pi^2 m$ and now defined on
(2 + 1)-dimensional spacetime:


Ics = 2m [ex ctr (1A ,0gA, T TAa AgA,) [66] 


Everything holds also in the abelian theory; the last
term in [66] is then absent.
In this model, the mass is generated by a
topological mechanism since $I_{CS}$ possesses the usual
attributes of a topological entity: it is diffeomorphism
invariant without a metric tensor; when
the potentials are appropriately parametrized, it is
given by a surface term. (In the abelian case,
the appropriate parametrization is in terms of the
Clebsch decomposition,
$A_\mu = \partial_\mu\theta + \alpha\,\partial_\mu\beta$.) Most
importantly, in the nonabelian theory [66] changes
by $8\pi^2 m n$ under three-dimensional gauge transformations
carrying winding number $n$. Hence, for
consistency of the nonabelian quantum theory, $m$
must be quantized as $n/4\pi$ (in units of $\hbar$ and the
coupling constant, which have been scaled to unity).
All this is a clear field-theoretic analog of the
quantum mechanics of the Dirac monopole, and
just as for the magnetic monopole, a Hamiltonian
argument for quantizing $m$ can be constructed, as an
alternative to the above action-based derivation.

The time component of [64] relates the electric
and magnetic fields to the charge density:


$$D\cdot E - mB = \rho \qquad [67]$$


In the abelian case, the first term involves a total
derivative and its spatial integral vanishes, leaving a
formula that identifies magnetic flux with a total
charge. At low energy, the mass term dominates the
conventional kinetic term in [64], and the
flux-charge relation becomes a local field-current
identity,


$$\frac{m}{2}\,\epsilon^{\mu\alpha\beta}F_{\alpha\beta} = J^\mu \qquad [68]$$


These formulas have made Chern-Simons-modified
gauge theories relevant to issues in condensed matter
physics, for example, the quantum Hall effect. In the
abelian case, $m$ need not be quantized.


Adding Fermions 


Three-dimensional Dirac matrices are minimally realized
by 2 x 2 Pauli matrices. As a consequence, a mass
term is not parity invariant; also, there is no $\gamma_5$ matrix,
since the product of the three Dirac (= Pauli) matrices
is proportional to $I$. While there are no chiral
anomalies, there is the so-called parity anomaly:
integrating a single doublet of massless SU(2) fermions
one obtains $\Delta(A) = \det[\gamma^\mu(i\partial_\mu + A_\mu)]$, which should
preserve parity and gauge invariance.

Since there are no anomalies in current divergences,
$\Delta(A)$ is certainly invariant against infinitesimal gauge
transformations. But for finite gauge transformations
(categorized by $\Pi_3(SU(2)) = \mathbb{Z}$) one finds that $\Delta(A)$ is
not invariant: when the gauge transformation belongs
to an odd-numbered homotopy class, $\Delta(A)$ changes
sign. To regain gauge invariance, one must either work
with an even number of fermion doublets or, if only
one doublet (more generally, an odd number) is to be
used, one must add to the gauge Lagrangian a
parity-violating Chern-Simons term with half the correctly
quantized coefficient, to neutralize the gauge
noninvariance of $\Delta(A)$.

Alternatively, $\Delta(A)$ can be regularized in a
gauge-invariant manner. But this requires massive
Pauli-Villars regulator fields, which produce a
parity-violating expression for $\Delta(A)$. One cannot avoid the
parity anomaly.


Adding Bosons 


There are a variety of bosonic field models that one
may consider: abelian or nonabelian; with conventional
kinetic term or supplemented by the Chern-Simons
topological mass; or, for low energy, no kinetic
term but only the Chern-Simons term, as in [68].
Abelian charged Bose fields in a Maxwell theory lead
to vortex solitons, based on $\Pi_1(U(1)) = \mathbb{Z}$. These are
just the instantons of the (1 + 1)-dimensional bosonic
gauge theory discussed previously. With Maxwell
kinematics there are no charged vortices, but these
appear when the Chern-Simons mass is added; see [67].
Pure Chern-Simons kinematics, with no Maxwell
term, can produce completely integrable soliton
equations (Liouville, Toda) when the Bose field
dynamics is appropriately chosen.


Conclusion 


Topological effects in field theory are associated with
the infinities and regularization that beset quantum
field theories. These give rise to the chiral anomaly and the
parity anomaly (and scale symmetry anomalies, not
discussed here). Yet the anomalies themselves are finite
quantities that have topological significance
(Atiyah-Singer, Chern-Pontryagin, Chern-Simons). This
paradoxical pairing has not been understood. Nor can we
explain why the anomalies interfere in a topological
manner with symmetries associated with masslessness.
Although the range of topological effects in gauge
theory is large, and even larger in non-gauge theories
(sigma models, Skyrme models), the relevance to actual
fundamental physics is confined to the $\theta$-angle
phenomenon, which is analyzed accurately and abstractly
by reference to $\Pi_3(G)$ and to the interplay with
fermions through the chiral anomaly. Instantons are
relevant only to an approximate, semiclassical
discussion. Although, after much mathematical work, general
instanton configurations are well understood, only the
1-instanton solution enjoys physical significance.
Other topological entities that fascinate are either
nonexistent in fundamental physics or are relevant to
condensed matter physics (vortices, Chern-Simons
effects). But here too, we note that the fundamental
equation of condensed matter physics, the
many-body Schrödinger equation, carries no evident
topological structure. Only the phenomenological
equations, which replace the fundamental one, give
rise to topological intricacies.


Introduction


Quantum mechanics was born at the beginning of the
twentieth century with the quantization rules for the
harmonic oscillator and for the hydrogen atom. Such
rules were almost immediately extended to more
general systems by the so-called Bohr-Sommerfeld
quantization rule: "the actions of the classical system
can assume only those values which are integer
multiples of $\hbar$." However, the actions are defined
only in some special situations and, moreover, at the
present time the Schrödinger equation is the paradigm
of quantum mechanics. A question naturally arises: is
there any relation between the eigenvalues of the
Schrödinger operator and the numbers obtained by the
Bohr-Sommerfeld quantization rule (when available)?

According to common wisdom, the "Bohr-Sommerfeld
numbers" are a first approximation to the
eigenvalues of the Schrödinger operator in the so-called
semiclassical limit. However, precise mathematical
results on the subject were obtained only in the 1980s
and a good understanding of the problem has been
achieved only recently. In particular it is now clear how
to compute higher-order corrections to the eigenvalues:
this is done through suitable normal form procedures.

In the present article we will discuss the above 
questions for the case of perturbed harmonic 
oscillators, a case which, on the one hand, is 
physically relevant and, on the other, is well under- 
stood. We will only briefly discuss the quantization 
of perturbations of integrable systems. 


A Statement 


On $L^2(\mathbb{R}^n)$, consider the Schrödinger operator


$$\hat H = -\frac{\hbar^2}{2}\Delta + V \qquad [1]$$

where $\Delta$ is the $n$-dimensional Laplacian and $V$ is a
smooth real potential having an absolute nondegenerate
minimum at the origin. We are interested in
the eigenvalues of [1] close to zero. Introduce
coordinates adapted to the normal modes, namely
such that


$$V(x) = \sum_{j=1}^{n}\frac{\omega_j^2 x_j^2}{2} + O(|x|^3)$$


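Assume


(H1) Nonresonance: There exist $\gamma > 0$ and $\tau \in \mathbb{R}$
such that, for any $k \in \mathbb{Z}^n - \{0\}$ one has


$$|\omega\cdot k| \ge \frac{\gamma}{|k|^{\tau}} \qquad [2]$$


(H2) $V(x) \ne 0$ for $x \ne 0$, and
$\liminf_{|x|\to\infty} V(x) > 0$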


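(H3) $V \in C^\infty(\mathbb{R}^n)$ and for any multi-index $\alpha$ there exists $C_\alpha$
such that

$$\left|\frac{\partial^{|\alpha|} V}{\partial x^{\alpha}}(x)\right| \le C_\alpha\,\langle x\rangle^{m}, \qquad \forall x \in \mathbb{R}^n$$

where we used the notation $\langle x\rangle := (1 + \|x\|^2)^{1/2}$.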


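Theorem 1 Assume that (H1)-(H3) hold. Then, for
any positive $N$, $M$ there exist positive constants
$\hbar_{N,M}$, $\varepsilon_{N,M}$, $C_{N,M}$, $C'_{N,M}$ and a smooth function
$Z_{N,M}(I_1,\ldots,I_n;\hbar)$
such that, $\forall\, 0 < \varepsilon < \varepsilon_{N,M}$ and $0 < \hbar < \hbar_{N,M}\,\varepsilon$, the
eigenvalues of [1] in $[0,\varepsilon)$ have the representation

$$\lambda_k(\hbar) = (k + \tfrac{1}{2})\cdot\omega\,\hbar + Z_{N,M}((k + \tfrac{1}{2})\hbar;\hbar) + R_{N,M}(k,\hbar), \qquad k \in \mathbb{N}^n,\ k_j \ge 1 \qquad [3]$$


where


$$|R_{N,M}(k,\hbar)| \le C_{N,M}\,\varepsilon^{N} + C'_{N,M}\left(\frac{\hbar}{\varepsilon}\right)^{M}$$
More precisely, for any $k \in \mathbb{N}^n$ such that


$$(k + \tfrac{1}{2})\cdot\omega\,\hbar + Z_{N,M}((k + \tfrac{1}{2})\hbar;\hbar) \in [0,\varepsilon) \qquad [4]$$


there exists an eigenvalue $\lambda_k \in [0,\varepsilon)$ for which [3]
holds, and vice versa, for any eigenvalue in $[0,\varepsilon)$
there exists a $k$ satisfying [3] and [4]. The function
$Z_{N,M}(I_1,\ldots,I_n;0)$ coincides with the classical
Birkhoff normal form of the system computed up
to order $N$.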


The proof of the theorem is constructive, in the
sense that it provides an algorithm that allows one to
construct explicitly, by elementary operations, the
function $Z_{N,M}$. One could choose $\varepsilon = \varepsilon(\hbar) = \hbar^{\delta}$ with
some positive $\delta < 1$, obtaining a simpler statement
valid for the eigenvalues in $[0, \hbar^{\delta})$. It is also possible
to weaken the nonresonance condition (H1) to the
condition $\omega\cdot k \ne 0$ for $k \in \mathbb{Z}^n - \{0\}$.
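
As a rough numerical illustration of the nonresonance condition (H1), the following Python sketch scans the integer vectors $k$ in a finite box and returns the smallest value of $|\omega\cdot k|\,|k|^{\tau}$; on that box, (H1) holds for a given $\gamma$ exactly when this minimum is at least $\gamma$. The function name `worst_ratio`, the use of the $\ell^1$ norm for $|k|$, and the cutoff `kmax` are assumptions made for this example only. A finite scan can expose a resonance, but it cannot prove the condition for all $k$.

```python
from itertools import product

def worst_ratio(omega, tau, kmax):
    """Minimum of |omega.k| * |k|**tau over integer k != 0 with |k_j| <= kmax.

    |k| is taken to be the l1 norm (an assumption of this sketch).
    """
    best = float("inf")
    for k in product(range(-kmax, kmax + 1), repeat=len(omega)):
        norm = sum(abs(kj) for kj in k)
        if norm == 0:
            continue  # k = 0 is excluded in (H1)
        value = abs(sum(w * kj for w, kj in zip(omega, k))) * norm ** tau
        best = min(best, value)
    return best

golden = (1 + 5 ** 0.5) / 2
print(worst_ratio((1.0, golden), tau=1.0, kmax=20))  # a value of order 1
print(worst_ratio((1.0, 2.0), tau=1.0, kmax=20))     # 0.0, since k = (2, -1) is resonant
```

The frequency pair $(1, 2)$ fails even the weakened condition $\omega\cdot k \ne 0$, while the golden-ratio pair keeps the scanned quantity away from zero on the box.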

A theorem very close to Theorem 1 was proved by
Sjöstrand (1992) by a method different from the
one that will be presented here (see also Graffi and
Paul (1987)). In the analytic or Gevrey case (recall
that a $C^\infty$ function $f(x)$ is Gevrey in some domain if
there exist constants $C$, $\sigma$ such that, for all
multi-indices $\alpha \in \mathbb{N}^n$ one has


$$\left|\frac{\partial^{|\alpha|} f}{\partial x^{\alpha}}\right| \le C^{|\alpha|+1}(\alpha!)^{\sigma}$$








in the whole domain), the error can be reduced to be 
exponentially small with the parameters (Bambusi 
et al. 1999). Previous results dealing with compact 
perturbations of the harmonic oscillator were 
obtained by Bellissard and Vittot (1990). It is 
possible to deal also with the resonant case in 
which (H1) is violated. In this case the spectrum of 
the complete system is qualitatively different from 
the spectrum of the harmonic one. As discussed 
later, the normal form allows one to compute the 
main qualitative differences. 


Birkhoff Normal Form 


In this section we recall the procedure leading to 
classical Birkhoff normal form, whose quantization 
leads to the proof of Theorem 5. 


Birkhoff’s Theorem 


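The operator [1] is the quantization of the classical
Hamiltonian


$$H(\xi, x) = \sum_{j=1}^{n}\frac{\xi_j^2}{2} + V(x) \qquad [5]$$
Denote
$$H_0(\xi, x) = \sum_{j=1}^{n}\frac{\xi_j^2 + \omega_j^2 x_j^2}{2}, \qquad I_j = \frac{\xi_j^2 + \omega_j^2 x_j^2}{2\omega_j} \qquad [6]$$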

then we have 


Theorem 2 For any positive integer $N > 2$ there
exist a neighborhood $U_N$ of the origin and a
canonical transformation $T_N: \mathbb{R}^{2n} \supset U_N \to \mathbb{R}^{2n}$
which puts the system [5] in Birkhoff normal form
up to order $N$, namely such that


$$H\circ T_N = H_0 + Z_N + R_N \qquad [7]$$
where $Z_N$ Poisson-commutes with $H_0$, namely
$\{H_0; Z_N\} = 0$, and $R_N$ is small, that is,
$$|R_N(\xi, x)| \le C_N\,\|(\xi, x)\|^{N+1} \qquad [8]$$
Moreover, if the frequencies are nonresonant, namely
$$\omega\cdot k \ne 0, \qquad \forall k \in \mathbb{Z}^n\setminus\{0\} \qquad [9]$$


the function $Z_N$ depends on the actions $I_j$ only. We
recall that the Poisson bracket of two functions $f$
and $g$ is defined by


$$\{f; g\} := \sum_{j=1}^{n}\left(\frac{\partial f}{\partial \xi_j}\frac{\partial g}{\partial x_j} - \frac{\partial f}{\partial x_j}\frac{\partial g}{\partial \xi_j}\right) \equiv -\{g; f\}$$


and coincides with the Lie derivative of $g$ with
respect to the Hamiltonian vector field of $f$.


Remark 1 In the case where the frequencies fulfill
(H1) and the potential $V$ is analytic (or of Gevrey
class) the remainder can be reduced to be exponentially
small with $\|(\xi, x)\|$.


Scheme of the Proof 


Make the rescaling $\xi = \varepsilon\xi'$, $x = \varepsilon x'$. In terms of the
primed variables, the Hamiltonian of the system [5]
takes the form


$$H_\varepsilon(\xi', x') = H_0(\xi', x') + W(x') \qquad [10]$$
with
$$W(x') = \frac{V(\varepsilon x') - \varepsilon^2\sum_j \omega_j^2 x_j'^2/2}{\varepsilon^2} = \varepsilon W_3(x') + \varepsilon^2 W_4(x') + \cdots \qquad [11]$$


and $W_j$ is the Taylor polynomial of order $j$ of $V$. In
what follows we will omit primes from the scaled
variables.

Given an auxiliary Hamiltonian $\chi_3$, denote by $\Phi^{\varepsilon}$
the time-$\varepsilon$ flow of the corresponding Hamiltonian vector
field. We construct $\chi_3$ so that $H_\varepsilon\circ\Phi^{\varepsilon}$ is in normal
form up to order $\varepsilon$.


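Remark 2 Given a $C^\infty$ function $g$ one has


$$g\circ\Phi^{\varepsilon} \sim \sum_{l\ge 0}\varepsilon^{l} g_l, \qquad g_0 = g, \quad g_l = \frac{1}{l}\{\chi_3; g_{l-1}\}, \quad l \ge 1 \qquad [12]$$


where $\sim$ denotes the fact that the left-hand side is
asymptotic to the right-hand side (a precise definition
appears later in the article). If both $g$ and $\chi_3$ are
analytic then the series of $g\circ\Phi^{\varepsilon}$ can be shown to
converge in a neighborhood of the origin. Using [12]
to compute $H_\varepsilon\circ\Phi^{\varepsilon}$, we get


$$H_\varepsilon\circ\Phi^{\varepsilon} = H_0 + \varepsilon\,[W_3 + \{\chi_3; H_0\}] + O(\varepsilon^2)$$


So $H_\varepsilon\circ\Phi^{\varepsilon}$ is in normal form up to $O(\varepsilon^2)$ provided
$\chi_3$ fulfills the so-called homological equation:


$$W_3 + \{\chi_3; H_0\} = Z_3 \qquad [13]$$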


where the unknown function $Z_3$ has to be in normal
form. Note that, since the operator


$$\chi \mapsto \{\chi; H_0\}$$


maps linearly polynomials of degree $l$ into polynomials
of degree $l$, eqn [13] can be interpreted
as a linear equation in the finite-dimensional space
of polynomials of degree 3 in the phase-space
variables.


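Lemma 1 The homological equation [13] admits a
solution $(\chi_3, Z_3)$.


Proof Introduce the canonical coordinates $(\zeta, \eta)$ by

$$\zeta_j = \frac{1}{\sqrt{2}}\left(\frac{\xi_j}{\sqrt{\omega_j}} + i\sqrt{\omega_j}\,x_j\right), \qquad \eta_j = \frac{1}{i\sqrt{2}}\left(\frac{\xi_j}{\sqrt{\omega_j}} - i\sqrt{\omega_j}\,x_j\right)$$


In these variables the unperturbed Hamiltonian $H_0$
reads $H_0 = \sum_{j=1}^{n} i\omega_j\zeta_j\eta_j$ and $W_3$ is transformed into a
different polynomial, again of third order.
The important fact is that in these coordinates the
eigenvectors of the linear operator $\{H_0;\cdot\}$ are the
monomials


$$\zeta^k\eta^l := \zeta_1^{k_1}\cdots\zeta_n^{k_n}\,\eta_1^{l_1}\cdots\eta_n^{l_n} \qquad [14]$$


Indeed, one has $\{H_0; \zeta^k\eta^l\} = i\omega\cdot(k - l)\,\zeta^k\eta^l$. As a
consequence, writing


$$W_3 = \sum_{k,l} W_{kl}\,\zeta^k\eta^l$$


one can define the resonant set
$$R := \{(k, l):\ \omega\cdot(k - l) = 0\}$$


and


$$Z_3 := \sum_{(k,l)\in R} W_{kl}\,\zeta^k\eta^l, \qquad \chi_3 := \sum_{(k,l)\notin R}\frac{W_{kl}}{i\omega\cdot(k - l)}\,\zeta^k\eta^l \qquad [15]$$


Going back to the original variables, one has the
solution of the homological equation. □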
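
The construction in the proof can be sketched in a few lines of Python. The sketch assumes that the polynomial is already written in the complex coordinates as a dictionary mapping exponent pairs `(k, l)` to coefficients, and it uses the eigenvalue relation $\{H_0;\zeta^k\eta^l\} = i\omega\cdot(k-l)\zeta^k\eta^l$ together with the antisymmetry of the bracket, so that the homological equation reads $W_{kl} - i\omega\cdot(k-l)\chi_{kl} = Z_{kl}$ coefficient by coefficient. The function names, the data layout, and the tolerance `tol` used to decide resonance in floating point are illustrative assumptions, not part of the original text.

```python
def solve_homological(W, omega, tol=1e-12):
    """Split W into (chi, Z): Z keeps the resonant monomials, chi removes the rest.

    W maps (k, l) -> coefficient of zeta**k * eta**l, with k and l tuples of ints.
    """
    chi, Z = {}, {}
    for (k, l), w in W.items():
        divisor = sum(o * (kj - lj) for o, kj, lj in zip(omega, k, l))
        if abs(divisor) < tol:
            Z[(k, l)] = w                     # resonant: stays in the normal form
        else:
            chi[(k, l)] = w / (1j * divisor)  # nonresonant: absorbed by chi
    return chi, Z

def residual(W, chi, Z, omega):
    """Largest |W_kl - i*omega.(k-l)*chi_kl - Z_kl| over all monomials of W."""
    worst = 0.0
    for (k, l), w in W.items():
        divisor = sum(o * (kj - lj) for o, kj, lj in zip(omega, k, l))
        r = w - 1j * divisor * chi.get((k, l), 0) - Z.get((k, l), 0)
        worst = max(worst, abs(r))
    return worst

# Two degrees of freedom with omega = (1, 2): zeta_1**2 * eta_2 is resonant.
omega = (1.0, 2.0)
W3 = {((3, 0), (0, 0)): 1.0, ((2, 0), (0, 1)): 0.5, ((1, 1), (1, 0)): 2.0}
chi3, Z3 = solve_homological(W3, omega)
print(Z3)
print(residual(W3, chi3, Z3, omega))
```

When $\omega\cdot(k-l)$ is small but nonzero the division produces large coefficients, which is the small-divisor problem that a condition like (H1) is meant to control.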


Definition 1 The function $Z_3$ solving [13] will be
called the resonant part of $W_3$ and will be denoted
by $\langle W_3\rangle$.


Using the function $\chi_3$, one can transform the
Hamiltonian to the form


$$H_0 + \varepsilon Z_3 + \varepsilon^2 R_3$$


Remark 3 Equation [12] allows one to construct
directly the Taylor expansion of $R_3$ in terms of the
Taylor expansion of $W$ and of its Poisson brackets
with $\chi_3$.


Iterating the construction (which however slightly 
changes due to the presence of Z3), one gets the 
proof of Theorem 2. 


Remark 4 In the nonresonant case $\omega\cdot(k - l) = 0$
implies that $k = l$; therefore, the resonant part of a
polynomial is the sum of monomials of the form

$$\zeta^k\eta^k = \prod_{j=1}^{n}(\zeta_j\eta_j)^{k_j}$$

that is, it is a function of the actions only. Moreover,
in this case one has $Z_3 = 0$, while in general $Z_4 \ne 0$.


Some Symbolic Calculus 


To understand how to quantize the procedure of 
Birkhoff normal form, we consider the classical- 
quantum correspondence. It is well known that 
there are different procedures in order to associate 
an operator with a classical observable. Here we 
concentrate on the Weyl quantization rule. 

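To a function $f \in \mathcal{S}(\mathbb{R}^{2n})$ (Schwartz class), we
associate an operator $\hat f$ acting on functions
$\psi \in \mathcal{S}(\mathbb{R}^n)$, which is defined by


$$(\hat f\psi)(x) = \frac{1}{(2\pi\hbar)^n}\int_{\mathbb{R}^{2n}} e^{i(x-y)\cdot\xi/\hbar}\, f\!\left(\frac{x+y}{2}, \xi\right)\psi(y)\,dy\,d\xi \qquad [16]$$


Definition 2 The operator [16] is called the Weyl
quantization of $f$ and in turn $f$ is called the symbol of $\hat f$.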


Using the method of oscillatory integrals, the 
Weyl quantization rule can be extended to much 
more general observables f. We recall that, roughly 
speaking, the method of oscillatory integrals consists 
in giving meaning to a formal expression of the form 
[16] by using successive integration by parts (see, 
e.g., Martinez (2001)). 


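Definition 3 A function $f \in C^\infty(\mathbb{R}^{2n})$ will be called
a smooth symbol of class $S(\langle z\rangle^m)$ if, for any multi-index $\alpha$,
there exists $C_\alpha$ such that

$$\left|\frac{\partial^{|\alpha|} f}{\partial z^{\alpha}}(z)\right| \le C_\alpha\,\langle z\rangle^{m}$$
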
where $\langle z\rangle$ is as defined earlier.
It is useful to extend such a definition to functions
explicitly depending also on $\hbar$. This can be done in a
straightforward way by asking the constants to
be independent of $\hbar$ in a neighborhood of the origin.
Different classes of symbols can also be defined, but
for our purpose this class is enough.


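Theorem 3 Let $f \in S(\langle z\rangle^m)$, $m \in \mathbb{R}$, and $\psi \in \mathcal{S}(\mathbb{R}^n)$;
then the formal expression [16] is a well-defined
oscillatory integral.


Example 1 Under the Weyl quantization rule, one has


$$\hat\xi_j = -i\hbar\,\partial_{x_j}, \qquad \hat x_j = x_j \ \text{(multiplication operator)}$$
$$\widehat{x_j\xi_j} = \tfrac{1}{2}\left(\hat x_j\hat\xi_j + \hat\xi_j\hat x_j\right)$$
Definition 4 A sequence $(f_j)_{j\ge 0}$ with $f_j \in S(\langle z\rangle^m)$
will be called the asymptotic expansion of
$f \in S(\langle z\rangle^m)$ if, for any integer $N$, there exist two positive
constants $C_N$, $\hbar_N$ such that


$$f = \sum_{j=0}^{N}\hbar^{j} f_j + R_N$$


with $|R_N(z, \hbar)| \le C_N\,\hbar^{N+1}\langle z\rangle^m$, and $\hbar \in (0, \hbar_N)$.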


The key point for the quantization of the normal 
form procedure is the following. 


Theorem 4 Let $f \in S(\langle z\rangle^{m_1})$ and $g \in S(\langle z\rangle^{m_2})$; then
there exists a unique $F \in S(\langle z\rangle^{m_1+m_2})$ such that


$$\hat F = \hat f\,\hat g \quad \text{(operator product!)}$$


moreover, one has


$$F = \exp\left[\frac{i\hbar}{2}\left(\partial_x\cdot\partial_\eta - \partial_y\cdot\partial_\xi\right)\right]\big(f(x, \xi)\,g(y, \eta)\big)\Big|_{y=x,\ \eta=\xi} \qquad [17]$$


Finally, $F$ admits an asymptotic expansion in $\hbar$
which coincides with the formal expansion of [17].


The proof is obtained by using eqn [16] to
write down an expression for $\hat f\hat g\psi$ and obtain a
formula for $F$. Then, one shows that the formula
is well defined and therefore the result is not
formal.


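Definition 5 In the above context, the symbol $G$ of


$$\frac{i}{\hbar}\,[\hat f; \hat g]$$


will be called the “Moyal bracket” of $f$ and $g$ and
will be denoted by $\{f; g\}_M$.


By formula [17], one has in particular


$$\{f; g\}_M = \{f; g\} + \hbar^2 A_1(f, g) + O(\hbar^4) \qquad [18]$$
where
$$A_1(f, g) = -\frac{1}{24}\left(\frac{\partial^3 f}{\partial\xi^3}\frac{\partial^3 g}{\partial x^3} - 3\,\frac{\partial^3 f}{\partial\xi^2\partial x}\frac{\partial^3 g}{\partial x^2\partial\xi} + 3\,\frac{\partial^3 f}{\partial\xi\partial x^2}\frac{\partial^3 g}{\partial x\partial\xi^2} - \frac{\partial^3 f}{\partial x^3}\frac{\partial^3 g}{\partial\xi^3}\right)$$


where we used a vector notation for the derivatives.
If either $f$ or $g$ is a polynomial of degree $\le 2$, then


$$\{f; g\}_M = \{f; g\} \qquad [19]$$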
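
For polynomial symbols in one degree of freedom the Moyal bracket can be evaluated exactly, which gives a quick check of [18] and [19]. The Python sketch below assumes the sign conventions used in this section (Poisson bracket $\partial_\xi f\,\partial_x g - \partial_x f\,\partial_\xi g$, Moyal bracket equal to the symbol of $(i/\hbar)[\hat f;\hat g]$), under which the bracket is the series $\sum_{m\ge 0}(-1)^m(\hbar/2)^{2m}\Pi^{2m+1}(f,g)/(2m+1)!$, where $\Pi$ is the Poisson bidifferential operator. Polynomials are stored as dictionaries `{(a, b): c}` meaning $c\,x^a\xi^b$; this representation and all function names are assumptions of the example.

```python
from math import comb, factorial

def deriv(poly, nx, nxi):
    """Apply d^nx/dx^nx d^nxi/dxi^nxi to {(a, b): c}, i.e. to sum of c * x**a * xi**b."""
    out = {}
    for (a, b), c in poly.items():
        if a < nx or b < nxi:
            continue
        coef = c * (factorial(a) // factorial(a - nx)) * (factorial(b) // factorial(b - nxi))
        key = (a - nx, b - nxi)
        out[key] = out.get(key, 0) + coef
    return out

def mul(p, q):
    """Product of two polynomials in the same representation."""
    out = {}
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            out[(a + d, b + e)] = out.get((a + d, b + e), 0) + c * f
    return out

def add_scaled(acc, p, s):
    """In-place acc += s * p."""
    for key, c in p.items():
        acc[key] = acc.get(key, 0) + s * c

def pi_power(f, g, n):
    """n-th power of Pi = d_xi(f) d_x(g) - d_x(f) d_xi(g), via the binomial formula."""
    acc = {}
    for j in range(n + 1):
        term = mul(deriv(f, j, n - j), deriv(g, n - j, j))
        add_scaled(acc, term, (-1) ** j * comb(n, j))
    return acc

def moyal_bracket(f, g, hbar):
    """Exact Moyal bracket of two nonzero polynomials at a numerical value of hbar."""
    deg = max(sum(k) for k in f) + max(sum(k) for k in g)
    acc = {}
    m = 0
    while 2 * m + 1 <= deg:  # higher powers of Pi vanish on polynomials
        coef = (-1) ** m * (hbar / 2) ** (2 * m) / factorial(2 * m + 1)
        add_scaled(acc, pi_power(f, g, 2 * m + 1), coef)
        m += 1
    return {k: v for k, v in acc.items() if abs(v) > 1e-14}

h0 = {(2, 0): 0.5, (0, 2): 0.5}                        # (x**2 + xi**2) / 2
print(moyal_bracket(h0, {(3, 0): 1.0}, hbar=0.7))      # no hbar correction, as in [19]
print(moyal_bracket({(3, 0): 1}, {(0, 3): 1}, hbar=2.0))  # cubic symbols pick up an hbar**2 term
```

With a quadratic first argument the result does not depend on `hbar`, in line with [19]; for $f = x^3$, $g = \xi^3$ the constant term equals $\tfrac{3}{2}\hbar^2$, which is the $\hbar^2 A_1$ contribution of [18].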


Given a self-adjoint operator $A$ and a smooth
function $G: \mathbb{R} \to \mathbb{R}$, it is well known how to
define by the spectral theorem the operator $G(A)$.
Suppose now that $A = \hat f$ for some symbol $f$. In
general, one has $G(\hat f) \ne \widehat{G\circ f}$. However, by
symbolic calculus (i.e., using eqn [17]), one has:


Lemma 2 Denote $I_j(x, \xi) = (\omega_j^2 x_j^2 + \xi_j^2)/2\omega_j$. Then,
for any positive integer $k$ there exists a function
$F_k(I_j, \hbar)$ such that


$$\widehat{I_j^{\,k}} = F_k(\hat I_j, \hbar)$$


where the right-hand side is defined by spectral 
calculus. Moreover, F, can be computed explicitly 
by the recursion formula Fp, =TjFyt+ 
Fp 1b (k? — k + 1)/4. 


As a consequence of this fact and of the fact that
$[\hat I_j, \hat I_k] = 0$, one has that the Weyl quantization of a
polynomial function of the actions is a function of
the action operators.


Semiclassical Normal Form 


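Let $\chi$ be a smooth symbol such that $\hat\chi$ is self-adjoint,
and consider the group of unitary operators
$X_\varepsilon := \exp((i\varepsilon/\hbar)\hat\chi)$. Let $g$ be a smooth symbol;
apply the unitary transformation $X_\varepsilon$ to $\hat g$, namely
compute $X_\varepsilon\hat g X_\varepsilon^{-1}$. Noting that (on a suitable domain)


$$\frac{d}{d\varepsilon}\left(X_\varepsilon\hat g X_\varepsilon^{-1}\right) = X_\varepsilon\,\frac{i}{\hbar}[\hat\chi; \hat g]\,X_\varepsilon^{-1}$$


one has (formally!) the expansion of $X_\varepsilon\hat g X_\varepsilon^{-1}$ in $\varepsilon$:


$$X_\varepsilon\hat g X_\varepsilon^{-1} = \sum_{l\ge 0}\varepsilon^{l}\,\hat g_l$$
where
$$\hat g_0 = \hat g, \qquad \hat g_l = \frac{1}{l}\,\frac{i}{\hbar}[\hat\chi; \hat g_{l-1}], \quad l \ge 1 \qquad [20]$$


(Such a series can be interpreted as an asymptotic
expansion provided one restricts the domain at each
step of the approximation.) Equivalently, the symbol
of $X_\varepsilon\hat g X_\varepsilon^{-1}$ is formally given by $\sum_{l}\varepsilon^{l} g_{q,l}$ with


$$g_{q,0} = g, \qquad g_{q,l} = \frac{1}{l}\{\chi; g_{q,l-1}\}_M, \quad l \ge 1 \qquad [21]$$
from which one sees a remarkable similarity with
the classical equation [12]. Moreover, [21] converges to
[12] when $\hbar \to 0$.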

Applying the unitary transformation generated by
$\chi$ to the Hamiltonian operator $\hat H_\varepsilon$ (cf. eqn [10]), one
has $X_\varepsilon\hat H_\varepsilon X_\varepsilon^{-1} = \hat H'_\varepsilon$ with


$$H'_\varepsilon = H_0 + \varepsilon\,[W_3 + \{\chi; H_0\}_M] + O(\varepsilon^2) \qquad [22]$$


$$\phantom{H'_\varepsilon} = H_0 + \varepsilon\,[W_3 + \{\chi; H_0\}] + O(\varepsilon^2) \qquad [23]$$


where we used the fact that $H_0$ is a quadratic
polynomial, so that [19] holds. It is thus clear that
Lemma 1 also allows one to solve the quantum
homological equation appearing in this context and to
determine the symbol of the operator generating the
unitary transformation putting the Hamiltonian operator
in normal form up to corrections of order $\varepsilon^2$.
Moreover, one can compute in terms of Moyal
brackets (of polynomials!) the expansion of the symbol
of the new remainder and of the normal form. Iterating
the construction, one generates a well-defined
semiclassical normal form of the quantum system.


Example 2 Denote by $Z_{q,l}$, $l = 1, 2, \ldots$, the term
added to the semiclassical normal form at the $l$th
step of the iterative construction. Explicitly, the first
terms are given by


$$Z_{q,1} = \langle W_3\rangle = Z_3 \qquad [24]$$


Za2 = (Wa) +3¢({x3; W3}u) taai Zam) BS] 


Z4,3 =(Ws) + ({x4;Z3}ue) + 3({x3; Ho} u) 
+43({x3; Wsalm) + ({x3; Wat) B6] 


where, according to Definition 1, $\langle\cdot\rangle$ is the resonant
part of its argument, $\chi_j$ is (formally) the symbol of
the operator generating the $j$th unitary transformation,
and


Hy := x3; Zs — W3}y, W31 := {x33 Ws}m 


Note that all the Moyal brackets involved contain
polynomials of degree at most 4, so that they can be
computed exactly using formula [18], which in this
case does not contain corrections of order $\hbar^4$.


The problem in making the previous construction
rigorous is that all the series involved are in general

divergent. Moreover, it is not possible to show that 
the remainders appearing when truncating such 
series are small in a reasonable sense. Nevertheless, 
it is possible, using the tools of microlocal analysis, 
to show that the semiclassical normal form contains 
essentially all the information on the part of the 
spectrum close to zero. 

The precise relation between the spectrum of 
the original Hamiltonian and the spectrum of the 
semiclassical normal form is captured by the 
following definition. 

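Let $H_1(\varepsilon, \hbar)$, $H_2(\varepsilon, \hbar)$ be two families of self-adjoint
operators; set $\mathrm{Spec}_\varepsilon(H_{1,2}) := \mathrm{Spec}(H_{1,2}) \cap [0, \varepsilon)$.


Definition 6 We say that
$$\mathrm{Spec}_\varepsilon(H_1) = \mathrm{Spec}_\varepsilon(H_2) \quad \mathrm{mod}\left(\varepsilon^{\infty} + (\hbar/\varepsilon)^{\infty}\right)$$


if for any $N, M > 0$ there exist $C_{N,M}$ and $C'_{N,M}$ such
that for any $\lambda_1 \in \mathrm{Spec}_\varepsilon(H_1)$ there exists
$\lambda_2 \in \mathrm{Spec}_\varepsilon(H_2)$ such that $\lambda_1 = \lambda_2 + R_{N,M}$ with


$$|R_{N,M}| \le C_{N,M}\,\varepsilon^{N} + C'_{N,M}\,(\hbar/\varepsilon)^{M} \qquad [27]$$
and conversely. Equation [27] has to hold for any
pair $(\hbar, \varepsilon)$ with $\varepsilon$ and $\hbar/\varepsilon$ small enough.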


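Theorem 5 Assume (H2) and (H3); assume also:
(H1) There exist $\gamma > 0$ and $\tau \in \mathbb{R}$ such that, for any
$k \in \mathbb{Z}^n$, one has


$$\text{either } \omega\cdot k = 0 \quad \text{or} \quad |\omega\cdot k| \ge \frac{\gamma}{|k|^{\tau}} \qquad [28]$$


Then there exists a polynomial function $Z_q$ such
that one has


$$\mathrm{Spec}_\varepsilon(\hat H) = \mathrm{Spec}_\varepsilon(\hat H_0 + \hat Z_q) \quad \mathrm{mod}\left(\varepsilon^{\infty} + \left(\frac{\hbar}{\varepsilon}\right)^{\infty}\right) \qquad [29]$$


The polynomial $Z_q$ coincides with the semiclassical
normal form defined at the beginning of the
section.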


Scheme of the proof It consists of six steps.
(1) Make the unitary transformation
$(U\psi)(x) := \varepsilon^{n/4}\psi(\varepsilon^{1/2}x)$, which transforms the Hamiltonian
operator [1] into the Weyl quantization of
$\varepsilon H_\varepsilon := \varepsilon(H_0 + \varepsilon^{1/2}W)$, but a Weyl quantization
where $\hbar$ is replaced by $h := \hbar/\varepsilon$. (2) Make a cutoff
of $H_\varepsilon$, namely, fix $R$ and consider a smooth function
$t$ such that $t(s) = 1$ for $|s| < R$, $t(s) = 0$ for $|s| > 2R$, and
define $a(x, \xi) := W(x)\,t(\|(\xi, x)\|)$. (3) Compare the
spectrum of the Hamiltonian $H_\varepsilon$ with the spectrum
of $H^{t} := H_0 + \varepsilon^{1/2}a$. By microlocal analysis, one has
that, in any fixed bounded interval, such spectra
coincide modulo $h^{\infty}$ (see, e.g., Martinez (2001)).
(4) Rescale back the variables, namely apply the
transformation $U^{-1}$ to $H^{t}$. (5) Apply the normal
form algorithm to the so-obtained Hamiltonian,
showing that all the series involved are convergent
in suitable norms. (6) Use again microlocal analysis to
show that the spectrum of the semiclassical normal
form coincides with the spectrum of the normalized
operator with compactly supported symbol. □


Remark 5 Fix an arbitrary $1 > \delta > 0$ and link $\varepsilon$ to
$\hbar$ by $\varepsilon := \hbar^{\delta}$. Then one obtains a simplified statement
according to which the spectrum of [1] in $[0, \hbar^{\delta}]$
coincides modulo $\hbar^{\infty}$ with the spectrum of $\hat H_0 + \hat Z_q$
in the same interval.


Remark 6 In the case where the frequencies are 
nonresonant one has that the symbol of the normal 
form depends on the actions only. By Lemma 2 one 
has that also the quantization of the normal form is 
a function of the action operators only (explicitly 
computable), and therefore the spectrum of the 
normal form is given by a quantization formula as 
claimed in Theorem 1. 


The Resonant Case 


In the case where the frequencies are nonresonant,
due to the particular structure of the normal form,
one obtains very precise information on the
spectrum. In the case where there are some
resonances, the situation is more difficult. In order
to illustrate what happens we concentrate on the
completely resonant case, that is, the case where all
the frequencies are integer multiples of a single
fundamental frequency $\nu$.
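
In the completely resonant case the harmonic levels are degenerate, and the degeneracy can be counted directly. The Python sketch below assumes frequencies $\omega_j = m_j\nu$ with positive integers $m_j$, so that the level with quantum numbers $k$ is $\hbar\nu N + \tfrac{1}{2}\hbar\sum_j\omega_j$ with $N = \sum_j m_j k_j$; it tabulates how many $k \in \mathbb{N}^n$ share each $N$. The function name and the cutoff `nmax` are illustrative assumptions.

```python
from collections import Counter
from itertools import product

def harmonic_degeneracies(multiples, nmax):
    """Map N -> number of k in N^n with sum(m_j * k_j) = N, for all N <= nmax.

    multiples are the positive integers m_j with omega_j = m_j * nu.
    """
    counts = Counter()
    ranges = [range(nmax // m + 1) for m in multiples]
    for k in product(*ranges):
        level = sum(m * kj for m, kj in zip(multiples, k))
        if level <= nmax:
            counts[level] += 1
    return counts

print(harmonic_degeneracies((1, 1), 5))  # isotropic oscillator in two dimensions
print(harmonic_degeneracies((1, 2), 6))  # 1:2 resonance
```

For the isotropic two-dimensional oscillator the level $N$ is $(N+1)$-fold degenerate; these are the multiplets that the nonlinear part of the normal form spreads into bands.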

In this case, the eigenvalues of $\hat H_0$ form a subset of
$\mathbb{N}\hbar\nu + (1/2)|\omega|\hbar$ and are degenerate. One expects the
nonlinear part to break such a degeneracy and to
transform each eigenvalue into a small band. One can use
the normal form to study the structure of the
so-obtained band. To this end, the most relevant contribution
is due to the first nonvanishing term of the normal
form. For the sake of definiteness, we assume that this is
the term of order 4, namely $Z_4$. Denote


N := Zala) B(E) = (E — vh, E + qh 


Theorem 6 Fix 1 > %1 > 1/2, then, provided h is 
small enough, one has 


Spec(H)(b,6")c |]J BŒ) [30] 


EcSpec(fo) 
Moreover, denote by


$$\lambda_1(E, \hbar) \le \cdots \le \lambda_m(E, \hbar) \qquad [31]$$
the eigenvalues of $\hat H$ in $B(E)$ counted with multiplicity,
then


à (E, 6) = E*Min N + E?(O(6/E) + O(E'*)) [32] 
and similarly 
E? \m(E,b) = Max N+ E*(O(6/E) + O(E'/”)) [33] 


This statement is due to Bambusi, Charles, and
Tagliaferro (see Bambusi 2004); for previous results,
see Vũ Ngọc (1998).

Equation [30] shows that the spectrum has a band 
structure, while eqns [32] and [33] allow one to 
compute the minimum and the maximum of each band. 

The idea of the proof is as follows. First forget
high-order terms of the normal form, whose effect is included
in the error terms. Then, due to the commutation
property of the normal form with $\hat H_0$, one has that $\hat Z_4$
restricts to an operator acting on the eigenspaces of $\hat H_0$.
On the classical side, one has that by the
Marsden-Weinstein procedure $Z_4$ defines a classical Hamiltonian
system on the manifold obtained by symplectic reduction
of the original phase space. By the methods of
geometric quantization, it turns out that the quantum
operator acting on an eigenspace of $\hat H_0$ is a Toeplitz
operator whose principal symbol is exactly the above
reduced classical Hamiltonian. Then, the proof follows
from classical properties of Toeplitz operators.

We point out that results of this kind are useful in 
the computation of the molecular spectra (Michel 
and Zhilinskii 2001, Zhilinskii 2001). 


Quantization of KAM Tori 


In this section we present a result on the quantiza- 
tion of KAM tori. It allows one to construct part of 
the spectrum of a close-to-integrable system. 

We recall that a classical Hamiltonian system with $n$
degrees of freedom is said to be integrable if it has $n$
integrals of motion independent and in involution. If the
energy surface is compact, then, by the Arnol'd-Liouville
theorem there exists a canonical transformation
$T_0: \mathbb{R}^n \times \mathbb{T}^n \supset D \times \mathbb{T}^n \to \mathbb{R}^{2n}$ introducing action-angle
variables, namely such that, denoting by $K_0$ the original
integrable Hamiltonian, $K_0\circ T_0$ is independent of the
angles $\phi \in \mathbb{T}^n$. Here, $D$ is an open bounded domain.

Consider now a close-to-integrable analytic
Hamiltonian system, namely a Hamiltonian system
with Hamiltonian


$$K = K_0 + \varepsilon K_1$$


where $\varepsilon$ is a small parameter. We assume that,
denoting again by $T_0$ the canonical transformation
introducing action-angle variables for the system $K_0$,
one has that both $K_0\circ T_0$ and $K_1\circ T_0$ are real
analytic on $D \times \mathbb{T}^n$. Then, the KAM theory applies.
To state the corresponding result, denote by $D_0 \subset D$
a domain whose closure is contained in $D$.


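Theorem 7 Assume that $\forall I \in D$ one has


$$\det\left(\frac{\partial^2 K_0}{\partial I^2}\right) \ne 0 \qquad [34]$$


then there exists a positive constant $\varepsilon_*$, and, for any $\varepsilon$
with $|\varepsilon| < \varepsilon_*$ there exists a Gevrey canonical
transformation $T_\varepsilon: D_0 \times \mathbb{T}^n \to \mathbb{R}^{2n}$ and a Cantor
set $D_\varepsilon \subset D_0$ with the following properties:


$$K\circ T_\varepsilon = Z(I) + R(I, \phi, \varepsilon) \qquad [35]$$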


where R(I, ¢, €) vanishes at infinite order on D,, that is, 
for any multi-index a there exists Cia; such that one has 
olelR 


C 
a, 6° ap (- T- 5p) pel 


with a suitable p>0 and |I—D,| denoting the 
distance from D.. Moreover, as «e tends to zero, the 
measure of D. tends to the measure of Do. 





(I, @,€) 





A particular consequence is that the set $D_\varepsilon$ is
foliated in invariant tori. From the proof, it also 
turns out that the motion on each torus is 
quasiperiodic with frequencies fulfilling the assump- 
tion (H1) stated earlier. Moreover, the tori are 
linearly stable and even more: they are stable in an 
exponential sense (namely, a solution starting O(u) 
close to a torus takes at least a time O( exp (c/u’)) to 
double its distance from the torus). 

Quantizing the normalizing transformation $T_\varepsilon$ by using the theory of Fourier integral operators, one can also put the quantum Hamiltonian in a suitable normal form, which allows one to deduce some spectral information on the system.

To fix ideas we restrict to the case where $K$ is a natural system, namely it has the form (3.1), and is close to integrable in the above sense. Fix two parameters $E_1 < E_2$; assume (1) that $K^{-1}((-\infty, E_2 + \delta])$ is compact for some positive $\delta$ and (2) that the domain $D_0$ can be constructed in such a way that $T_0: D_0 \times \mathbb{T}^n \to K_0^{-1}([E_1, E_2])$ is a bijection and, moreover, the KAM condition [34] holds. Denote by $\vartheta \in \mathbb{Z}^n$ the Maslov class of the tori of $K_0$ (see, e.g., Lazutkin (1993)) and, having fixed some $0 < \sigma < 1$, define the set of indexes


T := {k € Z”: |D- b(k+0/A| <b} [87 


Theorem 8 There exist positive constants h,, c, C, 
and o< 1, and a function K,:Do x (0,b6.) ~ R 
with the following property: for any k © TZ there 
exists at least one eigenvalue of R in the interval 


Z,(b(k+0/4),4) — C=, 
Z (b(k+0/4), h) + ce- 38] 


One can also show that a large part of the 
spectrum is constructed in this way. This is obtained 
by comparing the semiclassical estimate of the 
number of eigenvalues in [E1, E2] to the number of 
eigenvalues thus constructed. 

Theorem 8 is due to Popov (2000); the quantization of KAM tori was initiated by Lazutkin and widely developed by Colin de Verdière, who obtained a result similar to Theorem 8 for the case where $K$ is $C^\infty$ and describes the geodesic flow on a compact Riemannian manifold (Colin de Verdière 1977).


See also: Central Manifolds, Normal Forms; 
h-Pseudodifferential Operators and Applications; Optical 
Caustics; Quantum Mechanics: Foundations; 
Schrödinger Operators; Stationary Phase Approximation.

Introduction


The present article relies heavily on Quantum Mechanical Scattering Theory in this Encyclopedia and can be considered as its continuation. We freely use here the notation and results discussed in that article.

An important problem of scattering theory concerns the Schrödinger operator $H$ of $N$, $N \ge 3$, interacting particles. Since the potential energy of pair interactions between particles depends on their relative positions only, it does not tend to zero at infinity in the configuration space of a system, even if the center-of-mass motion is removed. This is qualitatively different from the two-particle case. It turns out that asymptotically (for large times $t \to +\infty$ or $t \to -\infty$) an $N$-particle system splits up into clusters,

$$C_1 \cup \cdots \cup C_n = \{1, 2, \ldots, N\}, \quad C_k \cap C_l = \emptyset, \quad k \ne l$$ [1]

Particles from the same cluster $C_k$, $k = 1, \ldots, n$, form a bound state, and different clusters do not interact with each other. In particular, if $n = 1$ and $C_1 = \{1, 2, \ldots, N\}$, then we have a bound state of the system. In the other extreme case $n = N$, all particles are free. The asymptotic evolution determined by clusters $C_1, \ldots, C_n$, where $n \ge 2$, and bound states of all these clusters is called a scattering channel. Physically it is natural to expect that the list of all such channels is exhaustive, that is, no other scattering process is possible. This statement is called asymptotic completeness.
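
To get a feel for how many asymptotic configurations this description allows, the cluster decompositions of $N$ labeled particles can be enumerated directly. The sketch below is illustrative only: the function name `cluster_decompositions` is ours, and it lists decompositions of the particle labels, not scattering channels (a channel also requires a choice of bound state for every cluster).

```python
def cluster_decompositions(particles):
    """Yield every partition of the list `particles` into non-empty clusters."""
    if not particles:
        yield []
        return
    first, rest = particles[0], particles[1:]
    for partial in cluster_decompositions(rest):
        # Put `first` into one of the clusters already present ...
        for k in range(len(partial)):
            yield partial[:k] + [[first] + partial[k]] + partial[k + 1:]
        # ... or let it form a cluster of its own.
        yield [[first]] + partial


if __name__ == "__main__":
    for a in cluster_decompositions([1, 2, 3]):
        print(len(a), a)
```

For three particles this produces five decompositions: one with a single cluster (a bound state of the whole system), three with a pair plus a free particle, and one with three free particles.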

We emphasize that an $N$-particle system may be in different scattering states as $t \to +\infty$ and $t \to -\infty$, so different rearrangement processes are possible. For example, a three-particle system may asymptotically consist of free particles, or a pair of particles may be in a bound state whereas the third particle is asymptotically free. If particles are free at both $-\infty$ and $+\infty$, then one speaks about elastic scattering; we have a capture if particles free at $-\infty$ form a bound state of a couple after the interaction; the opposite process, when a bound state at $-\infty$ gives three free particles, is known as a breakup. It is also possible that a bound state of one couple yields a bound state of another pair (a rearrangement) or that a bound state of a couple transforms into another bound state of the same couple (an excitation). All these processes are described by the scattering operator. On the contrary, if the whole system forms a bound state at $-\infty$ (i.e., $n = 1$), then it remains in the same state for all $t$.

As far as monographic literature on N-particle 
scattering is concerned, we mention Dereziński and
Gérard (1997), Faddeev (1965), Reed and Simon 
(1979), and Yafaev (2000). 


Setting the Scattering Problem 


Let us recall the definition of the $N$-particle Schrödinger operator (Hamiltonian)

$$\mathbf{H} = \mathbf{H}_0 + V$$ [2]

If the configuration space of each particle is $\mathbb{R}^d$, then the operator $\mathbf{H}$ acts in the space $L_2(\mathbb{R}^{dN})$. The operator of kinetic energy (the “unperturbed” Hamiltonian) is

$$\mathbf{H}_0 = -\sum_{j=1}^{N} (2m_j)^{-1} \Delta_{x_j}$$ [3]

where $x_j$ and $m_j$ are the position and mass of the particle labeled by $j$. The operator of potential energy of pair interactions of particles (the perturbation) $V$ is the operator of multiplication by the function

$$V(x) = \sum_{i<j} V^{ij}(x_j - x_i), \quad i, j = 1, \ldots, N$$ [4]

Set $\alpha = (ij)$, $x^\alpha = x_j - x_i$. It is assumed that the functions $V^\alpha(x^\alpha)$ tend to zero sufficiently rapidly as $|x^\alpha| \to \infty$ in $\mathbb{R}^d$. However, the function $V(x)$ does not tend to zero as $|x| \to \infty$ in $\mathbb{R}^{dN}$ if at least one of the distances $|x_i - x_j|$ between particles remains bounded. This difficulty is manifest even for two particles ($N = 2$), but in this case it disappears if the motion of the center of mass of the system is removed.
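This means the following. Let the subspace $X^{cm}$ of $\mathbb{R}^{dN}$ be distinguished by the condition

$$\sum_{j=1}^{N} m_j x_j = 0$$ [5]

and let $X_{cm}$ be the orthogonal complement to $X^{cm}$ in the space $\mathbb{R}^{dN}$ endowed with the scalar product

$$\langle x, y \rangle = 2 \sum_{j=1}^{N} m_j \langle x_j, y_j \rangle$$ [6]

Then

$$L_2(\mathbb{R}^{dN}) = L_2(X_{cm}) \otimes L_2(X^{cm})$$

Denote by $x_{cm}$, $x^{cm}$ the orthogonal projections of $x \in \mathbb{R}^{dN}$ on the subspaces $X_{cm}$, $X^{cm}$, respectively, so that $x = (x_{cm}, x^{cm})$. Clearly, the vector $x_{cm}$ has components

$$X_{cm} = M^{-1} \sum_{j=1}^{N} m_j x_j, \quad M = \sum_{j=1}^{N} m_j$$

Let $T(p)$, $(T(p)f)(x_1, \ldots, x_N) = f(x_1 + p, \ldots, x_N + p)$, be the operator of common translations of particles. The operator $\mathbf{H}$ commutes with $T(p)$, that is, $T(p)\mathbf{H} = \mathbf{H}T(p)$, for all $p \in \mathbb{R}^d$. It follows that

$$\mathbf{H} = K \otimes I + I \otimes H, \quad K = -(2M)^{-1} \Delta_{X_{cm}}$$ [7]

where $K$ is the kinetic energy operator of the center-of-mass motion.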
The operator

$$H = H_0 + V$$ [8]

acts in the space $\mathcal{H} = L_2(X^{cm})$. Here $V$ is again the operator of multiplication by function [4]. The precise form of the differential operator $H_0$ depends on the choice of coordinates in $X^{cm}$. For example, if $N = 2$ and $x = x_2 - x_1$, then $H_0 = -(2m)^{-1}\Delta$, where $m = m_1 m_2 (m_1 + m_2)^{-1}$. In the case $N = 3$, a natural choice of coordinates in $X^{cm}$ is given by one of the three sets of Jacobi variables:

$$x^{12} = x_2 - x_1, \quad x_{12} = x_3 - (m_1 + m_2)^{-1}(m_1 x_1 + m_2 x_2)$$

and similarly for $x^{13}$, $x_{13}$ and $x^{23}$, $x_{23}$. In the coordinates $x^\alpha$, $x_\alpha$ the operator of kinetic energy is determined by the formula

$$H_0 = -(2m^\alpha)^{-1} \Delta_{x^\alpha} - (2m_\alpha)^{-1} \Delta_{x_\alpha}$$

where, for example,

$$(m^{12})^{-1} = m_1^{-1} + m_2^{-1}, \quad m_{12}^{-1} = (m_1 + m_2)^{-1} + m_3^{-1}$$

If $N = 2$, then $V(x) \to 0$ as $|x| \to \infty$, $x \in X^{cm}$, but this is no longer true for $N \ge 3$. According to eqn [7] the spectral and scattering theories for the operator $\mathbf{H}$ reduce to those for the operator $H$. However, for $N \ge 3$, this reduction is not really helpful.
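
The reduced masses quoted for the Jacobi variables can be checked mechanically. Under a linear change of variables $y = Ax$, the operator $\sum_j (2m_j)^{-1} \partial_{x_j}^2$ becomes $\frac{1}{2}\sum_{k,l} (A D A^{T})_{kl}\, \partial_{y_k} \partial_{y_l}$ with $D = \mathrm{diag}(m_j^{-1})$, so the kinetic energy separates exactly when $A D A^{T}$ is diagonal. The following sketch assumes $d = 1$ for brevity and uses sample masses; the function names are ours.

```python
import numpy as np


def jacobi_matrix(m1, m2, m3):
    """Rows map (x_1, x_2, x_3) to (X_cm, x^{12}, x_{12}); d = 1 for brevity."""
    total = m1 + m2 + m3
    return np.array([
        [m1 / total, m2 / total, m3 / total],        # center of mass X_cm
        [-1.0, 1.0, 0.0],                            # x^{12} = x_2 - x_1
        [-m1 / (m1 + m2), -m2 / (m1 + m2), 1.0],     # x_{12}
    ])


def inverse_mass_form(m1, m2, m3):
    """Return A D A^T, the inverse-mass matrix in the new coordinates."""
    a = jacobi_matrix(m1, m2, m3)
    return a @ np.diag([1.0 / m1, 1.0 / m2, 1.0 / m3]) @ a.T


if __name__ == "__main__":
    print(np.round(inverse_mass_form(1.0, 2.0, 5.0), 12))
```

The diagonal entries are $1/M$, $1/m^{12}$ and $1/m_{12}$, and the off-diagonal entries vanish, which is the content of eqn [7] together with the formula for $H_0$ above.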

Let us now consider a breakup $a = \{C_1, \ldots, C_n\}$ of an $N$-particle system into clusters $C_1, \ldots, C_n$, $1 \le n =: \#(a) \le N$, satisfying conditions [1]. If interactions between different clusters are neglected, we obtain the operator

$$H_a = H_0 + V^a, \quad V^a = \sum_{l=1}^{n} \sum_{\alpha \subset C_l} V^\alpha$$ [9]

In particular, $H_a = H_0$ if $\#(a) = N$ and $H_a = H$ if $\#(a) = 1$. Let the operator of common translations of particles from the same cluster be defined by the equation

$$(T_a(p_1, \ldots, p_n) f)(x_1, \ldots, x_N) = f(x_1', \ldots, x_N')$$

where $x_j' = x_j + p_l$ if $j \in C_l$. The operator $H_a$ commutes with the operators $T_a(p_1, \ldots, p_n)$ for all vectors $p_1, \ldots, p_n \in \mathbb{R}^d$. Let the subspace $X^a$ be determined by the condition

$$\sum_{j \in C_l} m_j x_j = 0, \quad l = 1, \ldots, n$$

and let $X_a$ be the orthogonal complement to $X^a$ in $X^{cm}$ with respect to scalar product [6]. Clearly, $\dim X^a = (N - \#(a))d$, $\dim X_a = (\#(a) - 1)d$. Then the space $\mathcal{H}$ splits into the tensor product

$$L_2(X^{cm}) = L_2(X_a) \otimes L_2(X^a)$$ [10]


In what follows, $x_a$ and $x^a$ are the orthogonal projections of $x \in X^{cm}$ on the subspaces $X_a$ and $X^a$, respectively. The “external” variable $x_a = (X_1, X_2, \ldots, X_n)$, where

$$X_l = M_l^{-1} \sum_{j \in C_l} m_j x_j, \quad M_l = \sum_{j \in C_l} m_j$$

describes the positions of the centers of mass of the clusters. The “internal” variable $x^a$ is the set of vectors $x_j - X_l$ for all $j \in C_l$ and all $l = 1, \ldots, n$. Of course, for each $l$ only $|C_l| - 1$ of the variables $x_j - X_l$ are independent ($|C_l|$ is the number of particles in the cluster $C_l$). Set

$$K_a = -\Delta_{x_a} = -\sum_{l=1}^{n} (2M_l)^{-1} \Delta_{X_l}$$

and

$$H^a = -\Delta_{x^a} + V^a$$

Then

$$H_a = K_a \otimes I + I \otimes H^a$$

Note that the eigenvalues $\lambda^{a,n}$ of the operator $H^a$ are sums over $l = 1, \ldots, n$ of eigenvalues of the operators

$$H(C_l) = H_0(C_l) + \sum_{\alpha \subset C_l} V^\alpha$$

describing each cluster. Similarly, the eigenfunctions $\psi^{a,n}$ of $H^a$ are products of eigenfunctions of these operators. We usually write $a$ instead of a couple $\{a, n\}$. In the following, the index $a$ labels all cluster decompositions with $\#(a) \ge 2$. The eigenvalues $\lambda^a$ of the operators $H^a$ ($\lambda^a = 0$ if $\#(a) = N$) are called thresholds of the Schrödinger operator [8]. If all functions $V^\alpha(x^\alpha) \to 0$ as $|x^\alpha| \to \infty$, then the essential spectrum of the operator $H$ consists of the interval $[\lambda_0, \infty)$, where

$$\lambda_0 = \min_a \lambda^a$$

(the Hunziker-van Winter-Zhislin theorem). Moreover, the eigenvalues of the operator $H$ may accumulate at its thresholds only.

The fundamental result of scattering theory for the $N$-particle Schrödinger operator can be formulated as follows. Let $P^a$ be the orthogonal projection in $L_2(X^a)$ on the subspace spanned by all eigenvectors $\psi^a$ of $H^a$, and let $P_a = I \otimes P^a$, where the tensor product is defined by eqn [10]. Then $P_a$ commutes with the operator $H_a$. Set also $K_0 = H_0$, $P_0 = I$. Suppose that for all $\alpha$

$$|V^\alpha(x^\alpha)| \le C(1 + |x^\alpha|)^{-\rho}, \quad \rho > 1$$ [11]

(the short-range assumption). Then, for all $a$, the wave operators

$$W^\pm(H, H_a; P_a) = \operatorname{s-lim}_{t \to \pm\infty} e^{iHt} e^{-iH_a t} P_a$$

exist and are isometric on the ranges $\operatorname{Ran} P_a$ of the projections $P_a$. The subspaces $\operatorname{Ran} W^\pm(H, H_a; P_a)$ are mutually orthogonal, and scattering is asymptotically complete:

$$\bigoplus_a \operatorname{Ran} W^\pm(H, H_a; P_a) = \mathcal{H}^{(ac)}$$

The singular continuous spectrum of $H$ is empty, so the absolutely continuous subspace $\mathcal{H}^{(ac)}$ of the operator $H$ can be replaced by $\mathcal{H} \ominus \mathcal{H}^{(p)}$, where $\mathcal{H}^{(p)}$ is spanned by all eigenvectors of $H$.

These results can be reformulated in terms of scattering theory in a couple of spaces. Suppose that, for every $a$, the eigenvectors $\psi^{a,n}$ are normalized and are orthogonal if the corresponding eigenvalues $\lambda^{a,n}$ coincide. Let us introduce an auxiliary space

$$\hat{\mathcal{H}} = \bigoplus_a \mathcal{H}_a, \quad \mathcal{H}_a = L_2(X_a)$$ [12]

and an auxiliary operator

$$\hat{H} = \bigoplus_a (K_a + \lambda^a)$$ [13]

in this space. Here and below, the sums are taken over all $a$. We define an identification $J: \hat{\mathcal{H}} \to \mathcal{H}$ by the relations

$$J = \sum_a J^a, \quad J^a f_a = f_a \otimes \psi^a$$ [14]

where the tensor product is the same as in [10]. In particular, $J^0 = I$. Since $H_a J^a = J^a (K_a + \lambda^a)$, the wave operators $W^\pm(H, \hat{H}; J)$ exist and are isometric and complete, that is,

$$\operatorname{Ran} W^\pm(H, \hat{H}; J) = \mathcal{H}^{(ac)}$$

Thus, for states orthogonal to eigenvectors of $H$, the evolution of an $N$-particle system decomposes asymptotically into a sum of evolutions which are “free” in the external variables $x_a$ and are determined by eigenvalues and eigenfunctions of the Hamiltonians $H^a$ in the internal variables $x^a$. To be more precise, we have that, for all $f \in \mathcal{H}^{(ac)}$ and $t \to \pm\infty$,

$$\exp(-iHt)f = \sum_a \exp(-i(K_a + \lambda^a)t) f_a^\pm \otimes \psi^a + o(1)$$ [15]

where

$$f_a^\pm = W^\pm(H, K_a + \lambda^a; J^a)^* f$$

and the term $o(1)$ tends to zero in $\mathcal{H}$. The wave operator $W^\pm(H, K_a + \lambda^a; J^a)$ describes the scattering channel where a system of $N$ interacting particles splits up asymptotically (for $t \to \pm\infty$) into noninteracting clusters $C_1, \ldots, C_n$, $n \ge 2$, and particles from the same cluster $C_l$ are in the bound state (if there is more than one particle in $C_l$) given by the function $\psi^a(x^a)$. Somewhat loosely speaking, this implies that the continuous spectrum of the operator $H$ consists of branches starting from all its thresholds.

Note that the scattering problem can equivalently be formulated without the separation of the center-of-mass motion. In this case, the trivial decomposition with $\#(a) = 1$ should be added, and the set of thresholds of the full operator $\mathbf{H}$ of eqn [2] includes the eigenvalues of the operator $H$ of eqn [8].

The existence of the wave operators and their 
isometricity can be obtained by the Cook method. 
Only the asymptotic completeness is a difficult 
mathematical problem. It can be solved within the 
framework of the smooth method, which requires a 
study of boundary values of resolvents as the 
spectral parameter z approaches the continuous 
spectrum or, equivalently, a study of a large-time 
behavior of evolution operators. 

The scattering operator

$$S = W^+(H, \hat{H}; J)^* \, W^-(H, \hat{H}; J)$$

is unitary on the space $\hat{\mathcal{H}}$ and commutes with the operator $\hat{H}$. Its component $S_{ab}: \mathcal{H}_b \to \mathcal{H}_a$ describes a process where a system in a state $b$ as $t \to -\infty$ goes over into a state $a$ as $t \to +\infty$. Diagonalizing the operator $\hat{H}$ by a unitary operator $F$, $(F\hat{H}f)(\lambda) = \lambda (Ff)(\lambda)$, $\lambda > \lambda_0$, we obtain the scattering matrix $S(\lambda)$ defined by the equation $(FSf)(\lambda) = S(\lambda)(Ff)(\lambda)$. In its turn, the scattering matrix is also a matrix operator with components $S_{ab}(\lambda)$. For $N \ge 3$, the structure of the scattering matrix is essentially more complicated than for $N = 2$. This is discussed in some detail in the next section.


Resolvent Equations for Three-Particle 
Systems 


Let the Hamiltonian $H$ be defined by eqns [2]-[4], where $N = 3$, and let the configuration space of each particle be $\mathbb{R}^d$, $d \ge 3$. The operator $H$ acts in the space $\mathcal{H} = L_2(X^{cm})$, where the subspace $X^{cm} \subset \mathbb{R}^{3d}$ is distinguished by condition [5]. Let $R_0(z) = (H_0 - z)^{-1}$, $R(z) = (H - z)^{-1}$. Since $V(x)$ does not tend to 0 as $|x| \to \infty$, $x \in X^{cm}$, in the three-particle case, the resolvent equation

$$R(z) = R_0(z) - R_0(z) V R(z)$$ [16]

is not Fredholm even for $\operatorname{Im} z \ne 0$.

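To overcome this difficulty, Faddeev (1965) derived a system of equations for components of the resolvent. The entries of this system are constructed in terms of the three Hamiltonians

$$H_\alpha = H_0 + V^\alpha$$

$\alpha = (12), (13), (23)$, containing only one pair interaction each, and their resolvents $R_\alpha(z) = (H_\alpha - z)^{-1}$. Let us write down the resolvent equation for each pair $H_\alpha$, $H$:

$$R(z) = R_\alpha(z) - R_\alpha(z) \sum_{\beta \ne \alpha} V^\beta R(z)$$

We multiply it by $|V^\alpha|^{1/2}$ and set

$$r_\alpha^0(z) = |V^\alpha|^{1/2} R_\alpha(z), \quad r_\alpha(z) = |V^\alpha|^{1/2} R(z)$$

$$t_{\alpha,\alpha}(z) = 0, \quad t_{\alpha,\beta}(z) = |V^\alpha|^{1/2} R_\alpha(z) (V^\beta)^{1/2}, \quad \beta \ne \alpha$$

where $(V^\beta)^{1/2} = V^\beta |V^\beta|^{-1/2}$. This yields a system of equations

$$r_\alpha(z) = r_\alpha^0(z) - \sum_{\beta \ne \alpha} t_{\alpha,\beta}(z) r_\beta(z)$$ [17]

for the operators $r_\alpha(z)$. Note that the resolvent $R(z)$ can be recovered from its components $r_\alpha(z)$ by the formula

$$R(z) = R_0(z) - R_0(z) \sum_\alpha (V^\alpha)^{1/2} r_\alpha(z)$$

It is convenient to rewrite eqn [17] in the matrix notation

$$r(z) = r^0(z) - t(z) r(z)$$ [18]

where $r^0(z) = \{r_\alpha^0(z)\}$, $r(z) = \{r_\alpha(z)\}$ are the “vector” operators with values in the three-component space $\mathcal{H} \oplus \mathcal{H} \oplus \mathcal{H}$, and $t(z) = \{t_{\alpha,\beta}(z)\}$ is the “matrix” operator in this space.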
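
The algebra behind this system can be sanity-checked on finite matrices, where compactness plays no role and only the identities are tested. The sketch below is a toy model under stated assumptions: $H_0$ is replaced by a discrete Laplacian, the three pair potentials by random real diagonal matrices (multiplication operators), and all variable names are ours. It builds $r^0$ and $t$, solves the block system [18] for $r$, and rebuilds the resolvent from the components.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
z = 0.3 + 1.0j                       # any non-real spectral parameter
identity = np.eye(n)

# Toy stand-ins (assumptions): discrete Laplacian for H_0, diagonal "potentials".
H0 = 2 * identity - np.eye(n, k=1) - np.eye(n, k=-1)
V = [np.diag(rng.normal(size=n)) for _ in range(3)]


def resolvent(a):
    return np.linalg.inv(a - z * identity)


R0, R = resolvent(H0), resolvent(H0 + sum(V))
abs_half = [np.sqrt(np.abs(v)) for v in V]                  # |V|^{1/2}
sgn_half = [np.sign(v) * np.sqrt(np.abs(v)) for v in V]     # V |V|^{-1/2}
R_pair = [resolvent(H0 + v) for v in V]                     # R_alpha(z)

r0 = [abs_half[a] @ R_pair[a] for a in range(3)]
r = [abs_half[a] @ R for a in range(3)]
t = [[np.zeros((n, n)) if a == b else abs_half[a] @ R_pair[a] @ sgn_half[b]
      for b in range(3)] for a in range(3)]

# Solve r = r0 - t r as a single 3n x 3n block system.
r_solved = np.linalg.solve(np.eye(3 * n) + np.block(t), np.vstack(r0))

# Recover the full resolvent from the solved components.
R_rebuilt = R0 - R0 @ sum(sgn_half[a] @ r_solved[a * n:(a + 1) * n] for a in range(3))
```

In this toy setting `r_solved` agrees with the directly computed components `r`, and `R_rebuilt` agrees with `R`, up to rounding.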

The advantage of eqn [17] compared to [16] is that the operators $t_{\alpha,\beta}(z)$ are compact for $\operatorname{Im} z \ne 0$. This can be deduced from the fact that the product $V^\alpha(x^\alpha) V^\beta(x^\beta)$, where $\alpha \ne \beta$, tends to 0 as $|x| \to \infty$, $x \in X^{cm}$, provided that $V^\alpha(x^\alpha) \to 0$ as $|x^\alpha| \to \infty$ for all $\alpha$. Moreover, the homogeneous equation [17] has only a trivial solution. Indeed, if for some $z$ with $\operatorname{Im} z \ne 0$

$$f_\alpha = -\sum_{\beta \ne \alpha} t_{\alpha,\beta}(z) f_\beta$$ [19]

then the function

$$u = R_0(z) \sum_\alpha (V^\alpha)^{1/2} f_\alpha$$

satisfies the equation $u = -R_0(z) V u$. Since the operator $H$ is self-adjoint, this implies that $u = 0$ and hence $f_\alpha = 0$ for all $\alpha$. According to the Fredholm alternative, eqns [17] for $r_\alpha(z)$ or [18] for $r(z)$ can be solved if $\operatorname{Im} z \ne 0$, that is,

$$r(z) = (I + t(z))^{-1} r^0(z)$$ [20]
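
The uniqueness argument is compressed in the text. One way to fill in the steps, using only the definitions of $t_{\alpha,\beta}(z)$ and the resolvent identity $R_0 - R_\alpha = R_0 V^\alpha R_\alpha$, is sketched below; the auxiliary function $g$ is our notation.

```latex
\begin{aligned}
&\text{Put } g = \sum_\beta (V^\beta)^{1/2} f_\beta, \qquad u = R_0(z)\, g. \\
&\text{The homogeneous equation reads }
  f_\alpha = -|V^\alpha|^{1/2} R_\alpha(z)\bigl(g - (V^\alpha)^{1/2} f_\alpha\bigr). \\
&\text{With } A = |V^\alpha|^{1/2} R_0 (V^\alpha)^{1/2},\;
  B = |V^\alpha|^{1/2} R_\alpha (V^\alpha)^{1/2}
  \text{ one has } (I + A)(I - B) = I, \\
&\text{so applying } I + A \text{ to } (I - B) f_\alpha = -|V^\alpha|^{1/2} R_\alpha g
  \text{ gives }
  f_\alpha = -|V^\alpha|^{1/2}\bigl(R_\alpha + R_0 V^\alpha R_\alpha\bigr) g
           = -|V^\alpha|^{1/2} u. \\
&\text{Hence } g = \sum_\alpha (V^\alpha)^{1/2} f_\alpha = -\sum_\alpha V^\alpha u = -V u,
  \qquad u = R_0(z)\, g = -R_0(z) V u. \\
&\text{Then } (H_0 - z) u = -V u, \text{ i.e. } (H - z) u = 0,
  \text{ so } u = 0 \text{ for } \operatorname{Im} z \ne 0
  \text{ and } f_\alpha = -|V^\alpha|^{1/2} u = 0.
\end{aligned}
```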


This equation allows one to deduce the existence of the necessary boundary values of the “sandwiched” resolvent $R(z)$ from similar results for the resolvents $R_\alpha(z)$ of the “two-particle” operators $H_\alpha$. In its turn, $R_\alpha(z)$ can be expressed in terms of the resolvent $R^\alpha(z)$ of the operator $H^\alpha$ acting in the space $L_2(\mathbb{R}^d)$. Indeed, in the “mixed” representation $(\xi_\alpha, x^\alpha)$, where the Fourier transform in the variable $x_\alpha$ is performed and the variable $\xi_\alpha$ is dual to $x_\alpha$, we have

$$(R_\alpha(z) f)(\xi_\alpha, x^\alpha) = \bigl( R^\alpha(z - (2m_\alpha)^{-1} |\xi_\alpha|^2)\, f(\xi_\alpha, \cdot) \bigr)(x^\alpha)$$ [21]


The passage to the limit $\operatorname{Im} z \to 0$ requires that assumption [11] be satisfied for $\rho > 2$. Moreover, we have to suppose that the operators $H^\alpha$ have neither the so-called zero-energy resonances nor eigenvalues embedded in the continuous spectrum. Then the operator functions $\langle x^\alpha \rangle^{-l} R^\alpha(z) \langle x^\alpha \rangle^{-l}$, $l > 1$, where $\langle x^\alpha \rangle = (1 + |x^\alpha|^2)^{1/2}$, are analytic in the complex plane cut along $[0, \infty)$, have poles only at the eigenvalues $\lambda^{\alpha,n}$, and are continuous up to the cut, the point $z = 0$ included. In particular, it follows from eqn [21] that, if the operators $H^\alpha$ do not have negative eigenvalues, then the operator functions $\langle x^\alpha \rangle^{-l} R_\alpha(z) \langle x^\alpha \rangle^{-l}$, $l > 1$, are also analytic in the complex plane cut along $[0, \infty)$ and are continuous up to the cut.

The next result is of a genuinely three-particle nature and is crucial for the study of the operator $t(z)$. The operator functions $\langle x^\alpha \rangle^{-l} R_0(z) \langle x^\beta \rangle^{-l}$, $\alpha \ne \beta$, $l > 1$, are continuous in norm up to the cut along $[0, \infty)$.

Now it follows from eqn [20] that the operator-valued functions $r_\alpha(z) |V^\beta|^{1/2}$ are continuous up to the cut $(0, \infty)$ except for points $\lambda \in (0, \infty)$ where the homogeneous equation [19] for $z = \lambda \pm i0$ has a nontrivial solution. The set $\mathcal{N} = \mathcal{N}_+ \cup \mathcal{N}_-$ of such points $\lambda \in (0, \infty)$ is closed and has Lebesgue measure zero. In particular, the operators $\langle x^\alpha \rangle^{-l}$, $l > 1$, are $H$-smooth on any compact subinterval of $\Lambda = (0, \infty) \setminus \mathcal{N}$. Therefore, the smooth method of scattering theory can be directly applied. It yields the existence and completeness of the wave operators $W^\pm(H, H_0)$. In this case, the three particles are necessarily asymptotically free.

“Two-particle” channels of scattering arise if the operators $H^\alpha$ have negative eigenvalues. To simplify notation, we assume that every $H^\alpha$ has exactly one eigenvalue $\lambda^\alpha < 0$. Moreover, it is supposed that the corresponding eigenfunction $\psi^\alpha(x^\alpha)$ tends to zero sufficiently rapidly as $|x^\alpha| \to \infty$. Analytically, the appearance of new channels is due to new singularities of the resolvents. Indeed, in this case

$$R^\alpha(z) = (\lambda^\alpha - z)^{-1} P^\alpha + \tilde{R}^\alpha(z)$$

where $P^\alpha$ is the orthogonal projection on $\psi^\alpha$ and the function $\tilde{R}^\alpha(z)$ is analytic and continuous up to the cut in the complex plane cut along $[0, \infty)$. It follows from eqn [21] that in this case the resolvent $R_\alpha(z)$ contains the additional term

$$\bigl( -(2m_\alpha)^{-1} \Delta_{x_\alpha} + \lambda^\alpha - z \bigr)^{-1} \otimes P^\alpha$$

which is analytic only in the complex plane cut along $[\lambda^\alpha, \infty)$. To take these terms into account, system [17] should be further rearranged. This yields the following result. Let us set


Gao = (x°) (I— Pa), Gar = (xa) OSV? [22 
bfa 


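Then, for all $\alpha$, $\beta$, $i, j = 0, 1$, a suitable $l > 1$ and $\lambda_0 = \min_\alpha \{\lambda^\alpha\}$, the operator functions $G_{\alpha,i} R(z) G_{\beta,j}^*$ are norm continuous as $z$ approaches the cut $(\lambda_0, \infty)$ at the points of $\Lambda = (\lambda_0, \infty) \setminus \mathcal{N}$, where $\mathcal{N}$ is again a closed set of measure zero. In particular, the operators $G_{\alpha,0}$ and $G_{\alpha,1}$ are $H$-smooth on any compact subinterval of $\Lambda$.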

In the multichannel case, to fit scattering for the Hamiltonian $H$ into the framework of smooth theory, it is convenient to reformulate the result in terms of scattering theory in a couple of spaces. Let the space $\hat{\mathcal{H}}$, the operator $\hat{H}$, and the identification $J$ be defined by eqns [12], [13], and [14], respectively, where the index $a$ takes four values: $a = 0$ and $a = \alpha$ with $\alpha = (12), (13), (23)$. One further needs to introduce auxiliary identifications


J =i=)> P, 


te t-a or 


and 


J=Poe@r 


590 N-Particle Quantum Scattering 


The H- (and H-) smoothness of operators [22] imply 
that the wave operators 


W*(H,H;J) and W*(H,H;J*) 
exist. o 
The operators $W^\pm(H, \hat{H}; J)$ are isometric because

$$\operatorname{s-lim}_{|t| \to \infty} P_\alpha \exp(-iH_0 t) = 0$$ [23]

and the operators $P_\alpha P_\beta$ are compact for $\alpha \ne \beta$.
Using that the operator 


I’ -1=) PaPg 
oF 


is compact (whereas th —I is not), we see that the 
operators W*(H,H;J*) are also isometric. Finally, 
we remark that, by eqn [23], 

W+(H, H; J) = W=(H,H, J) 
This implies the asymptotic completeness. 

Let us discuss properties of the scattering matrix in the one-channel case where the pair operators $H^\alpha$ do not have negative eigenvalues. The scattering matrix $S(\lambda): L_2(\mathbb{S}^{2d-1}) \to L_2(\mathbb{S}^{2d-1})$, $\lambda > 0$, is of course a unitary operator, but in contrast to the two-particle case the operator $S(\lambda) - I$ is not compact because its kernel contains the Dirac functions $\delta(\xi_\alpha - \xi_\alpha')$. Nevertheless, the structure of its singularities can be explicitly described. Actually, let $S_\alpha(\lambda)$ be the “two-particle” scattering matrix for the pair $H_0$, $H_\alpha$. Then

$$S(\lambda) = S_{12}(\lambda) S_{23}(\lambda) S_{13}(\lambda) \tilde{S}(\lambda)$$

where the operator $\tilde{S}(\lambda) - I$ is compact.

The approach described briefly in this section 
relies on a kind of an advanced perturbation theory 
where the free problem is determined by the set of 
all sub-Hamiltonians. Its generalization to the case 
of an arbitrary number of particles meets with 
numerous difficulties. A different, nonperturbative, 
approach which works well for any number of 
particles will be discussed in the next section. 

A purely time-dependent method in three-particle scattering is presented in Enss (1983).


Nonperturbative Approach 


Now N and d are arbitrary. In the nonperturbative 
approach (see Graf (1990), Sigal and Soffer (1989), 
and Yafaev (1993)) the operators H and Ho as well 
as the Hamiltonians of all subsystems are treated on 
an equal basis. It is supposed that all pair potentials 
satisfy condition [11]. No assumptions on subsys- 
tems are required. 


The starting point of this approach is the limiting-absorption principle, which claims that the operator $\langle x \rangle^{-l}$, $x \in X^{cm}$, for $l > 1/2$ is $H$-smooth on any compact interval $\Lambda$ not containing the thresholds and eigenvalues of $H$. Its proof relies on the Mourre commutator method (see Cycon et al. (1987)). To be more precise, it is deduced from the following estimate:

$$i([H, A]f, f) \ge c \|f\|^2, \quad c = c(\lambda) > 0, \quad f \in E(\Lambda_\lambda)\mathcal{H}$$ [24]

for the commutator of $H$ with the generator of dilations

$$A = -i \sum_j (x_j \partial_j + \partial_j x_j)$$

Here $x_j$ are the coordinates of $x \in X^{cm}$ in some orthonormal (with respect to scalar product [6]) basis in $X^{cm}$, $\lambda$ is neither a threshold nor an eigenvalue of the operator $H$, and $\Lambda_\lambda \ni \lambda$ is a sufficiently small interval. Very roughly speaking, the Mourre estimate [24] means that, similarly to the two-particle case, the observable

$$(A e^{-iHt} f, e^{-iHt} f)$$

is a strictly increasing function of $t$ for all $f \in \mathcal{H}^{(ac)}$.
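
For orientation, the estimate can be verified by hand in the free case $H = H_0 = -\Delta$, where no thresholds other than zero are present. The computation below is a sketch under the assumption that the coordinates are orthonormal, so that $H_0 = -\sum_j \partial_j^2$; $n$ denotes $\dim X^{cm}$ and $E_0$ the spectral projection of $H_0$, both our notation.

```latex
\begin{aligned}
&[\Delta,\, x \cdot \nabla] = 2\Delta, \qquad
  \nabla \cdot x = x \cdot \nabla + n, \qquad
  A = -i\,(2\, x \cdot \nabla + n), \\
&[H_0, A] = -i\,[-\Delta,\, 2\, x \cdot \nabla] = 2i\,[\Delta,\, x \cdot \nabla] = 4i\,\Delta = -4i\, H_0, \\
&i\,[H_0, A] = 4 H_0, \qquad
  i([H_0, A] f, f) = 4 (H_0 f, f) \ge 4 \inf\Lambda \; \|f\|^2
  \quad \text{for } f \in E_0(\Lambda)\mathcal{H},\ \Lambda \subset (0, \infty), \\
&\frac{d}{dt} \bigl( A e^{-iH_0 t} f,\, e^{-iH_0 t} f \bigr)
  = \bigl( i[H_0, A]\, e^{-iH_0 t} f,\, e^{-iH_0 t} f \bigr) = 4 (H_0 f, f) > 0 .
\end{aligned}
```

The last line is the sense in which the observable grows with $t$; for interacting $H$ the commutator is only bounded below after localization in energy away from thresholds and eigenvalues.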

The limiting-absorption principle implies that the singular continuous spectrum of the operator $H$ is empty, but it is not sufficient for scattering theory. If the limiting-absorption principle were true for the critical value $l = 1/2$, then it would imply asymptotic completeness. Unfortunately, the operator $\langle x \rangle^{-1/2}$ is definitely not smooth even with respect to the free operator $H_0$. However, by introducing an auxiliary differential operator we can fix this problem. This leads to the radiation estimates. These estimates look different in different regions of the configuration space. Choose any cluster decomposition $a = \{C_1, \ldots, C_n\}$. The radiation estimate morally implies that the motion of a system is asymptotically free in the variable $x_a$ (describing the relative motion of clusters) in the region where particles from each cluster $C_l$, $l = 1, \ldots, n$, are close to each other compared to the distances between different clusters. On the contrary, this motion is very complicated in the variable $x^a$ pertaining to bound states of the different clusters. In particular, the radiation estimate is the same as for the two-particle case in the “free” region where all particles are far from each other.

To be more precise, let $\nabla_a = \nabla_{x_a}$ be the gradient in the variable $x_a$ and let $\nabla_a^\perp$,

$$(\nabla_a^\perp u)(x) = (\nabla_a u)(x) - |x_a|^{-2} \langle (\nabla_a u)(x), x_a \rangle x_a$$

be its orthogonal projection in $X_a$ on the subspace orthogonal to the vector $x_a$. Let $\chi_a$ be the characteristic function of a closed cone $Y_a \subset X^{cm}$ satisfying the condition $Y_a \cap X_b = \{0\}$ for all $b$ such that $X_a \not\subset X_b$. Then the operator

$$G_a = \chi_a \langle x \rangle^{-1/2} \nabla_a^\perp$$

is $H$-smooth on $\Lambda$.

A proof of the radiation estimates is based on the consideration of the commutator of $H$ with some differential operator $M = -i \sum_j (m^{(j)} \partial_j + \partial_j m^{(j)})$, where $m^{(j)} = \partial m / \partial x_j$. Here $m$ (it depends on $a$) is a specially constructed function satisfying the following properties:

1. $m(x)$ is homogeneous (for $|x| \ge 1$) of order 1;

2. for any $b$ it does not depend on $x^b$ in some conical neighborhood of the subspace $X_b$;

3. $m(x)$ is convex; and

4. $m(x) = \mu_a |x_a|$, $\mu_a \ge 1$, on the support of the function $\chi_a$.

Note that we can set $m(x) = |x|$ in the case of the operator $H_0$.

Due to properties (1) and (2) the commutator $[V, M]$ is a short-range function (estimated by $\langle x \rangle^{-1-\varepsilon}$ for some $\varepsilon > 0$). Due to properties (3) and (4) the commutator $[H_0, M] \ge c\, G_a^* G_a$, $c > 0$, up to short-range terms. The estimate

$$[H, M] \ge c\, G_a^* G_a - c_1 \langle x \rangle^{-1-\varepsilon}$$

implies that the operator $G_a$ is $H$-smooth on $\Lambda$.

The main difficulty in the $N$-particle problem is that the pair potentials $V^\alpha(x^\alpha)$ do not tend to zero as $|x| \to \infty$. The idea of the proof of asymptotic completeness is to introduce auxiliary wave operators such that the “effective” perturbations are decaying functions. This requires a suitable smooth partition of unity. Moreover, it is convenient to choose auxiliary identifications as first-order differential operators rather than operators of multiplication. Unfortunately, although such identifications allow one to “kill” the directions where the potentials $V^\alpha(x^\alpha)$ do not tend to zero, their commutators with the operator $H_0$ have coefficients decaying at infinity only as $|x|^{-1}$.

Thus, we introduce differential operators

$$M_a = -i \sum_j (m_a^{(j)} \partial_j + \partial_j m_a^{(j)})$$

with coefficients $m_a^{(j)} = \partial m_a / \partial x_j$. The functions $m_a$ satisfy properties (1), (2) formulated above and

5. $m_a(x) = 0$ in some conical neighborhoods of the subspaces $X_b$ such that $X_a \not\subset X_b$. To put it differently, $m_a(x) = 0$ in some conical neighborhood of the subspace where $x_i = x_j$ for some $i, j$ belonging to different clusters $C_1, \ldots, C_n$.

Let the operator $H_a$ be defined by eqn [9]. Given the limiting-absorption principle and the radiation estimates, we first check the existence of the auxiliary wave operators

$$W^\pm(H, H_a; M_a E_a(\Lambda)) \quad \text{and} \quad W^\pm(H_a, H; M_a E(\Lambda))$$ [25]

Here we use that according to (5) the coefficients of the differential operator $(V - V^a) M_a$ are, under assumption [11], short-range (in the configuration space $X^{cm}$). By property (2), the function $[V^a, M_a]$ is also short-range. Thus, the operator $V M_a - M_a V^a$ can be taken into account by the limiting-absorption principle. The commutator $[H_0, M_a]$ factorizes into a product of $H_a$- and $H$-smooth operators according to the radiation estimates.

Similar arguments show that, for $m = \sum_a m_a$ and $M = \sum_a M_a$ (the sums here are taken over all possible breakups of the $N$-particle system), the wave operator (observable)

$$W^\pm(H, H; \pm M E(\Lambda))$$ [26]

also exists. Moreover, it can be easily achieved that $m(x) \ge 1$. Then it follows from the Mourre estimate that operator [26] is positive definite on the subspace $E(\Lambda)\mathcal{H}$ and hence its range coincides with this subspace. It means that for all $f \in E(\Lambda)\mathcal{H}$

$$\lim_{t \to \pm\infty} \| \exp(-iHt) f - M \exp(-iHt) g^\pm \| = 0$$ [27]

if $f = W^\pm(H, H; M E(\Lambda)) g^\pm$. The existence of wave operators [25] implies that for any $g^\pm = E(\Lambda) g^\pm$ and $g_a^\pm = W^\pm(H_a, H; M_a E(\Lambda)) g^\pm$

$$\lim_{t \to \pm\infty} \Bigl\| M \exp(-iHt) g^\pm - \sum_a \exp(-iH_a t) g_a^\pm \Bigr\| = 0$$ [28]

Combining eqns [27] and [28], we see that $\exp(-iHt) f$ decomposes asymptotically into simpler evolutions $\exp(-iH_a t) g_a^\pm$. This is one of the equivalent formulations of asymptotic completeness and leads to eqn [15].
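
The passage from the existence of wave operators [25] to relations [27] and [28] uses nothing beyond the definition of a wave operator with an identification as a strong limit and the unitarity of the evolution groups. A sketch of the steps, with $\Lambda$, $M_a$, $g^\pm$ as in the text:

```latex
\begin{aligned}
&g_a^\pm = W^\pm(H_a, H; M_a E(\Lambda))\, g^\pm
  = \lim_{t \to \pm\infty} e^{iH_a t} M_a E(\Lambda)\, e^{-iHt} g^\pm
  = \lim_{t \to \pm\infty} e^{iH_a t} M_a\, e^{-iHt} g^\pm
  \quad (\text{since } E(\Lambda) g^\pm = g^\pm), \\
&\bigl\| M_a e^{-iHt} g^\pm - e^{-iH_a t} g_a^\pm \bigr\|
  = \bigl\| e^{iH_a t} M_a e^{-iHt} g^\pm - g_a^\pm \bigr\| \to 0, \\
&\Bigl\| M e^{-iHt} g^\pm - \sum_a e^{-iH_a t} g_a^\pm \Bigr\|
  \le \sum_a \bigl\| M_a e^{-iHt} g^\pm - e^{-iH_a t} g_a^\pm \bigr\| \to 0
  \quad (M = \textstyle\sum_a M_a,\ \text{a finite sum}), \\
&\Bigl\| e^{-iHt} f - \sum_a e^{-iH_a t} g_a^\pm \Bigr\|
  \le \bigl\| e^{-iHt} f - M e^{-iHt} g^\pm \bigr\|
   + \Bigl\| M e^{-iHt} g^\pm - \sum_a e^{-iH_a t} g_a^\pm \Bigr\| \to 0 .
\end{aligned}
```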

Finally, we note that eqn [15] can be rewritten as

$$\exp(-iHt) f = \sum_a \exp(i\Phi_a(x_a, t)) (2it)^{-d_a/2} \hat{f}_a^\pm(x_a/(2t))\, \psi^a(x^a) + o(1)$$ [29]

where $t \to \pm\infty$, $d_a = \dim X_a$, $\hat{f}_a^\pm$ is the Fourier transform of $f_a^\pm$ and

$$\Phi_a(x_a, t) = x_a^2 (4t)^{-1} - \lambda^a t$$ [30]
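
The structure of this asymptotic formula can be seen in the simplest possible situation: a single free channel in one variable, so that $d_a = 1$, $\lambda^a = 0$ and $H = -d^2/dx^2$. The sketch below is only an illustration under those assumptions. It takes the Gaussian $f(x) = e^{-x^2/2}$, for which the exact evolution is known in closed form and, with the convention $\hat{f}(\xi) = (2\pi)^{-1/2} \int e^{-i\xi x} f(x)\, dx$ (an assumption about normalization), $\hat{f}(\xi) = e^{-\xi^2/2}$. The function names are ours.

```python
import numpy as np


def exact_free_gaussian(x, t):
    """exp(-i H t) f for H = -d^2/dx^2 and f(x) = exp(-x^2 / 2), in closed form."""
    a = 1.0 + 2.0j * t
    return np.exp(-x**2 / (2.0 * a)) / np.sqrt(a)


def asymptotic_free_gaussian(x, t):
    """Leading term with one free channel: d_a = 1, lambda^a = 0."""
    phase = x**2 / (4.0 * t)                        # the phase function
    f_hat = np.exp(-(x / (2.0 * t))**2 / 2.0)       # Fourier transform at x / (2t)
    return np.exp(1j * phase) * f_hat / np.sqrt(2.0j * t)


def l2_error(t, points=200_001):
    """L2 norm of (exact - asymptotic) on a grid covering the spread-out packet."""
    width = np.sqrt(1.0 + 4.0 * t**2)
    x = np.linspace(-8.0 * width, 8.0 * width, points)
    dx = x[1] - x[0]
    diff = exact_free_gaussian(x, t) - asymptotic_free_gaussian(x, t)
    return float(np.sqrt(np.sum(np.abs(diff)**2) * dx))


if __name__ == "__main__":
    for t in (5.0, 50.0):
        print(t, l2_error(t))
```

The error shrinks as $t$ grows, which is the meaning of the $o(1)$ term; for this particular $f$ it decays like $1/t$.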


Long-Range Interactions: New Channels 


The multiparticle problem acquires a long-range character if pair potentials decay as Coulomb potentials or slower. Similarly to the two-particle problem, for long-range potentials the definition of wave operators should be naturally modified. As in the short-range case, only the asymptotic completeness is a really difficult mathematical problem. Assume that pair potentials satisfy the condition

$$|(\partial^\kappa V^\alpha)(x^\alpha)| \le C(1 + |x^\alpha|)^{-\rho - |\kappa|}, \quad \rho > \sqrt{3} - 1$$

for all $|\kappa| \le \kappa_0$ and sufficiently large $\kappa_0$. Then only the phase factors in eqn [29] should be modified. Actually, instead of eqn [30] we should set

$$\Phi_a(x_a, t) = x_a^2 (4t)^{-1} - \lambda^a t - t \int_0^1 V_a(s x_a, 0)\, ds$$

where $V_a(x) = V(x) - V^a(x)$ and as usual $x = (x_a, x^a)$. As shown in Dereziński (1993), with this definition of wave operators, the asymptotic completeness holds.
On the contrary, if pair potentials decay slower than $|x^\alpha|^{-1/2}$, then the traditional picture of scattering breaks down (see Yafaev (1996)). Actually, a three-particle system might have additional scattering channels intermediary between the channel where three particles are asymptotically free and the channels where a couple of particles form a bound state. In these additional channels, the bound state of a couple of particles depends on the position of the third particle, and it is destroyed asymptotically.


See also: Quantum Mechanical Scattering Theory;
Schrödinger Operators.

Introduction


The existence of nuclear spin and its associated magnetism was first suggested by Wolfgang Pauli in 1924, a conjecture based on the fine details of atomic spectra, the so-called hyperfine structure. The interaction of this nuclear magnetism with an external magnetic field was predicted to result in a finite number of discrete energy levels known as the Zeeman structure. However, the first direct excitation of transitions between nuclear Zeeman levels was by Isidor Rabi in 1933, using radio-frequency (RF) waves in an atomic beam apparatus. In 1945, Felix Bloch and co-workers at Stanford, and Edward Purcell and co-workers at MIT, performed the first nuclear magnetic resonance (NMR) experiments in condensed matter, with the RF response of the hydrogen nucleus (proton) being directly detected.

The early prospects for this new technique were 
limited to precise measurements of magnetic fields 
and nuclear magnetic moments. However, three 
transformational discoveries intervened to set 
NMR on a course that would result in initially 
unimaginable contributions to physics, chemistry, 


engineering, medicine, geology, food science, and 
biochemistry. In 1950, it was found that atomic 
nuclei at different sites of a molecular orbital had 
slightly different resonant frequencies, a phenomenon known as “chemical shift.” In the same year,
Erwin Hahn discovered the spin echo, thus opening 
the possibility that multiple RF pulse trains could be 
used to remove unwanted nuclear spin interactions 
while being used to manipulate spin coherences with 
exquisite resolution. In addition, in 1951, using this 
spin echo, Herbert Gutowsky and Charles Slichter 
revealed a hitherto unobserved scalar spin-spin 
interaction between nuclei, mediated by the mole- 
cular orbital electrons. 

The discovery of the chemical shift and the scalar 
coupling would immediately revolutionize chemis- 
try. Further discoveries of nuclear quadrupole 
interactions and through-space dipolar interactions 
would add to the capacity of NMR to provide 
insight regarding structure and order in the solid and 
liquid crystalline state. But the spin echo would 
provide a platform for new advances in science in 
every one of the six decades following the discovery 
of NMR in 1945. These were successively diffusion 
and flow NMR, multidimensional NMR, magnetic 
resonance imaging, protein structure NMR, ex situ 
NMR, and quantum computing NMR. 


Resonant Excitation and Detection 


In quantum-mechanical language, the Zeeman
Hamiltonian $H$ for a nuclear spin experiencing a
magnetic field $B_0$ along the laboratory z-axis may be
written as

$$H = -\gamma B_0 I_z \qquad [1]$$

$\gamma$ being the (nuclear) gyromagnetic ratio while $I_z$ is the
operator for the z-component of angular momentum,
with eigenvalues $m\hbar$, $m$ lying in the range
$-I, -I+1, \ldots, I$. $I$ is the angular momentum quantum
number, being either integer or half-integer. From the
Schrödinger equation, it can be seen that the eigenkets
of $H$ precess about the z-axis at a rate $\gamma B_0$, the
frequency corresponding to the energy difference
between the $2I+1$ Zeeman levels. For convenience,
we shall take the eigenvalues of $I_z$ to be simply $m$,
dropping the factor $\hbar$, and leading to a Hamiltonian
expressed in frequency rather than in energy units.

Resonant excitation between the Zeeman levels is
achieved by the application of an RF ($\omega$) magnetic
field of amplitude $2B_1$, linearly polarized normal to
$B_0$, such that the total Hamiltonian becomes

$$H = -\gamma B_0 I_z - 2\gamma B_1 \cos(\omega t)\, I_x \qquad [2]$$

This excitation is easily applied by means of a
transversely oriented antenna coil, the same coil
generally being used to detect the nuclear spin
response. In the frame of reference rotating about
$B_0$ at $\omega$, the Hamiltonian transforms to

$$H_{\mathrm{rot}} = -(\gamma B_0 - \omega) I_z - \gamma B_1 I_x - \gamma B_1 \exp(i 2\omega t I_z)\, I_x \exp(-i 2\omega t I_z) \qquad [3]$$

At resonance, $\omega = \omega_0 = \gamma B_0$. The last term in eqn [3]
averages to zero and may be neglected (the
Heisenberg condition) provided $\omega \gg \gamma B_1$, that is,
$B_0 \gg B_1$. Given $B_0$ of the order of tesla and $B_1$ of
the order of millitesla, this condition is easily
satisfied. Hence, from the perspective of the
rotating frame, the spins at resonance see only the
static magnetic interaction $\gamma B_1 I_x$, so that application
of this resonant RF field causes spins to nutate
about the rotating frame x-axis at a rate $\gamma B_1$. Thus,
by application of RF pulses of different durations
and phases, one may produce arbitrary reorientation
of the spins about various axes in the rotating
frame.
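The reorientation produced by an ideal RF pulse can be checked numerically. The following is a minimal sketch for a single spin-1/2, assuming units with $\hbar = 1$ and the transformation $\rho \to U\rho U^\dagger$ with $U = \exp(i\theta I_x)$; the function names are illustrative and not taken from the article.

```python
import numpy as np

# Spin-1/2 operators in units where hbar = 1 (eigenvalues of Iz are +/- 1/2).
Ix = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
Iy = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
Iz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
ONE = np.eye(2, dtype=complex)


def rotation(axis_op, angle):
    """exp(i * angle * axis_op) for a spin-1/2 operator (closed form)."""
    return np.cos(angle / 2) * ONE + 2j * np.sin(angle / 2) * axis_op


def apply_pulse(rho, axis_op, angle):
    """Transform rho -> U rho U^dagger with U = exp(i * angle * axis_op)."""
    u = rotation(axis_op, angle)
    return u @ rho @ u.conj().T


# A 90 degree pulse about x turns longitudinal polarization into Iy;
# a 180 degree pulse about x inverts it.
rho_90x = apply_pulse(Iz, Ix, np.pi / 2)
rho_180x = apply_pulse(Iz, Ix, np.pi)
```

Here the pulse angle plays the role of $\gamma B_1$ multiplied by the pulse duration, so changing the duration or the rotation axis (the RF phase) changes the final orientation.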

With the spin system disturbed from equilibrium, 
the NMR “signal” is detected via the subsequent 
free precession, and usually via the same antenna 
coil used for resonant excitation. Semiclassically, the
phenomenon may be pictured as follows. RF 
excitation nutates an initial z-magnetization into 
the transverse plane of the rotating frame. Such 
transverse magnetization corresponds in the laboratory
frame to a magnetization precessing at the Larmor 
frequency, thus inducing an oscillating emf in the 
receiver coil. In the next section, we see how to 
describe this phenomenon in the language of 
quantum mechanics. 

Typically, NMR is performed using the nuclei of
common atoms in organic molecules ($^{1}$H, $^{2}$H, $^{13}$C,
$^{15}$N, $^{19}$F, $^{31}$P), although for inorganic matter a wider
class of nuclei is available. Of all these, the
proton is most abundant and most sensitive,
having the highest gyromagnetic ratio, $\gamma$, of all
stable nuclei.


The Quantum Statistics of the 
Spin Ensemble 


The nuclear Zeeman energy in typically available
laboratory magnetic fields, $\gamma B_0 \hbar$, is many orders of
magnitude smaller than the Boltzmann energy, $k_B T$,
except at millikelvin temperatures. At room temperature
in thermal equilibrium, the fractional
difference in populations between the Zeeman levels
is normally very small, for example, for protons,
about $10^{-5}$. Of course, the total number of spins
available may be very large, for example, on the
order of $10^{20}$.
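As a rough numerical illustration of this estimate, the sketch below evaluates the spin-1/2 Boltzmann population difference, $\tanh(\gamma\hbar B_0 / 2 k_B T)$. The field of 7 T and temperature of 300 K are assumed example values, and the physical constants are rounded.

```python
import math

GAMMA_PROTON = 2.675e8   # rad s^-1 T^-1, approximate proton gyromagnetic ratio
HBAR = 1.0546e-34        # J s
K_B = 1.3807e-23         # J K^-1


def fractional_polarization(b0_tesla, temperature_k, gamma=GAMMA_PROTON):
    """Fractional population difference between the two spin-1/2 Zeeman levels."""
    x = gamma * HBAR * b0_tesla / (K_B * temperature_k)
    return math.tanh(x / 2)


def high_temperature_estimate(b0_tesla, temperature_k, gamma=GAMMA_PROTON):
    """Leading-order value valid when gamma*hbar*B0 << kB*T."""
    return gamma * HBAR * b0_tesla / (2 * K_B * temperature_k)


p_room = fractional_polarization(7.0, 300.0)        # assumed 7 T, 300 K
p_approx = high_temperature_estimate(7.0, 300.0)
```

Under these assumptions the polarization comes out at a few parts in $10^{5}$, and the exact and high-temperature values are practically indistinguishable.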

The signal in magnetic resonance is detected as a
collective effect of the large ensemble of nuclear
spins. The natural language of quantum statistics is
that of the density matrix, $\rho$; the time-dependent
expectation value for any observable represented
by an operator $O$ is then $\mathrm{tr}(O\rho(t))$, the diagonal
sum of the product of $O$ and $\rho$. The time evolution
of the density matrix is given by the Liouville
equation

$$i\,\frac{\partial \rho}{\partial t} = [H, \rho] \qquad [4]$$

where $[\,,]$ is a commutator. For a constant Hamiltonian,
this equation gives

$$\rho(t) = \exp(-iHt)\, \rho(0) \exp(iHt) \qquad [5]$$


Physical solutions to the density matrix (Liouville
space) are $(2I+1)^2$ (square) matrices formed in
the $(2I+1)$-dimensional angular momentum
eigenbasis. Generally, we may write the density
matrix in a representation of irreducible tensor
operators. One very convenient representation is
the set formed by taking products of spin
operators. For example, in the case of spin-1/2
where Liouville space is $2^2$-dimensional, we may
write

$$\rho(t) = \tfrac{1}{2}\mathbf{1} + a_x I_x + a_y I_y + a_z I_z \qquad [6]$$


where $\mathbf{1}$ is the identity operator. The operators $I_x$
and $I_y$ provide the off-diagonal elements of $\rho$ and
define the degree of phase coherence in the
ensemble, while the operator $I_z$ defines the degree
to which the diagonal elements differ, thus defining
the polarization. $a_x$ and $a_y$ give the amount of
“one-quantum coherence” in the ensemble while $a_z$ gives
the polarization. In thermal equilibrium $a_x = a_y = 0$,
and the spin ensemble exists in a state of
pure longitudinal polarization given, in the
high-temperature approximation, $\gamma B_0 \hbar \ll k_B T$, by

$$\rho_{\mathrm{eqbm}}(0) = \frac{1}{2I+1}\,\mathbf{1} + \frac{1}{2I+1}\,\frac{\gamma \hbar B_0}{k_B T}\, I_z = \frac{1}{2I+1}\,\mathbf{1} + a_{\mathrm{eqbm}} I_z \qquad [7]$$


This is the starting point for all NMR experiments 
(Figure 1). 

Figure 1 Schematic Zeeman levels for the case $I = 2$ and
$I = 1/2$. The bold lines indicate the relative population in each
state in thermal equilibrium.

Consider then the detection of precession via the
Faraday induction. The size of the signal observed
will be proportional to the size of the transverse
magnetization $M = \mathrm{tr}[(I_x + iI_y)\rho(t)]$ present in the
rotating frame, this magnetization producing an
induced emf with real and imaginary components
because of the capacity of heterodyne receivers to
detect quadrature phase. In the laboratory frame,
the detected signal has a prefactor of $\gamma B_0$ reflecting the
Faraday induction, which, taken together with the
dependence of the initial equilibrium magnetization on
$\gamma B_0$, gives an overall NMR sensitivity proportional to
$(\gamma B_0)^2$, helping to explain in part why high magnetic
fields are advantageous. Take the simple example for
$I = 1/2$, where a single 90° resonant RF pulse is applied
to the spin system, subsequent free precession occurring
under the Zeeman Hamiltonian. The density matrix
at detection is


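$$\begin{aligned}
\rho(t) &= \exp(i\omega_0 t I_z) \exp\!\left(i\tfrac{\pi}{2} I_x\right) \rho_{\mathrm{eqbm}}(0) \exp\!\left(-i\tfrac{\pi}{2} I_x\right) \exp(-i\omega_0 t I_z) \\
&= \exp(i\omega_0 t I_z) \exp\!\left(i\tfrac{\pi}{2} I_x\right) a_{\mathrm{eqbm}} I_z \exp\!\left(-i\tfrac{\pi}{2} I_x\right) \exp(-i\omega_0 t I_z) \\
&= \exp(i\omega_0 t I_z)\, a_{\mathrm{eqbm}} I_y \exp(-i\omega_0 t I_z) \\
&= a_{\mathrm{eqbm}} I_y \cos(\omega_0 t) + a_{\mathrm{eqbm}} I_x \sin(\omega_0 t) \qquad [8]
\end{aligned}$$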


Noting $\mathrm{tr}(I_x^2) = \mathrm{tr}(I_y^2) = \mathrm{tr}(I_z^2) = (1/3)(2I+1)I(I+1)$
and $\mathrm{tr}(I_\alpha I_\beta) = 0$ for $\alpha \neq \beta$, the signal may easily be
calculated as $S(t) \sim a_{\mathrm{eqbm}} \exp(i\omega_0 t)$, corresponding, upon
Fourier transformation, to a unique frequency at $\omega_0$. Note
that a basis consisting of products of angular
momentum operators is easy to handle since all
evolution properties follow from the usual angular
momentum commutation algebra.
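The single-pulse experiment can be reproduced numerically for a spin-1/2. The sketch below is illustrative only: it sets $a_{\mathrm{eqbm}} = 1$, picks an arbitrary sampling interval, and places $\omega_0$ on an FFT bin to avoid leakage. With the operator conventions coded here the trace comes out proportional to $\exp(-i\omega_0 t)$, so the sign of the frequency axis is convention dependent and only the magnitude of the peak position is compared with $\omega_0$.

```python
import numpy as np

# Spin-1/2 operators with hbar = 1.
Ix = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
Iy = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
Iz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)


def z_rotation(phi):
    """exp(i * phi * Iz), diagonal in the Zeeman basis."""
    return np.diag([np.exp(0.5j * phi), np.exp(-0.5j * phi)])


n_points = 256
dt = 1.0e-3                                  # s, assumed sampling interval
omega0 = 2 * np.pi * 32 / (n_points * dt)    # rad/s, placed on an FFT bin
times = np.arange(n_points) * dt

rho_after_pulse = Iy                         # state after the 90 degree pulse, a_eqbm = 1
signal = np.empty(n_points, dtype=complex)
for n, t in enumerate(times):
    u = z_rotation(omega0 * t)
    rho_t = u @ rho_after_pulse @ u.conj().T  # free precession, as in eqn [8]
    signal[n] = np.trace((Ix + 1j * Iy) @ rho_t)

spectrum = np.fft.fft(signal)
omega_axis = 2 * np.pi * np.fft.fftfreq(n_points, dt)
peak_omega = omega_axis[np.argmax(np.abs(spectrum))]
```

The spectrum contains a single line, at a frequency of magnitude $\omega_0$.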

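The spin echo pulse scheme of Figure 2 is one of
the most important in NMR. It allows one to
refocus dephasing effects caused by inhomogeneous
broadening, for example, due to the heterogeneity
of the magnetic field across the sample.
Rewriting the density matrix equation in the
rotating frame, replacing the Zeeman precession
by its residual offset, and accounting for both RF
pulses,

$$\begin{aligned}
\rho_{\mathrm{rot}}(2\tau) &= \exp(i\Delta\omega_0 \tau I_z) \exp(i\pi I_y) \exp(i\Delta\omega_0 \tau I_z) \exp\!\left(i\tfrac{\pi}{2} I_x\right) \rho_{\mathrm{eqbm}}(0) \\
&\quad \times \exp\!\left(-i\tfrac{\pi}{2} I_x\right) \exp(-i\Delta\omega_0 \tau I_z) \exp(-i\pi I_y) \exp(-i\Delta\omega_0 \tau I_z) \\
&= a_{\mathrm{eqbm}} I_y \qquad [9]
\end{aligned}$$

Figure 2 Spin echo pulse scheme showing the evolution of the density matrix.
(Graphic not reproduced. The density matrix at the stages marked in the figure is:
before the 90° pulse, $\rho_{\mathrm{rot}}(0)_- = a_{\mathrm{eqbm}} I_z$;
after the 90° pulse, $\rho_{\mathrm{rot}}(0)_+ = a_{\mathrm{eqbm}} I_y$;
during the first interval, $\rho_{\mathrm{rot}}(t) = a_{\mathrm{eqbm}} I_y \cos(\Delta\omega_0 t) + a_{\mathrm{eqbm}} I_x \sin(\Delta\omega_0 t)$;
after the 180° pulse at $\tau$, $\rho_{\mathrm{rot}}(\tau)_+ = a_{\mathrm{eqbm}} I_y \cos(\Delta\omega_0 \tau) - a_{\mathrm{eqbm}} I_x \sin(\Delta\omega_0 \tau)$;
during the second interval, $\rho_{\mathrm{rot}}(\tau + t) = a_{\mathrm{eqbm}} I_y \cos(\Delta\omega_0 \tau)\cos(\Delta\omega_0 t) + a_{\mathrm{eqbm}} I_x \cos(\Delta\omega_0 \tau)\sin(\Delta\omega_0 t) - a_{\mathrm{eqbm}} I_x \sin(\Delta\omega_0 \tau)\cos(\Delta\omega_0 t) + a_{\mathrm{eqbm}} I_y \sin(\Delta\omega_0 \tau)\sin(\Delta\omega_0 t)$;
at the echo, $\rho_{\mathrm{rot}}(2\tau) = a_{\mathrm{eqbm}} I_y$.)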


Details of the density matrix evolution are given in 
Figure 2. The inversion pulse has the effect of 
completely reversing all the phase shifts that occur 
during the first interval, resulting in an echo signal 
when the two time periods are equal. Note the use 
of nested operators representing the successive 
influences of RF pulses (assumed to be ideal 
rotations) and Hamiltonian evolutions. The overall 
influence of the RF pulses is to render the effective 
Hamiltonian zero in this case. 
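The refocusing can be illustrated with a semiclassical ensemble of isochromats. This is a minimal sketch, not the article's calculation: the complex number $m = M_y + iM_x$ stands for the transverse magnetization of one offset, free precession multiplies it by $\exp(i\Delta\omega_0 t)$, and the 180° pulse about y (which maps $I_x \to -I_x$) is a complex conjugation. The Gaussian spread of offsets and the value of $\tau$ are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
offsets = rng.normal(0.0, 50.0, size=2000)   # rad/s, hypothetical offset spread
tau = 0.2                                    # s, hypothetical pulse spacing


def transverse(t):
    """Ensemble-averaged M_y + i*M_x at time t after the 90 degree pulse.

    A 180 degree pulse about y is applied at time tau.
    """
    if t <= tau:
        m = np.exp(1j * offsets * t)
    else:
        m_before = np.exp(1j * offsets * tau)
        m_after = np.conj(m_before)          # 180_y: Ix -> -Ix, Iy -> Iy
        m = m_after * np.exp(1j * offsets * (t - tau))
    return m.mean()


dephased = abs(transverse(tau))       # signal has decayed just before the pulse
echo = abs(transverse(2 * tau))       # full refocusing at 2*tau
```

Every isochromat returns to $I_y$ at $2\tau$ regardless of its offset, which is the content of eqn [9]. Time-dependent (homogeneous) dephasing is not included in this sketch and would not be refocused.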

This echo sequence (and its equivalent multiple RF 
train, the Carr-Purcell-Meiboom-Gill sequence) allows 
one to remove the effect of magnetic field inhomo- 
geneities so as to investigate the underlying homoge- 
neous broadening and associated signal damping. 


Spin Relaxation 


The free precession of nuclear spins does not
continue indefinitely. Ultimately the off-diagonal
elements of the density matrix lose phase coherence
while the diagonal elements gradually return to their
thermal equilibrium state, two processes known,
respectively, as $T_2$ (spin-spin) and $T_1$ (spin-lattice)
relaxation. The rate of relaxation depends on
interactions between the spins themselves and
between the spins and their thermal environment.
The process of $T_1$ relaxation requires fluctuations
that induce transitions between the Zeeman levels.
Clearly the relevant quantum-mechanical operators
must possess a nonzero matrix element
coupling the Zeeman levels, and the frequency of
those fluctuations must match the energy gap
spacing. Predominant in causing such relaxation
in diamagnetic environments are the internuclear
dipolar interactions, while in paramagnetic
environments, dipolar interactions between nuclear and
electronic spins are effective. One simple way of
representing these processes is by the spectral
density function, the Fourier power transform of
their fluctuations, dipolar interactions causing
spin-lattice relaxation due to fluctuations at $\omega_0$
and $2\omega_0$. For a fluctuating interaction with
correlation time $\tau_c$, that spectral density may
approximate a Lorentzian of the form

$$J(\omega) = \frac{\tau_c}{1 + \omega^2 \tau_c^2} \qquad [10]$$


Thus, as the rate of molecular motions varies, due to
the influence of temperature on $\tau_c$, the $T_1$ relaxation
rate will be a maximum when $\omega_0 \tau_c = 1$. Both solids
($\omega_0 \tau_c \gg 1$) and liquids ($\omega_0 \tau_c \ll 1$) have long $T_1$
relaxation times while soft solids or complex liquids
may have faster relaxation. $T_1$ relaxation manifests
as an exponential return to equilibrium values of
longitudinal magnetization. Typical values range
from hundreds of milliseconds to hours, and the
need to re-establish equilibrium between repetitions
of the experiment can severely limit signal averaging
and hence available signal-to-noise ratios. Note that
$T_1$ relaxation occurs by stimulated emission.
Spontaneous emission is effectively absent from
nuclear spin systems owing to the long radiation
wavelength.
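The location of the relaxation-rate maximum follows directly from the Lorentzian of eqn [10]. The sketch below scans the correlation time at a fixed Larmor frequency, keeping only the $J(\omega_0)$ contribution (a simplification, since the dipolar rate also involves $J(2\omega_0)$). The 400 MHz Larmor frequency and the grid limits are assumed example values.

```python
import numpy as np


def spectral_density(omega, tau_c):
    """Lorentzian spectral density of eqn [10]."""
    return tau_c / (1.0 + (omega * tau_c) ** 2)


omega0 = 2 * np.pi * 400e6                 # rad/s, assumed 400 MHz Larmor frequency
tau_grid = np.logspace(-12, -6, 6001)      # s, scanned correlation times
j_values = spectral_density(omega0, tau_grid)
tau_best = tau_grid[np.argmax(j_values)]   # correlation time giving fastest relaxation
```

On this grid the maximum falls at $\omega_0 \tau_c \approx 1$, and $J(\omega_0)$ falls off on either side, corresponding to the long $T_1$ values of rigid solids and mobile liquids.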

The case of $T_2$ (spin-spin) relaxation is inherently
more complex. First, the definition of “loss of phase
coherence” depends on the particular RF pulse
sequence employed. Second, the simple perturbation
theory description applied to $T_1$ relaxation only
works in the fast motion limit, where the $T_2$
relaxation rate may be shown to depend on spectral
density terms not only at $\omega_0$ and $2\omega_0$ but also at $\omega = 0$.
In consequence, $T_2 \le T_1$. $T_2$ relaxation is sensitive
to static components. These static components may
dominate in soft solids and solids. Indeed, any term
in the Hamiltonian which spreads spin phases, and
which cannot be recovered by means of a judicious
RF pulse train, will contribute to $T_2$ relaxation.
Suppose the effective frequency distribution causing
dephasing is described by an ensemble second
moment $\langle \Delta\omega^2 \rangle$, and exhibits fluctuations about a
mean of zero with correlation time $\tau_c$. Then we may
identify two limiting cases: in the slow motion limit,
$\langle \Delta\omega^2 \rangle^{1/2} \tau_c \gg 1$, the decay of the detected
magnetization is Gaussian, and given by a factor
$\exp(-\tfrac{1}{2} \langle \Delta\omega^2 \rangle t^2)$. In solids, the proton $T_2$
relaxation may take place in a few tens of microseconds.
In the fast motion limit, $\langle \Delta\omega^2 \rangle^{1/2} \tau_c \ll 1$,
the decay of the detected magnetization is exponential,
and given by a factor $\exp(-\langle \Delta\omega^2 \rangle \tau_c t)$. Liquid state
$T_2$ values approach $T_1$ under extreme narrowing
conditions.


The Details of the Nuclear 
Spin Hamiltonian 


Atomic nuclei interact with their environment, with 
surrounding electrons, and with other nuclear spins. 
It is precisely this feature that provides such a 
sensitive probe of material structure and dynamics. 
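For a material immersed in a steady magnetic field
$B_0$ along the laboratory z-axis, the Hamiltonian for
the ith nuclear spin can be written

$$H = -\gamma B_0 I_{iz} - \mathbf{I}_i \cdot \mathbf{S} \cdot \mathbf{B}_0 + \sum_j J\, \mathbf{I}_i \cdot \mathbf{I}_j + \sum_j \mathbf{I}_i \cdot \mathbf{D} \cdot \mathbf{I}_j + \mathbf{I}_i \cdot \mathbf{Q} \cdot \mathbf{I}_i \qquad [11]$$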


It is the variety of the terms in the nuclear spin 
Hamiltonian that imparts power to NMR. The 
first is the nuclear Zeeman interaction with the 
applied magnetic field. In modern laboratory 


superconducting magnets, this interaction can be 
as large as 1000 MHz, although in earth field 
applications it can be as small as 2.5 kHz. Given that 
the sensitivity and resolution of NMR generally 
improve with increasing magnetic field, the range of 
100-1000 MHz is typically the operating regime of 
choice. All other terms in the nuclear spin Hamiltonian 
are smaller and thus act as first-order perturbations 
only, projecting their quantum operators into the 
zeroth-order Zeeman eigenbasis, the quantum frame 
of the operator $I_z$. Because several of the terms in
H depend on the orientation of the local nuclear 
environment (e.g., the molecular orbital) with respect 
to the magnetic field, these terms will fluctuate in the 
presence of reorientational motions. By the Heisenberg 
uncertainty principle, fluctuations faster in frequency 
than the size of the Hamiltonian contribution, 
expressed in frequency units, will result in an averaging 
to the mean, a phenomenon known as “motional 
averaging.” 

The term $-\mathbf{I}_i \cdot \mathbf{S} \cdot \mathbf{B}_0$ is the chemical shift that occurs
for nuclei in molecular atoms, or the Knight shift for
nuclei in metals. It is typically a few ppm to several
hundred ppm (i.e., hundreds of Hz to 10 kHz), depending
on the nucleus. $\mathbf{S} = \gamma \boldsymbol{\sigma}$ is a tensor whose principal
axes (1, 2, 3) are associated with the local symmetry
axis of the molecular orbital (bond) in the vicinity
of the nucleus. For a liquid state molecule tumbling
rapidly and isotropically, only the averaged trace
of $\boldsymbol{\sigma}$, $\sigma_i = (1/3)(\sigma_{11} + \sigma_{22} + \sigma_{33})$, survives under
motional averaging, giving a fixed frequency shift
$-\sigma_i \gamma B_0 I_{iz}$. However, in a solid-state environment,
the remaining terms also contribute to the anisotropic
chemical shift

$$H_{CS} = -\sigma_i \gamma B_0 I_{iz} - \tfrac{1}{2} (3\cos^2\theta - 1)(\sigma_{33} - \sigma_i)\, \gamma B_0 I_{iz} \qquad [12]$$

where $\theta$ is the polar angle between the magnetic
field and the principal axis (the axis “3”).

The scalar coupling term, $\sum_j J\, \mathbf{I}_i \cdot \mathbf{I}_j$, causes each
(ith spin) energy level to be sensitive to the quantum
states of the neighboring j-spins, the coupling
constant $J$ being typically tens to hundreds of hertz
for nearby spins, but reducing rapidly with greater
distance in the molecular orbital. Note that the
operator $\sum_j J\, \mathbf{I}_i \cdot \mathbf{I}_j$ is nondiagonal in the zeroth-order
representation, but provided that the chemical shift
between the i and j spins is larger than the coupling
frequency (known in chemistry as an AX spin
system), the operator reduces to $\sum_j J I_{iz} I_{jz}$, the effect
being to split the i-spin resonance into a multiplet,
depending on the state of the nearby j-spin. For m
identical nearby j-spins, the multiplet bears a simple 


binomial relationship to m, allowing one to “read” 
this number directly. The combination of chemical 
shift and scalar coupling information is of profound 
importance in identifying molecular structure in 
chemistry. 
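The binomial rule can be made concrete with a few lines of code. This is a minimal sketch assuming $m$ equivalent spin-1/2 neighbours in the weak-coupling (first-order) regime; the function names and the 7 Hz coupling used in the usage line are illustrative, not values from the article.

```python
from math import comb


def multiplet_intensities(m):
    """Relative line intensities for coupling to m equivalent spin-1/2 neighbours."""
    return [comb(m, k) for k in range(m + 1)]


def line_offsets(m, j_hz):
    """Line positions in Hz relative to the chemical-shift frequency."""
    return [(k - m / 2) * j_hz for k in range(m + 1)]


triplet = multiplet_intensities(2)     # two neighbours give three lines
quartet = multiplet_intensities(3)     # three neighbours give four lines
triplet_positions = line_offsets(2, 7.0)
```

Counting the lines of a multiplet therefore gives $m + 1$, from which the number of coupled neighbours can be read off.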

The terms $\sum_j \mathbf{I}_i \cdot \mathbf{D} \cdot \mathbf{I}_j$ and $\mathbf{I}_i \cdot \mathbf{Q} \cdot \mathbf{I}_i$ are, respectively,
the through-space dipolar interaction, $H_D$, and the
nuclear quadrupole interaction, $H_Q$, the latter being
nonzero only for nuclear spin quantum numbers $I > 1/2$,
for example, $^{2}$H. These interactions, projected
into the zeroth-order Zeeman frame, are, for the
dipole-dipole interaction,

$$H_D = \sum_j \frac{\mu_0 \gamma_i \gamma_j \hbar}{4\pi r_{ij}^3}\, \tfrac{1}{2} (1 - 3\cos^2\theta_{ij}) (3 I_{iz} I_{jz} - \mathbf{I}_i \cdot \mathbf{I}_j) \qquad [13]$$

where $r_{ij}$ is the internuclear distance and $\theta_{ij}$ is the
angle made by the internuclear vector with the
magnetic field direction; while, for the quadrupole
interaction

$$H_Q = \frac{3 e V_{zz} Q}{4I(2I-1)\hbar}\, \tfrac{1}{2} (3\cos^2\theta_{zz} - 1) (3 I_z^2 - I(I+1)) \qquad [14]$$

where $Q$ is the nuclear quadrupole moment, $V_{zz}$ is
the electric field gradient (assuming axial symmetry)
and $\theta_{zz}$ is the angle made by the principal axis of
that gradient with the magnetic field direction. For
protons in organic matter, the internuclear dipole 
interaction strength is on the order of 100 kHz, a 
similar strength being found for the quadrupole 
interaction of deuterons. However, in the liquid 
state, these orientation-dependent interactions fluc- 
tuate so rapidly that they are typically motionally 
averaged to zero. Nonetheless, their fluctuations do 
contribute to the relaxation process. 

Liquid-state NMR can result in exceptionally 
high-resolution (sub-Hz) spectra, if care is taken to 
adjust the magnetic field harmonics (shims) to 
produce a highly uniform Zeeman field across the 
sample. The last contribution of residual inhomo- 
geneities to line broadening can often be removed by 
gently spinning the sample about its axis at a rate of 
a few tens of hertz. 


The Evolution Domain, Multiple RF 
Pulses, and Multidimensional NMR 


Having seen the complexity of the spin Hamiltonian,
one may envisage experiments where the spin
coherences evolve in a much more complicated
manner. To this end, consider the case of a
molecular liquid two-spin (AX) system coupled via
the scalar spin-spin interaction. In first-order
perturbation theory, we may represent the simple
two-spin Hamiltonian (in the rotating frame of the
averaged Larmor frequency) as

$$\begin{aligned}
H_{\mathrm{rot}} &= -\sigma_1 \gamma B_0 I_{1z} - \sigma_2 \gamma B_0 I_{2z} + J I_{1z} I_{2z} \\
&= -\omega_1 I_{1z} - \omega_2 I_{2z} + J I_{1z} I_{2z} \qquad [15]
\end{aligned}$$
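We now write down the density matrix in
the rotating frame following a single 90° RF
pulse ($I_x$),

$$\begin{aligned}
\rho(t) &= \exp(i\omega_1 t I_{1z} + i\omega_2 t I_{2z} + iJ I_{1z} I_{2z} t) \exp\!\left(i\tfrac{\pi}{2} I_x\right) a_{\mathrm{eqbm}} (I_{1z} + I_{2z}) \exp\!\left(-i\tfrac{\pi}{2} I_x\right) \\
&\quad \times \exp(-i\omega_1 t I_{1z} - i\omega_2 t I_{2z} - iJ I_{1z} I_{2z} t) \\
&= \exp(i\omega_1 t I_{1z} + i\omega_2 t I_{2z} + iJ I_{1z} I_{2z} t)\, a_{\mathrm{eqbm}} (I_{1y} + I_{2y}) \\
&\quad \times \exp(-i\omega_1 t I_{1z} - i\omega_2 t I_{2z} - iJ I_{1z} I_{2z} t) \\
&= \exp(i\omega_1 t I_{1z} + i\omega_2 t I_{2z})\, a_{\mathrm{eqbm}} \left( (I_{1y} + I_{2y}) \cos(\tfrac{1}{2}Jt) + 2 (I_{1z} I_{2x} + I_{1x} I_{2z}) \sin(\tfrac{1}{2}Jt) \right) \\
&\quad \times \exp(-i\omega_1 t I_{1z} - i\omega_2 t I_{2z}) \\
&= a_{\mathrm{eqbm}} \big[ (I_{1y} \cos\omega_1 t + I_{2y} \cos\omega_2 t + I_{1x} \sin\omega_1 t + I_{2x} \sin\omega_2 t) \cos(\tfrac{1}{2}Jt) \\
&\qquad + 2 (I_{1z} I_{2x} \cos\omega_2 t - I_{1z} I_{2y} \sin\omega_2 t + I_{1x} I_{2z} \cos\omega_1 t - I_{1y} I_{2z} \sin\omega_1 t) \sin(\tfrac{1}{2}Jt) \big] \qquad [16]
\end{aligned}$$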


Detection in the rotating frame with $I_x + iI_y$ gives a
signal

$$S(t) \sim a_{\mathrm{eqbm}} \left( \exp(i\omega_1 t) + \exp(i\omega_2 t) \right) \cos(\tfrac{1}{2}Jt) \qquad [17]$$

Fourier transformation with respect to $t$ yields a
spectrum corresponding to two spectral lines at $\omega_1$
and $\omega_2$, each split into a doublet of two sidebands
separated by $J$.
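The doublet structure can be seen by Fourier transforming eqn [17] numerically. The following sketch uses assumed values throughout: $a_{\mathrm{eqbm}} = 1$, no relaxation damping, and shifts and coupling chosen to sit exactly on FFT bins so that the four lines are free of leakage.

```python
import numpy as np

n_points = 1024
dt = 1.0e-3                              # s, assumed sampling interval
df = 1.0 / (n_points * dt)               # Hz, FFT frequency resolution
f1, f2, j_hz = 100 * df, 200 * df, 8 * df   # illustrative shifts and coupling, Hz

t = np.arange(n_points) * dt
w1, w2, j = 2 * np.pi * f1, 2 * np.pi * f2, 2 * np.pi * j_hz   # rad/s

# Eqn [17] with a_eqbm = 1.
signal = (np.exp(1j * w1 * t) + np.exp(1j * w2 * t)) * np.cos(0.5 * j * t)

spectrum = np.abs(np.fft.fft(signal))
freqs = np.fft.fftfreq(n_points, dt)
peaks = np.sort(freqs[np.argsort(spectrum)[-4:]])   # four strongest lines, Hz
```

The four lines appear at $f_1 \pm J/2$ and $f_2 \pm J/2$ (in Hz), that is, two doublets each with splitting $J$.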

Notice that it is easier to follow the evolution of
the density matrix by simply writing down a time
sequence of behaviors under the influence of the
successive Hamiltonians. Where simultaneous terms
in the Hamiltonians commute, the order of their
operation may be set at will. Thus, the above
example becomes

$$\begin{aligned}
I_{1z} + I_{2z} &\xrightarrow{\;\frac{\pi}{2} I_x\;} I_{1y} + I_{2y} \\
&\xrightarrow{\;J I_{1z} I_{2z} t\;} (I_{1y} + I_{2y}) \cos(\tfrac{1}{2}Jt) + 2 (I_{1z} I_{2x} + I_{1x} I_{2z}) \sin(\tfrac{1}{2}Jt) \\
&\xrightarrow{\;\omega_1 t I_{1z} + \omega_2 t I_{2z}\;} (I_{1y} \cos\omega_1 t + I_{2y} \cos\omega_2 t + I_{1x} \sin\omega_1 t + I_{2x} \sin\omega_2 t) \cos(\tfrac{1}{2}Jt) \\
&\qquad + 2 (I_{1z} I_{2x} \cos\omega_2 t - I_{1z} I_{2y} \sin\omega_2 t + I_{1x} I_{2z} \cos\omega_1 t - I_{1y} I_{2z} \sin\omega_1 t) \sin(\tfrac{1}{2}Jt) \qquad [18]
\end{aligned}$$

Figure 3 The proton NMR spectrum of ethanol showing three major peaks, separated by chemical shift, each split into multiplets
arising from nearby protons via the scalar coupling.


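Now consider a two RF pulse scheme as shown in
Figure 4, each RF pulse being 90°. The evolution is

$$\begin{aligned}
I_{1z} + I_{2z} &\xrightarrow{\;\frac{\pi}{2} I_x\;} I_{1y} + I_{2y} \\
&\xrightarrow{\;\omega_1 t_1 I_{1z} + \omega_2 t_1 I_{2z} + J I_{1z} I_{2z} t_1\;} (I_{1y} \cos\omega_1 t_1 + I_{2y} \cos\omega_2 t_1 + I_{1x} \sin\omega_1 t_1 + I_{2x} \sin\omega_2 t_1) \cos(\tfrac{1}{2}J t_1) \\
&\qquad + 2 (I_{1z} I_{2x} \cos\omega_2 t_1 - I_{1z} I_{2y} \sin\omega_2 t_1 + I_{1x} I_{2z} \cos\omega_1 t_1 - I_{1y} I_{2z} \sin\omega_1 t_1) \sin(\tfrac{1}{2}J t_1) \\
&\xrightarrow{\;\frac{\pi}{2} I_x\;} (-I_{1z} \cos\omega_1 t_1 - I_{2z} \cos\omega_2 t_1 + I_{1x} \sin\omega_1 t_1 + I_{2x} \sin\omega_2 t_1) \cos(\tfrac{1}{2}J t_1) \\
&\qquad + 2 (I_{1y} I_{2x} \cos\omega_2 t_1 + I_{1y} I_{2z} \sin\omega_2 t_1 + I_{1x} I_{2y} \cos\omega_1 t_1 + I_{1z} I_{2y} \sin\omega_1 t_1) \sin(\tfrac{1}{2}J t_1) \\
&\xrightarrow{\;\omega_1 t_2 I_{1z} + \omega_2 t_2 I_{2z} + J I_{1z} I_{2z} t_2\;}
\end{aligned}$$

Keeping only observable magnetization

$$\begin{aligned}
&(I_{1x} \sin\omega_1 t_1 \cos\omega_1 t_2 + I_{2x} \sin\omega_2 t_1 \cos\omega_2 t_2) \cos(\tfrac{1}{2}J t_1) \cos(\tfrac{1}{2}J t_2) \\
&\quad + (I_{1x} \sin\omega_2 t_1 \sin\omega_1 t_2 + I_{2x} \sin\omega_1 t_1 \sin\omega_2 t_2) \sin(\tfrac{1}{2}J t_1) \sin(\tfrac{1}{2}J t_2) \qquad [19]
\end{aligned}$$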
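If the idealized experiment is performed with two
independent time dimensions $t_1$ and $t_2$, then detection
in the rotating frame over the $t_2$ period with
$I_x + iI_y$ gives a signal (restricting our attention to the
quadrant of positive frequencies)

$$\begin{aligned}
S(t_1, t_2) &\sim a_{\mathrm{eqbm}} \left( \exp(i\omega_1 t_1) \exp(i\omega_1 t_2) + \exp(i\omega_2 t_1) \exp(i\omega_2 t_2) \right) \cos(\tfrac{1}{2}J t_1) \cos(\tfrac{1}{2}J t_2) \\
&\quad + a_{\mathrm{eqbm}} \left( \exp(i\omega_2 t_1) \exp(i\omega_1 t_2) + \exp(i\omega_1 t_1) \exp(i\omega_2 t_2) \right) \sin(\tfrac{1}{2}J t_1) \sin(\tfrac{1}{2}J t_2) \qquad [20]
\end{aligned}$$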


When Fourier transformed in two dimensions with
respect to $t_1$ and $t_2$, the pattern shown in Figure 5
results. Remarkably, while the diagonal spectrum
is the same pair of doublets seen in the figure,
this two-dimensional spectrum contains off-diagonal
antiphase peaks for scalar-coupled sites where
magnetization transfer has occurred.
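The appearance of diagonal and antiphase cross peaks can be verified by transforming eqn [20] on a small grid. This is a sketch under stated assumptions: $a_{\mathrm{eqbm}} = 1$, no damping, and $\omega_1$, $\omega_2$ and $J/2$ placed exactly on FFT bins (the bin numbers `k1`, `k2`, `kj` are arbitrary illustrative choices).

```python
import numpy as np

n = 64                      # points per time dimension (illustrative)
k1, k2, kj = 10, 24, 2      # FFT bins for omega_1, omega_2 and J/2 (assumed)

idx = np.arange(n)
t1 = idx[:, None]           # evolution time index (axis 0)
t2 = idx[None, :]           # detection time index (axis 1)


def osc(k, t):
    """exp(i * omega * t) for a frequency sitting exactly on FFT bin k."""
    return np.exp(2j * np.pi * k * t / n)


def half_j_phase(t):
    """The phase (1/2) * J * t, with J/2 sitting on FFT bin kj."""
    return 2 * np.pi * kj * t / n


diagonal = (osc(k1, t1) * osc(k1, t2) + osc(k2, t1) * osc(k2, t2)) \
    * np.cos(half_j_phase(t1)) * np.cos(half_j_phase(t2))
cross = (osc(k2, t1) * osc(k1, t2) + osc(k1, t1) * osc(k2, t2)) \
    * np.sin(half_j_phase(t1)) * np.sin(half_j_phase(t2))
signal = diagonal + cross           # eqn [20] with a_eqbm = 1

spectrum = np.fft.fft2(signal)      # indexed as spectrum[F1 bin, F2 bin]
```

Each diagonal multiplet has four in-phase components of equal sign, while each cross-peak multiplet, centered at $(\omega_2, \omega_1)$ or $(\omega_1, \omega_2)$, has components of alternating sign. A modulus display, as in Figure 5, hides the sign but keeps the peak positions.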

The idea of performing NMR in two or more
dimensions was first proposed by Jean Jeener in
1971. The example outlined above, correlation
spectroscopy (COSY), is just one of an array of
coherence transfer experiments using multiple RF
pulse trains and time domain evolution of the spin
ensemble. Notice that in the COSY experiment, $t_1$ is
an evolution dimension during which no detection of
NMR signal occurs, while $t_2$ is the detection domain.
The effect of the evolution is indelibly imprinted in
the spin system density matrix allowing later recall of
vital information concerning the interactions present
in the spin Hamiltonian. The COSY experiment
allows one to determine which spins are coupled via
their molecular orbital electrons. Other multidimensional
methods that rely on dipole-dipole relaxation
effects, such as NOESY, determine which spin sites
have “through-space” proximity.

Figure 4 RF pulse scheme used for COSY experiment.

Figure 5 Schematic COSY (modulus) spectrum for an AX spin
system. Note that the (antiphase) off-diagonal peaks indicate
J-couplings between chemical-shift-separated spins.

The use of two- and higher-dimensional methods 
has allowed the NMR spectra of biological macro- 
molecules to be unraveled, with COSY methods used 
for spectral assignment of amino acid units, and 
NOESY methods used to determine any close proxi- 
mities of amino acids otherwise well separated in the 
sequence. Such distance information has allowed the 
reconstruction of protein conformations by NMR. 

The second RF pulse of Figure 4 also generates a 
state of the density matrix, $I_{1x} I_{2y}$, known as a double
quantum coherence, and, in the simple COSY 
experiment, lost to observable magnetization. Other 
RF pulse schemes can take advantage of this state, 
converting it via suitable “coherence pathways” into 
an observable. For a detailed summary of these 
various NMR phenomena, readers are referred to the 
book by Ernst et al. (1987).


Solid-State NMR


As with J couplings, dipolar interactions and 
quadrupole interactions (I > 1/2) are bilinear in 
the spin operators and can be used to generate 
various higher-order coherence pathways in NMR 
experiments. Unlike the simple spin-spin coupling, 
they have an angular dependence. In solids, these 
interactions may broaden the NMR resonance line 
by tens to hundreds of kilohertz. In the case where a 
probe nucleus is located at a known site in the 
material (often achieved by deuteron labeling), these 
Hamiltonian terms may contribute important infor- 
mation about structure, and especially orientational 
anisotropy. For example, the quadrupole interaction
for the spin-1 deuteron (see eqns [11] and [14])
depends as $P_2(\cos\theta_{zz}) = (1/2)(3\cos^2\theta_{zz} - 1)$ on the
angle between the external magnetic field and
the electric field gradient (generally associated with
the local molecular orbital or bond direction, and taken
here to be axially symmetric). Note that the first-order
contribution of the quadrupole interaction leads to an
unequal separation of the $m = 1, 0, -1$ Zeeman
energy levels, resulting in a doublet NMR spectrum,
for any particular orientation, $\theta_{zz}$. Such a unique
orientation might be found in a single crystal, or in a
nematic liquid crystalline state. For a polycrystalline
material, however, the NMR spectrum has a contribution
from all orientations, leading to a characteristic
powder pattern. The details of $^{2}$H spectral
distributions may be used to characterize the degree 
of orientational order in solids and soft, anisotropic 
matter. 

For $^{1}$H, $^{13}$C, and other spin-1/2 nuclei, dipolar
interactions (with a wide distribution of spin
spacings and internuclear vector orientations) may
severely broaden the NMR spectrum in the solid
state (see eqns [11] and [13]). Such interactions,
along with quadrupole interactions for nuclei with
$I > 1/2$, may be significantly reduced by modulating
the effective dipolar Hamiltonian at a rate faster
than its strength in frequency units. Two methods
are available, one (magic angle spinning or MAS)
relying on the angular terms in eqns [13] and [14],
and the other (multiple pulse line narrowing) on the
spin terms. The MAS technique relies on spinning
the sample rapidly about an axis oriented at 54.7°
to the magnetic field, such that the average value of
$P_2(\cos\theta_{ij})$ becomes its projection along this spinning
axis, while the projection of the spinning axis
residual is $P_2(\cos 54.7°) \approx 0$. Multiple pulse methods
rely on a successive reorientation of the spin
system such that the effective dipolar Hamiltonian
that results from the application of the nested
evolution operators is rendered close to zero.
In practice, MAS techniques work best with $^{13}$C
NMR where the moderate $^{1}$H-$^{13}$C dipolar interactions
may be removed with achievable spinning speeds (a
few tens of kilohertz). Furthermore, the larger proton
magnetization ($\gamma_H / \gamma_C \approx 4$) can be transferred to the
$^{13}$C nuclei via Hartmann-Hahn cross-polarization, thus
significantly enhancing sensitivity. Such methodology
is referred to as CPMAS NMR.
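The averaging that underlies MAS can be checked numerically. The sketch below computes the magic angle as $\arccos(1/\sqrt{3})$ and averages $P_2(\cos\theta)$ over one rotor period for an internuclear vector inclined at an angle `chi` to the spinning axis. The 35° inclination and the function names are illustrative assumptions; the factorization of the average into $P_2(\cos\beta)\,P_2(\cos\chi)$ is the Legendre addition theorem.

```python
import math
import numpy as np


def p2(x):
    """Second Legendre polynomial P2(x) = (3x^2 - 1)/2."""
    return 0.5 * (3.0 * x * x - 1.0)


magic_angle_rad = math.acos(1.0 / math.sqrt(3.0))
magic_angle_deg = math.degrees(magic_angle_rad)


def spinning_average_p2(beta, chi, n_steps=360):
    """Average of P2(cos theta) over one rotation about the spinning axis.

    beta: angle between spinning axis and magnetic field (rad).
    chi:  angle between internuclear vector and spinning axis (rad).
    """
    phi = 2 * np.pi * np.arange(n_steps) / n_steps
    cos_theta = (math.cos(beta) * math.cos(chi)
                 + math.sin(beta) * math.sin(chi) * np.cos(phi))
    return float(np.mean(p2(cos_theta)))


chi = math.radians(35.0)                                  # assumed orientation
residual = spinning_average_p2(magic_angle_rad, chi)      # spinning at the magic angle
off_axis = spinning_average_p2(math.radians(90.0), chi)   # spinning at 90 degrees
```

At the magic angle the averaged angular factor vanishes for every orientation `chi`, whereas spinning about any other axis only scales it by $P_2(\cos\beta)$.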

The real art of solid-state NMR is in removing the 
unwanted dipolar or quadrupolar interactions, but 
leaving specific interactions of interest. This may be 
achieved by including in the MAS experiment, 
specific combinations of pulses which recouple 
selected spins. Some of the most sophisticated 
experiments in modern NMR are to be found in 
this domain of application. 


Conclusion 


NMR provides exceptional structural information 
concerning molecules, biomolecules as well as 
molecular assemblies, liquid crystals, soft solids, 
and solids. In addition, the method provides unique 
information concerning molecular dynamics, 
through both relaxation methods and the direct 
measurement of diffusion or flow. One spectacular 
application of NMR concerns its use in imaging,
achieved by giving the Larmor frequency a spatial
tag through the use of deliberately inhomogeneous
magnetic fields. This topic is covered in the article
on Magnetic Resonance Imaging.


See also: Magnetic Resonance Imaging.


Several fields of mathematics have been closely
associated with physics: this has always been the case
for the theory of differential equations. In the early
twentieth century, with the advent of general
relativity and quantum mechanics, topics such as
differential and Riemannian geometry, operator
algebras, and functional analysis, or group theory
also developed a close relation to physics. In the
1990s, mostly through the influence of string theory,
algebraic geometry also began to play a major role
in this interaction. Recent years have seen an
increasing number of results suggesting that number
theory also is beginning to play an essential part on
the scene of contemporary theoretical and mathematical
physics. Conversely, ideas from physics,
mostly from quantum field theory and string theory,
have started to influence work in number theory.

In describing significant occurrences of number 
theory in physics, we will, on the one hand, restrict 
our attention to quantum physics, while, on the other, 
we will assume a somewhat extensive definition of 
number theory that will allow us to include arithmetic 
algebraic geometry. The territory is vast and an 
extensive treatment would go beyond the size limits 
imposed by the encyclopedia. The choice of topics 
represented here inevitably reflects the limited knowl- 
edge, particular interests, and bias of the author. Very 
useful references, collecting a lot of material on number 
theory and physics, are the proceedings of the Les 
Houches conference in 2003 (Beilinson and Manin 
1986), as well as the two volumes of a previous Les 
Houches conference on number theory and physics, 
which took place in 1989, published by Springer in 
1990 and 1992. A number theory and physics database 
is presently maintained online by M R Watkins. 


In the following, we have organized the material 
by topics in number theory that have so far made an 
appearance in physics, and for each we briefly 
describe the relevant context and results. This 
singles out many themes. We first discuss a class of 
functions that occur in physics and their special 
values that are of great number-theoretic impor- 
tance. This includes the dilogarithm, the polyloga- 
rithms and multiple polylogarithms, and the 
multiple zeta values. We also discuss the most 
important symmetry groups of number theory, the 
Galois groups, and occurrences in physics of some 
forms of Galois theory. We then discuss how 
techniques from the arithmetic geometry of alge- 
braic varieties, especially Arakelov geometry, play a 
role in string theory. Finally, we discuss briefly the 
theory of motives and outline its possible relation to 
quantum physics. From the physics point of view, it 
seems that the most promising directions in which 
number-theoretic tools have come to play a crucial 
role are to be found mostly in the realm of rational 
conformal field theories and of noncommutative 
geometry, as well as in certain aspects of string 
theory. 

Among the topics that are very relevant to this 
theme, but that will not be touched upon in this 
article, there are important subjects such as the 
theory of “arithmetic quantum chaos,” the use of 
methods of random matrix theory applied to the 
study of zeros of zeta functions, or mirror symmetry 
and its connection to modular forms. The interested 
reader can find such topics treated in other articles 
of this encyclopedia and in the references mentioned 
above (see Quantum Ergodicity and Mixing of 
Eigenfunctions; Random Matrix Theory in Physics; 
Mirror Symmetry: a Geometric Survey). 


Dilogarithm, Multiple Polylogarithms, 
Multiple Zeta Values 


The dilogarithm is defined as

$$\mathrm{Li}_2(z) = \int_z^0 \frac{\log(1-t)}{t}\,dt = \sum_{n=1}^{\infty} \frac{z^n}{n^2}$$

It satisfies the functional equation $\mathrm{Li}_2(z) + \mathrm{Li}_2(1-z) = \mathrm{Li}_2(1) - \log(z)\log(1-z)$, where $\mathrm{Li}_2(1) = \zeta(2)$, for $\zeta(s)$ the Riemann zeta function. A variant is given by the Rogers dilogarithm $L(x) = \mathrm{Li}_2(x) + \tfrac{1}{2}\log(x)\log(1-x)$. For more details, see Zagier’s paper (Julia et al. 2005, vol. II).
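
The functional equation can be checked numerically. The following is a minimal sketch, assuming a truncated power series is accurate enough for real arguments in (0, 1); the helper names `li2` and `rogers_L` and the test point `z = 0.3` are illustrative choices, not taken from the references.

```python
import math

def li2(z, terms=500):
    """Partial sum of the series sum_{n>=1} z^n / n^2 (used here only for 0 < z < 1)."""
    return sum(z**n / n**2 for n in range(1, terms + 1))

def rogers_L(x):
    """Rogers dilogarithm L(x) = Li2(x) + (1/2) log(x) log(1 - x), for 0 < x < 1."""
    return li2(x) + 0.5 * math.log(x) * math.log(1.0 - x)

zeta2 = math.pi**2 / 6          # Li2(1) = zeta(2)
z = 0.3                         # arbitrary test point in (0, 1)

# Functional equation: Li2(z) + Li2(1 - z) = Li2(1) - log(z) log(1 - z)
lhs = li2(z) + li2(1.0 - z)
rhs = zeta2 - math.log(z) * math.log(1.0 - z)
residual = abs(lhs - rhs)

# The same identity in terms of the Rogers dilogarithm: L(x) + L(1 - x) = L(1)
rogers_residual = abs(rogers_L(z) + rogers_L(1.0 - z) - zeta2)

print(residual, rogers_residual)
```

Both residuals should be at the level of floating-point rounding. The Rogers form shows why $L$ is often the more convenient normalization: the logarithmic correction term is absorbed into the function.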

The polylogarithms are similarly defined by the series $\mathrm{Li}_s(z) = \sum_{n \ge 1} z^n/n^s$. In quantum electrodynamics, there are corrections to the value of the gyromagnetic ratio, in powers of the fine structure constant. The correction terms that are known exactly involve special values of the zeta function such as $\zeta(3)$, $\zeta(5)$ and values of polylogarithms such as $\mathrm{Li}_4(1/2)$. The series defining $\mathrm{Li}_s(z)$ converges absolutely for all $s \in \mathbb{C}$ and $|z| < 1$ and has an analytic continuation to $z \in \mathbb{C} \setminus [1,\infty)$. The Fermi–Dirac and Bose–Einstein distributions are expressed in terms of the polylogarithm function as

$$\int_0^\infty \frac{x^s}{e^{x-\mu} \pm 1}\,dx = \mp\,\Gamma(s+1)\,\mathrm{Li}_{1+s}(\mp e^{\mu})$$

where the upper signs give the Fermi–Dirac case and the lower signs the Bose–Einstein case.
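
As a numerical sanity check, the sketch below compares both integrals with their polylogarithm closed forms, using the standard sign conventions: $-\Gamma(s+1)\mathrm{Li}_{s+1}(-e^{\mu})$ for Fermi–Dirac and $+\Gamma(s+1)\mathrm{Li}_{s+1}(e^{\mu})$ for Bose–Einstein. The values `s = 2`, `mu = -0.5`, the cutoff at `x = 60`, and the helper names are assumptions made for illustration only.

```python
import math

def polylog(s, z, terms=400):
    """Partial sum of Li_s(z) = sum_{n>=1} z^n / n^s, valid for |z| < 1."""
    return sum(z**n / n**s for n in range(1, terms + 1))

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

s, mu = 2, -0.5   # illustrative exponent and chemical potential (mu < 0 so |e^mu| < 1)

# Left-hand sides: the integrals, truncated at x = 60 where the integrand is negligible
fermi_integral = simpson(lambda x: x**s / (math.exp(x - mu) + 1), 0.0, 60.0)
bose_integral = simpson(lambda x: x**s / (math.exp(x - mu) - 1), 0.0, 60.0)

# Right-hand sides: polylogarithm closed forms
fermi_closed = -math.gamma(s + 1) * polylog(s + 1, -math.exp(mu))
bose_closed = math.gamma(s + 1) * polylog(s + 1, math.exp(mu))

print(fermi_integral, fermi_closed)
print(bose_integral, bose_closed)
```

The series representation is used only inside the unit disk; for $\mu \ge 0$ one would need the analytic continuation rather than the partial sum.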


The multiple polylogarithms are functions defined 
by the expressions 


$$\mathrm{Li}_{s_1,\ldots,s_r}(z_1, z_2, \ldots, z_r) = \sum_{n_1 > n_2 > \cdots > n_r > 0} \frac{z_1^{n_1} z_2^{n_2} \cdots z_r^{n_r}}{n_1^{s_1} n_2^{s_2} \cdots n_r^{s_r}} \qquad [1]$$

By analytic continuation, the functions $\mathrm{Li}_{s_1,\ldots,s_r}(z_1, z_2, \ldots, z_r)$ are defined for all complex $s_i$ and for $z_i$ in the complement of the cut $[1,\infty)$ in the complex plane. Multiple zeta values of weight $k$ and depth $r$ are given by the expressions


$$\zeta(k_1, \ldots, k_r) = \sum_{n_1 > n_2 > \cdots > n_r > 0} \frac{1}{n_1^{k_1} \cdots n_r^{k_r}} \qquad [2]$$

with $k_i \in \mathbb{N}$, $k_1 \ge 2$, and $k = k_1 + \cdots + k_r$. These satisfy many combinatorial identities and nontrivial relations over $\mathbb{Q}$. For an informative overview on the subject, see Cartier (2002). Notice that, for the sums in [1] and [2], a different summation convention can also be found in the literature.
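
The simplest nontrivial relation over $\mathbb{Q}$ is Euler's identity $\zeta(2,1) = \zeta(3)$. The sketch below evaluates truncated multiple zeta values with the summation convention $n_1 > n_2 > \cdots > n_r > 0$ and compares the two sides; the function name `mzv` and the cutoff `N` are illustrative assumptions, and the truncated depth-2 sum converges slowly (the error is of order $(\log N)/N$).

```python
import math

def mzv(ks, N):
    """Truncated multiple zeta value zeta(k_1, ..., k_r), summing over
    N >= n_1 > n_2 > ... > n_r > 0.

    Works from the innermost index outwards: after processing k_j, inner[n]
    holds the sum over n > n_j > ... > n_r > 0 of prod_i n_i^(-k_i).
    """
    inner = [1.0] * (N + 2)            # empty product
    for k in reversed(ks):
        outer = [0.0] * (N + 2)
        running = 0.0
        for n in range(1, N + 2):
            outer[n] = running          # sum over all m < n
            if n <= N:
                running += inner[n] / n**k
        inner = outer
    return inner[N + 1]

N = 200000
zeta_21 = mzv((2, 1), N)   # weight 3, depth 2
zeta_3 = mzv((3,), N)      # weight 3, depth 1
zeta_2 = mzv((2,), N)

print(zeta_21, zeta_3, abs(zeta_21 - zeta_3))
```

With the opposite convention $0 < n_1 < \cdots < n_r$ the same number would be written $\zeta(1,2)$, which is the kind of discrepancy the remark on summation conventions refers to.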


Conformal Field Theories and the Dilogarithm 


There is a relation between the torsion elements in the algebraic K-theory group $K_3(\mathbb{C})$ and rational conformally invariant quantum field theories in two dimensions (see Nahm (2005)). There is, in fact, a map, given by the dilogarithm, from torsion elements in the Bloch group (closely related to the algebraic K-theory) to the central charges and scaling dimensions of the conformal field theories.

This correspondence arises by considering sums of the form

$$\sum_{m \in \mathbb{N}^r} \frac{q^{Q(m)}}{(q)_m} \qquad [3]$$

where $(q)_m = (q)_{m_1} \cdots (q)_{m_r}$, $(q)_{m_i} = (1-q)(1-q^2)\cdots(1-q^{m_i})$, and $Q(m) = m^t A m/2 + bm + h$ has rational coefficients. Such sums are naturally obtained from considerations involving the partition function of a bosonic rational conformal field theory (CFT). In particular, [3] can define a modular function only if all the solutions of the equation

$$\sum_j A_{ij} \log(x_j) = \log(1 - x_i) \qquad [4]$$

determine elements of finite order in an extension $\widehat{B}(\mathbb{C})$ of the Bloch group, which accounts for the fact that the logarithm is multivalued. The Rogers dilogarithm gives a natural group homomorphism $(2\pi i)^2 L: \widehat{B}(\mathbb{C}) \to \mathbb{C}/\mathbb{Z}$, which takes values in $\mathbb{Q}/\mathbb{Z}$ on the torsion elements. These values give the conformal dimensions of the fields in the theory.
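
A standard one-variable illustration, not worked out in the text above, is the case $r = 1$, $A = (2)$, where equation [4] becomes $x^2 = 1 - x$. The sketch below solves it and evaluates the Rogers dilogarithm at the solution, normalized by $L(1) = \pi^2/6$; the helper names are illustrative and the truncated series is an assumption about sufficient accuracy.

```python
import math

def li2(z, terms=400):
    """Partial sum of the dilogarithm series, used for 0 < z < 1."""
    return sum(z**n / n**2 for n in range(1, terms + 1))

def rogers_L(x):
    """Rogers dilogarithm L(x) = Li2(x) + (1/2) log(x) log(1 - x)."""
    return li2(x) + 0.5 * math.log(x) * math.log(1.0 - x)

# r = 1, A = (2): equation [4] reads 2 log(x) = log(1 - x), i.e. x^2 = 1 - x
x = (math.sqrt(5.0) - 1.0) / 2.0
eq4_residual = abs(2.0 * math.log(x) - math.log(1.0 - x))

L1 = math.pi**2 / 6              # L(1) = zeta(2)
ratio_x = rogers_L(x) / L1       # numerically 3/5
ratio_1mx = rogers_L(1.0 - x) / L1   # numerically 2/5

print(eq4_residual, ratio_x, ratio_1mx)
```

The normalized values are rational, as expected for a torsion element; the value 2/5 agrees with the effective central charge usually quoted for the Lee–Yang model, whose character is a sum of the form [3] with $A = 2$.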


Feynman Graphs 


Multiple zeta values appear in perturbative quantum 
field theory. D Kreimer (2000) developed a connec- 
tion between knot theory and a class of transcen- 
dental numbers, such as multiple zeta values, 
obtained by quantum field-theoretic calculations as 
counterterms generated by corresponding Feynman 
graphs. Broadhurst and Kreimer (1997) identified 
Feynman diagrams with up to nine loops whose 
corresponding counterterms give multiple zeta 
values up to weight 15. Recently, Kreimer showed 
some deep analogies between residues of quantum 
fields and variations of mixed Hodge-Tate struc- 
tures associated to polylogarithms. 

Testing predictions about the standard model of 
elementary particles, in the hope of detecting new 
physics, requires developing effective computational 
methods handling the huge number of terms involved 
in any such calculation, that is, efficient algorithms for 
the expansion of higher transcendental functions to a 
very high order. The interesting fact is that abstract 
number-theoretic objects, such as multiple zeta values 
and multiple polylogarithms, appear naturally in this 
context (cf., e.g., Moch et al. (2002)). The explicit 
recursive algorithms are based on Hopf algebras and 
produce expansions of nested finite or infinite sums 
involving ratios of gamma functions and Z-sums 
(Euler—Zagier sums), which naturally generalize multi- 
ple polylogarithms and multiple zeta values. Such 
sums typically arise in the calculation of multiscale 
multiloop integrals. The algorithms are designed to 
recursively reduce the Z-sums involved to simpler ones 
with lower weight or depth. 


Galois Theory 


Given a number field $K$, which is an algebraic extension of $\mathbb{Q}$ of some degree $[K:\mathbb{Q}] = n$, there is an associated fundamental symmetry group, given by the absolute Galois group $\mathrm{Gal}(\bar{K}/K)$, where $\bar{K}$ is an algebraic closure of $K$. Even in the case of $\mathbb{Q}$, the absolute Galois group $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$ is a very complicated object, far from being fully understood.

One can consider an easier symmetry group, which is the abelianization of the absolute Galois group. This corresponds to considering the field $K^{ab}$, the “maximal abelian extension” of $K$, which has the property that

$$\mathrm{Gal}(K^{ab}/K) = \mathrm{Gal}(\bar{K}/K)^{ab}$$

The Kronecker–Weber theorem shows that for $K = \mathbb{Q}$ the maximal abelian extension can be identified with the cyclotomic field (generated by all roots of unity), $\mathbb{Q}^{ab} = \mathbb{Q}^{cycl}$, and the Galois group is identified with $\mathrm{Gal}(\mathbb{Q}^{ab}/\mathbb{Q}) \cong \hat{\mathbb{Z}}^*$, where $\hat{\mathbb{Z}}^* = \mathbb{A}_f^*/\mathbb{Q}_+^*$. In general, for other number fields, one has the “class field theory isomorphism”

$$\theta: \mathrm{Gal}(K^{ab}/K) \xrightarrow{\ \simeq\ } C_K/D_K$$

where $C_K = \mathbb{A}_K^*/K^*$ is the group of idele classes and $D_K$ the connected component of the identity in $C_K$. In general, however, one does not have an explicit description of the generators of the maximal abelian extension $K^{ab}$ and the action of the Galois group. This is the content of the explicit class field theory problem, Hilbert’s 12th problem. In addition to the Kronecker–Weber case, a complete answer is known in the case of imaginary quadratic fields $K = \mathbb{Q}(\sqrt{-d})$, with $d > 1$ a positive integer. In this case generators are obtained by evaluating modular functions at a point $\tau$ in the upper half-plane such that $K = \mathbb{Q}(\tau)$, and the Galois action is described explicitly through the group of automorphisms of the modular field, through Shimura reciprocity. For a survey of the explicit class field theory problem and the case of imaginary quadratic fields, see Stevenhagen (2001).
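
A small concrete instance of the Kronecker–Weber theorem: the abelian extension $\mathbb{Q}(\sqrt{5})$ sits inside the cyclotomic field $\mathbb{Q}(\zeta_5)$, since $\sqrt{5} = \zeta_5 - \zeta_5^2 - \zeta_5^3 + \zeta_5^4$ (a quadratic Gauss sum). The sketch below checks this numerically and applies the Galois action $\zeta_5 \mapsto \zeta_5^a$ for the units $a \in (\mathbb{Z}/5)^*$. The function name `sigma` and the dictionary encoding of cyclotomic elements are illustrative assumptions.

```python
import cmath
import math

N = 5
zeta = cmath.exp(2j * math.pi / N)                   # primitive 5th root of unity
units = [a for a in range(1, N) if math.gcd(a, N) == 1]   # (Z/5)^* = {1, 2, 3, 4}

def sigma(a, coeffs):
    """Apply the automorphism zeta -> zeta^a to the element sum_k coeffs[k] * zeta^k."""
    return sum(c * zeta ** ((a * k) % N) for k, c in coeffs.items())

# Gauss sum: coefficients are the Legendre symbols (k/5)
gauss = {1: 1, 2: -1, 3: -1, 4: 1}

images = {a: sigma(a, gauss) for a in units}
for a in units:
    print(a, images[a])
```

The automorphisms with $a \in \{1, 4\}$ fix $\sqrt{5}$ while $a \in \{2, 3\}$ send it to $-\sqrt{5}$, so $\mathrm{Gal}(\mathbb{Q}(\sqrt{5})/\mathbb{Q})$ appears as the quotient of $(\mathbb{Z}/5)^*$ by the subgroup of squares.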

As we mentioned above, understanding the structure of the absolute Galois group $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$ is a fundamental question in number theory. Grothendieck described, in his famous proposal “Esquisse d’un programme,” how to obtain an action of $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$ on an essentially combinatorial object, the set of “dessins d’enfants.” These are connected graphs (on a surface) such that the complement of the graph is a union of open cells and the vertices have two different markings, with the property that adjacent vertices have opposite markings. Such objects arise by considering the projective line $\mathbb{P}^1$ minus three points. Any finite cover of $\mathbb{P}^1$ branched only over $\{0, 1, \infty\}$ gives an algebraic curve defined over $\bar{\mathbb{Q}}$. The dessin is the inverse image under the covering map of the segment $[0,1]$ in $\mathbb{P}^1$. The absolute Galois group $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$ acts on the data of the curve and the covering map, hence on the set of dessins. A theorem of Belyi shows that, in fact, all algebraic curves defined over $\bar{\mathbb{Q}}$ are obtained as coverings of the projective line ramified only over the points $\{0, 1, \infty\}$. This has the effect of realizing the absolute Galois group as a subgroup of outer automorphisms of the profinite fundamental group of the projective line minus three points. For a general reference on the subject, see Schneps (1994).

A different type of Galois symmetry of great 
arithmetic significance is “motivic” Galois theory. 
This will be discussed later in the section dedicated 
to motives, where we discuss a surprising occurrence 
in the context of perturbative quantum field theory 
and renormalization. 


Quantum Statistical Mechanics and Class 
Field Theory 


In quantum statistical mechanics, one considers an algebra of observables, which is a unital $C^*$-algebra $\mathcal{A}$ with a time evolution $\sigma_t$. States are given by linear functionals $\varphi: \mathcal{A} \to \mathbb{C}$ satisfying $\varphi(1) = 1$ and positivity $\varphi(x^*x) \ge 0$. Equilibrium states $\varphi$ at inverse temperature $\beta$ satisfy the Kubo–Martin–Schwinger (KMS) condition, namely, for all $x, y \in \mathcal{A}$ there exists a bounded holomorphic function $F_{x,y}(z)$ on the strip $0 < \Im(z) < \beta$, which extends continuously to the boundary, such that for all $t \in \mathbb{R}$

$$F_{x,y}(t) = \varphi(x\,\sigma_t(y))$$

and

$$F_{x,y}(t + i\beta) = \varphi(\sigma_t(y)\,x) \qquad [5]$$
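
In finite dimensions the KMS condition is satisfied by the Gibbs state $\varphi(a) = \mathrm{Tr}(a\,e^{-\beta H})/\mathrm{Tr}(e^{-\beta H})$ with $\sigma_t(a) = e^{itH} a\, e^{-itH}$. The sketch below checks the second relation in [5] for $3 \times 3$ matrices; the diagonal Hamiltonian, the matrices `x` and `y`, and the values of `beta` and `t` are arbitrary illustrative choices.

```python
import cmath
import math

E = [0.0, 0.7, 1.9]     # eigenvalues of a diagonal Hamiltonian H (hypothetical values)
beta = 1.3              # inverse temperature
n = len(E)
Z = sum(math.exp(-beta * e) for e in E)   # partition function Tr e^{-beta H}

x = [[1, 2j, 0.5], [0.3, -1, 1j], [2, 0.1, 0.4]]
y = [[0.2, 1, -1j], [1j, 0.5, 2], [-0.7, 0.3j, 1]]

def matmul(a, b):
    """Plain matrix product of two n x n matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def sigma(z, a):
    """Time evolution e^{izH} a e^{-izH}, extended to complex z (H is diagonal)."""
    return [[cmath.exp(1j * z * (E[j] - E[k])) * a[j][k] for k in range(n)] for j in range(n)]

def phi(a):
    """Gibbs state phi(a) = Tr(a e^{-beta H}) / Z."""
    return sum(a[j][j] * math.exp(-beta * E[j]) for j in range(n)) / Z

def F(z):
    """F_{x,y}(z) = phi(x sigma_z(y)), entire in z in this finite-dimensional case."""
    return phi(matmul(x, sigma(z, y)))

t = 0.4
kms_lhs = F(t + 1j * beta)            # F_{x,y}(t + i beta)
kms_rhs = phi(matmul(sigma(t, y), x)) # phi(sigma_t(y) x)
print(kms_lhs, kms_rhs)
```

In infinite-dimensional systems such as those discussed next, the trace need not exist and the KMS condition serves as the definition of equilibrium, which is what allows phase transitions and several extremal states at the same temperature.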


Cases of number-theoretic interest arise when one considers the noncommutative space of commensurability classes of $\mathbb{Q}$-lattices up to scaling as algebra of observables, with a natural time evolution determined by the covolume, as shown in the paper Quantum Statistical Mechanics of Q-Lattices of Connes–Marcolli (Julia et al. 2005, vol. I). A $\mathbb{Q}$-lattice in $\mathbb{R}^n$ consists of a pair $(\Lambda, \phi)$ of a lattice $\Lambda \subset \mathbb{R}^n$ together with a homomorphism of abelian groups $\phi: \mathbb{Q}^n/\mathbb{Z}^n \to \mathbb{Q}\Lambda/\Lambda$. Two $\mathbb{Q}$-lattices are commensurable, $(\Lambda_1, \phi_1) \sim (\Lambda_2, \phi_2)$, iff $\mathbb{Q}\Lambda_1 = \mathbb{Q}\Lambda_2$ and $\phi_1 = \phi_2 \bmod \Lambda_1 + \Lambda_2$.


The Bost–Connes system

The quantum statistical mechanical system considered by Bost and Connes (1995) corresponds to the case of one-dimensional $\mathbb{Q}$-lattices. The partition function of the system is the Riemann zeta function $\zeta(\beta)$. The system has spontaneous symmetry breaking at $\beta = 1$, with a single KMS state for all $0 < \beta \le 1$. For $\beta > 1$, the extremal equilibrium states are parametrized by the embeddings of $\mathbb{Q}^{cycl}$ in $\mathbb{C}$ with a free transitive action of the idele class group $C_{\mathbb{Q}}/D_{\mathbb{Q}} = \hat{\mathbb{Z}}^*$. At zero temperature, the evaluation of KMS$_\infty$ states on elements of a rational subalgebra intertwines the action of $\hat{\mathbb{Z}}^*$ by automorphisms of $(\mathcal{A}, \sigma_t)$ with the action of $\mathrm{Gal}(\mathbb{Q}^{cycl}/\mathbb{Q})$ on the values of the states. This recovers the explicit class field theory of $\mathbb{Q}$ from a physical perspective.
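
To see why the partition function is $\zeta(\beta)$, recall the standard fact (not spelled out above, and taken here as an assumption) that in the usual representation on $\ell^2(\mathbb{N})$ the Hamiltonian acts by $H e_n = \log(n)\, e_n$, so $\mathrm{Tr}(e^{-\beta H}) = \sum_n n^{-\beta}$. The sketch below compares a truncated trace with the Euler product over primes at $\beta = 4$, where $\zeta(4) = \pi^4/90$; the cutoff and helper names are illustrative.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes returning all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            for m in range(p * p, n + 1, p):
                sieve[m] = False
    return [p for p, is_prime in enumerate(sieve) if is_prime]

beta = 4.0
N = 2000    # truncation of both the spectrum and the set of primes

# Tr e^{-beta H} with eigenvalues log(n), truncated to n <= N
trace = sum(math.exp(-beta * math.log(n)) for n in range(1, N + 1))

# Euler product: one factor per prime, as for independent bosonic modes
euler = 1.0
for p in primes_up_to(N):
    euler *= 1.0 / (1.0 - p ** (-beta))

exact = math.pi**4 / 90
print(trace, euler, exact)
```

The Euler product factorization reflects the tensor product structure of the system over the primes; both truncations converge quickly at $\beta = 4$ but diverge as $\beta \to 1$, where the phase transition occurs.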


Noncommutative space of adele classes

The algebra $\mathcal{A}$ of the Bost–Connes system is the noncommutative algebra of functions $f(r,\rho)$, for $\rho \in \hat{\mathbb{Z}}$ and $r \in \mathbb{Q}_+^*$ such that $r\rho \in \hat{\mathbb{Z}}$, with the convolution product

$$f_1 * f_2(r,\rho) = \sum_{s \in \mathbb{Q}_+^*:\ s\rho \in \hat{\mathbb{Z}}} f_1(rs^{-1}, s\rho)\, f_2(s,\rho) \qquad [6]$$

and the adjoint $f^*(r,\rho) = \overline{f(r^{-1}, r\rho)}$. According to the general philosophy of Connes-style noncommutative geometry, it is the algebra of coordinates of the noncommutative space defined by the “bad quotient” $\mathrm{GL}_1(\mathbb{Q}) \backslash (\mathbb{A}_f \times \{\pm 1\})$, a noncommutative version of the zero-dimensional Shimura variety $\mathrm{Sh}(\mathrm{GL}_1, \{\pm 1\}) = \mathrm{GL}_1(\mathbb{Q}) \backslash (\mathrm{GL}_1(\mathbb{A}_f) \times \{\pm 1\})$. Its “dual system” (in the sense of Connes’s duality of type III and type II factors) is obtained by taking the crossed product by the time evolution. It gives the algebra of coordinates of the noncommutative space defined by the quotient $\mathbb{A}/\mathbb{Q}^*$. This is the noncommutative space of “adele classes” used by Connes in his spectral realization of the zeros of the Riemann zeta function.


The GL$_2$-system

A generalization of the Bost–Connes system was introduced by Connes and Marcolli in the paper Quantum Statistical Mechanics of Q-Lattices (Julia et al. 2005). This corresponds to the case of two-dimensional $\mathbb{Q}$-lattices. The partition function is the product $\zeta(\beta)\zeta(\beta - 1)$. The system in this case has two phase transitions, with no KMS states for $\beta < 1$. For $\beta > 2$, the extremal KMS states are parametrized by the invertible $\mathbb{Q}$-lattices, namely, those for which $\phi$ is an isomorphism. The algebra $\mathcal{A}$ has an arithmetic structure given by a rational algebra of unbounded multipliers. This rational algebra contains modular functions and Hecke operators. At zero temperature, extremal KMS states can be evaluated on these multipliers. Symmetries of $(\mathcal{A}, \sigma_t)$ are realized in part by endomorphisms (as in the theory of superselection sectors) and the symmetry group acting on low-temperature KMS states is the group of automorphisms of the modular field $\mathrm{GL}_2(\mathbb{A}_f)/\mathbb{Q}^*$. For a generic set of extremal KMS$_\infty$ states, evaluation at the rational algebra intertwines this action with the action on the values of an embedding of the modular field as a subfield of $\mathbb{C}$.

The complex multiplication system

In the case of an imaginary quadratic field $K = \mathbb{Q}(\tau)$, an analogous construction is possible. A one-dimensional $K$-lattice is a pair $(\Lambda, \phi)$ of a finitely generated $O$-submodule $\Lambda$ of $\mathbb{C}$, with $\Lambda K = K$, and a homomorphism of $O$-modules $\phi: K/O \to K\Lambda/\Lambda$. Two $K$-lattices are commensurable iff $K\Lambda_1 = K\Lambda_2$ and $\phi_1 = \phi_2 \bmod \Lambda_1 + \Lambda_2$. Connes et al. (Preprint 2005) constructed a quantum statistical mechanical system describing the noncommutative space of commensurability classes of one-dimensional $K$-lattices up to scale. The partition function is the Dedekind zeta function $\zeta_K(\beta)$. The system has a phase transition at $\beta = 1$ with a unique KMS state for higher temperatures and extremal KMS states parametrized by the invertible $K$-lattices at lower temperatures. There is a rational subalgebra induced by the rational structure of the GL$_2$-system (one-dimensional $K$-lattices are also two-dimensional $\mathbb{Q}$-lattices with compatible notions of commensurability). The symmetries of the system are given by the idele class group $\mathbb{A}_{K,f}^*/K^*$. The action is partly realized by endomorphisms corresponding to the possible presence of a nontrivial class group (for class number $> 1$). The values of extremal KMS$_\infty$ states on the rational subalgebra intertwine the action of the idele class group with the Galois action on the values. This fully recovers the explicit class field theory for imaginary quadratic fields.


Conformal Field Theory and the Absolute 
Galois Group 


Moore and Seiberg considered data associated to any 
rational conformal field theory, consisting of matrices, 
obtained as monodromies of some holomorphic multi- 
valued functions on the relevant moduli spaces, 
satisfying polynomial equations. Under reasonable 
hypotheses, the coefficients of the Moore-Seiberg 
matrices are algebraic numbers. This allows for the 
presence of interesting arithmetic phenomena. Through 
the Chern—Simons/Wess—Zumino—Witten correspon- 
dence, it is possible to construct three-dimensional 
topological field theories from solutions to the Moore- 
Seiberg equations. 

On the arithmetic side, Grothendieck proposed in his “Esquisse d’un programme” the existence of a Teichmüller tower given by the moduli spaces $\mathcal{M}_{g,n}$ of Riemann surfaces of arbitrary genus $g$ and number of marked points $n$, with maps defined by operations such as cutting and pasting of surfaces and forgetting marked points, all encoded in a family of fundamental groupoids. He conjectured that the whole tower can be reconstructed from the first two levels, providing, respectively, generators and relations. He called this a “game of Lego–Teichmüller.” He also conjectured that the absolute Galois group acts by outer automorphisms on the profinite completion of the tower. The basic building blocks of the tower are provided by “pairs of pants,” that is, by projective lines minus three points.

This leads to a conjectural relation between the Moore–Seiberg equations and this Grothendieck–Teichmüller setting (cf. Degiovanni 1994) according to which solutions of the Moore–Seiberg equations provide projective representations of the Teichmüller tower, and the action of the absolute Galois group $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$ corresponds to the action on the coefficients of the Moore–Seiberg matrices.

Rational conformal field theories are, in general, 
one of the most promising sources of interactions 
between number theory and physics, involving 
interesting Galois actions, modular forms, Brauer 
groups, and complex multiplication. Some funda- 
mental work in this direction was done by, for 
example, Borcherds and Gannon. 


Arithmetic Algebraic Geometry 


In this section we describe occurrences in physics of 
various aspects of the arithmetic geometry of 
algebraic varieties. 


Arithmetic Calabi-Yau 


In the context of type II string theory, compactified 
on Calabi-Yau 3-folds, Greg Moore considered 
certain black hole solutions and a resulting dynami- 
cal system given by a differential equation in the 
corresponding moduli. The fixed points of these 
equations determine certain “black hole attractor 
varieties.” In the case of varieties obtained from a 
product of elliptic curves or of a K3 surface and an 
elliptic curve, the attractor equation singles out 
an arithmetic property: the elliptic curves have 
complex multiplication. The class number of the 
corresponding imaginary quadratic field counts 
U-duality classes of black holes with the same area. 
Other results point to a relation between the arithmetic properties of Calabi–Yau 3-folds and conformal field theory. For instance, it was shown by Schimmrigk that, in certain cases, the algebraic number field defined via the fusion rules of a conformal field theory, as the field defined by the eigenvalues of the integer-valued fusion matrices

$$\phi_i * \phi_j = \sum_k (N_i)_j^{\,k}\, \phi_k$$

can be recovered from the Hasse–Weil L-function of the Calabi–Yau. An interesting case is provided by the Gepner model associated with the Fermat quintic Calabi–Yau 3-fold.


Arakelov Geometry 


For $K$ a number field and $O_K$ its ring of integers, a smooth proper algebraic curve $X$ over $K$ determines a smooth minimal model $X_{O_K}$, which defines an arithmetic surface $\mathcal{X}_{O_K}$ over $\mathrm{Spec}(O_K)$. The closed fiber $X_\wp$ of $\mathcal{X}_{O_K}$ over a prime $\wp \in O_K$ is given by the reduction mod $\wp$.

When $\mathrm{Spec}(O_K)$ is “compactified” by adding the Archimedean primes, one can correspondingly enlarge the group of divisors on the arithmetic surface by adding formal real linear combinations of irreducible “closed vertical fibers at infinity.” Such fibers are only treated as formal objects. The main idea of Arakelov geometry is that it is sufficient to work with an “infinitesimal neighborhood” $X_\alpha(\mathbb{C})$ of these fibers, given by the Riemann surfaces obtained from the equation defining $X$ over $K$ under the embeddings $\alpha: K \hookrightarrow \mathbb{C}$ that constitute the Archimedean primes. Arakelov developed a consistent intersection theory on arithmetic surfaces, by computing the contribution of the Archimedean primes to the intersection indices using Hermitian metrics on these Riemann surfaces and the Green function of the Laplacian.

A general introduction to the subject of Arakelov geometry can be found in Lang (1988). Manin (1991) showed that these Green functions can be computed in terms of geodesics in a hyperbolic 3-manifold that has the Riemann surface $X_\alpha(\mathbb{C})$ as its conformal boundary at infinity.


The Polyakov measure

A first application to physics of methods of Arakelov geometry was an explicit formula obtained by Beilinson and Manin (1986) for the Polyakov bosonic string measure in terms of Faltings’s height function at algebraic points of the moduli space of curves.

The partition function for the closed bosonic string has a perturbative expansion $Z = \sum_{g \ge 0} Z_g$, with

$$Z_g = e^{\beta(2-2g)} \int_{\Sigma} e^{-S(x,\gamma)}\, Dx\, D\gamma \qquad [7]$$

written in terms of a compact Riemann surface $\Sigma$ of genus $g$, maps $x: \Sigma \to \mathbb{R}^d$, and metrics $\gamma$ on $\Sigma$. The classical action is of the form

$$S(x,\gamma) = \int_{\Sigma} d^2z\, \sqrt{|\gamma|}\, \gamma^{ab}\, \partial_a x^\mu\, \partial_b x^\mu \qquad [8]$$


Using the invariance of the classical action with respect to the semidirect product of diffeomorphisms of $\Sigma$ and the conformal group, the integral is reduced (in the critical dimension $d = 26$ where the conformal anomaly cancels) to a zeta regularized determinant of the Laplacian for the metric on $\Sigma$ and an integration over the moduli space $\mathcal{M}_g$ of genus $g$ algebraic curves. Beilinson and Manin gave an explicit formula for the resulting Polyakov measure on $\mathcal{M}_g$ using results of Faltings on Arakelov geometry of arithmetic surfaces. In particular, their argument makes essential use of the properties of the Faltings metrics on the invertible sheaves $d(L)$ given by the “multiplicative Euler characteristics” of sheaves $L$ of relative 1-forms. For a suitable choice of bases $\{\phi_j\}$ and $\{w_i\}$ of differentials and quadratic differentials, the formula for the Polyakov measure is then of the form (up to a multiplicative constant)

$$d\pi_g = |\det B|^{-18}\, (\det \Im\tau)^{-13}\, W_1 \wedge \bar{W}_1 \wedge \cdots \wedge W_{3g-3} \wedge \bar{W}_{3g-3} \qquad [9]$$

with $\tau$ in the Siegel upper half-space, $B_{ij} = \int_{a_i} \phi_j$, and the $W_i$ given by the images of the basis $w_i$ under the Kodaira–Spencer isomorphism.


Holography

In the case of the elliptic curve $X_q(\mathbb{C}) = \mathbb{C}^*/q^{\mathbb{Z}}$, a formula of Alvarez-Gaumé, Moore, and Vafa gives the operator product expansion of the path integral for bosonic field theory as

$$g(z,1) = \log\left( |q|^{B_2(\log|z|/\log|q|)/2}\, |1-z| \prod_{n=1}^{\infty} |1 - q^n z|\, |1 - q^n z^{-1}| \right) \qquad [10]$$

where $B_2$ is the second Bernoulli polynomial. Expression [10] is in fact the Arakelov Green function on $X_q(\mathbb{C})$ (cf. Lang (1988)).

Using this and analogous results for higher genus Riemann surfaces, Manin and Marcolli (2001) showed that the result of Manin (1991) on Arakelov and hyperbolic geometry can be rephrased in terms of the AdS/CFT correspondence, or holography principle. Expression [10] can then be written as a combination of terms involving geodesic lengths in the Euclidean BTZ black hole.

In the case of higher genus curves, the Arakelov Green function on a compact Riemann surface, which is related to the two-point correlation function for bosonic field theory, can be expressed in terms of the semiclassical limit of gravity (the geodesic propagator) on the bulk space of Euclidean versions of asymptotically AdS$_{2+1}$ black holes introduced by K Krasnov.


Motives 


There are several cohomology theories for algebraic varieties: de Rham, Betti, and étale cohomology. De Rham and Betti cohomology are related by the period isomorphism, and comparison isomorphisms relate étale and Betti cohomology. In the smooth projective case, they have the expected properties of Poincaré duality, Künneth isomorphisms, etc. Moreover, étale cohomology provides interesting $\ell$-adic representations of $\mathrm{Gal}(\bar{k}/k)$. In order to understand what type of information, such as maps or operations, can be transferred from one cohomology theory to another, Grothendieck introduced the idea of the existence of a “universal cohomology theory” with realization functors to all the known cohomology theories for algebraic varieties. He called this the theory of “motives.” Properties that can be transferred between different cohomology theories are those that exist at the motivic level. A short introduction to motives can be found in Serre (1992).

The first construction of a category of motives, proposed by Grothendieck, covers the case of smooth projective varieties. The corresponding motives form a $\mathbb{Q}$-linear abelian category of “pure motives.” Roughly, objects are varieties and morphisms are “correspondences” given by algebraic cycles in the product, modulo a suitable equivalence relation. The category also contains Tate objects generated by $\mathbb{Q}(1)$, which is the inverse of the pure motive $H^2(\mathbb{P}^1)$. Grothendieck’s standard conjectures imply that the category of pure motives is equivalent to the category of representations $\mathrm{Rep}_G$ of a “motivic Galois group,” which in the case of pure motives is proreductive. The subcategory of pure Tate motives has as motivic Galois group the multiplicative group $\mathbb{G}_m$. The situation is more complicated for “mixed motives,” for which constructions were only very recently proposed (e.g., in the work of Voevodsky). These provide a universal cohomology theory for more general classes of algebraic varieties. Mixed Tate motives are the subcategory generated by the Tate objects. There is again a motivic Galois group. For mixed motives it is an extension of a proreductive group by a prounipotent group, with the proreductive part coming from pure motives and the prounipotent part from the presence of a weight filtration on mixed motives. The multiple zeta values appear as periods of mixed Tate motives.


Renormalization and Motivic Galois Theory 


A manifestation of motivic Galois groups in physics arises in the context of the Connes–Kreimer theory of perturbative renormalization (for an introduction to this topic, see Hopf Algebra Structure of Renormalizable Quantum Field Theory). In fact, according to the Connes–Kreimer theory, the Bogoliubov–Parasiuk–Hepp–Zimmermann (BPHZ) renormalization scheme with dimensional regularization and minimal subtraction can be formulated mathematically in terms of the Birkhoff factorization

$$\gamma(z) = \gamma_-(z)^{-1}\, \gamma_+(z) \qquad [11]$$

of loops in a prounipotent Lie group $G$, which is the group of characters of the Hopf algebra of Feynman graphs. Here, the loop $\gamma$ is defined on a small punctured disk around the critical dimension $D$, $\gamma_+$ is holomorphic in a neighborhood of $D$, and $\gamma_-$ is holomorphic in the complement of $D$ in $\mathbb{P}^1(\mathbb{C})$. The renormalized value is given by $\gamma_+(D)$ and the counterterms by $\gamma_-(z)$.

The paper of Connes and Marcolli, Renormalization, the Riemann–Hilbert Correspondence, and Motivic Galois Theory, in volume II of Julia et al. (2005), shows that the data of the Birkhoff factorization are equivalently described in terms of solutions to a certain class of differential systems with irregular singularities. This is obtained by writing the terms in the Birkhoff factorization as time-ordered exponentials, and then using the fact that

$$\mathrm{T}e^{\int_a^b \alpha(t)\,dt} = 1 + \sum_{n=1}^{\infty} \int_{a \le s_1 \le \cdots \le s_n \le b} \alpha(s_1) \cdots \alpha(s_n)\, ds_1 \cdots ds_n$$

is the value $g(b)$ at $b$ of the unique solution $g(t) \in G$ with value $g(a) = 1$ of the differential equation $dg(t) = g(t)\alpha(t)\,dt$.
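
The relation between the time-ordered exponential and the differential equation $dg(t) = g(t)\alpha(t)\,dt$ can be tested in a toy unipotent group. The sketch below takes $G$ to be upper unitriangular $3 \times 3$ matrices, so that the series terminates after $n = 2$, and a hypothetical $\alpha(t)$ chosen so that the ordering matters. For this choice the series gives, by hand, entries $1$, $1/2$ and $\int_{0 \le s_1 \le s_2 \le 1} 1 \cdot s_2 \, ds_1\, ds_2 = 1/3$; the opposite ordering would give $1/6$ in the corner entry.

```python
def matmul(A, B):
    """Plain matrix product for square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def axpy(A, B, c):
    """Return A + c * B entrywise."""
    n = len(A)
    return [[A[i][j] + c * B[i][j] for j in range(n)] for i in range(n)]

def alpha(t):
    """Hypothetical strictly upper triangular (nilpotent) matrix-valued function of t."""
    return [[0.0, 1.0, 0.0],
            [0.0, 0.0, t],
            [0.0, 0.0, 0.0]]

def solve(a, b, steps=200):
    """Integrate dg/dt = g(t) alpha(t) with g(a) = identity, using classical RK4."""
    g = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    h = (b - a) / steps
    t = a
    for _ in range(steps):
        k1 = matmul(g, alpha(t))
        k2 = matmul(axpy(g, k1, h / 2), alpha(t + h / 2))
        k3 = matmul(axpy(g, k2, h / 2), alpha(t + h / 2))
        k4 = matmul(axpy(g, k3, h), alpha(t + h))
        incr = axpy(axpy(axpy(k1, k2, 2.0), k3, 2.0), k4, 1.0)
        g = axpy(g, incr, h / 6)
        t += h
    return g

g_b = solve(0.0, 1.0)
for row in g_b:
    print(row)
```

The corner entry of `g_b` is the iterated integral with the earlier time on the left, matching the ordering $a \le s_1 \le \cdots \le s_n \le b$ in the series.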

The singularity types are specified by physical conditions, such as the independence of the counterterms on the mass scale. These conditions are expressed geometrically through the notion of $G$-valued “equisingular connections” on a principal $\mathbb{C}^*$-bundle $B$ over a disk $\Delta$, where $G$ is the prounipotent Lie group of characters of the Connes–Kreimer Hopf algebra of Feynman graphs. The “equisingularity” condition is the property that such a connection $\omega$ is $\mathbb{C}^*$-invariant and that its restrictions to sections of the principal bundle that agree at $0 \in \Delta$ are mutually equivalent, in the sense that they are related by a gauge transformation by a $G$-valued $\mathbb{C}^*$-invariant map regular in $B$; hence, they have the same type of (irregular) singularity at the origin.

The classification of equivalence classes of these differential systems via the Riemann–Hilbert correspondence and differential Galois theory yields a Galois group $U^* = U \rtimes \mathbb{G}_m$, where $U$ is prounipotent, with Lie algebra the free graded Lie algebra with one generator $e_{-n}$ in each degree $n \in \mathbb{N}$. The group $U^*$ is identified with the motivic Galois group of mixed Tate motives over the cyclotomic ring $\mathbb{Z}[e^{2\pi i/N}]$, for $N = 3$ or $N = 4$, localized at $N$.


Speculations on Arithmetical Physics 


In a lecture written for the 25th Arbeitstagung in 
Bonn, Y Manin presented intriguing connections 
between arithmetic geometry (especially Arakelov 
geometry) and physics. The theme is also discussed 
in Manin (1989). These considerations are based on a 
philosophical viewpoint according to which funda- 
mental physics might, like adeles, have Archimedean 
(real or complex) as well as non-Archimedean 
(p-adic) manifestations. Since adelic objects are 
more fundamental and often simpler than their 
Archimedean components, one can hope to use this 
point of view in order to carry over some computa- 
tion of physical relevance to the non-Archimedean 
side where one can employ number-theoretic methods. 


Adelic physics?

Some of the results mentioned in the previous sections seem to lend themselves well to this adelic interpretation. The quantum statistical mechanics of $\mathbb{Q}$-lattices relies fundamentally on adeles and it admits generalizations to systems associated to other algebraic varieties (Shimura varieties) that have an adelic description and adelic groups of symmetries. The result on the Polyakov measure also has an adelic flavor, as it uses essentially the Archimedean component of the Faltings height function. The latter is in fact a product of contributions from all the Archimedean and non-Archimedean places of the field of definition of algebraic points in the moduli space, so that one can expect that there would be an adelic Polyakov measure, of which one normally sees the Archimedean side only. The Freund–Witten adelic product formula for the Veneziano string amplitude fits in the same context, with $p$-adic amplitudes

$$B_p(\alpha,\beta) = \int_{\mathbb{Q}_p} |x|_p^{\alpha-1}\, |1-x|_p^{\beta-1}\, dx$$

and $B_\infty(\alpha,\beta)^{-1} = \prod_p B_p(\alpha,\beta)$ (cf. Varadarajan (2004)).


Adelic physics and motives

A similar adelic philosophy was taken up by other authors, who proposed
ways of introducing non-Archimedean and adelic 
geometries in quantum physics. A recent survey is 
given in Varadarajan (2004). For instance, Volovich 
(1995) proposed spacetime models based on 
cohomological realizations of motives, with étale 
topology “interpolating” between a proposed non- 
Archimedean geometry at the Planck scale and 
Euclidean geometry at the macroscopic scale. In 
this viewpoint, motivic L-functions appear as parti- 
tion functions and actions of motivic Galois groups 
govern the dynamics. 

See also: Hopf Algebra Structure of Renormalizable Quantum Field Theory; Mirror Symmetry: A Geometric Survey; Quantum Ergodicity and Mixing of Eigenfunctions; Random Matrix Theory in Physics; Regularization for Dynamical Zeta Functions.

Introduction


An operad is an abstraction of a family of composable 
functions of n variables for various n, useful for the 
“bookkeeping” and applications of such families. 
Operads are particularly important and useful in 
categories with a good notion of “homotopy,” where 
they play a key role in organizing hierarchies of higher 
homotopies, reflecting their original use as a tool in 
homotopy theory, especially for studying (iterated) 
loop spaces. For several years now, operads have 
become increasingly important in mathematical 
physics, especially in string field theory, where they 
organize the terms of higher order in perturbed 
actions, and in deformation quantization. 

The major focus of this article will be on operads as 
they are relevant to mathematical physics, but will also 
include some background material from homotopy 
theory, where they originated. A borderland where 
homotopy theory and cohomological physics overlap is 
the world of differential graded vector spaces, including 
those of differential forms, ghosts, anti-ghosts, etc., 
sometimes lumped together as BRST theory. Here, as elsewhere in contemporary mathematical physics, the flow has been in both directions: sometimes physicists have discovered or reinvented known mathematics while finding new applications for it; at other times physics has suggested new concepts for mathematicians to develop further. In the case of operads, they have provided a general structure for varieties of algebras, some of which are novel types contributed by physicists.

For a reasonably up-to-date introduction and survey, consider Markl et al. (2002), although there have been many developments since then. Two particularly important original works are Boardman and Vogt (1973) and May (1972).


Definitions and Examples 


The term “operad” is due to May, building on work of Stasheff and of Boardman–Vogt. The most fundamental example of an operad is the endomorphism operad $\mathrm{End}_X := \{\mathrm{Map}(X^n, X)\}_{n \ge 1}$ where, for a set or topological space $X$, $\mathrm{Map}(X^n, X)$ means the set or space of functions or continuous functions from the $n$-fold product of $X$ with itself to $X$, together with the operations

$$\circ_i: \mathrm{Map}(X^n, X) \times \mathrm{Map}(X^m, X) \to \mathrm{Map}(X^{n+m-1}, X)$$

given, for $1 \le i \le n$, by

$$(f \circ_i g)(x_1, \ldots, x_{m+n-1}) = f(x_1, \ldots, x_{i-1}, g(x_i, \ldots, x_{i+m-1}), x_{i+m}, \ldots, x_{m+n-1})$$

In the endomorphism operad $\mathrm{End}_X$, there are 
easily discovered relations involving iterated $\circ_i$-operations 
and the symmetric group $\Sigma_n$ actions on 
the $X^n$'s. For example, 

$(f \circ_i g) \circ_j h = f \circ_i (g \circ_{j-i+1} h)$ for $i \le j \le i+m-1$ 

if $g$ is a function of $m$ variables, since only the name 
of the position for the insertion is changed. 
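
The insertion operations and the relation above can be tried out on ordinary functions. The following is a minimal sketch, not part of the original text: the class name `Op`, the helper `circ` and the sample functions are hypothetical choices made only for illustration, with X taken to be the integers.

```python
class Op:
    """An element of Map(X^n, X): a plain function together with its arity n."""

    def __init__(self, arity, fn):
        self.arity = arity
        self.fn = fn

    def __call__(self, *xs):
        if len(xs) != self.arity:
            raise TypeError(f"expected {self.arity} inputs, got {len(xs)}")
        return self.fn(*xs)


def circ(f, i, g):
    """Return f o_i g: feed the output of g into the i-th input of f (1-based)."""
    n, m = f.arity, g.arity
    if not 1 <= i <= n:
        raise ValueError("i must satisfy 1 <= i <= n")

    def h(*xs):
        inner = g(*xs[i - 1:i - 1 + m])            # g consumes x_i, ..., x_{i+m-1}
        return f(*xs[:i - 1], inner, *xs[i - 1 + m:])

    return Op(n + m - 1, h)


# Hypothetical sample operations on integers.
f = Op(3, lambda a, b, c: a + 10 * b + 100 * c)
g = Op(2, lambda a, b: a - b)
h = Op(2, lambda a, b: a * b)

i, m = 2, g.arity
xs = (1, 2, 3, 4, 5)
for j in range(i, i + m):                          # the range i <= j <= i + m - 1
    lhs = circ(circ(f, i, g), j, h)
    rhs = circ(f, i, circ(g, j - i + 1, h))
    assert lhs.arity == rhs.arity == 5
    assert lhs(*xs) == rhs(*xs)
```

Storing the arity explicitly, rather than inspecting the function, keeps the bookkeeping n + m - 1 visible, which is the point of the operad structure.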

An operad $(\mathcal{O}, \circ_i)$ consists of a collection 
$\{\mathcal{O}(n)\}_{n \ge 1}$ of objects and maps $\circ_i : \mathcal{O}(n) \times \mathcal{O}(m) \to 
\mathcal{O}(n+m-1)$ for $m, n \ge 1$ and $1 \le i \le n$ satisfying the 
relations manifest in the example $\mathrm{End}_X$. 

May’s original definition corresponds to simultaneous 
insertions into all possible positions of inputs 
into $f \in \mathrm{Map}(X^n, X)$. In most examples, the structures 
are “manifest” without appeal to the technical 
definitions. 

It helps to see graphic examples of operads, 
particularly ones relevant for physics. Two kinds 
that are particularly important are the tree operads 
and the little disks (or cubes) operads. 

Let $\mathcal{T}(n)$ be the set of planar trees with one root 
and $n$ leaves labeled (arbitrarily) 1 through $n$. The 
collection $\mathcal{T} = \{\mathcal{T}(n)\}_{n \ge 1}$ of sets of trees forms an 
operad by grafting the root of $g$ to the leaf of $f$ 
labeled $i$, as in Figure 1, where the leaves are 
assumed labeled in order from left to right. Figure 1 
can be interpreted as portraying the $\circ_4$ result of 
inserting a 3-linear operation into a 5-linear one. 

Figure 1 Grafting with the leaves numbered from left to right. 

Figure 2 The little 2-disks operad. 

The little $n$-disks operad is $\mathcal{D}_n = \{\mathcal{D}_n(j)\}_{j \ge 1}$, where 
$\mathcal{D}_n(j)$ consists of an ordered collection of $j$ $n$-disks 
embedded in the standard $n$-dimensional unit disk 
$D^n$ with disjoint interiors, the embedding being of 
the form $z \mapsto az + b$ with $0 < a \in \mathbb{R}$. The operations are 
given as indicated in Figure 2. 
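
Since the figure itself is not reproduced here, the following sketch spells out one reading of the operations for n = 2: inserting a configuration into the i-th little disk by composing the affine embeddings. The function names and the sample configuration are hypothetical, and the sketch assumes this standard insertion is what the figure depicts.

```python
def insert_disks(c, i, d):
    """Return c o_i d for little 2-disk configurations.

    A configuration is a list of pairs (a, b), with a > 0 real and b complex,
    standing for the embedding z -> a*z + b of the unit disk.
    """
    if not 1 <= i <= len(c):
        raise ValueError("i must index a disk of c")
    a_i, b_i = c[i - 1]
    # Compose z -> a_i * (a*z + b) + b_i for every disk (a, b) of d.
    rescaled = [(a_i * a, a_i * b + b_i) for (a, b) in d]
    return c[:i - 1] + rescaled + c[i:]


def is_configuration(c, tol=1e-12):
    """Check that all disks lie in the unit disk and have disjoint interiors."""
    inside = all(abs(b) + a <= 1 + tol for a, b in c)
    disjoint = all(
        abs(c[k][1] - c[l][1]) >= c[k][0] + c[l][0] - tol
        for k in range(len(c))
        for l in range(k + 1, len(c))
    )
    return inside and disjoint


two = [(0.5, -0.5 + 0j), (0.5, 0.5 + 0j)]   # two disks of radius 1/2, side by side
three = insert_disks(two, 1, two)            # put a copy of `two` into the first disk
print(three, is_configuration(three))
```

The arity count matches the general rule: inserting a configuration of m disks into one of n disks gives n + m - 1 disks.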

Just as group theory without representations is 
rather sterile, so are operads best appreciated by 
their representations, known as (varieties of) algebras, 
especially algebras with higher homotopies. 

An algebra $A$ over an operad $\mathcal{P}$ “is” a map of 
operads $\mathcal{P} \to \mathrm{End}_A$. This is just a compact way 
of saying that an algebra $A$ has a coherent system 
of maps $\mathcal{P}(n) \times A^n \to A$. Much of this article will 
speak in terms of such algebras with the corresponding 
operad being understood. 


Operads in Homotopy Theory 


A major motivation for the development of operads 
was the desire to have a homotopy-invariant characterization 
of based loop spaces and iterated loop 
spaces. Precisely such coherent systems of higher 
homotopies provided the answers. For based loop 
spaces, the operad in question, $\mathcal{K} = \{K_n\}_{n \ge 1}$, consists of 
the polytopes known as “associahedra.” The usual 
product of based loops is only homotopy associative. 

If we fix a specific associating homotopy and 
consider the five ways of parenthesizing the product 
of four loops, there results a pentagon whose edges 
correspond to a path of loops (Figure 3). 

From the leftmost vertex to the rightmost, consider 
the two paths of loops across the top or around the 
bottom. By further adjustment of parameters, the 
pentagon can be filled in by a family of such paths. 

Figure 3 The associahedron $K_4$. [Vertex labels recovered from the figure: (ab)(cd), a(b(cd)), a((bc)d), (a(bc))d.] 

The associahedron $K_n$ can be described as a 
convex polytope with one vertex for each way of 
associating $n$ ordered variables, that is, ways of 
inserting parentheses in a meaningful way in a word 
of $n$ letters. The edges correspond to a single 
application of an associating homotopy. More 
generally, the cellular structure of the associahedra 
is well described by planar rooted trees, the vertices 
corresponding to binary trees and so forth (see 
Figure 4). 
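
The vertex description can be made concrete by listing the parenthesizations directly. The sketch below is an illustration added here, with a hypothetical function name; it simply enumerates all complete bracketings of a word, one for each vertex of the associahedron on that many letters.

```python
def bracketings(word):
    """All complete parenthesizations of a word: the vertices of K_n, n = len(word)."""
    if len(word) == 1:
        return [word]
    result = []
    for split in range(1, len(word)):              # outermost product: left part times right part
        for left in bracketings(word[:split]):
            for right in bracketings(word[split:]):
                result.append("(" + left + right + ")")
    return result


for n in range(2, 7):
    print(n, len(bracketings("abcdef"[:n])))
```

For four letters the list has five entries, matching the five ways of parenthesizing a product of four loops that give the pentagon.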

For $K_5$, see Figure 5 or a rotatable image available at 
http://igd.univ-lyon1.fr/~chapoton/stasheff.html. The 
facets are all products of two associahedra of lower 
dimension and specific imbeddings can be given to play 
the role of the $\circ_i$ operations as in an operad. 

An $A_\infty$-space is a space $Y$ which admits a coherent 
family of maps 

$m_n : K_n \times Y^n \to Y$ 

so that they make $Y$ an algebra over the operad 
(without $\Sigma_n$-actions) $\mathcal{K} = \{K_n\}_{n \ge 1}$. 

The main result by Stasheff is: A connected space 
$Y$ (of the homotopy type of a CW-complex) has the 
homotopy type of a based loop space $\Omega X$ for some 
$X$ if and only if $Y$ is an $A_\infty$-space. 

Homotopy characterization of iterated loop 
spaces $\Omega^n X$, for some space $X$, required the full 
power of the theory of operads with the symmetries. 


Figure 4 $K_4$ with vertices labeled by trees. 

Figure 5 The associahedron $K_5$. 


An early motivation for the invention of a theory 
of operads was the consideration of infinite loop 
spaces, that is, a sequence of spaces $X_n$ such that 
each $X_n$ is homotopy equivalent to $\Omega X_{n+1}$. 

Although introduced originally in the category of 
topological spaces, operads were available almost 
immediately for differential graded (dg) vector 
spaces, also known as chain complexes. In physics, 
the differential is often called a BRST operator, a 
term that should be reserved for a special kind of dg 
algebra, see below. 


Operads in Algebra 


The $\circ_i$ notation first appeared in Gerstenhaber’s study 
of the algebraic structure of the Hochschild cohomology 
of an associative algebra, about the same time as 
the construction of the associahedra, where the operations 
were given in a less convenient notation. Recall that 
the Hochschild cohomology of an associative algebra 
$A$ is the homology of the complex $\mathrm{Hom}(A^{\otimes n}, A)$ with 
the coboundary given as follows (all signs below are 
indicated as $\pm$; any of the standard references will 
specify conventions and signs): for $f \in \mathrm{Hom}(A^{\otimes n}, A)$ 
and $g \in \mathrm{Hom}(A^{\otimes m}, A)$ let 

$f \circ g = \sum_{i=1}^{n} \pm f \circ_i g$   [1] 

Gerstenhaber then defines his bracket as $[f, g] = f \circ g \pm g \circ f$. 
With hindsight, he realized that the 
Hochschild coboundary can be written as 

$\delta h = [m, h]$   [2] 

where $m : A \otimes A \to A$ is the multiplication. Moreover, 
the associativity of $m$ is equivalent to 

$[m, m] = 0$   [3] 
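
To see why associativity is the same as the vanishing of $[m, m]$, one can evaluate $m \circ m$ on three elements. The following is a sketch in the ungraded case, assuming the common sign convention in which the $\circ_2$ term enters with a minus sign (the text above leaves all signs as $\pm$):

```latex
\begin{align*}
(m \circ m)(a, b, c) &= (m \circ_1 m)(a, b, c) - (m \circ_2 m)(a, b, c) \\
                     &= m(m(a, b), c) - m(a, m(b, c)), \\
[m, m] &= 2\, m \circ m .
\end{align*}
```

Under that assumption, and provided 2 is invertible in the ground field, $[m, m] = 0$ holds exactly when $(ab)c = a(bc)$ for all $a, b, c$.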


$A_\infty$-Algebras 


In the setting of graded vector spaces $V = \bigoplus_{n \in \mathbb{Z}} V^n$, 
there are two conventions for defining $A_\infty$-algebras, 
which differ by a shift in grading. We adopt the 
physics convention so that $A$ here is the suspension 
of that considered in the original papers. The 
cellular chains of the associahedra form the $A_\infty$-operad, 
providing the following definition. 


Definition 1 $A_\infty$-algebra (Strong homotopy associative 
algebra). Let $A$ be a $\mathbb{Z}$-graded vector space 
$A = \bigoplus_{r \in \mathbb{Z}} A^r$ and suppose that there exists a collection 
of degree 1 multilinear maps 

$m := \{m_k : A^{\otimes k} \to A\}_{k \ge 1}$ 

$(A, m)$ is called an $A_\infty$-algebra when the multilinear 
maps $m_k$ satisfy the following relations: 

$\sum_{k+l=n+1} \sum_{i=1}^{k} \pm\, m_k \circ_i m_l = 0$   [4] 

with an appropriate set of signs for $n \ge 1$. 
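
Written out for small $n$, the relations read as follows. This is only an unpacking of the sum, with all signs left as $\pm$ in the same way as in the text; the symbol $1$ denotes the identity map of $A$.

```latex
\begin{align*}
n = 1:\quad & m_1 m_1 = 0, \\
n = 2:\quad & m_1 m_2 \pm m_2 (m_1 \otimes 1) \pm m_2 (1 \otimes m_1) = 0, \\
n = 3:\quad & m_2 (m_2 \otimes 1) \pm m_2 (1 \otimes m_2) \pm m_1 m_3 \\
            & \pm m_3 (m_1 \otimes 1 \otimes 1) \pm m_3 (1 \otimes m_1 \otimes 1)
              \pm m_3 (1 \otimes 1 \otimes m_1) = 0 .
\end{align*}
```

So $m_1$ is a differential, $m_1$ is a derivation of $m_2$, and $m_2$ is associative up to the homotopy $m_3$.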


A weak $A_\infty$-algebra consists of a collection of 
degree 1 multilinear maps 

$m := \{m_k : A^{\otimes k} \to A\}_{k \ge 0}$ 

satisfying the above relations, but for $n \ge 0$ and in 
particular with $k, l \ge 0$. 


Remark 1 The “weak” version is fairly new, 
inspired by physics, where $m_0 : \mathbb{C} \to A$, regarded as 
an element $m_0(1) \in A$, is related to what physicists 
refer to as a “background.” The augmented relation 
then implies that $m_0(1)$ is a cycle, but $m_1 m_1$ need no 
longer be 0, rather 

$m_1 m_1 = \pm m_2(m_0 \otimes 1) \pm m_2(1 \otimes m_0)$   [5] 


Just as associativity was captured by the equation 
$[m, m] = 0$, so the defining relations of the definition 
of an $A_\infty$-algebra are captured by 

$[m, m] = 0$   [6] 


Decades later it was realized that considering 
$T^cA = \bigoplus_n A^{\otimes n}$ as a coalgebra with 

$\Delta(a_1 \otimes \cdots \otimes a_n) = \sum_{p+q=n} (a_1 \otimes \cdots \otimes a_p) \otimes (a_{p+1} \otimes \cdots \otimes a_n)$ 

we then have an isomorphism 

$\mathrm{Hom}(T^cA, A) = \prod_n \mathrm{Hom}(A^{\otimes n}, A) \cong \mathrm{Coder}(T^cA)$ 

Here Coder is the space of all coderivations of $T^cA$. 
The Gerstenhaber bracket is indeed the “intrinsic” 
commutator bracket of coderivations via the above 
isomorphism. As such, it satisfies a graded version of 
the Jacobi identity; after a shift in grading from the 
original one of Hochschild, the Hochschild cochain 
complex forms a dg Lie algebra. 


$L_\infty$-Algebras 


Since an ordinary Lie algebra $\mathfrak{g}$ is regarded as 
ungraded, the defining bracket is regarded as skew-symmetric. 
For dg Lie algebras and $L_\infty$-algebras, we 
need graded symmetry, which refers to symmetry with 
signs determined by the grading. The basic operation is 

$\tau : x \otimes y \mapsto (-1)^{|x||y|}\, y \otimes x$   [7] 

Also we adopt the convention that tensor products 
of graded functions or operators have the signs built 
in; for example, $(f \otimes g)(x \otimes y) = (-1)^{|g||x|} f(x) \otimes g(y)$. 
By decomposing each permutation as a product of 
transpositions, there is then defined the sign of a 
permutation of $n$ graded elements; for example, for 
any $c_i \in V$, $1 \le i \le n$, and any $\sigma \in \Sigma_n$, the permutation 
of $n$ graded elements is defined by 

$\sigma(c_1, \ldots, c_n) = (-1)^{\epsilon(\sigma)} (c_{\sigma(1)}, \ldots, c_{\sigma(n)})$   [8] 

The sign $(-1)^{\epsilon(\sigma)}$ is often referred to as the Koszul 
sign of the permutation. 
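
The Koszul sign can be computed without factoring the permutation into transpositions: every pair of elements whose relative order is reversed contributes a factor of minus one exactly when both elements have odd degree. The sketch below is an illustration with hypothetical names; permutations are written 0-based as the tuple of original positions in their new order.

```python
def koszul_sign(degrees, sigma):
    """Koszul sign of rearranging (c_0, ..., c_{n-1}) into (c_sigma[0], ..., c_sigma[n-1]).

    degrees[i] is the degree of c_i; sigma is a tuple containing 0..n-1 once each.
    """
    sign = 1
    n = len(sigma)
    for p in range(n):
        for q in range(p + 1, n):
            # The pair (sigma[p], sigma[q]) is an inversion: these two elements
            # have passed each other, which costs (-1)^(|c_a| |c_b|).
            if sigma[p] > sigma[q] and (degrees[sigma[p]] * degrees[sigma[q]]) % 2:
                sign = -sign
    return sign


print(koszul_sign([1, 1], (1, 0)))   # two odd elements exchanged
print(koszul_sign([0, 1], (1, 0)))   # an even element moved past an odd one
```

When all degrees are even the sign is always +1, and when all degrees are odd it reduces to the ordinary sign of the permutation.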


Definition 2 (Graded symmetry). A graded symmetric 
multilinear map of a graded vector space $V$ to 
itself is a linear map $f : V^{\otimes n} \to V$ such that for any 
$c_i \in V$, $1 \le i \le n$, and any $\sigma \in \Sigma_n$ (the permutation 
group of $n$ elements), the relation 

$f(c_1, \ldots, c_n) = (-1)^{\epsilon(\sigma)} f(c_{\sigma(1)}, \ldots, c_{\sigma(n)})$ 

holds. 


Definition 3 By a $(k, l)$-unshuffle of $c_1, \ldots, c_n$ with 
$n = k + l$ is meant a permutation $\sigma$ such that for 
$i < j \le k$, we have $\sigma(i) < \sigma(j)$ and similarly for 
$k < i < j \le k + l$. We denote the subset of $(k, l)$-unshuffles in 
$\Sigma_{k+l}$ by $\Sigma_{k,l}$ and by $\Sigma_{k+l=n}$ the union of the subsets 
$\Sigma_{k,l}$ with $k + l = n$. Similarly, a $(k_1, \ldots, k_i)$-unshuffle 
means a permutation $\sigma \in \Sigma_n$ with $n = k_1 + \cdots + k_i$ 
such that the order is preserved within each block of 
length $k_1, \ldots, k_i$. The subset of $\Sigma_n$ consisting of all 
such unshuffles we denote by $\Sigma_{k_1, \ldots, k_i}$. 
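
A $(k, l)$-unshuffle is determined by which $k$ values are taken on the first block, since the values must be increasing on each block. The sketch below, with a hypothetical function name, enumerates them on that basis; each permutation is returned as the tuple of its values on 1, ..., k+l.

```python
from itertools import combinations


def unshuffles(k, l):
    """All (k, l)-unshuffles, each as the tuple (sigma(1), ..., sigma(k+l))."""
    n = k + l
    result = []
    for first in combinations(range(1, n + 1), k):   # increasing values on 1..k
        rest = tuple(v for v in range(1, n + 1) if v not in first)  # increasing on k+1..n
        result.append(first + rest)
    return result


print(unshuffles(2, 1))
```

There are as many $(k, l)$-unshuffles as there are $k$-element subsets of an $(k+l)$-element set, which is why sums over unshuffles are so much shorter than sums over all permutations.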


Definition 4 $L_\infty$-algebra (Strong homotopy Lie 
algebra). Let $L$ be a graded vector space and suppose 
that a collection of degree 1 graded symmetric linear 
maps $l := \{l_k : L^{\otimes k} \to L\}_{k \ge 1}$ is given. $(L, l)$ is called an 
$L_\infty$-algebra iff the maps satisfy the following relations: 

$\sum_{\sigma \in \Sigma_{k+l=n}} (-1)^{\epsilon(\sigma)}\, l_{1+l}(l_k(c_{\sigma(1)}, \ldots, c_{\sigma(k)}), c_{\sigma(k+1)}, \ldots, c_{\sigma(n)}) = 0$   [10] 

for $n \ge 1$. 
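
For small $n$ the sum over unshuffles has only a few terms. Written out, with the Koszul signs abbreviated to $\pm$, the first three relations are:

```latex
\begin{align*}
n = 1:\quad & l_1(l_1(c_1)) = 0, \\
n = 2:\quad & l_1(l_2(c_1, c_2)) \pm l_2(l_1(c_1), c_2) \pm l_2(l_1(c_2), c_1) = 0, \\
n = 3:\quad & l_2(l_2(c_1, c_2), c_3) \pm l_2(l_2(c_1, c_3), c_2) \pm l_2(l_2(c_2, c_3), c_1) \\
            & \pm l_1(l_3(c_1, c_2, c_3)) \pm l_3(l_1(c_1), c_2, c_3)
              \pm l_3(l_1(c_2), c_1, c_3) \pm l_3(l_1(c_3), c_1, c_2) = 0 .
\end{align*}
```

Thus $l_1$ is a differential, $l_1$ is a derivation of the bracket $l_2$, and the Jacobi identity for $l_2$ holds up to the homotopy $l_3$.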


A weak $L_\infty$-algebra consists of a collection of 
degree 1 graded symmetric linear maps 
$l := \{l_k : L^{\otimes k} \to L\}_{k \ge 0}$ satisfying the above relations, 
but for $n \ge 0$ and with $k, l \ge 0$. 


Remark 2 The alternate definition in which the 
summation is over all permutations, rather than just 
unshuffles, requires the inclusion of appropriate 
coefficients involving factorials. 


Just as an $A_\infty$-algebra can be described as a 
coderivation of $T^cA$, similarly an $L_\infty$-algebra $L$ can 
be described as a coderivation on $S^cL$, the symmetric 
subcoalgebra of $T^cL$. 


The operad of Lie algebras was defined rather 
late, although it was earlier implicit in the work of 
Fred Cohen. It is defined as the homology 
$H_{n-1}(\mathrm{Config}(\mathbb{R}^2, n))$ for $n \ge 1$, where $\mathrm{Config}(\mathbb{R}^2, n)$ 
denotes the configuration space of ordered $n$-tuples 
of distinct points in $\mathbb{R}^2$. Equivalently, the configurations 
can be thought of as the centers of the little 
2-disks. The open disks being contractible to their 
centers, this is a suboperad of the full homology 
$H_\bullet(\mathcal{D}_2)$. 

Just as a Lie algebra is obtained from an 
associative algebra using the commutator as bracket 
and, inversely, a Lie algebra gives rise to its 
universal enveloping associative algebra, an 
$L_\infty$-algebra can be obtained from an $A_\infty$-algebra 
by $n$-variable analogs of commutators and there 
is a universal enveloping $A_\infty$-algebra of a given 
$L_\infty$-algebra. 


Open-Closed Homotopy Algebras 


Open-closed string field theory suggests interaction 
between an $L_\infty$-algebra $H_c$ and an $A_\infty$-algebra $H_o$ 
including a strong homotopy representation of $H_c$ 
on $H_o$ by strong homotopy derivations. Here is the 
formal definition: 


Definition 5 Let $H = H_c \oplus H_o$ be a graded vector 
space and $(H_c, l)$ be a weak $L_\infty$-algebra. Consider a 
collection of multilinear maps 

$n := \{n_{k,l} : (H_c)^{\otimes k} \otimes (H_o)^{\otimes l} \to H_o\}_{k, l \ge 0}$ 

each of which is graded symmetric on $(H_c)^{\otimes k}$. We 
denote the collection also by $n$. We call $(H, n, l)$ a 
(partial) open-closed homotopy algebra (OCHA) 
when $n$ satisfies the following relations (up to some 
factorial coefficients): 


Other Algebras of Interest 


The Hochschild complex also has a graded product 
(without invoking the shift) known as the cup 
product. Except for the signs and the grading, the 
bracket and the product satisfy the Leibniz rule of a 
Poisson algebra on the cohomology; the result is 
axiomatized as a “Gerstenhaber algebra.” However, 
on the cochain complex, the Lie bracket and the 
associative product are compatible only up to 
homotopy. 

This naturally raises the issue of an operad for 
strong homotopy Gerstenhaber algebras. The operad 
$\mathcal{G}$ for Gerstenhaber algebras is the homology of the 
little disks operad, $H_\bullet(\mathcal{D}_2)$. But now we have 
choices: in addition to relaxing the Leibniz rule up 
to homotopy, the bracket could be relaxed to be 
part of an $L_\infty$-algebra and/or the product could be 
relaxed to be part of an $A_\infty$-algebra. The choice 
which is now known as the $G_\infty$-operad is defined in 
terms of a procedure which works for what are 
known as quadratic operads, indicating they have 
generators in $\mathcal{O}(2)$ and relations in $\mathcal{O}(3)$: the 
corresponding $\mathcal{O}_\infty$ has “dual” relations. For example, 
this gives the classical Koszul duality between 
Lie and commutative associative algebras. The $G_\infty$-operad 
can also be described as the “minimal 
model” of $\mathcal{G}$ in the sense of Markl. 

Another alternative is to consider just the “brace” 
operations, originally introduced by Kadeishvili and 
later independently by Getzler, but described in the 
Hochschild complex setting by Gerstenhaber–Voronov. 
Together with the cup product, these 
determine an operad denoted $HG$ which acts on the 
Hochschild complex; there is an operad map from 
$G_\infty$ to $HG$, hence $G_\infty$ also acts on the Hochschild 
complex. Finally, Tamarkin showed that $G_\infty$ is 
quasi-isomorphic to the dg operad of singular chains 
on the little disks operad, thus providing one of 
several proofs of what had been a conjecture by 
Deligne. 

Algebras with invariant inner products $\langle -, - \rangle$ 
are of considerable importance in mathematics and 
especially in mathematical physics; invariance means 
$\langle a, bc \rangle = \langle ab, c \rangle$ or $\langle a, [b, c] \rangle = \langle [a, b], c \rangle$ in, 
respectively, the associative or the Lie case (with 
appropriate signs in the graded case). Using the 
inner product, $n$-ary operations $A^{\otimes n} \to A$ can be 
converted to operations $A^{\otimes (n+1)} \to \mathbb{C}$ of which we can 
require cyclic symmetry. To handle such algebras, 
there is a notion of “cyclic operad.” In terms of 
trees, the transition is to take a rooted tree and then 
regard the root edge as just another leaf. This point 
of view corresponds to an essential symmetry for 
particle interactions. 


Operads in Mathematical Physics 


One reason for the explosive development of operad 
theory in the 1990s was the introduction of operadic 
structures in field theories, for example, conformal 
field theories (CFTs) and string field theories (SFTs). 
These operadic structures were directly related to 
the moduli spaces of Riemann surfaces with punctures 
or boundaries (or other decorations) in these 
physical theories. 

Two special “higher-homotopy algebras” have 
been emphasized because they are particularly 
important in mathematical physics: $A_\infty$ for open-string 
field theory and $L_\infty$ for closed-string field 
theory and for deformation quantization. Open-closed 
string field theory combines an $A_\infty$-algebra and 
an $L_\infty$-algebra in a particular way known as an OCHA. 

The operad for $L_\infty$-algebras is given a very nice 
and physically relevant geometric interpretation in 
terms of a real compactification of the moduli space 
of Riemann spheres with punctures, while for 
OCHAs, there is a real compactification of the 
moduli space of Riemann disks with punctures on 
the boundary or in the interior (bulk). Thus, this 
operad can be regarded as obtained from a moduli 
space of configurations of points (punctures) in the 
disk by compactifying the moduli spaces by adding 
boundary strata where two (or more) points 
(punctures) collide. Points on the boundary strata 
can be visualized as “bubble trees” of disks and/or 
spheres, see Figure 6. Alternatively, the little disks 
operad can be regarded as being obtained by 
“decorating” the points with little disks, while for 
OCHAs there is also a basic half-disk decorated 
with little disks in the bulk and little half-disks for 
the boundary points. The corresponding colored 
operad is Voronov’s “Swiss-cheese operad.” 
“Colored” refers to the fact that disks can be 
inserted into half-disks but not vice versa. Compare 
trees with two “colors” of edges with grafts 
restricted to ones which match colors. 


On-Shell versus Off-Shell 


In cohomological physics, the “on-shell” states or 
observables are usually given by the cohomology 
with respect to an internal differential, which in 
physics is called the BRST differential or BRST operator, 
though originally this meant the Chevalley–Eilenberg 
differential associated to the action of the Lie algebra of 
gauge symmetries of a physical theory. The generators 
of the Chevalley–Eilenberg cochain complex are known 
as “ghosts”. On-shell subspaces of algebras which are 
not closed under the product of the larger “off-shell” 
algebra are called “open” algebras by physicists. Quite 
generally, this situation gives rise to an algebra over an 
appropriate operad. A special case involves a differential 
graded algebra $A$ and a linear imbedding $H(A) \subset A$. 
The (co)homology is in turn a graded algebra (with 0 as 
differential), but inherits a higher-homotopy structure 
so that cohomology and original algebra are equivalent. 

Figure 6 Bubble tree for circle configurations. 

In the associative case, the inheritance is a result 
of Kadeishvili: 


Let $(A, d)$ be a differential graded associative or 
$A_\infty$-algebra, then the homology $H(A)$ inherits the 
structure of an $A_\infty$-algebra. 


Even if the original algebra $A$ is strictly associative, 
the inherited $A_\infty$-structure generally has nontrivial 
operations $m_i$. 

Analogous results hold for $L_\infty$-algebras and 
others. It is the $L_\infty$-version that is relevant for 
closed-string field theory (CSFT). Zwiebach showed that 
the quantum theory of covariant closed strings has 
an action defined in terms of an infinite chain of 
string field products. The genus-0 (tree level) string 
field algebra is an $L_\infty$-algebra inherited from the off-shell 
state space modeled by the Batalin–Vilkovisky 
(BV) construction. The higher-order brackets provide 
higher-order correlation or $n$-point functions 
which play a crucial role in the extended Lagrangian 
of the theory. 


Batalin–Fradkin–Vilkovisky and Batalin–Vilkovisky 
Constructions 


The constructions of Batalin–Fradkin–Vilkovisky 
(BFV) for constrained Hamiltonian systems and of 
Batalin–Vilkovisky (BV) for Lagrangians with symmetries 
are important examples of $L_\infty$-structures 
derived from “open” algebra settings, though the 
$L_\infty$-structures were recognized quite a while after 
the constructions. 

The BFV setting is that of a symplectic manifold 
$W$ with a family of constraints, that is, a family of 
functions $\phi_\alpha \in C^\infty(W)$. The constraints are called 
“first class” if the ideal they generate is closed under 
the Poisson bracket. The vector space spanned by 
the constraints will in general be an open algebra; 
the structure of the bracket is given by structure 
functions, rather than structure constants. The zero 
locus of all the constraints forms the constraint 
surface $V$. In the first-class case, the constraints are 
in involution and determine a foliation $\mathcal{F}$ of $V$. If the 
space of leaves $V/\mathcal{F}$ is a manifold, it would be 
considered the true physical space and the physical 
observables would be functions in $C^\infty(V/\mathcal{F})$. BFV 
construct a differential graded Poisson algebra such 
that the cohomology in degree 0 agrees with 
$C^\infty(V/\mathcal{F})$ when that makes sense and, in the regular 
case, the rest of the cohomology is that of the 
differential forms along the leaves of the foliation. 
The BFV differential is a deformation of the 
Chevalley–Eilenberg/BRST differential and can be 
constructed most efficiently by the same techniques 
used in proving Kadeishvili’s inheritance theorem. 
Crucially, it is an inner derivation with respect to 
the Poisson bracket. After the fact, an $L_\infty$-structure 
can be observed in the extended algebra. 

For a Lagrangian with symmetries, BV develop a 
similar construction, the main difference being that 
there is no Poisson bracket initially, but one is 
constructed by adjoining “anti-fields” as conjugate to 
the fields but of ghost degree −1 and the differential of 
an anti-field being the Euler-Lagrange expression for 
the corresponding field. Then, as in the Hamiltonian 
case, ghosts and anti-ghosts, etc. are adjoined and the 
construction proceeds in a parallel fashion. 


Deformation Quantization 


Once algebras over an operad $\mathcal{P}$ are considered, it is 
natural to consider also morphisms of such algebras 
over a fixed $\mathcal{P}$. 

From a homotopy point of view, the appropriate 
maps need not respect the operad structure strictly 
but only up to higher homotopy; indeed, there is a 
related operad to define such maps. For $L_\infty$-algebras, 
such $L_\infty$-maps play a key role in deformation 
quantization. That refers to deformation of the 
commutative multiplication of a Poisson algebra in 
the direction of the Poisson bracket; that is, to first 
order, the deformation is given by the bracket. 

More generally, for any associative algebra $A$ with 
multiplication $m$, one considers formal deformations 

$a \star b = m(a, b) + t\, m_1(a, b) + t^2 m_2(a, b) + \cdots$   [12] 

where each $m_i \in \mathrm{Hom}(A \otimes A, A)$. The associativity 
of $\star$ provides a sequence of constraints on the $m_i$. In 
particular, $m_1$ must be a Hochschild cocycle and the 
obstruction to the existence of $m_2$ is a class in the 
Hochschild cohomology of degree 3. In fact, the 
primary obstruction is represented by $[m_1, m_1]$. If it 
is cohomologous to zero, that fact identifies candidates 
for $m_2$, that is, 

$[m_1, m_1] = \pm 2[m, m_2]$   [13] 

or, using the notation $d = [m, \ ]$, 

$dm_2 \pm \tfrac{1}{2}[m_1, m_1] = 0$   [14] 


once known as the integrability equation but now, 
more frequently, as a Maurer—Cartan equation. For 
a Poisson algebra, the Poisson bracket is a Hochs- 
child cocycle but in general a full deformation need 
not exist. However, for the algebra A of smooth 
functions on a Poisson (e.g., symplectic) manifold 
M, Kontsevich showed that such a full formal 
deformation does exist. 
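
The constraints on the first two deformation terms can be read off by expanding associativity of the deformed product in powers of $t$. The following is a worked sketch for an ordinary (ungraded) associative algebra, writing $ab$ for $m(a, b)$; the explicit signs below are those of this ungraded case and are not taken from the text, which leaves signs as $\pm$.

```latex
\begin{align*}
\text{order } t^1:\quad & m_1(ab, c) + m_1(a, b)\,c - m_1(a, bc) - a\,m_1(b, c) = 0, \\
\text{order } t^2:\quad & m_2(ab, c) + m_2(a, b)\,c - m_2(a, bc) - a\,m_2(b, c)
                          = m_1(a, m_1(b, c)) - m_1(m_1(a, b), c).
\end{align*}
```

The first line says that $m_1$ is a Hochschild 2-cocycle. The second says that the Hochschild coboundary of $m_2$ must equal a cubic expression built from $m_1$ alone, which is the statement that the class of $[m_1, m_1]$ has to vanish for $m_2$ to exist.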

The guiding philosophy is that deformations are 
controlled by a dg Lie or $L_\infty$-algebra $L$, unique up to 
$L_\infty$-homotopy equivalence. Therefore, the obstructions 
can be computed in any of the equivalent dg Lie 
algebras. Moreover, the structure of the obstructions 
is known sufficiently so that if there is an equivalent 
dg Lie algebra with $d$ in fact zero, then all the 
obstructions to deformation quantization vanish. The 
key to Kontsevich’s proof was the construction of an 
$L_\infty$-map, inducing an isomorphism in cohomology, 
from the Lie algebra of polyvector fields on $\mathbb{R}^n$ with 
the Schouten bracket and $d = 0$ to the Lie algebra of 
multidifferential operators on $A = C^\infty(\mathbb{R}^n)$ regarded as 
a subalgebra of the Hochschild cochain complex for $A$ 
with the Gerstenhaber bracket. 


BV Algebras 


In addition to their construction of a differential 
graded Gerstenhaber algebra (a differential graded 
commutative algebra with a compatible Poisson 
bracket of degree 1), BV introduced a new mathematical 
structure, adding a second-order differential 
operator $\Delta$ relating the commutative product and 
the bracket. The operator $\Delta$ is a derivation of the 
bracket and of square zero. Moreover, 

$[a, b] = \Delta(ab) - \Delta(a)b \pm a\Delta(b)$   [15] 

so that the failure of $\Delta$ to be a derivation of the 
product is given by the bracket. 

The definition of a BV algebra is then a Gerstenhaber 
algebra with such an operator, though alternative 
definitions exist in which $\Delta$ and the product are 
primary and the bracket is defined by the above 
equation. From the operadic/higher-homotopy point 
of view, one can then go on to consider $BV_\infty$ algebras. 

Recall that $A_\infty$-algebras and $L_\infty$-algebras (among 
others) can be characterized by an “inner” coderivation 
$d = [m, \ ]$ of square zero on an appropriate 
“standard” construction. In the context of BV 
algebras, where the bracket is more commonly 
written as $\{\ ,\ \}$, the classical action is an element $S_0$ 
such that $\{S_0, S_0\} = 0$ or, equivalently, $d = \{S_0, \ \}$ is of 
square zero. The quantum analog $S$ is a perturbation 
of $S_0$ and satisfies instead 

$\{S, S\} = \Delta S$   [16] 

This was originally called the “master equation,” 
but now is increasingly referred to as a “Maurer–Cartan” 
equation. 


Insertion Operads 


There is another class of operads illustrated by trees 
(and more generally graphs) with a very different 
sort of “composition,” namely insertion of one 
graph into another. The most directly relevant to 
physics is the kind of insertion used by Connes and 
Kreimer in their Hopf algebra constructed for 
renormalization of Feynman diagrams. For example, 
consider all finite graphs with exactly two external 
edges and internal numbered edges. Given two 
graphs $\Gamma_1, \Gamma_2$, define $\Gamma_1 \circ_i \Gamma_2$ by cutting edge $i$ of 
$\Gamma_1$ and identifying the dangling edges with the two 
external edges of $\Gamma_2$. 

For planar trees, yet another insertion operad is 
obtained by Chapoton, isolating a part of a structure 
due to Kontsevich, in which a small neighborhood 
of a vertex of the second planar tree is removed and 
the dangling edges are attached to a vertex of the 
first tree by entering through the angles between the 
edges at that vertex (Figure 7). 

Inside the HG-operad is the operad Brace for an 
abstract brace algebra (forgetting the cup product), 
first described as such by Chapoton using the 
insertion operations of Kontsevich and Soibelman. 


$A_\infty$-Categories 


Also of importance for applications to mathematical 
physics is the notion of an $A_\infty$-category, first made 
explicit by Fukaya and now playing a major role in 
string D-brane theory and homological mirror 
symmetry. The D-branes are the objects of the $A_\infty$-category 
and the open strings with boundaries on 
two (possibly equal) D-branes $B_1, B_2$ are the 
morphisms from $B_1$ to $B_2$. The operations $m_i$ are 
defined only on tuples $(a_1, \ldots, a_i)$ of “composable” 
morphisms (e.g., strings). 


PROPs 


While an operad is an abstraction of a family of 
composable functions of n variables for various n, a 
PROP is an abstraction of a family of functions in 
$\mathrm{Hom}(A^{\otimes p}, A^{\otimes q})$ for all $p$ and $q$. Now the relevant 
images are graphs with $p$ input legs and $q$ output 
legs with composition being defined by grafting 
output legs of one graph to inputs of another. 
Feynman diagrams are the obvious example in 
physics or, in conformal field theory, tubular 
neighborhoods of such graphs, which is to say, 
Riemann surfaces with boundary circles: $p$ as 
inputs and $q$ as outputs. 

Figure 7 Angles determined by edges with leaves extended to 
the semicircle. 


See also: Algebraic Approach to Quantum Field Theory; 
Batalin–Vilkovisky Quantization; Constrained Systems; 
Deformations of the Poisson Bracket on a Symplectic 
Manifold; Deformation Quantization; Deformation Theory; 
Hopf Algebra Structure of Renormalizable Quantum Field 
Theory; String Field Theory. 


Applied Mathematics, Notes by Seidel P, vol. 184, pp. 9-32. 
New York: Marcel Dekker, Inc. 

Gerstenhaber M (1963) The cohomology structure of an 
associative ring. Annals of Mathematics 78: 267-288. 

Getzler E (1995) Operads and moduli spaces of genus 0 Riemann 
surfaces. In: Dijkgraaf R, Faber C, and van der Geer (eds.) The 
Moduli Space of Curves, Progr. Math., vol. 129, pp. 199-230. 
Boston: Birkhäuser. 

Getzler E and Jones JDS (1993) n-algebras and Batalin–Vilkovisky 
algebras. Preprint. 

Getzler E and Jones JDS (1994) Operads, homotopy algebra and 
iterated integrals for double loop spaces. Preprint, Department 
of Mathematics, MIT; Department of Mathematics, Northwestern 
University, March 1994, hep-th/9403055. 

Getzler E and Kapranov M (1994) Modular operads. Compositio 
Mathematica 110: 65-126 (dg-ga/9408003). 

Hinich V and Schechtman V (1993) Homotopy Lie algebras. 
Advanced Studies in Soviet Mathematics 16: 1-18. 

Lada T and Markl M (1995) Strongly homotopy Lie algebras. 
Communications in Algebra 2147-2161 (hep-th/9406095). 

Lada T and Stasheff JD (1993) Introduction to sh Lie algebras for 
physicists. International Journal of Theoretical Physics 32: 
1087-1103. 

Markl M, Shnider S, and Stasheff J (2002) Operads in Algebra, 
Topology and Physics, Mathematical Surveys and Monographs, 
vol. 96, MR 2003f: 18011. Providence, RI: American 
Mathematical Society. 

May JP (1972) The Geometry of Iterated Loop Spaces, Lecture 
Notes in Mathematics, vol. 271. Springer. 

Stasheff J (1963) Homotopy associativity of H-spaces, I, II. 
Transactions of the American Mathematical Society 108: 
293-312, 313-327. 

Voronov AA (1999) The Swiss Cheese Operad, Contemp. Math. 
vol. 239, pp. 365-373. Providence, RI: American Mathematical 
Society. 

Zwiebach B (1993) Closed string field theory: quantum action 
and the Batalin–Vilkovisky master equation. Nuclear Physics 
B 390: 33-152. 

Zwiebach B (1998) Oriented open–closed string theory revisited. 
Annals of Physics 267: 193-248 (hep-th/9705241). 


Quantum Field Theory 


Introduction 


The operator product expansion (OPE) provides an 
algebraic structure in quantum field theory. In a 
sense it supersedes or rather transcends the equal-time 
commutation relations, which provide the 
traditional starting point for the canonical quantization 
of any quantum field theory. The essential idea 
is that for any two local operator quantum fields at 
spacetime points $x_1, x_2$ their product may be 
expressed in terms of a series of other local quantum 
fields at a point $x$, which may be identified with $x_1$ 
or $x_2$, times c-number coefficient functions which 
depend on $x_1 - x_2$. The set of operators which may 
appear depends on the particular quantum field 
theory and must of course be in accord with any 
requirements of conserved quantum numbers. The 
coefficient functions depend on $x_1 - x_2$ in a fashion 
which depends on the dimensions of the various 
operators involved, at least up to renormalization 
group corrections. The most singular contributions 
are those for the operators appearing in the OPE 
with lowest scale dimension. From a phenomenological 
point of view, only the first few terms in the 
OPE are of relevance. However, theoretically, 
especially for conformal field theories, it is desirable 
to know the full expansion to all orders in powers of 
$x_1 - x_2$ in such a way that the operator product may 
be replaced by the full expansion in appropriate 
correlation functions. We first discuss the OPE for 
free theories and then the interacting case. 


Free Field Theory 


The OPE is most straightforward in free field theory, 
where it almost reduces to a Taylor series expansion. 
For a simple free massless scalar field $\phi(x)$ in 
four dimensions we may write 

$\phi(x)\phi(0) = \frac{C}{x^2} + {:}\phi(x)\phi(0){:}$   [1] 

where $:\ :$ denotes normal ordering (moving all 
annihilation operators to the right of creation 
operators) and $C$ is just a normalization numerical 
constant (for canonical normalization $C = 1/4\pi^2$). 
The $1/x^2$ term proportional to the identity operator 
reflects the leading singular behavior at short 
distances of $\phi(x)\phi(0)$, the power being determined 
by $\phi$ having dimension 1. For the normal-ordered 
term we may expand in terms of an infinite set of 
local operators by using the Taylor expansion 

${:}\phi(x)\phi(0){:} = \sum_{n=0}^{\infty} \frac{1}{n!}\, x^{\mu_1} \cdots x^{\mu_n}\, {:}\partial_{\mu_1} \cdots \partial_{\mu_n}\phi(0)\, \phi(0){:}$   [2] 

where the operator appearing in the $n$th term has 
dimension $n + 2$. Manifestly at short distances only the 
leading terms are relevant. Equation [1] also provides a 
point splitting definition of the local composite 
operator ${:}\phi^2(0){:}$ in terms of the limit of $\phi(x)\phi(0)$ as $x \to 0$ 
after subtraction of the singular $C/x^2$ term. 

The OPE can be easily generalized to composite
operators defined by normal ordering. For :φ²: we
have, by applying Wick's theorem,

    {:}\phi^2(x){:}\;{:}\phi^2(0){:} = \frac{2C^2}{x^4} + \frac{4C}{x^2}\, {:}\phi(x)\phi(0){:} + {:}\phi^2(x)\phi^2(0){:} \qquad [3]

where a Taylor series expansion may be applied to both
:φ(x)φ(0): and :φ²(x)φ²(0): to give an infinite
sequence of local operators of increasing dimensions.
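
The numerical coefficients in [3] are contraction counts. As a minimal sketch (the function name `wick_cross_terms` is illustrative and not part of the original text), the number of ways to contract k fields of :φⁿ(x): with k fields of :φᵐ(0): is C(n,k)·C(m,k)·k!, and each contraction contributes one factor C/x²:

```python
from math import comb, factorial


def wick_cross_terms(n: int, m: int) -> dict[int, int]:
    """Count contractions between :phi^n(x): and :phi^m(0):.

    Key k is the number of cross-contractions (each gives a factor C/x^2);
    the value is the combinatorial weight of that term in the OPE.
    Contractions inside one normal-ordered factor vanish and are not counted.
    """
    return {
        k: comb(n, k) * comb(m, k) * factorial(k)
        for k in range(min(n, m) + 1)
    }


# For :phi^2(x): :phi^2(0): this reproduces the weights 1, 4 and 2 in [3].
print(wick_cross_terms(2, 2))
```

The k = 2 entry multiplies (C/x²)², the k = 1 entry multiplies (C/x²):φ(x)φ(0):, and the k = 0 entry is the fully normal-ordered term.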

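The expansion in terms of local operators may be
reordered. For instance, from [1] we may write,
using ∂²φ = 0,

    \phi(x)\phi(0) = \frac{C}{x^2} + \left(1 + \tfrac{1}{2} x^\mu \partial_\mu + \tfrac{1}{4} x^\mu x^\nu \partial_\mu \partial_\nu - \tfrac{1}{16} x^2 \partial^2\right) {:}\phi^2(0){:} - \tfrac{1}{2} x^\mu x^\nu\, T_{\mu\nu}(0) + O(x^3) \qquad [4]

where

    T_{\mu\nu} = {:}\partial_\mu\phi\, \partial_\nu\phi{:} - \tfrac{1}{4} \eta_{\mu\nu}\, {:}\partial^\rho\phi\, \partial_\rho\phi{:} \qquad [5]

is the energy-momentum tensor. In [4], and also in a
similar context subsequently, we define ∂:φ²(0): =
∂_y :φ²(y): at y = 0. The expansion [4] provides a point
splitting definition of T_{μν}, and also demonstrates that
many operators appearing in the OPE are expressible
in terms of overall derivatives of lower-dimension
operators. We may also note that without
further input there is an ambiguity in the definition
of T_{μν} of the form

    T_{\mu\nu} \to T_{\mu\nu} + a \left(\partial_\mu \partial_\nu - \tfrac{1}{4} \eta_{\mu\nu} \partial^2\right) {:}\phi^2{:} \qquad [6]

In a conformal theory, however, we require
a = −1/6.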


Interacting Theories 


The OPE becomes an essential tool in the context 
of interacting quantum field theories. For renorma- 
lizable quantum field theories various results can be 
proved to all orders in the standard perturbative 
expansion and are naturally assumed to be proper- 
ties of the complete theory. In interacting theories 
we may no longer use normal ordering to define 
composite operators which, in general, have anom- 
alous dimensions. The coefficient functions appear- 
ing in the OPE also gain perturbative corrections but 
these are constrained by renormalization group 
(RG) Callan-Symanzik equations. 

Again, if we consider the simplest case of a massless
scalar theory as above, but now with a renormalized
coupling constant g, the leading terms in the expansion
of φ(x)φ(0) are of the form (here we assume a Z₂
symmetry under φ → −φ; otherwise the operator φ
would be expected to appear in the OPE)

    \phi(x)\phi(0) = \frac{C(g, \mu^2 x^2)}{x^2} + D(g, \mu^2 x^2)\, \phi^2(0) + \cdots \qquad [7]

where μ is an arbitrary renormalization scale. This
arbitrariness is reflected in the RG equation

    \left(\mu \frac{\partial}{\partial \mu} + \beta(g) \frac{\partial}{\partial g} + 2\gamma_\phi(g)\right) C(g, \mu^2 x^2) = 0 \qquad [8]

At a fixed point β(g*) = 0 this equation may be
solved with an arbitrary choice of normalization to
give C(g*, μ²x²) = (μ²x²)^(−γ_φ(g*)), which corresponds
to the field φ having a modified scale dimension
1 + γ_φ(g*). In a similar fashion the coefficient
D(g, μ²x²) in [7] satisfies

    \left(\mu \frac{\partial}{\partial \mu} + \beta(g) \frac{\partial}{\partial g} + 2\gamma_\phi(g) - \gamma_{\phi^2}(g)\right) D(g, \mu^2 x^2) = 0 \qquad [9]


where it is necessary to introduce a new anomalous
dimension function γ_{φ²}(g) related to the composite
operator φ². Although it is natural to label the
operator as φ², its definition in terms of the
elementary field φ is essentially only as given in
terms of the OPE [7]. At a fixed point again
D(g*, μ²x²) = k (μ²x²)^(−γ_φ(g*) + (1/2)γ_{φ²}(g*)), where the
coefficient k is determined by the scale of the
three-point function ⟨φ(x)φ(y)φ²(0)⟩. In asymptotically
free theories the RG equations show that at
short distances the coefficient functions tend to
those of free field theory but with calculable
logarithmic corrections. More generally, for a set
of operators {O_i} the OPE has the form


ay >, Cin(g, u”x*)OR(0) [10] 
k 


where p is determined by the free scale dimensions 


of the O; and 

O o 
apt Hg) Soe 
— =a M 


=a) Ce a) [11] 


with Yin(g) the anomalous dimension matrix arising 
from the mixing of composite operators. 
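
The fixed-point solutions quoted for [8] and [9] can be checked numerically. The following is a minimal sketch: the anomalous dimension values, the constant `K` and all function names are illustrative assumptions, and β = 0 is imposed by hand, so only the μ ∂/∂μ term and the anomalous-dimension term survive.

```python
def rg_residual(coeff, mu, x2, gamma_total, h=1e-6):
    """Residual of (mu d/dmu + gamma_total) coeff = 0 at a fixed point (beta = 0)."""
    # Central difference with a relative step: approximates mu * d(coeff)/d(mu).
    mu_dcoeff = (coeff(mu * (1 + h), x2) - coeff(mu * (1 - h), x2)) / (2 * h)
    return mu_dcoeff + gamma_total * coeff(mu, x2)


# Hypothetical fixed-point values, chosen only for illustration.
GAMMA_PHI = 0.03    # gamma_phi(g*)
GAMMA_PHI2 = 0.10   # gamma_{phi^2}(g*)
K = 1.7             # normalization k of D, fixed by the three-point function


def C_fixed(mu, x2):
    """C(g*, mu^2 x^2) = (mu^2 x^2)^(-gamma_phi)."""
    return (mu**2 * x2) ** (-GAMMA_PHI)


def D_fixed(mu, x2):
    """D(g*, mu^2 x^2) = k (mu^2 x^2)^(-gamma_phi + gamma_{phi^2}/2)."""
    return K * (mu**2 * x2) ** (-GAMMA_PHI + 0.5 * GAMMA_PHI2)


res_C = rg_residual(C_fixed, mu=2.0, x2=0.5, gamma_total=2 * GAMMA_PHI)
res_D = rg_residual(D_fixed, mu=2.0, x2=0.5, gamma_total=2 * GAMMA_PHI - GAMMA_PHI2)
print(res_C, res_D)  # both vanish up to finite-difference error
```

Since C/x² then scales as (x²)^(−1−γ_φ), the field φ has scale dimension 1 + γ_φ(g*), as stated in the text.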

An important aspect of the OPE is that the
coefficient functions may be calculated perturbatively,
by applying the OPE in some suitable
correlation function. In essence, the OPE provides a
factorization between short-distance UV singularities
and nonperturbative effects. In a Feynman graph the
short distances in an operator product correspond to
the large-momentum behavior, and power-counting
theorems allow a factorization up to calculable
logarithmic corrections. A detailed analysis depends
on the technicalities of the proofs of renormalization
to all orders of perturbation theory.

The coefficient functions in the OPE should be 
independent of any infrared or nonperturbative long- 
distance effects (such as confinement in QCD). 
However, the operators which appear in the OPE, 
such as ¢* above, may have nonzero expectation 
values which are absent to all orders in perturbation 
theory. 


Oj(x)O;(0) ~ 


Ciin (g, u° x *) — Yin (2) Cre (8; ie) 


Perturbative Example 


The general considerations can be illustrated by
considering a scalar field theory to lowest order in a
perturbative expansion. We consider a four-dimensional
theory with a single scalar field φ and a
potential V(φ) = ½m²φ² + (1/4!)gφ⁴. Using dimensional
regularization, m², as well as g, is treated as a
coupling with an associated β-function γ_{φ²}(g)m².
With a mass term the operator φ² mixes with the
identity operator so that


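    \left(\mathcal{D} + \gamma_{\phi^2}(g)\right) \langle\phi^2\rangle = -\gamma_{21}(g)\, m^2, \qquad \mathcal{D} = \mu \frac{\partial}{\partial \mu} + \beta(g) \frac{\partial}{\partial g} + \gamma_{\phi^2}(g)\, m^2 \frac{\partial}{\partial m^2} \qquad [12]


where γ₂₁ reflects the mixing. At one loop order we
have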


g 1 
1672” AIE) a) [13] 
and we may also set γ_φ(g) = 0. In this case, in the
operator product expansion [7] the coefficient C
also depends on m²x², and the RG equations [8] and
[9] are now modified to include the effects of mixing


Ye? (g) = 


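    \mathcal{D}\, C(g, m^2 x^2, \mu^2 x^2) = m^2 x^2\, \gamma_{21}(g)\, D(g, \mu^2 x^2), \qquad \left(\mathcal{D} - \gamma_{\phi^2}(g)\right) D(g, \mu^2 x^2) = 0 \qquad [14]

Here \mathcal{D} is the RG differential operator introduced in [12],
written in script to distinguish it from the coefficient D.
From lowest order perturbation theory with [13],
and using [14] to include all orders in g ln μ²x², we
have in this approximation
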
C(g, mx, ux?) 
1 2mx? 3g ae 
rn 1 l 2a 2 
m g ( Tp 04 * r) 
[15] 
The operator product expansion then reproduces the
small x behavior of the two point function ⟨φ(x)φ(0)⟩ at
one loop, expanding C, D to first order in g, if we take


3 
Dig, pa?) = (14+ 5% 


m2 
(9°(0)) = -5n E + O(8) 16 


which is in accord with [12]. If m² < 0 the symmetry
φ → −φ is broken and it is necessary to shift the field,
φ = v + f, with v² = −6m²/g; the field f has a
mass m_f with m_f² = −2m². The operator product
expansion [7] with the same coefficient functions as in
[15] remains valid. The two point function ⟨φ(x)φ(0)⟩,
which includes a nonperturbative term v², is again
reproduced for small x at one loop now if
6m m, u 
2 
0)) = -—— — = ln— + O 17 
POs- nioe 07 


but in this case it is necessary to expand D(g, μ²x²)
to O(g²) as a consequence of the leading 1/g term in
[17]. Note that both [16] and [17] contain the
nonperturbative dependence on ln m² and ln m_f²
which is present in the two point function.
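
The shifted vacuum used above follows directly from the potential V(φ) = ½m²φ² + (1/4!)gφ⁴. A minimal numerical sketch, with parameter values chosen only for illustration, checks that v² = −6m²/g is a stationary point for m² < 0 and that the curvature there is m_f² = −2m²:

```python
def dV(phi: float, m2: float, g: float) -> float:
    """First derivative of V = m2*phi^2/2 + g*phi^4/24."""
    return m2 * phi + g * phi**3 / 6.0


def d2V(phi: float, m2: float, g: float) -> float:
    """Second derivative of V; at the minimum it equals the squared mass of f."""
    return m2 + g * phi**2 / 2.0


# Illustrative values (assumptions): negative mass-squared, weak coupling.
m2, g = -1.0, 0.6

v = (-6.0 * m2 / g) ** 0.5   # v^2 = -6 m^2 / g
mf2 = d2V(v, m2, g)          # expected: -2 m^2

print(dV(v, m2, g), mf2)
```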
Conformal Field Theories


When the β-function vanishes and a quantum field
theory enjoys conformal invariance, the operator
product expansion is a potentially convergent
expansion. It is natural to restrict to conformal
quasiprimary operators, which do not mix with
operators of lower scale dimension under conformal
transformations. If we consider, for instance, two scalar
operators φ with scale dimension Δ, then the OPE
has the generic form


1 1 
(x) 0(0)= aa, + LCov0! RGR ATED 


0) [18] 


where there is a sum over quasiprimary operators 
O! u With scale dimension A’ and spin 4, so they 
are symmetric traceless tensors of rank £. In the first 
term in [18] the coefficient is chosen to be 1 by a 
choice of normalization. The coefficients Cgo! 
with a standard normalization for O!, are then 
determined by the coefficients of the corresponding 
three-point functions involving ¢¢ and O!. In [18] 
ey are differential operators which sum up the 
contributions of all derivatives or descendants of the 
quasiprimary operator O!. They can be explicitly 
given in terms of an integral representation, for any 
spacetime dimension, where the scale is fixed by 
requiring for the leading term C0) = 
xii ...x — traces. The spectrum of operators 
which appear is obviously a property of the 
particular conformal field theory. 


Ward Identities 


If the theory has a symmetry with corresponding
conserved currents then there are Ward identities
which constrain the OPE of fields with the conserved
current. For a current J_{μa}, in d dimensions, the
singular contribution in the OPE is given by

    J_{\mu a}(x)\, O(0) \sim -\frac{1}{S_d}\, \frac{x_\mu}{(x^2)^{d/2}}\, t_a\, O(0) \qquad [19]

where t_a are a set of matrix generators corresponding
to the symmetry acting on the fields O and S_d is
the volume of the unit (d − 1)-dimensional sphere,
S_d = 2π^(d/2)/Γ(d/2). For a conserved current there are no
anomalous dimensions and the coefficient in [19],
which depends on the normalization for the current
J_{μa}, is chosen so that [Q_a, O(0)] = −t_a O(0) with Q_a
the charge formed from J_{μa}. For the energy-momentum
tensor there is an analogous result. We consider the
simpler case of a conformal theory, when the
energy-momentum tensor is both conserved and traceless and

    T_{\mu\nu}(x)\, O(0) \sim A_{\mu\nu}(x)\, O(0) + B_{\mu\nu\lambda}(x)\, \partial_\lambda O(0) + \cdots \qquad [20]

where A_{μν}(x) = O(x^(−d)) and B_{μνλ}(x) = O(x^(−d+1)). As a
distribution A_{μν}(x) is ambiguous up to terms
proportional to δ^d(x). If Δ is the scale dimension
of O and s_{μν} are the Lorentz spin generators acting
on O, the Ward identities then give


A 
MA w(x) = ($ a Capt tsn) O64 (x) 


A p(x) = Ct a) 


2il 
OMB w(x) = — nad (x) 5 


where C,, is a constant tensor reflecting the 
arbitrariness in A,,, it is immaterial as far as Ward 
identities are concerned. We may choose 


A 
grat Cr = 0 [22 


(If desired, we might also take Ai (Xx) = Ap (x) + 
(1/2)s,64(x) in which case ð A(x) = 0, Ajy (%) = 
(1/2)s,,64(x) but such an antisymmetric piece seems 
unnatural). In general there is no unique form for 
A w(x), as a consequence of the freedom of choice 
for Caa in [21]. However, for a scalar field O we 
must have, for x Æ 0, 


= A 1 ey 1 
Awha) d—158, (hw x? ) (ga 
ey, 23] 


(d—1)(d— 2) Sa ”” (2/241 


with the overall scale determined by [21]. 

For the operator product of the current J,a with 
itself there is an additional term proportional to the 
identity operator of the form 

Cy 


1 
Jaala) Côa (Mu - 2-25") zam PA 


where the coefficient Cz, which determines the scale 
of the two-point function for Jaa, is well defined 
since the normalization of the current is determined 
through the Ward identity. A similar result also 
holds for the operator product of the energy- 
momentum tensor with itself, with an overall 
coefficient Cr. In general, we may also write for 
the operator product of two scalar fields O: 


1 Co dA 1 
O(x)O(0)~ Co ee Ce gai (2) D 
KP I m0) |25] 
neglecting other contributions. The contribution of
the energy-momentum tensor does not therefore
introduce any new coefficient.


Two Dimensions 


In two dimensions the OPE plays an essential role in
the discussion of conformal field theories. For a
Euclidean metric it is natural to use complex
variables z and z̄. The energy-momentum tensor in
this case reduces to a chiral field T(z) and its
conjugate T̄(z̄). For the operator product with a
chiral field φ(z) with scale dimension Δ,

    T(z)\, \phi(0) \sim \frac{\Delta}{z^2}\, \phi(0) + \frac{1}{z}\, \phi'(0) \qquad [26]

and, for the operator product of T with itself,

    T(z)\, T(0) \sim \frac{c}{2 z^4} + \frac{2}{z^2}\, T(0) + \frac{1}{z}\, T'(0) \qquad [27]

Here c is the Virasoro central charge, which plays a
critical role in the discussion of two-dimensional
conformal field theories; it is given by the two-point
function which follows from [27], ⟨T(z)T(0)⟩ = c/(2z⁴).

In simple rational conformal field theories the
operators are organized into conformal blocks by
the infinite-dimensional extended conformal symmetry
in two dimensions. This allows the full
spectrum of operators and their dimensions to be
determined and, in consequence, complete results for
the OPE to be found in many cases.


Further Remarks 


The OPE reflects the locality properties of quantum
field theories and can be extended without difficulty
to curved space backgrounds. For a product
φ(x)φ(0), the separation x² may be replaced by a
biscalar at x and 0, but it is necessary to include in
the OPE contributions involving the background
Riemann tensor as well as the operator fields present
in flat space. There is also a generalization of the
OPE for superfields on superspace.

At a fundamental level, although the OPE can be
derived to all orders in perturbation theory, the
contribution of nonperturbative effects such as
instantons to the coefficients is not entirely clear.
Issues of associativity have yet to be fully analyzed.

There are also important applications to the
phenomenological analysis of QCD, where assumptions
about the OPE and saturation of sum rules can
lead to results for the vacuum expectation value of
gauge-invariant operators such as F^{μν}F_{μν}.


See also: Boundary Conformal Field Theory; Effective
Field Theories; Quantum Chromodynamics;
Renormalization: General Theory; Renormalization:
Statistical Mechanics and Condensed Matter;
Two-Dimensional Models.


Introduction


Optical caustics are the bright forms created by the
focusing, natural or artificial, of light (Figure 1).
Special caustic points, called focuses, are produced
by stigmatic optical systems in order to visualize
objects. However, no special conditions are needed
to produce ordinary caustics. Every congruence of rays
generates a caustic, more or less intricate.
Caustics have been observed and described for a
long time, going back to antiquity. The name itself
was coined after the Greek root “kausticos” meaning
burning, expressing that a high energy
density is produced by the focusing of rays at a caustic
point. Conceptually, they appeared in the literature
as “evolutes,” “envelopes,” “centers of curvature,”
“focals,” etc. However, these different approaches,
often too restricted, were unable to clarify the
general properties of caustics, for instance, their
classification in generic types. This difficult question
was solved only recently in the framework of
singularity theory, which appeared in the second half
of the twentieth century (Whitney 1955, Thom
1956). Caustics are now understood as physical
realizations of Lagrangian singularities, and they
are often called optical singularities or optical
catastrophes.

Figure 1 Optical caustics may be produced by reflection (on window glasses) or by refraction (through the wavy surface of a
swimming pool). Here the light source, the Sun, has some angular extension and the caustic appears somewhat blurred.

The aim of this introductory article is to show in 
which sense caustics can be understood as singula- 
rities, and to present their main properties. 


The Physical Phenomenon 


Caustics are usually observed by interposing a screen
on the ray trajectories; their trace on the screen
forms a set of bright curves called “folds” (A2).
Across a fold, the number of rays passing through
a given point jumps by ±2. Two fold curves may
join at some point, forming there a tip called a cusp
(A3). A simple example is provided by the nephroid
that one sees in a cup of coffee when the light is
reflected off the cylindrical sides. In three-dimensional
(3D) space, the folds form surfaces
and the cusps form curves (Figure 2). For particular
positions of the screen, three other types of caustics
may be observed: the swallowtail (A4), the meeting
point of two cusp lines; the elliptic umbilic (D4⁻), the
meeting point of three cusp lines; and the hyperbolic
umbilic (D4⁺), where a cusp line tangentially meets a
fold surface (Figure 2). These five caustic types are
generic in the sense that any other type of caustic
point is unstable and decomposes into these generic
caustic points under small perturbations. The perfect
focus is an example of a nongeneric caustic point,
obtained by imposing a special symmetry. The
natural focusing of light, as in gravitational optics,
produces only generic caustics. A caustic point is
then a generalized focus. The caustic surface is a
complex surface in the 3D physical space, generally
self-intersecting and possessing singular lines A3
ending at singular points A4, D4⁻, or D4⁺.

Figure 2 The five generic types of caustics of the 3D space.

At the scale of the wavelength of the light, the
caustics have a more complex structure. Instead of
well-defined surfaces, lines and points, one observes
a system of interference fringes concentrated in
the vicinity of the geometrical caustic. Each type of
caustic point has its own diffraction pattern (also
called diffraction catastrophe) (Figure 3). These
interference systems are easily produced, for
instance, by focusing a coherent laser beam with a
corrugated glass or a water droplet. An important
feature is revealed by Gouy's experiment, in
which bright and dark fringes are inverted when the
rays are forced to pass through a focus (Guillemin
and Sternberg 1977). The experiment shows that the
wave undergoes a phase shift of π/2 when the
associated ray passes through a caustic point.

Figure 3 Interference fringes produced by the five generic
caustics of the 3D space (numerical simulation).

So, caustics are fundamental objects of both
geometrical optics and wave optics.


Modeling Caustics 


Because of the presence of a caustic, a congruence of
rays generally presents intersecting rays. At the
points of intersection, the coordinates q1, q2, q3 of
the physical space R³ are unable to distinguish the
various intersecting rays and they do not constitute a
convenient system of coordinates. It is then useful
to construct an abstract space in which the rays
are represented by nonintersecting curves. The initial
congruence is recovered by projecting the abstract
space into the physical one. All the models use this
type of construction, in which the properties of the
caustics are deduced from those of the projection.


Caustics as Envelopes of Rays 


In this geometrical modeling, each ray is labeled by
two parameters r1, r2, for instance, the coordinates on
the initial wave front W. A third coordinate r3
specifies the points along the ray, for instance,
by assigning their distance to W. Taken together,
these three coordinates represent the congruence of
rays, and define a 3D space, the source space
M = {r1, r2, r3}. By construction, the rays in M do
not intersect. The coordinates (q1, q2, q3) of the
current point P ∈ R³ along each ray depend
differentiably on the coordinates (r1, r2, r3) and define
a “projection” f : (r1, r2, r3) → (q1, q2, q3) from the
source space M into the physical space R³.

The caustic points correspond to the envelope of
the rays. At a caustic point P, the energy density
flowing along the rays becomes infinite, since the
small volume delimited by neighboring rays is
shrunk into a small surface at P. This behavior
may be simply expressed with the help of the
projection f: the rank rk of the derivative Df is
equal to 2 at the point representing P in M. This
motivates the following definition. Given a map
f : M → N, a point x ∈ M is said to be critical (or
singular) if the rank of the derivative Df is less than
the maximal possible value min(dim M, dim N).
Here, dim M = dim N = 3, and a critical point is a
point where rk < 3. The set Σ ⊂ M of the critical
points is called the singular set. The caustic C is the
image of the singular set: C = f(Σ). One also says
that the caustic points are the critical values of f.

In practice, the derivative Df is expressed by the
Jacobian matrix J = ∂(q1, q2, q3)/∂(r1, r2, r3) and the
singular set Σ is defined by solving the equation

    \det(J) = 0 \qquad [1]

If this equation permits one to express explicitly one
coordinate, say r3, as a function of the other two,
the caustic surface C is found in parametric form:
q1 = q1(r1, r2, r3(r1, r2)), etc. For a homogeneous
medium, equation [1] is of second degree in r3 and
the caustic is composed of two sheets which meet at
the umbilic points D4.
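
As an illustration of equation [1], consider the coffee-cup nephroid mentioned earlier, reduced to a 2D cross-section (an assumption made here for simplicity: parallel rays travelling along +x are reflected inside the unit circle, so the source space has only two coordinates, the angle θ of the reflection point and the distance t along the reflected ray). In this sketch the function names are illustrative; the Jacobian determinant works out to cos θ − 2t, so the singular set is t = cos θ / 2, and its image is the nephroid.

```python
import math


def ray_point(theta: float, t: float) -> tuple[float, float]:
    """Point at distance t along the ray reflected at angle theta on the unit circle.

    Incoming direction (1, 0) reflected at (cos theta, sin theta) leaves
    along (-cos 2 theta, -sin 2 theta).
    """
    return (math.cos(theta) - t * math.cos(2 * theta),
            math.sin(theta) - t * math.sin(2 * theta))


def jacobian_det(theta: float, t: float, h: float = 1e-6) -> float:
    """det d(q1, q2)/d(theta, t), estimated by central differences."""
    xp, yp = ray_point(theta + h, t)
    xm, ym = ray_point(theta - h, t)
    dx_dth, dy_dth = (xp - xm) / (2 * h), (yp - ym) / (2 * h)
    xp, yp = ray_point(theta, t + h)
    xm, ym = ray_point(theta, t - h)
    dx_dt, dy_dt = (xp - xm) / (2 * h), (yp - ym) / (2 * h)
    return dx_dth * dy_dt - dy_dth * dx_dt


def caustic_point(theta: float) -> tuple[float, float]:
    """Image of the singular set t = cos(theta)/2, i.e. a point of the caustic."""
    return ray_point(theta, math.cos(theta) / 2)


def nephroid(theta: float) -> tuple[float, float]:
    """Standard parametrization of the nephroid inscribed in the unit circle."""
    return ((3 * math.cos(theta) - math.cos(3 * theta)) / 4,
            (3 * math.sin(theta) - math.sin(3 * theta)) / 4)


theta = 0.7
print(jacobian_det(theta, math.cos(theta) / 2))  # vanishes on the singular set
print(caustic_point(theta), nephroid(theta))     # the two points coincide
```

Here equation [1] is of first degree in t because the cross-section has a single curvature; in the full 3D problem it is of second degree, as stated above.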

Equation [1] gives all caustic points independently
of their nature, that is, it does not distinguish
between A2, A3, A4, D4⁻, and D4⁺. A refinement
allows one to recognize different types of caustic
points. One defines the Thom-Boardman class Σ^i as
the points in M where Df has a kernel of dimension
i. Then one defines inductively the class Σ^{i,j} as
the class Σ^j of the restriction of f to Σ^i. Thus, Σ^0
represents the regular points (noncaustic points),
Σ^{1,0} the fold points A2, Σ^{1,1,0} the cusp points
A3, Σ^{1,1,1,0} the swallow-tail points A4, and Σ^{2,0} the
umbilics D4 (hyperbolic or elliptic). Altogether, the
classes Σ^I, I ≠ 0, form the singular set Σ.

The Thom-Boardman classes constitute a simple
and powerful tool for computing the structure of a
caustic. Each class is obtained by setting to zero some
functional determinants associated with the map f or
with its restriction to some class. However, the
method has the weakness of ignoring the
special nature of a set of rays: its Lagrangian
character. As a consequence, it is unable, for
instance, to distinguish between D4⁺ and D4⁻.


Caustics as Lagrangian Singularities 


As for mechanics, the natural framework for geometrical
optics is a phase space: the cotangent space
T*R³ = {p_i, q_i} of the configuration space R³ = {q_i}.
The phase space is characterized by its symplectic
structure, that is, the differential 2-form ω = Σ_i dp_i ∧
dq_i, which is nondegenerate and closed (dω = 0).

A set of rays in the phase space is defined by
specifying the wave vector (or momentum) p at
each point q of the congruence. In the simple case
where only one ray passes through each point, one
has p = ∇S, where S is the optical length ∫ n ds and
n the refractive index. In other words, p is the
differential of the optical length. The wave vector
p is tangent to the ray and orthogonal to the
(geometrical) wave front S = const. The eikonal
equation shows that its modulus is n. As a direct
consequence of the relation p = ∇S, the symplectic
form vanishes identically for these p. However,
in general, because of the presence of the caustics,
one must not expect to have p = ∇S for some
function S. Nevertheless, it is possible to keep
the more general property that ω vanishes. This
motivates the definition of a Lagrangian submanifold:
a submanifold L ⊂ T*R³ of dimension 3
(that is, half of the dimension of the phase space)
on which the symplectic form vanishes: ω|_L = 0.
Every congruence of rays is described by a
Lagrangian submanifold. The Lagrangian submanifold
plays the same role as the source space in
the preceding section. The role of the projection f
is played by the natural projection π from the
phase space into the configuration space,
π(p, q) = q, or more precisely by its restriction to
L: f = π|_L. It is called a Lagrangian map (or
Lagrangian projection) and it is again a map
between two spaces of the same dimension (here
3). When L is given by an embedding ι : L → T*R³,
one has f = π ∘ ι. A caustic is then defined as the
set of critical values of a Lagrangian map.

There exist two remarkable results showing that a
Lagrangian submanifold may be described in terms
of functions or of families of functions. As a
consequence, caustics are not directly related to the
singularities of maps but, more particularly, to the
singularities of functions.


Generating function of a Lagrangian submanifold

The 3D Lagrangian submanifold L ⊂ {p_i, q_i} is locally
defined by three coordinates p_α (α ∈ A) and q_β (β ∈ B)
depending on the three other ones, q_α and p_β:
p_α = p_α(q_α, p_β), q_β = q_β(q_α, p_β). One can show
that this may be done in such a way that each
conjugate pair (q_i, p_i) gives exactly one independent
variable and one dependent variable. Formally:
A ∪ B = {1, 2, 3}, A ∩ B = ∅.

In fact, introducing the function S(q_α, p_β) =
∫⟨p, dq⟩ − ⟨q_β, p_β⟩ (⟨ , ⟩ denotes the scalar product),
the local equation for L takes a simpler form:

    p_\alpha = \frac{\partial S}{\partial q_\alpha}, \qquad q_\beta = -\frac{\partial S}{\partial p_\beta} \qquad [2]

The function S is well defined since, by the
definition of a Lagrangian submanifold, ∫⟨p, dq⟩ is
locally path independent: it depends only on its end
points. S is called a (local) generating function.
Formula [2] generalizes p = ∇S, to which it
reduces when B = ∅, that is, for nonintersecting rays.


Generating family and optical catastrophes

Formula [2] may be rewritten
in an interesting way. Taking the |B| variables p_β as
internal parameters x and q = (q_α, q_β) as external
parameters, we construct a function F of x parametrized
by q: F(x, q) = S(q_α, x) + ⟨q_β, x⟩. Now the
Lagrangian submanifold L is defined by

    L = \left\{ (p, q) : \exists x : \frac{\partial F}{\partial x} = 0,\; p = \frac{\partial F}{\partial q} \right\}

F is called the generating family. The first equation
∂F/∂x = 0 determines the rays passing through the
fixed external parameter q ∈ R³. The second one
distinguishes these rays according to their wave
vector p. Each ray corresponds to a critical point
(i.e., a stationary point) of F considered as a function of
x. At a caustic point, two infinitely close rays
converge and F then has a degenerate critical
point. So the generating-family technique links the
caustics to the theory of singularities of functions
depending on some parameters, that is, to
catastrophe theory (Thom 1969). Caustics are also
called optical catastrophes.

The generating families are not uniquely defined,
even locally. In optics, one may always take for F
the equivalent family “optical length” d, considered
as a function defined on the initial wave front W
(this is discussed in the following).
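
The simplest instance of this construction is the fold. As a minimal sketch (the specific choice S(x) = x³/3 with one internal parameter x and one external parameter q is an illustrative assumption, as are the function names), the generating family F(x, q) = x³/3 + qx has critical points where x² + q = 0, so the number of rays through q jumps by 2 across q = 0, where the critical point becomes degenerate:

```python
import math


def critical_points(q: float) -> list[float]:
    """Solutions x of dF/dx = x^2 + q = 0 for F(x, q) = x^3/3 + q*x.

    Each solution represents one ray through the external point q.
    """
    if q > 0:
        return []                 # shadow side: no ray
    if q == 0:
        return [0.0]              # on the caustic: two rays have merged
    root = math.sqrt(-q)
    return [-root, root]          # lit side: two rays


def second_derivative(x: float) -> float:
    """d^2F/dx^2 = 2x; it vanishes exactly at a degenerate critical point."""
    return 2.0 * x


for q in (-1.0, 0.0, 1.0):
    points = critical_points(q)
    print(q, points, [second_derivative(x) for x in points])
```

The wave vector of each ray is p = ∂F/∂q = x, so the two rays on the lit side are distinguished by their values of p, in line with the second defining equation of L.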


Caustics as the Locus of Wave Front Singularities 


There exists a remarkable duality linking rays and
wave fronts. As a consequence, the caustic points
(i.e., Lagrangian singularities) are related to singularities
of wave fronts (i.e., Legendrian singularities). A
typical wave front W may possess only two types of
singularities: cuspidal curves and swallow-tail points.
During the motion of W, governed by the eikonal
equation, the cuspidal curves generate surfaces, and
swallow tails generate curves. These surfaces are
exactly the fold surfaces of the caustic C and the
curves are the cusp lines of C. The point singularities
of the caustic, that is, the swallow tails and the
umbilics, correspond to bifurcations of the instantaneous
wave front, at certain moments of its motion.


Caustics as Short Wave Asymptotic 


Close observation of optical caustics shows
that they never appear as the well-defined surfaces
given by geometrical optics, but rather as
diffraction patterns concentrated around these surfaces.
So wave optics is the natural framework
to account for this fundamental feature. One
exploits the fact that the wave number k = 2π/λ
(λ: wavelength of the light) is a large parameter.
This short-wave approximation permits the use of
powerful expansion techniques and clarifies the
relation with the geometrical optics viewpoint,
formally obtained for k tending to infinity.


The stationary phase

In the simplest model,
the Huygens-Fresnel principle, the amplitude U(P)
of the optical field may be evaluated by adding the
secondary disturbances emitted from the points O of
some initial wave front W:

    U(P) = c \iint_W G\, \frac{e^{ikd}}{d}\, d\sigma \qquad [3]

where d is the distance OP, G is the inclination factor,
a smooth function defined on W, and c is some
prefactor. For simplicity, G and n (the refractive
index) are assumed to be constant. Defining
a = cG/d, formula [3] appears as an integral of the
form ∫ a(y) e^{ikφ(y)} dy. This type of integral may be
evaluated for large k by the method of stationary
phase. The principal contributions are due to points
where the phase φ is stationary: ∇φ = 0. For wave
optics, φ is the length PO, considered as a function of
O and parametrized by P. The stationary condition
means that PO is normal to W, that is, it represents a
ray of geometrical optics. The function PO is a
generating family in the sense of the discussion earlier.

If no stationary points exist, that is, if P is in the
shadow, the integral is O(k^{−N}) for any N. Otherwise,
and if the critical points are not degenerate,
the stationary-phase method gives (Guillemin and
Sternberg 1977):

U(P) = 2n ` e(1-t)m/2 
rays PO 
a(Q)e'*4 


FP A =) 
(1 — pid) (1 — pod) |'/* O(k~) [A 


where u! and u3! are the two principal radii of 
curvature at O € W, and # the number of caustic 
points (also called focal points) along the ray PO. 

In the stationary-phase approach, the caustic C, 
locus of centers of curvature of W, appears as an 
obstacle in constructing asymptotics, since formula [4] 
diverges when dy; — 1, that is, when P tends to C. 
It is, nevertheless, remarkable that C also appears 
explicitly when [4] is valid, via the us and {. In 
particular, the term e~#"/*, applied in the case of a 
focus (# = 2), accounts for the phase shift of m observed 
in Gouy’s experiment. 


Asymptotics on caustics

Uniform asymptotic formulas,
valid also on the caustic, need a more complex
theoretical framework, for instance, Maslov's theory,
presented here in a necessarily simplified version (see
Maslov and Fedoriuk (1981) for more detail).

The starting point is the equation of wave optics,
that is, the Helmholtz equation

    (\Delta + k^2 n^2)\, U = 0 \qquad [5]

where the refractive index n generally varies from
point to point. For k → ∞, one tentatively looks for an
asymptotic solution in the form:

    U(P) = e^{ikS(q_1, q_2, q_3)} \sum_{j=0}^{\infty} (ik)^{-j}\, \varphi_j(q_1, q_2, q_3) \qquad [6]

Inserting this form in eqn [5] one obtains the eikonal
equation (or characteristic equation) for the phase S:

    (\nabla S)^2 = n^2


and an infinite series of equations for the amplitudes
φ_j, called the transport equations. One knows that
the Cauchy problem for the eikonal equation may be
reduced to the integration of the corresponding
Cauchy problem for the Hamilton system (or
bicharacteristic system):

    \frac{dq}{dt} = \frac{\partial H}{\partial p} = 2p, \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial q} = \nabla n^2

where H = ⟨p, p⟩ − n². Its solutions, the bicharacteristics
q(t, ξ), p(t, ξ), are parametrized by the
“time” t and the 2D parameter ξ parametrizing the
points on the initial wave front W. The bicharacteristics
form a 3D Lagrangian submanifold L in the
phase space {p_i, q_i} and one recovers the preceding
situation. Assuming L to be simply connected, one
defines a global phase function S on L by the formula
S(t, ξ) = ∫⟨p, dq⟩.

In a domain Ω_i ⊂ L not containing the singular set
and in which the coordinates t, ξ are in a one-to-one
correspondence with the physical coordinates, S
becomes a function of q_i. Using the transport equation,
one finds the leading term of the asymptotic solution 
(with accuracy to k) in the following form: 


U(P) = (K(Qi)¢)(q) 
do 
dq; 


el5(91-92-93) o( gy, q2, q3) [7] 








where dσ and dq, respectively, represent the
measures on the Lagrangian submanifold and on
the physical space. The amplitude φ depends on the
initial conditions. Formula [7] defines a precanonical
operator K(Ω_i). It has the same form as [4], with
the same drawback of diverging near the singular set
Σ, where dq = 0.

In a domain $\Omega_j$ containing the singular set, $L$ is 
locally parametrized by mixed coordinates $q_\alpha, p_\beta$. 
The basic idea is then, roughly speaking, to carry 
out a Fourier transform $F_k$ with respect to these $p_\beta$ 
(in fact, a variant of the usual Fourier transform, in 
which the parameter $k$ appears in the prefactor and 
in the phase term). This leads one to consider, 
instead of $L = \Delta + k^2 n^2$, the operator $\tilde{L} = F_k L F_k^{-1}$, 
and instead of $U$, the unknown function $V = F_k U$. In 
this Fourier space, $V$ may be found in the same way 
as $U$ was found in the preceding case, with $S$ 
replaced here by the local generating function 
$S_j(q_\alpha, p_\beta) = S - \langle q_\beta, p_\beta \rangle$. Coming back to the real 
space by $F_k^{-1}$, one obtains (with the same accuracy): 


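$U(P) = (K(\Omega_j)\varphi)(q) = F_k^{-1}\left[ \left| \dfrac{d\sigma}{dp_\beta\, dq_\alpha} \right|^{1/2} e^{ikS_j(q_\alpha, p_\beta)}\, \varphi(q_\alpha, p_\beta) \right]$ [8] 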


There is no divergence in this local solution. So local 
short-wave asymptotics may be found everywhere, 
even on the caustic where they have a more complex 
form than the form [6] or [7]. 


Global asymptotics and Maslov’s index The global 
asymptotic solution is obtained by formally gluing 
the local solutions by a partition of unity $\sum_j e_j = 1$ 
subordinate to a covering $\{\Omega_j\}$ of $L$. However, there 
is a difficulty. The representations of the same 
precanonical operator in different local coordinates 
$q_\alpha, p_\beta$, even in domains not containing the singular set, agree 
only up to a constant multiplier $e^{i\pi m/2}$, where the 
integer $m$ is the number of negative eigenvalues of 
some matrix. One is led to multiply every precanonical 
operator by a convenient phase factor $e^{-i\pi\mu/2}$, 
where $\mu \in \mathbb{Z}_4$ is called Maslov’s index. The coherency 
of the phase factor in different domains is 
realized by using the important property of $\Sigma$ to be 
co-oriented. Thus, $\mu$ counts the number of passages 
of an oriented path on $L$ from the negative side of $\Sigma$ 
to its positive side, minus the number of passages in 
the opposite sense. Maslov’s index is locally constant 
and jumps by $\pm 1$ only across the singular set 
$\Sigma$. The global canonical operator is now formally 
defined as $K = \sum_j e^{-i\pi\mu_j/2} K(\Omega_j)\, e_j$. 

Finally, the canonical operator $K$ is well defined 
only if it is independent of the $\{\Omega_j\}$ and $e_j$ used for its 
definition. This possibility is expressed (in the case of a 
simply connected L) by the following property, 
intrinsically attached to L: the Maslov index cancels 
on every closed loop. So the only obstruction for global 
asymptotics is the nontriviality of the characteristic 
class defined by Maslov’s index and not the caustic. 

The central object of the caustic modeling is then 
the projection of the submanifold representing the 
rays (M or L) into the physical space. The possibility 
to reduce this projection to some normal form is the 
key result for the local classification of caustics. 


Local Classification of Caustics 
Equivalence, Stability, and Genericity 


In order to distinguish different types of singularities, 
one has to define an equivalence relation. Two 
Lagrangian maps $f_i : T^*M_i \supset L_i \to M_i$ ($i = 1, 2$) are 
said to be Lagrange equivalent if there is a 
diffeomorphism $\phi : T^*M_1 \to T^*M_2$ preserving both 
the symplectic and the fiber structures, and sending 
$L_1$ to $L_2$. In fact, only the local problem of 
classification makes sense, and one considers, 
instead of Lagrangian maps, germs of Lagrangian 
maps. A map germ is a map locally defined, that is, 
defined in an infinitely small neighborhood around a 
point (depending on the germ). The notion of 
Lagrange equivalence is extended to the germs. A 
Lagrangian singularity is then the Lagrange equiva- 
lence class of a germ at a critical point. Each 
equivalence class represents a type of Lagrangian 
singularity, that is, a type of caustic point. 

The example of the perfect focus point shows that 
there exist singularities which are totally unstable. In 
this sense, they correspond to idealized situations not 
physically realizable, and they have to be disregarded. 
Conversely, stable singularities resist under the action 
of small perturbations. They correspond to Lagrangian 
germs for which all neighboring germs are Lagrange 
equivalent (not necessarily at the same point, but near 
the point considered). 

Now the important question is: do the stable germs 
represent the generality? In the best case, stable germs 
form a dense open set. This means that every germ may 
be approximated by stable germs. In this case, one says 
that the stable germs are generic. 
Stability and genericity are distinct notions. It 
turns out that they coincide for low values of the 
dimension $n$ of the “physical space” ($n < 6$), but 
they may disagree at higher dimensions. 


Classification of Stable Caustics 


The fundamental result of the theory is the local 
classification of Lagrangian singularities (Arnol’d 
1972). With the help of the generating families, the 
study of Lagrangian singularities is reduced to the 
study of singularities of families of functions. More 
precisely, at a singular point, every stable Lagrangian 
map is equivalent to one of the following maps, 
given by their generating function S and by their 
generating family F: 


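$A_2$: $S = p_1^3$ 
     $F = x^3 + q_1 x$ 
$A_3$: $S = \pm p_1^4 + q_2 p_1^2$ 
     $F = \pm x^4 + q_1 x^2 + q_2 x$ 
$A_4$: $S = p_1^5 + q_2 p_1^3 + q_3 p_1^2$ 
     $F = x^5 + q_1 x^3 + q_2 x^2 + q_3 x$ 
$D_4^{\pm}$: $S = p_1^2 p_2 \pm p_2^3 + q_3 p_2^2$ 
     $F = x_1^2 x_2 \pm x_2^3 + q_3 x_2^2 + q_2 x_2 + q_1 x_1$ 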


These polynomial functions are called normal forms. 
The stable singularities are generic. In other words, 
every other type of singularity is destroyed by 
infinitely small perturbations and gives a set of 
singularities belonging to the list. The five generic 
caustics have been observed and experimentally 
studied in detail (Berry and Upstill 1980, Nye 1999). 

By inserting the normal forms S in a short-wave 
asymptotic, one obtains the diffraction patterns 
associated with the five caustic types (Figure 3). 
They generalize the Airy function which corresponds 
to the fold singularity. 

The normal forms describe at once the geometry 
of the caustics and the interference systems around 
them. 


Codimension, Corank, Multiplicity, and Index 


Lagrangian singularities are also characterized by 
some numbers. They have a codimension $c$ equal 
to the difference between the dimension of the 
physical space and their dimension: $c(A_2) = 1$, 
$c(A_3) = 2$, $c(A_4) = c(D_4^{\pm}) = 3$. They have a corank $ck$, 
equal to the difference between the dimension of the 
space and the rank of the Lagrangian map: 
$ck(A_2) = ck(A_3) = ck(A_4) = 1$, $ck(D_4^{\pm}) = 2$. The corank 
is the number of internal parameters of the generating 
family $F$. They also have a multiplicity $\mu$, which is the 
number of nondegenerate critical points of $F$, that is, 
the number of rays coinciding at the singularity. In 
the 3D space, one has $\mu = c + 1$: $\mu(A_2) = 2$, $\mu(A_3) = 3$, 
$\mu(A_4) = \mu(D_4^{\pm}) = 4$. 
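
The link between multiplicity and the number of rays can be checked numerically for the cusp. The sketch below assumes the generating family $F = x^4 + q_1 x^2 + q_2 x$ (the $+$ sign of the $A_3$ normal form) and counts the real critical points of $F$ in $x$, that is, the rays through the point $(q_1, q_2)$. The function names are illustrative. The expression $8 q_1^3 + 27 q_2^2$ follows from eliminating $x$ between $\partial F/\partial x = 0$ and $\partial^2 F/\partial x^2 = 0$; it vanishes exactly on the caustic.

```python
import numpy as np

def count_rays_cusp(q1, q2, tol=1e-9):
    """Number of real critical points of F(x) = x**4 + q1*x**2 + q2*x."""
    # dF/dx = 4 x^3 + 2 q1 x + q2
    roots = np.roots([4.0, 0.0, 2.0 * q1, q2])
    return int(np.sum(np.abs(roots.imag) < tol))

def cusp_discriminant(q1, q2):
    """Negative inside the cusp (three rays), positive outside (one ray)."""
    return 8.0 * q1**3 + 27.0 * q2**2
```

Inside the cusp three rays pass through each point, and they merge into one critical point of multiplicity 3 at the cusp point $q_1 = q_2 = 0$, in agreement with $\mu(A_3) = 3$.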

Short-wave asymptotics near the caustic present 
remarkable scaling properties (Berry and Upstill 
1980). In particular, the amplitude $|U(P)|$ increases 
like $k^{\beta}$ as $k \to \infty$. The number $\beta$ depends only on the 
type of the singularity and it is called the singularity 
index. The more “degenerate” the singularities, the 
larger the index, and then the brighter the caustic 
point: $\beta(A_2) = 1/6 < \beta(A_3) = 1/4 < \beta(A_4) = 3/10 < 
\beta(D_4^{\pm}) = 1/3$. 


Global Organization of Caustics 


The global properties of caustics are less well understood 
than the local ones. There is, nevertheless, 
an interesting result concerning specifically the 
caustics in the 3D space (Chekanov 1986). Given 
a Lagrangian map $f : L \to \mathbb{R}^3$, the Euler characteristic 
$\chi(\Sigma)$ of the singular set $\Sigma \subset L$ and the number 
$\# D_4(-1/2)$ of umbilics of index $-1/2$ are related 
by the formula 


$\chi(\Sigma) + 2\, \# D_4(-1/2) = 0$ [9] 


At an umbilic point $T$, $\Sigma$ is locally a cone with 
vertex at $T$. The index is defined according to the 
relative positions of the following elements: the 2D 
plane $\Pi = \ker f_*$, the cusp lines $A_3 \subset \Sigma$ passing 
through $T$, and the characteristic line $l$ which 
represents the ray at $T$. If $l$ and $A_3$ are separated 
by $\Pi$, the index is equal to $+1/2$, and to $-1/2$ in the 
other case. The index of an elliptic umbilic is always 
equal to $-1/2$. 

The validity of Chekanov’s formula [9] requires 
that L lies on a hypersurface E of the phase space, 
convex with respect to the wave vectors. The 
characteristics are the orthocomplements of E. In 
this framework, the singularities are called optical 
singularities, because such an E is always defined in 
geometrical optics by the eikonal equation. All 
Lagrangian singularities can be realized as optical 
singularities. Chekanov’s formula has been experi- 
mentally checked (Joets and Ribotta 1996). 

The Chekanov relation has an important conse- 
quence on the caustic bifurcations (also called 
metamorphoses or perestroikas), that is, the generic 
transformations modifying the topology of a caustic 
depending on one parameter. Among the 11 possible 
caustic bifurcations, considered as bifurcations of 
general Lagrangian singularities, four of them cannot 
be realized as bifurcations of optical Lagrangian 
singularities. So Chekanov’s relation reduces the 
number of optical metamorphoses to seven. 


Extensions 
Caustics in Spaces of Higher Dimension 


The local classification of Lagrangian singularities 
has been extended to spaces of higher dimension. 
For $n = 4$, in addition to the preceding ones, two 
new singularities appear: the butterfly $A_5$ and the 
parabolic umbilic $D_5$. For $n = 5$, in addition to $A_6$ 
and $D_6$, one has a new type of umbilic: $E_6$. 
However, in higher dimensions, the classification 
becomes more complex. In addition to stable 
singularities, like those of the series $A_i$, $D_i$, $E_i$, one 
encounters unstable generic singularities which 
depend on arbitrary parameters (moduli). Despite 
this difficulty, there exists a classification of generic 
Lagrangian singularities up to the dimension $n = 10$. 

The Maslov index has been extended in spaces of 
higher dimension and has led to the discovery of 
invariants associated with particular types of singu- 
larities (Vassilyev 1988). These invariants control 
the number of some types of singularities. For 
instance, in dimension $n = 4$, the number of $A_5$ 
(taking account of sign) is equal to zero. 


Symmetrical Caustics 


Another extension consists in imposing some 
constraint, for instance, a symmetry (Janeczko and 
Roberts 1993). Symmetrical caustics are not merely 
the symmetrized usual caustics. Many of them result 
from the stabilization of unstable singularities of 
higher codimension by the symmetry. For example, 
in the 3D space, the butterfly $A_5$ is unstable, but the 
symmetrical butterfly is a generic singularity in the 
class of Lagrangian singularities having the mirror 
symmetry. 


Nonoptical Caustics 


Caustics, as locus of focalization, are not restricted 
to the usual optics. They are also observed in 
electronic optics or in gravitational optics and the 
preceding results apply to these waves. They also 
appear in nonelectromagnetic waves, for instance, 
acoustic waves, seismic waves, etc. Propagation 
always generates caustics. 

Optical caustics are now understood as Lagrangian 
singularities and, as singularities, their interest 
is not restricted to optics. They have become indispensable 
for understanding other domains of mathematical 
physics, for instance, variational calculus, 
classical mechanics, the Hamilton-Jacobi equations, 
control theory, field theory, etc. 


See also: Billiards in Bounded Convex Domains; Normal 
Forms and Semiclassical Approximation; Stationary 
Phase Approximation; Singularity and Bifurcation Theory. 


Introduction 


According to the well-known “no-cloning theorem” 
(Wootters and Zurek 1982) perfect copying of 
quantum information is impossible, that is, there is 
no machine which takes a quantum system in an 
unknown state as input and produces two systems of 
the same kind, such that none of them is distinguish- 
able from the input by a statistical experiment. In 
this qualitative form, however, the theorem is not 
very useful, because in the presence of noise classical 
information cannot be copied perfectly as well. 
Therefore, the crucial point is that even under ideal 
conditions the errors produced in the clones cannot 
be made arbitrarily small. The best we can hope for 
is to find an optimal cloning device which makes 
these errors as small as possible. 

More generally, we can consider cloning devices, 
which take as input a certain number, N, of 
identically prepared systems, and produce a larger 
number, M, of systems as output. Again, the 
cloning task is to make the output state resemble 
as much as possible a state of M systems all 
prepared in the same state as the inputs. This 
variant of the problem is of interest as a “quantum 
amplifier.” It also has a better chance of reasonable 
success than a cloning device operating on single- 
input systems: in the limit of many-input systems, 
the device can make a good statistical estimate of 
the input density matrix and hence produce 
arbitrarily good clones. 


Figures of Merit 


To get a precise mathematical description of the 
problem, let us consider a one-particle Hilbert 
space $\mathcal{H}$ (which is assumed to be finite dimensional, 
$\mathcal{H} = \mathbb{C}^d$, if nothing else is explicitly stated) 
and the algebras $B(\mathcal{H}^{\otimes N})$, $B(\mathcal{H}^{\otimes M})$ of (bounded) 
operators on the $N$-fold, respectively $M$-fold, 
tensor product of $\mathcal{H}$. A quantum operation which 
takes $N$ particles as input and produces $M$ output 
particles is then described, in the Heisenberg 
picture, by a completely positive, unital map (a 
completely positive, unital and normal map if $\mathcal{H}$ is 
infinite dimensional): 


$T : B(\mathcal{H}^{\otimes M}) \to B(\mathcal{H}^{\otimes N})$ [1] 


while the Schrödinger picture representation is given 
in terms of the (pre-)dual of $T$, that is, 


$T_* : B_*(\mathcal{H}^{\otimes N}) \to B_*(\mathcal{H}^{\otimes M})$ [2] 


where $B_*(\cdot)$ denotes the space of trace-class 
operators. Hence, if $T$ operates on input systems in 
the (joint) state $\rho^{\otimes N}$, the output systems (i.e., the 
“clones”) are in the state $T_*(\rho^{\otimes N})$. We will call each 
such $T$ a cloning map. 

Now our aim is to find an operation $T$ such that 
the output state $T_*(\rho^{\otimes N})$ approximates the product 
state $\rho^{\otimes M}$ as well as possible. The quality of the 
approximation is measured by a distance function $\delta$ 
on the convex set $S(\mathcal{H}^{\otimes M}) \subset B_*(\mathcal{H}^{\otimes M})$ of density 
operators on $\mathcal{H}^{\otimes M}$ and, since it is impossible to 
minimize $\delta(T_*(\rho^{\otimes N}), \rho^{\otimes M})$ for all $\rho$ simultaneously, 
we are looking only for the worst case. Hence, the 
quality of a cloning map $T$ is measured by a figure 
of merit of the form 


$\Delta_{X,\delta}(T) = \sup_{\rho \in X} \delta(T_*(\rho^{\otimes N}), \rho^{\otimes M})$ [3] 

Here $X \subset S(\mathcal{H})$ is a set of “preferred” density 
operators whose role will be explained in the next 
section. An optimal cloning device is described by a 
cloning map $\hat{T}$ which minimizes $\Delta_{X,\delta}$, that is, 


$\Delta_{X,\delta}(\hat{T}) \le \Delta_{X,\delta}(T)$ [4] 


should hold for each cloning map $T$. 


The Preferred Set of States 


The set $X \subset S(\mathcal{H})$ of density operators introduced in 
the last equation describes a priori knowledge about 
the one-particle input state $\rho$; for example, if we 
want to clone only signal states $\rho_1, \ldots, \rho_n$ used to 
transmit classical information through a quantum 
channel, the choice for $X$ is $\{\rho_1, \ldots, \rho_n\}$. Other 
possibilities include: $X = S(\mathcal{H})$ if nothing is known 
about $\rho$, the set of pure states, the states in the 
“equatorial plane” of the Bloch sphere, or Gaussian 
states if H is infinite dimensional. Each different 
choice for X leads to a different variant of the 
cloning problem, and we will summarize the most 
relevant cases treated in the literature in the section 
“Examples.” 

A different kind of a priori knowledge is a priori 
measures, that is, instead of knowing that all 
possible input states lie in a special set $X$, we know 
for each measurable set $X \subset S(\mathcal{H})$ the probability 
$\mu(X)$ for $\rho \in X$. Such a situation typically arises 
when we are trying to clone states of systems which 
originate from a source with known characteristics. 
In this case, we can use mean errors, 


$\bar{\Delta}_{\mu,\delta}(T) = \int_{S(\mathcal{H})} \delta(T_*(\rho^{\otimes N}), \rho^{\otimes M})\, \mu(d\rho)$ [5] 


as a figure of merit. Sometimes these are easier to 
compute than maximal errors as in eqn [3]. Often, 
however, $\Delta$ leads to stronger results than $\bar{\Delta}$; 
therefore we will concentrate our discussion on 
maximal rather than mean errors. 


The Distance Measure 


The remaining freedom in eqn [3] is the distance 
measure $\delta$ and there are mainly two physically 
different choices: we can either check the quality of 
each clone separately or we can test, in addition, the 
correlations between output systems. The most 
common choice for a figure of merit for the first 
type is given by (where $\mathrm{tr}_j$ denotes partial trace over 
all but the $j$th tensor factor) 


$\Delta_1(T) = \sup_{\rho \in X,\, j} \left[ 1 - F(\mathrm{tr}_j T_*(\rho^{\otimes N}), \rho) \right]$ [6] 

Here $F(\rho, \sigma)$ denotes the (quadratic) fidelity of $\rho$ and 
$\sigma$, that is, 


$F(\rho, \sigma) = \left[ \mathrm{tr}\left( (\rho^{1/2} \sigma \rho^{1/2})^{1/2} \right) \right]^2$ [7] 


and the supremum is taken over all $\rho \in X$ and over all 
output systems $j = 1, \ldots, M$. $\Delta_1$ measures the worst one-particle 
error of the output state $T_*(\rho^{\otimes N})$, and we will refer 
to it in the following as the local error. If we are 
interested in correlations too, we have to choose 


$\Delta_{\mathrm{all}}(T) = \sup_{\rho \in X} \left[ 1 - F(T_*(\rho^{\otimes N}), \rho^{\otimes M}) \right]$ [8] 


$\Delta_{\mathrm{all}}$ measures again a “worst-case” error, but now 
of the full output with respect to $M$ uncorrelated 
copies of the input $\rho$. We will call it the global error. 
Alternative figures of merit arise if we replace the 
fidelity in eqns [6] and [8] by other distance 
measures like the trace norm, the Hilbert-Schmidt 
norm, or the relative entropy. If $X$ consists only of 
pure states, the operations $T$ which minimize $\Delta_1$ or 
$\Delta_{\mathrm{all}}$ are usually not altered by such different choices. 
If X is a set of mixed states, however, the correct 
choice is unclear and might depend on the precise 
physical context (there is, in particular, no reason to 
prefer fidelities). 
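
The quadratic fidelity $F(\rho, \sigma) = [\mathrm{tr}((\rho^{1/2}\sigma\rho^{1/2})^{1/2})]^2$ used in the local and global errors is straightforward to evaluate numerically. The following is a minimal sketch, assuming density matrices are given as Hermitian NumPy arrays; the helper names `psd_sqrt` and `fidelity` are illustrative and not taken from the text.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a Hermitian positive semidefinite matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)            # remove tiny negative round-off
    return (v * np.sqrt(w)) @ v.conj().T

def fidelity(rho, sigma):
    """Quadratic fidelity F = (tr sqrt(sqrt(rho) sigma sqrt(rho)))**2."""
    s = psd_sqrt(rho)
    inner = s @ sigma @ s
    inner = 0.5 * (inner + inner.conj().T)   # symmetrize against round-off
    w = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    return float(np.sum(np.sqrt(w)) ** 2)
```

For a pure state $\rho = |\psi\rangle\langle\psi|$ this reduces to $\langle\psi|\sigma|\psi\rangle$, so the fidelity between a pure qubit state and the maximally mixed state is $1/2$.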


General Properties 


Before we consider more special examples in the 
next section, let us discuss some general properties 


Optimal Cloning of Quantum States 629 


of the figure of merit $\Delta_{X,\delta}$ from eqn [3] and the 
corresponding optimization problem. 


Existence of Solutions 


If the distance measure $\delta$ is continuous in the first 
argument, the optimization problem [4] has a 
solution, that is, optimal cloning machines exist: 
the set $\mathcal{T}$ of cloning maps [1] is compact and the 
quantity $\Delta_{X,\delta}$ is, as a supremum over continuous 
functions, lower-semicontinuous. Hence, the 
statement follows from the fact that a lower-semicontinuous 
function on a compact set always 
admits a minimizer. 

This argument can be generalized to the infinite-dimensional 
case, if we choose the set $\mathcal{T}$ of allowed 
cloning maps more carefully (the restriction to 
normal channels proposed above is most probably 
not sufficient for this purpose) and if we equip it 
with an appropriate topology. The latter should be 
weak enough for $\mathcal{T}$ to be compact, and strong 
enough for $\Delta_{X,\delta}$ to be lower-semicontinuous. A 
typical choice is the weak*-topology arising from an 
embedding of $\mathcal{T}$ into the dual of a Banach space 
(such that we can apply the Banach-Alaoglu 
Theorem). Detailed studies in this direction are, 
however, not yet available. 


Covariant Cloning Maps 


To solve the optimization problem [4] is a difficult 
and, in many cases, impossible task. However, it can 
be simplified significantly if $X$ and $\delta$ admit a 
nontrivial symmetry group. Hence, consider again 
a distance $\delta$ which is continuous and convex in its 
first argument and a closed subgroup $G$ of the group 
$U(d)$ of unitary operators on $\mathcal{H} = \mathbb{C}^d$, such that 


$U X U^* \subset X, \qquad \delta(U^{\otimes M} \rho\, U^{*\otimes M}, U^{\otimes M} \sigma\, U^{*\otimes M}) = \delta(\rho, \sigma)$ [9] 

hold for all $U \in G$ and $\rho, \sigma \in S(\mathcal{H}^{\otimes M})$. Then $\Delta_{X,\delta}$ is 
invariant under the induced $G$ action on the set $\mathcal{T}$ of 
cloning maps, that is, 

$\Delta_{X,\delta}(\tau_U T) = \Delta_{X,\delta}(T)$, 
with $(\tau_U T)(A) = U^{*\otimes N}\, T(U^{\otimes M} A\, U^{*\otimes M})\, U^{\otimes N}$ [10] 

holds for all $U \in G$ and all $T \in \mathcal{T}$. Convexity of 
$\Delta_{X,\delta}$ in $T$ implies (with the Haar measure $\mu_H$ on $G$) 


$\Delta_{X,\delta}(\bar{T}) \le \Delta_{X,\delta}(T)$, 
with $\bar{T} = \int_G \tau_U(T)\, \mu_H(dU)$ [11] 


for all $T$. Hence, we can replace each cloning map 
by its group average $\bar{T}$ without sacrificing the 
quality of the clones. This implies that $\bar{T}$ is optimal 
if $T$ is, and, since $\bar{T}$ is $G$-covariant, 


$\tau_U(\bar{T}) = \bar{T} \quad \forall U \in G$ [12] 

we can conclude, together with the arguments from the 
last section, that the optimization problem [4] always 
admits covariant solutions. Similarly, we can show that 
permutation-invariant (sometimes called “symmetric”) 
solutions exist, that is, cloners which do not prefer a 
particular clone or a particular input system. 

This is a very useful result, because the set of 
covariant and permutation-invariant T is much 
smaller than the set of all cloning maps, and it can 
be parametrized in terms of irreducible representa- 
tions of G and the permutation group. In particular, 
the case $G = U(d)$ (such a $T$ is often called 
“universal” because it does not prefer any direction 
in the Hilbert space H) leads to quite general 
solutions. 


Relationships with Quantum State Estimation 


If a procedure to estimate the input state $\rho$ from a 
measurement on the $N$-fold system in the joint state 
$\rho^{\otimes N}$ is given, there is a simple way to produce a 
cloning machine: we just have to take the estimate 
$\sigma$ for the density matrix $\rho$ and prepare $M > N$ 
systems in the state $\sigma^{\otimes M}$. If $X$ is finite and 
estimation (which in this case is called hypothesis 
testing) is done in terms of a positive operator 
valued measure $(E_\sigma)_{\sigma \in X}$, $E_\sigma \in B(\mathcal{H}^{\otimes N})$, the probability 
to get the estimate $\sigma \in X$ when the input is 
in the state $\rho^{\otimes N}$ is given by $\mathrm{tr}(E_\sigma \rho^{\otimes N})$. Hence, the 
cloning map derived from this estimation scheme is 
given by 


$E_*(\rho^{\otimes N}) = \sum_{\sigma \in X} \mathrm{tr}(E_\sigma \rho^{\otimes N})\, \sigma^{\otimes M}$ [13] 
oEX 


A generalization to arbitrary $X$ is straightforward, 
but requires the use of measure theory. It is easy to 
see that the cloning map $E$ from eqn [13] is in 
general not optimal, in particular if $M$ is only 
slightly bigger than $N$. However, $E$ has the interesting 
feature that $\Delta_1(E)$ depends only on the number 
of input systems, $N$, but not on the number of 
clones, $M$, we want to produce. This observation 
leads immediately to the conjecture that $E$ becomes 
optimal in the limit $M \to \infty$. A general proof is 
currently not available; in those cases, however, 
where the optimal cloner and estimator can be explicitly 
calculated for all $N$ and $M$ (i.e., the cases treated in 
the sections “Universal pure-state cloning” and 
“Phase-covariant pure-state cloning”) the conjecture 
is true. A more detailed discussion of this problem 
together with information about its current status 
can be found on the web at 
http://www.imaph.tu-bs.de/qi/problems/problems.html. 


Examples 


In this section, we will discuss concrete examples 
that arise from different choices of the distance 
measure 6 and the set X of preferred states. 


Universal Pure-State Cloning 


The most frequently discussed case arises if $X$ is the 
set of pure states, that is, the input states are pure, 
but otherwise unknown. Under this condition, it is 
sufficient to consider the symmetric part $\mathcal{H}_+^{\otimes N}$ of the 
tensor product $\mathcal{H}^{\otimes N}$, and only cloning maps 
$T : B(\mathcal{H}^{\otimes M}) \to B(\mathcal{H}_+^{\otimes N})$, because only this part 
affects the local or the global error. A complete 
solution for arbitrary $N$, $M$ and all finite-dimensional 
Hilbert spaces is available for $\Delta_{\mathrm{all}}$ in 
Werner (1998) and for $\Delta_1$ in Keyl and Werner 
(1999). Both cases admit the same (surprisingly 
simple) unique solution 


$\hat{T}_*(\rho) = \dfrac{d[N]}{d[M]}\, S_M \left( \rho \otimes \mathbb{1}^{\otimes (M-N)} \right) S_M$ [14] 


where $S_M$ is the projection onto the symmetric 
tensor product $\mathcal{H}_+^{\otimes M}$ and $d[M]$ denotes the dimension 
of $\mathcal{H}_+^{\otimes M}$. To derive these results, the group-theoretic 
methods sketched in the section “Covariant 
cloning maps” are used. The fact that global 
and local figures of merit are minimized by the same 
cloning map is surprising and a special feature of 
pure-state cloning. It implies that correlations and 
entanglement between the clones do not matter 
at all. 
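
To make the cloning map of eqn [14] concrete, the sketch below implements it for the simplest case $N = 1$, $M = 2$ and evaluates the fidelity of a single clone for a pure input. It assumes the form $\rho \mapsto (d[1]/d[2])\, S_2 (\rho \otimes \mathbb{1}) S_2$ with $S_2 = (\mathbb{1} + \mathrm{SWAP})/2$, $d[1] = d$ and $d[2] = d(d+1)/2$; all function names are illustrative.

```python
import numpy as np

def symmetric_projector_2(d):
    """Projector onto the symmetric subspace of C^d (x) C^d."""
    swap = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            swap[i * d + j, j * d + i] = 1.0   # SWAP |j, i> = |i, j>
    return 0.5 * (np.eye(d * d) + swap)

def clone_1_to_2(rho):
    """Output state (d[1]/d[2]) * S_2 (rho (x) 1) S_2 on two systems."""
    d = rho.shape[0]
    s2 = symmetric_projector_2(d)
    dim_in = d                   # d[1]: symmetric subspace of one copy
    dim_out = d * (d + 1) // 2   # d[2]: symmetric subspace of two copies
    return (dim_in / dim_out) * s2 @ np.kron(rho, np.eye(d)) @ s2

def partial_trace_second(state, d):
    """Trace out the second tensor factor of a state on C^d (x) C^d."""
    return np.trace(state.reshape(d, d, d, d), axis1=1, axis2=3)

def local_fidelity(psi):
    """Fidelity <psi| clone |psi> of the first clone for a pure input."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    rho = np.outer(psi, psi.conj())
    clone = partial_trace_second(clone_1_to_2(rho), len(psi))
    return float(np.real(psi.conj() @ clone @ psi))
```

For qubits this gives a single-clone fidelity of $5/6$ for every pure input, which illustrates why such a map is called universal.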


Phase-Covariant Pure-State Cloning 


Consider a fixed basis $|j\rangle$, $j = 0, \ldots, d - 1$, in $\mathcal{H}$ and 
let $X$ be the set of states given by 


$|\psi\rangle = |0\rangle + \sum_{j=1}^{d-1} e^{i\phi_j} |j\rangle$ [15] 


where the $\phi_j$ denote arbitrary phases. Obviously, 
this set is invariant under the set of all unitaries 
which are diagonal in the given basis (i.e., a 
maximal torus in $U(d)$). Using the methods outlined 
in the section “Covariant cloning maps,” the 
corresponding cloning problem is (almost) completely 
solved in Buscemi et al. (2005). For arbitrary 
$d = \dim \mathcal{H}$, $N$ and all $M = N + kd$, with $k \in \mathbb{N}$, a 
cloning map which minimizes global as well as 
local errors is given in terms of the unitary 


$U : \mathcal{H}_+^{\otimes N} \to \mathcal{H}_+^{\otimes M}, \qquad U |n_0, \ldots, n_{d-1}\rangle = |n_0 + k, \ldots, n_{d-1} + k\rangle$ [16] 


where $|n_0, \ldots, n_{d-1}\rangle$, $n_j \in \mathbb{N}$, denotes the number 
basis of $\mathcal{H}_+^{\otimes N}$ associated with the distinguished 
basis $|j\rangle$ of $\mathcal{H}$. 


Cloning Finitely Many States 


If X is a finite set of pure states, a general solution 
is not available, but there are several important 
partial results. The easiest situation arises if the 
elements of X are mutually orthogonal pure states. 
In this case, ideal cloning is possible in terms of an 
appropriately chosen unitary. If the states are 
linearly independent but nonorthogonal, ideal 
cloning is possible as well if we consider probabil- 
istic cloning machines (Duan and Guo 1998); that 
is, there is a nonvanishing probability that the 
machine fails and does not produce any clones at 
all (this means T is not unital). Optimal cloning 
(with deterministic operations) of two nonorthogonal 
qubit states $\rho_j = |\psi_j\rangle\langle\psi_j|$, $j = 1, 2$, is considered 
for all $N$, $M$ in Bruß et al. (1998) and Chefles and 
Barnett (1999) (using averaged global fidelity as 
the figure of merit). The crucial observation in this 
case is that the optimal clones are pure, that is, 
$T_*(\rho_j^{\otimes N}) = |\Psi_j\rangle\langle\Psi_j|$, and that the $\Psi_j$ lie in the 
subspace spanned by the (unattainable) ideal 
clones $\psi_j^{\otimes M}$. 


Universal Mixed-State Cloning 


$X = S(\mathcal{H})$ means that absolutely nothing is known a 
priori about the input state $\rho$. If the distance 
measure $\delta$ is $U(d)$-invariant and permutation invariant 
(which is the case for all possible choices discussed 
in the section “The distance measure”) the analysis 
from the section “Covariant cloning maps” shows 
that a universal and symmetric minimizer exists. An 
explicit solution, however, is not known, and even 
the physically most appropriate choice for $\delta$ is 
unclear. In contrast to the pure-state case, this is a 
serious question, because the set of optimal cloners 
is, in this case, much more sensitive to changes in $\delta$. 
In particular, correlations among the clones become 
crucial, and it is very likely that local and global 
figures of merit lead to very different solutions. To 
emphasize this difference, an operation which 
minimizes only local errors is sometimes called 
“broadcasting,” rather than cloning. A related 
problem with (at least) partial solutions (“purification”) 
will be discussed in the section “Purification.” 


Cloning of Gaussian States 


If the Hilbert space is infinite dimensional, the restriction 
to a reasonably small set $X$ of preferred states is 
crucial, because otherwise the search for minimizers 
becomes hopeless. A physically relevant class with nice 
mathematical properties are Gaussian states and in 
particular coherent states. Cloning of the latter has been 
studied in Cerf et al. (2005) for the case N = 1 (and M 
arbitrary). As in the section “Covariant cloning maps,” 
it can be shown that the search for optimal cloners can 
be restricted to those which are covariant with respect 
to phase space translations. This simplifies the problem 
significantly and leads to the result that the global error 
is minimized by Gaussian cloning maps, while in the 
local case the best cloner is non-Gaussian. 


Asymmetric Cloning 


In all examples discussed up to now, we have 
considered symmetric cloners, that is, the quality of 
all clones is measured with equal weight. Alternatively, 
we can look for asymmetric cloners which produce 
clones with different quality and ask for the trade-off 
between them. This problem was first discussed in Cerf 
(2000) and later in Iblisdir et al. (2005). It can be 
regarded as a constrained optimization problem, where 
the error of the first M’ <M clones should be 
minimized under the constraint that the error of the 
rest is bounded by a fixed value. In Iblisdir et al. (2005), 
it is conjectured that for pure input states and local 
errors the optimal solution to this problem is given by 


$T_*(\rho) = V^* \left( \rho \otimes \mathbb{1}^{\otimes (M-N)} \right) V$ [17] 


where $V$ is a linear combination of projections in the 
commutant of $\{U^{\otimes N} \mid U \in U(\mathcal{H})\}$. This conjecture is 
true (at least) for qubits in the case $1 \to n + 1$ and 
$1 \to 1 + n$. 


Related Problems 


Instead of cloning, we can also try to approximate 
other impossible machines by channels which 
operate on multiple inputs. To this end, we only 
have to replace the figure of merit [6] by 


$\Delta_{1,\beta}(T) = \sup_{\rho \in X,\, j} \left[ 1 - F(\mathrm{tr}_j T_*(\rho^{\otimes N}), \beta(\rho)) \right]$ [18] 

where $\beta : S(\mathcal{H}) \to S(\mathcal{H})$ is a (possibly nonlinear) 
functional which describes the task we want to 
approximate. The generalization $\Delta_{\mathrm{all},\beta}$ of $\Delta_{\mathrm{all}}$ can be 
given similarly. If $\beta$ has the appropriate continuity 
and symmetry properties, the discussion in the 
section “General properties” applies completely, 
that is, we can assume covariance and permutation 
invariance, and we can consider operations which 
use state estimation in an intermediate step. 


Purification 


Consider $N$ quantum systems, all originally prepared in 
the same pure state $\sigma$, and then subsequently exposed 
to the same (known) decoherence process, described by 
a depolarizing channel $R$. The task of purification is to 
produce $M$ output systems which approximate the 
original pure input state as well as possible. Hence, 
the corresponding figure of merit arises with $X = 
\{R(\sigma) \mid \sigma \text{ pure}\}$ and $\beta(\rho) = R^{-1}(\rho)$. This problem is 
discussed for qubits in Cirac et al. (1999), Keyl and 
Werner (2001) and D’Ariano et al. (2005). The 
optimal purifier can be given explicitly for all N, M in 
terms of irreducible SU(2) representations. Surpris- 
ingly, it turns out that the output purity can be 
improved even if the number of outputs, M, is larger 
than the number of available input systems, N 
(although N should be large enough). If we measure 
purity in terms of local errors, it can be shown that, in 
the limit $N \to \infty$, perfectly purified qubits can be 
produced at an infinite rate (i.e., the number of output 
systems per input system can become infinite). How- 
ever, we have to pay for this result with extremely large 
correlations between the output systems. Therefore, the 
global error does not disappear asymptotically, if we 
insist on a nonvanishing rate. 


Universal Not 


“Universal not” (UNOT) is an operation which 
sends each pure state $\sigma$ to its orthocomplement. This 
is a positive but not a completely positive operation. 
Hence, it cannot be performed by any physical 
device. However, we can try to approximate it by a 
cloning map $T$ operating on $N$ input systems. The 
corresponding figure of merit [18] arises if $X$ is the 
set of pure states and $\beta(\rho) = \mathbb{1} - \rho$. In Bužek et al. 
(1999), it is shown that the optimal solution to this 
problem (for all N and M) is to estimate and 
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The purpose of this article is to introduce some of the 
main ideas of optimal transportation theory. A lot 
more can be found in Villani’s book (Villani 2003), in 
a somewhat similar spirit. Supplementary information 
is also available in Ambrosio et al. (2005), Evans and 
Gangbo (1999), and Riischendorf and Rachev (1990). 


reprepare as described in the section “Relationships 
with quantum state estimation.” Approximating 
UNOT is, therefore, significantly more difficult 
than (pure-state) cloning, where the optimal solution 
is always (for finite M) better than estimation. 


See also: Channels in Quantum Information Theory; 
Compact Groups and Their Representations; Positive 
Maps on C*-algebras. 
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Transportation Maps 
Let us start with a rather abstract definition:


Definition 1 Let X and Y be two topological
spaces with Borel probability measures $\alpha$ and $\beta$,
respectively. We say that a Borel map $T: X \to Y$ is a
transportation map between $(X, \alpha)$ and $(Y, \beta)$ if, for
each Borel subset A of Y,

    \[ \alpha(T^{-1}(A)) = \beta(A) \]

It is customary to say that T pushes forward $\alpha$ to
$\beta$, or to say that $\beta$ is the image of $\alpha$ by T. An abstract
measure-theoretic result asserts that there is always
such a transportation map T, as soon as $\alpha$ has no
atom (i.e., the $\alpha$ measure of any point $x \in X$ is zero).

A more concrete situation is when $X = \Omega_0$, $Y = \Omega_1$,
where $\Omega_0$ and $\Omega_1$ are two smooth bounded open
subsets of the d-dimensional Euclidean space $\mathbb{R}^d$. In
such a case, a classical result, due to Moser and
improved by Dacorogna and Moser (1990), reads:


Theorem 1 Let $\Omega_0$ and $\Omega_1$ be two smooth bounded
open sets in $\mathbb{R}^d$. Let $\rho_0 > 0$ and $\rho_1 > 0$ be two
smooth functions on $\mathbb{R}^d$ such that

    \[ \int_{\Omega_0} \rho_0(x)\,dx = \int_{\Omega_1} \rho_1(x)\,dx = 1 \]

Then there is a smooth transportation map T
between $(\Omega_0, \rho_0(x)dx)$ and $(\Omega_1, \rho_1(y)dy)$. Furthermore,
T is an orientation-preserving diffeomorphism
and solves the Jacobian equation:

    \[ \rho_1(T(x)) \det(DT(x)) = \rho_0(x), \quad \forall x \in \Omega_0 \]    [1]


Transportation Maps with Convex 
Potentials 


An important property of Moser’s construction,
which we did not state, is the possibility of
prescribing the restriction of T along the boundary
$\partial\Omega_0$. If one does not care about this latter property,
one can improve Theorem 1 as follows (Caffarelli
1992):


Theorem 2 Assume further that $\Omega_1$ is a uniformly
strictly convex set. Then, there is a transportation
map T with a smooth convex potential, namely

    \[ T(x) = D\Phi(x), \quad \forall x \in \Omega_0 \]

for some smooth convex function $\Phi$ defined on
$\mathbb{R}^d$ and strictly convex on $\Omega_0$. In addition, among
all Borel maps T transporting $(\Omega_0, \rho_0(x)dx)$ to
$(\Omega_1, \rho_1(y)dy)$, $D\Phi$ is the unique map that minimizes

    \[ \int_{\Omega_0} |T(x) - x|^2 \rho_0(x)\,dx \]    [2]

where $|\cdot|$ denotes the Euclidean norm on $\mathbb{R}^d$.


Because of its characterization, $T = D\Phi$ is often
called the “optimal transportation map” with respect
to the “transportation cost” [2]. Notice that, because
of the Jacobian equation [1], $\Phi$ automatically is a
classical solution to the Monge–Ampère equation:

    \[ \rho_1(D\Phi(x)) \det(D^2\Phi(x)) = \rho_0(x), \quad \forall x \in \Omega_0 \]    [3]

(The Monge–Ampère equation is a famous geometric
PDE, related to the search for hypersurfaces
with prescribed Gaussian curvature.) The main gain
with respect to Moser’s construction is the property
that the optimal map T has, at each $x \in \Omega_0$, a
Jacobian matrix $DT(x) = D^2\Phi(x)$ which is a
positive-definite symmetric matrix. This property was
first exploited by McCann (1997) and later by many
authors (see Villani (2003) for many references)
to prove a large series of geometric and functional
inequalities. A very fine example can be found in
Barthe (1998). Let us just consider, as an elementary
illustration, a short and sharp proof of the isoperimetric
inequality using the optimal transportation
map.


A Proof of the Isoperimetric Inequality
Using Optimal Transportation Maps


Let us recall the isoperimetric inequality:


Theorem 3 Let $\Omega$ be a smooth bounded open
subset in $\mathbb{R}^d$. Then

    \[ |\partial\Omega| \ge d\,|B_1|^{1/d}\,|\Omega|^{1-1/d} \]

holds true where $B_1$ is the unit ball in $\mathbb{R}^d$, and $|\Omega|$ and
$|\partial\Omega|$, respectively, denote the d-dimensional volume
of $\Omega$ and the (d − 1)-dimensional Hausdorff measure
of the boundary $\partial\Omega$. In addition, the inequality
becomes an equality if and only if $\Omega$ is a ball.


To prove this result, let us define densities:

    \[ \rho_0(x) = \frac{1}{|\Omega|}, \quad x \in \Omega \]

    \[ \rho_1(y) = \frac{1}{|B_1|}, \quad y \in B_1 \]

and consider the associated optimal transportation
map $D\Phi$ from $(\Omega, \rho_0(x)dx)$ to $(B_1, \rho_1(y)dy)$. From
the Monge–Ampère equation,

    \[ \rho_1(D\Phi(x)) \det(D^2\Phi(x)) = \rho_0(x) \]

we get:

    \[ \det(D^2\Phi(x)) = \frac{|B_1|}{|\Omega|}, \quad x \in \Omega \]    [4]

Since the range of $D\Phi$ on $\Omega$ is the unit ball $B_1$, we
have

    \[ I = \int_{\partial\Omega} D\Phi(x) \cdot n(x)\,d\sigma(x) \le \int_{\partial\Omega} d\sigma(x) = |\partial\Omega| \]

where n(x) and $d\sigma(x)$, respectively, denote the outward
unit normal and the (d − 1)-dimensional
Hausdorff measure along $\partial\Omega$. Using the divergence
theorem, we also have:

    \[ I = \int_{\Omega} \Delta\Phi(x)\,dx \]

where $\Delta\Phi(x) = \mathrm{trace}(D^2\Phi(x))$ is the Laplacian of $\Phi$.
From the arithmetic–geometric mean inequality, we know that,
for any symmetric matrix $A \ge 0$,

    \[ (\det A)^{1/d} \le \frac{1}{d}\,\mathrm{trace}(A) \]

holds true, with equality if and only if A is equal to
the identity matrix multiplied by a non-negative
scalar factor. Thus,

    \[ I \ge d \int_{\Omega} (\det D^2\Phi(x))^{1/d}\,dx = d\,|\Omega|^{1-1/d}\,|B_1|^{1/d} \]

(because of [4]). So, we have obtained the isoperimetric
inequality:

    \[ |\partial\Omega| \ge d\,|B_1|^{1/d}\,|\Omega|^{1-1/d} \]

Let us now consider the case when this inequality
becomes an equality. Then, necessarily, for each $x \in \Omega$,
$A = D^2\Phi(x)$ satisfies $\det A = (\mathrm{trace}(A)/d)^d$ and,
therefore, must be the identity matrix multiplied by a
scalar factor $\lambda > 0$, possibly depending on x. Because
of [4], the determinant of $D^2\Phi(x)$ is constant over $\Omega$.
Thus, $\lambda > 0$ must be constant. It follows that
$D\Phi(x) = \lambda(x - a)$, for some point a in $\mathbb{R}^d$. Therefore,
$\Omega$ must be the ball centered at a of radius $1/\lambda$.
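
As a numerical illustration of Theorem 3, the following sketch compares the boundary measure with the bound $d\,|B_1|^{1/d}|\Omega|^{1-1/d}$ for a ball (where equality is expected) and for a cube. The cube is not a smooth domain, so it lies outside the hypotheses as stated and serves only as an illustration; the function names are ours, and the closed form $|B_1| = \pi^{d/2}/\Gamma(d/2+1)$ is the standard volume formula, not taken from the text.

```python
import math


def unit_ball_volume(d):
    """Volume |B_1| of the unit ball in R^d: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def isoperimetric_bound(volume, d):
    """Right-hand side d |B_1|^(1/d) |Omega|^(1 - 1/d) of the inequality."""
    return d * unit_ball_volume(d) ** (1 / d) * volume ** (1 - 1 / d)


def cube(side, d):
    """(boundary measure, isoperimetric bound) for a cube of the given side."""
    return 2 * d * side ** (d - 1), isoperimetric_bound(side ** d, d)


def ball(radius, d):
    """(boundary measure, isoperimetric bound) for a ball of the given radius."""
    volume = unit_ball_volume(d) * radius ** d
    surface = d * unit_ball_volume(d) * radius ** (d - 1)
    return surface, isoperimetric_bound(volume, d)
```

For the ball the two numbers agree up to rounding, while for the cube the boundary measure is strictly larger, in line with the equality case of the theorem.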


Monge’s Optimal Transportation Problem 


Theorem 2 is one of the numerous avatars of the so-called
optimal transportation theory, which goes back to
Monge’s mass transfer problem, addressed in
1781 in the ‘Mémoire sur la théorie des déblais et des
remblais’, and was completely renewed by Kantorovich
in the 1940s (see, e.g., Rüschendorf and Rachev (1990)).
Let us quote a typical result, similar to
Theorem 2, but without regularity assumptions on the
data (see Brenier and Caffarelli (1992)):


Theorem 4 Let $\rho_0$ be a non-negative Lebesgue
integrable function on $\mathbb{R}^d$, such that

    \[ \int_{\mathbb{R}^d} \rho_0(x)\,dx = 1 \]

Then for any Borel probability measure $\rho_1(dy)$ with
compact support on $\mathbb{R}^d$ there is a unique map T
transporting $\rho_0(x)dx$ to $\rho_1(dy)$, which minimizes

    \[ \int_{\mathbb{R}^d} |T(x) - x|^2 \rho_0(x)\,dx \]

where $|\cdot|$ denotes the Euclidean norm on $\mathbb{R}^d$. In
addition, there is a Lipschitz continuous convex
function $\Phi$ defined on $\mathbb{R}^d$ such that $T(x) = D\Phi(x)$
for $\rho_0$ almost every $x \in \mathbb{R}^d$, which implies:

    \[ \int_{\mathbb{R}^d} f(D\Phi(x))\,\rho_0(x)\,dx = \int_{\mathbb{R}^d} f(y)\,\rho_1(dy) \]

for all continuous functions f on $\mathbb{R}^d$.
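
In one dimension, the gradient of a convex function is simply a nondecreasing map, which suggests a small discrete experiment. The sketch below is only an analogue of Theorem 4, not an instance of it: it uses two point clouds of equal size with equal weights (so the source measure is not absolutely continuous), and checks by brute force that pairing the points in sorted order minimizes the quadratic cost among all one-to-one assignments. All names and sample values are ours.

```python
import itertools


def quadratic_cost(xs, targets):
    """Sum of |T(x) - x|^2 when xs[i] is sent to targets[i]."""
    return sum((x - y) ** 2 for x, y in zip(xs, targets))


def monotone_assignment(xs, ys):
    """Send the i-th smallest x to the i-th smallest y (a nondecreasing map)."""
    order_x = sorted(range(len(xs)), key=lambda i: xs[i])
    sorted_y = sorted(ys)
    targets = [0.0] * len(xs)
    for rank, i in enumerate(order_x):
        targets[i] = sorted_y[rank]
    return targets


def brute_force_min_cost(xs, ys):
    """Minimum quadratic cost over all permutations (small inputs only)."""
    return min(quadratic_cost(xs, perm) for perm in itertools.permutations(ys))


xs = [0.3, -1.2, 2.5, 0.9, -0.4]
ys = [1.0, 4.0, -2.0, 0.5, 2.2]
targets = monotone_assignment(xs, ys)
```

Here `quadratic_cost(xs, targets)` coincides with `brute_force_min_cost(xs, ys)` up to rounding.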


Theorem 2, which can be interpreted as a
regularity result with respect to Theorem 4, is the
main output of Caffarelli’s regularity theory for
transportation maps with convex potentials
(Caffarelli 1992). Caffarelli’s analysis starts with a
proof that $\Phi$ actually is a weak solution of the
Monge–Ampère equation [3] in the sense of Alexandrov
and is strictly convex. Then, Caffarelli shows
that $D^2\Phi$ is Hölder continuous, as soon as $\rho_0$ and $\rho_1$
are Hölder continuous.

Notice that the convexity assumption for $\Omega_1$ is
crucial to ensure the regularity of the convex
potential. Caffarelli provided counter-examples
when $\Omega_1$ is made of two separate balls attached
together by a sufficiently thin pipe.

Surprisingly enough, results such as Theorem 4
are related to concrete applications in, for example,
astrophysics and image processing (Frisch et al.
2002, Haker and Tannenbaum 2003).


The Kantorovich Optimal Transportation 
Problem 


The Monge optimal transportation problem can
be solved using the Kantorovich duality method,
based on the key concept of “generalized transportation
maps,” also called “transportation plans” or
“doubly stochastic measures.” The abstract definition
is:

Definition 2 Let X and Y be two topological
spaces with Borel probability measures $\alpha$ and $\beta$,
respectively. We say that a Borel probability
measure $\mu$ on $X \times Y$ is a generalized transportation
map, or a transportation plan, if its marginals are,
respectively, $\alpha$ and $\beta$, namely

    \[ \int_{x \in A,\, y \in Y} \mu(dx, dy) = \int_{x \in A} \alpha(dx) \]

    \[ \int_{x \in X,\, y \in B} \mu(dx, dy) = \int_{y \in B} \beta(dy) \]    [5]

for all Borel subsets A and B of X and Y,
respectively.


The Monge–Kantorovich (MK) optimal transportation
problem amounts, given a “transportation
cost,” that is, a continuous function $c: X \times Y \to \mathbb{R}$,
to finding a minimizer for

    \[ I_{MK} = \inf \int c(x, y)\,\mu(dx, dy) \]    [6]

where $\mu$ is subject to be a transportation plan
between $(X, \alpha)$ and $(Y, \beta)$. Notice that this problem
is convex (and can be seen as an infinite-dimensional
linear program) and its dual problem can be easily
computed (using, e.g., Rockafellar’s theorem in
convex analysis and assuming, for simplicity, that
both X and Y are compact).


Theorem 5 We have

    \[ I_{MK} = \sup \left\{ \int_X a(x)\,\alpha(dx) + \int_Y b(y)\,\beta(dy) \right\} \]    [7]

where (a, b) is any pair of continuous functions,
defined on X and Y, respectively, and subject to:

    \[ a(x) + b(y) \le c(x, y), \quad \forall x \in X,\ \forall y \in Y \]

Of course, each transportation map T, in the sense
of Definition 1, can be seen as a transportation plan
$\mu$ in the Kantorovich framework, just by setting

    \[ \mu(dx, dy) = \delta(y - T(x))\,\alpha(dx) \]

which means

    \[ \int_{x \in A,\, y \in B} \mu(dx, dy) = \int_{x \in A,\, T(x) \in B} \alpha(dx) \]

for all Borel subsets A and B of X and Y,
respectively. Then, we have

    \[ \int c(x, y)\,\mu(dx, dy) = \int c(x, T(x))\,\alpha(dx) \]
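
The passage from a transportation map to a transportation plan can be made concrete on finite sets. In the sketch below (our own toy setting, with hypothetical names), `alpha` is a list of masses on source points, `T` sends source index i to target index `T[i]`, and the plan is the matrix with entries $\mu_{ij} = \alpha_i$ if $T(i) = j$ and 0 otherwise. Its marginals and its cost can then be compared with those of the map.

```python
def plan_from_map(alpha, T, n_targets):
    """Matrix mu[i][j] = alpha[i] if T[i] == j else 0 (discrete delta plan)."""
    mu = [[0.0] * n_targets for _ in alpha]
    for i, mass in enumerate(alpha):
        mu[i][T[i]] += mass
    return mu


def marginals(mu):
    """Row sums (first marginal) and column sums (second marginal)."""
    first = [sum(row) for row in mu]
    second = [sum(row[j] for row in mu) for j in range(len(mu[0]))]
    return first, second


def plan_cost(mu, xs, ys, cost):
    """Discrete version of the integral of c(x, y) against mu."""
    return sum(mu[i][j] * cost(xs[i], ys[j])
               for i in range(len(xs)) for j in range(len(ys)))


def map_cost(alpha, T, xs, ys, cost):
    """Discrete version of the integral of c(x, T(x)) against alpha."""
    return sum(alpha[i] * cost(xs[i], ys[T[i]]) for i in range(len(xs)))


xs, ys = [0.0, 1.0, 2.0], [0.0, 3.0]
alpha, T = [0.5, 0.25, 0.25], [0, 1, 1]
mu = plan_from_map(alpha, T, len(ys))
square = lambda x, y: (x - y) ** 2
```

In this example the first marginal of `mu` is `alpha`, the second is the pushed-forward measure `[0.5, 0.5]`, and `plan_cost` equals `map_cost`.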


So, the MK problem can be seen as a “relaxed”
version of the “classical” optimal transportation
problem à la Monge:

    \[ I_M = \inf \int c(x, T(x))\,\alpha(dx) \]    [8]

where T is subject to be a transportation map
between $(X, \alpha)$ and $(Y, \beta)$. Indeed, we have $I_{MK} \le I_M$.
It turns out that, in many important situations,
there is no gap between these two values, which
makes the MK problem a perfectly convenient
convex substitute for the original, nonconvex,
Monge transportation problem. This is, in particular,
the case of the situation considered in Theorem
4, when the cost function is just

    \[ c(x, y) = |x - y|^2 \]

or, more generally, $c(x, y) = k(x - y)$, where k is a
uniformly strictly convex function. A typical result is:


Theorem 5 Let $\rho_0$ be a non-negative Lebesgue
integrable function on $\mathbb{R}^d$, with unit integral, and
$\rho_1(dy)$ be a Borel probability measure with compact
support on $\mathbb{R}^d$. Let k be a uniformly strictly convex
function on $\mathbb{R}^d$. Then the MK problem

    \[ I_{MK} = \inf_{\mu} \int k(x - y)\,\mu(dx, dy) \]

where $\mu$ is subject to be a transportation plan
between $\rho_0(x)dx$ and $\rho_1(dy)$ on $\mathbb{R}^d$, has a unique
solution of the form

    \[ \mu(dx, dy) = \delta(y - T(x))\,\alpha(dx), \quad \alpha(dx) = \rho_0(x)\,dx \]

where T is the unique minimizer of the Monge
problem:

    \[ I_M = \inf_{T} \int k(x - T(x))\,\rho_0(x)\,dx \]

among all transportation maps T between $\rho_0(x)dx$
and $\rho_1(dy)$ on $\mathbb{R}^d$. In addition, $I_{MK} = I_M$.


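Proof for Theorem 5 (Sketch) For simplicity, we
assume that $\rho_0$ and $\rho_1$ are both compactly supported
in a ball B in $\mathbb{R}^d$ and we limit ourselves to the
simplest cost function $k(x) = |x|^2/2$. We first denote
by M the set of all Borel regular probability
measures $\nu$ on $B \times B$ having $\rho_0(x)dx$ and $\rho_1(dy)$ as
marginals, which means

    \[ \int_{B \times B} f(x)\,\nu(dx, dy) = \int_B f(x)\,\rho_0(x)\,dx \]

    \[ \int_{B \times B} f(y)\,\nu(dx, dy) = \int_B f(y)\,\rho_1(dy) \]

for all continuous functions f on $\mathbb{R}^d$. Since
$|x - y|^2/2 = |x|^2/2 + |y|^2/2 - x \cdot y$ and the first two
terms integrate to constants fixed by the marginals,
minimizing the cost amounts to maximizing the integral
of $x \cdot y$. From the duality formula [7],
we deduce:

    \[ \max_{\nu \in M} \int_{B \times B} x \cdot y\,\nu(dx, dy) = \inf \left\{ \int_B \Phi(x)\,\rho_0(x)\,dx + \int_B \Psi(y)\,\rho_1(dy) \right\} \]

where the infimum is taken over all pairs $(\Phi, \Psi)$ of
continuous functions on B satisfying

    \[ \Phi(x) + \Psi(y) \ge x \cdot y, \quad \forall x \in B,\ \forall y \in B \]

Then, it can be established that the infimum is attained
by a pair $(\Phi, \Psi)$ such that $\Phi$ is the restriction of a
Lipschitz continuous convex function defined on $\mathbb{R}^d$,
and for $\rho_0(x)dx$ almost every point of $\mathbb{R}^d$, $\Psi$ coincides
with the Legendre–Fenchel transform of $\Phi$,

    \[ \Phi^*(y) = \sup_{x \in \mathbb{R}^d} \left( x \cdot y - \Phi(x) \right) \]

Moreover, if $\nu = \nu_{opt} \in M$ maximizes
$\int_{B \times B} x \cdot y\,\nu(dx, dy)$, then

    \[ \Phi(x) + \Psi(y) = x \cdot y \]

holds for $\nu_{opt}$-almost every $(x, y) \in \mathbb{R}^d \times \mathbb{R}^d$. Using
well-known properties of the Legendre–Fenchel
transform in convex analysis, one deduces that $\nu_{opt}$
is necessarily of the form

    \[ \nu_{opt}(dx, dy) = \delta(y - D\Phi(x))\,\rho_0(x)\,dx \]

which implies

    \[ \int_{\mathbb{R}^d \times \mathbb{R}^d} f(y)\,\nu_{opt}(dx, dy) = \int_{\mathbb{R}^d} f(D\Phi(x))\,\rho_0(x)\,dx \]

for all continuous functions f on $\mathbb{R}^d$ and completes the
proof since the second marginal of $\nu_{opt}$ is $\rho_1(dy)$.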
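
The Legendre–Fenchel transform used in the proof sketch can be approximated on a grid. The following minimal sketch (one-dimensional, with a grid and test function of our choosing) replaces the supremum over $\mathbb{R}$ by a maximum over finitely many points and checks the textbook case $\Phi(x) = x^2/2$, whose transform is $\Phi^*(y) = y^2/2$, together with the inequality $\Phi(x) + \Phi^*(y) \ge x y$.

```python
def legendre_transform(phi, grid):
    """Return y -> max over the grid of (x*y - phi(x)), a discrete sup."""
    def phi_star(y):
        return max(x * y - phi(x) for x in grid)
    return phi_star


# Grid on [-3, 3] with spacing 0.001 (an arbitrary choice for illustration).
grid = [i / 1000 for i in range(-3000, 3001)]
phi = lambda x: 0.5 * x * x
phi_star = legendre_transform(phi, grid)
```

For this choice, equality in $\Phi(x) + \Phi^*(y) \ge x y$ occurs exactly when $y = \Phi'(x) = x$, which mirrors the role played by $y = D\Phi(x)$ in the proof.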


The Wasserstein Distance 


Optimal transportation theory is strongly related to
the geometric analysis of probability measures. For
simplicity, let us just consider the space Prob(B) of
all Borel probability measures $\rho$ supported by some
fixed ball B in $\mathbb{R}^d$. This space is compact for the
weak topology of measures. An equivalent definition
of this topology is provided by the distance d,
naturally attached to the MK problem:

    \[ d(\rho_0, \rho_1) = \inf_{\mu} \left( \int |x - y|^2\,\mu(dx, dy) \right)^{1/2} \]

where $\mu$ is subject to be a transportation plan
between $\rho_0$ and $\rho_1$ on B. (Of course, more general
convex functions k can be used to define the cost
function.) It has become popular to call this distance
the Wasserstein distance (and likewise its generalizations
for various k). It turns out that Prob(B) equipped with
this distance has a formal Riemannian structure
(Otto 2001, Ambrosio et al. 2005). For instance,
given two probability measures $\rho_0(x)dx$ and $\rho_1(x)dx$,
we can define a “shortest path” $t \to \rho(t, \cdot) \in$
Prob(B) such that $\rho(0) = \rho_0$, $\rho(1) = \rho_1$, just by setting:

    \[ \rho(t, dx) = \int \delta\big(a + (D\Phi(a) - a)t - x\big)\,\rho_0(a)\,da, \quad \forall t \in [0, 1] \]

where $D\Phi$ is the optimal transportation map
between $\rho_0$ and $\rho_1$ on B. This idea, which is
somewhat related to the geometric analysis of
hydrodynamics and various concepts of generalized
flows (Arnol’d and Khesin 1998, Brenier), was
successfully used by McCann (1997) and Otto
(2001). In particular, the concept of convexity
along these geodesic paths on Prob(B) has been
pointed out by McCann (1997) to be a crucial tool
for new proofs of geometric and functional inequalities.
Otto, and other contributors (see Ambrosio
et al. (2005) for a comprehensive discussion), observed
that many important parabolic or dissipative evolution
PDEs can be described as “gradient flows” (or
“steepest descent”) of such functionals, with respect
to the Wasserstein metric.
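
The distance and the “shortest path” above have a simple discrete analogue on the real line, sketched below under our own assumptions: both measures are uniform over point samples of the same size, in which case the optimal pairing is the sorted one and each point moves along the segment $a + (T(a) - a)t$. The function names are hypothetical.

```python
import math


def w2_distance(xs, ys):
    """Quadratic Wasserstein distance between two uniform empirical
    measures on the line with the same number of atoms (sorted pairing)."""
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    total = sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys)))
    return math.sqrt(total / len(xs))


def displacement_interpolation(xs, ys, t):
    """Atoms of rho(t): each sorted x moves linearly to its paired sorted y."""
    return [x + (y - x) * t for x, y in zip(sorted(xs), sorted(ys))]


xs = [0.3, -1.2, 2.5, 0.9, -0.4]
ys = [1.0, 4.0, -2.0, 0.5, 2.2]
```

Along this path the distance from the starting measure grows linearly in t, that is, `w2_distance(xs, displacement_interpolation(xs, ys, t))` equals `t * w2_distance(xs, ys)` up to rounding, which is the discrete counterpart of the geodesic property.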
Introduction


The exponential function, the logarithm, the trigonometric
functions, and various other functions are
often used in mathematics and physics. They are
transcendental functions in the sense that they
cannot be obtained by a finite number of operations
as a solution of an algebraic (polynomial) equation.
Typically, they are obtained by a Taylor series
expansion. Many other higher transcendental functions
arise in mathematical physics, often as solutions
of differential equations. A precise knowledge
of the behavior of such functions, their relation with
other functions, addition, multiplication and composition
properties, and representations as an infinite
series or as an integral often sheds a lot of light onto
the problem in which they arise. If they are
sufficiently useful to a large audience, then they
usually get a name and are called special
functions. In what follows, we describe a few of
these special functions of one variable, but clearly
this is just the tip of the iceberg. Many other special
functions exist and we refer to the classical tables of
Abramowitz and Stegun (1964) and the Bateman
manuscript project (Erdélyi et al. 1953-55) for more
special functions. There are also
numerous q-extensions of special functions (see
q-Special Functions).


Gamma and Beta Function 


The gamma function is defined by

    \[ \Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\,dt, \quad \Re z > 0 \]    [1]

It satisfies the functional equation $\Gamma(z + 1) = z\Gamma(z)$
and since $\Gamma(1) = 1$ we have $\Gamma(n + 1) = n!$ for $n \in \mathbb{N}$.
The gamma function therefore extends the factorial
function for integers to complex numbers. The
functional equation

    \[ \Gamma(z)\,\Gamma(1 - z) = \frac{\pi}{\sin \pi z} \]    [2]

allows one to continue the gamma function analytically
to $\Re z < 0$, and the gamma function becomes an
analytic function in the complex plane, except for
simple poles at 0 and at all the negative integers.
The residue of $\Gamma(z)$ at $z = -n$ is equal to $(-1)^n/n!$.
Legendre’s duplication formula is

    \[ \Gamma(2z) = \frac{2^{2z-1}}{\sqrt{\pi}}\,\Gamma(z)\,\Gamma(z + 1/2) \]    [3]

from which one can obtain the special value
$\Gamma(1/2) = \sqrt{\pi}$. Finally, two useful infinite product
representations are

    \[ \Gamma(z) = \lim_{n \to \infty} \frac{n!\,n^z}{z(z+1)\cdots(z+n)} \]

and

    \[ \frac{1}{\Gamma(z)} = z\,e^{\gamma z} \prod_{n=1}^{\infty} \left( 1 + \frac{z}{n} \right) e^{-z/n} \]

where $\gamma$ is Euler’s constant:

    \[ \gamma = \lim_{n \to \infty} \left( \sum_{k=1}^{n} \frac{1}{k} - \log n \right) = 0.5772156649\ldots \]    [4]
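
The identities of this section are easy to spot-check numerically. The sketch below uses `math.gamma` from the Python standard library for real arguments; the helper names and the sample points are ours, and the residuals are expected to vanish only up to floating-point rounding.

```python
import math


def reflection_residual(z):
    """Gamma(z) Gamma(1 - z) - pi / sin(pi z), for real non-integer z."""
    return math.gamma(z) * math.gamma(1 - z) - math.pi / math.sin(math.pi * z)


def duplication_residual(z):
    """Gamma(2z) - 2^(2z-1) / sqrt(pi) * Gamma(z) Gamma(z + 1/2), for z > 0."""
    rhs = 2 ** (2 * z - 1) / math.sqrt(math.pi) * math.gamma(z) * math.gamma(z + 0.5)
    return math.gamma(2 * z) - rhs


def euler_constant_estimate(n):
    """Partial value sum_{k<=n} 1/k - log(n), which tends to Euler's constant."""
    return sum(1.0 / k for k in range(1, n + 1)) - math.log(n)
```

The convergence of `euler_constant_estimate` is slow (the error behaves like 1/(2n)), so it illustrates the limit rather than providing a practical way to compute $\gamma$.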
k=1 


The beta function is a function of two variables
given by

    \[ B(x, y) = \int_0^1 t^{x-1} (1 - t)^{y-1}\,dt, \quad \Re x > 0,\ \Re y > 0 \]    [5]

Clearly it satisfies $B(x, y) = B(y, x)$ and it is related to
the gamma function by

    \[ B(x, y) = \frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x + y)} \]    [6]

The gamma and beta functions are quite useful in
probability theory. One of the most common
probability distributions on the positive real line is
the gamma distribution

    \[ \Pr(X \le x) = \frac{1}{\beta^{a}\,\Gamma(a)} \int_0^x t^{a-1} e^{-t/\beta}\,dt, \quad x > 0 \]

The case a = 3/2 is the Maxwell–Boltzmann distribution.
The most common probability distribution
on the interval [0, 1] is the beta distribution

    \[ \Pr(Y \le x) = \frac{1}{B(a, b)} \int_0^x t^{a-1} (1 - t)^{b-1}\,dt \]

where 0 < x < 1.
The psi function is the logarithmic derivative of
the gamma function

    \[ \psi(z) = \frac{\Gamma'(z)}{\Gamma(z)} \]    [7]

It is meromorphic with simple poles at 0 and at the
negative integers. Special values are $\psi(1) = -\gamma$ and

    \[ \psi(n + 1) = -\gamma + \sum_{k=1}^{n} \frac{1}{k} \]

where $\gamma$ is Euler’s constant. These can be obtained
from the functional equation

    \[ \psi(z) = \psi(z + 1) - \frac{1}{z} \]


Bessel Functions 


Bessel’s differential equation is

    \[ x^2 y'' + x y' + (x^2 - \nu^2) y = 0 \]    [8]

where derivatives are with respect to x and $\nu$ is a
complex number. This differential equation has a
regular singularity at x = 0 and an irregular singularity
at $x = \infty$. The standard method of finding a
solution in the neighborhood of a regular singularity
gives the solution

    \[ J_\nu(x) = \left( \frac{x}{2} \right)^{\nu} \sum_{n=0}^{\infty} \frac{(-1)^n}{n!\,\Gamma(n + \nu + 1)} \left( \frac{x}{2} \right)^{2n} \]

and $J_{-\nu}(x)$ is another solution (if $\nu \notin \mathbb{Z}$). The
function $J_\nu$ is called the “Bessel function of the first
kind” and $\nu$ is the “order” of the Bessel function.
The series $x^{-\nu} J_\nu(x)$ is an entire function of the
variable x. The function

    \[ Y_\nu(x) = \frac{J_\nu(x) \cos(\nu\pi) - J_{-\nu}(x)}{\sin(\nu\pi)} \]

is also a solution of Bessel’s differential equation
and is known as the “Bessel function of the second
kind of order $\nu$.” Two other solutions that are often
used are

    \[ H_\nu^{(1)}(x) = J_\nu(x) + i\,Y_\nu(x) \]
    \[ H_\nu^{(2)}(x) = J_\nu(x) - i\,Y_\nu(x) \]


which are the first and second “Hankel functions.” 
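Bessel functions appear if one solves the wave
equation in cylindrical or spherical coordinates, using
separation of variables. The Helmholtz equation
$\nabla^2 F + k^2 F = 0$ in cylindrical coordinates $\rho, \phi, z$ is

    \[ \frac{\partial^2 F}{\partial \rho^2} + \frac{1}{\rho} \frac{\partial F}{\partial \rho} + \frac{1}{\rho^2} \frac{\partial^2 F}{\partial \phi^2} + \frac{\partial^2 F}{\partial z^2} + k^2 F = 0 \]

and if we look for a solution of the form
$f(\rho) g(\phi) h(z)$, then this leads to a differential equation
for f of the form

    \[ \frac{d^2 f}{d\rho^2} + \frac{1}{\rho} \frac{df}{d\rho} + \left( k^2 - \alpha^2 - \frac{\nu^2}{\rho^2} \right) f = 0 \]

where $\alpha$ and $\nu$ are separation constants. The general
solution is $f(\rho) = Z_\nu(\rho \sqrt{k^2 - \alpha^2})$, where $Z_\nu$ is any of
the Bessel functions given above or linear combinations
of them. In spherical coordinates $r, \theta, \phi$ the
Helmholtz equation is

    \[ \frac{\partial^2 F}{\partial r^2} + \frac{2}{r} \frac{\partial F}{\partial r} + \frac{1}{r^2} \frac{\partial^2 F}{\partial \theta^2} + \frac{\cot\theta}{r^2} \frac{\partial F}{\partial \theta} + \frac{1}{r^2 \sin^2\theta} \frac{\partial^2 F}{\partial \phi^2} + k^2 F = 0 \]

and for a solution of the form $f(r) g(\theta) h(\phi)$ one
obtains a differential equation for f of the form

    \[ \frac{d^2 f}{dr^2} + \frac{2}{r} \frac{df}{dr} + \left[ k^2 - \frac{\nu(\nu + 1)}{r^2} \right] f = 0 \]

with general solution $f(r) = Z_{\nu + 1/2}(kr)/\sqrt{r}$.
Bessel functions have very simple differentiation
formulas:

    \[ \frac{d}{dz} \left[ z^{\nu} J_\nu(z) \right] = z^{\nu} J_{\nu-1}(z) \]
    \[ \frac{d}{dz} \left[ z^{-\nu} J_\nu(z) \right] = -z^{-\nu} J_{\nu+1}(z) \]

The first formula can be seen as a lowering
operation, the second as a raising operation. Some
integral representations are

    \[ J_\nu(z) = \frac{(z/2)^{\nu}}{\sqrt{\pi}\,\Gamma(\nu + 1/2)} \int_0^{\pi} \sin^{2\nu}\theta\,\cos(z \cos\theta)\,d\theta \]
    \[ J_\nu(z) = \frac{(z/2)^{\nu}}{\sqrt{\pi}\,\Gamma(\nu + 1/2)} \int_{-1}^{1} (1 - x^2)^{\nu - 1/2} \cos zx\,dx \]

which hold for $\Re\nu > -1/2$. For real $\nu$ the Bessel
function $J_\nu$ has infinitely many real zeros, and when
$\nu > -1$, then all the zeros are real. All the zeros are
simple (except possibly at the origin). Each of
the functions $J_\nu(z)$, $Y_\nu(z)$, $H_\nu^{(1)}(z)$, or $H_\nu^{(2)}(z)$ satisfies
the recurrence relation

    \[ z\,a_{\nu-1}(z) + z\,a_{\nu+1}(z) = 2\nu\,a_\nu(z) \]

and the differential–recurrence relation

    \[ a_{\nu-1}(z) - a_{\nu+1}(z) = 2 a_\nu'(z) \]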
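
The recurrence and the lowering formula can be checked numerically from the power series of $J_\nu$. The sketch below is a naive partial sum, adequate only for real $\nu > -1$ and moderate $x > 0$; it is not a robust Bessel routine. The function names are ours, and the closed form $J_{1/2}(x) = \sqrt{2/(\pi x)}\,\sin x$ used for comparison is a standard identity that the text does not state.

```python
import math


def bessel_j(nu, x, terms=60):
    """Partial sum of the power series of J_nu(x) (real nu > -1, x > 0)."""
    total = 0.0
    for n in range(terms):
        total += (-1) ** n * (x / 2) ** (2 * n) / (
            math.factorial(n) * math.gamma(n + nu + 1)
        )
    return (x / 2) ** nu * total


def recurrence_residual(nu, z):
    """z J_{nu-1}(z) + z J_{nu+1}(z) - 2 nu J_nu(z); should vanish."""
    return (z * bessel_j(nu - 1, z) + z * bessel_j(nu + 1, z)
            - 2 * nu * bessel_j(nu, z))


def lowering_residual(nu, z, h=1e-5):
    """Central-difference check of d/dz [z^nu J_nu(z)] = z^nu J_{nu-1}(z)."""
    f = lambda t: t ** nu * bessel_j(nu, t)
    return (f(z + h) - f(z - h)) / (2 * h) - z ** nu * bessel_j(nu - 1, z)
```

The lowering check relies on a finite difference, so its residual is limited by the step `h` rather than by the series itself.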


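Modified Bessel Functions


The modified Bessel equation is

    \[ x^2 y'' + x y' - (x^2 + \nu^2) y = 0 \]    [9]

Clearly $J_\nu(ix)$ is a solution of this equation. The
“modified Bessel function of the first kind” is
defined as

    \[ I_\nu(x) = e^{-\nu\pi i/2} J_\nu(x e^{\pi i/2}), \quad -\pi < \arg x \le \pi/2 \]    [10]

so that

    \[ I_\nu(x) = \left( \frac{x}{2} \right)^{\nu} \sum_{n=0}^{\infty} \frac{1}{n!\,\Gamma(n + \nu + 1)} \left( \frac{x}{2} \right)^{2n} \]

If $\nu$ is not an integer, then $I_\nu(x)$ and $I_{-\nu}(x)$ are two
linearly independent solutions of [9], and when $\nu = n$
is an integer one has $I_n(x) = I_{-n}(x)$. The “modified
Bessel function of the second kind” is defined by

    \[ K_\nu(x) = \frac{\pi}{2}\,\frac{I_{-\nu}(x) - I_\nu(x)}{\sin \nu\pi} \]

Some special cases of modified Bessel functions are

    \[ I_{1/2}(x) = \sqrt{\frac{2}{\pi x}}\,\sinh x \]
    \[ I_{-1/2}(x) = \sqrt{\frac{2}{\pi x}}\,\cosh x \]

and

    \[ K_{1/2}(x) = \sqrt{\frac{\pi}{2x}}\,e^{-x} \]

One has the integral representations

    \[ K_\nu(z) = \int_0^{\infty} e^{-z \cosh x} \cosh \nu x\,dx \]
    \[ K_\nu(z) = \frac{\sqrt{\pi}\,(z/2)^{\nu}}{\Gamma(\nu + 1/2)} \int_1^{\infty} e^{-zx} (x^2 - 1)^{\nu - 1/2}\,dx \]
    \[ I_\nu(z) = \frac{(z/2)^{\nu}}{\sqrt{\pi}\,\Gamma(\nu + 1/2)} \int_{-1}^{1} e^{-zx} (1 - x^2)^{\nu - 1/2}\,dx \]

whenever $\Re\nu > -1/2$. The “Airy functions” are
given by

    \[ \mathrm{Ai}(z) = \frac{\sqrt{z}}{3} \left[ I_{-1/3}(\zeta) - I_{1/3}(\zeta) \right] = \frac{1}{\pi} \sqrt{\frac{z}{3}}\,K_{1/3}(\zeta) \]
    \[ \mathrm{Bi}(z) = \sqrt{\frac{z}{3}} \left[ I_{-1/3}(\zeta) + I_{1/3}(\zeta) \right] \]

where $\zeta = \frac{2}{3} z^{3/2}$. They are both solutions of Airy’s
differential equation

    \[ y''(z) - z\,y(z) = 0 \]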
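
The half-integer special cases of the modified Bessel functions can be verified from the power series of $I_\nu$, which is the series of $J_\nu$ with all signs positive. The sketch below (names ours, real non-integer $\nu$ and $x > 0$ assumed) builds $K_\nu$ from the defining combination of $I_{-\nu}$ and $I_\nu$ and is meant as a check of the formulas, not as a numerically careful implementation.

```python
import math


def bessel_i(nu, x, terms=60):
    """Partial sum of (x/2)^nu * sum_n (x/2)^(2n) / (n! Gamma(n + nu + 1))."""
    total = 0.0
    for n in range(terms):
        total += (x / 2) ** (2 * n) / (math.factorial(n) * math.gamma(n + nu + 1))
    return (x / 2) ** nu * total


def bessel_k(nu, x):
    """K_nu(x) = (pi/2) (I_{-nu}(x) - I_nu(x)) / sin(nu pi), non-integer nu."""
    return math.pi / 2 * (bessel_i(-nu, x) - bessel_i(nu, x)) / math.sin(nu * math.pi)
```

Evaluating at, say, x = 1.3 reproduces $\sqrt{2/(\pi x)}\sinh x$, $\sqrt{2/(\pi x)}\cosh x$ and $\sqrt{\pi/(2x)}\,e^{-x}$ for $I_{1/2}$, $I_{-1/2}$ and $K_{1/2}$ respectively, up to rounding.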
Hypergeometric Series


A power series $\sum_{n=0}^{\infty} c_n z^n$ is said to be hypergeometric
when the ratio $c_{n+1}/c_n$ is a rational function
of the index n. Most series that one finds in calculus
textbooks are hypergeometric series and some of
them define important special functions. When

    \[ \frac{c_{n+1}}{c_n} = \frac{(n + a_1)(n + a_2) \cdots (n + a_p)}{(n + b_1)(n + b_2) \cdots (n + b_q)(n + 1)} \]

then we write the corresponding series as

    \[ {}_pF_q\!\left( \begin{matrix} a_1, \ldots, a_p \\ b_1, \ldots, b_q \end{matrix} ; z \right) = \sum_{n=0}^{\infty} \frac{(a_1)_n \cdots (a_p)_n}{(b_1)_n \cdots (b_q)_n}\,\frac{z^n}{n!} \]    [11]

where $(a)_n = a(a+1)(a+2) \cdots (a+n-1)$, with
$(a)_0 = 1$, is the rising factorial or Pochhammer
symbol. When p and q are small, one also uses the
notation ${}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; z)$ where a semicolon
(;) is used to separate the parameters in the
numerator from the parameters in the denominator
and also to separate the parameters from the
variable z. Some special cases are:

• the exponential series

    \[ {}_0F_0(-; -; z) = \sum_{n=0}^{\infty} \frac{z^n}{n!} = \exp(z) \]

• the geometric series

    \[ {}_1F_0(1; -; z) = \sum_{n=0}^{\infty} z^n = \frac{1}{1 - z} \]

• the binomial series

    \[ {}_1F_0(-\alpha; -; -z) = \sum_{n=0}^{\infty} \binom{\alpha}{n} z^n = (1 + z)^{\alpha} \]

• the logarithmic function

    \[ {}_2F_1(1, 1; 2; z) = \sum_{n=0}^{\infty} \frac{z^n}{n + 1} = -\frac{1}{z} \log(1 - z) \]

• the Bessel function

    \[ (z/2)^{\nu}\,{}_0F_1(-; \nu + 1; -z^2/4) = \Gamma(\nu + 1)\,J_\nu(z) \]

For generic values of the parameters, we see that the
hypergeometric series converges everywhere in the
complex plane when $q \ge p$, it converges for |z| < 1
when p = q + 1, and for p > q + 1 it is only defined at
z = 0. When one of the numerator parameters is a
negative integer, say $a_1 = -m$, then the series is
terminating and defines a polynomial of degree m.
None of the denominator parameters is allowed to
be a negative integer −m, unless there is a
numerator parameter which is a negative integer
−k with k < m. For $q \ge p$, the hypergeometric
series therefore defines an entire function which is
the corresponding hypergeometric function. For
p = q + 1, the hypergeometric series only converges
in the open unit disk, but sometimes it can be
continued analytically to a larger domain in the
complex plane. The analytic continuation of the
hypergeometric series is then called the hypergeometric
function. Take for example the geometric
series: it is clear that the hypergeometric
series converges in the open unit disk, but the
corresponding hypergeometric function is defined
in the whole complex plane with a simple pole at
z = 1. The logarithmic function −log(1 − z) has a
hypergeometric series in the open unit disk, but it
can be continued analytically to the complex plane
with a cut along $[1, \infty)$ and a branch point
at z = 1.
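
Because a hypergeometric series is defined through the ratio $c_{n+1}/c_n$, its partial sums can be generated by repeatedly multiplying the current term by that ratio. The sketch below does exactly this for real parameters; the function name, the default number of terms and the test points are ours, and no attempt is made at analytic continuation outside the region of convergence.

```python
import math


def hypergeometric_sum(numerator, denominator, z, terms=300):
    """Partial sum of pFq(numerator; denominator; z), starting from c_0 = 1
    and using c_{n+1}/c_n = prod(a + n) / (prod(b + n) * (n + 1))."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        ratio = z / (n + 1)
        for a in numerator:
            ratio *= a + n
        for b in denominator:
            ratio /= b + n
        term *= ratio
    return total
```

With empty parameter lists this reproduces exp(z); `hypergeometric_sum([1], [], 0.5)` approximates 1/(1 − 0.5) = 2, and a negative-integer numerator parameter makes every term after the m-th vanish, so the sum is a polynomial as described above.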


Gauss Hypergeometric Function 


The most famous hypergeometric function is the
Gauss hypergeometric function defined for |z| < 1
by the hypergeometric series

    \[ {}_2F_1(a, b; c; z) = \sum_{n=0}^{\infty} \frac{(a)_n (b)_n}{(c)_n}\,\frac{z^n}{n!} \]    [12]

which is often denoted by F(a, b; c; z). It is a solution
of the hypergeometric equation

    \[ z(1 - z)\,y''(z) + [c - (a + b + 1)z]\,y'(z) - ab\,y(z) = 0 \]    [13]

and this solution is regular at z = 0. Obviously,
${}_2F_1(a, b; c; z) = {}_2F_1(b, a; c; z)$. The six functions
${}_2F_1(a \pm 1, b; c; z)$, ${}_2F_1(a, b \pm 1; c; z)$, and ${}_2F_1(a, b; c \pm 1; z)$
are called contiguous to ${}_2F_1(a, b; c; z)$ and there
are 15 linear relations (with coefficients which are
linear functions of z) between ${}_2F_1(a, b; c; z)$ and any
two contiguous functions. Two of these relations are

    \[ (2a - c - az + bz)\,F(a, b; c; z) + (c - a)\,F(a - 1, b; c; z) + a(z - 1)\,F(a + 1, b; c; z) = 0 \]

and

    \[ c\,[a - (c - b)z]\,F(a, b; c; z) - ac(1 - z)\,F(a + 1, b; c; z) + (c - a)(c - b)z\,F(a, b; c + 1; z) = 0 \]

Euler gave the integral representation

    \[ {}_2F_1(a, b; c; z) = \frac{\Gamma(c)}{\Gamma(b)\,\Gamma(c - b)} \int_0^1 x^{b-1} (1 - x)^{c-b-1} (1 - zx)^{-a}\,dx \]    [14]

for $\Re c > \Re b > 0$. This allows one to find the
analytic continuation from the open unit disk to the
complex plane. A useful result is the Gauss summation
formula

    \[ {}_2F_1(a, b; c; 1) = \frac{\Gamma(c)\,\Gamma(c - a - b)}{\Gamma(c - a)\,\Gamma(c - b)}, \quad \Re(c - a - b) > 0 \]

The special case for a terminating series is known as
the Chu–Vandermonde sum

    \[ {}_2F_1(-n, b; c; 1) = \frac{(c - b)_n}{(c)_n} \]

Pfaff’s transformation is

    \[ {}_2F_1(a, b; c; z) = (1 - z)^{-a}\,{}_2F_1\!\left( a, c - b; c; \frac{z}{z - 1} \right) \]

and Euler’s transformation is

    \[ {}_2F_1(a, b; c; z) = (1 - z)^{c-a-b}\,{}_2F_1(c - a, c - b; c; z) \]
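
The summation and transformation formulas for ${}_2F_1$ lend themselves to a quick numerical sanity check with partial sums. The sketch below is restricted to real parameters and to arguments inside the unit disk (or to z = 1 with a comfortably positive $c - a - b$, where convergence is only algebraic, hence the larger term count in the check); all names and sample parameters are ours.

```python
import math


def pochhammer(a, n):
    """Rising factorial (a)_n = a (a+1) ... (a+n-1), with (a)_0 = 1."""
    result = 1.0
    for k in range(n):
        result *= a + k
    return result


def hyp2f1(a, b, c, z, terms=400):
    """Partial sum of the Gauss series sum_n (a)_n (b)_n / (c)_n z^n / n!."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return total


def gauss_sum(a, b, c):
    """Closed form Gamma(c) Gamma(c-a-b) / (Gamma(c-a) Gamma(c-b))."""
    return (math.gamma(c) * math.gamma(c - a - b)
            / (math.gamma(c - a) * math.gamma(c - b)))
```

For instance, with a = 0.4, b = 1.1, c = 2.3 and z = 0.3, both `(1 - z) ** (-a) * hyp2f1(a, c - b, c, z / (z - 1))` and `(1 - z) ** (c - a - b) * hyp2f1(c - a, c - b, c, z)` agree with `hyp2f1(a, b, c, z)` to rounding accuracy.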


Confluent Hypergeometric Function 


The hypergeometric series ${}_1F_1(a; c; z)$ defines an
entire function in the complex plane and satisfies
the differential equation

    \[ z\,y''(z) + (c - z)\,y'(z) - a\,y(z) = 0 \]    [15]

This hypergeometric series (and the differential equation)
are formally obtained from ${}_2F_1(a, b; c; z/b)$ by
letting $b \to \infty$, which gives a confluence of two of the
singularities at $z = \infty$. This is the reason why the
differential equation [15] is known as the confluent
hypergeometric equation. The solution

    \[ \Phi(a, c; z) = {}_1F_1(a; c; z) \]    [16]

is called a confluent hypergeometric function, and a
second linearly independent solution of [15] is
$z^{1-c}\,\Phi(a - c + 1, 2 - c; z)$. The function

    \[ \Psi(a, c; z) = \frac{\Gamma(1 - c)}{\Gamma(a - c + 1)}\,\Phi(a, c; z) + \frac{\Gamma(c - 1)}{\Gamma(a)}\,z^{1-c}\,\Phi(a - c + 1, 2 - c; z) \]    [17]

is therefore also a solution of eqn [15]. The
following integral representations hold:

    \[ \Phi(a, c; z) = \frac{\Gamma(c)}{\Gamma(a)\,\Gamma(c - a)} \int_0^1 e^{zx} x^{a-1} (1 - x)^{c-a-1}\,dx \]

whenever $\Re c > \Re a > 0$, and

    \[ \Psi(a, c; z) = \frac{1}{\Gamma(a)} \int_0^{\infty} e^{-zx} x^{a-1} (1 + x)^{c-a-1}\,dx \]

whenever $\Re a > 0$.
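
The confluence limit ${}_2F_1(a, b; c; z/b) \to {}_1F_1(a; c; z)$ as $b \to \infty$ can be observed numerically. In the sketch below (real parameters, names and sample values ours), the gap between the two partial sums shrinks roughly like 1/b, which is what a term-by-term comparison of $(b)_n/b^n$ with 1 suggests; this rate is our own heuristic remark, not a statement from the text.

```python
def hypergeometric_sum(numerator, denominator, z, terms=300):
    """Partial sum of pFq(numerator; denominator; z) via the term ratio."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        ratio = z / (n + 1)
        for a in numerator:
            ratio *= a + n
        for b in denominator:
            ratio /= b + n
        term *= ratio
    return total


def confluence_error(a, c, z, b):
    """|2F1(a, b; c; z/b) - 1F1(a; c; z)| for a large real b with |z/b| < 1."""
    limit = hypergeometric_sum([a], [c], z)
    return abs(hypergeometric_sum([a, b], [c], z / b) - limit)
```

For example, `confluence_error(0.7, 1.9, 1.2, 1e4)` is about a hundred times smaller than `confluence_error(0.7, 1.9, 1.2, 1e2)`.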
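The “Whittaker functions” are defined as

    \[ M_{\kappa,\mu}(z) = e^{-z/2} z^{c/2}\,\Phi(a, c; z) \]
    \[ W_{\kappa,\mu}(z) = e^{-z/2} z^{c/2}\,\Psi(a, c; z) \]

with $\kappa = -a + c/2$ and $\mu = (c - 1)/2$. They are
solutions of the Whittaker equation

    \[ y''(z) + \left( -\frac{1}{4} + \frac{\kappa}{z} + \frac{1/4 - \mu^2}{z^2} \right) y(z) = 0 \]

The “parabolic cylinder functions” are also confluent
hypergeometric functions. They are given by

    \[ D_\nu(z) = 2^{\nu/2} e^{-z^2/4}\,\Psi(-\nu/2, 1/2; z^2/2) \]
    \[ \phantom{D_\nu(z)} = 2^{(\nu-1)/2} e^{-z^2/4}\,z\,\Psi((1 - \nu)/2, 3/2; z^2/2) \]

When $\nu$ is a non-negative integer, one finds Hermite
polynomials

    \[ H_n(z) = 2^{n/2} e^{z^2/2} D_n(\sqrt{2}\,z) \]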


Classical Orthogonal Polynomials 


A family of polynomials $\{p_n(x), n \in \mathbb{N}\}$, where $p_n$
has degree n, is orthogonal on the real line if there is
a positive measure $\mu$ on the real line for which

    \[ \int p_n(x)\,p_m(x)\,d\mu(x) = h_n\,\delta_{m,n} \]    [18]

Usually the measure $\mu$ is absolutely continuous, in
which case $d\mu(x) = w(x)dx$ with w a non-negative
density function on the real line, or $\mu$ is discrete and
supported on a finite or at most countable set. Any
family of orthogonal polynomials satisfies a “three-term
recurrence relation”

    \[ x\,p_n(x) = A_n\,p_{n+1}(x) + B_n\,p_n(x) + C_n\,p_{n-1}(x) \]    [19]

with $C_n A_{n-1} > 0$ for every $n \ge 1$. For the
monic polynomials $P_n(x) = p_n(x)/k_n$, with $k_n =
1/(A_0 A_1 A_2 \cdots A_{n-1})$, this relation becomes

    \[ P_{n+1}(x) = (x - b_n)\,P_n(x) - a_n^2\,P_{n-1}(x) \]

with $b_n = B_n$ and $a_n^2 = A_{n-1} C_n$. This recurrence
relation gives rise to a tridiagonal matrix

    \[ J = \begin{pmatrix}
    b_0 & a_1 & 0 & 0 & 0 & \cdots \\
    a_1 & b_1 & a_2 & 0 & 0 & \cdots \\
    0 & a_2 & b_2 & a_3 & 0 & \cdots \\
    0 & 0 & a_3 & b_3 & a_4 & \cdots \\
    \vdots & \vdots & \vdots & \ddots & \ddots & \ddots
    \end{pmatrix} \]

which is formally symmetric and which is called the
“Jacobi matrix.” The spectral measure of this operator,
acting on $\ell^2(\mathbb{N})$, is equal to the orthogonality
measure $\mu$ whenever this symmetric operator can be
extended to a self-adjoint operator. If this is not
possible in a unique way (a situation which can occur
for unbounded operators only), then every self-adjoint
extension of J gives rise to a spectral measure which
can be used for the orthogonality conditions [18]. In
this case, there are infinitely many positive measures
which can be used in the orthogonality relations and
all these measures have the same moments

    \[ m_n = \int x^n\,d\mu(x) \]


Some families of orthogonal polynomials have
additional properties which are quite useful in
many practical and physical applications, such as
the following:


• The derivatives $p_n'$ are again a family of orthogonal
  polynomials (Hahn property).

• The polynomials $p_n$ satisfy a second-order linear
  differential equation of the form

    \[ \sigma(x)\,y''(x) + \tau(x)\,y'(x) = \lambda_n\,y(x) \]

  where $\sigma$ is a polynomial of degree at most 2, $\tau$ is a
  polynomial of degree 1, both independent of n,
  and $\lambda_n$ is a real number (Bochner property).

• The polynomials can be obtained by a Rodrigues
  formula

    \[ w(x)\,p_n(x) = C_n\,\frac{d^n}{dx^n} \left[ w(x)\,\sigma^n(x) \right] \]

  where w is a non-negative function and $\sigma$ a
  polynomial of degree at most 2 (Hildebrandt
  property).


There are three families of orthogonal polynomials
on the real line which have these three properties, and
each of these three properties characterizes these
families. These are the Hermite polynomials, the
Laguerre polynomials, and the Jacobi polynomials. In
a more general situation when the orthogonality
relation is described by a linear functional and the
functional is not required to be positive, one has an
additional family of Bessel polynomials. The densities
w(x) for these families all satisfy a first-order differential
equation $[\sigma(x) w(x)]' = \tau(x) w(x)$, where $\sigma$ is a
polynomial of degree at most 2 and $\tau$ a polynomial of
degree 1. This equation is known as the “Pearson
equation.”


Hermite Polynomials 


Hermite polynomials $H_n(x)$ are orthogonal with
respect to the normal density $w(x) = e^{-x^2}$:

    \[ \int_{-\infty}^{\infty} H_n(x)\,H_m(x)\,e^{-x^2}\,dx = \sqrt{\pi}\,2^n\,n!\,\delta_{m,n} \]

Observe that the density satisfies $w' = -2xw$ so that
$\sigma = 1$ and $\tau(x) = -2x$. The recurrence relation is

    \[ H_{n+1}(x) = 2x\,H_n(x) - 2n\,H_{n-1}(x) \]

and the polynomials satisfy the second-order differential
equation

    \[ y''(x) - 2x\,y'(x) + 2n\,y(x) = 0 \]

The functions $h_n(x) = e^{-x^2/2} H_n(x)$ satisfy the differential
equation

    \[ h_n''(x) + (2n + 1 - x^2)\,h_n(x) = 0 \]

The derivatives satisfy $H_n'(x) = 2n H_{n-1}(x)$ (lowering
operation) and one also has $[e^{-x^2} H_n(x)]' = -e^{-x^2} H_{n+1}(x)$
(raising operation). The Rodrigues formula is

    \[ e^{-x^2} H_n(x) = (-1)^n\,\frac{d^n}{dx^n}\,e^{-x^2} \]

The polynomials can be written as a hypergeometric
series

    \[ H_n(x) = (2x)^n\,{}_2F_0\!\left( -n/2, -(n - 1)/2; -; -1/x^2 \right) \]

or alternatively as

    \[ H_n(x) = n! \sum_{k=0}^{\lfloor n/2 \rfloor} \frac{(-1)^k (2x)^{n-2k}}{k!\,(n - 2k)!} \]

Their generating function is

    \[ \sum_{n=0}^{\infty} H_n(x)\,\frac{t^n}{n!} = \exp(2xt - t^2) \]

Hermite polynomials are relevant for the analysis of
the quantum harmonic oscillator, and the lowering
and raising operators there correspond to annihilation
and creation.
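
The three descriptions of the Hermite polynomials given above (recurrence, explicit sum, generating function) can be compared numerically. The following sketch uses our own function names and sample points, and truncates the generating series after a fixed number of terms.

```python
import math


def hermite(n, x):
    """H_n(x) from H_{k+1} = 2x H_k - 2k H_{k-1}, starting with H_0 = 1."""
    previous, current = 0.0, 1.0
    for k in range(n):
        previous, current = current, 2 * x * current - 2 * k * previous
    return current


def hermite_explicit(n, x):
    """H_n(x) = n! sum_k (-1)^k (2x)^(n-2k) / (k! (n-2k)!)."""
    return math.factorial(n) * sum(
        (-1) ** k * (2 * x) ** (n - 2 * k)
        / (math.factorial(k) * math.factorial(n - 2 * k))
        for k in range(n // 2 + 1)
    )


def generating_sum(x, t, terms=40):
    """Truncated generating series sum_n H_n(x) t^n / n!."""
    return sum(hermite(n, x) * t ** n / math.factorial(n) for n in range(terms))
```

As a usage example, `hermite(3, x)` agrees with the polynomial 8x³ − 12x, and `generating_sum(0.7, 0.3)` agrees with `math.exp(2 * 0.7 * 0.3 - 0.3 ** 2)` to rounding accuracy.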


Laguerre Polynomials 


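Laguerre polynomials $L_n^{(\alpha)}(x)$ are for $\alpha > -1$ orthogonal
with respect to the gamma density $w(x) = x^{\alpha} e^{-x}$
on $[0, \infty)$:

    \[ \int_0^{\infty} L_n^{(\alpha)}(x)\,L_m^{(\alpha)}(x)\,x^{\alpha} e^{-x}\,dx = \frac{\Gamma(n + \alpha + 1)}{n!}\,\delta_{m,n} \]

The Pearson equation is $[xw]' = (\alpha + 1 - x)w$ so that
$\sigma(x) = x$ and $\tau(x) = \alpha + 1 - x$. The recurrence relation
is

    \[ (n + 1)\,L_{n+1}^{(\alpha)}(x) = (2n + \alpha + 1 - x)\,L_n^{(\alpha)}(x) - (n + \alpha)\,L_{n-1}^{(\alpha)}(x) \]

and the differential equation is

    \[ x\,y''(x) + (\alpha + 1 - x)\,y'(x) + n\,y(x) = 0 \]

The functions $\psi_n(x) = x^{\alpha/2} e^{-x/2} L_n^{(\alpha)}(x)$ satisfy

    \[ \left( x\,\psi_n'(x) \right)' + \left( n + \frac{\alpha + 1}{2} - \frac{x}{4} - \frac{\alpha^2}{4x} \right) \psi_n(x) = 0 \]

Differentiation has the effect that

    \[ \left[ L_n^{(\alpha)}(x) \right]' = -L_{n-1}^{(\alpha+1)}(x) \]

and

    \[ \left[ x^{\alpha} e^{-x} L_n^{(\alpha)}(x) \right]' = (n + 1)\,x^{\alpha-1} e^{-x}\,L_{n+1}^{(\alpha-1)}(x) \]

The Rodrigues formula is

    \[ x^{\alpha} e^{-x} L_n^{(\alpha)}(x) = \frac{1}{n!}\,\frac{d^n}{dx^n} \left[ x^{n+\alpha} e^{-x} \right] \]

The hypergeometric expression is

    \[ L_n^{(\alpha)}(x) = \frac{(\alpha + 1)_n}{n!}\,{}_1F_1(-n; \alpha + 1; x) \]

and the generating function is

    \[ \sum_{n=0}^{\infty} L_n^{(\alpha)}(x)\,t^n = (1 - t)^{-\alpha-1} \exp\!\left( -\frac{xt}{1 - t} \right) \]

Laguerre polynomials occur in the eigenfunctions of the
hydrogen atom.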
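
The hypergeometric expression and the three-term recurrence for the Laguerre polynomials should produce the same values, which the following sketch checks numerically. It assumes real $\alpha > -1$; the function names and sample parameters are ours.

```python
import math


def pochhammer(a, n):
    """Rising factorial (a)_n, with (a)_0 = 1."""
    result = 1.0
    for k in range(n):
        result *= a + k
    return result


def laguerre_hypergeometric(n, alpha, x):
    """L_n^(alpha)(x) = (alpha+1)_n / n! * 1F1(-n; alpha+1; x) (finite sum)."""
    series = sum(
        pochhammer(-n, k) * x ** k / (pochhammer(alpha + 1, k) * math.factorial(k))
        for k in range(n + 1)
    )
    return pochhammer(alpha + 1, n) / math.factorial(n) * series


def laguerre_recurrence(n, alpha, x):
    """L_n^(alpha)(x) from (k+1) L_{k+1} = (2k+alpha+1-x) L_k - (k+alpha) L_{k-1}."""
    previous, current = 0.0, 1.0
    for k in range(n):
        previous, current = current, (
            (2 * k + alpha + 1 - x) * current - (k + alpha) * previous
        ) / (k + 1)
    return current
```

For $\alpha = 0$ and n = 2 both functions return the value of 1 − 2x + x²/2, the familiar second Laguerre polynomial.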


Jacobi Polynomials 


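Jacobi polynomials $P_n^{(\alpha,\beta)}(x)$ are orthogonal for the
beta density $w(x) = (1 - x)^{\alpha} (1 + x)^{\beta}$ on [−1, 1]
whenever $\alpha > -1$ and $\beta > -1$:

    \[ \int_{-1}^{1} P_n^{(\alpha,\beta)}(x)\,P_m^{(\alpha,\beta)}(x)\,(1 - x)^{\alpha} (1 + x)^{\beta}\,dx
       = \frac{2^{\alpha+\beta+1}}{2n + \alpha + \beta + 1}\,\frac{\Gamma(n + \alpha + 1)\,\Gamma(n + \beta + 1)}{n!\,\Gamma(n + \alpha + \beta + 1)}\,\delta_{m,n} \]

The Pearson equation is $[(1 - x^2) w]' = [\beta - \alpha -
(\alpha + \beta + 2)x]\,w$ and the differential equation is

    \[ (1 - x^2)\,y''(x) + [\beta - \alpha - (\alpha + \beta + 2)x]\,y'(x) + n(n + \alpha + \beta + 1)\,y(x) = 0 \]

Differentiation has the effect

    \[ \left[ P_n^{(\alpha,\beta)}(x) \right]' = \frac{n + \alpha + \beta + 1}{2}\,P_{n-1}^{(\alpha+1,\beta+1)}(x) \]

and

    \[ \left[ (1 - x)^{\alpha} (1 + x)^{\beta} P_n^{(\alpha,\beta)}(x) \right]' = -2(n + 1)\,(1 - x)^{\alpha-1} (1 + x)^{\beta-1}\,P_{n+1}^{(\alpha-1,\beta-1)}(x) \]

The Rodrigues formula is

    \[ (1 - x)^{\alpha} (1 + x)^{\beta} P_n^{(\alpha,\beta)}(x) = \frac{(-1)^n}{2^n\,n!}\,\frac{d^n}{dx^n} \left[ (1 - x)^{\alpha+n} (1 + x)^{\beta+n} \right] \]

In terms of hypergeometric series, one has

    \[ P_n^{(\alpha,\beta)}(x) = \frac{(\alpha + 1)_n}{n!}\,{}_2F_1\!\left( -n, n + \alpha + \beta + 1; \alpha + 1; \frac{1 - x}{2} \right) \]

Observe that one has $P_n^{(\alpha,\beta)}(-x) = (-1)^n P_n^{(\beta,\alpha)}(x)$.
Special cases of the Jacobi polynomials are as
follows:


• The “Legendre polynomials” $P_n(x) = P_n^{(0,0)}(x)$.
  They appear when the Laplacian is separated in
  spherical coordinates as functions of the polar
  angle $\theta$, for which $x = \cos\theta$.

• The “Chebyshev polynomials” of the first kind

    \[ T_n(x) = \frac{P_n^{(-1/2,-1/2)}(x)}{P_n^{(-1/2,-1/2)}(1)} \]

  and of the second kind

    \[ U_n(x) = (n + 1)\,\frac{P_n^{(1/2,1/2)}(x)}{P_n^{(1/2,1/2)}(1)} \]

  These functions are more easily written by using the
  change of variable $x = \cos\theta$ and then $T_n(\cos\theta) =
  \cos n\theta$ and $U_n(\cos\theta) = \sin (n + 1)\theta / \sin\theta$.

• The “Gegenbauer polynomials” or ultraspherical
  polynomials are Jacobi polynomials with equal
  parameters:

    \[ C_n^{\lambda}(x) = \frac{(2\lambda)_n}{(\lambda + 1/2)_n}\,P_n^{(\lambda-1/2,\lambda-1/2)}(x) \]

Gegenbauer polynomials are involved in the
angular or spatial part of the wave function of
physical systems in a central potential in both
position and momentum space, and in the spatial
part of the wave function of hydrogenic systems in
momentum space, as well as in the eigenfunctions of
several quantum-mechanical potentials, such as the
relativistic harmonic oscillator.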


Other Classical Orthogonal Polynomials 


Instead of restricting attention to the differential
operator $D = d/dx$, one can also use the (forward)
difference operator $\Delta$, for which $\Delta f(x) = f(x+1) - f(x)$;
the divided difference operator $\Delta_y$, for which
$\Delta_y f(x) = \Delta f(\lambda(x))/\Delta\lambda(x)$ with a quadratic function
$\lambda$; or certain q-difference operators, and look for
orthogonal polynomials that satisfy difference equations
in the variable x. Together with the three-term
recurrence relation (in the degree n), one then has
families of polynomials satisfying a bispectral
problem. For the difference operator and the divided
difference operator, this gives several important
families of orthogonal polynomials which all have
a hypergeometric representation. These hypergeometric
polynomials are usually listed in a table, and
each level indicates the number of parameters and/or
the order of the hypergeometric function. This table
is known as Askey’s table and is given in Figure 1.
The extension with q-difference operators involves
basic hypergeometric series and q-extensions of
classical orthogonal polynomials.

“Charlier polynomials” $C_n(x;a)$ are orthogonal
with respect to the Poisson distribution

$$\sum_{k=0}^{\infty} C_n(k;a)\,C_m(k;a)\,\frac{a^k}{k!} = a^{-n} e^{a}\, n!\,\delta_{m,n}$$


The recurrence relation is

$$a\,C_{n+1}(x;a) + (x - n - a)\,C_n(x;a) + n\,C_{n-1}(x;a) = 0$$

and the second-order difference equation is

$$a\,y(x+1) + (n - x - a)\,y(x) + x\,y(x-1) = 0$$

Figure 1 Askey’s table.

The forward difference operator has the effect
$\Delta C_n(x;a) = -(n/a)\,C_{n-1}(x;a)$ and the backward difference
operator $\nabla f(x) = f(x) - f(x-1)$ has the effect
$\nabla[a^x/x!\,C_n(x;a)] = a^x/x!\,C_{n+1}(x;a)$. The hypergeometric
representation is $C_n(x;a) = {}_2F_0(-n,-x;-;-1/a)$.
Observe that the variable x appears as a
parameter of the hypergeometric series.
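The Charlier relations can be verified exactly from the terminating hypergeometric sum. The sketch below is an illustration only; the names `charlier` and `poch` are not from the article. It builds $C_n(x;a) = {}_2F_0(-n,-x;-;-1/a)$ as a finite sum over rationals and can then be used to confirm the three-term recurrence, the second-order difference equation and the action of the forward difference operator.

```python
from fractions import Fraction
from math import factorial


def poch(a, k):
    """Pochhammer symbol (a)_k, computed exactly."""
    out = Fraction(1)
    for j in range(k):
        out *= a + j
    return out


def charlier(n, x, a):
    """C_n(x; a) = 2F0(-n, -x; -; -1/a): the series stops after n+1 terms."""
    return sum(
        poch(-n, k) * poch(-x, k) * (Fraction(-1) / a) ** k / factorial(k)
        for k in range(n + 1)
    )


def recurrence_residual(n, x, a):
    """a C_{n+1} + (x - n - a) C_n + n C_{n-1}; should vanish identically."""
    return (a * charlier(n + 1, x, a) + (x - n - a) * charlier(n, x, a)
            + n * charlier(n - 1, x, a))


def difference_residual(n, x, a):
    """a y(x+1) + (n - x - a) y(x) + x y(x-1) with y = C_n; should vanish."""
    return (a * charlier(n, x + 1, a) + (n - x - a) * charlier(n, x, a)
            + x * charlier(n, x - 1, a))
```

Because the identities are polynomial in x, they can be tested at non-integer rational x as well as on the Poisson lattice.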
“Krawtchouk polynomials” $K_n(x;p,N)$ are orthogonal
with respect to the binomial distribution:

$$\sum_{k=0}^{N} K_n(k;p,N)\,K_m(k;p,N)\,\binom{N}{k} p^k (1-p)^{N-k} = \frac{(-1)^n\, n!}{(-N)_n}\Bigl(\frac{1-p}{p}\Bigr)^{n}\delta_{m,n}$$

where N is a positive integer and $0 < p < 1$. They
are given by $K_n(x;p,N) = {}_2F_1(-n,-x;-N;1/p)$
and correspond to Meixner polynomials for which
the parameter $\beta$ is a negative integer.
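Since the Krawtchouk orthogonality relation is a finite sum, it can be checked exactly. The following sketch assumes the normalisation $K_n(x;p,N) = {}_2F_1(-n,-x;-N;1/p)$ quoted above and the squared norm $(-1)^n n!/(-N)_n\,((1-p)/p)^n$; the helper names `krawtchouk`, `inner` and `norm` are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def poch(a, k):
    """Pochhammer symbol (a)_k, computed exactly."""
    out = Fraction(1)
    for j in range(k):
        out *= a + j
    return out


def krawtchouk(n, x, p, N):
    """K_n(x; p, N) = 2F1(-n, -x; -N; 1/p) for 0 <= n <= N (terminating sum)."""
    return sum(
        poch(-n, k) * poch(-x, k) / (poch(-N, k) * factorial(k)) / p ** k
        for k in range(n + 1)
    )


def inner(n, m, p, N):
    """Sum of K_n K_m against the binomial weights on {0, ..., N}."""
    return sum(
        krawtchouk(n, k, p, N) * krawtchouk(m, k, p, N)
        * comb(N, k) * p ** k * (1 - p) ** (N - k)
        for k in range(N + 1)
    )


def norm(n, p, N):
    """Expected value of inner(n, n, p, N)."""
    return (-1) ** n * factorial(n) / poch(-N, n) * ((1 - p) / p) ** n
```

With `p` given as a `Fraction`, `inner(n, m, p, N)` should vanish for `n != m` and equal `norm(n, p, N)` on the diagonal.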

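“Meixner polynomials” $m_n(x;\beta,c)$ are orthogonal
with respect to the negative binomial distribution
(Pascal distribution)

$$\sum_{k=0}^{\infty} m_n(k;\beta,c)\,m_j(k;\beta,c)\,\frac{(\beta)_k\,c^k}{k!} = \frac{c^{-n}\, n!}{(\beta)_n\,(1-c)^{\beta}}\,\delta_{n,j}$$

where $\beta > 0$ and $0 < c < 1$. They are given by
$m_n(x;\beta,c) = {}_2F_1(-n,-x;\beta;1-1/c)$.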

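“Meixner-Pollaczek polynomials” $P_n^{\lambda}(x;\phi)$ are
orthogonal on $(-\infty, \infty)$:

$$\int_{-\infty}^{\infty} P_n^{\lambda}(x;\phi)\,P_m^{\lambda}(x;\phi)\,e^{(2\phi-\pi)x}\,|\Gamma(\lambda+ix)|^2\,dx = \frac{2\pi\,\Gamma(n+2\lambda)}{(2\sin\phi)^{2\lambda}\,n!}\,\delta_{m,n}$$

where $\lambda > 0$ and $0 < \phi < \pi$. The appropriate difference
operator $\delta$ has an imaginary shift, $\delta f(x) = f(x+i/2) - f(x-i/2)$,
and one has $\delta P_n^{\lambda}(x;\phi) = 2i\sin\phi\,P_{n-1}^{\lambda+1/2}(x;\phi)$.
They are given by

$$P_n^{\lambda}(x;\phi) = \frac{(2\lambda)_n}{n!}\,e^{in\phi}\; {}_2F_1\bigl(-n,\,\lambda+ix;\ 2\lambda;\ 1-e^{-2i\phi}\bigr)$$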


“Hahn and dual Hahn polynomials” are orthogonal
on a finite set of points. Hahn polynomials are
given by

$$Q_n(x;\alpha,\beta,N) = {}_3F_2\Bigl(\begin{matrix} -n,\ n+\alpha+\beta+1,\ -x \\ \alpha+1,\ -N \end{matrix};\,1\Bigr)$$

and their orthogonality is with respect to a
hypergeometric distribution on $\{0,1,\ldots,N\}$. The
appropriate difference operator is the (forward)
difference operator $\Delta$. They are related to the 3-j
symbols or Wigner coefficients that arise when
considering angular momenta in two quantum
systems. Dual Hahn polynomials are given by

$$R_n(\lambda(x);\gamma,\delta,N) = {}_3F_2\Bigl(\begin{matrix} -n,\ -x,\ x+\gamma+\delta+1 \\ \gamma+1,\ -N \end{matrix};\,1\Bigr)$$

where $\lambda(x) = x(x+\gamma+\delta+1)$. They are obtained
from the Hahn polynomials by interchanging the
roles of n and x. They are orthogonal on the set
$\{\lambda(0), \lambda(1), \ldots, \lambda(N)\}$. The appropriate difference
operator is the divided difference operator which
acts on f as $\Delta f(\lambda(x))/\Delta\lambda(x)$.

“Continuous Hahn and dual Hahn polynomials”
are orthogonal on the real line. The continuous
Hahn polynomials are

$$p_n(x;a,b,c,d) = i^n\,\frac{(a+c)_n\,(a+d)_n}{n!}\; {}_3F_2\Bigl(\begin{matrix} -n,\ n+a+b+c+d-1,\ a+ix \\ a+c,\ a+d \end{matrix};\,1\Bigr)$$

and the appropriate difference operator is the
difference operator $\delta$ with imaginary shift. The
continuous dual Hahn polynomials are

$$S_n(x^2;a,b,c) = (a+b)_n\,(a+c)_n\; {}_3F_2\Bigl(\begin{matrix} -n,\ a+ix,\ a-ix \\ a+b,\ a+c \end{matrix};\,1\Bigr)$$

and the appropriate difference operator is the divided
difference operator which acts on f as $\delta f(x^2)/\delta x^2$.
“Wilson polynomials” are the most general system
of hypergeometric polynomials satisfying a bispectral
problem. All the other classical orthogonal
polynomials can be obtained from them by taking
appropriate parameters or as limiting cases. They are
given by

$$\frac{W_n(x^2;a,b,c,d)}{(a+b)_n\,(a+c)_n\,(a+d)_n} = {}_4F_3\Bigl(\begin{matrix} -n,\ n+a+b+c+d-1,\ a+ix,\ a-ix \\ a+b,\ a+c,\ a+d \end{matrix};\,1\Bigr)$$

and for $\Re(a,b,c,d) > 0$ (with nonreal parameters appearing
in conjugate pairs) they are orthogonal on the
positive real line with respect to the weight function

$$\left|\frac{\Gamma(a+ix)\,\Gamma(b+ix)\,\Gamma(c+ix)\,\Gamma(d+ix)}{\Gamma(2ix)}\right|^2$$


“Racah polynomials” can be obtained from
Wilson polynomials when the parameters are such
that one of a+b, a+c, or a+d is a negative
integer −N. They are given by

$$R_n(\lambda(x);\alpha,\beta,\gamma,\delta) = {}_4F_3\Bigl(\begin{matrix} -n,\ n+\alpha+\beta+1,\ -x,\ x+\gamma+\delta+1 \\ \alpha+1,\ \beta+\delta+1,\ \gamma+1 \end{matrix};\,1\Bigr)$$

where $\alpha+1 = -N$ or $\beta+\delta+1 = -N$ or $\gamma+1 = -N$,
and N is a non-negative integer. They are orthogonal on
the finite set $\{\lambda(0), \lambda(1), \ldots, \lambda(N)\}$, where
$\lambda(x) = x(x+\gamma+\delta+1)$. They arise as 6-j symbols in the coupling
of three angular momenta.


See also: Combinatorics: Overview; Compact Groups
and their Representations; Integrable Systems:
Overview; Painlevé Equations; q-Special Functions;
Random Matrix Theory in Physics; Separation of
Variables for Differential Equations.


Further Reading 


Abramowitz M and Stegun IA (1964) Handbook of Mathematical 
Functions, With Formulas, Graphs, and Mathematical Tables, 
National Bureau of Standards Applied Mathematics Series, 
vol. 55 (reprinted 1984). New York: Dover. 

Andrews GE, Askey R, and Roy R (1999) Special Functions, 
Encyclopedia of Mathematics and Its Applications, vol. 71. 
Cambridge: Cambridge University Press. 

Bailey WN (1935) Generalized Hypergeometric Series, Cambridge 
Mathematical Tract, vol. 32. Cambridge: Cambridge University 
Press. 

Erdélyi A, Magnus W, Oberhettinger F, and Tricomi FG (1953-1955) 
Higher Transcendental Functions, Bateman Manuscript Project, 
vols. 1-3. New York: McGraw-Hill. 

Gradshteyn IS and Ryzhik IM (1965) Table of Integrals, Series, 
and Products, chs 8-9, pp. 904-1080. New York: Academic 
Press. 

Koekoek R and Swarttouw R (1998) The Askey-Scheme of 
Hypergeometric Orthogonal Polynomials and Its q-Analogue. 
Reports of the faculty of Technical Mathematics and Infor- 
matics, no. 98-17, Delft University of Technology. 

Lozier D, Olver F, Clark C, and Boisvert R Digital Library of 
Mathematical Functions, http://dlmf.nist.gov. 

Nikiforov AF and Uvarov VB (1988) Special Functions of
Mathematical Physics. Basel: Birkhäuser.

Nikiforov AF, Suslov SK, and Uvarov VB (1991) Classical 
Orthogonal Polynomials of a Discrete Variable, Springer 
Series in Computational Physics. Berlin: Springer. 

Szegő G (1939) Orthogonal Polynomials, American Mathematical
Society Colloquium Publications XXIII, 4th edn., 1975.
Providence, RI: American Mathematical Society. 

Watson GN (1922) A Treatise on the Theory of Bessel Functions. 
Cambridge: Cambridge University Press. 





Painlevé Equations


N Joshi, University of Sydney, Sydney, NSW, 
Australia 


© 2006 Elsevier Ltd. All rights reserved. 


Introduction 


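The Painlevé equations P_I–P_VI are six classical
second-order ordinary differential equations that
appear widely in modern physical applications.
Their conventional forms (governing y(x) with
derivatives $y' = dy/dx$, $y'' = d^2y/dx^2$) are:

$$\mathrm{P_{I}}:\quad y'' = 6y^2 + x$$
$$\mathrm{P_{II}}:\quad y'' = 2y^3 + xy + \alpha$$
$$\mathrm{P_{III}}:\quad y'' = \frac{y'^2}{y} - \frac{y'}{x} + \frac{\alpha y^2 + \beta}{x} + \gamma y^3 + \frac{\delta}{y}$$
$$\mathrm{P_{IV}}:\quad y'' = \frac{y'^2}{2y} + \frac{3}{2}y^3 + 4xy^2 + 2(x^2 - \alpha)y + \frac{\beta}{y}$$
$$\mathrm{P_{V}}:\quad y'' = \Bigl(\frac{1}{2y} + \frac{1}{y-1}\Bigr)y'^2 - \frac{y'}{x} + \frac{(y-1)^2}{x^2}\Bigl(\alpha y + \frac{\beta}{y}\Bigr) + \frac{\gamma y}{x} + \frac{\delta\,y(y+1)}{y-1}$$
$$\mathrm{P_{VI}}:\quad y'' = \frac12\Bigl(\frac1y + \frac1{y-1} + \frac1{y-x}\Bigr)y'^2 - \Bigl(\frac1x + \frac1{x-1} + \frac1{y-x}\Bigr)y' + \frac{y(y-1)(y-x)}{x^2(x-1)^2}\Bigl(\alpha + \frac{\beta x}{y^2} + \frac{\gamma(x-1)}{(y-1)^2} + \frac{\delta x(x-1)}{(y-x)^2}\Bigr)$$

where $\alpha, \beta, \gamma, \delta$ are constants. They were identified
and studied by Painlevé and his school in their
search for ordinary differential equations (in the
class $y'' = R(x, y, y')$, where R is rational in $y'$, y and
analytic in x) that define new transcendental functions.
Painlevé focussed his search on equations that
possess what is now known as the Painlevé property:
that all solutions are single-valued around all
movable singularities (a singularity is “movable” if
its location changes with initial conditions).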

For the Painlevé equations, all movable singularities
are poles. For P_I and P_II, all solutions are
meromorphic functions. However, the solutions of
each of the remaining equations have other singularities
called “fixed” singularities, with locations that
are determined by the singularities of the coefficient
functions of the equation. P_III–P_VI have a fixed
singularity at $x = \infty$. P_III and P_V have additional
fixed singularities at x = 0, and P_VI has them at x = 0
and 1. Although each solution of P_III–P_VI is single-valued
around a movable singularity, it may be
multivalued around a fixed singularity.

Painlevé’s school considered canonical classes of 
ordinary differential equations equivalent under linear 
fractional transformations of y and x. Of the fifty 
canonical classes of equations they found, all except 
six were found to be solvable in terms of already 
known functions. These six lead to the Painlevé
equations P_I–P_VI as their canonical representatives.

A resurgence of interest in the Painlevé equations 
came about from the observation (due to Ablowitz 
and Segur) that they arise as similarity reductions 
of well-known integrable partial differential equa- 
tions (PDEs), or soliton equations, such as the 
Korteweg-de Vries equation, the sine-Gordon equa- 
tion, and the self-dual Yang-Mills equations. 

As this connection suggests, the Painlevé equations
possess many of the special properties that are
commonly associated with soliton equations. They
have associated linear problems (i.e., Lax pairs) for
which they act as compatibility conditions. There
exist special transformations (called Bäcklund
transformations) mapping a solution of one equation to a
solution of another Painlevé equation (or the same
equation with changed parameters). There exist
Hamiltonian forms that are related to the existence of
tau-functions, which are analytic everywhere except at
the fixed singularities. They also possess multilinear
forms (or Hirota forms) that are satisfied by
tau-functions. In the following subsections, for conciseness,
we give examples of these properties for the first
or second Painlevé equations and briefly indicate
differences, if any, with other Painlevé equations.



Complex Analytic Structure of Solutions 


Consider the two-(complex-)parameter manifold of
solutions of a Painlevé equation. Each solution is
globally determined by two initial values given at a
regular point of the solution. However, the solution
can also be determined by two pieces of data given
at a movable pole. The location $x_0$ of such a pole
provides one of the two free parameters. The other
free parameter occurs as a coefficient in the Laurent
expansion of the solution in a domain punctured at
$x_0$. For P_I, the Laurent expansion of a solution at a
movable singularity $x_0$ is

$$y(x) = \frac{1}{(x-x_0)^2} - \frac{x_0}{10}(x-x_0)^2 - \frac{1}{6}(x-x_0)^3 + c_{\rm I}\,(x-x_0)^4 + \cdots \qquad [1]$$

where $c_{\rm I}$ is arbitrary. This second free parameter is
normally called a “resonance parameter.” For P_II,
the Laurent expansion of a solution at a movable
singularity $x_0$ is

$$y(x) = \frac{\pm 1}{x-x_0} \mp \frac{x_0}{6}(x-x_0) + \frac{\mp 1 - \alpha}{4}(x-x_0)^2 + c_{\rm II}\,(x-x_0)^3 + \cdots \qquad [2]$$

where $c_{\rm II}$ is arbitrary. The symmetric solution of P_I
that has a pole at the origin and corresponding
resonance parameter $c_{\rm I} = 0$ has a distribution of poles
in the complex x-plane shown in Figure 1. (This figure
was obtained by searching for zeros of truncated
Taylor expansions of the tau-function $\tau$ described in
the section “Bäcklund and Miura transformations.”
One hundred and sixty numerical zeros are shown.
The two pairs of closely spaced zeros near the
imaginary axis (between $8 < \pm\operatorname{Im} x < 12$) may be
numerical artifacts. We used the command NSolve to
32 digits in MATHEMATICA4.)

Figure 1 Poles of a symmetric solution of P_I in the complex
x-plane, with a pole at the origin and zero corresponding
resonance parameter, i.e., $x_0 = 0$, $c_{\rm I} = 0$.
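The structure of the Laurent expansion at a movable pole of P_I can be reproduced by a short recursion. The sketch below is illustrative and not the author's method: substituting $y = \sum_{k\ge -2} a_k (x-x_0)^k$ with $a_{-2} = 1$ into $y'' = 6y^2 + x$ gives $(k(k-1) - 12)\,a_k = 6\sum a_i a_j + (\text{terms from } x)$, whose left-hand factor vanishes at $k = 4$. That index is the resonance, where the coefficient is free. The function name `p1_laurent` and its arguments are assumptions made for this example.

```python
from fractions import Fraction


def p1_laurent(x0, c, order=8):
    """Coefficients a[k], k = -2..order, of y = sum a[k] (x - x0)^k solving
    y'' = 6 y^2 + x near a movable double pole x0; c is the free value of a[4]."""
    a = {k: Fraction(0) for k in range(-2, order + 1)}
    a[-2] = Fraction(1)
    for k in range(-1, order + 1):
        # Match the coefficient of (x - x0)^(k-2).  The two cross terms
        # a[-2] * a[k] of 6 y^2 are moved to the left-hand side (giving -12 a[k]).
        rest = sum(a[i] * a[k - 2 - i] for i in range(-1, k))
        rhs = 6 * rest + (x0 if k == 2 else 0) + (1 if k == 3 else 0)
        lead = k * (k - 1) - 12  # equals (k - 4)(k + 3); zero at the resonance
        if lead == 0:
            assert rhs == 0  # compatibility holds, so no logarithms are needed
            a[k] = Fraction(c)
        else:
            a[k] = Fraction(rhs) / lead
    return a
```

Running it with a rational `x0` should return `a[2] = -x0/10` and `a[3] = -1/6`, with `a[4]` equal to the chosen resonance parameter.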

The rays of symmetry evident in Figure 1 reflect
discrete symmetries of P_I. The solutions of P_I and P_II
are invariant under the respective discrete symmetries,

$$\mathrm{P_{I}}:\quad y_n(x) = e^{4\pi i n/5}\,y\bigl(e^{2\pi i n/5}x\bigr),\qquad n = \pm 1, \pm 2$$
$$\mathrm{P_{II}}:\quad y_n(x) = e^{-\pi i n/3}\,y\bigl(e^{2\pi i n/3}x\bigr),\qquad \alpha_n = e^{\pi i n}\alpha,\qquad n = \pm 1, \pm 2, 3$$

The rays of angle $2\pi n/5$ for P_I and $\pi n/3$ for P_II
related to these symmetries play special roles in the
asymptotic behaviors of the corresponding solutions
for $|x| \to \infty$.


Linear Problems 


The Painlevé equations are regarded as completely
integrable because they can be solved through an
associated system of linear equations (Jimbo and
Miwa 1981):

$$\frac{\partial\psi}{\partial\zeta} = L(x,\zeta)\,\psi \qquad [3a]$$
$$\frac{\partial\psi}{\partial x} = M(x,\zeta)\,\psi \qquad [3b]$$

The compatibility condition, that is,

$$L_x - M_\zeta + [L, M] = 0 \qquad [4]$$

is equivalent to the corresponding Painlevé equation.
The matrices L, M for P_I and P_II are listed below:


Pr: ix.) = (5 et, r) 
(= fi 
meo-( Pe 


where $z = y'$, $z' = 6y^2 + x$
m wnon(s Be 2, 2) 
Ee A 
medh ia) C au “0 ) 


where $u' = -uy$, $z = y' - y^2 - x/2$
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0 :=37a 


Alternative linear problems also exist for each
equation. For example, for P_II, an alternative choice
of L and M is (Flaschka and Newell 1980):

$$\mathrm{P_{II}}:\quad L_{\rm II}(x,\zeta) = \begin{pmatrix} -4i & 0 \\ 0 & 4i \end{pmatrix}\zeta^2 + \begin{pmatrix} 0 & 4y \\ 4y & 0 \end{pmatrix}\zeta + \begin{pmatrix} -i(x+2y^2) & 2iy' \\ -2iy' & i(x+2y^2) \end{pmatrix} - \frac{\alpha}{\zeta}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

$$M_{\rm II}(x,\zeta) = \begin{pmatrix} -i\zeta & y \\ y & i\zeta \end{pmatrix}$$

The matrix L for each Painlevé equation is
singular at a finite number of points $a_i(x)$ in the
$\zeta$-plane. For the above choices of L for P_I and P_II,
the point $\zeta = \infty$ is clearly a singularity. For $L_{\rm II}$, the
origin $\zeta = 0$ is also a singularity. The analytic
continuation of a fundamental matrix of solutions
$\Phi$ around $a_i$ gives a new solution $\tilde\Phi$ which must be
related to the original solution: $\tilde\Phi = \Phi A$. A is called
the monodromy matrix and its trace and determinant
are called the monodromy data. In general, the
data will change with x. However, eqn [4] ensures
that the monodromy data remain constant in x. For
this reason, the system [3] is called an isomonodromy
problem.
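As a worked check of the compatibility condition, one can take the Flaschka–Newell pair for P_II in Pauli-matrix form and collect the coefficients of $\sigma_1, \sigma_2, \sigma_3$. The sketch below assumes the standard form $L = -i(4\zeta^2 + x + 2y^2)\sigma_3 + (4\zeta y - \alpha/\zeta)\sigma_1 - 2y'\sigma_2$, $M = -i\zeta\sigma_3 + y\sigma_1$; this notation is introduced here for the calculation and is not taken from the article.

```latex
% Assumed: \sigma_1, \sigma_2, \sigma_3 are the Pauli matrices, with
% [\sigma_1,\sigma_2] = 2i\sigma_3, [\sigma_2,\sigma_3] = 2i\sigma_1, [\sigma_3,\sigma_1] = 2i\sigma_2.
\begin{aligned}
L_x &= -i(1 + 4yy')\,\sigma_3 + 4\zeta y'\,\sigma_1 - 2y''\,\sigma_2, \qquad M_\zeta = -i\,\sigma_3,\\
[L, M] &= 4iyy'\,\sigma_3 - 4\zeta y'\,\sigma_1 + \bigl(4y^3 + 2xy + 2\alpha\bigr)\sigma_2,\\
L_x - M_\zeta + [L, M] &= \bigl(-2y'' + 4y^3 + 2xy + 2\alpha\bigr)\sigma_2 .
\end{aligned}
```

The $\sigma_1$ and $\sigma_3$ components cancel identically for every $\zeta$, and the remaining $\sigma_2$ component vanishes exactly when $y'' = 2y^3 + xy + \alpha$, which is P_II.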


Bäcklund and Miura Transformations


Bäcklund transformations are those that map a
solution of a Painlevé equation with one choice of
parameter to a solution of the same equation with
different parameters. For P_I no such transformation
is known. For P_II, there is one Bäcklund transformation.
Let $y = y(x;\alpha)$ denote a solution of P_II with
parameter $\alpha$. Then $\tilde y = y(x;\alpha-1)$, which solves P_II
with parameter $\alpha - 1$, is given by

$$\tilde y := -y + \frac{\alpha - \tfrac12}{y' - y^2 - x/2} \quad \text{if } \alpha \neq 1/2 \qquad [5]$$

If $\alpha = 1/2$, then $y' = y^2 + x/2$ and $\tilde y = -y$ (see the
next section for this case). Combined with the
symmetry $y \mapsto -y$, $\alpha \mapsto -\alpha$, we can write down
another version of this Bäcklund transformation
which maps y to $\hat y = y(x;\alpha+1)$:

$$\hat y := -y - \frac{\alpha + \tfrac12}{y' + y^2 + x/2} \quad \text{if } \alpha \neq -1/2 \qquad [6]$$

If we parametrize $\alpha$ by $c + n$ for arbitrary c, and
denote the solution for corresponding parameter as
$y_n$, we can write a difference equation relating $y_{n-1}$
and $y_{n+1}$ (by eliminating $y'$ from the two transformations
$\tilde y$, $\hat y$) as

$$\frac{c + \tfrac12 + n}{y_{n+1} + y_n} + \frac{c - \tfrac12 + n}{y_n + y_{n-1}} + 2y_n^2 + x = 0$$

This is an example of a discrete Painlevé equation (called
“alternate” dP_I in the literature). In such a discrete
Painlevé equation, x is fixed while n varies. Another
lesser known Bäcklund transformation for P_II is

$$y' - y^2 - \frac{x}{2} - v^2 = 0 \qquad [7]$$
$$v' + yv = 0 \qquad [8]$$

between P_II with $\alpha = 1/2$ and

$$v'' + v^3 + \frac{x}{2}\,v = 0$$

which can be scaled (take $v(x) = a\,w(bx)$ with $b^3 = -1/2$, $a^2 = -2b^2$) to
the usual form of P_II with $\alpha = 0$.
Miura transformations are those that map a solution
of a Painlevé equation to a solution of another equation
in the 50 canonical types classified by Painlevé’s school.
If y is a solution of P_II with parameter $\alpha \neq 1/2$, then








1 — w 
which represents the 34th canonical class in the
Painlevé classification listed in Ince (1927).

The Painlevé equations do not possess continuous
symmetries other than the Bäcklund and Miura
transformations described here. However, they do
possess discrete symmetries described in the section
“Complex analytic structure of solutions.”


Classical Special Solutions 


Painlevé showed that there can be no explicit first 
integral that is rational in y and y’ for his 
eponymous equations. It is known that this state- 
ment can be extended to say that no such algebraic 
first integral exists. But the question whether the 
Painlevé equations define new transcendental func- 
tions remained open until recently. 

Form a class of functions consisting of those
satisfying linear second-order differential equations,
such as the Airy, Bessel, and hypergeometric functions,
as well as rational, algebraic, and exponential functions.
Extend this class to include arithmetic operations,
compositions under such functions, and
solutions of linear equations with these earlier functions
as coefficients. Members of this class are called
classical functions. For general values of the constants
$\alpha, \beta, \gamma, \delta$, it is now known (Umemura 1990, Umemura
and Watanabe 1997) that the six Painlevé equations
cannot be solved in terms of classical functions.
However, there are special values of the constant
parameters $\alpha, \beta, \gamma, \delta$ for which classical functions do
solve the Painlevé equations. Each Painlevé equation,
except P_I, has special solutions given by classical
functions when the parameters in the Painlevé equation
take on special values. For P_II, with $\alpha = 1/2$ we
have the special integral


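$$I_{1/2} \equiv y' - y^2 - \frac{x}{2} = 0 \qquad [9]$$

which, modulo P_II with $\alpha = 1/2$, satisfies the relation

$$\frac{dI_{1/2}}{dx} = -2y\,I_{1/2}$$

The Riccati eqn [9] can be linearized via $y = -w'/w$
to yield

$$w'' + \frac{x}{2}\,w = 0$$

which gives

$$w(x) = a\,\mathrm{Ai}(-2^{-1/3}x) + b\,\mathrm{Bi}(-2^{-1/3}x)$$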


for arbitrary constants a and b, that is, the well-known
Airy function solutions of P_II. Iterations of
the Bäcklund transformations $\tilde y$ and $\hat y$, [5]–[6], give
further classical solutions in terms of Airy functions
for the case when $\alpha = (2N+1)/2$ for integer N.
Similarly, there is a sequence of rational solutions of
the family of equations P_II with $\alpha = N$, for integer N, if
we iterate the Bäcklund transformations $\tilde y$, $\hat y$ by
starting with the trivial solution y = 0 for the case
$\alpha = 0$. For example, for $\alpha = 1$, we have $y = -1/x$. The
transformations [7]–[8] give a mapping that shows
that this family of rational solutions and the above
family of Airy-type solutions of P_II both exist for the
cases when $\alpha$ is half-integer and when it is integer.
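The first steps of the rational-solution sequence can be traced pointwise. The sketch below is an illustration under stated assumptions: it uses the transformation $\alpha \to \alpha + 1$ in the standard form $\hat y = -y - (\alpha + \tfrac12)/(y' + y^2 + x/2)$ for P_II written as $y'' = 2y^3 + xy + \alpha$, and the helper names `backlund_up` and `p2_residual` are hypothetical. Starting from $y = 0$ at $\alpha = 0$ it reproduces $y = -1/x$ at $\alpha = 1$, and one further step gives the known $\alpha = 2$ solution $1/x - 3x^2/(x^3+4)$.

```python
from fractions import Fraction


def backlund_up(y, dy, alpha):
    """Pointwise image of a P_II solution under alpha -> alpha + 1.

    y and dy are callables returning y(x) and y'(x) as exact rationals."""
    def y_hat(x):
        return -y(x) - (alpha + Fraction(1, 2)) / (dy(x) + y(x) ** 2 + x / 2)
    return y_hat


def p2_residual(y, d2y, alpha, x):
    """y'' - (2 y^3 + x y + alpha) at the point x; zero for a solution."""
    return d2y(x) - (2 * y(x) ** 3 + x * y(x) + alpha)


# Seed: the trivial solution y = 0 of P_II with alpha = 0.
y0 = lambda x: Fraction(0)
dy0 = lambda x: Fraction(0)

# alpha = 1: the transformation should return y = -1/x.
y1 = backlund_up(y0, dy0, 0)
dy1 = lambda x: 1 / x ** 2        # derivative of -1/x
d2y1 = lambda x: -2 / x ** 3      # second derivative of -1/x

# alpha = 2: one more step, using the closed form of y1'.
y2 = backlund_up(y1, dy1, 1)
```

Evaluating at rational points keeps the comparison exact; iterating further would need the derivative of each new solution, which this sketch supplies by hand only for the first step.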


Hamiltonians and Tau-Functions 


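Each Painlevé equation has a Hamiltonian form. For
P_I and P_II, these can be found by integrating each
equation after multiplying by $y'$. These give

$$\mathrm{P_{I}}:\quad \frac{y'^2}{2} = 2y^3 + xy - \int^x y(\xi)\,d\xi + E_{\rm I}$$
$$\mathrm{P_{II}}:\quad \frac{y'^2}{2} = \frac{y^4}{2} + \frac{x\,y^2}{2} - \frac12\int^x y(\xi)^2\,d\xi + \alpha y + E_{\rm II}$$

where $E_{\rm I}$ and $E_{\rm II}$ are constants. We choose
canonical variables $q_1(t) = y(x)$, $p_1(t) = y'(x)$, where
t = x. Furthermore, for P_I, we take

$$q_2(t) = x,\qquad p_2(t) = \int^x y(\xi)\,d\xi$$

and the Hamiltonian

$$H_{\rm I} := \frac{p_1^2}{2} - 2q_1^3 - q_2 q_1 + p_2$$

so that the Hamiltonian equations of motion
$\dot q_i = \partial H/\partial p_i$ and $\dot p_i = -\partial H/\partial q_i$ are satisfied. For
P_II, we take

$$q_2(t) = x/2,\qquad p_2(t) = \int^x y(\xi)^2\,d\xi$$

and the Hamiltonian

$$H_{\rm II} := \frac{p_1^2}{2} - \frac{q_1^4}{2} - q_2 q_1^2 + \frac{p_2}{2} - \alpha q_1$$

We note that these Hamiltonians govern systems
with two degrees of freedom and each is conserved.
However, no explicit second conserved quantity is
known (see comments on first integrals in the last
section).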

Painlevé’s viewpoint of the transcendental solutions 
of the Painlevé equations as natural generalizations of 
elliptic functions also led him to search for entire 
functions that play the role of theta functions in 
this new setting. He found that analogous functions 
could be defined which have only zeros at the 
locations of the movable singularities of the Painlevé 
transcendents. These functions are now commonly
known as tau-functions (also denoted $\tau$-functions).
For P_I and P_II, the corresponding tau-functions are
entire functions (i.e., they are analytic everywhere in
the complex x-plane). However, for the remaining
Painlevé equations, they are singular at the fixed
singularities of the respective equation.

For P_I, all movable singularities are double
poles of strength unity (see eqn [1]). Therefore, the
function given by

$$\mathrm{P_{I}}:\quad \tau_{\rm I}(x) = \exp\Bigl(-\int^x\!\!\int^s y(t)\,dt\,ds\Bigr)$$

has Taylor expansion with leading term $(x - x_0)$.
In other words, $\tau_{\rm I}(x)$ is analytic at all the poles
of the corresponding solution of P_I. Since y(x) has
no other singularity (other than at infinity), $\tau_{\rm I}(x)$
must be analytic everywhere in the complex x-plane.
Differentiation and substitution of P_I shows that
$\tau_{\rm I}(x)$ satisfies the fourth-order equation

$$\mathrm{P_{I}}:\quad \tau_{\rm I}''''(x)\,\tau_{\rm I}(x) = 4\tau_{\rm I}'(x)\,\tau_{\rm I}'''(x) - 3\tau_{\rm I}''(x)^2 - x\,\tau_{\rm I}(x)^2$$


Note that this equation is bilinear in $\tau$ and its
derivatives. Such bilinear, or in general, multilinear,
equations are called Hirota-type forms of the Painlevé
equations. The special nature of such equations is
most simply expressed in terms of the Hirota $D\,(= D_x)$
operator, an antisymmetric differential operator defined
here on products of functions of x:

$$D^n f\cdot g = (\partial_\xi - \partial_\eta)^n f(\xi)\,g(\eta)\big|_{\xi=\eta=x}$$

Notice that

$$D^2\,\tau\cdot\tau = 2\bigl(\tau\tau'' - \tau'^2\bigr)$$
$$D^4\,\tau\cdot\tau = 2\bigl(\tau\tau^{(4)} - 4\tau'\tau^{(3)} + 3\tau''^2\bigr)$$

Hence the equation satisfied by $\tau_{\rm I}(x)$ can be
rewritten more succinctly as

$$(D^4 + 2x)\,\tau_{\rm I}\cdot\tau_{\rm I} = 0$$
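The Hirota operator is easy to experiment with on polynomials. The following sketch is an illustration, not part of the original text: it represents a polynomial by its coefficient list and implements $D^n f\cdot g = \sum_k (-1)^k \binom{n}{k} f^{(n-k)} g^{(k)}$, which is the binomial expansion of $(\partial_\xi - \partial_\eta)^n$. The helper names (`deriv`, `mul`, `add`, `trim`, `hirota`) are assumptions of this example.

```python
from math import comb


def deriv(p, k=1):
    """k-th derivative of a polynomial given as coefficients [c0, c1, ...]."""
    for _ in range(k):
        p = [i * c for i, c in enumerate(p)][1:] or [0]
    return p


def mul(p, q):
    """Product of two coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def add(p, q):
    """Sum of two coefficient lists of possibly different length."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]


def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p


def hirota(n, f, g):
    """D^n f.g expanded with the binomial theorem."""
    total = [0]
    for k in range(n + 1):
        term = mul(deriv(f, n - k), deriv(g, k))
        total = add(total, [(-1) ** k * comb(n, k) * c for c in term])
    return trim(total)
```

Odd powers of D vanish on $f\cdot f$ by antisymmetry, while even powers give twice the familiar bilinear combinations, for instance `hirota(2, f, f)` equals $2(ff'' - f'^2)$.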


For P_II, a generic solution y(x) has movable simple
poles of residue $\pm 1$ (see eqn [2]). Painlevé pointed
out that if we square the function y(x), multiply
by −1 and integrate twice, we obtain a function
with Taylor expansion with leading term $(x - x_0)$.
However, the square is not invertible and to
construct an invertible mapping to entire functions,
we need two $\tau$-functions. We denote these by $\tau_{\rm II}(x)$
and $\sigma_{\rm II}(x)$:

$$\mathrm{P_{II}}:\quad \tau_{\rm II}(x) = \exp\Bigl(-\int^x\!\!\int^s y(t)^2\,dt\,ds\Bigr)$$
$$\sigma_{\rm II}(x) = y(x)\,\tau_{\rm II}(x)$$

The equations satisfied by these tau-functions are

$$\mathrm{P_{II}}:\quad \tau_{\rm II}''(x)\,\tau_{\rm II}(x) = \tau_{\rm II}'(x)^2 - \sigma_{\rm II}(x)^2$$


Hierarchies 


Each Painlevé equation is associated with at least 
one infinite sequence of ordinary differential 
equations (ODEs) indexed by order. These 
sequences are called hierarchies and arise from 
symmetry reductions of PDE hierarchies that are 
associated with soliton equations. 

Define the operator $L_n\{v(z)\}$ (the Lenard recursion
operator) recursively by

$$\frac{d}{dz}L_{n+1}\{v\} = \Bigl(\frac{d^3}{dz^3} + 4v\frac{d}{dz} + 2v'\Bigr)L_n\{v\}$$
$$L_1\{v\} = v$$

where primes denote z-derivatives. Note that

$$L_2\{v\} = v'' + 3v^2$$
$$L_3\{v\} = v'''' + 10vv'' + 5v'^2 + 10v^3$$


This operator is intimately related to the Korteweg-de
Vries equation. (It was first discovered as a method of
generating the infinite number of conservation laws
associated with this soliton equation.)
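The first two steps of the recursion can be worked out by hand. The sketch below assumes the recursion in the form $\frac{d}{dz}L_{n+1}\{v\} = (\frac{d^3}{dz^3} + 4v\frac{d}{dz} + 2v')L_n\{v\}$ with $L_1\{v\} = v$, and takes every integration constant to be zero.

```latex
\begin{aligned}
\frac{d}{dz}L_2\{v\} &= v''' + 4vv' + 2v'v = \frac{d}{dz}\bigl(v'' + 3v^2\bigr),\\[4pt]
\frac{d}{dz}L_3\{v\} &= (v'' + 3v^2)''' + 4v\,(v'' + 3v^2)' + 2v'\,(v'' + 3v^2)\\
 &= v^{(5)} + 10vv''' + 20v'v'' + 30v^2v'\\
 &= \frac{d}{dz}\bigl(v'''' + 10vv'' + 5v'^2 + 10v^3\bigr).
\end{aligned}
```

Each right-hand side turns out to be an exact z-derivative, which is what allows the recursion to define $L_{n+1}$ as a differential polynomial in v.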

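The scaling $v(z) = \lambda\,y(x)$, $x = \mu z$, with $\lambda = (-2)^{3/5}$,
$\mu = (-2)^{-1/5}$ (so that $\mu^5 = -1/2$ and $\lambda = \mu^{-3}$), shows that
the case n = 2 of the sequence of ODEs defined recursively by

$$L_n\{v\} = z$$

is P_I. Hence this is called the first Painlevé hierarchy.
A second Painlevé hierarchy is given recursively by

$$\Bigl(\frac{d}{dx} + 2y\Bigr)L_n\{y' - y^2\} = xy + \alpha_n,\qquad n \ge 1$$

where $\alpha_n$ are constants.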

Each Painlevé equation may arise as a reduction 
of more than one PDE. Since different soliton 
equations have different hierarchies, this means 
that more than one hierarchy may be associated 
with each Painlevé equation. 


See also: Bäcklund Transformations; Integrable Discrete
Systems; Integrable Systems: Overview; Isomonodromic
Deformations; Ordinary Special Functions;
Riemann–Hilbert Methods in Integrable Systems;
Riemann–Hilbert Problem; Solitons and Kac–Moody Lie
Algebras; Two-Dimensional Ising Model; WDVV
Equations and Frobenius Manifolds.


Introduction


Many physical laws are mathematically expressed 
in terms of partial differential equations (PDEs); 
this is, for instance, the case in the realm of 
classical mechanics and physics of the laws of 
conservation of angular momentum, mass, and 
energy. 

The object of this short article is to provide an 
overview and make a few comments on the set of 
PDEs appearing in classical mechanics, which is 
tremendously rich and diverse. From the mathema- 
tical point of view the PDEs appearing in mechanics 
range from well-understood PDEs to equations 
which are still at the frontier of sciences as far as 
their mathematical theory is concerned. The math- 
ematical theory of PDEs deals primarily with their 
“well-posedness” in the sense of Hadamard. A well- 
posed PDE problem is a problem for which 
existence and uniqueness of solutions in suitable 
function spaces and continuous dependence on the 
data have been proved. 

For simplicity, let us restrict ourselves to space 
dimension 2. Several interesting and important PDEs 
are of the form 

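$$a\,\frac{\partial^2 u}{\partial x^2} + b\,\frac{\partial^2 u}{\partial x\,\partial y} + c\,\frac{\partial^2 u}{\partial y^2} = 0 \qquad [1]$$

Here a, b, c may depend on x and y or they may be
constants, and then eqn [1] is linear; they may also
depend on u, $\partial u/\partial x$, and $\partial u/\partial y$, in which case the
equation is nonlinear.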

Such an equation is 


• elliptic when (where) $b^2 - 4ac < 0$,
• hyperbolic when (where) $b^2 - 4ac > 0$,
• parabolic when (where) $b^2 - 4ac = 0$.
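The classification depends only on the sign of the discriminant of the second-order part, so it can be stated as a few lines of code. This is a minimal sketch; the function name `classify` and the sample coefficient triples are illustrative. Each triple lists the coefficients (a, b, c) of the second derivatives with respect to the first variable, the mixed pair, and the second variable.

```python
def classify(a, b, c):
    """Type of a u_xx + b u_xy + c u_yy = 0 at a point, from the sign of b^2 - 4ac."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return "elliptic"
    if disc > 0:
        return "hyperbolic"
    return "parabolic"


# Second-order parts of three model equations (lower-order terms play no role):
laplace = classify(1, 0, 1)    # u_xx + u_yy
wave = classify(1, 0, -1)      # u_tt - u_xx, with (t, x) in place of (x, y)
heat = classify(0, 0, -1)      # u_t - u_xx has only the u_xx term at second order
```

For variable or solution-dependent coefficients the same test is applied pointwise, which is why an equation can change type from one region to another.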


Among the simplest linear equations, we have the 
elliptic equation 


$$\Delta u = 0 \qquad [2]$$


which governs the following phenomena: equation 
for the potential or stream function of plane, 
incompressible irrotational fluids; equation for 
some potential in linear elasticity, or the equation 
for the temperature in suitable conditions (sta- 
tionary case; see below for the time-dependent 
case). 


Another eqn of the form [1] is the hyperbolic
equation

$$\frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x^2} = 0 \qquad [3]$$

which governs, for example, linear acoustics in one
dimension (sound pipes) or the propagation of an
elastic wave along an elastic string.

A third equation of type [1] is the linear parabolic
equation

$$\frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} = 0 \qquad [4]$$


also called the heat equation, which governs, under 
appropriate circumstances, the temperature (u(x, t) = 
temperature at x at time t). 

All these equations are well understood from the 
mathematical viewpoint and many well-posedness 
results are available. A fundamental difference 
between eqns [2], [3], and [4] is that for [2] and 
[4] the solution is as smooth as allowed by the data 
(forcing terms, boundary data not mentioned here), 
whereas the solutions of [3] usually present some 
discontinuities corresponding to the propagation of 
a wave or wave front. 

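A considerable jump of complexity occurs if we
consider the equation of transonic flows, in which

$$a = 1 - \frac{1}{v^2}\Bigl(\frac{\partial u}{\partial x}\Bigr)^2,\qquad b = -\frac{2}{v^2}\,\frac{\partial u}{\partial x}\,\frac{\partial u}{\partial y},\qquad c = 1 - \frac{1}{v^2}\Bigl(\frac{\partial u}{\partial y}\Bigr)^2 \qquad [5]$$

where $v = v(x, y)$ is the local speed of sound. This is
a mixed second-order equation: it is elliptic in the
subsonic region where M < 1, M the Mach number
being the ratio of the velocity

$$|\operatorname{grad} u| = \Bigl[\Bigl(\frac{\partial u}{\partial x}\Bigr)^2 + \Bigl(\frac{\partial u}{\partial y}\Bigr)^2\Bigr]^{1/2}$$

to the local velocity of sound $v = v(x,y)$; eqn [1]
(with [5]) is hyperbolic in the supersonic region,
where M > 1, and parabolic on the sonic line
M = 1. Essentially no result of well-posedness is
available for this problem, and it is not even totally
clear what are the boundary conditions that one
should associate to [1]–[5] to obtain a well-posed
problem.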


Intermediate mathematical situations are encoun- 
tered with the Navier-Stokes and Euler equations, 
which govern the motion of fluids in the viscous 
and inviscid cases, respectively. A number of 
mathematical results are available for these equa- 
tions (see Compressible Flows: Mathematical The- 
ory, Incompressible Euler Equations: Mathematical 
Theory, Viscous Incompressible Fluids: Mathema- 
tical Theory, Inviscid Flows); but other questions 
are still open, including the famous Clay prize 
problem, which is: to show that the solutions of the 
(viscous, incompressible) Navier-Stokes equations, 
in space dimension three, remain smooth for all time, 
or to exhibit an example of appearance of singularity. 
A prize of US$ 1 million will be awarded by the Clay 
Foundation for the solution of this problem. 

For compressible fluids, the Navier-Stokes equations
expressing conservation of momentum
and mass read

$$\rho\Bigl(\frac{\partial u}{\partial t} + (u\cdot\nabla)u\Bigr) - \mu\Delta u + \nabla p - (\lambda+\mu)\nabla(\nabla\cdot u) = 0 \qquad [6]$$

$$\frac{\partial\rho}{\partial t} + \nabla\cdot(\rho u) = 0 \qquad [7]$$

Here $u = u(x,t)$ is the velocity at x at time t,
$p = p(x,t)$ the pressure, $\rho$ the density; $\lambda, \mu$ are
viscosity coefficients, $\mu > 0$, $3\lambda + 2\mu > 0$. When
$\mu = \lambda = 0$, we obtain the Euler equation (see Compressible
Flows: Mathematical Theory). If the fluid
is incompressible and homogeneous, then the density
is constant, $\rho = \rho_0$ and


V-u=0 [8] 


so that eqn [8] replaces eqn [7] and eqn [6]
simplifies accordingly.

Finally, let us mention still different nonlinear 
PDEs corresponding to nonlinear wave phenomena, 
namely the Korteweg-de Vries (see Korteweg-de 
Vries Equation and Other Modulation Equations) 
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Ou Ou u 
and the nonlinear Schrödinger equation (see Nonlinear
Schrödinger Equations)


Ou 


2A 2 
— ——ig|[A A +aA=0 [10] 
Oz 


“0 
+13 Jz 
a,y > 0. These equations are very different from 
eqns [1]-[8] and are reasonably well understood 
from the mathematical point of view; they produce 
and describe the amazing physical wave phenom- 
enon known as the soliton (see Solitons and Kac- 
Moody Lie Algebras). 

This article is based on the Appendix of the book 
by Miranville and Temam quoted below, with the 
authorization of Cambridge University Press.


Introduction


Let us recall that there are basically two algebraic 
infinite-dimensional distribution theories: 


e The first one is white-noise analysis (Hida et al. 
1993, Berezansky and Kondratiev 1995), and uses 
Fock spaces and the algebra of creation and 
annihilation operators. 

e The second one is the noncommutative differen- 
tial geometry of Connes (1988) and uses the entire 
cyclic complex. 


If we disregard the differential operations, these
two distribution theories are very similar. Let us
recall quickly their background on geometrical
examples. Let V be a compact Riemannian manifold
and E a Hermitian bundle on it. We consider an
elliptic Laplacian $\Delta_E$ acting on sections $\omega$ of this
bundle. We consider the Sobolev space $H_k$, k > 0, of
sections $\omega$ of E such that:

$$\int_V \langle(\Delta_E^k + 1)\omega, \omega\rangle\,dm_V < \infty \qquad [1]$$

where $dm_V$ is the Riemannian measure on V and $\langle\,,\rangle$
the Hermitian structure on V. $H_{k+1}$ is included in
$H_k$ and the intersection of all $H_k$ is nothing other
than the space of smooth sections of the bundle E,
by the Sobolev embedding theorem.

Let us quickly recall Connes’ distribution theory:
let $\alpha(n)$ be a sequence of real strictly positive
numbers. Let

$$\sigma = \sum \sigma_n \qquad [2]$$

where $\sigma_n$ belongs to $H_k^{\otimes n}$ with the Hilbert structure
naturally inherited from the Hilbert structure of $H_k$.
We put, for C > 0,

$$\|\sigma\|_{1,C,k} = \sum C^n\,\alpha(n)\,\|\sigma_n\|_{H_k^{\otimes n}} \qquad [3]$$

The set of $\sigma$ such that $\|\sigma\|_{1,C,k} < \infty$ is a Banach
space called $Co_{C,k}$. The space of Connes functionals
$Co_{\infty-}$ is the intersection of these Banach spaces for
C > 0 and k > 0 endowed with its natural topology.
Its topological dual $Co_{-\infty}$ is the space of distributions
in Connes’ sense.


Remark We do not give the original version of the
space of Connes where tensor products of Banach
algebras appear but we use here the presentation of
Jones and Léandre (1991).


Let us now quickly recall the theory of distributions
in the white-noise sense. The main tools are Fock
spaces. We consider interacting Fock spaces (Accardi
and Bozejko 1998) constituted of $\sigma$ written as in [2]
such that

$$\|\sigma\|_{2,C,k}^2 = \sum_n C^n \alpha(n) \|\sigma_n\|_{H_k^{\otimes n}}^2 < \infty$$   [4]

The space of white-noise functionals $WN_{\infty-}$ is the
intersection of these interacting Fock spaces $\Lambda_{k,C}$
for $C > 0$, $k > 0$. Its topological dual $WN_{-\infty}$ is
called the space of white-noise distributions.

Traditionally, in white-noise analysis, one considers
in [2] the case where $\sigma_n$ belongs to the symmetric
tensor product of $H_k$ endowed with its natural Hilbert
structure. We get a symmetric Fock space $\Lambda^s_{k,C}$ and
another space of white-noise distributions $WN_{s,-\infty}$.
The interest in considering symmetric Fock spaces,
instead of interacting Fock spaces, arises from the
characterization theorem of Potthoff-Streit. For the
sake of simplicity, let us consider the case where
$\alpha(n) = 1$. If $w$ is a smooth section of $E$, we can
consider its exponential
$\exp[w] = \sum_n \frac{1}{n!} w^{\otimes n}$. If we consider an
element $\Phi$ of $WN_{s,-\infty}$, the map
$w \mapsto \langle \Phi, \exp[w] \rangle$ satisfies two natural
conditions:

1. $|\langle \Phi, \exp[w] \rangle| \le C \exp[C \|w\|_{H_k}^2]$ for some $k > 0$.

2. $z \mapsto \langle \Phi, \exp[w_1 + z w_2] \rangle$ is entire.

The Potthoff-Streit theorem states the converse:
a functional which sends a smooth section of $E$
into a Hilbert space and which satisfies the two
previous requirements defines an element of
$WN_{s,-\infty}$ with values in this Hilbert space.
Moreover, if the functional depends holomorphically on
a complex parameter, then the distribution depends
holomorphically on this complex parameter as well.

The Potthoff-Streit theorem allows us to define
flat Feynman path integrals as distributions. This is
the opposite of the traditional point of view of
physicists, where path integrals are generally defined
by convergence of finite-dimensional lattice
approximations. Hida and Streit have proposed
replacing the approach of physicists by defining
path integrals as infinite-dimensional distributions,
using Wiener chaos. Getzler was the first to think of
replacing Wiener chaos by other functionals on path
spaces, namely Chen iterated integrals. In this
article, we review the recent developments of path
integrals in this framework. We will mention the
following topics:

• infinite-dimensional volume element

• Feynman path integral on a manifold

• Bismut-Chern character and path integrals

• fermionic Brownian motion

The reader who is interested in various rigorous
approaches to path integrals should consult the
review of Albeverio (1996).


Infinite-Dimensional Volume Element 


Let us recall that the Lebesgue measure does not
generally exist as a measure in infinite dimensions.
For instance, the Haar measure on a topological
group exists if and only if the topological group is
locally compact. Our purpose in this section is to
define the Lebesgue measure as a distribution.

We consider the set $C^\infty(M; N)$ of smooth maps $x(\cdot)$
from a compact Riemannian manifold $M$ into a
compact Riemannian manifold $N$, endowed with its
natural Fréchet topology. $S$ is the generic point of $M$
and $x$ the generic point of $N$. We would like to say
that the law of $x(S_i)$ for a finite set of $n$ different
points $S_i$ under the formal Lebesgue measure $dD(x(\cdot))$
on $C^\infty(M; N)$ is the product law $\otimes^n dm_N$ (this
means that the Lebesgue measure on $C^\infty(M; N)$ is a
cylindrical measure). Let us consider a smooth
function $\sigma_n$ from $(M \times N)^n$ into $C$. We introduce
the associated functional $F(\sigma_n)(x(\cdot))$ on $C^\infty(M; N)$:

$$F(\sigma_n)(x(\cdot)) = \int_{M^n} \sigma_n(S_1, \dots, S_n, x(S_1), \dots, x(S_n)) \, dm_M^{\otimes n}$$   [5]

If we formally use the Fubini formula, we get

$$\int_{C^\infty(M;N)} F(\sigma_n)(x(\cdot)) \, dD(x(\cdot)) = \int_{M^n \times N^n} \sigma_n(S_1, \dots, S_n, x_1, \dots, x_n) \, dm_M^{\otimes n} \otimes dm_N^{\otimes n}$$   [6]

We will interpret this formal remark in the framework
of the distribution theories of the introduction. We
consider $V = M \times N$ and $E$ the trivial complex line
bundle endowed with the trivial metric and $\alpha(n) = 1$.
We can define the associated algebraic spaces $Co_{\infty-}$
and $WN_{\infty-}$ and we can extend to $Co_{\infty-}$ and $WN_{\infty-}$
the map $F$ of [5]. $F$ sends elements of $Co_{\infty-}$ and
$WN_{\infty-}$ into the set of continuous bounded maps on
$C^\infty(M; N)$, where we can extend [6]. We obtain:


Theorem 1 $\sigma \mapsto \int_{C^\infty(M;N)} F(\sigma)(x(\cdot)) \, dD(x(\cdot))$ defines
an element of $Co_{-\infty}$ or $WN_{-\infty}$.


Feynman Path Integral on a Manifold 


Let us introduce the flat Brownian motion $s \mapsto B(s)$
in $R^d$ starting from 0. It has formally the Gaussian
law

$$\frac{1}{Z} \exp\left[ -\frac{1}{2} \int_0^1 \left| \frac{d}{ds} B(s) \right|^2 ds \right] dD(B(\cdot))$$

where $dD(B(\cdot))$ is the formal Lebesgue measure on
finite-energy paths starting from 0 in $R^d$ (the
partition function $Z$ is infinite!). Let $N$ be a compact
Riemannian manifold of dimension $d$ endowed with
the Levi-Civita connection. The stochastic parallel
transport on semimartingales for the Levi-Civita
connection exists almost surely (Ikeda and Watanabe
1981). Let us introduce the Laplace-Beltrami operator
$\Delta_N$ on $N$ and the Eells-Elworthy-Malliavin
equation starting from $x$ (Ikeda and Watanabe 1981):

$$dx_s(x) = \tau_s(x) \, dB(s)$$   [7]

where $B(\cdot)$ is a Brownian motion in $T_x(N)$ starting
from 0 and $s \mapsto \tau_s(x)$ is the stochastic parallel
transport associated to the solution. $s \mapsto x_s(x)$ is
called the Brownian motion on $N$. The heat semigroup
associated to $\Delta_N$ satisfies
$\exp[-t\Delta_N] f(x) = E[f(x_t(x))]$ for $f$ continuous on $N$.
Formally, a Jacobian appears in the transformation of
the formal path integral which governs $B(\cdot)$ into the
formal path integral which governs $x_\cdot(x)$:

$$d\mu_x(1) = Z^{-1} \exp[-I(x_\cdot(x))/2] \, J \, dD(x_\cdot(x))$$   [8]

It was shown by B. DeWitt, in a formal way, that the
action in [8] is not the energy of the path and that
there are some counter-terms in the action where the
scalar curvature $K$ of $N$ appears (see Andersson and
Driver (1999) and Sidorova et al. (2004) for rigorous
results). In order to describe Feynman path integrals,
we perform, as is classical in physics, analytic
continuation on the semigroup and on the “measure”
$d\mu_x(1)$ such that we get a distribution $d\mu_x(\alpha)$ which
depends holomorphically on $\alpha$, $\mathrm{Re}\,\alpha > 0$.

In order to return to the formalism of the
introduction, we consider $V = N$, $E$ the trivial
complex line bundle, the symmetric Fock space and
$\alpha(n) = 1$. To $\sigma_n/n!$ belonging to $H_k^{\otimes n}$ we
associate the functional on $P(N)$, the smooth path
space on $N$:

$$F(\sigma_n/n!)(x(\cdot)) = \int_{\Delta_n} \sigma_n(x(s_1), \dots, x(s_n)) \, ds_1 \cdots ds_n$$   [9]

where $\Delta_n$ is the $n$-dimensional simplex of $[0,1]^n$
constituted of times $0 < s_1 < \cdots < s_n < 1$ (Léandre
2003). We remark that $F$ maps $WN_{s,\infty-}$ into the
set of bounded continuous functionals on $P(N)$. We
introduce an element $h$ of $L^2(N)$. The map which to
$w$, a smooth function on $N$, associates
$\exp[\alpha(\Delta_N + w)] h$ ($\mathrm{Re}\,\alpha < 0$) satisfies the
requirements (1) and (2) of the introduction and depends
holomorphically on $\alpha$. By the Potthoff-Streit theorem,
this defines a distribution $\Phi_\alpha$ which depends
holomorphically on $\alpha$, $\mathrm{Re}\,\alpha < 0$, with values in
$L^2(N)$. By uniqueness of analytic continuation, we obtain:


Theorem 2 If $P_x(N)$ is the space of smooth paths
starting from $x$ in $N$, we have

$$\Phi_\alpha(\sigma)(x) = \int_{P_x(N)} F(\sigma)(x(\cdot)) \, h(x(1)) \, d\mu_x(\alpha)$$   [10]


Instead of taking functions, we can consider as
bundle $E$ the space of complex 1-forms on $N$. We
then consider Chen (1973) iterated integrals:

$$F(\sigma_n)(x(\cdot)) = \int_{\Delta_n} \langle \sigma_n(x(s_1), \dots, x(s_n)), \, dx(s_1) \otimes \cdots \otimes dx(s_n) \rangle$$   [11]

such that $F$ maps $WN_{s,\infty-}$ into the set of measurable
maps on $P(N)$. These maps are generally not
bounded. Namely,

$$F(\exp[w]) = \exp\left[ \int_0^1 \langle w(x(s)), dx(s) \rangle \right]$$   [12]

instead of $\exp[\int_0^1 w(x(s)) \, ds]$ in the previous case.
By using the Cameron-Martin-Girsanov-Maruyama
formula and Kato perturbation theory, we get an analog
of Theorem 2 for Chen iterated integrals, but for
$\mathrm{Re}\,\alpha < 0$, because we have to deal with a
perturbation of $\Delta_N$ by a drift when we want to check
(1) and (2). The interest of this formalism is that the
parallel transport belongs in some sense to the domain
of the distribution and that we get the flat Feynman
path integral from the curved one by using an analog
of [7].
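
As a numerical illustration of the scalar functionals [9], the following Python sketch checks, for one sample path, that summing the simplex integrals of $w(x(s_1)) \cdots w(x(s_n))$ over $n$ reproduces $\exp[\int_0^1 w(x(s)) \, ds]$. The path $x(s) = \sin(2\pi s)$, the function $w(y) = y^2 + 0.3$, the grid size and all function names are assumptions made for this illustration only; they are not taken from the references.

```python
import math


def simplex_integrals(f_values, h, n_max):
    """Return [I_0(1), ..., I_{n_max}(1)], where I_n(t) is the integral of
    f(s_1) ... f(s_n) over the simplex 0 < s_1 < ... < s_n < t.

    Uses the recursion I_n(t) = int_0^t f(s) I_{n-1}(s) ds, discretized
    with a cumulative trapezoidal rule on a uniform grid of step h.
    """
    m = len(f_values)
    previous = [1.0] * m          # I_0(t) = 1 on the whole grid
    totals = [1.0]
    for _ in range(n_max):
        current = [0.0] * m
        for j in range(1, m):
            left = f_values[j - 1] * previous[j - 1]
            right = f_values[j] * previous[j]
            current[j] = current[j - 1] + 0.5 * h * (left + right)
        totals.append(current[-1])
        previous = current
    return totals


def sample_path(s):
    # Hypothetical smooth loop in R, chosen only for this check.
    return math.sin(2.0 * math.pi * s)


def w(y):
    # Hypothetical smooth function on the target space.
    return y * y + 0.3


steps = 2000
h = 1.0 / steps
f_values = [w(sample_path(j * h)) for j in range(steps + 1)]

totals = simplex_integrals(f_values, h, 25)
series = sum(totals)                  # truncated sum over n of the simplex integrals
closed_form = math.exp(totals[1])     # exp of the single integral of w(x(s))
```

Up to discretization and truncation error, `series` and `closed_form` agree, because the integral over the simplex $\Delta_n$ of a symmetric product equals $(\int_0^1 w(x(s)) \, ds)^n / n!$.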


Bismut-Chern Character and Path Integrals


Since we are concerned in this part with index theory,
we replace the free path space of $N$ by the free smooth
loop space $L(N)$. We consider the case where $V = N$ is
a compact oriented Riemannian spin manifold and
$E = E_- \oplus E_+$. $E_-$ is the bundle of complexified odd
forms and $E_+$ is the bundle of complexified even
forms. To
$\sigma_n = \frac{1}{n!} (w_1 + w_1') \otimes \cdots \otimes (w_n + w_n')$,
we associate the even Chen (1973) iterated integral

$$F(\sigma_n) = \int_{\Delta_n} (w_1(dx(s_1), \cdot) + w_1' \, ds_1) \wedge \cdots \wedge (w_n(dx(s_n), \cdot) + w_n' \, ds_n)$$   [13]

where $s \mapsto x(s)$ is a smooth loop in $N$, $w_i$ is of odd
degree and $w_i'$ is of even degree. $F(\sigma_n)$ is built
from even forms on the free loop space, and even forms
commute. This explains why we have to consider
the symmetric Fock space. Therefore, if $\sigma$ belongs to
$WN_{s,\infty-}$, then $F(\sigma) = \sum_r F^{2r}(\sigma)$, where
$F^{2r}(\sigma)$ is a measurable form on $L(N)$ of degree $2r$
(see Jones and Léandre (1991) for an analogous statement
in the stochastic context).

Let us explain why the free loop space is
important in this context. Let $d\nu_x(1)$ be the law of
the Brownian bridge on $N$ starting from $x$ and
coming back to $x$ at time 1: this is the law of the
Brownian motion $x_\cdot(x)$ conditioned to return at time 1
to its starting point. Let $p_t(x, y)$ be the heat kernel
associated with $x_t(x)$: the law of $x_t(x)$ is namely
$p_t(x, y) \, dm_N(y)$ (Ikeda and Watanabe 1981). We
consider the Bismut-Høegh-Krohn measure on the
continuous free loop space $L_0(N)$:

$$dP = p_1(x, x) \, dx \otimes d\nu_x(1)$$   [14]

This satisfies

$$\mathrm{tr}\left[ \exp[-s_1 \Delta_N] f_1 \cdots f_n \exp[-(1 - s_n) \Delta_N] \right] = \int_{L_0(N)} f_1(x(s_1)) \cdots f_n(x(s_n)) \, dP$$   [15]

(Unlike in the previous section, we are interested in
the trace of the heat semigroup instead of the heat
semigroup itself.)

Since $N$ is spin, we can consider the spin bundle
$Sp = Sp_+ \oplus Sp_-$ on it and the Clifford bundle $Cl$ on it
with its natural $Z/2Z$ gradation (Gilkey 1995). Let us
recall that the Clifford algebra acts on the spinors. A
form $w$ can be associated with an element $\tilde{w}$ of the
Clifford bundle (Gilkey 1995). We consider the Brownian
loop $x(\cdot)$ associated to the Bismut-Høegh-Krohn measure.
If $s < t$, we can define the stochastic parallel transport
$\tau_{s,t}$ from $x(t)$ to $x(s)$ (we identify a loop with a path
from $[0,1]$ into $N$ with the same end values). We remark
that, with the notations of [13],

$$\int_{\Delta_n} \tau_{0,s_1} (\tilde{w}_1(dx(s_1)) + \tilde{w}_1' \, ds_1) \tau_{s_1,s_2} \cdots (\tilde{w}_n(dx(s_n)) + \tilde{w}_n' \, ds_n) \tau_{s_n,1} = A$$   [16]

is a random, almost surely defined, even element of the
Clifford bundle over $x(0)$. Acting on $Sp(x(0))$, it thus
preserves the gradation. We consider its supertrace
$\mathrm{tr}_s A = \mathrm{tr}_{Sp_+} A - \mathrm{tr}_{Sp_-} A$. This becomes a
random variable on $L_0(N)$. We introduce the scalar
curvature $K$ of the Levi-Civita connection on $N$, whose
introduction arises from the Lichnerowicz formula giving
the square of the Dirac operator in terms of the
horizontal Laplacian on the spin bundle. We consider the
expression
$\int_{L_0(N)} \exp[-\int_0^1 K(x(s)) \, ds / 8] \, \mathrm{tr}_s A \, dP$.
This expression can be extended and therefore defines an
element $Wi$ of $WN_{s,-\infty}$ called by Getzler (Léandre
2002) the Witten current.

Bismut has introduced a Hermitian bundle $\xi$ on $N$.
He deduces a bundle $\xi_\infty$ on $L(N)$: the fiber on a loop
$x(\cdot)$ is the space of smooth sections of $\xi$ along the
loop. We can suppose that $\xi$ is a sub-bundle given by a
projector $p$ of a trivial bundle. We can suppose that the
Hermitian connection on $\xi$ is the projection connection
$A = p \, dp$, such that its curvature is
$R = p \, dp \wedge p \, dp$. Bismut (1985, 1987) has introduced
the Bismut-Chern character:

$$Ch(\xi_\infty) = \mathrm{tr}\left( \sum_n \int_{\Delta_n} (A \, dx(s_1) - R \, ds_1) \wedge \cdots \wedge (A \, dx(s_n) - R \, ds_n) \right)$$   [17]

$Ch(\xi_\infty)$ is a collection of even forms equal to
$F(\sigma(\xi))$, where $\sigma(\xi)$ belongs to $WN_{s,\infty-}$. We obtain:


Theorem 3 Let us consider the index $\mathrm{Ind}(D_\xi)$ of
the Dirac operator on $N$ with auxiliary bundle $\xi$
(Hida et al. 1993). We have

$$\langle Wi, \sigma(\xi) \rangle = \mathrm{Ind}(D_\xi)$$   [18]

The proof arises from the Lichnerowicz formula,
the matricial Feynman-Kac formula, and the
decomposition of the solution of a stochastic linear
equation into the sum of iterated integrals.

By using the Potthoff-Streit theorem, we can perform the
analytic continuation of [18], as is suggested by the
path-integral interpretation of [18] by Atiyah (1985) or
Bismut (1985, 1987), motivated by the Duistermaat-Heckman
or Berline-Vergne localization formulas on the free loop
space. For this, these authors consider the even form on
the free loop space given by
$I(x(\cdot)) = \int_0^1 |(d/ds) x(s)|^2 \, ds + dX_\infty$, where
$dX_\infty$ is the exterior derivative of the Killing form
$X_\infty$, which to a vector field $X(\cdot)$ on the loop
associates
$\langle X_\infty, X(\cdot) \rangle = \int_0^1 \langle X(s), dx(s) \rangle$.
We should obtain the heuristic formula

$$\langle Wi, \sigma \rangle = Z^{-1} \int_{L(N)} F(\sigma) \wedge \exp\left[ -\frac{1}{2} I(x(\cdot)) \right]$$   [19]

We refer to Léandre (2002) for details.

Let us remark that Bismut (1987) and Léandre
(2003) have continued these formal considerations to
the case of the index theorem for a family of Dirac
operators. We consider a fibration $\pi : N \to B$ of
compact manifolds. Bismut replaces [19] by an
integral of forms on the set of loops of $N$ which
project to a given loop of $B$. Bismut remarks that
this integration in the fiber is related to filtering
theory in stochastic analysis.


Fermionic Brownian Motion 


Alvarez-Gaumé has given a supersymmetric proof of the
index theorem: the path representation of the index of
the Dirac operator involves infinite-dimensional Berezin
integrals, while in the previous section only integrals of
forms on the free loop space were involved. Rogers
(1987) has given an interpretation of the work of
Alvarez-Gaumé, which begins with the study of
fermionic Brownian motion. Let us interpret the
considerations of Rogers (1987) in this framework.

We consider $C^d$. $H$ is the space of $L^2$-maps from
$[0,1]$ into $C^d$. We denote such a map by
$\phi(s) = (\phi_1(s), \dots, \phi_d(s))$, where
$\phi_i(s) = q_i(s) + \sqrt{-1} \, p_i(s)$; $p_i(s)$ is the $i$th
momentum and $q_i(s)$ the $i$th position.
We denote by $\Lambda(H)$ the fermionic Fock space associated
with $H$.

We introduce the bilinear antisymmetric form on $H$:

$$\Omega(\phi^1, \phi^2) = \sqrt{-1} \int_0^1 \sum_i \left( -p_i^1(s) \, dq_i^2(s) + p_i^2(s) \, dq_i^1(s) \right)$$   [20]

and we consider the formal expression
$\exp[\Omega] = \sum_n \frac{1}{n!} \Omega^n$. We define a state on
$\Lambda^2(H)$ by $\omega(\phi^1 \wedge \phi^2) = \Omega(\phi^1, \phi^2)$.
We put $\phi_i(s) = 1_{[0,s]} + \sqrt{-1} \, 1_{[0,s]}$, where we
take the $i$th coordinate in $C^d$. We obtain, if $s_1 < s_2$,

$$\omega(\phi_i(s_1) \wedge \phi_j(s_2)) = -\sqrt{-1} \, \delta_{i,j}$$   [21]

where $\delta_{i,j}$ is the Kronecker symbol. We change the
sign if $s_1 > s_2$ and we write 0 if $s_1 = s_2$.

We consider the finite-dimensional space $Pol$ of
fermionic polynomials on $C^d$. $Pol$ is endowed with a
suitable norm, and we consider $Pol^{\otimes n}$ endowed with
the induced norm. We consider a formal series
$\sigma = \sum_n \sigma_n$, where $\sigma_n$ belongs to $Pol^{\otimes n}$. In
order to simplify the treatment, we suppose that our
fermionic polynomials do not contain constant terms. We
introduce the following Banach norm:

$$\|\sigma\|_{1,C} = \sum_n C^n \|\sigma_n\|$$   [22]

We obtain the notion of Connes space $Co_{\infty-}$ in this
simpler context: $\sigma$ belongs to $Co_{\infty-}$ if
$\|\sigma\|_{1,C} < \infty$ for all $C$. If
$\sigma_n = P_1 \otimes \cdots \otimes P_n$, we associate

$$F(\sigma_n) = \int_{\Delta_n} P_1(\phi(s_1)) \wedge \cdots \wedge P_n(\phi(s_n)) \, ds_1 \cdots ds_n$$   [23]

$F$ can be extended to an injective continuous map
from $Co_{\infty-}$ into $\Lambda(H)$. By using [21], we get:


Theorem 4 $\exp[\Omega]$ is a distribution in the sense of
Connes.

We have only to use the formula [21] and

$$\langle \exp[\Omega], \phi^1 \wedge \cdots \wedge \phi^{2n} \rangle = \mathrm{Pf}\{ \Omega(\phi^i, \phi^j) \}$$   [24]

and to estimate the obtained Pfaffians when $n \to \infty$.
Theorem 4 allows us to give a rigorous interpretation
of the fermionic Feynman-Kac formula of Rogers
(1987). We refer to Roepstorff (1994) for details.
$\exp[\Omega]$ should give a rigorous interpretation to the
Gaussian Berezin integral with formal density
$\exp[\sqrt{-1} \int_0^1 \sum_i p_i(s) \, dq_i(s)]$.
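
To make the pairing [24] concrete, here is a small Python sketch that evaluates a Pfaffian by expansion along the first row and applies it to the matrix read off from [21] in the case $d = 1$. The function names and the sample times are assumptions introduced for illustration; the recursive expansion is a generic textbook method and is not claimed to be the estimate used in the proof.

```python
def pfaffian(a):
    """Pfaffian of an antisymmetric matrix given as a list of lists.

    Expansion along the first row:
    Pf(A) = sum_{j>=1} (-1)^(j+1) * A[0][j] * Pf(A without rows/cols 0 and j).
    Returns 1 for the empty matrix and 0 for odd size.
    """
    size = len(a)
    if size == 0:
        return 1
    if size % 2 == 1:
        return 0
    total = 0
    for j in range(1, size):
        keep = [k for k in range(1, size) if k != j]
        minor = [[a[r][c] for c in keep] for r in keep]
        sign = 1 if j % 2 == 1 else -1
        total += sign * a[0][j] * pfaffian(minor)
    return total


def omega_matrix(times):
    """Matrix of omega(phi(s_a) ^ phi(s_b)) for d = 1, following [21]:
    -sqrt(-1) if s_a < s_b, +sqrt(-1) if s_a > s_b, 0 if s_a = s_b."""
    def entry(s_a, s_b):
        if s_a < s_b:
            return -1j
        if s_a > s_b:
            return 1j
        return 0
    return [[entry(s_a, s_b) for s_b in times] for s_a in times]


# Hypothetical ordered sample times in [0, 1].
pairing_two = pfaffian(omega_matrix([0.2, 0.7]))
pairing_four = pfaffian(omega_matrix([0.1, 0.4, 0.6, 0.9]))
```

For ordered times every entry above the diagonal equals $-\sqrt{-1}$, so the Pfaffian of the $2n \times 2n$ matrix reduces to $(-\sqrt{-1})^n$; the recursive expansion is exponential in cost and is only meant for small examples.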


See also: Equivariant Cohomology and the Cartan
Model; Feynman Path Integrals; Functional Integration in
Quantum Physics; Hopf Algebras and q-Deformation
Quantum Groups; Index Theorems; Measure on Loop
Spaces; Positive Maps on C*-Algebras; Stationary Phase
Approximation; Stochastic Differential Equations;
Supermanifolds; Supersymmetric Quantum Mechanics.


Introduction 


Peakons are singular solutions of the dispersionless
Camassa-Holm (CH) shallow-water wave equation in
one spatial dimension. These are reviewed in the
context of asymptotic expansions and Euler-Poincaré
(EP) variational principles. The dispersionless CH
equation generalizes to the EPDiff equation (defined
subsequently in this article), whose singular solutions
are peakon wave fronts in higher dimensions. The
reduction of these singular solutions of CH and EPDiff
to canonical Hamiltonian dynamics on lower-dimensional
sets may be understood by realizing that their
solution ansatz is a momentum map, and momentum
maps are Poisson.

Camassa and Holm (1993) discovered the “peakon”
solitary traveling-wave solution for a shallow-water
wave:

$$u(x,t) = c \, e^{-|x - ct|/\alpha}$$   [1]

whose fluid velocity $u$ is a function of position $x$ on
the real line and time $t$. The peakon traveling wave
moves at a speed equal to its maximum height, at
which it has a sharp peak (jump in derivative).
Peakons are an emergent phenomenon, solving the
initial-value problem for a partial differential
equation (PDE) derived by an asymptotic expansion of
Euler’s equations using the small parameters of
shallow-water dynamics. Peakons are nonanalytic
solitons, which superpose as

$$u(x,t) = \sum_{a=1}^{N} p_a(t) \, e^{-|x - q_a(t)|/\alpha}$$   [2]

for sets $\{p\}$ and $\{q\}$ satisfying canonical Hamiltonian
dynamics. Peakons arise for shallow-water waves in
the limit of zero linear dispersion in one dimension.
Peakons satisfy a PDE arising from Hamilton’s
principle for geodesic motion on the smooth
invertible maps (diffeomorphisms) with respect to
the $H^1$ Sobolev norm of the fluid velocity. Peakons
generalize to higher dimensions, as well. We explain
how peakons were derived in the context of
shallow-water asymptotics and describe some of
their remarkable mathematical properties.


Shallow-Water Background for Peakons 


Euler’s equations for irrotational incompressible
ideal fluid motion under gravity with a free surface
have an asymptotic expansion for shallow-water
waves that contains two small parameters,
$\epsilon$ and $\delta^2$, with ordering $\epsilon \ge \delta^2$. These small
parameters are $\epsilon = a/h_0$ (the ratio of wave amplitude to
mean depth) and $\delta^2 = (h_0/l_x)^2$ (the squared ratio of
mean depth to horizontal length, or wavelength).
Euler’s equations are made nondimensional by
introducing $x = l_x x'$ for horizontal position, $z = h_0 z'$
for vertical position, $t = (l_x/c_0) t'$ for time,
$\eta = a \eta'$ for surface elevation, and
$\varphi = (g l_x a / c_0) \varphi'$ for velocity potential, where
$c_0 = \sqrt{g h_0}$ is the mean wave speed and $g$ is the
constant gravity. The quantity
$\sigma = \sigma' / (h_0 \rho c_0^2)$ is the dimensionless Bond number,
in which $\rho$ is the mass density of the fluid and $\sigma'$ is
its surface tension, both of which are taken to be
constants. After dropping primes, this asymptotic
expansion yields the nondimensional Korteweg-de
Vries (KdV) equation for the horizontal velocity
variable $u = \varphi_x(x,t)$ at “linear” order in the small
dimensionless ratios $\epsilon$ and $\delta^2$, as the left-hand side of

$$u_t + u_x + \frac{3\epsilon}{2} u u_x + \frac{\delta^2}{6} (1 - 3\sigma) u_{xxx} = O(\epsilon \delta^2)$$   [3]

Here, partial derivatives are denoted using subscripts,
and boundary conditions are $u = 0$ and $u_x = 0$ at
spatial infinity on the real line. The famous
$\mathrm{sech}^2(x - t)$ traveling-wave solutions (the solitons)
for KdV [3] arise in a balance between its (weakly)
nonlinear steepening and its third-order linear
dispersion, when the quadratic terms in $\epsilon$ and $\delta^2$
on its right-hand side are neglected.

In eqn [3], a normal-form transformation due to
Kodama (1985) has been used to remove the other
possible quadratic terms of order $O(\epsilon^2)$ and $O(\delta^4)$.
The remaining quadratic correction terms in the
KdV equation [3] may be collected at order $O(\epsilon \delta^2)$.
These terms may be expressed, after introducing a
“momentum variable,”

$$m = u - \nu \delta^2 u_{xx}$$   [4]

and neglecting terms of cubic order in $\epsilon$ and $\delta^2$, as

$$m_t + m_x + \frac{\epsilon}{2} (u m_x + b \, m u_x) + \frac{\delta^2}{6} (1 - 3\sigma) u_{xxx} = 0$$   [5]

In the momentum variable $m = u - \nu \delta^2 u_{xx}$, the
parameter $\nu$ is given by Dullin et al. (2001):

$$\nu = \frac{19 - 30\sigma - 45\sigma^2}{60 (1 - 3\sigma)}$$   [6]


Thus, the effects of $\delta^2$-dispersion also enter the
nonlinear terms. After restoring dimensions in eqn
[5] and rescaling velocity $u$ by $(b + 1)$, the following
“b-equation” emerges,

$$m_t + c_0 m_x + u m_x + b \, m u_x + \Gamma u_{xxx} = 0$$   [7]

where $m = u - \alpha^2 u_{xx}$ is the dimensional momentum
variable, and the constants $\alpha^2$ and $\Gamma/c_0$ are squares of
length scales. When $\alpha^2 \to 0$, one recovers KdV from
the b-equation [7], up to a rescaling of velocity. Any
value of the parameter $b \ne -1$ may be achieved in
eqn [7] by an appropriate Kodama transformation
(Dullin et al. 2001).
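
For orientation, the coefficient $\nu$ of eqn [6] can be evaluated directly as a function of the Bond number $\sigma$. The short Python sketch below does this; the function name `nu` and the guard against the pole at $\sigma = 1/3$ are choices made here for illustration and are not part of the cited derivation.

```python
def nu(sigma):
    """Coefficient nu of eqn [6] as a function of the Bond number sigma.

    The expression has a pole at sigma = 1/3, where the linear dispersion
    coefficient (1 - 3*sigma) in eqns [3] and [5] vanishes.
    """
    denominator = 60.0 * (1.0 - 3.0 * sigma)
    if denominator == 0.0:
        raise ValueError("nu is undefined at sigma = 1/3")
    return (19.0 - 30.0 * sigma - 45.0 * sigma ** 2) / denominator


# Without surface tension (sigma = 0) the formula reduces to 19/60.
nu_no_surface_tension = nu(0.0)
```

In the absence of surface tension the momentum variable [4] therefore reads $m = u - (19/60) \delta^2 u_{xx}$.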

As already emphasized, the values of the coefficients
in the asymptotic analysis of shallow-water
waves at quadratic order in their two small parameters
only hold modulo the Kodama normal-form
transformations. Hence, these transformations may
be used to advance the analysis and thereby gain
insight, by optimizing the choices of these
coefficients. The freedom introduced by the Kodama
transformations among asymptotically equivalent
equations at quadratic order in $\epsilon$ and $\delta^2$ also helps
to answer the perennial question, “Why are integrable
equations so ubiquitous when one uses asymptotics
in modeling?”


Integrable Cases of the b-equation [7] 


The cases b=2 and b=3 are special values 
for which the b-equation becomes a completely 
integrable Hamiltonian system. For b=2, eqn [7] 
specializes to the integrable CH equation of 
Camassa and Holm (1993). The case b=3 in [7] 
recovers the integrable equation of Degasperis and 
Procesi (1999) (henceforth DP equation). These two 
cases exhaust the integrable candidates for [7], as 
was shown using Painlevé analysis. The b-family of 
eqns [7] was also shown in Mikhailov and Novikov 
(2002) to admit the symmetry conditions necessary 
for integrability, only in the cases b=2 for CH and 
b=3 for DP. 

The b-equation [7] with b =2 was first derived in 
Camassa and Holm (1993) by using asymptotic 
expansions directly in the Hamiltonian for Euler’s 
equations governing inviscid incompressible flow in 
the shallow-water regime. In this analysis, the CH 
equation was shown to be bi-Hamiltonian and 
thereby was found to be completely integrable by 
the inverse-scattering transform (IST) on the real 
line. Reviews of IST may be found, for example, in 
Ablowitz and Clarkson (1991), Dubrovin (1981), 
and Novikov et al. (1984). For discussions of other 
related bi-Hamiltonian equations, see Degasperis 
and Procesi (1999). 

Camassa and Holm (1993) also discovered the
remarkable peaked soliton (peakon) solutions [1],
[2] for the CH equation on the real line, given by [7]
in the case $b = 2$. The peakons arise as solutions of
[7] when $c_0 = 0$ and $\Gamma = 0$, in the absence of linear
dispersion. Peakons move at a speed equal to their
maximum height, at which they have a sharp peak
(jump in derivative). Unlike the KdV soliton, the
peakon speed is independent of its width ($\alpha$).
Periodic peakon solutions of CH were treated in
Alber et al. (1999). There, the sharp peaks of
periodic peakons were associated with billiards
reflecting at the boundary of an elliptical domain.
These billiard solutions for the periodic peakons
arise from geodesic motion on a triaxial ellipsoid, in
the limit that one of its axes shrinks to zero length.

Before Camassa and Holm (1993) derived their 
shallow-water equation, a class of integrable equa- 
tions existed, which was later found to contain eqn 
[7] with b =2. This class of integrable equations was 
derived using hereditary symmetries in Fokas and 
Fuchssteiner (1981). However, eqn [7] was not 
written explicitly, nor was it derived physically as 
a shallow-water equation and its solution properties 
for b=2 were not studied before Camassa and 
Holm (1993). (See Fuchssteiner (1996) for an 
insightful history of how the shallow-water equation 
[7] in the integrable case with b=2 relates to the 
mathematical theory of hereditary symmetries.) 

Equation [7] with $b = 2$ was recently re-derived as a
shallow-water equation by using asymptotic methods
in three different approaches in Dullin et al. (2001), in
Fokas and Liu (1996), and also in Johnson (2002). All
three derivations used different variants of the
method of asymptotic expansions for shallow-water
waves in the absence of surface tension. Only the
derivation in Dullin et al. (2001) used the Kodama
normal-form transformations to take advantage of the
nonuniqueness of the asymptotic expansion results at
quadratic order.

The effects of the parameter $b$ on the solutions of
eqn [7] were investigated in Holm and Staley (2003),
where $b$ was treated as a bifurcation parameter, in the
limiting case when the linear dispersion coefficients are
set to $c_0 = 0$ and $\Gamma = 0$. This limiting case allows
several special solutions, including the peakons, in
which the two nonlinear terms in eqn [7] balance each
other in the “absence” of linear dispersion.


Peakons: Singular Solutions without 
Linear Dispersion in One Spatial 
Dimension 


Peakons were first found as singular soliton solutions
of the completely integrable CH equation. This is eqn
[7] with $b = 2$, now rewritten in terms of the velocity as

$$u_t + c_0 u_x + 3 u u_x + \Gamma u_{xxx} = \alpha^2 (u_{xxt} + 2 u_x u_{xx} + u u_{xxx})$$   [8]


Peakons were found in Camassa and Holm (1993)
to arise in the absence of linear dispersion. That is,
they arise when $c_0 = 0$ and $\Gamma = 0$ in CH [8].
Specifically, peakons are the individual terms in the
peaked N-soliton solution of CH [8] for its velocity

$$u(x,t) = \sum_{a=1}^{N} p_a(t) \, e^{-|x - q_a(t)|/\alpha}$$   [9]

in the absence of linear dispersion. Each term in the
sum is a soliton with a sharp peak at its maximum,
hence the name “peakon.” Expressed using its
momentum, $m = (1 - \alpha^2 \partial_x^2) u$, the peakon velocity
solution [9] of dispersionless CH becomes a sum
over delta functions, supported on a set of points
moving on the real line. Namely, the peakon
velocity solution [9] implies

$$m(x,t) = 2\alpha \sum_{a=1}^{N} p_a(t) \, \delta(x - q_a(t))$$   [10]

because of the relation
$(1 - \alpha^2 \partial_x^2) e^{-|x|/\alpha} = 2\alpha \delta(x)$.
These solutions satisfy the b-equation [7] for any
value of $b$, provided $c_0 = 0$ and $\Gamma = 0$.

Thus, peakons are “singular momentum solutions”
of the dispersionless b-equation, although
they are not stable for every value of $b$. From
numerical simulations (Holm and Staley 2003),
peakons are conjectured to be stable for $b > 1$. In
the integrable cases $b = 2$ for CH and $b = 3$ for DP,
peakons are stable singular soliton solutions. The
spatial velocity profile $e^{-|x|/\alpha}/2\alpha$ of each separate
peakon in [9] is the Green’s function for the
Helmholtz operator on the real line, with vanishing
boundary conditions at spatial infinity. Unlike the
KdV soliton, whose speed and width are related, the
width of the peakon profile is set by its Green’s
function, independently of its speed.
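
The delta-function relation between the peakon profile and the Helmholtz operator can be checked numerically in weak form: pairing $e^{-|x|/\alpha}$ with $(1 - \alpha^2 \partial_x^2) \varphi$ for a smooth test function $\varphi$ should give $2\alpha \varphi(0)$. The Python sketch below does this with a trapezoidal rule; the Gaussian test function, the truncation of the real line to a finite interval, and the function name are assumptions made for this illustration.

```python
import math


def weak_helmholtz_pairing(alpha, half_width=10.0, steps=40000):
    """Approximate the integral of e^{-|x|/alpha} (phi - alpha^2 phi'') dx
    over [-half_width, half_width] for phi(x) = exp(-x^2).

    The grid is uniform and has a node at (essentially) x = 0, where the
    peakon profile has its jump in derivative.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        x = -half_width + j * h
        phi = math.exp(-x * x)
        phi_xx = (4.0 * x * x - 2.0) * phi      # second derivative of exp(-x^2)
        value = math.exp(-abs(x) / alpha) * (phi - alpha * alpha * phi_xx)
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * value
    return total * h


# Expected weak-form value is 2 * alpha * phi(0) = 2 * alpha.
pairing_alpha_one = weak_helmholtz_pairing(1.0)
pairing_alpha_half = weak_helmholtz_pairing(0.5)
```

Integrating by parts twice on each half-line shows why the pairing collapses to the value of the test function at the peak, which is the content of the relation used to pass from [9] to [10].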


Integrable Peakon Dynamics of CH

Substituting the peakon solution ansatz [9] and [10]
into the dispersionless CH equation

$$m_t + u m_x + 2 m u_x = 0, \qquad m = u - \alpha^2 u_{xx}$$   [11]

yields Hamilton’s canonical equations for the
dynamics of the discrete set of peakon parameters
$q_a(t)$ and $p_a(t)$:

$$\dot{q}_a(t) = \frac{\partial h_N}{\partial p_a} \quad \text{and} \quad \dot{p}_a(t) = -\frac{\partial h_N}{\partial q_a}$$   [12]

for $a = 1, 2, \dots, N$, with Hamiltonian given by
(Camassa and Holm 1993):

$$h_N = \frac{1}{2} \sum_{a,b=1}^{N} p_a p_b \, e^{-|q_a - q_b|/\alpha}$$   [13]


Thus, one finds that the points $x = q_a(t)$ in the
peakon solution [9] move with the flow of the fluid
velocity $u$ at those points, since $u(q_a(t), t) = \dot{q}_a(t)$.
This means the $q_a(t)$ are Lagrangian coordinates.
Moreover, the singular momentum solution ansatz
[10] is the Lagrange-to-Euler map for an invariant
manifold of the dispersionless CH equation [11].
On this finite-dimensional invariant manifold for
the PDE [11], the dynamics is canonically
Hamiltonian.
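
The canonical equations [12] with Hamiltonian [13] form a small system of ordinary differential equations that is easy to integrate numerically. The following Python sketch uses a classical fourth-order Runge-Kutta step; the integrator, the step size, the initial data and all function names are assumptions chosen for illustration and do not reproduce the numerical methods of the cited papers.

```python
import math


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def hamiltonian(q, p, alpha=1.0):
    """Peakon Hamiltonian h_N of eqn [13]."""
    return 0.5 * sum(
        pa * pb * math.exp(-abs(qa - qb) / alpha)
        for qa, pa in zip(q, p)
        for qb, pb in zip(q, p)
    )


def rhs(q, p, alpha=1.0):
    """Right-hand sides of eqn [12]: dq_a/dt = dh/dp_a, dp_a/dt = -dh/dq_a."""
    dq, dp = [], []
    for qa, pa in zip(q, p):
        velocity = 0.0
        force = 0.0
        for qb, pb in zip(q, p):
            kernel = math.exp(-abs(qa - qb) / alpha)
            velocity += pb * kernel
            force += pa * pb * sign(qa - qb) * kernel / alpha
        dq.append(velocity)
        dp.append(force)
    return dq, dp


def rk4_step(q, p, dt, alpha=1.0):
    """One classical Runge-Kutta step for the peakon system."""
    def advance(state, slope, scale):
        return [s + scale * k for s, k in zip(state, slope)]

    k1q, k1p = rhs(q, p, alpha)
    k2q, k2p = rhs(advance(q, k1q, dt / 2), advance(p, k1p, dt / 2), alpha)
    k3q, k3p = rhs(advance(q, k2q, dt / 2), advance(p, k2p, dt / 2), alpha)
    k4q, k4p = rhs(advance(q, k3q, dt), advance(p, k3p, dt), alpha)
    new_q = [s + dt * (a + 2 * b + 2 * c + d) / 6
             for s, a, b, c, d in zip(q, k1q, k2q, k3q, k4q)]
    new_p = [s + dt * (a + 2 * b + 2 * c + d) / 6
             for s, a, b, c, d in zip(p, k1p, k2p, k3p, k4p)]
    return new_q, new_p


def integrate(q, p, t_end, dt=1e-3, alpha=1.0):
    """Integrate from t = 0 to t_end with fixed step dt."""
    for _ in range(int(round(t_end / dt))):
        q, p = rk4_step(q, p, dt, alpha)
    return q, p


# Hypothetical initial data: a taller peakon behind a shorter one.
q_start, p_start = [-5.0, 0.0], [2.0, 1.0]
q_end, p_end = integrate(q_start, p_start, 10.0)
```

For a single peakon the system reduces to $\dot{q} = p$, $\dot{p} = 0$, which is the statement that the peakon moves at a speed equal to its height. In the two-peakon example the rear peakon hands most of its momentum to the front one while their order on the line is preserved, and the values of $h_N$ and of the total momentum $\sum_a p_a$ stay constant up to integration error.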

With Hamiltonian [13], the canonical equations
[12] for the $2N$ canonically conjugate peakon
parameters $p_a(t)$ and $q_a(t)$ were interpreted in
Camassa and Holm (1993) as describing “geodesic
motion” on the N-dimensional Riemannian manifold
whose co-metric is $g^{ab}(\{q\}) = e^{-|q_a - q_b|/\alpha}$.
Moreover, the canonical geodesic equations arising from
Hamiltonian [13] comprise an integrable system for
any number of peakons $N$. This integrable system
was studied in Camassa and Holm (1993) for
solutions on the real line, and in Alber et al. (1999)
and McKean and Constantin (1999) and references
therein, for spatially periodic solutions.

Figure 1 A smooth localized (Gaussian) initial condition for the
CH equation breaks up into an ordered train of peakons as time
evolves (the time direction being vertical). The peakon train
eventually wraps around the periodic domain, thereby allowing
the leading peakons to overtake the slower emergent peakons
from behind in collisions that cause phase shifts as discussed in
Camassa and Holm (1993). Courtesy of Staley M.


Being a completely integrable Hamiltonian soliton
equation, the continuum CH equation [8] has an
associated isospectral eigenvalue problem, discovered
in Camassa and Holm (1993) for any values of
its dispersion parameters $c_0$ and $\Gamma$. Remarkably,
when $c_0 = 0$ and $\Gamma = 0$, this isospectral eigenvalue
problem has a purely “discrete” spectrum. Moreover,
in this case, each discrete eigenvalue corresponds
precisely to the time-asymptotic velocity of a
peakon. This discreteness of the CH isospectrum in
the absence of linear dispersion implies that only the
singular peakon solutions [10] emerge asymptotically
in time, in the solution of the initial-value
problem for the dispersionless CH equation [11].
This is borne out in numerical simulations of the
dispersionless CH equation [11], starting from a
smooth initial distribution of velocity (Fringer and
Holm 2001, Holm and Staley 2003).

Figure 1 shows the emergence of peakons from an 
initially Gaussian velocity distribution and their 
subsequent elastic collisions in a periodic one- 
dimensional domain. This figure demonstrates that 
singular solutions dominate the initial-value pro- 
blem and, thus, that it is imperative to go beyond 
smooth solutions for the CH equation; the situation 
is similar for the EPDiff equation. 


Peakons as Mechanical Systems 


Being governed by canonical Hamiltonian equations,
each N-peakon solution can be associated
with a mechanical system of moving particles.
Calogero (1995) further extended the class of
mechanical systems of this type. The r-matrix
approach was applied to the Lax pair formulation
of the N-peakon system for CH by Ragnisco and
Bruschi (1996), who also pointed out the connection
of this system with the classical Toda lattice. A discrete
version of the Adler-Kostant-Symes factorization
method was used by Suris (1996) to study a
discretization of the peakon lattice, realized as a discrete
integrable system on a certain Poisson submanifold of
gl(N) equipped with an r-matrix Poisson bracket. Beals
et al. (1999) used the Stieltjes theorem on continued
fractions and the classical moment problem for
studying multipeakon solutions of the CH equation.
Generalized peakon systems are described for any simple
Lie algebra by Alber et al. (1999).


Pulsons: Generalizing the Peakon Solutions of 
the Dispersionless b-Equation for Other Green’s 
Functions 


The Hamiltonian by in eqn [13] depends on 
the Green’s function for the relation between 
velocity u and momentum m. However, the singular 
momentum solution ansatz [10] is “independent” of 
this Green’s function. Thus, as discovered in Fringer 
and Holm (2001), the singular momentum solution 
ansatz [10] for the dispersionless equation 


$m_t + u\,m_x + 2\,m\,u_x = 0$, with $u = g * m$ [14] 


provides an invariant manifold on which canonical 
Hamiltonian dynamics occurs, for any choice of the 
Green’s function g relating velocity u and momentum m 
by the convolution $u = g * m$. 

The fluid velocity solutions corresponding to the 
singular momentum ansatz [10] for eqn [14] are the 
“pulsons”. Pulsons are given by the sum over N velocity 
profiles determined by the Green’s function g, as 


$u(x,t) = \sum_{a=1}^{N} p_a(t)\, g(x, q_a(t))$ [15] 


Again for [14], the singular momentum ansatz [10] 
results in a finite-dimensional invariant manifold of 
solutions, whose dynamics is canonically Hamiltonian. 
The Hamiltonian for the canonical dynamics 
of the 2N parameters $p_a(t)$ and $q_a(t)$ in the “pulson” 
solutions [15] of eqn [14] is 


$H_N = \frac{1}{2} \sum_{a,b=1}^{N} p_a\, p_b\, g(q_a, q_b)$ [16] 


Again, for the pulsons, the canonical equations for the 
invariant manifold of singular momentum solutions 
provide a phase-space description of geodesic motion, 
this time with respect to the co-metric given by the 
Green’s function g. Mathematical analysis and numer- 
ical results for the dynamics of these pulson solutions 
are given in Fringer and Holm (2001). These results 
describe how the collisions of pulsons [15] depend 
upon their shape. 
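
The canonical equations generated by [16] can be explored numerically. The following is a minimal sketch, not the method of Fringer and Holm (2001): it assumes a translation-invariant Green's function g(q_a, q_b) = g(q_a - q_b), picks a unit-width Gaussian for g purely for illustration, and integrates with a classical fourth-order Runge-Kutta step. The names `g`, `dg`, `rhs`, `rk4_step`, and `evolve` are hypothetical.

```python
import math

def g(x):
    """Assumed Green's function: a unit-width Gaussian, g(x) = exp(-x^2)."""
    return math.exp(-x * x)

def dg(x):
    """Derivative g'(x); odd in x because g is symmetric."""
    return -2.0 * x * g(x)

def hamiltonian(p, q):
    """H_N = (1/2) sum_{a,b} p_a p_b g(q_a - q_b), cf. eqn [16]."""
    n = len(p)
    return 0.5 * sum(p[a] * p[b] * g(q[a] - q[b])
                     for a in range(n) for b in range(n))

def rhs(p, q):
    """Canonical equations: dq_a/dt = dH/dp_a and dp_a/dt = -dH/dq_a."""
    n = len(p)
    dq = [sum(p[b] * g(q[a] - q[b]) for b in range(n)) for a in range(n)]
    dp = [-p[a] * sum(p[b] * dg(q[a] - q[b]) for b in range(n))
          for a in range(n)]
    return dp, dq

def rk4_step(p, q, dt):
    """One classical Runge-Kutta step for the 2N canonical variables."""
    def shifted(x, k, h):
        return [xi + h * ki for xi, ki in zip(x, k)]
    k1p, k1q = rhs(p, q)
    k2p, k2q = rhs(shifted(p, k1p, dt / 2), shifted(q, k1q, dt / 2))
    k3p, k3q = rhs(shifted(p, k2p, dt / 2), shifted(q, k2q, dt / 2))
    k4p, k4q = rhs(shifted(p, k3p, dt), shifted(q, k3q, dt))
    new_p = [x + dt / 6 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1p, k2p, k3p, k4p)]
    new_q = [x + dt / 6 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(q, k1q, k2q, k3q, k4q)]
    return new_p, new_q

def evolve(p, q, dt, steps):
    """Advance the pulson parameters (p, q) by `steps` RK4 steps."""
    for _ in range(steps):
        p, q = rk4_step(p, q, dt)
    return p, q

# Two pulsons: a faster one starting behind a slower one.
p_end, q_end = evolve([1.0, 0.5], [-1.0, 1.0], dt=0.01, steps=300)
```

Because g is symmetric, g' is odd, so the total momentum (the sum of the p_a) is conserved along with H_N; both can serve as sanity checks on the integration.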


Compactons in the $1/\alpha^2 \to 0$ Limit of CH 


As mentioned earlier, in the limit that $\alpha^2 \to 0$, the 
CH equation [8] becomes the KdV equation. 
In contrast, when $1/\alpha^2 \to 0$, CH becomes the 
Hunter-Zheng equation (Hunter and Zheng 1994): 


$(u_t + u\,u_x)_{xx} = \frac{1}{2}\,(u_x^2)_x$ 


This equation has “compacton” solutions, whose 
collision dynamics was studied numerically and 
put into the present context in Fringer and Holm 
(2001). The corresponding Green’s function satisfies 
$-\partial_x^2 g(x) = 2\delta(x)$, so it has the triangular 
shape, $g(x) = 1 - |x|$ for $|x| < 1$, and vanishes 
otherwise, for $|x| \ge 1$. That is, the Green’s function 
in this case has compact support, hence the 
name “compactons” for these pulson solutions, 
which, as a limit of the integrable CH equation, 
are true solitons, solvable by IST. 


Pulson Solutions of the Dispersionless b-Equation 


Holm and Staley (2003) give the pulson solutions of 
the traveling-wave problem and their elastic colli- 
sion properties for the dispersionless b-equation: 


$m_t + u\,m_x + b\,m\,u_x = 0$, with $u = g * m$ [17] 


with any (symmetric) Green’s function g and for 
any value of the parameter b. Numerically, 
pulsons and peakons are both found to be stable 
for b > 1 (Holm and Staley 2003). The reduction 
to “noncanonical” Hamiltonian dynamics for the 
invariant manifold of singular momentum solutions 
[10] of the other integrable case b = 3 with 
peakon Green’s function $g(x,y) = e^{-|x-y|/\alpha}$ is found 
in Degasperis and Procesi (1999) and Degasperis 
et al. (2002). 


Euler-Poincaré Theory in More 
Dimensions 


Generalizing the Peakon Solutions of the CH 
Equation to Higher Dimensions 


In Holm and Staley (2003), weakly nonlinear analysis 
and the assumption of columnar motion in the 
variational principle for Euler’s equations are found 
to produce the two-dimensional generalization of the 
dispersionless CH equation [11]. This generalization is 
the EP equation (Holm et al. 1998a, b) for the 
Lagrangian consisting of the kinetic energy: 

$\ell = \frac{1}{2} \int \left[\, |u|^2 + \alpha^2 (\mathrm{div}\, u)^2 \right] dx\, dy$ [18] 

in which the fluid velocity u is a two-dimensional 
vector. Evolution generated by kinetic energy in 


Hamilton’s principle results in geodesic motion, 
with respect to the velocity norm $\|u\|$, which is 
provided by the kinetic-energy Lagrangian. For 
ideal incompressible fluids governed by Euler’s 
equations, the importance of geodesic flow was 
recognized by Arnol’d (1966) for the $L^2$ norm of 
the fluid velocity. The EP equation generated by 
any choice of kinetic-energy norm without imposing 
incompressibility is called “EPDiff,” for “Euler-Poincaré 
equation for geodesic motion on the 
diffeomorphisms.” EPDiff is given by (Holm et al. 
1998a): 


$\left(\frac{\partial}{\partial t} + u \cdot \nabla\right) m + \nabla u^T \cdot m + m\,(\mathrm{div}\, u) = 0$ [19] 


with momentum density $m = \delta\ell/\delta u$, where $\ell = \frac{1}{2}\|u\|^2$ 
is given by the kinetic energy, which defines a 
norm in the fluid velocity $\|u\|$, yet to be determined. 
By design, this equation has no contribution from 
either potential energy or pressure. It conserves the 
velocity norm $\|u\|$ given by the kinetic energy. Its 
evolution describes geodesic motion on the 
diffeomorphisms with respect to this norm (Holm et al. 
1998a). 

An alternative way of writing the EPDiff equation 
[19] in either two or three dimensions is 


$\frac{\partial m}{\partial t} - u \times \mathrm{curl}\, m + \nabla (u \cdot m) + m\,(\mathrm{div}\, u) = 0$ [20] 

This form of EPDiff involves all three differential 
operators: curl, gradient, and divergence. For the 
kinetic-energy Lagrangian $\ell$ given in [18], which is a 
norm for “irrotational” flow (with curl u = 0), we 
have the EPDiff equation [19] with momentum 
$m = \delta\ell/\delta u = u - \alpha^2 \nabla(\mathrm{div}\, u)$. 

EPDiff [19] may also be written intrinsically as 


$\frac{\partial}{\partial t} \frac{\delta \ell}{\delta u} + \mathrm{ad}^*_u \frac{\delta \ell}{\delta u} = 0$ [21] 

where ad* is the $L^2$ dual of the ad-operation 
(commutator) for vector fields (see Arnol’d and 
Khesin (1998) and Marsden and Ratiu (1999) for 
additional discussions of the beautiful geometry 
underlying this equation). 


Reduction to the Dispersionless CH Equation 
in One Dimension 


In one dimension, the EPDiff equations [19]-[21] with 
Lagrangian $\ell$ given in [18] simplify to the dispersionless 
CH equation [11]. The dispersionless limit of the CH 
equation appears, because potential energy and pres- 
sure have been ignored. 
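
As a sketch of this reduction, assume a scalar velocity u(x,t), so that div u = u_x, the term $\nabla u^T \cdot m$ becomes $u_x m$, and the momentum associated with [18] becomes $m = u - \alpha^2 u_{xx}$. Under these assumptions the terms of [19] collapse as follows:

```latex
\begin{align*}
\left(\partial_t + u\,\partial_x\right) m + u_x\, m + m\, u_x &= 0, \\
m_t + u\, m_x + 2\, m\, u_x &= 0, \qquad m = u - \alpha^2 u_{xx}.
\end{align*}
```

This has the form of eqn [14], with g taken to be the Green's function of the one-dimensional Helmholtz operator $1 - \alpha^2 \partial_x^2$.
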


Strengthening the Kinetic-Energy Norm to Allow 
for Circulation 


The kinetic-energy Lagrangian [18] is a norm for 
irrotational flow, with curl u = 0. However, inclusion 
of rotational flow requires the kinetic-energy norm to be 
strengthened to the $H^1$ norm of the velocity, defined as 


$\ell = \frac{1}{2} \int \left[\, |u|^2 + \alpha^2 (\mathrm{div}\, u)^2 + \alpha^2 (\mathrm{curl}\, u)^2 \right] dx\, dy$ 

$\phantom{\ell} = \frac{1}{2} \int \left[\, |u|^2 + \alpha^2 |\nabla u|^2 \right] dx\, dy = \frac{1}{2} \|u\|^2_{H^1_\alpha}$ [22] 


Here, we assume boundary conditions that give 
no contributions upon integrating by parts. The 
corresponding EPDiff equation is [19] with 
$m = \delta\ell/\delta u = u - \alpha^2 \Delta u$. This expression involves 
inversion of the familiar Helmholtz operator in the 
(nonlocal) relation between fluid velocity and 
momentum density. The $H^1$ norm $\|u\|_{H^1_\alpha}$ for the 
kinetic energy [22] also arises in three dimensions 
for turbulence modeling based on Lagrangian averaging 
and using Taylor’s hypothesis that the 
turbulent fluctuations are “frozen” into the Lagrangian 
mean flow (Foias et al. 2001). 


Generalizing the CH Peakon Solutions 
to n Dimensions 


Building on the peakon solutions [9] for the CH 
equation and the pulsons [15] for its generalization 
to other traveling-wave shapes in Fringer and Holm 
(2001), Holm and Staley (2003) introduced the 
following measure-valued singular momentum solu- 
tion ansatz for the n-dimensional solutions of the 
EPDiff equation [19]: 


$m(x,t) = \sum_{a=1}^{N} \int P^a(s,t)\, \delta\big(x - Q^a(s,t)\big)\, ds$ [23] 


These singular momentum solutions, called “diffeons,” 
are vector density functions supported in $\mathbb{R}^n$ on a set of 
N surfaces (or curves) of codimension (n - k) for 
$s \in \mathbb{R}^k$ with k < n. They may, for example, be supported on 
sets of points (vector peakons, k = 0), one-dimensional 
filaments (strings, k = 1), or two-dimensional surfaces 
(sheets, k = 2) in three dimensions. 

Figure 2 shows the results for the EPDiff equation 
when a straight peakon segment of finite length is 
created initially moving rightward (East). Because of 
propagation along the segment in adjusting to the 
condition of zero speed at its ends and finite speed in its 
interior, the initially straight segment expands outward 
as it propagates and curves into a peakon “bubble.” 

Figure 2 A peakon segment of finite length is initially moving 
rightward (east). Because its speed vanishes at its ends and it 
has fully two-dimensional spatial dependence, it expands into a 
peakon “bubble” as it propagates. (The various shades indicate 
different speeds. Any transverse slice will show a wave profile 
with a maximum at the center of the wave, which falls 
exponentially with distance away from the center.) 

Figure 3 shows an initially straight segment whose 
velocity distribution is exponential in the transverse 
direction, but is wider than $\alpha$ for the peakon 
solution. This initial-velocity distribution evolves 
under EPDiff to separate into a train of curved 
peakon “bubbles,” each of width $\alpha$. This example 
illustrates the emergent property of the peakon 
solutions in two dimensions. This phenomenon is 
observed in nature, for example, as trains of internal 
wave fronts in the South China Sea (Liu et al. 1998). 

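Substitution of the singular momentum solution 
ansatz [23] into the EPDiff equation [19] implies the 
following integro-partial-differential equations (IPDEs) 
for the evolution of the parameters $\{P\}$ and $\{Q\}$: 


$\frac{\partial}{\partial t} Q^a(s,t) = \sum_{b=1}^{N} \int P^b(s',t)\, G\big(Q^a(s,t) - Q^b(s',t)\big)\, ds'$ 

$\frac{\partial}{\partial t} P^a(s,t) = -\sum_{b=1}^{N} \int \big(P^a(s,t) \cdot P^b(s',t)\big)\, \frac{\partial}{\partial Q^a(s,t)} G\big(Q^a(s,t) - Q^b(s',t)\big)\, ds'$ [24] 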


Figure 3 An initially straight segment of velocity distribution 
whose exponential profile is wider than the width $\alpha$ for the 
peakon solution breaks up into a train of curved peakon 
“bubbles,” each of width $\alpha$. This example illustrates the 
emergent property of the peakon solutions in two dimensions. 

Importantly for the interpretation of these solutions, 
the coordinates $s \in \mathbb{R}^k$ turn out to be Lagrangian 
coordinates. The velocity field corresponding to the 
momentum solution ansatz [23] is given by 


$u(x,t) = G * m = \sum_{b=1}^{N} \int P^b(s',t)\, G\big(x - Q^b(s',t)\big)\, ds'$ [25] 


for $u \in \mathbb{R}^n$. When evaluated along the curve 
$x = Q^a(s,t)$, this velocity satisfies 


$u(Q^a(s,t), t) = \sum_{b=1}^{N} \int P^b(s',t)\, G\big(Q^a(s,t) - Q^b(s',t)\big)\, ds' = \frac{\partial Q^a(s,t)}{\partial t}$ [26] 


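Consequently, the lower-dimensional support sets 
defined on $x = Q^a(s,t)$ and parametrized by 
coordinates $s \in \mathbb{R}^k$ move with the fluid velocity. 
This means that the $s \in \mathbb{R}^k$ are Lagrangian coordinates. 
Moreover, eqns [24] for the evolution of these 
support sets are canonical Hamiltonian equations: 


$\frac{\partial}{\partial t} Q^a(s,t) = \frac{\delta H_N}{\delta P^a}, \qquad \frac{\partial}{\partial t} P^a(s,t) = -\frac{\delta H_N}{\delta Q^a}$ [27] 


The corresponding Hamiltonian function 
$H_N : (\mathbb{R}^n \times \mathbb{R}^n) \to \mathbb{R}$ is 


$H_N = \frac{1}{2} \iint \sum_{a,b=1}^{N} \big(P^a(s,t) \cdot P^b(s',t)\big)\, G\big(Q^a(s,t) - Q^b(s',t)\big)\, ds\, ds'$ [28] 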


This is the Hamiltonian for geodesic motion on the 
cotangent bundle of a set of curves $Q^a(s,t)$ with 
respect to the metric given by G. This dynamics was 
investigated numerically in Holm and Staley (2003), 
which may be consulted for more details of the 
solution properties. One important result found 
“numerically” in Holm and Staley (2003) is that 
only codimension-1 singular momentum solutions 
appear to be stable under the evolution of the EPDiff 
equation. Thus, 

Stability for codimension-1 solutions: the singular 
momentum solutions of EPDiff are stable, as points 
on the line (peakons), as curves in the plane (filaments, 
or wave fronts), or as surfaces in space (sheets). 

Proving this stability result analytically remains an 
outstanding problem. The stability of peakons on the 
real line is proven in Constantin and Strauss (2000). 


Reconnections in Oblique Overtaking Collisions 
of Peakon Wave Fronts 


Figures 4 and 5 show results of oblique wave front 
collisions producing reconnections for the EPDiff 
equation in two dimensions. Figure 4 shows a single 
oblique overtaking collision, as a faster expanding 
peakon wave front overtakes a slower one and 
reconnects with it at the collision point. Figure 5 
shows a series of reconnections involving the 
oblique overtaking collisions of two trains of curved 
peakon filaments, or wave fronts. 


Figure 4 A single collision is shown involving reconnection as the 
faster peakon segment initially moving southeast along the diagonal 
expands, curves, and obliquely overtakes the slower peakon 
segment initially moving rightward (east). This reconnection 
illustrates one of the collision rules for the strongly two-dimensional 
EPDiff flow. 

Figure 5 A series of multiple collisions is shown involving 
reconnections as the faster wider peakon segment initially moving 
northeast along the diagonal expands, breaks up into a wave train 
of peakons, each of which propagates, curves, and obliquely 
overtakes the slower wide peakon segment initially moving 
rightward (east), which is also breaking up into a train of wave 
fronts. In this series of oblique collisions, the now-curved peakon 
filaments exchange momentum and reconnect several times. 


The Peakon Reduction is a Momentum Map 


As shown in Holm and Marsden (2004), the singular 
solution ansatz [23] is a momentum map from the 
cotangent bundle of the smooth embeddings of 
lower-dimensional sets $\mathbb{R}^k \subset \mathbb{R}^n$, to the dual of the Lie algebra 
of vector fields defined on these sets. (Momentum maps 
for Hamiltonian dynamics are reviewed in Marsden 
and Ratiu (1999), for example.) This geometric feature 
underlies the remarkable reduction properties of the 
EPDiff equation, and it also explains why the reduced 
equations must be Hamiltonian on the invariant 
manifolds of the singular solutions; namely, because 
momentum maps are Poisson maps. This geometric 
feature also underlies the singular momentum solution 
[23] and its associated velocity [25], which generalize 
the peakon solutions, both to higher dimensions and to 
arbitrary kinetic-energy metrics. The result that the 
singular solution ansatz [23] is a momentum map helps 
to organize the theory, to explain previous results, and 
to suggest new avenues of exploration. 


Introduction 


Percolation as a mathematical theory was introduced 
by Broadbent and Hammersley (1957), as a stochastic 
way of modeling the flow of a fluid or gas through a 
porous medium of small channels which may or may 
not let gas or fluid pass. It is one of the simplest models 
exhibiting a phase transition, and the occurrence of a 
critical phenomenon is central to the appeal of 
percolation. Having truly applied origins, percolation 
has been used to model the fingering and spreading of 
oil in water, to estimate whether one can build 
nondefective integrated circuits, and to model the 
spread of infections and forest fires. From a mathema- 
tical point of view, percolation is attractive because it 
exhibits relations between probabilistic and algebraic/ 
topological properties of graphs. 

To make the mathematical construction of such a 
system of channels, take a graph G (which originally 
was taken as $\mathbb{Z}^d$), with vertex set V and edge set E, and 
make all the edges independently open (or passable) 
with probability p or closed (or blocked) with 
probability $1 - p$. Write $P_p$ for the corresponding 
probability measure on the set of configurations of 
open and closed edges; this model is called bond 
percolation. The collection of open edges thus forms a 
random subgraph of G, and the original question stated 
by Broadbent was whether the connected component 
of the origin in that subgraph is finite or infinite. 

A path on G is a sequence $v_1, v_2, \ldots$ of vertices of G, 
such that for all $i \ge 1$, $v_i$ and $v_{i+1}$ are adjacent on G. A 
path is called open if all the edges $\{v_i, v_{i+1}\}$ between 
successive vertices are open. The infiniteness of the 
cluster of the origin is equivalent to the existence of 
an unbounded open path starting from the origin. 

There is an analogous model, called “site percola- 
tion,” in which all edges are assumed to be passable, 
but the vertices are independently open or closed 
with probability p or 1 — p, respectively. An open 
path is then a path along which all vertices are open. 
Site percolation is more general than bond percola- 
tion in the sense that the existence of a path for 


bond percolation on a graph G is equivalent to the 
existence of a path for site percolation on the 
covering graph of G. However, site percolation on 
a given graph may not be equivalent to bond 
percolation on any other graph. 

All graphs under consideration will be assumed to 
be connected, locally finite and quasitransitive. If 
$A, B \subset V$, then $A \leftrightarrow B$ means that there exists an 
open path from some vertex of A to some vertex of 
B; by a slight abuse of notation, $u \leftrightarrow v$ will stand for 
the existence of a path between sites u and v, that is, 
the event $\{u\} \leftrightarrow \{v\}$. The open cluster C(v) of the 
vertex v is the set of all open vertices which are 
connected to v by an open path: 


$C(v) = \{u \in V : u \leftrightarrow v\}$ 
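
On a finite graph the open cluster can be computed by a breadth-first search over open edges. The sketch below is illustrative only: it assumes bond percolation on the finite box {-n, ..., n}^2 of the square lattice rather than an infinite graph, and the function names `sample_open_edges` and `open_cluster` are hypothetical.

```python
import random
from collections import deque

def sample_open_edges(n, p, rng):
    """Declare each nearest-neighbour edge of the box {-n..n}^2 open
    independently with probability p; return the set of open edges."""
    open_edges = set()
    for x in range(-n, n + 1):
        for y in range(-n, n + 1):
            for dx, dy in ((1, 0), (0, 1)):
                u, v = (x, y), (x + dx, y + dy)
                inside = max(abs(v[0]), abs(v[1])) <= n
                if inside and rng.random() < p:
                    open_edges.add(frozenset((u, v)))
    return open_edges

def open_cluster(origin, open_edges):
    """Return C(origin): all vertices joined to origin by an open path."""
    seen = {origin}
    queue = deque([origin])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            w = (x + dx, y + dy)
            if w not in seen and frozenset(((x, y), w)) in open_edges:
                seen.add(w)
                queue.append(w)
    return seen

rng = random.Random(0)
cluster = open_cluster((0, 0), sample_open_edges(20, 0.5, rng))
```

In a finite box every cluster is finite, so a simulation can only approximate the event that the cluster of the origin is infinite, for example by asking whether the cluster reaches the boundary of the box.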


The central quantity of the percolation theory is the 
percolation probability: 


$\theta(p) := P_p\{0 \leftrightarrow \infty\} = P_p\{|C(0)| = \infty\}$ 


The most important property of the percolation 
model is that it exhibits a phase transition, that is, 
there exists a threshold value $p_c \in [0,1]$, such that 
the global behavior of the system is substantially 
different in the two regions $p < p_c$ and $p > p_c$. To 
make this precise, observe that $\theta$ is a nondecreasing 
function. This can be seen using Hammersley’s joint 
construction of percolation systems for all $p \in [0,1]$ 
on G: let $\{U(v), v \in V\}$ be independent random 
variables, uniform in [0,1]. Declare v to be p-open 
if $U(v) < p$, otherwise it is declared p-closed. The 
configuration of p-open vertices has the distribution 
$P_p$ for each $p \in [0,1]$. The collection of p-open 
vertices is nondecreasing in p, and therefore $\theta(p)$ is 
nondecreasing as well. Clearly, $\theta(0) = 0$ and $\theta(1) = 1$ 
(Figure 1). 
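
The joint construction is easy to illustrate on a finite vertex set. The following is a minimal sketch under that assumption; the helper names `uniform_labels` and `p_open_vertices` are hypothetical. One uniform label per vertex is drawn once, and the p-open sets for all p are read off from the same labels, so they are automatically nested.

```python
import random

def uniform_labels(vertices, rng):
    """Attach an independent uniform [0, 1) label U(v) to every vertex."""
    return {v: rng.random() for v in vertices}

def p_open_vertices(labels, p):
    """Vertices declared p-open under the coupling: those with U(v) < p."""
    return {v for v, u in labels.items() if u < p}

rng = random.Random(1)
labels = uniform_labels(range(100), rng)
open_low = p_open_vertices(labels, 0.3)
open_high = p_open_vertices(labels, 0.7)
nested = open_low <= open_high  # monotonicity in p
```

Since the p-open set can only grow with p, any increasing event (such as the origin being connected far away) has a probability that is nondecreasing in p.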





Figure 1 The behavior of $\theta(p)$ around the critical point 
(for bond percolation). 

The critical probability is defined as 


$p_c := p_c(G) = \sup\{p : \theta(p) = 0\}$ 


By definition, when $p < p_c$, the open cluster of the 
origin is $P_p$-a.s. finite; hence, all the clusters are also 
finite. On the other hand, for $p > p_c$ there is a 
strictly positive $P_p$-probability that the cluster of the 
origin is infinite. Thus, from Kolmogorov’s zero-one 
law it follows that 


$P_p\{|C(v)| = \infty \text{ for some } v \in V\} = 1$ for $p > p_c$ 


Therefore, if the intervals $[0, p_c)$ and $(p_c, 1]$ are both 
nonempty, there is a phase transition at $p_c$. 

Using a so-called Peierls argument it is easy to see 
that $p_c(G) > 0$ for any graph G of bounded degree. 
On the other hand, Hammersley proved that 
$p_c(\mathbb{Z}^d) < 1$ for bond percolation as soon as $d \ge 2$, 
and a similar argument works for site percolation 
and various periodic graphs as well. But for some 
graphs G, it is not so easy to show that $p_c(G) < 1$. 
One says that the system is in the subcritical (resp. 
supercritical) phase if $p < p_c$ (resp. $p > p_c$). 

It was one of the most remarkable moments in the 
history of percolation when Kesten (1980) proved, 
based on results by Harris, Russo, Seymour and 
Welsh, that the critical parameter for bond percolation 
on $\mathbb{Z}^2$ is equal to 1/2. Nevertheless, the exact value of 
$p_c(G)$ is known only for a handful of graphs, all of 
them periodic and two dimensional (see below). 


Percolation in $\mathbb{Z}^d$ 


The graph on which most of the theory was 
originally built is the cubic lattice $\mathbb{Z}^d$, and it was 
not before the late twentieth century that percolation 
was seriously considered on other kinds of 
graphs (such as Cayley graphs), on which specific 
phenomena can appear, such as the coexistence of 
multiple infinite clusters for some values of the 
parameter p. In this section, the underlying graph is 
thus assumed to be $\mathbb{Z}^d$ for $d \ge 2$, although most 
of the results still hold in the case of a periodic 
d-dimensional lattice. 


The Subcritical Regime 


When $p < p_c$, all open clusters are finite almost 
surely. One of the greatest challenges in percolation 
theory has been to prove that $\chi(p) := E_p\{|C(v)|\}$ is 
finite if $p < p_c$ ($E_p$ stands for the expectation with 
respect to $P_p$). For that one can define another critical 
probability as the threshold value for the finiteness of 
the expected cluster size of a fixed vertex: 


$p_T(G) := \sup\{p : \chi(p) < \infty\}$ 


It was an important step in the development of the 
theory to show that $p_T(G) = p_c(G)$. The fundamental 
estimate in the subcritical regime, which is a much 
stronger statement than $p_T(G) = p_c(G)$, is the following: 


Theorem 1 (Aizenman and Barsky, Menshikov). 
Assume that G is periodic. Then for $p < p_c$ there 
exist constants $0 < C_1, C_2 < \infty$, such that 


$P_p\{|C(v)| \ge n\} \le C_1 e^{-C_2 n}$ 


The last statement can be sharpened to a “local 
limit theorem” with the help of a subadditivity 
argument: for each $p < p_c$, there exists a constant 
$0 < C_3(p) < \infty$, such that 


$\lim_{n \to \infty} -\frac{1}{n} \log P_p\{|C(v)| = n\} = C_3(p)$ 


The Supercritical Regime 


Once an infinite open cluster exists, it is natural to 
ask what it looks like, and how many infinite open 
clusters exist. It was shown by Newman and Schulman 
that for periodic graphs, for each p, exactly one 
of the following three situations prevails: if 
$N \in \mathbb{Z}_+ \cup \{\infty\}$ is the number of infinite open clusters, then 
$P_p(N = 0) = 1$, or $P_p(N = 1) = 1$, or $P_p(N = \infty) = 1$. 

Aizenman, Kesten, and Newman showed that the 
third case is impossible on $\mathbb{Z}^d$. By now several 
proofs exist, perhaps the most elegant of which is 
due to Burton and Keane, who prove that indeed 
there cannot be infinitely many infinite open clusters 
on any amenable graph. However, there are some 
graphs, such as regular trees, on which coexistence 
of several infinite clusters is possible. 

The geometry of the infinite open cluster can be 
explored in some depth by studying the behavior of 
a random walk on it. When d = 2, the random walk 
is recurrent, and when $d \ge 3$ it is a.s. transient. In all 
dimensions $d \ge 2$, the walk behaves diffusively, and 
the “central limit theorem” and the “invariance 
principle” were established in both the annealed and 
quenched cases. 


Wulff droplets In the supercritical regime, aside 
from the infinite open cluster, the configuration 
contains finite clusters of arbitrarily large sizes. These 
large finite open clusters can be thought of as droplets 
swimming in the areas surrounded by an infinite open 
cluster. The presence at a particular location of a large 
finite cluster is an event of low probability, namely, on 
$\mathbb{Z}^d$, $d \ge 2$, for $p > p_c$, there exist positive constants 
$0 < C_4(p), C_5(p) < \infty$, such that 


$C_4(p) \le -\frac{1}{n^{(d-1)/d}} \log P_p\{|C(v)| = n\} \le C_5(p)$ 


for all large n. This estimate is based on the fact that 
the occurrence of a large finite cluster is due to a 
surface effect. The typical structure of the large 
finite cluster is described by the following theorem: 


Theorem 2 Let $d \ge 2$, and $p > p_c$. There exists a 
bounded, closed, convex subset W of $\mathbb{R}^d$ containing 
the origin, called the normalized Wulff crystal of 
the Bernoulli percolation model, such that, under the 
conditional probability $P_p\{\,\cdot \mid n^d \le |C(0)| < \infty\}$, the 
random measure 


$\frac{1}{n^d} \sum_{x \in C(0)} \delta_{x/n}$ 


(where $\delta_x$ denotes a Dirac mass at x) converges 
weakly in probability toward the random measure 
$\theta(p)\, 1_W(x - M)\, dx$ (where M is the rescaled center of 
mass of the cluster C(0)). The deviation probabilities 
behave as $\exp\{-c\, n^{d-1}\}$ (i.e., they exhibit large 
deviations of surface order; in dimensions 4 and 
more it holds up to re-centering). 


This result was proved in dimension 2 by Alexander 
et al. (1990), and in dimensions 3 and more by Cerf 
(2000). 


Percolation Near the Critical Point 


Percolation in Slabs The main macroscopic observable 
in percolation is $\theta(p)$, which is positive above 
$p_c$, 0 below $p_c$, and continuous on $[0,1] \setminus \{p_c\}$. 
Continuity at $p_c$ is an open question in the general 
case; it is known to hold in two dimensions 
(cf. below) and in high enough dimension (at the 
moment $d \ge 19$, though the value of the critical 
dimension is believed to be 6) using lace expansion 
methods. The conjecture that $\theta(p_c) = 0$ for $3 \le d \le 18$ 
remains one of the major open problems. 

Efforts to prove that led to some interesting and 
important results. Barsky, Grimmett, and Newman 
solved the question in the half-space case, and 
simultaneously showed that the slab percolation and half-space 
percolation thresholds coincide. This was complemented 
by Grimmett and Marstrand showing that 


$p_c(\text{slab}) = p_c(\mathbb{Z}^d)$ 


Critical exponents In the subcritical regime, exponential 
decay of the correlation indicates that there 
is a finite correlation length $\xi(p)$ associated to the 
system, and defined (up to constants) by the relation 


$P_p[0 \leftrightarrow nx] \approx \exp\big(-n\,\varphi(x)/\xi(p)\big)$ 


where $\varphi$ is bounded on the unit sphere (this is known 
as Ornstein-Zernike decay). The phase transition can 
then also be defined in terms of the divergence of the 
correlation length, leading again to the same value for 
$p_c$; the behavior at or near the critical point then has no 
finite characteristic length, and gives rise to scaling 
exponents (conjecturally in most cases). 

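The most usual critical exponents are defined as 
follows, if $\theta(p)$ is the percolation probability, C the 
cluster of the origin, and $\xi(p)$ the correlation length: 


$\frac{\partial^3}{\partial p^3} E_p\big[|C|^{-1}\big] \approx |p - p_c|^{-1-\alpha}$ 

$\theta(p) \approx (p - p_c)_+^{\beta}$ 

$E_p\big[|C|\, 1_{|C| < \infty}\big] \approx |p - p_c|^{-\gamma}$ 

$P_{p_c}\big[|C| = n\big] \approx n^{-1-1/\delta}$ 

$P_{p_c}\big[x \in C\big] \approx |x|^{2-d-\eta}$ 

$\xi(p) \approx |p - p_c|^{-\nu}$ 

$P_{p_c}\big[\mathrm{diam}(C) = n\big] \approx n^{-1-1/\rho}$ 

$\frac{E_p\big[|C|^{k+1}\, 1_{|C| < \infty}\big]}{E_p\big[|C|^{k}\, 1_{|C| < \infty}\big]} \approx |p - p_c|^{-\Delta}$ 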

These exponents are all expected to be universal, 
that is, to depend only on the dimension of the 
lattice, although this is not well understood at the 
mathematical level; the following scaling relations 
between the exponents are believed to hold: 


$2 - \alpha = \gamma + 2\beta = \beta(\delta + 1), \quad \Delta = \beta\delta, \quad \gamma = \nu(2 - \eta)$ 


In addition, in dimensions up to $d_c = 6$, two 
additional hyperscaling relations involving d are 
strongly conjectured to hold: 


$d\rho = \delta + 1, \quad d\nu = 2 - \alpha$ 


while above $d_c$ the exponents are believed to take 
their mean-field value, that is, the ones they have for 
percolation on a regular tree: 


$\alpha = -1, \quad \beta = 1, \quad \gamma = 1, \quad \delta = 2$ 

$\eta = 0, \quad \nu = \frac{1}{2}, \quad \rho = \frac{1}{2}, \quad \Delta = 2$ 


Not much is known rigorously on critical exponents 
in the general case. Hara and Slade (1990) 
proved that mean field behavior does happen above 
dimension 19, and the proof can likely be extended 
to treat the case $d \ge 7$. In the two-dimensional case 
on the other hand, Kesten (1987) showed that, 
assuming that the exponents $\delta$ and $\rho$ exist, then so 
do $\beta$, $\gamma$, $\eta$, and $\nu$, and they satisfy the scaling and 
hyperscaling relations where they appear. 


The incipient infinite cluster When studying long- 
range properties of a critical model, it is useful to 
have an object which is infinite at criticality, and 
such is not the case for percolation clusters. There 
are two ways to condition the cluster of the origin to 
be infinite when $p = p_c$: the first one is to condition 
it to have diameter at least n (which happens with 
positive probability) and take a limit in distribution 
as n goes to infinity; the second one is to consider 
the model for parameter $p > p_c$, condition the 
cluster of 0 to be infinite (which happens with 
positive probability) and take a limit in distribution 
as p goes to $p_c$. The limit is the same in both cases; it 
is known as the incipient infinite cluster. 

As in the supercritical regime, the structure of the 
cluster can be investigated by studying the behavior 
of a random walk on it, as was suggested by de 
Gennes; Kesten proved that in two dimensions, the 
random walk on the incipient infinite cluster is 
subdiffusive, that is, the mean square displacement 
after n steps behaves as $n^{1-\varepsilon}$ for some $\varepsilon > 0$. 

The construction of the incipient infinite cluster 
was done by Kesten (1986) in two dimensions, and a 
similar construction was performed recently in high 
dimension by van der Hofstad and Járai (2004). 


Percolation in Two Dimensions 


As is the case for several other models of statistical 
physics, percolation exhibits many specific properties 
when considered on a two-dimensional lattice: duality 
arguments allow for the computation of $p_c$ in some 
cases, and for the derivation of a priori bounds for the 
probability of crossing events at or near the critical 
point, leading to the fact that $\theta(p_c) = 0$. On another 
front, the scaling limit of critical site percolation on the 
two-dimensional triangular lattice can be described in 
terms of stochastic Loewner evolution (SLE) processes. 


Duality, Exact Computations, and RSW Theory 


Given a planar lattice $\mathcal{L}$, define two associated 
graphs as follows. The dual lattice $\mathcal{L}'$ has one vertex 
for each face of the original lattice, and an edge 
between two vertices if and only if the corresponding 
faces of $\mathcal{L}$ share an edge. The star graph $\mathcal{L}^*$ is 
obtained by adding to $\mathcal{L}$ an edge between any two 
vertices belonging to the same face ($\mathcal{L}^*$ is not planar 
in general; $(\mathcal{L}, \mathcal{L}^*)$ is commonly known as a 
matching pair). Then, a result of Kesten is that, 
under suitable technical conditions, 


$p_c^{\mathrm{bond}}(\mathcal{L}) + p_c^{\mathrm{bond}}(\mathcal{L}') = p_c^{\mathrm{site}}(\mathcal{L}) + p_c^{\mathrm{site}}(\mathcal{L}^*) = 1$ 


Two cases are of particular importance: the lattice 
$\mathbb{Z}^2$ is isomorphic to its dual; the triangular lattice $\mathcal{T}$ 
is its own star graph. It follows that 


$p_c^{\mathrm{bond}}(\mathbb{Z}^2) = p_c^{\mathrm{site}}(\mathcal{T}) = \frac{1}{2}$ 


The only other critical parameters that are known 
exactly are $p_c^{\mathrm{bond}}(\mathcal{T}) = 2\sin(\pi/18)$ (and hence also 
$p_c^{\mathrm{bond}}$ for $\mathcal{T}'$, i.e., the hexagonal lattice) and $p_c^{\mathrm{bond}}$ for 
the bow-tie lattice, which is a root of the equation 
$p^5 - 6p^3 + 6p^2 + p - 1 = 0$. The value of the critical 
parameter for site percolation on $\mathbb{Z}^2$ might, on the 
other hand, never be known; it is even possible that 
it is “just a number” without any other significance. 
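
These closed-form thresholds are easy to evaluate numerically. The sketch below is only a numerical illustration: it locates the root of the bow-tie polynomial in (0, 1) by bisection and evaluates the triangular-lattice bond threshold, with the hexagonal value obtained from the duality relation for bond thresholds of a lattice and its dual. The function names are hypothetical.

```python
import math

def bow_tie_polynomial(p):
    """Polynomial whose root in (0, 1) is the bow-tie bond threshold."""
    return p**5 - 6 * p**3 + 6 * p**2 + p - 1

def bisect_root(f, lo, hi, tol=1e-12):
    """Bisection on [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p_bow_tie = bisect_root(bow_tie_polynomial, 0.0, 1.0)
p_triangular = 2 * math.sin(math.pi / 18)
p_hexagonal = 1 - p_triangular  # bond thresholds of dual lattices sum to 1

print(round(p_bow_tie, 4), round(p_triangular, 4), round(p_hexagonal, 4))
```

Bisection is used here only because the polynomial changes sign between 0 and 1; any standard root finder would serve equally well.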

Still using duality, one can prove that the 
probability, for bond percolation on the square 
lattice with parameter p = 1/2, that there is a 
connected component crossing an $(n + 1) \times n$ rectangle 
in the longer direction is exactly equal to 1/2. 
This and clever arguments involving the symmetry 
of the lattice lead to the following result, proved 
independently by Russo and by Seymour and Welsh 
and known as the RSW theorem: 


Theorem 3 (Russo 1978, Seymour and Welsh 1978). 
For every a, b > 0 there exist $\eta > 0$ and $n_0 > 0$ such 
that for every $n > n_0$, the probability that there is a 
cluster crossing an $\lfloor na \rfloor \times \lfloor nb \rfloor$ rectangle in the first 
direction is greater than $\eta$. 


The most direct consequence of this estimate is that 
the probability that there is a cluster going around an 
annulus of a given modulus is bounded below 
independently of the size of the annulus; in particular, 
almost surely there is some annulus around 0 in 
which this happens, and that is what allows one to prove 
that $\theta(p_c) = 0$ for bond percolation on $\mathbb{Z}^2$ (Figure 2). 


The Scaling Limit 


RSW-type estimates give positive evidence that a 
scaling limit of the model should exist; it is indeed 
essentially sufficient to show convergence of the 
crossing probabilities to a nontrivial limit as n goes 
to infinity. The limit, which should depend only on 
the ratio a/b, was predicted by Cardy using con- 
formal field theory methods. A celebrated result of 
Smirnov is the proof of Cardy’s formula in the case of 
site percolation on the triangular lattice $\mathcal{T}$: 


Theorem 4 (Smirnov (2001)). Let $\Omega$ be a simply 
connected domain of the plane with four points a, b, 
c, d (in that order) marked on its boundary. For 
every $\delta > 0$, consider a critical site-percolation 
model on the intersection of $\Omega$ with $\delta\mathcal{T}$ and let 
$f_\delta(ab, cd; \Omega)$ be the probability that it contains a 
cluster connecting the arcs ab and cd. Then: 


(i) $f_\delta(ab, cd; \Omega)$ has a limit $f_0(ab, cd; \Omega)$ as $\delta \to 0$; 

(ii) the limit is conformally invariant, in the 
following sense: if $\Phi$ is a conformal map from 
$\Omega$ to some other domain $\Omega' = \Phi(\Omega)$, and maps 
a to a', b to b', c to c' and d to d', then 
$f_0(ab, cd; \Omega) = f_0(a'b', c'd'; \Omega')$; and 

(iii) in the particular case when $\Omega$ is an equilateral 
triangle of side length 1 with vertices a, b and c, 
and if d is on (ca) at distance $x \in (0,1)$ from c, 
then $f_0(ab, cd; \Omega) = x$. 


Figure 2 Two large critical percolation clusters in a box of the 
square lattice (first: bond percolation, second: site percolation). 


Point (iii) in particular is essential since it allows 
us to compute the limiting crossing probabilities in 
any conformal rectangle. In his original work, 
Cardy made his prediction in the case of a 
rectangle, for which the limit involves hypergeometric 
functions; the remark that the equilateral 
triangle gives rise to nicer formulae is originally due 
to Carleson. 

To precisely state the convergence of percolation 
to its scaling limit, define the random curve known 
as the percolation exploration path (see Figure 3) as 
follows: In the upper half-plane, consider a site- 
percolation model on a portion of the triangular 
lattice and impose the boundary conditions that on 
the negative real half-line all the sites are open, 
while on the other half-line the sites are closed. The 
exploration curve is then the common boundary of 
the open cluster spanning from the negative half- 
line, and the closed cluster spanning from the 
positive half-line; it is an infinite, self-avoiding 
random curve in the upper half-plane. 

As the mesh of the lattice goes to 0, the exploration
curve then converges in distribution to the trace of an
SLE process, as introduced by Schramm, with
parameter $\kappa = 6$ (see Figure 4). The limiting curve is
not simple anymore (which corresponds to the
existence of pivotal sites on large critical percolation
clusters), and it has Hausdorff dimension 7/4. For
more details on SLE processes, see, for example, the
related entry in the present volume.


Figure 3 A percolation exploration path. Figure courtesy
Schramm O (2000) Scaling limits of loop-erased random walks
and uniform spanning trees. Israel Journal of Mathematics 118:
221-228.

Figure 4 An SLE process with parameter $\kappa = 6$ (infinite time,
with the driving process stopped at time 1).

As an application of this convergence result, one 
can prove that the critical exponents described in the 
previous section do exist (still in the case of the 
triangular lattice), and compute their exact values, 
except for a, which is still listed here for 
completeness: 


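$$\alpha = -\frac{2}{3}, \quad \beta = \frac{5}{36}, \quad \gamma = \frac{43}{18}, \quad \delta = \frac{91}{5},$$

$$\eta = \frac{5}{24}, \quad \nu = \frac{4}{3}, \quad \rho = \frac{48}{5}, \quad \Delta = \frac{91}{36}.$$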


These exponents are expected to be universal, in the 
sense that they should be the same for percolation 
on any two-dimensional lattice; but at the time of 
this writing, this phenomenon is far from being 
understood on a mathematical level. 
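
As a consistency check on the exponent values quoted in this section, one can test them against the standard scaling and hyperscaling relations in dimension $d = 2$. The sketch below does this with exact rational arithmetic; the dictionary keys and the function name are illustrative choices, and the relations used are the usual ones from scaling theory rather than anything derived here.

```python
from fractions import Fraction as F

# Critical exponents of two-dimensional percolation (values quoted in the text).
exponents = {
    "alpha": F(-2, 3), "beta": F(5, 36), "gamma": F(43, 18), "delta": F(91, 5),
    "eta": F(5, 24), "nu": F(4, 3), "rho": F(48, 5), "Delta": F(91, 36),
}

def check_scaling_relations(e, d=2):
    """Return a dict mapping each (hyper)scaling relation to True or False."""
    return {
        "2 - alpha = 2*beta + gamma": 2 - e["alpha"] == 2 * e["beta"] + e["gamma"],
        "gamma = beta*(delta - 1)": e["gamma"] == e["beta"] * (e["delta"] - 1),
        "Delta = beta*delta": e["Delta"] == e["beta"] * e["delta"],
        "gamma = nu*(2 - eta)": e["gamma"] == e["nu"] * (2 - e["eta"]),
        "d*nu = 2 - alpha": d * e["nu"] == 2 - e["alpha"],
        "d*rho = delta + 1": d * e["rho"] == e["delta"] + 1,
    }

if __name__ == "__main__":
    for relation, holds in check_scaling_relations(exponents).items():
        print(relation, holds)
```

All six relations hold for the listed values, which is what one expects if the exponents form a consistent set.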

The rigorous derivation of the critical exponents 
for percolation is due to Smirnov and Werner 


(2001); the dimension of the limiting curve was 
obtained by Beffara (2004). 


Other Lattices and Percolative Systems 


Some modifications or generalizations of standard
Bernoulli percolation on $\mathbb{Z}^d$ exhibit an interesting
behavior and as such provide some insight into the
original process as well; there are too many
mathematical objects which can be argued to be
percolative in some sense to give a full account of all
of them, so the following list is somewhat arbitrary
and by no means complete.


Percolation on Nonamenable Graphs 


The first modification of the model one can think of
is to modify the underlying graph and move away
from the cubic lattice; phase transition still occurs,
and the main difference is the possibility for
infinitely many infinite clusters to coexist. On a
regular tree, such is the case whenever $p \in (p_c, 1)$;
the first nontrivial example was produced by
Grimmett and Newman as the product of $\mathbb{Z}$ by a
tree: there, for some values of $p$ the infinite cluster is
unique, while for others there is coexistence of
infinitely many of them. The corresponding definition,
due to Benjamini and Schramm, is then the
following: if $N$ is as above the number of infinite
open clusters,

$$p_u := \inf\{p : P_p(N = 1) = 1\} \geq p_c.$$

The main question is then to characterize graphs on
which $0 < p_c < p_u < 1$.

A wide class of interesting graphs is that of Cayley
graphs of infinite, finitely generated groups. There,
by a simultaneous result by Häggström and Peres
and by Schonmann, for every $p \in (p_c, p_u)$ there are
$P_p$-a.s. infinitely many infinite clusters, while for
every $p \in (p_u, 1]$ there is only one. Note that this
does not follow from the definition since new
infinite components could appear when $p$ is
increased. It is conjectured that $p_c < p_u$ for any
Cayley graph of a nonamenable group (and more
generally for any quasitransitive graph with positive
Cheeger constant), and a result by Pak and
Smirnova is that every infinite, finitely generated,
nonamenable group has a Cayley graph on which
$p_c < p_u$; this is then expected not to depend on the
choice of generators. In the general case, it was recently
proved by Gaboriau that if the graph $G$ is unimodular,
transitive, locally finite, and supports nonconstant
harmonic Dirichlet functions (i.e., harmonic functions
whose gradient is in $\ell^2$), then indeed $p_c(G) < p_u(G)$.

For reference and further reading on the topic, 
the reader is advised to refer to the review paper by 
Benjamini and Schramm (1996), the lecture notes 
of Peres (1999), and the more recent article of 
Gaboriau (2005). 


Gradient Percolation 


Another possible modification of the original model
is to allow the parameter $p$ to depend on the
location; the porous medium may for instance have
been created by some kind of erosion, so that there
will be more open edges on one side of a given
domain than on the other. If $p$ still varies smoothly,
then one expects some regions to look subcritical
and others to look supercritical, with interesting
behavior in the vicinity of the critical level set
$\{p = p_c\}$. This particular model was introduced by
Sapoval et al. (1978) under the name of gradient
percolation (see Figure 5).


Figure 5 Gradient percolation in a square. In black is the
cluster spanning from the bottom side of the square.

The control of the model away from the critical 
zone is essentially the same as for usual Bernoulli 
percolation, the main question being how to 
estimate the width of the phase transition. The 
main idea is then the same as in scaling theory: if the 
distance between a point v and the critical level set is 
less than the correlation length for parameter p(v), 
then v is in the phase transition domain. This of 
course makes sense only asymptotically, say in a
large $n \times n$ square with $p(x,y) = 1 - y/n$ as is the
case in the figure: the transition then is expected to
have width of order $n^a$ for some exponent $a > 0$.
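
The setting just described is easy to explore numerically. The following is a minimal sketch, assuming site percolation on an $n \times n$ piece of the square lattice with $p(x,y) = 1 - y/n$ (the figure may well use a different lattice); the function names `gradient_percolation` and `front_heights` are hypothetical. It samples a configuration, finds the cluster attached to the bottom side by breadth-first search, and records how high that cluster reaches in each column.

```python
import random
from collections import deque

def gradient_percolation(n, seed=0):
    """Sample site percolation with p(x, y) = 1 - y/n and return
    (open_site, cluster), where cluster is the set of sites connected
    to the bottom row through open nearest neighbours."""
    rng = random.Random(seed)
    # Row y = 0 is the bottom side, where p = 1 and every site is open.
    open_site = [[rng.random() < 1 - y / n for x in range(n)] for y in range(n)]
    queue = deque((x, 0) for x in range(n) if open_site[0][x])
    cluster = set(queue)
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            u, v = x + dx, y + dy
            if 0 <= u < n and 0 <= v < n and open_site[v][u] and (u, v) not in cluster:
                cluster.add((u, v))
                queue.append((u, v))
    return open_site, cluster

def front_heights(cluster, n):
    """Highest row reached by the bottom cluster in each column."""
    return [max(y for (x, y) in cluster if x == col) for col in range(n)]

if __name__ == "__main__":
    n = 60
    _, cluster = gradient_percolation(n, seed=1)
    heights = front_heights(cluster, n)
    print("mean front height / n:", sum(heights) / (n * n))
```

The spread of the recorded heights gives a rough empirical handle on the width of the transition zone; extracting an exponent would require much larger $n$ and averaging over many samples.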


First-Passage Percolation 


First-passage percolation (also known as Eden or 
Richardson model) was introduced by Hammersley 
and Welsh (1965) as a time-dependent model for the 
passage of fluid through a porous medium. To define
the model, with each edge $e \in E(\mathbb{Z}^d)$ is associated a
random variable $T(e)$, which can be interpreted as
being the time required for fluid to flow along $e$. The
$T(e)$ are assumed to be independent non-negative
random variables having common distribution $F$. For
any path $\pi$ we define the passage time $T(\pi)$ of $\pi$ as

$$T(\pi) = \sum_{e \in \pi} T(e).$$

The first passage time $a(x,y)$ between vertices $x$ and
$y$ is given by

$$a(x,y) = \inf\{T(\pi) : \pi \text{ a path from } x \text{ to } y\}$$

and we can define

$$W(t) := \{x \in \mathbb{Z}^d : a(0,x) \le t\},$$

the set of vertices reached by the liquid by time $t$. It
turns out that $W(t)$ grows approximately linearly as
time passes, and that there exists a nonrandom limit
set $B$ such that either $B$ is compact and

$$(1-\varepsilon) B \subset \frac{1}{t} \tilde W(t) \subset (1+\varepsilon) B, \quad \text{eventually a.s.}$$

for all $\varepsilon > 0$, or $B = \mathbb{R}^d$, and

$$\{x \in \mathbb{R}^d : |x| \le K\} \subset \frac{1}{t} \tilde W(t), \quad \text{eventually a.s.}$$

for all $K > 0$. Here $\tilde W(t) = \{z + [-1/2, 1/2]^d : z \in W(t)\}$.
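
Since the first passage time is a shortest-path distance for random edge weights, it can be computed with Dijkstra's algorithm. The sketch below is illustrative only: it works on a finite box of $\mathbb{Z}^2$ (so the computed values are upper bounds on the true $a(0,x)$, because paths leaving the box are ignored), it takes the $T(e)$ exponential with mean 1 by default, and the names `first_passage_times` and `wet_set` are hypothetical.

```python
import heapq
import random

def first_passage_times(radius, seed=0, sample=None):
    """Dijkstra from the origin on the box [-radius, radius]^2 of Z^2.
    Edge times T(e) are drawn lazily from `sample` (i.i.d. Exp(1) by default)."""
    rng = random.Random(seed)
    if sample is None:
        sample = lambda: rng.expovariate(1.0)
    weights = {}

    def T(u, v):
        key = (u, v) if u <= v else (v, u)  # undirected edge
        if key not in weights:
            weights[key] = sample()
        return weights[key]

    dist = {(0, 0): 0.0}
    heap = [(0.0, (0, 0))]
    done = set()
    while heap:
        t, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            v = (u[0] + dx, u[1] + dy)
            if max(abs(v[0]), abs(v[1])) > radius:
                continue
            candidate = t + T(u, v)
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist

def wet_set(dist, t):
    """The set W(t) of vertices reached by time t (within the box)."""
    return {x for x, a in dist.items() if a <= t}

if __name__ == "__main__":
    dist = first_passage_times(30, seed=1)
    for t in (5, 10, 15):
        print(t, len(wet_set(dist, t)))
```

Plotting `wet_set(dist, t)` rescaled by $1/t$ for increasing $t$ gives a visual impression of the limit shape $B$.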

Studies of first-passage percolation brought 
many fascinating discoveries, including Kingman’s 
celebrated subadditive ergodic theorem. In recent 
years interest has been focused on the study of
fluctuations of the set $W(t)$ for large $t$. In spite of
huge effort and some partial results achieved, it
still remains a major task to establish rigorously the
conjectures predicted by Kardar-Parisi-Zhang theory
about shape fluctuations in first-passage
percolation.


Contact Processes 


Introduced by Harris and conceived with biological
interpretation, the contact process on $\mathbb{Z}^d$ is a
continuous-time process taking values in the space
of subsets of $\mathbb{Z}^d$. It is informally described as
follows: particles are distributed in $\mathbb{Z}^d$ in such a
way that each site is either empty or occupied by
one particle. The evolution is Markovian: each
particle disappears after an exponential time of
parameter 1, independently from the others; at any
time, each particle has a possibility to create a new
particle at any of its empty neighboring sites, and
does so with rate $\lambda > 0$, independently of everything
else.

The question is then whether, starting from a
finite population, the process will die out in finite
time or whether it will survive forever with positive
probability. The outcome will depend on the value
of $\lambda$, and there is a critical value $\lambda_c$ such that for
$\lambda < \lambda_c$ the process dies out, while for $\lambda > \lambda_c$ indeed
there is survival, and in this case the shape of the
population obeys a shape theorem similar to that of
first-passage percolation.




The analogy with percolation is strong, the
corresponding percolative picture being the following:
in $\mathbb{Z}^d$, each edge is open with probability
$p \in (0,1)$, and the question is whether there exists an
infinite oriented path $\pi$ (i.e., a path along which the
sum of the coordinates is increasing), composed of
open edges. Once again, there is a critical parameter
customarily denoted by $\vec{p}_c$, at which no such path
exists (compare this to the open question of the
continuity of the function $\theta$ at $p_c$ in dimensions
$3 \le d \le 18$). This variation of percolation lies in a
different universality class than the usual Bernoulli
model.


Invasion Percolation 


Let $\{X(e) : e \in E\}$ be independent random variables
indexed by the edge set $E$ of $\mathbb{Z}^d$, $d \ge 2$, each
having uniform distribution in $[0,1]$. One constructs
a sequence $C = \{C_i, i \ge 1\}$ of random
connected subgraphs of the lattice in the
following iterative way: the graph $C_0$ contains
only the origin. Having defined $C_i$, one obtains
$C_{i+1}$ by adding to $C_i$ an edge $e_{i+1}$ (with its outer
lying end-vertex), chosen from the outer edge
boundary of $C_i$ so as to minimize $X(e_{i+1})$. Still
very little is known about the behavior of this
process.
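
The iterative construction translates directly into a priority-queue algorithm. The following is a minimal sketch on $\mathbb{Z}^2$, with the edge variables $X(e)$ sampled lazily the first time an edge is examined; the function name `invasion_percolation` and the return format are illustrative assumptions.

```python
import heapq
import random

def invasion_percolation(steps, seed=0):
    """Run `steps` steps of invasion percolation on Z^2 from the origin.
    Returns (invaded_vertices, edges) where edges lists (u, v, X(e))
    in the order in which the edges were added."""
    rng = random.Random(seed)
    X = {}  # uniform [0, 1) variables, sampled on demand

    def weight(u, v):
        key = (min(u, v), max(u, v))
        if key not in X:
            X[key] = rng.random()
        return X[key]

    def neighbours(u):
        x, y = u
        return ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))

    origin = (0, 0)
    invaded = {origin}
    # Heap of candidate boundary edges: (X(e), inner vertex, outer vertex).
    boundary = [(weight(origin, v), origin, v) for v in neighbours(origin)]
    heapq.heapify(boundary)
    edges = []
    while len(edges) < steps:
        x, u, v = heapq.heappop(boundary)
        if v in invaded:
            continue  # both ends invaded: no longer in the outer edge boundary
        invaded.add(v)
        edges.append((u, v, x))
        for w in neighbours(v):
            if w not in invaded:
                heapq.heappush(boundary, (weight(v, w), v, w))
    return invaded, edges

if __name__ == "__main__":
    invaded, edges = invasion_percolation(1000, seed=2)
    print(len(invaded), max(x for _, _, x in edges))
```

Stale heap entries (edges whose outer end has meanwhile been invaded) are discarded when popped, so each step really adds the minimal-weight edge of the current outer edge boundary.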

An interesting observation, relating $\theta(p_c)$ of usual
percolation with the invasion dynamics, comes from
C M Newman:

$$\theta(p_c) = 0 \iff P\{x \in C\} \to 0 \text{ as } |x| \to \infty.$$


Further Remarks 


For a much more in-depth review of percolation on 
lattices and the mathematical methods involved in 
its study, and for the proofs of most of the results we 
could only point at, we refer the reader to the 
standard book of Grimmett (1999); another excel- 
lent general reference, and the only place to find 
some of the technical graph-theoretical details 
involved, is the book of Kesten (1982). More 
information in the case of graphs that are not 
lattices can be found in the lecture notes of Peres 
(1999).

For curiosity, the reader can refer to the first 
mention of a problem close to percolation, in the 
problem section of the first volume of the American 
Mathematical Monthly (problem 5, June 1894, 
submitted by D V Wood). 


See also: Determinantal Random Fields; Stochastic 
Loewner Evolutions; Two-Dimensional Ising Model; Wulff 
Droplets. 
Introduction


There are several equivalent formulations of the
problem of quantizing an interacting field theory.
The list includes canonical quantization, path-integral
(or functional) techniques, stochastic
quantization, “unified” methods such as the
Batalin-Vilkovisky formalism, and techniques
based on the realizations of field theories as
low-energy limits of string theory. The problem of
obtaining an exact nonperturbative description of a
given quantum field theory is most often a very
difficult one. Perturbative techniques, on the other
hand, are abundant, and common to all of the
quantization methods mentioned above is that they
admit particle interpretations in this formalism.

The basic physical quantities that one wishes to
calculate in a relativistic $(d+1)$-dimensional quantum
field theory are the S-matrix elements

$$S_{ba} = {}_{\rm out}\langle \psi_b(t) | \psi_a(t) \rangle_{\rm in} \qquad [1]$$

between in and out states at large positive time $t$.
The scattering operator $S$ is then defined by writing
[1] in terms of initial free-particle (descriptor) states as

$$S_{ba} =: \langle \psi_b(0) | S | \psi_a(0) \rangle \qquad [2]$$


Suppose that the Hamiltonian of the given field
theory can be written as $H = H_0 + H'$, where $H_0$ is
the free part and $H'$ the interaction Hamiltonian.
The time evolutions of the in and out states are
governed by the total Hamiltonian $H$. They can be
expressed in terms of descriptor states which evolve
in time with $H_0$ in the interaction picture and
correspond to free-particle states. This leads to the
Dyson formula

$$S = T \exp\Big( -i \int_{-\infty}^{\infty} dt \, H_I(t) \Big) \qquad [3]$$

where $T$ denotes time ordering and $H_I(t) = \int d^dx \, \mathcal{H}_{\rm int}(x,t)$
is the interaction Hamiltonian in the
interaction picture, with $\mathcal{H}_{\rm int}(x,t)$ the interaction
Hamiltonian density, which deals with essentially
free fields. This formula expresses $S$ in terms of
interaction-picture operators acting on free-particle
states in [2] and is the first step towards Feynman
perturbation theory.
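
To make the link with perturbation theory explicit, it may help to write out the formal expansion of the time-ordered exponential in [3]. The following is the standard Dyson series, given here as a sketch in the notation $H_I(t)$ of this section:

```latex
S \;=\; 1 \;+\; \sum_{n=1}^{\infty} \frac{(-i)^n}{n!}
   \int_{-\infty}^{\infty} dt_1 \cdots \int_{-\infty}^{\infty} dt_n \;
   T\big[ H_I(t_1) \cdots H_I(t_n) \big]
 \;=\; 1 \;+\; \sum_{n=1}^{\infty} (-i)^n
   \int_{t_1 > t_2 > \cdots > t_n} dt_1 \cdots dt_n \;
   H_I(t_1) \cdots H_I(t_n) .
```

The factor $1/n!$ in the first form compensates for the $n!$ orderings of the times $t_1, \dots, t_n$, each of which contributes the same amount once the operators are time-ordered.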


For many analytic investigations, such as those 
which arise in renormalization theory, one is 
interested instead in the Green’s functions of the 
quantum field theory, which measure the response 
of the system to an external perturbation. For 
definiteness, let us consider a free real scalar field 
theory in d+1 dimensions with Lagrangian 
density 


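$$\mathcal{L} = \tfrac{1}{2} \partial_\mu \phi \, \partial^\mu \phi - \tfrac{1}{2} m^2 \phi^2 + \mathcal{L}_{\rm int} \qquad [4]$$

where $\mathcal{L}_{\rm int}$ is the interaction Lagrangian density
which we assume has no derivative terms. The
interaction Hamiltonian density is then given by
$\mathcal{H}_{\rm int} = -\mathcal{L}_{\rm int}$. Introducing a real scalar source $J(x)$,
we define the normalized “partition function”
through the vacuum expectation values,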


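$$Z[J] = \frac{\langle 0 | S[J] | 0 \rangle}{\langle 0 | S[0] | 0 \rangle} \qquad [5]$$

where $|0\rangle$ is the normalized perturbative vacuum
state of the quantum field theory given by [4]
(defined to be destroyed by all field annihilation
operators), and

$$S[J] = T \exp\Big( i \int d^{d+1}x \, \big( \mathcal{L}_{\rm int} + J(x) \phi(x) \big) \Big) \qquad [6]$$

from the Dyson formula. This partition function is
the generating functional for all Green’s functions
of the quantum field theory, which are obtained
from [5] by taking functional derivatives with
respect to the source and then setting $J(x) = 0$.
Explicitly, in a formal Taylor series expansion in $J$
one has

$$Z[J] = \sum_{n=0}^{\infty} \frac{i^n}{n!} \int \prod_{i=1}^{n} d^{d+1}x_i \, J(x_i) \; G^{(n)}(x_1, \dots, x_n) \qquad [7]$$

whose coefficients are the Green’s functions

$$G^{(n)}(x_1, \dots, x_n) = \frac{\langle 0 | T\big[ \exp\big( i \int d^{d+1}x \, \mathcal{L}_{\rm int} \big) \, \phi(x_1) \cdots \phi(x_n) \big] | 0 \rangle}{\langle 0 | T \exp\big( i \int d^{d+1}x \, \mathcal{L}_{\rm int} \big) | 0 \rangle} \qquad [8]$$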
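
The statement that the Green's functions are obtained by differentiating with respect to the source can be written as a formula. Assuming the normalization of the Taylor expansion [7], with coefficients symmetric in their arguments, one finds:

```latex
G^{(n)}(x_1, \dots, x_n)
  \;=\; (-i)^n \,
  \frac{\delta^n Z[J]}{\delta J(x_1) \cdots \delta J(x_n)} \bigg|_{J = 0} .
```

Each functional derivative removes one factor $iJ(x_i)$ from the $n$-th term of [7], and the $n!$ ways of assigning the derivatives to the sources cancel the $1/n!$.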


It is customary to work in momentum space by
introducing the Fourier transforms
in terms of which the expansion [7] reads

$$Z[J] = \sum_{n=0}^{\infty} \frac{i^n}{n!} \int \prod_{i=1}^{n} \frac{d^{d+1}k_i}{(2\pi)^{d+1}} \, \tilde J(-k_i) \; \tilde G^{(n)}(k_1, \dots, k_n) \qquad [10]$$





The generating functional [10] can be written as a sum 
of Feynman diagrams with source insertions. Dia- 
grammatically, the Green’s function is an infinite series 
of graphs which can be represented symbolically as 


[diagram: a bubble with $n$ external lines] [11]


where the $n$ external lines denote the source
insertions of momenta $k_i$ and the bubble denotes
the sum over all Feynman diagrams constructed
from the interaction vertices of $\mathcal{L}_{\rm int}$.

This procedure is, however, rather formal in the way 
that we have presented it, for a variety of reasons. First 
of all, by Haag’s theorem, it follows that the interaction 
representation of a quantum field theory does not exist 
unless a cutoff regularization is introduced into the 
interaction term in the Lagrangian density (this 
regularization is described explicitly below). The 
addition of this term breaks translation covariance. 
This problem can be remedied via a different definition 
of the regularized Green’s functions, as we discuss 
below. Furthermore, the perturbation series of a 
quantum field theory is typically divergent. The 
expansion into graphs is, at best, an asymptotic series 
which is Borel summable. These shortcomings will not 
be emphasized any further in this article. Some 
mathematically rigorous approaches to perturbative 
quantum field theory can be found in the bibliography. 

The Green’s functions can also be used to describe 
scattering amplitudes, but there are two important 
differences between the graphs [11] and those which 
appear in scattering theory. In the present case, 
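external lines carry propagators, that is, the free-field
Green’s functions

$$\Delta(x - y) = \langle 0 | T[\phi(x) \phi(y)] | 0 \rangle = \langle x | \frac{-i}{\Box + m^2 - i\epsilon} | y \rangle = \int \frac{d^{d+1}p}{(2\pi)^{d+1}} \, \frac{i}{p^2 - m^2 + i\epsilon} \, e^{-ip \cdot (x - y)} \qquad [12]$$

where $\epsilon \to 0^+$ regulates the mass shell contributions,
and their momenta $k_i$ are off-shell in general
($k_i^2 \neq m^2$). By the LSZ theorem, the S-matrix element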
is then given by the multiple on-shell residue of the
Green’s function in momentum space as

$$\langle k'_1, \dots, k'_n | S - 1 | k_1, \dots, k_m \rangle = \lim_{k'^2_i, \, k_j^2 \to m^2} \; \prod_{i=1}^{n} \frac{1}{i \sqrt{c'_i}} \, (k'^2_i - m^2) \prod_{j=1}^{m} \frac{1}{i \sqrt{c_j}} \, (k_j^2 - m^2) \times \tilde G^{(n+m)}(-k'_1, \dots, -k'_n, k_1, \dots, k_m) \qquad [13]$$

where $i c'_i$, $i c_j$ are the residues of the corresponding
particle poles in the exact two-point Green’s
function.

This article deals with the formal development 
and computation of perturbative scattering ampli- 
tudes in relativistic quantum field theory, along the 
lines outlined above. Initially we deal only with real 
scalar field theories of the sort [4] in order to 
illustrate the concepts and technical tools in as 
simple and concise a fashion as possible. These 
techniques are common to most quantum field 
theories. Fermions and gauge theories are then 
separately treated afterwards, focusing on the 
methods which are particular to them. 


Diagrammatics 


The pinnacle of perturbation theory is the technique 
of Feynman diagrams. Here we develop the basic 
machinery in a quite general setting and use it to 
analyze some generic features of the terms compris- 
ing the perturbation series. 


Wick’s Theorem 


The Green’s functions [8] are defined in terms of
vacuum expectation values of time-ordered products
of the scalar field $\phi(x)$ at different spacetime points.
Wick’s theorem expresses such products in terms of
normal-ordered products, defined by placing each
field creation operator to the left of each field
annihilation operator, and in terms of two-point
Green’s functions [12] of the free-field theory
(propagators). The consequence of this theorem is
the Hafnian formula

$$\langle 0 | T[\phi(x_1) \cdots \phi(x_n)] | 0 \rangle = \begin{cases} 0, & n = 2k - 1, \\[4pt] \dfrac{1}{2^k k!} \displaystyle\sum_{\pi \in S_{2k}} \prod_{i=1}^{k} \langle 0 | T[\phi(x_{\pi(2i-1)}) \phi(x_{\pi(2i)})] | 0 \rangle, & n = 2k. \end{cases} \qquad [14]$$

The formal Taylor series expansion of the
scattering operator $S$ may now be succinctly
summarized into a diagrammatic notation by
using Wick’s theorem. For each spacetime integration
$\int d^{d+1}x_i$ we introduce a vertex with label $i$,
and from each vertex there emanate some lines
corresponding to field insertions at the point $x_i$.
If the operators represented by two lines appear in
a two-point function according to [14], that is, they
are contracted, then these two lines are connected
together. The $S$ operator is then represented as a
sum over all such Wick diagrams, bearing in mind
that topologically equivalent diagrams correspond
to the same term in $S$. Two diagrams are said to
have the same pattern if they differ only by a
permutation of their vertices. For any diagram $D$
with $n(D)$ vertices, the number of ways of
interchanging vertices is $n(D)!$. The number of diagrams
per pattern is always less than or equal to this number. The
symmetry number $S(D)$ of $D$ is the number of
permutations of vertices that give the same diagram.
The number of diagrams with the pattern of
$D$ is then $n(D)!/S(D)$.
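
The counting of diagrams per pattern can be checked by brute force for small diagrams. In the sketch below a diagram is represented, purely for illustration, as a list of lines between labelled vertices (repeated entries stand for multiple lines, and external lines are ignored); the symmetry number is the number of vertex permutations that leave this list unchanged as a multiset.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def symmetry_number(n_vertices, lines):
    """Number of vertex permutations mapping the diagram to itself.
    `lines` is a list of (a, b) pairs of vertex labels in range(n_vertices)."""
    def canonical(line_list):
        # Unordered lines, counted with multiplicity.
        return Counter(tuple(sorted(line)) for line in line_list)

    target = canonical(lines)
    count = 0
    for perm in permutations(range(n_vertices)):
        if canonical((perm[a], perm[b]) for a, b in lines) == target:
            count += 1
    return count

def diagrams_per_pattern(n_vertices, lines):
    """n(D)! / S(D): number of distinct labelled diagrams in the pattern."""
    return factorial(n_vertices) // symmetry_number(n_vertices, lines)

if __name__ == "__main__":
    chain = [(0, 1), (1, 2)]             # three vertices in a row
    triangle = [(0, 1), (1, 2), (2, 0)]  # three vertices in a ring
    print(symmetry_number(3, chain), diagrams_per_pattern(3, chain))
    print(symmetry_number(3, triangle), diagrams_per_pattern(3, triangle))
```

The chain has symmetry number 2 (reflection) and hence 3 diagrams in its pattern, while the ring is invariant under all 6 permutations and is alone in its pattern. The exhaustive loop over permutations is only practical for a handful of vertices.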

In a given pattern, we write the contribution to $S$
of a single diagram $D$ as

$$\frac{1}{n(D)!} \, {:}\theta(D){:}$$

where the combinatorial factor comes from
the Taylor expansion of $S$, the large colons
denote normal ordering of quantum operators,
and ${:}\theta(D){:}$ contains spacetime integrals over
normal-ordered products of the fields. Then all
diagrams with the pattern of $D$ contribute ${:}\theta(D){:}/S(D)$
to $S$. Only the connected diagrams $D_r$, $r \in \mathbb{N}$
(those in which every vertex is connected to every
other vertex) contribute and we can write the
scattering operator in a simple form which
eliminates contributions from all disconnected
diagrams as

$$S = \; {:}\exp\Big( \sum_{r=1}^{\infty} \frac{\theta(D_r)}{S(D_r)} \Big){:} \qquad [15]$$


Feynman Rules 





Feynman diagrams in momentum space are
defined from the Wick diagrams above by dropping
the labels on vertices (and also the symmetry
factors $S(D)^{-1}$), and by labeling the external lines
by the momenta of the initial and final particles
that the corresponding field operators annihilate.
In a spacetime interpretation, external lines
represent on-shell physical particles while internal
lines of the graph represent off-shell virtual
particles ($k^2 \neq m^2$). Physical particles interact
via the exchange of virtual particles. An arbitrary
diagram is then calculated via the Feynman rules:

[diagram: internal line carrying momentum $p$] $= \displaystyle\int \frac{d^{d+1}p}{(2\pi)^{d+1}} \, \frac{i}{p^2 - m^2 + i\epsilon}$

[diagram: vertex with incoming momenta $p_1, \dots, p_n$] $= i g \, (2\pi)^{d+1} \, \delta^{(d+1)}(p_1 + \cdots + p_n)$ [16]

for a monomial interaction $\mathcal{L}_{\rm int} = (g/n!) \, \phi^n$.


Irreducible Green’s Functions 


A one-particle irreducible (1PI) or proper Green’s
function is given by a sum of diagrams in which
each diagram cannot be separated by cutting one
internal line. In momentum space, it is defined
without the overall momentum conservation
delta-function factors and without propagators on
external lines. For example, the two-particle 1PI Green’s
function

[diagram: a 1PI bubble with two external lines carrying momentum $k$, denoted $\Sigma(k)$] [17]

is called the self-energy. If $G(k)$ is the complete
two-point function in momentum space, then one
has

$$G(k) := [\text{diagram: sum of chains of 1PI bubbles joined by propagators of momentum } k] = \frac{i}{k^2 - m^2 - \Sigma(k)} \qquad [18]$$

and thus it suffices to calculate only 1PI diagrams.
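
The closed form of the complete two-point function follows from summing a geometric series. As a sketch, assume the normalization in which each self-energy insertion between two free propagators $D(k) = i/(k^2 - m^2 + i\epsilon)$ contributes a factor $-i\Sigma(k)$ (this is the convention that reproduces [18]):

```latex
G(k) \;=\; D + D\,(-i\Sigma)\,D + D\,(-i\Sigma)\,D\,(-i\Sigma)\,D + \cdots
     \;=\; D \sum_{n=0}^{\infty} \big( -i\Sigma\, D \big)^n
     \;=\; \frac{1}{D^{-1} + i\Sigma}
     \;=\; \frac{i}{k^2 - m^2 - \Sigma(k)} ,
```

where the last step uses $D^{-1} = -i\,(k^2 - m^2)$ and the $i\epsilon$ has been suppressed.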

The 1PI effective action, defined by the Legendre
transformation $\Gamma[\phi] := -i \ln Z[J] - \int d^{d+1}x \, J(x) \phi(x)$
of [5], is the generating functional for proper vertex
functions and it can be represented as a functional of
only the vacuum expectation value of the field $\phi$,
that is, its classical value. In the semiclassical (WKB)
approximation, the one-loop effective action is
given by

$$\Gamma[\phi] = S[\phi] + \frac{i\hbar}{2} \, {\rm Tr} \ln\big( 1 + \Delta V''[\phi] \big) + O(\hbar^2)$$

$$= S[\phi] + \frac{i\hbar}{2} \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} \prod_{i=1}^{n} \int d^{d+1}x_i \, \Delta(x_i - x_{i+1}) \, V''[\phi(x_i)] + O(\hbar^2) \qquad [19]$$

where we have denoted $S[\phi] = \int d^{d+1}x \, \mathcal{L}$ and
$V[\phi] = -\mathcal{L}_{\rm int}$, and for each term in the infinite
series we define $x_{n+1} := x_1$. The first term in [19]
is the classical contribution and it can be
represented in terms of connected tree diagrams.
The second term is the sum of contributions of
one-loop diagrams constructed from $n$ propagators
$-i\Delta(x - y)$ and $n$ vertices $-iV''[\phi]$. The
expansion may be carried out to all orders in
terms of connected Feynman diagrams, and the
result of the above Legendre transformation is to
select only the one-particle irreducible diagrams
and to replace the classical value of $\phi$ by an
arbitrary argument. All information about the
quantum field theory is encoded in this effective
action.





Parametric Representation 


Consider an arbitrary proper Feynman diagram
$D$ with $n$ internal lines and $v$ vertices. The
number, $\ell$, of independent loops in the diagram
is the number of independent internal momenta in
$D$ when conservation laws at each vertex have
been taken into account, and it is given by $\ell = n + 1 - v$.
There is an independent momentum integration
variable $k_i$ for each loop, and a propagator
for each internal line as in [16]. The
contribution of $D$ to a proper Green’s function
with $r$ incoming external momenta $p_i$, with
$\sum_{i=1}^{r} p_i = 0$, is given by

$$\tilde I_D(p) = \frac{V(D)}{S(D)} \int \prod_{j=1}^{n} \frac{d^{d+1}k_j}{(2\pi)^{d+1}} \, \frac{i}{k_j^2 - m^2 + i\epsilon} \; \prod_{j=1}^{v} (2\pi)^{d+1} \, \delta^{(d+1)}(P_j - K_j) \qquad [20]$$

where $V(D)$ contains all contributions from the
interaction vertices of $\mathcal{L}_{\rm int}$, and $P_j$ (resp. $K_j$) is the
sum of incoming external momenta $p_i$ (resp.
internal momenta $k_i$) at vertex $j$ with respect to
a fixed chosen orientation of the lines of the
graph. After resolving the delta-functions in terms
of independent internal loop momenta $k_1, \dots, k_\ell$
and dropping the overall momentum conservation
delta-function along with the symmetry and vertex
factors in [20], one is left with a set of momentum
space integrals

$$I_D(p) = \prod_{i=1}^{\ell} \int \frac{d^{d+1}k_i}{(2\pi)^{d+1}} \; \prod_{j=1}^{n} \frac{i}{a_j(k,p) + i\epsilon} \qquad [21]$$

where $a_j(k,p)$ are functions of both the internal and
external momenta.

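It is convenient to exponentiate propagators using
the Schwinger parametrization

$$\frac{i}{a_j + i\epsilon} = \int_0^{\infty} d\alpha_j \; e^{i \alpha_j (a_j + i\epsilon)} \qquad [22]$$

and after some straightforward manipulations one
may write the Feynman parametric formula

$$\prod_{j=1}^{n} \frac{i}{a_j + i\epsilon} = i^n \, (n-1)! \int_0^1 \prod_{j=1}^{n} d\alpha_j \; \frac{\delta\big( 1 - \sum_{j=1}^{n} \alpha_j \big)}{D_D(k; \alpha, p)^n} \qquad [23]$$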
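
The simplest instance of the Feynman parametric formula, with $n = 2$ and real positive denominators (so that the factors of $i$ and the $i\epsilon$ prescription play no role), reads $1/(ab) = \int_0^1 d\alpha \, [\alpha a + (1 - \alpha) b]^{-2}$. The sketch below checks this numerically with a midpoint rule; the function name and the step count are arbitrary illustrative choices.

```python
def feynman_parameter_integral(a, b, steps=20000):
    """Midpoint-rule estimate of the integral over alpha in [0, 1] of
    1 / (alpha * a + (1 - alpha) * b)**2, for real a, b > 0."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        alpha = (i + 0.5) * h
        total += h / (alpha * a + (1.0 - alpha) * b) ** 2
    return total

if __name__ == "__main__":
    a, b = 2.0, 5.0
    print(feynman_parameter_integral(a, b), 1.0 / (a * b))
```

The delta-function in the general formula has been used here to eliminate the second parameter, $\alpha_2 = 1 - \alpha_1$.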


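where $D_D(k; \alpha, p) := \sum_j \alpha_j [a_j(k,p) + i\epsilon]$ is generically
a quadratic form

$$D_D(k; \alpha, p) = \sum_{i,j=1}^{\ell} k_i \cdot Q_{ij}(\alpha) \, k_j + \sum_{i=1}^{\ell} L_i(p) \cdot k_i + \lambda(p^2) \qquad [24]$$

The positive symmetric matrix $Q_{ij}$ is independent
of the external momenta $p_i$, invertible, and
has nonzero eigenvalues $Q_1, \dots, Q_\ell$. The vectors
$L_i$ are linear combinations of the $p_j$, while $\lambda(p^2)$
is a function of only the Lorentz invariants $p_i \cdot p_j$.
After some further elementary manipulations,
the loop diagram contribution [21] may be
written as

$$I_D(p) = i^n \, (n-1)! \int_0^1 \prod_{j=1}^{n} d\alpha_j \; \delta\Big( 1 - \sum_{j=1}^{n} \alpha_j \Big) \prod_{i=1}^{\ell} \int \frac{d^{d+1}k_i}{(2\pi)^{d+1}} \; \Big[ \sum_{i=1}^{\ell} Q_i \, k_i^2 + \lambda(p^2) - \frac{1}{4} \sum_{i,j=1}^{\ell} L_i \cdot (Q^{-1})_{ij} \, L_j \Big]^{-n} \qquad [25]$$

Finally, the integrals over the loop momenta $k_i$
may be performed by Wick-rotating them
to Euclidean space and using the fact that
the combination of $\ell$ integrations in $\mathbb{R}^{d+1}$ has
$O((d+1)\ell)$ rotational invariance. The contribution
from the entire Feynman diagram $D$ thereby
reduces to the calculation of the parametric
integrals:

$$I_D(p) \;\propto\; \Gamma\Big( n - \frac{(d+1)\ell}{2} \Big) \prod_{j=1}^{n} \int_0^1 d\alpha_j \; \cdots \qquad [26]$$

where $\Gamma(s)$ is the Euler gamma-function.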


Regularization 


The parametric representation [26] is generically
convergent when $2n - (d+1)\ell > 0$. When divergent,
the infinities arise from the lower limits of
integration $\alpha_j \to 0$. This is just the parametric
representation of the large-$k$ divergence of the
original Feynman amplitude [20]. Such ultraviolet
divergences plague the very meaning of a quantum
field theory and must be dealt with in some
way. We will now quickly tour the standard
methods of ultraviolet regularization for such
loop integrals, which is a prelude to the
renormalization program that removes the divergences
(in a renormalizable field theory). Here we
consider regularization simply as a means of
justification for the various formal manipulations
that are used in arriving at expressions such
as [26].
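
The loop count and the convergence criterion quoted above are simple enough to tabulate. The sketch below encodes $\ell = n + 1 - v$ and the condition $2n - (d+1)\ell > 0$; the function names are hypothetical, and the criterion is only the overall power-counting condition stated in the text (it says nothing about divergent subdiagrams).

```python
def loop_number(n_internal, n_vertices):
    """Number of independent loops, l = n + 1 - v."""
    return n_internal + 1 - n_vertices

def parametric_integral_converges(n_internal, n_vertices, d):
    """Overall power-counting criterion 2n - (d+1) l > 0 in d+1 dimensions."""
    loops = loop_number(n_internal, n_vertices)
    return 2 * n_internal - (d + 1) * loops > 0

if __name__ == "__main__":
    d = 3  # (3+1)-dimensional spacetime
    examples = {
        "one-loop, one propagator (tadpole)": (1, 1),
        "one-loop, two propagators": (2, 2),
        "one-loop, three propagators (triangle)": (3, 3),
        "one-loop, four propagators (box)": (4, 4),
    }
    for name, (n, v) in examples.items():
        print(name, loop_number(n, v), parametric_integral_converges(n, v, d))
```

In $3+1$ dimensions the one-loop diagrams with one or two propagators fail the criterion, while the triangle and the box satisfy it.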


Momentum Cutoff 


Cutoff regularization introduces a mass scale $\Lambda$
into the quantum field theory and throws away
the Fourier modes of the fields for spatial
momenta $k$ with $|k| > \Lambda$. This regularization
spoils Lorentz invariance. It is also nonlocal. For
example, if we restrict to a hypercube in
momentum space, so that $|k_i| < \Lambda$ for $i = 1, \dots, d$,
then

$$\int_{|k_i| < \Lambda} \frac{d^dk}{(2\pi)^d} \; e^{i k \cdot x} = \prod_{i=1}^{d} \frac{\sin(\Lambda x^i)}{\pi x^i}$$

which is a delta-function in the limit $\Lambda \to \infty$ but is
nonlocal for $\Lambda < \infty$. The regularized field theory is
finite order by order in perturbation theory and
depends on the cutoff $\Lambda$.


Lattice Regularization 


We can replace the spatial continuum by a lattice $\mathfrak{L}$
of rank $d$ and define a Lagrangian on $\mathfrak{L}$ by

$$L_{\mathfrak{L}} = \frac{1}{2} \sum_{i \in S(\mathfrak{L})} \dot\phi_i^2 + J \sum_{\langle i,j \rangle \in L(\mathfrak{L})} \phi_i \phi_j + \sum_{i \in S(\mathfrak{L})} V(\phi_i) \qquad [27]$$

where $S(\mathfrak{L})$ is the set of sites $i$ of the lattice on each
of which is situated a time-dependent function $\phi_i$, and
$L(\mathfrak{L})$ is the collection of links connecting pairs $\langle i,j \rangle$ of
nearest-neighbor sites $i,j$ on $\mathfrak{L}$. The regularized field
theory is now local, but still has broken Lorentz
invariance. In particular, it suffers from broken
rotational symmetry. If $\mathfrak{L}$ is hypercubic with lattice spacing
$a$, that is, $\mathfrak{L} = (\mathbb{Z} a)^d$, then the momentum cutoff is
at $\Lambda = a^{-1}$.


Pauli-Villars Regularization 


We can replace the propagator $i (k^2 - m^2 + i\epsilon)^{-1}$ by
$i (k^2 - m^2 + i\epsilon)^{-1} + \sum_{j=1}^{N} c_j \, i (k^2 - M_j^2 + i\epsilon)^{-1}$, where
the masses $M_j \gg m$ are identified with the momentum
cutoff as $\min\{M_j\} = \Lambda \to \infty$. The mass-dependent
coefficients $c_j$ are chosen to make the modified
propagator decay rapidly as $(k^2)^{-N-1}$ at $k \to \infty$,
which gives the $N$ equations $(m^2)^i + \sum_j c_j (M_j^2)^i = 0$,
$i = 0, 1, \dots, N-1$. This regularization preserves
Lorentz invariance (and other symmetries that the
field theory may possess) and is local in the
following sense. The modified propagator can be
thought of as arising through the alteration of the
Lagrangian density [4] by $N$ additional scalar fields
$\varphi_j$ of masses $M_j$ with

$$\mathcal{L}_{PV} = \tfrac{1}{2} \partial_\mu \phi \, \partial^\mu \phi - \tfrac{1}{2} m^2 \phi^2 + \sum_{j=1}^{N} \Big( \tfrac{1}{2} \partial_\mu \varphi_j \, \partial^\mu \varphi_j - \tfrac{1}{2} M_j^2 \varphi_j^2 \Big) + \mathcal{L}_{\rm int}[\Phi] \qquad [28]$$

where $\Phi := \phi + \sum_{j=1}^{N} \sqrt{c_j} \, \varphi_j$. The contraction of the $\Phi$
field thus produces the required propagator.
However, the $c_j$'s as computed above are generically
negative numbers and so the Lagrangian
density [28] is not Hermitian (as $\Phi \neq \Phi^\dagger$). It is
possible to make [28] formally Hermitian by
redefining the inner product on the Hilbert
space of physical states, but this produces
negative-norm states. This is no problem at
energy scales $E \ll M_j$ on which the extra particles
decouple and the negative probability states are
invisible.
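
The $N$ conditions on the Pauli-Villars coefficients form a linear (Vandermonde-type) system, which can be solved exactly for given masses. The sketch below does so with rational arithmetic and Gauss-Jordan elimination; the function name and the choice to pass squared masses as arguments are illustrative assumptions, and the regulator masses are assumed to be distinct (otherwise the system is singular and the pivot search fails).

```python
from fractions import Fraction

def pauli_villars_coefficients(m2, M2):
    """Solve (m^2)^i + sum_j c_j (M_j^2)^i = 0 for i = 0, ..., N-1.
    m2 is the squared physical mass, M2 the list of squared regulator masses."""
    N = len(M2)
    # Augmented matrix of the system  sum_j c_j (M_j^2)^i = -(m^2)^i.
    rows = [[Fraction(Mj) ** i for Mj in M2] + [-(Fraction(m2) ** i)]
            for i in range(N)]
    for col in range(N):
        pivot = next(r for r in range(col, N) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [entry / scale for entry in rows[col]]
        for r in range(N):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][N] for i in range(N)]

if __name__ == "__main__":
    print(pauli_villars_coefficients(1, [4]))     # one regulator field
    print(pauli_villars_coefficients(1, [4, 9]))  # two regulator fields
```

With a single regulator the only solution is $c_1 = -1$, which is the familiar subtraction of a heavy propagator; with two regulators at $M_1^2 = 4$, $M_2^2 = 9$ and $m^2 = 1$ one finds $c_1 = -8/5$ and $c_2 = 3/5$, so at least one coefficient is negative.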


Dimensional Regularization 


Consider a Euclidean space integral $\int d^4k \, (k^2 + a^2)^{-r}$
arising after Wick rotation from some loop diagram
in (3+1)-dimensional scalar field theory. We
replace this integral by its $D$-dimensional version

$$\int \frac{d^Dk}{(k^2 + a^2)^r} = \pi^{D/2} \, (a^2)^{D/2 - r} \, \frac{\Gamma(r - D/2)}{\Gamma(r)} \qquad [29]$$

This integral is absolutely convergent for $D < 2r$.
We can analytically continue the result of this
integration to the complex plane $D \in \mathbb{C}$. As an
analytic function, the only singularities of the Euler
function $\Gamma(z)$ are poles at $z = 0, -1, -2, \dots$. In
particular, $\Gamma(z)$ has a simple pole at $z = 0$ of residue
1. If we write $D = 4 + \epsilon$ with $|\epsilon| \to 0$, then the
integral [29] is proportional to $\Gamma(r - 2 - \epsilon/2)$ and $\epsilon$
plays the role of the regulator here. This regularization
is Lorentz invariant (in $D$ dimensions) and is
distinguished as having a dimensionless regularization
parameter $\epsilon$. This parameter is related to the
momentum cutoff $\Lambda$ by $\epsilon^{-1} = \ln(\Lambda/m)$, so that the
limit $\epsilon \to 0$ corresponds to $\Lambda \to \infty$.
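
The closed form of the $D$-dimensional integral, $\pi^{D/2} (a^2)^{D/2 - r} \, \Gamma(r - D/2)/\Gamma(r)$, can be checked numerically in a convergent case. The sketch below compares it, for integer $D < 2r$, with a radial quadrature that uses the area $2\pi^{D/2}/\Gamma(D/2)$ of the unit sphere; the function names, step count and radial cutoff are arbitrary illustrative choices.

```python
import math

def closed_form(D, r, a):
    """pi^(D/2) * Gamma(r - D/2) / Gamma(r) * (a^2)^(D/2 - r), for a > 0."""
    return math.pi ** (D / 2) * math.gamma(r - D / 2) / math.gamma(r) * a ** (D - 2 * r)

def radial_quadrature(D, r, a, steps=100000, cutoff=1000.0):
    """Midpoint-rule estimate of the integral of (k^2 + a^2)^(-r) over
    the ball |k| <= cutoff in D dimensions."""
    area = 2 * math.pi ** (D / 2) / math.gamma(D / 2)  # unit (D-1)-sphere
    h = cutoff / steps
    total = 0.0
    for i in range(steps):
        k = (i + 0.5) * h
        total += k ** (D - 1) / (k * k + a * a) ** r * h
    return area * total

if __name__ == "__main__":
    print(radial_quadrature(2, 2, 1.0), closed_form(2, 2, 1.0))
    # Approaching D = 4 with r = 2 exhibits the pole of the gamma function:
    for eps in (-0.1, -0.01, -0.001):
        print(eps, closed_form(4 + eps, 2, 1.0))
```

For $D = 2$, $r = 2$, $a = 1$ both expressions are close to $\pi$; near $D = 4$ the closed form grows like $1/\epsilon$, which is the dimensionally regularized form of the logarithmic divergence.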


Infrared Divergences 


Thus far we have only considered the ultraviolet 
behavior of loop amplitudes in quantum field theory. 
When dealing with massless particles ($m = 0$ in [4])
one has to further worry about divergences arising
from the $k \to 0$ regions of Feynman integrals. After
Wick rotation to Euclidean momenta, one can show 
that no singularities arise in a given Feynman diagram 
as some of its internal masses vanish provided that all 
vertices have superficial degree of divergence d + 1, 
the external momenta are not exceptional (i.e., no 
partial sum of the incoming momenta p; vanishes), and 
there is at most one soft external momentum. This 
result assumes that renormalization has been carried 
out at some fixed Euclidean point. The extension of 
this property when the external momenta are con- 
tinued to physical on-shell values is difficult. The 
Kinoshita-Lee-Nauenberg theorem states that, as a
consequence of unitarity, transition probabilities in a 
theory involving massless particles are finite when the 
sum over all degenerate states (initial and final) is 
taken. This is true order by order in perturbation 
theory in bare quantities or if minimal subtraction 
renormalization is used (to avoid infrared or mass 
singularities in the renormalization constants). 


Fermion Fields 


We will now leave the generalities of our pure scalar 
field theory and start considering the extensions of 
our previous considerations to other types of 
particles. Henceforth we will primarily deal with 
the case of (3 + 1)-dimensional spacetime. We begin 
by indicating how the rudiments of perturbation 
theory above apply to the case of Dirac fermion
fields. The Lagrangian density is

$$\mathcal{L}_F = \bar\psi \, ( i \slashed{\partial} - m ) \, \psi + \mathcal{L}' \qquad [30]$$

where $\psi$ are four-component Dirac fermion fields in
3+1 dimensions, $\bar\psi := \psi^\dagger \gamma^0$ and $\slashed{\partial} = \gamma^\mu \partial_\mu$, with $\gamma^\mu$
the generators of the Clifford algebra $\{\gamma^\mu, \gamma^\nu\} = 2 \eta^{\mu\nu}$.
The Lagrangian density $\mathcal{L}'$ contains couplings of the
Dirac fields to other field theories, such as the scalar
field theories considered previously.

Wick’s theorem for anticommuting Fermi fields 
leads to the Pfaffian formula 

$$\langle 0|T[\psi(1)\cdots\psi(n)]|0\rangle = \mathrm{Pfaff}\big[\langle 0|T[\psi(i)\,\psi(j)]|0\rangle\big]_{1\le i,j\le n}$$   [31] 

where for compactness we have written in the 
argument of $\psi(i)$ the spacetime coordinate, the 
Dirac index, and a discrete index which distinguishes 
$\psi$ from $\bar\psi$. The nonvanishing contractions 


in [31] are determined by the free-fermion 
propagator 
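$$\Delta_F(x-y) = \langle 0|T[\psi(x)\,\bar\psi(y)]|0\rangle = \langle x|(i\slashed{\partial} - m)^{-1}|y\rangle = i\int\frac{d^4p}{(2\pi)^4}\,\frac{\slashed{p} + m}{p^2 - m^2 + i\epsilon}\,e^{-ip\cdot(x-y)}$$   [32] 
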
Perturbation theory now proceeds exactly as 
before. Suppose that the coupling Lagrangian 
density in [30] is of the form $\mathcal{L}' = \bar\psi(x)\,V(x)\,\psi(x)$. 
Both the Dyson formula [3] and the diagrammatic 
formula [15] are formally the same in this instance. 
For example, in the formal expansion in powers of 
$\int d^4x\,\mathcal{L}'$, the vacuum-to-vacuum amplitude (the 
denominator in [5]) will contain field products of 
the form 

$$\int d^4x_1\cdots d^4x_n\;\langle 0|T\Big[\prod_{j=1}^{n}\bar\psi(x_j)\,V(x_j)\,\psi(x_j)\Big]|0\rangle$$ 


which correspond to fermion loops. Before applying 
Wick’s theorem, the fields must be rearranged as 

$$\mathrm{tr}\prod_{j=1}^{n} V(x_j)\,\psi(x_j)\,\bar\psi(x_{j+1})$$ 

(with $x_{n+1} := x_1$), where tr is the $4\times 4$ trace 
over spinor indices. This reordering introduces the 
familiar minus sign for a closed fermion loop, and 
one has 

$$\int d^4x_1\cdots d^4x_n\;\langle 0|T\Big[\prod_{j=1}^{n}\bar\psi(x_j)\,V(x_j)\,\psi(x_j)\Big]|0\rangle = -\int d^4x_1\cdots d^4x_n\;\mathrm{tr}\prod_{j=1}^{n} V(x_j)\,\Delta_F(x_j - x_{j+1})$$   [33] 


Feynman rules are now described as follows. 
Fermion lines are oriented to distinguish a particle 
from its corresponding antiparticle, and carry both 
a four-momentum label $p$ as well as a spin 
polarization index $r = 1, 2$. Incoming fermions (resp. 
antifermions) are described by the wave functions 
$u_p^{(r)}$ (resp. $\bar v_p^{(r)}$), while outgoing fermions (resp. 
antifermions) are described by the wave functions 
$\bar u_p^{(r)}$ (resp. $v_p^{(r)}$). Here $u_p^{(r)}$ and $v_p^{(r)}$ are the classical 
spinors, that is, the positive- and negative-energy 
solutions of the Dirac equation $(\slashed{p} - m)\,u_p^{(r)} = (\slashed{p} + m)\,v_p^{(r)} = 0$. 
Matrices are multiplied along a Fermi 
line, with the head of the arrow on the left. Closed 
fermion loops produce an overall minus sign as in 
[33], and the multiplication rule gives the trace of 
Dirac matrices along the lines of the loop. Unpolarized 
scattering amplitudes are summed over the spins 
of final particles and averaged over the spins of initial 
particles using the completeness relations for spinors 

$$\sum_{r=1,2} u_p^{(r)}\,\bar u_p^{(r)} = \slashed{p} + m, \qquad \sum_{r=1,2} v_p^{(r)}\,\bar v_p^{(r)} = \slashed{p} - m$$   [34] 

leading to basis-independent results. Polarized 
amplitudes are computed using the spinor bilinears 
$\bar u_p^{(r)}\gamma^\mu u_p^{(s)} = \bar v_p^{(r)}\gamma^\mu v_p^{(s)} = 2p^\mu\,\delta^{rs}$, $\bar u_p^{(r)} u_p^{(s)} = -\bar v_p^{(r)} v_p^{(s)} = 2m\,\delta^{rs}$, 
and $\bar u_p^{(r)} v_p^{(s)} = 0$. 
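
As a numerical illustration of the completeness relations [34] and of the normalization of the spinor bilinears, the following Python sketch builds explicit spinors and checks both. The Dirac representation of the $\gamma$-matrices, the two-component basis `chis` and the helper names `slash`, `spinors` and `bar` are assumptions made for this example; they are not fixed by the text above.

```python
import numpy as np

# Dirac representation of the gamma matrices (an assumed, standard choice).
I2, Z2 = np.eye(2), np.zeros((2, 2))
sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
gamma0 = np.block([[I2, Z2], [Z2, -I2]]).astype(complex)
gammas = [gamma0] + [np.block([[Z2, s], [-s, Z2]]) for s in sigma]
eta = np.diag([1.0, -1.0, -1.0, -1.0])  # metric with p^2 = m^2 on shell


def slash(p):
    """Return gamma^mu p_mu for a contravariant four-vector p."""
    return sum(eta[mu, mu] * p[mu] * gammas[mu] for mu in range(4))


def spinors(p3, m):
    """Return (E, [u_1, u_2], [v_1, v_2]) normalized so that ubar u = 2m."""
    p3 = np.asarray(p3, dtype=float)
    E = np.sqrt(m**2 + p3 @ p3)
    sp = sum(p3[i] * sigma[i] for i in range(3))  # sigma . p
    chis = [np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)]
    norm = np.sqrt(E + m)
    us = [norm * np.concatenate([chi, sp @ chi / (E + m)]) for chi in chis]
    vs = [norm * np.concatenate([sp @ chi / (E + m), chi]) for chi in chis]
    return E, us, vs


def bar(w):
    """Dirac conjugate w^dagger gamma^0, returned as a row vector."""
    return w.conj() @ gamma0


m = 0.7
p3 = [0.3, -1.1, 0.5]
E, us, vs = spinors(p3, m)
p = np.array([E, *p3])

# Spin sums of [34]: outer products u ubar and v vbar summed over r = 1, 2.
sum_u = sum(np.outer(u, bar(u)) for u in us)
sum_v = sum(np.outer(v, bar(v)) for v in vs)

# Scalar bilinears ubar^(r) u^(s) and vbar^(r) v^(s) as 2x2 matrices in (r, s).
uu = np.array([[bar(a) @ b for b in us] for a in us])
vv = np.array([[bar(a) @ b for b in vs] for a in vs])
uv = np.array([[bar(a) @ b for b in vs] for a in us])
```

With these definitions `sum_u` and `sum_v` can be compared directly with `slash(p) + m` and `slash(p) - m` times the unit matrix, which is the basis-independent content of [34].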
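When calculating fermion loop integrals using 
dimensional regularization, one utilizes the Dirac 
algebra in $D$ dimensions 

$$\begin{aligned}
\gamma^\mu\gamma_\mu &= \eta^\mu{}_\mu = D \\
\gamma^\mu\,\slashed{p}\,\gamma_\mu &= (2 - D)\,\slashed{p} \\
\gamma^\mu\,\slashed{p}\,\slashed{k}\,\gamma_\mu &= 4\,p\cdot k + (D - 4)\,\slashed{p}\,\slashed{k} \\
\gamma^\mu\,\slashed{p}\,\slashed{k}\,\slashed{q}\,\gamma_\mu &= -2\,\slashed{q}\,\slashed{k}\,\slashed{p} - (D - 4)\,\slashed{p}\,\slashed{k}\,\slashed{q} \\
\mathrm{tr}\,\mathbb{1} &= 4, \qquad \mathrm{tr}\,\gamma^{\mu_1}\cdots\gamma^{\mu_{2k+1}} = 0, \qquad \mathrm{tr}\,\gamma^\mu\gamma^\nu = 4\,\eta^{\mu\nu} \\
\mathrm{tr}\,\gamma^\mu\gamma^\nu\gamma^\rho\gamma^\sigma &= 4\,\big(\eta^{\mu\nu}\eta^{\rho\sigma} - \eta^{\mu\rho}\eta^{\nu\sigma} + \eta^{\mu\sigma}\eta^{\nu\rho}\big)
\end{aligned}$$   [35] 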
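
The identities [35] can be spot-checked at $D = 4$ with explicit matrices. The sketch below is only such a check, not a proof for general $D$: the Dirac representation, the sample momenta and the helper names (`slash`, `sandwich`, `checks`) are assumptions introduced for the illustration.

```python
import numpy as np
from itertools import product

# Dirac representation (assumed); any representation of the Clifford algebra would do.
I2, Z2 = np.eye(2), np.zeros((2, 2))
sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
gammas = [np.block([[I2, Z2], [Z2, -I2]]).astype(complex)]
gammas += [np.block([[Z2, s], [-s, Z2]]) for s in sigma]
eta = np.diag([1.0, -1.0, -1.0, -1.0])
I4 = np.eye(4)
D = 4


def dot(a, b):
    """Minkowski scalar product a . b."""
    return float(a @ eta @ b)


def slash(p):
    """gamma^mu p_mu for a contravariant four-vector p."""
    return sum(eta[mu, mu] * p[mu] * gammas[mu] for mu in range(4))


def sandwich(M):
    """gamma^mu M gamma_mu with the index mu summed using the metric."""
    return sum(eta[mu, mu] * gammas[mu] @ M @ gammas[mu] for mu in range(4))


p = np.array([1.3, 0.2, -0.7, 0.4])
k = np.array([0.5, -1.1, 0.3, 0.9])
q = np.array([2.0, 0.6, 0.1, -0.8])
ps, ks, qs = slash(p), slash(k), slash(q)
idx = range(4)

checks = {
    "clifford": all(
        np.allclose(gammas[a] @ gammas[b] + gammas[b] @ gammas[a], 2 * eta[a, b] * I4)
        for a, b in product(idx, repeat=2)
    ),
    "contract_0": np.allclose(sandwich(I4), D * I4),
    "contract_1": np.allclose(sandwich(ps), (2 - D) * ps),
    "contract_2": np.allclose(sandwich(ps @ ks), 4 * dot(p, k) * I4 + (D - 4) * ps @ ks),
    "contract_3": np.allclose(
        sandwich(ps @ ks @ qs), -2 * qs @ ks @ ps - (D - 4) * ps @ ks @ qs
    ),
    "trace_2": all(
        np.isclose(np.trace(gammas[a] @ gammas[b]), 4 * eta[a, b])
        for a, b in product(idx, repeat=2)
    ),
    "trace_3": all(
        np.isclose(np.trace(gammas[a] @ gammas[b] @ gammas[c]), 0)
        for a, b, c in product(idx, repeat=3)
    ),
    "trace_4": all(
        np.isclose(
            np.trace(gammas[a] @ gammas[b] @ gammas[c] @ gammas[d]),
            4 * (eta[a, b] * eta[c, d] - eta[a, c] * eta[b, d] + eta[a, d] * eta[b, c]),
        )
        for a, b, c, d in product(idx, repeat=4)
    ),
}
```

Each entry of `checks` is a Boolean, so a failed identity can be located by name.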


Specific to $D = 4$ dimensions are the trace identities 
where $\gamma^5 := i\gamma^0\gamma^1\gamma^2\gamma^3$. Finally, loop diagrams evaluated 
with the fermion propagator [32] require a 
generalization of the momentum space integral [29] 
given by 

$$\int\frac{d^Dk}{(2\pi)^D}\,\frac{1}{(k^2 + 2k\cdot p + a^2 + i\epsilon)^r} = \frac{(-1)^{D/2}\,i\,\pi^{D/2}\,\Gamma\big(r - \tfrac{D}{2}\big)}{(2\pi)^D\,(r-1)!}\,\frac{1}{(a^2 - p^2 + i\epsilon)^{r - D/2}}$$   [37] 


From this formula we can extract expressions for 
more complicated Feynman integrals which are 
tensorial, that is, which contain products of 
momentum components $k^\mu$ in the numerators of 
their integrands, by differentiating [37] with respect 
to the external momentum $p^\mu$. 
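
The structure of [37] is easiest to test after Wick rotation, where the integral becomes a convergent Euclidean one. The sketch below assumes the standard Euclidean counterpart $\int d^Dk_E/(2\pi)^D\,(k_E^2 + \Delta)^{-r} = \Gamma(r - D/2)\,\Delta^{D/2-r}/((4\pi)^{D/2}\,\Gamma(r))$ with $\Delta > 0$, and compares it with a direct radial quadrature. The function names and the midpoint rule are choices made for this illustration only.

```python
import math


def closed_form(D, r, delta):
    """Gamma-function expression for the Euclidean integral of (k^2 + delta)^(-r)."""
    return (
        math.gamma(r - D / 2)
        / ((4 * math.pi) ** (D / 2) * math.gamma(r))
        * delta ** (D / 2 - r)
    )


def numeric(D, r, delta, n=100_000):
    """Radial midpoint quadrature of the same integral, mapping k = t / (1 - t)."""
    solid_angle = 2 * math.pi ** (D / 2) / math.gamma(D / 2)
    h = 1.0 / n
    total = 0.0
    for j in range(n):
        t = (j + 0.5) * h
        k = t / (1.0 - t)
        jacobian = 1.0 / (1.0 - t) ** 2  # dk/dt
        total += k ** (D - 1) / (k * k + delta) ** r * jacobian
    return solid_angle * total * h / (2 * math.pi) ** D


# Two convergent sample cases: (D, r, delta).
cases = [(3, 2, 1.7), (2, 2, 0.9)]
results = [(closed_form(*c), numeric(*c)) for c in cases]
```

Shifting $k \to k - p$ in the original integrand is what produces the combination $a^2 - p^2$ in [37]; the Euclidean check above only probes the dependence on that single scale.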


Gauge Fields 


The issues we have dealt with thus far acquire 
interesting new difficulties when dealing with gauge 
fields. We will now discuss some general aspects of 
the perturbation expansion of gauge theories using 
as prototypical examples quantum electrodynamics 
(QED) and quantum chromodynamics (QCD) in 
four spacetime dimensions. 


Quantum Electrodynamics 
Consider the QED Lagrangian density 
$$\mathcal{L}_{\rm QED} = -\tfrac14\,F_{\mu\nu}F^{\mu\nu} + \tfrac12\,\mu^2 A_\mu A^\mu + \bar\psi\,(i\slashed{\partial} - e\slashed{A} - m)\,\psi$$   [38] 


where $A_\mu$ is a U(1) gauge field in 3 + 1 dimensions 
and $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ is its field strength tensor. 
We have added a small mass term $\mu \to 0$ for 
the gauge field, which at the end of calculations 
should be taken to vanish in order to describe 
real photons (as opposed to the soft photons 
described by [38]). This is done in order to cure 
the infrared divergences generated in scattering 
amplitudes due to the masslessness of the photon, 
that is, the long-range nature of the electromag- 
netic interaction. The Bloch-Nordsieck theorem 
in QED states that infrared divergences cancel 
for physical processes, that is, for processes 
with an arbitrary number of undetectable soft 
photons. 

Perturbation theory proceeds in the usual way 
via the Dyson formula, Wick’s theorem, and 
Feynman diagrams. The gauge field propagator is 
given by 

$$\langle 0|T[A_\mu(x)A_\nu(y)]|0\rangle = \langle x|\big[\eta_{\mu\nu}(\Box + \mu^2) - \partial_\mu\partial_\nu\big]^{-1}|y\rangle = i\int\frac{d^4p}{(2\pi)^4}\,\frac{-\eta_{\mu\nu} + p_\mu p_\nu/\mu^2}{p^2 - \mu^2 + i\epsilon}\,e^{-ip\cdot(x-y)}$$   [39] 

and is represented by a wavy line. The fermion-fermion-photon 
vertex is 

[Diagram: a wavy photon line carrying the Lorentz index $\mu$ attached to an oriented fermion line]   [40] 





An incoming (resp. outgoing) soft photon of 
momentum $k$ and polarization $r$ is described by the 
wave function $e_\mu^{(r)}(k)$ (resp. $e_\mu^{(r)}(k)^*$), where the 
polarization vectors $e_\mu^{(r)}(k)$, $r = 1, 2, 3$, solve the vector 
field wave equation $(\Box + \mu^2)A_\mu = 0$, $\partial_\mu A^\mu = 0$ and 
obey the orthonormality and completeness 
conditions 

$$e^{(r)}(k)^* \cdot e^{(s)}(k) = -\delta^{rs}, \qquad \sum_{r=1}^{3} e_\mu^{(r)}(k)\,e_\nu^{(r)}(k)^* = -\eta_{\mu\nu} + \frac{k_\mu k_\nu}{\mu^2}$$   [41] 

along with $k \cdot e^{(r)}(k) = 0$. All vector indices are 
contracted along the lines of the Feynman graph. 
All other Feynman rules are as before. 
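
The conditions [41] can be made concrete for a massive vector moving along the third axis. In the sketch below the choice of frame, the linear (rather than circular) transverse basis and the function name `polarizations` are assumptions made for the example.

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])


def polarizations(kz, mu):
    """Momentum k = (E, 0, 0, kz) of mass mu and three polarization vectors e^(r)."""
    E = np.hypot(kz, mu)
    k = np.array([E, 0.0, 0.0, kz])
    e1 = np.array([0, 1, 0, 0], dtype=complex)  # transverse
    e2 = np.array([0, 0, 1, 0], dtype=complex)  # transverse
    e3 = np.array([kz, 0, 0, E], dtype=complex) / mu  # longitudinal
    return k, [e1, e2, e3]


mu = 0.4
k, es = polarizations(1.5, mu)

# Orthonormality: e^(r)* . e^(s) with the Minkowski metric.
gram = np.array([[np.conj(a) @ eta @ b for b in es] for a in es])

# Completeness: sum over r of e_mu e_nu^* with lower indices.
lowered = [eta @ e for e in es]
k_low = eta @ k
pol_sum = sum(np.outer(e, np.conj(e)) for e in lowered)
expected = -eta + np.outer(k_low, k_low) / mu**2

transversality = [k @ eta @ e for e in es]
```

The longitudinal vector grows like $E/\mu$, which is the origin of the $k_\mu k_\nu/\mu^2$ term and the reason the limit $\mu \to 0$ has to be taken with care.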


Quantum Chromodynamics 


Consider nonabelian gauge theory in 3 + 1 dimensions 
minimally coupled to a set of fermion fields 
$\psi^A$, $A = 1, \dots, N_f$, each transforming in the fundamental 
representation of the gauge group $G$ whose 
generators $T^a$ satisfy the commutation relations 
$[T^a, T^b] = f^{abc}\,T^c$. The Lagrangian density is given by 

$$\mathcal{L}_{\rm QCD} = -\tfrac14\,F^a_{\mu\nu}F^{a\,\mu\nu} + \frac{1}{2\alpha}\big(\partial_\mu A^{a\,\mu}\big)^2 + \partial_\mu\bar\eta\,D^\mu\eta + \sum_{A=1}^{N_f}\bar\psi^A\,(i\slashed{D} - m_A)\,\psi^A$$   [42] 

where $F^a_{\mu\nu} = \partial_\mu A^a_\nu - \partial_\nu A^a_\mu + f^{abc}A^b_\mu A^c_\nu$ and $D_\mu = \partial_\mu + ie\,R(T^a)\,A^a_\mu$, 
with $R$ the pertinent representation of $G$ 
($R(T^a)_{bc} = f^{abc}$ for the adjoint representation and 
$R(T^a) = T^a$ for the fundamental representation). 
The first term is the Yang-Mills Lagrangian density, 
the second term is the covariant gauge-fixing term, 
and the third term contains the Faddeev-Popov 
ghost fields $\eta$ which transform in the adjoint 
representation of the gauge group. 

Feynman rules are straightforward to write 
down and are given in Figure 1 where wavy lines 
represent gluons and dashed lines represent ghosts. 
Feynman rules for the fermions are exactly as 
before, except that now the vertex [40] is multiplied 
by the color matrix $T^a$. All color indices are 
contracted along the lines of the Feynman graph. 
Color factors may be simplified by using the 
identities 

$$\mathrm{Tr}\,R^aR^b = \frac{\dim R}{\dim G}\,C_2(R)\,\delta^{ab}, \qquad R^aR^a = C_2(R), \qquad R^aR^bR^a = \big(C_2(R) - \tfrac12\,C_2(G)\big)\,R^b$$ 

where $R^a := R(T^a)$ and $C_2(R)$ is the quadratic 
Casimir invariant of the representation $R$ (with 
value $C_2(G)$ in the adjoint representation). For 
$G = SU(N)$, one has $C_2(G) = N$ and $C_2(N) = (N^2 - 1)/2N$ 
for the fundamental representation. 

Figure 1 Feynman rules. 
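
The quoted Casimir values can be verified with explicit matrices. The sketch below uses Hermitian generalized Gell-Mann matrices normalized to $\mathrm{Tr}\,T^aT^b = \tfrac12\delta^{ab}$, with real structure constants defined by $[T^a, T^b] = i f^{abc} T^c$. This differs by a factor of $i$ from the convention written above and is an assumption of the example, as are the function names.

```python
import numpy as np
from itertools import combinations


def su_n_generators(N):
    """Hermitian SU(N) generators in the fundamental representation, Tr T^a T^b = delta/2."""
    gens = []
    for j, k in combinations(range(N), 2):
        sym = np.zeros((N, N), dtype=complex)
        sym[j, k] = sym[k, j] = 0.5
        asym = np.zeros((N, N), dtype=complex)
        asym[j, k], asym[k, j] = -0.5j, 0.5j
        gens += [sym, asym]
    for l in range(1, N):
        d = np.zeros(N)
        d[:l] = 1.0
        d[l] = -l
        gens.append(np.diag(d).astype(complex) / np.sqrt(2 * l * (l + 1)))
    return gens


def structure_constants(T):
    """Real f^{abc} from [T^a, T^b] = i f^{abc} T^c, using Tr T^a T^b = delta/2."""
    n = len(T)
    f = np.zeros((n, n, n))
    for a in range(n):
        for b in range(n):
            comm = T[a] @ T[b] - T[b] @ T[a]
            for c in range(n):
                f[a, b, c] = (-2j * np.trace(comm @ T[c])).real
    return f


def casimir_checks(N):
    """Check the three color identities for the fundamental representation of SU(N)."""
    T = su_n_generators(N)
    f = structure_constants(T)
    dim_g = len(T)
    c_fund = (N * N - 1) / (2 * N)
    trace_ok = all(
        np.isclose(np.trace(T[a] @ T[b]), (N / dim_g) * c_fund * (a == b))
        for a in range(dim_g)
        for b in range(dim_g)
    )
    fund_ok = np.allclose(sum(t @ t for t in T), c_fund * np.eye(N))
    adj_ok = np.allclose(np.einsum("acd,bcd->ab", f, f), N * np.eye(dim_g))
    triple_ok = all(
        np.allclose(sum(t @ tb @ t for t in T), (c_fund - N / 2) * tb) for tb in T
    )
    return dim_g, trace_ok, fund_ok, adj_ok, triple_ok


su2 = casimir_checks(2)
su3 = casimir_checks(3)
```

For $SU(3)$ this reproduces the familiar values $C_2(N) = 4/3$ and $C_2(G) = 3$.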

The cancellation of infrared divergences in loop 
amplitudes of QCD is far more delicate than in 
QED, as there is no analog of the Bloch- 
Nordsieck theorem in this case. The Kinoshita- 
Lee-Nauenberg theorem guarantees that, at the 
end of any perturbative calculation, these divergences 
must cancel for any appropriately defined 
physical quantity. However, at a given order of 
perturbation theory, a physical quantity typically 
involves both virtual and real emission contribu- 
tions that are separately infrared divergent. 
Already at two-loop level these divergences have 
a highly intricate structure. Their precise form is 
specified by the Catani color-space factorization 
formula, which also provides an efficient way of 
organizing amplitudes into divergent parts, which 
ultimately drop out of physical quantities, and 
finite contributions. 

The computation of multigluon amplitudes in 
nonabelian gauge theory is rather complicated 
when one uses polarization states of vector bosons. 
A much more efficient representation of amplitudes 
is provided by adopting a helicity (or circular 
polarization) basis for external gluons. In the 
spinor-helicity formalism, one expresses positive- 
and negative-helicity polarization vectors in terms 
of massless Weyl spinors $|k^\pm\rangle := \tfrac12(1 \pm \gamma_5)\,u_k = \tfrac12(1 \pm \gamma_5)\,v_k$ 
through 

$$e^\pm_\mu(k; q) = \pm\frac{\langle q^\mp|\gamma_\mu|k^\mp\rangle}{\sqrt{2}\,\langle q^\mp|k^\pm\rangle}$$   [44] 


where $q$ is an arbitrary null reference momentum 
which drops out of the final gauge-invariant 
amplitudes. The spinor products are crossing symmetric, 
antisymmetric in their arguments, and satisfy 
the identities 

$$\langle k_i^-|k_j^+\rangle\,\langle k_j^+|k_i^-\rangle = 2\,k_i \cdot k_j$$ 
Any amplitude with massless external fermions 
and vector bosons can be expressed in terms of 
spinor products. Conversely, the spinor products 
offer the most compact representation of helicity 
amplitudes which can be related to more conventional 
amplitudes described in terms of Lorentz 
invariants. For loop amplitudes, one uses a 
dimensional regularization scheme in which all 
helicity states are kept four dimensional and only 
internal loop momenta are continued to $D = 4 + \epsilon$ 
dimensions. 
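
A quick numerical check shows how spinor products encode Lorentz invariants. The sketch below assumes one common explicit representation, $\langle ij\rangle = \sqrt{k_i^- k_j^+}\,e^{i\varphi_i} - \sqrt{k_i^+ k_j^-}\,e^{i\varphi_j}$ with $k^\pm = k^0 \pm k^3$ and $e^{i\varphi} = (k^1 + ik^2)/\sqrt{k^+k^-}$, valid for positive-energy momenta with nonzero transverse part. Phase conventions vary between references, and the function names are hypothetical.

```python
import math


def massless(kx, ky, kz):
    """Massless four-momentum (E, kx, ky, kz) with positive energy E = |k|."""
    return (math.sqrt(kx * kx + ky * ky + kz * kz), kx, ky, kz)


def minkowski_dot(a, b):
    """Scalar product with signature (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]


def light_cone(k):
    """Return k^+ = E + kz, k^- = E - kz and the transverse phase of k."""
    plus, minus = k[0] + k[3], k[0] - k[3]
    phase = complex(k[1], k[2]) / math.sqrt(plus * minus)
    return plus, minus, phase


def spinor_product(ki, kj):
    """Spinor product <ij> for generic massless momenta (assumed phase convention)."""
    plus_i, minus_i, phase_i = light_cone(ki)
    plus_j, minus_j, phase_j = light_cone(kj)
    return math.sqrt(minus_i * plus_j) * phase_i - math.sqrt(plus_i * minus_j) * phase_j


k1 = massless(0.3, -0.4, 1.2)
k2 = massless(-0.8, 0.5, 0.1)

modulus_squared = abs(spinor_product(k1, k2)) ** 2
invariant = 2.0 * minkowski_dot(k1, k2)
antisymmetry = spinor_product(k1, k2) + spinor_product(k2, k1)
```

The modulus of a spinor product is thus fixed by $2\,k_i \cdot k_j$, while its phase carries the additional helicity information that makes amplitudes compact.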


Computing Loop Integrals 


At the very heart of perturbative quantum field 
theory is the problem of computing Feynman 
integrals for multiloop scattering amplitudes. The 
integrations typically involve serious technical chal- 
lenges and for the most part are intractable by 
straightforward analytical means. We will now 
survey some of the computational techniques that 
have been developed for calculating quantum loop 
amplitudes which arise in the field theories consid- 
ered previously. 


Asymptotic Expansion 


In many physical instances one is interested in 
scattering amplitudes in certain kinematical limits. In 
this case one may perform an asymptotic expansion of 
multiloop diagrams whose coefficients are typically 
nonanalytic functions of the perturbative expansion 
parameter $\hbar$. The main simplification which arises 
comes from the fact that the expansions are done 
before any momentum integrals are evaluated. In the 
limits of interest, Taylor series expansions in different 
selected regions of each loop momentum can be 
interpreted in terms of subgraphs and co-subgraphs 
of the original Feynman diagram. 

Consider a Feynman diagram $D$ which depends on 
a collection $\{Q_i\}$ of large momenta (or masses), and 
a collection $\{m_i, q_i\}$ of small masses and momenta. 
The prescription for the large-momentum 
asymptotic expansion of $D$ may be summarized in 
the diagrammatic formula 

$$D(Q; m, q) \sim \sum_{d \subset D} (D/d)(m, q) \star \big(\mathcal{T}_{\{m_d, q_d\}}\,d\big)(Q; m_d, q_d)$$   [46] 

where the sum runs through all subgraphs $d$ of $D$ 
which contain all vertices where a large momentum 
enters or leaves the graph and are one-particle irreducible 
after identifying these vertices. The operator 
$\mathcal{T}_{\{m, q\}}$ performs a Taylor series expansion before any 
integration is carried out, and the notation $(D/d) \star 
(\mathcal{T}_{\{m_d, q_d\}}\,d)$ indicates that the subgraph $d \subset D$ is 
replaced by its Taylor expansion in all masses and 
external momenta of $d$ that do not belong to the set 
$\{Q_i\}$. The external momenta of $d$ which become loop 
momenta in $D$ are also considered to be small. The 
loop integrations are then performed only after all 
these expansions have been carried out. The diagrams 
$D/d$ are called co-subgraphs. 

The subgraphs become massless integrals in which 
the scales are set by the large momenta. For instance, 
in the simplest case of a single large momentum $Q$ one 
is left with integrals over propagators. The co-subgraphs 
may contain small external momenta and 
masses, but the resulting integrals are typically much 
simpler than the original one. A similar formula is true 
for large-mass expansions, with the vertex conditions 
on subdiagrams replaced by propagator conditions. For 
example, consider the asymptotic expansion of the 
two-loop double bubble diagram (Figure 2) in the 
region $q^2 \ll m^2$, where $m$ is the mass of the inner loop. 
The subgraphs (to the right of the stars) are expanded 
in all external momenta including $q$ and reinserted into 
the fat vertices of the co-subgraphs (to the left of the 
stars). Once such asymptotic expansions are carried 
out, one may attempt to reconstruct as much information 
as possible about the given scattering amplitude 
by using the method of Padé approximation which 
requires knowledge of only part of the expansion of 
the diagram. By construction, the Padé approximation 
has the same analytic properties as the exact 
amplitude. 

Figure 2 Asymptotic expansion of the two-loop double bubble 
diagram. 
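
How a Padé approximant extends the reach of a few expansion coefficients can be seen on a toy function. The sketch below is not a Feynman diagram calculation: the test function $\ln(1+x)/x$, the restriction to the $[1/1]$ approximant and the function name `pade_1_1` are assumptions chosen to keep the example short.

```python
import math
from fractions import Fraction


def pade_1_1(c0, c1, c2):
    """[1/1] Pade coefficients (a0, a1, b1) of (a0 + a1 x)/(1 + b1 x).

    They are fixed by matching the Taylor series c0 + c1 x + c2 x^2 through order x^2.
    """
    b1 = -c2 / c1
    a0 = c0
    a1 = c1 + b1 * c0
    return a0, a1, b1


# Taylor coefficients of ln(1 + x)/x = 1 - x/2 + x^2/3 - ...
c = (Fraction(1), Fraction(-1, 2), Fraction(1, 3))
a0, a1, b1 = pade_1_1(*c)

x = Fraction(1)  # a point on the boundary of the disc of convergence
pade_value = (a0 + a1 * x) / (1 + b1 * x)
taylor_value = c[0] + c[1] * x + c[2] * x**2
exact_value = math.log(2.0)

pade_error = abs(float(pade_value) - exact_value)
taylor_error = abs(float(taylor_value) - exact_value)
```

With the same three coefficients the rational form is considerably closer to the exact value at $x = 1$ than the truncated series, which is the practical motivation for combining asymptotic expansions with Padé approximation.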


Brown-Feynman Reduction 


When considering loop diagrams which involve 
fermions or gauge bosons, one encounters tensorial 
Feynman integrals. When these involve more than 
three distinct denominator factors (propagators), 
they require more than two Feynman parameters 
for their evaluation and become increasingly 
complicated. The Brown—Feynman method simpli- 
fies such higher-rank integrals and effectively 
reduces them to scalar integrals which typically 
require fewer Feynman parameters for their 
evaluation. 

To illustrate the idea behind this method, consider 
the one-loop rank-3 tensor Feynman integral 

$$J^{\mu\nu\lambda} = \int\frac{d^Dk}{(2\pi)^D}\,\frac{k^\mu k^\nu k^\lambda}{k^2\,(k - q)^2\,(k^2 + 2k\cdot p)}$$   [47] 

where $p$ and $q$ are external momenta with the mass-shell 
conditions $p^2 = (p - q)^2 = m^2$. By Lorentz invariance, 
the general structure of the integral [47] will 
be of the form 

$$J^{\mu\nu\lambda} = a^{\mu\nu}p^\lambda + b^{\mu\nu}q^\lambda + c^\mu s^{\nu\lambda} + c^\nu s^{\mu\lambda}$$   [48] 


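where $a^{\mu\nu}$, $b^{\mu\nu}$ are tensor-valued functions and 
$c^\mu$ a vector-valued function of $p$ and $q$. The 
symmetric tensor $s^{\mu\nu}$ is chosen to project out 
components of vectors transverse to both $p$ and $q$, 
i.e., $p_\mu s^{\mu\nu} = q_\mu s^{\mu\nu} = 0$, with the normalization 
$s_\mu{}^\mu = D - 2$. Solving these constraints leads to the 
explicit form 

$$s^{\mu\nu} = \eta^{\mu\nu} - \frac{m^2\,q^\mu q^\nu + q^2\,p^\mu p^\nu - (p\cdot q)\,(q^\mu p^\nu + p^\mu q^\nu)}{m^2 q^2 - (p\cdot q)^2}$$   [49] 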
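
The defining properties of the transverse projector can be confirmed numerically in $D = 4$. In the sketch below the sample momenta are arbitrary (the identities do not rely on the mass-shell conditions, so $m^2$ is simply read off as $p^2$), and the function name `transverse_projector` is an assumption of the example.

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])
D = 4


def dot(a, b):
    """Minkowski scalar product a . b."""
    return float(a @ eta @ b)


def transverse_projector(p, q):
    """Tensor s^{mu nu} (upper indices) orthogonal to both p and q."""
    m2, q2, pq = dot(p, p), dot(q, q), dot(p, q)
    numerator = (
        m2 * np.outer(q, q)
        + q2 * np.outer(p, p)
        - pq * (np.outer(q, p) + np.outer(p, q))
    )
    return eta - numerator / (m2 * q2 - pq**2)


p = np.array([2.0, 0.3, -0.5, 1.0])
q = np.array([0.7, -0.2, 0.9, 0.4])
s = transverse_projector(p, q)

p_contraction = (eta @ p) @ s  # p_mu s^{mu nu}
q_contraction = (eta @ q) @ s  # q_mu s^{mu nu}
trace = float(np.trace(eta @ s))  # s_mu^mu
```

Because $s^{\mu\nu}$ annihilates both $p$ and $q$, contracting a decomposition of the type [48] with either momentum isolates the coefficient tensors multiplying $p^\lambda$ and $q^\lambda$.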
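To determine the as yet unknown functions 
$a^{\mu\nu}$, $b^{\mu\nu}$ and $c^\mu$ above, we first contract both sides 
of the decomposition [48] with $p_\lambda$ and $q_\lambda$ to get 

$$\begin{aligned}
2p_\lambda J^{\mu\nu\lambda} &= 2m^2\,a^{\mu\nu} + 2(p\cdot q)\,b^{\mu\nu} \\
2q_\lambda J^{\mu\nu\lambda} &= 2(p\cdot q)\,a^{\mu\nu} + 2q^2\,b^{\mu\nu}
\end{aligned}$$   [50] 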


Inside the integrand of [47], we then use the trivial 
identities 

$$2k\cdot p = (k^2 + 2k\cdot p) - k^2, \qquad 2k\cdot q = k^2 + q^2 - (k - q)^2$$   [51] 

to write the left-hand sides of [50] as the sum of 
rank-2 Feynman integrals which, with the exception 
of the one multiplied by $q^2$ from [51], have one less 
denominator factor. This formally determines the 
coefficients $a^{\mu\nu}$ and $b^{\mu\nu}$ in terms of a set of rank-2 
integrations. The vector function $c^\mu$ is then found 
from the contraction 

$$J^{\mu\nu}{}_{\nu} = p_\nu a^{\mu\nu} + q_\nu b^{\mu\nu} + (D - 2)\,c^\mu$$   [52] 


This contraction eliminates the $k^2$ denominator 
factor in the integrand of [47] and produces a 
vector-valued integral. Solving the system of 
algebraic equations [50] and [52] then formally 
determines the rank-3 Feynman integral [47] in 
terms of rank-1 and rank-2 Feynman integrals. The 
rank-2 Feynman integrals thus generated can then 
be evaluated in the same way by writing a 
decomposition for them analogous to [48] and 
solving for them in terms of vector-valued and 
scalar-valued Feynman integrals. Finally, the rank-1 
integrations can be solved for in terms of a set of 
scalar-valued integrals, most of which have fewer 
denominator factors in their integrands. 

Generally, any one-loop amplitude can be reduced 
to a set of basic integrals by using the Passarino- 
Veltman reduction technique. For example, in 
supersymmetric amplitudes of gluons any tensor 
Feynman integral can be reduced to a set of scalar 
integrals, that is, Feynman integrals in a scalar field 
theory with a massless particle circulating in the 
loop, with rational coefficients. In the case of N = 4 
supersymmetric Yang-Mills theory, only scalar box 
integrals appear. 


Reduction to Master Integrals 


While the Brown—Feynman and Passarino—Veltman 
reductions are well suited for dealing with one-loop 
diagrams, they become rather cumbersome for 
higher-loop computations. There are other more 
powerful methods for reducing general tensor 
integrals into a basis of known integrals called 
master integrals. Let us illustrate this technique on a 
scalar example. Any scalar massless two-loop Feynman 
integral can be brought into the form 

$$I(p_1, \dots, p_n) = \int\frac{d^Dk}{(2\pi)^D}\int\frac{d^Dk'}{(2\pi)^D}\;\frac{\Sigma_1^{n_1}\cdots\Sigma_q^{n_q}}{\Delta_1^{l_1}\cdots\Delta_t^{l_t}}$$   [53] 

where $\Delta_j$ are massless scalar propagators depending 
on the loop momenta $k, k'$ and the external 
momenta $p_1, \dots, p_n$, and $\Sigma_j$ are scalar products of 
a loop momentum with an external momentum or 
of the two loop momenta. The topology of the 
corresponding Feynman diagram is uniquely determined 
by specifying the set $\Delta_1, \dots, \Delta_t$ of $t$ distinct 
propagators in the graph, while the integral itself is 
specified by the powers $l_j \ge 1$ of all propagators, by 
the selection $\Sigma_1, \dots, \Sigma_q$ of $q$ scalar products and by 
their powers $n_j \ge 0$. 

The integrals in a class of diagrams of the same 
topology with the same denominator dimension 
$r = \sum_j l_j$ and same total scalar product number 
$s = \sum_j n_j$ are related by various identities. One 
class follows from the fact that the integral over a 
total derivative with respect to any loop momentum 
vanishes in dimensional regularization as 

$$\int\frac{d^Dk}{(2\pi)^D}\,\frac{\partial J(k)}{\partial k^\mu} = 0$$ 

where $J(k)$ is any tensorial combination of propagators, 
scalar products and loop momenta. The 
resulting relations are called integration-by-parts 
identities and for two-loop integrals can be cast 
into the form 

$$\int\frac{d^Dk}{(2\pi)^D}\int\frac{d^Dk'}{(2\pi)^D}\;\frac{\partial}{\partial k^\mu}\big[v^\mu f(k, k', p)\big] = 0 = \int\frac{d^Dk}{(2\pi)^D}\int\frac{d^Dk'}{(2\pi)^D}\;\frac{\partial}{\partial k'^\mu}\big[v^\mu f(k, k', p)\big]$$ 

where $f(k, k', p)$ is a scalar function containing 
propagators and scalar products, and $v^\mu$ is any 
internal or external momentum. For a graph with $\ell$ 
loops and $m$ independent external momenta, this 
results in a total of $\ell(m + \ell)$ relations. 
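
The simplest instance of an integration-by-parts identity is the one-loop massive tadpole, used here as an illustrative example that is not taken from the text above. In Euclidean signature, with $I(a) := \int d^Dk\,(k^2 + m^2)^{-a}$, the vanishing of $\int d^Dk\,\partial_\mu[k^\mu (k^2 + m^2)^{-a}]$ gives $D\,I(a) - 2a\,[I(a) - m^2 I(a+1)] = 0$, that is, $I(a+1) = \frac{a - D/2}{a\,m^2}\,I(a)$. The sketch checks this recursion against the known closed form at a non-integer dimension; the function names are hypothetical.

```python
import math


def tadpole(a, D, m2):
    """Closed form of the Euclidean integral of d^Dk / (k^2 + m2)^a."""
    return math.pi ** (D / 2) * math.gamma(a - D / 2) / math.gamma(a) * m2 ** (D / 2 - a)


def ibp_raise(a, D, m2, value_at_a):
    """Integration-by-parts recursion I(a + 1) = (a - D/2) / (a m2) * I(a)."""
    return (a - D / 2) / (a * m2) * value_at_a


def reduce_to_master(a, D, m2):
    """Express I(a) for integer a >= 1 through the single master integral I(1)."""
    value = tadpole(1, D, m2)  # master integral
    for power in range(1, a):
        value = ibp_raise(power, D, m2, value)
    return value


D, m2 = 3.4, 1.3  # non-integer dimension, as in dimensional regularization
direct = [tadpole(a, D, m2) for a in range(1, 6)]
reduced = [reduce_to_master(a, D, m2) for a in range(1, 6)]
```

Every integral of this one-parameter family is reduced to the single master integral $I(1)$ by purely algebraic steps, which is the pattern that the multiloop reductions follow on a much larger scale.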

In addition to these identities, one can also exploit 
the fact that all Feynman integrals [53] are Lorentz 
scalars. Under an infinitesimal Lorentz transformation 
$p^\mu \to p^\mu + \delta p^\mu$, with $\delta p^\mu = p^\nu\,\delta\epsilon^\mu{}_\nu$, $\delta\epsilon^\mu{}_\nu = -\delta\epsilon_\nu{}^\mu$, one has 
the invariance condition $I(p + \delta p) = I(p)$, which leads 
to the linear homogeneous differential equations 

$$\sum_{i=1}^{n}\Big(p_i^\nu\,\frac{\partial}{\partial p_{i\mu}} - p_i^\mu\,\frac{\partial}{\partial p_{i\nu}}\Big)\,I(p) = 0$$ 

This equation can be contracted with all possible 
antisymmetric combinations of $p_{i\mu}\,p_{j\nu}$ to yield 
linearly independent Lorentz invariance identities 
for [53]. 

Using these two sets of identities, one can either 
obtain a reduction of integrals of the type [53] 
to those corresponding to a small number of simpler 
diagrams of the same topology and diagrams of 
simpler topology (fewer denominator factors), or 
a complete reduction to diagrams with simpler 
topology. The remaining integrals of the topology 
under consideration are called irreducible master 
integrals. These momentum integrals cannot be 
further reduced and have to be computed by different 
techniques. For instance, one can apply a Mellin-Barnes 
transformation of all propagators given by 

$$\frac{1}{(k^2 + a)^l} = \frac{1}{\Gamma(l)}\int_{-i\infty}^{i\infty}\frac{dz}{2\pi i}\;\frac{a^z}{(k^2)^{l+z}}\;\Gamma(l + z)\,\Gamma(-z)$$ 

where the contour of integration is chosen to lie to the 
right of the poles of the Euler gamma function $\Gamma(l + z)$ and to 
the left of the poles of $\Gamma(-z)$ in the complex $z$-plane. 
Alternatively, one may apply the negative-dimension 
method in which D is regarded as a negative integer in 
intermediate calculations and the problem of loop 
integration is replaced with that of handling infinite 
series. When combined with the above methods, it may 
be used to derive powerful recursion relations among 
scattering amplitudes. Both of these techniques rely on 
an explicit integration over the loop momenta of the 
graph, their differences occurring mainly in the repre- 
sentations used for the propagators. 

The procedure outlined above can also be used to 
reduce a tensor Feynman integral to scalar integrals, as 
in the Brown—Feynman and Passarino—Veltman reduc- 
tions. The tensor integrals are expressed as linear 
combinations of scalar integrals of either higher 
dimension or with propagators raised to higher 
powers. The projection onto a tensor basis takes the 
form [53] and can thus be reduced to master integrals. 


String Theory Methods 


The realization of field theories as the low-energy 
limits of string theory provides a number of powerful 
tools for the calculation of multiloop amplitudes. 
They may be used to provide sets of diagrammatic 
computational rules, and they also work well for 
calculations in quantum gravity. In this final part we 
shall briefly sketch the insights into perturbative 
quantum field theory that are provided by tech- 
niques borrowed from string theory. 


String Theory Representation 


String theory provides an efficient compact repre- 
sentation of scattering amplitudes. At each loop 
order there is only a single closed string diagram, 
which includes within it all Feynman graphs along 
with the contributions of the infinite tower of 
massive string excitations. Schematically, at one- 
loop order, the situation is as shown in Figure 3. 
The terms arising from the heavy string modes are 
removed by taking the low-energy limit in which all 
external momenta lie well below the energy scale set 
by the string tension. This limit picks out the regions 
of integration in the string diagram corresponding to 
particle-like graphs, but with different diagrammatic 
rules. 

Figure 3 String theory representation at one-loop order. 


Given these rules, one may formulate a purely 
field-theoretic framework which reproduces them. 
In the case of QCD, a key ingredient is the use of a 
special gauge originally derived from the low-energy 
limit of tree-level string amplitudes. This is known 
as the Gervais—Neveu gauge and it is defined by the 
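gauge-fixing Lagrangian density 

$$\mathcal{L}_{\rm GN} = -\tfrac12\,\mathrm{Tr}\Big(\partial_\mu A^\mu - \frac{ie}{\sqrt{2}}\,A_\mu A^\mu\Big)^2$$ 
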

This gauge choice simplifies the color factors that 
arise in scattering amplitudes. The string theory 
origin of gauge theory amplitudes is then most 
closely mimicked by combining this gauge with the 
background field gauge, in which one decomposes 
the gauge field into a classical background field and 
a fluctuating quantum field as $A_\mu = A_\mu^{B} + A_\mu^{Q}$, and 
imposes the gauge-fixing condition $D_\mu^{B} A^{Q\,\mu} = 0$, 
where $D^{B}$ is the background field covariant derivative 
evaluated in the adjoint representation of the 
gauge group. This hybrid gauge is well suited for 
computing the effective action, with the quantum 
part describing gluons propagating around loops 
and the classical part describing gluons emerging 
from the loops. The leading loop momentum 
behavior of one-particle irreducible graphs with 
gluons in the loops is very similar to that of graphs 
with scalar fields in the loops. 


Supersymmetric Decomposition 


String theory also suggests an intimate relationship 
with supersymmetry. For example, at tree level, 
QCD is effectively supersymmetric because a multi- 
gluon tree amplitude contains no fermion loops, and 
so the fermions may be taken to lie in the adjoint 
representation of the gauge group. Thus, pure gluon 
tree amplitudes in QCD are identical to those in 
supersymmetric Yang-Mills theory. They are con- 
nected by supersymmetric Ward identities to ampli- 
tudes with fermions (gluinos) which drastically 
simplify computations. In supersymmetric gauge 
theory, these identities hold to all orders of 
perturbation theory. 

At one-loop order and beyond, QCD is not super- 
symmetric. However, one can still perform a super- 
symmetric decomposition of a QCD amplitude for 
which the supersymmetric components of the ampli- 
tude obey the supersymmetric Ward identities. Con- 
sider, for example, a one-loop multigluon scattering 


amplitude. The contribution from a fermion propagating 
in the loop can be decomposed into the contribution 
of a complex scalar field in the loop plus a contribution 
from an N = 1 chiral supermultiplet consisting of a 
complex scalar field and a Weyl fermion. The 
contribution from a gluon circulating in the loop can 
be decomposed into contributions of a complex scalar 
field, an N = 1 chiral supermultiplet, and an N = 4 
vector supermultiplet comprising three complex scalar 
fields, four Weyl fermions and one gluon all in the 
adjoint representation of the gauge group. This 
decomposition assumes the use of a supersymmetry- 
preserving regularization. 
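
The bookkeeping behind this decomposition can be made explicit by counting fields. The sketch below assumes the signed combinations commonly quoted in the literature, namely fermion loop = (N = 1 chiral) − (scalar) and gluon loop = (N = 4 vector) − 4 (N = 1 chiral) + (scalar). These coefficients are not stated in the text above, and the dictionary keys and function name are hypothetical.

```python
from collections import Counter

# Field content of the multiplets named in the text (all in the adjoint representation).
SCALAR = Counter({"complex_scalar": 1})
CHIRAL_N1 = Counter({"complex_scalar": 1, "weyl_fermion": 1})
VECTOR_N4 = Counter({"gluon": 1, "weyl_fermion": 4, "complex_scalar": 3})


def combine(terms):
    """Sum field contents with signed integer weights, dropping fields that cancel."""
    total = {}
    for weight, multiplet in terms:
        for field, count in multiplet.items():
            total[field] = total.get(field, 0) + weight * count
    return {field: count for field, count in total.items() if count != 0}


# Assumed signed decompositions of the loop contributions.
fermion_loop = combine([(1, CHIRAL_N1), (-1, SCALAR)])
gluon_loop = combine([(1, VECTOR_N4), (-4, CHIRAL_N1), (1, SCALAR)])
```

The scalars and the extra fermions cancel in each combination, leaving exactly one Weyl fermion and one gluon respectively, so the nonsupersymmetric loops are traded for supersymmetric pieces plus a comparatively simple scalar loop.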

The supersymmetric components have important 
cancellations in their leading loop momentum 
behavior. For instance, the leading large loop 
momentum power in an $n$-point 1PI graph is 
reduced from $|k|^n$ down to $|k|^{n-2}$ in the N = 1 
amplitude. Such a reduction can be extended to any 
amplitude in supersymmetric gauge theory and is 
related to the improved ultraviolet behavior of 
supersymmetric amplitudes. For the N = 4 amplitude, 
further cancellations reduce the leading power 
behavior all the way down to $|k|^{n-4}$. In dimensional 
regularization, N = 4 supersymmetric loop amplitudes 
have a very simple analytic structure owing to 
their origins as the low-energy limits of superstring 
scattering amplitudes. The supersymmetric Ward 
identities in this way can be used to provide 
identities among the nonsupersymmetric contributions. 
For example, in N = 1 supersymmetric Yang-Mills 
theory one can deduce that fermion and gluon 
loop contributions are equal and opposite for multigluon 
amplitudes with maximal helicity violation. 


Scattering Amplitudes in Twistor Space 


The scattering amplitude in QCD with $n$ incoming 
gluons of the same helicity vanishes, as does the 
amplitude with $n - 1$ incoming gluons of one helicity 
and one gluon of the opposite helicity for $n > 3$. The 
first nonvanishing amplitudes are the maximal helicity 
violating (MHV) amplitudes involving $n - 2$ gluons of 
one helicity and two gluons of the opposite helicity. 
Stripped of the momentum conservation delta-function 
and the group theory factor, the tree-level amplitude 
for a pair of gluons $r$, $s$ of negative helicity is given by 

$$A(k) = e^{n-2}\,\langle k_r^-|k_s^+\rangle^4\,\prod_{i=1}^{n}\langle k_i^-|k_{i+1}^+\rangle^{-1}$$   [58] 

This amplitude depends only on the holomorphic 
(negative chirality) Weyl spinors. The full MHV 
amplitude (with the momentum conservation 
delta-function) is invariant under the conformal 
group $SO(4,2) \cong SU(2,2)$ of four-dimensional 
Minkowski space. After a Fourier transformation of 
the positive-chirality components, the complexification 
$SL(4, \mathbb{C})$ has an obvious four-dimensional representation 
acting on the positive- and negative-chirality 
spinor products. This representation space is isomorphic 
to $\mathbb{C}^4$ and is called twistor space. Its elements 
are called twistors. 
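
The tree-level MHV formula [58] is simple enough to evaluate directly. The sketch below assumes the same explicit spinor product convention as a common textbook representation ($\langle ij\rangle = \sqrt{k_i^- k_j^+}\,e^{i\varphi_i} - \sqrt{k_i^+ k_j^-}\,e^{i\varphi_j}$), sets the cyclic identification $k_{n+1} := k_1$, and uses arbitrary massless sample momenta that are not required to satisfy momentum conservation. Function names and the sample kinematics are hypothetical.

```python
import math


def massless(kx, ky, kz):
    """Massless four-momentum (E, kx, ky, kz) with positive energy."""
    return (math.sqrt(kx * kx + ky * ky + kz * kz), kx, ky, kz)


def spinor_product(ki, kj):
    """Holomorphic spinor product <ij> (assumed phase convention, generic momenta)."""
    plus_i, minus_i = ki[0] + ki[3], ki[0] - ki[3]
    plus_j, minus_j = kj[0] + kj[3], kj[0] - kj[3]
    phase_i = complex(ki[1], ki[2]) / math.sqrt(plus_i * minus_i)
    phase_j = complex(kj[1], kj[2]) / math.sqrt(plus_j * minus_j)
    return math.sqrt(minus_i * plus_j) * phase_i - math.sqrt(plus_i * minus_j) * phase_j


def mhv_amplitude(momenta, r, s, e=1.0):
    """Stripped tree-level MHV amplitude with negative-helicity gluons r and s."""
    n = len(momenta)
    denominator = 1.0 + 0.0j
    for i in range(n):
        denominator *= spinor_product(momenta[i], momenta[(i + 1) % n])
    return e ** (n - 2) * spinor_product(momenta[r], momenta[s]) ** 4 / denominator


momenta = [
    massless(0.3, -0.4, 1.2),
    massless(-0.8, 0.5, 0.1),
    massless(0.6, 0.7, -0.9),
    massless(-0.2, -1.1, 0.4),
    massless(1.0, 0.2, 0.3),
]
amplitude = mhv_amplitude(momenta, 0, 2)

# Cyclic relabelling of the gluons (indices shifted accordingly).
rotated = momenta[1:] + momenta[:1]
amplitude_rotated = mhv_amplitude(rotated, 4, 1)

# Uniform rescaling of all momenta by a factor lam.
lam = 2.0
scaled = [tuple(lam * x for x in k) for k in momenta]
amplitude_scaled = mhv_amplitude(scaled, 0, 2)
```

The amplitude is invariant under cyclic relabelling and scales as $\lambda^{4-n}$ under a uniform rescaling of the momenta, reflecting that it is built from holomorphic spinor products alone.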

Wave functions and amplitudes have a known 
behavior under the $\mathbb{C}^*$-action which rescales twistors, 
giving the projective twistor space $\mathbb{CP}^3$ or $\mathbb{RP}^3$ 
according to whether the twistors are complex valued 
or real valued. The Fourier transformation to twistor 
space yields (due to momentum conservation) the 
localization of an MHV amplitude to a genus-0 
holomorphic curve $\mathbb{CP}^1$ of degree 1 in $\mathbb{CP}^3$ (or to a 
real line $\mathbb{RP}^1 \subset \mathbb{RP}^3$). It is conjectured that, generally, 
an $\ell$-loop amplitude with $p$ gluons of positive helicity 
and $q$ gluons of negative helicity is supported on a 
holomorphic curve in twistor space of degree $q + \ell - 1$ 
and genus $\le \ell$. The natural interpretation of this curve is 
as the world sheet of a string. The perturbative gauge 
theory may then be described in terms of amplitudes 
arising from the couplings of gluons to a string. This 
twistor string theory is a topological string theory which 
gives the appropriate framework for understanding the 
twistor properties of scattering amplitudes. This framework 
has been used to analyze MHV tree diagrams and 
one-loop N = 4 supersymmetric amplitudes of gluons. 


See also: Constructive Quantum Field Theory; 
Dispersion Relations; Effective Field Theories; Gauge 
Theories from Strings; Hopf Algebra Structure of 
Renormalizable Quantum Field Theory; Perturbative 
Renormalization Theory and BRST; Quantum 
Chromodynamics; Renormalization: General Theory; 
Scattering, Asymptotic Completeness and Bound States; 
Scattering in Relativistic Quantum Field Theory: 
Fundamental Concepts and Tools; Stationary Phase 
Approximation; Supersymmetric Particle Models. 


Main Problems in the Perturbative 
Quantization of Gauge Theories 


Gauge theories are field theories in which the basic 
fields are not directly observable. Field configurations 
yielding the same observables are connected by a 
gauge transformation. In the classical theory, the 
Cauchy problem is well posed for the observables, 
but in general not for the nonobservable gauge-variant 
basic fields, due to the existence of time-dependent 
gauge transformations. 

Attempts to quantize the gauge-invariant objects 
directly have not yet been completely satisfactory. 
Instead, one modifies the classical action by adding a 
gauge-fixing term such that standard techniques of 
perturbative quantization can be applied and such 
that the dynamics of the gauge-invariant classical 
fields is not changed. In perturbation theory, this 
problem shows up already in the quantization of the 
free gauge fields (see the section “Quantization of 
free gauge fields”). In the final (interacting) theory the 
physical quantities should be independent of how the 
gauge fixing is done (“gauge independence”). 

Traditionally, the quantization of gauge theories 
is mostly analyzed in terms of path integrals (e.g., by 
Faddeev and Popov), where some parts of the 
arguments are only heuristic. In the original treat- 
ment of Becchi, Rouet, and Stora (cf. also Tyutin) 
(which is called “BRST quantization”), a restriction 
to purely massive theories was necessary; the 
generalization to the massless case by Lowenstein’s 
method is cumbersome. 

The BRST quantization is based on earlier work 
of Feynman, Faddeev, and Popov (introduction of 
“ghost fields”), and of Slavnov. The basic idea is 
that after adding a term to the Lagrangian which 
makes the Cauchy problem well posed but which is 
not gauge-invariant one enlarges the number of 
fields by infinitesimal gauge transformations 
(“ghosts”) and their duals (“anti-ghosts”). One 
then adds a further term to the Lagrangian which 
contains a coupling of the anti-ghosts and ghosts. 
The BRST transformation acts as an infinitesimal 
gauge transformation on the original fields and on 
the gauge transformations themselves and maps the 
anti-ghosts to the gauge-fixing terms. This is done 
in such a way that the total Lagrangian is invariant 
and that the BRST transformation is nilpotent. 
The hard problem in the perturbative construction 
of gauge theories is to show that BRST symmetry can 
be maintained during renormalization (see the section 
on perturbative renormalization). By means of the 
“quantum action principle” of Lowenstein (1971) 
and Lam (1972, 1973) a cohomological classification 
of anomalies was worked out (an overview is given, 
e.g., in the book of Piguet and Sorella (1995)). For 
more details, see BRST Quantization. 

The BRST quantization can be carried out in a 
transparent way in the framework of algebraic 
quantum field theory (AQFT, see Algebraic 
Approach to Quantum Field Theory). The advan- 
tage of this formulation is that it allows one to 
separate the three main problems of perturbative 
gauge theories: 


1. the elimination of unphysical degrees of freedom, 
2. positivity (or “unitarity”), and 
3. the problem of infrared divergences. 


In AQFT, the procedure is the following: starting 
from an algebra of all local fields, including the 
unphysical ones, one shows that after perturbative 
quantization the algebra admits the BRST transformation
as a graded nilpotent derivation. The
algebra of observables is then defined as the
cohomology of the BRST transformation. To solve 
the problem of positivity, one has to show that the 
algebra of observables, in contrast to the algebra of 
all fields, has a nontrivial representation on a 
Hilbert space. Finally, one can attack the infrared 
problem by investigating the asymptotic behavior 
of states. The latter problem is nontrivial even in 
quantum electrodynamics (since an electron is 
accompanied by a “cloud of soft photons”) and 
may be related to confinement in quantum 
chromodynamics. 

The method of BRST quantization is by no means 
restricted to gauge theories, but applies to general 
constrained systems. In particular, massive vector 
fields, where the masses are usually generated by the 
Higgs mechanism, can alternatively be treated 
directly by the BRST formalism, in close analogy 
to the massless case (cf. the section on quantization 
of free gauge fields). 


Local Operator BRST Formalism 


In AQFT, the principal object is the family of
operator algebras $\mathcal{O} \mapsto \mathcal{A}(\mathcal{O})$ (where $\mathcal{O}$ runs, e.g.,
through all double cones in Minkowski space),
which fulfills the Haag-Kastler axioms (cf. Algebraic
Approach to Quantum Field Theory). To construct
these algebras, one considers the algebras $\mathcal{F}(\mathcal{O})$
generated by all local fields including ghosts $u$ and
anti-ghosts $\tilde{u}$. Ghosts and anti-ghosts are scalar
fermionic fields. The algebra gets a $\mathbb{Z}_2$ grading with
respect to even and odd ghost numbers, where ghosts
get ghost number $+1$ and anti-ghosts ghost number $-1$.
The BRST transformation $s$ acts on these algebras as a
$\mathbb{Z}_2$-graded derivation with $s^2 = 0$, $s(\mathcal{F}(\mathcal{O})) \subset \mathcal{F}(\mathcal{O})$,
and $s(F^*) = -(-1)^{\delta_F} s(F)^*$, $\delta_F$ denoting the ghost
number of $F$.

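The observables should be $s$-invariant and may be
identified if they differ by a field in the range of $s$.
Since the range $\mathcal{A}_{00}$ of $s$ is an ideal in the kernel $\mathcal{A}_0$
of $s$, the algebra of observables is defined as the
quotient

$$\mathcal{A} := \mathcal{A}_0/\mathcal{A}_{00}$$ [1]

and the local algebras $\mathcal{A}(\mathcal{O}) \subset \mathcal{A}$ are the images of
$\mathcal{A}_0 \cap \mathcal{F}(\mathcal{O})$ under the quotient map $\mathcal{A}_0 \to \mathcal{A}$.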

To prove that $\mathcal{A}$ admits a nontrivial representation
by operators on a Hilbert space, one may use
the BRST operator formalism (Kugo and Ojima
1979, Dütsch and Fredenhagen 1999): one starts
from a representation of $\mathcal{F}$ on an inner-product
space $(\mathcal{K}, \langle\cdot,\cdot\rangle)$ such that $\langle F^*\phi, \psi\rangle = \langle\phi, F\psi\rangle$
and that $s$ is implemented by an operator $Q$ on $\mathcal{K}$,
that is,

$$s(F) = [Q, F]$$ [2]

with $[\cdot,\cdot]$ denoting the graded commutator, such
that $Q$ is symmetric and nilpotent. One may then
construct the space of physical states as the
cohomology of $Q$, $\mathcal{H} := \mathcal{K}_0/\mathcal{K}_{00}$, where $\mathcal{K}_0$ is the
kernel and $\mathcal{K}_{00}$ the range of $Q$. The algebra of
observables now has a natural representation $\pi$
on $\mathcal{H}$:

$$\pi([A])[\phi] := [A\phi]$$ [3]

(where $A \in \mathcal{A}_0$, $\phi \in \mathcal{K}_0$, $[A] := A + \mathcal{A}_{00}$, $[\phi] := \phi + \mathcal{K}_{00}$).
The crucial question is whether the scalar
product on $\mathcal{H}$ inherited from $\mathcal{K}$ is positive definite.
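The construction can be illustrated on a finite-dimensional toy model. The sketch below is not taken from the article: the three-dimensional space, the Gram matrix `eta`, and the operator `Q` are hypothetical choices made only to show how kernel, range, and the induced inner product on the cohomology fit together.

```python
import numpy as np

# Hypothetical toy model: K = C^3 with an indefinite inner product <a, b> = a^dagger eta b.
# e1 plays the role of a physical vector; (e2, e3) form a null pair.
eta = np.array([[1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])

# Toy BRST operator: Q e3 = e2, Q e1 = Q e2 = 0.
Q = np.zeros((3, 3))
Q[1, 2] = 1.0


def null_space(M, tol=1e-12):
    """Orthonormal basis (as columns) of the kernel of M, via the SVD."""
    _, s, Vh = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return Vh[rank:].conj().T


def column_space(M, tol=1e-12):
    """Orthonormal basis (as columns) of the range of M, via the SVD."""
    U, s, _ = np.linalg.svd(M)
    return U[:, : int(np.sum(s > tol))]


nilpotent = np.allclose(Q @ Q, 0.0)
# Q is symmetric w.r.t. <.,.> iff eta Q is Hermitian.
symmetric = np.allclose(eta @ Q, (eta @ Q).conj().T)

K0 = null_space(Q)      # kernel of Q
K00 = column_space(Q)   # range of Q
dim_H = K0.shape[1] - K00.shape[1]  # dimension of the cohomology K0 / K00

# Inner product restricted to the kernel: must be positive semidefinite,
# and the range must consist of null vectors orthogonal to the kernel.
gram_K0 = K0.conj().T @ eta @ K0
eigs = np.linalg.eigvalsh(gram_K0)
range_is_null = np.allclose(K0.conj().T @ eta @ K00, 0.0)
```

In this toy model the kernel is two-dimensional, the range is one-dimensional, and the form on the quotient is positive definite, which is the situation the positivity question above asks about.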

In free quantum field theories, $(\mathcal{K}, \langle\cdot,\cdot\rangle)$ can be
chosen in such a way that the positivity can directly 
be checked by identifying the physical degrees of 
freedom (see next section). In interacting theories 
(see the section on perturbative construction of 
gauge theories), one may argue in terms of scattering 
states that the free BRST operator on the asymptotic 
fields coincides with the BRST operator of the 
interacting theory. This argument, however, is 
invalidated by infrared problems in massless gauge 
theories. Instead, one may use a stability property of 
the construction. 

Namely, let $\tilde{\mathcal{F}}$ be the algebra of formal power
series with values in $\mathcal{F}$, and let $\tilde{\mathcal{K}}$ be the vector space
of formal power series with values in $\mathcal{K}$. $\tilde{\mathcal{K}}$ possesses
a natural inner product with values in the ring of
formal power series $\mathbb{C}[[\lambda]]$, as well as a representation
of $\tilde{\mathcal{F}}$ by operators. One also assumes that the
BRST transformation $\tilde{s}$ is a formal power series
$\tilde{s} = \sum_n \lambda^n s_n$ of operators $s_n$ on $\mathcal{F}$ and that the
BRST operator $\tilde{Q}$ is a formal power series
$\tilde{Q} = \sum_n \lambda^n Q_n$ of operators on $\mathcal{K}$. The algebraic
construction can then be done in the same way as
before, yielding a representation $\tilde{\pi}$ of the algebra
of observables $\tilde{\mathcal{A}}$ by endomorphisms of a $\mathbb{C}[[\lambda]]$
module $\tilde{\mathcal{H}}$, which has an inner product with values
in $\mathbb{C}[[\lambda]]$.

One now assumes that at $\lambda = 0$ the inner product
is positive, in the sense that

(Positivity)
(i) $\langle\phi,\phi\rangle \geq 0 \quad \forall \phi \in \mathcal{K}$ with $Q_0\phi = 0$, and
(ii) $Q_0\phi = 0 \,\wedge\, \langle\phi,\phi\rangle = 0 \implies \phi \in Q_0\mathcal{K}$ [4]

Then the inner product on $\tilde{\mathcal{H}}$ is positive in the
sense that for all $\tilde{\phi} \in \tilde{\mathcal{H}}$ the inner product with itself,
$\langle\tilde{\phi},\tilde{\phi}\rangle$, is of the form $\tilde{c}^*\tilde{c}$ with some power series
$\tilde{c} \in \mathbb{C}[[\lambda]]$, and $\tilde{c} = 0$ iff $\tilde{\phi} = 0$.

This result guarantees that, within perturbation
theory, the interacting theory satisfies positivity,
provided the unperturbed theory was positive and
BRST symmetry is preserved.


Quantization of Free Gauge Fields 


The action of a classical free gauge field $A$,

$$S_0(A) = -\frac{1}{4}\int dx\, F^{\mu\nu}(x) F_{\mu\nu}(x) = \frac{1}{2}\int dk\, \hat{A}_\mu(k)^* M^{\mu\nu}(k) \hat{A}_\nu(k)$$ [5]

(where $F^{\mu\nu} := \partial^\mu A^\nu - \partial^\nu A^\mu$ and $M^{\mu\nu}(k) := k^2 g^{\mu\nu} - k^\mu k^\nu$)
is unsuited for quantization because $M^{\mu\nu}$ is not
invertible: due to $M^{\mu\nu}k_\nu = 0$, it has an eigenvalue 0.
Therefore, the action is usually modified by adding a
Lorentz-invariant gauge-fixing term: $M^{\mu\nu}(k)$ is replaced
by $M^{\mu\nu}(k) + \lambda k^\mu k^\nu$, where $\lambda \in \mathbb{R}\setminus\{0\}$ is an arbitrary
constant. The corresponding Euler-Lagrange equation
reads

$$\Box A^\mu - (1-\lambda)\,\partial^\mu\partial_\nu A^\nu = 0$$ [6]
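The degeneracy of the quadratic form, and its removal by the gauge-fixing term, can be checked numerically. The following is a minimal sketch under stated assumptions: the metric signature (+,-,-,-), the sample momentum, and the helper name `M_upper` are illustrative choices, not fixed by the text.

```python
import numpy as np

# Assumption: Minkowski metric with signature (+,-,-,-); numerically g^{mu nu} = g_{mu nu}.
g = np.diag([1.0, -1.0, -1.0, -1.0])


def M_upper(k_up, lam=0.0):
    """Return M^{mu nu}(k) + lam * k^mu k^nu for a contravariant momentum k^mu."""
    k2 = k_up @ g @ k_up
    return k2 * g + (lam - 1.0) * np.outer(k_up, k_up)


k_up = np.array([2.0, 0.3, -0.5, 1.0])  # arbitrary off-shell momentum (k^2 != 0)
k_low = g @ k_up                        # covariant components k_nu
k2 = k_up @ k_low

M0 = M_upper(k_up)            # no gauge fixing
M1 = M_upper(k_up, lam=1.0)   # Feynman gauge

zero_mode = M0 @ k_low        # M^{mu nu} k_nu, expected to vanish
det_M0 = np.linalg.det(M0)    # expected to vanish (non-invertible)
det_M1 = np.linalg.det(M1)    # equals det(k^2 g) = -(k^2)^4 for lam = 1
```

For any nonzero `lam` and `k2 != 0` the gauge-fixed matrix is invertible, which is what makes the modified action usable as a starting point for quantization.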
For simplicity, let us choose $\lambda = 1$, which is referred
to as Feynman gauge. Then the algebra of the free
gauge field is the unital $*$-algebra generated by
elements $A^\mu(f)$, $f \in \mathcal{D}(\mathbb{R}^4)$, which fulfill the
relations:

$$f \mapsto A^\mu(f) \ \text{is linear}$$ [7]

$$A^\mu(\Box f) = 0$$ [8]

$$A^\mu(f)^* = A^\mu(\bar{f})$$ [9]

$$[A^\mu(f), A^\nu(g)] = i g^{\mu\nu} \int dx\, dy\, f(x) D(x-y) g(y)$$ [10]

where $D$ is the massless Pauli-Jordan distribution.

This algebra does not possess Hilbert space
representations which satisfy the microlocal spectrum
condition, a condition which in particular requires
the singularity of the two-point function to be of the
so-called Hadamard form. It possesses, instead,
representations on vector spaces with a nondegenerate
sesquilinear form, for example, the Fock space
over a one-particle space with scalar product

$$\langle\phi,\psi\rangle = (2\pi)^{-3}\int \frac{d^3p}{2|\vec{p}|}\, \overline{\phi^\mu(p)}\,\psi_\mu(p)\Big|_{p^0=|\vec{p}|}$$ [11]

Gupta and Bleuler characterized a subspace of the
Fock space on which the scalar product is semidefinite;
the space of physical states is then obtained
by dividing out the space of vectors with vanishing
norm.
After adding a mass term

$$\frac{m^2}{2}\int dx\, A_\mu(x) A^\mu(x)$$

to the action [5], it seems to be no longer necessary
to add also a gauge-fixing term. The fields then
satisfy the Proca equation

$$\partial_\mu F^{\mu\nu} + m^2 A^\nu = 0$$ [12]

which is equivalent to the equation $(\Box + m^2)A^\mu = 0$
together with the constraint $\partial_\mu A^\mu = 0$. The Cauchy
problem is well posed, and the fields can be
represented in a positive-norm Fock space with
only physical states (corresponding to the three
physical polarizations of $A$). The problem, however,
is that the corresponding propagator admits no
power-counting renormalizable perturbation series.

The latter problem can be circumvented in the
following way: for the algebra of the free quantum
field, one takes only the equation $(\Box + m^2)A^\mu = 0$
into account (or, equivalently, one adds the gauge-fixing
term $(1/2)(\partial_\mu A^\mu)^2$ to the Lagrangian) and goes
over from the physical field $A^\mu$ to

$$B^\mu := A^\mu + \frac{\partial^\mu\phi}{m}$$ [13]

where $\phi$ is a real scalar field of the same mass $m$
for which the sign of the commutator is reversed
(“bosonic ghost field” or “Stückelberg field”).
The propagator of $B^\mu$ yields a power-counting
renormalizable perturbation series; however, $B^\mu$ is
an unphysical field. One obtains four independent
components of $B$ which satisfy the Klein-Gordon
equation. The constraint $0 = \partial_\mu A^\mu = \partial_\mu B^\mu + m\phi$ is
required for the expectation values in physical states
only. So, quantization in the case $m > 0$ can be
treated in analogy with [8]-[10] by replacing $A^\mu$ by
$B^\mu$, the wave operator by the Klein-Gordon operator
$(\Box + m^2)$ in [8], and $D$ by the corresponding massive
commutator distribution $\Delta_m$ in [10]. Again, the
algebra can be nontrivially represented on a space
with indefinite metric, but not on a Hilbert space.
One can now use the method of BRST quantization
in the massless as well as in the massive case.
One introduces a pair of fermionic scalar fields
(ghost fields) $(u, \tilde{u})$. $u$, $\tilde{u}$, and (for $m > 0$) $\phi$ fulfill the
Klein-Gordon equation to the same mass $m \geq 0$ as
the vector field $B$. The free BRST transformation
reads

$$s_0(B^\mu) = i\partial^\mu u, \quad s_0(\phi) = i m u, \quad s_0(u) = 0, \quad s_0(\tilde{u}) = -i(\partial_\mu B^\mu + m\phi)$$ [14]

(see, e.g., Scharf (2001)). It is implemented by the
free BRST charge

$$Q_0 = \int_{x^0=\text{const.}} d^3x\, j_0^{(0)}(x^0, \vec{x})$$ [15]

where

$$j_\mu^{(0)} := (\partial_\nu B^\nu + m\phi)\,\partial_\mu u - \partial_\mu(\partial_\nu B^\nu + m\phi)\, u$$ [16]

is the free BRST current, which is conserved. (The
interpretation of the integral in [15] requires some
care.) $Q_0$ satisfies the assumptions of the (local)
operator BRST formalism, in particular it is nilpotent
and positive [4]. Distinguished representatives of the
equivalence classes $[\phi] \in \mathrm{Ke}\,Q_0/\mathrm{Ra}\,Q_0$ are the states
built up only from the three spatial (two transversal
for $m = 0$, respectively) polarizations of $A$.
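As a short worked check (a sketch, assuming only the transformation rules [14], the linearity of $s_0$, and the Klein-Gordon equation for $u$), nilpotency of the free BRST transformation can be verified on each generator:

```latex
\begin{aligned}
s_0^2(B^\mu) &= i\,\partial^\mu s_0(u) = 0, \\
s_0^2(\phi)  &= i m\, s_0(u) = 0, \\
s_0^2(u)     &= 0, \\
s_0^2(\tilde{u}) &= -i\bigl(\partial_\mu s_0(B^\mu) + m\, s_0(\phi)\bigr)
                  = -i\bigl(i\,\Box u + i m^2 u\bigr)
                  = (\Box + m^2)\,u = 0 .
\end{aligned}
```

The last line vanishes only because $u$ obeys the Klein-Gordon equation, so $s_0$ is nilpotent on the algebra of the free fields, where that equation holds.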


Perturbative Renormalization 


The starting point for a perturbative construction of
an interacting quantum field theory is Dyson’s
formula for the evolution operator in the interaction
picture. To avoid conflicts with Haag’s theorem on
the nonexistence of the interaction picture in
quantum field theory, one multiplies the interaction
Lagrangian $\mathcal{L}$ with a test function $g$ and studies the
local S-matrix,

$$S(g\mathcal{L}) = 1 + \sum_{n=1}^{\infty} \frac{i^n}{n!}\int dx_1\cdots dx_n\, g(x_1)\cdots g(x_n)\, T(\mathcal{L}(x_1)\cdots\mathcal{L}(x_n))$$ [17]

where $T$ denotes a time-ordering prescription. In the
limit $g \to 1$ (adiabatic limit), $S(g\mathcal{L})$ tends to the
scattering matrix. This limit, however, is plagued by
infrared divergences and does not always exist.
Interacting fields $F_{g\mathcal{L}}$ are obtained by the Bogoliubov
formula:

$$F_{g\mathcal{L}}(x) = \frac{\delta}{\delta h(x)}\Big|_{h=0} S(g\mathcal{L})^{-1} S(g\mathcal{L} + hF)$$ [18]

The algebraic properties of the interacting fields
within a region $\mathcal{O}$ depend only on the interaction
within a slightly larger region (Brunetti and
Fredenhagen 2000), hence the net of algebras in the
sense of AQFT can be constructed in the adiabatic
limit without the infrared problems (this is called the
“algebraic adiabatic limit”).

The construction of the interacting theory is thus
reduced to a definition of time-ordered products of
fields. This is the program of causal perturbation
theory (CPT), which was developed by Epstein and
Glaser (1973) on the basis of previous work by
Stückelberg and Petermann (1953) and Bogoliubov
and Shirkov (1959). For simplicity, we describe
CPT only for a real scalar field. Let $\varphi$ be a classical
real scalar field which is not restricted by any field
equation. Let $\mathcal{P}$ denote the algebra of polynomials
in $\varphi$ and all its partial derivatives $\partial^a\varphi$ with multi-indices
$a \in \mathbb{N}_0^4$. The time-ordered products $(T_n)_{n\in\mathbb{N}}$
are linear and symmetric maps
$T_n: (\mathcal{P} \otimes \mathcal{D}(\mathbb{R}^4))^{\otimes n} \to L(\mathcal{D})$, where $L(\mathcal{D})$ is the space of
operators on a dense invariant domain $\mathcal{D}$ in the
Fock space of the scalar free field. One often uses
the informal notation

$$T_n(g_1F_1 \otimes\cdots\otimes g_nF_n) = \int dx_1\cdots dx_n\, T_n(F_1(x_1),\ldots,F_n(x_n))\, g_1(x_1)\cdots g_n(x_n)$$ [19]

where $F_j \in \mathcal{P}$, $g_j \in \mathcal{D}(\mathbb{R}^4)$.
The sequence $(T_n)$ is constructed by induction on
$n$, starting with the initial condition

$$T_1\Big(\prod_j \partial^{a_j}\varphi(x)\Big) = {:}\prod_j \partial^{a_j}\varphi(x){:}$$ [20]

where the right-hand side is a Wick polynomial of
the free field $\varphi$. In the inductive step the requirement
of causality plays the main role, that is, the
condition that

$$T_n(f_1 \otimes\cdots\otimes f_n) = T_k(f_1 \otimes\cdots\otimes f_k)\, T_{n-k}(f_{k+1} \otimes\cdots\otimes f_n)$$ [21]

if

$$(\mathrm{supp}\, f_1 \cup\cdots\cup \mathrm{supp}\, f_k) \cap \big((\mathrm{supp}\, f_{k+1} \cup\cdots\cup \mathrm{supp}\, f_n) + \bar{V}_-\big) = \emptyset$$

(where $\bar{V}_-$ is the closed backward light cone). This
condition expresses the composition law for evolution
operators in a relativistically invariant and local
way. Causality determines $T_n$ as an operator-valued
distribution on $\mathbb{R}^{4n}$ in terms of the inductively known
$T_l$, $l < n$, outside of the total diagonal
$\Delta_n := \{(x_1,\ldots,x_n)\,|\,x_1 = \cdots = x_n\}$, that is, on test functions
from $\mathcal{D}(\mathbb{R}^{4n} \setminus \Delta_n)$.

Perturbative renormalization is now the extension
of $T_n$ to the full test function space $\mathcal{D}(\mathbb{R}^{4n})$.
Generally, this extension is nonunique. In contrast
to other methods of renormalization, no divergences
appear, but the ambiguities correspond to
the finite renormalizations that persist after
removal of divergences by infinite counter terms.
The ambiguities can be reduced by (re-)normalization
conditions, which means that one requires
that certain properties which hold by induction on
$\mathcal{D}(\mathbb{R}^{4n} \setminus \Delta_n)$ are maintained in the extension,
namely:

• (N0) a bound on the degree of singularity near
the total diagonal;

• (N1) Poincaré covariance;

• (N2) unitarity of the local S-matrix;

• (N3) a relation to the time-ordered products of
subpolynomials;

• (N4) the field equation for the interacting field
$\varphi_{g\mathcal{L}}$ [18];

• (AWI) the “action Ward identity” (Stora 2002,
Dütsch and Fredenhagen 2003):
$\partial^\mu T(\cdots F_l(x)\cdots) = T(\cdots \partial^\mu F_l(x)\cdots)$.
This condition can be understood
as the requirement that physics depends on the action
only, so total derivatives in the interaction Lagrangian
can be removed; and

• further symmetries, in particular in gauge
theories, Ward identities expressing BRST invariance.
A universal formulation of all symmetries
which can be derived from the field equation in
classical field theory is the “master Ward identity”
(which presupposes (N3) and (N4)) (Boas
and Dütsch 2002, Dütsch and Fredenhagen
2003); see next section.

The problem of perturbative renormalization is to
construct a solution of all these normalization
conditions. Epstein and Glaser have constructed the
solutions of (N0)-(N3). Recently, the conditions
(N4) and (AWI) have been included. The master
Ward identity cannot always be fulfilled, the
obstructions being the famous “anomalies” of
perturbative quantum field theory.


Perturbative Construction of Gauge 
Theories 


In the case of a purely massive theory, the
adiabatic limit $S = \lim_{g\to 1} S(g\mathcal{L})$ exists (Epstein
and Glaser 1976), and one may adopt a formalism
due to Kugo and Ojima (1979), who use the fact
that in these theories the BRST charge $Q$ can be
identified with the incoming (free) BRST charge
$Q_0$ [15]. For the scattering matrix $S$ to be a well-defined
operator on the physical Hilbert space of
the free theory, $\mathcal{H} = \mathrm{Ke}\,Q_0/\mathrm{Ra}\,Q_0$, one then has to
require

$$\lim_{g\to 1}\,[Q_0, T((g\mathcal{L})^{\otimes n})]\big|_{\ker Q_0} = 0$$ [22]

This is the motivation for introducing the condition
of “perturbative gauge invariance” (Dütsch
et al. 1993, 1994; see Scharf (2001)): according
to this condition, there should exist a Lorentz
vector $\mathcal{L}_1^\nu \in \mathcal{P}$ associated with the interaction $\mathcal{L}$,
such that

$$[Q_0, T_n(\mathcal{L}(x_1)\cdots\mathcal{L}(x_n))] = i\sum_{l=1}^{n}\partial_\nu^{x_l}\, T_n(\mathcal{L}(x_1)\cdots\mathcal{L}_1^\nu(x_l)\cdots\mathcal{L}(x_n))$$ [23]
This is a somewhat stronger condition than [22] but
has the advantage that it can be formulated
independently of the adiabatic limit. The condition
[22] (or perturbative gauge invariance) can be
satisfied for tree diagrams (i.e., the corresponding
requirement in classical field theory can be fulfilled).
In the massive case, this is impossible without a
modification of the model; the inclusion of additional
physical scalar fields (corresponding to Higgs
fields) yields a solution. It is gratifying that,
by making a polynomial ansatz for the interaction
$\mathcal{L} \in \mathcal{P}$, perturbative gauge invariance [23] for tree
diagrams, renormalizability (i.e., the mass dimension
of $\mathcal{L}$ is $\leq 4$), and some obvious requirements (e.g.,
Lorentz invariance) determine $\mathcal{L}$ to a large extent.
In particular, the Lie-algebraic structure need not
be put in, as it can be derived in this way (Stora 1997,
unpublished). Including loop diagrams (i.e., quantum
effects), it has been proved that (N0)-(N2) and
perturbative gauge invariance can be fulfilled to all
orders for massless SU(N) Yang-Mills theories.
Unfortunately, in the massless case, it is unlikely that
the adiabatic limit exists and, hence, an S-matrix
formalism is problematic. It is better to rely on
the construction of local observables in terms of
couplings with compact support. However, then the
selection of the observables [1] has to be done in terms
of the BRST transformation $\tilde{s}$ of the interacting fields.
For the corresponding BRST charge, one makes
the ansatz

$$\tilde{Q} = \int dx\, \tilde{j}^\mu_{g\mathcal{L}}(x)\, b_\mu(x), \qquad \mathcal{L} = \sum_{n\geq 1}\mathcal{L}_n\lambda^n$$ [24]

where $(b_\mu)$ is a smooth version of the $\delta$-function
characterizing a Cauchy surface and $\tilde{j}^\mu_{g\mathcal{L}}$ is the
interacting BRST current [18] (where
$\tilde{j}_\mu = \sum_n j_\mu^{(n)}\lambda^n$ ($j_\mu^{(n)} \in \mathcal{P}$) is a formal power series with
$j_\mu^{(0)}$ given by [16]). (Note that there is a volume
divergence in this integral, which can be avoided by a
spatial compactification. This does not change the
abstract algebra $\tilde{\mathcal{F}}_{\mathcal{L}}(\mathcal{O})$.) A crucial requirement is that
$\tilde{j}^\mu_{g\mathcal{L}}$ is conserved in a suitable sense. This condition is
essentially equivalent to perturbative gauge invariance
and hence its application to classical field theory
determines the interaction $\mathcal{L}$ in the same way, and in
addition the deformation $j^{(0)} \to \tilde{j}_{g\mathcal{L}}$. The latter also
gives the interacting BRST charge and transformation,
$\tilde{Q}$ and $\tilde{s}$, by [24] and [2]. The so-obtained $\tilde{Q}$ is often
nilpotent in classical field theory (and hence this holds
also for $\tilde{s}$). However, in QFT conservation of $\tilde{j}_{g\mathcal{L}}$ and
$\tilde{Q}^2 = 0$ requires the validity of additional Ward
identities, beyond the condition of perturbative gauge
invariance [23]. All the necessary identities can be
derived from the master Ward identity


$$T_{n+1}(A, F_1, \ldots, F_n) = -\sum_{k=1}^{n} T_n(F_1, \ldots, \delta_A F_k, \ldots, F_n)$$ [25]


where $A = \delta_A S_0$ with a derivation $\delta_A$. The master
Ward identity is closely related to the quantum 
action principle which was formulated in the 
formalism of generating functionals of Green’s 
functions. In the latter framework, the anomalies 
have been classified by cohomological methods. The 
vanishing of anomalies of the BRST symmetry is a 
selection criterion for physically acceptable models. 

In the particular case of QED, the Ward identity

$$\partial_\mu^y\, T(j^\mu(y) F_1(x_1)\cdots F_n(x_n)) = i\sum_{j=1}^{n}\delta(y - x_j)\, T(F_1(x_1)\cdots(\theta F_j)(x_j)\cdots F_n(x_n))$$ [26]

for the Dirac current $j^\mu := \bar{\psi}\gamma^\mu\psi$, is sufficient for
the construction, where $(\theta F) := i(r-s)F$ for
$F = \psi^r\bar{\psi}^s B_1\cdots B_l$ ($B_1, \ldots, B_l$ are nonspinorial fields)
and $F_1, \ldots, F_n$ run through all subpolynomials of
$\mathcal{L} = j^\mu A_\mu$. (N0)-(N4) and [26] can be fulfilled to all
orders (Dütsch and Fredenhagen 1999).


See also: Algebraic Approach to Quantum Field Theory;
Axiomatic Quantum Field Theory; Batalin-Vilkovisky
Quantization; BRST Quantization; Constrained Systems;
Indefinite Metric; Perturbation Theory and its Techniques;
Quantum Chromodynamics; Quantum Field Theory:
A Brief Introduction; Quantum Fields with Indefinite
Metric: Non-Trivial Models; Renormalization: General
Theory; Renormalization: Statistical Mechanics and
Condensed Matter; Standard Model of Particle Physics.
Introduction


When an external parameter such as the tempera- 
ture T is changed, physical systems in a homo- 
geneous state often become unstable and tend to 
an ordered phase with broken symmetry. The 
growth of new order takes place with coarsening 
of domains or defect structures on mesoscopic 
spatial scales much longer than the microscopic 
molecular scale. Such ordering processes are 
ubiquitously observed in many systems such as 
ferromagnetic (spin) systems, solid alloys, and 
fluids. Historically, structural ordering and phase 
separation in solid alloys have been one of the 
central problems in metallurgy (Cahn 1961). These 
are highly nonlinear and far-from-equilibrium 
processes and have been studied as challenging 
subjects in condensed matter physics, polymer 
science, and metallurgy (Gunton et al. 1983, 
Binder 1991, Bray 1994, Onuki 2002). Here a 
short review on phase ordering is given on the 
basis of prototype mathematical models, which 
can be a starting point to understand the real 
complex problems. 
Phase Ordering in Nonconserved
Systems 


Let us consider phase ordering in a system with a
scalar spacetime-dependent variable $\psi(r,t)$. If its
space integral is not conserved in time, it is called
the nonconserved order parameter, representing
magnetization, electric polarization, etc. After
appropriate scaling of time $t$, space $r$, and $\psi$, the
simplest dynamic equation reads

$$\frac{\partial}{\partial t}\psi = \nabla^2\psi - \tau\psi - \psi^3 + h + \theta$$ [1]

The coefficient $\tau$ is related to the temperature by
$\tau = A(T - T_c)$, where $A$ is a constant and $T_c$ is the
critical temperature. The constant $h$ is also an
externally controllable parameter, proportional to
the applied magnetic field for the ferromagnetic
case. The last term is the Markovian Gaussian
random noise needed when eqn [1] is treated
as a Langevin (stochastic differential) equation.


In physics its stochastic property is usually
expressed as

$$\langle\theta(r,t)\theta(r',t')\rangle = 2\epsilon\,\delta(r-r')\,\delta(t-t')$$ [2]

where $\epsilon$ represents the strength of the noise
(proportional to the temperature before the scaling).
In the presence of $\theta$, the variable $\psi$ is a random
variable, whose probability distribution $P(\{\psi\}, t)$
obeys the Fokker-Planck equation. The equilibrium
(steady) distribution is given by

$$P_{\rm eq}\{\psi\} = \text{const.}\,\exp(-F\{\psi\}/\epsilon)$$ [3]

where

$$F = \int dr\left[\frac{\tau}{2}\psi^2 + \frac{1}{4}\psi^4 + \frac{1}{2}|\nabla\psi|^2 - h\psi\right]$$ [4]

is the so-called Ginzburg-Landau free energy. Using
$F$ we rewrite eqn [1] in a standard form of the
Langevin equation,

$$\frac{\partial}{\partial t}\psi = -\frac{\delta F}{\delta\psi} + \theta$$ [5]


In equilibrium $\psi$ consists of the average $\psi_e$ and the
deviation $\delta\psi$, where the latter is a Gaussian
fluctuation in the limit of small $\epsilon$. If $\tau > 0$ and
$h = 0$, we obtain $\psi_e = 0$. If $\tau < 0$ and $h = 0$, there
are two minima $\psi_e = \pm|\tau|^{1/2}$. These two states
can coexist in equilibrium with a planar interface
separating them at $h = 0$. If its normal is along the $x$-axis,
the interface solution is of the form

$$\psi(x) = |\tau|^{1/2}\tanh(|\tau|^{1/2}x/\sqrt{2})$$ [6]

which tends to $\pm|\tau|^{1/2}$ as $x \to \pm\infty$ and satisfies

$$\delta F/\delta\psi = (\tau + \psi^2)\psi - d^2\psi/dx^2 = 0$$ [7]
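That the tanh profile [6] solves the stationary condition [7] can be checked numerically. The sketch below is illustrative only: the function names, the finite-difference step, and the sample values of $\tau$ are assumptions, not part of the original text.

```python
import numpy as np


def interface_profile(x, tau):
    """Planar interface solution [6] for tau < 0."""
    a = np.sqrt(abs(tau))
    return a * np.tanh(a * x / np.sqrt(2.0))


def stationarity_residual(x, tau, dx=1e-3):
    """Left-hand side of [7], with d^2(psi)/dx^2 from a central difference."""
    psi = interface_profile(x, tau)
    d2psi = (interface_profile(x + dx, tau) - 2.0 * psi
             + interface_profile(x - dx, tau)) / dx**2
    return (tau + psi**2) * psi - d2psi


x = np.linspace(-10.0, 10.0, 401)
max_residual = {tau: float(np.max(np.abs(stationarity_residual(x, tau))))
                for tau in (-1.0, -2.5)}
```

The residual is limited only by the finite-difference error, and the profile approaches $\pm|\tau|^{1/2}$ far from the interface, as stated above.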


It is well known that the fluctuations of $\psi$ are
increasingly enhanced near the critical point. The
renormalization group theory shows how the equilibrium
distribution $P_{\rm eq}\{\psi\}$ in eqn [3] depends on the
upper cutoff wave number $\Lambda$ of $\psi$, where we suppose
that $\psi$ consists of the Fourier components $\psi_k$ with
$k < \Lambda$ (Onuki 2002). In our phase-ordering problem
the shortest relevant spatial scale is the interface
width of the order of the thermal correlation length $\xi$
at the final temperature. Therefore, near criticality,
we may assume that the thermal fluctuations with
wave numbers larger than $\xi^{-1}$ have been eliminated
in the model (or $\Lambda \sim \xi^{-1}$ at the starting point).


Domain Growth 


Thermodynamic instability occurs when $\tau$ is
changed from a positive value $\tau_i$ to a negative
value $\tau$ at $t = 0$. We here assume $h = 0$. We set
$\tau = -1$ using the scaling. At long wavelengths $k < 1$,
small plane wave fluctuations with wave vector $k$
grow exponentially as

$$\psi_k(t) \sim \exp[(1 - k^2)t]$$ [8]

with the growth rate largest at $k = 0$. This suggests
that the nonlinear term in eqn [1] becomes crucial
after a transient time. Numerically obtained snapshots
of the subsequent $\psi(r,t)$ are shown in Figure 1
in two dimensions (2D), where we can see the
coarsening of the patterns.

Figure 1 Time evolution of $\psi$ in model [1] in 2D with system
length = 128. The numbers are the times after quenching. Noise
is added, but is not essential for large patterns or in the late
stage. Reproduced with permission from Onuki A (2002) Phase
Transition Dynamics. Cambridge, UK: Cambridge University
Press.

The characteristic domain size $\ell(t)$ grows algebraically as

$$\ell(t) \sim t^a$$ [9]

where $a = 1/2$ is known for the model [1]. Scattering
experiments detect the time-dependent correlation

$$g(r,t) = \langle\delta\psi(r + r_0, t)\,\delta\psi(r_0, t)\rangle$$ [10]

$$S(k,t) = \int dr\, g(r,t)\, e^{ik\cdot r}$$ [11]

where $S(k,t)$ is called the structure factor. We
assume the translational invariance and the spatial
isotropy after the thermal average $\langle\cdots\rangle$. If $\tau_i \gg 1$,
the quartic term in $F$ is negligible, leading to the
initial structure factor

$$S(k,0) \cong \epsilon/(\tau_i + k^2)$$ [12]

which is produced by the thermal fluctuations.
However, when the domain size $\ell(t)$ much exceeds
the microscopic length (lattice constant), the following
scaling behavior emerges:

$$g(r,t) = G(r/\ell(t))$$ [13]

$$S(k,t) = \ell(t)^d\, Q(\ell(t)k)$$ [14]

where $d$ is the space dimensionality and $G(x)$ and $Q(x)$
are the scaling functions of order unity for $x \sim 1$. The
correlation on the scale of $\ell(t)$ in eqn [13] arises
from large-scale domain structures, while eqn [14]
is simply its Fourier transformation. The maximum
of the structure factor grows as $\ell(t)^d$. When
$\epsilon \ll 1$, however, there can be a well-defined initial
stage in which $S(k,t)$ grows exponentially at long
wavelengths.
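A quench of model [1] is easy to reproduce on a small grid. The following is a minimal sketch, not the simulation behind Figure 1: it assumes a noise-free explicit Euler scheme, unit grid spacing, periodic boundaries, and a 32 × 32 lattice, and the function names and parameters are hypothetical.

```python
import numpy as np


def laplacian(psi):
    """Five-point Laplacian with periodic boundaries (grid spacing 1)."""
    return (np.roll(psi, 1, 0) + np.roll(psi, -1, 0)
            + np.roll(psi, 1, 1) + np.roll(psi, -1, 1) - 4.0 * psi)


def free_energy(psi, tau=-1.0, h=0.0):
    """Discrete version of the Ginzburg-Landau free energy [4]."""
    gx = np.roll(psi, -1, 0) - psi   # forward differences
    gy = np.roll(psi, -1, 1) - psi
    density = (0.5 * tau * psi**2 + 0.25 * psi**4
               + 0.5 * (gx**2 + gy**2) - h * psi)
    return float(density.sum())


def quench(n=32, steps=400, dt=0.05, tau=-1.0, h=0.0, seed=0):
    """Integrate eqn [1] without noise, starting from small random fluctuations."""
    rng = np.random.default_rng(seed)
    psi = 0.01 * rng.standard_normal((n, n))
    energies = [free_energy(psi, tau, h)]
    for _ in range(steps):
        psi = psi + dt * (laplacian(psi) - tau * psi - psi**3 + h)
        energies.append(free_energy(psi, tau, h))
    return psi, np.array(energies)


psi_final, energies = quench()
```

Because the noise-free dynamics is a gradient flow of $F$, the recorded free energy decreases monotonically while the field saturates toward $\pm 1$ inside the growing domains.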


We may explain the roles of the terms on the
right-hand side of eqn [1] in phase ordering in a
simple manner.

1. The linear term $-\tau\psi$ triggers instability for $\tau < 0$.

2. The nonlinear term $-\psi^3$ gives rise to saturation
of $\psi$ into $\pm 1$. To see this, we neglect $\nabla^2\psi$ and $\theta$
to have $\partial\psi/\partial t = (1 - \psi^2)\psi$ for $\tau = -1$. This
equation is solved to give

$$\psi(t) = \psi_0\Big/\sqrt{\psi_0^2 + (1 - \psi_0^2)e^{-2t}}$$ [15]

where $\psi_0 = \psi(0)$ is the initial value. Thus, $\psi \to 1$
for $\psi_0 > 0$ and $\psi \to -1$ for $\psi_0 < 0$ as $t \to \infty$.

3. The gradient term limits the instability only in
the long wavelength region $k < 1$ in the initial
stage (see eqn [8]) and creates the interfaces in
the late stage (see eqn [7]).

4. The noise term $\theta$ is relevant only in the early
stage where $\psi$ is still on the order of the initial
thermal fluctuations. The range of the early stage
is of order 1 for $\epsilon > 1$, but weakly grows as
$\ln(1/\epsilon)$ for $\epsilon \ll 1$. The noise term can be
neglected once the fluctuations much exceed the
thermal level.

5. If $h$ is a small positive number, it favors growth
of regions with $\psi \cong 1$.
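The saturation described in item 2 can be checked by integrating the local equation numerically and comparing with the closed form [15]. This is a small sketch; the Runge-Kutta integrator, step count, and initial values are illustrative assumptions.

```python
import numpy as np


def psi_exact(t, psi0):
    """Closed-form solution [15] of d(psi)/dt = (1 - psi^2) psi."""
    return psi0 / np.sqrt(psi0**2 + (1.0 - psi0**2) * np.exp(-2.0 * t))


def psi_rk4(t_end, psi0, steps=2000):
    """Classical fourth-order Runge-Kutta integration of the same equation."""
    def rhs(p):
        return (1.0 - p**2) * p

    dt = t_end / steps
    p = psi0
    for _ in range(steps):
        k1 = rhs(p)
        k2 = rhs(p + 0.5 * dt * k1)
        k3 = rhs(p + 0.5 * dt * k2)
        k4 = rhs(p + dt * k3)
        p += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


# Compare numerical and analytic solutions at t = 5 for several initial values.
errors = {psi0: abs(psi_rk4(5.0, psi0) - psi_exact(5.0, psi0))
          for psi0 in (0.01, -0.3, 1.5)}
```

Even a very small initial value such as $\psi_0 = 0.01$ is driven to the saturated value on a time scale of order $\ln(1/|\psi_0|)$, which is the origin of the logarithmic duration of the early stage mentioned in item 4.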


Interface Dynamics 


At long times $t \gg 1$ domains with typical size $\ell(t)$
are separated by sharp interfaces and the thermal
noise is negligible. Allowing the presence of a small
positive $h$, we may approximate the free energy $F$ as

$$F = \sigma S(t) - 2hV_+(t) + \text{const.}$$ [16]

where $\sigma$ is a constant (surface tension), $S(t)$ is the
surface area, and $V_+(t)$ is the volume of the
regions with $\psi \cong 1$. In this stage the interface velocity
$v_{\rm int} = \mathbf{v}_{\rm int}\cdot\mathbf{n}$ is given by the Allen-Cahn formula
(Allen and Cahn 1979):

$$v_{\rm int} = -K + (2/\sigma)h$$ [17]

The normal unit vector $\mathbf{n}$ is from a region with $\psi \cong 1$
to a region with $\psi \cong -1$. $K$ is the sum of the
principal curvatures $1/R_1 + 1/R_2$ in 3D. This equation
can be derived from eqn [1]. If the interface
position $r_a$ moves to $r_a + \delta\zeta\,\mathbf{n}$ infinitesimally, the
surface area changes by $\delta S = \int da\, K\delta\zeta$, where $\int da\cdots$
denotes the surface integral. Therefore, $F$ in eqn [16]
changes in time as

$$\frac{d}{dt}F = \int da\,(\sigma K - 2h)\,v_{\rm int} \leq 0$$ [18]

which is non-positive-definite owing to eqn [17].
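The sign of the free-energy change follows in one line once the Allen-Cahn formula [17] is substituted into the surface integral; the short derivation below is a worked check using only eqns [16] and [17]:

```latex
\frac{d}{dt}F
  = \int da\,(\sigma K - 2h)\,v_{\rm int}
  = -\sigma \int da \left(-K + \frac{2h}{\sigma}\right) v_{\rm int}
  = -\sigma \int da\, v_{\rm int}^{2} \;\le\; 0 .
```

The free energy therefore decreases unless every interface is at rest.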
Furthermore, we may draw three results from eqn [17]. 
1. If we set $v_{\rm int} \sim \ell(t)/t$ and $K \sim 1/\ell(t)$, we obtain
$a = 1/2$ in the growth law [9].

2. In phase ordering under very small positive $h$,
the balance $1/\ell(t) \sim h/\sigma$ yields the crossover
time $t_h \sim h^{-2}$. For $t < t_h$ the effect of $h$ is small,
while for $t > t_h$ the region with $\psi \cong 1$ becomes
predominant.

3. A spherical droplet with $\psi \cong 1$ evolves as

$$\frac{\partial}{\partial t}R = -\frac{2}{R} + \frac{2h}{\sigma}$$ [19]

from which the critical radius is determined as

$$R_c = \sigma/h$$ [20]

A droplet with $R > R_c$ ($R < R_c$) grows (shrinks).
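The role of the critical radius can be seen by integrating the droplet equation [19] directly. The sketch below is illustrative: the explicit Euler scheme, the cutoff radius at which a droplet is counted as vanished, and the values of $\sigma$ and $h$ are assumptions chosen for the example.

```python
def droplet_radius(R0, h, sigma, t_end, dt=1e-3, R_min=0.05):
    """Integrate dR/dt = -2/R + 2h/sigma [19] with explicit Euler.

    Returns 0.0 once the radius drops below R_min (droplet treated as vanished),
    which avoids stepping into the 1/R singularity.
    """
    R = R0
    for _ in range(int(round(t_end / dt))):
        R += dt * (-2.0 / R + 2.0 * h / sigma)
        if R <= R_min:
            return 0.0
    return R


sigma, h = 1.0, 0.1          # illustrative values
R_c = sigma / h              # critical radius [20]

R_super = droplet_radius(1.2 * R_c, h, sigma, t_end=60.0)  # starts above R_c
R_sub = droplet_radius(0.8 * R_c, h, sigma, t_end=60.0)    # starts below R_c
```

With these values the supercritical droplet keeps growing, while the subcritical one disappears in a finite time of roughly 40 time units.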

We mention a statistical theory of interface dynamics
at $h = 0$ by Ohta (1982). There, a smooth subsidiary
field $u(r,t)$ is introduced to represent surfaces by
$u = \text{const}$. The differential geometry is much simplified
in terms of such a field. The two-phase boundaries are
represented by $u = 0$. If all the surfaces follow $v_{\rm int} = -K$
in eqn [17] in the whole space, $u$ obeys

$$\frac{\partial}{\partial t}u = \Big[\nabla^2 - \sum_{ij} n_i n_j \nabla_i\nabla_j\Big]u$$ [21]

where $\nabla_i = \partial/\partial x_i$ and $n_i = \nabla_i u/|\nabla u|$. This equation
becomes a linear diffusion equation if $n_i n_j\nabla_i\nabla_j$ is
replaced by $d^{-1}\delta_{ij}\nabla^2$. Then $u$ can be expressed in
terms of its initial value and the correlation function
of $\psi(r,t)$ ($= u(r,t)/|u(r,t)|$ in the late stage) is
calculated in the form of eqn [13] with


$$G(x) = \frac{2}{\pi}\sin^{-1}\left[\exp\left(-\frac{x^2}{8(1 - 1/d)}\right)\right]$$ [22]


which excellently agrees with simulations. 


Spinodal Decomposition in Conserved 
Systems 


The order parameter $\psi$ can be a conserved variable
such as the density or composition in fluids or
alloys. With the same $F$ in eqn [4], a simple dynamic
model in such cases reads

$$\frac{\partial}{\partial t}\psi = \nabla^2\frac{\delta F}{\delta\psi} - \nabla\cdot\mathbf{j}^R$$ [23]

Here $\mathbf{j}^R$ is the random current characterized by

$$\langle j^R_\alpha(r,t)\, j^R_\beta(r',t')\rangle = 2\epsilon\,\delta_{\alpha\beta}\,\delta(r - r')\,\delta(t - t')$$ [24]

which ensures the equilibrium distribution [3] of $\psi$.
However, the noise $\mathbf{j}^R$ is negligible in late-stage
phase separation as in the nonconserved case. Note
that $h$ in the conserved case is the chemical potential
conjugate to $\psi$ and, if it is homogeneous, it vanishes
in the dynamic equation [23]. In experiments the
average order parameter

$$M = \langle\psi\rangle = \int dr\,\psi(r)/V$$ [25]

is used as a control parameter instead of $h$, where
the integral is within the system with volume $V$. If
there is no flux from outside, $M$ is constant in time.
Here the instability occurs below the so-called
spinodal $M^2 < 1/3$ ($M^2 < |\tau|/3$ for general $\tau < 0$).
In fact, small fluctuations with wave vector $k$ grow
exponentially as

$$\psi_k(t) \sim \exp[k^2(1 - 3M^2 - k^2)t]$$ [26]

right after the quenching as in eqn [8]. The growth rate
is largest at an intermediate wave number $k = k_m$ with

$$k_m = [(1 - 3M^2)/2]^{1/2}$$ [27]

This behavior and the exponential growth of the
structure factor have been observed in polymer mixtures
where the parameter $\epsilon$ in eqn [3] or [12] is expected to be
small (Onuki 2002). In late-stage coarsening the peak
position of $S(k,t)$ decreases in time as

$$k_m(t) \sim 2\pi/\ell(t)$$ [28]

in terms of the domain size $\ell(t)$. The growth
exponent in eqn [9] is given by 1/3 for the simple
model [23] (see eqn [33] below).
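The position of the fastest-growing mode in the early stage can be confirmed numerically from the linear growth rate in eqn [26]. The sketch below is illustrative; the grid resolution and the sample values of $M$ are arbitrary choices, and the function names are hypothetical.

```python
import numpy as np


def growth_rate(k, M):
    """Linear growth rate k^2 (1 - 3 M^2 - k^2) from eqn [26] (tau = -1)."""
    return k**2 * (1.0 - 3.0 * M**2 - k**2)


def fastest_mode_numeric(M, n=200001):
    """Locate the maximum of the growth rate on a fine grid of wave numbers."""
    k = np.linspace(0.0, 1.0, n)
    return float(k[np.argmax(growth_rate(k, M))])


def fastest_mode_formula(M):
    """Analytic peak position k_m from eqn [27]."""
    return float(np.sqrt((1.0 - 3.0 * M**2) / 2.0))


peaks = {M: (fastest_mode_numeric(M), fastest_mode_formula(M))
         for M in (0.0, 0.1, 0.4)}
```

Outside the spinodal, for example at $M = 0.6$, the growth rate is negative for every $k > 0$, so small fluctuations decay and phase separation has to start by nucleation instead.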

Figure 2 shows the patterns after quenching in 2D.
For $M = 0$ the two phases are symmetric and the
patterns are bicontinuous, while for $M \neq 0$ the
minority phase eventually appears as droplets in the
percolating region of the majority phase.

Figure 2 Time evolution of $\psi$ in model [23] in 2D with system
length = 128 without thermal noise: (a) $M = 0$ and (b) $M = 0.1$.
The numbers are the times after quenching. Reproduced with
permission from Onuki A (2002) Phase Transition Dynamics.
Cambridge, UK: Cambridge University Press.


Interface Dynamics 


Interface dynamics in the conserved case is much
more complicated than in the nonconserved case,
because the coarsening can proceed only through
diffusion. Long-distance correlations arise among
the domains and the interface velocity cannot be
written in terms of the local quantities like the
curvature. As a simple example, we give the counterpart
of eqn [19]. In 3D a spherical droplet with $\psi \cong 1$
appears in a nearly homogeneous matrix with $\psi \cong M$
far from the droplet. The droplet radius $R$ is then
governed by (Lifshitz and Slyozov 1961)

$$\frac{\partial}{\partial t}R = \frac{D}{R}\left(\Delta - \frac{2d_0}{R}\right)$$ [29]

where $\Delta = (M + 1)/2$ is called the supersaturation,
while $D$ and $d_0$ are constants (equal to 2 and $\sigma/8$,
respectively, after the scaling). The critical radius is
written as

$$R_c = 2d_0/\Delta$$ [30]

The general definition of the supersaturation is

$$\Delta = (M - \psi^{(1)})/(\psi^{(2)} - \psi^{(1)})$$ [31]

Here the equilibrium values of $\psi$ are written as $\psi^{(1)}$
and $\psi^{(2)}$ and $M$ is supposed to be slightly different
from $\psi^{(1)}$.

Lifshitz and Slyozov (1961) analyzed domain coarsening
in binary AB alloys when the volume fraction $q$
of the A-rich domains is small. They noticed that the
supersaturation $\Delta$ around each domain decreases in
time with coarsening. That is, the A component atoms
in the B-rich matrix are slowly absorbed onto the
growing A-rich domains, while a certain fraction of the
A-rich domains disappear. Thus, $q(t)$ and $\Delta(t)$ both
depend on time, but satisfy the conservation law


$q(t) + \Delta(t) = \Delta(0) = (M + 1)/2$    [32]


With this overall constraint, they found the 
asymptotic late-stage behavior 


$\ell(t) \sim t^{1/3}$    [33]


where $\ell(t)$ is the average droplet radius. Notice that
this behavior is consistent with the droplet equation
[29], where each term is of order $R/t \sim t^{-2/3}$.
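
As a hedged illustration of the droplet equation, the following sketch integrates $\partial R/\partial t = (D/R)(\Delta - 2d_0/R)$ with a simple forward-Euler step. The Lifshitz-Slyozov form of the equation is assumed here; the function names, the step size, and the parameter values ($D = 2$, $d_0 = 0.125$, $\Delta = 0.1$) are illustrative choices rather than values taken from the text.

```python
def critical_radius(delta, d0=0.125):
    """Critical radius R_c = 2 d0 / delta, as in eqn [30]."""
    return 2.0 * d0 / delta


def droplet_velocity(radius, delta, diffusion=2.0, d0=0.125):
    """Assumed right-hand side dR/dt = (D/R) (delta - 2 d0 / R)."""
    return (diffusion / radius) * (delta - 2.0 * d0 / radius)


def integrate_radius(r_start, delta, t_end, dt=1e-3, r_min=0.05):
    """Forward-Euler integration; stops early once the droplet has nearly dissolved."""
    radius, time = r_start, 0.0
    while time < t_end and radius > r_min:
        radius += dt * droplet_velocity(radius, delta)
        time += dt
    return radius


if __name__ == "__main__":
    delta = 0.1
    print("R_c =", critical_radius(delta))
    print("start above R_c:", integrate_radius(3.0, delta, t_end=5.0))
    print("start below R_c:", integrate_radius(2.0, delta, t_end=5.0))
```

Under these assumptions a droplet that starts above $R_c$ grows and one that starts below it shrinks, which is the competition that drives coarsening.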


Nucleation 


In metastable states the free energy is at a local
minimum but not at the true minimum. Such states
are stable for infinitesimal fluctuations, but rare
spatially localized fluctuations, called critical nuclei,
can continue to grow, leading to macroscopic phase
ordering (Onuki 2002, Debenedetti 1996). The birth
of a critical droplet is governed by the Boltzmann
factor $\exp(-F_c/k_BT)$ at finite temperatures, where
$F_c$ is the free energy needed to create a critical
droplet and $k_BT$ is the thermal energy with $k_B$ being
the Boltzmann constant. In this section we explicitly
write $k_BT$, but we may scale $\psi$ and space such that
T= —1 at the final temperature. 


Droplet Free Energy and Experiments 


In the nonconserved case we prepare a spin-down state
with $\psi = -1$ in the time region $t < 0$ and then apply a
small positive field $h$ at $t = 0$. For $t > 0$ a spin-up
droplet with radius $R$ requires a free energy change


$F(R) = 4\pi\sigma R^2 - \dfrac{8\pi}{3} h R^3$    [34]


The first term is the surface free energy and the
second term is the bulk decrease due to $h$. The
critical radius $R_c$ in eqn [20] gives the maximum of
$F(R)$ given by


$F_c = \dfrac{4\pi}{3}\sigma R_c^2$    [35]

In fact, $F'(R) = \partial F(R)/\partial R$ is written as

$F'(R) = 8\pi\sigma(R - R^2/R_c)$    [36]
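
The relation between the slope [36] and the barrier height can be checked numerically. The sketch below is a minimal illustration: it assumes the free energy obtained by integrating eqn [36] from $R = 0$, namely $F(R) = 4\pi\sigma(R^2 - 2R^3/3R_c)$, and the function names and the values $\sigma = 1$, $R_c = 2$ are hypothetical.

```python
import math


def droplet_free_energy(radius, sigma, r_c):
    """F(R) = 4 pi sigma (R^2 - 2 R^3 / (3 R_c)), from integrating eqn [36] with F(0) = 0."""
    return 4.0 * math.pi * sigma * (radius**2 - 2.0 * radius**3 / (3.0 * r_c))


def free_energy_slope(radius, sigma, r_c):
    """F'(R) = 8 pi sigma (R - R^2 / R_c), eqn [36]."""
    return 8.0 * math.pi * sigma * (radius - radius**2 / r_c)


def barrier_height(sigma, r_c):
    """F_c = F(R_c) = (4 pi / 3) sigma R_c^2."""
    return 4.0 * math.pi * sigma * r_c**2 / 3.0


if __name__ == "__main__":
    sigma, r_c = 1.0, 2.0
    print("F'(R_c) =", free_energy_slope(r_c, sigma, r_c))
    print("F(R_c)  =", droplet_free_energy(r_c, sigma, r_c))
    print("F_c     =", barrier_height(sigma, r_c))
```

The slope vanishes at $R = R_c$ and the free energy there equals the barrier $F_c$, so droplets smaller than $R_c$ lower their free energy by shrinking and larger ones by growing.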


In conserved systems such as fluids or alloys, we 
lower the temperature slightly below the coexistence 
curve with the average order parameter M held fixed. 
We again obtain the droplet free energy [34], but 


$h = (\sigma/2d_0)\Delta$    [37]


in terms of the (initial) supersaturation $\Delta = \Delta(0)$.
Let the equilibrium values $\psi^{(1)}$ and $\psi^{(2)}$ in the two
phases be written as $\pm A(T_c - T)^\beta$ with $A$ and $\beta$
being constants ($\beta \cong 1/3$ as $T \to T_c$). For each given
$M$, we define the coexistence temperature $T_{cx}$ by
$M = \psi^{(2)} = -A(T_c - T_{cx})^\beta$. In nucleation experiments
the final temperature $T$ is slightly below $T_{cx}$
and $\delta T = T_{cx} - T$ is a positive temperature increment.
For small $\delta T$ we find


$\Delta \cong \dfrac{\beta}{2}\,\dfrac{\delta T}{T_c - T_{cx}}$    [38]
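
A short numerical check of the small-$\delta T$ estimate may help. It is a sketch under stated assumptions: the equilibrium values are taken as $\pm A(T_c - T)^\beta$ and $M = -A(T_c - T_{cx})^\beta$, so that the amplitude $A$ cancels in $\Delta = (M - \psi^{(2)})/(\psi^{(1)} - \psi^{(2)})$; the function names and the temperatures used are hypothetical.

```python
def supersaturation_exact(t_final, t_c, t_cx, beta=1.0 / 3.0):
    """Delta = (M - psi2) / (psi1 - psi2) with psi1,2 = +/- A (T_c - T)^beta, M = -A (T_c - T_cx)^beta."""
    ratio = (t_c - t_cx) / (t_c - t_final)
    return 0.5 * (1.0 - ratio**beta)


def supersaturation_linear(t_final, t_c, t_cx, beta=1.0 / 3.0):
    """Small-deltaT estimate: Delta ~ (beta / 2) deltaT / (T_c - T_cx)."""
    return 0.5 * beta * (t_cx - t_final) / (t_c - t_cx)


if __name__ == "__main__":
    t_c, t_cx, t_final = 1.0, 0.9, 0.899  # illustrative, dimensionless temperatures
    print("exact :", supersaturation_exact(t_final, t_c, t_cx))
    print("linear:", supersaturation_linear(t_final, t_c, t_cx))
```

For these values the two expressions agree to better than one percent, and the agreement improves as $\delta T$ decreases.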


Droplet Size Distribution and Nucleation Rate 


In a homogeneous metastable matrix, droplets of the 
new phase appear as rare thermal fluctuations. We 
describe this process by adding a thermal noise term 
to the droplet equation [19] or [29]. The droplet size 
distribution $n(R,t)$ then obeys the Fokker-Planck
equation


$\dfrac{\partial n}{\partial t} = \dfrac{\partial}{\partial R} L(R) \left[\dfrac{\partial}{\partial R} + \dfrac{F'(R)}{k_BT}\right] n$    [39]


Here $n(R,t)\,dR$ denotes the droplet number density
in the range $[R, R + dR]$. We determine the kinetic
coefficient $L(R)$ such that


$v(R) = -L(R)F'(R)/k_BT$    [40]


is the right-hand side of eqn [19] or [29]. It is
equal to $\partial R/\partial t$ when the thermal noise is
neglected. Thus, $L(R) \propto R^{-2}$ or $R^{-3}$ for the
nonconserved or conserved case. The second derivative
$(\partial/\partial R)L(R)(\partial/\partial R)$ in eqn [39] stems from the
thermal noise and is negligible for $R - R_c \gg 1$ in
3D (Onuki 2002). Hence, for $R - R_c \gg 1$, the
droplets follow the deterministic equation [19] or
[29] and $n$ obeys

$\dfrac{\partial n}{\partial t} = -\dfrac{\partial}{\partial R}[v(R)n]$    [41]
In Figure 3, we plot the solution of eqn [39] for
the conserved case with $F_c/k_BT = 17.4$ (Onuki
2002). The time is measured in units of $1/\Gamma_c$,
which is the timescale of a critical droplet defined by


$\Gamma_c = (\partial v(R)/\partial R)_{R=R_c}$    [42]


We notice $\Gamma_c \propto R_c^{-3}$ from eqn [29], so $\Gamma_c$ is small.
The initial distribution is given by


$n(R,0) = n_0 \exp(-4\pi\sigma R^2/k_BT)$    [43]


Figure 3 Time evolution of the droplet size distribution $n(R,t)$
on a semilogarithmic scale ($\log_{10} n(R,t)$ versus $R/R_c$) as a
solution of eqn [39] in the 3D conserved case. The first 11 curves
correspond to the times at $\Gamma_c t = 0, 1, \ldots$ and 10. The last
four curves are those at $\Gamma_c t = 15, 20, 25,$ and 30. Reproduced
with permission from Onuki A (2002) Phase Transition Dynamics.
Cambridge, UK: Cambridge University Press.
with $n_0$ being a constant number density. This form
has been observed in computer simulations as the
droplet size distribution on the coexistence curve
($h = 0$). Figure 3 indicates that $n(R,t)$ tends to a
steady solution $n_s(R)$ which satisfies


$L(R)\left[\dfrac{\partial}{\partial R} + \dfrac{F'(R)}{k_BT}\right] n_s = -I$    [44]


where $I$ is a constant. Imposing the condition $n_s(R) \to 0$
as $R \to \infty$, we integrate the above equation as


$n_s(R) = I \displaystyle\int_R^\infty dR_1\, \dfrac{1}{L(R_1)} \exp\left[\dfrac{F(R_1) - F(R)}{k_BT}\right]$    [45]

For $R - R_c \gg 1$ we may replace $F(R_1) - F(R)$
by $F'(R)(R_1 - R)$ in the integrand of eqn [45] to
obtain

$n_s(R) = I/v(R)$    [46]

which also follows from eqn [41]. Thus

$n_s(R)\,dR = I\,dt \quad (dR = v(R)\,dt)$    [47]


This means that $I$ is the nucleation rate of droplets
with radii larger than $R_c$ emerging per unit volume
and per unit time. Furthermore, as $R \to 0$, we
require $n_s(R) \to n_0 = \text{const.}$ in eqn [43] so that


$n_0 = I \displaystyle\int_0^\infty dR\, \dfrac{1}{L(R)} \exp\left[\dfrac{F(R)}{k_BT}\right]$    [48]

where the integrand becomes maximum
around $R_c$. Using the expansion $F(R) = F_c +
F''(R_c)(R - R_c)^2/2 + \cdots$, we obtain the famous
formula for the nucleation rate


$I = I_0 \exp(-F_c/k_BT)$    [49]

$\phantom{I} = I_0 \exp(-C_0/\Delta^2)$    [50]


where the coefficient $I_0$ is of order $n_0\Gamma_c$. The second
line holds in the 3D conserved case. Here, Cy ~ 107 
typically and Ip is a very large number in units of 
cm™ st, say, 10°". Then the exponential factor in I 
changes abruptly from a very small to a very large 
number with only a slight increase of $\Delta$ at small
$\Delta \ll 1$. For example, if $C_0/\Delta^2 = 50$, $I$ is increased
by $\exp(100\,\delta\Delta/\Delta)$ with a small increase of $\Delta$ to
$\Delta + \delta\Delta$. This factor can be of order $10^2$ even for
$\delta\Delta/\Delta = 0.05$. Unless very close to criticality, simple
metastable fluids become opaque suddenly with
increasing $\Delta$ or $\delta T$ at a rather definite cloud point. In


near-critical fluids, however, Ip itself becomes small 
(oc€*) such that the cloud point considerably depends 
on the experimental timescale (observation time). 
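
The sharpness of the cloud point can be made concrete with a small calculation. The sketch below assumes the 3D conserved-case form $I = I_0 \exp(-C_0/\Delta^2)$ and compares the exact ratio $I(\Delta + \delta\Delta)/I(\Delta)$ with its first-order estimate $\exp(2(C_0/\Delta^2)\,\delta\Delta/\Delta)$; the function names are hypothetical and the prefactor $I_0$ cancels in the ratio.

```python
import math


def rate_increase_exact(barrier, rel_increase):
    """I(D + dD) / I(D) for I = I0 exp(-C0 / D^2); barrier = C0 / D^2, rel_increase = dD / D."""
    return math.exp(barrier * (1.0 - 1.0 / (1.0 + rel_increase) ** 2))


def rate_increase_linear(barrier, rel_increase):
    """First-order estimate exp(2 * barrier * dD / D)."""
    return math.exp(2.0 * barrier * rel_increase)


if __name__ == "__main__":
    barrier, rel_increase = 50.0, 0.05
    print("exact ratio :", rate_increase_exact(barrier, rel_increase))
    print("linear ratio:", rate_increase_linear(barrier, rel_increase))
```

With a barrier of 50 and a 5% increase of the supersaturation, both estimates give a rate enhancement of order one hundred.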


Remarks 


The order parameter can be a scalar, a vector as in 
the Heisenberg spin system, a tensor as in liquid 
crystals, and a complex number as in superfluids 
and superconductors. In phase ordering a crucial 
role is played by topological singularities like 
interfaces in the scalar case and vortices in the 
complex number case. Furthermore, a rich variety of 
phase transition dynamics can be explained if the 
order parameter is coupled to other relevant 
variables in the free energy and/or in the dynamic 
equations. We mention couplings to velocity field in 
fluids, electrostatic field in charged systems, and 
elastic field in solids. Phase ordering can also be 
influenced profoundly by external fields such as 
electric field or shear flow. 


See also: Reflection Positivity and Phase Transitions; 
Renormalization: Statistical Mechanics and Condensed 
Matter; Statistical Mechanics of Interfaces; Topological 
Defects and Their Homotopy Classification. 
Introduction


Many aspects of our everyday life, from weather to 
boiling water for a cup of coffee, involve heat 
exchanges and variations of pressure and, as a 
result, a phase transition. The general theory behind 
these phenomena is thermodynamics, which studies 
fluids and macroscopic bodies under these and more 
general transformations. 

In the simple case of a one-component substance, 
the behavior under changes of temperature T and 
pressure P is described, according to the Gibbs 
phase rule, by a phase diagram such as the one in 
Figure 1. The curves in the (T, P) plane distinguish
regions where the substance is in its solid, liquid, 
and gas phases. Thus, in an experiment where we 
vary the pressure and temperature moving along a 
line which crosses a transition curve, we observe an 
abrupt and dramatic change at the crossing, when 
the system changes phase. As already stated, every- 
day life is an active source of examples of such 
phenomena. 

The picture is “far from innocent”: it states that air,
liquid, and solid are not different elements of nature, as
was long believed, but just different aspects of the same
thing. Substances are able to adapt to different external
conditions in dramatically different ways. What
properties of intermolecular forces are responsible for
such astonishing behavior? The question has been
extensively studied and it is the subject of the
present article, where it will be discussed in the
framework of statistical mechanics for continuous
systems. Before entering into the matter, let us mention
two basic motivations.





Figure 1 Phase diagram of a one-component substance.


As always, there is a “fundamental theory”
aspect; in this specific case it is the attempt to build an
atomistic theory that can also describe macroscopic
phenomena, thus ranging from the angstrom to the
kilometer scales. From an engineering point of view,
the target is, for instance, to understand why and
when a substance is an insulator, a conductor or,
maybe, a superconductor, and, more importantly,
how we should change its microscopic interactions
to produce such effects: this opens the way to
technologies which are indeed enormously affecting
our life.


Phase Transitions and Statistical 
Mechanics 


The modern theory of statistical mechanics is based 
upon the Gibbs hypothesis. In a classical (i.e., not 
quantum) framework, the macroscopic states are 
described by probability measures on a particle 
configuration phase space. The equilibrium states 
are then selected by the Gibbs prescription, which 
requires that the probability of observing a configuration
which has energy $E$ should be proportional
to $e^{-\beta E}$, where $\beta = 1/kT$, $k$ is the Boltzmann
constant, and $T$ the absolute temperature. These
are the “Gibbs measures” and the purpose of
statistical mechanics is to study their properties. A
prerequisite for the success of the theory is compatibility
with the principles of thermodynamics; the
theory should then be able to explain the origin of
the various phase diagrams and in particular to
determine the circumstances under which phase
transitions appear.

The theory, commonly called DLR, after 
Dobrushin, Lanford, and Ruelle, who, in the 
1960s, contributed greatly to its foundations, has 
a solid mathematical basis. Its main success is a
rigorous proof of consistency with thermodynamics, 
which is derived under the only assumption that 
surface effects are negligible, a condition which is 
mathematically achieved by studying the system in a 
“thermodynamic limit,” where the region containing 
the system invades the whole space. 

In the thermodynamic limit, the equilibrium states 
can no longer be defined by the Gibbs prescription, 
because the energy of configurations in the whole 
space, being extensive, is typically infinite. The 
problem has been solved by first proving conver- 
gence of the finite-volume Gibbs measures in the 
thermodynamic limit. After defining the limit states, 
called “DLR states,” as the equilibrium states of the 


54 Phase Transitions in Continuous Systems 


infinite systems, it is proved that the DLR states can 
be directly characterized (i.e., without using limit 
procedures) as the solutions of a set of equations, 
the “DLR equations,” which generalize the finite- 
volume Gibbs prescription. 

In terms of DLR states, the mathematical meaning 
of phase transitions becomes very clear and sharp. 
The starting point is the proof that the physical 
property that intensive variables in a pure phase 
have negligible fluctuations is verified by all the 
DLR measures which are in a special class, thus 
selected by this property, and which are therefore 
interpreted as “pure phases.” All the other DLR 
measures are proved to be mixtures, that is, general 
convex combinations, of the pure DLR states. Thus, 
in the DLR theory, the system is in a single phase 
when there is only one DLR state, at the given 
values of the thermodynamic parameters (e.g., 
temperature and chemical potential), while the 
system is at a phase transition if there are several 
distinct DLR states. 

While the theory beautifully clarifies the meaning 
of phase transitions, it does not say whether the 
phenomenon really occurs! This is maybe the main 
open problem in equilibrium statistical mechanics. A 
general proof of existence of phase diagrams is 
needed, which should at least capture the basic 
property behind the Gibbs phase rule, namely that in 
most of the space (of thermodynamic parameters) 
there is a single phase, with rare exceptions where 
several phases coexist. A more refined result should 
then indicate that coexistence occurs only on regular 
surfaces of positive codimension. 

There is, however, a general result of existence of 
the gaseous phase, with a proof of uniqueness of 
DLR measures when temperature is large and 
density low. Coexistence of phases is much less 
understood at a general level, but results for 
particular classes of models exist, for instance, in 
lattice systems at low temperatures. The prototype is 
the ferromagnetic Ising model in two or more 
dimensions, where indeed the full diagram has 
been determined, see Figure 2. The transition curve 





Figure 2 Phase diagram of the Ising ferromagnet. 


is the segment $\{0 < T < T_c, h = 0\}$ in the $(T, h)$
plane, $h$ being the magnetic field. In the upper
half-plane, there is a single phase with positive
magnetization, in the lower one with a negative value; at
$h = 0$, positive and negative magnetization states can
coexist, if the temperature is lower than the critical
value $T_c$. Correspondingly, there are, simultaneously,
a positive and a distinct negative DLR
state, which describe the two phases.

An analogous result is missing for systems of 
particles in the continuum, but there has been recent 
progress on the analysis of the liquid—vapor branch 
of the phase diagram, and the issue will be the main 
focus of this article. 


Sensitive Dependence on Boundary 
Conditions 


Phase transitions describe exceptional regimes where 
the system is in a critical state; this is why they are 
so interesting and difficult to study. As in chaotic 
systems, criticality corresponds to a “butterfly 
effect,” which, in a statistical-mechanics setting,
means changing far-away boundary conditions. 
Such changes affect the neighbors, which in turn 
influence their neighbors, and so on. In general, the 
effect decays with the distance but, at a phase
transition, it provokes an avalanche which propagates
throughout the system reaching all its points.
Its occurrence is not at all obvious, if we remember 
the stochastic nature of the theory. The domino 
effect described above can in fact, at each step, be 
subverted by stochastic fluctuations. The latter, in 
the end, may completely hide the effect of changing 
the boundary conditions. This is an instance of a 
competition between energy and entropy which is 
the ruling phenomenon behind phase transitions. 
This intuitive picture also explains the relevance 
of space dimensionality. In a many-dimensional 
space, the influence of the boundary conditions has 
clearly many more ways to percolate, in contrast to 
the one-dimensional case, where in fact there is a 
general result on the uniqueness of DLR measures 
and therefore absence of phase transitions, for short- 
range interactions. For pair potentials, “short”
means that the interaction energy between two
molecules, respectively at $r$ and $r'$, decays as
$|r - r'|^{-\alpha}$, $\alpha > 2$. There are results on the converse,
namely on the presence of phase transitions when 
the above condition is not satisfied, mainly for 
lattice systems, but with partial extensions also to 
continuous systems. One-dimensional and long- 
range cases are not the main focus of this article, 
and the issue will not be discussed further here. 


Ising Model 


In order to make the previous ideas quantitative, let 
us first describe the simple case of the Ising model. 
Ising spin configurations are collections $\{\sigma(x), x \in
\mathbb{Z}^d\}$ of magnetic moments $\sigma(x) \in \{\pm 1\}$ called spins.
In the nearest-neighbor case, the interaction between
two spins is $-J\sigma(x)\sigma(y)$, $J > 0$, if $x$ and $y$ are nearest
neighbors on $\mathbb{Z}^d$, or is vanishing otherwise. There
are, therefore, two ground states, one with all spins 
equal to +1 and the other one with all spins equal to 
—1. Since the Gibbs probability of higher energies 
vanishes as the temperature goes to zero, these are 
interpreted as the equilibrium states at temperature 
T=0. 
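
To make the energy bookkeeping concrete, here is a minimal sketch that evaluates the nearest-neighbor Ising energy on a small two-dimensional periodic lattice. The finite torus, the function name, and the choice $J = 1$ are illustrative assumptions; the article itself works on the infinite lattice.

```python
def ising_energy(spins, coupling=1.0):
    """Sum of -J s(x) s(y) over nearest-neighbor pairs on a periodic 2D lattice.

    Each bond is counted once by looking only at the right and lower neighbor.
    """
    n_rows, n_cols = len(spins), len(spins[0])
    energy = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            s = spins[i][j]
            energy -= coupling * s * spins[(i + 1) % n_rows][j]
            energy -= coupling * s * spins[i][(j + 1) % n_cols]
    return energy


if __name__ == "__main__":
    all_plus = [[1] * 4 for _ in range(4)]
    one_flip = [row[:] for row in all_plus]
    one_flip[1][2] = -1
    print("all plus :", ising_energy(all_plus))
    print("one flip :", ising_energy(one_flip))
```

On a 4 x 4 torus the two constant configurations share the minimal energy, and flipping a single spin costs $8J$, which is the energy penalty the Gibbs factor weighs against entropy.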

If T > 0, configurations with larger energy will 
appear, even though depressed by the Gibbs factor, 
but their occurrence is limited if T is small. In fact, 
in the ferromagnetic Ising model at zero magnetic 
field, dimensions $d \geq 2$, and low enough temperature,
it has been proved that there are two distinct
DLR measures, one called positive and the other
negative. The typical configurations in the positive
measure consist mainly of positive spins and, in
such an “ocean of positive spins,” there are rare and
small islands of negative spins. The same situation,
but with the positive and negative spins inter- 
changed, occurs in the negative DLR state. 

The selection of one of these two states can be 
made by choosing the positive or the negative 
boundary conditions, which shows how a surface 
effect, namely putting the boundary spins equal to 1 
or —1, has a volume effect, as most of the spins in the 
system follow the value indicated by the boundary 
values. This is all the more striking when we
note that each spin is random, yet a strong,
cooperative effect takes over and controls the system.

The original proof due to Peierls exploits the spin- 
flip symmetry of the Ising interaction, but it has 
subsequently been extended to a wider class of 
systems on the lattice, in the general framework of 
the “Pirogov—Sinai theory.” This theory studies the 
low-temperature perturbations of ground states and 
it applies to many lattice systems, proving the 
existence of a phase transition and determining the 
structure of the phase diagram in the low- 
temperature region. The theory, however, does not 
cover continuous systems, where the low-temperature 
regime is essentially not understood, with the notable 
exception of the Widom and Rowlinson model. 


Two Competing Species in the Continuum 


The simplest version of the Widom and Rowlinson 
model has two types of particles, red and black, 
which are otherwise identical. Particles are massive
points and the only interaction is a hard-core
interaction among different colors, namely a red and
a black particle cannot be closer than $2R_0$, $R_0 > 0$
being the hard-core radius.

The order parameter for the phase transition is the 
particle color. For large values of the chemical 
potential, and thus large densities, there are two 
states, one essentially red, the other black, while, if 
the density is low, the colors “are not separated” 
and there is a unique state. The proof of the 
statement starts by dividing the particles of a
configuration into clusters, each cluster being a
maximal connected component, where two particles
are called connected when their mutual distance is
$< 2R_0$. Then, in each cluster, all particles have the
same color (because of the hard-core exclusion 
between black and red), and the color is either 
black or red, with equal probability. 
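
The cluster construction can be sketched in a few lines. The code below is an illustration under assumptions: particles are points in the plane, clusters are found with a simple union-find over all pairs, and each cluster receives one color at random; the function names and the sample points are hypothetical.

```python
import math
import random


def find_clusters(points, r0):
    """Label each point by its cluster; points closer than 2 * r0 are connected."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < 2.0 * r0:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


def color_clusters(labels, rng):
    """Give every cluster a single color, red or black with equal probability."""
    color_of = {root: rng.choice(["red", "black"]) for root in sorted(set(labels))}
    return [color_of[label] for label in labels]


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
    labels = find_clusters(points, r0=0.75)
    print(labels, color_clusters(labels, random.Random(0)))
```

Fixing the color of one particle fixes the color of its whole cluster, so an infinite cluster would carry that information arbitrarily far.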

The question of phase transition is then related to 
cluster percolation, namely the existence of clusters 
which extend to infinity. If this occurs, then the influence 
of fixing the color of a particle may propagate infinitely 
far away, hence the characteristic “sensitive dependence 
phenomenon” of phase transitions. Percolation, and
hence a phase transition, has been proved to occur if the
density is large and not to occur if it is small. The above
argument is a more recent version of the original proof
by Ruelle, which goes back to the 1970s.

The key element for the appearance of the phase 
transition is the competition between two different 
components, so that the analysis is not useful in 
explaining the mechanisms for coexistence in the 
case of identical particles, which are considered in 
the following. 


Coarse Graining Transformations 


The Peierls argument in Ising systems does not seem 
to extend to the continuum, certainly not in a trivial 
way. The ground states, in fact, will not be as simple 
as the constant configurations of a lattice system; 
they will instead be periodic or quasiperiodic config- 
urations with a complicated dependence on the 
particle interactions. The typical fluctuations when 
we raise the temperature above zero have a much 
richer and complex structure and are correspondingly 
more difficult to control. Closeness to the ground 
states at nonzero temperature, as described in the 
Ising model, would prove the spontaneous breaking 
of the Euclidean symmetries and the existence of a 
crystalline phase. The question is, of course, of great 
interest, but it looks far beyond the reach of our 
present mathematical techniques. 
The simpler Ising picture should instead reappear
at the liquid-vapor coexistence line. Looking at the
fluid on a proper spatial scale, we should in fact see
a density that is essentially constant, except for
small and rare fluctuations. Its value will differ in
the liquid and in the gaseous states, $\rho_{gas} < \rho_{liq}$.
Therefore, density is an order parameter for the
transition and plays the role of the spin magnetization
in the Ising picture.

There are general mathematical techniques devel- 
oped to translate these ideas into proofs; they involve
“coarse graining,” “block spin transformations,” and 
“renormalization group” procedures. The starting 
point is to ideally divide the space into cells. Their size 
should be chosen to be much larger than the typical 
microscopic distance between molecules, to depress 
fluctuations of the particle density in a cell. To study 
the probability distribution of the latter, we integrate 
out all the other degrees of freedom. After such a 
coarse graining, we are left with a system of spins on a 
lattice, the lattice sites labeling the cells (also called 
blocks) and each spin (also called block spin) giving 
the value of the density of particles in the correspond- 
ing cell. Translated into the language of block spins, 
the previous physical analysis of the state of the fluid 
suggests that most probably, in each block the density
is approximately equal to either $\rho_{liq}$ or $\rho_{gas}$, and the
same in different blocks, except in the case of small
and rare fluctuations. If we represent the probability
distribution of the block spins in terms of a Gibbs
measure (as always possible if the system is in a
bounded region), the previous picture is compatible
with a new Hamiltonian with a single spin (one-body)
potential which favors the two values $\rho_{liq}$ and $\rho_{gas}$ and
an attractive interaction between spins which sup- 
presses changes from one to the other. A new effective 
low temperature should finally dampen the 
fluctuations. 
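
The first step of the coarse graining, passing from particle positions to block densities, can be sketched as follows. This is a minimal two-dimensional illustration on a periodic square box; the function name, the box, and the sample positions are assumptions, and integrating out the remaining degrees of freedom is not attempted here.

```python
def block_densities(positions, box_length, cell_size):
    """Particle density in each square cell of a periodic 2D box (the block spins)."""
    n_cells = int(round(box_length / cell_size))
    counts = [[0] * n_cells for _ in range(n_cells)]
    for x, y in positions:
        i = int(x // cell_size) % n_cells
        j = int(y // cell_size) % n_cells
        counts[i][j] += 1
    cell_volume = cell_size**2
    return [[count / cell_volume for count in row] for row in counts]


if __name__ == "__main__":
    positions = [(0.5, 0.5), (1.5, 0.5), (3.0, 3.0)]
    for row in block_densities(positions, box_length=4.0, cell_size=2.0):
        print(row)
```

Each entry of the returned array plays the role of one block spin; in a liquid-vapor setting one would expect most entries to cluster near two preferred density values.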

Thus, after coarse graining, the system should be in
the same universality class as the low-temperature
Ising model, and we may hope, in this way, to extend
to the liquid-vapor branch of the phase diagram the
Pirogov-Sinai theory of low-temperature lattice
systems. In particular, as in the Ising model, we will 
then be able to select the liquid or the vapor phases by 
the introduction of suitable boundary conditions. 

The conditional tense arises because the computation 
of the coarse graining transformation is in general very 
difficult, if not impossible, to carry out, but there is a 
class of systems where it has been accomplished. These 
are systems of identical point particles in $\mathbb{R}^d$, $d \geq 2$,
which interact with “special” two- and four-body 
potentials, having finite range and which can be chosen 
to be rotation and translation invariant; their specific 
form will be described later. For such systems, the above 


coarse graining picture works and it has been proved
that in a “small” region of the temperature-chemical
potential plane, there is a part of the curve where two
distinct phases coexist, while elsewhere in the
neighborhood, the phase is unique.

The ideas behind the choice of the Hamiltonian 
go back to van der Waals, and the Ginzburg- 
Landau theory, which are milestones in the theory 
of phase transitions, while the mathematics of 
variational problems also enters here in an impor- 
tant way. These are briefly discussed in the next 
sections. 


The van der Waals Liquid-Vapor 
Transition 


Let us then take a step back and recall the
van der Waals theory of the liquid-vapor transition.
As typical intermolecular forces have a strong
repulsive core and a rather long attractive tail, in a
continuum, mesoscopic approximation the system
will be described by a free-energy functional of the
type


$F(\rho) = \displaystyle\int_\Lambda f_{\beta,\lambda}(\rho(r))\,dr - \dfrac{1}{2}\int_{\Lambda\times\Lambda} J(r,r')\rho(r)\rho(r')\,dr\,dr'$    [1]


where $\rho = \{\rho(r), r \in \Lambda\}$ is the particle density and $\Lambda$
the region where the system is confined, which, for
simplicity, is taken here as a torus in $\mathbb{R}^d$, consisting
of a cube with periodic boundary conditions. The
term $-J(r,r')\rho(r)\rho(r')$, $J(r,r') \geq 0$, is the energy due to
the attractive tail of the interaction, which is
periodic in $\Lambda$; $f_{\beta,\lambda}(\rho) = f_\beta(\rho) - \lambda\rho$ is the free-energy
density due to the short, repulsive part of the
interaction, $\lambda$ being the chemical potential.

As noted later, [1] can be rigorously derived by a 
coarse graining transformation; it will be used to 
build a bridge between the van der Waals theory and 
the previous block spin analysis of the liquid—vapor 
phase transition. Let us take for the moment [1] as a 
primitive notion. By invoking the second principle of 
thermodynamics, the equilibrium states can be 
found by minimizing the free-energy functional. 
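Supposing $J$ to be translation invariant, that is,
$J(r,r') = J(r+a, r'+a)$, $r, r', a \in \mathbb{R}^d$, and calling
$\alpha = \int J(r,r')\,dr'$ the intensity of $J$, we can rewrite
$F(\rho)$ as


$F(\rho) = \displaystyle\int_\Lambda \left\{f_{\beta,\lambda}(\rho) - \dfrac{\alpha\rho^2}{2}\right\} dr + \dfrac{1}{4}\int_{\Lambda\times\Lambda} J(r,r')[\rho(r) - \rho(r')]^2\,dr\,dr'$    [2]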





This shows that the minimizer must have $\rho(r)$
constant (so that the second integral is minimized)
and equal to any value which minimizes the function
$\{f_{\beta,\lambda}(\rho) - \alpha\rho^2/2\}$. By thermodynamic principles, the
free energy $f_{\beta,\lambda}(\rho)$ is convex in $\rho$, but, if $\alpha$ is large
enough, the above expression is not convex and, by
properly choosing the value of $\lambda$, the minimizers are
no longer unique, hence the van der Waals phase
transition.
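
The rewriting of the attractive term as a local quadratic part plus a non-negative squared-difference part can be verified on a discrete ring. The sketch below is only an illustration: it assumes the pair term enters with a factor $1/2$, uses a hypothetical exponentially decaying kernel on a ring of sites in place of $J$, and replaces integrals by sums.

```python
import math


def kernel(i, j, n_sites, decay=0.5):
    """Translation-invariant attractive kernel on a ring of n_sites points (illustrative choice)."""
    distance = min(abs(i - j), n_sites - abs(i - j))
    return math.exp(-decay * distance)


def pair_energy(rho):
    """Discrete analogue of -(1/2) * double integral of J(r, r') rho(r) rho(r')."""
    n = len(rho)
    return -0.5 * sum(kernel(i, j, n) * rho[i] * rho[j] for i in range(n) for j in range(n))


def rewritten_energy(rho):
    """-(alpha/2) sum rho_i^2 + (1/4) sum J (rho_i - rho_j)^2, with alpha = sum_j J(0, j)."""
    n = len(rho)
    alpha = sum(kernel(0, j, n) for j in range(n))
    local = -0.5 * alpha * sum(x * x for x in rho)
    gradient = 0.25 * sum(
        kernel(i, j, n) * (rho[i] - rho[j]) ** 2 for i in range(n) for j in range(n)
    )
    return local + gradient


if __name__ == "__main__":
    profile = [0.2, 0.9, 0.4, 0.7, 0.1, 0.5]
    print(pair_energy(profile), rewritten_energy(profile))
```

The two expressions agree for any profile, and the squared-difference part vanishes exactly when the profile is constant, which is why minimizers are constant.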


Kac Potentials 


The analogy between the above analysis of [2] and 
the previous heuristic study of the fluid based on 
coarse graining is striking. As customary in con- 
tinuum theory, each mesoscopic point r should be 
regarded as representative of a cell containing many 
molecules. Then the functional F(p) can be inter- 
preted as the effective Hamiltonian after coarse 
graining. The role of the one-body term is played in
[2] by the curly bracket, which selects two values of
$\rho$ (its minimizers, to be identified with $\rho_{liq}$ and $\rho_{gas}$);
the attractive two-body potential is then related to
the last term in [2], as it suppresses the variations of
$\rho$. The analogy clearly suggests a strategy for a
rigorous proof of phase transitions in the conti- 
nuum, an approach which has been and still is 
actively pursued. It will be discussed briefly in the 
sequel. 

The first rigorous derivation of the van der Waals 
theory in a statistical-mechanics setting goes back to 
the 1960s and to Kac, who proposed a model where 
the particle pair interaction is 

$-\alpha\gamma^d e^{-\gamma|r - r'|}$ + hard core, $\gamma, \alpha > 0$    [3]

The phase diagram of such systems, after the
thermodynamic limit, can be quite explicitly determined
in the limit $\gamma \to 0$, where it has been proved
to converge to the van der Waals phase diagram,
under a proper choice of $f_{\beta,\lambda}(\cdot)$ in [1].

The characteristic features of the first term in [3]
are: (1) very long range, which scales as $\gamma^{-1}$, and (2)
very small intensity, which scales as $\gamma^d$, so that the
total intensity of the potential, defined as the
integral over the second position, is independent of
$\gamma$. The additional hard-core term (which imposes
that any two particles cannot get closer than
$2R_0$, $R_0 > 0$ being the hard-core radius) is to ensure
stability of matter, that is, to avoid collapse of the 
whole system on an infinitesimally small region, as it 
would happen if only the attractive part of the 
interaction were present. 
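
The statement that the total intensity does not depend on the Kac parameter can be checked numerically in one dimension. The sketch below assumes the exponential Kac form $\gamma^d e^{-\gamma|r|}$ with $d = 1$ and $\alpha = 1$; the function names, the cutoff, and the midpoint rule are illustrative choices.

```python
import math


def kac_tail(r, gamma, dim=1):
    """Magnitude gamma^d exp(-gamma |r|) of the attractive Kac term (used with d = 1 below)."""
    return gamma**dim * math.exp(-gamma * abs(r))


def total_intensity(gamma, cutoff_in_ranges=40.0, n_steps=20000):
    """Midpoint-rule integral of the d = 1 Kac tail over [-cutoff/gamma, cutoff/gamma]."""
    half_width = cutoff_in_ranges / gamma
    step = 2.0 * half_width / n_steps
    total = 0.0
    for k in range(n_steps):
        r = -half_width + (k + 0.5) * step
        total += kac_tail(r, gamma) * step
    return total


if __name__ == "__main__":
    for gamma in (1.0, 0.1, 0.01):
        print(gamma, total_intensity(gamma))
```

The range grows like $1/\gamma$ while the height shrinks like $\gamma$, and the integral stays close to 2 for every value of $\gamma$ tried.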

Derivation of the van der Waals theory has been 
proved for a general class of Kac potentials, where 
the exponential term in [3] is replaced by functions
whose dependence on $\gamma$ has the same scaling
properties as mentioned above (in (1) and (2)),
while the hard core can be replaced by suitably 
repulsive interactions. 

The proof, in the version proposed by Lebowitz
and Penrose, uses coarse graining and shows that the
effective Hamiltonian is well approximated by the
van der Waals functional [1], when $\gamma$ is small, while
the effective temperature scales as $\gamma^d$. The approximation
becomes exact in the limit $\gamma \to 0$, where it
reduces the computation of the partition function to
the analysis of the minima and the ground states of
an effective Hamiltonian which, in the limit $\gamma \to 0$,
is exactly the van der Waals functional.

A true proof of phase transitions requires instead
keeping $\gamma > 0$ fixed (rather than letting $\gamma \to 0$) and
thus controlling the difference between the effective
Hamiltonian after coarse graining and the van der
Waals functional, which is the effective Hamiltonian
only in the actual limit $\gamma \to 0$. In general,
there is no symmetry between the two ground states,
unlike in the Ising case where they are related by
spin flip, and the Pirogov-Sinai theory thus enters
into play. The framework is in fact closely analogous,
with the lattice Hamiltonian replaced by the functional
and low temperatures by small $\gamma$ (recall that
the effective temperature scales as $\gamma^d$). The extension
of the theory to such a setting, however, presents
difficulties and success has so far been only partial.


A Model for Phase Transitions in the 
Continuum 


The problem is twofold: to have a good control of 
(1) the limit theory and (2) the perturbations 
induced by a nonzero value of the Kac parameter 
$\gamma$. The former falls in the category of variational
problems for integral functionals, whose prototype
is the Ginzburg-Landau free energy


$F^{\mathrm{gl}}(\rho) = \displaystyle\int \{w(\rho) + |\nabla\rho|^2\}\,dr$    [4]


which can be regarded as an approximation of [2]
with $w$ equal to the curly bracket in [2] and $J$
replaced by a $\delta$-function. Minimization problems for
this and similar functionals have been widely
analyzed in the context of general variational
problems theory and partial differential equations
(PDEs), and the study of the limit theory can benefit
from a vast literature on the subject. The analysis of
the corrections due to small $\gamma$ is, however, so far
quite limited. To implement the Pirogov—Sinai 
strategy, we need, in the case of the interaction [3], 
a very detailed knowledge of the system without the
Kac part of the interaction and with only hard cores. 
This, however, is so far not available when the 
particle density is near to close-packing (i.e., the 
maximal density allowed by the hard-core poten- 
tial). Replacing hard cores by other short-range 
repulsive interactions does not help either, and this 
seems the biggest obstacle to the program. 

The difficulty, however, can be avoided by
replacing the hard-core potential by a repulsive
many-body (more than two) Kac potential, which
ensures stability as well. The class of systems
covered by the approach is characterized by
Hamiltonians of the form


$$H_{\gamma,\lambda}(q) = \int e_\lambda(\rho_\gamma(r))\, dr \qquad [5]$$


where $e_\lambda(\rho)$ is a polynomial of the scalar field
variable $\rho$, a specific example being


$$e_\lambda(\rho) = -\lambda\rho - \frac{\rho^2}{2} + \frac{\rho^4}{4!} \qquad [6]$$


This form of the Hamiltonian is familiar from
Euclidean field theories. In these theories, the free
distribution of the field is Gaussian; in our case,
however, the field $\rho = \rho_\gamma(r)$ is a function of the
particle configurations $q = (q_i, i = 1, \ldots, n)$:


$$\rho_\gamma(r) = j_\gamma * q(r) = \sum_{i} j_\gamma(r, q_i), \qquad j_\gamma(r, r') = \gamma^d j(\gamma r, \gamma r') \qquad [7]$$
where $j(r,r')$ is a translation-invariant, symmetric
transition probability kernel. Thus, $\rho_\gamma(r)$ is a
non-negative variable which has the meaning of a local
density at $r$, weighted by the Kac kernel $j_\gamma(r, r')$.


Contours and Phase Indicators 


The dependence on $\gamma$ yields the scaling properties
characteristic of the Kac potentials, and [5] may be
regarded as a generalized Kac Hamiltonian, which,
in the polynomial case of [6], involves up to
four-body Kac potentials. The phase diagram of the
model, after taking first the thermodynamic limit
and then the limit $\gamma \to 0$, is determined by the
free-energy functional


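$$F_{\beta,\lambda}(\rho) = \int \Big\{ e_\lambda(j * \rho(r)) - \frac{1}{\beta} S(\rho(r)) \Big\}\, dr \qquad [8]$$

where [8] is taken to be defined on a torus (to avoid
convergence problems of the integral), and

$$S(\rho) = -\rho(\log\rho - 1) \qquad [9]$$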

Exploiting the concavity of the entropy $S(\rho)$, it is
proved that the minimizers of $F_{\beta,\lambda}(\cdot)$ are constant
functions with the constants minimizing


$$f_{\beta,\lambda}(u) = e_\lambda(u) - \frac{1}{\beta} S(u), \qquad u \ge 0 \qquad [10]$$


In the case of [6], to which we restrict in the sequel,
for any $\beta > \beta_c$ there is $\lambda_\beta$ such that $f_{\beta,\lambda_\beta}(u)$ is
double-well with two minimizers, $\rho_{\mathrm{gas}} < \rho_{\mathrm{liq}}$
(dependence on $\beta$ is omitted).
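
The double-well structure can be explored numerically. The following is a minimal sketch, assuming the quartic energy $e_\lambda(u) = -\lambda u - u^2/2 + u^4/4!$ and the ideal-gas entropy $S(u) = -u(\log u - 1)$, so that $f_{\beta,\lambda}(u) = e_\lambda(u) - S(u)/\beta$. The function names (`bisect`, `free_energy`, `coexistence`) and the bracketing bounds are illustrative choices, not part of the original text. Under these assumptions the second derivative $-1 + u^2/2 + 1/(\beta u)$ becomes negative somewhere exactly when $\beta > (3/2)^{3/2}$, and the sketch locates the chemical potential at which the two wells have equal depth.

```python
import math

def bisect(fn, a, b, iters=200):
    """Root of fn on [a, b], assuming fn changes sign between a and b."""
    fa = fn(a)
    for _ in range(iters):
        mid = 0.5 * (a + b)
        fm = fn(mid)
        if (fm > 0) == (fa > 0):
            a, fa = mid, fm
        else:
            b = mid
    return 0.5 * (a + b)

def free_energy(u, beta, lam):
    """f(u) = e_lam(u) - S(u)/beta with the assumed quartic e_lam and ideal-gas S."""
    return -lam * u - u**2 / 2 + u**4 / 24 + u * (math.log(u) - 1) / beta

def coexistence(beta):
    """Return (lam_beta, rho_gas, rho_liq) where both wells have equal depth."""
    if beta <= 1.5 ** 1.5:
        raise ValueError("no double well: beta must exceed (3/2)^(3/2)")
    g = lambda u: -u + u**3 / 6 + math.log(u) / beta   # f'(u) + lam
    dg = lambda u: -1 + u**2 / 2 + 1 / (beta * u)      # f''(u)
    u_star = beta ** (-1.0 / 3.0)                       # where f'' is minimal
    u1 = bisect(dg, 1e-9, u_star)                       # lower spinodal point
    u2 = bisect(dg, u_star, 10.0)                       # upper spinodal point

    def wells(lam):
        rho_gas = bisect(lambda u: g(u) - lam, 1e-12, u1)
        rho_liq = bisect(lambda u: g(u) - lam, u2, 10.0)
        return rho_gas, rho_liq

    def depth_gap(lam):
        rho_gas, rho_liq = wells(lam)
        return free_energy(rho_liq, beta, lam) - free_energy(rho_gas, beta, lam)

    # The gap is positive at lam = g(u2) and decreases with lam.
    lam = bisect(depth_gap, g(u2), g(u1), iters=100)
    rho_gas, rho_liq = wells(lam)
    return lam, rho_gas, rho_liq

lam_beta, rho_gas, rho_liq = coexistence(2.0)
```

The nested bisection is slow but has no dependencies; a root finder from a numerical library could replace it.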

To “recognize” the densities $\rho_{\mathrm{gas}}$ and $\rho_{\mathrm{liq}}$ in a
particle configuration, we use coarse graining and
introduce two partitions of $\mathbb{R}^d$ into cubes. The
cubes $C^{(\ell_-)}$ of the first partition have side $\ell_{-,\gamma}$
proportional to $\gamma^{-1+\alpha}$, $\alpha > 0$ suitably small; the
cubes $C^{(\ell_+)}$ of the second one have side $\ell_{+,\gamma}$
proportional to $\gamma^{-1-\alpha}$; they are chosen so that each
cube $C^{(\ell_+)}$ is a union of cubes $C^{(\ell_-)}$. Notice that the
small cubes have side much smaller than the interaction
range (for small $\gamma$), while the opposite is true for the
large cubes.

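Given a particle configuration $q$, we say that
a point $r$ is in the liquid phase and write
$\Theta(r;q) = 1$, if

$$\left| \frac{|q \cap C^{(\ell_-)}|}{\ell_{-,\gamma}^{\,d}} - \rho_{\mathrm{liq}} \right| \le \gamma^{a}, \qquad a > 0 \text{ suitably small} \qquad [11]$$

for any small cube $C^{(\ell_-)}$ contained either in $C_r^{(\ell_+)}$ or
in the cubes $C^{(\ell_+)}$ contiguous to $C_r^{(\ell_+)}$. Here
$|q \cap C^{(\ell_-)}|$ denotes the number of particles of $q$ in $C^{(\ell_-)}$,
and $C_r^{(\ell_+)}$ the large cube which contains $r$.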

Thus, $\Theta(r;q) = 1$ if the local particle density is
constantly close to $\rho_{\mathrm{liq}}$ in a large region around $r$.
Defining $\Theta(r;q) = -1$ if the above holds with $\rho_{\mathrm{gas}}$
instead of $\rho_{\mathrm{liq}}$ and setting $\Theta(r;q) = 0$ in all the other
cases, we then have a phase indicator $\Theta(r;q)$, which
identifies, for all particle configurations, which
spatial regions should be attributed to the liquid
and gas phases. The connected components of the
complementary region are called contours, and the
definition of $\Theta(r;q)$ has been structured in such a
way that liquid and gas are always separated by a
contour. The liquid phase will then be represented
by a measure which gives large probability to
configurations having mostly $\Theta = 1$, while the gas
phase by configurations with mostly $\Theta = -1$.
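
As an illustration of the coarse-graining idea, here is a minimal one-dimensional sketch on a periodic box. It is a simplification under stated assumptions, not the construction used in the proofs: the dimension is one, the tolerance `zeta` is a free parameter, and the names `phase_indicator` and `cells_per_block` are hypothetical. A large block gets the value 1 (or -1) only if every small cell in it and in its two neighboring blocks has density within `zeta` of the liquid (or gas) density; otherwise it gets 0.

```python
def phase_indicator(positions, length, l_minus, cells_per_block,
                    rho_gas, rho_liq, zeta):
    """Return the phase indicator of each large block of the box [0, length)."""
    n_small = round(length / l_minus)
    counts = [0] * n_small
    for x in positions:
        counts[int((x % length) / l_minus) % n_small] += 1
    density = [c / l_minus for c in counts]       # density in each small cell
    n_blocks = n_small // cells_per_block

    def block_label(b):
        cells = density[b * cells_per_block:(b + 1) * cells_per_block]
        if all(abs(d - rho_liq) <= zeta for d in cells):
            return 1
        if all(abs(d - rho_gas) <= zeta for d in cells):
            return -1
        return 0

    eta = [block_label(b) for b in range(n_blocks)]
    theta = []
    for b in range(n_blocks):
        # Require the same phase in the block and in both contiguous blocks.
        neighborhood = {eta[(b - 1) % n_blocks], eta[b], eta[(b + 1) % n_blocks]}
        theta.append(eta[b] if len(neighborhood) == 1 else 0)
    return theta

# Left half of the box at liquid density 4, right half at gas density 1.
liquid = [c + (k + 0.5) / 4 for c in range(12) for k in range(4)]
gas = [c + 0.5 for c in range(12, 24)]
theta = phase_indicator(liquid + gas, 24.0, 1.0, 2, 1.0, 4.0, 0.5)
```

In this example the blocks with value 0 sit exactly at the two interfaces, so the liquid and gas regions are separated by them, which is the property the definition is designed to guarantee.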

This is quite similar to the Ising picture and, as in
the Ising model, the existence of a phase transition
follows from a Peierls estimate that contours have
small probability. In fact, if there are few contours,
the phase imposed on the boundaries of the region
where the system is observed percolates inside,
invading most of the space. Thus, boundary conditions
select the phase in the whole volume. The
absence of the short-range potential, which was the
hard-core interaction in [3], and hence the absence
of all the difficulties which originate from it, allow
one to carry through successfully the Pirogov-Sinai
program and prove Peierls estimates on contours
and, hence, the existence of a phase transition. In
particular, the statistical weight of a contour is
estimated by first relating the computation to one
involving the functional [8] and then computing its
value on density profiles compatible with the
existence of the given contour. This part of the
problem needs variational analysis for [8] with
constraints, and it benefits from a vast literature on the
subject.

The phase transition is very sharp, as shown by
the following ideal experiment. Having fixed
$\beta > (3/2)^{3/2}$, let $\lambda$ vary in a (suitably) small interval
$[\lambda_\beta - \delta, \lambda_\beta + \delta]$, $\delta > 0$, centered around the
mean-field critical value $\lambda_\beta$. We consider the system in a
large region with, for instance, boundary conditions
$\Theta = -1$ (i.e., forcing the gas phase) and fix $\gamma$ small
enough. At $\lambda = \lambda_\beta - \delta$, the system has $\Theta = -1$ in
most of the domain, and this persists when we
increase $\lambda$ up to a critical value, $\lambda_{\beta,\gamma}$, close to, but not
the same as, $\lambda_\beta$. For $\lambda > \lambda_{\beta,\gamma}$, $\Theta = 1$ in most of the
domain, except for a small layer around the
boundaries. The analogous picture holds if we
choose boundary conditions $\Theta = 1$, and $\lambda = \lambda_{\beta,\gamma}$ is
the only value of the chemical potential where the
system is sensitive to the boundary conditions and
both phases can be produced by the right boundary
conditions. The fact that the actual value $\lambda_{\beta,\gamma}$ differs
from $\lambda_\beta$ is characteristic of the Pirogov-Sinai
approach and highlights the delicate nature of the
proofs.


Some Related Problems 


In this concluding section, two important related 
problems, which have not been mentioned so far, 
are discussed. 

A natural question, after proving a phase transition,
is to describe how two phases coexist, once
forced to be simultaneously present in the system.
This can be achieved, for instance, by suitable
boundary conditions (typically positive and negative
on the top and bottom of the spatial domain) or by
imposing a total density (or magnetization in the
case of spins) intermediate between those of the pure
phases. There will then be an interface separating
the two phases with a corresponding surface tension
and the geometry will be determined by the solution
of a variational problem and given by the Wulff
shape.

Can statistical mechanics explain and describe the
phenomenon? Important progress has been made
recently on the subject in the case of lattice systems
at low temperatures. The question has also been
widely studied at the mesoscopic level, in the
context of variational problems for the Ginzburg-Landau
and many other functionals. Therefore, all
the ingredients for further development of the theory
in this direction are now present.

We have so far discussed only classical systems;
a few words about extensions to the quantum case
are now in order. In the range of values of
temperatures and densities where the liquid-vapor
transition occurs, the quantum effects are not
expected to be relevant. Referring to the case of
bosons, and away from the Bose condensation
regime (and for systems with Boltzmann statistics
as well), the quantum delocalization of particles
caused by the indeterminacy principle should
essentially disappear after macroscopic coarse
graining, and the block-spin variables should
again behave classically, even though their underlying
constituents are quantal. If this argument
proves correct, then progress along these lines may
be expected in the near future.


See also: Cluster Expansion; Ergodic Theory; Finite
Group Symmetry Breaking; Pirogov-Sinai Theory;
Reflection Positivity and Phase Transitions; Statistical
Mechanics and Combinatorial Problems; Statistical
Mechanics of Interfaces; Symmetry Breaking in Field
Theory; Two-Dimensional Ising Model.

Introduction


Pirogov-Sinai theory is a method developed to
study the phase diagrams of lattice models at low
temperatures. The general claim is that, under
appropriate conditions, the phase diagram of a
lattice model is, at low temperatures, a small
perturbation of the zero-temperature phase diagram
determined by ground states. The treatment can
be generalized to cover temperature-driven transitions
with coexistence of ordered and disordered
phases.


Formulation of the Main Result 
Setting 


Refraining first from full generality, we formulate 
the result for a standard class of lattice models with 
finite spin state and finite-range interaction. We will 
mention different generalizations later. 

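We consider classical lattice models on the
$d$-dimensional hypercubic lattice $\mathbb{Z}^d$ with $d \ge 2$.
A spin configuration $\sigma = (\sigma_x)_{x \in \mathbb{Z}^d}$ is an assignment of
a spin with values in a finite set $S$ to each lattice site
$x \in \mathbb{Z}^d$; the configuration space is $\Omega = S^{\mathbb{Z}^d}$. For $\sigma \in \Omega$
and $\Lambda \subset \mathbb{Z}^d$, we use $\sigma_\Lambda \in \Omega_\Lambda = S^\Lambda$ to denote the
restriction of $\sigma$ to $\Lambda$.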

The Hamiltonian is given in terms of a collection of
interaction potentials $(\Phi_A)$, where $\Phi_A$ are real functions
on $\Omega$, depending only on $\sigma_x$ with $x \in A$, and $A$
runs over all finite subsets of $\mathbb{Z}^d$. We assume that the
potential is periodic with finite range of interactions.
Namely, $\Phi_{A'}(\sigma') = \Phi_A(\sigma)$ whenever $A$ and $\sigma$ are related
to $A'$ and $\sigma'$ by a translation from $(a\mathbb{Z})^d$ for some fixed
integer $a$, and there exists $R \ge 1$ such that $\Phi_A \equiv 0$ for
all $A$ with diameter exceeding $R$.

Without loss of generality (possibly multiplying 
the number a by an integer and increasing R), we 
may assume that R=a. 

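The Hamiltonian $H_\Lambda(\sigma|\eta)$ in $\Lambda$ with boundary
conditions $\eta \in \Omega$ is then given by

$$H_\Lambda(\sigma|\eta) = \sum_{A \cap \Lambda \neq \emptyset} \Phi_A(\sigma_\Lambda \vee \eta_{\Lambda^c}) \qquad [1]$$

where $\sigma_\Lambda \vee \eta_{\Lambda^c} \in \Omega$ is the configuration $\sigma_\Lambda$ extended
by $\eta_{\Lambda^c}$ on $\Lambda^c$. The Gibbs state in $\Lambda$ under boundary
conditions $\eta \in \Omega$ (and with Hamiltonian $H$) is the
probability $\mu_\Lambda(\cdot\,|\eta)$ on $\Omega_\Lambda$ defined by

$$\mu_\Lambda(\{\sigma_\Lambda\}|\eta) = \frac{\exp\{-\beta H_\Lambda(\sigma|\eta)\}}{Z(\Lambda|\eta)} \qquad [2]$$

with the partition function

$$Z(\Lambda|\eta) = \sum_{\sigma_\Lambda} \exp\{-\beta H_\Lambda(\sigma|\eta)\} \qquad [3]$$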


We use G(H) to denote the set of all periodic Gibbs 
states with Hamiltonian H defined on Q by means of 
the Dobrushin—Lanford—Ruelle (DLR) equations. 
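
For very small volumes the finite-volume Gibbs state can be computed by brute force. The sketch below assumes, purely for illustration, the nearest-neighbor Ising pair potential $\Phi_{\{x,y\}}(\sigma) = -\sigma_x \sigma_y$ on a two-dimensional $n \times n$ box with a constant boundary condition; the function names are hypothetical. The energy sums the potential over every bond that intersects the box, and the Gibbs weights are normalized by the partition function.

```python
import math
from itertools import product

def hamiltonian(sigma, n, eta):
    """Energy of sigma in an n x n box with constant boundary spin eta.

    Sums Phi = -s_x * s_y over all nearest-neighbor bonds that intersect the box.
    """
    def spin(i, j):
        # Inside the box use sigma; outside use the boundary condition.
        return sigma[i * n + j] if 0 <= i < n and 0 <= j < n else eta

    energy = 0.0
    for i in range(n):
        for j in range(n):
            energy -= spin(i, j) * spin(i, j + 1)   # bond to the right
            energy -= spin(i, j) * spin(i + 1, j)   # bond downwards
            if j == 0:
                energy -= spin(i, j) * spin(i, j - 1)   # bond through left face
            if i == 0:
                energy -= spin(i, j) * spin(i - 1, j)   # bond through top face
    return energy

def gibbs_state(n, beta, eta):
    """Return (Z, mu) with mu mapping each configuration to its probability."""
    configs = list(product((-1, 1), repeat=n * n))
    weights = [math.exp(-beta * hamiltonian(s, n, eta)) for s in configs]
    z = sum(weights)
    return z, {s: w / z for s, w in zip(configs, weights)}

z, mu = gibbs_state(2, 1.0, +1)
```

The enumeration has $2^{n^2}$ terms, so this is only usable for tiny boxes; it is meant to make the definitions concrete, not to be a practical method.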


Ground-State Phase Diagram and the Removal 
of Degeneracy 


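A periodic configuration $\sigma \in \Omega$ is called a (periodic)
ground state of a Hamiltonian $H = (\Phi_A)$ if

$$H(\tilde\sigma; \sigma) = \sum_A \big(\Phi_A(\tilde\sigma) - \Phi_A(\sigma)\big) \ge 0 \qquad [4]$$

for every finite perturbation $\tilde\sigma \neq \sigma$ of $\sigma$ ($\tilde\sigma$ differs
from $\sigma$ at a finite number of lattice sites). We use
$g(H)$ to denote the set of all periodic ground states
of $H$. For every configuration $\sigma \in g(H)$, we define
the specific energy $e_\sigma(H)$ by

$$e_\sigma(H) = \lim_{n\to\infty} \frac{1}{|V_n|} \sum_{A \subset V_n} \Phi_A(\sigma) \qquad [5]$$

(with $V_n$ denoting a cube consisting of $n^d$ lattice sites).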

To investigate the phase diagram, we will consider
a parametric class of Hamiltonians around a
fixed Hamiltonian $H^{(0)}$ with a finite set of periodic
ground states $g(H^{(0)}) = \{\sigma_1, \ldots, \sigma_r\}$. Namely, let $H^{(0)}$,
$H^{(1)}, \ldots, H^{(r-1)}$ be Hamiltonians determined by
potentials $\Phi^{(0)}, \Phi^{(1)}, \ldots, \Phi^{(r-1)}$, respectively, and
consider the $(r-1)$-parametric set of Hamiltonians
$H_t = H^{(0)} + \sum_{\ell=1}^{r-1} t_\ell H^{(\ell)}$ with $t = (t_1, \ldots, t_{r-1}) \in \mathbb{R}^{r-1}$.
Using the shorthand $e_m(H) = e_{\sigma_m}(H)$, and introducing
the vectors $e(H) = (e_1(H), \ldots, e_r(H))$ and
$h(t) = e(H_t) - \min_m e_m(H_t)$, we notice that for each
$t \in \mathbb{R}^{r-1}$, the vector $h(t) \in \partial Q_r$, the boundary of the
positive octant in $\mathbb{R}^r$. A crucial assumption for such a
parametrization $H_t$ to yield a meaningful phase diagram is the
condition of removal of degeneracy: we assume that
$g(H^{(0)} + H^{(\ell)}) \subsetneq g(H^{(0)})$, $\ell = 1, \ldots, r-1$, and that the
vectors $e(H^{(\ell)})$, $\ell = 1, \ldots, r-1$, are linearly independent.

In particular, its immediate consequence is that
the mapping $\mathbb{R}^{r-1} \ni t \mapsto h(t) \in \partial Q_r$ is a bijection.
This fact has a straightforward interpretation in
terms of the ground-state phase diagram. Viewing the
phase diagram (at zero temperature) as a partition of
the parameter space into regions $K_g$ with a given set
$g \subset g(H^{(0)})$ of ground states (“coexistence of
zero-temperature phases from $g$”), the above bijection
means that the region $K_g$ is the preimage of the set

$$Q_g = \{h \in \partial Q_r \mid h_m = 0 \text{ for } \sigma_m \in g \text{ and } h_m > 0 \text{ otherwise}\} \qquad [6]$$


The partition of the set $\partial Q_r$ has a natural
hierarchical structure implied by the fact that
$\bar Q_{g_1} \cap \bar Q_{g_2} = \bar Q_{g_1 \cup g_2}$ ($\bar Q_g$ is the closure of $Q_g$). Namely, the
origin $\{0\} = Q_{g(H^{(0)})}$ is the intersection of $r$ positive
coordinate axes $\bar Q_{g(H^{(0)}) \setminus \{\sigma_m\}}$, $m = 1, \ldots, r$; each of
those half-lines is an intersection of $r-1$
two-dimensional quarter-planes with boundaries on positive
coordinate axes, etc., up to $(r-1)$-dimensional
planes $\bar Q_{\{\sigma_m\}}$, $m = 1, \ldots, r$. This hierarchical structure
is thus inherited by the partition of the parameter
space $\mathbb{R}^{r-1}$ into the regions $K_g$. The phase diagrams
with such regular structure are sometimes said to
satisfy the Gibbs phase rule.

We can thus summarize in a rather trivial conclusion 
that the condition of removal of degeneracy implies 
that the ground-state phase diagram obeys the Gibbs 
phase rule. The task of the Pirogov—Sinai theory is to 
provide means for proving that this remains true, at 
least in a neighborhood of the origin of parameter 
space, also for small nonzero temperatures. To achieve 
this, we need an effective control of excitation energies. 
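
The zero-temperature construction is easy to make concrete. The sketch below relies on the fact that the specific energy of a fixed configuration is linear in the Hamiltonian, so $e_m(H_t) = e_m(H^{(0)}) + \sum_\ell t_\ell\, e_m(H^{(\ell)})$. The example data (three ground states with perturbation energy vectors $(-1,-1,0)$ and $(-1,1,0)$) are an illustrative assumption, and the function names are hypothetical.

```python
def h_vector(t, e0, e_pert):
    """h(t) = e(H_t) - min_m e_m(H_t), using linearity of the specific energy."""
    r = len(e0)
    e = [e0[m] + sum(t_l * e_l[m] for t_l, e_l in zip(t, e_pert))
         for m in range(r)]
    e_min = min(e)
    return [x - e_min for x in e]

def coexisting(t, e0, e_pert, tol=1e-12):
    """Indices m with h_m(t) = 0, i.e. the ground states coexisting at t."""
    return {m for m, hm in enumerate(h_vector(t, e0, e_pert)) if hm <= tol}

# Three ground states (indices 0, 1, 2), degenerate at t = 0.
e0 = [0.0, 0.0, 0.0]
e_pert = [[-1.0, -1.0, 0.0],   # e(H^(1))
          [-1.0, 1.0, 0.0]]    # e(H^(2))
all_three = coexisting((0.0, 0.0), e0, e_pert)
```

At the origin all three states coexist, on the half-line $t = (s, 0)$, $s > 0$, states 0 and 1 coexist, and at a generic point a single state has the lowest energy, which is the hierarchical structure described above.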














Peierls Condition 


A crucial assumption for the validity of the Pirogov-Sinai
theory is a lower bound on the energy of
excitations of ground states: the Peierls condition.

Although, for a study of the phase diagram,
we consider a parametric set of Hamiltonians whose
sets of ground states may differ, it is useful to introduce
the Peierls condition with respect to a single fixed
collection $G$ of reference configurations (eventually, it
will be identified with the ground states of the
Hamiltonian $H^{(0)}$). Let thus a fixed set $G$ of periodic
configurations $\{\sigma_1, \ldots, \sigma_r\}$ be given. Again, without
loss of generality, we may assume that the periodicity
of all configurations $\sigma_m \in G$ is $R$.

Before formulating the Peierls condition, we have
to introduce the notion of contours. Consider the set
of all sampling cubes $C(x) = \{y \in \mathbb{Z}^d \mid |y_i - x_i| \le R$ for
$1 \le i \le d\}$, $x \in \mathbb{Z}^d$. A bad cube of a configuration
$\sigma \in \Omega$ is a sampling cube $C$ for which $\sigma_C$ differs from
$\sigma_m$ restricted to $C$ for every $\sigma_m \in G$. The boundary
$B(\sigma)$ of $\sigma$ is the union of all bad cubes of $\sigma$. If $\sigma_m \in G$
and $\sigma$ is its finite perturbation (differing from $\sigma_m$ on a
finite set of lattice sites), then, necessarily, $B(\sigma)$ is
finite. A contour of $\sigma$ is a pair $\gamma = (\Gamma, \sigma_\Gamma)$, where $\Gamma$
(the support of the contour $\gamma$) is a connected
component of $B(\sigma)$ (and $\sigma_\Gamma$ is the restriction of $\sigma$ to
$\Gamma$). Here, the connectedness of $\Gamma$ means that it cannot
be split into two parts whose (Euclidean) distance is
larger than 1. We use $\partial(\sigma)$ to denote the set of all
contours of $\sigma$, $B(\sigma) = \bigcup_{\gamma \in \partial(\sigma)} \Gamma$.
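
The definitions of bad cubes, boundary and contour supports translate directly into a short computation. The following sketch works on a two-dimensional $n \times n$ periodic lattice, which is an assumption made only to keep the example finite; reference configurations are passed as functions of the site, and the names `boundary` and `contour_supports` are hypothetical.

```python
from itertools import product

def boundary(sigma, refs, R=1):
    """Union of all bad sampling cubes of sigma on an n x n periodic lattice.

    sigma is a list of rows; refs is a list of functions (i, j) -> spin giving
    the reference configurations. A cube is bad if sigma differs on it from
    every reference configuration.
    """
    n = len(sigma)
    offsets = list(product(range(-R, R + 1), repeat=2))
    bad_sites = set()
    for x, y in product(range(n), repeat=2):
        cube = [((x + dx) % n, (y + dy) % n) for dx, dy in offsets]
        if all(any(sigma[i][j] != ref(i, j) for i, j in cube) for ref in refs):
            bad_sites.update(cube)
    return bad_sites

def contour_supports(bad_sites, n):
    """Connected components of the boundary (sites at distance 1 are linked)."""
    remaining = set(bad_sites)
    components = []
    while remaining:
        start = remaining.pop()
        component, stack = {start}, [start]
        while stack:
            i, j = stack.pop()
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nb = ((i + di) % n, (j + dj) % n)
                if nb in remaining:
                    remaining.remove(nb)
                    component.add(nb)
                    stack.append(nb)
        components.append(component)
    return components

# Sea of pluses with one flipped spin; references are the constant +1 and -1.
n = 7
sigma = [[1] * n for _ in range(n)]
sigma[3][3] = -1
refs = [lambda i, j: 1, lambda i, j: -1]
supports = contour_supports(boundary(sigma, refs), n)
```

With $R = 1$ a single flipped spin makes the nine sampling cubes containing it bad, so the support of the resulting contour is a $5 \times 5$ block of 25 sites.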

Consider a configuration $\sigma^\gamma$ such that $\gamma$ is its
unique contour. The set $\mathbb{Z}^d \setminus \Gamma$ has one infinite
component to be denoted $\mathrm{Ext}\,\gamma$ and a finite number
of finite components whose union will be denoted
$\mathrm{Int}\,\gamma$. Observing that the configuration $\sigma^\gamma$ coincides
with one of the states $\sigma_m \in G$ on every component of
$\mathbb{Z}^d \setminus B(\sigma^\gamma)$, each of those components can be labeled
by the corresponding $m$. Let $q$ be the label of $\mathrm{Ext}\,\gamma$;
we say that $\gamma$ is a $q$-contour, and let $\mathrm{Int}_m \gamma$ be the
union of all components of $\mathrm{Int}\,\gamma$ labeled by
$m$, $m = 1, \ldots, r$.

Defining the “energy” $\Psi(\gamma)$ of a $q$-contour $\gamma$ by
the equation

$$\Psi(\gamma) = H(\sigma^\gamma; \sigma_q) + e_q(H)|\Gamma| - \sum_{m} \big(e_m(H) - e_q(H)\big)|\mathrm{Int}_m \gamma| \qquad [7]$$

the Peierls condition with respect to the set $G$ of
reference configurations is an assumption of the
existence of $\rho > 0$ such that

$$\Psi(\gamma) \ge \big(\rho + \min_m e_m(H)\big)|\Gamma| \qquad [8]$$

for any contour $\gamma$ of any configuration $\sigma$ that is a
finite perturbation of $\sigma_q \in G$.

Notice that if $G = g(H)$, the sum on the right-hand
side of [7] vanishes.


Phase Diagram 


The main claim of the Pirogov-Sinai theory provides,
for $\beta$ sufficiently large, a construction of regions $K_g(\beta)$
of the parameter space characterized by the coexistence
of phases labeled by configurations $\sigma_m \in g$.
This is done, similarly as for the ground-state phase
diagram discussed earlier, by constructing a homeomorphism
$t \mapsto a(t)$ from a neighborhood of the origin
of the parameter space to a neighborhood of the origin
of $\partial Q_r$ that provides the phase diagram (actually, the
function $a(t)$ will turn out to be just a perturbation of
$h(t)$ with errors of order $e^{-\beta}$).

Before stating the result, however, we have to
clarify what exactly is meant by existence of phase
$m$ for a given Hamiltonian $H$. Roughly speaking, it
is the existence of a periodic extremal Gibbs state
$\mu_m \in G(H)$, whose typical configurations do not
differ too much from the ground-state configuration
$\sigma_m$. In more technical terms, the existence
of such a state is provided once we prove a
suitable bound, for the finite-volume Gibbs state
$\mu_\Lambda(\{\sigma_\Lambda\}|\sigma_m)$ under the boundary conditions $\sigma_m$, on
the probability that a fixed point in $\Lambda$ is encircled
by a contour from $\partial(\sigma)$. If this is the case, we say that
the phase $m$ is stable. It turns out that such a bound
is actually an integral part of the construction of
metastable free energies $f_m(t)$ yielding the homeomorphism
$t \mapsto a(t)$. In this way, we get the main
claim formulated as follows:


Theorem 1 Consider a parametric set of Hamiltonians
$H_t = H^{(0)} + \sum_{\ell=1}^{r-1} t_\ell H^{(\ell)}$ with periodic finite-range
interactions satisfying the condition of
removal of degeneracy as well as the Peierls
condition with respect to the reference set
$G = g(H^{(0)})$. Let $d \ge 2$ and let $\beta$ be sufficiently
large. Then there exists a homeomorphism $t \mapsto a(t)$
of a neighborhood $V_\beta$ of the origin of the parameter
space $\mathbb{R}^{r-1}$ onto a neighborhood $U_\beta$ of the origin of
$\partial Q_r$ such that, for any $t \in V_\beta$, the set of all stable
phases is $\{m \in \{1, \ldots, r\} \mid a_m(t) = 0\}$.


The Peierls condition can actually be assumed
only for the Hamiltonian $H^{(0)}$, inferring its validity
for $H_t$ on a sufficiently small neighborhood $V_\beta$.

Notice also that the result can actually be stated
not as a claim about the phase diagram in a space of
parameters, but as a statement about stable phases
of a fixed Hamiltonian $H$. Namely, for a Hamiltonian
$H$ satisfying the Peierls condition with respect to a
reference set $G$, one can assure the existence of
parameters $a_m$ labeled by elements from $G$ such that
the set of extremal periodic Gibbs states of $H$
consists of all those $m$-phases for which $a_m = 0$.


Construction of Metastable Free Energies 


An important part of the Pirogov-Sinai theory is
an actual construction of the metastable free
energies, a set of functions $f_m(t)$, $m = 1, \ldots, r$,
that provide the homeomorphism $a(t)$ by taking
$a_m(t) = f_m(t) - \min_{\bar m} f_{\bar m}(t)$.

We start with a contour representation of the
partition function $Z(\Lambda|\sigma_q)$. Considering, for each
contributing configuration $\sigma$, the collection $\partial(\sigma)$ of
its contours, we notice that, in addition to the fact
that different contours $\gamma, \gamma' \in \partial(\sigma)$ have disjoint
supports, $\Gamma \cap \Gamma' = \emptyset$, the contours from $\partial(\sigma)$ have
to satisfy the matching conditions: if $C$ is a
connected component of $\mathbb{Z}^d \setminus \bigcup_{\gamma \in \partial(\sigma)} \Gamma$, then the
restrictions of the spin configurations $\sigma^\gamma$ to $C$
are the same for all contours $\gamma \in \partial(\sigma)$ with
$\mathrm{dist}(\Gamma, C) = 1$. In other words, the contours touching
$C$ induce the same label on $C$. Let us observe
that there is actually a one-to-one correspondence
between configurations $\sigma$ that coincide with $\sigma_q$ on
$\Lambda^c$ and collections $\partial \in M(\Lambda, q)$ of contours in $\Lambda$
satisfying the matching condition, and such that the
external among them are $q$-contours. Here, a contour
$\gamma \in \partial$ is called an external contour in $\partial$ if $\Gamma \subset \mathrm{Ext}\,\gamma'$
for all $\gamma' \in \partial$ different from $\gamma$.

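With this observation and using $\Lambda_m(\partial)$ to denote
the union of all components of $\Lambda \setminus \bigcup_{\gamma \in \partial} \Gamma$ with label
$m$, we get

$$Z(\Lambda|\sigma_q) = \sum_{\partial \in M(\Lambda,q)} \prod_m e^{-\beta e_m(H)|\Lambda_m(\partial)|} \prod_{\gamma \in \partial} e^{-\beta \Psi(\gamma)} \qquad [9]$$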


The usefulness of such contour representations stems
from an expectation that, for a stable phase $q$,
contours should constitute a suppressed excitation
and one should be able to use cluster expansions to
evaluate the behavior of the Gibbs state $\mu_q$.
However, the direct use of the cluster expansion on
[9] is hampered by the presence of the energy terms
$e^{-\beta e_m(H)|\Lambda_m(\partial)|}$ and, more seriously, by the
requirement that the contour labels match.

Nevertheless, one can rewrite the partition function
in a form that does not involve any matching
condition. Namely, considering first a sum over
mutually external contours $\partial^{\mathrm{ext}}$ and resumming over
collections of contours which are contained in their
interiors without touching the boundary (being thus
prevented from “gluing” with external contours), we get

$$Z(\Lambda|\sigma_q) = \sum_{\partial^{\mathrm{ext}}} e^{-\beta e_q(H)|\mathrm{Ext}|} \prod_{\gamma \in \partial^{\mathrm{ext}}} \Big\{ e^{-\beta\Psi(\gamma)} \prod_m Z^{*}(\mathrm{Int}_m \gamma|\sigma_m) \Big\} \qquad [10]$$
yEeorxt m 
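Here the sum goes over all collections $\partial^{\mathrm{ext}}$ of
compatible external $q$-contours in $\Lambda$,
$\mathrm{Ext} = \mathrm{Ext}_\Lambda(\partial^{\mathrm{ext}}) = \bigcap_{\gamma \in \partial^{\mathrm{ext}}} (\mathrm{Ext}\,\gamma \cap \Lambda)$, and the partition
function $Z^{*}(\Lambda|\sigma_q)$ is defined by [9] with
$M(\Lambda, q)$ replaced by $M^{*}(\Lambda, q) \subset M(\Lambda, q)$, the
set of all those collections whose external contours
$\gamma$ are such that $\mathrm{dist}(\Gamma, \Lambda^c) > 1$. Multiplying
now each term by

$$1 = \prod_{\gamma \in \partial^{\mathrm{ext}}} \prod_m \frac{Z^{*}(\mathrm{Int}_m \gamma|\sigma_q)}{Z^{*}(\mathrm{Int}_m \gamma|\sigma_q)} \qquad [11]$$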
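we get

$$Z(\Lambda|\sigma_q) = \sum_{\partial^{\mathrm{ext}}} e^{-\beta e_q(H)|\mathrm{Ext}|} \prod_{\gamma \in \partial^{\mathrm{ext}}} \Big( e^{-\beta e_q(H)|\Gamma|}\, w_q(\gamma)\, Z^{*}(\mathrm{Int}\,\gamma|\sigma_q) \Big) \qquad [12]$$

where $w_q(\gamma)$ is given by

$$w_q(\gamma) = e^{-\beta\Psi(\gamma)}\, e^{\beta e_q(H)|\Gamma|} \prod_m \frac{Z^{*}(\mathrm{Int}_m \gamma|\sigma_m)}{Z^{*}(\mathrm{Int}_m \gamma|\sigma_q)} \qquad [13]$$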


Observing that a similar expression is valid for
$Z^{*}(\Lambda|\sigma_q)$ (with an appropriate restriction on the
sum over external contours $\partial^{\mathrm{ext}}$) and proceeding by
induction, we eventually get the representation

$$Z(\Lambda|\sigma_q) = e^{-\beta e_q(H)|\Lambda|} \sum_{\partial \in C(\Lambda,q)} \prod_{\gamma \in \partial} w_q(\gamma) \qquad [14]$$

where $C(\Lambda, q)$ denotes the set of all collections of
nonoverlapping $q$-contours in $\Lambda$. Clearly, the sum on
the right-hand side is exactly of the form needed to
apply cluster expansion, provided the contour weights
satisfy the necessary convergence assumptions.

Even though this is not necessarily the case, there
is a way to use this representation. Namely, one can
artificially change the weights to satisfy the needed
bound, for example, by modifying them to the form

$$w'_q(\gamma) = \min\big(w_q(\gamma), e^{-\tau|\Gamma|}\big) \qquad [15]$$

with a suitable constant $\tau$. The modified partition
function

$$Z'(\Lambda|\sigma_q) = e^{-\beta e_q(H)|\Lambda|} \sum_{\partial \in C(\Lambda,q)} \prod_{\gamma \in \partial} w'_q(\gamma) \qquad [16]$$

can then be controlled by cluster expansion, allowing
us to define

$$f_q(H) = -\frac{1}{\beta} \lim_{|\Lambda| \to \infty} \frac{1}{|\Lambda|} \log Z'(\Lambda|\sigma_q) \qquad [17]$$

This is the metastable free energy corresponding to the
phase $q$. Applying the cluster expansion to the
logarithm of the sum in [16], we get $|f_q(H) - e_q(H)| \le e^{-\tau/2}$.
The metastable free energy corresponds to
taking the ground state $\sigma_q$ and its excitations as long
as they are sufficiently suppressed. Once $w_q(\gamma)$ exceeds
the weight $e^{-\tau|\Gamma|}$ (and the contour would have been
actually preferred), we suppress it “by hand.” The
point is that if the phase $q$ is stable, this never happens
and $w'_q(\gamma) = w_q(\gamma)$ for all $q$-contours $\gamma$. This is the idea
behind the use of the function $f_q(H)$ as an indicator of
the stability of the phase $q$ by taking


$$a_q(t) = f_q(H_t) - \min_m f_m(H_t) \qquad [18]$$

Of course, the difficult point is to actually prove that
the stability of phase $q$ (i.e., the fact that $a_q(t) = 0$)
indeed implies $w'_q(\gamma) = w_q(\gamma)$ for all $\gamma$. The crucial step
is to prove, by induction on the diameter of $\Lambda$ and $\gamma$,
the following three claims (with $\epsilon = 2e^{-\tau/2}$):

1. If $\gamma$ is a $q$-contour with $a_q(t)\,\mathrm{diam}\,\Gamma \le \tau/4$, then
$w'_q(\gamma) = w_q(\gamma)$.

2. If $a_q(t)\,\mathrm{diam}\,\Lambda \le \tau/4$, then $Z(\Lambda|\sigma_q) = Z'(\Lambda|\sigma_q) \neq 0$
and

$$|Z(\Lambda|\sigma_q)| \ge e^{-\beta f_q(H_t)|\Lambda| - \epsilon|\partial\Lambda|} \qquad [19]$$

3. If $\sigma_m \in G$, then

$$|Z(\Lambda|\sigma_m)| \le e^{-\beta \min_q f_q(H_t)|\Lambda| + \epsilon|\partial\Lambda|} \qquad [20]$$


A standard example illuminating the perturbative
construction of the metastable free energies and
showing the role of entropic contributions is the
Blume-Capel model. It is defined by the Hamiltonian

$$H_\Lambda(\sigma) = J \sum_{\langle x,y \rangle} (\sigma_x - \sigma_y)^2 - \lambda \sum_{x \in \Lambda} \sigma_x^2 - h \sum_{x \in \Lambda} \sigma_x \qquad [21]$$

with spins $\sigma_x \in \{-1, 0, 1\}$. Taking into account only
the lowest-order excitations, we get

$$\tilde f_\pm(\lambda, h) = -\lambda \mp h - \frac{1}{\beta} e^{-\beta(2d + \lambda \pm h)}$$

(sea of pluses or minuses with a single spin flip $\pm \to 0$)
and

$$\tilde f_0(\lambda, h) = -\frac{1}{\beta} e^{-\beta(2d - \lambda)} \big(e^{\beta h} + e^{-\beta h}\big)$$

(sea of zeros with a single spin flip either $0 \to +$ or
$0 \to -$).

Since these functions differ from the full metastable free
energies $f_\pm(\lambda, h)$, $f_0(\lambda, h)$ by terms of higher order
($\sim e^{-\beta(4d-2)}$), the real phase diagram differs in this
order from the one constructed by equating the
functions $\tilde f_\pm(\lambda, h)$ and $\tilde f_0(\lambda, h)$. It is particularly
interesting to inspect the origin, $\lambda = h = 0$. It is only
the phase 0 that is stable there at all small
temperatures since

$$f_0(0,0) \approx -\frac{2}{\beta} e^{-2d\beta} < f_\pm(0,0) \approx -\frac{1}{\beta} e^{-2d\beta} \qquad [22]$$

The only reason why the phase 0 is favored at this
point with respect to phases $+$ and $-$ is that there
are two excitations of order $e^{-2d\beta}$ for the phase 0,
while there is only one such excitation for $+$ or $-$.
The entropy of the lowest-order contribution to
$f_0(0,0)$ outweighs the entropy of the contribution
to $f_\pm(0,0)$ of the same order.
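
The lowest-order picture can be checked numerically. The sketch below assumes the single-spin-flip excitation energies of the Blume-Capel model with nearest-neighbor coupling $J$ ($2dJ + \lambda \pm h$ for a flip $\pm \to 0$ and $2dJ - \lambda \mp h$ for a flip $0 \to \pm$); the function names and the default $J = 1$ are illustrative assumptions. It compares the approximate free energies at the origin and locates, at $h = 0$, the value of $\lambda$ where the approximations for the $+$ and $0$ phases cross.

```python
import math

def f_plus_minus(lam, h, beta, d, sign, J=1.0):
    """Lowest-order free energy of the +/- phase (sign = +1 or -1)."""
    return -lam - sign * h - math.exp(-beta * (2 * d * J + lam + sign * h)) / beta

def f_zero(lam, h, beta, d, J=1.0):
    """Lowest-order free energy of the 0 phase: two possible single flips."""
    up = math.exp(-beta * (2 * d * J - lam - h))     # flip 0 -> +
    down = math.exp(-beta * (2 * d * J - lam + h))   # flip 0 -> -
    return -(up + down) / beta

def transition_lambda(beta, d, J=1.0):
    """Value of lam at h = 0 where f_plus_minus and f_zero coincide (bisection)."""
    gap = lambda lam: f_plus_minus(lam, 0.0, beta, d, +1, J) - f_zero(lam, 0.0, beta, d, J)
    lo, hi = 0.0, 1.0            # gap(lo) > 0 and gap(hi) < 0 at low temperature
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta, d = 2.0, 2
origin_favors_zero = f_zero(0.0, 0.0, beta, d) < f_plus_minus(0.0, 0.0, beta, d, +1)
lam_t = transition_lambda(beta, d)
```

To this order the coexistence point at $h = 0$ is shifted from the zero-temperature value $\lambda = 0$ to approximately $e^{-2d\beta}/\beta$, a purely entropic effect.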


Applications 


Several applications stemming from the Pirogov-Sinai
theory are based on the fact that, due to the
cluster expansion, we have a quite accurate description
of the model in finite volume.

One class of applications concerns various
problems featuring interfaces between coexisting
phases. To be able to transform the problem into a
study of the random boundary line separating the
two phases, one needs a precise cluster expansion
formula for partition functions in volumes occupied
by those phases. In the situation with no symmetry
between the phases, the use of the Pirogov-Sinai
theory is indispensable.

Another interesting class of applications concerns
the behavior of the system with periodic boundary
conditions. It is based on the fact that the partition
function $Z_{T_N}$ on a torus $T_N$ consisting of $N^d$ sites
can be, again with the help of cluster expansions,
explicitly and very accurately evaluated in
terms of the metastable free energies,

$$\Big| Z_{T_N} - \sum_{q=1}^{r} e^{-\beta f_q(H) N^d} \Big| \le \exp\{-\beta \min_q f_q(H) N^d - b\beta N\} \qquad [23]$$

with a fixed constant $b$. This formula (and its
generalization to the case of complex parameters)
allows us to obtain various results concerning the
behavior of the model in finite volumes.


Finite-Size Effects 


Considering as an illustration a perturbation of the
Ising model, so that it does not have the $\pm$ symmetry
any more (and the value $h_t(\beta)$ of the external field
at which the phase transition between the plus and
minus phases occurs is not known), we can pose a
natural question that is important for the correct
interpretation of simulation data. Namely, what is
the asymptotic behavior of the magnetization
$m_N^{\mathrm{per}}(\beta, h) = \mu_{T_N}\big(N^{-d} \sum_{x \in T_N} \sigma_x\big)$ on a torus? In the
thermodynamic limit, the magnetization $m_\infty^{\mathrm{per}}(\beta, h)$
displays, as a function of $h$, a discontinuity at
$h = h_t(\beta)$. For finite $N$, we get a rounding of the
discontinuity: the jump is smoothed. What is the
shift of a naturally chosen finite-volume transition
point $h_t(N)$ with respect to the limiting value $h_t$?
The answer can be obtained with the help of [23]
once sufficient care is taken to use the freedom in
the definition of the metastable free energies $f_+(h)$
and $f_-(h)$ to replace them with a sufficiently smooth
version allowing an approximation of the functions
$f_\pm(h)$ around the limiting point $h_t$ in terms of their
Taylor expansion.

As a result, in spite of the asymmetry of the model,
the finite-volume magnetization $m_N^{\mathrm{per}}(\beta, h)$ has a
universal behavior in the neighborhood of the transition
point $h_t$. With suitable constants $m$ and $m_0$, we have

$$m_N^{\mathrm{per}}(\beta, h) \sim m_0 + m \tanh\{N^d \beta m (h - h_t)\} \qquad [24]$$

Choosing the inflection point $h_{\max}(N)$ of $m_N^{\mathrm{per}}(\beta, h)$
as a natural finite-volume indicator of the occurrence
of the transition, one can show that

$$h_{\max}(N) = h_t + \frac{3\chi}{2\beta^2 m^3} N^{-2d} + O(N^{-3d}) \qquad [25]$$
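
The rounding of the jump is easy to visualize from the leading-order form $m_0 + m \tanh\{N^d \beta m (h - h_t)\}$. The sketch below evaluates it for a few system sizes; the parameter values and the function name `magnetization` are illustrative assumptions, and the higher-order shift of the inflection point is not modeled.

```python
import math

def magnetization(h, n, d, beta, m, m0, h_t):
    """Leading-order finite-volume magnetization on a torus of n^d sites."""
    return m0 + m * math.tanh(n**d * beta * m * (h - h_t))

def rounding_width(n, d, beta, m):
    """Field scale over which the jump is smoothed: 1 / (n^d * beta * m)."""
    return 1.0 / (n**d * beta * m)

# Illustrative values: d = 2, beta = 1, jump 2m = 1.8 around m0 = 0.05.
params = dict(d=2, beta=1.0, m=0.9, m0=0.05, h_t=0.01)
for n in (4, 8, 16):
    width = rounding_width(n, params["d"], params["beta"], params["m"])
    value = magnetization(params["h_t"] + width, n, **params)
    print(n, width, value)
```

The width of the rounded region shrinks like $N^{-d}$, which is why the jump becomes sharp in the thermodynamic limit.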


Zeros of Partition Functions 


The full strength of the formula [23] is revealed
when studying the zeros of the partition function
$Z_{T_N}(z)$ as a polynomial in a complex parameter $z$
entering the Hamiltonian of the model. To be able
to use the theory in this case, one has to extend the
definitions of the metastable free energies to complex
values of $z$. Indeed, the construction still goes
through, now yielding genuinely complex contour
models $w_q$ with the help of an inductive procedure.
Notice that no analytic continuation is involved. An
analog of [23] is still valid,

$$\Big| Z_{T_N}(z) - \sum_{m=1}^{r} e^{-\beta f_m(z) N^d} \Big| \le \exp\{-\beta \min_m \mathrm{Re}\, f_m(z) N^d - b\beta N\} \qquad [26]$$


Using [26], it is not difficult to convince oneself
that the loci of zeros can be traced down to the
phase coexistence lines. Indeed, on the line of
the coexistence of two phases, $\mathrm{Re}\, f_m = \mathrm{Re}\, f_n$, the
partition function $Z_{T_N}(z)$ is approximated by
$e^{-\beta \mathrm{Re} f_m N^d}\big(e^{-i\beta \mathrm{Im} f_m N^d} + e^{-i\beta \mathrm{Im} f_n N^d}\big)$. The zeros of this
approximation are thus given by the equations

$$\mathrm{Re}\, f_m = \mathrm{Re}\, f_n < \mathrm{Re}\, f_\ell \quad \text{for all } \ell \neq m, n$$

$$\beta N^d (\mathrm{Im}\, f_m - \mathrm{Im}\, f_n) = \pi \bmod 2\pi \qquad [27]$$

The zeros of the full partition function $Z_{T_N}(z)$ can
be proved to be exponentially close, up to a shift
of order $O(e^{-b\beta N})$, to those of the discussed
approximation.

Briefly, the zeros of $Z_{T_N}(z)$ asymptotically concentrate
on the phase coexistence curves with the
density $(1/2\pi)\beta N^d |(d/dz)(f_m - f_n)|$.
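
The conditions for the zeros of the two-phase approximation can be verified on a toy example. The sketch below assumes two phases with $f_+(h) = -h$ and $f_-(h) = +h$ in a complex field $h$ (an illustrative choice, not the metastable free energies of any particular model), so that the approximate partition function is $e^{\beta V h} + e^{-\beta V h}$ with $V = N^d$. The coexistence line is then $\mathrm{Re}\, h = 0$, and the phase condition gives $\mathrm{Im}\, h = \pi(2k+1)/(2\beta V)$.

```python
import cmath
import math

def two_phase_partition(h, beta, volume):
    """exp(-beta V f_+) + exp(-beta V f_-) with f_+ = -h and f_- = +h."""
    return cmath.exp(beta * volume * h) + cmath.exp(-beta * volume * h)

def predicted_zeros(beta, volume, k_max):
    """Zeros from Re f_+ = Re f_- and beta V (Im f_+ - Im f_-) = pi mod 2 pi."""
    return [1j * math.pi * (2 * k + 1) / (2 * beta * volume)
            for k in range(-k_max, k_max)]

beta, volume = 0.7, 4**2
zeros = predicted_zeros(beta, volume, 3)
residuals = [abs(two_phase_partition(z, beta, volume)) for z in zeros]
spacing = abs(zeros[1] - zeros[0])   # equals pi / (beta * volume)
```

The spacing $\pi/(\beta V)$ between neighboring zeros corresponds to the density $(1/2\pi)\beta N^d |(d/dh)(f_+ - f_-)| = \beta N^d/\pi$ along the coexistence line.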


Bibliographical Remarks 
and Generalizations 


The original works of Pirogov and Sinai (1975, 1976)
and Sinai (1982) introduced an analog of the weights
$w'_q(\gamma)$ and parameters $a_q(H)$ as a fixed point of a
suitable mapping on a Banach space. The inductive
definition used here was introduced in Kotecký and
Preiss (1983) and Zahradník (1984). The completeness
of the phase diagram (the fact that the stable phases
exhaust the set of all periodic extremal Gibbs states)
was first proved in Zahradník (1984). Extension to
complex parameters was first considered in Gawędzki
et al. (1987) and Borgs and Imbrie (1989). For a review
of the standard Pirogov-Sinai theory, see Sinai (1982)
and Slawny (1987).

Application of Pirogov-Sinai theory to finite-size
effects was studied in Borgs and Kotecký (1990), and a
general theory of zeros of partition functions is
presented in Biskup et al. (2004).

The basic statement of the Pirogov—Sinai theory 
yielding the construction of the full phase diagram 
has been extended to a large class of models. Let us 
mention just few of them (with rather incomplete 
references): 


1. Continuous spins. The main difficulty in these 
models is that one has to deal with contours 
immersed in a sea of fluctuating spins (Dobrushin 
and Zahradnik 1986, Borgs and Waxler 1989). 

2. Potts model. An example of a system with a transition
in temperature, with the coexistence of the
low-temperature ordered and the high-temperature
disordered phases. The contour reformulation
employs contours between ordered and disordered
regions (Bricmont et al. 1985, Kotecký
et al. 1990). The treatment is simplified with the help
of the Fortuin-Kasteleyn representation (Laanait
et al. 1991).

3. Models with competing interactions. ANNNI
model, microemulsions. Systems with a rich
phase structure (Dinaburg and Sinai 1985).

4. Disordered systems. An example is a proof of 
the existence of the phase transition for the three- 
dimensional random field Ising model (Bricmont 
and Kupiainen 1987, 1988) using a renormaliza- 
tion group version of the Pirogov—Sinai theory 
first formulated in Gawedzki et al. (1987). 

5. Quantum lattice models. A class of quantum
models that can be viewed as a quantum perturbation
of a classical model. With the help of the
Feynman-Kac formula, these are rewritten as a
(d + 1)-dimensional classical model that is, in its turn,
treated by the standard Pirogov-Sinai theory (Datta
et al. 1996, Borgs et al. 1996).

6. Continuous systems. Gas of particles in con- 
tinuum interacting with a particular potential of 
Kac type. Pirogov—Sinai theory is used for a proof 
of the existence of the phase transitions after a 
suitable discretisation (Lebowitz et al. 1999). 


See also: Cluster Expansion; Falicov-Kimball Model;
Phase Transitions in Continuous Systems; Quantum
Spin Systems.

Introduction


Vortices have a long, fascinating history. Descartes
wrote in his Le Monde:
...que tous les mouvements qui se font au Monde sont 
en quelque façon circulaire: cest à dire que, quand un 
corps quitte sa place, il entre toujours en celle d’un 
autre, et celui-ci en celle d’un autre, et ainsi de suite 
jusques au dernier, qui occupe au méme instant le lieu 
délaissé par le premier. 


In particular, Descartes thought of vortices to 


model the dynamics of the solar system, as reported 
by W W R Ball (1940): 


Descartes’ physical theory of the universe, embodying 
most of the results contained in his earlier and 
unpublished Le Monde, is given in his Principia, 
1644,... He assumes that the matter of the universe 
must be in motion, and that the motion must result in a 
number of vortices. He stated that the sun is the center 
of an immense whirlpool of this matter, in which the 
planets float and are swept round like straws in a 
whirlpool of water. 


Descartes’ theory was later refuted by Newton
in his Principia in 1687. Almost two centuries later,
W Thomson (1867), later Lord Kelvin, made use
of vortices to formulate his atomic theory: each atom
was assumed to be made up of vortices in a sort of
ideal fluid. In 1878-79 the American physicist A M
Mayer conducted a few experiments with needle
magnets placed on floating pieces of cork in an
applied magnetic field, as toy models for studying
atomic interactions and forms (Mayer 1878, Aref
et al. 2003). In 1883, inspired by Mayer’s experiments,
J J Thomson combined W Thomson’s atomic theory
with H von Helmholtz’s point-vortex theory
(Helmholtz 1858): he regarded the electrons as
point vortices inside a positively charged shell (see
Figure 1), the vortices being located at the vertices of
regular polygons, and investigated the stability of
such structures (see Thomson (1883, section 2.1)).
The vortex-atomic theory survived for quite a few
years, until Rutherford’s experiments proved that
atoms have quite a different structure!
Before continuing this historical/modeling overview,
let us address the following question:

What is a vortex and, more specifically, what is a
point vortex?


Roughly speaking, following Descartes, a vortex
is an entity which makes particles move along
circular-like orbits. Examples are the cyclones and
anticyclones in the atmosphere (see Figure 3).
Mathematically speaking, let $\mathbf{u} = (u, v, w) \in \mathbb{R}^3$
be a velocity field; the associated vorticity field
$\boldsymbol{\omega}$ is defined to be

$$\boldsymbol{\omega} = \nabla \wedge \mathbf{u} \qquad [1]$$

In this article we consider exclusively inviscid
flows which are also incompressible, that is,

$$\nabla \cdot \mathbf{u} = 0 \qquad [2]$$

and have constant density $\rho$, which we normalize to
be equal to 1 ($\rho = 1$). In two dimensions, a point-vortex
field is the simplest of all vorticity fields: it
can be thought of as an entity where the vorticity field
is concentrated into a point. In other words, point
vortices are singularities of the vorticity field! Then,
in the plane the vorticity field associated with a system
of N point vortices is

$$\omega(\mathbf{r}) = \sum_{\alpha=1}^{N} \Gamma_\alpha\, \delta(\mathbf{r} - \mathbf{r}_\alpha) \qquad [3]$$





Figure 1 Thomson atomic model: (a) atom with three
electrons and (b) atom with four electrons. From Thomson JJ
(1883) A Treatise on the Motion of Vortex Rings. New York:
Macmillan and Thomson JJ (1904) Electricity and Matter.
Westminster: Archibald Constable.





Figure 2 Hurricane Jeanne. Reproduced with permission from 
the National Oceanic and Atmospheric Administration (NOAA) 
(www.noaanews.noaa.gov). 
Figure 3 Cyclones and anticyclones in the atmosphere.
Reproduced from Boatto S and Cabral HE, SIAM Journal of Applied
Mathematics 64: 216-230 (2003). With the permission of SIAM.


where $\Gamma_\alpha$, $\alpha = 1, \ldots, N$, is a constant and
corresponds to the vorticity (or circulation) of the
$\alpha$-vortex, situated at $\mathbf{r}_\alpha$. In fact, by definition,
the circulation around a curve C delimiting a region
$\Sigma$ with boundary C is

$$\Gamma_C = \oint_C \mathbf{u} \cdot d\mathbf{s} = \iint_\Sigma (\nabla \wedge \mathbf{u}) \cdot \mathbf{n}\, da = \iint_\Sigma \boldsymbol{\omega} \cdot \mathbf{n}\, da \qquad [4]$$

where we have used Stokes’ theorem to bring in the
vorticity. Then if the region contains only the $\alpha$th
point vortex, we obtain

$$\Gamma_C = \iint_\Sigma \omega\, da = \Gamma_\alpha \qquad [5]$$

by eqn [3]. A positive (resp. negative) sign of $\Gamma_\alpha$
indicates that the corresponding point vortex
induces an anticlockwise (resp. clockwise) particle
motion (see Figure 4a). Is there an analog of a
point-vortex system for a three-dimensional flow?
Yes, and this brings in the analogy between vortex
lines and magnetic field lines that Mayer used in his
experiments with floating magnets. In fact, in three
dimensions, the notion of a point vortex can be
extended to that of a straight vortex line (see
Figure 4b), where, by definition, a vortex line is a curve
that is tangent to the vorticity vector $\boldsymbol{\omega}$ at each of its
points. In this context we would like to mention the
beautiful experiments of Yarmchuk, Gordon, and Packard
on vortices in superfluid helium. They observed the
formation of stable polygonal configurations of
identical vortices, quite similar to the ones observed by
Mayer with his magnets (see Figures 5 and 1).

One would like to understand how such configurations
form and to give a theoretical account of their
stability. In order to answer these questions we first
have to be able to describe the dynamics of a system of
point vortices from a mathematical point of view.


Evolution Equations 


Can point vortices be viewed as “discrete” (or
localized) solutions of the Euler equation in two
dimensions? Let us consider the Euler equation

$$\frac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u} = -\nabla p + \mathbf{f} \qquad [6]$$

where p is the pressure and $\mathbf{f} = -\nabla U$ is a conservative
force, and restrict our attention to the two-dimensional
setting, for example, vortex dynamics on the plane (or a
sphere). Then, by taking the curl of eqn [6], we
immediately obtain the evolution equation of the
vorticity, that is,

$$\frac{\partial \omega}{\partial t} + \mathbf{u} \cdot \nabla \omega = 0, \quad \text{or} \quad \frac{D\omega}{Dt} = 0 \qquad [7]$$

where the operator $D/Dt = \partial/\partial t + \mathbf{u} \cdot \nabla$ is called the
material derivative and describes the evolution along
the flow lines. It follows from eqn [7] that in two
dimensions the vorticity is conserved as it is
transported along the flow lines. Then a natural question
arises: supposing the vorticity field $\omega$ is known, is it
possible to deduce the velocity field $\mathbf{u}$ generating $\omega$? Or,
in other words, is it possible to solve the system of eqns
[1]-[2]? In general the solution is not unique unless
boundary conditions are specified (see Marchioro and
Pulvirenti (1993)).

Figure 4 (a) Advected by the velocity field of one point
vortex, a test particle follows a circular orbit, with a
speed proportional to the absolute value of the vortex
circulation and inversely proportional to its distance
from the vortex. (b) Straight vortex lines.

Figure 5 Photographs of vortex configurations in a rotated
sample of superfluid helium with 1,..., 11 vortices. Reprinted
figure with permission from Yarmchuk EJ, Gordon MJV, and
Packard RE (1979) Observation of stationary vortices arrays in
rotating superfluid Helium. Physical Review Letters 43(3):
214-217. Copyright (1979) by the American Physical Society.

Furthermore, as already observed by Kirchhoff in 1876
(Boatto and Cabral 2003), in two dimensions we can
recast the fluid equations [1]-[2] into a Hamiltonian
formalism. In fact, notice that on the plane
$\mathbf{u} = (\dot{x}, \dot{y})$ and eqn [2] is still satisfied if we
represent the velocity components as

$$\dot{x} = \frac{\partial \Psi}{\partial y}, \qquad \dot{y} = -\frac{\partial \Psi}{\partial x} \qquad [8]$$

that is, by means of $\Psi$, called the stream function.
Formally, $\Psi$ plays the role of a Hamiltonian for the pair
of conjugate variables (x, y) and it is used to describe the
dynamics of a test particle, located at (x, y) and advected
by the flow. By substituting [8] into [1], we obtain


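$$\Delta \Psi(\mathbf{r}) = -\omega(\mathbf{r}) \qquad [9]$$

that is, a Poisson equation with $\omega$ as a source term.
Then, once we specify the vorticity field, by
inverting [9] we obtain the stream function $\Psi$ to be

$$\Psi(\mathbf{r}) = \int G(\mathbf{r}, \mathbf{r}')\, \omega(\mathbf{r}')\, d\mathbf{r}' \qquad [10]$$

where $G(\mathbf{r}, \mathbf{r}')$ is the Green’s function, solution of
the equation $\Delta G(\mathbf{r}, \mathbf{r}') = -\delta(\mathbf{r} - \mathbf{r}')$. The Green’s
function both for the plane and the sphere is (Marchioro
and Pulvirenti 1993)

$$G(\mathbf{r}, \mathbf{r}') = -\frac{1}{4\pi} \log \|\mathbf{r} - \mathbf{r}'\|^2 \qquad [11]$$

where $\|\mathbf{r} - \mathbf{r}'\|^2 = (x - x')^2 + (y - y')^2$. By [10], once
we specify the vorticity field $\omega(\mathbf{r})$ we can compute $\Psi$,
and by substituting it into [8] the velocity field becomes

$$\mathbf{u}(\mathbf{r}) = \int K(\mathbf{r}, \mathbf{r}')\, \omega(\mathbf{r}')\, d\mathbf{r}' \qquad [12]$$

where $K(\mathbf{r}, \mathbf{r}') = -(\mathbf{r} - \mathbf{r}')^{\perp} / [2\pi \|\mathbf{r} - \mathbf{r}'\|^2]$, with
$(x, y)^{\perp} = (y, -x)$, represents the velocity field
generated by a point vortex of intensity one, located at
$\mathbf{r}'$. Then by considering the vorticity field generated by
point vortices, eqn [3], together with eqn [11], eqn [10]
becomes

$$\Psi(\mathbf{r}) = -\frac{1}{4\pi} \int \log \|\mathbf{r} - \mathbf{r}'\|^2 \sum_{\alpha=1}^{N} \Gamma_\alpha\, \delta(\mathbf{r}' - \mathbf{r}_\alpha)\, d\mathbf{r}' = -\frac{1}{4\pi} \sum_{\alpha=1}^{N} \Gamma_\alpha \log \|\mathbf{r} - \mathbf{r}_\alpha\|^2 \qquad [13]$$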
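As an illustration of the stream function of eqn [13] and of the test-particle velocity it generates, the following minimal Python sketch may help. It assumes the sign convention in which the velocity components are (dΨ/dy, -dΨ/dx), so that a positive circulation induces anticlockwise motion; the function names and the representation of each vortex as a tuple (Γ, x, y) are choices made here for illustration only.

```python
import math

def stream_function(x, y, vortices):
    """Psi(r) = -(1/4pi) * sum_a Gamma_a * log|r - r_a|^2, as in eqn [13].

    `vortices` is a list of (gamma, xa, ya) tuples (an illustrative layout).
    """
    psi = 0.0
    for gamma, xa, ya in vortices:
        psi -= gamma * math.log((x - xa) ** 2 + (y - ya) ** 2) / (4.0 * math.pi)
    return psi

def tracer_velocity(x, y, vortices):
    """Velocity (dx/dt, dy/dt) = (dPsi/dy, -dPsi/dx), differentiated by hand."""
    u = 0.0
    v = 0.0
    for gamma, xa, ya in vortices:
        dx, dy = x - xa, y - ya
        r2 = dx * dx + dy * dy
        u -= gamma * dy / (2.0 * math.pi * r2)
        v += gamma * dx / (2.0 * math.pi * r2)
    return u, v

if __name__ == "__main__":
    single = [(2.0, 0.0, 0.0)]  # one vortex of circulation 2 at the origin
    for radius in (1.0, 2.0, 4.0):
        u, v = tracer_velocity(radius, 0.0, single)
        # The speed should fall off like 1/radius for a single point vortex.
        print(radius, math.hypot(u, v))
```

For a single vortex the printed speed halves each time the distance doubles, which is the 1/r decay of the kernel K in eqn [12].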


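Equation [13], together with [8], describes the
dynamics of a test particle at a point $\mathbf{r} = (x, y)$ in
the plane. Analogously, it can be shown that the
dynamics of a system of point vortices in the plane
is given by the equations

$$\Gamma_\alpha \frac{dx_\alpha}{dt} = \frac{\partial H_v}{\partial y_\alpha}, \qquad \Gamma_\alpha \frac{dy_\alpha}{dt} = -\frac{\partial H_v}{\partial x_\alpha} \qquad [14]$$

where $(q_\alpha, p_\alpha) = (x_\alpha, \Gamma_\alpha y_\alpha)$, $\alpha = 1, \ldots, N$, is a pair of
conjugate variables and $H_v$ is the generalization of
the stream function $\Psi$ (eqn [13]):

$$H_v = -\frac{1}{4\pi} \sum_{\alpha \neq \beta} \Gamma_\alpha \Gamma_\beta \log \|\mathbf{r}_\alpha - \mathbf{r}_\beta\| \qquad [15]$$
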
Notice that the vortex Hamiltonian $H_v$ (eqn [15]) is
an autonomous Hamiltonian and, as we will discuss
in the section on the stability of a vortex ring, it
provides a good Lyapunov-like function to study
stability properties of some vortex configurations.
Moreover, $H_v$ is invariant with respect to rotations
and translations; hence, by Noether’s theorem, there
are other first integrals of motion, that is,

$$L = \sum_{k=1}^{N} \Gamma_k (x_k^2 + y_k^2), \qquad M_x = \sum_{k=1}^{N} \Gamma_k x_k, \qquad M_y = \sum_{k=1}^{N} \Gamma_k y_k$$

expressing, respectively, the conservation of angular
momentum, L, and linear momentum, $\mathbf{M} = (M_x, M_y)$,
on the plane. We shall denote by M
the magnitude of $\mathbf{M}$ (i.e., $M = \|\mathbf{M}\|$). Furthermore,
by introducing the Poisson bracket


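$$\{f, g\} = \sum_{\alpha=1}^{N} \left( \frac{\partial f}{\partial q_\alpha} \frac{\partial g}{\partial p_\alpha} - \frac{\partial f}{\partial p_\alpha} \frac{\partial g}{\partial q_\alpha} \right) = \sum_{\alpha=1}^{N} \frac{1}{\Gamma_\alpha} \left( \frac{\partial f}{\partial x_\alpha} \frac{\partial g}{\partial y_\alpha} - \frac{\partial f}{\partial y_\alpha} \frac{\partial g}{\partial x_\alpha} \right)$$

we can construct three integrals in involution out of
the four conserved quantities L, $M_x$, $M_y$, and $H_v$.
These are L, $M_x^2 + M_y^2$ and $H_v$: in fact,

$$\{H_v, L\} = 0, \qquad \{H_v, M_x^2 + M_y^2\} = 0, \qquad \{L, M_x^2 + M_y^2\} = 0$$
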
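The conservation of $H_v$, L, $M_x$, and $M_y$ can be checked numerically. The sketch below is only an illustration under stated assumptions: it writes the planar point-vortex velocities in complex notation (z = x + iy), uses a classical fourth-order Runge-Kutta step chosen here for simplicity, and all function names are hypothetical.

```python
import math

def velocities(z, gamma):
    """dz_a/dt = (i / 2pi) * sum_{b != a} Gamma_b (z_a - z_b) / |z_a - z_b|^2."""
    out = []
    for a in range(len(z)):
        s = 0j
        for b in range(len(z)):
            if b != a:
                d = z[a] - z[b]
                s += gamma[b] * d / abs(d) ** 2
        out.append(1j * s / (2.0 * math.pi))
    return out

def rk4_step(z, gamma, dt):
    """One classical Runge-Kutta step (an integrator chosen for illustration)."""
    k1 = velocities(z, gamma)
    k2 = velocities([zi + 0.5 * dt * k for zi, k in zip(z, k1)], gamma)
    k3 = velocities([zi + 0.5 * dt * k for zi, k in zip(z, k2)], gamma)
    k4 = velocities([zi + dt * k for zi, k in zip(z, k3)], gamma)
    return [zi + dt * (a + 2 * b + 2 * c + d) / 6.0
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]

def invariants(z, gamma):
    """Return (H_v, L, M_x, M_y) for the current configuration."""
    h = 0.0
    for a in range(len(z)):
        for b in range(len(z)):
            if a != b:
                h -= gamma[a] * gamma[b] * math.log(abs(z[a] - z[b])) / (4.0 * math.pi)
    l = sum(g * abs(zi) ** 2 for g, zi in zip(gamma, z))
    m = sum(g * zi for g, zi in zip(gamma, z))
    return h, l, m.real, m.imag

if __name__ == "__main__":
    gamma = [1.0, 2.0, 1.5]            # three like-signed circulations
    z = [0j, 1 + 0j, 0.3 + 0.8j]       # arbitrary initial positions
    before = invariants(z, gamma)
    for _ in range(2000):
        z = rk4_step(z, gamma, 0.005)
    after = invariants(z, gamma)
    print([abs(x - y) for x, y in zip(before, after)])  # drifts stay tiny
```

Like-signed circulations are used in the demonstration so that the vortices cannot approach each other arbitrarily closely, which keeps a fixed time step adequate.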

It is then possible to reduce the system of equations
from N to N - 2 degrees of freedom. A Hamiltonian
system with N degrees of freedom is integrable
whenever there are N independent integrals of
motion in involution. It follows that a vortex system
with $N \leq 3$ is integrable, whereas the system of
equations of four identical vortices has been shown
by Ziglin to be nonintegrable in the sense that there
are no other first integrals analytically depending on
the coordinates and circulations, and functionally
independent of L, $H_v$, $M_x$, $M_y$ (see Ziglin (1982)).
The following, however, has been shown:


1. Let $K = \sum_{\alpha=1}^{N} k_\alpha$ be the total vorticity,
$\mathbf{M} = (M_x, M_y)$ the total momentum and $M = \|\mathbf{M}\|$.
Then, as shown by Aref and Stremler (1999), if K = 0
and M = 0, the N-vortex problem [16] is integrable.

2. A system of four identical vortices (i.e., $k_\alpha = k$
for $\alpha = 1, \ldots, 4$) can undergo periodic or
quasiperiodic motion for special initial conditions (see
Khanin (1981) Russian Math. Surveys 36: 231;
Aref and Pomphrey (1982) Proc. R. Soc. Lond. A
380: 359-387). More specifically, the motion of a
system of four identical vortices can be periodic,
quasiperiodic, or chaotic depending on the symmetry
of the initial configuration. In fact, every vortex
configuration that belongs to the subspace of
centrally symmetric configurations, $x_\alpha = -x_{\alpha+2}$ and
$y_\alpha = -y_{\alpha+2}$, $\alpha = 1, 2$, gives rise to an integrable
vortex motion.


For up to two vortices, the motion is
almost always periodic and the orbits are circles, the
only exception being the case $\Gamma_2 = -\Gamma_1$,
when the circles degenerate into straight lines. Thus,
a configuration of two point vortices is always a
relative equilibrium configuration, that is, there exists
a specific reference frame in which the two vortices
are at rest. If the vortices are identical ($\Gamma_1 = \Gamma_2 = \Gamma$),
the motion is synchronous with frequency $\Omega = \Gamma/(\pi r^2)$,
where r is the distance between the two vortices,
and the vortices share the same circular orbit (see
Figure 6a). If the vortices are not identical and have
vorticities of different magnitudes (say $|\Gamma_1| > |\Gamma_2|$),
their motion is still synchronous and periodic, with
frequency $\Omega = (\Gamma_1 + \Gamma_2)/(2\pi r^2)$, and the vortices move
on different circular orbits (of radii $r_1 < r_2$), both
centered at the center of vorticity. Note that in
both cases, identical and nonidentical vortices, we
can view the vortex dynamics in a co-rotating frame
where the vortices are simply at rest.

Figure 6 (a-d) For N = 2 the vortex pair exhibits a
synchronous motion and the orbits are in general circular,
with the exception of case (d), for which $\Gamma_1 = -\Gamma_2$ and the
circular orbit degenerates into a line (or a circle of
infinite radius).
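The two-vortex statements above lend themselves to a quick numerical check. The following Python sketch is an illustration only: it assumes the planar point-vortex velocity law in complex notation, takes r to be the distance between the two vortices, and requires a nonzero total circulation so that the center of vorticity exists; the function name is hypothetical.

```python
import math

def pair_rotation(gamma1, gamma2, z1, z2):
    """Return (omega, r1, r2) for two point vortices at complex positions z1, z2.

    omega is the instantaneous angular velocity of the separation vector;
    r1 and r2 are the distances of the vortices from the center of vorticity.
    Assumes gamma1 + gamma2 != 0.
    """
    d = z1 - z2
    dist2 = abs(d) ** 2
    v1 = 1j * gamma2 * d / (2.0 * math.pi * dist2)     # vortex 1, induced by vortex 2
    v2 = 1j * gamma1 * (-d) / (2.0 * math.pi * dist2)  # vortex 2, induced by vortex 1
    omega = ((v1 - v2) / d).imag
    center = (gamma1 * z1 + gamma2 * z2) / (gamma1 + gamma2)
    return omega, abs(z1 - center), abs(z2 - center)

if __name__ == "__main__":
    # Unequal vortices a distance 2 apart: the stronger one is closer to the center.
    print(pair_rotation(3.0, 1.0, 0j, 2 + 0j))
    # Identical vortices a distance 1 apart share one circle of radius 1/2.
    print(pair_rotation(1.0, 1.0, -0.5 + 0j, 0.5 + 0j))
```

In both cases the computed angular velocity agrees with $(\Gamma_1 + \Gamma_2)/(2\pi r^2)$, and the orbit radii are in inverse proportion to the magnitudes of the circulations.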

For three vortices we can have periodic and 
quasiperiodic motion, depending on the initial 
conditions, and for four vortices we can have 
periodic, quasiperiodic, or weakly chaotic motion. 


Remarks 


(i) The nonintegrability of the 4-vortex system was
also proved for configurations of nonidentical vortices.
Koiller and Carvalho (1989) gave an analytical proof
for $\Gamma_2 = -\Gamma_1$ and $\Gamma_3 = \Gamma_4 = \epsilon$, $0 < \epsilon \ll 1$. Moreover,
Castilla et al. (1993) considered the case
$\Gamma_1 = \Gamma_2 = \Gamma_3 = 1$ and $\Gamma_4 = \epsilon$.

(ii) Due to the translational and rotational
symmetries of $H_v$, there are some analogies between
the N-vortex problem and the N-body problem,
especially as regards configurations of
relative equilibria (see Albouy (1996) and Glass
(2000)). A relative equilibrium is a vortex (or mass)
configuration that moves without change of shape
or form, that is, a configuration which is steadily
rotating or translating. A few examples are vortex
polygons (see Figure 7) like the ones studied by
Thomson, Mayer, Yarmchuk-Gordon-Packard,
Boatto-Cabral (2003), Cabral-Schmidt (1999/2000),
Dritschel-Polvani (1993), Lim-Montaldi-Roberts (2001),
and Sakajo (2004). For an exhaustive
review on relative equilibria of vortices, see the
article by Aref et al. (2003). We shall discuss the
stability of polygonal vortex configurations in the
section on the stability of a vortex ring.

Figure 7 Polygonal configuration of vortices: (a) planar
configurations and (b) configurations of vortex rings on a
sphere, with and without polar vortices.

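(iii) As shown by Kimura (1999) in a beautiful
geometrical formalism, on the unit sphere ($S^2$) and
on the hyperbolic plane ($H^2$), the vortex Hamiltonians
[15] are

$$H_v = -\frac{1}{4\pi} \sum_{\alpha \neq \beta} \Gamma_\alpha \Gamma_\beta \log(1 - \cos \rho_{\alpha\beta}) \quad \text{on } S^2$$

$$H_v = -\frac{1}{4\pi} \sum_{\alpha \neq \beta} \Gamma_\alpha \Gamma_\beta \log \frac{\cosh \rho_{\alpha\beta} - 1}{\cosh \rho_{\alpha\beta} + 1} \quad \text{on } H^2$$

where

$$\cos \rho_{\alpha\beta} = \cos \theta_\alpha \cos \theta_\beta + \sin \theta_\alpha \sin \theta_\beta \cos(\phi_\alpha - \phi_\beta) \quad \text{on } S^2$$

$$\cosh \rho_{\alpha\beta} = \cosh \theta_\alpha \cosh \theta_\beta - \sinh \theta_\alpha \sinh \theta_\beta \cos(\phi_\alpha - \phi_\beta) \quad \text{on } H^2$$

On $S^2$, $\theta_\alpha$ and $\phi_\alpha$ are, respectively, the co-latitude
and the longitude of the $\alpha$-vortex, $\alpha = 1, \ldots, N$. We
can define canonical variables $q_\alpha$ and $p_\alpha$ on $S^2$ and
$H^2$, respectively, as

$$q_\alpha = \Gamma_\alpha \cos \theta_\alpha, \qquad p_\alpha = \phi_\alpha \quad \text{on } S^2$$

$$q_\alpha = \Gamma_\alpha \cosh \theta_\alpha, \qquad p_\alpha = \phi_\alpha \quad \text{on } H^2$$

Montaldi et al. (2002) studied vortex dynamics on
a cylindrical surface, and Soulière and Tokieda
(2002) considered vortex dynamics on surfaces
with symmetries.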

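(iv) As we shall see in the section on point
vortex motion with boundaries, it is sometimes useful
to employ the formalism of complex analysis. Then the
variables of interest are $z_\alpha = x_\alpha + i y_\alpha$, $\alpha = 1, \ldots, N$,
and their conjugates $\bar{z}_\alpha$; the Hamiltonian [15] takes the
form

$$H_v = -\frac{1}{4\pi} \sum_{\alpha \neq \beta} \Gamma_\alpha \Gamma_\beta \log |z_\alpha - z_\beta|$$

and the equations of motion become

$$\dot{z}_\alpha = \frac{i}{2\pi} \sum_{\beta \neq \alpha} \Gamma_\beta \frac{z_\alpha - z_\beta}{|z_\alpha - z_\beta|^2}, \qquad \alpha = 1, \ldots, N \qquad [16]$$

(v) Equation [14] can be rewritten in a more
compact form as

$$\frac{d\mathbf{X}}{dt} = J \nabla_{\mathbf{X}} H_v \qquad [17]$$

where

$$\mathbf{X} = (q_1, \ldots, q_N, p_1, \ldots, p_N)$$

$$\nabla_{\mathbf{X}} = \left( \frac{\partial}{\partial q_1}, \ldots, \frac{\partial}{\partial q_N}, \frac{\partial}{\partial p_1}, \ldots, \frac{\partial}{\partial p_N} \right)$$

$$J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$$

I being the N x N identity matrix.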

(vi) How close is the point-vortex model to the
original Euler equation? Point-vortex systems
represent discrete solutions of the Euler equation in a
“weak” sense; see both the book and the article by
Marchioro and Pulvirenti (1993, 1994). These
authors proved that the Euler dynamics is “similar”
to the vortex dynamics in which the vortices are
localized in very small regions, and the vortex
intensities are the total vorticities associated with
such small regions. In particular, let us consider a
vorticity field with compact support on a family of
$\epsilon$-balls, that is,

$$\omega_\epsilon = \sum_{j=1}^{N} \omega_\epsilon^j$$

with the support of $\omega_\epsilon^j$ contained in the ball of center
$\mathbf{r}_j$ (independent of $\epsilon$) and radius $\epsilon$. Furthermore let
us assume that

$$\int_{|\mathbf{r} - \mathbf{r}_j| < \epsilon} \omega_\epsilon^j \, d\mathbf{r} = \Gamma_j$$

with the $\Gamma_j$ independent of $\epsilon$. Then in the limit $\epsilon \to 0$
the dynamics of the center of vorticity
$B_\epsilon(t) = \int \mathbf{r}\, \omega_\epsilon(\mathbf{r}, t)\, d\mathbf{r}$ of a given $\epsilon$-ball “converges”
to the motion of a single point vortex (see Figure 8).
This result is important because it shows that vortex
systems provide both a useful heuristic tool in the
analysis of the general properties of the solutions of
Euler’s equations (Poupaud 2002, Schochet 1995),
and a useful starting point for the construction of
practical algorithms for solving equations in specific
situations. In particular, it provides a theoretical
justification for the vortex method previously
introduced by Carnevale et al. (1992). These authors
constructed a numerical algorithm to study decaying
turbulence in two dimensions. Their vortex
method greatly simplifies fluid simulations, as it
basically relies on a discretization of the fluid into
circular patches. The dynamics of the patches is given
by the centers of vorticity, which interact as a
point-vortex system, endowed with a rule dictating how
patches merge (see Figure 9).

Figure 8 In the limit $\epsilon \to 0$, the dynamics of the center of
vorticity of a vortex $\epsilon$-ball is approximated by the dynamics
of a point vortex.


Stability of a Vortex Ring 


As mentioned in the Introduction, the study
of vortex relative equilibria has a long history.
Kelvin showed that steadily rotating patterns of
identical vortices arise as solutions of a variational
problem in which the interaction energy (vortex
Hamiltonian) is minimized subject to the constraint
that the angular impulse be maintained (see Aref
et al. (2003)). In 1883, while studying and modeling the
atomic structure, J J Thomson investigated the linear
stability of co-rotating point vortices in the plane. In
particular, his interest was in configurations of
identical vortices equally spaced along the
circumference of a circle, that is, located at the vertices
of a regular polygon (see Figure 7). He proved that for
six or fewer vortices the polygonal configurations
are stable, while for seven vortices (the Thomson
heptagon) he erroneously concluded that the
configuration is slightly unstable. It took more
than a century to make progress on this
problem. D G Dritschel (1985) succeeded in solving
the heptagon mystery as far as its linear stability
is concerned, leaving open the question of nonlinear
stability: he proved that the Thomson heptagon
is neutrally stable and that for eight or more vortices
the corresponding polygonal configurations are
linearly unstable. Later, Polvani and
Dritschel (1993) generalized the techniques used in
Dritschel (1985) to study the linear stability of a
“latitudinal” ring of point vortices on the sphere, as
a function of the number N of vortices in the ring,
and of the ring’s co-latitude $\theta$ (see Figure 10). They
proved that polygonal configurations are more
unstable on the sphere than in the plane. In
particular, they showed that at the pole, for N < 7
the configuration is stable, for N = 7 it is neutrally
stable, and for N > 7 it is unstable. By means of the
energy-momentum method (Marsden-Meyer-Weinstein
reduction), J E Marsden and S Pekarsky (1998)
studied the nonlinear stability of the
integrable case of polygonal configurations of
three vortices of arbitrary vorticities ($\Gamma_1$, $\Gamma_2$, and
$\Gamma_3$) on the sphere, leaving open the stability
analysis for nonintegrable vortex systems (N > 3).
In 1999 H E Cabral and D S Schmidt completed
the linear and nonlinear stability analysis at once
for polygonal configurations in the plane. In 2003
Boatto and Cabral studied the nonlinear stability of
a ring of vortices on the sphere, as a function of the
number of vortices N and the ring co-latitude $\theta$.
Boatto and Simó (2004) generalized the stability
analysis to the case of a ring with polar vortices
and of multiple rings, the key idea being, as we
shall discuss in this section, the structure of the
Hessian of the Hamiltonian.

Figure 9 In Carnevale et al. (1992) the fluid is modeled by a
dilute vortex gas with density $\rho$ and typical radius a. The
dynamics is governed by the point-vortex dynamics of the disk
centers, each disk corresponding to a point vortex of intensity
$\Gamma = \pi \xi_{\mathrm{ext}} a^2$, where $\xi_{\mathrm{ext}}$ plays the role of a vorticity
density. Two vortices of radii $a_1$ and $a_2$ merge when their
center-to-center distance is less than or equal to the sum of
their radii, $a_1 + a_2$. Then a new vortex is created and its
radius $a_3$ is given by $a_3 = (a_1^4 + a_2^4)^{1/4}$.

Figure 10 Latitudinal ring of vortices. Reproduced with
permission from Boatto S and Cabral HE SIAM Journal of
Applied Mathematics 64: 216-230 (2003).


How can one infer the linear and nonlinear stability
of steadily rotating configurations?


Let us restrict the discussion to a polygonal ring of
identical vortices on a sphere as illustrated in
Figure 7 (Boatto and Cabral 2003, Boatto and
Simó 2004). The reasoning is easily generalized to
the planar case. The case of multiple rings is
discussed in great detail in Boatto and Simó
(2004). A polygonal ring is a relative equilibrium
of coordinates $\mathbf{X}(t) = (q_1(t), \ldots, q_N(t), p_1(t), \ldots, p_N(t))$,
where

$$q_\alpha(t) = \phi_\alpha(t) = \omega t + \phi_{0\alpha}, \qquad p_\alpha(t) = p_0 = \Gamma \cos \theta_0, \qquad \alpha = 1, \ldots, N \qquad [18]$$

where $\omega$ is the rotation frequency of the ring,
$r_0 = \sin \theta_0$, and $\phi_{0\alpha}$ and $\theta_{0\alpha} = \theta_0$ are
the initial longitude and co-latitude of the $\alpha$th
vortex.


Theorem 1 (Spherical case) (Boatto and Simó
2004). The relative equilibrium [18] is (linearly and
nonlinearly) stable if

$$-4(N - 1)(11 - N) + 24(N - 1) r_0^2 + 2N^2 + 1 + 3(-1)^N < 0 \qquad [19]$$

and it is unstable if the inequality is reversed.


Remarks 


(i) By Theorem 1 a vortex polygon of N point vortices
is stable for $0^\circ < \theta_0 < \theta_0^*$ and $(180^\circ - \theta_0^*) < \theta_0 < 180^\circ$,
where $\theta_0^* = \arcsin(r_0^*)$ and $r_0^*$ is the critical
value of $r_0$ in the bounds

$$r_0^2 < \frac{7 - N}{4} \quad \text{for N odd}$$

$$r_0^2 < -\frac{N^2 - 8N + 8}{4(N - 1)} \quad \text{for N even}$$

where $r_0^* = \sin \theta_0^*$.

(ii) Theorem 1 includes at once the results of
Thomson (1883), Dritschel (1985), and Polvani
and Dritschel (1993) (and other authors who
have been working in the area (Aref et al. 2003)).
We recover the planar case by setting $r_0 = 0$ in
eqn [19], deducing that stability is guaranteed
for N < 7.
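The stability criterion can be tabulated directly. The sketch below is an illustration under stated assumptions: it evaluates the left-hand side of inequality [19] with $r_0 = \sin \theta_0$, and computes the critical co-latitude from the bounds of Remark (i); both function names are hypothetical.

```python
import math

def stability_indicator(n, r0):
    """Left-hand side of inequality [19]; a negative value means the ring is stable."""
    return (-4 * (n - 1) * (11 - n) + 24 * (n - 1) * r0 ** 2
            + 2 * n ** 2 + 1 + 3 * (-1) ** n)

def critical_colatitude_deg(n):
    """Critical co-latitude theta* in degrees from Remark (i).

    Returns None when the bound on r0^2 is not positive, i.e. when the
    inequality cannot hold strictly for any co-latitude.
    """
    if n % 2 == 1:
        bound = (7 - n) / 4.0
    else:
        bound = -(n * n - 8 * n + 8) / (4.0 * (n - 1))
    if bound <= 0.0:
        return None
    return math.degrees(math.asin(min(1.0, math.sqrt(bound))))

if __name__ == "__main__":
    for n in range(2, 9):
        # Indicator at the pole (r0 = 0), followed by the critical co-latitude.
        print(n, stability_indicator(n, 0.0), critical_colatitude_deg(n))
```

At the pole the indicator is negative for N < 7, vanishes for N = 7, and is positive for N = 8, which reproduces the planar picture described in Remark (ii).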


To prove Theorem 1 it is useful to consider the
Hamiltonian equations as in eqn [17]. The first step
is to make a change of reference frame: view the
dynamics in a frame co-rotating with the relative
equilibrium configuration. In the co-rotating
reference system, the Hamiltonian takes the form

$$\tilde{H} = H + \omega M$$

where M is the momentum of the system, and H and
$\omega$ are, respectively, the Hamiltonian and the
rotational frequency of the relative equilibrium in the
original frame of reference. In the new reference
frame, the relative equilibrium becomes an
equilibrium, $\mathbf{X}^*$, and the standard techniques can be used
to study its stability.

To study linear stability, the relevant equation is

$$\frac{d\,\Delta\mathbf{X}}{dt} = J S\, \Delta\mathbf{X} \qquad [20]$$

where $\mathbf{X} = \mathbf{X}^* + \Delta\mathbf{X}$, and S is the Hessian of $\tilde{H}$
evaluated at the equilibrium $\mathbf{X}^*$. Linear (or
spectral) stability is then deduced by studying the
eigenvalues of the matrix JS. For
nonlinear stability we make use of a sufficient
stability criterion due to Dirichlet (see G
Lejeune Dirichlet (1897). Werke, vol. 2, Berlin,
pp. 5-8; Boatto and Cabral (2003) and references
therein).


Theorem 2 Let $\mathbf{X}^*$ be an equilibrium of an
autonomous system of ordinary differential equations

$$\frac{d\mathbf{X}}{dt} = f(\mathbf{X}), \qquad \mathbf{X} \in \Omega \subset \mathbb{R}^{2N} \qquad [21]$$

that is, $f(\mathbf{X}^*) = 0$. If there exists a positive (or
negative) definite integral F of the system [21] in a
neighborhood of the equilibrium $\mathbf{X}^*$, then $\mathbf{X}^*$ is
stable.


In our case the Hamiltonian itself is an integral of
motion. Then by studying the definiteness of its
Hessian, S, evaluated at $\mathbf{X}^*$, we infer minimal stability
intervals in $\theta$ and N. Details are given in Boatto and
Cabral (2003) and Boatto and Simó (2004). The
proof is mainly based on the following
considerations:


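1. Since S is a symmetric matrix it is diagonalizable,
that is, there exists an orthogonal matrix
C such that $C^T S C = D$, where D is a diagonal
matrix, $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_{2N})$. Furthermore, the
matrix C can be chosen to leave invariant the
symplectic form (equivalently $J = C^T J C$). Then
by the canonical change of variables $\mathbf{Y} = C^T \mathbf{X}$
eqn [20] becomes

$$\frac{d\,\Delta\mathbf{Y}}{dt} = J D\, \Delta\mathbf{Y} \qquad [22]$$

where $\mathbf{Y} = (\tilde{q}_1, \ldots, \tilde{q}_N, \tilde{p}_1, \ldots, \tilde{p}_N)$ and $(\tilde{q}_j, \tilde{p}_j)$,
$j = 1, \ldots, N$, are pairs of conjugate variables.
Equation [22] can be rewritten as

$$\frac{d^2 \Delta\tilde{q}_j}{dt^2} = -\lambda_j \lambda_{j+N}\, \Delta\tilde{q}_j, \qquad j = 1, \ldots, N$$

2. When evaluated at the equilibrium $\mathbf{X}^*$, the
Hessian S takes the block structure

$$S = \begin{pmatrix} Q & 0 \\ 0 & P \end{pmatrix}$$

where the matrices Q and P are symmetric circulant
matrices, that is, (N x N) matrices of the form

$$A = \begin{pmatrix} a_1 & a_2 & \cdots & a_N \\ a_N & a_1 & \cdots & a_{N-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_2 & a_3 & \cdots & a_1 \end{pmatrix} \qquad [23]$$

Circulant matrices are of special interest to us
because we can easily compute their eigenvalues
and eigenvectors for all N. In fact, it is immediate
to show that:


Lemma 3 All circulant matrices [23] have
eigenvalues

$$\lambda_j = \sum_{k=1}^{N} a_k r_j^{k-1}, \qquad j = 1, \ldots, N$$

and corresponding eigenvectors $v_j = (1, r_j, \ldots, r_j^{N-1})^T$,
$j = 1, \ldots, N$, where $r_j = \exp(2\pi i (j - 1)/N)$
are solutions of $r^N = 1$.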
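Lemma 3 is easy to verify numerically. The following Python sketch is an illustration only: it builds a circulant matrix from its first row in the layout of eqn [23], forms the eigenpairs from the N-th roots of unity (indexed from zero here), and measures the residual of the eigenvalue equation; the function names are hypothetical.

```python
import cmath

def circulant(first_row):
    """Circulant matrix as in eqn [23]: each row is the previous one shifted right."""
    n = len(first_row)
    return [[first_row[(k - i) % n] for k in range(n)] for i in range(n)]

def circulant_eigenpairs(first_row):
    """Eigenpairs from Lemma 3: lambda_j = sum_k a_k r_j^(k-1), v_j = (1, r_j, ...)."""
    n = len(first_row)
    pairs = []
    for j in range(n):
        r = cmath.exp(2j * cmath.pi * j / n)  # an N-th root of unity
        lam = sum(a * r ** k for k, a in enumerate(first_row))
        vec = [r ** k for k in range(n)]
        pairs.append((lam, vec))
    return pairs

def residual(matrix, lam, vec):
    """Largest entry of |A v - lambda v|, which should vanish for an eigenpair."""
    worst = 0.0
    for row, v_i in zip(matrix, vec):
        av_i = sum(a * v for a, v in zip(row, vec))
        worst = max(worst, abs(av_i - lam * v_i))
    return worst

if __name__ == "__main__":
    row = [4.0, 1.0, 0.0, 1.0]  # a symmetric circulant example
    a = circulant(row)
    for lam, vec in circulant_eigenpairs(row):
        print(lam, residual(a, lam, vec))
```

For a symmetric circulant matrix, such as the blocks Q and P, the eigenvalues obtained this way are real up to rounding.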


Passive Tracers in the Velocity Fields of N Point 
Vortices: The Restricted (N + 1)-Vortex Problem 


The terminology “restricted (N + 1)-vortex problem”
is used in analogy with the celestial mechanics
literature, when one of the vorticities is taken to be
zero. The zero-vorticity vortex does not affect the
dynamics of the remaining N vortices. For this
reason, it is said to be passively advected by the
flow of the remaining N vortices, and in the fluid
mechanics literature the terminology “passive tracer”
is also employed. The tracer dynamics is given
by the Hamiltonian equations [8]. Notice that in
general the Hamiltonian $\Psi$ is time dependent,
through the vortex variables $\mathbf{r}_j$, $j = 1, \ldots, N$, that is,

$$\Psi(\mathbf{r}, t) = \Psi(\mathbf{r}; \mathbf{r}_1(t), \ldots, \mathbf{r}_N(t))$$

and (q, p) = (x, y) play the role of conjugate canonical
variables. There is an extensive literature on the
subject from both a theoretical (see, e.g., Boatto and
Simó (2004) and Newton (2001)) and an experimental
(van Heijst 1993, Ottino 1990) point of
view. As discussed in the previous section, there are
some vortex configurations, such as the polygonal
ones, for which vortices undergo a periodic circular
motion. Then by viewing the dynamics in a
reference frame co-rotating with the vortices the
tracer Hamiltonian is manifestly time independent
and, therefore, integrable, since it reduces to a
Hamiltonian of one degree of freedom. In such a
case, tracer trajectories form a web of homoclinic
and heteroclinic orbits. An interesting theoretical
problem is to study how the tracer transport
properties (i.e., existence of barriers to transport,
diffusion, etc.) are affected by perturbing the
polygonal vortex configuration, that is, by introducing in
$\Psi$ a “genuine” time dependence (periodic,
quasiperiodic, or chaotic) (see, e.g., Boatto and
Pierrehumbert (1999), Rom-Kedar, Leonard and Wiggins
(1990), Kuznetsov and Zaslavsky (2000), and
Newton (2001)). Furthermore, in lab experiments,
color dyes, which monitor the flow velocity
field, are often used as the experimental equivalent
of tracer particles. In this context we would like to
stress the striking resemblance between theoretical
particle trajectories, deduced from point-vortex
dynamics, and the actual dye visualizations observed
by van Heijst and Flor for vortex dipoles in a
stratified fluid (see Figures 11 and 12) (van Heijst
1993). Similarly, tripolar structures have been
observed both in lab experiments (see Figure 13)
and in nature (see Figure 14). Recently, the Danish
group of Jansson, Haspang, Jensen, Hersen, and Bohr has
observed beautiful rotating polygons, such as
squares and pentagons, on a fluid surface in the
presence of a rotating cylinder (see Figure 15).


Point Vortex Motion with Boundaries 


In comparison with the extensive literature on point
vortex motion in unbounded domains, the study of
point vortex motion in the presence of walls is modest.
There is, however, a general theory for such problems,
and some recent developments in this area have
resulted in a versatile tool for analyzing point vortex
motion with boundaries. Newton (2001)
contains a chapter on point vortex motion with
boundaries and also features a detailed bibliography.
The reader is referred there for standard treatments;
here, we focus on more recent developments of the
mathematical theory.


The Method of Images 


When point vortices move around in bounded 
domains, it is clear that the motion is subject to 
the constraint that no fluid should penetrate any of 
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Figure 11  Test-particle trajectories: on the left, theoretical 
trajectories, from the point-vortex model; on the right, a top view 
of a laboratory experiment in stratified flows. Reproduced from 
van Heijst GJF and Flor JB (1989) Dipole formation and 
collisions in a stratified fluid. Nature 340: 212-215, with 
permission from Nature Publishing Group. 


the boundary walls of the domain. If n denotes the 
local normal to the boundary walls, the boundary 
condition on the velocity field u is therefore u-n=0 
everywhere on the walls. Another way to say the 
same thing is that all the walls must be streamlines 
so that the streamfunction, Y% say, must be constant 
on any boundary wall. 

A classical approach to bounded vortex motion is 
the celebrated method of images — a rather special 
technique limited to cases where the domain of 
interest has certain geometrical symmetries so that 
an appropriate distribution of image vorticity can be 
ascertained, essentially by inspection. This image 
vorticity is placed in nonphysical regions of the 
plane in order to satisfy the boundary conditions 
that the walls act as impenetrable barriers for the 
flow. 

The simplest example is the motion of a single 
vortex next to a straight plane wall of infinite 
extent. Suppose the wall is along y=0 in an (x, y)- 
plane and that the fluid occupies the upper-half 
plane. If a circulation-[’ vortex is at the complex 
position %=xXo9 + iyo, the solution for the stream- 
function is 


A oO 
ele =a 





p(z, z) = -log 


T [24] 








where z=x+ iy. This has a single logarithmic 
singularity in the upper-half plane at z=zo 





Figure 12 A frontal collision of two dipoles as observed in a 
stratified fluid: after a so called “partner-exchange” two new 
dipoles are formed. Reproduced from van Heijst GJF and Flor JB 
(1989) Dipole formation and collisions in a stratified fluid. Nature 
340: 212-215, with permission from Nature Publishing Group. 





Figure 13 A tripolar vortex structure as observed in a rotating 
stratified fluid. Reproduced from van Heijst GJF, Kloosterziel 
RC, and Williams CWM (1991) Laboratory experiments on the 
tripolar vortex in a rotating fluid. Journal of Fluid Mechanics 225: 
301—331, with permission from Cambridge University Press. 





Figure 14 Infrared image taken by NOAA11 satellite on 
January 4 1990 (0212 UT) shows a tripolar structure in the 
Bay of Biscay. The central part of the tripole measures about 
50-70 km and rotates clockwise, whereas the two satellite 
vortices rotate anticlockwise. The tripole persisted for a few
days before it fell apart. Reproduced from Pingree RD and Le 
Cann B, Anticyclonic Eddy X91 in the Southern Bay of Biscay, 
Journal of Geophysical Research, 97: 14853-14362, May 1991 
to February 1992. Copyright (1992) American Geophysical 
Union. Reproduced/modified by permission of American Geo- 
physical Union. 





Figure 15 The free surface of a rotating fluid will, due to the 
centrifugal force, be pressed radially outward. If the flow is driven 
by rotating the bottom plate, the axial symmetry can break 
spontaneously and the surface can take the shape of a rigidly 
rotating polygon. With water, Jansson, Haspang, Jensen, Hersen,
and Bohr have observed polygons with up to six corners. The
rotation speed of the polygons does not coincide with that of the 
plate, but it is often mode-locked, such that the polygon rotates 
by one corner for each complete rotation of the plate. 
Reproduced from Jansson TRN, Haspang M, Jensen KH, 
Hersen P, and Bohr T (2005) Rotating polygons on a fluid 
surface. Preprint, with permission from T Bohr. 
(corresponding to the point vortex) and it is easily
checked that ψ = 0 on y = 0. Therefore, no fluid
penetrates the wall. Equation [24] can be written as 


\psi(z, \bar{z}) = -\frac{\Gamma}{2\pi} \log |z - z_0| + \frac{\Gamma}{2\pi} \log |z - \bar{z}_0| \qquad [25]


which is the sum of the streamfunction due to a
point vortex of circulation Γ at z_0 = x_0 + i y_0 and
that of another, imagined, vortex of circulation −Γ at
z̄_0 = x_0 − i y_0. In this case, the image vortex distribution is
simple: it is just the second vortex sitting at the 
reflected point in the wall. The method of images 
can be applied to flows in other regions bounded by 
straight line segments (e.g., wedge regions of various 
angles (Newton 2001)). 
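
The half-plane construction can be checked numerically. The following is a minimal sketch, not part of the original treatment: the function names are illustrative, and it assumes the convention that a vortex of circulation Γ at z_0 has complex potential −(iΓ/2π) log(z − z_0), whose imaginary part reproduces the first term of [25]. It evaluates [24] on the wall and computes the velocity that the image vortex induces at the real vortex.

```python
import math


def psi_half_plane(z, z0, gamma):
    """Streamfunction [24]: vortex of circulation gamma at z0 above the wall y = 0."""
    return -(gamma / (2.0 * math.pi)) * math.log(abs(z - z0) / abs(z - z0.conjugate()))


def vortex_velocity(z0, gamma):
    """Velocity (u, v) of the vortex, induced by its image of circulation -gamma.

    The image sits at conj(z0) and has complex potential
    (i*gamma/(2*pi)) * log(z - conj(z0)); u - i*v is its derivative at z0.
    """
    dw = (1j * gamma / (2.0 * math.pi)) / (z0 - z0.conjugate())
    return dw.real, -dw.imag


if __name__ == "__main__":
    z0, gamma = complex(0.5, 2.0), 3.0
    # The streamfunction vanishes (up to rounding) at any point on the wall.
    print(psi_half_plane(complex(1.7, 0.0), z0, gamma))
    # The vortex drifts parallel to the wall at speed gamma / (4*pi*y0).
    print(vortex_velocity(z0, gamma))
```

Under these assumptions the vortex translates parallel to the wall with speed Γ/(4π y_0), which is the constant-speed motion described in the caption of Figure 16.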

A variant of the method of images is the Milne- 
Thomson circle theorem relevant to planar flow 
around a circular cylinder. Given a complex 
potential w(z) with the required singularities in the 
fluid region exterior to the cylinder, but failing to 
satisfy the boundary condition that the surface of 
the cylinder is a streamline, this theorem says that 
the correct potential W(z) is 


W(z) = w(z) + \bar{w}(a^2/z) \qquad [26]


where a is the cylinder radius and \bar{w}(z) is the
conjugate function to w(z), that is, \bar{w}(z) = \overline{w(\bar{z})}.
It is easy to verify that the imaginary part of W(z), that is,
the streamfunction, is zero on |z| = a. The second term,
\bar{w}(a^2/z), produces the required distribution of
image vorticity inside the cylinder. A famous
example is the Föppl vortex pair which is the 
simplest model of the trailing vortices shed in the 
wake of a circular aerofoil traveling at uniform 
speed. 
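
The circle theorem is easy to test numerically. The sketch below is illustrative only: the potential `example_potential` (a uniform stream plus one point vortex outside the cylinder) and all parameter values are assumptions chosen for the demonstration, not taken from the text. It builds W(z) from [26] and prints the streamfunction at sample points on |z| = a.

```python
import cmath
import math


def circle_theorem(w, a):
    """Return W(z) = w(z) + conj(w(a^2 / conj(z))), which is equation [26]."""
    def W(z):
        return w(z) + w(a * a / z.conjugate()).conjugate()
    return W


def example_potential(z, U=1.5, gamma=2.0, z1=complex(2.0, 1.0)):
    """Hypothetical w(z): uniform stream U plus a point vortex at z1 with |z1| > a."""
    return U * z - 1j * gamma / (2.0 * math.pi) * cmath.log(z - z1)


if __name__ == "__main__":
    a = 1.0
    W = circle_theorem(example_potential, a)
    for k in range(8):
        z = a * cmath.exp(1j * (0.3 + k * math.pi / 4.0))
        # Imaginary part of W is the streamfunction; it should vanish on |z| = a.
        print(f"{W(z).imag:+.2e}")
```

On the circle, a^2/z̄ coincides with z, so W(z) is w(z) plus its own complex conjugate and is therefore real; the printed values are zero up to rounding.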


Kirchhoff-Routh-Lin Theory 


The most important general mathematical tool for 
point vortex motion in bounded planar regions is 
the Hamiltonian approach associated with the 
names of Kirchhoff (1876) and Routh (1881), 
who developed the early theory. It is now known 
that the problem of N-vortex motion in a simply 
connected domain is a Hamiltonian dynamical 
system. Moreover, the Hamiltonian has simple 
transformation properties when a given flow 
domain of interest is mapped conformally to 
another — a result originally due to Routh. A 
formula for the Hamiltonian can be built from 
knowledge of the instantaneous Green’s function 
associated with motion of the point vortex in the 
simply connected domain D. In fact, [24] is 
precisely the relevant Green’s function when D is 
the upper-half plane. 
Much later, in 1941, Lin (1941a) extended these
general results to the case of multiply connected 
fluid regions. To visualize such a region, think of a 
bounded region of the plane containing fluid but 
also a finite number of impenetrable islands whose 
boundaries act as barriers for the fluid motion. If the 
islands are infinitely thin, they can be thought of as 
straight wall segments immersed in the flow (see 
later examples). Lin (1941b) showed that both the 
Hamiltonian structure, and the transformation 
properties of the Hamiltonian under conformal 
mapping, are preserved in the multiply connected 
case. 


Lin’s Special Green’s Function 


Since Lin’s result subsumes the earlier simply 
connected studies, we now outline the key results 
as presented in Lin (1941a). Consider a fluid region 
D, with outer boundary C_0 and M enclosed islands
having boundaries {C_j | j = 1, ..., M}. Lin introduced
a special Green’s function G(x, y; x_0, y_0)
satisfying the following properties:


1. the function

   g(x, y; x_0, y_0) = -G(x, y; x_0, y_0) - \frac{1}{2\pi} \log r_0 \qquad [27]

   is harmonic with respect to (x, y) throughout
   the region D including at the point (x_0, y_0). Here,
   r_0 = \sqrt{(x - x_0)^2 + (y - y_0)^2};
2. if ∂G/∂n is the normal derivative of G on a curve
   then

   G(x, y; x_0, y_0) = A_k \ \text{on } C_k, \qquad \oint_{C_k} \frac{\partial G}{\partial n}\, ds = 0, \qquad k = 1, \ldots, M \qquad [28]

   where ds denotes an element of arc and {A_k} are
   constants;
3. G(x, y; x_0, y_0) = 0 on C_0.


Flucher and Gustafsson (1997) refer to this G as 
the hydrodynamic Green’s function. (In fact, it 
coincides with the modified Green’s function 
arising in abstract potential theory — a function 
that is dual to the usual first-type Green’s function 
that equals zero on all the domain boundaries.) 
On the use of G, Lin established the following two 
key results: 


Theorem 4 If N vortices of strengths {Γ_k | k = 1, ..., N}
are present in an incompressible fluid at
the points {(x_k, y_k) | k = 1, ..., N} in a general multiply
connected region D bounded by fixed boundaries,
the stream function of the fluid motion is
given by

\psi(x, y; x_k, y_k) = \psi_0(x, y) + \sum_{k=1}^{N} \Gamma_k\, G(x, y; x_k, y_k) \qquad [29]

where ψ_0(x, y) is the streamfunction due to outside
agencies and is independent of the point vortex
positions.


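Theorem 5 For the motion of vortices of strengths
{Γ_k | k = 1, ..., N} in a general region D bounded by
fixed boundaries, there exists a Kirchhoff-Routh
function H({x_k, y_k}), depending on the point vortex
positions, such that

\Gamma_k \frac{dx_k}{dt} = \frac{\partial H}{\partial y_k}, \qquad \Gamma_k \frac{dy_k}{dt} = -\frac{\partial H}{\partial x_k} \qquad [30]

where H({x_k, y_k}) is given by

H(\{x_k, y_k\}) = \sum_{k=1}^{N} \Gamma_k\, \psi_0(x_k, y_k)
  + \sum_{\substack{k_1, k_2 = 1 \\ k_1 > k_2}}^{N} \Gamma_{k_1} \Gamma_{k_2}\, G(x_{k_1}, y_{k_1}; x_{k_2}, y_{k_2})
  - \frac{1}{2} \sum_{k=1}^{N} \Gamma_k^2\, g(x_k, y_k; x_k, y_k) \qquad [31]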

In rescaled coordinates (x_k, Γ_k y_k), [30] is a Hamiltonian
system in canonical form. For historical
reasons, H is often called the Kirchhoff-Routh 
path function. Analyzing the separate contributions 
to the path function [31] is instructive: the first term 
is the contribution from flows imposed from outside 
(e.g., background flows and round-island circula- 
tions), the second term is the “free-space” contribu- 
tion (it is the relevant Hamiltonian when no 
boundaries are present) while the third term encodes 
the effect of the boundary walls (or, the effect of the 
“image vorticity” distribution discussed earlier). 
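
To make the structure of the path function concrete, the following sketch evaluates it for N vortices in the upper half-plane, where [24] supplies G explicitly. It is an illustration under stated assumptions rather than a general implementation: there is no background flow (ψ_0 = 0), the equations of motion are taken in the form Γ_k dx_k/dt = ∂H/∂y_k, Γ_k dy_k/dt = −∂H/∂x_k, the derivatives are approximated by central differences, and all function names are hypothetical.

```python
import math


def green_half_plane(z, z0):
    """G for the upper half-plane: equation [24] with unit circulation."""
    return -(1.0 / (2.0 * math.pi)) * math.log(abs(z - z0) / abs(z - z0.conjugate()))


def regular_part_at_vortex(z0):
    """g of [27] evaluated at the vortex itself: -(1/2pi) log|z0 - conj(z0)|."""
    return -(1.0 / (2.0 * math.pi)) * math.log(abs(z0 - z0.conjugate()))


def path_function(zs, gammas):
    """Kirchhoff-Routh path function of the form [31] with psi_0 = 0."""
    h = 0.0
    for a in range(len(zs)):
        # Boundary ("image") contribution of vortex a.
        h -= 0.5 * gammas[a] ** 2 * regular_part_at_vortex(zs[a])
        # Pair interactions, each pair counted once (k1 > k2).
        for b in range(a):
            h += gammas[a] * gammas[b] * green_half_plane(zs[a], zs[b])
    return h


def velocities(zs, gammas, eps=1e-6):
    """Vortex velocities (u_k, v_k) from central differences of H."""
    out = []
    for k in range(len(zs)):
        def shifted(dz, k=k):
            moved = list(zs)
            moved[k] = zs[k] + dz
            return path_function(moved, gammas)
        dh_dx = (shifted(eps) - shifted(-eps)) / (2.0 * eps)
        dh_dy = (shifted(1j * eps) - shifted(-1j * eps)) / (2.0 * eps)
        out.append((dh_dy / gammas[k], -dh_dx / gammas[k]))
    return out


if __name__ == "__main__":
    print(velocities([complex(0.3, 0.8)], [2.0]))
    print(velocities([complex(0.3, 0.8), complex(-1.1, 1.7)], [2.0, -0.7]))
```

For a single vortex only the third term survives, H = (Γ²/4π) log(2 y_0), and the sketch recovers the wall-parallel drift Γ/(4π y_0) obtained from the image vortex.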


Lin (1941a) went on to show that, with the 
Hamiltonian in some D given by H in [31], the 
Hamiltonian relevant to vortex motion in another 
domain obtained from D by a conformal mapping 
z(ζ) consists of [31] with some simple extra additive
contributions dependent only on the derivative of
the map z(ζ) evaluated at the point vortex positions.

Flucher and Gustafsson (1997) also introduce 
the Robin function R(xo, yo) defined as the regular 
part of the above hydrodynamic Green’s function 
evaluated at the point vortex. Indeed, R(x_0, y_0) =
g(x_0, y_0; x_0, y_0), where g is defined in [27]. An
interesting fact is that, for single-vortex motion in
a simply connected domain, R(x_0, y_0) satisfies the
quasilinear elliptic Liouville equation everywhere in
D with the boundary condition that it becomes
infinite everywhere on the boundary of D.
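
As a quick consistency check of this property, consider the upper half-plane, where [24] gives the Green’s function explicitly. The computation below is a sketch under the sign and normalization conventions of [24] and [27]; the numerical constants in the Liouville equation change under other normalizations.

```latex
% Upper half-plane: from [24] and [27], g(x,y;x_0,y_0) = -\frac{1}{2\pi}\log|z-\bar{z}_0|.
R(x_0, y_0) = g(x_0, y_0; x_0, y_0) = -\frac{1}{2\pi}\log(2 y_0),
\qquad
\nabla^2 R = \frac{\partial^2 R}{\partial y_0^2} = \frac{1}{2\pi y_0^2},
\qquad
e^{4\pi R} = \frac{1}{4 y_0^2}
\;\Longrightarrow\;
\nabla^2 R = \frac{2}{\pi}\, e^{4\pi R},
\qquad
R \to +\infty \ \text{as}\ y_0 \to 0^{+}.
```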

By combining the Kirchhoff—Routh theory with 
conformal mapping theory, many interesting prob- 
lems can be studied. What happens, for example, if 
there is a gap in the wall of Figure 16? In recent 
work, Johnson and McDonald (2005) show that if 
the vortex starts off, far from the gap, at a distance 
of less than half the gap width from the wall, then it 
will eventually penetrate the gap. Otherwise, it will 
dip towards the gap but not go through it. The 
trajectories are shown in Figure 17. 

Unfortunately, Lin did not provide any explicit 
analytical expressions for G in the multiply connected
case. This has limited the applicability of his
theory to fluid regions that are simply or doubly
connected. Recently, however, Lin’s theory has been brought to
implementational fruition by Crowdy and Marshall


[Figure 16 labels: point vortex, circulation Γ; wall; image vortex, circulation −Γ]


Figure 16 The motion of a point vortex near an infinite straight 
wall. The vortex moves, at constant speed, maintaining a 
constant distance from the wall. Other possible trajectories are 
shown; they are all straight lines parallel to the wall. The motion 
can be thought of as being induced by an opposite-circulation 
“image” vortex at the reflected point in the wall. 





Figure 17 Distribution of point vortex trajectories near a wall 
with a single gap of length 2. There is a critical trajectory which, 
far from the gap, is unit distance from the wall. 
(2005a), who, up to conformal mapping, have
derived explicit formulas for the hydrodynamic 
Green’s function in multiply connected fluid regions 
of arbitrary finite connectivity. Their approach 
makes use of elements of classical function theory 
dating back to the work of Poincaré, Schottky, and 
Klein (among others). This allows new problems 
involving bounded vortex motion to be tackled. For 
example, the motion of a single vortex around 
multiple circular islands has been studied in Crowdy 
and Marshall (2005b), thereby extending recent 
work on the two-island problem (Johnson and 
McDonald 2005). If the wall in Figure 17 happens 
to have two (or more) gaps, then the fluid region is 
multiply connected. The two-gap (doubly con- 
nected) case was recently solved by Johnson and 
McDonald (2005) using Schwarz—Christoffel maps 
combined with elements of elliptic function theory 
(see Figure 18). Crowdy and Marshall have solved 
the problem of an arbitrary number of gaps in a wall 
by exploiting the new general theory presented 
in Crowdy and Marshall (2005a,b) (and related
works by the authors). The case of a wall with three 
gaps represents a triply connected fluid region and 
the critical vortex trajectory is plotted in Figure 19. 

Point vortex motion in bounded domains on the 
surface of a sphere has received scant attention in 
Figure 18 The critical trajectory when there are two symmetric
gaps in a wall. The fluid region is now doubly connected. This 
problem is solved in Johnson and McDonald (2005) and Crowdy 
and Marshall (2005). 




Figure 19 The critical vortex trajectories when there are three 
gaps in the wall. This time the fluid region is triply connected. 
This problem is solved in Crowdy and Marshall (2005) using the 
general methods in Crowdy and Marshall (2005). 


78 Point-Vortex Dynamics 


the literature, although Kidambi and Newton 
(2000) and Newton (2001) have recently made a 
contribution. Such paradigms are clearly relevant 
to planetary-scale oceanographic flows in 
which oceanic eddies interact with topography such 
as ridges and land masses and deserve further study. 

Introduction


The Poisson reduction techniques allow the con- 
struction of new Poisson structures out of a given 
one by combination of two operations: “restriction” 
to submanifolds that satisfy certain compatibility 
assumptions and passage to a “quotient space” 
where certain degeneracies have been eliminated. 
For certain kinds of reduction, it is necessary to pass 
first to a submanifold and then take a quotient. 
Before making this more explicit, we introduce the 
notations that will be used in this article. All 
manifolds in this article are finite dimensional. 


Poisson Manifolds 


A “Poisson manifold” is a pair (M, {·,·}), where M is a
manifold and {·,·} is a bilinear operation on C^∞(M)
such that (C^∞(M), {·,·}) is a Lie algebra and {·,·} is a
derivation (i.e., the Leibniz identity holds) in each
argument. The pair (C^∞(M), {·,·}) is also called a
“Poisson algebra.” The functions in the center C(M) of
the Lie algebra (C^∞(M), {·,·}) are called “Casimir
functions.” From the natural isomorphism between
derivations on C^∞(M) and vector fields on M, it follows
that each h ∈ C^∞(M) induces a vector field on M via the
expression X_h = {·, h}, called the “Hamiltonian vector
field” associated to the “Hamiltonian function” h.
The triplet (M, {·,·}, h) is called a “Poisson dynamical
system.” Any Hamiltonian system on a symplectic
manifold is a Poisson dynamical system relative
to the Poisson bracket induced by the symplectic
structure. Given a Poisson dynamical system
(M, {·,·}, h), its “integrals of motion” or “conserved
quantities” are defined as the centralizer of
h in (C^∞(M), {·,·}), that is, the subalgebra of
(C^∞(M), {·,·}) consisting of the functions
f ∈ C^∞(M) such that {f, h} = 0. Note that the
terminology is justified since, by Hamilton’s equations
in Poisson bracket form, we have ḟ = X_h[f] =
{f, h} = 0, that is, f is constant on the flow of X_h. A
smooth mapping φ: M_1 → M_2, between the two
Poisson manifolds (M_1, {·,·}_1) and (M_2, {·,·}_2),
is called “canonical” or “Poisson” if for all g,
h ∈ C^∞(M_2) we have φ*{g, h}_2 = {φ*g, φ*h}_1. If
φ: M_1 → M_2 is a smooth map between two Poisson
manifolds (M_1, {·,·}_1) and (M_2, {·,·}_2), then φ is a
Poisson map if and only if Tφ ∘ X_{h∘φ} = X_h ∘ φ for
any h ∈ C^∞(M_2), where Tφ: TM_1 → TM_2 denotes
the tangent map (or derivative) of φ.
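
The notion of an integral of motion can be illustrated with the canonical bracket on R^4. The sketch below is an assumption-laden example rather than anything taken from the text: the coordinates are ordered (q_1, q_2, p_1, p_2), the bracket is the standard one induced by the canonical symplectic form, the Hamiltonian is a planar Kepler-type system chosen for illustration, and gradients are approximated by central differences.

```python
import math


def grad(f, x, eps=1e-6):
    """Central-difference gradient of f at the point x (a list of floats)."""
    g = []
    for i in range(len(x)):
        up = list(x)
        dn = list(x)
        up[i] += eps
        dn[i] -= eps
        g.append((f(up) - f(dn)) / (2.0 * eps))
    return g


def bracket(f, g, x):
    """Canonical bracket {f, g} on R^(2n), with x = (q_1..q_n, p_1..p_n)."""
    n = len(x) // 2
    df, dg = grad(f, x), grad(g, x)
    return sum(df[i] * dg[n + i] - df[n + i] * dg[i] for i in range(n))


def hamiltonian(x):
    """Illustrative planar Hamiltonian with a central potential -1/r."""
    q1, q2, p1, p2 = x
    return 0.5 * (p1 * p1 + p2 * p2) - 1.0 / math.hypot(q1, q2)


def angular_momentum(x):
    q1, q2, p1, p2 = x
    return q1 * p2 - q2 * p1


if __name__ == "__main__":
    x = [1.0, 0.5, -0.3, 0.8]
    # {L, h} vanishes up to discretization error: L is an integral of motion.
    print(bracket(angular_momentum, hamiltonian, x))
    # Hamilton's equations in bracket form: dq_1/dt = {q_1, h} = p_1.
    print(bracket(lambda y: y[0], hamiltonian, x))
```

With X_h = {·, h}, the first printed value is the rate of change of the angular momentum along the flow and the second is the first component of X_h.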

Let (S, {·,·}^S) and (M, {·,·}^M) be two Poisson manifolds
such that S ⊂ M and the inclusion i_S: S ↪ M
is an immersion. The Poisson manifold (S, {·,·}^S) is
called a “Poisson submanifold” of (M, {·,·}^M)
if i_S is a canonical map. An immersed submanifold
Q of M is called a “quasi-Poisson submanifold” of
(M, {·,·}^M) if for any q ∈ Q, any open neighborhood
U of q in M, and any f ∈ C^∞(U) we have
X_f(i_Q(q)) ∈ T_q i_Q(T_q Q), where i_Q: Q ↪ M is the
inclusion and X_f is the Hamiltonian vector field of f
on U with respect to the Poisson bracket of M
restricted to U. If (S, {·,·}^S) is a Poisson submanifold
of (M, {·,·}), then there is no other bracket {·,·}' on
S making the inclusion i: S → M into a canonical map.
If Q is a quasi-Poisson submanifold of (M, {·,·}), then
there exists a unique Poisson structure {·,·}^Q on Q
that makes it into a Poisson submanifold of (M, {·,·})
but this Poisson structure may be different from the
given one on Q. Any Poisson submanifold is
quasi-Poisson but the converse is not true in general.

The Poisson Tensor and Symplectic Leaves


The derivation property of the Poisson bracket implies
that for any two functions f, g ∈ C^∞(M), the value of
the bracket {f, g}(z) at an arbitrary point z ∈ M (and
therefore X_f(z) as well) depends on f only through
df(z), which allows us to define a contravariant
antisymmetric 2-tensor B ∈ Λ²(T*M), called the “Poisson
tensor,” by B(z)(α_z, β_z) = {f, g}(z), where
df(z) = α_z ∈ T*_z M and dg(z) = β_z ∈ T*_z M. The vector
bundle map B^♯: T*M → TM over the identity naturally
associated to B is defined by B(z)(α_z, β_z) =
⟨α_z, B^♯(β_z)⟩. Its range D := B^♯(T*M) ⊂ TM is called
the “characteristic distribution” of (M, {·,·}) since D is
a generalized smooth integrable distribution. Its
maximal integral leaves are called the “symplectic
leaves” of M for they carry a symplectic structure that
makes them into Poisson submanifolds. As integral
leaves of an integrable distribution, the symplectic
leaves L are “initial submanifolds” of M, that is, the
inclusion i: L ↪ M is an injective immersion such that
for any smooth manifold P, an arbitrary map g: P → L
is smooth if and only if i ∘ g: P → M is smooth.
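
A standard example, not discussed in the text and used here only as an illustration, is the bracket on R^3 given by {f, g}(x) = x · (∇f × ∇g). The sketch below assumes this bracket (the sign convention varies in the literature) and represents covectors as 3-tuples. It implements B and B^♯ and checks that the range of B^♯(x) is tangent to the sphere through x, so that the symplectic leaves away from the origin are concentric spheres, and that |x|²/2 is a Casimir function.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def poisson_tensor(x, alpha, beta):
    """B(x)(alpha, beta) = x . (alpha x beta) for covectors alpha, beta at x."""
    return dot(x, cross(alpha, beta))


def sharp(x, beta):
    """B^sharp(x)(beta), defined by B(x)(alpha, beta) = <alpha, B^sharp(beta)>."""
    return cross(beta, x)


if __name__ == "__main__":
    x = (1.0, 2.0, -0.5)
    alpha = (0.3, -1.0, 2.0)
    beta = (1.5, 0.2, -0.7)
    # Defining relation between B and B^sharp.
    print(poisson_tensor(x, alpha, beta), dot(alpha, sharp(x, beta)))
    # The range of B^sharp(x) is orthogonal to x: tangent to the sphere |x| = const.
    print(dot(x, sharp(x, beta)))
    # dC = x for C = |x|^2 / 2, and B^sharp(x)(x) = 0: C is a Casimir function.
    print(sharp(x, x))
```

The Hamiltonian vector field of any function is therefore tangent to the spheres, which is the leafwise behavior described above.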


Poisson Reduction 
Canonical Lie Group Actions 


Let (M, {·,·}) be a Poisson manifold and let G be a
Lie group acting canonically on M via the map
Φ: G × M → M. An action is called “canonical” if
for any h ∈ G and f, g ∈ C^∞(M), one has

\{f \circ \Phi_h,\; g \circ \Phi_h\} = \{f, g\} \circ \Phi_h

If the G-action is free and proper, then the orbit space
M/G is a smooth regular quotient manifold. Moreover,
it is also a Poisson manifold with the Poisson bracket
{·,·}^{M/G}, uniquely characterized by the relation

\{f, g\}^{M/G}(\pi(m)) = \{f \circ \pi,\; g \circ \pi\}(m) \qquad [1]

for any m ∈ M, where π: M → M/G is the orbit projection and
f, g: M/G → R are two arbitrary smooth functions. This
bracket is appropriate for the reduction of Hamiltonian
dynamics in the sense that if h ∈ C^∞(M)^G is a G-invariant
smooth function on M, then the Hamiltonian
flow F_t of X_h commutes with the G-action, so it
induces a flow F_t^{M/G} on M/G that is Hamiltonian on
(M/G, {·,·}^{M/G}) for the reduced Hamiltonian
function [h] ∈ C^∞(M/G) defined by [h] ∘ π = h.

If the Poisson manifold (M, {·,·}) is actually
symplectic with form ω and the G-action has an
associated momentum map J: M → g*, then the
symplectic leaves of (M/G, {·,·}^{M/G}) are given by the
spaces (M^c_{O_μ} := G · J_c^{-1}(μ)/G, ω^c_{O_μ}), where J_c^{-1}(μ) is a
connected component of the fiber J^{-1}(μ) and ω^c_{O_μ} is the
restriction to M^c_{O_μ} of the symplectic form ω_{O_μ} of the
symplectic orbit reduced space M_{O_μ} (see Symmetry
and Symplectic Reduction). If, additionally, G is
compact, M is connected, and the momentum map J
is proper, then M^c_{O_μ} = M_{O_μ}.

In the remainder of this section, we characterize 
the situations in which new Poisson manifolds can 
be obtained out of a given one by a combination of 
restriction to a submanifold and passage to the 
quotient with respect to an equivalence relation that 
encodes the symmetries of the bracket. 


Definition 1 Let (M, {·,·}) be a Poisson manifold
and D ⊂ TM a smooth distribution on M. The
distribution D is called “Poisson” or “canonical,” if
the condition df|_D = dg|_D = 0, for any f, g ∈ C^∞(U)
and any open subset U ⊂ M, implies that d{f, g}|_D = 0.


Unless strong regularity assumptions are invoked, the 
passage to the leaf space of a canonical distribution 
destroys the smoothness of the quotient topological 
space. In such situations, the Poisson algebra of functions 
is too small and the notion of presheaf of Poisson 
algebras is needed. See Singularity and Bifurcation 
Theory for more information on singularity theory. 


Definition 2 Let M be a topological space with a
presheaf F of smooth functions. A presheaf of Poisson
algebras on (M, F) is a map {·,·} that assigns to each
open set U ⊂ M a bilinear operation {·,·}_U: F(U) ×
F(U) → F(U) such that the pair (F(U), {·,·}_U) is a
Poisson algebra. A presheaf of Poisson algebras is
denoted as a triple (M, F, {·,·}). The presheaf of
Poisson algebras (M, F, {·,·}) is said to be “nondegenerate”
if the following condition holds: if f ∈ F(U) is such
that {f, g}_{U∩V} = 0, for any g ∈ F(V) and any open set
V, then f is constant on the connected components of U.


Any Poisson manifold (M, {·,·}) has a natural
presheaf of Poisson algebras on its presheaf of smooth
functions that associates to any open subset U of M
the restriction {·,·}|_U of {·,·} to C^∞(U) × C^∞(U).


Definition 3 Let P be a topological space and
Z = {S_i}_{i∈I} a locally finite partition of P into smooth
manifolds S_i ⊂ P, i ∈ I, that are locally closed topological
subspaces of P (hence their manifold topology
is the relative one induced by P). The pair (P, Z)
is called a “decomposition” of P with “pieces” in Z,
or a “decomposed space,” if the following “frontier
condition” holds:


Condition (DS) If R, S ∈ Z are such that R ∩ S̄ ≠ ∅,
then R ⊂ S̄. In this case, we write R ⪯ S. If, in
addition, R ≠ S we say that R is incident to S or that
it is a boundary piece of S and write R ≺ S.


Definition 4 Let M be a differentiable manifold
and S ⊂ M a decomposed subset of M. Let {S_i}_{i∈I}
be the pieces of this decomposition. The topology
of S is not necessarily the relative topology as a
subset of M. Then D ⊂ TM|_S is called a “smooth
distribution” on S adapted to the decomposition
{S_i}_{i∈I}, if D ∩ TS_i is a smooth distribution on S_i for
all i ∈ I. The distribution D is said to be “integrable”
if D ∩ TS_i is integrable for each i ∈ I.


In the situation described by the previous definition
and if D is integrable, the integrability of the
distributions D_{S_i} := D ∩ TS_i on S_i allows us to
partition each S_i into the corresponding maximal
integral manifolds. Thus, there is an equivalence
relation on S_i whose equivalence classes are precisely
these maximal integral manifolds. Doing this on
each S_i, we obtain an equivalence relation D_S on the
whole set S by taking the union of the different
equivalence classes corresponding to all the D_{S_i}.
Define the quotient space S/D_S by

S/D_S := \bigcup_{i \in I} S_i / D_{S_i}

and let π_{D_S}: S → S/D_S be the natural projection.


The Presheaf of Smooth Functions on S/Ds 


Define the presheaf of smooth functions C^∞_{S/D_S} on
S/D_S as the map that associates to any open subset V
of S/D_S the set of functions C^∞_{S/D_S}(V) characterized
by the following property: f ∈ C^∞_{S/D_S}(V) if and only if
for any z ∈ V there exists m ∈ π_{D_S}^{-1}(V), U_m open
neighborhood of m in M, and F ∈ C^∞(U_m) such that

f \circ \pi_{D_S}\big|_{\pi_{D_S}^{-1}(V) \cap U_m} = F\big|_{\pi_{D_S}^{-1}(V) \cap U_m} \qquad [2]

F is called a “local extension” of f ∘ π_{D_S} at the point
m ∈ π_{D_S}^{-1}(V). When the distribution D is trivial, the
presheaf C^∞_{S/D_S} coincides with the presheaf of
Whitney smooth functions C^∞_{S,M} on S induced by
the smooth functions on M.

The presheaf C^∞_{S/D_S} is said to have the (D, D_S)-local
extension property when the topology of S is
stronger than the relative topology and, at the same
time, the local extensions of f ∘ π_{D_S} defined in [2]
can always be chosen to satisfy

dF(n)\big|_{D(n)} = 0 \quad \text{for any } n \in \pi_{D_S}^{-1}(V) \cap U_m

F is called a “local D-invariant extension” of f ∘ π_{D_S} at
the point m ∈ π_{D_S}^{-1}(V). If S is a smooth embedded
submanifold of M and D_S is a smooth, integrable, and
regular distribution on S, then the presheaf C^∞_{S/D_S}
coincides with the presheaf of smooth functions on
S/D_S when considered as a regular quotient manifold.

The following definition spells out what we mean 
by obtaining a bracket via reduction. 

Definition 5 Let (M, {·,·}) be a Poisson manifold,
S a decomposed subset of M, and D ⊂ TM|_S a
Poisson-integrable generalized distribution adapted
to the decomposition of S. Assume that C^∞_{S/D_S}
has the (D, D_S)-local extension property. Then
(M, {·,·}, D, S) is said to be “Poisson reducible” if
(S/D_S, C^∞_{S/D_S}, {·,·}^{S/D_S}) is a well-defined presheaf of
Poisson algebras where, for any open set V ⊂ S/D_S,
the bracket {·,·}_V^{S/D_S}: C^∞_{S/D_S}(V) × C^∞_{S/D_S}(V) → C^∞_{S/D_S}(V)
is given by

\{f, g\}_V^{S/D_S}(\pi_{D_S}(m)) := \{F, G\}(m)

for any m ∈ π_{D_S}^{-1}(V) for local D-invariant extensions
F, G at m of f ∘ π_{D_S} and g ∘ π_{D_S}, respectively.


Theorem 1 Let (M, {·,·}) be a Poisson manifold with
associated Poisson tensor B ∈ Λ²(T*M), S a decomposed
space, and D ⊂ TM|_S a Poisson-integrable
generalized distribution adapted to the decomposition
of S (see Definitions 4 and 1). Assume that C^∞_{S/D_S} has
the (D, D_S)-local extension property. Then (M, {·,·},
D, S) is Poisson reducible if for any m ∈ S

B^{\sharp}(\Delta_m) \subset [\Delta_m^S]^{\circ} \qquad [3]

where Δ_m := {dF(m) | F ∈ C^∞(U_m), dF(z)|_{D(z)} = 0, for
all z ∈ U_m ∩ S, and for any open neighborhood U_m
of m in M} and Δ_m^S := {dF(m) ∈ Δ_m | F|_{U_m ∩ V_m} is
constant for an open neighborhood U_m of m in M
and an open neighborhood V_m of m in S}.


If S is endowed with the relative topology, then
Δ_m^S = {dF(m) ∈ Δ_m | F|_{U_m ∩ S} is constant for an open
neighborhood U_m of m in M}.


Reduction by Regular Canonical Distributions 


Let (M, {·,·}) be a Poisson manifold and S an
embedded submanifold of M. Let D ⊂ TM|_S be a
sub-bundle of the tangent bundle of M restricted to
S such that D_S := D ∩ TS is a smooth, integrable,
regular distribution on S and D is canonical.


Theorem 2 With the above hypotheses, (M, {·,·},
D, S) is Poisson reducible if and only if

B^{\sharp}(D^{\circ}) \subset TS + D \qquad [4]


Applications of the Poisson Reduction 
Theorem 


Reduction of Coisotropic Submanifolds 


Let (M, {·,·}) be a Poisson manifold with associated
Poisson tensor B ∈ Λ²(T*M) and S an immersed
smooth submanifold of M. Denote by (TS)° := {α_s ∈
T*_s M | ⟨α_s, v_s⟩ = 0, for all s ∈ S, v_s ∈ T_s S} ⊂ T*M the
conormal bundle of the manifold S; it is a vector
sub-bundle of T*M|_S. The manifold S is called
“coisotropic” if B^♯((TS)°) ⊂ TS. In the physics
literature, coisotropic submanifolds sometimes appear
under the name of “first-class constraints.”
The following are equivalent:


1. S is coisotropic;

2. if f ∈ C^∞(M) satisfies f|_S = 0, then X_f|_S ∈ 𝔛(S);

3. for any s ∈ S, any open neighborhood U_s of s in
M, and any function g ∈ C^∞(U_s) such that
X_g(s) ∈ T_s S, if f ∈ C^∞(U_s) satisfies {f, g}(s) = 0, it
follows that X_f(s) ∈ T_s S;

4. the subalgebra {f ∈ C^∞(M) | f|_S = 0} is a Poisson
subalgebra of (C^∞(M), {·,·}).

The following proposition shows how to endow 
the coisotropic submanifolds of a Poisson manifold 


with a Poisson structure by using the reduction 
theorem 1. 


Proposition 1 Let (M, {·,·}) be a Poisson manifold
with associated Poisson tensor B ∈ Λ²(T*M). Let S
be an embedded coisotropic submanifold of M and
D := B^♯((TS)°). Then


(i) D = D ∩ TS = D_S is a smooth generalized
distribution on S.

(ii) D is integrable.

(iii) If C^∞_{S/D_S} has the (D, D_S)-local extension property,
then (M, {·,·}, D, S) is Poisson reducible.


Coisotropic submanifolds usually appear as the
level sets of integrals in involution. Let (M, {·,·}) be a
Poisson manifold with Poisson tensor B and let
f_1, ..., f_k ∈ C^∞(M) be k smooth functions in involution,
that is, {f_i, f_j} = 0, for any i, j ∈ {1, ..., k}.
Assume that 0 ∈ R^k is a regular value of the function
F := (f_1, ..., f_k): M → R^k and let S := F^{-1}(0). Since for
any s ∈ S, span{df_1(s), ..., df_k(s)} ⊂ (T_s S)° and the
dimensions of both sides of this inclusion are equal,
it follows that span{df_1(s), ..., df_k(s)} = (T_s S)°.
Hence, B^♯(s)((T_s S)°) = span{X_{f_1}(s), ..., X_{f_k}(s)} and
B^♯(s)((T_s S)°) ⊂ T_s S by the involutivity of the components
of F. Consequently, S is a coisotropic submanifold
of (M, {·,·}).


Cosymplectic Submanifolds and Dirac’s 
Constraints Formula 


The Poisson reduction theorem 2 allows us to define 
Poisson structures on certain embedded submani- 
folds that are not Poisson submanifolds. 


Definition 6 Let (M, {·,·}) be a Poisson manifold
and let B ∈ Λ²(T*M) be the corresponding Poisson
tensor. An embedded submanifold S ⊂ M is called
cosymplectic if


(i) B^♯((TS)°) ∩ TS = {0},
(ii) T_s S + T_s L_s = T_s M,


for any s ∈ S and L_s the symplectic leaf of (M, {·,·})
containing s ∈ S.


The cosymplectic submanifolds of a symplectic mani- 
fold (M, w) are its symplectic submanifolds. Cosym- 
plectic submanifolds appear in the physics literature 
under the name of “second-class constraints.” 


Proposition 2 Let (M, {·,·}) be a Poisson manifold,
B ∈ Λ²(T*M) the corresponding Poisson tensor,
and S a cosymplectic submanifold of M. Then, for
any s ∈ S,


(i) T_s L_s = (T_s S ∩ T_s L_s) ⊕ B^♯(s)((T_s S)°), where L_s is
the symplectic leaf of (M, {·,·}) that contains
s ∈ S.

(ii) (T_s S)° ∩ ker B^♯(s) = {0}.

(iii) T_s M = B^♯(s)((T_s S)°) ⊕ T_s S.

(iv) B^♯((TS)°) is a sub-bundle of TM|_S and hence
TM|_S = B^♯((TS)°) ⊕ TS.

(v) The symplectic leaves of (M, {·,·}) intersect S
transversely and hence S ∩ L is an initial
submanifold of S, for any symplectic leaf L of
(M, {·,·}).

Theorem 3 (The Poisson structure of a cosymplectic
submanifold). Let (M, {·,·}) be a Poisson manifold,
B ∈ Λ²(T*M) the corresponding Poisson tensor,
and S a cosymplectic submanifold of M. Let
D := B^♯((TS)°) ⊂ TM|_S. Then,


(i) (M, {·,·}, D, S) is Poisson reducible.

(ii) The corresponding quotient manifold equals S
and the reduced bracket {·,·}^S is given by

\{f, g\}^S(s) = \{F, G\}(s) \qquad [5]

where f, g ∈ C^∞_{S,M}(V) are arbitrary and F, G ∈
C^∞(U) are local D-invariant extensions of f
and g around s ∈ S, respectively.

(iii) The Hamiltonian vector field X_f of an arbitrary
function f ∈ C^∞_{S,M}(V) is given either by

Ti \circ X_f = X_F \circ i \qquad [6]

where F ∈ C^∞(U) is a local D-invariant extension
of f and i: S ↪ M is the inclusion, or by

Ti \circ X_f = \pi_S \circ X_F \circ i \qquad [7]

where F ∈ C^∞(U) is an arbitrary local extension
of f and π_S: TM|_S → TS is the projection
induced by the Whitney sum decomposition
TM|_S = B^♯((TS)°) ⊕ TS of TM|_S.

(iv) The symplectic leaves of (S, {·,·}^S) are the
connected components of the intersections S ∩ L,
where L is a symplectic leaf of (M, {·,·}). Any
symplectic leaf of (S, {·,·}^S) is a symplectic
submanifold of the symplectic leaf of (M, {·,·})
that contains it.

(v) Let L_s and L_s^S be the symplectic leaves of
(M, {·,·}) and (S, {·,·}^S), respectively, that contain
the point s ∈ S. Let ω_{L_s} and ω_{L_s^S} be the corresponding
symplectic forms. Then B^♯(s)((T_s S)°) is a
symplectic subspace of T_s L_s and

B^{\sharp}(s)((T_s S)^{\circ}) = (T_s L_s^S)^{\omega_{L_s}(s)} \qquad [8]

where (T_s L_s^S)^{ω_{L_s}(s)} denotes the ω_{L_s}(s)-orthogonal
complement of T_s L_s^S in T_s L_s.

(vi) Let B_S ∈ Λ²(T*S) be the Poisson tensor associated
to (S, {·,·}^S). Then

B_S^{\sharp} = \pi_S \circ B^{\sharp}\big|_S \circ \pi_S^{*} \qquad [9]

where π_S^*: T*S → T*M|_S is the dual of π_S: TM|_S
→ TS.


The “Dirac constraints formula” is the expression in
coordinates for the bracket of a cosymplectic
submanifold. Let (M, {·,·}) be an n-dimensional
Poisson manifold and let S be a k-dimensional
cosymplectic submanifold of M. Let z_0 be an
arbitrary point in S and (U, κ̄) a submanifold chart
around z_0 such that κ̄ = (φ̄, ψ): U → V_1 × V_2, where
V_1 and V_2 are two open neighborhoods of the origin
in two Euclidean spaces such that κ̄(z_0) = (φ̄(z_0),
ψ(z_0)) = (0, 0) and

\bar{\kappa}(U \cap S) = V_1 \times \{0\} \qquad [10]

Let φ̄ =: (φ̄^1, ..., φ̄^k) be the components of φ̄
and define φ̂^1 := φ̄^1|_{U∩S}, ..., φ̂^k := φ̄^k|_{U∩S}. Extend
φ̂^1, ..., φ̂^k to D-invariant functions φ^1, ..., φ^k on U.
Since the differentials dφ̂^1(s), ..., dφ̂^k(s) are linearly
independent for any s ∈ U ∩ S, we can assume (by
shrinking U if necessary) that dφ^1(z), ..., dφ^k(z) are
also linearly independent for any z ∈ U. Consequently,
(U, κ) with κ := (φ^1, ..., φ^k, ψ^1, ..., ψ^{n−k}) is
a submanifold chart for M around z_0 with respect to
S such that, by construction,

d\varphi^1(s)\big|_{B^{\sharp}(s)((T_s S)^{\circ})} = \cdots = d\varphi^k(s)\big|_{B^{\sharp}(s)((T_s S)^{\circ})} = 0

for any s ∈ U ∩ S. This implies that for any
i ∈ {1, ..., k}, j ∈ {1, ..., n − k}, and s ∈ S

\{\varphi^i, \psi^j\}(s) = d\varphi^i(s)(X_{\psi^j}(s)) = 0

since dψ^j(s) ∈ (T_s S)° by [10] and hence

X_{\psi^j}(s) \in B^{\sharp}(s)((T_s S)^{\circ}) \qquad [11]

Additionally, since the functions φ^1, ..., φ^k are
D-invariant, by [6], it follows that

X_{\varphi^1}(s) = X_{\hat{\varphi}^1}(s) \in T_s S, \; \ldots, \; X_{\varphi^k}(s) = X_{\hat{\varphi}^k}(s) \in T_s S

for any s ∈ S. Consequently, {X_{φ^1}(s), ..., X_{φ^k}(s),
X_{ψ^1}(s), ..., X_{ψ^{n−k}}(s)} spans T_s L_s with

\{X_{\varphi^1}(s), \ldots, X_{\varphi^k}(s)\} \subset T_s S \cap T_s L_s

and

\{X_{\psi^1}(s), \ldots, X_{\psi^{n-k}}(s)\} \subset B^{\sharp}(s)((T_s S)^{\circ})

By Proposition 2(i),

\mathrm{span}\{X_{\varphi^1}(s), \ldots, X_{\varphi^k}(s)\} = T_s S \cap T_s L_s

and

\mathrm{span}\{X_{\psi^1}(s), \ldots, X_{\psi^{n-k}}(s)\} = B^{\sharp}(s)((T_s S)^{\circ})

Since dim(B^♯(s)((T_s S)°)) = n − k by Proposition
2(iii), it follows that {X_{ψ^1}(s), ..., X_{ψ^{n−k}}(s)} is a basis
of B^♯(s)((T_s S)°).

Since B^♯(s)((T_s S)°) is a symplectic subspace of
T_s L_s by Theorem 3(v), there exists some r ∈ N such
that n − k = 2r and, additionally, the matrix C(s)
with entries

C^{ij}(s) := \{\psi^i, \psi^j\}(s), \qquad i, j \in \{1, \ldots, n-k\}

is invertible. Therefore, in the coordinates (φ^1, ...,
φ^k, ψ^1, ..., ψ^{n−k}), the matrix associated to the
Poisson tensor B(s) is

B(s) = \begin{pmatrix} B_S(s) & 0 \\ 0 & C(s) \end{pmatrix}

where B_S ∈ Λ²(T*S) is the Poisson tensor associated
to (S, {·,·}^S). Let C_{ij}(s) be the entries of the matrix
C(s)^{-1}.


Proposition 3 (Dirac formulas). In the coordinate
neighborhood (φ^1, ..., φ^k, ψ^1, ..., ψ^{n−k}) constructed
above and for s ∈ S we have, for any f, g ∈ C^∞_{S,M}(V):

X_f(s) = X_F(s) - \sum_{i,j=1}^{n-k} \{F, \psi^i\}(s)\, C_{ij}(s)\, X_{\psi^j}(s) \qquad [12]

and

\{f, g\}^S(s) = \{F, G\}(s) - \sum_{i,j=1}^{n-k} \{F, \psi^i\}(s)\, C_{ij}(s)\, \{\psi^j, G\}(s) \qquad [13]

where F, G ∈ C^∞(U) are arbitrary local extensions of
f and g, respectively, around s ∈ S.
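
The Dirac bracket formula can be tried out on a small example. The sketch below is illustrative and rests on assumptions not made in the text: M is R^4 with the canonical bracket in coordinates (q_1, q_2, p_1, p_2), there are exactly two constraint functions (so the matrix C is 2 × 2 and is inverted by hand), and gradients are approximated by central differences. All function names are hypothetical.

```python
def grad(f, x, eps=1e-6):
    """Central-difference gradient of f at the point x (a list of floats)."""
    g = []
    for i in range(len(x)):
        up = list(x)
        dn = list(x)
        up[i] += eps
        dn[i] -= eps
        g.append((f(up) - f(dn)) / (2.0 * eps))
    return g


def bracket(f, g, x):
    """Canonical bracket on R^(2n), with x = (q_1..q_n, p_1..p_n)."""
    n = len(x) // 2
    df, dg = grad(f, x), grad(g, x)
    return sum(df[i] * dg[n + i] - df[n + i] * dg[i] for i in range(n))


def dirac_bracket(f, g, psi1, psi2, x):
    """Reduced bracket of the Dirac type for two constraints psi1, psi2.

    Here C = [[0, c], [-c, 0]] with c = {psi1, psi2}(x), so the inverse of C
    is [[0, -1/c], [1/c, 0]].
    """
    c = bracket(psi1, psi2, x)
    if abs(c) < 1e-12:
        raise ValueError("constraints are not second class at this point")
    c_inv = [[0.0, -1.0 / c], [1.0 / c, 0.0]]
    psis = (psi1, psi2)
    correction = sum(
        bracket(f, psis[i], x) * c_inv[i][j] * bracket(psis[j], g, x)
        for i in range(2) for j in range(2)
    )
    return bracket(f, g, x) - correction


if __name__ == "__main__":
    # S = {q2 = 0, p2 = 0}; F and G are arbitrary extensions off S.
    F = lambda x: x[0] * x[2] + 3.0 * x[1] * x[2] + x[3] * x[0]
    G = lambda x: x[2] ** 2 + x[1] + x[3] * x[2]
    s = [0.7, 0.0, -1.2, 0.0]
    print(dirac_bracket(F, G, lambda x: x[1], lambda x: x[3], s))
```

On S the functions F and G restrict to q_1 p_1 and p_1², whose canonical bracket in (q_1, p_1) is 2 p_1²; the corrected bracket reproduces this value although F and G were extended arbitrarily, whereas the uncorrected bracket {F, G}(s) does not.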

Mechanical Examples. Unfolding
Billiard Trajectories 


The billiard system inside a polygon P has a very
simple description: a point moves rectilinearly with
unit speed until it hits a side of P; there it
instantaneously changes its velocity according to the
rule “the angle of incidence equals the angle of
reflection,” and continues the rectilinear motion. If
the point hits a corner, its further motion is not
defined (see Billiards in Bounded Convex Domains).
From the point of view of the theory of dynamical 
systems, polygonal billiards provide an example of 
parabolic dynamics in which nearby trajectories 
diverge with subexponential rate. 

One of the motivations for the study of polygonal 
billiards comes from the mechanics of elastic particles in 
dimension 1. For example, consider the system of two
point-masses m_1 and m_2 on the positive half-line x ≥ 0.
The collision between the points is elastic, that is, the
energy and momentum are conserved. The reflection
off the left endpoint of the half-line is also elastic: if a
point hits the “wall” x = 0, its velocity changes sign.
The configuration space of this system is the wedge
0 ≤ x_1 ≤ x_2. After the rescaling x̄_i = √(m_i) x_i, i = 1, 2,
this system identifies with the billiard inside a wedge
with the angle measure arctan √(m_1/m_2).

Likewise, the system of two elastic point-masses 
on a segment is the billiard system in a right 
triangle; a system of a number of elastic point- 
masses on the positive half-line or a segment is the 
billiard inside a multidimensional polyhedral cone 
or a polyhedron, respectively. The system of three 
elastic point-masses on a circle has three degrees of 
freedom; one can reduce one by assuming that the 
center of mass of the system is fixed. The resulting 
two-dimensional system is the billiard inside an 
acute triangle with the angles 


arctan( m_i √((m_1 + m_2 + m_3)/(m_1 m_2 m_3)) ),    i = 1, 2, 3


For comparison, the more realistic system of 
elastic balls identifies with the billiard system in a 
domain with nonflat boundary components. 


Figure 1 Unfolding a billiard trajectory in a wedge. 


A useful elementary method of study is unfolding: 
instead of reflecting the billiard trajectory in the 
sides of the polygon, reflect the polygon in the 
respective side and unfold the billiard trajectory to a 
straight line. This method yields an upper bound


⌈ π / arctan √(m_1/m_2) ⌉


for the number of collisions in the system of two
point-masses m_1 and m_2 on the positive half-line.
Likewise, the number of collisions for any number 
of elastic point-masses on the positive half-line is 
bounded above by a constant depending on the 
masses only. Similar results are known for systems 
of elastic balls (Figure 1). 
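
The finiteness of the number of collisions for two point-masses can be observed directly. The sketch below is illustrative and makes its own assumptions: mass m1 sits between the wall and mass m2 with initial velocity v1 ≥ 0, so that collisions alternate between pair collisions and wall reflections and only the velocities need to be tracked. Exact rational arithmetic is used; the bound is evaluated in floating point. The function names are hypothetical.

```python
from fractions import Fraction
from math import atan, ceil, pi, sqrt


def count_collisions(m1, m2, v1=0, v2=-1):
    """Count all collisions (pair and wall) for two point-masses on x >= 0.

    Assumes m1 lies between the wall and m2 and starts with v1 >= 0.
    """
    m1, m2, v1, v2 = (Fraction(x) for x in (m1, m2, v1, v2))
    if v1 < 0:
        raise ValueError("this sketch assumes v1 >= 0 initially")
    count = 0
    pair_next = True  # the next possible event is a pair collision
    while True:
        if pair_next:
            if v1 <= v2:          # the masses no longer approach each other
                break
            # Elastic collision: momentum and energy are conserved.
            v1, v2 = (((m1 - m2) * v1 + 2 * m2 * v2) / (m1 + m2),
                      ((m2 - m1) * v2 + 2 * m1 * v1) / (m1 + m2))
        else:
            if v1 >= 0:           # m1 does not move toward the wall
                break
            v1 = -v1              # elastic reflection at x = 0
        count += 1
        pair_next = not pair_next
    return count


def collision_bound(m1, m2):
    """Upper bound ceil(pi / arctan sqrt(m1/m2)) from the wedge unfolding."""
    return ceil(pi / atan(sqrt(m1 / m2)))


for masses in [(1, 1), (1, 100)]:
    print(masses, count_collisions(*masses), collision_bound(*masses))
```

For equal masses the count is 3, and for the mass ratio 1:100 it is 31; both stay below the respective bound.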

Similarly, one studies the billiard inside the unit 
square. Unfolding the square yields a square grid in 
the plane, acted upon by the group of parallel
translations 2Z ⊕ 2Z. Factorizing by this group
action yields a torus, and the billiard flow in a
given direction becomes a constant flow on the
torus. If the slope is rational, then all orbits are
periodic, and if the slope is irrational, then all orbits
are dense and the billiard flow is ergodic. Its metric
entropy is equal to zero. Periodic trajectories of the
billiard in a square come in bands of parallel ones.
Let f(ℓ) be the number of such bands of length not
greater than ℓ. Then, f(ℓ) equals the number of
coprime lattice points inside the circle of radius ℓ,
that is, f(ℓ) has quadratic growth in ℓ.
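
The quadratic growth can be seen numerically by counting coprime lattice points in a disc. This is a rough sketch under the assumption that the radius is an integer and that points (p, q) with gcd(|p|, |q|) = 1 are counted in all four quadrants; the comparison constant 6/π comes from the density 6/π^2 of coprime pairs times the area π ℓ^2 of the disc. The function name is illustrative.

```python
from math import gcd, isqrt, pi


def coprime_points_in_disc(radius):
    """Number of integer points (p, q), p^2 + q^2 <= radius^2, gcd(p, q) = 1."""
    count = 0
    r2 = radius * radius
    for p in range(-radius, radius + 1):
        qmax = isqrt(r2 - p * p)
        for q in range(-qmax, qmax + 1):
            # math.gcd works with absolute values; gcd(0, 0) = 0 excludes the origin.
            if gcd(p, q) == 1:
                count += 1
    return count


for radius in (1, 2, 50, 200):
    n = coprime_points_in_disc(radius)
    print(radius, n, n / radius ** 2, 6 / pi)
```

The ratio of the count to ℓ^2 settles near 6/π as ℓ grows.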





Periodic Trajectories 


The simplest example of a periodic orbit in a
polygonal billiard is the 3-periodic Fagnano trajectory
in an acute triangle: it connects the bases of the
three altitudes of the triangle and has minimal
perimeter among inscribed triangles. The Fagnano
trajectory belongs to a band of 6-periodic ones. It is
not known whether every acute triangle has other 
periodic trajectories. 

For a right triangle, one has the following result: 
almost every (in the sense of the Lebesgue measure) 
billiard trajectory that leaves a leg in the perpendicular 
direction returns to the same leg in the same direction 
and is therefore periodic. A similar existence result 
holds for polygons whose sides have only two 
directions. 

In general, not much is known about the existence 
of periodic billiard trajectories in polygons. Con- 
jecturally, every polygon has one, but this is not 
known even for all obtuse triangles. Recently, 
R Schwartz proved that every obtuse triangle with 
the angles not exceeding 100° has a periodic billiard 
path. This work substantially relies on a computer 
program, McBilliards, written by Schwartz and 
Hooper. 

If an arbitrarily small perturbation of the vertices of a
billiard polygon leads to a perturbation of a periodic
billiard trajectory, but not to its destruction, then this
trajectory is called stable. Label the sides of the
polygon 1, 2, ..., k. Then a periodic trajectory is
coded by the word consisting of the labels of the 
consecutively visited sides. An even-periodic trajectory 
is stable if and only if the numbers in the respective 
word can be partitioned in pairs of equal numbers, so 
that the number from each pair appears once at an 
even position, and once at an odd one. As a 
consequence, if the angles of a polygon are indepen- 
dent over the rational numbers, then every periodic 
billiard trajectory in it is stable. 
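
The combinatorial stability criterion is easy to test on a given word. In the sketch below, which is only an illustration, a word is a string of side labels, positions are counted from 1, and the pairing condition is checked in the equivalent form "every label occurs equally often at odd and at even positions". The sample words are hypothetical inputs, not classified trajectories.

```python
from collections import Counter


def is_stable(word):
    """Stability criterion for the word of an even-periodic trajectory."""
    if len(word) % 2:
        raise ValueError("the criterion is stated for words of even length")
    odd_positions = Counter(word[0::2])    # positions 1, 3, 5, ...
    even_positions = Counter(word[1::2])   # positions 2, 4, 6, ...
    return odd_positions == even_positions


print(is_stable("123123"))  # each label once at an odd and once at an even position
print(is_stable("1212"))    # label 1 occupies odd positions only
```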


Complexity of Billiard Trajectories 


The encoding of billiard trajectories by the consecu- 
tively visited sides of the billiard polygon provides a 
link between billiard and symbolic dynamics. For a
billiard k-gon P, denote by Σ the set of words in
letters 1, 2, ..., k corresponding to billiard trajectories
in P, and let Σ_n be the set of such words of
length n.

One has a general theorem: the topological 
entropy of the billiard flow is zero. This implies 
that a number of quantities, associated with a 
polygonal billiard, grow slower than exponentially, 
as functions of n: the cardinality |Σ_n|, the number of
strips of n-periodic trajectories, the number of
generalized diagonals with n links (i.e., billiard
trajectories that start and end at corners of the
billiard polygon), etc. Conjecturally, all these quantities
have polynomial growth in n.

The complexity of the billiard in a polygon is
defined as the function p(n) = |Σ_n|. Likewise, one
may consider the billiard trajectories in a given
direction θ and define the corresponding complexity
p_θ(n).

In the case of a square, one modifies the encoding 
using only two symbols, say, 0 and 1, to indicate 
that a trajectory reflects in a horizontal or a vertical 
side, respectively. If θ is a direction with an
irrational slope, then p_θ(n) = n + 1. This is a classical
result by Hedlund and Morse. The sequences with
complexity p(n) = n + 1 are called Sturmian; this
is the smallest complexity of aperiodic sequences.
A generalization for multidimensional cubes and 
parallelepipeds, due to Yu Baryshnikov, is known. 
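
The complexity n + 1 can be observed on the Fibonacci word, a standard example of a Sturmian sequence, which is used here as a stand-in for the coding of a trajectory with golden-ratio slope. The sketch generates a long prefix with the substitution 0 → 01, 1 → 0 and counts distinct factors; the prefix length 2000 is an arbitrary choice that is ample for the short factor lengths tested.

```python
def fibonacci_word(length):
    """Prefix of the Fibonacci word, built with the substitution 0->01, 1->0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def complexity(word, n):
    """Number of distinct factors (subwords) of length n occurring in word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


w = fibonacci_word(2000)
print([complexity(w, n) for n in range(1, 11)])
```

The printed values are 2, 3, ..., 11, that is, n + 1 for n = 1, ..., 10.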

For a k-gon P, let N be the least common
denominator of its π-rational angles and s be the
number of its distinct π-irrational angles. Then,


p_θ(n) ≤ kNn(1 + n/2)^s


Concerning billiard trajectories in all directions, 
one has a lower bound for complexity: p(n) ≥ cn^2
for a constant c depending on the polygon. A similar 
estimate holds for a d-dimensional polyhedron with 
the exponent 2 replaced by d. 


Rational Polygons and Flat Surfaces 


The only class of polygons for which the billiard
dynamics is well understood is that of rational ones, the
polygons satisfying the property that the angles
between all pairs of sides are rational multiples of π.

Let P be a simply connected (without holes)
rational k-gon with angles πm_i/n_i, where m_i and n_i
are coprime integers. The reflections in the sides of P
generate a subgroup of the group of isometries of
the plane. Let G(P) ⊂ O(2) consist of the linear
parts of the elements of this group. Then, G(P) is the
dihedral group D_N consisting of 2N elements, where N
is the least common multiple of the denominators n_i. When
a billiard trajectory reflects in a side of P, its
direction changes by the action of the group G(P),
and the orbit of a generic direction θ ≠ kπ/N on the
unit circle consists of 2N points.

The phase space of the billiard flow is the unit
tangent bundle P × S^1. Let M_θ be the subset of
points whose projection to S^1 belongs to the orbit of
θ under G(P) = D_N. Then, M_θ is an invariant surface
of the billiard flow in P. The surface M_θ is obtained
from 2N copies of P by gluing their sides according
to the action of D_N. This oriented compact surface
depends only on the polygon P, but not on the
choice of θ, and may be denoted by M. The
directional billiard flows F_θ on M in directions θ

Figure 2 The invariant surface for a right triangle with acute
angle π/8 has genus 2.


are obtained, one from another, by rotations. The 
genus of M is given by the formula 


1 + (N/2) (k − 2 − Σ_{i=1}^{k} 1/n_i)


For example, if P is a right triangle with an acute
angle π/8, then M is a surface of genus 2 (Figure 2).

The cases when M is a torus are as follows: the
angles of P are all of the form π/n_i, where n_i are
equal, up to permutations, to


(3, 3, 3), (2, 4, 4), (2, 3, 6), (2, 2, 2, 2)


and the respective polygons are an equilateral
triangle, an isosceles right triangle, a right triangle
with an acute angle π/6, and a square. All these
polygons tile the plane.
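
The genus formula 1 + (N/2)(k − 2 − Σ 1/n_i) is simple to evaluate. The sketch below assumes the interior angles are supplied as exact fractions of π (so 1/8 stands for π/8) and takes N to be the least common multiple of the reduced denominators n_i; the function name is illustrative.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def invariant_surface_genus(angles):
    """Genus of the invariant surface M for a rational polygon.

    `angles` lists the interior angles as fractions of pi.
    """
    angles = [Fraction(a) for a in angles]
    k = len(angles)
    if sum(angles) != k - 2:
        raise ValueError("the interior angles of a k-gon must sum to (k - 2) pi")
    denominators = [a.denominator for a in angles]          # the n_i
    n_lcm = reduce(lambda x, y: x * y // gcd(x, y), denominators, 1)
    genus = 1 + Fraction(n_lcm, 2) * (k - 2 - sum(Fraction(1, n) for n in denominators))
    if genus.denominator != 1:
        raise ArithmeticError("unexpected non-integer genus")
    return int(genus)


right_triangle = [Fraction(1, 2), Fraction(1, 8), Fraction(3, 8)]
print(invariant_surface_genus(right_triangle))             # genus 2
print(invariant_surface_genus([Fraction(1, 2)] * 4))       # square: torus
```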

The billiard flow on the surface M has saddle 
singularities at the points obtained from the vertices 
of P. The surface M inherits a flat metric from P 
with a finite number of cone-type singularities,
corresponding to the vertices of P, with cone angles
multiples of 2π (Figure 3).

A flat surface M is a compact smooth surface with
a distinguished finite set of points Σ. On M \ Σ, one
has coordinate charts v = (x, y) such that the transition
functions on the overlaps are of the form


v ↦ v + c   or   v ↦ −v + c


Figure 3 A cone singularity for the flow on an invariant surface. 


In particular, one may talk about directions on a flat 
surface. 

The group PSL(2,R) acts on the space of flat 
structures. From the point of view of complex analysis, 
a flat surface is a Riemann surface with a holomorphic
quadratic differential; the set of cone points Σ corresponds
to the zeros of the quadratic differential. Not
every flat surface is associated with a polygonal billiard. 

Concerning ergodicity, one has the theorem of
Kerckhoff, Masur, and Smillie: given a flat surface of
genus not less than 2, for almost all directions θ (in the
sense of the Lebesgue measure), the flow F_θ is uniquely
ergodic. Furthermore, the Hausdorff dimension of the
set of angles θ for which ergodicity fails does not
exceed 1/2, and this bound is sharp. As a consequence,
the billiard flow on the invariant surface is uniquely
ergodic for almost all directions. Another corollary:
there is a dense G_δ subset in the space of polygons
consisting of polygons for which the billiard flow is 
ergodic. If a billiard polygon admits approximation by 
rational polygons at a superexponentially fast rate, 
then the billiard flow in it is ergodic. 

Concerning periodic orbits, one has the following 
theorem due to H Masur: given a flat surface of genus
not less than 2, there exists a dense set of angles θ such
that F_θ has a closed trajectory. As a consequence, for
any rational billiard polygon, there is a dense set of 
directions each with a periodic orbit. Furthermore, 
periodic points are dense in the phase space of the 
billiard flow in a rational polygon. 

Similarly to the case of a square, let f(ℓ) be the
number of strips of periodic trajectories of length not
greater than ℓ in a rational polygon P. By a theorem
of H Masur, there exist constants c and C such that
for sufficiently large ℓ one has: cℓ^2 < f(ℓ) < Cℓ^2, and
likewise for flat surfaces.

There is a class of flat surfaces, called Veech (or 
lattice) surfaces, for which more refined results are 
available. The group of affine transformations of a
flat surface determines a subgroup in SL(2, R). If this
subgroup is a lattice in SL(2, R), then the flat surface
is called a Veech surface. Similarly, one defines a
Veech rational polygon. For example, regular polygons
and isosceles triangles with equal angles π/n
are Veech. All acute Veech triangles are described. 

For a Veech surface, one has the following Veech
dichotomy: for any direction θ, either the flow F_θ is
minimal or its every leaf is closed (unless it is a saddle
connection, i.e., a segment connecting cone points).
For a Veech surface (and polygon), the quadratic
bounds for the counting function f(ℓ) become quadratic
asymptotics: f(ℓ)/ℓ^2 has a limit as ℓ → ∞. The
value of this limit is expressed in arithmetical terms.

A generic flat surface also has quadratic asymptotics. 
The value of the limit depends only on the stratum of
the Teichmüller space that contains this surface. These
values are known, due to Eskin, Masur, Okounkov, and
Zorich. Since a generic flat surface does not correspond
to a rational polygon, this result does not immediately 
apply to polygonal billiards. However, quadratic 
asymptotics are established for rectangular billiards 
with barriers. 

Note, in conclusion, a close relation of billiards in 
rational polygons and interval exchange transforma- 
tions; the reduction of the former to the latter is a 
particular case of the reduction of the billiard flow to 
the billiard ball map. On an invariant surface M of the 
billiard flow, consider a segment I, perpendicular to 
the directional flow. Since “the width of a beam” is an 
invariant transversal measure for the constant flow, the 
first return map to I is a piecewise orientation preserving 
isometry, that is, an interval exchange transformation. 
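
An interval exchange transformation is straightforward to write down explicitly. The following sketch is generic rather than derived from a particular polygon: it assumes the interval [0, Σ lengths) is cut into consecutive subintervals and that `order` lists the subintervals in the order in which they are laid down after the exchange. All names are illustrative.

```python
from fractions import Fraction


def interval_exchange(lengths, order):
    """Return the map T that rearranges consecutive intervals by translations."""
    lengths = [Fraction(x) for x in lengths]
    if sorted(order) != list(range(len(lengths))):
        raise ValueError("order must be a permutation of the interval indices")
    starts, position = [], Fraction(0)
    for length in lengths:              # left endpoints before the exchange
        starts.append(position)
        position += length
    total = position
    new_starts, position = {}, Fraction(0)
    for i in order:                     # left endpoints after the exchange
        new_starts[i] = position
        position += lengths[i]

    def transform(x):
        x = Fraction(x)
        if not 0 <= x < total:
            raise ValueError("point outside the interval")
        for i, (start, length) in enumerate(zip(starts, lengths)):
            if start <= x < start + length:
                return x - start + new_starts[i]   # translation on each piece

    return transform


# Exchanging two intervals is a rotation of the circle of length 1.
T = interval_exchange([Fraction(1, 3), Fraction(2, 3)], order=[1, 0])
print(T(0), T(Fraction(1, 3)), T(Fraction(1, 2)))
```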
Introduction


The theme of positive maps on *-algebras and other
ordered vector spaces dates back to the Perron–Frobenius
theory of matrices with positive entries,
the Schur product of matrices, the study of doubly
stochastic matrices describing discrete-time random
walks, and the behavior of limits of powers of
positive matrices in ergodic theory.

Long experience has shown that far-reaching generalizations
of the above situations have to be considered
in various fields of mathematical physics and that
C*-algebras, their positive cones, and other associated
ordered vector spaces provide a rich unifying framework
of functional analysis in which to treat them.

It is the scope of this note to review some of the 
basic aspects both of the general theory and of the 
applications. 

In the next section we briefly recall the definitions 
of C*-algebras and their positive cones. However, 
throughout this article we refer to C*-Algebras and
their Classification and von Neumann Algebras:
Introduction, Modular Theory and Classification
Theory as sources of the definitions and general
properties of these operator algebras.
We then introduce positive maps, illustrate their 
general properties, and discuss some relevant classes 
of them. The correspondence between states and 
representations is described next, as well as the 
appearance of vector, normal and non-normal states 
in applications. We then illustrate the structure of 
completely positive maps and their relevance in 
mathematical physics. Finally, we describe the 
relevance of the class of completely positive maps 
to understand the structure of nuclear C*-algebras. 


Positive Cones in C*-Algebras 


A C*-algebra A is a complex Banach algebra with a
conjugate-linear involution a ↦ a* such that ‖a*a‖ =
‖a‖^2 for all a ∈ A.

When A has a unit 1_A, the spectrum Sp(a) of an
element a is the subset of all complex numbers λ
such that a − λ·1_A is not invertible in A. When A is
realized as a subalgebra of some B(H), and this is
always possible, the set Sp(a) coincides with the
spectrum of the bounded operator a on the Hilbert
space H.


The involution determines the self-adjoint part
A_h := {a ∈ A: a = a*} of A, a real subspace such that
A = A_h + iA_h. A self-adjoint element a of A satisfies
Sp(a) ⊂ R and, if k ≥ 0, one has ‖a‖ ≤ k if and only
if Sp(a) ⊂ [−k, k].

The involution determines another important
subset of A: A_+ := {a*a: a ∈ A}. This subset of A_h is
closed in the norm topology of A and contains the
sums of its elements as well as their multiples by
positive scalars: in other words, it is a closed convex
cone. From a spectral point of view, one has the
following characterization: a self-adjoint element a
belongs to A_+ if and only if its spectrum is positive:
Sp(a) ⊂ [0, +∞). It is this property that allows us to
call A_+ the positive cone of A and its elements
positive. If it exists, a unit 1_A in A is always positive,
and a nonzero Hermitian element a is positive if and only if
‖1_A − a/‖a‖‖ ≤ 1.

The continuous functional calculus in A allows
one to write any element of A_h as a
difference of elements of A_+: A_h = A_+ − A_+. Moreover,
A_+ ∩ (−A_+) = {0} and the decomposition
a = b − c of a self-adjoint element a as a difference
of positive elements b and c is unique provided one
requires that bc = cb = 0. In this case, it is called the
orthogonal decomposition.

The cone A_+ determines an underlying structure
of ordered vector space on A: for a, b ∈ A one says that a is
less than or equal to b, in symbols a ≤ b, if and only
if b − a ∈ A_+. In particular, a ≥ 0 just means that a
is positive.

Another fundamental characterization of the
positive cone is the following: a self-adjoint element
a = a* is positive if and only if there exists a self-adjoint
element b in A such that a = b^2. Moreover, among
the elements b with this property, there exists one
and only one which is positive, the square root of a.
Some examples of positive cones are provided in the 
following. 


Example 1 By a fundamental result of I M
Gelfand, a commutative C*-algebra A is isomorphic
to the C*-algebra C_0(X) of all complex continuous
functions vanishing at infinity on a locally compact
Hausdorff topological space X. The algebraic
operations have the usual pointwise meaning and
the norm is the uniform one. The constant function
1 represents the unit precisely when X is compact.
The positive cone C_0(X)_+ coincides with that of the
positive continuous functions in C_0(X).


Example 2 Finite-dimensional C*-algebras A are
classified as finite sums M_{n_1}(C) ⊕ M_{n_2}(C) ⊕ ··· ⊕
M_{n_k}(C) of full matrix algebras M_n(C). An element
a_1 ⊕ a_2 ⊕ ··· ⊕ a_k is positive if and only if the
matrices a_i are Hermitian with non-negative eigenvalues.


Example 3 When a C*-algebra A C B(H) is rep- 
resented as a self-adjoint closed algebra of operators 
on a Hilbert space H, its positive elements are those 
which have non-negative spectrum. 


Positive Maps on C*-Algebras 


Among the various relevant classes of maps between 
C*-algebras, we are going to consider the following 
ones, whose properties are connected with the 
underlying structures of ordered vector spaces. 


Definition 1 Given two C*-algebras A and B, a
map φ: A → B is called positive if φ(A_+) ⊂ B_+. In
other words, a map is positive if and only if it
transforms the positive elements of A into positive
elements of B:


a ∈ A ⟹ φ(a*a) ∈ B_+    [1]


If A and B have units, the map is called unital
provided φ(1_A) = 1_B.


Morphisms and Jordan Morphisms 


A *-morphism between C*-algebras φ: A → B is
positive; in fact, φ(a*a) = φ(a)*φ(a) ≥ 0.

This is also the case for Jordan *-morphisms, the
linear maps satisfying φ(a*) = φ(a)* and φ({a, b}) =
{φ(a), φ(b)}, where {a, b} = ab + ba denotes the Jordan
product. In fact, if a = a* then φ(a^2) = φ(a)^2 is
positive.


Schur’s Product of Matrices


Let A ∈ M_n(C) be a positive matrix and define a
linear map φ_A: M_n(C) → M_n(C) through Schur’s
product of matrices: φ_A(B) := [A_{ij}B_{ij}]_{i,j=1}^n. Since
Schur’s product of positive matrices is positive too
(i.e., the positive cone of M_n(C) is a semigroup
under Schur’s product), the above map is positive.
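
The positivity of the entrywise product can be checked on small examples. The sketch below is restricted to 2 × 2 matrices, where a Hermitian matrix is positive exactly when its diagonal entries and its determinant are non-negative; the matrices A and B are arbitrary sample data and the helper names are illustrative.

```python
def schur_product(a, b):
    """Entrywise (Schur) product of two matrices of the same shape."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def is_positive_2x2(m, tol=1e-12):
    """Test whether a 2x2 complex matrix is Hermitian and positive semidefinite."""
    (a, b), (c, d) = [[complex(x) for x in row] for row in m]
    hermitian = (abs(a.imag) <= tol and abs(d.imag) <= tol
                 and abs(c - b.conjugate()) <= tol)
    det = (a * d - b * c).real
    return hermitian and a.real >= -tol and d.real >= -tol and det >= -tol


A = [[2, 1j], [-1j, 1]]    # positive: determinant 2 - 1 = 1
B = [[1, -1j], [1j, 3]]    # positive: determinant 3 - 1 = 2
print(is_positive_2x2(A), is_positive_2x2(B), is_positive_2x2(schur_product(A, B)))
```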


Positive-Definite Function on Groups 


Positive maps also arise naturally in harmonic 
analysis. Let G be a locally compact topological 
group with identity e and left Haar measure m. Let
p: G → C be a continuous positive-definite function
on G. This just means that for all n ≥ 1 and all
s_1, ..., s_n ∈ G, the matrix [p(s_i^{−1}s_j)]_{i,j=1}^n belongs to
the positive cone of M_n(C): Σ_{i,j=1}^n p(s_i^{−1}s_j) ᾱ_i α_j ≥ 0
for all α_1, ..., α_n ∈ C. Such functions are necessarily
bounded with ‖p‖_∞ ≤ p(e), so that an operator
φ: L^1(G, m) → L^1(G, m) is well defined by pointwise
multiplication: φ(f)(s) := p(s)f(s). This map
extends to a positive map φ: C*(G) → C*(G),
which is unital when p(e) = 1, on the full group
C*-algebra C*(G). When G is amenable, this algebra
coincides with the reduced C*-algebra C*_r(G) so that, if
G is also unimodular (as is the case if G is compact),
the positive elements can be approximated by
positive-definite functions in L^1(G, m) and the
positivity of φ follows exactly as in the previous
example.


Positive Maps in Commutative C*-Algebras 


Positive maps φ: C_0(Y) → C_0(X) between commutative
C*-algebras have the following structure:
φ(a)(x) = ∫_Y k(x, dy) a(y), a ∈ C_0(Y). Here the kernel
x ↦ k(x, ·) is a continuous map from X to the space
of positive Radon measures on Y. In case X and Y
are compact, the map is unital provided k(x, ·) is a
probability measure for each x ∈ X. In fact, for a
fixed x ∈ X, the map a ↦ φ(a)(x) is a positive linear
functional from C_0(Y) to C and Riesz’s theorem
guarantees that it can be represented by a positive
Radon measure on Y.

In probability theory, one-parameter semigroups
φ_t ∘ φ_s = φ_{t+s} of positive maps φ_t: C_0(X) → C_0(X)
such that φ_t(1) ≤ 1 for all t ≥ 0 are called Markovian
semigroups (conservative, if the maps are unital). They
represent the expectation at time t ≥ 0 of Markovian
stochastic processes on X. In this case, the time-dependent
kernel k(t, x, ·) represents the probability
distribution at time t of a particle starting in x ∈ X at
time t = 0.
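
On a finite set the kernel picture reduces to matrices. As an illustrative sketch (a discrete-time analog, not the continuous semigroup itself), take X = Y = {0, 1}, so that a function is a list of values and the kernel k(x, ·) is row x of a stochastic matrix; the sample matrix K and the helper names are assumptions of the example.

```python
from fractions import Fraction


def apply_kernel(kernel, a):
    """phi(a)(x) = sum_y k(x, y) a(y) for a kernel given as a matrix."""
    return [sum(k_xy * a_y for k_xy, a_y in zip(row, a)) for row in kernel]


def compose(k1, k2):
    """Kernel of the composition phi_1 ∘ phi_2, i.e. the matrix product k1 k2."""
    size = len(k2)
    return [[sum(row[m] * k2[m][y] for m in range(size)) for y in range(len(k2[0]))]
            for row in k1]


K = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]   # each row is a probability measure

a = [1, 0]                               # a positive function on {0, 1}
print(apply_kernel(K, a))                # positive again
print(apply_kernel(K, [1, 1]))           # unital: the constant 1 is preserved
print(apply_kernel(compose(K, K), a) == apply_kernel(K, apply_kernel(K, a)))
```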

These kinds of maps arise also in potential theory,
where the dependence of the solution φ(a) of a
Dirichlet problem on a bounded domain Ω, with
nice boundary ∂Ω, upon the continuous boundary
data a ∈ C(∂Ω) gives rise to a linear unital map
φ: C(∂Ω) → C(Ω ∪ ∂Ω), whose positivity and unitality
translate the “maximum principle” for harmonic
functions. When Ω is the unit disk, k is the
familiar Poisson kernel.


Continuity and Algebraic Properties of Positive Maps


Since the order structure of a C*-algebra A is defined
by its positive cone A_+, positive maps are


1. real: φ(a*) = φ(a)* and
2. order preserving: φ(a) ≤ φ(b) whenever a ≤ b.

From this follows an important interplay between
positivity and continuity:

a positive map φ: A → B
between C*-algebras is continuous

In case A has a unit, this follows from the fact that φ is
order preserving and that, for self-adjoint a, one has
−‖a‖1_A ≤ a ≤ +‖a‖1_A, so that −‖a‖φ(1_A) ≤ φ(a) ≤
+‖a‖φ(1_A) and then ‖φ(a)‖ ≤ ‖φ(1_A)‖·‖a‖. In general,
splitting a = b + ic as a combination of Hermitian
elements b and c, as ‖b‖ ≤ ‖a‖ and ‖c‖ ≤ ‖a‖, one
obtains


‖φ(a)‖ ≤ ‖φ(b)‖ + ‖φ(c)‖
       ≤ ‖φ(1_A)‖ (‖b‖ + ‖c‖)
       ≤ 2‖φ(1_A)‖·‖a‖

The second general result concerning positivity 
and continuity is the following: 


Let φ: A → B be a linear map between C*-algebras
with unit such that φ(1_A) = 1_B; then φ is positive if
and only if ‖φ‖ = 1.


The result relies, among other things, on the
generalized Schwarz inequality for unital positive
maps on normal elements,


φ(a*)φ(a) ≤ φ(a*a),    a*a = aa*


These results may be used to reveal the strong 
interplay between the algebraic, continuity and 
positivity properties of maps: 


Let φ: A → B be an invertible linear map between
unital C*-algebras such that φ(1_A) = 1_B. The
following properties are equivalent:


1. φ is a Jordan isomorphism,

2. φ is an isometry, and

3. φ is an order isomorphism (φ and φ^{−1} are order
preserving).


The above conclusions can be strengthened if, 
instead of individual maps, continuous groups of 
maps are considered. 


Let t ↦ α_t be a strongly continuous, one-parameter
group of maps of a unital C*-algebra A and
assume that α_t(1_A) = 1_A for all t ∈ R. The following
properties are equivalent:


1. α_t is a *-automorphism of A for all t ∈ R,
2. ‖α_t‖ ≤ 1 for all t ∈ R, and
3. α_t is positive for all t ∈ R.


An analogous result holds true for w*-continuous
groups on abelian or factor von Neumann algebras.


States on C*-Algebras 


A state on a C*-algebra A is a positive functional
φ: A → C of norm 1:


• φ(a*a) ≥ 0 for all a ∈ A, and
• ‖φ‖ = 1.


As C is a C*-algebra, when A is unital, a state on it
is just a unital positive map:


• φ(a*a) ≥ 0 for all a ∈ A, and
• φ(1_A) = 1.


States for which φ(ab) = φ(ba) are called tracial states.

States constitute a distinguished class of positive 
maps, both from a mathematical viewpoint and for 
application to mathematical physics. We will see below 
that states are deeply connected to representations of 
C*-algebras (see C*-Algebras and their Classification). 


States on Commutative C*-Algebras 


Since this is a subcase of positive maps in commutative 
C*-algebras we only add a comment. As far as a 
C*-algebra represents observable quantities of a 
physical system, states carry our actual knowledge 
about the system itself. The smallest C*-subalgebra
{f(a): f ∈ C_0(R)} of A containing a given self-adjoint
element a ∈ A, representing a certain observable
quantity, is isomorphic to the algebra C(Sp(a)) of
continuous functions on the spectrum of a. A state φ on
A induces, by restriction, a state on C(Sp(a)), which,
by the Riesz representation theorem, is associated to a
probability measure μ_a on Sp(a) through the formula


φ(f(a)) = ∫_{Sp(a)} f(x) μ_a(dx)


Since Sp(a) represents the possible values of the
observable associated to a, μ_a represents the distribution
of these values when the physical state of
the system is represented by φ.


Vector States and Density Matrices 


In case A is acting on a Hilbert space h, A ⊂ B(h),
each unit vector ξ ∈ h gives rise to a vector state
φ_ξ(a) = (ξ|aξ). In the quantum-mechanical description
of a finite system, as far as observables with
discrete spectrum are concerned, one can assume A
to be the C*-algebra K(h) of compact operators on
the Hilbert space h. In this case every state is a
convex superposition of vector states, in the sense
that it can be represented by the formula


φ(a) = tr(ρa)/tr(ρ),


for a suitable density matrix ρ, that is, a positive,
compact operator with finite trace. In quantum
statistical mechanics, the grand canonical Gibbs
equilibrium state of a finite system at inverse temperature
β and chemical potential μ, with Hamiltonian H
and number operator N, is of the above type


φ_{β,μ}(a) = tr(e^{−βK}a)/tr(e^{−βK}),    a ∈ K(h)


where K = H − μN, and the spectrum of H is assumed
to be discrete and such that e^{−βK} is trace-class. For
infinite systems, A is a quasilocal C*-algebra generated
by a net {A_Λ}_Λ of C*-subalgebras describing observables
referred to finite-volume regions. Infinite-volume
equilibrium states on A can then be obtained as
thermodynamic limits of finite-volume Gibbs equilibrium
states of the above type.
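
The Gibbs formula becomes elementary for a finite-level system. The sketch below assumes a finite-dimensional Hilbert space and a basis in which K = H − μN is diagonal, with eigenvalues passed in as `levels`, so that tr(e^{−βK}a) only involves the diagonal entries of a; the two-level data are purely illustrative.

```python
import math


def gibbs_state(levels, beta):
    """State a -> tr(exp(-beta K) a) / tr(exp(-beta K)) for diagonal K."""
    weights = [math.exp(-beta * level) for level in levels]
    partition_function = sum(weights)       # tr(exp(-beta K))

    def state(a):
        # Only the diagonal of a contributes because exp(-beta K) is diagonal.
        return sum(w * a[i][i] for i, w in enumerate(weights)) / partition_function

    return state


identity = [[1, 0], [0, 1]]
ground_projection = [[1, 0], [0, 0]]

hot = gibbs_state([0.0, 1.0], beta=0.0)              # infinite temperature
cold = gibbs_state([0.0, 1.0], beta=math.log(2))     # Boltzmann weights 1 and 1/2
print(hot(identity), hot(ground_projection), cold(ground_projection))
```

At β = 0 the state is the normalised trace, while at β = ln 2 the ground level carries weight 2/3.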


Normal and Singular States 


When observables with continuous spectrum have to 
be considered and one chooses the algebra B(h) of 
all bounded operators, the above formula, although 
still meaningful, does not describe all states on B(h) 
but only the important subclass of the normal ones. 
To this class, which can be considered on any von
Neumann algebra M, belong states φ which are
σ-weakly continuous functionals. Equivalently, these
are the states such that for every increasing net a_α ∈ M_+
with least upper bound a ∈ M_+, φ(a) is the least upper
bound of the net φ(a_α).

In general, each state ¢ on a von Neumann 
algebra M splits as a sum of a maximal normal 
piece and a singular one. Singular traces appear in 
noncommutative geometry as very useful tools to get 
back local objects from spectral ones via the familiar 
principle that local properties of functions depend 
on the asymptotics of their Fourier coefficients. 

This is best illustrated on a compact, Riemannian
n-manifold M by the formula


∫_M f dm = c_n · τ_ω(M_f |D|^{−n})


which expresses the Riemannian integral of a nice
function f in terms of the Dirac operator D acting on the
Hilbert space of square-integrable spinors, the multiplication
operator M_f by f, and the singular Dixmier
tracial state τ_ω on B(H). Here the compactness of M
implies the compactness of the operator M_f |D|^{−n} and
τ_ω is a limiting procedure depending only on the
asymptotic behavior of the eigenvalues of M_f |D|^{−n}.
Similar formulas are valid on self-similar fractals as well
as on quasiconformal manifolds. Local index formulas
represent cyclic cocycles in Connes’ spectral geometry
(see Noncommutative Geometry and the Standard 
Model; Noncommutative Geometry from Strings; 
Path-Integrals in Noncommutative Geometry). 


States and Representations: The 
GNS Construction 


Fundamental tools in studying a C*-algebra A
are its representations. These are morphisms of
C*-algebras π: A → B(H) from A to the algebra of
all bounded operators on some Hilbert space H.
There is a symbiotic appearance of states and
representations on C*-algebras. In fact, given a
representation π: A → B(H), one easily constructs
states on A from unit vectors ξ ∈ H by


φ_ξ(a) = (ξ|π(a)ξ)


In fact, one checks that φ_ξ(a*a) = (ξ|π(a*a)ξ) =
(ξ|π(a)*π(a)ξ) = ‖π(a)ξ‖^2 ≥ 0 and, at least if a unit
exists, that φ_ξ(1_A) = ‖ξ‖^2 = 1.

A fundamental construction due to Gelfand,
Naimark, and Segal allows one to associate a representation
to each state in such a way that each state is a
vector state for a suitable representation.


“Let ω be a state over the C*-algebra A. It follows
that there exists a cyclic representation (π_ω, H_ω, ξ_ω)
of A such that


ω(a) = (ξ_ω|π_ω(a)ξ_ω)

Moreover, the representation is unique up to
unitary equivalence. It is called the canonical
cyclic representation of A associated with ω.”


The positivity property of the state allows one to
introduce the positive-semidefinite scalar product
(a|b) = ω(a*b) on the vector space A. Moreover, its
kernel I_ω = {a ∈ A: ω(a*a) = 0} is a left ideal of A: in
fact, if a ∈ A and b ∈ I_ω then ω((ab)*(ab)) ≤
‖a‖^2 ω(b*b) = 0. This allows one to define, on the
quotient pre-Hilbert space A/I_ω, an action of
the elements a ∈ A: π_ω(a)(b + I_ω) := ab + I_ω. It is
the extension of this action to the Hilbert space
completion H_ω of A/I_ω that gives the representation
associated to ω. When A has a unit, the cyclic vector
ξ_ω with the stated properties is precisely the image of
1_A + I_ω. By definition, the cyclicity of the representation
amounts to the density of π_ω(A)ξ_ω in H_ω.
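
The steps of the construction can be followed concretely in the simplest commutative case. The sketch below is only an illustration under strong assumptions: A is the algebra of complex functions on a finite set, the state is ω(a) = Σ_i w_i a_i for a probability vector w, the kernel of the scalar product consists of the functions vanishing on the support of w, and the quotient is realised by restricting functions to that support. All names are hypothetical.

```python
def gns_commutative(weights):
    """GNS data for omega(a) = sum_i weights[i] * a[i] on functions on a finite set."""
    support = [i for i, w in enumerate(weights) if w > 0]

    def to_class(a):
        """Class of a in the quotient: its restriction to the support of omega."""
        return [complex(a[i]) for i in support]

    def inner(u, v):
        """Scalar product (u|v) induced by omega(a* b) on the quotient."""
        return sum(weights[i] * u[k].conjugate() * v[k] for k, i in enumerate(support))

    def representation(a):
        """pi_omega(a): multiplication by a, acting on classes."""
        a_class = to_class(a)
        return lambda v: [x * y for x, y in zip(a_class, v)]

    cyclic_vector = to_class([1] * len(weights))   # image of the unit
    return to_class, inner, representation, cyclic_vector


weights = [0.5, 0.5, 0.0]
to_class, inner, representation, xi = gns_commutative(weights)

a = [1j, 2, 5]
omega_a = sum(w * x for w, x in zip(weights, a))
print(omega_a, inner(xi, representation(a)(xi)))   # the two values agree
```

The function b = (0, 0, 7) satisfies ω(b*b) = 0 and is sent to the zero class, in line with the description of the kernel.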


Completely Positive Maps 


In a sense, the order structure of a C*-algebra A
is better understood through the sequence of
C*-algebras A ⊗ M_n(C) ≅ M_n(A), obtained as tensor
products of A and full matrix algebras M_n(C). For
example, C*-algebras are matrix-ordered vector
spaces as α*(M_m(A))_+ α ⊂ (M_n(A))_+ for all matrices
α ∈ M_{m×n}(C).

In this respect, one is naturally led to consider
a stronger notion of positivity:


“A map φ: A → B is called n-positive if its
extension


φ ⊗ 1_n: A ⊗ M_n(C) → B ⊗ M_n(C)
(φ ⊗ 1_n)[a_{ij}]_{i,j} = [φ(a_{ij})]_{i,j}

is positive and completely positive (CP map for
short) if this happens for all n.”


Equivalently, n-positive means that Σ_{i,j=1}^n b_i* φ(a_i* a_j) b_j ≥ 0
for all a_1, ..., a_n ∈ A and b_1, ..., b_n ∈ B.
In particular, if φ is n-positive then it is k-positive for
all k ≤ n. Many positive maps we considered are in
fact CP maps:


1. morphisms of C*-algebras are CP maps; 

2. positive maps $\phi : A \to B$ are automatically CP
maps provided A, B or both are commutative;
states are, in particular, CP maps; and

3. an important class of CP maps is the following.
A norm one projection $\varepsilon : A \to B$, from a
C*-algebra A onto a C*-subalgebra B, is a
contraction such that $\varepsilon(b) = b$ for all $b \in B$. It
can be proved that these maps satisfy
$\varepsilon(bac) = b\varepsilon(a)c$ for all $a \in A$ and $b, c \in B$ and
for this reason they are called conditional
expectations. This property then implies that
they are CP maps.


However, the identity map from a C*-algebra A
into its opposite $A^\circ$ is positive but not 2-positive
unless A is commutative; the transposition $a \mapsto a^t$ in
$M_n(\mathbb{C})$ is positive and not 2-positive if $n \ge 2$; and, for
all n, there exist n-positive maps which are not
(n + 1)-positive.


CP Maps in Mathematical Physics 


In several fields of application, the transition of a
state of a system into another state can be described
by a completely positive map $\phi : A \to B$ between
C*-algebras: for any given state $\omega$ of B, $\omega \circ \phi$ is then
a state of A.


1. In the theory of quantum communication pro- 
cesses (see Channels in Quantum Information 
Theory; Optimal Cloning of Quantum States; 
Source Coding in Quantum Information Theory; 
Capacity for Quantum Information), for example,
B and A represent the input and output
systems, respectively, $\omega$ the signal to be transmitted,
$\omega \circ \phi$ the received signal, and $\phi$ the system
of transmission, called the channel.

2. In quantum probability and in the theory of 
quantum open systems, continuous semigroups 
of CP maps (see Quantum Dynamical Semi- 
groups) describe dissipative time evolutions of a 
system due to interaction with an external one 
(heat bath). 

3. In the theory of measurement in quantum
mechanics, an observable can be described by a
positive-operator-valued (POV) measure M which
assigns a positive element M(E) in a C*-algebra A
to each Borel subset E of a topological space X. For
each $f \in C_0(X)$, one can define its integral
$\phi(f) := \int_X f\,dM$ as an element of A. The map
$\phi : C_0(X) \to A$, called the observation channel, is
then a CP map.

4. Another field of mathematical physics in which CP
maps play a distinguished role is in the construction
and application of the quantum dynamical
entropy, an extension of the Kolmogorov-Sinai
entropy of measure preserving transformations
(see Quantum Entropy). When dealing with
a noncommutative dynamical system $(M, \alpha, \tau)$
in which $\tau$ is a normal trace state on a finite
von Neumann algebra M, the Connes-Størmer
entropy $h_\tau(\alpha)$ is defined through the consideration
of an entropy functional $H_\tau(N_1, \dots, N_k)$ of
finite-dimensional von Neumann subalgebras
$N_1, \dots, N_k \subset M$. To extend the definition to
more general C*-algebras and states on them, one
has to face the fact that C*-algebras may have no
nontrivial C*-subalgebras. To circumvent the
problem, A Connes, H Narnhofer, and W Thirring
(CNT) introduced an entropy functional
$H_\omega(\gamma_1, \dots, \gamma_k)$ associated to a set $\gamma_i : A_i \to A$ of
CP maps (finite channels) from finite-dimensional
C*-algebras $A_i$ into A. This led to the CNT entropy
$h_\omega(\alpha)$ of a noncommutative dynamical system
$(A, \alpha, \omega)$, where $\omega$ is a state on A and $\alpha$ is an
automorphism or a CP map preserving it:
$\omega \circ \alpha = \omega$.


CP Maps and Continuity 


Since for an element $a \in A$ of a unital C*-algebra,
one has $\|a\| \le 1$ precisely when

$$\begin{pmatrix} 1 & a \\ a^* & 1 \end{pmatrix}$$

is positive in $M_2(A)$, it follows that
2-positive unital maps are contractive.

Unital 2-positive maps satisfy, in particular, the
generalized Schwarz inequality for all $a \in A$,

$$\phi(a)^*\phi(a) \le \phi(a^*a)$$

In particular,

“CP maps are completely bounded as $\sup_n \|\phi \otimes 1_n\| =
\|\phi(1_A)\|$ and completely contractive if they are
unital. Conversely, unital, completely contractive
maps are CP maps.”
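
The generalized Schwarz inequality can be tested on a concrete unital CP map. The sketch below assumes a map on M_2(C) of the form a -> sum_i K_i^* a K_i with sum_i K_i^* K_i = 1; the particular operators (an equal mixture of the identity and conjugation by the Pauli matrix X), the helper names, and the random test element are illustrative assumptions only.

```python
import numpy as np

def cp_map_from_operators(ops):
    """Return the map a -> sum_i K_i^* a K_i (unital when sum_i K_i^* K_i = 1)."""
    def phi(a):
        return sum(k.conj().T @ a @ k for k in ops)
    return phi

def schwarz_gap_min_eig(phi, a):
    """Smallest eigenvalue of phi(a^* a) - phi(a)^* phi(a)."""
    gap = phi(a.conj().T @ a) - phi(a).conj().T @ phi(a)
    gap = (gap + gap.conj().T) / 2  # remove rounding asymmetry before eigvalsh
    return float(np.linalg.eigvalsh(gap).min())

# Hypothetical example map on M_2(C).
pauli_x = np.array([[0, 1], [1, 0]], dtype=complex)
ops = [np.eye(2, dtype=complex) / np.sqrt(2), pauli_x / np.sqrt(2)]
phi = cp_map_from_operators(ops)

rng = np.random.default_rng(0)
a = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
print(schwarz_gap_min_eig(phi, a))  # nonnegative up to rounding
```

For this particular map the gap equals (a - XaX)^*(a - XaX)/4, which makes the positivity visible by hand as well.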


CP Maps and Matrix Algebras 


When the domain or the target space of a map is a
matrix algebra, one has the following equivalences
concerning positivity. Let $[e_{i,j}]_{i,j}$ denote the standard
matrix units in $M_n(\mathbb{C})$ and let $\phi : M_n(\mathbb{C}) \to B$ be a
linear map into a C*-algebra B. The following
conditions are equivalent:


1. $\phi$ is a CP map,
2. $\phi$ is n-positive, and
3. $[\phi(e_{i,j})]_{i,j}$ is positive in $M_n(B)$.
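
Condition 3 gives a finite test that is easy to run when B is itself a matrix algebra. The following is a minimal sketch under that assumption; the names `block_matrix_of_units` and `is_completely_positive` are hypothetical, and the two sample maps (the transposition mentioned earlier in the article and a conjugation a -> k^* a k) are chosen only for illustration.

```python
import numpy as np

def block_matrix_of_units(phi, n):
    """Assemble the block matrix [phi(e_ij)]_{i,j} for a linear map phi on M_n(C)."""
    blocks = []
    for i in range(n):
        row = []
        for j in range(n):
            e = np.zeros((n, n), dtype=complex)
            e[i, j] = 1.0
            row.append(phi(e))
        blocks.append(row)
    return np.block(blocks)

def is_completely_positive(phi, n, tol=1e-10):
    """Test condition 3: [phi(e_ij)] must be Hermitian with no negative eigenvalue."""
    c = block_matrix_of_units(phi, n)
    if not np.allclose(c, c.conj().T):
        return False
    return bool(np.linalg.eigvalsh(c).min() >= -tol)

def transpose(a):
    return a.T

k = np.array([[1.0, 2.0], [0.0, 1.0j]])

def conjugation(a):
    return k.conj().T @ a @ k

print(is_completely_positive(transpose, 2))    # False
print(is_completely_positive(conjugation, 2))  # True
```

For the transposition on M_2(C) the block matrix is the flip operator on C^2 tensor C^2, which has eigenvalue -1, so the map fails to be 2-positive although it is positive.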


Associating to a linear map $\phi : A \to M_n(\mathbb{C})$ the
linear functional $s_\phi : M_n(A) \to \mathbb{C}$ by $s_\phi([a_{i,j}]) :=
\sum_{i,j} \phi(a_{i,j})_{i,j}$, one has the following equivalent
properties:


1. $\phi$ is a CP map,

2. $\phi$ is n-positive,

3. $s_\phi$ is positive, and

4. $s_\phi$ is positive on $A_+ \otimes M_n(\mathbb{C})_+$.


Stinespring Representation of CP Maps


CP maps are relatively easy to handle, thanks to the
following dilation result due to W F Stinespring. It
describes a CP map as the compression of a
morphism of C*-algebras.


Let A be a unital C*-algebra and $\phi : A \to B(H)$ a
linear map. Then $\phi$ is a CP map if and only if it
has the form

$$\phi(a) = V^*\pi(a)V$$

for some representation $\pi : A \to B(K)$ on a Hilbert
space K, and some bounded linear map
$V : H \to K$. If A is a von Neumann algebra and $\phi$
is normal then $\pi$ can be taken to be normal. When
A = B(H) and H is separable, one has, for some
$b_n \in B(H)$,

$$\phi(a) = \sum_{n=1}^{\infty} b_n^* a b_n$$
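
In finite dimensions the passage from the sum form to the dilation form can be made explicit. The sketch below assumes a map given by finitely many operators b_1, ..., b_r on H = C^d; it takes K = C^r tensor H, the representation pi(a) = 1_r tensor a, and V obtained by stacking the b_n. The function name and the random test data are illustrative assumptions, and this is only one of many possible dilations.

```python
import numpy as np

def stinespring_from_operators(ops):
    """Return (V, pi) with V: H -> C^r (x) H and pi(a) = 1_r (x) a.

    Assumption: ops is a list of r square matrices b_n of the same size d,
    and the CP map is a -> sum_n b_n^* a b_n.
    """
    r = len(ops)
    # V h = (b_1 h, ..., b_r h), written as one column of height r*d.
    v = np.vstack(ops)

    def pi(a):
        # Block-diagonal matrix with r copies of a: a *-homomorphism.
        return np.kron(np.eye(r), a)

    return v, pi

rng = np.random.default_rng(1)
ops = [rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)) for _ in range(2)]
v, pi = stinespring_from_operators(ops)

a = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
b = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
direct = sum(k.conj().T @ a @ k for k in ops)
dilated = v.conj().T @ pi(a) @ v
print(np.allclose(direct, dilated))
```

The compression V^* pi(a) V reproduces the sum term by term because pi(a) acts as a on each of the r blocks.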


The proof of this result is reminiscent of the 
GNS construction for states and its extension, by 
G Kasparov, to C*-modules is central in bivariant 
K-homology theory. 


Despite the above satisfactory result, one should 
be aware that positive but not CP maps are much 
less understood and only for maps on very low 
dimensional matrix algebras do we have a definitive 
classification. To have an idea of the intricacies of 
the matter, one may consult Størmer (1963). 


Positive Semigroups on Standard Forms 
of von Neumann Algebras and Ground State 
for Physical Hamiltonians 


The above result allows one to derive the structure 
of generators of norm-continuous dynamical semi- 
groups in terms of dissipative operators. 

Strongly continuous positive semigroups, which
are KMS symmetric with respect to a KMS state $\omega$
of a given automorphism group of a C*-algebra A,
can be analyzed as positive semigroups in the
standard representation (M, H, P, J) (see Tomita-
Takesaki Modular Theory) of the von Neumann
algebra $M := \pi_\omega(A)''$. A semigroup on A gives rise to
a corresponding w*-continuous positive semigroup
on M and to a strongly continuous positive
semigroup on the ordered Hilbert space (H, P) of
the standard form. In the latter framework, one can
develop an infinite-dimensional, noncommutative
extension of the classical Perron-Frobenius theory
for matrices with positive entries. This applies, in
particular, to semigroups generated by physical
Hamiltonians and has been used to prove existence
and uniqueness of the ground state for boson and
fermion systems in quantum field theory (one may
consult Gross (1972)).


Nuclear C*-Algebras and Injective 
von Neumann Algebras 


The nonabelian character of the product in 
C*-algebras may prevent the existence of nontrivial 
morphisms between them, while one may have an 
abundance of CP maps. For example, there are no 
nontrivial morphisms from the algebra of compact 
operators to C, but there exist sufficiently many 
states to separate its elements. A much more well- 
behaved category of C*-algebras is obtained by 
considering CP maps as morphisms. This is true, in 
particular, for nuclear C*-algebras: those for which
any tensor product $A \otimes B$ with any other C*-algebra
B admits a unique C*-cross norm (see C*-Algebras
and their Classification). The intimate relation 
between this class of algebras and CP maps is 
illustrated by the following characterization: 


1. A is nuclear; 

2. the identity map of A is a pointwise limit of CP 
maps of finite rank; 

3. the identity map of A can be approximately
factorized, $\lim_\alpha (T_\alpha \circ S_\alpha)a = a$ for all $a \in A$,
through matrix algebras and nets of CP maps
$S_\alpha : A \to M_{n_\alpha}(\mathbb{C})$, $T_\alpha : M_{n_\alpha}(\mathbb{C}) \to A$.


A second important relation between nuclear 
C*-algebras and CP maps emerges in connection to 
the lifting problem. 


“Let A be a nuclear C*-algebra and J a closed two-
sided ideal in a C*-algebra B. Then every CP map
$\phi : A \to B/J$ can be lifted to a CP map $\phi' : A \to B$.
In other words, $\phi$ factors through B by the
quotient map $q : B \to B/J$: $\phi = q \circ \phi'$.”

This and related results are used to prove that
the Brown-Douglas-Fillmore K-homology invariant
Ext(A) is a group for separable, nuclear C*-algebras.

Our last basic result, due to W Arveson, about CP 
maps concerns the extension problem. 


“Let A be a unital C*-algebra and N a self-adjoint
closed subspace of A containing the identity. Then
every CP map $\phi : N \to B(H)$ from N into a type I factor
B(H) can be extended to a CP map $\tilde{\phi} : A \to B(H)$.”


This result can be restated by saying that type I
factors are injective von Neumann algebras. It may
suggest how the notion of a completely positive map
plays a fundamental role in Connes’ proof of one
of the culminating results of the theory of von Neumann
algebras, namely the fact that the class of injective
von Neumann algebras coincides with the class
of approximately finite-dimensional ones (see von
Neumann Algebras: Introduction, Modular Theory
and Classification Theory).


See also: Capacity for Quantum Information;
C*-Algebras and Their Classification; Channels
in Quantum Information Theory; Noncommutative
Geometry and the Standard Model; Noncommutative
Geometry from Strings; Optimal Cloning of Quantum
States; Path Integrals in Noncommutative Geometry;
Quantum Dynamical Semigroups; Quantum Entropy;
Source Coding in Quantum Information Theory;
Tomita-Takesaki Modular Theory; von Neumann Algebras:
Introduction, Modular Theory, and Classification Theory.


Nilpotent Lie Groups


While not much had been published on the geometry 
of nilpotent Lie groups with a left-invariant 
Riemannian metric till around 1990, the situation is 
certainly better now; see the references in Eberlein 
(2004). However, there is still very little that is 
conspicuous about the more general pseudo- 
Riemannian case. In particular, the two-step 
nilpotent groups are nonabelian and as close as 
possible to being abelian, but display a rich variety of 
new and interesting geometric phenomena (Cordero 
and Parker 1999). As in the Riemannian case, one of 
many places where they arise naturally is as groups of 
isometries acting on horospheres in certain (pseudo- 
Riemannian) symmetric spaces. Another is in the 
Iwasawa decomposition G = KAN of semisimple
groups with the Killing metric tensor, which need
not be (positive or negative) definite even on N. Here, 
K is compact and A is abelian. 

An early motivation for this study was the 
observation that there are two nonisometric
pseudo-Riemannian metrics on the Heisenberg 
group H3, one of which is flat. This is a strong 
contrast to the Riemannian case in which there is 
only one (up to positive homothety) and it is not 
flat. This is not an anomaly, as we now well know. 

While the idea of more than one timelike 
dimension has appeared a few times in the physics 
literature, both in string/M-theory and in brane- 
world scenarios, essentially all work to date assumes 
only one. Thus, all applications so far are of 
Lorentzian or definite nilpotent groups. Guediri 
and co-workers led the Lorentzian studies, and 
most of their results stated near the end of the 
section “Lorentzian groups” concern a major, 
perennial interest in relativity: the (non)existence of 
closed timelike geodesics in compact Lorentzian 
manifolds. 


Others have made use of nilpotent Lie groups 
with left-invariant (positive or negative) definite 
metric tensors, such as Hervig’s (2004) constructions 
of black hole spacetimes from solvmanifolds (related 
to solvable groups: those with Iwasawa decomposi- 
tion G= AN), including the so-called BTZ construc- 
tions. Definite groups and their applications, already 
having received thorough surveys elsewhere, most 
notably those of Eberlein, are not included here. 

Although the geometric properties of Lie groups 
with left-invariant definite metric tensors have been 
studied extensively, the same has not occurred for 
indefinite metric tensors. For example, while the 
paper of Milnor (1976) has already become a classic 
reference, in particular for the classification of 
positive-definite (Riemannian) metrics on three- 
dimensional Lie groups, a classification of the 
left-invariant Lorentzian metric tensors on these 
groups became available only in 1997. Similarly, 
only a few partial results in the line of Milnor’s 
study of definite metrics were previously known for 
indefinite metrics. Moreover, in dimension 3, there 
are only two types of metric tensors: Riemannian 
(definite) and Lorentzian (indefinite). But in higher 
dimensions, there are many distinct types of indefi- 
nite metrics while there is still essentially only one 
type of definite metric. This is another reason why 
this area has special interest now. 

The list in “Further reading” at the end of this 
article consists of general survey articles and a 
select few of the more historically important papers. 
Precise bibliographical information for references 
merely mentioned or alluded to in this article 
may be found in those. The main, general reference 
on pseudo-Riemannian geometry is O’Neill’s (1983) 
book. Eberlein’s (2004) article covers the Rieman- 
nian case. At this time, there is no other compre- 
hensive survey of the pseudo-Riemannian case. One 
may use Cordero and Parker (1999) and Guediri 
(2003) and their reference lists to good advantage, 
however. 


Inner Product and Signature 


By an inner product on a vector space V we shall 
mean a nondegenerate, symmetric bilinear form on 
V, generally denoted by (,). In particular, we do not 
assume that it is positive definite. It has become 
customary to refer to an ordered pair of non- 
negative integers (p,q) as the signature of the inner 
product, where p denotes the number of positive
eigenvalues and q the number of negative eigenvalues.
Then nondegeneracy means that p + q =
dim V. Note that there is no real geometric
difference between (p, q) and (q, p); indeed, O’Neill
gives handy conversion procedures for this and for
the other major sign variant (e.g., curvature) (see
O’Neill (1983, pp. 92 and 89, respectively)).

A Riemannian inner product has signature (p, 0). 
In view of the preceding remark, one might as well 
regard signature (0,q) as also being Riemannian, so 
that “Riemannian geometry is that of definite metric 
tensors.” Similarly, a Lorentzian inner product has 
either p = 1 or q = 1. In this case, both sign
conventions are used in relativistic theories with 
the proviso that the “1” axis is always timelike. 

If neither p nor q is 1, there is no physical 
convention. We shall say that $v \in V$ is timelike if
(v, v) > 0, null if (v, v) = 0, and spacelike if (v, v) < 0.
(In a Lorentzian example, one may wish to revert to 
one’s preferred relativistic convention.) We shall refer 
to these collectively as the causal type of a vector (or of 
a curve to which a vector is tangent). 

Considering indefinite inner products (and metric 
tensors) thus greatly expands one’s purview, from 
one type of geometry (Riemannian), or possibly two 
(Riemannian and Lorentzian), to a total of
$\lfloor (p + q)/2 \rfloor + 1$ distinctly different types of geometries on
the same underlying differential manifolds. 
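
These conventions are easy to make concrete for a form given by a Gram matrix. The following is a small sketch, assuming the inner product is represented by a real symmetric matrix in some basis; the function names and the sample matrix are illustrative assumptions. It computes the signature (p, q), classifies vectors by the sign convention stated above, and counts the types of geometry available in a given dimension.

```python
import numpy as np

def signature(gram, tol=1e-10):
    """Signature (p, q) of a nondegenerate symmetric bilinear form."""
    eig = np.linalg.eigvalsh(np.asarray(gram, dtype=float))
    p = int((eig > tol).sum())
    q = int((eig < -tol).sum())
    if p + q != len(eig):
        raise ValueError("the form is degenerate")
    return p, q

def causal_type(gram, v, tol=1e-10):
    """Convention used here: timelike if (v,v) > 0, null if = 0, spacelike if < 0."""
    v = np.asarray(v, dtype=float)
    value = float(v @ np.asarray(gram, dtype=float) @ v)
    if value > tol:
        return "timelike"
    if value < -tol:
        return "spacelike"
    return "null"

def number_of_geometry_types(dim):
    """floor(dim / 2) + 1, identifying signature (p, q) with (q, p)."""
    return dim // 2 + 1

g = np.diag([1.0, -1.0, -1.0, -1.0])  # a Lorentzian example with p = 1
print(signature(g), causal_type(g, [1, 1, 0, 0]), number_of_geometry_types(4))
```

In dimension 4, for instance, the three types are the definite, Lorentzian, and neutral (2, 2) signatures.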


Rise of 2-Step Groups 


Throughout, N will denote a connected (and simply
connected, usually), nilpotent Lie group with Lie
algebra $\mathfrak{n}$ having center $\mathfrak{z}$. We shall use (,) to denote
either an inner product on $\mathfrak{n}$ or the induced left-
invariant pseudo-Riemannian (indefinite) metric
tensor on N.

For all nilpotent Lie groups, the exponential map
$\exp : \mathfrak{n} \to N$ is surjective. Indeed, it is a diffeomorphism
for simply connected N; in this case, we shall
denote the inverse by log.

One of the earliest papers on the Riemannian 
geometry of nilpotent Lie groups was Wolf (1964). 
Since then, a few other papers about general nilpotent 
Lie groups have appeared, including Karidi (1994) 
and Pauls (2001), but the area has not seen a lot of 
progress. 

However, everything changed with Kaplan’s 
(1981) publication. Following this paper and its 
successor (Kaplan 1983), almost all subsequent 
work on the left-invariant geometry of nilpotent 
groups has been on two-step groups. 

Briefly, Kaplan defined a new class of nilpotent
Lie groups, calling them of Heisenberg type. This
was soon abbreviated to H-type, and has since also
been called Heisenberg-like and (unfortunately)
“generalized Heisenberg.” (Unfortunate, because
that term was already in use for another class, not
all of which are of H-type.) What made them so
compelling was that (almost) everything was explicitly
calculable, thus making them the next great test
bed after symmetric spaces.


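Definition 1 We say that N (or $\mathfrak{n}$) is 2-step
nilpotent when $[\mathfrak{n}, \mathfrak{n}] \subseteq \mathfrak{z}$. Then $[[\mathfrak{n}, \mathfrak{n}], \mathfrak{n}] = 0$ and
the generalization to k-step nilpotent is clear:

$$[[\cdots[[\mathfrak{n}, \mathfrak{n}], \mathfrak{n}]\cdots], \mathfrak{n}] = 0$$

with k + 1 copies of $\mathfrak{n}$ (or k nested brackets, if you
prefer).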
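
The nested-bracket condition can be checked mechanically for a matrix Lie algebra. The sketch below assumes the algebra is given by a list of basis matrices with the commutator as bracket; the function names are hypothetical and the approach is deliberately naive (the spanning lists grow quickly), so it is meant only for small examples such as the three-dimensional Heisenberg algebra.

```python
import numpy as np

def bracket(x, y):
    """Commutator bracket of two matrices."""
    return x @ y - y @ x

def span_rank(mats, tol=1e-10):
    """Dimension of the linear span of a list of matrices."""
    flat = np.array([m.ravel() for m in mats])
    return int(np.linalg.matrix_rank(flat, tol=tol))

def nilpotency_step(basis, max_step=None):
    """Smallest k such that all k-fold nested brackets vanish, else None."""
    if max_step is None:
        max_step = len(basis) + 1
    current = list(basis)  # spans the whole algebra
    for k in range(1, max_step + 1):
        # Brackets of a spanning set with the basis span the next term
        # of the lower central series.
        current = [bracket(x, y) for x in current for y in basis]
        if span_rank(current) == 0:
            return k
    return None

def unit(n, i, j):
    e = np.zeros((n, n))
    e[i, j] = 1.0
    return e

heisenberg = [unit(3, 0, 1), unit(3, 1, 2), unit(3, 0, 2)]
upper4 = [unit(4, i, j) for i in range(4) for j in range(i + 1, 4)]
print(nilpotency_step(heisenberg), nilpotency_step(upper4))
```

With this counting an abelian algebra has step 1, the Heisenberg algebra has step 2, and the strictly upper triangular 4 x 4 matrices have step 3.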


It soon became apparent that H-type groups 
comprised a subclass of 2-step groups; for a nice, 
modern proof see Berndt et al. (1995). By around 
1990, they had also attracted the attention of the 
spectral geometry community, and Eberlein pro- 
duced the seminal survey (with important new 
results) from which the modern era began. (It was 
published in 1994 (Eberlein 1994), but the preprint 
had circulated widely since 1990.) Since then, 
activity around 2-step nilpotent Lie groups has 
mushroomed; see the references in Eberlein (2004). 

Finally, turning to pseudo-Riemannian nilpo- 
tent Lie groups, with perhaps one or two 
exceptions, all results so far have been obtained 
only for 2-step groups. Thus, the remaining 
sections of this article will be devoted almost 
exclusively to them. 

The Baker-Campbell-Hausdorff formula takes on
a particularly simple form in these groups:

$$\exp(x)\exp(y) = \exp(x + y + \tfrac{1}{2}[x, y]) \qquad [1]$$
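
Formula [1] can be verified directly in the standard matrix model of the three-dimensional Heisenberg algebra. The sketch below assumes the realization by strictly upper triangular 3 x 3 matrices, for which the exponential series stops after the quadratic term; the helper names and the sample coefficients are illustrative assumptions.

```python
import numpy as np

def heis(a, b, c):
    """The element aX + bY + cZ of the Heisenberg algebra as a 3x3 matrix."""
    return np.array([[0.0, a, c],
                     [0.0, 0.0, b],
                     [0.0, 0.0, 0.0]])

def exp_nilpotent(m):
    """Matrix exponential when m^3 = 0: the series ends at the m^2 term."""
    return np.eye(3) + m + (m @ m) / 2

x = heis(1.0, 2.0, 0.5)
y = heis(-3.0, 0.7, 1.1)
commutator = x @ y - y @ x

lhs = exp_nilpotent(x) @ exp_nilpotent(y)
rhs = exp_nilpotent(x + y + 0.5 * commutator)
print(np.allclose(lhs, rhs))
```

The check works because the commutator is central here, so all higher terms of the general Baker-Campbell-Hausdorff series vanish.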


Proposition 1 In a pseudo-Riemannian 2-step 
nilpotent Lie group, the exponential map preserves 
causal character. Alternatively, one-parameter sub- 
groups are curves of constant causal character. 


Of course, one-parameter subgroups need not be 
geodesics. 


Lattices and Completeness 


We shall need some basic facts about lattices in N. 
In nilpotent Lie groups, a lattice is a discrete
subgroup $\Gamma$ such that the homogeneous space
$M = \Gamma\backslash N$ is compact. Here we follow the convention
that a lattice acts on the left, so that the coset
space consists of left cosets and this is indicated by 
the notation. Other subgroups will generally act on 
the right, allowing better separation of the effects of 
two simultaneous actions. 

Lattices do not always exist in nilpotent Lie 
groups. 


Theorem 1 The simply connected, nilpotent Lie
group N admits a lattice if and only if there exists a
basis of its Lie algebra $\mathfrak{n}$ for which the structure
constants are rational.


Such a group is said to have a rational structure, or 
simply to be rational. 

A nilmanifold is a (compact) homogeneous space
of the form $\Gamma\backslash N$, where N is a connected, simply
connected (rational) nilpotent Lie group and $\Gamma$ is a
lattice in N. An infranilmanifold has a nilmanifold
as a finite covering space. They are commonly 
regarded as a noncommutative generalization of 
tori, the Klein bottle being the simplest example of 
an infranilmanifold that is not a nilmanifold. 

We recall the result of Marsden from O’Neill 
(1983). 


Theorem 2 A compact, homogeneous pseudo- 
Riemannian space is geodesically complete. 


Thus, if a rational N is provided with a bi-invariant 
metric tensor (,), then M becomes a compact, 
homogeneous pseudo-Riemannian space which is 
therefore complete. It follows that (N,(,)) is itself 
complete. In general, however, the metric tensor is 
not bi-invariant and N need not be complete. 

For 2-step nilpotent Lie groups, things work nicely 
as shown by this result first published by Guediri. 


Theorem 3 On a 2-step nilpotent Lie group, all 
left-invariant pseudo-Riemannian metrics are geode- 
sically complete. 


No such general result holds for 3- and higher-step 
groups, however. 


2-Step Groups 


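In the Riemannian (positive-definite) case, one splits
$\mathfrak{n} = \mathfrak{z} \oplus \mathfrak{v} = \mathfrak{z} \oplus \mathfrak{z}^\perp$, where the superscript denotes the
orthogonal complement with respect to the inner
product (,). In the general pseudo-Riemannian case,
however, $\mathfrak{z} \oplus \mathfrak{z}^\perp \ne \mathfrak{n}$. The problem is that $\mathfrak{z}$ might be
a degenerate subspace; that is, it might contain a
null subspace $\mathfrak{U}$ for which $\mathfrak{U} \subseteq \mathfrak{U}^\perp$.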

It turns out that this possible degeneracy of the 
center causes the essential differences between 
the Riemannian and pseudo-Riemannian cases. So 
far, the only general success in studying groups with 
degenerate centers was in Cordero and Parker (1999) 
where an adapted Witt decomposition of n was used 
together with an involution $\iota$ exchanging the two null
parts. 

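Observe that if $\mathfrak{z}$ is degenerate, the null subspace
$\mathfrak{U}$ is well defined invariantly. We shall use a
decomposition

$$\mathfrak{n} = \mathfrak{z} \oplus \mathfrak{v} = \mathfrak{U} \oplus \mathfrak{Z} \oplus \mathfrak{V} \oplus \mathfrak{E} \qquad [2]$$

in which $\mathfrak{z} = \mathfrak{U} \oplus \mathfrak{Z}$ and $\mathfrak{v} = \mathfrak{V} \oplus \mathfrak{E}$, $\mathfrak{U}$ and $\mathfrak{V}$ are
complementary null subspaces, and $\mathfrak{U}^\perp \cap \mathfrak{V}^\perp = \mathfrak{Z} \oplus \mathfrak{E}$.
Although the choice of $\mathfrak{V}$ is not well defined
invariantly, once a $\mathfrak{V}$ has been chosen then $\mathfrak{Z}$ and $\mathfrak{E}$
are well defined invariantly. Indeed, $\mathfrak{Z}$ is the portion of
the center $\mathfrak{z}$ in $\mathfrak{U}^\perp \cap \mathfrak{V}^\perp$, and $\mathfrak{E}$ is its orthocomplement
in $\mathfrak{U}^\perp \cap \mathfrak{V}^\perp$. This is a Witt decomposition of $\mathfrak{n}$ given $\mathfrak{U}$,
easily seen by noting that $(\mathfrak{U} \oplus \mathfrak{V})^\perp = \mathfrak{Z} \oplus \mathfrak{E}$, adapted
to the special role of the center in $\mathfrak{n}$.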

We shall also need to use an involution $\iota$ that
interchanges $\mathfrak{U}$ and $\mathfrak{V}$ and which reduces to the
identity on $\mathfrak{Z} \oplus \mathfrak{E}$ in the Riemannian (positive-definite)
case. (The particular choice of such an involution is
not significant.) It turns out that $\iota$ is an isometry of $\mathfrak{n}$
which does not integrate to an isometry of N. The
adjoint with respect to (,) of the adjoint representation
of the Lie algebra $\mathfrak{n}$ on itself is denoted by $\mathrm{ad}^\dagger$.








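Definition 2 The linear mapping

$$j : \mathfrak{U} \oplus \mathfrak{Z} \to \mathrm{End}(\mathfrak{V} \oplus \mathfrak{E})$$

is given by

$$j(a)x = \iota\,\mathrm{ad}^\dagger_x\,\iota a$$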

Formulas for the connection and curvatures, and 
explicit forms for many examples, may be found in 
Cordero and Parker (1999). It turns out there is a 
relatively large class of flat spaces, a clear distinction 
from the Riemannian case in which there are none. 

Let $x, y \in \mathfrak{n}$. Recall that homaloidal planes are
those for which the numerator (R(x,y)y,x) of the 
sectional curvature formula vanishes. This notion is 
useful for degenerate planes tangent to spaces that 
are not of constant curvature. 


Definition 3 A submanifold of a pseudo-Riemannian 
manifold is flat if and only if every plane tangent to 
the submanifold is homaloidal. 


Theorem 4 The center Z of N is flat. 


Corollary 1 The only N of constant curvature 
are flat. 


The degenerate part of the center can have a
profound effect on the geometry of the whole
group.


Theorem 5 If $[\mathfrak{n}, \mathfrak{n}] \subseteq \mathfrak{U}$ and $\mathfrak{E} = \{0\}$, then N is flat.


Among these spaces, those that also have $\mathfrak{Z} = \{0\}$
(which condition itself implies $[\mathfrak{n}, \mathfrak{n}] \subseteq \mathfrak{U}$) are fundamental,
with the more general ones obtained by
making nondegenerate central extensions. It is also
easy to see that the product of any flat group with a
nondegenerate abelian factor is still flat.

This is the best possible result in general. Using
weaker hypotheses in place of $\mathfrak{E} = \{0\}$, such as
$[\mathfrak{V}, \mathfrak{V}] = \{0\} = [\mathfrak{E}, \mathfrak{E}]$, it is easy to construct examples
which are not flat.


Corollary 2 If $\dim Z \ge \lceil n/2 \rceil$, then there exists a
flat metric on N.


Here $\lceil r \rceil$ denotes the least integer greater than or
equal to r and $n = \dim N$.

Before continuing, we pause to collect some facts
about the condition $[\mathfrak{n}, \mathfrak{n}] \subseteq \mathfrak{U}$ and its consequences.


Remark 1 Since it implies $j(z) = 0$ for all $z \in \mathfrak{Z}$, this
latter is possible with no pseudo-Euclidean de Rham
factor, unlike the Riemannian case. (On the other
hand, a pseudo-Euclidean de Rham factor is
characterized in terms of the Kaplan-Eberlein map
j whenever the center is nondegenerate.)

Also, it implies $j(u)$ interchanges $\mathfrak{V}$ and $\mathfrak{E}$ for all
$u \in \mathfrak{U}$ if and only if $[\mathfrak{V}, \mathfrak{V}] = [\mathfrak{E}, \mathfrak{E}] = \{0\}$. Examples
are the Heisenberg group and the groups H(p, 1) for
$p \ge 2$ with null centers.

Finally, we note that it implies that, for every $u \in \mathfrak{U}$,
$j(u)$ maps $\mathfrak{V}$ to $\mathfrak{V}$ if and only if $j(u)$ maps $\mathfrak{E}$ to $\mathfrak{E}$ if and
only if $[\mathfrak{V}, \mathfrak{E}] = \{0\}$.


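Proposition 2 If $j(z) = 0$ for all $z \in \mathfrak{Z}$ and $j(u)$
interchanges $\mathfrak{V}$ and $\mathfrak{E}$ for all $u \in \mathfrak{U}$, then N is Ricci
flat.


Proposition 3 If $j(z) = 0$ for all $z \in \mathfrak{Z}$, then N is
scalar flat. In particular, this occurs when $[\mathfrak{n}, \mathfrak{n}] \subseteq \mathfrak{U}$.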


Much like the Riemannian case, we would expect 
that (N,(,)) should in some sense be similar to flat 
pseudo-Euclidean space. This is seen, for example, 
via the existence of totally geodesic subgroups 
(Cordero and Parker 1999). (O’Neill (1983, ex. 9, 
p. 125) has extended the definition of totally 
geodesic to degenerate submanifolds of pseudo- 
Riemannian manifolds.) 


Example 1 For any $x \in \mathfrak{n}$ the one-parameter subgroup
exp(tx) is a geodesic if and only if $x \in \mathfrak{z}$ or
$x \in \mathfrak{U} \oplus \mathfrak{E}$. This is essentially the same as the
Riemannian case, but with some additional geodesic
one-parameter subgroups coming from $\mathfrak{U}$.


Example 2 Abelian subspaces of $\mathfrak{V} \oplus \mathfrak{E}$ are Lie
subalgebras of $\mathfrak{n}$, and give rise to complete, flat,
totally geodesic abelian subgroups of N, just as in
the Riemannian case. Eberlein’s construction is valid
in general, and shows that if $\dim \mathfrak{V} \oplus \mathfrak{E} \ge 1 + k +
k \dim \mathfrak{z}$, then every nonzero element of $\mathfrak{V} \oplus \mathfrak{E}$ lies in
an abelian subspace of dimension k + 1.


Example 3 The center Z of N is a complete, flat,
totally geodesic submanifold. Moreover, it determines
a foliation of N by its left translates, so each
leaf is flat and totally geodesic, as in the Riemannian
case. In the pseudo-Riemannian case, this foliation
in turn is the orthogonal direct sum of two foliations
determined by $\mathfrak{U}$ and $\mathfrak{Z}$, and the leaves of the
$\mathfrak{U}$-foliation are also null. All these leaves are
complete.


There is also the existence of $\dim \mathfrak{z}$ independent
first integrals, a familiar result in pseudo-Euclidean 
space, and the geodesic equations are completely 
integrable; in certain cases (mostly when the center 
is nondegenerate), one can obtain explicit formulas. 
Unlike the Riemannian case, there are flat groups 
(nonabelian) which are isometric to pseudo- 
Euclidean spaces (abelian). 


Theorem 6 If $[\mathfrak{n}, \mathfrak{n}] \subseteq \mathfrak{U}$ and $\mathfrak{E} = \{0\}$, then N is
geodesically connected. Consequently, so is any 
nilmanifold with such a universal covering space. 


Thus, these compact nilmanifolds are much like tori. 
This is also illustrated by the computation of their 
period spectrum. 


Isometry Group


The main new feature is that when the center is 
degenerate, the isometry group can be strictly larger 
in a significant way than when the center is 
nondegenerate (which includes the Riemannian case). 

Letting Aut(N) denote the automorphism group
of N and I(N) the isometry group of N, set
$O(N) = \mathrm{Aut}(N) \cap I(N)$. In the Riemannian case,
$I(N) = O(N) \ltimes N$, the semidirect product where N
acts as left translations. We have chosen the
notation O(N) to suggest an analogy with the 
pseudo-Euclidean case in which this subgroup is 
precisely the (general, including reflections) pseudo- 
orthogonal group. According to Wilson (1982), this 
analogy is good for any nilmanifold (not necessarily 
2-step). 

To see what is true about the isometry group in
general, first consider the (left-invariant) splitting of
the tangent bundle $TN = \mathfrak{z}N \oplus \mathfrak{v}N$.


Definition 4 Denote by $I^{spl}(N)$ the subgroup of the
isometry group I(N) which preserves the splitting
$TN = \mathfrak{z}N \oplus \mathfrak{v}N$. Further, let $I^{aut}(N) = O(N) \ltimes N$,
where N acts by left translations.


Proposition 4 If N is a simply connected, 2-step
nilpotent Lie group with left-invariant metric tensor,
then $I^{spl}(N) \le I^{aut}(N)$.


There are examples to show that $I^{spl} < I^{aut}$ is
possible when $\mathfrak{U} \ne \{0\}$.

When the center is degenerate, the relevant group 
analogous to a pseudo-orthogonal group may be 
larger. 


Proposition 5 Let $\tilde{O}(N)$ denote the subgroup of
I(N) which fixes $1 \in N$. Then $I(N) = \tilde{O}(N) \ltimes N$,
where N acts by left translations.


The proof is obvious from the definition of $\tilde{O}$.
It is also obvious that $O \le \tilde{O}$. Examples show that
$O < \tilde{O}$, hence $I^{aut} < I$, is possible when the center is
degenerate.

Thus, we have three groups of isometries, not
necessarily equal in general: $I^{spl} \le I^{aut} \le I$. When the
center is nondegenerate ($\mathfrak{U} = \{0\}$), the Ricci transformation
is block-diagonalizable and the rest of
Kaplan’s proof using it now also works.


Corollary 3 If the center is nondegenerate, then
$I(N) = I^{spl}(N)$ whence $\tilde{O}(N) = O(N)$.


In the next few results, we use the phrase “a
subgroup isometric to” a group to mean that the
isometry is also an isomorphism of groups.


Proposition 6 For any N containing a subgroup
isometric to the flat three-dimensional Heisenberg
group,

$$I^{spl}(N) < I^{aut}(N) < I(N)$$


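Unfortunately, this class does not include our flat
groups in which $[\mathfrak{n}, \mathfrak{n}] \subseteq \mathfrak{U}$ and $\mathfrak{E} = \{0\}$. However,
it does include many groups that do not satisfy
$[\mathfrak{n}, \mathfrak{n}] \subseteq \mathfrak{U}$, such as the simplest quaternionic
Heisenberg group.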


Remark 2 A direct computation shows that on this 
flat H3 with null center, the only Killing fields with 
geodesic integral curves are the nonzero scalar 
multiples of a vector field tangent to the center. 


Proposition 7 For any N containing a subgroup
isometric to the flat $H_3 \times \mathbb{R}$ with null center,

$$I^{spl}(N) < I^{aut}(N) < I(N)$$


Many of our flat groups in which $[\mathfrak{n}, \mathfrak{n}] \subseteq \mathfrak{U}$ and
$\mathfrak{E} = \{0\}$ have such a subgroup isometrically
embedded, as in fact do many others which are not
flat.


Lattices and Periodic Geodesics 


In this subsection, we assume that N is rational and
let $\Gamma$ be a lattice in N.

Certain tori $T_F$ and $T_B$ provide the model fiber
and the base for a submersion of the coset space $\Gamma\backslash N$.
This submersion may not be pseudo-Riemannian in
the usual sense, because the tori may be degenerate.
We began the study of periodic geodesics in these
compact nilmanifolds, and obtained a complete
calculation of the period spectrum for certain flat
spaces.

To the compact nilmanifold $\Gamma\backslash N$ we may
associate two flat (possibly degenerate) tori.


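Definition 5 Let N be a simply connected, two-step
nilpotent Lie group with lattice $\Gamma$ and let $\pi : \mathfrak{n} \to \mathfrak{v}$
denote the projection. Define

$$T_{\mathfrak{z}} = \mathfrak{z}/(\log\Gamma \cap \mathfrak{z})$$
$$T_{\mathfrak{v}} = \mathfrak{v}/\pi(\log\Gamma)$$

Observe that $\dim T_{\mathfrak{z}} + \dim T_{\mathfrak{v}} = \dim \mathfrak{z} + \dim \mathfrak{v} =
\dim \mathfrak{n}$.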


Let $m = \dim \mathfrak{z}$ and $n = \dim \mathfrak{v}$. It is a consequence
of a theorem of Palais and Stewart that $\Gamma\backslash N$ is a
principal $T^m$-bundle over $T^n$. The model fiber $T^m$
can be given a geometric structure from its closed
embedding in $\Gamma\backslash N$; we denote this geometric
m-torus by $T_F$. Similarly, we wish to provide the
base n-torus with a geometric structure so that the
projection $p_B : \Gamma\backslash N \to T_B$ is the appropriate generalization
of a pseudo-Riemannian submersion
(O’Neill 1983) to (possibly) degenerate spaces.
Observe that the splitting $\mathfrak{n} = \mathfrak{z} \oplus \mathfrak{v}$ induces splittings
$TN = \mathfrak{z}N \oplus \mathfrak{v}N$ and $T(\Gamma\backslash N) = \mathfrak{z}(\Gamma\backslash N) \oplus \mathfrak{v}(\Gamma\backslash N)$,
and that $p_{B*}$ just mods out $\mathfrak{z}(\Gamma\backslash N)$. Examining
O’Neill’s definition, we see that the key is to
construct the geometry of $T_B$ by defining

$$p_{B*} : \mathfrak{v}_\eta(\Gamma\backslash N) \to T_{p_B(\eta)}(T_B) \ \text{for each } \eta \in \Gamma\backslash N \text{ is an isometry} \qquad [3]$$

and

$$\nabla^{T_B}_{p_{B*}x}\, p_{B*}y = p_{B*}(\pi\nabla_x y) \ \text{for all } x, y \in \mathfrak{v} = \mathfrak{V} \oplus \mathfrak{E} \qquad [4]$$

where $\pi : \mathfrak{n} \to \mathfrak{v}$ is the projection. Then the rest of the
usual results will continue to hold, provided that
sectional curvature is replaced by the numerator of
the sectional curvature formula at least when
elements of $\mathfrak{V}$ are involved:

$$(R_{T_B}(p_{B*}x, p_{B*}y)p_{B*}y, p_{B*}x) = (R_{\Gamma\backslash N}(x, y)y, x) + \tfrac{3}{4}([x, y], [x, y]) \qquad [5]$$

Now $p_B$ will be a pseudo-Riemannian submersion in
the usual sense if and only if $\mathfrak{U} = \mathfrak{V} = \{0\}$, as is
always the case for Riemannian spaces.

In the Riemannian case, Eberlein showed that $T_F \cong T_{\mathfrak z}$ and $T_B \cong T_{\mathfrak v}$. In general, $T_B$ is flat only if N has a nondegenerate center or is flat.


Remark 3 Observe that the torus $T_B$ may be decomposed into a topological product $T_E \times T_V$ in the obvious way. It is easy to check that $T_E$ is flat and isometric to $(\log\Gamma \cap \mathfrak{E})\backslash\mathfrak{E}$, and that $T_V$ has a linear connection not coming from a metric and not flat in general. Moreover, the geometry of the product is "twisted" in a certain way. It would be interesting to determine which tori could appear as such a $T_V$ and how.


Theorem 7 Let N be a simply connected, 2-step nilpotent Lie group with lattice $\Gamma$, a left-invariant metric tensor, and tori as above. The fibers $T_F$ of the (generalized) pseudo-Riemannian submersion $\Gamma\backslash N \to T_B$ are isometric to $T_{\mathfrak z}$. If in addition the center Z of N is nondegenerate, then the base $T_B$ is isometric to $T_{\mathfrak v}$.


We recall that elements of N can be identified with elements of the isometry group I(N): namely, $n \in N$ is identified with the isometry $\phi = L_n$ of left translation by n. We shall abbreviate this by writing $\phi \in N$.


Definition 6 We say that $\phi \in N$ translates the geodesic $\gamma$ by $\omega$ if and only if $\phi\gamma(t) = \gamma(t+\omega)$ for all t. If $\gamma$ is a unit-speed geodesic, we say that $\omega$ is a period of $\phi$.


Recall that unit speed means that $|\dot\gamma| = |\langle\dot\gamma,\dot\gamma\rangle|^{1/2} = 1$. Since there is no natural normalization for null geodesics, we do not define periods for them. In the Riemannian case and in the timelike Lorentzian case in strongly causal spacetimes, unit-speed geodesics are parametrized by arclength and this period is a translation distance. If $\phi$ belongs to a lattice $\Gamma$, it is the length of a closed geodesic in $\Gamma\backslash N$.

In general, recall that if $\gamma$ is a geodesic in N and if $p_N\colon N \to \Gamma\backslash N$ denotes the natural projection, then $p_N\gamma$ is a periodic geodesic in $\Gamma\backslash N$ if and only if some $\phi \in \Gamma$ translates $\gamma$. We say periodic rather than closed here because in pseudo-Riemannian spaces it is possible for a null geodesic to be closed but not periodic. If the space is geodesically complete or Riemannian, however, then this does not occur; the former is in fact the case for our 2-step nilpotent Lie groups. Further, recall that free homotopy classes of closed curves in $\Gamma\backslash N$ correspond bijectively with conjugacy classes in $\Gamma$.


Definition 7 Let C denote either a nontrivial, free homotopy class of closed curves in $\Gamma\backslash N$ or the corresponding conjugacy class in $\Gamma$. We define $\wp(C)$ to be the set of all periods of periodic unit-speed geodesics that belong to C.

In the Riemannian case, this is the set of lengths of closed geodesics in C, frequently denoted by $\ell(C)$.

Definition 8 The period spectrum of $\Gamma\backslash N$ is the set

\[ \operatorname{spec}_\wp(\Gamma\backslash N) = \bigcup_C \wp(C) \]

where the union is taken over all nontrivial, free homotopy classes of closed curves in $\Gamma\backslash N$.

In the Riemannian case, this is the length spectrum $\operatorname{spec}_\ell(\Gamma\backslash N)$.


Example 4 Similar to the Riemannian case, we can compute the period spectrum of a flat torus $\Gamma\backslash\mathbb{R}^m$, where $\Gamma$ is a lattice (of maximal rank, isomorphic to $\mathbb{Z}^m$). Using calculations analogous to those for finding the length spectrum of a Riemannian flat torus, we easily obtain

\[ \operatorname{spec}_\wp(\Gamma\backslash\mathbb{R}^m) = \{\, |g| \neq 0 \;;\; g \in \Gamma \,\} \]

It is also easy to see that the nonzero d'Alembertian spectrum is related to the analogous set produced from the dual lattice $\Gamma^*$, multiplied by factors of $\pm 4\pi^2$, almost as in the Riemannian case.
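The period formula of Example 4 can be explored numerically. The following is only a minimal sketch: the function name `period_spectrum`, the diagonal metric, the integer basis, and the coefficient window `bound` are all illustrative assumptions, and only finitely many lattice vectors are enumerated, so the output is a truncated part of the spectrum.

```python
from itertools import product
from math import sqrt

def period_spectrum(basis, signs, bound):
    """Periods |g| = |<g, g>|^(1/2) != 0 of lattice vectors g.

    basis -- integer basis vectors of the lattice (hypothetical example data)
    signs -- diagonal entries (+1 or -1) of the flat metric tensor
    bound -- only integer coefficients in [-bound, bound] are enumerated
    """
    squared = set()
    for coeffs in product(range(-bound, bound + 1), repeat=len(basis)):
        # g = sum of coefficient * basis vector, computed componentwise
        g = [sum(c * v[i] for c, v in zip(coeffs, basis)) for i in range(len(signs))]
        norm2 = sum(s * x * x for s, x in zip(signs, g))
        if norm2 != 0:  # null vectors (and g = 0) contribute no period
            squared.add(abs(norm2))
    return sorted(sqrt(n) for n in squared)

# Integer lattice in the Lorentzian plane with metric diag(-1, +1).
spectrum = period_spectrum([(1, 0), (0, 1)], (-1, 1), bound=2)
```

Working with the integer values of $\langle g, g\rangle$ before taking square roots avoids spurious duplicates caused by floating-point rounding.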


As in this example, simple determinacy of periods of unit-speed geodesics helps make calculation of the period spectrum possible purely in terms of $\log\Gamma \subseteq \mathfrak{n}$.

For the rest of this subsection, we assume that N 
is a simply connected, two-step nilpotent Lie group 
with left-invariant pseudo-Riemannian metric tensor 
(,). Note that non-null geodesics may be taken to be 
of unit speed. Most non-identity elements of N 
translate some geodesic, but not necessarily one of 
unit speed. 

For our special class of flat 2-step nilmanifolds, 
we can calculate the period spectrum completely. 


Theorem 8 If $[\mathfrak{n},\mathfrak{n}] \subseteq \mathfrak{U}$ and $\mathfrak{E} = \{0\}$, then $\operatorname{spec}_\wp(M)$ can be completely calculated from $\log\Gamma$ for any $M = \Gamma\backslash N$.

Thus, we see again just how much these flat, two-step nilmanifolds are like tori. All periods can be calculated purely from $\log\Gamma \subseteq \mathfrak{n}$, although some will not show up from the tori in the fibration.


Corollary 4 $\operatorname{spec}_\wp(T_B)$ (respectively, $T_F$) is $\bigcup_C \wp(C)$ where the union is taken over all those free homotopy classes C of closed curves in $M = \Gamma\backslash N$ that do not (respectively, do) contain an element in the center of $\Gamma \cong \pi_1(M)$, except for those periods arising only from unit-speed geodesics in M that project to null geodesics in both $T_B$ and $T_F$.


We note that one might consider using this to assign periods to some null geodesics in the tori $T_B$ and $T_F$.

When the center is nondegenerate, we obtain results similar to Eberlein's. Here is part of them.


Theorem 9 Assume $\mathfrak{U} = \{0\}$. Let $\phi \in N$ and write $\log\phi = z^* + e^*$. Assume $\phi$ translates the unit-speed geodesic $\gamma$ by $\omega > 0$. Let $z'$ denote the component of $z^*$ orthogonal to $[e^*, \mathfrak{n}]$ and set $\omega^* = |z' + e^*|$. Let $\dot\gamma(0) = z_0 + e_0$. Then

(i) $|e^*| \le \omega$. In addition, $\omega < \omega^*$ for timelike (spacelike) geodesics with $\omega z_0 - z'$ timelike (spacelike), and $\omega > \omega^*$ for timelike (spacelike) geodesics with $\omega z_0 - z'$ spacelike (timelike);

(ii) $\omega = |e^*|$ if and only if $\gamma(t) = \exp(te^*/|e^*|)$ for all $t \in \mathbb{R}$; and

(iii) $\omega = \omega^*$ if and only if $\omega z_0 - z'$ is null.

Although $\omega^*$ need not be an upper bound for periods as in the Riemannian case, it nonetheless plays a special role among all periods, as seen in (iii) above, and we shall refer to it as the distinguished period associated with $\phi \in N$. When the center is definite, for example, we do have $\omega \le \omega^*$.

Now the following definitions make sense at least 
for N with a nondegenerate center. 


Definition 9 Let C denote either a nontrivial, free homotopy class of closed curves in $\Gamma\backslash N$ or the corresponding conjugacy class in $\Gamma$. We define $\wp^*(C)$ to be the distinguished periods of periodic unit-speed geodesics that belong to C.

Definition 10 The distinguished period spectrum of $\Gamma\backslash N$ is the set

\[ \mathcal{D}\operatorname{spec}_\wp(\Gamma\backslash N) = \bigcup_C \wp^*(C) \]

where the union is taken over all nontrivial, free homotopy classes of closed curves in $\Gamma\backslash N$.


Then we get this result: 


Corollary 5 Assume the center is nondegenerate. If $\mathfrak{n}$ is nonsingular, then $\operatorname{spec}_\wp(T_B)$ (respectively, $T_F$) is precisely the period spectrum (respectively, the distinguished period spectrum) of those free homotopy classes C of closed curves in $M = \Gamma\backslash N$ that do not (respectively, do) contain an element in the center of $\Gamma \cong \pi_1(M)$, except for those periods arising only from unit-speed geodesics in M that project to null geodesics in both $T_B$ and $T_F$.


Conjugate Loci 
This is the only general result on conjugate points. 


Proposition 8 Let N be a simply connected, 2-step nilpotent Lie group with left-invariant metric tensor $\langle\,,\rangle$, and let $\gamma$ be a geodesic with $\dot\gamma(0) = a \in \mathfrak{z}$. If $\operatorname{ad}^\dagger_a = 0$, then there are no conjugate points along $\gamma$.


In the rest of this subsection, we assume that the 
center of N is nondegenerate. 

For convenience, we shall use the notation $J_z = \operatorname{ad}^\dagger_z$ for any $z \in \mathfrak{z}$. (Since the center is nondegenerate, the involution $\iota$ may be omitted.) We follow Ciatti (2000) for this next definition. As in the Riemannian case, one might as well make 2-step nilpotency part of the definition since it effectively is so anyway.

Definition 11 N is said to be of pseudoH-type if and only if

\[ J_z^2 = -\langle z, z\rangle I \]

for any $z \in \mathfrak{z}$.


Complete results on conjugate loci have been 
obtained only for these groups (Jang et al. 2005). 
For example, using standard results from analytic 
function theory, one can show that the conjugate 
locus is an analytic variety in N. This is probably 
true for general two-step groups, but the proof we 
know works only for pseudoH-type. 


Definition 12 Let $\gamma$ denote a geodesic and assume that $\gamma(t_0)$ is conjugate to $\gamma(0)$ along $\gamma$. To indicate that the multiplicity of $\gamma(t_0)$ is m, we shall write $\operatorname{mult}_{cp}(t_0) = m$. To distinguish the notions clearly, we shall denote the multiplicity of $\lambda$ as an eigenvalue of a specified linear transformation by $\operatorname{mult}_{ev}\lambda$.

Let $\gamma$ be a geodesic with $\gamma(0) = 1$ and $\dot\gamma(0) = z_0 + x_0 \in \mathfrak{z} \oplus \mathfrak{v}$, respectively, and let $J = J_{z_0}$. If $\gamma$ is not null, we may assume that $\gamma$ is normalized so that $\langle\dot\gamma,\dot\gamma\rangle = \pm 1$. As usual, $\mathbb{Z}^*$ denotes the set of all integers with 0 removed.


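Theorem 10 Under these assumptions, if N is of pseudoH-type, then:

(i) if $z_0 = 0$ and $x_0 \neq 0$, then $\gamma(t)$ is conjugate to $\gamma(0)$ along $\gamma$ if and only if $\langle x_0, x_0\rangle < 0$ and

\[ -\frac{12}{t^2} = \langle x_0, x_0\rangle \]

in which case $\operatorname{mult}_{cp}(t) = \dim\mathfrak{z}$;

(ii) if $z_0 \neq 0$ and $x_0 = 0$, then $\gamma(t)$ is conjugate to $\gamma(0)$ along $\gamma$ if and only if $\langle z_0, z_0\rangle > 0$ and

\[ t \in \frac{\pi}{|z_0|}\,\mathbb{Z}^* \]

in which case $\operatorname{mult}_{cp}(t) = \dim\mathfrak{v}$.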


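Theorem 11 Let $\gamma$ be such a geodesic in a pseudoH-type group N with $z_0 \neq 0 \neq x_0$.

(i) If $\langle z_0, z_0\rangle = \alpha^2$ with $\alpha > 0$, then $\gamma(t_0)$ is conjugate to $\gamma(0)$ along $\gamma$ if and only if

\[ t_0 \in \frac{2\pi}{\alpha}\mathbb{Z}^* \cup A_1 \cup A_2 \]

where

\[ A_1 = \Big\{ t \in \mathbb{R} \;\Big|\; \langle x_0, x_0\rangle \frac{\alpha t}{2}\cot\frac{\alpha t}{2} = \langle\dot\gamma,\dot\gamma\rangle \Big\} \]

and

\[ A_2 = \Big\{ t \in \mathbb{R} \;\Big|\; \alpha t = \frac{\langle x_0, x_0\rangle}{\langle\dot\gamma,\dot\gamma\rangle + \langle z_0, z_0\rangle}\sin\alpha t \Big\} \quad \text{when } \dim\mathfrak{z} \ge 2 \]

If $t_0 \in (2\pi/\alpha)\mathbb{Z}^*$, then

\[ \operatorname{mult}_{cp}(t_0) = \begin{cases} \dim\mathfrak{v} - 1 & \text{if } \langle\dot\gamma,\dot\gamma\rangle + \langle z_0, z_0\rangle \neq 0 \\ \dim\mathfrak{n} - 2 & \text{if } \langle\dot\gamma,\dot\gamma\rangle + \langle z_0, z_0\rangle = 0 \end{cases} \]

If $t_0 \notin (2\pi/\alpha)\mathbb{Z}^*$, then

\[ \operatorname{mult}_{cp}(t_0) = \begin{cases} 1 & \text{if } t_0 \in A_1 - A_2 \\ \dim\mathfrak{z} - 1 & \text{if } t_0 \in A_2 - A_1 \\ \dim\mathfrak{z} & \text{if } t_0 \in A_1 \cap A_2 \end{cases} \]

(ii) If $\langle z_0, z_0\rangle = -\beta^2$ with $\beta > 0$, then $\gamma(t_0)$ is a conjugate point along $\gamma$ if and only if $t_0 \in B_1 \cup B_2$ where

\[ B_1 = \Big\{ t \in \mathbb{R} \;\Big|\; \langle x_0, x_0\rangle \frac{\beta t}{2}\coth\frac{\beta t}{2} = \langle\dot\gamma,\dot\gamma\rangle \Big\} \]

and

\[ B_2 = \Big\{ t \in \mathbb{R} \;\Big|\; \beta t = \frac{\langle x_0, x_0\rangle}{\langle\dot\gamma,\dot\gamma\rangle + \langle z_0, z_0\rangle}\sinh\beta t \Big\} \quad \text{when } \dim\mathfrak{z} \ge 2 \]

The multiplicity is

\[ \operatorname{mult}_{cp}(t_0) = \begin{cases} 1 & \text{if } t_0 \in B_1 - B_2 \\ \dim\mathfrak{z} - 1 & \text{if } t_0 \in B_2 - B_1 \\ \dim\mathfrak{z} & \text{if } t_0 \in B_1 \cap B_2 \end{cases} \]

(iii) If $\langle z_0, z_0\rangle = 0$, then $\gamma(t_0)$ is a conjugate point along $\gamma$ if and only if

\[ t_0^2 = -\frac{12}{\langle x_0, x_0\rangle} \]

and $\operatorname{mult}_{cp}(t_0) = \dim\mathfrak{z} - 1$.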





This covers all cases for a pseudoH-type group with 
a center of any dimension. 

Some results on other two-step groups and 
examples (including pictures in dimension 3) may 
be found in the references cited in Jang et al. (2005). 
When the groups are not pseudoH-type, however, 
complete results are available only when the center 
is one dimensional. Guediri (2004) has results in the 
timelike Lorentzian case. 
Lorentzian Groups


Not too long ago, only a few partial results in the 
line of Milnor’s study of definite metrics were 
known for indefinite metrics (Barnet 1989, Nomizu 
1979), and they were Lorentzian. 

Guediri (2003) and others have made special 
study of Lorentzian two-step groups, partly because 
of their relevance to general relativity, where they 
can be used to provide interesting and important 
(counter)examples. Special features of Lorentzian 
geometry frequently enable them to obtain much 
more complete and explicit results than are possible 
in general. 

For example, Guediri (2003) was able to provide 
a complete and explicit integration of the geodesic 
equations for Lorentzian 2-step groups. This 
includes the case of a degenerate center, which 
only required extremely careful handling through a 
number of cases. He also paid special attention to 
the existence of closed timelike geodesics, reflecting 
the relativistic concerns. 

As usual, N denotes a connected and simply 
connected 2-step nilpotent Lie group. For the rest 
of this section, we assume that the left-invariant 
metric tensor is Lorentzian. Whenever a lattice is 
mentioned, we also assume that the group is 
rational. 


Proposition 9 If the center is degenerate, then no 
timelike geodesic can be translated by a central 
element. 


Thus, there can be no closed timelike geodesics 
parallel to the center in any nilmanifold obtained 
from such an N. 


Theorem 12 If the center is Lorentzian, then $\Gamma\backslash N$ contains no timelike or null closed geodesics for any lattice $\Gamma$.


To handle degenerate centers, three refined 
notions for nonsingular are used: almost, weakly, 
and strongly nonsingular. The precise definitions 
involve an adapted Witt decomposition (as in the 
general pseudo-Riemannian case, but a rather 
different one here) and are quite technical, as is 
typical. We refer to Guediri (2003) for details. 


Theorem 13 If N is weakly nonsingular, then no timelike geodesic can be translated by an element of N.


Corollary 6 If N is flat, then no timelike geodesic 
can be translated by a non-identity element. 


Corollary 7 If N is flat, then $\Gamma\backslash N$ contains no closed timelike geodesics for any lattice $\Gamma$.


Corollary 8 If N is weakly nonsingular, then $\Gamma\backslash N$ contains no closed timelike geodesic.


Corollary 9 If $N = H_{2k+1}$ is a Lorentzian Heisenberg group with degenerate center, then $\Gamma\backslash N$ contains no closed timelike geodesic.


Guediri also has the only non-Riemannian results 
so far about the phenomenon Eberlein called “in 
resonance.” Roughly speaking, this occurs when the 
eigenvalues of the map j have rational ratios. (The 
Lorentzian case actually requires a slightly more 
complicated condition when the center is 
degenerate.) 


Theorem 14 If N is almost nonsingular, then N is 
in resonance if and only if every geodesic of N is 
translated by some element of N. 


See also: Classical Groups and Homogeneous Spaces; 
Einstein Equations: Exact Solutions; Lorentzian 
Geometry. 
Introduction


In this article we give a brief introduction to q-special functions, that is, q-analogs of the classical special functions. Here q is a deformation parameter, usually $0 < q < 1$, where $q = 1$ is the classical case. The deformation is such that the calculus simultaneously deforms to a q-calculus involving q-derivatives and q-integrals. The main topics to be treated are q-hypergeometric series, with some selected evaluation and transformation formulas, and some q-hypergeometric orthogonal polynomials, most notably the Askey-Wilson polynomials. In several variables, we discuss Macdonald polynomials associated with root systems, with most emphasis on the $A_n$ case. The rather new theory of elliptic hypergeometric series gets some attention. While much of the theory of q-special functions keeps q fixed, some of the deeper aspects with number-theoretic and combinatorial flavor emphasize expansion in q. Finally, we indicate applications and interpretations in quantum groups, Chevalley groups, affine Lie algebras, combinatorics, and statistical mechanics.


Conventions 


$q \in \mathbb{C}\setminus\{1\}$ in general, but $0 < q < 1$ in all infinite sums and products.

n, m, N will be non-negative integers unless mentioned otherwise.


q-Hypergeometric Series 
Definitions 


For $a, q \in \mathbb{C}$ the q-shifted factorial $(a;q)_k$ is defined as a product of k factors:

\[ (a;q)_k := (1-a)(1-aq)\cdots(1-aq^{k-1}) \quad (k \in \mathbb{Z}_{>0}); \qquad (a;q)_0 := 1 \tag*{[1]} \]

If $|q| < 1$ this definition remains meaningful for $k = \infty$ as a convergent infinite product:

\[ (a;q)_\infty := \prod_{j=0}^{\infty} (1 - aq^j) \tag*{[2]} \]

We also write $(a_1,\ldots,a_r;q)_k$ for the product of r q-shifted factorials:

\[ (a_1,\ldots,a_r;q)_k := (a_1;q)_k \cdots (a_r;q)_k \quad (k \in \mathbb{Z}_{\ge 0} \text{ or } k = \infty) \tag*{[3]} \]
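The three definitions above translate directly into code. The following is a minimal sketch; the function names `qpoch`, `qpoch_inf` and `qpoch_multi`, the tolerance, and the term cap are illustrative choices, not notation from the article.

```python
from math import prod

def qpoch(a, q, k):
    """Finite q-shifted factorial (a; q)_k for an integer k >= 0, as in [1]."""
    return prod(1 - a * q**j for j in range(k))

def qpoch_inf(a, q, tol=1e-16, max_terms=10_000):
    """Truncated infinite product (a; q)_infinity from [2]; requires |q| < 1."""
    if not abs(q) < 1:
        raise ValueError("the infinite product needs |q| < 1")
    result, power = 1.0, 1.0  # power holds q**j
    for _ in range(max_terms):
        result *= 1 - a * power
        power *= q
        if abs(a * power) < tol:  # remaining factors are 1 to machine precision
            break
    return result

def qpoch_multi(params, q, k):
    """Product (a_1, ..., a_r; q)_k of several q-shifted factorials, as in [3]."""
    return prod(qpoch(a, q, k) for a in params)
```

An empty product is 1, so `qpoch(a, q, 0)` returns 1 in agreement with the convention $(a;q)_0 := 1$.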


A q-hypergeometric series is a power series (for the moment still formal) in one complex variable z with power series coefficients which depend, apart from q, on r complex upper parameters $a_1,\ldots,a_r$ and s complex lower parameters $b_1,\ldots,b_s$ as follows:

\[ {}_r\phi_s(a_1,\ldots,a_r; b_1,\ldots,b_s; q, z) := \sum_{k=0}^{\infty} \frac{(a_1,\ldots,a_r;q)_k}{(b_1,\ldots,b_s;q)_k\,(q;q)_k} \left((-1)^k q^{(1/2)k(k-1)}\right)^{s-r+1} z^k \quad (r, s \in \mathbb{Z}_{\ge 0}) \tag*{[4]} \]

Clearly the above expression is symmetric in $a_1,\ldots,a_r$ and symmetric in $b_1,\ldots,b_s$. On the right-hand side of [4], we have that

\[ \frac{(k+1)\text{th term}}{k\text{th term}} = \frac{(1-a_1q^k)\cdots(1-a_rq^k)\,(-q^k)^{s-r+1}\,z}{(1-b_1q^k)\cdots(1-b_sq^k)(1-q^{k+1})} \tag*{[5]} \]

is rational in $q^k$. Conversely, any rational function in $q^k$ can be written in the form of the right-hand side of [5]. Hence, any series $\sum_{k=0}^{\infty} c_k$ with $c_0 = 1$ and $c_{k+1}/c_k$ rational in $q^k$ is of the form of a q-hypergeometric series [4].

In order to avoid singularities in the terms of [4], we assume that $b_1,\ldots,b_s \neq 1, q^{-1}, q^{-2}, \ldots$. If, for some i, $a_i = q^{-n}$, then all terms in the series [4] with $k > n$ will vanish. If none of the $a_i$ is equal to $q^{-n}$ and if $|q| < 1$, then the radius of convergence of the power series [4] equals $\infty$ if $r < s+1$, 1 if $r = s+1$, and 0 if $r > s+1$.
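The term ratio [5] gives a convenient way to sum the series numerically. The sketch below is one possible implementation under stated assumptions: the name `phi` and the fixed cap on the number of terms are illustrative, no convergence check is made, and real parameters with $0 < q < 1$ are assumed.

```python
from math import prod

def phi(upper, lower, q, z, terms=200):
    """Partial sum of r-phi-s(upper; lower; q, z), built from the term ratio [5]."""
    r, s = len(upper), len(lower)
    total, term = 0.0, 1.0  # the k = 0 term is 1
    for k in range(terms):
        total += term
        qk = q**k
        ratio = prod(1 - a * qk for a in upper) * (-qk) ** (s - r + 1) * z
        ratio /= prod(1 - b * qk for b in lower) * (1 - q * qk)
        term *= ratio
        if term == 0:  # an upper parameter q^{-n} terminates the series
            break
    return total

# Terminating example: upper parameter q^{-2} = 4 with q = 1/2, so only k = 0, 1, 2 contribute.
value = phi([4.0], [], 0.5, 0.3)
```

For $r = s + 1$ the caller should keep $|z| < 1$ unless the series terminates, in line with the radius of convergence stated above.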

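We can view the q-shifted factorial as a q-analog of the shifted factorial (or Pochhammer symbol) by the limit formula

\[ \lim_{q\to 1} \frac{(q^a;q)_k}{(1-q)^k} = (a)_k := a(a+1)\cdots(a+k-1) \tag*{[6]} \]

Hence the q-binomial coefficient

\[ \begin{bmatrix} n \\ k \end{bmatrix}_q := \frac{(q;q)_n}{(q;q)_k\,(q;q)_{n-k}} \quad (n, k \in \mathbb{Z},\ n \ge k \ge 0) \tag*{[7]} \]

tends to the binomial coefficient for $q \to 1$:

\[ \lim_{q\to 1} \begin{bmatrix} n \\ k \end{bmatrix}_q = \binom{n}{k} \tag*{[8]} \]

and a suitably renormalized q-hypergeometric series tends (at least formally) to a hypergeometric series as $q \uparrow 1$:

\[ \lim_{q\uparrow 1} {}_r\phi_s(q^{a_1},\ldots,q^{a_r}; q^{b_1},\ldots,q^{b_s}; q, (q-1)^{1+s-r}z) = {}_rF_s(a_1,\ldots,a_r; b_1,\ldots,b_s; z) \tag*{[9]} \]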
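At least formally, there are limit relations between q-hypergeometric series with neighboring r, s:

\[ \lim_{a_r\to\infty} {}_r\phi_s(a_1,\ldots,a_r; b_1,\ldots,b_s; q, z/a_r) = {}_{r-1}\phi_s(a_1,\ldots,a_{r-1}; b_1,\ldots,b_s; q, z) \tag*{[10]} \]

\[ \lim_{b_s\to\infty} {}_r\phi_s(a_1,\ldots,a_r; b_1,\ldots,b_s; q, b_s z) = {}_r\phi_{s-1}(a_1,\ldots,a_r; b_1,\ldots,b_{s-1}; q, z) \tag*{[11]} \]

A terminating q-hypergeometric series $\sum_{k=0}^{n} c_k z^k$ rewritten as $z^n \sum_{k=0}^{n} c_{n-k} z^{-k}$ yields another terminating q-hypergeometric series, for instance:

\[ {}_{s+1}\phi_s(q^{-n}, a_1,\ldots,a_s; b_1,\ldots,b_s; q, z) = (-1)^n q^{-(1/2)n(n+1)} \frac{(a_1,\ldots,a_s;q)_n}{(b_1,\ldots,b_s;q)_n}\, z^n\; {}_{s+1}\phi_s\!\left(q^{-n}, \frac{q^{-n+1}}{b_1},\ldots,\frac{q^{-n+1}}{b_s}; \frac{q^{-n+1}}{a_1},\ldots,\frac{q^{-n+1}}{a_s}; q, \frac{q^{n+1} b_1\cdots b_s}{a_1\cdots a_s z}\right) \tag*{[12]} \]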

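Often, in physics and quantum groups related literature, the following notation is used for q-number, q-factorial, and q-Pochhammer symbol:

\[ [a]_q := \frac{q^{(1/2)a} - q^{-(1/2)a}}{q^{1/2} - q^{-1/2}}, \qquad [k]_q! := \prod_{j=1}^{k} [j]_q, \qquad ([a]_q)_k := \prod_{j=0}^{k-1} [a+j]_q \quad (k \in \mathbb{Z}_{\ge 0}) \tag*{[13]} \]

For $q \to 1$, these symbols tend to their classical counterparts without the need for renormalization. They are expressed in terms of the standard notation [1] as follows:

\[ [k]_q! = q^{-(1/4)k(k-1)} \frac{(q;q)_k}{(1-q)^k}, \qquad ([a]_q)_k = q^{-(1/2)k(a-1)}\, q^{-(1/4)k(k-1)} \frac{(q^a;q)_k}{(1-q)^k} \tag*{[14]} \]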


Special Cases 


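For $s = r - 1$, formula [4] simplifies to

\[ {}_r\phi_{r-1}(a_1,\ldots,a_r; b_1,\ldots,b_{r-1}; q, z) = \sum_{k=0}^{\infty} \frac{(a_1,\ldots,a_r;q)_k}{(b_1,\ldots,b_{r-1};q)_k\,(q;q)_k}\, z^k \tag*{[15]} \]

which has radius of convergence 1 in the nonterminating case. The case $r = 2$ of [15] is the q-analog of the Gauss hypergeometric series.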


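q-Binomial series

\[ {}_1\phi_0(a; -; q, z) = \sum_{k=0}^{\infty} \frac{(a;q)_k}{(q;q)_k} z^k = \frac{(az;q)_\infty}{(z;q)_\infty} \quad (\text{if series is not terminating, then } |z| < 1) \tag*{[16]} \]

q-Exponential series

\[ e_q(z) := {}_1\phi_0(0; -; q, z) = \sum_{k=0}^{\infty} \frac{z^k}{(q;q)_k} = \frac{1}{(z;q)_\infty} \quad (|z| < 1) \tag*{[17]} \]

\[ E_q(z) := {}_0\phi_0(-; -; q, -z) = \sum_{k=0}^{\infty} \frac{q^{(1/2)k(k-1)} z^k}{(q;q)_k} = (-z;q)_\infty = (e_q(-z))^{-1} \quad (z \in \mathbb{C}) \tag*{[18]} \]

\[ \mathcal{E}_q(z) := {}_1\phi_1(0; -q^{1/2}; q^{1/2}, -z) = \sum_{k=0}^{\infty} \frac{q^{(1/4)k(k-1)}}{(q;q)_k} z^k \quad (z \in \mathbb{C}) \tag*{[19]} \]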


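Jackson's q-Bessel functions

\[ J_\nu^{(1)}(x;q) := \frac{(q^{\nu+1};q)_\infty}{(q;q)_\infty} \left(\tfrac{1}{2}x\right)^{\nu} {}_2\phi_1(0, 0; q^{\nu+1}; q, -\tfrac{1}{4}x^2) \quad (0 < x < 2) \tag*{[20]} \]

\[ J_\nu^{(2)}(x;q) := \frac{(q^{\nu+1};q)_\infty}{(q;q)_\infty} \left(\tfrac{1}{2}x\right)^{\nu} {}_0\phi_1(-; q^{\nu+1}; q, -\tfrac{1}{4}q^{\nu+1}x^2) = (-\tfrac{1}{4}x^2;q)_\infty\, J_\nu^{(1)}(x;q) \quad (x > 0) \tag*{[21]} \]

\[ J_\nu^{(3)}(x;q) := \frac{(q^{\nu+1};q)_\infty}{(q;q)_\infty} \left(\tfrac{1}{2}x\right)^{\nu} {}_1\phi_1(0; q^{\nu+1}; q, \tfrac{1}{4}qx^2) \quad (x > 0) \tag*{[22]} \]

See [90] for the orthogonality relation for $J_\nu^{(3)}(x;q)$.

If $\exp_q(z)$ denotes one of the three q-exponentials [17]-[19], then $(1/2)(\exp_q(ix) + \exp_q(-ix))$ is a q-analog of the cosine and $-(1/2)i(\exp_q(ix) - \exp_q(-ix))$ is a q-analog of the sine. The three q-cosines are essentially the case $\nu = -1/2$ of the corresponding q-Bessel functions [20]-[22], and the three q-sines are essentially the case $\nu = 1/2$ of x times the corresponding q-Bessel functions.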


q-Derivative and q-Integral 


The q-derivative of a function f given on a subset of $\mathbb{R}$ or $\mathbb{C}$ is defined by

\[ (D_q f)(x) := \frac{f(x) - f(qx)}{(1-q)x} \quad (x \neq 0,\ q \neq 1) \tag*{[23]} \]

where x and qx should be in the domain of f. By continuity, we set $(D_q f)(0) := f'(0)$, provided $f'(0)$ exists. If f is differentiable on an open interval I, then

\[ \lim_{q\uparrow 1} (D_q f)(x) = f'(x) \quad (x \in I) \tag*{[24]} \]

For $a \in \mathbb{R}\setminus\{0\}$ and a function f given on $(0,a]$ or $[a,0)$, we define the q-integral by

\[ \int_0^a f(x)\, d_qx := a(1-q) \sum_{k=0}^{\infty} f(aq^k)\, q^k = \sum_{k=0}^{\infty} f(aq^k)\,(aq^k - aq^{k+1}) \tag*{[25]} \]

provided the infinite sum converges absolutely (e.g., if f is bounded). If F(a) is given by the left-hand side of [25], then $D_q F = f$. The right-hand side of [25] is an infinite Riemann sum. For $q \uparrow 1$ it converges, at least formally, to $\int_0^a f(x)\, dx$.
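The q-derivative [23] and the q-integral [25] are simple to evaluate numerically, and the statement that the q-derivative of the q-integral returns f can be checked directly. This is a small sketch; the function names and the truncation length `terms` are illustrative assumptions.

```python
def q_derivative(f, x, q):
    """(D_q f)(x) = (f(x) - f(qx)) / ((1 - q) x), for x != 0 and q != 1."""
    if x == 0 or q == 1:
        raise ValueError("formula [23] needs x != 0 and q != 1")
    return (f(x) - f(q * x)) / ((1 - q) * x)

def q_integral(f, a, q, terms=2000):
    """Truncated q-integral of f over (0, a]: a (1 - q) * sum of f(a q^k) q^k."""
    return a * (1 - q) * sum(f(a * q**k) * q**k for k in range(terms))

q = 0.5
# D_q applied to x^3 gives (1 + q + q^2) x^2, the q-analog of 3 x^2.
d = q_derivative(lambda x: x**3, 2.0, q)
# The q-integral of x over (0, 1] equals 1 / (1 + q), the q-analog of 1/2.
i = q_integral(lambda x: x, 1.0, q)
# Applying D_q to F(a) = q-integral of t^2 over (0, a] recovers a^2.
F = lambda a: q_integral(lambda t: t**2, a, q)
back = q_derivative(F, 1.5, q)
```

The truncation is harmless for $0 < q < 1$ and bounded f, because the weights $q^k$ decay geometrically.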
For nonzero $a, b \in \mathbb{R}$ we define

\[ \int_a^b f(x)\, d_qx := \int_0^b f(x)\, d_qx - \int_0^a f(x)\, d_qx \tag*{[26]} \]

For a q-integral over $(0,\infty)$, we have to specify a q-lattice $\{aq^k\}_{k\in\mathbb{Z}}$ for some $a > 0$ (up to multiplication by an integer power of q):

\[ \int_0^{a\cdot\infty} f(x)\, d_qx := a(1-q) \sum_{k=-\infty}^{\infty} f(aq^k)\, q^k = \lim_{n\to\infty} \int_0^{q^{-n}a} f(x)\, d_qx \tag*{[27]} \]


The q-Gamma and q-Beta Functions 


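The q-gamma function is defined by

\[ \Gamma_q(z) := \frac{(q;q)_\infty\,(1-q)^{1-z}}{(q^z;q)_\infty} \quad (z \neq 0, -1, -2, \ldots) \tag*{[28]} \]

\[ \phantom{\Gamma_q(z) :} = \int_0^{(1-q)^{-1}} t^{z-1} E_q(-(1-q)qt)\, d_qt \quad (\Re z > 0) \tag*{[29]} \]

Then

\[ \Gamma_q(z+1) = \frac{1-q^z}{1-q}\, \Gamma_q(z) \tag*{[30]} \]

\[ \Gamma_q(n+1) = \frac{(q;q)_n}{(1-q)^n} \tag*{[31]} \]

\[ \lim_{q\uparrow 1} \Gamma_q(z) = \Gamma(z) \tag*{[32]} \]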

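The q-beta function is defined by

\[ B_q(a,b) := \frac{\Gamma_q(a)\Gamma_q(b)}{\Gamma_q(a+b)} = \frac{(1-q)\,(q, q^{a+b};q)_\infty}{(q^a, q^b;q)_\infty} \quad (a, b \neq 0, -1, -2, \ldots) \tag*{[33]} \]

\[ \phantom{B_q(a,b) :} = \int_0^1 t^{b-1} \frac{(qt;q)_\infty}{(q^at;q)_\infty}\, d_qt \quad (\Re b > 0,\ a \neq 0, -1, -2, \ldots) \tag*{[34]} \]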


The q-Gauss Hypergeometric Series

q-Analog of Euler's integral representation

\[ {}_2\phi_1(q^a, q^b; q^c; q, z) = \frac{\Gamma_q(c)}{\Gamma_q(b)\Gamma_q(c-b)} \int_0^1 t^{b-1} \frac{(tq;q)_\infty}{(tq^{c-b};q)_\infty} \frac{(tzq^a;q)_\infty}{(tz;q)_\infty}\, d_qt \quad (\Re b > 0,\ |z| < 1) \tag*{[35]} \]

By substitution of [25], formula [35] becomes a transformation formula:

\[ {}_2\phi_1(a, b; c; q, z) = \frac{(az;q)_\infty}{(z;q)_\infty} \frac{(b;q)_\infty}{(c;q)_\infty}\; {}_2\phi_1(c/b, z; az; q, b) \tag*{[36]} \]

Note the mixing of argument z and parameters a, b, c on the right-hand side.


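Evaluation formulas in special points

\[ {}_2\phi_1(a, b; c; q, c/(ab)) = \frac{(c/a, c/b;q)_\infty}{(c, c/(ab);q)_\infty} \quad (|c/(ab)| < 1) \tag*{[37]} \]

\[ {}_2\phi_1(q^{-n}, b; c; q, cq^n/b) = \frac{(c/b;q)_n}{(c;q)_n} \tag*{[38]} \]

\[ {}_2\phi_1(q^{-n}, b; c; q, q) = \frac{(c/b;q)_n\, b^n}{(c;q)_n} \tag*{[39]} \]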

Two general transformation formulas 


OE- SE 


caz D bz |40] 


7 ) 


C C 


C gy 1c, g Me) 


i a 
= PDs) TT saal A 
= ("bezant TP saa] B3 
-ie mg es a A 


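Second order q-difference equation

\[ z(q^c - q^{a+b+1}z)(D_q^2u)(z) + \left( \frac{1-q^c}{1-q} - \left( q^b\frac{1-q^a}{1-q} + q^a\frac{1-q^{b+1}}{1-q} \right) z \right)(D_qu)(z) - \frac{1-q^a}{1-q}\,\frac{1-q^b}{1-q}\, u(z) = 0 \tag*{[45]} \]

Some special solutions of [45] are:

\[ u_1(z) := {}_2\phi_1(q^a, q^b; q^c; q, z) \tag*{[46]} \]

\[ u_2(z) := z^{1-c}\, {}_2\phi_1(q^{1+a-c}, q^{1+b-c}; q^{2-c}; q, z) \tag*{[47]} \]

\[ u_3(z) := z^{-a}\, {}_2\phi_1(q^a, q^{a-c+1}; q^{a-b+1}; q, q^{-a-b+c+1}z^{-1}) \tag*{[48]} \]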
They are related by: 

CE ae me 
(qo 1 ga c+1 Re oe 
(PON Doe 
“Gq b— cz, ge b+1e-1. Dx 
E a ae 
aa d 


(q aa : tz i Dok 
x u 49 


uy (z) EE 


un (z) 


a—b+1. 


a+b—c 


Summation and Transformation Formulas for ${}_r\phi_{r-1}$ Series

An ${}_r\phi_{r-1}$ series [15] is called "balanced" if $b_1\cdots b_{r-1} = qa_1\cdots a_r$ and $z = q$, and the series is called "very well-poised" if $qa_1 = a_2b_1 = a_3b_2 = \cdots = a_rb_{r-1}$ and $qa_1^{1/2} = a_2 = -a_3$. The following more compact notation is used for very well-poised series:

\[ {}_rW_{r-1}(a_1; a_4, a_5, \ldots, a_r; q, z) := {}_r\phi_{r-1}\!\left(a_1, qa_1^{1/2}, -qa_1^{1/2}, a_4, \ldots, a_r;\ a_1^{1/2}, -a_1^{1/2}, qa_1/a_4, \ldots, qa_1/a_r;\ q, z\right) \tag*{[50]} \]

Below only a few of the most important identities are given. See Gasper and Rahman (2004) for many more. An important tool for obtaining complicated identities from more simple ones is Bailey's Lemma, which can moreover be iterated (Bailey chain), see Andrews (1986, ch. 3).


The q-Saalschütz sum for a terminating balanced ${}_3\phi_2$

\[ {}_3\phi_2(a, b, q^{-n}; c, q^{1-n}abc^{-1}; q, q) = \frac{(c/a, c/b;q)_n}{(c, c/(ab);q)_n} \tag*{[51]} \]
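A quick numerical check of the q-Saalschütz sum, written as ${}_3\phi_2(a, b, q^{-n}; c, q^{1-n}ab/c; q, q) = (c/a, c/b;q)_n/(c, c/(ab);q)_n$, can guard against transcription errors. The sketch below evaluates both sides for arbitrary sample values; the helper names and the sample parameters are illustrative assumptions.

```python
from math import prod

def qpoch(a, q, k):
    """Finite q-shifted factorial (a; q)_k."""
    return prod(1 - a * q**j for j in range(k))

def saalschutz_sides(a, b, c, q, n):
    """Return (left, right) sides of the q-Saalschutz sum for a terminating balanced 3-phi-2."""
    lower2 = q ** (1 - n) * a * b / c  # second lower parameter, fixed by the balance condition
    lhs = sum(
        qpoch(a, q, k) * qpoch(b, q, k) * qpoch(q**-n, q, k)
        / (qpoch(c, q, k) * qpoch(lower2, q, k) * qpoch(q, q, k))
        * q**k
        for k in range(n + 1)
    )
    rhs = (qpoch(c / a, q, n) * qpoch(c / b, q, n)
           / (qpoch(c, q, n) * qpoch(c / (a * b), q, n)))
    return lhs, rhs

lhs, rhs = saalschutz_sides(0.3, 0.7, 0.2, 0.5, 4)
```

Only the terms $k = 0, \ldots, n$ are summed, since the factor $(q^{-n};q)_k$ vanishes for $k > n$.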


Jackson's sum for a terminating balanced ${}_8W_7$

\[ {}_8W_7(a; b, c, d, q^{n+1}a^2/(bcd), q^{-n}; q, q) = \frac{(qa, qa/(bc), qa/(bd), qa/(cd);q)_n}{(qa/b, qa/c, qa/d, qa/(bcd);q)_n} \tag*{[52]} \]


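Watson's transformation of a terminating ${}_8W_7$ into a terminating balanced ${}_4\phi_3$

\[ {}_8W_7\!\left(a; b, c, d, e, q^{-n}; q, \frac{q^{n+2}a^2}{bcde}\right) = \frac{(qa, qa/(de);q)_n}{(qa/d, qa/e;q)_n}\; {}_4\phi_3\!\left(q^{-n}, d, e, qa/(bc);\ qa/b, qa/c, q^{-n}de/a;\ q, q\right) \tag*{[53]} \]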


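Sears' transformation of a terminating balanced ${}_4\phi_3$

\[ {}_4\phi_3(q^{-n}, a, b, c; d, e, f; q, q) = \frac{(e/a, f/a;q)_n}{(e, f;q)_n}\, a^n\; {}_4\phi_3\!\left(q^{-n}, a, d/b, d/c;\ d, aq^{1-n}/e, aq^{1-n}/f;\ q, q\right) \tag*{[54]} \]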


By iteration and by symmetries in the upper and in 
the lower parameters, many other versions of this 
identity can be found. An elegant comprehensive 
formulation of all these versions is as follows. 

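Let $x_1x_2x_3x_4x_5x_6 = q^{1-n}$. Then the following expression is symmetric in $x_1, x_2, x_3, x_4, x_5, x_6$:

\[ \frac{q^{(1/2)n(n-1)}\,(x_1x_2x_3x_4, x_1x_2x_3x_5, x_1x_2x_3x_6;q)_n}{(x_1x_2x_3)^n}\; {}_4\phi_3\!\left(q^{-n}, x_2x_3, x_1x_3, x_1x_2;\ x_1x_2x_3x_4, x_1x_2x_3x_5, x_1x_2x_3x_6;\ q, q\right) \tag*{[55]} \]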


Similar formulations involving symmetry groups can 
be given for other transformations, see Van der Jeugt 
and Srinivasa Rao (1999). 


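Bailey's transformation of a terminating balanced ${}_{10}W_9$

\[ {}_{10}W_9\!\left(a; b, c, d, e, f, \frac{q^{n+2}a^3}{bcdef}, q^{-n}; q, q\right) = \frac{(qa, qa/(ef), (qa)^2/(bcde), (qa)^2/(bcdf);q)_n}{(qa/e, qa/f, (qa)^2/(bcdef), (qa)^2/(bcd);q)_n}\; {}_{10}W_9\!\left(\frac{qa^2}{bcd}; \frac{qa}{cd}, \frac{qa}{bd}, \frac{qa}{bc}, e, f, \frac{q^{n+2}a^3}{bcdef}, q^{-n}; q, q\right) \tag*{[56]} \]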


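Rogers-Ramanujan Identities

\[ {}_0\phi_1(-; 0; q, q) = \sum_{k=0}^{\infty} \frac{q^{k^2}}{(q;q)_k} = \frac{1}{(q, q^4; q^5)_\infty} \tag*{[57]} \]

\[ {}_0\phi_1(-; 0; q, q^2) = \sum_{k=0}^{\infty} \frac{q^{k(k+1)}}{(q;q)_k} = \frac{1}{(q^2, q^3; q^5)_\infty} \tag*{[58]} \]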
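The two Rogers-Ramanujan identities, $\sum_k q^{k^2}/(q;q)_k = 1/(q, q^4; q^5)_\infty$ and $\sum_k q^{k(k+1)}/(q;q)_k = 1/(q^2, q^3; q^5)_\infty$, lend themselves to a numerical sanity check. The following sketch uses illustrative function names and truncation lengths, and assumes real $0 < q < 1$.

```python
def rr_sum(q, shift=0, terms=60):
    """Sum of q^(k^2 + shift*k) / (q; q)_k; shift = 0 or 1 selects the first or second identity."""
    total, qq_k = 0.0, 1.0  # qq_k holds (q; q)_k
    for k in range(terms):
        total += q ** (k * k + shift * k) / qq_k
        qq_k *= 1 - q ** (k + 1)
    return total

def rr_product(q, residues, terms=200):
    """Reciprocal of the truncated product of (1 - q^(5j + r)) over j >= 0 and r in residues."""
    p = 1.0
    for j in range(terms):
        for r in residues:
            p *= 1 - q ** (5 * j + r)
    return 1 / p

q = 0.3
first = (rr_sum(q, 0), rr_product(q, (1, 4)))    # the two sides of the first identity
second = (rr_sum(q, 1), rr_product(q, (2, 3)))   # the two sides of the second identity
```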
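Bilateral Series

Definition [1] can be extended by

\[ (a;q)_k := \frac{(a;q)_\infty}{(aq^k;q)_\infty} \quad (k \in \mathbb{Z}) \tag*{[59]} \]

Define a bilateral q-hypergeometric series by the Laurent series

\[ {}_r\psi_s(a_1,\ldots,a_r; b_1,\ldots,b_s; q, z) := \sum_{k=-\infty}^{\infty} \frac{(a_1,\ldots,a_r;q)_k}{(b_1,\ldots,b_s;q)_k} \left((-1)^k q^{(1/2)k(k-1)}\right)^{s-r} z^k \]

\[ \phantom{{}_r\psi_s} = \sum_{k=0}^{\infty} \frac{(a_1,\ldots,a_r;q)_k}{(b_1,\ldots,b_s;q)_k} \left((-1)^k q^{(1/2)k(k-1)}\right)^{s-r} z^k + \sum_{k=1}^{\infty} \frac{(q/b_1,\ldots,q/b_s;q)_k}{(q/a_1,\ldots,q/a_r;q)_k} \left(\frac{b_1\cdots b_s}{a_1\cdots a_r z}\right)^k \tag*{[60]} \]

The Laurent series is convergent if $|b_1\cdots b_s/(a_1\cdots a_r)| < |z|$ and moreover, for $s = r$, $|z| < 1$.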


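Ramanujan's ${}_1\psi_1$ summation formula

\[ {}_1\psi_1(b; c; q, z) = \frac{(q, c/b, bz, q/(bz);q)_\infty}{(c, q/b, z, c/(bz);q)_\infty} \quad (|c/b| < |z| < 1) \tag*{[61]} \]

This has as a limit case

\[ {}_0\psi_1(-; c; q, z) = \frac{(q, z, q/z;q)_\infty}{(c, c/z;q)_\infty} \quad (|z| > |c|) \tag*{[62]} \]

and as a further specialization the Jacobi triple product identity

\[ \sum_{k=-\infty}^{\infty} (-1)^k q^{(1/2)k(k-1)} z^k = (q, z, q/z;q)_\infty \quad (z \neq 0) \tag*{[63]} \]

which can be rewritten as a product formula for a theta function:

\[ \theta_4(x;q) := \sum_{k=-\infty}^{\infty} (-1)^k q^{k^2} e^{2\pi ikx} = \prod_{k=1}^{\infty} (1 - q^{2k})(1 - 2q^{2k-1}\cos(2\pi x) + q^{4k-2}) \tag*{[64]} \]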
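The Jacobi triple product identity, in the form $\sum_{k\in\mathbb{Z}} (-1)^k q^{k(k-1)/2} z^k = (q, z, q/z; q)_\infty$, can be verified numerically for sample values. This is a small sketch with illustrative names and truncation bounds, assuming real $0 < q < 1$ and $z \neq 0$.

```python
def triple_sum(q, z, kmax=40):
    """Bilateral sum of (-1)^k q^(k(k-1)/2) z^k, truncated to |k| <= kmax."""
    return sum((-1) ** k * q ** (k * (k - 1) // 2) * z**k
               for k in range(-kmax, kmax + 1))

def triple_product(q, z, terms=200):
    """Truncated infinite product (q; q)(z; q)(q/z; q)."""
    p = 1.0
    for j in range(terms):
        qj = q**j
        p *= (1 - q * qj) * (1 - z * qj) * (1 - q / z * qj)
    return p

s = triple_sum(0.3, 0.7)
p = triple_product(0.3, 0.7)
```

The exponent $k(k-1)/2$ is a non-negative integer for every integer k, which is why integer division is safe here.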


q-Hypergeometric Orthogonal Polynomials

Here we discuss families of orthogonal polynomials $\{p_n(x)\}$ which are expressible as terminating q-hypergeometric series ($0 < q < 1$) and for which either (1) $P_n(x) := p_n(x)$ or (2) $P_n(x) := p_n((1/2)(x + x^{-1}))$ are eigenfunctions of a second-order q-difference operator, that is,

\[ A(x)P_n(qx) + B(x)P_n(x) + C(x)P_n(q^{-1}x) = \lambda_n P_n(x) \tag*{[65]} \]

where A(x), B(x), and C(x) are independent of n, and where the $\lambda_n$ are the eigenvalues. The generic cases are the four-parameter classes of "Askey-Wilson polynomials" (continuous weight function) and q-Racah polynomials (discrete weights on finitely many points). They are of type (2) (quadratic q-lattice). All other cases can be obtained from the generic cases by specialization or limit transition. In particular, one thus obtains the generic three-parameter classes of type (1) (linear q-lattice). These are the big q-Jacobi polynomials (orthogonality by q-integral) and the q-Hahn polynomials (discrete weights on finitely many points).


Askey-Wilson Polynomials

Definition as q-hypergeometric series

\[ p_n(\cos\theta) = p_n(\cos\theta; a, b, c, d \mid q) := \frac{(ab, ac, ad;q)_n}{a^n}\; {}_4\phi_3\!\left(q^{-n}, q^{n-1}abcd, ae^{i\theta}, ae^{-i\theta};\ ab, ac, ad;\ q, q\right) \tag*{[66]} \]

This is symmetric in a, b, c, d.


Orthogonality relation Assume that a, b, c, d are four reals, or two reals and one pair of complex conjugates, or two pairs of complex conjugates. Also assume that $|ab|, |ac|, |ad|, |bc|, |bd|, |cd| < 1$. Then

\[ \int_{-1}^{1} p_n(x)p_m(x)w(x)\, dx + \sum_k p_n(x_k)p_m(x_k)\omega_k = h_n\delta_{n,m} \tag*{[67]} \]

where

\[ 2\pi\sin\theta\, w(\cos\theta) = \left| \frac{(e^{2i\theta};q)_\infty}{(ae^{i\theta}, be^{i\theta}, ce^{i\theta}, de^{i\theta};q)_\infty} \right|^2 \tag*{[68]} \]

\[ h_0 = \frac{(abcd;q)_\infty}{(q, ab, ac, ad, bc, bd, cd;q)_\infty}, \qquad \frac{h_n}{h_0} = \frac{1 - abcdq^{n-1}}{1 - abcdq^{2n-1}}\, \frac{(q, ab, ac, ad, bc, bd, cd;q)_n}{(abcd;q)_n} \tag*{[69]} \]

and the $x_k$ are the points $(1/2)(eq^k + e^{-1}q^{-k})$ with e any of the a, b, c, d of absolute value $> 1$; the sum is over the $k \in \mathbb{Z}_{\ge 0}$ with $|eq^k| > 1$. The $\omega_k$ are certain weights which can be given explicitly. The sum in [67] does not occur if moreover $|a|, |b|, |c|, |d| < 1$.

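A more uniform way of writing the orthogonality relation [67] is by the contour integral

\[ \frac{1}{2\pi i} \oint_C p_n\!\left(\tfrac{1}{2}(z + z^{-1})\right) p_m\!\left(\tfrac{1}{2}(z + z^{-1})\right) \frac{(z^2, z^{-2};q)_\infty}{(az, az^{-1}, bz, bz^{-1}, cz, cz^{-1}, dz, dz^{-1};q)_\infty}\, \frac{dz}{z} = 2h_n\delta_{n,m} \tag*{[70]} \]

where C is the unit circle traversed in positive direction with suitable deformations to separate the sequences of poles converging to zero from the sequences of poles diverging to $\infty$.

The case $n = m = 0$ of [70] or [67] is known as the Askey-Wilson integral.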


q-Difference equation

\[ A(z)P_n(qz) - (A(z) + A(z^{-1}))P_n(z) + A(z^{-1})P_n(q^{-1}z) = (q^{-n} - 1)(1 - q^{n-1}abcd)P_n(z) \tag*{[71]} \]

where $P_n(z) = p_n(\tfrac{1}{2}(z + z^{-1}))$ and $A(z) = (1-az)(1-bz)(1-cz)(1-dz)/((1-z^2)(1-qz^2))$.


Special cases These include the continuous q-Jacobi polynomials (two parameters), the continuous q-ultraspherical polynomials (symmetric one-parameter case of continuous q-Jacobi), the Al-Salam-Chihara polynomials (Askey-Wilson with $c = d = 0$), and the continuous q-Hermite polynomials (Askey-Wilson with $a = b = c = d = 0$).


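Continuous q-Ultraspherical Polynomials

Definitions as finite Fourier series and as special Askey-Wilson polynomial

\[ C_n(\cos\theta; \beta \mid q) := \sum_{k=0}^{n} \frac{(\beta;q)_k\,(\beta;q)_{n-k}}{(q;q)_k\,(q;q)_{n-k}}\, e^{i(n-2k)\theta} \tag*{[72]} \]

\[ \phantom{C_n} = \frac{(\beta;q)_n}{(\beta^2q^n;q)_n\,(q;q)_n}\; p_n(\cos\theta; \beta^{1/2}, q^{1/2}\beta^{1/2}, -\beta^{1/2}, -q^{1/2}\beta^{1/2} \mid q) \tag*{[73]} \]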













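Orthogonality relation ($-1 < \beta < 1$)

\[ \frac{1}{2\pi} \int_0^{\pi} C_n(\cos\theta; \beta \mid q)\, C_m(\cos\theta; \beta \mid q) \left| \frac{(e^{2i\theta};q)_\infty}{(\beta e^{2i\theta};q)_\infty} \right|^2 d\theta = \frac{(\beta, q\beta;q)_\infty}{(\beta^2, q;q)_\infty}\, \frac{1-\beta}{1-\beta q^n}\, \frac{(\beta^2;q)_n}{(q;q)_n}\, \delta_{n,m} \tag*{[74]} \]

q-Difference equation

\[ A(z)P_n(qz) - (A(z) + A(z^{-1}))P_n(z) + A(z^{-1})P_n(q^{-1}z) = (q^{-n} - 1)(1 - q^n\beta^2)P_n(z) \tag*{[75]} \]

where $P_n(z) = C_n(\tfrac{1}{2}(z + z^{-1}); \beta \mid q)$ and $A(z) = (1-\beta z^2)(1-q\beta z^2)/((1-z^2)(1-qz^2))$.

Generating function

\[ \frac{(\beta e^{i\theta}z, \beta e^{-i\theta}z;q)_\infty}{(e^{i\theta}z, e^{-i\theta}z;q)_\infty} = \sum_{n=0}^{\infty} C_n(\cos\theta; \beta \mid q)\, z^n \quad (|z| < 1,\ 0 \le \theta \le \pi,\ -1 < \beta < 1) \tag*{[76]} \]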


Special case: the continuous q-Hermite polynomials

\[ H_n(x \mid q) = (q;q)_n\, C_n(x; 0 \mid q) \tag*{[77]} \]

Special cases: the Chebyshev polynomials

\[ C_n(\cos\theta; q \mid q) = U_n(\cos\theta) := \frac{\sin((n+1)\theta)}{\sin\theta} \tag*{[78]} \]
im (BDn C,,(cos 0; B|q) = T, (cos 0) 
611 (9; 4), 
:= cos(n0) (n > 0) [79] 
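The finite Fourier series for the continuous q-ultraspherical polynomials, $C_n(\cos\theta;\beta \mid q) = \sum_{k=0}^{n} \frac{(\beta;q)_k(\beta;q)_{n-k}}{(q;q)_k(q;q)_{n-k}} e^{i(n-2k)\theta}$, is easy to evaluate, and the Chebyshev special case $\beta = q$ provides a direct check. The sketch below uses illustrative names and sample values.

```python
import cmath
from math import prod, sin, cos

def qpoch(a, q, k):
    """Finite q-shifted factorial (a; q)_k."""
    return prod(1 - a * q**j for j in range(k))

def c_n(n, theta, beta, q):
    """Continuous q-ultraspherical polynomial C_n(cos(theta); beta | q) as a finite Fourier series."""
    total = sum(
        qpoch(beta, q, k) * qpoch(beta, q, n - k)
        / (qpoch(q, q, k) * qpoch(q, q, n - k))
        * cmath.exp(1j * (n - 2 * k) * theta)
        for k in range(n + 1)
    )
    return total.real  # the terms k and n - k are complex conjugates of each other

theta, q = 0.7, 0.4
chebyshev_case = c_n(3, theta, q, q)          # beta = q: every coefficient equals 1
u3 = sin(4 * theta) / sin(theta)              # U_3(cos(theta))
hermite_case = (1 - q) * c_n(1, theta, 0, q)  # (q;q)_1 C_1(x; 0 | q), which equals 2 cos(theta)
```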
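
q-Racah Polynomials

Definition as q-hypergeometric series ($n = 0, 1, \ldots, N$)

\[ R_n(q^{-y} + \gamma\delta q^{y+1}; \alpha, \beta, \gamma, \delta \mid q) := {}_4\phi_3\!\left(q^{-n}, q^{n+1}\alpha\beta, q^{-y}, \gamma\delta q^{y+1};\ q\alpha, q\beta\delta, q\gamma;\ q, q\right) \quad (\alpha,\ \beta\delta \text{ or } \gamma = q^{-N-1}) \tag*{[80]} \]

Orthogonality relation

\[ \sum_{y=0}^{N} R_n(q^{-y} + \gamma\delta q^{y+1})\, R_m(q^{-y} + \gamma\delta q^{y+1})\, \omega_y = h_n\delta_{n,m} \tag*{[81]} \]

where $\omega_y$ and $h_n$ can be explicitly given.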
Big q-Jacobi Polynomials

Definition as q-hypergeometric series

\[ P_n(x) = P_n(x; a, b, c; q) := {}_3\phi_2(q^{-n}, q^{n+1}ab, x;\ qa, qc;\ q, q) \tag*{[82]} \]

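Orthogonality relation

\[ \int_{qc}^{qa} P_n(x)P_m(x)\, \frac{(a^{-1}x, c^{-1}x;q)_\infty}{(x, bc^{-1}x;q)_\infty}\, d_qx = h_n\delta_{n,m} \quad (0 < a < q^{-1},\ 0 < b < q^{-1},\ c < 0) \tag*{[83]} \]

where $h_n$ can be explicitly given.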


q-Difference equation

\[ A(x)P_n(qx) - (A(x) + C(x))P_n(x) + C(x)P_n(q^{-1}x) = (q^{-n} - 1)(1 - abq^{n+1})P_n(x) \tag*{[84]} \]

where $A(x) = aq(x-1)(bx-c)/x^2$ and $C(x) = (x-qa)(x-qc)/x^2$.


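Limit case: Jacobi polynomials $P_n^{(\alpha,\beta)}(x)$

\[ \lim_{q\uparrow 1} P_n(x; q^\alpha, q^\beta, -q^{-1}d; q) = \frac{n!}{(\alpha+1)_n}\, P_n^{(\alpha,\beta)}\!\left(\frac{2x + d - 1}{d + 1}\right) \tag*{[85]} \]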




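Special case: the little q-Jacobi polynomials

\[ p_n(x; a, b; q) := (-b)^{-n} q^{-(1/2)n(n+1)}\, \frac{(qb;q)_n}{(qa;q)_n}\, P_n(qbx; b, a, 0; q) \tag*{[86]} \]

\[ \phantom{p_n(x; a, b; q) :} = {}_2\phi_1(q^{-n}, q^{n+1}ab; qa; q, qx) \tag*{[87]} \]

which satisfy orthogonality relation (for $0 < a < q^{-1}$ and $b < q^{-1}$)

\[ \int_0^1 p_n(x; a, b; q)\, p_m(x; a, b; q)\, \frac{(qx;q)_\infty}{(qbx;q)_\infty}\, x^{\log_q a}\, d_qx = \frac{(q, qab;q)_\infty}{(qa, qb;q)_\infty}\, \frac{(1-q)(qa)^n}{1 - abq^{2n+1}}\, \frac{(q, qb;q)_n}{(qa, qab;q)_n}\, \delta_{n,m} \tag*{[88]} \]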

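Limit case: Jackson's third q-Bessel function (see [22])

\[ \lim_{N\to\infty} p_{N-n}(q^{N+k}; q^\nu, b; q) = \frac{(q;q)_\infty}{(q^{\nu+1};q)_\infty}\, q^{-(1/2)\nu(n+k)}\, J_\nu^{(3)}(2q^{(1/2)(n+k)}; q) \quad (\nu > -1) \tag*{[89]} \]

by which [88] tends to the orthogonality relation for $J_\nu^{(3)}(x;q)$:

\[ \sum_{k=-\infty}^{\infty} J_\nu^{(3)}(2q^{(1/2)(n+k)}; q)\, J_\nu^{(3)}(2q^{(1/2)(m+k)}; q)\, q^k = \delta_{n,m}\, q^{-n} \quad (n, m \in \mathbb{Z}) \tag*{[90]} \]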


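q-Hahn Polynomials

Definition as q-hypergeometric series

\[ Q_n(x; \alpha, \beta, N; q) := {}_3\phi_2(q^{-n}, q^{n+1}\alpha\beta, x;\ q\alpha, q^{-N};\ q, q) \quad (n = 0, 1, \ldots, N) \tag*{[91]} \]

Orthogonality relation

\[ \sum_{y=0}^{N} Q_n(q^{-y})\, Q_m(q^{-y})\, \frac{(q\alpha, q^{-N};q)_y\, (q\alpha\beta)^{-y}}{(q^{-N}\beta^{-1}, q;q)_y} = h_n\delta_{n,m} \tag*{[92]} \]

where $h_n$ can be explicitly given.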


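Stieltjes-Wigert Polynomials

Definition as q-hypergeometric series

\[ S_n(x; q) := \frac{1}{(q;q)_n}\; {}_1\phi_1(q^{-n}; 0; q, -q^{n+1}x) \tag*{[93]} \]

The orthogonality measure is not uniquely determined:

\[ \int_0^{\infty} S_n(q^{1/2}x; q)\, S_m(q^{1/2}x; q)\, w(x)\, dx = \frac{1}{q^n (q;q)_n}\, \delta_{n,m} \]

where, for instance

\[ w(x) = \frac{q^{1/2}}{\log(q^{-1})\, (q, -q^{1/2}x, -q^{1/2}x^{-1};q)_\infty} \quad \text{or} \quad w(x) = \frac{q^{1/2}}{\sqrt{2\pi\log(q^{-1})}}\, \exp\!\left(-\frac{\log^2 x}{2\log(q^{-1})}\right) \tag*{[94]} \]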


Rahman-Wilson Biorthogonal Rational Functions

The following functions are rational in their first argument:

\[ R_n\!\left(\tfrac{1}{2}(z + z^{-1}); a, b, c, d, e\right) := {}_{10}W_9\!\left(a/e;\ q/(be), q/(ce), q/(de), az, a/z, q^{n-1}abcd, q^{-n};\ q, q\right) \tag*{[95]} \]


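They satisfy the biorthogonality relation

\[ \frac{1}{2\pi i} \oint_C R_n\!\left(\tfrac{1}{2}(z + z^{-1}); a, b, c, d, e\right) R_m\!\left(\tfrac{1}{2}(z + z^{-1}); a, b, c, d, \frac{q}{abcde}\right) w(z)\, \frac{dz}{z} = 2h_n\delta_{n,m} \tag*{[96]} \]

where the contour C is as in [70], and where

\[ w(z) := \frac{(z^2, z^{-2}, abcdez, abcde/z;q)_\infty}{(az, a/z, bz, b/z, cz, c/z, dz, d/z, ez, e/z;q)_\infty} \tag*{[97]} \]

\[ h_0 := \frac{(bcde, acde, abde, abce, abcd;q)_\infty}{(q, ab, ac, ad, ae, bc, bd, be, cd, ce, de;q)_\infty} \tag*{[98]} \]

and $h_n/h_0$ can also be given explicitly. For $ab = q^{-N}$, $n, m \in \{0, 1, \ldots, N\}$, there is a related discrete biorthogonality of the form

\[ \sum_{k=0}^{N} R_n\!\left(\tfrac{1}{2}(aq^k + a^{-1}q^{-k}); a, b, c, d, e\right) R_m\!\left(\tfrac{1}{2}(aq^k + a^{-1}q^{-k}); a, b, c, d, \frac{q}{abcde}\right) w_k = 0 \quad (n \neq m) \tag*{[99]} \]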

Identities and Functions Associated 
with Root Systems 


η-Function Identities


Let R be a root system on a Euclidean space of dimension l. Then Macdonald (1972) generalizes Weyl’s denominator formula to the case of an affine root system. The resulting formula can be written as an explicit expansion in powers of q of

\[ \prod_{n=1}^{\infty}\Big((1-q^n)^l \prod_{\alpha\in R}(1 - q^n e^{\alpha})\Big) \]

which expansion takes the form of a sum over a lattice related to the root system. For root system $A_1$ this reduces to Jacobi’s triple product identity [63]. Macdonald’s formula implies a similar expansion in powers of q of $\eta(q)^{l+|R|}$, where $\eta(q)$ is “Dedekind’s η-function” $\eta(q) := q^{1/24}(q;q)_\infty$.


Constant Term Identities 


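Let R be a reduced root system, $R^+$ the positive roots, and $k \in \mathbb{Z}_{>0}$. Macdonald conjectured the second equality in

\[ \int_T \prod_{\alpha\in R^+}(e^{-\alpha};q)_k\,(qe^{\alpha};q)_k\,dx = \mathrm{CT}\Big(\prod_{\alpha\in R^+}\prod_{i=1}^{k}(1-q^{i-1}e^{-\alpha})(1-q^{i}e^{\alpha})\Big) = \prod_{i=1}^{l}\begin{bmatrix} kd_i \\ k \end{bmatrix}_q \qquad [100] \]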


where T is a torus determined by R, CT means the constant term in the Laurent expansion in $e^{\alpha}$, and the $d_i$ are the degrees of the fundamental invariants of the Weyl group of R. The conjecture was extended for real k > 0, for several parameters k (one for each root length), and for root system $BC_n$, where Gustafson’s five-parameter n-variable analog of the Askey–Wilson integral ([70] for n = 1) settles it:


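\[ \int_{[0,2\pi]^n} |\Delta(e^{i\theta_1},\ldots,e^{i\theta_n})|^2\,\frac{d\theta_1\cdots d\theta_n}{(2\pi)^n} = 2^n n!\prod_{j=1}^{n}\frac{(t, t^{n+j-2}abcd; q)_\infty}{(t^j, q, abt^{j-1}, act^{j-1}, adt^{j-1}, bct^{j-1}, bdt^{j-1}, cdt^{j-1}; q)_\infty} \qquad [101] \]

where

\[ \Delta(z) := \prod_{1\le i<j\le n}\frac{(z_iz_j, z_i/z_j; q)_\infty}{(tz_iz_j, tz_i/z_j; q)_\infty}\ \prod_{j=1}^{n}\frac{(z_j^2; q)_\infty}{(az_j, bz_j, cz_j, dz_j; q)_\infty} \qquad [102] \]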


Further extensions were in Macdonald’s conjectures 
for the quadratic norms of Macdonald polynomials 
associated with root systems (see the subsection 
“Macdonald—Koornwinder polynomials”), and finally 
proved by Cherednik. 


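Macdonald Polynomials for Root System $A_{n-1}$


Let $n \in \mathbb{Z}_{>0}$. We work with partitions $\lambda = (\lambda_1,\ldots,\lambda_n)$ of length ≤ n, where $\lambda_1 \ge \cdots \ge \lambda_n \ge 0$ are integers. On the set of such partitions, we take the partial order $\lambda \le \mu \iff \lambda_1 + \cdots + \lambda_n = \mu_1 + \cdots + \mu_n$ and $\lambda_1 + \cdots + \lambda_i \le \mu_1 + \cdots + \mu_i$ (i = 1, …, n − 1). Write λ < μ iff λ ≤ μ and λ ≠ μ. The monomials are $z^\mu = z_1^{\mu_1}\cdots z_n^{\mu_n}$ ($\mu_1,\ldots,\mu_n \in \mathbb{Z}_{\ge 0}$). For λ a partition the symmetrized monomials $m_\lambda(z)$ and the Schur functions $s_\lambda(z)$ are defined by:

\[ m_\lambda(z) := \sum_{\mu} z^\mu \quad (\text{sum over all distinct permutations } \mu \text{ of } (\lambda_1,\ldots,\lambda_n)) \qquad [103] \]

\[ s_\lambda(z) := \frac{\det(z_i^{\lambda_j+n-j})_{i,j=1,\ldots,n}}{\det(z_i^{n-j})_{i,j=1,\ldots,n}} \qquad [104] \]

We integrate a function over the torus $T := \{z \in \mathbb{C}^n \mid |z_1| = \cdots = |z_n| = 1\}$ as

\[ \int_T f(z)\,dz := \frac{1}{(2\pi)^n}\int_0^{2\pi}\!\cdots\int_0^{2\pi} f(e^{i\theta_1},\ldots,e^{i\theta_n})\,d\theta_1\ldots d\theta_n \qquad [105] \]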
0 0 
Definition For λ a partition and for 0 < t < 1, the (analytically defined) Macdonald polynomial $P_\lambda(z) = P_\lambda(z;q,t)$ is of the form

\[ P_\lambda(z) = P_\lambda(z;q,t) = m_\lambda(z) + \sum_{\mu<\lambda} u_{\lambda,\mu}\,m_\mu(z) \qquad (u_{\lambda,\mu} \in \mathbb{C}) \]

such that for all μ < λ

\[ \int_T P_\lambda(z)\,\overline{m_\mu(z)}\,\Delta(z)\,dz = 0 \]

where

\[ \Delta(z) = \Delta(z;q,t) := \prod_{i\ne j}\frac{(z_iz_j^{-1}; q)_\infty}{(tz_iz_j^{-1}; q)_\infty} \qquad [106] \]
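Orthogonality relation

\[ \frac{1}{n!}\int_T P_\lambda(z)\,\overline{P_\mu(z)}\,\Delta(z)\,dz = \prod_{i<j}\frac{(q^{\lambda_i-\lambda_j}t^{j-i},\ q^{\lambda_i-\lambda_j+1}t^{j-i};\ q)_\infty}{(q^{\lambda_i-\lambda_j}t^{j-i+1},\ q^{\lambda_i-\lambda_j+1}t^{j-i-1};\ q)_\infty}\,\delta_{\lambda,\mu} \qquad [107] \]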
q-Difference equation

\[ \sum_{i=1}^{n}\prod_{j\ne i}\frac{tz_i - z_j}{z_i - z_j}\,T_{q,z_i}P_\lambda(z;q,t) = \Big(\sum_{i=1}^{n} q^{\lambda_i}t^{n-i}\Big)P_\lambda(z;q,t) \qquad [108] \]

where $T_{q,z_i}$ is the q-shift operator: $T_{q,z_i}f(z_1,\ldots,z_n) := f(z_1,\ldots,qz_i,\ldots,z_n)$. See (Macdonald 1995, ch. VI, §3) for the full system of q-difference equations.


Special value 


tq’; a)y- 
Ds hog 
i (FD, 
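Restriction of number of variables

\[ P_{\lambda_1,\ldots,\lambda_{n-1},0}(z_1,\ldots,z_{n-1},0;\ q,t) = P_{\lambda_1,\ldots,\lambda_{n-1}}(z_1,\ldots,z_{n-1};\ q,t) \qquad [110] \]

Homogeneity

\[ P_{\lambda_1,\ldots,\lambda_n}(z;q,t) = z_1\cdots z_n\,P_{\lambda_1-1,\ldots,\lambda_n-1}(z;q,t) \qquad (\lambda_n > 0) \qquad [111] \]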
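Self-duality Let λ, μ be partitions.

\[ \frac{P_\lambda(q^{\mu_1}t^{n-1}, q^{\mu_2}t^{n-2},\ldots,q^{\mu_n};\ q,t)}{P_\lambda(t^{n-1}, t^{n-2},\ldots,1;\ q,t)} = \frac{P_\mu(q^{\lambda_1}t^{n-1}, q^{\lambda_2}t^{n-2},\ldots,q^{\lambda_n};\ q,t)}{P_\mu(t^{n-1}, t^{n-2},\ldots,1;\ q,t)} \qquad [112] \]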


Special cases and limit relations
Continuous q-ultraspherical polynomials (see [72]):

\[ P_{m,n}(re^{i\theta}, re^{-i\theta};\ q,t) = \frac{(q;q)_{m-n}}{(t;q)_{m-n}}\,r^{m+n}\,C_{m-n}(\cos\theta;\ t \mid q) \qquad [113] \]

Symmetrized monomials (see [103]):

\[ P_\lambda(z;q,1) = m_\lambda(z) \qquad [114] \]

Schur functions (see [104]):

\[ P_\lambda(z;q,q) = s_\lambda(z) \qquad [115] \]

Hall-Littlewood polynomials (see Macdonald (1995), ch. III):

\[ P_\lambda(z;0,t) = P_\lambda(z;t) \qquad [116] \]

Jack polynomials (see Macdonald (1995), §VI.10):

\[ \lim_{q\uparrow 1} P_\lambda(z;q,q^{a}) = P_\lambda^{(1/a)}(z) \qquad [117] \]
q 


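Algebraic definition of Macdonald polynomials
Macdonald polynomials can also be defined algebraically. We work now with partitions λ ($\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$) of arbitrary length l(λ), and with symmetric polynomials in arbitrarily many variables $x_1, x_2, \ldots$, which can be canonically extended to symmetric functions in infinitely many variables $x_1, x_2, \ldots$. The rth power sum $p_r$ and the symmetric functions $p_\lambda$ are formally defined by

\[ p_r := \sum_{i\ge 1} x_i^r, \qquad p_\lambda := p_{\lambda_1}p_{\lambda_2}\cdots \qquad [118] \]

Put

\[ z_\lambda := \prod_{i\ge 1} i^{m_i}\,m_i! \quad \text{where } m_i = m_i(\lambda) \text{ is the number of parts of } \lambda \text{ equal to } i. \qquad [119] \]

Define an inner product $\langle\ ,\ \rangle_{q,t}$ on the space of symmetric functions such that

\[ \langle p_\lambda, p_\mu\rangle_{q,t} = \delta_{\lambda,\mu}\,z_\lambda\prod_{i=1}^{l(\lambda)}\frac{1-q^{\lambda_i}}{1-t^{\lambda_i}} \qquad [120] \]

For partitions λ, μ the partial ordering λ ≥ μ means now that $\sum_{j\ge 1}\lambda_j = \sum_{j\ge 1}\mu_j$ and $\lambda_1 + \cdots + \lambda_i \ge \mu_1 + \cdots + \mu_i$ for all i. The Macdonald polynomial $P_\lambda(x;q,t)$ can now be algebraically defined as the unique symmetric function $P_\lambda$ of the form $P_\lambda = \sum_{\mu\le\lambda} u_{\lambda,\mu}\,m_\mu$ ($u_{\lambda,\mu} \in \mathbb{C}$, $u_{\lambda,\lambda} = 1$) such that

\[ \langle P_\lambda, P_\mu\rangle_{q,t} = 0 \quad \text{if } \lambda \ne \mu \qquad [121] \]

If l(λ) ≤ n, then the newly defined $P_\lambda(x)$ with $x_{n+1} = x_{n+2} = \cdots = 0$ coincides with $P_\lambda(x;q,t)$ defined analytically, and the new inner product is a constant multiple (depending on n) of the old inner product.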
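
As a small illustration of [119] and [120], the following sketch computes $z_\lambda$ and the norm of a power-sum product for rational q and t. The function names (`partitions`, `z`, `norm_p`) are illustrative choices, not notation from the text; the final check uses the classical fact that the numbers $1/z_\lambda$ over all partitions of n sum to 1.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def z(lam):
    """z_lambda = prod_i i^{m_i} * m_i!, with m_i the multiplicity of i in lam."""
    result = 1
    for part, mult in Counter(lam).items():
        result *= part ** mult * factorial(mult)
    return result


def norm_p(lam, q, t):
    """<p_lam, p_lam>_{q,t} = z_lam * prod_i (1 - q^lam_i) / (1 - t^lam_i)."""
    value = Fraction(z(lam))
    for part in lam:
        value *= Fraction(1 - q ** part) / (1 - t ** part)
    return value


# Sanity check: the reciprocals 1/z_lam over partitions of 5 add up to 1.
total = sum(Fraction(1, z(lam)) for lam in partitions(5))
```

For t = q the product in [120] is identically 1, so `norm_p(lam, q, q)` reduces to `z(lam)`.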


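Bilinear sum

\[ \sum_{\lambda}\frac{1}{\langle P_\lambda, P_\lambda\rangle_{q,t}}\,P_\lambda(x;q,t)\,P_\lambda(y;q,t) = \prod_{i,j\ge 1}\frac{(tx_iy_j; q)_\infty}{(x_iy_j; q)_\infty} \qquad [122] \]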


Generalized Kostka numbers The Kostka numbers $K_{\lambda,\mu}$ occurring as expansion coefficients in $s_\lambda = \sum_\mu K_{\lambda,\mu}\,m_\mu$ were generalized by Macdonald to coefficients $K_{\lambda,\mu}(q,t)$ occurring in connection with Macdonald polynomials, see Macdonald (1995, §VI.8). Macdonald’s conjecture that $K_{\lambda,\mu}(q,t)$ is a polynomial in q and t with coefficients in $\mathbb{Z}_{\ge 0}$ was fully proved in Haiman (2001).


Macdonald—Koornwinder Polynomials 


Macdonald (2000, 2001) also introduced Macdonald 
polynomials associated with an arbitrary root 
system. For root system $BC_n$ this yields a three-
parameter family which can be extended to the
five-parameter Macdonald—Koornwinder (M-K) poly- 
nomials (Koornwinder 1992). They are orthogonal 
with respect to the measure occurring in [101] with 
A(z) given by [102]. The M-K polynomials are 
n-variable analogs of the Askey—Wilson polynomials. 
All polynomials just discussed tend, for q ↑ 1, to
Jacobi polynomials associated with root systems. 
Macdonald conjectured explicit expressions for 
the quadratic norms of the Macdonald polynomials 
associated with root systems and of the M-K 
polynomials. These were proved by Cherednik by 
considering these polynomials as Weyl group 
symmetrizations of non-invariant polynomials 


which are related to double affine Hecke algebras 
(see Macdonald (2003)). 


Elliptic Hypergeometric Series 


Let $p, q \in \mathbb{C}$, |p|, |q| < 1. Define a modified Jacobi theta function by

\[ \theta(x;p) := (x, p/x;\ p)_\infty \qquad (x \ne 0) \qquad [123] \]

and the elliptic shifted factorial by

\[ (a;q,p)_k := \theta(a;p)\,\theta(aq;p)\cdots\theta(aq^{k-1};p) \quad (k \in \mathbb{Z}_{>0}), \qquad (a;q,p)_0 := 1 \qquad [124] \]

\[ (a_1,\ldots,a_r;\ q,p)_k := (a_1;q,p)_k\cdots(a_r;q,p)_k \qquad [125] \]


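where $a, a_1, \ldots, a_r \ne 0$. For $q = e^{2\pi i\sigma}$, $p = e^{2\pi i\tau}$ ($\Im\tau > 0$), and $a \in \mathbb{C}$ we have

\[ \frac{\theta(ae^{2\pi i\sigma(x+\sigma^{-1})};\ e^{2\pi i\tau})}{\theta(ae^{2\pi i\sigma x};\ e^{2\pi i\tau})} = 1, \qquad \frac{\theta(ae^{2\pi i\sigma(x+\tau\sigma^{-1})};\ e^{2\pi i\tau})}{\theta(ae^{2\pi i\sigma x};\ e^{2\pi i\tau})} = -a^{-1}q^{-x} \qquad [126] \]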
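
The second relation in [126] is the quasi-periodicity $\theta(px;p) = -x^{-1}\theta(x;p)$ written with $x \mapsto aq^{x}$. A minimal numerical sketch of [123] and [124] follows, using truncated infinite products; the function names and the truncation length `terms` are assumptions made for illustration only.

```python
def theta(x, p, terms=200):
    """Truncated modified theta function theta(x; p) = (x; p)_inf * (p/x; p)_inf."""
    result = 1 + 0j
    for k in range(terms):
        result *= (1 - x * p ** k) * (1 - p ** (k + 1) / x)
    return result


def elliptic_shifted_factorial(a, q, p, k):
    """(a; q, p)_k = theta(a; p) * theta(a q; p) * ... * theta(a q^(k-1); p)."""
    result = 1 + 0j
    for j in range(k):
        result *= theta(a * q ** j, p)
    return result


# Quasi-periodicity check at an arbitrary sample point with |p| < 1.
x, p = 0.3 + 0.4j, 0.2 + 0.1j
defect = abs(theta(p * x, p) + theta(x, p) / x)
```

With p = 0 the theta function collapses to 1 − x, so `elliptic_shifted_factorial(a, q, 0, k)` reduces to the ordinary q-shifted factorial $(a;q)_k$.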
A series $\sum_{k=0}^{\infty} c_k$ with $c_{k+1}/c_k$ being an elliptic (i.e., doubly periodic meromorphic) function of k considered as a complex variable is called an elliptic hypergeometric series. In particular, define the ${}_rE_{r-1}$ theta hypergeometric series as the formal series

\[ {}_rE_{r-1}(a_1,\ldots,a_r;\ b_1,\ldots,b_{r-1};\ q,p;\ z) := \sum_{k=0}^{\infty}\frac{(a_1,\ldots,a_r;\ q,p)_k}{(b_1,\ldots,b_{r-1};\ q,p)_k\,(q;q,p)_k}\,z^k \qquad [127] \]

It has $g(k) := c_{k+1}/c_k$ with

\[ g(x) = \frac{z\,\theta(a_1q^x;p)\cdots\theta(a_rq^x;p)}{\theta(q^{x+1};p)\,\theta(b_1q^x;p)\cdots\theta(b_{r-1}q^x;p)} \]

By [126], g(x) is an elliptic function with periods $\sigma^{-1}$ and $\tau\sigma^{-1}$ ($q = e^{2\pi i\sigma}$, $p = e^{2\pi i\tau}$) if the balancing condition $a_1\cdots a_r = q\,b_1\cdots b_{r-1}$ is satisfied.

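The ${}_rV_{r-1}$ very well-poised theta hypergeometric series (a special ${}_rE_{r-1}$) is defined, in case of argument 1, as:

\[ {}_rV_{r-1}(a_1;\ a_6,\ldots,a_r;\ q,p) := \sum_{k=0}^{\infty}\frac{\theta(a_1q^{2k};p)}{\theta(a_1;p)}\,\frac{(a_1, a_6,\ldots,a_r;\ q,p)_k}{(qa_1/a_6,\ldots,qa_1/a_r;\ q,p)_k}\,\frac{q^k}{(q;q,p)_k} \qquad [128] \]

The series is called balanced if $a_6^2\cdots a_r^2 = a_1^{r-6}q^{r-8}$.

The series terminates if, for instance, $a_r = q^{-n}$.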
Elliptic Analog of Jackson’s ${}_8\phi_7$ Summation

\[ {}_{10}V_9\big(a;\ b, c, d, q^{n+1}a^2/(bcd), q^{-n};\ q,p\big) = \frac{(qa, qa/(bc), qa/(bd), qa/(cd);\ q,p)_n}{(qa/b, qa/c, qa/d, qa/(bcd);\ q,p)_n} \qquad [129] \]


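Elliptic Analog of Bailey’s ${}_{10}\phi_9$ Transformation

\[ {}_{12}V_{11}\Big(a;\ b,c,d,e,f,\frac{q^{n+2}a^3}{bcdef}, q^{-n};\ q,p\Big)
 = \frac{(qa, qa/(ef), (qa)^2/(bcde), (qa)^2/(bcdf);\ q,p)_n}{(qa/e, qa/f, (qa)^2/(bcdef), (qa)^2/(bcd);\ q,p)_n}
 \times {}_{12}V_{11}\Big(\frac{qa^2}{bcd};\ \frac{qa}{cd},\frac{qa}{bd},\frac{qa}{bc}, e, f, \frac{q^{n+2}a^3}{bcdef}, q^{-n};\ q,p\Big) \qquad [130] \]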





Suitable 12V11 functions satisfy a discrete biortho- 
gonality relation which is an elliptic analog of [99]. 


Ruijsenaars’ elliptic gamma function 


\[ \Gamma(z;q,p) := \prod_{j,k=0}^{\infty}\frac{1 - z^{-1}q^{j+1}p^{k+1}}{1 - zq^{j}p^{k}} \qquad [131] \]

which is symmetric in p and q. Then

\[ \Gamma(qz;q,p) = \theta(z;p)\,\Gamma(z;q,p), \qquad \Gamma(q^kz;q,p) = (z;q,p)_k\,\Gamma(z;q,p) \qquad [132] \]

Applications 


Quantum Groups 


A specific quantum group is usually a Hopf algebra 
which is a q-deformation of the Hopf algebra of 
functions on a specific Lie group or, dually, of a 
universal enveloping algebra (viewed as Hopf 
algebra) of a Lie algebra. The general philosophy is 
that representations of the Lie group or Lie algebra 
also deform to representations of the quantum 
group, and that special functions associated with 
the representations in the classical case deform to 
q-special functions associated with the representa- 
tions in the quantum case. Sometimes this is 
straightforward, but often new subtle phenomena 
occur. 

The representation-theoretic objects which may 
be explicitly written in terms of q-special functions 
include matrix elements of representations with 
respect to specific bases (in particular spherical 
elements), Clebsch-Gordan coefficients and Racah 
coefficients. Many one-variable q-hypergeometric 
functions have found interpretation in some way 
in connection with a quantum analog of a three- 
dimensional Lie group (generically the Lie group 
SL(2,C) and its real forms). Classical by now are: little q-Jacobi polynomials interpreted as matrix elements of irreducible representations of $SU_q(2)$ with respect to the standard basis; Askey–Wilson polynomials similarly interpreted with respect to a certain basis not coming from a quantum subgroup; Jackson’s third q-Bessel functions as matrix elements of irreducible representations of $E_q(2)$; q-Hahn polynomials and q-Racah polynomials interpreted as Clebsch–Gordan coefficients and Racah coefficients, respectively, for $SU_q(2)$.

Further developments include: Macdonald polynomials as spherical elements on quantum analogs of compact Riemannian symmetric spaces; q-analogs of Jacobi functions as matrix elements of irreducible unitary representations of $SU_q(1,1)$; Askey–Wilson polynomials as matrix elements of representations of the SU(2) dynamical quantum group; an interpretation of discrete ${}_{12}V_{11}$ biorthogonality relations on the elliptic U(2) quantum group.

Since the q-deformed Hopf algebras are usually presented by generators and relations, identities for q-special functions involving noncommuting variables satisfying simple relations are important for further interpretations of q-special functions in quantum groups, for instance:


q-Binomial formula with q-commuting variables

\[ (x+y)^n = \sum_{k=0}^{n}\begin{bmatrix} n \\ k \end{bmatrix}_q y^{n-k}x^{k} \qquad (xy = qyx) \qquad [133] \]
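
The q-binomial formula for variables with xy = qyx can be checked mechanically by bringing every word in x and y to the normal order $y^a x^b$, using $x^b y = q^b\,y\,x^b$. The sketch below does this for a rational value of q; the helper names are hypothetical, and the check only confirms that the coefficient of $y^a x^{n-a}$ is the Gaussian binomial coefficient.

```python
from fractions import Fraction


def gaussian_binomial(n, k, q):
    """q-binomial coefficient [n choose k]_q for a numeric q that is not a root of unity."""
    num = den = Fraction(1)
    for i in range(1, k + 1):
        num *= 1 - q ** (n - k + i)
        den *= 1 - q ** i
    return num / den


def expand_power(n, q):
    """Expand (x + y)^n with x*y = q*y*x; return {(a, b): coeff} for monomials y^a x^b."""
    poly = {(0, 0): Fraction(1)}
    for _ in range(n):
        nxt = {}
        for (a, b), c in poly.items():
            # Right-multiplying y^a x^b by x needs no reordering.
            nxt[(a, b + 1)] = nxt.get((a, b + 1), 0) + c
            # Right-multiplying by y: move y past x^b, picking up q^b.
            nxt[(a + 1, b)] = nxt.get((a + 1, b), 0) + c * q ** b
        poly = nxt
    return poly


q = Fraction(1, 3)
n = 5
coefficients = expand_power(n, q)
matches = all(coefficients[(a, n - a)] == gaussian_binomial(n, a, q) for a in range(n + 1))
```

Exact rational arithmetic is used so that the comparison is an equality rather than a floating-point tolerance.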


Functional equations for q-exponentials with xy = qyx

\[ e_q(x+y) = e_q(y)\,e_q(x), \qquad E_q(x+y) = E_q(x)\,E_q(y) \qquad [134] \]

\[ e_q(x+y-yx) = e_q(x)\,e_q(y), \qquad E_q(x+y+yx) = E_q(y)\,E_q(x) \qquad [135] \]

Various Algebraic Settings

Classical groups over finite fields (Chevalley groups) q-Hahn polynomials and various kinds of q-Krawtchouk polynomials have interpretations as spherical and intertwining functions on classical groups ($GL_n$, $SO_n$, $Sp_n$) over a finite field $F_q$ with respect to suitable subgroups, see Stanton (1984).


Affine Kac–Moody algebras (see Lepowsky (1982)) The Rogers–Ramanujan identities [57], [58] and some of their generalizations were interpreted in the context of characters of representations of the simplest affine Kac–Moody algebra $A_1^{(1)}$.


Macdonald’s generalization of Weyl’s denominator 
formula to affine root systems has an interpretation 
as an identity for the denominator of the character 
of a representation of an affine Kac-Moody 
algebra. 


Partitions of Positive Integers 


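Let n be a positive integer, p(n) the number of partitions of n, $p_N(n)$ the number of partitions of n into parts ≤ N, $p_{\mathrm{dist}}(n)$ the number of partitions of n into distinct parts, and $p_{\mathrm{odd}}(n)$ the number of partitions of n into odd parts. Then, Euler observed:

\[ \frac{1}{(q;q)_\infty} = \sum_{n=0}^{\infty} p(n)\,q^n, \qquad \frac{1}{(q;q)_N} = \sum_{n=0}^{\infty} p_N(n)\,q^n \qquad [136] \]

\[ (-q;q)_\infty = \sum_{n=0}^{\infty} p_{\mathrm{dist}}(n)\,q^n, \qquad \frac{1}{(q;q^2)_\infty} = \sum_{n=0}^{\infty} p_{\mathrm{odd}}(n)\,q^n \qquad [137] \]

and

\[ (-q;q)_\infty = \frac{1}{(q;q^2)_\infty}, \qquad p_{\mathrm{dist}}(n) = p_{\mathrm{odd}}(n) \qquad [138] \]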
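
Euler's identity $p_{\mathrm{dist}}(n) = p_{\mathrm{odd}}(n)$ is easy to confirm for small n by expanding the generating products as truncated power series. The sketch below does this with a standard coin-change style recurrence; the function names are illustrative and not taken from the text.

```python
def count_partitions(n, allowed, distinct=False):
    """Count partitions of n with parts from `allowed` (each part at most once if distinct)."""
    ways = [1] + [0] * n
    for part in allowed:
        if distinct:
            totals = range(n, part - 1, -1)  # descending: each part used at most once
        else:
            totals = range(part, n + 1)      # ascending: unlimited repetition
        for total in totals:
            ways[total] += ways[total - part]
    return ways[n]


def p(n):
    return count_partitions(n, range(1, n + 1))


def p_bounded(n, N):
    """Partitions of n into parts <= N."""
    return count_partitions(n, range(1, N + 1))


def p_dist(n):
    return count_partitions(n, range(1, n + 1), distinct=True)


def p_odd(n):
    return count_partitions(n, range(1, n + 1, 2))


euler_holds = all(p_dist(n) == p_odd(n) for n in range(1, 40))
```

For example, 5 has the three partitions 5, 4+1, 3+2 into distinct parts and the three partitions 5, 3+1+1, 1+1+1+1+1 into odd parts.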


The Rogers-Ramanujan identity [57] has the 
following partition-theoretic interpretation: the 
number of partitions of n with parts differing at 
least 2 equals the number of partitions of n into 
parts congruent to 1 or 4 (mod 5). Similarly, [58] 
yields: the number of partitions of n with parts 
larger than 1 and differing at least 2 equals the 
number of partitions of n into parts congruent to 
2 or 3 (mod 5). 
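
Both partition-theoretic statements can be tested by brute-force counting. The following sketch counts partitions whose parts differ by at least 2 (optionally with a lower bound on the parts) and partitions into parts in prescribed residue classes mod 5; the function names are assumptions made for this illustration.

```python
from functools import lru_cache


def gap_partitions(n, min_part=1):
    """Number of partitions of n whose parts differ by at least 2 and are >= min_part."""
    @lru_cache(maxsize=None)
    def count(remaining, smallest_allowed):
        # Parts are chosen in increasing order; the next part must be >= smallest_allowed.
        if remaining == 0:
            return 1
        return sum(count(remaining - part, part + 2)
                   for part in range(smallest_allowed, remaining + 1))
    return count(n, min_part)


def residue_partitions(n, residues, modulus=5):
    """Number of partitions of n into parts congruent to one of `residues` mod `modulus`."""
    ways = [1] + [0] * n
    for part in range(1, n + 1):
        if part % modulus in residues:
            for total in range(part, n + 1):
                ways[total] += ways[total - part]
    return ways[n]


first_identity = all(gap_partitions(n) == residue_partitions(n, {1, 4}) for n in range(1, 31))
second_identity = all(gap_partitions(n, 2) == residue_partitions(n, {2, 3}) for n in range(1, 31))
```

For n = 9 both sides of the first statement equal 5: the partitions 9, 8+1, 7+2, 6+3, 5+3+1 on one side and 9, 6+1+1+1, 4+4+1, 4+1+1+1+1+1, 1+…+1 on the other.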

The left-hand sides of the Rogers-Ramanujan 
identities [57] and [58] have interpretations in 
the “hard hexagon model,” see Baxter (1982). 
Much further work has been done on Rogers- 
Ramanujan-type identities in connection with 
more general models in statistical mechanics. The 
so-called “fermionic expressions” do occur. 


See also: Combinatorics: Overview; Eight Vertex and 

Hard Hexagon Models; Hopf Algebras and q-Deformation
Quantum Groups; Integrable Systems: Overview; Ordinary 
Special Functions; Solitons and Kac—Moody Lie Algebras. 
Introduction


The idea to derive topological invariants of smooth 
manifolds from partition functions of certain action 
functionals was suggested by A Schwarz (1978) and 
highlighted by E Witten (1988). Witten interpreted 
the Jones polynomial of links in the 3-sphere $S^3$ as a
partition function of the Chern—Simons field theory. 
Witten conjectured the existence of mathematically 
defined topological invariants of 3-manifolds, gen- 
eralizing the Jones polynomial (or rather its values 
in complex roots of unity) to links in arbitrary 
closed oriented 3-manifolds. A rigorous construction 
of such invariants was given by N Reshetikhin and 
V Turaev (1989) using the theory of quantum 
groups. The Witten—Reshetikhin—Turaev invariants 
of 3-manifolds, also called the “quantum invariants,” extend to a topological quantum field theory
(TQFT) in dimension 3.
Ribbon and Modular Categories


The Reshetikhin—Turaev approach begins with fixing 
suitable algebraic data, which are best described in terms 
of monoidal categories. Let C be a monoidal category 
(i.e., a category with an associative tensor product and 
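unit object $\mathbf{1}$). A “braiding” in C assigns to any objects V, W ∈ C an invertible morphism $c_{V,W}: V \otimes W \to W \otimes V$ such that, for any U, V, W ∈ C,

\[ c_{U,V\otimes W} = (\mathrm{id}_V \otimes c_{U,W})(c_{U,V} \otimes \mathrm{id}_W) \]
\[ c_{U\otimes V,W} = (c_{U,W} \otimes \mathrm{id}_V)(\mathrm{id}_U \otimes c_{V,W}) \]

A “twist” in C assigns to any object V ∈ C an invertible morphism $\theta_V: V \to V$ such that, for any V, W ∈ C,

\[ \theta_{V\otimes W} = c_{W,V}\,c_{V,W}\,(\theta_V \otimes \theta_W) \]

A “duality” in C assigns to any object V ∈ C a “dual” object $V^* \in C$, and evaluation and co-evaluation morphisms $d_V: V^* \otimes V \to \mathbf{1}$, $b_V: \mathbf{1} \to V \otimes V^*$ such that

\[ (\mathrm{id}_V \otimes d_V)(b_V \otimes \mathrm{id}_V) = \mathrm{id}_V \]
\[ (d_V \otimes \mathrm{id}_{V^*})(\mathrm{id}_{V^*} \otimes b_V) = \mathrm{id}_{V^*} \]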
The category C with duality, braiding, and twist is ribbon, if for any V ∈ C,

\[ (\theta_V \otimes \mathrm{id}_{V^*})\,b_V = (\mathrm{id}_V \otimes \theta_{V^*})\,b_V \]


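For an endomorphism f: V → V of an object V ∈ C, its trace $\mathrm{tr}(f) \in \mathrm{End}_C(\mathbf{1})$ is defined as

\[ \mathrm{tr}(f) = d_V\,c_{V,V^*}\big((\theta_V f) \otimes \mathrm{id}_{V^*}\big)\,b_V : \mathbf{1} \to \mathbf{1} \]

This trace shares a number of properties of the standard trace of matrices, in particular, tr(fg) = tr(gf) and tr(f ⊗ g) = tr(f)tr(g). For an object V ∈ C, set

\[ \dim(V) = \mathrm{tr}(\mathrm{id}_V) = d_V\,c_{V,V^*}(\theta_V \otimes \mathrm{id}_{V^*})\,b_V \]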


Ribbon categories nicely fit the theory of knots and links in $S^3$. A link $L \subset S^3$ is a closed one-dimensional submanifold of $S^3$. (A manifold is closed if it is compact and has no boundary.) A link is oriented (resp. framed) if all its components are oriented (resp. provided with a homotopy class of nonsingular normal vector fields). Given a framed oriented link $L \subset S^3$ whose components are labeled with objects of a ribbon category C, one defines a tensor $\langle L\rangle \in \mathrm{End}_C(\mathbf{1})$. To compute $\langle L\rangle$, present L by a plane diagram with only double transversal crossings such that the framing of L is orthogonal to the plane. Each double point of the diagram is an intersection of two branches of L, going over and under, respectively. Associate with such a crossing the tensor $(c_{V,W})^{\pm 1}$ where V, W ∈ C are the labels of these two branches and ±1 is the sign of the crossing determined by the orientation of L. We also associate certain tensors with the points of the diagram where the tangent line is parallel to a fixed axis on the plane. These tensors are derived from the evaluation and co-evaluation morphisms and the twists. Finally, all these tensors are contracted into a single element $\langle L\rangle \in \mathrm{End}_C(\mathbf{1})$. It does not depend on the intermediate choices and is preserved under isotopy of L in $S^3$. For the trivial knot O(V) with framing 0 and label V ∈ C, we have $\langle O(V)\rangle = \dim(V)$.

Further constructions need the notion of a tangle. An (oriented) tangle is a compact (oriented) one-dimensional submanifold of $\mathbb{R}^2 \times [0,1]$ with endpoints on $\mathbb{R} \times 0 \times \{0,1\}$. Near each of its endpoints, an oriented tangle T is directed either down or up, and thus acquires a sign ±1. One can view T as a morphism from the sequence of ±1’s associated with its bottom ends to the sequence of ±1’s associated with its top ends. Tangles can be composed by putting one on top of the other. This defines a category of tangles $\mathcal{T}$ whose objects are finite sequences of ±1’s and whose morphisms are isotopy classes of framed oriented tangles. Given a ribbon category C, we can consider C-labeled tangles, that is, (framed oriented) tangles whose components are labeled with objects of C. They form a category $\mathcal{T}_C$. Links appear here as tangles without endpoints, that is, as morphisms ∅ → ∅. The link invariant $\langle L\rangle$ generalizes to a functor $\langle\cdot\rangle: \mathcal{T}_C \to C$.

To define 3-manifold invariants, we need modular categories (Turaev 1994). Let k be a field. A monoidal category C is k-additive if its Hom sets are k-vector spaces, the composition and tensor product of the morphisms are bilinear, and $\mathrm{End}_C(\mathbf{1}) = k$. An object V ∈ C is simple if $\mathrm{End}_C(V) = k$. A modular category is a k-additive ribbon category C with a finite family of simple objects $\{V_\lambda\}_\lambda$ such that (1) for any object V ∈ C there is a finite expansion $\mathrm{id}_V = \sum_i f_i g_i$ for certain morphisms $g_i: V \to V_{\lambda_i}$, $f_i: V_{\lambda_i} \to V$ and (2) the S-matrix $(S_{\lambda,\mu})$ is invertible over k where $S_{\lambda,\mu} = \mathrm{tr}(c_{V_\mu,V_\lambda}\,c_{V_\lambda,V_\mu})$. Note that $S_{\lambda,\mu} = \langle H(\lambda,\mu)\rangle$ where H(λ, μ) is the oriented Hopf link with framing 0, linking number +1, and labels $V_\lambda$, $V_\mu$.

Axiom (1) implies that every simple object in C is 
isomorphic to exactly one of the $V_\lambda$. In most interesting
cases (when there is a well-defined direct summa- 
tion in C), this axiom may be rephrased by saying 
that C is finite semisimple, that is, C has a finite set 
of isomorphism classes of simple objects and all 
objects of C are direct sums of simple objects. A 
weaker version of the axiom (2) yields premodular 
categories. 

The invariant $\langle\cdot\rangle$ of links and tangles extends by linearity to the case where labels are finite linear combinations of objects of C with coefficients in k. Such a linear combination $\Omega = \sum_\lambda \dim(V_\lambda)\,V_\lambda$ is called the Kirby color. It has the following sliding property: for any object V ∈ C, the two tangles in Figure 1 yield the same morphism V → V. Here, the dashed line represents an arc on the closed component labeled by Ω. This arc can be knotted or linked with other components of the tangle (not shown in the figure).


Figure 1 Sliding property. 


Invariants of Closed 3-Manifolds 


Given an embedded solid torus $g: S^1 \times D^2 \hookrightarrow S^3$, where $D^2$ is a 2-disk and $S^1 = \partial D^2$, a 3-manifold can be built as follows. Remove from $S^3$ the interior of $g(S^1 \times D^2)$ and glue back the solid torus $D^2 \times S^1$ along $g|_{S^1\times S^1}$. This process is known as “surgery.” The resulting 3-manifold depends only on the isotopy class of the framed knot represented by g. More generally, a surgery on a framed link $L = \cup_{i=1}^{m} L_i$ in $S^3$ with m components yields a closed oriented 3-manifold $M_L$. A theorem of W Lickorish and A Wallace asserts that any closed connected oriented 3-manifold is homeomorphic to $M_L$ for some L. R Kirby proved that two framed links give rise to homeomorphic 3-manifolds if and only if these links are related by isotopy and a finite sequence of geometric transformations called Kirby moves. There are two Kirby moves: adjoining a distant unknot $O^{\varepsilon}$ with framing ε = ±1, and sliding a link component over another one as in Figure 1.

Let $L = \cup_{i=1}^{m} L_i \subset S^3$ be a framed link and let $(b_{i,j})_{i,j=1,\ldots,m}$ be its linking matrix: for i ≠ j, $b_{i,j}$ is the linking number of $L_i$, $L_j$, and $b_{i,i}$ is the framing number of $L_i$. Denote by $e_+$ (resp. $e_-$) the number of positive (resp. negative) eigenvalues of this matrix. The sliding property of modular categories implies the following theorem. In its statement, a knot K with label Ω is denoted by K(Ω).
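
The numbers of positive and negative eigenvalues of an integer linking matrix can be found exactly, without floating-point eigenvalue routines. The sketch below is one possible approach, not a method taken from the text: it computes the characteristic polynomial by the Faddeev-LeVerrier recursion and applies Descartes' rule of signs, which is exact here because a real symmetric matrix has only real eigenvalues. All function names are illustrative.

```python
from fractions import Fraction


def char_poly(matrix):
    """Coefficients of det(x*I - A), highest degree first (Faddeev-LeVerrier)."""
    n = len(matrix)
    A = [[Fraction(v) for v in row] for row in matrix]
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = [[sum(A[i][l] * M[l][j] for l in range(n)) for j in range(n)]
              for i in range(n)]
        c = -sum(AM[i][i] for i in range(n)) / k
        coeffs.append(c)
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs


def sign_changes(seq):
    """Number of sign changes in seq, ignoring zero entries."""
    signs = [v > 0 for v in seq if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def signature_counts(matrix):
    """Return (e_plus, e_minus) for a symmetric matrix with rational entries."""
    coeffs = char_poly(matrix)
    n = len(matrix)
    e_plus = sign_changes(coeffs)
    # Substituting x -> -x multiplies the coefficient of x^(n-k) by (-1)^(n-k).
    flipped = [c * (-1) ** (n - k) for k, c in enumerate(coeffs)]
    e_minus = sign_changes(flipped)
    return e_plus, e_minus


hopf_link = [[0, 1], [1, 0]]        # two unknots with framing 0, linking number 1
unknot_plus = [[1]]                 # unknot with framing +1
```

For the Hopf link with both framings 0 the eigenvalues are ±1, so $e_+ = e_- = 1$; for the unknot with framing +1, $e_+ = 1$ and $e_- = 0$.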


Theorem 1 Let C be a modular category with Kirby color Ω. Then $\langle O^{1}(\Omega)\rangle \ne 0$, $\langle O^{-1}(\Omega)\rangle \ne 0$ and the expression

\[ \tau_C(M_L) = \langle O^{1}(\Omega)\rangle^{-e_+}\,\langle O^{-1}(\Omega)\rangle^{-e_-}\,\langle L_1(\Omega),\ldots,L_m(\Omega)\rangle \]

is invariant under the Kirby moves on L. This expression yields, therefore, a well-defined topological invariant $\tau_C$ of closed connected oriented 3-manifolds.


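Several competing normalizations of $\tau_C$ exist in the literature. Here, the normalization used is such that $\tau_C(S^3) = 1$ and $\tau_C(S^1 \times S^2) = \sum_\lambda (\dim(V_\lambda))^2$. The invariant $\tau_C$ extends to 3-manifolds with a framed oriented C-labeled link K inside by

\[ \tau_C(M_L, K) = \langle O^{1}(\Omega)\rangle^{-e_+}\,\langle O^{-1}(\Omega)\rangle^{-e_-}\,\langle L_1(\Omega),\ldots,L_m(\Omega), K\rangle \]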


Three-Dimensional TQFTs 


A three-dimensional TQFT V assigns to every closed oriented surface X a finite-dimensional vector space V(X) over a field k and assigns to every cobordism (M, X, Y) a linear map V(M) = V(M, X, Y): V(X) → V(Y). Here, a “cobordism” (M, X, Y) between surfaces X and Y is a compact oriented 3-manifold M with ∂M = (−X) ⊔ Y (the minus sign indicates the orientation reversal). A TQFT has to satisfy axioms which can be expressed by saying that V is a monoidal functor from the category of surfaces and cobordisms to the category of vector spaces over k. Homeomorphisms of surfaces should induce isomorphisms of the corresponding vector spaces compatible with the action of cobordisms. From the definition, V(∅) = k. Every compact oriented 3-manifold M is a cobordism between ∅ and ∂M so that V yields a “vacuum” vector V(M) ∈ Hom(V(∅), V(∂M)) = V(∂M). If ∂M = ∅, then this gives a numerical invariant V(M) ∈ V(∅) = k.

Interestingly, TQFTs are often defined for 
surfaces and 3-cobordisms with additional struc- 
ture. The surfaces X are normally endowed with 
Lagrangians, that is, with maximal isotropic 
subspaces in $H_1(X; \mathbb{R})$. For 3-cobordisms, several
additional structures are considered in the litera-
ture: for example, 2-framings, $p_1$-structures, and
numerical weights. All these choices are equiva- 
lent. The TQFTs requiring such additional struc- 
tures are said to be “projective” since they provide 
projective linear representations of the mapping 
class groups of surfaces. 

Every modular category C with ground field k and simple objects $\{V_\lambda\}_\lambda$ gives rise to a projective three-dimensional TQFT $V_C$. It depends on the choice of a square root D of $\sum_\lambda (\dim(V_\lambda))^2 \in k$. For a connected surface X of genus g,

\[ V_C(X) = \mathrm{Hom}_C\Big(\mathbf{1},\ \bigoplus_{\lambda_1,\ldots,\lambda_g}\ \bigotimes_{r=1}^{g}\big(V_{\lambda_r} \otimes V_{\lambda_r}^*\big)\Big) \]

The dimension of this vector space enters the Verlinde formula

\[ \dim_k(V_C(X)) \cdot 1_k = D^{2g-2}\sum_{\lambda}(\dim(V_\lambda))^{2-2g} \]

where $1_k \in k$ is the unit of the field k. If char(k) = 0, then this formula computes $\dim_k(V_C(X))$. For a closed connected oriented 3-manifold M with numerical weight zero, $V_C(M) = D^{-b_1(M)-1}\,\tau_C(M)$, where $b_1(M)$ is the first Betti number of M.
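
To see the Verlinde formula produce integers, one can feed it a concrete list of dimensions. The sketch below assumes, purely for illustration, the familiar $sl_2$-type data in which the simple objects are indexed by j = 1, …, r − 1 with $\dim(V_j) = \sin(\pi j/r)/\sin(\pi/r)$; this choice and the function names are assumptions, not something fixed by the text above.

```python
import math


def quantum_dims(r):
    """Assumed sl_2-type dimensions: dim(V_j) = sin(pi*j/r) / sin(pi/r), j = 1..r-1."""
    return [math.sin(math.pi * j / r) / math.sin(math.pi / r) for j in range(1, r)]


def verlinde_dimension(r, genus):
    """Right-hand side D^(2g-2) * sum_j dim(V_j)^(2-2g) of the Verlinde formula."""
    dims = quantum_dims(r)
    d_squared = sum(d * d for d in dims)  # D^2 = sum of squared dimensions
    return d_squared ** (genus - 1) * sum(d ** (2 - 2 * genus) for d in dims)


genus_two_r3 = verlinde_dimension(3, 2)
genus_two_r4 = verlinde_dimension(4, 2)
torus_r5 = verlinde_dimension(5, 1)
```

In genus 1 the formula simply counts the simple objects, and in genus 0 it gives 1. The results are floating-point numbers that should be rounded before being read as dimensions.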

The TQFT $V_C$ extends to a vaster class of surfaces and cobordisms. Surfaces may be enriched with a finite set of marked points, each labeled with an object of C and endowed with a tangent direction. Cobordisms may be enriched with ribbon (or fat) graphs whose edges are labeled with objects of C and whose vertices are labeled with appropriate intertwiners. The resulting TQFT, also denoted $V_C$, is nondegenerate in the sense that, for any surface X, the vacuum vectors in V(X) determined by all M with ∂M = X span V(X). A detailed construction of $V_C$ is given in Turaev (1994).

The two-dimensional part of $V_C$ determines a “modular functor” in the sense of G Segal, G Moore, and N Seiberg.


Constructions of Modular Categories 


The universal enveloping algebra Ug of a (finite-dimensional complex) simple Lie algebra g admits a deformation $U_q g$, which is a quasitriangular Hopf algebra. The representation category $\mathrm{Rep}(U_q g)$ is $\mathbb{C}$-linear and ribbon. For generic $q \in \mathbb{C}$, this category is semisimple. (The irreducible representations of g can be deformed to irreducible representations of $U_q g$.) For q an appropriate root of unity, a certain subquotient of $\mathrm{Rep}(U_q g)$ is a modular category with ground field $k = \mathbb{C}$. For $g = sl_2(\mathbb{C})$, it was pointed out by Reshetikhin and Turaev; the general case involves the theory of tilting modules. The corresponding 3-manifold invariant τ is denoted $\tau_q^{g}$. For example, if $g = sl_2(\mathbb{C})$ and M is the Poincaré homology sphere (obtained by surgery on a left-hand trefoil with framing −1), then (Le 2003)

\[ \tau_q^{g}(M) = (1-q)^{-1}\sum_{n\ge 0} q^n (1-q^{n+1})(1-q^{n+2})\cdots(1-q^{2n+1}) \]

The sum here is finite since q is a root of unity.
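
The truncation can be made explicit. The sketch below assumes the series has the form $(1-q)^{-1}\sum_{n\ge 0} q^n(1-q^{n+1})\cdots(1-q^{2n+1})$ and takes $q = e^{2\pi i/r}$ as an example root of unity; both the function name and this particular choice of q are illustrative assumptions. The n-th term vanishes as soon as one of the exponents n + 1, …, 2n + 1 is divisible by r, and this then holds for every later n as well.

```python
import cmath


def poincare_sphere_series(r):
    """Evaluate the assumed series at q = exp(2*pi*i/r), r >= 2; return (value, terms used)."""
    q = cmath.exp(2j * cmath.pi / r)
    total = 0j
    n = 0
    while True:
        exponents = range(n + 1, 2 * n + 2)
        if any(j % r == 0 for j in exponents):
            break  # a factor 1 - q^j is exactly zero here and in all later terms
        term = q ** n
        for j in exponents:
            term *= 1 - q ** j
        total += term
        n += 1
    return total / (1 - q), n


value, used_terms = poincare_sphere_series(5)
q5 = cmath.exp(2j * cmath.pi / 5)
```

For r = 5 only the terms n = 0 and n = 1 survive, and a short calculation with $q^5 = 1$ reduces the sum to $q + q^2 - q^4$.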

There is another construction (Le 2003) of a modular category associated with a simple Lie algebra g and certain roots of unity q. The corresponding quantum invariant of 3-manifolds is denoted $\tau_q^{Pg}$. (Here, it is normalized so that $\tau_q^{Pg}(S^3) = 1$.) Under mild assumptions on the order of q, we have $\tau_q^{g}(M) = \tau_q^{Pg}(M)\,\tau'(M)$ for all M, where τ′(M) is a certain Gauss sum determined by g, the homology group $H = H_1(M)$ and the linking form Tors H × Tors H → $\mathbb{Q}/\mathbb{Z}$.

A different construction derives modular categories from the category of framed oriented tangles $\mathcal{T}$. Given a ring K, a bigger category $K[\mathcal{T}]$ can be considered whose morphisms are linear combinations of tangles with coefficients in K. Both $\mathcal{T}$ and $K[\mathcal{T}]$ have a natural structure of a ribbon monoidal category.

The skein method builds ribbon categories by quotienting $K[\mathcal{T}]$ using local “skein” relations, which appear in the theory of knot polynomials (the Alexander-Conway polynomial, the Homfly polynomial, and the Kauffman polynomial). In order to obtain a semisimple category, one completes the quotient category with idempotents as objects (the Karoubi completion). Choosing appropriate skein relations, one can recover the modular categories derived from quantum groups of series A, B, C, D. In particular, the categories determined by the series A arise from the Homfly skein relation shown in Figure 2 where a, s ∈ K. The categories determined by the series B, C, D arise from the Kauffman skein relation.


Figure 2 The Homfly relation.

The quantum invariants of 3-manifolds and the TQFTs associated with $sl_N$ can be directly described in terms of the Homfly skein theory, avoiding the language of ribbon categories (W Lickorish, C Blanchet, N Habegger, G Masbaum, P Vogel for $sl_2$ and Y Yokota for all $sl_N$).


Unitarity 


From both physical and topological viewpoints, one is mainly interested in Hermitian and unitary TQFTs (over $k = \mathbb{C}$). A TQFT V is Hermitian if the vector space V(X) is endowed with a nondegenerate Hermitian form $\langle\cdot,\cdot\rangle_X: V(X) \otimes_{\mathbb{C}} V(X) \to \mathbb{C}$ such that:


1. the form $\langle\cdot,\cdot\rangle_X$ is natural with respect to homeomorphisms and multiplicative with respect to disjoint union and

2. for any cobordism (M, X, Y) and any x ∈ V(X), y ∈ V(Y),

\[ \langle V(M,X,Y)(x),\ y\rangle_Y = \langle x,\ V(-M,Y,X)(y)\rangle_X \]

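If $\langle\cdot,\cdot\rangle_X$ is positive definite for every X, then the Hermitian TQFT is “unitary.” Note two features of Hermitian TQFTs. If ∂M = ∅, then $V(-M) = \overline{V(M)}$. The group of self-homeomorphisms of any X acts in V(X) preserving the form $\langle\cdot,\cdot\rangle_X$. For a unitary TQFT, this gives an action by unitary matrices.
The three-dimensional TQFT derived from a modular category $\mathcal{V}$ is Hermitian (resp. unitary) under additional assumptions on $\mathcal{V}$ which are discussed briefly. A “conjugation” in $\mathcal{V}$ assigns to each morphism f: V → W in $\mathcal{V}$ a morphism $\bar{f}: W \to V$ so that

\[ \bar{\bar{f}} = f, \qquad \overline{f+g} = \bar{f} + \bar{g} \quad \text{for any } f, g: V \to W \]
\[ \overline{f \otimes g} = \bar{f} \otimes \bar{g} \quad \text{for any morphisms } f, g \text{ in } \mathcal{V} \]
\[ \overline{f \circ g} = \bar{g} \circ \bar{f} \quad \text{for any morphisms } f: V \to W,\ g: W \to V \]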

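One calls $\mathcal{V}$ Hermitian if it is endowed with conjugation such that

\[ \bar{\theta}_V = (\theta_V)^{-1}, \qquad \bar{c}_{V,W} = (c_{V,W})^{-1} \]
\[ \bar{b}_V = d_V\,c_{V,V^*}(\theta_V \otimes 1_{V^*}) \]
\[ \bar{d}_V = (1_{V^*} \otimes \theta_V^{-1})\,c_{V^*,V}^{-1}\,b_V \]

for any objects V, W of $\mathcal{V}$. A Hermitian modular category $\mathcal{V}$ is unitary if $\mathrm{tr}(f\bar{f}) \ge 0$ for any morphism f in $\mathcal{V}$. The three-dimensional TQFT, derived from a Hermitian (resp. unitary) modular category, has a natural structure of a Hermitian (resp. unitary) TQFT.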

The modular category derived from a simple Lie 
algebra g and a root of unity q is always Hermitian. 
It may be unitary for some q. For simply laced g,
there are always such roots of unity q of any given 
sufficiently big order. For non-simply-laced g, this 
holds under certain divisibility conditions on the 
order of q. 


Integral Structures in TQFTs 


The quantum invariants of 3-manifolds have one 
fundamental property: up to an appropriate res- 
caling, they are algebraic integers. This was 
first observed by H Murakami, who proved that 
$\tau_q^{sl_2}(M)$ is an algebraic integer, provided the order of
q is an odd prime and M is a homology sphere. This 
extends to an arbitrary closed connected oriented 3- 
manifold M and an arbitrary simple Lie algebra g as 
follows (Le 2003): for any sufficiently big prime 
integer r and any primitive rth root of unity q, 


\[ \tau_q^{Pg}(M) \in \mathbb{Z}[q] = \mathbb{Z}[\exp(2\pi i/r)] \qquad [1] \]

This inclusion allows one to expand $\tau_q^{Pg}(M)$ as
a polynomial in q. A study of its coefficients leads
to the Ohtsuki invariants of rational homology 
spheres and further to perturbative invariants of 
3-manifolds due to T Le, J Murakami, and 
T Ohtsuki (see Ohtsuki (2002)). Conjecturally, the 
inclusion [1] holds for nonprime (sufficiently big) r 
as well. Connections with the algebraic number 
theory (specifically modular forms) were studied by 
D Zagier and R Lawrence. 

It is important to obtain similar integrality results for TQFTs. Following P Gilmer, fix a Dedekind domain $D \subset \mathbb{C}$ and call a TQFT V almost D-integral if it is nondegenerate and there is $d \in \mathbb{C}$ such that dV(M) ∈ D for all M with ∂M = ∅. Given an almost-integral TQFT V and a surface X, we define S(X) to be the D-submodule of V(X) generated by all vacuum vectors for X. This module is preserved under the action of self-homeomorphisms of X. It turns out that S(X) is a finitely generated projective D-module and $V(X) = S(X) \otimes_D \mathbb{C}$. A cobordism (M, X, Y) is targeted if all its connected components meet Y along a nonempty set. In this case, V(M)(S(X)) ⊂ S(Y). Thus, applying S to surfaces and restricting V to targeted cobordisms, we obtain an “integral version” of V. In many interesting cases, the D-module S(X) is free and its basis may be described explicitly. A simple Lie algebra g and a primitive rth (in some cases 4rth) root of unity q with sufficiently big prime r give rise to an almost D-integral TQFT for $D = \mathbb{Z}[q]$.


State-Sum Invariants 


Another approach to three-dimensional TQFTs is 
based on the theory of 6j-symbols and state sums on 
triangulations of 3-manifolds. This approach intro- 
duced by V Turaev and O Viro is a quantum 
deformation of the Ponzano—Regge model for the 
three-dimensional lattice gravity. The quantum 6j-symbols derived from representations of $U_q(sl_2\mathbb{C})$ are $\mathbb{C}$-valued rational functions of the variable $q_0 = q^{1/2}$

\[ \begin{Bmatrix} i & j & k \\ l & m & n \end{Bmatrix} \qquad [2] \]








numerated by 6-tuples of non-negative integers í, /, 
k, l, m, n. One can think of these integers as labels 
sitting on the edges of a tetrahedron (see Figure 3). 
The 6j-symbol admits various equivalent normal- 
izations and we choose the one which has full 
tetrahedral symmetry. Now, let $q_0 \in \mathbb{C}$ be a
primitive 2rth root of unity with r > 2. Set
$I = \{0, 1, \ldots, r-2\}$. Given a labeled tetrahedron T
as in Figure 3 with $i, j, k, l, m, n \in I$, the 6j-symbol
[2] can be evaluated at $q_0$ to obtain a
complex number denoted |T|. Consider a closed
three-dimensional manifold M with triangulation t.
(Note that all 3-manifolds can be triangulated.) A
coloring of M is a mapping $\varphi$ from the set Edg(t)
of the edges of t to I. Set


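\[ |M| = \left( \sqrt{2r}/(q_0 - q_0^{-1}) \right)^{-2a} \sum_{\varphi} \prod_{e \in \mathrm{Edg}(t)} \langle \varphi(e) \rangle \prod_{T} |T^{\varphi}| \]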





where a is the number of vertices of t,
$\langle n \rangle = (-1)^n (q_0^{n+1} - q_0^{-n-1})/(q_0 - q_0^{-1})$ for any integer n, T runs over
all tetrahedra of t, and $T^{\varphi}$ is T with the labeling
induced by $\varphi$. It is important to note that |M| does
not depend on the choice of t and thus yields a
topological invariant of M.

Figure 3 Labeled tetrahedron.
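As a small numerical sanity check on the weights entering the state sum, the following sketch (Python, standard library only) evaluates $\langle n \rangle$ at $q_0 = \exp(i\pi/r)$ and compares $\sum_{n \in I} \langle n \rangle^2$ with $|\sqrt{2r}/(q_0 - q_0^{-1})|^2$, the squared modulus of the normalization factor. The choice of this particular primitive 2rth root of unity and the function names `bracket` and `normalization_check` are assumptions made for illustration; they do not come from the text.

```python
import cmath
import math


def bracket(n: int, q0: complex) -> complex:
    """<n> = (-1)^n (q0^(n+1) - q0^(-n-1)) / (q0 - q0^(-1))."""
    return (-1) ** n * (q0 ** (n + 1) - q0 ** (-n - 1)) / (q0 - 1 / q0)


def normalization_check(r: int) -> tuple:
    """Return (sum of <n>^2 over I, |sqrt(2r)/(q0 - 1/q0)|^2) for q0 = exp(i*pi/r)."""
    q0 = cmath.exp(1j * math.pi / r)   # assumed choice of primitive 2r-th root of unity
    labels = range(r - 1)              # I = {0, 1, ..., r-2}
    total = sum(bracket(n, q0) ** 2 for n in labels)
    weight = abs(math.sqrt(2 * r) / (q0 - 1 / q0)) ** 2
    return total.real, weight


if __name__ == "__main__":
    for r in (3, 4, 5, 8):
        print(r, normalization_check(r))
```

The two returned numbers should agree up to rounding, because $\langle n \rangle = (-1)^n \sin((n+1)\pi/r)/\sin(\pi/r)$ for this choice of $q_0$ and $\sum_{m=1}^{r-1} \sin^2(m\pi/r) = r/2$.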

The invariant |M| is closely related to the
quantum invariant $\tau_q^{\mathfrak{g}}(M)$ for $\mathfrak{g} = sl_2(\mathbb{C})$. Namely,
|M| is the square of the absolute value of $\tau_q^{\mathfrak{g}}(M)$; that
is, $|M| = |\tau_q^{\mathfrak{g}}(M)|^2$. This computes $|\tau_q^{\mathfrak{g}}(M)|$ inside M
without appeal to surgery. No such computation of
the phase of $\tau_q^{\mathfrak{g}}(M)$ is known.

These constructions generalize in two directions. 
First, they extend to manifolds with boundary. Second, 
instead of the representation category of $U_q(sl_2(\mathbb{C}))$, one
can use an arbitrary modular category $\mathcal{C}$. This yields a
three-dimensional TQFT, which associates to a surface
X a vector space $|X|_{\mathcal{C}}$, and to a 3-cobordism (M, X, Y)
a homomorphism $|M|_{\mathcal{C}} : |X|_{\mathcal{C}} \to |Y|_{\mathcal{C}}$ (see Turaev
(1994)). When $X = Y = \emptyset$, this homomorphism is
multiplication $\mathbb{C} \to \mathbb{C}$ by a topological invariant
$|M|_{\mathcal{C}} \in \mathbb{C}$. The latter is computed as a state sum on a
triangulation of M involving the 6j-symbols associated
with $\mathcal{C}$. In general, these 6j-symbols are not numbers
but tensors so that, instead of their product, one
should use an appropriate contraction of tensors. The
vectors in V(X) are geometrically represented by
trivalent graphs on X such that every edge is labeled
with a simple object of $\mathcal{C}$ and every vertex is labeled
with an intertwiner between the three objects labeling
the incident edges. The TQFT $|\cdot|_{\mathcal{C}}$ is related to the
TQFT $V = V_{\mathcal{C}}$ by $|M|_{\mathcal{C}} = |V(M)|^2$. Moreover, for any
closed oriented surface X,


\[ |X|_{\mathcal{C}} = \mathrm{End}(V(X)) = V(X) \otimes (V(X))^* = V(X) \otimes V(-X) \]


and for any three-dimensional cobordism (M, X, Y),


\[ |M|_{\mathcal{C}} = V(M) \otimes V(-M) : V(X) \otimes V(-X) \to V(Y) \otimes V(-Y) \]


J Barrett and B Westbury introduced a generalization
of $|M|_{\mathcal{C}}$ derived from the so-called spherical
monoidal categories (which are assumed to be
semisimple with a finite set of isomorphism classes
of simple objects). This class includes modular
categories and a most interesting family of (unitary
monoidal) categories arising in the theory of subfactors
(see Evans and Kawahigashi (1998) and
Kodiyalam and Sunder (2001)). Every spherical
category $\mathcal{C}$ gives rise to a topological invariant $|M|_{\mathcal{C}}$
of a closed oriented 3-manifold M. (It seems that this
approach has not yet been extended to cobordisms.)

Every monoidal category $\mathcal{C}$ gives rise to a double (or
a center) $Z(\mathcal{C})$, which is a braided monoidal category
(see Majid (1995)). If $\mathcal{C}$ is spherical, then $Z(\mathcal{C})$ is
modular. Conjecturally, $|M|_{\mathcal{C}} = \tau_{Z(\mathcal{C})}(M)$. In the case
where $\mathcal{C}$ arises from a subfactor, this has been recently
proved by Y Kawahigashi, N Sato, and M Wakui.
The state sum invariants above are closely related
to spin networks, spin foam models, and other
models of quantum gravity in dimension 2 + 1 (see
Baez (2000) and Carlip (1998)).


See also: Axiomatic Approach to Topological Quantum 
Field Theory; Braided and Modular Tensor Categories; 
Chern—Simons Models: Rigorous Results; Finite-type 
Invariants of 3-Manifolds; Large-N and Topological 
Strings; Schwarz-Type Topological Quantum Field 
Theory; Topological Quantum Field Theory: Overview; 
von Neumann Algebras: Subfactor Theory.


Introduction


Calogero—Moser (C-M) systems are multiparticle 
(i.e., finite degrees of freedom) dynamical systems 
with long-range interactions. They are integrable 
and solvable at both classical and quantum levels. 
These systems offer an ideal arena for interplay of 
many important concepts in mathematical/theoreti- 
cal physics: to name a few, classical and quantum 
mechanics, classical and quantum integrability, 
exact and quasi-exact solvability, addition of dis- 
crete (spin) degrees of freedom, quantum Lax pair 
formalism, supersymmetric quantum mechanics, 
crystallographic root systems and associated Weyl 
groups and Lie algebras, noncrystallographic root 
systems, and Coxeter groups or finite reflection 
groups. The quantum integrability or solvability of 
C-M systems does not depend on such known 
solution mechanisms as Yang—Baxter equations, 
quantum R-matrix or Bethe ansatz for the quantum 
systems. In fact, quantum C-M systems provide
good material for reflecting on quantum
integrability.


Quantum (Liouville) Integrability 


The classical Liouville theorem for an integrable 
system consists of two parts. Let us consider
Hamiltonian dynamics of finite degrees of freedom
N with coordinates $q = (q_1, \ldots, q_N)$ and conjugate
momenta $p = (p_1, \ldots, p_N)$ equipped with Poisson
brackets $\{q_j, p_k\} = \delta_{jk}$, $\{q_j, q_k\} = \{p_j, p_k\} = 0$. The first
part is the existence of a set of independent and
involutive, $\{K_j, K_k\} = 0$, conserved quantities $\{K_j\}$, as
many as the degrees of freedom ($j = 1, \ldots, N$). The
second part asserts that the generating function of the
canonical transformation for the action-angle variables
can be constructed from the conserved quantities
via quadrature. In other words, the second part,
that is, the reducibility to the action-angle variables, is
the integrability. The quantum counterpart of the
first half is readily formulated: that is, the existence
of a set of independent and mutually commuting
(involutive), $[K_j, K_k] = 0$, conserved quantities $\{K_j\}$, as
many as the degrees of freedom. (This does not
necessarily imply, however, that they are well defined
in a proper Hilbert space.) The definition of the
quantum integrability should come as a second part,
which is yet to be formulated. It is clear that the
quantum Liouville integrability does not imply the
complete determination of the eigenvalues and
eigenfunctions. Systems for which these are completely
determined would be called exactly
solvable. This can be readily understood by considering
any (autonomous) degree-1 Hamiltonian system,
which, by definition, is Liouville integrable at the
classical and quantum levels. However, it is known
that the number of exactly solvable degree-1 Hamiltonians
is very limited. What would be the quantum
counterpart of the “transformation to action-angle 
variables by quadrature”? Could it be better for- 
mulated in terms of a path integral? Many questions 
remain to be answered. The quantum C-M systems, 
an infinite family of exactly solvable multiparticle 
Hamiltonians, would shed some light on the problem 
of quantum integrability, in addition to their own 
beautiful structure explored below. 

Throughout this article, the dependence on
Planck’s constant, $\hbar$, is shown explicitly to distinguish
the quantum effects.


Simplest Cases (Based on $A_{r-1}$ Root System)


The simplest example of a C-M system consists of r
particles of equal mass (normalized to unity) on a
line with pairwise $1/(\text{distance})^2$ interactions
described by the following Hamiltonian:


\[ H = \frac{1}{2} \sum_{j=1}^{r} p_j^2 + g(g - \hbar) \sum_{j<k}^{r} \frac{1}{(q_j - q_k)^2} \tag{1} \]


in which g is a real positive coupling constant.
Here $q = (q_1, \ldots, q_r)$ are the coordinates and
$p = (p_1, \ldots, p_r)$ are the conjugate canonical momenta
obeying the canonical commutation relations:
$[q_j, p_k] = i\hbar\delta_{jk}$, $[q_j, q_k] = [p_j, p_k] = 0$, $j, k = 1, \ldots, r$.
The Heisenberg equations of motion are $\dot{q}_j = (i/\hbar)[H, q_j] = p_j$,
$\ddot{q}_j = \dot{p}_j = (i/\hbar)[H, p_j] = 2g(g - \hbar) \sum_{k \neq j} 1/(q_j - q_k)^3$.
The repulsive $1/(\text{distance})^2$ potential
cannot be surmounted classically or quantum
mechanically, and the relative position of the
particles on the line is not changed during the time
evolution. Classically, it means that if a motion
starts at a configuration $q_1 > q_2 > \cdots > q_r$, then the
inequalities remain valid throughout the time evolution.
At the quantum level, the wave functions
vanish at the boundaries, and the configuration
space can be naturally limited to $q_1 > q_2 > \cdots > q_r$
(the principal Weyl chamber).

Similar integrable quantum many-particle
dynamics are obtained by replacing the inverse
square potential in [1] by the trigonometric
(hyperbolic) counterpart (see Figure 1),
$1/(q_j - q_k)^2 \to a^2/\sin^2 a(q_j - q_k)$ ($a^2/\sinh^2 a(q_j - q_k)$), in which a > 0 is
a real parameter. The $1/\sin^2 q$ potential case
(the Sutherland system) corresponds to the
$1/(\text{distance})^2$ interaction on a circle of radius 1/2a,
see Figure 2. A harmonic confining potential
$\omega^2 \sum_{j=1}^{r} q_j^2/2$ can be added to the rational Hamiltonian
[1] without breaking the integrability
(the Calogero system, see Figure 1). At the
classical level, the trigonometric (hyperbolic) and
rational C-M systems are obtained from the
elliptic potential systems (with the Weierstrass $\wp$
function) as the degenerate limits:
$\wp(q_1 - q_2) \to a^2/\sinh^2 a(q_1 - q_2) \to 1/(q_1 - q_2)^2$, namely as one
(two) period(s) of the $\wp$ function tends to infinity.

Figure 1 Four different types of quantum C-M potentials: rational ($1/q^2$), Calogero, hyperbolic ($1/\sinh^2 q$), and Sutherland.

It is remarkable that these equations of motion can
be expressed in a matrix form (Lax pair):
$(i/\hbar)[H, L] = dL/dt = LM - ML = [L, M] \Leftrightarrow$ Heisenberg
equation of motion, in which L and M are given by


\[ L = \begin{pmatrix} p_1 & \dfrac{ig}{q_1 - q_2} & \cdots & \dfrac{ig}{q_1 - q_r} \\ \dfrac{ig}{q_2 - q_1} & p_2 & \cdots & \dfrac{ig}{q_2 - q_r} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{ig}{q_r - q_1} & \dfrac{ig}{q_r - q_2} & \cdots & p_r \end{pmatrix}, \qquad M = \begin{pmatrix} m_1 & -\dfrac{ig}{(q_1 - q_2)^2} & \cdots & -\dfrac{ig}{(q_1 - q_r)^2} \\ -\dfrac{ig}{(q_2 - q_1)^2} & m_2 & \cdots & -\dfrac{ig}{(q_2 - q_r)^2} \\ \vdots & \vdots & \ddots & \vdots \\ -\dfrac{ig}{(q_r - q_1)^2} & -\dfrac{ig}{(q_r - q_2)^2} & \cdots & m_r \end{pmatrix} \tag{2} \]


Figure 2 Sutherland potential is $1/(\text{distance})^2$ interaction on a
circle, with distance$(q_1, q_2) = \sin a(q_1 - q_2)/a$. The large-radius limit, $a \to 0$, gives the rational potential.


The diagonal element $m_j$ of M is given by
$m_j = ig \sum_{k \neq j} 1/(q_j - q_k)^2$. The matrix M has a special
property, $\sum_j M_{jk} = \sum_k M_{jk} = 0$, which ensures
the quantum conserved quantities as the total sum of
powers of the Lax matrix L: $[H, K_n] = 0$,
$K_n \equiv \mathrm{Ts}(L^n) = \sum_{j,k} (L^n)_{jk}$ ($n = 1, 2, \ldots$), $[K_n, K_m] = 0$.
It should be stressed that the trace of $L^n$ is not
conserved because of the noncommutativity of q and
p. The Hamiltonian is equivalent to $K_2$, $H \propto K_2 +$
const. In other words, the Lax matrix L is like a
“square root” of the Hamiltonian. The quantum
equations of motion for the Sutherland and hyperbolic
potentials are again expressed by Lax pairs if
the following replacements are made:
$1/(q_j - q_k) \to a\cot a(q_j - q_k)$ ($a\coth a(q_j - q_k)$) in L and
$1/(q_j - q_k)^2 \to a^2/\sin^2 a(q_j - q_k)$ ($a^2/\sinh^2 a(q_j - q_k)$) in M. The quantum conserved
quantities are obtained in the same manner as above
for the systems with the trigonometric and hyperbolic
interactions.
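The matrix identities above can be checked numerically in the classical limit. The sketch below (Python, standard library only) treats q and p as ordinary numbers, so it tests only the $\hbar \to 0$ version of the Lax equation, in which $g(g - \hbar)$ becomes $g^2$ and operator ordering plays no role. It builds L and M of [2] for the rational potential, forms dL/dt from $\dot{q}_j = p_j$ and $\dot{p}_j = 2g^2 \sum_{k \neq j} 1/(q_j - q_k)^3$, and compares it with [L, M]. The function names and the sample phase-space point are illustrative assumptions.

```python
def lax_pair(q, p, g):
    """Rational A-type Lax matrices L and M of [2] for classical numbers q, p."""
    r = len(q)
    L = [[0j] * r for _ in range(r)]
    M = [[0j] * r for _ in range(r)]
    for j in range(r):
        L[j][j] = complex(p[j])
        for k in range(r):
            if k != j:
                d = q[j] - q[k]
                L[j][k] = 1j * g / d
                M[j][k] = -1j * g / d**2
                M[j][j] += 1j * g / d**2   # m_j = i g sum_{k != j} 1/(q_j - q_k)^2
    return L, M


def lax_dot(q, p, g):
    """dL/dt from dq_j/dt = p_j and dp_j/dt = 2 g^2 sum_{k != j} 1/(q_j - q_k)^3."""
    r = len(q)
    dL = [[0j] * r for _ in range(r)]
    for j in range(r):
        for k in range(r):
            if k != j:
                d = q[j] - q[k]
                dL[j][j] += 2 * g**2 / d**3
                dL[j][k] = -1j * g * (p[j] - p[k]) / d**2
    return dL


def commutator(A, B):
    """Matrix commutator [A, B] = AB - BA for square nested lists."""
    r = len(A)
    return [[sum(A[j][m] * B[m][k] - B[j][m] * A[m][k] for m in range(r))
             for k in range(r)] for j in range(r)]


def lax_residual(q, p, g):
    """Largest entry of |dL/dt - [L, M]|; zero up to rounding if the Lax equation holds."""
    L, M = lax_pair(q, p, g)
    dL, C = lax_dot(q, p, g), commutator(L, M)
    r = len(q)
    return max(abs(dL[j][k] - C[j][k]) for j in range(r) for k in range(r))


def sum_rule_residual(q, p, g):
    """Largest |row sum| or |column sum| of M; both should vanish."""
    _, M = lax_pair(q, p, g)
    r = len(q)
    rows = [abs(sum(M[j][k] for k in range(r))) for j in range(r)]
    cols = [abs(sum(M[j][k] for j in range(r))) for k in range(r)]
    return max(rows + cols)


if __name__ == "__main__":
    q, p = [2.0, 0.7, -0.4, -1.9], [0.3, -1.1, 0.5, 0.2]
    print(lax_residual(q, p, 1.3), sum_rule_residual(q, p, 1.3))
```

Both residuals should be at the level of floating-point rounding for any configuration with distinct coordinates.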

The main goal here is to find all the eigenvalues
$\{\mathcal{E}\}$ and eigenfunctions $\{\psi(q)\}$ of the Hamiltonians
with the rational, Calogero, Sutherland, and
hyperbolic potentials: $H\psi(q) = \mathcal{E}\psi(q)$. The momentum
operator $p_j$ acts as a differential operator,
$p_j = -i\hbar\,\partial/\partial q_j$. For example, for the rational
model Hamiltonian [1], the eigenvalue equation
reads


\[ \left[ -\frac{\hbar^2}{2} \sum_{j=1}^{r} \frac{\partial^2}{\partial q_j^2} + g(g - \hbar) \sum_{j<k}^{r} \frac{1}{(q_j - q_k)^2} \right] \psi(q) = \mathcal{E}\psi(q) \tag{3} \]


which is a second-order Fuchsian differential
equation for each variable $\{q_j\}$ with a regular
singularity at each hyperplane $q_j = q_k$ whose exponents
are $g/\hbar$, $1 - g/\hbar$. Any solution $\psi$ of [3] is
regular at all points, except for those on the union
of hyperplanes $q_j = q_k$. Since the structure of the
singularity is the same for the other three types of
potentials, the same assertion for the regularity and
singularity of the solution $\psi$ holds for these cases,
too. For the trigonometric (Sutherland) case, there
are other singularities at $q_j - q_k = l\pi/a$, $l \in \mathbb{Z}$, due
to the periodicity of the potential. As is clear from
the shape of the potentials, see Figure 1, the
rational and hyperbolic Hamiltonians have only
continuous spectra, whereas the Calogero and
Sutherland Hamiltonians have only discrete
spectra.

The integrability or more precisely the triangular- 
ity of the quantum C-M Hamiltonian was first 
discovered by Calogero for particles on a line with 
inverse square potential plus a confining harmonic 
force and by Sutherland for the particles on a circle 


with the trigonometric potential. Later, classical 
integrability of the models in terms of Lax pairs was 
proved by Moser. Olshanetsky and Perelomov
showed that these systems were based on $A_{r-1}$ root
systems, that is, $q_j - q_k = \alpha \cdot q$, and $\alpha$ is one of the
root vectors of the $A_{r-1}$ root system [13]. They also
introduced generalizations of the C-M systems 
based on any root system including the noncrystal- 
lographic ones. 

As shown by Heckman—Opdam and Sasaki and 
collaborators, quantum C-M systems with degen- 
erate potentials (i.e., the rational potentials with/ 
without harmonic force, the hyperbolic, and the 
trigonometric potentials), based on any root system 
can be formulated and solved universally. To be 
more precise, the rational and Calogero systems are 
integrable for all root systems, the crystallographic 
and noncrystallographic. The hyperbolic and trigo- 
nometric (Sutherland) systems are integrable for any 
crystallographic root system. The universal formulas 
for the Hamiltonians, Lax pairs, ground state wave 
functions, conserved quantities, the triangularity, the 
discrete spectra for the Calogero and Sutherland 
systems, the creation and annihilation operators, 
etc., are equally valid for any root system. This will 
be shown in the next section. Some rudimentary 
facts of the root systems and reflections are 
summarized in the appendix. 


Universal Formalism 


A C-M system is a Hamiltonian dynamical system
associated with a root system $\Delta$ of rank r, which is a
set of vectors in $\mathbb{R}^r$ with its standard inner product.
A brief review of the properties of the root systems 
and the associated reflections together with explicit 
realizations of all the classical root systems will be 
found in the appendix. 


Factorized Hamiltonian 


The Hamiltonian for the quantum C-M system can 
be written in terms of a pre-potential W(q) in a 
“factorized form”: 


\[ \mathcal{H} = \frac{1}{2} \sum_{j=1}^{r} \left( p_j - i\frac{\partial W}{\partial q_j} \right) \left( p_j + i\frac{\partial W}{\partial q_j} \right) \tag{4} \]


The pre-potential is a sum over positive roots:


\[ W(q) = \sum_{\alpha \in \Delta_+} g_\alpha \ln|w(\alpha \cdot q)| + \left( -\frac{\omega}{2} q^2 \right) \tag{5} \]


The real positive coupling constants $g_\alpha$ are
defined on orbits of the corresponding Coxeter
group, that is, they are identical for roots in the
same orbit. That is, for the simple Lie algebra cases,
one coupling constant, $g_\alpha = g$, for all roots in simply
laced models and two independent coupling constants,
$g_\alpha = g_L$ for long roots and $g_\alpha = g_S$ for short
roots, in non-simply laced models. The function
w(u) and the other functions x(u) and y(u) appearing
in the Lax pair [10],[11] are listed in Table 1 for
each type of degenerate potentials. The dynamics of
the prepotentials W(q) (eqn [5]) has been discussed
by Dyson from a different point of view (random-matrix
model). The above factorized Hamiltonian
[4] consists of an operator part H, which is the
Hamiltonian in the usual definition (see the Hamiltonians
in the previous section, e.g., [1]), and a
constant $\mathcal{E}_0$ which is the ground-state energy,
$\mathcal{H} = H - \mathcal{E}_0$. The factorized Hamiltonian [4] also
arises within the context of supersymmetric quantum
mechanics.

Table 1 Functions appearing in the prepotential and Lax pair

Potential       w(u)       x(u)        y(u)
Rational        u          1/u         -1/u^2
Hyperbolic      sinh au    a coth au   -a^2/sinh^2 au
Trigonometric   sin au     a cot au    -a^2/sin^2 au

The pre-potential and the Hamiltonian are
invariant under reflection of the phase space
variables in the hyperplane perpendicular to any
root: $W(s_\alpha(q)) = W(q)$, $\mathcal{H}(s_\alpha(p), s_\alpha(q)) = \mathcal{H}(p, q)$, $\forall \alpha \in \Delta$,
with $s_\alpha$ defined by [12]. The above Coxeter
(Weyl) invariance is the only (discrete) symmetry of
the C-M systems. The main problem is, as in the $A_{r-1}$
case, to find all the eigenvalues $\{\mathcal{E}\}$ and eigenfunctions
$\{\psi(q)\}$ of the above Hamiltonian, $\mathcal{H}\psi(q) = \mathcal{E}\psi(q)$.

For any root system and for any choice of
potential, the C-M system has a hard repulsive
potential $\sim 1/(\alpha \cdot q)^2$ near the reflection hyperplane
$H_\alpha = \{q \in \mathbb{R}^r,\ \alpha \cdot q = 0\}$. The C-M eigenvalue equation
is a second-order Fuchsian differential equation
with regular singularities at each reflection hyperplane
$H_\alpha$ and those arising from the periodicity in
the case of the Sutherland potential. Near the
reflection hyperplane $H_\alpha$, the solution behaves as
follows:


\[ \psi \sim (\alpha \cdot q)^{g_\alpha/\hbar} (1 + \text{regular terms}), \quad \text{or} \quad \psi \sim (\alpha \cdot q)^{1 - g_\alpha/\hbar} (1 + \text{regular terms}) \]


The former solution is chosen for the square
integrability. Because of the singularities, the configuration
space is restricted to the principal Weyl
chamber PW or the principal Weyl alcove $PW_T$
for the trigonometric potential (see Figure 3):
$PW = \{q \in \mathbb{R}^r \mid \alpha \cdot q > 0,\ \alpha \in \Pi\}$, $PW_T = \{q \in \mathbb{R}^r \mid \alpha \cdot q > 0,$


126 Quantum Calogero-Moser Systems 





Figure 3 Simple roots, the highest root, fundamental weights, 
and the principal Weyl alcove (grey) and the principal Weyl 
chamber (light grey, extending to infinity) in a two-dimensional 
root system. 


$\alpha \in \Pi,\ \alpha_h \cdot q < \pi/a\}$ ($\Pi$: set of simple roots, see the
appendix). Here $\alpha_h$ is the highest root.


Ground-State Wave Function and Energy 


One straightforward outcome of the factorized 
Hamiltonian [4] is the universal ground-state wave 
function, which is given by 


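\[ \Phi_0(q) = e^{W(q)/\hbar} = \prod_{\alpha \in \Delta_+} |w(\alpha \cdot q)|^{g_\alpha/\hbar} \left( \times\, e^{-(\omega/2\hbar) q^2} \right), \qquad \mathcal{H}\Phi_0(q) = 0 \tag{6} \]


The exponential factor $e^{-(\omega/2\hbar) q^2}$ exists only for the
Calogero systems. The ground-state energy, that is,
the constant part of $\mathcal{H} = H - \mathcal{E}_0$, has a universal
expression for each potential:


\[ \mathcal{E}_0 = \begin{cases} 0 & \text{rational} \\ \omega\left( \hbar r/2 + \sum_{\alpha \in \Delta_+} g_\alpha \right) & \text{Calogero} \end{cases} \qquad \mathcal{E}_0 = 2a^2\rho^2 \times \begin{cases} -1 & \text{hyperbolic} \\ \phantom{-}1 & \text{Sutherland} \end{cases} \tag{7} \]


where $\rho = \frac{1}{2} \sum_{\alpha \in \Delta_+} g_\alpha \alpha$ is called a “deformed
Weyl vector.” Obviously, $\Phi_0(q)$ is square integrable
in the configuration spaces for the Calogero and
Sutherland systems and not square integrable for the
rational and hyperbolic potentials.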
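A quick way to see how the factorized form, the pre-potential, and the ground-state energy fit together is to expand $(p_j - i\,\partial W/\partial q_j)(p_j + i\,\partial W/\partial q_j) = p_j^2 + (\partial W/\partial q_j)^2 + \hbar\,\partial^2 W/\partial q_j^2$ using $p_j = -i\hbar\,\partial/\partial q_j$. The sketch below (Python, standard library only) evaluates the resulting potential term for the A-type Calogero pre-potential with a single coupling g and compares it with $g(g - \hbar)\sum_{j<k} 1/(q_j - q_k)^2 + \omega^2 q^2/2 - \mathcal{E}_0$, where $\mathcal{E}_0 = \omega(\hbar r/2 + g\,r(r-1)/2)$. The restriction to A-type roots $e_j - e_k$, the function names, and the sample values are assumptions made for illustration.

```python
def prepotential_derivatives(q, g, omega):
    """dW/dq_j and d^2W/dq_j^2 for W = g sum_{j<k} ln|q_j - q_k| - (omega/2) q^2."""
    r = len(q)
    first = [sum(g / (q[j] - q[k]) for k in range(r) if k != j) - omega * q[j]
             for j in range(r)]
    second = [-sum(g / (q[j] - q[k]) ** 2 for k in range(r) if k != j) - omega
              for j in range(r)]
    return first, second


def factorized_potential(q, g, omega, hbar):
    """Potential part of the factorized Hamiltonian: (1/2) sum_j (W_j^2 + hbar W_jj)."""
    first, second = prepotential_derivatives(q, g, omega)
    return 0.5 * sum(w1 * w1 + hbar * w2 for w1, w2 in zip(first, second))


def calogero_potential_minus_e0(q, g, omega, hbar):
    """g(g - hbar) sum_{j<k} 1/(q_j - q_k)^2 + omega^2 q^2 / 2 - E_0 for A-type roots."""
    r = len(q)
    inverse_square = sum(1.0 / (q[j] - q[k]) ** 2
                         for j in range(r) for k in range(j + 1, r))
    e0 = omega * (hbar * r / 2 + g * r * (r - 1) / 2)   # r(r-1)/2 positive roots
    return (g * (g - hbar) * inverse_square
            + 0.5 * omega**2 * sum(x * x for x in q) - e0)


if __name__ == "__main__":
    q = [1.7, 0.4, -0.9, -2.2]
    print(factorized_potential(q, 0.8, 1.5, 0.3),
          calogero_potential_minus_e0(q, 0.8, 1.5, 0.3))
```

Setting `omega = 0` reduces the comparison to the rational case, where the constant vanishes.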


Excited States, Triangularity, and Spectrum 


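Excited states of the C-M systems can be easily
obtained as eigenfunctions of a differential operator
$\tilde{H}$ obtained from $\mathcal{H}$ by a similarity transformation:


\[ \tilde{H} = e^{-W/\hbar}\, \mathcal{H}\, e^{W/\hbar} = -\frac{1}{2} \sum_{j=1}^{r} \left( \hbar^2 \frac{\partial^2}{\partial q_j^2} + 2\hbar \frac{\partial W}{\partial q_j} \frac{\partial}{\partial q_j} \right) \]


The eigenvalue equation for $\tilde{H}$, $\tilde{H}\Psi_{\mathcal{E}} = \mathcal{E}\Psi_{\mathcal{E}}$, is then
equivalent to that of the original Hamiltonian,
$\mathcal{H}\Psi_{\mathcal{E}} e^{W/\hbar} = \mathcal{E}\Psi_{\mathcal{E}} e^{W/\hbar}$. Since all the singularities of the
Fuchsian differential equation $\mathcal{H}\psi(q) = \mathcal{E}\psi(q)$ are
contained in the ground-state wave function $e^{W/\hbar}$, $\Psi_{\mathcal{E}}$
must be regular at finite q, including all the
reflection boundaries. As for the rational and
hyperbolic potentials, the energy eigenvalues are
only continuous. For the rational case, the eigenfunctions
are multivariable generalizations of Bessel
functions.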


Calogero systems The similarity-transformed
Hamiltonian $\tilde{H}$ reads


\[ \tilde{H} = \hbar\omega\, q \cdot \frac{\partial}{\partial q} - \frac{\hbar^2}{2} \sum_{j=1}^{r} \frac{\partial^2}{\partial q_j^2} - \hbar \sum_{\alpha \in \Delta_+} \frac{g_\alpha}{\alpha \cdot q}\, \alpha \cdot \frac{\partial}{\partial q} \tag{8} \]


which maps a Coxeter-invariant polynomial in q of
degree d to another of degree d. Thus, the
Hamiltonian $\tilde{H}$ [8] is lower-triangular in the basis
of Coxeter-invariant polynomials and the diagonal
elements have values $\hbar\omega \times$ degree, as given by the
first term. Independent Coxeter-invariant polynomials
exist at the degrees $f_j$ listed in Table 2:
$f_j = 1 + e_j$, $j = 1, \ldots, r$, where $\{e_j\}$ are the
exponents of $\Delta$.

The eigenvalues of the Hamiltonian $\mathcal{H}$ are $\hbar\omega N$
with N a non-negative integer. N can be
expressed as $N = \sum_{j=1}^{r} n_j f_j$, $n_j \in \mathbb{Z}_+$, and the
degeneracy of the eigenvalue $\hbar\omega N$ is the number
of partitions of N into the degrees $f_j$. It is remarkable that the
coupling constant dependence appears only in the
ground-state energy $\mathcal{E}_0$. This is a deformation of
the isotropic harmonic oscillator confined in the
principal Weyl chamber. The eigenpolynomials
are generalizations of multivariable Laguerre
(Hermite) polynomials. One immediate consequence
of this spectrum is the periodicity of the quantum
motion. If a system has a wave function $\psi(0)$ at
t = 0, then at $t = T = 2\pi/\omega$ the system has physically
the same wave function as $\psi(0)$, that is,
$\psi(T) = e^{-i\mathcal{E}_0 T/\hbar}\psi(0)$. The same assertion holds at the
classical level, too.


Table 2 The degrees $f_j$ in which independent Coxeter-invariant
polynomials exist

Δ        f_j = 1 + e_j              Δ        f_j = 1 + e_j
A_r      2, 3, 4, ..., r+1          E_8      2, 8, 12, 14, 18, 20, 24, 30
B_r      2, 4, 6, ..., 2r           F_4      2, 6, 8, 12
C_r      2, 4, 6, ..., 2r           G_2      2, 6
D_r      2, 4, ..., 2r-2, r         I_2(m)   2, m
E_6      2, 5, 6, 8, 9, 12          H_3      2, 6, 10
E_7      2, 6, 8, 10, 12, 14, 18    H_4      2, 12, 20, 30
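The degeneracy of a Calogero level $\hbar\omega N$ can be counted directly from the degrees in Table 2: it is the number of non-negative integer solutions $(n_1, \ldots, n_r)$ of $N = \sum_j n_j f_j$. The following sketch (Python, standard library only) does this with a standard coin-counting recursion; the function name `level_degeneracy` is an illustrative assumption, and the degree lists passed in the example are read off Table 2 for $A_2$, $B_2$, and $H_3$.

```python
def level_degeneracy(N, degrees):
    """Number of tuples (n_1, ..., n_r), n_j >= 0, with sum_j n_j * f_j == N."""
    ways = [1] + [0] * N          # ways[t]: number of solutions with total t
    for f in degrees:             # add one degree at a time, unlimited multiplicity
        for total in range(f, N + 1):
            ways[total] += ways[total - f]
    return ways[N]


if __name__ == "__main__":
    print(level_degeneracy(6, (2, 3)))        # A_2, N = 6
    print(level_degeneracy(8, (2, 4)))        # B_2, N = 8
    print(level_degeneracy(12, (2, 6, 10)))   # H_3, N = 12
```

For $A_2$ at N = 6 the two solutions are $(n_1, n_2) = (3, 0)$ and $(0, 2)$; N = 1 has no solution for any root system in the table because the smallest degree is 2.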


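Sutherland Systems The periodicity of the trigonometric
potential dictates that the wave function
should be a Bloch factor $e^{2ia\mu \cdot q}$ (where $\mu$ is a weight)
multiplied by a Fourier series in terms of simple
roots. The basis of the Weyl invariant wave
functions is specified by a dominant weight
$\lambda = \sum_{j=1}^{r} m_j \lambda_j$, $m_j \in \mathbb{Z}_+$: $\phi_\lambda(q) = \sum_{\mu \in O_\lambda} e^{2ia\mu \cdot q}$, where
$O_\lambda$ is the orbit of $\lambda$ by the action of the Weyl group:
$O_\lambda = \{g(\lambda) \mid g \in G_\Delta\}$. The set of functions $\{\phi_\lambda\}$ has an
order $\succ$: $|\lambda| > |\lambda'| \Rightarrow \phi_\lambda \succ \phi_{\lambda'}$. The similarity-transformed
Hamiltonian $\tilde{H}$ given by


\[ \tilde{H} = -\frac{\hbar^2}{2} \sum_{j=1}^{r} \frac{\partial^2}{\partial q_j^2} - a\hbar \sum_{\alpha \in \Delta_+} g_\alpha \cot(a\alpha \cdot q)\, \alpha \cdot \frac{\partial}{\partial q} \tag{9} \]


is lower-triangular in this basis:
$\tilde{H}\phi_\lambda = 2a^2(\hbar^2\lambda^2 + 2\hbar\rho \cdot \lambda)\phi_\lambda + \sum_{|\lambda'| < |\lambda|} c_{\lambda'}\phi_{\lambda'}$. That is, the eigenvalue is
$\mathcal{E} = 2a^2(\hbar^2\lambda^2 + 2\hbar\rho \cdot \lambda)$ or $\mathcal{E} + \mathcal{E}_0 = 2a^2(\hbar\lambda + \rho)^2$.
Again, the coupling constant dependence comes
solely from the deformed Weyl vector $\rho$. This
spectrum is a deformation of the spectrum corresponding
to the free motion with momentum $2\hbar a\lambda$
in the principal Weyl alcove. The corresponding
eigenfunction is called a generalized Jack polynomial
or Heckman-Opdam’s Jacobi polynomial. For the
rank-2 (r = 2) root systems, $A_2$, $B_2 \cong C_2$ and $I_2(m)$
(the dihedral group), the complete set of eigenfunctions
is known explicitly.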
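The Sutherland spectrum can be evaluated in exact rational arithmetic for the A-type root systems. The sketch below (Python, standard library only) realizes $A_{r-1}$ in $\mathbb{R}^r$ with simple roots $e_j - e_{j+1}$ and fundamental weights $\lambda_j = e_1 + \cdots + e_j - (j/r)(e_1 + \cdots + e_r)$, builds the deformed Weyl vector $\rho$ for a single coupling g, and evaluates $\mathcal{E} = 2a^2(\hbar^2\lambda^2 + 2\hbar\rho \cdot \lambda)$. The explicit weight formula, the function names, and the sample parameter values are assumptions chosen for illustration.

```python
from fractions import Fraction


def dot(u, v):
    """Standard inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def simple_roots(r):
    """alpha_j = e_j - e_{j+1}, j = 1..r-1, for A_{r-1} realized in R^r."""
    return [tuple(Fraction(int(i == j) - int(i == j + 1)) for i in range(r))
            for j in range(r - 1)]


def fundamental_weights(r):
    """lambda_j = e_1 + ... + e_j - (j/r)(e_1 + ... + e_r), j = 1..r-1."""
    return [tuple(Fraction(int(i < j)) - Fraction(j, r) for i in range(r))
            for j in range(1, r)]


def deformed_weyl_vector(r, g):
    """rho = (1/2) sum over positive roots e_j - e_k (j < k) of g * alpha."""
    rho = [Fraction(0)] * r
    for j in range(r):
        for k in range(j + 1, r):
            rho[j] += Fraction(g) / 2
            rho[k] -= Fraction(g) / 2
    return tuple(rho)


def sutherland_energy(lam, rho, a, hbar):
    """E = 2 a^2 (hbar^2 lambda^2 + 2 hbar rho . lambda)."""
    return 2 * a**2 * (hbar**2 * dot(lam, lam) + 2 * hbar * dot(rho, lam))


if __name__ == "__main__":
    r, g, a, hbar = 3, Fraction(1, 2), Fraction(1), Fraction(1, 3)
    weights = fundamental_weights(r)
    rho = deformed_weyl_vector(r, g)
    lam = tuple(2 * x + y for x, y in zip(weights[0], weights[1]))  # 2*lambda_1 + lambda_2
    print(sutherland_energy(lam, rho, a, hbar))
```

In this realization the roots have squared length 2, so $\alpha^\vee = \alpha$ and the defining relation of the fundamental weights reads $\alpha_j \cdot \lambda_k = \delta_{jk}$; one also finds $\rho = g \sum_j \lambda_j$ and $\rho^2 = g^2 r(r^2 - 1)/12$.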


Quantum Lax Pair and Quantum Conserved 
Quantities 


The universal Lax pair for C-M systems is given in
terms of the representations of the Coxeter (Weyl)
group instead of the Lie algebra. The Lax operators
without spectral parameter for the rational, trigonometric,
and hyperbolic potentials are


\[ L(p, q) = p \cdot \hat{H} + X(q), \qquad X(q) = i \sum_{\alpha \in \Delta_+} g_\alpha (\alpha \cdot \hat{H})\, x(\alpha \cdot q)\, \hat{s}_\alpha \tag{10} \]

\[ M(q) = \frac{i}{2} \sum_{\alpha \in \Delta_+} g_\alpha \alpha^2\, y(\alpha \cdot q) (\hat{s}_\alpha - I) \tag{11} \]


where I is the identity operator and $\{\hat{s}_\alpha \mid \alpha \in \Delta\}$ are
the reflection operators of the root system. They act
on a set of $\mathbb{R}^r$ vectors, $\mathcal{R} = \{\mu^{(k)} \in \mathbb{R}^r \mid k = 1, \ldots, d\}$,
permuting them under the action of the reflection
group. The vectors in $\mathcal{R}$ form a basis for the
representation space V of dimension d. The matrix
elements of the operators $\{\hat{s}_\alpha \mid \alpha \in \Delta\}$ and
$\{\hat{H}_j \mid j = 1, \ldots, r\}$ are defined as follows:
$(\hat{s}_\alpha)_{\mu\nu} = \delta_{\mu, s_\alpha(\nu)} = \delta_{\nu, s_\alpha(\mu)}$, $(\hat{H}_j)_{\mu\nu} = \mu_j \delta_{\mu\nu}$, $\alpha \in \Delta$, $\mu, \nu \in \mathcal{R}$.
The form of the functions x, y depends on
the chosen potential as given in Table 1. Then the
equations of motion can be expressed in a matrix
form $dL/dt = (i/\hbar)[H, L] = [L, M]$. The operator M
satisfies the relation $\sum_{\mu \in \mathcal{R}} M_{\mu\nu} = \sum_{\nu \in \mathcal{R}} M_{\mu\nu} = 0$,
which is essential for deriving quantum conserved
quantities as the total sum (Ts) of all the matrix
elements of $L^n$: $K_n = \mathrm{Ts}(L^n) \equiv \sum_{\mu,\nu \in \mathcal{R}} (L^n)_{\mu\nu}$,
$[H, K_n] = 0$, $[K_m, K_n] = 0$, $n, m = 1, 2, \ldots$ In particular,
the power 2 is universal to all the root systems, and
the quantum Hamiltonian is given by $H \propto K_2 +$
const. As in the affine Toda molecule systems, a Lax
pair with a spectral parameter can also be introduced
universally for all the above potentials. The
Dunkl operators, or the commuting differential-difference
operators, are also used to construct
quantum conserved quantities for some root systems.
This method is essentially equivalent to the
universal Lax operator formalism. As the Lax
operators do not contain Planck’s constant, the
quantum Lax pair is essentially of the same form as
the classical Lax pair. The difference between the
trace (tr) and the total sum (Ts) vanishes as $\hbar \to 0$.
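To make the universal construction concrete, the sketch below (Python, standard library only) assembles L and M from the definitions of the reflection and weight operators for the rational potential, $x(u) = 1/u$ and $y(u) = -1/u^2$, and compares them with the explicit A-type matrices of [2]. It assumes the A-type positive roots $e_j - e_k$ (j < k), the set of vectors $\{e_1, \ldots, e_r\}$ as the representation, a single coupling g, and classical numerical values for q and p. The function names are illustrative and not from the text.

```python
from fractions import Fraction


def dot(u, v):
    """Standard inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def reflect(alpha, v):
    """s_alpha(v) = v - (alpha^vee . v) alpha with alpha^vee = 2 alpha / |alpha|^2."""
    c = Fraction(2 * dot(alpha, v), dot(alpha, alpha))
    return tuple(vi - c * ai for vi, ai in zip(v, alpha))


def universal_lax(q, p, g, positive_roots, weights):
    """L of [10] and M of [11] for the rational potential, as d x d nested lists."""
    d = len(weights)
    L = [[0j] * d for _ in range(d)]
    M = [[0j] * d for _ in range(d)]
    for m, mu in enumerate(weights):
        L[m][m] = complex(dot(p, mu))                   # (p . H)_{mu mu} = p . mu
    for alpha in positive_roots:
        u = dot(alpha, q)
        coeff = 0.5j * g * dot(alpha, alpha) * (-1.0 / u**2)   # (i/2) g alpha^2 y(alpha.q)
        for n, nu in enumerate(weights):
            m = weights.index(reflect(alpha, nu))       # (s_alpha)_{mu nu} = delta_{mu, s_alpha(nu)}
            L[m][n] += 1j * g * dot(alpha, weights[m]) / u      # i g (alpha . mu) x(alpha.q)
            M[m][n] += coeff                            # s_alpha part
            M[n][n] -= coeff                            # -I part of (s_alpha - I)
    return L, M


def explicit_lax(q, p, g):
    """The A-type matrices of [2], written out entry by entry."""
    r = len(q)
    L = [[0j] * r for _ in range(r)]
    M = [[0j] * r for _ in range(r)]
    for j in range(r):
        L[j][j] = complex(p[j])
        for k in range(r):
            if k != j:
                L[j][k] = 1j * g / (q[j] - q[k])
                M[j][k] = -1j * g / (q[j] - q[k]) ** 2
                M[j][j] += 1j * g / (q[j] - q[k]) ** 2
    return L, M


def max_difference(q, p, g):
    """Largest entrywise difference between the universal and explicit Lax pairs."""
    r = len(q)
    weights = [tuple(int(i == j) for i in range(r)) for j in range(r)]
    roots = [tuple(int(i == j) - int(i == k) for i in range(r))
             for j in range(r) for k in range(j + 1, r)]
    (L1, M1), (L2, M2) = universal_lax(q, p, g, roots, weights), explicit_lax(q, p, g)
    return max(max(abs(L1[j][k] - L2[j][k]), abs(M1[j][k] - M2[j][k]))
               for j in range(r) for k in range(r))


if __name__ == "__main__":
    print(max_difference([2.0, 0.7, -0.4, -1.9], [0.3, -1.1, 0.5, 0.2], 1.3))
```

The factor $\alpha^2 = 2$ in M is what turns the prefactor i/2 into the off-diagonal entries $-ig/(q_j - q_k)^2$ of [2], and the $-I$ term supplies the diagonal elements $m_j$.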


Lax pair for Calogero systems The quantum Lax
pair for the Calogero systems is obtained from the
universal Lax pair [10] by the replacement
$L \to L^{\pm} = L \pm i\omega Q$, $Q \equiv q \cdot \hat{H}$, which correspond to the
creation and annihilation operators of a harmonic
oscillator. The equations of motion are rewritten as
$dL^{\pm}/dt = (i/\hbar)[H, L^{\pm}] = [L^{\pm}, M] \pm i\omega L^{\pm}$. Then
$\mathcal{L}^{\pm} = L^{\pm}L^{\mp}$ satisfy the Lax-type equation
$d\mathcal{L}^{\pm}/dt = (i/\hbar)[H, \mathcal{L}^{\pm}] = [\mathcal{L}^{\pm}, M]$, giving rise to conserved quantities
$\mathrm{Ts}(\mathcal{L}^{\pm})^n$, $n = 1, 2, \ldots$ The Calogero Hamiltonian is
given by $H \propto \mathrm{Ts}(\mathcal{L}^{\pm})$.

All the eigenstates of the Calogero Hamiltonian $\mathcal{H}$
with eigenvalues $\hbar\omega N$, $N = \sum_{j=1}^{r} n_j f_j$, $n_j \in \mathbb{Z}_+$, are
simply constructed in terms of $L^{\pm}$: $\prod_{j=1}^{r} (B^+_{f_j})^{n_j} e^{W/\hbar}$.
Here the integers $\{f_j\}$, $j = 1, \ldots, r$, are listed in
Table 2. The creation operators $B^+_{f_j}$ and the
corresponding annihilation operators $B^-_{f_j}$ are defined
by $B^{\pm}_{f_j} = \mathrm{Ts}(L^{\pm})^{f_j}$, $j = 1, \ldots, r$. They are Hermitian
conjugate to each other, $(B^+_{f_j})^{\dagger} = B^-_{f_j}$, with respect to
the standard Hermitian inner product of the states
defined in PW. They satisfy commutation relations
$[H, B^{\pm}_k] = \pm\hbar k\omega B^{\pm}_k$, $[B^+_k, B^+_l] = [B^-_k, B^-_l] = 0$, $k, l \in \{f_j \mid j = 1, \ldots, r\}$.
The ground state is annihilated by
all the annihilation operators: $B^-_{f_j} e^{W/\hbar} = 0$, $j = 1, \ldots, r$.


Further Developments 
Rational Potentials: Superintegrability 


The systems with the rational potential have a remarkable
property: superintegrability. A rational C-M
system based on a rank-r root system has $2r - 1$
independent conserved quantities. Roughly speaking,
they are of the form $K_n = \mathrm{Ts}(L^n)$, $J_m = \mathrm{Ts}(QL^m)$, $Q \equiv q \cdot \hat{H}$,
among which only r are involutive. At the
classical level, superintegrability can be characterized 
as algebraic linearizability. Since a commutator of any 
conserved quantities is again a conserved quantity, these 
conserved quantities form a nonlinear algebra called a 
quadratic algebra. It can be considered as a finite- 
dimensional analog of the W-algebra appearing in 
certain conformal field theory. 


Quantum vs Classical Integrability 


In C-M systems, the classical and quantum integr- 
ability are very closely related. The quantum discrete 
spectra of the Calogero and the Sutherland systems 
are, as shown above, expressed in terms of the 
coupling constant ($\omega$, g) and the exponents or the
weights of the corresponding root systems. Namely,
they are integral multiples of coupling constants. The
corresponding classical systems with the potential
$V(q) = (1/2) \sum_{j=1}^{r} (\partial W(q)/\partial q_j)^2$ share many remarkable
properties. As is clear from Figure 1, they always
have an equilibrium position. The equilibrium positions
($\bar{q}$) are described by the zeros of a classical
orthogonal polynomial: the Hermite polynomial
(A-type Calogero), the Laguerre polynomial (B, C, D-type
Calogero), the Chebyshev polynomial (A-type
Sutherland) and the Jacobi polynomial (B, C, D-type
Sutherland). For the exceptional root systems, the
corresponding polynomials were not known for a long
time. The minimum energy of the classical potential
V(q) at the equilibrium is the quantum ground-state
energy $\lim_{\hbar \to 0} \mathcal{E}_0$ itself. It is also an integral multiple of
coupling constants for both Calogero and Sutherland
cases. Near a classical equilibrium, a multiparticle
dynamical system is always reduced to a system of
coupled harmonic oscillators. For Calogero systems,
the eigenfrequencies of these small oscillations are, in
fact, exactly the same as the quantum eigenfrequencies,
$\omega f_j = \omega(1 + e_j)$. For Sutherland systems, the
classical eigenfrequencies are the same as the $O(\hbar)$
part of the quantum spectra corresponding to all
the fundamental weights $\lambda_j$: $2a^2\lambda_j \cdot \rho$. Moreover, the
eigenvalues of various Lax matrices L and M at the 
equilibrium take many “interesting values.” These 
results provide ample explicit examples of the general 
theorem on the quantum-classical correspondence 
formulated by Loris—Sasaki. 
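The statement about Hermite zeros can be checked directly for small particle numbers. For the A-type Calogero pre-potential $W = g\sum_{j<k} \ln|q_j - q_k| - (\omega/2)q^2$, the points where $\partial W/\partial q_j = 0$ for all j minimize $(1/2)\sum_j (\partial W/\partial q_j)^2$. The sketch below (Python, standard library only) uses the closed-form zeros of the Hermite polynomials $H_2$, $H_3$, $H_4$, rescales them by $\sqrt{g/\omega}$, and evaluates the gradient of W there. The rescaling factor, the restriction to $r \le 4$, and the function names are assumptions made for this illustration.

```python
import math

# Zeros of the (physicists') Hermite polynomials H_2, H_3, H_4 in closed form.
HERMITE_ZEROS = {
    2: [-math.sqrt(0.5), math.sqrt(0.5)],
    3: [-math.sqrt(1.5), 0.0, math.sqrt(1.5)],
    4: [-math.sqrt((3 + math.sqrt(6)) / 2), -math.sqrt((3 - math.sqrt(6)) / 2),
        math.sqrt((3 - math.sqrt(6)) / 2), math.sqrt((3 + math.sqrt(6)) / 2)],
}


def grad_W(q, g, omega):
    """dW/dq_j for W = g sum_{j<k} ln|q_j - q_k| - (omega/2) q^2."""
    r = len(q)
    return [sum(g / (q[j] - q[k]) for k in range(r) if k != j) - omega * q[j]
            for j in range(r)]


def equilibrium(r, g, omega):
    """Candidate equilibrium: zeros of H_r rescaled by sqrt(g/omega)."""
    scale = math.sqrt(g / omega)
    return [scale * x for x in HERMITE_ZEROS[r]]


def max_gradient(r, g, omega):
    """Largest |dW/dq_j| at the candidate equilibrium; should vanish up to rounding."""
    return max(abs(w) for w in grad_W(equilibrium(r, g, omega), g, omega))


if __name__ == "__main__":
    for r in (2, 3, 4):
        print(r, max_gradient(r, 0.8, 1.5))
```

The check relies on the classical identity $\sum_{k \neq j} 1/(x_j - x_k) = x_j$ satisfied by the zeros $x_j$ of a Hermite polynomial.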


Spin Models 


For any root system $\Delta$ and an irreducible representation
$\mathcal{R}$ of the Coxeter (Weyl) group $G_\Delta$, a spin
C-M system can be defined for each of the
potentials: rational, Calogero, hyperbolic and
Sutherland. For each member $\mu$ of $\mathcal{R}$, to be called
a “site,” a vector space $V_\mu$ is associated whose
element is called a “spin.” The dynamical variables
are those of the particles $\{q_j, p_j\}$ and the spin
exchange operators $\{P_\alpha\}$ ($\alpha \in \Delta$) which exchange
the spins at the sites $\mu$ and $s_\alpha(\mu)$. For each $\Delta$ and $\mathcal{R}$
a spin exchange model can be defined by “freezing”
the particle degrees of freedom at the equilibrium
point of the corresponding classical potential
$\{q, p\} \to \{\bar{q}, 0\}$. These are generalizations of the
Haldane-Shastry model for Sutherland potentials and
that of Polychronakos for the Calogero potentials.
Universal Lax pair operators for both spin C-M 
systems and spin exchange models are known and 
conserved quantities are constructed. 


Integrable Deformations 


C—M systems allow various integrable deformations at 
the classical and/or quantum levels. One of the well- 
known deformations is the so-called “relativistic” C-M 
system or the Ruijsenaars—Schneider (R-S) system. For 
degenerate potentials, they are integrable both at the 
classical and quantum levels. The classical quantities of 
the R-S systems at equilibrium exhibit many interesting 
properties, too. The equilibrium positions are described 
by the zeros of certain deformation of the above- 
mentioned classical polynomials. The frequencies of 
small oscillations are also related to the exact quantum 
spectrum, and they can be expressed as coupling 
constant times the (q-)integers.

Inozemtsev models are classically integrable multiparticle
dynamical systems related to C-M systems
based on classical root systems (A, B, C, D) with
additional $q^6$ (rational) or $\sin^2 2q$ (trigonometric)
potentials. Their quantum versions are not exactly
solvable in contrast to the C-M or R-S systems, 
although there is some evidence of their Liouville 
integrability (without a proper Hilbert space). 
Quantum Inozemtsev systems can be deformed to 
be a widest class of quasi-exactly solvable multi- 
particle dynamical systems. They possess a form of 
higher-order supersymmetry for which the method 
of prepotential is also useful. 


Appendix: Root Systems 


Some rudimentary facts of the root systems and
reflections are recapitulated here. The set of roots $\Delta$
is invariant under reflections in the hyperplane
perpendicular to each vector in $\Delta$. In other words,
$s_\alpha(\beta) \in \Delta$, $\forall \alpha, \beta \in \Delta$, where


\[ s_\alpha(\beta) = \beta - (\alpha^\vee \cdot \beta)\alpha, \qquad \alpha^\vee \equiv 2\alpha/|\alpha|^2 \tag{12} \]


The set of reflections $\{s_\alpha \mid \alpha \in \Delta\}$ generates a group
$G_\Delta$, known as a Coxeter group, or finite reflection
group. The orbit of $\beta \in \Delta$ is the set of root vectors
resulting from the action of the Coxeter group on
it. The set of positive roots $\Delta_+$ may be defined in
terms of a vector $U \in \mathbb{R}^r$, with $\alpha \cdot U \neq 0$, $\forall \alpha \in \Delta$,
as the roots $\alpha \in \Delta$ such that $\alpha \cdot U > 0$. Given $\Delta_+$,
there is a unique set of r simple roots
$\Pi = \{\alpha_j \mid j = 1, \ldots, r\}$ defined such that they span
the root space and the coefficients $\{a_j\}$ in
$\beta = \sum_{j=1}^{r} a_j \alpha_j$ for $\beta \in \Delta_+$ are all non-negative.
The highest root $\alpha_h$, for which $\sum_{j=1}^{r} a_j$ is maximal,
is then also determined uniquely. The subset
of reflections $\{s_\alpha \mid \alpha \in \Pi\}$ in fact generates the
Coxeter group $G_\Delta$. The products of $s_\alpha$, with $\alpha \in \Pi$,
are subject solely to the relations
$(s_\alpha s_\beta)^{m(\alpha, \beta)} = 1$, $\alpha, \beta \in \Pi$. The interpretation is that
$s_\alpha s_\beta$ is a rotation in some plane by $2\pi/m(\alpha, \beta)$. The
set of positive integers $m(\alpha, \beta)$ (with
$m(\alpha, \alpha) = 1$, $\forall \alpha \in \Pi$) uniquely specifies the Coxeter
group. The weight lattice $P(\Delta)$ is defined as the
$\mathbb{Z}$-span of the fundamental weights $\{\lambda_j\}$, defined by
$\alpha_j^\vee \cdot \lambda_k = \delta_{jk}$, $\forall \alpha_j \in \Pi$.

The root systems for finite reflection groups may
be divided into two types: crystallographic and
noncrystallographic. Crystallographic root systems
satisfy the additional condition $\alpha^\vee \cdot \beta \in \mathbb{Z}$, $\forall \alpha, \beta \in \Delta$.
The remaining noncrystallographic root systems are
$H_3$, $H_4$, whose Coxeter groups are the symmetry
groups of the icosahedron and four-dimensional
600-cell, respectively, and the dihedral group of
order 2m, $\{I_2(m) \mid m \geq 4\}$.

The explicit examples of the classical root
systems, that is, A, B, C, and D are given below.
For the exceptional and noncrystallographic root
systems, the reader is referred to Humphreys’ book.
In all cases, $\{e_j\}$ denotes an orthonormal basis in $\mathbb{R}^r$.


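1. $A_{r-1}$: This root system is related to the Lie algebra $su(r)$.

$$ \Delta = \bigcup_{1 \le j < k \le r} \{\pm(e_j - e_k)\}, \qquad \Pi = \bigcup_{j=1}^{r-1} \{e_j - e_{j+1}\} $$ [13]

2. $B_r$: This root system is associated with the Lie algebra $so(2r+1)$. The long roots have $(\text{length})^2 = 2$ and the short roots have $(\text{length})^2 = 1$:

$$ \Delta = \bigcup_{1 \le j < k \le r} \{\pm e_j \pm e_k\} \cup \bigcup_{j=1}^{r} \{\pm e_j\}, \qquad \Pi = \bigcup_{j=1}^{r-1} \{e_j - e_{j+1}\} \cup \{e_r\} $$ [14]

3. $C_r$: This root system is associated with the Lie algebra $sp(2r)$. The long roots have $(\text{length})^2 = 4$ and the short roots have $(\text{length})^2 = 2$:

$$ \Delta = \bigcup_{1 \le j < k \le r} \{\pm e_j \pm e_k\} \cup \bigcup_{j=1}^{r} \{\pm 2e_j\}, \qquad \Pi = \bigcup_{j=1}^{r-1} \{e_j - e_{j+1}\} \cup \{2e_r\} $$ [15]

4. $D_r$: This root system is associated with the Lie algebra $so(2r)$:

$$ \Delta = \bigcup_{1 \le j < k \le r} \{\pm e_j \pm e_k\}, \qquad \Pi = \bigcup_{j=1}^{r-1} \{e_j - e_{j+1}\} \cup \{e_{r-1} + e_r\} $$ [16]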
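The explicit root sets above lend themselves to a quick computational check. The following is a minimal sketch, not part of the original article: the function names `classical_roots`, `reflect` and `is_reflection_invariant` are hypothetical, and the code simply enumerates the roots of $A_{r-1}$, $B_r$, $C_r$, $D_r$ as integer vectors and tests invariance under the reflections of eqn [12].

```python
import itertools
import numpy as np

def classical_roots(kind, r):
    """Return the roots of A_{r-1}, B_r, C_r or D_r as integer vectors in R^r."""
    e = np.eye(r, dtype=int)
    roots = []
    for j, k in itertools.combinations(range(r), 2):
        if kind == "A":
            roots += [e[j] - e[k], e[k] - e[j]]
        else:
            roots += [sj * e[j] + sk * e[k] for sj in (1, -1) for sk in (1, -1)]
    if kind in ("B", "C"):
        scale = 1 if kind == "B" else 2  # short roots e_j (B) or long roots 2e_j (C)
        for j in range(r):
            roots += [scale * e[j], -scale * e[j]]
    return [tuple(v.tolist()) for v in roots]

def reflect(alpha, beta):
    """s_alpha(beta) = beta - (alpha^vee . beta) alpha, in exact integer arithmetic."""
    a, b = np.array(alpha), np.array(beta)
    num, den = 2 * int(a @ b), int(a @ a)
    assert num % den == 0  # crystallographic condition: alpha^vee . beta is an integer
    return tuple((b - (num // den) * a).tolist())

def is_reflection_invariant(roots):
    """Check that s_alpha(beta) is again a root for all pairs of roots."""
    root_set = set(roots)
    return all(reflect(a, b) in root_set for a in roots for b in roots)

if __name__ == "__main__":
    for kind in "ABCD":
        roots = classical_roots(kind, 4)
        print(kind, len(roots), is_reflection_invariant(roots))
```

Integer arithmetic is used so that set membership is exact; this works only because the classical root systems are crystallographic.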


See also: Calogero-Moser-Sutherland Systems of Nonrelativistic and Relativistic Type; Dynamical Systems in Mathematical Physics: An Illustration from Water Waves; Functional Equations and Integrable Systems; Integrable Discrete Systems; Integrable Systems in Random Matrix Theory; Integrable Systems: Overview; Isochronous Systems; Toda Lattices.
Introduction


Statistical physics deals with systems with many 
degrees of freedom and the problems concern finding 
procedures for the extraction of relevant physical 
quantities for these extremely complex systems. The 
idea is to find relevant reduction procedures which 
map the complex systems onto simpler, tractable 
models at the price of introducing elements of 
uncertainty. Therefore, probability theory is a natural 
mathematical tool in statistical physics. Since the early 
days of statistical physics, in classical (Newtonian) 
physical systems, it is natural to model the observables 
by a collection of random variables acting on a 
probability space. Kolmogorovian probability techni- 
ques and results are the main tools in the development 
of classical statistical physics. A random variable is 
usually considered as a measurable function with 
expectation given as its integral with respect to a 
probability measure. Alternatively, a random variable 
can also be viewed as a multiplication operator by the 
associated function. Different random variables com- 
mute as multiplication operators, and one speaks of a 
commutative probabilistic model. 

Now, looking at genuine quantum systems, in
many cases the procedure mentioned above leads to
commutative probabilistic models, but there are
realms of physics where quantum noncommutative
probabilistic concepts are unavoidable. Typical
examples of such areas are quantum optics,
low-temperature solid-state physics and ground-state
physics such as quantum field theory. During the
last 50 years physicists have developed more or less 
heuristic methods to deal with, for example, 
manifestations of fluctuations of typical quantum 
nature. In the last 30 years, mathematical founda- 
tions of such theories were also formulated, and a 
notion of quantum probability was launched as a 
branch of mathematical physics and mathematics 
(Cushen and Hudson 1971, Fannes and Quaegebeur 
1983, Quaegebeur 1984, Hudson 1973, Giri and 
von Waldenfels 1978). 

The aim of this article is to review briefly a few 
selected rigorous results concerning noncommuta- 
tive limit theorems. This choice is made not only 
because of the author’s interest but also for its close 
relation to concrete problems in statistical physics 
where one aims at understanding the macroscopic 


phenomena on the basis of the microscopic struc- 
ture. A precise definition or formulation of a 
microscopic and a macroscopic system is of prime 
importance. The so-called algebraic approach of 
dynamical systems (Brattelli and Robinson 1979 and 
2002) offers the necessary generality and mathema- 
tical framework to deal with classical and quantum, 
microscopic and macroscopic, finite and infinite 
systems. The observables of any system are assumed 
to be elements of an (C*- or von Neumann) algebra 
A, and the physical states are given by positive 
linear normalized functionals w of A, mapping the 
observables on their expectation values. 

A common physicist’s belief is that the macro- 
scopic behavior of an idealized infinite system is 
described by a reduced set of macroscopic quantities 
(Sewell 1986). Some examples of these are the 
average densities of particles, energy, momentum, 
magnetic moment, etc. Analogously as the micro- 
scopic quantities, the macroscopic observables 
should be elements of an algebra, and macroscopic 
states of the system should be states on this algebra. 
The main problem is to construct the precise 
mathematical procedures to go from a given micro- 
scopic system to its macroscopic systems. 

A well-known macroscopic system is the one given by the algebra of the observables at infinity (Lanford and Ruelle 1969) containing the spatial averages of local micro-observables, that is, for any local observable $A$ one considers the observable

$$ A_\omega = \text{w-}\lim_{V \to \infty} \frac{1}{|V|} \int_V dx\, \tau_x A $$

where $V$ is any finite volume in $\mathbb{R}^\nu$ and $\tau_x$ the translation over $x \in \mathbb{R}^\nu$, and where w-lim is the weak operator limit in the microstate $\omega$. The limits $A_\omega$ obtained correspond to the law of large numbers in probability. The algebra generated by these limit observables $\mathcal{A}_\omega = \{A_\omega \mid A \in \mathcal{A}\}$ is an abelian algebra of observables of a macroscopic system. This algebra can be identified with an algebra with pointwise product of measurable functions for some measure or macroscopic state.

The content of this review is to describe an analogous mapping from micro to macro but for a different type of scaling, namely the scaling of fluctuations. For any local observable $A \in \mathcal{A}$, one considers the limit

$$ \lim_{V} \frac{1}{|V|^{1/2}} \int_V dx\, \big(\tau_x A - \omega(\tau_x A)\big) \equiv F(A) $$

The problem consists in characterizing $F(A)$ as an operator on a Hilbert space, called a fluctuation operator, and in specifying the algebraic character of the set of all of these.

Based on this quantum central-limit theorem, one 
notes that not all locally different microscopic 
observables always yield different fluctuation opera- 
tors. Hence the central-limit theorem realizes a well- 
defined procedure of coarse graining or reduction 
procedure which is handled by the mathematical 
notion of an equivalence relation on the microscopic 
observables yielding the same fluctuation operator. 

In the following sections we discuss the prelimin- 
aries, the basic results about normal and abnormal 
fluctuations. Three model-independent applications 
are also discussed. In this review, we omit the 
properties of the so-called modulated fluctuations. 

One should remark that we discuss only fluctua- 
tions in space. One can also consider timelike 
fluctuations. The theory of fluctuation operators 
for these has not been explicitly worked out so far. 
However, it is clear that for normal fluctuations the 
clustering properties of the time correlation func- 
tions will play a crucial role. On the other hand, 
typical properties of the structure of this fluctuation 
algebra may come up. 

Another point which one has to stress is that all
systems treated in this review are quasilocal
systems. Other systems, for example, fermion systems,
are not treated. In particular, however, fermion systems
share many properties of quasilocality, and many of
the results mentioned hold true also for fermion
systems.


Preliminaries 
Quantum Lattice Systems 


Although all results we review can be extended to
continuous or more general systems, modulo some
technicalities, we limit ourselves to quasilocal quantum
dynamical lattice systems.

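We consider the quasilocal algebra built on a $\nu$-dimensional lattice $\mathbb{Z}^\nu$. Let $\mathcal{D}(\mathbb{Z}^\nu)$ be the directed set of finite subsets of $\mathbb{Z}^\nu$ where the direction is the inclusion. With each point $x \in \mathbb{Z}^\nu$ we associate an algebra ($C^*$- or von Neumann algebra) $\mathcal{A}_x$, all copies of an algebra $\mathcal{A}$. For all $\Lambda \in \mathcal{D}(\mathbb{Z}^\nu)$, the tensor product $\otimes_{x \in \Lambda} \mathcal{A}_x$ is denoted by $\mathcal{A}_\Lambda$. We take $\mathcal{A}$ to be nuclear, so that there exists a unique $C^*$-norm on $\mathcal{A}_\Lambda$. Every copy $\mathcal{A}_x$ is naturally embedded in $\mathcal{A}_\Lambda$. The family $\{\mathcal{A}_\Lambda\}_{\Lambda \in \mathcal{D}(\mathbb{Z}^\nu)}$ has the usual relations of locality and isotony:

$$ [\mathcal{A}_{\Lambda_1}, \mathcal{A}_{\Lambda_2}] = 0 \quad \text{if } \Lambda_1 \cap \Lambda_2 = \emptyset $$ [1]

$$ \mathcal{A}_{\Lambda_1} \subseteq \mathcal{A}_{\Lambda_2} \quad \text{if } \Lambda_1 \subseteq \Lambda_2 $$ [2]

Denote by $\mathcal{A}_L$ all local observables, that is,

$$ \mathcal{A}_L = \bigcup_{\Lambda} \mathcal{A}_\Lambda $$

This algebra is naturally equipped with a $C^*$-norm $\|\cdot\|$ and its closure

$$ \mathcal{B} = \overline{\mathcal{A}_L} $$

is called a quasilocal $C^*$-algebra and considered as the microscopic algebra of observables of the system. Typical examples are spin systems where $\mathcal{A} = M_n$ is the $n \times n$ complex matrix algebra. In this case, every state $\omega$ of $\mathcal{B}$ is locally normal, that is, there exists a family of density matrices $\{\rho_\Lambda \mid \Lambda \in \mathcal{D}(\mathbb{Z}^\nu)\}$ such that

$$ \omega(A) = \operatorname{tr} \rho_\Lambda A \quad \text{for all } A \in \mathcal{A}_\Lambda $$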


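An important group of *-automorphisms of $\mathcal{B}$ is the group of space translations $\{\tau_x, x \in \mathbb{Z}^\nu\}$:

$$ \tau_x : A \in \mathcal{A}_\Lambda \mapsto \tau_x A \in \mathcal{A}_{\Lambda + x} $$

for all $A \in \mathcal{A}_\Lambda$.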

Note that the quasilocal algebra $\mathcal{B}$ is asymptotically abelian for space translations: that is, for all $A, B \in \mathcal{B}$

$$ \lim_{|x| \to \infty} \|[A, \tau_x B]\| = 0 $$

A state $\omega$ of $\mathcal{B}$ represents a physical state of the system, assigning to every observable $A$ its expectation value $\omega(A)$. Therefore, this setting can be viewed as the quantum analog of the classical probabilistic setting. Sequences of random variables or observables can be constructed by considering an observable and its translates, that is, $(\tau_x(A))_{x \in \mathbb{Z}^\nu}$ is a noncommutative random field. If a state $\omega$ is translation invariant, that is, $\omega \circ \tau_x = \omega$ for all $x$, then all $\tau_x(A)$ are identically distributed random variables. The mixing property of the random field is then expressed by the spatial correlations tending to zero:

$$ \omega(\tau_x(A)\tau_y(B)) - \omega(\tau_x(A))\,\omega(\tau_y(B)) \to 0 $$ [3]

if $|x - y| \to \infty$.

One of the basic limit theorems of probability theory is the weak law of large numbers. In this noncommutative setting the law of large numbers is translated into the problem of the convergence of space averages of an observable $A \in \mathcal{B}$. A first result was given by the mean ergodic theorem of von Neumann (1929). In Brattelli and Robinson (1979, 2002) one finds the following theorem: if the state $\omega$ is space translation invariant and mixing (see [3]) then for all $A$, $B$, and $C$ in $\mathcal{B}$

$$ \lim_{\Lambda} \omega\Big(A \Big(\frac{1}{|\Lambda|} \sum_{x \in \Lambda} \tau_x B\Big) C\Big) = \omega(AC)\,\omega(B) $$ [4]

That is, in the GNS (Gelfand-Naimark-Segal) representation of the state $\omega$, the sequence $S_\Lambda(B) = (1/|\Lambda|) \sum_{x \in \Lambda} \tau_x B$ converges weakly to a multiple of the identity: $S(B) = \omega(B)\mathbb{1}$. This theorem, called the mean ergodic theorem, characterizes the class of states yielding a weak law of large numbers. Clearly, these limits $\{S(A) \mid A \in \mathcal{B}\}$ form a trivial abelian algebra of macroscopic observables.

Now we go a step further and consider space fluctuations. Define the local fluctuation of an observable $A$ in a homogeneous (spatially invariant) state $\omega$ by

$$ F_\Lambda(A) = \frac{1}{|\Lambda|^{1/2}} \sum_{x \in \Lambda} \big(\tau_x A - \omega(A)\big) $$ [5]

The problem is to give a rigorous meaning to $\lim_\Lambda F_\Lambda(A)$ for $\Lambda$ tending to $\mathbb{Z}^\nu$ in the sense of extending boxes. When does such a limit exist? What are the properties of the fluctuations or the limits $F(A) = \lim_\Lambda F_\Lambda(A)$, etc.? Again, the $F(A)$ are macroscopic variables of the microsystem.

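Already we remark the following: if $A, B$ are strictly local elements, $A, B \in \mathcal{A}_L$, then

$$ \sum_{y \in \mathbb{Z}^\nu} [A, \tau_y B] \in \mathcal{A}_L $$

and an easy computation yields, by [4],

$$ \text{weak-}\lim_\Lambda\, [F_\Lambda(A), F_\Lambda(B)] = \text{weak-}\lim_\Lambda \frac{1}{|\Lambda|} \sum_{x \in \Lambda} \tau_x \Big( \sum_{y \in \Lambda - x} [A, \tau_y B] \Big) $$

$$ = \text{weak-}\lim_\Lambda \frac{1}{|\Lambda|} \sum_{x \in \Lambda} \tau_x \Big( \sum_{y \in \mathbb{Z}^\nu} [A, \tau_y B] \Big) = \sum_{y \in \mathbb{Z}^\nu} \omega([A, \tau_y B])\,\mathbb{1} = i\sigma_\omega(A, B)\,\mathbb{1} $$

that is, if the $F(A)$ and $F(B)$ limits do exist, then

$$ [F(A), F(B)] = i\sigma_\omega(A, B)\,\mathbb{1} $$ [6]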


This property indicates that fluctuations should have 
the same commutation relations as boson fields. If 
fluctuations can be characterized as macroscopic 
observables, they must satisfy the canonical com- 
mutation relations (CCRs). Therefore, in the next 
section we introduce the essentials on CCR 
representations. 
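For a product state the commutator relation can be checked numerically at finite volume. The sketch below is an illustration under stated assumptions, not a construction from the article: it takes a chain of six spin-1/2 sites in a product state (the tilt angle `theta` and all function names are hypothetical choices), builds $F_\Lambda(\sigma^x)$ and $F_\Lambda(\sigma^y)$ as explicit matrices, and compares the expectation of their commutator with $i\sigma_\omega(A,B)$, where for a product state only the $y = 0$ term survives, so that $\sigma_\omega(A,B) = -i\,\omega([A,B])$. Only the expectation is compared; at finite volume the commutator itself is not a multiple of the identity.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def site_operator(op, site, n_sites):
    """Embed a one-site operator at position `site` of an n_sites chain."""
    out = np.array([[1.0 + 0j]])
    for j in range(n_sites):
        out = np.kron(out, op if j == site else I2)
    return out

def product_state(psi_site, n_sites):
    """Tensor product of n_sites copies of the one-site vector psi_site."""
    psi = np.array([1.0 + 0j])
    for _ in range(n_sites):
        psi = np.kron(psi, psi_site)
    return psi

def fluctuation(op, psi_site, n_sites):
    """F_Lambda(A) = |Lambda|^{-1/2} sum_x (tau_x A - omega(A)) in a product state."""
    mean = np.vdot(psi_site, op @ psi_site)
    total = sum(site_operator(op, x, n_sites) for x in range(n_sites))
    return (total - n_sites * mean * np.eye(2 ** n_sites)) / np.sqrt(n_sites)

theta = 0.7      # hypothetical tilt angle of the one-site state
n_sites = 6      # small enough for explicit 64 x 64 matrices
psi_site = np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)
psi = product_state(psi_site, n_sites)

FA = fluctuation(SX, psi_site, n_sites)
FB = fluctuation(SY, psi_site, n_sites)
commutator_mean = np.vdot(psi, (FA @ FB - FB @ FA) @ psi)

# sigma_omega(A, B) = -i omega([A, B]) for a product state
sigma = (-1j * np.vdot(psi_site, (SX @ SY - SY @ SX) @ psi_site)).real

if __name__ == "__main__":
    print(commutator_mean, 1j * sigma)
```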


CCR Representations 


We present the abstract Weyl CCR $C^*$-algebra. More details can be found in Brattelli and Robinson (1979, 2002) and in particular in Manuceau et al. (1973), where the case of a real test function space $(H, \sigma)$ with a possibly degenerate symplectic form $\sigma$ is treated. Hence, $H$ is a real vector space and $\sigma$ a bilinear, antisymmetric form on $H$.

Denote by $W(H, \sigma)$ the complex vector space generated by the functions $W(f)$, $f \in H$, defined by

$$ W(f) : H \to \mathbb{C} : g \mapsto W(f)g = \begin{cases} 0 & \text{if } f \neq g \\ 1 & \text{if } f = g \end{cases} $$

$W(H, \sigma)$ becomes an algebra with unit $W(0)$ for the product

$$ W(f)W(g) = W(f + g)\, e^{-(i/2)\sigma(f, g)} $$

and a *-algebra for the involution

$$ W(f) \to W(f)^* = W(-f) $$

It becomes a $C^*$-algebra $C^*(H, \sigma)$ following the construction of Verbeure and Zagrebnov (1992). A linear functional $\omega$ of a $C^*$-algebra $C^*(H, \sigma)$ is called a state if $\omega(I) = 1$ and $\omega(A^*A) \ge 0$ for all $A \in C^*(H, \sigma)$ and $I = W(0)$. Every state gives rise to a representation through the GNS construction (Brattelli and Robinson 1979, 2002). In particular, $\omega$ is a state if for any choice of $A = \sum_j c_j W(f_j)$ we have

$$ \omega(A^*A) = \sum_{j,k} c_j \bar{c}_k\, \omega(W(f_j - f_k))\, e^{-(i/2)\sigma(f_j, f_k)} \ge 0 $$


A remark about the special case that $\sigma$ is degenerate is in order. Denote by $H_0$ the kernel of $\sigma$:

$$ H_0 = \{f \in H \mid \sigma(f, g) = 0 \text{ for all } g \in H\} $$

If $H = H_0 \oplus H_1$ with $\sigma_1$ a nondegenerate symplectic form on $H_1$ and $\sigma_1$ equal to the restriction of $\sigma$ to $H_1$, we have that $C^*(H, \sigma)$ is a tensor product:

$$ C^*(H, \sigma) = C^*(H_0, 0) \otimes C^*(H_1, \sigma_1) $$

Note that $C^*(H_0, 0)$ is abelian and that each positive-definite normalized functional $\varphi$,

$$ \varphi : h \in H_0 \to \varphi(W(h)) $$

defines a state $\omega_\varphi(W(h)) = \varphi(W(h))$ on $C^*(H_0, 0)$. Let $\chi$ be any character of the abelian additive group $H$; then the map $\tau_\chi$,

$$ \tau_\chi W(f) = \chi(f)\, W(f) $$

extends to a *-automorphism of $C^*(H, \sigma)$. Let $s$ be a positive symmetric bilinear form on $H$ such that for all $f, g \in H$:

$$ \tfrac{1}{4} |\sigma(f, g)|^2 \le s(f, f)\, s(g, g) $$ [7]

and let $\omega_{s,\chi}$ be the linear functional on $C^*(H, \sigma)$ given by

$$ \omega_{s,\chi}(W(h)) = \chi(h)\, e^{-(1/2)s(h, h)} $$ [8]

then it is straightforward (Brattelli and Robinson 1979, 2002) to check that $\omega_{s,\chi}$ is a state on $C^*(H, \sigma)$. All states of the type [8] are called quasifree states on the CCR algebra $C^*(H, \sigma)$.
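The role of condition [7] can be illustrated numerically in the simplest case. The sketch below is an assumption-laden example rather than anything taken from the article: it takes $H = \mathbb{R}^2$ with $\sigma(f,g) = f_1 g_2 - f_2 g_1$, the trivial character $\chi = 1$, and $s(f,g) = c\, f \cdot g$ (so that [7] holds exactly when $c \ge 1/2$), and assumes the Weyl product carries the phase $e^{-(i/2)\sigma(f,g)}$. Positivity of the functional [8] on finite sums $\sum_j c_j W(f_j)$ is then equivalent to positive semidefiniteness of the matrix with entries $\omega(W(f_j - f_k))\, e^{-(i/2)\sigma(f_j, f_k)}$. The names `weyl_gram_matrix` and `min_eigenvalue` are hypothetical.

```python
import numpy as np

def sigma(f, g):
    """Standard symplectic form on R^2."""
    return f[0] * g[1] - f[1] * g[0]

def weyl_gram_matrix(points, s_scale):
    """Matrix M_jk = exp(-s(f_j - f_k, f_j - f_k)/2) * exp(-(i/2) sigma(f_j, f_k)),
    with s(f, g) = s_scale * (f . g)."""
    n = len(points)
    m = np.empty((n, n), dtype=complex)
    for j, fj in enumerate(points):
        for k, fk in enumerate(points):
            d = fj - fk
            m[j, k] = np.exp(-0.5 * s_scale * (d @ d)) * np.exp(-0.5j * sigma(fj, fk))
    return m

def min_eigenvalue(points, s_scale):
    """Smallest eigenvalue of the (Hermitian) matrix above."""
    return np.linalg.eigvalsh(weyl_gram_matrix(points, s_scale)).min()

rng = np.random.default_rng(0)
random_points = rng.normal(size=(8, 2))
triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

if __name__ == "__main__":
    # s_scale >= 1/2 satisfies [7]: the matrix is positive semidefinite.
    print(min_eigenvalue(random_points, 0.5), min_eigenvalue(random_points, 1.0))
    # s = 0 violates [7]: a negative eigenvalue shows the functional is not a state.
    print(min_eigenvalue(triangle, 0.0))
```

The choice $c = 1/2$ corresponds to the Fock vacuum of a single mode; the case $s = 0$ is included only as the most transparent violation of [7].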

A state $\omega$ of $C^*(H, \sigma)$ is called a regular state if, for all $f, g \in H$, the map $\lambda \in \mathbb{R} \to \omega(W(\lambda f + g))$ is continuous. The regularity property of a state yields the existence of a Bose field as follows. Let $(\mathcal{H}, \pi, \Omega)$ be the GNS representation (Brattelli and Robinson 1979, 2002) of the state $\omega$; then the regularity of $\omega$ implies that there exists a real linear map $b : H \to \mathcal{L}(\mathcal{H})$ (linear operators on $\mathcal{H}$) such that $\forall f \in H$: $b(f)^* = b(f)$ and

$$ \pi(W(f)) = \exp(i b(f)) $$

The map $b$ is called the Bose field satisfying the Bose field commutation relations:

$$ [b(f), b(g)] = i\sigma(f, g) $$ [9]

Note that the Bose fields are state dependent. Note also that if $\chi$ is a continuous character of $H$, then any quasifree state [8] is a regular state, guaranteeing the existence of a Bose field.


Normal Fluctuations 


In this section we develop the theory of normal fluctuations for $\nu$-dimensional quantum lattice systems with a quasilocal structure (see the section “Quantum lattice systems”) and for technical simplicity we assume that the local $C^*$-algebras $\mathcal{A}_x$, $x \in \mathbb{Z}^\nu$, are copies of the matrix algebra $M_n(\mathbb{C})$ of $n \times n$ complex matrices. Most of the results stated can be extended to the case where $\mathcal{A}_x$ is a general $C^*$-algebra (Goderis et al. 1989, 1990, Goderis and Vets 1989).

We consider a physical system $(\mathcal{B}, \omega)$ where $\omega$ is a translation-invariant state of $\mathcal{B}$, that is, $\omega \circ \tau_x = \omega$ for all $x \in \mathbb{Z}^\nu$. Later on we extend the situation to a $C^*$-dynamical system $(\mathcal{B}, \omega, \alpha_t)$ and analyze the properties of the dynamics $\alpha_t$ under the central limit.

For any local $A$ we introduced its local fluctuation in the state $\omega$ of the system:

$$ F_\Lambda(A) = \frac{1}{|\Lambda|^{1/2}} \sum_{x \in \Lambda} \big(\tau_x A - \omega(A)\big) $$ [10]

The main problem is to give a rigorous mathematical meaning to the limits

$$ \lim_\Lambda F_\Lambda(A) \equiv F(A) $$

where the limit is taken for any increasing $\mathbb{Z}^\nu$-absorbing sequence $\{\Lambda\}_\Lambda$ of finite volumes of $\mathbb{Z}^\nu$. The limits $F(A)$ are called the macroscopic fluctuation operators of the system $(\mathcal{B}, \omega)$.

Earlier work (Cushen and Hudson 1971, Sewell 1986) already suggested that the fluctuations behave like bosons. We complete this idea by proving that one gets a well-defined representation of a CCR $C^*$-algebra of fluctuations uniquely defined by the original system $(\mathcal{B}, \omega)$.

Denote by $\mathcal{A}_{L,sa}$ and $\mathcal{B}_{sa}$ the real vector spaces of the self-adjoint elements of $\mathcal{A}_L$ and $\mathcal{B}$, respectively.

Definition 1 An observable $A \in \mathcal{B}_{sa}$ satisfies the central-limit theorem if
(i) $\lim_\Lambda \omega(F_\Lambda(A)^2) \equiv s_\omega(A, A)$ exists and is finite, and
(ii) $\lim_\Lambda \omega(e^{itF_\Lambda(A)}) = e^{-(t^2/2)\, s_\omega(A, A)}$ for all $t \in \mathbb{R}$.

Clearly, our definition coincides with the notion in terms of characteristic functions, which for classical systems ($\mathcal{A}$ abelian) is equivalent to the notion of convergence in distribution. For quantum systems, there does not exist a standard notion of “convergence in distribution.” Only the concept of expectations is relevant. This does not exclude the notion of central-limit theorem in terms of the moments, which is the analog of the moment problem (Giri and von Waldenfels 1978).


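Definition 2 The system $(\mathcal{B}, \omega)$ is said to have normal fluctuations if $\omega$ is translation invariant and if

(i) $\forall A, B \in \mathcal{A}_L$:

$$ \sum_{x \in \mathbb{Z}^\nu} |\omega(A \tau_x B) - \omega(A)\,\omega(B)| < \infty $$

(ii) the central-limit theorem holds for all $A \in \mathcal{A}_{L,sa}$.

Note that (i) implies that the state $\omega$ is mixing for space translations. Also by (i), one can define a sesquilinear form on $\mathcal{A}_L$:

$$ \langle A, B \rangle_\omega = \lim_\Lambda \omega(F_\Lambda(A^*) F_\Lambda(B)) = \sum_{x \in \mathbb{Z}^\nu} \big(\omega(A^* \tau_x B) - \omega(A^*)\,\omega(B)\big) $$

and denote

$$ s_\omega(A, B) = \operatorname{Re} \langle A, B \rangle_\omega, \qquad \sigma_\omega(A, B) = 2 \operatorname{Im} \langle A, B \rangle_\omega $$

For $A, B \in \mathcal{A}_{L,sa}$ one has

$$ \sigma_\omega(A, B) = -i \sum_{x \in \mathbb{Z}^\nu} \omega([A, \tau_x B]) $$ [11]

$$ s_\omega(A, A) = \langle A, A \rangle_\omega $$ [12]

Clearly, $(\mathcal{A}_{L,sa}, \sigma_\omega)$ is a symplectic space and $s_\omega$ a non-negative symmetric bilinear form on $\mathcal{A}_{L,sa}$.

Following the discussion in the section “CCR representations” we get a natural CCR $C^*$-algebra $C^*(\mathcal{A}_{L,sa}, \sigma_\omega)$ defined on this symplectic space. The following theorem is an essential step in the construction of a macroscopic physical system of fluctuations of the microsystem $(\mathcal{B}, \omega)$.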


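Theorem 1 If the system $(\mathcal{B}, \omega)$ has normal fluctuations, then the limits $\{\lim_\Lambda \omega(e^{iF_\Lambda(A)}) = \exp(-\tfrac{1}{2} s_\omega(A, A)),\ A \in \mathcal{A}_{L,sa}\}$ define a quasifree state $\tilde\omega$ on the CCR $C^*$-algebra $C^*(\mathcal{A}_{L,sa}, \sigma_\omega)$ by

$$ \tilde\omega(W(A)) = \exp\big(-\tfrac{1}{2} s_\omega(A, A)\big) $$

Proof The proof is clear from the definition [8] if one can prove that the positivity condition [7] holds. But the latter follows readily from

$$ \tfrac{1}{4} |\sigma_\omega(A, B)|^2 = \lim_\Lambda |\operatorname{Im} \omega(F_\Lambda(A) F_\Lambda(B))|^2 \le \lim_\Lambda \omega(F_\Lambda(A)^2)\, \omega(F_\Lambda(B)^2) = s_\omega(A, A)\, s_\omega(B, B) $$

by the Schwarz inequality. □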


This theorem indicates that the quantum-mechanical counterparts of (classical) Gaussian measures are quasifree states on CCR algebras. However, the following basic question arises: is it possible to take the limits of products of the form

$$ \lim_\Lambda \omega\big(e^{iF_\Lambda(A)} e^{iF_\Lambda(B)} \cdots\big) $$

and, if they exist, do they preserve the CCR structure? Clearly, this is a typical noncommutative problem.

Using the following general bounds: for $C^* = C$ and $D^* = D$ norm-bounded operators one has

$$ \|e^{i(C+D)} - e^{iC}\| \le \|D\| $$

$$ \|[e^{iC}, e^{iD}]\| \le \|[C, D]\| $$

$$ \|e^{i(C+D)} - e^{iC} e^{iD}\| \le \tfrac{1}{2} \|[C, D]\| $$

and by the expansion of the exponential function one proves easily that

$$ \lim_\Lambda \big\| e^{iF_\Lambda(A)} e^{iF_\Lambda(B)} - e^{i(F_\Lambda(A) + F_\Lambda(B))}\, e^{-(1/2)[F_\Lambda(A), F_\Lambda(B)]} \big\| = 0 $$ [13]

if $A$ and $B$ are one-point observables, that is, if $A, B \in \mathcal{A}_{\{0\}}$. For general local elements the proof is somewhat more technical and can be based on a Bernstein-like argument (for details see Goderis and Vets (1989)). The property [13] can be seen as a Baker-Campbell-Hausdorff formula for fluctuations.
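The three operator-norm bounds quoted above are easy to probe numerically on finite matrices. The sketch below is only an illustration under stated assumptions: it draws random Hermitian matrices (the sizes, scales and the helper names `random_hermitian`, `exp_i`, `op_norm`, `bch_bounds` are all hypothetical) and returns each left-hand side together with its claimed upper bound.

```python
import numpy as np

def random_hermitian(rng, dim, scale):
    """A random Hermitian matrix, standing in for a bounded self-adjoint operator."""
    m = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return scale * (m + m.conj().T) / 2

def exp_i(h):
    """exp(i h) for Hermitian h, via the spectral decomposition."""
    evals, evecs = np.linalg.eigh(h)
    return (evecs * np.exp(1j * evals)) @ evecs.conj().T

def op_norm(m):
    """Operator (spectral) norm."""
    return np.linalg.norm(m, 2)

def bch_bounds(c, d):
    """Return (lhs, rhs) pairs for the three bounds stated in the text."""
    comm = op_norm(c @ d - d @ c)
    return [
        (op_norm(exp_i(c + d) - exp_i(c)), op_norm(d)),
        (op_norm(exp_i(c) @ exp_i(d) - exp_i(d) @ exp_i(c)), comm),
        (op_norm(exp_i(c + d) - exp_i(c) @ exp_i(d)), 0.5 * comm),
    ]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    for scale in (0.05, 0.5, 2.0):
        c = random_hermitian(rng, 5, scale)
        d = random_hermitian(rng, 5, scale)
        for lhs, rhs in bch_bounds(c, d):
            print(f"{lhs:.4e} <= {rhs:.4e}")
```

For the fluctuations of one-point observables the commutator $[F_\Lambda(A), F_\Lambda(B)]$ has norm of order one, so these bounds alone do not give [13]; they are the starting point for the expansion argument mentioned in the text.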


From [13], the mean ergodic theorem, and Theorem 
1 we get: 


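Theorem 2 If the system $(\mathcal{B}, \omega)$ has normal fluctuations then for $A, B \in \mathcal{A}_{L,sa}$:

$$ \lim_\Lambda \omega\big(e^{iF_\Lambda(A)} e^{iF_\Lambda(B)}\big) = \exp\Big\{ -\tfrac{1}{2} s_\omega(A + B, A + B) - \tfrac{i}{2} \sigma_\omega(A, B) \Big\} = \tilde\omega(W(A)W(B)) $$

with $\tilde\omega$ a quasifree state on the CCR algebra $C^*(\mathcal{A}_{L,sa}, \sigma_\omega)$.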


Theorems 1 and 2 describe completely the topological and analytical aspects of the quantum central-limit theorem under the condition of normal fluctuations (Definition 2). In fact, the quantum central limit yields, for every microphysical system $(\mathcal{B}, \omega)$, a macrophysical system $(C^*(\mathcal{A}_{L,sa}, \sigma_\omega), \tilde\omega)$ defined by the CCR $C^*$-algebra of fluctuation observables $C^*(\mathcal{A}_{L,sa}, \sigma_\omega)$ in the representation defined by the quasifree state $\tilde\omega$. As the state $\tilde\omega$ is a quasifree state, it is a regular state, that is, the map $\lambda \in \mathbb{R} \to \tilde\omega(W(\lambda A + B))$ is continuous. From the section “CCR representations” we know that this regularity property yields the existence of a Bose field, that is, there exists a real linear map

$$ F : A \in \mathcal{A}_{L,sa} \to F(A) $$

where $F(A)$ is a self-adjoint operator on the GNS representation space $\tilde{\mathcal{H}}$ of $\tilde\omega$, such that for all $A, B \in \mathcal{A}_{L,sa}$:

$$ [F(A), F(B)] = i\sigma_\omega(A, B) $$

Moreover, if one has a complex structure $J$ on $(\mathcal{A}_{L,sa}, \sigma_\omega)$ such that $J^2 = -1$ and for all $A, B \in \mathcal{A}_{L,sa}$:

$$ \sigma_\omega(JA, B) = -\sigma_\omega(A, JB), \qquad \sigma_\omega(A, JA) > 0 $$

then one defines the boson creation and annihilation operators

$$ F^\pm(A) = \frac{1}{\sqrt{2}} \big(F(A) \mp iF(JA)\big) $$

satisfying the usual boson commutation relations

$$ [F^-(A), F^+(B)] = \sigma_\omega(A, JB) + i\sigma_\omega(A, B) $$
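The boson commutation relation can be verified directly from the Bose field commutator. The short derivation below is a sketch, assuming the sign convention $F^\pm(A) = \tfrac{1}{\sqrt{2}}(F(A) \mp iF(JA))$ and using only $[F(A), F(B)] = i\sigma_\omega(A,B)$, $J^2 = -1$ and $\sigma_\omega(JA, B) = -\sigma_\omega(A, JB)$, which together give $\sigma_\omega(JA, JB) = \sigma_\omega(A, B)$.

```latex
\begin{align*}
[F^-(A), F^+(B)]
  &= \tfrac{1}{2}\,\bigl[F(A) + iF(JA),\; F(B) - iF(JB)\bigr] \\
  &= \tfrac{1}{2}\Bigl( [F(A),F(B)] - i[F(A),F(JB)]
       + i[F(JA),F(B)] + [F(JA),F(JB)] \Bigr) \\
  &= \tfrac{1}{2}\Bigl( i\sigma_\omega(A,B) + \sigma_\omega(A,JB)
       - \sigma_\omega(JA,B) + i\sigma_\omega(JA,JB) \Bigr) \\
  &= \sigma_\omega(A,JB) + i\sigma_\omega(A,B).
\end{align*}
```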


Finally, it is straightforward, but nevertheless important, to remark that Theorems 1 and 2 hold true if the linear space of local observables $\mathcal{A}_{L,sa}$ is replaced by any of its subspaces. Some of them can have greater physical importance than others. This means that the quantum central-limit theorems can realize several macrophysical systems of fluctuations. But all of them are Bose field systems.


It is also important to remark that these results end up giving a probabilistic canonical basis of the canonical commutation relations.

Now we analyze the notion of coarse graining due to the quantum central limit. Consider on $\mathcal{A}_L$ the sesquilinear form (see [11], [12]) again

$$ \langle A, B \rangle_\omega = \sum_{x \in \mathbb{Z}^\nu} \big(\omega(A^* \tau_x B) - \omega(A^*)\,\omega(B)\big) = s_\omega(A, B) + \tfrac{i}{2} \sigma_\omega(A, B) $$ [14]

This form defines a topology on $\mathcal{A}_L$ which is not comparable with the operator topologies induced by $\omega$. In fact, this form is not closable in the weak, strong, ultraweak, or ultrastrong operator topologies.

We call $A$ and $B$ in $\mathcal{A}_L$ equivalent, denoted by $A \sim B$, if $\langle A - B, A - B \rangle_\omega = 0$. Clearly, this defines an equivalence relation on $\mathcal{A}_L$. The property of coarse graining is mathematically characterized by the following: for all $A, B \in \mathcal{A}_{L,sa}$ the relation $A \sim B$ is equivalent to $F(A) = F(B)$. Suppose first that $F(A) = F(B)$, then

$$ [W(A), W(B)] = 0 $$

hence $\sigma_\omega(A, B) = 0$. Therefore, from Theorem 1:

$$ 1 = \tilde\omega(W(A)W(B)^*) = \tilde\omega(W(A)W(-B)) = \tilde\omega(W(A - B)) = \exp\big(-\tfrac{1}{2} s_\omega(A - B, A - B)\big) $$

and from [12] and [14]: $\langle A - B, A - B \rangle_\omega = 0$. The converse is equally straightforward.

From this property, it follows immediately that, for example, the action of the translation group is trivial, that is, $F(\tau_x A) = F(A)$ for all $x \in \mathbb{Z}^\nu$. Therefore, the map $F : \mathcal{A}_{L,sa} \to C^*(\mathcal{A}_{L,sa}, \sigma_\omega)$ is not injective. This expresses the physical phenomenon of coarse graining and gives a mathematical meaning to the fluctuations being macroscopic observables.

In the above, we have constructed the new macroscopic physical system of quantum fluctuations for any microsystem with the property of normal fluctuations (see Definition 2). The main problem remains: when does the microsystem have normal fluctuations? We end this section with the formulation of a general sufficient clustering condition for the microstate $\omega$ in order that the microsystem $(\mathcal{B}, \omega)$ has normal fluctuations.

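Let $\Lambda, \Lambda' \in \mathcal{D}(\mathbb{Z}^\nu)$ and $\omega$ a translation invariant state, and denote

$$ \alpha^\omega(\Lambda, \Lambda') = \sup_{\substack{A \in \mathcal{A}_\Lambda,\ \|A\| = 1 \\ B \in \mathcal{A}_{\Lambda'},\ \|B\| = 1}} |\omega(AB) - \omega(A)\,\omega(B)| $$

The cluster function $\alpha_N^\omega(d)$ is defined by

$$ \alpha_N^\omega(d) = \sup \{\alpha^\omega(\Lambda, \Lambda') : d(\Lambda, \Lambda') \ge d \text{ and } \max(|\Lambda|, |\Lambda'|) \le N\} $$

where $N, d \in \mathbb{R}^+$ and $d(\Lambda, \Lambda')$ is the Euclidean distance between $\Lambda$ and $\Lambda'$. It is obvious that

$$ \alpha_N^\omega(d) \ge \alpha_N^\omega(d') \quad \text{if } d \le d' $$

$$ \alpha_N^\omega(d) \le \alpha_{N'}^\omega(d) \quad \text{if } N \le N' $$

The clustering condition is expressed by the following scaling law:

$$ \exists \delta > 0 : \lim_{N \to \infty} N^{1/2}\, \alpha_N^\omega\big(N^{1/(2\nu) - \delta}\big) = 0 $$ [15]

or, equivalently,

$$ \exists \delta > 0 : \lim_{N \to \infty} N^{\nu}\, \alpha_{N^{2\nu}}^\omega\big(N^{1 - \delta}\big) = 0 $$ [16]

Note that this condition implies that

$$ \sum_{x \in \mathbb{Z}^\nu} \alpha_N^\omega(|x|) < \infty $$

that is, that the function $\alpha_N^\omega(\cdot)$ is an $L^1(\mathbb{Z}^\nu)$-function for all $N$. In fact, this condition corresponds to the uniform mixing condition in the commutative (classical) central-limit theorem (see, e.g., Ibragimov and Linnick (1971)). This condition can also be called the modulus of decoupling. Product states, for example, equilibrium states of mean-field systems, are uniformly clustering with $\alpha_N^\omega(d) = 0$ for $d > 0$.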

The normality of the fluctuations of the micro- 
system (B,w) for product states is proved and 
extensively studied in Goderis et al. (1989), and for 
states satisfying the condition [15] or [16] in Goderis 
and Vets (1989). In the latter case, the proofs are 
very technical and based on a generalization of the 
well-known Bernstein argument (Ibragimov and 
Linnick 1971) of the classical central-limit theorem 
to the noncommutative situation. A refinement of 
these arguments can be found in Goderis et al. 
(1990). For the sake of formal self-consistency we 
formulate the theorem: 


Theorem 3 (Central-limit theorem) Take the microsystem $(\mathcal{B}, \omega)$ such that $\omega$ is lattice translation invariant and satisfies the clustering condition [15]; then the system has normal fluctuations for all elements of the vector space of local observables $\mathcal{A}_{L,sa}$. □


In Goldshtein (1982) a noncommutative central- 
limit theorem is derived using similar techniques. 
The main difference, however, is its strictly local 
character, namely for one local operator separately. 
The conditions depend on the spectral properties of 
the operator. It excludes a global approach resulting 
in a CCR algebra structure. 

Even for quantum lattice systems, it is not straightforward to check whether a state satisfies the degree of mixing as expressed in conditions [15]-[16]. Clearly, one expects the condition to hold for equilibrium states at high enough temperatures. For quantum spin chains, a theorem analogous to Theorem 3 under weaker conditions than [15] is proved, for example, in Matsui (2003).

So far we have reviewed the quantum central-limit theorem for physical $C^*$-spin systems $(\mathcal{B}, \omega)$ with normal fluctuations.

Now we extend the physical system to a $C^*$-dynamical system $(\mathcal{B}, \omega, \alpha_t)$ (Brattelli and Robinson 1979, 2002) and we investigate the properties of the dynamics $\alpha_t$ under the central limit. As usual, the dynamics is supposed to be of the short-range type in order to guarantee the norm limit:

$$ \alpha_t(\cdot) = \text{n-}\lim_\Lambda\, e^{itH_\Lambda} \cdot\, e^{-itH_\Lambda} $$

and space homogeneous: $\alpha_t \tau_x = \tau_x \alpha_t$, $\forall t \in \mathbb{R}$, $\forall x \in \mathbb{Z}^\nu$. We suppose that the state $\omega$ is both space and time translation invariant. Moreover, we assume that the state $\omega$ satisfies the mixing condition [15] for normal fluctuations.

In [10] we defined, for every local $A \in \mathcal{A}_{L,sa}$, the local fluctuation $F_\Lambda(A)$ and obtained a clear meaning of $F(A) = \lim_\Lambda F_\Lambda(A)$ from the central-limit theorem. Now we are interested in the dynamics of the fluctuations $F(A)$. Clearly, for all $A \in \mathcal{A}_{L,sa}$ and all finite $\Lambda$:

$$ \alpha_t F_\Lambda(A) = F_\Lambda(\alpha_t A) $$ [17]

and one is tempted to define the dynamics $\tilde\alpha_t$ of the fluctuations in the $\Lambda$-limit by the formula

$$ \tilde\alpha_t F(A) = F(\alpha_t A) $$ [18]

Note, however, that in general $\alpha_t A$ is not a local element of $\mathcal{A}_{L,sa}$. It is unclear whether the central limit of elements of the type $\alpha_t A$, with $A \in \mathcal{A}_{L,sa}$, exists or not and hence whether one can give a meaning to $F(\alpha_t A)$. Moreover, if $F(\alpha_t A)$ exists, it remains to prove that $(\tilde\alpha_t)_t$ defines a weakly continuous group of *-automorphisms on the fluctuation CCR algebra $\tilde{\mathcal{M}} = C^*(\mathcal{A}_{L,sa}, \sigma_\omega)''$ (the von Neumann algebra generated by the $\tilde\omega$-representation of $C^*(\mathcal{A}_{L,sa}, \sigma_\omega)$). All this needs a proof. In Goderis et al. (1990), one finds the proof of the following basic theorem about the dynamics.


Theorem 4 Under the conditions on the dynamics $\alpha_t$ and on the state $\omega$ expressed above, the limit $F(\alpha_t A) = \lim_\Lambda F_\Lambda(\alpha_t A)$ exists as a central limit as in Theorem 2, and the maps $\tilde\alpha_t$ defined by [18] extend to a weakly continuous one-parameter group of *-automorphisms of the von Neumann algebra $\tilde{\mathcal{M}}$. The quasifree state $\tilde\omega$ is $\tilde\alpha_t$-invariant (time invariant).

This theorem yields the existence of a dynamics $\tilde\alpha_t$ on the fluctuation algebra and shows that it is of the quasifree type

$$ \tilde\alpha_t F(A) = F(\alpha_t A) $$

where $F(A)$ is a representation of a Bose field in a quasifree state $\tilde\omega$, the noncommutative version of a Gaussian distribution. In physical terms, it also means that any microdynamics $\alpha_t$ induces a linear process on the level of its fluctuations.

We can conclude that, on the basis of Theorems 3 and 4, the quantum central-limit theorem realizes a map from the microdynamical system $(\mathcal{B}, \omega, \alpha_t)$ to a macrodynamical system $(C^*(\mathcal{A}_{L,sa}, \sigma_\omega), \tilde\omega, \tilde\alpha_t)$ of the quantum fluctuations. The latter system is a quasifree boson system.

Note that, contrary to the central-limit theorem, 
the law of large numbers [4] maps local observables 
to their averages forming a trivial commutative 
algebra of macro-observables. The macrodynamics 
is mapped to a trivial dynamics as well. Therefore, 
the consideration of law of large numbers does not 
allow one to observe genuine quantum phenomena. 
On the other hand, on the level of the fluctuations, 
macroscopic quantum phenomena are observable. 


Abnormal Fluctuations 


The results about normal fluctuations in the last section contain two essential elements. On the one hand, the central limit has to exist. The condition for this to occur is the validity of the cluster condition ([15] or [16]) guaranteeing the normality of the fluctuations. On the other hand, there is the reconstruction theorem, identifying the CCR algebra representation of the fluctuation observables or operators in the quasifree state, which is denoted by $\tilde\omega$.

The cluster condition is in general not satisfied for 
systems with long-range correlations, for example, 
for equilibrium states at low temperatures with 
phase transitions. It is a challenging question to also 
study in this case the existence of fluctuations 
operators and, if they exist, to study their mathe- 
matical structure. Here we detect structures other 
than the CCR structure, other states or distributions 
different from quasifree states, etc. 

Progress in the elucidation of all these questions started with a detailed study of abnormal fluctuations in the harmonic and anharmonic crystal models (Verbeure and Zagrebnov 1992, Momont et al. 1997). More general Lie algebras are obtained than the Heisenberg Lie algebra of the CCR algebra, and more general states $\tilde\omega$ or quantum distributions are computed beyond the quasifree states that arise for normal fluctuations.

Abnormal fluctuations turn up if one has an ergodic state $\omega$ with long-range correlations. We have in mind continuous (second-order) phase transitions, where typically, for example, the heat capacity or some more general susceptibilities diverge at critical points or lines. This means that normally scaled (with the factor $|\Lambda|^{-1/2}$) fluctuations of some observables diverge. This is equivalent to the divergence of sums of the type

$$ \sum_{x \in \mathbb{Z}^\nu} \big(\omega(A \tau_x A) - \omega(A)^2\big) $$

for some local observable $A$.

In order to deal with these situations, we rescale the local fluctuations. One determines a scaling index $\delta_A \in (-1/2, 1/2)$, depending on the observable $A$, such that the abnormally scaled local fluctuations

$$ F_\Lambda^{\delta_A}(A) = |\Lambda|^{-\delta_A} F_\Lambda(A) $$

with $F_\Lambda(A)$ as in [10], yield a nontrivial characteristic function: $\forall t \in \mathbb{R}$,

$$ \lim_\Lambda \omega_\Lambda\big(e^{itF_\Lambda^{\delta_A}(A)}\big) = \phi_A(t) $$ [19]

where we limit our discussion to states $\omega_\Lambda$ that are local Gibbs states. The index $\delta_A$ is a measure for the abnormality of the fluctuation of $A$. Note that $\delta_A = -1/2$ yields a triviality and that $\delta_A = 1/2$ would lead to a law of large numbers (theory of averages). Observe also that in general the characteristic function $\phi_A$ or the corresponding state $\tilde\omega$ need not be Gaussian or quasifree.

In the physics literature, one describes the long-range order by means
of the asymptotic form of the connected two-point function in terms of
the critical exponent $\eta$:

\[ \omega_\Lambda(A\,\tau_x A) - \omega_\Lambda(A)^2 \sim O\!\left( \frac{1}{|x|^{\nu-2+\eta}} \right), \qquad |x| \to \infty \qquad [20] \]

Our scaling index $\delta_A$ is related to the critical exponent $\eta$
by the straightforward relation

\[ \eta = 2 - 2\nu\delta_A \]
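
A heuristic power-counting argument indicates where the relation between the scaling index $\delta_A$ and the critical exponent $\eta$ comes from. It is only a sketch, under assumptions not made explicit in the text: the power-law decay of the connected two-point function is taken to hold uniformly, $\Lambda$ is a cube of side $L$ (so that $|\Lambda| = L^\nu$), $\eta < 2$, and $F_\Lambda(A)$ denotes the normally scaled fluctuation $|\Lambda|^{-1/2}\sum_{x\in\Lambda}(\tau_x A - \omega_\Lambda(A))$. The variance of the rescaled fluctuation $F_\Lambda^{\delta_A}(A) = |\Lambda|^{-\delta_A}F_\Lambda(A)$ then behaves as

\[
\omega_\Lambda\bigl( F_\Lambda^{\delta_A}(A)^2 \bigr)
\approx |\Lambda|^{-2\delta_A} \sum_{x\in\Lambda} \bigl( \omega_\Lambda(A\,\tau_x A) - \omega_\Lambda(A)^2 \bigr)
\sim L^{-2\nu\delta_A} \int_1^{L} r^{\nu-1}\, r^{-(\nu-2+\eta)}\, dr
\sim L^{\,2-\eta-2\nu\delta_A},
\]

which stays finite and nonzero as $L \to \infty$ only if $\eta = 2 - 2\nu\delta_A$.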


As stated above, the index $\delta_A$ is determined by the existence of
the central limit and has been explicitly computed in several model
calculations for equilibrium states (see, e.g., Verbeure and Zagrebnov
(1992)). Apart from the strong model dependence, the indices also
depend strongly on the chosen boundary conditions. This fact sheds new
light on the universality of the critical exponents.

Suppose now that the indices $\delta_A$ are determined by the existence
of the central limit [19]. The next problem is to find out whether,
also in these cases, a reconstruction theorem comparable to, for
example, Theorem 2 can be proved, giving again a mathematical meaning
to the limits

\[ \lim_\Lambda F_\Lambda^{\delta_A}(A) = F^{\delta_A}(A) \qquad [21] \]

as operators, in general unbounded, on a Hilbert space.

Here we develop a proof of the Lie algebra character of the abnormal
fluctuations under the conditions: (1) the $\delta$-indices are
determined by the existence of the variances (second moments), and
(2) the existence of the third moments (for more details see, e.g.,
Momont et al. (1997)).

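Consider a local algebra, namely an $n$-dimensional vector space
$\mathcal{G}$ with basis $\{v_i\}_{i=1,\ldots,n}$ and product

\[ v_j v_k \equiv [v_j, v_k] = \sum_{l=1}^{n} c_{jk}^{l}\, v_l \qquad [22] \]

with structure constants $c_{jk}^{l}$ satisfying

\[ c_{jk}^{l} + c_{kj}^{l} = 0, \qquad \sum_{r} \bigl( c_{jk}^{r} c_{rl}^{s} + c_{kl}^{r} c_{rj}^{s} + c_{lj}^{r} c_{rk}^{s} \bigr) = 0 \]

Consider the concrete Lie algebra basis of operators in
$\mathcal{A}_{\{0\}}$,

\[ L_0 = i\mathbb{1},\; L_1, \ldots, L_m \]

such that $L_j^* = -L_j$, $j = 0, 1, \ldots, m$, and
$\omega(L_j) = \lim_\Lambda \omega_\Lambda(L_j) = 0$ for $j > 0$.
Clearly, $\omega_\Lambda(L_0) = i$ for all $\Lambda$, and the $\{L_j\}$
satisfy eqn [22]. Because of the special choice of $L_0$ one has
$c_{0k}^{l} = c_{k0}^{l} = 0$ and
$c_{jk}^{0} = -i \lim_\Lambda \omega_\Lambda([L_j, L_k])$. We now
consider the fluctuations of these generators and look for a
characterization of the Lie algebra of the fluctuations, if any.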

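For a translation-invariant local state $\omega_\Lambda$,
$\Lambda \subset \mathbb{Z}^\nu$, such that
$\omega = \lim_\Lambda \omega_\Lambda$ is mixing, define the local
fluctuations, for $j = 1, \ldots, m$ ($m < \infty$),

\[ F_{j,\Lambda}^{\delta_j} = \frac{1}{|\Lambda|^{1/2+\delta_j}} \sum_{x\in\Lambda} \bigl( \tau_x L_j - \omega_\Lambda(L_j) \bigr) \qquad [23] \]

and for notational convenience, take
$F_{0,\Lambda}^{\delta_0} = i\mathbb{1}$. Now we formulate the
conditions for our purposes.


Condition A We assume that the parameters $\delta_j$ are determined by
the existence of the finite and nontrivial variances: for all
$j = 1, \ldots, m$,

\[ 0 < \lim_\Lambda \omega_\Lambda\bigl( (F_{j,\Lambda}^{\delta_j})^* F_{j,\Lambda}^{\delta_j} \bigr) < \infty \qquad [24] \]

After reordering, take
$1/2 > \delta_1 \ge \delta_2 \ge \cdots \ge \delta_m > -1/2$.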


Condition B Assume that all third moments are finite, that is,

\[ \lim_\Lambda \bigl| \omega_\Lambda\bigl( F_{j,\Lambda}^{\delta_j} F_{k,\Lambda}^{\delta_k} F_{l,\Lambda}^{\delta_l} \bigr) \bigr| < \infty \]

We have in mind that the $\omega_\Lambda$ are Gibbs states for some
local Hamiltonians with some specific boundary conditions. The limit
$\Lambda \to \mathbb{Z}^\nu$ may depend very strongly on these boundary
conditions, in the sense that they are visible in the values of the
indices $\delta_j$ (see, e.g., Verbeure and Zagrebnov (1992)). If for
some $j \ge 1$ the corresponding $\delta_j = 0$, then the operator
$L_j$ has a normal fluctuation operator

\[ F_j^{\delta_j} = \lim_\Lambda F_{j,\Lambda}^{\delta_j} \qquad [25] \]

where the limit is understood in the sense of Condition A, namely a
finite nontrivial variance. If, for some $j \ge 1$, the corresponding
$\delta_j \ne 0$, then the fluctuation [25] is called an abnormal
fluctuation operator. In order to satisfy Condition A, it sometimes
happens that $\delta_j$ has to be chosen negative (see, e.g., Verbeure
and Zagrebnov (1992)). In this case, it is reasonable to limit our
discussion to the situation that all $\delta_j > -1/2$.

On the basis of Condition A, the limit set
$\{F_j^{\delta_j}\}_{j=0,\ldots,m}$ of fluctuation operators generates
a Hilbert space $\mathcal{H}$ with scalar product

\[ \bigl( F_j^{\delta_j}, F_k^{\delta_k} \bigr) = \lim_\Lambda \omega_\Lambda\bigl( (F_{j,\Lambda}^{\delta_j})^* F_{k,\Lambda}^{\delta_k} \bigr) \qquad [26] \]

On the basis of Condition B, the fluctuation operators are defined as
multiplication operators of the Hilbert space $\mathcal{H}$. Note that
Conditions A and B are not sufficient to obtain a characteristic
function. However, they are sufficient to obtain the notion of
fluctuation operator. Now we proceed to clarify the Lie algebra
character of these fluctuation operators on $\mathcal{H}$.

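Consider the Lie product of two local fluctuations. For a finite
$\Lambda$, one gets

\[ \bigl[ F_{j,\Lambda}^{\delta_j}, F_{k,\Lambda}^{\delta_k} \bigr] = \sum_{l=0}^{m} c_{jk}^{l}(\Lambda)\, F_{l,\Lambda}^{\delta_l} \qquad [27] \]

with, for $l = 1, \ldots, m$,

\[ c_{jk}^{l}(\Lambda) = \frac{c_{jk}^{l}}{|\Lambda|^{1/2+\delta_j+\delta_k-\delta_l}}, \qquad c_{jk}^{0}(\Lambda) = -i\,|\Lambda|^{-\delta_j-\delta_k}\, \omega_\Lambda([L_j, L_k]) \]

It is an easy exercise to check that the $\{c_{jk}^{l}(\Lambda)\}$ are
the structure coefficients of a Lie algebra $\mathcal{G}(\Lambda)$.
Hence, by considering local fluctuations, one constructs a map from the
Lie algebra $\mathcal{G}$ onto the Lie algebra $\mathcal{G}(\Lambda)$
by a nontrivial change of the structure constants. When the transformed
structure constants approach a well-defined limit, a new nonisomorphic
Lie algebra might appear. The limit algebra
$\mathcal{G}(\mathbb{Z}^\nu)$, called the contraction of the original
algebra $\mathcal{G}$, is always nonsemisimple. This contraction is a
typical Inönü-Wigner contraction (Inönü and Wigner 1953). About the
limit algebra $\mathcal{G}(\mathbb{Z}^\nu)$, the following results are
obtained (see Momont et al. (1997)):

\[ \lim_\Lambda c_{jk}^{l}(\Lambda) = \begin{cases} 0 & \text{if } 1/2 + \delta_j + \delta_k - \delta_l > 0 \\ c_{jk}^{l} & \text{if } 1/2 + \delta_j + \delta_k - \delta_l = 0 \\ 0 & \text{if } 1/2 + \delta_j + \delta_k - \delta_l < 0 \end{cases} \qquad [28] \]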


It is interesting to distinguish a number of special cases:


1. If all fluctuations are normal, one recovers the Heisenberg algebra
of the canonical commutation relations with the right symplectic form
$\sigma_\omega$.

2. If $1/2 + \delta_j + \delta_k - \delta_l > 0$ for all $j, k$, one
obtains an abelian Lie algebra of fluctuations.

3. One gets the richest structure if
$1/2 + \delta_j + \delta_k - \delta_l = 0$ for all $j, k$, or for some
of them. One has a phenomenon of scale invariance: the
$c_{jk}^{l}(\Lambda)$ are $\Lambda$-independent. Algebras different
from the CCR algebra are observed. A particularly interesting case
turns up if $\delta_j = -\delta_k \ne 0$, that is, one of the indices
is negative. If, for example, $\delta_j < 0$, the corresponding
fluctuation $F_j^{\delta_j}$ shows a property of space squeezing; then
$\delta_k > 0$, and the fluctuation $F_k^{\delta_k}$ expresses the
property of space dilation. These phenomena are observed and computed
in several models (see, e.g., Verbeure and Zagrebnov (1992)). This
yields in particular a microscopic explanation of the phenomenon of
squeezing (squeezed states and all that) in quantum optics. We refer
also to the section “Spontaneous symmetry breaking” for this
phenomenon as being the basis of the construction of the Goldstone
normal modes of the Goldstone particle appearing in systems showing
spontaneous symmetry breakdown.
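
The case distinction for the limiting structure constants can be illustrated with a short Python sketch. It is not taken from the cited papers: the function names are hypothetical, the finite-volume scaling $c_{jk}^{l}(\Lambda) = c_{jk}^{l}\,|\Lambda|^{-(1/2+\delta_j+\delta_k-\delta_l)}$ is assumed, and the index 1/2 is assigned to the central element purely so that the normal case reproduces the Heisenberg algebra. Exact fractions are used so that the borderline case of a vanishing exponent is detected reliably.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def scaling_exponent(delta_j, delta_k, delta_l):
    """Exponent e = 1/2 + delta_j + delta_k - delta_l (hypothetical helper)."""
    return HALF + Fraction(delta_j) + Fraction(delta_k) - Fraction(delta_l)


def finite_volume_constant(c, volume, delta_j, delta_k, delta_l):
    """Assumed finite-volume structure constant c * |Lambda|**(-e)."""
    e = scaling_exponent(delta_j, delta_k, delta_l)
    return c * float(volume) ** (-float(e))


def contracted_constant(c, delta_j, delta_k, delta_l):
    """Limit of the structure constant as stated in the text.

    The constant survives only when the exponent vanishes. For a negative
    exponent the text also states a vanishing limit; the bare scaling
    formula alone would diverge there unless c itself is zero.
    """
    e = scaling_exponent(delta_j, delta_k, delta_l)
    return c if e == 0 else 0


if __name__ == "__main__":
    # Normal fluctuations (all indices 0), central element with index 1/2:
    print(contracted_constant(2, 0, 0, HALF))
    # Normal fluctuations, non-central component: contracted away.
    print(contracted_constant(2, 0, 0, 0))
    # Squeezing/dilation pair with opposite indices keeps its central term.
    print(contracted_constant(3, Fraction(-1, 4), Fraction(1, 4), HALF))
```

In this toy picture the abelian case corresponds to every exponent being strictly positive, so that `contracted_constant` returns 0 for all index triples.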


Some Applications 


The notion of fluctuation operator as presented 
above, and the mathematical structure of the algebra 
of fluctuations have been tested in several soluble 
models. Many applications of this theory of quan- 
tum fluctuations can be found in the list of 
references. Here we do not enter into the details of any model, but
limit ourselves to mentioning three applications that are of a general
nature and totally model independent.


Conservation of the KMS Property under 
the Transition from Micro to Macro 


Suppose that we start with a micro-dynamical system
$(\mathcal{B}, \omega, \alpha_t)$ with normal fluctuations, that is, we
are in the situation as treated in the section “Normal fluctuations.”
Hence, we know that the quantum central-limit theorem maps the system
$(\mathcal{B}, \omega, \alpha_t)$ onto the macrodynamical system
$(C^*(\mathcal{A}_{L,sa}, \sigma_\omega), \tilde\omega, \tilde\alpha_t)$
of quantum fluctuations.

If the microstate $\omega$ is $\alpha_t$-time invariant
($\omega\circ\alpha_t = \omega$ for all $t \in \mathbb{R}$), then it
also follows readily that the macrostate $\tilde\omega$ is
$\tilde\alpha_t$-time invariant (see Theorem 4), that is,
$\tilde\omega\circ\tilde\alpha_t = \tilde\omega$ for all
$t \in \mathbb{R}$.

A less trivial question to pose is: suppose that the microstate
$\omega$ is an equilibrium state for the microdynamics $\alpha_t$; is
the macrostate $\tilde\omega$ then also an equilibrium state for the
macrodynamics $\tilde\alpha_t$ of the fluctuations? In Goderis et al.
(1990) this question is answered positively in the following more
technical sense: if $\omega$ is an $\alpha_t$-KMS state of
$\mathcal{B}$ at inverse temperature $\beta$, then $\tilde\omega$ is an
$\tilde\alpha_t$-KMS state at the same temperature.

This property proves that the notion of equili- 
brium is preserved under the operation of coarse 
graining induced by the central-limit theorem. This 
statement constitutes a proof of one of the 
basic assumptions of the phenomenological theory 
of Onsager about small oscillations around 
equilibrium. 

This result also yields a contribution to the 
discussion whether or not quantum systems should 
be described at a macroscopic level by classical 
observables. The result above states that the macro- 
scopic fluctuation observables behave classically if 
and only if they are time invariant. In other words, it 
can only be expected a priori that conserved 
quantities behave classically. In principle, other 
observables follow a quantum dynamics. 


Linear Response Theory 


In particular, in the study of equilibrium states 
(KMS states) a standard procedure is to perturb the 
system and to study the response of the system as a 
function of the perturbation. The response eluci- 
dates many, if not all, of the properties of the 
equilibrium state. 

Technically, one considers a perturbation of the 
dynamics by adding a term to the Hamiltonian. One 
expands the perturbed dynamics in terms of the 
perturbation and the unperturbed dynamics. It is 
often argued that when the perturbation is small,
one can limit the study of the response to the first- 
order term in the perturbation in the corresponding 
Dyson expansion. This is the basis of what is called 
the “linear response theory of Kubo.” 

A long-term debate is going on about the validity 
of the linear response theory. The question is how to 
understand from a microscopic point of view the 
validity of the response theory being linear or not. 
One must realize that the linear response theory 
actually observed in macroscopic systems seems to 
have a significant range of validity beyond the 
criticism being expressed about it. 

Here we discuss the main result of the paper (Goderis et al. 1991),
which outlines conditions under which the response is exactly linear.

We assume: 


1. that the microdynamics $\alpha_t$ is the norm-limit of the local
dynamics $\alpha_t^\Lambda = e^{itH_\Lambda} \cdot e^{-itH_\Lambda}$,
where $H_\Lambda$ contains only standard finite-range interactions
(as in the section “Normal fluctuations”);

2. that the $\omega_\Lambda$ are states such that
$\omega = \lim_\Lambda \omega_\Lambda$ is a state which is time and
space translation invariant; and

3. that w satisfies the cluster condition [15] or [16]. 


From the time invariance of the state, one has a Hamiltonian GNS
representation of the dynamics: $\alpha_t = e^{itH} \cdot e^{-itH}$. On
the basis of Theorem 4, one has the dynamics $\tilde\alpha_t$ of the
fluctuation algebra $C^*(\mathcal{A}_{L,sa}, \sigma_\omega)$ in the
state $\tilde\omega$. This GNS representation yields a Hamiltonian
representation for $\tilde\alpha_t$:
$\tilde\alpha_t = e^{it\tilde H} \cdot e^{-it\tilde H}$.

Now take any local perturbation $P \in \mathcal{A}_{L,sa}$ of
$\alpha_t$, namely

\[ \alpha_{t,\Lambda}^{P} = e^{it(H+F_\Lambda(P))} \cdot e^{-it(H+F_\Lambda(P))} \]

where $F_\Lambda(P)$ is the local fluctuation of $P$ in $\omega$. Then
one proves the following central-limit theorem (Goderis et al. 1991):
for all $A$ and $B$ in $\mathcal{A}_{L,sa}$, one has the perturbed
dynamics

\[ \tilde\alpha_t^{P} = e^{it(\tilde H+F(P))} \cdot e^{-it(\tilde H+F(P))} \]

of the fluctuation algebra in the sense of [18]:

\[ \tilde\alpha_t^{P} F(A) = \lim_\Lambda F_\Lambda\bigl( \alpha_{t,\Lambda}^{P}(A) \bigr) \]

This proves the existence and the explicit form of the perturbed
dynamics lifted to the level of the fluctuations. In particular, one
has

\[ \lim_\Lambda \omega\bigl( \alpha_{t,\Lambda}^{P}(F_\Lambda(A)) \bigr) = \tilde\omega\bigl( \tilde\alpha_t^{P} F(A) \bigr) \]

This is nothing but the existence of the relaxation function of Kubo,
lifted to the level of the fluctuations: instead of dealing with
strictly local observables, one considers fluctuations here. Assume,
furthermore, that the state $\omega$ is an $(\alpha_t, \beta)$-KMS
state; then one readily derives Kubo’s famous formula of his linear
response theory:

\[ \frac{d}{dt}\, \tilde\omega\bigl( \tilde\alpha_t^{P} F(A) \bigr) = i\, \tilde\omega\bigl( [F(P), \tilde\alpha_t F(A)] \bigr) \]

which shows full linearity in the perturbation observable $P$. Kubo’s
formula arises as the central limit of the microscopic response to the
dynamics perturbed by a fluctuation observable. We remark that if
$\omega$ is an equilibrium state, then the right-hand side of the
formula above can be expressed in terms of the Duhamel two-point
function, which is the common practice in linear response theory.


Spontaneous Symmetry Breaking 


SSB is one of the basic phenomena accompanying 
collective phenomena, such as phase transitions in 
statistical mechanics, or specific ground states in 
field theory. SSB goes back to the Goldstone 
theorem. There are many different situations to 
consider, for example, in the case of short-range 
interactions, it is typical that SSB yields a 
dynamics which remains symmetric, whereas for 
long-range interactions SSB also breaks the symmetry of the dynamics.
However, in all cases the physics literature predicts that a
particular particle, namely the Goldstone boson, appears as a result
of SSB. The theory of fluctuation operators allows the construction of
the canonical coordinates of this particle. The most
general result can be found in Michoel and 
Verbeure (2001). We sketch the essentials in two 
cases, namely for systems of long-range interac- 
tions (mean fields) and for systems with short- 
range interactions. 


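Long-range (mean-field) interactions Here we give explicitly the
example of the strong-coupling BCS model in one dimension ($\nu = 1$).
The microscopic algebra of observables is
$\mathcal{B} = \bigotimes_i (M_2)_i$, where $M_2$ is the algebra of
$2 \times 2$ complex matrices. The local Hamiltonian of the model is
given by

\[ H_N = \varepsilon \sum_{i=-N}^{N} \sigma_i^z - \frac{1}{2N+1} \sum_{i,j=-N}^{N} \sigma_i^+ \sigma_j^-, \qquad 0 < \varepsilon < \frac{1}{2} \]

where $\sigma^z, \sigma^\pm$ are the usual $2 \times 2$ Pauli matrices.
In the thermodynamic limit, the KMS equation has the following product
state solutions: $\omega_\lambda = \bigotimes_i \operatorname{tr}\rho_\lambda$,
where

\[ \rho_\lambda = \frac{e^{-\beta h_\lambda}}{\operatorname{tr} e^{-\beta h_\lambda}}, \qquad \lambda = \operatorname{tr}\rho_\lambda\sigma^-, \qquad h_\lambda = \varepsilon\sigma^z - \lambda\sigma^+ - \bar\lambda\sigma^- \]

Note that $\lambda = \operatorname{tr}\rho_\lambda\sigma^-$ is a
nonlinear equation for $\lambda$ whose solutions determine the density
matrix $\rho_\lambda$. This equation always has the solution
$\lambda = 0$, describing the so-called normal phase. For
$\beta > \beta_c$, with $\tanh(\beta_c\varepsilon) = 2\varepsilon$, one
has a solution $\lambda \ne 0$, describing the superconducting phase.
Remark that if $\lambda$ is a solution, then $\lambda e^{i\phi}$ is a
solution as well for all $\phi$. It is clear that $H_N$ is invariant
under the continuous gauge transformation automorphism group
$G = \{\gamma_\phi \mid \phi \in [0, 2\pi]\}$ of $\mathcal{B}$:

\[ \gamma_\phi(\sigma_j^+) = e^{-i\phi}\sigma_j^+ \]

Hence $G$ is a symmetry group. On the other hand,
$\omega_\lambda(\gamma_\phi(\sigma^+)) = e^{-i\phi}\omega_\lambda(\sigma^+) \ne \omega_\lambda(\sigma^+)$.
The gauge group $G$ is spontaneously broken. Remark also that the gauge
transformations are implemented locally by the charges

\[ Q_N = \sum_{j=-N}^{N} \sigma_j^z \]

that is, for $|j| \le N$,

\[ \gamma_\phi(\sigma_j^+) = e^{-i\phi Q_N}\, \sigma_j^+\, e^{i\phi Q_N} \]

and $\sigma^z$ is the symmetry generator density. As the states
$\omega_\lambda$ are product states, all fluctuations are normal (see
the section “Normal fluctuations”). One considers the local operators

\[ Q = \frac{|\lambda|^2}{\mu^2}\,\sigma^z + \frac{\varepsilon}{\mu^2}\bigl( \lambda\sigma^+ + \bar\lambda\sigma^- \bigr), \qquad P = \frac{i}{\mu}\bigl( \lambda\sigma^+ - \bar\lambda\sigma^- \bigr) \]

where $\mu = (\varepsilon^2 + |\lambda|^2)^{1/2}$. Note that $P$ is
essentially the order parameter operator, that is, the operator $P$ is
breaking the symmetry:

\[ \frac{d}{d\phi}\,\omega_\lambda(\gamma_\phi(P))\Big|_{\phi=0} \ne 0, \qquad \omega_\lambda(P) = 0 \]

On the other hand, $Q$ is essentially the generator of the symmetry
$\sigma^z$ normalized to zero, that is, $\omega_\lambda(Q) = 0$.

Michoel and Verbeure (2001) proved in detail that the fluctuations
$F(Q)$ and $F(P)$ form a canonical pair

\[ [F(Q), F(P)] = i\,\frac{4|\lambda|^2}{\mu} \]

and that they behave, under the time evolution, as harmonic oscillator
coordinates oscillating with a frequency equal to $2\mu$. This
frequency is called a plasmon frequency. Moreover, the variances are

\[ \tilde\omega(F(Q)^2) = \tilde\omega(F(P)^2) = \frac{|\lambda|^2}{\mu^2} \]

This means that these coordinates vanish or disappear if
$\lambda = 0$. The coordinates $F(Q)$ and $F(P)$ are the canonical
coordinates of a particle appearing only if there is spontaneous
symmetry breakdown. They are the canonical coordinates of the Goldstone
boson, which arises if SSB occurs.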
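
As a rough numerical illustration of the self-consistency (gap) equation $\lambda = \operatorname{tr}\rho_\lambda\sigma^-$ of the strong-coupling BCS model, the Python sketch below solves for $|\lambda|$ by bisection. It rests on assumptions that go beyond what the text states explicitly: with the single-site Hamiltonian taken as $h_\lambda = \varepsilon\sigma^z - \lambda\sigma^+ - \bar\lambda\sigma^-$, a nonzero solution satisfies $\tanh(\beta\mu) = 2\mu$ with $\mu = (\varepsilon^2 + |\lambda|^2)^{1/2}$, and $0 < \varepsilon < 1/2$. All function names are hypothetical.

```python
import math


def critical_beta(eps):
    """Inverse critical temperature from tanh(beta_c * eps) = 2 * eps."""
    if not 0.0 < eps < 0.5:
        raise ValueError("eps must lie in (0, 1/2)")
    return math.atanh(2.0 * eps) / eps


def solve_mu(eps, beta, tol=1e-12):
    """Return mu > eps with tanh(beta * mu) = 2 * mu, or None if beta <= beta_c."""
    if beta <= critical_beta(eps):
        return None  # only the normal-phase solution lambda = 0 exists

    def f(mu):
        return math.tanh(beta * mu) - 2.0 * mu

    # f(eps) > 0 above beta_c, and f(1/2) < 0 because tanh < 1.
    lo, hi = eps, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def order_parameter(eps, beta):
    """Modulus |lambda| of the order parameter (0.0 in the normal phase)."""
    mu = solve_mu(eps, beta)
    return 0.0 if mu is None else math.sqrt(mu * mu - eps * eps)


if __name__ == "__main__":
    eps = 0.25
    print("beta_c =", critical_beta(eps))
    for beta in (1.0, 3.0, 10.0):
        print(beta, order_parameter(eps, beta))
```

Only the modulus is determined this way; the phase of $\lambda$ remains free, which is the degeneracy associated with the broken gauge symmetry.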


Short-range interactions An analogous result, as 
for long-range interactions, can be derived for 
systems with short-range interactions. However, in 
this case we have equilibrium states with poor 
cluster properties. We are now in the situation as 
described in the “Abnormal Fluctuations” section. 
Also in this case we have the phenomenon of SSB, 
which shows the appearance of a Goldstone particle. 
Also in this case one is able to construct its 
canonical coordinates. The details of this construc- 
tion can be found in Michoel and Verbeure (2001). 
Here we give a heuristic picture of this construction. 

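Consider again a microsystem $(\mathcal{B}, \omega, \alpha_t)$ and let
$\gamma_s$ be a strongly continuous one-parameter symmetry group of
$\alpha_t$ which is locally generated by
$Q_\Lambda = \sum_{x\in\Lambda} q_x$. SSB amounts to finding an
equilibrium (KMS) or ground state $\omega$ which breaks the symmetry,
that is, there exists a local observable $A \in \mathcal{A}_{L,sa}$
such that for $s \ne 0$ one has $\omega(\gamma_s(A)) \ne \omega(A)$,
and $\alpha_t\gamma_s = \gamma_s\alpha_t$. This is equivalent to

\[ \frac{d}{ds}\,\omega(\gamma_s(A))\Big|_{s=0} = \lim_\Lambda \omega([Q_\Lambda, A]) = c \ne 0 \]

with $c$ a constant.

Now we turn this equation into a relation for fluctuations. Using
space translation invariance of the state, one gets

\[ \lim_\Lambda \omega\left( \left[ \frac{1}{|\Lambda|^{1/2}} \sum_{x\in\Lambda} (q_x - \omega(q)),\; \frac{1}{|\Lambda|^{1/2}} \sum_{y\in\Lambda} (\tau_y A - \omega(A)) \right] \right) = c \]

We now use another consequence of the Goldstone theorem, namely that
SSB implies poor clustering properties for the order parameter $A$,
that is, in line with what is done in the last section, we assume that
the lack of clustering is expressed by the existence of a positive
index $\delta$ such that

\[ \lim_\Lambda \omega\left( \frac{1}{|\Lambda|^{1+2\delta}} \sum_{x\in\Lambda} (\tau_x A - \omega(A)) \sum_{y\in\Lambda} (\tau_y A - \omega(A)) \right) \]

is nontrivial and finite. This means that the fluctuation
$F^{\delta}(A)$ exists. Then we get

\[ \lim_\Lambda \omega\left( \left[ \frac{1}{|\Lambda|^{1/2-\delta}} \sum_{x\in\Lambda} (q_x - \omega(q)),\; \frac{1}{|\Lambda|^{1/2+\delta}} \sum_{y\in\Lambda} (\tau_y A - \omega(A)) \right] \right) = c \]

Hence

\[ \tilde\omega\bigl( [F^{-\delta}(q), F^{\delta}(A)] \bigr) = c \]

which for equilibrium states $\omega$ turns into the operator equation
for fluctuations

\[ [F^{-\delta}(q), F^{\delta}(A)] = c\,\mathbb{1} \]

In other words, one obtains a canonical pair
$(F^{-\delta}(q), F^{\delta}(A))$ of normal coordinates of the
collective Goldstone mode.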

Note that the long-range correlation of the order-parameter operator
(positive $\delta$) is exactly compensated by a squeezing, described by
the negative index $-\delta$, for the fluctuation operator of the local
generator of the broken symmetry. This
result can also be expressed as typical for SSB, 
namely that the symmetry is not completely 
broken, but only partially. More detailed informa- 
tion about all this is found in Michoel and 
Verbeure (2001). 


See also: Algebraic Approach to Quantum Field Theory; 
Large Deviations in Equilibrium Statistical Mechanics; 
Macroscopic Fluctuations and Thermodynamic 
Functionals; Quantum Phase Transitions; Quantum 
Spin Systems; Symmetry Breaking in Field Theory; 
Tomita—Takesaki Modular Theory. 
The Definition


A numerical measure of the ability of a classical or 
quantum information processing system (for definite- 
ness, one speaks of a communication channel) to 
transmit information expressible as a text message 
(called “classical information” as distinct from quan- 
tum information). It is equal to the least upper bound 
for rates of the asymptotically perfect transmission of 
classical information through the system, when the 
transmission time tends to infinity, and arbitrary pre- 
and post-processing (encoding and decoding) are 
allowed at the input and the output of the system. 
Typically, for rates exceeding the capacity, not only is asymptotically
perfect transmission impossible, but the error probability with an
arbitrary encoding-decoding scheme tends to 1, so that the capacity has
the nature of a threshold parameter.


From Classical to Quantum 
Information Theory 


A central result of classical information theory is the Shannon coding
theorem, giving an explicit expression for the capacity in terms of the
maximal mutual information between the input and the output of the
channel. The issue of the information capacity of quantum communication
channels arose soon after the publication of the pioneering papers by
Shannon and goes back to the classical works of Gabor, Brillouin, and
Gordon, asking for fundamental physical limits on the rate and quality
of information transmission. This work laid a physical foundation and
raised the question of a consistent quantum treatment of the problem.
Important steps in this direction were made in the early 1970s, when a
quantum probabilistic framework for this type of problem was created
and the conjectured upper bound for the classical capacity of a quantum
channel was proved. A long journey to the quantum coding theorem
culminated in 1996 with the proof of achievability of the upper bound
(the Holevo-Schumacher-Westmoreland theorem; see Holevo (1998) for a
detailed historical survey). Moreover, it was realized that a quantum
channel is characterized by a whole spectrum of capacities depending on
the nature of the information resources and the specific protocols used
for the transmission. To a great extent, this progress was stimulated
by an interplay between quantum communication theory and quantum
information ideas related to more recent developments in quantum
computing. This new age of quantum information science is characterized
by an emphasis on the new possibilities (rather than restrictions)
opened by the quantum nature of the information processing agent. On
the other hand, the question of information capacity is important for
the theory of quantum computing, particularly in connection with
quantum error-correcting codes, communication and algorithmic
complexity, and a number of other important issues.


The Quantum Coding Theorem 


In the simplest and most basic memoryless case, the information
processing system is described by the sequence of block channels,

\[ \Phi^{\otimes n} = \underbrace{\Phi \otimes \cdots \otimes \Phi}_{n}, \qquad n = 1, 2, \ldots \]

of $n$ parallel and independent uses of a channel $\Phi$, $n$
playing the role of transmission time (Holevo 1998). 
More generally, one can consider memory channels 
given by open dynamical systems with a kind of 
ergodic behavior and the limit where the transmission 
time goes to infinity (Kretschmann and Werner 2005). 

Restricting to the memoryless case, encoding is given by a mapping of
classical messages $x$ from a given codebook of size $N$ into states
(density operators) $\rho_x^{(n)}$ in the input space
$\mathcal{H}_1^{\otimes n}$ of the block channel $\Phi^{\otimes n}$,
and decoding is given by an observable $M^{(n)}$ in the output space
$\mathcal{H}_2^{\otimes n}$, that is, a family $\{M_y^{(n)}\}$ of
operators constituting a resolution of the identity in
$\mathcal{H}_2^{\otimes n}$:

\[ M_y^{(n)} \ge 0, \qquad \sum_y M_y^{(n)} = I \]

Here $y$ plays the role of outcomes of the whole decoding procedure
involving both the quantum measurement at the output and the possible
classical information post-processing. Then the diagram for the
classical information transmission is

\[ x \;\longrightarrow\; \underbrace{\rho_x^{(n)}}_{\text{input state}} \;\xrightarrow{\;\Phi^{\otimes n}\;}\; \underbrace{\Phi^{\otimes n}[\rho_x^{(n)}]}_{\text{output state}} \;\xrightarrow{\;M^{(n)}\;}\; y \]


The encoding and decoding described above constitute a quantum block
code of length $n$ and size $N$ for the memoryless channel. The
conditional probability of obtaining an outcome $y$, provided the
message $x$ was sent, for a chosen block code is given by the
statistical formula

\[ p^{(n)}(y|x) = \operatorname{tr}\,\Phi^{\otimes n}[\rho_x^{(n)}]\, M_y^{(n)} \]

and the error probability for the code is just
$\max_x \bigl( 1 - p^{(n)}(x|x) \bigr)$.

Denoting by $p_e(n, N)$ the infimum of the error probability over all
codes of length $n$ and size $N$, the classical capacity $C(\Phi)$ of
the memoryless channel is defined as the least upper bound of the
rates $R$ for which $\lim_{n\to\infty} p_e(n, 2^{nR}) = 0$.
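
To make the notions of error probability and rate concrete, here is a small Python sketch. It is purely illustrative: the table `p[x][y]` of conditional decoding probabilities is assumed to be given (for example, computed beforehand from the statistical formula), the numbers in the example are made up, and the function names are hypothetical.

```python
import math


def max_error_probability(p):
    """Worst-case error max_x (1 - p(x|x)) for a square table p[x][y]."""
    return max(1.0 - row[x] for x, row in enumerate(p))


def rate(n, size):
    """Rate R = log2(N) / n, in bits per channel use, of a code of length n and size N."""
    return math.log2(size) / n


if __name__ == "__main__":
    # Made-up decoding probabilities for a code with N = 2 messages.
    table = [[0.9, 0.1],
             [0.2, 0.8]]
    print(max_error_probability(table))
    print(rate(3, 8))
```

A rate $R$ is achievable in the sense of the definition when codes of size $2^{nR}$ exist whose worst-case error, as computed by `max_error_probability`, tends to zero as $n$ grows.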

Let $\Phi$ be a quantum channel from the input to the output quantum
systems, assumed to be finite dimensional. The coding theorem for the
classical capacity says that

\[ C(\Phi) = \lim_{n\to\infty} \frac{1}{n}\, C_\chi(\Phi^{\otimes n}) \qquad [1] \]

where

\[ C_\chi(\Phi) = \max_{\{p_x, \rho_x\}} \left[ H\Bigl( \sum_x p_x \Phi[\rho_x] \Bigr) - \sum_x p_x H\bigl( \Phi[\rho_x] \bigr) \right] \qquad [2] \]

$H(\rho) = -\operatorname{tr}\rho\log_2\rho$ is the binary von Neumann
entropy, and the maximum is taken over all probability distributions
$\{p_x\}$ and collections of density operators $\{\rho_x\}$ in
$\mathcal{H}_1$.
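
For a single qubit, the quantity that is maximized in the capacity formula (the entropy of the average output state minus the average output entropy) can be evaluated in closed form, since a qubit state with Bloch vector $r$ has eigenvalues $(1 \pm |r|)/2$. The Python sketch below evaluates it for one fixed ensemble; it does not perform the maximization over ensembles, the Bloch vectors of the output states $\Phi[\rho_x]$ are assumed to be supplied by the caller, and all function names are hypothetical.

```python
import math


def binary_entropy(q):
    """Shannon entropy in bits of the distribution (q, 1 - q)."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def qubit_entropy(bloch):
    """Von Neumann entropy (bits) of a qubit state given by its Bloch vector."""
    r = min(1.0, math.sqrt(sum(c * c for c in bloch)))
    return binary_entropy(0.5 * (1.0 + r))


def holevo_quantity(probs, output_bloch_vectors):
    """H(sum_x p_x out_x) - sum_x p_x H(out_x) for qubit output states."""
    average = [
        sum(p * v[i] for p, v in zip(probs, output_bloch_vectors))
        for i in range(3)
    ]
    mean_entropy = sum(
        p * qubit_entropy(v) for p, v in zip(probs, output_bloch_vectors)
    )
    return qubit_entropy(average) - mean_entropy


if __name__ == "__main__":
    # Two orthogonal pure output states with equal weights.
    print(holevo_quantity([0.5, 0.5], [(0, 0, 1), (0, 0, -1)]))
    # The same ensemble with Bloch vectors shrunk by a factor 1/2,
    # as a noisy channel might produce.
    print(holevo_quantity([0.5, 0.5], [(0, 0, 0.5), (0, 0, -0.5)]))
```

Two orthogonal pure output states used with equal probability give exactly one bit, while identical output states give zero, as expected for a quantity that measures how distinguishable the outputs are.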


The Variety of Capacities 


This basic definition and the formulas [1], [2] generalize the
definition of the Shannon capacity and the coding theorem for classical
memoryless channels. For a quantum channel, there are several different
capacities because one may consider sending different kinds (classical
or quantum) of information, restrict the admissible coding and decoding
operations, and/or allow the use of additional resources, such as
shared entanglement or forward or backward communication, leading to
really different quantities (Bennett et al. 2004). A few of these
resources (such as feedback) also exist for classical channels but
usually influence the capacity less dramatically (at least for
memoryless channels). Restricting to the transmission of classical
information with no additional resources, one can distinguish at least
four capacities (Bennett and Shor 1998), according to whether, for each
block length $n$, one is allowed to use arbitrary entangled quantum
operations on the full block of input (resp. output) systems, or if,
for each of the parallel channels, one has to use a separate quantum
encoding (resp. decoding), and combine these only by classical pre-
(resp. post-) processing:


$C_{\infty\infty}$ (top): full capacity, arbitrary (de)coding

$C_{1\infty} = C_\chi$ (left): unentangled coding, quantum block decoding

$C_{\infty 1}$ (right): quantum block coding, separate decoding

$C_{11}$ (bottom): one-shot capacity or accessible information, separate
quantum (de)coding, block (de)coding only classical

The diagram is ordered by the inequalities
$C_{\infty\infty} \ge C_{1\infty} \ge C_{11}$ (upper left, lower left)
and $C_{\infty\infty} \ge C_{\infty 1} \ge C_{11}$ (upper right, lower
right).





The full capacity $C_{\infty\infty}$ is just the classical capacity
$C(\Phi)$ given by [1]. That $C_{1\infty}$ coincides with the quantity
$C_\chi(\Phi)$ given by [2] is the essential content of the HSW
theorem, from which [1] is obtained by additional blocking. Since
$C_\chi$ is apparently superadditive,
$C_\chi(\Phi_1 \otimes \Phi_2) \ge C_\chi(\Phi_1) + C_\chi(\Phi_2)$,
one has $C_{\infty\infty} \ge C_{1\infty}$. It is still not known
whether the quantity $C_\chi(\Phi)$ is in fact additive for all
channels, which would imply the equalities here. Additivity of
$C_\chi(\Phi)$ would have an important physical consequence: it would
mean that using entangled input states does not increase the classical
capacity of a quantum channel. While such a result would be very
welcome, giving a single-letter expression for the classical capacity,
it would call for a physical explanation of the asymmetry between the
effects of entanglement in encoding and decoding procedures. Indeed,
the inequality in the lower left is known to be strict sometimes
(Holevo 1998), which means that entangled decodings can increase the
classical capacity. There is even an intermediate capacity between
$C_{11}$ and $C_{1\infty}$, obtained by restricting the quantum block
decodings to adaptive ones (Shor 2002). The additivity of the quantity
$C_\chi$ for all channels is one of the central open problems in
quantum information theory; it was shown to be equivalent to several
other important open problems, notably (super)additivity of the
entanglement of formation and additivity of the minimal output entropy
(Shor 2004).

For infinite-dimensional quantum processing systems, one needs to
consider input constraints such as the power constraint for bosonic
Gaussian channels. The definition of the classical capacity and the
capacity formula are then modified by introducing the constraint in a
way similar to the classical theory (Holevo 1998, Holevo and Werner
2001). Another important extension concerns multiuser quantum
information processing systems and their capacity regions (Devetak and
Shor 2003).


See also: Capacities Enhanced by Entanglement;
Capacity for Quantum Information; Channels in Quantum
Information Theory; Entanglement Measures.


Introduction


Quantum chromodynamics, or QCD, as it is normally called in high-energy
physics, is the quantum field theory that describes the strong
interactions. It is the SU(3) gauge theory of the current standard
model for elementary particles and forces,
$SU(3) \times SU(2)_L \times U(1)$, which encompasses the strong,
electromagnetic, and weak interactions. The symmetry group of QCD, with
its eight conserved charges, is referred to as color SU(3). As is
characteristic of quantum field theories, each field may be described
in terms of quantum waves or particles.

Because it is a gauge field theory, the fields that 
carry the forces of QCD transform as vectors under 
the Lorentz group. Corresponding to these vector 
fields are the particles called “gluons,” which carry 
an intrinsic angular momentum, or spin, of 1 in
units of $\hbar$. The strong interactions are understood as
the cumulative effects of gluons, interacting among 
themselves and with the quarks, the spin-1/2 
particles of the Dirac quark fields. 

There are six quark fields of varying masses in 
QCD. Of these, three are called “light” quarks, in a 
sense to be defined below, and three “heavy.” The 
light quarks are the up (u), down (d), and strange (s), 
while the heavy quarks are the charm (c), bottom (b), and top (t).
Their well-known electric charges are $e_f = 2e/3$ (u, c, t) and
$e_f = -e/3$ (d, s, b), with $e$ the positron charge. The gluons
interact with each quark
field in an identical fashion, and the relatively light 
masses of three of the quarks provide the theory with 
a number of approximate global symmetries that 
profoundly influence the manner in which QCD 
manifests itself in the standard model. 

These quark and gluon fields and their correspond- 
ing particles are enumerated with complete confidence 
by the community of high-energy physicists. Yet, none 
of these particles has ever been observed in isolation, 
as one might observe a photon or an electron. Rather, 
all known strongly interacting particles are colorless; 
most are “mesons,” combinations with the quantum numbers of a quark
$q$ and an antiquark $\bar q'$, or “baryons” with the quantum numbers
of (possibly distinct) combinations of three quarks $qq'q''$. This
feature of QCD, that its underlying fields never 
appear as asymptotic states, is called “confinement.” 
The very existence of confinement required new ways 
of thinking about field theory, and only with these 
was the discovery and development of QCD possible. 


The Background of QCD 


The strong interactions have been recognized as a 
separate force of nature since the discovery of the 
neutron as a constituent of atomic nuclei, along with 
the proton. Neutrons and protons (collectively, 
nucleons) possess a force, attractive at intermediate 
distances and so strong that it overcomes the electric 
repulsion of the protons, each with charge e. A sense 
of the relative strengths of the electromagnetic and 
strong interactions may be inferred from the typical 
distance between mutually repulsive electrons in an
atom, $\sim 10^{-8}$ cm, and the typical distance between
protons in a nucleus, of order $10^{-13}$ cm.

The history that led up to the discovery of QCD is a 
fascinating one, beginning with Yukawa’s 1935 theory 
of pion exchange as the source of the forces that bind 
nuclei, still a useful tool for low-energy scattering. 
Other turning points include the creation of nonabelian 
gauge theories by Yang and Mills in 1954, the discovery 
of the quantum number known as strangeness, the 
consequent development of the quark model, and then 
the proposal of color as a global symmetry. The role of 
pointlike constituents in hadrons was foreshadowed by 
the identification of electromagnetic and weak currents 
and the analysis of their quantum-mechanical algebras. 
Finally, the observation of “scaling” in deep-inelastic 
scattering, which we will describe below, made QCD, 
with color as a local symmetry, the unique explanation 
of the strong interactions, through its property of 
asymptotic freedom. 
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The Lagrangian and Its Symmetries 


The QCD Lagrangian may be written as 


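\[ \mathcal{L} = \sum_{f=1}^{n_f} \bar q_f \left( i\slashed{D}[A] - m_f \right) q_f - \frac{1}{2}\,\mathrm{Tr}\left[ F_{\mu\nu}^2[A] \right] - \frac{\lambda}{2} \left( B_a[A] \right)^2 + \bar c_a \left[ \frac{\delta B_a}{\delta\alpha_b} \right] c_b \qquad [1] \]
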
with $\slashed{D}[A] = \gamma\cdot\partial + i g_s\,\gamma\cdot A$ the covariant derivative in
QCD. The $\gamma^\mu$ are the Dirac matrices, satisfying the
anticommutation relations $[\gamma^\mu, \gamma^\nu]_+ = 2g^{\mu\nu}$. The SU(3)
gluon fields are $A^\mu = \sum_{a=1}^{8} A^\mu_a T_a$, where $T_a$ are the
generators of SU(3) in the fundamental representation.
The field strengths $F_{\mu\nu}[A] = \partial_\mu A_\nu - \partial_\nu A_\mu + i g_s [A_\mu, A_\nu]$
specify the three- and four-point gluon couplings of
nonabelian gauge theory. In QCD, there are $n_f$
flavors of quark fields, $q_f$, with conjugate $\bar q_f = q_f^\dagger \gamma^0$.

The first two terms in the expression [1] make up 
the classical Lagrangian, followed by the gauge-fixing 
term, specified by a (usually, but not necessarily,
linear) function $B_a(A)$, and the ghost Lagrangian. The
ghost (anti-ghost) fields $c_a$ ($\bar c_a$) carry the same adjoint
index as the gauge fields. 

The classical QCD Lagrangian before gauge fixing 
is invariant under the local gauge transformations 


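\[ A'_\mu(x) = \frac{i}{g_s}\,\partial_\mu\Omega(x)\,\Omega^{-1}(x) + \Omega(x) A_\mu(x)\,\Omega^{-1}(x) = A_\mu(x) - \partial_\mu\delta\alpha(x) + i g_s \left[ \delta\alpha(x), A_\mu(x) \right] + \cdots \qquad [2] \]

with $\Omega(x) = \exp[i g_s\,\delta\alpha(x)]$ an element of SU(3).
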
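The full QCD action including gauge-fixing and
ghost terms is also invariant under the Becchi, Rouet,
Stora, Tyutin (BRST) transformations, with $\delta\xi$ an
anticommuting variable,

\[
\begin{aligned}
\delta A_{\mu,a} &= \left( \delta_{ab}\,\partial_\mu + g_s A_{\mu,c} f_{abc} \right) c_b\,\delta\xi \\
\delta c_a &= -\tfrac{1}{2}\, g_s f_{abc}\, c_b c_c\,\delta\xi, \qquad \delta\bar c_a = \lambda B_a\,\delta\xi \\
\delta\psi_i &= i g_s [T_b]_{ij}\, c_b \psi_j\,\delta\xi
\end{aligned}
\qquad [3]
\]

with $f_{abc}$ the SU(3) structure constants. The Jacobian
of these transformations is unity.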

In addition, neglecting masses of the light quarks,
u, d, and s, the QCD Lagrangian has a class of global
flavor and chiral symmetries, the latter connecting
left- and right-handed components of the quark
fields, $\psi_{L,R} \equiv (1/2)(1 \mp \gamma_5)\psi$,

\[ \psi'(x) = e^{i\alpha\gamma_5^P}\,\psi(x), \qquad P = 0, 1 \qquad [4] \]

Here, power P = 0 describes phase, and P = 1 chiral,
transformations. Both transformations can be
extended to transformations among the light flavors,
by letting $\psi$ become a vector, and $\alpha$ an element in
the Lie algebra of SU(M), with M = 2 if we take only
the u and d quarks, and M=3 if we include the 
somewhat heavier strange quark. These symmetries, 
not to be confused with the local symmetries of the 
standard model, are strong isospin and its extension 
to the “eightfold way,” which evolved into the 
(3-)quark model of Gell-Mann and Zweig. The 
many successes of these formalisms are automati- 
cally incorporated into QCD. 


Green Functions, Phases, 
and Gauge Invariance 


In large part, the business of quantum field theory is 
to calculate Green functions, 


\[ G_n(x_1 \ldots x_n) = \langle 0|\,T\left( \Phi_1(x_1) \cdots \Phi_i(x_i) \cdots \Phi_n(x_n) \right)|0\rangle \qquad [5] \]

where T denotes time ordering. The $\Phi_i(x)$ are
elementary fields, such as $A^\mu$ or $q_f$, or composite
fields, such as currents like $J^\mu = \bar q_f \gamma^\mu q_f$. Such a
Green function generates amplitudes for the scattering
of particles of definite momenta and spin, when
in the limit of large times the $x_i$-dependence of the
Green function is that of a plane wave. For example,
we may have in the limit $x_i^0 \to \infty$,

\[ G_n(x_1 \ldots x_n) \to \phi_i(p,\lambda)\, e^{-ip\cdot x_i}\, \langle (p,\lambda)|\,T\left( \Phi_1(x_1) \cdots \Phi_{i-1}(x_{i-1})\, \Phi_{i+1}(x_{i+1}) \cdots \Phi_n(x_n) \right)|0\rangle \qquad [6] \]

where $\phi_i(p,\lambda)$ is a solution to the free-field equation for
field $\Phi_i$, characterized by momentum p and spin $\lambda$. (An
integral over possible momenta p is understood.)
When this happens for field i, the vacuum state is
replaced by $|(p,\lambda)\rangle$, a particle state with precisely
this momentum and spin; when it occurs for all
fields, we derive a scattering (S)-matrix amplitude.
In essence, the statement of confinement is that
Green functions with fields $q_f(x)$ never behave as
plane waves at large times in the past or future. 
Only Green functions of color singlet composite 
fields, invariant under gauge transformations, are 
associated with plane wave behavior at large times. 

Green functions remain invariant under the BRST 
transformations [3], and this invariance implies a set 
of Ward identities 
The variation of the anti-ghost as in [3] is equivalent
to an infinitesimal change in the gauge-fixing term;
variations in the remaining fields all cancel single-particle
plane wave behavior in the corresponding
Green functions. These identities then ensure the
gauge invariance of the perturbative S-matrix, a result
that turns out to be useful despite confinement.

To go beyond a purely perturbative description of 
QCD, it is useful to introduce a set of nonlocal 
operators that are variously called nonabelian 
phases, ordered exponentials, and Wilson lines, 


\[ U_C(z,y) = P \exp\left[ -i g_s \int_y^z dx^\mu A_\mu(x) \right] \qquad [8] \]





where C is some self-avoiding curve between y and z.
The U's transform at each end linearly in nonabelian
gauge transformations $\Omega(x)$ at that point,

\[ U'_C(z,y) = \Omega(z)\, U_C(z,y)\, \Omega^{-1}(y) \qquad [9] \]


Especially interesting are closed curves C, for which 
z=y. The phases about such closed loops are, like 
their abelian counterparts, sensitive to the magnetic 
flux that they enclose, even when the field strengths 
vanish on the curve. 
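
The transformation rule for ordered exponentials can be checked numerically in a discretized setting. The following is a minimal sketch, not the article's construction: it assumes a one-dimensional chain of lattice "gauge links", uses SU(2) rather than SU(3) to keep the matrices small, and all function names are hypothetical. It builds the path-ordered product of links, applies an independent gauge rotation at every site, and compares the result with a rotation applied at the two end points only. For a closed path, the trace of the product is unchanged.

```python
import math
import random


def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Hermitian conjugate, which is the inverse for a unitary matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def trace(a):
    return a[0][0] + a[1][1]


def random_su2(rng):
    """Random SU(2) matrix [[a, b], [-b*, a*]] with |a|^2 + |b|^2 = 1."""
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(x * x for x in v))
    a = complex(v[0], v[1]) / norm
    b = complex(v[2], v[3]) / norm
    return [[a, b], [-b.conjugate(), a.conjugate()]]


def wilson_line(links):
    """Path-ordered product: links[k] joins site k to site k+1,
    and links further along the path stand to the left."""
    u = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for link in links:
        u = matmul(link, u)
    return u


def gauge_transform(links, omegas):
    """Rotate each link as Omega(k+1) * U_k * Omega(k)^dagger."""
    return [matmul(matmul(omegas[k + 1], links[k]), dagger(omegas[k]))
            for k in range(len(links))]


rng = random.Random(0)
n_links = 6
links = [random_su2(rng) for _ in range(n_links)]
omegas = [random_su2(rng) for _ in range(n_links + 1)]

# Open path: only the rotations at the two ends survive.
u = wilson_line(links)
u_prime = wilson_line(gauge_transform(links, omegas))
expected = matmul(matmul(omegas[-1], u), dagger(omegas[0]))
max_diff = max(abs(u_prime[i][j] - expected[i][j])
               for i in range(2) for j in range(2))

# Closed path: first and last site coincide, so the trace is invariant.
closed_omegas = omegas[:-1] + [omegas[0]]
loop_prime = wilson_line(gauge_transform(links, closed_omegas))
trace_diff = abs(trace(loop_prime) - trace(u))

print("open-path mismatch:", max_diff)
print("closed-loop trace change:", trace_diff)
```

Both printed numbers should be at the level of floating-point rounding. The same cancellation of intermediate rotations is what makes products of links natural variables for a lattice formulation.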


QCD at the Shortest and Longest 
Distances 


Much of the fascination of QCD is its extraordinary 
variation of behavior at differing distance scales. Its 
discovery is linked to asymptotic freedom, which 
characterizes the theory at the shortest scales. 
Asymptotic freedom also suggests (and in part 
provides) a bridge to longer distances. 

Most analyses in QCD begin with a path-integral 
formulation in terms of the elementary fields 
with $S_{\rm QCD}$ the action. Perturbation theory keeps
only the kinetic Lagrangian, quadratic in fields, in
the exponent, and expands the potential terms in
the coupling. This procedure produces Feynman
diagrams, with vertices corresponding to the cubic
and quartic terms in the QCD Lagrangian [1].
Most nonperturbative analyses of QCD require
studying the theory on a Euclidean, rather than
Minkowski, space, related by an analytic continuation
in the times $x^0$, $y^0$, $z^0$ in $G_n$ from real to imaginary
values. In Euclidean space, we find, for example,
classical solutions to the equations of motion, known as
instantons, that provide nonperturbative contributions 
to the path integral. Perhaps the most flexible non- 
perturbative approach approximates the action and the 
measure at a lattice of points in four-dimensional space. 
For this purpose, integrals over the gauge fields are 
replaced by averages over “gauge links,” of the form of 
eqn [8] between neighboring points. 

Perturbation theory is most useful for processes 
that occur over short timescales and at high relative 
energies. Lattice QCD, on the other hand, can 
simulate processes that take much longer times, but 
is less useful when large momentum transfers are 
involved. The gap between the two methods remains 
quite wide, but between the two they have covered 
enormous ground, enough to more than confirm 
QCD as the theory of strong interactions. 


Asymptotic Freedom 


QCD is a renormalizable field theory, which implies 
that the coupling constant g must be defined by its
value at a “renormalization scale,” and is denoted
$g(\mu)$. Usually, the magnitude of $\alpha_s(\mu) \equiv g^2/4\pi$ is
quoted at $\mu = m_Z$, where it is $\sim 0.12$. In effect, $g(\mu)$
controls the amplitude that connects any state to
another state with one more or one fewer gluon,
including quantum corrections that occur over timescales
from zero up to $\hbar/\mu$ (if we measure $\mu$ in units of
energy). The QCD Ward identities mentioned above
ensure that the coupling is the same for both quarks 
and gluons, and indeed remains the same in all terms 
in the Lagrangian, ensuring that the symmetries of 
QCD are not destroyed by renormalization. 

Quantum corrections to gluon emission are not
generally computable directly in renormalizable
theories, but their dependence on $\mu$ is computable,
and is a power series in $\alpha_s(\mu)$ itself,

\[ \mu \frac{d\alpha_s(\mu)}{d\mu} = -b_0 \frac{\alpha_s^2(\mu)}{2\pi} - b_1 \frac{\alpha_s^3(\mu)}{8\pi^2} + \cdots \qquad [11] \]

where $b_0 = 11 - 2n_f/3$ and $b_1 = 2(51 - 19n_f/3)$. The
celebrated minus signs on the right-hand side are
associated with both the spin and self-interactions of
the gluons.

The solution to this equation provides an expression
for $\alpha_s$ at any scale $\mu_1$ in terms of its value at
any other scale $\mu_0$. Keeping only the lowest-order,
$b_0$, term, we have

\[ \alpha_s(\mu_1) = \frac{\alpha_s(\mu_0)}{1 + \frac{b_0}{4\pi}\,\alpha_s(\mu_0) \ln(\mu_1^2/\mu_0^2)} = \frac{4\pi}{b_0 \ln(\mu_1^2/\Lambda_{\rm QCD}^2)} \qquad [12] \]

where in the second form, we have introduced $\Lambda_{\rm QCD}$,
the scale parameter of the theory, which embodies
the condition that we get the same coupling at scale
$\mu_1$ no matter which scale $\mu_0$ we start from.
Asymptotic freedom consists of the observation that
at larger renormalization masses $\mu$, or correspondingly
shorter timescales, the coupling weakens, and
indeed vanishes in the limit $\mu \to \infty$. The other side of
the coin is that over longer times or lower momenta,
the coupling grows. Eventually, near the pole at
$\mu_1 = \Lambda_{\rm QCD}$, the lowest-order approximation to the
running fails, and the theory becomes essentially
nonperturbative. Thus, the discovery of asymptotic
freedom suggested, although it certainly does not
prove, that QCD is capable of producing very strong
forces, and confinement at long distances. Current
estimates of $\Lambda_{\rm QCD}$ are $\sim 200$ MeV.
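
The lowest-order running can be made concrete with a few lines of code. This is a sketch under stated assumptions: it keeps only the $b_0$ term, holds the number of active flavors fixed at $n_f = 5$ at all scales, and takes $m_Z = 91.19$ GeV together with the quoted $\alpha_s(m_Z) \approx 0.12$ as input. The function names are illustrative only.

```python
import math


def b0(n_f):
    """Lowest-order beta-function coefficient, b0 = 11 - 2*n_f/3."""
    return 11.0 - 2.0 * n_f / 3.0


def alpha_s_from_reference(mu, alpha_ref, mu_ref, n_f):
    """First form: evolve the coupling from mu_ref to mu at lowest order."""
    log = math.log(mu ** 2 / mu_ref ** 2)
    return alpha_ref / (1.0 + b0(n_f) * alpha_ref * log / (4.0 * math.pi))


def lambda_qcd(alpha_ref, mu_ref, n_f):
    """Scale parameter implied by alpha_s(mu_ref) = alpha_ref at lowest order."""
    return mu_ref * math.exp(-2.0 * math.pi / (b0(n_f) * alpha_ref))


def alpha_s_from_lambda(mu, lam, n_f):
    """Second form: alpha_s = 4*pi / (b0 * ln(mu^2 / Lambda^2))."""
    return 4.0 * math.pi / (b0(n_f) * math.log(mu ** 2 / lam ** 2))


M_Z = 91.19      # GeV, assumed input
ALPHA_MZ = 0.12  # value quoted in the text
N_F = 5          # assumed fixed number of active flavors

lam = lambda_qcd(ALPHA_MZ, M_Z, N_F)
print("one-loop Lambda (GeV):", lam)
for mu in (2.0, 10.0, M_Z, 1000.0):
    first = alpha_s_from_reference(mu, ALPHA_MZ, M_Z, N_F)
    second = alpha_s_from_lambda(mu, lam, N_F)
    print(mu, first, second)
```

The two forms agree at every scale, and the coupling falls as the scale increases. With these simplified inputs the scale parameter comes out of order 0.1 GeV; its precise value depends on the order of the approximation and on how flavor thresholds are treated, so it need not coincide with the estimate quoted above.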


Spontaneous Breaking of Chiral Symmetry 


The number of quarks and their masses is an external 
input to QCD. In the standard model masses are 
provided by the Higgs mechanism, but in QCD they 
are simply parameters. Because the standard model 
has chosen several of the quarks to be especially light, 
QCD incorporates the chiral symmetries implied by 
eqn [4] (with P = 1). In the limit of zero quark
masses, these symmetries become exact, respected to
all orders of perturbation theory, that is, for any
finite number of gluons emitted or absorbed.

At distances of the order of $1/\Lambda_{\rm QCD}$, however,
QCD cannot respect chiral symmetry, which would 
require each state to have a degenerate partner with 
the opposite parity, something not seen in nature. 
Rather, QCD produces, nonperturbatively, nonzero 
values for matrix elements that mix right- and left- 
handed fields, such as $\langle 0|\bar u_L u_R|0\rangle$, with u the up-quark
field. Pions are the Goldstone bosons of this symmetry,
and may be thought of as ripples in the chiral
condensate, rotating it locally as they pass along. The
observation that these Goldstone bosons are not
exactly massless is due to the “current” masses of the
quarks, their values in $\mathcal{L}_{\rm QCD}$. The (chiral perturbation
theory) expansion in these light-quark masses
also enables us to estimate them quantitatively:
$1.5 < m_u < 4$ MeV, $4 < m_d < 8$ MeV, and $80 < m_s < 155$ MeV.
These are the light quarks, with masses
smaller than $\Lambda_{\rm QCD}$. (Like $\alpha_s$, the masses are renormalized;
these are quoted from Eidelman (2004) with
$\mu = 2$ GeV.) For comparison, the heavy quarks
have masses $m_c \sim 1$-$1.5$ GeV, $m_b \sim 4$-$4.5$ GeV, and
$m_t \sim 180$ GeV (the giant among the known elementary
particles). 

Although the mechanism of the chiral condensate 
(and in general other nonperturbative aspects of 
QCD) has not yet been demonstrated from first
principles, a very satisfactory description of the origin
of the condensate, and indeed of much hadronic
structure, has been given in terms of the attractive
forces between quarks provided by instantons. The
actions of instanton solutions provide a dependence
$\exp[-8\pi^2/g^2]$ in Euclidean path integrals, and so are
characteristically nonperturbative. 


Mechanisms of Confinement 


As described above, confinement is the absence of 
asymptotic states that transform nontrivially under 
color transformations. The full spectrum of QCD, 
however, is a complex thing to study, and so the 
problem has been approached somewhat indirectly. A 
difficulty is the same light-quark masses associated 
with approximate chiral symmetry. Because the masses 
of the light quarks are far below the scale $\Lambda_{\rm QCD}$ at
which the perturbative coupling blows up, light quarks 
are created freely from the vacuum and the process of 
“hadronization,” by which quarks and gluons form 
mesons and baryons, is both nonperturbative and 
relativistic. It is therefore difficult to approach in both 
perturbation theory and lattice simulations. 

Tests and studies of confinement are thus normally 
formulated in truncations of QCD, typically with no 
light quarks. The question is then reformulated in a 
way that is somewhat more tractable, without 
relativistic light quarks popping in and out of the 
vacuum all the time. In the limit that its mass becomes 
infinite compared to the natural scale of fluctuations in 
the QCD vacuum, the propagator of a quark becomes 
identical to a phase operator, [8], with a path C 
corresponding to a constant velocity. This observation 
suggests a number of tests for confinement that can be 
implemented in the lattice theory. The most intuitive is 
the vacuum expectation value of a “Wilson loop,” 
consisting of a rectangular path, with sides along the 
time direction, corresponding to a heavy quark and 
antiquark at rest a distance R apart, and closed at some 
starting and ending times with straight lines. The 
vacuum expectation value of the loop then turns out to 
be the exponential of the potential energy between the 
quark pair, multiplied by the elapsed time, 


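\[ \left\langle 0 \left| P \exp\left[ -i g_s \oint_C A_\mu(x)\, dx^\mu \right] \right| 0 \right\rangle = \exp\left( -V(R)\,T/\hbar \right) \qquad [13] \]

When $V(R) \propto R$ (“area law” behavior), there is a
linearly rising, confining potential. This behavior,
not yet proven analytically but well confirmed on the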
lattice, has an appealing interpretation as the energy 
of a “string,” connecting the quark and antiquark, 
whose energy is proportional to its length. 





Motivation for such a string picture was also 
found from the hadron spectrum itself, before any of 
the heavy quarks were known, and even before the 
discovery of QCD, from the observation that many 
mesonic ($q\bar q'$) states lie along “Regge trajectories,”
which consist of sets of states of spin J and mass $m_J$
that obey a relation

\[ J = \alpha'\, m_J^2 \qquad [14] \]

for some constant $\alpha'$. Such a relation can be modeled
by two light particles (“quarks”) revolving around each
other at some constant (for simplicity, fixed nonrelativistic)
velocity $v_0$ and distance 2R, connected by a
“string” whose energy per unit length is a constant $\rho$.

Suppose the center of the string is stationary, so
the overall system is at rest. Then neglecting the
masses, the total energy of the system is $M = 2R\rho$.
Meanwhile, the momentum density per unit length
at distance r from the center is $\rho\,v(r)$, with $v(r) = (r/R)v_0$, and
the total angular momentum of the system is

\[ J = \frac{2\rho v_0}{R} \int_0^R dr\, r^2 = \frac{2\rho v_0}{3} R^2 = \frac{v_0}{6\rho} M^2 \qquad [15] \]

and for such a system, [14] is indeed satisfied.
Quantized values of angular momentum J give 
quantized masses mj, and we might take this as a 
sort of “Bohr model” for a meson. Indeed, string 
theory has its origin in related considerations in the
strong interactions. 
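
The rotating-string estimate is easy to verify numerically. The sketch below assumes the toy model exactly as stated (massless ends, energy per unit length $\rho$, velocity profile $(r/R)v_0$); the parameter values are arbitrary and purely illustrative. It integrates the angular momentum with a midpoint rule and checks that the ratio $J/M^2$ is the same for every string length, which is the content of the Regge relation.

```python
def string_model(rho, v0, radius, n_steps=100000):
    """Return (mass, angular momentum) for a string of half-length `radius`.

    The angular momentum is J = 2 * integral_0^R dr r * rho * v(r),
    with v(r) = (r / R) * v0, evaluated by the midpoint rule.
    """
    mass = 2.0 * radius * rho
    dr = radius / n_steps
    j = 0.0
    for k in range(n_steps):
        r = (k + 0.5) * dr
        j += 2.0 * r * rho * (r / radius) * v0 * dr
    return mass, j


RHO = 0.9  # energy per unit length, arbitrary units (illustrative)
V0 = 0.5   # end-point velocity, arbitrary units (illustrative)

slopes = []
for radius in (0.5, 1.0, 2.0):
    mass, j = string_model(RHO, V0, radius)
    slopes.append(j / mass ** 2)
    print(radius, mass, j, j / mass ** 2)

print("closed form v0 / (6 rho):", V0 / (6.0 * RHO))
```

The ratio $J/M^2$ printed for each length matches $v_0/(6\rho)$, so in this model the slope $\alpha'$ is fixed by the string tension and the end-point velocity alone.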

Lattice data are unequivocal on the linearly rising 
potential, but it requires further analysis to take a 
lattice result and determine what field configura- 
tions, stringlike or not, gave that result. Probably the 
most widely accepted explanation is in terms of an 
analogy to the Meissner effect in superconductivity, 
in which type II superconductors isolate magnetic 
flux in quantized tubes, the result of the formation 
of a condensate of Cooper pairs of electrons. If the 
strings of QCD are to be made of the gauge field,
they must be electric ($F^{\mu 0}$) in nature to couple to
quarks, so the analogy postulates a “dual” Meissner
effect, in which electric flux is isolated as the result
of a condensate of objects with magnetic charge
(producing nonzero $F^{ij}$). Although no proof of this
mechanism has been provided yet, the role of
magnetic fluctuations in confinement has been
widely investigated in lattice simulations, with
encouraging results. Of special interest are magnetic
field configurations, monopoles or vortices, in the
$Z_3$ center of SU(3), $\exp[2\pi i k/3]\, I_{3\times 3}$, $k = 0, 1, 2$. Such
configurations, even when localized, influence
closed gauge loops [13] through the nonabelian
Aharonov-Bohm effect. Eventually, of course, the
role of light quarks must be crucial for any complete
description of confinement in the real world, as
emphasized by Gribov.

Another related choice of closed loop is the 
“Polyakov loop,” implemented at finite temperature, 
for which the path integral is taken over periodic 
field configurations with period 1/T, where T is the 
temperature. In this case, the curve C extends from 
times t=0 to t=1/T at a fixed point in space. In 
this formulation it is possible to observe a phase 
transition from a confined phase, where the expec- 
tation is zero, to a deconfined phase, where it is 
nonzero. This phase transition is currently under 
intense experimental study in nuclear collisions. 


Using Asymptotic Freedom: 
Perturbative QCD 


It is not entirely obvious how to use asymptotic 
freedom in a theory that should (must) have 
confinement. Such applications of asymptotic free- 
dom go by the term perturbative QCD, which has 
many applications, not the least as a window to 
extensions of the standard model. 


Lepton Annihilation and Infrared Safety 


The electromagnetic current, $J_\mu = \sum_f e_f\, \bar q_f \gamma_\mu q_f$, is a
gauge-invariant operator, and its correlation functions
are not limited by confinement. Perhaps the simplest
application of asymptotic freedom, yet of great
physical relevance, is the scalar two-point function,

\[ \pi(Q) = \frac{-i}{3Q^2} \int d^4x\, e^{-iQ\cdot x}\, \langle 0|\,T\, J^\mu(0) J_\mu(x)\,|0\rangle \qquad [16] \]

The imaginary part of this function is related to the
total cross section for the annihilation process $e^+e^- \to$
hadrons in the approximation that only one photon
takes part in the reaction. The specific relation is
$\sigma_{\rm QCD} = (e^4/Q^2)\, {\rm Im}\, \pi(Q^2)$, which follows from the
optical theorem, illustrated in Figure 1.

Figure 1 First line: schematic relation of lowest order $e^+e^-$
annihilation to sum over quarks q, each with electric charge $e_q$.
Second line: perturbative unitarity for the current correlation
function $\pi(Q)$.

The perturbative expansion of the function $\pi(Q)$ depends, in
general, on the mass scales Q and the quark masses
$m_f$, as well as on the strong coupling $\alpha_s(\mu)$ and on the
renormalization scale $\mu$. We may also worry about the
influence of other, truly nonperturbative scales,
proportional to powers of $\Lambda_{\rm QCD}$. At large values of
$Q^2$, however, the situation simplifies greatly, and
dependence on all scales below Q is suppressed by
powers of Q. This may be expressed in terms of the
operator product expansion,


\[ \langle 0|\,T\left( J^\mu(0) J_\mu(x) \right)|0\rangle = \sum_{O_I} \left( x^2 \right)^{-3 + d_I/2}\, C_I\left( x^2\mu^2, \alpha_s(\mu) \right) \langle 0|\,O_I(0)\,|0\rangle \qquad [17] \]

where $d_I$ is the mass dimension of operator $O_I$, and
where the dimensionless coefficient functions $C_I$
incorporate quantum corrections. The sum over
operators begins with the identity ($d_I = 0$), whose
coefficient function is identified with the sum of
quantum corrections in the approximation of zero
masses. The sum continues with quark mass corrections,
which are suppressed by powers of at least
$m_f^2/Q^2$, for those flavors with masses below Q. Any
QCD quantity that has this property, remaining
finite in perturbation theory when all particle masses
are set to zero, is said to be “infrared safe.”

The effects of quarks whose masses are above Q
are included indirectly, through the couplings and
masses observed at the lower scales. In summary,
the leading power behavior of $\pi(Q)$, and hence of
the cross section, is a function of Q, $\mu$, and $\alpha_s(\mu)$
only. Higher-order operators whose vacuum matrix
elements receive nonperturbative corrections include
the “gluon condensate,” identified as the product
$\alpha_s(\mu)\, G_{\mu\nu} G^{\mu\nu} \propto \Lambda_{\rm QCD}^4$.

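Once we have concluded that Q is the only
physical scale in $\pi$, we may expect that the right
choice of the renormalization scale is $\mu = Q$. Any
observable quantity is independent of the choice of
renormalization scale, $\mu$, and neglecting quark
masses, the chain rule gives

\[ \mu \frac{d\sigma(Q/\mu, \alpha_s(\mu))}{d\mu} = \mu \frac{\partial\sigma}{\partial\mu} + \beta(\alpha_s) \frac{\partial\sigma}{\partial\alpha_s} = 0 \qquad [18] \]

with $\beta(\alpha_s) \equiv \mu\, d\alpha_s/d\mu$ the beta function of eqn [11],
which shows that we can determine the beta
function directly from the perturbative expansion
of the cross section. Defining $a \equiv \alpha_s(\mu)/\pi$, such a
perturbative calculation gives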


\[ {\rm Im}\,\pi(Q^2) = \frac{3}{12\pi} \sum_f e_f^2 \left( 1 + a + a^2 \left( 1.986 - 0.115\, n_f - \frac{b_0}{4} \ln\frac{Q^2}{\mu^2} \right) \right) \qquad [19] \]

with $b_0$ as above. Now, choosing $\mu = Q$, we see that
asymptotic freedom implies that when Q is large,
the total cross section is given by the lowest order,
plus small and calculable QCD corrections, a result
that is borne out in experiment. Comparing experiment
to an expression like [19], one can measure the
value of $\alpha_s(Q)$, and hence, with eqn [12], $\alpha_s(\mu)$ for
any $\mu > \Lambda_{\rm QCD}$. Figure 2 shows a recent compilation
of values of $\alpha_s$ from this kind of analysis in different
experiments at different scales, clearly demonstrating
asymptotic freedom.

Figure 2 Experimental variation of the strong coupling with
scales. Reproduced from Bethke S (2004) Alpha(s) at Zinnowitz.
Nuclear Physics Proceedings Supplements 135: 345-352, with
permission from Elsevier.
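
As a rough numerical illustration of how small the QCD corrections are, the sketch below evaluates the annihilation cross section relative to its lowest-order value, with the renormalization scale set equal to Q. Several ingredients are assumptions brought in for the example rather than taken from the text: the result is expressed as the conventional ratio to muon-pair production, normalized to three colors times the sum of squared quark charges, and the $n_f$-dependent part of the second-order coefficient is the standard value $-0.115\,n_f$. The function and variable names are hypothetical.

```python
import math

# Electric charges of the five lightest quark flavors, in units of e.
QUARK_CHARGES = {"u": 2.0 / 3.0, "d": -1.0 / 3.0, "s": -1.0 / 3.0,
                 "c": 2.0 / 3.0, "b": -1.0 / 3.0}


def r_ratio(alpha_s, flavors):
    """Hadronic annihilation rate relative to mu+ mu- production,
    through second order in a = alpha_s / pi, with mu = Q."""
    n_f = len(flavors)
    a = alpha_s / math.pi
    charge_sum = sum(QUARK_CHARGES[f] ** 2 for f in flavors)
    correction = 1.0 + a + a * a * (1.986 - 0.115 * n_f)
    return 3.0 * charge_sum * correction


FLAVORS = ("u", "d", "s", "c", "b")
lowest_order = r_ratio(0.0, FLAVORS)
with_qcd = r_ratio(0.12, FLAVORS)

print("lowest order:", lowest_order)
print("with QCD corrections:", with_qcd)
print("relative correction:", with_qcd / lowest_order - 1.0)
```

With a coupling near 0.12 the correction is a few percent, which is why a measured cross section can be inverted to extract the coupling at the scale of the experiment.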


Factorization, Scaling, and Parton distributions 


One step beyond vacuum matrix elements of currents 
are their expectation values in single-particle states, 
and here we make contact with the discovery of 
QCD, through scaling. Such expectations are relevant 
to the class of experiments known as deep-inelastic 
scattering, in which a high-energy electron exchanges 
a photon with a nucleon target. All QCD information 
is contained in the tensor matrix element 


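\[ W_N^{\mu\nu}(p,q) = \frac{1}{8\pi} \sum_\sigma \int d^4x\, e^{-iq\cdot x}\, \langle p,\sigma|\,J^\mu(0) J^\nu(x)\,|p,\sigma\rangle \qquad [20] \]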


with q the momentum transfer carried by the
photon, and p, $\sigma$ the momentum and spin of the
target nucleon, N. This matrix element is not
infrared safe, since it depends in principle on the
entire history of the nucleon state. Thus, it is not
accessible to direct perturbative calculation.
Nevertheless, when the scattering involves a large
momentum transfer compared to $\Lambda_{\rm QCD}$, we may
expect a quantum-mechanical incoherence between
the scattering reaction, which occurs (by the uncertainty
principle) at short distances, and the forces that
stabilize the nucleon. After all, we have seen that the
latter, strong forces, should be associated with long
distances. Such a separation of dynamics, called
factorization, can be implemented in perturbation
theory, and is assumed to be a property of full QCD.
Factorization is illustrated schematically in Figure 3.
Of course, short and long distances are relative
concepts, and the separation requires the introduction
of a so-called factorization scale, $\mu_F$, not dissimilar to
the renormalization scale described above. For many
purposes, it is convenient to choose the two equal,
although this is not required.

Figure 3 Schematic depiction of factorization in deep-inelastic
scattering.

The expression of factorization for deep-inelastic 
scattering is 


\[ W_N^{\mu\nu}(p,q) = \sum_{i = q_f, \bar q_f, G} \int_x^1 d\xi\; C_i^{\mu\nu}\left( \xi p, q, \mu_F, \alpha_s(\mu_F) \right) f_{i/N}(\xi, \mu_F) \qquad [21] \]

where the functions $C_i^{\mu\nu}$ (the coefficient functions)
can be computed as an expansion in $\alpha_s(\mu_F)$, and
describe the scattering of the “partons,” quarks and
gluons, of which the target is made. The variable $\xi$
ranges from unity down to $x \equiv -q^2/2p\cdot q > 0$, and
has the interpretation of the fractional momentum
of the proton carried by parton i. (Here $-q^2 = Q^2$ is
positive.) The parton distributions $f_{i/N}$ can be
defined in terms of matrix elements in the nucleon,
in which the currents are replaced by quark (or
antiquark or gluon) fields, as


\[ f_{q/N}(x, \mu_F) = \frac{1}{4\pi} \int_{-\infty}^{\infty} d\lambda\; e^{-i\lambda x\, p\cdot n}\, \langle p,\sigma|\, \bar q(\lambda n)\, U_n(n\lambda, 0)\, n\cdot\gamma\, q(0)\, |p,\sigma\rangle \qquad [22] \]

Here n is a light-like vector, and $U_n$ a phase operator
whose path C is in the n-direction. The dependence
of the parton distribution on the factorization scale
is through the renormalization of the composite
operator consisting of the quark fields, separated
along the light cone, and the nonabelian phase
operator $U_n(n\lambda, 0)$, which renders the matrix element
gauge invariant by eqn [9]. By combining the
calculations of the C's and data for $W_N^{\mu\nu}$, we can
infer the parton distributions, $f_{i/N}$. Important factorizations
of a similar sort also apply to some
exclusive processes, including amplitudes for elastic
pion or nucleon scattering at large momentum
transfer.

Equation [21] has a number of extraordinary
consequences. First, because the coefficient function
is an expansion in $\alpha_s$, it is natural to choose
$\mu_F^2 \sim Q^2 \sim p\cdot q$ (when x is of order unity). When Q is
large, we may approximate $C_i^{\mu\nu}$ by its lowest order,
which is first order in the electromagnetic coupling
of quarks to photons, and zeroth order in $\alpha_s$. In this
approximation, dependence on Q is entirely in the
parton distributions. But such dependence is of
necessity weak (again for x not so small as to
produce another scale), because the $\mu_F$ dependence
of $f_{i/N}(\xi, \mu_F)$ must be compensated by the $\mu_F$
dependence of $C_i^{\mu\nu}$, which is order $\alpha_s$. This means
that the overall Q dependence of the tensor $W_N^{\mu\nu}$ is
scaling phenomenon that played such an important 
role in the discovery of QCD. 


Evolution: Beyond Scaling 


Another consequence of the factorization [21], or
equivalently of the operator definition [22], is that
the $\mu_F$-dependence of the coefficient functions and
the parton distributions are linked. As in the lepton
annihilation cross section, this may be thought of as
due to the independence of the physically observable
tensor $W_N^{\mu\nu}$ from the choice of factorization and
renormalization scales. This implies that the
$\mu_F$-dependence of $f_{i/N}$ may be calculated perturbatively
since it must cancel the corresponding
dependence in $C_i^{\mu\nu}$. The resulting relation is conventionally
expressed in terms of the “evolution
equations,”


\[ \mu_F \frac{d f_{a/N}(x, \mu_F)}{d\mu_F} = \sum_c \int_x^1 \frac{d\xi}{\xi}\, P_{ac}\left( x/\xi, \alpha_s(\mu_F) \right) f_{c/N}(\xi, \mu_F) \qquad [23] \]

where $P_{ac}(\xi)$ are calculable as power series, now
known up to $\alpha_s^3$. This relation expands the applicability
of QCD from scales where parton
distributions can be inferred directly from experiment,
to arbitrarily high scales, reachable in accelerators
under construction or in the imagination, or
even on the cosmic level.

At very high energy, however, the effective values 
of the variable x can become very small and 
introduce new scales, so that eventually the evolu- 
tion of eqn [23] fails. The study of nuclear collisions 
may provide a new high-density regime for QCD, 
which blurs the distinction between perturbative and 
nonperturbative dynamics. 


Inclusive Production 


Once we have evolution at our disposal, we can take 
yet another step, and replace electroweak currents 
with any operator from any extension of QCD, in 
the standard model or beyond, that couples quarks 
and gluons to the particles of as-yet unseen fields. 
Factorization can be extended to these situations as 
well, providing predictions for the production of 
new particles, F of mass M, in the form of factorized 
inclusive cross sections, 


\[ \sigma_{AB\to F(M)}(M, p_A, p_B) = \sum_{i,j} \int d\xi_a\, d\xi_b\; f_{i/A}(\xi_a, \mu)\, f_{j/B}(\xi_b, \mu)\; H_{ij\to F(M)}\left( \xi_a p_A, \xi_b p_B, M, \mu, \alpha_s(\mu) \right) \qquad [24] \]

where the functions $H_{ij\to F}$ may be calculated
perturbatively, while the $f_{i/A}$ and $f_{j/B}$ parton
distributions are known from a combination of
lower-energy observation and evolution. In this
context, they are said to be “universal,” in that
they are the same functions in hadron-hadron
collisions as in the electron-hadron collisions of
deep-inelastic scattering. In general, the calculation
of hard-scattering functions $H_{ij}$ is quite nontrivial
beyond lowest order in $\alpha_s$. The exploration of
methods to compute higher orders, currently as far
as $\alpha_s^2$, has required extraordinary insight into the
properties of multidimensional integrals. 

The factorization method helped predict the 
observation of the W and Z bosons of electroweak 
theory, and the discovery of the top quark. The 
extension of factorization from deep-inelastic scat- 
tering to hadron production is nontrivial; indeed, it 
only holds in the limit that the velocities, $\beta_i$, of the
colliding particles approach the speed of light in the
center-of-momentum frame of the produced particle.
Corrections to the relation [24] are then at the level
of powers of $\beta_i - 1$, which translates into inverse
powers of the invariant mass(es) of the produced
particle(s) M. Factorizations of this sort do not
apply to low-velocity collisions. Arguments for this
result rely on relativistic causality and the uncertainty
principle. The creation of the new state
happens over timescales of order 1/M. Before that 
well-defined event, the colliding particles are 
approaching at nearly the speed of light, and hence 
cannot affect the distributions of each others’ 
partons. After the new particle is created, the 
fragments of the hadrons recede from each other, 
and the subsequent time development, when 
summed over all possible final states that include 
the heavy particle, is finite in perturbation theory as 
a direct result of the unitarity of QCD. 


Structure of Hadronic Final States 


A wide range of semi-inclusive cross sections are 
defined by measuring properties of final states that 
depend only on the flow of energy, and which bring 
QCD perturbation theory to the threshold of 
nonperturbative dynamics. Schematically, for a 
state $N = |k_1 \ldots k_n\rangle$, we define $S(N) = \sum_i s(\Omega_i)\, k_i^0$,
where $s(\Omega)$ is some smooth function of directions.
We generalize the $e^+e^-$ annihilation case above, and
define a cross section in terms of a related, but
highly nonlocal, matrix element,

\[ \frac{d\sigma(Q)}{dS} \equiv \sigma_0 \int d^4x\, e^{-iQ\cdot x}\, \langle 0|\, J^\mu(0)\, \delta\left( \int d^2\Omega\; s(\Omega)\, \mathcal{E}(\Omega) - S \right) J_\mu(x)\, |0\rangle \qquad [25] \]

where $\sigma_0$ is a zeroth-order cross section, and where
$\mathcal{E}$ is an operator at spatial infinity, which measures
the energy flow of any state in direction $\Omega$:
$\mathcal{E}(\Omega)\,|k_1 \ldots k_n\rangle = (1/Q) \sum_i k_i^0\, \delta^2(\Omega - \Omega_i)\, |k_1 \ldots k_n\rangle$. This may seem a
little complicated, but like the total annihilation cross
section, the only dimensional scale on which it
depends is Q. The operator $\mathcal{E}$ can be defined in a
gauge-invariant manner, through the energy-momentum
tensor for example, and has a meaning independent
of partonic final states. At the same time, this
sort of cross section may be implemented easily in
perturbation theory, and like the total annihilation
cross section, it is infrared safe. To see why, notice
that when a massless ($k^2 = 0$) particle decays into two
particles of momenta xk and (1 - x)k (0 < x < 1), the
quantity S is unchanged, since the sum of the new 
energies is the same as the old. This makes the 
observable S(N) insensitive to processes at low 
momentum transfer. 
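
The insensitivity of such energy-flow observables to collinear splitting and to soft emission can be seen in a small numerical experiment. This is only a sketch: particles are taken to be massless and are represented by an energy and a polar angle, the weight function is an arbitrary smooth choice, and all names are hypothetical.

```python
import math


def direction_weight(theta):
    """An illustrative smooth function of direction (polar angle only)."""
    return math.sin(theta) ** 2


def energy_flow_observable(particles):
    """Sum over particles of weight(direction) times energy.

    Each particle is a tuple (energy, theta) and is treated as massless.
    """
    return sum(direction_weight(theta) * energy for energy, theta in particles)


def collinear_split(particles, index, x):
    """Replace one particle by two collinear fragments carrying
    fractions x and 1 - x of its energy in the same direction."""
    energy, theta = particles[index]
    fragments = [(x * energy, theta), ((1.0 - x) * energy, theta)]
    return particles[:index] + fragments + particles[index + 1:]


# A toy three-particle final state: (energy in GeV, polar angle in radians).
event = [(45.0, 0.3), (40.0, 2.9), (6.0, 1.4)]

s_before = energy_flow_observable(event)
s_collinear = energy_flow_observable(collinear_split(event, 0, 0.37))
s_soft = energy_flow_observable(event + [(1e-9, 1.0)])

print("original:", s_before)
print("after collinear split:", s_collinear)
print("after adding a soft particle:", s_soft)
```

The collinear split leaves the value unchanged up to rounding, and a particle of vanishing energy changes it by a vanishing amount. An observable that counted particles instead of weighting them by energy would fail both checks.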

For the case of leptonic annihilation, the lowest-order
perturbative contribution to energy flow
requires no powers of $\alpha_s$, and consists of an
oppositely moving quark and antiquark pair. Any
measure of energy flow that includes these configurations
will dominate over correlations that require
$\alpha_s$ corrections. As a result, QCD predicts that in
most leptonic annihilation events, energy will flow 
in two back-to-back collimated sets of particles, 
known as “jets.” In this way, quarks and gluons are 
observed clearly, albeit indirectly. 

With varying choices of S, many properties of 
jets, such as their distributions in invariant mass, 
and the probabilities and angular distributions of 
multijet events, and even the energy dependence of 
their particle multiplicities, can be computed in 
QCD. This is in part because hadronization is 
dominated by the production of light quarks, 
whose production from the vacuum requires very 
little momentum transfer. Paradoxically, the very 
lightness of quarks is a boon to the use of 
perturbative methods. All these considerations can 
be extended to hadronic scattering, and jet and other 
semi-inclusive properties of final states also com- 
puted and compared to experiment. 


Conclusions 


QCD is an extremely broad field, and this article has 
hardly scratched the surface. The relation of QCD- 
like theories to supersymmetric and string theories, 
and implications of the latter for confinement and 
the computation of higher-order perturbative ampli- 
tudes, have been some of the most exciting devel- 
opments of recent years. As another example, we 
note that the reduction of the heavy-quark propa- 
gator to a nonabelian phase, noted in our discussion 
of confinement, is related to additional symmetries 
of heavy quarks in QCD, with many consequences 
for the analysis of their bound states. Of the 
bibliography given below, one may mention the 
four volumes of Shifman (2001, 2002), which 
communicate in one place a sense of the sweep of 
work in QCD. 

Our confidence in QCD as the correct description of 
the strong interactions is based on a wide variety of 
experimental and observational results. At each stage in 
the discovery, confirmation, and exploration of QCD, 
the mathematical analysis of relativistic quantum field 
theory entered new territory. As is the case for gravity or 
electromagnetism, this period of exploration is far from 
complete, and perhaps never will be. 
Introduction


Classical gravity, through its attractive nature, leads 
to a high curvature in important situations. In 
particular, this is realized in the very early universe 
where in the backward evolution energy densities 
are growing until the theory breaks down. Mathe- 
matically, this point appears as a singularity where 
curvature and physical quantities diverge and the 
evolution breaks down. It is not possible to set up an 
initial-value formulation at this place in order to 
determine the further evolution. 

In such a regime, quantum effects are expected to 
play an important role and to modify the classical 
behavior such as the attractive nature of gravity or the 
underlying spacetime structure. Any candidate for 
quantum gravity thus allows us to reanalyze the
singularity problem in a new light, which in turn
tests the characteristic properties of the respective
candidate. Moreover, close to the classical
singularity, in the very early universe, quantum
modifications will give rise to new equations of 
motion which turn into Einstein’s equations only on 
larger scales. The analysis of these equations of 
motion leads to new classes of early universe 
phenomenology. 

The application of quantum theory to cosmology 
presents a unique problem with not only mathema- 
tical but also many conceptual and philosophical 
ramifications. Since by definition there is only one 
universe which contains everything accessible, there 
is no place for an outside observer separate from the 
quantum system. This eliminates the most straight- 
forward interpretations of quantum mechanics and 
requires more elaborate, and sometimes also more 
realistic, constructions such as decoherence. From 
the mathematical point of view, this situation is 
often expected to be mirrored by a new type of 
theory which does not allow one to choose initial or 
boundary conditions separately from the dynamical 
laws. Initial or boundary conditions, after all, are 
meant to specify the physical system prepared for 
observations which is impossible in cosmology. 
Since we observe only one universe, the expectation 
goes, our theories should finally present us with only 
one, unique solution without any freedom for
further conditions. This solution then contains all 
the information about observations as well as 
observers. Mathematically, this is an extremely 
complicated problem which has received only scant 
attention. Equations of motion for quantum cosmol- 
ogy are usually of the type of partial differential or 
difference equations such that new ingredients from 
quantum gravity are needed to restrict the large 
freedom of solutions. 


Minisuperspace approximation 


In most investigations, the problem of applying full 
quantum gravity to cosmology is simplified by a 
symmetry reduction to homogeneous or isotropic 
geometries. Originally, the reduction was performed 
at the classical level, leaving in the isotropic case 
only one gravitational degree of freedom given by 
the scale factor $a$. Together with homogeneous
matter fields, such as a scalar $\phi$, there are then
only finitely many degrees of freedom which one can
quantize using quantum mechanics. The classical
Friedmann equation for the evolution of the scale
factor, depending on the spatial curvature $k = 0$ or
$\pm 1$, is then quantized to the Wheeler-DeWitt
equation, commonly written as

$$\left(\frac{1}{9}\ell_P^4\, a^{-x}\frac{\partial}{\partial a}a^{x}\frac{\partial}{\partial a} - k a^2\right)\psi(a,\phi) = -\frac{8\pi G}{3}\, a H_{\mathrm{matter}}(a)\,\psi(a,\phi) \qquad [1]$$

for the wave function $\psi(a,\phi)$. The matter Hamiltonian
$H_{\mathrm{matter}}(a)$, such as

$$H_{\mathrm{matter}}(a) = -\frac{1}{2}\hbar^2 a^{-3}\frac{\partial^2}{\partial\phi^2} + a^3 V(\phi) \qquad [2]$$

is left unspecified here, and $x$ parametrizes factor
ordering ambiguities (but not completely). The
Planck length $\ell_P = \sqrt{8\pi G\hbar}$ is defined in terms of
the gravitational constant $G$ and the Planck
constant $\hbar$.

The central conceptual issue then is the generality 
of effects seen in such a symmetric model and its 
relation to the full theory of quantum gravity. This 
is completely open in the Wheeler-DeWitt form 
since the full theory itself is not even known. On the 
other hand, such relations are necessary to value any 
potential physical statement about the origin and 
early history of the universe. In this context, 
symmetric situations thus present models, and the 
degree to which they approximate full quantum 
gravity remains mostly unknown. There are examples,
for instance, of isotropic models in anisotropic
but still homogeneous models, where a minisuperspace
quantization does not agree at all with the
information obtained from the less symmetric 
model. However, often those effects already have a 
classical analog such as instability of the more 
symmetric solutions. A wider investigation of the 
reliability of models and when correction terms 
from ignored degrees of freedom have to be included 
has not been done yet. 

With candidates for quantum gravity being 
available, the current situation has changed to 
some degree. It is then not only possible to reduce 
classically and then simply use quantum 
mechanics, but also perform at least some of the 
reduction steps at the quantum level. The relation 
to models is then much clearer, and consistency 
conditions which arise in the full theory can be 
made certain to be observed. Moreover, relations 
between models and the full theory can be studied 
to elucidate the degree of approximation. Even 
though new techniques are now available, a 
detailed investigation of the degree of approxima- 
tion given by a minisuperspace model has not been 
completed due to its complexity. 

This program has mostly been developed in the 
context of loop quantum gravity, where the specia- 
lization to homogeneous models is known as loop 
quantum cosmology. More specifically, symmetries 
can be introduced at the level of states and basic 
operators, where symmetric states of a model are 
distributions in the full theory, and basic operators 
are obtained by the dual action on those distribu- 
tions. In such a way, the basic representation of 
models is not assumed but derived from the full 
theory where it is subject to much stronger 
consistency conditions. This has implications even 
in homogeneous models with finitely many degrees 
of freedom, despite the fact that quantum mechanics 
is usually based on a unique representation if the
Weyl operators $e^{isq}$ and $e^{itp}$ for the variables $q$ and $p$
are represented weakly continuously in the real
parameters $s$ and $t$.

The continuity condition, however, is not neces- 
sary in general, and so inequivalent representations 
are possible. In quantum cosmology this is indeed 
realized, where the Wheeler-DeWitt representation 
assumes that the conjugate to the scale factor, 
corresponding to extrinsic curvature of an isotropic 
slice, is represented through a continuous Weyl 
operator, while the representation derived for loop 
quantum cosmology shows that the resulting opera- 
tor is not weakly continuous. Furthermore, the scale 
factor has a continuous spectrum in the Wheeler- 
DeWitt representation but a discrete spectrum in the 
loop representation. Thus, the underlying geometry
of space is very different, and also evolution takes
a new form, now given by a difference equation of
the type

$$(V_{\mu+5} - V_{\mu+3})\,e^{ik}\psi_{\mu+4}(\phi) - (2 + k^2)(V_{\mu+1} - V_{\mu-1})\,\psi_\mu(\phi) + (V_{\mu-3} - V_{\mu-5})\,e^{-ik}\psi_{\mu-4}(\phi) = -\frac{4}{3}\pi G\ell_P^2\, H_{\mathrm{matter}}(\mu)\,\psi_\mu(\phi) \qquad [3]$$

in terms of volume eigenvalues $V_\mu = (\ell_P^2|\mu|/6)^{3/2}$.
For large $\mu$ and smooth wave functions, one can see
that the difference equation reduces to the
Wheeler-DeWitt equation with $|\mu| \propto a^2$ to leading
order in derivatives of $\psi$. At small $\mu$, close to the
classical singularity, however, both equations have
very different properties and lead to different 
conclusions. Moreover, the prominent role of 
difference equations leads to new mathematical 
problems. 

This difference equation is not simply obtained 
through a discretization of [1], but derived from a 
constraint operator constructed with methods from 
full loop quantum gravity. It is, thus, to be regarded 
as more fundamental, with [1] emerging in a 
continuum limit. The structure of [3] depends on 
the properties of the full theory such that its 
qualitative analysis allows conclusions for full 
quantum gravity. 


Applications 


Traditionally, quantum cosmology has focused on 
three main conceptual issues: 


• the fate of classical singularities,

• initial conditions and the “prediction” of inflation
(or other early universe scenarios), and

• arrow of time and the emergence of a classical
world.


The first issue consists of several subproblems since 
there are different aspects to a classical singularity. 
Often, curvature or energy densities diverge and one 
can expect quantum gravity to provide a natural 
cutoff. More importantly, however, the classical 
evolution breaks down at a singularity, and quan- 
tum gravity, if it is to cure the singularity problem, 
has to provide a well-defined evolution which does 
not stop. Initial conditions are often seen in relation 
to the singularity problem since early attempts tried 
to replace the singularity by choosing appropriate 
conditions for the wave function at a=0. Different 
proposals then lead to different solutions for the 
wave function, whose dependence on the scalar $\phi$
can be used to determine its probability distribution
such as that for an inflaton. Since initial conditions
often provide special properties early on, the 
combination of evolution and initial conditions has 
been used to find a possible origin of an arrow 
of time. 


Singularities 


While classical gravity is based on spacetime 
geometry and thus metric tensors, this structure is 
viewed as emergent only at large scales in canonical 
quantum gravity. A gravitational system, such as a 
whole universe, is instead described by a wave 
function which, at best, yields expectation values for 
a metric. The singularity problem thus takes a 
different form since it is not metrics which need to 
be continued as solutions to Einstein’s field equa- 
tions but the wave function describing the quantum 
system. In the strong curvature regime around a 
classical singularity, one does not expect classical 
geometry to be applicable, such that classical 
singularities may just be a reflection of the break- 
down of this picture, rather than a breakdown of 
physical evolution. Nevertheless, the basic feature of 
a singularity as presenting a boundary to the 
evolution of a system equally applies to the quantum 
equations. One can thus analyze this issue, using 
new properties provided by the quantum evolution. 

The singularity issue is not resolved in the 
Wheeler—DeWitt formulation since energy densities, 
with a being a multiplication operator, diverge and 
the evolution does not continue anywhere beyond 
the classical singularity at a=0. In some cases one 
can formally extend the evolution to negative a, but 
this possibility is not generic and leaves open what 
negative a means geometrically. This is different in 
the loop quantization: here, the theory is based on 
triad rather than metric variables. There is thus a 
new sign factor corresponding to spatial orientation, 
which implies the possibility of negative $\mu$ in the
difference equation. The equation is then defined on
the full real line with the classical singularity $\mu = 0$
in the interior. Outside $\mu = 0$, we have positive
volume at both sides, and opposite orientations.
Using the difference equation, one can then see that
the evolution does not break down at $\mu = 0$,
showing that the quantum evolution is singularity
free.

For the example [3] shown here, one can follow
the evolution, for instance, backward in internal
time $\mu$, starting from initial values for $\psi$ at large
positive $\mu$. By successively solving for $\psi_{\mu-4}$, the wave
function at lower $\mu$ is determined. This goes on in
this manner only until the coefficient $V_{\mu-3} - V_{\mu-5}$ of
$\psi_{\mu-4}$ vanishes, which is the case if and only if $\mu = 4$.
The value $\psi_0$ of the wave function exactly at the
classical singularity is thus not determined by initial
data, but one can easily see that it completely drops
out of the evolution. In fact, the wave function at all
negative $\mu$ is uniquely determined by initial values at
positive $\mu$. Equation [3] corresponds to one particular
ordering, which in the Wheeler-DeWitt case is
usually parametrized by the parameter x (although 
the particular ordering obtained from the continuum 
limit of [3] is not contained in the special family 
[1]). Other nonsingular orderings exist, such as that 
after symmetrizing the constraint operator, in which 
case the coefficients never become 0. 
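
The decoupling of the value at the classical singularity can be checked directly from the volume eigenvalues. The following is a minimal sketch, assuming units with $\ell_P^2 = 1$ and a lattice of integer multiples of 4; the function names are hypothetical, and the phases $e^{\pm ik}$ are omitted since they do not affect where a coefficient vanishes.

```python
def volume(mu, planck_length_sq=1.0):
    """Volume eigenvalue V_mu = (l_P^2 |mu| / 6)^(3/2)."""
    return (planck_length_sq * abs(mu) / 6.0) ** 1.5

def coeff_up(mu):
    """Coefficient of psi_{mu+4} in the difference equation (phase omitted)."""
    return volume(mu + 5) - volume(mu + 3)

def coeff_mid(mu, k=0):
    """Coefficient of psi_mu in the difference equation."""
    return -(2 + k * k) * (volume(mu + 1) - volume(mu - 1))

def coeff_down(mu):
    """Coefficient of psi_{mu-4} in the difference equation (phase omitted)."""
    return volume(mu - 3) - volume(mu - 5)

# Assumed lattice: mu = -40, -36, ..., 40.
lattice = range(-40, 44, 4)
zeros_down = [mu for mu in lattice if coeff_down(mu) == 0.0]
zeros_mid = [mu for mu in lattice if coeff_mid(mu) == 0.0]
zeros_up = [mu for mu in lattice if coeff_up(mu) == 0.0]
print(zeros_down, zeros_mid, zeros_up)
```

On this lattice the three coefficients vanish only at $\mu = 4$, $\mu = 0$, and $\mu = -4$, respectively, which are exactly the three instances of the equation in which $\psi_0$ would appear. Every occurrence of $\psi_0$ is therefore multiplied by zero.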

In more complicated systems, this behavior is 
highly nontrivial but still known to be realized in a 
similar manner. It is not automatic that the internal
time evolution continues, since even in
isotropic models one can easily write difference
equations for which the evolution breaks down.
That the most natural orderings imply nonsingular 
evolution can be taken as a support of the general 
framework of loop quantum gravity. It should also 
be noted that the mechanism described here, 
providing essentially a new region beyond a classical 
singularity, presents one mechanism for quantum 
gravity to remove classical singularities, and so far 
the only known one. Nevertheless, there is no claim 
that the ingredients have to be realized in any 
nonsingular scenario in the same manner. Different 
scenarios can be imagined, depending on how 
quantum evolution is understood and what the 
interpretation of nonsingular behavior is. It is also 
not claimed that the new region is semiclassical in 
any sense when one looks at it at large volume. If 
the initial values for the wave function describe a 
semiclassical wave packet, its evolution beyond the 
classical singularity can be deformed and develop 
many peaks. What this means for the re-emergence 
of a semiclassical spacetime has to be investigated in 
particular models, and also in the context of 
decoherence. 


Initial Conditions 


Traditional initial conditions in quantum cosmology 
have been introduced by physical intuition. The 
main mathematical problem, once such a condition 
is specified in sufficient detail, then is to study well- 
posedness, for instance, for the Wheeler—DeWitt 
equation. Even formulating initial conditions 
generally, and not just for isotropic models, is 
complicated, and systematic investigations of the 
well-posedness have rarely been undertaken. An 
exception is the historically first such condition, 
due to DeWitt, that the wave function vanishes at
parts of minisuperspace, such as $a = 0$ in the
isotropic case, corresponding to classical singularities.
This condition, unfortunately, can easily be
seen to be ill posed in anisotropic models where in
general the only solution vanishes identically. In
other models, $\lim_{a\to 0}\psi(a)$ does not even exist.
Similar problems of the generality of conditions
arise in other scenarios. Most well known are the
no-boundary and tunneling proposals, where initial
conditions are still imposed at $a = 0$, but with a
nonvanishing wave function there.

This issue is quite different for difference equa- 
tions since at first the setup is less restrictive: there 
are no continuity or differentiability conditions for a 
solution. Moreover, oscillations that become arbitrarily
rapid, which can be responsible for the
nonexistence of $\lim_{a\to 0}\psi(a)$, cannot be supported
on a discrete lattice. It can then easily happen that a
difference equation is well posed, while its continuum
limit with an analogous initial condition is
ill posed. One example is given by the dynamical initial
conditions of loop quantum cosmology, which arise
from the dynamical law in the following way: the
coefficients in [3] are not always nonzero but vanish
if and only if they are multiplied with the value of
the wave function at the classical singularity $\mu = 0$.
This value thus decouples and plays no role in the
evolution. The instance of the difference equation
that would determine $\psi_0$, for example, the equation
for $\mu = 4$ in the backward evolution, instead implies
a condition on the previous two values, $\psi_4$ and $\psi_8$,
in the example. Since they have already been
determined in previous iteration steps, this translates 
to a linear condition on the initial values chosen. We 
thus have one example where indeed initial condi- 
tions and the evolution follow from only one 
dynamical law, which also extends to anisotropic 
models. Without further conditions, the initial-value 
problem is always well posed, but may not be 
complete, in the sense that it results in a unique 
solution up to norm. Most of the solutions, 
however, will be rapidly oscillating. In order to 
guarantee the existence of a continuum approxima- 
tion, one has to add a condition that these 
oscillations are suppressed in large volume regimes. 
Such a condition can be very restrictive, such that 
the issue of well-posedness appears in a new guise: 
nonzero solutions do exist, but in some cases all of 
them may be too strongly oscillating. 
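
How the dynamical law turns into a linear condition on initial values can be sketched for the simplest case. The example below assumes the vacuum equation ($k = 0$, no matter term), units with $\ell_P^2 = 1$, and initial values given at $\mu = 12$ and $\mu = 8$; these choices and the function names are illustrative assumptions, not part of the text.

```python
def volume(mu):
    """Volume eigenvalue in units with l_P^2 = 1."""
    return (abs(mu) / 6.0) ** 1.5

def residual_at_mu4(psi_top, psi_below, mu_top=12):
    """Evolve the vacuum difference equation backward from mu_top and
    return the left-hand side of the mu = 4 equation.

    The psi_0 term is absent because its coefficient V_1 - V_{-1} is zero.
    """
    psi = {mu_top: psi_top, mu_top - 4: psi_below}
    for mu in range(mu_top - 4, 4, -4):            # mu = mu_top - 4, ..., 8
        up = volume(mu + 5) - volume(mu + 3)
        mid = -2.0 * (volume(mu + 1) - volume(mu - 1))
        down = volume(mu - 3) - volume(mu - 5)     # nonzero for mu >= 8
        psi[mu - 4] = -(up * psi[mu + 4] + mid * psi[mu]) / down
    up4 = volume(9) - volume(7)
    mid4 = -2.0 * (volume(5) - volume(3))
    return up4 * psi[8] + mid4 * psi[4]

# The residual is linear in the initial data, so two basis runs suffice.
r10 = residual_at_mu4(1.0, 0.0)
r01 = residual_at_mu4(0.0, 1.0)
ratio = -r10 / r01                  # required psi_8 / psi_12
check = residual_at_mu4(1.0, ratio)
print(ratio, check)
```

Only initial data with this ratio satisfy the $\mu = 4$ equation, so in this toy setting the dynamical law fixes the solution up to an overall factor.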

In simple cases, one can use generating function 
techniques advantageously to study oscillating solu- 
tions, at least if oscillations are of alternating nature 
between two subsequent levels of the difference 
equation. The idea is that a generating function
$G(x) = \sum_n \psi_n x^n$ has a stronger pole at $x = -1$ if $\psi_n$
is alternating compared to a solution of constant
sign. Choosing initial conditions which reduce the
pole order thus implies solutions with suppressed
oscillations. As an example, we can look at the
difference equation

$$\psi_{n+1} + \frac{2}{n}\psi_n - \psi_{n-1} = 0 \qquad [4]$$

whose generating function is

$$G(x) = \frac{\psi_1 x + \psi_0\,(1 + 2x(1 - \log(1 - x)))}{(1 + x)^2} \qquad [5]$$

The pole at $x = -1$ is removed for initial values
$\psi_1 = \psi_0(2\log 2 - 1)$, which corresponds to nonoscillating
solutions. In this way, analytical expressions
can be used instead of numerical attempts which
would be sensitive to rounding errors. Similarly, the
issue of finding bounded solutions can be studied by
continued fraction methods. This illustrates how an
underlying discrete structure leads to new questions
and the application of new techniques compared to
the analysis of partial differential equations which
appear more commonly.
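
The effect of the special initial value can be seen in a short numerical sketch of the recurrence [4]. It iterates only a few steps, since rounding errors eventually reintroduce the oscillating solution, and it compares the special choice with a generic one; the function names and the generic starting value $\psi_1 = 0$ are assumptions made for illustration.

```python
import math

def iterate(psi0, psi1, n_max):
    """Iterate psi_{n+1} = psi_{n-1} - (2/n) psi_n up to index n_max."""
    psi = [psi0, psi1]
    for n in range(1, n_max):
        psi.append(psi[n - 1] - (2.0 / n) * psi[n])
    return psi

def numerator(x, psi0, psi1):
    """Numerator of the generating function [5]."""
    return psi1 * x + psi0 * (1.0 + 2.0 * x * (1.0 - math.log(1.0 - x)))

special = 2.0 * math.log(2.0) - 1.0      # psi_1 / psi_0 that removes the pole
smooth = iterate(1.0, special, 10)
generic = iterate(1.0, 0.0, 10)

print([round(v, 4) for v in smooth])
print([round(v, 4) for v in generic])
print(numerator(-1.0, 1.0, special))
```

With the special initial value the first terms keep a constant sign and decrease slowly, while the generic start alternates in sign with growing magnitude, as expected from a pole of $G(x)$ at $x = -1$.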


More General Models 


Most of the time, homogeneous models have been 
studied in quantum cosmology since even formulat- 
ing the Wheeler-DeWitt equation in inhomogeneous 
cases, the so-called midisuperspace models, is 
complicated. Of particular interest among homo- 
geneous models is the Bianchi IX model since it has 
a complicated classical dynamics of chaotic behavior.
Moreover, through the Belinskii-Khalatnikov-Lifschitz
(BKL) picture, the Bianchi IX mixmaster
behavior is expected to play an important role even 
for general inhomogeneous singularities. The classi- 
cal chaos then indicates a very complicated 
approach to classical singularities, with structure 
on arbitrarily small scales. 

On the other hand, the classical chaos relies on a 
curvature potential with infinitely high walls, which 
can be mapped to a chaotic billiard motion. The 
walls arise from the classical divergence of curva- 
ture, and so quantum effects have been expected to 
change the picture, and shown to do so in several 
cases. 

Inhomogeneous models (e.g., the polarized 
Gowdy models) have mostly been studied in cases 
where one can reformulate the problem as that of a 
massless free scalar on flat Minkowski space. The 
scalar can then be quantized with familiar techni- 
ques in a Fock space representation, and is related to 
metric components of the original model in rather 
complicated ways. Quantization can thus be performed,
but transforming back to the metric at the
operator level and drawing conclusions is quite 
involved. The main issue of interest in the recent 
literature has been the investigation of field theory 
aspects of quantum gravity in a tractable model. In 
particular, it turns out that self-adjoint Hamilto- 
nians, and thus unitary evolution, do not exist in 
general. 

Loop quantizations of inhomogeneous models are 
available even in cases where a reformulation such 
as a field theory on flat space does not exist, or is 
not being made use of to avoid special gauges. This 
is quite valuable in order to see if specific features 
exploited in reformulations lead to artifacts in the 
results. So far, the dynamics has not been investi- 
gated in detail, even though conclusions for the 
singularity issue can already be drawn. 

From a physical perspective, it is most important 
to introduce inhomogeneities at a perturbative level 
in order to study implications for cosmological 
structure formation. On a homogeneous back- 
ground, one can perform a mode decomposition of 
metric and matter fields and quantize the homo- 
geneous modes as well as amplitudes of higher 
modes. Alternatively, one can first quantize the 
inhomogeneous system and then introduce the mode 
decomposition at the quantum level. This gives rise 
to a system of infinitely many coupled equations of 
infinitely many variables, which needs to be trun- 
cated, for example, for numerical investigations. At 
this level, one can then study the question to which 
degree a given minisuperspace model presents a 
good approximation to the full theory, and where 
additional correction terms should be introduced. It 
also allows one to develop concrete models of 
decoherence, which requires a “bath” of many 
weakly interacting degrees of freedom usually 
thought of as being provided by inhomogeneities in 
cosmology, and an understanding of the semiclassi- 
cal limit. 


Interpretations 


Due to the complexity of full gravity, investigations 
without symmetry assumptions or perturbative 
approximations usually focus on conceptual issues. 
As already discussed, cosmology presents a unique 
situation for physics since there cannot be any 
outside observer. While this fact has already 
implications on the interpretation of observations 
at the classical level, its full force is noticed only in 
quantum cosmology. Since some traditional inter- 
pretations of quantum mechanics require the role of 
observers outside the quantum system, they do not
apply to quantum cosmology. 

Sometimes, alternative interpretations such as 
Bohm theory or many-world scenarios are cham- 
pioned in this situation, but more conventional 
relational pictures are most widely adopted. In 
such an interpretation, the wave function yields 
relational probabilities between degrees of free- 
dom rather than absolute probabilities for mea- 
surements done by an outside observer. This has 
been used, for instance, to determine the prob- 
ability of the right initial conditions for inflation, 
but it is marred by unresolved interpretational 
issues and still disputed. These problems can be 
avoided by using effective equations, in analogy 
to an effective action, which modify classical 
equations on small scales. Since the new equa- 
tions are still of classical type, that is, differential 
equations in coordinate time, no interpretational 
issues arise at least if one stays in semiclassical 
regimes. In this manner, new inflationary scenar- 
ios motivated from quantum cosmology have 
been developed. 

In general, a relational interpretation, though 
preferable conceptually, leads to technical 
complications since the situation is much more 
involved and evolution is not easy to disentangle. 
In cosmology, one often tries to single out one 
degree of freedom as internal time with respect to 
which evolution of other degrees of freedom is 
measured. In homogeneous models, one can 
simply take the volume as internal time, such as
$a$ or $\mu$ earlier, but in the full theory no candidate is known.
Even in homogeneous models, the volume is not 
suitable as internal time to describe a possible 
recollapse. One can use extrinsic curvature 
around such a point, but then one has to under- 
stand what changing the internal time in quantum 
cosmology implies, that is, whether evolution 
pictures obtained in different internal time for- 
mulations are equivalent to each other. 

There are thus many open issues at different 
levels, which, strictly speaking, do not apply only to 
quantum cosmology but to all of physics. After all, 
every physical system is part of the universe, and 
thus a potential ingredient of quantum cosmology. 
Obviously, physics works well in most situations 
without taking into account its being part of one 
universe. Similarly, much can be learned about a 
quantum universe if only some degrees of freedom 
of gravity are considered as in mini- or
midisuperspace models. In addition, complicated
interpretational issues, as important as they are for 
a deep understanding of quantum physics, do not 
prevent the development of physical applications in 
quantum cosmology, just as they did not do so in 
the early stages of quantum mechanics. 


See also: Canonical General Relativity; Cosmology: 
Mathematical Aspects; Loop Quantum Gravity; Quantum 
Geometry and its Applications; Spacetime Topology, 
Causal Structure and Singularities; Wheeler—De Witt 
Theory. 
Introduction


With a given quantum system we associate a
Hilbert space $\mathcal{H}$ such that pure states of the system
are represented by normalized vectors $\psi$ in $\mathcal{H}$ or
equivalently by one-dimensional projections $|\psi\rangle\langle\psi|$,
whereas mixed states are given by density matrices
$\rho = \sum_j p_j |\psi_j\rangle\langle\psi_j|$, $p_j \geq 0$, $\sum_j p_j = 1$, that is, positive
trace-1 operators, and observables are identified
with self-adjoint operators $A$ acting on $\mathcal{H}$. The
mean value of an observable $A$ at a state $\rho$ is given
by the following expression:

$$\langle A\rangle_\rho = \mathrm{tr}(\rho A) \qquad [1]$$


The time evolution of the isolated system is determined
by the self-adjoint operator $H$ (Hamiltonian)
corresponding to the energy of the system. The
infinitesimal change of state of the isolated system
can be written as

$$\psi(t + dt) = \psi(t) - iH\,dt\,\psi(t), \quad\text{or}\quad \rho(t + dt) = \rho(t) - i\,dt\,[H, \rho(t)] \qquad [2]$$

which leads to a reversible, purity-preserving unitary
dynamics $\psi(t) = e^{-itH}\psi$, $\rho(t) = e^{-itH}\rho\,e^{itH}$. We use the
notation $[A,B] = AB - BA$, $\{A,B\} = AB + BA$ and
put $\hbar = 1$. An interaction with the environment leads
to irreversible changes of the density matrix transforming,
in general, pure states into mixed ones.
Such a process can be modeled phenomenologically
by a transition map $V: \mathcal{H} \to \mathcal{H}$ leading to

$$\rho(t + dt) = \rho(t) + dt\,V\rho(t)V^* - \frac{dt}{2}\{V^*V, \rho(t)\} \qquad [3]$$


Combining Hamiltonian dynamics with several
irreversible processes governed by a family of
transition operators $\{V_j\}$ we obtain the following
formal evolution equation in the Schrödinger picture
(quantum Markovian master equation)

$$\frac{d}{dt}\rho(t) = \mathcal{L}\rho(t) = -i[H, \rho(t)] + \frac{1}{2}\sum_{j\in I}\left([V_j\rho(t), V_j^*] + [V_j, \rho(t)V_j^*]\right) = D\rho(t) + \rho(t)D^* + \Phi\rho(t) \qquad [4]$$

with the initial condition $\rho(0) = \rho$. Here
$D = -iH - (1/2)\sum_{j\in I} V_j^*V_j$, $\Phi\rho = \sum_{j\in I} V_j\rho V_j^*$, and $I$ is a certain
countable set of indices. Assume for the moment
that the Hilbert space $\mathcal{H} = \mathbb{C}^n$. Then the eqn [4] is
always meaningful and its solution is given in
terms of the exponential $\rho(t) = \Lambda(t)\rho = e^{t\mathcal{L}}\rho$. The
linear map $\Phi$ is a general completely positive map
on matrices, which preserves the positivity of $\rho$,
and $\Phi\otimes I_d$ preserves positivity of $nd \times nd$ matrices
for arbitrary $d = 1, 2, 3, \ldots$ A useful Dyson-type
expansion

$$e^{t\mathcal{L}}\rho = \mathcal{W}(t)\rho + \sum_{k=1}^{\infty}\int_0^t dt_k\int_0^{t_k} dt_{k-1}\cdots\int_0^{t_2} dt_1\, \mathcal{W}(t - t_k)\,\Phi\,\mathcal{W}(t_k - t_{k-1})\,\Phi\cdots\Phi\,\mathcal{W}(t_1)\rho \qquad [5]$$

with $\mathcal{W}(t)\rho = W(t)\rho W(t)^*$, $W(t) = e^{tD}$ shows that
$\Lambda(t)$ is also completely positive. It is often convenient
to describe quantum evolution in terms of
observables (Heisenberg picture)

$$\langle A\rangle_{\rho(t)} = \mathrm{tr}\left((e^{t\mathcal{L}}\rho)A\right) = \mathrm{tr}\left(\rho\, e^{t\mathcal{L}^*}A\right) = \langle A(t)\rangle_\rho \qquad [6]$$

$$\frac{d}{dt}A(t) = \mathcal{L}^*A(t) = i[H, A(t)] + \frac{1}{2}\sum_{j\in I}\left(V_j^*[A(t), V_j] + [V_j^*, A(t)]V_j\right) = D^*A(t) + A(t)D + \Phi^*A(t) \qquad [7]$$

with the initial condition $A(0) = A$, completely
positive $\Phi^*A = \sum_{j\in I} V_j^*AV_j$ and the corresponding
Dyson expansion.
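
The duality between the Schrödinger and Heisenberg forms of the generator can be checked on a two-level example. The sketch below uses plain nested lists as $2\times 2$ complex matrices; the particular $H$, the single transition operator $V$, the state, and the observable are hypothetical choices made only for illustration.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def add(*ms):
    n = len(ms[0])
    return [[sum(m[i][j] for m in ms) for j in range(n)] for i in range(n)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def schrodinger_generator(rho, h, vs):
    """-i[H, rho] + sum_j (V_j rho V_j* - (1/2){V_j* V_j, rho})."""
    out = scale(-1j, add(matmul(h, rho), scale(-1, matmul(rho, h))))
    for v in vs:
        vdv = matmul(dagger(v), v)
        out = add(out, matmul(matmul(v, rho), dagger(v)),
                  scale(-0.5, add(matmul(vdv, rho), matmul(rho, vdv))))
    return out

def heisenberg_generator(a, h, vs):
    """i[H, A] + sum_j (V_j* A V_j - (1/2){V_j* V_j, A})."""
    out = scale(1j, add(matmul(h, a), scale(-1, matmul(a, h))))
    for v in vs:
        vdv = matmul(dagger(v), v)
        out = add(out, matmul(matmul(dagger(v), a), v),
                  scale(-0.5, add(matmul(vdv, a), matmul(a, vdv))))
    return out

# Hypothetical two-level data.
H = [[0.5, 0.0], [0.0, -0.5]]
V = [[0.0, 0.8], [0.0, 0.0]]
rho = [[0.3, 0.2 + 0.1j], [0.2 - 0.1j, 0.7]]
A = [[0.0, 1.0], [1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

lhs = trace(matmul(schrodinger_generator(rho, H, [V]), A))
rhs = trace(matmul(rho, heisenberg_generator(A, H, [V])))
trace_rate = trace(schrodinger_generator(rho, H, [V]))
unit_image = heisenberg_generator(identity, H, [V])
print(lhs, rhs, trace_rate)
```

The two traces agree, the generator applied to the state is traceless (trace preservation), and the Heisenberg generator annihilates the identity, which is the same statement in the dual picture.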

The solutions of eqns [4] and [7] are given in 
terms of dynamical semigroups. Their general 
mathematical properties and particular examples 
will be reviewed in this article. Various methods of 
derivation of master equations for open quantum 
systems from the underlying Hamiltonian dynamics 
of composed systems will also be presented. 




Semigroups and Their Generators 


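For standard quantum-mechanical models it is convenient
to define a quantum dynamical semigroup in
the Schrödinger picture as a one-parameter family
$\{\Lambda(t); t \geq 0\}$ of linear and bounded maps acting on the
Banach space of trace-class operators $\mathcal{T}(\mathcal{H})$ equipped
with the norm $\|\sigma\|_1 = \mathrm{tr}(\sigma\sigma^*)^{1/2}$ and satisfying the
following conditions:

1. Composition (semigroup) law

$$\Lambda(t)\Lambda(s) = \Lambda(t + s), \quad\text{for all } t, s \geq 0 \qquad [8]$$

2. Complete positivity

$$\Lambda(t)\otimes I_d \text{ is positive on } \mathcal{T}(\mathcal{H}\otimes\mathbb{C}^d) \text{ for all } d = 1, 2, 3, \ldots \text{ and } t \geq 0 \qquad [9]$$

3. Conservativity (trace preservation)

$$\mathrm{tr}(\Lambda(t)\rho) = \mathrm{tr}(\rho), \quad\text{for all } \rho \in \mathcal{T}(\mathcal{H}) \qquad [10]$$

4. Continuity (in a weak sense)

$$\lim_{t\to 0}\mathrm{tr}(A\Lambda(t)\rho) = \mathrm{tr}(A\rho) \quad\text{for all } \rho \in \mathcal{T}(\mathcal{H}),\ A \in \mathcal{B}(\mathcal{H}) \qquad [11]$$
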
From a general theory of one-parameter semigroups
on Banach spaces it follows that under the conditions
(1)-(4) $\Lambda(t)$ is a one-parameter strongly
continuous semigroup of contractions on $\mathcal{T}(\mathcal{H})$
uniquely characterized by a generally unbounded
but densely defined semigroup generator $\mathcal{L}$ with the
domain $\mathrm{dom}(\mathcal{L}) \subset \mathcal{T}(\mathcal{H})$ such that for any
$\rho \in \mathrm{dom}(\mathcal{L})$

$$\frac{d}{dt}\rho(t) = \mathcal{L}\rho(t), \quad \rho(t) = \Lambda(t)\rho \qquad [12]$$

One can show that for $\lambda > 0$ the resolvent
$R(\lambda) = (\lambda\mathbb{1} - \mathcal{L})^{-1}$ can be extended to a bounded
operator satisfying $\|R(\lambda)\| \leq \lambda^{-1}$ and, therefore, the
following formula makes sense:

$$\lim_{n\to\infty}\left(\mathbb{1} - \frac{t}{n}\mathcal{L}\right)^{-n}\rho = \Lambda(t)\rho, \quad\text{for all } \rho \in \mathcal{T}(\mathcal{H}) \qquad [13]$$

Under the additional assumption that the generator
$\mathcal{L}$ is bounded (and hence everywhere defined) Gorini
et al. (1976) and Lindblad (1976) proved that eqns
[4] and [7] with bounded $H$, $V_j$ and $\sum_j V_j^*V_j$ provide
the most general form of $\mathcal{L}$. The choice of $H$ and $V_j$ is
not unique and the sum over $j$ can be replaced by an
integral. In the case of $n$-dimensional Hilbert space
we can always choose the form of eqn [4] with at
most $n^2 - 1$ operators $V_j$. Sometimes the structure [4] is
hidden as for the following useful example of the
relaxation process to a fixed density matrix $\rho_0$ with
the rate $\mu > 0$:

$$\frac{d}{dt}\rho(t) = \mu(\rho_0 - \rho(t)) \qquad [14]$$
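
For the relaxation example the semigroup can be written in closed form, which allows a quick check of the composition law, trace preservation, and the resolvent limit formula. The sketch below works with real $2\times 2$ matrices as nested lists; the sample matrices, the rate, and the function names are assumptions chosen for illustration, and the generator is applied to trace-one matrices only.

```python
import math

def relax(rho, rho0, mu, t):
    """Closed-form solution rho(t) = e^{-mu t} rho + (1 - e^{-mu t}) rho0."""
    w = math.exp(-mu * t)
    return [[w * r + (1.0 - w) * r0 for r, r0 in zip(row, row0)]
            for row, row0 in zip(rho, rho0)]

def resolvent_power(rho, rho0, mu, t, n):
    """Apply (1 - (t/n) L)^{-1} n times, with L rho = mu (rho0 - rho).

    Solving sigma - h mu (rho0 - sigma) = rho gives
    sigma = (rho + h mu rho0) / (1 + h mu), with h = t / n.
    """
    h = t / n
    for _ in range(n):
        rho = [[(r + h * mu * r0) / (1.0 + h * mu) for r, r0 in zip(row, row0)]
               for row, row0 in zip(rho, rho0)]
    return rho

def max_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

rho_init = [[0.9, 0.1], [0.1, 0.1]]      # hypothetical initial state
rho_fixed = [[0.5, 0.0], [0.0, 0.5]]     # hypothetical fixed state rho0
rate, s, t = 1.5, 0.3, 0.5

composed = relax(relax(rho_init, rho_fixed, rate, s), rho_fixed, rate, t)
direct = relax(rho_init, rho_fixed, rate, s + t)
approx = resolvent_power(rho_init, rho_fixed, rate, s + t, 10000)
print(max_diff(composed, direct), max_diff(approx, direct))
```

The composed and direct evolutions coincide up to rounding, and the iterated resolvent approaches the exact solution as the number of steps grows.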


The general structure of an unbounded $\mathcal{L}$ is not
known. However, the formal expressions [4] and [7]
with possibly unbounded $D$ and $V_j$ are meaningful
under the following conditions:


• the operator $D$ generates a strongly continuous
contracting semigroup $\{e^{tD}; t \geq 0\}$ on $\mathcal{H}$;

• $\mathrm{dom}(V_j) \supset \mathrm{dom}(D)$, for all $j$;

• $\langle\phi, D\psi\rangle + \langle D\phi, \psi\rangle + \sum_j \langle V_j\phi, V_j\psi\rangle = 0$, for
all $\phi, \psi \in \mathrm{dom}(D)$.


We can solve eqn [4] in terms of a minimal solution. Denoting by $\mathcal{Z}$ the generator of the contracting semigroup $\rho\mapsto e^{tD}\rho\,e^{tD^*}$ and by $\mathcal{J}$ the completely positive (unbounded) map $\rho\mapsto\sum_{j\in I}V_j\rho V_j^*$, one can show that for any $\lambda > 0$, $\mathcal{J}(\lambda 1 - \mathcal{Z})^{-1}$ possesses a unique bounded completely positive extension, denoted by $A_\lambda$, with $\|A_\lambda\| \le 1$. Hence, for any $0 < r < 1$ there exists a strongly continuous, completely positive, and contracting semigroup $\Lambda^{(r)}(t)$ with the resolvent explicitly given by

\[ R^{(r)}(\lambda) = (\lambda 1 - \mathcal{Z})^{-1}\sum_{k=0}^{\infty} r^k A_\lambda^k \qquad [15] \]

As $\|R^{(r)}(\lambda)\| \le \lambda^{-1}$, the limit $\lim_{r\to 1}R^{(r)}(\lambda) = R(\lambda)$ exists, where $R(\lambda)$ is the resolvent of the semigroup $\Lambda(t)$ satisfying (1), (2), and (4) and called the minimal solution of eqn [4]. The minimal solution need not be a unique solution or conservative (generally $\mathrm{tr}\,\rho(t) \le \mathrm{tr}\,\rho(0)$, and for any other solution $\rho'(t) \ge \rho(t)$). There exist useful sufficient conditions for conservativity; an example of a necessary and sufficient condition is the following: $A_\lambda^n \to 0$ strongly as $n\to\infty$ for all $\lambda > 0$ (Chebotarev and Fagnola 1988).


Examples 


Bloch equation  The simplest two-level system can be described in terms of spin operators $S_k = \frac{1}{2}\sigma_k$, $k = 1,2,3$, where $\sigma_k$ are Pauli matrices. The most general master equation of the form [4] can be written as (Alicki and Fannes 2001, Ingarden et al. 1997)

\[ \frac{d\rho}{dt} = -i\sum_{k=1}^{3}h_k[S_k,\rho] + \frac{1}{2}\sum_{k,l=1}^{3}a_{kl}\{[S_k\rho, S_l] + [S_k, \rho S_l]\} \qquad [16] \]

where $h_k\in\mathbb{R}$ and $[a_{kl}]$ is a $3\times 3$ complex, positive-definite matrix. Introducing the magnetization vector $M_k(t) = \mathrm{tr}(\rho(t)S_k)$, we obtain the following Bloch equation used in magnetic resonance theory:

\[ \frac{d}{dt}M(t) = h\times(M(t) - M_0) - F(M(t) - M_0) \qquad [17] \]

where the tensor $F$ (a real, symmetric, and positive $3\times 3$ matrix) and the vector $M_0$ are functions of $[a_{kl}]$. In particular, complete positivity implies the following inequalities for the inverse relaxation times $\gamma_1, \gamma_2, \gamma_3$ (eigenvalues of $F$):

\[ \gamma_1 + \gamma_2 \ge \gamma_3, \qquad \gamma_3 + \gamma_1 \ge \gamma_2, \qquad \gamma_2 + \gamma_3 \ge \gamma_1 \qquad [18] \]
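
The inequalities [18] are easy to test for a given set of relaxation rates. The helper below is an illustrative sketch (the function name and the tolerance parameter are assumptions, not notation from the article); it simply checks the three triangle-type inequalities for the eigenvalues of $F$.

```python
def satisfies_cp_inequalities(g1, g2, g3, tol=1e-12):
    """Check the three inequalities of eqn [18] for inverse relaxation times."""
    return (g1 + g2 >= g3 - tol
            and g3 + g1 >= g2 - tol
            and g2 + g3 >= g1 - tol)

# Pure dephasing: two transverse rates equal, no longitudinal relaxation.
dephasing_ok = satisfies_cp_inequalities(1.0, 1.0, 0.0)

# A single nonzero rate violates [18], so it cannot arise from eqn [16].
single_rate_ok = satisfies_cp_inequalities(1.0, 0.0, 0.0)
```

In the uniaxial case $\gamma_1 = \gamma_2 = 1/T_2$, $\gamma_3 = 1/T_1$, the first inequality reduces to the familiar bound $T_2 \le 2T_1$.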


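Damped and pumped harmonic oscillator  The quantum master equation for a linearly damped and pumped harmonic oscillator with frequency $\omega$ and damping (pumping) coefficient $\gamma_\downarrow$ ($\gamma_\uparrow$) has the form

\[ \frac{d\rho}{dt} = -i\omega[a^*a, \rho] + \frac{\gamma_\downarrow}{2}\big([a\rho, a^*] + [a, \rho a^*]\big) + \frac{\gamma_\uparrow}{2}\big([a^*\rho, a] + [a^*, \rho a]\big) \qquad [19] \]

where $a^*, a$ are creation and annihilation operators satisfying $[a, a^*] = 1$. Taking the diagonal elements $p_n = \langle n, \rho\,n\rangle$ in the "particle number" basis $a^*a|n\rangle = n|n\rangle$, $n = 0,1,2,\ldots$, which evolve independently of the off-diagonal elements, one obtains the birth and death process

\[ \frac{dp_n}{dt} = \gamma_\downarrow(n+1)p_{n+1} + \gamma_\uparrow n\,p_{n-1} - \big(\gamma_\downarrow n + \gamma_\uparrow(n+1)\big)p_n \qquad [20] \]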


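It is convenient to use the Heisenberg picture and find an explicit solution in terms of Weyl unitary operators $W(z) = \exp[(i/\sqrt{2})(za + \bar{z}a^*)]$,

\[ \Lambda^*(t)W(z) = \exp\left\{-\frac{|z|^2}{4}\,\frac{\gamma_\downarrow + \gamma_\uparrow}{\gamma_\downarrow - \gamma_\uparrow}\left(1 - e^{-(\gamma_\downarrow - \gamma_\uparrow)t}\right)\right\}W(z(t)) \qquad [21] \]

where $z(t) = \exp\{-(i\omega + \frac{1}{2}(\gamma_\downarrow - \gamma_\uparrow))t\}\,z$, $t \ge 0$. For $\gamma_\downarrow > \gamma_\uparrow$ the solution of eqn [19] always tends to the stationary Gibbs state

\[ \rho_\beta = Z^{-1}e^{-\beta\omega a^*a}, \qquad \beta = \frac{1}{\omega}\ln\frac{\gamma_\downarrow}{\gamma_\uparrow} \qquad [22] \]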
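
The link between the birth and death process [20] and the Gibbs state [22] can be verified directly. The sketch below is an illustration, not taken from the article: it inserts the geometric distribution $p_n = (1-q)q^n$ with $q = \gamma_\uparrow/\gamma_\downarrow = e^{-\beta\omega}$ into the right-hand side of eqn [20] and checks that it vanishes for each level. All names and the sample rates are assumptions.

```python
def birth_death_rhs(p, n, gamma_down, gamma_up):
    """Right-hand side of eqn [20] for level n; p is a function n -> p_n."""
    gain_from_above = gamma_down * (n + 1) * p(n + 1)
    gain_from_below = gamma_up * n * p(n - 1) if n > 0 else 0.0
    loss = (gamma_down * n + gamma_up * (n + 1)) * p(n)
    return gain_from_above + gain_from_below - loss

def gibbs_population(gamma_down, gamma_up):
    """Populations of the Gibbs state [22]; requires gamma_down > gamma_up."""
    q = gamma_up / gamma_down  # Boltzmann factor exp(-beta * omega)
    return lambda n: (1.0 - q) * q ** n

# Hypothetical damping and pumping rates.
gamma_down, gamma_up = 1.0, 0.3
p = gibbs_population(gamma_down, gamma_up)

# The residual of the stationarity condition dp_n/dt = 0 for the first levels.
residuals = [birth_death_rhs(p, n, gamma_down, gamma_up) for n in range(50)]
max_residual = max(abs(r) for r in residuals)
```

The residual is zero up to rounding because the transitions between each pair of neighbouring levels balance separately.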


Quasifree semigroups  The previous example is the simplest instance of the dynamical semigroups for noninteracting bosons and fermions which are completely determined on the single-particle level. Such systems are defined by a single-particle Hilbert space $\mathcal{H}_1$ and a linear map $\mathcal{H}_1\ni\phi\mapsto a^*(\phi)$ into creation operators satisfying canonical commutation or anticommutation relations (CCRs or CARs) for bosons and fermions, respectively:

\[ [a(\phi), a^*(\psi)]_\pm = \langle\phi, \psi\rangle, \qquad [A, B]_\pm \equiv AB - (\pm 1)BA \qquad [23] \]

In all expressions containing $(\pm)$, the sign $(+)$ refers to bosons and $(-)$ to fermions.
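Consider a nonhomogeneous evolution equation on the trace-class operators $\sigma\in\mathcal{T}(\mathcal{H}_1)$:

\[ \frac{d\sigma}{dt} = -i[H_1, \sigma] - \frac{1}{2}\{\Gamma_\downarrow - (\pm)\Gamma_\uparrow, \sigma\} + \Gamma_\uparrow \qquad [24] \]

with a single-particle Hamiltonian $H_1$ and a damping (pumping) positive operator $\Gamma_\downarrow$ ($\Gamma_\uparrow$) $\ge 0$. The operators $H_1$, $\Gamma_\downarrow$, and $\Gamma_\uparrow$ need not be bounded, provided $-iH_1 - \frac{1}{2}(\Gamma_\downarrow - (\pm)\Gamma_\uparrow)$ generates a (contracting in the fermionic case) semigroup $\{T(t); t \ge 0\}$ on $\mathcal{H}_1$ and the formal solution of eqn [24],

\[ \sigma(t) = T(t)\sigma(0)T^*(t) + Q(t), \quad \text{where } Q(t) = \int_0^t T(s)\Gamma_\uparrow T^*(s)\,ds \qquad [25] \]
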
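is meaningful. We can now define the quasifree dynamical semigroup for the many-particle system described by the Fock space $\mathcal{F}_\pm(\mathcal{H}_1)$ (Alicki and Lendi 1987, Alicki and Fannes 2001). The simplest definition involves the Heisenberg evolution of the ordered monomials in $a^*(\psi_j)$ and $a(\phi_j)$:

\[ \Lambda^*(t)\,a^*(\psi_1)\cdots a^*(\psi_m)\,a(\phi_1)\cdots a(\phi_n) = \sum_{\text{partitions}}\epsilon_\pm\,\mathrm{Det}_\pm\big[\langle\phi_{\beta_k}, Q(t)\psi_{\alpha_l}\rangle\big]_{k,l=1,\ldots,r}\; a^*(T^*(t)\psi_{\alpha'_1})\cdots a^*(T^*(t)\psi_{\alpha'_{m-r}}) \times a(T^*(t)\phi_{\beta'_1})\cdots a(T^*(t)\phi_{\beta'_{n-r}}) \qquad [26] \]

The sum is taken over all partitions $\{(\alpha_1,\ldots,\alpha_r)(\alpha'_1,\ldots,\alpha'_{m-r})\}$ of $\{1,\ldots,m\}$ and $\{(\beta_1,\ldots,\beta_r)(\beta'_1,\ldots,\beta'_{n-r})\}$ of $\{1,\ldots,n\}$ such that $\alpha_1 < \alpha_2 < \cdots < \alpha_r$, $\alpha'_1 < \alpha'_2 < \cdots < \alpha'_{m-r}$, $\beta_1 < \beta_2 < \cdots < \beta_r$, $\beta'_1 < \beta'_2 < \cdots < \beta'_{n-r}$; $\epsilon_+ = 1$, and $\epsilon_-$ is a product of the signatures of the permutations $\{1,2,\ldots,m\}\to\{\alpha_1,\ldots,\alpha_r,\alpha'_1,\ldots,\alpha'_{m-r}\}$ and $\{1,2,\ldots,n\}\to\{\beta_1,\ldots,\beta_r,\beta'_1,\ldots,\beta'_{n-r}\}$; a permanent $\mathrm{Det}_+$ is taken for bosons, a determinant $\mathrm{Det}_-$ for fermions.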

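Introducing an orthonormal basis $\{e_k\}$ in $\mathcal{H}_1$ and using the notation $a^*(e_k) \equiv a_k^*$, we can write a formal master equation for density matrices on the Fock space corresponding to eqn [26]:

\[ \frac{d\rho}{dt} = -i[H_F, \rho] + \frac{1}{2}\sum_{k,l}\Big\{\Gamma_\downarrow^{kl}\big([a_k\rho, a_l^*] + [a_k, \rho a_l^*]\big) + \Gamma_\uparrow^{kl}\big([a_k^*\rho, a_l] + [a_k^*, \rho a_l]\big)\Big\} \qquad [27] \]

Again, formally,

\[ H_F = \sum_{k,l}\langle e_k, H_1e_l\rangle a_k^*a_l, \qquad \Gamma_\downarrow^{kl} = \langle e_l, \Gamma_\downarrow e_k\rangle, \qquad \Gamma_\uparrow^{kl} = \langle e_k, \Gamma_\uparrow e_l\rangle \qquad [28] \]

Often the formulas [27], [28] are not well defined, but replacing the (infinite) matrices by (distribution-valued) integral kernels, sums by integrals, and $a_k^*, a_k$ by quantum fields, we can obtain meaningful objects.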

Quasifree dynamical semigroups find applications in the theory of unstable particles, quantum linear optics, solid-state physics, quantum information theory, etc. (Alicki and Lendi 1987, Sewell 2002).


Ergodic Properties 


Dynamical semigroups which possess stationary states satisfying $\mathcal{L}\rho_0 = 0$ are of particular interest, for example, in the description of relaxation processes toward equilibrium states (Frigerio 1977, Spohn 1980, Alicki and Lendi 1987). The dynamical semigroup $\{\Lambda(t)\}$ with a stationary state $\rho_0$ is called ergodic if

\[ \lim_{t\to\infty}\Lambda(t)\rho = \rho_0, \quad \text{for any initial } \rho \qquad [29] \]

For finite-dimensional $\mathcal{H}$ at least one stationary state always exists. If, moreover, it is strictly positive, $\rho_0 > 0$, then we have the following sufficient condition of ergodicity:

\[ \{V_j; j\in I\}' \equiv \{A;\ A\in\mathcal{B}(\mathcal{H}),\ [A, V_j] = 0,\ j\in I\} = \mathbb{C}1 \qquad [30] \]

Open systems interacting with heat baths at the temperature $T$ are described by semigroups with generators [4] of the special form

\[ \frac{d}{dt}\rho(t) = -i[H, \rho(t)] + \frac{1}{2}\sum_{\omega_j\ge 0}\Big\{\big([V_j, \rho(t)V_j^*] + [V_j\rho(t), V_j^*]\big) + e^{-\beta\omega_j}\big([V_j^*, \rho(t)V_j] + [V_j^*\rho(t), V_j]\big)\Big\} \qquad [31] \]

where

\[ \beta = \frac{1}{k_BT}, \qquad [H, V_j] = -\omega_jV_j \qquad [32] \]

The Gibbs state $\rho_\beta = Z^{-1}e^{-\beta H}$ is a stationary state for eqn [31], and the condition $\{V_j, V_j^*; j\in I\}' = \mathbb{C}1$ implies ergodicity (return to equilibrium). Moreover, the matrix elements of $\rho$ diagonal in the $H$-eigenbasis transform independently of the off-diagonal ones and satisfy the Pauli master equation

\[ \frac{dp_k}{dt} = \sum_l(a_{kl}p_l - a_{lk}p_k) \qquad [33] \]

with the detailed balance condition $a_{kl}e^{-\beta E_l} = a_{lk}e^{-\beta E_k}$, where $E_k$ are eigenvalues of $H$.
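
The role of detailed balance in the Pauli master equation [33] can be illustrated numerically. The sketch below is an assumption-laden example rather than the article's construction: the rates are generated by a hypothetical symmetric prescription chosen only because it satisfies the detailed balance condition, and the energies and inverse temperature are arbitrary sample values. It checks that the Gibbs populations are stationary.

```python
import math

def pauli_rhs(p, a):
    """dp_k/dt = sum_l (a[k][l] * p[l] - a[l][k] * p[k]), as in eqn [33]."""
    n = len(p)
    return [sum(a[k][l] * p[l] - a[l][k] * p[k] for l in range(n))
            for k in range(n)]

def detailed_balance_rates(energies, beta, base=1.0):
    """Hypothetical rates obeying a_kl exp(-beta E_l) = a_lk exp(-beta E_k)."""
    n = len(energies)
    a = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            if k != l:
                # Symmetric prefactor times a factor set by the energy gap.
                gap = energies[k] - energies[l]
                a[k][l] = base * math.exp(-0.5 * beta * gap)
    return a

# Sample three-level spectrum and inverse temperature (illustrative values).
energies, beta = [0.0, 1.0, 2.5], 0.8
rates = detailed_balance_rates(energies, beta)

weights = [math.exp(-beta * e) for e in energies]
gibbs = [w / sum(weights) for w in weights]

# The Gibbs populations should not change in time.
max_drift = max(abs(x) for x in pauli_rhs(gibbs, rates))
```

Each pair of levels balances separately, so `max_drift` vanishes up to rounding, while the total probability is conserved for any choice of rates.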

Define the new Hilbert space $\mathcal{L}^2(\mathcal{H}, \rho_\beta)$ as a completion of $\mathcal{B}(\mathcal{H})$ with respect to the scalar product $(A, B) \equiv \mathrm{tr}(\rho_\beta A^*B)$. The semigroup's generators in the Heisenberg picture corresponding to eqn [31] are normal operators in $\mathcal{L}^2(\mathcal{H}, \rho_\beta)$ with the Hamiltonian part $i[H, \cdot\,]$ being the anti-Hermitian one (automatically for bounded $\mathcal{L}^*$, and for an unbounded one under technical conditions concerning domains). This allows a spectral decomposition of $\mathcal{L}^*$ and a proper definition of damping rates for the obtained eigenvectors. The normality condition is one of the possible definitions of quantum detailed balance. The other, based on the time-reversal operation, often coincides with the previous one for important examples.

Interesting examples of nonergodic dynamical semigroups are given by open systems consisting of $N$ identical particles with Hamiltonians $H^{(N)}$ and operators $V_j^{(N)}$ invariant with respect to particle permutations. Then the commutant $\{H^{(N)}, V_j^{(N)}; j\in I\}'$ contains an abelian algebra generated by projections on irreducible tensors corresponding to Young tableaux.


From Hamiltonian Dynamics to 
Semigroups 


One of the main tasks in the quantum theory of open systems is to derive master equations [4] from the model of a "small" open system $S$ interacting with a "large" reservoir $R$ in a certain reference state $\omega_R$ (Davies 1976, Spohn 1980, Alicki and Lendi 1987, Breuer and Petruccione 2002, Garbaczewski and Olkiewicz 2002). Starting with the total Hamiltonian $H_\lambda = H_S\otimes 1_R + 1_S\otimes H_R + \lambda\sum_\alpha S_\alpha\otimes R_\alpha$, where $S_\alpha = S_\alpha^*$, $R_\alpha = R_\alpha^*$, $\mathrm{tr}(\omega_RR_\alpha) = 0$, and $\lambda$ is a coupling constant, we define the reduced dynamics of $S$ by

\[ \rho(t) = \Lambda^\lambda(t)\rho = \mathrm{tr}_R\big(U_\lambda(t)\,\rho\otimes\omega_R\,U_\lambda^*(t)\big) \qquad [34] \]

with $U_\lambda(t) = \exp(-itH_\lambda)$. Here $\mathrm{tr}_R$ denotes a partial trace over $R$ defined in terms of an arbitrary basis $\{e_k\}$ of $R$ by the formula $\langle\phi, (\mathrm{tr}_RA)\psi\rangle = \sum_k\langle\phi\otimes e_k, A\,\psi\otimes e_k\rangle$. Generally, $\Lambda^\lambda(t+s) \ne \Lambda^\lambda(t)\Lambda^\lambda(s)$, but dynamical semigroups can provide good approximations in important cases.
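
The partial trace used in eqn [34] has a direct finite-dimensional implementation. The following sketch is illustrative only: matrices are nested lists, the composite basis vector $e_i\otimes e_k$ is assumed to carry the index `i * dim_R + k`, and the function names are hypothetical.

```python
def partial_trace_R(A, dim_S, dim_R):
    """(tr_R A)[i][j] = sum_k A[(i,k)][(j,k)], composite index i * dim_R + k."""
    return [[sum(A[i * dim_R + k][j * dim_R + k] for k in range(dim_R))
             for j in range(dim_S)]
            for i in range(dim_S)]

def kron(X, Y):
    """Kronecker product of two square matrices given as nested lists."""
    nx, ny = len(X), len(Y)
    return [[X[i // ny][j // ny] * Y[i % ny][j % ny] for j in range(nx * ny)]
            for i in range(nx * ny)]

# Sample system state and reservoir reference state (illustrative values).
rho = [[0.6, 0.2], [0.2, 0.4]]
omega_R = [[0.5, 0.0, 0.0], [0.0, 0.3, 0.0], [0.0, 0.0, 0.2]]

# At t = 0 the reduced dynamics must return the system state itself.
reduced = partial_trace_R(kron(rho, omega_R), dim_S=2, dim_R=3)
```

Tracing the reservoir out of the product state $\rho\otimes\omega_R$ returns $\rho$, which corresponds to $\Lambda^\lambda(0)\rho = \rho$.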


Weak-Coupling Limit 


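Under the conditions of sufficiently fast decay of multitime correlation functions constructed from the observables $R_\alpha$ in the state $\omega_R$, one can prove that for a small coupling constant $\lambda$ the exact dynamical map $\Lambda^\lambda(t)$ can be approximated by the dynamical semigroup corresponding to the following master equation:

\[ \frac{d}{dt}\rho(t) = -i[H, \rho(t)] + \frac{\lambda^2}{2}\sum_{\alpha,\beta}\sum_{\omega\in Sp}C_{\alpha\beta}(\omega)\big([V_\alpha^\omega, \rho(t)V_\beta^{\omega*}] + [V_\alpha^\omega\rho(t), V_\beta^{\omega*}]\big) \qquad [35] \]

where $H = H_S + \lambda^2\sum_{\alpha,\beta}\sum_{\omega\in Sp}K_{\alpha\beta}(\omega)V_\beta^{\omega*}V_\alpha^\omega$ is a renormalized Hamiltonian, $\sum_{\omega\in Sp}$ denotes the sum over eigenfrequencies of $-[H, \cdot\,]$, $e^{iHt}S_\alpha e^{-iHt} = \sum_{\omega\in Sp}V_\alpha^\omega e^{-i\omega t}$, and

\[ \int_0^\infty e^{i\omega t}\,\mathrm{tr}\big(\omega_R\,e^{iH_Rt}R_\alpha e^{-iH_Rt}R_\beta\big)\,dt = \frac{1}{2}C_{\alpha\beta}(\omega) + iK_{\alpha\beta}(\omega) \qquad [36] \]

The rigorous derivation involves the van Hove or weak-coupling limit, $\lambda\to 0$, with $\tau = \lambda^2t$ kept fixed.

It follows from the Bochner theorem that the matrix $[C_{\alpha\beta}(\omega)]$ is positive-definite, and therefore by its diagonalization we can convert eqn [35] into the standard form [4]. If the reservoir's state $\omega_R$ is an equilibrium state (Kubo-Martin-Schwinger state), then $C_{\alpha\beta}(-\omega) = e^{-\omega/k_BT}C_{\beta\alpha}(\omega)$ and therefore eqn [35] can be written in the form [31]. Moreover, the transition probabilities $a_{kl}$ from eqn [33] coincide with those obtained using the "Fermi golden rule."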


Low-Density Limit 


If the reservoir can be modeled by a gas of noninteracting particles (bosons or fermions) at low density $\nu$, we can derive the following master equation, which approximates the exact dynamics [34] in the low-density limit ($\nu\to 0$, with $\tau = \nu t$ kept fixed):

\[ \frac{d}{dt}\rho(t) = -i[H, \rho(t)] + \pi\sum_{\omega\in Sp}\int d^3p\,d^3p'\,G(p)\,\delta(E_p - E_{p'} + \omega)\big([T_\omega(p,p'), \rho(t)T_\omega(p,p')^*] + [T_\omega(p,p')\rho(t), T_\omega(p,p')^*]\big) \qquad [37] \]

Here $H$ is a renormalized Hamiltonian of the system $S$, $e^{iHt}Te^{-iHt} = \sum_{\omega\in Sp}T_\omega e^{-i\omega t}$, and $T$ is a T-matrix describing the scattering process involving $S$ and a single particle, $T = V\Omega_+$, where $V$ is a particle-system potential and $\Omega_+$ is a Møller operator. $T_\omega(p,p')$ denotes the integral kernel corresponding to $T_\omega$, expressed in terms of momenta of the bath particle, $E_p$ is the kinetic energy of a particle, and $G(p)$ is its probability distribution in momentum space. If $G(p)\sim\exp(-E_p/k_BT)$ and the microreversibility conditions, $E_p = E_{-p}$ and $T_\omega(-p,-p') = T_\omega(p',p)$, hold, then eqn [37] satisfies the quantum detailed-balance condition with the stationary Gibbs state $\rho_\beta$, $\beta = 1/k_BT$.


Entropy and Purity 


The relative entropy $S(\rho|\sigma) = \mathrm{tr}(\rho\ln\rho - \rho\ln\sigma)$ is monotone with respect to any trace-preserving completely positive map $\Lambda$, that is, $S(\Lambda\rho|\Lambda\sigma) \le S(\rho|\sigma)$. Hence, for the quantum dynamical semigroup $\Lambda(t)$ with the stationary state $\rho_0$ we obtain the following relation for the von Neumann entropy $S(\rho) = -\mathrm{tr}(\rho\ln\rho)$:

\[ \frac{d}{dt}S(\rho(t)) = -\frac{d}{dt}S(\rho(t)|\rho_0) - \frac{d}{dt}\mathrm{tr}(\rho(t)\ln\rho_0) \qquad [38] \]

where $-(d/dt)S(\rho(t)|\rho_0) \ge 0$ is an entropy production and the second term describes entropy exchange with the environment (Spohn 1980, Alicki and Lendi 1987).

Bistochastic dynamical semigroups preserve the maximally mixed state, that is, $\mathcal{L}(1) = 0$. For them, the von Neumann entropy does not decrease and the purity $\mathrm{tr}\rho^2$ never increases (Streater 1995). Two important classes of master equations, used to describe decoherence, yield bistochastic dynamical semigroups:

\[ \frac{d}{dt}\rho(t) = -i[H, \rho(t)] - \frac{1}{2}\sum_{j\in I}[A_j, [A_j, \rho(t)]], \qquad A_j = A_j^* \qquad [39] \]

\[ \frac{d}{dt}\rho(t) = -i[H, \rho(t)] + \int_{\mathcal{M}}\mu(d\alpha)\big(U(\alpha)\rho(t)U^*(\alpha) - \rho(t)\big) \qquad [40] \]

where $U(\alpha)$ are unitary and $\mu(\cdot)$ is a (positive) measure on $\mathcal{M}$.
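
The monotonicity statements for bistochastic semigroups can be seen in the simplest case of eqn [39]. The sketch below is an illustration under stated assumptions: a qubit with $H = 0$ and a single operator $A = \sigma_z$, for which the double commutator leaves the diagonal of $\rho$ untouched and damps the off-diagonal entries by $e^{-2t}$. The function names and sample times are hypothetical.

```python
import math

def dephased_state(rho, t):
    """Solution of eqn [39] for a qubit with H = 0 and a single A = sigma_z."""
    damp = math.exp(-2.0 * t)  # -(1/2)[A,[A,rho]] multiplies rho_01 by -2
    return [[rho[0][0], rho[0][1] * damp],
            [rho[1][0] * damp, rho[1][1]]]

def purity(rho):
    """tr(rho^2) for a 2x2 matrix."""
    return sum((rho[i][k] * rho[k][i]).real for i in range(2) for k in range(2))

def von_neumann_entropy(rho):
    """-tr(rho ln rho) from the two eigenvalues of a 2x2 density matrix."""
    r = math.sqrt((rho[0][0] - rho[1][1]) ** 2 + 4.0 * abs(rho[0][1]) ** 2)
    eigenvalues = [0.5 * (1.0 + r), 0.5 * (1.0 - r)]
    return -sum(x * math.log(x) for x in eigenvalues if x > 1e-15)

# Initial pure state: an equal superposition of the two basis vectors.
rho_plus = [[0.5, 0.5], [0.5, 0.5]]
times = [0.0, 0.1, 0.5, 1.0, 3.0]
purities = [purity(dephased_state(rho_plus, t)) for t in times]
entropies = [von_neumann_entropy(dephased_state(rho_plus, t)) for t in times]
```

Along the sampled times the purity falls from 1 toward 1/2 while the entropy rises from 0 toward $\ln 2$, the value for the maximally mixed qubit state.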


Itô-Schrödinger Equations 


Up to technical problems in the case of unbounded operators, the master equation [4] is completely equivalent to the following stochastic differential equation (in Itô form):

\[ d\psi(t) = -iH\psi(t)\,dt - \frac{1}{2}\sum_{j\in I}V_j^*V_j\psi(t)\,dt - i\sum_{j\in I}V_j\psi(t)\,dX_j(t) \qquad [41] \]

where $X_j(t)$ are arbitrary statistically independent stochastic processes with independent increments (continuous or jump processes) such that the expectation $E(dX_j(t)\,dX_k(t)) = \delta_{jk}\,dt$. Equation [41] should be understood as an integral equation involving stochastic Itô integrals with respect to $\{X_j(t)\}$ computed according to the Itô rule: $dX_j(t)\,dX_k(t) = \delta_{jk}\,dt$. Taking the average $\rho(t) = E(|\psi(t)\rangle\langle\psi(t)|)$ one can show, using the Itô rule, that $\rho(t)$ satisfies eqn [4]. For numerical applications, it is convenient to use the nonlinear version of eqn [41] for the normalized stochastic vector $\phi(t) = \psi(t)/\|\psi(t)\|$, which can be easily derived from eqn [41] (Breuer and Petruccione 2002).
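
The averaging step can be written out explicitly. The derivation below is a sketch under two assumptions that are not stated in this section: the increments have zero mean, $E(dX_j(t)) = 0$, and eqn [4], which lies outside this section, has the standard Gorini-Kossakowski-Sudarshan-Lindblad form shown in the last line.

```latex
\begin{align*}
d\big(|\psi\rangle\langle\psi|\big)
  &= |d\psi\rangle\langle\psi| + |\psi\rangle\langle d\psi| + |d\psi\rangle\langle d\psi| ,\\
E\big(|d\psi\rangle\langle\psi| + |\psi\rangle\langle d\psi|\big)
  &= \Big(-i[H,\rho] - \tfrac{1}{2}\sum_{j\in I}\{V_j^*V_j,\rho\}\Big)\,dt ,\\
E\big(|d\psi\rangle\langle d\psi|\big)
  &= \sum_{j,k\in I} V_j\rho V_k^*\,E(dX_j\,dX_k)
   = \sum_{j\in I} V_j\rho V_j^*\,dt ,\\
\frac{d\rho}{dt}
  &= -i[H,\rho] + \tfrac{1}{2}\sum_{j\in I}\big([V_j\rho, V_j^*] + [V_j, \rho V_j^*]\big).
\end{align*}
```

The first line is the Itô product rule, the second uses that $\psi(t)$ is independent of the increment $dX_j(t)$, and the last line follows from $[V\rho, V^*] + [V, \rho V^*] = 2V\rho V^* - \{V^*V, \rho\}$.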

Introducing quantum noises, for example, quantum Brownian motions defined in terms of bosonic or fermionic fields and satisfying suitable quantum Itô rules, one can develop the theory of noncommutative stochastic differential equations (NSDE) (Hudson and Parthasarathy 1984). Both eqn [41] and NSDE provide examples of unitary dilations, that is, (physically singular) mathematical constructions of the environment $R$ and the $R$-$S$ coupling which exactly reproduce dynamical semigroups as reduced dynamics [34].


Algebraic Formalism 


In order to describe open systems in the thermodynamic limit (e.g., infinite spin systems) or systems in quantum field theory, one needs a formalism based on C*- or von Neumann algebras. In the C*-algebraic language, by a dynamical semigroup (in the Heisenberg picture) we mean a family $\{T(t); t \ge 0\}$ of linear maps on the unital C*-algebra $\mathcal{A}$ satisfying the following conditions: (1) complete positivity, (2) $T(t)T(s) = T(t+s)$, (3) weak (or strong) continuity, and (4) $T(t)1 = 1$. Assuming the existence of a faithful stationary state $\omega = \omega\circ T(t)$ on $\mathcal{A}$, one can use a Gelfand-Naimark-Segal (GNS) representation $\pi_\omega(\mathcal{A})$ of $\mathcal{A}$ in terms of bounded operators on a suitable Hilbert space $\mathcal{H}_\omega$, with a cyclic and separating vector $\Omega$ satisfying $\omega(A) = \langle\Omega, \pi_\omega(A)\Omega\rangle$ for all $A\in\mathcal{A}$. Then the dynamical semigroup can be defined on the von Neumann algebra $\mathcal{M}$ (obtained as the weak closure of $\pi_\omega(\mathcal{A})$) as $T(t)\pi_\omega(A) = \pi_\omega(T(t)A)$. The Kadison inequality, valid even for 2-positive bounded maps $\Lambda$ on $\mathcal{A}$,

\[ \Lambda(AA^*) \ge \Lambda(A)\Lambda(1)\Lambda(A^*) \qquad [42] \]

implies that $\omega([T(t)A]^*T(t)A) \le \omega(A^*A)$, which allows one to extend the dynamical semigroup to the contracting semigroup $T(t)[\pi_\omega(A)\Omega] = [\pi_\omega(T(t)A)]\Omega$ on the GNS Hilbert space $\mathcal{H}_\omega$. Typically, one tries to define the semigroup in terms of proper limiting procedures $T(t) = \lim_{n\to\infty}T_n(t)$, where $T_n(t)$ is well defined on $\mathcal{A}$. However, the limit may not exist as an operator on $\mathcal{A}$ but can be well defined on the von Neumann algebra $\mathcal{M}$. If not, the contracting semigroup on $\mathcal{H}_\omega$ may still be a useful object.

Although there exists a rich ergodic theory 
of dynamical semigroups for the special types of 


von Neumann algebras, the most difficult problem 
of constructing physically relevant semigroups 
for generic infinite systems remains unsolved 
(Majewski and Zegarlinski 1996, Garbaczewski 
and Olkiewicz 2002). 


Nonlinear Dynamical Semigroups 


The reduced description of many-body classical or quantum systems in terms of single-particle states (probability distributions, wave functions, or density matrices) leads to nonlinear dynamics (e.g., Boltzmann, Vlasov, Hartree, or Hartree-Fock equations) (Spohn 1980, Garbaczewski and Olkiewicz 2002). A large class of nonlinear evolution equations for single-particle density matrices $\rho$ can be written as (Alicki and Lendi 1987)

\[ \frac{d}{dt}\rho = \mathcal{L}[\rho]\rho \qquad [43] \]

where $\sigma\mapsto\mathcal{L}[\sigma]$ is a map from density matrices to semigroup generators of the type [4]. Under certain technical conditions the solution of eqn [43] exists and defines a nonlinear dynamical semigroup, that is, a family $\{\Gamma(t); t \ge 0\}$ of maps on the set of density matrices satisfying the composition law $\Gamma(t+s) = \Gamma(t)\Gamma(s)$.

A simple example is provided by an open $N$-particle system with the total Hamiltonian invariant with respect to particle permutations. The Markovian approximation combined with the mean-field method leads to a nonlinear dynamical semigroup which preserves purity and for initial pure states is governed by the nonlinear Schrödinger equation with the following structure:

\[ \frac{d\psi}{dt} = -i\big(h + NU(\psi)\big)\psi + \frac{N}{2}\sum_j\big(\langle\psi, V_j^*\psi\rangle V_j\psi - \langle\psi, V_j\psi\rangle V_j^*\psi\big) \qquad [44] \]

Here $h$ is a single-particle Hamiltonian, $U(\psi)$ a Hartree potential, and $V_j$ are single-particle operators describing collective dissipation.


See also: Boltzmann Equation (Classical and Quantum); 
Channels in Quantum Information Theory; Evolution 
Equations: Linear and Nonlinear; Kinetic Equations; 
Nonequilibrium Statistical Mechanics (Stationary): 
Overview; Positive Maps on C*-Algebras; Quantum 
Error Correction and Fault Tolerance; Quantum 


Mechanical Scattering Theory; Stochastic Differential 
Equations.


Introduction


In general relativity, the metric is a dynamic entity, 
there is no preferred notion of time, and the theory 
is invariant under diffeomorphisms. Therefore, one 
expects the concept of dynamics to be very different 
from that in mechanical or special relativistic 
systems. Indeed, in a canonical formulation, the 
diffeomorphism symmetry manifests itself through 
the appearance of constraints (see Constrained 
Systems). In particular, in the absence of boundaries, 
the Hamiltonian turns out to be a linear combina- 
tion of them. Thus, the dynamics is completely 
encoded in the constraints. 

To quantize such a system following Dirac, one 
has to define operators corresponding to the 
constraints on an auxiliary Hilbert space. Solutions 
to the quantum dynamics are then vectors that are 
annihilated by all the constraint operators. Techni- 
cal complications can arise, and the solutions might 
not lie in the auxiliary Hilbert space but in an 
appropriately chosen dual. 

Physical observables on the other hand are 
associated with operators on the auxiliary space 
that commute with the constraints or, equivalently, 
operators that act within the space of solutions. 


Since the solutions of the quantum dynamics will 
not depend on any sort of time parameter in an 
explicit way, they cannot be readily interpreted as a 
(quantum) spacetime history. The conceptual ques- 
tions related to this are known as the “problem of 
time” in quantum gravity. 

We should mention that there is a proposal — 
consistent discretizations — that allows us to elimi- 
nate constraints, at the expense of a discretization 
of the classical theory and dynamical specification of 
Lagrange multipliers. Application of this technique 
to gravity is currently under study. 

Loop quantum gravity (LQG) (see Loop Quantum Gravity) is based on the choice of a canonical pair $(A_a, E^a)$ of an SU(2) connection and an su(2)-valued vector density. The constraints come in three classes:

\[ G_i[A,E](x) = 0, \qquad V_a[A,E](x) = 0, \qquad C[A,E](x) = 0 \]

the Gauss, vector, and scalar constraints, respectively.
Before giving some detail about the quantization 
of the constraints and their solutions, we should 
mention that there exists an analogous classical 
formulation in terms of complex (self-dual) vari- 
ables. The quantization in that formulation faces 
serious technical obstacles, but in the case of 
positive cosmological constant an elegant formal 
solution to all the constraints — the Kodama state — 
is known. It is related to the Chern-Simons action 
on the spatial slice.

As said before, strictly speaking, implementing the
dynamics comprises quantizing and satisfying all the 
constraints. Here we will however focus on C since 
it is the most challenging, and most closely related 
to standard dynamics in that it generates changes 
under timelike deformations of the Cauchy surface 
X on which the canonical formulation is based. 

The quantum solutions of the other constraints, linear combinations of s-knots, lie in a Hilbert space $\mathcal{K}_{\rm diff}$ which is part of the dual of the kinematical Hilbert space $\mathcal{K}$ of the theory. For details on these solutions, as well as some basic definitions that will be used without comment below, see Loop Quantum Gravity. Since s-knots are labeled, among other things, by a diffeomorphism equivalence class of a graph, relations to knot theory are emerging at this level (see Knot Invariants and Quantum Gravity).

It is important to note that C does not Poisson- 
commute with the diffeomorphism constraints. 
Therefore, in the quantum theory it does matter in 
which order the constraints are solved. It turns out 
that on the quantum solutions to the other con- 
straints, the scalar constraint can be defined by 
introducing a regulator, and stays well defined even 
when the regulator is removed. This ultraviolet 
finiteness on $\mathcal{K}_{\rm diff}$ can be intuitively understood
from the diffeomorphism invariance of its elements: 
There is no problematic short-distance regime since 
the states do not contain any scale at all. 

In the following we will briefly review the imple- 
mentation of the scalar constraint in LQG and 
comment on some ramifications and open questions. 


The Scalar Constraint Operator 


In the Lorentzian theory the scalar constraint $C$ is the sum of the scalar constraint $C^E$ of the Euclidean theory,

\[ C^E = (\det q)^{-1/2}\,\mathrm{tr}(F_{ab}[E^a, E^b]) \]

a second term of a similar form, but with the curvature $F$ of the connection $A$ replaced by the curvature associated to a certain triad $e$, and possibly matter terms. In the following we will just discuss $C^E$; the other terms can be handled in a similar fashion.

There appear to be a number of obstacles to the quantization of $C^E$: for one, the inverse of the determinant would likely be ill defined, as the volume operator (essentially a quantization of $\int_\Sigma(\det q)^{1/2}$) has a large kernel. In addition, there are no well-defined operators corresponding to $F$ and $E$ evaluated at points. Rather, only holonomies $h_e[A]$ of $A$ along curves $e$ and certain functionals of $E$ are well defined as operators. These issues can however be dealt with in an elegant way as follows.

The first step is to absorb the determinant factor into a Poisson bracket,

\[ C^E = \frac{2}{\kappa}\,\mathrm{tr}(F\wedge\{A, V\}) \]

where $V$ is the volume of the spatial slice $\Sigma$. Then one approximates the curvature by (identity minus) the holonomy around a small loop. In the present case one finds that for a small tetrahedron $\Delta$ with base point $v$, one can approximate

\[ C^E_\Delta(N) := \frac{2}{\kappa}\int_\Delta N\,\mathrm{tr}(F\wedge\{A, V\}) \approx -\frac{2}{3\kappa}\,N(v)\,\epsilon^{ijk}\,\mathrm{tr}\big(h_{\alpha_{ij}}h_{s_k}\{h_{s_k}^{-1}, V\}\big) \qquad [1] \]

where (see Figure 1a) the $s_i$ are edges of $\Delta$ incident at $v$ and the $\alpha_{ij}$ are loops around the faces of $\Delta$ incident at $v$.

This suggests how to define an operator $\hat{C}^E_\Gamma$ that acts on cylindrical functions on a given graph $\Gamma$: one chooses a triangulation adapted to the graph and quantizes the $C^E_\Delta(N)$ (where $\Delta$ is a tetrahedron of this triangulation) using the right-hand side of [1]: holonomies are quantized by the holonomy operators of the quantum theory, $V$ by the volume operator $\hat{V}$, and the Poisson bracket by the corresponding commutator divided by $i\hbar$. To be more precise, the triangulation is chosen such that the $s_i$ in [1] are part of $\Gamma$, and the operators corresponding to the $h_{\alpha_{ij}}$ create new edges that connect the endpoints of the $s_i$ (see Figure 1b).

Still this is not sufficient, since the definition of $\hat{C}^E_\Gamma$ depends quite heavily on the choice of the triangulation, and there is no natural way to choose one. Furthermore, there is no choice that would guarantee that the $\hat{C}^E_\Gamma$ for different $\Gamma$ are consistent in the sense that they correspond to the action of the same operator $\hat{C}^E$ on two different cylindrical subspaces. Here, the diffeomorphism invariance of the theory comes to the rescue: a well-defined operator largely free of ambiguities can be obtained by letting the operators above act (by duality) on $\mathcal{K}_{\rm diff}$ to give elements in $\mathcal{K}^*$. When acting on diffeomorphism-invariant states, the ambiguities in the definition of the triangulations can be eliminated, and the operators $\hat{C}^E_\Gamma$ for different $\Gamma$ are consistent and together define an operator $\hat{C}^E(N)$. Roughly speaking, for a diffeomorphism-invariant state, it does not matter anymore where on the graph the endpoints of the $s_i$ lie and how they are connected to form the loops $\alpha_{ij}$.

Figure 1 (a) A tetrahedron $\Delta$ and its labeling of edges and loops. (b) A tetrahedron $\Delta$ adapted to the edges (dashed lines) of a graph $\Gamma$.

The final picture looks as follows: for each s-knot $s$, the operator gives a sum of terms, one for each vertex of $s$, that is, $\hat{C}^E(N)s = \sum_v\hat{C}_v(N)s$. The terms in this sum are not diffeomorphism invariant. Their evaluation on a spin network $S$ is of the form

\[ (\hat{C}_v(N)s)[S] = \sum_{s'}c_{s'}\,N(x(v))\,s'[S] \qquad [2] \]

where the $s'$ are s-knots that differ from $s$ by the addition or deletion of certain edges, and corresponding changes in coloring (by 1/2) and intertwiners. As an example, Figure 2 schematically depicts the action on a trivalent vertex. The point $x(v)$ at which $N$ is evaluated in the above formula is determined as follows: the evaluation $s'[S]$ is zero unless the graph $\Gamma$ on which $S$ is based is an element of the diffeomorphism equivalence class on which $s'$ is based. $x(v)$ is the position of the vertex $v$ in this element of the equivalence class. Because of this $x(v)$, the action of $\hat{C}^E(N)$ is not diffeomorphism invariant.

Figure 2 A schematic rendering of the action of the operator $\hat{C}_v$ for a trivalent vertex.

Similar techniques give a quantization $\hat{C}$ of the full constraint. The solutions to the constraint can be determined as the vectors $\psi\in\mathcal{K}_{\rm diff}$ that are annihilated by $\hat{C}$ in the sense that $(\hat{C}(N)\psi)[f] = 0$ for all functions $N$ and elements $f$ of $\mathcal{K}$. The solutions are more or less explicitly known; however, the task of interpreting them is a hard one and remains an object of current research.

It should be mentioned that, strictly speaking, one can arrive at several slightly different versions of the constraint operator along the lines sketched above. The quantization ambiguities include changes in the power of the volume operator and the spin quantum number that the constraint creates or annihilates. An interesting check on these quantizations would be to inspect the algebra of constraint operators for anomalies. In the present situation, this can only be carried out to a certain extent, because $\hat{C}$ is defined on diffeomorphism-invariant states. The Poisson bracket between two scalar constraints is proportional to a diffeomorphism constraint, and indeed it turns out that in the quantum theory the commutator of two scalar constraint operators vanishes for quantizations as described above. In that sense they are anomaly free; however, this criterion is not strong enough to distinguish between the candidates.

Recently, a slightly different strategy has been proposed, which, if successfully implemented, would eliminate some of the questions regarding the constraint algebra. The idea is to combine the constraints $C(N)$ for different lapse functions $N$ into one master constraint

\[ M = \int_\Sigma(\det q)^{-1/2}\,C^2\,d^3x \]

$M$ is manifestly diffeomorphism invariant and could replace all the noncommuting constraints $C(N)$, hence simplifying the constraint algebra considerably.

The interpretation of the solutions of all the 
constraints hinges on the construction of observables 
for the theory. This is already a difficult task in the 
classical theory, and thus even more so after quantiza- 
tion. Though there is no general solution to this problem 
available, interesting proposals are being studied. 

Finally, it should be said that the quantization of the scalar constraint can be used to obtain a picture that more closely resembles the standard time evolution in quantum field theory. The (formal) power series expansion of the projector

\[ P = \prod_{x\in\Sigma}\delta(\hat{C}(x)) = \int D[N]\,\exp\Big(i\int_\Sigma N(x)\hat{C}(x)\Big) \]

onto the kernel of $\hat{C}$ can be described by a spin foam model (see Spin Foams).

For further information on the subject of this article 
see the references: Thiemann (to appear), Rovelli 
(2004), and Ashtekar and Lewandowski (2004) for 
general reviews on LQG (with a systematic exposition 
of a large class of quantizations of the scalar constraint 
and their solutions in Ashtekar and Lewandowski 
(2004)); Thiemann (1998) for a seminal work on the 
quantization of the scalar constraint; Rovelli (1999) 
and Reisenberger and Rovelli (1997) on the connec- 
tion to spin foam models; Di Bartolo et al. (2002) on
consistent discretizations; Kodama (1990) and Freidel
and Smolin (2004) on the Kodama state; and 
Thiemann (2003) on the master constraint program. 


See also: Constrained Systems; Knot Invariants and 
Quantum Gravity; Loop Quantum Gravity; Quantum 
Geometry and its Applications; Spin Foams; Wheeler—De 
Witt Theory.


Introduction


Quantum electrodynamics (QED) describes the interaction of the electromagnetic field (EMF) with charged particles. Any physical particle interacts, directly or indirectly, with any other particle (including itself); in the case of the electron, however, at low and medium energy (say, up to a few GeV) the interaction with the EMF is by far the most important, so that QED describes with great precision the dynamics of the electron, and at the same time the electron provides the most stringent tests of QED currently available.

In the various sections of this article we will discuss, in the following order, the origin of QED, the structure of the radiative corrections, the application of QED to various bound-state problems (hydrogen-like atoms, muonium, and positronium), and the anomalous magnetic moments of the leptons (the muon and the electron).


Origin of QED 


The origin of QED can ideally be traced back to the 
very beginning of quantum mechanics, the black- 
body formula by M Planck (1900), which was soon 
understood as pointing to a discretization of the 


energy and momentum associated to the EMF into 
quanta of light or photons (Einstein 1905). 

The quantization of the EMF was first worked out by P Jordan, within the article (1926) by M Born, W Heisenberg, and P Jordan (usually referred to as the Dreimännerarbeit), and then in the paper "The quantum theory of emission and absorption of radiation" by PAM Dirac, commonly considered the beginning of the so-called second quantization formalism.

In the subsequent year (1928) Dirac published the famous equation for the relativistic electron, from which it was immediately deduced, on a firmer basis, that the electron has spin 1/2, that its spin gyromagnetic ratio (the ratio between spin and associated magnetic moment in suitable dimensionless units; see below for more details) is twice the value predicted by classical physics (a result expressed as $g_e = 2$), and that the levels of atomic hydrogen with the same principal quantum number $n$ are not fully degenerate, as in the nonrelativistic limit, but do possess the so-called fine structure splitting. In particular, the energy of the $n = 2$ levels splits into two values, one value for the $2P_{3/2}$ states with total angular momentum $J = 3/2$ and another value for the states $2S_{1/2}$ and $2P_{1/2}$, which have $J = 1/2$; note that the $2S_{1/2}$ and $2P_{1/2}$ states are still degenerate.

Very soon it was realized that Dirac’s equation 
also requires that each particle must be accompanied 
by its antiparticle, with exactly the same mass and 
opposite charge. The antiparticle of the electron, the 
positron, was indeed discovered by C Anderson 
(1932), establishing Dirac’s equation as one of the
cornerstones of theoretical physics. 

All the ingredients needed for the evaluation of 
the perturbative corrections to the QED theory 
(usually called radiative corrections) were already 
present at that moment, but radiative corrections 
were not systematically investigated for several 
years, due perhaps to the length and difficulty of 
the calculations and the absence of important 
disagreements between theoretical predictions and 
experimental results. 

The situation changed in 1947, when two experiments
were carried out, measuring the energy difference
between the $2S_{1/2}$ and $2P_{1/2}$ levels of the
hydrogen atom and the gyromagnetic ratios of the
electron.

Lamb and Retherford (1947), by using the “great
wartime advances in microwave techniques,” succeeded
in establishing that in the hydrogen atom “the
$2\,{}^2S_{1/2}$ state is higher than the
$2\,{}^2P_{1/2}$ by about 1000 Mc/sec.,” while (as
observed above) according to the Dirac theory the two
states are expected to have exactly the same energy.
Subsequent refinements of the experiment
(Triebwasser et al. 1953) gave for the difference
(now referred to as the Lamb shift) the value
$1057.77 \pm 0.10$ MHz, with a relative error of
$1 \times 10^{-4}$.

The authors of the second 1947 experiment 
(Kusch and Foley 1947) measured the frequencies 
associated with the Zeeman splitting of two differ- 
ent states of gallium, finding an inconsistency with 
the theoretical values of the gyromagnetic ratios of 
the electron. More precisely, write the magnetic
moments $\mu_L$, $\mu_s$ associated with the
(dimensionless) orbital and spin angular momenta
$L$, $S$ of the electron as

$$\mu_L = -g_L \frac{e\hbar}{2 m_e c}\, L, \qquad \mu_s = -g_s \frac{e\hbar}{2 m_e c}\, S \qquad [1]$$


where $(-e)$ is the charge of the electron ($e > 0$),
$m_e$ its mass, $c$ the speed of light and $g_L$,
$g_s$, respectively, the orbital and spin gyromagnetic
ratios; the Dirac theory then predicts $g_L = 1$ and
$g_s = 2$, while the results of Kusch and Foley (1947)
gave a discrepancy which could be accounted for by
taking $g_s = 2.00229 \pm 0.00008$ and $g_L = 1$, or
alternatively $g_s = 2$ and
$g_L = 0.99886 \pm 0.00004$. In modern notation the
first conjecture can be rewritten as

$$g_s = g_e = 2(1 + a_e), \qquad a_e = 0.001145 \pm 0.00004 \qquad [2]$$


where $a_e$ is the anomalous magnetic moment (or
magnetic anomaly) of the electron.
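
As a quick numerical check of eqn [2], the following Python sketch converts the Kusch and Foley value of the spin gyromagnetic ratio into the anomaly and compares it with the lowest-order QED value $\alpha/(2\pi)$ (one-loop coefficient 1/2). The value $\alpha \approx 1/137.036$ and all variable names are assumptions made for this illustration; they are not part of the original analysis.

```python
import math

# Kusch and Foley (1947): spin g-factor obtained with g_L fixed to 1
g_s = 2.00229
g_s_err = 0.00008

# Magnetic anomaly from g_s = 2 (1 + a_e)
a_e = (g_s - 2.0) / 2.0
a_e_err = g_s_err / 2.0

# Lowest-order QED value alpha / (2 pi); alpha is an assumed input
alpha = 1.0 / 137.036
a_e_one_loop = alpha / (2.0 * math.pi)

print(f"a_e (Kusch-Foley) = {a_e:.6f} +/- {a_e_err:.5f}")
print(f"alpha / (2 pi)    = {a_e_one_loop:.6f}")
```

Within the quoted experimental error the two numbers agree, which is why the 1947 measurement could be explained by the first radiative correction alone.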

The need to explain the two experimental
results gave rise to a rapid development of covariant
perturbation theory (which replaced the previous
noncovariant “old-fashioned” perturbation theory)
and of renormalization theory, which freed
the perturbative expansion from the divergences
plaguing the older approach, opening the path to the
evaluation of radiative corrections and to the great
success of the precision predictions of QED.

The formalism improved quickly, evolving into
the more general quantum field theory (QFT)
approach; three of the main contributors were
Sin-Itiro Tomonaga, Julian Schwinger, and Richard
P Feynman, awarded a few years later (1965) the
Nobel Prize “for their fundamental work in quantum
electrodynamics, with deep-ploughing consequences
for the physics of elementary particles.” QFT was then
successfully used for describing the weak interactions
in the electroweak model and later on also for the
theory of strong interactions, dubbed quantum
chromodynamics (or QCD, in analogy with the popular
QED acronym). For more details and references to
original works, the reader is invited to look at any
treatise on QED or QFT, such as, for instance,
Weinberg (1995).

Initially, the Lamb shift was perhaps more 
important than the electron magnetic anomaly both 
for the establishment of renormalization theory and 
as a test of QED, but in the following years it was 
supplanted by the latter as a precision test of QED. 

In 1947 the “best values” for some fundamental
constants were indeed

$$c = (2.99776 \pm 0.00004) \times 10^{10}\ \mathrm{cm\,s^{-1}}$$

$$R_\infty = \frac{m_e c\, \alpha^2}{2h} = 109\,737.303 \pm 0.017\ \mathrm{cm^{-1}} \qquad [3]$$

$$1/\alpha = 137.030 \pm 0.016$$

where $R_\infty$ is the Rydberg constant for infinite
mass, $h$ the Planck constant, and $\alpha$ the fine
structure constant (let us observe here in passing
that $R_\infty$ was and is still known much better
than the separate values of $m_e$, $\alpha$, and $h$
entering its definition); for comparison, the current
(2005) values for $c$ and $R_\infty$ are

$$c = 299\,792\,458\ \mathrm{m\,s^{-1}}$$

$$R_\infty = 109\,737.315\,685\,25(73)\ \mathrm{cm^{-1}} \qquad [4]$$

where the value of $c$ is exact (it is in fact the
definition of the meter), and the relative error in
$R_\infty$ is $6.6 \times 10^{-12}$ (the value of
$\alpha$ will be discussed later).

The measurement of the Lamb shift, repeated
several times, gave results in nice agreement with
the original value, and for several years it
provided either a test of QED or a precise value for
$\alpha$. But the Lamb shift is the energy difference
between the metastable level $2S_{1/2}$ (whose
lifetime is about 1/7 s) and the $2P_{1/2}$ level,
which has a lifetime of about 1.596 ns or a natural
linewidth of 99.7 MHz. Such a large linewidth poses a
strong intrinsic limitation to the precision
attainable in the measurement of the Lamb shift, which
is just ten times larger; as a matter of fact, that
precision could never reach the $1 \times 10^{-6}$
relative error level, while in the meantime the
relative precision in $a_e$ reached the $10^{-9}$
range, replacing the Lamb shift in the role of
the leading quantity in high-precision QED.
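
The relation between the lifetime and the natural linewidth quoted above can be checked with a few lines of Python. The sketch assumes the standard relation (full width at half maximum) $\Delta\nu = 1/(2\pi\tau)$ and a rounded Lamb shift of 1057.8 MHz; the variable names are illustrative only.

```python
import math

tau_2p = 1.596e-9          # lifetime of the 2P1/2 level in seconds (from the text)
lamb_shift_mhz = 1057.8    # rounded 2S1/2-2P1/2 splitting in MHz

# Natural linewidth of a level with lifetime tau, converted from Hz to MHz
linewidth_mhz = 1.0 / (2.0 * math.pi * tau_2p) / 1e6

# How many linewidths fit into the Lamb shift
ratio = lamb_shift_mhz / linewidth_mhz

print(f"natural linewidth: {linewidth_mhz:.1f} MHz")
print(f"Lamb shift / linewidth: {ratio:.1f}")
```

A relative precision of $10^{-6}$ on the Lamb shift would therefore require locating the line center to about one part in $10^5$ of its width.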


The Structure of Radiative Corrections 


For obvious reasons of space we can only sketch
superficially here the lines along which the
perturbative expansion of QED leading to the
evaluation of radiative corrections can be built,
considering for simplicity only the photon and the
electron. One can start from a QED Lagrangian,
formally similar to the classical Lagrangian,
involving the electron field and the vector potential
of the electromagnetic (or photon) field. The theory
is a gauge theory (its physical content does not
change if a gradient is added to the vector
potential); it is further an abelian gauge theory, as
the EMF does not interact directly with itself.

The QED Lagrangian is separated into a free part 
and an interaction part. From the free part, one 
derives the wave functions of the free-particle states 
and the corresponding time-evolution operators 
(free Green’s functions or propagators; let us just 
recall here that to obtain a convenient photon 
propagator one has to break the gauge invariance 
by adding to the Lagrangian a suitable gauge- 
breaking term), while the interaction part of the 
Lagrangian gives the “interaction vertices” of the 
theory. 

The aim of the theory is to build the Green’s
functions for the various processes in the presence of
the interaction; from these Green’s functions, one
then derives all the physical quantities of interest.

With the free propagators and the interaction
vertices, one generates the perturbative expansion of
the Green’s functions. The result, namely the
contributions to the perturbative expansion (or
radiative corrections), can be depicted in terms of
Feynman graphs: they consist of various particle
lines joined at the interaction vertices, with external
lines corresponding to the initial and final particles
and internal lines corresponding to intermediate or
virtual particle states. Each graph stands for an
integral over the momenta of all the intermediate
states, each vertex implying among other things an
interaction constant, which is $(-e)$ in the case of
electron QED, and a $\delta$-function imposing the
conservation of the momenta at that vertex. For
each process, the Feynman graphs are naturally
classified by the total number of interaction
vertices they contain. In the simplest graphs for a
given process (the so-called tree graphs) the
$\delta$-functions at the vertices make the
integrations trivial; but when the number of vertices
increases, closed loops of virtual particle states
appear, whose evaluation quickly becomes extremely
demanding. In QED, each loop gives an extra factor
$(-e)^2$ with respect to the tree graph; it is
customary to express it in terms of
$(\alpha/\pi) = (e/2\pi)^2$, so that the resulting
power of $(\alpha/\pi)$ corresponds to the number of
internal loops. The typical QED prediction for a
physical quantity is then expressed as a series of
powers of the fine structure constant $\alpha$ (and of
its logarithm in bound-state problems). As $\alpha$ is
small ($\alpha \approx 1/137$), and the first
coefficients of the expansions are usually of the
order of 1, a small number of terms in the expansion
is in general sufficient to match the precision of the
available experimental data.
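
To make the last statement concrete, the following Python sketch lists the size of $(\alpha/\pi)^l$ for the first few loop orders, i.e. the weight multiplying a coefficient of order 1 at each order. The numerical value of $\alpha$ and the names used are assumptions chosen for the illustration.

```python
import math

ALPHA = 1.0 / 137.036                 # assumed value of the fine structure constant
expansion_parameter = ALPHA / math.pi

# Weight (alpha/pi)^l carried by the l-loop term, for l = 1..5
loop_weights = [expansion_parameter ** loops for loops in range(1, 6)]

for loops, weight in enumerate(loop_weights, start=1):
    print(f"{loops} loop(s): (alpha/pi)^{loops} = {weight:.3e}")
```

Each additional loop suppresses the contribution by more than two orders of magnitude, so that with order-one coefficients four loops already correspond to effects of a few parts in $10^{11}$.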

But the number of different graphs for a given 
number of loops grows quickly with the number of 
the loops; in turn, each graph consists in general of a 
great number of terms and the loop integrations 
become prohibitively difficult when the number of 
loops increases, so that the evaluation of radiative 
corrections proved to be one of the major computa- 
tional challenges of theoretical physics. As a matter 
of fact, it prompted the development of computer 
programs (Veltman 1999) for processing the huge 
algebraic expressions usually encountered, and of 
many sophisticated numerical and analytical techni- 
ques for performing the loop integrations. 

It should be further mentioned here that Feynman
graphs written by naively following the rules sketched
above are often mathematically ill-defined,
taking the form of nonconvergent integrals over the
loop momenta. A regularization procedure is needed
to give an unambiguous meaning to all the integrals;
currently the most powerful regularization is the
continuous dimensional regularization scheme, in
which the loop integrations are carried out in $d$
continuous dimensions, with $d$ unspecified;
renormalization counterterms are also evaluated in the
same scheme, and the physical quantities are
recovered in the $d \to 4$ limit (unrenormalized loop
integrals and renormalization counterterms are
usually singular as powers of $1/(d-4)$ in the
$d \to 4$ limit, but all those divergences cancel out
in the physical combinations of interest).

QED describes the main interaction of the
charged leptons ($e$, $\mu$, and $\tau$) which have,
however, weak interactions as well. Strictly speaking,
pure QED processes do not exist; it is an essential
feature of QFT that any existing particle can
contribute to the Feynman graphs for any process, when
the approximation is pushed to a sufficiently high
degree. In particular the photon, which is the main
carrier of the QED interaction, is directly coupled
also to the strongly interacting particles (the
resulting contributions are referred to as “hadronic
vacuum polarization” effects).

The precision tests of QED must then be sought
in those phenomena where non-QED contributions are
presumably small and which involve quantities already
well known independently of QED itself. But such
high-precision quantities are not always available,
and as QED is known better than the rest of physics,
very often it is taken to be correct by assumption,
and used as a tool for extracting or measuring some of
the non-QED quantities relevant to various physical
processes.

In any case, as QED predictions are expressed in
terms of the fine structure constant $\alpha$, a
determination of $\alpha$ independent of QED is
needed; without it, the most precise predictions of
QED would simply become measurements of $\alpha$ and
not tests of the theory.

Finally, it is to be recalled that, ironically, the
problem of the convergence of the expansion in
powers of $\alpha$ is still open, even if it is
commonly accepted that convergence problems will
matter only for precisions and corresponding
perturbative orders (say at order $1/\alpha \approx 137$)
absolutely out of reach of present experimental and
computational possibilities, involving further
extremely high energies, where the other fundamental
interactions are expected to be as important as QED,
so that it would be meaningless to consider only QED.

In the following we will discuss only the QED
predictions for bound states and the anomalous
magnetic moments of $\mu$ and $e$.


The Bound States 


A very good review of the current status of the theory
of hydrogen-like atoms can be found in Eides et al.
(2001), to which we refer for more details and for
citations of the original papers. The starting point
for studying the bound-state problem in QED is the
scattering amplitude of two charged particles,
predicted by perturbative QED (pQED) as a (formal)
series expansion in powers of $\alpha$. In the static
limit $v \to 0$, where $v$ is the relative velocity of
the two particles, some of the pQED terms behave as
$\alpha/v$, so that the naive expansion in $\alpha$
becomes meaningless. Fortunately, it is relatively
easy to identify the origin of those terms (which are
essentially due to the Coulomb interaction between the
two charges) and to devise techniques for their
resummation. Among them, one can quote the
Bethe-Salpeter equation, formally very elegant and
complete but difficult to use in practice. Great
progress has been achieved with the NRQED
(nonrelativistic QED) approach, which is a
nonrelativistic theory designed to reproduce the
full QED scattering amplitude in the nonrelativistic
limit by the ad hoc definition, a posteriori, of a
suitable effective Hamiltonian. The Hamiltonian is
then divided into a part containing the Coulomb
interaction, which is treated exactly and which gives
rise to the bound states, and all the rest, to be
treated perturbatively. The power of the NRQED
approach was further boosted by the continuous
dimensional regularization technique for Feynman
graph integrals.

Traditionally, the results are expressed in terms of
the energies of the bound states, but as in practice
the precise measurements concern the transition
frequencies between various levels, it is customary
to express any energy contribution to some level, say
$\Delta E$, also in terms of the associated frequency
$\nu = \Delta E/h$, where $h$ is the Planck constant.


The Hydrogen-Like Atoms 


Quite generally, a hydrogen-like atom consists of a
single electron bound to a positively charged
particle, which is a proton for the hydrogen atom, a
deuteron for deuterium, a helium nucleus for a
$\mathrm{He}^+$ ion, a positive muon $\mu^+$ for
muonium, or a positron for positronium. Even if QED
alone is not sufficient to treat the dynamical
properties of the nuclei, their strong interactions
can be described by introducing suitable form factors
and a few phenomenological parameters; weak
interactions could be treated perturbatively, but are
not yet required at the precision levels achieved so
far.

The QED results for the hydrogen-like atoms can
be expressed in terms of the mass $M$ of the positive
particle and of its charge $Ze$ (of course $Z = 1$ for
hydrogen). When the electron mass $m_e$ is much
smaller than $M$ (which is always the case, except for
positronium) one can take as a starting point
the QED electron moving in the external field of the
positive particle, and treat all the other aspects of
the relativistic two-body problem (the so-called
recoil effects) perturbatively in $m_e/M$.

Neglecting the spin of the positive particle, the
energy levels of the hydrogen-like atom are
identified by the usual principal quantum number $n$,
the orbital angular momentum $l$ (with the convention
of writing S, P, D, ... instead of $l = 0$, $l = 1$,
$l = 2$, ...) and $j$, the total angular momentum
including the spin of the electron. It turns out that
the energies of the bound levels receive very many
contributions of different kinds; dropping quantum
number indices for simplicity, the energy levels can
be written as an expression of the form

$$E = m_r c^2 (Z\alpha)^2 \left[ -\frac{1}{2n^2} + (Z\alpha)^2 f_4 + (Z\alpha)^4 f_6 + \cdots \right] + \Delta E_{\rm rad} + \Delta E_{\rm rec} + \Delta E_{\rm nucl} + \cdots \qquad [5]$$


Let us observe that it is convenient to write
explicitly the $Z$ factors even when $Z = 1$, for a
better bookkeeping of the various corrections. As
usual, $m_r$ is the reduced mass of the electron,
$m_r = m_e M/(m_e + M)$, the mass of the nucleus being
$M$; the first term in the square bracket, the
familiar Balmer term in $1/n^2$, is by far the
dominant one, giving for the $n = 1$ level in the
$Z = 1$ case an energy of about 13.6 eV or a
corresponding frequency of $3.3 \times 10^{15}$ Hz.
The other terms in the square bracket, $f_4$ and
$f_6$, are known coefficients (depending also on the
small parameter $m_e/M$; $f_4$ is essentially the
fine structure).
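
The size of the Balmer term can be verified numerically. The Python sketch below assumes that the leading term has the familiar form $m_r c^2 (Z\alpha)^2/(2n^2)$ (taken in absolute value, as a binding energy) and uses rounded values of the constants; both the constants and the function name are assumptions made for the illustration.

```python
# Assumed inputs (rounded values of the constants, not taken from the article)
ALPHA = 1.0 / 137.036          # fine structure constant
M_E_C2_EV = 510998.95          # electron rest energy m_e c^2 in eV
H_EV_S = 4.135667696e-15       # Planck constant in eV s
MP_OVER_ME = 1836.15267        # proton-to-electron mass ratio


def balmer_binding_energy_ev(n, z=1, mass_ratio=MP_OVER_ME):
    """Leading binding energy m_r c^2 (Z alpha)^2 / (2 n^2) in eV."""
    reduced_mass_factor = mass_ratio / (1.0 + mass_ratio)   # m_r / m_e
    return reduced_mass_factor * M_E_C2_EV * (z * ALPHA) ** 2 / (2.0 * n ** 2)


energy_1s = balmer_binding_energy_ev(1)
frequency_1s = energy_1s / H_EV_S      # associated frequency nu = E / h

print(f"1S binding energy: {energy_1s:.3f} eV")
print(f"associated frequency: {frequency_1s:.3e} Hz")
```

The same function with `mass_ratio` set to 1 gives the positronium scale, half of the infinite-mass value.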

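The term $\Delta E_{\rm rad}$ is the bulk of the
radiative QED corrections; it can be written as a
multiple expansion in $(Z\alpha)$, $\alpha$ and
$L = \ln[1/(Z\alpha)^2]$, which turns out to have the
following explicit form:

$$\Delta E_{\rm rad} = m_e c^2 \frac{(Z\alpha)^4}{n^3} \Big\{ \left(\frac{\alpha}{\pi}\right) \left[ A_{41} L + A_{40} + (Z\alpha) A_{50} + (Z\alpha)^2 \left( A_{62} L^2 + A_{61} L + A_{60} \right) + \cdots \right]$$
$$\qquad + \left(\frac{\alpha}{\pi}\right)^2 \left[ B_{40} + (Z\alpha) B_{50} + (Z\alpha)^2 \left( B_{63} L^3 + B_{62} L^2 + B_{61} L + B_{60} \right) + \cdots \right]$$
$$\qquad + \left(\frac{\alpha}{\pi}\right)^3 \left[ C_{40} + (Z\alpha) C_{50} + \cdots \right] + \cdots \Big\} \qquad [6]$$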


The first index of the coefficients refers to the
power of $(Z\alpha)$, the second to the power of $L$;
as a rule, there are three powers of $(Z\alpha)$ due
to the normalization of the wave function and one
power of $(Z\alpha)$ for each interaction with the
nucleus (in the leading term of eqn [5] one must
subtract two powers of $(Z\alpha)$ due to the
long-range nature of the Coulomb interaction), while
the terms in $L = \ln[1/(Z\alpha)^2]$ are related to
the infrared divergences of the scattering amplitude,
with the binding energy acting as infrared cutoff. The
A-coefficients refer to order $(\alpha/\pi)$, or
one-loop virtual corrections (we do not distinguish
here between one-loop self-mass and
vacuum-polarization contributions, as is usually done
in the literature), the B-coefficients to two loops,
etc. The coefficients are pure numbers, entirely
determined within QED, even if their actual
calculation is an extremely demanding task. One of the
first results, obtained in 1947, was
$A_{41} = (4/3)\delta_{l0}$, contributing to the 2S
but not to the 2P states (quite generally, most
corrections are much bigger for $l = 0$ states than
for higher-angular-momentum states), which is
sufficient to give the right order of magnitude of the
($2S_{1/2}$-$2P_{1/2}$) Lamb shift (about 1000 MHz).
The other coefficients are now known, thanks to the
strenuous and continued efforts since then (Eides
et al. 2001), which cannot be properly reviewed here
in any detail. The current frontier of the theoretical
calculation (around the dots in the previous formula)
corresponds to 8-9 total powers of $(\alpha/\pi)$ and
$(Z\alpha)$, or some kHz for the 1S state.
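
The order-of-magnitude statement about the leading logarithm can be reproduced with a short Python sketch. It assumes the standard form of the leading term for S states, $(\alpha/\pi)(Z\alpha)^4 m_e c^2\, n^{-3} A_{41} L$ with $A_{41} = 4/3$, neglects reduced-mass factors and every nonlogarithmic coefficient, and uses rounded constants; it is an illustration, not the full calculation.

```python
import math

# Assumed inputs (rounded values of the constants)
ALPHA = 1.0 / 137.036          # fine structure constant
M_E_C2_EV = 510998.95          # electron rest energy in eV
H_EV_S = 4.135667696e-15       # Planck constant in eV s


def leading_log_shift_mhz(n, z=1):
    """Leading-logarithm one-loop shift of an nS level, in MHz."""
    a41 = 4.0 / 3.0                                  # coefficient for l = 0
    log_term = math.log(1.0 / (z * ALPHA) ** 2)      # L = ln[1/(Z alpha)^2]
    energy_ev = ((ALPHA / math.pi) * (z * ALPHA) ** 4
                 * M_E_C2_EV / n ** 3 * a41 * log_term)
    return energy_ev / H_EV_S / 1e6


shift_2s = leading_log_shift_mhz(2)
print(f"leading-log shift of the 2S level: {shift_2s:.0f} MHz")
```

The single term overshoots the measured splitting by roughly a quarter; the remaining coefficients of the expansion account for the difference.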

The next term in eqn [5], $\Delta E_{\rm rec}$,
contains contributions of order
$m_r c^2 (Z\alpha)^2 (m_e/M)$ or smaller (some care
must be taken in classifying the contributions of
order $m_e/M$, which can be accounted for by proper
use of $m_r$ rather than $m_e$, and genuine $m_e/M$
contributions), and are sufficiently known for
practical purposes; the same is true for many other
contributions discussed in Eides et al. (2001) and
skipped in eqn [5]. A troublesome contribution comes
however from $\Delta E_{\rm nucl}$; at leading order,
one has

$$\Delta E_{\rm nucl} = \frac{2 (Z\alpha)^4}{3 n^3}\, m_r c^2 \left( \frac{m_r c R_p}{\hbar} \right)^2 \delta_{l0}$$

where $R_p$ is the so-called root-mean-square charge
radius of the proton, which is not well known
experimentally (in the literature, there are indeed
two direct measurements, $R_p = 0.805(11)$ fm and
$R_p = 0.862(12)$ fm, in poor agreement with each
other; a new independent measurement is strongly
needed).
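
To see why the uncertainty on the proton radius is troublesome, the Python sketch below evaluates the leading finite-size shift of the 1S level for the two quoted radii. It assumes the standard leading-order expression $\frac{2}{3}(Z\alpha)^4 n^{-3}\, m_r c^2\, (m_r c R_p/\hbar)^2$ for S states and rounded values of the constants; all names are illustrative.

```python
# Assumed inputs (rounded values of the constants)
ALPHA = 1.0 / 137.036            # fine structure constant
M_E_C2_MEV = 0.51099895          # electron rest energy in MeV
HBARC_MEV_FM = 197.3269804       # hbar c in MeV fm
H_EV_S = 4.135667696e-15         # Planck constant in eV s
MP_OVER_ME = 1836.15267          # proton-to-electron mass ratio


def nuclear_size_shift_khz(radius_fm, n=1, z=1):
    """Leading finite-size shift of an nS level in kHz."""
    m_r_c2 = M_E_C2_MEV * MP_OVER_ME / (1.0 + MP_OVER_ME)   # reduced mass, MeV
    x = m_r_c2 * radius_fm / HBARC_MEV_FM                    # m_r c R / hbar
    energy_mev = (2.0 / 3.0) * (z * ALPHA) ** 4 / n ** 3 * m_r_c2 * x ** 2
    return energy_mev * 1e6 / H_EV_S / 1e3                   # MeV -> eV -> Hz -> kHz


shift_a = nuclear_size_shift_khz(0.805)
shift_b = nuclear_size_shift_khz(0.862)
spread = shift_b - shift_a

print(f"R_p = 0.805 fm: {shift_a:.0f} kHz")
print(f"R_p = 0.862 fm: {shift_b:.0f} kHz")
print(f"difference:     {spread:.0f} kHz")
```

The two radii differ by more than 100 kHz in the 1S level, far above the few-kHz accuracy reached by the purely QED part of the calculation.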


The hyperfine splitting. The effect of the
interaction of the electron with the spin of the
positive particle introduces the so-called hyperfine
splitting of all the levels. The order of magnitude of
the hyperfine splitting of the 1S state is given by
the Fermi energy

$$E_F = \frac{4}{3}\, m_e c^2 (Z\alpha)^4\, g_p\, \frac{m_e}{m_p}$$

where $g_p \approx 5.586$ is the g-factor of the
proton, which gives about 1.42 GHz. It was dubbed
hyperfine because it is smaller than the fine
structure terms by the factor $m_e/m_p$. Many classes
of corrections can be worked out, with patterns
similar to those of the previous subsection, and also
in this case the nuclear contributions (this time
mainly due to the theoretically unknown magnetic form
factor and the so-called polarizability of the proton)
prevent one from obtaining predictions with an error
less than 1 kHz (or a relative precision better than
$1 \times 10^{-6}$).
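
The quoted value of the Fermi energy follows directly from the formula. The Python sketch below evaluates $\frac{4}{3} m_e c^2 (Z\alpha)^4 g_p\, m_e/m_p$ with rounded values of the constants, which are assumptions of the illustration; reduced-mass and radiative corrections are ignored.

```python
# Assumed inputs (rounded values of the constants)
ALPHA = 1.0 / 137.036          # fine structure constant
M_E_C2_EV = 510998.95          # electron rest energy in eV
H_EV_S = 4.135667696e-15       # Planck constant in eV s
MP_OVER_ME = 1836.15267        # proton-to-electron mass ratio
G_P = 5.586                    # proton g-factor (from the text)


def fermi_frequency_ghz(z=1):
    """Fermi energy (4/3) m_e c^2 (Z alpha)^4 g_p m_e/m_p, as a frequency in GHz."""
    energy_ev = (4.0 / 3.0) * M_E_C2_EV * (z * ALPHA) ** 4 * G_P / MP_OVER_ME
    return energy_ev / H_EV_S / 1e9


nu_fermi = fermi_frequency_ghz()
print(f"Fermi frequency: {nu_fermi:.3f} GHz")
```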


The comparison with the experiments. Experimentally,
one measures transition frequencies among the various
levels. For many years the precision record was held
by the hyperfine splitting of the ground state of
hydrogen, $\nu_{\rm hfs}(1S)$, measured long ago (see
Hellwig et al. (1970) and Essen et al. (1971)),

$$\nu_{\rm hfs}(1S) = 1\,420\,405.751\,7667(9)\ \mathrm{kHz} \qquad [7]$$

with a relative error $6 \times 10^{-13}$. The current
record in the optical range is the value of the
(1S-2S) hydrogen transition frequency, obtained by
means of two-photon Doppler-free spectroscopy (Niering
et al. 2000),

$$\nu(1S\text{-}2S) = 2\,466\,061\,413\,187.103(46)\ \mathrm{kHz} \qquad [8]$$

with a relative precision $1.9 \times 10^{-14}$; other
optical transitions, such as (2S-8D) and (2S-12D), are
measured with a precision of about $1 \times 10^{-11}$.
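
The relative errors quoted for eqns [7] and [8] follow from the digits in parentheses, which give the uncertainty on the last digits of the value. A minimal Python check (the helper name is an assumption of the illustration):

```python
def relative_error(value, uncertainty):
    """Relative error of a measurement given its absolute uncertainty."""
    return uncertainty / abs(value)


# eqn [7]: 1 420 405.751 7667(9) kHz -> uncertainty 0.000 0009 kHz
rel_hfs = relative_error(1420405.7517667, 0.0000009)

# eqn [8]: 2 466 061 413 187.103(46) kHz -> uncertainty 0.046 kHz
rel_1s2s = relative_error(2466061413187.103, 0.046)

print(f"hyperfine splitting: {rel_hfs:.1e}")
print(f"1S-2S transition:    {rel_1s2s:.1e}")
```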

The measurement of the Lamb shift was repeated
several times, with results in nice agreement with the
original value, such as that of Lundeen and Pipkin
(1986), 1057.845(9) MHz. The most precise value,
$1057.8514 \pm 0.0019$ MHz, was given in Palchikov
et al. (1985) (the result depends, however, on the
theoretical value of the lifetime, and should be
changed into $1057.8576 \pm 0.0021$ MHz according to
subsequent analysis; see Karshenboim (1996)). The
experimental ($2S_{1/2}$-$2P_{1/2}$) Lamb shift was
also obtained as the difference between the measured
fine structure separation ($2P_{3/2}$-$2S_{1/2}$) and
the theoretical value of the ($2P_{3/2}$-$2P_{1/2}$)
frequency, and the radiative corrections
$\Delta E_{\rm rad}$ to any level are now referred to
as the Lamb shift of that level.

As a somewhat disappointing conclusion, the wonderful
experimental results of eqns [7] and [8] cannot
be used as a high-precision test of the theory or to
obtain precise values of many fundamental constants,
as the theoretical calculations depend,
unfortunately, on hadronic quantities which are not
known accurately. Combining theoretical predictions,
the above transitions and Lamb shift data, and
the available values of $\alpha$ and $m_e/m_p$, one
can indeed obtain a determination of $R_p$
($R_p = 0.883 \pm 0.014$ fm, according to Melnikov and
van Ritbergen (2000)) and the value of $R_\infty$
already quoted above.


Muonium 


Muonium is the bound state of a positive muon
$\mu^+$ and an electron. Unlike the proton, the
$\mu^+$ is a lepton and has no strong interactions, so
that the $\mu^+ e^-$ system can be studied
theoretically within pure QED, with the weak
interactions giving a known and small perturbation.
Further, the ratio of the masses
$m_e/m_\mu \approx 4.8 \times 10^{-3}$ is small, so
that the external field approximation holds. However,
the $\mu$ is unstable (lifetime about 2.2 $\mu$s),
which makes experiments more difficult to carry out.
The best measured quantity is the hyperfine splitting
of the 1S ground state (see Liu et al. (1999))

$$\nu_{\rm hfs}(\mu^+ e^-, 1S) = 4\,463\,302\,765(53)\ \mathrm{Hz}$$

with a relative precision of $12 \times 10^{-9}$. The
theoretical treatment is similar to the case of
hydrogen, with the important advantage that nuclear
interactions are absent and everything can be
evaluated within QED, so that the bulk of the
contribution is given by a formula with the structure
of eqn [6]. But the prediction depends, in any case,
on the $m_e/m_\mu$ mass ratio, which is not known with
the required precision. Indeed, a recent theoretical
calculation (Czarnecki et al. 2002) (which includes
also a contribution of 0.233(3) kHz from hadronic
vacuum polarization) gives
4 463 302 680(510)(30)(220) Hz, where the first (and
biggest) error comes from $m_e/m_\mu$, the second from
$\alpha$, and the third is the theoretical error (an
estimation of higher-order contributions not yet
evaluated).
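
The comparison between theory and experiment for the muonium hyperfine splitting can be summarized numerically. The Python sketch below combines the three theoretical errors in quadrature, which assumes that they are independent (an assumption of this illustration, not a statement of the original analysis), and expresses the difference in units of the combined error.

```python
import math

# Values quoted in the text, in Hz
experiment_hz = 4463302765
experiment_error_hz = 53
theory_hz = 4463302680
theory_errors_hz = (510, 30, 220)   # mass ratio, alpha, uncalculated higher orders

# Quadrature sum of the theory errors (assumes they are uncorrelated)
theory_total_error = math.sqrt(sum(err ** 2 for err in theory_errors_hz))

difference = experiment_hz - theory_hz
combined_error = math.hypot(theory_total_error, experiment_error_hz)
pull = difference / combined_error

print(f"total theory error: {theory_total_error:.0f} Hz")
print(f"experiment - theory: {difference} Hz ({pull:.2f} combined sigma)")
```

The agreement is well within the errors, and the total is dominated by the mass-ratio uncertainty, ten times larger than the experimental error.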


Positronium 


Positronium is the bound state of an electron
and a positron. Theoretically, it is an ideal system to
study, as it can be described entirely within QED,
without any unknown parameter of non-QED
origin. As the masses of the two constituents,
positron and electron, are strictly equal, the reduced
mass of the system is exactly equal to half of the
electron mass, $m_r = m_e/2$, and the energy scale of
the bound states is half of $R_\infty$.

Unlike the muonium case, the external
field approximation is not valid, so that positronium
must be treated with the full two-body bound-state
machinery of QFT, of which it provides an excellent
test (Karshenboim 2004).

Experimentally, radioactive positron sources are
available, so that positronium is easier to produce
than muonium. It is, however, unstable; states with
total spin $S$ equal to 0 (also called parapositronium
states) annihilate into an even number (mainly two)
of gammas, and states with $S = 1$ (orthopositronium)
into an odd number (mainly three) of gammas, with
short lifetimes (which make precise measurements
difficult). Further, as positronium is the lightest
atom, Doppler-broadening effects are very important,
reducing the precision of spectroscopic
measurements.


Positronium decay rates. There has been a
long-standing discrepancy between theory and
experiment in the decay rate of ground-state
orthopositronium, which prompted thorough theoretical
investigations looking for errors in the calculations
or flaws in the formalism, but it turned out that the
flaw was on the experimental side. The current
theoretical prediction for the ground-state $S = 1$
decay rate is (Adkins et al. 2002)

$$\Gamma = \Gamma_0 \left[ 1 + A \frac{\alpha}{\pi} + \frac{\alpha^2}{3} \ln\alpha + B \left(\frac{\alpha}{\pi}\right)^2 - \frac{3\alpha^3}{2\pi} \ln^2\alpha + C\, \frac{\alpha^3}{\pi} \ln\alpha \right] = 7.039979(11)\ \mu\mathrm{s}^{-1}$$

where $\Gamma_0 = 2(\pi^2 - 9)\, m_e \alpha^6/(9\pi) = 7.2111670(1)\ \mu\mathrm{s}^{-1}$,
$A = -10.286606(10)$, $B = 45.06(26)$, $C = -5.517$, in
nice agreement with the less precise experimental
result of Karshenboim (2004, ref. 38),
7.0404(10)(8) $\mu\mathrm{s}^{-1}$. As a curiosity, the
coefficients $A$, $B$ above are among the largest
coefficients to have appeared so far in QED radiative
corrections.
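
The numerical value of the orthopositronium rate can be reproduced from the coefficients quoted above. The Python sketch below assumes the series has the form $\Gamma_0[1 + A(\alpha/\pi) + (\alpha^2/3)\ln\alpha + B(\alpha/\pi)^2 - (3\alpha^3/2\pi)\ln^2\alpha + C(\alpha^3/\pi)\ln\alpha]$ and uses $\alpha \approx 1/137.036$; both the assumed form and the value of $\alpha$ should be checked against Adkins et al. (2002) before any serious use.

```python
import math

ALPHA = 1.0 / 137.036    # assumed value of the fine structure constant
GAMMA_0 = 7.2111670      # lowest-order rate in inverse microseconds (from the text)
A = -10.286606           # one-loop coefficient
B = 45.06                # two-loop coefficient
C = -5.517               # coefficient of the alpha^3 ln(alpha) term


def orthopositronium_rate(alpha=ALPHA):
    """Ground-state orthopositronium decay rate in inverse microseconds."""
    x = alpha / math.pi
    log_alpha = math.log(alpha)
    bracket = (1.0
               + A * x
               + (alpha ** 2 / 3.0) * log_alpha
               + B * x ** 2
               - (3.0 * alpha ** 3 / (2.0 * math.pi)) * log_alpha ** 2
               + C * (alpha ** 3 / math.pi) * log_alpha)
    return GAMMA_0 * bracket


rate = orthopositronium_rate()
print(f"orthopositronium decay rate: {rate:.6f} per microsecond")
```

The one-loop term alone lowers the rate by about 2.4%, a direct consequence of the large coefficient $A$.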

The agreement between theory and experiment for
the ground-state parapositronium decay rate has
always been good; the current status is
7990.9(1.7) $\mu\mathrm{s}^{-1}$ for the experimental
result (Karshenboim 2004, ref. 41) and
7989.64(2) $\mu\mathrm{s}^{-1}$ for the theoretical
prediction (Karshenboim 2004, ref. 43).


Positronium levels. The quantum-number structure
of the levels is similar to that of muonium, with the
important difference, however, that the hyperfine
splitting (which in hydrogen or muonium is small
because it is proportional to the ratio of the masses of
the two components) is in fact of the same order as
the fine structure. The theoretical evaluation of the
energy levels provides a very stringent check of QED
and of the overall treatment of the bound-state
problem. Corrections have been evaluated, typically,
up to order $m c^2 \alpha^6$. The best-known quantities are
the ground state (hyper)fine splitting, with
experimental value (Ritter et al. 1984)
203.38910(74) GHz ($3.6 \times 10^{-6}$ relative
error) and theoretical value (Karshenboim 2004)
203.3917(6) GHz, and the 1S-2S transition for
orthopositronium, with experimental value (Fee et al.
1993) 1 233 607 216.4(3.2) MHz and theoretical value
1 233 607 222.2(6) MHz. The general agreement is good;
the precisions achieved are, however, not yet
sufficient to allow a determination of $R_\infty$ or
$\alpha$ competitive with other measurements.


The Anomalous Magnetic Moments 
of Leptons 


The precision of the measurements requires, for both
the $e$ and $\mu$ leptons, that one also take into
account graphs with contributions from the other
leptons as virtual intermediate states and those of
hadronic and weak origin. Quite generally, if the mass
of the virtual particle, say $m_v$, is smaller than
the mass of the external lepton, say $m$, one can have
a $\ln(m/m_v)$ behavior of the contributions; that is
the case of the virtual electron contributions to the
muon magnetic anomaly $a_\mu$, which can be enhanced
by powers of $\ln(m_\mu/m_e)$. In the opposite case,
$m_v > m$, the contribution behaves as $(m/m_v)^2$;
that is the case of the $(m_\mu/m_\tau)^2$
contributions to $a_\mu$ from $\tau$ loops and of the
$(m_e/m_\mu)^2$ contributions from $\mu$ loops to the
electron magnetic anomaly $a_e$. As strong and weak
interactions are in general associated with heavy-mass
particles, they are expected to be more important for
$a_\mu$ than for $a_e$; further, a given
heavy-particle contribution to $a_e$ is smaller by a
factor $(m_e/m_\mu)^2$ than the corresponding
contribution to $a_\mu$.


The Magnetic Anomaly $a_\mu$ of the $\mu$


The anomaly $a_\mu$ has been reviewed in Passera
(2005). The present (2005) world average experimental
value is

$$a_\mu(\mathrm{exp}) = 116\,592\,080(60) \times 10^{-11}$$

with a relative error $0.5 \times 10^{-6}$.
Theoretically, one can write

$$a_\mu = a_\mu(\mathrm{QED}) + a_\mu(\mathrm{had}) + a_\mu(\mathrm{EW}) \qquad [9]$$

where the three terms stand for the contributions
from pure QED, strongly interacting hadrons and
electroweak interactions. In turn, one can expand
$a_\mu(\mathrm{QED})$ in powers of $\alpha$ as

$$a_\mu(\mathrm{QED}) = \sum_l C_l \left(\frac{\alpha}{\pi}\right)^l = \sum_l \left[ A_1^{(l)} + A_2^{(l)}\!\left(\frac{m_\mu}{m_e}\right) + A_2^{(l)}\!\left(\frac{m_\mu}{m_\tau}\right) + A_3^{(l)}\!\left(\frac{m_\mu}{m_e}, \frac{m_\mu}{m_\tau}\right) \right] \left(\frac{\alpha}{\pi}\right)^l \qquad [10]$$

The coefficients $A_1^{(l)}$, which involve only the
photon and the external lepton as virtual states, are
identically the same as in $a_e$; they are known up to
$l = 4$ included (but, strictly speaking, the
contribution of $A_1^{(4)}$ is smaller than the
experimental error of $a_\mu$) and will be discussed
later for the electron. The $A_2^{(l)}(m_\mu/m_e)$ are
very large, being enhanced by powers of
$\ln(m_\mu/m_e)$, and are required and known up to
$l = 5$; $A_2^{(l)}(m_\mu/m_\tau)$ starts with
$A_2^{(2)}(m_\mu/m_\tau) \simeq (1/45)(m_\mu/m_\tau)^2$,
contributing $4.2 \times 10^{-10}$ to $a_\mu$, so
that the $A_2^{(l)}(m_\mu/m_\tau)$ with higher values
of $l$ are not needed.
$A_3^{(l)}(m_\mu/m_e, m_\mu/m_\tau)$, finally, starts
from $l = 3$, and gives a negligible contribution
$0.7 \times 10^{-11}$. Summing up, one finds
$C_1 = 1/2$, $C_2 = 0.765\,857\,410(27)$ (the error is
from the experimental errors in the lepton masses),
$C_3 = 24.050\,509\,64(43)$, $C_4 = 131.011(8)$, and
$C_5 = 677(40)$. As already observed, the coefficients
are large due to the presence of $\ln(m_\mu/m_e)$
factors. The last term $C_5$ contributes
$4.6(0.3) \times 10^{-11}$ to $a_\mu$, and the total
QED contribution is

$$a_\mu(\mathrm{QED}) = 116\,584\,718.8(0.3)(0.4) \times 10^{-11}$$

where the first error is due to the uncertainties in
the coefficients $C_2$, $C_3$, and $C_5$ and the second
to the value of $\alpha$ coming from atom
interferometry measurements (see below).
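
The total QED contribution can be recovered by summing the series with the coefficients $C_1, \ldots, C_5$ quoted above. The Python sketch below assumes $1/\alpha = 137.035999$, a rounded value chosen for the illustration (the text refers to an atom interferometry determination whose digits are not quoted here), so the last digit of the result is only indicative.

```python
import math

ALPHA_INVERSE = 137.035999      # assumed value of 1/alpha
x = 1.0 / (ALPHA_INVERSE * math.pi)   # expansion parameter alpha/pi

# Coefficients C_l of (alpha/pi)^l quoted in the text, l = 1..5
coefficients = [0.5, 0.765857410, 24.05050964, 131.011, 677.0]

# Individual terms C_l (alpha/pi)^l and their sum
terms = [c * x ** l for l, c in enumerate(coefficients, start=1)]
a_mu_qed = sum(terms)

for l, term in enumerate(terms, start=1):
    print(f"l = {l}: {term * 1e11:14.3f} x 1e-11")
print(f"sum  : {a_mu_qed * 1e11:14.1f} x 1e-11")
```

The printout makes the hierarchy explicit: the five-loop term is at the level of a few units of $10^{-11}$, well below the experimental error of $60 \times 10^{-11}$.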

The hadronic contributions are of two kinds, 
those due to vacuum polarization, 4,(vac.pol), 
which can be evaluated by sound theoretical 
methods by using existing experimental data, and 
those due to light-by-light hadronic scattering, 
a,(lbl), whose evaluation relies on much less firmer 
grounds and are entirely model-dependent. The 
value of a,(vac.pol) varies slightly among the 
various authors (see Passera (2005) for reference to 
original work), let us take as a typical value 
a,(vac.pol) = 6834(92) x 10™!! (based on ete™ scat- 
tering data and including also first-order radiative 
corrections). The model-dependent value of the 
light-by-light contribution changed several times in 
the years (also in sign!) but now there is a general 
consensus that it should be positive; let us take, 
somewhat arbitrarily, a,(Ibl) =136(25) x 10, so 
that the total hadronic contribution becomes 


a,(had) = 6970(92) x 107"! 
The electroweak contribution, finally, is 
a, (EW) = 154(2) x 107" 


which accounts for a one-loop purely weak
contribution and a two-loop electromagnetic and
weak contribution, which turns out to be very large
($-42 \times 10^{-11}$) owing to the presence of logarithms of the
masses (the error is due to the uncertainty in the
Higgs boson mass).

Summing up, eqn [9] gives $a_\mu = 116\,591\,842(92) \times 10^{-11}$,
so that

$$a_\mu(\mathrm{exp}) - a_\mu = 138(60)(90) \times 10^{-11}$$
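The bookkeeping behind the theoretical value can be checked with a few lines of arithmetic. The sketch below only re-adds the central values quoted above and evaluates the size of the five-loop term; the function names are illustrative, and the value of $\alpha^{-1}$ is the atom-interferometry number quoted in the electron section of this article.

```python
import math

ALPHA_INV = 137.0360003            # alpha^{-1} from atom interferometry, as quoted in the text
X = 1.0 / (ALPHA_INV * math.pi)    # expansion parameter alpha/pi

# Contributions to a_mu in units of 1e-11, central values as quoted above.
A_QED, A_HAD, A_EW = 116_584_718.8, 6970.0, 154.0


def total_a_mu():
    """Sum of the QED, hadronic and electroweak central values (units of 1e-11)."""
    return A_QED + A_HAD + A_EW


def five_loop_term(c5=677.0):
    """Size of C_5 (alpha/pi)^5 in units of 1e-11."""
    return c5 * X**5 / 1e-11


print(total_a_mu())       # central value of the theoretical a_mu
print(five_loop_term())   # five-loop QED term
```

The sum reproduces the quoted central value up to the rounding of the last digit, and the five-loop term comes out close to the quoted $4.6 \times 10^{-11}$.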


The substantial agreement can be considered to be a 
good overall check of QED and electroweak inter- 
actions. But another attitude is often adopted in 


the scientific community: the validity of QED and 
electroweak models is taken for granted, and a 
disagreement, if any, is considered to be an indica- 
tion of new physics. To obtain significant informa- 
tion in that direction, however, the experimental 
and the theoretical errors (dominated in turn by the
experimental error in $e^+e^-$ scattering data) should
be significantly reduced.


The Magnetic Anomaly $a_e$ of the Electron


Experimentally, one has the 1987 value (Kinoshita
2005, ref. 1)

$$a_e(\mathrm{exp}) = 1\,159\,652\,188.4(4.3) \times 10^{-12}$$  [11]

with a relative error of $3.7 \times 10^{-9}$, and the preliminary
Harvard (2004) measurement (Kinoshita 2005, ref. 3)

$$a_e(\mathrm{Harvard}) = 1\,159\,652\,180.86(0.57) \times 10^{-12}$$  [12]

with a relative error of $0.5 \times 10^{-9}$, that is, an increase in
precision by a factor of 7.

Theoretically, eqns [9] and [10] apply also to the 
electron; given the smallness of the electron mass, 
the relevant terms up to the precision of the 
experimental data are 


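$$a_e = A_1^{(1)}\left(\frac{\alpha}{\pi}\right) + A_1^{(2)}\left(\frac{\alpha}{\pi}\right)^2 + A_1^{(3)}\left(\frac{\alpha}{\pi}\right)^3 + A_1^{(4)}\left(\frac{\alpha}{\pi}\right)^4 + A_2^{(2)}\!\left(\frac{m_e}{m_\mu}\right)\left(\frac{\alpha}{\pi}\right)^2 + a_e(\mathrm{had}) + a_e(\mathrm{EW})$$  [13]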


The explicit calculation gives 


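$$A_1^{(1)} = \frac{1}{2}$$  (Passera 2005, ref. 1)

$$A_1^{(2)} = \frac{197}{144} + \frac{\pi^2}{12} - \frac{\pi^2}{2}\ln 2 + \frac{3}{4}\zeta(3) = -0.328\,478\,965\,579\ldots$$  (Passera 2005, ref. 17)

$$A_1^{(3)} = \frac{83}{72}\pi^2\zeta(3) - \frac{215}{24}\zeta(5) + \frac{100}{3}\left[\left(a_4 + \frac{1}{24}\ln^4 2\right) - \frac{1}{24}\pi^2\ln^2 2\right] - \frac{239}{2160}\pi^4 + \frac{139}{18}\zeta(3) - \frac{298}{9}\pi^2\ln 2 + \frac{17101}{810}\pi^2 + \frac{28259}{5184} = 1.181\,241\,456\ldots$$  (Laporta and Remiddi 1996)

where $a_4 = \sum_{n \ge 1} 1/(2^n n^4) = \mathrm{Li}_4(1/2)$.

$$A_1^{(4)} = -1.7283(35)$$  (Kinoshita 2005)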
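$$A_2^{(2)}\!\left(\frac{m_e}{m_\mu}\right)\left(\frac{\alpha}{\pi}\right)^2 \simeq \frac{1}{45}\left(\frac{m_e}{m_\mu}\right)^2\left(\frac{\alpha}{\pi}\right)^2 \approx 2.8 \times 10^{-12}$$

$$a_e(\mathrm{had}) = 1.67(0.02) \times 10^{-12}$$

$$a_e(\mathrm{EW}) = 0.03 \times 10^{-12}$$  [14]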


For obtaining a meaningful prediction, one now needs
a precise value of $\alpha$. The most precise value
available at present is that of Passera (2005, ref. 49)

$$\alpha^{-1}(\mathrm{aif}) = 137.036\,000\,3(10)$$

with relative error $7 \times 10^{-9}$, obtained by the atom
interferometry method (which is independent of
QED, depending only on the kinematics of the
Doppler effect). With that value of $\alpha$, the theoretical
prediction for $a_e$ becomes

$$a_e = 1\,159\,652\,175.9(8.5)(0.1) \times 10^{-12}$$

where the first error comes from $\alpha$ and the second
from $C_4$; conversely, one can use the QED prediction
for $a_e$ and $a_e(\mathrm{Harvard})$ for obtaining $\alpha$; one
obtains in that way

$$\alpha^{-1}(\mathrm{QED}, a_e) = 137.035\,999\,708(12)(67)$$

where the first uncertainty is from $C_4$ and the
second from the experiment. We see that theory and
experiment are in good agreement.

As a concluding remark, another independent and
more precise (or analytic!) evaluation of the $C_4$
contribution would be welcome. The five-loop term is not
known; but as $(\alpha/\pi)^5 \simeq 0.07 \times 10^{-12}$, if $C_5$ is, say,
not greater than 2, its contribution to $a_e$ becomes
equal to the contribution of the error $\Delta C_4$ of $C_4$ and
is not yet required to match the current precision of
$a_e(\mathrm{exp})$. The ultimate theoretical limit, the error of
the hadronic contribution, $\Delta a_e(\mathrm{had}) = 0.02 \times 10^{-12}$,
is still smaller, corresponding to a change
$\Delta C_4 = 0.0007$ of $C_4$ or $\Delta C_5 = 0.3$ of $C_5$.
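The numbers in this section can be reproduced, to the quoted precision, by summing the perturbative series directly. The following is a minimal sketch, not the authors' evaluation: the coefficients are the ones listed in [14], the mass ratio `M_E_OVER_M_MU` is an assumed input that does not appear in the text, and the function names are illustrative.

```python
import math

# Mass-independent coefficients as listed in [14] (one to four loops).
A1 = [0.5, -0.328478965579, 1.181241456, -1.7283]
M_E_OVER_M_MU = 4.83633e-3      # assumed electron/muon mass ratio (not given in the text)
A_HAD, A_EW = 1.67e-12, 0.03e-12


def alpha_over_pi(alpha_inv):
    """Expansion parameter alpha/pi for a given value of alpha^{-1}."""
    return 1.0 / (alpha_inv * math.pi)


def a_e_theory(alpha_inv):
    """Sum of the terms of [13] for a given inverse fine-structure constant."""
    x = alpha_over_pi(alpha_inv)
    qed = sum(c * x ** (k + 1) for k, c in enumerate(A1))
    muon_loop = (1.0 / 45.0) * M_E_OVER_M_MU**2 * x**2   # leading mass-dependent term
    return qed + muon_loop + A_HAD + A_EW


x = alpha_over_pi(137.0360003)
print(a_e_theory(137.0360003) * 1e12)   # theoretical a_e in units of 1e-12
print(x**5 * 1e12)                      # size of (alpha/pi)^5 in units of 1e-12
print(0.0007 * x**4 * 1e12)             # effect of a shift 0.0007 in C_4
print(0.3 * x**5 * 1e12)                # effect of a shift 0.3 in C_5
```

With these inputs the sum lands within a fraction of $10^{-12}$ of the quoted prediction, and the last two lines both come out near $0.02 \times 10^{-12}$, the hadronic uncertainty mentioned above.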


See also: Abelian and Nonabelian Gauge Theories Using 
Differential Forms; Anomalies; Effective Field Theories; 
Electroweak Theory; Quantum Field Theory: A Brief 
Introduction; Standard Model of Particle Physics. 

In the past 50 years, entropy has broken out of
thermodynamics and statistical mechanics and
invaded communication theory, ergodic theory,
mathematical statistics, and even the social and
life sciences. The favorite subjects of entropy
concern macroscopic phenomena, irreversibility,
and incomplete knowledge. In the strictly mathematical
sense, entropy concerns the asymptotic behavior
of probabilities.

This review is organized as follows. First the 
history of entropy is discussed generally and then we 
concentrate on the von Neumann entropy again 
somewhat historically following the work of von 
Neumann. Umegaki’s quantum relative entropy is 
discussed both in case of finite systems and in the 
setting of C*-algebras. An axiomatization is pre- 
sented. To show physical applications of the concept 
of entropy, the statistical thermodynamics is 
reviewed in the setting of spin chains. The relative 
entropy shows up in the asymptotic theory of 
hypothesis testing and data compression. 


General Introduction to Entropy: From 
Clausius to von Neumann 


The word “entropy” was created by Rudolf Clausius 
and it appeared in his work Abhandlungen über die
mechanische Wärmetheorie published in 1864. The
word has a Greek origin, its first part reminds us of 
“energy” and the second part is from “tropos,” 
which means “turning point.” Clausius’ work is the 
foundation stone of classical thermodynamics. 
According to Clausius, the change of entropy of a 
system is obtained by adding the small portions of 
heat quantity received by the system divided by the 
absolute temperature during the heat absorption. 
This definition is satisfactory from a mathematical 
point of view and gives nothing other than an 
integral in precise mathematical terms. Clausius 
postulated that the entropy of a closed system 
cannot decrease, which is generally referred to as 
the second law of thermodynamics. 

The concept of entropy was really clarified by 
Ludwig Boltzmann. His scientific program was to 
deal with the mechanical theory of heat in connec- 
tion with probabilities. Assume that a macroscopic 
system consists of a large number of microscopic 
ones, we simply call them particles. Since we have
ideas of quantum mechanics in mind, we assume
that each of the particles is in one of the energy
levels $E_1 < E_2 < \cdots < E_m$. The number of particles
in the level $E_i$ is $N_i$, so $\sum_i N_i = N$ is the total
number of particles. A macrostate of our system is
given by the occupation numbers $N_1, N_2, \ldots, N_m$.
The energy of a macrostate is $E = \sum_i N_i E_i$. A given
macrostate can be realized by many configurations
of the $N$ particles, each of them at a certain energy
level $E_i$. These configurations are called microstates.
Many microstates realize the same macrostate. We
count the number of ways of arranging $N$ particles
in $m$ boxes (i.e., energy levels) such that the boxes
contain $N_1, N_2, \ldots, N_m$ particles, respectively. There are


$$\binom{N}{N_1, N_2, \ldots, N_m} = \frac{N!}{N_1!\, N_2! \cdots N_m!}$$  [1]


such ways. This multinomial coefficient is the 
number of microstates realizing the macrostate 
$(N_1, N_2, \ldots, N_m)$ and it is proportional to the
probability of the macrostate if all configurations 
are assumed to be equally likely. Boltzmann called [1] 
the thermodynamical probability of the macrostate, 
in German “thermodynamische Wahrscheinlichkeit,” 
hence the letter W was used. Of course, Boltzmann 
argued in the framework of classical mechanics and 
the discrete values of energy came from an approxi- 
mation procedure with “energy cells.” 

If we are interested in the thermodynamic limit $N$
increasing to infinity, we use the relative numbers
$p_i := N_i/N$ to label a macrostate and, instead of the
total energy $E = \sum_i N_i E_i$, we consider the average
energy per particle $E/N = \sum_i p_i E_i$. To find the most
probable macrostate, we wish to maximize [1] under
a certain constraint. The Stirling approximation of
the factorials gives


$$\frac{1}{N} \log \binom{N}{N_1, N_2, \ldots, N_m} = H(p_1, p_2, \ldots, p_m) + O(N^{-1} \log N)$$  [2]

where

$$H(p_1, p_2, \ldots, p_m) := \sum_i -p_i \log p_i$$  [3]
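The quality of the approximation [2] is easy to examine numerically. The sketch below is only an illustration: the relative occupation numbers `P` are a hypothetical choice, and `math.lgamma` is used for the logarithms of the factorials so that no huge integers are needed.

```python
import math


def log_multinomial(counts):
    """Natural log of N!/(N_1! ... N_m!), computed with lgamma."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)


def entropy(p):
    """H(p) = sum_i -p_i log p_i with the convention 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in p if q > 0)


P = [0.5, 0.3, 0.2]   # hypothetical relative occupation numbers


def stirling_gap(n, p=P):
    """H(p) - (1/N) log(multinomial) for occupation numbers N_i = p_i N."""
    counts = [round(q * n) for q in p]
    return entropy(p) - log_multinomial(counts) / sum(counts)


for n in (10, 100, 1000, 10000):
    print(n, stirling_gap(n))
```

The gap is positive and shrinks roughly like $\log N / N$, in line with the error term in [2].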


If $N$ is large then the approximation [2] yields that
instead of maximizing the quantity [1] we can
maximize [3]. For example, maximizing [3] under
the constraint $\sum_i p_i E_i = e$, we get

$$p_i = \frac{\exp(-\lambda E_i)}{\sum_j \exp(-\lambda E_j)}$$  [4]

where the constant $\lambda$ is the solution of the equation

$$\frac{\sum_i E_i \exp(-\lambda E_i)}{\sum_j \exp(-\lambda E_j)} = e$$


Note that the last equation has a unique solution if
$E_1 < e < E_m$, and the distribution [4] is now known
as the discrete Maxwell-Boltzmann law.
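Since the mean energy of the distribution [4] is strictly decreasing in $\lambda$, the constant can be found by bisection. The following sketch assumes a small set of hypothetical energy levels and a prescribed mean energy between the lowest and highest level; the names and the bracketing interval are illustrative choices.

```python
import math


def boltzmann(levels, lam):
    """Distribution p_i proportional to exp(-lam * E_i)."""
    weights = [math.exp(-lam * e) for e in levels]
    z = sum(weights)
    return [w / z for w in weights]


def mean_energy(levels, lam):
    return sum(p * e for p, e in zip(boltzmann(levels, lam), levels))


def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)


def solve_lambda(levels, target, lo=-50.0, hi=50.0):
    """Bisection for lam with mean_energy(levels, lam) == target.

    The mean energy decreases as lam grows, so a value above the target
    means that lam has to be increased.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_energy(levels, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


LEVELS = [0.0, 1.0, 2.0, 3.0]   # hypothetical energy levels
lam = solve_lambda(LEVELS, 1.2)
p_star = boltzmann(LEVELS, lam)
print(lam, p_star, entropy(p_star))
```

Any other distribution with the same mean energy, for instance `[0.4, 0.2, 0.2, 0.2]` on these levels, has a smaller value of [3] than `p_star`.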

Let $p_1, p_2, \ldots, p_n$ be the probabilities of different
outcomes of a random experiment. According to
Shannon, the expression [3] is a measure of our
ignorance prior to the experiment. Hence it is also
the amount of information gained by performing the
experiment. The quantity [3] is maximal when all
the $p_i$ are equal. In information theory, logarithms
with base 2 are used and the unit of information is
called a bit (from binary digit). As will be seen below,
an extra factor equal to Boltzmann’s constant is
included in the physical definition of entropy.

The comprehensive mathematical formalism of 
quantum mechanics was first presented in the famous 
book Mathematische Grundlagen der Quantenme- 
chanik published in 1932 by Johann von Neumann. 
In the traditional approach to quantum mechanics, a 
physical system is described in a Hilbert space: 
observables correspond to self-adjoint operators and 
statistical operators are associated with the states. In 
fact, a statistical operator describes a mixture of pure 
states. Pure states are really the physical states and 
they are given by rank-1 statistical operators, or 
equivalently by rays of the Hilbert space. 

von Neumann associated an entropy quantity to a 
statistical operator in 1927 and the discussion was 
extended in his book (von Neumann 1932). His 
argument was a gedanken experiment on the 
grounds of phenomenological thermodynamics. Let 
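us consider a gas of $N\ (\gg 1)$ molecules in a box.
Suppose that the gas behaves like a quantum system
and is described by a statistical operator $\omega$ which is a
mixture $\sum_i \lambda_i |\varphi_i\rangle\langle\varphi_i|$, where $|\varphi_i\rangle \equiv \varphi_i$ are orthogonal
state vectors. We may take $\lambda_i N$ molecules in the pure
state $\varphi_i$ for every $i$. The gedanken experiment gave

$$S\Big(\sum_i \lambda_i |\varphi_i\rangle\langle\varphi_i|\Big) = \sum_i \lambda_i S(|\varphi_i\rangle\langle\varphi_i|) - \kappa \sum_i \lambda_i \log \lambda_i$$  [5]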


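where $\kappa$ is Boltzmann’s constant and $S$ is a certain
thermodynamical entropy quantity (relative to the
fixed temperature and molecule density).

After this, von Neumann showed that $S(|\varphi\rangle\langle\varphi|)$ is
independent of the state vector $|\varphi\rangle$, so that

$$S\Big(\sum_i \lambda_i |\varphi_i\rangle\langle\varphi_i|\Big) = -\kappa \sum_i \lambda_i \log \lambda_i$$  [6]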


up to an additive constant, which could be chosen to 
be 0 as a matter of normalization. Equation [6] is 
von Neumann’s celebrated entropy formula; it has a 
more elegant form 


$$S(\omega) = \kappa\, \mathrm{tr}\, \eta(\omega)$$  [7]

where the state $\omega$ is identified with the corresponding
statistical operator, and $\eta: \mathbb{R}^+ \to \mathbb{R}$ is the
continuous function $\eta(t) = -t \log t$.
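For a single qubit the formula [7] can be evaluated in closed form, because a $2 \times 2$ statistical operator has eigenvalues $(1 \pm \sqrt{1 - 4\det\omega})/2$. The sketch below is restricted to that two-dimensional case and uses $\kappa = 1$ by default; the helper names are illustrative.

```python
import math


def eta(t):
    """eta(t) = -t log t, extended by continuity with eta(0) = 0."""
    return -t * math.log(t) if t > 0 else 0.0


def qubit_eigenvalues(rho):
    """Eigenvalues of a 2x2 Hermitian matrix of trace one (nested lists)."""
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    return (0.5 * (1.0 + disc), 0.5 * (1.0 - disc))


def von_neumann_entropy(rho, kappa=1.0):
    """S(rho) = kappa * tr eta(rho) for a qubit statistical operator."""
    return kappa * sum(eta(t) for t in qubit_eigenvalues(rho))


pure = [[0.5, 0.5], [0.5, 0.5]]      # rank-1 projection onto (|0> + |1>)/sqrt(2)
mixed = [[0.5, 0.0], [0.0, 0.5]]     # maximally mixed state
print(von_neumann_entropy(pure), von_neumann_entropy(mixed))
```

The pure state has entropy zero and the maximally mixed state has entropy $\log 2$, the largest value possible for a qubit.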

von Neumann solved the maximization problem
for $S(\omega)$ under the constraint $\mathrm{tr}\, \omega H = e$. This means
the determination of the ensemble of maximal
entropy when the expectation of the energy operator
$H$ is a prescribed value $e$. It is convenient to rephrase
his argument in terms of conditional expectations.
$H = H^*$ is assumed to have a discrete spectrum and
we have a conditional expectation $E$ determined by
the eigenbasis of $H$. If we pass from an arbitrary
statistical operator $\omega$ with $\mathrm{tr}\, \omega H = e$ to $E(\omega)$, then the
entropy increases, on the one hand, and the
expectation of the energy does not change, on the
other, so the maximizer should be searched for among
the operators commuting with $H$. In this way we are
(and von Neumann was) back to the classical
problem of statistical mechanics treated at the
beginning of this article. In terms of operators, the
solution is in the form


$$\frac{\exp(-\beta H)}{\mathrm{tr}\, \exp(-\beta H)}$$  [8]

which is called the Gibbs state today.


The von Neumann Entropy 


von Neumann was aware of the fact that statistical 
operators form a convex set whose extreme points 
are exactly the pure states. He also knew that 
entropy is a concave functional, so 


$$S\Big(\sum_i \lambda_i \omega_i\Big) \ge \sum_i \lambda_i S(\omega_i)$$  [9]


for any convex combination. To determine the 
entropy of a statistical operator, he used the 
Schatten decomposition, which is an orthogonal 
extremal decomposition in our present language. 
For a statistical operator $\omega$ there are many ways to
write it in the form

$$\omega = \sum_i \lambda_i |\varphi_i\rangle\langle\varphi_i|$$


if we do not require the state vectors to be
orthogonal. The geometry of the statistical operators,
that is, the state space, allows many extremal
decompositions and among them there is a unique
orthogonal one if the spectrum of $\omega$ is not
degenerate. Nonorthogonal pure states are essentially
nonclassical. They are between identical and
completely different. Jaynes recognized in 1956 that
from the point of view of information the Schatten
decomposition is optimal. He proved that

$$S(\omega) = \sup\Big\{ -\sum_i \lambda_i \log \lambda_i : \omega = \sum_i \lambda_i \omega_i \Big\}$$  [10]


where the supremum is over all convex combinations
$\omega = \sum_i \lambda_i \omega_i$ of statistical operators. This is Jaynes’
contribution to the von Neumann entropy. By the
way, formula [10] may be used to define the von
Neumann entropy for states of an arbitrary
C*-algebra whose states cannot be described by
statistical operators.

Certainly the highlight of quantum entropy theory
in the 1970s was the discovery of strong subadditivity. This
property is formulated in a tripartite system whose
Hilbert space $\mathcal{H}$ is a tensor product $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C$.
A statistical operator $\omega_{ABC}$ admits several reduced
densities, $\omega_{AB}$, $\omega_B$, $\omega_{BC}$, and others. The strong
subadditivity is the inequality due to Lieb and
Ruskai in 1973:


Theorem 1
$$S(\omega_{ABC}) + S(\omega_B) \le S(\omega_{AB}) + S(\omega_{BC})$$  [11]


The strong subadditivity inequality [11] is conveniently
rewritten in terms of the relative entropy.
For statistical operators $\rho$ and $\omega$,

$$S(\rho\|\omega) = \mathrm{tr}\, \rho(\log \rho - \log \omega)$$  [12]

if $\mathrm{supp}\, \rho \le \mathrm{supp}\, \omega$, otherwise $S(\rho\|\omega) = +\infty$. The
relative entropy expresses statistical distinguishability
and therefore it decreases under stochastic
mappings:

$$S(\rho\|\omega) \ge S(E(\rho)\|E(\omega))$$  [13]


for a completely positive trace-preserving mapping $E$.
The strong subadditivity is equivalent to

$$S(\omega_{AB}\|\varphi \otimes \omega_B) \le S(\omega_{ABC}\|\varphi \otimes \omega_{BC})$$  [14]

where $\varphi$ is any state on $B(\mathcal{H}_A)$ of finite entropy. This
inequality is a consequence of monotonicity of the
relative entropy, since $\omega_{AB} = E(\omega_{ABC})$ and $\varphi \otimes \omega_B =
E(\varphi \otimes \omega_{BC})$, where $E$ is the partial trace over $\mathcal{H}_C$.
Clearly, the equality in [11] is equivalent to equality
in [14].
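The monotonicity [13] can be illustrated in the commutative special case, where statistical operators are diagonal (probability vectors) and a stochastic mapping is a column-stochastic matrix. This is only a classical sketch of the quantum statement; the matrix `T` and the vectors `p`, `q` are arbitrary illustrative choices.

```python
import math


def relative_entropy(p, q):
    """S(p||q) = sum_i p_i (log p_i - log q_i); +inf unless supp p lies in supp q."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * (math.log(pi) - math.log(qi))
    return total


def apply_channel(t, p):
    """Image of p under the column-stochastic matrix t: (t p)_j = sum_i t[j][i] p[i]."""
    return [sum(row[i] * p[i] for i in range(len(p))) for row in t]


# Each column of T sums to one, so T maps probability vectors to probability vectors.
T = [[0.9, 0.2, 0.1],
     [0.1, 0.8, 0.9]]
p = [0.7, 0.2, 0.1]
q = [0.2, 0.3, 0.5]

before = relative_entropy(p, q)
after = relative_entropy(apply_channel(T, p), apply_channel(T, q))
print(before, after)
```

Coarse-graining the three outcomes into two can only make the two distributions harder to tell apart, so the second number is the smaller one.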


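Theorem 2 The equality holds in [11] if and only
if there is an orthogonal decomposition $p_B \mathcal{H}_B =
\bigoplus_n \mathcal{H}^L_{nB} \otimes \mathcal{H}^R_{nB}$, $p_B = \mathrm{supp}\, \omega_B$, such that the density
operator of $\omega_{ABC}$ satisfies

$$\omega_{ABC} = \sum_n \omega_B(p_n)\, \omega^L_n \otimes \omega^R_n$$  [15]

where $\omega^L_n \in B(\mathcal{H}_A) \otimes B(\mathcal{H}^L_{nB})$ and $\omega^R_n \in B(\mathcal{H}^R_{nB}) \otimes
B(\mathcal{H}_C)$ are density operators and $p_n \in B(\mathcal{H}_B)$ are
the orthogonal projections $\mathcal{H}_B \to \mathcal{H}^L_{nB} \otimes \mathcal{H}^R_{nB}$.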


Quantum Relative Entropy 


The quantum relative entropy is an information 
measure representing the uncertainty of a state with 
respect to another state. Hence it indicates a kind of 
distance between the two states. The formal defini- 
tion [12] is due to Umegaki. 

Now we approach quantum relative entropy 
axiomatically. Our crucial postulate includes the 
notion of conditional expectation. Let us recall that 
in the setting of operator algebras conditional 
expectation (or projection of norm 1) is defined as 
a positive unital idempotent linear mapping onto a 
subalgebra. 

Now we list the properties of the relative entropy 
functional which will be used in an axiomatic 
characterization: 


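1. Conditional expectation property. Assume that $\mathcal{A}$
is a subalgebra of $\mathcal{B}$ and there exists a projection of
norm 1 $E$ of $\mathcal{B}$ onto $\mathcal{A}$, such that $\varphi \circ E = \varphi$. Then
for every state $\omega$ of $\mathcal{B}$, $S(\omega, \varphi) = S(\omega|\mathcal{A}, \varphi|\mathcal{A}) +
S(\omega, \omega \circ E)$ holds.

2. Invariance property. For every automorphism $\alpha$
of $\mathcal{B}$ we have $S(\omega, \varphi) = S(\omega \circ \alpha, \varphi \circ \alpha)$.

3. Direct sum property. Assume that $\mathcal{B} = \mathcal{B}_1 \oplus \mathcal{B}_2$. Let
$\varphi_{12}(a \oplus b) = \lambda \varphi_1(a) + (1 - \lambda)\varphi_2(b)$ and $\omega_{12}(a \oplus b) =
\lambda \omega_1(a) + (1 - \lambda)\omega_2(b)$ for every $a \in \mathcal{B}_1$, $b \in \mathcal{B}_2$ and
some $0 < \lambda < 1$. Then $S(\omega_{12}, \varphi_{12}) = \lambda S(\omega_1, \varphi_1) +
(1 - \lambda) S(\omega_2, \varphi_2)$.

4. Nilpotence property. $S(\varphi, \varphi) = 0$.

5. Measurability property. The function $(\omega, \varphi) \mapsto
S(\omega, \varphi)$ is measurable on the state space of the
finite-dimensional C*-algebra $\mathcal{B}$ (when $\varphi$ is
assumed to be faithful).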


Theorem 3 If a real-valued functional $R(\omega, \varphi)$
defined for faithful states $\varphi$ and arbitrary states $\omega$
of finite quantum systems shares the properties
(1)-(5), then there exists a constant $c \in \mathbb{R}$ such
that

$$R(\omega, \varphi) = c\, \mathrm{Tr}\, D_\omega (\log D_\omega - \log D_\varphi)$$


The relative entropy may be defined for linear
functionals of an arbitrary C*-algebra. The general
definition may go through von Neumann algebras,
normal states and the relative modular operator.
Another possibility is based on the monotonicity.
Let $\omega$ and $\varphi$ be states of a C*-algebra $\mathcal{A}$. Consider
finite-dimensional algebras $\mathcal{B}$ and completely positive
unital mappings $\alpha: \mathcal{B} \to \mathcal{A}$. Then the supremum
of the relative entropies $S(\omega \circ \alpha\|\varphi \circ \alpha)$ (over all $\alpha$)
can be defined as $S(\omega\|\varphi)$.


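Theorem 4 The relative entropy of states of
C*-algebras shares the following properties.


(i) $(\omega, \varphi) \mapsto S(\omega\|\varphi)$ is convex and weakly lower-
semicontinuous.

(ii) $\|\varphi - \omega\|^2 \le 2 S(\omega, \varphi)$.

(iii) For a unital Schwarz map $\alpha: \mathcal{A}_0 \to \mathcal{A}$, the
relation $S(\omega \circ \alpha\|\varphi \circ \alpha) \le S(\omega\|\varphi)$ holds.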


Property (iii) is Uhlmann’s monotonicity theorem, 
which we have already applied above. 
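In the commutative case, property (ii) of Theorem 4 reduces to the classical Pinsker inequality between the $\ell^1$ distance of two probability vectors and their relative entropy. The sketch below checks it on randomly drawn distributions; it illustrates only this classical special case, and the sampling scheme is an arbitrary choice.

```python
import math
import random


def relative_entropy(p, q):
    """Classical relative entropy sum_i p_i (log p_i - log q_i), natural logarithm."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def l1_distance(p, q):
    """Norm of the difference of the two states, viewed as functionals."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def random_distribution(n, rng):
    """Strictly positive probability vector of length n."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def pinsker_holds(p, q):
    """Check ||p - q||^2 <= 2 S(p||q), with a small floating-point tolerance."""
    return l1_distance(p, q) ** 2 <= 2.0 * relative_entropy(p, q) + 1e-12


rng = random.Random(0)
checks = [pinsker_holds(random_distribution(5, rng), random_distribution(5, rng))
          for _ in range(1000)]
print(all(checks))
```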

The relative entropy appears in many concepts 
and problems in the area of quantum information 
theory (Nielsen and Chuang 2000, Schumacher and 
Westmoreland 2002). 


Statistical Thermodynamics 


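Let an infinitely extended system of quantum spins
be considered in the simple cubic lattice $L = \mathbb{Z}^\nu$,
where $\nu$ is a positive integer. The observables
confined to a lattice site $x \in \mathbb{Z}^\nu$ form the self-adjoint
part of a finite-dimensional C*-algebra $\mathcal{A}_x$, which is a
copy of the matrix algebra $M_d(\mathbb{C})$. It is assumed that
the local observables in any bounded region $\Lambda \subset \mathbb{Z}^\nu$
are those of the finite quantum system

$$\mathcal{A}_\Lambda = \bigotimes_{x \in \Lambda} \mathcal{A}_x$$

It follows from the definition that for $\Lambda \subset \Lambda'$ we
have $\mathcal{A}_{\Lambda'} = \mathcal{A}_\Lambda \otimes \mathcal{A}_{\Lambda' \setminus \Lambda}$, where $\Lambda' \setminus \Lambda$ is the
complement of $\Lambda$ in $\Lambda'$. The algebra $\mathcal{A}_\Lambda$ and the subalgebra
$\mathcal{A}_\Lambda \otimes \mathbb{C} I_{\Lambda' \setminus \Lambda}$ of $\mathcal{A}_{\Lambda'}$ have identical structure and we
identify the element $A \in \mathcal{A}_\Lambda$ with $A \otimes I_{\Lambda' \setminus \Lambda}$ in $\mathcal{A}_{\Lambda'}$.
If $\Lambda \subset \Lambda'$ then $\mathcal{A}_\Lambda \subset \mathcal{A}_{\Lambda'}$ and it is said that $\mathcal{A}_\Lambda$ is
isotonic with respect to $\Lambda$. The definition also
implies that if $\Lambda_1$ and $\Lambda_2$ are disjoint then elements
of $\mathcal{A}_{\Lambda_1}$ commute with those of $\mathcal{A}_{\Lambda_2}$. The quasilocal
C*-algebra $\mathcal{A}$ is the norm completion of the normed
algebra $\mathcal{A}_\infty = \bigcup_\Lambda \mathcal{A}_\Lambda$, the union of all local algebras
$\mathcal{A}_\Lambda$ associated with bounded (finite) regions $\Lambda \subset \mathbb{Z}^\nu$.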

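We denote by $a_x$ the element of $\mathcal{A}_x$ corresponding
to $a \in \mathcal{A}_0$ ($x \in \mathbb{Z}^\nu$). It follows from the definition
that the algebra $\mathcal{A}_\infty$ consists of linear combinations
of terms $a^{(1)}_{x_1} \cdots a^{(k)}_{x_k}$ where $x_1, \ldots, x_k$ and $a^{(1)}, \ldots, a^{(k)}$
run through $\mathbb{Z}^\nu$ and $\mathcal{A}_0$, respectively. We define $\gamma_x$
to be the linear transformation

$$a^{(1)}_{x_1} \cdots a^{(k)}_{x_k} \mapsto a^{(1)}_{x_1 + x} \cdots a^{(k)}_{x_k + x}$$

$\gamma_x$ corresponds to the space translation by $x \in \mathbb{Z}^\nu$
and it extends to an automorphism of $\mathcal{A}$. Hence $\gamma$ is
a representation of the abelian group $\mathbb{Z}^\nu$ by
automorphisms of the quasilocal algebra $\mathcal{A}$. Clearly,
the covariance condition

$$\gamma_x(\mathcal{A}_\Lambda) = \mathcal{A}_{\Lambda + x}$$

holds, where $\Lambda + x$ is the space-translate of the
region $\Lambda$ by the displacement $x$.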

Having described the kinematical structure of
lattice systems, we turn to the dynamics. The local
Hamiltonian $H(\Lambda)$ is taken to be the total potential
energy between the particles confined to $\Lambda$. This
energy may come from many-body interactions of
various orders. Most generally, we assume that there
exists a global function $\Phi$ such that for any finite
subsystem $\Lambda$ the local Hamiltonian takes the form

$$H(\Lambda) = \sum_{X \subset \Lambda} \Phi(X)$$  [16]

Each $\Phi(X)$ represents the interaction energy of the
particles in $X$. Mathematically, $\Phi(X)$ is a self-adjoint
element of $\mathcal{A}_X$ and $H(\Lambda)$ will be a self-adjoint
operator in $\mathcal{A}_\Lambda$. We restrict our discussion to
translation-invariant interactions, which satisfy the
additional requirement

$$\gamma_x(\Phi(X)) = \Phi(X + x)$$

for every $x \in \mathbb{Z}^\nu$ and every region $X \subset \mathbb{Z}^\nu$. An
interaction $\Phi$ is said to be of finite range if $\Phi(\Lambda) = 0$
when the cardinality (or diameter) of $\Lambda$ is large
enough, $d(\Lambda) > d_\Phi$. The infimum of such numbers is
called the range of $\Phi$.

If $\varphi$ is a state of the quasilocal algebra $\mathcal{A}$ then it
will induce a state $\varphi_\Lambda$ on $\mathcal{A}_\Lambda$, the finite system
comprising the spins in the bounded region $\Lambda$ of $\mathbb{Z}^\nu$.
The (local) energy, entropy, and free energy of this
finite system are given by the following formulas:

$$E_\Lambda(\varphi) := \mathrm{tr}_\Lambda\, \omega_\Lambda H(\Lambda)$$
$$S_\Lambda(\varphi) := -\mathrm{tr}_\Lambda\, \omega_\Lambda \log \omega_\Lambda$$  [17]
$$F^\beta_\Lambda(\varphi) := E_\Lambda(\varphi) - \frac{1}{\beta} S_\Lambda(\varphi)$$

Here $\omega_\Lambda$ denotes the density of $\varphi_\Lambda$ with respect to the
trace $\mathrm{tr}_\Lambda$ of $\mathcal{A}_\Lambda$, and $\beta$ denotes the inverse temperature.
The functionals $E_\Lambda$, $S_\Lambda$, and $F^\beta_\Lambda$ are termed local. It is
rather obvious that all three local functionals are
continuous if the weak* topology is considered on the
state space of the quasilocal algebra. The energy is
affine, the entropy is concave and, consequently, the
free energy is a convex functional.

The free energy functional $F^\beta_\Lambda$ is minimized by the
Gibbs state (see [8] with $H = H(\Lambda)$), and the
minimum value is given by

$$-\frac{1}{\beta} \log \mathrm{tr}_\Lambda\, e^{-\beta H(\Lambda)}$$  [18]

Our aim is to explain this variational principle after
the thermodynamic limit is performed.

The thermodynamic limit “$\Lambda$ tends to infinity”
may be taken along lattice parallelepipeds. Let $a \in \mathbb{Z}^\nu$
with positive coordinates and define

$$\Lambda(a) = \{x \in \mathbb{Z}^\nu : 0 \le x_i < a_i,\ i = 1, 2, \ldots, \nu\}$$  [19]

When $a \to \infty$, $\Lambda(a)$ tends to infinity in a manner
suitable for the study of the thermodynamic limit: the
boundary of the parallelepipeds becomes more and
more negligible compared with the volume. The
notion of limit in the sense of van Hove makes this
idea more precise and physically more satisfactory.
For the sake of simplicity, we restrict ourselves to
the thermodynamic limit along parallelepipeds.

Denoting by $|\Lambda|$ the volume of $\Lambda$ (or the number
of points in $\Lambda$), we may define the global energy,
entropy, and free energy functionals of translationally
invariant states to be

$$e(\varphi) := \lim E_\Lambda(\varphi)/|\Lambda|$$  [20]
$$s(\varphi) := \lim S_\Lambda(\varphi)/|\Lambda|$$  [21]
$$f^\beta(\varphi) := \lim F^\beta_\Lambda(\varphi)/|\Lambda|$$  [22]

The existence of the limit in [21] is guaranteed by
the strong subadditivity of entropy, while that of the
limits in [20] and [22] is assured if the interaction
is suitably tempered, as is certainly the case if the
interaction is of finite range.


Theorem 5 If $\varphi$ is a translationally invariant state of
the quasilocal algebra $\mathcal{A}$, then the limit [21] exists and

$$s(\varphi) = \inf\{ S_{\Lambda(a)}(\varphi)/|\Lambda(a)| : a \in \mathbb{Z}^\nu_+ \}$$  [23]

Moreover, the von Neumann entropy density functional
$\varphi \mapsto s(\varphi)$ is affine and upper-semicontinuous when the
state space is endowed with the weak* topology.

Let $\Phi$ be an interaction of finite range. Then the
thermodynamic limit [20] exists and the energy
density is given by

$$e(\varphi) = \varphi(E_\Phi) \quad \text{and} \quad E_\Phi = \sum_{0 \in \Lambda} \frac{\Phi(\Lambda)}{|\Lambda|}$$

Furthermore, $e(\varphi)$ is an affine weak* continuous
functional of $\varphi$.

It follows that the free energy density $f^\beta(\varphi)$ exists
and it is an affine lower-semicontinuous function of
the translation-invariant state $\varphi$.

For $0 < \beta < \infty$ the thermodynamic limit

$$\lim_{\Lambda \to \infty} \frac{1}{|\Lambda|} \log \mathrm{tr}_\Lambda\, e^{-\beta H(\Lambda)} = p(\beta, \Phi)$$

exists.
In accordance with the lattice-gas interpretation
of our model, the global quantity $p$ is termed
pressure.

In the treatment of quantum spin systems, the set
$\mathfrak{S}_\gamma$ of all translation-invariant states is essential. The
global entropy functional $s$ is a continuous affine
function on $\mathfrak{S}_\gamma$ and physically it is a macroscopic
quantity which does not have a microscopic (i.e.,
local) counterpart. Indeed, the local entropy functional
is not an observable because it is not affine on
the (local) state space. The local internal energy
$E_\Lambda(\varphi)$ is a microscopic observable and the energy
density functional $e$ on $\mathfrak{S}_\gamma$ is the corresponding
global extensive quantity.

As an analog of the variational principle for finite
quantum systems, the global free-energy functional $f^\beta$
attains an absolute minimum at a translationally
invariant state, and the minimum value of $f^\beta$ is equal
to the thermodynamic limit of the canonical free-energy
densities of the local finite systems. In the next
theorem, this global variational principle will be
formulated in a slightly different but equivalent way.


Theorem 6 When $\Phi$ is an interaction of finite
range, then

$$p(\beta, \Phi) = \sup\{ s(\omega) - \beta e(\omega) \}$$

holds, when the supremum is over all translationally
invariant states $\omega$ on $\mathcal{A}$.


The states attaining the supremum on the right-hand
side are called equilibrium states and they have several
different characterizations.


Asymptotical Properties 


We keep the notation of the previous section but we
consider one-dimensional chains, $\nu = 1$. Let $\omega$ be a
translation-invariant state on $\mathcal{A}$ and fix a positive
number $\varepsilon < 1$. We have in mind that $\varepsilon$ is small, and we
say that a sequence of projections $Q_n \in \mathcal{A}_{[1,n]}$ is of high
probability if $\omega(Q_n) \ge 1 - \varepsilon$. The size of $Q_n$, the
cardinality of a maximal pairwise orthogonal family of
projections contained in $Q_n$, is given by $\mathrm{tr}_n Q_n$. (The
subscript $n$ in $\mathrm{tr}_n$ indicates that the algebraic trace
functional on $\mathcal{A}_{[1,n]}$ is meant here.) The theorem below
says that the entropy density of $\omega$ asymptotically
governs the rank of the high-probability projections.


Theorem 7 Assume that $\omega$ is an ergodic translation-
invariant state of $\mathcal{A}$. Then the limit relation

$$\lim_{n \to \infty} \frac{1}{n} \inf\{\log \mathrm{tr}_n Q_n\} = s(\omega)$$

holds, when the infimum is over all projections
$Q_n \in \mathcal{A}_{[1,n]}$ such that $\omega(Q_n) \ge 1 - \varepsilon$.

This result is strongly related to data compression.
When $\omega$ is interpreted as a stationary quantum source
(with possible memory), then efficient and reliable
data compression needs a subspace of small dimension
and the range of $Q_n$ can play this role. The entropy
density is the maximal rate of reliable compression.
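The content of Theorem 7 can be illustrated in the simplest commutative situation, a memoryless binary source, where projections correspond to sets of strings and $\mathrm{tr}_n Q_n$ is the number of strings in the set. The sketch below computes the size of the smallest set of probability at least $1 - \varepsilon$; the source parameter, $\varepsilon$ and the function names are illustrative assumptions, and the product state is only a special case of the ergodic states covered by the theorem.

```python
import math


def binary_entropy(p):
    """Entropy (natural logarithm) of a Bernoulli(p) distribution."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def min_high_prob_rate(n, p, eps):
    """(1/n) log of the smallest number of length-n binary strings whose total
    probability under i.i.d. Bernoulli(p), with p < 1/2, is at least 1 - eps."""
    need = 1.0 - eps
    covered = 0.0
    count = 0
    for k in range(n + 1):                 # strings with k ones, most probable first
        size = math.comb(n, k)
        log_each = k * math.log(p) + (n - k) * math.log(1 - p)
        class_prob = math.exp(math.log(size) + log_each)
        if covered + class_prob >= need:
            # Only a fraction of this weight class is required.
            frac = (need - covered) / class_prob
            count += max(1, math.ceil(frac * 10**9) * size // 10**9)
            return math.log(count) / n
        covered += class_prob
        count += size
    return math.log(count) / n


P_ONE, EPS = 0.2, 0.1
for n in (100, 500, 2000):
    print(n, min_high_prob_rate(n, P_ONE, EPS), binary_entropy(P_ONE))
```

As $n$ grows the computed rate approaches the entropy of the source, although the convergence is slow (of order $n^{-1/2}$).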

It is interesting that one can impose further 
requirements on the high-probability projections 
and the statement of the theorem remains true. 


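1. The partial trace of $Q_{n+1}$ over $\mathcal{A}_{n+1}$ is $Q_n$;

2. $e^{n(s - \varepsilon)} \le \mathrm{tr}_n Q_n \le e^{n(s + \varepsilon)}$ if $n$ is large enough; and

3. if $q \le Q_n$ is a minimal projection (in $\mathcal{A}_{[1,n]}$), then
$\omega(q) \le e^{-n(s - \varepsilon)}$ if $n$ is large enough.


In (2) and (3) $s$ stands for $s(\omega)$. Let $D_n$ be the density
matrix of the restriction of $\omega$ to $\mathcal{A}_{[1,n]}$. It follows
that for an eigenvalue $\lambda$ of $Q_n D_n Q_n$, the inequality

$$s - \varepsilon \le -\frac{\log \lambda}{n}$$

holds.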

From the point of view of data compression, it is
important if the sequence $Q_n \in \mathcal{A}_{[1,n]}$ works
universally for many states. Indeed, in this case the
compression algorithm can be universal for several
quantum sources.


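Theorem 8 Let $R > 0$. There is a projection
$Q_n \in \mathcal{A}_{[1,n]}$ such that

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathrm{tr}\, Q_n \le R$$  [24]

and for any ergodic state $\omega$ on $\mathcal{A}$ such that $s(\omega) < R$
the relation

$$\lim_{n \to \infty} \omega(Q_n) = 1$$  [25]

holds.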


In the simplest quantum hypothesis testing problem,
one has to decide between two states of a
system. The state $\rho_0$ is the null hypothesis and $\rho_1$ is
the alternative hypothesis. The problem is to decide
which hypothesis is true. The decision is performed
by a two-valued measurement $\{T, I - T\}$, where
$0 \le T \le I$ is an observable. $T$ corresponds to
the acceptance of $\rho_0$ and $I - T$ corresponds to the
acceptance of $\rho_1$. $T$ is called a test. When the
measurement value is 0, the hypothesis $\rho_0$ is
accepted, otherwise the alternative hypothesis $\rho_1$ is
accepted. The quantity $\alpha[T] = \mathrm{tr}\, \rho_0 (I - T)$ is
interpreted as the probability that the null hypothesis is
true but the alternative hypothesis is accepted. This
is the error of the first kind. Similarly, $\beta[T] = \mathrm{tr}\, \rho_1 T$
is the probability that the alternative hypothesis is
true but the null hypothesis is accepted. It is called
the error of the second kind.


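Now we fix a formalism for an asymptotic theory
of the hypothesis testing. Suppose that a sequence
$(\mathcal{H}_n)$ of Hilbert spaces is given, and $(\rho_0^{(n)})$ and $(\rho_1^{(n)})$ are
density matrices on $\mathcal{H}_n$. The typical example we have
in mind is $\rho_0^{(n)} = \rho_0 \otimes \rho_0 \otimes \cdots \otimes \rho_0$ and $\rho_1^{(n)} = \rho_1 \otimes
\rho_1 \otimes \cdots \otimes \rho_1$. A positive contraction $T_n \in B(\mathcal{H}_n)$ is
considered as a test on a composite system. Now the
errors of the first and second kind depend on $n$:
$\alpha_n[T_n] = \mathrm{tr}\, \rho_0^{(n)} (I - T_n)$ and $\beta_n[T_n] = \mathrm{tr}\, \rho_1^{(n)} T_n$.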

Set 


$$\beta^*(n, \varepsilon) = \inf\{\mathrm{tr}\, \rho_1^{(n)} A_n\}$$  [26]

where the infimum is over all $A_n \in B(\mathcal{H}_n)$ such that
$0 \le A_n \le I$ and $\mathrm{tr}\, \rho_0^{(n)} (I - A_n) \le \varepsilon$. In other words,
this is the infimum of the error of the second kind
when the error of the first kind is at most $\varepsilon$. The
importance of this quantity is in the customary
approach to hypothesis testing.

The following result is the quantum Stein lemma. 


Theorem 9 In the above setting, the relation

$$\lim_{n \to \infty} \frac{1}{n} \log \beta^*(n, \varepsilon) = -S(\rho_0\|\rho_1)$$

holds for every $0 < \varepsilon < 1$.
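The classical counterpart of Theorem 9 can be explored numerically for two coin distributions. The sketch below does not compute $\beta^*(n, \varepsilon)$ itself: it evaluates the second-kind error of one admissible deterministic threshold test, which bounds $\beta^*(n, \varepsilon)$ from above and is expected to share its exponent. The parameters `A`, `B`, `EPS` and all function names are illustrative assumptions.

```python
import math


def log_binom_pmf(n, k, p):
    """Natural log of the Binomial(n, p) probability of k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def log_sum_exp(values):
    """Stable evaluation of log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def stein_rate(n, a, b, eps):
    """(1/n) log of the second-kind error of the test 'accept Bernoulli(a) if the
    number of ones is at most k', where k is the smallest threshold whose
    first-kind error is at most eps. Assumes a < b."""
    cdf0 = 0.0
    k = 0
    for k in range(n + 1):
        cdf0 += math.exp(log_binom_pmf(n, k, a))
        if cdf0 >= 1.0 - eps:
            break
    log_beta = log_sum_exp([log_binom_pmf(n, j, b) for j in range(k + 1)])
    return log_beta / n


def relative_entropy_bernoulli(a, b):
    """S(P0||P1) for P0 = Bernoulli(a) and P1 = Bernoulli(b)."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))


A, B, EPS = 0.2, 0.5, 0.1
for n in (100, 500, 2000):
    print(n, stein_rate(n, A, B, EPS), -relative_entropy_bernoulli(A, B))
```

The computed exponent moves toward $-S(\rho_0\|\rho_1)$ as $n$ increases, again with a correction of order $n^{-1/2}$.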

Quantum ergodicity and mixing belong to the field
of quantum chaos, which studies quantizations of 
“chaotic” classical Hamiltonian systems. The basic 
question is: how does the chaos of the classical 
dynamics impact on the eigenvalues/eigenfunctions 
of the quantum Hamiltonian H and on long-time 
dynamics generated by H? 

These problems lie at the foundations of the
semiclassical limit, that is, the limit as the Planck
constant $\hbar \to 0$ or the energy $E \to \infty$. More generally,
one could ask what impact any dynamical feature of a
classical mechanical system (e.g., complete integrability,
KAM, and ergodicity) has on the eigenfunctions
and eigenvalues of the quantization.

Over the last 30 years or so, these questions have 
been studied rather systematically by both mathe- 
maticians and physicists. There is an extensive 
literature comparing classical and quantum 
dynamics of model systems, such as comparing the 
geodesic flow and wave group on a compact (or 
finite-volume) hyperbolic surface, or comparing 
classical and quantum billiards on the Sinai billiard 
or the Bunimovich stadium, or comparing the 
discrete dynamical system generated by a hyperbolic
torus automorphism and its quantization by the 
metaplectic representation. As these models indicate, 
the basic problems and phenomena are richly 
embodied in simple, low-dimensional examples in 
much the same way that two-dimensional toy 
statistical mechanical models already illustrate com- 
plex problems on phase transitions. The principles 
established for simple models should apply to far 
more complex systems such as atoms and molecules 
in strong magnetic fields. 

The conjectural picture which has emerged from 
many computer experiments and heuristic argu- 
ments on these simple model systems is roughly 
that there exists a length scale in which quantum 
chaotic systems exhibit universal behavior. At this 
length scale, the eigenvalues resemble eigenvalues of 
random matrices of large size and the eigenfunctions 
resemble random waves. A small sample of the 
original physics articles suggesting this picture is 
Berry (1977), Bohigas et al. (1984), Feingold and 
Peres (1986), and Heller (1984). 

This article reviews some of the rigorous mathe- 
matical results in quantum chaos, particularly those 
on eigenfunctions of quantizations of classically 
ergodic or mixing systems. They support the 
conjectural picture of random waves up to two 
moments, that is, on the level of means and 
variances. A few results also exist on higher 
moments in very special cases. But from the 
mathematical point of view, the conjectural links 
to random matrices or random waves remain very 
much open at this time. A key difficulty is that the 
length scale on which universal behavior should 
occur is far below the resolving power of any known 
mathematical techniques, even in the simplest model 
problems. The main evidence for the random 
matrix and random wave connections comes from 
numerous computer experiments of model cases in 
the physics literature. We will not review numerical 
results here, but to get a well-rounded view of the 
field, it is important to understand the computer 
experiments (see, e.g., Bäcker et al. (1998a, b) and
Barnett (2005)).

The model quantum systems that have been most 
intensively studied in mathematical quantum chaos 
are Laplacians or Schrödinger operators on com- 
pact (or finite-volume) Riemannian manifolds, with 
or without boundary, and quantizations of symplectic 
maps on compact Kähler manifolds. Similar 
techniques and results apply in both settings, so for 
the sake of coherence we concentrate on the 
Laplacian on a compact Riemannian manifold 
with “chaotic” geodesic flow and only briefly 
allude to the setting of “quantum maps.” 


Additionally, two main kinds of methods are in 
use: (1) methods of semiclassical (or microlocal) 
analysis, which apply to general Laplacians (and 
more general Schrödinger operators), and (2) 
methods of number theory and automorphic 
forms, which apply to arithmetic models such as 
arithmetic hyperbolic manifolds or quantum cat maps. 
Arithmetic models are far more “explicitly solvable” 
than general chaotic systems, and the results obtained 
for them are far sharper than the results of semiclassi- 
cal analysis. This article is primarily devoted to the 
general results on Laplacians obtained by semiclassical 
analysis; for arithmetic models, see Arithmetic Quantum 
Chaos by J Marklof. For background on semiclassical analysis, 
see Heller (1984). 


Wave Group and Geodesic Flow 


The model quantum Hamiltonians we will discuss 
are Laplacians $\Delta$ on compact Riemannian manifolds 
$(M,g)$ (with or without boundary). The classical phase 
space in this setting is the cotangent bundle $T^*M$ of 
$M$, equipped with its canonical symplectic form 
$\sum_i dx_i \wedge d\xi_i$. The metric defines the 
Hamiltonian 

$$H(x,\xi) = |\xi|_g = \sqrt{\sum_{i,j} g^{ij}(x)\,\xi_i\,\xi_j}$$

on $T^*M$, where 

$$g_{ij} = g\left(\frac{\partial}{\partial x_i}, \frac{\partial}{\partial x_j}\right)$$

and $[g^{ij}]$ is the inverse matrix to $[g_{ij}]$. We denote the 
volume density of $(M,g)$ by $dVol$ and the corresponding 
inner product on $L^2(M)$ by $\langle f, g\rangle$. The unit 
(co-)ball bundle is denoted $B^*M = \{(x,\xi) : |\xi|_g \le 1\}$. 

The Hamiltonian flow $\Phi^t$ of $H$ is the geodesic 
flow. By definition, $\Phi^t(x,\xi) = (x_t,\xi_t)$, where 
$(x_t,\xi_t)$ is the terminal tangent vector at time $t$ of the 
unit speed geodesic starting at $x$ in the direction $\xi$. Here 
and below, we often identify $T^*M$ with the tangent 
bundle $TM$ using the metric to simplify the 
geometric description. The geodesic flow preserves 
the energy surfaces $\{H = E\}$, which are the co-sphere 
bundles $S^*_E M$. Due to the homogeneity of $H$, 
the flow on any energy surface $\{H = E\}$ is equivalent 
to that on the co-sphere bundle $S^*M = \{H = 1\}$. 
(This homogeneity could be broken by adding a 
potential $V \in C^\infty(M)$ to form a semiclassical 
Schrödinger operator $-\hbar^2\Delta + V$, whose underlying 
Hamiltonian flow is generated by $|\xi|_g^2 + V(x)$.) See 
h-Pseudodifferential Operators and Applications. 

The quantization of the Hamiltonian $H$ is the 
square root $\sqrt{\Delta}$ of the positive Laplacian 

$$\Delta = -\frac{1}{\sqrt{g}} \sum_{i,j} \frac{\partial}{\partial x_i}\, g^{ij} \sqrt{g}\, \frac{\partial}{\partial x_j}$$

of $(M,g)$. Here, $g = \det[g_{ij}]$. We choose to work 
with $\sqrt{\Delta}$ rather than $\Delta$ since the former generates 
the wave group 

$$U_t = e^{it\sqrt{\Delta}}$$

which is the quantization of the geodesic flow $\Phi^t$. 
By the last statement we mean that $U_t$ is related to 
$\Phi^t$ in several essentially equivalent ways: 


1. singularities of waves, that is, solutions $U_t\psi$ of 
the wave equation, propagate along geodesics; 

2. $U_t$ is a Fourier integral operator (= quantum 
map) associated to the canonical relation defined 
by the graph of $\Phi^t$ in $T^*M \times T^*M$; and 

3. Egorov’s theorem holds. 


We only define the latter since it plays an important 
role in studying eigenfunctions. As with any quantum 
theory, there is an algebra of observables on the 
Hilbert space $L^2(M, dvol_g)$ which quantizes $T^*M$. 
Here, $dvol_g$ is the volume form of the metric. The 
algebra is the algebra $\Psi^*(M)$ of pseudodifferential 
operators ($\Psi$DOs) of all orders, though we often restrict 
to the subalgebra $\Psi^0$ of $\Psi$DOs of order zero. We denote by 
$\Psi^m(M)$ the subspace of pseudodifferential operators of 
order $m$. The algebra is defined by constructing a 
quantization $\mathrm{Op}$ from an algebra of symbols 
$a \in S^m(T^*M)$ of order $m$ (polyhomogeneous functions on 
$T^*M \setminus 0$) to $\Psi^m$. The map $\mathrm{Op}$ is not unique. In the 
reverse direction is the symbol map $\sigma_A : \Psi^m \to S^m(T^*M)$, 
which takes an operator $\mathrm{Op}(a)$ to the 
homogeneous term $a_m$ of order $m$ in $a$. 

Egorov’s theorem for the wave group concerns the 
conjugations 

$$\alpha_t(A) := U_t A U_t^*, \quad A \in \Psi^m(M) \qquad [1]$$

Such a conjugation defines the quantum evolution of 
observables in the Heisenberg picture, and, since the 
early days of quantum mechanics, it was known to 
correspond to the classical evolution 

$$V_t(a) := a \circ \Phi^t \qquad [2]$$

of observables $a \in C^\infty(S^*M)$. Egorov’s theorem is 
the rigorous version of this correspondence: it states 
that $\alpha_t$ defines an order-preserving automorphism of 
$\Psi^*(M)$, that is, $\alpha_t(A) \in \Psi^m(M)$ if $A \in \Psi^m(M)$, and 
that 

$$\sigma_{U_t A U_t^*}(x,\xi) = \sigma_A(\Phi^t(x,\xi)) = V_t(\sigma_A)(x,\xi), \quad (x,\xi) \in T^*M \setminus 0 \qquad [3]$$


This formula is almost universally taken to be the 
definition of quantization of a flow or map in the 
physics literature. 

The key difficulty in quantum chaos is that it 
involves a comparison between long-time dynamical 
properties of $\Phi^t$ and $U_t$ through the symbol map and 
similar classical limits. The classical dynamics 
defines the “principal symbol” behavior of $U_t$, and 
the “error” $U_t A U_t^* - \mathrm{Op}(\sigma_A \circ \Phi^t)$ typically grows 
exponentially in time. This is just the first example 
of a ubiquitous “exponential barrier” in the subject. 


Eigenvalues and Eigenfunctions of $\Delta$ 


The eigenvalue problem on a compact Riemannian 
manifold 

$$\Delta\varphi_j = \lambda_j^2\,\varphi_j, \qquad \langle\varphi_j,\varphi_k\rangle = \delta_{jk}$$

is dual under the Fourier transform to the wave 
equation. Here, $\{\varphi_j\}$ is a choice of orthonormal basis 
of eigenfunctions, which is not unique if the 
eigenvalues have multiplicities $> 1$. The individual 
eigenfunctions are difficult to study directly, and so 
one generally forms the spectral projections kernel, 

$$E(\lambda,x,y) = \sum_{j:\,\lambda_j \le \lambda} \varphi_j(x)\,\varphi_j(y) \qquad [4]$$

Semiclassical asymptotics is the study of the $\lambda \to \infty$ 
limit of the spectral data $\{\varphi_j, \lambda_j\}$ or of $E(\lambda,x,y)$. The 
(Schwartz) kernel of the wave group can be 
represented in terms of the spectral data by 

$$U_t(x,y) = \sum_j e^{it\lambda_j}\,\varphi_j(x)\,\varphi_j(y)$$

or equivalently as the Fourier transform 
$\int_{\mathbb{R}} e^{it\lambda}\,dE(\lambda,x,y)$ of the spectral projections. Hence, 
spectral asymptotics is often studied through the 
large-time behavior of the wave group. 

The link between spectral theory and geometry, 
and the source of Egorov’s theorem for the wave 
group, is the construction of a parametrix (or WKB 
formula) for the wave kernel. For small times $t$, the 
simplest is the Hadamard parametrix, 

$$U_t(x,y) \sim \int_0^\infty e^{i\theta(r^2(x,y) - t^2)} \sum_{k=0}^\infty U_k(x,y)\,\theta^{\frac{n-3}{2}-k}\,d\theta \quad (t < \mathrm{inj}(M,g)) \qquad [5]$$

where $r(x,y)$ is the distance between points, 
$U_0(x,y) = \Theta^{-1/2}(x,y)$ is the volume 1/2-density, 
$\mathrm{inj}(M,g)$ is the injectivity radius, and the higher 
Hadamard coefficients are obtained by solving 
transport equations along geodesics. The parametrix 
is asymptotic to the wave kernel in the sense of 
smoothness, that is, the difference of the two sides of 
[5] is smooth. The relation [5] may be iterated using 
$U_{tm} = U_t^m$ to obtain a parametrix for long times. 
This is obviously complicated and not necessarily 
the best long-time parametrix construction, but it 
illustrates again the difficulty of a long-time 
analysis. 


Weyl Law and Local Weyl Law 


A fundamental and classical result in spectral 
asymptotics is Weyl’s law on counting eigenvalues: 

$$N(\lambda) = \#\{j : \lambda_j \le \lambda\} = \frac{|B_n|}{(2\pi)^n}\,\mathrm{Vol}(M,g)\,\lambda^n + O(\lambda^{n-1}) \qquad [6]$$

Here, $|B_n|$ is the Euclidean volume of the unit ball 
in $\mathbb{R}^n$, $n = \dim M$, and $\mathrm{Vol}(M,g)$ is the volume of $M$ 
with respect to the metric $g$. An equivalent formula which 
emphasizes the correspondence between classical and quantum 
mechanics is 

$$\mathrm{tr}\,E(\lambda) \sim \frac{\mathrm{Vol}(|\xi|_g \le \lambda)}{(2\pi)^n} \qquad [7]$$

where $\mathrm{Vol}$ is the symplectic volume measure relative 
to the natural symplectic form $\sum_{j=1}^n dx_j \wedge d\xi_j$ on 
$T^*M$. Thus, the dimension of the space where 
$\sqrt{\Delta}$ is $\le \lambda$ is asymptotically the volume where 
its symbol $|\xi|_g$ is $\le \lambda$. 

The remainder term in Weyl’s law is sharp on the 
standard sphere, where all geodesics are periodic, but 
is not sharp on (M,g) for which the set of periodic 
geodesics has measure zero (Duistermaat—Guillemin, 
Ivrii) (see Semiclassical Spectra and Closed Orbits). 
When the set of periodic geodesics has measure zero 
(as is the case for ergodic systems), one has 

$$N(\lambda) = \#\{j : \lambda_j \le \lambda\} = \frac{|B_n|}{(2\pi)^n}\,\mathrm{Vol}(M,g)\,\lambda^n + o(\lambda^{n-1}) \qquad [8]$$

The remainder is then of smaller order than the 
derivative of the principal term, and one then has 
asymptotics in shorter intervals: 

$$N([\lambda,\lambda+1]) = \#\{j : \lambda_j \in [\lambda,\lambda+1]\} = n\,\frac{|B_n|}{(2\pi)^n}\,\mathrm{Vol}(M,g)\,\lambda^{n-1} + o(\lambda^{n-1}) \qquad [9]$$

Physicists tend to write $\lambda \sim h^{-1}$ and to average over 
intervals of this width. The mean spacing between 
the eigenvalues in this interval is then 
$\sim C_n\,\mathrm{Vol}(M,g)^{-1}\,\lambda^{-(n-1)}$, where $C_n$ is a constant 
depending on the dimension. 
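
As a rough numerical illustration of Weyl’s law [6] (a sketch under stated assumptions, not part of the original argument), one can take the flat torus $\mathbb{R}^2/\mathbb{Z}^2$, whose eigenfunctions $e^{2\pi i\langle k,x\rangle}$ have frequencies $2\pi|k|$ and whose volume is 1. The function names below are hypothetical and chosen only for this example.

```python
import math


def unit_ball_volume(n):
    """Euclidean volume |B_n| of the unit ball in R^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def count_torus_frequencies(lam):
    """N(lam) = #{k in Z^2 : 2*pi*|k| <= lam} for the flat torus R^2/Z^2."""
    radius_sq = (lam / (2 * math.pi)) ** 2
    k_max = math.isqrt(int(radius_sq))
    count = 0
    for k1 in range(-k_max, k_max + 1):
        # Largest |k2| such that k1^2 + k2^2 <= radius_sq.
        k2_max = math.isqrt(int(radius_sq - k1 * k1))
        count += 2 * k2_max + 1
    return count


def weyl_leading_term(lam, n=2, volume=1.0):
    """Leading term |B_n| / (2*pi)^n * Vol * lam^n of Weyl's law."""
    return unit_ball_volume(n) / (2 * math.pi) ** n * volume * lam ** n


if __name__ == "__main__":
    for lam in (50.0, 100.0, 200.0, 400.0):
        exact = count_torus_frequencies(lam)
        weyl = weyl_leading_term(lam)
        print(f"lambda={lam:6.1f}  N={exact:6d}  Weyl={weyl:10.1f}  ratio={exact / weyl:.4f}")
```

The ratio of the exact count to the leading term approaches 1 as $\lambda$ grows; the difference is bounded by a multiple of $\lambda^{n-1} = \lambda$, in line with the remainder in [6].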


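An important generalization is the “local Weyl law” 
concerning the traces $\mathrm{tr}\,A E(\lambda)$, where $A \in \Psi^0(M)$. 
It asserts that 

$$\sum_{\lambda_j \le \lambda} \langle A\varphi_j,\varphi_j\rangle = \frac{1}{(2\pi)^n} \int_{B^*M} \sigma_A\,dx\,d\xi\;\lambda^n + O(\lambda^{n-1}) \qquad [10]$$

There is also a pointwise local Weyl law: 

$$\sum_{\lambda_j \le \lambda} |\varphi_j(x)|^2 = \frac{1}{(2\pi)^n}\,|B_n|\,\lambda^n + R(\lambda,x) \qquad [11]$$

where $R(\lambda,x) = O(\lambda^{n-1})$ uniformly in $x$. Again, 
when the periodic geodesics form a set of measure 
zero in $S^*M$, one could average over the shorter 
interval $[\lambda,\lambda+1]$. Combining the Weyl and local 
Weyl law, we find the surface average of $\sigma_A$ is a 
limit of traces: 

$$\omega(A) := \frac{1}{\mu(S^*M)} \int_{S^*M} \sigma_A\,d\mu = \lim_{\lambda\to\infty} \frac{1}{N(\lambda)} \sum_{\lambda_j \le \lambda} \langle A\varphi_j,\varphi_j\rangle \qquad [12]$$

Here, $\mu$ is the “Liouville measure” on $S^*M$, that is, 
the surface measure $d\mu = dx\,d\xi/dH$ induced by the 
Hamiltonian $H = |\xi|_g$ and by the symplectic volume 
measure $dx\,d\xi$ on $T^*M$. 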


Problems on Asymptotics of Eigenfunctions 


Eigenfunctions arise in quantum mechanics as 
stationary states, that is, states $\psi$ for which the 
probability measure $|\psi(t,x)|^2\,dvol$ is constant in time, 
where $\psi(t,x) = U_t\psi(x)$ is the evolving state. This 
follows from the fact that 

$$U_t\varphi_k = e^{it\lambda_k}\varphi_k \qquad [13]$$

and that $|e^{it\lambda_k}| = 1$. They are the basic modes of the 
quantum system. One would like to know the 
behavior as $\lambda_j \to \infty$ (or $\hbar \to 0$ in the semiclassical 
setting) of invariants such as: 


1. matrix elements $\langle A\varphi_j,\varphi_j\rangle$ of observables in this 
state; 

2. transition elements $\langle A\varphi_i,\varphi_j\rangle$ between states; 

3. size properties as measured by $L^p$ norms $\|\varphi_j\|_{L^p}$; 

4. value distribution as measured by the distribution 
function $\mathrm{Vol}\{x \in M : |\varphi_j(x)| > t\}$; and 

5. shape properties, for example, distribution of 
zeros and critical points of $\varphi_j$. 


Let us introduce some problems which have 
motivated much of the work in this area. 

Problem 1 Let $\mathcal{Q}$ denote the set of “quantum 
limits,” that is, weak* limit points of the sequence 
$\{\Phi_k\}$ of distributions on the classical phase space 
$S^*M$, defined by 

$$\int_{S^*M} a\,d\Phi_k := \langle \mathrm{Op}(a)\varphi_k,\varphi_k\rangle$$

where $a \in C^\infty(S^*M)$. 


The set $\mathcal{Q}$ is independent of the definition of $\mathrm{Op}$. 
It follows almost immediately from Egorov’s theorem 
that $\mathcal{Q} \subset \mathcal{M}_I$, where $\mathcal{M}_I$ is the convex set of 
invariant probability measures for the geodesic flow. 
Furthermore, they are time-reversal invariant, that 
is, invariant under $(x,\xi) \to (x,-\xi)$, since the 
eigenfunctions are real valued. 

To see this, it is helpful to introduce the linear 
functionals on $\Psi^0$: 

$$\rho_k(A) = \langle A\varphi_k,\varphi_k\rangle \qquad [14]$$

We observe that $\rho_k(I) = 1$, $\rho_k(A) \ge 0$ if $A \ge 0$, 
and that 

$$\rho_k(U_t A U_t^*) = \rho_k(A) \qquad [15]$$

Indeed, if $A \ge 0$ then $A = B^*B$ for some $B \in \Psi^0$ 
and we can move $B^*$ to the right-hand side. 
Similarly, [15] is proved by moving $U_t$ to the right-hand 
side and using [13]. These properties mean 
that $\rho_k$ is an “invariant state” on the algebra $\Psi^0$. 
More precisely, one should take the closure of $\Psi^0$ in 
the operator norm. An invariant state is the analog 
in quantum statistical mechanics of an invariant 
probability measure. 

The next important fact about the states $\rho_k$ is that 
any weak limit of the sequence $\{\rho_k\}$ on $\Psi^0$ is an 
invariant probability measure on $C(S^*M)$, that is, 
a positive linear functional on $C(S^*M)$ rather than 
just a state on $\Psi^0$. This follows from the fact that 
$\langle K\varphi_j,\varphi_j\rangle \to 0$ for any compact operator $K$, and so any 
limit of $\langle A\varphi_k,\varphi_k\rangle$ is equally a limit of 
$\langle (A+K)\varphi_k,\varphi_k\rangle$. Hence, any limit is bounded by 
$\inf_K \|A + K\|$ (the infimum taken over compact operators), 
and for any $A \in \Psi^0$, $\|\sigma_A\|_{L^\infty} = \inf_K \|A + K\|$. 
Hence, any weak limit is bounded by a constant times 
$\|\sigma_A\|_{L^\infty}$ and is therefore continuous on $C(S^*M)$. 
It is a positive functional since each $\rho_j$ is, and hence 
any limit is a probability measure. By Egorov’s theorem and 
the invariance of the $\rho_k$, any limit of $\rho_k(A)$ is a limit of 
$\rho_k(\mathrm{Op}(\sigma_A \circ \Phi^t))$, and hence the limit measure is 
invariant. 

Problem 1 is thus to identify which invariant 
measures in $\mathcal{M}_I$ show up as weak limits of the 
functionals $\rho_k$ or equivalently the distributions $d\Phi_k$. 
The weak limits reflect the concentration and 
oscillation properties of eigenfunctions. Here are 
some possibilities: 


1. Normalized Liouville measure. In fact, the functional 
$\omega$ of [12] is also a state on $\Psi^0$ for the 
reason explained above. A subsequence $\{\varphi_{j_k}\}$ of 
eigenfunctions is considered diffuse if $\rho_{j_k} \to \omega$. 

2. A periodic orbit measure $\mu_\gamma$ defined by 

$$\mu_\gamma(A) = \frac{1}{L_\gamma} \int_\gamma \sigma_A\,ds$$

where $L_\gamma$ is the length of $\gamma$. A sequence of 
eigenfunctions for which $\rho_{k_j} \to \mu_\gamma$ obviously 
concentrates (or strongly “scars”) on the closed 
geodesic. 

3. A finite sum of periodic orbit measures. 

4. A delta-function along an invariant Lagrangian 
manifold $\Lambda \subset S^*M$. The associated eigenfunctions 
are viewed as “localizing” along $\Lambda$. 

5. A more general invariant measure which is 
singular with respect to $d\mu$. 


All of these possibilities can and do happen in 
different examples. If $d\Phi_{k_j} \to \omega$, then in particular 
we have 

$$\frac{1}{\mathrm{Vol}(M)} \int_E |\varphi_{k_j}(x)|^2\,dVol \to \frac{\mathrm{Vol}(E)}{\mathrm{Vol}(M)}$$

for any measurable set $E$ whose boundary has 
measure zero. Interpreting $|\varphi_{k_j}(x)|^2\,dVol$ as the 
probability density of finding a particle of energy 
$\lambda_{k_j}^2$ at $x$, this result means that the sequence of 
probabilities tends to uniform measure. 

However, $d\Phi_{k_j} \to \omega$ is much stronger since it says 
that the eigenfunctions become diffuse on the energy 
surface $S^*M$ and not just on the configuration space 
$M$. As an example, consider the flat torus $\mathbb{R}^n/\mathbb{Z}^n$. 
An orthonormal basis of eigenfunctions is furnished 
by the standard exponentials $e^{2\pi i\langle k,x\rangle}$ with $k \in \mathbb{Z}^n$. 
Obviously, $|e^{2\pi i\langle k,x\rangle}|^2 = 1$, so the eigenfunctions are 
already diffuse in configuration space. On the other 
hand, they are far from diffuse in phase space, and 
localize on invariant Lagrangian tori in $S^*M$. Indeed, 
by definition of pseudodifferential operator, 
$A e^{2\pi i\langle k,x\rangle} = a(x,k)\,e^{2\pi i\langle k,x\rangle}$, where $a(x,k)$ is the 
complete symbol. Thus, 

$$\langle A e^{2\pi i\langle k,x\rangle}, e^{2\pi i\langle k,x\rangle}\rangle = \int_{\mathbb{R}^n/\mathbb{Z}^n} a(x,k)\,dx \sim \int_{\mathbb{R}^n/\mathbb{Z}^n} \sigma_A\left(x, \frac{k}{|k|}\right) dx$$

A subsequence $e^{2\pi i\langle k_j,x\rangle}$ of eigenfunctions has a weak 
limit if and only if $k_j/|k_j|$ tends to a limit vector $\xi_0$ in 
the unit sphere in $\mathbb{R}^n$. In this case, the associated 
weak* limit is $\int_{\mathbb{R}^n/\mathbb{Z}^n} \sigma_A(x,\xi_0)\,dx$, that is, the 
delta-function on the invariant torus $T_{\xi_0} \subset S^*M$ defined 
by the constant momentum condition $\xi = \xi_0$. The 
eigenfunctions are said to localize on this invariant 
torus for $\Phi^t$. 
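
The dependence of the torus limits on the direction $k_j/|k_j|$ can be checked numerically. The following is a minimal sketch, assuming $n = 2$ and a hypothetical degree-zero symbol chosen purely for illustration; it evaluates the leading-order matrix element $\int \sigma_A(x, k/|k|)\,dx$ on a uniform grid.

```python
import math


def torus_matrix_element(symbol, k, grid=64):
    """Approximate the integral over R^2/Z^2 of symbol(x, k/|k|) dx on a uniform grid."""
    norm = math.hypot(k[0], k[1])
    xi = (k[0] / norm, k[1] / norm)
    total = 0.0
    for i in range(grid):
        for j in range(grid):
            total += symbol((i / grid, j / grid), xi)
    return total / (grid * grid)


def example_symbol(x, xi):
    # Hypothetical symbol, homogeneous of degree zero in xi: it depends on
    # the position x and on the unit direction xi.
    return (1.0 + math.cos(2 * math.pi * x[0])) * xi[0] ** 2


if __name__ == "__main__":
    for j in (1, 10, 100, 1000):
        along_axis = torus_matrix_element(example_symbol, (j, 1))
        diagonal = torus_matrix_element(example_symbol, (j, j))
        print(f"j={j:5d}  k=(j,1): {along_axis:.6f}   k=(j,j): {diagonal:.6f}")
```

Along $k_j = (j,1)$ the values tend to 1 (direction $\xi_0 = (1,0)$), while along $k_j = (j,j)$ they equal 1/2, so different sequences of exponentials give different weak limits even though every $|e^{2\pi i\langle k,x\rangle}|^2$ is identically 1.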

The flat torus is a model of a completely 
integrable system on both the classical and quantum 
levels. Another example is that of the standard 
round sphere $S^n$. In this case, the author and 
D Jakobson showed that absolutely any invariant 
measure $\nu \in \mathcal{M}_I$ can arise as a weak limit of a 
sequence of eigenfunctions. This reflects the huge 
degeneracy (multiplicities) of the eigenvalues. 

On the other hand, if the geodesic flow is ergodic, 
one would expect the eigenfunctions to be diffuse in 
phase space. In the next section, we will discuss the 
rigorous results on this problem. 

Off-diagonal matrix elements 

$$\rho_{jk}(A) = \langle A\varphi_j,\varphi_k\rangle \qquad [16]$$

are also important as transition amplitudes between 
states. They no longer define states, since $\rho_{jk}(I) = 0$, 
and they are neither positive nor invariant. Indeed, 
$\rho_{jk}(U_t A U_t^*) = e^{it(\lambda_j - \lambda_k)}\rho_{jk}(A)$, so they are 
eigenvectors of the automorphism $\alpha_t$ of [1]. A sequence 
of such matrix elements cannot have a weak limit unless the 
spectral gap $\lambda_j - \lambda_k$ tends to a limit $\tau \in \mathbb{R}$. In this 
case, by the same discussion as above, any weak 
limit of the functionals $\rho_{jk}$ will be an eigenmeasure 
of the geodesic flow which transforms by $e^{i\tau t}$ under 
the action of $\Phi^t$. Examples of such eigenmeasures 
are orbital Fourier coefficients 

$$\frac{1}{L_\gamma} \int_0^{L_\gamma} e^{-i\tau t}\,\sigma_A(\Phi^t(x,\xi))\,dt$$

along a periodic orbit. Here, $\tau \in (2\pi/L_\gamma)\mathbb{Z}$. We 
denote by $\mathcal{Q}_\tau$ such eigenmeasures of the geodesic 
flow. Problem 1 has the following extension to 
off-diagonal elements: 


Problem 2 Determine the set $\mathcal{Q}_\tau$ of “quantum 
limits,” that is, weak* limit points of the sequence 
$\{\Phi_{kj}\}$ of distributions on the classical phase space 
$S^*M$, defined by 

$$\int_{S^*M} a\,d\Phi_{kj} := \langle \mathrm{Op}(a)\varphi_k,\varphi_j\rangle$$

where $\lambda_j - \lambda_k = \tau + o(1)$ and where $a \in C^\infty(S^*M)$, or 
equivalently of the functionals $\rho_{jk}$. 


As will be discussed in the section “Quantum 
weak mixing,” the asymptotics of off-diagonal 
elements depends on the weak mixing properties of 
the geodesic flow and not just its ergodicity. 


Matrix elements of eigenfunctions are quadratic 
forms. More “nonlinear” problems involve the 
$L^p$-norms or the distribution functions of eigenfunctions. 
Estimates of the $L^\infty$-norms can be obtained 
from the pointwise local Weyl law [11]. Since the jump in 
the left-hand side at $\lambda$ is $\sum_{j:\,\lambda_j = \lambda} |\varphi_j(x)|^2$ and the 
jump in the right-hand side is the jump of $R(\lambda,x)$, 
this implies 

$$\sum_{j:\,\lambda_j = \lambda} |\varphi_j(x)|^2 = O(\lambda^{n-1}) \implies \|\varphi_j\|_{L^\infty} = O(\lambda_j^{\frac{n-1}{2}}) \qquad [17]$$


For general $L^p$-norms, the following bounds were 
proved by C Sogge for any compact Riemannian 
manifold: 

$$\frac{\|\varphi_j\|_p}{\|\varphi_j\|_2} = O(\lambda_j^{\delta(p)}), \quad 2 \le p \le \infty \qquad [18]$$

where 

$$\delta(p) = \begin{cases} n\left(\frac{1}{2} - \frac{1}{p}\right) - \frac{1}{2}, & \frac{2(n+1)}{n-1} \le p \le \infty \\ \frac{n-1}{2}\left(\frac{1}{2} - \frac{1}{p}\right), & 2 \le p \le \frac{2(n+1)}{n-1} \end{cases} \qquad [19]$$

These estimates are sharp on the unit sphere $S^n \subset \mathbb{R}^{n+1}$. 
The extremal eigenfunctions are the zonal 
spherical harmonics, which are the $L^2$-normalized 
spectral projection kernels $\Pi_N(x,x_0)/\|\Pi_N(\cdot,x_0)\|$ 
centered at any $x_0$. However, they are not sharp 
for generic $(M,g)$, and it is natural to ask how 
“chaotic dynamics” might influence $L^p$-norms. 
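
The two regimes of the exponent $\delta(p)$ in [18] meet at the critical index $p = 2(n+1)/(n-1)$, and the endpoint $p = \infty$ reproduces the exponent $(n-1)/2$ of [17]. The short sketch below checks this with exact rational arithmetic; the function name and the convention that `p=None` stands for $p = \infty$ are assumptions made for this example only.

```python
from fractions import Fraction


def sogge_exponent(n, p=None):
    """Exponent delta(p) for dimension n >= 2; p=None stands for p = infinity."""
    if n < 2:
        raise ValueError("n must be at least 2")
    inv_p = Fraction(0) if p is None else 1 / Fraction(p)
    half = Fraction(1, 2)
    if inv_p > half:
        raise ValueError("p must satisfy 2 <= p <= infinity")
    # 1/p at the critical index p = 2(n+1)/(n-1).
    inv_p_critical = Fraction(n - 1, 2 * (n + 1))
    if inv_p <= inv_p_critical:
        # High-p regime: 2(n+1)/(n-1) <= p <= infinity.
        return n * (half - inv_p) - half
    # Low-p regime: 2 <= p <= 2(n+1)/(n-1).
    return Fraction(n - 1, 2) * (half - inv_p)


if __name__ == "__main__":
    for n in (2, 3):
        for p in (2, 4, 6, 8, None):
            label = "inf" if p is None else str(p)
            print(f"n={n}  p={label:>3}  delta={sogge_exponent(n, p)}")
```

For a surface ($n = 2$) the critical index is $p = 6$, where both branches give $\delta = 1/6$; for $p = \infty$ the exponent is $1/2$.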


Problem 3 Improve the estimates $\|\varphi_j\|_p/\|\varphi_j\|_2 = 
O(\lambda_j^{\delta(p)})$ for $(M,g)$ with ergodic or mixing geodesic 
flow. 


C Sogge and the author have proved that if a 
sequence of eigenfunctions attains the bounds in 
[17], then there must exist a point $x_0$ so that a 
positive measure of geodesics starting at $x_0$ in $S^*_{x_0}M$ 
returns to $x_0$ at a fixed time $T$. In the real analytic 
case, all return, so $x_0$ is a perfect recurrent point. In 
dimension 2, such a perfect recurrent point cannot 
occur if the geodesic flow is ergodic; hence 
$\|\varphi_j\|_{L^\infty} = o(\lambda_j^{(n-1)/2})$ on any real analytic surface 
with ergodic geodesic flow. This shows that none 
of the $L^p$-estimates above the critical index are sharp 
for real analytic surfaces with ergodic geodesic flow, 
and the problem is the extent to which they can be 
improved. 

The random wave model (see the section “Random 
waves and orthonormal bases”) predicts that 
eigenfunctions of Riemannian manifolds with chaotic 
geodesic flow should have the bounds $\|\varphi_\lambda\|_{L^p} = O(1)$ 
for $p < \infty$ and that $\|\varphi_\lambda\|_{L^\infty} < \sqrt{\log\lambda}$. But there are 
no rigorous estimates at this time close to such 
predictions. The best general estimate to date on 
negatively curved compact manifolds (which are 
models of chaotic geodesic flow) is just the logarithmic 
improvement 

$$\|\varphi_j\|_{L^\infty} = O\left(\frac{\lambda_j^{\frac{n-1}{2}}}{\sqrt{\log\lambda_j}}\right)$$

which follows from the improvement $O(\lambda^{n-1}/\log\lambda)$ 
on the standard remainder term in the local Weyl 
law. This was known for compact hyperbolic 
manifolds from the Selberg trace formula, and 
similar estimates hold for manifolds without conjugate 
points (P Bérard). The exponential growth of the 
geodesic flow again causes a barrier in improving 
the estimate beyond the logarithm. In the analogous 
setting of quantum “cat maps,” which are models of 
chaotic classical dynamics, there exist arbitrarily 
large eigenvalues with multiplicities of the order 
$O(\lambda^{n-1}/\log\lambda)$; the $L^\infty$-norm of the $L^2$-normalized 
projection kernel onto an eigenspace of this 
multiplicity is of order of the square root of the 
multiplicity (Faure et al. 2003). This raises doubt 
that the logarithmic estimate can be improved by 
general dynamical arguments. Further discussion of 
$L^\infty$-norms, as well as zeros, will be given at the end 
of the next section for ergodic systems. 


Quantum Ergodicity 


In this section, we discuss results on the problems 
stated above when the geodesic flow of (M,g) is 
assumed to be ergodic. Let us recall that this means 
that Liouville measure is an ergodic measure for $\Phi^t$. 
This is a spectral property of the operator $V_t$ of [2] 
on $L^2(S^*M, d\mu)$, namely that $V_t$ has 1 as an 
eigenvalue of multiplicity 1. That is, the only 
invariant $L^2$-functions (with respect to Liouville 
measure) are the constant functions. This implies 
that the only invariant sets have Liouville measure 0 
or 1 and (Birkhoff’s ergodic theorem) that time 
averages of functions are constant almost every- 
where (equal to the space average). 
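
The statement that time averages equal space averages can be illustrated on a much simpler system than a geodesic flow. The sketch below uses a circle rotation $x \mapsto x + \alpha \pmod 1$ as a stand-in (an assumption made purely for illustration): it is ergodic for Lebesgue measure exactly when $\alpha$ is irrational. The function name is hypothetical.

```python
import math


def birkhoff_average(f, x0, alpha, steps):
    """Time average (1/steps) * sum of f(x0 + n*alpha mod 1) for n = 0..steps-1."""
    total = 0.0
    x = x0
    for _ in range(steps):
        total += f(x)
        x = (x + alpha) % 1.0
    return total / steps


def observable(x):
    # Test observable on the circle with space average 0.
    return math.cos(4 * math.pi * x)


if __name__ == "__main__":
    golden = (math.sqrt(5) - 1) / 2
    for steps in (10, 100, 1000, 10000):
        ergodic = birkhoff_average(observable, 0.0, golden, steps)
        periodic = birkhoff_average(observable, 0.0, 0.5, steps)
        print(f"steps={steps:6d}  irrational: {ergodic:+.6f}   rational: {periodic:+.6f}")
```

For the irrational rotation the time average tends to the space average 0; for $\alpha = 1/2$ the orbit of $x_0 = 0$ is periodic and the time average stays at 1, so the system is not ergodic.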

In this case, there is a general result which 
originated in the work of Schnirelman and was 
developed into the following theorem by Zelditch, 
Colin de Verdière, and Sunada (manifolds without 
boundary), and Gérard–Leichtnam and Zelditch–Zworski 
(manifolds with boundary). The following 
discussion is based on the articles (Zelditch 
1996b, c, Zelditch and Zworski 1996), which 
contain further references to the literature. 


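Theorem 1 Let $(M,g)$ be a compact Riemannian 
manifold (possibly with boundary), and let $\{\lambda_j, \varphi_j\}$ 
be the spectral data of its Laplacian $\Delta$. Then the 
geodesic flow $\Phi^t$ is ergodic on $(S^*M, d\mu)$ if and only 
if, for every $A \in \Psi^0(M)$, we have: 

(i) $\lim_{\lambda\to\infty} \frac{1}{N(\lambda)} \sum_{\lambda_j \le \lambda} |\langle A\varphi_j,\varphi_j\rangle - \omega(A)|^2 = 0$. 

(ii) $(\forall\epsilon)(\exists\delta)\; \limsup_{\lambda\to\infty} \frac{1}{N(\lambda)} \sum_{j \ne k:\; \lambda_j,\lambda_k \le \lambda,\; |\lambda_j - \lambda_k| < \delta} |\langle A\varphi_j,\varphi_k\rangle|^2 < \epsilon$. 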


This implies that there exists a subsequence $\{\varphi_{j_k}\}$ 
of eigenfunctions whose indices $j_k$ have counting 
density 1 for which $\langle A\varphi_{j_k},\varphi_{j_k}\rangle \to \omega(A)$. We will call 
the eigenfunctions in such a sequence “ergodic 
eigenfunctions.” One can sharpen the results by 
averaging over eigenvalues in the shorter interval 
$[\lambda,\lambda+1]$ rather than in $[0,\lambda]$. 

There is also an ergodicity result for boundary values 
of eigenfunctions on domains with boundary and with 
Dirichlet, Neumann, or Robin boundary conditions 
(Gérard–Leichtnam, Hassell–Zelditch, Burq). This 
corresponds to the fact that the billiard map on $B^*\partial M$ 
is ergodic. 

The first statement (i) is essentially a convexity 
result. It remains true if one replaces the square by 
any convex function $\phi$ on the spectrum of $A$: 

$$\frac{1}{N(E)} \sum_{\lambda_k \le E} \phi(\langle A\varphi_k,\varphi_k\rangle - \omega(A)) \to 0 \qquad [20]$$

Before sketching a proof, we point out a somewhat 
heuristic “picture proof” of the theorem. 
Namely, ergodicity of the geodesic flow is equivalent 
to the statement that Liouville measure is an 
extreme point of the compact convex set $\mathcal{M}_I$. In 
fact, it further implies that $\omega$ is an extreme point of 
the compact convex set $\mathcal{E}_{\mathbb{R}}$ of invariant states for $\alpha_t$ 
of eqn [1]; see Ruelle (1969) for background. But 
the local Weyl law says that $\omega$ is also the limit of the 
convex combination 

$$\frac{1}{N(E)} \sum_{\lambda_j \le E} \rho_j$$

An extreme point cannot be written as a convex 
combination of other states unless all the states in the 
combination are equal to it. In our case, $\omega$ is only a 
limit of convex combinations so it need not (and does 
not) equal each term. However, almost all terms in the 
sequence must tend to $\omega$, and that is equivalent to (i). 


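Sketch of Proof of Theorem 1(i) As mentioned 
above, this is a convexity result, and with no 
additional effort we can consider more general 
sums of the form [20]. By the invariance [15], we have 

$$\sum_{\lambda_k \le E} \phi(\langle A\varphi_k,\varphi_k\rangle - \omega(A)) = \sum_{\lambda_k \le E} \phi(\langle (\langle A\rangle_T - \omega(A))\varphi_k,\varphi_k\rangle) \qquad [21]$$

where 

$$\langle A\rangle_T = \frac{1}{2T} \int_{-T}^{T} U_t A U_t^*\,dt$$

We then apply the Peierls–Bogoliubov inequality 

$$\sum_{j=1}^{n} \phi(\langle B\varphi_j,\varphi_j\rangle) \le \mathrm{tr}\,\phi(B)$$

with $B = \Pi_E[\langle A\rangle_T - \omega(A)]\Pi_E$ to get 

$$\sum_{\lambda_k \le E} \phi(\langle (\langle A\rangle_T - \omega(A))\varphi_k,\varphi_k\rangle) \le \mathrm{tr}\,\phi(\Pi_E[\langle A\rangle_T - \omega(A)]\Pi_E) \qquad [22]$$

Here, $\Pi_E$ is the spectral projection for $\sqrt{\Delta}$ 
corresponding to the interval $[0,E]$. From the Berezin 
inequality we then have (if $\phi(0) = 0$): 

$$\frac{1}{N(E)}\,\mathrm{tr}\,\phi(\Pi_E[\langle A\rangle_T - \omega(A)]\Pi_E) \le \frac{1}{N(E)}\,\mathrm{tr}\,\Pi_E\,\phi(\langle A\rangle_T - \omega(A))\,\Pi_E = \omega_E(\phi(\langle A\rangle_T - \omega(A)))$$

where $\omega_E(B) := \frac{1}{N(E)}\,\mathrm{tr}\,\Pi_E B$. 
As long as $\phi$ is smooth, $\phi(\langle A\rangle_T - \omega(A))$ is a 
pseudodifferential operator of order zero with 
principal symbol $\phi(\langle\sigma_A\rangle_T - \omega(A))$. Since 
$\omega_E \to \omega$ as $E \to \infty$ by the local Weyl law [12], we get 

$$\limsup_{E\to\infty} \frac{1}{N(E)} \sum_{\lambda_k \le E} \phi(\langle A\varphi_k,\varphi_k\rangle - \omega(A)) \le \int_{\{H=1\}} \phi(\langle\sigma_A\rangle_T - \omega(A))\,d\mu$$

where 

$$\langle\sigma_A\rangle_T = \frac{1}{2T} \int_{-T}^{T} \sigma_A \circ \Phi^t\,dt$$

As $T \to \infty$ the right-hand side approaches $\phi(0) = 0$ 
by the dominated convergence theorem and by 
Birkhoff’s ergodic theorem. Since the left-hand side 
is independent of $T$, this implies that 

$$\lim_{E\to\infty} \frac{1}{N(E)} \sum_{\lambda_k \le E} \phi(\langle A\varphi_k,\varphi_k\rangle - \omega(A)) = 0$$

for any smooth convex $\phi$ on $\mathrm{Spec}(A)$ with $\phi(0) = 0$. 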


As mentioned above, the statement of Theorem 1(i) 
is equivalent to saying that there is a subsequence 
$\{\varphi_{j_k}\}$ of counting density 1 for which $\rho_{j_k} \to \omega$. The 
above proof does not and cannot settle the question 
whether there exist exceptional sparse subsequences 
of eigenfunctions of density zero tending to other 
invariant measures. To see this, we observe that 
the proof is so general that it applies to seemingly very 
different situations. In place of the distributions 
$\{\Phi_k\}$ we may consider the set $\{\mu_\gamma\}$ of periodic orbit 
measures for a hyperbolic flow on a compact manifold 
$X$. That is, 

$$\mu_\gamma(f) = \frac{1}{T_\gamma} \int_\gamma f, \quad f \in C(X)$$

where $\gamma$ is a closed orbit and $T_\gamma$ is its period. 
According to the Bowen–Margulis equidistribution 
theorem for closed orbits of hyperbolic flows, we 
have 

$$\frac{1}{\Pi(T)} \sum_{\gamma:\,T_\gamma \le T} \frac{1}{|\det(I - P_\gamma)|}\,\mu_\gamma \to \mu$$

where as above $\mu$ is the Liouville measure, where $P_\gamma$ 
is the linear Poincaré map and where $\Pi(T)$ is the 
normalizing factor which makes the left side a 
probability measure, that is, defined by the integral 
of 1 against the sum. An exact repetition of the 
previous argument shows that up to a sparse 
subsequence of $\gamma$’s, $\mu_\gamma \to \mu$ individually. Yet clearly, 
the whole sequence does not tend to $d\mu$: for 
instance, one could choose the sequence of iterates 
$\gamma^k$ of a fixed closed orbit. 


Quantum Ergodicity in Terms of Operator Time 
and Space Averages 


The first part of the result above may be reformu- 
lated as a relation between operator time and space 
averages. 


Definition Let $A \in \Psi^0$ be an observable and define 
its time average to be: 

$$\langle A\rangle := \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} U_t A U_t^*\,dt$$

and its space average to be the scalar operator 
$\omega(A)\cdot I$. 


Here, the limit is taken in the weak operator 
topology (i.e., one matrix element at a time). To see 
what is involved, we consider matrix elements with 
respect to the eigenfunctions. We have 

$$\left\langle \frac{1}{2T} \int_{-T}^{T} U_t A U_t^*\,dt\;\varphi_i, \varphi_j \right\rangle = \frac{\sin T(\lambda_i - \lambda_j)}{T(\lambda_i - \lambda_j)}\,\langle A\varphi_i,\varphi_j\rangle$$

from which it is clear that the matrix element tends 
to zero as $T \to \infty$ unless $\lambda_i = \lambda_j$. However, there is 
no uniformity in the rate at which it goes to zero 
since the spacing $\lambda_i - \lambda_j$ could be uncontrollably 
small. 

In these terms, Theorem 1(i) states that 

$$\langle A\rangle = \omega(A)I + K, \quad \text{where } \lim_{\lambda\to\infty} \omega_\lambda(K^*K) = 0 \qquad [23]$$

where $\omega_\lambda(A) = \frac{1}{N(\lambda)}\,\mathrm{tr}\,E(\lambda)A$. Thus, the time average 
equals the space average plus a term $K$ which 
is semiclassically small in the sense that its 
Hilbert–Schmidt norm square $\|E_\lambda K\|_{HS}^2$ in the span 
of the eigenfunctions of eigenvalue $\le \lambda$ is $o(N(\lambda))$. 
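
The lack of uniformity in the time average can be made concrete by evaluating the damping factor $\sin T(\lambda_i - \lambda_j)/(T(\lambda_i - \lambda_j))$ for different spectral gaps. This is a minimal sketch with a hypothetical function name; the gap values are arbitrary and chosen only to show the effect.

```python
import math


def time_average_factor(T, gap):
    """Factor sin(T*gap) / (T*gap) multiplying an off-diagonal matrix element."""
    x = T * gap
    if x == 0.0:
        return 1.0  # Limit of sin(x)/x as x -> 0: diagonal terms are untouched.
    return math.sin(x) / x


if __name__ == "__main__":
    for gap in (1.0, 1e-2, 1e-4, 1e-6):
        row = "  ".join(f"T={T:>7}: {time_average_factor(T, gap):+.5f}"
                        for T in (10, 1000, 100000))
        print(f"gap={gap:7.0e}  {row}")
```

For a fixed gap the factor decays like $1/(T \cdot \text{gap})$, but a gap of size $10^{-6}$ is still essentially undamped at $T = 1000$, which is why the convergence of the time average holds one matrix element at a time rather than uniformly.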

This is not exactly equivalent to Theorem 1(i) 
since it is independent of the choice of orthonormal 
basis, while the previous result depends on the 
choice of basis. However, when all eigenvalues have 
multiplicity 1, then the two are equivalent. To see 
the equivalence, note that $\langle A\rangle$ commutes with $\sqrt{\Delta}$ 
and hence is diagonal in the basis $\{\varphi_j\}$ of joint 
eigenfunctions of $\langle A\rangle$ and of $U_t$. Hence, $K$ is the 
diagonal matrix with entries $\langle A\varphi_k,\varphi_k\rangle - \omega(A)$. The 
condition is therefore equivalent to 

$$\lim_{E\to\infty} \frac{1}{N(E)} \sum_{\lambda_k \le E} |\langle A\varphi_k,\varphi_k\rangle - \omega(A)|^2 = 0$$

Since all the terms are positive, no cancellation is 
possible and this condition is equivalent to the 
existence of a subset $S \subset \mathbb{N}$ of density 1 such that 
$\mathcal{Q}_S := \{d\Phi_k : k \in S\}$ has only $\omega$ as a weak* limit 
point. As above, one says that the sequence of 
eigenfunctions is ergodic. 

One could take this restatement of Theorem 1(i) 
as a semiclassical definition of quantum ergodicity. 
Two natural questions arise. First: 


Problem 4 Suppose the geodesic flow $\Phi^t$ of $(M,g)$ 
is ergodic on $S^*M$. Is the operator $K$ in 

$$\langle A\rangle = \omega(A) + K$$

a compact operator? In this case, $\sqrt{\Delta}$ is said to be 
quantum uniquely ergodic (QUE). If ergodicity is 
not sufficient for the QUE property, what extra 
conditions need to be added? 


Compactness would imply that $\langle K\varphi_k,\varphi_k\rangle \to 0$, 
hence $\langle A\varphi_k,\varphi_k\rangle \to \omega(A)$ along the entire sequence. 
Quite a lot of attention has been focused on this 
problem in the last decade. It is probable that 
ergodicity is not by itself sufficient for the QUE 
property of general Riemannian manifolds. For instance, 
it is believed that there exist modes of asymptotic 
bouncing ball type which concentrate on the 
invariant Lagrangian cylinder (with boundary) 
formed by bouncing ball orbits of the Bunimovich 
stadium (see e.g., Heller (1984) for more on such 
“scarring”). Further, Faure et al. (2003) have shown 
that QUE does not hold for the hyperbolic system 
defined by a quantum cat map on the torus. Since 
the methods applicable to eigenfunctions of 


quantum maps and of Laplacians have much in 
common, this negative result shows that there 
cannot exist a universal structural proof of QUE. 

The principal positive result available at this time 
is the recent proof by Lindenstrauss of the QUE 
property for the orthonormal basis of Laplace- 
Hecke eigenfunctions on arithmetic hyperbolic sur- 
faces. It is generally believed that the spectrum of 
the Laplace eigenvalues is of multiplicity 1 for such 
surfaces, so this should imply QUE completely for 
these surfaces. Earlier partial results on Hecke 
eigenfunctions are due to Rudnick—Sarnak, Wolpert, 
and others. For references and further discussion on 
Hecke eigenfunctions, see Rudnick and Sarnak 
(1994) (see Arithmetic Quantum Chaos). 

So far we have not mentioned Theorem 1(ii). In 
the next section, we will describe a similar but more 
general result for mixing systems and the relevance 
of (ii) will become clear. An interesting open 
problem is the extent to which (ii) is actually 
necessary for the equivalence to classical ergodicity. 


Problem 5 Converse QE: What can be said of the 
classical limit of a quantum ergodic system, that is, a 
system for which $\langle A\rangle = \omega(A) + K$, where $K$ is 
compact? Is it necessarily ergodic? 


Very little is known on this converse problem at 
present. It is known that if there exists an open set in 
S*M filled by periodic orbits, then the Laplacian 
cannot be quantum ergodic (see Marklof and 
O’Keefe (2005) for recent results and references). 
But no proof exists at this time that KAM systems, 
which have Cantor-like positive measure invariant 
sets, are not quantum ergodic. It is known that there 
exists a positive proportion of approximate eigen- 
functions (quasimodes) which localize on the invari- 
ant tori, but it has not been proved that a positive 
proportion of actual eigenfunctions has this localiza- 
tion property. 


Further Problems and Results on Ergodic 
Eigenfunctions 


Ergodicity is also known to have an impact on 
the distribution of zeros. The complex zeros in 
Kähler phase spaces of ergodic eigenfunctions of 
quantum ergodic maps become uniformly distributed 
with respect to the Kähler volume form 
(Nonnenmacher–Voros, Shiffman–Zelditch). An 
interesting problem is whether the real analog is true: 


Problem 6 Ergodicity and equidistribution of 
nodal sets. Let $\mathcal{N}_{\varphi_j} \subset M$ denote the nodal set (zero 
set) of $\varphi_j$, and equip it with its hypersurface volume 
form $d\mathcal{H}^{n-1}$ induced by $g$. Let $(M,g)$ have ergodic 
geodesic flow, and suppose that $\{\varphi_j\}$ is an ergodic 
sequence of eigenfunctions. Are the following 
asymptotics valid? 

$$\int_{\mathcal{N}_{\varphi_j}} f\,d\mathcal{H}^{n-1} \sim \frac{1}{\mathrm{Vol}(M,g)}\,\lambda_j \int_M f\,dVol_g$$

This is predicted by the random wave model of 
the section “Random waves and orthonormal 
bases.” An equidistribution law for the complex 
zeros is known which gives some evidence for the 
validity of this limit formula. Let $(M,g)$ be a 
compact real analytic Riemannian manifold and let 
$\varphi_j^{\mathbb{C}}$ be the holomorphic extension of the real analytic 
eigenfunction $\varphi_j$ to the complexification $M_{\mathbb{C}}$ of $M$ 
(its Grauert tube). Then, if the geodesic flow is 
ergodic and if $\varphi_j$ is an ergodic sequence of 
eigenfunctions, the normalized current of integration 
$(1/\lambda_j)\,Z_{\varphi_j^{\mathbb{C}}}$ over the complex zero set of $\varphi_j^{\mathbb{C}}$ tends 
weakly to $(i/\pi)\,\partial\bar\partial|\xi|_g$. This current is singular along 
the zero section. 

Finally, we mention some results on $L^\infty$-norms of
eigenfunctions on arithmetic hyperbolic manifolds
of dimensions 2 and 3. It was proved by Iwaniec–Sarnak
that the joint eigenfunctions of $\Delta$ and the
Hecke operators on arithmetic hyperbolic surfaces
have the upper bound $\|\varphi_j\|_\infty = O(\lambda_j^{5/24+\epsilon})$ for all
$j$ and $\epsilon > 0$, and the lower bound $\|\varphi_j\|_\infty \ge
c\sqrt{\log\log\lambda_j}$ for some constant $c > 0$ and infinitely
many $j$. Rudnick and Sarnak (1994) proved that
there exists an arithmetic hyperbolic manifold and a
subsequence $\varphi_{j_k}$ of eigenfunctions with
$\|\varphi_{j_k}\|_\infty \gg \lambda_{j_k}^{1/4}$, contradicting the random
wave model predictions.


Quantum Weak Mixing 


There are parallel results on quantizations of weak-mixing
geodesic flows which are the subject of this
section. First we recall the classical definition:
the geodesic flow of $(M,g)$ is weak mixing if the
operator $V^t$ has purely continuous spectrum on the
orthogonal complement of the constant functions in
$L^2(S^*M, d\mu)$. Hence, like ergodicity, it is a spectral
property of the geodesic flow.
We have: 


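Theorem 2 (Zelditch 1996c). The geodesic flow $\Phi^t$
of $(M,g)$ is weak mixing if and only if the conditions
(i) and (ii) of Theorem 1 hold and additionally, for
any $A \in \Psi^0(M)$,

$$(\forall\epsilon)(\exists\delta)\quad \limsup_{\lambda\to\infty} \frac{1}{N(\lambda)} \sum_{\substack{j\neq k:\ \lambda_j,\lambda_k\le\lambda \\ |\lambda_j-\lambda_k-\tau|<\delta}} |\langle A\varphi_j,\varphi_k\rangle|^2 < \epsilon \qquad (\forall\tau\in\mathbb R)$$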


The restriction $j \neq k$ is of course redundant unless
$\tau = 0$, in which case the statement coincides with
quantum ergodicity. This result follows from the
general asymptotic formula, valid for any compact
Riemannian manifold $(M,g)$, that

In the case of weak-mixing geodesic flows, the right-hand
side tends to 0 as $T \to \infty$. As with diagonal
sums, the sharper result is true where one averages
over the short intervals $[\lambda, \lambda+1]$.


Spectral Measures and Matrix Elements 


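Theorem 2 is based on expressing the spectral
measures of the geodesic flow in terms of matrix
elements. The main limit formula is

$$\int_{\tau-\epsilon}^{\tau+\epsilon} d\mu_{\sigma_A} := \lim_{\lambda\to\infty} \frac{1}{N(\lambda)} \sum_{\substack{i,j:\ \lambda_j\le\lambda \\ |\lambda_i-\lambda_j-\tau|<\epsilon}} |\langle A\varphi_i,\varphi_j\rangle|^2 \qquad [25]$$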


where $d\mu_{\sigma_A}$ is the spectral measure for the geodesic
flow corresponding to the principal symbol of
$A$, $\sigma_A \in C^\infty(S^*M, d\mu)$. Recall that the spectral
measure of $V^t$ corresponding to $f \in L^2$ is the measure
$d\mu_f$ defined by

$$\langle V^t f, f\rangle_{L^2(S^*M)} = \int_{\mathbb R} e^{it\tau}\, d\mu_f(\tau)$$


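The limit formula [25] is equivalent to the dual
formula (under the Fourier transform):

$$\lim_{\lambda\to\infty} \frac{1}{N(\lambda)} \sum_{i,j:\ \lambda_j\le\lambda} e^{it(\lambda_i-\lambda_j)}\, |\langle A\varphi_i,\varphi_j\rangle|^2 = \langle V^t\sigma_A,\sigma_A\rangle_{L^2(S^*M)} \qquad [26]$$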


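The proof of [26] is to consider, for $A \in \Psi^0$, the
operator $A_t^* A \in \Psi^0$ with $A_t = U_t^* A U_t$. By the local
Weyl law,

$$\lim_{\lambda\to\infty} \frac{1}{N(\lambda)} \operatorname{tr} E(\lambda) A_t^* A = \langle V^t\sigma_A,\sigma_A\rangle_{L^2(S^*M)}$$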


The right-hand side of [25] defines a measure $dm_A$
on $\mathbb R$ and [26] says

$$\int_{\mathbb R} e^{it\tau}\, dm_A(\tau) = \langle V^t\sigma_A,\sigma_A\rangle_{L^2(S^*M)} = \int_{\mathbb R} e^{it\tau}\, d\mu_{\sigma_A}(\tau)$$

Since weak-mixing systems are ergodic, it is not
necessary to average in both indices along an
ergodic subsequence:


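$$\lim_{\lambda_j\to\infty} \langle A_t^* A\varphi_j,\varphi_j\rangle = \sum_i e^{it(\lambda_i-\lambda_j)}\, |\langle A\varphi_i,\varphi_j\rangle|^2 = \langle V^t\sigma_A,\sigma_A\rangle_{L^2(S^*M)} \qquad [27]$$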


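Dually, one has

$$\lim_{\lambda_j\to\infty} \sum_{i:\ |\lambda_i-\lambda_j-\tau|<\epsilon} |\langle A\varphi_i,\varphi_j\rangle|^2 = \int_{\tau-\epsilon}^{\tau+\epsilon} d\mu_{\sigma_A} \qquad [28]$$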


For QUE systems, these limit formulas are valid for 
the full sequence of eigenfunctions. 


Rate of Quantum Ergodicity and Mixing 


A quantitative refinement of quantum ergodicity is
to ask at what rate the sums in Theorem 1(i) tend to
zero, that is, to establish a rate of quantum
ergodicity. More generally, we consider “variances”
of matrix elements. For diagonal matrix elements,
we define

$$V_A(\lambda) := \frac{1}{N(\lambda)} \sum_{j:\ \lambda_j\le\lambda} |\langle A\varphi_j,\varphi_j\rangle - \omega(A)|^2 \qquad [29]$$

In the off-diagonal case, one may view $|\langle A\varphi_i,\varphi_j\rangle|^2$ as
analogous to $|\langle A\varphi_j,\varphi_j\rangle - \omega(A)|^2$. However, the sums
in [25] are double sums while those of [29] are
single. One may also average over the shorter
intervals $[\lambda, \lambda+1]$.


Quantum Chaos Conjectures 


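First, consider off-diagonal matrix elements. One
conjecture is that it is not necessary to sum in $j$ in
[28]: each individual term has the asymptotics
consistent with [28]. This is implicitly conjectured
by Feingold–Peres (1986) (see [11]) in the form

$$|\langle A\varphi_i,\varphi_j\rangle|^2 \simeq \frac{C_A\!\left(\frac{E_i-E_j}{\hbar}\right)}{2\pi\rho(E)} \qquad [30]$$

where

$$C_A(\tau) = \int_{-\infty}^{\infty} e^{-i\tau t}\, \langle V^t\sigma_A,\sigma_A\rangle\, dt$$

In our notation, $\lambda_j = \hbar^{-1}E_j$ and $\rho(E)\,dE \sim dN(\lambda)$.
There are $\sim C\lambda^{n-1}$ eigenvalues $\lambda_i$ in the interval
$[\lambda_j - \tau - \epsilon, \lambda_j - \tau + \epsilon]$, so [30] states that individual
terms have the asymptotics of [28].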


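On the basis of the analogy between $|\langle A\varphi_i,\varphi_j\rangle|^2$
and $|\langle A\varphi_j,\varphi_j\rangle - \omega(A)|^2$, it is conjectured in Feingold
and Peres (1986) that

$$V_A(\lambda) \sim \frac{C_{A-\omega(A)I}(0)}{\lambda^{n-1}\,\mathrm{vol}(\Omega)}$$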


The idea is that $\varphi_\pm = (1/\sqrt2)(\varphi_i \pm \varphi_j)$ have the same
matrix element asymptotics as eigenfunctions when
$\lambda_i - \lambda_j$ is sufficiently small. But then $2\langle A\varphi_+,\varphi_-\rangle =
\langle A\varphi_i,\varphi_i\rangle - \langle A\varphi_j,\varphi_j\rangle$ when $A^* = A$. Since we are
taking a difference, we may replace each matrix
element $\langle A\varphi_i,\varphi_i\rangle$ by $\langle A\varphi_i,\varphi_i\rangle - \omega(A)$ (and also for $\varphi_j$).
The conjecture then assumes that $\langle A\varphi_i,\varphi_i\rangle - \omega(A)$ has
the same order of magnitude as $\langle A\varphi_i,\varphi_i\rangle - \langle A\varphi_j,\varphi_j\rangle$.
Dynamical grounds for this conjecture are given in
Eckhardt et al. (1995). The order of magnitude is
predicted by some natural random wave models, as
discussed in the next section.
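
The identity relating the mixed matrix element of the combinations $(1/\sqrt2)(\varphi_i \pm \varphi_j)$ to a difference of diagonal elements can be checked in a few lines. The sketch below assumes, beyond $A^* = A$, that the off-diagonal element $\langle A\varphi_i,\varphi_j\rangle$ is real (for instance, real-valued eigenfunctions and an operator $A$ commuting with complex conjugation); this extra assumption is ours and is not stated in the text.

```latex
\begin{aligned}
2\langle A\varphi_+,\varphi_-\rangle
  &= \langle A(\varphi_i+\varphi_j),\ \varphi_i-\varphi_j\rangle \\
  &= \langle A\varphi_i,\varphi_i\rangle
     - \langle A\varphi_i,\varphi_j\rangle
     + \langle A\varphi_j,\varphi_i\rangle
     - \langle A\varphi_j,\varphi_j\rangle \\
  &= \langle A\varphi_i,\varphi_i\rangle - \langle A\varphi_j,\varphi_j\rangle
     - 2i\,\operatorname{Im}\langle A\varphi_i,\varphi_j\rangle
     \qquad (\text{using } A^*=A) \\
  &= \langle A\varphi_i,\varphi_i\rangle - \langle A\varphi_j,\varphi_j\rangle
     \qquad (\text{if } \langle A\varphi_i,\varphi_j\rangle \in \mathbb R).
\end{aligned}
```

Without the reality assumption the cross terms leave the purely imaginary remainder shown in the third line.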


Rigorous results 


At this time, the strongest variance result is an
asymptotic formula for the diagonal variance proved
by Luo and Sarnak (2004) for special Hecke
eigenfunctions on the quotient $\mathbf H^2/SL(2,\mathbb Z)$ of the
upper half plane by the modular group. Their result
pertains to holomorphic Hecke eigenforms, but the
analogous statement for smooth Maass–Hecke
eigenfunctions is expected to hold by similar
methods, so we state the result as a theorem/conjecture.
Note that $\mathbf H^2/SL(2,\mathbb Z)$ is a noncompact
finite-area surface whose Laplacian $\Delta$ has both a
discrete and a continuous spectrum. The discrete
Hecke eigenfunctions are joint eigenfunctions of $\Delta$
and the Hecke operators $T_p$.


Theorem/Conjecture 1 (Luo and Sarnak 2004).
Let $\{\varphi_k\}$ denote the orthonormal basis of Hecke
eigenfunctions for $\mathbf H^2/SL(2,\mathbb Z)$. Then there exists a
quadratic form $B(f)$ on $C_0^\infty(\mathbf H^2/SL(2,\mathbb Z))$ such that


1 2 1 
NOL f flo dvol - gzjgg | FaVol 
When the multiplier $f = \varphi_\lambda$ is itself an eigenfunction,
Luo–Sarnak have shown that

$$B(\varphi_\lambda,\varphi_\lambda) = C_{\varphi_\lambda}(0)\, L(\tfrac12,\varphi_\lambda)$$

where $L(\tfrac12,\varphi_\lambda)$ is a certain L-function. Thus, the
conjectured classical variance is multiplied by an
arithmetic factor depending on the multiplier. A
crucial fact in the proof is that the quadratic form $B$
is diagonalized by the $\varphi_\lambda$.

The only rigorous result to date which is valid on
general Riemannian manifolds with hyperbolic
geodesic flow is the logarithmic decay:


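Theorem 3 (Zelditch). For any $(M,g)$ with hyperbolic
geodesic flow,

$$\frac{1}{N(\lambda)} \sum_{\lambda_j\le\lambda} |\langle A\varphi_j,\varphi_j\rangle - \omega(A)|^{p} = O\!\left(\frac{1}{(\log\lambda)^{p/2}}\right)$$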


The logarithm reflects the exponential blow-up in 
time of remainder estimates for traces involving the 
wave group associated to hyperbolic flows. It would 
be surprising if the logarithmic decay is sharp for 
Laplacians. However, a recent result of R Schubert 
shows that the estimate is sharp in the case of two- 
dimensional hyperbolic quantum cat maps. Hence, 
the estimate cannot be improved by semiclassical 
arguments that hold in both settings. 


Random Waves and Orthonormal Bases 


We have mentioned that the random wave model
provides a kind of guideline for what to conjecture
about eigenfunctions of quantum chaotic systems. In
this final section, we briefly discuss random wave
models and what they predict.

By a random wave model, one means a prob- 
ability measure on a space of functions. To deal 
with orthonormal bases rather than individual 
functions, one sets a probability measure on a 
space of orthonormal bases, that is, on a unitary 
group. We denote expected values relative to a given 
probability measure by E. We now consider some 
specific Gaussian models and what they predict 
about variances. 

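As a model for quantum chaotic eigenfunctions
in plane domains, Berry (1977) suggested using
the Euclidean random wave model at fixed
energy. A rigorous version of such a model is as
follows: let $\mathcal E_\lambda$ denote the space of (tempered)
eigenfunctions of eigenvalue $\lambda^2$ of the Euclidean
Laplacian $\Delta$ on $\mathbb R^n$. It is spanned by exponentials
$e^{i\langle k,x\rangle}$ with $k \in \mathbb R^n$, $|k| = \lambda$. The infinite-dimensional
space $\mathcal E_\lambda$ is a unitary representation of the Euclidean
motion group and carries an invariant inner
product. The inner product defines an associated
Gaussian measure whose covariance kernel
$C_\lambda(x,y) = \mathbf E f(x)f(y)$ is the derivative at $\lambda$ of the
spectral function

$$E(\lambda,x,y) = (2\pi)^{-n} \int_{|\xi|\le\lambda} e^{i\langle x-y,\xi\rangle}\, d\xi, \qquad \xi\in\mathbb R^n \qquad [31]$$

Thus,

$$\begin{aligned}
C_\lambda(x,y) &= \frac{d}{d\lambda} E(\lambda,x,y) \\
&= (2\pi)^{-n} \int_{|\xi|=\lambda} e^{i\langle x-y,\xi\rangle}\, dS \\
&= (2\pi)^{-n} \lambda^{n-1} \int_{|\xi|=1} e^{i\lambda\langle x-y,\xi\rangle}\, dS \qquad [32]
\end{aligned}$$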


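where $dS$ is the usual surface measure. With this
definition, $C_\lambda(x,x) \sim \lambda^{n-1}$. In order to make
$\mathbf E(f(x)^2) = 1$ consistent with normalized eigenfunctions,
we divide by $\lambda^{n-1}$ to define

$$\hat C_\lambda(x,y) = (2\pi)^{-n} \int_{|\xi|=1} e^{i\lambda\langle x-y,\xi\rangle}\, dS$$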


One could express the integral as a Bessel function 
to rewrite this as 


n 
r 
5 


Wick’s formula in this ensemble gives 





*) ile A 2? Fog_ay (Abe — yl 
Thus, in dimension n we have

p(y) dxdy - V?

V(x) V(y)dx dy
D ests - y|A)*dx dy

In the last line, we used the stationary-phase
asymptotics

$$(2\pi)^{-n} \int_{|\xi|=1} e^{i\lambda\langle x-y,\xi\rangle}\, dS \sim C_n\, (\lambda|x-y|)^{-(n-1)/2} \cos(|x-y|\lambda) \qquad [33]$$

Thus, the variances have order $\lambda^{-(n-1)}$ in dimension $n$,
consistent with the conjectures in Feingold and
Peres (1986) and Eckhardt et al. (1995).

This model is often used to obtain predictions on
eigenfunctions of chaotic systems. By construction,
it is tied to Euclidean geometry and only pertains
directly to individual eigenfunctions of a fixed
eigenvalue. It is based on the infinite-dimensional
multiplicity of eigenfunctions of fixed eigenvalue of
the Euclidean Laplacian on $\mathbb R^n$. There also exist
random wave models on a curved Riemannian
manifold $(M,g)$, which model individual eigenfunctions
and also random orthonormal bases
(Zelditch 1996a). Thus, one can compare the
behavior of sums over eigenvalues of the orthonormal
basis of eigenfunctions of $\Delta$ with that of a
random orthonormal basis. Instead of taking
Gaussian random combinations of Euclidean plane
waves of a fixed eigenvalue, one takes Gaussian
random combinations $\sum_{j:\ \lambda_j\in[\lambda,\lambda+1]} c_j\varphi_j$ of the
eigenfunctions of $(M,g)$ with eigenvalues in a short
interval in the sense above. Equivalently, one takes
random combinations with $\sum_j |c_j|^2 = 1$. These
random waves are globally adapted to $(M,g)$. The
statistical results depend on the measure of the set of
periodic geodesics of $(M,g)$; thus, as discussed in
Kaplan and Heller (1998), different random wave
models make different predictions about off-diagonal
variances.

Fix a compact Riemannian manifold $(M,g)$ and
partition the spectrum of $\sqrt\Delta$ into the intervals
$I_k = [k, k+1]$. Let $\Pi_k = E(k+1) - E(k)$ be the
spectral projection for $\sqrt\Delta$ corresponding
to the interval $I_k$. Its kernel $\Pi_k(x,y)$ is
the covariance kernel of Gaussian random combinations
$\sum_{j:\ \lambda_j\in I_k} c_j\varphi_j$ and is analogous to $C_\lambda(x,y)$ in
the Euclidean case; it is of course not
the derivative $dE(\lambda,x,y)$ but the difference of the
spectral projector over $I_k$. We denote by $N(k)$ the
number of eigenvalues in $I_k$ and put $\mathcal H_k = \operatorname{ran}\Pi_k$
(the range of $\Pi_k$). We define a “random” orthonormal
basis of $\mathcal H_k$ by changing the basis of
eigenfunctions $\{\varphi_j\}$ of $\Delta$ in $\mathcal H_k$ by a random
element of the unitary group $U(\mathcal H_k)$ of the
finite-dimensional Hilbert space $\mathcal H_k$. We then define a
random orthonormal basis of $L^2(M)$ by taking the
product over all the spectral intervals in our
partition. More precisely, we define the
infinite-dimensional unitary group

$$U(\infty) = \prod_{k=1}^{\infty} U(\mathcal H_k)$$

of sequences $(U_1, U_2, \ldots)$, with $U_k \in U(\mathcal H_k)$. We
equip $U(\infty)$ with the product

$$d\nu_\infty = \prod_{k=1}^{\infty} d\nu_k$$

of the unit mass Haar measures $d\nu_k$ on $U(\mathcal H_k)$: we
then define a random orthonormal basis of $L^2(M)$ to
be obtained by applying a random element
$U \in U(\infty)$ to the orthonormal basis $\Phi = \{\varphi_j\}$ of
eigenfunctions of $\sqrt\Delta$.

Assuming the set of periodic geodesics of $(M,g)$
has measure zero, the Weyl remainder results [8]
and strong Szegö limit asymptotics of Guillemin–Okikiolu
and Laptev–Robert–Safarov give two-term
asymptotics for the traces $\Pi_k A \Pi_k$, $(\Pi_k A \Pi_k)^2$ for any
pseudodifferential operator $A$. Combining the strong
Szegö asymptotics with the arguments of Zelditch
(1996a), random orthonormal bases can be proved
to satisfy the following variance asymptotics:


1. E(Sprjen, (AU; Ue) — (A) 
~ (w(A"A) — w(A)’); 
sinT(A;—Aj—T) 


2, ED i ie, TAN) 
See also: Arithmetic Quantum Chaos; Determinantal
Random Fields; Eigenfunctions of Quantum Completely
Integrable Systems; Fractal Dimensions in Dynamics;
h-Pseudodifferential Operators and Applications; Number
Theory in Physics; Regularization for Dynamical Zeta
Functions; Semiclassical Spectra and Closed Orbits.


Quantum Error Correction


Building a quantum computer or a quantum com- 
munications device in the real world means having 
to deal with errors. Any qubit stored unprotected or 
one transmitted through a communications channel 
will inevitably come out at least slightly changed. 
The theory of quantum error-correcting codes 
(QECCs) has been developed to counteract noise 
introduced in this way. By adding extra qubits and 
carefully encoding the quantum state we wish to 
protect, a quantum system can be insulated to a 
great extent against errors. 

To build a quantum computer, we face an even 
more daunting task: if our quantum gates are 
imperfect, everything we do will add to the error. 
The theory of fault-tolerant quantum computation 
tells us how to perform operations on states encoded 
in a QECC without compromising the code’s ability 
to protect against errors. 

In general, a QECC is a subspace of a Hilbert 
space designed so that any of a set of possible errors 
can be corrected by an appropriate quantum 
operation. Specifically: 


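Definition 1 Let $\mathcal H_n$ be a $2^n$-dimensional Hilbert
space ($n$ qubits), and let $C$ be a $K$-dimensional
subspace of $\mathcal H_n$. Then $C$ is an $((n,K))$ (binary) QECC
correcting the set of errors $\mathcal E = \{E_a\}$ iff $\exists\, \mathcal R$ s.t. $\mathcal R$ is a
quantum operation and $(\mathcal R \circ E_a)(|\psi\rangle) = |\psi\rangle$ for all
$E_a \in \mathcal E$, $|\psi\rangle \in C$.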


$\mathcal R$ is called the “recovery” or “decoding” operation
and serves to actually perform the correction
of the state. The decoder is sometimes also taken to
map $\mathcal H_n$ into an unencoded Hilbert space $\mathcal H_{\log K}$
isomorphic to $C$. This should be distinguished from
the “encoding” operation which maps $\mathcal H_{\log K}$ into
$\mathcal H_n$, determining the imbedding of $C$. The computational
complexity of the encoder is frequently
a great deal lower than that of the decoder.
In particular, the task of determining what error
has occurred can be computationally difficult
(NP-hard, in fact), and designing codes with
efficient decoding algorithms is an important task
in quantum error correction, as in classical error
correction.

This article will cover only binary quantum codes, 
built with qubits as registers, but all of the 
techniques discussed here can be generalized to 
higher-dimensional registers, or “qudits.” 

To determine whether a given subspace is able to
correct a given set of errors, we can apply the
quantum error-correction conditions (Bennett et al.
1996, Knill and Laflamme 1997):

Theorem 1 A QECC $C$ corrects the set of errors $\mathcal E$ iff

$$\langle\psi_i| E_a^\dagger E_b |\psi_j\rangle = C_{ab}\,\delta_{ij} \qquad [1]$$

where $E_a, E_b \in \mathcal E$ and $\{|\psi_i\rangle\}$ form an orthonormal
basis for $C$.


The salient point in these error-correction conditions
is that the matrix element $C_{ab}$ does not depend
on the encoded basis states $i$ and $j$, which, roughly
speaking, indicates that neither the environment nor
the decoding operation learns any information about
the encoded state. We can imagine the various
possible errors taking the subspace $C$ into other
subspaces of $\mathcal H_n$, and we want those subspaces to be
isomorphic to $C$, and to be distinguishable from
each other by an appropriate measurement. For
instance, if $C_{ab} = \delta_{ab}$, then the various erroneous
subspaces are orthogonal to each other.
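
As a concrete illustration of the error-correction conditions, the following Python sketch checks them for the three-qubit repetition code against single bit-flip errors. The choice of code, the error set, and all function names are our own assumptions for illustration. The check is deliberately restricted to X-type errors acting on computational-basis code words, so that every matrix element is 0 or 1.

```python
from itertools import product

# Illustrative code space (an assumption, not taken from the text):
# |0_L> = |000>, |1_L> = |111>, stored as computational-basis indices.
CODE_BASIS = [0b000, 0b111]

# Bit-flip errors are stored as XOR masks: 0b000 is the identity,
# 0b001 flips the last qubit, and so on.
SINGLE_FLIPS = [0b000, 0b001, 0b010, 0b100]


def matrix_element(i, mask_a, mask_b, j):
    """Return <psi_i| E_a^dagger E_b |psi_j> for X-type errors."""
    bra_state = CODE_BASIS[i] ^ mask_a   # E_a |psi_i>
    ket_state = CODE_BASIS[j] ^ mask_b   # E_b |psi_j>
    return 1 if bra_state == ket_state else 0


def knill_laflamme_matrix(errors):
    """Return the matrix C_ab if the conditions hold, otherwise None."""
    dim = len(CODE_BASIS)
    c_matrix = []
    for mask_a in errors:
        row = []
        for mask_b in errors:
            diagonal = {matrix_element(i, mask_a, mask_b, i) for i in range(dim)}
            off_diagonal = {
                matrix_element(i, mask_a, mask_b, j)
                for i, j in product(range(dim), repeat=2)
                if i != j
            }
            # C_ab must not depend on i, and must vanish for i != j.
            if len(diagonal) != 1 or off_diagonal != {0}:
                return None
            row.append(diagonal.pop())
        c_matrix.append(row)
    return c_matrix


if __name__ == "__main__":
    print(knill_laflamme_matrix(SINGLE_FLIPS))
    print(knill_laflamme_matrix(SINGLE_FLIPS + [0b011]))
```

For the single flips the matrix $C_{ab}$ is the identity, so the erroneous subspaces are mutually orthogonal. Adding the two-qubit flip `0b011` violates the conditions, because it maps $|000\rangle$ to the same state that a single flip (mask `0b100`) produces from $|111\rangle$.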

Because of the linearity of quantum mechanics,
we can always take the set of errors $\mathcal E$ to be a linear
space: if a QECC corrects $E_a$ and $E_b$, it will also
correct $\alpha E_a + \beta E_b$ using the same recovery operation.
In addition, if we write any superoperator $\mathcal S$ in
terms of its operator-sum representation $\mathcal S(\rho) \mapsto
\sum_k A_k \rho A_k^\dagger$, a QECC that corrects the set of errors
$\{A_k\}$ automatically corrects $\mathcal S$ as well. Thus, it is
sufficient in general to check that the error-correction
conditions hold for a basis of errors.


Frequently, we are interested in codes that correct
any error affecting $t$ or fewer physical qubits. In that
case, let us consider tensor products of the Pauli
matrices

$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\quad
X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\quad
Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix},\quad
Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \qquad [2]$$

Define the Pauli group $P_n$ as the group consisting of
tensor products of $I$, $X$, $Y$, and $Z$ on $n$ qubits, with
an overall phase of $\pm 1$ or $\pm i$. The weight $\mathrm{wt}(P)$ of a
Pauli operator $P \in P_n$ is the number of qubits on
which it acts as $X$, $Y$, or $Z$ (i.e., not as the identity).
Then the Pauli operators of weight $t$ or less form a
basis for the set of all errors acting on $t$ or fewer
qubits, so a QECC which corrects these Pauli
operators corrects all errors acting on up to $t$
qubits. If we have a channel which causes errors
independently with probability $O(\epsilon)$ on each qubit
in the QECC, then the code will allow us to
decode a correct state except with probability
$O(\epsilon^{t+1})$, which is the probability of having more
than $t$ errors. We get a similar result in the case
where the noise is a general quantum operation on
each qubit which differs from the identity by
something of size $O(\epsilon)$.


Definition 2 The distance $d$ of an $((n,K))$ QECC is
the smallest weight of a nontrivial Pauli operator
$E \in P_n$ s.t. the equation

$$\langle\psi_i| E |\psi_j\rangle = C(E)\,\delta_{ij} \qquad [3]$$

fails.


We use the notation $((n,K,d))$ to refer to an
$((n,K))$ QECC with distance $d$. Note that for $P, Q \in
P_n$, $\mathrm{wt}(PQ) \le \mathrm{wt}(P) + \mathrm{wt}(Q)$. Then by comparing
the definition of distance with the quantum error-correction
conditions, we immediately see that a
QECC corrects $t$ general errors iff its distance $d > 2t$.
If we are instead interested in “erasure” errors, when
the location of the error is known but not its precise
nature, a distance $d$ code corrects $d - 1$ erasure
errors. If we only wish to detect errors, a distance $d$
code can detect errors on up to $d - 1$ qubits.

One of the central problems in the theory of
quantum error correction is to find codes which
maximize the ratios $(\log K)/n$ and $d/n$, so they can
encode as many qubits as possible and correct as
many errors as possible. Conversely, we are also
interested in the problem of setting upper bounds on
achievable values of $(\log K)/n$ and $d/n$. The
quantum Singleton bound (or Knill–Laflamme
(1997) bound) states that any $((n,K,d))$ QECC
must satisfy

$$n - \log K \ge 2d - 2 \qquad [4]$$

We can set a lower bound on the existence of
QECCs using the quantum Gilbert–Varshamov
bound, which states that, for large $n$, an $((n,2^k,d))$
QECC exists provided that

$$k/n \le 1 - (d/n)\log 3 - h(d/n) \qquad [5]$$

where $h(x) = -x\log x - (1-x)\log(1-x)$ is the
binary Hamming entropy. Note that the Gilbert–Varshamov
bound simply states that codes at least
this good exist; it does not suggest that better codes
cannot exist.
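
The two bounds, together with the relation between distance and the number of correctable errors, are easy to evaluate numerically. The following is a minimal sketch, assuming all logarithms are taken base 2 (natural for qubits, but our assumption); the function names are hypothetical.

```python
import math


def correctable_errors(d):
    """Largest t with d > 2t: the number of general errors a distance-d code corrects."""
    return (d - 1) // 2


def satisfies_singleton(n, K, d):
    """Quantum Singleton bound: n - log2(K) >= 2d - 2."""
    return n - math.log2(K) >= 2 * d - 2


def binary_entropy(x):
    """h(x) = -x log2 x - (1 - x) log2(1 - x), with h(0) = h(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def gilbert_varshamov_rate(delta):
    """Value of 1 - delta*log2(3) - h(delta) for relative distance delta = d/n.

    For large n, codes with rate k/n up to this value are guaranteed to exist.
    """
    return 1 - delta * math.log2(3) - binary_entropy(delta)


if __name__ == "__main__":
    print(correctable_errors(3), satisfies_singleton(5, 2, 3))
    print(gilbert_varshamov_rate(0.1), gilbert_varshamov_rate(0.2))
```

With these conventions a $((5,2,3))$ code meets the Singleton bound with equality, while no $((4,2,3))$ code can exist. The Gilbert–Varshamov guarantee is positive at $d/n = 0.1$ but has already turned negative at $d/n = 0.2$, which says nothing about whether better codes exist there.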


Stabilizer Codes 


In order to better manipulate and discover QECCs,
it is helpful to have a more detailed mathematical
structure to work with. The most widely used
structure gives a class of codes known as “stabilizer
codes” (Calderbank et al. 1998, Gottesman
1996). They are less general than arbitrary quantum
codes, but have a number of useful properties that
make them easier to work with than the general
QECC.


Definition 3 Let $S \subset P_n$ be an abelian subgroup of
the Pauli group that does not contain $-1$ or $\pm i$, and
let $C(S) = \{|\psi\rangle \text{ s.t. } P|\psi\rangle = |\psi\rangle\ \forall P \in S\}$. Then $C(S)$ is a
stabilizer code and $S$ is its stabilizer.


Because of the simple structure of the Pauli group,
any abelian subgroup has order $2^{n-k}$ for some $k$ and
can easily be specified by giving a set of $n - k$
commuting generators.

The code words of the QECC are by definition in
the $+1$-eigenspace of all elements of the stabilizer,
but an error $E$ acting on a code word will move the
state into the $-1$-eigenspace of any stabilizer element
$M$ which anticommutes with $E$:

$$M(E|\psi\rangle) = -EM|\psi\rangle = -E|\psi\rangle \qquad [6]$$

Thus, measuring the eigenvalues of the generators of
$S$ tells us information about the error that has
occurred. The set of such eigenvalues can be
represented as an $(n-k)$-dimensional binary vector
known as the “error syndrome.” Note that the error
syndrome does not tell us anything about the encoded
state, only about the error that has occurred.


Theorem 2 Let $S$ be a stabilizer with $n - k$ generators,
and let $S^\perp = \{E \in P_n \text{ s.t. } [E,M] = 0\ \forall M \in S\}$.
Then $S$ encodes $k$ qubits and has distance $d$, where $d$
is the smallest weight of an operator in $S^\perp \setminus S$.

We use the notation $[[n,k,d]]$ to refer to such a
stabilizer code. Note that the square brackets specify
that the code is a stabilizer code, and that the middle
term $k$ refers to the number of encoded qubits, and
not the dimension $2^k$ of the encoded subspace, as for
the general QECC (whose dimension might not be a
power of 2).

$S^\perp$ is the set of Pauli operators that commute with
all elements of the stabilizer. They would therefore
appear to be those errors which cannot be detected
by the code. However, the theorem specifies the
distance of the code by considering $S^\perp \setminus S$. A Pauli
operator $P \in S$ cannot be detected by the code, but
there is in fact no need to detect it, since all code
words remain fixed under $P$, making it equivalent to
the identity operation. A distance $d$ stabilizer code
which has nontrivial $P \in S$ with $\mathrm{wt}(P) < d$ is called
degenerate, whereas one which does not is nondegenerate.
The phenomenon of degeneracy has no
analog for classical error-correcting codes, and
makes the study of quantum codes substantially
more difficult than the study of classical error
correction. For instance, a standard bound on
classical error correction is the Hamming bound
(or sphere-packing bound), but the analogous
quantum Hamming bound

$$k/n \le 1 - (t/n)\log 3 - h(t/n) \qquad [7]$$

for $[[n,k,2t+1]]$ codes (when $n$ is large) is only
known to apply to nondegenerate quantum codes
(though in fact we do not know of any degenerate
QECCs that violate the quantum Hamming bound).

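An example of a stabilizer code is the 5-qubit
code, a $[[5,1,3]]$ code whose stabilizer can be
generated by

$$\begin{aligned}
&X \otimes Z \otimes Z \otimes X \otimes I \\
&I \otimes X \otimes Z \otimes Z \otimes X \\
&X \otimes I \otimes X \otimes Z \otimes Z \\
&Z \otimes X \otimes I \otimes X \otimes Z
\end{aligned}$$

The 5-qubit code is a nondegenerate code, and is the
smallest possible QECC which corrects 1 error (as
one can see from the quantum Singleton bound).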

It is frequently useful to consider other representations
of stabilizer codes. For instance, $P \in P_n$ can
be represented by a pair of $n$-bit binary vectors
$(p_X | p_Z)$, where $p_X$ is 1 for any location where $P$ has
an $X$ or $Y$ tensor factor and is 0 elsewhere, and $p_Z$
is 1 for any location where $P$ has a $Y$ or $Z$ tensor
factor. Two Pauli operators $P = (p_X | p_Z)$ and
$Q = (q_X | q_Z)$ commute iff $p_X \cdot q_Z + p_Z \cdot q_X = 0$.
Then the stabilizer for a code becomes a pair of
$(n-k) \times n$ binary matrices, and most interesting
properties can be determined by an appropriate
linear algebra exercise. Another useful representation
is to map the single-qubit Pauli operators $I$, $X$,
$Y$, $Z$ to the finite field GF(4), which sets up a
connection between stabilizer codes and a subset of
classical codes on four-dimensional registers.
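
The binary representation turns commutation checks and syndrome extraction into arithmetic mod 2. The following Python sketch applies it to the 5-qubit code, writing the generators as the strings XZZXI, IXZZX, XIXZZ, ZXIXZ. The string encoding and the function names are our own conventions for illustration, and overall phases are ignored.

```python
from itertools import combinations, product

# Stabilizer generators of the 5-qubit code, one character per qubit.
GENERATORS = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]


def to_binary(pauli):
    """Return the pair (p_X, p_Z) of bit lists for a Pauli string."""
    p_x = [1 if op in "XY" else 0 for op in pauli]
    p_z = [1 if op in "YZ" else 0 for op in pauli]
    return p_x, p_z


def commute(p, q):
    """P and Q commute iff p_X . q_Z + p_Z . q_X = 0 (mod 2)."""
    p_x, p_z = to_binary(p)
    q_x, q_z = to_binary(q)
    form = sum(a * b for a, b in zip(p_x, q_z)) + sum(a * b for a, b in zip(p_z, q_x))
    return form % 2 == 0


def syndrome(error):
    """One bit per generator: 1 where the error anticommutes with it."""
    return tuple(0 if commute(g, error) else 1 for g in GENERATORS)


def single_qubit_errors(n):
    """Yield all weight-1 Pauli strings on n qubits."""
    for position, op in product(range(n), "XYZ"):
        yield "I" * position + op + "I" * (n - position - 1)


if __name__ == "__main__":
    print(all(commute(g, h) for g, h in combinations(GENERATORS, 2)))
    for error in single_qubit_errors(5):
        print(error, syndrome(error))
```

Running the check shows that the four generators commute pairwise and that the 15 single-qubit errors produce 15 distinct nonzero syndromes. Together with the trivial syndrome these exhaust all $2^4$ possibilities, which is one way to see that the code corrects any single-qubit error without degeneracy.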


CSS Codes 


CSS codes are a very useful class of stabilizer codes 
invented by Calderbank and Shor (1996), and by 
Steane (1996). The construction takes two binary 
classical linear codes and produces a quantum code, 
and can therefore take advantage of much existing 
knowledge from classical coding theory. In addition, 
CSS codes have some very useful properties which 
make them excellent choices for fault-tolerant 
quantum computation. 

A classical $[n,k,d]$ linear code ($n$ physical bits, $k$
logical bits, classical distance $d$) can be defined in
terms of an $(n-k) \times n$ binary “parity check” matrix
$H$: every classical code word $v$ must satisfy $Hv = 0$.
Each row of the parity check matrix can be
converted into a Pauli operator by replacing each 0
with an $I$ operator and each 1 with a $Z$ operator.
Then the stabilizer code generated by these operators
is precisely a quantum version of the classical
error-correcting code given by $H$. If the classical
distance $d = 2t + 1$, the quantum code can correct $t$
bit flip ($X$) errors, just as could the classical code.

If we want to make a QECC that can also correct
phase ($Z$) errors, we should choose two classical
codes $C_1$ and $C_2$, with parity check matrices $H_1$ and
$H_2$. Let $C_1$ be an $[n,k_1,d_1]$ code and let $C_2$ be an
$[n,k_2,d_2]$ code. We convert $H_1$ into stabilizer
generators as above, replacing each 0 with $I$ and
each 1 with $Z$. For $H_2$, we perform the same
procedure, but each 1 is instead replaced by $X$. The
code will be able to correct bit flip ($X$) errors as if it
had a distance $d_1$ and to correct phase ($Z$) errors as
if it had a distance $d_2$. Since these two operations are
completely separate, it can also correct $Y$ errors as
both a bit flip and a phase error. Thus, the distance
of the quantum code is at least $\min(d_1, d_2)$, but
might be higher because of the possibility of
degeneracy.

However, in order to have a stabilizer code at all,
the generators produced by the above procedure
must commute. Define the dual $C^\perp$ of a classical
code $C$ as the set of vectors $w$ s.t. $w \cdot v = 0$ for all
$v \in C$. Then the $Z$ generators from $H_1$ will all
commute with the $X$ generators from $H_2$ iff $C_2^\perp \subseteq
C_1$ (or equivalently, $C_1^\perp \subseteq C_2$). When this is true, $C_1$
and $C_2$ define an $[[n, k_1 + k_2 - n, d]]$ stabilizer code,
where $d \ge \min(d_1, d_2)$.


The smallest distance-3 CSS code is the 7-qubit
code, a $[[7,1,3]]$ QECC created from the classical
Hamming code (consisting of all sums of classical
strings 1111000, 1100110, 1010101, and 1111111).
The encoded $|0\rangle$ for this code consists of the
superposition of all even-weight classical code
words and the encoded $|1\rangle$ is the superposition of
all odd-weight classical code words. The 7-qubit
code is much studied because its properties make it
particularly well suited to fault-tolerant quantum
computation.
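
The CSS construction can be checked by brute force for the 7-qubit code. In the sketch below we take the parity check matrix to have the rows 1111000, 1100110, 1010101. That choice is our assumption; the text lists these strings only as code words, but they are mutually orthogonal, so they also serve as parity checks. The same matrix is used for both classical codes, and the function names are hypothetical.

```python
from itertools import product

# Assumed parity check matrix; its kernel is the [7,4,3] Hamming code.
H = [
    [1, 1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1, 0, 1],
]


def dot_mod2(u, v):
    return sum(a * b for a, b in zip(u, v)) % 2


def generators_commute(h1, h2):
    """Z-type rows of h1 commute with X-type rows of h2 iff h1 * h2^T = 0 mod 2."""
    return all(dot_mod2(r1, r2) == 0 for r1 in h1 for r2 in h2)


def codewords(parity_check):
    """All binary vectors v with (parity_check) v = 0; feasible only for small n."""
    n = len(parity_check[0])
    return [v for v in product((0, 1), repeat=n)
            if all(dot_mod2(row, v) == 0 for row in parity_check)]


def css_parameters(h1, h2):
    """Return (n, k1 + k2 - n, min(d1, d2)); the last entry is a lower bound on d."""
    n = len(h1[0])
    words1, words2 = codewords(h1), codewords(h2)
    k1 = len(words1).bit_length() - 1   # log2 of the number of code words
    k2 = len(words2).bit_length() - 1
    d1 = min(sum(w) for w in words1 if any(w))
    d2 = min(sum(w) for w in words2 if any(w))
    return n, k1 + k2 - n, min(d1, d2)


if __name__ == "__main__":
    print(generators_commute(H, H))
    print(css_parameters(H, H))
    words = codewords(H)
    print(sum(1 for w in words if sum(w) % 2 == 0), "even-weight code words")
```

The output confirms that the generators commute and that the parameters are $[[7,1,3]]$. The 16 classical code words split into 8 of even weight and 8 of odd weight, matching the description of the encoded basis states.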


Fault Tolerance 


Given a QECC, we can attempt to supplement it
with protocols for performing fault-tolerant operations.
The basic design principle of a fault-tolerant
protocol is that an error in a single location (either
a faulty gate or noise on a quiescent qubit) should
not be able to alter more than a single qubit in each
block of the QECC. If this condition is satisfied, $t + 1$
separate single-qubit or single-gate failures are
required for a distance $2t + 1$ code to fail.

Particular caution is necessary, as computational 
gates can cause errors to propagate from their 
original location onto qubits that were previously 
correct. In general, a gate coupling pairs of qubits 
allows errors to spread in both directions across the 
coupling. 
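
The spreading of errors through a two-qubit gate can be made explicit for the CNOT using the binary description of Pauli operators. The sketch below conjugates a Pauli error by a CNOT and ignores overall phases; the string encoding and the function name are our own conventions.

```python
PAULI_TO_BITS = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
BITS_TO_PAULI = {bits: name for name, bits in PAULI_TO_BITS.items()}


def propagate_through_cnot(pauli, control, target):
    """Conjugate a Pauli string by CNOT(control -> target), up to phase."""
    x = [PAULI_TO_BITS[op][0] for op in pauli]
    z = [PAULI_TO_BITS[op][1] for op in pauli]
    x[target] ^= x[control]   # bit flips spread from control to target
    z[control] ^= z[target]   # phase flips spread from target to control
    return "".join(BITS_TO_PAULI[(a, b)] for a, b in zip(x, z))


if __name__ == "__main__":
    for error in ("XI", "IX", "ZI", "IZ", "YI"):
        print(error, "->", propagate_through_cnot(error, 0, 1))
```

A bit flip on the control becomes a bit flip on both qubits, and a phase flip on the target becomes a phase flip on both, while $X$ on the target and $Z$ on the control are left alone. This is the two-directional spreading that transversal constructions are designed to contain.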

The solution is to use transversal gates whenever 
possible (Shor 1996). A transversal operation is one 
in which the ith qubit in each block of a QECC 
interacts only with the ith qubit of other blocks of 
the code or of special ancilla states. An operation 
consisting only of single-qubit gates is automatically 
transversal. A transversal operation has the virtue 
that an error occurring on the third qubit in a block, 
say, can only ever propagate to the third qubit of 
other blocks of the code, no matter what other 
sequence of gates we perform before a complete 
error-correction procedure. 

In the case of certain codes, such as the 7-qubit 
code, a number of different gates can be performed 
transversally. Unfortunately, it does not appear to 
be possible to perform universal quantum compu- 
tations using just transversal gates. We therefore 
have to resort to more complicated techniques. 
First we create special encoded ancilla states in a 
non-fault-tolerant way, but perform some sort of 
check on them (in addition to error correction) to 
make sure they are not too far off from the goal. 
Then we interact the ancilla with the encoded data 
qubits using gates from our stock of transversal 
gates and perform a fault-tolerant measurement. 
Then we complete the operation with a further
transversal gate which depends on the outcome of
the measurement.


Fault-Tolerant Gates 


We will focus on stabilizer codes. Universal fault
tolerance is known to be possible for any stabilizer
code, but in most cases the more complicated type
of construction is needed for all but a few gates. The
Pauli group $P_k$, however, can be performed transversally
on any stabilizer code. Indeed, the set $S^\perp \setminus S$
of undetectable errors is a boon in this case, as it
allows us to perform these gates. In particular, each
coset in $S^\perp / S$ corresponds to a different logical Pauli
operator (with $S$ itself corresponding to the identity).
On a stabilizer code, therefore, logical Pauli operations
can be performed via a transversal Pauli
operation on the physical qubits.

Stabilizer codes have a special relationship to a
finite subgroup $\mathcal C_n$ of the unitary group $U(2^n)$
frequently called the “Clifford group.” The Clifford
group on $n$ qubits is defined as the set of unitary
operations which conjugate the Pauli group $P_n$ into
itself; $\mathcal C_n$ can be generated by the Hadamard transform,
the controlled-NOT (CNOT), and the single-qubit
$\pi/4$ phase rotation $\mathrm{diag}(1, i)$. The set of
stabilizer codes is exactly the set of codes which can
be created by a Clifford group encoder circuit using
$|0\rangle$ ancilla states.
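
The defining property of the Clifford group, that conjugation maps Pauli operators to Pauli operators, can be verified directly for the single-qubit generators. The sketch below uses plain nested lists as $2 \times 2$ matrices. The helper names are our own, and the CNOT is not covered here.

```python
import math

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
S2 = 1 / math.sqrt(2)
HADAMARD = [[S2, S2], [S2, -S2]]
PHASE = [[1, 0], [0, 1j]]          # the pi/4 phase rotation diag(1, i)


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def conjugate(u, p):
    """Return U P U^dagger."""
    return matmul(matmul(u, p), dagger(u))


def is_close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


if __name__ == "__main__":
    print(is_close(conjugate(HADAMARD, X), Z))   # H X H^dagger = Z
    print(is_close(conjugate(HADAMARD, Z), X))   # H Z H^dagger = X
    print(is_close(conjugate(PHASE, X), Y))      # P X P^dagger = Y
    print(is_close(conjugate(PHASE, Z), Z))      # P Z P^dagger = Z
```

Since $X$ and $Z$ generate the single-qubit Pauli group up to phase, these four relations determine the action of the Hadamard and phase gates on every Pauli operator.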

Some stabilizer codes have interesting symmetries
under the action of certain Clifford group elements,
and these symmetries result in transversal gate
operations. A particularly useful fact is that a
transversal CNOT gate (i.e., CNOT acting between
the $i$th qubit of one block of the QECC and the $i$th
qubit of a second block for all $i$) acts as a logical
CNOT gate on the encoded qubits for any CSS code.
Furthermore, for the 7-qubit code, transversal
Hadamard performs a logical Hadamard, and the
transversal $\pi/4$ rotation performs a logical $-\pi/4$
rotation. Thus, for the 7-qubit code, the full logical
Clifford group is accessible via transversal
operations.

Unfortunately, the Clifford group by itself does
not have much computational power: it can be
efficiently simulated on a classical computer.
We need to add some additional gate outside
the Clifford group to allow universal quantum
computation; a single gate will suffice, such as the
single-qubit $\pi/8$ phase rotation $\mathrm{diag}(1, \exp(i\pi/4))$.
Note that this gives us a finite generating set of
gates. However, by taking appropriate products, we
get an infinite set of gates, one that is dense in the
unitary group $U(2^n)$, allowing universal quantum
computation.
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The following circuit performs a 7/8 rotation, 
given an ancilla state |wW,/3) =|0) + exp (im/4)|1): 





laa) 


Here P is the π/4 phase rotation diag(1, i), and X
is the bit flip. The product is in the Clifford group,
and is only performed if the measurement outcome
is 1. Therefore, given the ability to perform fault-tolerant
Clifford group operations, fault-tolerant
measurements, and to prepare the encoded |ψ_{π/8}⟩
state, we have universal fault-tolerant quantum
computation. A slight generalization of the fault-tolerant
measurement procedure below can be used
to fault-tolerantly verify the |ψ_{π/8}⟩ state, which is a
+1 eigenstate of PX. Using this or another verifica- 
tion procedure, we can check a non-fault-tolerant 
construction. 
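
The ancilla-assisted π/8 rotation can be simulated on two unencoded qubits. The sketch below assumes one common arrangement of the circuit: a CNOT controlled by the ancilla and targeting the data qubit, a computational-basis measurement of the data qubit, and the correction PX applied to the ancilla when the outcome is 1. This arrangement and the function names are assumptions made for illustration. With the conventions used here, PX|ψ_{π/8}⟩ = exp(iπ/4)|ψ_{π/8}⟩, so the eigenvalue is +1 only up to that phase factor.

```python
import numpy as np

w = np.exp(1j * np.pi / 4)
X = np.array([[0, 1], [1, 0]], dtype=complex)
P = np.diag([1, 1j])
T = np.diag([1, w])                                   # target pi/8 rotation
magic = np.array([1, w], dtype=complex) / np.sqrt(2)  # normalised ancilla state

def overlap(a, b):
    """|<a|b>| for normalised vectors; 1.0 means equal up to a global phase."""
    return abs(np.vdot(a, b))

def inject(data, outcome):
    """Ancilla state after the circuit, given the data-qubit measurement outcome.

    Qubit order is (ancilla, data); the CNOT is controlled by the ancilla.
    """
    state = np.kron(magic, data).reshape(2, 2)   # state[ancilla, data]
    state[1, :] = state[1, ::-1].copy()          # CNOT: flip data when ancilla is 1
    ancilla = state[:, outcome]                  # project data onto |outcome>
    ancilla = ancilla / np.linalg.norm(ancilla)
    if outcome == 1:
        ancilla = P @ X @ ancilla                # Clifford correction
    return ancilla

rng = np.random.default_rng(0)
data = rng.normal(size=2) + 1j * rng.normal(size=2)
data = data / np.linalg.norm(data)

fidelities = [overlap(T @ data, inject(data, m)) for m in (0, 1)]
eigen_phase = np.vdot(magic, P @ X @ magic)      # eigenvalue of PX on the ancilla state
```

Both measurement outcomes leave the ancilla in the rotated data state up to a global phase, which is why only Clifford operations, measurement, and the special ancilla are needed.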


Fault-Tolerant Measurement 
and Error Correction 


Since all our gates are unreliable, including those 
used to correct errors, we will need some sort of 
fault-tolerant quantum error-correction procedure. 
A number of different techniques have been devel- 
oped. All of them share some basic features: they 
involve creation and verification of specialized 
ancilla states, and use transversal gates which 
interact the data block with the ancilla state. 

The simplest method, due to Shor, is very general
but also requires the most overhead and is
frequently the most susceptible to noise. Note that
the following procedure can be used to measure
(non-fault-tolerantly) the eigenvalue of any (possibly
multiqubit) Pauli operator M: produce an ancilla
qubit in the state |+⟩ = |0⟩ + |1⟩. Perform a controlled-M
operation from the ancilla to the state
being measured. In the case where M is a multiqubit
Pauli operator, this can be broken down into a
sequence of controlled-X, controlled-Y, and controlled-Z
operations. Then measure the ancilla in the
basis of |+⟩ and |−⟩ = |0⟩ − |1⟩. If the state is a +1
eigenvector of M, the ancilla will be |+⟩, and if the
state is a −1 eigenvector, the ancilla will be |−⟩.
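
As an illustration of this single-ancilla procedure, the following sketch (assuming NumPy; `prob_plus` is a hypothetical helper) applies a controlled-M to an ancilla prepared in |+⟩ and returns the probability of finding the ancilla in |+⟩ afterwards. The example operator M = Z⊗Z and the two test states are chosen here for illustration only.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)
PLUS = np.array([1, 1], dtype=complex) / np.sqrt(2)

def prob_plus(m, state):
    """Probability that the ancilla is found in |+> after a controlled-M.

    m is a Hermitian Pauli operator on the data; state is a normalised data vector.
    """
    dim = len(state)
    joint = np.kron(PLUS, state)                        # ancilla is the first factor
    controlled = np.block([[np.eye(dim), np.zeros((dim, dim))],
                           [np.zeros((dim, dim)), m]])  # |0><0| x I + |1><1| x M
    joint = controlled @ joint
    amp = np.kron(PLUS.conj(), np.eye(dim)) @ joint     # apply <+| to the ancilla
    return float(np.vdot(amp, amp).real)

ZZ = np.kron(Z, Z)
even = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)  # +1 eigenstate of ZZ
odd = np.array([0, 1, 1, 0], dtype=complex) / np.sqrt(2)   # -1 eigenstate of ZZ

p_even = prob_plus(ZZ, even)
p_odd = prob_plus(ZZ, odd)
```

A +1 eigenstate gives probability 1 for |+⟩ and a −1 eigenstate gives probability 0, in line with the description above.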

The advantage of this procedure is that it
measures just M and nothing more. The disadvantage
is that it is not transversal, and thus not fault
tolerant. Instead of the unencoded |+⟩ state, we
must use a more complex ancilla state |00...0⟩ +
|11...1⟩ known as a “cat” state. The cat state
contains as many qubits as the operator M to
be measured, and we perform the controlled-X, -Y,
or -Z operations transversally from the appropriate
qubits of the cat state to the appropriate qubits in
the data block. Since, assuming the cat state is
correct, all of its qubits are either |0⟩ or |1⟩, the
procedure either leaves the data state alone or
performs M on it uniformly. A +1 eigenstate in the
data therefore leaves us with |00...0⟩ + |11...1⟩ in
the ancilla and a −1 eigenstate leaves us with
|00...0⟩ − |11...1⟩. In either case, the final state
still tells us nothing about the data beyond the
eigenvalue of M. If we perform a Hadamard
transform and then measure each qubit in the
ancilla, we get either a random even-weight string
(for eigenvalue +1) or an odd-weight string (for
eigenvalue −1).

The procedure is transversal, so an error on a 
single qubit in the initial cat state or in a single gate 
during the interaction will only produce one error in 
the data. However, the initial construction of the cat 
state is not fault tolerant, so a single-gate error then 
could eventually produce two errors in the data 
block. Therefore, we must be careful and use some 
sort of technique to verify the cat state, for instance, 
by checking if random pairs of qubits are the same. 
Also, note that a single phase error in the cat state 
will cause the final measurement outcome to be 
wrong (even and odd switch places), so we should 
repeat the measurement procedure multiple times 
for greater reliability. 

We can then make a full fault-tolerant error- 
correction procedure by performing the above 
measurement technique for each generator of the 
stabilizer. Each measurement gives us one bit of the 
error syndrome, which we then decipher classically 
to determine the actual error. 
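
The classical deciphering step can be made concrete for the 7-qubit code. The sketch below assumes that the bit-flip syndrome of that code is the syndrome of the classical [7,4] Hamming code, and it orders the parity-check matrix so that column j spells j in binary; this ordering is a convention chosen here for illustration, and the function names are hypothetical.

```python
# Parity-check matrix of the classical [7,4] Hamming code.
# Column j (1-based) is the binary representation of j, most significant bit first.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def syndrome(error):
    """Three syndrome bits for a 7-bit error pattern (list of 0/1)."""
    return [sum(h * e for h, e in zip(row, error)) % 2 for row in H]

def locate(syn):
    """Position (1..7) of a single error, or 0 if the syndrome is trivial."""
    return syn[0] * 4 + syn[1] * 2 + syn[2]

def correct(error):
    """Flip the bit named by the syndrome and return the residual error pattern."""
    pos = locate(syndrome(error))
    fixed = list(error)
    if pos:
        fixed[pos - 1] ^= 1
    return fixed

single_errors = [[1 if i == j else 0 for i in range(7)] for j in range(7)]
located = [locate(syndrome(e)) for e in single_errors]
residuals = [correct(e) for e in single_errors]
```

With this ordering the three syndrome bits read as a binary number give the position of a single error directly. Phase-flip errors would be handled the same way using the syndrome bits from the other half of the stabilizer generators.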

More sophisticated techniques for fault-tolerant 
error correction involve less interaction with the 
data but at the cost of more complicated ancilla 
states. A procedure due to Steane uses (for CSS
codes) one ancilla in a logical |0⟩ state of the same
code and one ancilla in a logical |0⟩ + |1⟩ state. A
procedure due to Knill (for any stabilizer code)
teleports the data qubit through an ancilla consisting
of two blocks of the QECC containing an encoded
Bell state |00⟩ + |11⟩. Because the ancillas in Steane
and Knill error correction are more complicated 
than the cat state, it is especially important to verify 
the ancillas before using them. 


The Threshold for Fault Tolerance 


In an unencoded protocol, even one error can
destroy the computation, but a fully fault-tolerant
protocol will give the right answer unless multiple
errors occur before they can be corrected. On the
other hand, the fault-tolerant protocol is larger,
requiring more qubits and more time to do each
operation, and therefore providing more opportunities
for errors. If errors occur on the physical
qubits independently at random with probability p
per gate or time step, the fault-tolerant protocol has
probability of logical error for a single logical gate
or time step at most Cp^2, where C is a constant that
depends on the design of the fault-tolerant circuitry
(assume the QECC has distance 3, as for the 7-qubit
code). When p < p_t = 1/C, the fault tolerance helps,
decreasing the logical error rate. p_t is the “threshold”
for fault-tolerant quantum computation. If the
error rate is higher than the threshold, the extra 
overhead means that errors will occur faster than 
they can be reliably corrected, and we are better off 
with an unencoded system. 

To further lower the logical error rate, we turn to
a family of codes known as “concatenated codes”
(Aharonov and Ben-Or, Kitaev 1997, Knill et al.
1998). Given a code word of a particular [[n, 1]]
QECC, we can take each physical qubit and again
encode it using the same code, producing an [[n^2, 1]]
QECC. We could repeat this procedure to get an n^3-qubit
code, and so forth. The fault-tolerant procedures
concatenate as well, and after L levels of
concatenation, the effective logical error rate is
p_t (p/p_t)^{2^L} (for a base code correcting 1 error).
Therefore, if p is below the threshold p_t, we can
achieve an arbitrarily good error rate ε per logical
gate or time step using only poly(log(1/ε)) resources,
which is excellent theoretical scaling. 
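
The doubly exponential suppression is easy to tabulate. The following sketch evaluates p_t (p/p_t)^{2^L} and finds the number of levels needed for a target logical error rate; the numerical values of p, p_t and the target are hypothetical inputs chosen for illustration, not figures from the text.

```python
def logical_error_rate(p, p_t, levels):
    """p_t * (p / p_t) ** (2 ** levels), for a base code correcting one error."""
    return p_t * (p / p_t) ** (2 ** levels)

def levels_needed(p, p_t, target):
    """Smallest number of concatenation levels with logical error rate <= target."""
    if not 0 < p < p_t:
        raise ValueError("requires 0 < p < p_t (physical error rate below threshold)")
    levels = 0
    while logical_error_rate(p, p_t, levels) > target:
        levels += 1
    return levels

def physical_qubits(n, levels):
    """Physical qubits per logical qubit for an [[n, 1]] code concatenated `levels` times."""
    return n ** levels

# Hypothetical example: p is ten times below an assumed threshold of 1e-4.
example_levels = levels_needed(p=1e-5, p_t=1e-4, target=1e-15)
example_qubits = physical_qubits(7, example_levels)
```

Each extra level squares the ratio p/p_t while multiplying the block size by n, which is the origin of the polylogarithmic overhead.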

Unfortunately, the practical requirements for this 
result are not nearly so good. The best rigorous 
proofs of the threshold to date show that the 
threshold is at least 2 × 10^{-5} (meaning one error
per 50,000 operations). Optimized simulations of
fault-tolerant protocols suggest that the true thresh- 
old may be as high as 5%, but to tolerate this much 
error, existing protocols require enormous overhead, 
perhaps increasing the number of gates and qubits 
by a factor of a million or more for typical 
computations. For lower physical error rates, over- 
head requirements are more modest, particularly if 
we only attempt to optimize for calculations of a 
given size, but are still larger than one would like.
Furthermore, these calculations make a number of
assumptions about the physical properties of the 
computer. The errors are assumed to be independent 
and uncorrelated between qubits except when a gate 
connects them. It is assumed that measurements and 
classical computations can be performed quickly 
and reliably, and that quantum gates can be 
performed between arbitrary pairs of qubits in the 
computer, irrespective of their physical proximity. 
Of these, only the assumption of independent errors 
is at all necessary, and that can be considerably 
relaxed to allow short-range correlations and certain 
kinds of non-Markovian environments. However, 
the effects of relaxing these assumptions on the 
threshold value and overhead requirements have not 
been well studied.


Introduction and Preliminaries


Quantum Field Theory (QFT) in curved spacetime
is a hybrid approximate theory in which quantum
matter fields are assumed to propagate in a fixed
classical background gravitational field. Its basic
physical prediction is that strong gravitational
fields can polarize the vacuum and, when time
dependent, lead to pair creation just as a strong
and/or time-dependent electromagnetic field can
polarize the vacuum and/or give rise to pair
creation of charged particles. One expects it to
be a good approximation to full quantum gravity
provided the typical frequencies of the gravitational
background are very much less than
the Planck frequency ((c^5/Għ)^{1/2} ~ 10^{43} s^{-1}) and
provided, with a suitable measure for energy, the
energy of created particles is very much less than
the energy of the background gravitational field or
of its matter sources. Undoubtedly, the most
important prediction of the theory is the Hawking
effect, according to which a, say spherically
symmetric, classical black hole of mass M will
emit thermal radiation at the Hawking temperature
T = (8πM)^{-1} (here and from now on, we use
Planck units where G, c, ħ and k (Boltzmann’s
constant) are all taken to be 1). 
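
To get a feeling for the magnitudes involved, the Planck-unit expressions can be converted back to SI units. The sketch below restores the constants in T = (8πM)^{-1}, giving T = ħc^3/(8πGMk), and evaluates the Planck frequency (c^5/Għ)^{1/2}. The constant values are standard reference values and the solar mass is an approximate figure used here as an example input.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
C = 2.99792458e8         # speed of light, m / s
G = 6.67430e-11          # Newton's constant, m^3 / (kg s^2)
K_B = 1.380649e-23       # Boltzmann's constant, J / K
M_SUN = 1.98847e30       # approximate solar mass, kg

def hawking_temperature(mass_kg):
    """T = hbar c^3 / (8 pi G M k), the SI form of T = 1/(8 pi M)."""
    return HBAR * C**3 / (8 * math.pi * G * mass_kg * K_B)

def planck_frequency():
    """(c^5 / (G hbar))^(1/2) in s^-1."""
    return math.sqrt(C**5 / (G * HBAR))

t_solar = hawking_temperature(M_SUN)   # of order 1e-7 K for a solar-mass black hole
f_planck = planck_frequency()          # of order 1e43 s^-1
```

The temperature scales as 1/M, so a solar-mass black hole is far colder than the cosmic microwave background, while the Planck frequency lies far above any frequency scale of ordinary gravitational backgrounds.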

On the mathematical side, the need to formulate the 
laws and derive the general properties of QFT on 
nonflat spacetimes forces one to state and prove results 
in local terms and, as a byproduct, thereby leads to an 
improved perspective on flat-spacetime QFT too. It is 
also interesting to formulate QFT on idealized space- 
times with particular global geometrical features. 
Thus, QFT on spacetimes with bifurcate Killing 
horizons is intimately related to the Hawking effect; 
QFT on spacetimes with closed timelike curves is 
intimately related to the question whether the laws of 
physics permit the manufacture of a time machine. 

As is standard in general relativity, a curved 
spacetime is modeled mathematically as a 
(paracompact, Hausdorff) manifold M equipped 
with a pseudo-Riemannian metric g of signature
(−, +, +, +) (we follow the conventions of the
standard text by Misner et al. (1973)). We shall
also assume, except where otherwise stated, our
spacetime to be globally hyperbolic, that is, that
M admits a global time coordinate, by which we
mean a global coordinate t such that each constant-t
surface is a smooth Cauchy surface, that is, a
smooth spacelike 3-surface cut exactly once by each
inextendible causal curve. (Without this default
assumption, extra problems arise for QFT which
we shall briefly mention in connection with the
“time machine” question discussed later.) In view
of this definition, globally hyperbolic spacetimes
are clearly time-orientable and we shall assume a
choice of time-orientation has been made so we can
talk about the “future” and “past” directions.

Modern formulations of the subject take, as the
fundamental mathematical structure modeling the
quantum field, a ∗-algebra 𝒜 (with identity I)
together with a family of local sub-∗-algebras
𝒜(𝒪) labeled by bounded open regions 𝒪 of the
spacetime (M, g) and satisfying the isotony or net
condition that 𝒪_1 ⊂ 𝒪_2 implies 𝒜(𝒪_1) is a subalgebra
of 𝒜(𝒪_2) as well as the condition that whenever
two bounded open regions 𝒪_1 and 𝒪_2 are spacelike
separated, then 𝒜(𝒪_1) and 𝒜(𝒪_2) commute.
Standard concepts and techniques from algebraic
quantum theory are then applicable: In particular,
states are defined to be positive (this means
ω(A*A) ≥ 0 ∀A ∈ 𝒜) normalized (this means ω(I) = 1)
linear functionals on 𝒜. One distinguishes between pure
states and mixed states, only the latter being writable
as nontrivial convex combinations of other states. To
each state, ω, the GNS construction associates a
representation, ρ_ω, of 𝒜 on a Hilbert space ℋ_ω
together with a cyclic vector Ω ∈ ℋ_ω such that


ω(A) = ⟨Ω|ρ_ω(A)Ω⟩


(and the GNS triple (ρ_ω, ℋ_ω, Ω) is unique up to
equivalence). There are often technical advantages
in formulating things so that the ∗-algebra is a
C*-algebra. Then the GNS representation is by
everywhere-defined bounded operators and is irreducible if
and only if the state is pure. A useful concept, due to
Haag, is the folium of a given state ω which may be
defined to be the set of all states ω_σ which arise in the
form Tr(σρ_ω(·)), where σ ranges over the density
operators (trace-class operators with unit trace) on ℋ_ω.

Given a state, ω, and an automorphism, α, which
preserves the state (i.e., ω ∘ α = ω) then there will be
a unitary operator, U, on ℋ_ω which implements α in
the sense that ρ_ω(α(A)) = U^{-1}ρ_ω(A)U and U is
chosen uniquely by the condition UΩ = Ω.

On a stationary spacetime, that is, one which
admits a one-parameter group of isometries
whose integral curves are everywhere timelike,
the algebra will inherit a one-parameter group (i.e.,
satisfying α(t_1) ∘ α(t_2) = α(t_1 + t_2)) of time-translation
automorphisms, α(t), and, given any stationary
state (i.e., one which satisfies ω ∘ α(t) = ω ∀t ∈ R),
these will be implemented by a one-parameter
group of unitaries, U(t), on its GNS Hilbert space
satisfying U(t)Ω = Ω. If U(t) is strongly continuous
so that it takes the form e^{-iHt} and if the
Hamiltonian, H, is positive, then ω is said to be
a “ground state.” Typically one expects ground
states to exist and often be unique.

Another important class of stationary states for
the algebra of a stationary spacetime is the class of
KMS states, ω^β, at inverse temperature β; these have
the physical interpretation of thermal equilibrium
states. In the GNS representation of one of these, the
automorphisms are also implemented by a strongly
continuous unitary group, e^{-iHt}, which preserves Ω
but (in place of H positive) there is a complex
conjugation, J, on ℋ_ω such that


e^{-βH/2} ρ_ω(A)Ω = Jρ_ω(A*)Ω   [1]


for all A ∈ 𝒜. An attractive feature of the subject is
that its main qualitative features are already present
for linear field theories and, unusually in comparison
with other questions in QFT, these are
susceptible of a straightforward explicit and rigorous
mathematical formulation. In fact, as our
principal example, we give, in the next section, a
construction for the field algebra for the quantized
real linear Klein–Gordon equation


(□_g − m^2 − V)φ = 0   [2]


of mass m on a globally hyperbolic spacetime (M, g).
Here, □_g denotes the Laplace–Beltrami operator
g^{ab}∇_a∇_b (= |det(g)|^{-1/2} ∂_a(|det(g)|^{1/2} g^{ab} ∂_b)). We
include a scalar external background classical field,
V, in addition to the external gravitational field
represented by g. In case m is zero, taking V to equal
R/6, where R denotes the Riemann scalar, makes the
equation conformally invariant.

The main new feature of QFT in curved spacetime 
(present already for linear field theories) is that, in a 
general (neither flat nor stationary) spacetime there 
will not be any single preferred state but rather a 
family of preferred states, members of which are best 
regarded as on an equal footing with one another. It 
is this feature which makes the above algebraic 
framework particularly suitable, indeed essential, to 
a clear formulation of the subject. Conceptually, it is 
this feature which takes the most getting used to. In 
particular, one must realize that, as we shall explain 
later, the interpretation of a state as having a 
particular “particle content” is in general problematic 
because it can only be relative to a particular choice
of “vacuum” state and, depending on the spacetime
of interest, there may be one state or several states or,
frequently, no states at all which deserve the name 
“vacuum” and even when there are states which 
deserve this name, they will often only be defined in 
some approximate or asymptotic or transient sense or 
only on some subregion of the spacetime. 

Concomitantly, one does not expect global obser- 
vables such as the “particle number” or the quantum 
Hamiltonian of flat-spacetime free-field theory to 
generalize to a curved spacetime context, and for 
this reason local observables play a central role in 
the theory. The quantized stress—energy tensor is a 
particularly natural and important such local obser- 
vable and the theory of this is central to the whole 
subject. A brief introduction to it is given in a later 
section. 

This is followed by a further section on the 
Hawking and Unruh effects and then a brief section 
on the problems of extending the theory beyond the 
“default” setting, to nonglobally hyperbolic space- 
times. Finally, we briefly mention a number of other 
interesting and active areas of the subject as well as 
issuing a few warnings to be borne in mind when 
reading the literature. 


Construction of ∗-Algebra(s) for a Real
Linear Scalar Field on Globally 
Hyperbolic Spacetimes and Some 
General Theorems 


On a globally hyperbolic spacetime, the classical
equation [2] admits well-defined advanced and
retarded Green functions (strictly bidistributions)
Δ^A and Δ^R and the standard covariant quantum
free real (or “Hermitian”) scalar field commutation
relations familiar from Minkowski spacetime free-field
theory naturally generalize to the (heuristic)
equation


[φ̂(x), φ̂(y)] = iΔ(x, y)I


where Δ is the Lichnerowicz commutator function
Δ = Δ^A − Δ^R. Here, the “^” on the quantum field φ̂
serves to distinguish it from a classical solution φ. In
mathematical work, one does not assign a meaning
to the field at a point itself, but rather aims to assign
meaning to smeared fields φ̂(F) for all real-valued
test functions F ∈ C_0^∞(M) which are then to be
interpreted as standing for ∫_M φ̂(x)F(x)|det(g)|^{1/2} d^4x.
In fact, it is straightforward to define a minimal
field algebra (see below) 𝒜_min generated by such
φ̂(F) which satisfy the suitably smeared version


[φ̂(F), φ̂(G)] = iΔ(F, G)I


of the above commutation relations together with
Hermiticity (i.e., φ̂(F)* = φ̂(F)), the property of being a
weak solution of eqn [2] (i.e., φ̂((□_g − m^2 − V)F) =
0 ∀F ∈ C_0^∞(M)) and linearity in test functions. There
is a technically different alternative formulation of this
minimal algebra, which is known as the Weyl algebra,
which is constructed to be the C*-algebra generated by
operators W(F) (to be interpreted as standing for
exp(i∫_M φ̂(x)F(x)|det(g)|^{1/2} d^4x)) satisfying


W(F_1)W(F_2) = exp(−iΔ(F_1, F_2)/2) W(F_1 + F_2)


together with W(F)* = W(−F) and W((□_g − m^2 −
V)F) = I. With either the minimal algebra or the
Weyl algebra one can define, for each bounded open
region 𝒪, subalgebras 𝒜(𝒪) as generated by the φ̂(·)
(or the W(·)) smeared with test functions supported
in 𝒪 and verify that they satisfy the above “net”
condition and commutativity at spacelike separation.

Specifying a state, ω, on 𝒜_min is tantamount to
specifying its collection of n-point distributions (i.e.,
smeared n-point functions) ω(φ̂(F_1)…φ̂(F_n)). (In the
case of the Weyl algebra, one restricts attention to
“regular” states for which the map F ↦ ω(W(F)) is
sufficiently often differentiable on finite-dimensional
subspaces of C_0^∞(M) and defines the n-point
distributions in terms of derivatives with respect to
suitable parameters of expectation values of suitable
Weyl algebra elements.) A particular role is played
in the theory by the quasifree states for which all the
truncated n-point distributions except for n = 2
vanish. Thus, all the n-point distributions for odd n
vanish while the four-point distribution is made out
of the two-point distribution according to


ω(φ̂(F_1)φ̂(F_2)φ̂(F_3)φ̂(F_4)) = ω(φ̂(F_1)φ̂(F_2)) ω(φ̂(F_3)φ̂(F_4))
    + ω(φ̂(F_1)φ̂(F_3)) ω(φ̂(F_2)φ̂(F_4))
    + ω(φ̂(F_1)φ̂(F_4)) ω(φ̂(F_2)φ̂(F_3))


etc. The anticommutator distribution


G(F_1, F_2) = ω(φ̂(F_1)φ̂(F_2)) + ω(φ̂(F_2)φ̂(F_1))   [3]


of a quasifree state (or indeed of any state) will
satisfy the following conditions (for all test functions
F, F_1, F_2, etc.):


C1. Symmetry
G(F_1, F_2) = G(F_2, F_1)
C2. Weak bisolution property
G((□_g − m^2 − V)F_1, F_2) = 0
= G(F_1, (□_g − m^2 − V)F_2)
C3. Positivity
G(F, F) ≥ 0 and G(F_1, F_1)^{1/2} G(F_2, F_2)^{1/2}
≥ |Δ(F_1, F_2)|

and it can be shown that, to every bilinear
functional G on C_0^∞(M) satisfying (C1)-(C3),
there is a quasifree state with two-point distribution
(1/2)(G + iΔ). One further declares a
quasifree state to be physically admissible only if
(for pairs of points in sufficiently small convex
neighborhoods)
C4. Hadamard condition


“G(x_1, x_2) = (1/(2π^2)) ( u(x_1, x_2) P(1/σ)
    + v(x_1, x_2) log|σ| + w(x_1, x_2) )”


This last condition expresses the requirement that 
(locally) the two-point distribution actually “is” 
(in the usual sense in which one says that a 
distribution “is” a function) a smooth function for 
pairs of non-null-separated points. At the same 
time, it requires that the two-point distribution be 
singular at pairs of null-separated points and 
locally specifies the nature of the singularity for 
such pairs of points with a leading “principal part 
of 1/σ” type singularity and a subleading “log |σ|”
singularity, where σ denotes the square of the
geodesic distance between x_1 and x_2. u (which
satisfies u(x_1, x_2) = 1 when x_1 = x_2) and v are certain
smooth two-point functions determined in terms of 
the local geometry and the local values of V by 
something called the Hadamard procedure while the 
smooth two-point function w depends on the state. 
We shall omit the details. The important point is 
that this Hadamard condition on the two-point 
distribution is believed to be the correct general- 
ization to a curved spacetime of the well-known 
universal short-distance behavior shared by the 
truncated two-point distributions of all physically 
relevant states for the special case of our theory 
when the spacetime is flat (and V vanishes). In the 
latter case, u reduces to 1, and v to a simple power
series Σ_{n=0}^∞ v_n σ^n with v_0 = m^2/4, etc.

Actually, it is known (this is the content of “Kay’s 
conjecture” which was proved by M Radzikowski in 
1992) that (C1)-(C4) together imply that the two- 
point distribution is nonsingular at all pairs of (not 
necessarily close together) spacelike separated 
points. More important than this result itself is a 
reformulation of the Hadamard condition in terms 
of the concepts of microlocal analysis which 
Radzikowski originally introduced as a tool towards 
its proof. 


C4′. Wave front set (or microlocal) spectrum condition


WF(G + iΔ)
= {(x_1, p_1; x_2, p_2) ∈ T*(M × M) \ 0 | x_1 and x_2
lie on a single null geodesic, p_1 is tangent to
that null geodesic and future pointing, and
p_2 when parallel transported along that null
geodesic from x_2 to x_1 equals −p_1}


For the gist of what this means, it suffices to know that 
to say that an element (x, p) of the cotangent bundle of 
a manifold (excluding the zero section 0) is in the wave 
front set, WF, of a given distribution on that manifold 
may be expressed informally by saying that that 
distribution is singular at the point x in the direction 
p. (And here the notion is applied to G + iΔ, thought
of as a distribution on the manifold M × M.)

We remark that generically (and, e.g., always if the 
spatial sections are compact and m^2 + V(x) is everywhere
positive) the Weyl algebra for eqn [2] on a given
stationary spacetime will have a unique ground state 
and unique KMS states at each temperature and these 
will be quasifree and Hadamard. 

Quasifree states are important also because of a 
theorem of R Verch (1994, in verification of another 
conjecture of Kay) that (in the Weyl algebra frame- 
work) on the algebra of any bounded open region, 
the folia of the quasifree Hadamard states coincide. 
With this result one can extend the notion of 
physical admissibility to not-necessarily-quasifree 
states by demanding that, to be admissible, a state 
belong to the resulting common folium when 
restricted to the algebra of each bounded open 
region; equivalently, that it be a locally normal state 
on the resulting natural extension of the net of local 
Weyl algebras to a net of local W*-algebras. 


Particle Creation and the Limitations 
of the Particle Concept 


Global hyperbolicity also entails that the Cauchy
problem is well posed for the classical field equation
[2] in the sense that for every Cauchy surface, C, and
every pair (f, p) of Cauchy data in C_0^∞(C), there
exists a unique solution φ in C^∞(M) such that
f = φ|_C and p = |det(g)|^{1/2} g^{0a}∂_a φ|_C. Moreover, φ has
compact support on all other Cauchy surfaces.
Given a global time coordinate t, increasing towards
the future, foliating M into a family of constant-t
Cauchy surfaces, C_t, and given a choice of global
timelike vector field τ^a (e.g., τ^a = g^{ab}∂_b t) enabling
one to identify all the C_t, say with C_0, by identifying
points cut by the same integral curve of τ^a, a single
such classical solution φ may be pictured as a family
{(f_t, p_t): t ∈ R} of time-evolving Cauchy data on C_0.
Moreover, since [2] implies, for each pair of classical
solutions, φ_1, φ_2, the conservation (i.e., ∂_a j^a = 0) of
the current j^a = |det(g)|^{1/2} g^{ab}(φ_1 ∂_b φ_2 − φ_2 ∂_b φ_1), the
symplectic form (on C_0^∞(C_0) × C_0^∞(C_0))


σ((f^1, p^1); (f^2, p^2)) = ∫_{C_0} (f^1 p^2 − p^1 f^2) d^3x


will be conserved in time.

Corresponding to this picture of classical
dynamics, one expects there to be a description of
quantum dynamics in terms of a family of sharp-time
quantum fields (φ_t, π_t) on C_0, satisfying
heuristic canonical commutation relations


[φ_t(x), φ_t(y)] = 0, [π_t(x), π_t(y)] = 0,
[φ_t(x), π_t(y)] = iδ^3(x, y)I


and evolving in time according to the same
dynamics as the Cauchy data of a classical solution.
(Both these expectations are correct because the field
equation is linear.) An elegant way to make rigorous
mathematical sense of these expectations is in terms
of a ∗-algebra with identity generated by Hermitian
objects “σ((φ_0, π_0); (f, p))” (“symplectically smeared
sharp-time fields at t = 0”) satisfying linearity in f
and p together with the commutation relations


[σ((φ_0, π_0); (f^1, p^1)), σ((φ_0, π_0); (f^2, p^2))]
= iσ((f^1, p^1); (f^2, p^2))I


and to define (symplectically smeared) time-t sharp-time
fields by demanding


σ((φ_t, π_t); (f_t, p_t)) = σ((φ_0, π_0); (f_0, p_0))


where (f_t, p_t) is the classical time-evolute of (f_0, p_0).
This ∗-algebra of sharp-time fields may be identified
with the (minimal) field ∗-algebra of the previous
section, the φ̂(F) of the previous section being
identified with σ((φ_0, π_0); (f, p)), where (f, p) are
the Cauchy data at t = 0 of ΔF. (This identification
is of course many-one since φ̂(F) = 0 whenever
F arises as (□_g − m^2 − V)G for some test function
G ∈ C_0^∞(M).)

Specializing momentarily to the case of the free
scalar field (□ − m^2)φ = 0 (m ≠ 0) in Minkowski
space with a flat t = 0 Cauchy surface, the “symplectically
smeared” two-point function of the usual
ground state (“Minkowski vacuum state”), ω_0, is
given, in this formalism, by


ω_0(σ((φ, π); (f^1, p^1)) σ((φ, π); (f^2, p^2)))
= (1/2)(⟨f^1|μf^2⟩ + ⟨p^1|μ^{-1}p^2⟩
    + iσ((f^1, p^1); (f^2, p^2)))   [4]


where the inner products are in the one-particle
Hilbert space ℋ = L^2(R^3) and μ = (m^2 − ∇^2)^{1/2}. The
GNS representation of this state may be concretely
realized on the familiar Fock space ℱ(ℋ) over ℋ by


ρ_0(σ((φ, π); (f, p))) = −i(â†(a) − (â†(a))*)


where a denotes the element of ℋ:


a = (μ^{1/2}f + iμ^{-1/2}p)/√2


(we note in passing that, if we equip ℋ with the
symplectic form 2Im⟨·|·⟩, then K: (f, p) ↦ a is a
symplectic map) and â†(a) is the usual smeared creation
operator (= “∫â†(x)a(x)d^3x”) on ℱ(ℋ) satisfying


[(â†(a^1))*, â†(a^2)] = ⟨a^1|a^2⟩I


The usual (smeared) annihilation operator, â(a), is
(â†(Ca))*, where C is the natural complex conjugation,
a ↦ a* on ℋ. Both of these operators annihilate
the Fock vacuum vector Ω^F. In this representation,
the one-parameter group of time-translation
automorphisms


α(t): σ((φ_0, π_0); (f, p)) ↦ σ((φ_t, π_t); (f, p))   [5]


is implemented by exp(−iHt) where H is the second
quantization of μ (i.e., the operator otherwise
known as ∫μ(k)â†(k)â(k)d^3k) on ℱ(ℋ).

The most straightforward (albeit physically artifi- 
cial) situation involving “particle creation” in a curved 
spacetime concerns a globally hyperbolic spacetime 
which, outside of a compact region, is isometric to 
Minkowski space with a compact region removed — 
that is, to a globally hyperbolic spacetime which is flat 
except inside a localized “bump” of curvature (see 
Figure 1). (One could also allow the function V in [2] 
to be nonzero inside the bump.) On the field algebra 
(defined as in the previous section) of such a spacetime, 
there will be an “in” vacuum state (which may be 
identified with the Minkowski vacuum to the past of 
the bump) and an “out” vacuum state (which may be 
identified with the Minkowski vacuum to the future of 
the bump) and one expects, for example, the “in 
vacuum” to arise as a many-particle state in the GNS 
representation of the “out vacuum” corresponding to 
the creation of particles out of the vacuum by the 
bump of curvature. 

In the formalism of this section, if we choose our 
global time coordinate on such a spacetime so that, 
say, the t = 0 surface is to the past of the bump and
the t=T surface to its future, then the single 
automorphism a(T) (defined as in [5]) encodes the 
overall effect of the bump of curvature on the 
quantum field and one can ask whether it is 
implemented by a unitary operator in the GNS 
representation of the Minkowski vacuum state [4]. 


Figure 1 A spacetime which is flat outside of a compact bump 
of curvature. 


This question may be answered by referring to the
real linear map T: ℋ → ℋ which sends a_T = 2^{-1/2}
(μ^{1/2}f_T + iμ^{-1/2}p_T) to a_0 = 2^{-1/2}(μ^{1/2}f_0 + iμ^{-1/2}p_0).
By the conservation in time of σ and the symplecticity,
noted in passing above, of the map
K: (f, p) ↦ a, this satisfies the defining relation


Im⟨Ta^1|Ta^2⟩ = Im⟨a^1|a^2⟩


of a classical Bogoliubov transformation. Splitting T
into its complex-linear and complex-antilinear parts
by writing


T = α + βC


where α and β are complex-linear operators, this
relation may alternatively be expressed in terms of
the pair of relations


α*α − β̄*β̄ = I,   ᾱ*β̄ = β*α


where ᾱ = CαC, β̄ = CβC.
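
These relations can be checked numerically in a finite-dimensional model. The sketch below assumes NumPy, takes ℋ = C^3 with C acting as componentwise complex conjugation (so that CαC is the entrywise conjugate matrix), and builds a sample pair (α, β) from a random unitary V and squeezing parameters; this construction and all variable names are illustrative assumptions rather than anything taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

# Random unitary V from a QR decomposition (illustrative test data).
V, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
r = rng.uniform(0.1, 1.0, size=n)           # squeezing strengths
theta = rng.uniform(0, 2 * np.pi, size=n)   # squeezing phases

# T = alpha + beta C, with C = componentwise complex conjugation.
alpha = V @ np.diag(np.cosh(r)) @ V.conj().T
beta = V @ np.diag(np.sinh(r) * np.exp(1j * theta)) @ V.T

def T(a):
    """Real-linear map a -> alpha a + beta (C a)."""
    return alpha @ a + beta @ a.conj()

def im_inner(a, b):
    """Im<a|b>, with the inner product antilinear in its first slot."""
    return np.vdot(a, b).imag

a1 = rng.normal(size=n) + 1j * rng.normal(size=n)
a2 = rng.normal(size=n) + 1j * rng.normal(size=n)

alpha_bar, beta_bar = alpha.conj(), beta.conj()   # C alpha C and C beta C
rel1 = alpha.conj().T @ alpha - beta_bar.conj().T @ beta_bar   # expected: identity
rel2 = alpha_bar.conj().T @ beta_bar - beta.conj().T @ alpha   # expected: zero
symplectic_defect = im_inner(T(a1), T(a2)) - im_inner(a1, a2)  # expected: zero
```

In finite dimensions β is automatically Hilbert-Schmidt; the unitary implementability question discussed in this section only becomes nontrivial for infinite-dimensional ℋ.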

We remark that there is an easy-to-visualize
equivalent way of defining α and β in terms of
the analysis, to the past of the bump, into
positive- and negative-frequency parts of complex
solutions to [2] which are purely positive frequency
to the future of the bump. In fact, if, for
any element a ∈ ℋ, we identify the positive-frequency
solution to the Minkowski-space
Klein–Gordon equation


φ_out(t, x) = ((2μ)^{-1/2} exp(−iμt)a)(x)


with a complex solution to [2] to the future of the
bump, then (it may easily be seen) to the past of the
bump, this same solution will be identifiable with
the (partly positive-frequency, partly negative-frequency)
Minkowski-space Klein–Gordon solution


φ_in(t, x) = ((2μ)^{-1/2} exp(−iμt)αa)(x)
    + ((2μ)^{-1/2} exp(iμt)β̄a)(x)


and this could be taken to be the defining equation
for the operators α and β.

It is then known (by a 1962 theorem of Shale) 
that the automorphism [5] (strictly, its Weyl algebra 
counterpart) will be unitarily implemented if and 
only if β is a Hilbert-Schmidt operator on ℋ. Wald
(1979, in case m > 0) and Dimock (1979, in case
m ≠ 0) have verified that this condition is satisfied
in the case of our bump-of-curvature situation. In 
that case, if we denote the unitary implementor by 
U, we have the following results: 


R1. The expectation value ⟨UΩ^F|N(a)UΩ^F⟩_{ℱ(ℋ)} of the
number operator, N(a) = â†(a)â(a), where a is a
normalized element of ℋ, is equal to ⟨β̄a|β̄a⟩_ℋ.

R2. First note that there exists an orthonormal basis
of vectors, e_i (i = 1…∞), in ℋ such that the
(Hilbert-Schmidt) operator β̄*ᾱ*^{-1} has the
canonical form Σ_i λ_i⟨Ce_i|·⟩|e_i⟩. We then have
(up to an undetermined phase)


UΩ^F = N exp(−(1/2) Σ_i λ_i â†(e_i)â†(e_i)) Ω^F


where the normalization constant N is chosen
so that ||UΩ^F|| = 1. This formula makes manifest
that the particles are created in pairs.


We remark that, identifying elements, a, of ℋ with
positive-frequency solutions (below, we shall call
them “modes”) as explained above, result (R1) may
alternatively be expressed by saying that the
expectation value, ω_in(N(a)), in the in-vacuum state
of the occupation number, N(a), of a normalized
mode, a, to the future of the bump, is given by
⟨β̄a|β̄a⟩_ℋ.

This formalism and the results, (R1) and (R2) 
above, will generalize (at least heuristically, and 
sometimes rigorously — see especially the rigorous 
scattering-theoretic work in the 1980s by Dimock 
and Kay and more recently by A Bachelot and others) 
to more realistic spacetimes which are only asympto- 
tically flat or asymptotically stationary. In favorable 
cases, one will still have notions of classical solutions 
which are positive frequency asymptotically towards 
the future/past, and, in consequence, one will have 
well-defined asymptotic notions of “vacuum” and 
“particles.” Also, in, for example, cosmological
models where the background spacetime is slowly
varying in time, one can define approximate adiabatic
notions of classical positive-frequency solutions,
and hence also of quantum “vacuum” and “particles” 
at each finite value of the cosmological time. But, at 
times where the gravitational field is rapidly varying, 
one does not expect there to be any sensible notion of 
“particles.” And, in a rapidly time-varying back- 
ground gravitational field which never settles down, 
one does not expect there to be any sensible particle 
interpretation of the theory at all. To understand 
these statements, it suffices to consider the (1 + 0)-
dimensional Klein-Gordon equation with an external
potential $V$:


\[ \Bigl(-\frac{d^2}{dt^2} - m^2 - V(t)\Bigr)\phi = 0 \]


which is of course a system of one degree of 
freedom, mathematically equivalent to the harmonic 
oscillator with a time-varying angular frequency 
$\omega(t) = (m^2 + V(t))^{1/2}$. One could of course express
its quantum theory in terms of a time-evolving
Schrödinger wave function $\Psi(\phi, t)$ and attempt to
give this a particle interpretation at each time, $s$, by
expanding $\Psi(\phi, s)$ in terms of the harmonic oscillator
wave functions for a harmonic oscillator with
some particular choice of angular frequency. But the 
problem is, as is easy to convince oneself, that there 
is no such good choice. For example, one might 
think that a good choice would be to take, at time s, 
the set of harmonic oscillator wave functions with 
angular frequency $\omega(s)$. (This is sometimes known
as the method of “instantaneous diagonalization of
the Hamiltonian.”) But suppose we were to apply
this prescription to the case of a smooth $V(\cdot)$ which
is constant in time until time 0 and assume the
initial state is the usual vacuum state. Then at some
positive time $s$, the number of particles predicted to
be present is the same as the number of particles
predicted to be present on the same prescription at
all times after $s$ for a $\hat V(\cdot)$ which is equal to $V(\cdot)$ up
to time $s$ and then takes the constant value $V(s)$ for
all later times (see Figure 2). But $\hat V(\cdot)$ will
generically have a sharp corner in its graph (i.e., a





Figure 2 Plots of $\omega$ against $t$ for the two potentials $V$ (continuous
line) and $\hat V$ (continuous line up to $s$ and then dashed line) which play
a role in our critique of “instantaneous diagonalization.”
discontinuity in its time derivative) at time $s$, and
one would expect a large part of the particle
production in the latter situation to be accounted
for by the presence of this sharp corner, and
therefore a large part of the predicted particle
production in the case of $V(\cdot)$ to be spurious.
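
The critique above can be checked numerically. The following is a minimal sketch, not taken from the article: it works with the Heisenberg-picture mode function $\chi(t)$ solving $\chi'' + \omega(t)^2\chi = 0$ rather than with the Schrödinger wave function, and it assumes illustrative values $m = 1$, $s = 1$ and the hypothetical smooth potential $V(t) = 3\exp(-1/t)$ for $t > 0$ (zero before). For a state that starts in the ground state, the expected number of quanta relative to an oscillator of angular frequency $w$ is $(|\chi'|^2 + w^2|\chi|^2)/(2w) - 1/2$. All function names are invented for this example.

```python
import math

M = 1.0   # field mass m (illustrative value, an assumption)
A = 3.0   # strength of the potential (illustrative value, an assumption)
S = 1.0   # time s at which the potential is frozen


def v_smooth(t):
    """Hypothetical smooth potential: zero until t = 0, then A * exp(-1/t)."""
    return A * math.exp(-1.0 / t) if t > 0.0 else 0.0


def omega(t):
    """Angular frequency for V: omega(t) = sqrt(m^2 + V(t))."""
    return math.sqrt(M * M + v_smooth(t))


def omega_hat(t):
    """Angular frequency for V-hat: follows V up to time s, then stays constant."""
    return omega(min(t, S))


def evolve(freq, t0, t1, steps):
    """Integrate chi'' + freq(t)^2 chi = 0 by classical RK4, starting at t0
    from the positive-frequency data chi = (2w)^(-1/2), chi' = -i w chi."""
    w0 = freq(t0)
    chi = complex(1.0 / math.sqrt(2.0 * w0))
    dchi = -1j * w0 * chi
    h = (t1 - t0) / steps
    for n in range(steps):
        t = t0 + n * h
        wa, wm, wb = freq(t) ** 2, freq(t + 0.5 * h) ** 2, freq(t + h) ** 2
        k1c, k1d = dchi, -wa * chi
        k2c, k2d = dchi + 0.5 * h * k1d, -wm * (chi + 0.5 * h * k1c)
        k3c, k3d = dchi + 0.5 * h * k2d, -wm * (chi + 0.5 * h * k2c)
        k4c, k4d = dchi + h * k3d, -wb * (chi + h * k3c)
        chi += h * (k1c + 2 * k2c + 2 * k3c + k4c) / 6.0
        dchi += h * (k1d + 2 * k2d + 2 * k3d + k4d) / 6.0
    return chi, dchi


def particle_number(chi, dchi, w):
    """Expected number of quanta relative to oscillator states of frequency w."""
    return (abs(dchi) ** 2 + w * w * abs(chi) ** 2) / (2.0 * w) - 0.5


# Instantaneous diagonalization applied to V at time s ...
n_at_s = particle_number(*evolve(omega, -1.0, S, 2000), omega(S))
# ... and the same prescription applied to V-hat at a later time.
n_hat_later = particle_number(*evolve(omega_hat, -1.0, 5.0, 6000), omega(S))
print(n_at_s, n_hat_later)
```

The two printed numbers agree to numerical accuracy: the prescription assigns to the smooth potential at time $s$ exactly the particle content that the frozen potential, with its corner at $s$, carries at all later times.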

Back in 1+3 dimensions, even where a good 
notion of particles is possible, it depends on the 
choice of time evolution, as is dramatically illu- 
strated by the Unruh effect discussed in the relevant 
section. 


Theory of the Stress-Energy Tensor


To orient ideas, consider first the free (minimally
coupled) scalar field, $(\Box - m^2)\phi = 0$, in Minkowski
space. If one quantizes this system in the usual
Minkowski-vacuum representation, then the expectation
value of the renormalized stress-energy tensor
(which in this case is the same thing as the normal
ordered stress-energy tensor) in a vector state $\Psi$ in
the Fock space will be given by the formal point-
splitting expression


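\[ \langle\Psi|T_{ab}(x)\Psi\rangle
   = \lim_{(x_1,x_2)\to(x,x)} \bigl(\partial^1_a\partial^2_b - \tfrac{1}{2}\eta_{ab}(\eta^{cd}\partial^1_c\partial^2_d + m^2)\bigr)
   \times \bigl(\langle\Psi|\rho_0(\phi(x_1)\phi(x_2))\Psi\rangle
   - \langle\Omega^F|\rho_0(\phi(x_1)\phi(x_2))\Omega^F\rangle\bigr) \qquad [6] \]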


where $\eta_{ab}$ is the usual Minkowski metric. A
sufficient condition for the limit here to be finite
and well defined would, for example, be for $\Psi$ to
consist of a (normalized) finite superposition of
n-particle vectors of form $a^\dagger(a_1)\ldots a^\dagger(a_n)\Omega^F$
where the smearing functions $a_1, \ldots, a_n$ are all
$C_0^\infty$ elements of $H$ (i.e., of $L^2_{\mathbb{C}}(\mathbb{R}^3)$). The reason this
works is that the two-point function in such states 
shares the same short-distance singularity as the 
Minkowski-vacuum two-point function. For exactly 
the same reason, one obtains a well-defined finite 
limit if one defines the expectation value of 
the stress-energy tensor in any physically admissible
quasifree state by the expression 


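\[ \omega(T_{ab}(x))
   = \lim_{(x_1,x_2)\to(x,x)} \bigl(\partial^1_a\partial^2_b - \tfrac{1}{2}\eta_{ab}(\eta^{cd}\partial^1_c\partial^2_d + m^2)\bigr)
   \times \bigl(\omega(\phi(x_1)\phi(x_2)) - \omega_0(\phi(x_1)\phi(x_2))\bigr) \qquad [7] \]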


This latter point-splitting formula generalizes to a 
definition for the expectation value of the 
renormalized stress—energy tensor for an arbitrary 
physically admissible quasifree state (or indeed 


for an arbitrary state whose two-point function 
has Hadamard form — i.e., whose anticommutator 
function satisfies condition (C4)) on the minimal 
field algebra and to other linear field theories 
(including the stress tensor for a conformally 
coupled linear scalar field) on a general globally 
hyperbolic spacetime (and the result obtained 
agrees with that obtained by other methods, 
including dimensional regularization and zeta- 
function regularization). However, the general- 
ization to a curved spacetime involves a number 
of important new features which we now briefly 
list (see Wald (1978) for details). 

First, the subtraction term which replaces
$\omega_0(\phi(x_1)\phi(x_2))$ is, in general, not the expectation
value of $\phi(x_1)\phi(x_2)$ in any particular state, but
rather a particular locally constructed Hadamard 
two-point function whose physical interpretation is 
more subtle; the renormalization is thus in general 
not to be regarded as a normal ordering. Second, the 
immediate result of the resulting limiting process 
will not be covariantly conserved and, in order to 
obtain a covariantly conserved quantity, one needs 
to add a particular local geometrical correction 
term. The upshot of this is that the resulting 
expected stress-energy tensor is covariantly con- 
served but possesses a (state-independent) anoma- 
lous trace. In particular, for a massless conformally 
coupled linear scalar field, one has (for all physically 
admissible quasifree states, w) the trace anomaly 
formula 


\[ \omega(T^a{}_a(x)) = (2880\pi^2)^{-1}\bigl(C_{abcd}C^{abcd} + R_{ab}R^{ab} - \tfrac{1}{3}R^2\bigr) \]


plus an arbitrary multiple of $\Box R$. In fact, in general,
the thus-defined renormalized stress-energy tensor 
operator (see below) is only defined up to a finite 
renormalization ambiguity which consists of the 
addition of arbitrary multiples of the functional 
derivatives with respect to $g_{ab}$ of the quantities


\[ I_n = \int_M F_n(x)\,|\det(g)|^{1/2}\, d^4x \]


where $n$ ranges from 1 to 4 with $F_1 = 1$, $F_2 = R$,
$F_3 = R^2$, and $F_4 = R_{ab}R^{ab}$. In the Minkowski-space
case, only the first of these ambiguities arises and it 
is implicitly resolved in the formulas [6], [7] 
inasmuch as these effectively incorporate the 
renormalization condition that $\omega_0(T_{ab}) = 0$. (For the
same reason, the locally flat example we give below 
has no ambiguity.) 

One expects, in both flat and curved cases, that, 
for test functions, $F \in C_0^\infty(M)$, there will exist
operators $T_{ab}(F)$ which are affiliated to the net of


local W*-algebras referred to earlier and that it is 
meaningful to write 


\[ \int_M \omega(T_{ab}(x))\,F(x)\,|\det(g)|^{1/2}\, d^4x = \omega(T_{ab}(F)) \]


provided that, by w on the right-hand side, we 
understand the extension of w from the Weyl algebra 
to this net. ($T_{ab}(F)$ is however not expected to
belong to the minimal algebra or be affiliated to the 
Weyl algebra.) 

An interesting simple example of a renormalized 
stress-energy tensor calculation is the so-called 
Casimir effect calculation for a linear scalar field 
on a (for further simplicity, (1 + 1)-dimensional) 
timelike cylinder spacetime of radius R (see 
Figure 3). This spacetime is globally hyperbolic 
and stationary and, while locally flat, globally 
distinct from Minkowski space. As a result, while,
provided the regions $\mathcal{O}$ are sufficiently small
(such as the diamond region in Figure 3), elements
$\mathcal{A}(\mathcal{O})$ of the minimal net of local algebras on this
spacetime will be identifiable, in an obvious way,
with elements of the minimal net of local algebras
on Minkowski space, the stationary ground state
$\omega_{\rm cylinder}$ will, when restricted to such thus-identified
regions, be distinct from the Minkowski vacuum
state $\omega_0$. The resulting renormalized stress-energy
tensor (as first pointed out in Kay (1979),
definable, once the above identification has been
made, exactly as in [7]) turns out, in the massless
case, to be nonzero and, interestingly, to have a (in
the natural coordinates, constant) negative energy
density $T_{00}$. In fact, in this massless case,


\[ \omega_{\rm cylinder}(T_{ab}) = \frac{1}{24\pi R^2}\,\eta_{ab} \]
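
As a cross-check on the size and sign of this energy density, here is a minimal sketch that uses a different technique from the point-splitting formula [7]: a mode sum with an exponential cutoff, from which the corresponding flat-space (continuum) contribution is subtracted. It assumes the zero mode is simply omitted and that the modes on a circle of radius $R$ have frequencies $|n|/R$; the function name and the cutoff parameter `eps` are invented for this example. The result approaches the known massless value $-1/(24\pi R^2)$ for the energy density as the cutoff is removed.

```python
import math


def casimir_energy_density(radius, eps=1e-2):
    """Cutoff-regularized vacuum energy density of a massless scalar field
    on a circle of the given radius, with the flat-space (continuum)
    contribution subtracted. The zero mode is omitted (an assumption)."""
    # Modes n = +-1, +-2, ... have frequency |n| / radius, so the total
    # zero-point energy is (1 / radius) * sum_{n >= 1} n, damped by exp(-eps n).
    n_max = int(60.0 / eps)
    discrete = math.fsum(n * math.exp(-eps * n) for n in range(1, n_max + 1))
    continuum = 1.0 / eps ** 2          # integral of n exp(-eps n) over n >= 0
    energy = (discrete - continuum) / radius
    return energy / (2.0 * math.pi * radius)   # divide by the circumference


R = 2.0
print(casimir_energy_density(R), -1.0 / (24.0 * math.pi * R ** 2))
```

The difference between the damped sum and the damped integral tends to $-1/12$ as `eps` tends to zero, which is where the factor 24 in the denominator comes from.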


Figure 3 The timelike cylinder spacetime of radius R with a 
diamond region isometric to a piece of Minkowski space. See 
Kay (1979). Casimir effect in quantum field theory. (Original title: 
The Casimir effect without magic.) Physical Review D 20: 
3052-3062. Reprinted with permission © 1979 by the American 
Physical Society.


Hawking and Unruh Effects


The original calculation by Hawking (1975) con- 
cerned a model spacetime for a star which collapses 
to a black hole. For simplicity, we shall only discuss 
the spherically symmetric case (see Figure 4). Adopt- 
ing a similar “mode” viewpoint to that mentioned 
after results (R1) and (R2) discussed earlier, the 
result of the calculation may be stated as follows: 
For a real linear scalar field satisfying [2] with $m = 0$
(and $V = 0$) on this spacetime, the expectation value
$\omega_{\rm in}(N(a_{\omega,\ell,v}))$ of the occupation number of a one-
particle outgoing mode $a_{\omega,\ell,v}$ localized (as far as a
normalized mode can be) around $\omega$ in angular-
frequency space and about retarded time $v$, and with
angular momentum “quantum number” $\ell$, in the in-
vacuum state (i.e., on the minimal algebra for a real
scalar field on this model spacetime) $\omega_{\rm in}$ is, at late
retarded times, given by the formula


\[ \omega_{\rm in}(N(a_{\omega,\ell,v})) = \frac{\Gamma(\omega,\ell)}{\exp(8\pi M\omega) - 1} \]


where M is the mass of the black hole and the 
absorption factor (alternatively known as gray-body 
factor) $\Gamma(\omega, \ell)$ is equal to the norm-squared of that
part of the one-particle mode $a_{\omega,\ell,v}$ which, viewed as
a complex positive-frequency classical solution 
propagating backwards in time from late retarded 
times, would be absorbed by the black hole. (Note 
the independence of the right-hand side of this 
formula from the retarded time, v.) This calculation 
can be understood as an application of result (R1)
Figure 4 The spacetime of a star collapsing to a spherical
black hole.
(even though the spacetime is more complicated than
one with a localized “bump of curvature” and even 
though the relevant overall time evolution will not be 
unitarily implemented, the result still applies when 
suitably interpreted) and the heart of the calculation 
is an asymptotic estimate of the relevant “$\beta$”
Bogoliubov coefficient which turns out to be depen- 
dent on the geometrical optics of rays which pass 
through the star just before the formation of the 
horizon. This result suggests that the in-vacuum state 
is indistinguishable at late retarded times from a state 
of blackbody radiation at the Hawking temperature, 
$T_{\rm Hawking} = 1/8\pi M$, in Minkowski space from a
blackbody (gray body) with the same absorption 
factor. This was confirmed by further work by many 
authors. Much of that work, as well as the original 
result of Hawking was partially heuristic but later 
work by Dimock and Kay (1987), by Fredenhagen 
and Haag (1990), and by Bachelot (1999) and others 
has put different aspects of it on a rigorous 
mathematical footing. The result generalizes to
nonzero mass and higher spin fields, to interacting
fields, as well as to other types of black hole, and the
formula for the Hawking temperature generalizes to


\[ T_{\rm Hawking} = \kappa / 2\pi \]


where $\kappa$ is the surface gravity of the black hole.
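
To give a feeling for the magnitudes involved, the sketch below evaluates these formulas. It is an illustration only: the SI form $T = \hbar c^3/(8\pi G M k_B)$ of $T = 1/8\pi M$, the numerical constants (CODATA-style values and a nominal solar mass), the Schwarzschild surface gravity $\kappa = 1/4M$ in natural units, and all function names are assumptions of the example rather than part of the article.

```python
import math

# SI values quoted as assumptions of this example.
HBAR = 1.054571817e-34   # J s
C = 2.99792458e8         # m / s
G = 6.67430e-11          # m^3 / (kg s^2)
K_B = 1.380649e-23       # J / K
M_SUN = 1.98847e30       # kg (nominal solar mass)


def hawking_temperature_kelvin(mass_kg):
    """T = hbar c^3 / (8 pi G M k_B), the SI form of T = 1/(8 pi M)."""
    return HBAR * C ** 3 / (8.0 * math.pi * G * mass_kg * K_B)


def occupation_number(omega, mass, gray_body=1.0):
    """Gamma / (exp(8 pi M omega) - 1) in units G = c = hbar = k_B = 1."""
    return gray_body / math.expm1(8.0 * math.pi * mass * omega)


def temperature_from_surface_gravity(kappa):
    """T = kappa / (2 pi) in the same natural units."""
    return kappa / (2.0 * math.pi)


print(hawking_temperature_kelvin(M_SUN))             # roughly 6e-8 K
print(temperature_from_surface_gravity(1.0 / 4.0))   # Schwarzschild with M = 1
print(occupation_number(omega=0.05, mass=1.0))       # gray-body factor set to 1
```

For a solar-mass black hole the temperature is far below that of the cosmic microwave background, which is why the effect is of conceptual rather than observational importance for astrophysical black holes.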
This result suggests that there is something funda- 
mentally “thermal” about quantum fields on black- 
hole backgrounds and this is confirmed by a number of 
mathematical results. In particular, the theorems in the 
two papers Kay and Wald (1991) and Kay (1993), 
combined together, tell us that there is a unique state 
on the Weyl algebra for the maximally extended 
Schwarzschild spacetime (a.k.a. Kruskal-Szekeres 
spacetime) (see Figure 5) which is invariant under the 
Schwarzschild isometry group and whose two-point 
function has Hadamard form. Moreover, they tell us 
that this state, when restricted to a single wedge (i.e., 
the exterior Schwarzschild spacetime) is necessarily a 
KMS state at the Hawking temperature. This unique 
state is known as the Hartle-Hawking-Israel state. 
These results in fact apply more generally to a wide 
class of globally hyperbolic spacetimes with bifurcate 
Killing horizons including de Sitter space — where the 
unique state is sometimes called the Euclidean and 
sometimes the Bunch—Davies vacuum state — as well as 
to Minkowski space, in which case the unique state is 
the usual Minkowski vacuum state, the analog of the 
exterior Schwarzschild wedge is a so-called Rindler 
wedge, and the relevant isometry group is a one- 
parameter family of wedge-preserving Lorentz boosts. 
In the latter situation, the fact that the Minkowski 
vacuum state is a KMS state (at “temperature” $1/2\pi$)


[Figure 5 labels: future singularity (Schwarzschild case); exterior Schwarzschild wedge/Rindler wedge; past singularity (Schwarzschild case).]

Figure 5 The geometry of maximally extended Schwarzschild 
(/or Minkowski) spacetime. In the Schwarzschild case, every 
point represents a 2-sphere (/in the Minkowski case, a 2-plane). 
The curves with arrows on them indicate the Schwarzschild time 
evolution (/one-parameter family of Lorentz boosts). These 
curves include the (straight lines at right angles) event horizons 
(/Killing horizons). 


when restricted to a Rindler wedge and regarded with 
respect to the time evolution consisting of the wedge- 
preserving one-parameter family of Lorentz boosts is 
known as the Unruh effect (1975). This latter property
of the Minkowski vacuum in fact generalizes to
general Wightman QFTs and is in fact an immediate
consequence of a combination of the Reeh-Schlieder
theorem (applied to a Rindler wedge) and the
Bisognano-Wichmann theorem (1975). The latter
theorem says that the defining relation [1] of a KMS 
state holds if, in [1], we identify the operator J with the 
complex conjugation which implements wedge reflec- 
tion and H with the self-adjoint generator of the 
unitary implementor of Lorentz boosts. We remark 
that the Unruh effect illustrates how the concept of 
“vacuum” (when meaningful at all) is dependent on 
the choice of time evolution under consideration. 
Thus, the usual Minkowski vacuum is a ground state 
with respect to the usual Minkowski time evolution 
but not (when restricted to a Rindler wedge) with 
respect to a one-parameter family of Lorentz boosts; 
with respect to these, it is, instead, a KMS state. 


Nonglobally Hyperbolic Spacetimes 
and the “Time Machine” Question 


Hawking (1992) argued that a spacetime in which a 
time machine gets manufactured should be modeled 
(see Figure 6) by a spacetime with an initial globally
Figure 6 The schematic geometry of a spacetime in which a
time machine gets manufactured. 


hyperbolic region with a region containing closed 
timelike curves to its future and such that the future 
boundary of the globally hyperbolic region is a 
compactly generated Cauchy horizon. On such a 
spacetime, Kay et al. (1997) proved that it is 
impossible for any distributional bisolution which 
satisfies (even a certain weakened version of) the 
Hadamard condition on the initial globally hyper- 
bolic region to continue to satisfy that condition on 
the full spacetime, the (weakened) Hadamard
condition being necessarily violated at at least one 
point on the Cauchy horizon. This result implies 
that, however one extends a state, satisfying our 
conditions (C1)—(C4), on the minimal algebra for [2] 
on the initial globally hyperbolic region, the expectation
value of its stress-energy tensor must necessarily
become singular on the Cauchy horizon. This
result, together with many heuristic results and 
specific examples considered by many other authors,
appears to support the validity of the (Hawking 
1992) chronology protection conjecture to the effect 
that it is impossible in principle to manufacture a time 
machine. However, there are potential loopholes in the 
physical interpretation of this result as pointed out by 
Visser (1997), as well as other claims by various authors 
that one can nevertheless violate the chronology 
protection conjecture. For a recent discussion on this 
question, we refer to Visser (2003). 


Other Related Topics and Some 
Warnings 


There is a vast computational literature, calculating 
the expectation values of stress-energy tensors in 
states of interest for scalar and higher spin linear 
fields (and also some work for interacting fields) on 
interesting cosmological and black-hole backgrounds. 
QFT on de Sitter and anti-de Sitter space is a big 
subject area in its own right with recent renewed 
interest because of its relevance to string theory and 
holography. Also important on black-hole back- 
grounds is the calculation of gray-body factors, 
again with renewed interest because of relevance to 
string theory and to brane-world scenarios.

There are many further mathematically rigorous
results on algebraic and axiomatic QFT in a curved 
spacetime setting, including versions of PCT, spin- 
statistics and Reeh—Schlieder theorems and also 
rigorous energy inequalities bounding the extent to 
which expected energy densities can be negative, etc. 

There is much mathematical work controlling 
scattering theory on black holes, partly with a view 
to further elucidating the Hawking effect. 

Perturbative renormalization theory of interacting 
quantum fields in curved spacetime is also now a 
highly developed subject. 

Beyond QFT in a fixed curved spacetime is 
semiclassical gravity which takes into account the 
back-reaction of the expectation value of the stress- 
energy tensor on the classical gravitational back- 
ground. There are also interesting condensed matter 
analogs of the Hawking effect such as dumb holes. 

Readers exploring the wider literature, or doing 
further research on the subject should be aware that 
the word “vacuum” is sometimes used to mean 
“ground state” and sometimes just to mean “quasifree
state.” They should be cautious of attempts to define 
particles on Cauchy surfaces in instantaneous diag- 
onalization schemes (cf. the remarks at the end of the 
section “Particle creation and the limitations of the 
particle concept”). When studying (or performing) 
calculations of the “expectation value of the stress- 
energy tensor” it is always important to ask oneself 
with respect to which state the expectation value is 
being taken. It is also important to remember to check 
that candidate two-point (anticommutator) functions 
satisfy the positivity condition (C3) discussed earlier. 
Typically, two-point distributions obtained via mode 
sums automatically satisfy condition (C3) (and condi- 
tion (C4)), but those obtained via image methods do 
not always satisfy it. (When they do not, the presence 
of nonlocal spacelike singularities is often a tell-tale 
sign as can be inferred from Kay’s conjecture/Radzi- 
kowski’s theorem discussed earlier.) There are a 
number of apparent implicit assertions in the literature 
that some such two-point functions arise from “states” 
when of course they cannot. Some of these concern 
proposed analogs to the Hartle-Hawking-Israel state 
for the (appropriate maximal globally hyperbolic 
portion of the maximally extended) Kerr spacetime. 
That they cannot belong to states is clear from a 
theorem in Kay and Wald (1991) which states that 
there is no stationary Hadamard state on this space- 
time at all. Others of them concern claimed “states” on 
spacetimes such as those discussed in the previous 
section which, if they really were states would seem to 
be in conflict with the chronology protection conjecture.
Finally, beware states (such as the so-called $\alpha$-
vacua of de Sitter spacetime) whose two-point
distributions violate the “Hadamard” condition (C4)
and which therefore do not have a well-defined finite
expectation value for the renormalized stress-energy
tensor.


See also: AdS/CFT Correspondence; Algebraic 
Approach to Quantum Field Theory; Axiomatic Quantum 
Field Theory; Black Hole Mechanics; Bosons and 
Fermions in External Fields; Integrability and Quantum 
Field Theory; Quantum Fields with Indefinite Metric: 
Non-Trivial Models; Quantum Fields with Topological 
Defects; Quantum Geometry and Its Applications; 
Scattering in Relativistic Quantum Field Theory: 
Fundamental Concepts and Tools; Thermal Quantum 
Field Theory.


By any account quantum field theory occupies a
prominent place in the history of mathematical 
physics. This article is, however, not intended to 
serve as an overview of this subject, but has the 
more modest aim of identifying a few areas which 
seem to me interesting and significant. 


Historical Remarks; Second Quantization 


At the time when quantum field theory was at the 
forefront of theoretical physics its raison d’être was
to complete the quantum description of the sub- 
atomic world. Quantum mechanics had been amaz- 
ingly successful in solving almost the whole of 
atomic physics by making explicit the quantum 


(wave) nature of the electron, according to the 
formulations of Heisenberg and Schrödinger. The 
introduction of the quantum idea into physics, 
however, by Planck in 1900 closely followed by 
Einstein in 1905 was the proposal of a quantum 
(particular) aspect of the electromagnetic field — the 
photon. In the mid-1920s the only force in nature to 
be considered was the electromagnetic interaction; 
this was before the theories of Yukawa and Fermi, 
concerning the strong and weak nuclear forces. 
Dirac, Heisenberg, Jordan, and others then 
addressed themselves to finding a formulation of 
quantum electrodynamics (QED) comparable in 
mathematical sophistication to the Heisenberg-
Schrödinger formulation of quantum mechanics,
which Planck’s and Einstein’s theories were not.
The idea that was pursued, at least in the early
stages, was that the Schrödinger wave function $\psi$,
taken as a wave field, should be “quantized”; Dirac 


seems to have taken this as a model for photons. 
Jordan further proposed that electrons should be 
treated as the quanta of an electron field, but 
recognized that their fermionic nature would modify 
the quantization procedure. This generic idea 
involved what was called “second quantization” — 
of a field into a particle. 

One of the earliest quantization rules was Bohr’s 
condition relating to the periodic orbits of electrons in 
atoms, $J = \oint p\,dq = nh$. At the hands of Heisenberg and
Dirac this became upgraded to the commutation 
relation 


\[ [q, p] = i\hbar \]


where the operators p and q are “observables.” In 
their papers on quantum field theory, Dirac, Jordan 
and Wigner, and Heisenberg introduced creation and 
annihilation operators which had the function, as 
their name implied, of creating and destroying single 
particles — quanta of the field. These operators obeyed 
the commutation rules (with [A, B] = AB — BA) 
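\[ [b_r, b_s^\dagger] = \delta_{rs}, \qquad [b_r, b_s] = [b_r^\dagger, b_s^\dagger] = 0 \]
when the field quanta were bosons, and the anti-
commutation rules
\[ \{b_r, b_s^\dagger\} = \delta_{rs}, \qquad \{b_r, b_s\} = \{b_r^\dagger, b_s^\dagger\} = 0 \]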

(with {A, B}= AB + BA) when the field quanta were 
fermions (e.g., electrons). These steps constitute 
second quantization, but it may be noted that 
the creation and annihilation operators are not 
observables, as p and q are in the Heisenberg 
commutation relation. In addition, the second 
quantization conditions do not involve Planck’s 
constant. “First” and “second” quantization are 
therefore not so similar as one might like to think. 
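
The two sets of rules can be made concrete with small matrices. The following is a minimal single-mode sketch, under stated assumptions: the bosonic annihilation operator is truncated to a finite number of levels (so the commutator equals the identity only away from the highest level, an artefact of the truncation), and the fermionic operator is the standard 2 x 2 lowering matrix. The helper names are invented for this example, and no factor of Planck's constant appears anywhere.

```python
import math


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    """Adjoint of a real matrix (its transpose)."""
    n = len(a)
    return [[a[j][i] for j in range(n)] for i in range(n)]


def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]


def anticommutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]


def boson_annihilator(levels):
    """Truncated bosonic b with matrix elements <n-1| b |n> = sqrt(n)."""
    b = [[0.0] * levels for _ in range(levels)]
    for n in range(1, levels):
        b[n - 1][n] = math.sqrt(n)
    return b


# Single fermionic mode: b lowers the occupied state to the empty one.
FERMION_B = [[0.0, 1.0], [0.0, 0.0]]

b = boson_annihilator(6)
print(commutator(b, dagger(b)))                      # identity except last entry
print(anticommutator(FERMION_B, dagger(FERMION_B)))  # 2 x 2 identity
print(anticommutator(FERMION_B, FERMION_B))          # zero matrix
```

The vanishing of the last anticommutator is the matrix form of the exclusion principle: applying the fermionic operator twice gives zero.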

The question of what exactly is being quantized 
was in fact the source of some confusion. In his 
paper of 1927, Dirac’s attention is focussed on 
electromagnetic radiation, but he nevertheless dis- 
cusses the difference between “a light-wave and the 
de Broglie or Schrödinger wave associated with the
light-quanta.” As Dirac points out, “their intensities
are to be interpreted in different ways. The number
of light quanta per unit volume associated with a
monochromatic light-wave equals the energy per
unit volume of the wave divided by the energy
$(2\pi\hbar)\nu$ of a single light quantum. On the other hand
a monochromatic de Broglie wave of amplitude $a$
(multiplied into the imaginary exponential factor)
must be interpreted as representing $a^2$ light quanta
per unit volume for all frequencies.” There are at 
least two problematic issues here. First, is the
Schrödinger wave function $\psi$ to be considered as a
“real” field, whose quanta result in “real” particles, 
or is it a probability field, whose significance lies in 
Born’s probabilistic interpretation of quantum 
mechanics? Born wrote in 1926, “[Einstein said 
that] the waves are present only to show the 
corpuscular light quanta the way, and he spoke in 
the sense of a “ghost field”. This determines the 
probability that a light quantum, the bearer of 
energy and momentum, takes a certain path; 
however, the field itself has no energy and no 
momentum.” This is the first problem. The second 
one concerns the nature of the quantization itself. Is 
this a quantization of field energy, or a quantization 
of the field itself, as a substantial entity? If the field 
is real, the second of these does not imply the first. 

Ambiguities surrounding the idea of second 
quantization survived into the 1960s. Wigner is 
recorded as saying, in an interview in 1963, “just as 
we get photons by quantising the electromagnetic 
fields, so we should be able to get material particles 
by quantising the Schrödinger field.” And Rosenfeld,
also in an interview in 1963, said, “in some sense or 
other, Jordan himself took the wave function, the 
probability amplitude, physically more seriously 
than most people [did].” 

It would seem we are justified in concluding that the 
idea of second quantization contains flaws, but an even 
clearer indication of the need for rethinking is provided 
by the story of the Dirac equation. This is a wave 
equation for the electron, compatible with special 
relativity, and taking explicit account of its spin being 
$(1/2)\hbar$. The equation famously had both positive- and
negative-energy solutions. This potential disaster was 
converted by Dirac into a triumph by reinterpreting the 
(absence of) negative-energy solutions as (positive- 
energy) antiparticles — positrons, particles with positive 
charge but the same mass and spin as the electron. 
Positrons were eventually discovered by Anderson. It 
was later shown that the existence of antiparticles is a 
general feature of quantum field theory, not just a 
peculiarity of spin-1/2 particles. The significance of this 
discovery, however, is that the twin requirements of 
relativity and quantum theory are not compatible with 
a single-particle state; rather, these requirements result 
in a two-particle state. Thus, in some sense the 
requirements of relativity and quantum mechanics 
already start to take us down the road to a quantum 
theory of fields. 

Quantum field theory is then constructed on the 
following sort of framework: “classical” theories for 
fields with any spin may be written down and these 
are quantized by reinterpreting the field variables as
operators and imposing Heisenberg-type commutation
relations on the field and its corresponding
“momentum” variable. So, for example, for spinless
fields we have the equal-time commutation relation 


\[ [\phi(x, t), \pi(y, t)] = i\hbar\,\delta^{3}(x - y) \]


where $\pi = \partial\mathcal{L}/\partial(\partial_0\phi)$ and $\mathcal{L}$ is the Lagrange density.
The mass and spin of particles are defined with 
reference to the Poincaré group (thereby incorporat- 
ing special relativity) and the quantum requirement 
is the familiar one that physical states are repre- 
sented by vectors in Hilbert space. The rest follows: 
as Weinberg says, “quantum field theory is the way 
it is because (with certain qualifications) this is the 
only way to reconcile quantum mechanics with 
special relativity.” 


Renormalization 


A notorious problem in quantum field theory is the 
occurrence of infinities. In QED, for example, the 
electron acquires a self-energy — and therefore a 
contribution to its mass — by virtue of the emission 
and reabsorption of virtual photons. It turns out 
that this self-energy is infinite — it is given by a 
divergent integral — even in the lowest order of 
perturbation theory. In the early days, this was 
recognized as being a serious problem, and in fact it 
turns out to be a generic problem in quantum field 
theory. It was realized by Dyson, however, that in 
some field theories these divergences may be dealt 
with by redefining a small number of parameters 
(e.g., in QED, the electron mass, charge, and field 
amplitude) so that thereafter the theory is finite to 
all orders of perturbation theory. Such theories are 
called renormalizable, and QED is a renormalizable 
field theory. 

Some important field theories, however, are not 
renormalizable; an example is Fermi’s theory of 
weak interactions. To lowest order in perturbation 
theory, Fermi’s theory works well (e.g., in account- 
ing for the electron spectrum in neutron beta decay), 
but to higher orders divergent results are obtained, 
which cannot be waved away by redefining a finite 
number of parameters; that is to say, as the order of 
perturbation increases, so also does the number of 
parameters to be redefined. Nonrenormalizable 
theories of this type have traditionally been regarded 
as highly undesirable, not to say rather nasty. 

The modern view of renormalization is, however, 
somewhat different. The problem with nonrenormal- 
izable theories is that, in order to calculate a physical 
process to all orders in perturbation theory, an 
infinite number of parameters must be renormalized, 
so the theory has no predictive power. In practice, 
however, we do not need to calculate to all orders in 


perturbation theory, since any physical process (say a 
scattering process or a particle decay) will only be 
observed at a finite energy and comparison of theory 
and experiment therefore only requires calculation up 
to a finite order of perturbation theory. So even 
nonrenormalizable theories are perfectly acceptable 
as low-energy theories. This amounts to a philosophy 
of effective field theories; an effective field theory is a 
model which holds good up to a particular energy 
scale, or equivalently down to a particular length 
scale. 

An important addition to the theoretical armoury 
is the renormalization group. Renormalization is 
implemented first of all by a scheme of regulariza- 
tion, which enables the divergences to be exhibited 
explicitly. The simplest type of regularization is the 
introduction of a cutoff in the momentum integrals, 
but in modern particle physics the favored scheme is 
dimensional regularization. The dimensionality of 
the integrals in momentum space is taken to be 
d = 4 − ε, and the divergent quantities have an 
explicit dependence on ε (which, of course, as the 
“real” world is approached, approaches zero). At 
the same time, a mass parameter μ is introduced in 
order to define dimensionless quantities, for example, 
a dimensionless coupling constant. The renormalized 
quantities then depend on the “bare” 
(unrenormalized) quantities and on μ and ε. The 
arbitrariness of μ enables a differential equation, for 
scattering amplitudes, for example, to be written 
down. While at first sight this renormalization 
group equation might seem to have no physical 
importance, in fact it gives a powerful way of 
studying scattering behavior at large momenta. 
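
As a minimal illustration of what such an equation encodes, the following sketch integrates the one-loop renormalization group equation for the QED coupling, dα/d ln Q = 2α²/(3π), and compares the result with its closed-form solution. The restriction to a single charged lepton, the function names, and the numerical inputs are assumptions made for illustration only; with only the electron included, the number obtained is not the measured coupling at high energy.

```python
import math

def alpha_one_loop(alpha_mu, mu, q):
    """Closed-form one-loop solution: 1/alpha(q) = 1/alpha(mu) - (2/(3 pi)) ln(q/mu)."""
    return alpha_mu / (1.0 - (2.0 * alpha_mu / (3.0 * math.pi)) * math.log(q / mu))

def alpha_numeric(alpha_mu, mu, q, steps=1000):
    """Integrate d(alpha)/d(ln Q) = 2 alpha^2 / (3 pi) from mu to q with RK4."""
    def beta(a):
        return 2.0 * a * a / (3.0 * math.pi)

    h = math.log(q / mu) / steps   # step size in ln Q
    a = alpha_mu
    for _ in range(steps):
        k1 = beta(a)
        k2 = beta(a + 0.5 * h * k1)
        k3 = beta(a + 0.5 * h * k2)
        k4 = beta(a + h * k3)
        a += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return a

# Illustrative inputs (assumed): alpha at the electron mass, evolved to 91.19 GeV.
alpha_low = 1.0 / 137.036
alpha_high = alpha_one_loop(alpha_low, 0.000511, 91.19)
```

The coupling grows with the momentum scale Q, which is the screening behavior of QED expressed in renormalization group language.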

Most interestingly, the concept of the renormali- 
zation group also arises in condensed matter physics. 
Here, rather than, for example, a cutoff in momen- 
tum space, the relevant parameter is a distance scale. 
In the Ising model in statistical mechanics, for 
example, in which spins are located on a lattice, 
the parameter is the lattice spacing. To construct a 
theory that describes the physics on the macroscopic 
scale involves integrating out the details on the 
microscopic scale and one way to do this is via the 
“block spin” transformation originally introduced 
by Kadanoff. In this way the renormalization group 
has had a large impact in condensed matter physics, 
for example, in the study of critical phenomena. 
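
The block-spin idea can be sketched in a few lines. The version below uses a majority rule on b × b blocks of a square lattice of ±1 spins; the majority rule, the block size, and all names are assumptions chosen for illustration, not a reconstruction of Kadanoff's original argument.

```python
import numpy as np

def block_spin(spins, b=3):
    """Majority-rule block-spin map: each b x b block of +/-1 spins becomes one spin.

    b must be odd so that the sum over a block can never vanish.
    """
    n = spins.shape[0]
    assert spins.shape == (n, n) and n % b == 0 and b % 2 == 1
    # blocks[i, p, j, q] = spins[i*b + p, j*b + q]
    blocks = spins.reshape(n // b, b, n // b, b)
    sums = blocks.sum(axis=(1, 3))          # total spin of each block
    return np.sign(sums).astype(int)        # coarse-grained +/-1 lattice

# Example: a random 9 x 9 configuration coarse-grained to 3 x 3.
rng = np.random.default_rng(0)
lattice = rng.choice([-1, 1], size=(9, 9))
coarse = block_spin(lattice)
```

Each application enlarges the effective lattice spacing by the factor b, which is the sense in which microscopic details are integrated out.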


Particle Physics and Cosmology 


Probably the most spectacular success of quantum 
field theory in the twentieth century has been in 
particle physics. The “standard model” accounts for 
the strong, electromagnetic, and weak interactions 


between elementary particles with outstanding 
success. The interactions are generalizations of Max- 
well’s electrodynamics, which is invariant under a 
symmetry group U(1) of gauge transformations. An 
enlargement of this group to SU(2) ⊗ U(1) accounts 
for the unified electroweak interaction (the unification 
resulting from the fact that the two U(1)’s above 
are not exactly the same; there is some off-diagonal 
mixing), and the strong interactions between quarks, 
which bind them into hadrons, are invariant under an 
SU(3) group of gauge transformations. The gauge 
fields are the photon γ, the W and Z bosons (both 
heavy; of the order of 100 times the proton mass), and 
the (massless) gluons mediating the force between 
quarks (quantum chromodynamics, QCD). An 
important feature of the standard model is sponta- 
neous symmetry breaking, which is the mechanism by 
which the W and Z particles acquire a mass (but the 
photon does not, and neither do the gluons). This goes 
by the name of the Higgs mechanism. 

The quantization of the standard model is most 
successfully carried out using the path-integral 
formalism, rather than canonical quantization, and 
the proof of the renormalizability of the model (of 
nonabelian gauge theories with spontaneous symmetry 
breaking) was given by ’t Hooft. Details of 
these topics are now available in many textbooks. 

Confidence that this is a realistic model of elemen- 
tary particles — that is to say, of quarks and leptons — 
depends, of course, on particular experiments and 
their interpretation and an important milestone on this 
journey was Feynman’s quark—parton model of deep 
inelastic electron—proton scattering. The interpretation 
of the data required a picture of an electron scattering 
from an individual quark in the proton, and this in 
turn required a negligible interaction between quarks; 
in other words, that at small distances (inside the 
proton) the quarks are (almost) free — despite the fact 
that at large distances they most certainly are not! The 
proof, by Gross, Politzer, and Wilczek, that nonabelian 
gauge theories are indeed asymptotically free (asymptotic 
in momentum space, that is) was therefore an 
important event in helping to establish the credibility 
of the standard model. 
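
Asymptotic freedom can be made concrete with the standard one-loop running of the strong coupling. The sketch below assumes the one-loop coefficient b0 = 11 − 2n_f/3 for SU(3) with n_f quark flavors; the function names and the reference value α_s = 0.118 at 91.19 GeV are illustrative assumptions and do not come from the text.

```python
import math

def beta0(n_f):
    """One-loop coefficient for SU(3) with n_f quark flavors (positive for n_f <= 16)."""
    return 11.0 - 2.0 * n_f / 3.0

def alpha_s_one_loop(alpha_mu, mu, q, n_f):
    """One-loop solution: 1/alpha_s(q) = 1/alpha_s(mu) + (b0 / (2 pi)) ln(q/mu)."""
    return alpha_mu / (1.0 + beta0(n_f) * alpha_mu / (2.0 * math.pi) * math.log(q / mu))

# Illustrative reference point (assumed): alpha_s = 0.118 at 91.19 GeV, five flavors.
alpha_10 = alpha_s_one_loop(0.118, 91.19, 10.0, 5)      # larger at lower momentum
alpha_1000 = alpha_s_one_loop(0.118, 91.19, 1000.0, 5)  # smaller at higher momentum
```

As long as b0 is positive the coupling decreases with increasing momentum, that is, at small distances, which is the behavior the quark-parton picture requires.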

A characteristic contribution of quantum field theory 
to our view of the physical world is its picture of the 
vacuum, as being populated with virtual particle- 
antiparticle pairs. A consequence of this is the phenom- 
enon of vacuum polarization — that the presence of an 
electric charge in free space polarizes these virtual pairs. 
This in turn leads to the phenomenon of screening in 
QED, and antiscreening in QCD, SU(3) having a more 
complicated structure than U(1). It also leads to a 
nonzero (in fact, quadratically divergent!) value for the 
energy of the vacuum. This is in effect the contribution 
of the zero-point energies of all the oscillators in the 
Fourier expansion of the scalar field operator. In any 
other interaction than gravity, this zero-point energy 
may be ignored, but in gravity it may be expected to 
have observable consequences, and indeed it turns out 
that it plays the same role as a cosmological constant Λ, 
and therefore acts as an agent of acceleration, rather 
than deceleration, of the universe. 

A final topic worth noting is one whose existence 
would have been inconceivable in the early days of this 
subject. The nonlinearity of the (nonabelian) gauge 
field equations and the existence of a nontrivial group 
space allows new types of topologically nontrivial 
solutions to these equations: solitons, bounces, instan- 
tons, sphalerons, and so on. Effects such as fractional 
spin and nonconservation of fermion number also 
appear, and, on the cosmological scale, domain walls 
and cosmic strings. There is something here for 
theoretical physicists of many differing interests. 


Introduction 


The nonperturbative construction of quantum field 
models with nontrivial scattering in arbitrary dimen- 
sion d of the underlying Minkowski spacetime is 
much simpler in the framework of quantum field 
theory with indefinite metric than in the positive- 
metric case. In particular, there exist a number of 
solutions in the physical dimension d = 4, where up 
to now no positive-metric solutions are known. The 
reasons why this is so are reviewed in this article, 
and some examples obtained by analytic continua- 
tion from the solutions of Euclidean covariant 
stochastic partial differential equations (SPDEs) 
driven by non-Gaussian white noise are discussed. 


The Hilbert Space Structure Condition 


It has been proved by F Strocchi that a quantum 
gauge field in a local, covariant gauge cannot act on 
a Hilbert space with a positive-definite inner 
product. But it is possible to overcome this obstacle 
by passing from a Hilbert space representation of 
the algebra of the quantum field to Krein space 
representations in order to preserve locality and 
covariance under the Poincaré group. 

A Krein space K is an inner-product space which 
also is a Hilbert space with respect to some auxiliary 
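scalar product. The relation between the inner 
product $\langle\cdot,\cdot\rangle$ and the auxiliary scalar product $(\cdot,\cdot)$ 
is given by a self-adjoint linear operator $J : \mathcal{K} \to \mathcal{K}$ 
with $J^2 = 1_{\mathcal{K}}$ and $\langle\cdot,\cdot\rangle = (\cdot, J\,\cdot)$. J is called the metric 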
operator. A quantum field acting on such a space is 
called a quantum field with indefinite metric. The 
formal definition is as follows. 

Let $\mathcal{D} \subset \mathcal{K}$ be a dense linear space and $\Omega \in \mathcal{D}$ a 
distinguished vector (henceforth called the vacuum). 
Let $\mathcal{S} = \mathcal{S}(\mathbb{R}^d, \mathbb{C}^L)$ be the space of Schwartz test 
functions with values in $\mathbb{C}^L$. A quantum field $\phi$ by 
definition is a linear mapping from $\mathcal{S}$ to the linear 
operators on $\mathcal{D}$. One usually assumes that $\mathcal{D}$ is 
generated as the linear span of vectors generated by 
repeated application of field operators to the 
vacuum. The following properties should hold for 
the quantum field $\phi$: 


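1. Temperedness: $f_n \to f$ in $\mathcal{S}$ $\Rightarrow$ $\langle \Psi, \phi(f_n)\Phi \rangle \to \langle \Psi, \phi(f)\Phi \rangle$ $\forall\, \Psi, \Phi \in \mathcal{D}$. 

2. Covariance: There exists a weakly continuous 
representation U of the covering of the 
orthochronous, proper Poincaré group $\tilde{P}_+^{\uparrow}$ by 
linear operators on $\mathcal{D}$ which is J-unitary, that is, 
$U^{[*]} = U^{-1}$ with $U^{[*]} = J U^* J|_{\mathcal{D}}$, and leaves $\Omega$ 
invariant. $\phi$ is said to be covariant with respect 
to U and a representation $\tau$ of the covering of the 
orthochronous, proper Lorentz group $\tilde{L}_+^{\uparrow}$ if 
$U(g)\phi(f)U(g)^{-1} = \phi(f_g)$, where $f_g(x) = \tau(\Lambda) f(\Lambda^{-1}(x - a))$, 
$g = \{\Lambda, a\}$, $\Lambda \in \tilde{L}_+^{\uparrow}$, $a \in \mathbb{R}^d$. 

3. Spectrality: Let $U(a)$, $a \in \mathbb{R}^d$, be the representation 
of the translation group and let 
$\sigma = \bigcup_{\Psi, \Phi \in \mathcal{D}} \operatorname{supp} \mathcal{F}(\langle \Psi, U(\cdot)\Phi \rangle)$ with $\mathcal{F}$ the Fourier 
transform (in the sense of tempered distributions). 
Formally, $\sigma$ is the joint spectrum of the 
generators of spacetime translations $U(a)$. The 
spectral condition then demands that $\sigma \subseteq \bar{V}_0^+$, 
the closed forward light cone in energy-momentum 
space. 

4. Locality: There is a decomposition $\mathbb{C}^L = \bigoplus_\kappa V_\kappa$ 
such that for each $f, h \in \mathcal{S}$ taking values in $V_\kappa$ and $V_{\kappa'}$, respectively, 
and having spacelike separated supports one has 
either $[\phi(f), \phi(h)] = 0$ or $\{\phi(f), \phi(h)\} = 0$, where 
$[\cdot,\cdot]$ is the commutator and $\{\cdot,\cdot\}$ the 
anticommutator. 

5. Hermiticity: There is an involution * on $\mathcal{S}$ such 
that $\phi(f)^{[*]} = \phi(f^*)$. 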
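
The notions of an indefinite inner product and of J-unitarity used in these axioms can be checked on a toy example. The sketch below works in a two-dimensional Krein space with metric operator J = diag(1, −1); the dimension, the choice of J, and all names are assumptions made purely for illustration.

```python
import numpy as np

# Metric operator J on a toy two-dimensional Krein space: self-adjoint, J @ J = 1.
J = np.diag([1.0, -1.0])

def inner(x, y):
    """Indefinite inner product <x, y> = (x, J y); (.,.) is the usual scalar product."""
    return np.vdot(x, J @ y)

def j_adjoint(u):
    """J-adjoint U^[*] = J U^* J of a matrix U."""
    return J @ u.conj().T @ J

def boost(chi):
    """A hyperbolic rotation: J-unitary, but not unitary for chi != 0."""
    return np.array([[np.cosh(chi), np.sinh(chi)],
                     [np.sinh(chi), np.cosh(chi)]])

e_plus = np.array([1.0, 0.0])    # <e_plus, e_plus> = +1
e_minus = np.array([0.0, 1.0])   # <e_minus, e_minus> = -1
null = e_plus + e_minus          # <null, null> = 0
```

Vectors of negative or zero "norm" show why the inner product cannot serve directly as a probability amplitude; on the one-dimensional subspace spanned by e_plus the restriction is nonnegative.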


The quantum-mechanical interpretation of the 
inner product of two vectors in K as a probability 
amplitude, however, gets lost. It has to be restored 
by the construction of a physical subspace of K 
where the restriction of the inner product is nonnegative. 
This is called the Gupta-Bleuler gauge 
procedure. Typically, one first considers the problem 
of constructing quantum fields with indefinite 
metric, that is, the dynamical problem is addressed. 
This is often followed by the construction of the 
physical states, which involves implementation of 
quantum constraints. 

The vacuum expectation values (VEVs), also called 
Wightman functions, of the quantum field theory 
with indefinite metric (IMQFT) are defined as 


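$$W_n(f_1 \otimes \cdots \otimes f_n) = \langle \Omega, \phi(f_1) \cdots \phi(f_n)\, \Omega \rangle, \qquad f_1, \ldots, f_n \in \mathcal{S} \qquad [1]$$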


An axiomatic framework for (unconstrained) 
IMQFT has been suggested by G Morchio and 
F Strocchi in terms of the Wightman functions 
$W_n \in (\mathcal{S}^{\otimes n})'$, $n \in \mathbb{N}_0$. Previous work on the topic had 
been done by J Yngvason. These generalized Wight- 
man axioms of Morchio and Strocchi replace the 
positivity condition on the Wightman functions by a 
so-called Hilbert space structure condition (HSSC): 
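for $n \in \mathbb{N}_0$ there exists $p_n$, a Hilbert seminorm on $\mathcal{S}^{\otimes n}$, 
such that 


$$|W_{n+m}(f \otimes h)| \le p_n(f)\, p_m(h) \qquad \forall\, n, m \in \mathbb{N}_0,\ f \in \mathcal{S}^{\otimes n},\ h \in \mathcal{S}^{\otimes m} \qquad [2]$$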


This condition makes sure that a field algebra on a 
Krein space with VEVs equal to the given set of 
Wightman functions can be constructed. The 
remaining axioms of the Wightman framework - 
temperedness, covariance, spectral condition, local- 
ity, and Hermiticity — remain the same. Clustering of 
Wightman functions is assumed at least for massive 
theories: 


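$$\lim_{t \to \infty} W_{n+m}(f \otimes h_{ta}) = W_n(f)\, W_m(h) \qquad \forall\, n, m \in \mathbb{N}_0,\ f \in \mathcal{S}^{\otimes n},\ h \in \mathcal{S}^{\otimes m} \qquad [3]$$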


for spacelike $a \in \mathbb{R}^d$. It fails to hold in certain 
physical contexts where multiple vacua (also called 
$\theta$-vacua) accompanied by massless Goldstone 
bosons occur due to spontaneous symmetry 
breaking. 

In the original Wightman axioms, there are 
essentially two nonlinear axioms: positivity and 
clustering. Here nonlinear means that checking that 
condition involves more than one VEV with a given 
number of field operators. The cluster condition can 
be linearized by an operation on the Wightman 
functions called “truncation.” The equations 


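$$W_n(f_1 \otimes \cdots \otimes f_n) = \sum_{I \in P^{(n)}} \ \prod_{\{j_1, \ldots, j_l\} \in I,\ j_1 < \cdots < j_l} W_l^T(f_{j_1} \otimes \cdots \otimes f_{j_l}) \qquad [4]$$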


recursively define the truncated Wightman functions 
$W_n^T$ for $n \in \mathbb{N}$. Here $P^{(n)}$ stands for the set of all 
partitions of $\{1, \ldots, n\}$ into disjoint, nonempty sets. 
Unfortunately, the positivity condition (at least 


when combined with nontrivial scattering) becomes 
highly nonlinear for truncated Wightman functions. 
This can be seen as one explanation why it is so 
difficult to find nontrivial (i.e., corresponding to 
nontrivial interactions) solutions to the Wightman 
axioms. 
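
The recursive character of truncation is easiest to see in a commutative toy setting, where the "Wightman functions" are the moments of a single random variable and the truncated functions are its cumulants. The sketch below implements the partition sum in that simplified setting; the commutative simplification and all names are assumptions for illustration (for genuine field operators the ordering inside each block matters).

```python
from math import prod

def set_partitions(items):
    """Yield all partitions of the list `items` into disjoint, nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into a block of its own ...
        yield [[first]] + partition
        # ... or add it to one of the existing blocks.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]

def truncate(moments):
    """Given moments[n-1] = W_n for n = 1..N, return the truncated values W_n^T."""
    truncated = []
    for n in range(1, len(moments) + 1):
        correction = 0
        for partition in set_partitions(list(range(n))):
            if len(partition) > 1:   # every block is smaller than n, so already known
                correction += prod(truncated[len(block) - 1] for block in partition)
        truncated.append(moments[n - 1] - correction)
    return truncated

gaussian = truncate([0, 1, 0, 3])     # moments of a standard Gaussian variable
poisson = truncate([2, 6, 22, 94])    # moments of a Poisson variable with mean 2
```

In this toy setting the Gaussian truncated values vanish beyond second order, whereas the Poisson ones do not.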

But it turns out that, in contrast to positivity, the 
HSSC is essentially linear for truncated Wightman 
functions. 


Theorem 1 If there exists a Schwartz norm $\|\cdot\|$ on 
$\mathcal{S}$ such that $W_n^T$ is continuous with respect to $\|\cdot\|^{\otimes n}$ 
for $n \in \mathbb{N}$, then the associated sequence of Wightman 
functions $\{W_n\}$ fulfills the HSSC [2]. 


Note that $\|\cdot\|^{\otimes n}$ is well defined as $\mathcal{S}$ is a nuclear 
space. This theorem makes it much easier to 
construct IMQFTs. In particular, all known solu- 
tions of the linear program for truncated 
Wightman functions lead to an abundance of 
mathematical solutions to the axioms of IMQFT, 
as long as the singularities of truncated Wightman 
functions in position and energy-momentum space 
do not become increasingly stronger with growing n. 
For example, the perturbative solutions to Wight- 
man functions of Ostendorf and Steinmann provide 
solutions when the perturbation series is truncated at 
a given order. 


Relativistic Fields from Euclidean 
Stochastic Equations 


In the classical work on constructive quantum field 
theory, relativistic fields in spacetime dimensions 
d=2 and 3 have been constructed by analytic 
continuation from Euclidean random fields. This, in 
particular, has led to firm connections between 
quantum field theory and equilibrium statistical 
mechanics. Let us discuss one specific class of 
solutions of the axioms of IMQFT for arbitrary d 
which also stem from random fields related to an 
ensemble of statistical mechanics of classical, con- 
tinuous particles. Mathematically, this is connected 
with using random fields with Poisson distribution. 
As in constructive QFT, the moments, also called 
Schwinger functions, of the random field can be 
analytically continued from Euclidean imaginary 
time to relativistic real time. That this is possible 
results from an explicit calculation. Axiomatic results 
cannot be used, as they depend on positivity or 
reflection positivity in the Euclidean spacetime, 
respectively. 

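By definition, a mixing Euclidean covariant 
random field $\varphi$ is an almost surely linear mapping 
from $\mathcal{S}_{\mathbb{R}} = \mathcal{S}(\mathbb{R}^d, \mathbb{R}^N)$ to the space of real-valued 
measurable functions (random variables) on some 
probability space that fulfills the following 
properties: 


1. Temperedness: $f_n \to f$ in $\mathcal{S}_{\mathbb{R}}$ $\Rightarrow$ $\varphi(f_n) \to \varphi(f)$ in law. 

2. Covariance: $\varphi(f) = \varphi(f_g)$ in law $\forall f \in \mathcal{S}_{\mathbb{R}}$, $g = \{\Lambda, a\}$, 
$\Lambda \in SO(d)$, $a \in \mathbb{R}^d$, $f_g(x) = \tau(\Lambda) f(\Lambda^{-1}(x - a))$ for 
some continuous representation $\tau : SO(d) \to GL(N)$. 

3. Mixing: $\lim_{t \to \infty} E[A B_{ta}] = E[A]\,E[B]$ for all 
square-integrable random variables $A = A(\varphi)$, 
$B = B(\varphi)$, and $B_{ta} = B(\varphi_{ta})$, $\varphi_{ta}(f) = \varphi(f_{ta})$ $\forall f \in \mathcal{S}_{\mathbb{R}}$, 
$a \in \mathbb{R}^d \setminus \{0\}$. 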


The mixing condition in the Euclidean spacetime 
plays the same role as the cluster property in the 
generalized Wightman axioms. 

In particular, we consider random fields $\varphi$ 
obtained as solutions of the SPDE $D\varphi = \eta$. In this 
equation, $\eta$ is a noise field, that is, $\eta$ is $\tau$-covariant 
for some representation $\tau$ of SO(d), $\eta(f)$ has infinitely 
divisible probability law and $\eta(f)$, $\eta(h)$ are independent 
$\forall f, h \in \mathcal{S}_{\mathbb{R}}$ with $\operatorname{supp} f \cap \operatorname{supp} h = \emptyset$. D is a 
$\tau$-covariant (i.e., $\tau(\Lambda) D \tau(\Lambda)^{-1} = D$ $\forall \Lambda \in SO(d)$) 
partial differential operator with constant coefficients 
(also pseudodifferential operators D could be 
considered). From the classification of infinitely 
divisible probability laws, it is known that $\eta$ 
essentially consists of Gaussian white noise and 
Poisson fields and derivatives thereof. Such a Gauss-Poisson 
noise field by the Bochner-Minlos theorem 
is characterized by its Fourier transform. Direct 
relations with QFT arise if one chooses 


Ble] = exp} [wor 3p(-A)f dx 
f E SR [5] 


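where $\psi : \mathbb{R}^N \to \mathbb{C}$ is a Lévy function, 


$$\psi(t) = i a \cdot t - \frac{t \cdot \sigma t}{2} + z \int_{\mathbb{R}^N \setminus \{0\}} \left( e^{i t \cdot s} - 1 \right) dr(s), \qquad t \in \mathbb{R}^N \qquad [6]$$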





Here the centered dot represents a $\tau$-invariant scalar 
product on $\mathbb{R}^N$, $\sigma$ a positive-semidefinite $\tau$-invariant 
$N \times N$ matrix, $z \ge 0$ a real number and r is a 
$\tau$-invariant probability measure on $\mathbb{R}^N \setminus \{0\}$ with all 
moments. Further, A g= (Pt) /Ot.Ots)|, — o> 
and $p : [0, \infty) \to [0, \infty)$ is a polynomial depending 
on D. If $\hat{D}^{-1}$, the Fourier-transformed inverse of D, 
exists, it can be represented by 


$$\hat{D}^{-1}(k) = \frac{Q_E(k)}{\prod_{l=1}^{P} (|k|^2 + m_l^2)^{\nu_l}} \qquad [7]$$


Here $Q_E(k)$ is a complex $N \times N$ matrix with 
polynomial entries being $\tau$-covariant, 
$\tau(\Lambda)\, Q_E(\Lambda^{-1}k)\, \tau(\Lambda)^{-1} = Q_E(k)$ $\forall \Lambda \in SO(d)$, $k \in \mathbb{R}^d$. $\nu_l \in \mathbb{N}$ 
and $m_l \in \mathbb{C} \setminus (-\infty, 0)$ are parameters with the 
interpretation of the mass spectrum $(m_1, \ldots, m_P)$ 
and $(\nu_1, \ldots, \nu_P)$ the dipole degrees of the related 
masses. We restrict ourselves to the case of positive 
mass spectrum where $m_l > 0$, and in this case 


p(t) = p(t, D) = —= p~z > t>0 [8] 


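One can show that $\varphi$ obtained as the unique 
solution of the SPDE $D\varphi = \eta$ is a Euclidean covariant, 
mixing random field. The Schwinger functions 
(moments) of $\varphi$ are given by 


$$S_n(f_1 \otimes \cdots \otimes f_n) = E[\varphi(f_1) \cdots \varphi(f_n)], \qquad f_1, \ldots, f_n \in \mathcal{S}_{\mathbb{R}} \qquad [9]$$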


Now the Schwinger functions can be calculated 
explicitly. They are determined by the truncated 
Schwinger functions, cf. [4], as follows: for n = 2, 
D ande (x1 x2) 
and for $n \ge 3$ 
t=0 

and the Einstein convention of summation and raising/ 
lowering of indices on $\mathbb{R}^N$ with respect to the invariant 
inner product $\cdot$ is applied. The Schwinger functions 
fulfill the requirements of $\tau$-covariance, symmetry, 
clustering, and Hermiticity from the Osterwalder-Schrader 
axioms of Euclidean QFT. 

While there is no known general reason why a 
relativistic QFT should exist for a given set of 
Schwinger functions, one can take advantage of the 
explicit formulas [10]-[13] in order to calculate the 
analytic continuation from Euclidean to relativistic 
times explicitly. 

It simplifies the considerations to exclude dipole 
fields, that is, one assumes that $\nu_l = 1$ for 
$l = 1, \ldots, P$. In physical terms, the no-dipole condition 
guarantees that the asymptotic fields in Minkowski 
spacetime fulfill the Klein-Gordon equation 
and thus generate particles in the usual sense if 
applied to the vacuum. If this condition is not 
imposed, asymptotic fields might only fulfill a dipole 
equation $(\Box + m^2)^2 \phi^{in/out} = 0$ or a related hyperbolic 
equation of even higher order, and the particle 
states generated by application of such fields to the 
vacuum require a gauge fixing (constraints) in order 
to obtain a physical interpretation. Given the no-dipole 
condition, one obtains by expansion into 
partial fractions 


$$\frac{1}{\prod_{l=1}^{P} (|k|^2 + m_l^2)} = \sum_{l=1}^{P} \frac{b_l}{|k|^2 + m_l^2} \qquad [14]$$

with $b_l \in \mathbb{R}$ uniquely determined and $b_l \neq 0$. 
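
The expansion into partial fractions can be verified numerically. The sketch below assumes pairwise distinct masses and uses the standard residue formula, in which the coefficient belonging to a mass m_l is the product over all other masses m_j of 1/(m_j² − m_l²); the function name and the sample masses are illustrative assumptions.

```python
def partial_fraction_coefficients(masses):
    """Coefficients of 1/prod_l(x + m_l^2) = sum_l b_l/(x + m_l^2), distinct masses assumed."""
    coeffs = []
    for l, m_l in enumerate(masses):
        b = 1.0
        for j, m_j in enumerate(masses):
            if j != l:
                b /= (m_j**2 - m_l**2)   # residue at x = -m_l^2
        coeffs.append(b)
    return coeffs

# Check the identity at one value of x = |k|^2 for three sample masses.
sample_masses = [1.0, 2.0, 3.0]
b = partial_fraction_coefficients(sample_masses)
x = 0.5
lhs = 1.0 / ((x + 1.0) * (x + 4.0) * (x + 9.0))
rhs = sum(b_l / (x + m * m) for b_l, m in zip(b, sample_masses))
```

For more than one mass the coefficients are nonzero but alternate in sign.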
For the truncated Schwinger functions, this implies 
($n \ge 3$) that 


Ma f, I (-A + m7) )~* (x — x;) dx [15] 


At this point, a lengthy calculation yields a representation 
of the functions $\int_{\mathbb{R}^d} \prod_{j=1}^{n} (-\Delta + m_{l_j}^2)^{-1}(x - x_j)\, dx$ 
as the Fourier-Laplace transform of a 
distribution $\hat{W}^T_{n, m_{l_1}, \ldots, m_{l_n}}$ that fulfills the spectral 
condition. This is equivalent to the statement that 
the analytic continuation of such functions to 
relativistic times yields $W^T_{n, m_{l_1}, \ldots, m_{l_n}}$, where the latter 
distribution is the inverse Fourier transform of 
$\hat{W}^T_{n, m_{l_1}, \ldots, m_{l_n}}$. This distribution up to a constant that 
can be integrated into $Q_E$ is given by 


[16] 


Here $\delta_m^{\pm}(k) = \theta(\pm k^0)\, \delta(k^2 - m^2)$, where $\theta$ is the 
Heaviside step function and $k^2 = (k^0)^2 - |\mathbf{k}|^2$. On the 
other hand, the partial differential operator $Q_E$ can 
be analytically continued in momentum space: 


OE Rien 
= Or (Gki; ki), ee 
$k_1, \ldots, k_n \in \mathbb{R}^d$. With the definition 
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2i 
lI by, W nam, T Mi, (Ri, oer kn) [19] 
the analytic continuation of Schwinger functions can 
be summarized as follows: 


Theorem 2 The truncated Schwinger functions 
$S_n^T$ have a Fourier-Laplace representation with $\hat{W}_n^T$ 
defined in eqns [18] and [19]. Equivalently, $S_n^T$ is the 
analytic continuation of $W_n^T$ from purely real 
relativistic time to purely imaginary Euclidean 
time. The truncated Wightman functions $W_n^T$ fulfill 
the requirements of temperedness, relativistic covariance 
with respect to the representation of the 
orthochronous, proper Lorentz group $\tilde{\tau} : L_+^{\uparrow}(d) \to \mathrm{Gl}(L)$, 
locality, spectral property, and cluster property. 
Here $\tilde{\tau}$ is obtained by analytic continuation of $\tau$ 
to a representation of the proper complex Lorentz 
group over $\mathbb{C}^d$ (which contains SO(d) as a real 
submanifold) and restriction of this representation 
to the real orthochronous proper Lorentz group. 


Again making use of the explicit formula in 
Theorem 2, the condition of Theorem 1 can be verified. 
This proves the existence of IMQFT models associated 
with the class of random fields under discussion. 


Theorem 3 The Wightman functions defined in 
Theorem 2 fulfill the HSSC [2]. In particular, there 
exists a QFT with indefinite metric such that the 
Wightman functions are given as the VEVs of that 
IMQFT. 


Nontrivial Scattering 


Theories as described in Theorem 2 obviously have 
trivial scattering behavior if the noise field $\eta$ is 
Gaussian, that is, if, in [6], $z = 0$. In the case where 
there is also a Poisson component in $\eta$, that is, $z > 0$, 
higher-order truncated Wightman functions do not 
vanish and such relativistic theories have nontrivial 
scattering. 

Before the scattering of the models can be 
discussed, some comments about scattering in 
IMQFT in general are in order. The scattering 
theory in axiomatic QFT, Haag-Ruelle theory, relies 
on positivity. In fact, one can show that in the class 
of models under discussion, the LSZ asymptotic 
condition is violated if dipole degrees of freedom are 
admitted. In that case more complicated asymptotic 
conditions have to be used. In any case, the Haag- 
Ruelle theory cannot be adapted to IMQFT. 
Nevertheless, asymptotic fields and states can be 
constructed in IMQFT if one imposes a no-dipole 
condition in a mathematically precise way. Then the 
LSZ asymptotic condition leads to the construction of 
mixed VEVs of asymptotic in- and out-fields with local 
fields. The collection of such VEVs is called the form- 
factor functional. After constructing this collection of 
mixed VEVs, one can try to check the HSSC for this 
functional and obtains a Krein space representation for 
the algebra generated by in-, local, and out-fields. 
Following this line, asymptotic in- and out-particle 
states can be constructed for the given mass spectrum 
$(m_1, \ldots, m_P)$. If $a_{\alpha, l}^{in/out\,\dagger}(k)$, $l = 1, \ldots, P$, denotes the 
creation operator for an incoming/outgoing particle 
with mass $m_l$, spin component $\alpha$, and energy-momentum 
$k$, the following scattering amplitude can be derived 
for $r$ incoming particles with masses $m_{l_1}, \ldots, m_{l_r}$ and 
$n - r$ outgoing particles with masses $m_{l_{r+1}}, \ldots, m_{l_n}$: 


n 


. l T 
(ant (ka) + (k aT, (kr) o aik) 


=O ega Rire Re ketika) 
x [I En, (k) 5(K™ — K™) [20] 


= 


$K^{in/out}$ stand for the total energy-momentum of 
in-/out-particles, that is, $K^{in} = \sum_{j=1}^{r} k_j$ and 
$K^{out} = \sum_{j=r+1}^{n} k_j$. 

Two immediate consequences can be drawn from 
[20]. First, choosing a model with nonvanishing 
Poisson part such that C,,3,3, 4 0 and a differential 
operator D containing in its mass spectrum the 
masses $m$ and $\mu$ with $m > 2\mu$, one gets a nonvanishing 
scattering amplitude for the process 


om 21] 


even though in- and out-particle states consist of 
particles with well-defined sharp masses. Thus, for the 
incoming particle, the energy uncertainty, which for a 
particle at rest is proportional to the mass uncertainty, 
vanishes but still the particle undergoes a nontrivial 
decay and must have a finite decay time. This appears 
to be a contradiction to the energy-time uncertainty 
relation, which therefore seems to have an unclear 
status in IMQFT (i.e., in QFT including gauge fields). 
The origin of this inequality, which of course is 


experimentally very well tested, apparently has to be 
located in the constraints, that is, in the procedure of 
implementing a gauge, of the theory and not in the 
unconstrained IMQFT. 

Second, one can replace somewhat artificially the 
polynomials $Q^M$ in [17] by any other symmetric and 
relativistically covariant polynomial. If the sequence of 
the “new” $Q^M$ is of uniformly bounded degree in any 
of the arguments $k_1, \ldots, k_n$, the redefined Wightman 
functions in [17] still fulfill the requirements of 
Theorem 1 and thus define a new relativistic, local 
IMQFT. The scattering amplitudes of such a theory 
are again well defined and given by [20]. For example, 
in the case of only one scalar particle with mass m, one 
can show that arbitrary Lorentz-invariant scattering 
behavior of bosonic particles can be reproduced by 
such theories for energies below an arbitrary maximal 
energy up to arbitrary precision. This kind of 
interpolation theorem shows that the outcome of an 
arbitrary scattering experiment can be reproduced 
within the formalism of (unconstrained) IMQFT as 
long as it is in agreement with the general requirements 
of Poincaré invariance and statistics. 


Introduction 


The ordered patterns we observe in condensed 
matter and in high-energy physics are created by 
the quantum dynamics. Macroscopic systems exhi- 
biting some kind of ordering, such as superconduc- 
tors, ferromagnets, and crystals, are described by the 
underlying quantum dynamics. Even the large-scale 
structures in the universe, as well as the ordering in 
the biological systems appear to be the manifesta- 
tion of the microscopic dynamics governing the 
elementary components of these systems. Thus, we 
talk of macroscopic quantum systems: these are 
quantum systems in the sense that, although they 
behave classically, some of their macroscopic fea- 
tures nevertheless cannot be understood without 
recourse to quantum theory. 

The question then arises how the quantum 
dynamics generates the observed macroscopic prop- 
erties. In other words, how it happens that the 
macroscopic scale characterizing those systems is 
dynamically generated out of the microscopic scale 
of the quantum elementary components (Umezawa 
1993, Umezawa et al. 1982). 

Moreover, we also observe a variety of phenom- 
ena where quantum particles coexist and interact 
with extended macroscopic objects which show a 
classical behavior, for example, vortices in super- 
conductors and superfluids, magnetic domains in 
ferromagnets, dislocations and other topological 
defects (grain boundaries, point defects, etc.) in 
crystals, and so on. 

We are thus also faced with the question of the 
quantum origin of topological defects and their 
interaction with quanta (Umezawa 1993, Umezawa 
et al. 1982): this is a crucial issue for the under- 
standing of symmetry-breaking phase transitions 
and structure formation in a wide range of systems 
from condensed matter to cosmology (Kibble 1976, 
Zurek 1997, Volovik 2003). 

Here, we will review how the generation of 
ordered structures and extended objects is explained 
in quantum field theory (QFT). We follow Umezawa 
(1993) and Umezawa et al. (1982) in our presenta- 
tion. We will consider systems in which spontaneous 
symmetry breaking (SSB) occurs and show that 
topological defects originate by inhomogeneous 
(localized) condensation of quanta. The approach 
followed here is alternative to the usual one 
(Rajaraman 1982), in which one starts from the 
classical soliton solutions and then “quantizes” 
them, as well as to the QFT method based on dual 
(disorder) fields (Kleinert 1989). 

In the next section we introduce some general 
features of QFT useful for our discussion and treat 
some aspects of SSB and the rearrangement of 
symmetry. Next we discuss the boson transforma- 
tion theorem and the topological singularities of the 
boson condensate. We then present, as an example, 
a model with U(1) gauge invariance in which SSB, 
rearrangement of symmetry, and topological defects 
are present (Matsumoto et al. 1975a, b). There we 
show how macroscopic fields and currents are 
obtained from the microscopic quantum dynamics. 
The Nielsen—Olesen vortex solution is explicitly 
obtained as an example. The final section is devoted 
to conclusions. 


Symmetry and Order in QFT: 
A Dynamical Problem 


QFT deals with systems with infinitely many degrees 
of freedom. The fields used for their description are 
operator fields whose mathematical significance is 
fully specified only when the state space where they 
operate is also assigned. This is the space of the 
states, or physical phase, of the system under given 
boundary conditions. A change in the boundary 
conditions may result in the transition of the system 
from one phase to another. For example, a change 
of temperature from above to below the critical 
temperature may induce the transition from the 
normal to the superconducting phase in a metal. The 
identification of the state space where the field 
operators have to be realized is thus a physically 
nontrivial problem in QFT. In this respect, the QFT 
structure is drastically different from the one of 
quantum mechanics (QM). The reason is the 
following. 

The von Neumann theorem (1955) in QM states 
that for systems with a finite number of degrees of 
freedom all the irreducible representations of the 
canonical commutation relations are unitarily 
equivalent. Therefore, in QM the physical system 
can only live in one single physical phase: unitary 
equivalence means indeed physical equivalence and 
thus there is no room (no representations) for 
physically different phases. Such a situation drasti- 
cally changes in QFT where systems with infinitely 
many degrees of freedom are treated. In such a case, 
the von Neumann theorem does not hold and 
infinitely many unitarily inequivalent representa- 
tions of the canonical commutation relations do in 
fact exist (Umezawa 1993, Umezawa et al. 1982). It 
is such richness of QFT that allows the description 
of different physical phases. 


QFT as a Two-Level Theory 


In the perturbative approach, any quantum experi- 
ment or observation can be schematized as a 
scattering process where one prepares a set of free 
(noninteracting) particles (incoming particles or in- 
fields) which are then made to collide at some later 
time in some region of space (spacetime region of 
interaction). The products of the collision are 
expected to emerge out of the interaction region as 
free particles (outgoing particles or out-fields). 
Correspondingly, one has the in-field and the out- 
field state space. The interaction region is where the 
dynamics operates: given the in-fields and the in- 
states, the dynamics determines the out-fields and 
the out-states. 

The incoming particles and the outgoing ones 
(also called quasiparticles in solid state physics) are 
well distinguishable and localizable particles only far 
away from the interaction region, at a time much 
before ($t = -\infty$) and much after ($t = +\infty$) the 
interaction time: in- and out-fields are thus said to 
be asymptotic fields, and for them the interaction 
forces are assumed not to operate (switched off). 

The only regions accessible to observations are 
those far away (in space and in time) from the 
interaction region, that is, the asymptotic regions 
(the in- and out-regions). It is so since, at the 
quantum level, observations performed in the inter- 
action region or vacuum fluctuations occurring there 


may drastically interfere with the interacting objects, 
thus changing their nature. Besides the asymptotic 
fields, one then also introduces dynamical or 
Heisenberg fields, that is, the fields in terms of 
which the dynamics is given. Since the interaction 
region is precluded from observation, we do not 
observe Heisenberg fields. Observables are thus 
solely described in terms of asymptotic fields. 

Summing up, QFT is a “two-level” theory: one level 
is the interaction level where the dynamics is specified 
by assigning the equations for the Heisenberg fields. 
The other level is the physical level, the one of the 
asymptotic fields and of the physical state space 
directly accessible to observations. The equations for 
the physical fields are equations for free fields, 
describing the observed incoming/outgoing particles. 

To be specific, let the Heisenberg operator fields
be generically denoted by \psi_H(x) and the physical
operator fields by \psi_{in}(x). For definiteness, we choose
to work with the in-fields, although the set of out- 
fields would work equally well. They are both 
assumed to satisfy equal-time canonical (anti)- 
commutation relations. 

For brevity, we omit considerations on the renor- 
malization procedure, which are not essential for the 
conclusions we will reach. The Heisenberg field 
equations and the free-field equations are written as 


\Lambda(\partial)\,\psi_H(x) = J[\psi_H](x)   [1]
\Lambda(\partial)\,\psi_{in}(x) = 0   [2]


where \Lambda(\partial) is a differential operator, x \equiv (t, \mathbf{x}) and
J is some functional of the \psi_H fields, describing the
interaction. 

Equation [1] can be formally recast in the 
following integral form (Yang—Feldman equation): 


\psi_H(x) = \psi_{in}(x) + \Lambda^{-1}(\partial) * J[\psi_H](x)   [3]


where * denotes convolution. The symbol \Lambda^{-1}(\partial)
denotes formally the Green function for \psi_{in}(x). The
precise form of Green’s function is specified by the
boundary conditions. Equation [3] can be solved by
iteration, thus giving an expression for the Heisenberg
fields \psi_H(x) in terms of powers of the \psi_{in}(x)
fields; this is the Haag expansion in the LSZ 
formalism (or “dynamical map” in the language of 
Umezawa 1993 and Umezawa et al. 1982), which 
might be formally written as 


\psi_H(x) = F[x; \psi_{in}]   [4]


(A (formal) closed form for the dynamical map is 
obtained in the closed time path (CTP) formalism 
(Blasone and Jizba 2002). Then the Haag expansion 
[4] is directly applicable to both equilibrium and 
nonequilibrium situations.) 
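
To make the iteration concrete, the first steps of solving eqn [3] can be sketched as follows. The superscript (n) labelling the iteration order is notation introduced here for illustration only; the expansion is formal, and renormalization is ignored as in the text.

```latex
\begin{align}
\psi_H^{(0)}(x)   &= \psi_{in}(x), \\
\psi_H^{(1)}(x)   &= \psi_{in}(x) + \Lambda^{-1}(\partial) * J[\psi_{in}](x), \\
\psi_H^{(n+1)}(x) &= \psi_{in}(x) + \Lambda^{-1}(\partial) * J[\psi_H^{(n)}](x).
\end{align}
```

Each step inserts the previous approximation into the source term, so every \psi_H^{(n)} is a functional of \psi_{in} alone; the formal limit of this sequence is the dynamical map [4].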


We stress that the equality in the dynamical map 
[4] is a “weak” equality, which means that it must 
be understood as an equality among matrix elements 
computed in the Hilbert space of the physical 
particles. 

We observe that mathematical consistency in the 
above procedure requires that the set of \psi_{in} fields
must be an irreducible set; however, it may happen
that not all the elements of the set are known from
the beginning. For example, there might be composite
(bound-state) fields or even elementary quanta
whose existence is overlooked at first.
Then the computation of the matrix elements in 
physical states will lead to the detection of unex- 
pected poles in the Green’s functions, which signal 
the existence of the ignored quanta. One thus 
introduces the fields corresponding to these quanta 
and repeats the computation. This way of proceeding
is called the self-consistent method (Umezawa
1993, Umezawa et al. 1982). Thus it is not necessary
to have a one-to-one correspondence between the
sets {\psi_H^j} and {\psi_{in}^i}, as happens whenever the set
{\psi_{in}^i} includes composite particles.


The Dynamical Rearrangement of Symmetry 


As already mentioned, in QFT the Fock space for 
the physical states is not unique since one may have 
several physical phases, for example, for a metal the 
normal phase and the superconducting phase, and so 
on. Fock spaces describing different phases are 
unitarily inequivalent spaces and correspondingly 
we have different expectation values for certain 
observables and even different irreducible sets of 
physical quanta. Thus, finding the dynamical map 
involves singling out the Fock space where the 
dynamics has to be realized. 

Let us now suppose that the Heisenberg field 
equations are invariant under some group G of 
transformations of \psi_H:


\psi_H(x) \to \psi'_H(x) = g[\psi_H(x)]   [5]


with g \in G. The symmetry is spontaneously broken
when the vacuum state in the Fock space H is not 
invariant under the group G but only under one of 
its subgroups (Umezawa 1993, Umezawa et al. 
1982). 

On the other hand, eqn [4] implies that when \psi_H
is transformed as in [5], then


\psi_{in}(x) \to \psi'_{in}(x) = g'[\psi_{in}(x)]   [6]


with g' belonging to some group of transformations
G' and such that


g[\psi_H(x)] = F[g'[\psi_{in}(x)]]   [7]


When symmetry is spontaneously broken it is
G' \neq G, with G' the group contraction of G; when
symmetry is not broken then G' = G.

Since G is the invariance group of the dynamics, 
eqn [4] requires that G' is the group under which
the free-field equations are invariant, that is, \psi'_{in}
is also a solution of [2]. Since eqn [4] is a weak equality,
G’ depends on the choice of the Fock space H 
among the physically realizable unitarily inequiva- 
lent state spaces. Thus, we see that the (same) 
original invariance of the dynamics may manifest 
itself in different symmetry groups for the \psi_{in} fields
according to different choices of the physical state 
space. Since this process is constrained by the 
dynamical equations [1], it is called the dynamical 
rearrangement of symmetry (Umezawa 1993, 
Umezawa et al. 1982). 

In conclusion, different ordering patterns appear 
to be different manifestations of the same basic 
dynamical invariance. The discovery of the process 
of the dynamical rearrangement of symmetry leads 
to a unified understanding of the dynamical genera- 
tion of many observable ordered patterns. This is the 
phenomenon of the dynamical generation of order. 
The contraction of the symmetry group is the 
mathematical structure controlling the dynamical 
rearrangement of the symmetry. For a qualitative 
presentation see Vitiello (2001). 

One can now ask which ones are the carriers of 
the ordering information among the system elemen- 
tary constituents and how the long-range correla- 
tions and the coherence observed in ordered patterns 
are generated and sustained. The answer is in 
the fact that SSB implies the appearance of bosons 
(Goldstone 1961, Goldstone et al. 1962, Nambu 
and Jona-Lasinio 1961), the so-called Nambu- 
Goldstone (NG) modes or quanta. They manifest 
as long-range correlations and are thus responsible
for the above-mentioned change of scale, from
microscopic to macroscopic. The coherent boson 
condensation of NG modes turns out to be the 
mechanism by which order is generated, as we will 
see in an explicit example in a later section. 


The “Boson Transformation” Method 


We now discuss the quantum origin of extended 
objects (defects) and show how they naturally 
emerge as macroscopic objects (inhomogeneous 
condensates) from the quantum dynamics. At zero 
temperature, the classical soliton solutions are then 
recovered in the Born approximation. This approach 
is known as the “boson transformation” method 
(Umezawa 1993, Umezawa et al. 1982). 


The Boson Transformation Theorem


Let us consider, for simplicity, the case of a 
dynamical model involving one scalar field \psi_H and
one asymptotic field \psi_{in} satisfying eqns [1] and [2],
respectively. 

As already remarked, the dynamical map is valid 
only in a weak sense, that is, as a relation among matrix 
elements. This implies that eqn [4] is not unique, since 
different sets of asymptotic fields and the correspond- 
ing Hilbert spaces can be used in its construction. Let us 
indeed consider a c-number function f(x), satisfying 
the \psi_{in} equations of motion [2]:


\Lambda(\partial)\,f(x) = 0   [8]


The boson transformation theorem (Umezawa 1993, 
Umezawa et al. 1982) states that the field 


\psi_H^f(x) = F[x; \psi_{in} + f]   [9]


is also a solution of the Heisenberg equation [1].
The corresponding Yang-Feldman equation takes
the form


\psi_H^f(x) = \psi_{in}(x) + f(x) + \Lambda^{-1}(\partial) * J[\psi_H^f](x)   [10]


The difference between the two solutions \psi_H and
\psi_H^f is only in the boundary conditions. An important
point is that the expansion in [9] is obtained
from that in [4] by the spacetime-dependent
translation


\psi_{in}(x) \to \psi_{in}(x) + f(x)   [11]


The essence of the boson transformation theorem is 
that the dynamics embodied in eqn [1] contains an 
internal freedom, represented by the possible 
choices of the function f(x), satisfying the free- 
field equation [8]. 

We also observe that the transformation [11] is a 
canonical transformation since it leaves invariant the 
canonical form of commutation relations. 

Let |0\rangle denote the vacuum for the free field \psi_{in}.
The vacuum expectation value of eqn [10] gives


\phi^f(x) \equiv \langle 0|\psi_H^f(x)|0\rangle
    = f(x) + \langle 0|[\Lambda^{-1}(\partial) * J[\psi_H^f](x)]|0\rangle   [12]


The c-number field \phi^f(x) is the order parameter. We
remark that it is fully determined by the quantum
dynamics. In the classical or Born approximation,
which consists in taking \langle 0|J[\psi_H^f]|0\rangle = J[\phi^f], that
is, neglecting all the contractions of the physical
fields, we define \phi_{cl}^f(x) \equiv \lim_{\hbar \to 0} \phi^f(x). In this limit,
we have


\Lambda(\partial)\,\phi_{cl}^f(x) = J[\phi_{cl}^f](x)   [13]


that is, \phi_{cl}^f(x) provides the solution of the classical
Euler-Lagrange equation.

Beyond the classical level, in general, the form of 
this equation changes. The Yang—Feldman equation 
[10] gives not only the equation for the order 
parameter, eqn [13], but also, at higher orders in
\hbar, the dynamics of the physical quanta in the
potential generated by the “macroscopic object”
\phi^f(x) (Umezawa 1993, Umezawa et al. 1982).

One can show (Umezawa 1993, Umezawa et al. 
1982) that the class of solutions of eqn [8] which 
lead to topologically nontrivial (i.e., carrying a 
nonzero topological charge) solutions of eqn [13], 
are those which have some sort of singularity with 
respect to Fourier transform. These can be either 
divergent singularities or topological singularities. 
The first are associated with a divergence of f(x) for
|\mathbf{x}| = \infty, at least in some direction. Topological
singularities are instead present when f(x) is not 
single-valued, that is, it is path dependent. In both 
cases, the macroscopic object described by the 
order parameter, carries a nonzero topological 
charge. 


Topological Singularities and Massless Bosons 


An important result is that the boson transformation 
functions carrying topological singularities are only 
allowed for massless bosons (Umezawa 1993, 
Umezawa et al. 1982). 

Consider a generic boson field \chi_{in} satisfying the
equation


(\partial^2 + m^2)\,\chi_{in}(x) = 0   [14]


and suppose that the function f(x) for the boson
transformation \chi_{in}(x) \to \chi_{in}(x) + f(x) carries a topological
singularity. It is then not single-valued and
thus path dependent:


G^\dagger_{\mu\nu}(x) \equiv [\partial_\mu, \partial_\nu]\,f(x) \neq 0,   for certain \mu, \nu, x   [15]


On the other hand, \partial_\mu f(x), which is related to
observables, is single-valued, that is, [\partial_\rho, \partial_\nu]\,\partial_\mu f(x) = 0.
Recall that f(x) is a solution of the \chi_{in}
equation:


(\partial^2 + m^2)\,f(x) = 0   [16]


From the definition of G^\dagger_{\mu\nu}(x) and the regularity of
\partial_\mu f(x), it follows, by computing \partial^\mu G^\dagger_{\mu\nu}(x), that


\partial_\mu f(x) = \frac{1}{\partial^2 + m^2}\,\partial^\lambda G^\dagger_{\lambda\mu}(x)   [17]


This equation and the antisymmetric nature of
G^\dagger_{\mu\nu}(x) then lead to \partial^2 f(x) = 0, which in turn implies
m = 0. Thus, we conclude that [15] is only compatible
with a massless equation for \chi_{in}.
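
The intermediate steps behind eqn [17] and the conclusion m = 0 can be spelled out as follows. This is a sketch that uses only eqns [15] and [16] and the single-valuedness of \partial_\mu f; the inverse operator is understood formally, as in eqn [17], and derivatives are assumed to commute when acting on G^\dagger_{\lambda\mu} itself.

```latex
\begin{align}
\partial^\lambda G^\dagger_{\lambda\mu}
  &= \partial^\lambda \partial_\lambda \partial_\mu f
     - \partial^\lambda \partial_\mu \partial_\lambda f \\
  &= \partial^2 \partial_\mu f - \partial_\mu \partial^2 f
     && \text{since } [\partial^\lambda, \partial_\mu]\,\partial_\lambda f = 0 \\
  &= (\partial^2 + m^2)\,\partial_\mu f
     && \text{since } \partial^2 f = -m^2 f \text{ by [16]}, \\
\partial^2 f = \partial^\mu \partial_\mu f
  &= \frac{1}{\partial^2 + m^2}\,\partial^\mu \partial^\lambda G^\dagger_{\lambda\mu} = 0
     && \text{by antisymmetry of } G^\dagger_{\lambda\mu}, \\
m^2 f &= -\partial^2 f = 0
     && \Rightarrow\; m = 0 \text{ for nontrivial } f .
\end{align}
```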

The topological charge is defined as


N_T = \int_C dl^\mu\,\partial_\mu f = \int_S dS_\mu\,\epsilon^{\mu\nu\sigma}\,\partial_\nu \partial_\sigma f
    = \frac{1}{2} \int_S dS^{\mu\nu}\,G^\dagger_{\mu\nu}   [18]


Here C is a contour enclosing the singularity and S a
surface with C as boundary. N_T does not depend on
the path C provided this does not cross the
singularity. The dual tensor G^{\mu\nu}(x) is


G^{\mu\nu}(x) \equiv -\frac{1}{2}\,\epsilon^{\mu\nu\lambda\rho}\,G^\dagger_{\lambda\rho}(x)   [19]


and satisfies the continuity equation


\partial_\mu G^{\mu\nu}(x) = 0
    \Leftrightarrow \partial_\mu G^\dagger_{\lambda\rho}(x) + \partial_\rho G^\dagger_{\mu\lambda}(x) + \partial_\lambda G^\dagger_{\rho\mu}(x) = 0   [20]


Equation [20] completely characterizes the topological
singularity (Umezawa 1993, Umezawa et al.
1982).


An Example: The Anderson-Higgs-Kibble 
Mechanism and the Vortex Solution 


We consider a model of a complex scalar field \phi(x)
interacting with a gauge field A_\mu(x) (Anderson 1958,
Higgs 1960, Kibble 1967). The Lagrangian density
L[\phi(x), \phi^*(x), A_\mu(x)] is invariant under the global
and the local U(1) gauge transformations (we do not
assume a particular form for the Lagrangian density,
so the following results are quite general):


\phi(x) \to e^{i\theta}\,\phi(x),   A_\mu(x) \to A_\mu(x)   [21]
\phi(x) \to e^{i e_0 \lambda(x)}\,\phi(x),   A_\mu(x) \to A_\mu(x) + \partial_\mu \lambda(x)   [22]


respectively, where \lambda(x) \to 0 for |x_0| \to \infty and/or
|\mathbf{x}| \to \infty and e_0 is the coupling constant. We work
in the Lorentz gauge \partial_\mu A^\mu(x) = 0. The generating
functional, including the gauge constraint, is
(Matsumoto et al. 1975a, b)


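Z[J, K] = \frac{1}{N} \int [dA_\mu][d\phi][d\phi^*][dB]\,\exp( i S[A_\mu, B, \phi] )   [23]


S = \int d^4x\,[ L(x) + B(x)\,\partial^\mu A_\mu(x)
      + K^*(x)\,\phi(x) + K(x)\,\phi^*(x)
      + J^\mu(x)\,A_\mu(x) + i\epsilon\,|\phi(x) - v|^2 ]

N = \int [dA_\mu][d\phi][d\phi^*][dB]\,\exp( i \int d^4x\,( L(x) + i\epsilon\,|\phi(x) - v|^2 ) )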


B(x) is an auxiliary field which implements the 
gauge-fixing condition (Matsumoto et al. 1975a, b). 
Notice the \epsilon-term, where v is a complex number; its
role is to specify the condition of symmetry breaking
under which we want to compute the functional
integral and it may be given the physical meaning of
a small external field triggering the symmetry
breaking (Matsumoto et al. 1975a, b). The limit
\epsilon \to 0 must be taken at the end of the computations.
We will use the notation


\langle F[\phi] \rangle_{\epsilon,J,K} \equiv \frac{1}{N} \int [dA_\mu][d\phi][d\phi^*][dB]\,F[\phi]\,\exp( i S[A_\mu, B, \phi] )   [24]


with \langle F[\phi] \rangle_\epsilon \equiv \langle F[\phi] \rangle_{\epsilon,J=K=0} and \langle F[\phi] \rangle \equiv \lim_{\epsilon \to 0} \langle F[\phi] \rangle_\epsilon.


The fields \phi, A_\mu, and B appearing in the generating
functional are c-number fields. In the following, the
Heisenberg operator fields corresponding to them
will be denoted by \phi_H, A_{\mu H}, and B_H, respectively.
Thus, the spontaneous symmetry breaking condition
is expressed by \langle 0|\phi_H(x)|0\rangle \equiv \tilde{v} \neq 0, with \tilde{v} constant.

Since in the functional integral formalism the
functional average of a given c-number field gives
the vacuum expectation value of the corresponding
operator field, for example, \langle F[\phi] \rangle \equiv \langle 0|F[\phi_H]|0\rangle, we
have \lim_{\epsilon \to 0} \langle \phi(x) \rangle_\epsilon \equiv \langle 0|\phi_H(x)|0\rangle = \tilde{v}.


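Let us introduce the following decompositions:


\phi(x) = \frac{1}{\sqrt{2}}\,[\psi(x) + i\chi(x)],   K(x) = \frac{1}{\sqrt{2}}\,[K_1(x) + iK_2(x)],
\rho(x) \equiv \psi(x) - \langle \psi(x) \rangle_\epsilon


Note that \langle \chi(x) \rangle_\epsilon = 0 because of the invariance
under \chi \to -\chi.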


The Goldstone Theorem 


Since the functional integral [23] is invariant under 
the global transformation [21], we have that 
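\partial Z[J, K]/\partial\theta = 0 and subsequent derivatives with
respect to K_1 and K_2 lead to


\langle \psi(x) \rangle_\epsilon = \sqrt{2}\,\epsilon v \int d^4y\,\langle \chi(x)\chi(y) \rangle_\epsilon
    = \sqrt{2}\,\epsilon v\,\Delta_\chi(\epsilon, 0)   [25]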


In momentum space the propagator for the field \chi
has the general form


\Delta_\chi(0, p) = \lim_{\epsilon \to 0} [ \frac{Z_\chi}{p^2 - m_\chi^2 + i\epsilon a_\chi} + (continuum contributions) ]   [26]

Here Z_\chi and a_\chi are renormalization constants. The
integration in eqn [25] picks up the pole contribution
at p^2 = 0, and leads to


\tilde{v} = \sqrt{2}\,\frac{Z_\chi}{a_\chi}\,v \Leftrightarrow m_\chi = 0,
\tilde{v} = 0 \Leftrightarrow m_\chi \neq 0   [27]


The Goldstone theorem (Goldstone 1961, Goldstone
et al. 1962) is thus proved: if the symmetry is
spontaneously broken (\tilde{v} \neq 0), a massless mode must
exist, whose field is \chi(x), that is, the NG boson
mode. Since it is massless, it manifests as a long-range
correlation mode. (Notice that in the present
case of a complex scalar field model, the NG mode
is an elementary field. In other models, it may
appear as a bound state, for example, the magnon in
(anti)ferromagnets.) Note that


\frac{\partial}{\partial v}\,\langle \psi(x) \rangle_\epsilon = \sqrt{2}\,\epsilon \int d^4y\,\langle \rho(x)\rho(y) \rangle_\epsilon   [28]


and because m_\rho \neq 0, the right-hand side of this
equation vanishes in the limit \epsilon \to 0; therefore, \tilde{v} is
independent of |v|, although the phase of v
determines the one of \tilde{v} (from eqn [25]): as in
ferromagnets, once an external magnetic field is
switched on, the system is magnetized independently
of the strength of the external field.


The Dynamical Map and the Field Equations 


Observing that the change of variables [21] (and/or 
[22]) does not affect the generating functional, we may 
obtain the Ward-Takahashi identities. Also, using
B(x) \to B(x) + \lambda(x) in [23] gives \langle \partial^\mu A_\mu(x) \rangle_{\epsilon,J,K} = 0.
One then finds the following two-point function pole 
structures (Matsumoto et al. 1975a, b): 


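\langle B(x)\chi(y) \rangle = \lim_{\epsilon \to 0} { \frac{-i}{(2\pi)^4} \int d^4p\,e^{-ip(x-y)}\,\frac{e_0 \tilde{v}}{p^2 + i\epsilon a_\chi} }   [29]

\langle B(x)A^\mu(y) \rangle = \partial_x^\mu\,\frac{i}{(2\pi)^4} \int d^4p\,e^{-ip(x-y)}\,\frac{1}{p^2}   [30]

\langle B(x)B(y) \rangle = \lim_{\epsilon \to 0} { \frac{-i}{(2\pi)^4} \int d^4p\,e^{-ip(x-y)}\,\frac{(e_0 \tilde{v})^2}{Z_\chi}\,[ \frac{1}{p^2 + i\epsilon a_\chi} - \frac{1}{p^2} ] }   [31]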


The absence of branch-cut singularities in propagators
[29]-[31] suggests that B(x) obeys a free-field
equation. In addition, eqn [31] indicates that the
model contains a massless negative-norm state
(ghost) besides the NG massless mode \chi. Moreover,
it can be shown (Matsumoto et al. 1975a, b) that a
massive vector field U^\mu_{in} also exists in the theory.
Note that because of the invariance (\chi, A_\mu, B) \to
(-\chi, -A_\mu, -B), all the other two-point functions
must vanish.

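The dynamical maps expressing the Heisenberg
operator fields in terms of the asymptotic operator
fields are found to be (Matsumoto et al. 1975a, b)


\phi_H(x) = : \exp{ i\,\frac{Z_\chi^{1/2}}{\tilde{v}}\,\chi_{in}(x) } [ \tilde{v} + Z_\rho^{1/2}\,\rho_{in}(x)
      + F[\rho_{in}, U^\mu_{in}, \partial(\chi_{in} - b_{in})] ] :   [32]

A^\mu_H(x) = Z_3^{1/2}\,U^\mu_{in}(x) + \frac{Z_\chi^{1/2}}{e_0 \tilde{v}}\,\partial^\mu b_{in}(x)
      + : F^\mu[\rho_{in}, U^\mu_{in}, \partial(\chi_{in} - b_{in})] :   [33]

B_H(x) = \frac{e_0 \tilde{v}}{Z_\chi^{1/2}}\,[b_{in}(x) - \chi_{in}(x)] + c   [34]
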
where :...: denotes the normal ordering and the
functionals F and F^\mu are to be determined within a
particular model. In eqns [32]-[34], \chi_{in} denotes the
NG mode, b_{in} the ghost mode, U^\mu_{in} the massive
vector field, and \rho_{in} the massive matter field. In eqn
[34] c is a c-number constant, whose value is
irrelevant since only derivatives of B appear in the
field equations (see below). Z_3 represents the wave
function renormalization for U^\mu_{in}. The corresponding
field equations are


\partial^2 \chi_{in}(x) = 0,   \partial^2 b_{in}(x) = 0,   (\partial^2 + m_\rho^2)\,\rho_{in}(x) = 0   [35]

(\partial^2 + m_V^2)\,U^\mu_{in}(x) = 0,   \partial_\mu U^\mu_{in}(x) = 0   [36]


with m_V^2 = (Z_3/Z_\chi)(e_0 \tilde{v})^2. The field equations for
B_H and A_{\mu H} read (Matsumoto et al. 1975a, b)


\partial^2 B_H(x) = 0,   -\partial^2 A_{\mu H}(x) = j_{\mu H}(x) - \partial_\mu B_H(x)   [37]


with j_{\mu H}(x) = \delta L(x)/\delta A^\mu_H(x). One may then require
that the current j_{\mu H} is the only source of the gauge
field A_{\mu H} in any observable process. This amounts to
imposing the condition _p\langle b|\partial_\mu B_H(x)|a\rangle_p = 0, that is,


(-\partial^2)\,_p\langle b|A^0_{\mu H}(x)|a\rangle_p = _p\langle b|j_{\mu H}(x)|a\rangle_p   [38]


where |a\rangle_p and |b\rangle_p denote two generic physical
states and A^0_{\mu H}(x) \equiv A_{\mu H}(x) - e_0 \tilde{v} : \partial_\mu b_{in}(x) :. Equations
[38] are the classical Maxwell equations. The
condition _p\langle b|\partial_\mu B_H(x)|a\rangle_p = 0 leads to the
Gupta-Bleuler-like condition


[\chi^{(-)}_{in}(x) - b^{(-)}_{in}(x)]\,|a\rangle_p = 0   [39]


where \chi^{(-)}_{in} and b^{(-)}_{in} are the positive-frequency parts
of the \chi_{in} and b_{in} fields. Thus, we see that \chi_{in} and
b_{in} cannot participate in any observable reaction.


This is confirmed by the fact that they are present 
in the S-matrix in the combination (\chi_{in} - b_{in})
(Matsumoto et al. 1975a, b). It is to be remarked, 
however, that the NG boson does not disappear from 
the theory: we shall see below that there are situations 
in which the NG fields do have observable effects. 


The Dynamical Rearrangement of Symmetry 
and the Classical Fields and Currents 


From eqns [32]-[33] we see that the local gauge
transformations of the Heisenberg fields


\phi_H(x) \to e^{i e_0 \lambda(x)}\,\phi_H(x),
A^\mu_H(x) \to A^\mu_H(x) + \partial^\mu \lambda(x),   B_H(x) \to B_H(x)   [40]


with \partial^2 \lambda(x) = 0, are induced by the in-field
transformations


\chi_{in}(x) \to \chi_{in}(x) + \frac{e_0 \tilde{v}}{Z_\chi^{1/2}}\,\lambda(x)
b_{in}(x) \to b_{in}(x) + \frac{e_0 \tilde{v}}{Z_\chi^{1/2}}\,\lambda(x)   [41]
\rho_{in}(x) \to \rho_{in}(x),   U^\mu_{in}(x) \to U^\mu_{in}(x)


On the other hand, the global phase transformation
\phi_H(x) \to e^{i\theta}\,\phi_H(x) is induced by


\chi_{in}(x) \to \chi_{in}(x) + \frac{\tilde{v}}{Z_\chi^{1/2}}\,\theta f(x),   b_{in}(x) \to b_{in}(x)
\rho_{in}(x) \to \rho_{in}(x),   U^\mu_{in}(x) \to U^\mu_{in}(x)   [42]


with \partial^2 f(x) = 0 and the limit f(x) \to 1 to be performed
at the end of computations. Note that under the above
transformations, the in-field equations and the
S-matrix are invariant and that B_H is changed by an
irrelevant c-number (in the limit f \to 1).

Consider now the boson transformation
\chi_{in}(x) \to \chi_{in}(x) + \alpha(x): in local gauge theories the
boson transformation must be compatible with the
Heisenberg field equations but also with the physical
state condition [39]. Under the boson transformation
with \alpha(x) = \tilde{v} Z_\chi^{-1/2} f(x) and \partial^2 f(x) = 0, B_H
changes as


B_H(x) \to B_H(x) - \frac{e_0 \tilde{v}^2}{Z_\chi}\,f(x)   [43]


Equation [38] is thus violated when the Gupta-Bleuler-like
condition is imposed. In order to restore it, the
shift in B_H must be compensated by means of the
following transformation on U^\mu_{in}:


U^\mu_{in}(x) \to U^\mu_{in}(x) + Z_3^{-1/2}\,a^\mu(x),   \partial_\mu a^\mu(x) = 0   [44]


with a convenient c-number function a^\mu(x). The
dynamical maps of the various Heisenberg operators
are not affected by [44] since they contain U^\mu_{in} and
B_H in a combination such that the changes of B_H
and of U^\mu_{in} compensate each other provided


(\partial^2 + m_V^2)\,a_\mu(x) = \frac{m_V^2}{e_0}\,\partial_\mu f(x)   [45]

Equation [45] thus obtained is the Maxwell equation
for the massive potential vector a_\mu (Matsumoto
et al. 1975a, b). The classical ground state current j^\mu
turns out to be


j^\mu(x) \equiv \langle 0|j^\mu_H(x)|0\rangle = m_V^2\,[ a^\mu(x) - \frac{1}{e_0}\,\partial^\mu f(x) ]   [46]


The term m_V^2 a^\mu(x) is the Meissner current, while
(m_V^2/e_0)\,\partial^\mu f(x) is the boson current. The key point
here is that both the macroscopic field and current
are given in terms of the boson condensation
function f(x).

Two remarks are in order: first, note that the
terms proportional to \partial^\mu f(x) are related to observable
effects, for example, the boson current which
acts as the source of the classical field. Second, note
that the macroscopic ground state effects do not
occur for regular f(x) (G^\dagger_{\mu\nu}(x) = 0). In fact, from [45]
we obtain a_\mu(x) = (1/e_0)\,\partial_\mu f(x) for regular f(x),
which implies zero classical current (j_\mu = 0) and
zero classical field (F_{\mu\nu} = \partial_\mu a_\nu - \partial_\nu a_\mu), since the
Meissner and the boson current cancel each other.
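
The cancellation for regular f(x) can be checked in a few lines. The sketch below uses only eqns [45] and [46], the condition \partial^2 f = 0, and the fact that derivatives commute on a regular f (G^\dagger_{\mu\nu} = 0).

```latex
\begin{align}
(\partial^2 + m_V^2)\,\frac{1}{e_0}\,\partial_\mu f
  &= \frac{1}{e_0}\,\partial_\mu \partial^2 f + \frac{m_V^2}{e_0}\,\partial_\mu f
   = \frac{m_V^2}{e_0}\,\partial_\mu f
  && \Rightarrow\; a_\mu = \frac{1}{e_0}\,\partial_\mu f \text{ solves [45]}, \\
j^\mu &= m_V^2 \Big[ a^\mu - \frac{1}{e_0}\,\partial^\mu f \Big] = 0, \\
F_{\mu\nu} &= \partial_\mu a_\nu - \partial_\nu a_\mu
   = \frac{1}{e_0}\,[\partial_\mu, \partial_\nu]\,f
   = \frac{1}{e_0}\,G^\dagger_{\mu\nu} = 0 .
\end{align}
```

The constraint \partial_\mu a^\mu = 0 in [44] is respected as well, since \partial_\mu a^\mu = (1/e_0)\,\partial^2 f = 0. For singular f(x) the argument fails at the first line, because the derivatives no longer commute.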

In conclusion, the vacuum current appears only 
when f(x) has topological singularities and these can 
be created only by condensation of massless bosons, 
that is, when SSB occurs. This explains why 
topological defects appear in the process of phase 
transitions, where NG modes are present and 
gradients in their condensate densities are nonzero 
(Kibble 1976, Zurek 1997). 

On the other hand, the appearance of a spacetime-dependent
order parameter is no guarantee that persistent
ground state currents (and fields) will exist: if f(x)
is a regular function, the spacetime dependence of \tilde{v}
can be gauged away by an appropriate gauge
transformation.

Since, as already mentioned, the boson transformation
with regular f(x) does not affect observable
quantities, the S-matrix is actually given by


S = : S[\rho_{in}, U^\mu_{in} - \frac{1}{m_V}\,\partial(\chi_{in} - b_{in})] :   [47]


This is indeed independent of the boson transformation
with regular f(x):


S \to S' = : S[\rho_{in}, U^\mu_{in} - \frac{1}{m_V}\,\partial(\chi_{in} - b_{in})
      + Z_3^{-1/2}\,(a^\mu - \frac{1}{e_0}\,\partial^\mu f)] :   [48]


since a_\mu(x) = (1/e_0)\,\partial_\mu f(x) for regular f(x). However,
S' \neq S for singular f(x): S' includes the interaction of
the quanta U^\mu_{in} and \rho_{in} with the classically behaving
macroscopic defects (Umezawa 1993, Umezawa
et al. 1982).


The Vortex Solution 


Below we consider the example of the Nielsen- 
Olesen vortex string solution. We show which one is 
the boson function f(x) controlling the nonhomoge- 
neous NG boson condensation in terms of which the 
string solution is described. For brevity, we only 
report the results of the computations. The detailed 
derivation as well as the discussion of further 
examples can be found in (Umezawa 1993, 
Umezawa et al. 1982). 

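In the present U(1) problem, the electromagnetic
tensor and the vacuum current are (Umezawa 1993,
Umezawa et al. 1982, Matsumoto et al. 1975a, b)


F_{\mu\nu}(x) = \partial_\mu a_\nu(x) - \partial_\nu a_\mu(x)
    = 2\pi\,\frac{m_V^2}{e_0} \int d^4x'\,\Delta_c(x - x')\,G^\dagger_{\mu\nu}(x')   [49]

j_\mu(x) = -2\pi\,\frac{m_V^2}{e_0} \int d^4x'\,\Delta_c(x - x')\,\partial^\nu_{x'} G^\dagger_{\nu\mu}(x')   [50]


respectively, and satisfy \partial^\mu F_{\mu\nu}(x) = -j_\nu(x). In these
equations,


\Delta_c(x - x') = \frac{1}{(2\pi)^4} \int d^4p\,e^{-ip(x - x')}\,\frac{1}{p^2 - m_V^2 + i\epsilon}   [51]
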

The line singularity for the vortex (or string)
solution can be parametrized by a single line
parameter \sigma and by the time parameter \tau. A static
vortex solution is obtained by setting y_0(\tau, \sigma) = \tau and
\mathbf{y}(\tau, \sigma) = \mathbf{y}(\sigma), with \mathbf{y} denoting the line coordinate.
G^{\mu\nu}(x) is nonzero only on the line at \mathbf{y} (we can
consider more lines but let us limit ourselves to only one line,
for simplicity). Thus, we have


G_{0i}(x) = \int d\sigma\,\frac{dy_i(\sigma)}{d\sigma}\,\delta^3[\mathbf{x} - \mathbf{y}(\sigma)],   G_{ij}(x) = 0   [52]
G^\dagger_{ij}(x) = -\epsilon_{ijk}\,G_{0k}(x),   G^\dagger_{0j}(x) = 0


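Equation [49] shows that these vortices are purely
magnetic. We obtain


\partial_0 f(x) = 0,
\partial_i f(x) = \frac{1}{(2\pi)^2} \int d\sigma\,\epsilon_{ijk}\,\frac{dy_k(\sigma)}{d\sigma}\,\partial_j^x \int d^3p\,\frac{e^{i\mathbf{p}\cdot(\mathbf{x} - \mathbf{y}(\sigma))}}{\mathbf{p}^2}   [53]


that is, by using the identity (2\pi)^{-2} \int d^3p\,(e^{i\mathbf{p}\cdot\mathbf{x}}/\mathbf{p}^2) =
1/(2|\mathbf{x}|),


\partial_i f(x) = \frac{1}{2} \int d\sigma\,\epsilon_{ijk}\,\frac{dy_k(\sigma)}{d\sigma}\,\partial_j^x\,\frac{1}{|\mathbf{x} - \mathbf{y}(\sigma)|}   [54]


Note that \nabla^2 f(x) = 0 is satisfied.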

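A straight infinitely long vortex is specified by
y_i(\sigma) = \sigma\,\delta_{i3} with -\infty < \sigma < \infty. The only nonvanishing
components of G^{\mu\nu}(x) are G^{03}(x) = G^\dagger_{12}(x) =
\delta(x_1)\,\delta(x_2). Equation [54] gives (Umezawa 1993,
Umezawa et al. 1982, Matsumoto et al. 1975a, b)


\partial_1 f(x) = \frac{1}{2} \int d\sigma\,\frac{\partial}{\partial x_2}\,[x_1^2 + x_2^2 + (x_3 - \sigma)^2]^{-1/2}
    = -\frac{x_2}{x_1^2 + x_2^2}
\partial_2 f(x) = \frac{x_1}{x_1^2 + x_2^2},   \partial_3 f(x) = 0   [55]


and then


f(x) = \tan^{-1}(x_2/x_1) = \theta(x)   [56]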
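
As a numerical illustration of why f(x) = \theta(x) carries a topological singularity, the sketch below integrates the gradient of eqn [55] around a closed circle and compares a contour that encloses the vortex line with one that does not. The function names, the choice of circular contours, and the normalization of the line integral by 2\pi are assumptions made here for illustration; they are not taken from the references.

```python
import numpy as np


def grad_f(x1, x2):
    """Gradient of f = arctan(x2/x1) as given in eqn [55]: (-x2, x1)/(x1^2 + x2^2)."""
    r2 = x1**2 + x2**2
    return -x2 / r2, x1 / r2


def winding_number(center, radius, n=2000):
    """(1/2pi) * closed line integral of grad f . dl around a circle.

    The circle is parametrized by t in [0, 2pi); for a periodic integrand the
    equally spaced mean equals the integral divided by 2pi.
    """
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    x1 = center[0] + radius * np.cos(t)
    x2 = center[1] + radius * np.sin(t)
    d1, d2 = grad_f(x1, x2)
    # Tangent vector: dl/dt = radius * (-sin t, cos t)
    integrand = radius * (-d1 * np.sin(t) + d2 * np.cos(t))
    return integrand.mean()


if __name__ == "__main__":
    print("contour around the vortex line:", winding_number((0.0, 0.0), 1.0))
    print("contour away from the line:   ", winding_number((3.0, 0.0), 1.0))
```

With this normalization the first contour gives a unit winding and the second gives zero, which is the path dependence of f(x) expressed by eqn [15].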
We have thus determined the boson transformation
function corresponding to a particular vortex solution.
The vector potential is


a_3(x) = a_0(x) = 0


and the only nonvanishing component of F_{\mu\nu} is


F_{12}(x) = -2\pi\,\frac{m_V^2}{e_0} \int d^4x'\,\Delta_c(x - x')\,\delta(x'_1)\,\delta(x'_2)
    = \frac{m_V^2}{e_0}\,K_0(m_V \sqrt{x_1^2 + x_2^2})   [58]


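Finally, the vacuum current eqn [50] is given by


j_1(x) = -\frac{m_V^3}{e_0}\,\frac{x_2}{\sqrt{x_1^2 + x_2^2}}\,K_1(m_V \sqrt{x_1^2 + x_2^2})
j_2(x) = \frac{m_V^3}{e_0}\,\frac{x_1}{\sqrt{x_1^2 + x_2^2}}\,K_1(m_V \sqrt{x_1^2 + x_2^2})   [59]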
0 [xi +a 


We observe that these results are the same as those of the
Nielsen-Olesen vortex solution. Notice that we did
not specify the potential in our model but only the
invariance properties. Thus, the invariance properties
of the dynamics determine the characteristics of
the topological solutions. The vortex solution
manifests the original U(1) symmetry through the
cylindrical angle \theta, which is the parameter of the
U(1) representation in the coordinate space.


Conclusions 


We have discussed how topological defects arise as 
inhomogeneous condensates in QFT. Topological 
defects are shown to have a genuine quantum 
nature. The approach reviewed here goes under the 
name of “boson transformation method” and relies 
on the existence of unitarily inequivalent representa- 
tions of the field algebra in QFT. 

Describing quantum fields with topological 
defects then amounts to properly choosing the physical
Fock space for representing the Heisenberg field 
operators. Once the boundary conditions corre- 
sponding to a particular soliton sector are found, 
the Heisenberg field operators embodied with such 
conditions contain the full information about the 
defects, the quanta and their mutual interaction. 
One can thus calculate Green’s functions for 
particles in the presence of defects. The extension 
to finite temperature is discussed in Blasone and 
Jizba (2002) and Manka and Vitiello (1990). 

As an example we have discussed a model with 
U(1) gauge invariance and SSB and we have obtained 
the Nielsen—Olesen vortex solution in terms of 
localized condensation of Goldstone bosons. These 
thus appear to play a physical role, although, in the 
presence of gauge fields, they do not show up in the 
physical spectrum as excitation quanta. The function 
f(x) controlling the condensation of the NG bosons 
must be singular in order to produce observable 
effects. Boson transformations with regular f(x) only 
amount to gauge transformations. For the treatment 
of topological defects in nonabelian gauge theories, 
see Manka and Vitiello (1990). 

Finally, when there are no NG modes, as in the 
case of the kink solution or the sine-Gordon 
solution, the boson transformation function has to 
carry a divergent singularity at spatial infinity
(Umezawa 1993, Umezawa et al. 1982, Blasone 
and Jizba 2002). The boson transformation has also 
been discussed in connection with the Bäcklund
transformation at a classical level and the confine- 
ment of the constituent quanta in the coherent 
condensation domain. 

For further reading on quantum fields with 
topological defects, see Blasone et al. (2006). 


Introduction


In general relativity, the gravitational field is 
encoded in the Riemannian geometry of spacetime. 
Much of the conceptual compactness and mathema- 
tical elegance of the theory can be traced back to 
this central idea. The encoding is also directly 
responsible for the most dramatic ramifications of 
the theory: the big bang, black holes, and gravita- 
tional waves. However, it also leads one to the 
conclusion that spacetime itself must end and 
physics must come to a halt at the big bang and 
inside black holes, where the gravitational field 
becomes singular. But this reasoning ignores quan- 
tum physics entirely. When the curvature becomes 
large, of the order of 1/\ell_{Pl}^2 = c^3/(G\hbar), quantum effects
dominate and predictions of general relativity can 
no longer be trusted. In this “Planck regime,” one 
must use an appropriate synthesis of general 
relativity and quantum physics, that is, a quantum 
gravity theory. The predictions of this theory are 
likely to be quite different from those of general 
relativity. In the real, quantum world, evolution may 
be completely nonsingular. Physics may not come to 
a halt and quantum theory could extend classical 
spacetime. 

There are a number of different approaches to 
quantum gravity. One natural avenue is to retain the 
interplay between gravity and geometry but now use 
“quantum” Riemannian geometry in place of the 
standard, classical one. This is the key idea under- 
lying loop quantum gravity. There are several 
calculations which indicate that the well-known 
failure of the standard perturbative approach to 
quantum gravity may be primarily due to its basic 
assumption that spacetime can be modeled as a 
smooth continuum at all scales. In loop quantum 
gravity, one adopts a nonperturbative approach. 
There is no smooth metric in the background. 
Geometry is not only dynamical but quantum 
mechanical from “birth.” Its fundamental excita- 
tions turn out to be one dimensional and polymer- 
like. The smooth continuum is only a coarse-grained 
approximation. While a fully satisfactory quantum 
gravity theory still awaits us (in any approach), 
detailed investigations have been carried out to 


completion in simplified models — called mini- and 
midi-superspaces. They show that quantum space- 
time does not end at singularities. Rather, quantum 
geometry serves as a “bridge” to another large 
classical spacetime. 

This article will focus on structural issues from 
the perspective of mathematical physics. For com- 
plementary perspectives and further details, see 
Loop Quantum Gravity, Canonical General Relativity, 
Quantum Cosmology, Black Hole Mechanics, and 
Spin Foams in this Encyclopedia. 


Basic Framework 


The starting point is a Hamiltonian formulation of 
general relativity based on spin connections 
(Ashtekar 1987). Here, the phase space \Gamma consists
of canonically conjugate pairs (A, P), where A is a
connection on a 3-manifold M and P a 2-form, both
of which take values in the Lie algebra su(2). Since \Gamma
can also be thought of as the phase space of the 
SU(2) Yang-Mills theory, in this approach there is a 
unified kinematic framework for general relativity 
that describes gravity and the gauge theories which 
describe the other three basic forces of nature. The 
connection A enables one to parallel transport chiral 
spinors (such as the left-handed fermions of the 
standard electroweak model) along curves in M. Its 
curvature is directly related to the electric and 
magnetic parts of the spacetime “Riemann tensor.” 
The dual \tilde{P} of P plays a double role (the dual is
defined via \int_M P \wedge \omega = \int_M \tilde{P} \lrcorner\,\omega for any 1-form \omega
on M). Being the momentum canonically conjugate
to A, it is analogous to the Yang-Mills electric field. 
But (apart from a constant), it is also an orthonor- 
mal triad (with density weight 1) on M and 
therefore determines the positive-definite (“spatial”) 
3-metric, and hence the Riemannian geometry of M. 
This dual role of P is a reflection of the fact that 
now SU(2) is the (double cover of the) group of 
rotations of the orthonormal spatial triads on M 
itself rather than of rotations in an “internal” space 
associated with M. 

To pass to quantum theory, one first constructs an 
algebra of “elementary” functions on \Gamma (analogous
to the phase-space functions x and p in the case of a
particle) which are to have unambiguous operator
analogs. The holonomies


h_e(A) = P \exp( -\int_e A )   [1]


associated with a curve/edge e on M are (SU(2)-valued)
configuration functions on \Gamma. Similarly,
given a 2-surface S on M, and an su(2)-valued (test)
function f on M,


P_{S,f} = \int_S tr(f P)   [2]


is a momentum function on \Gamma, where tr is over the
su(2) indices. (For simplicity of presentation, all
fields are assumed to be smooth and curves/edges e
and surfaces S, finite and piecewise analytic in a
specific sense. The extension to smooth curves and
surfaces was carried out by Baez and Sawin,
Lewandowski and Thiemann, and Fleischhack. It is
technically more involved but the final results are
qualitatively the same.) The symplectic structure on
\Gamma enables one to calculate the Poisson brackets
{h_e, P_{S,f}}. The result is a linear combination of
holonomies and can be written as a Lie derivative,


{h_e, P_{S,f}} = L_{X_{S,f}} h_e   [3]


where Xs is a derivation on the ring generated by 
holonomy functions, and can therefore be regarded 
as a vector field on the configuration space A of 
connections. This is a familiar situation in classical 
mechanics of systems whose configuration space is a 
finite-dimensional manifold. Functions 4, and vector 
fields Xs generate a Lie algebra. As in quantum 
mechanics on manifolds, the first step is to promote 
this algebra to a quantum algebra by demanding 
that the commutator be given by ib times the Lie 
bracket. The result is a x-algebra a, analogous to the 
algebra generated by operators expiAx and p in 
quantum mechanics. By exponentiating the momen- 
tum operators Ps f one obtains MU, the analog of the 
quantum-mechanical Weyl algebra generated by 
exp iAx and exp ipp. 

The main task is to obtain the appropriate
representation of these algebras. In that representation,
quantum Riemannian geometry can be probed
through the momentum operators $\hat{P}_{S,f}$, which
stem from classical orthonormal triads. As in
quantum mechanics on manifolds or simple field
theories in flat space, it is convenient to divide the
task into two parts. In the first, one focuses on the
algebra $\mathfrak{C}$ generated by the configuration operators
$\hat{h}_e$ and finds all its representations, and in the second
one considers the momentum operators $\hat{P}_{S,f}$ to
restrict the freedom.

$\mathfrak{C}$ is called the holonomy algebra. It is naturally
endowed with the structure of an abelian C* algebra
(with identity), whence one can apply the powerful
machinery made available by the Gel’fand theory.
This theory tells us that $\mathfrak{C}$ determines a unique
compact, Hausdorff space $\bar{\mathcal{A}}$ such that the C* algebra
of all continuous functions on $\bar{\mathcal{A}}$ is naturally
isomorphic to $\mathfrak{C}$. $\bar{\mathcal{A}}$ is called the Gel’fand spectrum
of $\mathfrak{C}$. It has been shown to consist of “generalized
connections” $\bar{A}$ defined as follows: $\bar{A}$ assigns to any
oriented edge e in M an element $\bar{A}(e)$ of SU(2)
(a “holonomy”) such that $\bar{A}(e^{-1}) = [\bar{A}(e)]^{-1}$; and, if
the endpoint of $e_1$ is the starting point of $e_2$, then
$\bar{A}(e_1 \circ e_2) = \bar{A}(e_1) \cdot \bar{A}(e_2)$. Clearly, every smooth
connection A is a generalized connection. In fact, the
space $\mathcal{A}$ of smooth connections has been shown to be
dense in $\bar{\mathcal{A}}$ (with respect to the natural Gel’fand
topology thereon). But $\bar{\mathcal{A}}$ has many more “distributional
elements.” The Gel’fand theory guarantees that
every representation of the C* algebra $\mathfrak{C}$ is a direct
sum of representations of the following type: the
underlying Hilbert space is $\mathcal{H} = L^2(\bar{\mathcal{A}}, d\mu)$ for some
measure $\mu$ on $\bar{\mathcal{A}}$ and (regarded as functions on $\bar{\mathcal{A}}$)
elements of $\mathfrak{C}$ act by multiplication. Since there are
many inequivalent measures on $\bar{\mathcal{A}}$, there is a multitude
of representations of $\mathfrak{C}$. A key question is how
many of them can be extended to representations of
the full algebra $\mathfrak{a}$ (or $\mathfrak{W}$) without having to introduce
any “background fields” which would compromise
diffeomorphism covariance. Quite surprisingly, the
requirement that the representation be cyclic with
respect to a state which is invariant under the action
of the (appropriately defined) group Diff M of
piecewise-analytic diffeomorphisms on M singles out
a unique irreducible representation. This result was
established for $\mathfrak{a}$ by Lewandowski, Okołów, Sahlmann
and Thiemann, and for $\mathfrak{W}$ by Fleischhack. It is
the quantum geometry analog to the seminal results
by Segal and others that characterized the Fock
vacuum in Minkowskian field theories. However,
while that result assumes not only Poincaré invariance
but also specific (namely free) dynamics, it is
striking that the present uniqueness theorems make
no such restriction on dynamics. The requirement of
diffeomorphism invariance is surprisingly strong and
makes the “background-independent” quantum geometry
framework surprisingly tight.

This representation had been constructed by
Ashtekar, Baez, and Lewandowski some ten years
before its uniqueness was established. The underlying
Hilbert space is given by $\mathcal{H} = L^2(\bar{\mathcal{A}}, d\mu_o)$ where
$\mu_o$ is a diffeomorphism-invariant, faithful, regular
Borel measure on $\bar{\mathcal{A}}$, constructed from the normalized
Haar measure on SU(2). Typical quantum states
can be visualized as follows. Fix: (1) a graph $\alpha$ on M
(by a graph on M we mean a set of a finite number
of embedded, oriented intervals called edges; if two
edges intersect, they do so only at one or both ends,
called vertices), and (2) a smooth function $\psi$ on
$[SU(2)]^n$. Then, the function


$\Psi_\alpha(\bar{A}) := \psi(\bar{A}(e_1), \ldots, \bar{A}(e_n))$ [4]


on $\bar{\mathcal{A}}$ is an element of $\mathcal{H}$. Such states are said to be
“cylindrical” with respect to the graph $\alpha$ and their
space is denoted by $\mathrm{Cyl}_\alpha$. These are “typical states”
in the sense that $\mathrm{Cyl} := \cup_\alpha \mathrm{Cyl}_\alpha$ is dense in $\mathcal{H}$.
Finally, as ensured by the Gel’fand theory, the
holonomy (or configuration) operators $\hat{h}_e$ act just
by multiplication. The momentum operators $\hat{P}_{S,f}$ act
as Lie derivatives: $\hat{P}_{S,f} \Psi = -i\hbar\, \mathcal{L}_{X_{S,f}} \Psi$.


Remark Given any graph $\alpha$ in M, and a labeling of
each of its edges by a nontrivial irreducible representation
of SU(2) (i.e., by a nonzero half integer j), one
can construct a finite-dimensional Hilbert space $\mathcal{H}_{\alpha,\mathbf{j}}$,
which can be thought of as the state space of a spin
system “living on” the graph $\alpha$. The full Hilbert space
admits a simple decomposition: $\mathcal{H} = \oplus_{\alpha,\mathbf{j}} \mathcal{H}_{\alpha,\mathbf{j}}$. This
is called the spin-network decomposition. The geometric
operators discussed in the next section leave
each $\mathcal{H}_{\alpha,\mathbf{j}}$ invariant. Therefore, the availability of this
decomposition greatly simplifies the task of analyzing
their properties.


Geometric Operators 


In the classical theory, $E := 8\pi G\gamma \tilde{P}$ has the interpretation
of an orthonormal triad field (or a
“moving frame”) on M (with density weight 1).
Here, $\gamma$ is a dimensionless, strictly positive number,
called the Barbero—Immirzi parameter, which arises
as follows. Because of emphasis on connections, in
the classical theory the first-order Palatini action is a
more natural starting point than the second-order
Einstein—Hilbert action. Now, there is a freedom to
add a term to the Palatini action which vanishes
when Bianchi identities are satisfied and therefore
does not change the equations of motion. $\gamma$ arises as
the coefficient of this term. In some respects $\gamma$ is
analogous to the $\theta$ parameter of Yang-Mills theory.
Indeed, while theories corresponding to any permissible
values of $\gamma$ are related by a canonical
transformation classically, quantum mechanically
this transformation is not unitarily implementable.
Therefore, although there is a unique representation
of the algebra $\mathfrak{a}$ (or $\mathfrak{W}$), there is a one-parameter
family of inequivalent representations of the algebra
of geometric operators generated by suitable functions
of orthonormal triads E, each labeled by the
value of $\gamma$. This is a genuine quantization ambiguity.
As with the $\theta$ ambiguity in QCD, the actual value of
$\gamma$ in nature has to be determined experimentally.
The current strategy in quantum geometry is to fix
its value through a thought experiment involving
black hole thermodynamics (see below).

The basic object in quantum Riemannian geometry
is the triad flux operator $\hat{E}_{S,f} := 8\pi G\gamma \hat{P}_{S,f}$. It is
self-adjoint and all its eigenvalues are discrete. To
define other geometric operators such as the area
operator $\hat{A}_S$ associated with a surface S or a volume
operator $\hat{V}_R$ associated with a region R, one first
expresses the corresponding phase-space functions in
terms of the “elementary” functions $E_{S_i,f_i}$ using
suitable surfaces $S_i$ and test functions $f_i$ and then
promotes $E_{S_i,f_i}$ to operators. Even though the
classical expressions are typically nonpolynomial
functions of $E_{S_i,f_i}$, the final operators are all well
defined, self-adjoint and with purely discrete eigenvalues.
Therefore, in the sense of the word used in
elementary quantum mechanics (e.g., of the hydrogen
atom), one says that geometry is quantized.
Because the theory has no background metric or
indeed any other background field, all geometric
operators transform covariantly under the action of
Diff M. This diffeomorphism covariance makes
the final expressions of operators rather simple. In
the case of the area operator, for example, the
action of $\hat{A}_S$ on a state $\Psi_\alpha$ [4] depends entirely on
the points of intersection of the surface S and the
graph $\alpha$ and involves only right- and left-invariant
vector fields on copies of SU(2) associated with
edges of $\alpha$ which intersect S. In the case of the
volume operator $\hat{V}_R$, the action depends on the
vertices of $\alpha$ contained in R and, at each vertex,
involves the right- and left-invariant vector fields on
copies of SU(2) associated with edges that meet at
each vertex.

To display the explicit expressions of these
operators, let us first define on $\mathrm{Cyl}_\alpha$ three basic
operators $\hat{J}_j^{(v,e)}$, with $j \in \{1,2,3\}$, associated with the
pair consisting of an edge e of $\alpha$ and a vertex v of e:


$\hat{J}_j^{(v,e)} \Psi_\alpha(\bar{A}) = i \left.\frac{d}{dt}\right|_{t=0} \psi(\ldots, \bar{A}(e)\exp(t\tau_j), \ldots)$ if e begins at v

$\hat{J}_j^{(v,e)} \Psi_\alpha(\bar{A}) = i \left.\frac{d}{dt}\right|_{t=0} \psi(\ldots, \exp(-t\tau_j)\bar{A}(e), \ldots)$ if e ends at v [5]


where $\tau_j$ denotes a basis in su(2) and “...” stands for
the rest of the arguments of $\Psi_\alpha$ which remain
unaffected. The quantum area operator $\hat{A}_S$ is
assigned to a finite two-dimensional submanifold S
in M. Given a cylindrical state we can always
represent it in the form [4] using a graph $\alpha$ adapted
to S, such that every edge e either intersects S at
exactly one endpoint, or is contained in the closure
$\bar{S}$, or does not intersect S. For each vertex v in S of
the graph $\alpha$, the family of edges intersecting v can be
divided into three classes: edges $\{e_1, \ldots, e_u\}$ lying on
one side (say “above”) S, edges $\{e_{u+1}, \ldots, e_{u+d}\}$ lying
on the other side (say “below”), and edges contained
in S. To each v we assign a generalized Laplace
operator


$\Delta_{S,v} = -\eta^{ij} \left( \sum_{I=1}^{u} \hat{J}_i^{(v,e_I)} - \sum_{I=u+1}^{u+d} \hat{J}_i^{(v,e_I)} \right) \left( \sum_{K=1}^{u} \hat{J}_j^{(v,e_K)} - \sum_{K=u+1}^{u+d} \hat{J}_j^{(v,e_K)} \right)$ [6]


where $\eta^{ij}$ stands for −1/2 the Killing form on su(2).
Now, the action of the quantum area operator $\hat{A}_S$ on
$\Psi_\alpha$ is defined as follows:


$\hat{A}_S \Psi_\alpha = 4\pi\gamma\ell_{\rm Pl}^2 \sum_{v \in S} \sqrt{-\Delta_{S,v}}\, \Psi_\alpha$ [7]

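The quantum area operator has played the most
important role in applications. Its complete spectrum
is known in a closed form. Consider arbitrary
sets $j_I^{(u)}$, $j_I^{(d)}$ and $j_I^{(u+d)}$ of half-integers, subject to the
condition


$j_I^{(u+d)} \in \{ |j_I^{(u)} - j_I^{(d)}|,\ |j_I^{(u)} - j_I^{(d)}| + 1,\ \ldots,\ j_I^{(u)} + j_I^{(d)} \}$ [8]


where I runs over any finite number of integers. The
general eigenvalues of the area operator are given by:


$a_S = 4\pi\gamma\ell_{\rm Pl}^2 \sum_I \left( 2 j_I^{(u)}(j_I^{(u)} + 1) + 2 j_I^{(d)}(j_I^{(d)} + 1) - j_I^{(u+d)}(j_I^{(u+d)} + 1) \right)^{1/2}$ [9]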


On the physically interesting sector of the SU(2)-gauge-invariant
subspace $\mathcal{H}_{\rm inv}$ of $\mathcal{H}$, the lowest
eigenvalue of $\hat{A}_S$ — “the area gap” — depends on
some global properties of S. Specifically, it “knows”
whether the surface is open, or a 2-sphere, or, if M is
a 3-torus, a (nontrivial) 2-torus in M. Finally, on
$\mathcal{H}_{\rm inv}$, one is often interested only in the subspace of
states $\Psi_\alpha$, where $\alpha$ has no edges which lie within a
given surface S. Then, the expression of eigenvalues
simplifies considerably:


$a_S = 8\pi\gamma\ell_{\rm Pl}^2 \sum_I \sqrt{j_I(j_I + 1)}$ [10]
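
As a numerical illustration of the spectrum [9] and its simplified form [10], the following Python sketch evaluates both expressions in units where $\ell_{\rm Pl}^2 = 1$. The function names, the choice of units and the default value $\gamma = 0.2735$ (the value quoted in the black hole discussion of this article) are assumptions made here for illustration only.

```python
import math
from fractions import Fraction

GAMMA = 0.2735  # Barbero-Immirzi parameter; illustrative default (assumption)


def satisfies_condition_8(j_u, j_d, j_ud):
    """Condition [8]: j_ud lies in {|j_u - j_d|, |j_u - j_d| + 1, ..., j_u + j_d}."""
    j_u, j_d, j_ud = Fraction(j_u), Fraction(j_d), Fraction(j_ud)
    low, high = abs(j_u - j_d), j_u + j_d
    # j_ud must sit between the bounds and differ from the lower one by an integer.
    return low <= j_ud <= high and (j_ud - low).denominator == 1


def area_eigenvalue(labels, gamma=GAMMA, l_pl_sq=1.0):
    """General eigenvalue [9]; labels is a list of (j_u, j_d, j_ud) triples."""
    total = 0.0
    for j_u, j_d, j_ud in labels:
        if not satisfies_condition_8(j_u, j_d, j_ud):
            raise ValueError("labels violate condition [8]")
        j_u, j_d, j_ud = float(j_u), float(j_d), float(j_ud)
        total += math.sqrt(2 * j_u * (j_u + 1) + 2 * j_d * (j_d + 1)
                           - j_ud * (j_ud + 1))
    return 4 * math.pi * gamma * l_pl_sq * total


def simplified_area(spins, gamma=GAMMA, l_pl_sq=1.0):
    """Simplified eigenvalue [10]; spins is a list of half-integers j_I."""
    total = sum(math.sqrt(float(j) * (float(j) + 1)) for j in spins)
    return 8 * math.pi * gamma * l_pl_sq * total


half = Fraction(1, 2)
general = area_eigenvalue([(half, half, 0)])   # j_u = j_d = 1/2, j_ud = 0
simple = simplified_area([half])
```

Setting $j_I^{(u)} = j_I^{(d)} = j_I$ and $j_I^{(u+d)} = 0$ in [9] reproduces [10] term by term, which is what the last three lines check for a single puncture with $j = 1/2$.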


To display the action of the quantum volume
operator $\hat{V}_R$, for each vertex v of a given graph $\alpha$,
let us first define an operator $\hat{q}_v$ on $\mathrm{Cyl}_\alpha$:


$\hat{q}_v = (8\pi\gamma\ell_{\rm Pl}^2)^3\, \frac{1}{48} \sum_{e,e',e''} \epsilon(e,e',e'')\, c^{ijk}\, \hat{J}_i^{(v,e)} \hat{J}_j^{(v,e')} \hat{J}_k^{(v,e'')}$ [11]


where e, e′, and e″ run over the set of edges
intersecting v, $\epsilon(e,e',e'')$ takes values ±1 or 0
depending on the orientation of the half-lines
tangent to the edges at v, $[\tau_i, \tau_j] = c^k{}_{ij} \tau_k$ and the
indices are raised by the tensor $\eta^{ij}$. The action of the
quantum volume operator on a cylindrical state [4]
is then given by


$\hat{V}_R \Psi_\alpha = \kappa_o \sum_{v \in R} \sqrt{|\hat{q}_v|}\, \Psi_\alpha$ [12]


Here, $\kappa_o$ is an overall constant, independent of the
graph, resulting from an averaging.

The volume operator plays an unexpectedly 
important role in the definition of both the gravita- 
tional and matter contributions to the scalar 
constraint operator which dictates dynamics. 
Finally, a notable property of the volume operator 
is the following. Let $R(p,\epsilon)$ be a family of neighborhoods
of a point $p \in M$. Then, as indicated above,
$\hat{V}_{R(p,\epsilon)} \Psi_\alpha = 0$ if $\alpha$ has no vertex in the neighborhood.
However, if $\alpha$ has a vertex at p


$\lim_{\epsilon \to 0} \hat{V}_{R(p,\epsilon)} \Psi_\alpha$


exists but is not necessarily zero. This is a reflection
of the “distributional” nature of quantum geometry. 


Remark States $\Psi_\alpha \in \mathrm{Cyl}$ have support only on the
graph $\alpha$. In particular, they are simply annihilated
by geometric operators such as $\hat{A}_S$ and $\hat{V}_R$ if the
support of the surface S and the region R does not
intersect the support of $\alpha$. In this sense the
fundamental excitations of geometry are one dimensional
and geometry is polymer-like. States $\Psi_\alpha$,
where $\alpha$ is just a “small graph,” are highly quantum
mechanical — like states in QED representing just a
few photons. Just as coherent states in QED require
an infinite superposition of such highly quantum
states, to obtain a semiclassical state approximating
a given classical geometry, one has to superpose a
very large number of such elementary states. More
precisely, in the Gel’fand triplet $\mathrm{Cyl} \subset \mathcal{H} \subset \mathrm{Cyl}^*$,
semiclassical states belong to the dual $\mathrm{Cyl}^*$ of Cyl.


Applications 


Since quantum Riemannian geometry underlies loop 
quantum gravity and spin-foam models, all results 
obtained in these frameworks can be regarded as its 
applications. Among these, there are two which 
have led to resolutions of long-standing issues. The 
first concerns black hole entropy, and the second, 
quantum nature of the big bang. 


Black Holes 


Seminal advances in fundamentals of black hole
physics in the mid-1970s suggested that the entropy
of large black holes is given by $S_{\rm BH} = (a_{\rm hor}/4\ell_{\rm Pl}^2)$,
where $a_{\rm hor}$ is the horizon area. This immediately
raised a challenge to potential quantum gravity
theories: give a statistical mechanical derivation of
this relation. For familiar thermodynamic systems, a
statistical mechanical derivation begins with an
identification of the microscopic degrees of freedom.
For a classical gas, these are carried by molecules;
for the black body radiation, by photons; and for a
ferromagnet, by Heisenberg spins. What about black
holes? The microscopic building blocks cannot be
gravitons because the discussion involves stationary
black holes. Furthermore, the number of microscopic
states is absolutely huge: some $\exp 10^{77}$ for a
solar mass black hole, a number that completely
dwarfs the number of states of systems one normally
encounters in statistical mechanics. Where does this
huge number come from? In loop quantum gravity,
this is the number of states of the “quantum horizon
geometry.”

The idea behind the calculation can be heuristi- 
cally explained using the “It from Bit” argument, 
put forward by Wheeler in the 1990s. Divide the 
black hole horizon into elementary cells, each with 
one Planck unit of area, $\ell_{\rm Pl}^2$, and assign to each cell
two microstates. Then the total number of states $\mathcal{N}$
is given by $\mathcal{N} = 2^n$, where $n = (a_{\rm hor}/\ell_{\rm Pl}^2)$ is the
number of elementary cells, whence entropy is
given by $S = \ln \mathcal{N} \sim a_{\rm hor}$. Thus, apart from a
numerical coefficient, the entropy (It) is accounted 
for by assigning two states (Bit) to each elementary 
cell. This qualitative picture is simple and attractive. 
However, the detailed derivation in quantum geo- 
metry has several new features. 
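
The counting in this heuristic argument can be spelled out in a few lines. The sketch below is only an illustration of the arithmetic (the function names and the use of Planck units are assumptions made here): with two states per Planck-area cell the entropy is n ln 2, which is proportional to the horizon area but with coefficient ln 2 rather than the 1/4 of the Bekenstein-Hawking formula.

```python
import math


def it_from_bit_entropy(horizon_area, cell_area=1.0, states_per_cell=2):
    """Entropy ln(N) with N = states_per_cell ** n and n = horizon_area / cell_area.

    Uses ln(k ** n) = n ln(k) so that the huge number N is never formed.
    """
    n_cells = horizon_area / cell_area
    return n_cells * math.log(states_per_cell)


def bekenstein_hawking_entropy(horizon_area, l_pl_sq=1.0):
    """S = a_hor / (4 l_Pl^2), the relation quoted at the start of this section."""
    return horizon_area / (4.0 * l_pl_sq)


area = 1.0e6  # horizon area in Planck units (arbitrary illustrative value)
ratio = it_from_bit_entropy(area) / bekenstein_hawking_entropy(area)
```

The ratio is independent of the area, which is the precise sense in which the heuristic reproduces the area law only "apart from a numerical coefficient."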

First, Wheeler’s argument would apply to any 
2-surface, while in quantum geometry the surface 
must represent a horizon in equilibrium. This 
requirement is encoded in a certain boundary 
condition that the canonically conjugate pair (A, P) 
must satisfy at the surface and plays a crucial role in 
the quantum theory. Second, the area of each 
elementary cell is not a fixed multiple of $\ell_{\rm Pl}^2$ but is
given by [10], where I labels the elementary cells
and $j_I$ can be any half-integer (such that the sum is
within a small neighborhood of the classical area of
the black hole under consideration). Finally, the
number of quantum states associated with an
elementary cell labeled by $j_I$ is not 2 but $(2j_I + 1)$.

The detailed theory of the quantum horizon 
geometry and the standard statistical mechanical 
reasoning is then used to calculate the entropy and 
the temperature. For large black holes, the leading 
contribution to entropy is proportional to the 
horizon area, in agreement with quantum field 
theory in curved spacetimes. (The subleading term
$-(1/2) \ln(a_{\rm hor}/\ell_{\rm Pl}^2)$ is a quantum gravity correction
to Hawking’s semiclassical result. This correction,
with the −1/2 factor, is robust in the sense that it
also arises in other approaches.) However, as one
would expect, the proportionality factor depends on
the Barbero-Immirzi parameter $\gamma$ and so far loop
quantum gravity does not have an independent way
to determine its value. The current strategy is to
determine $\gamma$ by requiring that, for the Schwarzschild
black hole, the leading term agrees exactly with
Hawking’s semiclassical answer. This requirement
implies that $\gamma$ is the root of an algebraic equation and
its value is given by $\gamma \approx 0.2735$. Now, quantum
geometry theory is completely fixed. One can 
calculate entropy of other black holes, with angular 
momentum and distortion. A nontrivial check on the 
strategy is that for all these cases, the coefficient in 
the leading-order term again agrees with Hawking’s 
semiclassical result. 

The detailed analysis involves a number of 
structures of interest to mathematical physics. First, 
the intrinsic horizon geometry is described by a U(1) 
Chern-Simons theory on a punctured 2-sphere (the 
horizon), the level k of the theory being given by
$k = a_{\rm hor}/4\pi\gamma\ell_{\rm Pl}^2$. The punctures are simply the intersections
of the excitations of the polymer geometry
in the bulk with the horizon 2-surface. Second,
because of the horizon boundary conditions, in the
classical theory the gauge group SU(2) is reduced to
U(1) at the horizon. At each puncture, it is further
reduced to the discrete subgroup $\mathbb{Z}_k$ of U(1),
sometimes referred to as a “quantum U(1) group.” 
Third, the “surface phase space” associated with the 
horizon is represented by a noncommutative torus. 
Finally, the surface Chern—Simons theory is entirely 
unrelated to the bulk quantum geometry theory but 
the quantum horizon boundary condition requires 
that the spectrum of a certain operator in the 
Chern-Simons theory must be identical to that of 
another operator in the bulk theory. The surprising 
fact is that there is an exact agreement. Without this 
seamless matching, a coherent description of the 
quantum horizon geometry would not have been 
possible. 

The main weakness of this approach to black hole 
entropy stems from the Barbero—Immirzi ambiguity. 
The argument would be much more compelling if 
the value of $\gamma$ were determined by independent
considerations, without reference to black hole 
entropy. (By contrast, for extremal black holes, 
string theory provides the correct coefficient without 
any adjustable parameter. The AdS/CFT duality
hypothesis (as well as other semiquantitative arguments)
has been used to encompass certain black
holes which are away from extremality. But in these
cases, it is not known if the numerical coefficient is
1/4 as in Hawking’s analysis.) Its primary strengths
are twofold. First, the calculation encompasses all 
realistic black holes — not just extremal or near- 
extremal — including the astrophysical ones, which 
may be highly distorted. Hairy black holes of 
mathematical physics and cosmological horizons 
are also encompassed. Second, in contrast to other 
approaches, one works directly with the physical, 
curved geometry around black holes rather than 
with a flat-space system which has the same number 
of states as the black hole of interest. 


The Big Bang 


Most of the work in physical cosmology is carried 
out using spatially homogeneous and isotropic 
models and perturbations thereon. Therefore, to 
explore the quantum nature of the big bang, it is 
natural to begin by assuming these symmetries. 
Then the spacetime metric is determined simply by 
the scale factor a(t) and matter fields $\phi(t)$ which
depend only on time. Thus, because of symmetries, 
one is left with only a finite number of degrees of 
freedom. Therefore, field-theoretic difficulties are 
bypassed and passage to quantum theory is simpli- 
fied. This strategy was introduced already in the late 
1960s and early 1970s by DeWitt and Misner. 
Quantum Einstein’s equations now reduce to a
single differential equation of the type


$\frac{\partial^2}{\partial a^2} \left( f(a) \Psi(a,\phi) \right) = \mathrm{const.}\ \hat{H}_\phi \Psi(a,\phi)$ [13]


on the wave function $\Psi(a,\phi)$, where $\hat{H}_\phi$ is the matter
Hamiltonian and f(a) reflects the freedom in factor
ordering. Since the scale factor a vanishes at the big
bang, one has to analyze the equation and its
solutions near a = 0. Unfortunately, because of the
standard form of the matter Hamiltonian, coefficients
in the equation diverge at a = 0 and the
evolution cannot be continued across the singularity
unless one introduces unphysical matter or a new
principle. A well-known example of new input is the
Hartle-Hawking boundary condition which posits
that the universe starts out without any boundary
and a metric with positive-definite signature and
later makes a transition to a Lorentzian metric.

Bojowald and others have shown that the situation
is quite different in loop quantum cosmology
because quantum geometry effects make a qualitative
difference near the big bang. As in older
quantum cosmologies, one carries out a symmetry
reduction at the classical level. The final result
differs from older theories only in minor ways. In
the homogeneous, isotropic case, the freedom in the
choice of the connection is encoded in a single
function c(t) and, in that of the momentum/triad, in
another function p(t). The scale factor is given by
$a^2 = |p|$. (The variable p itself can assume both signs;
positive if the triad is left handed and negative if it is
right handed. p vanishes at degenerate triads which
are permissible in this approach.) The system again 
has only a finite number of degrees of freedom. 
However, quantum theory turns out to be inequi- 
valent to that used in older quantum cosmologies. 

This surprising result comes about as follows.
Recall that in quantum geometry, one has well-defined
holonomy operators $\hat{h}$ but there is no
operator corresponding to the connection itself. In
quantum mechanics, the analog would be for
operators $\hat{U}(\lambda)$ corresponding to the classical functions
$\exp i\lambda x$ to exist but not be weakly continuous
in $\lambda$; the operator $\hat{x}$ would then not exist. Once the
requirement of weak continuity is dropped, von
Neumann’s uniqueness theorem no longer holds and
the Weyl algebra can have inequivalent irreducible
representations. The one used in loop quantum
cosmology is the direct analog of full quantum
geometry. While the space $\mathcal{A}$ of smooth connections
reduces just to the real line $\mathbb{R}$, the space $\bar{\mathcal{A}}$ of
generalized connections reduces to the Bohr compactification
$\bar{\mathbb{R}}_{\rm Bohr}$ of the real line. (This space was
introduced by the mathematician Harald Bohr (Niels’
brother) in his theory of almost-periodic functions.
It arises in the present application because holonomies
turn out to be almost periodic functions
of c.) The Hilbert space of states is thus
$\mathcal{H} = L^2(\bar{\mathbb{R}}_{\rm Bohr}, d\mu_o)$ where $\mu_o$ is the Haar measure
on (the abelian group) $\bar{\mathbb{R}}_{\rm Bohr}$. As in full quantum
geometry, the holonomies act by multiplication and
the triad/momentum operator $\hat{p}$ via Lie derivatives.

To facilitate comparison with older quantum
cosmologies, it is convenient to use a representation
in which $\hat{p}$ is diagonal. Then, quantum states are
functions $\Psi(p,\phi)$. But the Wheeler-DeWitt equation
is now replaced by a difference equation:


$C^+(p)\, \Psi(p + 4p_o, \phi) + C^o(p)\, \Psi(p, \phi) + C^-(p)\, \Psi(p - 4p_o, \phi) = \mathrm{const.}\ \hat{H}_\phi \Psi(p,\phi)$ [14]


where $p_o$ is determined by the lowest eigenvalue of the
area operator (“area gap”) and the coefficients $C^\pm(p)$
and $C^o(p)$ are functions of p. In a backward “evolution,”
given $\Psi$ at p + 4 and p, such a “recursion
relation” determines $\Psi$ at p − 4, provided $C^-$ does not
vanish at p − 4. The coefficients are well behaved and
nowhere vanishing, whence the evolution does not stop
at any finite p, either in the past or in the future. Thus,
near p = 0 this equation is drastically different from the
Wheeler-DeWitt equation [13]. However, for large p —
that is, when the universe is large — it is well
approximated by [13] and smooth solutions of [13] are
approximate solutions of the fundamental discrete
equation [14] in a precise sense.
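
The backward “evolution” generated by a three-term recursion of the form [14] can be sketched as follows. This is a toy illustration only: the coefficient functions passed in are hypothetical placeholders (the actual $C^\pm(p)$, $C^o(p)$ of loop quantum cosmology are not given in this article), and the right-hand side is simplified by assuming that $\mathrm{const.}\,\hat{H}_\phi$ acts on $\Psi$ as multiplication by a fixed number `rhs_factor`.

```python
def evolve_backward(psi_plus, psi_here, p, p_o, steps,
                    c_plus, c_zero, c_minus, rhs_factor=0.0):
    """Solve [14] repeatedly for Psi(p - 4 p_o).

    psi_plus and psi_here are Psi at p + 4 p_o and at p. Returns a list of
    (p, Psi(p)) pairs for the successively smaller values of p.
    Assumption: const * H_phi acts as multiplication by rhs_factor.
    """
    history = []
    for _ in range(steps):
        if c_minus(p) == 0:
            raise ZeroDivisionError("C^-(p) vanishes; the recursion stops here")
        psi_minus = ((rhs_factor - c_zero(p)) * psi_here
                     - c_plus(p) * psi_plus) / c_minus(p)
        p -= 4 * p_o
        history.append((p, psi_minus))
        # Shift the two-point window one step toward smaller p.
        psi_plus, psi_here = psi_here, psi_minus
    return history


# Hypothetical toy coefficients: a discretized second derivative.
toy = evolve_backward(2.0, 1.0, p=8.0, p_o=1.0, steps=3,
                      c_plus=lambda p: 1.0,
                      c_zero=lambda p: -2.0,
                      c_minus=lambda p: 1.0)
```

With these nowhere-vanishing toy coefficients the recursion passes through p = 0 and continues to negative p without obstruction, which is the qualitative feature emphasized in the text; only two initial values are needed because the relation is second order in the step $4p_o$.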

To complete quantization, one has to introduce a 
suitable Hilbert space structure on the space of 
solutions to [14], identify physically interesting 
operators and analyze their properties. For simple 
matter fields, this program has been completed. 
With this machinery at hand, one begins with 
semiclassical states which are peaked at configura- 
tions approximating the classical universe at late 
times (e.g., now) and evolves backwards. Numerical 
simulations show that the state remains peaked at 
the classical solution till very early times when the 
matter density becomes of the order of Planck 
density. This provides, in particular, a justification, 
from first principles, for the assumption that space- 
time can be taken to be classical even at the onset of 
the inflationary era, just a few Planck times after the 
(classical) big bang. While one would expect a result 
along these lines to hold on physical grounds, 
technically it is nontrivial to obtain semiclassicality 
over such huge domains. However, in the Planck 
regime near the big bang, there are major deviations 
from the classical behavior. Effectively, gravity 
becomes repulsive, the collapse is halted and then 
the universe re-expands. Thus, rather than modify- 
ing spacetime structure just in a tiny region near the 
singularity, quantum geometry effects open a bridge 
to another large classical universe. These are 
dramatic modifications of the classical theory. 

For over three decades, hopes have been expressed 
that quantum gravity would provide new insights 
into the true nature of the big bang. Thanks to 
quantum geometry effects, these hopes have been 
realized and many of the long-standing questions 
have been answered. While the final picture has 


some similarities with other approaches, (e.g., 
“cyclic universes,” or pre-big-bang cosmology), 
only in loop quantum cosmology is there a fully 
deterministic evolution across what was the classical 
big-bang. However, so far, detailed results have 
been obtained only in simple models. The major 
open issue is the inclusion of perturbations and 
subsequent comparison with observations. 


Introduction


Mathematics of classical gauge theories is contained 
in the theory of principal and associated vector 
bundles. Principal bundles describe pure gauge 
fields and their transformations, while the asso- 
ciated bundles contain matter fields. A structure 
group of a bundle has a meaning of a gauge group, 


while the base manifold is a spacetime for the 
theory. In this article, we review the theory of 
bundles in which a structure group is a quantum 
group and base space or spacetime might be 
noncommutative. To fully deal with geometric 
aspects, we first review differential geometry of 
quantum groups. Then we describe the theory of 
quantum principal bundles, connections on such 
bundles, gauge transformations, associated vector 
bundles and their sections. We indicate that, for a 
certain class of quantum principal bundles, sections 
of an associated bundle become vector bundles of 
noncommutative geometry à la Connes, that is, 
finite projective modules. The theory is illustrated
by two explicit examples that can be viewed as 
deformations of the classical magnetic monopole 
and the instanton. 


Differential Structures on Algebras 
Algebraic Conventions 


Throughout this article, A (P, etc.) will be an
associative unital complex algebra. To gain some 
geometric intuition the reader can think of A as an 
algebra of continuous complex functions on a 
compact (Hausdorff) space X, C(X), with product 
given by pointwise multiplication fg(x) =f(x)g(x), 
and with the unit provided by a constant function 
$x \mapsto 1$. The algebra C(X) is commutative, but, in
what follows, we do not assume that A is a 
commutative algebra. By an A-bimodule we mean 
a vector space with mutually commuting left and 
right actions of A. All modules are unital (i.e., the 
unit element of A acts trivially). On elements, the 
multiplication in an algebra or an action of A on a 
module is denoted by juxtaposition. 


Differential Calculus on an Algebra 


A first-order differential calculus on A is a pair
$(\Omega^1(A), d)$, where $\Omega^1(A)$ is an A-bimodule and
$d: A \to \Omega^1(A)$ is a linear map such that:


1. for all $a, b \in A$, $d(ab) = (da)b + a\,db$ (the Leibniz
rule); and

2. every $\omega \in \Omega^1(A)$ can be written as $\omega = \sum_i a_i\, db_i$ for
some $a_i, b_i \in A$.


Elements of $\Omega^1(A)$ are called differential 1-forms
and the map d is called an exterior derivative. As a
motivating example, take A = C(X) and $\Omega^1(A)$ the
space of 1-forms on X (sections of the cotangent
bundle T*X), and d the usual exterior differential.
Higher-differential forms corresponding to $(\Omega^1(A), d)$
are defined as elements of a differential graded
algebra $\Omega(A)$. This is an algebra which can be
decomposed into the direct sum of A-bimodules
$\Omega^n(A)$, that is, $\Omega(A) = A \oplus \Omega^1(A) \oplus \Omega^2(A) \oplus \cdots$. In
addition to $d: A \to \Omega^1(A)$, there are maps
$d_n: \Omega^n(A) \to \Omega^{n+1}(A)$ such that, for all $\omega_n \in \Omega^n(A)$,
$\omega_k \in \Omega^k(A)$,


1. $d_1 \circ d = 0$ and $d_{n+1} \circ d_n = 0$, n = 1, 2, ...;
2. $\omega_n \omega_k \in \Omega^{n+k}(A)$; and
3. $d_{n+k}(\omega_n \omega_k) = (d_n \omega_n)\omega_k + (-1)^n \omega_n (d_k \omega_k)$.


Elements of $\Omega^n(A)$ are known as “differential
n-forms.” $\Omega^n(A)$ contains all linear combinations
of expressions $a_0\, da_1\, da_2 \cdots da_n$ with $a_0, \ldots, a_n \in A$.
One says that $\Omega(A)$ satisfies the “density
condition” if any element of $\Omega^n(A)$ is of the
above form, for any n. To simplify notation, one
writes d for $d_n$.

As an example of $\Omega(A)$, take A = C(X) and then
the exterior algebra $\Omega(X)$ for $\Omega(A)$. The exterior
algebra satisfies the density condition as any n-form
can be written as $f(x)\, dg(x) \wedge dh(x) \wedge \cdots$. The
wedge product is anticommutative, but for a
noncommutative algebra A, the anticommutativity
of the product in $\Omega(A)$ cannot be generally
required.


The Universal Differential Calculus 


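Any algebra $A$ comes equipped with a universal differential calculus denoted by $(\Omega A, d)$. $\Omega^1 A$ is defined as the kernel of the multiplication map, that is, $\Omega^1 A := \{\sum_i a_i \otimes b_i \in A \otimes A \mid \sum_i a_i b_i = 0\} \subset A \otimes A$. The derivative is defined by $d(a) = 1 \otimes a - a \otimes 1$. The n-forms are defined as $\Omega^n A = \Omega^1 A \otimes_A \Omega^1 A \otimes_A \cdots \otimes_A \Omega^1 A$ (n copies of $\Omega^1 A$). $\Omega^n A$ can be identified with a subspace of $A \otimes A \otimes \cdots \otimes A$ (n+1 copies of $A$) consisting of all such elements that vanish upon multiplication of any two consecutive factors. With this identification, higher derivatives read

$$d\Big(\sum_i a_0^i \otimes a_1^i \otimes \cdots \otimes a_n^i\Big) = \sum_{k=0}^{n+1} (-1)^k \sum_i a_0^i \otimes \cdots \otimes a_{k-1}^i \otimes 1 \otimes a_k^i \otimes \cdots \otimes a_n^i.$$

The universal differential calculus satisfies the density condition.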
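The definitions above can be checked in a small example. The following sketch is an illustration only: it assumes $A$ is the algebra of 2×2 integer matrices, stores an element of $A \otimes A$ as a list of terms (coefficient, left factor, right factor), and compares tensors through the Kronecker product, which is injective on $M_2 \otimes M_2$. All helper names and sample matrices are hypothetical.

```python
# Universal first-order calculus on A = M_2(Z): d(a) = 1 (x) a - a (x) 1.
# A tensor in A (x) A is a list of (coefficient, left, right) terms.

ONE = [[1, 0], [0, 1]]

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron(x, y):
    # Kronecker product; a (x) b -> kron(a, b) is an injective linear map.
    m = len(y)
    size = len(x) * m
    return [[x[i // m][j // m] * y[i % m][j % m] for j in range(size)]
            for i in range(size)]

def d(a):
    return [(1, ONE, a), (-1, a, ONE)]

def left_act(a, form):
    # a . (x (x) y) = ax (x) y
    return [(c, matmul(a, l), r) for c, l, r in form]

def right_act(form, b):
    # (x (x) y) . b = x (x) yb
    return [(c, l, matmul(r, b)) for c, l, r in form]

def multiply(form):
    # The multiplication map A (x) A -> A, x (x) y -> xy.
    total = [[0, 0], [0, 0]]
    for c, l, r in form:
        p = matmul(l, r)
        total = [[total[i][j] + c * p[i][j] for j in range(2)] for i in range(2)]
    return total

def as_matrix(form):
    total = [[0] * 4 for _ in range(4)]
    for c, l, r in form:
        k = kron(l, r)
        total = [[total[i][j] + c * k[i][j] for j in range(4)] for i in range(4)]
    return total

a = [[1, 2], [3, 4]]
b = [[0, 1], [5, -2]]

# d(a) lies in the kernel of the multiplication map.
da_in_kernel = multiply(d(a)) == [[0, 0], [0, 0]]
# Leibniz rule: d(ab) = a.d(b) + d(a).b
leibniz_holds = as_matrix(d(matmul(a, b))) == as_matrix(
    left_act(a, d(b)) + right_act(d(a), b))
```

Both flags evaluate to true for any choice of `a` and `b`, since the two cross terms $a \otimes b$ cancel in $a\,db + (da)\,b$.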

This calculus captures very little (if any) of the geometry of the underlying algebra $A$, but it has the universality property, that is, any differential calculus on $A$ can be obtained as a quotient of $\Omega A$. In other words, any differential calculus $\Omega(A)$ is fully determined by a system of $A$-sub-bimodules $N_n \subset A^{\otimes(n+1)}$ (or homogeneous ideals in the algebra $\Omega A$), so that $\Omega^n(A) = \Omega^n A/N_n$. The differentials $d$ in $\Omega(A)$ are derived from universal differentials via the canonical projections $\pi_n: \Omega^n A \to \Omega^n(A)$.

Typical examples of algebras in quantum geometry are given by generators and relations, that is, $A = \mathbb{C}\langle x_1, \ldots, x_n\rangle/(R_i(x_1, \ldots, x_n))$, where $\mathbb{C}\langle x_1, \ldots, x_n\rangle$ is a free algebra on generators $x_i$ and $R_i(x_1, \ldots, x_n)$ are polynomials, so that $R_i(x_1, \ldots, x_n) = 0$ in $A$. Correspondingly, the modules $\Omega^n(A)$ are given by generators and relations. If $\Omega(A)$ satisfies the density condition, then the whole of $\Omega(A)$ must be generated by some 1-forms. The sub-bimodules $N_n$ contain relations satisfied by these generators.


*-Calculi


If $A$ is a *-algebra, then a calculus is called a “*-calculus” provided $\Omega(A)$ is a graded *-algebra, and $d(\rho^*) = (d\rho)^*$, for all $\rho \in \Omega(A)$.


Differential Structures on Quantum 
Groups (Hopf Algebras) 


Hopf Algebra Preliminaries 


From now on, $A$ is a Hopf algebra (quantum group), with a coproduct $\Delta: A \to A \otimes A$, counit $\varepsilon: A \to \mathbb{C}$ and antipode $S$. We use Sweedler's notation $\Delta(a) = \sum a_{(1)} \otimes a_{(2)}$. We also write $A^+ = \ker\varepsilon$ (the augmentation ideal).

For any algebra $P$, the convolution product of linear maps $f, g: A \to P$ is a linear map $f * g: A \to P$, defined by $(f * g)(a) = \sum f(a_{(1)})g(a_{(2)})$. A map $f: A \to P$ is said to be convolution invertible, provided there exists $f^{-1}: A \to P$ such that $f * f^{-1} = f^{-1} * f = 1 \circ \varepsilon$.
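The convolution product can be made concrete in a small commutative example. The sketch below is illustrative and rests on assumptions not taken from the text: $A$ is the Hopf algebra of functions on the cyclic group $\mathbb{Z}_3$ with basis $\delta_0, \delta_1, \delta_2$, coproduct $\Delta(\delta_k) = \sum_{i+j=k} \delta_i \otimes \delta_j$ and counit $\varepsilon(\delta_k) = \delta_{k,0}$, and $P$ is the field of rational numbers. A linear map $f: A \to P$ is stored as the list of its values on the basis.

```python
from fractions import Fraction

N = 3  # A = functions on Z_3, basis delta_0, delta_1, delta_2 (assumed example)

def convolve(f, g):
    """(f * g)(delta_k) = sum over i + j = k (mod N) of f(delta_i) g(delta_j)."""
    return [sum(f[i] * g[(k - i) % N] for i in range(N)) for k in range(N)]

# The convolution unit 1 o epsilon: epsilon(delta_k) is 1 for k = 0, else 0.
unit = [Fraction(1), Fraction(0), Fraction(0)]

f = [Fraction(1), Fraction(1), Fraction(0)]
# Candidate convolution inverse, found by solving f * g = unit by hand.
f_inv = [Fraction(1, 2), Fraction(-1, 2), Fraction(1, 2)]

left = convolve(f, f_inv)
right = convolve(f_inv, f)
```

Here `left` and `right` both equal `unit`, so `f` is convolution invertible in the sense defined above.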

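An $A$-coaction on a comodule $V$, $\varrho: V \to V \otimes A$, is denoted by $\varrho(v) = \sum v_{(0)} \otimes v_{(1)}$. The right adjoint coaction in $A$ is a map

$$\mathrm{Ad}: A \to A \otimes A, \quad \mathrm{Ad}(a) = \sum a_{(2)} \otimes (Sa_{(1)})a_{(3)}.$$

A subspace $B$ of $A$ is said to be “Ad-invariant” provided $\mathrm{Ad}(B) \subset B \otimes A$. For example, $A^+$ is such a space.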


Covariant Differential Calculi 


For Hopf algebras one can study calculi that are covariant with respect to $\Delta$. For $A = \mathbb{C}[G]$ (an algebra of functions on a Lie group), this corresponds to the covariance of a differential structure on $G$ with respect to regular representations.

A first-order differential calculus $\Omega^1(A)$ on a quantum group $A$ is said to be left-covariant, if there exists a linear map $\Delta_L: \Omega^1(A) \to A \otimes \Omega^1(A)$ (called a left coaction) such that, for all $a, b \in A$,

$$\Delta_L(a\,db) = \sum a_{(1)}b_{(1)} \otimes a_{(2)}\,db_{(2)}.$$

$\Omega^1(A)$ is called a right-covariant differential calculus if there exists a linear map $\Delta_R: \Omega^1(A) \to \Omega^1(A) \otimes A$ (called a right coaction) such that, for all $a, b \in A$,

$$\Delta_R(a\,db) = \sum a_{(1)}\,db_{(1)} \otimes a_{(2)}b_{(2)}.$$

If $\Omega^1(A)$ is both left- and right-covariant, it is called a “bicovariant differential calculus.” A bicovariant $\Omega^1(A)$ has a structure of a Hopf $A$-bimodule, that is, it is an $A$-bimodule and an $A$-bicomodule such that the coactions are compatible with actions.


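The universal calculus on $A$ is bicovariant with coactions

$$\Delta^U_R\Big(\sum_i a^i \otimes b^i\Big) = \sum_i a^i_{(1)} \otimes b^i_{(1)} \otimes a^i_{(2)}b^i_{(2)},$$

$$\Delta^U_L\Big(\sum_i a^i \otimes b^i\Big) = \sum_i a^i_{(1)}b^i_{(1)} \otimes a^i_{(2)} \otimes b^i_{(2)}.$$

Since $\Omega^1(A) = \Omega^1 A/N$ for an $A$-sub-bimodule $N \subset \Omega^1 A$, the calculus $\Omega^1(A)$ is left (resp. right) covariant if and only if $\Delta^U_L(N) \subset A \otimes N$ (resp. $\Delta^U_R(N) \subset N \otimes A$).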


The Woronowicz Theorems 


A form $\omega$ in a left-covariant differential calculus $\Omega^1(A)$ is said to be left-invariant provided $\Delta_L(\omega) = 1 \otimes \omega$. $\Omega^1(A)$ is a free $A$-module with basis given by left-invariant forms, that is, one can choose a set of left-invariant forms $\omega^i$ such that any 1-form $\rho$ can be uniquely written as a finite sum $\rho = \sum_i a_i\omega^i$, $a_i \in A$.

The first Woronowicz theorem states that there is a one-to-one correspondence between left-covariant calculi on $A$ and right $A$-ideals $\mathcal{Q} \subset A^+$. The correspondence is provided by the map

$$\kappa: A \otimes \mathcal{Q} \to N, \quad a \otimes q \mapsto \sum a\,Sq_{(1)} \otimes q_{(2)},$$

where $N$ is such that $\Omega^1(A) = (\Omega^1 A)/N$. The inverse of $\kappa$ reads $\kappa^{-1}(\sum_i a^i \otimes b^i) = \sum_i a^i b^i_{(1)} \otimes b^i_{(2)}$. The map $\kappa$ induces the map $\varpi: A^+/\mathcal{Q} \to \Omega^1(A)$, via $\varpi([a]) = [\kappa(1 \otimes a)]$, where $[-]$ denotes cosets in $A^+/\mathcal{Q}$ and in $\Omega^1(A) = (\Omega^1 A)/N$. This establishes a one-to-one correspondence between the space $\Lambda^1 = A^+/\mathcal{Q}$ and the space of left-invariant 1-forms in $\Omega^1(A)$. The dual space to $\Lambda^1$, that is, the space of linear functionals $\Lambda^1 \to \mathbb{C}$, is often termed a “quantum Lie algebra” or a “quantum tangent space” corresponding to a left-covariant calculus $\Omega^1(A)$. The dimension of $\Lambda^1$ is known as the dimension of $\Omega^1(A)$.

The definitions and analysis of right-covariant 
differential calculi are done in a symmetric manner. 
For a bicovariant calculus, a form w that is both left- 
and right-invariant, is termed a “bi-invariant” form. 

The second Woronowicz theorem states a one-to-one correspondence between bicovariant differential calculi and Ad-invariant right $A$-ideals $\mathcal{Q} \subset A^+$ (cf. the subsection “Hopf algebra preliminaries”). The correspondence is provided by the map $\kappa$ above. For the universal calculus, $\mathcal{Q}$ is trivial, and hence $\Lambda^1 = A^+ = \ker\varepsilon$.


Higher-order Bicovariant Calculi 


Given a first-order bicovariant calculus $\Omega^1(A)$, one constructs a braiding operator, known as the “Woronowicz braiding” $\sigma: \Omega^1(A) \otimes_A \Omega^1(A) \to \Omega^1(A) \otimes_A \Omega^1(A)$, by setting $\sigma(a\omega \otimes_A \eta) = a\eta \otimes_A \omega$ for all $a \in A$, and any left-invariant $\omega$ and right-invariant $\eta$, and then extending it $A$-linearly to the whole of $\Omega^1(A) \otimes_A \Omega^1(A)$. This operator satisfies the braid relation $(\mathrm{id} \otimes_A \sigma) \circ (\sigma \otimes_A \mathrm{id}) \circ (\mathrm{id} \otimes_A \sigma) = (\sigma \otimes_A \mathrm{id}) \circ (\mathrm{id} \otimes_A \sigma) \circ (\sigma \otimes_A \mathrm{id})$, and is invertible provided the antipode $S$ is invertible. The Woronowicz braiding is used to define symmetric forms as those invariant under $\sigma$. One then defines exterior 2-forms as elements of $\Omega^1(A) \otimes_A \Omega^1(A)/\ker(\mathrm{id} - \sigma)$, and introduces the wedge product. The wedge product is not in general anticommutative, but one does have $\omega \wedge \eta = -\eta \wedge \omega$ for bi-invariant $\omega, \eta$. This construction is extended to higher forms and leads to the definition of the exterior algebra $\Omega(A)$. To define exterior n-forms, one maps any permutation on n elements to the corresponding element of the braid group generated by $\sigma$ and then takes the quotient of the nth tensor power of $\Omega^1(A)$ by all elements corresponding to even permutations.

The differential $d: A \to \Omega^1(A)$ is extended to an exterior differential in the whole of $\Omega(A)$ in the following way. First, $\Omega^1(A)$ is extended by a one-dimensional $A$-bimodule generated by a form $\theta$ that is required to be bi-invariant. The resulting extended bimodule (which, in general, is not a first-order differential calculus, as $\theta$ is not necessarily of the form $\sum_i a_i\,db_i$, for some $a_i, b_i \in A$) is then determined from the relation $da = \theta a - a\theta$ for all $a \in A$. The higher exterior derivative is then defined by $d\rho = \theta \wedge \rho - (-1)^n \rho \wedge \theta$, for any $\rho \in \Omega^n(A)$.

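The algebra $\Omega(A)$ is a $\mathbb{Z}_2$-graded differential Hopf algebra, that is, it has a coproduct such that

$$\Delta(\omega \wedge \eta) = \sum (-1)^{|\omega_{(2)}||\eta_{(1)}|}\, \omega_{(1)} \wedge \eta_{(1)} \otimes \omega_{(2)} \wedge \eta_{(2)},$$

where $|\omega_{(1)}|$ etc., denotes the degree of a homogeneous component in the decomposition of $\Delta(\omega)$. Furthermore,

$$\Delta(d\omega) = \sum \big(d\omega_{(1)} \otimes \omega_{(2)} + (-1)^{|\omega_{(1)}|}\omega_{(1)} \otimes d\omega_{(2)}\big).$$

On the 1-forms this coproduct is simply the sum $\Delta_L + \Delta_R$.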


Classification 


There is no unique covariant differential calculus on $A$, so classification of covariant differential calculi is an important problem. For example, it is known that the quantum group $SU_q(2)$ admits a left-covariant three-dimensional calculus, but there is no three-dimensional bicovariant calculus. On the other hand, there are two four-dimensional bicovariant calculi on $SU_q(2)$. Differential calculi are classified for standard quantum groups such as $SL_q(N)$ or $Sp_q(N)$.

General classification results are based on the equivalence between the category of Hopf bimodules of a finite-dimensional Hopf algebra $A$ and that of Yetter-Drinfeld or crossed modules of $A$. These are the modules of the Drinfeld double of $A$. As a result, in the case of a finite-dimensional factorizable coquasitriangular Hopf algebra $A$ with a dual Hopf algebra $H$, the bicovariant $\Omega^1(A)$ are in one-to-one correspondence with two-sided ideals in $H^+$. If, in addition, $A$ is semisimple, then (coirreducible) calculi are in one-to-one correspondence with nontrivial irreducible representations of $H$. This can be extended to infinite-dimensional algebras, provided one works over a field of formal power series in the deformation parameter.


Quantum Group Principal Bundles 
Quantum Principal Bundles 


In classical geometry, a (topological) principal bundle is a locally compact Hausdorff space with a (continuous) free and proper action of a locally compact group (e.g., a Lie group). In terms of algebras of functions this gives rise to the following structure. $A$ is a Hopf algebra (the model is functions on a group $G$), $P$ is a right $A$-comodule algebra with a coaction $\Delta_P: P \to P \otimes A$ (the model is functions on a total space $X$). Let $B = \{b \in P \mid \Delta_P(b) = b \otimes 1\}$ be the coinvariant subalgebra (the model is functions on a base manifold $M = X/G$). Fix a bicovariant calculus $\Omega^1(A)$, with the corresponding $\mathcal{Q}$ and $\Lambda^1 = A^+/\mathcal{Q}$ as in the subsection “The Woronowicz theorems.” Take a differential calculus $\Omega^1(P) = \Omega^1 P/N_P$ such that:


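1. $\Delta_{\Omega^1 P}(N_P) \subset N_P \otimes A$, where for all $\sum_i p^i \otimes q^i \in \Omega^1 P$,

$$\Delta_{\Omega^1 P}\Big(\sum_i p^i \otimes q^i\Big) = \sum_i p^i_{(0)} \otimes q^i_{(0)} \otimes p^i_{(1)}q^i_{(1)} \in \Omega^1 P \otimes A;$$

2. $\chi(N_P) \subset P \otimes \mathcal{Q}$, where

$$\chi: \Omega^1 P \to P \otimes A^+, \quad \sum_i p^i \otimes q^i \mapsto \sum_i p^i\Delta_P(q^i) = \sum_i p^i q^i_{(0)} \otimes q^i_{(1)};$$

3. $N_B = N_P \cap \Omega^1 B$ gives rise to a differential structure $\Omega^1(B) = \Omega^1 B/N_B$ on $B$.

Condition (1) ensures that $\Delta_{\Omega^1 P}$ descends to a coaction $\Delta_{\Omega^1(P)}: \Omega^1(P) \to \Omega^1(P) \otimes A$, while (2) allows for defining a map

$$\mathrm{ver}: \Omega^1(P) \to P \otimes \Lambda^1, \quad \mathrm{ver}([\omega]) = [\chi(\omega)].$$

Since $B$ is a subalgebra of $P$, the $P$-bimodule

$$P\Omega^1(B)P := \Big\{\sum_i p^i(db^i)q^i \;\Big|\; p^i, q^i \in P,\ b^i \in B\Big\}$$

is a sub-bimodule of $\Omega^1(P)$, known as horizontal forms. $P$ is called a “quantum principal bundle” over $B$ with quantum structure group $A$ and calculi $\Omega^1(A)$ and $\Omega^1(P)$ provided the following sequence

$$0 \to P\Omega^1(B)P \to \Omega^1(P) \xrightarrow{\ \mathrm{ver}\ } P \otimes \Lambda^1 \to 0$$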

is exact. This definition reflects the geometric 
content of principal bundles, but is not restricted 
to any specific differential calculus. The surjectivity 
of ver corresponds to the freeness of the (co)action, 
while the condition $\ker(\mathrm{ver}) = P\Omega^1(B)P$ corresponds
to identification of vertical vector fields as those that 
are annihilated by horizontal forms. 


The Universal Calculus Case 


In the universal calculus case, both $N_P$ and $\mathcal{Q}$ in the previous subsection are trivial, and $\mathrm{ver} = \chi$. Universal horizontal forms $P(\Omega^1 B)P$ coincide with the kernel of the canonical projection $P \otimes P \to P \otimes_B P$. The exactness of the sequence in the last subsection is equivalent to the requirement that the map

$$\mathrm{can}: P \otimes_B P \to P \otimes A, \quad p \otimes_B q \mapsto p\Delta_P(q) = \sum pq_{(0)} \otimes q_{(1)}$$

be bijective. In algebra, such an inclusion of algebras $B \subset P$ is known as a Hopf-Galois extension. Thus, a geometric notion of a quantum principal bundle with the universal calculus is the same as the algebraic notion of a Hopf-Galois extension.

If (2) in the previous subsection is replaced by stronger conditions $\chi(N_P) = P \otimes \mathcal{Q}$ and $(N_P \cap \ker\chi) \subset P(\Omega^1 B)P$, then exactness of the sequence in the previous subsection is equivalent to the bijectivity of “can.” Thus, although defined in a purely algebraic way, the notion of a Hopf-Galois extension carries deep geometric meaning. It therefore makes sense to consider primarily Hopf-Galois extensions and then specify differential structure in such a way that this stronger version of (2) is satisfied. Henceforth, unless specified otherwise, a quantum principal bundle is taken with the universal differential calculus.


Quantum Homogeneous Bundles 


Suppose that $P$ is a Hopf algebra, and that there is a Hopf algebra surjection $\pi: P \to A$. This induces a coaction of $A$ on $P$ via $\Delta_P = (\mathrm{id} \otimes \pi) \circ \Delta$, where now $\Delta$ is a coproduct in $P$. $P$ is a quantum principal $A$-bundle over the coinvariants $B$, provided $\ker\pi \subset B^+P$, where $B^+ = B \cap P^+$. $B$ is a left quantum homogeneous space in the sense that $\Delta(B) \subset P \otimes B$, and $P$ is known as a quantum homogeneous bundle. An example of this is the standard quantum 2-sphere, a quantum homogeneous space of $SU_q(2)$ (see the subsection “The Dirac q-monopole”). This construction reflects the classical construction of a principal bundle over a homogeneous space, since every homogeneous space of a group $G$ can be identified with a quotient $G/H$, where $H \subset G$ is a subgroup. Not every quantum homogeneous space can be obtained in this way (e.g., nonstandard quantum 2-spheres), as quantum groups $P$ do not have sufficiently many quantum subgroups $A$ (in the sense of Hopf algebra projections $\pi: P \to A$). To study gauge theory on general quantum homogeneous spaces, a more general notion of a bundle needs to be developed (see the subsection “Generalizations of quantum principal bundles”).

A general differential calculus on a quantum homogeneous bundle is specified by choosing a left-covariant calculus on $P$ with an ideal $\mathcal{Q}_P \subset P^+$ such that $(\mathrm{id} \otimes \pi) \circ \mathrm{Ad}(\mathcal{Q}_P) \subset \mathcal{Q}_P \otimes A$. A bicovariant calculus on $A$ is then given by $\mathcal{Q}_A = \pi(\mathcal{Q}_P)$.


Quantum Trivial Bundles 


A quantum principal bundle (with the universal differential calculus) is said to be “trivial” or “cleft” provided there exists a linear map $\Phi: A \to P$ such that

1. $\Phi(1) = 1$ (unitality);

2. $\Delta_P \circ \Phi = (\Phi \otimes \mathrm{id}) \circ \Delta$ (colinearity or covariance); and

3. $\Phi$ is convolution invertible (cf. the subsection “Hopf algebra preliminaries”).

$\Phi$ is called a trivialization. In this case, $P$ is isomorphic to $B \otimes A$ as a left $B$-module and right $A$-comodule via the map $B \otimes A \to P$, $b \otimes a \mapsto b\Phi(a)$. In particular, an $A$-covariant (i.e., colinear) algebra map $j: A \to P$ is a trivialization (the convolution inverse of $j$ is $j \circ S$).

Based on trivial bundles, locally trivial bundles 
can be constructed by choosing a compatible cover- 
ing of B (in terms of ideals). 

At this point, the reader should be warned that the notion of a trivial quantum principal bundle includes bundles which are not trivial classically (i.e., do not correspond to functions on the Cartesian product of spaces). As an example, consider the Möbius strip viewed as a $\mathbb{Z}_2$-principal bundle over the circle $S^1$. Obviously, this is not a trivial bundle (the Möbius strip is not isomorphic to $S^1 \times \mathbb{Z}_2$). It can be shown, however, that the quantum principal bundle corresponding to the Möbius strip has a trivialization $\Phi$ in the above sense.


Generalizations of Quantum Principal Bundles 


In the case of the majority of quantum homogeneous spaces, the map $\pi$ in the subsection “Quantum homogeneous bundles” is a coalgebra and right $P$-module map, but not an algebra map. Thus, the induced coaction is not an algebra map either. To cover examples like these, one needs to introduce a generalization of quantum principal bundles. Consider an algebra $P$ that is also a right comodule of a coalgebra $C$ with coaction $\Delta_P$. Define

$$B = \Big\{b \in P \;\Big|\; \forall p \in P,\ \Delta_P(bp) = b\Delta_P(p) = \sum bp_{(0)} \otimes p_{(1)}\Big\}.$$

$B$ is a subalgebra of $P$. $P$ is a principal coalgebra-bundle over $B$, or $B \subset P$ is a coalgebra-Galois extension, provided the map

$$\mathrm{can}: P \otimes_B P \to P \otimes C, \quad p \otimes_B q \mapsto p\Delta_P(q) = \sum pq_{(0)} \otimes q_{(1)}$$

is bijective. This purely algebraic requirement induces a rich symmetry structure on $P$, given in terms of entwining, which allows one to develop various differential geometric notions such as those discussed in the next section. The lack of space does not permit us to describe this theory here.


Connections, Gauge Transformations, 
Matter Fields 


Connections and Connection Forms 


A “connection” in a quantum principal bundle with calculi $\Omega^1(P)$, $\Omega^1(A)$ is a left $P$-linear map $\Pi: \Omega^1(P) \to \Omega^1(P)$ such that:

1. $\Pi \circ \Pi = \Pi$ ($\Pi$ is a projection);

2. $\ker\Pi = P\Omega^1(B)P$; and

3. $\Delta_{\Omega^1(P)} \circ \Pi = (\Pi \otimes \mathrm{id}) \circ \Delta_{\Omega^1(P)}$ (colinearity or covariance).

The exact sequence in the subsection “Quantum principal bundles” implies that $\Pi$ is a left $P$-linear projection if and only if there exists a left $P$-linear map $\sigma: P \otimes \Lambda^1 \to \Omega^1(P)$ such that $\mathrm{ver} \circ \sigma = \mathrm{id}$. Since $\sigma$ is left $P$-linear, it is fully specified by its action on $\Lambda^1$. This leads to the equivalent definition of a connection as a connection form or a gauge field, that is, a map $\omega: \Lambda^1 \to \Omega^1(P)$ such that:

1. for all $\lambda \in \Lambda^1$, $\mathrm{ver}(\omega(\lambda)) = 1 \otimes \lambda$; and

2. $\Delta_{\Omega^1(P)} \circ \omega = (\omega \otimes \mathrm{id}) \circ \mathrm{Ad}_{\Lambda^1}$ (Ad-covariance), where $\mathrm{Ad}_{\Lambda^1}$ is a projection of the adjoint coaction to $\Lambda^1$, that is, $\mathrm{Ad}_{\Lambda^1}([a]) = [\mathrm{Ad}(a)]$ (well defined, because $\mathcal{Q}$ is Ad-invariant for a bicovariant calculus, see the subsection “The Woronowicz theorems”).


The correspondence between connections and connection 1-forms is given by the formula

$$\Pi(p\,dq) = \sum pq_{(0)}\,\omega([q_{(1)}]).$$

In the universal differential calculus case, $\Lambda^1 = A^+$, hence $\omega$ can be viewed as a map $\omega: A \to \Omega^1 P$ such that $\omega(1) = 0$. The map $F_\omega: A \to \Omega^2 P$, given by $F_\omega = d\omega + \omega * \omega$, is called a “curvature” of $\omega$. The curvature satisfies the Bianchi identity, $dF_\omega = F_\omega * \omega - \omega * F_\omega$.

In the case of a trivial bundle with trivialization $\Phi$ and universal calculus, any linear map $\beta: A \to \Omega^1 B$ such that $\beta(1) = 0$ defines a connection 1-form

$$\omega = \Phi^{-1} * d\Phi + \Phi^{-1} * \beta * \Phi.$$

The corresponding curvature is $F_\omega = \Phi^{-1} * F_\beta * \Phi$, where $F_\beta = d\beta + \beta * \beta$.

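In the case of a quantum homogeneous bundle with calculus determined by $\mathcal{Q}_P \subset P^+$ and $\mathcal{Q}_A = \pi(\mathcal{Q}_P)$ (cf. the subsection “Quantum homogeneous bundles”), a canonical connection form can be assigned to any algebra map $i: A \to P$ such that

1. $\pi \circ i = \mathrm{id}$ ($i$ splits $\pi$);

2. $\varepsilon_P \circ i = \varepsilon_A$ (co-unitality);

3. $(\mathrm{id} \otimes \pi) \circ \mathrm{Ad}_P \circ i = (i \otimes \mathrm{id}) \circ \mathrm{Ad}_A$ (Ad-covariance); and

4. $i(\mathcal{Q}_A) \subset \mathcal{Q}_P$ (differentiability).


Covariant Derivative: Strong Connections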


A covariant derivative associated to a connection $\Pi$ is a map $D: P \to P\Omega^1(B)P$, $p \mapsto dp - \Pi(dp)$. A covariant derivative maps elements of $P$ into horizontal forms, since $\ker\Pi = P\Omega^1(B)P$, and satisfies the Leibniz rule $D(bp) = (db)p + bDp$, for all $b \in B$, $p \in P$.

A connection is “strong” provided $D(p) \in \Omega^1(B)P$. A covariant derivative of a strong connection is a connection on the module $P$ in the sense of Connes. Furthermore, in the universal calculus case, and when $A$ has invertible antipode, the existence of strong connections leads to a rich gauge theory of associated bundles (cf. the subsection “Associated bundles: matter fields”). A connection in a trivial bundle described in the subsection “Connections and connection forms” is strong (and every strong connection in a trivial bundle is of this form). Assuming invertibility of the antipode in $A$, a canonical connection in a quantum homogeneous bundle described in that subsection is strong provided Ad-covariance (3) is replaced by conditions $(\mathrm{id} \otimes \pi) \circ \Delta \circ i = (i \otimes \mathrm{id}) \circ \Delta_A$ (right covariance) and $(\pi \otimes \mathrm{id}) \circ \Delta \circ i = (\mathrm{id} \otimes i) \circ \Delta_A$ (left covariance), where $\Delta$ is a coproduct in $P$, and $\Delta_A$ is a coproduct in $A$.

In the universal calculus case, the map $D$ can be extended to a map $D: \Omega^1 P \to \Omega^2 P$ via the formula $D(\rho) = d\rho + \sum \rho_{(0)}\omega(\rho_{(1)})$. Then $D \circ D(p) = \sum p_{(0)}F_\omega(p_{(1)})$, where $F_\omega$ is the curvature of $\omega$ (cf. the subsection “Connections and connection forms”). This explains the relationship between a curvature understood as the square of a covariant derivative and $F_\omega$.


Bundle Automorphisms and Gauge 
Transformations 


A quantum bundle automorphism is a left $B$-linear right $A$-covariant (i.e., colinear) automorphism $F: P \to P$ such that $F(1) = 1$. Bundle automorphisms form a group with operation $FG = G \circ F$. This group is isomorphic to the group $\mathcal{G}(P)$ of gauge transformations, that is, maps $f: A \to P$ that satisfy the following conditions:

1. $f(1_A) = 1_P$ (unitality);

2. $\Delta_P \circ f = (f \otimes \mathrm{id}) \circ \mathrm{Ad}$ (Ad-covariance); and

3. $f$ is convolution invertible (cf. the subsection “Hopf algebra preliminaries”).

The product in $\mathcal{G}(P)$ is the convolution product (cf. the subsection “Hopf algebra preliminaries”). The group of gauge transformations acts on the space of (strong) connection forms $\omega$ via the formula

$$f \triangleright \omega = f * \omega * f^{-1} + f * df^{-1}, \quad \forall f \in \mathcal{G}(P).$$

This resembles the gauge transformation law of a gauge field in the standard gauge theory. The curvature transforms covariantly as $F_{f \triangleright \omega} = f * F_\omega * f^{-1}$.

In the case of a trivial principal bundle, gauge transformations correspond to a change of the trivialization and can be identified with convolution-invertible maps $\gamma: A \to B$ such that $\gamma(1) = 1$. A map $\beta: A \to \Omega^1 B$ that induces a connection as in the subsection “Connections and connection forms” is transformed to $\gamma * \beta * \gamma^{-1} + \gamma * d\gamma^{-1}$, and the curvature $F_\beta \mapsto \gamma * F_\beta * \gamma^{-1}$.


Associated Bundles: Matter Fields 


Given a right $A$-comodule (corepresentation) $\varrho: V \to V \otimes A$ one defines a quantum vector bundle associated to $P$ as

$$E = \Big\{\sum_i v^i \otimes p^i \in V \otimes P \;\Big|\; \sum_i v^i_{(0)} \otimes p^i_{(0)} \otimes v^i_{(1)}p^i_{(1)} = \sum_i v^i \otimes p^i \otimes 1\Big\} \subset V \otimes P.$$

$E$ is a right $B$-module with product $(\sum_i v^i \otimes p^i)b = \sum_i v^i \otimes p^i b$. A right $B$-linear map $s: E \to B$ is called a section of $E$. The space of sections $\Gamma(E)$ is a left $B$-module via $(bs)(p) = bs(p)$.

The theory of associated bundles is particularly rich when $A$ has a bijective antipode and $P$ has a strong connection form $\omega$. In this case, $\Gamma(E)$ is isomorphic to the left $B$-module $\Gamma_\varrho$ of maps $\phi: V \to P$ such that $\Delta_P \circ \phi = (\phi \otimes \mathrm{id}) \circ \varrho$. If $V$ is finite dimensional, then $\Gamma_\varrho$ is a finite projective $B$-module, that is, it is a module of sections of a noncommutative vector bundle in the sense of Connes. The strong connection induces a map $\nabla: \Gamma_\varrho \to \Omega^1 B \otimes_B \Gamma_\varrho$, given by $\nabla(\phi)(v) = d\phi(v) + \sum \phi(v_{(0)})\omega(v_{(1)})$. $\nabla$ is a connection in the sense of Connes (in a projective left $B$-module), that is, for all $b \in B$, $\phi \in \Gamma_\varrho$, $\nabla(b\phi) = db \otimes_B \phi + b\nabla(\phi)$.

In the case of a trivial bundle, $\Gamma_\varrho$ can be identified with the space of linear maps $V \to B$. Thus, sections of an associated bundle correspond to pullbacks of matter fields: in classical local gauge theory, matter fields are defined as functions on a spacetime with values in a representation (vector) space of the gauge group.


The Dirac g-Monopole 


This is an example of a strong connection in a quantum homogeneous bundle (cf. the subsection “Quantum homogeneous bundles”). $P = SU_q(2)$ is a matrix Hopf *-algebra with matrix of generators

$$\begin{pmatrix} a & -qc^* \\ c & a^* \end{pmatrix}$$

and relations

$$ac = qca, \quad ac^* = qc^*a, \quad cc^* = c^*c,$$

$$a^*a + c^*c = 1, \quad aa^* + q^2cc^* = 1,$$

where $q$ is a real parameter. $A = \mathbb{C}[U(1)]$ is a Hopf *-algebra generated by unitary and group-like $u$ (i.e., $uu^* = u^*u = 1$, $\Delta(u) = u \otimes u$). The *-projection $\pi: P \to A$ is defined by $\pi(a) = u$ (and $\pi(c) = 0$). The coinvariant subalgebra $B$ is generated by $x = cc^*$, $z = ac^*$, $z^* = ca^*$. The elements $x$ and $z$ satisfy relations

$$x^* = x, \quad zx = q^2xz,$$

$$zz^* = q^2x(1 - q^2x), \quad z^*z = x(1 - x).$$

Thus, $B$ is the algebra of functions on the standard quantum 2-sphere. A strong connection is obtained from a bicovariant *-map $i: A \to P$ given by $i(u^n) = a^n$ (cf. the subsections “Quantum homogeneous bundles,” “Connections and connection forms,” and “Covariant derivative: strong connections”). Explicitly, the connection form reads


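$$\omega(u^n) = \sum_{k=0}^{n} \binom{n}{k}_{q^{-2}} (c^*)^k (a^*)^{n-k}\, d(a^{n-k}c^k),$$

$$\omega(u^{*n}) = \sum_{k=0}^{n} q^{2k}\binom{n}{k}_{q^{-2}} a^{n-k}c^k\, d\big((c^*)^k (a^*)^{n-k}\big),$$

where the deformed binomial coefficients are defined for any number $x$ by

$$\binom{n}{k}_x = \frac{(x^n - 1)(x^{n-1} - 1) \cdots (x^{k+1} - 1)}{(x^{n-k} - 1)(x^{n-k-1} - 1) \cdots (x - 1)}.$$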

There is a family $V_n$, $n \in \mathbb{Z}$, of one-dimensional corepresentations of $\mathbb{C}[U(1)]$ with $V_n = \mathbb{C}$ and $\varrho_n(1) = 1 \otimes u^n$ for $n \geq 0$ and $\varrho_n(1) = 1 \otimes (u^*)^{-n}$ for $n < 0$. This leads to the family of finite projective modules $\Gamma_{\varrho_n} = \Gamma_n$ as described in the subsection “Associated bundles: matter fields.” The Hermitian projectors $e(n)$ of these modules come out as, for $n > 0$,

$$e(n)_{ij} = \sqrt{\binom{n}{i}_{q^{-2}}\binom{n}{j}_{q^{-2}}}\; a^{n-i}c^i (c^*)^j (a^*)^{n-j}, \quad i, j = 0, 1, \ldots, n,$$

$$e(-n)_{ij} = q^{i+j}\sqrt{\binom{n}{i}_{q^{-2}}\binom{n}{j}_{q^{-2}}}\; (c^*)^i (a^*)^{n-i} a^{n-j} c^j, \quad i, j = 0, 1, \ldots, n.$$

The $e(n)$ describe q-monopoles of magnetic charge $-n$. For example, the charge-1 projector explicitly reads

$$\begin{pmatrix} 1 - x & z^* \\ z & q^2x \end{pmatrix}$$

and reduces to the usual charge-1 Dirac monopole projector when $q = 1$. The covariant derivatives $\nabla$ are Levi-Civita or Grassmann connections in modules $\Gamma_n$ corresponding to projectors $e(n)$.
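The classical limit of the charge-1 projector can be checked numerically. The sketch below is illustrative and assumes that at $q = 1$ the generators become commuting quantities, $x$ real in $[0, 1]$ and $z$ complex with $z^*z = x(1 - x)$, and that the projector then has rows $(1 - x, z^*)$ and $(z, x)$; the function names and sample values are hypothetical.

```python
import cmath
import math

def classical_monopole_projector(x, phase):
    """Charge-1 projector at q = 1, with z = sqrt(x(1 - x)) * exp(i * phase)."""
    z = math.sqrt(x * (1.0 - x)) * cmath.exp(1j * phase)
    return [[1.0 - x, z.conjugate()], [z, x]]

def idempotency_defect(e):
    """Largest entry of |e e - e| for a square matrix e."""
    n = len(e)
    return max(abs(sum(e[i][k] * e[k][j] for k in range(n)) - e[i][j])
               for i in range(n) for j in range(n))

e = classical_monopole_projector(0.3, 1.1)
defect = idempotency_defect(e)          # vanishes up to rounding
trace = (e[0][0] + e[1][1]).real        # rank of the line bundle
```

The defect vanishes up to rounding and the trace equals 1, as expected for a projector onto the sections of a line bundle over the 2-sphere.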


The g-Instanton 


This is an example of a coalgebra bundle and the associated vector bundle, which is a deformation of an instanton (with instanton number 1). $P = \mathbb{C}[S_q^7]$ is the *-algebra of polynomial functions on the quantum 7-sphere. As a *-algebra it is defined by generators $z_1, z_2, z_3, z_4$ and relations

$$z_i z_j = q z_j z_i \quad (\text{for } i < j), \qquad z_j^* z_i = q z_i z_j^* \quad (\text{for } i \neq j),$$

$$z_k^* z_k = z_k z_k^* + (1 - q^2)\sum_{j<k} z_j z_j^*, \qquad \sum_{k=1}^{4} z_k z_k^* = 1,$$

where $q \in \mathbb{R}$. The coaction of the *-Hopf algebra $A = SU_q(2)$ (cf. the previous subsection) on $P$ is constructed as follows. Start with the quantum group $U_q(4)$, generated by a matrix $t = (t_{ij})_{i,j=1}^{4}$, and view $\mathbb{C}[S_q^7]$ as a right quantum homogeneous space of $U_q(4)$ generated by the bottom row in $t$. Thus, there is a right coaction of $U_q(4)$ on $\mathbb{C}[S_q^7]$ obtained by the restriction of the coproduct in $U_q(4)$. Next, project $U_q(4)$ to $SU_q(2)$ by a suitable coideal and a right ideal in $U_q(4)$. The corresponding canonical surjection $r: U_q(4) \to SU_q(2)$ is a coalgebra map, characterized as a right $U_q(4)$-module map by $r(t_{11}t_{22} - qt_{12}t_{21}) = 1$ and


[u0 _  ( “n -un 
r(t) 7 k a Ág & ~ 


where $u = (u_{ij})_{i,j=1}^{2}$ is the matrix of generators of $SU_q(2)$ (cf. the previous subsection). When applied to the coaction of $U_q(4)$ on $\mathbb{C}[S_q^7]$, $r$ induces the required coaction $\Delta_P: \mathbb{C}[S_q^7] \to \mathbb{C}[S_q^7] \otimes SU_q(2)$. Explicitly, the coaction comes out on generators as $\Delta_P(z_i) = \sum_j z_j \otimes r(t_{ji})$. The coaction $\Delta_P$ is not an algebra map. The coinvariant subalgebra $B$ is a *-algebra generated by

$$a = z_1z_4^* - z_2z_3^*, \qquad b = z_1z_3 + q^{-1}z_2z_4, \qquad R = z_1z_1^* + z_2z_2^*.$$

The elements $a, a^*, b, b^*, R$ satisfy the following relations:

$$Ra = q^{-2}aR, \qquad Rb = q^2bR,$$

$$ab = q^3ba, \qquad ab^* = q^{-1}b^*a,$$

$$aa^* + q^2bb^* = R(1 - q^2R),$$

$$aa^* = q^2a^*a + (1 - q^2)R^2,$$

$$b^*b = q^4bb^* + (1 - q^2)R.$$
Hence $B$ can be understood as a deformation of the algebra of functions on the 4-sphere and is denoted by $\mathbb{C}[\Sigma_q^4]$. One can show that the map “can” in the subsection “Generalizations of quantum principal bundles” is bijective, hence there is an $SU_q(2)$-coalgebra principal bundle with the total space the quantum 7-sphere $\mathbb{C}[S_q^7]$ and the base space the quantum 4-sphere $\mathbb{C}[\Sigma_q^4]$. By abstract arguments that involve cosemisimplicity of $SU_q(2)$, one can prove that there exists a strong connection in this bundle; this is the q-deformed instanton field. At the time of writing this article, however, the explicit form of this connection is not known.

On the other hand, following the classical construction of an instanton, one can take the fundamental two-dimensional corepresentation $V = \mathbb{C}^2$ of $SU_q(2)$ and explicitly construct the q-instanton projection with instanton number 1. Writing $e_1, e_2$ for the basis of $V$, the coaction $\varrho: V \to V \otimes SU_q(2)$ is given by

$$\varrho(e_i) = \sum_j e_j \otimes u_{ji}.$$

The associated bundle (cf. the subsection “Associated bundles: matter fields”) is a finite projective left module over $\mathbb{C}[\Sigma_q^4]$. The corresponding q-instanton projector comes out as

$$\begin{pmatrix} q^2R & 0 & qa & q^2b \\ 0 & q^2R & qb^* & -q^3a^* \\ qa^* & qb & 1 - R & 0 \\ q^2b^* & -q^3a & 0 & 1 - q^4R \end{pmatrix}.$$
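The classical limit of this construction can be tested numerically. The sketch below is illustrative and rests on stated assumptions: at $q = 1$ the generators $z_1, \ldots, z_4$ are commuting complex coordinates of a point on the unit 7-sphere, the base coordinates are taken to be $a = z_1\bar z_4 - z_2\bar z_3$, $b = z_1z_3 + z_2z_4$, $R = |z_1|^2 + |z_2|^2$, and the projector is the $q = 1$ form of a 4×4 matrix with diagonal $(R, R, 1 - R, 1 - R)$. Function names and the sample point are hypothetical.

```python
import cmath
import math

def sphere_point(moduli, phases):
    """A point of the unit 7-sphere in C^4 from four moduli and four phases."""
    norm = math.sqrt(sum(m * m for m in moduli))
    return [m / norm * cmath.exp(1j * p) for m, p in zip(moduli, phases)]

def base_coordinates(z):
    """Classical (q = 1) coordinates a, b, R on the 4-sphere."""
    z1, z2, z3, z4 = z
    a = z1 * z4.conjugate() - z2 * z3.conjugate()
    b = z1 * z3 + z2 * z4
    R = abs(z1) ** 2 + abs(z2) ** 2
    return a, b, R

def classical_instanton_projector(a, b, R):
    ac, bc = a.conjugate(), b.conjugate()
    return [[R, 0, a, b],
            [0, R, bc, -ac],
            [ac, b, 1 - R, 0],
            [bc, -a, 0, 1 - R]]

def idempotency_defect(e):
    """Largest entry of |e e - e| for a square matrix e."""
    n = len(e)
    return max(abs(sum(e[i][k] * e[k][j] for k in range(n)) - e[i][j])
               for i in range(n) for j in range(n))

z = sphere_point([1.0, 2.0, 0.5, 1.5], [0.3, -1.2, 2.0, 0.7])
a, b, R = base_coordinates(z)
e = classical_instanton_projector(a, b, R)

sphere_defect = abs(abs(a) ** 2 + abs(b) ** 2 - R * (1 - R))  # 4-sphere relation
defect = idempotency_defect(e)
trace = sum(e[i][i] for i in range(4)).real                    # rank 2
```

Both defects vanish up to rounding and the trace is 2, consistent with a rank-two vector bundle over the 4-sphere.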


See also: Bicrossproduct Hopf Algebras and Noncommutative Spacetime; Hopf Algebras and q-Deformation Quantum Groups; Noncommutative Tori, Yang-Mills, and String Theory.


Introduction


When a current flows in a thin sample with a transverse magnetic field $B$, the Lorentz force deflects the trajectories of the charge carriers, producing an excess charge on one side and a charge deficiency on the other, and creating a potential difference across the conductor perpendicular to both the direct current and the magnetic field. This is known as the Hall effect, in honour of E H Hall, who, inspired by a remark of Maxwell, first demonstrated it in thin samples of gold foil in October 1879 (Hall's subsequent measurements of the potential difference showed that the carriers could be positively or negatively charged for different materials). A schematic diagram of Hall's experiment and the lateral separation of charges is shown in Figure 1.

Figure 1 Schematic diagram of charge separation in Hall's experiment (labels: magnetic field B, direct current).

Equilibrium is reached when the magnetic force balances that from the potential difference $E$ due to the displaced charge. When the charge carriers are electrons, with the electron density $n$, and the electron current $J$, this gives $neE = JB$. Comparison with Ohm's law, $J = \sigma E$, gives conductance (the reciprocal of resistance) to be $\sigma = ne/B$. More generally, considering the currents and fields as vectors, $\sigma$ is represented by a matrix. Rescaling by the sample thickness $\delta$, the diagonal components of $\delta\sigma$ give the direct conductivity $\sigma_\parallel$ and its off-diagonal elements give the Hall conductivity: $\sigma_H = \delta\sigma_{21}$. (For systems symmetric under 90° rotations, $\sigma_{11} = \sigma_{22}$ and $\sigma_{12} = -\sigma_{21}$.) In quantum theory, one usually works in terms of the filling fraction $\nu = n\delta h/eB$ and then $\sigma_H = \nu e^2/h$.
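The formulas above are easily evaluated. The following sketch is illustrative: it uses the exact SI values of $e$ and $h$, treats $n\delta$ as an areal carrier density, and checks that $\sigma_H = \nu e^2/h$ reproduces the classical value $n\delta e/B$. The function names and the sample density and field are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in C (exact SI value)
PLANCK = 6.62607015e-34      # Planck constant in J s (exact SI value)

def filling_fraction(areal_density, b_field):
    """nu = (n delta) h / (e B), with n delta in m^-2 and B in tesla."""
    return areal_density * PLANCK / (E_CHARGE * b_field)

def hall_conductivity(nu):
    """sigma_H = nu e^2 / h, in siemens."""
    return nu * E_CHARGE ** 2 / PLANCK

conductance_quantum = E_CHARGE ** 2 / PLANCK   # e^2/h in siemens
resistance_quantum = PLANCK / E_CHARGE ** 2    # h/e^2 in ohms

# Sample (hypothetical) values: areal density 3e15 m^-2 in a 10 T field.
n_delta, b_field = 3.0e15, 10.0
nu = filling_fraction(n_delta, b_field)
sigma_h = hall_conductivity(nu)
classical_sigma_h = n_delta * E_CHARGE / b_field
```

The quantity $h/e^2$ comes out at about 25 812.807 ohms, which sets the scale of the resistance steps in the quantum Hall effect.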

In 1980 von Klitzing, Dorda, and Pepper discovered that at very low temperatures in very high magnetic fields, the Hall conductivity $\sigma_H$ is quantized as integral multiples of $e^2/h$, a fact known as the integer quantum Hall effect (IQHE). The integer
multiples were accurate to 1 part in 10°, and the 
effect was exceptionally robust against changes in 
the geometry of the samples and in the experimental 
parameters. Indeed, the unprecedented accuracy of 
the effect led to its adoption as the international 
standard for resistance in 1990. 

More precisely, the Hall conductivity was no longer proportional to the filling fraction $\nu$, but the graph of $\sigma_H$ against $\nu$ displayed a sequence of jumps, as shown in Figure 2. In this figure, the conductivity has plateaux at the integer multiples of $e^2/h$, and jumps between them within fairly small ranges of the filling fraction. Moreover, the direct conductivity vanishes where the Hall conductivity takes its constant integral values.

These results raise numerous questions. 


1. Why does the conductivity take such precise 
integer values, and why are they so stable under 
changes of the geometry and physical 
parameters? 

2. Why does the direct conductivity vanish, except 
in regions where the Hall conductivity jumps 
between integer values, and how are such jumps 
possible? 


Figure 2 Schematic diagram of the Hall and direct conductivities plotted against the filling factor $\nu$.

Moreover, any theory must also explain why these features are not present under the more normal conditions of the classical Hall effect. The following features seem to play a role, and in the case of the first three, even in the classical effect.


1. As Hall discovered, the samples must be very thin 
to exhibit even the classical effect. (Nowadays 
they are often a surface layer between two 
semiconductors.) 

2. The samples are macroscopic and much larger 
than the quantum wavelengths appearing in the 
problem. 

3. The electric field is small enough that nonlinear 
effects are negligible. 

4. The quantum effect appears only at a very low 
temperature. 


The first of these suggests that we should idealize to the case where the motion of the charge carriers is restricted to a two-dimensional region, and the second that we may work in the thermodynamic limit where the conducting surface is the whole of $\mathbb{R}^2$. The third and fourth ensure both that the linear Ohm's law should be adequate, and also that it should be enough to consider the limiting cases of very weak electric fields and zero temperature. Multiple limits of this sort raise delicate mathematical issues. Indeed, many plausible models of the effect turn out, on careful analysis, to predict vanishing Hall conductivity.

A theoretical explanation of the quantization of 
the conductivity was soon suggested by Laughlin. 
Exploiting the apparent independence of sample 
geometry, he considered a cylindrical conductor 
where quantization followed on consideration of 
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the flux tubes threading it. Laughlin’s choice of a 
particular configuration precluded investigation of 
the influence of changing geometry. This was soon 
provided by Thouless, Kohmoto, Nightingale, and 
de Nijs, who argued (from a lattice version of the 
problem) that the conductivity could be identified 
with the Chern character of a line bundle over a 
Brillouin zone (a quotient of momentum space by 
the action of the reciprocal crystal lattice), so that it 
had to be integral and the stability of the effect was 
a consequence of the topological nature of σ.
Unfortunately, whilst suggestive, this explanation 
worked only under the physically implausible con- 
straint that the magnetic flux through a crystal cell 
was rational, offered no explanation of the link 
between the Hall and direct conductivities, and, 
working with a periodic Hamiltonian, made no 
allowance for the impurities and disorder usually 
important in solid-state problems of this sort. 
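
The integrality of a Chern number over a Brillouin zone can be checked numerically on a toy lattice model. The sketch below is an illustration under assumptions, not the construction of Thouless et al.: it uses a hypothetical two-band Hamiltonian h(k) = sin(k_x) σ_x + sin(k_y) σ_y + (m + cos k_x + cos k_y) σ_z (often called the Qi-Wu-Zhang model, which the text does not discuss), and a discretized plaquette sum of Berry phases (the Fukui-Hatsugai-Suzuki approach) for its lower band. The function names and the grid size are choices made here for illustration.

```python
import cmath
import math

def lower_band_state(kx, ky, m):
    """Normalized lower-band eigenvector of the assumed model h(k) = d(k) . sigma."""
    dx, dy, dz = math.sin(kx), math.sin(ky), m + math.cos(kx) + math.cos(ky)
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Two unnormalized eigenvectors for eigenvalue -r; each one vanishes at a pole
    # of the d-sphere, so take the one with larger norm. The plaquette product
    # below is gauge invariant, so this pointwise choice does not affect the result.
    v1 = (complex(dz - r), complex(dx, dy))
    v2 = (complex(-dx, dy), complex(dz + r))
    v = max(v1, v2, key=lambda w: abs(w[0]) ** 2 + abs(w[1]) ** 2)
    norm = math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)
    return (v[0] / norm, v[1] / norm)

def overlap(u, v):
    """Hermitian inner product <u|v> of two 2-component states."""
    return u[0].conjugate() * v[0] + u[1].conjugate() * v[1]

def chern_number(m, n=24):
    """Lattice Chern number of the lower band on an n x n grid of the torus."""
    step = 2 * math.pi / n
    states = [[lower_band_state(i * step, j * step, m) for j in range(n)]
              for i in range(n)]
    total_flux = 0.0
    for i in range(n):
        for j in range(n):
            u00 = states[i][j]
            u10 = states[(i + 1) % n][j]
            u11 = states[(i + 1) % n][(j + 1) % n]
            u01 = states[i][(j + 1) % n]
            # Berry phase around one plaquette, taken in (-pi, pi].
            loop = (overlap(u00, u10) * overlap(u10, u11)
                    * overlap(u11, u01) * overlap(u01, u00))
            total_flux += cmath.phase(loop)
    return round(total_flux / (2 * math.pi))
```

Varying m continuously leaves the returned integer unchanged until the gap of the assumed model closes (at m = 0 and m = ±2), which mirrors the stability argument in the text.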

Notwithstanding these deficiencies, this model con- 
tained important insights, which inspired Bellissard 
to model the effect using Connes’ newly developed 
noncommutative geometry (Bellissard 1986, Connes 
1986). (Kunz produced a Hilbert space theory at about 
the same time, but that has been rather less influential.) 
Connes’ work turned out to contain all the relevant 
concepts and tools needed to provide a good under- 
standing of the effect, based on interpreting the 
conductivity as a noncommutative Chern character 
for a noncommutative version of the Brillouin zone. In 
fact, the techniques of noncommutative geometry 
seemed to fit the quantum Hall effect so well that 
this has become one of the standard examples of the 
theory. 

Even whilst the theorists were struggling to 
explain the experiments, observations by Tsui, 
Störmer, and Gossard showed that, with suitable
care, fractional Hall conductivities could also be
observed, although these were far less stable than
those given by integers. One therefore distinguishes
between the IQHE and the fractional quantum Hall
effect (FQHE), and this survey concentrates largely
on the former. One simplifying feature of the IQHE 
is that it seems to be comprehensible at the level of 
individual noninteracting electrons, whereas the 
FQHE certainly involves some kind of interaction 
and many-body theory. 

This article presents an outline of the connection 
between noncommutative differential geometry and 
the IQHE, and concludes by discussing some of the 
approaches to the FQHE, and some other applica- 
tions of noncommutative geometry and mathema- 
tical directions suggested by the theory. The sections 
alternate between the physical model and the 
mathematical abstraction from it. 


There are good surveys of the area (Bellissard
et al. 1994, McCann 1998) explaining both how the
mathematical model arises out of the physics and the
mathematical models themselves. As well as being
the standard reference for noncommutative geome- 
try, Connes (1994) discusses the Hall effect. These 
resources contain good bibliographies, which may 
be consulted for further references. 


Electron Motion in a Magnetic Field 


The following discussion restricts attention to 
motion in two dimensions, with electrons as the 
charge carriers, and no interactions between them. 
(The first condition is essential; the second could be 
relaxed a little to allow sufficiently long-lived 
quasi-particles.) A single free electron with mass
m and charge e moving in the x_1-x_2 plane with a
constant transverse magnetic field B in the positive
x_3-direction can be described by the Landau
Hamiltonian


H_L = |P − eA|²/2m [1]


where A = ½ B × X is a magnetic vector potential that
gives rise to B. This problem is exactly solvable by,
for example, introducing K_± = (K_1^±, K_2^±) = P ∓ eA.
The components of K_+ and K_− commute with each
other, but [K_1^±, K_2^±] = ±iħeB. Comparison with the
harmonic oscillator shows that the energy spectrum of
H_L = [(K_1^+)² + (K_2^+)²]/2m is {(n + ½)ħeB/m : n = 0, 1, 2, ...}.
Since H_L commutes with the components of K_−,
each of these Landau energy levels is infinitely
degenerate, and the filling fraction ν measures
what proportion of states in the Landau levels are
filled. The frequency ω_c = eB/m is the cyclotron
frequency for classical circular orbits in the
magnetic field.
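
To get a feel for the scales involved, the Landau level formula can be evaluated directly. The following is a minimal sketch, assuming SI units and the free-electron mass (real samples would use an effective mass, which is not specified in the text); the function names are illustrative.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J s
E_CHARGE = 1.602176634e-19  # elementary charge, C
M_E = 9.1093837015e-31      # free-electron mass, kg (assumption: no effective mass)

def cyclotron_frequency(b_tesla, mass=M_E):
    """Cyclotron frequency omega_c = eB/m in rad/s."""
    return E_CHARGE * b_tesla / mass

def landau_level(n, b_tesla, mass=M_E):
    """Energy (n + 1/2) * hbar * omega_c of the n-th Landau level, in joules."""
    if n < 0:
        raise ValueError("Landau level index must be a non-negative integer")
    return (n + 0.5) * HBAR * cyclotron_frequency(b_tesla, mass)

def level_spacing_in_kelvin(b_tesla, mass=M_E):
    """Level spacing hbar * omega_c expressed as a temperature, E / k_B."""
    k_boltzmann = 1.380649e-23  # J/K
    return HBAR * cyclotron_frequency(b_tesla, mass) / k_boltzmann
```

Comparing `level_spacing_in_kelvin(b)` with the sample temperature gives a rough indication of why the quantum effect needs strong fields and low temperatures.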

The degeneracy of the Landau Hamiltonian can 
also be understood in terms of the magnetic 
translations obtained by exponentiating the connection
defined by the magnetic potential A: ∇_j = ∂_j +
ieA_j/ħ = iK_j^−/ħ. More precisely, we set


U(a) = exp(−a·∇) = exp(−ia·K_−/ħ) [2]


which clearly commutes with H_L, expressing the
translational symmetry of this model. The curvature
[∇_1, ∇_2] = ieB/ħ of the connection manifests itself in the
identities


U(a)U(b) = e^{iφ/2} U(a + b) = e^{iφ} U(b)U(a) [3]


where φ = eB·(a × b)/ħ measures the magnetic flux
through the parallelogram spanned by a and b.
These show that U is a projective representation
of R² with projective multiplier σ(a, b) = exp(½iφ).


The significance of this is that, unless φ is an
integer multiple of 2π, U(a) and U(b) generate a
noncommutative algebra. This replaces the commu- 
tative algebra of functions on two-dimensional 
momentum space and leads naturally to a noncom- 
mutative geometry. 

The unembellished Landau Hamiltonian cannot 
describe the Hall effect without adding an electric 
potential eE·X to drive the current in the sample.
(Alternatively, and useful for the later discussion,
one could use the radiation gauge in which, instead
of introducing a scalar potential, a time-dependent
term is added to A so that E = −∂A/∂t.)

The quantum Hall effect also depends crucially on 
the effects of impurities in the conducting material. 
These can be modeled by adding a random potential 
V_ω, with ω in a compact probability space Ω, to
obtain H_ω = H_L + eE·X + V_ω(X). A continuous
function f on Ω can be interpreted as a random
variable, and its expectation τ_0(f) gives a trace on
the C*-algebra C(Ω) (i.e., a positive linear functional
such that τ_0(AB) = τ_0(BA)).

Although the magnetic translations commute with 
H_L, they do not generally commute with the
potentials, so they act on Ω. On the other hand,
the physics of a disordered system and its translates
should be the same, so we assume that the
probability measure, and hence also τ_0, are invariant
under magnetic translations. (As noted earlier, we
work in the thermodynamic limit, where the Hall
sample expands to fill R², so we do not need to
worry about translations moving the sample itself.)
Then Ω with the magnetic translation action can be
interpreted as the noncommutative Brillouin zone.
(A space Ω can be reconstructed from the magnetic
translations of the resolvents of the Hamiltonians
(Bellissard et al. 1994).)

The current J may be defined as the functional 
derivative of the Hamiltonian with respect to the 
vector potential A or, in components, J_k = δ_k H =
δH/δA_k. For the Landau Hamiltonian, this gives


iħδ_k H_L = iħe(P_k − eA_k)/m = e[X_k, H_L] [4]


a relation which persists for H = H_L + V(X) whenever
the potential V is independent of A, so that
δ_k H = −ie[X_k, H]/ħ = e dX_k/dt, the charge times
velocity, as one might expect. The operator functional
calculus delivers a similar formula for derivations
of the spectral projections of H. We have
δ_k = e∂_k/ħ, where, in view of the commutation
relations, ∂_k = −i[X_k, ·] can be regarded as a
momentum-space derivative, confirming that we
are dealing with the differential geometry of
momentum space.

We now wish to calculate the expected current
⟨J_k⟩ in a thermal state with chemical potential μ at
inverse temperature β = 1/kT (where k is Boltzmann’s
constant and T is the temperature in kelvin).
Using the Fermi-Dirac distribution, the grand
canonical expectation value is


⟨J_k⟩ = tr[(1 + e^{β(H−μ)})^{−1} J_k] [5]


Since the quantum Hall effect occurs at low temperatures
(large β) and for weak fields, we formally
proceed to those limits. Then (1 + e^{β(H−μ)})^{−1} tends to
the projection P_F onto the states with energy less than
the Fermi energy E_F in the absence of the electric
field. The limiting expected current is, therefore,
tr(P_F J_k) = tr(P_F δ_k H), where H is now the Hamiltonian
including the electric field (without which there
would be no current).
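
The way the Fermi-Dirac weight sharpens into a projection at low temperature can be seen numerically for a single energy level. This is a minimal sketch on scalar energies (not on operators), with an illustrative function name; the tanh form is used only to avoid floating-point overflow for large β.

```python
import math

def fermi_occupation(energy, mu, beta):
    """Occupation 1 / (1 + exp(beta * (energy - mu))).

    Uses the identity 1/(1 + e^x) = (1 - tanh(x/2)) / 2, which stays finite
    for arbitrarily large |x|.
    """
    return 0.5 * (1.0 - math.tanh(0.5 * beta * (energy - mu)))

def occupations(levels, mu, beta):
    """Occupation of each energy level in the list `levels`."""
    return [fermi_occupation(e, mu, beta) for e in levels]
```

For example, `occupations([0.5, 1.5, 2.5], mu=2.0, beta=b)` approaches `[1, 1, 0]` as `b` grows, which is the scalar counterpart of the statement that the thermal weight tends to the projection onto states below the Fermi energy.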

A detailed calculation of the Hall conductivity 
using the Kubo—Greenwood formula shows that the 
conductivity matrix is actually 


σ_kj = i(e²/ħ) tr(P_F[∂_j P_F, ∂_k P_F]) [6]


In particular, this immediately implies that the direct
conductivity terms σ_jj vanish, as observation suggested.
The derivation of [6] requires great care, and
references may be found in the surveys, but a formal 
argument in the next section may lend this expres- 
sion some plausibility. 


The Noncommutative Geometry 


The principal ingredient for noncommutative geo- 
metry is an algebra, and thus we shall now consider 
a class of algebras broad enough to include the 
physical example. 

The action of the magnetic translations on Ω
defines automorphisms of the C*-algebra C(Ω),
which permit the construction of a twisted crossed-product
algebra, in which these automorphisms are
represented by conjugation. Because much of the
theory has been formulated with lattice approximations
using Z² rather than R², it is useful to work
more generally with a separable locally compact
abelian group G with continuous multiplier σ, and a
homomorphism α to automorphisms of a C*-algebra
A_1 with trace τ_1, which will in practice be the
commutative algebra C(Ω) with τ_0. The twisted
crossed product A = C(A_1, G, σ) can be constructed
as the norm completion of the continuous compactly
supported functions from G to A_1, with the product,
adjoint and norm


(f*g)(x) = ∫ σ(y, x − y) f(y) α_y[g(x − y)] dy [7]

f*(x) = σ(x, −x)^{−1} α_x[f(−x)*] [8]

‖f‖ = max{∫ ‖f(x)‖ dx, ∫ ‖f*(x)‖ dx} [9]


integration being with respect to the Haar measure.
The crossed-product algebra is noncommutative,
both because of the action of G and due to the
multiplier σ. It has a trace τ[f] = τ_1[f(0)] and, when
G = R², has derivations given by (∂_j f)(x) = −ix_j f(x).

As an example, consider the case of periodic 
potentials invariant under translation by vectors a 
and b. Then the group G = Z² generated by a and b
acts trivially on Ω and the crossed-product algebra is
just a product of A_1 and the twisted group algebra
of complex-valued functions C(C, G, σ), generated
by U(a) and U(b). We already noted that the algebra
is commutative only when the flux φ ∈ 2πZ, in
which case it is just the convolution algebra of
Z², which by Fourier transforming (effectively
setting U(a) = e^{iα} and U(b) = e^{iβ}) is the algebra
C(T²), with torus coordinates α and β. For fluxes
which are rational multiples of 2π we obtain a
matrix algebra, whilst irrational fluxes give an 
infinite-dimensional irrational rotation algebra or 
noncommutative torus, a standard example in 
noncommutative geometry. 
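
For a rational flux φ = 2πp/q the commutation relation U(a)U(b) = e^{iφ}U(b)U(a) can be realized by finite matrices, which makes the appearance of a matrix algebra concrete. The sketch below is an illustration under assumptions: it uses the standard q × q "clock" and "shift" matrices as stand-ins for U(a) and U(b), stored as nested lists so that only the standard library is needed; the helper names are hypothetical.

```python
import cmath

def clock_and_shift(p, q):
    """Return q x q matrices (clock, shift) with clock*shift = w * shift*clock,
    where w = exp(2*pi*i*p/q)."""
    w = cmath.exp(2j * cmath.pi * p / q)
    clock = [[w ** r if r == c else 0j for c in range(q)] for r in range(q)]
    # shift maps basis vector e_c to e_{c+1 mod q}
    shift = [[(1 + 0j) if r == (c + 1) % q else 0j for c in range(q)]
             for r in range(q)]
    return clock, shift

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def weyl_defect(p, q):
    """Largest entry of clock*shift - exp(2*pi*i*p/q) * shift*clock (ideally 0)."""
    clock, shift = clock_and_shift(p, q)
    w = cmath.exp(2j * cmath.pi * p / q)
    left, right = matmul(clock, shift), matmul(shift, clock)
    return max(abs(left[r][c] - w * right[r][c])
               for r in range(q) for c in range(q))

def commutator_size(p, q):
    """Largest entry of clock*shift - shift*clock; zero only if p/q is an integer."""
    clock, shift = clock_and_shift(p, q)
    left, right = matmul(clock, shift), matmul(shift, clock)
    return max(abs(left[r][c] - right[r][c])
               for r in range(q) for c in range(q))
```

No such finite-dimensional realization exists for irrational φ/2π, which is one way to see why the irrational rotation algebra is infinite-dimensional.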

Any *-representation ρ of A_1 on a Hilbert space
ℋ_ρ can be induced to a *-representation π_ρ of the
twisted crossed product on ℋ = L²(G, ℋ_ρ) by
setting


(π_ρ(f)ψ)(x) = ∫ σ(y, x − y) ρ(α_x^{−1}[f(y)]) ψ(x − y) dy [10]


for f ∈ A and ψ ∈ ℋ. When A_1 = C(Ω), we may
take ρ to be a one-dimensional irreducible
*-representation given by evaluating the function
at a point ω ∈ Ω.

When G = R², it is easy to construct a Fredholm
module from π_ρ. The space ℋ_2 = ℋ ⊗ C² has actions
π_ρ of A on the first factor and of the Pauli spin
matrices σ_1, σ_2, σ_3 on the second. It may be
regarded as a graded module with grading operator
σ_3, and


F = (x_1² + x_2²)^{−1/2}(x_1σ_1 + x_2σ_2) [11]


provides a Connes-Fredholm involution which
anticommutes with σ_3. Detailed technical results of
Connes show how to use the supertrace on ℋ_2 and
the Dixmier trace to interpret the physically impor- 
tant quantities in this setting. 

We now turn to the formal derivation of the key 
alternative expression for the conductivity. In the 
abstract algebraic setting, when p ∈ A is a projection
in the domain of a derivation δ, the derivative of
(1 − p)p = 0 gives


0 = δ((1 − p)p) = (1 − p)δ(p) − δ(p)p [12]


and then an easy calculation leads to


[p, [δp, p]] = 2p(δp)p − (δp)p² − p²(δp) = −δp [13]
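
The "easy calculation" behind [13] uses only p² = p and the Leibniz rule. One way to spell out the intermediate step, written in LaTeX as a sketch with δp standing for δ(p), is:

```latex
\begin{align*}
\delta p &= \delta(p^2) = (\delta p)\,p + p\,(\delta p)
  && \text{(equivalent to [12])} \\
p\,(\delta p)\,p &= p\,\bigl[(\delta p)\,p + p\,(\delta p)\bigr]\,p
  = 2\,p\,(\delta p)\,p
  \quad\Longrightarrow\quad p\,(\delta p)\,p = 0 \\
[p,[\delta p,p]] &= 2\,p\,(\delta p)\,p - (\delta p)\,p^2 - p^2\,(\delta p)
  = 0 - \bigl[(\delta p)\,p + p\,(\delta p)\bigr] = -\delta p
\end{align*}
```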


In the identity for elements a, b, c, and h ∈ A


τ([a, [b, c]]h) − τ(c[[h, a], b])
= τ([a, [b, c]h]) + τ([b, c[h, a]]) = 0 [14]


we set a = c = p and b = δp to obtain


τ([p, [δp, p]]h) = τ(p[[h, p], δp]) [15]


Combining this with [13] when τ∘δ = 0, one
obtains


τ(pδh) = τ(δ(ph)) − τ(δ(p)h)
= τ([p, [δp, p]]h) = τ(p[[h, p], δp]) [16]


The Hall Conductivity and Anderson 
Localisation 


Substituting p = P_F and h = H in formula [16] would
give the current tr(P_F[[H, P_F], δ_k P_F]). Since δ_k is
proportional to the commutator with X_k, it is true
that tr∘δ_k = 0, but, unfortunately, P_F need not lie in
the domain of δ_k, and H is unbounded, further
compounding the difficulties. These are serious
problems, although the situation is not quite as
bad as it seems. Without the electrostatic term eE·X
in H, P_F would have been a spectral projection with
which H would commute, so that


[H, P_F] = e[E·X, P_F] = eE_j[X_j, P_F] = ieE_j ∂_j P_F [17]


and H disappears from the formula, to be replaced
by ∂_j P_F. This would give the expected current
i(e²/ħ) tr(P_F[∂_j P_F, ∂_k P_F])E_j, and the conductivity
matrix


σ_kj = i(e²/ħ) tr(P_F[∂_j P_F, ∂_k P_F]) [18]


given earlier (there is no need to scale by the 
thickness in two dimensions). 


However it is derived, this expression for the 
conductivity only makes sense under suitable conditions;
otherwise tr(P_F[∂_j P_F, ∂_k P_F]) might either be
undefined (because P_F is not differentiable) or might
not be trace class. There is a simple condition
sufficient to handle both these difficulties, which
also leads to an interesting physical insight. From
the obvious inequality


0 ≤ tr[P_F(∂_1 P_F ± i∂_2 P_F)*(∂_1 P_F ± i∂_2 P_F)] [19]

= tr[P_F((∂_1 P_F)² + (∂_2 P_F)²)]
± i tr(P_F[∂_1 P_F, ∂_2 P_F]) [20]


and the fact that 1 ≥ P_F, we deduce that


tr[(∂_1 P_F)² + (∂_2 P_F)²]
≥ tr[P_F((∂_1 P_F)² + (∂_2 P_F)²)]
≥ |tr(P_F[∂_1 P_F, ∂_2 P_F])| [21]


Thus, if tr[(∂_1 P_F)² + (∂_2 P_F)²] exists and is finite, then
our expression for the conductivity is well defined.
Mathematically, this is a Sobolev-type condition. To
see the physical significance, we recall that ∂_j P_F = −i
[X_j, P_F], so that the condition is equivalent to the
finiteness of tr[(X_1² + X_2²)P_F] − tr[(X_1 P_F)² + (X_2 P_F)²].

This condition imposes a requirement for some 
localization in the system (when P_F is a rank-1
projection, it reduces to the requirement that the
variance ⟨X_1² + X_2²⟩ − ⟨X_1⟩² − ⟨X_2⟩² be finite). This
links with a much older observation of Anderson that
the interference caused by impurities in a crystal,
which cancels at long range, should, at smaller
distances, cause localized clumping. The mathematical
development of this idea by Pastur provides
an appropriate tool for handling the conditions
for the validity of the conductivity formula. The
impurities generating Anderson localization are
provided in this model by the random potential
in the Hamiltonian. It also leads us to restrict
attention to the dense subalgebra A_0 of f ∈ A
for which τ[(∂_1 f)*(∂_1 f) + (∂_2 f)*(∂_2 f)] < ∞.


The Integral Quantum Hall Effect 


Having identified the features of physical interest, 
we can return to the abstract algebraic description 
with conductivity i(e²/ħ)τ(p[∂_1 p, ∂_2 p]). The key
observation is that this can be interpreted as the
Connes pairing between a cyclic cocycle c_τ on A_0
and the projection p whose stable equivalence class
represents an element of the C*-algebraic K-theory,
K_0(A). Such pairings give noncommutative Chern
characters. The cyclic cocycle is a trilinear form
defined on elements a_0, a_1, a_2 ∈ A_0 by


c_τ(a_0, a_1, a_2) = τ[a_0(∂_1 a_1 ∂_2 a_2 − ∂_2 a_1 ∂_1 a_2)] [22]


This is easily shown to be cyclic, c_τ(a_0, a_1, a_2) =
c_τ(a_1, a_2, a_0), and to satisfy the cyclic 2-cocycle
condition


c_τ(a_0 a_1, a_2, a_3) − c_τ(a_0, a_1 a_2, a_3)
+ c_τ(a_0, a_1, a_2 a_3) − c_τ(a_3 a_0, a_1, a_2) = 0 [23]


The Hall conductivity σ_21 = ic_τ(p, p, p)e²/ħ can
now be interpreted as the noncommutative
Chern character defined by the projection p.

This interpretation of the Hall conductivity clears 
the way to prove that it is integral, and there are 
several different routes to this. 

One approach is to identify the conductivity with 
some kind of index which is clearly integral. 
Bellissard worked with the Fredholm module 
where, by results of Connes, the Chern character is 
interpreted as the index of the Fredholm operator 
π_ρ(p)Fπ_ρ(p). Avron, Seiler and Simon have interpreted
the conductivity as a relative index
dim[ker(P_F − Q_F − 1)] − dim[ker(Q_F − P_F − 1)] of
the projection P_F and its conjugate Q_F = uP_F u* by
an off-diagonal element u of F. This is particularly
interesting as the conjugation by u can be inter- 
preted as a nonsingular gauge transformation of 
exactly the kind introduced by Laughlin in his 
original explanation of the quantum Hall effect in 
terms of singular flux tubes piercing a cylindrical 
conductor. 

Xia suggested another approach rewriting A as a 
repeated crossed product with R, which allows us to 
calculate K_0(A), using either Connes’ Thom isomorphism
theorem or the Takai duality theorem for
stable algebras to get


K_0(A) = K_0[C(A_1, G, σ)] ≅ K_0(A_1) [24]


which, when A_1 = C(Ω), is just K^0(Ω), leading to
identification as a topological index. For the simplest
case of Ω = T², this gives K^0(Ω) = Z². The image of τ,
and so also c_τ, actually sits in just one component,
leading to quantization of the Hall conductivity. 

The two questions posed in the introduction can 
now be answered as follows: The Hall conductivity 
can be identified with a topological index which can 
take only integer values, and therefore does not 
respond to continuous changes in any of the physical 
parameters until the change brings the system into a 
region where one of the background assumptions 
fails, such as a breakdown in the localization 
condition. The same conditions also ensure that the 
direct current vanishes. Roughly speaking, the
plateaus occur when the Fermi energy is in a gap in
the extended (nonlocalized) spectrum. 

This brief overview has omitted many of the 
interesting features of the detailed theory, which can 
be found in the surveys, such as the fact that low- 
lying energy levels do not contribute to the 
conductivity, and Shubin’s theorem identifying τ(p)
as the integrated density of states. Harper’s equation 
describing a discrete lattice analog of the IQHE has 
been a test-bed for many of the ideas, and various 
results were first proved in that setting. The FQHE 
was discovered during an unsuccessful search for a 
Wigner crystal phase transition, but analysis of 
discrete models provides strong evidence that Hall 
conductors have very complicated phase diagrams. 


The Fractional Quantum Hall Effect 


As mentioned in the introduction, by the time the IQHE
had been understood theoretically, it had been found 
that, with appropriate care, fractional conductivities 
could also be observed, although they were much less 
precise and stable than the integer values, and the 
plateaus less pronounced. Although there have been 
many phenomenological explanations, there is as yet no 
mathematical understanding from quantum field the- 
ory as compelling as that for the integer effect. We shall 
briefly summarize some of the main lines of attack. 

The first explanation, again due to Laughlin, has also 
provided the basis for many subsequent treatments of 
the problem. The wave functions of the oscillator-like 
Landau Hamiltonian can conveniently be represented 
in the Bargmann-Segal Fock space of holomorphic
functions f on R² ≅ C which are square-integrable with
respect to a Gaussian measure. Incorporating the
measure into the functions, these have the form
f(z) exp(−|z|²/2). Many-particle wave functions are
similarly realized in terms of holomorphic functions on
C^N, and must be antisymmetric under odd permutations
of the particles to describe fermions. This quickly
leads one to consider functions of the form


∏_{r<s} (z_r − z_s)^k exp(−½ Σ_j |z_j|²) [25]


for odd integers k > 0, and their multiples by even 
holomorphic functions. The lowest energy where such a 
wave function occurs is when k = 1, and larger values of 
k have the effect of dividing the Hall conductivity by k, 
which produces fractional conductivities. 
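
The role of the odd exponent k can be checked by evaluating a function of the form [25] at a few sample points. This is a minimal sketch with an illustrative function name; particle positions are passed as Python complex numbers and no normalization is attempted.

```python
import math

def laughlin(zs, k):
    """Evaluate prod_{r<s} (z_r - z_s)**k * exp(-sum_j |z_j|**2 / 2).

    zs is a list of complex particle positions; k is a positive integer.
    """
    amplitude = complex(1.0)
    for r in range(len(zs)):
        for s in range(r + 1, len(zs)):
            amplitude *= (zs[r] - zs[s]) ** k
    return amplitude * math.exp(-0.5 * sum(abs(z) ** 2 for z in zs))

def exchanged(zs, r, s):
    """Copy of zs with particles r and s interchanged."""
    out = list(zs)
    out[r], out[s] = out[s], out[r]
    return out
```

Exchanging two particles multiplies the value by (−1)^k, so odd k gives the antisymmetry required for fermions while even k would give a symmetric function.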

Halperin suggested quite early that counterflow- 
ing currents in the interior of a sample would tend 
to cancel, so that most of the current would be 
carried near the edge of the sample. There are 
several mathematical derivations of this, by, for
example, Macris, Martin, and Pulé, and by Fröhlich,
Graf, and Walcher. The K-theory of the boundary
and bulk of a sample can be linked by exact 
sequences such as those of the commutative theory 
(Kellendonk et al. 2000), and even in the IQHE 
boundary and bulk conductivities can be used 
(Schulz-Baldes et al. 2002). 

It has been fairly clear that whilst the IQHE can 
already be understood in terms of the motion of a 
single electron, the fractional effect is a many-body 
cooperative effect. One attempt to simplify the 
description is to work with an incompressible quan- 
tum fluid, and for edge currents one should study the 
boundary theory of such a fluid, in which the 
dominant contribution to the action is a Chern—Simons 
term, with conductivity as a coefficient. For an annular 
sample, this leads, in a suitable limit, to a chiral 
Luttinger model on the boundary circles, which can 
then be tackled mathematically using the representa- 
tion theory of loop groups. This leads to some elegant 
mathematics, including extensions to multiple coupled 
bands, with conductivities described by Cartan 
matrices, as explained in the International Congress 
of Mathematicians (ICM) survey (Fröhlich 1995), and
in the review by Fröhlich and Studer (1993).

The theory of composite fermions provides another 
physical approach in which field-theoretic effects result 
in the electrons sharing their charges in such a way as to 
produce fractional charges, and there is experimental 
evidence of such fractional charges in studies of 
tunneling from one edge to another. Then the FQHE 
is easily understood by simply replacing the electron 
charge e by e/k in the appropriate formulas. 

Susskind has suggested combining noncommuta- 
tive geometry with the theory of incompressible 
quantum fluids, an idea taken up by Polychronakos 
(2001). There are intriguing mathematical parallels 
with work by Berest and Wilson on ideals in the 
Weyl algebra and the Calogero—Moser model. 


Further Developments 


Bellissard and others have extended the use of 
noncommutative geometrical methods into other 
parts of solid-state theory, where they clarify a 
number of the physical ideas. This is particularly 
useful in the case of quasicrystals, which are not 
easily handled by the conventional methods 
(Bellissard et al. 2000). Some ideas in string theory 
resemble higher-dimensional analogs, and higher- 
dimensional versions of the quantum Hall effect 
have also been studied by Hu and Zhang. 

Finally, we conclude with some mathematical 
extensions of the theory. We have seen that, for 
periodic systems, the noncommutative Brillouin
zone can be a noncommutative torus, and it is
possible to consider noncommutative versions of 
Riemann surfaces of higher genera. Carey et al. 
(1998) studied the effect in a noncommutative 
hyperbolic geometry with a discrete group action, 
generalizing the action of a Fuchsian group on the 
unit disc. This provides a tractable example in which 
one has an edge (albeit rather different from 
the normal physical situations) and also examples 
of a Hall effect in higher-genus noncommutative 
Riemann surfaces closely related to those of Klimek
and Lesniewski. Natsume and Nest have subsequently
shown that these are deformation quantizations
of the commutative Riemann surface theory in
the sense of Rieffel. Coverings of noncommutative
Riemann surfaces, which might provide an analog
of composite fermions, have been investigated by
Marcolli and Mathai (1999, 2001). 


See also: C*-Algebras and Their Classification;
Chern-Simons Models: Rigorous Results; Fractional
Quantum Hall Effect; Hopf Algebras and q-Deformation
Quantum Groups; Localization for Quasiperiodic
Potentials; Noncommutative Geometry and the Standard
Model; Noncommutative Tori, Yang-Mills, and String
Theory; Schrödinger Operators.


Introduction


Scattering theory is concerned with the study of the
large-time behavior of solutions of the time-dependent
Schrödinger equation [1] for a system
with a Hamiltonian H:


i∂u/∂t = Hu, u(0) = f [1]


Being a part of perturbation theory, scattering
theory describes the asymptotics of u(t) as t → +∞
or t → −∞ in terms of solutions of the Schrödinger
equation for a “free” system with a Hamiltonian H_0.
Of course, eqn [1] has a unique solution
u(t) = exp(−iHt)f, while the solution of the
same equation with the operator H_0 and the
initial data u_0(0) = f_0 is given by the formula
u_0(t) = exp(−iH_0 t)f_0. From the viewpoint of scattering
theory, the function u(t) has free asymptotics
as t → ±∞ if for appropriate initial data f_0^±
eqn [2] holds:


lim_{t→±∞} ‖u(t) − u_0^±(t)‖ = 0 [2]


Here and throughout this article a relation containing
the signs “±” is understood as two independent
equalities. We emphasize that the initial data f_0^±
are different for t → +∞ and t → −∞ and
u_0^±(t) = exp(−iH_0 t)f_0^±. Equation [2] leads to a
connection between the corresponding initial data
f_0^± and f given by


f = lim_{t→±∞} exp(iHt) exp(−iH_0 t) f_0^± [3]
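
The passage from [2] to [3] is a one-line consequence of the unitarity of exp(−iHt). Written out in LaTeX, as a sketch that assumes the limits in [2] exist:

```latex
\begin{align*}
\bigl\| u(t) - u_0^{\pm}(t) \bigr\|
  &= \bigl\| e^{-iHt} f - e^{-iH_0 t} f_0^{\pm} \bigr\| \\
  &= \bigl\| f - e^{iHt} e^{-iH_0 t} f_0^{\pm} \bigr\|
  \;\longrightarrow\; 0 \qquad (t \to \pm\infty),
\end{align*}
```

where the second equality applies the unitary operator exp(iHt) inside the norm.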


If f is an eigenvector of H, that is, Hf = λf, then
obviously u(t) = e^{−iλt}f. On the contrary, if f belongs
to the (absolutely) continuous subspace of H, then
necessarily u(t) has the free asymptotics as t → ±∞.
This result is known as asymptotic completeness. 

The Schrödinger operator H = −Δ + V(x) in the
space ℋ = L²(R^d) with a real potential V decaying
at infinity is a typical Hamiltonian of scattering
theory. The operator H describes a particle in an
external potential V or two interacting particles.
Asymptotically (as t → +∞ or t → −∞), particles
may either form a bound state or be free (a
scattering state). Of course, a bound (scattering)
state at −∞ remains the same at +∞. To be more
precise, suppose that


|V(x)| ≤ C(1 + |x|)^{−ρ} [4]


where ρ > 1. Then relation [2] can be justified with
the kinetic energy operator H_0 = −Δ playing the
role of the unperturbed operator.

As discussed in Landau and Lifshitz (1965) (see
also Amrein et al. (1977), Pearson (1988), and Yafaev
(2000)), in scattering experiments one sends a beam
of particles of energy λ > 0 in a direction ω. Such a
beam is described by the plane wave


ψ_0(x; ω, λ) = exp(ik⟨ω, x⟩), λ = k² > 0


(which of course satisfies the free equation
−Δψ_0 = λψ_0). The scattered particles are described
for large distances by the outgoing spherical wave


a(x̂, ω; λ)|x|^{−(d−1)/2} exp(ik|x|)


Here x̂ = x|x|^{−1} is the direction of observation and
the coefficient a(x̂, ω; λ) is known as the scattering
amplitude. This means that quantum particles
subject to a potential V(x) are described by the
solution ψ of eqn [5] with asymptotics [6] at infinity:


−Δψ + V(x)ψ = λψ [5]

ψ(x; ω, λ) = exp(ik⟨ω, x⟩)
+ a(x̂, ω; λ)|x|^{−(d−1)/2} exp(ik|x|)
+ o(|x|^{−(d−1)/2}) [6]


The existence of such solutions of course requires a
proof. The differential scattering cross section
defined by eqn [7] gives the fraction of particles
scattered in a solid angle dx̂:


dσ(x̂, ω; λ) = |a(x̂, ω; λ)|² dx̂ [7]


As discussed below, the temporal asymptotics 
of solutions of the time-dependent Schrödinger 
equation [1] are closely related to the asymptotics 
at large distances of solutions of the stationary 
Schrödinger equation [5]. 


Time-Dependent Scattering Theory 
and Møller Operators 


If V(x) → 0 as |x| → ∞, then the essential spectrum
of the Schrödinger operator H = −Δ + V(x) covers
the whole positive half-line, whereas the negative
spectrum of H consists of eigenvalues accumulating,
perhaps, at the point zero only.

Scattering theory requires a more advanced
classification of the spectrum based on measure
theory. Consider a self-adjoint operator H defined
on domain D(H) in a Hilbert space ℋ. Let E be its
spectral family. Then the space ℋ can be decomposed
into the orthogonal sum of invariant subspaces
ℋ^{(p)}, ℋ^{(sc)} and ℋ^{(ac)}. The subspace ℋ^{(p)} is
spanned by eigenvectors of H and the subspaces
ℋ^{(sc)}, ℋ^{(ac)} are distinguished by the condition that
the measure (E(X)f, f) (here X ⊂ R is a Borel set) is
singularly or absolutely continuous with respect to
the Lebesgue measure for all f ∈ ℋ^{(sc)} or f ∈ ℋ^{(ac)}.
Typically (in applications to quantum-mechanical
problems) the singularly continuous part is absent,
that is, ℋ^{(sc)} = {0}. We denote by H^{(ac)} the restriction
of H on its absolutely continuous subspace ℋ^{(ac)} and
by P^{(ac)} the orthogonal projection on this subspace.
The same objects for the operator H_0 will be
endowed with the index “0.”

Equation [3] motivates the following fundamental 
definition. The wave, or Møller, operator
W_± = W_±(H, H_0) for a pair of self-adjoint operators
H_0 and H is defined by eqn [8] provided that the
corresponding strong limit exists:


W_± = s-lim_{t→±∞} exp(iHt) exp(−iH_0 t) P_0^{(ac)} [8]


The wave operator is isometric on ℋ_0^{(ac)} and enjoys
the intertwining property


HW_± = W_± H_0 [9]


Therefore, its range Ran W_± is contained in the
absolutely continuous subspace ℋ^{(ac)} of the operator H.

The operator W_±(H, H_0) is said to be complete if
eqn [10] holds:


Ran W_±(H, H_0) = ℋ^{(ac)} [10]


It is easy to see that the completeness of W_±(H, H_0)
is equivalent to the existence of the “inverse” wave
operator W_±(H_0, H). Thus, if the wave operator
W_±(H, H_0) exists and is complete, then the operators
H_0^{(ac)} and H^{(ac)} are unitarily equivalent. We
emphasize that scattering theory studies not arbi- 
trary unitary equivalence but only the “canonical” 
one realized by the wave operators. 

Along with the wave operators an important role 
in scattering theory is played by the scattering 
operator defined by eqn [11], where W_+^* is the
operator adjoint to W_+:


S = S(H, H_0) = W_+^*(H, H_0)W_−(H, H_0) [11]


The operator S commutes with H_0 and hence
reduces to multiplication by the operator function
S(λ) = S(λ; H, H_0) in a representation of ℋ_0^{(ac)} which
is diagonal for H_0^{(ac)}. The operator S(λ) is known as
the scattering matrix. The scattering operator [11] is
unitary on the subspace ℋ_0^{(ac)} provided the wave
operators W_±(H, H_0) exist and are complete. The
scattering operator S(H, H_0) connects the asymptotics
of the solutions of eqn [1] as t → −∞ and as
t → +∞ in terms of the free problem, that is,
S(H, H_0): f_0^− ↦ f_0^+, where f_0^± are the same as in eqn
[2]. The scattering operator and the scattering
matrix are usually of great interest in mathematical 
physics problems, because they connect the “initial” 
and the “final” characteristics of the process 
directly, bypassing its consideration for finite times. 

The definition of the wave operators can be 
extended to self-adjoint operators acting in different 
spaces. Let H_0 and H be self-adjoint operators in
Hilbert spaces ℋ_0 and ℋ, respectively, and let the
“identification” J: ℋ_0 → ℋ be a bounded operator.
Then the wave operator W_± = W_±(H, H_0; J) for the
triple H_0, H, and J is defined by eqn [12] provided
again that the strong limit there exists:


W_± = s-lim_{t→±∞} exp(iHt) J exp(−iH_0 t) P_0^{(ac)} [12]


Intertwining property [9] is preserved for wave
operator [12]. This operator is isometric on ℋ_0^{(ac)} if
and only if


lim_{t→±∞} ‖J exp(−iH_0 t) f_0‖ = ‖f_0‖


for all f_0 ∈ ℋ_0^{(ac)}. Since


s-lim_{|t|→∞} K exp(−iH_0 t) P_0^{(ac)} = 0


for a compact operator K, wave operators [12]
corresponding to identifications J_1 and J_2 coincide if
J_2 − J_1 is compact or, at least, the operators (J_2 − J_1)
E_0(X) are compact for all bounded intervals X.
Consideration of wave operators [12] with J ≠ I
may of course be of interest also in the case ℋ_0 = ℋ.

It suffices to verify the existence of limits [8] or
[12] on some set dense in the absolutely continuous
subspace $\mathcal{H}_0^{(ac)}$ of the operator $H_0$. The
following simple but convenient condition for the existence
of wave operators is usually called Cook's criterion.
Suppose that $\mathcal{H}_0=\mathcal{H}_0^{(ac)}$ and that the
operator $J$ maps the domain $D(H_0)$ of the operator $H_0$
into $D(H)$. Let


$$\int_0^{\pm\infty}\|(HJ-JH_0)\exp(-iH_0t)f\|\,dt<\infty$$


for all $f$ from some set $D_0\subset D(H_0)$ dense in
$\mathcal{H}_0$. Then the wave operator $W_\pm(H,H_0;J)$ exists.

This result is often useful in applications since the
operator $\exp(-iH_0t)$ is known explicitly. For
example, it works with $J=I$ for the pair


$$H_0=-\Delta,\qquad H=H_0+V(x) \qquad [13]$$


if $V(x)$ satisfies estimate [4] with $\rho>1$. On the
other hand, different proofs of the existence of the
wave operators $W_\pm(H_0,H;J^*)$ require new mathematical
tools. There are two essentially different
approaches in scattering theory: the trace-class and
smooth methods.


Time-Independent Scattering Theory 


The approach in scattering theory relying on
definition [8] is called time dependent. An alternative
possibility is to change the definition of wave
operators, replacing the unitary groups by the
corresponding resolvents $R_0(z)=(H_0-z)^{-1}$ and
$R(z)=(H-z)^{-1}$. They are related by a simple
identity


$$R(z) = R_0(z)-R_0(z)VR(z) = R_0(z)-R(z)VR_0(z) \qquad [14]$$


where $V=H-H_0$ and $\operatorname{Im}z\neq 0$. In the stationary
approach, in place of limits [8] one has to study
the boundary values (in a suitable topology) of the
resolvents as the spectral parameter $z$ tends to the
real axis. An important advantage of the stationary
approach is that it gives convenient formulas for the
wave operators and the scattering matrix.
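
The resolvent identity can be checked numerically in a finite-dimensional toy model. The following is only a sketch: the matrices `h0` and `v` are random symmetric matrices chosen for illustration (they are not the operators of the text), and the helper `resolvent` is a hypothetical name.

```python
import numpy as np

def resolvent(h, z):
    """Return (h - z)^{-1} for a square matrix h and a non-real number z."""
    return np.linalg.inv(h - z * np.eye(h.shape[0]))

rng = np.random.default_rng(0)
n = 6
a = rng.normal(size=(n, n))
b = rng.normal(size=(n, n))
h0 = (a + a.T) / 2           # symmetric stand-in for the free operator H_0
v = 0.3 * (b + b.T) / 2      # symmetric stand-in for the perturbation V
h = h0 + v
z = 0.7 + 0.5j               # spectral parameter with Im z != 0

r0, r = resolvent(h0, z), resolvent(h, z)
# Both forms of the identity: R = R0 - R0 V R = R0 - R V R0.
err1 = np.max(np.abs(r - (r0 - r0 @ v @ r)))
err2 = np.max(np.abs(r - (r0 - r @ v @ r0)))
print(err1, err2)
```

Both residuals should vanish up to rounding error, since the identity is purely algebraic and holds for any pair of self-adjoint matrices.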

Let us discuss here the stationary formulation of
the scattering problem for operators [13] in the
Hilbert space $\mathcal{H}=L_2(\mathbb{R}^d)$ in terms of
solutions of the Schrödinger equation [5]. If $V(x)$
satisfies estimate [4] with $\rho>(d+1)/2$, then for all
$\lambda>0$ and all unit vectors $\omega\in\mathbb{S}^{d-1}$,
eqn [5] has the solution $\psi(x;\omega,\lambda)$
with asymptotics [6] as $|x|\to\infty$. Moreover, the
scattering amplitude $a(\theta,\omega;\lambda)$ belongs to the
space $L_2(\mathbb{S}^{d-1})$ in the variable $\theta$ uniformly
in $\omega\in\mathbb{S}^{d-1}$, and it can be expressed via
$\psi(x;\omega,\lambda)$ by the formula


$$a(\theta,\omega;\lambda) = -\gamma_d(\lambda)\int_{\mathbb{R}^d}e^{-ik\langle\theta,x\rangle}V(x)\psi(x;\omega,\lambda)\,dx$$


where
$$\gamma_d(\lambda) = e^{-\pi i(d-3)/4}\,2^{-1}(2\pi)^{-(d-1)/2}\lambda^{(d-3)/4}$$


Let us define two sets of scattering solutions, or
eigenfunctions of the continuous spectrum, by the
formulas


$$\psi_-(x;\omega,\lambda)=\psi(x;\omega,\lambda)\quad\text{and}\quad\psi_+(x;\omega,\lambda)=\overline{\psi(x;-\omega,\lambda)}$$


In terms of boundary values of the resolvent, the
functions $\psi_\pm(\omega,\lambda)$ can be constructed by the formula


$$\psi_\pm(\omega,\lambda) = \psi_0(\omega,\lambda)-R(\lambda\mp i0)V\psi_0(\omega,\lambda) \qquad [15]$$


Obviously, functions [15] satisfy eqn [5]. Using
resolvent identity [14], it is easy to derive the
Lippmann-Schwinger equation


$$\psi_\pm(\omega,\lambda) = \psi_0(\omega,\lambda)-R_0(\lambda\mp i0)V\psi_\pm(\omega,\lambda)$$


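for $\psi_\pm(\omega,\lambda)$. Asymptotics [6] can be deduced from
the formula


$$(R_0(\lambda\pm i0)f)(x) = c_\pm(\lambda)\,(\Gamma_0(\lambda)f)(\pm\hat x)\,|x|^{-(d-1)/2}\exp(\pm ik|x|)+O(|x|^{-(d+1)/2})$$


where $f\in C_0^\infty(\mathbb{R}^d)$, $\hat x=x/|x|$,
$c_\pm(\lambda)=\pi^{1/2}\lambda^{-1/4}e^{\mp\pi i(d-3)/4}$ and
the operator $\Gamma_0(\lambda)$ defined by eqn [16] is (up to the
numerical factor) the restriction of the Fourier
transform $\hat f=F_0f$ onto the sphere of radius $\lambda^{1/2}$:


$$(\Gamma_0(\lambda)f)(\omega) = 2^{-1/2}\lambda^{(d-2)/4}\hat f(\lambda^{1/2}\omega),\quad\omega\in\mathbb{S}^{d-1} \qquad [16]$$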
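
The passage from the resolvent identity to the Lippmann-Schwinger equation mentioned above takes only a few lines. The following is a sketch, assuming that eqn [15] reads $\psi_\pm=\psi_0-R(\lambda\mp i0)V\psi_0$ and that identity [14] persists for the boundary values $z=\lambda\mp i0$; the arguments $(\omega,\lambda)$ are suppressed.

```latex
\begin{aligned}
\psi_\pm &= \psi_0 - R(\lambda\mp i0)\,V\psi_0 \\
 &= \psi_0 - \bigl(R_0(\lambda\mp i0) - R_0(\lambda\mp i0)\,V R(\lambda\mp i0)\bigr)V\psi_0 \\
 &= \psi_0 - R_0(\lambda\mp i0)\,V\bigl(\psi_0 - R(\lambda\mp i0)\,V\psi_0\bigr) \\
 &= \psi_0 - R_0(\lambda\mp i0)\,V\psi_\pm .
\end{aligned}
```

The second line substitutes the first form of the resolvent identity, and the last line recognizes the bracket as $\psi_\pm$ itself.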


The wave operators $W_\pm(H,H_0)$ can be constructed
in terms of the solutions $\psi_\pm$. Set $\xi=\lambda^{1/2}\omega$
($\xi$ is the momentum variable), write $\psi_\pm(x,\xi)$ instead
of $\psi_\pm(x;\omega,\lambda)$, and consider two transformations


$$(F_\pm f)(\xi) = (2\pi)^{-d/2}\int_{\mathbb{R}^d}\overline{\psi_\pm(x,\xi)}\,f(x)\,dx \qquad [17]$$


(defined initially, e.g., on the Schwartz class $\mathcal{S}(\mathbb{R}^d)$)
of the space $L_2(\mathbb{R}^d)$ into itself. The operators $F_\pm$ can
be regarded as generalized Fourier transforms, and
both of them coincide with the usual Fourier
transform $F_0$ if $V=0$. It follows from eqns [5],
[17] that under the action of $F_\pm$ the operator $H$ goes
over into multiplication by $|\xi|^2$, that is,


$$(F_\pm Hf)(\xi) = |\xi|^2(F_\pm f)(\xi)$$


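Moreover, with the help of eqn [15], it can be
shown that $F_\pm$ is an isometry on $\mathcal{H}^{(ac)}$, it is
zero on $\mathcal{H}\ominus\mathcal{H}^{(ac)}$, and its range
$\operatorname{Ran}F_\pm=L_2(\mathbb{R}^d)$. This is
equivalent to eqns [18]:


$$F_\pm^*F_\pm = P^{(ac)},\qquad F_\pm F_\pm^* = I \qquad [18]$$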


Hence any function $f\in\mathcal{H}^{(ac)}$ admits the expansion
in the generalized Fourier integral


$$f(x) = (2\pi)^{-d/2}\int_{\mathbb{R}^d}\psi_\pm(x,\xi)(F_\pm f)(\xi)\,d\xi$$

It can also be deduced from eqn [6] that the vector


$$(F_\pm^*-F_0^*)\exp(-i|\xi|^2t)\hat f$$


tends to zero as $t\to\pm\infty$ for an arbitrary
$\hat f\in L_2(\mathbb{R}^d)$. This implies the existence of the
wave operators $W_\pm=W_\pm(H,H_0)$ for pair [13] and gives the
representation


$$W_\pm = F_\pm^*F_0 \qquad [19]$$


Completeness of $W_\pm$ follows from eqn [19] and
the first equation in [18]. The second equality in
[18] is equivalent to the isometricity of $W_\pm$.
Formula [19] is an example of a stationary
representation for the wave operator. It formally
implies that


$$W_\pm\psi_0(\omega,\lambda) = \psi_\pm(\omega,\lambda)$$


which means that each wave operator establishes a 
one-to-one correspondence between eigenfunctions of 
the continuous spectrum of the operators Ho and H. 

The main ideas of the stationary approach go 
back to Friedrichs (1965), and Povzner. The inverse 
problem of reconstruction of a potential V given the 
scattering amplitude a (see eqn [6]) is treated in 
Faddeev (1976). 


The Trace-Class Method 


Recall that the class $\mathfrak{S}_p$, $p\ge 1$, consists of compact
operators $T$ such that the norm


$$\|T\|_p = \Bigl(\sum_{n=1}^{\infty}s_n^p(T)\Bigr)^{1/p}$$


is finite. Eigenvalues $\lambda_n(|T|)=:s_n(T)$ of the
non-negative operator $|T|$ are called singular numbers
of $T$. In particular, $\mathfrak{S}_1$ is the trace class and
$\mathfrak{S}_2$ is the Hilbert–Schmidt class.
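
For matrices the singular numbers and the norms $\|T\|_p$ can be computed directly. The following is a small illustrative sketch; the function name `schatten_norm` and the sample matrix are assumptions made here, not objects from the text.

```python
import numpy as np

def schatten_norm(t, p):
    """Return (sum_n s_n(t)^p)^(1/p), where s_n are the singular numbers of t."""
    s = np.linalg.svd(t, compute_uv=False)  # eigenvalues of |t| = (t* t)^(1/2)
    return float(np.sum(s ** p) ** (1.0 / p))

# Sample matrix with singular numbers 4 and 3.
t = np.array([[0.0, 3.0],
              [4.0, 0.0]])

trace_norm = schatten_norm(t, 1)   # trace-class norm (p = 1)
hs_norm = schatten_norm(t, 2)      # Hilbert-Schmidt norm (p = 2)
frobenius = float(np.linalg.norm(t, "fro"))
print(trace_norm, hs_norm, frobenius)
```

For $p=2$ the result coincides with the Frobenius norm (the square root of the sum of squared matrix entries), which is the finite-dimensional form of the Hilbert–Schmidt norm.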

The trace-class method (see Reed and Simon (1976)
or Yafaev (1992) for a detailed presentation) makes no
assumptions about the “unperturbed” operator $H_0$. Its
basic result is the following theorem of Kato and
Rosenblum. If $V=H-H_0$ belongs to the trace class
$\mathfrak{S}_1$, then the wave operators $W_\pm(H,H_0)$ exist and are
complete. In particular, the operators $H_0^{(ac)}$ and $H^{(ac)}$
are unitarily equivalent. This can be considered as a far
advanced extension of the H. Weyl theorem, which
states the stability of the essential spectrum under
compact perturbations.

The condition $V\in\mathfrak{S}_1$ in the Kato–Rosenblum
theorem cannot be relaxed in the framework of
operator ideals $\mathfrak{S}_p$. This follows from the Weyl–von
Neumann–Kuroda theorem. Let $H_0$ be an arbitrary
self-adjoint operator. For any $p>1$ and any $\varepsilon>0$
there exists a self-adjoint operator $V$ such that
$V\in\mathfrak{S}_p$, $\|V\|_p<\varepsilon$ and the operator
$H=H_0+V$ has purely point spectrum. Of course, such an
operator $H$ has no absolutely continuous part. At the same
time, the operator $H_0$ may be absolutely continuous. In this
case, the wave operators $W_\pm(H,H_0)$ do not exist.

Although sharp in the abstract framework, the
Kato–Rosenblum theorem cannot directly be applied
to the theory of differential operators where a
perturbation is usually an operator of multiplication
and hence is not even compact. We mention two of its
generalizations applicable to this theory. The first,
the Birman–Kato–Krein theorem, claims that the
wave operators $W_\pm(H,H_0)$ exist and are complete
provided


$$R^n(z)-R_0^n(z)\in\mathfrak{S}_1$$


for some $n=1,2,\ldots$ and all $z$ with $\operatorname{Im}z\neq 0$. The
second, the Birman theorem, asserts that the same is
true if $D(H)=D(H_0)$ or $D(|H|^{1/2})=D(|H_0|^{1/2})$ and


$$E(X)(H-H_0)E_0(X)\in\mathfrak{S}_1$$


for all bounded intervals $X$.

The wave operators enjoy the following property
known as the Birman invariance principle. Suppose
that $\varphi(H)-\varphi(H_0)\in\mathfrak{S}_1$ for a real function
$\varphi$ such that its derivative $\varphi'$ is absolutely
continuous and $\varphi'(\lambda)>0$. Then the wave operators
$W_\pm(H,H_0)$ exist and eqn [20] holds:


$$W_\pm(H,H_0) = W_\pm(\varphi(H),\varphi(H_0)) \qquad [20]$$


A direct generalization of the Kato–Rosenblum
theorem to the operators acting in different spaces is
due to Pearson. Suppose that $H_0$ and $H$ are self-adjoint
operators in spaces $\mathcal{H}_0$ and $\mathcal{H}$, respectively,
$J:\mathcal{H}_0\to\mathcal{H}$ is a bounded operator and
$V=HJ-JH_0\in\mathfrak{S}_1$. Then the wave operators
$W_\pm(H,H_0;J)$ and $W_\pm(H_0,H;J^*)$ exist.

Although rather sophisticated, the proof relies
only on the following elementary lemma of Rosenblum.
For a self-adjoint operator $H$, consider the set
$\mathfrak{R}\subset\mathcal{H}^{(ac)}$ of elements $f$ such that


$$r_H^2(f) := \operatorname*{ess\,sup}_{\lambda}\,d(E(\lambda)f,f)/d\lambda<\infty$$


If $K:\mathcal{H}\to\mathcal{G}$ ($\mathcal{G}$ is some Hilbert space) is a
Hilbert–Schmidt operator, then for all $f\in\mathfrak{R}$


$$\int_{-\infty}^{\infty}\|K\exp(-iHt)f\|^2\,dt\le 2\pi\,r_H^2(f)\,\|K\|_2^2 \qquad [21]$$


Moreover, the set $\mathfrak{R}$ is dense in $\mathcal{H}^{(ac)}$.

The Pearson theorem makes it possible to simplify
considerably the original proofs of different generalizations
of the Kato–Rosenblum theorem.

A typical application of the trace-class theory is
the following result. Suppose that


$$\mathcal{H}=L_2(\mathbb{R}^d),\qquad H_0=-\Delta+V_0(x),\qquad H=H_0+V(x) \qquad [22]$$


where the functions $V_0$ and $V$ are real, $V_0\in L_\infty(\mathbb{R}^d)$
and $V$ satisfies estimate [4] for some $\rho>d$. Then the
wave operators $W_\pm(H,H_0)$ exist and are complete.


The Smooth Method 


The smooth method (see Kuroda (1978), Reed and
Simon (1979), or Yafaev (1992), for a detailed
presentation) relies on a certain regularity of the
perturbation in the spectral representation of the
operator $H_0$. There are different ways to understand
regularity. For example, in the Friedrichs–Faddeev
model $H_0$ acts as multiplication by the independent
variable in the space $\mathcal{H}=L_2(\Lambda;\mathfrak{N})$, where
$\Lambda$ is an interval and $\mathfrak{N}$ is an auxiliary Hilbert
space. The perturbation $V$ is an integral operator with
sufficiently smooth kernel.

Another possibility is to use the concept of
$H$-smoothness introduced by Kato. An $H$-bounded
operator $K$ is called $H$-smooth if, for all $f\in D(H)$,


$$\int_{-\infty}^{\infty}\|K\exp(-iHt)f\|^2\,dt\le C\|f\|^2 \qquad [23]$$


(cf. eqns [21] and [23]). Here and below, $C$ are different
positive numbers whose precise values are inessential.
It is important that this definition admits equivalent
reformulations in terms of the resolvent or of the
spectral family. Thus, $K$ is $H$-smooth if and only if


$$\sup_{\lambda\in\mathbb{R},\,\varepsilon>0}\|K(R(\lambda+i\varepsilon)-R(\lambda-i\varepsilon))K^*\|<\infty$$


or if and only if

$$\sup_{X}|X|^{-1}\|KE(X)\|^2<\infty$$

where the supremum is taken over all intervals $X\subset\mathbb{R}$.

In applications the assumption of $H$-smoothness
of an operator $K$ imposes too stringent conditions
on the operator $H$. In particular, the operator $H$ is
necessarily absolutely continuous if the kernel of $K$ is
trivial. This assumption excludes eigenvalues and
other singular points in the spectrum of $H$, for
example, the bottom of the continuous spectrum for
the Schrödinger operator with decaying potential or
edges of bands if the spectrum has the band
structure. The notion of local $H$-smoothness suggested
by Lavine is considerably more flexible. By
definition, $K$ is called $H$-smooth on a Borel set
$X\subset\mathbb{R}$ if the operator $KE(X)$ is $H$-smooth. Note
that, under the assumption

$$\sup_{\lambda\in X,\,\varepsilon>0}\|K(R(\lambda+i\varepsilon)-R(\lambda-i\varepsilon))K^*\|<\infty \qquad [24]$$

the operator $K$ is $H$-smooth on the closure of $X$.

The following Kato–Lavine theorem is simple but
very useful. Suppose that


$$HJ-JH_0 = K^*K_0$$


where the operators $K_0$ and $K$ are $H_0$-smooth and
$H$-smooth, respectively, on an arbitrary compact
subinterval of some interval $\Lambda$. Then the wave
operators


$$W_\pm(H,H_0;JE_0(\Lambda))\quad\text{and}\quad W_\pm(H_0,H;J^*E(\Lambda))$$


exist (and are adjoint to each other).

This result cannot usually be applied directly since
the verification of $H_0$- and especially of $H$-smoothness
may be a difficult problem. Let us briefly
explain how it can be done on the example of pair
[13], where the potential $V(x)$ satisfies estimate [4]
for some $\rho>1$. Let us start with the operator
$H_0=-\Delta$. Denote by $L_2^{(l)}=L_2^{(l)}(\mathbb{R}^d)$ the Hilbert
space with the norm $\|f\|_l=\|\langle x\rangle^lf\|$, where
$\langle x\rangle=(1+|x|^2)^{1/2}$. Let the operator $\Gamma_0(\lambda)$ be
defined by eqn [16], and let $X\subset(0,\infty)$ be some compact
interval. Set $\mathfrak{N}=L_2(\mathbb{S}^{d-1})$. If $f\in L_2^{(l)}$ with
$l>1/2$, then, by the Sobolev trace theorem,


$$\|\Gamma_0(\lambda)f\|_{\mathfrak{N}}\le C\|f\|_l,\qquad
\|\Gamma_0(\lambda)f-\Gamma_0(\lambda')f\|_{\mathfrak{N}}\le C|\lambda-\lambda'|^{\alpha}\|f\|_l \qquad [25]$$


for an arbitrary $\alpha\le l-1/2$, $\alpha<1$ and all
$\lambda,\lambda'\in X$. These estimates imply that the function


$$(E_0(\lambda)f,f) = \int_{|\xi|^2<\lambda}|\hat f(\xi)|^2\,d\xi \qquad [26]$$


is differentiable and the derivative

$$d(E_0(\lambda)f,f)/d\lambda = \|\Gamma_0(\lambda)f\|_{\mathfrak{N}}^2,\quad f\in L_2^{(l)},\ l>1/2$$


is Hölder-continuous in $\lambda>0$ (uniformly in $f$,
$\|f\|_l\le 1$). Therefore, applying the Privalov theorem
to the Cauchy integral


$$(R_0(z)f,f) = \int_0^{\infty}(\lambda-z)^{-1}\,d(E_0(\lambda)f,f)$$

we obtain that the analytic operator function


$$\mathcal{R}_0(z) = \langle x\rangle^{-l}R_0(z)\langle x\rangle^{-l},\quad l>1/2$$


considered in the space $\mathcal{H}$, is continuous in norm in
the closed complex plane $\mathbb{C}$ cut along $(0,\infty)$ with
possible exception of the point $z=0$. This implies
$H_0$-smoothness of the operator $\langle x\rangle^{-l}$, $l>1/2$, on all
compact intervals $X\subset(0,\infty)$.

To obtain a similar result for the operator $H$,
we proceed from the resolvent identity [14].
Let $\mathcal{R}(z)=\langle x\rangle^{-l}R(z)\langle x\rangle^{-l}$, and let $B$
be the operator of multiplication by the bounded function
$(1+|x|^2)^{l}V(x)$. If


$$f+\mathcal{R}_0(z)Bf = 0$$


then $\psi=R_0(z)\langle x\rangle^{-l}Bf$ satisfies the Schrödinger
equation $H\psi=z\psi$. Since $H$ is self-adjoint, this implies that
$\psi=0$ and hence $f=0$. Using eqn [14], we obtain that


$$\mathcal{R}(z) = (I+\mathcal{R}_0(z)B)^{-1}\mathcal{R}_0(z),\quad\operatorname{Im}z\neq 0 \qquad [27]$$


because the inverse operator here exists by the
Fredholm alternative.

The operator function $(I+\mathcal{R}_0(z)B)^{-1}$ is analytic
in the complex plane cut along $(0,\infty)$ with possible
exception of poles (coinciding with eigenvalues of $H$)
on the negative half-axis. Moreover, $(I+\mathcal{R}_0(z)B)^{-1}$
is continuous up to the cut except the set
$\mathcal{N}\subset(0,\infty)$ of $\lambda$ where at least one of the
homogeneous equations


$$f+\mathcal{R}_0(\lambda\pm i0)Bf = 0 \qquad [28]$$


has a nontrivial solution. It follows from eqn [27]
that the same is true for the operator function $\mathcal{R}(z)$.
It can be shown that the set $\mathcal{N}$ is closed and has
Lebesgue measure zero. Let $\Lambda=(0,\infty)\setminus\mathcal{N}$; then
$\Lambda=\bigcup_n\Lambda_n$, where $\Lambda_n$ are disjoint open intervals. By
condition [24], the operator $\langle x\rangle^{-l}$, $l>1/2$, is
$H$-smooth on any strictly interior subinterval of
every $\Lambda_n$. Applying the Kato–Lavine theorem, we see
that the wave operators $W_\pm(H,H_0;E_0(\Lambda_n))$ and
$W_\pm(H_0,H;E(\Lambda_n))$ exist for all $n$. Since $E_0(\Lambda)=I$
and $E(\Lambda)=P^{(ac)}$, this implies the existence of
$W_\pm(H,H_0)$ and $W_\pm(H_0,H)$. Thus, the wave operators
$W_\pm(H,H_0)$ for pair [13] exist and are complete
if estimate [4] holds for some $\rho>1$.

Compared to the trace-class method, conditions on
the perturbation $V(x)$ are less restrictive, while the
class of admissible “free” problems is essentially
narrower (in eqn [22] $V_0(x)$ is an arbitrary bounded
function). It is not known whether the wave
operators $W_\pm(H,H_0)$ exist for all pairs [22] such
that $V_0\in L_\infty$ and $V$ satisfies [4] for some $\rho>1$.

It is important that the smooth method allows one
to prove the absence of the singular continuous
spectrum. Note first that the continuity of $\mathcal{R}(z)$
implies that the operator $H$ is absolutely continuous
on the subspace $E(\Lambda)\mathcal{H}$. Therefore, the singular
positive spectrum of $H$ is necessarily contained in $\mathcal{N}$.
To prove that its continuous part is empty, it suffices
to check that the set $\mathcal{N}$ consists of eigenvalues of the
operator $H$. In terms of $u=\langle x\rangle^{-l}Bf$, $l=\rho/2$, eqn
[28] can be rewritten as


$$u+VR_0(\lambda\pm i0)u = 0 \qquad [29]$$


Multiplying this equation by $R_0(\lambda\pm i0)u$ and taking
the imaginary part of the scalar product, we see that


$$\pi\,d(E_0(\lambda)u,u)/d\lambda = \pm\operatorname{Im}(R_0(\lambda\pm i0)u,u) = 0$$

According to eqn [26], this implies that

$$\hat u(\xi)=0\quad\text{for}\quad|\xi|=\lambda^{1/2} \qquad [30]$$

It follows from eqn [29] that

$$\psi = R_0(\lambda\pm i0)u \qquad [31]$$


that is, $\hat\psi(\xi)=(|\xi|^2-\lambda\mp i0)^{-1}\hat u(\xi)$, is a formal
(because of the singularity of the denominator)
solution of Schrödinger equation [5]. Therefore, one
needs only to verify that $\psi\in L_2(\mathbb{R}^d)$. Since
$u\in L_2^{(l)}$, where $l=\rho/2$, this is a direct consequence of
[25] and [30] if $\rho>2$. In the general case, one uses that under
assumption [30] the function [31] belongs
to the space $L_2^{(p)}$ for any $p<l-1$. By virtue of
condition [4] where $\rho>1$, eqn [29] now shows that
actually $u\in L_2^{(p)}$ for any $p<l+\rho-1$. Repeating
these arguments, we obtain, after $n$ steps, that
$u\in L_2^{(p)}$ for any $p<l+n(\rho-1)$. For $n$ large enough, this
implies that $u\in L_2^{(p)}$ for $p>1$, and consequently
function [31] belongs to $L_2(\mathbb{R}^d)$.

Similar arguments show that eigenvalues of $H$
have finite multiplicity and do not have positive
accumulation points. For the proof of boundedness
of the set of eigenvalues, one uses additionally the
estimate


$$\|\mathcal{R}_0(\lambda\pm i0)\| = O(\lambda^{-1/2}),\quad\lambda\to\infty \qquad [32]$$


Actually, according to the Kato theorem, the Schrödinger
operator $H$ does not have positive eigenvalues.

There also exists a purely time-dependent
approach, the Enss method (see Perry (1983)),
which relies on an advanced study of the free
evolution operator $\exp(-iH_0t)$.


The Scattering Matrix 


The operator $H_0=-\Delta$ can of course be diagonalized
by the classical Fourier transform. To put it
slightly differently, set


$$(\mathcal{F}_0f)(\lambda) = \Gamma_0(\lambda)f$$

where the operator $\Gamma_0(\lambda)$ is defined by eqn [16].
Then


$$\mathcal{F}_0: L_2(\mathbb{R}^d)\to L_2(\mathbb{R}_+;\mathfrak{N}),\qquad\mathfrak{N}=L_2(\mathbb{S}^{d-1})$$


is a unitary operator and $(\mathcal{F}_0H_0f)(\lambda)=\lambda(\mathcal{F}_0f)(\lambda)$.

Under assumption [4] where $\rho>1$, the scattering
operator $S$ for pair [13] is defined by eqn [11]. It is
unitary on the space $\mathcal{H}=L_2(\mathbb{R}^d)$ and commutes
with the operator $H_0$. It follows that
$(\mathcal{F}_0Sf)(\lambda)=S(\lambda)(\mathcal{F}_0f)(\lambda)$, $\lambda>0$, where the
unitary operator $S(\lambda):\mathfrak{N}\to\mathfrak{N}$ is known as the
scattering matrix. The scattering matrix $S(\lambda)$ for the pair
$H_0$, $H$ can be computed in terms of the scattering amplitude.
Namely, $S(\lambda)$ acts in the space $L_2(\mathbb{S}^{d-1})$, and
$S(\lambda)-I$ is the integral operator whose kernel is the
scattering amplitude. More precisely,


$$(S(\lambda)f)(\theta) = f(\theta)+2i\lambda^{1/2}\gamma_d(\lambda)\int_{\mathbb{S}^{d-1}}a(\theta,\omega;\lambda)f(\omega)\,d\omega$$


In operator notation, this representation can be
rewritten as


$$S(\lambda) = I-2\pi i\,\Gamma_0(\lambda)(V-VR(\lambda+i0)V)\Gamma_0^*(\lambda) \qquad [33]$$


The right-hand side here is correctly defined as a
bounded operator in the space $\mathfrak{N}$ and is continuous
in $\lambda>0$. Moreover, the operator $S(\lambda)-I$ is compact
since $\Gamma_0(\lambda)\langle x\rangle^{-l}:\mathcal{H}\to\mathfrak{N}$ is compact for
$l>1/2$ by virtue of the Sobolev trace theorem.

It follows that the spectrum of the operator $S(\lambda)$
consists of eigenvalues of finite multiplicity, except
possibly the point 1, lying on the unit circle and
accumulating at the point 1 only. In the general
case, eigenvalues of $S(\lambda)$ play the role of the scattering
phases or shifts often considered for radial potentials
$V(x)=V(|x|)$.

The scattering amplitude is singular on the
diagonal $\theta=\omega$ only. Moreover, this singularity is
weaker for potentials with faster decay at infinity
(for larger $\rho$). If $\rho>(d+1)/2$, then the operator
$S(\lambda)-I$ belongs to the Hilbert–Schmidt class. In this
case the total scattering cross section


$$\sigma(\omega;\lambda) = \int_{\mathbb{S}^{d-1}}|a(\theta,\omega;\lambda)|^2\,d\theta$$


is finite for all energies $\lambda>0$ and all incident
directions $\omega\in\mathbb{S}^{d-1}$. If $\rho>d$, then the operator
$S(\lambda)-I$ belongs to the trace class. In this case, the
scattering amplitude $a(\theta,\omega;\lambda)$ is a continuous
function of $\theta,\omega\in\mathbb{S}^{d-1}$ (and $\lambda>0$). The unitarity
of the operator $S(\lambda)$ implies the optical theorem


$$\sigma(\omega;\lambda) = \lambda^{-1/2}\operatorname{Im}\bigl(\gamma_d^{-1}(\lambda)a(\omega,\omega;\lambda)\bigr)$$

Using resolvent identity [14], one deduces from
eqn [33] the Born expansion


$$S(\lambda) = I-2\pi i\sum_{n=0}^{\infty}(-1)^n\,\Gamma_0(\lambda)V(R_0(\lambda+i0)V)^n\Gamma_0^*(\lambda)$$


This series is norm-convergent for small potentials $V$
and, according to estimate [32], for high energies $\lambda$.


Long-Range Interactions


Potentials decaying at infinity as the Coulomb
potential


$$V(x) = \gamma|x|^{-1},\quad d\ge 3$$


or slower are called long-range. More precisely, it is
required that


$$|\partial^{\alpha}V(x)|\le C(1+|x|)^{-\rho-|\alpha|},\quad\rho\in(0,1] \qquad [34]$$


for all derivatives of $V$ up to some order. In the
long-range case, the wave operators $W_\pm(H,H_0)$ do
not exist, and the asymptotic dynamics should be
properly modified. It can be done in a time-dependent
way either in the coordinate or momentum
representations. For example, in the coordinate
representation, the free evolution $\exp(-iH_0t)$
should be replaced in definition [8] of wave
operators by unitary operators $U_0(t)$ defined by


$$(U_0(t)f)(x) = \exp(i\Xi(x,t))\,(2it)^{-d/2}\hat f(x/(2t))$$


where $\hat f$ is the Fourier transform of $f$. For short-range
potentials we can set $\Xi(x,t)=(4t)^{-1}|x|^2$. In the
long-range case the phase function $\Xi(x,t)$ should be
chosen as a (perhaps, approximate) solution of the
eikonal equation


$$\partial\Xi/\partial t+|\nabla\Xi|^2+V = 0$$


In particular, we can set


$$\Xi(x,t) = (4t)^{-1}|x|^2-t\int_0^1V(sx)\,ds$$

if $\rho>1/2$ in [34]. For the Coulomb potential,

$$\Xi(x,t) = (4t)^{-1}|x|^2-\gamma t|x|^{-1}\ln|t|$$


(the singularity at $x=0$ is inessential here). Thus,
both in short- and long-range cases solutions of the
time-dependent Schrödinger equation “live” in a
region of the configuration space where $|x|$ is of
order $|t|$. Long-range potentials change only asymptotic
phases of these solutions.
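
The role of the eikonal equation can be made explicit by a short computation. The following is a sketch, not part of the original argument: it assumes the eikonal equation has the form $\partial\Xi/\partial t+|\nabla\Xi|^2+V=0$ and the phase is $\Xi(x,t)=(4t)^{-1}|x|^2-t\,W(x)$, where the abbreviation $W(x)=\int_0^1V(sx)\,ds$ is introduced here for convenience.

```latex
\begin{aligned}
\frac{\partial\Xi}{\partial t} &= -\frac{|x|^2}{4t^2} - W(x), \qquad
\nabla\Xi = \frac{x}{2t} - t\,\nabla W(x), \\
|\nabla\Xi|^2 &= \frac{|x|^2}{4t^2} - \langle x,\nabla W(x)\rangle + t^2|\nabla W(x)|^2, \\
\langle x,\nabla W(x)\rangle &= \int_0^1 s\,\frac{d}{ds}V(sx)\,ds = V(x) - W(x), \\
\frac{\partial\Xi}{\partial t} + |\nabla\Xi|^2 + V &= t^2|\nabla W(x)|^2 .
\end{aligned}
```

For $V=0$ the right-hand side vanishes, so $(4t)^{-1}|x|^2$ is an exact solution. In general all terms linear in $V$ cancel, and under [34] the remaining error is of order $|t|^{-2\rho}$ in the region $|x|\sim|t|$, which is integrable in $t$ precisely when $\rho>1/2$.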

Another possibility is a time-independent modification
in the phase space. Let us consider wave
operators $W_\pm(H,H_0;J)$, where $J$ is a pseudodifferential
operator,


$$(Jf)(x) = (2\pi)^{-d/2}\int_{\mathbb{R}^d}e^{i\langle x,\xi\rangle+i\Phi(x,\xi)}\zeta(x,\xi)\hat f(\xi)\,d\xi$$


with oscillating symbol $\exp(i\Phi(x,\xi))\zeta(x,\xi)$. Due to the
conservation of energy, we may suppose that $\zeta(x,\xi)$
contains a factor $\psi(|\xi|^2)$ with $\psi\in C_0^\infty(0,\infty)$. Set


$$\varphi(x,\xi) = \langle x,\xi\rangle+\Phi(x,\xi)$$


The perturbation $HJ-JH_0$ is also a pseudodifferential
operator, and its symbol is short-range (it is
$O(|x|^{-1-\varepsilon})$, $\varepsilon>0$, as $|x|\to\infty$) if
$\exp(i\varphi(x,\xi))$ is an approximate eigenfunction of the
operator $H$ corresponding to the “eigenvalue” $|\xi|^2$. This
leads to the eikonal equation


$$|\nabla_x\varphi(x,\xi)|^2+V(x) = |\xi|^2$$


The notorious difficulty (for $d\ge 2$) of this method is
that the eikonal equation does not have (even
approximate) solutions such that $|\nabla_x\Phi(x,\xi)|\to 0$ as
$|x|\to\infty$ and the arising error term is short-range.
However, it is easy to construct functions $\varphi=\varphi_\pm$
satisfying these conditions if a conical neighborhood
of the direction $\mp\xi$ is removed from $\mathbb{R}^d$. For
example,


$$\Phi_\pm(x,\xi) = \pm 2^{-1}\int_0^{\infty}\bigl(V(x\pm\tau\xi)-V(\pm\tau\xi)\bigr)\,d\tau$$


if $\rho>1/2$ in eqn [34]. Then the cutoff function
$\zeta(x,\xi)=\zeta_\pm(x,\xi)$ should be homogeneous of order
zero in the variable $x$ and it should be equal to zero
in a neighborhood of the direction $\mp\xi$. We emphasize
that now we have a couple of different
identifications $J=J_\pm$.

The long-range problem is essentially more difficult
than the short-range one. The limiting absorption
principle remains true in this case, but its proof
cannot be performed within perturbation theory.
The simplest proof relies on the Mourre estimate
(see Cycon et al. (1987)) for the commutator $i[H,A]$
of $H$ with the generator of dilations


$$A = -i\sum_{j=1}^{d}(x_j\partial_j+\partial_jx_j)$$

The Mourre estimate affirms that, for all $\lambda>0$,

$$iE(\Lambda_\lambda)[H,A]E(\Lambda_\lambda)\ge c(\lambda)E(\Lambda_\lambda),\quad c(\lambda)>0 \qquad [35]$$


if $\Lambda_\lambda=(\lambda-\varepsilon,\lambda+\varepsilon)$ and $\varepsilon$ is small
enough. For the free operator $H_0$, this estimate takes the form
$i[H_0,A]=4H_0$ and can be regarded as a commutation
relation. Estimate [35] means that the observable


$$(Ae^{-iHt}f,e^{-iHt}f)$$


is a strictly increasing function of $t$ for all $f\in\mathcal{H}^{(ac)}$.
The $H$-smoothness of the operator $\langle x\rangle^{-l}$, $l>1/2$, is
deduced from this fact by some arguments of
abstract nature (they do not really use concrete
forms of the operators $H$ and $A$).
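
The commutation relation for the free operator can be verified directly. The following is a sketch, assuming $H_0=-\Delta$ and the generator of dilations in the form $A=-i\sum_{j=1}^{d}(x_j\partial_j+\partial_jx_j)$.

```latex
\begin{aligned}
A &= -i\sum_{j=1}^{d}(x_j\partial_j + \partial_j x_j) = -i\,(2\,x\cdot\nabla + d), \\
\Delta\,(x\cdot\nabla u) &= 2\,\Delta u + x\cdot\nabla(\Delta u)
 \quad\Longrightarrow\quad [\Delta,\,x\cdot\nabla] = 2\Delta, \\
i[H_0,A] &= i\,[-\Delta,\,-2i\,x\cdot\nabla] = -2\,[\Delta,\,x\cdot\nabla] = -4\Delta = 4H_0 .
\end{aligned}
```

The constant $d$ in $A$ commutes with $\Delta$ and therefore drops out of the commutator.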

However, the limiting absorption principle is not
sufficient for construction of scattering theory in the
long-range case, and it should be supplemented by
an additional estimate. To formulate it, denote by


$$(\nabla^{\perp}u)(x) = (\nabla u)(x)-|x|^{-2}\langle(\nabla u)(x),x\rangle x$$


the orthogonal projection of a vector $(\nabla u)(x)$ on the
plane orthogonal to $x$. Then the operator
$K=\langle x\rangle^{-1/2}\nabla^{\perp}$ is $H$-smooth on any compact
$X\subset(0,\infty)$. This result is formulated as an estimate
(either on the resolvent or on the unitary group of
$H$), which we refer to as the radiation estimate. This
estimate is not very astonishing from the viewpoint
of analogy with classical mechanics. Indeed, in
the case of free motion, the vector $x(t)$ of the
position of a particle is directed asymptotically as its
momentum $\xi$. Regarded as a pseudodifferential
operator, $\nabla^{\perp}$ has symbol $\xi-|x|^{-2}\langle\xi,x\rangle x$, which
equals zero if $x=\gamma\xi$ for some $\gamma\in\mathbb{R}$. Thus, $\nabla^{\perp}$
removes the part of the phase space where a classical
particle propagates. The proof of the radiation
estimate is based on the inequality


$$K^*K\le C_0[H,\partial_r]+C\langle x\rangle^{-1-\rho},\quad\partial_r=\partial/\partial|x|$$


which can be obtained by a direct calculation. Since
the integral


$$i\int_0^t([H,\partial_r]e^{-iHs}f,e^{-iHs}f)\,ds = (\partial_re^{-iHt}f,e^{-iHt}f)-(\partial_rf,f)$$

is bounded by $C(X)\|f\|^2$ for $f\in E(X)\mathcal{H}$ and
the operator $\langle x\rangle^{-(1+\rho)/2}$ is $H$-smooth on $X$, this
implies $H$-smoothness of the operator $KE(X)$.
Calculating the perturbation $HJ_\pm-J_\pm H_0$, we see
that it is a sum of two pseudodifferential operators.
The first of them is short-range and thus can be
taken into account by the limiting absorption
principle. The symbol of the second one contains
first derivatives (in the variable $x$) of the cutoff
function $\zeta_\pm(x,\xi)$ and hence decreases at infinity as
$|x|^{-1}$ only. This operator factorizes into a product of
$H_0$- and $H$-smooth operators according to the
radiation estimate. Thus, all wave operators
$W_\pm(H,H_0;J_\pm)$ and $W_\pm(H_0,H;J_\pm^*)$ exist. These
operators are isometric since the operators $J_\pm$
are in some sense close to unitary operators.
The isometricity of $W_\pm(H_0,H;J_\pm^*)$ is equivalent to
the completeness of $W_\pm(H,H_0;J_\pm)$.

Although the modified wave operators enjoy 
basically the same properties as in the short-range 
case, properties of the scattering matrices in the 
short- and long-range cases are drastically different. 
Here we note only that for long-range potentials, 
due to a wild diagonal singularity of kernel of the 
scattering matrix, its spectrum covers the whole unit 
circle. 

Different aspects of long-range scattering are 
discussed in Derezinski and Gérard (1997), Pearson 
(1988), Saito (1979), and Yafaev (2000). 


See also: N-Particle Quantum Scattering; Quantum 
Dynamical Semigroups; Random Matrix Theory in 
Physics; Scattering in Relativistic Quantum Field Theory: 
Fundamental Concepts and Tools; Schrodinger 
Operators; Spectral Theory for Linear Operators. 


The Framework of Quantum Mechanics


In 1900, Max Planck initiated the quantum revolution
by presenting the hypothesis that radiation is
emitted or absorbed only in “quanta,” each of
energy $h\nu$, for frequency $\nu$ (where $h$ was a new
fundamental constant of Nature). By this device, he
explained the precise shape of the puzzling black-body
spectrum. Then, in 1905, Albert Einstein
introduced the concept of the photon, according to
which light of frequency $\nu$ would, in appropriate
circumstances, behave as though it were constituted
as individual particles, each of energy $h\nu$, rather
than as continuous waves, and he was able to
explain the conundrum posed by the photoelectric
effect by this means. Later, in 1923, Prince Louis de
Broglie proposed that, conversely, all particles
behave like waves, the energy being Planck's $h\nu$
and the momentum being $h\lambda^{-1}$, where $\lambda$ is the
wavelength, which was later strikingly confirmed in
a famous experiment of Davisson and Germer in
1927. Some years earlier, in 1913, Niels Bohr had
used another aspect of this curious quantum
“discreteness,” explaining the stable electron orbits
in hydrogen by the assumption that (orbital) angular
momentum must be quantized in units of $\hbar\,(=h/2\pi)$.

All this provided a very remarkable collection of 
facts and concepts, albeit somewhat disjointed, 
explaining a variety of previously baffling physical 
phenomena, where a certain discreteness seemed to 
be entering Nature at a fundamental level, where 
previously there had been continuity, and where 
there was an overriding theme of a confusion as to 
whether — or in what circumstances — waves or 
particles provide better pictures of reality. More- 
over, no clear and consistent picture of an actual 
“quantum-level reality” as yet seemed to arise out of 
all this. Then, in 1925, Heisenberg introduced his 
“matrix mechanics,” subsequently developed into a 
more complete theory by Born, Heisenberg and 
Jordan, and then more fully by Dirac. Some six 
months after Heisenberg, in 1926, Schrödinger 
introduced his very different-looking “wave 
mechanics,” which he subsequently showed was 
equivalent to Heisenberg’s scheme. These became 
encompassed into a comprehensive framework 
through the transformation theory of Dirac, which 
he put together in his famous book The Principles of 
Quantum Mechanics, first published in 1930. Later,
von Neumann set the framework on a more rigorous
basis in his 1932 book, Mathematische Grundlagen
der Quantenmechanik (later translated as Mathematical
Foundations of Quantum Mechanics, 1955).

This formalism, now well known to physicists, is
based on the presence of a quantum state $|\psi\rangle$
(Dirac's “ket” notation being adopted here). In
Schrödinger's description, $|\psi\rangle$ is to evolve by unitary
evolution, according to the Schrödinger equation

$$i\hbar\frac{\partial|\psi\rangle}{\partial t} = H|\psi\rangle$$

where $H$ is the quantum Hamiltonian. The totality of
allowable states $|\psi\rangle$ constitutes a Hilbert space $\mathcal{H}$
and the Schrödinger equation provides a continuous
one-parameter family of unitary transformations of $\mathcal{H}$.
The letter U is used here for the “quantum-level”
evolution whereby the state $|\psi\rangle$ evolves in time
according to this unitary Schrödinger evolution.
However, we must be careful not to demand an
interpretation of this evolution similar to that
which we adopt for a classical theory, such as is
provided by Maxwell's equations for the electromagnetic
field. In Maxwell's theory, the evolution
that his equations provide is accepted as very
closely mirroring the actual way in which a
physically real electromagnetic field evolves with
time. In quantum mechanics, however, it is a highly
contentious matter how we should regard the
“reality” of the unitarily evolving state $|\psi\rangle$.
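
The unitary character of the Schrödinger evolution is easy to see in a finite-dimensional toy model. The following is only a sketch: the two-level Hamiltonian `h`, the choice of units with $\hbar=1$, and the helper name `evolve` are assumptions made for illustration.

```python
import numpy as np

def evolve(h, psi0, t, hbar=1.0):
    """Apply exp(-i h t / hbar) to psi0 using the eigen-decomposition of h."""
    w, u = np.linalg.eigh(h)                  # h = u diag(w) u^*
    phases = np.exp(-1j * w * t / hbar)
    return u @ (phases * (u.conj().T @ psi0))

# Hypothetical two-level Hamiltonian (real symmetric, hence self-adjoint).
h = np.array([[1.0, 0.5],
              [0.5, -1.0]])
psi0 = np.array([1.0, 0.0], dtype=complex)

psi_t = evolve(h, psi0, t=2.0)
norm_t = float(np.linalg.norm(psi_t))
# One-parameter group property: evolving by 1.0 twice equals evolving by 2.0.
psi_twice = evolve(h, evolve(h, psi0, 1.0), 1.0)
group_err = float(np.max(np.abs(psi_twice - psi_t)))
print(norm_t, group_err)
```

The norm of the state is preserved and the evolution operators compose additively in time, which is what is meant by a one-parameter family of unitary transformations.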

One of the key difficulties resides in the fact that
the world that we actually observe about us rather
blatantly does not accord with such a unitarily
evolving $|\psi\rangle$. Indeed, the standard way that the
quantum formalism is to be interpreted is very far
from the mere following of such a picture. So long
as no “measurement” is deemed to have been taking
place, this U-evolution procedure would be adopted,
but upon measurement, the state is taken to behave
in a very different way, namely to “jump” instantaneously
to some eigenstate $|\phi\rangle$ of the quantum
operator $Q$ which is taken to represent the measurement,
with probability given by the Born rule


$$|\langle\phi|\psi\rangle|^2$$


if we assume that both $|\psi\rangle$ and $|\phi\rangle$ are normalized
($\langle\psi|\psi\rangle=1=\langle\phi|\phi\rangle$); otherwise we can express this
probability simply as


$$\frac{\langle\phi|\psi\rangle\langle\psi|\phi\rangle}{\langle\psi|\psi\rangle\langle\phi|\phi\rangle}$$


(The operator $Q$ is normally taken to be self-adjoint,
so that $Q=Q^*$ and its eigenvalues are real, but more
generally complex eigenvalues are accommodated if
we allow $Q$ to be normal, that is, $QQ^*=Q^*Q$. In
each case we require the eigenvectors of $Q$ to span
the Hilbert space $\mathcal{H}$.) This “evolution procedure” of
the quantum state is very different from U, owing
both to its discontinuity and its indeterminacy. The
letter R will be used for this, standing for the
“reduction” of the quantum state (sometimes referred
to as the “collapse of the wave function”). This
strange hybrid, whereby U and R are alternated, with
U holding between measurements and R holding at
measurements, is the standard procedure that is
pragmatically adopted in conventional quantum
mechanics, and which works so marvelously well,
with no known discrepancy between the theory and
observation. (In the classic account of von Neumann
(1932, 1955), “R” is referred to as his “process I”
and “U” as his “process II.”) However, there appears
to be no consensus whatever about the relation
between this mathematical procedure and what is
“really” going on in the physical world. This is the
kind of issue that will be of concern to us here.
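
The Born rule for possibly unnormalized states can be illustrated with a two-dimensional example. The following is a sketch under stated assumptions: the measured operator `q` is taken to be the Pauli matrix $\sigma_x$ and the state `psi` is chosen arbitrarily; neither comes from the text, and `born_probability` is a hypothetical helper name.

```python
import numpy as np

def born_probability(phi, psi):
    """Return <phi|psi><psi|phi> / (<psi|psi><phi|phi>) for state vectors."""
    overlap = np.vdot(phi, psi)               # vdot conjugates its first argument
    norm_sq = np.vdot(psi, psi).real * np.vdot(phi, phi).real
    return float(abs(overlap) ** 2 / norm_sq)

# Illustrative self-adjoint operator (Pauli sigma_x) with eigenvalues -1 and +1.
q = np.array([[0.0, 1.0],
              [1.0, 0.0]])
eigvals, eigvecs = np.linalg.eigh(q)          # eigenvalues in ascending order

psi = np.array([2.0, 1.0], dtype=complex)     # deliberately unnormalized state
probs = [born_probability(eigvecs[:, k], psi) for k in range(2)]
print(eigvals, probs)
```

Because the eigenvectors of a self-adjoint operator span the space, the probabilities over all outcomes sum to one; here the outcomes $-1$ and $+1$ occur with probabilities 0.1 and 0.9.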


Quantum Reality 


The discussion here will be given only in the 
Schrodinger picture, for the reason that the issues 
appear to be clearer with this description. In the 
Heisenberg picture, the state |Y} does not evolve in 
time, and all dynamics is taken up in the time 
evolution of the dynamical variables. But this 
evolution does not refer to the evolution of specific 
systems, the “state” of any particular system being 
defined to remain constant in time. Since the 
Schrödinger and Heisenberg pictures are deemed to 
be equivalent (at least for the “normal” systems that 
are under consideration here), we do not lose 
anything substantial by sticking to Schrödinger’s
description, whereas there does seem to be a 
significant gain in understanding of what the 
formalism is actually telling us. 

There are, however, many different attitudes that 
are expressed as to the “reality” of |ψ⟩. (There is an
unfortunate possibility of confusion here in the two 
uses of the word “real” that come into the discussion 
here. In the quantum formalism, the state is mathe- 
matically a “complex” rather than a “real” entity, 
whereas our present concern is not directly to do 
with this, but with the “ontology” of the quantum 
description.) According to what is commonly regarded 
as the standard — “Copenhagen” — interpretation of 
quantum mechanics (due primarily to Bohr, 
Heisenberg, and Pauli), the quantum state |ψ⟩ is not
taken as a description of a quantum-level reality at all,
but merely as a description of the observer’s
knowledge of the quantum system under
consideration. According to this view, the “jumping” 
that the quantum state undergoes is regarded as 
unsurprising, since it does not represent a sudden 
change in the reality of the situation, but merely in the 
observer’s knowledge, as new information becomes 
available, when the result of some measurement 
becomes known to the observer. According to this 
view, there is no objective quantum reality described 
by |ψ⟩. Whether or not there might be some objective
quantum-level reality with some other mathematical 
description seems to be left open by this viewpoint, but 
the impression given is that there might well not be any 
such quantum-level reality at all, in the sense that it 
becomes meaningless to ask for a description of 
“actual reality” at quantum-relevant scales. 

Of course some connection with the real world is 
necessary, in order that the quantum formalism can 
relate to the results of experiment. In the Copenha- 
gen viewpoint, the experimenter’s measuring appa- 
ratus is taken to be a classical-level entity, which can 
be ascribed a real ontological status. When the 
Geiger counter “clicks” or when the pointer 
“points” to some position on a dial, or when the 
track in the cloud chamber “becomes visible”,
these are taken to be real events. The intervening
description in terms of a quantum state vector |ψ⟩ is
not ascribed a reality. The role of |ψ⟩ is merely to
provide a calculational procedure whereby the 
different outcomes of an experiment can be assigned 
probabilities. Reality comes about only when the 
result of the measurement is manifested, not before. 

A difficulty with this viewpoint is that it is hard to 
draw a clear line between those entities which are 
considered to have an actual reality, such as the 
experimental apparatus or a human observer, and 
the elemental constituents of those entities, which
are such things as electrons or protons or neutrons
or quarks. These constituents are to be treated quantum
mechanically, and therefore, on the “Copenhagen”
view, their mathematical descriptions are denied
such an honored ontological status. Moreover, there
is no limit to the number of particles that can 
partake in a quantum state. According to current 
quantum mechanics, the most accurate mathemati- 
cal procedure for describing a system with a large 
number of particles would indeed be to use a 
unitarily evolving quantum state. What reasons can 
be presented for or against the viewpoint that this 
gives us a reasonable description of an actual 
reality? Can our perceived reality arise as some 
kind of statistical limit when very large numbers of 
constituents are involved? 

Before entering into the more subtle and con- 
tentious issues of the nature of “quantum reality,” it 
is appropriate that one of the very basic mathematical
aspects of the quantum formalism be addressed
first. It is an accepted aspect of the quantum
formalism that a state-vector such as |ψ⟩ should
not, in any case, be thought of as providing a unique
mathematical description of a “physical reality” for
the simple reason that |ψ⟩ and z|ψ⟩, where z is any
nonzero complex number, describe precisely the
same physical situation. It is a common, but not
really necessary, practice to demand that |ψ⟩ be
normalized to unity: ⟨ψ|ψ⟩ = 1, in which case the
freedom in |ψ⟩ is reduced to the multiplication by a
phase factor |ψ⟩ ↦ e^{iθ}|ψ⟩. Either way, the physically
distinguishable states constitute a projective Hilbert
space PH, where each point of PH corresponds to a
one-dimensional linear subspace of the Hilbert space
H. The issue, therefore, is whether quantum reality
can be described in terms of the points of a
projective Hilbert space PH.
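
The statement that |ψ⟩ and z|ψ⟩ describe the same physical situation can be checked numerically. The following is a minimal sketch in Python; the helper names `inner` and `born_probability` and the sample vectors are illustrative assumptions, not part of the formalism as stated above. It evaluates the general (unnormalized) Born-rule expression before and after rescaling |ψ⟩ by an arbitrary nonzero complex number.

```python
# Sketch: Born-rule probability for unnormalized state vectors, showing that
# |psi> and z|psi> give the same probabilities. Helper names and sample
# vectors are illustrative assumptions.

def inner(a, b):
    """Inner product <a|b>, conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def born_probability(phi, psi):
    """<phi|psi><psi|phi> / (<psi|psi><phi|phi>); no normalization needed."""
    num = inner(phi, psi) * inner(psi, phi)
    den = inner(psi, psi) * inner(phi, phi)
    return (num / den).real

psi = [1 + 2j, 3 - 1j]
phi = [0.5j, 2.0]
z = -3.7 + 0.2j                       # any nonzero complex number
scaled_psi = [z * c for c in psi]

p_original = born_probability(phi, psi)
p_scaled = born_probability(phi, scaled_psi)
print(p_original, p_scaled)
```

The two printed values agree to rounding error, which is the sense in which only the ray through |ψ⟩, a point of PH, enters the probabilities.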


Reality in Spin-1/2 Systems 


As a general comment, for systems with a
small number of degrees of freedom (that is, for a
Hilbert space H^n of small finite dimension n) it seems
more reasonable to assign a reality to the elements of
PH^n than is the case when n is large. Let us begin with
a particularly simple case, where n = 2, and H^2
describes the two-dimensional space of spin states of
a massive particle of spin 1/2, such as an electron,
proton, or quark, or suitable atom. Here we can take
as an orthonormal pair of basis states |↑⟩ and |↓⟩,
representing right-handed spin about the “up” and
“down” directions, respectively. Clearly there is
nothing special about these particular directions, so
any other state of spin, of direction |↗⟩ say, is just as
“real” as the original two. Indeed, we always find


$$|{\nearrow}\rangle = w\,|{\uparrow}\rangle + z\,|{\downarrow}\rangle$$


for some pair of complex numbers z and w (not both
zero). The different possible ratios z:w give us a
complex plane (of zw^{-1}) compactified by a point at
infinity (where w = 0), a “Riemann sphere”, which is
a realization of the complex projective 1-space PH^2.
There does indeed seem to be something “real” 
about the spin state of such a spin-1/2 particle or 
atom. We might imagine preparing the spin of 
a suitable spin-1/2 atom using a Stern-Gerlach
apparatus (see Introductory Article: Quantum 
Mechanics) oriented in some chosen direction. The 
atom seems to “know” the direction of its spin, 
because if we measure it again in the same direction 
it has to be prepared to give us the answer “YES,” to 
the second measurement, with certainty, and that 
direction for its spin state is the only one that can
guarantee this answer. (We are, of course, considering
only “ideal” measurements, for the purpose of
argument.) Moreover, we could imagine that 
between the two measurements, some appropriate 
magnetic field had been introduced so as to rotate 
the spin direction in some very specific way, so that 
the spin state is now some other direction such as
|↖⟩. By rotating our second Stern-Gerlach apparatus
to agree with this new direction, we must again get
certainty for the YES answer, the guaranteeing of
this by the rotated state seeming now to give a
“reality” to this new state |↖⟩. The quantum
formalism does not allow us to ascertain an 
unknown direction of spin. But it does allow for us 
to “confirm” (or “refute”) a proposed direction for 
the spin state, in the sense that if the proposed 
direction is incorrect, then there is a nonzero 
probability of refutation. Only the correct direction 
can be guaranteed to give the YES answer. 
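
The “confirm or refute” property can be illustrated with a short calculation. This is a sketch only: it assumes the standard parametrization cos(θ/2)|↑⟩ + e^{iφ} sin(θ/2)|↓⟩ for the spin state pointing along polar angles (θ, φ), and the function names `spin_state` and `prob_yes` are hypothetical.

```python
import cmath
import math

def spin_state(theta, phi):
    """Components (w, z) of w|up> + z|down> for spin along polar angles
    (theta, phi); standard parametrization, assumed for illustration."""
    return (math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2))

def prob_yes(proposed, actual):
    """Born-rule probability that `actual` answers YES to a test for `proposed`."""
    amp = sum(p.conjugate() * a for p, a in zip(proposed, actual))
    return abs(amp) ** 2

actual = spin_state(1.0, 0.3)

# Testing the true direction: YES with certainty.
p_correct = prob_yes(spin_state(1.0, 0.3), actual)
# Testing a direction 0.4 rad away: YES is likely but not guaranteed.
p_wrong = prob_yes(spin_state(1.4, 0.3), actual)
# Testing the antipodal direction: YES never occurs.
p_opposite = prob_yes(spin_state(math.pi - 1.0, 0.3 + math.pi), actual)

print(p_correct, p_wrong, p_opposite)
```

Only the first test returns probability 1; any other proposed direction leaves a nonzero probability of refutation, here 1 − (1 + cos 0.4)/2 for the nearby direction.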


EPR-Bohm and Bell’s Theorem 


For a pair of particles or atoms of spin 1/2, the issue 
of the “reality” of spin states becomes less clear. 
Consider, for example, the EPR-Bohm example 
(where “EPR” stands for Einstein-Podolsky-Rosen)
whereby an initial state of spin 0 decays into two 
spin-1/2 atoms, traveling in opposite directions (east 
E, and west W). If a suitable Stern-Gerlach apparatus
is set up to measure the spin of the atom at E, finding
an answer |↖⟩, say, then this immediately ensures
that the state at W is the oppositely pointing |↘⟩,
which can subsequently be “confirmed” by measurement
at W. This, then, seems to provide a “reality”
for the spin state |↘⟩ at W as soon as the E
measurement has been performed, but not before.
Now, let us suppose that some orientation different
from ↘ had actually been set up for the measurement
at W, namely that which would have given YES for
the direction ←. This measurement can certainly give
the answer YES upon encountering |↘⟩ (with a
certain nonzero probability, namely (1 + cos θ)/2,
where θ is the angle between ↘ and ←). So far, this
provides us with no problem with the “reality” of the
spin state of the atom at W, since it would have been
|↘⟩ before the measurement at W and would have
“collapsed” (by the R-process) to |←⟩ after the
measurement. But now suppose that the measurement
at W had actually been performed momentarily
before the measurement at E, rather than just
after it. Then there is no reason that the
W-measurement would encounter |↘⟩, rather than
some other direction, but the result |←⟩ of the
measurement at W now seems to force the state at
E to be |→⟩. Indeed, the two measurements, at E and
at W, might have been spacelike separated, and
because of the requirements of special relativity there
would be no meaning to say which of the two
measurements, at E or at W, had “actually”
occurred first. One seems to obtain a different picture
of “reality” depending on this ordering.
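
The two “pictures” can be compared in a small calculation. The sketch below is an illustration under stated assumptions: the spin-0 state is taken as (|↑⟩|↓⟩ − |↓⟩|↑⟩)/√2, both analyser directions lie in one plane, and all function names are hypothetical. It applies the R-process first at E and then at W, and then in the opposite order, and compares the resulting joint probabilities of two YES answers.

```python
import cmath
import math

def spin_state(theta, phi=0.0):
    return [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]

def projector(v):
    """2x2 projector |v><v| for a normalized spin state v."""
    return [[v[i] * v[j].conjugate() for j in range(2)] for i in range(2)]

# The pair state is stored as c[i][j], the amplitude of |i> at E and |j> at W.
def apply_E(P, c):
    return [[sum(P[i][k] * c[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply_W(P, c):
    return [[sum(c[i][l] * P[j][l] for l in range(2)) for j in range(2)]
            for i in range(2)]

def measure_yes(apply, P, c):
    """Return (probability of YES, normalized collapsed state)."""
    out = apply(P, c)
    p = sum(abs(x) ** 2 for row in out for x in row)
    return p, [[x / math.sqrt(p) for x in row] for row in out]

s = 1 / math.sqrt(2)
singlet = [[0, s], [-s, 0]]          # (|up,down> - |down,up>)/sqrt(2)

theta_a, theta_b = 0.4, 1.5          # analyser angles at E and W (assumed)
Pa = projector(spin_state(theta_a))
Pb = projector(spin_state(theta_b))

# Picture 1: E is measured first, then W sees the collapsed state.
pE, after_E = measure_yes(apply_E, Pa, singlet)
pW_given_E, _ = measure_yes(apply_W, Pb, after_E)
joint_E_first = pE * pW_given_E

# Picture 2: W is measured first, then E sees the collapsed state.
pW, after_W = measure_yes(apply_W, Pb, singlet)
pE_given_W, _ = measure_yes(apply_E, Pa, after_W)
joint_W_first = pW * pE_given_W

print(joint_E_first, joint_W_first)
```

Both orderings give the same joint probability, (1 − cos(θ_b − θ_a))/4, even though the intermediate “collapsed” states assigned to the two atoms differ between the two pictures.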

In fact, the calculations of probabilities come out 
the same whichever picture is used, so if one asks 
only for a calculational procedure for the probabil- 
ities, rather than an actual picture of quantum 
reality, these considerations are not problematic. But 
they do provide profound difficulties for any view of 
quantum reality that is entirely local. The difficulty 
is made particularly clear in a theorem due to John 
Bell (1964, 1966a, b) which showed that on the 
basis of the assumptions of local realism, there are 
particular relations between the conditional prob- 
abilities, which must hold in any situation of this 
kind; moreover, these inequalities can be violated in 
various situations in standard quantum mechanics. 
(See, most specifically, Clauser et al. (1969).) Several 
experiments that were subsequently performed 
(notably Aspect et al. (1982)) confirmed the expec- 
tations of quantum mechanics, thereby presenting 
profound difficulties for any local realistic model of 
the world. There are also situations of this kind 
which involve only yes/no questions, so that actual 
probabilities do not need to be considered; see
Kochen and Specker (1967), Peres (1991), Hardy
(1993), Conway and Kochen (2002). Basically: if
one insists on realism, then one must give up
locality. Moreover, nonlocal realistic models, con- 
sistent with the requirements of special relativity, are 
not easy to construct (see Quantum Mechanics: 
Generalizations), and have so far proved elusive. 
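
A concrete instance of such a violation can be sketched as follows, using the form of the inequality associated with Clauser et al. (1969). The assumptions here are ours: the quantum correlation of the two ±1 outcomes in the spin-0 state is taken as −cos(a − b) for coplanar analyser angles a and b, the four angles are a conventional choice, and the function names are hypothetical. Local deterministic models are enumerated exhaustively; probabilistic mixtures of them cannot exceed the deterministic maximum.

```python
import itertools
import math

def singlet_correlation(a, b):
    """Quantum expectation of the product of the two +1/-1 spin outcomes
    in the spin-0 state, for coplanar analyser angles a and b."""
    return -math.cos(a - b)

def chsh(E, a, a2, b, b2):
    """CHSH combination E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)

# A conventional choice of analyser angles (an assumption for illustration).
a, a2, b, b2 = 0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4
S_quantum = chsh(singlet_correlation, a, a2, b, b2)

# Local deterministic models: each atom carries predetermined answers
# (+1 or -1) for both settings that might be chosen on its own side.
S_local_max = max(
    abs(A1 * B1 - A1 * B2 + A2 * B1 + A2 * B2)
    for A1, A2, B1, B2 in itertools.product((+1, -1), repeat=4)
)

print(abs(S_quantum), S_local_max)
```

The local bound is 2, whereas the quantum value has magnitude 2√2 for these settings.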


Other Aspects of Quantum Nonlocality 


Problems of this kind occur even at the more 
elementary level of single particles, if one tries to 
consider that an ordinary particle wave function 
(position-space description of |ψ⟩) might be just
some kind of “local disturbance,” like an ordinary 
classical wave. Consider the wave function spread- 
ing out from a localized source, to be detected at a 
perpendicular screen some distance away. The 
detection of the particle at any one place on the 
screen immediately forbids the detection of that 
particle at any other place on the screen, and if we 
are to think of this information as being transmitted 
as a classical signal to all other places on the screen,
then we are confronted with problems of superluminal
communication. Again, any “realistic”
picture of this process would require nonlocal 
ingredients, which are difficult to square with the 
requirements of special relativity. (It is possible that
these difficulties might be resolved within some kind 
of nonlocal geometry, such as that supplied by 
twistor theory (see Twistors; Twistor Theory: Some 
Applications); see, particularly, Penrose (2005).) 
These types of issues are made even more dramatic 
and problematic in the procedure of “quantum 
teleportation,” whereby the information in a quantum 
state (e.g., the unknown actual direction ↗ in some
quantum state |↗⟩) can be transported from one
experimenter A to another one B, by merely
the sending of a small finite number of classical bits
of information from A to B, where before this classical
information is transmitted, A and B must each be in
possession of one member of an EPR pair. More
explicitly, we may suppose A (Alice) is presented with
a spin-1/2 state |↗⟩, but is not told the direction ↗. She
has in her possession another spin-1/2 state which is an
EPR-Bohm partner of a spin-1/2 state in the possession
of B (Bob). She combines this |↗⟩ with her EPR
atom and then performs a measurement which
distinguishes the four orthogonal “Bell states”


$$\begin{aligned}
0&:\ |{\uparrow}\rangle|{\downarrow}\rangle - |{\downarrow}\rangle|{\uparrow}\rangle\\
1&:\ |{\uparrow}\rangle|{\uparrow}\rangle - |{\downarrow}\rangle|{\downarrow}\rangle\\
2&:\ |{\uparrow}\rangle|{\uparrow}\rangle + |{\downarrow}\rangle|{\downarrow}\rangle\\
3&:\ |{\uparrow}\rangle|{\downarrow}\rangle + |{\downarrow}\rangle|{\uparrow}\rangle
\end{aligned}$$


where the first state in each product refers to her 
unknown state and the second refers to her EPR 
atom. The result of this measurement is conveyed to 
Bob by an ordinary classical signal, coded by the 
indicated numbers 0, 1, 2, 3. On receiving Alice’s 
message, Bob takes the other member of the EPR 
pair and performs the following rotation on it: 


0: leave alone
1: 180° about x-axis
2: 180° about y-axis
3: 180° about z-axis


This achieves the successful “teleporting” of |↗⟩
from A to B, despite the fact that only 2 bits of 
classical information have been signaled. It is the 
acausal EPR-Bohm connection that provides the 
transmission of “quantum information” in a classi- 
cally acausal way. Again, we see the essentially 
nonlocal (or acausal) nature of any attempted 
“realistic” picture of quantum phenomena. It may 
be regarded as inappropriate to use the term 
“information” for something that is propagated 
acausally and cannot be directly used for signaling. 
It has been suggested, accordingly, that a term such 
as “quanglement” might be more appropriate to use 
for this concept; see Penrose (2002, 2004). 
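
The teleportation procedure just described can be followed through in a small simulation. This is a sketch under explicit assumptions: the EPR pair is taken in the spin-0 state (|↑⟩|↓⟩ − |↓⟩|↑⟩)/√2, the four Bell states are numbered 0 to 3 as above with normalization 1/√2, the 180° rotations are represented by the Pauli matrices (to which they are equal up to an overall phase), and all names in the code are hypothetical.

```python
import cmath
import math

s = 1 / math.sqrt(2)
UP, DOWN = 0, 1

# Bell states of (unknown spin, Alice's EPR spin), numbered as in the text.
BELL = {
    0: {(UP, DOWN): s, (DOWN, UP): -s},
    1: {(UP, UP): s, (DOWN, DOWN): -s},
    2: {(UP, UP): s, (DOWN, DOWN): s},
    3: {(UP, DOWN): s, (DOWN, UP): s},
}

# 180-degree rotations about x, y, z equal the Pauli matrices up to a phase.
IDENTITY = [[1, 0], [0, 1]]
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_Z = [[1, 0], [0, -1]]
CORRECTION = {0: IDENTITY, 1: SIGMA_X, 2: SIGMA_Y, 3: SIGMA_Z}

# EPR pair shared by Alice and Bob (assumed spin-0 state).
EPR = {(UP, DOWN): s, (DOWN, UP): -s}

def teleport(chi, outcome):
    """Return (probability of Alice's outcome, Bob's state after his rotation)."""
    bob = [0j, 0j]
    # Project (unknown spin, Alice's spin) onto the chosen Bell state.
    for (u, a), bell_amp in BELL[outcome].items():
        for (a_epr, b), epr_amp in EPR.items():
            if a_epr == a:
                bob[b] += bell_amp.conjugate() * chi[u] * epr_amp
    prob = sum(abs(x) ** 2 for x in bob)
    bob = [x / math.sqrt(prob) for x in bob]
    R = CORRECTION[outcome]
    corrected = [sum(R[i][j] * bob[j] for j in range(2)) for i in range(2)]
    return prob, corrected

def fidelity(x, y):
    return abs(sum(p.conjugate() * q for p, q in zip(x, y))) ** 2

chi = [0.6, 0.8 * cmath.exp(0.7j)]    # the state "unknown" to Alice
results = {k: teleport(chi, k) for k in range(4)}
for k, (prob, state) in results.items():
    print(k, prob, fidelity(state, chi))
```

Each of the four outcomes occurs with probability 1/4, independently of the state being teleported, so Alice’s 2 bits by themselves reveal nothing about it; yet after the indicated rotation Bob’s state agrees with the original up to an overall phase.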

The preceding arguments illustrate how quantum
systems involving even just a few particles can exhibit 
features quite unlike the ordinary behavior of classical 
particles. This was pointed out by Schrödinger (1935), 
and he referred to this key property of composite 
quantum systems as “entanglement.” An entangled 
quantum state (vector) is an element of a product
Hilbert space H^m ⊗ H^n which cannot be written as a
tensor product of elements |ψ⟩|φ⟩, with |ψ⟩ ∈ H^m and
|φ⟩ ∈ H^n, where H^m refers to one part of the system and
H^n refers to another part, usually taken to be physically
widely separated from the first. EPR systems are a
clear example, and we begin to see very nonclassical, 
effectively nonlocal behavior with entangled systems 
generally. A puzzling aspect of this is that the vast 
majority of states are indeed entangled, and the more 
parts that a system has, the more entangled it becomes 
(where the generalization of this notion to more than 
two parts is evident). One might have expected that 
“big” quantum systems with large numbers of parts 
ought to behave more and more like classical systems 
when they get larger and more complicated. However, 
we see that this is very far from being the case. There is 
no good reason why a large quantum system, left on its 
own to evolve simply according to U should actually 
resemble a classical system, except in very special 
circumstances. Something of the nature of the R 
process seems to be needed in order that classical 
behavior can “emerge.”
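
For the simplest case of two spin-1/2 parts, whether a given state is entangled can be decided by a one-line test. The sketch below relies on the fact that a state Σ c_ij |i⟩|j⟩ is a tensor product exactly when the 2×2 coefficient matrix c has zero determinant; the quantity 2|det c| for a normalized state is often called the concurrence. The function names and sample states are illustrative assumptions.

```python
import math

def concurrence(c):
    """2*|det c| for a normalized two-spin state sum_ij c[i][j] |i>|j>.
    It is zero exactly for product states and 1 for maximal entanglement."""
    return 2 * abs(c[0][0] * c[1][1] - c[0][1] * c[1][0])

def product_state(psi, phi):
    """Coefficient matrix of the tensor product |psi>|phi>."""
    return [[a * b for b in phi] for a in psi]

s = 1 / math.sqrt(2)
unentangled = product_state([0.6, 0.8j], [s, -s])
epr_state = [[0, s], [-s, 0]]                          # spin-0 EPR pair
partial = [[math.sqrt(0.9), 0], [0, math.sqrt(0.1)]]   # weakly entangled

c_product = concurrence(unentangled)
c_epr = concurrence(epr_state)
c_partial = concurrence(partial)
print(c_product, c_epr, c_partial)
```

Since a zero determinant is a single (complex) condition on the coefficients, a state chosen at random fails it, which is one way of seeing why the vast majority of states are entangled.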


Schrödinger’s Cat


To clarify the nature of the problem we must consider a 
key feature of the U formalism, namely “linearity,” 
which is supposed to hold no matter how large or 
complicated is the quantum system under considera- 
tion. Recall the quantum superposition principle, which 
allows us to construct arbitrary combinations of states 


$$|\psi\rangle = w\,|\chi\rangle + z\,|\phi\rangle$$


from two given states |χ⟩ and |φ⟩. Quantum linearity
tells us that if


$$|\chi\rangle \rightsquigarrow |\chi'\rangle \quad\text{and}\quad |\phi\rangle \rightsquigarrow |\phi'\rangle$$


where the symbol “⇝” expresses how a state will
have evolved after a specified time period T, then


$$|\psi\rangle = w\,|\chi\rangle + z\,|\phi\rangle \;\rightsquigarrow\; |\psi'\rangle = w\,|\chi'\rangle + z\,|\phi'\rangle$$


Let us now consider how this might be applied in
a particular, rather outlandish situation. Let us
suppose that the |χ⟩-evolution consists of a photon
going in one direction, encountering a detector,
which is connected to some murderous device which
kills a cat. The |φ⟩-evolution, on the other hand,
consists of the photon going in some other direction,
missing the detector so that the murderous device is
not activated, and the cat is left alive. These two
alternatives would each be perfectly plausible
evolutions which might take place in the physical
world. Now, by use of a beam splitter (effectively a
“half-silvered mirror”) we can easily arrange for the
initial state of the photon to be the superposition
w|χ⟩ + z|φ⟩ of the two. Then by quantum linearity
we find, as the final result, the superposed state
w|χ′⟩ + z|φ′⟩, in which the cat is in a superposition
of life and death (a “Schrödinger’s cat”).

We note that the two individual final states |χ′⟩
and |φ′⟩ would each involve not just the cat but also
its environment, fully entangled with the cat’s state,
and perhaps also some human observer looking at
the cat. In the latter case, |χ′⟩ would involve the
observer in a state of unhappily perceiving a dead
cat, and |φ′⟩ happily perceiving a live one. Two of
the “conventional standpoints” with regard to the
measurement problem are of relevance here. According
to the standpoint of environmental decoherence,
the details of the environmental degrees of freedom
are completely inaccessible, and it is deemed to be
appropriate to construct a density matrix to describe
the situation, which is a partial trace D of the
quantity |ψ⟩⟨ψ|, constructed by tracing out over all
the environmental degrees of freedom:


$$D = \text{trace over environment}\,\{\,|\psi\rangle\langle\psi|\,\}$$


The density matrix tends to be regarded as a more 
appropriate quantity than the ket |ψ⟩ to represent
the physical situation, although this represents 
something of an “ontology shift” from the point of 
view that was being held previously. Under appro- 
priate assumptions, D may now be shown to attain a 
form that is close to being diagonal in a basis with 
respect to which the cat is either dead or alive, and 
then, by a second “ontology shift” D is re-read as 
describing a probability mixture of these two states. 
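
The construction of D can be made concrete in a toy calculation. In this sketch, which rests on assumptions of our own, the “cat” is a two-state system, the environment is also two-dimensional, and the total state is w|dead⟩|e_dead⟩ + z|alive⟩|e_alive⟩; the function names are hypothetical. The partial trace is then computed explicitly.

```python
import math

def reduced_density_matrix(c):
    """Partial trace over the environment of |psi><psi|, where
    |psi> = sum_{s,e} c[s][e] |s>|e>  (s: cat index, e: environment index)."""
    n_sys, n_env = len(c), len(c[0])
    return [[sum(c[s][e] * c[t][e].conjugate() for e in range(n_env))
             for t in range(n_sys)] for s in range(n_sys)]

def cat_state(w, z, env_dead, env_alive):
    """Coefficients of w|dead>|env_dead> + z|alive>|env_alive>."""
    return [[w * x for x in env_dead], [z * x for x in env_alive]]

w = z = 1 / math.sqrt(2)

# Environment states that barely distinguish the two alternatives.
angle = 0.1
D_weak = reduced_density_matrix(
    cat_state(w, z, [1.0, 0.0], [math.cos(angle), math.sin(angle)]))

# Orthogonal environment states: the alternatives are fully "recorded".
D_strong = reduced_density_matrix(
    cat_state(w, z, [1.0, 0.0], [0.0, 1.0]))

print(D_weak)
print(D_strong)
```

The off-diagonal element of D equals w z̄ times the overlap of the two environment states, so it vanishes when those states are orthogonal. The resulting diagonal matrix coincides with that of a probability mixture of “dead” and “alive”, but the calculation itself does not say that it should be read that way; that reading is the second “ontology shift” referred to above.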

According to the second “conventional standpoint” 
under consideration here, it is not logical to take this 
detour through a density-matrix description, and 
instead one should maintain a consistent ontology by 
following the evolution of the state |ψ⟩ itself throughout.
The “real” resulting physical state is then taken to
be actually |ψ⟩, which involves the superposition of a
dead and live cat. Of course this “reality” does not agree 
with the reality that we actually perceive, so the position 
is taken that a conscious mind would not actually be 
able to function in such a superposed condition, and 
would have to settle into a state of perception of either a 
dead cat or a live one, these two alternatives occurring 
with probabilities as given by the Born rule stated 
above. It may be argued that this conclusion depends
upon some appropriate theory of how conscious minds
actually perceive things, and this appears to be lacking. 

A good many physicists might argue that none of 
these attempts at resolution of the measurement 
problem is satisfactory, including “Copenhagen,” 
although the latter at least has the advantage of 
offering a pragmatic, if not fully logical, stance. Such 
physicists might take the position that it is necessary 
to move away from the precise version of quantum 
theory that we have at present, and turn to one of its 
modifications. Some major candidates for modifica- 
tion are discussed in Quantum Mechanics: General- 
izations. Most of these actually make predictions 
that, at some stage, would differ from those of 
standard quantum mechanics. So it becomes an 
experimental matter to ascertain the plausibility of 
these schemes. In addition, there are reinterpretations 
which do not change quantum theory’s predictions, 
such as the de Broglie-Bohm model. In this, there are
two levels of “reality,” a firmer one with a particle or 
position-space ontology, and a secondary one con- 
taining waves which guide the behavior at the firmer 
level. It is clear, however, that these issues will 
remain the subject of debate for many years to come. 


See also: Functional Integration in Quantum Physics; 
Normal Forms and Semiclassical Approximation; 
Quantum Mechanics: Generalizations; Twistor Theory: 
Some Applications [In Integrable Systems, Complex 
Geometry and String Theory]; Twistors. 

Introduction


According to the so-called “Copenhagen Interpreta- 
tion,” standard quantum theory is limited to describ- 
ing experimental situations. It is at once remarkably 
successful in its predictions, and remarkably ill-defined 
in its conceptual structure: what is an experiment?
what physical objects do or do not require
quantization? how are the states realized in nature to
be characterized? how and when is the wave-function
“collapse postulate” to be invoked? Because of its
success, one may suspect that quantum theory can be 
promoted from a theory of measurement to a theory 
of reality. But, that requires there to be an unambig- 
uous specification (S) of the possible real states of 
nature and their probabilities of being realized. 

There are several approaches that attempt to 
achieve S. The more conservative approaches (e.g., 
consistent histories, environmental decoherence, 
many worlds) do not produce any predictions that 
differ from the standard ones because they do not 
tamper with the usual basic mathematical 
formalism. Rather, they utilize structures compatible
with standard quantum theory to elucidate S. These 
approaches, which will not be discussed in this 
article, have arguably been less successful so far at 
achieving S than approaches that introduce 
significant alterations to quantum theory. 

This article will largely deal with the two most 
well-developed realistic models that reproduce 
quantum theory in some limit and yield potentially 
new and testable physics outside that limit. First, the 
pilot-wave model, which will be discussed in the 
broader context of “hidden-variables theories.” 
Second, the continuous spontaneous localization 
(CSL) model, which describes wave-function col- 
lapse as a physical process. Other related models 
will also be discussed briefly. 

Due to bibliographic space limitations, this article 
contains a number of uncited references, of the form 
“[author] in [year].” Those in the next section can
be found in Valentini (2002b, 2004a,b) or at 
www.arxiv.org. Those in the subsequent sections 
can be found in Adler (2004), Bassi and Ghirardi 
(2003), Pearle (1999) (or in subsequent papers by 
these authors, or directly, at www.arxiv.org), and in
Wallstrom (1994).


Hidden Variables and Quantum 
Nonequilibrium 


A deterministic hidden-variables theory defines a
mapping ω = ω(M, λ) from initial hidden parameters
λ (defined, e.g., at the time of preparation of a
quantum state) to final outcomes ω of quantum
measurements. The mapping depends on macro- 
scopic experimental settings M, and fixes the out- 
come for each run of the experiment. Bell’s theorem 
of 1964 shows that, for entangled quantum states of 
widely separated systems, the mapping must be 
nonlocal: some outcomes for (at least) one system 
must depend on the setting for another distant 
system. 

In a viable theory, the statistics of quantum
measurement outcomes (over an ensemble of
experimental trials with fixed settings M) will
agree with quantum theory for some special distribution
ρ_QT(λ) of hidden variables. For example,
expectation values will coincide with the predictions
of the Born rule


$$\langle\omega\rangle_{\mathrm{QT}} = \int d\lambda\; \rho_{\mathrm{QT}}(\lambda)\,\omega(M,\lambda) = \operatorname{tr}(\hat{\rho}\,\hat{\Omega})$$


for an appropriate density operator ρ̂ and Hermitian
observable Ω̂. (As is customary in this context,
∫dλ is to be understood as a generalized sum.)


However, given the mapping ω = ω(M, λ) for individual
trials, one may, in principle, consider
nonstandard distributions ρ(λ) ≠ ρ_QT(λ) that yield
statistics outside the domain of ordinary quantum
theory (Valentini 1991, 2002a). We may say that 
such distributions correspond to a state of quantum 
nonequilibrium. 
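
The distinction between the mapping ω(M, λ) and the distribution ρ(λ) can be illustrated with a deliberately simple toy model. Everything in the sketch below is an assumption made for illustration and is not taken from any of the theories discussed in this article: a single spin-1/2 system prepared “up”, a setting M given by an analyser angle θ, a hidden variable λ in [0, 1), and an outcome rule chosen so that the uniform distribution reproduces the Born-rule expectation cos θ.

```python
import math

def outcome(theta, lam):
    """Deterministic mapping omega(M, lambda): M is the analyser angle theta,
    lambda in [0, 1) is the hidden variable (toy model, assumed)."""
    return +1 if lam < math.cos(theta / 2) ** 2 else -1

def expectation(theta, density, n=200_000):
    """Midpoint-rule estimate of the integral of density(lam)*omega(theta, lam)."""
    h = 1.0 / n
    return h * sum(density((k + 0.5) * h) * outcome(theta, (k + 0.5) * h)
                   for k in range(n))

def rho_equilibrium(lam):
    return 1.0            # plays the role of rho_QT in this toy model

def rho_nonequilibrium(lam):
    return 2.0 * lam      # a different, still normalized, distribution

theta = 1.0
born = math.cos(theta)    # tr(rho Omega) for spin "up" measured at angle theta
mean_eq = expectation(theta, rho_equilibrium)
mean_neq = expectation(theta, rho_nonequilibrium)
print(born, mean_eq, mean_neq)
```

With the same deterministic mapping, the equilibrium distribution reproduces the quantum expectation value, whereas the nonequilibrium distribution gives 2cos⁴(θ/2) − 1 instead. Being a single-system model, it says nothing about nonlocality; it only shows how statistics outside ordinary quantum theory arise from the choice of ρ(λ).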

Quantum nonequilibrium is characterized by the 
breakdown of a number of basic quantum con- 
straints. In particular, nonlocal signals appear at the 
statistical level. We shall first illustrate this for the 
hidden-variables model of de Broglie and Bohm. 
Then we shall generalize the discussion to all 
(deterministic) hidden-variables theories. 

At present there is no experimental evidence for 
quantum nonequilibrium in nature. However, from 
a hidden-variables perspective, it is natural to 
explore the theoretical properties of nonequilibrium 
distributions, and to search experimentally for the 
statistical anomalies associated with them. 

From this point of view, quantum theory is a 
special case of a wider physics, much as thermal 
physics is a special case of a wider (nonequilibrium) 
physics. (The special distribution ρ_QT(λ) is analogous
to, say, Maxwell’s distribution of molecular
speeds.) Quantum physics may be compared with 
the physics of global thermal equilibrium, which is 
characterized by constraints — such as the impossi- 
bility of converting heat into work (in the absence of 
temperature differences) — that are not fundamental 
but contingent on the state. Similarly, quantum 
constraints such as statistical locality (the impossi- 
bility of converting entanglement into a practical 
signal) are seen as contingencies of ρ_QT(λ).


Pilot-Wave Theory 


The de Broglie-Bohm “pilot-wave theory” (as it
was originally called by de Broglie, who first
presented it at the Fifth Solvay Congress in 1927)
is the classic example of a deterministic hidden- 
variables theory of broad scope (Bohm 1952, Bell 
1987, Holland 1993). We shall use it to illustrate the 
above ideas. Later, the discussion will be generalized 
to arbitrary theories. 

In pilot-wave dynamics, an individual closed
system with (configuration-space) wave function
Ψ(X, t) satisfying the Schrödinger equation


$$i\hbar\,\frac{\partial\Psi}{\partial t} = \hat{H}\Psi \qquad [1]$$


has an actual configuration X(t) with velocity


$$\dot{X}(t) = \frac{J(X,t)}{|\Psi(X,t)|^2} \qquad [2]$$


where J = J[Ψ] = J(X, t) satisfies the continuity
equation


$$\frac{\partial|\Psi|^2}{\partial t} + \nabla\cdot J = 0 \qquad [3]$$


(which follows from [1]). In quantum theory, J is the
“probability current.” In pilot-wave theory, Ψ is an
objective physical field (on configuration space) 
guiding the motion of an individual system. 

Here, the objective state (or ontology) for a closed 
system is given by Ψ and X. A probability distribution
for X (discussed below) completes an
unambiguous specification S (as mentioned in the 
introduction). 

Pilot-wave dynamics may be applied to any 
quantum system with a locally conserved current in 
configuration space. Thus, X may represent a many- 
body system, or the configuration of a continuous 
field, or perhaps some other entity. 

For example, at low energies, for a system of N
particles with positions x_i(t) and masses
m_i (i = 1, 2, ..., N), with an external potential V,
[1] (with X = (x_1, x_2, ..., x_N)) reads


$$i\hbar\,\frac{\partial\Psi}{\partial t} = \sum_{i=1}^{N} -\frac{\hbar^2}{2m_i}\nabla_i^2\Psi + V\Psi \qquad [4]$$


while [2] has components


$$\frac{dx_i}{dt} = \frac{\hbar}{m_i}\,\mathrm{Im}\!\left(\frac{\nabla_i\Psi}{\Psi}\right) = \frac{\nabla_i S}{m_i} \qquad [5]$$


(where Ψ = |Ψ|e^{iS/ħ}).
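
As a minimal sketch of [5] for a single particle in one dimension, the velocity (ħ/m) Im(Ψ′/Ψ) can be evaluated numerically. The assumptions here are ours: units with ħ = m = 1, a central finite difference for the spatial derivative, and two sample wave functions at a fixed instant; the function names are hypothetical.

```python
import cmath

HBAR = 1.0   # units with hbar = 1 (assumption for illustration)
MASS = 1.0

def guidance_velocity(psi, x, dx=1e-6):
    """(hbar/m) Im(psi'(x)/psi(x)), using a central finite difference."""
    dpsi = (psi(x + dx) - psi(x - dx)) / (2 * dx)
    return (HBAR / MASS) * (dpsi / psi(x)).imag

K = 2.5

def plane_wave(x):
    # exp(i k x): the phase is S = hbar*k*x, so the velocity is hbar*k/m
    return cmath.exp(1j * K * x)

def two_waves(x):
    # superposition of wavenumbers 1 and 3 with unequal weights
    return cmath.exp(1j * 1.0 * x) + 0.5 * cmath.exp(1j * 3.0 * x)

v_plane = guidance_velocity(plane_wave, 0.3)
v_a = guidance_velocity(two_waves, 0.0)
v_b = guidance_velocity(two_waves, 1.5)
print(v_plane, v_a, v_b)
```

For the plane wave the velocity is the constant ħk/m, while for the superposition it varies strongly from point to point and can even change sign. This position dependence is what makes the velocity field capable of “stirring” an ensemble of configurations.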

In general, [1] and [2] determine X(t) for an
individual system, given the initial conditions
X(0), Ψ(X, 0) at t = 0. For an arbitrary initial
distribution P(X, 0), over an ensemble with the
same wave function Ψ(X, 0), the evolution P(X, t)
of the distribution is given by the continuity
equation


$$\frac{\partial P}{\partial t} + \nabla\cdot(P\dot{X}) = 0 \qquad [6]$$


The outcome of an experiment is determined by
X(0), Ψ(X, 0), which may be identified with λ. For
an ensemble with the same Ψ(X, 0), we have
λ = X(0).


Quantum equilibrium From [3] and [6], if we
assume P(X, 0) = |Ψ(X, 0)|² at t = 0, we obtain
P(X, t) = |Ψ(X, t)|² (the Born-rule distribution of
configurations) at all times t.

Quantum measurements are, like any other 
process, described and explained in terms of evol- 
ving configurations. For measurement devices whose 
pointer readings reduce to configurations, the 
distribution of outcomes of quantum measurements
will match the statistical predictions of quantum
theory (Bohm 1952, Bell 1987, Dürr et al. 2003).
Thus, quantum theory emerges phenomenologically
for a “quantum equilibrium” ensemble with
distribution P(X, t) = |Ψ(X, t)|² (or ρ(λ) = ρ_QT(λ)).


Quantum nonequilibrium In principle, as we saw
for general hidden-variables theories, we may consider
a nonequilibrium distribution P(X, 0) ≠
|Ψ(X, 0)|² of initial configurations while retaining
the same deterministic dynamics [1], [2] for individual
systems (Valentini 1991). The time evolution of
P(X, t) will be determined by [6].

As we shall see, in appropriate circumstances 
(with a sufficiently complicated velocity field Ẋ), [6]
generates relaxation P → |Ψ|² on a coarse-grained
level, much as the analogous classical evolution on 
phase space generates thermal relaxation. But for as 
long as the ensemble is in nonequilibrium, the 
statistics of outcomes of quantum measurements 
will disagree with quantum theory. 

Quantum nonequilibrium may have existed in the 
very early universe, with relaxation to equilibrium 
occurring soon after the big bang. Thus, a hidden- 
variables analog of the classical thermodynamic 
“heat death of the universe” may have actually 
taken place (Valentini 1991). Even so, relic cosmo- 
logical particles that decoupled sufficiently early 
could still be in nonequilibrium today, as suggested 
by Valentini in 1996 and 2001. It has also been 
speculated that nonequilibrium could be generated 
in systems entangled with degrees of freedom behind 
a black-hole event horizon (Valentini 2004a). 

Experimental searches for nonequilibrium have 
been proposed. Nonequilibrium could be detected 
by the statistical analysis of random samples of 
particles taken from a parent population of (for 
example) relics from the early universe. Once the 
parent distribution is known, the rest of the popula- 
tion could be used as a resource, to perform tasks 
that are currently impossible (Valentini 2002b). 


H-Theorem: Relaxation to Equilibrium 


Before discussing the potential uses of nonequili- 
brium, we should first explain why all systems 
probed so far have been found in the equilibrium 
state P = |Ψ|². This distribution may be accounted
for along the lines of classical statistical mechanics, 
noting that all currently accessible systems have had 
a long and violent astrophysical history. 

Dividing configuration space into small cells, and
introducing coarse-grained quantities $\bar{P}$, $\overline{|\Psi|^2}$, a general
argument for relaxation $\bar{P} \to \overline{|\Psi|^2}$ is based on an
analog of the classical coarse-graining H-theorem.
The coarse-grained H-function


$$\bar{H} = \int dX\; \bar{P}\,\ln\!\left(\bar{P}\,/\,\overline{|\Psi|^2}\right) \qquad [7]$$


(minus the relative entropy of $\bar{P}$ with respect to
$\overline{|\Psi|^2}$) obeys the H-theorem (Valentini 1991)


$$\bar{H}(t) \le \bar{H}(0)$$


(assuming no initial fine-grained microstructure in P
and |Ψ|²). Here, $\bar{H} \ge 0$ for all $\bar{P}$, $\overline{|\Psi|^2}$ and $\bar{H} = 0$ if
and only if $\bar{P} = \overline{|\Psi|^2}$ everywhere.

The H-theorem expresses the fact that $P$ and $|\Psi|^2$
behave like two “fluids” that are “stirred” by the same
velocity field $\dot{X}$, so that $P$ and $|\Psi|^2$ tend to become
indistinguishable on a coarse-grained level. Like its
classical analog, the theorem provides a general
understanding of how equilibrium is approached,
while not proving that equilibrium is actually
reached. (And of course, for some simple systems,
such as a particle in the ground state of a box, for
which the velocity field $\nabla S/m$ vanishes, there is no
relaxation at all.) A strict decrease of $\bar{H}(t)$ immediately
after $t = 0$ is guaranteed if $\dot{X}_0 \cdot \nabla(P_0/|\Psi_0|^2)$ has
nonzero spatial variance over a coarse-graining cell,
as shown by Valentini in 1992 and 2001.

A relaxation timescale $\tau$ may be defined by
$1/\tau^2 \equiv -(d^2\bar{H}/dt^2)_0/\bar{H}_0$. For a single particle with
quantum energy spread $\Delta E$, a crude estimate given
by Valentini in 2001 yields $\tau \sim (1/\varepsilon)\hbar^2/m^{1/2}(\Delta E)^{3/2}$,
where $\varepsilon$ is the coarse-graining length. For wave
functions that are superpositions of many energy
eigenfunctions, the velocity field (generally) varies
rapidly, and detailed numerical simulations (in two
dimensions) show that relaxation occurs with an
approximately exponential decay $\bar{H}(t) \approx \bar{H}_0 e^{-t/t_c}$,
with a time constant $t_c$ of order $\tau$ (Valentini and
Westman 2005).
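
As a minimal numerical sketch of the coarse-grained H-function [7], the following evaluates the sum over cells for two hypothetical five-cell distributions. The cell values, the function name `coarse_grained_H`, and the unit cell volume are illustrative assumptions, not taken from the cited simulations; the sketch only checks that the function vanishes in equilibrium and is positive otherwise.

```python
import math

def coarse_grained_H(P_bar, psi2_bar, cell_volume=1.0):
    """Discrete version of [7]: sum over cells of P ln(P / |Psi|^2) times the cell volume."""
    total = 0.0
    for p, q in zip(P_bar, psi2_bar):
        if p > 0.0:  # the term p ln p tends to 0 as p -> 0
            total += p * math.log(p / q) * cell_volume
    return total

# Hypothetical coarse-grained values over five cells (each list sums to 1).
equilibrium = [0.1, 0.2, 0.4, 0.2, 0.1]      # plays the role of coarse-grained |Psi|^2
nonequilibrium = [0.3, 0.3, 0.2, 0.1, 0.1]   # plays the role of coarse-grained P

H_eq = coarse_grained_H(equilibrium, equilibrium)
H_neq = coarse_grained_H(nonequilibrium, equilibrium)
print(H_eq, H_neq)
```

The design choice of skipping empty cells reflects the usual convention that $p \ln p \to 0$ as $p \to 0$.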

Equilibrium is then to be expected for particles 
emerging from the violence of the big bang. The 
possibility is still open that relics from very early 
times may not have reached equilibrium before 
decoupling. 


Nonlocal Signaling 


We now show how nonequilibrium, if it were ever 
discovered, could be used for nonlocal signaling. 
Pilot-wave dynamics is nonlocal. For a pair of
particles A, B with entangled wave function
$\Psi(x_A, x_B, t)$, the velocity $\dot{x}_A(t) = \nabla_A S(x_A, x_B, t)/m_A$
of A depends instantaneously on $x_B$, and local
operations at B (such as switching on a potential)
instantaneously affect the motion of A. For an
ensemble $P(x_A, x_B, t) = |\Psi(x_A, x_B, t)|^2$, local operations
at B have no statistical effect at A: the
individual nonlocal effects vanish upon averaging
over an equilibrium ensemble.

Nonlocality is (generally) hidden by statistical
noise only in quantum equilibrium. If instead
$P(x_A, x_B, 0) \ne |\Psi(x_A, x_B, 0)|^2$, a local change in the
Hamiltonian at B generally induces an instantaneous
change in the marginal $p_A(x_A, t) \equiv \int d^3x_B\, P(x_A, x_B, t)$
at A. For example, in one dimension
a sudden change $\hat{H}_B \to \hat{H}'_B$ in the Hamiltonian
at B induces a change $\Delta p_A \equiv p_A(x_A, t) - p_A(x_A, 0)$
(for small $t$) (Valentini 1991),

$$\Delta p_A = -\frac{t^2}{4m}\frac{\partial}{\partial x_A}\left(a(x_A)\int dx_B\, b(x_B)\,\frac{P(x_A, x_B, 0) - |\Psi(x_A, x_B, 0)|^2}{|\Psi(x_A, x_B, 0)|^2}\right) \qquad [8]$$

(Here $m_A = m_B = m$, $a(x_A)$ depends on $\Psi(x_A, x_B, 0)$,
while $b(x_B)$ also depends on $\hat{H}'_B$, and vanishes if
$\hat{H}'_B = \hat{H}_B$.) The signal is generally nonzero if
$P_0 \ne |\Psi_0|^2$.

Nonlocal signals do not lead to causal paradoxes 
if, at the hidden-variable level, there is a preferred 
foliation of spacetime with a time parameter that 
defines a fundamental causal sequence. Such sig- 
nals, if they were observed, would define an 
absolute simultaneity as discussed by Valentini in 
1992 and 2005. Note that in pilot-wave field 
theory, Lorentz invariance emerges as a phenom- 
enological symmetry of the equilibrium state, 
conditional on the structure of the field-theoretical 
Hamiltonian (as discussed by Bohm and Hiley in 
1984, Bohm, Hiley and Kaloyerou in 1987, and 
Valentini in 1992 and 1996). 


Subquantum Measurement 


In principle, nonequilibrium particles could also be 
used to perform “subquantum measurements” on 
ordinary, equilibrium systems. We illustrate this 
with an exactly solvable one-dimensional model 
(Valentini 2002b). 

Consider an apparatus “pointer” coordinate $y$,
with known wave function $g_0(y)$ and known
(ensemble) distribution $\pi_0(y) \ne |g_0(y)|^2$, where $\pi_0(y)$
has been deduced by statistical analysis of random
samples from a parent population with known wave
function $g_0(y)$. (We assume that relaxation may be
neglected: for example, if $g_0$ is a box ground state,
$\dot{y} = 0$ and $\pi_0(y)$ is static.) Consider also a “system”
coordinate $x$ with known wave function $\psi_0(x)$ and
known distribution $\rho_0(x) = |\psi_0(x)|^2$. If $\pi_0(y)$ is
arbitrarily narrow, $x_0$ can be measured without
disturbing $\psi_0(x)$, to arbitrary accuracy (violating the
uncertainty principle).

To do this, at $t = 0$ we switch on an interaction
Hamiltonian $\hat{H} = a\hat{x}\hat{p}_y$, where $a$ is a constant and $p_y$
is canonically conjugate to $y$. For relatively large $a$,
we may neglect the Hamiltonians of $x$ and $y$. For
$\Psi = \Psi(x, y, t)$, we then have $\partial\Psi/\partial t = -ax\,\partial\Psi/\partial y$.
For $|\Psi|^2$ we have the continuity equation $\partial|\Psi|^2/\partial t =
-ax\,\partial|\Psi|^2/\partial y$, which implies the hidden-variable
velocity fields $\dot{x} = 0$, $\dot{y} = ax$ and trajectories $x(t) = x_0$,
$y(t) = y_0 + ax_0t$.

The initial product $\Psi_0(x, y) = \psi_0(x)g_0(y)$ evolves
into $\Psi(x, y, t) = \psi_0(x)g_0(y - axt)$. For $at \to 0$ (with $a$
large but fixed), $\Psi(x, y, t) \to \psi_0(x)g_0(y)$ and $\psi_0(x)$ is
undisturbed: for small $at$, a standard quantum
pointer with the coordinate $y$ would yield negligible
information about $x_0$. Yet, for arbitrarily small $at$,
the hidden-variable pointer coordinate $y(t) = y_0 + ax_0t$
does contain complete information about $x_0$ (and
$x(t) = x_0$). This “subquantum” information will be
visible to us if $\pi_0(y)$ is sufficiently narrow.

For, over an ensemble of similar experiments,
with initial joint distribution $P_0(x, y) = |\psi_0(x)|^2\pi_0(y)$
(equilibrium for $x$ and nonequilibrium for $y$), the
continuity equation $\partial P/\partial t = -ax\,\partial P/\partial y$ implies that
$P(x, y, t) = |\psi_0(x)|^2\pi_0(y - axt)$. If $\pi_0(y)$ is localized
around $y = 0$ ($\pi_0(y) = 0$ for $|y| > w/2$), then a standard
(faithful) measurement of $y$ with result $y_{\rm meas}$
will imply that $x$ lies in the interval $(y_{\rm meas}/at - w/2at,
y_{\rm meas}/at + w/2at)$ (so that $P(x, y, t) \ne 0$). Taking the
simultaneous limits $at \to 0$, $w \to 0$, with $w/at \to 0$,
the midpoint $y_{\rm meas}/at \to x_0$ (since $y_{\rm meas} = y_0 + ax_0t$
and $|y_0| \le w/2$), while the error $w/2at \to 0$.

If $w$ is arbitrarily small, a sequence of such
measurements will determine the hidden trajectory
$x(t)$ without disturbing $\psi(x, t)$, to arbitrary accuracy.
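
The inference step of the model can be sketched numerically. The snippet below is only an illustration under stated assumptions: the numerical values of the coupling, time, pointer width and system position are hypothetical, the function names are invented here, and a uniform draw stands in for a narrow nonequilibrium pointer distribution of width w. It uses the trajectory y(t) = y0 + a x0 t and the interval of half-width w/2at quoted above.

```python
import random

def pointer_readout(x0, y0, a, t):
    """Hidden-variable pointer trajectory y(t) = y0 + a * x0 * t."""
    return y0 + a * x0 * t

def infer_x0(y_meas, a, t, w):
    """Return (midpoint, half_width) of the interval that must contain x0."""
    return y_meas / (a * t), w / (2.0 * a * t)

random.seed(0)
a, t = 1.0e3, 1.0e-6                 # hypothetical coupling and time, a*t = 1e-3
w = 1.0e-9                           # hypothetical width of the pointer ensemble
x0 = 0.37                            # hypothetical system position to be inferred
y0 = random.uniform(-w / 2, w / 2)   # pointer starts somewhere within the narrow support

y_meas = pointer_readout(x0, y0, a, t)
estimate, half_width = infer_x0(y_meas, a, t, w)
print(estimate, half_width)
```

The error bound shrinks only if w/at is small, which is why the limits are taken with w going to zero faster than at.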


Subquantum Information and Computation 


From a hidden-variables perspective, immense phy- 
sical resources are hidden from us by equilibrium 
statistical noise. Quantum nonequilibrium would 
probably be as useful technologically as thermal or 
chemical nonequilibrium. 


Distinguishing nonorthogonal states In quantum
theory, nonorthogonal states $|\psi_1\rangle$, $|\psi_2\rangle$ ($\langle\psi_1|\psi_2\rangle \ne 0$)
cannot be distinguished without disturbing them.
This theorem breaks down in quantum nonequilibrium
(Valentini 2002b). For example, if $|\psi_1\rangle$, $|\psi_2\rangle$
are distinct states of a single spinless particle, then
the associated de Broglie–Bohm velocity fields will
in general be different, even if $\langle\psi_1|\psi_2\rangle \ne 0$, and so
will the hidden-variable trajectories. Subquantum
measurement of the trajectories could then distinguish
the states $|\psi_1\rangle$, $|\psi_2\rangle$.


Breaking quantum cryptography The security of 
standard protocols for quantum key distribution 
depends on the validity of the laws of quantum 
theory. These protocols would become insecure 
given the availability of nonequilibrium systems 
(Valentini 2002b). 

The protocols known as BB84 and B92 depend on 
the impossibility of distinguishing nonorthogonal 
quantum states without disturbing them. An eaves- 
dropper in possession of nonequilibrium particles could 
distinguish the nonorthogonal states being transmitted 
between two parties, and so read the supposedly secret 
key. Further, if subquantum measurements allow an 
eavesdropper to predict quantum measurement out- 
comes at each “wing” of a (bipartite) entangled state, 
then the EPR (Einstein—Podolsky—Rosen) protocol also 
becomes insecure. 


Subquantum computation It has been suggested 
that nonequilibrium physics would be computation- 
ally more powerful than quantum theory, because of 
the ability to distinguish nonorthogonal states 
(Valentini 2002b). However, this ability depends 
on the (less-than-quantum) dispersion w of the 
nonequilibrium ensemble. A well-defined model of 
computational complexity requires that the 
resources be quantified in some way. Here, a key 
question is how the required w scales with the size 
of the computational task. So far, no rigorous results 
are known. 


Extension to All Deterministic 
Hidden-Variables Theories 


Let us now discuss arbitrary (deterministic) theories. 


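Nonlocal signaling Consider a pair of two-state
quantum systems A and B, which are widely
separated and in the singlet state. Quantum
measurements of observables $\hat{\sigma}_A \equiv \mathbf{m}_A\cdot\hat{\boldsymbol{\sigma}}_A$, $\hat{\sigma}_B \equiv
\mathbf{m}_B\cdot\hat{\boldsymbol{\sigma}}_B$ (where $\mathbf{m}_A$, $\mathbf{m}_B$ are unit vectors in Bloch
space and $\hat{\boldsymbol{\sigma}}_A$, $\hat{\boldsymbol{\sigma}}_B$ are Pauli spin operators) yield
outcomes $\sigma_A, \sigma_B = \pm 1$, in the ratio 1:1 at each
wing, with a correlation $\langle\hat{\sigma}_A\hat{\sigma}_B\rangle = -\mathbf{m}_A\cdot\mathbf{m}_B$. Bell's
theorem shows that for a hidden-variables theory to
reproduce this correlation (upon averaging over an
equilibrium ensemble with distribution $\rho_{QT}(\lambda)$) it
must take the nonlocal form

$$\sigma_A = \sigma_A(\mathbf{m}_A, \mathbf{m}_B, \lambda), \quad \sigma_B = \sigma_B(\mathbf{m}_A, \mathbf{m}_B, \lambda) \qquad [9]$$

More precisely, to obtain $\langle\sigma_A\sigma_B\rangle_{QT} = -\mathbf{m}_A\cdot\mathbf{m}_B$
(where $\langle\sigma_A\sigma_B\rangle_{QT} \equiv \int d\lambda\,\rho_{QT}(\lambda)\sigma_A\sigma_B$), at least one of
$\sigma_A$, $\sigma_B$ must depend on the measurement setting at
the distant wing. Without loss of generality, we
assume that $\sigma_A$ depends on $\mathbf{m}_B$.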

For an arbitrary nonequilibrium ensemble with
distribution $\rho(\lambda) \ne \rho_{QT}(\lambda)$, in general $\langle\sigma_A\sigma_B\rangle \equiv
\int d\lambda\,\rho(\lambda)\sigma_A\sigma_B$ differs from $-\mathbf{m}_A\cdot\mathbf{m}_B$, and the outcomes
$\sigma_A, \sigma_B = \pm 1$ occur in a ratio different from 1:1.
Further, a change of setting $\mathbf{m}_B \to \mathbf{m}'_B$ at B will generally
induce a change in the outcome statistics at A, yielding a
nonlocal signal at the statistical level. To see this, note
that, in a nonlocal theory, the “transition sets”

$$T_A(-,+) \equiv \{\lambda \mid \sigma_A(\mathbf{m}_A, \mathbf{m}_B, \lambda) = -1,\ \sigma_A(\mathbf{m}_A, \mathbf{m}'_B, \lambda) = +1\}$$
$$T_A(+,-) \equiv \{\lambda \mid \sigma_A(\mathbf{m}_A, \mathbf{m}_B, \lambda) = +1,\ \sigma_A(\mathbf{m}_A, \mathbf{m}'_B, \lambda) = -1\}$$

cannot be empty for arbitrary settings. Yet, in quantum
equilibrium, the outcomes $\sigma_A = \pm 1$ occur in the ratio
1:1 for all settings, so the transition sets must
have equal equilibrium measure, $\mu_{QT}[T_A(-,+)] =
\mu_{QT}[T_A(+,-)]$ ($d\mu_{QT} \equiv \rho_{QT}(\lambda)d\lambda$). That is, the
fraction of the equilibrium ensemble making the
transition $\sigma_A = -1 \to \sigma_A = +1$ under $\mathbf{m}_B \to \mathbf{m}'_B$ must
equal the fraction making the reverse transition
$\sigma_A = +1 \to \sigma_A = -1$. (This “detailed balancing” is
analogous to the principle of detailed balance in
statistical mechanics.) Since $T_A(-,+)$, $T_A(+,-)$ are
fixed by the deterministic mapping, they are independent
of the ensemble distribution $\rho(\lambda)$. Thus, for
$\rho(\lambda) \ne \rho_{QT}(\lambda)$, in general $\mu[T_A(-,+)] \ne \mu[T_A(+,-)]$
($d\mu \equiv \rho(\lambda)d\lambda$): the fraction of the nonequilibrium
ensemble making the transition $\sigma_A = -1 \to \sigma_A = +1$
will not in general balance the fraction making the
reverse transition. The outcome ratio at A will then
change under $\mathbf{m}_B \to \mathbf{m}'_B$ and there will be an instantaneous
signal at the statistical level from B to A
(Valentini 2002a).
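
The measure-theoretic point can be illustrated with a deliberately artificial toy model. Everything in the sketch below is an assumption made for illustration: the hidden variable is taken to be a single number in [0, 1), the two transition sets are arbitrary intervals of equal length, the equilibrium measure is uniform, and the nonequilibrium density is chosen as 2λ. It is not a model of the singlet state; it only shows that sets balanced under one measure are generally unbalanced under another.

```python
def measure(interval, cdf):
    """Measure of an interval (lo, hi) under a distribution with the given CDF."""
    lo, hi = interval
    return cdf(hi) - cdf(lo)

# Hypothetical transition sets on a one-dimensional hidden-variable space [0, 1).
T_minus_plus = (0.0, 0.1)   # lambda values flipping -1 -> +1 under the setting change at B
T_plus_minus = (0.5, 0.6)   # lambda values flipping +1 -> -1

def uniform_cdf(lam):
    """Assumed equilibrium measure: uniform density on [0, 1)."""
    return lam

def skewed_cdf(lam):
    """Assumed nonequilibrium measure: density 2 * lambda on [0, 1)."""
    return lam ** 2

# Net change in the fraction of +1 outcomes at A when the setting at B is changed.
signal_eq = measure(T_minus_plus, uniform_cdf) - measure(T_plus_minus, uniform_cdf)
signal_neq = measure(T_minus_plus, skewed_cdf) - measure(T_plus_minus, skewed_cdf)
print(signal_eq, signal_neq)
```

The transition sets are the same in both cases; only the weighting of the ensemble differs, which is the content of the argument above.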

Thus, in any deterministic hidden-variables
theory, nonequilibrium distributions $\rho(\lambda) \ne \rho_{QT}(\lambda)$
generally allow entanglement to be used for nonlocal
signalling (just as, in ordinary statistical
physics, differences of temperature make it possible
to convert heat into work).


Experimental signature of nonequilibrium Quantum
expectations are additive, $\langle c_1\hat{\Omega}_1 + c_2\hat{\Omega}_2\rangle = c_1\langle\hat{\Omega}_1\rangle +
c_2\langle\hat{\Omega}_2\rangle$, even for noncommuting observables
($[\hat{\Omega}_1, \hat{\Omega}_2] \ne 0$, with $c_1$, $c_2$ real). As emphasized by
Bell in 1966, this seemingly trivial consequence
of the (linearity of the) Born rule $\langle\hat{\Omega}\rangle = \mathrm{Tr}(\hat{\rho}\hat{\Omega})$ is
remarkable because it relates statistics from
distinct, “incompatible” experiments. In nonequilibrium,
such additivity generically breaks
down (Valentini 2004b).


Further, for a two-state system with observables
$\mathbf{m}\cdot\hat{\boldsymbol{\sigma}}$, the “dot-product” structure of the quantum
expectation $\langle\mathbf{m}\cdot\hat{\boldsymbol{\sigma}}\rangle = \mathrm{Tr}(\hat{\rho}\,\mathbf{m}\cdot\hat{\boldsymbol{\sigma}}) = \mathbf{m}\cdot\mathbf{P}$ (for some
Bloch vector $\mathbf{P}$) is equivalent to expectation
additivity (Valentini 2004b). Nonadditive expectations
then provide a convenient signature of nonequilibrium
for any two-state system. For example,
the sinusoidal modulation of the quantum transmission
probability for a single photon through a
polarizer

$$p^{+}_{QT}(\Theta) = \tfrac{1}{2}(1 + \langle\mathbf{m}\cdot\hat{\boldsymbol{\sigma}}\rangle) = \tfrac{1}{2}(1 + P\cos 2\Theta) \qquad [10]$$

(where an angle $\theta$ on the Bloch sphere corresponds
to a physical angle $\Theta = \theta/2$) will generically break
down in nonequilibrium. Deviations from [10]
would provide an unambiguous violation of quantum
theory (Valentini 2004b).

Such deviations were searched for by Papaliolios
in 1967, using laboratory photons and successive
polarization measurements over very short times, to
test a hidden-variables theory (distinct from pilot-wave
theory) due to Bohm and Bub (1966), in which
quantum measurements generate nonequilibrium for
short times. Experimentally, successive measurements
over timescales $\sim 10^{-13}$ s agreed with the
(quantum) sinusoidal modulation $\cos^2\Theta$ to $\lesssim 1\%$.
Similar tests might be performed with photons of a
more exotic origin.


Continuous Spontaneous Localization 
Model (CSL) 


The basic postulate of CSL is that the state vector
$|\psi, t\rangle$ represents reality. Since, for example, in
describing a measurement, the usual Schrödinger
evolution readily takes a real state into a nonreal
state, that is, into a superposition of real states
(such as apparatus states describing different
experimental outcomes), CSL requires a modification
of Schrödinger's evolution. To the Hamiltonian
is added a term which depends upon a classical
randomly fluctuating field $w(x, t)$ and a mass-density
operator $\hat{A}(x, t)$. This term acts to collapse
a superposition of states, which differ in their
spatial distribution of mass density, to one of these
states. The rate of collapse is very slow for a
superposition involving a few particles, but very
fast for a superposition of macroscopically different
states. Thus, very rapidly, what you see (in nature)
is what you get (from the theory). Each state vector
evolving under each $w(x, t)$ corresponds to a
realizable state, and a rule is given for how to
associate a probability with each. In this way, an
unambiguous specification S, as mentioned in the
introduction, is achieved.


Requirements for Stochastic Collapse Dynamics 


Consider a normalized state vector $|\psi, t\rangle =
\sum_n \alpha_n(t)|a_n\rangle$ ($\langle a_n|a_{n'}\rangle = \delta_{nn'}$) which undergoes a
stochastic dynamical collapse process. This means
that, starting from the initial superposition at $t = 0$,
for each run of the process, the squared amplitudes
$x_n(t) \equiv |\alpha_n(t)|^2$ fluctuate until all but one vanish, that
is, $x_m(\infty) = 1$, ($x_{\ne m}(\infty) = 0$) with probability $x_m(0)$.

This may be achieved simply, assuming negligible 
effect of the usual Schrödinger evolution, if the 
stochastic process enjoys the following properties 
(Pearle 1977): 





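$$\sum_n x_n(t) = 1 \qquad [11a]$$
$$\overline{x_n(t)} = x_n(0) \qquad [11b]$$
$$\overline{x_n(\infty)x_m(\infty)} = 0 \quad \text{for } m \ne n \qquad [11c]$$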


where the overbar indicates the ensemble average at
the indicated time. The only way that a sum of
products of non-negative terms can vanish is for at
least one term in each product to vanish. Thus,
according to [11c], for each run, at least one of each
pair $\{x_n(\infty), x_m(\infty)\}$ ($n \ne m$) must vanish. This
means that at most one $x_n(\infty)$ might not vanish
and, by [11a], applied at $t = \infty$, one $x_n(\infty)$ must not
vanish and, in fact, must equal 1: hence, each run
produces collapse. Now, let the probability of the
outcome $\{x_n(\infty) = 1, x_{\ne n}(\infty) = 0\}$ be denoted $P_n$. Since
$\overline{x_n(\infty)} = 1\cdot P_n + \sum_{m\ne n} 0\cdot P_m = P_n$ then, according to
the martingale property [11b], applied at
$t = \infty$, $P_n = x_n(0)$: hence, the ensemble of runs produces
the probability postulated by the usual “collapse
rule” of standard quantum theory.

A (nonquantum) stochastic process which obeys
these equations is the gambler’s ruin game. Suppose
one gambler initially possesses the fraction $x_1(0)$ of
their joint wealth, and the other has the fraction
$x_2(0)$. They toss a coin: heads, a dollar goes from
gambler 1 to gambler 2, tails the dollar goes the
other way. [11a] is satisfied since the sum of money
in the game remains constant, [11b] holds because it
is a fair game, and [11c] holds because each game
eventually ends. Thus, gambler $i$ wins all the money
with probability $x_i(0)$.
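
The gambler's ruin statement can be checked with a short Monte Carlo sketch. The stakes (5 dollars out of 20), the number of runs, and the seed are arbitrary choices made here for illustration; the function name is likewise hypothetical.

```python
import random

def gambler_1_wins(stake_1, total, rng):
    """Play fair one-dollar coin tosses until one gambler holds all the money."""
    money_1 = stake_1
    while 0 < money_1 < total:
        money_1 += 1 if rng.random() < 0.5 else -1
    return money_1 == total

rng = random.Random(12345)            # fixed seed so the run is reproducible
total, stake_1, runs = 20, 5, 4000    # gambler 1 starts with the fraction x_1(0) = 0.25

wins = sum(gambler_1_wins(stake_1, total, rng) for _ in range(runs))
win_fraction = wins / runs
print(win_fraction)                   # should lie close to x_1(0) = 0.25
```

Each run ends with one gambler holding everything (the analog of collapse), while the frequency of wins over many runs tracks the initial fraction (the analog of the collapse rule).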





CSL in Essence 


Consider the (nonunitary) Schrödinger picture evolution
equation

$$|\psi, t\rangle_w = \mathcal{T}\exp\left(-\int_0^t dt'\,\{i\hat{H} + (4\lambda)^{-1}[w(t') - 2\lambda\hat{A}]^2\}\right)|\psi, 0\rangle \qquad [12]$$

where $\hat{H}$ is the usual Hamiltonian, $w(t')$ is an
arbitrary function of white noise class, $\hat{A}$ is a
Hermitian operator ($\hat{A}|a_n\rangle = a_n|a_n\rangle$), $\lambda$ is a collapse
rate parameter, $\mathcal{T}$ is the time-ordering operator and
$\hbar = 1$. Associated with this, the probability rule

$$P_t(w)Dw \equiv {}_w\langle\psi, t|\psi, t\rangle_w \prod_{j=0}^{t/dt}\frac{dw(t_j)}{(2\pi\lambda/dt)^{1/2}} \qquad [13]$$

is defined, which gives the probability that nature
chooses a noise which lies in the range $\{w(t'), w(t') +
dw(t')\}$ for $0 \le t' \le t$ (for calculational purposes,
time is discretized, with $t_0 = 0$).

Equations [12] and [13] contain the essential
features of CSL, and are all that is needed to discuss
the simplest collapse behavior. Set $\hat{H} = 0$, so there is
no competition between collapse and the usual
Schrödinger evolution, and let the initial state vector
be $|\psi, 0\rangle = \sum_n \alpha_n|a_n\rangle$. Equations [12] and [13]
become

$$|\psi, t\rangle_w = \sum_n \alpha_n|a_n\rangle\exp\left(-(4\lambda)^{-1}\int_0^t dt'\,[w(t') - 2\lambda a_n]^2\right) \qquad [14a]$$

$$P_t(w) = \sum_n |\alpha_n|^2\exp\left(-(2\lambda)^{-1}\int_0^t dt'\,[w(t') - 2\lambda a_n]^2\right) \qquad [14b]$$

When the unnormalized state vector in [14a] is
divided by $P_t^{1/2}(w)$ and so normalized, the squared
amplitudes are

$$x_n(t) = |\alpha_n|^2\exp\left(-(2\lambda)^{-1}\int_0^t dt'\,[w(t') - 2\lambda a_n]^2\right)\Big/P_t(w)$$

which are readily shown to satisfy [11a], [11b], and
[11c] in the form $\overline{x_n^{1/2}(\infty)x_m^{1/2}(\infty)} = 0$ ($m \ne n$) (which
does not change the argument in the last subsection,
but makes for an easier calculation). Thus, [14a] and
[14b] describe collapse dynamics.
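
A minimal Monte Carlo sketch of this H = 0 collapse behavior follows. It rests on two observations that are derived here rather than stated in the text: the probability rule [14b] is a mixture in which, with probability $|\alpha_k|^2$, the time-integrated noise $B(t) = \int_0^t w\,dt'$ is Gaussian with mean $2\lambda a_k t$ and variance $\lambda t$; and the squared amplitudes depend on the noise only through $B(t)$, being proportional to $|\alpha_n|^2\exp(2a_nB - 2\lambda a_n^2 t)$. The two-state example, parameter values, seed, and function name are illustrative assumptions.

```python
import math
import random

def sample_squared_amplitudes(alpha_sq, eigenvalues, lam, t, rng):
    """Draw one noise history according to [13] and return the x_n(t) of [14]."""
    # Mixture form of [14b]: pick component k with probability |alpha_k|^2, then
    # B(t) = int_0^t w dt' is Gaussian with mean 2*lam*a_k*t and variance lam*t.
    k = rng.choices(range(len(alpha_sq)), weights=alpha_sq)[0]
    B = 2.0 * lam * eigenvalues[k] * t + math.sqrt(lam * t) * rng.gauss(0.0, 1.0)
    # x_n(t) is proportional to |alpha_n|^2 exp(2 a_n B - 2 lam a_n^2 t); the common
    # factor exp(-int w^2 dt' / (2 lam)) cancels on dividing by P_t(w).
    logs = [math.log(p) + 2.0 * a * B - 2.0 * lam * a * a * t
            for p, a in zip(alpha_sq, eigenvalues)]
    top = max(logs)                      # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logs]
    norm = sum(weights)
    return [wgt / norm for wgt in weights]

rng = random.Random(7)
alpha_sq = [0.3, 0.7]        # hypothetical initial squared amplitudes
eigenvalues = [0.0, 1.0]     # hypothetical eigenvalues a_1, a_2 of A
lam, runs = 1.0, 20000

early = [sample_squared_amplitudes(alpha_sq, eigenvalues, lam, 0.2, rng)[0]
         for _ in range(runs)]
late = [sample_squared_amplitudes(alpha_sq, eigenvalues, lam, 20.0, rng)[0]
        for _ in range(runs)]

mean_early = sum(early) / runs                                  # martingale property [11b]
collapsed_to_first = sum(1 for x in late if x > 0.99) / runs    # collapse frequency
print(mean_early, collapsed_to_first)
```

Both printed numbers should lie near $|\alpha_1|^2 = 0.3$: the first because the ensemble mean of $x_1$ is conserved at all times, the second because at large $\lambda t$ nearly every run has collapsed and the frequency of collapse onto the first state approaches its initial squared amplitude.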




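To describe collapse to a joint eigenstate of a set
of mutually commuting operators $\hat{A}^r$, replace
$(4\lambda)^{-1}[w(t') - 2\lambda\hat{A}]^2$ in the exponent of [12] by
$\sum_r (4\lambda)^{-1}[w^r(t') - 2\lambda\hat{A}^r]^2$. The interaction picture
state vector in this case is [12] multiplied by
$\exp(i\hat{H}t)$:

$$|\psi, t\rangle_w = \mathcal{T}\exp\left(-(4\lambda)^{-1}\int_0^t dt'\sum_r[w^r(t') - 2\lambda\hat{A}^r(t')]^2\right)|\psi, 0\rangle \qquad [15]$$

where $\hat{A}^r(t') \equiv \exp(i\hat{H}t')\hat{A}^r\exp(-i\hat{H}t')$. The density
matrix follows from [15] and [13]:

$$\hat{\rho}(t) \equiv \int P_t(w)Dw\,\frac{|\psi, t\rangle_w\,{}_w\langle\psi, t|}{P_t(w)} = \mathcal{T}\exp\left(-\frac{\lambda}{2}\int_0^t dt'\sum_r[\hat{A}^r_L(t') - \hat{A}^r_R(t')]^2\right)\hat{\rho}(0) \qquad [16]$$

where $\hat{A}^r_L(t')$ ($\hat{A}^r_R(t')$) appears to the left (right) of $\hat{\rho}(0)$,
and is time-ordered (time reverse-ordered). In the
example described by [14], the density matrix [16] is

$$\hat{\rho}(t) = \sum_{n,m} e^{-(\lambda t/2)(a_n - a_m)^2}\alpha_n\alpha_m^*|a_n\rangle\langle a_m|$$

which encapsulates the ensemble’s collapse behavior.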
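
The decay factor of the off-diagonal elements in this example can be recovered by a short worked calculation. The following is a sketch of one way to do it, not necessarily the route taken in the cited papers: the noise is integrated one discretized time step at a time, using the unnormalized state vector of [14a] and the measure of [13].

```latex
% One time step dt: complete the square in w, with A = 2\lambda a_n and B = 2\lambda a_m,
%   (w - A)^2 + (w - B)^2 = 2\left(w - \tfrac{A+B}{2}\right)^2 + \tfrac{1}{2}(A - B)^2 .
\int \frac{dw}{(2\pi\lambda/dt)^{1/2}}
  \exp\left\{-\frac{dt}{4\lambda}\left[(w - 2\lambda a_n)^2 + (w - 2\lambda a_m)^2\right]\right\}
  = \exp\left\{-\frac{\lambda\,dt}{2}(a_n - a_m)^2\right\}
% The remaining Gaussian in w integrates to (2\pi\lambda/dt)^{1/2}, cancelling the measure.
% The product over the t/dt time steps then gives
\prod_{j} \exp\left\{-\frac{\lambda\,dt}{2}(a_n - a_m)^2\right\}
  = \exp\left\{-\frac{\lambda t}{2}(a_n - a_m)^2\right\}
```

Diagonal elements ($n = m$) are left unchanged, while off-diagonal elements decay at a rate set by the squared difference of eigenvalues.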


CSL 


The CSL proposal (Pearle 1989) is that collapse is
engendered by distinctions between states at each
point of space, so the index $r$ of $\hat{A}^r$ in [15]
becomes $x$,

$$|\psi, t\rangle_w = \mathcal{T}\exp\left(-(4\lambda)^{-1}\int_0^t dt'\,dx'\,[w(x', t') - 2\lambda\hat{A}(x', t')]^2\right)|\psi, 0\rangle \qquad [17]$$

and the distinction looked at is mass density. However,
one cannot make the choice $\hat{A}(x, 0) = \hat{M}(x)$, where
$\hat{M}(x) = \sum_i m_i\hat{\xi}_i^\dagger(x)\hat{\xi}_i(x)$ is the mass-density operator
($m_i$ is the mass of the $i$th type of particle, so
$m_e, m_p, m_n, \ldots$ are the masses, respectively, of electrons,
protons, neutrons, ..., and $\hat{\xi}_i^\dagger(x)$ is the creation
operator for such a particle at location $x$), because this
entails an infinite rate of energy increase of particles
([23] with $a = 0$). Instead, adapting a “Gaussian
smearing” idea from the Ghirardi et al. (1986)
spontaneous localization (SL) model (see the
subsection “Spontaneous localization model”), choose
$\hat{A}(x)$ as, essentially, proportional to the mass in a sphere
of radius $a$ about $x$:

$$\hat{A}(x, 0) \equiv \frac{1}{(\pi a^2)^{3/4}}\int dz\,\frac{\hat{M}(z)}{m_p}\,e^{-(2a^2)^{-1}(x - z)^2} \qquad [18]$$

The parameter value choices of SL, $\lambda \approx 10^{-16}\ \mathrm{s}^{-1}$
(according to [17] and [18], the collapse rate for
protons) and $a \approx 10^{-5}$ cm are, so far, consistent with
experiment (see the next subsection), and will be
adopted here.

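The density matrix associated with [17] is, as
in [16],

$$\hat{\rho}(t) = \mathcal{T}\exp\left(-\frac{\lambda}{2}\int_0^t dt'\,dx'\,[\hat{A}_L(x', t') - \hat{A}_R(x', t')]^2\right)\hat{\rho}(0) \qquad [19]$$

which satisfies the differential equation

$$\frac{d\hat{\rho}(t)}{dt} = -\frac{\lambda}{2}\int dx'\,[\hat{A}(x', t), [\hat{A}(x', t), \hat{\rho}(t)]] \qquad [20]$$

of Lindblad–Kossakowski form.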


Consequences of CSL 


Since the state vector dynamics of CSL is different 
from that of standard quantum theory, there are 
phenomena for which the two make different 
predictions, allowing for experimental tests. Con- 
sider an N-particle system with position operators 
$\hat{X}_i$ ($\hat{X}_i|x\rangle = x_i|x\rangle$). Substitution of $\hat{A}(x')$ from [18] in
the Schrödinger picture version of [20], integration 
over x’, and utilization of 


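results in

$$\frac{d\hat{\rho}(t)}{dt} = -i[\hat{H}, \hat{\rho}(t)] - \frac{\lambda}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{m_im_j}{m_p^2}\left[e^{-(4a^2)^{-1}(\hat{X}_{Li} - \hat{X}_{Lj})^2} + e^{-(4a^2)^{-1}(\hat{X}_{Ri} - \hat{X}_{Rj})^2} - 2e^{-(4a^2)^{-1}(\hat{X}_{Li} - \hat{X}_{Rj})^2}\right]\hat{\rho}(t) \qquad [21]$$

which is a useful form for calculations first
suggested by Pearle and Squires in 1994.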


Interference Consider the collapse rate of an initial
state $|\psi\rangle = \alpha_1|1\rangle + \alpha_2|2\rangle$, where $|1\rangle$, $|2\rangle$ describe a
clump of matter, of size $\ll a$, at different locations
with separation $\gg a$. Electrons may be neglected
because of their small collapse rate compared to the
much more massive nucleons, and the nucleon mass
difference may be neglected. In using [21] to calculate
$d\langle 1|\hat{\rho}(t)|2\rangle/dt$, since $\exp[-(4a^2)^{-1}(\hat{X}_i - \hat{X}_j)^2] \approx 1$
when acting on state $|1\rangle$ or $|2\rangle$, and $\approx 0$ when $\hat{X}_i$
acts on $|1\rangle$ and $\hat{X}_j$ acts on $|2\rangle$, [21] yields, for $N$
nucleons, the collapse rate $\lambda N^2$:

$$\frac{d\langle 1|\hat{\rho}(t)|2\rangle}{dt} = -i\langle 1|[\hat{H}, \hat{\rho}(t)]|2\rangle - \lambda N^2\langle 1|\hat{\rho}(t)|2\rangle \qquad [22]$$

If the clump undergoes a two-slit interference
experiment, where the size and separation conditions
above are satisfied for a time $\Delta T$, and if the
result agrees with the standard quantum theory
prediction to 1%, it also agrees with CSL provided
$\lambda^{-1} > 100N^2\Delta T$. So far, interference experiments
with $N$ as large as $\approx 10^3$ have been performed, by
Nairz, Arndt, and Zeilinger in 2000. The SL value
of $\lambda^{-1} \approx 10^{16}$ s would be testable, that is, the
quantum-predicted interference pattern would be
“washed out” to 1% accuracy, if the clump were
an $\approx 10^{-6}$ cm radius sphere of mercury, which
contains $N \approx 10^8$ nucleons, interfered for
$\Delta T = 0.01$ s. Currently envisioned but not yet
performed experiments (e.g., by Marshall, Simon,
Penrose, and Bouwmeester in 2003) have been
analyzed (e.g., by Bassi, Ippoliti, and Adler in
2004 and by Adler in 2005), which involve a
superposition of a larger clump of matter in
slightly displaced positions, entangled with a
photon whose interference pattern is measured:
these proposed experiments are still too crude to
detect the SL value of $\lambda$, or the gravitationally
based collapse rate proposed by Penrose in 1996
(see the next section and papers by Christian in
1999 and 2005).
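
The arithmetic behind the testability condition can be sketched as follows. By [22], the off-diagonal element decays by a fractional amount of roughly $\lambda N^2\Delta T$ when that product is small, so 1% agreement requires $\lambda N^2\Delta T < 0.01$. The function name and the use of round numbers are illustrative assumptions; the inputs are the values quoted in the text.

```python
def max_collapse_rate(n_nucleons, hold_time_s, tolerance=0.01):
    """Largest lambda (in 1/s) for which lambda * N^2 * DeltaT stays below `tolerance`."""
    return tolerance / (n_nucleons ** 2 * hold_time_s)

SL_LAMBDA = 1.0e-16      # 1/s, the SL value adopted in the text

# Mercury sphere example from the text: N ~ 1e8 nucleons held for 0.01 s.
n_mercury, hold_time = 1.0e8, 0.01
washout = SL_LAMBDA * n_mercury ** 2 * hold_time     # fractional loss of interference
threshold = max_collapse_rate(n_mercury, hold_time)  # lambda probed at the 1% level

print(washout)     # about 0.01, i.e. the 1% level
print(threshold)   # about 1e-16 per second, i.e. the SL value
```

Because the rate scales as $N^2$, an experiment with a hundred times fewer nucleons would need a hold time ten thousand times longer to reach the same sensitivity.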


Bound state excitation Collapse narrows wave
packets, thereby imparting energy to particles. If
$\hat{H} = \sum_{i=1}^{N}\hat{P}_i^2/2m_i + \hat{V}(x_1, \ldots, x_N)$, it is straightforward
to calculate from [21] that

$$\frac{d}{dt}\langle\hat{H}\rangle \equiv \frac{d}{dt}\mathrm{Tr}[\hat{H}\hat{\rho}(t)] = \sum_{i=1}^{N}\frac{3\lambda\hbar^2m_i}{4m_p^2a^2} \qquad [23]$$

For a nucleon, the mean rate of energy increase is
quite small, $\approx 3\times 10^{-25}$ eV s$^{-1}$. However, deviations
from the mean can be significantly greater.
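Equation [21] predicts excitation of atoms and
nuclei. Let $|E_0\rangle$ be an initial bound energy
eigenstate. Expanding [21] in a power series in
(bound state size/$a$)$^2$, the excitation rate of state
$|E\rangle$ is

$$\Gamma \equiv \left.\frac{d\langle E|\hat{\rho}(t)|E\rangle}{dt}\right|_{t=0} = \frac{\lambda}{2a^2}\,\langle E|\sum_{i=1}^{N}\frac{m_i}{m_p}\hat{X}_i|E_0\rangle\cdot\langle E_0|\sum_{i=1}^{N}\frac{m_i}{m_p}\hat{X}_i|E\rangle + O[(\text{size}/a)^4] \qquad [24]$$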








Since $|E_0\rangle$, $|E\rangle$ are eigenstates of the center-of-mass
operator $\sum_i m_i\hat{X}_i/\sum_i m_i$ with eigenvalue 0, the
dipole contribution explicitly given in [24] vanishes
identically. This leaves the quadrupole contribution
as the leading term, which is too small to be
measured at present.

However, the choice of $\hat{A}(x)$ as mass-density
operator was made only after experimental indication.
Let $g_i$ replace $m_i/m_p$ in [21] and [24], so that
$\lambda g_i^2$ is the collapse rate for the $i$th particle. Then,
experiments looking for the radiation expected from
“spontaneously” excited atoms and nuclei, in large
amounts of matter for a long time, as shown by
Collett, Pearle, Avignone, and Nussinov in 1995,
Pearle, Ring, Collar, and Avignone in 1999, and
Jones, Pearle, and Ring in 2004, have placed the
following limits:


12m,


Random walk According to [17] and [13], the
center-of-mass wave packet, of a piece of matter of
size $a$ or smaller, containing $N$ nucleons, achieves
equilibrium size $s$ in a characteristic time $\tau_s$, and
undergoes a random walk through a root-mean-square
distance $\Delta Q$:

$$s \approx \left[\frac{a^2\hbar}{\lambda m_pN^3}\right]^{1/4}, \quad \tau_s \approx \frac{Nm_ps^2}{\hbar}, \quad \Delta Q \approx \frac{\hbar\lambda^{1/2}t^{3/2}}{m_pa} \qquad [25]$$

The results in [25] were obtained by Collett and 
Pearle in 2003. These quantitative results can be 
qualitatively understood as follows. 

In time $\Delta t$, the usual Schrödinger equation
expands a wave packet of size $s$ to $\approx s +
(\hbar/Nm_ps)\Delta t$. CSL collapse, by itself, narrows the
wave packet to $\approx s[1 - \lambda N^2(s/a)^2\Delta t]$. The condition
of no change in $s$ is the result quoted above. $\tau_s$ is the
time it takes the Schrödinger evolution to expand a
wave packet near size $s$ to size $s$: $(\hbar/Nm_ps)\tau_s \approx s$.
The $t^{3/2}$ dependence of $\Delta Q$ arises because this is a
random walk without damping (unlike Brownian
motion, where $\Delta Q \sim t^{1/2}$). The mean energy
increase $\approx \lambda N\hbar^2m_p^{-1}a^{-2}t$ of [23] implies the
root-mean-square velocity increase $\approx [\lambda\hbar^2m_p^{-2}a^{-2}t]^{1/2}$,
whose product with $t$ is $\Delta Q$.

For example, a sphere of density 1 g cm$^{-3}$ and
radius $10^{-5}$ cm has $s \approx 4\times 10^{-7}$ cm, $\tau_s \approx 0.6$ s and
$\Delta Q \approx 5[t\ \text{in days}]^{3/2}$ cm. At the low pressure of
$5\times 10^{-17}$ torr at 4.2 K reported by Gabrielse’s
group in 1990, the mean collision time with gas
molecules is $\approx 80$ min, over which $\Delta Q \approx 0.7$ mm.
Thus, observation of this effect should be feasible.
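
The order-of-magnitude estimates in [25] can be evaluated directly for this example. The sketch below works in CGS units, drops all numerical prefactors of order one (so its output agrees with the quoted figures only to within a factor of a few), and uses rounded physical constants; the function names are invented for illustration.

```python
import math

HBAR = 1.0546e-27    # erg s
M_P = 1.6726e-24     # proton mass in g
LAMBDA = 1.0e-16     # 1/s, SL collapse rate adopted in the text
A = 1.0e-5           # cm, SL smearing length adopted in the text

def nucleon_count(radius_cm, density_g_cm3):
    """Number of nucleons in a uniform sphere (total mass divided by the proton mass)."""
    volume = 4.0 / 3.0 * math.pi * radius_cm ** 3
    return volume * density_g_cm3 / M_P

def equilibrium_size(n):
    """s from [25]: [a^2 hbar / (lambda m_p N^3)]^(1/4), in cm."""
    return (A ** 2 * HBAR / (LAMBDA * M_P * n ** 3)) ** 0.25

def settling_time(n, s):
    """tau_s from [25]: N m_p s^2 / hbar, in s."""
    return n * M_P * s ** 2 / HBAR

def rms_walk(t_seconds):
    """Delta Q from [25]: hbar lambda^(1/2) t^(3/2) / (m_p a), in cm."""
    return HBAR * math.sqrt(LAMBDA) * t_seconds ** 1.5 / (M_P * A)

n = nucleon_count(1.0e-5, 1.0)     # sphere of radius 1e-5 cm and density 1 g/cm^3
s = equilibrium_size(n)
tau_s = settling_time(n, s)
print(n, s, tau_s, rms_walk(80 * 60.0))
```

Note that the walk distance in [25] is independent of $N$, while the equilibrium size shrinks as $N^{-3/4}$.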




Further Remarks 


It is possible to define energy for the w(x, t) field so 
that total energy is conserved: as the particles gain 
energy, the w-field loses energy, as shown by Pearle 
in 2005. 

Attempts to construct a special-relativistic CSL- 
type model have not yet succeeded, although 
Pearle in 1990, 1992, and 1999, Ghirardi, Grassi, 
and Pearle in 1990, and Nicrosini and Rimini in 
2003 have made valiant attempts. The problem is
that the white noise field $w(x, t)$ contains all
wavelengths and frequencies, exciting the vacuum
in lowest order in $\lambda$ to produce particles at the
unacceptable rate of infinite energy per second per
cubic centimeter. Collapse models which utilize a
colored noise field $w$ have a similar problem in
higher orders. In 2005, Pearle suggested a quasirelativistic
model which reduces to CSL in the low-speed
limit.

CSL is a phenomenological model which describes 
dynamical collapse so as to achieve S. Besides 
needing decisive experimental verification, it needs 
identification of the w(x,t) field with a physical 
entity. 

Other collapse models which have been investi- 
gated are briefly described below. 


Spontaneous Localization Model 


The SL model of Ghirardi et al. (1986), although
superseded by CSL, is historically important and
conceptually valuable. Let $\hat{H} = 0$ for simplicity, and
consider a single particle whose wave function at
time $t$ is $\psi(x, t)$. Over the next interval $dt$, with
probability $1 - \lambda dt$, it does not change. With probability
$\lambda dt$ it does change, by being “spontaneously
localized” or “hit.” A hit means that the new
(unnormalized) wave function suddenly becomes

$$\psi(x, t + dt) = \psi(x, t)(\pi a^2)^{-3/4}e^{-(2a^2)^{-1}(x - z)^2}$$

with probability

$$\lambda\,dt\,dz\int dx\,|\psi(x, t + dt)|^2$$

Thus $z$, the “center” of the hit, is most likely to be
located where the wave function is large. For a single
particle in the superposition described in the subsection
“Interference,” a single hit is overwhelmingly
likely to reduce the wave function to one or the other
location, with total probability $|\alpha_i|^2$, at the rate $\lambda$.

For an $N$-particle clump, it is considered that each
particle has the same independent probability, $\lambda dt$,
of being hit. But, for the example in the subsection
“Interference,” a single hit on any particle in one
location of the clump has the effect of multiplying
the wave function part describing the clump in the
other location by the tail of the Gaussian, thereby
collapsing the wave function at the rate $\lambda N$.

By use of the Gaussian hit rather than a delta- 
function hit, SL solves the problem of giving too 
much energy to particles as mentioned in the 
subsection “CSL.” By the hypothesis of independent 
particle hits, SL also solves the problem of achieving 
a slow collapse rate for a superposition of small 
objects and a fast collapse rate for a superposition of 
large objects. However, the hits on individual
particles destroy the (anti-) symmetry of wave
functions. The CSL collapse toward mass density
eigenstates removes that problem. Also, while SL 
modifies the Schrödinger evolution of a wave 
function, it involves discontinuous dynamics and so 
is not described by a modified Schrödinger equation 
as is CSL. 
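
A single hit can be sketched numerically in one dimension. The snippet below is an illustration under several assumptions that go beyond the text: it uses a 1-D analog of the hit Gaussian, a real wave function on a finite grid, two narrow packets with hypothetical squared amplitudes 0.3 and 0.7 separated by 20a, and an arbitrary seed. The hit center is drawn from the density given by the norm of the hit wave function, which in 1-D is $|\psi|^2$ convolved with a Gaussian of variance $a^2/2$.

```python
import math
import random

def gaussian(x, center, width):
    """Unnormalized Gaussian exp(-(x - center)^2 / (2 width^2))."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def normalize(psi):
    norm = math.sqrt(sum(v * v for v in psi))
    return [v / norm for v in psi]

def apply_hit(psi, grid, z, a):
    """Multiply psi by the hit Gaussian centered at z, then renormalize."""
    return normalize([p * gaussian(x, z, a) for p, x in zip(psi, grid)])

def sample_hit_center(psi, grid, a, rng):
    """Draw z with density proportional to the squared norm of the hit wave function."""
    # Equivalent to z = x + noise, with x drawn from |psi|^2 and noise of variance a^2/2.
    x = rng.choices(grid, weights=[p * p for p in psi])[0]
    return x + rng.gauss(0.0, a / math.sqrt(2.0))

def left_weight(psi, grid):
    """Probability of finding the particle at x < 0."""
    return sum(p * p for p, x in zip(psi, grid) if x < 0.0)

rng = random.Random(2024)
a = 1.0                                           # hit width (length unit)
grid = [0.05 * i for i in range(-400, 401)]       # x from -20 to 20
psi0 = normalize([math.sqrt(0.3) * gaussian(x, -10.0, 0.2)
                  + math.sqrt(0.7) * gaussian(x, 10.0, 0.2) for x in grid])

outcomes = []
for _ in range(2000):
    z = sample_hit_center(psi0, grid, a, rng)
    outcomes.append(left_weight(apply_hit(psi0, grid, z, a), grid))

left_fraction = sum(1 for wgt in outcomes if wgt > 0.5) / len(outcomes)
print(left_fraction)    # should lie close to 0.3
```

With the packets this far apart, every hit leaves essentially all of the weight on one side, and the side chosen follows the initial squared amplitudes.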


Other Models 


For a single (low-energy) particle, the polar decomposition
$\Psi = Re^{(i/\hbar)S}$ of the Schrödinger equation
implies two real equations,

$$\frac{\partial R^2}{\partial t} + \nabla\cdot\left(R^2\frac{\nabla S}{m}\right) = 0 \qquad [26]$$

(the continuity equation for $R^2 = |\Psi|^2$) and

$$\frac{\partial S}{\partial t} + \frac{(\nabla S)^2}{2m} + V + Q = 0 \qquad [27]$$

where $Q \equiv -(\hbar^2/2m)\nabla^2R/R$ is the “quantum
potential.” (These equations have an obvious generalisation
to higher-dimensional configuration
space.) In 1926, Madelung proposed that one should
start from [26] and [27], regarded as hydrodynamical
equations for a classical charged fluid with
mass density $mR^2$ and fluid velocity $\nabla S/m$, and
construct $\Psi = Re^{(i/\hbar)S}$ from the solutions.


This “hydrodynamical” interpretation suffers from
many difficulties, especially for many-body systems.
In any case, a criticism by Wallstrom (1994) seems
decisive: [26] and [27] (and their higher-dimensional
analogs) are not, in fact, equivalent to the Schrödinger
equation. For, as usually understood, the quantum
wave function $\Psi$ is a single-valued and
continuous complex field, which typically possesses
nodes ($\Psi = 0$), in the neighborhood of which the
phase $S$ is multivalued, with values differing by
integral multiples of $2\pi\hbar$. If one allows $S$ in [26],
[27] to be multivalued, there is no reason why the
allowed values should differ by integral multiples of
$2\pi\hbar$, and in general $\Psi$ will not be single-valued. On
the other hand, if one restricts $S$ in [26], [27] to be
single-valued, one will exclude wave functions (such
as those of nonzero angular momentum) with a
multivalued phase. (This problem does not exist in
pilot-wave theory as we have presented it here, where
$\Psi$ is regarded as a basic entity.)

Stochastic mechanics, introduced by Fényes in 1952
and Nelson (1966), has particle trajectories $x(t)$
obeying a “forward” stochastic differential equation
$dx(t) = b(x(t), t)dt + dw(t)$, where $b$ is a drift (equal to
the mean forward velocity) and $w$ a Wiener process,
and also a similar “backward” equation. Defining
the “current velocity” $v = \frac{1}{2}(b + b_*)$, where $b_*$ is
the mean backward velocity, and using an appropriate
time-symmetric definition of mean acceleration, one
may impose a stochastic version of Newton’s second
law. If one assumes, in addition, that $v$ is a gradient
($v = \nabla S/m$ for some $S$), then one obtains [26], [27]
with $R \equiv \sqrt{\rho}$, where $\rho$ is the particle density.
Defining $\Psi \equiv \sqrt{\rho}\,e^{(i/\hbar)S}$, it appears that one recovers
the Schrödinger equation for the derived quantity $\Psi$.
However, again, there is no reason why $S$ should
have the specific multivalued structure required for
the phase of a single-valued complex field. It then
seems that, despite appearances, quantum theory
cannot in fact be recovered from stochastic
mechanics (Wallstrom 1994). The same problem
occurs in models that use stochastic mechanics as an
intermediate step (e.g., Markopoulou and Smolin in
2004): the Schrödinger equation is obtained only for
exceptional, nodeless wave functions.

Bohm and Bub (1966) first proposed dynamical 
wave-function collapse through deterministic evolu- 
tion. Their collapse outcome is determined by the 
value of a Wiener-Siegel hidden variable (a variable 
distributed uniformly over the unit hypersphere in a 
Hilbert space identical to that of the state vector). In 
1976, Pearle proposed dynamical wave-function col- 
lapse equations where the collapse outcome is deter- 
mined by a random variable, and suggested (Pearle 
1979) that the modified Schrödinger equation be
formulated as an Itô stochastic differential equation,
a suggestion which has been widely followed. (The
equation for the state vector given here, which is
physically more transparent, has its time derivative
equivalent to a Stratonovich stochastic differential
equation, which is readily converted to the Itô form.)
The importance of requiring that the density matrix
describing collapse be of the Lindblad-Kossakowski
form was emphasized by Gisin in 1984 and Diósi in
1988. The stochastic differential Schrödinger equation
that achieves this was found independently by Diósi in
1988 and by Belavkin, Gisin, and Pearle in separate
papers in 1989 (see Ghirardi et al. 1990).

A gravitationally motivated stochastic collapse
dynamics was proposed by Diósi in 1989 (and somewhat
corrected by Ghirardi et al. in 1990). Penrose
emphasized in 1996 that a quantum state, such as that
describing a mass in a superposition of two places, puts
the associated spacetime geometry also in a superposition,
and has argued that this should lead to wave-function
collapse. He suggests that the collapse time
should be $\sim \hbar/\Delta E$, where $\Delta E$ is the gravitational
potential energy change obtained by actually displacing
two such masses: for example, the collapse time
$\sim \hbar/(Gm^2/R)$, where the mass is $m$, its size is $R$, and
the displacement is $\sim R$ or larger. No specific dynamics
is offered, just the vision that this will be a property of
a correct future quantum theory of gravity.
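
To give a feeling for the orders of magnitude involved, the estimate $\hbar/(Gm^2/R)$ can be evaluated numerically. The following Python sketch is only an illustration: the function name and the sample mass and size are assumptions made here, not values taken from the literature cited above.

```python
# Physical constants in SI units (rounded CODATA values).
HBAR = 1.054571817e-34   # reduced Planck constant, J s
G_NEWTON = 6.67430e-11   # Newtonian constant of gravitation, m^3 kg^-1 s^-2


def penrose_collapse_time(mass_kg: float, size_m: float) -> float:
    """Order-of-magnitude collapse time hbar / (G m^2 / R), in seconds.

    Hypothetical helper: it simply evaluates the estimate quoted in the
    text for a body of mass m and size R displaced by about R or more.
    """
    if mass_kg <= 0 or size_m <= 0:
        raise ValueError("mass and size must be positive")
    delta_e = G_NEWTON * mass_kg**2 / size_m  # gravitational energy scale, J
    return HBAR / delta_e


# Illustrative input (an assumption): a 1-microgram body about 0.1 mm across.
tau = penrose_collapse_time(mass_kg=1e-9, size_m=1e-4)
```

Because the estimate scales as $R/m^2$, doubling the mass at fixed size shortens the time by a factor of four, which is why the proposed effect would be negligible for microscopic masses and fast for macroscopic ones.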

Collapse to energy eigenstates was first proposed 
by Bedford and Wang in 1975 and 1977 and, in the 
context of stochastic collapse (e.g., [11] with A =H), 
by Milburn in 1991 and Hughston in 1996, but it has 
been argued by Finkelstein in 1993 and Pearle in 
2004 that such energy-driven collapse cannot give a 
satisfactory picture of the macroscopic world. 
Percival in 1995 and in a 1998 book, and Fivel in 
1997 have discussed energy-driven collapse for 
microscopic situations. 

Adler (2004) has presented a classical theory 
(a hidden-variables theory) from which it is argued 
that quantum theory “emerges” at the ensemble level. 
The classical variables are N x N matrix field ampli- 
tudes at points of space. They obey appropriate 
classical Hamiltonian dynamical equations which he 
calls “trace dynamics,” since the expressions for 
Hamiltonian, Lagrangian, Poisson bracket, etc., have 
the form of the trace of products of matrices and their 
sums with constant coefficients. Using classical statis- 
tical mechanics, canonical ensemble averages of 
(suitably projected) products of fields are analyzed 
and it is argued that they obey all the properties 
associated with Wightman functions, from which 
quantum field theory, and its nonrelativistic-limit 
quantum mechanics, may be derived. As well as 
obtaining the algebra of quantum theory in this way,
it is argued that statistical fluctuations around the
canonical ensemble can give rise to the behavior of 
wave-function collapse, of the kind discussed here, 
both energy-driven and CSL-type mass-density-driven 
collapse so that, with the latter, comes the Born 
probability interpretation of the algebra. The Hamil- 
tonian needed for this theory to work is not provided 
but, as the argument progresses, its necessary features 
are delimited. 


See also: Quantum Mechanics: Foundations.

Introduction


In quantum theory, the mean value of a certain
observable $\hat{A}$ in a (pure) quantum state $|i\rangle$ is defined
by the quadratic form:

$$\langle \hat{A} \rangle_i =: \langle i|\hat{A}|i\rangle$$    [1]

Here $\hat{A}$ is a Hermitian operator on the Hilbert space
$\mathcal{H}$ of states. We use Dirac formalism. The above
mean is interpreted statistically. No other forms had
been known to possess a statistical interpretation in
standard quantum theory. One can, nonetheless, try
to extend the notion of mean for normalized bilinear
expressions (Aharonov et al. 1988):

$$A_w =: \frac{\langle f|\hat{A}|i\rangle}{\langle f|i\rangle}$$    [2]

However unusual this structure is, standard quantum
theory provides a plausible statistical interpretation
for it, too. The two pure states $|i\rangle$, $|f\rangle$ play the
roles of the prepared initial and the postselected
final states, respectively. The statistical interpretation
relies upon the concept of weak measurement.
In a single weak measurement, the notorious
decoherence is chosen asymptotically small. In
physical terms, the coupling between the measured
state and the meter is assumed asymptotically weak.
The novel mean value [2] is called the (complex)
weak value.

The concept of quantum weak measurement
(Aharonov et al. 1988) provides particular
conclusions on postselected ensembles. Weak
measurements have been instrumental in the
interpretation of time-continuous quantum measurements on
single states as well. Yet, weak measurement itself
can properly be illuminated in the context of
classical statistics. Classical weak measurement as
well as postselection and time-continuous measurement
are straightforward concepts leading to conclusions
that are natural in classical statistics. In the
quantum context, the case is radically different and
certain paradoxical conclusions follow from weak
measurements. Therefore, we first introduce the
classical notion of weak measurement on postselected
ensembles and, alternatively, in time-continuous
measurement on a single state. Certain idioms
from statistical physics will be borrowed and certain
not genuinely quantum notions from quantum
theory will be anticipated. The quantum counterpart
of weak measurement, postselection, and continuous
measurement will be presented afterwards. The
apparent redundancy of the parallel presentations
is deliberate: the reader can separate what is
common in classical and quantum weak measurements
from what is genuinely quantum.


Classical Weak Measurement 


Given a normalized probability density $\rho(X)$ over
the phase space $\{X\}$, which we call the state, the
mean value of a real function $A(X)$ is defined as

$$\langle A \rangle_\rho =: \int dX\, A\rho$$    [3]

Let the outcome of an (unbiased) measurement of $A$
be denoted by $a$. Its stochastic expectation value
$E[a]$ coincides with the mean [3]:

$$E[a] = \langle A \rangle_\rho$$    [4]

Performing a large number $N$ of independent
measurements of $A$ on the elements of the ensemble
of identically prepared states, the arithmetic mean $\bar{a}$
of the outcomes yields a reliable estimate of $E[a]$
and, this way, of the theoretical mean $\langle A \rangle_\rho$.
Suppose, for concreteness, the measurement
outcome $a$ is subject to a Gaussian stochastic
error of standard dispersion $\sigma > 0$. The probability
distribution of $a$ and the update of the state
corresponding to the Bayesian inference are
described as

$$p(a) = \langle G_\sigma(a - A) \rangle_\rho$$    [5]

$$\rho \to \frac{1}{p(a)}\, G_\sigma(a - A)\,\rho$$    [6]

respectively. Here $G_\sigma$ is the central Gaussian
distribution of variance $\sigma^2$. Note that, as expected,
eqn [5] implies eqn [4]. Nonzero $\sigma$ means that the
measurement is nonideal, yet the expectation value
$E[a]$ remains calculable reliably if the statistics $N$ is
suitably large.

Suppose the spread of $A$ in state $\rho$ is finite:

$$\Delta_\rho^2 A =: \langle A^2 \rangle_\rho - \langle A \rangle_\rho^2 < \infty$$    [7]

Weak measurement will be defined in the asymptotic
limit (eqns [8] and [9]) where both the stochastic
error of the measurement and the measurement
statistics go to infinity. It is crucial that their rate is
kept constant:

$$\sigma,\, N \to \infty$$    [8]

$$\Delta^2 =: \frac{\sigma^2}{N} = \text{const.}$$    [9]

Obviously for asymptotically large $\sigma$, the precision
of individual measurements becomes extremely
weak. This incapacity is fully compensated by the
asymptotically large statistics $N$. In the weak
measurement limit (eqns [8] and [9]), the probability
distribution $p_w$ of the arithmetic mean $\bar{a}$ of the $N$
independent outcomes converges to a Gaussian
distribution:

$$p_w(\bar{a}) \to G_\Delta\!\left(\bar{a} - \langle A \rangle_\rho\right)$$    [10]

The Gaussian is centered at the mean $\langle A \rangle_\rho$, and the
variance of the Gaussian is given by the constant
rate [9]. Consequently, the mean [3] is reliably
calculable on a statistics $N$ growing like $\sim \sigma^2$.
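
The content of eqns [8]-[10] can be illustrated numerically. The Python sketch below draws noisy outcomes $a = A + \text{error}$ and collects many arithmetic means $\bar{a}$; the two-valued function $A = \pm 1$, its probabilities, and all parameter values are assumptions chosen here for illustration, and only the standard library is used. At finite $N$ the variance of $\bar{a}$ is $(\sigma^2 + \Delta_\rho^2 A)/N$, which approaches $\Delta^2 = \sigma^2/N$ once $\sigma$ is large compared with the spread of $A$.

```python
import random
import statistics


def weak_measurement_means(values, probs, sigma, n_shots, n_repeats, seed=0):
    """Simulate n_repeats arithmetic means, each over n_shots noisy outcomes.

    Each outcome is a = A(X) + Gaussian error of standard deviation sigma,
    with A(X) drawn from the discrete distribution (values, probs).
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_repeats):
        total = 0.0
        for a_true in rng.choices(values, weights=probs, k=n_shots):
            total += a_true + rng.gauss(0.0, sigma)
        means.append(total / n_shots)
    return means


# Illustrative parameters (assumptions): <A> = 0.5, sigma = 10, N = 400,
# so that Delta^2 = sigma^2 / N = 0.25 and Delta = 0.5.
means = weak_measurement_means([1.0, -1.0], [0.75, 0.25],
                               sigma=10.0, n_shots=400, n_repeats=500, seed=1)
centre = statistics.mean(means)    # scatters around <A> = 0.5
spread = statistics.stdev(means)   # scatters around Delta = 0.5
```

Although each single outcome has an error twenty times larger than the mean being estimated, the histogram of the means is centered near $\langle A \rangle_\rho$ with a width close to $\Delta$.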

With an eye on quantum theory, we consider two
situations of weak measurement in classical
statistics: postselection and time-continuous
measurement.


Postselection 


For the preselected state $\rho$, we introduce postselection
via the real function $\Pi(X)$, where $0 \le \Pi \le 1$.
The postselected mean value of a certain real
function $A(X)$ is defined by

$${}_\Pi\langle A \rangle_\rho =: \frac{\langle \Pi A \rangle_\rho}{\langle \Pi \rangle_\rho}$$    [11]

where $\langle \Pi \rangle_\rho$ is the rate of postselection. Postselection
means that after having obtained the outcome $a$
regarding the measurement of $A$, we measure the
function $\Pi$, too, in ideal measurement with random
outcome $\pi$ upon which we base the following
random decision. With probability $\pi$, we include
the current $a$ into the statistics and we discard it
with probability $1 - \pi$. Then the coincidence of $E[a]$
and ${}_\Pi\langle A \rangle_\rho$, as in eqn [4], remains valid:

$$E[a] = {}_\Pi\langle A \rangle_\rho$$    [12]


Therefore, a large ensemble of postselected states
allows one to estimate the postselected mean ${}_\Pi\langle A \rangle_\rho$.
Classical postselection allows introducing the
effective postselected state:

$$\rho_\Pi =: \frac{\Pi \rho}{\langle \Pi \rangle_\rho}$$    [13]

Then the postselected mean [11] of $A$ in state $\rho$ can,
by eqn [14], be expressed as the common mean of $A$
in the effective postselected state $\rho_\Pi$:

$${}_\Pi\langle A \rangle_\rho = \langle A \rangle_{\rho_\Pi}$$    [14]


As we shall see later, quantum postselection is 
more subtle and cannot be reduced to common 
statistics, that is, to that without postselection. The 
quantum counterpart of postselected mean does not 
exist unless we combine postselection and weak 
measurement. 
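
The classical identities [11]-[14] are easy to verify on a finite phase space. The following Python sketch does so in exact rational arithmetic; the three-point phase space and the particular values of $\rho$, $A$, and $\Pi$ are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def mean(f, rho):
    """Mean of the function f in the state rho over a finite phase space."""
    return sum(fx * px for fx, px in zip(f, rho))


def postselected_mean(a, pi, rho):
    """Postselected mean <Pi A>_rho / <Pi>_rho, as in eqn [11]."""
    pi_a = [p * x for p, x in zip(pi, a)]
    return mean(pi_a, rho) / mean(pi, rho)


def effective_state(pi, rho):
    """Effective postselected state Pi * rho / <Pi>_rho, as in eqn [13]."""
    rate = mean(pi, rho)
    return [p * r / rate for p, r in zip(pi, rho)]


# Hypothetical three-point phase space (illustrative values only).
rho = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # preselected state
a = [Fraction(-1), Fraction(0), Fraction(2)]             # function A(X)
pi = [Fraction(1, 4), Fraction(1), Fraction(1, 2)]       # 0 <= Pi(X) <= 1

rho_pi = effective_state(pi, rho)
lhs = postselected_mean(a, pi, rho)   # left-hand side of eqn [14]
rhs = mean(a, rho_pi)                 # right-hand side of eqn [14]
```

Since $\rho_\Pi$ is again a normalized, non-negative distribution, the classical postselected mean necessarily lies between the smallest and largest values of $A$; this is the property that fails in the quantum case.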


Time-Continuous Measurement 


For time-continuous measurement, one abandons the
ensemble of identical states. One supposes that a single
time-dependent state $\rho_t$ is undergoing an infinite
sequence of measurements (eqns [5] and [6]) of $A$
employed at times $t = \delta t$, $t = 2\delta t$, $t = 3\delta t, \ldots$ . The rate
$\nu =: 1/\delta t$ goes to infinity together with the mean
squared error $\sigma^2$. Their rate is kept constant:

$$\sigma,\, \nu \to \infty$$    [15]

$$g^2 =: \frac{\sigma^2}{\nu} = \text{const.}$$    [16]

In the weak measurement limit (eqns [15] and [16]),
the infinitely frequent weak measurements of $A$
constitute the model of time-continuous measurement.
Even the weak measurements will significantly
influence the original state $\rho_0$, due to the
accumulated effect of the infinitely many Bayesian
updates [6]. The resulting theory of time-continuous
measurement is described by coupled Gaussian
processes [17] and [18] for the primitive function
$\alpha_t$ of the time-dependent measurement outcome
and, respectively, for the time-dependent Bayesian
conditional state $\rho_t$:

$$d\alpha_t = \langle A \rangle_{\rho_t}\, dt + g\, dW_t$$    [17]

$$d\rho_t = g^{-1}\left(A - \langle A \rangle_{\rho_t}\right)\rho_t\, dW_t$$    [18]

Here $dW_t$ is the Itô differential of the Wiener
process.


Equations [17] and [18] are the special case of the
Kushner-Stratonovich equations of time-continuous
Bayesian inference conditioned on the continuous
measurement of $A$ yielding the time-dependent
outcome value $a_t$. Formal time derivatives of both
sides of eqn [17] yield the heuristic equation

$$a_t = \langle A \rangle_{\rho_t} + g\,\xi_t$$    [19]

Accordingly, the current measurement outcome is
always equal to the current mean plus a term
proportional to standard white noise $\xi_t$. This
plausible feature of the model survives in the
quantum context as well. As for the other equation
[18], it describes the gradual concentration of the
distribution $\rho_t$ in such a way that the variance $\Delta_{\rho_t}^2 A$
tends to zero while $\langle A \rangle_{\rho_t}$ tends to a random
asymptotic value. The details of the convergence
depend on the character of the continuously measured
function $A(X)$. Consider a stepwise $A(X)$:


$$A(X) = \sum_\lambda a^\lambda P^\lambda(X)$$    [20]

The real values $a^\lambda$ are step heights all differing from
each other. The indicator functions $P^\lambda$ take values
0 or 1 and form a complete set of pairwise disjoint
functions on the phase space:

$$\sum_\lambda P^\lambda \equiv 1$$    [21]

$$P^\lambda P^\mu = \delta_{\lambda\mu} P^\lambda$$    [22]

In a single ideal measurement of $A$, the outcome $a$ is
one of the $a^\lambda$'s singled out at random. The
probability distribution of the measurement outcome
and the corresponding Bayesian update of the
state are given by

$$p^\lambda = \langle P^\lambda \rangle_{\rho_0}$$    [23]

$$\rho_0 \to \frac{1}{p^\lambda}\, P^\lambda \rho_0 =: \rho^\lambda$$    [24]

respectively. Equations [17] and [18] of time-continuous
measurement are a connatural time-continuous
resolution of the “sudden” ideal
measurement (eqns [23] and [24]) in a sense that
they reproduce it in the limit $t \to \infty$. The states $\rho^\lambda$
are trivial stationary states of the eqn [18]. It can be
shown that they are indeed approached with
probability $p^\lambda$ for $t \to \infty$.
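
The gradual concentration of the state can be watched in a simple simulation. Instead of discretizing the stochastic differential equation [18] directly, the Python sketch below iterates the exact single-shot update (eqns [5] and [6]) with $\sigma^2 = g^2/\delta t$, which is the construction from which the continuous limit is obtained. The three step heights, the initial probabilities, and the values of $g$, $\delta t$, and the number of steps are assumptions chosen for illustration.

```python
import math
import random


def continuous_bayes(levels, rho0, g, dt, n_steps, seed=0):
    """Iterate the noisy measurement [5] and Bayesian update [6] n_steps times.

    levels : step heights a^lambda of the stepwise function A
    rho0   : initial probabilities of the cells P^lambda
    g, dt  : fix the single-shot error through sigma^2 = g^2 / dt (eqn [16])
    """
    rng = random.Random(seed)
    sigma = g / math.sqrt(dt)
    rho = list(rho0)
    for _ in range(n_steps):
        # Draw the outcome a from p(a) = <G_sigma(a - A)>_rho.
        level = rng.choices(levels, weights=rho, k=1)[0]
        a = level + rng.gauss(0.0, sigma)
        # Bayesian update: rho -> G_sigma(a - A) rho / p(a).
        weights = [r * math.exp(-((a - x) ** 2) / (2 * sigma**2))
                   for r, x in zip(rho, levels)]
        norm = sum(weights)
        rho = [w / norm for w in weights]
    return rho


# Illustrative run (assumed parameters): total time t = n_steps * dt = 200.
rho_final = continuous_bayes([-1.0, 0.0, 1.0], [0.2, 0.5, 0.3],
                             g=1.0, dt=0.01, n_steps=20000, seed=3)
```

Each individual update changes the state only slightly, yet after a time long compared with $g^2$ divided by the squared spacing of the step heights, practically all the probability sits on a single cell. Repeating the run with different seeds selects the different cells at random.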


Quantum Weak Measurement 


In quantum theory, states in a given complex
Hilbert space $\mathcal{H}$ are represented by non-negative
density operators $\hat{\rho}$, normalized by $\mathrm{tr}\,\hat{\rho} = 1$. Like the
classical states $\rho$, the quantum state $\hat{\rho}$ is interpreted
statistically, referring to an ensemble of states with
the same $\hat{\rho}$. Given a Hermitian operator $\hat{A}$, called
observable, its theoretical mean value in state $\hat{\rho}$ is
defined by

$$\langle \hat{A} \rangle_{\hat{\rho}} = \mathrm{tr}(\hat{A}\hat{\rho})$$    [25]

Let the outcome of an (unbiased) quantum measurement
of $\hat{A}$ be denoted by $a$. Its stochastic expectation
value $E[a]$ coincides with the mean [25]:

$$E[a] = \langle \hat{A} \rangle_{\hat{\rho}}$$    [26]

Performing a large number $N$ of independent
measurements of $\hat{A}$ on the elements of the ensemble
of identically prepared states, the arithmetic mean $\bar{a}$
of the outcomes yields a reliable estimate of $E[a]$
and, this way, of the theoretical mean $\langle \hat{A} \rangle_{\hat{\rho}}$. If the
measurement outcome $a$ contains a Gaussian stochastic
error of standard dispersion $\sigma$, then the
probability distribution of $a$ and the update, called
collapse in quantum theory, of the state are
described by eqns [27] and [28], respectively. (We
adopt the notational convenience of physics literature
to omit the unit operator $\hat{I}$ from trivial
expressions like $a\hat{I}$.)

$$p(a) = \left\langle G_\sigma(a - \hat{A}) \right\rangle_{\hat{\rho}}$$    [27]

$$\hat{\rho} \to \frac{1}{p(a)}\, G_\sigma^{1/2}(a - \hat{A})\,\hat{\rho}\, G_\sigma^{1/2}(a - \hat{A})$$    [28]

Nonzero $\sigma$ means that the measurement is nonideal,
but the expectation value $E[a]$ remains calculable
reliably if $N$ is suitably large.

Weak quantum measurement, like its classical
counterpart, requires finite spread of the observable
$\hat{A}$ on state $\hat{\rho}$:

$$\Delta_{\hat{\rho}}^2 \hat{A} =: \langle \hat{A}^2 \rangle_{\hat{\rho}} - \langle \hat{A} \rangle_{\hat{\rho}}^2 < \infty$$    [29]

Weak quantum measurement, too, will be defined in
the asymptotic limit [8] introduced for classical weak
measurement. Single quantum measurements can no
longer distinguish between the eigenvalues of $\hat{A}$. Yet,
the expectation value $E[a]$ of the outcome $a$ remains
calculable on a statistics $N$ growing like $\sim \sigma^2$.

Both in quantum theory and classical statistics,
the emergence of nonideal measurements from ideal
ones is guaranteed by general theorems. For completeness
of this article, we prove the emergence of
the nonideal quantum measurement (eqns [27] and
[28]) from the standard von Neumann theory of
ideal quantum measurements (von Neumann 1955).
The source of the statistical error of dispersion $\sigma$
is associated with the state $\hat{\rho}_M$ in the complex
Hilbert space $L^2$ of a hypothetic meter. Suppose
$R \in (-\infty, \infty)$ is the position of the “pointer.” Let its
initial state $\hat{\rho}_M$ be a pure central Gaussian state of
width $\sigma$; then the density operator $\hat{\rho}_M$ in Dirac
position basis takes the form

$$\hat{\rho}_M = \int dR \int dR'\; G_\sigma^{1/2}(R)\, G_\sigma^{1/2}(R')\, |R\rangle\langle R'|$$    [30]

We are looking for a certain dynamical interaction
to transmit the “value” of the observable $\hat{A}$ onto the
pointer position $\hat{R}$. To model the interaction, we
define the unitary transformation [31] to act on the
tensor space $\mathcal{H} \otimes L^2$:

$$\hat{U} = \exp(i\hat{A} \otimes \hat{K})$$    [31]

Here $\hat{K}$ is the canonical momentum operator
conjugated to $\hat{R}$:

$$\exp(ia\hat{K})\,|R\rangle = |R + a\rangle$$    [32]

The unitary operator $\hat{U}$ transforms the initial
uncorrelated quantum state into the desired correlated
composite state:

$$\hat{\Sigma} =: \hat{U}\, \hat{\rho} \otimes \hat{\rho}_M\, \hat{U}^\dagger$$    [33]

Equations [30]-[33] yield the expression [34] for the
state $\hat{\Sigma}$:

$$\hat{\Sigma} = \int dR \int dR'\; G_\sigma^{1/2}(R - \hat{A})\,\hat{\rho}\, G_\sigma^{1/2}(R' - \hat{A}) \otimes |R\rangle\langle R'|$$    [34]

Let us write the pointer’s coordinate operator $\hat{R}$ into
the standard form [35] in Dirac position basis:

$$\hat{R} = \int da\; a\, |a\rangle\langle a|$$    [35]

The notation anticipates that, when pointer $\hat{R}$ is
measured ideally, the outcome $a$ plays the role of the
nonideally measured value of the observable $\hat{A}$.
Indeed, let us consider the ideal von Neumann
measurement of the pointer position on the correlated
composite state $\hat{\Sigma}$. The probability of the
outcome $a$ and the collapse of the composite state
are given by the following standard equations:

$$p(a) = \mathrm{tr}\left[(\hat{I} \otimes |a\rangle\langle a|)\,\hat{\Sigma}\right]$$    [36]

$$\hat{\Sigma} \to \frac{1}{p(a)}\,(\hat{I} \otimes |a\rangle\langle a|)\,\hat{\Sigma}\,(\hat{I} \otimes |a\rangle\langle a|)$$    [37]

respectively. We insert eqn [34] into eqns [36] and
[37]. Furthermore, we take the trace over $L^2$ of both
sides of eqn [37]. In such a way, as expected, eqns
[36] and [37] of ideal measurement of $\hat{R}$ yield the
earlier postulated eqns [27] and [28] of nonideal
measurement of $\hat{A}$.
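
The insertion step can be written out explicitly. The following is a sketch of the algebra, using only eqn [34], the orthogonality $\langle a|R\rangle = \delta(a - R)$ of the pointer position basis, and the cyclicity of the trace; the continuum normalization of $|a\rangle$ is treated formally, as is usual in this kind of argument.

```latex
\begin{align*}
p(a) &= \mathrm{tr}\!\left[(\hat I\otimes|a\rangle\langle a|)\,\hat\Sigma\right] \\
     &= \int\! dR \int\! dR'\;
        \mathrm{tr}\!\left[G_\sigma^{1/2}(R-\hat A)\,\hat\rho\,
                           G_\sigma^{1/2}(R'-\hat A)\right]
        \langle R'|a\rangle\langle a|R\rangle \\
     &= \mathrm{tr}\!\left[G_\sigma^{1/2}(a-\hat A)\,\hat\rho\,
                           G_\sigma^{1/2}(a-\hat A)\right]
      = \left\langle G_\sigma(a-\hat A)\right\rangle_{\hat\rho},
\\[1ex]
(\hat I\otimes|a\rangle\langle a|)\,\hat\Sigma\,(\hat I\otimes|a\rangle\langle a|)
     &= G_\sigma^{1/2}(a-\hat A)\,\hat\rho\,G_\sigma^{1/2}(a-\hat A)
        \otimes |a\rangle\langle a| .
\end{align*}
```

The first chain reproduces eqn [27]. In the second line the pointer factor $|a\rangle\langle a|$ carries no information about the system, so discarding it and normalizing by $p(a)$ leaves the system in the state on the right-hand side of eqn [28].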


Quantum Postselection 


A quantum postselection is defined by a Hermitian
operator satisfying $\hat{0} \le \hat{\Pi} \le \hat{I}$. The corresponding
postselected mean value of a certain observable $\hat{A}$ is
defined by

$${}_{\hat{\Pi}}\langle \hat{A} \rangle_{\hat{\rho}} =: \mathrm{Re}\, \frac{\langle \hat{\Pi}\hat{A} \rangle_{\hat{\rho}}}{\langle \hat{\Pi} \rangle_{\hat{\rho}}}$$    [38]

The denominator $\langle \hat{\Pi} \rangle_{\hat{\rho}}$ is the rate of quantum
postselection. Quantum postselection means that
after the measurement of $\hat{A}$, we measure the
observable $\hat{\Pi}$ in ideal quantum measurement and
we make a statistical decision on the basis of the
outcome $\pi$. With probability $\pi$, we include the case
in question into the statistics while we discard it
with probability $1 - \pi$. By analogy with the classical
case [12], one may ask whether the stochastic
expectation value $E[a]$ of the postselected measurement
outcome coincides with the postselected mean [38]:

$$E[a] \overset{?}{=} {}_{\hat{\Pi}}\langle \hat{A} \rangle_{\hat{\rho}}$$    [39]

Contrary to the classical case, the quantum equation
[39] does not hold. The quantum counterparts of
classical equations [12]-[14] do not exist at all.
Nonetheless, the quantum postselected mean ${}_{\hat{\Pi}}\langle \hat{A} \rangle_{\hat{\rho}}$
possesses statistical interpretation although
restricted to the context of weak quantum measurements.
In the weak measurement limit (eqns [8] and
[9]), a postselected analog of classical equation [10]
holds for the arithmetic mean $\bar{a}$ of postselected weak
quantum measurements:

$$p_w(\bar{a}) \to G_\Delta\!\left(\bar{a} - {}_{\hat{\Pi}}\langle \hat{A} \rangle_{\hat{\rho}}\right)$$    [40]

The Gaussian is centered at the postselected mean
${}_{\hat{\Pi}}\langle \hat{A} \rangle_{\hat{\rho}}$, and the variance of the Gaussian is given by the
constant rate [9]. Consequently, the mean [38]
becomes calculable on a statistics $N$ growing like $\sim \sigma^2$.

Since the statistical interpretation of the postselected
quantum mean [38] is only possible for weak
measurements, ${}_{\hat{\Pi}}\langle \hat{A} \rangle_{\hat{\rho}}$ is called the (real)
weak value of $\hat{A}$. Consider the special case when
both the state $\hat{\rho} = |i\rangle\langle i|$ and the postselected operator
$\hat{\Pi} = |f\rangle\langle f|$ are pure states. Then the weak value
${}_{\hat{\Pi}}\langle \hat{A} \rangle_{\hat{\rho}}$ takes, in usual notations, a particular form
[41] yielding the real part of the complex weak
value $A_w$ [2]:

$${}_f\langle \hat{A} \rangle_i = \mathrm{Re}\, \frac{\langle f|\hat{A}|i\rangle}{\langle f|i\rangle}$$    [41]

The interpretation of postselection itself reduces to a
simple procedure. One performs the von Neumann
ideal measurement of the Hermitian projector $|f\rangle\langle f|$,
then includes the case if the outcome is 1 and
discards it if the outcome is 0. The rate of
postselection is $|\langle f|i\rangle|^2$. We note that a certain
statistical interpretation of $\mathrm{Im}\, A_w$, too, exists
although it relies upon the details of the “meter.”

We outline a heuristic proof of the central
equation [40]. One considers the nonideal measurement
(eqns [27] and [28]) of $\hat{A}$ followed by the ideal
measurement of $\hat{\Pi}$. Then the joint distribution of the
corresponding outcomes is given by eqn [42]. The
probability distribution of the postselected outcomes
$a$ is defined by eqn [43], and takes the concrete form
[44]. The constant $\mathcal{N}$ assures normalization:

$$p(\pi, a) = \mathrm{tr}\left(\delta(\pi - \hat{\Pi})\, G_\sigma^{1/2}(a - \hat{A})\,\hat{\rho}\, G_\sigma^{1/2}(a - \hat{A})\right)$$    [42]

$$p(a) =: \frac{1}{\mathcal{N}} \int \pi\, p(\pi, a)\, d\pi$$    [43]

$$p(a) = \frac{1}{\mathcal{N}} \left\langle G_\sigma^{1/2}(a - \hat{A})\,\hat{\Pi}\, G_\sigma^{1/2}(a - \hat{A}) \right\rangle_{\hat{\rho}}$$    [44]

Suppose, for simplicity, that $\hat{A}$ is bounded. When
$\sigma \to \infty$, eqn [44] yields the first two moments of
the outcome $a$:

$$E[a] \to {}_{\hat{\Pi}}\langle \hat{A} \rangle_{\hat{\rho}}$$    [45]

$$E[a^2] \sim \sigma^2$$    [46]

Hence, by virtue of the central limit theorem, the
probability distribution [40] follows for the average
$\bar{a}$ of postselected outcomes in the weak measurement
limit (eqns [8] and [9]).


Quantum Weak-Value Anomaly 


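Unlike in classical postselection, effective postselected
quantum states cannot be introduced. We can
ask whether eqn [47] defines a correct postselected
quantum state:

$$\hat{\rho}_{\hat{\Pi}} \overset{?}{=:} \frac{1}{\langle \hat{\Pi} \rangle_{\hat{\rho}}}\, \mathrm{Herm}\,(\hat{\Pi}\hat{\rho})$$    [47]

where Herm denotes the Hermitian part of an operator.
This pseudo-state satisfies the quantum counterpart
of the classical equation [14]:

$${}_{\hat{\Pi}}\langle \hat{A} \rangle_{\hat{\rho}} = \mathrm{tr}\,(\hat{A}\hat{\rho}_{\hat{\Pi}})$$    [48]

In general, however, the operator $\hat{\rho}_{\hat{\Pi}}$ is not a density
operator since it may be indefinite. Therefore, eqn
[47] does not define a quantum state. Equation [48]
does not guarantee that the quantum weak value
${}_{\hat{\Pi}}\langle \hat{A} \rangle_{\hat{\rho}}$ lies within the range of the eigenvalues of the
observable $\hat{A}$.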

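Let us see a simple example for such anomalous
weak values in the two-dimensional Hilbert space.
Consider the pure initial state given by eqn [49] and
the postselected pure state by eqn [50], where
$\varphi \in [0, \pi]$ is a certain angular parameter.

$$|i\rangle = \frac{1}{\sqrt{2}} \begin{bmatrix} e^{i\varphi/2} \\ e^{-i\varphi/2} \end{bmatrix}$$    [49]

$$|f\rangle = \frac{1}{\sqrt{2}} \begin{bmatrix} e^{-i\varphi/2} \\ e^{i\varphi/2} \end{bmatrix}$$    [50]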

The probability of successful postselection is $\cos^2\varphi$.
If $\varphi \neq \pi/2$, then the postselected pseudo-state
follows from eqn [47]:

$$\hat{\rho}_f = \frac{1}{2} \begin{bmatrix} 1 & 1/\cos\varphi \\ 1/\cos\varphi & 1 \end{bmatrix}$$    [51]

This matrix is indefinite unless $\varphi = 0$; its two
eigenvalues are $(1 \pm 1/\cos\varphi)/2$. The smaller the
postselection rate $\cos^2\varphi$, the larger is the violation of the
positivity of the pseudo-density operator. Let the
weakly measured observable take the form

$$\hat{A} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$    [52]

Its eigenvalues are $\pm 1$. We express its weak value
from eqns [41], [49], and [50] or, equivalently, from
eqns [48] and [51]:

$${}_f\langle \hat{A} \rangle_i = \frac{1}{\cos\varphi}$$    [53]

This weak value of $\hat{A}$ lies outside the range of the
eigenvalues of $\hat{A}$. The anomaly can be arbitrarily
large if the rate $\cos^2\varphi$ of postselection decreases.
Striking consequences follow from this anomaly
if we turn to the statistical interpretation. For
concreteness, suppose $\varphi = \pi/3$ so that ${}_f\langle \hat{A} \rangle_i = 2$.
On average, 75% of the statistics $N$ will be lost
in postselection. We learnt from eqn [40] that
the arithmetic mean $\bar{a}$ of the postselected outcomes
of independent weak measurements converges
stochastically to the weak value up to the Gaussian
fluctuation $\Delta$, as expressed symbolically by

$$\bar{a} = 2 \pm \Delta$$    [54]


Let us approximate the asymptotically large error $\sigma$
of our weak measurements by $\sigma = 10$, which is
already well beyond the scale of the eigenvalues $\pm 1$
of the observable $\hat{A}$. The Gaussian error $\Delta$ derives
from eqn [9] after replacing $N$ by the size of the
postselected statistics, which is approximately $N/4$:

$$\Delta^2 = 400/N$$    [55]

Accordingly, if $N = 3600$ independent quantum
measurements of precision $\sigma = 10$ are performed
regarding the observable $\hat{A}$, then the arithmetic
mean $\bar{a}$ of the $\sim 900$ postselected outcomes $a$ will be
$2 \pm 0.33$. This exceeds significantly the largest
eigenvalue of the measured observable $\hat{A}$. Quantum
postselection appears to bias the otherwise unbiased
nonideal weak measurements.
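
The numbers of this example can be reproduced in a few lines of Python. The sketch below is an illustration under stated assumptions: the pre- and postselected states are taken to be $(e^{\pm i\varphi/2}, e^{\mp i\varphi/2})/\sqrt{2}$, which gives the overlap $\langle f|i\rangle = \cos\varphi$ and hence the postselection rate $\cos^2\varphi$ quoted above, and the observable is the off-diagonal matrix with eigenvalues $\pm 1$. The function names are hypothetical.

```python
import cmath
import math


def weak_value(f, a, i):
    """Complex weak value <f|A|i> / <f|i> for vectors f, i and matrix a."""
    dim = len(i)
    a_i = [sum(a[r][c] * i[c] for c in range(dim)) for r in range(dim)]
    num = sum(f[r].conjugate() * a_i[r] for r in range(dim))
    den = sum(f[r].conjugate() * i[r] for r in range(dim))
    return num / den


def example(phi):
    """Weak value and postselection rate for the assumed two-level states."""
    s = 1 / math.sqrt(2)
    i = [s * cmath.exp(1j * phi / 2), s * cmath.exp(-1j * phi / 2)]
    f = [s * cmath.exp(-1j * phi / 2), s * cmath.exp(1j * phi / 2)]
    a = [[0, 1], [1, 0]]
    rate = abs(sum(fr.conjugate() * ir for fr, ir in zip(f, i))) ** 2
    return weak_value(f, a, i), rate


def postselected_error(sigma, n_total, rate):
    """Gaussian fluctuation Delta = sigma / sqrt(rate * N), from eqn [9]."""
    return sigma / math.sqrt(rate * n_total)


w, rate = example(math.pi / 3)                  # weak value 1/cos(phi), rate cos^2(phi)
delta = postselected_error(10.0, 3600, rate)    # sigma = 10, N = 3600
```

With these assumptions, $\varphi = \pi/3$ gives a weak value of 2 at a postselection rate of 1/4, and $\sigma = 10$, $N = 3600$ give $\Delta = 1/3$. The formula $1/\cos\varphi$ changes sign for $\varphi > \pi/2$: at $\varphi = 2\pi/3$ the rate is again 1/4 but the weak value is $-2$, which lies below the smallest eigenvalue instead of above the largest.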


Quantum Time-Continuous Measurement 


The mathematical construction of time-continuous
quantum measurement is similar to the classical one.
We consider the weak measurement limit (eqns [15]
and [16]) of an infinite sequence of nonideal
quantum measurements of the observable $\hat{A}$ at
$t = \delta t, 2\delta t, \ldots$, on the time-dependent state $\hat{\rho}_t$. The
resulting theory of time-continuous quantum measurement
is incorporated in the coupled stochastic
equations [56] and [57] for the primitive function $\alpha_t$
of the time-dependent outcome and the conditional
time-dependent state $\hat{\rho}_t$, respectively (Diósi 1988):

$$d\alpha_t = \langle \hat{A} \rangle_{\hat{\rho}_t}\, dt + g\, dW_t$$    [56]

$$d\hat{\rho}_t = -\tfrac{1}{8}\, g^{-2}\, [\hat{A}, [\hat{A}, \hat{\rho}_t]]\, dt + g^{-1}\, \mathrm{Herm}\left[\left(\hat{A} - \langle \hat{A} \rangle_{\hat{\rho}_t}\right)\hat{\rho}_t\right] dW_t$$    [57]

Equation [56] and its classical counterpart [17] are
perfectly similar. There is a remarkable difference
between eqn [57] and its classical counterpart [18].
In the latter, the stochastic average of the state is
constant: $E[d\rho_t] = 0$, expressing the fact that classical
measurements do not alter the original ensemble
if we “ignore” the outcomes of the measurements.
On the contrary, quantum measurements introduce
irreversible changes to the original ensemble, a
phenomenon called decoherence in the physics
literature. Equation [57] implies the closed linear
first-order differential equation [58] for the stochastic
average of the quantum state $\hat{\rho}_t$ under time-continuous
measurement of the observable $\hat{A}$:

$$\frac{dE[\hat{\rho}_t]}{dt} = -\tfrac{1}{8}\, g^{-2}\, [\hat{A}, [\hat{A}, E[\hat{\rho}_t]]]$$    [58]


This is the basic irreversible equation to model the
gradual loss of quantum coherence (decoherence)
under time-continuous measurement. In fact, the
very equation models decoherence under the influence
of a large class of interactions, for example,
with thermal reservoirs or complex environments. In
two-dimensional Hilbert space, for instance, we can
consider the initial pure state $\langle i| =: [\cos\varphi, \sin\varphi]$ and
the time-continuous measurement of the diagonal
observable [59] on it. The solution of eqn [58] is
given by eqn [60]:

$$\hat{A} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$$    [59]

$$E[\hat{\rho}_t] = \begin{bmatrix} \cos^2\varphi & e^{-t/2g^2} \cos\varphi \sin\varphi \\ e^{-t/2g^2} \cos\varphi \sin\varphi & \sin^2\varphi \end{bmatrix}$$    [60]

The off-diagonal elements of this density matrix
go to zero, that is, the coherent superposition
represented by the initial pure state becomes an
incoherent mixture represented by the diagonal
density matrix $\hat{\rho}_\infty$.
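
The decay of the coherences can be checked by integrating eqn [58] numerically. The Python sketch below assumes that the diagonal observable is $\mathrm{diag}(1, -1)$ and uses the coefficient $\tfrac{1}{8} g^{-2}$ in front of the double commutator; under these assumptions the off-diagonal elements decay as $e^{-t/2g^2}$ while the diagonal is untouched. The values of $\varphi$, $g$, and $t$ are arbitrary illustrative choices, and a plain Euler step is used for simplicity.

```python
import math


def double_commutator_diag(a_diag, rho):
    """[A, [A, rho]] for diagonal A: element (j, k) is (a_j - a_k)^2 rho_jk."""
    n = len(a_diag)
    return [[(a_diag[j] - a_diag[k]) ** 2 * rho[j][k] for k in range(n)]
            for j in range(n)]


def evolve_mean_state(a_diag, rho0, g, t, n_steps=10000):
    """Euler integration of d rho/dt = -(1/8) g^-2 [A, [A, rho]] (eqn [58])."""
    dt = t / n_steps
    n = len(rho0)
    rho = [row[:] for row in rho0]
    for _ in range(n_steps):
        dc = double_commutator_diag(a_diag, rho)
        rho = [[rho[j][k] - dt * dc[j][k] / (8 * g**2) for k in range(n)]
               for j in range(n)]
    return rho


# Illustrative parameters (assumptions of this sketch).
phi, g, t = 0.6, 1.0, 2.0
c, s = math.cos(phi), math.sin(phi)
rho0 = [[c * c, c * s], [c * s, s * s]]        # initial pure state |i><i|
rho_t = evolve_mean_state([1.0, -1.0], rho0, g, t)
expected_offdiag = c * s * math.exp(-t / (2 * g**2))
```

For a general diagonal observable the element $(j, k)$ decays at the rate $(a_j - a_k)^2 / 8g^2$, so well-separated eigenvalues lose their mutual coherence fastest.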

Apart from the phenomenon of decoherence, the
stochastic equations show remarkable similarity
with the classical equations of time-continuous
measurement. The heuristic form of eqn [56] is
eqn [61], whose interpretation is unchanged with respect
to the classical equation [19]:

$$a_t = \langle \hat{A} \rangle_{\hat{\rho}_t} + g\,\xi_t$$    [61]

Equation [57] describes what is called the time-continuous
collapse of the quantum state under
time-continuous quantum measurement of $\hat{A}$. For
concreteness, we assume discrete spectrum for $\hat{A}$ and
consider the spectral expansion

$$\hat{A} = \sum_\lambda a^\lambda \hat{P}^\lambda$$    [62]

The real values $a^\lambda$ are nondegenerate eigenvalues.
The Hermitian projectors $\hat{P}^\lambda$ form a complete
orthogonal set:

$$\sum_\lambda \hat{P}^\lambda \equiv \hat{I}$$    [63]

$$\hat{P}^\lambda \hat{P}^\mu = \delta_{\lambda\mu} \hat{P}^\lambda$$    [64]

In a single ideal measurement of $\hat{A}$, the outcome $a$ is
one of the $a^\lambda$'s singled out at random. The
probability distribution of the measurement outcome
and the corresponding collapse of the state are
given by

$$p^\lambda = \langle \hat{P}^\lambda \rangle_{\hat{\rho}_0}$$    [65]

$$\hat{\rho}_0 \to \frac{1}{p^\lambda}\, \hat{P}^\lambda \hat{\rho}_0 \hat{P}^\lambda =: \hat{\rho}^\lambda$$    [66]

respectively. Equations [56] and [57] of continuous
measurements are an obvious time-continuous
resolution of the “sudden” ideal quantum measurement
(eqns [65] and [66]) in a sense that they
reproduce it in the limit $t \to \infty$. The states $\hat{\rho}^\lambda$ are
stationary states of eqn [57]. It can be shown that
they are indeed approached with probability $p^\lambda$ for
$t \to \infty$ (Gisin 1984).


Related Contexts 


In addition to the two particular examples,
postselection and time-continuous measurement,
presented above, the weak
measurement limit itself has further variants.
A most natural example is the usual thermodynamic
limit in standard statistical physics. Then weak
measurements concern a certain additive microscopic
observable (e.g., the spin) of each constituent
and the weak value represents the corresponding
additive macroscopic parameter (e.g., the magnetization)
in the infinite volume limit. This example
indicates that weak values have a natural interpretation
despite the apparently artificial conditions of
their definition. It is important that the weak value,
with or without postselection, plays a physical role
similar to that of the common mean $\langle \hat{A} \rangle_{\hat{\rho}}$. If,
between their pre- and postselection, the states $\hat{\rho}$
become weakly coupled with the state of another
quantum system via the observable $\hat{A}$, their average
influence will be as if $\hat{A}$ took the weak value ${}_{\hat{\Pi}}\langle \hat{A} \rangle_{\hat{\rho}}$.
Weak measurements also open a specific loophole to
circumvent quantum limitations related to the
irreversible disturbances that quantum measurements
cause to the measured state. Noncommuting
observables become simultaneously measurable in
the weak limit: simultaneous weak values of
noncommuting observables will exist.

The term weak measurement was coined
in 1988 for quantum measurements with (pre- and)
postselection, and became the tool of a certain
time-symmetric statistical interpretation of quantum states.
Foundational applications target the paradoxical
problem of pre- and retrodiction in quantum theory.
In a broad sense, however, the very principle of weak
measurement encapsulates the trade between
asymptotically weak precision and asymptotically large
statistics. Its relevance in different fields has not yet
been fully explored and a growing number of
foundational, theoretical, and experimental applications are
being considered in the literature, predominantly in
the context of quantum physics. Since specialized
monographs or textbooks on quantum weak measurement
are not yet available, the reader is mostly referred
to research articles, like the recent one by Aharonov
and Botero (2005), covering many topics of
postselected quantum weak values.


Nomenclature 

$a$                                    measurement outcome
$\bar{a}$                              arithmetic mean of measurement outcomes
$\hat{A}$                              Hermitian operator, quantum observable
$A(X)$                                 real phase-space function
$E[\ldots]$                            stochastic expectation value
$\langle f|\hat{A}|i\rangle$           matrix element
$\langle f|i\rangle$                   inner product
$\mathcal{H}$                          Hilbert space
$L^2$                                  space of Lebesgue square-integrable complex functions
$p$                                    probability distribution
tr                                     trace
$\hat{U}$                              unitary operator
$W_t$                                  Wiener process
$\xi_t$                                white noise process
${}_\Pi\langle \ldots \rangle_\rho$    postselected mean value
$\hat{\rho}$                           density operator
$\rho(X)$                              phase-space distribution
$\otimes$                              direct product
$\dagger$                              operator adjoint
$|\ldots\rangle$                       state vector
$\langle \ldots|$                      adjoint state vector
$\langle \ldots \rangle_\rho$          mean value


Introduction


This article concerns the nonrelativistic quantum
mechanics of isolated systems of $n$ particles interacting
by means of a scalar potential, what we shall
call the “quantum n-body problem.” Such systems
are described by the kinetic-plus-potential
Hamiltonian,

$$H = \sum_{\alpha=1}^{n} \frac{|\mathbf{P}_\alpha|^2}{2m_\alpha} + V(\mathbf{R}_1, \ldots, \mathbf{R}_n)$$    [1]

where $\mathbf{R}_\alpha, \mathbf{P}_\alpha$, $\alpha = 1, \ldots, n$ are the positions and
momenta of the $n$ particles in three-dimensional
space, $m_\alpha$ are the masses, and $V$ is the potential
energy. This Hamiltonian also occurs in the
“classical n-body problem,” in which $V$ is usually
assumed to consist of the sum of the pairwise
gravitational interactions of the particles. In this
article, we shall only assume that $V$ (hence $H$) is
invariant under translations, proper rotations, parity,
and permutations of identical particles. The
Hamiltonian $H$ is also invariant under time reversal.
This Hamiltonian describes the dynamics of isolated
atoms, molecules, and nuclei, with varying degrees
of approximation, including the case of molecules in
the Born-Oppenheimer approximation, in which $V$
is the Born-Oppenheimer potential. We shall ignore
the spin of the particles, and treat the wave function
$\Psi$ as a scalar. We assume that $\Psi$ is an eigenfunction
of $H$, $H\Psi = E\Psi$. In practice, the value of $n$ typically
ranges from 2 to several hundred. Often the cases
$n = 3$ and $n = 4$ are of special interest. In this article,
we shall assume that $n \ge 3$, since $n = 2$ is the trivial
case of central-force motion. The quantum n-body
problem is not to be confused with the “quantum
many-body problem,” which usually refers to the
quantum mechanics of large numbers of identical
particles, such as the electrons in a solid.

Of particular interest is the “reduction” of the 
Hamiltonian [1], that is, the elimination of those 
degrees of freedom that can be eliminated due to the 
continuous symmetries of translations and rotations. 
A basic problem is to write down the reduced 
Hamiltonian and to make its analytical and geome- 
trical properties clear. In the following we shall 
present this reduction in two stages, dealing first with 
the translations and second with the proper rotations. 
In each stage, we shall describe the reduction first in 
coordinate language and then in geometrical lan- 
guage. The discrete symmetries of parity, time 
reversal, and permutation of identical particles are 
handled by standard methods of group representation 
theory, and will not be discussed here. 

There has been considerable interest in mathema- 
tical circles in recent years in the reduction of 
dynamical systems with symmetry, and the quantum 
n-body problem is one of the most important such 
systems from a physical standpoint. As such, the 
basic theory of the quantum n-body problem has 
received considerable attention in the physical 
literature going back to the birth of quantum 
mechanics, and continues to be of great practical 
importance. This article and the bibliography 
attempt to bridge these two centers of interest. 


Reduction by Translations: Coordinate 
Description 


We begin with a coordinate description of the
reduction of the system [1] by translations. The
coordinates $(\mathbf{R}_1,\dots,\mathbf{R}_n)$ are coordinates on the
configuration space of the system, called the “original
configuration space” or OCS. The OCS is $\mathbb{R}^{3n}$. The
original system has 3n degrees of freedom. The
translation group acts on configuration space by
$\mathbf{R}_\alpha \mapsto \mathbf{R}_\alpha + \boldsymbol{\xi}$, for $\alpha = 1,\dots,n$, where $\boldsymbol{\xi}$ is a
displacement vector. It acts on wave functions by
$\Psi(\mathbf{R}_1,\dots,\mathbf{R}_n) \mapsto \Psi(\mathbf{R}_1 - \boldsymbol{\xi},\dots,\mathbf{R}_n - \boldsymbol{\xi})$.

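To reduce the system by translations, we perform
a linear coordinate transformation on the OCS,
taking us from the original vectors $(\mathbf{R}_1,\dots,\mathbf{R}_n)$ to a
new set of n vectors $(\mathbf{r}_1,\dots,\mathbf{r}_{n-1},\mathbf{R}_{\rm CM})$, where $\mathbf{R}_{\rm CM}$
is the center-of-mass position,

$$\mathbf{R}_{\rm CM} = \frac{1}{M}\sum_{\alpha=1}^{n} m_\alpha \mathbf{R}_\alpha \qquad [2]$$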


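where $M = \sum_\alpha m_\alpha$ is the total mass of the system, and
the other $n-1$ vectors of the new coordinate system,
$(\mathbf{r}_1,\dots,\mathbf{r}_{n-1})$, are required to be translationally
invariant, that is, independent linear functions of the
relative particle positions $\mathbf{R}_\alpha - \mathbf{R}_\beta$. We denote the
momenta conjugate to $(\mathbf{r}_1,\dots,\mathbf{r}_{n-1},\mathbf{R}_{\rm CM})$ by
$(\mathbf{p}_1,\dots,\mathbf{p}_{n-1},\mathbf{P}_{\rm CM})$, of which $\mathbf{P}_{\rm CM}$ turns out to be the
total momentum of the system,

$$\mathbf{P}_{\rm CM} = \sum_{\alpha=1}^{n} \mathbf{P}_\alpha \qquad [3]$$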


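Under such a coordinate transformation, the potential
energy becomes simply a function of the $n-1$
relative vectors, $V(\mathbf{r}_1,\dots,\mathbf{r}_{n-1})$, whereas the kinetic
energy becomes

$$T = \frac{|\mathbf{P}_{\rm CM}|^2}{2M} + \frac{1}{2}\sum_{\alpha,\beta=1}^{n-1} K^{\alpha\beta}\,\mathbf{p}_\alpha\cdot\mathbf{p}_\beta \qquad [4]$$

where $K^{\alpha\beta}$ is a symmetric tensor (the “inverse mass
tensor”).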

The vectors $(\mathbf{r}_1,\dots,\mathbf{r}_{n-1})$ specify the positions of the n
particles relative to their center of mass. As described
so far, these vectors need only be independent,
translationally invariant linear combinations of the
particle positions. However, it is convenient to
choose them so that the inverse mass tensor becomes
proportional to the identity, $K^{\alpha\beta} = (1/M)\,\delta_{\alpha\beta}$. An
elegant way of doing this is the method of Jacobi
vectors, which involves splitting the original set of
particles into two nonempty subsets, which are then
split into smaller subsets, etc., until only subsets of a
single particle remain. The process can be represented
by a tree growing downward, with the original n
particles as the root, and the ends of the branches at
the bottom each containing one particle. Then the
vectors $(\mathbf{r}_1,\dots,\mathbf{r}_{n-1})$ (the Jacobi vectors) are chosen
to be proportional to the differences between the
centers of mass of the two subsets at each splitting.
With the right constants of proportionality, the
kinetic energy becomes

$$T = \frac{|\mathbf{P}_{\rm CM}|^2}{2M} + \frac{1}{2M}\sum_{\alpha=1}^{n-1} |\mathbf{p}_\alpha|^2 \qquad [5]$$

Henceforth, we shall assume that the vectors
$(\mathbf{r}_1,\dots,\mathbf{r}_{n-1})$ are Jacobi vectors with conjugate
momenta $(\mathbf{p}_1,\dots,\mathbf{p}_{n-1})$.
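
The Jacobi construction can be checked numerically. What follows is a minimal sketch, not the authors' procedure: it assumes one particular tree (particles are split off one at a time) and the normalization $\mathbf{r}_k = \sqrt{\mu_k/M}\,\boldsymbol{\rho}_k$, where $\boldsymbol{\rho}_k$ is the separation of the two centers of mass at the k-th splitting and $\mu_k$ is their reduced mass. The names `jacobi_matrix` and `inverse_mass_tensor` and the test masses are illustrative only. Under these assumptions the inverse mass tensor should come out as $(1/M)$ times the identity.

```python
import numpy as np


def jacobi_matrix(masses):
    """Rows 0..n-2 map particle positions to mass-scaled Jacobi vectors for a
    chain tree (assumed choice); the last row maps them to the center of mass."""
    m = np.asarray(masses, dtype=float)
    n = m.size
    M = m.sum()
    A = np.zeros((n, n))
    for k in range(1, n):
        m_cluster = m[:k].sum()                  # mass of particles 0..k-1
        mu = m_cluster * m[k] / (m_cluster + m[k])  # reduced mass of the split
        row = np.zeros(n)
        row[:k] = -m[:k] / m_cluster             # minus the cluster's center of mass
        row[k] = 1.0                             # plus the position of particle k
        A[k - 1] = np.sqrt(mu / M) * row
    A[n - 1] = m / M                             # center-of-mass row, eqn [2]
    return A


def inverse_mass_tensor(A, masses):
    """K = A diag(1/m) A^T, so that T = (1/2) p^T K p in the new momenta."""
    m = np.asarray(masses, dtype=float)
    return A @ np.diag(1.0 / m) @ A.T


masses = [1.0, 2.0, 3.5, 0.7]
K = inverse_mass_tensor(jacobi_matrix(masses), masses)
print(np.round(K * sum(masses), 12))  # M * K, to be compared with the identity
```

The last diagonal entry of `K` belongs to the center of mass and reproduces the $|\mathbf{P}_{\rm CM}|^2/2M$ term; the remaining block is the inverse mass tensor of eqn [4].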

The choice of Jacobi vectors is not unique. In the 
first place, there is a discrete set of possible ways of 
splitting the original set of n particles into subsets 
(of forming trees), each of which leads to the same 
form [5] of the kinetic energy. More generally, the 
kinetic energy [5] is invariant under transformations

$$\mathbf{r}_\alpha \mapsto \sum_{\beta=1}^{n-1} Q_{\alpha\beta}\,\mathbf{r}_\beta \qquad [6]$$

where $Q_{\alpha\beta}$ is an orthogonal matrix, $Q \in O(n-1)$.
Such transformations are called “kinematic rotations.”
The discrete choices of trees in forming the
Jacobi vectors are equivalent to a discrete set of
kinematic rotations $Q_{\alpha\beta}$ that map one standard
choice of Jacobi vectors into the others.
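
As an illustration of the last statement, the sketch below builds mass-scaled Jacobi vectors for two different chain trees (the same particles taken in a different order) and extracts the linear map between them. The tree choice, the normalization and the helper names are assumptions made for this example only. If the construction is right, the map restricted to the $n-1$ relative vectors should be an orthogonal matrix and should leave the center of mass untouched.

```python
import numpy as np


def jacobi_matrix(masses):
    """Chain-tree Jacobi transformation (assumed convention): rows 0..n-2 give
    mass-scaled Jacobi vectors, the last row gives the center of mass."""
    m = np.asarray(masses, dtype=float)
    n, M = m.size, m.sum()
    A = np.zeros((n, n))
    for k in range(1, n):
        m_cluster = m[:k].sum()
        mu = m_cluster * m[k] / (m_cluster + m[k])
        A[k - 1, :k] = -np.sqrt(mu / M) * m[:k] / m_cluster
        A[k - 1, k] = np.sqrt(mu / M)
    A[n - 1] = m / M
    return A


def kinematic_rotation(masses, perm):
    """Return (Q, T): T maps (r_1..r_{n-1}, R_CM) of the original ordering to
    those of the permuted ordering, and Q is its upper-left (n-1)x(n-1) block."""
    m = np.asarray(masses, dtype=float)
    perm = np.asarray(perm)
    n = m.size
    A_old = jacobi_matrix(m)
    P = np.eye(n)[perm]                    # (P @ R)[i] = R[perm[i]]
    A_new = jacobi_matrix(m[perm]) @ P     # second tree, acting on original labels
    T = A_new @ np.linalg.inv(A_old)
    return T[: n - 1, : n - 1], T


Q, T = kinematic_rotation([1.0, 2.0, 3.5], [1, 2, 0])
print(np.round(Q @ Q.T, 12))   # to be compared with the 2x2 identity
print(np.round(T[-1], 12))     # center-of-mass row, to be compared with (0, 0, 1)
```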

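Since the momentum $\mathbf{P}_{\rm CM}$ of the center of mass
commutes with H, the eigenfunctions $\Psi$ of H can be
chosen to have the form

$$\Psi(\mathbf{R}_1,\dots,\mathbf{R}_n) = \exp(i\,\mathbf{R}_{\rm CM}\cdot\mathbf{P}_{\rm CM}/\hbar)\,\psi(\mathbf{r}_1,\dots,\mathbf{r}_{n-1}) \qquad [7]$$

This causes $\psi$ to be an eigenfunction of the
“translation-reduced Hamiltonian,” $H_{\rm tr}\psi = E_{\rm tr}\psi$,
where

$$H_{\rm tr} = \frac{1}{2M}\sum_{\alpha=1}^{n-1}|\mathbf{p}_\alpha|^2 + V(\mathbf{r}_1,\dots,\mathbf{r}_{n-1}) \qquad [8]$$

The kinetic energy of the center of mass,
$|\mathbf{P}_{\rm CM}|^2/2M$, has been discarded from both $H_{\rm tr}$ and
$E_{\rm tr}$, which represent physically the energy of the
system about its center of mass.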


Reduction by Translations: Geometrical 
Description 


The kinetic energy T in eqn [1] specifies a metric
$ds^2 = \sum_\alpha m_\alpha |d\mathbf{R}_\alpha|^2$ on the OCS ($=\mathbb{R}^{3n}$). The translation
group ($=\mathbb{R}^3$) acts freely on the OCS, with an
action that is generated by $\mathbf{P}_{\rm CM}$. This action defines
an orthogonal decomposition of the OCS,
$\mathbb{R}^{3n} = \mathbb{R}^3 \oplus \mathbb{R}^{3n-3}$, where $\mathbb{R}^3$ is the orbit of the origin
(the other orbits of the translation group action are
parallel spaces), and $\mathbb{R}^{3n-3}$ is the orthogonal subspace
(henceforth the “translation-reduced configuration
space” or TRCS for short). The TRCS is physically
the space of configurations relative to the center of
mass. The vectors $(\mathbf{r}_1,\dots,\mathbf{r}_{n-1})$ are coordinates on
the TRCS. The TRCS possesses a metric which is the
projection of the metric on the OCS onto the TRCS 
by means of the translation group action. The metric 
can be projected because translations preserve the 
original metric (they are isometries). Jacobi vectors 
are Euclidean coordinates on the TRCS with respect 
to this metric. 

The tree method of constructing Jacobi vectors 
can be understood in terms of certain group actions 
which take place as each subset of particles is split 
into two further subsets. The group action in 
question leaves the center of mass of the original 
subset invariant, while moving the two new subsets 
apart along a line. This motion in the configuration 
space is orthogonal to all the other group actions
that are created in the process of splitting subsets of
particles, including the original action of the
translation group. Thus, each splitting of a subset
of particles generates a three-dimensional subspace
of the OCS, on which one of the $\mathbf{r}_\alpha$ provides coordinates.
The conjugate momentum $\mathbf{p}_\alpha$ is the generator of the
group action moving the two new subsets apart. The
final result is that the OCS is decomposed into n
orthogonal, three-dimensional subspaces, one of
which contains the action of the original translation
group, and the others of which represent the
decomposition of the TRCS into $n-1$ three-dimensional
orthogonal subspaces.

The TRCS can also be seen as a global section of a 
flat, trivial, principal fiber bundle created by the 
action of the translation group on the OCS. 
Alternatively, the TRCS can be seen as the quotient 
space, $\mathbb{R}^{3n}/\mathbb{R}^3$. The construction is fairly simple
because the translation group is Abelian. 

The wave function $\psi$ can be seen as a member of
the Hilbert space of wave functions on the TRCS,
upon which the reduced Hamiltonian $H_{\rm tr}$ of eqn [8]
acts. Alternatively, it can be seen as the function
obtained by restricting $\Psi$ on the OCS to the TRCS,
where $\Psi$ has a dependence along the orbits of the
translation group given by $\exp(i\,\mathbf{R}_{\rm CM}\cdot\mathbf{P}_{\rm CM}/\hbar)$, that
is, by an irreducible representation (irrep) of the
translation group.


Reduction by Rotations: Coordinate 
Description 


The Hamiltonian $H_{\rm tr}$ acts on wave functions $\psi$
defined on the TRCS and has $3n-3$ degrees of
freedom. Consider a coordinate transformation to
eliminate further degrees of freedom due to the
rotational invariance. This coordinate transformation
takes us from the Jacobi vectors $\{\mathbf{r}_\alpha, \alpha = 1,\dots,n-1\}$
to orientational and shape coordinates. Shape
coordinates are a set of $3n-6$ coordinates
$\{q^\mu, \mu = 1,\dots,3n-6\}$ that specify the shape of the
n-particle system, that is, they are $3n-6$ independent
functions of the interparticle distances (hence rotationally
invariant). We will call the space upon which
the $q^\mu$ are coordinates “shape space.” For example, in
the case of the three-body problem, shape space is the
space of all triangles.

As for orientational coordinates, to define them it 
is necessary first to define a “body frame.” We 
assume we are already given one frame, the “space 
frame,” a fixed inertial frame. The body frame is a 
3-frame attached in a conventional way to each shape 
of the system of particles, which rotates with the 
particles. The orientational coordinates, to be
denoted by $\{\theta^i, i = 1, 2, 3\}$, are three coordinates (e.g.,
Euler angles) specifying the SO(3) rotation that maps
the space frame into the body frame. We shall write
the new coordinates collectively as $\{\theta^i, q^\mu\}$.

There is a great deal of arbitrariness in the choice 
of a body frame, since for a given shape a body frame 
can be attached in many ways, the different choices 
being related by proper rotations. The only require- 
ment is that the body frame should change smoothly 
as the shape changes. Popular choices for the body 
frame are the principal axis and Eckart frames. 

When the potential energy is transformed to the new
coordinates, it becomes a function only of the $\{q^\mu\}$,
that is, of the shape. The potential can be written as
V = V(q). V is a scalar field on shape space.

The transformation of the kinetic energy is more 
complicated. When the (Euclidean) metric tensor on 
the TRCS is transformed to orientational and shape 
coordinates there results a $(3n-3)\times(3n-3)$ component
matrix which may be partitioned into blocks
according to the coordinates $\{\theta^i, q^\mu\}$, that is, according
to $3n-3 = 3 + (3n-6)$. This matrix cannot be
made diagonal or even block diagonal by any choice 
of orientational or shape coordinates, or by any 
choice of body frame. 

The components of the metric tensor in the new
coordinates are conveniently expressed in terms of
three fields on shape space. The first is the moment-of-inertia
tensor $\mathsf{E}$, which describes the $3\times3$ upper block
of the metric tensor. Its components are given by

$$E_{ij} = M\sum_{\alpha=1}^{n-1}\left(|\mathbf{r}_\alpha|^2\,\delta_{ij} - r_{\alpha i}\,r_{\alpha j}\right) \qquad [9]$$


The vectors and tensors in this equation can be 
referred either to the space frame or the body frame, 
but the body frame is more convenient because then 
the components of the vectors $\mathbf{r}_\alpha$ are functions only
of the shape coordinates $q^\mu$. Thus, the body frame
components $E_{ij}$ of the moment-of-inertia tensor
define a field on shape space. 
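
Eqn [9] can be compared with the elementary definition of the moment-of-inertia tensor in terms of particle positions relative to the center of mass. The sketch below does this for three particles, using one assumed choice of mass-scaled Jacobi vectors (particles 1 and 2 paired first); the function names, masses and random positions are illustrative. Components are taken in the space frame, which the text notes is equally admissible for this equation.

```python
import numpy as np


def jacobi_vectors_3body(R, m):
    """Mass-scaled Jacobi vectors for the tree ((1,2),3); an assumed convention."""
    m1, m2, m3 = m
    M = m1 + m2 + m3
    mu1 = m1 * m2 / (m1 + m2)          # reduced mass of the pair (1,2)
    mu2 = (m1 + m2) * m3 / M           # reduced mass of pair versus particle 3
    r1 = np.sqrt(mu1 / M) * (R[1] - R[0])
    r2 = np.sqrt(mu2 / M) * (R[2] - (m1 * R[0] + m2 * R[1]) / (m1 + m2))
    return np.array([r1, r2])


def inertia_from_jacobi(r, M):
    """Eqn [9]: E_ij = M * sum_a (|r_a|^2 delta_ij - r_ai r_aj)."""
    return M * (np.sum(r * r) * np.eye(3) - r.T @ r)


def inertia_from_positions(R, m):
    """Textbook form: sum_a m_a (|x_a|^2 delta_ij - x_ai x_aj), x_a = R_a - R_CM."""
    m = np.asarray(m, dtype=float)
    x = R - (m[:, None] * R).sum(axis=0) / m.sum()
    return np.sum(m[:, None] * x * x) * np.eye(3) - (m[:, None] * x).T @ x


rng = np.random.default_rng(0)
masses = np.array([1.0, 2.0, 3.5])
R = rng.normal(size=(3, 3))            # three arbitrary particle positions
E_jacobi = inertia_from_jacobi(jacobi_vectors_3body(R, masses), masses.sum())
E_direct = inertia_from_positions(R, masses)
print(np.max(np.abs(E_jacobi - E_direct)))  # difference between the two forms
```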

The second field is the “gauge potential” $\mathbf{A}_\mu$, an
object with $3(3n-6)$ components $A^i_\mu$, $i = 1, 2, 3$;
$\mu = 1,\dots,3n-6$, which describes the off-diagonal
blocks of the metric tensor. It is defined by

$$\mathbf{A}_\mu = \mathsf{E}^{-1}\left(M\sum_{\alpha=1}^{n-1}\mathbf{r}_\alpha\times\frac{\partial\mathbf{r}_\alpha}{\partial q^\mu}\right) \qquad [10]$$

in which all vectors are understood to be referred to
the body frame (so the partial derivatives make
sense). The gauge potential $\mathbf{A}_\mu$ is responsible for the
“falling cat” phenomenon, in which a flexible body
of zero angular momentum nevertheless manages to
rotate.


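The third field is the $(3n-6)\times(3n-6)$ lower
block of the metric tensor on the TRCS, an object
with two shape indices. It is given by

$$g_{\mu\nu} = M\sum_{\alpha=1}^{n-1}\frac{\partial\mathbf{r}_\alpha}{\partial q^\mu}\cdot\frac{\partial\mathbf{r}_\alpha}{\partial q^\nu} - \mathbf{A}_\mu\cdot\mathsf{E}\cdot\mathbf{A}_\nu \qquad [11]$$

where again the vectors are referred to the body
frame. The notation suggests (correctly) that $g_{\mu\nu}$ is
the metric tensor on shape space.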

On transforming the wave function from the
Jacobi vectors to coordinates $(\theta^i, q^\mu)$, it is convenient
to introduce a Jacobian factor, $\psi(\mathbf{r}_1,\dots,\mathbf{r}_{n-1}) = D^{-1/4}\phi(\theta^i, q^\mu)$,
where $D = (\det\mathsf{E})(\det g_{\mu\nu})$. This
causes the new wave function $\phi$ to have the
normalization

$$\int dR \left(\prod_{\mu=1}^{3n-6} dq^\mu\right)|\phi|^2 \qquad [12]$$

where dR is the Haar measure on the group SO(3).
The factor D depends only on the $q^\mu$, not the $\theta^i$.
Then the Schrödinger equation can be written as
$H_{\rm tr}\phi = E_{\rm tr}\phi$, where $H_{\rm tr}$ is a differential operator
involving $\partial/\partial\theta^i$ and $\partial/\partial q^\mu$.

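The orientational derivatives $\partial/\partial\theta^i$ in $H_{\rm tr}$ are
conveniently expressed in terms of the angular
momentum operator $\mathbf{L}$. When acting on the original
wave function $\Psi$ on the OCS, the angular momentum
is

$$\mathbf{L} = \sum_{\alpha=1}^{n}\mathbf{R}_\alpha\times\mathbf{P}_\alpha \qquad [13]$$

When this is transformed to the coordinates
$(\mathbf{r}_1,\dots,\mathbf{r}_{n-1},\mathbf{R}_{\rm CM})$, it becomes $\mathbf{L} = \mathbf{L}_{\rm CM} + \mathbf{L}_{\rm tr}$,
where $\mathbf{L}_{\rm CM} = \mathbf{R}_{\rm CM}\times\mathbf{P}_{\rm CM}$, and

$$\mathbf{L}_{\rm tr} = \sum_{\alpha=1}^{n-1}\mathbf{r}_\alpha\times\mathbf{p}_\alpha \qquad [14]$$

Physically, $\mathbf{L}_{\rm tr}$ is the angular momentum of the
system about the center of mass.

We shall henceforth drop the “tr” on $H_{\rm tr}$, $E_{\rm tr}$, and
$\mathbf{L}_{\rm tr}$, thereby restricting attention to the energy and
angular momentum about the center of mass.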

The angular momentum $\mathbf{L}$, when acting on wave
functions $\psi(\mathbf{r}_1,\dots,\mathbf{r}_{n-1})$ on the TRCS, is a vector of
differential operators involving $\partial/\partial\mathbf{r}_\alpha$. When these
are transformed to orientational and shape coordinates,
the components of $\mathbf{L}$ become differential
operators involving only orientational derivatives,
$\partial/\partial\theta^i$. There are no shape derivatives, $\partial/\partial q^\mu$, since
$\mathbf{L}$ generates rotations, that is, changes in orientation,
not shape. Thus, one can solve for the operators
$\partial/\partial\theta^i$ in terms of the components of $\mathbf{L}$. This is true
both for the space and the body components of $\mathbf{L}$,
although the differential operators are not the same
in the two cases. The space components of $\mathbf{L}$ satisfy
the usual angular momentum commutation relations,
$[L_i, L_j] = i\hbar\,\epsilon_{ijk}L_k$, while the body components
of $\mathbf{L}$ satisfy $[L_i, L_j] = -i\hbar\,\epsilon_{ijk}L_k$ (with a minus sign
relative to the space commutation relations).

Thus, the Hamiltonian can be expressed in
terms of $\mathbf{L}$ and the shape momentum operators,
$p_\mu = -i\hbar\,\partial/\partial q^\mu$. The result is

$$H = \tfrac{1}{2}\,\mathbf{L}\cdot\mathsf{E}^{-1}\cdot\mathbf{L} + \tfrac{1}{2}\,(p_\mu - \mathbf{L}\cdot\mathbf{A}_\mu)\,g^{\mu\nu}\,(p_\nu - \mathbf{L}\cdot\mathbf{A}_\nu) + V_2(q) + V(q) \qquad [15]$$

where all vectors are referred to the body frame,
where $g^{\mu\nu}$ is the contravariant metric tensor on
shape space, and where $V_2$ is given by

$$V_2 = \frac{\hbar^2}{2}\,D^{-1/4}\,\frac{\partial}{\partial q^\mu}\left(g^{\mu\nu}\,\frac{\partial D^{1/4}}{\partial q^\nu}\right) \qquad [16]$$

$V_2$ looks like a potential (it is a function of only q),
hence the notation, but physically it belongs to the
kinetic energy. It is sometimes called an “extrapotential.”
It arises from nonclassical commutators in the
transformation of the kinetic energy (hence the $\hbar^2$
dependence). The first term of eqn [15] is the kinetic
energy of rotation, also called the “vertical” kinetic
energy, the next two terms are the remainder of the
kinetic energy, somewhat imprecisely thought of as
the kinetic energy of vibrations or changes in shape,
also called the “horizontal kinetic energy,” and the
final term is the (true) potential, discussed above.

Since the Hamiltonian commutes with the angular
momentum, $[H, \mathbf{L}] = 0$, the $\phi$ can be chosen to be
simultaneous eigenfunctions of $L^2$ and $L_z$ (the latter
being the space component), as well as of energy.
Let $\phi_{lm}$ be these eigenfunctions, where l and m are
the quantum numbers of $L^2$ and $L_z$, respectively.
Then by the transformation properties of $\phi$ under
rotations, we can write


$$\phi_{lm}(\theta^i, q^\mu) = \sum_{k=-l}^{+l}\chi_k(q^\mu)\,D^{l*}_{mk}(\theta^i) \qquad [17]$$

where D is a standard rotation matrix and the $\chi_k$ are
functions only of $q^\mu$. In these equations we use the
phase and other standard conventions of the theory of
rotations. The wave function $\chi$ is a function only of $q^\mu$
and can loosely be thought of as the wave function on
shape space. It is not a scalar like $\Psi$, $\psi$, or $\phi$, but rather
has $2l+1$ components indexed by k.

The Schrödinger equation for $\chi$ can be written
as $H\chi = E\chi$, where H has the same form as in
eqn [15], except that now the components of the
angular momentum $L_i$ are interpreted, no longer
as differential operators in $\theta^i$, but as $(2l+1)\times(2l+1)$
matrices that act on the “spinor” $\chi$. These
matrices are the transposes of the usual angular
momentum matrices in angular momentum theory,
that is, $(L_i)_{kk'} = \langle k'|L_i|k\rangle$.
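
The transposition is what produces the anomalous sign in the body-frame commutation relations, since $[A^T, B^T] = -[A, B]^T$. The short sketch below illustrates this for l = 1, in units with $\hbar = 1$. The construction of the matrices from ladder operators is the textbook one; the function names and the basis ordering (m = l, l-1, ..., -l) are choices made for this example.

```python
import numpy as np


def angular_momentum_matrices(l):
    """Standard (2l+1)x(2l+1) matrices of L_x, L_y, L_z in the basis
    m = l, l-1, ..., -l, with hbar = 1."""
    m = np.arange(l, -l - 1, -1, dtype=float)
    d = m.size
    raising = np.zeros((d, d), dtype=complex)
    for a in range(1, d):
        # <m+1| L_+ |m> = sqrt(l(l+1) - m(m+1))
        raising[a - 1, a] = np.sqrt(l * (l + 1) - m[a] * (m[a] + 1))
    lowering = raising.conj().T
    Lx = (raising + lowering) / 2
    Ly = (raising - lowering) / (2j)
    Lz = np.diag(m).astype(complex)
    return Lx, Ly, Lz


def commutator(a, b):
    return a @ b - b @ a


Lx, Ly, Lz = angular_momentum_matrices(1)

# Usual relations: [L_x, L_y] = +i L_z.
usual_sign_holds = np.allclose(commutator(Lx, Ly), 1j * Lz)

# Transposed matrices, as they act on the spinor chi: [L_x, L_y] = -i L_z.
transposed_sign_flips = np.allclose(commutator(Lx.T, Ly.T), -1j * Lz.T)

print(usual_sign_holds, transposed_sign_flips)
```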

This is the final form of the Schrödinger equation 
after all reductions by all continuous symmetries 
have been carried out. The fully reduced system has 
3n — 5 degrees of freedom (3n — 6 for the shape 
coordinates, and one for the “spinor” index k). 


Reduction by Rotations: Geometrical 
Description 


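The proper rotation group SO(3) acts on the OCS
by $\mathbf{R}_\alpha \mapsto \mathsf{R}\mathbf{R}_\alpha$, and on the TRCS by $\mathbf{r}_\alpha \mapsto \mathsf{R}\mathbf{r}_\alpha$, where
$\mathsf{R} \in SO(3)$. Rotations acting on the OCS do not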
commute with translations, but the action preserves 
the translation fibers, and thus can be projected onto 
the TRCS. 

The action of SO(3) on the TRCS is effective but
not free, that is, most orbits are diffeomorphic to
SO(3), but a subset of measure zero (the “singular”
orbits) are diffeomorphic to $S^2$ or a single point.
Configurations of the n-particle system in which the
particles do not lie on a line (“noncollinear shapes”)
have SO(3) orbits, those in which the particles do lie
on a line but are not coincident have $S^2$ orbits, and
the n-body collision (a single shape) has an orbit that
is a single point. Thus, the action of SO(3) on the
TRCS foliates the TRCS into a $(3n-6)$-parameter
family of copies of SO(3), plus the singular orbits. If
we exclude the singular orbits, then the TRCS has the
structure of an SO(3) principal fiber bundle. In
general, the bundle is not trivial. Shape space may
be defined as the quotient space under the SO(3)
action. Omitting the singular shapes, shape space is
the base space of the bundle. The coordinates $q^\mu$
introduced above are coordinates on shape space.
The singular shapes and orbits are physically acces- 
sible, and there are important questions regarding the 
behavior of the system in their neighborhood. 

The definition of a body frame is equivalent to the 
choice of a section of the fiber bundle, generally 
only locally defined over some region of shape 
space. A configuration (a point in the TRCS) on the 
section defines an orientation of the n-particle 
system for the given shape, which serves as a 
reference orientation to which others can be 
referred. We think of the reference orientation as 
one in which the space and body frames coincide; in 
other orientations of the same shape, the body frame 
has been rotated with the body to a new orientation. 
The choice of the section (body frame) allows us to
impose coordinates on each (nonsingular) rotation
fiber, that is, we label points on the fiber by the 
rotation that takes us from the section to the actual 
configuration in question. This is why a choice of 
body frame is necessary before defining orienta- 
tional coordinates. Sections are only defined locally. 
Popular choices of body frame, such as the principal 
axis frame, imply multivalued sections, unless 
branch cuts are introduced. Orientational coordi- 
nates are simply coordinates on the group manifold 
SO(3), transferred to the nonsingular rotation fibers, 
with the group identity element mapped onto the 
point where the fiber intersects the section. 

The metric tensor determines much of the geome- 
try of the reduction by rotations. Since the metric on 
the TRCS is SO(3)-invariant, horizontal subspaces in 
the SO(3) fiber bundle (the TRCS minus the singular 
orbits) can be defined as the spaces orthogonal to the 
fibers (hence orthogonal to the vertical subspaces). 
This is a standard construction in Kaluza-Klein
theories, which reappears here. Thus, the bundle has 
a connection, induced by the metric. 

The moment-of-inertia tensor is the metric tensor 
restricted to a fiber, evaluated in a basis of left- 
(body frame) or right-invariant (space frame) vector 
fields on SO(3), which are transported to the fibers 
to create a basis of vertical vector fields. 

The coordinate description of the connection is
the gauge potential $\mathbf{A}_\mu$, in which the $\mu$ index refers
to shape coordinates $q^\mu$, and the components of the
3-vector $\mathbf{A}$ refer to the standard set of left- or right-invariant
vector fields on SO(3). The coordinate
representative of the curvature 2-form is conveniently
denoted by $\mathbf{B}_{\mu\nu}$, defined by

$$\mathbf{B}_{\mu\nu} = \frac{\partial\mathbf{A}_\nu}{\partial q^\mu} - \frac{\partial\mathbf{A}_\mu}{\partial q^\nu} - \mathbf{A}_\mu\times\mathbf{A}_\nu \qquad [18]$$

where it is understood that body frame components
are used. Direct calculation shows that it is nonzero,
hence the fiber bundle is not flat, for any value of
$n \geq 3$. The curvature form $\mathbf{B}_{\mu\nu}$ appears in the
classical equation of motion and in the quantum
commutation relations.

The field $\mathbf{B}_{\mu\nu}$ satisfies differential equations on
shape space that have the form of Yang-Mills field 
equations. It is interesting that the sources of this 
field are singularities of the monopole type, located 
on the singular shapes. In the case n= 3, the source 
is a single monopole located at the three-body 
collision, which is similar to a Dirac monopole in 
electromagnetic theory. 

The (3n — 6)-dimensional horizontal subspaces of 
the TRCS are annihilated by three differential forms, 
whose values on a velocity vector of the system are 


the components of the classical angular momentum L 
(body or space components, depending on the basis 
of forms). Thus, horizontal motions are those for 
which L=0, and horizontal lifts of curves in shape 
space are motions of the system with vanishing 
angular momentum. Since angular momentum is 
conserved, such motions are generated by the 
classical equations of motion and are physically 
allowed. For loops in shape space, the holonomy 
generated by the horizontal lift is physically the 
rotation that a flexible body experiences when it is 
carried under conditions of vanishing angular 
momentum from an initial shape, through intermedi- 
ate shapes and back to the initial shape. An example 
is the rotation generated by the “falling cat.” 

Since the metric on the TRCS is SO(3)-invariant, 
it may be projected onto shape space, which there- 
fore is a Riemannian manifold in its own right. The 
projected metric is $ds^2 = g_{\mu\nu}\,dq^\mu\,dq^\nu$. This metric is
not flat (the Riemann curvature tensor is nonzero
for all values $n \geq 3$). Geodesics in shape space have
horizontal lifts that are free particle motions (V = 0) 
of zero angular momentum. Conversely, such 
motions project onto geodesics on shape space. 

A popular choice of body frame in molecular 
physics is the Eckart frame, which has advantages 
for the description of small vibrations and other 
purposes. The section defining the Eckart frame is a 
flat vector subspace of the TRCS of dimension $3n-6$
that is orthogonal (horizontal) to a particular fiber 
(over an equilibrium shape) at a particular 
orientation. 

The geometrical meaning of eqn [17] is that
rotations act on a set of wave functions $\phi$ that span
an irrep of SO(3) by multiplication by the representative
element of the group. In standard physics
notation, l indexes the irrep, and m indexes the basis
vectors spanning the irrep. Thus, the values of these
wave functions at any point on the fiber are known
once their values are given at a reference point. A
convenient choice for the reference point is the point
on the section, and the wave functions $\chi_k$ are simply
the values of the $\phi_{lm}$ on this reference point (with a
change of notation, $m \to k$). Thus, the wave functions
$\chi_k$ are properly not “wave functions on shape
space,” but rather wave functions on the section.

Shape space in the case n = 3 is homeomorphic to
the region $x_3 \geq 0$ of $\mathbb{R}^3$, and in the case n = 4 to $\mathbb{R}^6$.
A convenient tool for understanding the structure
of shape space is its foliation under the action of
the kinematic rotations, eqn [6]. The kinematic
rotations commute with ordinary rotations, and
hence have an action on shape space. This action
preserves the eigenvalues of the moment-of-inertia
tensor.


Concluding Remarks 


The quantum n-body problem provides an interesting
example in which nonabelian gauge theories find
application in nonrelativistic quantum mechanics. The
fields $\mathsf{E}$, $\mathbf{A}_\mu$, and $g_{\mu\nu}$, and fields derived from them such
as the curvature tensor $\mathbf{B}_{\mu\nu}$ and the Riemann curvature
tensor derived from $g_{\mu\nu}$, satisfy a complex set of
differential equations on shape space that can be
derived by considering the vanishing of the Riemann
tensor on the TRCS. The resulting field equations are
useful in perturbation theory, for example, in the study
of small vibrations of a molecule. This means of
constructing field equations on the base space of a
bundle is standard in Kaluza-Klein theories, which are
an important line of thinking in modern attempts to
understand gauge field theories in particle physics.

The rotations generated by flexible bodies of vanishing
angular momentum (the “falling cat”) are an
example of a “geometric phase,” that is, a nonabelian
generalization of “Berry’s phase.” It is interesting how
the associated gauge potential $\mathbf{A}_\mu$ in this problem plays
a role in the dynamics of the n-particle system.

The Hamiltonian [15] is the starting point for 
numerous practical calculations, for example, the 
numerical evaluation of energy levels, cross-sections 
and reaction rates in molecular physics. One can 
compute, for example, chemical reaction rates for 
molecular processes in atmospheric or astrophysical 
contexts, where experiments would be difficult or 
expensive. The numerical analysis of the Hamiltonian 
[15] usually requires the introduction of a basis set and 
the processing of large matrices. Current techniques 
for basis set selection are not very satisfactory, and this 
is an area where research into wavelets and numerical 
analysis could have an impact.

See also: Bosons and Fermions in External Fields;
Gravitational N-Body Problem (Classical); Integrable
Systems: Overview.


Introduction


The study of second-order phase transitions at
nonzero temperatures has a long and distinguished
history in statistical mechanics. Many key physical
phenomena, such as the loss of ferromagnetism
in iron at the Curie temperature or the critical
endpoint of CO2, are now understood in precise
quantitative detail. This understanding began in the
work of Onsager, and is based upon what may now
be called the Landau-Ginzburg-Wilson theory.
The content of this sophisticated theory may be summarized
in a few basic principles: (1) The collective
thermal fluctuations near second-order transitions
can be accurately described by simple classical
models, that is, quantum-mechanical effects can be
entirely neglected. (2) The classical models identify
an “order parameter,” a collective variable which
has to be treated on par with other thermodynamic
variables, and whose correlations exhibit distinct
behavior in the phases on either side of the
transition. (3) The thermal fluctuations of the
order parameter near the transition are controlled
by a continuum field theory whose structure is
usually completely dictated by simple symmetry
considerations.

This article will not consider such nonzero
temperature phase transitions, but will instead 
describe second-order phase transitions at the 
absolute zero of temperature. Such transitions are 
driven by quantum fluctuations mandated by the 
Heisenberg uncertainty principle: one can imagine 
moving across the quantum critical point by 
effectively “tuning the value of Planck’s constant,
$\hbar$.” Clearly, quantum mechanics plays a central role
at such transitions, unlike the situation at nonzero 
temperatures. The reader may object that absolute 
zero is an idealization not realized by any experi- 
mental system; hence, the study of quantum phase 
transitions is a subject only of academic interest. As 
we will illustrate below, knowledge of the zero- 
temperature quantum critical points of a system is 
often the key to understanding its finite-temperature 
properties, and in some cases the influence of a zero- 
temperature critical point can be detected at 
temperatures as high as ambient room temperature. 

We will begin in the following section by 
introducing some simple lattice models which 
exhibit quantum phase transitions. The theory
of the critical point in these models, which is based upon
a natural extension of the Landau-Ginzburg-Wilson
(LGW) method, is presented next. That
section will also describe the consequences of a
zero-temperature critical point on the nonzero temperature
properties. Finally, we will consider more
complex models in which quantum interference 
effects play a more subtle role, and which cannot 
be described in the LGW framework: such quantum 
critical points are likely to play a central role in 
understanding many of the correlated electron 
systems of current interest. 


Simple Models 
Quantum Ising Chain 


This is a simple model of N qubits, labeled by the
index $j = 1,\dots,N$. On each “site” j there are two
qubit quantum states $|\uparrow\rangle_j$ and $|\downarrow\rangle_j$ (in practice, these
could be two magnetic states of an ion at site j in a
crystal). The Hilbert space therefore consists of $2^N$
states, each consisting of a tensor product of the
states on each site. We introduce the Pauli spin
operators, $\hat\sigma^\alpha_j$, on each site j, with $\alpha = x, y, z$:

$$\hat\sigma^x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \hat\sigma^y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \hat\sigma^z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \qquad [1]$$

These operators clearly act on the two states of the
qubit on site j, and the Pauli operators on different
sites commute.

The quantum Ising chain is defined by the simple
Hamiltonian

$$H_I = -J\sum_{j=1}^{N-1}\hat\sigma^z_j\,\hat\sigma^z_{j+1} - gJ\sum_{j=1}^{N}\hat\sigma^x_j \qquad [2]$$

where J > 0 sets the energy scale, and g > 0 is a
dimensionless coupling constant. In the thermodynamic
limit ($N \to \infty$), the ground state of $H_I$ exhibits
a second-order quantum phase transition as g is
tuned across a critical value $g = g_c$ (for the specific
case of $H_I$ it is known that $g_c = 1$), as we will now
illustrate.

First, consider the ground state of $H_I$ for g < 1.
At g = 0, there are two degenerate “ferromagnetically
ordered” ground states

$$|\Uparrow\rangle = \prod_{j=1}^{N}|\uparrow\rangle_j, \qquad |\Downarrow\rangle = \prod_{j=1}^{N}|\downarrow\rangle_j \qquad [3]$$

Each of these states breaks a discrete “Ising”
symmetry of the Hamiltonian: rotations of all
spins by 180° about the x-axis. These states are
more succinctly characterized by defining the
ferromagnetic moment, $N_0$, by

$$N_0 = \langle\Uparrow|\hat\sigma^z_j|\Uparrow\rangle = -\langle\Downarrow|\hat\sigma^z_j|\Downarrow\rangle \qquad [4]$$

At g = 0 we clearly have $N_0 = 1$. A key point is
that in the thermodynamic limit, this simple picture
of the ground state survives for a finite range of
small g (indeed, for all $g < g_c$), but with $0 < N_0 < 1$.
The quantum tunneling between the two ferromagnetic
ground states is exponentially small in N (and
so can be neglected in the thermodynamic limit),
and so the ground state remains 2-fold degenerate
and the discrete Ising symmetry remains broken.
The change in the wave functions of these states
from eqn [3] can be easily determined by perturbation
theory in g: these small g quantum fluctuations
reduce the value of $N_0$ from unity but do not cause
the ferromagnetism to disappear.

Now consider the ground state of $H_I$ for g > 1.
At $g = \infty$ there is a single nondegenerate ground
state which fully preserves all symmetries of $H_I$:

$$|\Rightarrow\rangle = 2^{-N/2}\prod_{j=1}^{N}\left(|\uparrow\rangle_j + |\downarrow\rangle_j\right) \qquad [5]$$

It is easy to verify that this state has no ferromagnetic
moment, $N_0 = \langle\Rightarrow|\hat\sigma^z_j|\Rightarrow\rangle = 0$. Further, perturbation
theory in 1/g shows that these features of the ground
state are preserved for a finite range of large g values
(indeed, for all $g > g_c$). One can visualize this ground
state as one in which strong quantum fluctuations
have destroyed the ferromagnetism, with the local
magnetic moments quantum tunneling between “up”
and “down” on a timescale of order $\hbar/J$.

Given the very distinct signatures of the small g
and large g ground states, it is clear that the ground
state cannot evolve smoothly as a function of g.
There must be at least one point of nonanalyticity as
a function of g: for $H_I$ it is known that there is only
a single nonanalytic point, and this is at the location
of a second-order quantum phase transition at
$g = g_c$.

The character of the excitations above the ground 
state also undergoes a qualitative change across the 
quantum critical point. In both the $g < g_c$ and $g > g_c$
phases, these excitations can be described in the 
Landau quasiparticle scheme, that is, as super- 
positions of nearly independent particle-like 
excitations; a single well-isolated quasiparticle has 
an infinite lifetime at low excitation energies. 
However, the physical nature of the quasiparticles 
is very different in the two phases. In the ferromagnetic
phase, with $g < g_c$, the quasiparticles are
domain walls between regions of opposite
magnetization:

$$|j, j+1\rangle = \prod_{k=1}^{j} |\uparrow\rangle_k \prod_{\ell=j+1}^{N} |\downarrow\rangle_\ell \qquad [6]$$


This is the exact wave function of a stationary
quasiparticle excitation between sites j and j + 1 at
$g = 0$; for small nonzero g the quasiparticle acquires
a “cloud” of further spin-flips and also becomes
mobile. However, its qualitative interpretation as a
domain wall between the two degenerate ground
states remains valid for all $g < g_c$. In contrast, for
$g > g_c$, there is no ferromagnetism, and the
nondegenerate paramagnetic state has a distinct
quasiparticle excitation:

$$|j\rangle = 2^{-N/2} \left(|\uparrow\rangle_j - |\downarrow\rangle_j\right) \prod_{k \neq j} \left(|\uparrow\rangle_k + |\downarrow\rangle_k\right) \qquad [7]$$

This is a stationary “flipped spin” quasiparticle at
site j, with its wave function exact at $g = \infty$. Again,
this quasiparticle becomes mobile at finite g, and the
picture applies for all $g > g_c$, but there is no smooth
connection between eqns [7] and [6].
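
Both limiting statements can be verified directly on a short open chain. The sketch below assumes the Hamiltonian $H_I = -J \sum_j (g \hat\sigma^x_j + \hat\sigma^z_j \hat\sigma^z_{j+1})$ and treats its two terms separately: the bond term alone is the $g = 0$ Hamiltonian, and the field term alone (per unit of g) is what dominates as $g \to \infty$. All function names are hypothetical helpers introduced for illustration.

```python
import numpy as np
from functools import reduce

SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.array([[1.0, 0.0], [0.0, -1.0]])
UP = np.array([1.0, 0.0])
DOWN = np.array([0.0, 1.0])


def kron_all(factors):
    """Tensor product of a list of vectors or matrices."""
    return reduce(np.kron, factors)


def site_operator(op, j, n):
    return kron_all([op if k == j else np.eye(2) for k in range(n)])


def bond_term(n, J=1.0):
    """-J sum_j sz_j sz_{j+1} on an open chain (the g = 0 Hamiltonian)."""
    return -J * sum(site_operator(SZ, j, n) @ site_operator(SZ, j + 1, n)
                    for j in range(n - 1))


def field_term(n, J=1.0):
    """-J sum_j sx_j; multiplied by g this dominates as g -> infinity."""
    return -J * sum(site_operator(SX, j, n) for j in range(n))


def domain_wall(n, j):
    """Eqn [6]: sites 1..j up, sites j+1..n down (j is 1-based)."""
    return kron_all([UP if k < j else DOWN for k in range(n)])


def flipped_spin(n, j):
    """Eqn [7]: site j (1-based) in (up - down)/sqrt(2), others in (up + down)/sqrt(2)."""
    plus = (UP + DOWN) / np.sqrt(2)
    minus = (UP - DOWN) / np.sqrt(2)
    return kron_all([minus if k == j - 1 else plus for k in range(n)])


def excitation_energy(H, state):
    """Return (energy above the ground state, eigenvector residual)."""
    energy = state @ H @ state
    residual = np.linalg.norm(H @ state - energy * state)
    return energy - np.linalg.eigvalsh(H)[0], residual


print(excitation_energy(bond_term(5), domain_wall(5, 2)))
print(excitation_energy(field_term(5), flipped_spin(5, 3)))
```

In both cases the residual vanishes, so the states are exact eigenstates in their respective limits; under the assumed Hamiltonian the domain wall costs an energy 2J and the flipped spin costs 2Jg.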


Coupled Dimer Antiferromagnet 


This model also involves qubits, but they are now
placed on the sites, j, of a two-dimensional square
lattice. Models in this class describe the magnetic
excitations of many experimentally important spin
gap compounds.

Figure 1 The coupled dimer antiferromagnet. Qubits (i.e.,
S = 1/2 spins) are placed on the sites, the A links are shown as
full lines, and the B links as dashed lines.

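The Hamiltonian of the dimer antiferromagnet is
illustrated in Figure 1 and is given by

$$H_d = J \sum_{\langle jk\rangle \in A} \left(\hat\sigma^x_j \hat\sigma^x_k + \hat\sigma^y_j \hat\sigma^y_k + \hat\sigma^z_j \hat\sigma^z_k\right) + \frac{J}{g} \sum_{\langle jk\rangle \in B} \left(\hat\sigma^x_j \hat\sigma^x_k + \hat\sigma^y_j \hat\sigma^y_k + \hat\sigma^z_j \hat\sigma^z_k\right) \qquad [8]$$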


where $J > 0$ is the exchange constant, $g \geq 1$ is the
dimensionless coupling, and the sets of nearest-
neighbor links A and B are defined in Figure 1. An
important property of $H_d$ is that it is now invariant
under the full O(3) group of spin rotations under
which the $\hat\sigma^\alpha$ transform as ordinary vectors (in
contrast to the $Z_2$ symmetry group of $H_I$). In
analogy with $H_I$, we will find that $H_d$ undergoes a
quantum phase transition from a paramagnetic
phase which preserves all symmetries of the
Hamiltonian at large g, to an antiferromagnetic
phase which breaks the O(3) symmetry at small g.
This transition occurs at a critical value $g = g_c$,
and the best current numerical estimate is
$1/g_c = 0.52337(3)$.

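As in the previous section, we can establish the
existence of such a quantum phase transition by
contrasting the disparate physical properties at large
g with those at $g \approx 1$. At $g = \infty$ the exact ground
state of $H_d$ is

$$|\text{spin gap}\rangle = \prod_{\langle jk\rangle \in A} \frac{1}{\sqrt{2}} \left(|\uparrow\rangle_j |\downarrow\rangle_k - |\downarrow\rangle_j |\uparrow\rangle_k\right) \qquad [9]$$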


and is illustrated in Figure 2. This state is non- 
degenerate and invariant under spin rotations, and 
so is a paramagnet: the qubits are paired into spin 
singlet valence bonds across all the A links. 

The excitations above the ground state are
created by breaking a valence bond, so that the
pair of spins form a spin triplet with total spin
S = 1; this is illustrated in Figure 3. It costs a large
energy to create this excitation, and at finite g the
triplet can hop from link to link, creating a gapped
“triplon” quasiparticle excitation. This is similar to
the large g paramagnet for $H_I$, with the important
difference that each quasiparticle is now 3-fold
degenerate.

Figure 2 The paramagnetic state of $H_d$ for $g > g_c$. The state
illustrated is the exact ground state for $g = \infty$, and it is
adiabatically connected to the ground state for all $g > g_c$.

Figure 3 The triplon excitation of the $g > g_c$ paramagnet. The
stationary triplon is an eigenstate only for $g = \infty$ but it becomes
mobile for finite g.
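
The singlet and triplet levels of a single isolated A link can be checked with a four-dimensional diagonalization. This is a minimal sketch, assuming the link Hamiltonian is $J(\hat\sigma^x_j \hat\sigma^x_k + \hat\sigma^y_j \hat\sigma^y_k + \hat\sigma^z_j \hat\sigma^z_k)$ as in the exchange term of $H_d$; the function name is illustrative.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)


def dimer_hamiltonian(J=1.0):
    """Exchange Hamiltonian J (sx sx + sy sy + sz sz) for one pair of qubits."""
    return J * sum(np.kron(s, s) for s in (SX, SY, SZ))


# Basis order: |up up>, |up down>, |down up>, |down down>.
energies, states = np.linalg.eigh(dimer_hamiltonian())
singlet = states[:, 0]

print(energies.round(10))       # one level at -3J, three levels at +J
print(np.abs(singlet).round(4)) # weight only on |up down> and |down up>
```

The lowest level is the nondegenerate valence-bond singlet at energy -3J, and the 3-fold degenerate triplet sits at +J, so breaking a bond on an isolated link costs 4J in this normalization.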

At g = 1, the ground state of $H_d$ is not known
exactly. However, at this point $H_d$ becomes equivalent
to the nearest-neighbor square lattice antiferromagnet,
and this is known to have antiferromagnetic
order in the ground state, as illustrated in Figure 4.
This state is similar to the ferromagnetic ground
state of $H_I$, with the difference that the magnetic
moment now acquires a staggered pattern on the
two sublattices, rather than the uniform moment of
the ferromagnet. Thus, in this ground state

$$\langle \mathrm{AF}|\hat\sigma^\alpha_j|\mathrm{AF}\rangle = N_0 \eta_j n_\alpha \qquad [10]$$


where $0 < N_0 < 1$ is the antiferromagnetic moment,
$\eta_j = \pm 1$ identifies the two sublattices in Figure 4, and
$n_\alpha$ is an arbitrary unit vector specifying the
orientation of the spontaneous magnetic moment
which breaks the O(3) spin rotation invariance of
$H_d$. The excitations above this antiferromagnet are
also distinct from those of the paramagnet: they are
a doublet of spin waves consisting of a spatial
variation in the local orientation, $n_\alpha$, of the
antiferromagnetic order. The energy of this excitation
vanishes in the limit of long wavelengths, in
contrast to the finite energy gap of the triplon
excitation of the paramagnet.

Figure 4 Schematic of the ground state with antiferromagnetic
order with $g < g_c$.

As with $H_I$, we can conclude from the distinct
characters of the ground states and excitations for
$g \gg 1$ and $g \approx 1$ that there must be a quantum
critical point at some intermediate $g = g_c$.


Quantum Criticality 


The simple considerations of the previous section 
have given a rather complete description (based on 
the quasiparticle picture) of the physics for $g < g_c$
and $g > g_c$. We turn, finally, to the region $g \approx g_c$.
For the specific models discussed in the previous 
section, a useful description is obtained by a method 
that is a generalization of the LGW method 
developed earlier for thermal phase transitions. 
However, some aspects of the critical behavior 
(e.g., the general forms of eqns [13]-[15]) will 
apply also to the quantum critical point of the 
section “Beyond LGW theory.” 

Following the canonical LGW strategy, we need 
to identify a collective order parameter which 
distinguishes the two phases. This is clearly given 
by the ferromagnetic moment in eqn [4] for the 
quantum Ising chain, and the antiferromagnetic 
moment in eqn [10] for the coupled dimer antiferromagnet.
We coarse-grain these moments over some
finite averaging region, and at long wavelengths this
yields a real order parameter field $\phi_\alpha$, with the index
$\alpha = 1, \ldots, n$. For the Ising case we have n = 1 and $\phi_\alpha$
is a measure of the local average of $N_0$ as defined in
eqn [4]. For the antiferromagnet, $\alpha$ extends over the
three values x, y, z (so n = 3), and the three components
of $\phi_\alpha$ specify the magnitude and orientation of the
local antiferromagnetic order in eqn [10]; note the
average orientation of a specific spin at site j is $\eta_j$
times the local value of $\phi_\alpha$.

The second step in the LGW approach is to write 
down a general field theory for the order parameter, 
consistent with all symmetries of the underlying 
model. As we are dealing with a quantum transition, 
the field theory has to extend over spacetime, with 
the temporal fluctuations representing the sum over 
histories in the Feynman path-integral approach. 
With this reasoning, the proposed partition function 


for the vicinity of the critical point takes the 
following form: 


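$$Z_\phi = \int \mathcal{D}\phi_\alpha(x, \tau) \exp\left( -\int d^d x \, d\tau \left\{ \frac{1}{2} \left[ (\partial_\tau \phi_\alpha)^2 + c^2 (\nabla_x \phi_\alpha)^2 + s \phi_\alpha^2 \right] + \frac{u}{4!} \left(\phi_\alpha^2\right)^2 \right\} \right) \qquad [11]$$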


Here $\tau$ is imaginary time; there is an implied
summation over the n values of the index $\alpha$, c is a
velocity, and s and $u > 0$ are coupling constants.
This is a field theory in d + 1 spacetime dimensions,
in which the Ising chain corresponds to d = 1 and
the dimer antiferromagnet to d = 2. The quantum
phase transition is accessed by tuning the “mass” s:
there is a quantum critical point at $s = s_c$ and the
$s < s_c$ ($s > s_c$) regions correspond to the $g < g_c$ ($g > g_c$)
regions of the lattice models. The $s < s_c$ phase has
$\langle\phi_\alpha\rangle \neq 0$ and this corresponds to the spontaneous
breaking of spin rotation symmetry noted in eqns [4]
and [10] for the lattice models. The $s > s_c$ phase is
the paramagnet with $\langle\phi_\alpha\rangle = 0$. The excitations in this
phase can be understood as small harmonic oscillations
of $\phi_\alpha$ about the point (in field space) $\phi_\alpha = 0$. A
glance at eqn [11] shows that there are n such
oscillators for each wave vector. These oscillators
clearly constitute the $g > g_c$ quasiparticles found
earlier in eqn [7] for the Ising chain (with n = 1)
and the triplon quasiparticle (with n = 3) illustrated
in Figure 3 for the dimer antiferromagnet.
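
The oscillator counting can be made explicit with a short worked equation. This is only a sketch: it assumes that the quadratic part of the action in eqn [11] is $\frac{1}{2}[(\partial_\tau \phi_\alpha)^2 + c^2 (\nabla_x \phi_\alpha)^2 + s \phi_\alpha^2]$, drops the quartic term (a Gaussian approximation, which places the critical point at $s_c = 0$), and uses units with $\hbar = 1$. The symbol $\varepsilon_k$ is introduced here for illustration.

```latex
\mathcal{S}_\phi \approx \frac{1}{2} \int \frac{d^d k}{(2\pi)^d} \frac{d\omega}{2\pi}
  \left( \omega^2 + c^2 k^2 + s \right) \left| \phi_\alpha(k, \omega) \right|^2 ,
\qquad
\varepsilon_k = \sqrt{c^2 k^2 + s} \quad (s > 0) .
```

Within this approximation each of the n components $\alpha$ contributes one harmonic oscillator of frequency $\varepsilon_k$ at wave vector k, and the minimum frequency $\sqrt{s}$ at k = 0 plays the role of the quasiparticle energy gap, which closes as s approaches zero.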

We have now seen that there is a perfect
correspondence between the phases of the quantum
field theory $Z_\phi$ and those of the lattice models $H_I$
and $H_d$. The power of the representation in eqn [11]
is that it also allows us to get a simple description of
the quantum critical point. In particular, readers
may already have noticed that if we interpret the
temporal direction $\tau$ in eqn [11] as another spatial
direction, then $Z_\phi$ is simply the classical partition
function for a thermal phase transition in a ferromagnet
in d + 1 dimensions: this is the canonical
model for which the LGW theory was originally
developed. We can now take over standard results
for this classical critical point, and obtain some
useful predictions for the quantum critical point of
$Z_\phi$. It is useful to express these in terms of the
dynamic susceptibility defined by

$$\chi(k, \omega) = \frac{i}{\hbar} \int d^d x \int_0^\infty dt \, \left\langle \left[ \hat\phi(x, t), \hat\phi(0, 0) \right] \right\rangle_T e^{-ikx + i\omega t} \qquad [12]$$

Here $\hat\phi$ is the Heisenberg field operator corresponding
to the path integral in eqn [11], the square
brackets represent a commutator, and the angular
brackets an average over the partition function at a
temperature T. The structure of $\chi$ can be deduced
from the knowledge that the quantum correlators of
$Z_\phi$ are related by analytic continuation in time to
the corresponding correlators of the classical statistical
mechanics problem in d + 1 dimensions. The
latter are known to diverge at the critical point as
$\sim 1/p^{2-\eta}$, where p is the (d + 1)-dimensional momentum
and $\eta$ is defined to be the anomalous dimension of
the order parameter ($\eta = 1/4$ for the quantum Ising
chain). Knowing this, we can deduce the form of the
quantum correlator in eqn [12] at the zero-temperature
quantum critical point


$$\chi(k, \omega) \sim \frac{1}{\left(c^2 k^2 - \omega^2\right)^{1 - \eta/2}}; \qquad T = 0, \; g = g_c \qquad [13]$$

The most important property of eqn [13] is the
absence of a quasiparticle pole in the spectral
density. Instead, $\mathrm{Im}\,\chi(k, \omega)$ is nonzero for all $\omega > ck$,
reflecting the presence of a continuum of critical
excitations. Thus the stable quasiparticles found at
low enough energies for all $g \neq g_c$ are absent at the
quantum critical point.

Figure 5 Nonzero temperature phase diagram of $H_I$. The
ferromagnetic order is present only at T = 0 on the shaded line
with $g < g_c$. The dashed lines at finite T are crossovers out of
the low-T quasiparticle regimes where a quasiclassical description
applies. The state sketched on the paramagnetic side uses
the notation $|\rightarrow\rangle_j = 2^{-1/2}(|\uparrow\rangle_j + |\downarrow\rangle_j)$ and $|\leftarrow\rangle_j = 2^{-1/2}(|\uparrow\rangle_j - |\downarrow\rangle_j)$.

We now briefly discuss the nature of the phase
diagram for T > 0 with g near $g_c$. In general, the
interplay between quantum and thermal fluctuations
near a quantum critical point can be quite complicated,
and we cannot discuss it in any detail here.
However, the physics of the quantum Ising chain is
relatively simple, and also captures many key
features found in more complex situations, and is
summarized in Figure 5. For all $g \neq g_c$ there is a
range of low temperatures ($T \lesssim |g - g_c|$) where the
long time dynamics can be described using a dilute
gas of thermally excited quasiparticles. Further, the
dynamics of these quasiparticles is quasiclassical,
although we reiterate that the nature of the
quasiparticles is entirely distinct on opposite sides
of the quantum critical point. Most interesting,
however, is the novel quantum critical region,
$T \gtrsim |g - g_c|$, where neither a quasiparticle picture nor
a quasiclassical description is appropriate. Instead,
we have to understand the influence of temperature
on the critical continuum associated with eqn [13].
This is aided by scaling arguments which show that
the only important frequency scale which characterizes
the spectrum is $k_B T/\hbar$, and the crossovers
near this scale are universal, that is, independent of
specific microscopic details of the lattice Hamiltonian.
Consequently, the zero-momentum dynamic
susceptibility in the quantum critical region takes
the following form at small frequencies:


$$\chi(k = 0, \omega) \sim \frac{1}{T^{2 - \eta}} \, \frac{1}{1 - i\omega/\Gamma_R} \qquad [14]$$

This has the structure of the response of an
overdamped oscillator, and the damping frequency,
$\Gamma_R$, is given by the universal expression

$$\Gamma_R = \left(2 \tan\frac{\pi}{16}\right) \frac{k_B T}{\hbar} \qquad [15]$$

The numerical proportionality constant in eqn [15]
is specific to the quantum Ising chain; other models
also obey eqn [15] but with a different numerical
value for this constant.
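
To get a feel for the scale set by eqn [15], the following sketch evaluates the damping rate numerically, taking the expression to read $\Gamma_R = 2 \tan(\pi/16)\, k_B T/\hbar$ and the frequency-dependent factor of eqn [14] to be $1/(1 - i\omega/\Gamma_R)$. The function names and the choice of T = 1 K are illustrative assumptions; the constants are the SI values of $k_B$ and $\hbar$.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J / K


def damping_rate(temperature_kelvin):
    """Gamma_R = 2 tan(pi/16) k_B T / hbar, in rad/s (quantum Ising chain)."""
    prefactor = 2.0 * math.tan(math.pi / 16.0)
    return prefactor * K_B * temperature_kelvin / HBAR


def overdamped_response(omega, gamma):
    """Frequency-dependent factor 1 / (1 - i omega / Gamma_R)."""
    return 1.0 / (1.0 - 1j * omega / gamma)


gamma = damping_rate(1.0)
print(2.0 * math.tan(math.pi / 16.0))         # numerical prefactor, about 0.398
print(gamma)                                   # of order 5e10 rad/s at 1 K
print(overdamped_response(gamma, gamma).imag)  # absorptive part at omega = Gamma_R
```

The imaginary part of the overdamped response, $(\omega/\Gamma_R)/(1 + \omega^2/\Gamma_R^2)$, has a single broad maximum at $\omega = \Gamma_R$ rather than a sharp quasiparticle peak.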


Beyond LGW Theory 


The quantum transitions discussed so far have
turned out to have a critical theory identical to that
found for classical thermal transitions in d+1 
dimensions. Over the last decade it has become 
clear that there are numerous models, of key 
physical importance, for which such a simple 
classical correspondence does not exist. In these 
models, quantum Berry phases are crucial in estab- 
lishing the nature of the phases, and of the critical 
boundaries between them. In less technical terms, a 
signature of this subtlety is an important simplifying 
feature which was crucial in the analyses of the 
section “Simple models”: both models had a
straightforward $g \to \infty$ limit in which we were able
to write down a simple, nondegenerate, ground-state 
wave function of the “disordered” paramagnet. In 
many other models, identification of the disordered 
phase is not as straightforward: specifying absence 
of a particular magnetic order is not enough to 
identify a quantum state, as we still need to write 
down a suitable wave function. Often, subtle 
quantum interference effects induce new types of 


order in the disordered state, and such effects are 
entirely absent in the LGW theory. 

An important example of a system displaying such 
phenomena is the S=1/2 square lattice antiferro- 
magnet with additional frustrating interactions. The 
quantum degrees of freedom are identical to those of 
the coupled dimer antiferromagnet, but the Hamil- 
tonian preserves the full point-group symmetry of 
the square lattice: 


$$H_s = \sum_{j<k} J_{jk} \left(\hat\sigma^x_j \hat\sigma^x_k + \hat\sigma^y_j \hat\sigma^y_k + \hat\sigma^z_j \hat\sigma^z_k\right) + \cdots \qquad [16]$$

Here the $J_{jk} > 0$ are short-range exchange interactions
which preserve the square lattice symmetry,
and the ellipses represent possible further multiple
spin terms. Now imagine tuning all the non-nearest-neighbor
terms as a function of some generic
coupling constant g. For small g, when $H_s$ is nearly
the square lattice antiferromagnet, the ground state
has antiferromagnetic order as in Figure 4 and 
eqn [10]. What is now the disordered ground state 
for large g? One natural candidate is the spin-singlet 
paramagnet in Figure 2. However, because all 
nearest neighbor bonds of the square lattice are 
now equivalent, the state in Figure 2 is degenerate 
with three other states obtained by successive 90° 
rotations about a lattice site. In other words, the 
state in Figure 2, when transferred to the square 
lattice, breaks the symmetry of lattice rotations by 
90°. Consequently it has a new type of order, often 
called valence-bond-solid (VBS) order. It is now 
believed that a large class of models like $H_s$ do
indeed exhibit a second-order quantum phase
transition between the antiferromagnetic state and
a VBS state (see Figure 6). Both the existence of VBS
order in the paramagnet, and of a second-order
quantum transition, are features that are not
predicted by LGW theory: these can only be
understood by a careful study of quantum interference
effects associated with Berry phases of spin
fluctuations about the antiferromagnetic state.

Figure 6 Phase diagram of $H_s$. Two possible VBS states are
shown: one which is the analog of Figure 2, and the other in
which spins form singlets in a plaquette pattern. Both VBS states
have a 4-fold degeneracy due to breaking of square lattice
symmetry. So the novel critical point at $g = g_c$ (described by $Z_z$)
has the antiferromagnetic and VBS orders vanishing as it is
approached from either side: this coincident vanishing of orders
is generically forbidden in LGW theories.

We will not enter into details of this analysis here, but will
conclude our discussion by writing down the theory so
obtained for the quantum critical point in Figure 6:


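$$Z_z = \int \mathcal{D}z_a(x, \tau) \, \mathcal{D}A_\mu(x, \tau) \exp\left( -\int d^2 x \, d\tau \left[ \left|(\partial_\mu - iA_\mu) z_a\right|^2 + s |z_a|^2 + \frac{u}{2} \left(|z_a|^2\right)^2 + \frac{1}{2e^2} \left(\epsilon_{\mu\nu\lambda} \partial_\nu A_\lambda\right)^2 \right] \right) \qquad [17]$$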


Here $\mu, \nu, \lambda$ are spacetime indices which extend over
the two spatial directions and $\tau$, a is a spinor index
which extends over $\uparrow, \downarrow$, and $z_a$ is a complex spinor
field. In comparing $Z_z$ to $Z_\phi$, note that the vector
order parameter $\phi_\alpha$ has been replaced by a spinor $z_a$,
and these are related by $\phi_\alpha = z_a^* \sigma^\alpha_{ab} z_b$, where $\sigma^\alpha$ are
the Pauli matrices. So the order parameter has
fractionalized into the $z_a$. A second novel property
of $Z_z$ is the presence of a U(1) gauge field $A_\mu$: this
gauge force emerges near the critical point, even
though the underlying model in eqn [16] only has
simple two spin interactions. Studies of fractionalized
critical theories like $Z_z$ in other models with
spin and/or charge excitations are an exciting avenue
for further theoretical research.


See also: Bose-Einstein Condensates; Boundary
Conformal Field Theory; Fractional Quantum Hall Effect;
Ginzburg-Landau Equation; High Tc Superconductor
Theory; Quantum Central-Limit Theorems; Quantum
Spin Systems; Quantum Statistical Mechanics: Overview.


Introduction


The theory of quantum spin systems is concerned with
the properties of quantum systems with an infinite
number of degrees of freedom that each have a
finite-dimensional state space. Occasionally, one is specifically
interested in finite systems. Among the most common
examples, one has an n-dimensional Hilbert space
associated with each site of a d-dimensional lattice.

A model is normally defined by describing
a Hamiltonian or a family of Hamiltonians, which
are self-adjoint operators on the Hilbert space, and
one studies their spectrum, the eigenstates, the
equilibrium states, the system dynamics,
nonequilibrium stationary states, etc.

More particularly, the term “quantum spin system”
often refers to such models where each degree
of freedom is thought of as a spin variable, that is,
there are three basic observables representing the
components of the spin, $S^1$, $S^2$, and $S^3$, and these
components transform according to a unitary
representation of SU(2). The most commonly encountered
situation is where the system consists of N spins, each
associated with a fixed irreducible representation of
SU(2). One speaks of a spin-J model if this
representation is the (2J + 1)-dimensional one. The possible
values of J are 1/2, 1, 3/2, ...

The spins are usually thought of as each being 
associated with a site in a lattice, or more generally, a 
vertex in a graph. In a condensed-matter-physics 
model, each spin may be associated with an ion in a 
crystalline lattice. Quantum spin systems are also used 
in quantum information theory and quantum compu- 
tation, and show up as abstract mathematical objects 
in representation theory and quantum probability. 

In this article we give a brief introduction to the 
subject, starting with a very short review of its history. 
The mathematical framework is sketched and the most 
important definitions are given. Three sections, “Sym- 
metries and symmetry breaking,” “Phase transitions,” 
and “Dynamics,” together cover the most important 
aspects of quantum spin systems actively pursued today. 


A Very Brief History 


The introduction of quantum spin systems was the 
result of the marriage of two developments during 
the 1920s. The first was the realization that angular 
momentum (hence, also the magnetic moment) is 
quantized (Pauli 1920, Stern and Gerlach 1922) and 
that particles such as the electron have an intrinsic
angular momentum called spin (Compton 1921,
Goudsmit and Uhlenbeck 1925). 

The second development was the attempt in 
statistical mechanics to explain ferromagnetism and 
the phase transition associated with it on the basis of a 
microscopic theory (Lenz and Ising 1925). The 
fundamental interaction between spins, the so-called 
exchange operator which is a subtle consequence 
of the Pauli exclusion principle, was introduced 
independently by Dirac and Heisenberg in 1926. 
With this discovery, it was realized that magnetism is 
a quantum effect and that a fundamental theory 
of magnetism requires the study of quantum-mechan- 
ical models. This realization and a large amount of 
subsequent work notwithstanding, some of the most 
fundamental questions, such as a derivation of 
ferromagnetism from first principles, remain open. 

The first and most important quantum spin model 
is the Heisenberg model, so named after Heisenberg. 
It has been studied intensely ever since the early 
1930s and its study has led to an impressive variety 
of new ideas in both mathematics and physics. Here, 
we limit ourselves to listing only some landmark 
developments. 

Spin waves were discovered independently by 
Bloch and Slater in 1930 and they continue to play 
an essential role in our understanding of the 
excitation spectrum of quantum spin Hamiltonians. 
In two papers published in 1956, Dyson advanced 
the theory of spin waves by showing how interac- 
tions between spin waves can be taken into account. 

In 1931, Bethe introduced the famous Bethe 
ansatz to show how the exact eigenvectors of the 
spin-1/2 Heisenberg model on the one-dimensional 
lattice can be found. This exact solution, directly 
and indirectly, led to many important developments 
in statistical mechanics, combinatorics, representa- 
tion theory, quantum field theory and more. 
Hulthén used the Bethe ansatz to compute the 
ground-state energy of the antiferromagnetic spin- 
1/2 Heisenberg chain in 1938. 

In their famous 1961 paper, Lieb, Schultz, and 
Mattis showed that some quantum spin models in 
one dimension can be solved exactly by mapping 
them into a problem of free fermions. This paper is 
still one of the most cited in the field. 

Robinson, in 1967, laid the foundation for the 
mathematical framework, which we describe in the 
next section. Using this framework, Araki estab- 
lished the absence of phase transitions at positive 
temperatures in a large class of one-dimensional 
quantum spin models in 1969. 

During the more recent decades, the mathematical 
and computational techniques used to study quantum 
spin models have fanned out in many directions. 


When it was realized in the 1980s that the magnetic
properties of complex materials play an important role
in high-$T_c$ superconductivity, the variety of quantum spin
models studied in the literature proliferated. This
motivated a large number of theoretical and experi- 
mental studies of materials with exotic properties that 
are often based on quantum effects that do not have a 
classical analog. An example of unexpected behavior is 
the prediction by Haldane of the spin liquid ground 
state of the spin-1 Heisenberg antiferromagnetic chain 
in 1983. In the quest for a mathematical proof of this 
prediction (a quest still ongoing today), Affleck, 
Kennedy, Lieb, and Tasaki introduced the AKLT 
model in 1987. They were able to prove that the 
ground state of this model has all the characteristic 
properties predicted by Haldane for the Heisenberg 
chain: a unique ground state with exponential decay of 
correlations and a spectral gap above the ground state. 

There are also particle models that are defined on 
a lattice, or more generally, a graph. Unlike spins, 
particles can hop from one site to another. These 
models are closely related to quantum spin systems 
and, in some cases, are mathematically equivalent. 
The best-known example of a model of lattice 
fermions is the Hubbard model. Such systems are 
not discussed further in this article. 


Mathematical Framework 


Quantum spin systems present an area of mathema- 
tical physics where the demands of mathematical 
rigor can be fully met and, in many cases, this can be 
done without sacrificing the ability to include all 
physically relevant models and phenomena. This 
does not mean, however, that there are few open 
problems remaining. But it does mean that, in 
general, these open problems are precisely formu- 
lated mathematical questions. 

In this section we review the standard mathema- 
tical framework for quantum spin systems, in which 
the topics discussed in the subsequent section can be 
given a precise mathematical formulation. It is 
possible, however, to skip this section and read the 
rest with only a physical or intuitive understanding 
of the notions of observable, Hamiltonian, 
dynamics, symmetry, ground state, etc. 

The most common mathematical setup is as follows.
Let $d \geq 1$, and let $\mathcal{L}$ denote the family of finite subsets
of the d-dimensional integer lattice $\mathbb{Z}^d$. For simplicity
we will assume that the Hilbert space of the “spin”
associated with each $x \in \mathbb{Z}^d$ has the same dimension
$n \geq 2$: $\mathcal{H}_x \cong \mathbb{C}^n$. The Hilbert space associated with
the finite volume $\Lambda \in \mathcal{L}$ is then $\mathcal{H}_\Lambda = \bigotimes_{x \in \Lambda} \mathcal{H}_x$. The
algebra of observables for the spin of site x consists of
the $n \times n$ complex matrices: $\mathcal{A}_x \cong M_n(\mathbb{C})$. For any
$\Lambda \in \mathcal{L}$, the algebra of observables for the system in $\Lambda$ is
given by $\mathcal{A}_\Lambda = \bigotimes_{x \in \Lambda} \mathcal{A}_x$. The primary observables for
a quantum spin model are the spin-S matrices
$S^1$, $S^2$, and $S^3$, where S is the half-integer such that
$n = 2S + 1$. They are defined as Hermitian matrices
satisfying the SU(2) commutation relations. Instead
of $S^1$ and $S^2$, one often works with the spin-raising
and -lowering operators, $S^+$ and $S^-$, defined by the
relations $S^1 = (S^+ + S^-)/2$ and $S^2 = (S^+ - S^-)/(2i)$. In
terms of these, the SU(2) commutation relations are

$$[S^+, S^-] = 2S^3, \qquad [S^3, S^\pm] = \pm S^\pm \qquad [1]$$

where we have used the standard notation for the
commutator for two elements A and B in an algebra:
$[A, B] = AB - BA$. In the standard basis $S^3$, $S^+$, and
$S^-$ are given by the following matrices:


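$$S^3 = \begin{pmatrix} S & & & \\ & S-1 & & \\ & & \ddots & \\ & & & -S \end{pmatrix}, \qquad S^+ = (S^-)^* = \begin{pmatrix} 0 & c_S & & \\ & 0 & c_{S-1} & \\ & & \ddots & c_{-S+1} \\ & & & 0 \end{pmatrix}$$

where, for $m = -S, -S+1, \ldots, S$,

$$c_m = \sqrt{S(S+1) - m(m-1)}$$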

In the case n = 2, one often works with the Pauli
matrices, $\sigma^1$, $\sigma^2$, $\sigma^3$, simply related to the spin
matrices by $\sigma^j = 2S^j$, j = 1, 2, 3.
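
The matrices above are easy to build and test numerically. The following is a minimal sketch, assuming the standard basis is ordered by decreasing $S^3$ eigenvalue (m = S, S-1, ..., -S) and that $S^+$ carries the coefficients $c_m$ on its first superdiagonal; the function names are illustrative.

```python
import numpy as np


def spin_matrices(S):
    """Return (S3, S_plus, S_minus) for spin S in the basis m = S, ..., -S."""
    n = int(round(2 * S + 1))
    m = S - np.arange(n)                              # S, S-1, ..., -S
    S3 = np.diag(m)
    # c_m = sqrt(S(S+1) - m(m-1)) for m = S, ..., -S+1
    c = np.sqrt(S * (S + 1) - m[:-1] * (m[:-1] - 1))
    S_plus = np.diag(c, k=1)
    S_minus = S_plus.T
    return S3, S_plus, S_minus


def commutator(A, B):
    return A @ B - B @ A


# Check the commutation relations [1] for a few spins.
for S in (0.5, 1.0, 1.5, 2.0):
    S3, Sp, Sm = spin_matrices(S)
    print(S,
          np.allclose(commutator(Sp, Sm), 2 * S3),
          np.allclose(commutator(S3, Sp), Sp),
          np.allclose(commutator(S3, Sm), -Sm))

# For S = 1/2 the Pauli matrices are recovered as sigma^j = 2 S^j.
S3, Sp, Sm = spin_matrices(0.5)
sigma1 = 2 * (Sp + Sm) / 2
sigma2 = 2 * (Sp - Sm) / 2j
print(sigma1, sigma2, 2 * S3, sep="\n")
```

Building $S^1$ and $S^2$ from $S^\pm$ in this way also gives a quick consistency check on the sign conventions used in the relations above.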

Most physical observables are expressed as finite
sums and products of the spin matrices
$S^j_x$, j = 1, 2, 3, associated with the site $x \in \Lambda$:

$$S^j_x = \bigotimes_{y \in \Lambda} A_y$$

with $A_x = S^j$ and $A_y = \mathbb{1}$ if $y \neq x$.

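The $\mathcal{A}_\Lambda$ are finite-dimensional C*-algebras for the
usual operations of sum, product, and Hermitian
conjugation of matrices and with identity $\mathbb{1}_\Lambda$.

If $\Lambda_0 \subset \Lambda_1$, there is a natural embedding of $\mathcal{A}_{\Lambda_0}$
into $\mathcal{A}_{\Lambda_1}$, given by

$$\mathcal{A}_{\Lambda_0} \cong \mathcal{A}_{\Lambda_0} \otimes \mathbb{1}_{\Lambda_1 \setminus \Lambda_0} \subset \mathcal{A}_{\Lambda_1}$$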

The algebra of local observables is then defined by

$$\mathcal{A}_{\mathrm{loc}} = \bigcup_{\Lambda \in \mathcal{L}} \mathcal{A}_\Lambda$$

Its completion is the C*-algebra of quasilocal
observables, which we will simply denote by $\mathcal{A}$.

The dynamics and symmetries of a quantum spin
model are described by (groups of) automorphisms
of the C*-algebra $\mathcal{A}$, that is, bijective linear
transformations $\alpha$ on $\mathcal{A}$ that preserve the product and
* operations. Translation invariance, for example, is
expressed by the translation automorphisms $\tau_x$,
$x \in \mathbb{Z}^d$, which map any subalgebra $\mathcal{A}_\Lambda$ to $\mathcal{A}_{\Lambda + x}$ in the
natural way. They form a representation of the
additive group $\mathbb{Z}^d$ on $\mathcal{A}$.

A translation-invariant interaction, or potential,
defining a quantum spin model, is a map $\phi: \mathcal{L} \to \mathcal{A}$
with the following properties: for all $X \in \mathcal{L}$,
we have $\phi(X) \in \mathcal{A}_X$, $\phi(X) = \phi(X)^*$, and for $x \in \mathbb{Z}^d$,
$\phi(X + x) = \tau_x(\phi(X))$. An interaction is called finite
range if there exists R > 0 such that $\phi(X) = 0$
whenever diam(X) > R. The Hamiltonian in $\Lambda$ is
the self-adjoint element of $\mathcal{A}_\Lambda$ defined by

$$H_\Lambda = \sum_{X \subset \Lambda} \phi(X)$$


For the standard Heisenberg model the interaction is
given by

$$\phi(\{x, y\}) = -J \, \mathbf{S}_x \cdot \mathbf{S}_y, \quad \text{if } |x - y| = 1 \qquad [2]$$

and $\phi(X) = 0$ in all other cases. Here, $\mathbf{S}_x \cdot \mathbf{S}_y$ is the
conventional notation for $S^1_x S^1_y + S^2_x S^2_y + S^3_x S^3_y$. The
magnitude of the coupling constant J sets a natural
unit of energy and is irrelevant from the mathematical
point of view. Its sign, however, determines
whether the model is ferromagnetic (J > 0), or
antiferromagnetic (J < 0). For the classical Heisenberg
model, where the role of $\mathbf{S}_x$ is played by a unit
vector in $\mathbb{R}^3$, and which can be regarded, after
rescaling by a factor $S^{-2}$, as the limit $S \to \infty$ of the
quantum Heisenberg model, there is a simple
transformation relating the ferro- and antiferromagnetic
models (just map $\mathbf{S}_x$ to $-\mathbf{S}_x$ for all x in the even
sublattice of $\mathbb{Z}^d$). In the quantum model, it is easy to
see that there does not exist an automorphism of $\mathcal{A}$
mapping $\mathbf{S}_x$ to $-\mathbf{S}_x$, since
that would be inconsistent with the commutation
relations [1]. Not only is there no exact mapping
between the ferro- and the antiferromagnetic models,
their ground states and equilibrium states have
radically different properties. See below for the
definitions and further discussion.
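
The contrast between the two signs of J is already visible on very small systems. The following is a minimal sketch, assuming spin 1/2, an open chain $\Lambda = \{0, \ldots, n-1\}$, and the Hamiltonian $H_\Lambda = -J \sum_x \mathbf{S}_x \cdot \mathbf{S}_{x+1}$ obtained by summing the interaction [2] over nearest-neighbor pairs; the function names and the degeneracy tolerance are illustrative.

```python
import numpy as np
from functools import reduce

# Spin-1/2 matrices S^1, S^2, S^3.
SPIN = (
    0.5 * np.array([[0, 1], [1, 0]], dtype=complex),
    0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex),
    0.5 * np.array([[1, 0], [0, -1]], dtype=complex),
)


def embed(op, x, n):
    """Single-site operator at site x, tensored with identities elsewhere."""
    return reduce(np.kron, [op if y == x else np.eye(2) for y in range(n)])


def heisenberg_chain(n, J):
    """H = -J sum_x S_x . S_{x+1} on an open chain of n spin-1/2 sites."""
    H = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for x in range(n - 1):
        for S in SPIN:
            H += -J * embed(S, x, n) @ embed(S, x + 1, n)
    return H


def ground_space(n, J, tol=1e-9):
    """Return (lowest eigenvalue, its degeneracy)."""
    energies = np.linalg.eigvalsh(heisenberg_chain(n, J))
    return energies[0], int(np.sum(energies - energies[0] < tol))


print(ground_space(4, +1.0))  # ferromagnet: highly degenerate ground space
print(ground_space(4, -1.0))  # antiferromagnet: unique ground state
```

For the ferromagnet the ground space on n sites is the (n + 1)-dimensional multiplet of maximal total spin, while the antiferromagnetic chain of even length has a unique singlet ground state.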

The dynamics (or time evolution) of the system in
finite volume $\Lambda$ is the one-parameter group of
automorphisms of $\mathcal{A}_\Lambda$ given by

$$\alpha_t^{(\Lambda)}(A) = e^{itH_\Lambda} A e^{-itH_\Lambda}, \qquad t \in \mathbb{R}$$

For each $t \in \mathbb{R}$, $\alpha_t^{(\Lambda)}$ is an automorphism of $\mathcal{A}_\Lambda$ and
the family $\{\alpha_t^{(\Lambda)} \mid t \in \mathbb{R}\}$ forms a representation of the
additive group $\mathbb{R}$.
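
The automorphism and group properties can be checked numerically in finite dimension. This is a minimal sketch in which a random Hermitian matrix stands in for $H_\Lambda$ and random matrices stand in for observables; these stand-ins, the random seed, and the function names are assumptions made for illustration.

```python
import numpy as np


def evolve(H, A, t):
    """alpha_t(A) = exp(itH) A exp(-itH), via the spectral decomposition of H."""
    energies, V = np.linalg.eigh(H)
    U = (V * np.exp(1j * t * energies)) @ V.conj().T   # exp(itH)
    return U @ A @ U.conj().T


rng = np.random.default_rng(0)


def random_matrix(n):
    return rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))


M = random_matrix(4)
H = (M + M.conj().T) / 2          # Hermitian stand-in for H_Lambda
A, B = random_matrix(4), random_matrix(4)
s, t = 0.3, 1.1

# Group property: alpha_t(alpha_s(A)) = alpha_{s+t}(A).
print(np.allclose(evolve(H, evolve(H, A, s), t), evolve(H, A, s + t)))
# Automorphism: products and adjoints are preserved.
print(np.allclose(evolve(H, A @ B, t), evolve(H, A, t) @ evolve(H, B, t)))
print(np.allclose(evolve(H, A.conj().T, t), evolve(H, A, t).conj().T))
```

All three checks follow from the unitarity of $e^{itH_\Lambda}$, which is why the finite-volume dynamics is automatically a group of automorphisms.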

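Each $\alpha_t^{(\Lambda)}$ can trivially be extended to an automorphism
on $\mathcal{A}$ by tensoring with the identity map.
Under quite general conditions, $\alpha_t^{(\Lambda)}$ converges
strongly as $\Lambda \to \mathbb{Z}^d$ in a suitable sense, that is, for
every $A \in \mathcal{A}$, the limit

$$\lim_{\Lambda \uparrow \mathbb{Z}^d} \alpha_t^{(\Lambda)}(A) = \alpha_t(A)$$

exists in the norm in $\mathcal{A}$, and it can be shown that it
defines a strongly continuous one-parameter group
of automorphisms of $\mathcal{A}$. Here $\Lambda \uparrow \mathbb{Z}^d$ stands for any
sequence of $\Lambda \in \mathcal{L}$ such that $\Lambda$ eventually contains
any given element of $\mathcal{L}$. A sufficient condition on the
potential $\phi$ is that there exists $\lambda > 0$ such that $\|\phi\|_\lambda$
is finite, with

$$\|\phi\|_\lambda = \sum_{X \ni 0} e^{\lambda |X|} \|\phi(X)\| \qquad [3]$$

Here $|X|$ denotes the number of elements in X. One
can show that, under the same conditions, $\delta$ defined
on $\mathcal{A}_{\mathrm{loc}}$ by

$$\delta(A) = \lim_{\Lambda \uparrow \mathbb{Z}^d} [H_\Lambda, A]$$

is a norm-closable (unbounded) derivation on $\mathcal{A}$ and
that its closure is, up to a factor i, the generator of
$\{\alpha_t \mid t \in \mathbb{R}\}$, that is, formally

$$\alpha_t = e^{it\delta}$$

For the class of $\phi$ with finite $\|\phi\|_\lambda$ for some $\lambda > 0$, $\mathcal{A}_{\mathrm{loc}}$
is a core of analytic vectors for $\delta$. This means that, for
each $A \in \mathcal{A}_{\mathrm{loc}}$, the function $t \mapsto \alpha_t(A)$ can be extended
to a function $\alpha_z(A)$ analytic in a strip $|\mathrm{Im}\, z| < a$
for some a > 0.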

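A state of the quantum spin system is a linear
functional $\omega$ on $\mathcal{A}$ such that $\omega(A^* A) \geq 0$ for all $A \in \mathcal{A}$
(positivity), and $\omega(\mathbb{1}) = 1$ (normalization). The
restriction of $\omega$ to $\mathcal{A}_\Lambda$, for each $\Lambda \in \mathcal{L}$, is uniquely
determined by a density matrix, that is, $\rho_\Lambda \in \mathcal{A}_\Lambda$,
such that

$$\omega(A) = \mathrm{tr}\, \rho_\Lambda A, \quad \text{for all } A \in \mathcal{A}_\Lambda$$

where tr denotes the usual trace of matrices. $\rho_\Lambda$ is
non-negative definite and of unit trace. If the density
matrix is a one-dimensional projection, the state is
called a vector state, and can be identified with a
vector $\psi \in \mathcal{H}_\Lambda$ such that $\mathbb{C}\psi = \mathrm{ran}\, \rho_\Lambda$.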

A ground state of the quantum spin system is a
state $\omega$ satisfying the local stability inequalities:

$$\omega(A^* \delta(A)) \geq 0, \quad \text{for all } A \in \mathcal{A}_{\mathrm{loc}} \qquad [4]$$


The states describing thermal equilibrium are
characterized by the Kubo-Martin-Schwinger
(KMS) condition: for any β > 0 (related to absolute
temperature by β = 1/(k_B T), where k_B is the
Boltzmann constant), ω is called β-KMS if

\omega(A \, \alpha_{i\beta}(B)) = \omega(BA), \qquad \text{for all } A, B \in \mathcal{A}_{\mathrm{loc}} \qquad [5]


The most common way to construct ground states
and equilibrium states, namely solutions of [4] and [5],
respectively, is by taking thermodynamic limits of
finite-volume states with suitable boundary conditions.
A ground state of the finite-volume Hamiltonian
H_Λ is a convex combination of vector states that are
eigenstates of H_Λ belonging to its smallest eigenvalue.
The finite-volume equilibrium state at inverse
temperature β has density matrix ρ_β defined by

\rho_\beta = \frac{e^{-\beta H_\Lambda}}{Z(\Lambda, \beta)}

where Z(Λ, β) = tr e^{-βH_Λ} is called the partition
function. By considering limit points as
Λ → Z^d, one can show that a quantum spin model
always has at least one ground state and at least one
equilibrium state for all β.
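The finite-volume Gibbs state and the KMS condition [5] can be checked numerically in a small example. The sketch below is only an illustration under stated assumptions: it works in the eigenbasis of a hypothetical four-level Hamiltonian (the energies and the matrices A and B are arbitrary choices, not taken from the text), uses the convention α_z(B) = e^{izH} B e^{-izH}, and all function names are invented for this example.

```python
import math

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def gibbs_weights(energies, beta):
    """Eigenvalues of rho_beta = exp(-beta H)/Z and the partition function Z."""
    boltzmann = [math.exp(-beta * e) for e in energies]
    z = sum(boltzmann)
    return [w / z for w in boltzmann], z

def expectation(weights, x):
    """omega(X) = tr(rho X) for a density matrix diagonal in this basis."""
    return sum(w * x[j][j] for j, w in enumerate(weights))

def alpha_i_beta(energies, b, beta):
    """alpha_{i beta}(B) = exp(-beta H) B exp(+beta H) in the eigenbasis of H."""
    n = len(energies)
    return [[math.exp(-beta * (energies[j] - energies[k])) * b[j][k]
             for k in range(n)] for j in range(n)]

# Hypothetical spectrum and observables, chosen only for illustration.
energies = [-0.75, 0.25, 0.25, 0.25]
beta = 1.3
A = [[1.0, 2.0, 0.0, -1.0],
     [0.5, 0.0, 3.0, 1.0],
     [2.0, -1.0, 1.0, 0.0],
     [0.0, 4.0, -2.0, 0.5]]
B = [[0.0, 1.0, 1.5, 0.0],
     [-2.0, 1.0, 0.0, 0.5],
     [1.0, 0.0, -1.0, 2.0],
     [0.5, 3.0, 0.0, 1.0]]

weights, Z = gibbs_weights(energies, beta)
kms_lhs = expectation(weights, matmul(A, alpha_i_beta(energies, B, beta)))
kms_rhs = expectation(weights, matmul(B, A))
print(Z, kms_lhs, kms_rhs)
```

The two printed expectation values agree up to rounding, which is the finite-volume content of [5]; in infinite volume the condition is imposed directly on the state, since no density matrix is available.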

In this section, the basic concepts have so far been 
discussed in the most standard setup. Clearly, many 
generalizations are possible: one can consider non- 
translation-invariant models; models with random 
potentials; the state spaces at each site may have 
different dimensions; instead of Z^d one can consider
other lattices or define models on arbitrary graphs; 
one can allow interactions of infinite range that 
satisfy weaker conditions than those imposed by the 
finiteness of the norm [3], or restrict to subspaces of 
the Hilbert space by imposing symmetries or 
suitable hardcore conditions; and one can study 
models with infinite-dimensional spins. Examples of 
all these types of generalizations have been consid- 
ered in the literature and have interesting 
applications. 


Symmetries and Symmetry Breaking 


Many interesting properties of quantum spin sys- 
tems are related to symmetries and symmetry 
breaking. Symmetries of a quantum spin model are 
realized as representations of groups, Lie algebras, 
or quantum (group) algebras on the Hilbert space 
and/or the observable algebra. The symmetry prop- 
erty of the model is expressed by the fact that the 
Hamiltonian (or the dynamics) commutes with this 
representation. We briefly discuss the most common 
symmetries. 

Translation invariance. The translation automorphisms
τ_x have already been defined on the
observable algebra of infinite quantum spin systems
on Z^d. One can also define translation automorphisms
for finite systems with periodic boundary
conditions, which are defined on the torus
Z^d / TZ^d, where T = (T_1, ..., T_d) is a positive integer
vector representing the periods.

Other graph automorphisms. In general, if G is a
group of automorphisms of the graph Γ, and
H_Γ = ⊗_{x∈Γ} C^n is the Hilbert space of a system of
identical spins defined on Γ, then, for each g ∈ G, one
can define a unitary U_g on H_Γ by linear extension of

U_g \bigotimes_{x \in \Gamma} \varphi_x = \bigotimes_{x \in \Gamma} \varphi_{g^{-1}(x)}

where φ_x ∈ C^n, for all x ∈ Γ.
These unitaries form a representation of G. With the
unitaries one can immediately define automorphisms
of the algebra of observables: for A ∈ A_Λ, and U ∈ A_Λ
unitary, τ(A) = U*AU defines an automorphism, and
if U_g is a group representation, the corresponding τ_g
will be, too. Common examples of graph automorphisms
are the lattice symmetries of rotation and
reflection. Translation symmetry and other graph
automorphisms are often referred to collectively as
spatial symmetries.

Local symmetries (also called gauge symmetries).
Let G be a group and u_g, g ∈ G, a unitary
representation of G on C^n. Then, U_g = ⊗_{x∈Λ} u_g is a
representation on H_Λ. The Heisenberg model [2], for
example, commutes with such a representation of
SU(2). It is often convenient, and generally
equivalent, to work with a representation of the Lie
algebra. In that case the SU(2) invariance of the
Heisenberg model is expressed by the fact that H_Λ
commutes with the following three operators:

S^i_\Lambda = \sum_{x \in \Lambda} S^i_x, \qquad i = 1, 2, 3
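The commutation of the Hamiltonian with the three total-spin operators can be verified directly on a very short chain. The following is a minimal sketch, assuming an isotropic nearest-neighbour coupling Σ_x S_x · S_{x+1} on an open spin-1/2 chain of three sites (the overall sign and normalization of the model [2] do not affect the commutators); the helper names are made up for this illustration.

```python
ID2 = [[1, 0], [0, 1]]
# Spin-1/2 matrices S^1, S^2, S^3
SPIN = [
    [[0, 0.5], [0.5, 0]],
    [[0, -0.5j], [0.5j, 0]],
    [[0.5, 0], [0, -0.5]],
]

def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def zero(dim):
    return [[0] * dim for _ in range(dim)]

def site_op(op, x, length):
    """The operator `op` acting at site x, tensored with identities elsewhere."""
    result = [[1]]
    for site in range(length):
        result = kron(result, op if site == x else ID2)
    return result

def heisenberg_chain(length):
    """Sum over nearest neighbours of S_x . S_{x+1} (open boundary conditions)."""
    h = zero(2 ** length)
    for x in range(length - 1):
        for s in SPIN:
            h = add(h, matmul(site_op(s, x, length), site_op(s, x + 1, length)))
    return h

def total_spin(i, length):
    """Total spin component S^i summed over all sites."""
    s = zero(2 ** length)
    for x in range(length):
        s = add(s, site_op(SPIN[i], x, length))
    return s

def commutator_norm(a, b):
    """Largest absolute entry of the commutator [a, b]."""
    ab, ba = matmul(a, b), matmul(b, a)
    return max(abs(p - q) for ra, rb in zip(ab, ba) for p, q in zip(ra, rb))

LENGTH = 3
H = heisenberg_chain(LENGTH)
total_spin_norms = [commutator_norm(H, total_spin(i, LENGTH)) for i in range(3)]
single_site_norm = commutator_norm(H, site_op(SPIN[2], 0, LENGTH))
print(total_spin_norms, single_site_norm)
```

The commutators with the total spin vanish up to rounding, whereas the spin at a single site does not commute with H: only the global rotation is a symmetry.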

Note: sometimes the Hamiltonian is only sym- 
metric under certain combinations of spatial and 
local symmetries. CP symmetry is an example. 

For an automorphism τ, we say that a state ω is
τ-invariant if ω ∘ τ = ω. If ω is τ_g-invariant for all
g ∈ G, we say that ω is G-invariant.

It is easy to see that if a quantum spin model has a
symmetry G, then the set of all ground states or all
β-KMS states will be G-invariant, meaning that if ω
is in the set, then so is ω ∘ τ_g, for all g ∈ G. By a
suitable averaging procedure, it is usually easy to
establish that the sets of ground states or equilibrium
states contain at least one G-invariant
element.

An interesting situation occurs if the model is
G-invariant, but there are ground states or KMS
states that are not. This means that, for some
g ∈ G, and some ω in the set (of ground states or
KMS states), ω ∘ τ_g ≠ ω. When this happens, one says
that there is spontaneous symmetry breaking, a
phenomenon that also plays an important role in
quantum field theory.

The famous Hohenberg-Mermin-Wagner theorem,
applied to quantum spin models, states that, as
long as the interactions do not have very long range
and the dimension of the lattice is 2 or less,
continuous symmetries cannot be spontaneously
broken in a β-KMS state for any finite β.

Quantum group symmetries. We restrict ourselves
to one important example: the SU_q(2) invariance of
the spin-1/2 XXZ Heisenberg chain with
q ∈ [0, 1], and with special boundary terms. The
Hamiltonian of the SU_q(2)-invariant XXZ chain of
length L is given by

H_L = \sum_{x=1}^{L-1} \left[ -\frac{1}{\Delta}\left(S^1_x S^1_{x+1} + S^2_x S^2_{x+1}\right) - \left(S^3_x S^3_{x+1} - \frac{1}{4}\right) + \frac{1}{2}\sqrt{1 - \Delta^{-2}}\,\left(S^3_{x+1} - S^3_x\right) \right]

where q ∈ (0, 1] is related to the parameter Δ ≥ 1
by the relation Δ = (q + q^{-1})/2. When q = 0, H_L is
equivalent to the Ising chain. Thus, the XXZ model
interpolates between the Ising model (the primordial
classical spin system) and the isotropic Heisenberg
model (the most widely studied quantum spin model).
In the limit of infinite spin (S → ∞), the model
converges to the classical Heisenberg model (XXZ
or isotropic). An interesting feature of the XXZ
model is the existence of non-translation-invariant
ground states, called kink states.

In this family of models, one can see how aspects
of discreteness (quantized spins) and continuous
symmetry (SU(2), or quantum symmetry SU_q(2)) are
present at the same time in the quantum Heisenberg
models, and the two classical limits (q → 0 and
S → ∞) can be used as a starting point to study its
properties.

Quantum group symmetry is not a special case of
invariance under the action of a group. There is no
group, but there is an algebra represented on the
Hilbert space of each spin, for which there is a good
definition of tensor product of representations, and
“many” irreducible representations. In this example,
the representation of SU_q(2) on H_{[1,L]} commuting
with H_L is generated by


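S^3 = \sum_{x=1}^{L} 1_1 \otimes \cdots \otimes S^3_x \otimes 1_{x+1} \otimes \cdots \otimes 1_L

S^+ = \sum_{x=1}^{L} t_1 \otimes \cdots \otimes t_{x-1} \otimes S^+_x \otimes 1_{x+1} \otimes \cdots \otimes 1_L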


Quantum group symmetries were discovered in
exactly solvable models, starting with the spin-1/2
XXZ chain. One can exploit their representation
theory to study the spectrum of the Hamiltonian in
very much the same way as ordinary symmetries.
The main restriction to their applicability is that the
tensor product structure of the representations is
inherently one-dimensional, that is, relying on an
ordering from left to right. For the infinite XXZ
chain the left-to-right and right-to-left orderings can
be combined to generate an infinite-dimensional
algebra, the quantum affine algebra U_q(\widehat{sl}_2).


where 


Phase Transitions 


Quantum spin models of condensed matter physics 
often have interesting ground states. Not only are 
the ground states often a good approximation of the 
low-temperature behavior of the real systems that 
are modeled by it, and studying them is therefore 
useful, it is in many cases also a challenging 
mathematical problem. This is in contrast with 
classical lattice models for which the ground states 
are usually simple and easy to find. In more than 
one way, ground states of quantum spin systems 
display behavior similar to equilibrium states of 
classical spin systems at positive temperature. 

The spin-1/2 Heisenberg antiferromagnet on
Λ ⊂ Z^d, with Hamiltonian

H_\Lambda = \sum_{x, y \in \Lambda,\; |x - y| = 1} \mathbf{S}_x \cdot \mathbf{S}_y \qquad [6]

is a case in point. Even in the one-dimensional case
(d = 1), and even though the model in that case is
exactly solvable by the Bethe ansatz, its ground state is
highly nontrivial. Analysis of the Bethe ansatz solution
(which is not fully rigorous) shows that the spin-spin
correlation function decays to zero at infinity, but
slower than exponentially (roughly as inverse distance
squared). For d = 2, it is believed, but not
mathematically proved, that the ground state has Néel
order, that is, long-range antiferromagnetic order,
accompanied by a spontaneous breaking of the SU(2)
symmetry. Using reflection positivity, Dyson, Lieb, and
Simon were able to prove the Néel order at sufficiently
low temperature (large β), for d ≥ 3 and all S ≥ 1/2.
This was later extended to the ground state for d = 2
and S ≥ 1, and d ≥ 3 and S ≥ 1/2, that is, all the cases
where Néel order is expected except d = 2, S = 1/2.
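Why the antiferromagnetic ground state is nontrivial can already be seen for two spins. The sketch below is a toy illustration, not part of the original argument: it writes S_1 · S_2 for two spin-1/2 particles as an explicit 4 × 4 matrix in the basis (↑↑, ↑↓, ↓↑, ↓↓) and compares the energy of the classical Néel configuration ↑↓ with that of the singlet. Variable and function names are chosen for this example only.

```python
import math

# S_1 . S_2 for two spins 1/2 in the basis (up-up, up-down, down-up, down-down):
# the diagonal is S^3_1 S^3_2, the off-diagonal 1/2 entries are the spin-flip terms.
H = [
    [0.25, 0.0, 0.0, 0.0],
    [0.0, -0.25, 0.5, 0.0],
    [0.0, 0.5, -0.25, 0.0],
    [0.0, 0.0, 0.0, 0.25],
]

def apply(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def energy(matrix, vector):
    """Expectation value <v|M|v> / <v|v> for a real vector v."""
    mv = apply(matrix, vector)
    return sum(a * b for a, b in zip(vector, mv)) / sum(a * a for a in vector)

r = 1.0 / math.sqrt(2.0)
neel = [0.0, 1.0, 0.0, 0.0]      # classical antiferromagnetic configuration up-down
singlet = [0.0, r, -r, 0.0]      # (up-down minus down-up) / sqrt(2)

neel_energy = energy(H, neel)
singlet_energy = energy(H, singlet)

# The singlet is an exact eigenvector; the Neel state is not.
singlet_residual = max(abs(hv - singlet_energy * v)
                       for hv, v in zip(apply(H, singlet), singlet))
neel_residual = max(abs(hv - neel_energy * v)
                    for hv, v in zip(apply(H, neel), neel))
print(neel_energy, singlet_energy, singlet_residual, neel_residual)
```

The Néel configuration minimizes only the S^3 S^3 part of the interaction; the spin-flip terms lower the energy further and produce an entangled ground state. On larger lattices the same mechanism is what makes the ground-state problem hard.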


In contrast, no proof of long-range order in the 
Heisenberg ferromagnet at low temperature exists. This 
is rather remarkable since proving long-range order in 
the ground states of the ferromagnet is a trivial problem. 

Of particular interest are the so-called quantum 
phase transitions. These are phase transitions that 
occur as a parameter in the Hamiltonian is varied and 
which are driven by the competing effects of energy 
and quantum fluctuations, rather than the balance 
between energy and entropy which drives usual 
equilibrium phase transitions. Since entropy does not 
play a role, quantum phase transitions can be observed
at zero temperature, that is, in the ground states. 

An important example of a quantum phase
transition occurs in the two- or higher-dimensional
XY model with a magnetic field in the Z-direction.
It was proved by Kennedy, Lieb, and Shastry that, at
zero field, this model has off-diagonal long-range
order (ODLRO), and can be interpreted as a
hard-core Bose gas at half-filling. It is also clear that if
the magnetic field exceeds a critical value, h_c, the
model has a simple ferromagnetically ordered ground
state. There are indications that there is ODLRO for all
|h| < h_c. However, so far there is no proof that
ODLRO exists for any h ≠ 0.

What makes the ground-state problem of quantum 
spin systems interesting and difficult at the same time 
is that ground states, in general, do not minimize the 
expectation value of the interaction terms in the 
Hamiltonian individually although, loosely speaking, 
the expectation value of their sum (the Hamiltonian) 
is minimized. However, there are interesting excep- 
tions to this rule. Two examples are the AKLT model 
and the ferromagnetic XXZ model. 

The wide-ranging behavior of quantum spin models 
has required an equally wide range of mathematical 
approaches to study them. There is one group of 
methods, however, that can make a claim of sub- 
stantial generality: those that start from a representa- 
tion of the partition function based on the Feynman- 
Kac formula. Such representations turn a d-dimen- 
sional quantum spin model into a (d + 1)-dimensional 
classical problem, albeit one with some special 
features. This technique was pioneered by Ginibre in 
1968 and was quickly adopted by a number of authors 
to solve a variety of problems. Techniques borrowed 
from classical statistical mechanics have been adapted 
with great success to study ground states, the low- 
temperature phase diagram, or the high-temperature 
regime of quantum spin models that can be regarded as 
perturbations of a classical system. More recently, it 
was used to develop a quantum version of Pirogov- 
Sinai theory which is applicable to a large class of 
problems, including some with low-temperature 
phases not related by symmetry. 


Dynamics 


Another feature of quantum spin systems that makes
them mathematically richer than their classical
counterparts is the existence of a Hamiltonian
dynamics. Quite generally, the dynamics is well
defined in the thermodynamic limit as a strongly
continuous one-parameter group of automorphisms
of the C*-algebra of quasilocal observables. Strictly
speaking, a quantum spin model is actually defined
by its dynamics α_t, or by its generator δ, and not by
the potential Φ. Indeed, Φ is not uniquely determined
by α_t. In particular, it is possible to incorporate
various types of boundary conditions into the
definition of Φ. This approach has proved very useful
in obtaining important structural results, such as the
proof by Araki of the uniqueness of the KMS state at
any finite β in one dimension. Another example is a
characterization of equilibrium states by the
energy-entropy balance inequalities, which is both
physically appealing and mathematically useful: ω is a
β-KMS state for a quantum spin model in the setting of
the section on the mathematical framework in this
article (and in fact also for more general quantum
systems), if and only if the inequality

\beta\, \omega(X^* \delta(X)) \ge \omega(X^* X) \log \frac{\omega(X^* X)}{\omega(X X^*)}

is satisfied for all X ∈ A_loc. This characterization
and several related results were proved in a series of
works by various authors (mainly Roepstorff, Araki,
Fannes, Verbeure, and Sewell).

Detailed properties of the dynamics for specific
models are generally lacking. One could point to
the “immediate nonlocality” of the dynamics as
the main difficulty. By this, we mean that, except in
trivial cases, most local observables A ∈ A_loc
become nonlocal after an arbitrarily short time,
that is, α_t(A) ∉ A_loc, for any t ≠ 0. This nonlocality
is not totally uncontrolled, however. A result by
Lieb and Robinson establishes that, for models with
interactions that are sufficiently short range (e.g.,
finite range), the nonlocality propagates at a
bounded speed. More precisely, under quite general
conditions, there exist constants c, v > 0 such that,
for any two local observables A, B ∈ A_{{0}},

\| [\alpha_t(A), \tau_x(B)] \| \le 2 \|A\| \, \|B\| \, e^{-c(|x| - v|t|)}


Attempts to understand the dynamics have generally
been aimed at one of two issues: return to
equilibrium from a perturbed state, and convergence
to a nonequilibrium steady state in the presence of
currents. Some interesting results have been
obtained although much remains to be done.


Introduction


Quantum theory actually started at the beginning of 
the twentieth century as a many-body theory, 
attempting to solve problems to which classical 
physics gave unsatisfactory answers. 

This article aims to follow the developments of 
quantum statistical mechanics, hereafter called 
QSM, staying close to the underlying physics and 
sketching its methods and perspectives. The next 
section outlines the historical path, and the first 
achievements by Planck (1900) and Debye (1913); 
the subsequent free quantum gas theory will be 
recalled in the first original insights due to Fermi, 
Dirac (1926) and Bose, Einstein (1924-25), when 
many open problems began to find a coherent 
treatment. 

In this framework, an interesting new idea 
appeared: the elementary units of the systems could 
be “particles”, in the usual or in a broad meaning, a 
notion which includes photons, phonons, and 
quasiparticles of current use in condensed matter 
physics. The description of a classical harmonic 
system through independent normal modes is an 
example of a very fruitful use of collective variables. 

The subsequent section will deal with more recent 
achievements, related to the properties of quantum 
N-body systems, which are fundamental for the 
derivation of their macroscopic behavior. In parti- 
cular, the works by Dyson-Lenard and Lieb- 
Lebowitz on the stability of matter have to be 
recalled: a system made of electrons and ions has a 
thermodynamic behavior, thanks to the quantum 
nature of its constituents, where the Pauli exclusion 
principle plays an essential role. 

We will then present relations that arise in 
quantum field theory, that is, from the second 
quantization methods; related technical and concep- 
tual problems will also be presented briefly. 

This is necessary for taking into account the 
recent works and perspectives, which will be 
considered in the last section. Here the new inputs 
and challenges from outstanding achievements in 
physics laboratories will be taken into account, 
referring to some exactly solvable models which 
help in understanding and in fixing the boundaries 
of approximate methods. 


The Crisis of Classical Physics: 
The Quantum Free Gas 


Let us briefly recall some of what Lord Kelvin called
the “nineteenth century clouds” over the physics
of that time (1884), and the subsequent new ideas
(Gallavotti 1999).

It is well known that the classical Dulong-Petit law
of specific heat of solid crystals may be derived from
the model of point particles interacting through
harmonic forces; the equipartition of the mean energy
among the degrees of freedom implies, for N
particles, the linear dependence of the internal energy
U_N on absolute temperature T, hence a constant heat
capacity C_N (k_B is Boltzmann's constant):

U_N = 6 \cdot \tfrac{1}{2} N k_B T, \qquad C_N = \frac{\partial U_N}{\partial T} = 3 N k_B \qquad [1]

Experimentally this is relatively well satisfied at high
temperatures but it is violated for low T: one
observes that U_N vanishes faster than linearly as T
goes to zero, so that C_N vanishes. Moreover, the
contributions to the heat capacity from the internal
degrees of freedom of the molecular gases or from
the free electrons in conducting solids are negligible
at room temperature: these degrees of freedom, in
spite of the equipartition principle, seem frozen.

The analysis of the blackbody radiation problem
from the classical point of view, that is, using
equipartition among the normal modes of the
electromagnetic field in the “black” cavity at
temperature T, gives the following dependence
(Rayleigh-Jeans law, 1900) of the spectral energy
density u(ν, T) on frequency ν and temperature T
(c is the speed of light in vacuum):

u(\nu, T) = \frac{8\pi \nu^2}{c^3} k_B T \qquad [2]

The experimental curves for any positive T show a
maximum for a frequency ν_max(T) which increases
linearly with T according to Wien's displacement
law (1893). The spectral energy density decreases
fast enough to zero as ν → ∞ in such a way that the
overall (integrated) energy is (finite and)
proportional to T^4, according to Stefan's law (1879); the
agreement with the classical form holds for low
frequencies. The analytic form of the classical
u(ν, T) in [2] does not present maxima and the
overall radiated energy is clearly divergent (this bad
behavior for large ν, present in many formulas for
other models, sometimes in the corresponding
“short-distance” form, is called an “ultraviolet
catastrophe”).


The effort by M Planck (1900) to understand the
right dependence of u on ν and T was based on a
thermodynamic argument about the possible
energy-entropy relation, and on an assumption
similar to the discretization rules on which the
“old quantum theory” for the atomic structure is
based. The electromagnetic field is represented, via
Fourier analysis, as a set of infinitely many
independent harmonic oscillators, two for every
wave vector k, to take into account the polarization.
The frequency depends linearly on the wave number
k = |k| (linear dispersion law), and the spacing
becomes negligible for macroscopic dimensions of
the cavity. The key idea for computing the partition
function is the discretization of the phase space of
each oscillator (of frequency ν = ω/2π). Putting
there the dimensionless Lebesgue measure
dp dq/h, where h is a constant with physical
dimensions of an action, we consider the regions
R_E bounded by the constant-energy ellipses and
their areas |R_E|, and find

|R_E| = \int_{R_E} \frac{dp\, dq}{h} = \frac{2\pi E}{h\omega} = \frac{E}{h\nu}

If these dimensionless areas have integer values, that
is, E = nhν, n = 0, 1, 2, ..., the annular region (“cell”
C_n) between R_{E_n} and R_{E_{n+1}} has unit area and so we
approximate the partition function with the series
(β = 1/(k_B T), the ubiquitous parameter in statistical
mechanics, often called “inverse temperature”)

Z_{\mathrm{discr}} = \sum_{n=0}^{\infty} \exp(-\beta n h \nu) = \frac{1}{1 - \exp(-\beta h \nu)}

In this way, the probabilistic weight given to this
cell is

p(C_n) = \frac{\exp(-\beta n h \nu)}{\sum_{m=0}^{\infty} \exp(-\beta m h \nu)} = \exp(-\beta n h \nu)\,\bigl(1 - \exp(-\beta h \nu)\bigr) \qquad [3]
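The geometric series and the cell weights [3] are easy to check numerically. The following minimal sketch works with the single dimensionless parameter x = βhν; the function names and the truncation at a finite number of terms are choices made for this illustration. It also evaluates the mean number of quanta Σ n p(C_n), which equals 1/(e^x - 1).

```python
import math

def partition_closed(x):
    """Closed form of Z_discr = 1 / (1 - exp(-x)), with x = beta*h*nu."""
    return 1.0 / (1.0 - math.exp(-x))

def partition_series(x, terms):
    """Partial sum of the series sum_n exp(-n x), n = 0 .. terms-1."""
    return sum(math.exp(-n * x) for n in range(terms))

def cell_probability(n, x):
    """Weight p(C_n) of the n-th phase-space cell, as in [3]."""
    return math.exp(-n * x) * (1.0 - math.exp(-x))

def mean_quanta(x, terms):
    """Mean value of n under the weights p(C_n), truncated after `terms` cells."""
    return sum(n * cell_probability(n, x) for n in range(terms))

x = 0.5          # illustrative value of beta*h*nu
TERMS = 200      # exp(-100) is negligible, so the truncation error is tiny

z_series = partition_series(x, TERMS)
z_closed = partition_closed(x)
total_probability = sum(cell_probability(n, x) for n in range(TERMS))
average_n = mean_quanta(x, TERMS)
print(z_series, z_closed, total_probability, average_n)
```

Multiplying the mean number of quanta by hν gives the mean oscillator energy hν/(exp(βhν) - 1), the factor that replaces k_B T of the classical law.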


A well-defined value for the constant h (i.e., h =
6.626... × 10^{-27} erg s, the Planck constant),
combined with the usual computation for the density of
states, gives a formula which quantitatively agrees
with experimental data (see Figure 1)

u(\nu, T) = \frac{8\pi \nu^2}{c^3}\, \frac{h\nu}{\exp(h\nu / k_B T) - 1} \qquad [4]

Moreover, for a certain range of parameters, that
is, such that βhν ≪ 1, there is agreement with the
classical law.
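The relation between [2] and [4] can be made concrete with a few lines of code. This is a sketch in SI units, with the exact SI values of h, c, and k_B; the function names, the chosen frequencies, and the ternary search used to locate the maximum are illustrative choices rather than anything prescribed by the text.

```python
import math

H_PLANCK = 6.62607015e-34    # J s
C_LIGHT = 2.99792458e8       # m / s
K_BOLTZMANN = 1.380649e-23   # J / K

def rayleigh_jeans(nu, temperature):
    """Classical spectral energy density [2]."""
    return 8.0 * math.pi * nu ** 2 / C_LIGHT ** 3 * K_BOLTZMANN * temperature

def planck(nu, temperature):
    """Planck spectral energy density [4]."""
    x = H_PLANCK * nu / (K_BOLTZMANN * temperature)
    if x > 700.0:            # exp(x) would overflow; the density is negligible here
        return 0.0
    return 8.0 * math.pi * nu ** 2 / C_LIGHT ** 3 * H_PLANCK * nu / math.expm1(x)

def peak_frequency(temperature, iterations=200):
    """Frequency maximizing the Planck density, found by ternary search."""
    scale = K_BOLTZMANN * temperature / H_PLANCK
    lo, hi = 0.1 * scale, 20.0 * scale
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if planck(m1, temperature) < planck(m2, temperature):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

T = 300.0
low_ratio = planck(1.0e9, T) / rayleigh_jeans(1.0e9, T)      # beta*h*nu << 1
high_ratio = planck(1.0e15, T) / rayleigh_jeans(1.0e15, T)   # beta*h*nu >> 1
x_peak_300 = H_PLANCK * peak_frequency(300.0) / (K_BOLTZMANN * 300.0)
x_peak_600 = H_PLANCK * peak_frequency(600.0) / (K_BOLTZMANN * 600.0)
print(low_ratio, high_ratio, x_peak_300, x_peak_600)
```

At low frequency the ratio is close to 1, at high frequency the Planck density is exponentially suppressed, and the maximum sits at the same value of hν/(k_B T) for both temperatures, so ν_max grows linearly with T as in Wien's displacement law.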
The “quantum of light,” introduced by Einstein in
1905 in his work on the photoelectric effect, was later
(1926) called photon by G N Lewis. The picture for
representing the radiating system was one of a gas
of noninteracting photons, carrying energy and
momentum, and being continuously created and
absorbed.

Figure 1 Dependence of the electromagnetic energy density
on ν, for T_1 < T_2 < T_3.

A slightly different approach was used about the
same time, for the problem of specific heat of
crystalline solids.

The simplest model considers N points on the
nodes of the lattice Z^3, in a cubic box of side L,
interacting through harmonic forces; similarly to the
radiation problem, the system is represented by a
collection of independent harmonic oscillators
(normal modes), which are “quantized” as before: the
corresponding quanta were called phonons (by
Frenkel, in 1932) for the role of the acoustic band
of frequencies. In this simplified approach (by
Debye, in 1913) the different phonons are
determined by a finite set of wave vectors

k = \frac{2\pi}{L} n, \qquad n_i \text{ integer}, \; i = 1, 2, 3; \qquad |k| \le k_M

where the maximal modulus k_M is such that the
total number of different k's is 3N (degrees of
freedom).

Moreover, the frequency-wave number relation is
simplified too, extrapolating the low-frequency
(acoustic) linear relation ν = |k| v_0 (v_0 is the sound
speed). In this way, the density of states, which is
quadratic in the frequency, has a cutoff to zero at
the maximal frequency, ν_D, corresponding to
k_M, with an associated temperature Θ_D = hν_D/k_B
(Debye's temperature). The expected energy U_N in
the canonical ensemble, after the computation of the
canonical partition function, is given in terms of the
Debye function D(·):

U_N = 3 N k_B T \, D\!\left(\frac{\Theta_D}{T}\right)

Figure 2 The specific heat of crystal solids according to
Debye.


The agreement with experimental data, for the
specific heat of different materials (i.e., different
Θ_D), at low and high temperatures, is rather good.
At low temperatures, one recovers the empirical T^3
behavior (see Figure 2). More careful measurements
at low T put into evidence, for metallic solids, the
role of the conduction electrons: their contribution
to the heat capacity turns out to be linear in T, with
a coefficient such that at room temperature it is
much smaller than the lattice contribution, so that a
satisfactory agreement with the classical law is
found.
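Both limits of the Debye theory can be reproduced numerically. The sketch below assumes the standard definition of the Debye function, D(x) = (3/x^3) ∫_0^x t^3/(e^t - 1) dt, which is not spelled out in the text above, and evaluates it with Simpson's rule. Differentiating U_N = 3N k_B T D(Θ_D/T) with respect to T gives the heat capacity in units of 3N k_B as 4D(x) - 3x/(e^x - 1), with x = Θ_D/T. Function names are invented for this example.

```python
import math

def _integrand(t):
    """t^3 / (e^t - 1), continued by its limit 0 at t = 0."""
    return 0.0 if t == 0.0 else t ** 3 / math.expm1(t)

def debye_function(x, steps=2000):
    """D(x) = (3 / x^3) * integral_0^x t^3/(e^t - 1) dt, by Simpson's rule.

    `steps` must be even. This normalization is an assumption of the sketch.
    """
    h = x / steps
    total = _integrand(0.0) + _integrand(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * _integrand(i * h)
    return 3.0 * (total * h / 3.0) / x ** 3

def heat_capacity_ratio(x):
    """C_N / (3 N k_B) as a function of x = Theta_D / T."""
    return 4.0 * debye_function(x) - 3.0 * x / math.expm1(x)

high_temperature = heat_capacity_ratio(1.0e-3)   # T >> Theta_D
low_temperature = heat_capacity_ratio(50.0)      # T << Theta_D
t_cubed_law = 4.0 * math.pi ** 4 / (5.0 * 50.0 ** 3)
print(high_temperature, low_temperature, t_cubed_law)
```

For T much larger than Θ_D the ratio tends to 1, which is the Dulong-Petit value [1]; for T much smaller than Θ_D it follows (4π^4/5)(T/Θ_D)^3, the T^3 behavior mentioned above.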

Soon after the beginning of quantum mechanics in
its modern form (1925-26), physicists considered
many-particle systems, dealing initially with the
simplest situations, with a relatively easy formal
apparatus, yet sufficient to understand the main
features of the “anomalous,” that is, nonclassical,
behavior.

For a system of N free particles in a cubic box of
side L, quantum theory brings the labeling of the
one-particle states with the wave vectors k, recalling
the de Broglie relation for the momentum p = ħk,
with a possible additional spin (intrinsic angular
momentum) label σ,

k = \frac{2\pi}{L} n, \qquad n \in \mathbb{Z}^3 \setminus \{0\}

and the statistics of the particles: because of
indistinguishability, the wave function of several
identical particles has to be symmetric (B-E,
Bose-Einstein statistics) or antisymmetric (F-D,
Fermi-Dirac statistics) in the exchange of the particles.
This has the deep implication that no more than one
fermion can occupy a given quantum state.

We may here recall the spin-statistics connection, 
which, in the framework of a local relativistic 
theory, states that integer spin particles are bosons, 
while particles with half-odd-integer spin are 
fermions. 


As the state is completely defined by the knowledge
of the occupation numbers n_{kσ}, the ground states
of the systems of N spinless bosons and of N
spin-1/2 fermions are described by the simple and
relevant statements:

\text{B-E system}: \quad n_k = N \, \delta_{k, k_0}

\text{F-D system}: \quad n_{k\sigma} = 1(|k| \le k_F) \quad \forall \sigma

The constant k_F (Fermi wave number), or the
equivalent p_F = ħk_F and ε_F = p_F^2/2m (Fermi
momentum and energy, respectively) denotes the highest
occupied level. In the continuum approximation,
this implies the following relation between Fermi
energy ε_F and density ρ = N/L^3:

\epsilon_F = \frac{\hbar^2}{2m} \left(3\pi^2 \rho\right)^{2/3} \qquad [7]
Going to the positive-temperature case, the grand
canonical partition function is computed by
considering that occupation numbers are non-negative
integers for the B-E case and just 0 or 1 for the F-D
case. This implies the simple formulas, with obvious
meaning of symbols and leaving more details to the
vast literature (see Figure 3):

\langle n_{k\sigma} \rangle_{\beta, \mu} = \frac{1}{\exp\bigl(\beta(\epsilon_k - \mu)\bigr) \pm 1}, \qquad + \text{ for F-D}, \; - \text{ for B-E} \qquad [8]
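The two expected occupation numbers in [8] differ only in the sign in the denominator, which the following sketch makes explicit. The value μ = 2 mirrors the chemical potential quoted in the caption of Figure 3; the energies, the values of β, and the function names are arbitrary illustrative choices.

```python
import math

def fermi_dirac(energy, mu, beta):
    """Expected F-D occupation number: 1 / (exp(beta*(energy - mu)) + 1)."""
    return 1.0 / (math.exp(beta * (energy - mu)) + 1.0)

def bose_einstein(energy, mu, beta):
    """Expected B-E occupation number: 1 / (exp(beta*(energy - mu)) - 1).

    Only defined for energy > mu, since occupation numbers must be positive.
    """
    if energy <= mu:
        raise ValueError("the B-E formula requires energy > mu")
    return 1.0 / math.expm1(beta * (energy - mu))

MU = 2.0
for beta in (1.0, 5.0, 50.0):
    row = [round(fermi_dirac(e, MU, beta), 4) for e in (1.0, 1.9, 2.0, 2.1, 3.0)]
    print(beta, row)

half_filling = fermi_dirac(MU, MU, 5.0)
particle_hole = fermi_dirac(1.5, MU, 5.0) + fermi_dirac(2.5, MU, 5.0)
cold_below = fermi_dirac(1.0, MU, 50.0)
cold_above = fermi_dirac(3.0, MU, 50.0)
bose_near_mu = bose_einstein(2.1, MU, 1.0)
print(half_filling, particle_hole, cold_below, cold_above, bose_near_mu)
```

As β grows, the F-D occupation approaches the step function of the ground state, equal to 1 below μ and 0 above it, while the B-E occupation is unbounded as the energy approaches μ from above.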


It is useful to introduce the Fermi temperature
T_F = ε_F/k_B; using some realistic data, that is, for
common metals like copper, T_F ranges roughly
between 10^4 and 10^5 K, that is, well above the
“normal,” room temperatures: the quantum nature (i.e.,
quantum degeneracy) of the conduction electrons,
modeled as free electrons, is macroscopically visible
in normal conditions.
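The order of magnitude of T_F follows directly from [7]. The sketch below uses SI constants and, as an input that is not given in the text, a conduction-electron density for copper of about 8.47 × 10^28 m^-3 (a commonly tabulated value, one conduction electron per atom); the variable names are illustrative.

```python
import math

HBAR = 1.054571817e-34        # J s
ELECTRON_MASS = 9.1093837e-31 # kg
K_BOLTZMANN = 1.380649e-23    # J / K
ELECTRON_VOLT = 1.602176634e-19  # J

def fermi_energy(density):
    """Fermi energy [7] of free spin-1/2 fermions at number density `density`."""
    return HBAR ** 2 / (2.0 * ELECTRON_MASS) * (3.0 * math.pi ** 2 * density) ** (2.0 / 3.0)

# Assumed conduction-electron density of copper, in m^-3 (illustrative input).
COPPER_DENSITY = 8.47e28

eps_f = fermi_energy(COPPER_DENSITY)
eps_f_ev = eps_f / ELECTRON_VOLT
fermi_temperature = eps_f / K_BOLTZMANN
print(eps_f_ev, fermi_temperature)
```

With this input the Fermi energy comes out near 7 eV and T_F near 8 × 10^4 K, a few hundred times room temperature, consistent with the range quoted above.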

















Figure 3 Expected fermionic occupation number, for T = 0,
T > 0, and μ = 2.


The presence of an external field, like the periodic 
one given by the ionic lattice of a crystal, changes 
the situation in a relevant way, as the one-particle 
spectrum generally gets a band structure, and the 
allowed momenta are described in the reciprocal 
lattice: the Fermi sphere becomes a surface, and its 
structure is central for further developments. 

For massive bosons, the strange superfluid features
of liquid ^4He at low temperature, that is,
below the critical value 2.17 K, led F London, just
after Kapitza's discovery in 1937, to speculate that
these were related to a macroscopic occupation of
the ground state (B-E condensation). A more
realistic model has to take into account interaction
between bosons (see last section) as the microscopic
interactions in superfluid liquid ^4He are not
negligible.


Quantum N-Body Properties: 
Second Quantization 


The main step in analyzing a quantum N-body
system is the study of its energy spectrum, and in
particular of its ground state, as it may represent a good
approximation of the low-temperature states: its
structure, the relations with possible symmetries of the
Hamiltonian, its degeneracy, the dependence of its
energy on the number of particles, are further relevant
questions. The last one is related to the possibility of
defining a thermodynamics for the system (Ruelle
1969). As a physically very interesting example,
consider a system of electrically charged particles, N
electrons with negative unit charge, and K atoms
with positive charge z, say, interacting through
electrostatic forces; the classical Coulomb potential
as a function of distance behaves badly, as it
diverges at zero and decreases slowly at infinity.
The first question is about the stability: thanks to
the exclusion principle, for the ground-state energy
E_{N,K} an extensive estimate from below is valid:

E_{N,K} \ge -c_0 (N + Kz)

so that a finite-volume grand partition function
exists, while for the thermodynamic limit, which
involves large distances, we need more, that is,
charge neutrality, which allows for screening, and a
fast-decreasing effective interaction.

Let us see an example (quantum spin, Heisenberg
model) belonging to the class of lattice models,
where the identical microscopic elements are
distinguishable by their fixed positions, that is, the nodes
of a lattice like Z^d. To any site x ∈ Z^d is associated
a copy H_x of a (2s+1)-dimensional Hilbert space
H, where an irreducible unitary representation of
SU(2) is given, so that the nonzero values for s are
1/2, 1, 3/2, .... For any x, the generators
S_α(x), (α = 1, 2, 3) satisfy the well-known
commutation relations of the angular momentum; moreover,
Σ_α S_α^2(x) = s(s+1) 1, and operators related to
different sites commute. The ferromagnetic,
isotropic, nearest-neighbor, magnetic field Hamiltonian
for the finite system is

H = -J \sum_{\langle x, y \rangle} \mathbf{S}(x) \cdot \mathbf{S}(y) - h \sum_{x} S_3(x) \qquad [9]

where J is the positive strength of the nearest-neighbor
coupling (<x,y> means that x and y are nearest
neighbors); h is the intensity of the magnetic field
oriented along the third axis. This model is still
widely studied, with several variants regarding
possible anisotropies of the interaction, the possibly
infinite range of the interaction, and the sign of J, for
other (e.g., antiferromagnetic) couplings. Among the
relevant results, the Mermin-Wagner theorem, at
variance with the analogous classical spin model,
states the absence of spontaneous magnetization in
this zero-field model for d = 2 for any positive
temperature; this can also be formulated as absence
of symmetry breaking for this model (Fröhlich and
Pfister in 1981 shed more light on this point).

As mentioned earlier, a useful mathematical tool 
for dealing with quantum systems of many particles 
or quasiparticles, is the occupation-number repre- 
sentation for the state of the system. The vector 
space for a system with an indefinite number of 
particles is the Fock space: it is the direct sum of all 
spaces with any number of particles, starting with 
the zero-particle, vacuum state. The operators which 
connect these subspaces are the creation and 
annihilation operators, very similar to the raising 
and lowering operators introduced by Dirac for the 
spectral analysis of the harmonic-oscillator Hamil- 
tonian and the angular momentum, in the context of 
one-particle quantum theory. 

It is perhaps worth sketching the action of these 
operators on the Fock space. 

We consider spinless bosons first, as spin might 
easily be taken into account, if necessary. We 
suppose that a one-particle Hamiltonian has eigen- 
functions labeled by a set of quantum numbers k, 
say, as the wave vector for the purely kinetic one- 
particle Hamiltonian. Let |nk, nks- --,nk, > denote 
a vector state with X -;—1,. pk; particles, where ng, 
denotes the number of particles with wave vector 
ki,i=1,...,p3;|0 > denotes the no-particle, vacuum 
state. We define the creation operators a; as follows: 


A oe ea R a es [10] 
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Its adjoint a, is called the annihilation operator, 
for its action on the vectors 


E = /mp|---, 7% —1,...) [11] 


The operator a; creates a new particle with that 
momentum: for any k 


TN ee] ee = nk| nx Wipe ce) 


arak := Np (the occupation-number operator) 


The vacuum state belongs to Kerap for any k, and 
the whole space is generated by application of 
creation operators on the vacuum state. 

The following basic commutation relations, for 
any k,k’, are valid: 

$$ [a_k, a_{k'}^*] = \delta(k,k'), \quad [a_k, a_{k'}] = [a_k^*, a_{k'}^*] = 0 $$ [12]
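The action of these operators can be checked on a small example. The following is a minimal sketch for a single mode k, assuming the Fock space is truncated at a maximal occupation `n_max` (a hypothetical cutoff introduced only to obtain finite matrices); the helper names are illustrative and not part of the text.

```python
import math


def ladder_operators(n_max):
    """Return (a, a_star) as nested lists on the basis |0>, |1>, ..., |n_max>."""
    dim = n_max + 1
    a = [[0.0] * dim for _ in range(dim)]
    a_star = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        a[n - 1][n] = math.sqrt(n)       # a |n> = sqrt(n) |n-1>
        a_star[n][n - 1] = math.sqrt(n)  # a* |n-1> = sqrt(n) |n>
    return a, a_star


def matmul(x, y):
    """Plain matrix product of two nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


a, a_star = ladder_operators(5)
number_op = matmul(a_star, a)            # occupation-number operator N
a_a_star = matmul(a, a_star)
commutator = [[p - q for p, q in zip(r1, r2)]
              for r1, r2 in zip(a_a_star, number_op)]
```

On the truncated space the diagonal of `number_op` reproduces the occupation numbers, and the commutator equals the identity except in the last diagonal entry, which is an artifact of the cutoff and disappears on the full Fock space.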

For fermions, multiple occupancy is forbidden, so
that the analogous annihilation ($\alpha_k$) and creation
($\alpha_k^*$) operators satisfy anticommutation relations:


$$ [\alpha_k, \alpha_{k'}^*]_+ = \delta(k,k'), \quad [\alpha_k, \alpha_{k'}]_+ = [\alpha_k^*, \alpha_{k'}^*]_+ = 0 $$ [13]


The presence of spin is handled by adding a spin
label $\sigma$ to these symbols, and a factor $\delta(\sigma, \sigma')$ where
necessary.
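The anticommutation relations can be realized explicitly for two modes. The sketch below uses the Jordan-Wigner construction, which is an assumption of this example and not a construction used in the text; the matrices act on the four occupation states $|n_1 n_2\rangle$, and all names are illustrative.

```python
def kron(x, y):
    """Kronecker product of two matrices given as nested lists."""
    return [[xa * yb for xa in xrow for yb in yrow]
            for xrow in x for yrow in y]


def matmul(x, y):
    """Plain matrix product of two nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def dagger(x):
    """Adjoint of a real matrix (transpose)."""
    return [list(col) for col in zip(*x)]


def anticommutator(x, y):
    """Return xy + yx."""
    xy, yx = matmul(x, y), matmul(y, x)
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(xy, yx)]


LOWER = [[0, 1], [0, 0]]   # maps the occupied state |1> to the empty state |0>
SIGN = [[1, 0], [0, -1]]   # sign factor carried past an occupied first mode
ID2 = [[1, 0], [0, 1]]

alpha = [kron(LOWER, ID2), kron(SIGN, LOWER)]   # annihilation operators
alpha_star = [dagger(m) for m in alpha]          # creation operators

IDENTITY = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
ZERO = [[0] * 4 for _ in range(4)]
```

The square of a creation operator vanishes in this representation, which is the matrix form of the statement that multiple occupancy is forbidden.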

The Hamiltonian for a system of particles, say
spinless bosons, in a box $\Lambda$, made of its kinetic part
together with a two-body interaction $\lambda v(x - y)$, is
written in terms of the “field operators”: if $\{\phi_k(x)\}$
are the one-particle eigenfunctions of the
single-particle purely kinetic Hamiltonian for the spinless
case, and their complex conjugates are $\{\phi_k^*(x)\}$, we
define the fields

$$ \Phi(x) = \sum_k \phi_k(x)\, a_k, \quad \Phi^*(x) = \sum_k \phi_k^*(x)\, a_k^* $$ [14]


so that the full Hamiltonian is given by


$$ H_\Lambda = \int_\Lambda dx\, \Phi^*(x) \left( -\frac{\hbar^2}{2m} \right) \Delta \Phi(x) + \lambda \int dx \int dy\, v(x - y)\, \Phi^*(x) \Phi(x) \Phi^*(y) \Phi(y) $$ [15]


We mention that a theoretical breakthrough in the 
analysis of superfluidity was made by Bogoliubov 
(1946), who, starting from the Hamiltonian in [15], 
introduced the following Hamiltonian in the 
momentum representation: 


$$ H_\Lambda = \sum_k \epsilon_k\, a_k^* a_k + \frac{1}{2} \sum_{k,k',q} v_q\, a_{k+q}^* a_{k'-q}^* a_{k'} a_k $$ [16]

Figure 4 Excitation spectrum for superfluids.


where $\epsilon_k$ is the one-particle kinetic energy and $v_q$ is
the Fourier transform of the two-body potential.
To study the excitation spectrum above the ground 
state, he introduced an approximation about the 
persistency of a macroscopic occupation of the 
ground state and a diagonalization procedure 
leading to new quasiparticles with a characteristic 
energy spectrum, linearly increasing near |k|=0, 
then presenting a positive minimum before the 
subsequent increase (see Figure 4). 
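The linear onset of the quasiparticle spectrum can be illustrated numerically. The sketch below uses the standard textbook form of the Bogoliubov dispersion, $E(k) = \sqrt{\epsilon_k (\epsilon_k + 2 n_0 v)}$, which is not written out in the text above; the constant interaction strength `n0_v` (condensate density times a momentum-independent $v_q$) and the units $\hbar = m = 1$ are assumptions of this example.

```python
import math


def bogoliubov_energy(k, n0_v, mass=1.0, hbar=1.0):
    """Quasiparticle energy for a constant Fourier-transformed potential."""
    eps = (hbar * k) ** 2 / (2.0 * mass)   # one-particle kinetic energy
    return math.sqrt(eps * (eps + 2.0 * n0_v))


def sound_velocity(n0_v, mass=1.0):
    """Slope of the spectrum at k -> 0 (in units with hbar = 1)."""
    return math.sqrt(n0_v / mass)


small_k = 1e-4
slope = bogoliubov_energy(small_k, n0_v=1.0) / small_k
large_k = 100.0
ratio = bogoliubov_energy(large_k, n0_v=1.0) / (large_k ** 2 / 2.0)
```

With a constant $v_q$ this form reproduces only the linear part near $|k| = 0$ and the free-particle growth at large $|k|$; it is monotonic, so a minimum such as the one sketched in Figure 4 would require a momentum-dependent $v_q$.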


Some Mathematical Tools for 
Macroscopic Quantum Systems 


The formal apparatus of second quantization, born 
in the context of the quantum field theory, brought 
to statistical mechanics new ideas and techniques 
and related difficulties. For instance, the renormali- 
zation group was conceived in the 1970s to deal 
both with critical phenomena (i.e., power singula- 
rities of thermodynamic quantities around the 
critical point) and with divergences in quantum 
field theory. This subject is currently being devel- 
oped and applied in models of quantum statistical 
mechanics (QSM) (Benfatto and Gallavotti, 1995). 

Another issue, which has again strong relations 
with quantum field theory, is the algebraic formula- 
tion of QSM. This point of view, which is well 
suited for the analysis of infinitely extended quan- 
tum systems, uses a unified, synthetic, and rigorous 
language. The procedure for passing from a finite 
quantum system to its infinitely extended version 
deserves some attention. 

It is well known that, for finite quantum systems,
say N particles in a box $\Lambda$, an observable is represented
by a self-adjoint operator A on a Hilbert space $\mathcal{H}_\Lambda$, and
the normalized elements $\{\psi\}$ of this space are the
pure states $\rho_\psi$ which define the expectations


$$ \rho_\psi(A) := \langle \psi | A \psi \rangle $$


The mixed states (mixtures) are defined by convex 
combinations of pure states, the coefficients having 
an obvious statistical meaning. 

Among the observables, the Hamiltonian plays a 
special role, as it generates the dynamics of the 
system, which evolves the pure states through the 
unitary group (Schrödinger picture)


$$ \psi \to \exp\left( -\frac{i t H_\Lambda}{\hbar} \right) \psi $$


To the notion of equilibrium probability measure on
the phase space of a classical system corresponds
the mixed state $\rho_{\Lambda,\beta}$ such that


$$ \rho_{\Lambda,\beta}(A) := Z_{\Lambda,\beta}^{-1}\, \mathrm{tr}(\exp(-\beta H_\Lambda) A) $$ [17]


The normalization factor $Z_{\Lambda,\beta} = \mathrm{tr}(\exp(-\beta H_\Lambda))$ is
the canonical partition function.

Consider now the algebra $\mathcal{A}(\Lambda)$ of local observables;
sending $\Lambda$ to infinity, by induction, it is possible
to define the algebra $\mathcal{A}$ of quasilocal observables.
The main point is a set of algebraic relations like the
canonical commutation relations (CCRs) and the
canonical anticommutation relations (CARs) for
the creation/annihilation operators: the observables
of $\mathcal{A}$, through the GNS (Gel'fand, Naimark, and
Segal) construction, may be represented as operators
on the appropriate Hilbert spaces, depending on the
chosen state; the representations, at variance with
the finite case, might be inequivalent. It is possible to
define the equilibrium state for the infinite system
and to incorporate in a natural way the possible
group invariance of the system ($R^d$ or $Z^d$, typically),
ending with a characterization of the pure phases of
the system as the ergodic components in the
decomposition of an equilibrium state. These states
have the property that coarse-grained observables
have sharp values (Ruelle 1969, Sewell 2002): if
$\mathrm{Av}_l(A)$ is the space average on scale l, that is, over
boxes of side l, for an ergodic state $\rho$,


$$ \lim_{l \to \infty} \rho\left( [\mathrm{Av}_l(A) - \rho(A)]^2 \right) = 0 $$


Another issue which is worth mentioning is the
characterization of equilibrium states through
the KMS (Kubo-Martin-Schwinger) condition. The
strong formal similarity between the finite-volume
quantum evolution operator $\alpha_t := \exp(-itH_\Lambda/\hbar)$
and the statistical equilibrium density operator
$\exp(-\beta H_\Lambda)$ leads to the identity, valid for any
pair of bounded observables A and B, using the
short symbol $\langle \cdot \rangle_{\beta,\Lambda}$ for the expectations with
respect to the statistical operator:


$$ \langle A_t B \rangle_{\beta,\Lambda} = \langle B A_{t + i\hbar\beta} \rangle_{\beta,\Lambda} $$ [18]

This relation is suitably extended for infinite size,
and therefore defines a KMS state; it implies some 
physically relevant properties like stability with 
respect to local disturbances and dissipativity 
(Sewell 2002). 
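For a finite system the KMS identity can be verified directly. The following sketch assumes a diagonal Hamiltonian with a few hypothetical energy levels, units with $\hbar = 1$, and the Heisenberg-evolved observable $A_t = e^{itH} A e^{-itH}$ continued to complex t; these conventions and all names are assumptions of the example.

```python
import cmath
import math


def heisenberg(a, energies, t):
    """A_t = exp(itH) A exp(-itH) for H = diag(energies); t may be complex."""
    n = len(energies)
    return [[cmath.exp(1j * t * (energies[j] - energies[k])) * a[j][k]
             for k in range(n)] for j in range(n)]


def matmul(x, y):
    """Plain matrix product of two nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def gibbs_expectation(x, energies, beta):
    """tr(exp(-beta H) X) / tr(exp(-beta H)) for diagonal H."""
    weights = [math.exp(-beta * e) for e in energies]
    return sum(w * x[j][j] for j, w in enumerate(weights)) / sum(weights)


def kms_sides(a, b, energies, beta, t):
    """Return both sides of <A_t B> = <B A_{t + i beta}>."""
    lhs = gibbs_expectation(matmul(heisenberg(a, energies, t), b), energies, beta)
    shifted = heisenberg(a, energies, t + 1j * beta)
    rhs = gibbs_expectation(matmul(b, shifted), energies, beta)
    return lhs, rhs


ENERGIES = [0.0, 1.3, 2.1]
A = [[1.0, 0.5j, 0.2], [-0.5j, 0.3, 1.1], [0.2, 1.1, -0.7]]
B = [[0.4, 1.0, 0.0], [2.0, -1.0, 0.3j], [0.1, 0.0, 0.9]]
lhs, rhs = kms_sides(A, B, ENERGIES, beta=0.7, t=0.4)
```

The two sides agree up to rounding for any choice of the matrices A and B, which reflects the cyclicity of the trace underlying the identity.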

A final issue in this section concerns another
formalism stemming from the Feynman path-integral
formulation of quantum mechanics: here a functional
integral represents the statistical equilibrium density
operator $W_\beta = \exp(-\beta H)$. For a d-dimensional
system of N particles ($X \in R^{dN}$) in a potential field
$V = V(X) = \sum_{i<j} v(x_i - x_j)$ and with Hamiltonian
$H = -(1/2)\Delta + V$, one has the Feynman-Kac formula which, for
a test function $\psi$, may be written as follows:


$$ (W_\beta \psi)(X) = \int P_{X,Y}(d\omega)\, \exp\left( -\int_0^\beta ds\, V(\omega(s)) \right) \psi(Y) $$


where $P_{X,Y}(d\omega)$ is the Wiener measure on the space
of paths $\{\omega(s), s \in [0, \beta]\}$. For details on the
construction and several other related features on the
treatment of the different statistics, see Glimm and
Jaffe (1981).


New Problems and Challenges 


In this final section, we recall some phenomena 
which have been observed recently in physics 
laboratories, and which presumably deserve con- 
siderable efforts to overcome the heuristic level of 
explanation. About this last point, it is worth 
quoting a method that has been used to get results 
even without clear justifications of the underlying 
hypotheses, that is, the mean-field procedure. It 
started with the Curie-Weiss theory of magnetism 
and is based on the following drastic simplification: 
the microscopic element of the system feels an 
average interaction field due to other elements, 
independently of the positions of the latter. This
method might provide relatively good results if the 
range of the interaction is very large, and in fact, a 
clear version with due limiting procedure was 
introduced by Kac, and applied by Lebowitz and 
Penrose in the 1960s for a microscopic derivation of 
van der Waals equation, and soon extended by Lieb 
to quantum systems. 
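The mean-field simplification can be made concrete with the Curie-Weiss self-consistency equation $m = \tanh(\beta J m + \beta h)$, which is the standard textbook form and is not written out in the text above. The sketch solves it by fixed-point iteration; the parameter names `beta_j` and `beta_h` are illustrative.

```python
import math


def curie_weiss_magnetization(beta_j, beta_h=0.0, m0=1.0, tol=1e-12,
                              max_iter=100000):
    """Iterate m -> tanh(beta_j * m + beta_h) until the change is below tol."""
    m = m0
    for _ in range(max_iter):
        m_new = math.tanh(beta_j * m + beta_h)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


m_high_temperature = curie_weiss_magnetization(beta_j=0.5)
m_low_temperature = curie_weiss_magnetization(beta_j=2.0)
```

In zero field the iteration collapses to m = 0 when $\beta J < 1$ and settles on a nonzero spontaneous magnetization when $\beta J > 1$, which is the mean-field picture of the phase transition.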

We will briefly outline some aspects of three
recent achievements of condensed matter physics for
which modeling is still in progress: Bose-Einstein
(B-E) condensation, high-$T_c$ superconductivity,
and the fractional quantum Hall effect.
The first consists in trapping an ultracold (at less
than 50 μK) dilute bosonic gas, for example,
$10^4$-$10^7$ atoms of $^{87}$Rb, finding experimental
evidence for Bose condensation. To understand the
properties of this system, an important tool is the
Gross-Pitaevskii energy functional for the
condensate wave function $\Phi$,


$$ E[\Phi] = \int dx \left( \frac{\hbar^2}{2m} |\nabla \Phi|^2 + V_{\rm ext}(x) |\Phi|^2 + \frac{g}{2} |\Phi|^4 \right) $$


where the quartic term represents the reduced 
(mean-field) interaction among particles. 

The second issue, that is, the high-temperature 
superconductivity, certainly deserves much atten- 
tion. It has been observed recently in some ceramic 
materials well above 100 K, and a clear model which 
takes into account the formation of pairs and the 
peculiar isotropy—anisotropy aspects of the normal 
conductivity and superconductivity is still lacking 
(Mattis 2003). 

Finally, let us consider the fractional quantum
Hall effect; recall that the integer version, that is, a
discretization of the Hall resistivity $R_H$ in integer
fractions of $h/e^2$, finds an explanation in terms of band
spectra, formation of magnetic Landau levels, and 
localization from surface impurities, that is, without 
taking into account direct interactions among 
electrons. 

The fractional discretization of $R_H$ (Stormer 1999)
has a theoretical interpretation, in terms of subtle 
collective behavior of the two-dimensional semicon- 
ductor electron system: the quasiparticles which 
represent the excitations may behave as composite 
fermions or bosons, or exhibit a fractional statistics 
(see Fractional Quantum Hall Effect). 

This brief excursion through these new fascinating
phenomena shows the rich interplay between theory
and experiments: these phenomena are a source of
new ideas and suggest new models for further
progress.


See also: Bose-Einstein Condensates; Dynamical
Systems and Thermodynamics; Exact Renormalization
Group; Falicov-Kimball Model; Fermionic Systems;


Introduction: From Periodic
to Quasiperiodic Systems


Periodic systems occur in many branches of physics.
Their mathematical analysis was stimulated in
particular by the analysis of the periodic translational
symmetry of crystals. The systematic study of
the compatibility between translational and
crystallographic point or reflection symmetry leads to the
concept of space group symmetry. Mathematical
crystallography in three dimensions (3D) culminated
in 1892 in the complete classification of the 230
space groups due to Fedorov and Schoenflies
(see Schwarzenberger (1980, pp. 132-135)). One
characteristic property of periodic systems is that
their Fourier transform has a pure point spectrum.
Since the Fourier spectrum is experimentally
accessible through diffraction experiments, it provides
a main tool for the structure determination of
crystals.

With quantum mechanics in the twentieth century,
it became possible to describe crystal structures
quantitatively as ordered systems of atomic nuclei
and electrons with electromagnetic interactions.
The representation theory of crystallographic
space groups now opened the way to verify the
space group symmetry of atomic systems, for
example from the band structure of crystals. It
was then believed that in physics atomic long-range
order is linked to periodicity and hence to
the paradigm of the 230 space groups in 3D.

Mathematical analysis beyond this paradigm 
started independently in various directions. Bohr 
(1925) studied quasiperiodic functions and their 
Fourier transform. He interpreted them as restric- 
tions of periodic functions in nD to their values on a 
linear subspace of orientation irrational with respect 
to a lattice. Mathematical crystallography in general 
dimension n > 3, including point group symmetry,
was started around 1949 in work by Hermann and 
by Zassenhaus (see Schwarzenberger (1980)), and 
completed in 1978 for n=4 in Brown et al. (1978). 
A different route was taken by Penrose (1974). He 
constructed an aperiodic tiling (covering without 
gaps or overlaps) of the plane. Its tiles in two 
rhombus shapes provide global 5-fold point symme- 
try and make the tiling incompatible with any 
periodic lattice in 2D. The connection between 
Penrose’s aperiodic tiling and irrational subspaces 
in periodic structures was made by de Bruijn (1981). 
He interpreted the Penrose rhombus tiling as the 
intersection of geometric objects from cells of a 
hypercubic lattice in 5D with a 2D subspace, 
irrational and invariant under 5-fold noncrystallo- 
graphic point symmetry. Kramer and Neri (1984) 
embedded the icosahedral group as a point group 
into the hypercubic lattice in 6D and constructed a 
3D irrational subspace invariant under the noncrys- 
tallographic icosahedral point group. From intersec- 
tions of boundaries of the hypercubic lattice cells 
with this subspace, they constructed a 3D tiling of 
global icosahedral point symmetry with two rhom- 
bohedral tiles. 

Shechtman et al. (1984) discovered in the system 
AlMn diffraction patterns of icosahedral point 
symmetry. Since icosahedral symmetry is incompa- 
tible with a lattice in 3D, they concluded that there 
exists atomic long-range order without a lattice. The 
new paradigm of quasiperiodic long-range order in 
quasicrystals was established and since then stimu- 
lated a broad range of theoretical and experimental 
research. 

The interplay between the notions (1) of
crystallographic symmetry in nD, n > 3, (2) of
subspaces invariant under a point group but
irrational and hence incompatible with a lattice,
and (3) of discrete geometric periodic objects in nD
providing quasiperiodic tilings on these subspaces
forms the mathematical basis for a new quasiperiodic
long-range order found in quasicrystals. The
present-day theory of quasicrystals offers the most
elaborate study of quasiperiodic systems. Therefore,
we shall focus in what follows on the concepts 
developed in this theory. 

In the following section, we briefly review basic 
concepts of periodic systems and lattices in nD, their 
classification in terms of point symmetry and space 
groups, and their cell structure. In a section on 
quasiperiodic point sets and functions, a quasiper- 
iodic system is taken as a geometric object on an 
irrational mD subspace in an n-dimensional space 
and lattice. Noncrystallographic point symmetry is 
shown to select the irrational subspace. Next, 
scaling symmetry in quasiperiodic systems is demon- 
strated. Then, examples of quasiperiodic systems 
with point and scaling symmetry are given. The 
penultimate section discusses quasiperiodic tilings 
and their windows. Finally, the notion of a funda- 
mental domain for quasiperiodic functions compa- 
tible with a tiling is illustrated. 


Concepts from Periodic Systems 


A distribution $f^p(x)$ of geometric objects on Euclidean
space $E^n$ (a real linear space equipped with
standard Euclidean scalar product $(\,,\,)$ and metric)
with coordinates $x \in E^n$ is called “periodic” if it is
invariant under translations $b^i$ in n linearly independent
directions,


$$ (p): \quad f^p(x + b^i) = f^p(x), \quad i = 1, \ldots, n $$ [1]


The set of all translations on $E^n$ forms the discrete
additive abelian translation group


$$ T = \{ b \in E^n : b = \sum_i m_i b^i, \ (m_1, \ldots, m_n) \in Z^n \} $$ [2]


Any orbit (set of all images of an initial point) under
the action $T \times E^n \to E^n$ yields a lattice $\Lambda$ on $E^n$.
Since T acts fixpoint-free, there is a one-to-one
correspondence $\Lambda \leftrightarrow T$. A fundamental domain on
$E^n$ is defined as a subset of points $x \in E^n$ which
contains a representative point from every orbit under
T. Such a fundamental domain can be chosen, for
example, as the unit cell of the lattice $\Lambda$ or as the
Voronoi cell (eqn [5]). By eqn [1], the functional
values on $E^n$ of a periodic function $f^p(x)$ are
completely determined from its values on a
fundamental domain of $E^n$.

Given the lattice basis $(b^1, \ldots, b^n)$ of eqn [2]
in $E^n$, the vector components of the basis form the
$n \times n$ basis matrix B of $\Lambda$. The most general change
of the basis preserving the lattice is given by acting
with any element of the general linear group
$Gl(n, Z)$, with integral matrix entries and
determinant $\pm 1$, on the lattice basis,


$$ Gl(n, Z) \ni h : B \to B' = Bh $$ [3]
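Why the determinant must be $\pm 1$ can be seen from the coordinates of a lattice point: a point with integer coordinates m with respect to B has coordinates $h^{-1} m$ with respect to $B' = Bh$, and these stay integral exactly when $h^{-1}$ is again an integer matrix. The following sketch checks this for n = 2 with exact rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction


def det2(h):
    """Determinant of a 2x2 matrix given as nested lists."""
    return h[0][0] * h[1][1] - h[0][1] * h[1][0]


def inverse2(h):
    """Exact inverse of an integer 2x2 matrix with nonzero determinant."""
    d = det2(h)
    return [[Fraction(h[1][1], d), Fraction(-h[0][1], d)],
            [Fraction(-h[1][0], d), Fraction(h[0][0], d)]]


def is_lattice_preserving(h):
    """True if the integer matrix h lies in Gl(2, Z), i.e. det h = +1 or -1."""
    integral = all(isinstance(x, int) for row in h for x in row)
    return integral and det2(h) in (1, -1)


def new_coordinates(h, m):
    """Coordinates of the point B m with respect to the basis B' = B h."""
    h_inv = inverse2(h)
    return [h_inv[0][0] * m[0] + h_inv[0][1] * m[1],
            h_inv[1][0] * m[0] + h_inv[1][1] * m[1]]


unimodular = [[2, 1], [1, 1]]       # determinant 1
non_unimodular = [[2, 0], [0, 1]]   # determinant 2, defines a sublattice basis
```

For the second matrix the new basis generates only a sublattice, which shows up as half-integer coordinates of some of the original lattice points.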


The crystallographic classification of inequivalent
lattices in $E^n$ starts from $Gl(n, Z)$. In addition to
translations, it employs crystallographic point
symmetry operations (Brown et al. 1978, p. 9). A
crystallographic point group operation of a lattice $\Lambda$
is a Euclidean isometry g which belongs to a group
$G \ni g$ with representations $D: G \to O(n, R)$ and
$\mathcal{D}: G \to Gl(n, Z)$ such that


$$ G = \{ g : D(g) B = B\, \mathcal{D}(g) \} $$ [4]


The maximal crystallographic point group for given
lattice $\Lambda$ is the holohedry of $\Lambda$. The group generated
by T, G is a space group which classifies the lattice.
For finer details in the classification of space groups,
we refer to Brown et al. (1978). For crystallography
in $E^3$, this classification yields 230 space groups.
Crystallography in $E^n$ is described in Schwarzenberger
(1980) and in Brown et al. (1978) where it is
elaborated for $E^4$.

From a lattice $\Lambda \in E^n$ and from the Euclidean metric,
one constructs a cell structure as follows: the Voronoi
cell V(b), centered at a lattice point $b \in \Lambda$, known in
physics as the Wigner-Seitz cell, is the set of points


$$ V(b) = \{ x \in E^n : |x - b| \le |x - b'|, \ b' \in \Lambda \} $$ [5]


Any Voronoi cell has a hierarchy of boundaries $X_p$
of dimension p, $0 \le p \le n$, which we denote as
p-boundaries.

The set of Voronoi cells at all lattice points forms
the $\Lambda$-periodic Voronoi complex of $\Lambda \in E^n$. The
Voronoi cells and complexes associated with a
lattice admit a notion of geometric duality. We
denote dual objects by a star, *. They are built from
convex hulls of sets of lattice points (Kramer and
Schlottmann 1989) as follows. A Voronoi p-boundary
$X_p$ is shared by several Voronoi cells V(b) and
determines a set of lattice points


$$ S(X_p) := \{ b \in \Lambda : X_p \in V(b) \} $$ [6]


The boundary dual to $X_p$ is defined as the convex
hull $X^*_{(n-p)} := \mathrm{conv}\{ b : b \in S(X_p) \}$. $X^*_{(n-p)}$ can be
shown to be an (n - p)-boundary of a dual
Delone cell. A Delone cell D is defined as the
convex hull of all lattice points whose Voronoi cells
share a single vertex, called a hole of the lattice.
Since these vertices fall into classes of orbits under
translations, they determine translationally
inequivalent classes of Delone cells $D^a, D^b, \ldots$.

Fourier analysis applied to a periodic function $f^p(x)$
on $E^n$ reduces to an n-fold Fourier series. The Fourier
spectrum is a pure point spectrum and the Fourier
coefficients can be referred to the points of a reciprocal
lattice $\Lambda^\circ$ (eqn [7]) in Fourier space $E^{\circ n}$. We denote
objects belonging to this Fourier space by the index $\circ$.
The basis matrix $B^\circ$ of the reciprocal lattice $\Lambda^\circ \in E^{\circ n}$
is obtained from B as the inverse transpose,


$$ (b^i, b^{\circ j}) = \delta_{ij}, \quad B^\circ = (B^{-1})^T $$ [7]


The values of the Fourier coefficients of $f^p(x)$ reduce
to integrals over the fundamental domain of the
lattice $\Lambda$. From eqns [4] and [7] it follows that the
orthogonal representation of a point group G in
coordinate and in Fourier space coincides. The
Fourier spectrum and its point symmetry in crystals
are observed in diffraction experiments.
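The relation between a lattice basis and its reciprocal basis can be checked on a planar example. The sketch below computes the inverse transpose of a 2 x 2 basis matrix whose columns are the basis vectors and verifies the pairing between the two bases; the hexagonal basis is a hypothetical example, and no factor of $2\pi$ is included, in line with the inverse-transpose convention stated above.

```python
import math


def reciprocal_basis(b):
    """Inverse transpose of a 2x2 basis matrix (columns are basis vectors)."""
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    inv = [[b[1][1] / det, -b[0][1] / det],
           [-b[1][0] / det, b[0][0] / det]]
    return [[inv[0][0], inv[1][0]],
            [inv[0][1], inv[1][1]]]


def pairing(b, b_rec):
    """Matrix of scalar products between column i of b and column j of b_rec."""
    return [[sum(b[r][i] * b_rec[r][j] for r in range(2)) for j in range(2)]
            for i in range(2)]


HEXAGONAL = [[1.0, 0.5],
             [0.0, math.sqrt(3) / 2]]
products = pairing(HEXAGONAL, reciprocal_basis(HEXAGONAL))
```

The matrix of scalar products is the identity, so each reciprocal basis vector is orthogonal to all but one of the direct basis vectors.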


Quasiperiodic Point Sets and Functions 


Quasiperiodic functions are characterized from their
Fourier spectrum (Bohr 1925) by


(qp°) The Fourier point spectrum of a quasiperiodic
function forms a Z-module $M^\circ$ of rank
n, n > m, on Fourier space $E^{\circ m}$.


A Z-module of rank n, n > m, on $E^{\circ m}$ is defined as a set

$$ M^\circ = \{ b^\circ = \sum_j m_j b^{\circ j}, \ (m_1, \ldots, m_n) \in Z^n \} $$ [8]


with the Z-module basis $(b^{\circ 1}, \ldots, b^{\circ n})$ linearly
independent with respect to integral linear combinations.
The step from a lattice $\Lambda^\circ$ to a module $M^\circ$ is
nontrivial since the set of all module points becomes
dense on $E^{\circ m}$. The Fourier coefficients of a
quasiperiodic function are assigned to the countable
set of module points (eqn [8]).

Bohr in his analysis of quasiperiodic functions
(Bohr 1925, II, pp. 111-125) shows that a general
Z-module $M^\circ$ of rank n can be taken as the
projection to a subspace $E^{\circ m}_\parallel$ of dimension m of a
(nonunique) lattice $\Lambda^\circ \in E^{\circ n}$, n > m. It is convenient
to consider in Fourier space $E^{\circ n}$ an orthogonal
splitting which we denote as


$$ E^{\circ n} = E^{\circ m}_\parallel + E^{\circ (n-m)}_\perp $$ [9]


A characterization of a quasiperiodic function
$f^{qp}(x)$ on coordinate space is obtained as follows.
From $\Lambda^\circ$ one can construct with the help of eqn [7]
the lattice $\Lambda := (\Lambda^\circ)^\circ$ reciprocal to $\Lambda^\circ$ on a
coordinate space $E^n$ and associate to it via the Fourier
series a quasiperiodic function on a coordinate
subspace $E^m_\parallel$ of $E^n = E^m_\parallel + E^{(n-m)}_\perp$, equipped with a
Z-module M (eqn [11]). As a result one finds a
characterization of a quasiperiodic function in
coordinate space:


(qp) A quasiperiodic function $f^{qp}(x_\parallel)$, $x_\parallel \in E^m_\parallel$, can
always be interpreted as the restriction to a
subspace $E^m_\parallel$ of a $\Lambda$-periodic function $f^p(x)$
on $E^n$,


$$ E^n = E^m_\parallel + E^{(n-m)}_\perp, \quad x = x_\parallel + x_\perp, $$
$$ f^{qp}(x_\parallel) := f^p(x_\parallel + c_\perp) $$ [10]


In the interpretation (qp) (eqn [10]), the Z-modules
in Fourier space (eqn [8]) and in coordinate space
$E^m_\parallel$ become projections of reciprocal lattices,


$$ M^\circ = \pi_\parallel(\Lambda^\circ), \quad M = \pi_\parallel(\Lambda) $$ [11]


The linear independence of the module basis
enforces a splitting (eqn [9]), irrational with respect
to the lattice $\Lambda^\circ \in E^{\circ n}$.

As in the classification of crystal lattices, point
symmetry plays a crucial role in the classification of
Z-modules for quasiperiodic systems like quasicrystals.
Noncrystallographic point groups G (with a
representation incompatible with any lattice) give
rise to quasiperiodic systems as follows:


(qp) Given a point group G with orthogonal
representations $D_\parallel: G \to O(m, R)$, $D_\perp: G \to O(n - m, R)$
such that $D_\parallel$ is incompatible with
any lattice in $E^m_\parallel$, we now require in $E^n$ instead
of eqn [10] a lattice $\Lambda$ with basis B and a
representation $\mathcal{D}: G \to Gl(n, Z)$ such that


$$ \begin{pmatrix} D_\parallel(G) & 0 \\ 0 & D_\perp(G) \end{pmatrix} B = B\, \mathcal{D}(G) $$ [12]

Equation [12] requires that the matrix B provides an
irrational reduction of the representation $\mathcal{D}(G)$ into
the two representations $D_\parallel(G)$, $D_\perp(G)$. Periodic
functions restricted as in the second line of eqn
[10] are quasiperiodic.

For any finite group G, a representation $\mathcal{D}(G)$ by
integral matrices allowing for lattice embedding can always be
constructed by the technique of induced representations.
Its reduction into representations $D_\parallel(G)$,
$D_\perp(G)$ contained in this induced representation is
obtained by standard techniques. If $D_\parallel(G)$ is
noncrystallographic and inequivalent to $D_\perp(G)$, the
subspace decomposition (eqn [12]) is unique.

Quasiperiodic functions compatible with tilings 
and their windows can be constructed from the dual 
cell structure (eqns [5] and [6]) of the embedding 
lattice (Kramer and Schlottmann 1989). Examples 
are given in the sections “Point symmetry in 
quasiperiodic systems” and “Quasiperiodic tilings 
and their windows”.


Scaling and Quasiperiodicity


Quasiperiodic systems lack periodicity but can have 
scaling symmetries originating from a non-Euclidean 
extension of eqn [12]. 


Example 1: Scaling in the Square Lattice $Z^2$


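We begin with the Fibonacci scaling on the square
lattice $Z^2$ of $E^2$. The symmetric matrix


$$ h = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \in Gl(2, Z) $$ [13]


has eigenvalues


$$ \lambda_1 = -\tau^{-1}, \quad \lambda_2 = \tau := (1 + \sqrt{5})/2 $$ [14]

Evaluation of the orthogonal eigenvectors allows us
to define a lattice basis $B = (b^1, b^2)$ and rewrite the
eigenvalue equation similar to eqn [12] as


$$ \begin{pmatrix} -\tau^{-1} & 0 \\ 0 & \tau \end{pmatrix} B = B h, \quad B = \begin{pmatrix} \sqrt{\frac{1}{-\tau + 3}} & -\sqrt{\frac{1}{\tau + 2}} \\ \sqrt{\frac{1}{\tau + 2}} & \sqrt{\frac{1}{-\tau + 3}} \end{pmatrix} $$ [15]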

This relation shows that h with respect to the basis
B acts as a non-Euclidean point symmetry of the
square lattice and generates an infinite discrete
group. Equation [15] provides an orthogonal
splitting $E^2 = E_\parallel + E_\perp$. The element h acts on the
two subspaces as a discrete linear scaling by
$-\tau^{-1}$, $\tau$, respectively. It maps points of $Z^2$ in $E^2$,
hence also their projections to $E_\parallel$, into one
another.
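The scaling relation can be verified numerically. The sketch below assumes the matrix $h$ with rows (0, 1) and (1, 1) and one possible orthonormal eigenbasis B built from the eigenvectors $(\tau, 1)$ and $(-1, \tau)$; both choices are assumptions of this example, since signs and ordering of the basis vectors are a matter of convention.

```python
import math

TAU = (1 + math.sqrt(5)) / 2
NORM = math.sqrt(TAU + 2)          # common length of (tau, 1) and (-1, tau)

H = [[0, 1], [1, 1]]               # integer scaling matrix in Gl(2, Z)
B = [[TAU / NORM, -1 / NORM],      # rows: parallel and perpendicular coordinates
     [1 / NORM, TAU / NORM]]       # columns: the basis vectors b^1, b^2
SCALING = [[-1 / TAU, 0.0], [0.0, TAU]]


def matmul(x, y):
    """Plain matrix product of two nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def max_abs_diff(x, y):
    """Largest entrywise deviation between two matrices of equal shape."""
    return max(abs(p - q) for r1, r2 in zip(x, y) for p, q in zip(r1, r2))


residual = max_abs_diff(matmul(SCALING, B), matmul(B, H))
gram = matmul([list(col) for col in zip(*B)], B)   # B^T B
```

The vanishing residual confirms that acting with h on the integer coordinates of a lattice point rescales its parallel coordinate by $-\tau^{-1}$ and its perpendicular coordinate by $\tau$, while the Gram matrix shows that B is an orthonormal basis of the square lattice.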

Figure 1 shows the lattice basis from eqn [15].
We choose as fundamental domain of $Z^2$ two
squares A, B whose boundaries are parallel or
perpendicular to $E_\parallel$. A horizontal line $E_\parallel$ intersects
these two squares at vertical distances varying with
respect to their horizontal boundaries. The
quasiperiodic restriction $f^{qp}(x_\parallel) = f^p(x_\parallel + c_\perp)$ of a
$Z^2$-periodic function $f^p(x)$ to a line $x = x_\parallel + c_\perp$
picks up varying functional values on these
sections. Clearly, one needs all the values of $f^p$ on its
fundamental domain in $E^2$ to obtain all the values
taken by $f^{qp}$.

Scaling symmetry appears in conjunction with 
noncrystallographic point symmetry (cf. the follow- 
ing section). Combined with quasiperiodic tilings, it 
gives rise to a hierarchy of self-similar tilings whose 
tiles scale with $\tau$.


Figure 1 The square lattice with Fibonacci scaling. Lattice
points are black squares, holes white circles. The vectors
$(b^1, b^2)$ indicate the lattice basis. The directions $x_\parallel$, $x_\perp$ of
scalings by $-\tau^{-1}$, $\tau$ run horizontally and vertically, respectively.
Perpendicular and parallel projections $V_\perp$ of Voronoi and $D_\parallel$ of
Delone cells are attached to the lattice and hole points,
respectively. Two different pairs of dual 1-boundaries $X_1$, $X_1^*$ of
Voronoi and Delone squares are marked on the right. The
product polytopes $X^*_{1\parallel} \times X_{1\perp}$ of their projections form two
squares A, B and yield a periodic tiling of $E^2$. A single pair A, B
forms a fundamental domain of the lattice. The characteristic
functions on A, B are windows for the tiles. A general
quasiperiodic function $f^{qp}(x_\parallel)$ is the restriction of a periodic
function $f^p(x)$, defined on A, B, to its values on a horizontal line
$x = x_\parallel + c_\perp$. If the periodic function $f^p(x)$ on A, B takes only
values independent of $x_\perp$, its quasiperiodic restriction $f^{qp}(x_\parallel) :=
f^p(x_\parallel + c_\perp)$ to this line repeats its values on the long and short
tiles $A_\parallel$, $B_\parallel$, respectively, of the standard Fibonacci tiling. Then
$A_\parallel$, $B_\parallel$ form a fundamental domain for quasiperiodic functions
compatible with the tiling.


Point Symmetry in Quasiperiodic 
Systems 


Quasiperiodic systems with noncrystallographic 
point symmetry provide the structure theory and 
physics of quasicrystals. We illustrate the general 
scheme (qp) of eqn [12] by examples of 5-fold and 
icosahedral point symmetry. For generalizations, see 
Janssen (1986). 


Example 2: 5-Fold Point Symmetry from
the Root Lattice $A_4$


The $A_4$ root lattice basis in $E^4$ may be derived (Baake
et al. 1990) from five orthonormal unit vectors
$(e^1, e^2, e^3, e^4, e^5)$ in $E^5$ as


$$ B = (b^1, b^2, b^3, b^4) = (e^1 - e^2, e^2 - e^3, e^3 - e^4, e^4 - e^5) $$ [16]


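As the generator of the cyclic group $C_5$ of 5-fold
rotations, we take the cyclic permutation (12345) in
cycle notation acting on the vectors $(e^1, e^2, e^3, e^4, e^5)$.
A possible choice of the basis for eqn [12] is the
irrational matrix


$$ B = \sqrt{\frac{2}{5}} \begin{pmatrix} 1 & c & c' & c' & c \\ 0 & s & s' & -s' & -s \\ 1 & c' & c & c & c' \\ 0 & s' & -s & s & -s' \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & -1 \end{pmatrix}, $$
$$ c = \cos\left(\frac{2\pi}{5}\right) = \frac{\tau - 1}{2}, \quad s = \sin\left(\frac{2\pi}{5}\right) = \frac{\sqrt{\tau + 2}}{2}, $$
$$ c' = \cos\left(\frac{4\pi}{5}\right) = -\frac{\tau}{2}, \quad s' = \sin\left(\frac{4\pi}{5}\right) = \frac{\sqrt{3 - \tau}}{2} $$ [17]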


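Equation [12] for the representation of the generator
(12345) of the cyclic group $C_5$ becomes


$$ \begin{pmatrix} c & -s & 0 & 0 \\ s & c & 0 & 0 \\ 0 & 0 & c' & -s' \\ 0 & 0 & s' & c' \end{pmatrix} (b^1, b^2, b^3, b^4) = (b^1, b^2, b^3, b^4) \begin{pmatrix} 0 & 0 & 0 & -1 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \end{pmatrix} $$ [18]


The left of eqn [18] generates two 2D inequivalent
representations of 5-fold planar rotations which are
incompatible with any 2D lattice.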

The lattice $A_4$ in addition has a scaling symmetry
with a factor $\tau$. The scaling transformation may be
expressed in terms of the basis (eqn [16]) and an
element $h \in Gl(4, Z)$ as


$$ \begin{pmatrix} -\tau & 0 & 0 & 0 \\ 0 & -\tau & 0 & 0 \\ 0 & 0 & \tau^{-1} & 0 \\ 0 & 0 & 0 & \tau^{-1} \end{pmatrix} (b^1, b^2, b^3, b^4) = (b^1, b^2, b^3, b^4) \begin{pmatrix} 0 & -1 & 0 & 1 \\ 0 & -1 & -1 & 1 \\ 1 & -1 & -1 & 0 \\ 1 & 0 & -1 & 0 \end{pmatrix} $$ [19]


It is easily verified that the operations of scaling and
of 5-fold rotation (eqns [19] and [18]) commute
with one another.
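The verification can be sketched numerically. The code below assumes the basis $b^i = e^i - e^{i+1}$, parallel and perpendicular coordinates of $e^j$ at angles $2\pi (j-1)/5$ and $4\pi (j-1)/5$ with the common factor $\sqrt{2/5}$, and the two integer matrices written as `G` (5-fold rotation) and `H` (scaling); these explicit forms are assumptions of this example and should be compared with eqns [16]-[19].

```python
import math

TAU = (1 + math.sqrt(5)) / 2


def matmul(x, y):
    """Plain matrix product of two nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def max_abs_diff(x, y):
    """Largest entrywise deviation between two matrices of equal shape."""
    return max(abs(p - q) for r1, r2 in zip(x, y) for p, q in zip(r1, r2))


def a4_basis():
    """4x4 matrix of parallel (rows 0-1) and perpendicular (rows 2-3) parts."""
    norm = math.sqrt(2 / 5)
    coords = [[norm * math.cos(2 * math.pi * j / 5) for j in range(5)],
              [norm * math.sin(2 * math.pi * j / 5) for j in range(5)],
              [norm * math.cos(4 * math.pi * j / 5) for j in range(5)],
              [norm * math.sin(4 * math.pi * j / 5) for j in range(5)]]
    differences = [[1, 0, 0, 0], [-1, 1, 0, 0], [0, -1, 1, 0],
                   [0, 0, -1, 1], [0, 0, 0, -1]]   # columns e^i - e^(i+1)
    return matmul(coords, differences)


def rotation_blocks():
    """Rotations by 2*pi/5 (parallel) and 4*pi/5 (perpendicular)."""
    c, s = math.cos(2 * math.pi / 5), math.sin(2 * math.pi / 5)
    c2, s2 = math.cos(4 * math.pi / 5), math.sin(4 * math.pi / 5)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, c2, -s2], [0, 0, s2, c2]]


G = [[0, 0, 0, -1], [1, 0, 0, -1], [0, 1, 0, -1], [0, 0, 1, -1]]
H = [[0, -1, 0, 1], [0, -1, -1, 1], [1, -1, -1, 0], [1, 0, -1, 0]]
SCALING = [[-TAU, 0, 0, 0], [0, -TAU, 0, 0],
           [0, 0, 1 / TAU, 0], [0, 0, 0, 1 / TAU]]

B = a4_basis()
rotation_residual = max_abs_diff(matmul(rotation_blocks(), B), matmul(B, G))
scaling_residual = max_abs_diff(matmul(SCALING, B), matmul(B, H))
commute = matmul(G, H) == matmul(H, G)
```

In this sketch H equals the sum of the second and third powers of G, which makes the commutation with the rotation evident.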


Example 3: Icosahedral Point Symmetry from
Lattices $\Lambda = Z^6$, $D_6$


The icosahedral group $G = H_3$ has two inequivalent
3D noncrystallographic representations. $H_3$ allows
for an induced embedding representation $\mathcal{D}: H_3 \to
Gl(6, Z)$ (Kramer and Neri 1984, Kramer et al.
1992, Kramer and Papadopolos 1997) into a
hypercubic lattice $\Lambda = Z^6$. This representation
reduces into two 3D orthogonal inequivalent
irreducible noncrystallographic representations $D_\parallel:
H_3 \to O(3, R)$, $D_\perp: H_3 \to O(3, R)$. The irrational
basis matrix of eqn [12] for $\Lambda = Z^6$ becomes
(Kramer et al. 1992, p. 185, eqn (7))


$$ B = (b^1, b^2, b^3, b^4, b^5, b^6) = \frac{1}{\sqrt{2(\tau + 2)}} \begin{pmatrix} 0 & 1 & \bar{1} & \bar{\tau} & 0 & \tau \\ 1 & \tau & \tau & 0 & \bar{1} & 0 \\ \tau & 0 & 0 & 1 & \tau & 1 \\ 0 & \tau & \bar{\tau} & 1 & 0 & \bar{1} \\ \tau & \bar{1} & \bar{1} & 0 & \bar{\tau} & 0 \\ \bar{1} & 0 & 0 & \tau & \bar{1} & \tau \end{pmatrix} $$ [20]

with $\bar{\tau} = -\tau$, $\bar{1} = -1$. The six basis vectors with
components in the upper three rows span the
so-called primitive icosahedral Z-module associated
with $D_\parallel$ in $E^3_\parallel$ in the sense of eqn [11]. In this
space they point along the directions of six 5-fold
axes of the icosahedron.
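Two properties of such a basis matrix can be checked directly: its six columns are orthonormal in $E^6$, and their parallel parts (upper three rows) point along six distinct 5-fold axes, any two of which enclose an angle with cosine $\pm 1/\sqrt{5}$. The sign pattern used in the sketch below is an assumption of this example; it is one consistent choice and may differ from the cited reference by sign conventions.

```python
import math

TAU = (1 + math.sqrt(5)) / 2


def icosahedral_basis():
    """6x6 basis matrix; rows 0-2 are parallel, rows 3-5 perpendicular parts."""
    t = TAU
    rows = [[0, 1, -1, -t, 0, t],
            [1, t, t, 0, -1, 0],
            [t, 0, 0, 1, t, 1],
            [0, t, -t, 1, 0, -1],
            [t, -1, -1, 0, -t, 0],
            [-1, 0, 0, t, -1, t]]
    norm = math.sqrt(2 * (t + 2))
    return [[entry / norm for entry in row] for row in rows]


def column(matrix, j, rows):
    """Entries of column j restricted to the given row indices."""
    return [matrix[r][j] for r in rows]


def dot(u, v):
    """Euclidean scalar product."""
    return sum(a * b for a, b in zip(u, v))


B = icosahedral_basis()
gram = [[dot(column(B, i, range(6)), column(B, j, range(6)))
         for j in range(6)] for i in range(6)]
parallel = [column(B, j, range(3)) for j in range(6)]
cosines = [dot(parallel[i], parallel[j])
           / math.sqrt(dot(parallel[i], parallel[i]) * dot(parallel[j], parallel[j]))
           for i in range(6) for j in range(i + 1, 6)]
```

The Gram matrix is the identity, so B is an orthonormal basis of the hypercubic lattice, and each parallel projection has squared length 1/2.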

A second lattice in $E^6$ which admits icosahedral
point symmetry is the root lattice $D_6$. The basis of
this root lattice, often denoted as the P-lattice, is
obtained from eqn [20] by a centering matrix
given in Kramer et al. (1992, p. 185, eqn (8)). The
corresponding Z-module is inequivalent to the
module projected from eqn [20]. The third
lattice of icosahedral point symmetry in $E^6$ is
$\Lambda = I := P^\circ$ reciprocal to the root lattice $D_6$. All
three icosahedral modules admit (powers of)
$\tau$-scaling.


Quasiperiodic Tilings and Their Windows 


Quasiperiodic sets of points arise from the general
scheme (qp) (eqn [12]) by choosing particular
periodic functions in the embedding space $E^n$, called
the “windows,” whose intersections with $E_\parallel$ are the
quasiperiodic sets of points.

The window for the construction of a discrete
quasiperiodic point set based on eqn [12] is given by
the characteristic function $\chi(x_\perp)$ on the projection
$V_\perp(x_\perp) := \pi_\perp(V(b))$ of the Voronoi cell (eqn [5]),
attached to any lattice point $b \in \Lambda$.


Example 4: The Quasiperiodic Fibonacci Point Set


If in the Fibonacci system (Figure 1), one attaches to
any point b of the square lattice as a window the
characteristic function $\chi$ of the perpendicular
projection $V_\perp(b)$ of the unit square attached to b, the
function $f^{qp}(x_\parallel)$ becomes the standard quasiperiodic
Fibonacci sequence of points.
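The window construction can be sketched as a short computation. The code below assumes parallel and perpendicular coordinates $x_\parallel = (\tau m_1 - m_2)/\sqrt{\tau + 2}$ and $x_\perp = (m_1 + \tau m_2)/\sqrt{\tau + 2}$ for a lattice point $(m_1, m_2)$, which is one possible orientation of the two eigendirections; the cutoff `length` and the offset `c_perp` are hypothetical parameters of the example.

```python
import math

TAU = (1 + math.sqrt(5)) / 2
NORM = math.sqrt(TAU + 2)   # length of the eigenvectors (tau, 1) and (-1, tau)


def fibonacci_points(length=30.0, c_perp=0.0):
    """Parallel coordinates of the lattice points selected by the window.

    A point (m1, m2) of Z^2 is kept if the line x_perp = c_perp passes through
    the perpendicular projection of the unit square centred at that point.
    """
    half_window = (1 + TAU) / (2 * NORM)
    m_max = math.ceil(TAU * (length + half_window + abs(c_perp)) / NORM) + 1
    points = []
    for m1 in range(-m_max, m_max + 1):
        for m2 in range(-m_max, m_max + 1):
            x_par = (TAU * m1 - m2) / NORM
            x_perp = (m1 + TAU * m2) / NORM
            if abs(x_par) <= length and abs(x_perp - c_perp) < half_window:
                points.append(x_par)
    return sorted(points)


def tile_lengths(points):
    """Distances between neighbouring points of the sorted sequence."""
    return [b - a for a, b in zip(points, points[1:])]


points = fibonacci_points()
gaps = tile_lengths(points)
long_tile, short_tile = TAU / NORM, 1 / NORM
n_long = sum(1 for g in gaps if abs(g - long_tile) < 1e-9)
n_short = sum(1 for g in gaps if abs(g - short_tile) < 1e-9)
```

Only two distances between neighbouring points occur, with length ratio $\tau$, and the long intervals outnumber the short ones in a ratio that approaches $\tau$ as the sampled segment grows.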

The dual cell geometry of Voronoi and Delone
cells and their dual boundaries (eqns [5] and [6])
allows us to construct dual canonical quasiperiodic
tilings $(\mathcal{T}, \Lambda)$, $(\mathcal{T}^*, \Lambda)$ (Kramer and Schlottmann
1989). To this end one constructs from local
projections of pairs of dual boundaries $X_{m\parallel}$, $X^*_{(n-m)\perp}$ or
$X^*_{m\parallel}$, $X_{(n-m)\perp}$ the direct product polytopes
$X_{m\parallel} \times X^*_{(n-m)\perp}$ or $X^*_{m\parallel} \times X_{(n-m)\perp}$ called “klotz polytopes.” The
characteristic functions on these polytopes form the
windows for the tiles $X_{m\parallel}$, $X^*_{m\parallel}$, respectively.


Example 5: The Quasiperiodic Fibonacci Tiling 


The Voronoi cells V of the square lattice are squares
centered at lattice points, the Delone cells D are
squares centered at the vertices of Voronoi squares.
The product polytopes $X^*_{1\parallel} \times X_{1\perp}$ from projections
of dual 1-boundaries $X^*_1$, $X_1$ of Delone and Voronoi
squares (cf. Figure 1) become the two types of square
windows A, B. If a parallel line section $x = x_\parallel + c_\perp$
crosses one of these squares, the tile $A_\parallel$ or $B_\parallel$ is
formed. The standard Fibonacci tiling results.


Example 6: Canonical Tilings from the Root
Lattices $A_4$, $D_6$


The two rhombus tiles of the planar quasiperiodic
Penrose pattern (Penrose 1974) $(\mathcal{T}, A_4)$ are the
projections of 2-boundaries of the Voronoi complex of the
root lattice $A_4 \in E^4$ (Baake et al. 1990). The triangle
tiles of the dual tiling $(\mathcal{T}^*, A_4)$ are shown in Figure 2.
They are projections of 2-boundaries from the
Delone complex of the same lattice. A full analysis
of dual Voronoi and Delone boundaries of the root
lattice $D_6$ is given in Kramer et al. (1992). It leads to
icosahedral tilings $(\mathcal{T}, D_6)$ and $(\mathcal{T}^*, D_6)$ of $E^3$ (Kramer
et al. 1992, Kramer and Papadopolos 1997, Kramer
and Schlottmann 1989) and to models of icosahedral
quasicrystals.


Fundamental Domains for Quasiperiodic 
Tilings 


Canonical tilings allow us to construct quasiperiodic
functions equipped with a quasiperiodic counterpart
of fundamental domains or cells in crystals: assume
that the tiles of a tiling $(\mathcal{T}, \Lambda)$ all are translates in $E^m_\parallel$
of a finite minimal set of prototiles $(X^1, \ldots, X^r)$.
Consider the class of quasiperiodic functions which
take identical values on any translate of a prototile.
These values are prescribed on the finite set of
prototiles in $E^m_\parallel$ which define a fundamental domain
for this class of quasiperiodic functions. Only this
class of quasiperiodic functions is compatible with
the tiling. It can be characterized in the scheme (qp)
(eqn [12]) by $\Lambda$-periodic functions on $E^n$ whose
values on the tile windows of the previous section
are independent of the perpendicular coordinate. A
fundamental domain for the triangle tiling $(\mathcal{T}^*, A_4)$
is given by the shaded parts in Figure 2. The
fundamental domain property appears in relation
with the theory of covering of quasiperiodic sets (see
Kramer and Papadopolos (2000)).

Figure 2 A patch of the planar quasiperiodic triangle tiling
$(\mathcal{T}^*, A_4)$ obtained from the root lattice $A_4 \in E^4$. The tiles are two
triangles, projections of 2-boundaries from the Delone cells of
$A_4$. The vertices are projections of lattice points. The 20 shaded
triangles form a set of prototiles such that any other tile is a
translate of one of them. The shaded set forms a fundamental
domain for the tiling.


Example 7: Fundamental Domain 
for the Fibonacci Tiling 


Attach to the squares A, B in Figure 1 a periodic
function $f^p(x)$ with functional values independent of
the perpendicular coordinate $x_\perp$ within the two
squares. Consider the functional values $f^{qp}(x_\parallel) =
f^p(x_\parallel + c_\perp)$ picked up on a parallel line. Clearly,
these values become independent of the perpendicular
coordinate of any intersection with a square
A, B. The general prescription of values on a
fundamental domain of $\Lambda \subset E^2$ needed for a
quasiperiodic function reduces to a prescription of
its functional values in $E_\parallel$ on the fundamental
domain formed by the two prototiles $A_\parallel$, $B_\parallel$.


Conclusion 


For quasiperiodic systems, the general construction 
was introduced in the section “Quasiperiodic point 
sets and functions”, and illustrations were given in 
four subsequent sections. Further reading resources are 
provided by the references given at the end. Here, we 
mention some of the many possible generalizations. 

Bohr (1925) considers quasiperiodic systems as special
cases of almost periodic systems. The module of an 
almost periodic function has a countable basis. 

Moody (1997) discusses the notion of Meyer sets. 
These describe discrete sets on locally compact 
abelian groups and as particular cases encompass 
quasiperiodic systems. 

Lagarias (2000) studies aperiodic sets character- 
ized by the following properties, shared with 
periodic and quasiperiodic sets: 


(ap1): inequivalent patches of points are volume 
bounded, 

(ap2): pure point Fourier spectrum, 

(ap3): linear repetitivity of patches, and 

(ap4): self-similarity. 


See also: Compact Groups and Their Representations; 
Finite Group Symmetry Breaking; Lie Groups: General 
Theory; Localization for Quasiperiodic Potentials; 
Symmetries and Conservation Laws; Symmetry and 
Symmetry Breaking in Dynamical Systems.
Determinants in Finite Dimensions


The determinant of a linear transformation
$A : V \to W$ acting between finite-dimensional complex
vector spaces is an element $\det A$ of a complex
line $L_A$. The abstract element $\det A$ is called the
Quillen determinant of A, and the complex line $L_A$
is called the determinant line of A. A choice of
(linear) isomorphism

$$\phi : L_A \to \mathbb{C} \qquad [1]$$

associates to $\det A$ the complex number

$$\det\nolimits_\phi A := \phi(\det A) \in \mathbb{C} \qquad [2]$$

which can equivalently be written as the ratio

$$\det\nolimits_\phi A = \frac{\det A}{\phi^{-1}(1)} \qquad [3]$$

taken in the one-dimensional complex vector space
$L_A$ relative to the canonical generator $\phi^{-1}(1)$. It is
not necessarily the case that $\det A$ determines a
generator for $L_A$; specifically, if $\dim V = m$ and
$\dim W = n$, then $\det A = 0$ if $m \neq n$ (by “fiat”), while
if $m = n$, then $\det A = 0$ precisely when A is not
invertible. For the moment, set $m = n$.

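For $k \in \{0, 1, \ldots, n\}$ the kth exterior power operator
is defined by

$$\wedge^k A : \wedge^k V \to \wedge^k W, \qquad
\wedge^k A(v_1 \wedge v_2 \wedge \cdots \wedge v_k) = Av_1 \wedge Av_2 \wedge \cdots \wedge Av_k \qquad [4]$$

where $v_1, \ldots, v_k \in V$ and $\wedge^0 V := \mathbb{C}$ and $\wedge^0 A := 1$.
When $k = n$, $\mathrm{Det}\,V := \wedge^n V$ and $\mathrm{Det}\,W := \wedge^n W$ are
complex lines and the determinant line of A is

$$L_A := \mathrm{Det}\,V^* \otimes \mathrm{Det}\,W \qquad [5]$$

while for any basis $\{e_1, \ldots, e_n\}$ for V, with dual basis
$\{e_1^*, \ldots, e_n^*\}$ for $V^*$,

$$\det A := e_1^* \wedge \cdots \wedge e_n^* \otimes (\wedge^n A)(e_1 \wedge \cdots \wedge e_n) \in L_A \qquad [6]$$

There is a canonical isomorphism for $A \in \mathrm{Hom}(V, W)$,
$B \in \mathrm{Hom}(U, V)$

$$L_{AB} \cong L_A \otimes L_B \qquad [7]$$

coming from the isomorphism

$$\mathrm{Det}\,V^* \otimes \mathrm{Det}\,V \to \mathbb{C} \qquad [8]$$

defined by the canonical pairing $\mathrm{Det}\,V^* \times \mathrm{Det}\,V \to \mathbb{C}$,
and this preserves the determinant elements

$$\det(AB) \longleftrightarrow \det A \otimes \det B \qquad [9]$$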


The Classical Determinant 


When $V = W$ these constructions take on a more
familiar form. Then $\phi$ can be chosen to be the
canonical isomorphism [8] and evaluation on
$\det A \in L_A$ outputs the classical determinant

$$\det\nolimits_C A = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)} \qquad [10]$$

where the sum is over permutations $\sigma$ of $\{1, \ldots, n\}$ and
$(a_{ij})$ is the matrix of A with respect to any basis of V;
changing the basis may change the summands on the
right-hand side of [10], but not their sum. It is
fundamental that when $V = W$ the classical determinant
is an intrinsic invariant of the operator A, independent
of the choice of basis for V; when $V \neq W$ that
is no longer so since there is then no canonical bilinear
pairing $\mathrm{Det}\,V^* \times \mathrm{Det}\,W \to \mathbb{C}$; the choice of a
nondegenerate pairing is equivalent to a choice of $\phi$ in [1].

The identification of [10] from [6] and [8]
amounts to the identity in $\mathrm{Det}\,V$

$$(\wedge^n A)(e_1 \wedge \cdots \wedge e_n) = \det\nolimits_C A \cdot e_1 \wedge \cdots \wedge e_n \qquad [11]$$

Since $\wedge^n(AB) = \wedge^n A \circ \wedge^n B$, [11] in turn implies the
characterizing multiplicativity property of the classical
determinant

$$\det\nolimits_C(AB) = \det\nolimits_C A \cdot \det\nolimits_C B \qquad [12]$$

for $A, B \in \mathrm{End}(V)$, specializing the general fact in
[7]. Similarly, the group $\mathrm{Gl}(V, \mathbb{C})$ of invertible
elements of $\mathrm{End}(V)$ is identified with those A with
$\det_C A \neq 0$.
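
The permutation sum [10] and the multiplicativity property [12] can be checked directly on small matrices. The following is a minimal sketch in exact rational arithmetic; the helper names `sign`, `det_leibniz`, and `matmul` and the sample matrices are illustrative choices, not part of the article.

```python
from fractions import Fraction
from itertools import permutations

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images."""
    n = len(perm)
    inversions = sum(
        1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def det_leibniz(a):
    """Classical determinant as the signed sum over permutations, as in [10]."""
    n = len(a)
    total = Fraction(0)
    for perm in permutations(range(n)):
        term = Fraction(sign(perm))
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
B = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]

# Multiplicativity [12]: det(AB) = det(A) * det(B)
print(det_leibniz(A), det_leibniz(B), det_leibniz(matmul(A, B)))
```

The sum has n! terms, so this form is useful for checking identities on small examples rather than for computation.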

The classical determinant can also be thought of
in the following ways. First, the direct sum of the
operators defined in [4] yields the total exterior
power operator $\wedge A : \wedge V \to \wedge V$ on the exterior
algebra $\wedge V = \bigoplus_{k=0}^{n} \wedge^k V$ and this has trace

$$\mathrm{tr}(\wedge A) = \det\nolimits_C(I + A) \qquad [13]$$

where I is the identity. Alternatively, one can do
something a little more sophisticated and use the
holomorphic functional calculus to define the
logarithm $\log_\theta B$ of $B \in \mathrm{End}(V)$ by

$$\log_\theta B = \frac{i}{2\pi} \int_{\Gamma_\theta} \log_\theta \lambda \, (B - \lambda I)^{-1} \, d\lambda \qquad [14]$$

Here $\log_\theta \lambda$ is the branch of the complex logarithm
defined by $\theta - 2\pi < \arg(\lambda) < \theta$ and $\Gamma_\theta$ is a positively
oriented contour enclosing $\mathrm{spec}(B)$ but not any point
of the spectral cut $R_\theta = \{re^{i\theta} \mid r \geq 0\}$. Then, if B is
invertible,

$$\mathrm{tr}(\log_\theta B) = \log_\theta \det\nolimits_C B \qquad [15]$$
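
Formula [13] can be verified by hand on a small matrix, using the standard fact that the trace of the kth exterior power equals the sum of the principal k-by-k minors. The sketch below does this in integer arithmetic; the function names and the sample matrix are illustrative assumptions.

```python
from itertools import combinations

def det(matrix):
    """Determinant by cofactor expansion along the first row (small n only)."""
    n = len(matrix)
    if n == 0:
        return 1  # determinant of the empty matrix, matching wedge^0 A = 1
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * matrix[0][j] * det(minor)
    return total

def trace_exterior_power(matrix, k):
    """tr(wedge^k A), computed as the sum of the principal k x k minors."""
    n = len(matrix)
    return sum(
        det([[matrix[i][j] for j in subset] for i in subset])
        for subset in combinations(range(n), k)
    )

def trace_total_exterior_power(matrix):
    """tr(wedge A), the sum over k = 0..n of tr(wedge^k A)."""
    return sum(trace_exterior_power(matrix, k) for k in range(len(matrix) + 1))

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
identity_plus_a = [
    [A[i][j] + (1 if i == j else 0) for j in range(3)] for i in range(3)
]

# Both sides of [13]
print(trace_total_exterior_power(A), det(identity_plus_a))
```

For this matrix the traces of the exterior powers are 1, 9, 24 and 18, and their sum agrees with the determinant of I + A.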


The Fredholm Determinant 


The advantage of the constructions [13] and [15] is
that they extend to a restricted class of bounded linear
operators on infinite-dimensional Hilbert spaces. This
is consequent on the fact that both of the formulas [13]
and [15] are computed as operator traces.

(Recall that a trace on a Banach algebra B is a
linear functional $\tau : B \to \mathbb{C}$ which has the property
$\tau([a, b]) = 0$ for all a, b in B, where $[a, b] := ab - ba$
is defined by the product structure on B. Since one
can define the logarithm $\log_\theta b$ of an element b of B
with spectral cut $R_\theta$ by the formula [14], one in this
case obtains a determinant $\det_{\tau,\theta}(b)$ on such
elements by setting

$$\log_\theta \det\nolimits_{\tau,\theta}(b) = \tau(\log_\theta b) \qquad [16]$$

If $a, b, ab \in B$ have common spectral cut $\theta$, the trace
property of $\tau$ translates into the multiplicativity
property $\det_{\tau,\theta}(ab) = \det_{\tau,\theta}(a) \det_{\tau,\theta}(b)$ via a version
of the Campbell-Hausdorff formula.)

The operator trace arises as follows. Let H be a
complex separable Hilbert space with inner product
$\langle\,\cdot\,,\cdot\,\rangle$, let $C(H)$ be the algebra of compact operators
on H, and let

$$L_1 = \Big\{ A \in C(H) \;\Big|\; \|A\|_1 := \sum_{i=1}^{\infty} \mu_i(A^*A)^{1/2} < \infty \Big\} \qquad [17]$$

be the ideal of trace-class operators, where the sum
is over the real discrete eigenvalues $\mu_i(A^*A) \searrow 0$ of
the compact self-adjoint operator $A^*A$. For any
orthonormal basis $\{\eta_j\}$ of H the map

$$\mathrm{tr} : L_1 \to \mathbb{C}, \qquad A \mapsto \mathrm{tr}(A) := \sum_j \langle A\eta_j, \eta_j \rangle$$

is a trace functional on $L_1(H)$, independent of the
choice of basis. Lidskii’s theorem states that

$$\mathrm{tr}(A) = \sum_{\lambda \in \mathrm{spec}(A)} \lambda \qquad [18]$$

with the sum over the eigenvalues of A counted up
to algebraic multiplicity; for general trace-class
operators this equality is highly nontrivial.

If A is trace class, then for each non-negative
integer k so is each of the exterior power operators
$\wedge^k A : \wedge^k H \to \wedge^k H$, defined as in [4]. Following
[13], a determinant can therefore be defined on the
semigroup $I + L_1 := \{I + A \mid A \in L_1\}$ of determinant-class
operators by the absolutely convergent sum

$$\det\nolimits_F(I + A) := \mathrm{tr}(\wedge A) = 1 + \sum_{k=1}^{\infty} \mathrm{tr}(\wedge^k A) \qquad [19]$$

On the other hand, since tr is tracial and $\log_\theta(I + A)$
defined by [14] is trace class, then according to [16],
there is a determinant given on invertible determinant-class
operators by

$$\log_\theta \det\nolimits_F(I + A) = \mathrm{tr}(\log_\theta(I + A)) \qquad [20]$$

which, as the left-hand side already suggests,
coincides with the Fredholm determinant.
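
To see the two descriptions [19] and [20] agree numerically, one can take a hypothetical diagonal trace-class operator A with eigenvalues 1/k² (this choice, the truncation levels, and all names below are assumptions made for the illustration). For a diagonal operator, the trace of the kth exterior power is the kth elementary symmetric function of the eigenvalues, and the infinite product of the factors (1 + 1/k²) has the classical closed form sinh(π)/π.

```python
import math

def elementary_symmetric(values, max_degree):
    """Return [e_0, ..., e_max_degree] of the given numbers by dynamic programming."""
    e = [1.0] + [0.0] * max_degree
    for x in values:
        for k in range(max_degree, 0, -1):
            e[k] += x * e[k - 1]
    return e

N_MODES = 5000      # truncation of the spectrum (illustrative)
MAX_DEGREE = 30     # truncation of the series [19] (illustrative)
eigenvalues = [1.0 / (k * k) for k in range(1, N_MODES + 1)]

# traces[k] plays the role of tr(wedge^k A) for the truncated operator
traces = elementary_symmetric(eigenvalues, MAX_DEGREE)
det_series = sum(traces)                                       # series form [19]
det_log = math.exp(sum(math.log1p(x) for x in eigenvalues))    # trace-log form [20]
det_exact = math.sinh(math.pi) / math.pi                       # closed-form product

print(det_series, det_log, det_exact)
```

The series and the trace-log values agree to rounding error, and both differ from the closed form only by the spectral truncation, which is of relative size about 1/N_MODES.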

The Fredholm determinant retains the characterizing
properties of the classical determinant in finite
dimensions, that $\det_F : I + L_1 \to \mathbb{C}$ is multiplicative,

$$\det\nolimits_F((I + A)(I + B)) = \det\nolimits_F(I + A) \det\nolimits_F(I + B), \qquad A, B \in L_1 \qquad [21]$$

and $\det_F(I + A) \neq 0$ if and only if $I + A$ is invertible.
It is, moreover, essentially unique; any other multiplicative
functional on $I + L_1$ is equal to some power
of the Fredholm determinant, or, equivalently, any
trace on $L_1$ is a constant multiple of the operator
trace. The trace property of the operator trace and the
multiplicativity of the Fredholm determinant do not,
however, persist to any functional extension of the
operator trace (resp. Fredholm determinant) on the
space of pseudodifferential operators of any real
order acting on function spaces (fields over space-time).
In quantum physics, this is a primary cause of
anomalies. More precisely, determinants of differential
operators arise in quantum field theories (QFTs)
and string theory through the formal evaluation of
their defining Feynman path integrals and the
calculation of certain stable quantum numbers,
which are in some sense “topological.”

From the latter perspective, it is instructive to be
aware also of the following, third, construction of the
Fredholm determinant, which equates the existence
of a nontrivial determinant to the existence of
nontrivial topology of the general linear group.
First, in a surprising contrast to $\mathrm{Gl}(n, \mathbb{C})$, the general
linear group $\mathrm{Gl}(H)$ of an infinite-dimensional Hilbert
space H with the norm topology is contractible, and
hence topologically trivial. By transgression properties
in cohomology, this implies any vector bundle
with structure group $\mathrm{Gl}(H)$ is isomorphic to the
trivial bundle. In order to recapture some topology
(and hence, in applications, some physics), it is
necessary to reduce to certain infinite-dimensional
subgroups of $\mathrm{Gl}(H)$. The most obvious one is the
group $\mathrm{Gl}(\infty)$ of invertible operators differing from
the identity by an operator of finite rank. As the
inductive limit of the $\mathrm{Gl}(n, \mathbb{C})$, the cohomology and
homotopy groups of $\mathrm{Gl}(\infty)$ are a stable version of
those of $\mathrm{Gl}(n, \mathbb{C})$. Precisely, $\mathrm{Gl}(\infty)$ is torsion free and
its cohomology ring is an exterior algebra with odd
degree generators, while Bott (1959) periodicity
identifies $\pi_k(\mathrm{Gl}(\infty))$ to be isomorphic to $\mathbb{Z}$ if k is
odd and trivial if k is even. Topologically, it is
preferable to consider the closure of $\mathrm{Gl}(\infty)$ in $\mathrm{Gl}(H)$,
which yields the group $\mathrm{Gl}_{cpt}(H)$ of operators differing
from the identity by a compact operator, but this is
now a little “too large” for analysis and differential
geometry. Given our earlier comments, there is an
intermediate natural choice of the Banach Lie group
$\mathrm{Gl}_1(H)$ of operators differing from the identity by a
trace-class operator (in fact, there is a tower of such
Schatten class groups). Moreover, the inclusions
$\mathrm{Gl}(\infty) \subset \mathrm{Gl}_1(H) \subset \mathrm{Gl}_{cpt}(H)$ are homotopy equivalences,
and so the cohomology of $\mathrm{Gl}_1(H)$ is just the
exterior algebra mentioned above

$$H^*(\mathrm{Gl}_1(H)) = \wedge(\omega_1, \omega_3, \omega_5, \ldots), \qquad \deg \omega_{2j-1} = 2j - 1 \qquad [22]$$

The advantage of considering $\mathrm{Gl}_1(H)$ is that precise
analytical representatives for the classes $\omega_{2j-1}$ can be
written down:

where

$$\Theta = \mathrm{tr}(Z^{-1} dZ) \qquad [23]$$

is the 1-form on $\mathrm{Gl}_1(H)$.

This equation makes sense because the derivative
$dZ$ is trace class, and hence so is $Z^{-1}dZ$. Now,
locally $\Theta = d \log \det_F(Z)$, so that the integral of the 1-form $\omega_1$
pulled back by a loop $\sigma : S^1 \to \mathrm{Gl}_1(H)$ is precisely the
winding number of the curve traced out in $\mathbb{C}^*$ by the
function $\det_F(\sigma)$. In fact, this is just a special case of
the Bott periodicity theorem, which tells us that the
stable homotopy group $\pi_{2j-1}(\mathrm{Gl}_1(H))$ is isomorphic to
$\mathbb{Z}$ and an isomorphism is defined by assigning to a map
$f : S^{2j-1} \to \mathrm{Gl}_1(H)$ the integer $\int_{S^{2j-1}} f^*\omega_{2j-1} \in \mathbb{Z}$ (it is not
obvious a priori that it is an integer).
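
The winding number of the determinant along a loop can be illustrated in the finite-dimensional subgroup Gl(2, C), which sits inside Gl(∞). The sketch below accumulates the phase increments of the determinant along a discretized loop, a discrete stand-in for integrating d log det; the particular loop, the step count, and the function names are assumptions made for the example.

```python
import cmath
import math

def det2(z):
    """Determinant of a 2 x 2 complex matrix given as a list of rows."""
    return z[0][0] * z[1][1] - z[0][1] * z[1][0]

def loop(t):
    """A closed path in Gl(2, C): a rotation times diag(exp(6*pi*i*t), 1)."""
    c, s = math.cos(2 * math.pi * t), math.sin(2 * math.pi * t)
    u = cmath.exp(6j * math.pi * t)
    return [[c * u, -s], [s * u, c]]

def winding_number(path, steps=400):
    """Winding number about 0 of t -> det(path(t)), for t running over [0, 1]."""
    total = 0.0
    previous = det2(path(0.0))
    for j in range(1, steps + 1):
        current = det2(path(j / steps))
        # phase increment between consecutive samples, assumed smaller than pi
        total += cmath.phase(current / previous)
        previous = current
    return round(total / (2 * math.pi))

print(winding_number(loop))
```

Here the determinant of `loop(t)` is exp(6πit), so the curve it traces winds three times around the origin, while a constant loop has winding number zero.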

Notice that it was not necessary to have mentioned
the Fredholm determinant of Z at this point. Indeed,
the third definition of the Fredholm determinant is to
see it as the integral of the 1-form $\Theta$: define

$$\log_\theta \det\nolimits_F(I + A) := \int_\gamma \Theta \qquad [24]$$

where $\gamma : [0, 1] \to \mathrm{Gl}_1(H)$ is any path with $\gamma(0) = I$
and $\gamma(1) = I + A$; this uses the connectedness of
$\mathrm{Gl}_1(H)$ and independence of the choice of $\gamma$, as
guaranteed by Bott periodicity.

Interestingly, this is closely tied in with the
Atiyah-Singer index theorem for elliptic pseudodifferential
operators (which in full generality uses the
Bott periodicity theorem). Here, there is the following
simple but quintessential version of that theorem
which links it to the winding number of the
determinant of the symbol of a differential operator

$$D = \sum_{|\alpha| \leq m} a_\alpha(x) D_x^\alpha \qquad [25]$$

on Euclidean space $\mathbb{R}^n$ with $\alpha = (\alpha_1, \ldots, \alpha_n)$ a multi-index
of non-negative integers, $|\alpha| = \alpha_1 + \cdots + \alpha_n$,
and $D_{x_j} = i\,\partial/\partial x_j$. Here D acts on $C^\infty(\mathbb{R}^n, V)$ with V
a finite-dimensional complex vector space and
the coefficients of D are matrices varying smoothly
with x which are required to decay suitably fast,
D a] = O(|x|7!") as |x| — oo. If the symbol op 
of D, defined by

$$\sigma_D(x, \xi) = \sum_{|\alpha| \leq m} a_\alpha(x) \xi^\alpha \qquad [26]$$

with $\xi = (\xi_1, \ldots, \xi_n) \in \mathbb{R}^n$, satisfies the ellipticity
condition of being invertible on the $2n - 1$ sphere
$S^{2n-1}$ in $(x, \xi)$ space, then D is a Fredholm operator.
The index theorem then states

$$\mathrm{index}\, D = \int_{S^{2n-1}} \sigma_D^*(\omega_{2n-1})$$

the higher-dimensional analog of the winding
number of the determinant.


Fredholm Operators and Determinant 
Line Bundles 


The operators whose determinants are considered in
this article are all Fredholm operators. Recall that a
linear operator $A : H_1 \to H_2$ between Hilbert spaces
is Fredholm if it is invertible modulo compact
operators; that is, there is a “parametrix”
$Q : H_2 \to H_1$ such that $QA - I$ and $AQ - I$ are
compact operators on $H_1$ and $H_2$, respectively.
Equivalently, the range $A(H_1)$ of A is closed in $H_2$,
and the kernel $\mathrm{Ker}(A) = \{\eta \in H_1 \mid A\eta = 0\}$ and
cokernel $\mathrm{Coker}(A) = H_2/A(H_1)$ of A are finite
dimensional. (This is equally true for Banach and
Fréchet spaces; we restrict our attention to Hilbert
spaces for brevity.) The space Fred of all such
Fredholm operators with the norm topology has the
homotopy type of the classifying space $\mathbb{Z} \times B\mathrm{Gl}(\infty)$.
The first factor parametrizes the connected components
of Fred; two Fredholm operators are in the
same component if and only if they have the same
index

$$\mathrm{index}(A) = \dim \mathrm{Ker}(A) - \dim \mathrm{Coker}(A)$$
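
In finite dimensions every linear map is Fredholm, and the index depends only on the dimensions of the two spaces, not on the map: by rank-nullity it equals the number of columns minus the number of rows. The sketch below makes this explicit with exact arithmetic; the helper names `rank` and `index` and the sample matrices are illustrative, and the finite-dimensional setting is of course only a toy model of the Hilbert space statement.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def index(matrix):
    """dim Ker - dim Coker for the map C^(columns) -> C^(rows) given by the matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    rk = rank(matrix)
    dim_ker = n_cols - rk
    dim_coker = n_rows - rk
    return dim_ker - dim_coker

generic = [[1, 2, 3], [4, 5, 6]]      # rank 2: kernel of dimension 1, cokernel 0
degenerate = [[1, 2, 3], [2, 4, 6]]   # rank 1: kernel of dimension 2, cokernel 1
zero = [[0, 0, 0], [0, 0, 0]]         # rank 0: kernel of dimension 3, cokernel 2

print(index(generic), index(degenerate), index(zero))
```

Kernel and cokernel dimensions jump as the matrix degenerates, but their difference stays fixed, which is the finite-dimensional shadow of the index being constant on connected components.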


Mostly we restrict our attention to the connected
component $\mathrm{Fred}_0$ of operators of index zero. The
cohomology of $\mathrm{Fred}_0 \simeq B\mathrm{Gl}(\infty)$ is a polynomial
ring

$$H^*(\mathrm{Fred}_0) = \mathbb{R}[\mathrm{ch}_1, \mathrm{ch}_2, \mathrm{ch}_3, \ldots]$$

whose generators may be formally realized as the
even degree components of the Chern character of
an infinite-dimensional bundle over $\mathrm{Fred}_0$. In fact,
the generators $\omega_{2j-1}$ of $H^*(\mathrm{Gl}_1(H))$ are related to the
$\mathrm{ch}_j$ through transgression, see Chern and Simons
(1974). We shall be interested here in the first
generator $\mathrm{ch}_1$, a transgression of the Fredholm
determinant “winding number 1-form” $\omega_1$, which
coincides with the real Chern class of a canonical
complex line bundle $\mathrm{DET}_0 \to \mathrm{Fred}_0$. The fiber of
$\mathrm{DET}_0$ at $A \in \mathrm{Fred}_0$ is the determinant line $\mathrm{Det}(A)$ of
the Fredholm operator A, which is defined as
follows (Segal 2004).

Just as for finite-rank operators (see the subsection
“Determinants in finite dimensions”), the
determinant of a Fredholm operator $A : H^1 \to H^2$
exists abstractly not as a number but as an element
$\det A$ of a complex line $\mathrm{Det}(A)$. For simplicity, we
suppose that $\mathrm{index}(A) = 0$. Elements of the
determinant line $\mathrm{Det}(A)$ are equivalence classes
$[E, \lambda]$ of pairs $(E, \lambda)$, where $\lambda \in \mathbb{C}$ and $E : H^1 \to H^2$ is such that
$A - E$ is trace class, relative to the equivalence
relation $(Eq, \lambda) \sim (E, \det_F(q)\lambda)$ for $q : H^1 \to H^1$ of
determinant class and where $\det_F(q)$ is the Fredholm
determinant of q. Complex multiplication on $\mathrm{Det}(A)$
is defined by $\mu[E, \lambda] = [E, \mu\lambda]$. The abstract, or
Quillen, determinant of A is the preferred element
$\det A := [A, 1]$ in $\mathrm{Det}(A)$.

Here are some essential properties of the determinant
line. First, $\det A$ is nonzero if and only if A is
invertible. Next, quotients of abstract determinants
in $\mathrm{Det}(A)$ are given by Fredholm determinants; for if
$A_1 : H^1 \to H^2$, $A_2 : H^1 \to H^2$ are Fredholm operators
such that $A_i - A$ are trace class, then if $A_2$ is
invertible we see that $A_2^{-1}A_1$ is determinant class
and hence from the definition that

$$\frac{\det(A_1)}{\det(A_2)} = \det\nolimits_F(A_2^{-1}A_1) \qquad [27]$$

where the quotient on the left-hand side is taken in
$\mathrm{Det}(A)$. The principal functorial property of the
determinant line is that given a commutative
diagram with exact rows and Fredholm columns

$$\begin{array}{ccccccccc}
0 & \to & H_1' & \to & H_1 & \to & H_1'' & \to & 0 \\
 & & \downarrow{\scriptstyle A'} & & \downarrow{\scriptstyle A} & & \downarrow{\scriptstyle A''} & & \\
0 & \to & H_2' & \to & H_2 & \to & H_2'' & \to & 0
\end{array} \qquad [28]$$

then there is a canonical isomorphism of complex
lines

$$\mathrm{Det}(A) \cong \mathrm{Det}(A') \otimes \mathrm{Det}(A'') \qquad [29]$$

preserving the Quillen determinants, $\det(A) \leftrightarrow
\det(A') \otimes \det(A'')$. A consequence of this property is
that given Fredholm operators $A : H_2 \to H_3$ and
$B : H_1 \to H_2$, then

$$\mathrm{Det}(AB) \cong \mathrm{Det}(A) \otimes \mathrm{Det}(B)$$

with $\det(AB) \leftrightarrow \det(A) \otimes \det(B)$, generalizing the
elementary property [9].

The principal context of interest for studying
determinant lines is the case where one has a
family $\mathbb{A} = \{A_x \mid x \in B\}$ of Fredholm operators
parametrized by a manifold B, satisfying suitable
continuity properties, and one aims to make sense
of the determinant as a function $\mathbb{A} \to \mathbb{C}$. It is then
not difficult to show that the corresponding
family of determinant lines $\mathrm{DET}(\mathbb{A}) = \bigcup_{x \in B} \mathrm{Det}(A_x)$
defines a complex line bundle over B endowed
with a canonical section $\det : B \to \mathrm{DET}(\mathbb{A})$
assigning to $x \in B$ the Quillen determinant
$\det(A_x) \in \mathrm{Det}(A_x)$ (Quillen 1985, Segal 2004). To
identify the Quillen determinant section with a
function on $\mathbb{A}$, we need to identify a trivialization
of the line bundle $\mathrm{DET}(\mathbb{A})$, giving a global basis
for the fibers. This is the same thing as giving a
nonvanishing (or never-vanishing) section $\psi : B \to \mathrm{DET}(\mathbb{A})$,
with respect to which we have the regularized
determinant function (cf. [3]):

$$x \mapsto \det\nolimits_\psi(A_x) := \frac{\det(A_x)}{\psi(x)} \qquad [30]$$

If $\mathrm{DET}(\mathbb{A})$ is trivializable, so a nonzero section exists, there
will be many such sections and some extra data is
needed to fix a natural choice of $\psi$.

Each of the properties mentioned above for
determinant lines carries forward to determinant
line bundles in a natural way. In particular, one
easily deduces from [28], or from the exact
sequence

$$0 \to \mathrm{Ker}\,A_x \to H_{1,x} \xrightarrow{\;A_x\;} H_{2,x} \to \mathrm{Coker}\,A_x \to 0$$

that if the kernels $\mathrm{Ker}\,A_x$ have constant dimension as
x varies, then there is a canonical isomorphism

$$\mathrm{DET}(\mathbb{A}) \cong \wedge^{\max}\mathrm{Ker}(\mathbb{A})^* \otimes \wedge^{\max}\mathrm{Coker}(\mathbb{A}) \qquad [31]$$

where $\mathrm{Ker}(\mathbb{A})$ is the finite-rank complex vector
bundle over B with fiber $\mathrm{Ker}\,A_x$, and $\mathrm{Coker}(\mathbb{A})$
similarly. The interesting feature here is that it
shows the determinant bundle to be the top
exterior power of the index bundle $\mathrm{Ind}(\mathbb{A}) =
[\mathrm{Ker}(\mathbb{A})] - [\mathrm{Coker}(\mathbb{A})] \in K(B)$ in the even K-theory
of B, and in this sense determinant theory may be
seen as a particular aspect of index theory,
understood in the very broadest sense; in fact,
the computation of determinants is usually a
considerably more complex and difficult task than
computing an index.


Determinant Bundles for Differential 
Operators over Manifolds 


The Quillen determinant has been of particular
interest in the case of families of Dirac operators.
Such a family is associated to a $C^\infty$ fibration
$\pi : M \to B$ of closed boundaryless finite-dimensional
Riemannian manifolds of even dimension. If there is
a graded Hermitian vector bundle $E = E^+ \oplus E^- \to M$
of Clifford modules, then from the Riemannian
structure one can construct a Levi-Civita connection
on the vertical tangent bundle $T(M/B)$ which can be
lifted to a Clifford connection on E; for example, the
spinor connection if we have a family of spin
manifolds. This data yields a smooth family of
first-order elliptic differential operators $D = \{D_x :
C^\infty(M_x; E_x^+) \to C^\infty(M_x; E_x^-) \mid x \in B\}$ of chiral Dirac
type, with $D_x$ a Dirac-type operator acting over the
manifold $M_x = \pi^{-1}(x)$ parametrized by the fibration,
along with a determinant line bundle $\mathrm{DET}(D) \to B$
endowed with a canonical section $x \mapsto \det(D_x)$.
There are various contexts in mathematics and
physics in which one would like to assign to the
determinant section a naturally associated smooth
function (a regularized determinant) $\det_{reg} : B \to \mathbb{C}$,
which can, for example, then be integrated. As
discussed in the previous section, this depends on
identifying a trivializing (nonzero) section of
$\mathrm{DET}(D)$. For such a section to exist, the first Chern
class $c_1(\mathrm{DET}(D)) \in H^2(B)$ must vanish, and this in
turn can be computed as a term in the Atiyah-Singer
(1984) index theorem for families. Indeed, this is
clear from the formal identification [31] which here
takes on a precise meaning.

The following simple example, which is the basic
topological anomaly computation in string theory,
may help to explain the type of computation. Let
$M_x$ be a copy of $\Sigma$, a compact Riemann surface, so
that M is a family of surfaces parametrized by B.
Let $T = \bigcup_x T_x$ be the vertical complex tangent line
bundle on M, where $T_x$ is the complex tangent line
bundle to $M_x$. Each fiber has an associated
$\bar\partial$-operator $\bar\partial_x$ which we couple to the Hermitian
bundle $E_x := T_x^{\otimes m}$ for m a non-negative integer. In
this way, we get a family $D_m$ of $\bar\partial$-operators coupled
to $E = T^{\otimes m}$ whose index bundle is the element
$\mathrm{Ind}(D_m) = f_!(T^{\otimes m}) \in K(B)$. The Atiyah-Singer index
theorem for families in this situation coincides with
the Grothendieck-Riemann-Roch theorem and this
says that

$$\mathrm{ch}(f_!(T^{\otimes m})) = f_*(\mathrm{ch}(T^{\otimes m})\,\mathrm{Todd}(T))$$

where ch is the Chern character class and Todd(T) is
the Todd class, defined for a vector bundle F, whose
first few terms (for F a line bundle, as here) are

$$\mathrm{Todd}(F) = 1 + \tfrac{1}{2} c_1(F) + \tfrac{1}{12} c_1(F)^2 + \cdots$$

and where $f_* : H^i(M) \to H^{i-2}(B)$ is integration over
the fibers. That is, with $\xi = c_1(T)$,

$$\mathrm{ch}(f_!(T^{\otimes m})) = f_*\big((1 + m\xi + \tfrac{1}{2}m^2\xi^2 + \cdots)(1 + \tfrac{1}{2}\xi + \tfrac{1}{12}\xi^2 + \cdots)\big)
= f_*\big(1 + (m + \tfrac{1}{2})\xi + (\tfrac{1}{2}m^2 + \tfrac{1}{2}m + \tfrac{1}{12})\xi^2 + \cdots\big)$$

So we have

$$c_1(f_!(T^{\otimes m})) = \tfrac{1}{12}(6m^2 + 6m + 1)\, f_*(\xi^2) \in H^2(B) \qquad [32]$$
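
The coefficient in [32] is the degree-2 coefficient of the product of two truncated power series in the class ξ, which can be checked mechanically. The sketch below does this in exact rational arithmetic; the representation of series as coefficient lists and all function names are assumptions made for the illustration.

```python
from fractions import Fraction

def chern_character_power(m):
    """Coefficients of 1, xi, xi^2 in ch(T^m) = exp(m * xi), truncated at degree 2."""
    return [Fraction(1), Fraction(m), Fraction(m * m, 2)]

# Coefficients of 1, xi, xi^2 in the Todd class of a line bundle with c_1 = xi
TODD = [Fraction(1), Fraction(1, 2), Fraction(1, 12)]

def truncated_product(p, q, degree=2):
    """Multiply two coefficient lists and discard terms above the given degree."""
    result = [Fraction(0)] * (degree + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= degree:
                result[i + j] += a * b
    return result

def c1_coefficient(m):
    """Coefficient of xi^2 in ch(T^m) * Todd(T), i.e. the factor in front of f_*(xi^2)."""
    return truncated_product(chern_character_power(m), TODD)[2]

for m in range(4):
    print(m, c1_coefficient(m))
```

For m = 1 the coefficient is 13/12 and for m = 0 it is 1/12, in agreement with the closed form (6m² + 6m + 1)/12.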




But for any element E of K-theory, $c_1(E) =
c_1(\mathrm{DET}(E))$, and so the left-hand side of [32] is the
first Chern class of the determinant line bundle
$\mathrm{DET}(D_m)$. If we take, in particular, $B = \mathrm{Conf}(\Sigma)$, the
space of conformal classes of metrics on $\Sigma$ (or
compact subsets of this space), and couple the
family $D_m$ to a background trivial real bundle of
rank d/2, or its negative in K-theory, then taking
m = 1, [32] is easily seen to be modified to


d —26 
c1(Ds -aj2) = CnN 
It follows that for this topological anomaly to vanish
one must have background spacetime of dimension
d = 26. The idea here is that $\mathrm{Conf}(\Sigma)$ is a
configuration space for bosonic strings in $\mathbb{R}^d$
with the requirement that the determinant section
of the determinant line bundle be conformally
invariant, corresponding to the classical invariance
of the string Lagrangian defining the string path
integral from which the determinant arises. That
is, in order to evaluate the path integral on the
reduced configuration space, one requires a trivialization
of the determinant line bundle which
defines a conformally invariant regularized determinant
function. The above calculation says that
there is a topological obstruction to this occurring
when the background space dimension differs
from 26.

This is the most basic example of determinant 
anomaly computations, which have acquired 
considerably more sophisticated constructions in 
modern versions of string theory and QFT. One 
immediate deficiency in the approach explained so 
far is that not all anomalies are topological and so 
even though the first Chern class of the determinant 
line bundle may vanish, there may still be local and 
global obstructions to the existence of a determi- 
nant function with the correct symmetry properties. 
To be more precise, one needs to say not just that a 
trivialization of the determinant line bundle for- 
mally exists, but to actually be able to construct a 
specific preferred trivialization. For this more 
refined objective, one needs to know more about 
the differential geometry of the determinant line. 
One approach is to fix a canonical choice of 
connection and, if the determinant bundle is 
topologically trivial, to construct a determinant 
section (up to phase) using the parallel transport 
of the connection. 

The principal contribution to such a theory was
made in a remarkable four-page paper by Quillen
(1985) in which, using zeta-function regularization,
he presented a construction of a metric and
connection on the determinant line bundle for a
family of $\bar\partial$-operators over a Riemann surface
coupled to a holomorphic vector bundle. (This is
the first paper one should read on determinant line
bundles; Quillen’s motivation, in fact, did not come
from physics but from a problem in number
theory.)

To outline this construction, which was extended
to general families of Dirac-type operators in Bismut
and Freed (1986), first we recall that if $\Delta$ is
an invertible Laplacian-type second-order elliptic
differential operator acting on the space of sections
of a vector bundle over a compact manifold of
dimension n, then it has a spectrum consisting of
real discrete eigenvalues $\{\lambda\}$ forming an unbounded
subset of the positive real line. The zeta function
of $\Delta$ is defined in the complex half-plane $\mathrm{Re}(s) >
n/2$ by

$$\zeta(\Delta, s) = \mathrm{tr}(\Delta^{-s}) = \sum_\lambda \lambda^{-s}, \qquad \mathrm{Re}(s) > \frac{n}{2}$$

and extends to a meromorphic function of s on the
whole complex plane. It turns out that the extension
has no pole at s = 0 and this means that we may
define the zeta-function regularized determinant of
$\Delta$ by

$$\det\nolimits_\zeta(\Delta) := \exp\Big(-\frac{d}{ds}\Big|_{s=0} \zeta(\Delta, s)\Big)$$

since $(d/ds)|_{s=0}\,\lambda^{-s} = -\log \lambda$ this formally represents a
regularized product of the eigenvalues of $\Delta$. A
metric is now defined on the determinant line
bundle $\mathrm{DET}(D)$ by defining the norm square of the
element $\det(D_x) \in \mathrm{Det}(D_x)$ by

$$\|\det(D_x)\|^2 := \det\nolimits_\zeta(D_x^* D_x)$$

over the subset $B_0$ of $x \in B$ where $D_x$ is invertible.
Elsewhere in B, one includes a factor defined by the
induced $L^2$ metric in the kernel and cokernel. See
Quillen (1985) and Bismut and Freed (1986) for full
details.
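
For an operator with finitely many eigenvalues no regularization is needed, and the zeta-function prescription should reduce to the ordinary product of the eigenvalues. The sketch below checks this sanity condition numerically; the finite spectrum, the central-difference derivative with step `h`, and the function names are assumptions made for the illustration, and the genuinely infinite-dimensional case requires the meromorphic continuation described above.

```python
import math

def zeta(eigenvalues, s):
    """Spectral zeta function of a finite positive spectrum: sum of lambda^(-s)."""
    return sum(lam ** (-s) for lam in eigenvalues)

def zeta_determinant(eigenvalues, h=1e-5):
    """exp(-zeta'(0)), with the derivative taken by a central difference."""
    derivative = (zeta(eigenvalues, h) - zeta(eigenvalues, -h)) / (2 * h)
    return math.exp(-derivative)

spectrum = [1.0, 2.0, 3.5, 7.0]
print(zeta_determinant(spectrum), math.prod(spectrum))
```

Both printed numbers are approximately 49, the product of the four sample eigenvalues.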

A connection is defined by similarly constructing
a regularized version of the connection we would
define if we were working with finite-rank bundles.
First, one includes in the data associated to the
fibration $\pi : M \to B$ defining the family of operators
D a splitting of the tangent bundle
$TM = T(M/B) \oplus \pi^*(TB)$. This assumption and the
Riemannian geometry of the fibration yield a
connection $\nabla^{(\pi)}$ defined along the fibers of the
fibration. The connection form over $B_0$ is then
defined by

$$\omega(x) = \mathrm{tr}_\zeta(D_x^{-1} \nabla^{(\pi)} D_x)$$

where the zeta-regularized trace $\mathrm{tr}_\zeta$ is defined on a
vertical bundle endomorphism-valued 1-form $x \mapsto A_x$
on M by

$$\mathrm{tr}_\zeta(A_x) := \mathrm{fp}_{s=0}\, \mathrm{tr}\big(A_x (D_x^* D_x)^{-s}\big)\big|^{\mathrm{mer}}$$

where the superscript indicates we are considering the
meromorphically extended form, and $\mathrm{fp}_{s=0}(G(s))$
means the finite part of a meromorphic function G
on $\mathbb{C}$; that is, the constant term in the Laurent
expansion of G(s) near s = 0.

A theorem of Bismut and Freed, generalizing
Quillen’s original computation, computes the curvature
$\Omega^{(\mathrm{DET}(D))}$ of this connection to be the 2-form
component in the local Atiyah-Singer families index
density. This is a refined version of the topological
version of that theorem which we utilized earlier; it
expresses the characteristic classes on B in terms of
specific canonical differential forms constructed by
integrating, along the fibers of the fibration,
canonically defined vertical characteristic forms.
More precisely, they prove the formula (Bismut
and Freed 1986 and Berline et al. 1992)

$$\Omega^{(\mathrm{DET}(D))} = (2\pi i)^{-n/2} \Big( \int_{M/B} \hat{A}(M/B)\, \mathrm{ch}(E) \Big)_{[2]} \qquad [33]$$

where $(\sigma)_{[2]} \in \Omega^2(B)$ means the 2-form component
of a differential form $\sigma$ on B. Here $\hat{A}(M/B) =
\det^{1/2}\big((R^{M/B}/2)/\sinh(R^{M/B}/2)\big)$ is the vertical
$\hat{A}$-genus differential form, while $\mathrm{ch}(E)$ is the vertical
Chern character form associated to the curvature
form of the bundle E.

This theory seems a long way from the classical
theory of stable characteristic classes and the
Fredholm determinant discussed in earlier sections.
There are, however, interesting parallels which
may guide the search for an understanding of the
geometry of families of elliptic operators, of which
determinants form a component. The prototypical
situation where determinants arise in the quantization
of gauge theory is the following. Consider the
infinite-dimensional affine space $\mathcal{A}$ of connections
on a complex vector bundle E with structure
group G sitting over $S^m$, the m-sphere. The Lie
group G is assumed to be compact. For each
connection $A \in \mathcal{A}$, we consider a Dirac operator
$D_A : C^\infty(S^m, S^+ \otimes E) \to C^\infty(S^m, S^- \otimes E)$, where E is
a Hermitian vector bundle coupled to the spinor
bundles $S^\pm$. The group $\mathcal{G}$ of based gauge transformations
acts on $\mathcal{A}$ and symmetry properties of
conservation laws lead one to be interested in
constructing a determinant function on the quotient
space $\mathcal{A}/\mathcal{G}$. More precisely, $g \in \mathcal{G}$ transforms
$D_A$ to $D_{gA}$ and by equivariance the Quillen
determinant section pushes down to a section of
a reduced determinant line bundle over $\mathcal{A}/\mathcal{G}$. As
seen earlier, the topological obstruction to realizing
this determinant section as a function on $\mathcal{A}/\mathcal{G}$
can be computed from the Atiyah-Singer index
theorem for families applied to the corresponding
index bundle $\mathrm{Ind}(D_{\mathcal{A}/\mathcal{G}})$ in $K(\mathcal{A}/\mathcal{G})$ by picking
out the degree-2 component in $H^*(\mathcal{A}/\mathcal{G})$ of the
Chern character $\mathrm{ch}(\mathrm{Ind}(D_{\mathcal{A}/\mathcal{G}}))$. On the other hand,
it turns out that this characteristic class is the
transgression of the element of $H^1(\mathcal{G}, \mathbb{Z})$ defined by
the zeta-determinant trace


Əc:= tre (D4 Dga) 'de(DADga)) 
= D O a *)|"™ 


which counts the winding number of the zeta
determinant $\mathcal{G} \to \mathbb{C}^*$ defined by $\det_\zeta(D_A^* D_{gA})$. This
provides an interesting parallel of the classical
theory described in the section “The Fredholm
determinant.” For more details of this and more
advanced ideas see Singer (1985). (A
similar parallel holds between the topological
derivation of the conformal anomaly outlined at
the beginning of this section and what is called the
Polyakov multiplicative anomaly formula for the
zeta determinant of the Laplacian with respect to
conformal changes in the metric on the surface.)
Aspects of more recent work in this direction have
been the extension of the theory to manifolds with
boundary, and how it encodes into the structures of
topological and conformal field theories, see Segal
(2004) and Mickelsson and Scott (2001), and more
generally into M-theory (Freed and Moore 2004).


See also: Anomalies; Feynman Path Integrals;
Index Theorems; Regularization for Dynamical
ζ-Functions.
Introduction


A classic question in probability theory, studied by 
M Kac, S O Rice, and many others, is to find the 
expected number and distribution of zeros or critical 
points of a random polynomial. The same question
can be asked for random holomorphic functions or
sections of bundles, and is the subject of “random
algebraic geometry.”

While this theory has many physical applications, 
in this article we focus on a variation on a standard 
question in the theory of disordered systems. This 
is to find the expected distribution of minima of 
a potential function randomly chosen from an 
ensemble, which might be chosen to model a crystal 
with impurities, a spin glass, or another disordered 
system. Now whereas standard potentials are real- 
valued functions, analogous functions in supersym- 
metric theories, such as the superpotential and 
the central charge, are holomorphic sections of a 
line bundle. Thus, one is interested in finding the 
distribution of critical points of a randomly chosen 
holomorphic section. 

Two related and much-studied problems of this 
type are (1) the problem of finding attractor points 
in the sense of Ferrara, Kallosh, and Strominger, and 
(2) the problem of finding flux vacua as posed by 
Giddings, Kachru, and Polchinski. These problems 
involve a good deal of fascinating mathematics and 
are good illustrations of the general theory. 

A note on general references for further reading on 
the subject of this article is in order. For background 
on random algebraic geometry and some of its other 
applications, as well as references in the text not 
listed here, consult Edelman and Kostlan (1995) and 
Zelditch (2001). The attractor problem is discussed in 
Ferrara et al. (1995) and Moore (2004), while IIb flux
vacua were introduced in Giddings et al. (2002).
Background on Calabi-Yau manifolds can be found in 
Cox and Katz (1999) and Gross et al. (2003). 


Elementary Random Algebraic Geometry 


Let us introduce this subject with the problem of 
finding the expected distribution of zeros of a 
random polynomial, 


$$f(z) = c_0 + c_1 z + \cdots + c_N z^N$$


We define a random polynomial to be a probability
measure on a space of polynomials. A natural choice
might be independent Gaussian measures on the
coefficients,


$$d\mu[f] = d\mu[c_0,\ldots,c_N] = \prod_{i=0}^{N} d^2c_i\,\frac{1}{2\pi\sigma_i}\,e^{-|c_i|^2/2\sigma_i} \qquad [1]$$


We still need to choose the variances. At first the
most natural choice would seem to be equal
variance for each coefficient, say $\sigma_i = 1/2$. We can
characterize this ensemble by its two-point
function,


$$G(z_1,\bar z_2) \equiv E[f(z_1)\,f^*(\bar z_2)] = \sum_{n=0}^{N} (z_1\bar z_2)^n = \frac{1 - z_1^{N+1}\bar z_2^{N+1}}{1 - z_1\bar z_2}$$


We now define $d\mu_0(z)$ to be a measure with unit
weight at each solution of f(z)=0, such that its
integral over a region in $\mathbb{C}$ counts the expected
number of zeros in that region. It can be written in
terms of the standard Dirac delta function, by
multiplication by a Jacobian factor,


$$d\mu_0(z) = E\big[\delta^{(2)}(f(z))\,\partial f(z)\,\bar\partial f^*(\bar z)\big] \qquad [2]$$


To compute this expectation value, we introduce a
constrained two-point function,


$$G_{f(z)=0}(z_1,\bar z_2) = \frac{E\big[\delta^{(2)}(f(z))\,f(z_1)\,f^*(\bar z_2)\big]}{E\big[\delta^{(2)}(f(z))\big]}$$


It could be explicitly computed by using the
constraint f(z)=0 to solve for a coefficient $c_i$ in
the Gaussian integral, that is, projecting on the
linear subspace $0 = \sum_i c_i z^i$. The result, in terms of
$G(z_1,\bar z_2)$, is


$$E\big[\delta^{(2)}(f(z))\big] = \frac{1}{\pi\,G(z,\bar z)}$$


$$G_{f(z)=0}(z_1,\bar z_2) = G(z_1,\bar z_2) - \frac{G(z_1,\bar z)\,G(z,\bar z_2)}{G(z,\bar z)}$$


as can be verified by considering


$$E\big[\delta^{(2)}(f(z))\,f(z)\,f^*(\bar z_2)\big] \propto G_{f(z)=0}(z,\bar z_2) = G(z,\bar z_2) - \frac{G(z,\bar z)\,G(z,\bar z_2)}{G(z,\bar z)} = 0$$


Using this, eqn [2] can be evaluated by taking
derivatives:


$$d\mu_0(z) = \frac{1}{\pi\,G(z,\bar z)}\,D_1\bar D_2\,G_{f(z)=0}(z_1,\bar z_2)\Big|_{z_1=z_2=z} = \frac{1}{\pi}\,\partial\bar\partial\log G(z,\bar z)$$
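
The last equality is stated without its intermediate step. One way to fill it in, reading $D_1$ and $\bar D_2$ as ordinary derivatives with respect to $z_1$ and $\bar z_2$ (an assumption that is adequate for polynomials) and using only the constrained two-point function given above, is sketched here:

```latex
\begin{aligned}
\partial_{z_1}\bar\partial_{\bar z_2}\,G_{f(z)=0}(z_1,\bar z_2)\Big|_{z_1=z_2=z}
  &= \partial\bar\partial G(z,\bar z)
     - \frac{\partial G(z,\bar z)\,\bar\partial G(z,\bar z)}{G(z,\bar z)}, \\[4pt]
\frac{1}{\pi G}\left(\partial\bar\partial G - \frac{\partial G\,\bar\partial G}{G}\right)
  &= \frac{1}{\pi}\left(\frac{\partial\bar\partial G}{G}
     - \frac{\partial G\,\bar\partial G}{G^{2}}\right)
   = \frac{1}{\pi}\,\partial\bar\partial \log G(z,\bar z).
\end{aligned}
```

The first line differentiates only the $z_1$ and $\bar z_2$ dependence, with the constraint point z held fixed; the second is the identity $\partial\bar\partial\log G = \partial(\bar\partial G/G)$.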


For the constant-variance ensemble, eqn [2] becomes


$$d\mu_0(z) = \frac{d^2z}{\pi}\left(\frac{1}{(1-z\bar z)^2} - \frac{(N+1)^2\,(z\bar z)^N}{\left(1-(z\bar z)^{N+1}\right)^2}\right) \qquad [3]$$


We see that as N → ∞, the zeros concentrate on the
unit circle |z| = 1 (Hammersley 1954).
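
As a numerical cross-check of eqn [3], the following sketch integrates the density over a disc |z| < r in closed form and compares the result with a Monte Carlo estimate. The closed form comes from writing $\frac{1}{\pi}\partial\bar\partial\log G$ as a total derivative in $u = |z|^2$, assuming $d^2z$ is the standard area element, which gives $u\,G'(u)/G(u) = \sum_n n u^n / \sum_n u^n$. The function names, the use of `numpy.roots`, the sample size, and the seed are illustrative choices and not part of the original treatment.

```python
import numpy as np

def expected_zeros_in_disk(r, N):
    """Expected number of zeros with |z| < r in the equal-variance ensemble.

    Integrating (1/pi) d dbar log G over the disc gives u G'(u) / G(u) with
    u = r^2 and G(u) = sum_n u^n, i.e. a weighted mean of the degrees n.
    """
    n = np.arange(N + 1)
    w = float(r) ** (2 * n)  # weights u^n
    return float(np.sum(n * w) / np.sum(w))

def sampled_zeros_in_disk(r, N, trials=400, seed=0):
    """Monte Carlo estimate of the same quantity (illustrative helper)."""
    rng = np.random.default_rng(seed)
    total = 0
    for _ in range(trials):
        # Independent complex Gaussian coefficients with E|c_n|^2 = 1.
        c = (rng.normal(size=N + 1) + 1j * rng.normal(size=N + 1)) / np.sqrt(2)
        total += int(np.sum(np.abs(np.roots(c)) < r))
    return total / trials

if __name__ == "__main__":
    N = 30
    for r in (0.9, 1.0, 1.1):
        print(r, expected_zeros_in_disk(r, N), sampled_zeros_in_disk(r, N))
```

For N = 200 the closed form places more than 90% of the expected zeros in the annulus 0.9 < |z| < 1.1, which is the concentration on the unit circle described above.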

A similar formula can be derived for the distribution
of roots of a real polynomial on the real axis,
using $d\mu_0(t) = E[\delta(f(t))\,|df/dt|]$. One obtains (Kac
1943):


$$d\mu_0(t) = \frac{dt}{\pi}\sqrt{\frac{1}{(1-t^2)^2} - \frac{(N+1)^2\,t^{2N}}{(1-t^{2N+2})^2}}$$


Integrating, one finds the expected number of real
zeros of a degree N random real polynomial is
$E_N \sim (2/\pi)\log N$, and as N → ∞ the zeros are
concentrated at t = ±1.
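
The Kac density can be integrated numerically to see the slow logarithmic growth. The sketch below assumes real i.i.d. Gaussian coefficients of equal variance and uses the algebraically equivalent sum form $\rho(t) = \sqrt{AC - B^2}/(\pi A)$, with $A = \sum t^{2n}$, $B = \sum n\,t^{2n-1}$, $C = \sum n^2 t^{2n-2}$, because it has no 0/0 at t = ±1. The function names, the Gauss-Legendre quadrature, and the node count are illustrative choices.

```python
import numpy as np

def kac_density(t, N):
    """Density of real zeros at t for real i.i.d. Gaussian coefficients.

    Sum form sqrt(A*C - B^2) / (pi*A); equivalent to the closed form quoted
    in the text but numerically stable near |t| = 1.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
    n = np.arange(1, N + 1)[None, :]
    A = 1.0 + np.sum(t ** (2 * n), axis=1)
    B = np.sum(n * t ** (2 * n - 1), axis=1)
    C = np.sum(n**2 * t ** (2 * n - 2), axis=1)
    return np.sqrt(np.maximum(A * C - B**2, 0.0)) / (np.pi * A)

def kac_density_closed_form(t, N):
    """Closed form quoted in the text; only usable away from |t| = 1."""
    t = np.asarray(t, dtype=float)
    a = 1.0 / (1.0 - t**2) ** 2
    b = (N + 1) ** 2 * t ** (2 * N) / (1.0 - t ** (2 * N + 2)) ** 2
    return np.sqrt(a - b) / np.pi

def expected_real_zeros(N, nodes=400):
    """E_N = 4 * integral_0^1 rho(t) dt, by the t -> -t and t -> 1/t symmetries."""
    x, w = np.polynomial.legendre.leggauss(nodes)
    t = 0.5 * (x + 1.0)  # map Gauss-Legendre nodes from [-1, 1] to [0, 1]
    return 4.0 * float(np.sum(0.5 * w * kac_density(t, N)))

if __name__ == "__main__":
    for N in (1, 10, 100, 1000):
        print(N, expected_real_zeros(N), 2 / np.pi * np.log(N))
```

For N = 1 the density reduces to $1/(\pi(1+t^2))$, whose integral is exactly 1, as it must be for a linear polynomial; this is a convenient sanity check on the quadrature.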

While concentration of measure is a fairly
generic property for random polynomials, it is by
no means universal. Let us consider another
Gaussian ensemble, with variance
$\sigma_n = N!/n!\,(N-n)!$. This choice leads to a
particularly simple two-point function,


$$G(z,\bar z) = (1 + z\bar z)^N \qquad [4]$$

and the distribution of zeros

$$d\mu_0 = \frac{1}{\pi}\,\partial\bar\partial\log G = \frac{N\,d^2z}{\pi\,(1+z\bar z)^2} \qquad [5]$$


Rather than concentrate the zeros, in this ensemble
zeros are uniformly distributed according to the
volume of the Fubini-Study (SU(2)-invariant) Kähler
metric


$$\omega = \partial\bar\partial K, \qquad K = \log(1+z\bar z)$$


on complex projective space $\mathbb{CP}^1$.
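
Eqns [4] and [5] can be checked in a few lines. The sketch below takes the two-point function to be $\sum_n \sigma_n (z\bar z)^n$; any overall constant in the normalization of the variances drops out of $\partial\bar\partial\log G$. It also assumes $d^2z$ is the standard area element, so that $d^2z = \pi\,du$ for radial integrands with $u = |z|^2$.

```latex
\begin{aligned}
G(z,\bar z) &= \sum_{n=0}^{N}\binom{N}{n}(z\bar z)^n = (1+z\bar z)^N, \\
\partial\bar\partial\log G
  &= N\,\partial\!\left(\frac{z}{1+z\bar z}\right)
   = N\,\frac{(1+z\bar z) - z\bar z}{(1+z\bar z)^2}
   = \frac{N}{(1+z\bar z)^2}, \\
\int_{\mathbb{C}} d\mu_0
  &= \int_0^\infty \frac{N\,du}{(1+u)^2} = N .
\end{aligned}
```

The total mass is N, the number of zeros of a degree-N polynomial, which confirms the normalization of eqn [5].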

We can better understand the different behaviors 
in our two examples by focusing on a Hermitian 
inner product (f,g) on function space, associated to 
the measure eqn [1] by the formal expression 


$$d\mu[f] = [Df]\,e^{-(f,f)}$$


In making this precise, let us generalize a bit further
and allow f to be a holomorphic section of a line
bundle $\mathcal{L}$, say $\mathcal{O}(N)$ over $\mathbb{CP}^1$ in our
examples. We then choose an orthonormal basis of sections
$(s_i, s_j) = \delta_{ij}$, and write


$$f \equiv \sum_i c_i s_i \qquad [6]$$


and


$$d\mu[f] = \frac{1}{(2\pi)^N}\prod_{i=1}^{N} d^2c_i\;e^{-|c_i|^2/2}$$


We can then compute the two-point function


$$G(z_1,\bar z_2) \equiv E[f(z_1)\,f^*(\bar z_2)] = \sum_{i=1}^{N} s_i(z_1)\,s_i^*(\bar z_2) \qquad [7]$$


and proceed as before.

In these terms, the simplest way to describe the 
measure for our first example is that it follows from 
the inner product on the unit circle, 


$$(f,g) = \oint_{|z|=1}\frac{dz}{2\pi i z}\,f^*(\bar z)\,g(z)$$


Thus, we might suspect that this has something to 
do with the concentration of eqn [3] on the unit 
circle. Indeed, this idea is made precise and general- 
ized in Shiffman and Zelditch (2003). 

Our second example belongs to a class of problems
in which M is compact and $\mathcal{L}$ positive. In this case,
the space $H^0(M,\mathcal{L})$ of holomorphic sections is finite
dimensional, so we can take the basis to consist of all
sections. Then, if M is in addition Kähler, we can
derive all the other data from a choice of Hermitian
metric h(f,g) on $\mathcal{L}$. In particular, this determines a
Kähler form ω as the curvature of the metric
compatible connection, and thus a volume form
$\mathrm{Vol}_\omega = \omega^n/n!$. We then define the inner product to be


$$(f,g) = \int_M \mathrm{Vol}_\omega\,h(f,g)$$


Thus, the measure eqn [1] and the final distribution
eqn [2] are entirely determined by h. In
these terms, the underlying reason for the simplicity of
eqn [5] is that we started with the SU(2)-invariant
metric h, so the final distribution must be invariant
as well. More generally, eqn [7] is a Szegö kernel.
Taking $\mathcal{L} = \mathcal{L}_1^{\otimes N}$ for N large, this has a known
asymptotic expansion, enabling a rather complete
treatment (Zelditch 2001).

Our two examples also make the larger point that 
a wide variety of distributions are possible. Thus, to 
get convincing results, we must put in some informa- 
tion about the ensemble of random polynomials or 
sections which appear in the problem at hand. 

The basic computation we just discussed can be 
vastly generalized to multiple variables, multipoint 
correlation functions, many different ensembles, and 
different counting problems. We will discuss the 
distribution of critical points of holomorphic 
sections below. 


The Attractor Problem 


We now turn to our physical problems. Both are 
posed in the context of compactification of the type 
IIb superstring theory on a Calabi-Yau 3-fold M. 
This leads to a four-dimensional effective field 
theory with N=2 supersymmetry, determined by 
the geometry of M. 

Let us begin by stating the attractor problem 
mathematically, and afterwards give its physical 
background. We begin by reviewing a bit of the 
theory of Calabi-Yau manifolds. By Yau’s proof of
the Calabi conjecture, the moduli space of Ricci-flat
metrics on M is determined by a choice of complex
structure on M, denoted J, and a choice of Kähler
class. Using deformation theory, it can be shown
that the moduli space of complex structures, denoted
$\mathcal{M}_c(M)$, is locally a complex manifold of
dimension $h^{2,1}(M)$. A point J in $\mathcal{M}_c(M)$ picks out a
holomorphic 3-form $\Omega_J \in H^{3,0}(M,\mathbb{C})$, unique up to
an overall choice of normalization. The converse is
also true; this can be made precise by defining the
period map $\mathcal{M}_c(M) \to \mathbb{P}(H^3(M,\mathbb{Z})\otimes\mathbb{C})$ to be the
class of Ω in $H^3(M,\mathbb{Z})\otimes\mathbb{C}$ up to projective
equivalence. One can prove that the period map is
injective (the Torelli theorem), locally in general and
globally in certain cases such as the quintic in $\mathbb{CP}^4$.

Now, the data for the attractor problem is a charge,
a class $\gamma \in H^3(M,\mathbb{Z})$. An attractor point for γ is then
a complex structure J on M such that


$$\gamma \in H^{3,0}(M,\mathbb{C}) \oplus H^{0,3}(M,\mathbb{C}) \qquad [8]$$


This amounts to $h^{2,1}$ complex conditions on the $h^{2,1}$
complex structure moduli, so picks out isolated
points in $\mathcal{M}_c(M)$, the attractor points.
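
The count of conditions can be made explicit with the Hodge decomposition. The sketch below uses only the standard facts that $h^{3,0} = h^{0,3} = 1$ on a Calabi-Yau 3-fold and that γ is a real class; the component notation $\gamma^{p,q}$ is introduced here for illustration.

```latex
\begin{aligned}
H^3(M,\mathbb{C}) &= H^{3,0}\oplus H^{2,1}\oplus H^{1,2}\oplus H^{0,3},
  \qquad b_3 = 2h^{2,1} + 2, \\
\gamma &= \gamma^{3,0} + \gamma^{2,1} + \gamma^{1,2} + \gamma^{0,3},
  \qquad \gamma^{1,2} = \overline{\gamma^{2,1}},\quad
         \gamma^{0,3} = \overline{\gamma^{3,0}}, \\
\gamma \in H^{3,0}\oplus H^{0,3}
  &\iff \gamma^{2,1} = 0
  \quad (h^{2,1}\ \text{complex equations}).
\end{aligned}
```

Because γ is real, the vanishing of $\gamma^{1,2}$ follows from that of $\gamma^{2,1}$ and imposes nothing new, so the number of equations matches the number of moduli.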


There are many mathematical and physical ques- 
tions one can ask about attractor points, and it 
would be very interesting to have a general method 
to find them. As emphasized by G Moore, this is one 
of the simplest problems arising from string theory 
in which integrality (here due to charge quantiza- 
tion) plays a central role, and thus it provides a 
natural point of contact between string theory and 
number theory. For example, one might suspect that
attractor Calabi-Yaus are arithmetic, that is, are
projective varieties whose defining equations live in
an algebraic number field. This can be shown to
always be true for $K3 \times T^2$, and there are
conjectures about when this is true more generally
(Moore 2004).

A simpler problem is to characterize the distribution
of attractor points in $\mathcal{M}_c(M)$. As these are
infinite in number, one must introduce some
control parameter. While the first idea which
might come to mind is to bound the magnitude of
γ, since the intersection form on $H^3(M,\mathbb{Z})$ is
antisymmetric, there is no natural way to do this.
A better way to get a finite set is to bound the
period of γ, and consider the attractor points
satisfying


$$|Z(\gamma;z)|^2 \equiv \frac{\left|\int_M \gamma\wedge\Omega\right|^2}{\int_M \Omega\wedge\bar\Omega} \le Z_{\max}^2 \qquad [9]$$


As an example of the type of result we will discuss
below, one can show that for large $Z_{\max}$, the density
of such attractor points asymptotically approaches
the Weil-Petersson volume form on $\mathcal{M}_c$.

We now briefly review the origins of this problem,
in the physics of 1/2 BPS (Bogomol'nyi-Prasad-Sommerfield)
black holes in N = 2 supergravity. We
begin by introducing local complex coordinates $z^i$
on $\mathcal{M}_c(M)$. Physically, these can be thought of as
massless complex scalar fields. These sit in vector
multiplets of N = 2 supersymmetry, so there must be
$h^{2,1}(M)$ vector potentials to serve as their bosonic
partners under supersymmetry. These appear
because the massless modes of the type IIb string
include various higher-rank p-form gauge potentials,
in particular a self-dual 4-form which we denote C.
Self-duality means that $dC = *\,dC$ up to nonlinear
terms, where $*$ is the Hodge star operator in ten
dimensions. Now, Kaluza-Klein reduction of this
4-form potential produces $b_3(M)$ 1-form vector
potentials $A_I$ in four dimensions. Given an explicit
basis of 3-forms $\omega_I$ for $H^3(M,\mathbb{R}) \cap H^3(M,\mathbb{Z})$, this
follows from the decomposition


$$C = \sum_{I=1}^{b_3} A_I \wedge \omega_I + \text{massive modes}$$

However, because of the self-duality relation, only
half of these vector potentials are independent; the
other half are determined in terms of them by
four-dimensional electric-magnetic duality. Explicitly,
given the intersection form $\eta_{IJ}$ on $H^3 \otimes H^3$, we have


$$dA_I = \eta_{IJ}\,*_4\,dA_J \qquad [10]$$


where $*_4$ denotes the Hodge star in d = 4. Thus we
have $h^{2,1} + 1$ independent vector potentials. One of
these sits in the N = 2 supergravity multiplet, and
the rest are the correct number to pair with the
complex structure moduli. We now consider 1/2 BPS
black hole solutions of this four-dimensional N = 2
theory. Choosing any $S^2$ which surrounds the
horizon, we can define the charge γ as the class in
$H^3(M,\mathbb{Z})$ which reproduces the corresponding
magnetic charges


$$Q_I = \frac{1}{2\pi}\int_{S^2} dA_I = \int_M \omega_I\wedge\gamma$$


Using eqn [10], this includes all charges.

One can show that the mass M of any charged
object in supergravity satisfies a BPS bound,


$$M^2 \ge |Z(\gamma;z)|^2 \qquad [11]$$


The quantity $|Z(\gamma;z)|^2$, defined in eqn [9], depends
explicitly on γ, and implicitly on the complex
structure moduli z through Ω. A 1/2 BPS solution
by definition saturates this bound.

We now explain the “attractor paradox.” 
According to Bekenstein and Hawking, the entropy 
of any black hole is proportional to the area of its 
event horizon. This area can be found by finding 
the black hole as an explicit solution of four-
dimensional supergravity, which clearly depends on
the charge γ. In fact, we must fix boundary
conditions for all the fields at infinity, in particular
the complex structure moduli, to get a particular
black hole solution. Now, normally varying the
boundary conditions varies all the data of a
solution in a continuous way. On the other hand,
if the entropy S has any microscopic interpretation as
the logarithm of the number of quantum states of
the black hole, one would expect $e^S$ to be integrally
quantized. Thus, it must remain fixed as the
boundary conditions on complex structure moduli 
are varied, in contradiction with naive expectations 
for the area of the horizon, and seemingly contra- 
dicting Bekenstein and Hawking. 

The resolution of this paradox is the attractor
mechanism. Let us work in coordinates for which
the four-dimensional metric takes the form


$$ds^2 = -f(r)\,dt^2 + dr^2 + \frac{A(r)}{4\pi}\,d\Omega^2_{S^2}$$


With some work, one can see that in the 1/2 BPS
case, the equations of motion imply that as r
decreases, the complex structure moduli z follow
gradient flow with respect to $|Z(\gamma,z)|^2$ in eqn [11],
and the area A(r) of an $S^2$ at radius r decreases.
Finally, at the horizon, z reaches a value $z_*$ at which
$|Z(\gamma,z)|^2$ is a local minimum, and the area of
the event horizon is $A = 4\pi\,|Z(\gamma,z_*)|^2$. Since $z_*$ is
determined by minimization, this area will not
change under small variations of the initial z,
resolving the paradox.

A little algebra shows that the problem of finding
nonzero critical points of $|Z(\gamma,z)|^2$ is equivalent to
that of finding critical points $D_iZ = 0$ of the period
associated to γ,


$$Z = \int_M \gamma\wedge\Omega \qquad [12]$$


usually called the central charge, with respect to the
covariant derivative


$$D_iZ = \partial_iZ + (\partial_iK)\,Z \qquad [13]$$

Here


$$K \equiv -\log\int_M \Omega\wedge\bar\Omega \qquad [14]$$


The mathematical significance of this rephrasing is
that K is a Kähler potential for the Weil-Petersson
Kähler metric on $\mathcal{M}_c(M)$, with Kähler form
$\omega = \partial\bar\partial K$, and eqn [13] is the unique connection on
$H^{3,0}(M,\mathbb{C})$ regarded as a line bundle over $\mathcal{M}_c(M)$,
whose curvature is −ω. These facts can be used to
show that $D_i\Omega$ provides a basis for $H^{2,1}(M,\mathbb{C})$, so
that the critical point condition forces the projection
of γ on $H^{2,1}$ to vanish. This justifies our original
definition eqn [8].
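
The “little algebra” can be spelled out as follows. The sketch assumes the normalization $e^{-K} = \int_M \Omega\wedge\bar\Omega$ implied by eqn [9], so that the normalized quantity of eqn [9] is $e^{K} Z\bar Z$ with Z the holomorphic period of eqn [12].

```latex
\begin{aligned}
|Z(\gamma;z)|^2 &= e^{K}\,Z\,\bar Z, \\
\partial_i\!\left(e^{K} Z \bar Z\right)
  &= e^{K}\,\bar Z\,\bigl(\partial_i Z + (\partial_i K)\,Z\bigr)
   = e^{K}\,\bar Z\,D_i Z
  \qquad (\partial_i \bar Z = 0), \\
\partial_i |Z(\gamma;z)|^2 = 0 \ \text{with}\ Z \neq 0
  &\iff D_i Z = \int_M \gamma\wedge D_i\Omega = 0 .
\end{aligned}
```

Since the $D_i\Omega$ span $H^{2,1}$, the last condition says that γ has no component along $H^{1,2}$, and by reality none along $H^{2,1}$ either.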


Flux Vacua in IIb String Theory


We will not describe our second problem in as much 
detail, but just give the analogous final formulation. 
In this problem, a “choice of flux” is a pair of
elements of $H^3(M,\mathbb{Z})$, or equivalently a single
element


$$F \in H^3(M, \mathbb{Z}\oplus\tau\mathbb{Z}) \qquad [15]$$


where $\tau \in \mathcal{H} = \{\tau\in\mathbb{C} \mid \mathrm{Im}\,\tau > 0\}$ is the so-called
“dilaton-axion.”

A flux vacuum is then a choice of complex
structure J and τ for which


$$F \in H^{3,0}(M,\mathbb{C}) \oplus H^{1,2}(M,\mathbb{C}) \qquad [16]$$


Now we have $h^{2,1} + h^{0,3} = h^{2,1} + 1$ complex conditions
on the joint choice of $h^{2,1}$ complex structure
moduli and τ, so this condition also picks out
special points, now in $\mathcal{M}_c \times \mathcal{H}$.

The critical point formulation of this problem is
that of finding critical points of


$$W = \int_M \Omega\wedge F \qquad [17]$$


under the covariant derivatives eqn [13] and

$$D_\tau W = \partial_\tau W + (\partial_\tau K)\,W$$


with K the sum of eqn [14] and the Kähler potential
$-\log\mathrm{Im}\,\tau$ for the metric on the upper half-plane of
constant curvature −1.

This is a sort of complexified version of the 
previous problem and arises naturally in IIb com- 
pactification by postulating a nonzero value F for a 
certain 3-form gauge field strength, the flux. The 
quantity eqn [17] is the superpotential of the 
resulting N=1 supergravity theory, and it is a 
standard fact in this context that supersymmetric 
vacua (critical points of the effective potential) are 
critical points of W in the sense we just stated. 

We can again pose the question of finding the
distribution of flux vacua in $\mathcal{M}_c(M)\times\mathcal{H}$. Besides
$|W|^2$, which physically is one of the contributions to
the vacuum energy, we can also use the “length of
the flux”


$$L = \frac{1}{\mathrm{Im}\,\tau}\int_M \mathrm{Re}\,F\wedge\mathrm{Im}\,F \qquad [18]$$


as a control parameter, and count flux vacua for
which $L \le L_{\max}$. In fact, this parameter arises
naturally in the actual IIb problem, as the
“orientifold three-plane charge.”

What makes this problem particularly interesting
physically is that it (and its analogs in other string
theories) may bear on the solution of the cosmological
constant problem. This begins with Einstein’s
famous observation that the equations of general
relativity admit a one-parameter generalization,


$$R_{\mu\nu} - \tfrac{1}{2}\,g_{\mu\nu}R = 8\pi\,T_{\mu\nu} + \Lambda\,g_{\mu\nu}$$


Physically, the cosmological constant Λ is the
vacuum energy, which in our flux problem takes
the form $\Lambda = \cdots - 3|W|^2$ (the other terms are
inessential for us here).

Cosmological observations tell us that Λ is very small,
of the same order as the energy of matter in the present
era, about $10^{-122}\,M_{\rm Planck}^4$ in Planck units. However, in a
generic theory of quantum gravity, including string
theory, quantum effects are expected to produce a large
vacuum energy, a priori of order $M_{\rm Planck}^4$. Finding an
explanation for why the theory of our universe is in this
sense nongeneric is the cosmological constant problem.


One of the standard solutions of this problem is
the “anthropic solution,” initiated in work of
Weinberg and others, and discussed in string theory
in Bousso and Polchinski (2000). Suppose that we
are discussing a theory with a large number of
vacuum states, all of which are otherwise candidates
to describe our universe, but which differ in Λ. If the
number of these vacuum states were sufficiently
large, the claim that a few of these states realize a
small Λ would not be surprising. But one might still
feel a need to explain why our universe is a vacuum
with small Λ, and not one of the multitude with
large Λ.

The anthropic argument is that, according to
accepted models for early cosmology, if the value of
|Λ| were even 100 times larger than what is
observed, galaxies and stars could not form. Thus,
the known laws of physics guarantee that we will
observe a universe with Λ within this bound; it is
irrelevant whether other possible vacuum states
“exist” in any sense.

While such anthropic arguments are controversial,
one can avoid them in this case by simply asking
whether or not any vacuum state fits the observed
value of Λ. Given a precise definition of vacuum
state, this is a question of mathematics. Still,
answering it for any given vacuum state is extremely
difficult, as it would require computing Λ to $10^{-122}$
precision. But it is not out of reach to argue that out
of a large number of vacua, some of them are
expected to realize small Λ. For example, if we
could show that the number of otherwise physically
acceptable vacua was larger than $10^{122}$, and that the
distribution of Λ among these was approximately
uniform over the range $(-M_{\rm Planck}^4, M_{\rm Planck}^4)$, we would
have made a good case for this expectation. This style
of reasoning can be vastly generalized and, given
favorable assumptions about the number of vacua in
a theory, could lead to falsifiable predictions
independent of any a priori assumptions about the choice
of vacuum state (Douglas 2003).
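
The arithmetic behind this expectation is simple enough to sketch. The code below assumes, purely for illustration, that Λ is exactly uniform on (−1, 1) in Planck units and that vacua are independent draws, so the expected number of vacua with |Λ| below a window equals the number of vacua times the window. It works with base-10 logarithms to avoid overflow and uses the Poisson approximation for the chance of at least one hit; the function names are hypothetical.

```python
import math

def log10_expected_count(log10_n_vacua, log10_window):
    """log10 of the expected number of vacua with |Lambda| < window.

    Assumes Lambda is uniform on (-1, 1) in Planck units, so the fraction of
    vacua inside the window equals the window size itself.
    """
    return log10_n_vacua + log10_window

def prob_at_least_one(log10_n_vacua, log10_window):
    """Poisson approximation 1 - exp(-mean) to the chance of at least one hit."""
    log10_mean = log10_expected_count(log10_n_vacua, log10_window)
    if log10_mean > 2:  # mean > 100: the probability is 1 to double precision
        return 1.0
    return -math.expm1(-(10.0 ** log10_mean))

if __name__ == "__main__":
    for log10_n in (100, 122, 150):
        print(log10_n, prob_at_least_one(log10_n, -122))
```

With exactly $10^{122}$ vacua the expected count is one and the chance of at least one suitable vacuum is $1 - e^{-1}$; a modestly larger number of vacua pushes this to essentially certainty, while a smaller number makes it negligible.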


Asymptotic Counting Formulas 


We have just defined two classes of physically 
preferred points in the complex structure moduli 
space of Calabi-Yau 3-folds, the attractor points 
and the flux vacua. Both have simple definitions in 
terms of Hodge structure, eqn [8] and eqn [16], and 
both are also critical points of integral periods of the 
holomorphic 3-form. 

This second phrasing of the problem suggests the
following language. We define a random period of
the holomorphic 3-form to be the period for a
randomly chosen cycle in $H_3(M,\mathbb{Z})$ of the types we
just discussed (real or complex, and with the
appropriate control parameters). We are then
interested in the expected distribution of critical points
for a random period. This brings our problem into
the framework of random algebraic geometry.

Before proceeding to use this framework, let us
first point out some differences with the toy
problems we discussed. First, while eqn [12] and
eqn [17] are sums of the form eqn [6], we take not
an orthonormal basis but instead a basis $s_i$ of
integral periods of Ω. Second, the coefficients $c_i$ are
not normally distributed but instead drawn from a
discrete uniform distribution, that is, correspond to
a choice of γ in $H^3(M,\mathbb{Z})$ or F as in eqn [15],
satisfying the bounds on |Z| or L. Finally, we do not
normalize the distribution (which is thus not a
probability measure) but instead take each choice
with unit weight.

These choices can of course be modified, but are
made in order to answer the question, “how many
attractor points (or flux vacua) sit within a specified
region of moduli space?” The answer we will get is a
density $\mu(Z_{\max})$ or $\mu(L_{\max})$ on moduli space, such
that as the control parameter becomes large, the
number of critical points within a region R
asymptotes to


$$N(R; Z_{\max}) \sim \int_R \mu(Z_{\max})$$


The key observation is that to get such asymptotics,
we can start with a Gaussian random
element s of $H^3(M,\mathbb{R})$ or $H^3(M,\mathbb{C})$. In other
words, we neglect the integral quantization of
the charge or flux. Intuitively, this might be
expected to make little difference in the limit
that the charge or flux is large, and in fact one
can prove that this simplification reproduces the
leading large L or |Z| asymptotics for the density
of critical points, using standard ideas in lattice
point counting.
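
The idea that integrality can be neglected at large charge is the same one behind the classical Gauss circle problem: the number of integer points in a large region approaches its volume. The sketch below is only a toy analog with a positive-definite form in two dimensions; the flux problem involves the indefinite form of eqn [18], which this example does not capture. The function name is hypothetical.

```python
import math

def lattice_points_in_disk(radius):
    """Count integer points (x, y) with x^2 + y^2 <= radius^2 (integer radius)."""
    r2 = radius * radius
    count = 0
    for x in range(-radius, radius + 1):
        # Largest |y| allowed for this x, computed exactly with integer sqrt.
        y_max = math.isqrt(r2 - x * x)
        count += 2 * y_max + 1
    return count

if __name__ == "__main__":
    for radius in (1, 10, 100, 1000):
        count = lattice_points_in_disk(radius)
        area = math.pi * radius**2
        print(radius, count, count / area)
```

The ratio of the count to the area tends to 1 as the radius grows, with a relative error that shrinks like a boundary-to-volume ratio; this is the sense in which a continuous (here Gaussian) ensemble reproduces the leading asymptotics.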

This justifies starting with a two-point function
like eqn [7]. While the integral periods $s_i$ of Ω can
be computed in principle (and have been in many
examples) by solving a system of linear PDEs, the
Picard-Fuchs equations, it turns out that one does
not need such detailed results. Rather, one can
use the following ansatz for the two-point
function,


$$G(z_1,\bar z_2) \equiv \eta^{ij}\,s_i(z_1)\,s_j^*(\bar z_2) = \int_M \Omega(z_1)\wedge\bar\Omega(\bar z_2) = \exp\left(-K(z_1,\bar z_2)\right)$$


In words, the two-point function is the formal
continuation of the Kähler potential on $\mathcal{M}_c(M)$ to
independent holomorphic and antiholomorphic
variables. This incorporates the quadratic form
appearing in eqn [18] and can be used to count
sections with such a bound.

We can now follow the same strategy as before,
by introducing an expected density of critical
points,


$$d\mu(z) = E\Big[\delta^{(n)}(D_i s(z))\,\delta^{(n)}(\bar D_{\bar i}\bar s(z))\,\big|\det_{1\le i,j\le 2n} H_{ij}\big|\Big] \qquad [19]$$


where the “complex Hessian” H is the 2n × 2n
matrix of second derivatives


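$$H \equiv \begin{pmatrix} \partial_i \bar D_{\bar j}\bar s(z) & \partial_i D_j s(z) \\ \bar\partial_{\bar i}\bar D_{\bar j}\bar s(z) & \bar\partial_{\bar i} D_j s(z) \end{pmatrix} \qquad [20]$$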


(note that $\partial Ds = DDs$ at a critical point). One can
then compute this density along the same lines.
The holomorphy of s implies that $\bar\partial_{\bar i} D_j s = \omega_{\bar i j}\,s$,
which is one simplification. Other geometric
simplifications follow from the fact that eqn [19]
depends only on s and a finite number of its
derivatives at the point z.
For the attractor problem, using the identity


$$D_i D_j s = \mathcal{F}_{ijk}\,\bar D^{k}\bar s = 0$$


from special geometry of Calabi-Yau 3-folds, the
Hessian becomes trivial, and $\det H = |s|^{2n}$. One thus
finds (Denef and Douglas 2004) that the asymptotic
density of attractor points with large $|Z| \le Z_{\max}$ in a
region R is


$$N(R, |Z|\le Z_{\max}) \sim \frac{2^{n+1}}{(n+1)\,\pi^n}\,Z_{\max}^{n+1}\cdot\mathrm{vol}(R)$$


where $\mathrm{vol}(R) = \int_R \omega^n/n!$ is the volume of R in the
Weil-Petersson metric. The total volume is known to
be finite for Calabi-Yau 3-fold moduli spaces, and
thus so is the number of attractor points under this
bound.

The flux vacuum problem is complicated by the 
fact that DDs is nonzero and thus the determinant 
of the Hessian does not take a definite sign, and 
implementing the absolute value in eqn [19] is 
nontrivial. The result (Douglas, et al. 2004) is 


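$$\mu(z) \sim \frac{1}{b_3!\,\sqrt{\det\Lambda(z)}}\int_{\mathcal{H}(z)\times\mathbb{C}} \left|\det\!\left(HH^* - |x|^2\cdot 1\right)\right|\,e^{-H^t\Lambda(z)^{-1}\bar H - |x|^2}\,dH\,dx$$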

where $\mathcal{H}(z)$ is the subspace of Hessian matrices eqn
[20] obtainable from periods at the point z, and Λ(z)
is a covariance matrix computable from the period
data.

A simpler lower bound for the number of
solutions can be obtained by instead computing the
index density


$$I_{\rm vac}(z) = E\Big[\delta^{(n)}(D_i s)\,\delta^{(n)}(\bar D_{\bar i}\bar s)\,\det_{1\le i,j\le 2n} H_{ij}\Big] \qquad [21]$$


so-called because it weighs the vacua with a Morse- 
Witten sign factor. This admits a simple explicit 
formula (Ashok and Douglas 2004), 


$$I_{\rm vac}(R, L\le L_{\max}) \sim \frac{(2\pi L_{\max})^{b_3}}{\pi^{n+1}\,b_3!}\int_R \det(\mathcal{R} + \omega\cdot 1) \qquad [22]$$


where $\mathcal{R}$ is the (n + 1) × (n + 1)-dimensional matrix
of curvature 2-forms for the Weil-Petersson metric.

One might have guessed this density by the
following reasoning. If s had been a single-valued
section on a compact $\mathcal{M}_c$ (it is not), topological
arguments determine the total index to be
$[c_{n+1}(\mathcal{L}\otimes T^*\mathcal{M})]$, and this is the simplest density
constructed solely from the metric and curvatures in the
same cohomology class.

It is not in general known whether this integral over
Calabi-Yau moduli space is finite, though this is true
in examples studied so far. One can also control $|W|^2$
as well as other observables, and one finds that the
distribution of $|W|^2$ among flux vacua is to a good
approximation uniform. Considering explicit examples,
the prefactor in eqn [22] is of order $10^{100}$ to $10^{300}$,
so assuming that this factor dominates the integral, we
have justified the Bousso-Polchinski solution to the
cosmological constant problem in these models.

The finite L corrections to these formulas can be
estimated using van der Corput techniques, and are
suppressed by better than the naive $L^{-1/2}$ or $|Z|^{-1}$ one
might have expected. However, the asymptotic formulas
for the numbers of flux vacua break down in
certain limits of moduli space, such as the large
complex structure limit. This is because eqn [18]
is an indefinite quadratic form, and the fact that
it bounds the number of solutions at all is somewhat
subtle. These points are discussed at length in
Douglas et al. (2005).

Similar results have been obtained for a wide 
variety of flux vacuum counting problems, with 
constraints on the value of the effective potential at 
the minimum, on the masses of scalar fields, on 
scales of supersymmetry breaking, and so on. And in 
principle, this is just the tip of an iceberg, as the 


study of more or less any class of superstring vacua 
leads to similar questions of counting and distribu- 
tion, less well understood at present. Some of these 
are discussed in Douglas (2003), Acharya et al. 
(2005), Denef and Douglas (2005), Blumenhagen 
et al. (2005). 


See also: Black Hole Mechanics; Chaos and Attractors;
Compactification of Superstring Theory; Supergravity.


Introduction


The concept of random dynamical system is a 
comparatively recent development combining ideas 
and methods from the well-developed areas of 
probability theory and dynamical systems. 

Let us consider a mathematical model of some
physical process given by the iterates
$T_0^k = T_0\circ\cdots\circ T_0$, k ≥ 1, of a smooth transformation
$T_0: M\to M$ of a manifold into itself. A realization of
the process with initial condition $x_0$ is modeled by the
sequence $(T_0^k(x_0))_{k\ge 1}$, the orbit of $x_0$.

Due to our inaccurate knowledge of the particular 
physical system or due to computational or theore- 
tical limitations (e.g., lack of sufficient computa- 
tional power, inefficient algorithms, or insufficiently 
developed mathematical or physical theory), the 
mathematical models never correspond exactly to
the phenomenon they are meant to model. Moreover,
when considering practical systems, we cannot
avoid external noise or measurement errors, so every
realistic mathematical model should tolerate small
errors along orbits without the long-term behavior
being disturbed too much. To cope with unavoidable
uncertainty about the “correct” parameter values,
observed initial states, and even the specific
mathematical formulation involved, we let randomness
be embedded within the model from the start.

This article presents the most basic classes of 
models, defines the general concept, and presents 
some developments and examples of applications. 


Dynamics with Noise 


To model random perturbations of a transformation
$T_0$, we may consider a transition from the
image $T_0(x)$ to some point according to a given
probability law, obtaining a Markov chain, or, if $T_0$
depends on a parameter p, we may choose p at
random at each iteration, which also can be seen as
a Markov chain but whose transitions are strongly
correlated.


Random Noise 


Given $T_0: M\to M$ and a family $\{p(\cdot\,|\,x) : x\in M\}$ of
probability measures on M such that the support of
$p(\cdot\,|\,x)$ is close to $T_0(x)$, the random orbits are
sequences $(x_k)_{k\ge 1}$ where each $x_{k+1}$ is a random
variable with law $p(\cdot\,|\,x_k)$. This is a Markov
chain with state space M and transition probabilities
$\{p(\cdot\,|\,x)\}_{x\in M}$. To extend the concept of invariant
measure of a transformation to this setting, a
probability measure μ is said to be “stationary” if
$\mu(A) = \int p(A\,|\,x)\,d\mu(x)$ for every measurable (Borel)
subset A. This can be conveniently translated by
saying that the skew-product measure $\mu\times p^{\mathbb{N}}$ on
$M\times M^{\mathbb{N}}$ given by


$$d(\mu\times p^{\mathbb{N}})(x_0,x_1,\ldots,x_n,\ldots) = d\mu(x_0)\,p(dx_1\,|\,x_0)\cdots p(dx_{n+1}\,|\,x_n)\cdots$$


is invariant by the shift map $S: M\times M^{\mathbb{N}}\to M\times M^{\mathbb{N}}$ on the
space of orbits. Hence, we may use the ergodic
theorem to conclude that time averages of all continuous
observables $\varphi: M\to\mathbb{R}$ exist; that is, writing
$\underline{x} = (x_k)_{k\ge 0}$, the limits


$$\tilde\varphi(\underline{x}) = \lim_{n\to+\infty}\frac{1}{n}\sum_{k=0}^{n-1}\varphi(x_k) = \lim_{n\to+\infty}\frac{1}{n}\sum_{k=0}^{n-1}\varphi\big(\pi_0(S^k(\underline{x}))\big)$$


exist for $\mu\times p^{\mathbb{N}}$-almost all sequences $\underline{x}$, where
$\pi_0: M\times M^{\mathbb{N}}\to M$ is the natural projection on the
first coordinate. It is well known that stationary
measures always exist if the transition probabilities
$p(\cdot\,|\,x)$ depend continuously on x.

A function $\varphi: M\to\mathbb{R}$ is invariant if
$\varphi(x) = \int\varphi(z)\,p(dz\,|\,x)$ for μ-almost every x. We then say
that μ is ergodic if every invariant function is
constant μ-almost everywhere. Using the ergodic
theorem again, if μ is ergodic, then $\tilde\varphi = \int\varphi\,d\mu$,
μ-almost everywhere.

Stationary measures are the building blocks for 
more sophisticated analysis involving, for example, 
asymptotic sojourn times, Lyapunov exponents, decay 
of correlations, entropy and/or dimensions, exit/ 
entrance times from/to subsets of M, to name just a 
few frequent notions of dynamical and probabilistic/ 
statistical nature. 


Example 1 (Random jumps). Given ε > 0 and
$T_0: M\to M$, let us define

$$p^\varepsilon(A\,|\,x) = \frac{m\big(A\cap B(T_0(x),\varepsilon)\big)}{m\big(B(T_0(x),\varepsilon)\big)}$$

where m denotes some choice of Riemannian
volume form on M. Then $p^\varepsilon(\cdot\,|\,x)$ is the normalized
volume restricted to the ε-neighborhood of $T_0(x)$.
This defines a family of transition probabilities
allowing the points to “jump” from $T_0(x)$ to any
point in the ε-neighborhood of $T_0(x)$ following a
uniform distribution law.
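
Example 1 can be simulated directly. The sketch below takes M to be the circle ℝ/ℤ with m the Lebesgue measure, so that the ε-neighborhood of $T_0(x)$ is an arc and $p^\varepsilon(\cdot\,|\,x)$ is uniform on it; the choice of manifold, of $T_0$ as a rotation, and all names and parameter values are illustrative assumptions. For a rotation with this noise, Lebesgue measure is stationary, so the time average of cos(2πx) should be close to 0.

```python
import numpy as np

def random_jump_orbit(T0, x0, eps, n_steps, rng):
    """Markov chain on the circle R/Z: x_{k+1} is uniform on the
    eps-neighbourhood of T0(x_k)."""
    orbit = np.empty(n_steps + 1)
    orbit[0] = x0 % 1.0
    for k in range(n_steps):
        jump = rng.uniform(-eps, eps)  # uniform law on the eps-ball
        orbit[k + 1] = (T0(orbit[k]) + jump) % 1.0
    return orbit

def time_average(phi, orbit):
    """Finite-time Birkhoff average of the observable phi along the orbit."""
    return float(np.mean(phi(orbit)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rotation = lambda x: (x + 0.25) % 1.0
    orbit = random_jump_orbit(rotation, 0.1, 0.05, 20000, rng)
    print(time_average(lambda x: np.cos(2 * np.pi * x), orbit))
```

Replacing the rotation by another circle map changes the stationary measure, and the same time average then estimates the integral of the observable against that measure whenever it is ergodic.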


Random Maps 


Alternatively, we may choose maps $T_1, T_2, \dots, T_k$
independently at random near $T_0$ according to a
probability law $\nu$ on the space $T(M)$ of maps, whose
support is close to $T_0$ in some topology, and
consider sequences $x_k = T_k \circ \cdots \circ T_1(x_0)$ obtained
through random iteration, $k \ge 1$, $x_0 \in M$.

This is again a Markov chain whose transition
probabilities are given for any $x \in M$ by

\[
p(A \mid x) = \nu\bigl(\{T \in T(M) : T(x) \in A\}\bigr)
\]


so this model may be reduced to the first one. 
However, in the random-maps setting, we may 
associate, with each random orbit, a sequence of 
maps which are iterated, enabling us to use “robust 
properties” of the transformation $T_0$ (i.e., properties
which are known to hold for $T_0$ and for every
nearby map $T$) to derive properties of the random
orbits. 

Under some regularity conditions on the map
$x \mapsto p(A \mid x)$ for every Borel subset $A$, it is possible
to represent random noise by random maps on
suitably chosen spaces of transformations. In fact,
the transition probability measures obtained in the
random-maps setting exhibit strong spatial correlation:
$p(\cdot \mid x)$ is close to $p(\cdot \mid y)$ when $x$ is near $y$.

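If we have a parametrized family $T : U \times M \to M$
of maps, we can specify the law $\nu$ by giving a
probability $\theta$ on $U$. Then with every sequence
$T_1, \dots, T_k, \dots$ of maps of the given family, we
associate a sequence $\omega_1, \dots, \omega_k, \dots$ of parameters in
$U$ since

\[
T_k \circ \cdots \circ T_1 = T_{\omega_k} \circ \cdots \circ T_{\omega_1}
\]

for all $k \ge 1$, where we write $T_\omega(x) = T(\omega, x)$. In this
setting, the shift map $S$ becomes a skew-product
transformation

\[
S : M \times U^{\mathbb{N}} \to M \times U^{\mathbb{N}}, \qquad (x, \omega) \mapsto \bigl(T_{\omega_1}(x), \sigma(\omega)\bigr)
\]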


to which many of the standard methods of dynami- 
cal systems and ergodic theory can be applied, 
yielding stronger results that can be interpreted in 
random terms. 


Example 2 (Parametric noise). Let $T : P \times M \to M$
be a smooth map where $P$, $M$ are finite-dimensional
Riemannian manifolds. We fix $p_0 \in P$, denote by $m$
some choice of Riemannian volume form on $P$, set
$T_w(x) = T(w, x)$, and for every $\varepsilon > 0$ write
$\theta_\varepsilon = (m(B(p_0, \varepsilon)))^{-1} \cdot (m \mid B(p_0, \varepsilon))$, the normalized
restriction of $m$ to the $\varepsilon$-neighborhood of $p_0$. Then
$(T_w)_{w \in P}$, together with $\theta_\varepsilon$, defines a random perturbation
of $T_{p_0}$, for every small enough $\varepsilon > 0$.


Example 3 (Global additive perturbations). Let $M$
be a homogeneous space, that is, a compact
connected Lie group admitting an invariant
Riemannian metric. Fixing a neighborhood $U$ of
the identity $e \in M$, we can define a map
$T : U \times M \to M$, $(u, x) \mapsto L_u(T_0(x))$, where $L_u(x) = u \cdot x$ is
the left translation associated with $u \in M$. The
invariance of the metric means that left (and also
right) translations are isometries; hence, fixing $u \in U$
and taking any $(x, v) \in TM$, we get

\[
\|DT_u(x) \cdot v\| = \|DL_u(T_0(x))(DT_0(x) \cdot v)\| = \|DT_0(x) \cdot v\|
\]

In the particular case of $M = \mathbb{T}^d$, the $d$-dimensional
torus, we have $T_u(x) = T_0(x) + u$, and this simplest
case suggests the name “additive random perturbations”
for random perturbations defined using
families of maps of this type.

For the probability measure on $U$, we may
take $\theta_\varepsilon$, any probability measure supported in the
$\varepsilon$-neighborhood of $e$ and absolutely continuous
with respect to the Riemannian volume on $M$, for
any $\varepsilon > 0$ small enough.


Example 4 (Local additive perturbations). If
$M = \mathbb{R}^d$ and $U_0$ is a bounded open subset of $M$
strictly invariant under a diffeomorphism $T_0$, that is,
$\operatorname{closure}(T_0(U_0)) \subset U_0$, then we can define an
isometric random perturbation setting:

(i) $V = T_0(U_0)$ (so that $\operatorname{closure}(V) = \operatorname{closure}(T_0(U_0)) \subset U_0$);
(ii) $G \simeq \mathbb{R}^d$ the group of translations of $\mathbb{R}^d$; and
(iii) $\mathcal{V}$ a small enough neighborhood of 0 in $G$.

Then for $v \in \mathcal{V}$ and $x \in V$, we set $T_v(x) = x + v$, with
the standard notation for vector addition, and
clearly $T_v$ is an isometry. For $\theta_\varepsilon$, we may take any
probability measure on the $\varepsilon$-neighborhood of 0,
supported in $\mathcal{V}$ and absolutely continuous with
respect to the volume in $\mathbb{R}^d$, for every small enough
$\varepsilon > 0$.


Random Perturbations of Flows 


In the continuous-time case, the basic model to start
with is an ordinary differential equation
$dX_t = f(t, X_t)\,dt$, where $f : [0, +\infty) \to \mathfrak{X}(M)$ and
$\mathfrak{X}(M)$ is the family of vector fields in $M$. We
embed randomness in the differential equation
basically through “diffusion”: the perturbation is
given by white noise or Brownian motion “added”
to the ordinary solution.

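In this setting, assuming for simplicity that
$M = \mathbb{R}^n$, the random orbits are solutions of stochastic
differential equations

\[
dX_t = f(t, X_t)\,dt + \varepsilon \cdot \sigma(t, X_t)\,dW_t, \qquad 0 \le t \le T, \quad X_0 = Z
\]

where $Z$ is a random variable, $\varepsilon, T > 0$ and both
$f : [0, T] \times \mathbb{R}^n \to \mathbb{R}^n$ and $\sigma : [0, T] \times \mathbb{R}^n \to L(\mathbb{R}^k, \mathbb{R}^n)$
are measurable functions. The space of linear maps
$\mathbb{R}^k \to \mathbb{R}^n$ is written $L(\mathbb{R}^k, \mathbb{R}^n)$ and $W_t$ is the
white-noise process on $\mathbb{R}^k$. The solution of this
equation is a stochastic process

\[
X : \mathbb{R} \times \Omega \to M, \qquad (t, \omega) \mapsto X_t(\omega)
\]

for some (abstract) probability space $\Omega$, given by

\[
X_t = Z + \int_0^t f(s, X_s)\,ds + \int_0^t \varepsilon \cdot \sigma(s, X_s)\,dW_s
\]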


where the last term is a stochastic integral in the
sense of Itô. Under reasonable conditions on $f$ and $\sigma$,
there exists a unique solution with continuous paths,
that is,

\[
[0, +\infty) \ni t \mapsto X_t(\omega)
\]

is continuous for almost all $\omega \in \Omega$ (in general these
paths are nowhere differentiable).
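
As a rough numerical illustration of such a perturbed equation, the following Python sketch applies the Euler-Maruyama scheme in the scalar case n = k = 1. The scheme itself, the function name `euler_maruyama` and the test drift are assumptions chosen for illustration; the source does not prescribe any discretization.

```python
import math
import random

def euler_maruyama(f, sigma, z, eps, t_end, n_steps, seed=0):
    """Approximate one path of dX = f(t, X) dt + eps * sigma(t, X) dW, X_0 = z.

    Scalar case only; returns the list [X_0, X_dt, ..., X_{t_end}].
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    t, x = 0.0, z
    path = [x]
    for _ in range(n_steps):
        # Brownian increment over a step of length dt: Gaussian, variance dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + f(t, x) * dt + eps * sigma(t, x) * dw
        t += dt
        path.append(x)
    return path

if __name__ == "__main__":
    # Hypothetical example: linear drift f(t, x) = -x with constant diffusion.
    path = euler_maruyama(lambda t, x: -x, lambda t, x: 1.0, 1.0, 0.1, 1.0, 1000)
    print(path[-1])
```

With eps = 0 the scheme reduces to the explicit Euler method for the unperturbed ordinary differential equation, which gives a simple sanity check.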

Setting $Z = \delta_{x_0}$, the probability measure concentrated
on the point $x_0$, the initial point of the path is
$x_0$ with probability 1. We write $X_t(\omega)x_0$ for paths of
this type. Hence, $x \mapsto X_t(\omega)x$ defines a map
$X_t(\omega) : M \to M$ which can be shown to be a homeomorphism
and even a diffeomorphism under suitable
conditions on $f$ and $\sigma$. These maps satisfy a
cocycle property

\[
X_0(\omega) = \mathrm{Id}_M \ \text{(identity map of } M\text{)}, \qquad
X_{t+s}(\omega) = X_t(\theta(s)(\omega)) \circ X_s(\omega)
\]

for $s, t \ge 0$ and $\omega \in \Omega$, for a family of measure-preserving
transformations $\theta(s) : (\Omega, P) \to (\Omega, P)$ on a
suitably chosen probability space $(\Omega, P)$. This
enables us to write the solution of this kind of
equation also as a skew product.


The Abstract Framework 


The illustrative particular cases presented can all be 
written in skew-product form as follows. 

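Let $(\Omega, P)$ be a given probability space, which will
be the model for the noise, and let $\mathbb{T}$ be time, which
usually means $\mathbb{Z}_+$, $\mathbb{Z}$ (discrete, resp. invertible
system) or $\mathbb{R}_+$, $\mathbb{R}$ (continuous, resp. invertible
system). A random dynamical system is a skew
product

\[
S_t : \Omega \times M \to \Omega \times M, \qquad (\omega, x) \mapsto \bigl(\theta(t)(\omega), \varphi(t, \omega)(x)\bigr)
\]

for all $t \in \mathbb{T}$, where $\theta : \mathbb{T} \times \Omega \to \Omega$ is a family
of measure-preserving maps $\theta(t) : (\Omega, P) \to (\Omega, P)$ and
$\varphi : \mathbb{T} \times \Omega \times M \to M$ is a family of maps
$\varphi(t, \omega) : M \to M$ satisfying the cocycle property: for
$s, t \in \mathbb{T}$, $\omega \in \Omega$,

\[
\varphi(0, \omega) = \mathrm{Id}_M, \qquad
\varphi(t + s, \omega) = \varphi(t, \theta(s)(\omega)) \circ \varphi(s, \omega)
\]

In this general setting an invariant measure for the
random dynamical system is any probability measure
$\mu$ on $\Omega \times M$ which is $S_t$-invariant for all $t \in \mathbb{T}$
and whose marginal is $P$, that is, $\mu(S_t^{-1}(U)) = \mu(U)$
for every measurable $U \subset \Omega \times M$ and
$\mu(\pi_\Omega^{-1}(U)) = P(U)$ for every measurable $U \subset \Omega$,
with $\pi_\Omega : \Omega \times M \to \Omega$ the natural projection.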
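
The cocycle property can be checked numerically in the discrete random-maps case, where the noise is a sequence of parameters, theta is the shift on that sequence, and the cocycle is the composition of the first n perturbed maps. The Python sketch below assumes an additive perturbation of the doubling map on the circle; `perturbed_map`, `shift`, `phi` and `make_noise` are hypothetical names introduced only for this illustration.

```python
import random

def make_noise(n, eps, seed=0):
    """A finite piece omega_1, ..., omega_n of an i.i.d. noise sequence in (-eps, eps)."""
    rng = random.Random(seed)
    return [rng.uniform(-eps, eps) for _ in range(n)]

def perturbed_map(w, x):
    """Assumed family T_w(x) = T_0(x) + w on R/Z, with T_0 the doubling map."""
    return (2.0 * x + w) % 1.0

def shift(omega, s):
    """theta(s): drop the first s entries of the noise sequence."""
    return omega[s:]

def phi(n, omega, x):
    """phi(n, omega)(x) = T_{omega_n} o ... o T_{omega_1}(x)."""
    for w in omega[:n]:
        x = perturbed_map(w, x)
    return x

if __name__ == "__main__":
    omega = make_noise(20, 0.01)
    x = 0.3
    # phi(t + s, omega) should equal phi(t, theta(s) omega) o phi(s, omega).
    print(phi(7, omega, x) == phi(4, shift(omega, 3), phi(3, omega, x)))
```

Both sides apply the same maps in the same order, so the identity holds exactly here, not only up to rounding.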


Example 5. In the setting of the previous examples
of random perturbations of maps, the product
measure $\eta = P \times \mu$ on $\Omega \times M$, with $\Omega = U^{\mathbb{N}}$, $P = \theta^{\mathbb{N}}$
and $\mu$ any stationary measure, is clearly invariant.
However, not all invariant measures are product
measures of this type.


Naturally an invariant measure $\mu$ is ergodic if every
$S_t$-invariant function is $\mu$-almost everywhere
constant. That is, if $\psi : \Omega \times M \to \mathbb{R}$ satisfies
$\psi \circ S_t = \psi$ $\mu$-almost everywhere for every $t \in \mathbb{T}$,
then $\psi$ is $\mu$-almost everywhere constant.


Applications 


The well-established applications of both probability 
or stochastic differential equations (solution of 
boundary value problems, optimal stopping, sto- 
chastic control etc.) and dynamical systems (all 
kinds of models of physical, economic or biological 
phenomena, solutions of differential equations, 
control systems etc.) will not be presented here. 
Instead, this section focuses on topics where the 
subject sheds new light on these areas. 


Products of Random Matrices and the 
Multiplicative Ergodic Theorem 


The following celebrated result on products of
random matrices has far-reaching applications in
dynamical systems theory.

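Let $(X_n)_{n \ge 0}$ be a sequence of independent and
identically distributed random variables on
the probability space $(\Omega, P)$ with values in
$L(\mathbb{R}^k, \mathbb{R}^k)$ such that $E(\log^+ \|X_1\|) < +\infty$, where
$\log^+ x = \max\{0, \log x\}$ and $\|\cdot\|$ is a given norm on
$L(\mathbb{R}^k, \mathbb{R}^k)$. Writing $\varphi_n(\omega) = X_n(\omega) \circ \cdots \circ X_1(\omega)$ for
all $n \ge 1$ and $\omega \in \Omega$ we obtain a cocycle. If we set

\[
B = \Bigl\{ (\omega, y) \in \Omega \times \mathbb{R}^k : \lim_{n \to \infty} \frac{1}{n} \log \|\varphi_n(\omega) y\|
\text{ exists and is finite or is } -\infty \Bigr\},
\]
\[
\Omega' = \{ \omega \in \Omega : (\omega, y) \in B \text{ for all } y \in \mathbb{R}^k \}
\]

then $\Omega'$ contains a subset $\Omega''$ of full probability and
there exist random variables (which might take the
value $-\infty$) $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k$ with the following
properties.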


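1. Let $I = \{k + 1 = i_1 > i_2 > \cdots > i_{l+1} = 1\}$ be any
$(l + 1)$-tuple of integers and then we define

\[
\Omega_I = \{ \omega \in \Omega'' : \lambda_i(\omega) = \lambda_j(\omega),\ i_h > i, j \ge i_{h+1},
\text{ and } \lambda_{i_h}(\omega) > \lambda_{i_{h+1}}(\omega) \text{ for all } 1 < h < l \}
\]

the set of elements where the sequence $\lambda_i$ jumps
exactly at the indexes in $I$. Then for
$\omega \in \Omega_I$, $1 < h \le l$,

\[
\Sigma_{I,h}(\omega) = \Bigl\{ y \in \mathbb{R}^k : \lim_{n \to \infty} \frac{1}{n} \log \|\varphi_n(\omega) y\| \le \lambda_{i_h}(\omega) \Bigr\}
\]

is a vector subspace with dimension $i_{h-1} - 1$.
2. Setting $\Sigma_{I,k+1}(\omega) = \{0\}$, then

\[
\lim_{n \to \infty} \frac{1}{n} \log \|\varphi_n(\omega) y\| = \lambda_{i_h}(\omega)
\]

for every $y \in \Sigma_{I,h}(\omega) \setminus \Sigma_{I,h+1}(\omega)$.
3. For all $\omega \in \Omega''$ there exists the matrix

\[
A(\omega) = \lim_{n \to \infty} \bigl[ (\varphi_n(\omega))^* \varphi_n(\omega) \bigr]^{1/2n}
\]

whose eigenvalues form the set $\{ e^{\lambda_i} : i = 1, \dots, k \}$.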


The values of $\lambda_i$ are the random Lyapunov
characteristics and the corresponding subspaces are
analogous to random eigenspaces. If the sequence
$(X_n)_{n \ge 0}$ is ergodic, then the Lyapunov characteristics
become nonrandom constants, but the Lyapunov
subspaces are still random.
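
The largest Lyapunov characteristic of a matrix product can be estimated numerically by following a single vector and renormalizing at each step, so that the accumulated logarithms equal log of the norm of phi_n applied to the initial vector. The pure-Python sketch below is only an illustration under assumed inputs; `mat_vec`, `top_lyapunov` and the two sample matrices are hypothetical choices, not taken from the source.

```python
import math
import random

def mat_vec(a, y):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a[i][j] * y[j] for j in range(len(y))) for i in range(len(a))]

def top_lyapunov(matrices, y0):
    """Return (1/n) log ||X_n ... X_1 y0|| for the given finite list X_1, ..., X_n."""
    y = list(y0)
    log_growth = 0.0
    for a in matrices:
        y = mat_vec(a, y)
        norm = math.sqrt(sum(c * c for c in y))
        log_growth += math.log(norm)   # accumulate the growth of this step
        y = [c / norm for c in y]      # renormalize to avoid overflow
    return log_growth / len(matrices)

if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed i.i.d. law: each factor is one of two fixed matrices, chosen fairly.
    choices = [[[2.0, 0.0], [0.0, 0.5]], [[1.0, 1.0], [0.0, 1.0]]]
    sample = [rng.choice(choices) for _ in range(10000)]
    print(top_lyapunov(sample, [1.0, 1.0]))
```

For a constant sequence equal to diag(2, 1/2) and a starting vector with nonzero first coordinate, the estimate approaches log 2, the logarithm of the largest eigenvalue.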

We can easily deduce the multiplicative ergodic
theorem for measure-preserving differentiable maps
$(T_0, \mu)$ on manifolds $M$ from this result. For simplicity,
we assume that $M \subset \mathbb{R}^k$ and set $p(A \mid x) = \delta_{T_0(x)}(A) = 1$
if $T_0(x) \in A$ and 0 otherwise. Then the measure $\mu \times p^{\mathbb{N}}$
on $M \times M^{\mathbb{N}}$ is $\sigma$-invariant (as defined earlier) and we
have that $\pi_0 \circ \sigma = T_0 \circ \pi_0$, where $\pi_0 : M \times M^{\mathbb{N}} \to M$ is the
projection on the first coordinate, and also
$(\pi_0)_*(\mu \times p^{\mathbb{N}}) = \mu$. Then, setting for $n \ge 1$

\[
X : M \to L(\mathbb{R}^k, \mathbb{R}^k), \quad x \mapsto DT_0(x), \qquad \text{and} \qquad X_n = X \circ \pi_0 \circ \sigma^n
\]

we obtain a stationary sequence to which we can
apply the previous result, obtaining the existence of
Lyapunov exponents and of Lyapunov subspaces on
a full measure subset for any $C^1$ measure-preserving
dynamical system.

By a standard extension of the previous setup, we
obtain a random version of the multiplicative ergodic
theorem. We take a family of skew-product maps
$S_t : \Omega \times M \to \Omega \times M$ as in the section “The abstract framework”
with an invariant probability measure $\mu$ and
such that $\varphi(t, \omega) : M \to M$ is (for simplicity) a local
diffeomorphism. We then consider the stationary family

\[
X_t : \Omega \to L(TM), \qquad \omega \mapsto D\varphi(t, \omega) : TM \to TM, \qquad t \in \mathbb{T}
\]

where $D\varphi(t, \omega)$ is the tangent map to $\varphi(t, \omega)$. This is
a cocycle since for all $t, s \in \mathbb{T}$, $\omega \in \Omega$ we have

\[
X_{s+t}(\omega) = X_s(\theta(t)\omega) \circ X_t(\omega)
\]

If we assume that

\[
\sup_{0 \le t \le 1} \sup_{x \in M} \bigl( \log^+ \|D\varphi(t, \omega)(x)\| \bigr) \in L^1(\Omega, P)
\]

where $\|\cdot\|$ denotes the norm on the corresponding
space of linear maps given by the induced norm
(from the Riemannian metric) on the appropriate
tangent spaces, then we obtain a sequence of
random variables (which might take the value $-\infty$)
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k$, with $k$ being the dimension of
$M$, such that

\[
\lim_{t \to +\infty} \frac{1}{t} \log \|X_t(\omega, x) y\| = \lambda_i(\omega, x)
\]

for every $y \in E_i(\omega, x) = \Sigma_i(\omega, x) \setminus \Sigma_{i+1}(\omega, x)$ and
$i = 1, \dots, k + 1$, where $(\Sigma_i(\omega, x))_i$ is a sequence of
vector subspaces in $T_xM$ as before, measurable with
respect to $(\omega, x)$. In this setting, the subspaces $E_i(\omega, x)$
and the Lyapunov exponents are invariant, that is,
for all $t \in \mathbb{T}$ and $\mu$-almost every $(\omega, x) \in \Omega \times M$, we
have

\[
\lambda_i(S_t(\omega, x)) = \lambda_i(\omega, x) \qquad \text{and} \qquad E_i(S_t(\omega, x)) = E_i(\omega, x)
\]

The dependence of Lyapunov exponents on the 
map To has been a fruitful and central research 
program in dynamical systems for decades extending 
to the present day. The random multiplicative 
ergodic theorem sets the stage for the study of the 
stability of Lyapunov exponents under random 
perturbations. 


Stochastic Stability of Physical Measures 


The development of the theory of dynamical systems
has shown that models involving expressions as
simple as quadratic polynomials (such as the logistic
family or the Hénon attractor), or autonomous ordinary
differential equations with a hyperbolic singularity
of saddle type, such as the Lorenz flow, exhibit sensitive
dependence on initial conditions, a common feature
of chaotic dynamics: small initial differences are
rapidly augmented as time passes, causing two
trajectories originally coming from practically indistinguishable
points to behave in a completely
different manner after a short while. Long-term
predictions based on such models are unfeasible,
since it is not possible to both specify initial
conditions with arbitrary accuracy and numerically
calculate with arbitrary precision.


Physical measures Inspired by an analogous situa- 
tion of unpredictability faced in the field of 
statistical mechanics/thermodynamics, researchers 
focused on the statistics of the data provided by 
the time averages of some observable (a continuous 
function on the manifold) of the system. Time 
averages are guaranteed to exist for a positive- 
volume subset of initial states (also called an 
observable subset) on the mathematical model if 
the transformation, or the flow associated with the 
ordinary differential equation, admits a smooth 
invariant measure (a density) or a physical measure. 

Indeed, if $\mu_0$ is an ergodic invariant measure for the
transformation $T_0$, then the ergodic theorem ensures
that for every $\mu_0$-integrable function $\varphi : M \to \mathbb{R}$ and
for $\mu_0$-almost every point $x$ in the manifold $M$, the time
average $\tilde{\varphi}(x) = \lim_{n \to +\infty} n^{-1} \sum_{j=0}^{n-1} \varphi(T_0^j(x))$ exists and
equals the space average $\int \varphi\, d\mu_0$. A physical measure
$\mu$ is an invariant probability measure for which it is
required that time averages of every continuous
function $\varphi$ exist for a positive Lebesgue measure
(volume) subset of the space and be equal to the space
average $\mu(\varphi)$.
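
A time average of this kind is easy to compute numerically. The Python sketch below assumes, for illustration only, an irrational rotation of the circle R/Z (for which Lebesgue measure is invariant and ergodic) and the observable cos(2 pi x), whose space average is 0; the helper name `time_average` is hypothetical.

```python
import math

def time_average(step, observable, x0, n):
    """Return (1/n) * sum_{j=0}^{n-1} observable(step^j(x0))."""
    total, x = 0.0, x0
    for _ in range(n):
        total += observable(x)
        x = step(x)
    return total / n

if __name__ == "__main__":
    alpha = (math.sqrt(5.0) - 1.0) / 2.0           # an irrational rotation number
    rotation = lambda x: (x + alpha) % 1.0
    phi = lambda x: math.cos(2.0 * math.pi * x)    # space average over the circle is 0
    print(time_average(rotation, phi, 0.2, 10000))
```

For a rotation the sum is a bounded trigonometric sum, so the time average decays like 1/n towards the space average for every starting point.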

We note that if u is a density, that is, absolutely 
continuous with respect to the volume measure, then 
the ergodic theorem ensures that u is physical. 
However, not every physical measure is absolutely 
continuous. To see why in a simple example,
consider a singularity $p$ of a vector field which is an
attracting fixed point (a sink). The Dirac mass
$\delta_p$ concentrated on $p$ is a physical probability
measure, since every orbit in the basin of attraction
of $p$ will have asymptotic time averages for any
continuous observable $\varphi$ given by $\varphi(p) = \delta_p(\varphi)$.

Physical measures need not be unique or even 
exist in general but, when they do exist, it is 
desirable that the set of points whose asymptotic 
time averages are described by physical measures 
(such a set is called the basin of the physical 
measures) be of full Lebesgue measure — only an 
exceptional set of points with zero volume would 
not have a well-defined asymptotic behavior. This is
still far from being proved for most dynamical
systems, in spite of much recent progress in this
direction.

There are robust examples of systems admitting 
several physical measures whose basins together are 
of full Lebesgue measure, where “robust” means 
that there are whole open sets of maps of a manifold 
in the C* topology exhibiting these features. For 
typical parametrized families of one-dimensional 
unimodal maps (maps of the circle or of the interval 
with a unique critical point), it is known that the 
above scenario holds true for Lebesgue almost every 
parameter. It is known that there are systems 
admitting no physical measure, but the only known 
cases are not robust, that is, there are systems 
arbitrarily close which admit physical measures. 

It is hoped that conclusions drawn from models
admitting physical measures will be effectively observable
in the physical processes being modeled.
In order to lend more weight to this expectation, 
researchers demand stability properties from such 
invariant measures. 


Stochastic stability There are two main issues 
concerning a mathematical model, both from theo- 
retical and practical standpoints. The first one is to 
describe the asymptotic behavior of most orbits, that 
is, to understand what happens to orbits when time 
tends to infinity. The second and equally important 
one is to ascertain whether the asymptotic behavior 
is stable under small changes of the system, that is, 
whether the limiting behavior is still essentially the 
same after small changes to the law of evolution. In 
fact, since models are always simplifications of the 
real system (we cannot ever take into account the 
whole state of the universe in any model), the lack 
of stability considerably weakens the conclusions 
drawn from such models, because some properties 
might be specific to it and not in any way 
resembling the real system. 

Random dynamical systems come into play in this 
setting when we need to check whether a given 
model is stable under small random changes to the 
law of evolution. 

In more precise terms, we suppose that there is a
dynamical system (a transformation or a flow) admitting
a physical measure $\mu_0$ and we take any random
dynamical system obtained from this one through the
introduction of small random perturbations on the
dynamics, as in Examples 1-4 or in the section on
“Random perturbations of flows,” with the noise level
$\varepsilon > 0$ close to zero.

In this setting if, for any choice $\mu_\varepsilon$ of invariant
measure for the random dynamical system for all
$\varepsilon > 0$ small enough, the set of accumulation points of
the family $(\mu_\varepsilon)_{\varepsilon > 0}$ when $\varepsilon$ tends to 0 (these accumulation
points are also known as zero-noise limits) is formed by
physical measures or, more generally, by convex linear
combinations of physical measures, then the original unperturbed
dynamical system is stochastically stable.

This intuitively means that the asymptotic beha- 
vior measured through time averages of continuous 
observables for the random system is close to the 
behavior of the unperturbed system. 

Recent progress in one-dimensional dynamics has
shown that, for typical families $(f_t)_{t \in [0,1]}$ of maps of
the circle or of the interval having a unique critical
point, a full Lebesgue measure subset $T$ of the set of
parameters is such that, for $t \in T$, the dynamics of $f_t$
admits a unique stochastically stable (under additive
noise type random perturbations) physical measure
$\mu_t$ whose basin has full measure in the ambient space
(either the circle or the interval). Therefore, models
involving one-dimensional unimodal maps typically
are stochastically stable.

In many settings (e.g., low-dimensional dynamical 
systems), Lyapunov exponents can be given by time 
averages of continuous functions — for example, the 
time average of log ||DTo|| gives the biggest expo- 
nent. In this case, stochastic stability directly implies 
stability of the Lyapunov exponents under small 
random perturbations of the dynamics. 


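Example 6 (Stochastically stable examples). Let
$T_0 : S^1 \to S^1$ be a map such that $\lambda$, the Lebesgue (length)
measure on the circle, is $T_0$-invariant and ergodic.
Then $\lambda$ is physical.

We consider the parametrized family $T_t : S^1 \times S^1 \to S^1$,
$(t, x) \mapsto x + t$ and a family of probability
measures $\theta_\varepsilon = (\lambda(-\varepsilon, \varepsilon))^{-1} \cdot (\lambda \mid (-\varepsilon, \varepsilon))$ given by the
normalized restriction of $\lambda$ to the $\varepsilon$-neighborhood of
0, where we regard $S^1$ as the Lie group $\mathbb{R}/\mathbb{Z}$ and use
additive notation for the group operation. Since $\lambda$ is
$T_t$-invariant for every $t \in S^1$, $\lambda$ is also an invariant
measure for the measure-preserving random system

\[
S : (S^1 \times \Omega, \lambda \times \theta_\varepsilon^{\mathbb{N}}) \to (S^1 \times \Omega, \lambda \times \theta_\varepsilon^{\mathbb{N}})
\]

for every $\varepsilon > 0$, where $\Omega = (S^1)^{\mathbb{N}}$. Hence, $(T_0, \lambda)$
is stochastically stable under additive noise
perturbations.

Concrete examples can be irrational rotations,
$T_0(x) = x + \alpha$ with $\alpha \in \mathbb{R} \setminus \mathbb{Q}$, or expanding maps of
the circle, $T_0(x) = b \cdot x$ for some $b \in \mathbb{N}$, $b \ge 2$.
Analogous examples exist in higher-dimensional tori.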
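
The stability under additive noise described in Example 6 can be observed numerically. The Python sketch below assumes the expanding map T_0(x) = 2x on R/Z with noise uniform in (-eps, eps), and compares the time average of cos(2 pi x) along a random orbit with its Lebesgue average, which is 0. The function name `noisy_time_average` and the parameter values are hypothetical.

```python
import math
import random

def noisy_time_average(observable, x0, eps, n, seed=0):
    """Time average of an observable along x_{k+1} = 2 x_k + u_k (mod 1),
    with u_k independent and uniform in (-eps, eps)."""
    rng = random.Random(seed)
    total, x = 0.0, x0
    for _ in range(n):
        total += observable(x)
        x = (2.0 * x + rng.uniform(-eps, eps)) % 1.0
    return total / n

if __name__ == "__main__":
    phi = lambda x: math.cos(2.0 * math.pi * x)   # Lebesgue average is 0
    print(noisy_time_average(phi, 0.1, 1e-3, 200000))
```

A side remark on the design: in double-precision arithmetic the unperturbed doubling map sends a starting value such as 0.1 to exactly 0 after a few dozen iterations, because each step discards one binary digit. The small noise therefore also keeps the floating-point simulation from collapsing.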


Example 7 (Stochastic stability depends on the type
of noise). In spite of the straightforward method
for obtaining stochastic stability in Example 6 for,
say, an expanding circle map $T_0(x) = 2 \cdot x$, we
can choose a continuous family of probability
measures $\theta_\varepsilon$ such that the same map $T_0$ is not
stochastically stable.

It is well known that $\lambda$ is the unique absolutely
continuous invariant measure for $T_0$ and also the
unique physical measure. Given $\varepsilon > 0$ small, let us
define transition probability measures as follows:

\[
p_\varepsilon(\cdot \mid z) = \frac{\lambda \mid [\phi_\varepsilon(z) - \varepsilon, \phi_\varepsilon(z) + \varepsilon]}{\lambda\bigl([\phi_\varepsilon(z) - \varepsilon, \phi_\varepsilon(z) + \varepsilon]\bigr)}
\]

where $\phi_\varepsilon \mid (-\varepsilon, \varepsilon) \equiv 0$, $\phi_\varepsilon \mid [S^1 \setminus (-2\varepsilon, 2\varepsilon)] \equiv T_0$, and
over $(-2\varepsilon, -\varepsilon] \cup [\varepsilon, 2\varepsilon)$, we can define $\phi_\varepsilon$ by interpolation
in order that it be smooth.

In this setting, every random orbit starting at
$(-\varepsilon, \varepsilon)$ never leaves this neighborhood in the
future. Moreover, it is easy to see that every
random orbit eventually enters $(-\varepsilon, \varepsilon)$. Hence,
every invariant probability measure $\mu_\varepsilon$ for this
Markov chain model is supported in $[-\varepsilon, \varepsilon]$. Thus,
letting $\varepsilon \to 0$, we see that the only zero-noise limit
is $\delta_0$, the Dirac mass concentrated at 0, which is
not a physical measure for $T_0$.

This construction can be achieved in a random-maps
setting, but only in the $C^0$ topology: it is not
possible to realize this Markov chain by random
maps that are $C^1$ close to $T_0$ for $\varepsilon$ near 0.


Characterization of Measures Satisfying 
the Entropy Formula 


Significant effort has been put in recent years into
extending important results from dynamical systems 
to the random setting. Among many examples are: 
the local conjugacy between the dynamics near a 
hyperbolic fixed point and the action of the derivative 
of the map on the tangent space, the stable/unstable 
manifold theorems for hyperbolic invariant sets and 
the notions and properties of metric and topological 
entropy, dimensions and equilibrium states for 
potentials on random (or fuzzy) sets. 

The characterization of measures satisfying the 
entropy formula is one important result whose 
extension to the setting of iteration of independent 
and identically distributed random maps has 
recently had interesting new consequences back 
into nonrandom dynamical systems. 


Metric entropy for random perturbations Given a
probability measure $\mu$ and a partition $\xi$ of $M$, except
perhaps for a subset of $\mu$-null measure, the entropy
of $\mu$ with respect to $\xi$ is defined to be

\[
H_\mu(\xi) = - \sum_{R \in \xi} \mu(R) \log \mu(R)
\]

where the convention that $0 \log 0 = 0$ has been used.
Given another finite partition $\zeta$, we write $\xi \vee \zeta$ to
indicate the partition obtained through intersection
of every element of $\xi$ with every element of $\zeta$, and
analogously for any finite number of partitions. If $\mu$
is also a stationary measure for a random-maps
model (see the section “Random maps”), then for
any finite measurable partition $\xi$ of $M$,

\[
h_\mu(\xi) = \inf_{n \ge 1} \frac{1}{n} \int H_\mu\Bigl( \bigvee_{i=0}^{n-1} (T_\omega^i)^{-1}(\xi) \Bigr)\, dp^{\mathbb{N}}(\omega)
\]

is finite and is called the entropy of the random
dynamical system with respect to $\xi$ and to $\mu$.
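
The partition entropy and the join of two partitions can be illustrated on a finite probability space. In the Python sketch below, a measure is a dictionary from points to weights and a partition is a list of disjoint sets; these representations and the names `entropy` and `join` are assumptions made for this example only.

```python
import math

def entropy(mu, partition):
    """H_mu(xi) = -sum over cells R of mu(R) log mu(R), with 0 log 0 = 0."""
    total = 0.0
    for cell in partition:
        p = sum(mu[x] for x in cell)
        if p > 0.0:
            total -= p * math.log(p)
    return total

def join(xi, zeta):
    """xi v zeta: all nonempty intersections of a cell of xi with a cell of zeta."""
    return [r & s for r in xi for s in zeta if r & s]

if __name__ == "__main__":
    mu = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}   # uniform measure on four points
    xi = [{0, 1}, {2, 3}]
    zeta = [{0, 2}, {1, 3}]
    print(entropy(mu, xi), entropy(mu, join(xi, zeta)))
```

For the uniform measure on four points, each two-cell partition has entropy log 2, and their join, which separates all four points, has entropy log 4.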

We define $h_\mu = \sup_\xi h_\mu(\xi)$ as the metric entropy
of the random dynamical system, where the
supremum is taken over all $\mu$-measurable partitions.
An important point here is the following notion:
setting $\mathcal{A}$ the Borel $\sigma$-algebra of $M$, we say that a
finite partition $\xi$ of $M$ is a random generating
partition for $\mathcal{A}$ if

\[
\bigvee_{i=0}^{+\infty} (T_\omega^i)^{-1}(\xi) = \mathcal{A}
\]

(except $\mu$-null sets) for $p^{\mathbb{N}}$-almost all $\omega \in \Omega = U^{\mathbb{N}}$.
Then a classical result from ergodic theory ensures
that we can calculate the entropy using only a
random generating partition $\xi$, that is, $h_\mu = h_\mu(\xi)$.


The entropy formula There exists a general 
relation ensuring that the entropy of a measure- 
preserving differentiable transformation (To, u) on a 
compact Riemannian manifold is bounded from 
above by the sum of the positive Lyapunov 
exponents of To 


\[
h_\mu(T_0) \le \int \sum_{\lambda_i(x) > 0} \lambda_i(x)\, d\mu(x)
\]


The equality (entropy formula) was first shown
to hold for diffeomorphisms preserving a measure
equivalent to the Riemannian volume, and then the
measures satisfying the entropy formula were
characterized: for $C^2$ diffeomorphisms the equality
holds if and only if the disintegration of u along the 
unstable manifolds is formed by measures abso- 
lutely continuous with respect to the Riemannian 
volume restricted to those submanifolds. The 
unstable manifolds are the submanifolds of M 
everywhere tangent to the Lyapunov subspaces 
corresponding to all positive Lyapunov exponents, 
analogous to “integrating the distribution of Lya- 
punov subspaces corresponding to positive expo- 
nents” — this particular point is a main subject of 


smooth ergodic theory for nonuniformly hyperbolic 
dynamics. 
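
As a worked illustration of the equality case, consider the expanding circle map $T_0(x) = 2x \bmod 1$ with Lebesgue measure, written Leb here to avoid a clash with the exponents $\lambda_i$. This map is not a diffeomorphism, so it lies outside the literal statement above; it is offered only as a standard example in which both sides can be computed by hand.

```latex
% Single Lyapunov exponent: |T_0'(x)| = 2 at every point, so
\[
\lambda_1(x) = \lim_{n \to +\infty} \frac{1}{n} \log \bigl| (T_0^n)'(x) \bigr|
             = \lim_{n \to +\infty} \frac{1}{n} \log 2^n = \log 2 .
\]
% Metric entropy of the doubling map with respect to Lebesgue measure:
\[
h_{\mathrm{Leb}}(T_0) = \log 2 = \int \sum_{\lambda_i(x) > 0} \lambda_i(x) \, d\mathrm{Leb}(x) .
\]
```

Here the invariant measure is itself absolutely continuous, which is consistent with the characterization in terms of absolutely continuous disintegrations along expanding directions.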

Both the inequality and the characterization of
stationary measures satisfying the entropy formula
were extended to random iterations of independent
and identically distributed $C^2$ maps (noninjective
and admitting critical points), and the inequality
reads

\[
h_\mu \le \iint \sum_{\lambda_i(x, \omega) > 0} \lambda_i(x, \omega)\, d\mu(x)\, dp^{\mathbb{N}}(\omega)
\]

where the functions $\lambda_i$ are the random variables
provided by the random multiplicative ergodic
theorem.


Construction of Physical Measures 
as Zero-Noise Limits 


The characterization of measures which satisfy the 
entropy formula enables us to construct physical 
measures as zero-noise limits of random invariant 
measures in some settings, outlined in the following, 
obtaining in the process that the physical measures 
so constructed are also stochastically stable. 

The physical measures obtained in this manner 
arguably are natural measures for the system, since 
they are both stable under (certain types of) 
random perturbations and describe the asymptotic 
behavior of the system for a positive-volume subset 
of initial conditions. This is a significant contribu- 
tion to the state-of-the-art of present knowledge on 
dynamics from the perspective of random dynami- 
cal systems. 


Hyperbolic measures and the entropy formula The 
main idea is that an ergodic invariant measure u for 
a diffeomorphism To which satisfies the entropy 
formula and whose Lyapunov exponents are every- 
where nonzero (known as hyperbolic measure) 
necessarily is a physical measure for To. This follows 
from standard arguments of smooth nonuniformly 
hyperbolic ergodic theory. 

Indeed, $\mu$ satisfies the entropy formula if and only
if $\mu$ disintegrates into densities along the unstable
submanifolds of $T_0$. The unstable manifolds $W^u(x)$
are tangent to the subspace corresponding to every
positive Lyapunov exponent at $\mu$-almost every point
$x$; they are an invariant family, that is,
$T_0(W^u(x)) = W^u(T_0(x))$ for $\mu$-almost every $x$, and distances
on them are uniformly contracted under
iteration by $T_0^{-1}$.

If the exponents along the complementary directions
are nonzero, then they must be negative
and smooth ergodic theory ensures that there exist
stable manifolds, which are submanifolds $W^s(x)$ of
$M$ everywhere tangent to the subspace of negative
Lyapunov exponents at $\mu$-almost every point $x$, form
a $T_0$-invariant family ($T_0(W^s(x)) = W^s(T_0(x))$, $\mu$-almost
everywhere), and distances on them are uniformly
contracted under iteration by $T_0$.

We still need to understand that time averages
are constant along both stable and unstable manifolds,
and that the families of stable and unstable
manifolds are absolutely continuous, in order to
realize how a hyperbolic measure is a physical
measure.

Given $y \in W^s(x)$, the time averages of $x$ and $y$
coincide for continuous observables simply because
$\operatorname{dist}(T_0^n(x), T_0^n(y)) \to 0$ when $n \to +\infty$. For unstable
manifolds, the same holds when considering time
averages for $T_0^{-1}$. Since forward and backward time
averages are equal $\mu$-almost everywhere, the set of
points having asymptotic time averages given by $\mu$
has positive Lebesgue measure if the set

\[
B = \bigcup \{ W^s(y) : y \in W^u(x) \cap \operatorname{supp}(\mu) \}
\]

has positive volume in $M$, for some $x$ whose time
averages are well defined.

Now, stable and unstable manifolds are transverse
everywhere where they are defined, but they
are only defined $\mu$-almost everywhere and depend
measurably on the base point, so we cannot use
transversality arguments from differential topology,
in spite of $W^u(x) \cap \operatorname{supp}(\mu)$ having positive
volume in $W^u(x)$ by the existence of a smooth
disintegration of $\mu$ along the unstable manifolds.
However, it is known for smooth ($C^2$) transformations
that the families of stable and unstable
manifolds are absolutely continuous, meaning
that projections along leaves preserve sets of zero
volume. This is precisely what is needed for
measure-theoretic arguments to show that $B$ has
positive volume.


Zero-noise limits satisfying the entropy
formula Using the extension of the characterization
of measures satisfying the entropy formula
for the random-maps setting, we can build random
dynamical systems, which are small random perturbations
of a map $T_0$, having invariant measures $\mu_\varepsilon$
satisfying the entropy formula for all sufficiently
small $\varepsilon > 0$. Indeed, it is enough to construct small
random perturbations of $T_0$ having absolutely
continuous invariant probability measures $\mu_\varepsilon$ for all
small enough $\varepsilon > 0$.

In order to obtain such random dynamical
systems, we choose families of maps $T : U \times M \to M$
and of probability measures $(\theta_\varepsilon)_{\varepsilon > 0}$ as in
Examples 3 and 4, where we assume that $0 \in U$, so
that $T_0$ belongs to the family. Letting $T_x(u) = T(u, x)$
for all $(u, x) \in U \times M$, we then have that $T_x(\theta_\varepsilon)$ is
absolutely continuous. This means that sets of
perturbations of positive $\theta_\varepsilon$-measure send points of
$M$ onto positive-volume subsets of $M$. Such a
perturbation can be constructed for every continuous
map of any manifold.

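In this setting, any invariant probability measure
for the associated skew-product map $S : \Omega \times M \to \Omega \times M$ of
the form $\theta_\varepsilon^{\mathbb{N}} \times \mu_\varepsilon$ is such that $\mu_\varepsilon$ is absolutely
continuous with respect to volume on $M$. Then the
entropy formula holds:

\[
h_{\mu_\varepsilon} = \iint \sum_{\lambda_i(x, \omega) > 0} \lambda_i(x, \omega)\, d\mu_\varepsilon(x)\, d\theta_\varepsilon^{\mathbb{N}}(\omega)
\]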


Having this and knowing the characterization of
measures satisfying the entropy formula, it is natural
to look for conditions under which we can guarantee
that the above inequality extends to any zero-noise
limit $\mu_0$ of $\mu_\varepsilon$ when $\varepsilon \to 0$. In this case, $\mu_0$
satisfies the entropy formula for $T_0$.

If, in addition, we are able to show that uo is a 
hyperbolic measure, then we obtain a physical measure 
for To which is stochastically stable by construction. 

These ideas can be carried out completely for 
hyperbolic diffeomorphisms, that is, maps admitting 
a continuous invariant splitting of the tangent space 
into two sub-bundles $E \oplus F$ defined everywhere with
bounded angles, whose Lyapunov exponents are 
negative along E and positive along F. Recently, 
maps satisfying weaker conditions were shown to 
admit stochastically stable physical measures follow- 
ing the same ideas. 

These ideas also have applications to the construction
and stochastic stability of physical measures
for strange attractors and for all mathematical
models involving ordinary differential equations or
iterations of maps.


See also: Dynamical Systems in Mathematical Physics: 
An Illustration from Water Waves; Homeomorphisms and 
Diffeomorphisms of the Circle; Lyapunov Exponents and 
Strange Attractors; Nonequilibrium Statistical Mechanics 
(Stationary): Overview; Random Walks in Random 
Environments; Stochastic Differential Equations. 
Introduction


We wish to study energy correlations of quantum 
spectra. Suppose the spectrum of a quantum system 
has been measured or calculated. All levels in the 
total spectrum having the same quantum numbers 
form one particular suwbspectrum. Its energy levels are 
at positions x,, 7=1,2,...,N, say. We assume that 
N, the number of levels in this subspectrum, is large. 
With a proper smoothing procedure, we obtain the 
level density R1(x), that is, the probability density of 
finding a level at the energy x. As indicated in the top 
part of Figure 1, the level density R1(x) increases with 
x for most physics systems. In the present context, 
however, we are not so interested in the level density. 
We want to measure the spectral correlations 
independently of it. Hence, we have to remove the 
level density from the subspectrum. This is referred to 
as unfolding. We introduce a new dimensionless 
energy scale ξ such that dξ = R_1(x) dx. By construction, 
the resulting subspectrum in ξ has level density 
unity, as shown schematically in the bottom part of 
Figure 1. It is always understood that the energy 
correlations are analyzed in the unfolded subspectra. 
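
The unfolding step can be tried out numerically. The following is a minimal sketch, assuming NumPy and a polynomial fit to the level staircase as the smoothing procedure; the text does not prescribe a particular smoothing, and the helper name `unfold`, the fit degree, and the toy spectrum are illustrative choices only.

```python
import numpy as np

def unfold(levels, degree=5):
    """Map levels x_n to xi_n via a smooth fit of the level staircase (hypothetical helper)."""
    x = np.sort(np.asarray(levels, dtype=float))
    staircase = np.arange(1, x.size + 1)        # number of levels up to and including x_n
    # Rescale x to [-1, 1] so that the polynomial fit stays well conditioned.
    t = 2.0 * (x - x[0]) / (x[-1] - x[0]) - 1.0
    coeffs = np.polyfit(t, staircase, degree)   # smooth part of the staircase
    return np.polyval(coeffs, t)

rng = np.random.default_rng(0)
# Toy spectrum of uncorrelated levels whose density R_1(x) increases with x.
levels = np.sqrt(rng.uniform(1.0, 4.0, size=4000))
xi = unfold(levels)
mean_spacing = np.diff(xi).mean()               # should be close to 1 after unfolding
```

The fit degree has to be low enough that the polynomial follows only the smooth trend of the staircase and not the fluctuations one wants to study.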

Figure 1 Original (top) and unfolded (bottom) spectrum. 

Surprisingly, a remarkable universality is found in 
the spectral correlations of a large class of systems, 
including nuclei, atoms, molecules, quantum chaotic 
and disordered systems, and even quantum chromodynamics 
on the lattice. Consider the nearest-neighbor spacing 
distribution p(s). It is the probability density of 
finding two adjacent levels at a distance s. If the 
positions of the levels are uncorrelated, the 
nearest-neighbor spacing distribution can be shown to 
follow the Poisson law 


p^{(P)}(s) = \exp(-s)    [1] 


While this is occasionally found, many more systems 
show a rather different nearest-neighbor spacing 
distribution, the Wigner surmise 


p^{(W)}(s) = \frac{\pi}{2} s \exp\left(-\frac{\pi}{4} s^2\right)    [2] 


As shown in Figure 2, the Wigner surmise excludes 
degeneracies, p^{(W)}(0) = 0; the levels repel each other. 
This is only possible if they are correlated. Thus, the 
Poisson law and the Wigner surmise reflect the absence 
or the presence of energy correlations, respectively. 

Figure 2 Wigner surmise (solid) and Poisson law (dashed). 

Now, the question arises: if these correlation 
patterns are so frequently found in physics, is 
there some simple, phenomenological model? 
Yes, random matrix theory (RMT) is precisely this. 
To describe the absence of correlations, we choose, 
in view of what has been said above, a diagonal 
Hamiltonian 


H = \mathrm{diag}(x_1, \ldots, x_N)    [3] 


whose elements, the eigenvalues x_n, are uncorrelated 
random numbers. To model the presence of correla- 
tions, we insert off-diagonal matrix elements, 


H = \begin{pmatrix} H_{11} & \cdots & H_{1N} \\ \vdots & & \vdots \\ H_{N1} & \cdots & H_{NN} \end{pmatrix}    [4] 

We require that H is real symmetric, H^T = H. The 
independent elements H_nm are random numbers. 
The random matrix H is diagonalized to obtain the 
energy levels x_n, n = 1, 2, ..., N. Indeed, a numerical 
simulation shows that these two models yield, after 
unfolding, the Poisson law and the Wigner surmise 
for large N, that is, the absence or presence of 
correlations. This is the most important insight into 
the phenomenology of RMT. 
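
A numerical experiment of this kind can be sketched in a few lines. The version below assumes NumPy, Gaussian entries for the real symmetric matrix, uniformly distributed diagonal entries for the uncorrelated case, and a crude unfolding that keeps only the central half of each spectrum and rescales to unit mean spacing; the helper `central_spacings` and all parameter values are illustrative assumptions.

```python
import numpy as np

def central_spacings(levels, keep=0.5):
    """Spacings from the central part of a spectrum, rescaled to unit mean (crude unfolding)."""
    x = np.sort(levels)
    n = x.size
    lo, hi = int(n * (1 - keep) / 2), int(n * (1 + keep) / 2)
    s = np.diff(x[lo:hi])
    return s / s.mean()

rng = np.random.default_rng(1)
N, samples = 200, 200
poisson, goe = [], []
for _ in range(samples):
    # Model [3]: diagonal matrix with uncorrelated random entries.
    poisson.append(central_spacings(rng.uniform(-1.0, 1.0, size=N)))
    # Model [4]: real symmetric matrix with random entries everywhere.
    a = rng.normal(size=(N, N))
    h = (a + a.T) / 2.0
    goe.append(central_spacings(np.linalg.eigvalsh(h)))
poisson, goe = np.concatenate(poisson), np.concatenate(goe)

# Fraction of spacings below 0.1: the Poisson law [1] gives 1 - exp(-0.1),
# while the Wigner surmise [2] gives a much smaller value because of level repulsion.
frac_poisson = np.mean(poisson < 0.1)
frac_goe = np.mean(goe < 0.1)
```

Comparing histograms of `poisson` and `goe` with eqns [1] and [2] gives the picture of Figure 2.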

In this article, we set up RMT in a more formal 
way; we discuss analytical calculations of correla- 
tion functions, demonstrate how this relates to 
supersymmetry and stochastic field theory and 
show the connection to chaos, and we briefly sketch 
the numerous applications in many-body physics, in 
disordered and mesoscopic systems, in models for 
interacting fermions, and in quantum chromody- 
namics. We also mention applications in other 
fields, even beyond physics. 


Random Matrix Theory 
Classical Gaussian Ensembles 


For now, we consider a system whose energy levels 
are correlated. The N x N matrix H modeling it has 
no fixed zeros but random entries everywhere. There 
are three possible symmetry classes of random 
matrices in standard Schrödinger quantum 
mechanics. They are labeled by the Dyson index β. 
If the system is not time-reversal invariant, H has to 
be Hermitian and the random entries H_nm are 
complex (β = 2). If time-reversal invariance holds, 
two possibilities must be distinguished: if either the 
system is rotationally symmetric, or it has integer spin 
and rotational symmetry is broken, the Hamilton 
matrix H can be chosen to be real symmetric (β = 1). 
This is the case in eqn [4]. If, on the other hand, the 
system has half-integer spin and rotational symmetry 
is broken, H is self-dual (β = 4) and the random 
entries H_nm are 2 × 2 quaternionic. The Dyson 
index β is the dimension of the number field over 
which H is constructed. 

As we are interested in the eigenvalue correlations, 
we diagonalize the random matrix, H = U^{-1} x U. 
Here, x = diag(x_1, ..., x_N) is the diagonal 
matrix of the N eigenvalues. For β = 4, every 
eigenvalue is doubly degenerate. This is Kramers’ 
degeneracy. The diagonalizing matrix U is in the 
orthogonal group O(N) for β = 1, in the unitary 
group U(N) for β = 2 and in the unitary-symplectic 
group USp(2N) for β = 4. Accordingly, the three 
symmetry classes are referred to as orthogonal, 
unitary, and symplectic. 

We have not yet chosen the probability densities 
for the random entries H_nm. To keep our assumptions 
about the system at a minimum, we treat all 
entries on equal footing. This is achieved by 
rotational invariance of the probability density 
P_N^{(β)}(H), not to be confused with the rotational 
symmetry employed above to define the symmetry 
classes. No basis for the matrices is preferred in any 
way if we construct P_N^{(β)}(H) from matrix invariants, 
that is, from traces and determinants, such that it 
depends only on the eigenvalues, P_N^{(β)}(H) = P_N^{(β)}(x). A 
particularly convenient choice is the Gaussian 


P_N^{(\beta)}(H) = C_N^{(\beta)} \exp\left(-\frac{\beta}{4v^2} \mathrm{tr}\, H^2\right)    [5] 


where the constant v sets the energy scale and the 
constant C_N^{(β)} ensures normalization. The three 
symmetry classes together with the probability 
densities [5] define the Gaussian ensembles: the 
Gaussian orthogonal (GOE), unitary (GUE) and 
symplectic (GSE) ensemble for β = 1, 2, 4. 
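
For numerical work, matrices from the GOE and the GUE can be drawn as in the following sketch. It assumes NumPy and takes the Gaussian [5] in the form exp(-β tr H² / (4v²)), for which the off-diagonal elements have second moment v²; the function name `sample_gaussian_ensemble` is a hypothetical helper, and the symplectic case β = 4 is left out.

```python
import numpy as np

def sample_gaussian_ensemble(n, beta, v=1.0, rng=None):
    """Draw H with density proportional to exp(-beta tr H^2 / (4 v^2)), beta = 1 or 2."""
    rng = np.random.default_rng() if rng is None else rng
    if beta == 1:
        a = rng.normal(size=(n, n))                          # real entries
    elif beta == 2:
        a = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2.0)
    else:
        raise ValueError("this sketch covers beta = 1 and beta = 2 only")
    # Symmetrizing makes H real symmetric (beta = 1) or Hermitian (beta = 2).
    return v * (a + a.conj().T) / np.sqrt(2.0)

rng = np.random.default_rng(2)
h_goe = sample_gaussian_ensemble(400, beta=1, v=0.5, rng=rng)
h_gue = sample_gaussian_ensemble(400, beta=2, v=0.5, rng=rng)
off = ~np.eye(400, dtype=bool)                               # mask of off-diagonal entries
second_moment_goe = np.mean(np.abs(h_goe[off]) ** 2)         # estimate of v^2
second_moment_gue = np.mean(np.abs(h_gue[off]) ** 2)         # estimate of v^2
```

In this normalization the diagonal elements have variance 2v²/β, so the constant v indeed sets the energy scale for both classes.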

The phenomenology of the three Gaussian 
ensembles differs considerably. The higher β, the 
stronger the level repulsion between the eigenvalues 
x_n. Numerical simulation quickly shows that the 
nearest-neighbor spacing distribution behaves like 
p^{(β)}(s) ∼ s^β for small spacings s. This also becomes 
obvious by working out the differential probability 
P_N^{(β)}(H) d[H] of the random matrices H in eigenvalue- 
angle coordinates x and U. Here, d[H] is the invariant 
measure or volume element in the matrix space. When 
writing d[·], we always mean the product of all 
differentials of independent variables for the quantity 
in the square brackets. Up to constants, we have 


d[H] = |\Delta_N(x)|^{\beta} \, d[x] \, d\mu(U)    [6] 


where dμ(U) is, apart from certain phase contributions, 
the invariant or Haar measure on O(N), U(N), 
or USp(2N), respectively. The Jacobian of the 
transformation is the modulus of the Vandermonde 
determinant 


\Delta_N(x) = \prod_{n<m} (x_n - x_m)    [7] 


raised to the power β. Thus, the differential 
probability P_N^{(β)}(H) d[H] vanishes whenever any 
two eigenvalues x_n are degenerate. This is the level 
repulsion. It immediately explains the behavior of 
the nearest-neighbor spacing distribution for small 
spacings. 
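
The simplest worked case is N = 2, where the spacing distribution follows directly from the Vandermonde factor. The short derivation below is a sketch that assumes the Gaussian [5] in the form exp(-β tr H² / (4v²)); the center variable X is introduced here only for the calculation.

```latex
% Joint eigenvalue density for N = 2, from [5], [6] and [7]:
P(x_1, x_2) \propto |x_1 - x_2|^{\beta}
    \exp\left( -\frac{\beta}{4 v^2} (x_1^2 + x_2^2) \right)

% With the spacing s = |x_1 - x_2| and the center X = (x_1 + x_2)/2:
x_1^2 + x_2^2 = 2 X^2 + \frac{s^2}{2}

% Integrating over X leaves
p(s) \propto s^{\beta} \exp\left( -\frac{\beta s^2}{8 v^2} \right)

% For \beta = 1, normalization and unit mean spacing fix both constants:
p(s) = \frac{\pi}{2} s \exp\left( -\frac{\pi}{4} s^2 \right)
```

The last line is the Wigner surmise [2]; the factor s^β in the line before it shows the small-spacing behavior for all three symmetry classes.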

Additional symmetry constraints lead to new 
random matrix ensembles relevant in physics, the 
Andreev and the chiral Gaussian ensembles. If one 
refers to the classical Gaussian ensembles, one 
usually means the three ensembles introduced 
above. 


Correlation Functions 


The probability density to find k energy levels at 
positions x_1, ..., x_k is the k-level correlation function 
R_k^{(β)}(x_1, ..., x_k). We find it by integrating out 
N − k levels in the N-level differential probability 
P_N^{(β)}(H) d[H]. We also have to average over the 
bases, that is, over the diagonalizing matrices U. 
Due to rotational invariance, this simply yields the 
group volume. Thus, we have 


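R_k^{(\beta)}(x_1, \ldots, x_k) = \frac{N!}{(N-k)!} \int_{-\infty}^{+\infty} dx_{k+1} \cdots \int_{-\infty}^{+\infty} dx_N \, |\Delta_N(x)|^{\beta} P_N^{(\beta)}(x)    [8] 
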

Once more, we used rotational invariance which 
implies that P_N^{(β)}(x) is invariant under permutation of 
the levels x_n. Since the same then also holds for the 
correlation functions [8], it is convenient to normalize 
them to the combinatorial factor in front of the 
integrals. A constant ensuring this has been 
absorbed into P_N^{(β)}(x). 

Remarkably, the integrals in eqn [8] can be done 
in closed form. The GUE case (β = 2) is mathematically 
the simplest, and one finds the determinant 
structure 


R_k^{(2)}(x_1, \ldots, x_k) = \det\left[ K_N^{(2)}(x_p, x_q) \right]_{p,q=1,\ldots,k}    [9] 


All entries of the determinant can be expressed in 
terms of the kernel K_N^{(2)}(x_p, x_q), which depends on 
two energy arguments (x_p, x_q). Analogous but 
more complicated formulae are valid for the 
GOE (β = 1) and the GSE (β = 4), involving 
quaternion determinants and integrals and derivatives 
of the kernel. 

As argued in the Introduction, we are interested in 
the energy correlations on the unfolded energy scale. 
The level density is formally the one-level correla- 
tion function. For the three Gaussian ensembles it is, 
to leading order in the level number N, the Wigner 
semicircle 


R_1^{(\beta)}(x_1) = \frac{1}{2\pi v^2} \sqrt{4 N v^2 - x_1^2}    [10] 


for |x_1| < 2√N v and zero for |x_1| > 2√N v. None of 
the common systems in physics has such a level 
density. When unfolding, we also want to take the 
limit of infinitely many levels N → ∞ to remove 
cutoff effects due to the finite dimension of the 
random matrices. It suffices to stay in the center of 
the semicircle where the mean level spacing is 
D = 1/R_1^{(β)}(0) = πv/√N. We introduce the dimensionless 
energies ξ_p = x_p/D, p = 1, ..., k, which have 
to be held fixed when taking the limit N → ∞. The 
unfolded correlation functions are given by 


X_k^{(\beta)}(\xi_1, \ldots, \xi_k) = \lim_{N\to\infty} D^k R_k^{(\beta)}(D\xi_1, \ldots, D\xi_k)    [11] 


As we are dealing with probability densities, the 
Jacobians dx_p/dξ_p enter the reformulation in the 
new energy variables. This explains the factor D^k. 
Unfolding makes the correlation functions translation 
invariant; they depend only on the differences 
ξ_p − ξ_q. The unfolded correlation functions can be 
written in a rather compact form. For the GUE 
(β = 2), they read 


X_k^{(2)}(\xi_1, \ldots, \xi_k) = \det\left[ \frac{\sin \pi(\xi_p - \xi_q)}{\pi(\xi_p - \xi_q)} \right]_{p,q=1,\ldots,k}    [12] 


There are similar, but more complicated, formulae 
for the GOE (β = 1) and the GSE (β = 4). By 
construction, one has X_1^{(β)}(ξ_1) = 1. 
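
The determinant formula [12] is easy to evaluate numerically. The following sketch assumes NumPy, whose `np.sinc(t)` is sin(πt)/(πt) with the value 1 at t = 0; the function name `unfolded_gue_correlation` is a hypothetical helper.

```python
import numpy as np

def unfolded_gue_correlation(xi):
    """X_k for the GUE from eqn [12]: determinant of the sine kernel at the points xi."""
    xi = np.asarray(xi, dtype=float)
    diff = xi[:, None] - xi[None, :]       # matrix of differences xi_p - xi_q
    kernel = np.sinc(diff)                 # sin(pi t) / (pi t), equal to 1 on the diagonal
    return np.linalg.det(kernel)

x1 = unfolded_gue_correlation([0.3])               # one-level function
x2_close = unfolded_gue_correlation([0.0, 0.05])   # two nearby levels, suppressed by repulsion
x2_far = unfolded_gue_correlation([0.0, 1.0])      # two levels one mean spacing apart
x2_shifted = unfolded_gue_correlation([2.0, 2.05]) # same difference as x2_close
```

For k = 2 the determinant reduces to 1 minus the squared sine kernel, a function of the difference of the two energies only, which is the translation invariance mentioned above.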

It is useful to formulate the case where correlations 
are absent, that is, the Poisson case, accordingly. 
The level density R_1^{(P)}(x_1) is simply N times the 
(smooth) probability density chosen for the entries 
in the diagonal matrix [3]. Lack of correlations 
means that the k-level correlation function only 
involves one-level correlations, 


R_k^{(P)}(x_1, \ldots, x_k) = \frac{N!}{(N-k)! \, N^k} \prod_{p=1}^{k} R_1^{(P)}(x_p)    [13] 


The combinatorial factor is important, since we 
always normalize to N!/(N — k)!. Hence, one finds 


X_k^{(P)}(\xi_1, \ldots, \xi_k) = 1    [14] 


for all unfolded correlation functions. 


Statistical Observables 


The unfolded correlation functions yield all statis- 
tical observables. The two-level correlation function 
X_2(r) with r = ξ_1 − ξ_2 is of particular interest in 
applications. If we do not write the superscript (β) 
or (P), we mean either of the functions. For the 
Gaussian ensembles, X_2^{(β)}(r) is shown in Figure 3. 
One often writes X_2(r) = 1 − Y_2(r). The two-level 
cluster function Y_2(r) nicely measures the deviation 
from the uncorrelated Poisson case, where one has 
X_2^{(P)}(r) = 1 and Y_2^{(P)}(r) = 0. 


Figure 3 Two-level correlation function X_2^{(β)}(r) for GOE (solid), 
GUE (dashed) and GSE (dotted). 


By construction, the average level number in an 
interval of length L in the unfolded spectrum is L. 
The level number variance Σ²(L) is shown to be an 
average over the two-level cluster function, 


\Sigma^2(L) = L - 2 \int_0^L (L - r) \, Y_2(r) \, dr    [15] 


We find L ± √Σ²(L) levels in an interval of length L. 
In the uncorrelated Poisson case, one has Σ^{2(P)}(L) = L. 
This is just Poisson’s error law. For the Gaussian 
ensembles, Σ^{2(β)}(L) behaves logarithmically for large L. 
The spectrum is said to be more rigid than in the 
Poisson case. As Figure 4 shows, the level number 
variance probes longer distances in the spectrum, in 
contrast to the nearest-neighbor spacing distribution. 
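
The integral [15] can be evaluated numerically for any given cluster function. The sketch below assumes NumPy and a plain trapezoidal rule, and it uses the GUE cluster function Y_2(r) = (sin πr / πr)², which follows from the k = 2 case of eqn [12]; the function name `number_variance` and the grid size are illustrative choices.

```python
import numpy as np

def number_variance(L, cluster, points=200001):
    """Sigma^2(L) = L - 2 * integral_0^L (L - r) Y_2(r) dr, by the trapezoidal rule."""
    r = np.linspace(0.0, L, points)
    f = (L - r) * cluster(r)
    integral = np.sum((f[1:] + f[:-1]) * np.diff(r)) / 2.0
    return L - 2.0 * integral

def gue_cluster(r):
    """Y_2(r) for the GUE: squared sine kernel, with np.sinc(r) = sin(pi r) / (pi r)."""
    return np.sinc(r) ** 2

def poisson_cluster(r):
    """Y_2(r) = 0 in the absence of correlations."""
    return np.zeros_like(r)

sigma2_gue = number_variance(10.0, gue_cluster)          # stays well below L: rigid spectrum
sigma2_poisson = number_variance(10.0, poisson_cluster)  # equals L: Poisson error law
```

Evaluating `number_variance` for a range of L values shows the slow, logarithmic growth for the GUE next to the linear growth of the Poisson case.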

Figure 4 Level number variance Σ²(L) for GOE (solid) and 
Poisson case (dashed). 

Many more observables, also sensitive to higher-order 
(k > 2) correlations, have been defined. In 
practice, however, one is often restricted to analyzing 
two-level correlations. An exception is, to some 
extent, the nearest-neighbor spacing distribution 
p(s). It is the two-level correlation function with 
the additional requirement that the two levels in 
question are adjacent, that is, that there are no levels 
between them. Thus, all correlation functions are 
needed if one wishes to calculate the exact nearest- 
neighbor spacing distribution p^{(β)}(s) for the 
Gaussian ensembles. These considerations explain 
that we have X_2^{(β)}(s) ≃ p^{(β)}(s) for small s. But while 
X_2^{(β)}(s) saturates for large s, p^{(β)}(s) quickly goes to 
zero in a Gaussian fashion. Thus, although the 
nearest-neighbor spacing distribution mathematically 
involves all correlations, in practice it makes a 
meaningful statement only about the two-level 
correlations. Luckily, p^{(β)}(s) differs only very slightly 
from the heuristic Wigner surmise [2] (corresponding 
to β = 1), respectively from its extensions 
(corresponding to β = 2 and β = 4). 


Ergodicity and Universality 


We constructed the correlation functions as averages 
over an ensemble of random matrices. But this is not 
how we proceeded in the data analysis sketched in 
the Introduction. There, we started from one single 
spectrum with very many levels and obtained the 
statistical observable just by sampling and, if 
necessary, smoothing. Do these two averages, the 
ensemble average and the spectral average, yield the 
same? Indeed, one can show that the answer is 
affirmative, if the level number N goes to infinity. 
This is referred to as ergodicity in RMT. 

Moreover, as already briefly indicated in the 
Introduction, very many systems from different 
areas of physics are well described by RMT. This 
seems to be at odds with the Gaussian assumption 
[5]. There is hardly any system whose Hamilton 
matrix elements follow a Gaussian probability 
density. The solution for this puzzle lies in the 
unfolding. Indeed, it has been shown that almost all 
functional forms of the probability density P_N^{(β)}(H) 
yield the same unfolded correlation functions, if no 
new scale comparable to the mean level spacing is 
present in P_N^{(β)}(H). This is the mathematical side of 
the empirically found universality. 

Ergodicity and universality are of crucial impor- 
tance for the applicability of RMT in data analysis. 


Wave Functions 


By modeling the Hamiltonian of a system with a 
random matrix H, we make an assumption not only 
about the statistics of the energies, but 
also about those of the wave functions. Because of 
the eigenvalue equation H u_n = x_n u_n, n = 1, ..., N, 
the wave function belonging to the eigenenergy x_n 
is modeled by the eigenvector u_n. The columns of 
the diagonalizing matrix U = [u_1 u_2 ··· u_N] are these 
eigenvectors. The probability density of the components 
u_nm of the eigenvector u_n can be calculated 
rather easily. For large N it approaches a Gaussian. 
This is equivalent to the Porter-Thomas distribution. 
While wave functions are often not accessible 
in an experiment, one can measure transition 
amplitudes and widths, giving information about 
the matrix elements of a transition operator and a 
projection of the wave functions onto a certain state 
in Hilbert space. If the latter are represented by a 
fixed matrix A or a fixed vector a, respectively, one 
can calculate the RMT prediction for the probability 
densities of the matrix elements u_n^† A u_m or the 
widths a^† u_n from the probability density of the 
eigenvectors. 
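
The Gaussian limit of the eigenvector components can be checked numerically. The sketch below assumes NumPy and a real symmetric (GOE-type) matrix; the rescaling by √N, the fourth-moment test, and the threshold 0.1 for the squared components are illustrative choices, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 400
a = rng.normal(size=(N, N))
h = (a + a.T) / 2.0                       # GOE-type real symmetric matrix
_, u = np.linalg.eigh(h)                  # columns of u are the normalized eigenvectors

components = np.sqrt(N) * u.ravel()       # rescaled so that the variance is close to 1
variance = components.var()
# For a Gaussian, the fourth moment equals 3 times the squared variance.
kurtosis_ratio = np.mean(components ** 4) / variance ** 2
# Squared rescaled components: chi-squared with one degree of freedom if Gaussian.
intensities = components ** 2
frac_small_intensity = np.mean(intensities < 0.1)
```

A histogram of `intensities` is strongly peaked at zero, which is the characteristic shape associated with the Porter-Thomas distribution.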


Scattering Systems 


It is important that RMT can be used as a powerful 
tool in scattering theory, because the major part of 
the experimental information about quantum sys- 
tems comes from scattering experiments. Consider 
an example from compound nucleus scattering. In 
an accelerator, a proton is shot on a nucleus, with 
which it forms a compound nucleus. This then 
decays by emitting a neutron. More generally, the 
ingoing channel ν (the proton in our example) 
connects to the interaction region (the nucleus), 
which also connects to an outgoing channel μ (the 
neutron). There are Λ channels with channel wave 
functions which are labeled ν = 1, ..., Λ. The 
interaction region is described by an N × N 
Hamiltonian matrix H whose eigenvalues x_n are 
bound-state energies labeled n = 1, ..., N. The 
dimension N is a cutoff which has to be taken to 
infinity at the end of a calculation. The Λ × Λ 
scattering matrix S contains the information about 
how the ingoing channels are transformed into the 
outgoing channels. The scattering matrix S is 
unitary. Under certain and often justified assump- 
tions, a scattering matrix element can be cast into 
the form 


S_{\nu\mu} = \delta_{\nu\mu} - i 2\pi \, W_\nu^\dagger G^{-1} W_\mu    [16] 


The couplings W_nν between the bound states n and 
the channels ν are collected in the N × Λ matrix W; 
W_ν is its νth column. The propagator G^{-1} is the 
inverse of 


G = z 1_N - H + i\pi \sum_{\nu\ \mathrm{open}} W_\nu W_\nu^\dagger    [17] 


Here, z is the scattering energy and the summation 
is only over channels which are open, that is, 
accessible. Formula [16] has a clear intuitive inter- 
pretation. The scattering region is entered through 
channel ν, the bound states of H become resonances 
in the scattering process according to eqn [17], the 
interaction region is left through channel μ. This 
formulation applies in many areas of physics. All 
observables such as transmission coefficients, cross 
sections, and others can be calculated from the 
scattering matrix S. 
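
Formulae [16] and [17] translate directly into a few lines of linear algebra. The sketch below assumes NumPy, treats all channels as open, and uses a random real symmetric H together with random real couplings purely as stand-ins; the function name `scattering_matrix` and the coupling strength are hypothetical.

```python
import numpy as np

def scattering_matrix(z, h, w):
    """S = 1 - 2 pi i W^dagger G^{-1} W with G = z - H + i pi W W^dagger (all channels open)."""
    n = h.shape[0]
    g = z * np.eye(n) - h + 1j * np.pi * (w @ w.conj().T)
    # Solve G y = W instead of forming the inverse explicitly.
    return np.eye(w.shape[1]) - 2j * np.pi * (w.conj().T @ np.linalg.solve(g, w))

rng = np.random.default_rng(4)
N, channels = 50, 3
a = rng.normal(size=(N, N))
h = (a + a.T) / 2.0                               # random real symmetric Hamiltonian
w = rng.normal(size=(N, channels)) / np.sqrt(N)   # assumed coupling strengths
s = scattering_matrix(0.1, h, w)
unitarity_defect = np.linalg.norm(s @ s.conj().T - np.eye(channels))
```

For Hermitian H the resulting S is unitary up to rounding errors, and for real symmetric H with real W it is also symmetric, as expected for a time-reversal invariant system.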


We have not made any statistical assumptions yet. 
Often, one can understand generic features of a 
scattering system by assuming that the Hamiltonian 
H is a random matrix, taken from one of the three 
classical ensembles. This is one RMT approach used 
in scattering theory. 

Another RMT approach is based on the scattering 
matrix itself: S is modeled by a Λ × Λ unitary 
random matrix. Taking into account additional 
symmetries, one arrives at the three circular ensembles, 
circular orthogonal (COE), unitary (CUE) and 
symplectic (CSE). They correspond to the three 
classical Gaussian ensembles and are also labeled 
with the Dyson index β = 1, 2, 4. The eigenphases of 
the random scattering matrix correspond to the 
eigenvalues of the random Hamiltonian matrix. The 
unfolded correlation functions of the circular 
ensembles are identical to those of the Gaussian 
ensembles. 


Supersymmetry 


Apart from the symmetries, random matrices con- 
tain nothing but random numbers. Thus, a certain 
type of redundancy is present in RMT. Remarkably, 
this redundancy can be removed without losing any 
piece of information by using supersymmetry, that 
is, by a reformulation of the random matrix model 
involving commuting and anticommuting variables. 
For the sake of simplicity, we sketch the main ideas 
for the GUE, but they apply to the GOE and the 
GSE accordingly. 

One defines the k-level correlation functions by 
using the resolvent of the Schrödinger equation, 


\widehat{R}_k^{(\beta)}(x_1, \ldots, x_k) = \frac{1}{\pi^k} \int P_N^{(\beta)}(H) \prod_{p=1}^{k} \mathrm{tr}\, \frac{1}{x_p^{\pm} - H} \, d[H]    [18] 


The energies carry an imaginary increment, x_p^± = x_p ± iε, 
and the limit ε → 0 has to be taken at the end of 
the calculation. The k-level correlation functions 
R_k^{(β)}(x_1, ..., x_k) as defined in eqn [8] can always be 
obtained from the functions [18] by constructing a 
linear combination of the R̂_k^{(β)}(x_1, ..., x_k) in which 
the signs of the imaginary increments are chosen 
such that only the imaginary parts of the traces 
contribute. Some trivial δ-distributions have to be 
removed. The k-level correlation functions [18] 
can be written as the k-fold derivative 


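\widehat{R}_k^{(2)}(x_1, \ldots, x_k) = \frac{1}{(2\pi)^k} \left. \frac{\partial^k}{\prod_{p=1}^{k} \partial J_p} Z_k^{(2)}(x + J) \right|_{J_p = 0}    [19] 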


of the generating function 


Z_k^{(2)}(x + J) = \int P_N^{(2)}(H) \prod_{p=1}^{k} \frac{\det(x_p^{\pm} + J_p - H)}{\det(x_p^{\pm} - J_p - H)} \, d[H]    [20] 


which depends on the energies and k new source 
variables J_p, p = 1, ..., k, ordered in 2k × 2k diagonal 
matrices 


x = \mathrm{diag}(x_1, x_1, \ldots, x_k, x_k), \quad J = \mathrm{diag}(+J_1, -J_1, \ldots, +J_k, -J_k)    [21] 


We notice the normalization Z_k^{(2)}(x) = 1 at J = 0. The 
generating function [20] is an integral over an 
ordinary N × N matrix H. It can be exactly rewritten 
as an integral over a 2k × 2k supermatrix σ containing 
commuting and anticommuting variables, 


Z_k^{(2)}(x + J) = \int Q^{(2)}(\sigma) \, \mathrm{sdet}^{-N}(x^{\pm} + J - \sigma) \, d[\sigma]    [22] 


The integrals over the commuting variables are of 
the ordinary Riemann-Stieltjes type, while those over 
the anticommuting variables are Berezin integrals. 
The Gaussian probability density [5] is mapped onto 
its counterpart in superspace 


Q^{(2)}(\sigma) = c^{(2)} \exp\left(-\frac{1}{2v^2} \mathrm{str}\, \sigma^2\right)    [23] 

where c^{(2)} is a normalization constant. The supertrace 
str and the superdeterminant sdet generalize the 
corresponding invariants for ordinary matrices. The 
total number of integrations in eqn [22] is drastically 
reduced as compared to eqn [20]. Importantly, it is 
independent of the level number N which now only 
appears as the negative power of the superdeterminant 
in eqn [22], that is, as an explicit parameter. This most 
convenient feature makes it possible to take the limit of 
infinitely many levels by means of a saddle point 
approximation to the generating function. 

Loosely speaking, the supersymmetric formulation 
can be viewed as an irreducible representation of RMT 
which yields a clearer insight into the mathematical 
structures. The same is true for applications in 
scattering theory and in models for crossover transi- 
tions to be discussed below. This explains why super- 
symmetry is so often used in RMT calculations. 

It should be emphasized that the rôle of super- 
symmetry in RMT is quite different from the one in 
high-energy physics, where the commuting and 
anticommuting variables represent physical parti- 
cles, bosons and fermions, respectively. This is not 
so in the RMT context. The commuting and 
anticommuting variables have no direct physical 
interpretation; they appear simply as helpful mathematical 
devices to cast the RMT model into an 
often much more convenient form. 


Crossover Transitions 


The RMT models discussed up to now describe 
four extreme situations, the absence of correlations 
in the Poisson case and the presence of 
correlations as in the three fully rotationally 
invariant models GOE, GUE, and GSE. A real 
physical system, however, is often between these 
extreme situations. The corresponding RMT mod- 
els can vary considerably, depending on the 
specific situation. Nevertheless, those models in 
which the random matrices for two extreme 
situations are simply added with some weight are 
useful in so many applications that they acquired a 
rather generic standing. One writes 


H(\alpha) = H^{(0)} + \alpha H^{(\beta)}    [24] 


where H^{(0)} is a random matrix drawn from an 
ensemble with a completely arbitrary probability 
density P_N^{(0)}(H^{(0)}). The case of a fixed matrix is 
included, because one may choose a product of 
δ-distributions for the probability density. The 
matrix H^{(β)} is random and drawn from the classical 
Gaussian ensembles with probability density 
P_N^{(β)}(H^{(β)}) for β = 1, 2, 4. One requires that the 
group diagonalizing H^{(0)} is a subgroup of the one 
diagonalizing H^{(β)}. The model [24] describes a 
crossover transition. The weight α is referred to as 
transition parameter. It is useful to choose the 
spectral support of H^{(0)} and H^{(β)} equal. One can 
then view α as the root-mean-square matrix element 
of H^{(β)}. At α = 0, one has the arbitrary ensemble. 
The Gaussian ensembles are formally recovered in 
the limit α → ∞, to be taken in a proper way such 
that the energies remain finite. 

We are always interested in the unfolded correlation 
functions. Thus, α has to be measured in units 
of the mean level spacing D such that λ = α/D is 
the physically relevant transition parameter. It 
means that, depending on the numerical value of 
D, even a small effect on the original energy scale 
can have sizeable impact on the spectral statistics. 
This is referred to as statistical enhancement. The 
nearest-neighbor spacing distribution is already 
very close to p^{(β)}(s) for the Gaussian ensembles if 
λ is larger than 0.5 or so. In the long-range 
observables such as the level number variance 
Σ²(L), the deviation from the Gaussian ensemble 
statistics becomes visible at interval lengths L 
comparable to λ. 

Crossover transitions can be interpreted as diffusion 
processes. With the fictitious time t = α²/2, the 
probability density P_N(x, t) of the eigenvalues x of 
the total Hamilton matrix H = H(t) = H(α) satisfies 
the diffusion equation 


\Delta_x P_N(x, t) = \frac{4}{\beta} \frac{\partial}{\partial t} P_N(x, t)    [25] 


where the probability density for the arbitrary 
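ensemble is the initial condition P_N(x, 0) = P_N^{(0)}(x). 
The Laplacian 


\Delta_x = \sum_{n=1}^{N} \frac{\partial^2}{\partial x_n^2} + \sum_{n<m} \frac{\beta}{x_n - x_m} \left( \frac{\partial}{\partial x_n} - \frac{\partial}{\partial x_m} \right)    [26] 

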
lives in the curved space of the eigenvalues x. This 
diffusion process is Dyson’s Brownian motion in 
slightly simplified form. It has a rather general meaning 
for harmonic analysis on symmetric spaces, connecting 
to the spherical functions of Gelfand and Harish- 
Chandra, Itzykson-Zuber integrals, and to Calogero- 
Sutherland models of interacting particles. All this 
generalizes to superspace. In the supersymmetric 
version of Dyson’s Brownian motion the generating 
function of the correlation functions is propagated, 


\Delta_s Z_k(s, t) = \frac{4}{\beta} \frac{\partial}{\partial t} Z_k(s, t)    [27] 


where the initial condition Z_k(s, 0) = Z_k^{(0)}(s) is the 
generating function of the correlation functions for 
the arbitrary ensemble. Here, s denotes the eigenvalues 
of some supermatrices, not to be confused with 
the spacing between adjacent levels. Since the 
Laplacian Δ_s lives in this curved eigenvalue space, 
this diffusion process establishes an intimate connection 
to harmonic analysis on superspaces. Advantageously, 
the diffusion [27] is the same on the 
original and on the unfolded energy scales. 
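
The crossover model [24], H(α) = H^{(0)} + α H^{(β)}, can also be explored by direct diagonalization. The sketch below assumes NumPy, a Poisson ensemble with mean level spacing D = 1 for H^{(0)}, and a GOE matrix with unit root-mean-square off-diagonal element for H^{(β)}, so that λ = α/D coincides with α; the function name `small_spacing_fraction`, the threshold 0.1, and the matrix sizes are illustrative assumptions.

```python
import numpy as np

def small_spacing_fraction(lam, n=300, samples=40, threshold=0.1, seed=5):
    """Fraction of rescaled spacings below `threshold` for H = H0 + alpha * H1, alpha = lam * D."""
    rng = np.random.default_rng(seed)
    fractions = []
    for _ in range(samples):
        h0 = np.diag(rng.uniform(0.0, n, size=n))   # uncorrelated levels, mean spacing D = 1
        a = rng.normal(size=(n, n))
        h1 = (a + a.T) / np.sqrt(2.0)               # GOE, off-diagonal rms element 1
        x = np.linalg.eigvalsh(h0 + lam * h1)
        s = np.diff(x[n // 4 : 3 * n // 4])         # central part of the spectrum only
        fractions.append(np.mean(s / s.mean() < threshold))
    return float(np.mean(fractions))

frac_poisson_like = small_spacing_fraction(0.0)     # no level repulsion
frac_intermediate = small_spacing_fraction(0.1)     # weak perturbation on the scale of D
frac_goe_like = small_spacing_fraction(1.0)         # repulsion fully developed
```

Even the small value λ = 0.1 already removes a sizeable share of the smallest spacings, which illustrates the statistical enhancement described above.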


Fields of Application 
Many-Body Systems 


Numerous studies apply RMT to nuclear physics 
which is also the field of its origin. If the total 
number of nucleons, that is, protons and neutrons, is 
not too small, nuclei show single-particle and 
collective motion. Roughly speaking, the former is 
incoherent out-of-phase motion of the nucleons 
confined in the nucleus, while the latter is coherent 
in-phase motion of all nucleons or of large groups of 
them such that any additional individual motion of 
the nucleons becomes largely irrelevant. It has been 
shown empirically that the single-particle excitations 
lead to GOE statistics, while collective excitations 


produce different statistics, often of the Poisson type. 
Mixed statistics as described by crossover transitions 
are then of particular interest to investigate the 
character of excitations. For example, one applies 
the model [24] with H^{(0)} drawn from a Poisson 
ensemble and H^{(β)} from a GOE. Another application 
of crossover transitions is breaking of time-reversal 
invariance in nuclei. Here, H^{(0)} is from a GOE and 
H^{(β)} from a GUE. Indeed, a fit of spectral data to this 
model yields an upper bound for the time-reversal 
invariance violating root-mean-square matrix element 
in nuclei. Yet another application is breaking of 
symmetries such as parity or isospin. In the case of 
two quantum numbers, positive and negative parity, 
say, one chooses H^{(0)} = diag(H^{(+)}, H^{(−)}) block- 
diagonal with H^{(+)} and H^{(−)} drawn from two 
uncorrelated GOE and H^{(β)} from a third uncorrelated 
GOE which breaks the block structure. Again, 
root-mean-square matrix elements for symmetry 
breaking have been derived from the data. 

Nuclear excitation spectra are extracted from 
scattering experiments. An analysis as described 
above is only possible if the resonances are isolated. 
Often, this is not the case and the resonance widths 
are comparable to or even much larger than the mean 
level spacing, making it impossible to obtain the 
excitation energies directly from the cross sections. 
One then analyzes the latter and their fluctuations as 
measured and applies the concepts sketched above 
for scattering systems. This approach has also been 
successful for crossover transitions. 

Due to the complexity of the nuclear many-body 
problem, one has to use effective or phenomenological 
interactions when calculating spectra. Hence, one often 
studies whether the statistical features found in the 
experimental data are also present in the calculated 
spectra which result from the various models for nuclei. 

Other many-body systems, such as complex atoms 
and molecules, have also been studied with RMT 
concepts, but the main focus has always been on nuclei. 


Quantum Chaos 


Originally, RMT was intended for modeling systems 
with many degrees of freedom such as nuclei. Surpris- 
ingly, RMT proved useful for systems with few degrees 
of freedom as well. Most of these studies aim at 
establishing a link between RMT and classical chaos. 
Consider as an example the classical motion of a point- 
like particle in a rectangle billiard. Ideal reflection at the 
boundaries and absence of friction are assumed, 
implying that the particle is reflected infinitely many 
times. A second billiard is built by taking a rectangle 
and replacing one corner with a quarter circle as shown 
in Figure 5. The motion of the particle in this Sinai 


Figure 5 The Sinai billiard. 


billiard is very different from the one in the rectangle. 
The quarter circle acts like a convex mirror which 
spreads out the rays of light upon reflection. This effect 
accumulates, because the vast majority of the possible 
trajectories hit the quarter circle infinitely many times 
under different angles. This makes the motion in the 
Sinai billiard classically chaotic, while the one in the 
rectangle is classically regular. The rectangle is separ- 
able and integrable, while this feature is destroyed in the 
Sinai billiard. One now quantizes these billiard systems, 
calculates the spectra, and analyzes their statistics. Up 
to certain scales, the rectangle (for irrational squared 
ratio of the side lengths) shows Poisson behavior, the 
Sinai billiard yields GOE statistics. 
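
The regular case can be reproduced with very little effort, because the levels of the quantized rectangle are known in closed form. The sketch below assumes NumPy, Dirichlet boundary conditions with levels proportional to m² + γn² for positive integers m and n, the value γ = √2 for the squared ratio of the side lengths, and an unfolding based on the smooth counting function with an area term and a boundary term; all of these choices are illustrative assumptions.

```python
import numpy as np

gamma = np.sqrt(2.0)          # assumed irrational squared ratio of the side lengths
e_max = 40000.0
m = np.arange(1, int(np.sqrt(e_max)) + 1)
# Levels E = m^2 + gamma * n^2 for m, n >= 1 (constant prefactor dropped).
e = (m[:, None] ** 2 + gamma * m[None, :] ** 2).ravel()
e = np.sort(e[e <= e_max])

# Unfold with the smooth counting function: quarter-ellipse area minus boundary correction.
xi = np.pi * e / (4.0 * np.sqrt(gamma)) - 0.5 * (1.0 + 1.0 / np.sqrt(gamma)) * np.sqrt(e)
s = np.diff(xi)
mean_spacing = s.mean()       # close to 1 if the unfolding is adequate
# The Poisson law [1] predicts a fraction 1 - exp(-0.1) of spacings below 0.1.
frac_small = np.mean(s < 0.1)
```

A histogram of `s` can be compared with eqn [1]; the Sinai billiard has no closed-form spectrum and would require a numerical solution of the wave equation, which is beyond this sketch.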

A wealth of such empirical studies led to the Bohigas- 
Giannoni-Schmit conjecture. We state it here not in its 
original, but in a frequently used form: spectra of 
systems whose classical analogues are fully chaotic 
show correlation properties as modeled by the Gaussian 
ensembles. The Berry-Tabor conjecture is complementary: 
spectra of systems whose classical analogues are fully 
regular show correlation properties which are often 
those of the Poisson type. As far as concrete physics 
applications are concerned, these conjectures are well- 
posed. From a strict mathematical viewpoint, they have 
to be supplemented with certain conditions to exclude 
exceptions such as Artin’s billiard. Due to the definition 
of this system on the hyperbolic plane, its quantum 
version shows Poisson-like statistics, although the 
classical dynamics is chaotic. Up to now, no general 
and mathematically rigorous proofs could be given. 
However, semiclassical reasoning involving periodic 
orbit theory and, in particular, the Gutzwiller trace 
formula, yields at least a heuristic understanding. 

Quantum chaos has been studied in numerous 
systems. An especially prominent example is the 
hydrogen atom put in a strong magnetic field, 
which breaks the integrability and drives the 
correlations towards the GOE limit. 


Disordered and Mesoscopic Systems 


An electron moving in a probe, a piece of wire, say, is 
scattered many times at impurities in the material. 
This renders the motion diffusive. In a statistical 
model, one writes the Hamilton operator as a sum of 
the kinetic part, that is, the Laplacian, and a white- 
noise disorder potential V(r) with second moment 


\langle V(r) V(r') \rangle = c_V \, \delta^{(d)}(r - r')    [28] 


Here, r is the position vector in d dimensions. The 
constant c_V determines the mean free time between 
two scattering processes in relation to the density of 
states. It is assumed that phase coherence is present 
such that quantum effects are still significant. This 
defines the mesoscopic regime. The average over the 
disorder potential can be done with supersymmetry. 
In fact, this is the context in which supersymmetric 
techniques in statistical physics were developed, 
before they were applied to RMT models. In the 
case of weak disorder, the resulting field theory in 
superspace for two-level correlations acquires the 
form 


\[ \int d\mu(Q)\, f(Q)\, \exp(-S(Q)) \qquad [29] \]


where $f(Q)$ projects out the observable under
consideration and where $S(Q)$ is the effective
Lagrangian


\[ S(Q) = -\int \mathrm{str}\left( D\, (\nabla Q(\vec r))^2 + i\, 2 r M\, Q(\vec r) \right) d^d r \qquad [30] \]


This is the supersymmetric nonlinear $\sigma$ model. It is
used to study level correlations, but also to obtain
information about the conductance and conductance
fluctuations when the probe is coupled to
external leads. The supermatrix field $Q(\vec r)$ is the
remainder of the disorder average; its matrix
dimension is four or eight, depending on the
symmetry class. This field is a Goldstone mode. It
does not directly represent a particle, as is often the
case in high-energy physics. The matrix $Q(\vec r)$ lives
in a coset space of certain supergroups. A tensor M
appears in the calculation, and r is the energy
difference on the unfolded scale, not to be confused
with the position vector $\vec r$.

The first term in the effective Lagrangian, involving
a gradient squared, is the kinetic term; it stems
from the Laplacian in the Hamiltonian. The constant
D is the classical diffusion constant for the
motion of the electron through the probe. The
second term is the ergodic term. In the limit of
zero dimensions, $d \to 0$, the kinetic term vanishes
and the remaining ergodic term yields precisely the 
unfolded two-level correlations of the Gaussian 
ensembles. Thus, RMT can be viewed as the zero- 
dimensional limit of field theory for disordered 
systems. For d > 0, there is a competition between
the two terms. The diffusion constant D and the
system size determine an energy scale, the Thouless
energy $E_c$, within which the spectral statistics is of
the Gaussian ensemble type and beyond which it
approaches the Poisson limit. In Figure 6, this is
schematically shown for the level number variance
$\Sigma^2(L)$, which bends from Gaussian ensemble to
Poisson behavior when $L > E_c$. This relates to the
crossover transitions in RMT. Gaussian ensemble
statistics means that the electron states extend over
the probe, while Poisson statistics implies their
spatial localization. Hence, the Thouless energy is
directly the dimensionless conductance.

Figure 6 Level number variance $\Sigma^2(L)$. In this example, the
Thouless energy is $E_c \approx 10$ on the unfolded scale. The
Gaussian ensemble behavior is dashed.

A large number of issues in disordered and 
mesoscopic systems have been studied with the 
supersymmetric nonlinear $\sigma$ model. Most results
have been derived for quasi-one-dimensional sys- 
tems. Through a proper discretization, a link is 
established to models involving chains of random 
matrices. As the conductance can be formulated in 
terms of the scattering matrix, the experience with 
RMT for scattering systems can be applied and 
indeed leads to numerous new results. 


Quantum Chromodynamics 


Quarks interact by exchanging gluons. In quantum 
chromodynamics, the gluons are described by gauge 
fields. Relativistic quantum mechanics has to be 
used. Analytical calculations are only possible after 
some drastic assumptions and one must resort to 
lattice gauge theory, that is, to demanding numerics, 
to study the full problem. 

The massless Dirac operator has chiral symmetry,
implying that all nonzero eigenvalues come in pairs
$(-\lambda_n, +\lambda_n)$ symmetrically around zero. In chiral
RMT, the Dirac operator is replaced with block
off-diagonal matrices


\[ W = \begin{bmatrix} 0 & W_b \\ W_b^\dagger & 0 \end{bmatrix} \qquad [31] \]


where $W_b$ is a random matrix without further
symmetries. By construction, W has chiral symmetry.
The assumption underlying chiral RMT is that the 
gauge fields effectively randomize the motion of the 
quark. Indeed, this simple schematic model correctly 
reproduces low-energy sum rules and spectral statis- 
tics of lattice gauge calculations. Near the center of 
the spectrum, there is a direct connection to the 
partition function of quantum chromodynamics. 
Furthermore, a similarity to disordered systems exists 
and an analog of the Thouless energy could be found. 


Other Fields 


Of the wealth of further investigations, we can 
mention but a few. RMT is in general useful for 
wave phenomena of all kinds, including classical 
ones. This has been shown for elastomechanical and 
electromagnetic resonances. 

An important field of application is quantum 
gravity and matrix model aspects of string theory. 
We decided not to go into this, because the reason 
for the emergence of RMT concepts there is very 
different from everything else discussed above. 

RMT is also successful beyond physics. Not 
surprisingly, it always received interest in mathema- 
tical statistics, but, as already said, it also relates to 
harmonic analysis. A connection to number theory 
exists as well. The high-lying zeros of the Riemann $\zeta$
function follow the GUE predictions over certain 
interval lengths. Unfortunately, a deeper under- 
standing is still lacking. 

As the interest in statistical concepts grows, RMT 
keeps finding new applications. Recently, one even 
started using RMT for risk management in finance. 


See also: Arithmetic Quantum Chaos; Chaos and 
Attractors; Determinantal Random Fields; Free Probability 
Theory; Growth Processes in Random Matrix Theory; 
Hyperbolic Billiards; Integrable Systems in Random 
Matrix Theory; Integrable Systems: Overview; Number 
Theory in Physics; Ordinary Special Functions; Quantum 
Chromodynamics; Quantum Mechanical Scattering 
Theory; Random Partitions; Random Walks in Random 
Environments; Semi-Classical Spectra and Closed 
Orbits; Supermanifolds; Supersymmetry Methods in 
Random Matrix Theory; Symmetry Classes in Random 
Matrix Theory. 
Partitions


A partition of n is a monotone sequence of non-
negative integers,


\[ \lambda = (\lambda_1 \ge \lambda_2 \ge \cdots \ge 0) \]


with sum n. The number n is also denoted by $|\lambda|$ and
is called the size of $\lambda$. The number of nonzero terms
in $\lambda$ is called the length of $\lambda$ and often denoted by
$\ell(\lambda)$. It is convenient to make the sequence $\lambda$ infinite
by adding a string of zeros at the end.

A geometric object associated to a partition is its
diagram. The diagram of $\lambda = (4,2,2,1)$ is shown in
Figure 1. A larger diagram, flipped and rotated by 135°,
can be seen in Figure 2. Flipping the diagram introduces
an involution on the set of partitions of n known as
transposition. The transposed partition is denoted by $\lambda'$.

Partitions serve as natural combinatorial labels for
many basic objects in mathematics and physics. For
example, partitions of n index both conjugacy classes
and irreducible representations of the symmetric
group S(n). Partitions $\lambda$ with $\ell(\lambda) \le n$ index irreducible
polynomial representations of the general linear
group GL(n). More generally, the highest weight of a
rational representation of GL(n) can be naturally
viewed as two partitions of total length $\le n$.

Figure 1 Diagram of a partition.

For an even more basic example, partitions $\lambda$ with
$\lambda_1 \le m$ and $\ell(\lambda) \le n$ are the same as upright lattice
paths making n steps up and m steps to the right
(just follow the boundary of $\lambda$). In particular, there
are $\binom{m+n}{n}$ of such. By a variation on this theme,
partitions label the standard basis of fermionic Fock
space (Miwa et al. 2000). They also label a standard
basis of the bosonic Fock space.
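
The lattice-path count can be checked by brute force. The following is a minimal sketch; the helper name `partitions_in_box` is ours, introduced only for illustration. It enumerates all partitions whose diagram fits in an n-by-m box so that the total can be compared with the binomial coefficient.

```python
from math import comb


def partitions_in_box(m, n):
    """Yield every partition with first part <= m and at most n nonzero parts.

    A partition is returned as a tuple of its nonzero parts in
    non-increasing order; the empty tuple is the empty partition.
    """
    def rec(max_part, rows_left):
        yield ()
        if rows_left == 0:
            return
        for first in range(1, max_part + 1):
            for rest in rec(first, rows_left - 1):
                yield (first,) + rest

    return rec(m, n)


def count_in_box(m, n):
    """Number of partitions fitting in the n-by-m box."""
    return sum(1 for _ in partitions_in_box(m, n))


if __name__ == "__main__":
    for m, n in [(2, 2), (3, 4), (5, 3)]:
        print(m, n, count_in_box(m, n), comb(m + n, n))
```

Each pair (m, n) should print the same number twice, once from the enumeration and once from the closed form.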

In most instances, partitions naturally occur
together with some weight function. For example,
the dimension, $\dim \lambda$, of an irreducible representation
of S(n), or some power of it, is what always appears in
harmonic analysis on S(n). By a theorem of Burnside,


\[ \mathfrak{M}_{\mathrm{Planch}}(\lambda) = \frac{(\dim \lambda)^2}{n!} \qquad [1] \]


is a probability measure on the set of partitions of n; it
is known as the Plancherel measure. Besides harmonic
analysis, there are many other contexts in which
it appears. For example, by a theorem of
Schensted (see Sagan (2001) and Stanley (1999)), the
distribution of the first part $\lambda_1$ of a Plancherel random
partition $\lambda$ is the same as the distribution of the longest
increasing subsequence in a uniformly random permutation
of {1,2,...,n}.
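
Schensted's theorem can be verified exhaustively for small n. The sketch below is an illustration under our own choices: `dim` evaluates the dimension through the hook formula stated later in this article, and the longest increasing subsequence is computed by patience sorting. It compares the two distributions after multiplying both by n!.

```python
from bisect import bisect_left
from collections import Counter
from itertools import permutations
from math import factorial


def partitions(n, max_part=None):
    """Yield all partitions of n as tuples of non-increasing positive parts."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def dim(lam):
    """Dimension of the irreducible S(n) representation, via hook lengths."""
    cols = [sum(1 for p in lam if p > j) for j in range(lam[0])] if lam else []
    hooks = 1
    for i, row in enumerate(lam):
        for j in range(row):
            hooks *= 1 + (row - j - 1) + (cols[j] - i - 1)  # 1 + arm + leg
    return factorial(sum(lam)) // hooks


def lis_length(seq):
    """Length of the longest strictly increasing subsequence."""
    tails = []
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)


def lis_counts(n):
    """Number of permutations of n letters for each value of the LIS length."""
    return Counter(lis_length(p) for p in permutations(range(n)))


def plancherel_first_part_counts(n):
    """n! times the Plancherel probability of each value of the first part."""
    counts = Counter()
    for lam in partitions(n):
        counts[lam[0]] += dim(lam) ** 2
    return counts


if __name__ == "__main__":
    print(lis_counts(5) == plancherel_first_part_counts(5))
```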





Figure 2 A Plancherel-random partition of 1000 and the limit
shape.
Partitions of n being just a finite set, one is often
interested in letting $n \to \infty$. Even if the original
problem was not of a probabilistic origin, one can 
still often benefit from adopting a probabilistic 
viewpoint because of the intuition and techniques 
that it brings. This is best illustrated by concrete 
examples, which is what we now turn to. These 
examples are not meant to be a panorama of 
random partitions. This is an old and still rapidly
growing field, and a simple list of all major
contributions would take more space than is allowed.
The books Kerov (2003), Pitman (n.d.), Sagan
(2001), and Stanley (1999) offer much more
information on the topics discussed below.


Plancherel Measure 
Dimension of a Diagram 


There are several formulas and interpretations
for the number $\dim \lambda$ in [1]; see Sagan (2001) and
Stanley (1999). The one that often appears in the
context of growth processes is the following:
$\dim \lambda$ is the number of ways to grow the diagram
$\lambda$ from the empty diagram $\emptyset$ by adding a square
at a time. That is, $\dim \lambda$ is the number of chains of
the form


\[ \emptyset = \lambda^{(0)} \subset \lambda^{(1)} \subset \cdots \subset \lambda^{(n)} = \lambda \]


where $|\lambda^{(k)}| = k$ and $\subset$ means inclusion of
diagrams.
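
The chain description translates directly into a recursion: every chain ending at a diagram passes, one step earlier, through a diagram obtained by removing one corner square. The following sketch (the function name `dim_by_chains` is ours) counts chains this way, with memoization so that moderately large diagrams remain cheap.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def dim_by_chains(lam):
    """Count the ways to grow the diagram lam from the empty diagram.

    lam is a tuple of positive parts in non-increasing order.
    """
    if not lam:
        return 1  # the empty diagram is reached by the empty chain
    total = 0
    for i, part in enumerate(lam):
        # The last square of row i is removable if the row below is shorter.
        if i + 1 == len(lam) or lam[i + 1] < part:
            smaller = lam[:i] + (part - 1,) + lam[i + 1:]
            if smaller[-1] == 0:  # only the last row can become empty
                smaller = smaller[:-1]
            total += dim_by_chains(smaller)
    return total


if __name__ == "__main__":
    print(dim_by_chains((2, 1)))
    print(dim_by_chains((4, 2, 2, 1)))
```

Summing the squares of these numbers over all partitions of n gives n!, which is the normalization of the Plancherel measure [1].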
From the classical formula

\[ \dim \lambda = \frac{|\lambda|!}{\prod_{i=1}^{k} (\lambda_i + k - i)!} \prod_{i<j\le k} (\lambda_i - \lambda_j + j - i) \qquad [2] \]

where k is any number such that $\lambda_{k+1} = 0$, one sees
that the Plancherel measure is a discrete analog of
the eigenvalue density


\[ e^{-\frac{1}{2}\sum_i x_i^2} \prod_{i<j} (x_i - x_j)^2 \]

of a GUE random matrix (Mehta 1991). Indeed,
the first factor in [2], which looks like a multinomial
coefficient, is the analog of the Gaussian
weight. Kerov (2003) and Johansson were among
the first to recognize the analogy between Plancherel
measure and GUE. One comes across many
partition sums that are discrete analogs of random
matrix integrals.

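The most compact formula for $\dim \lambda$ is the hook
formula


\[ \dim \lambda = |\lambda|! \prod_{\square \in \lambda} h(\square)^{-1} \qquad [3] \]


Here the product is over all squares $\square$ in the
diagram of $\lambda$ and

\[ h(\square) = 1 + a(\square) + l(\square) \]

where $a(\square)$ and $l(\square)$ are the numbers of squares to the
right of the square $\square$ and below it, respectively.
(These are known as arm-length and leg-length.)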


Limit Shape and Edge Scaling 


When the diagram of $\lambda$ is very large, the logarithm
of the hook product approximates a double
integral. The analysis of the corresponding integral
plays the central role (see Kerov (2003), chapter 3)
in the proof of the following law of large numbers
for the Plancherel measure.

Take the diagram of $\lambda$, flip and rotate it as in Figure 2,
and rescale by a factor of $n^{-1/2}$ so that it has unit area. In
this way one obtains a measure on continuous and, in
fact, Lipschitz functions. By a result of Logan and Shepp
and, independently, Vershik and Kerov, these measures
converge as $n \to \infty$ to the $\delta$-measure on a single
function $\Omega(x)$. This limit shape for the Plancherel
measure is also plotted in Figure 2. Explicitly,


\[ \Omega(x) = \begin{cases} \frac{2}{\pi}\left( x \arcsin(x/2) + \sqrt{4 - x^2} \right), & |x| \le 2 \\ |x|, & |x| > 2 \end{cases} \]


This is an analog of Wigner’s semicircle law (Mehta
1991) for spectra of random matrices. The Gaussian
correction to the limit shape was also found by
Kerov (2003).
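
A quick numerical sanity check of the limit shape is sketched below. It assumes the rotated coordinates u = (column - row)/sqrt(n), v = (column + row)/sqrt(n), which is our reading of Figure 2; in these coordinates a diagram of unit area occupies area 2 between the curve and the wedge |x|. The function names are ours.

```python
from math import asin, pi, sqrt


def limit_shape(x):
    """The limit shape Omega(x) of the Plancherel measure."""
    if abs(x) >= 2:
        return abs(x)
    return (2 / pi) * (x * asin(x / 2) + sqrt(4 - x * x))


def area_above_wedge(steps=200000):
    """Midpoint-rule estimate of the integral of Omega(x) - |x| over [-2, 2]."""
    h = 4 / steps
    total = 0.0
    for k in range(steps):
        x = -2 + (k + 0.5) * h
        total += (limit_shape(x) - abs(x)) * h
    return total


if __name__ == "__main__":
    print(limit_shape(0.0), 4 / pi)  # height of the curve at the center
    print(area_above_wedge())        # should be close to 2
```

The curve joins the wedge continuously at x = 2 and x = -2, which is where the edge scaling discussed next takes place.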

The limit shape result can be refined to show
that $\lambda_1/\sqrt{n} \to 2$ in probability. Together with
Schensted’s theorem, this answers the question
posed by Ulam about the longest increasing
subsequence in a random permutation. Further
progress came in the work of Baik, Deift, and
Johansson (see Deift (2000)), who conjectured
(and proved for i = 1 and 2) that as $n \to \infty$ the
joint distribution of


\[ \frac{\lambda_i - 2\sqrt{n}}{n^{1/6}}, \qquad i = 1, 2, \ldots \]


becomes exactly the same as the distribution of
largest eigenvalues of a GUE random matrix. In
particular, the longest increasing subsequence,
suitably scaled, is distributed exactly like the
largest eigenvalue. The distribution of the latter is
known as the Tracy-Widom distribution; it is
given in terms of a particular solution of the
Painlevé II equation. For more information about
the proof of the full conjecture, see Aldous and
Diaconis (1999), Deift (2000), and Okounkov
(2002).
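
The law of large numbers for the longest increasing subsequence is easy to observe numerically. The sketch below is an illustration only: it draws one seeded uniformly random permutation and computes the length of its longest increasing subsequence by patience sorting. The seed, the sample size and the function names are our choices.

```python
import random
from bisect import bisect_left
from math import sqrt


def lis_length(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)


def scaled_lis(n, seed=0):
    """LIS length of one random permutation of n letters, divided by sqrt(n)."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    return lis_length(perm) / sqrt(n)


if __name__ == "__main__":
    for n in (100, 10000, 1000000):
        print(n, scaled_lis(n))
```

For finite n the ratio typically sits somewhat below 2, in line with a correction of order $n^{1/6}$ to the leading term $2\sqrt{n}$.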


Correlation Functions 


One way to prove the full BDJ conjecture is to use the
following exact formula first obtained in a more general
setting by Borodin and Olshanski (see Olshanski (2003),
and Okounkov (2002) for further generalizations).
Look at the downsteps of the zig-zag curve in Figure 2.
The x-coordinates of their midpoints are the numbers


\[ S(\lambda) = \{ \lambda_i - i + \tfrac{1}{2} \} \subset \mathbb{Z} + \tfrac{1}{2} \qquad [4] \]


The map $\lambda \mapsto S(\lambda)$ makes a random partition a random
subset of $\mathbb{Z} + \tfrac{1}{2}$, that is, a random point field on a lattice.
These random points should be treated like eigenvalues
of a random matrix. In particular, it is natural to consider
their correlations, that is, the probability that $X \subset S(\lambda)$
for some fixed $X \subset \mathbb{Z} + \tfrac{1}{2}$.
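
To make the point configuration concrete, the sketch below lists the first points of $S(\lambda)$ for a given partition, padding $\lambda$ with zeros. It also checks a simple balance property: the number of positive points equals the number of negative half-integers missing from the set. The function names and the truncation parameter `depth` are ours.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def half_integer_points(lam, depth):
    """Return lam_i - i + 1/2 for i = 1..depth, with lam padded by zeros."""
    padded = list(lam) + [0] * max(0, depth - len(lam))
    return [padded[i - 1] - i + HALF for i in range(1, depth + 1)]


def holes(lam, depth):
    """Negative half-integers down to -depth + 1/2 that are not in S(lam).

    depth should be at least the number of parts of lam, so that every
    point below the truncation is known to be occupied.
    """
    occupied = set(half_integer_points(lam, depth))
    candidates = [-i + HALF for i in range(1, depth + 1)]
    return [x for x in candidates if x not in occupied]


if __name__ == "__main__":
    lam = (4, 2, 2, 1)
    pts = half_integer_points(lam, 8)
    print([str(p) for p in pts])
    print([str(p) for p in pts if p > 0], [str(h) for h in holes(lam, 8)])
```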

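Many formulas work better if we replace the
Plancherel measures $\mathfrak{M}_{\mathrm{Planch},n}$ on partitions of a
fixed number n by their Poisson average,


\[ \mathfrak{M}_\xi = e^{-\xi} \sum_{n \ge 0} \frac{\xi^n}{n!}\, \mathfrak{M}_{\mathrm{Planch},n} \]


Here $\xi > 0$ is a parameter. It equals the expected
size of $\lambda$. For any finite set X, we have


\[ \mathrm{Prob}_\xi (X \subset S(\lambda)) = \det\left[ K_{\mathrm{Bessel}}(x_i, x_j; \xi) \right]_{x_i, x_j \in X} \qquad [5] \]


where $K_{\mathrm{Bessel}}$ is the discrete Bessel kernel given by


\[ K_{\mathrm{Bessel}}(x, y; \xi) = \sqrt{\xi}\, \frac{J_{x-1/2}\, J_{y+1/2} - J_{x+1/2}\, J_{y-1/2}}{x - y}, \qquad J_n = J_n(2\sqrt{\xi}) \]


Note that only Bessel functions of integral order
enter this formula.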

For large argument $\xi$, $J_n(2\sqrt{\xi})$ has sine asymptotics
if $n < 2\sqrt{\xi}$ and Airy function asymptotics if $n \approx
2\sqrt{\xi}$. Consequently, one gets the random matrix
behavior near the edge of the limit shape and
discrete sine kernel asymptotics of correlations in
the bulk of the limit shape.


Permutation Enumeration 


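A basic combinatorial problem is to count permutations
$\sigma_1, \ldots, \sigma_p \in S(n)$ of given cycle types
$\mu^{(1)}, \ldots, \mu^{(p)}$ such that


\[ \sigma_1 \sigma_2 \cdots \sigma_p = 1 \qquad [6] \]


A geometric interpretation of this problem is to count
covers of the sphere $S^2 = \mathbb{CP}^1$ branched over p given
points with monodromy $\mu^{(1)}, \ldots, \mu^{(p)}$. Elementary
character theory of S(n) gives (Jones 1998)


\[ \#\Big\{ \sigma_i \in C_{\mu^{(i)}},\ \prod \sigma_i = 1 \Big\} = \sum_{|\lambda| = n} \mathfrak{M}_{\mathrm{Planch}}(\lambda) \prod_{i=1}^{p} f_{\mu^{(i)}}(\lambda) \qquad [7] \]

where $C_\mu$ is the conjugacy class with cycle type
$\mu$ and


\[ f_\mu(\lambda) = |C_\mu|\, \frac{\chi^\lambda_\mu}{\dim \lambda} \]


is the central character of the irreducible representation
$\lambda$. Here $\chi^\lambda_\mu$ is the character of any $\sigma \in C_\mu$ in the
representation $\lambda$.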

Let $\mu$ be of the form $(\bar\mu, 1, 1, \ldots)$ with $\bar\mu$ fixed.
By a result of Kerov and Olshanski, $f_\mu(\lambda)$ is a
polynomial in $\lambda$ of degree $|\bar\mu|$. See [11] for the
simplest example $\bar\mu = (2)$, that is, for the central
character of a transposition. We thus recognize in
[7] a discrete analog of the GUE expectation of a
polynomial in traces of a random matrix. This
analogy becomes even clearer in the Gromov-Witten
theory of $\mathbb{CP}^1$, which can be viewed as taking into
account contributions of certain degenerate covers,
see Okounkov (2002).

There is a generalization, due to Burnside, of [7]
to counting branched covers of surfaces of any
genus g; see Jones (1998). The only modification
required is that a representation $\lambda$ is now counted
with the weight $(\dim \lambda)^{2-2g}$. For example, covers of
the torus correspond to a uniform measure on
partitions. In particular, the probability that two
random permutations from S(n) commute is p(n)/n!,
where p(n) is the number of partitions of n.
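
The statement about commuting permutations can be confirmed by exhaustive enumeration for small n. The sketch below is a brute-force illustration with helper names of our own choosing; it is not meant as an efficient method, since it visits all pairs of permutations.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def compose(s, t):
    """Composition of permutations given as tuples: (s o t)(i) = s[t[i]]."""
    return tuple(s[t[i]] for i in range(len(s)))


def commuting_probability(n):
    """Exact probability that two uniformly random elements of S(n) commute."""
    perms = list(permutations(range(n)))
    commuting = sum(
        1 for s in perms for t in perms if compose(s, t) == compose(t, s)
    )
    return Fraction(commuting, len(perms) ** 2)


def partition_count(n):
    """Number p(n) of partitions of n, by the standard coin-change recursion."""
    ways = [1] + [0] * n
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, commuting_probability(n), Fraction(partition_count(n), factorial(n)))
```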


Generalizations of Plancherel Measure 
Schur Functions and Cauchy Identity 


Schur functions $s_\lambda(x_1, \ldots, x_n)$, where $\lambda$ is a partition
with at most n parts, form a distinguished
linear basis of the algebra of symmetric polynomials
in $x_1, \ldots, x_n$. Various definitions and many
remarkable properties of these functions are discussed
in, for example, Sagan (2001) and Stanley
(1999). One of them is that $s_\lambda(x)$ is the trace of a
matrix with eigenvalues $\{x_i\}$ in an irreducible
GL(n) module with highest weight $\lambda$. The following
stability of $s_\lambda$,


\[ s_\lambda(x_1, \ldots, x_n, 0) = s_\lambda(x_1, \ldots, x_n) \]


allows one to define Schur functions in infinitely
many variables. The formula


\[ p_\mu = \sum_\lambda \chi^\lambda_\mu\, s_\lambda \]


establishes the transition between the basis of Schur
functions and the basis of power sum functions


\[ p_\mu = \prod_i p_{\mu_i}, \qquad p_k = \sum_i x_i^k \]

In particular, the dimension function $\dim \lambda$ is the
following specialization of the Schur function:


\[ \frac{\dim \lambda}{|\lambda|!} = s_\lambda \Big|_{p_1 = 1,\ p_2 = p_3 = \cdots = 0} \]


We will discuss other important specializations of
Schur functions later.

A typical situation in which a random matrix
integral can be reduced to a sum over partitions is
when one uses the Cauchy identity


\[ \prod_{i,j} (1 - x_i y_j)^{-1} = \exp\left( \sum_k \frac{1}{k}\, p_k(x)\, p_k(y) \right) = \sum_\lambda s_\lambda(x)\, s_\lambda(y) \qquad [8] \]


to expand the integrand in Schur functions and
integrate term by term using, for example, the
orthogonality of characters or the identity


\[ \int_{U(n)} s_\lambda(A g B g^{-1})\, dg = \frac{s_\lambda(A)\, s_\lambda(B)}{\dim_n \lambda} \qquad [9] \]


Here $s_\lambda(A)$ denotes the Schur function in eigenvalues
of a matrix A, dg is the normalized Haar measure on
the unitary group U(n), and


\[ \dim_n \lambda = s_\lambda(\underbrace{1, \ldots, 1}_{n \text{ times}}) \]


is the dimension of the irreducible GL(n) module $V^\lambda$
with highest weight $\lambda$. The meaning of [9] is that
normalized characters are algebra homomorphisms
from the center of the group algebra of U(n) to
numbers. This method of converting a random
matrix problem to a random partition problem is
known as character expansion (see, e.g., Kazakov
(2001)).

Inspired by the Cauchy identity, one can generalize
Plancherel measure to


\[ \mathfrak{M}_{\mathrm{Schur}}(\lambda) = \prod_{i,j} (1 - x_i y_j)\, s_\lambda(x)\, s_\lambda(y) \]


where x and y, or, equivalently, $p_k(x)$ and $p_k(y)$, are
viewed as parameters. This is known as the Schur
measure. If $p_1(x) = p_1(y) = \sqrt{\xi}$ and all other $p_k$’s
vanish, we get $\mathfrak{M}_{\mathrm{Schur}} = \mathfrak{M}_\xi$. Many properties of the
Plancherel measure can be generalized to Schur
measure, in particular, exact formulas for correlation
functions, description of the limit shape, etc.
(Okounkov 2002).


Dimension Functions 


We already met the function $\dim_n \lambda$. There is a
useful formula

\[ \dim_n \lambda = \prod_{\square \in \lambda} \frac{n + c(\square)}{h(\square)} \qquad [10] \]


where $c((i,j)) = j - i$ is the content of the square $\square$ in
ith row and jth column. From [10] it is clear that $\dim_n \lambda$
makes sense for arbitrary complex values of n. The
corresponding specializations of the Schur measure


\[ x = (\underbrace{\sqrt{\xi}, \ldots, \sqrt{\xi}}_{z \text{ times}}), \qquad y = (\underbrace{\sqrt{\xi}, \ldots, \sqrt{\xi}}_{z' \text{ times}}) \]

where $\xi, z, z'$ are parameters, are related to
the so-called z-measures and their theory is much
developed (Olshanski 2003). As $z, z', \xi^{-1} \to \infty$ in
such a way that $z z' \xi \to \xi_0$, we get $\mathfrak{M}_{\xi_0}$ in the limit.
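
Formula [10] can be cross-checked against a direct count. The sketch below evaluates the content/hook product with exact rational arithmetic and compares it, for integer n, with the number of semistandard tableaux of shape $\lambda$ with entries at most n, which is the value $s_\lambda(1, \ldots, 1)$. The count uses the standard interlacing (Gelfand-Tsetlin) recursion; the function names are ours.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product


def dim_content_formula(lam, n):
    """Product over squares of (n + content) / hook; n may be any rational."""
    cols = [sum(1 for p in lam if p > j) for j in range(lam[0])] if lam else []
    result = Fraction(1)
    for i, row in enumerate(lam):
        for j in range(row):
            hook = 1 + (row - j - 1) + (cols[j] - i - 1)
            result *= Fraction(n + (j - i), hook)
    return result


@lru_cache(maxsize=None)
def count_ssyt(lam, n):
    """Number of semistandard tableaux of shape lam with entries in 1..n."""
    lam = tuple(p for p in lam if p > 0)
    if len(lam) > n:
        return 0
    if n == 0:
        return 1  # only the empty shape survives the check above
    padded = lam + (0,) * (n - len(lam))
    # mu interlaces lam: padded[i] >= mu[i] >= padded[i + 1]
    ranges = [range(padded[i + 1], padded[i] + 1) for i in range(n - 1)]
    return sum(count_ssyt(tuple(mu), n - 1) for mu in product(*ranges))


if __name__ == "__main__":
    print(dim_content_formula((2, 1), 3), count_ssyt((2, 1), 3))
    print(dim_content_formula((1, 1), Fraction(1, 2)))  # non-integer n
```

The last line illustrates the remark that the product still makes sense when n is not a positive integer, although it then no longer counts anything.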

The enumerative problems discussed in the section
“Permutation enumeration” have analogs for the
unitary groups U(n) and, suitably interpreted, the
answers are the same with the dimension $\dim_n \lambda$
replacing $\dim \lambda$. For example, instead of counting the
solutions to [6], one may be interested in the volume
of the set of p-tuples of unitary matrices with given
eigenvalues that multiply to 1. Geometrically, such
data arise as the monodromy of a flat unitary
connection over $S^2 \setminus \{p \text{ points}\}$, which is a U(n) analog
of a branched cover. The analog of Burnside’s
formula is Witten’s formula for the volumes of
moduli spaces of flat connections on a genus g
surface with given holonomy around p punctures
(see, e.g., Witten (1991) and Woodward (2004)). It
involves summing normalized characters over all
representations $V^\lambda$, not necessarily polynomial, with
the weight $(\dim V^\lambda)^{2-2g}$. If additionally weighted by
a Gaussian of the form $\exp(-A(f_2(\lambda) + (n/2)|\lambda|))$,


where


\[ f_2(\lambda) = \sum_{\square \in \lambda} c(\square) \qquad [11] \]


this becomes Migdal’s formula for the partition
function of the 2D Yang-Mills theory, the positive
constant A being the area of the surface (see, e.g.,
Witten (1991) and Woodward (2004)).

A further generalization naturally arising in the
theory of quantum groups is the quantum dimension


\[ \dim_{n,q} \lambda = s_\lambda(q^{1-n}, q^{3-n}, \ldots, q^{n-1}) \]


where q is a parameter (it is more common to use
$\dim_{n,q^{1/2}}$ instead). Obviously, $\dim_{n,q} \to \dim_n$ as $q \to 1$.
The function $\dim_{n,q}$ is an important building block of,
for example, quantum invariants of knots and 3-folds,
and various related objects (see, e.g., Bakalov and
Kirillov (2001)). The Verlinde formula (Bakalov and
Kirillov 2001) can be viewed as an analog of Burnside’s
formula with weight $\dim_{n,q}$. When q is a root of unity
the summation over $\lambda$ is naturally truncated to a
finite sum.

The next level of generalization is obtained by
deforming Schur functions to Jack and, more generally,
Macdonald symmetric functions (Macdonald 1995).
In particular, the Jack polynomial analog of the
Plancherel measure is


\[ \mathfrak{M}_{\mathrm{Jack}}(\lambda) = \frac{n!\, (t_1 t_2)^n}{\prod_{\square \in \lambda} \big( (a(\square) + 1)\, t_1 + l(\square)\, t_2 \big) \big( a(\square)\, t_1 + (l(\square) + 1)\, t_2 \big)} \]


where $t_1, t_2$ are parameters, and $a(\square)$ and $l(\square)$
denote, as above, the arm- and leg-length of a
square $\square$. This measure depends only on the ratio
$t_2/t_1$, which is the usual parameter of Jack polynomials.
To continue the analogy with random
matrices, this should be viewed as a general $\beta$
analog of the Plancherel measure.

The measure $\mathfrak{M}_{\mathrm{Jack}}$ naturally arises in Atiyah-Bott
localization computations on the Hilbert
scheme of n points in $\mathbb{C}^2$. By definition, this
Hilbert scheme parametrizes ideals $I \subset \mathbb{C}[x,y]$ of
codimension n as linear spaces. The torus $(\mathbb{C}^*)^2$
acts on it by rescaling x and y and the fixed points
of this action are


\[ I_\lambda = \text{Span of } \{ x^{j-1} y^{i-1} \}_{j > \lambda_i} \]


where $\lambda$ is a partition of n. The weight of this fixed
point in the Atiyah-Bott formula is proportional to
$\mathfrak{M}_{\mathrm{Jack}}(\lambda)$, the parameters $t_1$ and $-t_2$ being the
standard torus weights. Corresponding formulas in
K-theory involve a Macdonald polynomial analog of
$\dim \lambda$.

Nekrasov defines the partition functions of $\mathcal{N} = 2$
supersymmetric gauge theories by formally applying
the Atiyah-Bott localization formula to (noncompact)
instanton moduli spaces. The resulting expression
is a sum over partitions with a weight which is
a generalization of $\mathfrak{M}_{\mathrm{Jack}}$. In this way, random
partitions enter gauge theory. What is more,
statistical properties of these random partitions are
reflected in the dynamics of gauge theories. For
example, the limit shape turns out to be precisely the
Seiberg-Witten curve (see Nekrasov and Okounkov
(2003), Okounkov (2002), and also Nakajima and
Yoshioka (2003)).
Harmonic Functions on Young Graph
Definitions 


Partitions form a natural directed graph $\mathbb{Y}$, known
as Young graph, in which there is an edge from $\mu$ to
$\lambda$ if $\lambda$ is obtained from $\mu$ by adding a square. We
will denote this by $\mu \nearrow \lambda$. Let $\kappa$ be a non-negative
function (called multiplicity) on edges of $\mathbb{Y}$. A
function $\phi$ on the vertices of $\mathbb{Y}$ is harmonic if it
satisfies


\[ \phi(\mu) = \sum_{\lambda \nwarrow \mu} \kappa(\mu, \lambda)\, \phi(\lambda) \qquad [12] \]


for any $\mu$. For given edge multiplicities $\kappa$, non-negative
harmonic functions normalized by $\phi(\emptyset) = 1$
form a convex compact (with respect to pointwise
convergence) set, which we will denote by $H(\kappa)$. The
extreme points of $H(\kappa)$ are the indecomposable or
ergodic harmonic functions. They are the most
important ones. One defines


\[ \dim_\kappa \lambda/\mu = \sum_{\mu = \nu_0 \nearrow \nu_1 \nearrow \cdots \nearrow \nu_k = \lambda}\ \prod_i \kappa(\nu_i, \nu_{i+1}) \]

and $\dim_\kappa \lambda = \dim_\kappa \lambda/\emptyset$. For example, if $\kappa \equiv 1$ then
$\dim_\kappa \lambda = \dim \lambda$. Any function $\phi \in H(\kappa)$ defines a
probability measure on partitions of fixed size
n, n = 0, 1, 2, ..., by


\[ \mathfrak{M}_{\phi,n}(\lambda) = \phi(\lambda)\, \dim_\kappa \lambda, \qquad |\lambda| = n \qquad [13] \]


The mean value property [12] implies a certain
coherence of these measures for different values of
n, which, in general, does not hold for measures like
$\mathfrak{M}_{\mathrm{Schur}}$. Two multiplicity functions $\kappa$ and $\kappa'$ are
gauge equivalent if


\[ \kappa'(\mu, \lambda) = f(\mu)\, \kappa(\mu, \lambda)\, f(\lambda)^{-1} \]


for some function f. In this case, $H(\kappa)$ and $H(\kappa')$ are
naturally isomorphic and the measures $\mathfrak{M}_\phi$ are the same.
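
As a concrete instance of [12] and [13], take all multiplicities equal to 1 and the function $\phi(\lambda) = \dim \lambda / |\lambda|!$, which by the hook formula [3] is the reciprocal of the hook product. The sketch below checks the mean value property for all small diagrams; the measures [13] are then the Plancherel measures. Function names are ours.

```python
from fractions import Fraction


def partitions(n, max_part=None):
    """Yield all partitions of n as tuples of non-increasing positive parts."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def hook_product(lam):
    """Product of the hook lengths of all squares of the diagram lam."""
    cols = [sum(1 for p in lam if p > j) for j in range(lam[0])] if lam else []
    result = 1
    for i, row in enumerate(lam):
        for j in range(row):
            result *= 1 + (row - j - 1) + (cols[j] - i - 1)
    return result


def phi(lam):
    """dim(lam) / |lam|!, written as the reciprocal of the hook product."""
    return Fraction(1, hook_product(lam))


def children(mu):
    """All diagrams obtained from mu by adding one square."""
    out = []
    for i in range(len(mu)):
        if i == 0 or mu[i - 1] > mu[i]:
            out.append(mu[:i] + (mu[i] + 1,) + mu[i + 1:])
    out.append(mu + (1,))  # start a new row
    return out


def is_harmonic_up_to(size):
    """Check phi(mu) = sum of phi over children, for all |mu| <= size."""
    return all(
        phi(mu) == sum(phi(lam) for lam in children(mu))
        for n in range(size + 1)
        for mu in partitions(n)
    )


if __name__ == "__main__":
    print(is_harmonic_up_to(8))
```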


First Example: Thoma Theorem 


Let F be a central function on the infinite symmetric
group $S(\infty) = \bigcup_n S(n)$, normalized by F(1) = 1.
Restricted to S(n), F is a linear combination of
irreducible characters


\[ F\big|_{S(n)} = \sum_{|\lambda| = n} \phi(\lambda)\, \chi^\lambda \]


The branching rule $\chi^\lambda |_{S(n-1)} = \sum_{\mu \nearrow \lambda} \chi^\mu$ implies that
the Fourier coefficients $\phi$ are harmonic with respect
to $\kappa \equiv 1$. They are non-negative if and only if F is a
positive-definite function on $S(\infty)$, which means that
the matrix $(F(g_i g_j^{-1}))$ is non-negative definite for any
$\{g_i\} \subset S(\infty)$. The description of all indecomposable
positive-definite central functions on $S(\infty)$ was first
obtained by Thoma (see Kerov (2003, 1998) and
Olshanski (2003)). Rephrased in our language, it
says that the functions


\[ \phi(\lambda) = s_\lambda \Big|_{p_1 = 1,\ p_k = \sum_i \alpha_i^k + (-1)^{k+1} \sum_i \beta_i^k,\ k > 1} \]


are the extreme points of H(1). Here $\alpha_i$ and $\beta_i$ are
parameters satisfying


\[ \alpha_1 \ge \alpha_2 \ge \cdots \ge 0, \qquad \beta_1 \ge \beta_2 \ge \cdots \ge 0, \qquad \sum_i \alpha_i + \sum_j \beta_j \le 1 \]


This set is known as the Thoma simplex. The origin
$\alpha_i = \beta_i = 0$ corresponds to the Plancherel measure.

A general positive-definite central function on
$S(\infty)$ defines a measure on the Thoma simplex. This
measure can be interpreted as a point process on the
real line, for example, by placing particles at
positions $\{\alpha_i\}$ and $\{-\beta_i\}$. Interesting central functions
lead to interesting processes (see Olshanski
(2003)).


Second Example: Kingman Theorem 


Let $\Pi$ be a partition of the naturals $\mathbb{N}$ into disjoint
subsets. For any n = 1, 2, ..., $\Pi$ defines the induced
partition $\Pi_n$ of {1, ..., n} and hence a partition $\lambda(\Pi_n)$
of the number n. A measure M on partitions $\Pi$ is
called exchangeable if


\[ M(\Pi_n) = \phi(\lambda(\Pi_n)) \]


for some function $\phi$ on $\mathbb{Y}$. This implies that $\phi$ is
harmonic for


\[ \kappa_K(\mu, \lambda) = \rho_k \]


where $\mu = 1^{\rho_1} 2^{\rho_2} \cdots$ and $\lambda = 1^{\rho_1} 2^{\rho_2} \cdots k^{\rho_k - 1}
(k+1)^{\rho_{k+1} + 1} \cdots$. The description of all exchangeable
measures M was first obtained by Kingman.
In our language, it says that the extreme points of
$H(\kappa_K)$ are


\[ \phi(\lambda) = m_\lambda \Big|_{p_1 = 1,\ p_k = \sum_i \alpha_i^k,\ k > 1} \]


where $m_\lambda$ is the monomial symmetric function (sum
of all monomials with exponents $\lambda$) and $\alpha_i$ are
parameters as before. The corresponding measure
$M_\alpha$ can be described as follows. Let $X_i$ be a
sequence of independent, identically distributed
random variables such that $\{\alpha_i\}$ are the measures
of atoms of their distribution. This defines a
random partition $\Pi$ of $\mathbb{N}$ by putting i and j in the
same block of $\Pi$ if and only if $X_i = X_j$. A general
exchangeable measure M is then a convex linear
combination of $M_\alpha$, which can be viewed as
making the common distribution of $X_i$ also
random. See Pitman (n.d.) for a lot more about
Kingman’s theorem.


The multiplicities $\kappa_K$ are gauge equivalent to
multiplicities


\[ \kappa_{VP}(\mu, \lambda) = k \rho_k \qquad [14] \]


which arise in the study of probability measures on
virtual permutations $\mathfrak{S}$ (Olshanski 2003). By definition,


\[ \mathfrak{S} = \varprojlim S(n) \]


with respect to the maps $S(n) \to S(n-1)$ that delete n
from the disjoint cycle decomposition of a permutation
$\sigma \in S(n)$. For $n \ge 5$, this is the unique map that
commutes with the right and left action of S(n - 1).
Thus, $\mathfrak{S}$ has a natural $S(\infty) \times S(\infty)$ action; however, it
is not a group. A measure M on $\mathfrak{S}$ is central if it is
invariant under the action of the diagonal subgroup in
$S(\infty) \times S(\infty)$. Let the push-forward of M to S(n) give
mass $\phi(\lambda)$ to a permutation with cycle type $\lambda$. It is then
easy to see that $\phi$ is harmonic with respect to [14].
Thus, Kingman’s theorem gives a description of
ergodic central measures on $\mathfrak{S}$. For example, $\alpha_i = 0$
corresponds to the $\delta$-measure at the identity.


Ergodic Method 


A unified approach to this type of problems was
proposed and developed by Vershik and Kerov. It is
based on the following ergodic theorem. Let $\phi$ be an
ergodic harmonic function. Then

\[ \phi(\mu) = \lim_{|\lambda| \to \infty} \frac{\dim_\kappa \lambda/\mu}{\dim_\kappa \lambda} \qquad [15] \]

for almost all $\lambda$ with respect to the measure [13]
(Kerov 2003). This is similar to approximating a
Gibbs measure in infinite volume by a sequence of
finite-volume Gibbs measures with appropriate
boundary conditions. The ratio on the RHS of [15]
is known as the Martin kernel. Its asymptotics as
$|\lambda| \to \infty$ plays the essential role.

Let us call a sequence $\{\lambda(n)\}$ of partitions of n
regular if the limit in [15] exists for all $\mu$. For $\kappa \equiv 1$,
Vershik and Kerov proved that $\{\lambda(n)\}$ is regular if
and only if the following limits exist:


\[ \frac{\lambda(n)_i}{n} \to \alpha_i, \qquad \frac{\lambda(n)'_i}{n} \to \beta_i \qquad [16] \]


that is, if the rows and columns of $\lambda(n)$, scaled by n,
have a limit. In this case, the limit in [15] is the
harmonic function with Thoma parameters $\alpha_i$ and
$\beta_i$. This simultaneously proves Thoma’s classification
and gives a law of large numbers for the corresponding
measures [13]. It also gives a transparent
geometric interpretation of Thoma parameters.
Note that the behavior [16] is very different from
the formation of a smooth limit shape that we saw
earlier. For a common generalization of this result
and Kingman’s theorem see Kerov (1998).


See also: Determinantal Random Fields; Growth 
Processes in Random Matrix Theory; Integrable Systems 
in Random Matrix Theory; Random Matrix Theory in 
Physics; Symmetry Classes in Random Matrix Theory. 
Introduction


Random walks provide a simple conventional model to 
describe various transport processes, for example, 
propagation of heat or diffusion of matter through a 
medium (for a general reference see, e.g., Hughes 
(1995)). However, in many practical cases, the medium 
where the system evolves is highly irregular, due to 
factors such as defects, impurities, fluctuations, etc. It is 
natural to model such irregularities as “random 
environment,” treating the observable sample as a 
statistical realization of an ensemble, obtained by 
choosing the local characteristics of the motion (e.g., 
transport coefficients and driving fields) at random, 
according to a certain probability distribution. 

In the random walks context, such models are 
referred to as “random walks in random environ- 
ments” (RWRE). This is a relatively new chapter 
in applied probability and physics of disordered 
systems initiated in the 1970s. Early interest in 
RWRE models was motivated by some problems
in biology, crystallography, and metal physics, but
later applications have spread through numerous 
areas (see review papers by Alexander et al. (1981), 
Bouchaud and Georges (1990), and a comprehensive 
monograph by Hughes (1996)). After 30 years of 
extensive work, RWRE remain a very active area of 
research, which has been a rich source of hard and 
challenging questions and has already led to many 
surprising discoveries, such as subdiffusive behavior, 
trapping effects, localization, etc. It is fair to say that 
the RWRE paradigm has become firmly established 
in physics of random media, and its models, ideas, 
methods, results, and general effects have become an 
indispensable part of the standard tool kit of a 
mathematical physicist. 

One of the central problems in random media 
theory is to establish conditions ensuring homogeniza- 
tion, whereby a given stochastic system evolving in a 
random medium can be adequately described, on some 
spatial-temporal scale, using a suitable effective 
system in a homogeneous (nonrandom) medium. In 
particular, such systems would exhibit classical diffu- 
sive behavior with effective drift and diffusion coeffi- 
cient. Such an approximation, called “effective 
medium approximation” (EMA), may be expected to 
be successful for systems exposed to a relatively small
disorder of the environment. However, in certain 
circumstances, EMA may fail due to atypical environ- 
ment configurations (“large deviations”) leading to 
various anomalous effects. For instance, with small but 
positive probability a realization of the environment 
may create “traps” that would hold the particle for an 
anomalously long time, resulting in the subdiffusive 
behavior, with the mean square displacement growing 
slower than linearly in time. 

RWRE models have been studied by various 
nonrigorous methods including Monte Carlo simu- 
lations, series expansions, and the renormalization 
group techniques (see more details in the above 
references), but only a few models have been 
analyzed rigorously, especially in dimensions greater 
than one. The situation is much more satisfactory in 
the one-dimensional case, where the mathematical 
theory has matured and the RWRE dynamics has 
been understood fairly well. 

The goal of this article is to give a brief 
introduction to the beautiful area of RWRE. The 
principal model to be discussed is a random walk 
with nearest-neighbor jumps in independent and 
identically distributed (i.i.d.) random environment 
in one dimension, although we shall also comment 
on some generalizations. The focus is on rigorous 
results; however, heuristics will be used freely to 
motivate the ideas and explain the approaches and 
proofs. In a few cases, sketches of the proofs have 
been included, which should help appreciate the 
flavor of the results and methods. 


Ordinary Random Walks: A Reminder 


To put our exposition in perspective, let us give 
a brief account of a few basic concepts and 
facts for ordinary random walks, that is, evolving 
in a nonrandom environment (see further details in 
Hughes (1995)). In such models, space is modeled 
using a suitable graph, for example, a d-dimensional 
integer lattice $\mathbb{Z}^d$, while time may be discrete or
continuous. The latter distinction is not essential, 
and in this article we will mostly focus on the 
discrete-time case. The random mechanism of 
spatial motion is then determined by the given 
transition probabilities (probabilities of jumps) at 
each site of the graph. In the lattice case, it is usually 
assumed that the walk is translation invariant, so
that at each step the distribution of jumps is the same,
regardless of the current location of the walk.

In one dimension ($d = 1$), the simple (nearest-neighbor)
random walk may move one step to the right or to the left
at a time, with some probabilities $p$ and $q = 1 - p$,
respectively. An important assumption is that only the
current location of the walk determines the random motion
mechanism, whereas the past history is not relevant. In
terms of probability theory, such a process is referred to
as a “Markov chain.” Thus, assuming that the walk starts at
the origin, its position after $n$ steps can be represented
as the sum of consecutive displacements,
$X_n = Z_1 + \cdots + Z_n$, where $Z_i$ are independent
random variables with the same distribution
$P\{Z_i = 1\} = p$, $P\{Z_i = -1\} = q$.

The strong law of large numbers (LLN) states that
almost surely (i.e., with probability 1)

$$\lim_{n\to\infty} \frac{X_n}{n} = E Z_1 = p - q, \qquad P\text{-a.s.} \tag*{[1]}$$


where $E$ denotes expectation (mean value) with respect
to $P$. This result shows that the random walk moves
with the asymptotic average velocity close to $p - q$. It
follows that if $p - q \ne 0$, then the process $X_n$, with
probability 1, will ultimately drift to infinity (more
precisely, to $+\infty$ if $p - q > 0$ and to $-\infty$ if
$p - q < 0$). In particular, in this case, the random walk
may return to the origin (and in fact visit any site on
$\mathbb{Z}$) only finitely many times. Such behavior is
called “transient.” However, in the symmetric case (i.e.,
$p = q = 0.5$) the average velocity vanishes, so the above
argument fails. In this case, the walk behavior appears to
be more complicated, as it makes increasingly large
excursions both to the right and to the left, so that
$\limsup_{n\to\infty} X_n = +\infty$,
$\liminf_{n\to\infty} X_n = -\infty$ ($P$-a.s.). This
implies that a symmetric random walk in one dimension is
“recurrent,” in that it visits the origin (and indeed any
site on $\mathbb{Z}$) infinitely often. Moreover, it can
be shown to be “null-recurrent,” which means that the
expected time to return to the origin is infinite. That is
to say, return to the origin is guaranteed, but it takes
very long until this happens.

Fluctuations of the random walk can be characterized
further via the central limit theorem (CLT), which amounts
to saying that the probability distribution of $X_n$ is
asymptotically normal, with mean $n(p - q)$ and variance
$4npq$:

$$\lim_{n\to\infty} P\left\{ \frac{X_n - n(p - q)}{\sqrt{4npq}} \le x \right\} = \Phi(x) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy \tag*{[2]}$$


These results can be extended to more general
walks in one dimension, and also to higher dimensions.
For instance, the criterion of recurrence for a
general one-dimensional random walk is that it is
unbiased, $E(X_1 - X_0) = 0$. In the two-dimensional
case, in addition one needs $E|X_1 - X_0|^2 < \infty$. In
higher dimensions, any random walk (which does
not reduce to lower dimension) is transient.
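The LLN [1] and the CLT normalization [2] can be checked numerically. The following is a minimal sketch, assuming Python's standard pseudo-random generator is an adequate stand-in for the i.i.d. steps $Z_i$; the function names and the seed are illustrative choices only.

```python
import math
import random


def simple_random_walk(n_steps, p, rng):
    """Return X_n for a walk started at 0 that steps +1 w.p. p and -1 w.p. 1 - p."""
    position = 0
    for _ in range(n_steps):
        position += 1 if rng.random() < p else -1
    return position


def lln_and_clt_summary(n_steps, p, seed=0):
    """Return the empirical velocity X_n / n and X_n standardized as in eqn [2]."""
    rng = random.Random(seed)
    q = 1.0 - p
    x_n = simple_random_walk(n_steps, p, rng)
    velocity = x_n / n_steps
    z_score = (x_n - n_steps * (p - q)) / math.sqrt(4 * n_steps * p * q)
    return velocity, z_score
```

For a long run the first returned value should be close to $p - q$, while the second behaves like a single draw from the standard normal law.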




Random Environments and Random Walks 


The definition of an RWRE involves two ingredients:
(1) the environment, which is randomly chosen
but remains fixed throughout the time evolution,
and (2) the random walk, whose transition probabilities
are determined by the environment. The set
of environments (sample space) is denoted by
$\Omega = \{\omega\}$, and we use $\mathbb{P}$ to denote the
probability distribution on this space. For each
$\omega \in \Omega$, we define the random walk in the
environment $\omega$ as the (time-homogeneous) Markov chain
$\{X_t,\ t = 0, 1, 2, \ldots\}$ on $\mathbb{Z}$ with certain
(random) transition probabilities

$$p(x, y, \omega) = P^{\omega}\{X_1 = y \mid X_0 = x\} \tag*{[3]}$$


The probability measure $P^{\omega}$ that determines the
distribution of the random walk in a given environment
$\omega$ is referred to as the “quenched” law. We
often use a subindex to indicate the initial position
of the walk, so that, for example, $P_x^{\omega}\{X_0 = x\} = 1$.
By averaging the quenched probability $P_x^{\omega}$ further,
with respect to the environment distribution, we
obtain the “annealed” measure
$\mathbf{P}_x = \mathbb{P} \times P_x^{\omega}$, which
determines the probability law of the RWRE:

$$\mathbf{P}_x(A) = \int_{\Omega} P_x^{\omega}(A)\, \mathbb{P}(d\omega) = \mathbb{E} P_x^{\omega}(A) \tag*{[4]}$$

Expectation with respect to the annealed measure
$\mathbf{P}_x$ will be denoted by $\mathbf{E}_x$.

Equation [4] implies that if some property $A$ of the
RWRE holds almost surely with respect to the
quenched law $P_x^{\omega}$ for almost all environments (i.e.,
for all $\omega \in \Omega'$ such that $\mathbb{P}(\Omega') = 1$),
then this property is also true with probability 1 under the
annealed law $\mathbf{P}_x$.

Note that the random walk $X_t$ is a Markov
chain only conditionally on the fixed environment
(i.e., with respect to $P_x^{\omega}$), but the Markov property
fails under the annealed measure $\mathbf{P}_x$. This is because
the past history cannot be neglected, as it tells what
information about the medium must be taken into
account when averaging with respect to the environment.
That is to say, the walk learns more about
the environment by taking more steps. (This idea
motivates the method of “environment viewed from
the particle,” see related section below.)

The simplest model is the nearest-neighbor one-dimensional
walk, with transition probabilities

$$p(x, y, \omega) = \begin{cases} p_x & \text{if } y = x + 1 \\ q_x & \text{if } y = x - 1 \\ 0 & \text{otherwise} \end{cases}$$

where $p_x$ and $q_x = 1 - p_x$ ($x \in \mathbb{Z}$) are random
variables on the probability space $(\Omega, \mathbb{P})$. That
is to say, given the environment $\omega \in \Omega$, the random
walk currently at point $x \in \mathbb{Z}$ will make a one-unit
step to the right, with probability $p_x$, or to the left, with
probability $q_x$. Here the environment is determined
by the sequence of random variables $\{p_x\}$. For most
of the article, we assume that the random probabilities
$\{p_x, x \in \mathbb{Z}\}$ are i.i.d., which is referred to as
“i.i.d. environment.” Some extensions to more
general environments will be mentioned briefly in
the section “Some generalizations and variations.”
The study of RWRE is simplified under the following
natural condition called “(uniform) ellipticity:”

$$0 < \delta \le p_x \le 1 - \delta < 1, \qquad x \in \mathbb{Z}, \quad \mathbb{P}\text{-a.s.} \tag*{[5]}$$

which will be frequently assumed in the sequel.
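To make the two-stage randomness concrete, here is a minimal simulation sketch. It assumes, purely for illustration, a two-valued environment law ($p_x = \beta$ with probability $\alpha$ and $p_x = 1 - \beta$ otherwise), which is elliptic with $\delta = \min\{\beta, 1 - \beta\}$; the class and function names are hypothetical. The environment is sampled lazily and then frozen, so the walk sees the same $p_x$ at every return to $x$, which is exactly the quenched setting.

```python
import random


class IIDEnvironment:
    """Lazily sampled i.i.d. environment: p_x = beta w.p. alpha, else 1 - beta."""

    def __init__(self, alpha, beta, seed=0):
        self.alpha = alpha
        self.beta = beta
        self._rng = random.Random(seed)
        self._p = {}  # site -> p_x, frozen once sampled (quenched disorder)

    def p(self, x):
        """Return p_x, sampling it on the first visit to site x."""
        if x not in self._p:
            use_beta = self._rng.random() < self.alpha
            self._p[x] = self.beta if use_beta else 1.0 - self.beta
        return self._p[x]


def run_walk(env, n_steps, seed=1):
    """Run the quenched nearest-neighbor walk from the origin; return its path."""
    rng = random.Random(seed)  # independent randomness for the walk itself
    x = 0
    path = [x]
    for _ in range(n_steps):
        x += 1 if rng.random() < env.p(x) else -1
        path.append(x)
    return path
```

Using separate generators for the environment and for the walk mirrors the product structure of the annealed measure: rerunning `run_walk` with a new walk seed but the same `env` samples the quenched law, while changing both seeds samples the annealed law.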


Transience and Recurrence 


In this section, we discuss a criterion for the RWRE 
to be transient or recurrent. The following theorem 
is due to Solomon (1975). 


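Theorem 1 Set $\rho_x := q_x / p_x$, $x \in \mathbb{Z}$, and $\eta := \mathbb{E} \ln \rho_0$.

(i) If $\eta \ne 0$ then $X_t$ is transient ($\mathbf{P}_0$-a.s.); moreover,
if $\eta < 0$ then $\lim_{t\to\infty} X_t = +\infty$, while if $\eta > 0$
then $\lim_{t\to\infty} X_t = -\infty$ ($\mathbf{P}_0$-a.s.).

(ii) If $\eta = 0$ then $X_t$ is recurrent ($\mathbf{P}_0$-a.s.); moreover,

$$\limsup_{t\to\infty} X_t = +\infty, \qquad \liminf_{t\to\infty} X_t = -\infty, \qquad \mathbf{P}_0\text{-a.s.}$$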

Let us sketch the proof. Consider the hitting times
$T_x := \min\{t \ge 0 : X_t = x\}$ and denote by $f_{xy}$ the
quenched first-passage probability from $x$ to $y$:

$$f_{xy} := P_x^{\omega}\{1 \le T_y < \infty\}$$

Starting from 0, the first step of the walk may be
either to the right or to the left, hence by the
Markov property the return probability $f_{00}$ can be
decomposed as

$$f_{00} = p_0 f_{10} + q_0 f_{-1,0} \tag*{[6]}$$

To evaluate $f_{10}$, for $n \ge 1$ set

$$u_x \equiv u_x^{(n)} := P_x^{\omega}\{T_0 < T_n\}, \qquad 0 \le x \le n$$

which is the probability to reach 0 prior to $n$,
starting from $x$. Clearly,

$$f_{10} = \lim_{n\to\infty} u_1^{(n)} \tag*{[7]}$$


Decomposition with respect to the first step yields
the difference equation

$$u_x = p_x u_{x+1} + q_x u_{x-1}, \qquad 0 < x < n \tag*{[8]}$$

with the boundary conditions

$$u_0 = 1, \qquad u_n = 0 \tag*{[9]}$$

Using $p_x + q_x = 1$, eqn [8] can be rewritten as

$$u_{x+1} - u_x = \rho_x (u_x - u_{x-1})$$

whence by iterations

$$u_{x+1} - u_x = (u_1 - u_0) \prod_{j=1}^{x} \rho_j \tag*{[10]}$$

Summing over $x$ and using the boundary conditions
[9] we obtain

$$1 - u_1 = \left( \sum_{x=0}^{n-1} \prod_{j=1}^{x} \rho_j \right)^{-1} \tag*{[11]}$$

(if $x = 0$, the product over $j$ is interpreted as 1). In
view of eqn [7] it follows that $f_{10} = 1$ if and only if
the right-hand side of eqn [11] tends to 0, that is,

$$\sum_{x=1}^{\infty} \exp(Y_x) = \infty, \qquad Y_x := \sum_{j=1}^{x} \ln \rho_j \tag*{[12]}$$


Note that the random variables $\ln \rho_j$ are i.i.d., hence
by the strong LLN

$$\lim_{x\to\infty} \frac{Y_x}{x} = \mathbb{E} \ln \rho_0 = \eta, \qquad \mathbb{P}\text{-a.s.}$$

That is, the general term of the series [12] for large $x$
behaves like $\exp(x\eta)$; hence, for $\eta > 0$ the condition
[12] holds true (and so $f_{10} = 1$), whereas for $\eta < 0$ it
fails (and so $f_{10} < 1$).

By interchanging the roles of $p_x$ and $q_x$, we also
have $f_{-1,0} < 1$ if $\eta > 0$ and $f_{-1,0} = 1$ if $\eta < 0$. From
eqn [6], it then follows that in both cases $f_{00} < 1$,
that is, the random walk is transient.

In the critical case, $\eta = 0$, by a general result from
probability theory, $Y_x \ge 0$ for infinitely many $x$
($\mathbb{P}$-a.s.), and so the series in eqn [12] diverges.
Hence, $f_{10} = 1$ and, similarly, $f_{-1,0} = 1$, so by eqn [6]
$f_{00} = 1$, that is, the random walk is recurrent.
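The explicit solution behind eqn [11] is easy to verify numerically. The sketch below, with an illustrative function name and an arbitrary finite environment, builds all $u_x$ from the partial sums of the products $\prod_{j \le y} \rho_j$; one can then check that they satisfy the difference equation [8] with the boundary conditions [9].

```python
def exit_probabilities(rho):
    """Return [u_0, ..., u_n] with u_x = P_x{T_0 < T_n}.

    `rho` is a list of length n with rho[x] = q_x / p_x for x = 1..n-1;
    entry 0 is unused padding so that indices match the sites.
    """
    n = len(rho)
    # partial[x] = sum_{y=0}^{x-1} prod_{j=1}^{y} rho_j (empty product = 1),
    # so partial[n] is the full sum appearing in eqn [11].
    partial = [0.0]
    product = 1.0
    for y in range(n):
        if y >= 1:
            product *= rho[y]
        partial.append(partial[-1] + product)
    total = partial[n]
    # Summing eqn [10] from 0 to x - 1 gives u_x = 1 - partial[x] / total.
    return [1.0 - partial[x] / total for x in range(n + 1)]
```

In the homogeneous symmetric case $\rho_x \equiv 1$ this reduces to the classical gambler's ruin formula $u_x = 1 - x/n$.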

It may be surprising that the critical parameter
appears in the form $\eta = \mathbb{E} \ln \rho_0$, as it is probably
more natural to expect, by analogy with the
ordinary random walk, that the RWRE criterion
would be based on the mean drift, $\mathbb{E}(p_0 - q_0)$. In the
next section, we will see that the sign of the mean drift
may be misleading.

A canonical model of RWRE is specified by the
assumption that the random variables $p_x$ take only
two values, $\beta$ and $1 - \beta$, with probabilities

$$\mathbb{P}\{p_x = \beta\} = \alpha, \qquad \mathbb{P}\{p_x = 1 - \beta\} = 1 - \alpha \tag*{[13]}$$

where $0 < \alpha < 1$, $0 < \beta < 1$. Here
$\eta = (2\alpha - 1) \ln(1 + (1 - 2\beta)/\beta)$, and it is easy
to see that, for example, $\eta < 0$ if $\alpha < 1/2$, $\beta < 1/2$
or $\alpha > 1/2$, $\beta > 1/2$. The recurrent region where
$\eta = 0$ splits into two lines, $\beta = 1/2$ and $\alpha = 1/2$.
Note that the first case is degenerate and amounts to the
ordinary symmetric random walk, while the second one
(except where $\beta = 1/2$) corresponds to Sinai’s
problem (see the section “Sinai’s localization”). A
“phase diagram” for this model, showing various
limiting regimes as a function of the parameters $\alpha$, $\beta$,
is presented in Figure 1.

Figure 1 Phase diagram for the canonical model, eqn [13]. In the
regions where $\eta < 0$ or $\eta > 0$, the RWRE is transient to
$+\infty$ or $-\infty$, respectively. The recurrent case, $\eta = 0$,
arises when $\alpha = 1/2$ or $\beta = 1/2$. The asymptotic velocity
$v := \lim_{t\to\infty} X_t / t$ is given by eqn [14]. Adapted from
Hughes BD (1996) Random Walks and Random Environments. Volume 2:
Random Environments, Ch. 6, p. 391. Oxford: Clarendon, by
permission of Oxford University Press.
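The transience and recurrence regions of the canonical model can be tabulated directly from Theorem 1. The following sketch (function names and the tolerance are illustrative assumptions) evaluates $\eta = \mathbb{E} \ln \rho_0$ for the law [13] and reports the regime.

```python
import math


def eta_canonical(alpha, beta):
    """eta = E ln rho_0 for the two-valued environment of eqn [13]."""
    log_ratio = math.log((1.0 - beta) / beta)  # ln rho_x when p_x = beta
    # When p_x = 1 - beta, ln rho_x = -log_ratio.
    return alpha * log_ratio + (1.0 - alpha) * (-log_ratio)


def classify(alpha, beta, tol=1e-12):
    """Regime predicted by Theorem 1 for the canonical model."""
    eta = eta_canonical(alpha, beta)
    if abs(eta) <= tol:
        return "recurrent"
    return "transient to +infinity" if eta < 0 else "transient to -infinity"
```

For instance, `classify(0.3, 0.3)` falls in the region $\alpha < 1/2$, $\beta < 1/2$ where $\eta < 0$, while any point on the lines $\alpha = 1/2$ or $\beta = 1/2$ is classified as recurrent.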


Asymptotic Velocity 


In the transient case the walk escapes to infinity,
and it is reasonable to ask at what speed. For a
nonrandom environment, $p_x \equiv p$, the answer is
given by the LLN, eqn [1]. For the simple
RWRE, the asymptotic velocity was obtained by
Solomon (1975). Note that by Jensen’s inequality,
$(\mathbb{E}\rho_0)^{-1} \le \mathbb{E}\rho_0^{-1}$.

Theorem 2 The limit $v := \lim_{t\to\infty} X_t / t$ exists
($\mathbf{P}_0$-a.s.) and is given by

$$v = \begin{cases} \dfrac{1 - \mathbb{E}\rho_0}{1 + \mathbb{E}\rho_0} & \text{if } \mathbb{E}\rho_0 < 1 \\[2ex] -\dfrac{1 - \mathbb{E}\rho_0^{-1}}{1 + \mathbb{E}\rho_0^{-1}} & \text{if } \mathbb{E}\rho_0^{-1} < 1 \\[2ex] 0 & \text{otherwise} \end{cases} \tag*{[14]}$$

Thus, the RWRE has a well-defined nonzero
asymptotic velocity except when
$(\mathbb{E}\rho_0)^{-1} \le 1 \le \mathbb{E}\rho_0^{-1}$. For instance,
in the canonical example eqn [13] (see Figure 1), the criterion
$\mathbb{E}\rho_0 < 1$ for the velocity $v$ to be positive amounts
to the condition that both $(1 - \alpha)/\alpha$ and
$(1 - \beta)/\beta$ lie on the same side of point 1.
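Formula [14] is straightforward to evaluate for any finitely supported environment law. The sketch below is an illustration under that assumption (the function name and the representation of the law by two parallel lists are hypothetical choices); it requires $0 < p_0 < 1$ so that both $\rho_0$ and $\rho_0^{-1}$ are finite.

```python
def velocity(values, probs):
    """Asymptotic velocity from eqn [14].

    `values` lists the possible values of p_0 (each strictly between 0 and 1)
    and `probs` the corresponding probabilities.
    """
    mean_rho = sum(w * (1.0 - p) / p for p, w in zip(values, probs))
    mean_rho_inv = sum(w * p / (1.0 - p) for p, w in zip(values, probs))
    if mean_rho < 1.0:
        return (1.0 - mean_rho) / (1.0 + mean_rho)
    if mean_rho_inv < 1.0:
        return -(1.0 - mean_rho_inv) / (1.0 + mean_rho_inv)
    return 0.0
```

For a nonrandom environment, `velocity([0.7], [1.0])` recovers $p - q = 0.4$, in agreement with eqn [1]. In the canonical model with $\alpha = 0.4$, $\beta = 0.2$ one has $\eta < 0$ but both $\mathbb{E}\rho_0$ and $\mathbb{E}\rho_0^{-1}$ exceed 1, so the function returns 0 even though the walk is transient to $+\infty$.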


The key idea of the proof is to analyze the hitting
times $T_n$ first, deducing results for the walk $X_t$ later.
More specifically, set $\tau_i = T_i - T_{i-1}$, which is the time
to hit $i$ after hitting $i - 1$ (provided that $i > X_0$). If
$X_0 = 0$ and $n \ge 1$, then $T_n = \tau_1 + \cdots + \tau_n$. Note that
in a fixed environment $\omega$ the random variables $\{\tau_i\}$ are
independent, since the quenched random walk “forgets”
its past. Although there is no independence with
respect to the annealed probability measure $\mathbf{P}_0$, one
can show that, due to the i.i.d. property of the
environment, the sequence $\{\tau_i\}$ is ergodic and therefore
satisfies the LLN:

$$\frac{T_n}{n} = \frac{\tau_1 + \cdots + \tau_n}{n} \to \mathbf{E}_0 \tau_1, \qquad \mathbf{P}_0\text{-a.s.}$$

In turn, this implies

$$\frac{X_t}{t} \to \frac{1}{\mathbf{E}_0 \tau_1}, \qquad \mathbf{P}_0\text{-a.s.} \tag*{[15]}$$

(the clue is to note that $X_{T_n} = n$).
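To compute the mean value $\mathbf{E}_0 \tau_1$, observe that

$$\tau_1 = \mathbf{1}_{\{X_1 = 1\}} + \mathbf{1}_{\{X_1 = -1\}} (1 + \tau_0' + \tau_1') \tag*{[16]}$$

where $\mathbf{1}_A$ is the indicator of event $A$ and $\tau_0'$, $\tau_1'$
are, respectively, the times to get from $-1$ to 0 and
then from 0 to 1. Taking expectations in a fixed
environment $\omega$, we obtain

$$E_0^{\omega} \tau_1 = p_0 + q_0 (1 + E_0^{\omega} \tau_0 + E_0^{\omega} \tau_1) \tag*{[17]}$$

and so

$$E_0^{\omega} \tau_1 = 1 + \rho_0 + \rho_0 E_0^{\omega} \tau_0 \tag*{[18]}$$

Note that $E_0^{\omega} \tau_0$ is a function of $\{p_x, x < 0\}$ and
hence is independent of $\rho_0 = q_0 / p_0$. Averaging eqn
[18] over the environment and using $\mathbf{E}_0 \tau_0 = \mathbf{E}_0 \tau_1$
yields

$$\mathbf{E}_0 \tau_1 = \begin{cases} \dfrac{1 + \mathbb{E}\rho_0}{1 - \mathbb{E}\rho_0} & \text{if } \mathbb{E}\rho_0 < 1 \\[2ex] \infty & \text{if } \mathbb{E}\rho_0 \ge 1 \end{cases} \tag*{[19]}$$

and by eqn [15] “half” of eqn [14] follows. The
other half, in terms of $\mathbb{E}\rho_0^{-1}$, can be obtained by
interchanging the roles of $p_x$ and $q_x$, whereby $\rho_0$ is
replaced with $\rho_0^{-1}$.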

Let us make a few remarks concerning Theorems
1 and 2. First of all, note that by Jensen’s inequality
$\mathbb{E} \ln \rho_0 \le \ln \mathbb{E}\rho_0$, with a strict inequality
whenever $\rho_0$ is nondegenerate. Therefore, it may be possible
that, with $\mathbf{P}_0$-probability 1, $X_t \to \infty$ but
$X_t / t \to 0$ (see Figure 1). This is quite unusual as compared
to the ordinary random walk (see the subsection
“Ordinary random walks: a reminder”), and
indicates some kind of slowdown in the transient
case.

Furthermore, by Jensen’s inequality

$$\mathbb{E}\rho_0 = \mathbb{E} p_0^{-1} - 1 \ge (\mathbb{E} p_0)^{-1} - 1$$

so eqn [14] implies that if $\mathbb{E}\rho_0 < 1$, then

$$0 < v \le 2\,\mathbb{E} p_0 - 1 = \mathbb{E}(p_0 - q_0)$$

and the inequality is strict if $p_0$ is genuinely random
(i.e., does not reduce to a constant). Hence, the
asymptotic velocity $v$ is less than the mean drift
$\mathbb{E}(p_0 - q_0)$, which is yet another evidence of slowdown.
What is even more surprising is that it is
possible to have $\mathbb{E}(p_0 - q_0) > 0$ but
$\eta = \mathbb{E} \ln \rho_0 > 0$, so that $\mathbf{P}_0$-a.s.
$X_t \to -\infty$ (although with velocity $v = 0$).
Indeed, following Sznitman (2004) suppose that

$$\mathbb{P}\{p_0 = \beta\} = \alpha, \qquad \mathbb{P}\{p_0 = \gamma\} = 1 - \alpha$$

with $\alpha > 1/2$. Then $\mathbb{E} p_0 \ge \alpha\beta > 1/2$ if
$1 > \beta > 1/(2\alpha)$, hence
$\mathbb{E}(p_0 - q_0) = 2\,\mathbb{E} p_0 - 1 > 0$. On the
other hand,

$$\mathbb{E} \ln \rho_0 = \alpha \ln \frac{1 - \beta}{\beta} + (1 - \alpha) \ln \frac{1 - \gamma}{\gamma} > 0$$

if $\gamma$ is sufficiently small.
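A concrete instance of this example can be checked in a few lines. The sketch below uses illustrative parameter values of our own choosing ($\alpha = 0.6$, $\beta = 0.9$, $\gamma = 0.01$), not values taken from Sznitman (2004), and a hypothetical function name.

```python
import math


def mean_drift_and_eta(alpha, beta, gamma):
    """Return (E(p_0 - q_0), E ln rho_0) when p_0 = beta w.p. alpha, gamma otherwise."""
    mean_p = alpha * beta + (1.0 - alpha) * gamma
    eta = (alpha * math.log((1.0 - beta) / beta)
           + (1.0 - alpha) * math.log((1.0 - gamma) / gamma))
    return 2.0 * mean_p - 1.0, eta
```

With these values the mean drift equals $0.088 > 0$, yet $\eta > 0$, so by Theorem 1 the walk is transient to $-\infty$: rare sites with a very small $p_x$ act as nearly impassable barriers that outweigh the typical rightward bias.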


Critical Exponent, Excursions, and Traps 


Extending the previous analysis of the hitting times,
one can obtain useful information about the limit
distribution of $T_n$ (and hence $X_t$). To appreciate
this, note that from the recursion eqn [16] it follows

$$\tau_1^s = \mathbf{1}_{\{X_1 = 1\}} + \mathbf{1}_{\{X_1 = -1\}} (1 + \tau_0' + \tau_1')^s$$

and, similarly to [17],

$$E_0^{\omega} \tau_1^s = p_0 + q_0 E_0^{\omega} (1 + \tau_0 + \tau_1)^s$$

Taking here expectation $\mathbb{E}$, one can deduce that
$\mathbf{E}_0 \tau_1^s < \infty$ if and only if $\mathbb{E}\rho_0^s < 1$.
Therefore, it is natural to expect that the root $\kappa$ of the
equation

$$\mathbb{E}\rho_0^{\kappa} = 1 \tag*{[20]}$$

plays the role of a critical exponent responsible for
the growth rate (and hence, for the type of the limit
distribution) of the sum $T_n = \tau_1 + \cdots + \tau_n$. In
particular, by analogy with sums of i.i.d. random
variables one can expect that if $\kappa > 2$, then $T_n$ is
asymptotically normal, with the standard scaling
$\sqrt{n}$, while for $\kappa < 2$ the limit law of $T_n$ is stable
(with index $\kappa$) under scaling $\approx n^{1/\kappa}$.
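For a given environment law the critical exponent can be found numerically. The following is a minimal bisection sketch, assuming a finitely supported law of $p_0$ in $(0, 1)$ with $\mathbb{E} \ln \rho_0 < 0$ and $\mathbb{P}\{\rho_0 > 1\} > 0$, so that $u \mapsto \ln \mathbb{E}\rho_0^u$ is convex, negative just to the right of 0, and eventually positive. The function names, starting brackets, and guard thresholds are illustrative choices.

```python
import math


def log_moment(u, values, probs):
    """ln E rho_0^u for a finitely supported law of p_0 in (0, 1)."""
    return math.log(sum(w * ((1.0 - p) / p) ** u for p, w in zip(values, probs)))


def critical_exponent(values, probs, tol=1e-12):
    """Positive root kappa of E rho_0^kappa = 1, eqn [20], found by bisection."""
    lo = 1e-6
    while log_moment(lo, values, probs) >= 0.0:  # shrink until the function is negative
        lo /= 2.0
        if lo < 1e-12:
            raise ValueError("no negative bracket: is E ln rho_0 < 0?")
    hi = 1.0
    while log_moment(hi, values, probs) <= 0.0:  # grow until the function is positive
        hi *= 2.0
        if hi > 1e3:
            raise ValueError("no positive bracket: is P{rho_0 > 1} > 0?")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_moment(mid, values, probs) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For the canonical model [13] with $\alpha = 0.4$, $\beta = 0.2$, eqn [20] becomes $0.4\,y + 0.6/y = 1$ with $y = 4^{\kappa}$, whose nontrivial root is $y = 1.5$, so the routine should return $\kappa = \ln 1.5 / \ln 4 \approx 0.2925$.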

Alternatively, eqn [20] can be obtained from
consideration of excursions of the random walk.
Let $T_{11}^{l}$ be the left-excursion time from site 1, that is
the time to return to 1 after moving to the left at the
first step. If $\eta = \mathbb{E} \ln \rho_0 < 0$, then
$T_{11}^{l} < \infty$ ($\mathbf{P}_0$-a.s.).
Fixing an environment $\omega$, let $w_1 = E_1^{\omega} T_{11}^{l}$ be the
quenched mean duration of the excursion $T_{11}^{l}$ and
observe that $w_1 = 1 + E_0^{\omega} \tau_1$, where $\tau_1$ is the time to
get back to 1 after stepping to 0.

As a matter of fact, this representation and
eqn [19] imply that the annealed mean duration of
the left excursion, $\mathbf{E}_1 T_{11}^{l}$, is given by

$$\mathbb{E} w_1 = \begin{cases} \dfrac{2}{1 - \mathbb{E}\rho_0} & \text{if } \mathbb{E}\rho_0 < 1 \\[2ex] \infty & \text{if } \mathbb{E}\rho_0 \ge 1 \end{cases} \tag*{[21]}$$

Note that in the latter case (and bearing in mind $\eta < 0$),
the random walk starting from 1 will eventually drift to
$+\infty$, thus making only a finite number of visits to 0,
but the expected number of such visits is infinite.

In fact, our goal here is to characterize the
distribution of $w_1$ under the law $\mathbb{P}$. To this end,
observe that the excursion $T_{11}^{l}$ involves at least two
steps (the first and the last ones) and, possibly,
several left excursions from 0, each with mean time
$w_0 = E_0^{\omega} T_{00}^{l}$. Therefore,

$$w_1 = 2 + \sum_{j=1}^{\infty} q_0^j p_0 (j w_0) = 2 + \rho_0 w_0 \tag*{[22]}$$

By the translation invariance of the environment, the
random variables $w_1$ and $w_0$ have the same distribution.
Furthermore, similarly to recursion [22], we
have $w_0 = 2 + \rho_{-1} w_{-1}$. This implies that $w_0$ is a
function of $p_x$ with $x \le -1$ only, and hence $w_0$ and
$\rho_0$ are independent random variables. Introducing the
Laplace transform $\phi(s) = \mathbb{E} \exp(-s w_1)$ and conditioning
on $\rho_0$, from eqn [22] we get the equation

$$\phi(s) = e^{-2s}\, \mathbb{E} \phi(s \rho_0) \tag*{[23]}$$

Suppose that

$$1 - \phi(s) \sim a s^{\kappa}, \qquad s \to 0$$

then eqn [23] amounts to

$$1 - a s^{\kappa} + \cdots = (1 - 2s + \cdots)(1 - a s^{\kappa}\, \mathbb{E}\rho_0^{\kappa} + \cdots)$$

Expanding the product on the right, one can see that
a solution with $\kappa = 1$ is possible only if $\mathbb{E}\rho_0 < 1$, in
which case

$$a = \mathbb{E} w_1 = \frac{2}{1 - \mathbb{E}\rho_0}$$

We have already obtained this result in eqn [21].

The case $\kappa < 1$ is possible if $\mathbb{E}\rho_0^{\kappa} = 1$, which is
exactly eqn [20]. Returning to $w_1$, one expects a
slow decay of the distribution tail,

$$\mathbb{P}\{w_1 > t\} \sim b\, t^{-\kappa}, \qquad t \to \infty$$

In particular, in this case the annealed mean
duration of the left excursion appears to be infinite.


Although the above considerations point to the
critical parameter $\kappa$, eqn [20], which may be
expected to determine the slowdown scale, they
provide little explanation of a mechanism of the
slowdown phenomenon. Heuristically, it is natural
to attribute the slowdown effects to the presence of
“traps” in the environment, which may be thought
of as regions that are easy to enter but hard to leave.
In the one-dimensional case, such a trap would
occur, for example, between two long series of
successive sites where the probabilities $p_x$ are fairly
large (on the left) and small (on the right).

Remarkably, traps can be characterized quantitatively
with regard to the properties of the random
environment, by linking them to certain large-deviation
effects (see Sznitman (2002, 2004)). The
key role in this analysis is played by the function
$F(u) := \ln \mathbb{E}\rho_0^u$, $u \in \mathbb{R}$. Suppose that
$\eta = \mathbb{E} \ln \rho_0 < 0$ (so that by Theorem 1 the RWRE
tends to $+\infty$, $\mathbf{P}_0$-a.s.) and also that
$\mathbb{E}\rho_0 > 1$ and $\mathbb{E}\rho_0^{-1} > 1$
(so that by Theorem 2, $v = 0$). The latter means that
$F(1) > 0$ and $F(-1) > 0$, and since $F$ is a smooth
strictly convex function and $F(0) = 0$, it follows that
there is the second root $0 < \kappa < 1$, so that $F(\kappa) = 0$,
that is, $\mathbb{E}\rho_0^{\kappa} = 1$ (cf. eqn [20]).

Let us estimate the probability to have a trap in
$U = [-L, L]$ where the RWRE will spend anomalously
long time. Using eqn [11], observe that

$$P_1^{\omega}\{T_0 < T_{L+1}\} \ge 1 - \exp\{-L S_L\}$$

where $S_L := L^{-1} \sum_{x=1}^{L} \ln \rho_x \to \eta < 0$ as $L \to \infty$.
However, due to large deviations $S_L$ may exceed
level $\varepsilon > 0$ with probability

$$\mathbb{P}\{S_L > \varepsilon\} \sim \exp\{-L\, I(\varepsilon)\}, \qquad L \to \infty$$

where $I(x) := \sup_u \{ux - F(u)\}$ is the Legendre transform
of $F$. We can optimize this estimate by
assuming that $\varepsilon L \ge \ln n$ and minimizing the ratio
$I(\varepsilon)/\varepsilon$. Note that $F(u)$ can be expressed via the
inverse Legendre transform, $F(u) = \sup_x \{xu - I(x)\}$,
and it is easy to see that if
$\kappa := \min_{\varepsilon > 0} I(\varepsilon)/\varepsilon$, then
$F(\kappa) = 0$, so $\kappa$ is the second (positive) root of $F$.

The “left” probability $P_{-1}^{\omega}\{T_0 < T_{-L-1}\}$ is estimated
in a similar fashion, and one can deduce that
for some constants $K > 0$, $c > 0$, and any $\kappa' > \kappa$, for
large $n$

$$\mathbb{P}\left\{ P_0^{\omega}\left\{ \max_{k \le n} |X_k| \le K \ln n \right\} \ge c \right\} \ge n^{-\kappa'}$$

That is to say, this is a bound on the probability to
see a trap centered at 0, of size $\sim \ln n$, which will
retain the RWRE for at least time $n$. It can be
shown that, typically, there will be many such traps
both in $[-n^{\kappa'}, 0]$ and $[0, n^{\kappa'}]$, which will essentially
prevent the RWRE from moving at distance $n^{\kappa'}$
from the origin before time $n$. In particular, it
follows that $\lim_{n\to\infty} X_n / n^{\kappa'} = 0$ for any
$\kappa' > \kappa$, so recalling that $0 < \kappa < 1$, we have indeed
a sublinear growth of $X_n$. This result is more informative as
compared to Theorem 2 (the case $v = 0$), and it
clarifies the role of traps (see more details in
Sznitman (2004)). The nontrivial behavior of the
RWRE on the precise growth scale, $n^{\kappa}$, is
characterized in the next section.


Limit Distributions 


Considerations in the previous section suggest that
the exponent $\kappa$, defined as the solution of eqn
[20], characterizes environments in terms of duration
of left excursions. These heuristic arguments
are confirmed by a limit theorem by Kesten et al.
(1975), which specifies the slowdown scale. We
state here the most striking part of their result.
Denote $\ln^{+} u := \max\{\ln u, 0\}$; by an arithmetic
distribution one means a probability law on $\mathbb{R}$
concentrated on the set of points of the form
$0, \pm c, \pm 2c, \ldots$

Theorem 3 Assume that $-\infty \le \eta = \mathbb{E} \ln \rho_0 < 0$
and the distribution of $\ln \rho_0$ is nonarithmetic
(excluding a possible atom at $-\infty$). Suppose that
the root $\kappa$ of eqn [20] is such that $0 < \kappa < 1$ and
$\mathbb{E}\rho_0^{\kappa} \ln^{+} \rho_0 < \infty$. Then

$$\lim_{n\to\infty} \mathbf{P}_0\{n^{-1/\kappa} T_n \le t\} = L_{\kappa}(t)$$

$$\lim_{t\to\infty} \mathbf{P}_0\{t^{-\kappa} X_t \le x\} = 1 - L_{\kappa}(x^{-1/\kappa})$$

where $L_{\kappa}(\cdot)$ is the distribution function of a stable
law with index $\kappa$, concentrated on $[0, \infty)$.

General information on stable laws can be found
in many probability books; we only mention here
that the Laplace transform of a stable distribution
on $[0, \infty)$ with index $\kappa$ has the form
$\phi(s) = \exp\{-C s^{\kappa}\}$.

Kesten et al. (1975) also consider the case $\kappa \ge 1$.
Note that for $\kappa > 1$, we have
$\mathbb{E}\rho_0 < (\mathbb{E}\rho_0^{\kappa})^{1/\kappa} = 1$, so
$v > 0$ by eqn [14]. For example, if $\kappa > 2$ then, as
expected (see the previous section), there exists a
nonrandom $\sigma^2 > 0$ such that

$$\lim_{n\to\infty} \mathbf{P}_0\left\{ \frac{T_n - n/v}{\sigma\sqrt{n}} \le t \right\} = \Phi(t), \qquad \lim_{t\to\infty} \mathbf{P}_0\left\{ \frac{X_t - tv}{v^{3/2}\sigma\sqrt{t}} \le x \right\} = \Phi(x)$$


Let us describe an elegant idea of the proof based
on a suitable renewal structure. (1) Let $U_i^n$ ($i \le n$) be
the number of left excursions starting from $i$ up to
time $T_n$, and note that $T_n = n + 2 \sum_i U_i^n$. Since the
walk is transient to $+\infty$, the sum $\sum_{i \le 0} U_i^n$ is finite
($\mathbf{P}_0$-a.s.) and so does not affect the limit. (2) Observe
that if the environment $\omega$ is fixed then the conditional
distribution of $U_j^n$, given $U_{j+1}^n, \ldots, U_n^n = 0$, is
the same as the distribution of the sum of $1 + U_{j+1}^n$ i.i.d.
random variables $V_1, V_2, \ldots$, each with geometric
distribution $P_0^{\omega}\{V_i = k\} = p_j q_j^k$ ($k = 0, 1, 2, \ldots$).
Therefore, the sum $\sum_{i=1}^{n} U_i^n$ (read from right to
left) can be represented as $\sum_{t=0}^{n-1} Z_t$, where
$Z_0 = 0, Z_1, Z_2, \ldots$ is a branching process (in random
environment $\{p_j\}$) with one immigrant at each step
and the geometric offspring distribution with parameter
$p_j$ for each particle present at time $j$. (3) Consider
the successive “regeneration” times $\tau_k^{*}$, at which
the process $Z_t$ vanishes. The partial sums
$W_k := \sum_{\tau_k^{*} \le t < \tau_{k+1}^{*}} Z_t$ form an i.i.d. sequence,
and the proof amounts to showing that the sum of $W_k$ has a
stable limit of index $\kappa$. (4) Finally, the distribution of
$W_0$ can be approximated using
$M_0 := \sum_{t=1}^{\infty} \prod_{j=0}^{t-1} \rho_j$
(cf. eqn [11]), which is the quenched mean number of
total progeny of the immigrant at time $t = 0$. Using
Kesten’s renewal theorem, it can be checked that
$\mathbb{P}\{M_0 > x\} \sim K x^{-\kappa}$ as $x \to \infty$, so $M_0$ is in
the domain of attraction of a stable law with index $\kappa$, and the
result follows.

Let us emphasize the significance of the regeneration
times $\tau_k^{*}$. Returning to the original random
walk, one can see that these are times at which the
RWRE hits a new “record” on its way to $+\infty$, never
to backtrack again. The same idea plays a crucial
role in the analysis of the RWRE in higher
dimensions (see the subsections “Zero-one laws
and LLNs” and “Kalikow’s condition and Sznitman’s
condition (T′)”).

Finally, note that the condition $-\infty \le \eta < 0$
allows $\mathbb{P}\{p_0 = 1\} > 0$, so the distribution of $\rho_0$ may
have an atom at 0 (and hence $\ln \rho_0$ at $-\infty$). In view
of eqn [20], no atom is possible at $+\infty$. The
restriction for the distribution of $\ln \rho_0$ to be
nonarithmetic is important. This will be illustrated
in the section “Diode model,” where we discuss the
model of random diodes.


Sinai’s Localization 


The results discussed in the previous section indicate
that the less transient the RWRE is (i.e., the critical
exponent decreasing to zero), the slower it moves.
Sinai (1982) proved a remarkable theorem showing
that for the recurrent RWRE (i.e., with
$\eta = \mathbb{E} \ln \rho_0 = 0$), the slowdown effect is exhibited in
a striking way.

Theorem 4 Suppose that the environment $\{p_x\}$ is
i.i.d. and elliptic, eqn [5], and assume that
$\mathbb{E} \ln \rho_0 = 0$, with $\mathbb{P}\{\rho_0 = 1\} < 1$. Denote
$\sigma^2 := \mathbb{E} \ln^2 \rho_0$, $0 < \sigma^2 < \infty$. Then there
exists a function $W_n = W_n(\omega)$ of the random environment
such that for any $\varepsilon > 0$

$$\lim_{n\to\infty} \mathbf{P}_0\left\{ \left| \frac{\sigma^2 X_n}{\ln^2 n} - W_n \right| > \varepsilon \right\} = 0 \tag*{[24]}$$

Moreover, $W_n$ has a limit distribution:

$$\lim_{n\to\infty} \mathbb{P}\{W_n \le x\} = G(x) \tag*{[25]}$$

and thus also the distribution of $\sigma^2 X_n / \ln^2 n$ under
$\mathbf{P}_0$ converges to the same distribution $G(x)$.


Sinai’s theorem shows that in the recurrent case, the
RWRE considered on the spatial scale $\ln^2 n$ becomes
localized near some random point (depending on the
environment only). This phenomenon, frequently
referred to as “Sinai’s localization,” indicates an
extremely strong slowdown of the motion as compared
with the ordinary diffusive behavior.

Following Révész (1990), let us explain heuristically
why $X_n$ is measured on the scale $\ln^2 n$. Rewrite
eqn [11] as

$$P_1^{\omega}\{T_n < T_0\} = \left( 1 + \sum_{x=1}^{n-1} \exp(Y_x) \right)^{-1} \tag*{[26]}$$

where $Y_x$ is defined in eqn [12]. By the CLT, the
typical size of $|Y_x|$ for large $x$ is of order of $\sqrt{x}$, and
so eqn [26] yields

$$P_1^{\omega}\{T_n < T_0\} \approx \exp\{-\sqrt{n}\}$$

This suggests that the walk started at site 1 will
make about $\exp\{\sqrt{n}\}$ visits to the origin before
reaching level $n$. Therefore, the first passage to
site $n$ takes at least time $\approx \exp\{\sqrt{n}\}$. In other
words, one may expect that a typical displacement
after $n$ steps will be of order of $\ln^2 n$ (cf. eqn
[24]). This argument also indicates, in the spirit
of the trapping mechanism of slowdown discussed
at the end of the section “Critical exponent,
excursions, and traps,” that there is typically a
trap of size $\sim \ln^2 n$, which retains the RWRE until
time $n$.

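It has been shown (independently by H Kesten
and A O Golosov) that the limit in [25] coincides
with the distribution of a certain functional of the
standard Brownian motion, with the density
function

$$G'(x) = \frac{2}{\pi} \sum_{k=0}^{\infty} \frac{(-1)^k}{2k + 1} \exp\left\{ -\frac{(2k + 1)^2 \pi^2}{8} |x| \right\}, \qquad x \in \mathbb{R}$$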
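As a sanity check, the series form usually quoted for this density, $G'(x) = (2/\pi) \sum_{k \ge 0} (-1)^k (2k+1)^{-1} \exp\{-(2k+1)^2 \pi^2 |x| / 8\}$, can be evaluated and normalized numerically. The sketch below takes that series as its assumption; the function names and truncation levels are illustrative. Integrating term by term over the real line gives $(32/\pi^3) \sum_{k \ge 0} (-1)^k (2k+1)^{-3}$, which should equal 1.

```python
import math


def sinai_limit_density(x, terms=200):
    """Truncated series (2/pi) * sum_k (-1)^k / (2k+1) * exp(-(2k+1)^2 pi^2 |x| / 8)."""
    total = 0.0
    for k in range(terms):
        odd = 2 * k + 1
        total += (-1) ** k / odd * math.exp(-odd * odd * math.pi ** 2 * abs(x) / 8.0)
    return 2.0 / math.pi * total


def total_mass(terms=2000):
    """Term-by-term integral of the series over the real line (truncated)."""
    series = sum((-1) ** k / (2 * k + 1) ** 3 for k in range(terms))
    return 32.0 / math.pi ** 3 * series
```

The density is symmetric in $x$ and decays exponentially, with the leading term $\exp\{-\pi^2 |x| / 8\}$ dominating away from the origin.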


Environment Viewed from the Particle 


This important technique, dating back to Kozlov 
and Molchanov (1984), has proved to be quite 
efficient in the study of random motions in random 
media. The basic idea is to focus on the evolution of 
the environment viewed from the current position of 
the walk. 

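Let $\theta$ be the shift operator acting on the space of
environments $\Omega = \{\omega\}$ as follows:

$$\omega = \{p_x\} \ \overset{\theta}{\longmapsto}\ \theta\omega = \{p_{x+1}\}$$

Consider the process

$$\omega_n := \theta^{X_n} \omega, \qquad \omega_0 = \omega$$

which describes the state of the environment from
the point of view of an observer moving along with
the random walk $X_n$. One can show that $\omega_n$ is a
Markov chain (with respect to both $P_0^{\omega}$ and $\mathbf{P}_0$), with
the transition kernel

$$T(\omega, d\omega') = p_0\, \delta_{\theta\omega}(d\omega') + q_0\, \delta_{\theta^{-1}\omega}(d\omega') \tag*{[27]}$$

and the respective initial law $\delta_{\omega}$ or $\mathbb{P}$ (here
$\delta_{\omega}$ is the Dirac measure, i.e., unit mass at $\omega$).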

This fact as it stands may not seem to be of any 
practical use, since the state space of this Markov 
chain is very complex. However, the great advan- 
tage is that one can find an explicit invariant 
probability Q for the kernel T (i.e., such that 
QT=Q), which is absolutely continuous with 
respect to P. 

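More specifically, assume that $\mathbb{E}\rho_0 < 1$ and set
$Q = f(\omega)\,\mathbb{P}$, where (cf. eqn [14])

$$f = v (1 + \rho_0) \sum_{x=0}^{\infty} \prod_{j=1}^{x} \rho_j, \qquad v = \frac{1 - \mathbb{E}\rho_0}{1 + \mathbb{E}\rho_0} \tag*{[28]}$$

Using independence of $\{\rho_x\}$, we note

$$\int_{\Omega} Q(d\omega) = \mathbb{E} f = (1 - \mathbb{E}\rho_0) \sum_{x=0}^{\infty} (\mathbb{E}\rho_0)^x = 1$$

hence $Q$ is a probability measure on $\Omega$. Furthermore,
for any bounded measurable function $g$ on $\Omega$ we
have

$$QTg = \int_{\Omega} Tg(\omega)\, Q(d\omega) = \mathbb{E}\{f\, Tg\} = \mathbb{E}\{f [p_0 (g \circ \theta) + q_0 (g \circ \theta^{-1})]\} = \mathbb{E}\{g [(p_0 f) \circ \theta^{-1} + (q_0 f) \circ \theta]\} \tag*{[29]}$$

By eqn [28],

$$(p_0 f) \circ \theta^{-1} = v p_{-1} (1 + \rho_{-1}) \sum_{x=0}^{\infty} \prod_{j=1}^{x} \rho_{j-1} = v \left( 1 + \rho_0 \sum_{x=0}^{\infty} \prod_{j=1}^{x} \rho_j \right) = v + q_0 f$$

and similarly

$$(q_0 f) \circ \theta = -v + \frac{f}{1 + \rho_0} = -v + p_0 f$$

So from eqn [29] we obtain

$$QTg = \mathbb{E}(g f) = \int_{\Omega} g(\omega)\, Q(d\omega) = Qg$$

which proves the invariance of $Q$.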

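To illustrate the environment method, let us
sketch the proof of Solomon’s result on the
asymptotic velocity (see Theorem 2). Set
$d(x, \omega) := E_x^{\omega}(X_1 - X_0) = p_x - q_x$. Noting that
$d(x, \omega) = d(0, \theta^x \omega)$, define

$$D_n := \sum_{j=1}^{n} d(X_{j-1}, \omega) = \sum_{j=1}^{n} d(0, \theta^{X_{j-1}} \omega)$$

Due to the Markov property, the process
$M_n := X_n - D_n$ is a martingale with respect to the natural
filtration $\mathcal{F}_n = \sigma\{X_1, \ldots, X_n\}$ and the law
$P_0^{\omega}$,

$$E_0^{\omega}[M_{n+1} \mid \mathcal{F}_n] = M_n, \qquad P_0^{\omega}\text{-a.s.}$$

and it has bounded jumps, $|M_n - M_{n-1}| \le 2$. By
general results, this implies $M_n / n \to 0$ ($P_0^{\omega}$-a.s.).

On the other hand, by Birkhoff’s ergodic
theorem

$$\lim_{n\to\infty} \frac{D_n}{n} = \int_{\Omega} d(0, \omega)\, Q(d\omega), \qquad \mathbf{P}_0\text{-a.s.}$$

The last integral is easily evaluated to yield

$$\mathbb{E}\{(p_0 - q_0) f\} = v\, \mathbb{E}\left\{ (1 - \rho_0) \sum_{x=0}^{\infty} \prod_{j=1}^{x} \rho_j \right\} = v (1 - \mathbb{E}\rho_0) \sum_{x=0}^{\infty} (\mathbb{E}\rho_0)^x = v$$

and the first part of the formula [14] follows.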

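The case $\mathbb{E}\rho_0 \ge 1$ can be handled using a
comparison argument (Sznitman 2004). Observe
that if $p_x \le \tilde{p}_x$ for all $x$ then for the corresponding
random walks we have $X_t \le \tilde{X}_t$ ($P_0^{\omega}$-a.s.). We now
define a suitable dominating random medium by
setting (for $\gamma > 0$)

$$\tilde{p}_x := \frac{p_x + \gamma}{1 + \gamma} \ge p_x, \qquad \tilde{\rho}_x = \frac{1 - \tilde{p}_x}{\tilde{p}_x} = \frac{q_x}{p_x + \gamma}$$

Then $\mathbb{E}\tilde{\rho}_0 = \mathbb{E}\{q_0 / (p_0 + \gamma)\} < 1$ if $\gamma$ is
large enough, so by the first part of the theorem, $P_0^{\omega}$-a.s.,

$$\limsup_{n\to\infty} \frac{X_n}{n} \le \lim_{n\to\infty} \frac{\tilde{X}_n}{n} = \frac{1 - \mathbb{E}\tilde{\rho}_0}{1 + \mathbb{E}\tilde{\rho}_0} \tag*{[30]}$$

Note that $\mathbb{E}\tilde{\rho}_0$ is a continuous function of $\gamma$ with
values in $[0, \mathbb{E}\rho_0] \ni 1$, so there exists $\gamma^{*}$ such that
$\mathbb{E}\tilde{\rho}_0$ attains the value 1. Passing to the limit in
eqn [30] as $\gamma \to \gamma^{*}$, we obtain
$\limsup_{n\to\infty} X_n / n \le 0$ ($P_0^{\omega}$-a.s.). Similarly, we
get the reverse inequality, which proves the second part of the
theorem.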

A more prominent advantage of the environment
method is that it naturally leads to statements of CLT
type. A key step is to find a function
$H(x, t, \omega) = x - vt + h(x, \omega)$ (called “harmonic coordinate”)
such that the process $H(X_n, n, \omega)$ is a martingale. To this
end, by the Markov property it suffices to have

$$E_{X_n}^{\omega} H(X_{n+1}, n + 1, \omega) = H(X_n, n, \omega), \qquad \mathbf{P}_0\text{-a.s.}$$

For $\Delta(x, \omega) := h(x + 1, \omega) - h(x, \omega)$ this condition
leads to the equation

$$\Delta(x, \omega) = \rho_x \Delta(x - 1, \omega) + v - 1 + (1 + v) \rho_x$$

If $\mathbb{E}\rho_0 < 1$ (so that $v > 0$), there exists a bounded
solution

$$\Delta(x, \omega) = v - 1 + 2v \sum_{k=0}^{\infty} \prod_{i=0}^{k} \rho_{x-i}$$

and we note that $\Delta(x, \omega) = \Delta(0, \theta^x \omega)$ is a stationary
sequence with mean $\mathbb{E}\Delta(x, \omega) = 0$. Finally, setting
$h(0, \omega) = 0$ we find

$$h(x, \omega) = \begin{cases} \displaystyle\sum_{k=0}^{x-1} \Delta(k, \omega), & x > 0 \\[2ex] -\displaystyle\sum_{k=1}^{-x} \Delta(-k, \omega), & x < 0 \end{cases}$$

As a result, we have the representation

$$X_n - nv = H(X_n, n, \omega) - h(X_n, \omega) \tag*{[31]}$$

For a fixed $\omega$, one can apply a suitable CLT for
martingale differences to the martingale term in eqn
[31], while using that $X_n \sim nv$ ($\mathbf{P}_0$-a.s.), the term
$h(X_n, \omega)$ in eqn [31] is approximated by the sum
$\sum_{k=0}^{nv} \Delta(k, \omega)$, which can be handled via a CLT for
stationary sequences. This way, we arrive at the following result.

Theorem 5 Suppose that the environment is
elliptic, eqn [5], and such that $\mathbb{E}\rho_0^{2+\varepsilon} < 1$ for some
$\varepsilon > 0$ (which implies that $\mathbb{E}\rho_0 < 1$ and hence $v > 0$).
Then there exists a nonrandom $\sigma^2 > 0$ such that

$$\lim_{n\to\infty} \mathbf{P}_0\left\{ \frac{X_n - nv}{\sqrt{n\sigma^2}} \le x \right\} = \Phi(x)$$
Note that this theorem is parallel to the result by
Kesten et al. (1975) on asymptotic normality when
$\kappa > 2$ (see the section “Limit distributions”). The
moment assumptions in Theorem 5 are more 
restrictive, but they can be relaxed. On the other 
hand, Theorem 5 does not impose the nonarithmetic 
condition on the distribution of the environment 
(cf. Theorem 3). More importantly, the environment 
method proves to be quite efficient in more general 
situations, including non-i.i.d. environments and 
higher dimensions (at least in some cases, e.g., for 
random bonds RWRE and balanced RWRE dis- 
cussed subsequently). 


Diode Model 


In the preceding sections (except in the section "Limit distributions," where however we were limited to a nonarithmetic case), we assumed that $0 < p_x < 1$ and therefore excluded the situation where there are sites through which motion is permitted in one direction only. Allowing for such a possibility leads to the "diode model" (Solomon 1975). Specifically, suppose that

\[ P\{p_x = \beta\} = \alpha, \qquad P\{p_x = 1\} = 1 - \alpha \qquad [32] \]

with $0 < \alpha < 1$, $0 < \beta < 1$, so that with probability $\alpha$ a point $x \in \mathbb{Z}$ is a usual two-way site and with probability $1 - \alpha$ it is a repelling barrier ("diode"), through which passage is only possible from left to right. This is an interesting example of a statistically inhomogeneous medium, where the particle motion is strongly irreversible due to the presence of special semipenetrable nodes. The principal mathematical advantage of such a model is that the random walk can be decomposed into independent excursions from one diode to the next.

Due to diodes, the RWRE will eventually drift to $+\infty$. If $\beta > 1/2$, then on average it moves faster than in a nonrandom environment with $p_x \equiv \beta$. The situation where $\beta < 1/2$ is potentially more interesting, as then there is a competition between the local drift of the walk to the left (in ordinary sites) and the presence of repelling diodes on its way. Note that $E\rho_0 = \alpha\rho$, where $\rho := (1-\beta)/\beta$, so the condition $E\rho_0 < 1$ amounts to $\beta > \alpha/(1+\alpha)$. In this case (which includes $\beta \ge 1/2$), formula [14] for the asymptotic velocity applies.

As explained in the section "Critical exponent, excursions, and traps," the quenched mean duration $w$ of the left excursion has Laplace transform given by eqn [23], which now reads

\[ \varphi(s) = e^{-2s}\{1 - \alpha + \alpha\,\varphi(s\rho)\} \]

This equation is easily solved by iterations:

\[ \varphi(s) = (1-\alpha)\sum_{k=0}^{\infty}\alpha^k e^{-s t_k}, \qquad t_k := 2\sum_{j=0}^{k}\rho^j \qquad [33] \]

hence the distribution of $w$ is given by

\[ P\{w = t_k\} = (1-\alpha)\alpha^k, \qquad k = 0,1,\dots \]

This result has a transparent probabilistic meaning. In fact, the factor $(1-\alpha)\alpha^k$ is the probability that the nearest diode on the left of the starting point occurs at distance $k+1$, whereas $t_k$ is the corresponding mean excursion time. Note that formula [33] for $t_k$ easily follows from the recursion $t_k = 2 + \rho\,t_{k-1}$ (cf. eqn [22]) with the boundary condition $t_0 = 2$.
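The hierarchy of excursion times can be checked numerically. The sketch below assumes the recursion $t_k = 2 + \rho t_{k-1}$, $t_0 = 2$, and the functional equation $\varphi(s) = e^{-2s}\{1-\alpha+\alpha\varphi(s\rho)\}$ as read from the text; the function names and the parameter values are illustrative only, and the series is truncated at a finite number of terms.

```python
import math

def excursion_times(rho, kmax):
    """t_k from the recursion t_k = 2 + rho * t_{k-1}, starting at t_0 = 2."""
    t = [2.0]
    for _ in range(kmax):
        t.append(2.0 + rho * t[-1])
    return t

def phi(s, alpha, rho, kmax=200):
    """Truncated series (1 - alpha) * sum_k alpha^k * exp(-s * t_k)."""
    t = excursion_times(rho, kmax)
    return (1 - alpha) * sum(alpha**k * math.exp(-s * t[k]) for k in range(kmax + 1))

alpha, rho, s = 0.3, 2.0, 0.1   # illustrative values, not taken from the source
t = excursion_times(rho, 5)
closed_form = [2 * sum(rho**j for j in range(k + 1)) for k in range(6)]
lhs = phi(s, alpha, rho)
rhs = math.exp(-2 * s) * (1 - alpha + alpha * phi(s * rho, alpha, rho))
print(t, closed_form, lhs, rhs)
```

The recursion and the closed-form sum give the same list of times, and the truncated series satisfies the functional equation up to rounding.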

A self-similar hierarchy of timescales [33] indicates that the process will exhibit temporal oscillations. Indeed, for $\alpha\rho > 1$ the average waiting time until passing through a valley of ordinary sites of length $k$ is asymptotically proportional to $t_k \sim 2\rho^k$, so one may expect the annealed mean displacement $E_0X_n$ to have a local minimum at $n \approx t_k$. Passing to logarithms, we note that $\ln t_{k+1} - \ln t_k \sim \ln\rho$, which suggests the occurrence of persistent oscillations on the logarithmic timescale, with period $\ln\rho$ (see Figure 2). This was confirmed by Bernasconi and Schneider (1985) who showed that for $\alpha\rho > 1$

\[ E_0X_n \sim n^{\kappa}F(\ln n), \qquad n \to \infty \qquad [34] \]

where $\kappa = -\ln\alpha/\ln\rho < 1$ is the solution of eqn [20] and the function $F$ is periodic with period $\ln\rho$ (see Figure 2).
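As a quick arithmetic check, the exponent $\kappa$ and the period $\ln\rho$ can be evaluated for the parameter values quoted in the caption of Figure 2 ($\alpha = 0.3$, $\rho = 1/0.09$). The variable names are illustrative; `beta` is simply recovered from $\rho = (1-\beta)/\beta$.

```python
import math

alpha = 0.3            # probability of an ordinary (two-way) site
rho = 1 / 0.09         # rho = (1 - beta) / beta for ordinary sites
beta = 1 / (1 + rho)   # recovered from rho

kappa = -math.log(alpha) / math.log(rho)   # exponent in n^kappa * F(ln n)
period = math.log(rho)                     # period of F on the ln n scale
print(alpha * rho > 1, round(kappa, 3), round(period, 2), round(beta, 4))
```

With these values $\alpha\rho > 1$, so the oscillatory regime of eqn [34] applies, and the ordinary sites have a strong local drift to the left ($\beta$ well below $1/2$).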

In contrast, for $\alpha\rho = 1$ one has

\[ E_0X_n \sim \frac{n\ln\rho}{2\ln n}, \qquad n \to \infty \]

and there are no oscillations of the above kind.

These results illuminate the earlier analysis of the diode model by Solomon (1975), which in the main has revealed the following. If $\alpha\rho = 1$, then $X_n$ satisfies the strong LLN:

\[ \lim_{n\to\infty}\frac{X_n}{n/\ln n} = \frac{\ln\rho}{2}, \qquad P_0\text{-a.s.} \]

while in the case $\alpha\rho > 1$ the asymptotic behavior of $X_n$ is quite complicated and unusual: if $n_i \to \infty$ is a sequence of integers such that $\{\ln n_i\} \to \gamma$ (here $\{a\} = a - [a]$ denotes the fractional part of $a$), then the distribution of $n_i^{-\kappa}X_{n_i}$ under $P_0$ converges to a nondegenerate distribution which depends on $\gamma$.
Thus, the very existence of the limiting distribution of $X_n$ and the limit itself heavily depend on the subsequence $n_i$ chosen to approach infinity.

[Figure 2: plot with vertical axis $\ln(n^{-\kappa}E_0X_n)$.]

Figure 2 Temporal oscillations for the diode model, eqn [32]. Here $\alpha = 0.3$ and $\rho = 1/0.09$, so that $\alpha\rho > 1$ and $\kappa = 1/2$. The dots represent an average of Monte Carlo simulations over 10000 samples of the environment with a random walk of 200000 steps in each realization. The broken curve refers to the exact asymptotic solution [34]. The arrows indicate the simulated locations of the minima $t_k$, the asymptotic spacing of which is predicted to be $\ln\rho \approx 2.41$. Reproduced from Bernasconi J and Schneider WR (1982). Diffusion on a one-dimensional lattice with random asymmetric transition rates. Journal of Physics A: Mathematical and General 15: L729-L734, by permission of IOP Publishing Ltd.

This should be compared with a more "regular" result, Theorem 3. Note that almost all the conditions of this theorem are satisfied in the diode model, except that here the distribution of $\ln\rho_0$ is arithmetic (recall that the value $\ln\rho_0 = -\infty$ is permissible), so it is the discreteness of the environment distribution that does not provide enough "mixing" and hence leads to such peculiar features of the asymptotics.


Some Generalizations and Variations 


Most of the results discussed above in the simplest 
context of RWRE with nearest-neighbor jumps in an 
i.i.d. random environment have been extended to 
some other cases. One natural generalization is to 
relax the i.i.d. assumption, for example, by con- 
sidering stationary ergodic environments (see details 
in Zeitouni (2004)). In this context, one relies on an 
ergodic theorem instead of the usual strong LLN. 
For instance, this way one readily obtains an 
extension of Solomon’s criterion of transience versus 
recurrence (see Theorem 1). Other examples include 
an LLN (along with a formula for the asymptotic
velocity, cf. Theorem 2), a CLT and stable laws for
the asymptotic distribution of $X_n$ (cf. Theorem 3),
and Sinai's localization result for the recurrent
RWRE (cf. Theorem 4). Usually, however, ergodic
theorems cannot be applied directly (e.g., to
$X_n$, as the sequence $X_n - X_{n-1}$ is not stationary). In
this case, one rather uses the hitting times which 
possess the desired stationarity (cf. the sections 
“Asymptotic velocity” and “Critical exponent, 
excursions, and traps”). In some situations, in 
addition to stationarity, one needs suitable mixing 


conditions in order to ensure enough decoupling 
(e.g., in Sinai’s problem). The method of environ- 
ment viewed from the particle (discussed earlier) is 
also suited very well to dealing with stationarity. 

In the remainder of this section, we describe some other generalizations including RWRE with bounded jumps, RWRE where randomness is attached to bonds rather than sites, and continuous-time (symmetric) RWRE driven by the randomized master equation.


RWRE with Bounded Jumps 


The previous discussion was restricted to the case of RWRE with nearest-neighbor jumps. A natural extension is RWRE with bounded jumps. Let $L, R$ be fixed natural numbers, and suppose that from each site $x \in \mathbb{Z}$ jumps are only possible to the sites $x + i$, $i = -L,\dots,R$, with (random) probabilities

\[ p_x(i) \ge 0, \qquad \sum_{i=-L}^{R} p_x(i) = 1 \qquad [35] \]

We assume that the random vectors $p_x(\cdot)$ determining the environment are i.i.d. for different $x \in \mathbb{Z}$ (although many results can be extended to the stationary ergodic case).

The study of asymptotic properties of such a 
model is essentially more complex, as it involves 
products of certain random matrices and hence must 
use extensively the theory of Lyapunov exponents 
(see details and further references in Brémont 
(2004)). Lyapunov exponents, being natural analogs 
of logarithms of eigenvalues, characterize the 
asymptotic action of the product of random matrices 
along (random) principal directions, as described by 
Oseledec's multiplicative ergodic theorem. In most
situations, however, the Lyapunov spectrum can
only be accessed implicitly, which makes the
analysis rather hard.

To explain how random matrices arise here, let us first consider a particular case $R = 1$, $L \ge 1$. Assume that $p_x(-L), p_x(1) \ge \delta > 0$ for all $x \in \mathbb{Z}$ (ellipticity condition, cf. eqn [5]), and consider the hitting probabilities $u_n := P_n^\omega\{T_0 < \infty\}$, where $T_0 := \min\{t \ge 0 : X_t \le 0\}$ (cf. the section "Transience and recurrence"). By decomposing with respect to the first step, for $n \ge 1$ we obtain the difference equation

\[ u_n = p_n(1)\,u_{n+1} + \sum_{i=0}^{L} p_n(-i)\,u_{n-i} \qquad [36] \]

with the boundary conditions $u_0 = \dots = u_{-L+1} = 1$. Using that $1 = p_n(1) + \sum_{i=0}^{L} p_n(-i)$ we can rewrite eqn [36] as

\[ p_n(1)\,(u_n - u_{n+1}) = \sum_{i=1}^{L} p_n(-i)\,(u_{n-i} - u_n) \]

or, equivalently,

\[ v_n = \sum_{i=1}^{L} b_n(i)\,v_{n-i} \qquad [37] \]

where $v_i := u_i - u_{i+1}$ and

\[ b_n(i) := \frac{p_n(-i) + \dots + p_n(-L)}{p_n(1)} \qquad [38] \]

Recursion [37] can be written in a matrix form, $V_n = M_nV_{n-1}$, where $V_n := (v_n,\dots,v_{n-L+1})^{T}$,

\[ M_n := \begin{pmatrix} b_n(1) & \cdots & \cdots & b_n(L) \\ 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \end{pmatrix} \qquad [39] \]

and by iterations we get (cf. eqn [10])

\[ V_n = M_n \cdots M_1V_0, \qquad V_0 = (1 - u_1, 0, \dots, 0)^{T} \]
Note that $M_x$ depends only on the transition probability vector $p_x(\cdot)$, and hence $M_n \cdots M_1$ is the product of i.i.d. random (non-negative) matrices. By Furstenberg-Kesten's theorem, the limiting behavior of such a product, as $n \to \infty$, is controlled by the largest Lyapunov exponent

\[ \gamma_1 := \lim_{n\to\infty} n^{-1}\ln\|M_n \cdots M_1\| \qquad [40] \]

(by Kingman's subadditive ergodic theorem, the limit exists $P$-a.s. and is nonrandom). It follows that, $P_0$-a.s., the RWRE $X_n$ is transient if and only if $\gamma_1 \ne 0$, and moreover, $\lim_{n\to\infty}X_n = +\infty$ $(-\infty)$ when $\gamma_1 < 0$ $(> 0)$, whereas $\liminf_{n\to\infty}X_n = -\infty$, $\limsup_{n\to\infty}X_n = +\infty$ when $\gamma_1 = 0$.


For orientation, note that if $p_x(i) = p(i)$ are nonrandom constants, then $\gamma_1 = \ln\lambda_1$, where $\lambda_1 > 0$ is the largest eigenvalue of $M_0$, and so $\gamma_1 < 0$ if and only if $\lambda_1 < 1$. The latter means that the characteristic polynomial $\varphi(\lambda) := \det(M_0 - \lambda I)$ satisfies the condition $(-1)^L\varphi(1) > 0$. To evaluate $\det(M_0 - I)$, replace the first column by the sum of all columns and expand to get $\varphi(1) = (-1)^{L-1}(b_1 + \dots + b_L - 1)$. Substituting expressions [38] it is easy to see that the above condition amounts to $p(1) - \sum_{i=1}^{L} i\,p(-i) > 0$, that is, the mean drift of the random walk is positive and hence $X_n \to +\infty$ a.s.
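A small numerical sketch of the nonrandom case may help. It builds the coefficients $b(i)$ of eqn [38] for a fixed jump law with $R = 1$, finds the largest eigenvalue of the companion matrix by power iteration (a technique chosen here for illustration, not taken from the source), and compares the sign of $\gamma_1 = \ln\lambda_1$ with the sign of the mean drift. The function names and the sample jump laws are hypothetical.

```python
import math

def b_coeffs(p_left, p_right):
    """b(i) = (p(-i) + ... + p(-L)) / p(1) for i = 1..L, with p_left[i-1] = p(-i)."""
    L = len(p_left)
    return [sum(p_left[i:]) / p_right for i in range(L)]

def largest_eigenvalue(b, iters=5000):
    """Power iteration on the companion matrix whose first row is b (all b > 0)."""
    v = [1.0] * len(b)
    lam = 1.0
    for _ in range(iters):
        top = sum(bi * vi for bi, vi in zip(b, v))
        w = [top] + v[:-1]            # rows 2..L shift the vector down by one
        lam = max(w)                  # max-norm scaling factor converges to lambda_1
        v = [x / lam for x in w]
    return lam

def mean_drift(p_left, p_right):
    """p(1) - sum_i i * p(-i): mean displacement per step."""
    return p_right - sum((i + 1) * p for i, p in enumerate(p_left))

# Hypothetical environment with L = 2: p(1) = 0.5, p(-1) = 0.2, p(-2) = 0.1, p(0) = 0.2.
p_right, p_left = 0.5, [0.2, 0.1]
b = b_coeffs(p_left, p_right)
lam1 = largest_eigenvalue(b)
gamma1 = math.log(lam1)
print(b, lam1, gamma1, mean_drift(p_left, p_right))
```

Here $\lambda_1 < 1$ and the drift is positive, in line with the criterion above; making the leftward jumps heavier flips both signs together.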

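In the general case, $L \ge 1$, $R \ge 1$, similar considerations lead to the following matrices of order $d := L + R - 1$ (cf. eqn [39]):

\[ M_n = \begin{pmatrix} a_n(R-1) & \cdots & a_n(1) & b_n(1) & \cdots & b_n(L) \\ 1 & 0 & \cdots & \cdots & \cdots & 0 \\ 0 & 1 & 0 & \cdots & \cdots & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & \cdots & \cdots & 0 & 1 & 0 \end{pmatrix} \]

where $b_n(i)$ are given by eqn [38] (with $p_n(R)$ replacing $p_n(1)$ in the denominator) and

\[ a_n(i) := -\frac{p_n(i) + \dots + p_n(R)}{p_n(R)} \]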

Suppose that the ellipticity condition is satisfied in the form $p_n(i) \ge \delta > 0$, $i \ne 0$, $-L \le i \le R$, and let $\gamma_1 \ge \gamma_2 \ge \dots \ge \gamma_d$ be the (nonrandom) Lyapunov exponents of $\{M_n\}$. The largest exponent $\gamma_1$ is again given by eqn [40], while other exponents are determined recursively from the equalities

\[ \gamma_1 + \dots + \gamma_k = \lim_{n\to\infty} n^{-1}\ln\|\wedge^k(M_n \cdots M_1)\| \]

($1 \le k \le d$). Here $\wedge$ denotes the external (antisymmetric) product: $x \wedge y = -y \wedge x$ ($x, y \in \mathbb{R}^d$), and $\wedge^kM$ acts on the external product space $\wedge^k\mathbb{R}^d$, generated by the canonical basis $\{e_{i_1} \wedge \dots \wedge e_{i_k},\ 1 \le i_1 < \dots < i_k \le d\}$, as follows:

\[ \wedge^kM(x_1 \wedge \dots \wedge x_k) := M(x_1) \wedge \dots \wedge M(x_k) \]

One can show that all exponents except $\gamma_R$ are sign-definite: $\gamma_{R-1} > 0 > \gamma_{R+1}$. Moreover, it is the sign of $\gamma_R$ that determines whether the RWRE is transient or recurrent, the dichotomy being the same as in the case $R = 1$ above (with $\gamma_1$ replaced by $\gamma_R$). Let us also mention that an LLN and CLT can be proved here (see Brémont (2004)).

In conclusion, let us point out an alternative approach due to Bolthausen and Goldsheid (2000) who studied a more general RWRE on a strip $\mathbb{Z} \times \{0,1,\dots,m-1\}$. The link between these two models is given by the representation $X_n = mY_n + Z_n$, where $m := \max\{L,R\}$, $Y_n \in \mathbb{Z}$, $Z_n \in \{0,\dots,m-1\}$. Random matrices arising here are constructed indirectly using an auxiliary stationary sequence. Even though these matrices are nonindependent, thanks to their positivity the criterion of transience can be given in terms of the sign of the largest Lyapunov exponent, which is usually much easier to deal with. An additional attractive feature of this approach is that the condition $p_x(R) > 0$ ($P$-a.s.), which was essential for the previous technique, can be replaced with a more natural condition $P\{p_x(R) > 0\} > 0$.


Random Bonds RWRE 


Instead of having random probabilities of jumps at each site, one could assign random weights to bonds between the sites. For instance, the transition probabilities $p_x = p(x,x+1,\omega)$ can be defined by

\[ p_x = \frac{c_{x,x+1}}{c_{x-1,x} + c_{x,x+1}} \qquad [41] \]

where $c_{x,x+1} > 0$ are i.i.d. random variables on the environment space $\Omega$.

The difference between the two models may not seem very prominent, but the behavior of the walk in the modified model [41] appears to be quite different. Indeed, working as in the section "Transience and recurrence," we note that

\[ \rho_x = \frac{q_x}{p_x} = \frac{c_{x-1,x}}{c_{x,x+1}} \]

hence, exploiting formulas [11] and [41], we obtain, $P$-a.s.,

\[ \frac{1 - u_n}{1 - u_1} = \sum_{x=0}^{n-1}\frac{c_{01}}{c_{x,x+1}} \sim c_{01}\,n\,Ec_{01}^{-1} \to \infty \qquad [42] \]

since $Ec_{01}^{-1} > 0$. Therefore, $f_{00} = 1$, that is, the random walk is recurrent ($P_0$-a.s.).

The method of environment viewed from the particle can also be applied here (see Sznitman (2004)). Similarly to the section "Environment viewed from the particle," we define a new probability measure $Q = f(\omega)\,P$ using the density

\[ f(\omega) = Z^{-1}\bigl(c_{-1,0}(\omega) + c_{01}(\omega)\bigr) \]

where $Z = 2Ec_{01}$ is the normalizing constant (we assume that $Ec_{01} < \infty$). One can check that $Q$ is invariant with respect to the transition kernel eqn [41], and by similar arguments as in that section, we obtain that $\lim_{n\to\infty}X_n/n$ exists ($P_0$-a.s.) and is given by

\[ \int_\Omega d(0,\omega)\,Q(d\omega) = Z^{-1}E[c_{01} - c_{-1,0}] = 0 \]

so the asymptotic velocity vanishes.

Furthermore, under suitable technical conditions on the environment (e.g., $c_{01}$ being bounded away from 0 and $\infty$, cf. eqn [5]), one can prove the following CLT:

\[ \lim_{n\to\infty} P_0\left\{\frac{X_n}{\sqrt{n\sigma^2}} \le x\right\} = \Phi(x) \qquad [43] \]

where $\sigma^2 = (Ec_{01}\cdot Ec_{01}^{-1})^{-1}$. Note that $\sigma^2 \le 1$ (with a strict inequality if $c_{01}$ is not reduced to a constant), which indicates some slowdown in the spatial spread of the random bonds RWRE, as compared to the ordinary symmetric random walk.
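The inequality $\sigma^2 \le 1$ is easy to see on a small example. The sketch below evaluates $\sigma^2 = (Ec_{01}\cdot Ec_{01}^{-1})^{-1}$ for a discrete law of the bond weight; the two-point distribution and the function name are hypothetical choices made only for illustration.

```python
def sigma2(values, probs):
    """sigma^2 = (E c * E c^{-1})^{-1} for a discrete bond-weight distribution."""
    mean_c = sum(p * c for c, p in zip(values, probs))
    mean_inv = sum(p / c for c, p in zip(values, probs))
    return 1.0 / (mean_c * mean_inv)

# Hypothetical two-point law: c = 1 or c = 4 with equal probability.
s2_random = sigma2([1.0, 4.0], [0.5, 0.5])   # E c = 2.5, E c^{-1} = 0.625
s2_const = sigma2([2.0], [1.0])              # nonrandom weights
print(s2_random, s2_const)
```

For the two-point law the variance drops to 0.64, while constant weights give exactly 1, the value for the ordinary symmetric random walk.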

Thus, there is a dramatic distinction between the random bonds RWRE, which is recurrent and diffusive, and the random sites RWRE, with a much more complex asymptotics including both transient and recurrent scenarios, slowdown effects, and subdiffusive behavior. This can be explained heuristically by noting that the random bonds RWRE is reversible, that is, $m(x)\,p(x,y) = m(y)\,p(y,x)$ for all $x,y \in \mathbb{Z}$, with $m(x) := c_{x-1,x} + c_{x,x+1}$ (this property also easily extends to multidimensional versions). Hence, it appears impossible to create extended traps which would retain the particle for a very long time. Instead, the mechanism of the diffusive slowdown in a reversible case is associated with the natural variability of the environment resulting in the occasional occurrence of isolated "screening" bonds with an anomalously small weight $c_{x,x+1}$.

Let us point out that the RWRE determined by eqn [41] can be interpreted in terms of the random conductivity model (see Hughes (1996)). Suppose that each random variable $c_{x,x+1}$ attached to the bond $(x,x+1)$ has the meaning of the conductance of this bond (the reciprocal, $c_{x,x+1}^{-1}$, being its resistance). If a voltage drop $V$ is applied across the system of $N$ successive bonds, say from 0 to $N$, then the same current $I$ flows in each of the conductors and by Ohm's law we have $I = c_{x,x+1}V_{x,x+1}$, where $V_{x,x+1}$ is the voltage drop across the corresponding bond. Hence

\[ V = \sum_{x=0}^{N} V_{x,x+1} = I\sum_{x=0}^{N} c_{x,x+1}^{-1} \]

which amounts to saying that the total resistance of the system of consecutive elements is given by the sum of the individual resistances. The effective conductivity of the finite system, $\bar c_N$, is defined as the average conductance per bond, so that

\[ \bar c_N^{-1} = \frac{1}{N}\sum_{x=0}^{N} c_{x,x+1}^{-1} \]

and by the strong LLN, $\bar c_N^{-1} \to Ec_{01}^{-1}$ as $N \to \infty$ ($P$-a.s.). Therefore, the effective conductivity of the infinite system is given by $\bar c = (Ec_{01}^{-1})^{-1}$, and we note that $\bar c < Ec_{01}$ if the random medium is nondegenerate. Returning to the random bonds RWRE, eqn [41], it is easy to see that a site $x$ is recurrent if and only if the conductance $c_{x,\infty}$ between $x$ and $\infty$ equals zero. Using again Ohm's law, we have (cf. eqn [42])

\[ c_{x,\infty}^{-1} = \sum_{y=x}^{\infty} c_{y,y+1}^{-1} = \infty, \qquad P\text{-a.s.} \]

and we recover the result about recurrence.
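The series-circuit picture can be made concrete with a few lines of arithmetic. The sketch below takes a short chain of bonds with hypothetical conductances, adds the resistances, and compares the resulting effective conductivity (a harmonic mean) with the arithmetic mean; the names and numbers are illustrative only.

```python
def total_resistance(conductances):
    """Resistances of bonds in series add up: R = sum of 1/c."""
    return sum(1.0 / c for c in conductances)

def effective_conductivity(conductances):
    """Average conductance per bond: inverse of the mean resistance."""
    return len(conductances) / total_resistance(conductances)

bonds = [1.0, 2.0, 4.0, 4.0]            # hypothetical conductances c_{x,x+1}
R = total_resistance(bonds)             # 1 + 0.5 + 0.25 + 0.25
c_eff = effective_conductivity(bonds)   # harmonic mean of the conductances
c_arith = sum(bonds) / len(bonds)       # arithmetic mean, for comparison
V = 6.0                                 # applied voltage drop
I = V / R                               # the same current flows through every bond
drops = [I / c for c in bonds]          # Ohm's law on each bond
print(R, c_eff, c_arith, I, drops)
```

The individual voltage drops add up to the applied voltage, and the effective conductivity is strictly smaller than the arithmetic mean because the weakest bond dominates the total resistance.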


Continuous-Time RWRE 


As in the discrete-time case, a random walk on $\mathbb{Z}$ with continuous time is a homogeneous Markov chain $X_t$, $t \in [0,\infty)$, with state space $\mathbb{Z}$ and nearest-neighbor (or at least bounded) jumps. The term "Markov" as usual refers to the "lack of memory" property, which amounts to saying that from the entire history of the process development up to a given time, only the current position of the walk is important for the future evolution while all other information is irrelevant. Since there is no smallest time unit as in the discrete-time case, it is convenient to describe transitions of $X_t$ in terms of transition rates characterizing the likelihood of various jumps during a very short time. More precisely, if $p_{xy}(t) := P\{X_t = y \mid X_0 = x\}$ are the transition probabilities over time $t$, then for $h \to 0$

\[ \begin{aligned} p_{xy}(h) &= c_{xy}h + o(h) \qquad (x \ne y) \\ p_{xx}(h) &= 1 - h\sum_{y \ne x} c_{xy} + o(h) \end{aligned} \qquad [44] \]

Equations for the functions $p_{xy}(t)$ can then be derived by adapting the method of decomposition commonly used for discrete-time Markov chains (cf. the section "Transience and recurrence"). Here it is more convenient to decompose with respect to the "last" step, that is, by considering all possible transitions during a small increment of time at the end of the time interval $[0, t+h]$. Using the Markov property and eqn [44] we can write

\[ p_{0x}(t+h) = h\sum_{y \ne x} p_{0y}(t)\,c_{yx} + p_{0x}(t)\Bigl(1 - h\sum_{y \ne x} c_{xy}\Bigr) + o(h) \]

which in the limit $h \to 0$ yields the master equation (or Chapman-Kolmogorov's forward equation)

\[ \frac{d}{dt}p_{0x}(t) = \sum_{y \ne x}\{c_{yx}\,p_{0y}(t) - c_{xy}\,p_{0x}(t)\}, \qquad p_{0x}(0) = \delta_0(x) \qquad [45] \]

where $\delta_0(x)$ is the Kronecker symbol.
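To see the master equation at work, one can integrate it numerically. The following sketch is an illustration under simplifying assumptions that are not part of the source: the lattice is cut down to a finite segment with no jumps across its ends, the rates are symmetric nearest-neighbor with an arbitrary fixed pattern, and time stepping uses the explicit Euler scheme suggested by the "last step" decomposition.

```python
def master_step(p, rates, h):
    """One explicit Euler step of dp_x/dt = sum_y (c_yx p_y - c_xy p_x)
    on sites 0..n-1 with symmetric nearest-neighbor rates, rates[x] = c_{x,x+1}.
    No jumps leave the segment (reflecting ends)."""
    new = p[:]
    for x in range(len(p) - 1):
        flow = h * rates[x] * (p[x] - p[x + 1])   # net flow across bond (x, x+1)
        new[x] -= flow
        new[x + 1] += flow
    return new

n = 41
rates = [1.0 + 0.5 * ((7 * x) % 3) for x in range(n - 1)]  # arbitrary fixed rates
p = [0.0] * n
p[n // 2] = 1.0                                            # start at the middle site
h, steps = 0.01, 500                                       # integrate up to t = 5
for _ in range(steps):
    p = master_step(p, rates, h)
print(sum(p), min(p), p[n // 2])
```

Writing each update as a flow across a bond makes conservation of total probability automatic, and the small step `h` keeps all probabilities non-negative.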

Continuous-time RWRE are therefore naturally described via the randomized master equation, that is, with random transition rates. The canonical example, originally motivated by Dyson's study of the chain of harmonic oscillators with random couplings, is a symmetric nearest-neighbor RWRE, where the random transition rates $c_{xy}$ are nonzero only for $y = x \pm 1$ and satisfy the condition $c_{x,x+1} = c_{x+1,x}$, otherwise being i.i.d. (see Alexander et al. (1981)). In this case, the problem [45] can be formally solved using the Laplace transform, leading to the equations

\[ s + G_0^{+} + G_0^{-} = [\hat p_{00}(s)]^{-1} \qquad [46] \]

\[ s + G_x^{-} + G_x^{+} = 0 \quad (x \ne 0) \qquad [47] \]

where $G_x^{-}$, $G_x^{+}$ are defined as

\[ G_x^{\pm} := c_{x,x\pm1}\,\frac{\hat p_{0x}(s) - \hat p_{0,x\pm1}(s)}{\hat p_{0x}(s)} \qquad [48] \]

and $\hat p_{0x}(s) := \int_0^\infty p_{0x}(t)\,e^{-st}\,dt$. From eqns [47] and [48] one obtains the recursion

\[ G_x^{\pm} = \left(\frac{1}{c_{x,x\pm1}} + \frac{1}{s + G_{x\pm1}^{\pm}}\right)^{-1}, \qquad x = 0, \pm1, \pm2, \dots \qquad [49] \]

The quantities $G_x^{\pm}$ are therefore expressed as infinite continued fractions depending on $s$ and the random variables $c_{x,x\pm1}, c_{x\pm1,x\pm2}, \dots$. The function $\hat p_{00}(s)$ can then be found from eqn [46].
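The continued-fraction structure is easiest to see in the nonrandom special case. The sketch below assumes the recursion has the form $G = (1/c + 1/(s+G))^{-1}$ with all rates equal to a constant $c$, iterates it to a fixed point, and compares the resulting $\hat p_{00}(s) = (s + 2G)^{-1}$ with the closed form $(s^2 + 4cs)^{-1/2}$ obtained by solving the fixed-point equation by hand. The function name and the parameter values are illustrative.

```python
import math

def G_constant(c, s, depth=200):
    """Iterate G <- (1/c + 1/(s + G))^{-1}: the recursion with all rates equal to c."""
    G = 0.0
    for _ in range(depth):
        G = 1.0 / (1.0 / c + 1.0 / (s + G))
    return G

c, s = 1.0, 0.05
G = G_constant(c, s)
p00_hat = 1.0 / (s + 2.0 * G)                   # eqn [46] with G_0^+ = G_0^- = G
closed = 1.0 / math.sqrt(s * s + 4.0 * c * s)   # fixed point solved by hand
small_s = 1.0 / math.sqrt(4.0 * c * s)          # leading behaviour as s -> 0
print(G, p00_hat, closed, small_s)
```

For small $s$ the closed form is close to $(4cs)^{-1/2}$, which is the shape of the small-$s$ asymptotics one expects for a symmetric walk with constant rates.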

In its generality, the problem is far too hard, and we shall only comment on how one can evaluate the annealed mean

\[ E\hat p_{00}(s) = E(s + G_0^{+} + G_0^{-})^{-1} \]

According to eqn [49], the random variables $G_0^{+}$, $G_0^{-}$ are determined by the same algebraic formula, but involve the rate coefficients from different sides of site 0, and hence are i.i.d. Furthermore, eqn [49] implies that the random variables $G_0^{+}$, $G_1^{+}$ have the same distribution and, moreover, $G_1^{+}$ and $c_{01}$ are independent. Therefore, eqn [49] may be used as an integral equation for the unknown density function of $G_0^{+}$. It can be proved that the suitable solution exists and is unique, and although an explicit solution is not available, one can obtain the asymptotics for small values of $s$, thereby rendering information about the behavior of $p_{00}(t)$ for large $t$. More specifically, one can show that if $c_* := (Ec_{01}^{-1})^{-1} > 0$, then

\[ E\hat p_{00}(s) \sim (4c_*s)^{-1/2}, \qquad s \to 0 \]

and so by a Tauberian theorem

\[ Ep_{00}(t) \sim (4\pi c_*t)^{-1/2}, \qquad t \to \infty \qquad [50] \]

Note that asymptotics [50] appears to be the same as for an ordinary symmetric random walk with constant transition rates $c_{x,x+1} = c_{x+1,x} = c_*$, suggesting that the latter provides an EMA for the RWRE considered above.

This is further confirmed by the asymptotic calculation of the annealed mean square displacement, $E_0X_t^2 \sim 2c_*t$ as $t \to \infty$ (Alexander et al. 1981). Moreover, Kawazu and Kesten (1984) proved that $X_t$ is asymptotically normal:

\[ \lim_{t\to\infty} P_0\left\{\frac{X_t}{\sqrt{2c_*t}} \le x\right\} = \Phi(x) \qquad [51] \]

Therefore, if $c_* > 0$, then the RWRE has the same diffusive behavior as the corresponding ordered system, with a well-defined diffusion constant $D = c_*$.
In the case where $c_* = 0$ (i.e., $Ec_{01}^{-1} = \infty$), one may expect that the RWRE exhibits subdiffusive behavior. For example, if the density function of the transition rates is modeled by

\[ f_\alpha(u) = (1-\alpha)\,u^{-\alpha}\,1_{\{0<u<1\}} \qquad (0 < \alpha < 1) \]

then, as shown by Alexander et al. (1981),

\[ Ep_{00}(t) \sim C_\alpha\,t^{-(1-\alpha)/(2-\alpha)}, \qquad E_0X_t^2 \sim C'_\alpha\,t^{2(1-\alpha)/(2-\alpha)} \]

In fact, Kawazu and Kesten (1984) proved that in this case $t^{-(1-\alpha)/(2-\alpha)}X_t$ has a (non-Gaussian) limit distribution as $t \to \infty$.

To conclude the discussion of the continuous-time case, let us point out that some useful information about recurrence of $X_t$ can be obtained by considering an imbedded (discrete-time) random walk $X_n$, defined as the position of $X_t$ after $n$ jumps. Note that continuous-time Markov chains admit an alternative description of their evolution in terms of sojourn times and the distribution of transitions at a jump. Namely, if the environment $\omega$ is fixed, then the random sojourn time of $X_t$ in each state $x$ is exponentially distributed with mean $1/c_x$, where $c_x := \sum_{y \ne x} c_{xy}$, while the distribution of transitions from $x$ is given by the probabilities $p_{xy} = c_{xy}/c_x$. For the symmetric nearest-neighbor RWRE considered above, the transition probabilities of the imbedded random walk are given by

\[ p_x := p_{x,x+1} = \frac{c_{x,x+1}}{c_{x-1,x} + c_{x,x+1}}, \qquad q_x := p_{x,x-1} = 1 - p_x \]

and we recognize here the transition law of a random walk in the random bonds environment considered in the previous subsection (cf. eqn [41]). Recurrence and zero asymptotic velocity established there are consistent with the results discussed in the present section (e.g., note that the CLT for both $X_n$, eqn [43], and $X_t$, eqn [51], does not involve any centering). Let us point out, however, that a "naive" discretization of time using the mean sojourn time appears to be incorrect, as this would lead to the scaling $t = n\delta_1$, with $\delta_1 := E(c_{-1,0} + c_{01})^{-1}$, while from comparing the limit theorems in these two cases, one can conclude that the true value of the effective discretization step is given by $\delta_* := (2c_*)^{-1} = (1/2)Ec_{01}^{-1}$. In fact, by the arithmetic-harmonic mean inequality we have $\delta_* \ge \delta_1$, which is a manifestation of the RWRE's diffusive slowdown.
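The comparison between the two candidate time steps can be illustrated by exact enumeration for a discrete law of the rates. The sketch below computes $\delta_1 = E(c_{-1,0} + c_{01})^{-1}$ and $\delta_* = (1/2)Ec_{01}^{-1}$ for two independent bonds; the two-point distribution and the function name are hypothetical.

```python
from itertools import product

def steps(values, probs):
    """Return (delta_1, delta_star) for i.i.d. bond rates with a discrete law:
    delta_1 = E (c_{-1,0} + c_{01})^{-1},  delta_star = (1/2) E c_{01}^{-1}."""
    law = list(zip(values, probs))
    delta_1 = sum(pa * pb / (a + b) for (a, pa), (b, pb) in product(law, law))
    delta_star = 0.5 * sum(p / c for c, p in law)
    return delta_1, delta_star

d1, dstar = steps([1.0, 4.0], [0.5, 0.5])     # hypothetical two-point law
d1_const, dstar_const = steps([2.0], [1.0])   # nonrandom rates: the two coincide
print(d1, dstar, d1_const, dstar_const)
```

For the two-point law this gives $\delta_1 = 0.25625$ against $\delta_* = 0.3125$, while for constant rates both equal $1/4$, in line with the arithmetic-harmonic mean inequality quoted above.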


RWRE in Higher Dimensions 


Multidimensional RWRE with nearest-neighbor jumps are defined in a similar fashion: from site $x \in \mathbb{Z}^d$ the random walk can jump to one of the $2d$ adjacent sites $x + e \in \mathbb{Z}^d$ (such that $|e| = 1$), with probabilities $p_x(e) \ge 0$, $\sum_{|e|=1} p_x(e) = 1$, where the random vectors $p_x(\cdot)$ are assumed to be i.i.d. for different $x \in \mathbb{Z}^d$. As usual, we will also impose the condition of uniform ellipticity:

\[ p_x(e) \ge \delta > 0, \qquad |e| = 1,\ x \in \mathbb{Z}^d, \qquad P\text{-a.s.} \qquad [52] \]

In contrast to the one-dimensional case, the theory of RWRE in higher dimensions is far from maturity. Possible asymptotic behaviors of the RWRE for $d \ge 2$ are not understood well enough, and many basic questions remain open. For instance, no definitive classification of the RWRE is available regarding transience and recurrence. Similarly, LLN and CLT have been proved only for a limited number of specific models, while no general sharp results have been obtained. On a more positive note, there has been considerable progress in recent years in the so-called ballistic case, where powerful techniques have been developed (see Sznitman (2002, 2004) and Zeitouni (2003, 2004)). Unfortunately, not much is known for nonballistic RWRE, apart from special cases of balanced RWRE in $d \ge 2$ (Lawler 1982), small isotropic perturbations of ordinary symmetric random walks in $d \ge 3$ (Bricmont and Kupiainen 1991), and some examples based on combining components of ordinary random walks and RWRE in $d \ge 7$ (Bolthausen et al. 2003). In particular, there are no examples of subdiffusive behavior in any dimension $d \ge 2$, and in fact it is largely believed that a CLT is always true in any uniformly elliptic, i.i.d. random environment in dimensions $d \ge 3$, with somewhat less certainty about $d = 2$. A heuristic explanation for such a striking difference with the case $d = 1$ is that due to a less restricted topology of space in higher dimensions, it is much harder to force the random walk to visit traps, and hence the slowdown is not so pronounced.

In what follows, we give a brief account of some of the known results and methods in this fast-developing area (for further information and specific references, see an extensive review by Zeitouni (2004)).


Zero-One Laws and LLNs 


A natural first step in a multidimensional context is to explore the behavior of the random walk $X_n$ as projected on various one-dimensional straight lines. Let us fix a test unit vector $\ell \in \mathbb{R}^d$, and consider the process $Z_n^\ell := X_n \cdot \ell$. Then for the events $A_{\pm\ell} := \{\lim_{n\to\infty} Z_n^\ell = \pm\infty\}$ one can show that

\[ P_0(A_\ell \cup A_{-\ell}) \in \{0,1\} \qquad [53] \]

That is to say, for each $\ell$ the probability that the random walk escapes to infinity in the direction $\ell$ is either 0 or 1.

Let us sketch the proof. We say that $\tau$ is "record time" if $|Z_\tau^\ell| > |Z_k^\ell|$ for all $k < \tau$, and "regeneration time" if in addition $|Z_\tau^\ell| \le |Z_n^\ell|$ for all $n \ge \tau$. Note that by the ellipticity condition [52], $\limsup_{n\to\infty}|Z_n^\ell| = \infty$ ($P_0$-a.s.), hence there is an infinite sequence of record times $0 = \tau_0 < \tau_1 < \tau_2 < \cdots$. If $P_0(A_\ell \cup A_{-\ell}) > 0$, we can pick a subsequence of record times $\tau_i'$, each of which has a positive $P_0$-probability to be a regeneration time (because otherwise $|Z_n^\ell|$ would persistently backtrack towards the origin and the event $A_\ell \cup A_{-\ell}$ could not occur). Since the trials for different record times are independent, it follows that a regeneration time $\tau^*$ occurs $P_0$-a.s. Repeating this argument, we conclude that there exists an infinite sequence of regeneration times $\tau_i^*$, which implies that $|Z_n^\ell| \to \infty$ ($P_0$-a.s.), that is, $P_0(A_\ell \cup A_{-\ell}) = 1$.

Regeneration structure introduced by the sequence $\{\tau_i^*\}$ plays a key role in further analysis of the RWRE and is particularly useful for proving an LLN and a CLT, due to the fact that pieces of the random walk between consecutive regeneration times (and fragments of the random environment involved thereby) are independent and identically distributed (at least starting from $\tau_1^*$). In this vein, one can prove a "directional" version of the LLN, stating that for each $\ell$ there exist deterministic $v_\ell$, $v_{-\ell}$ (possibly zero) such that

\[ \lim_{n\to\infty}\frac{Z_n^\ell}{n} = v_\ell\,1_{A_\ell} + v_{-\ell}\,1_{A_{-\ell}}, \qquad P_0\text{-a.s.} \qquad [54] \]

Note that if $P_0(A_\ell) \in \{0,1\}$, then eqn [54] in conjunction with eqn [53] would readily imply

\[ \lim_{n\to\infty}\frac{Z_n^\ell}{n} = v_\ell, \qquad P_0\text{-a.s.} \qquad [55] \]

Moreover, if $P_0(A_\ell) \in \{0,1\}$ for any $\ell$, then there exists a deterministic $v$ (possibly zero) such that

\[ \lim_{n\to\infty}\frac{X_n}{n} = v, \qquad P_0\text{-a.s.} \qquad [56] \]

Therefore, it is natural to ask if a zero-one law [53] can be enhanced to that for the individual probabilities $P_0(A_\ell)$. It is known that the answer is affirmative for i.i.d. environments in $d = 2$, where indeed $P_0(A_\ell) \in \{0,1\}$ for any $\ell$, with counterexamples in certain stationary ergodic (but not uniformly elliptic) environments. However, in the case $d \ge 3$ this is an open problem.


Kalikow’s Condition and Sznitman’s Condition (T’) 


An RWRE is called "ballistic" (ballistic in direction $\ell$) if $v \ne 0$ ($v_\ell \ne 0$), see eqns [55] and [56]. In this section, we describe conditions on the random environment which ensure that the RWRE is ballistic.

Let $U$ be a connected strict subset of $\mathbb{Z}^d$ containing the origin. For $x \in U$, denote by

\[ g(x,\omega) := E_0^\omega\sum_{n=0}^{T_U} 1_{\{X_n = x\}} \]

the quenched mean number of visits to $x$ prior to the exit time $T_U := \min\{n \ge 0 : X_n \notin U\}$. Consider an auxiliary Markov chain $\hat X_n$, which starts from 0, makes nearest-neighbor jumps while in $U$, with (nonrandom) probabilities

\[ \hat p_x(e) = \frac{E[g(x,\omega)\,p_x(e)]}{E[g(x,\omega)]}, \qquad x \in U \qquad [57] \]

and is absorbed as soon as it first leaves $U$. Note that the expectations in eqn [57] are finite; indeed, if $\alpha_x$ is the probability to return to $x$ before leaving $U$, then, by the Markov property, the mean number of returns is given by

\[ \sum_{k=1}^{\infty} k\,\alpha_x^k\,(1-\alpha_x) = \frac{\alpha_x}{1-\alpha_x} < \infty \]

since, due to ellipticity, $\alpha_x < 1$.
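For completeness, the mean number of returns can be evaluated by differentiating the geometric series term by term; here $\alpha_x$ stands for the return probability, assumed to satisfy $0 \le \alpha_x < 1$.

```latex
\[
\sum_{k=1}^{\infty} k\,\alpha_x^{k}\,(1-\alpha_x)
  = (1-\alpha_x)\,\alpha_x\,\frac{d}{d\alpha_x}\sum_{k=0}^{\infty}\alpha_x^{k}
  = (1-\alpha_x)\,\alpha_x\,\frac{1}{(1-\alpha_x)^{2}}
  = \frac{\alpha_x}{1-\alpha_x} .
\]
```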

An important property, highlighting the usefulness of $\hat X_n$, is that if $\hat X_n$ leaves $U$ with probability 1, then the same is true for the original RWRE $X_n$ (under the annealed law $P_0$), and moreover, the exit points $\hat X_{\hat T_U}$ and $X_{T_U}$ have the same distribution laws.

Let $\ell \in \mathbb{R}^d$, $|\ell| = 1$. One says that Kalikow's condition with respect to $\ell$ holds if the local drift of $\hat X_n$ in the direction $\ell$ is uniformly bounded away from zero:

\[ \inf_{U,\,x\in U}\ \sum_{|e|=1}(e\cdot\ell)\,\hat p_x(e) > 0 \qquad [58] \]

A sufficient condition for [58] is, for example, that for some $\kappa > 0$

\[ E[(d(0,\omega)\cdot\ell)_+] > \kappa\,E[(d(0,\omega)\cdot\ell)_-] \]

where $d(0,\omega) = E_0^\omega X_1$ and $u_\pm := \max\{\pm u, 0\}$.

A natural implication of Kalikow's condition [58] is that $P_0(A_\ell) = 1$ and $v_\ell > 0$ (see eqn [55]). Moreover, noting that eqn [58] also holds for all $\ell'$ in a vicinity of $\ell$ and applying the above result with $d$ noncollinear vectors from that vicinity, we conclude that under Kalikow's condition there exists a deterministic $v \ne 0$ such that $X_n/n \to v$ as $n \to \infty$ ($P_0$-a.s.). Furthermore, it can be proved that $(X_n - nv)/\sqrt{n}$ converges in law to a Gaussian distribution (see Sznitman (2004)).

It is not hard to check that in dimension $d = 1$ Kalikow's condition is equivalent to $v \ne 0$ and therefore characterizes completely all ballistic walks. For $d \ge 2$, the situation is less clear; for instance, it is not known if there exist RWRE with $P_0(A_\ell) > 0$ and $v_\ell = 0$ (of course, such RWRE cannot satisfy Kalikow's condition).

Sznitman (2004) has proposed a more complicated transience condition (T') involving certain regeneration times $\tau_i^*$ similar to those described in the previous subsection. An RWRE is said to satisfy Sznitman's condition (T') relative to direction $\ell$ if $P_0(A_\ell) = 1$ and for some $c > 0$ and all $0 < \gamma < 1$

\[ E_0\exp\Bigl(c\sup_{n \le \tau_1^*}|X_n|^{\gamma}\Bigr) < \infty \qquad [59] \]

This condition provides a powerful control over $\tau_1^*$ for $d \ge 2$ and in particular ensures that $\tau_1^*$ has finite moments of any order. This is in sharp contrast with the one-dimensional case, and should be viewed as a reflection of much weaker traps in dimensions $d \ge 2$. Condition [59] can also be reformulated in terms of the exit distribution of the RWRE from infinite thick slabs "orthonormal" to directions $\ell'$ sufficiently close to $\ell$. As it stands, the latter reformulation is difficult to check, but Sznitman (2004) has developed a remarkable "effective" criterion reducing the job to a similar condition in finite boxes, which is much more tractable and can be checked in a number of cases.

In fact, condition (T') follows from Kalikow's condition, but not the other way around. In the one-dimensional case, condition (T') (applied to $\ell = 1$ and $\ell = -1$) proves to be equivalent to the transient behavior of the RWRE, which, as we have seen in Theorem 2, may happen with $v = 0$, that is, in a nonballistic scenario. The situation in $d \ge 2$ is quite different, as condition (T') implies that the RWRE is ballistic in the direction $\ell$ (with $v_\ell > 0$) and satisfies a CLT (under $P_0$). It is not known whether the ballistic behavior for $d \ge 2$ is completely characterized by condition (T'), although this is expected to be true.


Balanced RWRE 


In this section we discuss a particular case of
nonballistic RWRE, for which LLN and CLT can
be proved. Following Lawler (1982), we say that an
RWRE is “balanced” if $p_x(e) = p_x(-e)$ for all
$x \in \mathbb{Z}^d$, $|e| = 1$ ($\mathbb{P}$-a.s.). In this case, the local drift
vanishes, $d(x,\omega) = 0$, hence the coordinate processes
$X_n^i$ ($i = 1,\dots,d$) are martingales with respect to the
natural filtration $\mathcal{F}_n = \sigma\{X_0,\dots,X_n\}$. The quenched
covariance matrix of the increments
$\Delta X_n^i := X_{n+1}^i - X_n^i$ ($i = 1,\dots,d$) is given by

$$E_0^\omega\bigl[\Delta X_n^i\,\Delta X_n^j \mid \mathcal{F}_n\bigr] = 2\delta_{ij}\,p_{X_n}(e_i)$$ [60]


Since the right-hand side of eqn [60] is uniformly
bounded, it follows that $X_n/n \to 0$ ($P_0$-a.s.). Further,
it can be proved that there exist deterministic positive
constants $a_1,\dots,a_d$ such that for $i = 1,\dots,d$

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} p_{X_k}(e_i) = \frac{a_i}{2}, \qquad P_0\text{-a.s.}$$ [61]

Once this is proved, a multidimensional CLT for
martingale differences yields that $X_n/\sqrt{n}$ converges
in law to a Gaussian distribution with zero mean
and the covariances $b_{ij} = \delta_{ij}a_i$.
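
The balanced case can be illustrated with a minimal sketch. It is not taken from the cited works: the helper names (`balanced_site`, `drift_and_second_moments`, `simulate`), the way the environment is drawn, and the ellipticity constant `delta` are all assumptions made for illustration. The code checks that a balanced site has zero local drift and one-step second moments $2\delta_{ij}p(e_i)$, as in eqn [60], and then runs one walk to show that $X_n/n$ is small.

```python
import random

def balanced_site(rng, d, delta):
    """Draw p(e_1), ..., p(e_d) for one site of a balanced environment.

    Each p(e_i) = p(-e_i) >= delta and 2 * sum_i p(e_i) = 1
    (this requires delta <= 1/(2d)). The sampling scheme is an assumption.
    """
    w = [rng.random() + 1e-12 for _ in range(d)]
    total = sum(w)
    return [delta + (0.5 - d * delta) * wi / total for wi in w]

def drift_and_second_moments(p_plus, p_minus):
    """One-step drift vector and matrix E[dX^i dX^j] of a nearest-neighbour law."""
    d = len(p_plus)
    drift = [p_plus[i] - p_minus[i] for i in range(d)]
    second = [[p_plus[i] + p_minus[i] if i == j else 0.0 for j in range(d)]
              for i in range(d)]
    return drift, second

def simulate(n_steps, d=2, delta=0.05, seed=0):
    """Run a balanced walk from the origin.

    The environment is sampled lazily: one independent draw per site,
    made the first time the walk visits that site.
    """
    rng = random.Random(seed)
    env, pos = {}, (0,) * d
    for _ in range(n_steps):
        if pos not in env:
            env[pos] = balanced_site(rng, d, delta)
        p = env[pos]
        u = rng.random()
        # Moves are ordered +e_1, -e_1, +e_2, -e_2, ...; take the first one
        # whose cumulative probability exceeds u (default guards rounding).
        i, sign, acc = d - 1, -1, 0.0
        for j in range(d):
            for s in (1, -1):
                acc += p[j]
                if u < acc:
                    i, sign = j, s
                    break
            else:
                continue
            break
        pos = pos[:i] + (pos[i] + sign,) + pos[i + 1:]
    return pos

if __name__ == "__main__":
    p = balanced_site(random.Random(1), 3, 0.05)
    print(drift_and_second_moments(p, p))
    n = 20000
    print([c / n for c in simulate(n)])
```

Because each coordinate is a martingale with increments bounded by 1, the ratio $X_n/n$ printed at the end is small for any seed; the simulation illustrates the LLN but of course proves nothing.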

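The proof of [61] employs the method of environment
viewed from the particle. Namely, define a
Markov chain $\omega_n := \theta^{X_n}\omega$ with the transition kernel

$$T(\omega, d\omega') = \sum_{i=1}^{d}\bigl[p_0(e_i)\,\delta_{\theta^{e_i}\omega}(d\omega') + p_0(-e_i)\,\delta_{\theta^{-e_i}\omega}(d\omega')\bigr]$$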
(cf. eqn [27]). The next step is to find a probability
measure $Q$ on $\Omega$ invariant under $T$ and absolutely
continuous with respect to $\mathbb{P}$. Unlike the
one-dimensional case, however, an explicit form of $Q$ is
not available, and $Q$ is constructed indirectly as the
limit of invariant measures of certain periodic
modifications of the RWRE. Birkhoff’s ergodic
theorem then yields, $P_0$-a.s.,

$$\frac{1}{n}\sum_{k=0}^{n-1} p_{X_k}(e_i,\omega) = \frac{1}{n}\sum_{k=0}^{n-1} p_0(e_i,\omega_k) \to \int_\Omega p_0(e_i,\omega)\,Q(d\omega) \ge \delta$$

by the ellipticity condition [52], and eqn [61]
follows.

With regard to transience, balanced RWREs
admit a complete and simple classification. Namely,
it has been proved (see Zeitouni (2004)) that any
balanced RWRE is transient for $d \ge 3$ and recurrent
for $d = 2$ ($P_0$-a.s.). It is interesting to note, however,
that these answers may be false for certain balanced
random walks in a fixed environment (the $\mathbb{P}$-probability
of such environments being zero, of course). Indeed,
examples can be constructed of balanced random
walks in $\mathbb{Z}^2$ and in $\mathbb{Z}^d$ with $d \ge 3$, which are
transient and recurrent, respectively (Zeitouni
2004).


RWRE Based on Modification of Ordinary 
Random Walks 


A number of partial results are known for RWRE
constructed on the basis of ordinary random walks
via certain randomization of the environment. A
natural model is obtained by a small perturbation of
a simple symmetric random walk. To be more
precise, suppose that: (1) $|p_x(e) - 1/2d| < \varepsilon$ for all
$x \in \mathbb{Z}^d$ and any $|e| = 1$, where $\varepsilon > 0$ is small enough;
(2) $\mathbb{E}\,p_x(e) = 1/2d$; (3) vectors $p_x(\cdot)$ are i.i.d. for
different $x \in \mathbb{Z}^d$; and (4) the distribution of the
vector $p_x(\cdot)$ is isotropic, that is, invariant with
respect to permutations of its coordinates. Then for
$d \ge 3$ Bricmont and Kupiainen (1991) have proved
an LLN (with zero asymptotic velocity) and a
quenched CLT (with nondegenerate covariance
matrix). The proof is based on the renormalization
group method, which involves decimation in time
combined with a suitable spatial-temporal scaling.
This transformation replaces an RWRE by another
RWRE with weaker randomness, and it can be
shown that iterations converge to a Gaussian fixed
point.
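
Conditions (1), (2), and (4) are easy to realize explicitly. The sketch below is an illustrative construction, not the one used by Bricmont and Kupiainen: the single-site law is taken to be uniform over all permutations of a fixed perturbed vector `base`, and both `base` and `eps` are hypothetical values. Exact rational arithmetic is used so the conditions can be checked without rounding.

```python
from fractions import Fraction
from itertools import permutations

def isotropic_values(base):
    """Support of an isotropic single-site law: all distinct permutations
    of `base`, each taken with equal probability."""
    return sorted(set(permutations(base)))

def coordinate_means(values):
    """Expectation of each coordinate of p_x(.) under the uniform law on `values`."""
    n = len(values)
    return [sum(v[i] for v in values) / n for i in range(len(values[0]))]

d = 2
eps = Fraction(1, 100)      # hypothetical perturbation size
u = Fraction(1, 2 * d)      # simple symmetric walk: 1/(2d) per direction
# Perturbed transition vector; the four entries still sum to 1.
base = (u + eps / 2, u - eps / 2, u + eps / 4, u - eps / 4)

values = isotropic_values(base)
means = coordinate_means(values)

if __name__ == "__main__":
    print(len(values), means)
```

Condition (3) is then obtained by drawing one element of `values` independently at every site.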

Another class of examples is also built using small
perturbations of simple symmetric random walks, but
is anisotropic and exhibits ballistic behavior, provided
that the annealed local drift in some direction is strong
enough (see Sznitman (2004)). More precisely, suppose
that $d \ge 3$ and $\eta \in (0,1)$. Then there exists
$\varepsilon_0 = \varepsilon_0(d,\eta) > 0$ such that if $|p_x(e) - 1/2d| < \varepsilon$
($x \in \mathbb{Z}^d$, $|e| = 1$) with $0 < \varepsilon < \varepsilon_0$, and for some $e_0$
one has $\mathbb{E}[d(x,\omega)\cdot e_0] \ge \varepsilon^{2.5-\eta}$ ($d = 3$) or $\ge \varepsilon^{3-\eta}$
($d \ge 4$), then Sznitman’s condition (T’) is satisfied
with respect to $e_0$ and therefore the RWRE is ballistic
in the direction $e_0$ (cf. the subsection “Kalikow’s
condition and Sznitman’s condition (T’)”).

Examples of a different type are constructed in
dimensions $d \ge 6$ by letting the first $d_1 \ge 5$ coordinates
of the RWRE $X_n$ behave according to an
ordinary random walk, while the remaining
$d_2 = d - d_1$ coordinates are exposed to a random
environment (see Bolthausen et al. (2003)). One can
show that there exists a deterministic $v$ (possibly
zero) such that $X_n/n \to v$ ($P_0$-a.s.). Moreover, if
$d_1 \ge 13$, then $(X_n - nv)/\sqrt{n}$ satisfies both quenched
and annealed CLT. Incidentally, such models can be
used to demonstrate the surprising features of the
multidimensional RWRE. For instance, for $d \ge 7$
one can construct an RWRE $X_n$ such that the
annealed local drift does not vanish, $\mathbb{E}\,d(x,\omega) \neq 0$,
but the asymptotic velocity is zero, $X_n/n \to 0$
($P_0$-a.s.), and furthermore, if $d \ge 15$, then in this
example $X_n/\sqrt{n}$ satisfies a quenched CLT. (In fact,
one can construct such RWRE as small perturbations
of a simple symmetric walk.) On the other
hand, there exist examples (in high enough dimensions)
where the walk is ballistic with a velocity
which has an opposite direction to the annealed drift
$\mathbb{E}\,d(x,\omega) \neq 0$. These striking examples provide
“experimental” evidence of many unusual properties
of the multidimensional RWRE, which, no doubt,
will be discovered in the years to come.


See also: Averaging Methods; Growth Processes in 
Random Matrix Theory; Lagrangian Dispersion (Passive 
Scalar); Random Dynamical Systems; Random Matrix 
Theory in Physics; Stochastic Differential Equations; 
Stochastic Loewner Evolutions. 
Introduction


One of the tasks of classical mechanics has always 
been to identify those Hamiltonian systems which, 
by their peculiar properties, are considered solvable. 
The integrable systems of Liouville and the separ- 
able systems of Jacobi can serve as representative 
examples here. The bi-Hamiltonian geometry, a 
branch of Poisson geometry dealing with a special 
kind of deformation of Poisson bracket, suggests 
two further classes of Hamiltonian systems — the 
bi-Hamiltonian systems and the cyclic systems of 
Levi-Civita. The purpose of this article is to 
investigate the second class of systems mentioned 
above, and to explain why they are relevant for
classical mechanics (see Bi-Hamiltonian Methods in
Soliton Theory and Multi-Hamiltonian Systems for
further details).

To define a cyclic system of Levi-Civita, one
must consider a symplectic manifold $(S,\omega)$ endowed
with a tensor field of type (1,1), seen as an
endomorphism $N: TS \to TS$ that obeys two
conditions. The first condition is that the
vector-valued 2-form

$$T_N(X,Y) = [NX,NY] - N[NX,Y] - N[X,NY] + N^2[X,Y]$$


(called the Nijenhuis torsion of $N$) vanishes identically.
In this case $N$ is termed a “recursion
operator.” The second condition is that

$$\omega'(X,Y) = \omega(NX,Y)$$

is a closed 2-form. The manifolds where these
conditions are fulfilled are called ωN manifolds.
On these manifolds, each Hamiltonian vector field
$X_h$ is embedded into the distribution

$$D_h = \langle X_h, NX_h, N^2X_h, \dots \rangle$$


which is the minimal invariant distribution containing
$X_h$. This can be called the Levi-Civita
distribution generated by $X_h$. Experience has
shown that $D_h$ is seldom integrable. The cyclic
systems of Levi-Civita are, by definition, the
generators of the integrable Levi-Civita distributions.
Even though this notion is new in classical
mechanics, many interesting classical systems display
this property.

The aim of this article is to show that the cyclic
systems of Levi-Civita are closely related to
separable systems of Jacobi. To this end, the
article is organized in four sections, of which the
first three clarify the above-mentioned concepts. In
the section “ωN Manifolds,” the idea of ωN
manifolds is explained from the viewpoint of
bi-Hamiltonian geometry. The section “Cotangent
Bundles” shows that cotangent bundles provide a
large class of ωN manifolds, proving that such
manifolds are not rare. Next, two basic examples
of cyclic systems of Levi-Civita are presented.
Finally, the relation between cyclic systems of
Levi-Civita and separable systems of Jacobi is
explained briefly.


ωN Manifolds


Let us consider a symplectic manifold $(S,\omega)$ with its
Hamiltonian vector fields $X_h$ defined by

$$\omega(X_h,\cdot) = -dh$$

and with the Poisson bracket

$$\{f,g\} = \omega(X_f,X_g)$$

Both the Hamiltonian vector fields and the functions
on $S$ form a Lie algebra, and these algebras are
homomorphic, since

$$[X_f,X_g] = X_{\{f,g\}}$$

The bi-Hamiltonian geometry is the study of the
deformations of the Lie algebras which preserve the
above morphism.

We start from the deformations of the Poisson
algebra of functions, by replacing the bracket $\{f,g\}$
with the linear pencil

$$\{f,g\}_\varepsilon = \{f,g\} + \varepsilon\{f,g\}', \qquad \varepsilon \in \mathbb{R}$$

The problem is to find $\{f,g\}'$ in such a way that the
linear pencil satisfies the Jacobi identity for any
value of the parameter $\varepsilon$. To solve this problem it
is convenient to represent the bracket $\{f,g\}'$ in
the form

$$\{f,g\}' = \omega'(X_f,X_g)$$

(which is analogous to the standard representation
of the Poisson bracket of $S$) and then to notice that
there exists a unique (1,1) tensor field $N: TS \to TS$
such that

$$\omega'(X_f,X_g) = \omega(NX_f,X_g)$$

Due to the skew-symmetry of $\omega'$, the tensor field $N$
must satisfy the condition

$$\omega(NX_f,X_g) = \omega(X_f,NX_g)$$


To the first order in $\varepsilon$, the Jacobi identity on $\{f,g\}_\varepsilon$
gives

$$\{\{f,g\}',h\} + \{\{f,g\},h\}' + \text{cyclic permutations} = 0$$

This condition entails a constraint on $\omega'$. One can
readily check that $\omega'$ must be a closed 2-form:

$$d\omega' = 0$$

In turn, this constraint imposes a condition on $N$.
The translation of the closure of $\omega'$ on $N$ is

$$[NX_f,X_g] + [X_f,NX_g] - N[X_f,X_g] = X_{\{f,g\}'}$$


To the second order in $\varepsilon$, the Jacobi identity on
$\{f,g\}_\varepsilon$ gives

$$\{\{f,g\}',h\}' + \text{cyclic permutations} = 0$$

entailing the condition

$$[NX_f,NX_g] = NX_{\{f,g\}'}$$

on $N$. Combined with the first-order condition, this says
precisely that $T_N(X_f,X_g) = 0$. Thus, the Jacobi identity is
satisfied at any order in $\varepsilon$ if and only if $N$ is torsion
free and $\omega'$ is a closed 2-form. Hence, according to the
definition given in the “Introduction,” the manifold $S$ is an ωN
manifold.

It may be of interest to notice that the bracket

$$[X,Y]_N = [NX,Y] + [X,NY] - N[X,Y]$$

is a new (deformed) commutator on vector fields,
since the torsion of $N$ vanishes. The same is also
true for

$$[X,Y]_\varepsilon = [X,Y] + \varepsilon[X,Y]_N$$

since the torsion of $(\mathrm{Id} + \varepsilon N)$ vanishes too.
Therefore, one can write

$$[X_f,X_g]_\varepsilon = X_{\{f,g\}_\varepsilon}$$

This formula shows that this process of deformation
is rigid. For each change of the Poisson bracket,
there is a deformation of the commutator of vector
fields such that the basic correspondence between
functions and Hamiltonian vector fields, established
by the symplectic form $\omega$, remains a Lie algebra
morphism.

The same phenomenon can be observed in
connection with the definition of Hamiltonian
vector field. If one introduces the pencil of 2-forms

$$\omega_\varepsilon = \omega + \varepsilon\omega'$$

and the pencil of derivations

$$d_\varepsilon = d + \varepsilon d_N$$

where $d_N$ is the derivation of type $d$ and degree 1
canonically associated with $N$ according to the
theory of graded derivations of Frölicher and
Nijenhuis, one can prove that

$$d_\varepsilon^2 = 0, \qquad d_\varepsilon\omega_\varepsilon = 0$$

and that

$$\omega_\varepsilon(X_h,\cdot) = -d_\varepsilon h$$

This means that, on an ωN manifold, the symplectic
form $\omega$ and the de Rham differential $d$ are deformed
in such a way that the basic relation between
functions and Hamiltonian vector fields established
by $\omega$ holds true.


Cotangent Bundles 


Cotangent bundles are a source of examples of ωN
manifolds. The construction begins on the
base manifold $Q$. For any (1,1) tensor field
$L: TQ \to TQ$ with vanishing Nijenhuis torsion,
one constructs the deformed Liouville 1-form

$$\theta' = \sum_i y_i\,L^*(dx_i)$$

and its exterior derivative

$$\omega' = d\theta'$$

It can be proved that $\omega'$ satisfies the conditions
explained in the previous section, and one concludes
that $T^*Q$, endowed with the pencil of 2-forms
$\omega_\varepsilon = \omega + \varepsilon\omega'$, is an ωN manifold.

A subclass of these structures merits attention. It
is related to the polynomials

$$s(\lambda) = \lambda^n - (s_1\lambda^{n-1} + s_2\lambda^{n-2} + \cdots + s_n)$$

the coefficients of which are functions on $Q$
satisfying the condition

$$ds_1 \wedge ds_2 \wedge \cdots \wedge ds_n \neq 0$$

(almost) everywhere on $Q$. Moreover, it is convenient
to assume that the roots $(\lambda_1,\lambda_2,\dots,\lambda_n)$ of
$s(\lambda)$ are distinct and real, so that they are
functionally independent and can be used as
coordinates on $Q$. Therefore, the choice of $s(\lambda)$ is
equivalent to fixing a special system of coordinates on
$Q$, as happens in $\mathbb{R}^3$ when one introduces the
elliptical coordinates as the roots of the
polynomial

$$s(\lambda) = (\lambda - a)(\lambda - b)(\lambda - c)\left(1 + \frac{x^2}{\lambda - a} + \frac{y^2}{\lambda - b} + \frac{z^2}{\lambda - c}\right)$$

The peculiarity of this situation is that there exists
a unique recursion operator $L: TQ \to TQ$ whose
characteristic polynomial is $s(\lambda)$. Thus, the choice
of $s(\lambda)$ also determines an ωN structure on $T^*Q$
according to the previous prescription. The conclusion
is that there is a relation between pencils
of Poisson brackets on $T^*Q$ and coordinate
systems on $Q$. This relation is the key to
understanding the geometry of separable systems of
Jacobi.


Cyclic Systems of Levi-Civita 


The systems of coupled harmonic oscillators are the
first example of cyclic systems of Levi-Civita. Let us
consider, for simplicity, a system formed by only
two particles, with masses $m_1$ and $m_2$, moving on a
line under the action of an internal elastic force. The
Lagrangian of the system is

$$L = \tfrac12\bigl(m_1\dot{x}_1^2 + m_2\dot{x}_2^2\bigr) - \tfrac12 k(x_1 - x_2)^2$$

and the equations of motion are

$$M\ddot{x} + Kx = 0, \qquad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

where

$$M = \begin{pmatrix} m_1 & 0 \\ 0 & m_2 \end{pmatrix}, \qquad K = \begin{pmatrix} k & -k \\ -k & k \end{pmatrix}$$

Under a change of coordinates, the entries of the
matrices $M$ and $K$ obey the transformation law of
the components of a second-order covariant tensor.
Therefore, the entries of the matrix $L = -M^{-1}K$ (for which
the equations of motion read $\ddot{x} = Lx$) are
the components of a tensor field of type (1,1) on $\mathbb{R}^2$.
The defining equations of the associated endomorphism
$L: T\mathbb{R}^2 \to T\mathbb{R}^2$ are

$$L^*(dx_1) = \omega_1^2\,(dx_2 - dx_1)$$
$$L^*(dx_2) = \omega_2^2\,(dx_1 - dx_2)$$

if $\omega_1^2 = k/m_1$ and $\omega_2^2 = k/m_2$, and these equations
clearly show that $L$ is torsion free, since its components
are constant. The same
argument holds for any system of coupled harmonic
oscillators. Therefore, the cotangent bundle associated
with any system of coupled harmonic
oscillators is an ωN manifold.

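To compute the tensor field $N$ in our example,
one has to follow the prescription, passing from

$$\theta' = (\omega_1^2y_1 - \omega_2^2y_2)(dx_2 - dx_1)$$

to

$$\omega' = (\omega_1^2\,dy_1 - \omega_2^2\,dy_2) \wedge (dx_2 - dx_1)$$

and to

$$N = \Bigl(\omega_1^2\frac{\partial}{\partial x_1} - \omega_2^2\frac{\partial}{\partial x_2}\Bigr) \otimes (dx_2 - dx_1) + \Bigl(\frac{\partial}{\partial y_2} - \frac{\partial}{\partial y_1}\Bigr) \otimes (\omega_1^2\,dy_1 - \omega_2^2\,dy_2)$$

The Levi-Civita distribution $D_h$ is therefore spanned
by the vector fields

$$X_h = \frac{1}{k}\Bigl(\omega_1^2y_1\frac{\partial}{\partial x_1} + \omega_2^2y_2\frac{\partial}{\partial x_2}\Bigr) + k(x_2 - x_1)\Bigl(\frac{\partial}{\partial y_1} - \frac{\partial}{\partial y_2}\Bigr)$$

$$NX_h = \frac{1}{k}(\omega_2^2y_2 - \omega_1^2y_1)\Bigl(\omega_1^2\frac{\partial}{\partial x_1} - \omega_2^2\frac{\partial}{\partial x_2}\Bigr) + k(\omega_1^2 + \omega_2^2)(x_2 - x_1)\Bigl(\frac{\partial}{\partial y_2} - \frac{\partial}{\partial y_1}\Bigr)$$

related to the Hamiltonian

$$h = \frac{y_1^2}{2m_1} + \frac{y_2^2}{2m_2} + \frac{k}{2}(x_1 - x_2)^2$$

of the system of coupled oscillators. Since
$[X_h,NX_h] = 0$, the distribution is integrable; therefore,
the system is a cyclic system of Levi-Civita.
This property holds for any system of coupled
harmonic oscillators. It will be apparent at the end
of this article that this result is due to the
eigenvectors of $L$ defining the separation coordinates
of the coupled oscillators.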
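
For the two-particle example, both $X_h$ and $NX_h$ are linear vector fields, so the vanishing of their commutator reduces to a matrix identity. The following sketch checks this in exact arithmetic. It rests on assumptions that are conventions chosen here, not statements from the article: coordinates are ordered $(x_1,x_2,y_1,y_2)$, the symplectic form is taken as $\omega = dy_1\wedge dx_1 + dy_2\wedge dx_2$, $N$ is taken to be the lift of $L$ with the sign convention $\ddot{x} = Lx$, and all function names are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def oscillator_matrices(m1, m2, k):
    """Matrices in the coordinates (x1, x2, y1, y2).

    A     : Hamiltonian vector field X_h(z) = A z
    N     : recursion operator acting on vector components
    Omega : matrix of omega, omega(X, Y) = X^T Omega Y
    """
    w1, w2 = k / m1, k / m2          # omega_1^2 and omega_2^2
    A = [[0, 0, 1 / m1, 0],
         [0, 0, 0, 1 / m2],
         [-k, k, 0, 0],
         [k, -k, 0, 0]]
    N = [[-w1, w1, 0, 0],
         [w2, -w2, 0, 0],
         [0, 0, -w1, w2],
         [0, 0, w1, -w2]]
    Omega = [[0, 0, -1, 0],
             [0, 0, 0, -1],
             [1, 0, 0, 0],
             [0, 1, 0, 0]]
    return A, N, Omega

def linear_commutator(A, B):
    """Matrix whose vanishing is equivalent to [X, Y] = 0 for X = A z, Y = B z."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[BA[i][j] - AB[i][j] for j in range(len(A))] for i in range(len(A))]

if __name__ == "__main__":
    A, N, Omega = oscillator_matrices(Fraction(2), Fraction(3), Fraction(5))
    print(linear_commutator(A, matmul(N, A)))   # all entries are zero
```

The same computation shows that $N$ itself commutes with the matrix of $X_h$, which is a slightly stronger statement than $[X_h,NX_h] = 0$.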

The second and final example of cyclic systems of
Levi-Civita is the Neumann system, that is, the
anisotropic harmonic oscillator on the sphere $S^2$,
whose Lagrangian is

$$L = \tfrac12 m\bigl(\dot{x}_1^2 + \dot{x}_2^2 + \dot{x}_3^2\bigr) - \tfrac12\bigl(a_1x_1^2 + a_2x_2^2 + a_3x_3^2\bigr)$$

with the constraint

$$x_1^2 + x_2^2 + x_3^2 = 1$$

This constraint can be avoided by using the first two
Cartesian coordinates $(x_1,x_2)$ as local coordinates
on $S^2$. The Hamiltonian of the system can then be
written in the form

$$h = \tfrac12(1 - x_1^2)y_1^2 - x_1x_2y_1y_2 + \tfrac12(1 - x_2^2)y_2^2 + \tfrac12(a_1 - a_3)x_1^2 + \tfrac12(a_2 - a_3)x_2^2$$

where, for simplicity, $m = 1$. Formally one is back in
$\mathbb{R}^2$ as in the previous example, but the nonlinearity
of the equations of motion prevents us from readily seeing
the appropriate recursion operator $L: T\mathbb{R}^2 \to T\mathbb{R}^2$
to be used to construct the ωN structure on $T^*\mathbb{R}^2$.
Let us however recall that, according to Neumann,
the system is separable in elliptical spherical (also
called spheroconical) coordinates, defined as the
roots of the restriction to $S^2$ of the polynomial

$$s(\lambda) = (\lambda - a_1)(\lambda - a_2)(\lambda - a_3)\left(\frac{x_1^2}{\lambda - a_1} + \frac{x_2^2}{\lambda - a_2} + \frac{x_3^2}{\lambda - a_3}\right) = \lambda^2 - (s_1\lambda + s_2)$$


Let us, therefore, use this polynomial to construct
the unique recursion operator $L$ having $s(\lambda)$ as its
characteristic polynomial. It is given by

$$L^*(ds_1) = ds_2 + s_1\,ds_1$$
$$L^*(ds_2) = s_2\,ds_1$$

or, after a brief computation, by

$$L^*(dx_1) = a_1\,dx_1 - x_1\,d\bigl[\tfrac12(a_1 - a_3)x_1^2 + \tfrac12(a_2 - a_3)x_2^2\bigr]$$
$$L^*(dx_2) = a_2\,dx_2 - x_2\,d\bigl[\tfrac12(a_1 - a_3)x_1^2 + \tfrac12(a_2 - a_3)x_2^2\bigr]$$
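
The spheroconical coordinates can be computed numerically as a sanity check. In the sketch below, the explicit formulas for $s_1$ and $s_2$ are obtained here by expanding the polynomial on the unit sphere; they are a derived assumption, not formulas quoted from the article, and the function names and sample values are hypothetical. The code also confirms that the companion-type matrix read off from $L^*(ds_1)$ and $L^*(ds_2)$ has trace $s_1$ and determinant $-s_2$, so that its characteristic polynomial is $\lambda^2 - (s_1\lambda + s_2)$.

```python
import math

def coefficients(x, a):
    """Coefficients s1, s2 of s(lambda) = lambda^2 - (s1*lambda + s2) on the
    unit sphere, obtained by expanding the defining product (derived here)."""
    x1, x2, x3 = x
    a1, a2, a3 = a
    s1 = x1**2 * (a2 + a3) + x2**2 * (a1 + a3) + x3**2 * (a1 + a2)
    s2 = -(x1**2 * a2 * a3 + x2**2 * a1 * a3 + x3**2 * a1 * a2)
    return s1, s2

def s_direct(lam, x, a):
    """s(lambda) evaluated from the definition, with the poles cancelled."""
    x1, x2, x3 = x
    a1, a2, a3 = a
    return (x1**2 * (lam - a2) * (lam - a3)
            + x2**2 * (lam - a1) * (lam - a3)
            + x3**2 * (lam - a1) * (lam - a2))

def spheroconical(x, a):
    """Roots (lambda_1, lambda_2) of lambda^2 - s1*lambda - s2 = 0."""
    s1, s2 = coefficients(x, a)
    r = math.sqrt(s1 * s1 + 4.0 * s2)
    return (s1 - r) / 2.0, (s1 + r) / 2.0

def companion(x, a):
    """Matrix of L* in the basis (ds1, ds2): L*(ds1) = s1 ds1 + ds2, L*(ds2) = s2 ds1."""
    s1, s2 = coefficients(x, a)
    return [[s1, s2], [1.0, 0.0]]

if __name__ == "__main__":
    a = (1.0, 2.0, 4.0)                 # hypothetical parameters a1 < a2 < a3
    x = (2.0 / 3.0, 1.0 / 3.0, 2.0 / 3.0)   # a point on the unit sphere
    print(spheroconical(x, a))
```

For $a_1 < a_2 < a_3$ and a point with all coordinates nonzero, the two roots interlace the parameters, $a_1 < \lambda_1 < a_2 < \lambda_2 < a_3$, which is why they can serve as local coordinates.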


The situation stays the same as in the previous
example. Accordingly, the recursion operator $N$ on
$T^*\mathbb{R}^2$ is now given by

$$N^*dx_1 = a_1\,dx_1 - x_1\,df$$
$$N^*dx_2 = a_2\,dx_2 - x_2\,df$$
$$N^*dy_1 = a_1\,dy_1 - (a_1 - a_3)x_1\,dg + y_1\,df$$
$$N^*dy_2 = a_2\,dy_2 - (a_2 - a_3)x_2\,dg + y_2\,df$$

where the shorthand notations

$$f = \tfrac12(a_1 - a_3)x_1^2 + \tfrac12(a_2 - a_3)x_2^2$$
$$g = x_1y_1 + x_2y_2$$

have been used. The derivation $d_N$, associated with
$N$, is accordingly defined by

$$d_Nx_1 = N^*dx_1 = [a_1 + (a_3 - a_1)x_1^2]\,dx_1 + (a_3 - a_2)x_1x_2\,dx_2$$
$$d_Nx_2 = N^*dx_2 = (a_3 - a_1)x_1x_2\,dx_1 + [a_2 + (a_3 - a_2)x_2^2]\,dx_2$$
$$d_Ny_1 = N^*dy_1 = [(a_3 - a_1)x_1y_2 - (a_3 - a_2)x_2y_1]\,dx_2 + [a_1 + (a_3 - a_1)x_1^2]\,dy_1 + (a_3 - a_1)x_1x_2\,dy_2$$
$$d_Ny_2 = N^*dy_2 = [(a_3 - a_2)x_2y_1 - (a_3 - a_1)x_1y_2]\,dx_1 + (a_3 - a_2)x_1x_2\,dy_1 + [a_2 + (a_3 - a_2)x_2^2]\,dy_2$$


on the coordinate functions. Recalling that $d_N$
anticommutes with $d$, one can then easily check the
condition

$$dd_Nh = ds_1 \wedge dh$$

where $s_1$ is the first coefficient of the polynomial
defining the elliptical spherical coordinates, and $h$ is
the Hamiltonian of the Neumann system. By the
Frobenius theorem, this equation alone entails the
integrability of the distribution $D_h$, without the need
of computing $X_h$, $NX_h$, and their commutator
$[X_h,NX_h]$. Thus, it can be concluded that the
Neumann system too is a cyclic system of Levi-Civita,
and that the recursion operator $N$, generating
the distribution $D_h$, is closely related to the
polynomial defining the separation coordinates of
the Neumann system.


Separable System of Jacobi 


In 1838, Jacobi noticed that the Hamilton–Jacobi
equation

$$h\left(x_1,\dots,x_n;\ \frac{\partial W}{\partial x_1},\dots,\frac{\partial W}{\partial x_n}\right) = e$$

of many Hamiltonian systems splits, owing to an
appropriate choice of coordinates, into a set of
ordinary differential equations. On account of
this property, these systems have been called
separable. In 1904, Levi-Civita gave a first partial
characterization of separable Hamiltonians by
means of his separability conditions. In a letter
addressed to Stäckel, he proved that $h$ is separable
in a preassigned system of canonical coordinates
if and only if the conditions

$$\frac{\partial^2h}{\partial x_j\partial x_k}\frac{\partial h}{\partial y_j}\frac{\partial h}{\partial y_k} - \frac{\partial^2h}{\partial x_j\partial y_k}\frac{\partial h}{\partial y_j}\frac{\partial h}{\partial x_k} - \frac{\partial^2h}{\partial y_j\partial x_k}\frac{\partial h}{\partial x_j}\frac{\partial h}{\partial y_k} + \frac{\partial^2h}{\partial y_j\partial y_k}\frac{\partial h}{\partial x_j}\frac{\partial h}{\partial x_k} = 0, \qquad j \neq k$$

are satisfied by $h$. One must notice the nontensorial
character of these conditions; they hold only in a
specific coordinate system, and if the coordinates are
changed, it is not possible to reconstruct the form of
the separability conditions in the new coordinates.
The nontensorial character is the major drawback of
the separability conditions of Levi-Civita, making
them practically useless in the search for separation
coordinates.
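
The separability conditions are easy to test numerically in a given coordinate system. The sketch below evaluates the left-hand side of the Levi-Civita condition by central finite differences for two hypothetical two-degree-of-freedom Hamiltonians chosen here for illustration: one of Liouville type, which is separable in the coordinates used, and one with the coupling potential $x_1x_2$, which is not. Function names, step size, and the sample point are assumptions.

```python
import math

def partial(h, z, i, eps=1e-3):
    """Central-difference first derivative of h with respect to z[i]."""
    zp, zm = list(z), list(z)
    zp[i] += eps
    zm[i] -= eps
    return (h(zp) - h(zm)) / (2.0 * eps)

def mixed(h, z, i, j, eps=1e-3):
    """Central-difference mixed second derivative d^2 h / dz[i] dz[j], i != j."""
    def shifted(si, sj):
        w = list(z)
        w[i] += si * eps
        w[j] += sj * eps
        return h(w)
    return (shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * eps * eps)

def levi_civita(h, z, j, k, n):
    """Left-hand side of the Levi-Civita condition for the pair j != k.

    z = (x_1, ..., x_n, y_1, ..., y_n); indices j, k are 0-based.
    """
    xj, xk, yj, yk = j, k, n + j, n + k
    d = lambda i: partial(h, z, i)
    return (mixed(h, z, xj, xk) * d(yj) * d(yk)
            - mixed(h, z, xj, yk) * d(yj) * d(xk)
            - mixed(h, z, yj, xk) * d(xj) * d(yk)
            + mixed(h, z, yj, yk) * d(xj) * d(xk))

def liouville(z):
    """Liouville-type Hamiltonian, separable in (x1, x2)."""
    x1, x2, y1, y2 = z
    return (0.5 * (y1 * y1 + y2 * y2) + math.cos(x1) + x2 * x2) / (2.0 + math.sin(x1) + x2 * x2)

def coupled(z):
    """Hamiltonian with a coupling term x1*x2, not separable in (x1, x2)."""
    x1, x2, y1, y2 = z
    return 0.5 * (y1 * y1 + y2 * y2) + x1 * x2

if __name__ == "__main__":
    z0 = [0.3, 0.7, 1.1, -0.4]
    print(levi_civita(liouville, z0, 0, 1, 2))   # close to zero
    print(levi_civita(coupled, z0, 0, 1, 2))     # close to y1*y2 = -0.44
```

The second Hamiltonian is in fact separable after a rotation of the $(x_1,x_2)$ plane, which illustrates the nontensorial character of the conditions: a nonzero value only rules out separability in the coordinates at hand.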

The contact between the theory of separable
systems of Jacobi and the theory of cyclic systems
of Levi-Civita rests on two occurrences. The first is
the form of the integrability conditions of the
distribution $D_h$ generated by any vector field $X_h$
on an ωN manifold. Exploiting the Frobenius
integrability conditions and the properties of the
differential operator $d_N$ associated with the recursion
operator $N$, it can be proved that $D_h$ is
integrable if and only if the 2-form $dd_Nh$ vanishes
on $D_h$:

$$dd_Nh = 0 \quad \text{on } D_h$$

Suppose now that the dimension of $D_h$ is maximal,
that is, equal to $n = (1/2)\dim S$. Then $D_h$ is spanned
by the $n$ vector fields $(X_h, NX_h, \dots, N^{n-1}X_h)$, and
the vanishing condition of $dd_Nh$ on $D_h$ turns out to
be equivalent to

$$dd_Nh(N^jX_h, N^kX_h) = 0$$

for any value of $j$ and $k$ from 0 to $n - 1$. Thus, the
number of separability conditions of $h$ and the
number of integrability conditions of $D_h$ are equal.
This circumstance strongly suggests that the two sets
of conditions are related. The nontensorial character
of the Levi-Civita conditions, compared with the
tensorial character of the integrability conditions of
$D_h$, further suggests that the former should be the
evaluation of the latter in a specific system of
coordinates. These coordinates are the “normal
coordinates” of an ωN manifold, which will be
introduced in the following.

Assume that the minimal polynomial of $N$ has
real and distinct roots $(\lambda_1,\dots,\lambda_n)$. In this case, the
ωN manifold is said to be semisimple. A
two-dimensional eigenspace is associated with each
root $\lambda_k$. Let us consider the distribution $E_k$ spanned
by all the eigenvectors of $N$, except those
associated with $\lambda_k$. Since $N$ is torsion free, each
distribution $E_k$ is integrable. Let us fix the
attention on one of these distributions. It turns
out that its leaves are symplectic submanifolds of
codimension 2. So they are the level surfaces of a
pair of (local) functions which are not in involution.
By collecting together the pairs of functions
associated with the $n$ distributions $(E_1,\dots,E_n)$,
one obtains, at the end, a coordinate system
$(\lambda_1,\mu_1;\ \lambda_2,\mu_2;\ \dots;\ \lambda_n,\mu_n)$ on $S$. Moreover, these
functions can be chosen in such a way as to form a
system of canonical coordinates. The final result is
that, on a semisimple ωN manifold, one can
construct a coordinate system such that

$$\omega = \sum_{j=1}^{n} d\mu_j \wedge d\lambda_j$$

and

$$N^*(d\lambda_j) = \lambda_j\,d\lambda_j$$
$$N^*(d\mu_j) = \lambda_j\,d\mu_j$$

These coordinates are called the normal coordinates
(or sometimes, the Darboux–Nijenhuis coordinates) of
the ωN manifold. One can prove that the separability
conditions of Levi-Civita are the integrability
conditions of $D_h$, written in normal coordinates. This
result allows us to claim that the cyclic systems of
Levi-Civita on semisimple ωN manifolds are all separable.
The reverse is also true. As has already been
shown in the example of the Neumann system, a
given separable system of Jacobi can be associated
with a recursion operator $N$ in such a way that its
phase space (with the possible exclusion of a
singular locus) becomes an ωN manifold, and the
Hamiltonian vector field $X_h$ becomes a cyclic system
of Levi-Civita. A new interpretation of the process
of separation of variables follows from this result.
Indeed, to find separation coordinates for a given
system on a symplectic manifold $S$ is equivalent to
deforming the Poisson bracket of $S$ into a pencil

$$\{f,g\}_\varepsilon = \{f,g\} + \varepsilon\{f,g\}'$$

in such a way that the recursion operator $N$ defining
the pencil $\{f,g\}_\varepsilon$ generates, with $X_h$, an integrable
distribution $D_h$. Therefore, classical mechanics is
deeply entangled with the theory of recursion operators,
even if the insistence on the use of separation
coordinates has hidden this fact for a long time.


See also: Bi-Hamiltonian Methods in Soliton Theory;
Classical r-Matrices, Lie Bialgebras, and Poisson Lie
Groups; Integrable Systems and Algebraic Geometry;
Integrable Systems and Recursion Operators on
Symplectic and Jacobi Manifolds; Integrable Systems:
Overview; Multi-Hamiltonian Systems; Separation of
Variables for Differential Equations; Solitons and
Kac–Moody Lie Algebras.


Further Reading 


Dubrovin BA, Krichever IM, and Novikov SP (2001) Integrable 
systems I. In: Arnol’d VI (ed.) Encyclopaedia of Mathematical 
Sciences. Dynamical Systems IV, pp. 177-332. Berlin: Springer. 

Jacobi CGJ (1996) Vorlesungen über analytische Mechanik,
Deutsche Mathematiker Vereinigung, Freiburg. Braunschweig:
Friedrich Vieweg and Sohn.

Kolář I, Michor PW, and Slovák J (1993) Natural Operations in
Differential Geometry. Berlin: Springer.

Kalnins EG (1986) Separation of Variables for Riemannian 
Spaces of Constant Curvature. New York: Wiley. 

Krasilshchik IS and Kersten PHM (2000) Symmetries and 
Recursion Operators for Classical and Supersymmetric Differ- 
ential Equations. Dordrecht: Kluwer. 

Magri F, Falqui G, and Pedroni M (2003) The method of Poisson 
pairs in the theory of nonlinear PDEs. In: Conte R, Magri F, 
Musette M, Satsuma J, and Winternitz P (eds.) Direct and 
Inverse Methods in Nonlinear Evolution Equations, Lecture 
Notes in Physics, vol. 632, pp. 85-136. Berlin: Springer. 

Miller W (1977) Symmetry and Separation of Variables. Reading, 
MA-London—Amsterdam: Addison-Wesley. 

Olver PJ (1993) Applications of Lie Groups to Differential 
Equations, 2nd edn. New York: Springer. 

Pars LA (1965) A Treatise on Analytical Dynamics. London: 
Heinemann. 

Vaisman I (1994) Lectures on the Geometry of Poisson Manifolds.
Basel: Birkhäuser.

Vilasi G (2001) Hamiltonian Dynamics. River Edge, NJ: World 
Scientific. 

Yano K and Ishihara S (1973) Tangent and Cotangent Bundles: 
Differential Geometry. New York: Dekker. 


Reflection Positivity and Phase Transitions 


Y Kondratiev, Universität Bielefeld, Bielefeld,
Germany

Y Kozitsky, Uniwersytet Marii Curie-Skłodowskiej,
Lublin, Poland


© 2006 Elsevier Ltd. All rights reserved. 


Phase Transitions in Lattice Systems 
Introduction 


Phase transitions are among the main objects of 
equilibrium statistical mechanics, both classical and 
quantum. There exist several approaches to the descrip- 
tion of these phenomena. Their common point is that 
the macroscopic behavior of a statistical mechanical 
model can be different at the same values of the model 
parameters. This corresponds to the multiplicity of 
equilibrium phases, each of which has its own proper- 
ties. In the mathematical formulation, models are 


defined by interaction potentials and equilibrium phases 
appear as states — positive linear functionals on algebras 
of observables. In the classical case the states are defined 
by means of the probability measures which satisfy 
equilibrium conditions, formulated in terms of the 
interaction potentials. Such measures are called Gibbs 
measures and the corresponding states are called Gibbs 
states. The observables are then integrable functions. In
the quantum case the states are mostly introduced by
means of the Kubo–Martin–Schwinger condition, a
quantum analog of the equilibrium conditions used for
classical models. The quantum observables constitute
noncommutative von Neumann algebras.

Infinite systems of particles studied in statistical 
mechanics fall into two main groups. These are 
continuous systems and lattice systems. In the latter 
case, particles are attached to the points of various 
crystalline lattices. In view of the specifics of our subject, 
in this article we will deal with lattice systems only. 


One of the main problems of the mathematical 
theory of phase transitions is to prove that the Gibbs 
states of a given model can be multiple, that is, that 
this model undergoes a phase transition. To solve 
this problem one has to elaborate corresponding 
mathematical tools. Typically, at high temperatures 
(equivalently, for weak interactions), a model, which 
undergoes a phase transition, has only one Gibbs 
state. This state inherits all the symmetries possessed 
by the interaction potentials. At low temperatures 
this model has multiple Gibbs states, which may lose 
the symmetries. In this case the phase transition is 
accompanied by a symmetry breaking. Among the 
symmetries important in the theory of lattice 
systems, there is the invariance with respect to the 
lattice translations. If the Gibbs state of a translation 
invariant lattice model is unique, it ought to be 
ergodic with respect to the group of lattice translations.
This means in particular that the spatial
correlations in this state decay to zero at long
distances. Therefore, the lack of the latter property
may indicate a phase transition. In a number of
lattice models, phase transitions can be established
by means of a special property, reflection
positivity. The most important consequences of
reflection positivity are the chessboard (also called
checkerboard) estimates, which are extended versions of
Hölder’s inequality. The proof of a phase transition
is then performed either by combining
such estimates with contour methods,
or by means of infrared estimates obtained from the
chessboard estimates.

In this article we show how to prove phase 
transitions by means of the infrared estimates for 
some simple reflection positive models, both classi- 
cal and quantum. The details on the reflection 
positivity method in all its versions may be found 
in the literature listed at the end of the article. There 
we also provide short bibliographic comments. 


Nonergodicity and Infrared Estimates 


The following heuristic arguments should give an idea of
how to establish the nonergodicity of a Gibbs state by
means of infrared estimates. Let us consider a classical
ferromagnetic translation-invariant model. (Of
course, we assume that it possesses Gibbs states,
which for models with unbounded spins is a
nontrivial property. A particular case of this model
is described in more detail in the subsection “Gaussian
domination.”) This model describes the system
of interacting $N$-dimensional spins $x_\ell \in \mathbb{R}^N$, indexed
by the elements $\ell \in \mathbb{Z}^d$ of the $d$-dimensional simple
cubic lattice. The interaction is pairwise, attractive,
nearest-neighbor, and invariant with respect to the
rotations in $\mathbb{R}^N$. Consider a translation-invariant
Gibbs state of this model, which always exists. Let
$K(\ell,\ell')$, $\ell,\ell' \in \mathbb{Z}^d$, be the expectation of the scalar
product $(x_\ell, x_{\ell'})$ of spins in this state. Then $K(\ell,\ell')$ is
also translation invariant and hence may be written as

$$K(\ell,\ell') = \frac{1}{(2\pi)^d}\int_{(-\pi,\pi]^d} \widehat{K}(p)\,e^{i(p,\ell-\ell')}\,dp, \qquad i = \sqrt{-1}$$ [1]

where the generalized function $\widehat{K}$ is defined by the
Fourier series

$$\widehat{K}(p) = \sum_{\ell\in\mathbb{Z}^d} K(\ell,0)\,e^{-i(p,\ell)}, \qquad p \in (-\pi,\pi]^d$$ [2]

As the model is ferromagnetic, $K(\ell,\ell') \ge 0$. The
Gibbs state is nonergodic if $K(\ell,\ell')$ does not tend to
zero as $|\ell - \ell'| \to +\infty$. In this case $\widehat{K}$ should be
singular at $p = 0$. Set

$$\widehat{K}(p) = (2\pi)^d\lambda\,\delta(p) + g(p)$$ [3]

where $\delta(p)$ is the Dirac $\delta$-function and $g(p)$ is regular
at $p = 0$. Then the Gibbs state is nonergodic if $\lambda \neq 0$.
Suppose we know that $g(p) \ge 0$ and that the
following two estimates hold. The first one is

$$g(p) \le \frac{\gamma}{J|p|^2}, \qquad p \neq 0$$ [4]

where $\gamma > 0$ is a constant and $J > 0$ is the interaction
intensity multiplied by the inverse temperature $\beta$.
This is the infrared estimate. The second estimate is

$$K(\ell,\ell) \ge \kappa > 0$$ [5]

where $\kappa$ is independent of $J$. By these estimates and
[1], [2], we get

$$\lambda \ge \kappa - \frac{\gamma}{(2\pi)^dJ}\int_{(-\pi,\pi]^d}\frac{dp}{|p|^2}$$ [6]

For $d \ge 3$, the latter integral exists; hence, $\lambda > 0$ for
$J$ large enough, which means that the state we
consider is nonergodic.
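
The role of the dimension in estimate [6] can be seen numerically. The sketch below is an illustration only: it approximates the integral of $1/|p|^2$ over the cube by a midpoint rule (with an even number of cells per axis, so that no node sits at the origin), and the values of $\kappa$ and $\gamma$ used to turn it into a threshold for $J$ are hypothetical placeholders, not constants of any particular model.

```python
import math
from itertools import product

def infrared_integral(d, n):
    """Midpoint-rule estimate of the integral of 1/|p|^2 over (-pi, pi]^d.

    n is the number of cells per axis; keep it even so that no midpoint
    coincides with the singular point p = 0.
    """
    h = 2.0 * math.pi / n
    nodes = [-math.pi + (i + 0.5) * h for i in range(n)]
    total = 0.0
    for p in product(nodes, repeat=d):
        total += 1.0 / sum(c * c for c in p)
    return total * h ** d

def lambda_lower_bound(kappa, gamma, J, integral, d):
    """Right-hand side of estimate [6]."""
    return kappa - gamma * integral / ((2.0 * math.pi) ** d * J)

def critical_coupling(kappa, gamma, integral, d):
    """Value of J above which the bound [6] becomes positive."""
    return gamma * integral / ((2.0 * math.pi) ** d * kappa)

if __name__ == "__main__":
    I3 = infrared_integral(3, 24)
    print("d = 3 estimate:", I3)
    print("J threshold for kappa = gamma = 1:", critical_coupling(1.0, 1.0, I3, 3))
    # In d = 2 the estimates keep growing as the grid is refined.
    print("d = 2 estimates:", infrared_integral(2, 16), infrared_integral(2, 64))
```

In $d = 3$ the estimate stabilizes between the values $4\pi^2$ and $4\sqrt{3}\,\pi^2$ given by the inscribed and circumscribed balls, whereas in $d = 2$ it grows logarithmically under refinement, reflecting the divergence of the integral.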

The quantum case is more involved. The infrared 
bounds are obtained not for functions like K(p) but 
for the so-called Duhamel two-point functions. Then 
one has to prove a number of additional statements, 
which finally lead to the proof of the result desired. 
In the section on reflection positivity in quantum 
systems we indicate how to do this for a simple 
quantum spin model. 





Reflection Positivity and Phase 
Transitions in Classical Systems 


We begin by studying reflection positive (RP) 
functionals. Gibbs states of RP models are such 
functionals. 
Reflection Positive Functionals


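Let $\Lambda$ be a finite set of indices consisting of an
even number $|\Lambda|$ of elements, which label real
variables $x_\ell$, $\ell \in \Lambda$. For $\Lambda' \subset \Lambda$, we write
$x_{\Lambda'} = (x_\ell)_{\ell\in\Lambda'} \in \mathbb{R}^{|\Lambda'|}$. Suppose we are given a
bijection $\rho: \Lambda \to \Lambda$, $\rho\circ\rho = \mathrm{id}$, such that the set $\Lambda$ falls
into two disjoint parts $\Lambda_\pm$ with the property
$\rho: \Lambda_+ \to \Lambda_-$. Therefore, $|\Lambda_+| = |\Lambda_-|$, and the map $\rho$
may be regarded as a reflection. For $x_\Lambda \in \mathbb{R}^{|\Lambda|}$, we
set $\rho(x_\Lambda) = (x_{\rho(\ell)})_{\ell\in\Lambda}$. Now let $\mathfrak{A}$ be an algebra of
functions $A: \mathbb{R}^{|\Lambda|} \to \mathbb{R}$. Then we define the map
$\vartheta: \mathfrak{A} \to \mathfrak{A}$ by setting

$$\vartheta(A)(x_\Lambda) = A(\rho(x_\Lambda))$$ [7]

Clearly, for all $A, B \in \mathfrak{A}$ and $\xi, \eta \in \mathbb{R}$,

$$\vartheta(\xi A + \eta B) = \xi\vartheta(A) + \eta\vartheta(B), \qquad \vartheta(A\cdot B) = \vartheta(A)\cdot\vartheta(B)$$ [8]

By $\mathfrak{A}^+$ (respectively, $\mathfrak{A}^-$), we denote the subalgebra
of $\mathfrak{A}$ consisting of functions dependent
on $x_{\Lambda_+}$ (respectively, $x_{\Lambda_-}$). Then $\vartheta(\mathfrak{A}^+) = \mathfrak{A}^-$ and
$\vartheta\circ\vartheta = \mathrm{id}$.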


Definition 1 A linear functional $\varphi: \mathcal{A} \to \mathbb{R}$ is called
RP with respect to the maps $\rho$ and $\vartheta$, if

$$\forall A \in \mathcal{A}^+: \quad \varphi[A\vartheta(A)] \ge 0 \qquad [9]$$


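Example 2 Let $\chi$ be a Borel measure on the real
line (not necessarily positive), with respect to which
all real polynomials are integrable. Let also $\mathcal{A}$ be the
algebra of all real-valued polynomials on $\mathbb{R}^{|\Lambda|}$, $|\Lambda|$
being even. Finally, let $\rho$ and $\vartheta$ be any of the maps
with the properties described above. Then the
functional

$$\varphi(A) = \int_{\mathbb{R}^{|\Lambda|}} A(x_\Lambda) \prod_{\ell\in\Lambda} d\chi(x_\ell) \qquad [10]$$

is RP. Indeed, let $F: \mathbb{R}^{|\Lambda_+|} \to \mathbb{R}$ be such that
$A(x_\Lambda) = F(x_{\Lambda_+})$. Then

$$\varphi[A\vartheta(A)] = \left[\int_{\mathbb{R}^{|\Lambda_+|}} F(x_{\Lambda_+}) \prod_{\ell\in\Lambda_+} d\chi(x_\ell)\right]^2 \ge 0$$

In the above example the multiplicative structure of
the product measure in [10] is crucial. It results in the positivity
of $\varphi$ with respect to all reflections. If one has just
one such reflection, the measure which defines $\varphi$
need only decompose into a product of two measures. Let
$\Lambda$, $\mathcal{A}$, $\rho$, and $\vartheta$ be as above. Consider a Borel measure
$\nu$ on $\mathbb{R}^{|\Lambda|/2}$ such that every real-valued polynomial
on $\mathbb{R}^{|\Lambda|/2}$ is $\nu$-integrable.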


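Proposition 3 The functional

$$\varphi(A) = \int_{\mathbb{R}^{|\Lambda|}} A(x_\Lambda)\, d\nu(x_{\Lambda_+})\, d\nu(x_{\Lambda_-}) \qquad [11]$$

is RP.


In both these examples the states are symmetric, that is,

$$\varphi[A\vartheta(B)] = \varphi[B\vartheta(A)], \qquad \text{for all } A, B \in \mathcal{A}^+ \qquad [12]$$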


In the sequel we shall suppose that all RP functionals 
possess this property. Therefore, RP functionals obey 
a Cauchy-Schwarz type inequality. 


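Lemma 4 If $\varphi$ is RP, then for any $A, B \in \mathcal{A}^+$,

$$\{\varphi[A\vartheta(B)]\}^2 \le \varphi[A\vartheta(A)] \cdot \varphi[B\vartheta(B)] \qquad [13]$$


Proof For $\xi \in \mathbb{R}$, by [8] we have

$$\varphi[(A + \xi B)\vartheta(A + \xi B)] = \varphi[(A + \xi B)(\vartheta(A) + \xi\vartheta(B))] \ge 0$$

Since $\varphi$ is linear, the latter can be written as a
quadratic trinomial in $\xi$, whose positivity for all $\xi \in \mathbb{R}$ is equivalent
to [13]. □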
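
The trinomial argument in the proof of Lemma 4 can be written out explicitly. The following is a sketch that relies on the symmetry property [12], which the text assumes for all RP functionals.

```latex
% Expand by linearity, using [8] and the symmetry [12]:
\varphi[(A+\xi B)\vartheta(A+\xi B)]
  = \varphi[A\vartheta(A)] + 2\xi\,\varphi[A\vartheta(B)]
    + \xi^2\,\varphi[B\vartheta(B)] \ge 0
  \quad\text{for all } \xi\in\mathbb{R}.
% A real polynomial a + 2b\xi + c\xi^2 that is nonnegative for every \xi
% must satisfy b^2 \le ac (for c = 0 this forces b = 0), hence
\{\varphi[A\vartheta(B)]\}^2 \le \varphi[A\vartheta(A)]\cdot\varphi[B\vartheta(B)].
```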


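Now let an RP functional $\varphi$ be such that for
$A, B, C_1, \dots, C_m, D_1, \dots, D_m \in \mathcal{A}^+$

there exists

$$\varphi\left[\exp\left(A + \vartheta(B) + \sum_{i=1}^m C_i\vartheta(D_i)\right)\right]$$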


and that the series 


N1 ,... Nm =O 


x exp[A + 0(B)]} 14] 


as well as the one with all Cjs replaced by Djs 
converge absolutely. 


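Lemma 5 Let the functional $\varphi$ and the functions
$A, B, C_i, D_i$, $i = 1, \dots, m$, be as above. Then

$$\left\{\varphi\left[\exp\left(A + \vartheta(B) + \sum_{i=1}^m C_i\vartheta(D_i)\right)\right]\right\}^2 \le \varphi\left[\exp\left(A + \vartheta(A) + \sum_{i=1}^m C_i\vartheta(C_i)\right)\right] \times \varphi\left[\exp\left(B + \vartheta(B) + \sum_{i=1}^m D_i\vartheta(D_i)\right)\right] \qquad [15]$$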
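Proof By the above assumptions

$$\varphi\left[\exp\left(A + \vartheta(B) + \sum_{i=1}^m C_i\vartheta(D_i)\right)\right] = \sum_{n_1,\dots,n_m=0}^{\infty} \frac{1}{n_1!\cdots n_m!}\,\varphi\left\{F\vartheta(G)\,[C_1\vartheta(D_1)]^{n_1}\cdots[C_m\vartheta(D_m)]^{n_m}\right\} \qquad [16]$$

where $F = e^A$, $G = e^B$. Then by [13] and the Cauchy-Schwarz
inequality for sums we get

$$\begin{aligned}
\mathrm{RHS}[16] &\le \sum_{n_1,\dots,n_m=0}^{\infty} \frac{1}{n_1!\cdots n_m!} \left\{\varphi\left[F\vartheta(F)[C_1\vartheta(C_1)]^{n_1}\cdots[C_m\vartheta(C_m)]^{n_m}\right]\right\}^{1/2} \left\{\varphi\left[G\vartheta(G)[D_1\vartheta(D_1)]^{n_1}\cdots[D_m\vartheta(D_m)]^{n_m}\right]\right\}^{1/2} \\
&\le \left\{\sum_{n_1,\dots,n_m=0}^{\infty} \frac{1}{n_1!\cdots n_m!}\,\varphi\left[F\vartheta(F)[C_1\vartheta(C_1)]^{n_1}\cdots[C_m\vartheta(C_m)]^{n_m}\right]\right\}^{1/2} \left\{\sum_{n_1,\dots,n_m=0}^{\infty} \frac{1}{n_1!\cdots n_m!}\,\varphi\left[G\vartheta(G)[D_1\vartheta(D_1)]^{n_1}\cdots[D_m\vartheta(D_m)]^{n_m}\right]\right\}^{1/2} \\
&= \left\{\varphi\left[\exp\left(A + \vartheta(A) + \sum_{i=1}^m C_i\vartheta(C_i)\right)\right]\right\}^{1/2} \times \left\{\varphi\left[\exp\left(B + \vartheta(B) + \sum_{i=1}^m D_i\vartheta(D_i)\right)\right]\right\}^{1/2}
\end{aligned}$$

which yields [15]. □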





Main Estimate 


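Let $\Lambda$ be a finite set and $\Lambda'$ be its nonempty subset.
Let also $\mu$ and $\nu$ be finite Borel measures on
$\mathbb{R}^{N|\Lambda|}$, $N \in \mathbb{N}$. For vectors $b, c \in \mathbb{R}^N$, by $(b,c)$ and
$|b|$, $|c|$ we denote their scalar product $\sum_{k=1}^N b^{(k)} c^{(k)}$
and the corresponding norms, respectively. By $x_\Lambda$
we denote $(x_\ell)_{\ell\in\Lambda}$, $x_\ell \in \mathbb{R}^N$; hence, $x_\Lambda \in \mathbb{R}^{N|\Lambda|}$.


Lemma 6 Let the sets $\Lambda$, $\Lambda'$ and the measures $\mu$, $\nu$ be
as above. Then for every $(a_\ell)_{\ell\in\Lambda'} \in \mathbb{R}^{N|\Lambda'|}$ and $J > 0$,

$$\left[\int_{\mathbb{R}^{2N|\Lambda|}} \exp\left(-\frac{J}{2}\sum_{\ell\in\Lambda'} |x_\ell - y_\ell - a_\ell|^2\right) d\mu(x_\Lambda)\, d\nu(y_\Lambda)\right]^2 \le \int_{\mathbb{R}^{2N|\Lambda|}} \exp\left(-\frac{J}{2}\sum_{\ell\in\Lambda'} |x_\ell - y_\ell|^2\right) d\mu(x_\Lambda)\, d\mu(y_\Lambda) \times \int_{\mathbb{R}^{2N|\Lambda|}} \exp\left(-\frac{J}{2}\sum_{\ell\in\Lambda'} |x_\ell - y_\ell|^2\right) d\nu(x_\Lambda)\, d\nu(y_\Lambda) \qquad [17]$$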


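Proof Take two copies of $\Lambda$ and denote them by
$\Lambda_\pm$. Furthermore, by $\Lambda'_\pm \subset \Lambda_\pm$ we denote the subsets
consisting of the elements of $\Lambda' \subset \Lambda$. For an $\ell \in \Lambda_+$,
by $\rho(\ell)$ we denote its counterpart in $\Lambda_-$. Then $\rho$ is a
reflection and $\rho(\Lambda'_+) = \Lambda'_-$. Let $\tilde\Lambda = \Lambda_+ \cup \Lambda_-$,
$\tilde\Lambda' = \Lambda'_+ \cup \Lambda'_-$, and $\mathcal{A}$ be the algebra of all polynomials
of $(x_{\tilde\Lambda'}, y_{\tilde\Lambda'}) \in \mathbb{R}^{2N|\tilde\Lambda'|}$. Note that $x_{\tilde\Lambda'}$ may be regarded
as the pair $(x_{\Lambda'_+}, x_{\Lambda'_-})$. Let $\mathcal{A}^+$ (respectively, $\mathcal{A}^-$) be
the subalgebra of $\mathcal{A}$ consisting of the polynomials
which depend on $x_{\Lambda'_+}, y_{\Lambda'_+}$ (respectively, $x_{\Lambda'_-}, y_{\Lambda'_-}$)
only. Introduce the measures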


dji(xa) = exp ( l ` wt) du(xa) 


LEA! 


di(xa) = exp -iy t) dv(xa) 


ten! 
and define the following functional on A: 
oF) E fa POS ya’) Apk 
x dö(ya,) du(xa_) dv(ya_) [18] 


It has the same structure as the one described by 
Proposition 3, hence is RP with respect to the map V 
defined by the reflection p. Set 


t= f dies) To= f dea) 19 


and 
A= 0, B=-1 |; jao? + (ayo) 
tEn. 
20 
Sy, DP = vi +a) PO 
CEN, RS Llacs N 
Then the left-hand side of [17] is 
LHS [17] 
— oo A+ 0(B 
TT) $ on (B) 








2 


DDO o(D! 


with ¢ given by [18]. Applying [15] and taking into 
account [19], we arrive at 


LHS [17] 


ned 
er) exp| JD XX pce 
(Tate) Re x | 


x dti(xa,) dai(xa_) dv(ya,) dd(ya_) 


x 
how = V 2 ao) 


x dti(xa,) dai(xa_) dv(ya,) dv(ya_) 


which completes the proof. oO 


[21] 


ZZ lr))| 





= RHS [17] 
Gaussian Domination


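Let $\Lambda$ be a finite set, $|\Lambda|$ even, and $E$ be a set of
unordered pairs of elements of $\Lambda$, such that the
graph $(\Lambda, E)$ is connected. If $e \in E$ connects given
$\ell, \ell' \in \Lambda$, we write $e = \langle\ell,\ell'\rangle$. We suppose that $E$
contains no loops $\langle\ell,\ell\rangle$. With each $\ell \in \Lambda$ we
associate a random $N$-component vector $x_\ell$, called
spin. The joint probability distribution of the spins
$(x_\ell)_{\ell\in\Lambda}$ is defined by means of the local Gibbs
measure

$$d\mu_\Lambda(x_\Lambda) = \frac{1}{Z_\Lambda}\exp\left[-\frac{J}{2}\sum_{\langle\ell,\ell'\rangle\in E} |x_\ell - x_{\ell'}|^2\right] d\chi_\Lambda(x_\Lambda), \qquad x_\Lambda \in \mathbb{R}^{N|\Lambda|} \qquad [22]$$

Here the measure

$$d\chi_\Lambda(x_\Lambda) = \prod_{\ell\in\Lambda} d\chi(x_\ell) \qquad [23]$$

describes the system if the interaction intensity
$J$ equals zero. In general, $J > 0$, that is, the model
[22], [23] is ferromagnetic. The single-spin measure
$\chi$ is a probability measure on $\mathbb{R}^N$ and

$$Z_\Lambda = \int_{\mathbb{R}^{N|\Lambda|}} \exp\left[-\frac{J}{2}\sum_{\langle\ell,\ell'\rangle\in E} |x_\ell - x_{\ell'}|^2\right] d\chi_\Lambda(x_\Lambda) \qquad [24]$$

is the partition function. Set

$$Z_\Lambda(h) = \int_{\mathbb{R}^{N|\Lambda|}} \exp\left[-\frac{J}{2}\sum_{\langle\ell,\ell'\rangle\in E} |x_\ell - x_{\ell'} - h_{\ell\ell'}|^2\right] d\chi_\Lambda(x_\Lambda) \qquad [25]$$

where $h_{\ell\ell'} = h_{\ell'\ell} \in \mathbb{R}^N$, $\langle\ell,\ell'\rangle \in E$.


Definition 7 The model [22]-[23] admits Gaussian
domination if for all $h = (h_{\ell\ell'})_{\langle\ell,\ell'\rangle\in E}$,

$$Z_\Lambda(h) \le Z_\Lambda(0) \qquad [26]$$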


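We prove that our model admits Gaussian domination
if the graph satisfies the following:


Assumption 8 The set of edges $E$ can be
decomposed

$$E = \bigcup_{n=1}^m E_n, \qquad E_n \cap E_{n'} = \emptyset \ \text{ if } n \neq n' \qquad [27]$$

in such a way that for every $n = 1, \dots, m$, the graph
$(\Lambda, E \setminus E_n)$ is disconnected and falls into two connected
components, $(\Lambda_+^{(n)}, E_+^{(n)})$ and $(\Lambda_-^{(n)}, E_-^{(n)})$, which
are isomorphic. This means that there exists a
bijection $\rho_n: \Lambda \to \Lambda$, $\rho_n \circ \rho_n = \mathrm{id}$, such that
$\rho_n(\Lambda_+^{(n)}) = \Lambda_-^{(n)}$ and $\langle\rho_n(\ell), \rho_n(\ell')\rangle \in E_-^{(n)}$ whenever
$\langle\ell,\ell'\rangle \in E_+^{(n)}$. Finally, we assume that if $\langle\ell,\ell'\rangle \in E_n$
and $\ell \in \Lambda_+^{(n)}$, then $\rho_n(\ell) = \ell'$.


Figure 1 The torus.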


By this assumption if $\langle\ell,\ell'\rangle \in E_n$, then no other
elements of $E_n$ can be of the form $\langle\ell,\ell''\rangle$ or $\langle\ell'',\ell'\rangle$.
The basic example here is the torus which one obtains
from a rectangular box $\Lambda \subset \mathbb{Z}^d$, $|\Lambda|$ even, by imposing
periodic conditions on its boundaries. The set of edges
is $E = \{\langle\ell,\ell'\rangle : |\ell - \ell'|_\Lambda = 1\}$, where $|\ell - \ell'|_\Lambda$ is the
periodic distance on $\Lambda$ (see the next subsection).
Then every plane which contains the center of the
torus and its axis cuts it along a family of
edges into two subgraphs with the desired
property (see Figure 1).


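Theorem 9 The model [22]-[23] defined on the
graph obeying Assumption 8 admits Gaussian
domination.


Proof For $\sigma = \pm 1$, $h = (h_{\ell\ell'})_{\langle\ell,\ell'\rangle\in E}$, and $n = 1, \dots, m$,
we define the map

$$(T_n^\sigma h)_{\ell\ell'} = \begin{cases} h_{\ell\ell'}, & \text{if } \langle\ell,\ell'\rangle \in E_\sigma^{(n)} \\ h_{\rho_n(\ell)\rho_n(\ell')}, & \text{if } \langle\ell,\ell'\rangle \in E_{-\sigma}^{(n)} \\ 0, & \text{if } \langle\ell,\ell'\rangle \in E_n \end{cases} \qquad [28]$$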


According to Assumption 8 


ane in mAP “4 2: [xe — xe — he 


(LEE 


x drio (x 4) dy (x Ko) [29] 


= €Xp 7 > lace — xe — bye |” dy, (xo); 


(,0)eE”) 


Set 


x dx, (x0) 


Ly we apply here Lemma 6, with A‘ ={é¢ 
AWI, L) € Ey}, and obtain 


2 J 2 
[Za (hb) < | sy EXP 5 D lxe — xe | ) 
(ee )eEy 


(1) (x a0) 


J 2 
«| exp |— > ` |e — xp | 
RNAI | = (00V EE 


x drio (xao) dvo (xao) 


= Za (TI b)Za (Tī h) 


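Next we estimate both $Z_\Lambda(T_1^\pm h)$ employing $E_2$ and
$T_2^\pm$. Repeating this procedure $m$ times we finally
get

$$[Z_\Lambda(h)]^{2^m} \le \prod_{\sigma_1,\dots,\sigma_m = \pm 1} Z_\Lambda(T_m^{\sigma_m}\cdots T_1^{\sigma_1} h) = [Z_\Lambda(0)]^{2^m} \qquad [30]$$

Note that $T_m^{\sigma_m}\cdots T_1^{\sigma_1} h = 0$ for any $h \in \mathbb{R}^{N|E|}$ and any
sequence $\sigma_1, \dots, \sigma_m = \pm 1$, which follows from [27]
and [28]. □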
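
Gaussian domination [26] can be checked by brute force on a very small example. The sketch below is an illustration only and is not taken from the source: it assumes the 4-site cycle (the one-dimensional torus of side 4, which satisfies Assumption 8 with two families of cut edges), Ising spins $x_\ell = \pm 1$ with the uniform single-spin measure ($N = 1$), and randomly drawn fields $h$. The function names are hypothetical.

```python
import itertools
import math
import random

# Toy setup (an assumption made for illustration): the 4-site cycle with
# Ising spins +1/-1 and the uniform single-spin measure.
NUM_SITES = 4
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]  # each edge stored with a fixed orientation


def partition_function(J, h):
    """Z(h): average over spin configurations of exp(-J/2 * sum_e (x_l - x_l' - h_e)^2)."""
    total = 0.0
    for x in itertools.product((-1.0, 1.0), repeat=NUM_SITES):
        energy = sum((x[a] - x[b] - h[e]) ** 2 for e, (a, b) in enumerate(EDGES))
        total += math.exp(-0.5 * J * energy)
    return total / 2 ** NUM_SITES


def check_gaussian_domination(J, trials=200, seed=0):
    """Return True if Z(h) <= Z(0) for every randomly drawn field h."""
    rng = random.Random(seed)
    z0 = partition_function(J, [0.0] * len(EDGES))
    for _ in range(trials):
        h = [rng.uniform(-3.0, 3.0) for _ in EDGES]
        if partition_function(J, h) > z0 + 1e-12:
            return False
    return True


if __name__ == "__main__":
    for J in (0.1, 1.0, 5.0):
        print(J, check_gaussian_domination(J))
```

A random search of this kind cannot replace the proof; it only illustrates what the inequality [26] asserts for one graph satisfying Assumption 8.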


As might be clear from the proof given above, the
local Gibbs state

$$\varphi_\Lambda(A) = \int_{\mathbb{R}^{N|\Lambda|}} A(x_\Lambda)\, d\mu_\Lambda(x_\Lambda) \qquad [31]$$

defined by means of the measure [22], is RP
with respect to all reflections $\rho_n$, $n = 1, \dots, m$.
Indeed, the functional defined by the product
measure


dž (xa) = = exp - 


5 læ ‘Jor (xa) [32] 


2 
is RP (see Example 2). The Gibbs measure [22] can


be written as 


m N 
nize SEE ene) 


x dX (Xa) [33] 


where ct „k=1,...,N, are the same as in [20] and 
Ai “defy eA” Nie, l) € En}. Then the reflection 
positivity of the Gibbs state [31] can be obtained
along the line of arguments used for proving Lemma 
6. It appears that this is the only possible way to 
construct an RP functional from another RP 
functional. 

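Repeated application of the estimate [15] also
yields

$$\varphi_\Lambda\left(\prod_{\ell\in\Lambda} F_\ell(x_\ell)\right) \le \prod_{\ell\in\Lambda}\left[\varphi_\Lambda\left(\prod_{\ell'\in\Lambda} F_\ell(x_{\ell'})\right)\right]^{1/|\Lambda|} \qquad [34]$$

which holds for any family of functions
$\{F_\ell: \mathbb{R}^N \to [0,+\infty)\}_{\ell\in\Lambda}$, for which the above
expressions make sense. The estimate [34] is a
chessboard estimate, which is a very important
element of the theory of phase transitions in
RP models. The estimate [26] may be obtained
from [34].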


Infrared Bound 


Let us show now how to derive the infrared
estimates from the Gaussian domination [26].
Consider the system of $N$-dimensional spins
indexed by the elements of $\mathbb{Z}^d$ with the nearest-neighbor
ferromagnetic interaction and the single-spin
measure $\chi$. To construct the periodic
local Gibbs measure of this system, we take the
box

$$\Lambda = (-L, L]^d \cap \mathbb{Z}^d, \qquad L \in \mathbb{N} \qquad [35]$$

and impose periodic conditions on its boundaries.
This defines the periodic distance

$$|\ell - \ell'|_\Lambda = \left[\sum_{j=1}^d |\ell_j - \ell'_j|_L^2\right]^{1/2}, \quad \ell, \ell' \in \Lambda; \qquad |\ell_j - \ell'_j|_L = \min\{|\ell_j - \ell'_j|;\ 2L - |\ell_j - \ell'_j|\} \qquad [36]$$

and hence the set of edges $E$, being unordered
pairs $\langle\ell,\ell'\rangle$ such that $|\ell - \ell'|_\Lambda = 1$. Thus, we have
the graph $(\Lambda, E)$ and the measure [22]. This is the
periodic local Gibbs measure of our model. By
[31] it defines the periodic local Gibbs state $\varphi_\Lambda$.
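We have included the inverse temperature $\beta$ into $J$
and assumed that the single-spin measure $\chi$ is
rotation invariant. Let us introduce the Fourier
transformation

$$\hat x(p) = \frac{1}{\sqrt{|\Lambda|}}\sum_{\ell\in\Lambda} x_\ell\, e^{i(p,\ell)}, \qquad x_\ell = \frac{1}{\sqrt{|\Lambda|}}\sum_{p\in\Lambda_*} \hat x(p)\, e^{-i(p,\ell)} \qquad [37]$$

$$\Lambda_* = \left\{p = (p_1, \dots, p_d)\ \Big|\ p_j = -\pi + \frac{\pi}{L}s_j,\ s_j = 1, \dots, 2L,\ j = 1, \dots, d\right\} \qquad [38]$$

Then we can set

$$\hat K^{(k)}_\Lambda(p) = \varphi_\Lambda\left[\hat x^{(k)}(p)\,\hat x^{(k)}(-p)\right], \qquad \hat K_\Lambda(p) = \sum_{k=1}^N \hat K^{(k)}_\Lambda(p) \qquad [39]$$

Thereby, cf. [1], [2],

$$K_\Lambda(\ell,\ell') = \varphi_\Lambda[(x_\ell, x_{\ell'})] = \frac{1}{|\Lambda|}\sum_{p\in\Lambda_*} \hat K_\Lambda(p)\, e^{i(p,\ell-\ell')} \qquad [40]$$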


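By construction, for any $\ell_0 \in \Lambda$,

$$K_\Lambda(\ell,\ell') = K_\Lambda(\ell + \ell_0, \ell' + \ell_0) \qquad [41]$$

where addition is componentwise modulo $2L$. This
means that $K_\Lambda(\ell,\ell')$ is invariant with respect to the
translations on the corresponding torus. One can
show that $K_\Lambda(\ell,\ell')$ converges, as $L \to +\infty$, to $K(\ell,\ell')$
discussed in the Introduction. The corresponding
Gibbs state of the whole model is called the periodic
Gibbs state. By construction, it is translation
invariant. Set

$$E(p) = \sum_{j=1}^d [1 - \cos p_j] \qquad [42]$$


Theorem 10 For all $p \in \Lambda_* \setminus \{0\}$,

$$\hat K_\Lambda(p) \le \frac{N}{2JE(p)} \qquad [43]$$


Proof Consider the function $f(\xi) = Z_\Lambda(\xi h)$, $\xi \in \mathbb{R}$,
where $Z_\Lambda(h)$ is defined by [25]. By Theorem 9 it has
a maximum at $\xi = 0$; hence,

$$f''(0) \le 0 \qquad [44]$$

Obviously, $f''(0)$ depends on $h = (h_{\ell\ell'})_{\langle\ell,\ell'\rangle\in E}$,
$h_{\ell\ell'} \in \mathbb{R}^N$. Let us choose $h$ such that only the
first components $h^{(1)}_{\ell\ell'}$ are nonzero. Then [44]
holds if

$$J \sum_{\langle\ell_1,\ell_1'\rangle\in E}\ \sum_{\langle\ell_2,\ell_2'\rangle\in E} \varphi_\Lambda\left[\left(x^{(1)}_{\ell_1} - x^{(1)}_{\ell_1'}\right)\left(x^{(1)}_{\ell_2} - x^{(1)}_{\ell_2'}\right)\right] h^{(1)}_{\ell_1\ell_1'}\, h^{(1)}_{\ell_2\ell_2'} \le \sum_{\langle\ell,\ell'\rangle\in E} \left[h^{(1)}_{\ell\ell'}\right]^2 \qquad [45]$$

This means that the eigenvalues of the matrix of the
real quadratic form (with respect to $h$) defined by
the left-hand side of [45] do not exceed one. The
same ought to be true for the extension of this form
to the complex case. Let us show that the complex
eigenvectors $h^{(1)}_{\ell\ell'}(p)$ of this matrix and the corresponding
eigenvalues $\lambda(p)$ are

$$h^{(1)}_{\ell\ell'}(p) = \left(e^{i(p,\ell')} - e^{i(p,\ell)}\right)/\sqrt{|\Lambda|}, \qquad \lambda(p) = 2JE(p)\hat K^{(1)}_\Lambda(p) \qquad [46]$$

For $j = 1, \dots, d$, let $\delta_j \in \mathbb{Z}^d$ be the unit vector with
the $j$th component equal to 1. Then for $\langle\ell,\ell'\rangle \in E$,
there exists $\delta_j$ such that $\ell - \ell' = \pm\delta_j$. Since the edge
$\langle\ell,\ell'\rangle$ is an unordered set, let us fix $\ell' = \ell + \delta_j$.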

Thereby, 
_ a) (cio _ ello) | 


1 St (x0 
X 
A, ( f 
are ai pee cos( 6) 


LL'VECE 
LEA j 


= 24 (p)E(p) 
In view of [41], one has 


paR (P)O (P')] = bopo Ki (P) 
Then employing the latter two facts and [37], we get 


J 2 KAIGA =a ee - x), bee (P) 


CAA 


= 2JE(p) os (x1) -x 2 @)] 
1 


pen, 146] 


= 2JE(p)K\ KOLN (p) 
which proves [46]. Then by [45], $\hat K^{(1)}_\Lambda(p) \le 1/(2JE(p))$
for $p \neq 0$. The same holds for $\hat K^{(k)}_\Lambda(p)$, $k = 2, \dots, N$,
which by [39] yields [43]. □


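The result just proved and the convergence of
$K_\Lambda(\ell,\ell') \to K(\ell,\ell')$, as $L \to +\infty$, imply the infrared
bound [4]. It turns out that the estimate [43]
may be used directly to prove the phase transition.
Consider

$$P_\Lambda \stackrel{\mathrm{def}}{=} \frac{1}{|\Lambda|^2}\sum_{\ell,\ell'\in\Lambda} \varphi_\Lambda[(x_\ell, x_{\ell'})] = \varphi_\Lambda\left[\left|\frac{1}{|\Lambda|}\sum_{\ell\in\Lambda} x_\ell\right|^2\right] \ge 0 \qquad [47]$$

where $\Lambda$ is the box [35]. By [40] and [41], we have

$$P_\Lambda = \frac{1}{|\Lambda|}\hat K_\Lambda(0) \qquad [48]$$

One can show that if $P \stackrel{\mathrm{def}}{=} \lim_{L\to+\infty} P_\Lambda$ is positive,
then there exist multiple Gibbs states. By [40], [41],
and [48], we get that for any $\ell \in \Lambda$,

$$K_\Lambda(\ell,\ell) = P_\Lambda + \frac{1}{|\Lambda|}\sum_{p\in\Lambda_*\setminus\{0\}} \hat K_\Lambda(p) \qquad [49]$$

Suppose that, cf. [5],

$$K_\Lambda(\ell,\ell) \ge \kappa > 0 \qquad [50]$$

with $\kappa$ independent of $\Lambda$ and $J$. Employing in [49]
this estimate and [43], and passing to the limit
$L \to +\infty$, we get

$$P \ge \kappa - \mathcal{I}(d)N/2J \qquad [51]$$

where

$$\mathcal{I}(d) \stackrel{\mathrm{def}}{=} \frac{1}{(2\pi)^d}\int_{(-\pi,\pi]^d} \frac{dp}{E(p)} \qquad [52]$$

which is finite for $d \ge 3$. Thereby, we have proved
the following:


Theorem 11 For the spin model [22], [23], there
exist multiple Gibbs states, and hence multiple
phases, if $d \ge 3$ and $J > \mathcal{I}(d)N/2\kappa$.


Finally, let us pay some attention to the estimate
[50], which is closely related to the properties of the
single-spin measure $\chi$ (note that $\chi$ played no role in
obtaining [26] and [43]). If it is the uniform measure
on the unit sphere $S^{N-1} \subset \mathbb{R}^N$, then $K_\Lambda(\ell,\ell) = 1$ and
[50] is trivial. In general, one has to employ some
technique to obtain such an estimate.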
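
The constant in [52] can be approximated by the finite-volume sum that appears in [49] once the bound [43] is inserted. The sketch below is illustrative only; it assumes $E(p) = \sum_j (1 - \cos p_j)$ and the momentum set $p_j = -\pi + (\pi/L)s_j$, $s_j = 1, \dots, 2L$, and the function name `lattice_sum` is hypothetical.

```python
import itertools
import math


def lattice_sum(d, L):
    """Finite-volume analogue of [52]: (1/|Lambda|) * sum over nonzero p of 1/E(p).

    Momenta: p_j = -pi + (pi/L) * s_j with s_j = 1..2L, so |Lambda| = (2L)^d.
    E(p) = sum_j (1 - cos p_j).
    """
    side = 2 * L
    one_minus_cos = [1.0 - math.cos(-math.pi + math.pi * s / L) for s in range(1, side + 1)]
    total = 0.0
    for combo in itertools.product(one_minus_cos, repeat=d):
        energy = sum(combo)
        if energy > 1e-12:  # skip the zero mode p = 0
            total += 1.0 / energy
    return total / side ** d


if __name__ == "__main__":
    for L in (4, 8, 16):
        print("d=3, L=%d: %.4f" % (L, lattice_sum(3, L)))
    for L in (4, 16, 64):
        print("d=2, L=%d: %.4f" % (L, lattice_sum(2, L)))
```

For $d = 3$ the sums increase slowly with $L$ towards the value of the integral quoted in the text (about 0.505), whereas for $d = 2$ they keep growing, in line with the requirement $d \ge 3$ in Theorem 11.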


Reflection Positivity and Phase 
Transitions in Quantum Systems 


As in the classical case, the way of proving the phase
transition for appropriate models leads from an
estimate like [17] to Gaussian domination and then
to the infrared bound. However, here this way is
much more complicated, so within the scope of this
article we can only sketch its main elements, following
the original paper by Dyson et al. (1978), where
the interested reader can find the details. As above,
we start by studying reflection positive functionals.


Reflection Positivity in Nonabelian Case 


Again we consider a finite set $\Lambda$, $|\Lambda|$ being even. For every
$\ell \in \Lambda$, let a complex Hilbert space $\mathcal{H}_\ell$ be given. This is
the single-spin physical Hilbert space for our quantum
system. We suppose that all $\mathcal{H}_\ell$, $\ell \in \Lambda$, are the copies of a
certain finite-dimensional space $\mathcal{H}$. The physical Hilbert
space $\mathcal{H}_\Lambda$ corresponding to $\Lambda$ is the tensor product of
$\mathcal{H}_\ell$, $\ell \in \Lambda$. Let $\mathcal{A}_\Lambda$ be the algebra of all linear operators
defined on $\mathcal{H}_\Lambda$. This is the algebra of observables in our
case; it is noncommutative (nonabelian) and contains the
unit element $I$, the identity operator. As above, $\Lambda$ splits
into two subsets $\Lambda_\pm$, which are the mirror images of each
other, that is, we are given a reflection $\rho: \Lambda \to \Lambda$, such
that $\rho(\Lambda_+) = \Lambda_-$. This allows us to introduce the
corresponding subalgebras $\mathcal{A}_\Lambda^\pm$ by setting the elements
of $\mathcal{A}_\Lambda^+$ to be of the form $A \otimes I$, where $A: \mathcal{H}_{\Lambda_+} \to \mathcal{H}_{\Lambda_+}$ is a
linear operator and $I$ is the identity operator on $\mathcal{H}_{\Lambda_-}$.
Respectively, the elements of $\mathcal{A}_\Lambda^-$ are to be of the form
$I \otimes A$. Then we define the map $\vartheta: \mathcal{A}_\Lambda^+ \to \mathcal{A}_\Lambda^-$ as

$$\vartheta(A \otimes I) = I \otimes \bar A \qquad [53]$$

where $A \mapsto \bar A$ is complex (not Hermitian) conjugation; it
may be realized as transposing and taking Hermitian
conjugation. For $A_1, \dots, A_n \in \mathcal{A}_\Lambda$, one has
$\overline{A_1 \cdots A_n} = \bar A_1 \cdots \bar A_n$. We also suppose that $\vartheta$ possesses the
properties [8]. A linear functional $\varphi: \mathcal{A}_\Lambda \to \mathbb{R}$ is called
RP (with respect to the pair $\rho$, $\vartheta$) if it has the property [9].


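Definition 12 A functional $\varphi$ is called generalized
reflection positive (GRP) if for any $A_1, \dots, A_n \in \mathcal{A}_\Lambda^+$,

$$\varphi[A_1\vartheta(A_1) \cdots A_n\vartheta(A_n)] \ge 0 \qquad [54]$$

In principle, this notion differs from the reflection
positivity only in the nonabelian case. However, if
the algebras $\mathcal{A}_\Lambda^\pm$ commute (they do commute in our
case), a functional $\varphi$ is RP if and only if it is GRP.

Example 13 Let

$$\varphi(A) = \mathrm{trace}(A), \qquad A \in \mathcal{A}_\Lambda \qquad [55]$$

Since the space $\mathcal{H}_\Lambda$ is finite dimensional, this $\varphi$ is
well defined. It is GRP. Indeed, as the algebras $\mathcal{A}_\Lambda^\pm$
commute, we have

$$\begin{aligned}
\varphi[A_1 \otimes I \cdot \vartheta(A_1 \otimes I) \cdots A_n \otimes I \cdot \vartheta(A_n \otimes I)]
&= \varphi[A_1 \otimes I \cdots A_n \otimes I \cdot \vartheta(A_1 \otimes I) \cdots \vartheta(A_n \otimes I)] \\
&= \varphi[(A_1 \cdots A_n \otimes I) \cdot (I \otimes \overline{A_1 \cdots A_n})] \\
&= \mathrm{trace}[A_1 \cdots A_n] \cdot \mathrm{trace}[\overline{A_1 \cdots A_n}] \\
&= |\mathrm{trace}[A_1 \cdots A_n]|^2 \ge 0
\end{aligned}$$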
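
The computation in Example 13 can be reproduced numerically for small matrices. The sketch below is an illustration under simplifying assumptions: each half of the system is represented by one $2\times 2$ matrix factor, $A \otimes I$ and $\vartheta(A \otimes I) = I \otimes \bar A$ are built with an explicit Kronecker product, and all helper names are hypothetical.

```python
def kron(a, b):
    """Kronecker (tensor) product of two square matrices given as nested lists."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)] for i in range(size)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def conjugate(a):
    """Entrywise complex conjugation (not the Hermitian adjoint)."""
    return [[complex(z).conjugate() for z in row] for row in a]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def grp_value(operators):
    """trace[A_1 theta(A_1) ... A_n theta(A_n)], with A_k acting as A_k (x) I
    and theta(A_k (x) I) = I (x) conj(A_k)."""
    dim = len(operators[0])
    eye = identity(dim)
    product = identity(dim * dim)
    for a in operators:
        product = matmul(product, kron(a, eye))
        product = matmul(product, kron(eye, conjugate(a)))
    return trace(product)


def trace_of_product(operators):
    product = identity(len(operators[0]))
    for a in operators:
        product = matmul(product, a)
    return trace(product)


if __name__ == "__main__":
    a1 = [[1 + 2j, 0.5j], [3, -1j]]
    a2 = [[0, 1], [1j, 2]]
    print(grp_value([a1, a2]), abs(trace_of_product([a1, a2])) ** 2)
```

The two printed numbers agree up to rounding, which is the identity $\mathrm{trace}[\cdots] = |\mathrm{trace}[A_1 \cdots A_n]|^2$ of the example for this particular choice of matrices.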
The Cauchy-Schwarz inequality [13] obviously
holds also in the quantum case. By means of this
inequality and the Trotter product formula

$$\exp(A + B) = \lim_{n\to+\infty} [\exp(A/n)\exp(B/n)]^n \qquad [56]$$

one can prove that every RP functional obeys an
estimate like [17]. Thereby, we have the following
analog of Lemma 6:


Lemma 14 Let $A, B, C_1, \dots, C_m \in \mathcal{A}_\Lambda^+$ be any self-adjoint
operators possessing real matrix representation
and $a_1, \dots, a_m$ be any real numbers. Then

$$\left\{\mathrm{trace}\exp\left(A + \vartheta(B) - \sum_{n=1}^m [C_n - \vartheta(C_n) - a_n]^2\right)\right\}^2 \le \mathrm{trace}\exp\left(A + \vartheta(A) - \sum_{n=1}^m [C_n - \vartheta(C_n)]^2\right) \times \mathrm{trace}\exp\left(B + \vartheta(B) - \sum_{n=1}^m [C_n - \vartheta(C_n)]^2\right) \qquad [57]$$
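
The Trotter product formula [56] is easy to observe numerically for non-commuting matrices. The sketch below is illustrative only: it uses two real $2\times 2$ Pauli-type matrices, a truncated Taylor series for the matrix exponential, and hypothetical helper names.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_add(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def mat_scale(a, c):
    return [[c * x for x in row] for row in a]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_exp(a, terms=40):
    """Matrix exponential via a truncated Taylor series (adequate for small norms)."""
    result = identity(len(a))
    term = identity(len(a))
    for k in range(1, terms):
        term = mat_scale(mat_mul(term, a), 1.0 / k)
        result = mat_add(result, term)
    return result


def trotter_error(a, b, n):
    """Largest entrywise difference between [exp(a/n) exp(b/n)]^n and exp(a + b)."""
    step = mat_mul(mat_exp(mat_scale(a, 1.0 / n)), mat_exp(mat_scale(b, 1.0 / n)))
    approx = identity(len(a))
    for _ in range(n):
        approx = mat_mul(approx, step)
    exact = mat_exp(mat_add(a, b))
    size = len(a)
    return max(abs(approx[i][j] - exact[i][j]) for i in range(size) for j in range(size))


SIGMA_X = [[0.0, 1.0], [1.0, 0.0]]
SIGMA_Z = [[1.0, 0.0], [0.0, -1.0]]

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, trotter_error(SIGMA_X, SIGMA_Z, n))
```

The error shrinks roughly in proportion to $1/n$, which is the behaviour expected of the first-order product formula.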


Gaussian Domination and Phase Transitions 


To proceed further we need a concrete model with
finite-dimensional physical Hilbert spaces. Like every
quantum model, it is defined by its Hamiltonian. Let
$\Lambda \subset \mathbb{Z}^d$ be the box [35] and $(\Lambda, E)$ be the same
graph as in the subsection “Infrared Bound.” The
periodic Hamiltonian of our model is


Hy => Or+5 XO [Se = Sef? |58] 


LEA (0\EE 


where at each ¿€ A we have the copies Q,, 
ae eee oy of N+ 1 basic operators, acting in the 
Hilbert space Hp, and 


N 
ISe — eg _ > (8. E si)" 


k=1 


The only condition we impose so far is that all these 
Operators can simultaneously be chosen as real 
matrices. For h = (hw) myer € RN we set 


BY Ou 


LEA 


"z S S.—Se—het}} [59] 


2 L\EE 


Lito) = trace exp (- 


where (> 0 is the inverse temperature. 


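Theorem 15 For the model [58] and any
$h = (h_{\ell\ell'})_{\langle\ell,\ell'\rangle\in E} \in \mathbb{R}^{N|E|}$,

$$Z_\Lambda(h) \le Z_\Lambda(0) \qquad [60]$$


The proof is performed by means of Lemma 14.
The periodic local Gibbs state of the model [58] at
the inverse temperature $\beta$, analogous to the state [31], is

$$\varphi_\Lambda(A) = \mathrm{trace}\{A\exp(-\beta H_\Lambda)\}/Z_\Lambda(0), \qquad A \in \mathcal{A}_\Lambda \qquad [61]$$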


As in the classical case, one can define the parameter
[47]. However, now the fact that $\lim_{L\to+\infty} P_\Lambda > 0$
does not yet imply the phase transition. One has to
prove a more general fact
) l>a [62] 


os ! ae PA (a JA’ D a 


LEN 
where $\Lambda'$ is the box [35] of side $2L'$. Furthermore, in
the quantum case the Gaussian domination [60]
does not lead directly to the estimate [43], which
yields [51]. Instead, one can get a bound like [43]
but for the Duhamel two-point function (DTF).
Given $A, B \in \mathcal{A}_\Lambda$, their DTF is








1 
(A, B) = / by (Ae Ha Bede [63] 


By means of [56] one can show that 





1 
ee) = 70) 
{-2-tracelexp(éA-+nB—Ha)Ib 64 
x trace|exp = 
on í i f=n=0 
Let S(p)=(S"(p),...,8™(p)),p € Ax, be the Fourier 


image of Sy, defined b 375-13 


(S(p), 8(-p)) = S (S(p), 8p) 


k=1 
Theorem 16 For all p € A,\{0}, it follows that 


7 7 N 
(SoSo) < zE 

To prove this statement one has to use the
Gaussian bound [60] exactly as in the case of
Theorem 10. The second derivative with respect to
$\xi$ gives the corresponding DTF (see [64]).

Now let us indicate how the infrared bound [65]
leads to the phase transition. To this end we use the
simplest quantum spin model with the Hamiltonian
[58], for which $Q_\ell = 0$, $N = 2$, and $S_\ell^{(k)}$, $k = 1, 2$,
being the copies of the Pauli matrices


01 
(1) _ 
= (1 o) 


Ke, 2) = oa (Sp sf) =1 = 
forall ¿€ A, kR=1,2 [66] 


8]. Then 


[65] 


Then 


which gives the bound $\kappa$ (see [50]). For $A, B \in \mathcal{A}_\Lambda$,
by $[A, B]$ we denote the commutator $AB - BA$. Set


=! (p) = da([8(), [aas] 
k=1,2 [67 


The phase transition in the model we consider can
be established by means of the following statement
(see Dyson et al. 1978, Theorem 5.1).


Proposition 17 Suppose there exist S*)(p),k=1, 2, 
p E€ (-7, r]? such that, for all LEN, 


op) <x (p), k=1,2, ped. [68] 


Then the model undergoes a phase transition at a
certain finite $\beta$ if $d \ge 3$ and


= I. cont 


for a certain, and hence for both, k=1,2. 


1/2 
dp < 1 [69] 








Thus to prove the phase transition we have to 
estimate D (p), k = 1,2. By means of the Cauchy- 
Schwarz inequality, the estimate [69] may be 
transformed into the following: 


where $\mathcal{I}(d)$ is the same as in [52]. The integral on
the left-hand side can be estimated from above by
$8\sqrt{d(d+1)}$; hence, the latter inequality holds if

$$\mathcal{I}(d)\sqrt{d(d+1)} < 2$$

which holds for all $d \ge 3$. In particular, $\mathcal{I}(3) \approx 0.505$.
Introduction


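If $A$ is a finite, say $N \times N$, matrix with
complex coefficients, the following easy equality
gives an expression for the polynomial
$\prod_{k=1}^N (1 - z\lambda_k) = \det(\mathrm{Id} - zA)$:

$$\det(\mathrm{Id} - zA) = \exp\left(-\sum_{n=1}^\infty \frac{z^n}{n}\,\mathrm{tr}\,A^n\right) \qquad [1]$$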


(here, Id denotes the identity matrix and tr is the
trace of a matrix). Even in this trivial finite-dimensional
case, the $z$-radius of convergence of
the logarithm of the right-hand side only gives
information about the spectral radius (the modulus
of the largest eigenvalue) of $A$. The zeros of the
left-hand side (i.e., the inverses $z = 1/\lambda_k$ of the
nonzero eigenvalues of $A$) can only be located
after extending holomorphically the right-hand
side. The purpose of this article is to discuss
some dynamical situations in which $A$ is replaced
by a linear bounded operator $\mathcal{L}$, acting on an
infinite-dimensional space, and for which a dynamical
determinant (or dynamical $\zeta$-function), constructed
from periodic orbits, takes the part of the
right-hand side. In the examples presented, $\mathcal{L}$ will
be a transfer operator associated to a weighted
discrete-time dynamical system: given a transformation
$f: M \to M$ on a compact manifold $M$ and
a function $g: M \to \mathbb{C}$, we set

$$\mathcal{L}\varphi = g \cdot \varphi \circ f^{-1} \qquad [2]$$

(If $f$ is not invertible, it is understood, e.g., that $f$
has at most finitely many inverse branches, and
that the right-hand side of [2] is the sum over
these inverse branches, see the next section.) We
let $\mathcal{L}$ act on a Banach space of functions or
distributions $\varphi$ on $M$. For suitable $g$ (in particular
$g = |\det Tf^{-1}|$ when this Jacobian makes sense), the
spectrum of $\mathcal{L}$ is related to the fine statistical
properties of the dynamics $f$: existence and
uniqueness of equilibrium states (related to the
maximal eigenvector of $\mathcal{L}$), decay of correlations
(related to the spectral gap), limit laws, entropies,
etc.; see, for example, Baladi (1998) or
Cvitanović et al. (2005). The operator $\mathcal{L}$ is not
always trace-class; indeed, it sometimes is not
compact on any reasonable space. Even worse, its
essential spectral radius may coincide with its
spectral radius. (Recall that the essential spectral
radius of a bounded linear operator $\mathcal{L}$ acting on a
Banach space is the infimum of those $\rho > 0$, such
that the spectrum of $\mathcal{L}$ outside of the disk of
radius $\rho$ is a finite set of eigenvalues of finite
algebraic multiplicity.) However, various techniques
allow us to prove that a suitable dynamically
defined replacement for the right-hand side of [1]
extends holomorphically to a disk in which its
zeros describe at least part of the spectrum of $\mathcal{L}$.
Some of these techniques have a “regularization”
flavor, and we shall concentrate on them.
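
The finite-dimensional identity [1] can be checked directly for a small matrix. The sketch below is an illustration only: the $2\times 2$ matrix and the value of $z$ are arbitrary choices made so that $|z|$ times the spectral radius is below 1 and the series converges, and the helper names are hypothetical.

```python
import cmath


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def det_side(a, z):
    """det(Id - z A) for a 2x2 matrix A, computed from the definition."""
    return (1 - z * a[0][0]) * (1 - z * a[1][1]) - (z * a[0][1]) * (z * a[1][0])


def exp_side(a, z, terms=200):
    """exp(-sum_{n=1}^{terms} z^n/n * tr A^n), the truncated right-hand side of [1]."""
    power = [[1.0, 0.0], [0.0, 1.0]]
    series = 0.0
    for n in range(1, terms + 1):
        power = mat_mul(power, a)
        series += z ** n / n * trace(power)
    return cmath.exp(-series)


if __name__ == "__main__":
    matrix = [[0.5, 0.2], [0.1, 0.3]]
    z = 0.7 + 0.2j
    print(det_side(matrix, z), exp_side(matrix, z))
```

Outside the disk of convergence the series diverges while the left-hand side remains a polynomial in $z$, which is the point made in the text about needing a holomorphic extension to locate the zeros.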

In the following section, we present the simplest
case: analytic expanding or hyperbolic dynamics,
for which no regularization is necessary and the
Grothendieck-Fredholm theory can be applied.
Next, we consider analytic situations where
finitely many neutral periodic orbits introduce
branch cuts in the dynamical determinant, and
see how to “regularize” them. Finally, we discuss a
kneading operator regularization approach,
inspired by the work of Milnor and Thurston,
and applicable to dynamical systems with finite
smoothness.

Despite the terminology, none of the regularization
techniques discussed below match the following
“$\zeta$-regularization” formula:

$$\prod_{k=1}^\infty a_k = \exp\left(-\frac{d}{ds}\sum_{k=1}^\infty a_k^{-s}\Big|_{s=0}\right) \qquad [3]$$

(For information about the above $\zeta$-regularization
and its applications to physics, we refer, e.g., to
Elizalde 1995. See also Voros (1987) and Fried
(1986) for more geometrical approaches and further
references, e.g., to the work of Ray and Singer.)

We do not cover all aspects of dynamical
$\zeta$-functions here. For more information and references,
we refer to our survey Baladi (1998), to the
more recent surveys by Pollicott (2001) and Ruelle
(2002), and also to the exhaustive account by
Cvitanović et al. (2005), which contains a rich
array of physical applications.


The Grothendieck-Fredholm Case 


Let $M$ be a real analytic compact manifold (e.g., the
circle or the $d$-torus), and let $f: M \to M$ be real
analytic and $g: M \to \mathbb{C}$ be analytic.

First suppose that $f$ is uniformly expanding, that
is, there is $\lambda > 1$ so that $\|Tf(v)\| \ge \lambda\|v\|$. (For
example, $f(z) = z^2$ on the unit circle, or a small
analytic perturbation thereof.) Consider

$$\mathcal{L}_{f,g}\varphi(x) = \sum_{f(y)=x} g(y)\varphi(y) \qquad [4]$$

(For example, with $g(y) = 1/|\det Tf(y)|$ or
$1/|\det Tf(y)|^s$.) Ruelle (1976) proved that an
operator $\mathcal{L}_0$, which is essentially the same as $\mathcal{L}_{f,g}$
(the difference, if any, arises from the use of Markov
partitions, especially in higher dimensions), acting
on a Banach space of holomorphic and bounded
functions, is not only compact, but is in fact a
nuclear operator in the sense of Grothendieck. In
particular, the traces of all its powers are well
defined, and the Grothendieck-Fredholm (Gohberg
et al. 2000) determinant

$$d_0(z) = \exp\left(-\sum_{n=1}^\infty \frac{z^n}{n}\,\mathrm{tr}\,\mathcal{L}_0^n\right) \qquad [5]$$

extends to an entire function of finite order, the
zeros of which are exactly the inverses of the
nonzero eigenvalues of $\mathcal{L}_0$. (The order of the zero
coincides with the algebraic multiplicity of the
eigenvalue.) Ruelle also proved that the traces can
be written as sums over periodic orbits:

$$\mathrm{tr}\,\mathcal{L}_0^n = \sum^{*}_{x:\, f^n(x)=x} \frac{\prod_{k=0}^{n-1} g(f^k(x))}{|\det(\mathrm{Id} - Tf_x^{-n}(x))|}$$

where $\sum^{*}$ means that the fixed points of $f^n$ lying in
the intersection of two or more elements of the
Markov partition must be counted two or more
times. (Note that if $f^n(x) = x$, then this closed orbit
gives a natural inverse branch $f_x^{-n}$ for $f^n$.) Taking into
account the periodic orbits on the boundaries of the
Markov partition, Ruelle expresses the following
“dynamical determinant”:

$$d_{f,g}(z) = \exp\left(-\sum_{n=1}^\infty \frac{z^n}{n}\sum_{x:\, f^n(x)=x} \frac{\prod_{k=0}^{n-1} g(f^k(x))}{|\det(\mathrm{Id} - Tf_x^{-n}(x))|}\right) \qquad [6]$$

as an alternated product of determinants $d_0(z)$ as in [5].
The expression [6] is sometimes also called a
“dynamical $\zeta$-function,” but we prefer to reserve this
terminology for the following power series:

$$\zeta_{f,g}(z) = \exp\left(\sum_{n=1}^\infty \frac{z^n}{n}\sum_{x:\, f^n(x)=x}\ \prod_{k=0}^{n-1} g(f^k(x))\right) \qquad [7]$$

It is not difficult to write $\zeta_{f,g}(z)$ as (Baladi 1998) an
alternated product of determinants $d_{f,g_j}(z)$, for
$j = 0, \dots, d$, and appropriate weights $g_j$.

The results just described hold in more
generality, for example, for piecewise bijective and
analytic interval maps. Such maps, $f$, appear
naturally, for example, when considering Schottky
subgroups of PSL(2, Z). We mention the recent
work of Guillopé-Lin-Zworski (2004), who let the
transfer operator associated to such $f$ and weights
$g_s = 1/|f'(y)|^s$ act (as trace-class operators) on
Hilbert spaces of holomorphic functions.
This allows them to obtain precise estimates for the
number of zeros of $s \mapsto d_{f,g_s}(1)$ in the complex
plane: these zeros are the resonances (in the sense of
the spectrum of the Laplacian).

Note that the nuclearity properties extend also to
the Gauss map $f(x) = \{1/x\}$, which has infinitely
many inverse branches, if the weight $g$ has summability
properties over the branches (e.g.,
$g(y) = |1/f'(y)|^s$, where $s$ is a complex parameter,
with $\Re s > 1/2$). The dynamical determinant $d_{f,g}(z)$
for the transfer operator of the Gauss map is related
to the Selberg $\zeta$-function (see e.g., Chang and Mayer
(2001) and references therein).

Next, assume that M and g are as before, but f is a 
uniformly hyperbolic real analytic diffeomorphism. 
For example, $M$ is the 2-torus and $f$ is a small real
analytic perturbation of the linear automorphism 


G 1) 


More generally, we may assume that $f$ is a real
analytic Anosov diffeomorphism, that is, there are
$C \ge 1$ and $\lambda > 1$ such that the tangent bundle
decomposes as $TM = E^u \oplus E^s$, where the dynamical
bundles $E^u$ and $E^s$ are $Tf$-invariant, with
$\|Tf^n|_{E^s}\| \le C\lambda^{-n}$ and $\|Tf^{-n}|_{E^u}\| \le C\lambda^{-n}$ for all $n \in \mathbb{Z}_+$. In
general, the smoothness of $x \mapsto E^u(x)$ and $E^s(x)$ is
only Hölder. Under the very strong additional
assumption that $E^u(x)$ and $E^s(x)$ are real analytic,
Ruelle (1976) (see also Fried (1986)) showed that
the power series $d_{f,g}(z)$ can again be written as a
finite alternated product (this product being again
an artifact of the Markov partition) of entire
functions of finite order. For this, he constructed
auxiliary transfer operators associated to the
expanding (and analytic!) quotiented dynamics
acting on holomorphic functions on disks. The
analyticity assumption on the dynamical bundles
was later lifted by Rugh (1996) (see also Fried
(1995)), who let their transfer operators act on
Banach topological tensor products of spaces of
holomorphic functions on a disk with the dual of
such a space. In all these cases, the transfer
operator is a nuclear operator in the sense of
Grothendieck and no regularization is needed.
(More recent work of Kitaev (1999), when applied
to this analytic setting, shows that the “meromorphic”
function $d_{f,g}(z)$ in fact does not have
poles.)


Regularization and Intermittency 


Consider the interval $M = [0, 1]$, and $f$ defined on $M$
by $f(x) = f_1(x) = x/(1-x)$ on $[0, 1/2]$, and $f(x) =
f_2(x) = (1-x)/x$ on $[1/2, 1]$. (This is the Farey
map, which appears naturally when considering
continued fractions.) Each of the two branches is
an analytic bijection onto $[0, 1]$. The second branch
is expanding, but the first one, $f_1$, has a (parabolic)
neutral fixed point at $x = 0$ (the expansion is
$f_1(x) = x + x^2 + x^3 + \cdots$). Let $g = g_s$ be an analytic
weight of the form $g(y) = 1/|f'(y)|^s$ for $\Re s > 1/2$. We
are interested in the spectrum of the operator $\mathcal{L}_{f,g}$
associated with the pair $(f, g)$ by [4]. Clearly, the
expression [6] is not a good candidate for an analog
of the Fredholm determinant of $\mathcal{L}_{f,g}$. Rugh (1996)
introduced a Banach space $\mathcal{B}$ of functions in a
complex neighborhood of $M$, having a controlled
singularity at 0, and such that the spectral radius of
$\mathcal{L}_{f,g}$ on $\mathcal{B}$ is equal to 1, and such that the following
regularized determinant


dy, (2) 


CO „n n—1 Ry 
sep- D Hesta g 


n=1 xE(0,1]: f” (x =x X 


is a holomorphic function in the cut complex plane 
{z €C|z¢[1,co)}. Furthermore, its zeros z in this 
cut plane are in bijection with the spectrum of Ly. ¢|, 
outside of the unit interval [0,1], and this spectrum 
consists of eigenvalues 1/z of finite multiplicities. 
Finally, these eigenvalues can only accumulate at 0 
or 1, although each point in the unit interval belongs 
to the spectrum of Ly ,. In particular, the essential 
spectral radius of Lf on B coincides with its 
spectral radius. 

Let us define the Banach space B and explain the 
key ideas in the proof of the above result (Rugh’s 
claim is in fact more general than the statement 
above and applies to a class of maps f with neutral 
fixed points). The starting point is the decomposition 


\[
\mathcal{L}_{f,g} = \mathcal{L}_1 + \mathcal{L}_2
\]


where $\mathcal{L}_i\varphi=\varphi\circ f_i^{-1}\cdot|(f_i^{-1})'|^s$. The operator $\mathcal{L}_2$ is of 
the type discussed in the previous section, and it is 
nuclear when acting, for example, on bounded 
holomorphic functions in a complex neighborhood 
of $M$. Since $f_1$ is not expanding (because of the 
parabolic fixed point at 0), other ideas must be used 
to handle the operator $\mathcal{L}_1$. The change of coordinates 
(this idea goes back to Fatou) $w=1/x$ replaces the 
weak contraction $f_1^{-1}$ by the translation $w\mapsto w + 1$ in 
a suitable domain containing a half-plane $\Re w > w_0$. 
In order to take into account the weight $g_s$, it is 
convenient to use the change of variables 
$\psi(w) = \varphi(1/w)\cdot w^{-2s}$. Indeed, in the new coordinates 
the operator $\mathcal{L}_1$ reads as 


\[
\mathcal{M}_1\psi(w) = \psi(w + 1)
\]
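
The coordinate change can be checked by hand on a concrete map. The sketch below is only an illustration under stated assumptions: it takes the parabolic branch to be the Farey branch $f_1(x) = x/(1-x)$ (whose expansion is $x + x^2 + x^3 + \cdots$), reads the change of variables as $\psi(w) = \varphi(1/w)\,w^{-2s}$, and uses the integer value $s = 1$ and a polynomial test function so that exact rational arithmetic applies. The names `f1_inverse`, `phi`, `L1`, and `to_w` are hypothetical helpers, not notation from the article.

```python
from fractions import Fraction

S = 1  # illustrative integer exponent s (assumption; the text allows Re s > 1/2)


def f1_inverse(x):
    """Inverse of the assumed parabolic branch f1(x) = x / (1 - x)."""
    return x / (1 + x)


def phi(x):
    """Arbitrary polynomial test function (hypothetical choice)."""
    return 1 + x + x * x


def L1(phi_fn, x):
    """(L1 phi)(x) = phi(f1^{-1}(x)) * |(f1^{-1})'(x)|^s, using (f1^{-1})'(x) = 1/(1+x)^2."""
    return phi_fn(f1_inverse(x)) * (1 / (1 + x) ** 2) ** S


def to_w(phi_fn, w):
    """Change of variables psi(w) = phi(1/w) * w^(-2s)."""
    return phi_fn(1 / w) * w ** (-2 * S)


def check(w):
    """Verify both claims at the point w = 1/x."""
    x = 1 / w
    # w = 1/x turns f1^{-1} into the translation w -> w + 1
    translated = 1 / f1_inverse(x) == w + 1
    # in the new coordinate, L1 acts as psi(w) -> psi(w + 1)
    intertwined = to_w(lambda y: L1(phi, y), w) == to_w(phi, w + 1)
    return translated and intertwined


samples = [Fraction(3, 2), Fraction(5), Fraction(22, 7)]
results = [check(w) for w in samples]
print(results)
```

For this particular branch the translation is exact on the whole half-line; for a general map with a neutral fixed point the text only claims it in a suitable domain containing a half-plane.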


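The next step consists in letting $\mathcal{M}_1$ act on the 
Banach space $\mathcal{B}_w$ of Laplace transforms of 
$L^1(\mathbb{R}^+, \text{Lebesgue})$, that is, functions 


\[
\psi(w) = \int_0^\infty e^{-tw}\,\hat\psi(t)\,dt
\]


with the induced norm $\|\psi\|_w = \int |\hat\psi(t)|\,dt$. Since $\mathcal{M}_1$ 
maps $\hat\psi(t)$ to $e^{-t}\hat\psi(t)$, it is not difficult to see that the 
spectrum of $\mathcal{M}_1$ on $\mathcal{B}_w$ (and thus of $\mathcal{L}_1$ on the pullback 
$\mathcal{B}$ of $\mathcal{B}_w$ by $\Phi$, which consists of functions in a complex 
neighborhood of $[0,1]$, holomorphic in a sector at 0, 
and with a possible, but controlled, singularity at 0) is 
the closed unit interval. One can check that $\mathcal{L}_2$ is 
nuclear on $\mathcal{B}$. Composing a bounded operator with a 
nuclear operator gives a nuclear operator. If $1/z \notin [0, 1]$, 
the resolvent $(1 - z\mathcal{L}_1)^{-1}$ is a bounded operator, 
and therefore, for such $z$, the operator 


\[
P(z) := z\mathcal{L}_2(1 - z\mathcal{L}_1)^{-1} \tag*{[9]}
\]


is nuclear on $\mathcal{B}$. We view $P(z)$ as a “regularized” 
version of $\mathcal{L}_{f,g} = \mathcal{L}_1 + \mathcal{L}_2$. Now, since 


\[
(1 - z\mathcal{L}_{f,g}) = (1 - z(\mathcal{L}_1+\mathcal{L}_2))
= \left(1 - z\mathcal{L}_2(1 - z\mathcal{L}_1)^{-1}\right)(1 - z\mathcal{L}_1)
\]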


it is not surprising that one can prove (Rugh 1996) 
that the Fredholm determinant 


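\[
u \mapsto \det\left(1 - \mathcal{L}_2(u - \mathcal{L}_1)^{-1}\right)
\]


(which is holomorphic in $u \notin [0, 1]$) has as its zero set 
$\mathrm{sp}(\mathcal{L}_{f,g}|_{\mathcal{B}}) \setminus [0, 1]$, and that this set consists of isolated 
eigenvalues of finite multiplicity (equal to the order of 
the corresponding zero) for $\mathcal{L}_{f,g}$. Formally, 


\[
P(z) = \sum_{k=0}^{\infty} z^{k+1}\,\mathcal{L}_2\mathcal{L}_1^{k} \tag*{[10]}
\]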


so that the regularization we just described can be 
viewed as mirroring an induction (or renormaliza- 
tion) procedure, where the dynamics f is replaced by 
the first-return map to the “chaotic” part of the 
phase space [0, 1/2]. (For the Farey map, the induced 
map is just the Gauss map.) The formal equality [10] 
is also behind the fact that (Rugh 1996) 


(1 — zl) 


5 11 £5 as(f*) 


tr P(z)" = — 
x+0: f” (x)=x 1- Th, 


An extension of this theory to the two-dimensional 
setting has been obtained by Baladi, Pujals, and 
Sambarino. 
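
The two algebraic facts used in the regularization above can be written out in one line each. The following is a short worked check, reading [9] as $P(z) = z\mathcal{L}_2(1 - z\mathcal{L}_1)^{-1}$ (an assumption about the garbled display) and treating the Neumann series purely formally, as the text does for [10].

```latex
\begin{align*}
(1 - P(z))(1 - z\mathcal{L}_1)
  &= (1 - z\mathcal{L}_1) - z\mathcal{L}_2 (1 - z\mathcal{L}_1)^{-1}(1 - z\mathcal{L}_1) \\
  &= 1 - z\mathcal{L}_1 - z\mathcal{L}_2
   = 1 - z\mathcal{L}_{f,g}, \\
P(z) = z\mathcal{L}_2 (1 - z\mathcal{L}_1)^{-1}
  &= z\mathcal{L}_2 \sum_{k \ge 0} z^{k} \mathcal{L}_1^{k}
   = \sum_{k \ge 0} z^{k+1}\, \mathcal{L}_2 \mathcal{L}_1^{k}
  \quad \text{(formally)}.
\end{align*}
```

The term $\mathcal{L}_2\mathcal{L}_1^{k}$ corresponds to $k$ passes through the parabolic branch followed by one pass through the expanding branch, which is why the expansion mirrors the first-return (induced) map.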


Regularization and Kneading 
Determinants 


Up to now we have only discussed analytic dynamical 
systems, for which hyperbolicity (or uniform expansion) 
guaranteed that the transfer operator (or a 
regularized version thereof) was compact, even 
nuclear, on a natural Banach space. When considering 
hyperbolic invertible (or expanding noninvertible) 
maps $f$, and weights $g$ with “finite smoothness,” say 
$C^r$ for some finite $r > 1$, the transfer operator defined 
by [2] or [4] is usually not compact on any 
infinite-dimensional space. However, one can often prove a 
“Lasota-Yorke” type inequality (see e.g., Baladi 
(1998)) which ensures that the essential spectral radius 
$\rho_{ess}(\mathcal{L}_{f,g})$, defined in the “Introduction,” is strictly 
smaller than the spectral radius. Then, the goal is to 
prove that the dynamical determinant [6] defines a 
holomorphic function in the disk of radius $1/\rho_{ess}$, and 
that its zeros in this disk are exactly the inverses of the 
eigenvalues of $\mathcal{L}_{f,g}$. For uniformly expanding $C^r$ maps 
$f$ on compact manifolds, and $C^r$ weights, denoting by 
$\lambda > 1$ the expansion coefficient as in the section “The 
Grothendieck-Fredholm case,” this goal was essentially 
attained by Ruelle (1990). For $\mathcal{L}_{f,g}$ acting on the 
Banach space of $C^r$ functions on $M$, Ruelle proved 
$\rho_{ess}(\mathcal{L}_{f,g}) \le \lambda^{-r}$ and was able to extend $d_{f,g}(z)$ (and 
interpret its zeros) in the disk of radius $\lambda^{r}$. 

For $C^r$ Anosov diffeomorphisms $f$, and $C^r$ weights $g$, 
Pollicott, Ruelle, Haydn, and others obtained important 
results using the symbolic dynamics description (for 
which the maximal smoothness which can be used is 
$r \le 1$, because of the metric-space model). Later, Kitaev 
(1999) was able to show that $d_{f,g}(z)$ extends to a 
holomorphic function in the disk of radius $\lambda^{r/2}$, 
but did not give any spectral interpretation of the 
zeros of $d_{f,g}(z)$. More recently, Liverani (2005) was able 
to give such an interpretation, in a smaller disk however. 

All the works mentioned in the previous paragraph 
are based on some approximation scheme (Taylor 
expansion style). In the early 1990s, a new approach, 
with a regularization flavor, was launched (see e.g., 
Baladi and Ruelle (1996)), initially for piecewise 
monotone interval maps. We present it next. 

Consider a finite set of local homeomorphisms 
$\psi_\omega: U_\omega \to \psi_\omega(U_\omega)$, where each $U_\omega$ is a bounded 
open interval of $\mathbb{R}$, and of associated weight functions 
$g_\omega$ which are continuous, of bounded variation, and 
have support inside $U_\omega$. For example, the $\psi_\omega$ can be the 
inverse branches of a single piecewise monotone 
interval map $f$, and $g_\omega$ can be $g\circ\psi_\omega$ for a single $g$. 
(No contraction assumption is required on the $\psi_\omega$: 
their graph can even coincide with the diagonal on a 
segment.) The transfer operator is now 


\[
\mathcal{M}\varphi = \sum_\omega g_\omega\cdot(\varphi\circ\psi_\omega)
\]


Ruelle obtained an estimate, denoted $R$, for the essential 
spectral radius of $\mathcal{M}$ acting on the Banach space $BV$ of 
functions of bounded variation. The main result of 
Baladi and Ruelle (1996) links the eigenvalues of 
$\mathcal{M}: BV \to BV$ outside of the disk of radius $R$ with 
the zeros of the following “sharp determinant”: 


\[
\det{}^{\#}(\mathrm{Id} - z\mathcal{M}) = \exp\left(-\sum_{n=1}^{\infty}\frac{z^n}{n}\,\mathrm{tr}^{\#}\mathcal{M}^n\right) \tag*{[11]}
\]


where (with the understanding that $y/|y| = 0$ if $y=0$) 


\[
\mathrm{tr}^{\#}\mathcal{M} = \sum_\omega \int \frac{1}{2}\,\frac{\psi_\omega(x) - x}{|\psi_\omega(x) - x|}\,dg_\omega(x)
\]


If the $\psi_\omega$ are strict contractions which form the set 
of inverse branches of a piecewise monotone interval 
map $f$, and $g_\omega=g\circ\psi_\omega$, then integration by parts 
together with the key property that 


\[
d\,\frac{x}{2|x|} = \delta, \quad \text{the Dirac delta at the origin of } \mathbb{R}
\]


show that $\det^{\#}(\mathrm{Id} - z\mathcal{M}) = 1/\zeta_{f,g}(z)$ (recall [7]). If 
one assumes instead only that the graph of each 
admissible composition $\psi^n_{\vec\omega}$ of $n$ successive $\psi_\omega$'s (with 
$n \ge 1$) intersects the diagonal transversally, then 


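\[
\det{}^{\#}(\mathrm{Id} - z\mathcal{M}) = \exp\left(-\sum_{n=1}^{\infty}\frac{z^n}{n}\sum_{\text{admissible } \psi^n_{\vec\omega}}\ \sum_{x:\,\psi^n_{\vec\omega}(x)=x} L(x,\psi^n_{\vec\omega})\prod_{k=0}^{n-1} g_{\omega_k}(\psi^k_{\vec\omega}(x))\right) \tag*{[12]}
\]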


where $L(x,\psi) \in \{-1, 1\}$ is the Lefschetz number of a 
transversal fixed point $x = \psi(x)$ (if $\psi$ is $C^1$ this is just 
$\mathrm{sgn}(1 - \psi'(x))$). Therefore, we call the sharp determinant 
$\det^{\#}(\mathrm{Id} - z\mathcal{M})$ a Ruelle-Lefschetz (dynamical) 
determinant. For a class of “unimodal” interval maps $f$ 
and constant weight $g=1$, the expression [12] with 
Lefschetz numbers, coming from the additional 
transversality assumption, gives that $\det^{\#}(\mathrm{Id} - z\mathcal{M})$ 
is just $1/\zeta^-(z)$, where the “negative $\zeta$-function” 


\[
\zeta^-(z) = \exp\left(+\sum_{n=1}^{\infty}\frac{z^n}{n}\,2\,\#\mathrm{Fix}^-(f^n)\right) \tag*{[13]}
\]


is defined by counting (twice) the sets 


\[
\mathrm{Fix}^-(f^n) = \{x \mid f^n(x) = x,\ f^n \text{ strictly decreasing in a neighborhood of } x\}
\]


of “negative fixed points.” This negative $\zeta$-function 
was studied by Milnor and Thurston, who proved 
the remarkable identity 


\[
(\zeta^-(z))^{-1} = \det(1 + D(z))
\]


where $D(z)$ is a $1\times 1$ “matrix,” which is just a 
power series in $z$ with coefficients in $\{-1,0,+1\}$, 
given by the signed itinerary of the image of the 
turning point (the so-called “kneading” data). 
Returning now to the general setup $\psi_\omega$, $g_\omega$, the 
crucial step in the proof of the spectral interpretation 
of the zeros of this Ruelle-Lefschetz determinant 
consists in establishing the following 
continuous version of the Milnor-Thurston identity: 


\[
\det{}^{\#}(\mathrm{Id} - z\mathcal{M}) = \det{}^{*}(\mathrm{Id} + \mathcal{D}(z)) \tag*{[14]}
\]


where the “kneading operator” $\mathcal{D}(z)$ replaces (formally) 
the finite kneading matrix of Milnor and 
Thurston. In a suitable $z$-disk, one proves that this 
operator $\mathcal{D}(z)$ is a Hilbert-Schmidt operator on an 
$L^2$ space (its kernel is bounded and compactly 
supported), thus allowing the use of regularized 
determinants of order 2 (see e.g., Gohberg et al. 
(2000)). By definition, $\det^{*}(\mathrm{Id} + \mathcal{D}(z))$ is the product of 
this regularized determinant with the exponential of the 
average of the kernel of $\mathcal{D}(z)$ along the diagonal, which 
is well defined. Another kneading operator, $\hat{\mathcal{D}}(z)$, is 
essential. If $1/z$ is not in the spectrum of $\mathcal{M}$ (on $BV$), 
then $\hat{\mathcal{D}}(z)$ is also Hilbert-Schmidt, and one can show 
$\det^{*}(\mathrm{Id} + \hat{\mathcal{D}}(z)) = \det^{*}(\mathrm{Id} + \mathcal{D}(z))^{-1}$. The initial 
definitions of $\mathcal{D}(z)$ and $\hat{\mathcal{D}}(z)$ were technical and we shall not 
give them here. However, a more conceptual definition 
of $\hat{\mathcal{D}}(z)$ was later implemented: 


\[
\hat{\mathcal{D}}(z) = \mathcal{N}(\mathrm{Id} - z\hat{\mathcal{M}})^{-1}\mathcal{S} \tag*{[15]}
\]


where $\hat{\mathcal{M}}$ is an auxiliary transfer operator and $\mathcal{S}$ is 
the convolution 


\[
\mathcal{S}\varphi(x) = \int \frac{1}{2}\,\frac{x-y}{|x-y|}\,\varphi(y)\,d\mu(y)
\]


where $\mu$ is an auxiliary non-negative finite measure. 
From [15], it becomes clear that the kneading 
operator is a regularized (through the convolution 
$\mathcal{S}$) object which describes the inverse spectrum of the 
transfer operator: the resolvent $(\mathrm{Id} - z\hat{\mathcal{M}})^{-1}$ in [15] 
means that poles can only appear if $1/z$ is an 
eigenvalue. Since $\det^{*}(\mathrm{Id} + \hat{\mathcal{D}}(z)) = \det^{*}(\mathrm{Id} + \mathcal{D}(z))^{-1}$, 
this can be translated into a statement for zeros of 
$\det^{*}(\mathrm{Id} + \mathcal{D}(z))$. The Milnor-Thurston identity [14] 
then implies that any zero of $\det^{\#}(\mathrm{Id} - z\mathcal{M})$ is an 
inverse eigenvalue of $\mathcal{M}$. 

The one-dimensional kneading regularization we 
just presented is well understood. The higher-dimensional 
theory is not as developed yet. Let 
$U_\omega$ be now finitely many bounded open subsets of $\mathbb{R}^d$, 
$\psi_\omega: U_\omega \to \psi_\omega(U_\omega)$ be local $C^r$ homeomorphisms or 
diffeomorphisms, while $g_\omega: U_\omega \to \mathbb{C}$ are compactly 
supported $C^r$ functions, for $r > 1$. 

In 1995, A. Kitaev wrote a two-page sketch proving a 
higher-dimensional Milnor-Thurston formula, under 
an additional transversality assumption. This assumption 
guarantees that the set of fixed points of each fixed 
period $m$ is finite, so that the Ruelle-Lefschetz 
determinant $\det^{\#}(\mathrm{Id} - z\mathcal{M})$ can be defined through 
[12]. Inspired by Kitaev’s unpublished note, Baillif 
(2004) proved the following Milnor-Thurston formula: 


\[
\det{}^{\#}(\mathrm{Id} - z\mathcal{M}) = \prod_{k=0}^{d-1}\det{}^{\flat}(\mathrm{Id} + \mathcal{D}_k(z))^{(-1)^{k+1}} \tag*{[16]}
\]

Here, the $\mathcal{D}_k(z)$ are kernel operators acting on $(k + 1)$-forms, 
constructed with the resolvent $(\mathrm{Id} - z\mathcal{M}_k)^{-1}$, 
together with a convolution operator $\mathcal{S}_k$, mapping 
$(k + 1)$-forms to $k$-forms and which satisfies the 
homotopy equation $d\mathcal{S} + \mathcal{S}d=1$. The kernel $\sigma_k(x, y)$ 
of $\mathcal{S}_k$ has singularities of the form $(x - y)/\|x - y\|^d$. 
The transversality assumption allows Baillif to interpret 
the determinant obtained by integrating the kernels 
along the diagonal as a flat determinant in the sense of 
Atiyah and Bott, whence the notation $\det^{\flat}$ in the 
right-hand side of [16]. 

Baillif (2004) did not give a spectral interpretation 
of zeros or poles of the sharp determinant [16], but 
he noticed that for $|z|$ very small, suitably high 
iterates of the $\mathcal{D}_k(z)$ are trace-class on $L^2(\mathbb{R}^d)$, 
showing that the corresponding regularized determinant 
has a nonzero radius of convergence under 
weak assumptions. The spectral interpretation of the 
sharp determinant [12] in arbitrary dimension, but 
under additional assumptions, was subsequently 
carried out by Baillif and the author of the present 
article, giving a new proof of some of the results in 
Ruelle (1990). 


See also: Chaos and Attractors; Dynamical Systems and 
Thermodynamics; Ergodic Theory; Hyperbolic Dynamical 
Systems; Number Theory in Physics; Quantum 
Ergodicity and Mixing of Eigenfunctions; Quillen 
Determinant; Semi-Classical Spectra and Closed Orbits; 
Spectral Theory for Linear Operators. 
Introduction 


The description of phenomena at high energies 
requires the investigation of relativistic wave equa- 
tions, that is, equations which are invariant under 
Lorentz transformations. Our discussion will be given 
classically (i.e., nonquantum). A classification of the 


wave equations may be based on the spin of the 
particles (or physical fields), which was discovered 
for the electron by Goudsmit and Uhlenbeck in 
1925. For the greater part of physics, the three spin 
numbers $s=0$, $1/2$, and $1$ are sufficient; the respective 
equations are named after their discoverers: 
Klein-Gordon, Dirac, and Proca for massive fields and 
D’Alembert, Weyl, and Maxwell for massless fields 
(see the following section). 

In their original form, these equations look rather 
different. However, their translation into spinor form 
shows that the wave equations for bosons and fermions 
have the same structure, if $s > 0$. Therefore, most of 
the equations dealt with in this article are formulated 
for spinor fields. (Strictly speaking, the exclusive use of 
2-spinors restricts the relativistic invariance to the 
proper Lorentz group $SO^{+}(1,3)$. However, all the 
results presented here can be “translated back” into 
tensor or bispinor form, respectively (Illge 1993).) 
Relativistic wave equations for free fields with arbi- 
trary spin s > 0 in Minkowski spacetime are discussed 
in the section “Higher spin in Minkowski spacetime”; 
they were first given by Dirac (1936). 

In the subsequent section, we explain how the field 
theory can be extended to curved spacetimes. If a 
Lagrangian is known, then there exists a well-known 
mathematical procedure (“Lagrange formalism”) to 
obtain the field equations, the energy-momentum 
tensor, etc. All field equations for “low” spin $s \le 1$ 
arise from an action principle. Consequently, they can 
be extended to curved spacetime by simply replacing the 
flat metric and connection with their curved versions. 

If s > 1, then the wave equations do not follow from 
a variation principle without supplementary conditions. 
Nevertheless, one can try to generalize the equations of 
the section “Higher spin in Minkowski spacetime” to 
curved spacetime by the “principle of minimal cou- 
pling,” too. However, the arising equations are not 
satisfactory, since there is an algebraic consistency 
condition in curved space if s > 1 (Buchdahl 1962), and 
another for charged fields in the presence of electro- 
magnetism if s > 1/2 (Fierz and Pauli 1939). 

There have been numerous attempts to avoid these 
inconsistencies. As a rule, the alternative theories 
require an extended spacetime structure or additional 
new fields or they give up some important principle. An 
extensive literature is devoted to just this problem; 
unfortunately, a survey article or book is missing. 

Finally, we present a possibility to describe fields 
with arbitrary spin s > 0 within the framework of 
Einstein’s general relativity without any auxiliary 
fields and subsidiary conditions in a uniform manner. 
The approach is based on irreducible representations 
of type D(s,0) and D(s—1/2,1/2) instead of 
D(s/2,s/2) in the Fierz theory for bosons and 
D(s/2 + 1/4,s/2 —1/4) in the Rarita-Schwinger 
theory for fermions. It was first pointed out 
by Buchdahl (1982) that this type of field equations 
can be generalized to a curved spacetime if the mass is 
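positive. Shortly afterwards, Wünsch (1985) simplified 
them to their final form: 


\[
\begin{aligned}
\nabla^{A}{}_{P'}\varphi_{AB\ldots E} + m_1\chi_{B\ldots EP'} &= 0 \\
\nabla_{(A}{}^{P'}\chi_{B\ldots E)P'} - m_2\varphi_{AB\ldots E} &= 0
\end{aligned}
\tag*{[1]}
\]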


This system contains the well-known wave equa- 
tions for low spin s= 1/2 and s= 1 as special cases. 


By iteration we obtain second-order wave equations 
of normal hyperbolic type. Further, Cauchy’s initial- 
value problem is well posed and a Lagrangian is 
known. For zero mass, we state the wave equations 


\[
\nabla^{A(A'}\theta_{A}{}^{B'\ldots E')} = 0 \tag*{[2]}
\]


which are just the curved versions of the equations 
for the potential of a massless field. They are 
consistent in curved spacetime, too, and the Cauchy 
problem is well posed (Illge 1988). 

Last but not least, let us mention the esthetic 
aspect. Equations [1] and [2] satisfy Dirac’s demand: 
“Physical laws should have mathematical beauty.” 

In the following, we assume that the spacetime 
and all the spinor and tensor fields are of class $C^\infty$. 
All considerations are purely local. We say that a 
symmetric (“irreducible”) spinor is of type $(n,k)$ 
if and only if it has $n$ unprimed and $k$ primed indices 
(irrespective of their position). Moreover, we use the 
notations and conventions of Penrose and Rindler 
(1984), especially for the curvature spinors $\Psi_{ABCD}$ 
and $\Phi_{ABA'B'}$. 


Wave Equations for Low Spin 
in Minkowski Spacetime 


The spin (or intrinsic angular momentum) of a 
particle is found to be quantized. Its projection on 
any fixed direction is an integer or half-integer 
multiple of Planck’s constant $\hbar$; the only possible 
values are 


\[
-s\hbar,\ (-s + 1)\hbar,\ \ldots,\ (s - 1)\hbar,\ s\hbar
\]


The spin quantum number s so defined can have one 
of the values s=0,1/2,1,3/2,2,... and is a 
characteristic for all elementary particles along 
with their mass m and electric charge e. The 
particles with integer s are called “bosons,” those 
with half-integer s “fermions.” The three numbers 
s=0,1/2, and 1 are referred to as “low” spin; they 
are sufficient for the greater part of physics. 
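
The counting rule for spin projections and the boson/fermion split can be made concrete in a few lines. This is a minimal sketch; the function names `spin_projections` and `statistics` are hypothetical, and projections are given in units of $\hbar$.

```python
from fractions import Fraction


def spin_projections(s):
    """Return the 2s+1 allowed projections -s, -s+1, ..., s (in units of hbar)."""
    s = Fraction(s)
    if s < 0 or (2 * s).denominator != 1:
        raise ValueError("s must be a non-negative integer or half-integer")
    count = int(2 * s) + 1
    return [-s + k for k in range(count)]


def statistics(s):
    """Rule from the text: integer s gives a boson, half-integer s a fermion."""
    return "boson" if Fraction(s).denominator == 1 else "fermion"


for s in (0, Fraction(1, 2), 1, Fraction(3, 2)):
    print(s, statistics(s), [str(m) for m in spin_projections(s)])
```

Exact rationals are used instead of floats so that half-integer values such as $3/2$ are represented without rounding.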

The principle of first quantization associates a type 
of field and a field equation to each type of elementary 
particles. Massive particles, with rest mass $m > 0$, and 
massless particles, with rest mass $m=0$, are to be 
distinguished. Accordingly, we obtain six linear wave 
equations for $s \le 1$, which read as follows in units 
such that $c=\hbar=1$ (see Table 1): 

For the sake of simplicity, we consider only free 
fields in Table 1; no source terms or interaction terms 
appear here. The associated “free” Lagrangians are 
given in Table 2. 

Since the electromagnetic field tensor $F_{ab}$ satisfies the 
first part of Maxwell’s equations $\partial_{[a}F_{bc]} = 0$, it follows 
Table 1 Relativistic wave equations for low spin s =0, 1/2, and 1 
Spin, mass Wave equation Associated particles 
s=0,m>0 Klein—Gordon eqn. Scalar mesons 
(x +m?)u=0 TAS yas 
s=—0, m=O D’Alembert eqn. — 
xu=0 
s=1/2,m>0 Dirac eqn. Leptons e, u, T 
ORDA + BXa =0  Baryons p,n, A, =, X,... 
O XA — 7p 0A=0 
s=1/2,m=0 Weyl eqn. Massless(?) neutrinos 
i, va=0 Voy Vie U 
s=1,m>0 Proca eqn. Vector mesons 
Hab = a Up — Op Ua p, Ww, Y, ®, eee 
O° Hex + m? Ua =0 
s=1,m=0 Maxwell eqn. Photon y 
Oa Fbc = 0 
On Fab = 0 


Table 2 The Lagrangian densities for free (i.e., noninteracting) 
fields with low spin 





Field Lagrangian density 

Scalar field Q = ${(0%u)(d,u) — mu} 

Dirac field Q= sp (Kao xa + gP dpe vy? — oB dpe OF 
-xa ^^ Xa) + M(Kaye* + g^ xa) 

Weyl field g= (PA ^A VA — va^" Dy) 

Proca field g= jh? =A aU T UUs 

Maxwell field L= —1 FæF® = — (ôa Ay (32A!) 


that a vector field $A_a$ exists such that 
$F_{ab} = \partial_a A_b - \partial_b A_a$. This vector field is called the “electromagnetic 
4-potential.” It is not uniquely determined by the field 
$F_{ab}$; the freedom in $A_a$ is $A_a \to A_a + \partial_a\Gamma$ where 
$\Gamma = \Gamma(x)$ is a real-valued function. This gauge 
transformation of $A_a$ can be used, for example, to obtain the 
Lorentz gauge condition $\partial^a A_a = 0$. 
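
Both statements about the gauge freedom follow from a one-line computation. The sketch below writes the gauge function as $\Gamma$ (the symbol is an assumption of this sketch) and uses only the definition of the field tensor as the antisymmetrized derivative of the potential.

```latex
\begin{align*}
F'_{ab} &= \partial_a (A_b + \partial_b \Gamma) - \partial_b (A_a + \partial_a \Gamma)
         = F_{ab} + (\partial_a\partial_b - \partial_b\partial_a)\Gamma = F_{ab}, \\
\partial^a A'_a &= \partial^a A_a + \Box\,\Gamma .
\end{align*}
```

The first line shows that the field tensor is unchanged because partial derivatives commute. The second shows that the gauge condition is reached by choosing $\Gamma$ as a solution of the inhomogeneous wave equation $\Box\,\Gamma = -\partial^a A_a$.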

The wave equations listed in Table 1 look rather 
different, but this formal disadvantage can be over- 
come. To begin with, we remark that fermions 
require spinors for their description. The Dirac and 
Weyl equations are not describable by linear equa- 
tions for tensor fields. On the other hand, bosons can 
be described by spinors as well. All tensor equations 
can be “translated” into spinor form using the mixed 
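spinor-tensor $\sigma^{a}{}_{AA'}$. We will demonstrate this 
procedure for the Proca field in some detail. 

The (possibly complex) skew-symmetric tensor 
$H_{ab}$ and the vector $U_a$ have the spinor equivalents 


\[
\begin{aligned}
H_{ab}\,\sigma^{a}{}_{AA'}\sigma^{b}{}_{BB'} &= \varphi_{AB}\,\varepsilon_{A'B'} + \xi_{A'B'}\,\varepsilon_{AB} \\
U_a\,\sigma^{a}{}_{AA'} &= \chi_{AA'}
\end{aligned}
\]


where $\varphi$ and $\xi$ are both symmetric spinors: 
$\varphi_{AB} = \varphi_{(AB)}$, $\xi_{A'B'} = \xi_{(A'B')}$. After a straightforward 
calculation the Proca equation yields 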


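\[
\begin{aligned}
&\partial_{(A}{}^{C'}\chi_{B)C'} + \varphi_{AB} = 0, \qquad \partial^{C}{}_{(A'}\chi_{|C|B')} + \xi_{A'B'} = 0 \\
&\partial^{C}{}_{A'}\varphi_{CA} + \partial_{A}{}^{C'}\xi_{C'A'} + m^2\chi_{AA'} = 0
\end{aligned}
\]


Further, from the equation $\partial_{[c}H_{ab]} = 0$, we obtain 
$\partial^{C}{}_{A'}\varphi_{CA} = \partial_{A}{}^{C'}\xi_{C'A'}$; thus, the first and second summand 
in the third equation are equal. Consequently, we find 
the following spinor form of the Proca equations: 

\[
\begin{aligned}
\partial^{C}{}_{A'}\varphi_{CA} + \frac{m^2}{2}\chi_{AA'} &= 0, &\qquad \partial_{(A}{}^{C'}\chi_{B)C'} + \varphi_{AB} &= 0 \\
\partial_{A}{}^{C'}\xi_{C'A'} + \frac{m^2}{2}\chi_{AA'} &= 0, &\qquad \partial^{C}{}_{(A'}\chi_{|C|B')} + \xi_{A'B'} &= 0
\end{aligned}
\tag*{[3]}
\]


If the tensor fields $H$ and $U$ are real, then we have 
$\xi_{A'B'} = \bar\varphi_{A'B'}$, $\bar\chi_{AA'} = \chi_{AA'}$ and the second pair of 
equations is just the complex conjugate of the first. 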

Now it is readily seen that the Dirac and Proca 
equations have the same structure. They are coupled 
first-order systems of differential equations for pairs 
of spinor fields. The only decisive difference is that 
the spinors have one index if $s= 1/2$ and two indices 
if $s = 1$. 

We obtain a similar result for Maxwell fields. The 
real tensor $F_{ab}$ has the spinor equivalent 


\[
F_{ab}\,\sigma^{a}{}_{AA'}\sigma^{b}{}_{BB'} = \varphi_{AB}\,\varepsilon_{A'B'} + \bar\varphi_{A'B'}\,\varepsilon_{AB}
\]


with a symmetric spinor $\varphi_{AB}$. The spinor form of 
Maxwell’s equations is (Penrose and Rindler 1984) 


\[
\partial^{AA'}\varphi_{AB} = 0 \tag*{[4]}
\]


and has the same structure as the Weyl equation. 

This illustrates the power and utility 
of spinor techniques: they allow the formulation 
of the wave equations for bosons and fermions in a 
uniform manner. Only the cases $m > 0$ and $m = 0$ are 
to be distinguished. Moreover, the above results 
suggest the way for generalizing the wave equations 
to higher spin. Therefore, we can already end the 
discussion of the fields with low spin and take them as 
special cases of those with arbitrary spin. 


Higher Spin in Minkowski Spacetime 
Massive Fields 


Relativistic wave equations for particles with arbitrary 
spin were first considered by Dirac (1936). His 
equations read 


\[
\partial^{A}{}_{P'}\varphi_{AB\ldots DQ'\ldots T'} + m_1\chi_{B\ldots DP'Q'\ldots T'} = 0 \tag*{[5]}
\]
where the spinors $\varphi$ and $\chi$ are of type $(n,k)$ and 
$(n - 1,k + 1)$, respectively (corresponding to irreducible 
representations of the restricted Lorentz group 
$SO^{+}(1,3)$). The constants $m_1$ and $m_2$ are mass 
parameters ($m^2 = -2m_1m_2$) and the spin $s$ is one 
half of the total number of indices of each spinor, 
$s=(1/2)(n+k)$. As in the preceding section, we 
assume that electromagnetism and other interactions 
are absent. We should mention that equations for 
higher spin were not motivated by observations or 
empirical facts in that period of time, because only a 
few elementary particles were known (proton, 
neutron, electron, positron, and photon), and all of 
them have low spin (see Table 1). Since that time, 
particles with s>1 were found in nature, for 
example, resonances in scattering experiments. 

The system [5] allows a uniform description of free 
fields with arbitrary spin s > 0, including Dirac and 
Proca fields, as we know from the preceding section. 
(Remark: The symmetrization in eqns [3] can be 
omitted since the vector field U is divergence-free 
as a consequence of the second Proca equation.) 
Various other field equations proposed subsequently 
can be comprehended as its special cases (Corson 
1953). Examples are the Rarita-Schwinger equations 
for fermions: if they are written in terms of 2-spinors, 
then one obtains just the system [5] where the spinor 
$\varphi$ is of type $(s + 1/2, s - 1/2)$ and the spinor $\chi$ is of 
type $(s - 1/2,s + 1/2)$. 

If we apply $\partial_{E}{}^{P'}$ to the first of the equations in [5] 
and use the second, we obtain 


\[
(\Box + m^2)\varphi_{AB\ldots DQ'\ldots T'} = 0 \tag*{[6a]}
\]


since the second derivatives commute in flat 
spacetimes. Similarly, 


\[
(\Box + m^2)\chi_{B\ldots DP'Q'\ldots T'} = 0 \tag*{[6b]}
\]


so both fields $\varphi$ and $\chi$ satisfy a Klein-Gordon type 
equation. Moreover, eqns [5] imply that each of $\varphi$ 
and $\chi$ is divergence-free 


\[
\partial^{AQ'}\varphi_{AB\ldots DQ'\ldots T'} = 0 = \partial^{BP'}\chi_{B\ldots DP'Q'\ldots T'} \tag*{[7]}
\]


if they have at least one index of each kind. 

In a sense, this procedure can be reversed. Let a 
symmetric spinor field $\varphi$ be given that satisfies [6a] 
and [7]. (Remark: A significant example is the Fierz 
system 


\[
(\Box + m^2)U_{ab\ldots d} = 0, \qquad \partial^{a}U_{ab\ldots d} = 0
\]


for a symmetric, tracefree tensor field $U$, since the 
spinor equivalent of $U$ is of type $(k, k)$.) 
Define 


\[
\chi_{B\ldots DP'Q'\ldots T'} = \partial^{A}{}_{P'}\varphi_{AB\ldots DQ'\ldots T'}
\]


Then $\chi$ is symmetric in all its indices since $\varphi$ is 
divergence-free. Further, we obtain 


\[
\partial_{E}{}^{P'}\chi_{B\ldots DP'Q'\ldots T'} = \partial_{E}{}^{P'}\partial^{A}{}_{P'}\varphi_{AB\ldots DQ'\ldots T'}
= -\frac{1}{2}\Box\,\varphi_{EB\ldots DQ'\ldots T'}
= \frac{m^2}{2}\varphi_{EB\ldots DQ'\ldots T'}
\]


since $\varphi$ satisfies the Klein-Gordon equation [6a]. 
Consequently, the pair $(\varphi, \chi)$ satisfies a system [5]. 
Obviously, this procedure can be continued: define 


\[
\eta_{C\ldots DO'P'Q'\ldots T'} = \partial^{B}{}_{O'}\chi_{B\ldots DP'Q'\ldots T'}
\]


etc. We obtain a sequence of spinors of type 
(0, 2s), (1,2s—1),...,(2s,0) each of which is 
obtainable from its immediate neighbors by a 
differentiation contracted on one index. Together, 
these spinors form an invariant exact set (Penrose 
and Rindler 1984). 

The arguments just given show that there is an 
ambiguity in the system [5]. The spin $s$ fixes only 
the total number of indices of $\varphi$ and $\chi$. However, 
their partition into primed and unprimed ones is 
not a priori fixed. Therefore, we can choose a 
“convenient” partition for the respective needs. 


Massless Fields 


If m=0, then the Dirac system [5] is decoupled. 
Therefore, we have to state a single equation for a 
single field. Let $\varphi$ be a spinor field of type $(n, 0)$. The 
massless free-field equation for spin $(1/2)n$ is then 
taken to be 


\[
\partial^{AX'}\varphi_{AB\ldots E} = 0 \tag*{[8]}
\]


More precisely, the solutions of [8] represent 
left-handed massless particles with helicity $-(1/2)n\hbar$, 
whereas the solutions of the complex-conjugate 
form of this equation are right-handed particles 
(helicity $+(1/2)n\hbar$). Recall that the Weyl equation 
($n= 1$) and the source-free Maxwell equation ($n=2$) 
have this form. (Remark: The Bianchi identity in 
Einstein spaces also falls in this category, with the 
Weyl spinor $\Psi_{ABCD}$ taking the place of $\varphi_{AB\ldots E}$. 
Moreover, we may think of [8] with n=4 as the 
gauge-invariant equation for the weak vacuum 
gravitational field.) 

The massless field equation [8] can be solved 
using methods of twistor geometry. Moreover, there 
is an explicit integral formula for representing 
massless free fields in terms of arbitrarily chosen 
null data on a light cone (Penrose and Rindler 1984, 
1986, Ward and Wells 1990). We do not discuss 
eqns [8] in detail since they are generally inconsistent 
in curved spacetimes if $n > 2$ (see the next 
section). We only indicate that each solution of [8] 
satisfies the second-order wave equation 


\[
\Box\,\varphi_{AB\ldots E} = 0
\]


Maxwell’s equations imply the existence of an 
electromagnetic potential (cf. section “Wave equa- 
tions for low spin in Minkowski spacetime”). This 
concept can be generalized to higher spin. 
A “potential” for a spinor field $\varphi_{AB\ldots E}$ of type 
$(n,0)$ is a spinor field $\theta_{A}{}^{B'\ldots E'}$ of type $(1,n - 1)$ such 
that 


\[
\partial^{A(A'}\theta_{A}{}^{B'\ldots E')} = 0 \tag*{[9]}
\]
and 
\[
\varphi_{AB\ldots E} = \partial_{(BB'}\cdots\partial_{EE'}\theta_{A)}{}^{B'\ldots E'} \tag*{[10]}
\]


One can check in a straightforward manner that a 
spinor field $\varphi$ that is given by [9] and [10] satisfies 
the massless equation [8]. If $n > 1$, there is a gauge 
freedom in these potentials; it turns out to be 


\[
\theta_{AB'\ldots E'} \to \theta_{AB'\ldots E'} + \partial_{A(B'}\omega_{C'\ldots E')}
\]


for any spinor field $\omega$ of type $(0,n - 2)$. Furthermore, 
the general massless field $\varphi$ can locally be 
expressed in this way (Penrose and Rindler 1986). 


Wave Equations in Curved Spacetimes, 
Consistency Conditions 


First of all we emphasize that Hamilton’s principle 
of stationary action is extremely important in field 
theories (see, e.g., Schmutzer (1968)). Assume that 
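the Lagrangian $\mathcal{L}$ contains at most first derivatives 
of a field $\psi_\Sigma$: $\mathcal{L} = \mathcal{L}(\psi_\Sigma(x), \partial_a\psi_\Sigma(x))$. “Special 
relativity” states that $\mathcal{L}$ is invariant under Lorentz 
transformations. The Euler-Lagrange equations 
with respect to variation of $\psi_\Sigma$ read 

\[
\frac{\partial\mathcal{L}}{\partial\psi_\Sigma} - \partial_a\frac{\partial\mathcal{L}}{\partial(\partial_a\psi_\Sigma)} = 0 \tag*{[11]}
\]


and these are the field equations that $\psi_\Sigma$ is required to 
satisfy. 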

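In “general relativity,” the Lagrangian $\mathcal{L}$ has to be 
generally covariant. So we have $\mathcal{L}=\mathcal{L}(\psi_\Sigma(x), \nabla_a\psi_\Sigma(x))$ 
and the Euler-Lagrange equations 

\[
\frac{\partial\mathcal{L}}{\partial\psi_\Sigma} - \nabla_a\frac{\partial\mathcal{L}}{\partial(\nabla_a\psi_\Sigma)} = 0 \tag*{[12]}
\]
emerge. If we assume that the Lagrangian $\mathcal{L}$ does 
not contain the curvature tensors and their derivatives 
explicitly and compare [11] and [12], then it is 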
easily seen how the wave equations in curved 
spacetime can be obtained: by simply replacing the 


flat metric and connection with their curved 
versions. This procedure is called the “principle of 
minimal coupling.” 

All equations for low spin in Minkowski 
spacetime are the Euler-Lagrange equations of a 
variation principle (see Table 2). Consequently, they 
can be extended to curved spacetime by simply using 
the principle of minimal coupling. The arising 
equations are perfectly acceptable. No complications 
arise, and so we do not repeat them in this section. 
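
As a worked illustration of the procedure, consider the real scalar field. The sketch below assumes that the scalar entry of Table 2 reads $\mathcal{L} = \tfrac{1}{2}\{(\partial^a u)(\partial_a u) - m^2 u^2\}$ and applies the Euler-Lagrange equations first in flat and then in curved spacetime.

```latex
\begin{align*}
\text{flat:}\quad
&\mathcal{L} = \tfrac12\left\{(\partial^a u)(\partial_a u) - m^2 u^2\right\},
\qquad
\frac{\partial\mathcal{L}}{\partial u} = -m^2 u,
\qquad
\frac{\partial\mathcal{L}}{\partial(\partial_a u)} = \partial^a u, \\
&\frac{\partial\mathcal{L}}{\partial u} - \partial_a \frac{\partial\mathcal{L}}{\partial(\partial_a u)} = 0
\;\Longrightarrow\; (\Box + m^2)\,u = 0, \\[4pt]
\text{curved:}\quad
&\mathcal{L} = \tfrac12\left\{g^{ab}(\nabla_a u)(\nabla_b u) - m^2 u^2\right\}
\;\Longrightarrow\; (\nabla^a\nabla_a + m^2)\,u = 0 .
\end{align*}
```

The flat result is the Klein-Gordon equation of Table 1, and the curved result differs from it only by the replacement of the flat metric and partial derivatives with $g_{ab}$ and $\nabla_a$, which is the content of the minimal coupling principle.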

If s > 1, then neither the massive nor the massless 
wave equations follow from a variation principle 
without supplementary conditions. Nevertheless, we 
can try to generalize the equations of the previous 
section to a curved spacetime by formally replacing 
the flat metric and connection with their curved 
versions, too. However, serious problems arise: 

Let us first consider massless fields of helicity 
$-(1/2)n\hbar$. The principle of minimal coupling yields 


\[
\nabla^{AX'}\varphi_{AB\ldots E} = 0 \tag*{[13]}
\]
If we apply $\nabla_{FX'}$ to this equation, we obtain 
\[
\nabla_{FX'}\nabla^{AX'}\varphi_{AB\ldots E} = 0
\]


Since the covariant derivatives do not commute 
with each other, the term on the left-hand side is not 
completely symmetric in the unprimed indices. 
Therefore, this equation can be decomposed into 
two nontrivial irreducible parts if $n > 1$: 
symmetrization yields the covariant D’Alembert equation 


\[
\nabla^{a}\nabla_{a}\varphi_{B\ldots EF} = 0
\]


as required, while antisymmetrization yields by use 
of the spinor Ricci identities 


\[
(n - 2)\,\Psi^{KLM}{}_{(C}\varphi_{D\ldots E)KLM} = 0 \tag*{[14]}
\]


where $\Psi_{ABCD}$ is the Weyl spinor. If $n > 2$ and the 
spacetime is not conformally flat, then this algebraic 
consistency condition effectively renders eqn [13] 
useless as physical field equations. 

If m > 0, the situation is not better. In somewhat 
similar way, we obtain the algebraic consistency 
conditions 


(n= 2) UN CYD...E)KLMO'P"...T' 


-- ko oO PIKLC. EP.. TX =0 (n>1) 15) 
(k — Dy (SX |B...DX'Y'Z!|T'...U") 


+(n—1) Oe" xc pyexysr.u =0 (k>0) 
if the spinor field $\varphi$ is of type $(n,k)$ (Buchdahl 1962). 
We remark that similar consistency conditions 


occur if we have no gravitation, but an interaction 
with an electromagnetic field. Then the partial 
derivative is to be replaced by $D_a =\partial_a - ieA_a$ and 
we obtain consistency conditions like [14] and 
[15], where the curvature spinors are to be 
replaced by the electromagnetic spinor (Fierz and 
Pauli 1939). 

So far one is left with the problem: “Find the 
‘correct’ laws for arbitrary spin, that means field 
equations which coincide with the well-known 
approved ones for low spin and which remain 
consistent even for higher spin when electromagnet- 
ism and/or gravitation is coupled!” 

An extensive literature is devoted to just this 
problem. Let us briefly sketch some means by which 
the authors tried to solve it: 


• derivation of the desired field equations from a 
variation principle where the original spinor fields 
are supplemented by auxiliary fields; 

• extension of the four-dimensional spacetime 
geometry to a richer one: higher number of dimensions, 
complexification, addition of torsion, nonmetrical 
connection, ...; 

• replacement of the algebra of spinors by some 
richer algebra; 

• abandonment of the principle of minimal coupling; and 

• supergravity theories. 


Some of these attempts are able to solve the problem, 
at least partially. But, as a rule, they pay a price of 
new difficulties. In the next section, we offer “good” 
equations for arbitrary s > 0 within the conventional 
framework of the minimal coupling principle and of 
a curved spacetime background. 


Wave Equations for Arbitrary Spin 
without Consistency Conditions 


Massive Fields 


The ansatz which leads to the desired result is 
surprisingly simple. We avoid the ambiguity in the 
Dirac system [5] that has been discussed earlier as 
well as any consistency condition if we state the 
wave equations 


\[
\begin{aligned}
\nabla^{A}{}_{P'}\varphi_{AB\ldots E} + m_1\chi_{B\ldots EP'} &= 0 \\
\nabla_{(A}{}^{P'}\chi_{B\ldots E)P'} - m_2\varphi_{AB\ldots E} &= 0
\end{aligned}
\tag*{[16]}
\]


This system was first proposed by Wünsch (1985); 
it is equivalent to a pair of equations given by 
Buchdahl (1982) which contains the Weyl spinor 
explicitly. As before, $\varphi$ and $\chi$ are symmetric spinor 
fields, $\varphi$ has $n$ unprimed indices (and no others!) 
and the constants $m_1, m_2$ are mass parameters 
($m^2 = -2m_1m_2$). We assume $m_1 \ne 0$ in this section. 
Obviously, the Dirac and Proca equations are 
special cases of [16]: choose $n=1$ and $n=2$, 
respectively. (Remark: An electromagnetic field can 
be included in [16] by $\nabla_a \to D_a = \nabla_a - ieA_a$, and 
the equations remain consistent (Illge 1993).) 

First of all, we remark that eqns [16] are the Euler- 
Lagrange equations of an action principle. The 
existence of a Lagrangian is plausible since the 
number of equations and the number of degrees of 
freedom are equal. We do not state the Lagrangian, 
the energy-momentum tensor, and the current vector 
in this article and refer the reader to Illge (1993). 

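If n > 1, we can apply $\nabla^{BP'}$ to the first equation of 
[16] and obtain, using the spinor Ricci identities: 

$$\nabla^{BP'}\chi_{BC\ldots EP'} = -\frac{1}{m_1}\,\nabla^{BP'}\nabla^{A}{}_{P'}\,\varphi_{ABC\ldots E} = \frac{n-2}{m_1}\,\Psi^{KLM}{}_{(C}\,\varphi_{D\ldots E)KLM} \qquad [17]$$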


Hence the divergence of $\chi$ vanishes if n = 2 or if the 
spacetime is conformally flat. These are exactly the 
cases where the symmetrization in the second 
equation of [16] can be omitted. 

Now we are going to derive the second-order 
equations for $\varphi$ and $\chi$. Substituting 

$$\chi_{BC\ldots EP'} = -\frac{1}{m_1}\,\nabla^{A}{}_{P'}\,\varphi_{ABC\ldots E} \qquad [18]$$


into the second equation of [16], we obtain, after a 
bit of algebra, 

$$\nabla^{a}\nabla_{a}\,\varphi_{AB\ldots E} - 2(n-1)\,\Psi_{(AB}{}^{KL}\,\varphi_{C\ldots E)KL} + \left(\frac{n+2}{12}\,R + m^2\right)\varphi_{AB\ldots E} = 0 \qquad [19]$$

This is a linear second-order equation of normal 
hyperbolic type for the spinor field $\varphi$. It can be used 
to solve Cauchy’s problem for the system [16]. 
Similarly, we get a second-order equation for $\chi$: 


V°VaXeB..EP — 2(n — 1)®g Ky XC..E)KW’ 


R 2 
+ 7 FM |XB.EP 
n—1 





=2 Ver VEY xXc...E)KW! |20] 
Seemingly this is not an equation of hyperbolic 
type if n > 1. However, the second derivatives of $\chi$ 
on the right-hand side of [20] can be eliminated 
using [17]. Therefore, if the spinor field $\varphi$ is 
already known by solving [19], then [20] is an 
equation of Klein-Gordon type, too. However, it 
is generally inhomogeneous if n > 2. A wave 
equation that contains the spinor field $\chi$ alone 
exists only if n = 1, n = 2, or the spacetime is 
conformally flat. 

Now we are going to discuss the “Cauchy 
problem” for the wave equations [16] (for details 
see Wünsch (1985)). Let a spacelike hypersurface S 
be given and let $n^a$ denote the future-directed unit 
normal vector on S and $\nabla_n = n^a\nabla_a$. The local 
Cauchy problem is to find a solution $(\varphi, \chi)$ of [16] 
with given Cauchy data $\varphi^0, \chi^0$ on S. 

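In general, the initial data $\varphi^0$ and $\chi^0$ cannot be 
prescribed arbitrarily. Suppose that a solution $(\varphi, \chi)$ 
of [16] does exist. Then the differential equations 
have to be satisfied on S, too. Thus, we obtain 

$$(\nabla_n\varphi_{AB\ldots E})\big|_S = 2\,n_A{}^{A'}\left(\tilde\nabla^{F}{}_{A'}\,\varphi_{B\ldots EF} + m_1\,\chi_{B\ldots EA'}\right)\Big|_S \qquad [21]$$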


where the differential operator $\tilde\nabla_{AA'} = \nabla_{AA'} - n_{AA'}\nabla_n$ 
is just the tangential part of $\nabla_{AA'}$ with 
respect to S. Therefore, the right-hand side of [21] 
is completely determined by the initial data. Now 
the symmetry of the solution $\varphi_{AB\ldots E}$ implies the 
symmetry of $\nabla_n\varphi_{AB\ldots E}$. Consequently, the right- 
hand side of [21] has to be symmetric with respect 
to the unprimed indices and so we obtain the 
following constraints for the initial data if $\varphi$ has at 
least two indices: 

$$n^{BA'}\left(\tilde\nabla^{F}{}_{A'}\,\varphi^0_{BC\ldots EF} + m_1\,\chi^0_{BC\ldots EA'}\right)\Big|_S = 0 \qquad [22]$$

Now we can state: 


Theorem 1 If the Cauchy data $\varphi^0$ and $\chi^0$ satisfy the 
constraints [22], then the Cauchy problem has a 
unique solution in a neighborhood of S. 


For each differential equation of hyperbolic type 
we can ask the question whether the wave propaga- 
tion is “sharp,” that is, free of tails. If this property 
is valid we say that the equation satisfies “Huygens’ 
principle” (for an exact definition, see, e.g., Wünsch 
(1994)). Using invariant Taylor expansions of 
the parallel propagator and of the Riesz kernels in 
normal coordinates we can prove (Wünsch 1985): 


Theorem 2 The massive wave equations [16] for 
spin s > 0 satisfy Huygens’ principle if and only 
if the spacetime is of constant curvature and 
$R = -6m^2/s$. 


Massless Fields 


In the preceding section, we have seen that the 
premise $m_1 \neq 0$ is decisive for the consistency of 
[16] if s > 1. This fact agrees with the result of the 
previous section, that eqn [13] is inconsistent if 
s > 1 and the spacetime is not conformally flat. On 
the other hand, $m_2 = 0$ is possible. Therefore we 
state the wave equations 

$$\nabla^{A}{}_{(A'}\,\Theta_{|A|B'\ldots E')} = 0 \qquad [23]$$

for a spinor field $\Theta$ of type (1, n − 1). This is just 
eqn [9] for the potential of a massless field. We will 
show that [23] is a satisfactory equation in a 
generally curved spacetime (Illge 1988). Unfortunately, 
no Lagrangian has been found if n > 1. 

To begin with, we remark that there is a gauge 
freedom in curved spacetimes, too, since the 
solution $\Theta$ of [23] cannot be uniquely determined 
if n > 1. We use this freedom to prescribe the 
divergence of $\Theta$. So let an arbitrary spinor field 
$\omega$ of type (0, n − 2) be given. We consider eqns 
[23] and 


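$$\nabla^{AA'}\,\Theta_{AA'C'\ldots E'} = \omega_{C'\ldots E'}$$

or, together, 

$$\nabla^{A}{}_{A'}\,\Theta_{AB'\ldots E'} = -\frac{n-1}{n}\,\varepsilon_{A'(B'}\,\omega_{C'\ldots E')} \qquad [24]$$

If we apply $\nabla_{B}{}^{A'}$ to this equation, we obtain using the 
spinor Ricci identities 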


R 
V"°VaOpp'..e — 2(2 — 1) ®p ‘Br Orxic..E)w + goban 


2(n—1 
a 7 nee [25] 


This is a linear second-order equation of normal 
hyperbolic type for the spinor field $\Theta$ (cf. [20]). 

Now let us discuss some particular cases. If n = 1, 
then [23] is just the Weyl equation itself. Therefore, 
the equations for the field and its potential are 
identical and there is no gauge freedom. If n = 2, 
then the spinor field $\Theta_{AA'}$ is a (complex) vector field 
and eqn [23] yields 

$$\nabla^{A}{}_{(A'}\,\Theta_{|A|B')} = 0$$


The gauge field $\omega$ is just a scalar function; in particular, 
we can choose $\omega = 0$ (Lorentz gauge). As in eqn [10] 
we define the field spinor as 

$$\varphi_{AB} = \nabla_{(A}{}^{B'}\,\Theta_{B)B'}$$
Since we have the identity 
Vg Vena = Vg Vipmalay) 


for arbitrary spinor fields $\eta_{AA'}$ (which must not have 
additional free indices!), the spinor field $\varphi_{AB}$ satisfies 
the massless free-field equation 

$$\nabla^{AA'}\varphi_{AB} = 0$$

If n > 2, we can define a field $\varphi_{AB\ldots E}$ via the 
relation [10], too, replacing the partial with the 
covariant derivatives. But the field equation for 
$\varphi_{AB\ldots E}$ becomes more complicated than [13]. This 
fact is not surprising, since eqn [23] is a consistent 
one, whereas [13] is inconsistent. 

We continue with some remarks on “conformal 
rescalings of the metric.” The equations for massless 
fields have to be invariant with respect to such 
transformations. Therefore, the “curved space” 
scalar wave equation is 

$$\left(\Box + \frac{R}{6}\right)\varphi = 0 \qquad [26]$$

Further, the equations 

$$\nabla^{A}{}_{(A'}\,\eta_{|AB\ldots E|B'\ldots F')} = 0 \qquad [27]$$

for any spinor field $\eta$ of type (n, k) are conformally 
invariant (Penrose and Rindler 1984). In particular, 
eqns [23] for the massless potential and [13] for the 
massless field have this property. 

We mention a further special case of [27]. If $\eta$ is of 
type (k + 1, k), then these equations are consistent, 
too (Frauendiener and Sparling 1999). The Cauchy 
problem is well posed and a Lagrangian is known. 
Unfortunately, the solutions do not satisfy a wave 
equation of second order if k > 0. 

We conclude with the discussion of the Cauchy 
problem for eqn [24]. As in the preceding section, let 
a spacelike hypersurface S and initial data $\Theta^0$ on S 
be given. We can state: 


Theorem 3 If a symmetric spinor field $\omega$ of type 
(0, n − 2) is given, then there exists a neighborhood 
of S in which eqn [24] has one and only one solution 
satisfying $\Theta|_S = \Theta^0$. 


The proof is given in Illge (1988). We emphasize 
that there are no constraints on the Cauchy data for 
the massless equation [24]. 

In contrast to massive fields we are far away from 
an answer to the question whether Huygens’ principle 
is valid for the massless equations. A particular 
result is (Wünsch 1994): 


Theorem 4 Huygens’ principle for the conformally 
invariant scalar wave equation [26], the Weyl, and 
the Maxwell equations is valid only for conformally 
flat and plane wave metrics within the classes of 
centrally symmetric, recurrent, (2, 2)-decomposable, 
Petrov type N, III or D spacetimes as well as those 
with $\nabla_{[a}R_{b]c} = 0$. 


See also: Clifford Algebras and Their Representations; 
Dirac Fields in Gravitation and Nonabelian Gauge 
Theory; Euclidean Field Theory; Evolution Equations: 
Linear and Nonlinear; Spinors and Spin Coefficients; 
Standard Model of Particle Physics; Twistors. 


Introduction 


Quantum field theories (QFTs) provide a natural 
framework for quantum theories that obey the 
principles of special relativity. Among their most 
striking features are ultraviolet (UV) divergences, 
which at first sight invalidate the existence of the 
theories. The divergences arise from Fourier modes 
of very high wave number, and hence from the 
structure of the theories at very short distances. In 
the very restricted class of theories called “renorma- 
lizable,” the divergences may be removed by a 
singular redefinition of the parameters of the theory. 
This is the process of renormalization that defines a 
QFT as a nontrivial limit of a theory with a UV 
cutoff. 

A very important QFT is the standard model, an 
accurate and successful theory for all the known 
interactions except gravity. Calculations using 
renormalization and related methods are vital to 
the theory’s success. 

The basic idea of renormalization predates QFT. 
Suppose we treat an observed electron as a 
combination of a bare electron of mass $m_0$ and the 
associated classical electromagnetic field down to a 
radius a. The observed mass of the electron is its 
bare mass plus the energy in the field (divided by $c^2$). 
The field energy is substantial, for example, 0.7 MeV 
when $a = 10^{-15}$ m, and it diverges when $a \to 0$. The 
observed mass, 0.5 MeV, is the sum of the large 
(or infinite) field contribution compensated by a 
negative and large (or infinite) bare mass. This 
calculation needs replacing by a more correct 
version for short distances, of course, but it remains 
a good motivation. 
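
The numbers quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the textbook expression $E(a) = e^2/(8\pi\epsilon_0 a) = \alpha\hbar c/(2a)$ for the electrostatic energy stored outside radius a; the function name and the rounded constants are illustrative choices and are not taken from the article.

```python
# Classical field energy stored outside radius a around a point charge e:
#   E(a) = e^2 / (8 pi eps0 a) = alpha * hbar*c / (2 a)
# (assumed textbook formula; constants are rounded values)
ALPHA = 1.0 / 137.035999      # fine-structure constant
HBAR_C_MEV_FM = 197.3269804   # hbar*c in MeV*fm

def field_energy_mev(a_fm):
    """Field energy in MeV for a cutoff radius given in femtometres (1 fm = 1e-15 m)."""
    return ALPHA * HBAR_C_MEV_FM / (2.0 * a_fm)

if __name__ == "__main__":
    observed = 0.511  # observed electron rest energy in MeV
    for a_fm in (1.0, 0.1, 0.01):
        e_field = field_energy_mev(a_fm)
        # bare rest energy needed so that bare + field equals the observed value
        print(f"a = {a_fm:5.2f} fm  field = {e_field:8.3f} MeV  bare = {observed - e_field:9.3f} MeV")
```

As the radius shrinks, the field energy grows like 1/a and the compensating bare rest energy becomes large and negative, which is the pattern the paragraph describes.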

In this article, we review the theory of renorma- 
lization in its classic form, as applied to weak- 
coupling perturbation theory, or Feynman graphs. It 
is this method, rather than the Wilsonian approach 
(see Exact Renormalization Group), that is typically 
used in practice for perturbative calculations in the 
standard model, especially its QCD part. 

Much of the emphasis is on weak-coupling 
perturbation theory, where there are well-known 
algorithmic rules for performing calculations and 
renormalization. Applications (see Quantum Chro- 
modynamics for some important nontrivial examples) 
involve further related results, such as the operator 
product expansion, factorization theorems, and the 
renormalization group (RG), to go far beyond simple 
fixed-order perturbation theory. The construction of 
fully rigorous mathematical treatments for the exact 
theory is a topic of future research. 


Formulation of QFT 


A QFT is specified by its Lagrangian density. 
A simple example is $\phi^4$ theory: 

$$\mathcal{L} \stackrel{?}{=} \frac{1}{2}(\partial\phi)^2 - \frac{m^2}{2}\phi^2 - \frac{\lambda}{4!}\phi^4 \qquad [1]$$

where $\phi(x) = \phi(t, \mathbf{x})$ is a single component Hermitian 
field. The Lagrangian density and the resulting 
equation of motion, $\partial^2\phi + m^2\phi + (1/6)\lambda\phi^3 = 0$, are 
local; they involve only products of fields at the 
same spacetime point. Such locality is characteristic 
of relativistic theories, where otherwise it is difficult 
or impossible to preserve causality, but it is also the 
source of the UV divergences. The question mark 
over the equality symbol in eqn [1] is a reminder 
that renormalization of UV divergences will force us 
to modify the equation. 

The Feynman rules for perturbation theory are 
given by a free propagator $i/(p^2 - m^2 + i0)$ and an 
interaction vertex $-i\lambda$. Although we will usually 
work in four spacetime dimensions, it is useful also 
to consider the theory in a general spacetime 
dimensionality n, where the coupling has energy 
dimension $[\lambda] = E^{4-n}$. We use “natural units,” that 
is, with $\hbar = c = 1$. The “i0” in the propagator 
$i/(p^2 - m^2 + i0)$ symbolizes the location of the pole relative 
to the integration contour; it is often written as $i\epsilon$. 

The primary targets of calculations are the 
vacuum expectation values of time-ordered products 
of $\phi$; in QFT these are called the Green functions of 
the theory. From these can be reconstructed the 
scattering matrix, scattering cross sections, and 
other measurable quantities. 


One-Loop Calculations 


Low-order graphs for the connected and amputated 
four-point Green function are shown in Figure 1. 
Each one-loop graph has the form 

$$-i\lambda^2 I(p^2) = \frac{\lambda^2}{2}\int\frac{d^4k}{(2\pi)^4}\,\frac{1}{(k^2 - m^2 + i0)\,[(p - k)^2 - m^2 + i0]} \qquad [2]$$

where p is a combination of external momenta. 
There is a divergence from where the loop 
momentum k goes to infinity. We define the degree 
of divergence, $\Delta$, by counting powers of k at large k, 
to get $\Delta = 0$. In an n-dimensional spacetime we 
would have $\Delta = n - 4$. The integral is divergent 
whenever $\Delta \geq 0$. Comparing the dimensions of the 
one-loop and tree graphs shows that $\Delta$ equals the 
negative of the energy dimension of the coupling $\lambda$. 
Thus, the dimensionlessness of $\lambda$ at the physical 
spacetime dimension is equivalent to the integral 
being just divergent. 


Figure 1 One-loop approximation to connected and amputated 
four-point function, before renormalization. 
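
The power counting just described can be written out explicitly. The sketch below assumes the usual counting rule for a scalar theory (each loop integration contributes n powers of k, each propagator contributes −2); the function names are hypothetical and serve only to illustrate the statement that $\Delta$ equals minus the energy dimension of $\lambda$ for the one-loop graph.

```python
def degree_of_divergence(n_dim, loops, internal_lines):
    """Superficial degree of divergence of a scalar graph:
    n_dim powers of k per loop integration, -2 per propagator."""
    return n_dim * loops - 2 * internal_lines

def coupling_dimension(n_dim):
    """Energy dimension of lambda in phi^4 theory, [lambda] = E^(4 - n)."""
    return 4 - n_dim

if __name__ == "__main__":
    # The one-loop four-point graph has one loop and two internal lines.
    for n_dim in (2, 3, 4, 5, 6):
        delta = degree_of_divergence(n_dim, loops=1, internal_lines=2)
        status = "divergent" if delta >= 0 else "convergent"
        print(n_dim, delta, -coupling_dimension(n_dim), status)
```

For n = 4 the loop gives $\Delta = 0$, the borderline (logarithmic) case singled out in the text.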

The infinity in the integral implies that the theory 
in its naive formulation is not defined. With the aid 
of RG methods, it has been shown that the problem 
is with the complete theory, not just perturbation 
theory. 

The divergence only arises because we use a 
continuum spacetime. So suppose that we formulate 
the theory initially on a lattice of spacing a (in space 
or spacetime). Our loop graph is now 


$$-i\lambda^2 I(p; m, a) = \frac{(-i\lambda)^2}{32\pi^4}\int d^4k\; S(k, m; a)\,S(p - k, m; a) \qquad [3]$$

where the free propagator S(k, m; a) approaches the 
usual value $i/(k^2 - m^2 + i0)$ when k is much smaller 
than 1/a, and it falls off more rapidly for large k. 
The basic observation that propels the renormalization 
program is that the divergence as $a \to 0$ is 
independent of p. This is most easily seen by 
differentiating once with respect to p, after which 
the integral is convergent when a = 0, because the 
differentiated integral has degree of divergence −1. 

Thus we can cancel the divergence in eqn [2] by 
replacing the coupling in the first term in Figure 1, 
by the so-called bare coupling 

$$\lambda_0 = \lambda + 3A(a)\lambda^2 + O(\lambda^3) \qquad [4]$$


Here A(a) is chosen so that the renormalized value 
of our one-loop graph, 

$$-i\lambda^2 I_R(p^2, m^2) = -i\lambda^2 \lim_{a \to 0}\,[I(p; m, a) + A(a)] \qquad [5]$$

exists, at a = 0, with A(a) in fact being real valued. 
The factor 3 multiplying A(a) in eqn [4] is because 
there are three one-loop graphs, with equal divergent 
parts. The replacement for the coupling is made 
in the tree graph in Figure 1, but not yet at the 
vertices of the other graphs, because at the moment 
we are only doing a calculation accurate to order $\lambda^2$; 
the appropriate expansion parameter of the theory is 
the finite renormalized coupling $\lambda$, held fixed as 
$a \to 0$. We call the extra term in eqn [5] a counter-term. 
The diagrams for the correct renormalized 
calculation are represented in Figure 2, which has an 
additional counter-term graph compared with Figure 1. 


Figure 2 One-loop approximation to renormalized connected 
and amputated four-point function, with counter-term. 
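
The two facts used above, that the divergence does not depend on p and that a p-independent counter-term therefore leaves a finite remainder, can be illustrated numerically. The sketch below is not the lattice regulator of the text: it assumes a Euclidean version of the loop integral with a hypothetical sharp momentum cutoff imposed after the Feynman-parameter shift, for which the radial integral is elementary. All names are illustrative.

```python
import math

def loop_integral(p2_e, m2, cutoff, steps=2000):
    """Euclidean integral of d^4k/(2 pi)^4 1/((k^2 + m^2)((p - k)^2 + m^2)),
    with |k| < cutoff imposed after the Feynman-parameter shift (an assumption
    of this sketch). The radial integral is done in closed form and the
    remaining x integral by the midpoint rule."""
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) / steps
        d = m2 + p2_e * x * (1.0 - x)   # combined denominator, p2_e >= 0
        total += math.log(1.0 + cutoff**2 / d) - cutoff**2 / (cutoff**2 + d)
    return total / steps / (16.0 * math.pi**2)

if __name__ == "__main__":
    for cutoff in (1e1, 1e2, 1e3, 1e4):
        raw = loop_integral(4.0, 1.0, cutoff)
        subtracted = raw - loop_integral(0.0, 1.0, cutoff)
        print(f"cutoff = {cutoff:8.0f}  raw = {raw:.6f}  minus p = 0 value = {subtracted:.6f}")
```

The raw value grows logarithmically with the cutoff, while the difference from the p = 0 value settles to a cutoff-independent number, which is what a momentum-independent A(a) in eqn [5] relies on.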

In the physics terminology, used here, the cutting- 
off of the divergence by using a modified theory is 
called a regularization. This contrasts with the 
mathematics literature, where “regularized integral” 
usually means the same as a physicist’s “renorma- 
lized integral.” 

There is always freedom to add a finite term to a 
counter-term. When we discuss the RG, we will see 
that this corresponds to a reorganization of the 
perturbation expansion and provides a powerful 
tool for improving perturbatively based calculations, 
especially in QCD. Contrary to the impression given 
in some parts of the literature, it is not necessary 
that a renormalized mass equal a corresponding 
physical particle mass, with similar statements for 
coupling and field renormalization. While such a 
prescription is common and natural in a simple 
theory like QED, it is by no means required and 
certainly may not always be best. If nothing else, the 
correspondence between fields and stable particles 
may be poor or nonexistent (as in QCD). 

One classic possibility is to subtract the value of 
the graph at p = 0, a prescription associated with 
Bogoliubov, Parasiuk, and Hepp (BPH), which 
leads to 

$$-i\lambda^2 I_{R,\mathrm{BPH}}(p^2) = -\frac{i\lambda^2}{32\pi^2}\int_0^1 dx\,\ln\!\left[1 - \frac{p^2 x(1 - x)}{m^2}\right] \qquad [6]$$

In obtaining this from [2], we used a standard 
Feynman parameter formula, 

$$\frac{1}{AB} = \int_0^1 dx\,\frac{1}{[Ax + B(1 - x)]^2} \qquad [7]$$

to combine the propagator denominators, after 
which the integral over the momentum variable 
k is elementary. We then obtain the renormalized 
one-loop (four-point and amputated) Green function 

$$-i\lambda - i\lambda^2\,[I_R(s) + I_R(t) + I_R(u)] + O(\lambda^3) \qquad [8]$$

where s, t, and u are the three standard Mandelstam 
invariants for the Green function. (For a 2 → 2 
scattering process, or a corresponding off-shell 
Green function, in which particles of momenta $p_1$ 
and $p_2$ scatter to particles of momenta $p_1'$ and $p_2'$, 
the Mandelstam variables are defined as $s = (p_1 + p_2)^2$, 
$t = (p_1 - p_1')^2$, and $u = (p_1 - p_2')^2$.) 
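
Both the Feynman parameter formula [7] and the parameter integral in [6] are easy to check numerically. The sketch below uses a plain midpoint rule; the closed form it compares against is the standard result for spacelike momenta and is quoted here as an assumption, not as something stated in the article. Function names are illustrative.

```python
import math

def feynman_rhs(A, B, steps=20000):
    """Midpoint estimate of the right-hand side of eqn [7]."""
    h = 1.0 / steps
    return sum(h / (A * (i + 0.5) * h + B * (1.0 - (i + 0.5) * h)) ** 2
               for i in range(steps))

def bph_integral(p2_over_m2, steps=20000):
    """Midpoint estimate of int_0^1 dx ln[1 - (p^2/m^2) x (1 - x)], valid for p^2 < 4 m^2."""
    h = 1.0 / steps
    return sum(h * math.log(1.0 - p2_over_m2 * (i + 0.5) * h * (1.0 - (i + 0.5) * h))
               for i in range(steps))

def bph_closed_form(p2_over_m2):
    """Closed form of the same integral for spacelike momenta, p^2 < 0 (assumed standard result)."""
    a = -p2_over_m2
    beta = math.sqrt(1.0 + 4.0 / a)
    return -2.0 + beta * math.log((beta + 1.0) / (beta - 1.0))

if __name__ == "__main__":
    print(feynman_rhs(2.0, 5.0), 1.0 / (2.0 * 5.0))
    for ratio in (-0.01, -1.0, -4.0, -100.0):
        print(ratio, bph_integral(ratio), bph_closed_form(ratio))
```

The integral vanishes at p = 0, as the BPH subtraction requires, and grows only logarithmically at large spacelike momentum.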

In the general case, with a nonzero degree of 
divergence, the divergent part of an integral is a 
polynomial in p and m of degree D, where D is the 
largest integer less than or equal to $\Delta$. In a 
higher spacetime dimension, this implies that renormalization 
of the original, momentum-independent, 
interaction vertex is not sufficient to cancel the 
divergences. We would need higher derivative terms, 
and this is evidence that the theory is not renormalizable 
in higher than 4 spacetime dimensions. Even 
so, the terms needed would be local, because of the 
polynomiality in p. 


Complete Formulation of 
Renormalization Program 


The full renormalization program motivated by 
example calculations is: 


• the theory is regulated to cut off the divergences; 

• the numerical value of each coefficient in $\mathcal{L}$ is 
allowed to depend on the regulator parameter 
(e.g., a); and 

• these dependences are adjusted so that finite 
results for Green functions are obtained after 
removal of the regulator. 


In $\phi^4$ theory, we therefore replace $\mathcal{L}$ by 

$$\mathcal{L} = \frac{Z}{2}(\partial\phi)^2 - \frac{m_0^2 Z}{2}\phi^2 - \frac{\lambda_0 Z^2}{4!}\phi^4 \qquad [9]$$

with the bare parameters, Z, $m_0$ and $\lambda_0$, having a 
regulator dependence such that Green functions of $\phi$ 
are finite at a = 0. 

The slightly odd labeling of the coefficients in 
eqn [9] arises because observables like cross sections 
are invariant under a redefinition of the field by a 
factor. In terms of the bare field $\phi_0 \stackrel{\mathrm{def}}{=} \sqrt{Z}\,\phi$, we have 

$$\mathcal{L} = \frac{1}{2}(\partial\phi_0)^2 - \frac{m_0^2}{2}\phi_0^2 - \frac{\lambda_0}{4!}\phi_0^4 \qquad [10]$$

The unit coefficient of $(1/2)(\partial\phi_0)^2$ implies that $\phi_0$ 
has canonical commutation relations (in the regulated 
theory). This provides a natural standard for 
the normalization of the bare mass $m_0$ and the bare 
coupling $\lambda_0$. 

All terms in $\mathcal{L}$ have coefficients with dimension 
zero or larger. This is commonly characterized by 
saying that the terms in $\mathcal{L}$ “have dimension 4 or less,” 
which refers to the products of field operators and 
derivatives in each term. A generalization of the 
power-counting analysis shows that if we start with 
a theory whose $\mathcal{L}$ only has terms of dimension 4 or 
less, then no terms of higher dimension are needed 
as counter-terms, at least not in perturbation theory. 
This is a very powerful restriction on self-contained 
QFTs, and was critical in the discovery of the 
standard model. 

Sometimes it is found that the description of some 
piece of physics appears to need higher-dimension 
operators, as was the case originally with weak- 
interaction physics. The lack of renormalizability of 
such theories indicates that they cannot be complete, 
and an upper bound on the scale of their applic- 
ability can be computed, for example, a few 
hundred GeV for the four-fermion theory of weak 
interactions. Eventually, this theory was superseded 
by the renormalizable Weinberg-Salam theory of 
weak interactions, now a part of the standard 
model, to which the four-fermion theory provides a 
low-energy approximation for charged current weak 
interactions. 

Certain operators of allowed dimensions are 
missing in eqn [9]: the unit operator, and $\phi$ and 
$\phi^3$. Symmetry under the transformation $\phi \to -\phi$ 
implies that Green functions with an odd number of 
fields vanish, so that no $\phi$ and $\phi^3$ counter-terms are 
needed. Divergences with the unit operator do 
appear, but not for ordinary Green functions. In 
gravitational physics, the coefficient of the unit 
operator gives renormalization of the cosmological 
constant. 

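To implement renormalized perturbation theory, 
we partition $\mathcal{L}$ (nonuniquely) as 

$$\mathcal{L} = \mathcal{L}_{\text{free}} + \mathcal{L}_{\text{basic interaction}} + \mathcal{L}_{\text{counter-term}} \qquad [11]$$

where the free, the basic interaction, and the 
counter-term Lagrangians are 

$$\mathcal{L}_{\text{free}} = \frac{1}{2}(\partial\phi)^2 - \frac{m^2}{2}\phi^2 \qquad [12]$$

$$\mathcal{L}_{\text{basic interaction}} = -\frac{\lambda}{4!}\phi^4 \qquad [13]$$

$$\mathcal{L}_{\text{counter-term}} = \frac{Z - 1}{2}(\partial\phi)^2 - \frac{Z m_0^2 - m^2}{2}\phi^2 - \frac{\lambda_0 Z^2 - \lambda}{4!}\phi^4 \qquad [14]$$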

The renormalized coupling and mass, $\lambda$ and m, are to 
be fixed and finite when the UV regulator is removed. 
Both the basic interaction and the counter-terms are 
treated as interactions. First we compute “basic 
graphs” for Green functions using only the basic 
interaction. The counter-terms are expanded in 
powers of $\lambda$, and then all graphs involving counter-term 
vertices at the chosen order in $\lambda$ are added to the 
calculation. The counter-terms are arranged to cancel 
all the divergences, so that the UV regulator can be 
removed, with m and $\lambda$ held fixed. The counter-terms 
cancel the parts of the basic Feynman graphs associated 
with large loop momenta. An algorithmic 
specification of the otherwise arbitrary finite parts of 
the counter-terms is called a renormalization prescription 
or a renormalization scheme. Thus, it gives a 
definite relation between the renormalized and bare 
parameters, and hence a definite specification of the 
partitioning of $\mathcal{L}$ into its three parts. 
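
That the three parts add back up to the bare Lagrangian of eqn [9] is a matter of coefficient bookkeeping, which the following sketch checks with exact rational arithmetic. It assumes the coefficient forms $Z/2$, $-m_0^2 Z/2$, $-\lambda_0 Z^2/4!$ for eqn [9] and the matching differences for the counter-term Lagrangian; the function names and the sample numbers are hypothetical.

```python
from fractions import Fraction as F

def bare_coefficients(Z, m0_sq, lam0):
    """Coefficients of (d phi)^2, phi^2 and phi^4 in the bare Lagrangian, eqn [9] (assumed form)."""
    return (Z / 2, -m0_sq * Z / 2, -lam0 * Z**2 / 24)

def partitioned_coefficients(Z, m0_sq, lam0, m_sq, lam):
    """The same coefficients assembled as free + basic interaction + counter-term."""
    free = (F(1, 2), -m_sq / 2, F(0))
    basic = (F(0), F(0), -lam / 24)
    counter = ((Z - 1) / 2, -(Z * m0_sq - m_sq) / 2, -(lam0 * Z**2 - lam) / 24)
    return tuple(a + b + c for a, b, c in zip(free, basic, counter))

if __name__ == "__main__":
    Z, m0_sq, lam0 = F(3, 2), F(5), F(7)
    # The sum is the same for any choice of renormalized m^2 and lambda,
    # which is the non-uniqueness of the partition mentioned in the text.
    for m_sq, lam in ((F(2), F(1, 3)), (F(9), F(4))):
        print(partitioned_coefficients(Z, m0_sq, lam0, m_sq, lam) == bare_coefficients(Z, m0_sq, lam0))
```

Choosing different renormalized values only moves terms between the free or basic part and the counter-term part; the total is unchanged.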

It has been proved that this procedure works to all 
orders in $\lambda$, with corresponding results for other 
theories. Even in the absence of fully rigorous 
nonperturbative proofs, it appears clear that the results 
extend beyond perturbation theory, at least in asymp- 
totically free theories like QCD: see the discussion on 
Wilsonian RG (see Exact Renormalization Group). 


Dimensional Regularization 
and Minimal Subtraction 


The final result for renormalized graphs does not 
depend on the particular regularization procedure. 
A particularly convenient procedure, especially in 
QCD, is dimensional regularization, where diver- 
gences are removed by going to a low spacetime 
dimension n. To make a useful regularization method, 
n is treated as a continuous variable, $n = 4 - 2\epsilon$. 

Great advantages of the method are that it 
preserves Poincaré invariance and many other 
symmetries (including the gauge symmetry of 
QCD), and that Feynman graph calculations are 
minimally more complicated than for finite graphs 
at n =4, particularly when all the lines are massless, 
as in many QCD calculations. 

Although there is no such object as a genuine 
vector space of finite noninteger dimension, it is 
possible to construct an operation that behaves as if 
it were an integration over such a space. The 
operation was proved unique by Wilson, and 
explicit constructions have been made, so that 
consistency is assured at the level of all Feynman 
graphs. Whether a satisfactory definition beyond 
perturbation theory exists remains to be determined. 

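It is convenient to arrange that the renormalized 
coupling is dimensionless in the regulated theory. 
This is done by changing the normalization of $\lambda$ with 
the aid of an extra parameter, the unit of mass $\mu$: 

$$\lambda_0 = \mu^{2\epsilon}\,(\lambda + \text{counter-terms}) \qquad [15]$$

with $\lambda$ and $\mu$ being held fixed when $\epsilon \to 0$. (Thus, 
the basic interaction in eqn [13] is changed to 
$-\lambda\mu^{2\epsilon}\phi^4/4!$.) Then for the one-loop graph of eqn [2], 
dimensionally regularized Feynman parameter methods 
give 

$$-i\lambda^2\mu^{2\epsilon}\,I(p; m, \epsilon) = \frac{i\lambda^2\mu^{2\epsilon}}{32\pi^2}\,(4\pi)^{\epsilon}\,\Gamma(\epsilon)\int_0^1 dx\left[\frac{m^2 - p^2 x(1 - x) - i0}{\mu^2}\right]^{-\epsilon} \qquad [16]$$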


A natural renormalization procedure is to subtract 
the pole at $\epsilon = 0$, but it is convenient to accompany 
this with other factors to remove some universally 
occurring finite terms. So $\overline{\mathrm{MS}}$ renormalization 
(“modified minimal subtraction”) is defined by 
using the counter-term 

$$-iA_{\overline{\mathrm{MS}}}(\epsilon)\,\lambda^2\mu^{2\epsilon} = -i\lambda^2\mu^{2\epsilon}\,\frac{S_\epsilon}{32\pi^2\epsilon} \qquad [17]$$

where $S_\epsilon \stackrel{\mathrm{def}}{=} (4\pi e^{-\gamma_E})^{\epsilon}$, with $\gamma_E = 0.5772\ldots$ being the 
Euler constant. This gives a renormalized integral (at 
$\epsilon = 0$) 

$$-\frac{i\lambda^2}{32\pi^2}\int_0^1 dx\,\ln\!\left[\frac{m^2 - p^2 x(1 - x) - i0}{\mu^2}\right] \qquad [18]$$

which can be evaluated easily. A particularly simple 
result is obtained at m = 0: 

$$\frac{i\lambda^2}{32\pi^2}\left[-\ln\frac{-p^2}{\mu^2} + 2\right] \qquad [19]$$

This formula symptomizes important and very 
useful algorithmic simplifications in the higher-order 
massless calculations common in QCD. 

The $\overline{\mathrm{MS}}$ scheme amounts to a de facto standard 
for QCD. At higher orders a factor of $S_\epsilon^L$ is used in 
the counter-terms, with L being the number of 
loops. 
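
Two small numerical checks make the role of $S_\epsilon$ and the constant in the massless result concrete. The sketch below assumes the regulated one-loop prefactor has the form $(4\pi)^\epsilon\Gamma(\epsilon)$, and compares it with the counter-term factor $S_\epsilon/\epsilon$; it also evaluates $\int_0^1 dx\,\ln[x(1 - x)]$, which supplies the "+2" at m = 0. Function names are illustrative.

```python
import math

GAMMA_E = 0.5772156649015329  # Euler's constant

def s_eps(eps):
    """S_eps = (4 pi e^{-gamma_E})^eps, as defined after eqn [17]."""
    return (4.0 * math.pi * math.exp(-GAMMA_E)) ** eps

def pole_mismatch(eps):
    """(4 pi)^eps Gamma(eps) - S_eps / eps: what remains of the assumed regulated
    prefactor once the MS-bar counter-term has removed the pole."""
    return (4.0 * math.pi) ** eps * math.gamma(eps) - s_eps(eps) / eps

def massless_constant(steps=200000):
    """Midpoint estimate of int_0^1 dx ln[x (1 - x)]."""
    h = 1.0 / steps
    return sum(h * math.log((i + 0.5) * h * (1.0 - (i + 0.5) * h)) for i in range(steps))

if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, pole_mismatch(eps))
    print(massless_constant())
```

The mismatch goes to zero linearly in $\epsilon$, so no $\ln 4\pi - \gamma_E$ constant survives at $\epsilon = 0$; a strict pole subtraction would leave exactly that constant behind.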


Coordinate Space 


Quantum fields are written as if they are functions 
of x, but they are in fact distributions or generalized 
functions, with quantum-mechanical operator 
values. This indicates that using products of fields 
is dangerous and in need of careful definition. The 
relation with ordinary distribution theory is simplest 
in the coordinate-space version of Feynman graphs. 
Indeed in the 1950s, Bogoliubov and Shirkov 
formulated renormalization as a problem of 
defining products of the singular numeric-valued 
distributions in coordinate-space Feynman graphs; 
theirs was perhaps the best treatment of renormali- 
zation in that era. 


For example, the coordinate-space version of 
eqn [5] is 

$$-\lambda^2 \lim_{a \to 0}\int d^4x\,d^4y\; f(x, y)\left[\tfrac{1}{2}\,S(x - y; m, a)^2 + iA(a)\,\delta^{(4)}(x - y)\right] \qquad [20]$$

where x and y are the coordinates for the interaction 
vertices, f(x, y) is the product of external-line free 
propagators, and S(x − y; m, a) is the coordinate-space 
free propagator, which at a = 0 has a 
singularity 

$$\frac{1}{4\pi^2\,[-(x - y)^2 + i0]} \qquad [21]$$

as $(x - y)^2 \to 0$. We see in eqn [20] a version of the 
Hadamard finite part of a divergent integral, and 
renormalization theory generalizes this to particular 
kinds of arbitrarily high-dimension integrals. The 
physical realization and justification of the use of 
the finite-part procedure is in terms of renormalization 
of parameters in the Lagrangian; this also gives 
the procedure a significance that goes beyond the 
integrals themselves and involves the full nonperturbative 
formulation of QFT. 


General Counter-Term Formulation 


We have written £ as a basic Lagrangian density 
plus counter-terms, and have seen in an example 
how to cancel divergences at one-loop order. In this 
section, we will see how the procedure works to all 
orders. The central mathematical tool is Bogoliubov’s 
R-operation. Here the counter-terms are expanded 
as a sum of terms, one for each basic one-particle 
irreducible (1PI) graph with a non-negative degree 
of divergence. To each basic graph for a Green 
function is added a set of counter-term graphs 
associated with divergences for subgraphs. The 
central theorem of renormalization is that this 
procedure does in fact remove all the UV diver- 
gences, with the form of the counter-terms being 
determined by the simple computation of the degree 
of divergence for 1PI graphs. 

To see the essential difficulty to be solved, consider 
a two-loop graph like the first one in Figure 3. Its 
divergence is not a polynomial in external momenta, 
and is therefore not canceled by an allowed counter-term. 
This is shown by differentiation with respect to 
external momenta, which does not produce a finite 
result because of the divergent one-loop subgraph. 
But for consistency of the theory, the one-loop 
counter-terms already computed must be themselves 
put into loop graphs. Among others, this gives the 
second graph of Figure 3, where the cross denotes 
that a counter-term contribution is used. The 
contribution used here is actually 2/3 of the total 
one-loop counter-term, for reasons of symmetry 
factors that are not fully evident at first sight. The 
remainder of the one-loop coupling renormalization 
cancels a subdivergence in another two-loop graph. 
It is readily shown that the divergence of the sum of 
the first two graphs in Figure 3 is momentum 
independent, and thus can be canceled by a vertex 
counter-term. 


Figure 3 A two-loop graph and its counter-terms. The label B 
indicates that it is the two-loop overall counter-term for this graph. 

This method is fully general, and is formalized in 
the Bogoliubov R-operation, which gives a recursive 
specification of the renormalized value R(G) of a 
graph G: 

$$R(G) \stackrel{\mathrm{def}}{=} G + \sum_{\{\gamma_1, \ldots, \gamma_n\}} G\big|_{\gamma_i \to C(\gamma_i)} \qquad [22]$$

The sum is over all sets of nonintersecting 1PI 
subgraphs of G, and the notation $G|_{\gamma_i \to C(\gamma_i)}$ denotes 
G with all the subgraphs $\gamma_i$ replaced by associated 
counter-terms $C(\gamma_i)$. The counter-term $C(\gamma)$ of a 1PI 
graph $\gamma$ has the form 

$$C(\gamma) \stackrel{\mathrm{def}}{=} -T(\gamma + \text{counter-terms for subdivergences}) \qquad [23]$$

Here T is an operation that extracts the divergent 
part of its argument and whose precise definition 
gives the renormalization scheme. For example, in 
minimal subtraction we define 

$$T(\Gamma) = \text{pole part at } \epsilon = 0 \text{ of } \Gamma \qquad [24]$$

We formalize the term inside parentheses in eqn 
[23] as 

$$\bar{R}(\gamma) \stackrel{\mathrm{def}}{=} \gamma + \text{counter-terms for subdivergences} = \gamma + {\sum_{\{\gamma_1, \ldots, \gamma_n\}}}^{\!\prime}\; \gamma\big|_{\gamma_i \to C(\gamma_i)} \qquad [25]$$

where the prime on the $\sum'$ denotes that we sum over 
all sets of nonintersecting 1PI subgraphs except for 
the case that there is a single $\gamma_i$ equal to the whole 
graph (i.e., the term with n = 1 and $\gamma_1 = \gamma$ is 
omitted). 

Note that, for the $\overline{\mathrm{MS}}$ scheme, we define the T 
operation to be applied to a factor of constant 
dimension obtained by taking the appropriate power 
of $\mu^{\epsilon}$ outside of the pole-part operation. Moreover, 
it is not a strict pole-part operation; instead each 
pole is to be multiplied by $S_\epsilon^L$, where L is the 
number of loops, and $S_\epsilon$ is defined after eqn [17]. 

Equations [22]-[25] give a recursive construction 
of the renormalization of an arbitrary graph. The 
recursion starts on one-loop graphs, since they have 
no subdivergences, that is, $C(\gamma) = -T(\gamma)$ for a one- 
loop 1PI graph. 
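
The combinatorial part of the R-operation, the sum over sets of nonintersecting subgraphs, can be sketched symbolically. The toy below does not evaluate any integrals: it assumes the divergent 1PI subgraphs are supplied by hand, each represented (hypothetically) by the set of vertices it occupies, and treats two subgraphs as nonintersecting when those vertex sets are disjoint. All names are illustrative.

```python
from itertools import combinations

def nonintersecting_sets(subgraphs):
    """All non-empty sets of pairwise disjoint subgraphs.
    `subgraphs` maps a label to the frozenset of vertices the subgraph occupies."""
    labels = sorted(subgraphs)
    result = []
    for size in range(1, len(labels) + 1):
        for combo in combinations(labels, size):
            if all(subgraphs[a].isdisjoint(subgraphs[b])
                   for a, b in combinations(combo, 2)):
                result.append(combo)
    return result

def r_operation_terms(graph_label, subgraphs):
    """Symbolic terms of R(G) in eqn [22]: G itself plus one term per allowed set."""
    terms = [graph_label]
    for combo in nonintersecting_sets(subgraphs):
        swaps = ", ".join(f"{g} -> C({g})" for g in combo)
        terms.append(f"{graph_label}|[{swaps}]")
    return terms

if __name__ == "__main__":
    # Two-loop graph G on vertices {1, 2, 3} with a one-loop subgraph gamma on {1, 2};
    # the whole graph counts as a subgraph of itself and yields the overall counter-term.
    two_loop = {"G": frozenset({1, 2, 3}), "gamma": frozenset({1, 2})}
    for term in r_operation_terms("G", two_loop):
        print(term)
```

For the two-loop example this produces three terms: the graph itself, the graph with the subgraph replaced by its counter-term, and the overall counter-term, which is the pattern of the three graphs described for Figure 3.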

Each counter-term $C(\gamma)$ is implemented as a 
contribution to the counter-term Lagrangian. The 
Feynman rules ensure that once $C(\gamma)$ has been 
computed, it appears as a vertex in bigger graphs 
in such a way as to give exactly the counter-terms 
for subdivergences used in the R-operation. It has 
been proved that the R-operation does in fact give 
finite results for Feynman graphs, and that basic 
power counting in exactly the same fashion as at 
one-loop determines the relevant operators. 

In early treatments of renormalization, a problem 
was caused by graphs like Figure 4. This graph has 
three divergent subgraphs which overlap, rather 
than being nested. Within the R-operation approach, 
such cases are no harder to deal with than merely 
nested divergences. 

The recursive specification of R-operation can be 
converted to a nonrecursive formulation by the 
forest formula of Zavyalov and Stepanov, later 
rediscovered by Zimmermann. It is normally the 
recursive formulation that is suited to all-orders 
proofs. 

Whether these results, proved to all orders of 
perturbation theory, genuinely extend to the com- 
plete theory is not so easy to answer, certainly in a 
realistic four-dimensional QFT. One illuminating 
case is of a nonrelativistic quantum mechanics 
model with a delta-function potential in a two- 
dimensional space. Renormalization can be applied 
just as in field theory, but the model can also be 
treated exactly, and it has been shown that the 
results agree with perturbation theory. 

Perturbation series in relativistic QFTs can at best 
be expected to be asymptotic, not convergent. So 
instead of a radius of convergence, we should talk 
about a region of applicability of a weak-coupling 
expansion. In a direct calculation of counter-terms, 
etc., the radius of applicability shrinks to zero as the 
regulator is removed. However, we can deduce the 
expansion for a renormalized quantity, whose 
expansion is expected to have a nonzero range of 
applicability. We can therefore appeal to the 
uniqueness of power series expansions to allow the 
calculation, at intermediate stages, to use bare 
quantities that are divergent as the regulator is 
removed. 


Figure 4 Graph with overlapping divergent subgraphs. 


Renormalizability, Non-Renormalizability, 
and Super-Renormalizability 


The basic power-counting method shows that if a 
theory with conventional fields (at n=4) has only 
operators of dimension 4 or less in its $\mathcal{L}$, then the 
necessary counter-term operators are also of dimen- 
sion 4 or less. So if we start with a Lagrangian with 
all possible such operators, given the field content, 
then the theory is renormalizable. This is not the 
whole story, as we will see in the discussion of gauge 
theories. 

If we start with a Lagrangian containing operators 
of dimension higher than 4, then renormalization 
requires operators of ever higher dimension as 
counter-terms when one goes to higher orders in 
perturbation theory. Therefore, such a theory is said 
to be perturbatively non-renormalizable. Some very 
powerful methods of cancelation or some nonper- 
turbative effects are needed to evade this result. 

In the case of dimension-4 interactions, there is 
only a finite set of operators given the set of basic 
fields, but divergences occur at arbitrarily high 
orders in perturbation theory. If, instead, all the 
operators have at most dimension 3, then only a 
finite number of graphs need counter-terms. Such 
theories are called super-renormalizable. The diver- 
gent graphs also occur as subgraphs inside bigger 
graphs, of course. There is only one such theory in a 
four-dimensional spacetime: $\phi^3$ theory, which suf- 
fers from an energy density that is unbounded from 
below, so it is not physical. In lower spacetime 
dimension, where the requirements on operator 
dimension are different, there are many more 
known super-renormalizable theories, some with a 
very rigorous proof of existence. 
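
The classification above can be illustrated with a minimal power-counting sketch. It assumes a single scalar field of canonical mass dimension (n-2)/2 in n spacetime dimensions and a monomial interaction $\phi^p$ without derivatives; the function names are illustrative and are not taken from the text. The sign of the mass dimension of the coupling then decides the class.

```python
from fractions import Fraction


def coupling_dimension(power, spacetime_dim):
    """Mass dimension of the coupling multiplying phi**power.

    Assumes a scalar field with canonical dimension (n - 2) / 2, so the
    operator phi**power has dimension power * (n - 2) / 2 and the coupling
    carries the remainder needed to reach dimension n.
    """
    field_dim = Fraction(spacetime_dim - 2, 2)
    return spacetime_dim - power * field_dim


def classify(power, spacetime_dim):
    """Perturbative power-counting class of a phi**power interaction."""
    dim = coupling_dimension(power, spacetime_dim)
    if dim > 0:
        return "super-renormalizable"
    if dim == 0:
        return "renormalizable"
    return "non-renormalizable"


if __name__ == "__main__":
    for power, n in [(3, 4), (4, 4), (6, 4), (4, 3), (6, 3)]:
        print(f"phi^{power} in n={n}: {classify(power, n)}")
```

As in the text, this is only a perturbative guide: it says nothing about symmetries, about whether the energy is bounded from below, or about the exact theory.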

All the above characterizations rely primarily on 
perturbative analysis, so they are subject to being 
not quite accurate in an exact theory, but they form 
a guide to the relevant issues. 


Renormalization and Symmetries: 
Gauge Theories 


In most physical applications, we are interested in 
QFTs whose Lagrangian is restricted to obey certain 
symmetry requirements. Are these symmetries pre- 
served by renormalization? That is, is the Lagran- 
gian with all necessary counter-terms still invariant 
under the symmetry? 


We first discuss nonchiral symmetries; these are 
symmetries in which the left-handed and right- 
handed parts of Dirac fields transform identically. 

For Poincaré invariance and simple global internal 
symmetries, it is simplest to use a regulator, like 
dimensional regularization, which respects the sym- 
metries. Then it is easily shown that the symmetries 
are preserved under renormalization. This holds 
even if the internal symmetries are spontaneously 
broken (as happens with a “wrong-sign mass term,” 
e.g., negative $m^2$ in eqn [1]). 

The case of local gauge symmetries is harder. But 
their preservation is more important, because gauge 
theories contain vector fields which, without a gauge 
symmetry, generally give unphysical features to the 
theory. For perturbation theory, BRST quantization 
is usually used, in which, instead of gauge symme- 
try, there is a BRST supersymmetry. This is 
manifested at the Green function level by Slavnov— 
Taylor identities that are more complicated, in 
general, than the Ward identities for simple global 
symmetries and for abelian local symmetries. 

Dimensional regularization preserves these 
symmetries and the Slavnov—Taylor identities. More- 
over, the R-operation still produces finite results with 
local counter-terms, but cancelations and relations 
occur between divergences for different graphs in 
order to preserve the symmetry. A simple example is 
QED, which has an abelian U(1) gauge symmetry, and 
whose gauge-invariant Lagrangian is 


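$$\mathcal{L} = -\frac{1}{4}\left(\partial_\mu A^{(0)}_\nu - \partial_\nu A^{(0)}_\mu\right)^2 + \bar\psi_0\left(i\gamma^\mu\partial_\mu - e_0\gamma^\mu A^{(0)}_\mu - m_0\right)\psi_0 \qquad [26]$$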


At the level of individual divergent 1PI graphs, 
we get counter-terms proportional to $A_\mu^2$ and to 
$(A_\mu^2)^2$, operators not present in the gauge-invariant 
Lagrangian. The Ward identities and Slavnov-Taylor 
identities show that these counter-terms cancel when 
they are summed over all graphs at a given order of 
renormalized perturbation theory. Moreover, the 
renormalizations of the coupling and of the gauge field are 
inverse, so that $e_0 A^{(0)}_\mu$ equals the corresponding 
object with renormalized quantities, $\mu^\epsilon e A_\mu$. Natu- 
rally, sums of contributions to a counter-term in 
$\mathcal{L}$ can only be quantified with use of a regulator. 
In nonabelian theories, the gauge-invariance proper- 
ties are not just the absence of certain terms in $\mathcal{L}$ but 
quantitative relations between the coefficients of terms 
with different numbers of fields. Even so, the argument 
with Slavnov—Taylor identities generalizes appropri- 
ately and proves renormalizability of QCD, for 
example. But note that the relation concerning the 
product of the coupling and the gauge field does not 
generally hold; the form of the gauge transformation is 
itself renormalized, in a certain sense. 


Anomalies 


Chiral symmetries, as in the weak-interaction part of 
the gauge symmetry of the standard model, are 
much harder to deal with. Chiral symmetries are 
ones for which the left-handed and right-handed 
components of Dirac fields transform independently 
under different components of the symmetry group, 
local or global as the case may be. Occasionally, 
some or other of the left-handed or right-handed 
components may not even be present. 

In general, chiral symmetries are not preserved by 
regularization, at least not without some other 
pathology. At best one can adjust the finite parts of 
counter-terms such that in the limit of the removal of 
the regulator, the Ward or Slavnov—Taylor identities 
hold. But in general, this cannot be done consistently, 
and the theory is said to suffer from an anomaly. In 
the case of chiral gauge theories, the presence of an 
anomaly prevents the (candidate) theory from being 
valid. A dramatic and nontrivial result (Adler— 
Bardeen theorem and some nontrivial generaliza- 
tions) is that if chiral anomalies cancel at the 
one-loop level, then they cancel at all orders. 

Similar results, but more difficult ones, hold for 
supersymmetries. 

The anomaly cancelation conditions in the standard 
model lead to constraints that relate the lepton content 
to the quark content in each generation. For example, 
given the existence of the b quark, and the $\tau$ and $\nu_\tau$ 
leptons (of masses around 4.5 GeV, 1.8 GeV, and zero 
respectively), it was strongly predicted on the grounds 
of anomaly cancelation that there must be a t quark 
partner of the b to complete the third generation of 
quark doublets. This prediction was much later 
vindicated by the discovery of the much heavier top 
quark with $m_t \approx 175$ GeV. 
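
The way anomaly cancelation ties leptons to quarks can be checked with exact arithmetic. The sketch below is not part of the original argument: it assumes the standard hypercharge assignments of one generation, written as left-handed Weyl fields in the normalization Q = T_3 + Y, and evaluates the four anomaly coefficients that involve the U(1) factor. The table layout and the function name are illustrative choices.

```python
from fractions import Fraction as F

# One generation as left-handed Weyl fields (assumed convention Q = T3 + Y):
# (label, colour multiplicity, weak-isospin multiplicity, hypercharge Y)
GENERATION = [
    ("quark doublet", 3, 2, F(1, 6)),
    ("anti-up singlet", 3, 1, F(-2, 3)),
    ("anti-down singlet", 3, 1, F(1, 3)),
    ("lepton doublet", 1, 2, F(-1, 2)),
    ("anti-electron singlet", 1, 1, F(1)),
]


def anomaly_coefficients(fields):
    """Sums whose vanishing is required for the U(1)-related anomalies."""
    return {
        "U(1)^3": sum(c * w * y ** 3 for _, c, w, y in fields),
        "grav^2 U(1)": sum(c * w * y for _, c, w, y in fields),
        "SU(2)^2 U(1)": sum(c * y for _, c, w, y in fields if w == 2),
        "SU(3)^2 U(1)": sum(w * y for _, c, w, y in fields if c == 3),
    }


full = anomaly_coefficients(GENERATION)
quarks_only = anomaly_coefficients([f for f in GENERATION if f[1] == 3])

if __name__ == "__main__":
    print("full generation:", full)
    print("quarks only:    ", quarks_only)
```

With the full generation every coefficient vanishes, while the quark fields alone leave nonzero U(1)^3 and SU(2)^2 U(1) coefficients. That is the sense in which the lepton content constrains the quark content.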


Renormalization Schemes 


A precise definition of the counter-terms entails 
a specification of the renormalization prescription 
(or scheme), so that the finite parts of the counter- 
terms are determined. This apparently induces extra 
arbitrariness in the results. However, in the $\phi^4$ 
Lagrangian (for example), there are really only two 
independent parameters. (A scaling of the field does 
not affect any observables, so we do not count Z as 
a parameter here.) Thus, at fixed regulator para- 
meter a or $\epsilon$, renormalization actually just gives a 
reparametrization of a two-parameter collection of 
theories. A renormalization prescription gives the 
change of variables between bare and renormalized 
parameters, a rather singular transformation when 
the regulator is removed. If we have two different 
prescriptions, we can deduce a transformation 
between the renormalized parameters in the two 
schemes. The renormalized mass and coupling $m_1$ 
and $\lambda_1$ in one scheme can be obtained as functions 
of their values $m_2$ and $\lambda_2$ in the other scheme, with 
the bare parameters, and hence the physics, being 
the same in both schemes. Since these are renorma- 
lized parameters, the removal of the regulator leaves 
the transformation well behaved. 

Generalization to all renormalizable theories is 
immediate. 


Renormalization Group and Applications 
and Generalizations 


One part of the choice of renormalization scheme is 
that of a scale parameter such as the unit of mass $\mu$ of 
the $\overline{\mathrm{MS}}$ scheme. The physical predictions of the theory 
are invariant if a change of $\mu$ is accompanied by a 
suitable change of the renormalized parameters, now 
considered as $\mu$-dependent parameters $\lambda(\mu)$ and $m(\mu)$. 
These are called the effective, or running, coupling and 
mass. The transformation of the parametrization of 
the theory is called an RG transformation. 

The bare coupling and mass $\lambda_0$ and $m_0$ are RG 
invariant, and this can be used to obtain equations 
for the RG evolution of the effective parameters 
from the perturbatively computed counter-terms. 
For example, in $\phi^4$ theory, we have (in the 
renormalized theory after removal of the regulator) 


$$\frac{d\lambda}{d\ln\mu^2} = \beta(\lambda) \qquad [27]$$


with $\beta(\lambda) = 3\lambda^2/(16\pi^2) + O(\lambda^3)$. As exemplified in 
eqns. [18] and [19], Feynman diagrams depend 
logarithmically on $\mu$. By choosing $\mu$ to be comparable 
to the physical external momentum scale, we remove 
possible large logarithms in this and higher orders. 
Thus, provided that the effective coupling at this scale 
is weak, we get an effective perturbation expansion. 
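
As an illustration, the RG equation [27] can be solved in closed form once the beta function is truncated at its leading term. The sketch below assumes exactly that truncation, $\beta(\lambda) = 3\lambda^2/(16\pi^2)$, and compares the closed-form running coupling with a direct numerical integration. The function names, the reference scale `mu0` and the step count are illustrative assumptions.

```python
import math

# Leading coefficient of the beta function quoted after eqn [27].
B1 = 3.0 / (16.0 * math.pi ** 2)


def beta(lam):
    """One-loop truncation of the beta function (higher orders dropped)."""
    return B1 * lam ** 2


def running_coupling(lam0, mu0, mu):
    """Closed-form solution of d(lam)/d(ln mu^2) = B1 * lam^2."""
    t = math.log(mu ** 2 / mu0 ** 2)
    denom = 1.0 - B1 * lam0 * t
    if denom <= 0.0:
        raise ValueError("one-loop solution is singular at or below this scale")
    return lam0 / denom


def integrate_rg(lam0, mu0, mu, steps=2000):
    """Fourth-order Runge-Kutta integration of the same equation in ln mu^2."""
    h = math.log(mu ** 2 / mu0 ** 2) / steps
    lam = lam0
    for _ in range(steps):
        k1 = beta(lam)
        k2 = beta(lam + 0.5 * h * k1)
        k3 = beta(lam + 0.5 * h * k2)
        k4 = beta(lam + h * k3)
        lam += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return lam


if __name__ == "__main__":
    print(running_coupling(0.5, 1.0, 100.0), integrate_rg(0.5, 1.0, 100.0))
```

Within this truncation the coupling grows with $\mu$, since the leading coefficient is positive. The closed form also shows how the logarithm of the scale ratio is absorbed into the effective coupling.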

This is a basic technique for exploiting perturba- 
tion theory in QCD, for the strong interactions, 
where the interactions are not automatically weak. 
In this theory the RG $\beta$ function is negative so that 
the coupling decreases to zero as $\mu \to \infty$; this is the 
asymptotic freedom of QCD. 

A closely related method is that associated with 
the Callan—Symanzik equation, which is a formula- 
tion of a Ward identity for anomalously broken 
scale invariance. However, RG methods are the 
ones normally used in practice, even if an 
RG equation is sometimes incorrectly labeled as a Callan- 
Symanzik equation. 

The elementary use of the RG is not sufficient for 
most interesting processes, which involve a set of 
widely different scales. Then more powerful theo- 
rems come into play. Typical are the factorization 
theorems of QCD (see Quantum Chromodynamics). 
These express differential cross sections for certain 
important reactions as a product of quantities that 
involve a single scale: 


$$d\sigma = C(Q, \mu, \lambda(\mu)) \otimes f(m, \mu, \lambda(\mu)) + \text{small correction} \qquad [28]$$


The product is typically a matrix or a convolution 
product. The factors obey nontrivial RG equations, 
and these enable different values of $\mu$ to be used in 
the different factors. Predictions arise because some 
factors and the kernels of the RG equation are 
perturbatively calculable, with a weak effective 
coupling. Other factors, such as f in eqn [28], are 
not perturbative. These are quantities with names 
like “parton distribution functions,” and they are 
universal between many different processes. Thus, 
the nonperturbative functions can be measured in a 
limited set of reactions and used to predict cross 
sections for many other reactions with the aid of 
calculations of the perturbative factors. 

Ultimately, this whole area depends on physical 
phenomena associated with renormalization. 


Concluding Remarks 


The actual ability to remove the divergences in 
certain QFTs to produce consistent, finite, and 
nontrivial theories is a quite dramatic result. More- 
over, associated with the integrals that give the 
divergences is behavior of the kind that is analyzed 
with RG methods and generalizations. So the 
properties of QFTs associated with renormalization 
get tightly coupled to many interesting consequences 
of the theories, most notably in QCD. 

QFTs are actually very abstruse and difficult 
theories; only certain aspects currently lend them- 
selves to practical calculations. So the reader should 
not assume that all aspects of their rigorous 
mathematical treatment are perfect. Experience, 
both within the theories and in their comparison 
with experiment, indicates, nevertheless, that we 
have a good approximation to the truth. 

When one examines the mathematics associated 
with the R-operation and its generalizations with 
factorization theorems, there are clearly present 
some interesting mathematical structures that are 
not yet formulated in their most general terms. Some 
indications of this can be seen in the work by 
Connes and Kreimer (see Hopf Algebra Structure of 
Renormalizable Quantum Field Theory), where it is 
seen that renormalization is associated with a Hopf 
algebra structure for Feynman graphs. 

With such a deep subject, it is not surprising that 
it lends itself to other approaches, notably the 
Connes—Kreimer one and the Wilsonian one (see 
Exact Renormalization Group). Readers new to the 
subject should not be surprised if it is difficult to get 
a fully unified view of these different approaches. 


Notes on Bibliography 


Reliable textbooks on quantum field theory 
are Sterman (1993) and Weinberg (1995). A clear 
account of the foundations of perturbative QCD 
methods is given by Sterman (1996). 

A pedagogical account of renormalization and 
related subjects may be found in Collins (1984). 
The best account of renormalization theory before 
the 1970s is given by Bogoliubov and Shirkov 
(1959); the viewpoint is very modern, including a 
coordinate-space distribution-theoretic view. A 
full account of the Wilsonian method as applied 
to renormalization is given by Polchinski (1984). 

Manuel and Tarrach (1994) give an excellent account 
of renormalization for a theory with a non-relativistic 
delta-function potential in 2 space dimensions, which 
provides a fully tractable model. 

Tkachov (1994) reviews a systematic application 
of distribution theoretic methods to asymptotic 
problems in QFT. Finally, Weinzierl (1999) provides 
a construction of dimensional regularization with the 
aid of K-theory using an underlying vector space of 
the physical integer dimension. Other constructions, 
referred to in this paper, follow Wilson and use an 
infinite-dimensional underlying space. 


Renormalization Group 
and Condensed Matter 


Statistical mechanical systems at critical points 
exhibit scaling laws of order parameters, susceptibi- 
lities, and other observables. The exponents of these 
laws are universal, that is, independent of most 
details of the system. For example, the liquid-gas 
transition for real gases has the same exponents as 
the magnetization transition in the three-dimensional 
Ising model. 

The renormalization group (RG) was developed 
by Kadanoff, Wilson, and Wegner, to understand 
these critical phenomena (Domb and Green 1976). 
The central idea is that the system becomes scale 
invariant at the critical point, which makes it 
natural to average over degrees of freedom on 
increasing length scales successively in the 
calculation of the partition function. This leads to a 
map between effective interactions associated to 
different length scales. Thus, the focus shifts from 
the analysis of a single interaction to that of a flow 
on a space of interactions. This space is in general 
much larger than the original formulation of the 
model would suggest: the description of long- 
distance or low-energy properties may be in terms 
of variables that were not even present in the 
original formulation of the system. Phenomeno- 
logically, this corresponds to the emergence of 
collective degrees of freedom. 

Condensed matter theory is itself already an 
effective theory, and its “microscopic” formulation 
gets inputs from the underlying theories, which 
determine in particular the statistics of the particles 
and their interactions at the scale of atomic energies. 
At much lower-energy scales, which are relevant for 
low-temperature phenomena in condensed matter, 
collective excitations of different, sometimes exotic, 
statistics may emerge, but the starting point is given 
naturally in terms of fermionic and bosonic parti- 
cles. For this reason, the discussion given below will 
be split in these two cases. 

A major difference between high-energy and 
condensed matter systems is that the latter have a 
well-defined Hamiltonian which can be used to 
define the finite-volume ensembles of quantum 
statistical mechanics and which determines the time 
evolution, as well as various analyticity properties. 

The relevant spatial dimensions in condensed 
matter are $d \le 3$, but some results in higher 
dimensions relevant for the development of the 
method will also be discussed below. The cases 
d=1 and d=2 have always been of mathematical 
interest but in recent years have become important 
for the theory of new materials. 

Some interesting topics cannot be covered here 
due to space restrictions, notably the application of 
renormalization methods to membrane theory (see 
Wiese (2001)) and renormalization methods for 
operators (see Bach et al. (1998)). 


The Renormalization Group 


In this section we briefly describe the setup of two 
important versions of the RG, namely the block spin 
RG and the RG based on scale decompositions of 
singular covariances. 


Block spin RG 


Let $\Lambda$ be a finite lattice, for example, a finite subset 
of $\mathbb{Z}^d$. For the following, it is convenient to take $\Lambda$ 
to be a cube of side-length $L^K$ for L > 1 and some 
large K. Let T be a set and $\Phi_\Lambda = \{\phi : \Lambda \to T\}$ be the 
set of spin configurations. Common examples for 
the target space T are $T = \{-1, 1\}$ for the Ising 
model, $T = S^{N-1}$ for the O(N) model, and $T = \mathbb{R}^n$ 
for unbounded spins. Let $S_\Lambda : \Phi_\Lambda \to \mathbb{R}$, $\phi \mapsto S_\Lambda(\phi)$ be 
an interaction and 


$$Z(\Lambda, S_\Lambda) = \int \prod_{x \in \Lambda} d\phi(x)\, e^{-S_\Lambda(\phi)} \qquad [1]$$


In the unbounded case, $S_\Lambda$ is assumed to grow 
sufficiently fast for $|\phi| \to \infty$, so that Z exists; for the 
case of a finite set T, the integral is replaced by a 
sum. Denote the corresponding Boltzmann factor by 
$\rho(\Lambda, S_\Lambda)$, 


$$\rho(\Lambda, S_\Lambda)(\phi) = \frac{e^{-S_\Lambda(\phi)}}{Z(\Lambda, S_\Lambda)} \qquad [2]$$


The block spin transformation consists of an 
integration step and a rescaling step. Divide the 
lattice into cubic blocks of side-length L and define 
a new lattice $\Lambda'$ by associating one lattice site of the 
new lattice to each L-block of the old lattice. For 
any $\phi' : \Lambda' \to T$, let 


$$\rho'(\phi') = \int \prod_{x \in \Lambda} d\phi(x)\, P(\phi', \phi)\, \rho(\phi) \qquad [3]$$

where $P(\phi', \phi) \ge 0$ and $\int \prod_{x' \in \Lambda'} d\phi'(x')\, P(\phi', \phi) = 1$ 
for all $\phi$, so that $\rho'$ remains a probability distribu- 
tion. Since $\rho'$ is positive, one defines 


$$S'_{\Lambda'}(\phi') = -\log \rho'(\phi') \qquad [4]$$


By construction, the partition function is invariant: 
$Z(\Lambda', S'_{\Lambda'}) = Z(\Lambda, S_\Lambda)$. The new lattice $\Lambda'$ has spacing L; 
now rescale to make it a unit lattice. This completes 
the RG step in finite volume. 
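
A minimal worked instance of such a step, under assumptions that go beyond the text, is decimation of the periodic one-dimensional Ising chain. The block size is 2, the blocking rule is deterministic (the new spin is the spin at every second site), and the interaction is a nearest-neighbour coupling K. Summing out the other spins gives a chain of half the length with coupling K' = (1/2) log cosh 2K, together with a field-independent constant per block. The sketch checks the resulting partition function identity by brute force; all names are illustrative.

```python
import itertools
import math


def partition_function(n_sites, coupling):
    """Brute-force Z of a periodic Ising chain, weight exp(K * sum s_i s_{i+1})."""
    total = 0.0
    for spins in itertools.product((-1, 1), repeat=n_sites):
        bonds = sum(spins[i] * spins[(i + 1) % n_sites] for i in range(n_sites))
        total += math.exp(coupling * bonds)
    return total


def decimate(coupling):
    """Sum out every second spin.

    Uses 2*cosh(K*(s + s')) = A * exp(K' * s * s') for s, s' in {-1, +1},
    which fixes K' = 0.5*log(cosh(2K)) and A = 2*sqrt(cosh(2K)).
    """
    new_coupling = 0.5 * math.log(math.cosh(2.0 * coupling))
    constant = 2.0 * math.sqrt(math.cosh(2.0 * coupling))
    return new_coupling, constant


N_SITES, K = 8, 0.7
K_NEW, A = decimate(K)
z_fine = partition_function(N_SITES, K)
z_coarse = A ** (N_SITES // 2) * partition_function(N_SITES // 2, K_NEW)

if __name__ == "__main__":
    print(z_fine, z_coarse, K_NEW)
```

The factor A per block plays the role of a constant term in the new interaction. Once it is included, the two partition functions agree. The new coupling is smaller than the old one, so iterating the map drives the chain toward the trivial fixed point K = 0.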

In an algorithmic sense, the “blocking rule” 
$P(\phi', \phi)$ can be viewed as a transition probability of 
a configuration $\phi$ to a configuration $\phi'$. P may be 
deterministic, that is, simply fix $\phi'$ as a function 
of $\phi$. From the intuition of averaging over local 
fluctuations, $\phi'$ is often taken to be some average of 
$\phi(x)$ at x in a block around x', hence the name. 

Obviously, the RG transformation thus defined 
cannot be iterated arbitrarily often, since in every 
application, the number of points of the lattice shrinks 
by a factor $L^d$, so that after K iterations, a lattice with 
only a single point is left over. It is necessary to take the 
infinite-volume limit $L \to \infty$ to obtain a map that 
operates from a space to itself. However, [4] can 
become problematic in that limit: Gibbs measures $\rho$ 
can map to measures $\rho'$ whose large-deviation proper- 
ties differ from those of Gibbs measures. The discus- 
sion of this problem and its solution is reviewed in 
Bricmont and Kupiainen (2001). The problem can be 
solved in different ways, relaxing conditions on Gibbs 
measures or, in the Ising model, changing the descrip- 
tion from the spins to the contours. The crucial point is 
that the difficulties arise only because [4] is applied 
globally, that is, to every $\phi'$. The set of bad $\phi'$ has very 
small probability. 

Block spin methods have been used in mathema- 
tical construction of quantum field theories, for 
example, in the work of Gawedzki and Kupiainen 
(1985) and Balaban (1988) (see the subsection 
“Field theory and statistical mechanics”). The 
above-mentioned problem was avoided there by 
not taking a logarithm in the so-called large-field 
region (which has very small probability). 


Scale Decomposition RG 


The generating functionals of quantum field theory 
and quantum statistical mechanics can be cast into 
the form 


$$Z(C, V, \phi) = \int d\mu_C(\phi')\, e^{-V(\phi' + \phi)} \qquad [5]$$


Here $d\mu_C$ denotes the Gaussian measure with covar- 
iance C, and V is the two-body interaction between the 
particles. The field variables are real or complex for 
bosons and Grassmann-valued for fermions. Differ- 
entiating log Z with respect to the external field $\phi$ 
generates the connected amputated correlation func- 
tions. The covariance determines the free propagation 
of particles; the interaction their collisions. 

In most cases, such functional integrals are a priori 
ill-defined, even if V is small (and bounded from 
below) because the covariance C is singular. That is, 
the integral kernel C(X, X') of the operator C either 
diverges as $|x - x'| \to 0$ (ultraviolet (UV) problem) or 
C(X, X') has a slow decay as $|x - x'| \to \infty$ (infrared 
(IR) problem). In our notational convention, X may, 
in addition to the configuration variable x, also 
contain discrete indices of the fields, such as a spin or 
color index. The dependence of C on x and x' is 
assumed to be of the form x - x'. A typical example 
is the massless Gaussian field in d dimensions, where 
C is the inverse Fourier transform of $\hat C(k) = 1/k^2$, 
$k \in \mathbb{R}^d$, which has both a UV and an IR problem, or 
its lattice analog, 


with a the lattice constant, which has only an IR 
problem. A typical interaction is of the type 


$$V(\phi) = \int dX\, dY\, \bar\phi(X)\phi(X)\, v(X, Y)\, \bar\phi(Y)\phi(Y) \qquad [6]$$


Again, we assume that the potential v depends on x 
and y only via x - y, so that translation invariance 
holds. In both UV and IR cases, naive perturbation 
theory fails even as a formal power series. That is, 
writing $V = \lambda V_0$, with a coupling constant $\lambda$ which is 
treated as a formal expansion parameter, the singu- 
larity of C leads to termwise divergences in the series. 
The theory is called perturbatively renormalizable if 
all divergences can be removed by introducing counter- 
terms of certain types, which are fixed by physically 
sensible renormalization conditions. Identifying the 
UV renormalizable theories was a breakthrough in 
high-energy physics. The IR renormalization problem 
is different, and in some respects harder, because 
there is almost no freedom to introduce counter-terms: the 
microscopic model is given from the start. This will 
be discussed in more detail below for an example. 

A much more ambitious, and largely open, project 
is to do this renormalization nonperturbatively, that 
is, to treat $\lambda$ as a real (typically, small) parameter. 
Some results will be discussed below. 

The RG is set up by a scale decomposition 
$C = \sum_j C_j$. In the example of the massless Gaussian 
field, one would take each $\hat C_j$ to be a $C^\infty$ function 
supported in the region $\{k \in \mathbb{R}^d : M^{j-1} \le k^2 \le M^{j+1}\}$, 
where M > 1 is a fixed constant, and the summation 
over j runs over $\mathbb{Z}$. 
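
One possible way to realize such a decomposition is sketched below for the massless Gaussian field. It is an illustrative construction, not the one used in the cited literature. A smooth step function in the variable log_M(k^2) is differenced to give shell functions that are nonnegative, vanish outside $M^{j-1} \le k^2 \le M^{j+1}$, and sum to one by telescoping. The shell convention in k^2, the choice M = 2 and all names are assumptions.

```python
import math


def smooth_step(x):
    """C-infinity step function: 0 for x <= 0, 1 for x >= 1, monotone between."""
    def bump(t):
        return math.exp(-1.0 / t) if t > 0.0 else 0.0
    return bump(x) / (bump(x) + bump(1.0 - x))


def chi(j, k_squared, m=2.0):
    """Smooth shell function supported where M^(j-1) <= k^2 <= M^(j+1)."""
    u = math.log(k_squared) / math.log(m)
    return smooth_step(u - j + 1.0) - smooth_step(u - j)


def shell_covariance(j, k_squared, m=2.0):
    """Fourier transform of the scale-j piece of the covariance 1/k^2."""
    return chi(j, k_squared, m) / k_squared


if __name__ == "__main__":
    k2 = 5.0
    total = sum(shell_covariance(j, k2) for j in range(-40, 40))
    print(total, 1.0 / k2)
```

Each piece is bounded and compactly supported away from k = 0, so it has neither a UV nor an IR problem. The singular behavior of 1/k^2 is recovered only in the sum over all scales.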

The scale decomposition of C leads to a represen- 
tation of [5] by an iteration of Gaussian convolution 
integrals with covariances $C_j$, hence a sequence of 
effective interactions $V_j$, defined recursively by 


$$e^{-V_{j-1}(\phi)} = \int d\mu_{C_j}(\phi')\, e^{-V_j(\phi' + \phi)}, \qquad V_0 = V \qquad [7]$$


For a singular covariance, the scale decomposition is 
an infinite sum. A formal object like [5] is now 
regularized by starting with a finite sum, that is, 
imposing a UV and IR cutoff, which is mathemati- 
cally well defined, and then taking limits of the thus 
defined objects. Again, in condensed matter applica- 
tions, imposing an IR cutoff is an operation that 
needs to be justified, for example, by showing that 
taking the limit as the cutoff is removed commutes 
with the infinite-volume limit. 

Note that the RG map, which is the iteration 
$V_j \to V_{j-1}$, goes to lower and lower j, corresponding 
to longer and longer length scales. The convention 
that the iteration starts at some fixed j, for example, 
j=0, is appropriate for IR problems. In UV 
problems, the iteration would start at some large 
$j_{UV}$, which defines a UV cutoff and is taken to 
infinity, to remove the cutoff, at the end. 
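
The fact underlying the recursion [7] is that Gaussian convolutions compose: integrating with covariance C_1 + C_2 in one step gives the same result as integrating with C_2 and then with C_1. The sketch below checks this numerically in the simplest setting, a single real variable with a toy quartic interaction. The quadrature rule, the parameter values and the function names are illustrative assumptions and have nothing to do with the rigorous constructions discussed in the text.

```python
import math


def gaussian_average(func, cov, phi, half_width=8.0, n=201):
    """Trapezoid estimate of the integral of func(phi + x) against the
    centered Gaussian probability measure of variance `cov`."""
    sigma = math.sqrt(cov)
    step = 2.0 * half_width * sigma / (n - 1)
    total = 0.0
    for i in range(n):
        x = -half_width * sigma + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(-x * x / (2.0 * cov)) * func(phi + x)
    return total * step / math.sqrt(2.0 * math.pi * cov)


def boltzmann(phi, coupling=0.1):
    """exp(-V) for the toy interaction V(phi) = coupling * phi**4."""
    return math.exp(-coupling * phi ** 4)


def integrate_out(func, covariances):
    """Apply the Gaussian convolutions one after another (first entry first)."""
    for cov in covariances:
        func = (lambda f, c: lambda p: gaussian_average(f, c, p))(func, cov)
    return func


PHI = 0.3
one_step = integrate_out(boltzmann, [0.5])(PHI)
two_steps = integrate_out(boltzmann, [0.2, 0.3])(PHI)

if __name__ == "__main__":
    print(one_step, two_steps, -math.log(one_step))
```

Minus the logarithm of the result is the effective interaction at the external field value `PHI`. In the field-theoretic setting the same composition property is what allows a singular covariance to be integrated one scale at a time.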

A variant using a continuous scale decomposition, 
$C = \int ds\, C_s$, originally due to Wegner and Houghton, 
became very popular after Polchinski (1984) used it 
to give a short argument for perturbative renorma- 
lizability. Polchinski's equation, the analog of the 
recursion [7], reads 


$$\frac{\partial V}{\partial s} = \frac{1}{2} \Delta_{C_s} V - \frac{1}{2} \left( \frac{\delta V}{\delta \phi},\, C_s \frac{\delta V}{\delta \phi} \right) \qquad [8]$$


where $\Delta_{C_s} = (\delta/\delta\phi,\, C_s\, \delta/\delta\phi)$ denotes the Laplacian in field space associated to the 
covariance $C_s$. Polchinski's argument has been devel- 
oped into a mathematical tool that applies to many 
models. For an introduction to perturbative renor- 
malization using this method, see Salmhofer (1998). 
Equations of the type [8] have also been very useful 
beyond perturbation theory: much work has been 
done based on the beautiful representation of Mayer 
expansions found in Brydges and Kennedy (1987) 
using RG equations. 


Mathematical Structure and Difficulties 


The RG flow is thus, depending on the implementa- 
tion, either a sequence or a continuous flow of 
interactions. Setting up this flow in mathematical 
terms is not easy and indeed part of the mathema- 
tical RG analysis is to find a suitable space of 
interactions that is left invariant by the successive 
convolutions, and then to control the RG iteration. 
A serious problem is the proliferation of interac- 
tions: already a single application of the RG 
transformation [7] maps a simple interaction, such 
as [6], to a nonlocal functional of the fields, 


$$V_j(\phi) = \sum_{m \ge 0} \int dX_1 \cdots dX_m\, v_m^{(j)}(X_1, \ldots, X_m)\, \phi(X_1) \cdots \phi(X_m) \qquad [9]$$


Already for perturbative renormalization, one needs 
to extract local terms, calculate their flow more 
explicitly, and control the power counting of the 
remainder. The convergence of the series is not an 
issue in formal perturbation theory because in every 
finite order r in $\lambda$, the sum over m is finite. 

For nonperturbative renormalization, however, 
the problem is much more serious. For bosonic 
systems, the expansion in powers of the fields in 
[9] is divergent, and one needs a split into small- 
field and large-field regions and cluster expansions 
to obtain a well-defined sequence of effective 
actions (Gawedzki and Kupiainen 1985, Feldman 
et al. 1987, Rivasseau 1993). That is, the local 
parts are extracted and treated explicitly only in 
the small-field region, and this is combined with 
estimates on the rareness of large-field regions 
using cluster expansions. For fermions, the expan- 
sion in powers of the fields can be proved to 
converge for regular, summable covariances, which 
leads to substantial technical simplifications. 

The spatial proliferation of interactions is absent 
only in certain one-dimensional and in specially 
constructed higher-dimensional models, the so- 
called “hierarchical models.” In these models, the 
search for an RG fixed point is still a nonlinear 
fixed-point problem, whose treatment leads to 
interesting mathematical results. 

This article will be restricted to the mathema- 
tical use of the RG both in perturbative and 
nonperturbative quantum field theory of con- 
densed matter systems. Many nonrigorous but 
very interesting applications have also come out 
of this method, showing that it also works well in 
practice, but they will not be reviewed here. Before 
discussing condensed matter systems, the pioneer- 
ing works done on the mathematical RG, which 
were largely motivated by high-energy physics, 
will be reviewed briefly, as they laid the founda- 
tion of much of the technique used later in the 
condensed matter case. 


Field Theory and Statistical Mechanics 


Because of the close connection between quantum 
field theory and statistical mechanics given by 
formulas of the Feynman—Kac type, a significant 
amount of work on the mathematical RG focused 
on models of classical statistical mechanics in 
connection with field theories and gauge theories. 
Here we mention some of the pioneering results in 
that field. 

The scale decomposition method was developed 
in a mathematical form and applied to perturbative 
UV renormalization of scalar field theories, as well 
as nonperturbative analysis of some models, by 
Gallavotti and Nicolo (Gallavotti 1985). 

Infrared $\phi^4$ theory in four dimensions was 
constructed using block spin methods (Gawedzki 
and Kupiainen 1985) and scale decomposition RG 
(Feldman et al. 1987). An essential feature of the $\phi^4$ 
model is its IR asymptotic freedom, meaning that 
the local part of the effective quartic interaction 
tends to zero in the IR limit. 

Block spin methods were used by Balaban (1988) 
to construct gauge theories in three and four 
dimensions. For gauge theories, the block spin RG 
has the major advantage that it allows one to define a 
gauge-invariant RG flow. The scale decomposition 
violates gauge invariance, which creates substantial 
technical problems (Rivasseau 1993). 


Condensed Matter: Fermions 


Starting with the seminal work of Feldman and 
Trubowitz (1990, 1991) and Benfatto and 
Gallavotti (1995), this field has become one of the 
most successful applications of the mathematical 
RG. We use this example to discuss the scale 
decomposition method in a bit more detail. 

We shall mainly focus on models in $d \ge 2$ 
dimensions (the case d=1 is described in detail in 
Benfatto and Gallavotti (1995)). The system is put 
into a finite (very large) box $\Lambda$ of side-length L. For 
simplicity we take periodic boundary conditions. 
The Hilbert space for spin-1/2 electrons is the 
fermionic Fock space $\mathcal{F} = \bigoplus_{n \ge 0} \bigwedge^n L^2(\Lambda, \mathbb{C}^2)$. The 
grand canonical ensemble in finite volume is given 
by the density operator $\rho = Z^{-1} e^{-\beta(H - \mu N)}$, with the 
Hamiltonian H and the number operator N, in the 
usual second quantized form. The parameter 
$\beta = T^{-1}$ is the inverse temperature and the chemical 
potential $\mu$ is an auxiliary parameter used to fix the 
average particle number. 

The grand canonical trace defining the ensemble 
can be rewritten in functional-integral form. It takes 
the form [5], but now duc stands for a Grassmann 
Gaussian “measure,” which is really only a linear 
functional (for definitions, see, e.g., Salmhofer 
(1998, chapter 4 and appendix B)). A two-body 
interaction corresponds to a quartic interaction 
polynomial V, as in [6]. The covariance is (in the 
infinite-volume limit L — oo) 


i — T dk i(k-x—wrT) ¢ 
C(r,x) a ` Pa C(w, k) 


we Mp [10] 
1 


Or) ae) 

where $\tau \in (0, \beta]$ is a Euclidean time variable and $k$ 
is the spatial momentum. The summation over $\omega$ 
runs over the set of fermionic Matsubara frequencies 
$M_F = \pi T(2\mathbb{Z} + 1)$. Here $e(k) = \varepsilon(k) - \mu$, 
where $\varepsilon(k)$ is the band function given by the 
single-particle term in the Hamiltonian. For a lattice 
system, $k \in B_d$, the momentum space torus (e.g., 
for the lattice $\mathbb{Z}^d$, $B_d = \mathbb{R}^d / 2\pi\mathbb{Z}^d$); for a continuous 
system, $k \in \mathbb{R}^d$, hence there is a spatial UV 
problem. Electrons in a crystal have a natural 
spatial UV cutoff (see Salmhofer (1998, chapter 4) 
for a discussion), so we assume in the following 
that there is either a UV cutoff or that the system is 
on a lattice. A nonperturbative definition of the 
functional integral involves a limit from discrete 
times (by the Trotter product formula); see, for 
example, Salmhofer (1998) or Feldman et al. 
(2003, 2004). 
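
To make the frequency set concrete, here is a minimal Python sketch that lists a finite window of the fermionic Matsubara frequencies $\pi T(2n+1)$. The function name `fermionic_matsubara` and the truncation parameter `n_max` are illustrative assumptions, not notation from the article; the true set is infinite.

```python
import math

def fermionic_matsubara(T, n_max):
    """Finite window of M_F = pi*T*(2Z+1), for n = -n_max, ..., n_max - 1.

    n_max is a hypothetical truncation; the full set is infinite.
    """
    return [math.pi * T * (2 * n + 1) for n in range(-n_max, n_max)]

T = 0.1                                  # temperature, arbitrary units
omegas = fermionic_matsubara(T, 3)
smallest = min(abs(w) for w in omegas)

# Zero is never a fermionic Matsubara frequency; the frequency of
# smallest modulus is pi*T.
print(omegas)
print(smallest, math.pi * T)
```

The window is symmetric about zero and never contains zero, so every frequency satisfies $|\omega| \ge \pi T$.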


Perturbative Renormalization 


Renormalization of the Fermi surface at zero 
temperature In the limit $T \to 0$, the Matsubara 
frequency $\omega$ becomes a real variable, hence the 
propagator has a singularity at $\omega = 0$ and $k \in S$, 
where $S = \{k : e(k) = 0\}$, a codimension-1 subset of 
$B_d$, is the Fermi surface. The existence of a Fermi 
surface which does not degenerate to a point is a 
characteristic feature of systems showing metallic 
behavior. 

The singularity implies that $\hat C \notin L^p(\mathbb{R} \times B_d)$ for 
any $p \ge 2$. Because terms of the type 


$$\int d\omega\, d^dk\; F(\omega, k)\, \hat C(\omega, k)^p \prod_{i} T_i(\omega, k) \qquad [11]$$


appear for all $p \ge 1$ in the formal perturbation 
expansion, with functions $T_i$ and $F$ that do not 
vanish on the singularity set of $C$, the perturbation 
expansion for observables is termwise divergent. 
The deeper reason for these problems is that the 
interaction shifts the Fermi surface so that the true 
propagator has a singularity of the form 
$G(\omega, k) = (i\omega - e(k) - \sigma(\omega, k))^{-1}$. If the self-energy $\sigma$ 
is a sufficiently regular function, $G$ has the same 
integrability properties as $C$, but the singularity of $G$ 
is on the set $\tilde S = \{k : e(k) + \sigma(0, k) = 0\}$ (the 
singularity in $\omega$ remains at $\omega = 0$). 

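Let $1 = \sum_{j \le 0} \chi_j(\omega, k)$ be a $C^\infty$ partition of unity 
such that 

$$\text{for } j < 0: \quad \operatorname{supp} \chi_j \subset \{(\omega, k) : \varepsilon_0 M^{j-2} \le |i\omega - e(k)| \le \varepsilon_0 M^{j}\} \qquad [12]$$

where $M > 1$ and $\varepsilon_0$ is a fixed constant (an energy 
scale determined by the global properties of the 
function $e$; see Salmhofer (1998, chapter 4)). The 
corresponding covariances $\hat C_j = \hat C \chi_j$ have the 
properties that for $j < 0$, $\|\hat C_j\|_1 \le \mathrm{const}\, M^{j}$ and 
$\|\hat C_j\|_\infty \le \mathrm{const}\, M^{-j}$. Using these bounds and expanding 
$\mathcal{V}_j = \sum_{r \ge 1} \mathcal{V}_j^{(r)} \lambda^r$, one can derive estimates for the 
coefficient functions $v^{(r)}_{m,j}$. 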

Of course, the scale decomposition by itself does 
not solve the problem of the moving singularity. It 
only allows us to pinpoint the problematic terms in 
the expansion. To construct the self-energy o, as 
well as all higher Green functions, a two-step 
method is used (Feldman and Trubowitz 1990, 
1991, Feldman et al. 1996, 2000). First, a counter-term 
function $K$ which modifies $e$ is introduced, so 
that all two-point insertions $T_i$ get subtracted on 
the Fermi surface, hence replaced by $\tilde T_i(\omega, k) = 
T_i(\omega, k) - T_i(0, k')$, with $k'$ obtained from $k$ by a 
projection to the Fermi surface (Feldman and 
Trubowitz 1990, 1991). Consequently, the $\tilde T_i$ vanish 
linearly on the Fermi surface, so that the integral over 
$k$ in [11] converges. The effect of the counter-term 
function $K$ can be described less technically: it fixes 
the Fermi surface to be $S$, the zero set of $e$. Thus, $K$ 
forces $S$ to be the Fermi surface of the interacting 
system. To achieve this, $K$ must be chosen as a function 
of $e$, $k$, and $\lambda V$. In contrast to the situation for 
covariances with point singularities, the function $K$ 
will, for a nontrivial Fermi surface, be very different 
from the original $e$. It can, however, be constructed to 
all orders in perturbation theory for a large class of 
Fermi surfaces. More precisely, one can prove: if 
$e \in C^2(B_d, \mathbb{R})$, $\hat v \in C^2(B_d, \mathbb{R})$, and the Fermi surface $S$ 
contains no points $k$ with $\nabla e(k) = 0$ and no flat sides, 
then $K = \sum_r \lambda^r K_r$ exists as a formal power series in $\lambda$ 
and the map $e \mapsto e + K$ is locally injective on this set 
of $e$'s (Feldman et al. 1996, 2000). With this counter-term, 
the order-$r$ $m$-point functions on scale $j$ satisfy 
the bounds 














< Wm „M62 5" 
S Öms [13] 














with constants $w_{m,r}$ and $\tilde w_{m,r}$. Here F is the 
Fourier transform of P (see [9]), with the momentum 
conservation delta function from translation 
invariance removed. 

Equation [13] implies that in the RG sense, the 
two-point function is relevant, the four-point func- 
tion is marginal, and all higher m-point functions 
are irrelevant. 

In one dimension, the Fermi “surface” reduces to 
two points which are related by a symmetry, so the 
counter-term function $K$ is just a constant, that is, an 
adjustment of the chemical potential $\mu$, which is 
justified because $\mu$ is only an auxiliary parameter 
used to fix the average value of the particle number. 
The counter-term function is a constant also in 
higher dimensions in the special case $e(k) = k^2 - \mu$: 
there, rotational symmetry implies that $K$ can be 
chosen independent of $k$ (if $v$ is also rotationally 
symmetric). However, in the generic case of 
nonspherical Fermi surfaces, $K$ depends nontrivially 
on $k$, and an inversion problem arises: adding the 
counter-term changes the model. To obtain the 
Green functions of a model with a given dispersion 
relation and interaction $(E, V)$, one needs to show 
that given $E$ in a suitable set, the equation 

$$e(k) + K(\lambda, e, V)(k) = E(k) \qquad [14]$$


has a unique solution. If this is done, the procedure 
for renormalization is as follows. For a model given 
by dispersion relation and interaction (E, V), solve 
[14], then add and subtract e in the kinetic term. 
This automatically puts K = E — e as a counter-term, 
and the expansion is now set up automatically with 
the right counter-term. The function K describes 
the shift from the Fermi surface of the free system (the 
zero set of E) to that of the interacting system 
(the zero set of e). Proving that K is sufficiently 
regular and solving [14] is nontrivial. Uniqueness of 
the solution follows from the above stated properties 
of K as a function of e. Existence was shown for a 
class of Fermi surfaces with strictly positive curva- 
ture in Feldman et al. (1996, 2000), to every order 
in perturbation theory. This implies a bijective 
relation between the Fermi surfaces of the free and 
the interacting model. 


Positive temperature and the zero-temperature limit 
One advantage of the functional-integral approach 
is that the setup at positive temperatures is identical 
to that at zero temperature, save for the discreteness 
of the set $M_F$ at $T > 0$. Because $0 \notin M_F$, the 
temperature effectively provides an IR cutoff, so 
that all term-by-term divergences are regularized in 
a natural way. However, renormalization is still 
necessary because the temperature is a physical 
parameter and unrenormalized expansions give 
disastrous bounds for the behavior of observables 
as functions of the temperature. Renormalization 
carries over essentially unchanged (the counter-term 
function is constructed slightly differently). 

Because $|\omega| \ge \pi/\beta$ for all $\omega \in M_F$, [12] implies 
$\operatorname{supp} \chi_j = \emptyset$ for $j < -J_\beta$, where 

$$J_\beta = \log_M \frac{\beta \varepsilon_0}{\pi} \qquad [15]$$

Thus, the scale decomposition is now a finite sum 
over $0 \ge j \ge -J_\beta$. This restriction is inessential for 
the problem of renormalizing the Fermi surface, but 
it puts a cutoff on the marginal growth of the 
four-point function: [15] and [13] imply that 

$$\|v^{(r)}_{m}\|_\infty \le w_{m,r} \left( \log \frac{\beta \varepsilon_0}{\pi} \right)^{r} \qquad [16]$$

If one can show that $w_{m,r} \le A B^r$ with constants $A$ 
and $B$, this implies that perturbation theory converges 
for $|\lambda| \log(\beta \varepsilon_0 / \pi) < B^{-1}$. Such a bound has 
been shown using constructive methods (Disertori 
and Rivasseau 2000, Feldman et al. 2003, 2004) (see 
below). The logarithm of $\beta$ is due to the Cooper 
instability (see Feldman and Trubowitz (1990, 
1991) and Salmhofer (1998, section 4.5)). 
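
As a small numerical illustration of the cutoff $J_\beta = \log_M(\beta\varepsilon_0/\pi)$ from [15], the sketch below counts the scales that survive at a given inverse temperature. The function names and the sample values of `beta`, `eps0`, and `M` are illustrative assumptions, and the code assumes $\beta\varepsilon_0 > \pi$.

```python
import math

def scale_cutoff(beta, eps0, M):
    """J_beta = log_M(beta * eps0 / pi), following [15]."""
    return math.log(beta * eps0 / math.pi, M)

def active_scales(beta, eps0, M):
    """Integer scales j with 0 >= j >= -J_beta (assumes beta*eps0 > pi)."""
    J = scale_cutoff(beta, eps0, M)
    return list(range(0, -math.floor(J) - 1, -1))

# Hypothetical sample values: eps0 = 1, M = 2.
for beta in (10.0, 100.0, 1000.0):
    print(beta, scale_cutoff(beta, 1.0, 2.0), active_scales(beta, 1.0, 2.0))
```

The number of active scales grows only logarithmically in $\beta$, which is the origin of the $\log \beta$ factor in the convergence condition.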
The application of renormalization at positive 
temperature also led to the solution of a longstanding 
puzzle in solid-state physics, namely the (seeming) 
discontinuity of the results of perturbation theory as a 
function of the temperature claimed in the early 
literature. When renormalization is done correctly, 
there is no discontinuity in the temperature. 


Nonperturbative Renormalization for Fermions 


It is a remarkable feature of fermionic field theories 
that for a covariance for which $\|C\|_1$ and $\|C\|_\infty$ are 
both finite, the effective action defined in [7] exists 
and is analytic in the fields and in the original 
interaction $V$, thanks to determinant bounds. For a 
$V$ as in [6], with $\lambda v$ weak and of short range, the 
skeleton functions (where all relevant $m$-point 
functions are projected back to their initial values 
in the RG iteration) satisfy 

EP loo < const- PGI 87 


m 


For the many-electron covariance [10], with a 
positively curved $C^2$ Fermi surface and with the 
scale decomposition [12], $\|C_j\|_\infty$ is of order $M^{j}$ and 
$\|C_j\|_1$ is of order $M^{-j(d+1)/2}$. The right-hand side of 
[17] then contains $M^{j(d+3-m)/2}$, which agrees (up to 
logarithms) with the perturbative power-counting 
bounds [13] only for $d = 1$. In dimension $d = 2$, the 
method has been refined by dividing the Fermi 
surface into angular sectors. The corresponding 
sectorized propagators have a better decay bound 
$\|C_j\|_1$, but the trade-off is sector sums at every 
vertex. Momentum conservation restricts these 
sector sums sufficiently in two dimensions to allow 
for good power-counting bounds. This has allowed 
for the construction of an interesting class of 
interacting fermionic models. 

The major results obtained with the RG method 
are as follows. 

Luttinger liquid behavior at zero temperature was 
proved for one-dimensional models with a repulsive 
interaction (Benfatto and Gallavotti 1995). 

Fermi liquid behavior in the region where 
$|\lambda| \log(\beta \varepsilon_0) \ll 1$ was proved for the 
two-dimensional model with $e(k) = k^2 - 1$, a local potential 
$V$, and a UV cutoff both on $k$ and the Matsubara 
frequencies $\omega$ in Disertori and Rivasseau (2000). 

A two-dimensional model with a band function 
$e(k)$ that is nonsymmetric under $k \to -k$ and a 
general short-range interaction was proved to be a 
Fermi liquid at zero temperature (Feldman et al. 
2003, 2004). Due to the asymmetry under $k \to -k$, 
the Cooper instability can be proved to be absent. In 
Feldman et al. (2003, 2004), a counter-term function 
as in Feldman et al. (1996, 2000) was used. The 
nonperturbative proof of the corresponding inversion 
theorem remains open. 

In d = 3, the proof of Fermi liquid behavior remains 
an open problem, despite some partial results. 


Condensed Matter: Bosons 


Recent advances in quantum optics, in particular the 
trapping of ultracold atoms, have led to the 
experimental realization of Bose-Einstein condensa- 
tion (BEC), which caused a surge of theoretical and 
mathematical works. For bosons, the definition of 
the ensembles is similar to, but more involved than 
in, the fermionic case. On a formal level, the 
functional-integral representation is analogous to 
fermions, except that the fields are not Grassmann 
fields but complex fields, and the covariance is given 
by a sum as in [10], but now the summation over $\omega$ 
runs over the bosonic Matsubara frequencies 
$M_B = 2\pi T \mathbb{Z}$. The existence of even the free partition 
function in finite volume restricts the chemical 
potential (for free particles, $\mu < \inf_k e(k)$ must 
hold). Note that $C$ is complex and Gaussian 
measures with complex covariances exist in infinite 
dimensions only under rather restricted conditions, 
which are not satisfied by [10]. This is inessential for 
perturbative studies, where everything can be 
reduced to finite-dimensional integrals involving 
the covariance, but a nonperturbative definition of 
functional integrals for such systems requires again a 
carefully regularized (e.g., discrete-time) definition 
of the functional integral. 


Bose-Einstein Condensation 


The problem was treated to all orders in perturbation 
theory at positive particle density $\rho > 0$ by 
Benfatto (Benfatto and Gallavotti 1995). The initial 
interaction is again quartic, $e(k) = k^2$, and one 
considers the problem at zero temperature, in the 
limit $\mu \to 0^-$, which is the limit in which BEC occurs 
for free particles. The interaction is expected to 
change the value of $\mu$, given the density, so a 
chemical potential term is included in the action, to 
give the interaction 

$$V(\varphi) = \int d\tau\, dx\, dy\; |\varphi(\tau, x)|^2\, v(x - y)\, |\varphi(\tau, y)|^2 + \nu \int d\tau\, dx\; |\varphi(\tau, x)|^2 \qquad [18]$$

After writing $\varphi(\tau, x) = \xi + \tilde\varphi(\tau, x)$, where $\xi$ is 
independent of $\tau$ and $x$, the density condition becomes 
$\rho = |\xi|^2$. The parameter $\nu$ now needs to be chosen such that the 
free energy has a minimum at $\xi = \sqrt{\rho}$. This can be 
reformulated in terms of the self-energy of the boson. 
Benfatto uses the RG to prove that the propagator of 
the interacting system no longer has the singularity 
structure $(i\omega - k^2)^{-1}$ but instead $(\omega^2 + c^2 k^2)^{-1}$, where 
$c$ is a constant. This requires a nontrivial analysis of 
Ward identities in the RG flow. 

BEC has been proved in the Gross-Pitaevskii limit 
(Lieb et al. 2002). In the present formulation, this limit 
corresponds to an infinite-volume limit $L \to \infty$ where 
the density $\rho$ is taken to zero as an inverse power of $L$. 
A nonperturbative proof of BEC at fixed positive 
particle density remains an open problem. 


Superconductivity 


Superconductivity (SC) occurs in fermionic systems, 
but it happens at energy scales where the relevant 
excitations have bosonic character: the Cooper pairs 
are bosons. In the RG framework, they arise naturally 
when the fermionic RG flow discussed above is 
stopped before it leaves the weak-coupling region 
and the dominant Cooper pairing term is rewritten by 
a Hubbard-Stratonovich transformation. The fer- 
mions can then be integrated over, resulting in the 
typical Mexican hat potential of an O(2) nonlinear 
sigma model. Effectively, one now has to deal with a 
problem similar to the one for BEC, but the action is 
considerably more complicated. 


The Nonlinear Sigma Models 


The prototypical model, into whose universality 
class both examples mentioned above fall, is that 
of O(N) nonlinear sigma models: both BEC and SC 
can be reformulated as spontaneous symmetry 
breaking (SSB) in the O(2) model in dimensions 
$d \ge 3$. For $d = 2$, long-range order is possible only at 
zero temperature because only then does the time 
direction truly represent a third dimension, preventing 
the Mermin-Wagner theorem from applying. 

SSB has been proved for lattice O(N) models by 
reflection positivity and Gaussian domination meth- 
ods (Fröhlich et al. 1976). The elegance and 
simplicity of this method is unsurpassed, but only 
very special actions satisfy reflection positivity, so 
that the method cannot be used for the effective 
actions obtained in condensed matter models. 
Results in the direction of proving SSB in O(N) 
models for $d \ge 3$ by RG methods, which apply to 
much more general actions, have been obtained by 
Balaban (1995). 


See also: Bose-Einstein Condensates; Fermionic 
Systems; High $T_c$ Superconductor Theory; Holomorphic 
Dynamics; Operator Product Expansion in Quantum 
Field Theory; Perturbative Renormalization Theory and 
BRST; Phase Transition Dynamics; Reflection Positivity 
and Phase Transitions. 
Introduction 


In quantum mechanics and wave propagation, 
eigenvalues (and eigenfunctions) appear naturally 
as they describe the behavior of a quantum 
system (or the vibration of a structure). There 
are however some cases where these simple 
notions do not suffice and one has to appeal to 
the more subtle notion of resonances. For 
example, if the vibration of a drum is well 
understood in terms of eigenvalues (the audible 
frequencies) and eigenfunctions (the correspond- 
ing vibrating modes), the notion of resonances is 
necessary to understand the propagation of waves 
in the exterior of a bounded obstacle. Another 
example (taken from Zworski (2002)) which 
allows us to understand both the similarities of 
resonances with eigenvalues and their differences 
is the following: consider the motion of a 
classical particle subjected to a force field 
deriving from the potential $V_1(x)$ on a bounded 
interval as shown in Figure 1a. If the classical 
momentum is denoted by $\xi$, then the classical 
energy is given by 

$$E = |\xi|^2 + V_1(x)$$

and the classical motion is given by the relations of 
Hamiltonian mechanics: 

$$\dot x = \frac{\partial E}{\partial \xi} = 2\xi, \qquad \dot \xi = -\frac{\partial E}{\partial x} = -V_1'(x)$$

Since energy is conserved, if the initial energy is 
smaller than the top of the barrier, then the classical 
particle bounces forever in the well. Now we can 
consider the same example with the potential V2(x) 
on R as shown in Figure 1b. Of course, if the 
particle is initially inside the well (with the same 
energy as before), the classical motion remains the 
same. 
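
The trapped classical motion can be sketched numerically. The following Python example integrates Hamilton's equations $\dot x = 2\xi$, $\dot\xi = -V'(x)$ with a leapfrog scheme. The potential $V(x) = x^2 e^{-x^2}$ is a hypothetical stand-in for the well of Figure 1b (it is not taken from the article): it has a well at $x = 0$, barriers of height $1/e$ at $x = \pm 1$, and decays at infinity.

```python
import math

def V(x):
    # Hypothetical potential: well at x = 0, barrier tops of height 1/e
    # at x = -1 and x = +1, decaying to zero at infinity.
    return x * x * math.exp(-x * x)

def dV(x):
    # Derivative of the hypothetical potential above.
    return 2.0 * x * (1.0 - x * x) * math.exp(-x * x)

def energy(x, xi):
    # Classical energy E = xi^2 + V(x).
    return xi * xi + V(x)

def leapfrog(x, xi, dt, steps):
    """Integrate x' = 2*xi, xi' = -V'(x); also track max |x| reached."""
    max_abs_x = abs(x)
    for _ in range(steps):
        xi -= 0.5 * dt * dV(x)      # half kick
        x += dt * 2.0 * xi          # drift
        xi -= 0.5 * dt * dV(x)      # half kick
        max_abs_x = max(max_abs_x, abs(x))
    return x, xi, max_abs_x

x0, xi0 = 0.0, math.sqrt(0.2)       # initial energy 0.2, below the barrier top 1/e
E0 = energy(x0, xi0)
x1, xi1, max_abs_x = leapfrog(x0, xi0, dt=1e-3, steps=20000)
E1 = energy(x1, xi1)
print(E0, E1, max_abs_x)
```

With the initial energy below the barrier top, the computed orbit never reaches $|x| = 1$ and the energy is conserved up to the integrator's discretization error, in line with the statement that the particle bounces forever in the well.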


Figure 1 (a), (b) A particle trapped in a well. 
From the quantum mechanics point of view, both 
systems are described by the Hamiltonians 

$$H_j = -h^2 \frac{d^2}{dx^2} + V_j(x)$$

acting on $L^2([-1, 1])$ (with boundary conditions) and 
$L^2(\mathbb{R})$, respectively. In the first case, $H_1$ has a discrete 
spectrum, $\lambda_{j,h} \in \mathbb{R}$ with eigenfunctions $e_{j,h}(x)$, $j \in \mathbb{N}$, 
and the time evolution of the system is given by 

$$e^{-itH_1/h} u = \sum_j e^{-it\lambda_{j,h}/h}\, u_{j,h} \times e_{j,h} \qquad [1]$$

where $u_{j,h} \times e_{j,h}$ is the orthogonal projection of $u$ on 
the eigenspace $\mathbb{C} e_{j,h}$. In the second case, $H_2$ has no 
square integrable eigenfunction, and no simple 
description such as [1] can consequently hold. However, 
as $h \to 0$, the correspondence principle tells us that 
quantum mechanics should get close to classical 
mechanics. Since for both quantum problems the 
classical limit is the same (at least for initial states 
confined in the well with energy $E$), we expect that 
for the second potential there should exist a 
quantum state corresponding to the classical one. 
This is indeed the case, and one can show that 
there exist resonant states $e_{j,h}$ associated to 
resonances $E_{j,h}$ which are solutions of the equation 

$$H_2 e_{j,h} = E_{j,h} e_{j,h}, \qquad E_{j,h} \sim E$$

The resonant states are not square integrable, but still have moderate 
growth at infinity and are confined in the interior of 
the well (see sections “Definition” and “Location of 
resonances”). On the other hand, the first quantum 
system is confined, whereas the second one is not and 
we know that even for initial states confined in the 
well, the tunneling effect allows the quantum particle to 
escape to infinity. This fact should be described by 
the theory as a main difference between eigenvalues 
and resonances. This is indeed the case as the 
resonances $E_{j,h}$ are not real (contrary to eigenvalues 
of self-adjoint operators) but have a nonvanishing 
imaginary part (see section “Resonance-free regions”) 

$$\operatorname{Im} E_{j,h} \sim -e^{-C/h}$$

If we assume that a description similar to [1] still 
holds for the second system, at least locally in space 
(see section “Resonances and time asymptotics”), 
then, for time $t \gg e^{C/h}$, the factor $e^{-itE_{j,h}/h}$ becomes 
very small (the quantum particle has left the well 
due to the tunneling effect). 
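
The time scale of tunneling can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the time factor is taken to be $e^{-itE/h}$ and the imaginary part of the resonance is modeled as exactly $-e^{-C/h}$; the function names and the sample values of `h` and `C` are hypothetical.

```python
import math

def resonance_width(h, C):
    """|Im E| for the model law Im E = -exp(-C/h)."""
    return math.exp(-C / h)

def survival_factor(t, h, C):
    """Modulus of exp(-i t E / h) when Im E = -exp(-C/h)."""
    return math.exp(-t * resonance_width(h, C) / h)

def lifetime(h, C):
    """Time at which the modulus has dropped to 1/e."""
    return h / resonance_width(h, C)

h, C = 0.1, 1.0
print(lifetime(h, C))                 # exponentially large in 1/h
print(survival_factor(1.0, h, C))     # still close to 1 at moderate times
print(survival_factor(1e6, h, C))     # negligible at very long times
```

Halving $h$ in this model increases the lifetime by several orders of magnitude, which is why the resonant state behaves like a bound state on any fixed time scale as $h \to 0$.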

There have been several studies on resonances and 
scattering theory and the presentation here cannot be 
complete. For a more in-depth presentation, one can 
consult the books by Lax and Phillips (1989) and 
Hislop and Sigal (1987), or the reviews on resonances 
by Vodev (2001) and Zworski (1994) for example. 


Definition 


There are different (equivalent) definitions of reso- 
nances. The most elegant is certainly the Helffer and 
Sjöstrand (1986) definition (see also the presentation 
of complex scaling by Combes et al. (1984) and the 
very general “black box” framework by Sjöstrand and 
Zworski (1991)). However, it requires a few prerequi- 
sites and we preferred to stick to the more elementary 
(but less general) resolvent point of view. The starting 
point for this definition of resonances is the fact that 
the eigenvalues of a (self-adjoint) operator P are the 
points where P is not injective. The more general 
resonances will be the points where the operator is not 
invertible (on suitable spaces). 

More precisely, consider a perturbation of the 
Laplace operator on $\mathbb{R}^n$, $P_0(h) = -h^2 \Delta$, in the following 
sense: let $\Theta \subset \mathbb{R}^n$ be a (possibly empty) smooth obstacle 
whose complement, $\Omega = \Theta^c$, is connected. Consider 
a classical self-adjoint operator defined on $L^2(\Omega)$: 

$$P_h u = (-h^2 \Delta + V(x)) u \qquad [2]$$

with boundary conditions (Dirichlet) 

$$u|_{\partial\Omega} = 0 \qquad [3]$$

(Neumann boundary conditions could be used too). 
This setting contains both the Schrödinger operator 
($P_h = -h^2 \Delta + V(x) - E$ on $\Omega = \mathbb{R}^n$) and the Helmholtz 
equation with Dirichlet conditions, in the exterior of 
an obstacle (waves at large frequencies: $P = -\Delta - \tau^2$; 
in this case, define $h = \tau^{-1}$ and $P_h = -h^2 \Delta$), which we 
shall refer to as acoustical scattering. 

We assume that $P$ is a perturbation of $P_0$, that is, 
$V \to 0$ as $|x| \to +\infty$ sufficiently fast (see Sjöstrand and 
Zworski (1991) for the very general black box 
assumptions). For example, this perturbation 
assumption is fulfilled if $V$ has compact support. 
Then the resolvent $R_h(z) = (P_h - z)^{-1}$ is well defined 
for $\operatorname{Im} z \ne 0$ as a bounded operator from $L^2(\Omega)$ to 

$$H^2(\Omega) \cap H^1_0(\Omega)$$

(because the operator $P_h$ is self-adjoint). However, it 
is not bounded for $z > 0$ on $L^2(\Omega)$ because the 
essential spectrum of $P_h$ is precisely the semiaxis $z \ge 0$, 
but it admits a meromorphic continuation from 
$\operatorname{Im} z > 0$ toward the lower half-plane: 

$$R_h(z) : L^2_{\mathrm{comp}}(\Omega) \to L^2_{\mathrm{loc}}(\Omega)$$

The poles of this resolvent $R_h$ are by definition the 
semiclassical resonances, $\mathrm{Res}_{sc}(P_h)$. 


Remark 1 In the case of acoustical scattering 
($P = -\Delta - \tau^2$, $\tau = h^{-1}$), the introduction of the 
additional parameter $z$ is pointless and one works 
directly with the parameter $\tau = h^{-1}\sqrt{z}$. In that case 
the resolvent $R(\tau) = (-\Delta - \tau^2)^{-1}$ is well defined for 
$\operatorname{Im} \tau < 0$, the essential spectrum is precisely the axis 
$\tau \in \mathbb{R}$ and the resolvent admits a meromorphic 
continuation from $\operatorname{Im} \tau < 0$ toward the upper 
half-plane (with possibly a cut at 0): 

$$R(\tau) : L^2_{\mathrm{comp}}(\Omega) \to L^2_{\mathrm{loc}}(\Omega)$$

The acoustic resonances are by definition the poles 
of this meromorphic continuation. They are related 
to semiclassical resonances by the relation 

$$\mathrm{Res}_{ac} = h^{-1} \sqrt{\mathrm{Res}_{sc}}$$

It can also be shown that if $z$ is a resonance, there 
exists an associated resonant state $e_z$ such that 

$$(P_h - z) e_z = 0$$

the function $e_z$ satisfies Sommerfeld radiation 
conditions (in polar coordinates $(r, \theta) \in [0, +\infty) \times S^{n-1}$) 

$$|h \partial_r e_z - i\sqrt{z}\, e_z| \le C\, |e^{i\sqrt{z}\, r/h}| / r^{(1+n)/2}$$

and the function 

$$\frac{e_z\, e^{-i\sqrt{z}\, r/h}}{(1 + r)^{(1/2)+\varepsilon}}$$

is square integrable. 


Resonance-Free Regions 


The very first result about resonance-free regions is 
based on Rellich uniqueness theorem (uniqueness for 
solutions of elliptic second-order equations) and says 
that there are no real resonances (except possibly 0). 
The more precise determination of resonance-free 
regions (originally in acoustical scattering) has been a 
subject of study from the 1960s and it has motivated a 
large range of works from the multiplier methods of 
Morawetz (1975) to the general propagation of 
singularity theorem of Melrose and Sjöstrand (1978). 
To state the main result in this direction, we need the 
notion of nontrapping perturbation. 


Definition 1 A generalized bicharacteristic at energy 
$E$, $(x(s), \xi(s))$, is an integral curve of the Hamiltonian field 

$$H_p = \frac{\partial p}{\partial \xi} \cdot \frac{\partial}{\partial x} - \frac{\partial p}{\partial x} \cdot \frac{\partial}{\partial \xi}$$

of the principal symbol $p(x, \xi) = |\xi|^2 + V(x)$ of the 
operator $P$, included in the characteristic set 
$p(x, \xi) = E$ and which, when hitting the boundary of 
the obstacle, reflects according to the laws of 
geometric optics (see Melrose and Sjöstrand (1978)). 


The operator $P$ (or by extension the obstacle in the 
case of acoustic scattering) is said to be nontrapping 
at energy $E$ if all generalized bicharacteristics go to 
infinity: 

$$\lim_{s \to +\infty} |x(s)| = +\infty$$

The operator $P$ (or by extension the obstacle in the 
case of acoustic scattering) is said to be nontrapping 
near energy $E$ if $P$ is nontrapping at energy $E'$ for $E'$ 
in a neighborhood of $E$. 


The following result was obtained in different 
generalities by Morawetz (1975), Melrose and 
Sjöstrand (1978), and others. 


Theorem 1 Assume that the operator $P$ is nontrapping 
near energy $E$. Then for any $N > 0$ there exists 
$h_0 > 0$ such that for $0 < h < h_0$ there are no 
resonances in the set 

$$\{z;\ |\operatorname{Im} z| < -N h \log(h)\}$$

In the case of analytic geometries (and coefficients), 
this result (see Bardos et al. 1987) can be improved to 

Theorem 2 Assume that the operator $P$ is nontrapping. 
Then there exist $\varepsilon > 0$, $N_0 > 0$ and $h_0 > 0$ 
such that for $0 < h < h_0$ there are no resonances in 
the set 

$$\{z;\ |\operatorname{Im} z| < N_0 h^{1-(1/3)}\} \cap \{|z - E| < \varepsilon\}$$

Remark 2 In the case of acoustical scattering, with 
the new definition of resonances, $\tau = h^{-1}\sqrt{z}$, the 
resonance-free zones have respectively the forms 

$$\{z;\ |\operatorname{Im} z| < N \log(|z|),\ |z| \gg 1\}$$
$$\{z;\ |\operatorname{Im} z| < N_0 |z|^{1/3},\ |z| \gg 1\}$$

In the case of trapping perturbations, the first result 
was obtained by Burq (1998). 

Theorem 3 There exist $C > 0$ and $h_0 > 0$ such that 
for $0 < h < h_0$ there are no resonances in the set 

$$\{z;\ |\operatorname{Im} z| < e^{-C/h}\} \cap \{|z - E| < \varepsilon\}$$
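
To compare the sizes of the three resonance-free strips, the sketch below evaluates widths of the form $N h \log(1/h)$ (nontrapping), $N_0 h^{2/3}$ (analytic nontrapping), and $e^{-C/h}$ (general trapping) for a few values of $h$. The constants are set to 1 purely for illustration; the theorems only assert that suitable constants exist, and the function names are hypothetical.

```python
import math

def width_nontrapping(h, N=1.0):
    # -N h log(h), written with log(1/h) so the value is positive for h < 1.
    return N * h * math.log(1.0 / h)

def width_analytic(h, N0=1.0):
    # N0 * h^(1 - 1/3)
    return N0 * h ** (2.0 / 3.0)

def width_trapping(h, C=1.0):
    # exp(-C/h): exponentially thin strip
    return math.exp(-C / h)

for h in (0.05, 0.01, 0.005):
    print(h, width_trapping(h), width_nontrapping(h), width_analytic(h))
```

With these placeholder constants, for sufficiently small $h$ the trapping strip is far thinner than the logarithmic one, which is in turn thinner than the $h^{2/3}$ strip of the analytic case.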


Resonances and Time Asymptotics 


The relationship between eigenfunctions/eigen- 
values and time asymptotics is straightforward. 
This is no longer the case for resonances. For 
nontrapping problems however, this question has 
been studied in the late 1960s by Lax and Phillips 
(1989) and Vainberg (1968). In particular, this 
approach was decisive to study the local energy 
decay in acoustical scattering. As a consequence of 
Theorem 1, we have 
Theorem 4 If the acoustical problem is nontrapping, 
then there exist $C, \alpha > 0$ such that for any 
solution of the wave equation 

$$\Box u = 0, \quad u|_{t=0} = u_0, \quad \partial_t u|_{t=0} = u_1, \quad u|_{\partial\Omega} = 0 \ \ (\text{or } \partial_n u|_{\partial\Omega} = 0)$$

with compactly supported initial data $(u_0, u_1)$ (in a 
fixed compact), one has 

$$E_{\mathrm{loc}}(u)(t) = \int_{\Omega \cap \{|x| < C\}} |\nabla u|^2 + |\partial_t u|^2 \le \begin{cases} C e^{-\alpha t}\, E(u)(0) & \text{if the space dimension is odd} \\ \dfrac{C}{t^{2n}}\, E(u)(0) & \text{if the space dimension is even} \end{cases} \qquad [4]$$

Trapping perturbations were investigated more 
recently. In that case, the local energy decays, but the 
rate cannot be uniform. The first trapping example in 
acoustic scattering was studied by Ikawa (1983): the 
obstacle is the union of a finite number (and at least 
two) convex bodies. In that case, one has 


Theorem 5 For any $\varepsilon > 0$ there exist $C, \alpha > 0$ such 
that for any initial data supported in a fixed 
compact set 

$$E_{\mathrm{loc}}(u)(t) \le C e^{-\alpha t}\, \|(u_0, u_1)\|^2_{D((1-\Delta)^{(1+\varepsilon)/2})}$$

where $D((1-\Delta)^{(1+\varepsilon)/2})$ is the domain of the 
operator $(1-\Delta)^{(1+\varepsilon)/2}$. Note that the norm in 
$D((1-\Delta)^{1/2})$ is the natural energy and consequently 
the estimate above exhibits a loss of $\varepsilon$ derivatives. 
For strongly trapping perturbations, the results are 
worse. They are consequences of Theorem 3. 

Theorem 6 For any $k$ there exists $C_k > 0$ such that 
for any initial data supported in a fixed compact set 

$$E_{\mathrm{loc}}(u)(t) \le \frac{C_k}{\log(t)^{2k}}\, \|(u_0, u_1)\|^2_{D((1-\Delta)^{(1+k)/2})}$$


One can also obtain real asymptotic expansions in 
terms of resonances (see the work by Tang and 
Zworski (2000)). 

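Theorem 7 Let $\chi \in C_c^\infty(\mathbb{R}^n)$ and $\psi \in C_c^\infty((0, \infty))$ 
and let $\operatorname{chsupp} \psi = [a, b]$. There exists $0 < \delta < 
c(h) < 2\delta$ such that for every $M > M_0$ there exists 
$L = L(M)$, and we have 

$$\chi e^{-itP(h)/h} \chi\, \psi(P(h)) = \sum_{z \in \Omega(h) \cap \mathrm{Res}(P)} \chi\, \mathrm{Res}(e^{-it\bullet/h} R(\bullet, h), z)\, \chi\, \psi(P(h)) + O_{H \to H}(h^\infty), \quad \text{for } t > h^{-L} \qquad [5]$$

$$\Omega(h) = (a - c(h), b + c(h)) - i[0, h^M)$$

where $\mathrm{Res}(f(\bullet), z)$ denotes the residue of a 
meromorphic family of operators, $f$, at $z$. 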


The function c(h) depends on the distribution of 
resonances: roughly speaking we cannot “cut” 
through a dense cloud of resonances. Even in the 
very well understood case of the modular surface 
there is, currently at least, a need for some 
nonexplicit grouping of terms. The same ideas can 
be applied to acoustic scattering. 


Trace Formulas 


Trace formulas provide a description of the classical/ 
quantum correspondence: one side is given by the trace 
of a certain function of the operator f(P;), whereas the 
other side is described in terms of classical objects 
(closed orbits of the classical flow). In the case of 
discrete eigenvalues, the question is relatively simple 
and can be solved by using the spectral theorem. In the 
case of continuous spectrum, the problem is much more 
subtle (self-adjoint operators with continuous spectrum 
behave in some ways as non-normal operators). It has 
been studied by Lax and Phillips (1989), Bardos et al. 
(1982), and Melrose (1982). More recently, Sjöstrand 
(1997) introduced a local notion of trace formulas. 
Let $W \subset\subset \Omega$ be open precompact subsets of 
$e^{i]-2\theta_0, 0]}\,]0, +\infty[$. Assume that the intersections $I$ 
and $J$ of $W$ and $\Omega$ with the real axis are intervals and 
that $\Omega$ is simply connected. 


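Theorem 8 Let $f(z, h)$ be a family of holomorphic 
functions on $z \in \Omega$ such that $|f|_{\Omega \setminus W}| \le 1$. Let 
$\chi \in C_c^\infty(\mathbb{R})$ be equal to 1 on a neighborhood of $I$. Then 

$$\operatorname{Trace}\big((\chi f)(P_h) - (\chi f)(-h^2 \Delta)\big) = \sum_{\lambda \in \mathrm{Res}(P_h) \cap \Omega} f(\lambda, h) + O(h^{-n})$$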


The use of this result with a clever choice of functions $f$ 
allows Sjöstrand to show that an analytic singularity of 
the function $E \mapsto \mathrm{Vol}(\{x;\ V(x) > E\})$ (observe that if $V$ 
is bounded, this function vanishes for large $E$ and 
consequently it has analytic singularities) gives a lower 
bound, for $\Omega$ a neighborhood of $E$, 

$$\#\, \mathrm{Res}(P_h) \cap \Omega \ge c h^{-n}$$


which coincides with the upper bound (see Zworski 
(2002) and the references given there). 


Location of Resonances 


In some particular cases, one can expect to have a 
precise description of the location of resonances. 
This is the case in Ikawa’s example in acoustic 
scattering where the obstacle is the union of two 


disjoint convex bodies. In this case, the line 
minimizing the distance, d, between the bodies is 
trapped. However, this trapped trajectory is isolated 
and of hyperbolic type (unstable). Ikawa (1983) and 
Gérard (1988) have obtained: 


Theorem 9 There exist geometric positive constants 
$k_p \to +\infty$ as $p \to +\infty$ such that all resonances 
located above the line $\operatorname{Im} z > -C$ ($C$ arbitrarily large 
but fixed) have an asymptotic expansion 


Aw No tD apd A HONS), j> +0 


where the approximate resonances 
A 
Nip =I a ikp 


are located on horizontal lines. 


Another example is when the obstacle is convex. 
This example is nontrapping and Sjöstrand and 
Zworski (1999) are able to show that the resonances 
in any region $\operatorname{Im} z > -N|z|^{1/3}$ ($N$ arbitrarily large) are 
asymptotically distributed near cubic curves 

$$\mathcal{C}_j = \{z \in \mathbb{C};\ \operatorname{Im} z = -c_j |z|^{1/3}\}$$

Finally, the last main example where one can give a 
precise asymptotic for resonances is when there 
exists a stable (elliptic) periodic trajectory for the 
Hamiltonian flow. In that case it had been known 
from the 1960s (see the works by Babič (1968)) that 
one can construct quasimodes, that is, compactly 
supported approximate solutions of the eigenfunc- 
tions equation: 


$$(P_h - E_h) e_h = O(h^\infty)$$


It is only recently that Tang and Zworski (1998) and 
Stefanov (1999) proved that these quasimode 
constructions imply the existence of resonances 
asymptotic to $E_h$ as $h \to 0$. 


See also: h-Pseudodifferential Operators and 
Applications; Semi-Classical Spectra and Closed Orbits. 
Introduction 


Riemann surfaces were first studied as the natural 
domain of definition of (multivalued) holomorphic 
or meromorphic functions. They were the starting 
point for the development of the theory of 
real and complex manifolds (see Weyl (1997)). 
Nowadays, Riemann surfaces are simply defined 
as one-dimensional complex manifolds (see the 
next section). Compact Riemann surfaces can 
be embedded into projective spaces and are thus, 
by virtue of Chow’s theorem, algebraic curves. By 
uniformization theory, the universal cover of 
a connected Riemann surface is either the unit 


disk, the complex plane, or the Riemann sphere 
(see the section “Uniformization’’). 

This article discusses the basic theory of compact 
Riemann surfaces, such as their topology, their 
periods, and the definition of the Jacobian variety. 
Studying the zeros and poles of meromorphic 
functions leads to the notion of divisors and linear 
systems. In modern language this can be rephrased 
in terms of line bundles, resp. locally free sheaves 
(see the section “Divisors, linear systems, and line 
bundles”). One of the fundamental results is the 
Riemann-Roch theorem which expresses the 
difference between the dimension of a linear system 
and that of its adjoint system in terms of the degree 
of the linear system and the genus of the curve. This 
theorem has been vastly generalized and is truly one 
of the cornerstones of algebraic geometry. 
A formulation of this result and some of its
applications are also discussed.




A study of the subsets of the Jacobians parame- 
trizing linear systems of given degree and dimension 
leads to Brill-Noether theory, which is discussed in 
the section “Brill-Noether theory.” This is followed 
by a brief introduction to the theory of equations 
and syzygies of canonical curves. 

Moduli spaces play a central role in the theory 
of complex variables and in algebraic geometry. 
Arguably, the most important of these is the 
moduli space of curves of genus g. This and 
related moduli problems are treated in the section 
“Moduli of compact Riemann surfaces.” In parti- 
cular, the space of stable maps is closely related to 
quantum cohomology. Finally, we present a brief
discussion of the Verlinde formula and conformal
blocks.


Basic Definitions 


Riemann surfaces are one-dimensional complex
manifolds. An n-dimensional complex manifold
M is a topological Hausdorff space (i.e., for any
two points $x \neq y$ on M, there are disjoint open
neighborhoods containing x and y), which has a
countable basis for its topology, together with a
complex atlas $\mathcal{A}$. The latter is an open covering
$(U_\alpha)_{\alpha \in A}$ together with homeomorphisms
$f_\alpha : U_\alpha \to V_\alpha \subset \mathbb{C}^n$, where the $U_\alpha$ are open
subsets of M and the $V_\alpha$ are open sets in $\mathbb{C}^n$.
The main requirement is that these charts are
holomorphically compatible, that is, for
$U_\alpha \cap U_\beta \neq \emptyset$, the map shown in Figure 1,


$$f_\beta \circ f_\alpha^{-1}|_{f_\alpha(U_\alpha \cap U_\beta)} : f_\alpha(U_\alpha \cap U_\beta) \to f_\beta(U_\alpha \cap U_\beta) \subset \mathbb{C}^n$$


is biholomorphic. A map $h : M \to N$ between two
complex manifolds is holomorphic if it is so with
respect to the local charts. This means the following:





Figure 1 Charts of a complex manifold.

Figure 2 Holomorphic map between manifolds.


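for each point $x \in M$, there are charts
$f^M_\alpha : U^M_\alpha \to V^M_\alpha \subset \mathbb{C}^m$ near x and
$f^N_\beta : U^N_\beta \to V^N_\beta \subset \mathbb{C}^n$ near h(x) with
$h(U^M_\alpha) \subset U^N_\beta$ such that the map shown in Figure 2


$$f^N_\beta \circ h \circ (f^M_\alpha)^{-1} : V^M_\alpha \to V^N_\beta \subset \mathbb{C}^n$$


is holomorphic (one checks easily that this does not
depend on the choice of the charts).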

A Riemann surface is a one-dimensional com- 
plex manifold. Trivial examples are given by open 
sets in C (where one chart suffices). Another 
example is the Riemann sphere $\hat{\mathbb{C}} = \mathbb{C} \cup \{\infty\}$,
which can be covered by the two charts given by
$z \neq \infty$ and $z \neq 0$. Both of these charts are
homeomorphic to $\mathbb{C}$ with the transition function given
by $z \mapsto 1/z$. Historically, Riemann surfaces were
viewed as (branched) coverings of C or of the 
sphere, where they appear as the natural domain 
of definition of multivalued holomorphic or 
meromorphic functions. 


Uniformization 


If M is a Riemann surface, then its universal
covering $\tilde{M}$ is again a Riemann surface. The
connected and simply connected Riemann surfaces
can be fully classified. Let


$$E = \{z \in \mathbb{C};\ |z| < 1\}$$


be the unit disk and $\hat{\mathbb{C}} = \mathbb{C} \cup \{\infty\}$ the Riemann
sphere. The latter can be identified with the complex
projective line $\mathbb{P}^1$.


Theorem 1 (Generalized Riemann mapping
theorem). Every connected and simply connected
Riemann surface is biholomorphically equivalent
to the unit disk E, the complex plane $\mathbb{C}$, or the
Riemann sphere $\hat{\mathbb{C}}$.


This theorem was proved rigorously by Koebe 
and Poincaré at the beginning of the twentieth 
century. 


Compact Riemann Surfaces 


The topological structure of a compact Riemann 
surface C is determined by its genus g (Figure 3). 
Topologically, a Riemann surface of genus g is a 
sphere with g handles or, equivalently, a torus with 
g holes. 

Analytically, the genus can be characterized as the 
maximal number of linearly independent holo- 
morphic forms on C (see also the section “The 
Riemann-Roch theorem and applications”). 

There exists a very close link with algebraic
geometry: every compact Riemann surface C can
be embedded into some projective space $\mathbb{P}^n$ (in
fact already into $\mathbb{P}^3$). By Chow’s theorem, C is
then a (projective) algebraic variety, that is, it can
be described by finitely many homogeneous equations.
It should be noted that such a phenomenon
is special to complex dimension 1. The crucial
point is that one can always construct a nonconstant
meromorphic function on a Riemann
surface (e.g., by Dirichlet’s principle). Given such
a function, it is not difficult to find a projective
embedding of a compact Riemann surface C. On
the other hand, it is easy to construct a compact
two-dimensional torus $T = \mathbb{C}^2/L$ for some suitably
chosen lattice L, which cannot be embedded into
any projective space $\mathbb{P}^n$.

The dichotomy Riemann surface/algebraic curve 
arises from different points of view: analysts think 
of a real two-dimensional surface with a Rieman- 
nian metric which, via isothermal coordinates, 
defines a holomorphic structure, whereas algebraic 
geometers think of a complex one-dimensional 
object. 

In this article, the expressions compact Riemann
surface and (projective) algebraic curve are
used interchangeably. The choice depends on
which expression is more commonly used in the 
part of the theory which is discussed in the 
relevant section. 


Figure 3 Genus of Riemann surfaces.


Periods and the Jacobian


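On a compact Riemann surface C of genus g, there
exist 2g homologically independent paths, that is,
$H_1(C, \mathbb{Z}) \cong \mathbb{Z}^{2g}$.

Let $\gamma_1, \ldots, \gamma_{2g}$ be a basis of $H_1(C, \mathbb{Z})$ and
let $\omega_1, \ldots, \omega_g$ be a basis of the space of holomorphic
1-forms on C. Integrating these forms over the paths
$\gamma_1, \ldots, \gamma_{2g}$ defines the period matrix

$$\Omega = \begin{pmatrix} \int_{\gamma_1}\omega_1 & \cdots & \int_{\gamma_1}\omega_g \\ \vdots & & \vdots \\ \int_{\gamma_{2g}}\omega_1 & \cdots & \int_{\gamma_{2g}}\omega_g \end{pmatrix}$$


If $Q = (\gamma_i \cdot \gamma_j)$ is the intersection matrix of the paths
$\gamma_1, \ldots, \gamma_{2g}$, then $\Omega$ satisfies the Riemann bilinear
relations

$${}^t\Omega\, Q^{-1}\, \Omega = 0, \qquad \sqrt{-1}\ {}^t\bar{\Omega}\, Q^{-1}\, \Omega > 0 \qquad [1]$$


where the latter condition means positive definite.
One can choose (see Figure 4) $\gamma_1, \ldots, \gamma_{2g}$ such that


$$Q = J = \begin{pmatrix} 0 & 1_g \\ -1_g & 0 \end{pmatrix}$$

where $1_g$ is the $g \times g$ unit matrix. Moreover,
$\omega_1, \ldots, \omega_g$ can be chosen such that


$$\int_{\gamma_i} \omega_j = \delta_{ij}, \qquad 1 \le i, j \le g$$

Let

$$\Omega_0 = \left( \int_{\gamma_{g+i}} \omega_j \right)_{1 \le i, j \le g}$$

Then the Riemann bilinear relations [1] become

$$\Omega_0 = {}^t\Omega_0, \qquad \operatorname{Im} \Omega_0 > 0$$


that is, $\Omega_0$ is an element of the Siegel upper half-space


$$\mathbb{H}_g = \{\tau \in \mathrm{Mat}(g \times g, \mathbb{C});\ \tau = {}^t\tau,\ \operatorname{Im} \tau > 0\}$$


The matrix $\Omega_0$ is defined by the Riemann surface C
only up to the action of the symplectic group


$$\mathrm{Sp}(2g, \mathbb{Z}) = \{M \in \mathrm{Mat}(2g \times 2g, \mathbb{Z});\ M J\, {}^tM = J\}$$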


which acts on the Siegel space $\mathbb{H}_g$ by


$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix} : \tau \mapsto (A\tau + B)(C\tau + D)^{-1}$$


Here $A, \ldots, D$ are $g \times g$ blocks.

Figure 4 Homology of a compact Riemann surface.

The rows of the matrix $\Omega$ define a rank-2g lattice
$L_\Omega$ in $\mathbb{C}^g$ and the Jacobian of C is the torus


$$J(C) = \mathbb{C}^g / L_\Omega$$


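More intrinsically, one can define J(C) as follows.
Let $H^0(C, \omega_C)$ be the space of holomorphic differential
forms on C. Then, integration over cycles
defines a monomorphism


$$H_1(C, \mathbb{Z}) \to H^0(C, \omega_C)^*, \qquad \gamma \mapsto \int_\gamma$$


and


$$J(C) = H^0(C, \omega_C)^* / H_1(C, \mathbb{Z})$$


For a fixed base point $P_0 \in C$, the Abel-Jacobi
map is defined by


$$u : C \to J(C), \qquad P \mapsto \left( \int_{P_0}^{P} \omega_1, \ldots, \int_{P_0}^{P} \omega_g \right)$$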


Here, the integration is taken over some path 
from Po to P. Obviously, the integral depends on 
the choice of this path, but since J(C) was 
obtained by dividing out the periods given by 
integrating over a basis of H;(C,Z), the map is 
well defined. 

Let $C^d$ be the dth Cartesian product of C, that is,
the set of all ordered d-tuples $(P_1, \ldots, P_d)$. Then, u
defines a map


$$u^d : C^d \to J(C), \qquad (P_1, \ldots, P_d) \mapsto u(P_1) + \cdots + u(P_d)$$


where + is the usual addition on the torus J(C).
If $d = g - 1$, then


$$\Theta = \operatorname{Im}(u^{g-1}) \subset J(C)$$


is a hypersurface (i.e., has codimension 1 in J(C))
and is called a theta divisor. A different choice of the
base point $P_0$ results in a translation of the theta
divisor. Using the theta divisor, one can show that
J(C) is an abelian variety, that is, J(C) can be
embedded into some projective space $\mathbb{P}^n$. The pair
$(J(C), \Theta)$ is a principally polarized abelian variety
and Torelli’s theorem states that C can be
reconstructed from its Jacobian J(C) and the theta
divisor $\Theta$.


Divisors, Linear Systems, 
and Line Bundles 


A divisor D on C is a formal sum


$$D = n_1 P_1 + \cdots + n_k P_k, \qquad P_i \in C,\ n_i \in \mathbb{Z}$$


The degree of D is defined as

$$\deg D = n_1 + \cdots + n_k$$


and D is called “effective” if all $n_i \geq 0$. Every
meromorphic function $f \neq 0$ defines a divisor


$$(f) = f_0 - f_\infty$$


where $f_0$ are the zeros of f and $f_\infty$ the poles (each
counted with multiplicity). Divisors of the form (f) are
called principal divisors and the degree of any principal
divisor is 0 (see the next section). Two divisors $D_1$ and
$D_2$ are called linearly equivalent ($D_1 \sim D_2$) if their
difference is a principal divisor, that is,


$$D_1 - D_2 = (f)$$


for some meromorphic function $f \neq 0$. This defines
an equivalence relation on the group Div(C) of all
divisors on C. Since principal divisors have degree 0,
the notion of degree also makes sense for classes of
linearly equivalent divisors. We define the divisor
class group of C by


$$\mathrm{Cl}(C) = \mathrm{Div}(C) / \sim$$

The degree map defines an exact sequence

$$0 \to \mathrm{Cl}^0(C) \to \mathrm{Cl}(C) \xrightarrow{\ \deg\ } \mathbb{Z} \to 0$$


where $\mathrm{Cl}^0(C)$ is the subgroup of Cl(C) of divisor
classes of degree 0.
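
The statement that principal divisors have degree 0 can be checked by hand on the Riemann sphere, where every meromorphic function is a rational function. The following is a minimal sketch under that assumption; the helper names `principal_divisor` and `degree` are illustrative only, and the point at infinity is represented by the string "inf".

```python
from collections import Counter


def principal_divisor(zeros, poles):
    """Divisor of f(z) = prod(z - a) / prod(z - b) on the Riemann sphere.

    zeros, poles: finite points, each repeated according to multiplicity.
    Returns a dict {point: multiplicity}; the key "inf" is the point at infinity.
    """
    div = Counter()
    for a in zeros:
        div[a] += 1
    for b in poles:
        div[b] -= 1
    # Near infinity f behaves like z**(len(zeros) - len(poles)),
    # so its order at infinity is len(poles) - len(zeros).
    div["inf"] += len(poles) - len(zeros)
    # Drop points whose multiplicity cancelled to zero.
    return {p: n for p, n in div.items() if n != 0}


def degree(div):
    """Degree of a divisor: the sum of its multiplicities."""
    return sum(div.values())


# f(z) = z^2 (z - 1) / (z + 1): double zero at 0, simple zero at 1,
# simple pole at -1 and a double pole at infinity.
D = principal_divisor(zeros=[0, 0, 1], poles=[-1])
print(D, degree(D))
```

The zeros and the poles, counted with multiplicity and including the point at infinity, always balance, which is the genus 0 case of the degree statement above.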

Let $C_d$ be the set of unordered d-tuples of points
on C, that is,


$$C_d = C^d / S_d$$


where the symmetric group $S_d$ acts on the Cartesian
product $C^d$ by permutation. This is again a smooth
projective variety and the Abel-Jacobi map
$u^d : C^d \to J(C)$ clearly factors through a map


$$u_d : C_d \to J(C)$$

The fibers of this map are of particular interest.


Theorem 2 (Abel). Two effective divisors $D_1$ and
$D_2$ on C of the same degree d are linearly equivalent
if and only if $u_d(D_1) = u_d(D_2)$.


One normally denotes the inverse image of $u_d(D)$ by

$$|D| = u_d^{-1}(u_d(D)) = \{D';\ D' \geq 0,\ D' \sim D\}$$


Note that the latter description also makes sense if
D itself is not necessarily effective. One calls |D| the
complete linear system defined by the divisor D. If
$\deg D < 0$, then automatically $|D| = \emptyset$, but the
converse is not necessarily true. Let $\mathcal{M}_C$ be the
field of meromorphic (or equivalently rational)
functions on C. Then, one defines


$$L(D) = \{f \in \mathcal{M}_C;\ (f) \geq -D\}$$


This is a $\mathbb{C}$-vector space and it is not difficult to see
that L(D) has finite dimension. To every function
$0 \neq f \in L(D)$, one can associate the effective divisor


$$D_f = (f) + D \geq 0$$


Clearly, $D_f \sim D$ and every effective divisor with this
property arises in this way. This gives a bijection


$$\mathbb{P}(L(D)) = |D|$$


showing that the complete linear system |D| has the
structure of a projective space. A linear system is a
projective subspace of some complete linear system |D|.

Clearly, the map $u_d : C_d \to J(C)$ can be extended
to the set $\mathrm{Div}^d(C)$ of degree d divisors and Abel’s
theorem then states that this map factors through
$\mathrm{Cl}^d(C)$, that is, that we have a commutative diagram


$$\mathrm{Div}^d(C) \longrightarrow \mathrm{Cl}^d(C) \xrightarrow{\ \bar{u}_d\ } J(C)$$


whose composite is $u_d$ and where $\bar{u}_d$ is injective.


Theorem 3 (Jacobi’s Inversion Theorem). The
map $\bar{u}_d$ is surjective and hence induces an isomorphism


$$\bar{u}_d : \mathrm{Cl}^d(C) \cong J(C)$$


It should be noted that the definition of the maps
$u_d$ depends on the choice of a base point $P_0 \in C$.
Hence, the maps $\bar{u}_d$ are not canonical, with the
exception of the isomorphism $\bar{u}_0 : \mathrm{Cl}^0(C) \cong J(C)$
where the choice of $P_0$ drops out.

The concepts of divisors and linear systems can be
rephrased in the language of line bundles. A (holomorphic)
vector bundle on a complex manifold M is a
complex manifold E together with a projection
$p : E \to M$ which is a locally trivial $\mathbb{C}^r$-bundle. This
means that an open covering $(U_\alpha)_{\alpha \in A}$ of M and local
trivializations


$$\varphi_\alpha : E|_{U_\alpha} = p^{-1}(U_\alpha) \cong U_\alpha \times \mathbb{C}^r$$


exist, such that the transition maps


$$\varphi_\alpha \circ \varphi_\beta^{-1}|_{(U_\alpha \cap U_\beta) \times \mathbb{C}^r} : (U_\alpha \cap U_\beta) \times \mathbb{C}^r \to (U_\alpha \cap U_\beta) \times \mathbb{C}^r$$


are fiberwise linear isomorphisms. If M is connected,
then r is constant and is called the rank of the vector
bundle. A line bundle is simply a rank-1 vector bundle.

Alternatively, one can view vector bundles as
locally free $\mathcal{O}_M$-modules, where $\mathcal{O}_M$ denotes the
structure sheaf of holomorphic (or in the algebro-geometric
setting regular) functions on M. An
$\mathcal{O}_M$-module $\mathcal{E}$ is called locally free of rank r, if an
open covering $(U_\alpha)_{\alpha \in A}$ of M exists such that
$\mathcal{E}|_{U_\alpha} \cong \mathcal{O}_{U_\alpha}^{\oplus r}$. The transition functions of a locally free sheaf
can be used to define a vector bundle and vice versa,
and hence the concepts of vector bundles and locally
free sheaves can be used interchangeably. The open
coverings $U_\alpha$ can be viewed either in the complex
topology, or, if M is an algebraic variety, in
the Zariski topology, thus leading to either holomorphic
vector bundles (locally free sheaves in the
$\mathbb{C}$-topology) or algebraic vector bundles (locally free
sheaves in the Zariski topology). Clearly, every
algebraic vector bundle defines a holomorphic
vector bundle. Conversely, on a projective variety
M, Serre’s GAGA theorem (géométrie algébrique et
géométrie analytique), a vast generalization of
Chow’s theorem, states that there exists a bijection
between the equivalence classes of algebraic and
holomorphic vector bundles (locally free sheaves).

The Picard group Pic M is the set of all isomorphism
classes of line bundles on M. The tensor product
defines a group structure on Pic M where the neutral
element is the trivial line bundle $\mathcal{O}_M$ and the inverse
of a line bundle $\mathcal{L}$ is its dual bundle $\mathcal{L}^*$, which is also
denoted by $\mathcal{L}^{-1}$. For this reason, locally free sheaves
of rank 1 are also called invertible sheaves.

We now return to the case of a compact Riemann
surface (algebraic curve) C. The concepts of line
bundles and divisors can be translated into each
other. If $D = \sum n_i P_i$ is a divisor on C and U an open
set, then we denote by $D_U$ the restriction of D to U,
that is, the divisor consisting of all points $P_i \in U$
with multiplicity $n_i$. One then defines a locally free
sheaf (line bundle) $\mathcal{L}(D)$ by

$$\mathcal{L}(D)(U) = \{f \in \mathcal{M}_C(U);\ (f) \geq -D_U\}$$


To see that this is locally free, it is enough to
consider for each point $P_i$ a neighborhood $U_i$ on
which a holomorphic function $t_i$ exists, which
vanishes only at $P_i$ and there of order 1 (i.e., it is a
local parameter near the point $P_i$). Then,


$$\mathcal{L}(D)(U_i) = t_i^{-n_i} \mathcal{O}_{U_i} \cong \mathcal{O}_{U_i}$$

This correspondence defines a map


$$\mathrm{Div}\, C \to \mathrm{Pic}\, C, \qquad D \mapsto \mathcal{L}(D)$$

It is not hard to show that:


1. every line bundle $\mathcal{L} \in \mathrm{Pic}\, C$ is of the form
$\mathcal{L} = \mathcal{L}(D)$ for some divisor D on the curve C;
2. $D_1 \sim D_2 \Leftrightarrow \mathcal{L}(D_1) \cong \mathcal{L}(D_2)$;
3. $\mathcal{L}(D_1) \otimes \mathcal{L}(D_2) \cong \mathcal{L}(D_1 + D_2)$; and
4. $\mathcal{L}(-D) \cong \mathcal{L}(D)^{-1}$.

Hence, there is an isomorphism of abelian groups

$$\mathrm{Cl}(C) \cong \mathrm{Pic}\, C$$


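This correspondence allows one to define the degree of a
line bundle $\mathcal{L}$. In the complex analytic setting this
can also be interpreted as follows. Let $\mathcal{O}_C^*$ be the
sheaf of nowhere-vanishing functions. Using cocycles,
one easily identifies

$$H^1(C, \mathcal{O}_C^*) \cong \mathrm{Pic}\, C$$

and the exponential sequence

$$0 \to \mathbb{Z} \to \mathcal{O}_C \xrightarrow{\ \exp\ } \mathcal{O}_C^* \to 0$$

induces an exact sequence

$$0 \to H^1(C, \mathbb{Z}) \to H^1(C, \mathcal{O}_C) \to H^1(C, \mathcal{O}_C^*) = \mathrm{Pic}\, C \to H^2(C, \mathbb{Z})$$

The last map in this exact sequence associates to
each line bundle $\mathcal{L}$ its first Chern class
$c_1(\mathcal{L}) \in H^2(C, \mathbb{Z}) \cong \mathbb{Z}$, which can be identified with the
degree of $\mathcal{L}$. Hence, the subgroup $\mathrm{Pic}^0\, C$ of degree 0
line bundles on C is isomorphic to

$$\mathrm{Pic}^0\, C \cong H^1(C, \mathcal{O}_C) / H^1(C, \mathbb{Z})$$


Altogether there are identifications


$$\mathrm{Pic}^0\, C \cong \mathrm{Cl}^0\, C \cong J(C)$$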


The Riemann-Roch Theorem 
and Applications 


For every divisor D on a compact Riemann surface C,
the discussion of the preceding section shows that there
is an identification of finite-dimensional vector spaces


$$L(D) = H^0(C, \mathcal{L}(D))$$


where $H^0(C, \mathcal{L}(D))$ is the space of global sections of
the line bundle $\mathcal{L}(D)$. One defines


$$l(D) = \dim_{\mathbb{C}} L(D)$$


It is a crucial question in the theory of compact Riemann
surfaces to study the dimension l(D) as D varies.

The canonical bundle $\omega_C$ of C is defined as the dual
of the tangent bundle of C. Its global sections are
holomorphic 1-forms. Every divisor $K_C$ on C with
$\omega_C = \mathcal{L}(K_C)$ is called (a) canonical divisor. The
canonical divisors are the divisors of the meromorphic
1-forms on C, whereas the effective canonical divisors
correspond to the divisors of holomorphic 1-forms
(here, we simply write a 1-form locally as f(z) dz and
define a divisor by taking the zeros, resp. poles of f(z)).
By abuse of notation, we also denote the divisor class
corresponding to canonical divisors by $K_C$. There is a
natural identification


$$\mathbb{P}(H^0(C, \omega_C)) = |K_C|$$

For a divisor D, the index of speciality is defined by

$$i(D) = l(K_C - D) = \dim_{\mathbb{C}} L(K_C - D)$$


The linear system $|K_C - D|$ is called the adjoint
system of |D|. A crucial role is played by the following theorem.


Theorem 4 (Riemann-Roch). For any divisor D on a
compact Riemann surface C of genus g, the equality


$$l(D) - i(D) = \deg D + 1 - g \qquad [2]$$

holds.


This can also be written in terms of line bundles.
If $\mathcal{L}$ is any line bundle, then we denote the
dimension of the space of global sections by


$$h^0(\mathcal{L}) = \dim_{\mathbb{C}} H^0(C, \mathcal{L})$$

Then, the Riemann-Roch theorem can be written as

$$h^0(\mathcal{L}) - h^0(\omega_C \otimes \mathcal{L}^{-1}) = \deg \mathcal{L} + 1 - g \qquad [3]$$


This can be written yet again in a different way, if
we use sheaf cohomology. By Serre duality, there is
an isomorphism of cohomology groups


$$H^1(C, \mathcal{L}) \cong H^0(C, \omega_C \otimes \mathcal{L}^{-1})^*$$

and hence if we set

$$h^1(\mathcal{L}) = \dim_{\mathbb{C}} H^1(C, \mathcal{L})$$

then [3] reads

$$h^0(\mathcal{L}) - h^1(\mathcal{L}) = \deg \mathcal{L} + 1 - g \qquad [4]$$


Whereas [2] is the classical formulation of the
Riemann-Roch theorem, formula [4] is the formulation
which is more suitable for generalizations.
From this point of view, the classical Riemann-Roch
theorem is a combination of the cohomological
formulation [4] together with Serre duality.

The Riemann-Roch theorem has been vastly gen- 
eralized. This was first achieved by Hirzebruch who 
proved what is nowadays called the Hirzebruch- 
Riemann—Roch theorem for vector bundles on projec- 
tive manifolds. A further generalization is due to 
Grothendieck, who proved a “relative” version involving
maps between varieties. Nowadays, theorems like
the Hirzebruch-Riemann-Roch theorem can be
viewed as special cases of the Atiyah-Singer index
theorem for elliptic operators. The latter also contains 
the Gauss—Bonnet theorem from differential geometry 
as a special case. Moreover, Serre duality holds in 
much greater generality, namely for coherent sheaves 
on projective varieties. 

Applying the Riemann-Roch theorem [3] to the
zero divisor D = 0, resp. the trivial line bundle $\mathcal{O}_C$,
one obtains


$$h^0(\omega_C) = g \qquad [5]$$


that is, the number of independent global holomorphic
1-forms equals the genus of the curve C.
Similarly, for $D = K_C$, resp. $\mathcal{L} = \omega_C$, we find from [3]
and [5] that


$$\deg K_C = 2g - 2$$


These relations show how the Riemann-Roch
theorem links analytic, resp. algebraic, invariants
with the topology of the curve C.

Finally, if $\deg D > 2g - 2$, then $\deg(K_C - D) < 0$
and hence $i(D) = l(K_C - D) = 0$ and [2] becomes


$$l(D) = \deg D + 1 - g \quad \text{if } \deg D > 2g - 2$$


which is Riemann’s original version of the theorem. 
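
The consequences just derived are simple arithmetic in g and deg D. The following is a minimal sketch of that arithmetic; the function names are illustrative assumptions and the code only evaluates the formulas stated above, it does not compute anything on an actual curve.

```python
def riemann_roch_rhs(deg_d: int, g: int) -> int:
    """Right-hand side of Riemann-Roch [2]: l(D) - i(D) = deg D + 1 - g."""
    return deg_d + 1 - g


def canonical_degree(g: int) -> int:
    """Degree of a canonical divisor, deg K_C = 2g - 2."""
    return 2 * g - 2


def l_nonspecial(deg_d: int, g: int) -> int:
    """l(D) in Riemann's original range deg D > 2g - 2, where i(D) = 0."""
    if deg_d <= canonical_degree(g):
        raise ValueError("formula only valid for deg D > 2g - 2")
    return riemann_roch_rhs(deg_d, g)


# On the Riemann sphere (g = 0) a divisor of degree 3 has l(D) = 4,
# matching the four functions 1, z, z^2, z^3 with a pole of order <= 3 at infinity.
print(l_nonspecial(3, 0))

# For D = K_C one has l(K_C) = g and i(K_C) = l(0) = 1, consistent with [2].
g = 3
print(g - 1 == riemann_roch_rhs(canonical_degree(g), g))
```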

Classically, linear series arose in the study of
projective embeddings of algebraic curves. For a
nonzero effective divisor


$$D = \sum_{i=1}^{k} n_i P_i, \qquad n_i > 0$$


the support of D is defined by

$$\operatorname{supp}(D) = \{P_1, \ldots, P_k\}$$


A complete linear system |D| is called base point
free, if no point P exists which is in the support of
every divisor $D' \in |D|$. This is the same as saying
that for every $P \in C$ a section $s \in H^0(C, \mathcal{L}(D))$ exists
which does not vanish at P. Let |D| be base point
free and let $s_0, \ldots, s_n \in H^0(C, \mathcal{L}(D))$ be a basis of the
space of sections. Then, one obtains a map


$$\varphi_{|D|} : C \to \mathbb{P}(H^0(C, \mathcal{L}(D))) = \mathbb{P}^n, \qquad P \mapsto (s_0(P) : \ldots : s_n(P))$$


The divisors $D' \in |D|$ are then exactly the pullbacks
of the hyperplanes H of $\mathbb{P}^n$ under the map $\varphi_{|D|}$. Note
that the map $\varphi_{|D|}$ as defined here depends on the
choice of the basis $s_0, \ldots, s_n$, but any two such
choices only differ by an automorphism of $\mathbb{P}^n$. We
say that |D|, resp. the associated line bundle
$\mathcal{L} = \mathcal{L}(D)$, is very ample if $\varphi_{|D|}$ defines an
embedding. Using the Riemann-Roch theorem, it is
not difficult to prove:

Proposition 1 Let D be a divisor of degree d on the
curve C. Then


(i) |D| is base point free if $d \geq 2g$ and
(ii) |D| is very ample if $d \geq 2g + 1$.


If the genus $g(C) \geq 2$, then one can prove that $|K_C|$
is base point free and consider the canonical map


$$\varphi_{|K_C|} : C \to \mathbb{P}^{g-1}$$


A curve C is called hyperelliptic if there exists a
surjective map $f : C \to \mathbb{P}^1$ which is a covering of
degree 2. In genus 2 every curve is hyperelliptic,
whereas for genus $g \geq 3$ hyperelliptic curves are
special. The connection with the canonical map is
given by


Theorem 5 (Clifford). Let C be a curve of genus
$g \geq 2$. Then the canonical map is an embedding if
and only if C is not hyperelliptic.


We end this section by stating Hurwitz’s theorem:
Let $f : C \to D$ be a surjective holomorphic map
between compact Riemann surfaces (if f is not
constant then it is automatically surjective). Then,
near a point $P \in C$ the map f is given in local
analytic coordinates by $f(t) = t^{n_P}$ and we call f
“ramified” of order $n_P$ if $n_P > 1$. The ramification
divisor of f is defined as


$$R = \sum_{P \in C} (n_P - 1) P$$


Note that this is a finite sum. If we define


$$f^*(Q) = \sum_{P \in f^{-1}(Q)} n_P P$$


then one can show that


$$\deg f = \deg f^*(Q) = \sum_{P \in f^{-1}(Q)} n_P$$


is independent of the point Q. This number is called
the degree of the map f. (This should not be
confused with the degree deg(f) of the principal
divisor (f) defined by f.) In fact, applying the above
equality to the map $f : C \to \mathbb{P}^1$ associated to a
nonconstant meromorphic function f shows that
the degree of the principal divisor (f) is zero, since


$$\deg(f) = \deg f^*(0) - \deg f^*(\infty) = 0$$


Theorem 6 (Hurwitz). Let $f : C \to D$ be a surjective
holomorphic map between compact Riemann
surfaces of genus g(C) and g(D), respectively. Then,


$$2g(C) - 2 = \deg f \cdot (2g(D) - 2) + \deg R$$


where R is the ramification divisor. 
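
Hurwitz's formula can be solved for g(C) once the degree of the map, the genus of the target and the ramification orders are known. The following is a minimal sketch of that computation; the function name `genus_of_cover` and its argument conventions are assumptions made for illustration.

```python
def genus_of_cover(deg_f: int, genus_target: int, ramification_orders) -> int:
    """Solve 2g(C) - 2 = deg f * (2g(D) - 2) + deg R for g(C).

    ramification_orders: the orders n_P of the ramified points P of C
    (points with n_P = 1 contribute nothing and may be omitted).
    """
    deg_r = sum(n - 1 for n in ramification_orders)
    two_g_minus_2 = deg_f * (2 * genus_target - 2) + deg_r
    if two_g_minus_2 % 2 != 0 or two_g_minus_2 < -2:
        raise ValueError("data not compatible with Hurwitz's formula")
    return two_g_minus_2 // 2 + 1


# A double cover of the Riemann sphere with 6 simple ramification points
# is a (hyperelliptic) curve of genus 2.
print(genus_of_cover(2, 0, [2] * 6))

# z -> z^n on the Riemann sphere is ramified of order n at 0 and at infinity;
# the formula returns genus 0, as it must.
print(genus_of_cover(5, 0, [5, 5]))
```

More generally, a double cover of the sphere with 2g + 2 simple ramification points has genus g, which is the count behind the hyperelliptic curves discussed above.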
Brill-Noether Theory


In this section, we state the main results of Brill-Noether
theory. For a divisor D on a curve C we
denote by


$$r(D) = l(D) - 1$$


the projective dimension of the complete linear
system |D|. The principal objects of Brill-Noether
theory are the sets $W^r_d(C) \subset \mathrm{Cl}^d(C) = \mathrm{Pic}^d(C)$ given by


$$W^r_d(C) = \{D;\ \deg D = d,\ r(D) \geq r\}$$


These sets are subvarieties of $\mathrm{Cl}^d(C) = \mathrm{Pic}^d(C)$.

We denote by $g^r_d$ a linear system (not necessarily
complete) of degree d and projective dimension r.
Closely related to the varieties $W^r_d$ are the sets


$$G^r_d(C) = \{\delta;\ \delta \text{ is a } g^r_d \text{ on } C\}$$


These sets also have a natural structure as a projective
variety. Clearly, there are maps $G^r_d(C) \to W^r_d(C)$.

If g = g(C) is the genus of the curve C, then the
Brill-Noether number is defined as


$$\rho(g, r, d) = g - (r + 1)(g - d + r)$$


Its significance is that it is the expected dimension of
the varieties $G^r_d(C)$. The two basic results of Brill-Noether
theory are:


Theorem 7 (Existence Theorem). Let C be a curve
of genus g. Let d, r be integers such that $d \geq 1$, $r \geq 0$,
and $\rho(g, r, d) \geq 0$. Then $G^r_d(C)$ and hence $W^r_d(C)$ are
nonempty and every component of $G^r_d(C)$ has dimension
at least $\rho$. If $r \geq d - g$, then the same is true
for $W^r_d(C)$.


Theorem 8 (Connectedness Theorem). Let C be
a curve of genus g and d, r integers such that $d \geq 1$,
$r \geq 0$, and $\rho(g, r, d) \geq 1$. Then $G^r_d(C)$ and hence also
$W^r_d(C)$ are connected.


The above theorems hold for all curves C. There
are other theorems which only hold for general
curves (where general means outside a countable
union of proper subvarieties in the moduli space, see
the section “Moduli of compact Riemann surfaces”).


Theorem 9 (Dimension Theorem). Let C be a
general curve of genus g and $d \geq 1$, $r \geq 0$ integers. If
$\rho(g, r, d) < 0$, then $G^r_d(C) = \emptyset$. If $\rho \geq 0$, then every
component of $G^r_d(C)$ has dimension $\rho$.


Theorem 10 (Smoothness Theorem). Let C be a
general curve of genus g and $d \geq 1$, $r \geq 0$. Then,
$G^r_d(C)$ is smooth of dimension $\rho$. If $\rho \geq 1$, then
$G^r_d(C)$ and hence $W^r_d(C)$ are irreducible.


Brill-Noether theory started with a paper of Brill 
and Noether in 1873. It was, however, only from 
the 1970s onwards that the main theorems could be 
proved rigorously, due to the work of Griffiths, 
Harris, Kleiman, Mumford, and many others. For 
an extensive treatment of the theory, as well as a list 
of references, the reader is referred to Arbarello 
et al. (1985). 
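
The Brill-Noether number is easy to tabulate. The sketch below evaluates $\rho(g, r, d) = g - (r+1)(g - d + r)$ and, as an illustration of how Theorems 7 and 9 combine, finds the smallest d for which a general curve of genus g carries a pencil (r = 1), namely the smallest d with $\rho(g, 1, d) \geq 0$. The function names are assumptions made for this example.

```python
def rho(g: int, r: int, d: int) -> int:
    """Brill-Noether number rho(g, r, d) = g - (r + 1)(g - d + r)."""
    return g - (r + 1) * (g - d + r)


def smallest_pencil_degree(g: int) -> int:
    """Smallest d >= 1 with rho(g, 1, d) >= 0.

    By the Existence and Dimension Theorems this is the smallest degree
    of a g^1_d on a general curve of genus g.
    """
    d = 1
    while rho(g, 1, d) < 0:
        d += 1
    return d


for g in range(0, 8):
    print(g, smallest_pencil_degree(g))

# rho(4, 1, 3) = 0: on a general curve of genus 4 the variety of g^1_3's
# has dimension 0, i.e. there are only finitely many of them.
print(rho(4, 1, 3))
```

Since $\rho(g, 1, d) = 2d - g - 2$, the loop returns the integer part of (g + 3)/2.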


Green’s Conjecture 


In recent years, much progress was achieved in
understanding the equations of canonical curves. If
the curve C is not hyperelliptic, then the canonical
map $\varphi_{|K_C|} : C \to \mathbb{P}^{g-1}$ defines an embedding. We
shall, in this case, identify C with its image in $\mathbb{P}^{g-1}$
and call this a canonical curve. The Clifford index
(for a precise definition see Lazarsfeld (1989)) is a
first measure of how special a curve C is with
respect to the canonical map. Hyperelliptic curves,
where the canonical map fails to be an embedding,
have, by definition, Clifford index 0. The two next
special cases are plane quintic curves (they have
a $g^2_5$) and trigonal curves. A curve C is called
trigonal, if there is a 3:1 map $C \to \mathbb{P}^1$, in which
case C has a $g^1_3$. More generally, the gonality of a
curve C is the minimal degree of a surjective map
$C \to \mathbb{P}^1$. Plane quintics and trigonal curves are
precisely the curves which have Clifford index 1.


Theorem 11 (Enriques-Babbage). If $C \subset \mathbb{P}^{g-1}$ is a
canonical curve, then C is either defined by quadratic
equations, or it is trigonal or isomorphic to a
plane quintic curve (i.e., it has Clifford index 1).


One can now ask more refined questions about
the equations defining canonical curves and the
relations (syzygies) among these equations. This
leads to looking at the minimal free resolution of a
canonical curve C, which is of the form


$$0 \leftarrow \mathcal{I}_C \leftarrow \bigoplus_j \mathcal{O}_{\mathbb{P}^{g-1}}(-j)^{\beta_{0j}} \leftarrow \cdots \leftarrow \bigoplus_j \mathcal{O}_{\mathbb{P}^{g-1}}(-j)^{\beta_{kj}} \leftarrow 0$$

Here, $\mathcal{I}_C$ is the ideal sheaf of C and $\mathcal{O}_{\mathbb{P}^{g-1}}(n)$ is the
nth power of the dual of the Hopf bundle (or
tautological sub-bundle) on $\mathbb{P}^{g-1}$ if $n \geq 0$,
resp. the |n|th power of the Hopf bundle if $n < 0$.
The $\beta_{ij}(C)$ are called the Betti numbers of C.
The Green conjecture predicts a link between the
nonvanishing of certain Betti numbers and geometric
properties of the canonical curve, such as
the existence of multisecants. Recently, C Voisin
and M Teixidor have proved the Green conjecture
for general curves of given gonality (see Beauville
(2003)).


Moduli of Compact Riemann Surfaces 


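As a set, the moduli space of compact Riemann
surfaces of genus g is defined as

$$M_g = \{C;\ C \text{ is a compact Riemann surface of genus } g\} / \cong$$

For genus g = 0, the only Riemann surface is the
Riemann sphere $\hat{\mathbb{C}} = \mathbb{P}^1$ and hence $M_0$ consists of
one point only. Every Riemann surface of genus 1 is
a torus


$$E = \mathbb{C}/L$$

for some lattice L, which can be written in the form

$$L_\tau = \mathbb{Z}\tau + \mathbb{Z}, \qquad \operatorname{Im} \tau > 0$$


Two elliptic curves $E_\tau = \mathbb{C}/L_\tau$ and $E_{\tau'} = \mathbb{C}/L_{\tau'}$ are
isomorphic if and only if a matrix


$$M = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}(2, \mathbb{Z})$$


exists with


$$\tau' = \frac{a\tau + b}{c\tau + d}$$


This proves that


$$M_1 = \mathbb{H}_1 / \mathrm{SL}(2, \mathbb{Z})$$


and this construction also shows that $M_1$ can itself
be given the structure of a Riemann surface. Using
the j-function, one obtains that


$$M_1 \cong \mathbb{C}$$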
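
The quotient $\mathbb{H}_1/\mathrm{SL}(2, \mathbb{Z})$ can be made concrete by moving a given $\tau$ into the standard fundamental domain $|\operatorname{Re} \tau| \leq 1/2$, $|\tau| \geq 1$ using the translations $\tau \mapsto \tau + n$ and the inversion $\tau \mapsto -1/\tau$. The following is a minimal floating-point sketch of this classical reduction; the function names are illustrative assumptions and no care is taken about points on the boundary of the domain.

```python
def mobius(m, tau: complex) -> complex:
    """Action of m = (a, b, c, d) with ad - bc = 1 on the upper half-plane."""
    a, b, c, d = m
    if a * d - b * c != 1:
        raise ValueError("matrix is not in SL(2, Z)")
    return (a * tau + b) / (c * tau + d)


def reduce_to_fundamental_domain(tau: complex, max_steps: int = 10_000) -> complex:
    """Return a point equivalent to tau with |Re| <= 1/2 and |tau| >= 1."""
    if tau.imag <= 0:
        raise ValueError("tau must lie in the upper half-plane")
    for _ in range(max_steps):
        # Translation tau -> tau - n brings the real part into [-1/2, 1/2].
        tau = complex(tau.real - round(tau.real), tau.imag)
        if abs(tau) >= 1:
            return tau
        # Inversion tau -> -1/tau increases the imaginary part when |tau| < 1.
        tau = -1 / tau
    raise RuntimeError("reduction did not terminate")


tau = complex(0.1, 2.0)
tau_equiv = mobius((0, -1, 1, 0), tau)   # the same elliptic curve, written differently
print(reduce_to_fundamental_domain(tau_equiv))
```

Two values of $\tau$ that reduce to the same interior point of the fundamental domain define isomorphic elliptic curves $E_\tau$.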


The situation is considerably more complicated for
genus $g \geq 2$. The space of infinitesimal deformations
of a curve C is given by $H^1(C, T_C)$ where $T_C$ is the
tangent bundle. By Serre duality


$$H^1(C, T_C) \cong H^0(C, \omega_C^{\otimes 2})^*$$

and by Riemann’s theorem it then follows that

$$\dim H^1(C, T_C) = \dim H^0(C, \omega_C^{\otimes 2}) = 3g - 3$$


This shows that a curve of genus g depends on
3g − 3 parameters or moduli, a dimension count
which was first performed by Riemann.

In genus 2 every curve has the hyperelliptic
involution, and for a general curve of genus 2 this
is the only automorphism. In genus $g \geq 3$ the
general curve has no automorphisms, but some
curves do. The order of the automorphism group is
bounded by 84(g − 1). The existence of automorphisms
for some curves means that $M_g$ is not a
manifold, but has singularities. The singularities
are, however, fairly mild. Locally, $M_g$ always
looks like $\mathbb{C}^{3g-3}/G$ near the origin, where G is a
finite group acting linearly on $\mathbb{C}^{3g-3}$. One expresses
this by saying that $M_g$ has only finite quotient
singularities. A space with this property is also
sometimes referred to as a V-manifold or an
orbifold. Moreover, $M_g$ is a quasiprojective variety,
that is, a Zariski-open subset of a projective variety.
As the above parameter count implies, the dimension
of $M_g$ is 3g − 3. At this point it can also be
clarified what is meant by a general curve in the
context of Brill-Noether theory: a property is said to
hold for the general curve in Brill-Noether theory if
it holds outside a countable number of proper
subvarieties of $M_g$.

It is often useful to work with projective, rather
than quasiprojective, varieties. This means that one
wants to compactify $M_g$ to a projective variety $\overline{M}_g$,
preferably in such a way that the points one adds
still correspond to geometric objects. The crucial
concept in this context is that of a stable curve. A
stable curve of genus g is a one-dimensional
projective variety C with the following properties:


1. C is connected (but not necessarily irreducible),

2. C has at most nodal singularities (i.e., two local
analytic branches meet transversally),

3. the arithmetic genus $p_a(C) = h^1(C, \mathcal{O}_C) = g$, and

4. the automorphism group Aut(C) of C is finite.


The last of these conditions is equivalent to the 
following: if a component of C is an elliptic curve, 
then this must either meet another component or 
have a node, and if a component is a rational curve, 
then this component must either have at least two 
nodes or one node and intersect another component, 
or it is smooth and has at least three points of 
intersection with other components. 

It should be noted that, in contrast to the previous 
illustrations, Figure 5 is drawn from the complex 
point of view, that is, the curves appear as one- 
dimensional objects. 

The concept of stable curves leads to what is
generally known as the Deligne-Mumford compactification
of $M_g$:


$$\overline{M}_g = \{C;\ C \text{ is a stable curve of genus } g\} / \cong$$


Figure 5 An example of a stable curve of genus 3.

Theorem 12 (Deligne-Mumford, Knudsen). $\overline{M}_g$ is
an irreducible, projective variety of dimension 3g − 3
with only finite quotient singularities.


The spaces $\overline{M}_g$ have been studied intensively over
the last 30 years. From the point of view of
classification, an important question is to determine
the Kodaira dimension of these spaces.


Theorem 13 (Harris-Mumford, Eisenbud-Harris).
The moduli spaces $\overline{M}_g$ are of general type for
$g > 23$.


On the other hand, it is known that $M_g$ is
rational for $g \leq 6$, unirational for $g \leq 14$, and has
negative Kodaira dimension for $g \leq 16$.

A further topic is to understand the cohomology
of $\overline{M}_g$, resp. the Chow ring, and to compute the
intersection theory on $\overline{M}_g$. For these topics we refer
the reader to Vakil (2003).

Closely related is the moduli problem of stable
n-pointed curves. A stable n-pointed curve (Figure 6)
is an (n + 1)-tuple $(C, x_1, \ldots, x_n)$, where C is a
connected nodal curve and $x_1, \ldots, x_n$ are smooth
points of C with the stability condition that the
automorphism group of $(C, x_1, \ldots, x_n)$ is finite.
These curves can be parametrized by a coarse
moduli space $\overline{M}_{g,n}$. These spaces share many
properties of the spaces $\overline{M}_g$: they are irreducible,
projective varieties with finite quotient singularities
and of dimension 3g − 3 + n.

A further development, which has become very
important in recent years, is that of moduli spaces of
stable maps. These were introduced by Kontsevich in
the context of quantum cohomology. To define stable
maps, one first fixes a projective variety X and then
considers (n+2)-tuples $(C, x_1, \ldots, x_n, f)$ where
$(C, x_1, \ldots, x_n)$ is an n-pointed curve of genus g and
$f: C \to X$ a map. The stability condition is that this
object allows only finitely many automorphisms
$\varphi: C \to C$, fixing the marked points $x_1, \ldots, x_n$, such
that $f \circ \varphi = f$. In order to obtain meaningful moduli
spaces, one also fixes a class $\beta \in H_2(X, \mathbb{Z})$. One then
asks for a space parametrizing all stable (n+2)-tuples
$(C, x_1, \ldots, x_n, f)$ with the additional property that
$f_*[C] = \beta$. This construction is best treated in the
language of stacks, and one can show that this moduli





Figure 6 An example of marked stable curve. 


problem gives rise to a proper Deligne-Mumford stack
$\overline{M}_{g,n}(X, \beta)$. In general, this stack is very complicated:
it need not be connected, can be very singular, and may
have several components of different dimensions. Its
expected dimension is

\[ \operatorname{exp.dim} \overline{M}_{g,n}(X, \beta) = (\dim X - 3)(1 - g) + n + \int_\beta c_1(T_X) \]

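
As a small arithmetic illustration of the expected-dimension count, the sketch below evaluates $(\dim X - 3)(1 - g) + n + \int_\beta c_1(T_X)$. The function name and argument names are hypothetical, and the value of $\int_\beta c_1(T_X)$ has to be supplied by the user. The examples assume the standard fact that for $X = \mathbb{P}^2$ and $\beta$ equal to $d$ times the class of a line this integral equals $3d$.

```python
def expected_dimension(dim_x: int, genus: int, n_marked: int, c1_on_beta: int) -> int:
    """Expected dimension (dim X - 3)(1 - g) + n + c1(T_X).beta of the stable-map stack.

    c1_on_beta is the integral of c_1(T_X) over the curve class beta,
    supplied by the caller (an illustrative convention, not from the text).
    """
    return (dim_x - 3) * (1 - genus) + n_marked + c1_on_beta


# X a point: the formula reduces to 3g - 3 + n, the dimension of the
# moduli space of stable n-pointed curves of genus g.
print(expected_dimension(0, 2, 0, 0))

# Lines in the projective plane (g = 0, n = 0, d = 1, so c1.beta = 3).
print(expected_dimension(2, 0, 0, 3))
```

For rational plane curves of degree $d$ with $n = 3d - 1$ marked points the count gives $6d - 2 = 2n$, which is why requiring each marked point to map to a prescribed point of the plane is expected to leave finitely many curves.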
Quantum cohomology can now be rephrased as
intersection theory on the stack $\overline{M}_{g,n}(X, \beta)$. In
general, these stacks do not have the expected
dimension. For this reason, Behrend and Fantechi 
(1997) have constructed a virtual fundamental class of 
the right dimension, which is the correct tool for the 
intersection theory which gives the algebro-geometric 
definition of quantum cohomology. In addition to this, 
there is also a symplectic formulation. It was shown by 
B Siebert that both approaches coincide. 


Verlinde Formula and Conformal Blocks 


The study of vector bundles (locally free sheaves) on
a compact Riemann surface is an area of research in
its own right. For a rank-r bundle $E$, the slope of $E$ is
defined by

\[ \mu(E) = \frac{\deg E}{r} \]

where the degree of $E$ is defined as the degree of the
line bundle $\Lambda^r E = \det E$. The bundle $E$ is called
stable, resp. semistable, if

\[ \mu(F) < \mu(E), \quad \text{resp.} \quad \mu(F) \le \mu(E) \]

for every proper sub-bundle $\{0\} \subsetneq F \subsetneq E$. Let C be a
compact Riemann surface of genus $g \ge 2$ and let
$SU_C(r)$ be the moduli space of semistable rank-r vector
bundles with trivial determinant $\det E = \mathcal{O}_C$. This is
a projective variety of dimension $(r^2 - 1)(g - 1)$.
It contains a smooth open set, whose points correspond
to the isomorphism classes of stable vector
bundles. The complement of this set is in general the
singular locus of $SU_C(r)$ and its points correspond to
direct sums of line bundles of degree 0. These are the
so-called graded objects of the semistable, but not
stable, bundles. By a theorem of Narasimhan and
Seshadri, the points of $SU_C(r)$ are also in one-to-one
correspondence with the isomorphism classes of
representations $\pi_1(C) \to SU(r)$.

Let $L \in \operatorname{Pic}^{g-1}(C)$ be any line bundle of degree
$g - 1$ on C. Then, the set

\[ \Theta_L = \{E \in SU_C(r);\ \dim H^0(C, E \otimes L) > 0\} \]

is a Cartier divisor on $SU_C(r)$ and thus defines a line
bundle $\mathcal{L}$ on $SU_C(r)$. This is a natural generalization
of the construction of the classical theta divisor. The
line bundle $\mathcal{L}$ generates the Picard group of the
moduli space $SU_C(r)$.


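Theorem 14 (Verlinde Formula). If C has genus g
and k is a positive integer, then

\[ \dim H^0(SU_C(r), \mathcal{L}^k) = \left(\frac{r}{r+k}\right)^{g} \sum_{\substack{S \sqcup T = \{1, \ldots, r+k\} \\ |S| = r}} \ \prod_{\substack{s \in S \\ t \in T}} \left| 2 \sin \pi \frac{s - t}{r + k} \right|^{g-1} \]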





This formula was first found by Verlinde in the context
of conformal field theory. Due to this relationship, the
spaces $H^0(SU_C(r), \mathcal{L}^k)$ are also called conformal
blocks. These spaces can also be defined for principal
bundles. Rigorous proofs for the general case of the 
Verlinde formula are due to Beauville-Laszlo and 
Faltings. For a survey, see Beauville (1995). 
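
The Verlinde count can be evaluated directly for small values. The sketch below assumes the formula in the form $(r/(r+k))^g \sum_{S \sqcup T = \{1,\ldots,r+k\},\, |S| = r} \prod_{s \in S, t \in T} |2 \sin \pi (s-t)/(r+k)|^{g-1}$; the function name is hypothetical, and the result is rounded to the nearest integer because the sum is computed in floating point.

```python
from itertools import combinations
from math import pi, sin


def verlinde_dimension(r: int, k: int, g: int) -> int:
    """Evaluate the assumed Verlinde sum for rank r, level k and genus g."""
    n = r + k
    total = 0.0
    # Sum over all subsets S of {1, ..., r+k} with |S| = r; T is the complement.
    for subset in combinations(range(1, n + 1), r):
        chosen = set(subset)
        complement = [t for t in range(1, n + 1) if t not in chosen]
        product = 1.0
        for s in subset:
            for t in complement:
                product *= abs(2.0 * sin(pi * (s - t) / n))
        total += product ** (g - 1)
    return round((r / n) ** g * total)


print(verlinde_dimension(2, 1, 2))
print(verlinde_dimension(2, 2, 2))
```

For $r = 2$, $g = 2$ the two values agree with the numbers of linear and quadratic forms in four variables (4 and 10), consistent with the classical identification of this moduli space with projective 3-space, which is a fact assumed here rather than taken from the text.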


See also: Characteristic Classes; Cohomology Theories; 
Index Theorems; Mirror Symmetry: a Geometric Survey; 
Moduli Spaces: An Introduction; Polygonal Billiards; 
Several Complex Variables: Basic Geometric Theory; 
Several Complex Variables: Compact Manifolds; 
Topological Gravity, Two-Dimensional. 

Introduction


The Riemann-Hilbert (RH) method in mathematical
physics and analysis consists in reducing a particular
problem to the problem of reconstruction of an
analytic, scalar- or matrix-valued function in the
complex plane from a prescribed jump across a
given curve. More precisely, let an oriented contour
$\Sigma$ be given in the complex $\lambda$-plane. The contour $\Sigma$
may have points of self-intersection, and it may
consist of several connected components; typical
contours appearing in applications to integrable
systems are shown in Figure 1.

The orientation of an arc in $\Sigma$ defines the +
and the - side of $\Sigma$. Suppose in addition that we
are given a map $v: \Sigma \to GL(N, \mathbb{C})$ with $v, v^{-1} \in
L^\infty(\Sigma)$. The (normalized) RH problem determined
by the pair $(\Sigma, v)$ consists in finding an $N \times N$




Figure 1 Typical contours for RH problems. 
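matrix-valued function $m(\lambda)$ with the following
properties:

\[ m(\lambda) \text{ is analytic in } \mathbb{C} \setminus \Sigma \qquad [1a] \]

\[ m_+(\lambda) = m_-(\lambda) v(\lambda) \text{ for } \lambda \in \Sigma \qquad [1b] \]

where $m_+(\lambda)$ ($m_-(\lambda)$) is the limit
of $m$ from the + (-) side of $\Sigma$, and

\[ m(\lambda) \to I \text{ (identity matrix) as } \lambda \to \infty \qquad [1c] \]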


The precise sense in which the limit at $\infty$ and the
boundary values $m_\pm$ are attained is a technical
matter that should be specified for each given RH
problem $(\Sigma, v)$.

Concerning the name RH problem, we note that
in the literature (particularly, in the theory of boundary
values of analytic functions), the problem of
reconstructing a function from its jump across a
curve is often called the Hilbert boundary-value
problem. The closely related problem of analytic
matrix factorization (given $\Sigma$ and $v$, find $G(\lambda)$
analytic and nondegenerate in $\mathbb{C} \setminus \Sigma$ such that
$G_+ G_- = v$ on $\Sigma$) is sometimes called the Riemann
problem. The name "RH problem" is also
attributed to the reconstruction of a Fuchsian
system with given poles and a given monodromy
group.

In applications, the jump matrix $v$ also depends
on certain parameters, in which the original problem
at hand is naturally formulated (e.g., $v = v(\lambda; x, t)$ in
applications to the integrable nonlinear differential
equations in dimension 1 + 1, with x being the space
variable and t the time variable), and the main
concern is the behavior of the solution of the RH
problem, $m(\lambda; x, t)$, as a function of x and t.
Of particular interest is the behavior of $m(\lambda; x, t)$ as
x and t become large.

In the scalar case, $N = 1$, rewriting the original
multiplicative jump condition in the additive form

\[ \log m_+(\lambda) = \log m_-(\lambda) + \log v(\lambda) \]

and using the Cauchy-Plemelj-Sokhotskii formula
gives an explicit integral representation for the
solution:

\[ m(\lambda) = \exp\left\{ \frac{1}{2\pi i} \int_\Sigma \frac{\log v(\mu)}{\mu - \lambda}\, d\mu \right\} \qquad [2] \]

(in the case of nonzero index, $\Delta \log v|_\Sigma \neq 0$, formula
[2] admits a suitable modification).
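
The scalar representation can be checked numerically on a simple example. The sketch below takes the contour to be the real line (oriented left to right, so the + side is the upper half-plane) and the jump $v(\mu) = (\mu^2 + 4)/(\mu^2 + 1)$, both chosen here purely for illustration. For this jump the solution is known in closed form: $(\lambda + 2i)/(\lambda + i)$ above the line and $(\lambda - i)/(\lambda - 2i)$ below it. The Cauchy integral $\exp\{(2\pi i)^{-1} \int \log v(\mu)\,(\mu - \lambda)^{-1} d\mu\}$ is approximated by the trapezoidal rule on a truncated interval; the function names, truncation width and step size are assumptions of the sketch.

```python
import cmath
import math


def jump(mu: float) -> float:
    """Illustrative scalar jump on the real line: positive, tends to 1 at infinity."""
    return (mu * mu + 4.0) / (mu * mu + 1.0)


def scalar_rh_solution(lam: complex, half_width: float = 500.0, step: float = 0.02) -> complex:
    """Trapezoidal approximation of exp{(1/(2 pi i)) * integral of log v(mu)/(mu - lam)}."""
    n = int(round(2.0 * half_width / step))
    total = 0j
    for k in range(n + 1):
        mu = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.log(jump(mu)) / (mu - lam)
    return cmath.exp(total * step / (2j * math.pi))


def exact_solution(lam: complex) -> complex:
    """Closed-form solution for this particular jump, off the real line."""
    if lam.imag > 0:
        return (lam + 2j) / (lam + 1j)
    return (lam - 1j) / (lam - 2j)


for point in (1 + 1j, -0.5 - 2j):
    print(point, abs(scalar_rh_solution(point) - exact_solution(point)))
```

On the real line the two closed-form branches satisfy $m_+ = m_- v$, which is the multiplicative jump condition. The numerical values are taken off the contour, where the integrand is smooth.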

A generic (nonabelian) matrix RH problem 
cannot be solved explicitly in terms of contour 
integrals; however, it can always be reduced to a 
system of linear singular-integral equations, thus 
linearizing an originally nonlinear system. 


The main benefit of reducing an originally nonlinear
problem to the analytic factorization of a
given matrix function arises in asymptotic analysis.
Typically, the dependence of the jump matrix on the
external parameters (say, x and t) is oscillatory. In
analogy with the asymptotic evaluation of oscillatory
contour integrals via the classical method of steepest
descent, in the asymptotic evaluation of the solution
$m(\lambda; x, t)$ of the matrix RH problem as $x, t \to \infty$, the
nonlinear steepest-descent method examines the
analytic structure of the jump matrix $v(\lambda; x, t)$ in
order to deform the contour $\Sigma$ to contours where
the oscillatory factors become exponentially small as
$x, t \to \infty$, and hence the original RH problem
reduces to a collection of local RH problems
associated with the relevant points of stationary
phase. Although the method has (in the matrix case)
noncommutative and nonlinear elements, the final
result of the analysis is as efficient as the asymptotic
evaluation of the oscillatory integrals.


Dressing Method 


The RH method allows describing the solution of a 
differential system independently of the theory of 
differential equations. The solution might be expli- 
cit, that is, given in terms of elementary or elliptic or 
abelian functions and contour integrals of such 
functions. In the general (transcendental) case, the
solution can be represented in terms of the solution 
of certain linear singular integral equations. 

In the modern theory of integrable systems, a 
system of nonlinear differential equations is often 
called integrable if it can be represented as a 
compatibility condition of an auxiliary overdeter- 
mined linear system of differential equations called a 
Lax pair of the given nonlinear system (actually it 
might involve more than two linear equations). In 
order that the compatibility condition represents a 
nontrivial nonlinear system of equations, the Lax 
pair is required to depend rationally on an auxiliary 
parameter (called a spectral parameter). The RH
problem formulated in the complex plane of the
spectral parameter makes it possible, given a particular
solution of the compatibility equations, to construct
new solutions of the compatibility system directly by
"dressing" the initial one.

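For example, let $D(x, \lambda)$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{C}$, be an $N \times N$
diagonal function, polynomial in $\lambda$ with smooth coefficients,
such that $a_j := \partial D/\partial x_j$ are polynomials in
$\lambda$ of degree $d_j$. Then $\Psi_0 := \exp D(x, \lambda)$ solves the
system of linear equations $\partial \Psi_0/\partial x_j = a_j \Psi_0$, whose
compatibility conditions $\partial^2 \Psi_0/\partial x_j \partial x_k = \partial^2 \Psi_0/\partial x_k \partial x_j$
are trivially satisfied. Given a contour $\Sigma$ and a smooth
function $v$, consider the matrix RH problem [1]
with the jump matrix $\tilde v(\lambda; x) := \exp D(x, \lambda)\, v(\lambda) \exp(-D(x, \lambda))$.
Let $m(\lambda; x)$ be the solution of this RH
problem. Then $(D_j m)_+ = (D_j m)_- \tilde v$, where $D_j f :=
\partial f/\partial x_j - [a_j, f]$ with $[a, b] := ab - ba$. The Liouville
theorem implies that $(D_j m) m^{-1}$ is an entire function
which is $O(\lambda^{d_j - 1})$ as $\lambda \to \infty$, hence a polynomial in $\lambda$.
Setting $\Psi(x, \lambda) := m(\lambda; x) \exp D(x, \lambda)$ gives the system of
linear equations

\[ \frac{\partial \Psi}{\partial x_j} = \Big( a_j + \sum_{k=0}^{d_j - 1} \lambda^k q_{jk}(x) \Big) \Psi =: R_j(x, \lambda) \Psi \qquad [3] \]

the compatibility conditions for which are

\[ \frac{\partial R_j}{\partial x_k} - \frac{\partial R_k}{\partial x_j} + [R_j, R_k] = 0 \qquad [4] \]

Equating coefficients of various powers of $\lambda$ in [4]
gives a (generally) nonlinear system of partial
differential equations for the coefficient matrices
$q_{jk}$. Thus, given $D(x, \lambda)$, the RH problem, if it is
solvable, maps the pair $(\Sigma, v)$ to solutions of [4].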

Specializing to $n = 2$ with variables $(x, t) \in \mathbb{R}^2$, the
overdetermined system of linear equations and the
corresponding compatibility conditions are

\[ \Psi_x = U \Psi, \qquad \Psi_t = V \Psi \qquad [5] \]

and

\[ U_t - V_x + [U, V] = 0 \qquad [6] \]

respectively. Conditions [6] are sometimes called the
zero-curvature conditions.

Equations [5] and [6] with U and V depending
rationally on the spectral parameter $\lambda$ represent the
integrable nonlinear systems in 1 + 1 dimensions. A
typical example of such a system is the (defocusing)
nonlinear Schrödinger (NLS) equation

\[ i q_t + q_{xx} - 2|q|^2 q = 0 \qquad [7] \]


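Starting from the RH problem with the 2 x 2 jump
matrix

\[ v(\lambda; x, t) = e^{i\theta\sigma_3/2}\, v(\lambda)\, e^{-i\theta\sigma_3/2} \qquad [8] \]

where $\theta(\lambda; x, t) = -t\lambda^2 + x\lambda$, $\sigma_3 = \mathrm{diag}\{1, -1\}$, and
$v(\lambda)$ satisfies the involution $\sigma_3 v^*(\lambda) \sigma_3 = v(\lambda)$,
expanding out the limit of the solution of the RH
problem as $\lambda \to \infty$,

\[ m(\lambda; x, t) = I + \frac{m_1(x, t)}{\lambda} + O\!\left(\frac{1}{\lambda^2}\right) \qquad [9] \]

and arguing as above gives [5], with

\[ U = \frac{i\lambda}{2}\sigma_3 + \begin{pmatrix} 0 & q \\ \bar q & 0 \end{pmatrix} \qquad [10] \]

and $q = -i(m_1)_{12}$, whereas the compatibility condition
[6] reduces to [7].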
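
The reduction of the zero-curvature condition to the NLS equation can be checked on an explicit solution. The sketch below assumes $U = (i\lambda/2)\sigma_3 + Q$ with $Q$ the off-diagonal matrix with entries $q$ and $\bar q$, and uses a matrix $V = -(i\lambda^2/2)\sigma_3 - \lambda Q + V_0$, where $V_0$ has rows $(-i|q|^2,\ i q_x)$ and $(-i\bar q_x,\ i|q|^2)$. This $V$ is not given in the text; it is obtained here by matching powers of $\lambda$ in $U_t - V_x + [U, V] = 0$ and should be read as an assumption of the example. The test function is the plane wave $q = A e^{i(kx - \omega t)}$, which satisfies $i q_t + q_{xx} - 2|q|^2 q = 0$ exactly when $\omega = k^2 + 2A^2$.

```python
import cmath


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def zero_curvature_residual(x, t, lam, amp=0.7, k=0.3, omega=None):
    """Largest entry (in modulus) of U_t - V_x + [U, V] for q = amp*exp(i(k x - omega t))."""
    if omega is None:
        omega = k * k + 2.0 * amp * amp  # dispersion relation of the NLS plane wave
    q = amp * cmath.exp(1j * (k * x - omega * t))
    q_x, q_xx, q_t = 1j * k * q, -k * k * q, -1j * omega * q
    qb, qb_x, qb_xx, qb_t = (z.conjugate() for z in (q, q_x, q_xx, q_t))
    mod2 = amp * amp  # |q|^2 is constant for a plane wave, so its x-derivative vanishes
    U = [[0.5j * lam, q], [qb, -0.5j * lam]]
    V = [[-0.5j * lam ** 2 - 1j * mod2, -lam * q + 1j * q_x],
         [-lam * qb - 1j * qb_x, 0.5j * lam ** 2 + 1j * mod2]]
    U_t = [[0, q_t], [qb_t, 0]]
    V_x = [[0, -lam * q_x + 1j * q_xx], [-lam * qb_x - 1j * qb_xx, 0]]
    UV, VU = matmul(U, V), matmul(V, U)
    return max(abs(U_t[i][j] - V_x[i][j] + UV[i][j] - VU[i][j])
               for i in range(2) for j in range(2))


print(zero_curvature_residual(0.4, 1.3, 0.9 + 0.2j))             # plane wave solving NLS
print(zero_curvature_residual(0.4, 1.3, 0.9 + 0.2j, omega=1.0))  # wrong frequency
```

The residual vanishes (to rounding error) for every value of the spectral parameter only when the dispersion relation holds, which is the sense in which the compatibility condition encodes the nonlinear equation.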

The relation between the RH problem and the
differential equations [5] is local in x and t; it is based
only on the unique solvability of the RH problem,
the Liouville theorem, and the explicit dependence of
the jump matrix on x and t. The uniqueness of the
solution of an RH problem is basically provided by
the Liouville theorem: the ratio $m'(m'')^{-1}$ of any
two solutions $m'$, $m''$ is analytic in $\mathbb{C} \setminus \Sigma$ and continuous
across $\Sigma$, hence entire, and is therefore identically equal to I by the
normalization condition [1c].

On the other hand, there are no completely
general effective criteria for the solvability. Nevertheless,
many RH problems seen in applications to
integrable systems satisfy the following sufficient
condition: if $\Sigma$ is symmetric with respect to $\mathbb{R}$ and
contains $\mathbb{R}$, and if, in addition, $v^*(\lambda) = v(\lambda)$ for $\lambda \in
\Sigma \setminus \mathbb{R}$ and $\mathrm{Re}\, v(\lambda) > 0$ for $\lambda \in \mathbb{R}$, then the RH
problem is solvable.

For nonlinear equations supporting solitons, the
RH problem appears naturally in a more general
setting, as a meromorphic factorization problem,
where m in [1] is sought to be a (piecewise)
meromorphic function, with additionally prescribed
poles and respective residue conditions. Alternatively,
in the Riemann factorization problem
$G_+ G_- = v$, one assumes that G degenerates at
some given points $\lambda_1, \ldots, \lambda_n \in \Omega^+$ and $\mu_1, \ldots, \mu_n \in
\Omega^-$, where $\mathbb{C} = \Omega^+ \cup \Omega^- \cup \Sigma$, and prescribes two sets
of subspaces, $\mathrm{Im}\, G|_{\lambda = \lambda_j}$ and $\mathrm{Ker}\, G|_{\lambda = \mu_j}$. In the case
$v \equiv I$, the solution of the factorization problem with
zeros (meromorphic RH problem) is purely algebraic,
and gives formulas describing multisoliton
solutions. In the general case, $v \not\equiv I$, the meromorphic
RH problem can be algebraically converted
to a holomorphic RH problem, by successively
removing the poles with the help of the Blaschke-Potapov
factors.

Alternatively, a meromorphic RH problem can be
converted to a holomorphic one by adding to $\Sigma$ an
additional contour $\Sigma_{\mathrm{aux}}$ enclosing all the poles,
interpolating the constants involved in the residue
conditions inside the region surrounded by $\Sigma_{\mathrm{aux}}$, and
defining a new jump matrix on $\Sigma_{\mathrm{aux}}$ using the
interpolant and the Blaschke-Potapov factors.

RH problems formulated on the complex plane $\mathbb{C}$
correspond typically to solutions of relevant nonlinear
problems decaying at infinity. For other types
of boundary conditions (e.g., nonzero constants or
periodic or quasiperiodic boundary conditions), the
corresponding RH problem is naturally formulated
on a Riemann surface. For example, the RH
problem associated with finite density conditions
$q(x, t) \to \rho e^{i\varphi_\pm}$ as $x \to \pm\infty$ for the NLS equation [7]
is naturally formulated on the two-sheet Riemann
surface of the function $k(\lambda) = \sqrt{\lambda^2 - 4\rho^2}$ with
the contour $\Sigma$ consisting of the points $(\lambda, \varepsilon)$, where
$|\lambda| > 2\rho$ and $\varepsilon = \pm 1$ marks the surface sheet.


Inverse-Scattering Transform 


The inverse-scattering transform method for solving 
initial-value problems for integrable nonlinear equa- 
tions written as the compatibility conditions [6] for 
linear equations [5] consists in the following: starting 
from the given initial data, solve the direct problem, 
that is, determine appropriate eigenfunctions (solu- 
tions of the differential x-equation in the Lax pair [5]) 
having well-controlled analytic properties as functions 
of the auxiliary (spectral) parameter and the 
associated spectral functions of A; then, by virtue of 
the t-equations in the Lax pair [5], the associated 
functions evolve in a simple, explicit way. Finally, 
using the explicit evolution of the spectral functions, 
solve the inverse problem of finding the associated 
coefficients in the x-equation, which, by [5], evolve 
according to the given nonlinear equation and thus 
solve the Cauchy problem for this equation. The last 
step in this procedure, the inverse-scattering problem, 
can be effectively solved by reformulating it as an RH 
problem, which in turn can be related to a system of 
singular integral equations. The classical Gelfand-Levitan-Marchenko
integral equation of the inverse-scattering
problem is the Fourier transform of some
special cases of these singular integral equations.

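To fix ideas, consider the initial-value problem for
the NLS equation [7], where the data $q(x, t = 0) =
q_0(x)$ are sufficiently smooth and decay as $|x| \to \infty$.
For each $\lambda \in \mathbb{C} \setminus \mathbb{R}$, one constructs solutions $\Psi(x, \lambda)$
of $\Psi_x = U\Psi$ with U given by [10], having the
properties

\[ m(x, \lambda) := \Psi(x, \lambda) \exp\!\left( \frac{-ix\lambda\sigma_3}{2} \right) \to I \quad \text{as } x \to -\infty \]

and $m(x, \lambda)$ is bounded as $x \to \infty$. For each fixed x,
the 2 x 2 matrix function $m(x, \lambda)$ solves the RH
problem in $\lambda$, where $\Sigma = \mathbb{R}$ and the jump matrix is

\[ v(\lambda; x) = e^{ix\lambda\sigma_3/2}\, v(\lambda)\, e^{-ix\lambda\sigma_3/2}, \qquad v(\lambda) = \begin{pmatrix} 1 - |r(\lambda)|^2 & r(\lambda) \\ -\bar r(\lambda) & 1 \end{pmatrix} \qquad [11] \]

Here $r(\lambda)$ is the reflection coefficient of $q_0(x)$.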
The direct scattering map $\mathcal{R}$ is described by
the mapping

\[ q \mapsto m(x, \lambda) = m(x, \lambda; q) \mapsto v(\lambda; x) \mapsto r = \mathcal{R}(q) \]


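By virtue of the t-equations in [5], if $q(t) = q(x, t)$
solves the NLS equation, then $r(t) = \mathcal{R}(q(\cdot, t))$ evolves
as $r(t) = r(t, \lambda) = e^{-it\lambda^2} r_0(\lambda)$, where $r_0 = \mathcal{R}(q_0)$. Given
r, the inverse-scattering map $\mathcal{R}^{-1}$ is obtained by
solving the normalized RH problem (RHP) with the
jump matrix [11] and evaluating its solution $m(x, \lambda)$ as
$\lambda \to \infty$ [9]:

\[ r \mapsto v \overset{\mathrm{RHP}}{\longmapsto} m(x, \lambda) = m(x, \lambda; r) \mapsto m_1(x) \mapsto q(x) = -i(m_1(x))_{12} \]

and thus

\[ q(x, t) = \mathcal{R}^{-1}\big( e^{-it(\cdot)^2} r_0(\cdot) \big)(x) \qquad [12] \]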


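The mathematical rigor of this scheme is provided
by the general theory of analytic matrix factorization
making use of the relation between the
factorization problem and certain singular integral
equations; this relation can be established with the
help of the Cauchy operators

\[ (Ch)(\lambda') = \frac{1}{2\pi i} \int_\Sigma \frac{h(\mu)\, d\mu}{\mu - \lambda'}, \qquad \lambda' \in \mathbb{C} \setminus \Sigma \]

and

\[ (C^\pm h)(\lambda) = \lim_{\substack{\lambda' \to \lambda \\ \lambda' \in (\pm)\text{-side of } \Sigma}} (Ch)(\lambda') \]

For a very general class of contours, the Cauchy
operators $C^\pm: L^p \to L^p$, $1 < p < \infty$, are bounded,
$C^+ - C^- = I$, and $C^+ + C^- = -H$, where

\[ (Hh)(\lambda) = \frac{1}{i\pi}\, \mathrm{p.v.}\!\int_\Sigma \frac{h(\mu)\, d\mu}{\lambda - \mu} \]

is the Hilbert transform.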

The map $\mathcal{R}$ is often considered as a nonlinear
Fourier-type map; this point of view is supported by
the fact that $\mathcal{R}$ is a bijection between the corresponding
Schwartz spaces of functions. Making use
of the $L^p$ or Hölder theory of the Cauchy operators
and the related factorization problems, it is possible
to analyze the action of $\mathcal{R}$ and $\mathcal{R}^{-1}$ in various
functional spaces. This also requires making more
precise the definition of the RH problem: for fixed
$1 < p < \infty$, given $\Sigma$ and $v$ such that $v, v^{-1} \in
L^\infty(\Sigma \to GL(N, \mathbb{C}))$, we say that $m_\pm$ solves an RH
$L^p$-problem if $m_\pm \in I + \partial C(L^p)$ and $m_+(\lambda) =
m_-(\lambda) v(\lambda)$ for $\lambda \in \Sigma$. Here a pair of $L^p(\Sigma)$-functions
$f_\pm \in \partial C(L^p)$ if there exists a unique function
$h \in L^p(\Sigma)$ such that $f_\pm(\lambda) = (C^\pm h)(\lambda)$. Then $f(\lambda) =
Ch(\lambda)$, $\lambda \in \mathbb{C} \setminus \Sigma$, is called the extension of $f_\pm$ off $\Sigma$.

Given a factorization of $v = (v^-)^{-1} v^+ = (I - w^-)^{-1}
(I + w^+)$ on $\Sigma$ with $v^\pm, (v^\pm)^{-1} \in L^\infty$, the basic
associated singular integral operator is defined by

\[ C_w h := C^+(h w^-) + C^-(h w^+) \]

If the operator $I - C_w$ is invertible on $L^p(\Sigma)$, with
$\mu \in I + L^p(\Sigma)$ solving $(I - C_w)\mu = I$, then $m(\lambda) =
I + (C(\mu(w^+ + w^-)))(\lambda)$ is the unique solution of the
RH problem $(\Sigma, v)$. Although the operator $C_w$ need
not be compact, in many cases it is Fredholm with
zero index. Then the existence of $(I - C_w)^{-1}$ is
equivalent to the solvability of the RH problem
$(\Sigma, v)$, and the normalized RH problem ($m \to I$ as
$\lambda \to \infty$) has a unique solution if and only if the
corresponding homogeneous RH problem (with
$m \to 0$ as $\lambda \to \infty$) has only the trivial solution
(vanishing lemma).

The most complete theory for RH problems relative
to simple contours is the theory when v is in an
inverse-closed, decomposing Banach algebra $\mathcal{A}$, that
is, an algebra of continuous functions with the
Hilbert transform bounded in it such that if $f \in \mathcal{A}$,
then $f^{-1} \in \mathcal{A}$. For contours with self-intersections, the
RH factorization theory is formulated in terms of a
pair of decomposing algebras: choosing the orientation
of the contour in such a way that it divides the
$\lambda$-plane into two disjoint regions, $\Omega^+$ and $\Omega^-$, and
each arc of $\Sigma$ forms part of the positively oriented
boundary of $\Omega^+$, the functions in the +(-) algebra
are continuous up to the boundary in each connected
component of $\Omega^+$ ($\Omega^-$).

The choice of functional spaces in the RH problem
should be based on the integrable system at hand. For
example, an integrable flow connected to the scattering
problem for $\Psi_x = U\Psi$, with U defined by [10],
has in general the form $e^{-i\lambda^p t\sigma_3/2}\, v(\lambda)\, e^{i\lambda^p t\sigma_3/2}$
(Ablowitz-Kaup-Newell-Segur (AKNS) hierarchy) in the
scattering space (for the NLS equation, $p = 2$), so that
appropriate spaces are $L^2((1 + x^2)dx) \cap H^{p-1}$ for
$q(\cdot, t)$ and $L^2((1 + |\lambda|^{2p-2})|d\lambda|) \cap H^1$ as the scattering
space. Deift and Zhou showed that in this case
the scattering map $\mathcal{R}$ and the inverse-scattering map
$\mathcal{R}^{-1}$ indeed involve no "loss" of smoothness or decay.

A generalization of the inverse-scattering trans- 
form method to the initial boundary-value problems 
for integrable nonlinear equations (on the half-line 
or on a finite interval with respect to the space 
variable x) can also be developed on the basis of the
RH problem formalism. In this case, the construction
of the corresponding RH problem involves simultaneous
spectral analysis of both linear equations
in the Lax pair [5]. The boundary values generate an 
additional set of spectral functions, which generally 
makes the construction of the associated RH 
problem more complicated than in the case of the 
corresponding initial-value problem (particularly, 
the contour is to be enhanced by adding the part 
coming from the spectral analysis of the t-equation); 
however, this RH problem again depends explicitly 
on x and t, which makes it possible to develop 
relevant techniques (such as the nonlinear steepest- 
descent method for the asymptotic analysis) in the 
same spirit as in the case of initial-value problems. 

An RH problem may be viewed as a special case
in a more general setting of problems of reconstructing
an analytic function from the known
structure of its singularities. The departure from
analyticity of a function m of the complex variable
$\lambda$ can be described in terms of the "d-bar"
derivative, $\partial m/\partial\bar\lambda$. If $\partial m/\partial\bar\lambda$ can be linearly related
to m itself, then the use of the extension of
Cauchy's formula (for $\lambda$ in a domain $\Omega$)

\[ m(\lambda) = \frac{1}{2\pi i} \oint_{\partial\Omega} \frac{m(\mu)}{\mu - \lambda}\, d\mu + \frac{1}{2\pi i} \iint_{\Omega} \frac{\partial m}{\partial\bar\mu}\, \frac{d\mu \wedge d\bar\mu}{\mu - \lambda} \]

leads to a linear integral equation for m. This is the
case for some multidimensional (2 + 1) nonlinear
integrable equations. For example, for the Kadomtsev-Petviashvili-I
equation (the two-dimensional generalization
of the Korteweg-de Vries equation) $(q_t +
6qq_x + q_{xxx})_x = 3q_{yy}$, the appropriate eigenfunctions
are still sectionally meromorphic, but their jumps
across a contour are connected nonlocally to m on
the contour, which leads to a nonlocal RH problem of
the type

\[ m_+(\lambda) = m_-(\lambda) + \int_\Sigma d\mu\, m_-(\mu) f(\mu, \lambda), \qquad \lambda \in \Sigma \]

with given $f(\mu, \lambda)$ (analogue of scattering data).
By contrast, the eigenfunctions for the Kadomtsev-Petviashvili-II
equation $(q_t + 6qq_x + q_{xxx})_x = -3q_{yy}$
are nowhere analytic, with $\partial m/\partial\bar\lambda$ related to m by

\[ \frac{\partial m}{\partial\bar\lambda}(\lambda) = F(\mathrm{Re}\,\lambda, \mathrm{Im}\,\lambda)\, m(-\bar\lambda), \qquad \lambda \in \mathbb{C} \]


Nonlinear Steepest-Descent Method

Nonlinear Steepest-Descent Method 


The nonlinear steepest-descent method is based on a 
direct asymptotic analysis of the relevant RH 
problem; it is general and algorithmic in the sense 
that it does not require a priori information (ansatz)
about the form of the solution of the asymptotic 
problem. However, the noncommutativity of the 
matrix setting requires developing rather sophisti- 
cated technical ideas, which, in particular, enable an 
explicit solution of the associated local RH problems. 

To fix ideas, let us again consider the NLS
equation. The dependence of the jump matrix
$v(\lambda; x, t)$ on x and t is oscillatory; it is the same as
in the integral

\[ q(x, t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat q_0(\lambda)\, e^{i(x\lambda - t\lambda^2)}\, d\lambda \qquad [13] \]

which solves the initial-value problem for the
linearized version of [7]:

\[ i q_t + q_{xx} = 0, \qquad q(x, 0) = q_0(x) \qquad [14] \]

(here $\hat q_0(\lambda)$ is the Fourier transform of the initial data
$q_0$). The main contribution to [13] as |x| and t tend to
$\infty$ comes from the point of stationary phase of
$e^{i(x\lambda - t\lambda^2)}$, that is, the point $\lambda = \lambda_0 = x/2t$, for which

\[ \frac{d}{d\lambda}(x\lambda - t\lambda^2) = 0 \]

If $\hat q_0(\lambda)$ is analytic in a strip $|\mathrm{Im}\,\lambda| < \varepsilon$, then one can
use Cauchy's theorem to deform [13] to an integral
on a contour $\Sigma_t$ such that $|e^{i(x\lambda - t\lambda^2)}|$ decreases rapidly
on $\Sigma_t$ away from $\lambda = \lambda_0$. Hence, as $t \to \infty$, the
problem localizes to a neighborhood of $\lambda = \lambda_0$; this
constitutes the standard method of steepest descent.
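
The localization at the stationary point can be seen in a case where the linear integral is explicit. The sketch below assumes Gaussian data with Fourier transform $\hat q_0(\lambda) = e^{-\lambda^2/2}$ (an illustrative choice, not taken from the text), for which $(2\pi)^{-1} \int \hat q_0(\lambda) e^{i(x\lambda - t\lambda^2)} d\lambda$ equals $(2\pi)^{-1} \sqrt{\pi/a}\, e^{-x^2/(4a)}$ with $a = 1/2 + it$. It compares this exact value with the leading stationary-phase term $\hat q_0(\lambda_0)\, e^{ix^2/(4t)} / \sqrt{4\pi i t}$ at $\lambda_0 = x/2t$.

```python
import cmath
import math


def exact_linear_solution(x: float, t: float) -> complex:
    """Closed form of the linear integral for the Gaussian transform exp(-lam^2/2)."""
    a = 0.5 + 1j * t
    return cmath.sqrt(math.pi / a) * cmath.exp(-x * x / (4.0 * a)) / (2.0 * math.pi)


def stationary_phase_approximation(x: float, t: float) -> complex:
    """Leading-order contribution of the stationary point lam0 = x / (2 t)."""
    lam0 = x / (2.0 * t)
    q0_hat = math.exp(-0.5 * lam0 * lam0)
    return q0_hat * cmath.exp(1j * x * x / (4.0 * t)) / cmath.sqrt(4j * math.pi * t)


def relative_error(x: float, t: float) -> float:
    exact = exact_linear_solution(x, t)
    return abs(exact - stationary_phase_approximation(x, t)) / abs(exact)


# Same ray x / (2 t) = 0.2, increasing time: the relative error decays like 1/t.
for t in (50.0, 500.0):
    print(t, relative_error(0.4 * t, t))
```

The relative error shrinks by roughly a factor of ten when $t$ grows tenfold along a fixed ray, in line with the $t^{-1/2}$ leading term and $O(t^{-3/2})$ correction expected from the method of steepest descent.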

In the spirit of the oscillatory contour integral 
case, the nonlinear steepest-descent method for an 
oscillatory RH problem introduced by Deift and 
Zhou consists in the following: deform the contour 
and (rationally) approximate the jump matrix in 
order to obtain an RH problem with a jump matrix 
that decays to the identity away from stationary 
phase points; then, rescaling the problem near the 
stationary phase points, obtain a (local) RH problem 
with a piecewise constant jump matrix, which can 
be solved in closed form, usually in terms of certain 
special functions. 

The contour deformation means the following.
Suppose that the jump matrix of an RH problem
$(\Sigma, v)$ has a factorization $v = b_-^{-1} v_0 b_+$ between two
points on $\Sigma$, where $b_+$ ($b_-$) has a holomorphic and
nondegenerate continuation to the part $\Omega^+$ ($\Omega^-$) of a
disk $\Omega$ supported by these points, see Figure 2a. Then
the contour may be deformed to the contour
$\Sigma' = \Sigma \cup \partial\Omega$, and the jump matrices across $\Sigma'$ may be
defined as indicated in Figure 2b. If m solves the RH
problem $(\Sigma, v)$, then $m'$ defined by $m' = m b_\pm^{-1}$ in $\Omega^\pm$
and $m' = m$ outside $\Omega$ solves the deformed RH
problem associated with $\Sigma'$.

The appropriate factorization of v given by [8]
and the contour deformation are to be chosen in
accordance with the signature table; for the NLS
equation, it is given in Figure 3. The key step is to
move algebraically the factors $e^{\pm i\theta}$ in $v(\lambda; x, t)$ into
regions of the complex plane, where they are
exponentially decreasing as $t \to \infty$. The jump matrix
admits two algebraic factorizations:





Figure 2 Deformation of an RH problem. 
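Figure 3 Signature table.

\[ v = \begin{pmatrix} 1 & r e^{i\theta} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -\bar r e^{-i\theta} & 1 \end{pmatrix} \qquad (\lambda > \lambda_0) \]

\[ v = \begin{pmatrix} 1 & 0 \\ -\dfrac{\bar r e^{-i\theta}}{1 - |r|^2} & 1 \end{pmatrix} (1 - |r|^2)^{\sigma_3} \begin{pmatrix} 1 & \dfrac{r e^{i\theta}}{1 - |r|^2} \\ 0 & 1 \end{pmatrix} \qquad (\lambda < \lambda_0) \]

The diagonal factors $(1 - |r|^2)^{\sigma_3}$ can be removed by
conjugating v by $\delta^{\sigma_3}$, where $\delta(\lambda)$ solves the scalar,
normalized RH problem on $\mathbb{R}$: $\delta_+ = \delta_-(1 - |r|^2)$ for
$\lambda < \lambda_0$ and $\delta_+ = \delta_-$ for $\lambda > \lambda_0$; the solution of the
latter can be written in a closed form:

\[ \delta(\lambda) = \exp\left\{ \frac{1}{2\pi i} \int_{-\infty}^{\lambda_0} \frac{\log(1 - |r(\mu)|^2)}{\mu - \lambda}\, d\mu \right\} \]

Then $\tilde m := m \delta^{-\sigma_3}$ solves the RH problem across
$\Sigma = \mathbb{R}$, with the jump matrix

\[ \tilde v = \begin{pmatrix} 1 & r \delta^2 e^{i\theta} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -\bar r \delta^{-2} e^{-i\theta} & 1 \end{pmatrix} \qquad (\lambda > \lambda_0) \]

\[ \tilde v = \begin{pmatrix} 1 & 0 \\ -\dfrac{\bar r \delta_-^{-2} e^{-i\theta}}{1 - |r|^2} & 1 \end{pmatrix} \begin{pmatrix} 1 & \dfrac{r \delta_+^{2} e^{i\theta}}{1 - |r|^2} \\ 0 & 1 \end{pmatrix} \qquad (\lambda < \lambda_0) \]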


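Replacing $r$, $\bar r$, etc., by appropriate rational approximations
$[r]$, $[\bar r]$, matching at $\lambda = \lambda_0$,

\[ \begin{pmatrix} 1 & 0 \\ -[\bar r] \delta^{-2} e^{-i\theta} & 1 \end{pmatrix} \]

can be continued to the sector above $\mathbb{R}_+ + \lambda_0$ and

\[ \begin{pmatrix} 1 & [r] \delta^{2} e^{i\theta} \\ 0 & 1 \end{pmatrix} \]

can be continued to the sector below $\mathbb{R}_+ + \lambda_0$, where
the factors $e^{\mp i\theta}$ are exponentially decreasing. Doing the
same for the appropriate factors on $\mathbb{R}_- + \lambda_0$, we
obtain an RH problem on a cross, say, $(\lambda_0 + e^{i\pi/4}\mathbb{R}) \cup
(\lambda_0 + e^{-i\pi/4}\mathbb{R})$. As $t \to \infty$, the RH problem then
localizes at $\lambda_0$.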

Performing an appropriate scaling, a straightforward
computation shows that, as $t \to \infty$, the
problem reduces to an RH problem with a jump
matrix that does not depend on $\lambda$ (it is determined
by $r(\lambda_0)$), which makes it possible to solve this
problem explicitly (in terms of the parabolic cylinder
functions, in the case of the NLS equation). Using
explicit asymptotics for these functions and controlling
the error terms, it is possible to obtain the
uniform (for all $x \in \mathbb{R}$) asymptotics for the solution
of the initial-value problem for the NLS equation
with $q_0 \in L^2((1 + x^2)dx) \cap H^1$ of the form

\[ q(x, t) = t^{-1/2} \alpha(\lambda_0) \exp\!\left( \frac{i x^2}{4t} - i\nu(\lambda_0) \log 2t \right) + O\!\left( t^{-(1/2 + \kappa)} \right) \]

for any fixed $0 < \kappa < 1/4$, where $\alpha$ and $\nu$ are given
in terms of $r = \mathcal{R}(q_0)$:

\[ \nu(\lambda) = -\frac{1}{2\pi} \log(1 - |r(\lambda)|^2), \qquad |\alpha(\lambda)|^2 = \frac{\nu(\lambda)}{2} \]

\[ \arg\alpha(\lambda) = \frac{1}{\pi} \int_{-\infty}^{\lambda} \log(\lambda - \mu)\, d\log(1 - |r(\mu)|^2) + \frac{\pi}{4} + \arg\Gamma(i\nu(\lambda)) + \arg r(\lambda) \]

The method can be used to obtain asymptotic 
expansions to all orders. Also, for nonlinear equa- 
tions supporting solitons, the soliton part of the 
asymptotics can be incorporated via the dressing 
method. 

Further applications include long-time asymptotics
for near-integrable systems, such as the perturbed
NLS equation $i q_t + q_{xx} - 2|q|^2 q - \epsilon |q|^l q = 0$
for $l > 2$ and $\epsilon > 0$, and the small-dispersion limits
of integrable equations (e.g., for the Korteweg-de Vries
equation $q_t - 6qq_x + \epsilon^2 q_{xxx} = 0$ with small
dispersion $\epsilon \searrow 0$).

The RH formalism makes possible a comprehensive
global asymptotic analysis of the Painlevé
transcendents (which, due to their increasing role
in modern mathematical physics, should be
considered as new nonlinear special functions),
including explicit connection formulas, as x
approaches relevant critical points along different
directions in the complex plane.

The development of the RH method in the theory
of integrable systems has given rise to new analytic
and algebraic ideas for other branches of mathematics
and theoretical physics. Recent examples
are the study of the asymptotics in the theory of
orthogonal polynomials and random matrices and in
combinatorics (random permutations).


See also: Boundary-Value Problems for Integrable
Equations; $\bar\partial$ Approach to Integrable Systems; Integrable
Systems and Algebraic Geometry; Integrable Systems
and the Inverse Scattering Method; Integrable Systems:
Overview; Nonlinear Schrödinger Equations; Painlevé
Equations; Twistor Theory: Some Applications [in
Integrable Systems, Complex Geometry and String
Theory]; Riemann-Hilbert Problem.

Regular and Fuchsian Linear Systems
on the Riemann Sphere


Consider a system of ordinary linear differential
equations with time belonging to the Riemann
sphere $\mathbb{CP}^1 = \mathbb{C} \cup \infty$:

\[ dX/dt = A(t)X \qquad [1] \]

The $n \times n$ matrix A is meromorphic on $\mathbb{CP}^1$, with
poles at $a_1, \ldots, a_{p+1}$; the dependent variables X form
an $n \times n$ matrix. One can assume that $\infty$ is not
among the poles $a_j$ and it is not a pole of the 1-form
$A(t)dt$ (this can be achieved by a fractionally-linear
transformation of t).

P Deligne has introduced a terminology of 
meromorphic connections and sections which is 
often preferred in modern literature to the one of 
meromorphic linear systems and their solutions, and 
there is a one-to-one correspondence between the 
two languages. 


Definition 1 System [1] is regular at the pole $a_j$ if
its solutions have a moderate (or polynomial)
growth rate there, that is, for every sector S centered
at $a_j$ and not containing other poles of the system
and for every solution X restricted to S there exists
$N_j \in \mathbb{R}$ such that $\|X(t - a_j)\| = O(|t - a_j|^{N_j})$ for all
$t \in S$. System [1] is regular if it is regular at all poles
$a_j$. System [1] is Fuchsian if its poles are logarithmic
(i.e., of first order). Every Fuchsian system is
regular.


Remark 2 The opening of the sector S might be
$> 2\pi$. Restricting to a sector is necessary because the
solutions are, in general, ramified at the poles $a_j$ and
by turning around the poles much faster than
approaching them one can obtain any growth rate.


A Fuchsian system can be presented in the form

\[ dX/dt = \Big( \sum_{j=1}^{p+1} \frac{A_j}{t - a_j} \Big) X, \qquad A_j \in gl(n, \mathbb{C}) \qquad [2] \]

The sum of its matrices-residua $A_j$ is 0, that is,

\[ A_1 + \cdots + A_{p+1} = 0 \qquad [3] \]

(recall that $\infty$ is not a pole of the system).
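
The role of the zero-sum condition on the residua can be seen in the scalar case $n = 1$. In the coordinate $s = 1/t$ the 1-form $A(t)dt$ becomes $-A(1/s)s^{-2}ds$, so it has no pole at $\infty$ exactly when $A(t) = O(t^{-2})$, that is, when the coefficient of $1/t$, which is the sum of the residua, vanishes. The sketch below checks this numerically; the function name, poles and residua are illustrative assumptions.

```python
def fuchsian_coefficient(t, poles, residues):
    """Scalar coefficient A(t) = sum of A_j / (t - a_j) of a Fuchsian system."""
    return sum(r / (t - a) for a, r in zip(poles, residues))


poles = [0.0, 1.0, -2.0]
zero_sum_residues = [1.5, -0.5, -1.0]      # residua add up to 0
nonzero_sum_residues = [1.5, -0.5, 0.25]   # residua add up to 1.25

t_large = 1.0e6
# t * A(t) tends to the sum of the residua as t grows.
print(t_large * fuchsian_coefficient(t_large, poles, zero_sum_residues))
print(t_large * fuchsian_coefficient(t_large, poles, nonzero_sum_residues))
```

In the first case $t A(t)$ is of size $1/t$, so $A(t)dt$ is regular at infinity; in the second case $t A(t)$ approaches 1.25 and infinity is an additional first-order pole.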


Remark 3 The linear equation (with meromorphic
coefficients) $\sum_{j=0}^{n} a_j(t) x^{(j)} = 0$ is Fuchsian if $a_j$ has
poles of order only $\le n - j$. A linear equation is
Fuchsian if and only if it is regular. The best-studied
Fuchsian equations are the hypergeometric one and
its generalizations and the Jordan-Pochhammer
equation.


The linear change of the dependent variables

\[ X \mapsto W(t)X \qquad [4] \]

(where W is meromorphic on $\mathbb{CP}^1$) makes system [2]
undergo the gauge transformation

\[ A \to -W^{-1}(dW/dt) + W^{-1}AW \qquad [5] \]

(Most often one requires W to be holomorphic and
holomorphically invertible for $t \neq a_j$, $j = 1, \ldots, p + 1$,
so that no new singular points appear in the system.)
This transformation preserves regularity but not
necessarily being Fuchsian. The only invariant under
the group of linear transformations [4] is the
monodromy group of the system.


Definition 4 Set SCP A esa: Fix a 
base point ap € and a matrix B € GL(n,C). 
Consider a closed contour y with base point do 
and bypassing the poles of the system. The mono- 
dromy operator of system [1] defined by this 
contour is the linear operator M acting on the 
solution space of the system which maps the 
solution X with X|,_,,=B into the value of its 
analytic continuation along y. Notation: X++ XM. 
The monodromy operator depends only on the class 
of homotopy equivalence of y. 

The monodromy group is the subgroup of 
GL(n, C) generated by all monodromy operators. It 
is defined only up to conjugacy due to the freedom 
to choose $a_0$ and $B$.


Definition 5 Define the product (concatenation)
$\gamma_1\gamma_2$ of two paths $\gamma_1, \gamma_2$ in $\Sigma$ (where the end of $\gamma_1$
coincides with the beginning of $\gamma_2$) as the path
obtained by running $\gamma_1$ first and $\gamma_2$ next.


Remark 6 The monodromy group is an antirepresentation
of the fundamental group $\pi_1(\Sigma)$ into
$GL(n, \mathbb{C})$ because one has

$$X \mapsto X M_1 \mapsto X M_2 M_1 \qquad [6]$$

that is, the concatenation $\gamma_1\gamma_2$ of the two contours
defines the monodromy operator $M_2 M_1$. In the text,
the monodromy group is referred to as a
representation, not an antirepresentation.


One usually chooses a standard set of generators
of $\pi_1(\Sigma)$ (see Figure 1) defined by contours
$\gamma_j$, $j = 1, \ldots, p+1$, where $\gamma_j$ consists of a segment
$[a_0, a_j']$ ($a_j'$ being a point close to $a_j$), of a small
circumference run counterclockwise (centered at $a_j$,
passing through $a_j'$ and containing inside no pole of
the system other than $a_j$), and of the segment $[a_j', a_0]$.
Thus, $\gamma_j$ is freely homotopic to a small loop
circumventing counterclockwise $a_j$ (and no other
pole $a_i$). The indices of the poles are chosen such
that the indices of the contours increase from 1 to
$p+1$ when one turns around $a_0$ clockwise.

Figure 1 The standard set of generators.

For the standard choice of the contours the
generators $M_j$ satisfy the relation

$$M_1 \cdots M_{p+1} = I \qquad [7]$$

Indeed, the concatenation of contours $\gamma_{p+1} \cdots \gamma_1$ is
homotopy equivalent to 0 and equality [7] results
from Remark 6.


Remarks 7 


(i) If the matrix-residuum $A_j$ of a Fuchsian system
has no eigenvalues differing by a nonzero
integer, then the monodromy operator $M_j$
defined as above is conjugate to $\exp(2\pi i A_j)$. It
is always true that the eigenvalues $\sigma_{k,j}$ of $M_j$
equal $\exp(2\pi i \lambda_{k,j})$, where $\lambda_{k,j}$ are the
eigenvalues of $A_j$.

(ii) If the generators $M_j$ of the monodromy group
are defined after a standard set of contours $\gamma_j$,
then they are conjugate to the corresponding
operators $L_j$ of local monodromy, that is, the
operators obtained when the poles $a_j$ are circumvented
counterclockwise along small loops. The operators $L_j$ of a regular
system can be computed (up to conjugacy)
algorithmically: one first makes the system
Fuchsian at $a_j$ by means of a change [4] and
then carries out the computation. Thus,
$M_j = Q_j^{-1} L_j Q_j$ for some $Q_j \in GL(n, \mathbb{C})$, and the
difficulty when computing the monodromy
group of system [1] consists in computing the
matrices $Q_j$, which is a transcendental problem.

(iii) As will be noted in Theorem 9, every component
of every solution to a regular linear system
is a function of the class of Nilsson, that is,
representable as a convergent (on sectors) series
$\sum_{k \in \mathbb{N},\; 1 \le i \le n,\; 0 \le \nu \le n-1} a_{k,i,\nu}\, t^{k + \beta_i} \ln^{\nu} t$, with $a_{k,i,\nu}, \beta_i \in \mathbb{C}$.


Example 8 The Fuchsian system $dX/dt = (A/t)X$,
$A \in gl(n, \mathbb{C})$, has two poles, at 0 and at $\infty$,
with matrices-residua $A$ and $-A$. Any solution
is of the form $X = \exp(A \ln t)\, G$, $G \in GL(n, \mathbb{C})$.
To compute the local monodromy around 0, increase
the argument of $t$ by $2\pi$. This results in $\ln t \mapsto
\ln t + 2\pi i$ and $X \mapsto X G^{-1} \exp(2\pi i A) G$, that is, the
monodromy operator at 0 equals $G^{-1} \exp(2\pi i A) G$
(and in the same way the one at $\infty$ equals
$G^{-1} \exp(-2\pi i A) G$).
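
Example 8 can be checked numerically. The following is a minimal sketch in plain Python, not part of the original text: the matrices `A` and `G`, the sample point `t`, and all helper names are arbitrary illustrative choices, and the matrix exponential is a truncated power series that is adequate only for small matrices of moderate norm.

```python
import cmath

def matmul(P, Q):
    """Product of two square matrices stored as nested lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(P, terms=80):
    """Matrix exponential via a truncated power series."""
    n = len(P)
    total = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, P)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def scaled(P, c):
    return [[c * x for x in row] for row in P]

def inverse_2x2(P):
    (a, b), (c, d) = P
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def max_abs_diff(P, Q):
    n = len(P)
    return max(abs(P[i][j] - Q[i][j]) for i in range(n) for j in range(n))

A = [[0.3, 1.0], [0.0, 0.3]]    # residue at 0 (a single Jordan block)
G = [[1.0, 2.0], [0.5, -1.0]]   # any invertible matrix
t = 0.7 + 0.2j                  # sample point away from the pole
TWO_PI_I = 2j * cmath.pi

# X = exp(A ln t) G before and after ln t -> ln t + 2*pi*i.
X_before = matmul(expm(scaled(A, cmath.log(t))), G)
X_after = matmul(expm(scaled(A, cmath.log(t) + TWO_PI_I)), G)

# Monodromy operators at 0 and at infinity, as stated in Example 8.
G_inv = inverse_2x2(G)
M_zero = matmul(matmul(G_inv, expm(scaled(A, TWO_PI_I))), G)
M_inf = matmul(matmul(G_inv, expm(scaled(A, -TWO_PI_I))), G)

continuation_error = max_abs_diff(X_after, matmul(X_before, M_zero))
product_error = max_abs_diff(matmul(M_zero, M_inf), [[1, 0], [0, 1]])
```

Both `continuation_error` and `product_error` should be at the level of floating-point rounding: the first confirms $X \mapsto XM$ with $M = G^{-1}\exp(2\pi i A)G$, the second that the two monodromy operators multiply to the identity, as relation [7] requires for two poles.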





Formulation and History of the Problem 


The Riemann-Hilbert problem (or Hilbert’s twenty- 
first problem) is formulated as follows: 


Prove that for any set of points $a_1, \ldots, a_{p+1} \in \mathbb{CP}^1$
and for any set of matrices $M_1, \ldots, M_p \in GL(n, \mathbb{C})$
there exists a Fuchsian linear system with poles
at and only at $a_1, \ldots, a_{p+1}$ for which the corresponding
monodromy operators are $M_1, \ldots, M_p$,
$M_{p+1} = (M_1 \cdots M_p)^{-1}$.


Historically, the Riemann-Hilbert problem was 
first stated for Fuchsian equations, not for systems — 
Riemann mentions in a note at the end of the 1850s
the problem of how to reconstruct a Fuchsian equation
from its monodromy representation and Hilbert 
includes it in 1900 as the twenty-first problem on 
his list in a formulation mentioning equations and 
not systems. However, the number of parameters 
necessary to parametrize a Fuchsian equation is, in 
general, smaller than the one necessary to parame- 
trize a monodromy group generated by p matrices. 
Therefore, one has to allow the presence of 
additional apparent singularities in the equation, 
that is, singularities the monodromy around which is 
trivial. 

It had been believed for a long time that the
Riemann-Hilbert problem has a positive solution
for any $n \in \mathbb{N}$, after J Plemelj in 1908 gave a proof
with a gap. In his proof, Plemelj tries to reduce the
Riemann-Hilbert problem to the so-called homogeneous
Hilbert boundary-value problem of the
theory of singular integral equations. It follows
from the correct part of the proof that if one of
the monodromy operators of system [1] is diagonalizable,
then system [1] is equivalent to a Fuchsian
one; this is due to Yu S Il'yashenko. (In particular, if
one allows just one additional apparent singularity,
then the Riemann-Hilbert problem is positively
solvable. The author has shown that the result still
holds if one of the monodromy operators has one
Jordan block of size 2 and $n - 2$ Jordan blocks of
size 1. The result is sharp: it would be false if one
allows one Jordan block of size $\ge 3$ or two blocks of
size 2.) It also follows that any finitely generated
subgroup of $GL(n, \mathbb{C})$ is the monodromy group of a
regular system with prescribed poles which is
Fuchsian at all the poles with the possible exception
of one (where the system is regular), which can be
chosen among them arbitrarily.

After the publication of Plemelj's result, the
interest shifted mainly towards the question of how
to construct a Fuchsian system given the monodromy
operators $M_j$. At the end of the 1920s
IA Lappo-Danilevskii expressed the solutions to a
Fuchsian system as series of the monodromy
operators. These series are convergent for monodromy
operators close to the identity matrix, and for
such operators one can express the residua $A_j$ of the
Fuchsian system as convergent series of the monodromy
operators.

In 1956 BL Krylov proved that the Riemann-Hilbert
problem is solvable for $n = p = 2$ by constructing
a Fuchsian system from its monodromy
group. In 1983 NP Erugin did the same in the case
n=2,p=3, and established a connection between 
the Riemann-Hilbert problem and Painlevé’s 
equations. 

In 1957 H Röhrl reformulated the problem in
terms of fibre bundles. His approach is more 
geometric; however, it does not require the system 
realizing a given monodromy group to be Fuchsian, 
but only regular. 

In 1978 W Dekkers considered the particular case 
n=2 of the Riemann-Hilbert problem, and gave a 
positive answer to it. The gap in Plemelj’s proof was 
detected in the 1980s by AT Kohn and Yu S
Il'yashenko.

It was proved by AA Bolibrukh in 1989 that, for
$n \ge 3$, the problem has a negative answer. For $n = 3$,
the answer is negative precisely for those couples
(monodromy group, set of poles) for which each
monodromy operator $M_1, \ldots, M_{p+1}$ is conjugate to
a Jordan block of size 3, the monodromy group is
reducible, with an invariant subspace or factor-space
of dimension 2, the monodromy sub- or factor-representation
corresponding to it is irreducible and
cannot be realized by a Fuchsian system having all
its matrices-residua conjugate to Jordan blocks of
size 2. In Bolibrukh's work, the last condition is
formulated in a different (but equivalent) way using
the notion of Fuchsian weight.


The New Setting of the Problem 


After the negative answer to the Riemann-Hilbert
problem for $n \ge 3$, it is reasonable to reformulate it
as follows:


Find necessary and/or sufficient conditions for the
choice of the monodromy operators $M_1, \ldots, M_p$ and
the points $a_1, \ldots, a_{p+1}$ so that there should exist a
Fuchsian system with poles at and only at the given
points and whose monodromy operators $M_j$ should
be the given ones.


In the new setting of the Riemann-Hilbert problem,
the answer is positive if the monodromy group
is irreducible (for any positions of the poles $a_j$). This
was first proved by Bolibrukh for $n = 3$ and then
independently by the author and by him for any $n$.

Bolibrukh found many examples of couples 
(reducible monodromy group, poles) for which the 
answer to the Riemann-—Hilbert problem is nega- 
tive. For n=3, the negative answer is due to 
possible “bad position” of the poles and a small 
shift from this position while keeping the same 
monodromy group leads to a couple for which the 
answer is positive. For $n \ge 4$, there are couples
where the negative answer is due to arithmetic 
properties of the eigenvalues of the matrices- 
residua and the corresponding monodromy groups 
are not realizable by Fuchsian systems for any 
position of the poles. During the last years of his 
life, Bolibrukh studied upper-triangular mono- 
dromy representations and found other examples 
with negative answer to the Riemann—Hilbert 
problem. 

Bolibrukh also found some sufficient conditions 
for the positive resolvability of the Riemann—Hilbert 
problem in the case of a reducible monodromy 
group. For example, suppose that the monodromy 
group is a semidirect sum: 


$$M_j = \begin{pmatrix} M_j^1 & * \\ 0 & M_j^2 \end{pmatrix}$$


where the matrices $M_j^i$ (of size $l_i \times l_i$, $i = 1, 2$) define
the representations $\chi_i$. Suppose that the representation
$\chi_2$ is realizable by a Fuchsian system, that the
representation $\chi_1$ is irreducible, and that one of the
matrices $M_j$ is block-diagonal, with left upper block
of size $s \times s$, where $s \le l_1$. Then for any choice of the
poles $a_j$ the monodromy group can be realized by
some Fuchsian system.


Bolibrukh also gave an estimate of the
number $m$ of additional apparent singularities in a
Fuchsian equation which are sufficient to realize a
given irreducible monodromy group. It follows from
his result that

$$m \le \frac{n(n-1)(p-1)}{2} + 1 - n$$


One can ask what the codimension is of
the subset in the space (monodromy group, poles)
which provides the negative answer to the Riemann-Hilbert
problem in its initial setting. The (author's)
answer for $p \ge 3$ is $2p(n-1)$, and for $n \ge 7$ this
codimension is attained only at couples (monodromy
group, poles) for which every monodromy
operator $M_j$ is conjugate to a Jordan block of size $n$,
the group has an invariant subspace or factor-space
of dimension $n - 1$, the corresponding sub- or
factor-representation is irreducible and cannot be
realized by a Fuchsian system in which all matrices-residua
are conjugate to Jordan blocks of size $n - 1$.
For $n \le 6$ there are examples where the same
codimension is attained (but cannot be decreased)
on other couples as well.


Levelt’s Result and Bolibrukh’s Method 


In 1961, AHM Levelt described the form of the 
solution to a regular system at its pole. His result is
at the core of Bolibrukh's method for solving the
Riemann-Hilbert problem. 


Theorem 9 In the neighborhood of a pole, the
solution to a regular linear system is representable in
the form

$$X = U_j(t - a_j)\,(t - a_j)^{D_j}\,(t - a_j)^{E_j}\, G_j \qquad [8]$$

where the matrix $U_j$ is holomorphic in a neighborhood
of 0, $D_j = \mathrm{diag}(\varphi_{1,j}, \ldots, \varphi_{n,j})$, $\varphi_{k,j} \in \mathbb{Z}$,
$\det G_j \neq 0$. The matrix $E_j$ is in upper-triangular
form and the real parts of its eigenvalues belong to
$[0, 1)$ (by definition, $(t - a_j)^{E_j} = e^{E_j \ln(t - a_j)}$). The numbers
$\varphi_{k,j}$ satisfy the condition [10] formulated
below. They are valuations in the eigenspaces of
the monodromy operator $M_j$ (i.e., in the maximal
subspaces invariant for $M_j$ on which it acts as an
operator with a single eigenvalue).
A regular system is Fuchsian at $a_j$ if and only if

$$\det U_j(0) \neq 0 \qquad [9]$$

The condition on $\varphi_{k,j}$ can be formulated as follows: let
$E_j$ have one and the same eigenvalue in the rows with
indices $s_1 < s_2 < \cdots < s_q$. Then one has

$$\varphi_{s_1,j} \ge \varphi_{s_2,j} \ge \cdots \ge \varphi_{s_q,j} \qquad [10]$$


Remark 10 Denote by $\beta_{k,j}$ the diagonal entries
(i.e., the eigenvalues) of the matrix $E_j$. Then the
sums $\beta_{k,j} + \varphi_{k,j}$ are the eigenvalues of the matrix-residuum
$A_j$ at $a_j$.


In proving that the Riemann-Hilbert problem is
positively solved in the case of an irreducible monodromy
group, Bolibrukh (or the author) uses the
correct part of Plemelj's proof, namely, that the given
monodromy group can be realized by a regular system
which is Fuchsian at all poles but one. After this, a
suitable change [4] is sought which makes the system
Fuchsian at the last pole. The criterion for being Fuchsian
is provided by the above theorem; one checks how the
matrices $D_j$, that is, the exponents $\varphi_{k,j}$, and the
matrices $U_j$ change as a result of the transformation
[4]. This is easier (one has only to multiply on the left
by $W(t)$) than seeing how the matrix $A(t)$ of system [1]
changes, because one has conjugation in rule [5]. This
idea is also due to Bolibrukh.

When Bolibrukh obtains the negative answer to 
the Riemann-Hilbert problem in some case of 
reducible monodromy group, he often uses the 
following two propositions: 


Proposition 11 The sum $\sum_{k,j} (\beta_{k,j} + \varphi_{k,j})$ relative to a
subspace of the solution space invariant for all
monodromy operators is a non-positive integer.


In particular, the sum of all exponents $\beta_{k,j} + \varphi_{k,j}$
is a non-positive integer which is 0 if and only if the
system is Fuchsian.


Proposition 12 If some component of some col- 
umn of some matrix solution to a regular system is 
identically equal to 0, then the monodromy group of 
the system is reducible. 


A reducible monodromy group can be conjugated
to a block upper-triangular form, with the diagonal
blocks defining irreducible representations. Thus, the
Riemann-Hilbert problem for reducible monodromy
groups makes it necessary to answer the question
"given the set of poles $a_j$, for which sets of exponents
$\varphi_{k,j}$ can a given irreducible monodromy group be
realized by such a Fuchsian system?" For $n \ge 2$, an
irreducible monodromy group can be a priori realized
by infinitely many Fuchsian systems, with different
sets of exponents $\varphi_{k,j}$. Consider the case when these
exponents are fixed for $j \neq 1$; suppose that $a_1 = 0$.
The author has shown that then infinitely many of
the a priori possible choices of the exponents $\varphi_{k,1}$
cannot be realized by Fuchsian systems if and only if
the given monodromy group is realized by a Fuchsian
system which is obtained from another one via the
change of time $t \mapsto t^k/(b_k t^k + b_{k-1} t^{k-1} + \cdots + b_0)$,
$b_i \in \mathbb{C}$, $b_0 \neq 0$, $k \in \mathbb{N}^*$, $k > 1$. This change increases
the number of poles.


Further Developments - The
Deligne-Simpson Problem


The Riemann-Hilbert problem can be generalized for 
irregular systems as follows. One asks whether for 
given poles a; there exists a linear system of ordinary 
differential equations on the Riemann sphere with 
these and only these poles which is Fuchsian at the 
regular singular points, which has prescribed formal 
normal forms, formal monodromies and Stokes 
multipliers at the irregular singular points, and 
which has a prescribed global monodromy. 

The Riemann-Hilbert problem has been considered
in some papers (of H Esnault, E Viehweg, and C
Hertling) in the context of algebraic curves of higher
genus instead of $\mathbb{CP}^1$.

The study of the so-called Riemann—Hilbert 
correspondence between the category of holonomic 
D-modules and the one of perverse sheaves with 
constructible cohomology has been initiated in the 
works of J Bernstein in the algebraic aspect and of 
M Sato, T Kawai, and M Kashiwara in the analytic 
one. This has been done in the case of a variety of 
arbitrary dimension (not necessarily $\mathbb{CP}^1$), with
codimension one pole divisor. Perversity has been 
defined by P Deligne, M Goresky, and R MacPher- 
son. Regularity has been defined by M Kashiwara in 
the analytic aspect and by Z Mebkhout in the 
geometric one. Important contributions in the 
domain are due to Ph Maisonobe, M Merle, N 
Nitsure, C Sabbah, and the list is far from being 
exhaustive. The Riemann—Hilbert correspondence 
plays an important role in other trends of mathe- 
matics as well. 

The Deligne-Simpson problem is formulated like
this: Give necessary and sufficient conditions upon
the choice of the conjugacy classes $c_j \subset gl(n, \mathbb{C})$ or
$C_j \subset GL(n, \mathbb{C})$ so that there should exist an irreducible
(i.e., without proper invariant subspace)
$(p+1)$-tuple of matrices $A_j \in c_j$ satisfying [3] or of
matrices $M_j \in C_j$ satisfying [7].

The problem was stated in the 1980s by P Deligne
for matrices $M_j$ and in the 1990s by the author for
matrices $A_j$. C Simpson was the first to obtain results
towards its resolution in the case of matrices $M_j$. The
problem admits the following geometric interpretation
in the case of matrices $M_j$: For which $(p+1)$-tuples of
local monodromies does there exist an irreducible
global monodromy with such local monodromies?

For generic eigenvalues the problem has found a
complete solution in the author's papers in the form of
a criterion upon the Jordan normal forms defined by
the conjugacy classes. The author has treated the case
of nilpotent matrices $A_j$ and the one of unipotent
matrices $M_j$ as well. For matrices $A_j$, the problem has
been completely solved (for any eigenvalues) by W
Crawley-Boevey. The case of matrices $A_j$ with $p = 2$
has been treated by O Gleizer using results of A
Klyachko. The case when the matrices $M_j$ are unitary
is considered in papers of S Agnihotri, P Belkale, I
Biswas, C Teleman, and C Woodward. Several cases of
finite groups have been considered by M Dettweiler, S
Reiter, K Strambach, J Thompson, and H Völklein.
The important rigid case has been studied by NM
Katz. Y Haraoka has considered the problem in the
context of linear systems in Okubo's normal form.
One can find details in the author's survey on the
Deligne-Simpson problem (Kostov, 2004).


See also: Affine Quantum Groups; Bicrossproduct Hopf 
Algebras and Non-Commutative Spacetime; Einstein 
Equations: Exact Solutions; Holonomic Quantum Fields; 
Integrable Systems: Overview; Isomonodromic 
Deformations; Leray—Schauder Theory and Mapping 
Degree; Painlevé Equations; Riemann—Hilbert Methods 
in Integrable Systems; Twistors; WDVV Equations and 
Frobenius Manifolds.


Riemannian Holonomy Groups


Let $(M, g)$ be a Riemannian $n$-manifold. The
holonomy group $\mathrm{Hol}(g)$ is a Lie subgroup of $O(n)$,
a global invariant of $g$ which measures the constant
tensors $S$ on $M$ preserved by the Levi-Civita
connection $\nabla$ of $g$. The most well-known examples
of metrics with special holonomy are Kähler metrics,
with $\mathrm{Hol}(g) \subseteq U(m) \subset O(2m)$. A Kähler manifold
$(M, g)$ also carries a complex structure $J$ and Kähler
2-form $\omega$ with $\nabla J = \nabla \omega = 0$.

The classification of Riemannian holonomy
groups gives a list of interesting special Riemannian
geometries such as Calabi-Yau manifolds and the
exceptional holonomy groups $G_2$ and $\mathrm{Spin}(7)$, all of
which are important in physics. These geometries
have many features in common with Kähler geometry,
and are characterized by the existence of
constant exterior forms.


General Properties of Holonomy Groups 


Let $M$ be a connected manifold of dimension $n$ and $g$ a
Riemannian metric on $M$, with Levi-Civita connection
$\nabla$, regarded as a connection on the tangent
bundle $TM$ of $M$. Suppose $\gamma : [0,1] \to M$ is a smooth
path, with $\gamma(0) = x$ and $\gamma(1) = y$. Let $s$ be a smooth
section of $\gamma^*(TM)$, so that $s : [0,1] \to TM$ with $s(t) \in
T_{\gamma(t)}M$ for each $t \in [0,1]$. Then we say that $s$ is
parallel if $\nabla_{\dot\gamma(t)} s(t) = 0$ for all $t \in [0,1]$, where $\dot\gamma(t)$ is
$\frac{d}{dt}\gamma(t) \in T_{\gamma(t)}M$.

For each $v \in T_xM$, there is a unique parallel
section $s$ of $\gamma^*(TM)$ with $s(0) = v$. Define a map
$P_\gamma : T_xM \to T_yM$ by $P_\gamma(v) = s(1)$. Then $P_\gamma$ is well
defined and linear, and is called the parallel
transport map along $\gamma$. This easily generalizes
to continuous, piecewise-smooth paths $\gamma$. As
$\nabla g = 0$, we see that $P_\gamma : T_xM \to T_yM$ is orthogonal
with respect to the metric $g$ on $T_xM$ and $T_yM$.


Definition 1 Fix a point $x \in M$. $\gamma$ is said to be a loop
based at $x$ if $\gamma : [0,1] \to M$ is a continuous,
piecewise-smooth path with $\gamma(0) = \gamma(1) = x$. If $\gamma$ is a loop
based at $x$, then the parallel transport map $P_\gamma$ lies in
$O(T_xM)$, the group of orthogonal linear transformations
of $T_xM$. Define the (Riemannian) holonomy
group $\mathrm{Hol}_x(g)$ of $g$ based at $x$ to be

$$\mathrm{Hol}_x(g) = \{P_\gamma : \gamma \text{ is a loop based at } x\} \subset O(T_xM) \qquad [1]$$


Here are some elementary properties of $\mathrm{Hol}_x(g)$.
The only difficult part is showing that $\mathrm{Hol}_x(g)$ is a
(closed) Lie subgroup.


Theorem 2 $\mathrm{Hol}_x(g)$ is a Lie subgroup of $O(T_xM)$,
which is closed and connected if $M$ is simply
connected, but need not be closed or connected
otherwise. Let $x, y \in M$, and suppose $\gamma : [0,1] \to M$
is a continuous, piecewise-smooth path with
$\gamma(0) = x$ and $\gamma(1) = y$, so that $P_\gamma : T_xM \to T_yM$. Then

$$P_\gamma\, \mathrm{Hol}_x(g)\, P_\gamma^{-1} = \mathrm{Hol}_y(g) \qquad [2]$$


By choosing an orthonormal basis for $T_xM$ we
can identify $O(T_xM)$ with the Lie group $O(n)$, and
so identify $\mathrm{Hol}_x(g)$ with a Lie subgroup of $O(n)$.
Changing the basis changes the subgroups by
conjugation by an element of $O(n)$. Thus, $\mathrm{Hol}_x(g)$
may be regarded as a Lie subgroup of $O(n)$ defined
up to conjugation. Equation [2] shows that in this
sense, $\mathrm{Hol}_x(g)$ is independent of the base point $x$.
Therefore, we omit the subscript $x$ and write
$\mathrm{Hol}(g)$ for the holonomy group of $g$, regarded as
a subgroup of $O(n)$ defined up to conjugation.
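
As an illustration of parallel transport around a loop, here is a minimal numerical sketch for the unit sphere in $\mathbb{R}^3$ with its round metric; this example is not taken from the article. It assumes the standard fact that, on an embedded unit sphere, a section $v(t)$ along $\gamma$ is parallel exactly when $dv/dt = -(v \cdot \dot\gamma)\,\gamma$, and it integrates this equation with a hand-written Runge-Kutta step. The function name and step count are illustrative choices.

```python
import math

def transport_around_latitude(theta, steps=2000):
    """Parallel-transport a unit tangent vector once around the latitude
    circle of colatitude theta on the unit sphere.
    Returns (initial vector, transported vector)."""
    def gamma(t):
        phi = 2.0 * math.pi * t
        return (math.sin(theta) * math.cos(phi),
                math.sin(theta) * math.sin(phi),
                math.cos(theta))

    def gamma_dot(t):
        phi = 2.0 * math.pi * t
        w = 2.0 * math.pi * math.sin(theta)
        return (-w * math.sin(phi), w * math.cos(phi), 0.0)

    def rhs(t, v):
        # Parallel condition on the embedded sphere: dv/dt is normal to it.
        g, gd = gamma(t), gamma_dot(t)
        coeff = -sum(vi * gi for vi, gi in zip(v, gd))
        return tuple(coeff * gi for gi in g)

    def shifted(v, k, h):
        return tuple(vi + h * ki for vi, ki in zip(v, k))

    v_start = (math.cos(theta), 0.0, -math.sin(theta))  # tangent at gamma(0)
    v, h = v_start, 1.0 / steps
    for i in range(steps):
        t = i * h
        k1 = rhs(t, v)
        k2 = rhs(t + h / 2, shifted(v, k1, h / 2))
        k3 = rhs(t + h / 2, shifted(v, k2, h / 2))
        k4 = rhs(t + h, shifted(v, k3, h))
        v = tuple(vi + h / 6 * (a + 2 * b + 2 * c + d)
                  for vi, a, b, c, d in zip(v, k1, k2, k3, k4))
    return v_start, v

theta = math.pi / 3
v0, v1 = transport_around_latitude(theta)
cos_angle = sum(a * b for a, b in zip(v0, v1))
length = math.sqrt(sum(a * a for a in v1))
expected_cos = math.cos(2.0 * math.pi * math.cos(theta))
```

The transported vector keeps its length, in line with the orthogonality of $P_\gamma$, and is rotated by an angle whose cosine is $\cos(2\pi\cos\theta)$. Varying $\theta$ produces every rotation of the tangent plane, consistent with $\mathrm{Hol}(g) = SO(2)$ for the round 2-sphere.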

It is significant that $\mathrm{Hol}(g)$ is a global invariant of $g$,
that is, it does not vary from point to point like
local invariants of $g$ such as the curvature. Generic
metrics $g$ on $M$ have $\mathrm{Hol}(g) = SO(n)$ if $M$ is
orientable, and $\mathrm{Hol}(g) = O(n)$ otherwise. But some
special metrics $g$ can have $\mathrm{Hol}(g)$ a proper
subgroup of $SO(n)$ or $O(n)$. Then $M$ carries some
extra geometric structures compatible with $g$.
Broadly, the smaller $\mathrm{Hol}(g)$ is as a subgroup of
$O(n)$, the more special $g$ is, and the more extra
geometric structures there are. Therefore, understanding
and classifying the possible holonomy
groups gives a family of interesting special Riemannian
geometries, such as Kähler geometry. All of
these special geometries have cropped up in physics.

Define the holonomy algebra $\mathfrak{hol}(g)$ to be the Lie
algebra of $\mathrm{Hol}(g)$, regarded as a Lie subalgebra of
$\mathfrak{so}(n)$, defined up to the adjoint action of $O(n)$.
Define $\mathfrak{hol}_x(g)$ to be the Lie algebra of $\mathrm{Hol}_x(g)$, as a
Lie subalgebra of $\mathfrak{so}(T_xM) \cong \Lambda^2 T_x^*M$. The holonomy
algebra $\mathfrak{hol}(g)$ is intimately connected with the
Riemann curvature tensor $R_{abcd} = g_{ae} R^e{}_{bcd}$ of $g$.


Theorem 3 The Riemann curvature tensor $R_{abcd}$
lies in $S^2\mathfrak{hol}_x(g)$ at $x$, where $\mathfrak{hol}_x(g)$ is regarded as a
subspace of $\Lambda^2 T_x^*M$. It also satisfies the first and
second Bianchi identities

$$R_{abcd} + R_{adbc} + R_{acdb} = 0 \qquad [3]$$
$$\nabla_e R_{abcd} + \nabla_c R_{abde} + \nabla_d R_{abec} = 0 \qquad [4]$$


A related result is the Ambrose-Singer holonomy
theorem, which, roughly speaking, says that $\mathfrak{hol}_x(g)$
may be reconstructed from $R_{abcd}|_y$ for all $y \in M$,
moved to $x$ by parallel transport.

If $(M, g)$ and $(N, h)$ are Riemannian manifolds, the
product $M \times N$ carries a product metric $g \times h$. It is
easy to show that $\mathrm{Hol}(g \times h) = \mathrm{Hol}(g) \times \mathrm{Hol}(h)$. A
Riemannian manifold (M,g) is called reducible if 
every point has an open neighborhood isometric to a 
Riemannian product and irreducible otherwise. 


Theorem 4 Let $(M, g)$ be a Riemannian $n$-manifold.
Then the natural representation of $\mathrm{Hol}(g)$ on $\mathbb{R}^n$ is
reducible if and only if $g$ is reducible.


There is a class of Riemannian manifolds called 
the “Riemannian symmetric spaces” which are 
important in the theory of Riemannian holonomy 
groups. A Riemannian symmetric space is a 
special kind of Riemannian manifold with a 
transitive isometry group. The theory of sym- 
metric spaces was worked out by Elie Cartan in 
the 1920s, who classified them completely, using 
his own classification of Lie groups and their 
representations. 

A Riemannian metric $g$ is called "locally symmetric"
if $\nabla_e R_{abcd} = 0$, and "nonsymmetric" otherwise.
Every locally symmetric metric is locally
isometric to a Riemannian symmetric space. The 
relevance of symmetric spaces to holonomy groups 


is that many possible holonomy groups are the 
holonomy group of a Riemannian symmetric space, 
but are not realized by any nonsymmetric metric. 
Therefore, by restricting attention to nonsymmetric 
metrics, one considerably reduces the number of 
possible Riemannian holonomy groups. 

A tensor $S$ on $M$ is constant if $\nabla S = 0$. An
important property of $\mathrm{Hol}(g)$ is that it determines
the constant tensors on $M$.


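Theorem 5 Let $(M, g)$ be a Riemannian manifold,
with Levi-Civita connection $\nabla$. Fix $x \in M$, so
that $\mathrm{Hol}_x(g)$ acts on $T_xM$, and so on the tensor
powers $\bigotimes^k T_xM \otimes \bigotimes^l T_x^*M$. Suppose
$S \in C^\infty(\bigotimes^k TM \otimes \bigotimes^l T^*M)$ is a constant tensor. Then $S|_x$
is fixed by the action of $\mathrm{Hol}_x(g)$. Conversely,
if $S|_x \in \bigotimes^k T_xM \otimes \bigotimes^l T_x^*M$ is fixed by $\mathrm{Hol}_x(g)$,
it extends to a unique constant tensor
$S \in C^\infty(\bigotimes^k TM \otimes \bigotimes^l T^*M)$.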


The main idea in the proof is that if $S$ is a constant
tensor and $\gamma : [0,1] \to M$ is a path from $x$ to $y$, then
$P_\gamma(S|_x) = S|_y$, that is, "constant tensors are invariant
under parallel transport." In particular, they are
invariant under parallel transport around closed
loops based at $x$, and so under elements of $\mathrm{Hol}_x(g)$.


Berger’s Classification of Holonomy Groups 


Berger classified Riemannian holonomy groups in 
1955. 


Theorem 6 Let $M$ be a simply connected,
$n$-dimensional manifold, and $g$ an irreducible,
nonsymmetric Riemannian metric on $M$. Then

(i) $\mathrm{Hol}(g) = SO(n)$,
(ii) $n = 2m$ and $\mathrm{Hol}(g) = SU(m)$ or $U(m)$,
(iii) $n = 4m$ and $\mathrm{Hol}(g) = Sp(m)$ or $Sp(m)Sp(1)$,
(iv) $n = 7$ and $\mathrm{Hol}(g) = G_2$, or
(v) $n = 8$ and $\mathrm{Hol}(g) = \mathrm{Spin}(7)$.
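
The case distinction in Theorem 6 can be written as a small lookup. The sketch below is only an illustration: the function name is hypothetical, and it applies the list exactly as stated in the theorem, so it does not filter out low-dimensional coincidences between the groups.

```python
def berger_candidates(n):
    """Groups allowed by Theorem 6 for an irreducible, nonsymmetric metric
    on a simply connected n-manifold (literal reading of the list)."""
    groups = [f"SO({n})"]                       # case (i)
    if n % 2 == 0:                              # case (ii), n = 2m
        m = n // 2
        groups += [f"SU({m})", f"U({m})"]
    if n % 4 == 0:                              # case (iii), n = 4m
        m = n // 4
        groups += [f"Sp({m})", f"Sp({m})Sp(1)"]
    if n == 7:                                  # case (iv)
        groups.append("G2")
    if n == 8:                                  # case (v)
        groups.append("Spin(7)")
    return groups

dim7 = berger_candidates(7)
dim8 = berger_candidates(8)
```

Here `dim7` is `['SO(7)', 'G2']` and `dim8` lists six groups, ending with `'Spin(7)'`, which shows how the exceptional cases appear only in dimensions 7 and 8.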


To simplify the classification, Berger makes three 
assumptions: M is simply connected, g is irreducible, 
and g is nonsymmetric. We can make M simply 
connected by passing to the “universal cover.” The 
holonomy group of a reducible metric is a product 
of holonomy groups of irreducible metrics, and the 
holonomy groups of locally symmetric metrics 
follow from Cartan’s classification of Riemannian 
symmetric spaces. Thus, these three assumptions can 
easily be removed. 

Here is a sketch of Berger's proof of Theorem 6.
As $M$ is simply connected, Theorem 2 shows $\mathrm{Hol}(g)$
is a closed, connected Lie subgroup of $SO(n)$, and
since $g$ is irreducible, Theorem 4 shows the
representation of $\mathrm{Hol}(g)$ on $\mathbb{R}^n$ is irreducible. So,
suppose that $H$ is a closed, connected subgroup of
$SO(n)$ acting irreducibly on $\mathbb{R}^n$, with Lie algebra $\mathfrak{h}$.
The classification of all such $H$ follows from the
classification of Lie groups (and is of considerable
complexity). Berger's method was to take the list of
all such groups $H$, and to apply two tests to each
possibility to find out if it could be a holonomy
group. The only groups $H$ which passed both tests
are those in the theorem.

Berger's tests are algebraic and involve the
curvature tensor. Suppose that $R_{abcd}$ is the Riemann
curvature of a metric $g$ with $\mathrm{Hol}(g) = H$. Then
Theorem 3 gives $R_{abcd} \in S^2\mathfrak{h}$, and the first Bianchi
identity [3] applies. But if $\mathfrak{h}$ has large codimension in
$\mathfrak{so}(n)$, then the vector space $\mathfrak{R}^H$ of elements of $S^2\mathfrak{h}$
satisfying [3] will be small, or even zero. However,
the Ambrose-Singer holonomy theorem shows that
$\mathfrak{R}^H$ must be big enough to generate $\mathfrak{h}$. For many of the
candidate groups $H$, this does not hold, and so $H$
cannot be a holonomy group. This is the first test.

Now $\nabla_e R_{abcd}$ lies in $(\mathbb{R}^n)^* \otimes \mathfrak{R}^H$, and also satisfies
the second Bianchi identity, eqn [4]. Frequently,
these imply that $\nabla R = 0$, so that $g$ is locally
symmetric. Therefore, we may exclude such H, and 
this is Berger’s second test. 

Berger’s proof does not show that the groups on 
his list actually occur as Riemannian holonomy 
groups — only that no others do. It is now known, 
though this took another thirty years to find out, 
that all possibilities in Theorem 6 do occur. 


The Groups on Berger’s List 


Here are some brief remarks about each group on 
Berger’s list. 


(i) SO(n) is the holonomy group of generic 
Riemannian metrics. 

(ii) Riemannian metrics $g$ with $\mathrm{Hol}(g) \subseteq U(m)$ are
called "Kähler metrics." Kähler metrics are a natural
class of metrics on complex manifolds, and generic
Kähler metrics on a given complex manifold have
holonomy $U(m)$.

Metrics $g$ with $\mathrm{Hol}(g) = SU(m)$ are called Calabi-Yau
metrics. Since $SU(m)$ is a subgroup of $U(m)$, all
Calabi-Yau metrics are Kähler. If $g$ is Kähler and $M$
is simply connected, then $\mathrm{Hol}(g) \subseteq SU(m)$ if and
only if $g$ is Ricci-flat. Thus, Calabi-Yau metrics are
locally more or less the same as Ricci-flat Kähler
metrics.

If $(M, J)$ is a compact complex manifold with
trivial canonical bundle admitting Kähler metrics,
then Yau's solution of the Calabi conjecture gives a
unique Ricci-flat Kähler metric in each Kähler
class. This gives a way to construct many examples
of Calabi-Yau manifolds, and explains why these
have been named after Calabi and Yau.


(iii) Metrics $g$ with $\mathrm{Hol}(g) = Sp(m)$ are called
"hyper-Kähler." As $Sp(m) \subseteq SU(2m) \subset U(2m)$,
hyper-Kähler metrics are Ricci-flat and Kähler.

Metrics $g$ with holonomy group $Sp(m)Sp(1)$ for
$m \ge 2$ are called "quaternionic Kähler." (Note that
quaternionic Kähler metrics are not in fact Kähler.)
They are Einstein, but not Ricci-flat.

(iv), (v) $G_2$ and $\mathrm{Spin}(7)$ are the exceptional cases,
so they are called the "exceptional holonomy
groups." Metrics with these holonomy groups are
Ricci-flat.


The groups can be understood in terms of the four
division algebras: the real numbers $\mathbb{R}$, the complex
numbers $\mathbb{C}$, the quaternions $\mathbb{H}$, and the octonions or
Cayley numbers $\mathbb{O}$.

• $SO(n)$ is a group of automorphisms of $\mathbb{R}^n$.
• $U(m)$ and $SU(m)$ are groups of automorphisms of $\mathbb{C}^m$.
• $Sp(m)$ and $Sp(m)Sp(1)$ are automorphism groups
of $\mathbb{H}^m$.
• $G_2$ is the automorphism group of $\mathrm{Im}\,\mathbb{O} \cong \mathbb{R}^7$.
$\mathrm{Spin}(7)$ is a group of automorphisms of $\mathbb{O} \cong \mathbb{R}^8$,
preserving part of the structure on $\mathbb{O}$.





The Exceptional Holonomy Groups 


For some time after Berger’s classification, the 
exceptional holonomy groups remained a mystery. 
In 1987, Bryant used the theory of exterior 
differential systems to show that locally there exist 
many metrics with these holonomy groups, and gave 
some explicit, incomplete examples. Then in 1989, 
Bryant and Salamon found explicit, complete 
metrics with holonomy $G_2$ and $\mathrm{Spin}(7)$ on noncompact
manifolds. In 1994-95, the author constructed
the first examples of metrics with holonomy
$G_2$ and $\mathrm{Spin}(7)$ on compact manifolds. For more
information on exceptional holonomy, see Joyce 
(2000, 2002). 


The Holonomy Group G2 


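Let $(x_1, \ldots, x_7)$ be coordinates on $\mathbb{R}^7$. Write $dx_{ij \ldots l}$
for the exterior form $dx_i \wedge dx_j \wedge \cdots \wedge dx_l$ on $\mathbb{R}^7$.
Define a metric $g_0$, a 3-form $\varphi_0$, and a 4-form $*\varphi_0$
on $\mathbb{R}^7$ by

$$g_0 = dx_1^2 + \cdots + dx_7^2$$
$$\varphi_0 = dx_{123} + dx_{145} + dx_{167} + dx_{246} - dx_{257} - dx_{347} - dx_{356} \qquad [5]$$
$$*\varphi_0 = dx_{4567} + dx_{2367} + dx_{2345} + dx_{1357} - dx_{1346} - dx_{1256} - dx_{1247}$$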

The subgroup of $GL(7, \mathbb{R})$ preserving $\varphi_0$ is the
exceptional Lie group $G_2$. It also preserves $g_0$, $*\varphi_0$,
and the orientation on $\mathbb{R}^7$. It is a compact,
semisimple, 14-dimensional Lie group, a subgroup
of $SO(7)$.
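
The two forms in [5] can be checked against each other mechanically. The sketch below is an illustration, not part of the original article: it assumes the Euclidean metric and the standard orientation $dx_{1234567}$, for which $*\,dx_I = \mathrm{sign}(I, I^c)\, dx_{I^c}$ with $I^c$ the increasing complement of $I$. The dictionary encoding and function names are illustrative choices.

```python
# Each form is a dict mapping an increasing index tuple to its coefficient.
PHI_0 = {(1, 2, 3): 1, (1, 4, 5): 1, (1, 6, 7): 1, (2, 4, 6): 1,
         (2, 5, 7): -1, (3, 4, 7): -1, (3, 5, 6): -1}
STAR_PHI_0 = {(4, 5, 6, 7): 1, (2, 3, 6, 7): 1, (2, 3, 4, 5): 1,
              (1, 3, 5, 7): 1, (1, 3, 4, 6): -1, (1, 2, 5, 6): -1,
              (1, 2, 4, 7): -1}

def permutation_sign(seq):
    """Sign of the permutation sorting seq, computed by counting inversions."""
    inversions = sum(1 for i in range(len(seq))
                     for j in range(i + 1, len(seq)) if seq[i] > seq[j])
    return -1 if inversions % 2 else 1

def hodge_star(form, dim):
    """Hodge star on R^dim with the Euclidean metric and standard orientation."""
    result = {}
    for indices, coeff in form.items():
        complement = tuple(k for k in range(1, dim + 1) if k not in indices)
        result[complement] = coeff * permutation_sign(indices + complement)
    return result

computed_star = hodge_star(PHI_0, 7)
forms_agree = computed_star == STAR_PHI_0
```

With the signs as printed in [5], `forms_agree` is `True`, so the displayed 4-form is indeed the Hodge dual of the displayed 3-form.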

A $G_2$-structure on a 7-manifold $M$ is a principal
sub-bundle of the frame bundle of $M$, with
structure group $G_2$. Each $G_2$-structure gives rise
to a 3-form $\varphi$ and a metric $g$ on $M$, such that every
tangent space of $M$ admits an isomorphism with $\mathbb{R}^7$
identifying $\varphi$ and $g$ with $\varphi_0$ and $g_0$, respectively. By
an abuse of notation, $(\varphi, g)$ can be referred to as a
$G_2$-structure.


Proposition 7 Let $M$ be a 7-manifold and $(\varphi, g)$ a
$G_2$-structure on $M$. Then the following are
equivalent:

(i) $\mathrm{Hol}(g) \subseteq G_2$, and $\varphi$ is the induced 3-form;
(ii) $\nabla\varphi = 0$ on $M$, where $\nabla$ is the Levi-Civita
connection of $g$; and
(iii) $d\varphi = d(*\varphi) = 0$ on $M$.


The equations $d\varphi = d(*\varphi) = 0$ look like linear
partial differential equations on $\varphi$. However, it is
better to consider them as nonlinear, for the
following reason. The 3-form $\varphi$ determines the
metric $g$, and $g$ gives the Hodge star $*$ on $M$. So
$*\varphi$ is a nonlinear function of $\varphi$, and $d(*\varphi) = 0$ a
nonlinear equation. Thus, constructing and studying
$G_2$-manifolds comes down to studying solutions
of nonlinear elliptic partial differential
equations.

Note that $\mathrm{Hol}(g) \subseteq G_2$ if and only if $\nabla\varphi = 0$
follows from Theorem 5. We call $\nabla\varphi$ the
"torsion" of the $G_2$-structure $(\varphi,g)$, and when
$\nabla\varphi = 0$ the $G_2$-structure is "torsion-free." A triple
$(M,\varphi,g)$ is called a $G_2$-manifold if $M$ is a
7-manifold and $(\varphi,g)$ a torsion-free $G_2$-structure
on $M$. If $g$ has holonomy $\mathrm{Hol}(g) \subseteq G_2$, then $g$ is
Ricci-flat.


Theorem 8 Let $M$ be a compact 7-manifold, and
suppose that $(\varphi, g)$ is a torsion-free $G_2$-structure on $M$.
Then $\mathrm{Hol}(g) = G_2$ if and only if $\pi_1(M)$ is finite. In
this case, the moduli space of metrics with holonomy
$G_2$ on $M$, up to diffeomorphisms isotopic to
the identity, is a smooth manifold of dimension $b^3(M)$.


The Holonomy Group Spin(7) 


Let $\mathbb{R}^8$ have coordinates $(x_1,\ldots,x_8)$. Define a
4-form $\Omega_0$ on $\mathbb{R}^8$ by

$$\Omega_0 = dx_{1234} + dx_{1256} + dx_{1278} + dx_{1357} - dx_{1368} - dx_{1458} - dx_{1467} - dx_{2358} - dx_{2367} - dx_{2457} + dx_{2468} + dx_{3456} + dx_{3478} + dx_{5678}$$ [6]


The subgroup of $GL(8, \mathbb{R})$ preserving $\Omega_0$ is the
holonomy group Spin(7). It also preserves the
orientation on $\mathbb{R}^8$ and the Euclidean metric
$g_0 = dx_1^2 + \cdots + dx_8^2$. It is a compact, semisimple,
21-dimensional Lie group, a subgroup of $SO(8)$.
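The dimensions 14 and 21 quoted for $G_2$ and Spin(7) can be checked by linear algebra. The following is a minimal sketch, not part of the original article: it computes the dimension of the Lie algebra of matrices $A$ whose infinitesimal action annihilates $\varphi_0$ in [5] or $\Omega_0$ in [6]. The helper names (`form_components`, `rank`, `stabilizer_dimension`) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations, permutations

# Monomials (1-based indices, coefficient) of phi_0 in [5] and Omega_0 in [6].
PHI0 = [((1, 2, 3), 1), ((1, 4, 5), 1), ((1, 6, 7), 1), ((2, 4, 6), 1),
        ((2, 5, 7), -1), ((3, 4, 7), -1), ((3, 5, 6), -1)]
OMEGA0 = [((1, 2, 3, 4), 1), ((1, 2, 5, 6), 1), ((1, 2, 7, 8), 1),
          ((1, 3, 5, 7), 1), ((1, 3, 6, 8), -1), ((1, 4, 5, 8), -1),
          ((1, 4, 6, 7), -1), ((2, 3, 5, 8), -1), ((2, 3, 6, 7), -1),
          ((2, 4, 5, 7), -1), ((2, 4, 6, 8), 1), ((3, 4, 5, 6), 1),
          ((3, 4, 7, 8), 1), ((5, 6, 7, 8), 1)]


def perm_sign(p):
    """Sign of a tuple of distinct integers relative to its sorted order."""
    sign = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                sign = -sign
    return sign


def form_components(terms):
    """All nonzero components of an alternating form, keyed by index tuple."""
    comp = {}
    for idx, coeff in terms:
        for p in permutations(idx):
            comp[p] = coeff * perm_sign(p)
    return comp


def rank(rows):
    """Rank of a matrix of Fractions, by Gaussian elimination."""
    rows = [r[:] for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        piv = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for r in range(rk + 1, len(rows)):
            if rows[r][col] != 0:
                f = rows[r][col] / rows[rk][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def stabilizer_dimension(terms, n, degree):
    """Dimension of {A in gl(n) : A acts trivially on the form}."""
    comp = form_components(terms)
    rows = []
    for idx in combinations(range(1, n + 1), degree):
        row = []
        for a in range(1, n + 1):
            for b in range(1, n + 1):
                # The elementary matrix E_ab sends e_b to e_a, so its action
                # replaces each slot equal to b by a.
                val = 0
                for pos in range(degree):
                    if idx[pos] == b:
                        val += comp.get(idx[:pos] + (a,) + idx[pos + 1:], 0)
                row.append(Fraction(val))
        rows.append(row)
    return n * n - rank(rows)


DIM_G2 = stabilizer_dimension(PHI0, 7, 3)
DIM_SPIN7 = stabilizer_dimension(OMEGA0, 8, 4)
```

The computation uses exact rational arithmetic, so the ranks are not affected by rounding. It only confirms the dimension of the stabilizer algebra, not compactness or semisimplicity.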

A Spin(7)-structure on an 8-manifold $M$ gives rise
to a 4-form $\Omega$ and a metric $g$ on $M$, such that each
tangent space of $M$ admits an isomorphism with $\mathbb{R}^8$
identifying $\Omega$ and $g$ with $\Omega_0$ and $g_0$, respectively. By
an abuse of notation, the pair $(\Omega, g)$ is referred to as
a Spin(7)-structure.


Proposition 9 Let $M$ be an 8-manifold and $(\Omega, g)$ a
Spin(7)-structure on $M$. Then the following are
equivalent:


(i) $\mathrm{Hol}(g) \subseteq \mathrm{Spin}(7)$ and $\Omega$ is the induced 4-form;
(ii) $\nabla\Omega = 0$ on $M$, where $\nabla$ is the Levi-Civita
connection of $g$; and
(iii) $d\Omega = 0$ on $M$.


We call $\nabla\Omega$ the torsion of the Spin(7)-structure
$(\Omega,g)$, and $(\Omega,g)$ torsion free if $\nabla\Omega = 0$. A triple
$(M,\Omega, g)$ is called a Spin(7)-manifold if $M$ is an
8-manifold and $(\Omega,g)$ a torsion-free Spin(7)-structure
on $M$. If $g$ has holonomy $\mathrm{Hol}(g) \subseteq \mathrm{Spin}(7)$, then $g$ is
Ricci-flat.

Here is a result on compact 8-manifolds with 
holonomy Spin(7). 


Theorem 10 Let $(M,\Omega,g)$ be a compact Spin(7)-manifold.
Then $\mathrm{Hol}(g) = \mathrm{Spin}(7)$ if and only if $M$ is
simply connected, and $b^3(M) + b^4_+(M) = b^2(M) + 2b^4_-(M) + 25$.
In this case, the moduli space of
metrics with holonomy Spin(7) on $M$, up to
diffeomorphisms isotopic to the identity, is a smooth
manifold of dimension $1 + b^4_-(M)$.


The inclusions between the holonomy groups
$SU(m)$, $G_2$, Spin(7) are

$$\begin{array}{ccccc} SU(2) & \to & SU(3) & \to & G_2 \\ \downarrow & & \downarrow & & \downarrow \\ SU(2) \times SU(2) & \to & SU(4) & \to & \mathrm{Spin}(7) \end{array}$$ [7]

The meaning of the above diagram is illustrated
by the inclusion $SU(3) \hookrightarrow G_2$. As $SU(3)$ acts
on $\mathbb{C}^3$, it also acts on $\mathbb{R} \oplus \mathbb{C}^3 \cong \mathbb{R}^7$, taking the
$SU(3)$-action on $\mathbb{R}$ to be trivial. Thus, we embed
$SU(3)$ as a subgroup of $GL(7,\mathbb{R})$. It turns out
that $SU(3)$ is contained in the subgroup $G_2$ of
$GL(7,\mathbb{R})$ defined in the section "The holonomy
group G2."


Constructing Compact G2- and Spin(7)-Manifolds 


The author's method of constructing compact
7-manifolds with holonomy $G_2$ is based on the
Kummer construction for Calabi-Yau metrics
on the K3 surface and may be divided into four
steps.


Step 1. Let $T^7$ be the 7-torus and $(\varphi_0, g_0)$ a flat
$G_2$-structure on $T^7$. Choose a finite group $\Gamma$ of
isometries of $T^7$ preserving $(\varphi_0, g_0)$. Then the quotient
$T^7/\Gamma$ is a singular, compact 7-manifold, an orbifold.

Step 2. For certain special groups $\Gamma$, there is a
method to resolve the singularities of $T^7/\Gamma$ in a natural
way, using complex geometry. We get a nonsingular,
compact 7-manifold $M$, together with a map $\pi: M \to T^7/\Gamma$,
the resolving map.

Step 3. On $M$, we explicitly write down a one-parameter
family of $G_2$-structures $(\varphi_t, g_t)$ depending
on $t \in (0,\epsilon)$. They are not torsion free, but have
small torsion when $t$ is small. As $t \to 0$, the
$G_2$-structure $(\varphi_t,g_t)$ converges to the singular
$G_2$-structure $\pi^*(\varphi_0, g_0)$.

Step 4. We prove using analysis that for sufficiently
small $t$, the $G_2$-structure $(\varphi_t,g_t)$ on $M$, with
small torsion, can be deformed to a $G_2$-structure
$(\tilde\varphi_t, \tilde g_t)$, with zero torsion. Finally, it is shown that $\tilde g_t$
is a metric with holonomy $G_2$ on the compact
7-manifold $M$.


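We explain the first two steps in greater detail.
For Step 1, an example of a suitable group $\Gamma$ is given
here.


Example 11 Let $(x_1,\ldots,x_7)$ be coordinates on
$T^7 = \mathbb{R}^7/\mathbb{Z}^7$, where $x_i \in \mathbb{R}/\mathbb{Z}$. Let $(\varphi_0,g_0)$ be the
flat $G_2$-structure on $T^7$ defined by [5]. Let $\alpha$, $\beta$, and
$\gamma$ be the involutions of $T^7$ defined by

$$\alpha: (x_1,\ldots,x_7) \mapsto (x_1, x_2, x_3, -x_4, -x_5, -x_6, -x_7)$$ [8]
$$\beta: (x_1,\ldots,x_7) \mapsto (x_1, -x_2, -x_3, x_4, x_5, \tfrac{1}{2} - x_6, -x_7)$$ [9]
$$\gamma: (x_1,\ldots,x_7) \mapsto (-x_1, x_2, -x_3, x_4, \tfrac{1}{2} - x_5, x_6, \tfrac{1}{2} - x_7)$$ [10]

By inspection, $\alpha$, $\beta$, and $\gamma$ preserve $(\varphi_0,g_0)$,
because of the careful choice of exactly which signs
to change. Also, $\alpha^2 = \beta^2 = \gamma^2 = 1$, and $\alpha$, $\beta$, and $\gamma$
commute. Thus, they generate a group
$\Gamma = \langle \alpha, \beta, \gamma \rangle \cong \mathbb{Z}_2^3$ of isometries of $T^7$ preserving
the flat $G_2$-structure $(\varphi_0, g_0)$.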


Having chosen a lattice $\Lambda$ and finite group $\Gamma$, the
quotient $T^7/\Gamma$ is an orbifold, a singular manifold
with only quotient singularities. The singularities of
$T^7/\Gamma$ come from the fixed points of nonidentity
elements of $\Gamma$. We now describe the singularities in
the example.


Lemma 12 In Example 11, $\beta\gamma$, $\gamma\alpha$, $\alpha\beta$, and $\alpha\beta\gamma$
have no fixed points on $T^7$. The fixed points of
$\alpha$, $\beta$, $\gamma$ are each 16 copies of $T^3$. The singular set $S$ of
$T^7/\Gamma$ is a disjoint union of 12 copies of $T^3$, 4 copies
from each of $\alpha$, $\beta$, $\gamma$. Each component of $S$ is a
singularity modeled on that of $T^3 \times \mathbb{C}^2/\{\pm 1\}$.
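The claims "by inspection" in Example 11 and the fixed-point counts in Lemma 12 can be verified mechanically. The sketch below is an illustration, not the author's argument: it encodes each involution as a coordinatewise map $x_i \mapsto s_i x_i + c_i$ on $\mathbb{R}/\mathbb{Z}$ with $s_i = \pm 1$ and $c_i \in \{0, \tfrac{1}{2}\}$. The names `compose`, `preserves_phi0`, and `fixed_set` are hypothetical helpers.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Index triples of the seven monomials of phi_0 in [5] (signs are irrelevant here).
PHI0_TERMS = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6),
              (2, 5, 7), (3, 4, 7), (3, 5, 6)]

# (signs, shifts) for the maps [8], [9], [10].
ALPHA = ((1, 1, 1, -1, -1, -1, -1), (0, 0, 0, 0, 0, 0, 0))
BETA = ((1, -1, -1, 1, 1, -1, -1), (0, 0, 0, 0, 0, HALF, 0))
GAMMA = ((-1, 1, -1, 1, -1, 1, -1), (0, 0, 0, 0, HALF, 0, HALF))
IDENTITY = ((1,) * 7, (0,) * 7)
GENS = {"a": ALPHA, "b": BETA, "c": GAMMA}


def compose(f, g):
    """Return f after g for maps x_i -> s_i * x_i + c_i on R/Z."""
    (sf, cf), (sg, cg) = f, g
    signs = tuple(a * b for a, b in zip(sf, sg))
    shifts = tuple((a * c + d) % 1 for a, c, d in zip(sf, cg, cf))
    return signs, shifts


def preserves_phi0(f):
    """Each monomial dx_ijk is scaled by s_i*s_j*s_k; translations do not matter."""
    s = f[0]
    return all(s[i - 1] * s[j - 1] * s[k - 1] == 1 for i, j, k in PHI0_TERMS)


def fixed_set(f):
    """(number of components, dimension) of the fixed set, or (0, None) if empty."""
    comps, dim = 1, 0
    for s, c in zip(*f):
        if s == 1:
            if c % 1 != 0:       # x -> x + 1/2 has no fixed point on the circle
                return (0, None)
            dim += 1             # x -> x fixes the whole circle
        else:
            comps *= 2           # x -> c - x has exactly two fixed points
    return comps, dim


def element(word):
    """Group element spelled by a word in the letters a, b, c."""
    f = IDENTITY
    for ch in word:
        f = compose(f, GENS[ch])
    return f


FIXED = {w: fixed_set(element(w)) for w in ("a", "b", "c", "bc", "ca", "ab", "abc")}
```

Each of $\alpha$, $\beta$, $\gamma$ fixes 16 tori $T^3$. Lemma 12 states that these descend to 4 components each in $T^7/\Gamma$, giving 12 in total; the code does not check that last step.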


The most important consideration in choosing $\Gamma$
is that we should be able to resolve the singularities
of $T^7/\Gamma$ within holonomy $G_2$, in Step 2. We
have no idea how to resolve general orbifold
singularities of $G_2$-manifolds. However, after fifty
years of hard work we understand well how to
resolve orbifold singularities of Calabi-Yau manifolds,
with holonomy $SU(m)$. This is done by a
combination of algebraic geometry, which produces
the underlying complex manifold by a
crepant resolution, and Calabi-Yau analysis,
which produces the Ricci-flat Kähler metric on
this complex manifold.

Now the holonomy groups $SU(2)$ and $SU(3)$ are
subgroups of $G_2$, as in [7]. Our tactic in Step 2 is to
ensure that all of the singular set $S$ of $T^7/\Gamma$ can
locally be resolved with holonomy $SU(2)$ or $SU(3)$,
and then use Calabi-Yau geometry to do this. In
particular, suppose each connected component of $S$
is isomorphic to either


1. $T^3 \times \mathbb{C}^2/G$, for $G$ a finite subgroup of $SU(2)$; or
2. $S^1 \times \mathbb{C}^3/G$, for $G$ a finite subgroup of $SU(3)$
acting freely on $\mathbb{C}^3 \setminus \{0\}$.


One can use complex algebraic geometry to find a
crepant resolution $X$ of $\mathbb{C}^2/G$ or $Y$ of $\mathbb{C}^3/G$. Then
$T^3 \times X$ or $S^1 \times Y$ gives a local model for how to
resolve the corresponding component of $S$ in $T^7/\Gamma$.
Thus we construct a nonsingular, compact 7-manifold
$M$ by using the patches $T^3 \times X$ or $S^1 \times Y$ to
repair the singularities of $T^7/\Gamma$. In the case of
Example 11, this means gluing 12 copies of $T^3 \times X$
into $T^7/\Gamma$, where $X$ is the blow-up of $\mathbb{C}^2/\{\pm 1\}$ at its
singular point.

By considering different groups $\Gamma$ acting on $T^7$,
and also by finding topologically distinct resolutions
$M_1,\ldots,M_k$ of the same orbifold $T^7/\Gamma$, we
can construct many compact Riemannian 7-manifolds
with holonomy $G_2$. A good number of
examples are given in Joyce (2000, chapter 12).
Figure 1 displays the 252 different sets of Betti
numbers of compact, simply connected 7-manifolds
with holonomy $G_2$ constructed there
together with 5 more sets from Kovalev. It
seems likely to the author that the Betti numbers
given in Figure 1 are only a small proportion of
the Betti numbers of all compact 7-manifolds with
holonomy $G_2$.

A different construction of compact 7-manifolds
with holonomy $G_2$ was given by Kovalev (2003),
involving gluing together asymptotically cylindrical
Calabi-Yau 3-folds. Compact 8-manifolds with
holonomy Spin(7) were constructed by the author
using two different methods: first, by resolving
singularities of torus orbifolds $T^8/\Gamma$ in a similar way
to the $G_2$ case (though the details are different and
more difficult), and second, by resolving $Y/\langle\sigma\rangle$ for $Y$
a Calabi-Yau 4-orbifold with singularities of a
special kind, and $\sigma$ an antiholomorphic isometric
involution of $Y$. Details can be found in Joyce (2000).


See also: Calibrated Geometry and Special Lagrangian
Submanifolds.

Figure 1 Betti numbers $(b^2, b^3)$ of compact $G_2$-manifolds. (From Joyce (2000) and Kovalev (2003).)


Saddle Point Problems

Introduction


Many problems arising in science and engineering
call for solving the Euler equations of
functionals, that is, equations of the form

$$G'(u) = 0$$ [1]

where $G(u)$ is a $C^1$-functional (usually representing
the energy) arising from the given data. As an
illustration, the equation

$$-\Delta u(x) = f(x, u(x))$$

is the Euler equation of the functional

$$G(u) = \tfrac{1}{2}\|\nabla u\|^2 - \int F(x, u(x))\,dx$$

on an appropriate space, where

$$F(x,t) = \int_0^t f(x,s)\,ds$$ [2]

and the norm is that of $L^2$. Solving the Euler
equations is tantamount to finding critical points of
the corresponding functional. The classical approach
was to look for maxima or minima. If one is looking
for a minimum, it is not sufficient to know that the
functional is bounded from below, as is easily
checked. However, one can show that there is a
sequence satisfying

$$G(u_k) \to a, \quad G'(u_k) \to 0$$ [3]

for $a = \inf G$. If the sequence has a convergent
subsequence, this will produce a minimum.
However, when extrema do not exist, there is no
clear way of obtaining critical points. In particular,
this happens when the functional is not bounded
from either above or below. Until recently, there
was no organized procedure for producing critical
points which are not extrema. We shall describe an
approach which is very useful in such cases.
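A one-dimensional illustration of why boundedness from below is not enough (this example is added here and is not from the original text): the functional $G(u) = e^{u}$ on $\mathbb{R}$ is bounded below and admits a sequence satisfying [3], yet it has no minimum.

```latex
G(u) = e^{u}, \qquad \inf_{\mathbb{R}} G = 0, \qquad G'(u) = e^{u} > 0 \ \text{ for all } u,
\\[4pt]
u_k = -k: \qquad G(u_k) = e^{-k} \to 0 = \inf G, \qquad G'(u_k) = e^{-k} \to 0 .
```

The sequence $u_k$ satisfies [3] with $a = 0$ but has no convergent subsequence, so no critical point is produced.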


To illustrate the technique, we consider the
problem of finding a solution of

$$-u''(x) + u(x) = f(x, u(x)), \quad x \in I = [0, 2\pi]$$ [4]

under the conditions

$$u(0) = u(2\pi), \quad u'(0) = u'(2\pi)$$ [5]

We assume that the function $f(x, t)$ is continuous in
$I \times \mathbb{R}$ and is periodic in $x$ with period $2\pi$. The
approach begins by asking the question, "does there
exist a differentiable function $G$ from a space $H$ to
$\mathbb{R}$ such that [4], [5] are equivalent to [1]?" It is
hoped that one can mimic the methods of calculus to
find critical points and thus solve [1].

Actually, we are asking the following: does there exist
a mapping $G$ from a space $H$ to $\mathbb{R}$ such that $G$ has a
critical point $u$ satisfying $G'(u) = -u'' + u - f(x, u(x))$?

In order to solve the problem one has to


1. find $G(u)$ such that

$$(G'(u), v) = (u, v)_H - (f(\cdot, u), v)$$ [6]

holds for each $u, v \in H$,

2. show that there is a function $u(x)$ such that
$G'(u) = 0$,

3. show that $u''$ exists in $I$,

4. show that [1] implies [4].


We used the notation

$$(u, v) = \int_0^{2\pi} u(x)v(x)\,dx$$

In order to carry out the procedure, we assume
that for each $R > 0$ there is a constant $C_R$ such that

$$|f(x, t)| \le C_R, \quad x \in I,\ t \in \mathbb{R},\ |t| \le R$$ [7]

This assumption is used to carry out step (1). We define

$$G(u) = \tfrac{1}{2}\|u\|_H^2 - \int_0^{2\pi} F(x, u(x))\,dx$$ [8]

where $F(x, t)$ is given by [2] and we take $H$ to be the
completion of $C^1(I)$ with respect to the norm

$$\|u\|_H = \left(\|u'\|^2 + \|u\|^2\right)^{1/2}$$ [9]

where $\|u\|^2 = (u, u)$. We have

Theorem 1 If $f(x,t)$ satisfies [7], then $G(u)$ given
by [8] is continuously differentiable and satisfies [6].


Once we have reduced the problem to solving [1], 
we can search for critical points. The easiest type to 
locate are “saddle points” which are local minima in 
some directions and local maxima in all others. For 
instance, we obtain theorems such as 


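Theorem 2 Assume that

$$|f(x,t)| \le C(|t| + 1), \quad x \in I,\ t \in \mathbb{R}$$ [10]
$$2F(x,t)/t^2 \to \beta(x) \ \text{a.e. as } |t| \to \infty$$ [11]

with $\beta(x)$ satisfying

$$1 + n^2 \le \beta(x) \le 1 + (n+1)^2, \qquad 1 + n^2 \not\equiv \beta(x) \not\equiv 1 + (n+1)^2$$

and $n$ an integer $\ge 0$. If $G(u)$ is given by [8], then
there is a $u_0 \in H$ such that

$$G'(u_0) = 0$$ [12]

In particular, $u_0$ is a solution of [4] and [5] in the
usual sense.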


In proving this theorem, we shall make use of 


Theorem 3 Let $M, N$ be closed subspaces of a
Hilbert space $E$ such that $M = N^\perp$. Assume that at
least one of these subspaces is finite dimensional.
Let $G$ be a continuously differentiable functional on
$E$ satisfying

$$m_0 = \sup_{v \in N} \inf_{w \in M} G(v + w) \ne -\infty$$ [13]

and

$$m_1 = \inf_{w \in M} \sup_{v \in N} G(v + w) \ne \infty$$ [14]

Then there is a sequence $\{u_k\} \subset E$ such that

$$G(u_k) \to c, \quad m_0 \le c \le m_1, \quad G'(u_k) \to 0$$ [15]


Theorem 3 allows us to obtain solutions if we can
find subspaces of $H$ such that [13] and [14] hold. We
use it to give the proof of Theorem 2.

Proof. Note that

$$\|u\|_H^2 = \sum_k (1 + k^2)|\alpha_k|^2, \quad u \in H$$ [16]

where the $\alpha_k$ are given by

$$\alpha_k = (u, \bar\varphi_k), \quad k = 0, \pm 1, \pm 2, \ldots$$ [17]

and

$$\varphi_k(x) = \frac{1}{\sqrt{2\pi}}\, e^{ikx}, \quad k = 0, \pm 1, \pm 2, \ldots$$ [18]

Let

$$N = \{u \in H : \alpha_k = 0 \text{ for } |k| > n\}$$

Thus,

$$\|u\|_H^2 = \sum_{|k| \le n} (1 + k^2)|\alpha_k|^2 \le (1 + n^2)\|u\|^2, \quad u \in N$$ [19]

Let

$$M = \{u \in H : \alpha_k = 0 \text{ for } |k| \le n\}$$

In this case,

$$\|u\|_H^2 = \sum_{|k| \ge n+1} (1 + k^2)|\alpha_k|^2 \ge (1 + (n+1)^2)\|u\|^2, \quad u \in M$$ [20]

Note that $M, N$ are closed subspaces of $H$ and that
$M = N^\perp$. Note also that $N$ is finite dimensional. If
we consider the functional [8], it is not difficult to
show that [11] implies

$$\inf_M G > -\infty; \quad \sup_N G < \infty$$ [21]

We are now in a position to apply Theorem 3. This
produces a saddle point satisfying [1]. □
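The norm comparisons on the low-frequency and high-frequency subspaces can be sanity-checked on sample Fourier coefficients. This is an illustrative sketch only: it assumes Parseval's identity for an orthonormal exponential basis, and the coefficient values, the choice $n = 2$, and the helper `norms` are arbitrary assumptions.

```python
def norms(coeffs):
    """Return (||u||^2, ||u||_H^2) for Fourier coefficients {k: alpha_k}."""
    l2 = sum(abs(a) ** 2 for a in coeffs.values())
    h = sum((1 + k * k) * abs(a) ** 2 for k, a in coeffs.items())
    return l2, h


n = 2

# A sample element of N: only frequencies |k| <= n are present.
u_N = {k: complex(1.0 / (1 + abs(k)), k) for k in range(-n, n + 1)}

# A sample element of M: only frequencies |k| >= n + 1 are present.
u_M = {k: complex(1.0 / k, 0.5)
       for k in list(range(-6, -n)) + list(range(n + 1, 7))}

l2_N, h_N = norms(u_N)
l2_M, h_M = norms(u_M)

upper_ok = h_N <= (1 + n ** 2) * l2_N          # upper bound on N
lower_ok = h_M >= (1 + (n + 1) ** 2) * l2_M    # lower bound on M
```

The gap between $1 + n^2$ and $1 + (n+1)^2$ is exactly the window in which Theorem 2 places $\beta(x)$, which is why the quadratic part of $G$ dominates from opposite sides on $N$ and $M$.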
Minimax 


Theorem 3 is very useful when extrema do not exist, but 
it is not always applicable. One is then forced to search 
for other ways of obtaining critical points. Again, one is 
faced with the fact that there is no systematic method of 
finding them. A useful idea is to try to find sets that 
separate the functional. By this we mean the following: 


Definition 1 Two sets $A, B$ separate the functional
$G(u)$ if

$$a_0 := \sup_A G \le b_0 := \inf_B G$$ [22]

We would like to find sets $A$ and $B$ such that [22]
will imply

$$\exists u : G(u) \ge b_0, \quad G'(u) = 0$$ [23]

This is too much to expect since even semiboundedness
does not imply the existence of an extremum.
Consequently, we weaken our requirements and
look for sets $A, B$ such that [22] implies

$$G(u_k) \to a, \quad G'(u_k) \to 0$$ [24]

with $a \ge b_0$. This leads to


Definition 2 We shall say that the set $A$ links the
set $B$ if [22] implies [24] with $a \ge b_0$ for every $C^1$
functional $G(u)$.


Of course, [24] is a far cry from [23], but if, for 
example, the sequence [24] has a convergent 
subsequence, then [24] implies [23]. Whether or 
not [24] implies [23] is a property of the functional 
G(u). We state this as 


Definition 3 We say that G(u) satisfies the Palais- 
Smale (PS) condition if [24] always implies [23]. 


The usual way of verifying this is to show that 
every sequence satisfying [24] has a convergent 
subsequence (there are other ways). 

All of this leads to 


Theorem 4 If G satisfies the PS condition and is 
separated by a pair of linking sets, then it has a 
critical point satisfying [23]. 


This theorem cannot be applied until one knows if 
there are linking sets and functionals that satisfy the 
PS condition. Fortunately, they exist. Examples and 
sufficient conditions for A to link B are found in the 
literature. Obviously, the weaker the conditions, the 
more pairs will qualify. To date, the conditions 
described in the next section allow all known 
examples. 


The Details 


Let $E$ be a Banach space, and let $\Phi$ be the set of all
continuous maps $\Gamma = \Gamma(t)$ from $E \times [0,1]$ to $E$ such
that


1. $\Gamma(0) = I$, the identity map;

2. for each $t \in [0,1)$, $\Gamma(t)$ is a homeomorphism of $E$
onto $E$ and $\Gamma^{-1}(t) \in C(E \times [0, 1), E)$;

3. $\Gamma(1)E$ is a single point in $E$ and $\Gamma(t)A$ converges
uniformly to $\Gamma(1)E$ as $t \to 1$ for each bounded
set $A \subset E$; and

4. for each $t_0 \in [0,1)$ and each bounded set $A \subset E$,

$$\sup_{0 \le t \le t_0,\ u \in A} \{\|\Gamma(t)u\| + \|\Gamma^{-1}(t)u\|\} < \infty$$ [25]

We have the following

Theorem 5 A sufficient condition for $A$ to link $B$ is
(i) $A \cap B = \emptyset$ and
(ii) for each $\Gamma \in \Phi$ there is a $t \in (0, 1]$ such that
$\Gamma(t)A \cap B \ne \emptyset$.


Theorem 6 Let $G$ be a $C^1$-functional on $E$, and let
$A, B$ be subsets of $E$ such that $A, B$ satisfy [22] and
the hypotheses of Theorem 5. Assume that

$$a := \inf_{\Gamma \in \Phi}\ \sup_{0 \le s \le 1,\ u \in A} G(\Gamma(s)u)$$ [26]

is finite. Let $\psi(t)$ be a positive, locally Lipschitz
continuous function on $[0, \infty)$ such that

$$\int_0^\infty \psi(r)\,dr = \infty$$ [27]

Then there is a sequence $\{u_k\} \subset E$ such that

$$G(u_k) \to a, \quad G'(u_k)/\psi(\|u_k\|) \to 0$$ [28]

If $a = b_0$, then we can also require that

$$d(u_k, B) \to 0$$ [29]


Corollary 1 Under the hypotheses of Theorem 6
there is a sequence $\{u_k\} \subset E$ such that

$$G(u_k) \to a, \quad (1 + \|u_k\|)\,G'(u_k) \to 0$$ [30]

Proof. We merely take $\psi(r) = 1/(1 + r)$ in
Theorem 6. □
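For completeness, the weight used in the proof of Corollary 1 is admissible in Theorem 6 because its integral over $[0, \infty)$ diverges. The short verification below is added for illustration:

```latex
\psi(r) = \frac{1}{1+r} > 0, \qquad
\int_0^{T} \frac{dr}{1+r} = \log(1+T) \longrightarrow \infty \quad (T \to \infty),
\\[4pt]
\frac{G'(u_k)}{\psi(\|u_k\|)} = (1 + \|u_k\|)\, G'(u_k) \to 0 .
```

The extra factor $1 + \|u_k\|$ makes this stronger than the condition $G'(u_k) \to 0$ appearing in [24].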


A useful criterion for finding linking subsets is 


Theorem 7 Let $F$ be a continuous map from a
Banach space $E$ to $\mathbb{R}^n$, and let $Q \subset E$ be such that
$F_0 = F|_Q$ is a homeomorphism of $Q$ onto the closure
of a bounded open subset $\Omega$ of $\mathbb{R}^n$. If $p \in \Omega$, then
$F_0^{-1}(\partial\Omega)$ links $F^{-1}(p)$.


Some Examples 
The following are examples of sets that link. 


Example 1 Let $M, N$ be closed subspaces such that
$E = M \oplus N$ (with one finite dimensional). Let

$$B_R = \{u \in E : \|u\| < R\}$$

and take $A = \partial B_R \cap N$, $B = M$. Then $A$ links $B$.
To see this, we identify $N$ with some $\mathbb{R}^n$ and take
$\Omega = B_R \cap N$, $Q = \bar\Omega$. For $u \in E$, we write

$$u = v + w, \quad v \in N,\ w \in M$$ [31]

and take $F$ to be the projection

$$Fu = v$$

Since $F|_Q = I$ and $M = F^{-1}(0)$, we see from Theorem 7
that $A$ links $B$.


Example 2 We take $M, N$ as in Example 1. Let
$w_0 \ne 0$ be an element of $M$, and take

$$A = \{v \in N : \|v\| \le R\} \cup \{s w_0 + v : v \in N,\ s \ge 0,\ \|s w_0 + v\| = R\}$$
$$B = \partial B_\delta \cap M, \quad 0 < \delta < R.$$

Then $A$ links $B$. Again we identify $N$ with some $\mathbb{R}^n$,
and we may assume $\|w_0\| = 1$. Let

$$Q = \{s w_0 + v : v \in N,\ s \ge 0,\ \|s w_0 + v\| \le R\}$$

Then $A = \partial Q$ in $\mathbb{R}^{n+1}$. If $u$ is given by [31], we
define

$$Fu = v + \|w\| w_0$$

Then $F|_Q = I$ and $B = F^{-1}(\delta w_0)$. We can now apply
Theorem 7 to conclude that $A$ links $B$.


Example 3 Take $M, N$ as before and let $v_0 \ne 0$ be
an element of $N$. We write $N = \{v_0\} \oplus N'$. We take

$$A = \{v' \in N' : \|v'\| \le R\} \cup \{s v_0 + v' : v' \in N',\ s \ge 0,\ \|s v_0 + v'\| = R\}$$
$$B = \{w \in M : \|w\| \ge \delta\} \cup \{s v_0 + w : w \in M,\ s \ge 0,\ \|s v_0 + w\| = \delta\}$$

where $0 < \delta < R$. Then $A$ links $B$. To see this, we let

$$Q = \{s v_0 + v' : v' \in N',\ s \ge 0,\ \|s v_0 + v'\| \le R\}$$

and reason as before. For simplicity, we assume that
$\|v_0\| = 1$, $E$ is a Hilbert space and that the splitting
$E = N' \oplus \{v_0\} \oplus M$ is orthogonal. If

$$u = v' + w + s v_0, \quad v' \in N',\ w \in M,\ s \in \mathbb{R}$$ [32]

we define

$$F(u) = v' + \left(s + \delta - \sqrt{\delta^2 - \|w\|^2}\right) v_0, \quad \|w\| \le \delta$$
$$F(u) = v' + (s + \delta)\, v_0, \quad \|w\| > \delta$$

Note that $F|_Q = I$ while $F^{-1}(\delta v_0)$ is precisely the set
$B$. Hence we can conclude via Theorem 7 that $A$
links $B$.
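The map $F$ of Example 3 can be tested in the smallest possible model. The sketch below is an illustration under stated assumptions: $E = \mathbb{R}^3$ with $N'$, $\{v_0\}$, and $M$ spanned by the three standard basis vectors, $\|v_0\| = 1$, and arbitrary sample values $\delta = 1$, $R = 2$. A point $u = v' + s v_0 + w$ is stored as the triple `(vp, s, w)`, and `F` returns the coordinates of $F(u)$ in $N$.

```python
import math

DELTA, R = 1.0, 2.0  # sample values with 0 < DELTA < R


def F(u):
    """The map of Example 3; returns (v' component, v_0 component)."""
    vp, s, w = u
    if abs(w) <= DELTA:
        return (vp, s + DELTA - math.sqrt(DELTA ** 2 - w ** 2))
    return (vp, s + DELTA)


def close(p, q):
    return all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(p, q))


# Sample points of Q: w = 0, s >= 0, norm at most R.
Q_SAMPLES = [(0.0, 0.0, 0.0), (1.5, 0.5, 0.0), (-1.0, 1.0, 0.0), (0.0, 2.0, 0.0)]

# Sample points of B: either s = 0 and |w| >= DELTA, or the half-circle
# s >= 0, s^2 + w^2 = DELTA^2 (both with v' = 0).
B_SAMPLES = [(0.0, 0.0, 1.0), (0.0, 0.0, -3.0), (0.0, 0.0, 7.5)]
B_SAMPLES += [(0.0, DELTA * math.cos(t), DELTA * math.sin(t))
              for t in (-1.5, -0.7, 0.0, 0.3, 1.2, math.pi / 2)]

identity_on_Q = all(close(F(u), (u[0], u[1])) for u in Q_SAMPLES)
B_maps_to_delta_v0 = all(close(F(u), (0.0, DELTA)) for u in B_SAMPLES)
```

A point outside $B$, such as `(0.0, 0.5, 0.0)`, is not sent to $\delta v_0$, consistent with $B$ being the full preimage of that point.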


Example 4 This is the same as Example 3 with $A$
replaced by $A = \partial B_R \cap N$. The proof is the same
with $Q$ replaced by $Q = \bar B_R \cap N$.


Example 5 Let $M, N$ be as in Example 1. Take
$A = \partial B_\delta \cap N$, and let $v_0$ be any element in $\partial B_1 \cap N$.
Take $B$ to be the set of all $u$ of the form

$$u = w + s v_0, \quad w \in M$$

satisfying any of the following:
(i) $\|w\| \le R$, $s = 0$,
(ii) $\|w\| \le R$, $s = 2R_0$, and
(iii) $\|w\| = R$, $0 \le s \le 2R_0$
where $0 < \delta < \min(R, R_0)$. Then $A$ links $B$. To see
this, take $N = \{v_0\} \oplus N'$. Then any $u \in E$ can be
written in the form [32]. Define

$$F(u) = v' + \left(R_0 - \max\left\{\frac{R_0}{R}\|w\|,\ |s - R_0|\right\}\right) v_0$$

and $Q = \bar B_\delta \cap N$. Again we may identify $N$ with
some $\mathbb{R}^n$. Then $F \in C(E, N)$ and $F|_Q = I$. Moreover,
$B = F^{-1}(0)$. Hence, $A$ links $B$ by Theorem 7.


Example 6 Let $M, N$ be as in Example 1. Let $v_0$
be in $\partial B_1 \cap N$ and write $N = \{v_0\} \oplus N'$. Let
$A = \partial B_\delta \cap N$, $Q = \bar B_\delta \cap N$, and

$$B = \{w \in M : \|w\| \le R\} \cup \{w + s v_0 : w \in M,\ s \ge 0,\ \|w + s v_0\| = R\}$$

where $0 < \delta < R$. Then $A$ links $B$. To see this, write
$u = w + v' + s v_0$, $w \in M$, $v' \in N'$, $s \in \mathbb{R}$ and take

$$F(u) = \left(cR - \max\{c\|w + s v_0\|,\ |cR - s|\}\right) v_0 + v'$$

where $c = \delta/(R - \delta)$. Then $F$ is the identity operator
on $Q$, and $F^{-1}(0) = B$. Apply Theorem 7.


Some Applications 


Many elliptic semilinear problems can be described
in the following way. Let $\Omega$ be a domain in $\mathbb{R}^n$, and
let $A$ be a self-adjoint operator on $L^2(\Omega)$. We assume
that $A \ge \lambda_0 > 0$ and that

$$C_0^\infty(\Omega) \subset D := D(A^{1/2}) \subset H^{m,2}(\Omega)$$ [33]

for some $m > 0$, where $C_0^\infty(\Omega)$ denotes the set of test
functions in $\Omega$ (i.e., infinitely differentiable functions
with compact supports in $\Omega$), and $H^{m,2}(\Omega)$ denotes
the Sobolev space. If $m$ is an integer, the norm in
$H^{m,2}(\Omega)$ is given by

$$\|u\|_{m,2} = \Big(\sum_{|\mu| \le m} \|D^\mu u\|^2\Big)^{1/2}$$ [34]

Here $D^\mu$ represents the generic derivative of order
$|\mu|$ and the norm on the right-hand side of [34] is
that of $L^2(\Omega)$. We shall not assume that $m$ is an
integer.

Let $q$ be any number satisfying

$$2 \le q \le 2n/(n - 2m), \quad 2m < n$$
$$2 \le q < \infty, \quad n \le 2m$$

and let $f(x,t)$ be a continuous function on $\Omega \times \mathbb{R}$.
We make the following assumptions.


Assumption A The function $f(x, t)$ satisfies

$$|f(x, t)| \le V_0(x)^q |t|^{q-1} + V_0(x) W_0(x)$$ [35]

and

$$f(x, t)/V_0(x)^q = o(|t|^{q-1}) \ \text{as } |t| \to \infty$$ [36]

where $V_0(x) > 0$ is a function in $L^q(\Omega)$ such that

$$\|V_0 u\|_q \le C\|u\|_D, \quad u \in D$$ [37]

and $W_0$ is a function in $L^{q'}(\Omega)$. Here

$$\|u\|_D = \|A^{1/2}u\|$$ [39]

and $q' = q/(q - 1)$. With the norm [39], $D$ becomes
a Hilbert space. Define $G$ and $F$ by [8] and [2]. It
follows that $G$ is a continuously differentiable
functional on the whole of $D$.

We assume further that

$$H(x,t) = 2F(x,t) - t f(x,t) \ge -W_1(x) \in L^1(\Omega), \quad x \in \Omega,\ t \in \mathbb{R}$$ [40]

and

$$H(x,t) \to \infty \ \text{a.e. as } |t| \to \infty$$ [41]

Moreover, we assume that there are functions
$V(x), W(x) \in L^2(\Omega)$ such that multiplication by
$V(x)$ is a compact operator from $D$ to $L^2(\Omega)$ and

$$F(x,t) \le C\left(V(x)^2 |t|^2 + V(x) W(x) |t|\right)$$ [42]

We wish to obtain a solution of

$$Au = f(x,u), \quad u \in D$$ [43]

By a solution of [43] we shall mean a function $u \in D$
such that

$$(u, v)_D = (f(\cdot, u), v), \quad v \in D$$ [44]

If $f(x, u)$ is in $L^2(\Omega)$, then a solution of [44] is in $D(A)$
and solves [43] in the classical sense. Otherwise we call
it a weak or semistrong solution. We have


Theorem 8 Let $A$ be a self-adjoint operator in
$L^2(\Omega)$ such that $A \ge \lambda_0 > 0$ and [33] holds for some
$m > 0$. Assume that $\lambda_0$ is an eigenvalue of $A$ with
eigenfunction $\varphi_0$. Assume also

$$2F(x,t) \le \lambda_0 t^2, \quad |t| \le \delta \ \text{for some } \delta > 0$$ [45]

and

$$2F(x,t) \ge \lambda_0 t^2 - W_0(x), \quad t > 0,\ x \in \Omega$$ [46]

where $W_0 \in L^1(\Omega)$. Assume that $f(x, t)$ satisfies [35],
[36], [40], [41], and [42]. Then [43] has a solution
$u \not\equiv 0$.


Proof. Under the hypotheses of the theorem, it
is known that the following alternative holds: either

(i) there is an infinite number of $y(x) \in D(A) \setminus \{0\}$
such that

$$Ay = f(x,y) = \lambda_0 y$$ [47]

or

(ii) for each $\rho > 0$ sufficiently small, there is an $\epsilon > 0$
such that

$$G(u) \ge \epsilon, \quad \|u\|_D = \rho$$ [48]

We may assume that option (ii) holds, for otherwise
we are done. By [46] we have

$$G(R\varphi_0) \le R^2\left(\|\varphi_0\|_D^2 - \lambda_0\|\varphi_0\|^2\right) + \int_\Omega W_0(x)\,dx = \int_\Omega W_0(x)\,dx$$

By Theorem 6, there is a sequence satisfying [28].
Taking $\psi(r) = 1/(r + 1)$, we conclude that there is a
sequence $\{u_k\} \subset D$ such that

$$G(u_k) \to c, \quad m_0 \le c \le m_1, \quad (1 + \|u_k\|_D)\,G'(u_k) \to 0$$ [49]

In particular, writing $\rho_k = \|u_k\|_D$, we have

$$\rho_k^2 - 2\int_\Omega F(x, u_k)\,dx \to c$$ [50]

and

$$\rho_k^2 - (f(\cdot, u_k), u_k) \to 0$$ [51]

Consequently,

$$\int_\Omega H(x, u_k)\,dx \to -c$$ [52]

These imply

$$\int_\Omega H(x, u_k)\,dx \le K$$ [53]

If $\rho_k = \|u_k\|_D \to \infty$, let $\tilde u_k = u_k/\rho_k$. Then $\|\tilde u_k\|_D = 1$.
Consequently, there is a renamed subsequence such
that $\tilde u_k \to \tilde u$ weakly in $D$, strongly in $L^2(\Omega)$, and a.e.
in $\Omega$. We have from [42]

$$1 \le (m_1 + \delta)/\rho_k^2 + 2C \int_\Omega \left(V^2 \tilde u_k^2 + V W |\tilde u_k|/\rho_k\right) dx$$

Consequently,

$$1 \le 2C \int_\Omega V(x)^2 \tilde u^2\,dx$$ [54]

This shows that $\tilde u \not\equiv 0$. Let $\Omega_0$ be the subset of $\Omega$ on
which $\tilde u \ne 0$. Then

$$|u_k(x)| = \rho_k|\tilde u_k(x)| \to \infty, \quad x \in \Omega_0$$ [55]

If $\Omega_1 = \Omega \setminus \Omega_0$, then we have

$$\int_\Omega H(x, u_k)\,dx = \int_{\Omega_0} + \int_{\Omega_1} \ge \int_{\Omega_0} H(x, u_k)\,dx - \int_{\Omega_1} W_1(x)\,dx \to \infty$$ [56]

This contradicts [53], and we see that $\rho_k = \|u_k\|_D$ is
bounded. Once we know that the $\rho_k$ are bounded,
we can apply well-known theorems to obtain the
desired conclusion. □


Remark 1 It should be noted that the crucial
element in the proof of Theorem 8 was [51]. If we
had been dealing with an ordinary Palais-Smale
sequence, we could only conclude that

$$\|u_k\|_D^2 - (f(\cdot, u_k), u_k) = o(\rho_k)$$

which would imply only

$$\int_\Omega H(x, u_k)\,dx = o(\rho_k)$$

This would not contradict [56], and the argument
would not go through.
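As another application, we wish to solve

$$-x''(t) = \nabla_x V(t, x(t))$$ [57]

where

$$x(t) = (x_1(t), \ldots, x_n(t))$$ [58]

is a map from $I = [0, 2\pi]$ to $\mathbb{R}^n$ such that each
component $x_j(t)$ is a periodic function in $H^1$ with
period $2\pi$, and the function
$V(t,x) = V(t, x_1, \ldots, x_n)$
is continuous from $\mathbb{R}^{n+1}$ to $\mathbb{R}$ with a gradient

$$\nabla_x V(t,x) = (\partial V/\partial x_1, \ldots, \partial V/\partial x_n) \in C(\mathbb{R}^{n+1}, \mathbb{R}^n)$$ [59]

For each $x \in \mathbb{R}^n$, the function $V(t, x)$ is periodic in $t$
with period $2\pi$. We shall study this problem under
the following assumptions:


1. $0 \le V(t, x) \le C(|x|^2 + 1)$, $t \in I$, $x \in \mathbb{R}^n$.

2. There are constants $m > 0$, $\alpha \le 3m^2/2\pi^2$ such that
$V(t, x) \le \alpha$, $|x| \le m$, $t \in I$, $x \in \mathbb{R}^n$.

3. There are constants $\beta > 1/2$ and $C$ such that
$V(t, x) \ge \beta|x|^2$
when $|x| > C$, $t \in I$, $x \in \mathbb{R}^n$.

4. The function given by

$$H(t,x) = 2V(t,x) - \nabla_x V(t,x) \cdot x$$ [60]

satisfies

$$H(t,x) \le W(t) \in L^1(I), \quad |x| \ge C,\ t \in I,\ x \in \mathbb{R}^n$$ [61]

and

$$H(t,x) \to -\infty \ \text{as } |x| \to \infty$$ [62]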

We have 


Theorem 9 Under the above hypotheses, the
system [57] has a nonconstant solution.


Proof. Let $X$ be the set of vector functions $x(t)$
described above. It is a Hilbert space with norm
satisfying

$$\|x\|_X^2 = \sum_{j=1}^n \|x_j\|_{H^1}^2$$

We also write

$$\|x\|^2 = \sum_{j=1}^n \|x_j\|^2$$

where $\|\cdot\|$ is the $L^2(I)$ norm. Let

$$N = \{x(t) \in X : x_j(t) \equiv \text{constant},\ 1 \le j \le n\}$$

and $M = N^\perp$. The dimension of $N$ is $n$, and
$X = M \oplus N$. The following is easily proved.


Lemma 1 If $x \in M$, then

$$\|x\|_\infty^2 \le \frac{\pi}{6}\|x'\|^2$$

and

$$\|x\| \le \|x'\|$$
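The two inequalities of Lemma 1, read as a sup-norm bound with constant $\pi/6$ and a Wirtinger-type bound, can be checked numerically on a scalar trigonometric polynomial with zero mean. This is a sanity check under those assumptions, not a proof; the coefficients and the grid size are arbitrary, and the $L^2$ norms use the exact Parseval formulas on $[0, 2\pi]$.

```python
import math

# x(t) = sum_k a_k cos(kt) + b_k sin(kt), k = 1..3 (zero mean, so x lies in M).
A_COEF = [1.0, 0.5, -0.3]
B_COEF = [0.0, 0.2, 0.1]


def x(t):
    return sum(a * math.cos(k * t) + b * math.sin(k * t)
               for k, (a, b) in enumerate(zip(A_COEF, B_COEF), start=1))


# Exact L^2(0, 2*pi) norms of x and x' via orthogonality of sines and cosines.
norm_x_sq = math.pi * sum(a * a + b * b for a, b in zip(A_COEF, B_COEF))
norm_dx_sq = math.pi * sum(k * k * (a * a + b * b)
                           for k, (a, b) in enumerate(zip(A_COEF, B_COEF), start=1))

# The maximum over a grid is a lower bound for the true sup norm.
GRID = 20000
sup_x = max(abs(x(2 * math.pi * i / GRID)) for i in range(GRID))

sup_bound_ok = sup_x ** 2 <= (math.pi / 6) * norm_dx_sq
wirtinger_ok = norm_x_sq <= norm_dx_sq
```

Both bounds fail for nonzero constants, which is why the lemma is restricted to $M$, the orthogonal complement of the constant functions.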


We define

$$G(x) = \|x'\|^2 - 2\int_I V(t, x(t))\,dt, \quad x \in X$$ [63]

For each $x \in X$ write $x = v + w$, where $v \in N$, $w \in M$.
For convenience, we shall use the following equivalent
norm for $X$:

$$\|x\|_X^2 = \|w'\|^2 + \|v\|^2$$

If $x \in M$ and

$$\|x'\|^2 = \rho^2 = 6m^2/\pi$$

then Lemma 1 implies that $\|x\|_\infty \le m$, and we have
by Hypothesis 2 that $V(t,x) \le \alpha$.
Hence,

$$G(x) \ge \|x'\|^2 - 2\int_{|x| \le m} \alpha\,dt \ge \rho^2 - 2\alpha(2\pi) \ge 0$$ [64]


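Note that Hypothesis 3 is equivalent to

$$V(t,x) \ge \beta|x|^2 - C, \quad t \in I,\ x \in \mathbb{R}^n$$ [65]

for some constant $C$. Next, let

$$y(t) = v + s w_0$$

where $v \in N$, $s \ge 0$, and

$$w_0 = (\sin t, 0, \ldots, 0)$$

Then $w_0 \in M$, and

$$\|w_0'\|^2 = \|w_0\|^2 = \pi$$

Note that

$$\|y\|^2 = \|v\|^2 + s^2\|w_0\|^2 = 2\pi|v|^2 + \pi s^2$$

Consequently,

$$G(y) = s^2\|w_0'\|^2 - 2\int_I V(t, y(t))\,dt$$
$$\le \pi s^2 - 2\beta \int_I |y(t)|^2\,dt + 4\pi C$$
$$= \pi s^2 - 2\beta\left(\|v\|^2 + \pi s^2\right) + 4\pi C$$
$$= (1 - 2\beta)\pi s^2 - 4\beta\pi|v|^2 + 4\pi C$$
$$\to -\infty \ \text{as } s^2 + |v|^2 \to \infty$$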

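We also note that Hypothesis 1 implies

$$G(v) \le 0, \quad v \in N$$ [66]

Take

$$A = \{v \in N : \|v\| \le R\} \cup \{s w_0 + v : v \in N,\ s \ge 0,\ \|s w_0 + v\|_X = R\}$$
$$B = \partial B_\rho \cap M, \quad 0 < \rho = \sqrt{6}\,m/\sqrt{\pi} < R$$

where

$$B_\sigma = \{x \in X : \|x\|_X < \sigma\}$$

By Example 2, $A$ links $B$. Moreover, if $R$ is
sufficiently large,

$$\sup_A G \le 0 \le \inf_B G$$ [67]

Hence, we may conclude that there is a sequence
$\{x^{(k)}\} \subset X$ such that

$$G(x^{(k)}) \to c \ge 0, \quad (1 + \|x^{(k)}\|_X)\,G'(x^{(k)}) \to 0$$

Hence,

$$G(x^{(k)}) = \|[x^{(k)}]'\|^2 - 2\int_I V(t, x^{(k)}(t))\,dt \to c \ge 0$$ [68]
$$(G'(x^{(k)}), z)/2 = ([x^{(k)}]', z') - \int_I \nabla_x V(t, x^{(k)}) \cdot z(t)\,dt \to 0, \quad z \in X$$ [69]

and

$$(G'(x^{(k)}), x^{(k)})/2 = \|[x^{(k)}]'\|^2 - \int_I \nabla_x V(t, x^{(k)}) \cdot x^{(k)}\,dt \to 0$$ [70]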
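If

$$\rho_k = \|x^{(k)}\|_X \le C$$

then there is a renamed subsequence such that $x^{(k)}$
converges to a limit $x \in X$ weakly in $X$ and
uniformly on $I$. From [69] we see that

$$(G'(x), z)/2 = (x', z') - \int_I \nabla_x V(t, x(t)) \cdot z(t)\,dt = 0, \quad z \in X$$

from which we conclude easily that $x$ is a solution of
[57]. From [68], we see that

$$G(x) = c \ge 0$$

showing that $x(t)$ is not a constant. For if $c > 0$ and
$x \in N$, then

$$G(x) = -2\int_I V(t, x(t))\,dt \le 0$$

If $c = 0$, we see that $x \in B$ by Theorem 6. Hence,
$x \in M$. If

$$\rho_k = \|x^{(k)}\|_X \to \infty$$

let $\tilde x^{(k)} = x^{(k)}/\rho_k$. Then $\|\tilde x^{(k)}\|_X = 1$. Let $x^{(k)} = w^{(k)} + v^{(k)}$,
where $w^{(k)} \in M$ and $v^{(k)} \in N$. There is a renamed
subsequence such that $\tilde x^{(k)}$ converges uniformly in $I$ to
a limit $\tilde x$ and $\|[\tilde x^{(k)}]'\| \to r$ and $\|\tilde v^{(k)}\| \to \tau$, where $r^2 + \tau^2 = 1$.
From [68] and [70], we obtain

$$\|[\tilde x^{(k)}]'\|^2 - 2\int_I V(t, x^{(k)}(t))\,dt/\rho_k^2 \to 0$$

and

$$\|[\tilde x^{(k)}]'\|^2 - \int_I \nabla_x V(t, x^{(k)}) \cdot x^{(k)}\,dt/\rho_k^2 \to 0$$

Thus,

$$2\int_I V(t, x^{(k)}(t))\,dt/\rho_k^2 \to r^2$$ [71]

and

$$\int_I \nabla_x V(t, x^{(k)}) \cdot x^{(k)}\,dt/\rho_k^2 \to r^2$$ [72]

Hence,

$$\int_I H(t, x^{(k)}(t))\,dt/\rho_k^2 \to 0$$ [73]

By Hypothesis 3, the left-hand side of [71] is

$$\ge 2\beta\|\tilde x^{(k)}\|^2 - 4\pi C/\rho_k^2$$

Thus,

$$r^2 \ge 2\beta\tau^2 = 2\beta(1 - r^2)$$

showing that $r > 0$. Hence, $\tilde x(t) \not\equiv 0$. Let $\Omega_0 \subset I$
be the set on which $|\tilde x(t)| \ne 0$. The measure of
$\Omega_0$ is positive. Thus, $|x^{(k)}(t)| \to \infty$ as $k \to \infty$ for
$t \in \Omega_0$. Hence,

$$\int_I H(t, x^{(k)}(t))\,dt \le \int_{\Omega_0} H(t, x^{(k)}(t))\,dt + \int_{I \setminus \Omega_0} W(t)\,dt \to -\infty$$

contrary to Hypothesis 4. Thus, the $\rho_k$ are bounded,
and the proof is complete. □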


Superlinear Problems 
Consider the problem

$$-\Delta u = f(x, u), \quad x \in \Omega; \qquad u = 0 \ \text{on } \partial\Omega$$ [74]

where $\Omega \subset \mathbb{R}^n$ is a bounded domain whose boundary
is a smooth manifold, and $f(x, t)$ is a continuous
function on $\bar\Omega \times \mathbb{R}$. This semilinear Dirichlet problem
has been studied by many authors. It is called
"sublinear" if there is a constant $C$ such that

$$|f(x, t)| \le C(|t| + 1), \quad x \in \Omega,\ t \in \mathbb{R}$$

Otherwise, it is called "superlinear". Assume


($a_1$) There are constants $c_1, c_2 \ge 0$ such that
$|f(x, t)| \le c_1 + c_2|t|^s$
where $0 \le s < (n + 2)/(n - 2)$ if $n > 2$.
($a_2$) $f(x,t) = o(|t|)$ as $t \to 0$.
($a_3$) Either
$F(x,t)/t^2 \to \infty$ as $t \to \infty$
or
$F(x, t)/t^2 \to \infty$ as $t \to -\infty$.


We have


Theorem 10 Under hypotheses ($a_1$)-($a_3$) the
boundary-value problem

$$-\Delta u = \beta f(x,u), \quad x \in \Omega; \qquad u = 0 \ \text{on } \partial\Omega$$ [75]

has a nontrivial solution for almost every positive $\beta$.


Unfortunately, this theorem does not give any
information for any specific $\beta$. It still leaves open the
problem of solving [74]. For this purpose, we add
the assumption

($a_4$) There are constants $\mu > 2$, $r \ge 0$ such that

$$\mu F(x,t) - t f(x,t) \le C(t^2 + 1), \quad |t| \ge r$$ [76]

We have


Theorem 11 Under hypotheses ($a_1$)-($a_4$) problem
[74] has a nontrivial solution.


We also have

Theorem 12 If we replace hypothesis ($a_4$) with
($a_4'$) The function $-H(x,t)$ is convex in $t$,
then the problem [74] has at least one nontrivial
solution.


Weak Linking 


It is not clear if it is possible for $A$ to link $B$ if neither is
contained in a finite-dimensional manifold. For
instance, if $E = M \oplus N$, where $M, N$ are closed
infinite-dimensional subspaces of $E$ and $B_R$ is the ball
centered at the origin of radius $R$ in $E$, it is unknown if
the set $A = M \cap \partial B_R$ links $B = N$. (If either $M$ or $N$ is
finite dimensional, then $A$ does link $B$.) Unfortunately,
this is the situation which arises in some important
applications including Hamiltonian systems, the wave
equation and elliptic systems, to name a few.

We now consider linking when both M and N are 
infinite dimensional and G’ has some additional 
continuity property. A property that is very useful is 
that of weak-to-weak continuity: 


$$u_k \to u \ \text{weakly in } E \implies G'(u_k) \to G'(u) \ \text{weakly}$$ [77]

We make the following definition:


Definition 3 A subset $A$ of a Banach space $E$ links
a subset $B$ of $E$ "weakly" if for every $G \in C^1(E, \mathbb{R})$
satisfying [77] and

$$a_0 := \sup_A G \le b_0 := \inf_B G$$ [78]

there is a sequence $\{u_k\} \subset E$ and a constant $c$ such
that

$$b_0 \le c < \infty$$ [79]

and

$$G(u_k) \to c, \quad G'(u_k) \to 0$$ [80]


We have the following counterpart of Theorem 7. 


Theorem 13 Let $E$ be a separable Hilbert space,
and let $G$ be a continuous functional on $E$ with a
continuous derivative satisfying [77]. Let $N$ be a
closed subspace of $E$, and let $Q$ be a bounded open
subset of $N$ containing the point $p$. Let $F$ be a
continuous map of $E$ onto $N$ such that


(i) $F|_Q = I$, and

(ii) For each finite-dimensional subspace $S \ne \{0\}$ of
$E$ containing $p$, there is a finite-dimensional
subspace $S_0 \ne \{0\}$ of $N$ containing $p$ such that

$$v \in Q \cap S_0,\ w \in S \implies F(v + w) \in S_0$$ [81]

Set $A = \partial Q$, $B = F^{-1}(p)$. If

$$a_1 = \sup_{\bar Q} G < \infty$$ [82]

and [22] holds, then there is a sequence $\{u_k\} \subset E$
such that [24] holds with $a \le a_1$.


Theorem 13 states that if $Q, F, p$ satisfy the
hypotheses of that theorem, then $A = \partial Q$ links
$B = F^{-1}(p)$ weakly. It follows from this theorem
that all sets $A, B$ known to link when one of the
subspaces $M, N$ is finite dimensional will link
weakly even when $M, N$ are both infinite
dimensional.

Now we give some applications of Theorem 13 to
semilinear boundary-value problems. Let $\Omega$ be a
domain in $\mathbb{R}^n$ and let $A$ be a self-adjoint operator in
$L^2(\Omega)$ having 0 in its resolvent set (thus, there is an
interval $(a, b)$ in its resolvent set satisfying
$a < 0 < b$). Let $f(x, t)$ be a continuous function on
$\Omega \times \mathbb{R}$ such that

$$|f(x, t)| \le V(x)^2 |t| + W(x) V(x), \quad x \in \Omega, \ t \in \mathbb{R} \qquad [83]$$

and

$$f(x, t)/t \to a_\pm(x) \quad \text{as } t \to \pm\infty \qquad [84]$$

where $V, W \in L^2(\Omega)$, and multiplication by $V(x) > 0$
is a compact operator from $D = D(|A|^{1/2})$ to
$L^2(\Omega)$. Let

$$M = \int_b^\infty dE(\lambda)\, D, \quad N = \int_{-\infty}^a dE(\lambda)\, D$$

where $\{E(\lambda)\}$ is the spectral measure of $A$. Then $M$, $N$
are invariant subspaces for $A$ and $D = M \oplus N$. If

$$a(u, v) = \int_\Omega (a_+ u^+ - a_- u^-)\, v \, dx \qquad [85]$$

and $a(u) = a(u, u)$, then we assume that

$$a(v) \ge (Av, v), \quad v \in N \qquad [86]$$
$$(Aw, w) \ge a(w), \quad w \in M \qquad [87]$$

We also assume that the only solution of

$$Au = a_+ u^+ - a_- u^- \qquad [88]$$

is $u = 0$, where $u^\pm = \max\{\pm u, 0\}$. We have


Theorem 14 Under the above hypotheses there is
at least one solution of

$$Au = f(x, u), \quad u \in D(A) \qquad [89]$$


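Next, we consider an application concerning
radially symmetric solutions for the problem

$$u_{tt} - \Delta u = f(t, x, u), \quad t \in \mathbb{R}, \ x \in B_R \qquad [90]$$
$$u(t, x) = 0, \quad t \in \mathbb{R}, \ x \in \partial B_R \qquad [91]$$
$$u(t + T, x) = u(t, x), \quad t \in \mathbb{R}, \ x \in B_R \qquad [92]$$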


where $B_R = \{x \in \mathbb{R}^n : |x| < R\}$. We assume that the
ratio $R/T$ is rational. Let

$$8R/T = a/b \qquad [93]$$

where $a$, $b$ are relatively prime positive integers. It
can be shown that

$$n \not\equiv 3 \pmod{(4, a)} \qquad [94]$$

implies that the linear problem corresponding to
[90]-[92] has no essential spectrum. If

$$n \equiv 3 \pmod{(4, a)} \qquad [95]$$

then the essential spectrum of the linear operator
consists of precisely one point

$$\lambda_0 = -(n - 3)(n - 1)/4R^2 \qquad [96]$$
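As a small arithmetic illustration of conditions [93]-[96], the following sketch reduces 8R/T to lowest terms, tests the congruence, and evaluates the point of the essential spectrum when it exists. It assumes that (4, a) denotes the greatest common divisor of 4 and a, which is an interpretation of the notation rather than something stated in the text, and that R and T are supplied as exact rationals. The function name is hypothetical.

```python
from fractions import Fraction
from math import gcd


def essential_spectrum_point(n, R, T):
    """Return lambda_0 from [96] if the congruence [95] holds, else None ([94]).

    Assumption: (4, a) in [94]/[95] is read as gcd(4, a).
    R and T should be ints or Fractions so that 8R/T is an exact rational.
    """
    ratio = Fraction(8) * Fraction(R) / Fraction(T)  # 8R/T = a/b in lowest terms
    a = ratio.numerator
    d = gcd(4, a)
    if (n - 3) % d != 0:
        # n is not congruent to 3 modulo (4, a): no essential spectrum
        return None
    # n is congruent to 3 modulo (4, a): a single point lambda_0
    return -Fraction((n - 3) * (n - 1)) / (4 * Fraction(R) ** 2)
```

For example, with R = 1 and T = 4 one has 8R/T = 2, so a = 2 and the congruence is taken modulo 2: dimension n = 4 falls under [94], while n = 5 falls under [95].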
Consider the case

$$f(t, r, s) = \mu s + p(t, r, s) \qquad [97]$$

where $\mu$ is a point in the resolvent set, $r = |x|$, and

$$|p(t, r, s)| \le C(|s|^\theta + 1), \quad s \in \mathbb{R} \qquad [98]$$

for some number $\theta < 1$. We then have


Theorem 15 If [94] holds, then [90]-[92] have a
weak rotationally invariant solution. If [95] holds
and $\lambda_0 < \mu$, assume in addition that $p(t, r, s)$ is
nondecreasing in $s$. If $\mu < \lambda_0$, assume that $p(t, r, s)$ is
nonincreasing in $s$. Then [90]-[92] have a weak
rotationally invariant solution.
See also: Combinatorics: Overview; Homoclinic
Phenomena; Ljusternik-Schnirelman Theory; Minimax
Principle in the Calculus of Variations.
Physical Motivation and Mathematical
Setting


The primary connection of relativistic quantum field 
theory to experimental physics is through scattering 
theory, that is, the theory of the collision of elementary 
(or compound) particles. It is therefore a central topic 
in quantum field theory and has attracted the attention 
of leading mathematical physicists. Although a great 
deal of progress has been made in the mathematically 
rigorous understanding of the subject, there are 
important matters which are still unclear, some of 
which will be indicated below. 

In the paradigmatic scattering experiment, several 
particles, which are initially sufficiently distant from 
each other that the idealization that they are not 
mutually interacting is physically reasonable, 
approach each other and interact (collide) in a region 
of microscopic extent. The products of this collision 
then fly apart until they are sufficiently well separated 
that the approximation of noninteraction is again 
reasonable. The initial and final states of the objects in 
the scattering experiment are therefore to be modeled 
by states of noninteracting, that is, free, fields, which 
are mathematically represented on Fock space. Typi- 
cally, what is measured in such experiments is the 
probability distribution (cross section) for the transi- 
tions from a specified state of the incoming particles to 
a specified state of the outgoing particles. 


It should be mentioned that until the late 1950s, 
the scattering theory of relativistic quantum particles 
relied upon ideas from nonrelativistic quantum- 
mechanical scattering theory (interaction representa- 
tion, adiabatic limit, etc.), which were invalid in the 
relativistic context. Only with the advent of axio- 
matic quantum field theory did it become possible to 
properly formulate the concepts and mathematical 
techniques which will be outlined here. 

Scattering theory can be rigorously formulated
either in the context of quantum fields satisfying
the Wightman axioms (Streater and Wightman 1964)
or in terms of local algebras satisfying the
Haag-Kastler-Araki axioms (Haag 1992). In brief, the
relation between these two settings may be described
as follows: in the Wightman setting, the theory is
formulated in terms of operator-valued distributions $\phi$
on Minkowski space, the quantum fields, which act on
the physical state space. These fields, integrated with
test functions $f$ having support in a given region $\mathcal{O}$ of
spacetime (only four-dimensional Minkowski space
$\mathbb{R}^4$ will be treated here), $\phi(f) = \int d^4x \, f(x) \phi(x)$, form
under the operations of addition, multiplication, and
Hermitian conjugation a polynomial *-algebra $\mathcal{P}(\mathcal{O})$ of
unbounded operators. In the Haag-Kastler-Araki
setting, one proceeds from these algebras to algebras
$\mathcal{A}(\mathcal{O})$ of bounded operators which, roughly speaking,
are formed by the bounded functions $A$ of the
operators $\phi(f)$. This step requires some mathematical
care, but these subtleties will not be discussed here. As
the statements and proofs of the results in these two
frameworks differ only in technical details, the theory
is presented here in the more convenient setting of
algebras of bounded operators (C*-algebras).

Central to the theory is the notion of a particle, 
which, in fact, is a quite complex concept, the full 
nature of which is not completely understood, cf. 


Scattering in Relativistic Quantum Field Theory: Fundamental Concepts and Tools 457 


below. In order to maintain the focus on the
essential points, we consider in the subsequent
sections primarily a single massive particle of integer
spin $s$, that is, a boson. In standard scattering theory
based upon Wigner’s characterization, this particle
is simply identified with an irreducible unitary
representation $U_1$ of the identity component $\mathcal{P}_+^\uparrow$ of
the Poincaré group with spin $s$ and mass $m > 0$. The
Hilbert space $\mathcal{H}_1$ upon which $U_1(\mathcal{P}_+^\uparrow)$ acts is called
the one-particle space and determines the possible
states of a single particle, alone in the universe.
Assuming that configurations of several such particles
do not interact, one can proceed by a standard
construction to a Fock space describing freely
propagating multiple particle states,

$$\mathcal{H}_F = \bigoplus_{n \in \mathbb{N}_0} \mathcal{H}_n$$

where $\mathcal{H}_0 = \mathbb{C}$ and $\mathcal{H}_n$ is the $n$-fold symmetrized direct
product of $\mathcal{H}_1$ with itself. This space is spanned by
vectors $\Phi_1 \otimes_s \cdots \otimes_s \Phi_n$, where $\otimes_s$ denotes the
symmetrized tensor product, representing an $n$-particle state
wherein the $k$th particle is in the state $\Phi_k \in \mathcal{H}_1$,
$k = 1, \ldots, n$. The representation $U_1(\mathcal{P}_+^\uparrow)$ induces
a unitary representation $U_F(\mathcal{P}_+^\uparrow)$ on $\mathcal{H}_F$ by

$$U_F(\lambda)(\Phi_1 \otimes_s \cdots \otimes_s \Phi_n) = U_1(\lambda)\Phi_1 \otimes_s \cdots \otimes_s U_1(\lambda)\Phi_n \qquad [1]$$
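The structure of the symmetrized n-particle spaces described above can be made concrete in a toy model. The sketch below replaces the one-particle space by a finite-dimensional space with d basis states, which is purely an illustrative assumption (the physical one-particle space is infinite dimensional), and enumerates a basis of each symmetrized n-particle sector. The function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb


def symmetric_basis(d, n):
    """Labels for a basis of the n-fold symmetrized tensor power of C^d.

    Each label is a nondecreasing tuple of one-particle indices; the order of
    the factors is irrelevant after symmetrization, so tuples that differ only
    by a permutation describe the same vector.
    """
    return list(combinations_with_replacement(range(d), n))


def truncated_fock_dimension(d, n_max):
    """Dimension of the direct sum of the sectors n = 0, ..., n_max."""
    return sum(len(symmetric_basis(d, n)) for n in range(n_max + 1))


# The n = 0 sector is one dimensional (the vacuum), and the n-particle
# sector has the bosonic dimension C(d + n - 1, n).
assert len(symmetric_basis(3, 0)) == 1
assert len(symmetric_basis(3, 2)) == comb(3 + 2 - 1, 2)
```

In this toy picture the induced representation of [1] acts on each label by transforming every one-particle factor with the same one-particle unitary, which is why it preserves the symmetrized sectors.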


In interacting theories, the states in the corresponding
physical Hilbert space $\mathcal{H}$ do not have such an a
priori interpretation in physical terms, however. It is
the primary goal of scattering theory to identify in $\mathcal{H}$
those vectors which describe, at asymptotic times,
incoming, respectively, outgoing, configurations of
freely moving particles. Mathematically, this amounts
to the construction of certain specific isometries
(generalized Møller operators), $\Omega^{in}$ and $\Omega^{out}$, mapping
$\mathcal{H}_F$ onto subspaces $\mathcal{H}^{in} \subset \mathcal{H}$ and $\mathcal{H}^{out} \subset \mathcal{H}$,
respectively, and intertwining the unitary actions of the
Poincaré group on $\mathcal{H}_F$ and $\mathcal{H}$. The resulting vectors

$$(\Phi_1 \otimes_s \cdots \otimes_s \Phi_n)^{in/out} = \Omega^{in/out}(\Phi_1 \otimes_s \cdots \otimes_s \Phi_n) \in \mathcal{H} \qquad [2]$$

are interpreted as incoming and outgoing particle
configurations in scattering processes wherein the
$k$th particle is in the state $\Phi_k \in \mathcal{H}_1$.

If, in a theory, the equality $\mathcal{H}^{in} = \mathcal{H}^{out}$ holds, then
every incoming scattering state evolves, after the
collision processes at finite times, into an outgoing
scattering state. It is then physically meaningful to
define on this space of states the scattering matrix,
setting $S = \Omega^{in} \Omega^{out\,*}$. Physical data such as collision
cross sections can be derived from $S$ and the corresponding
transition amplitudes
$\langle (\Phi_1 \otimes_s \cdots \otimes_s \Phi_m)^{in}, (\Phi'_1 \otimes_s \cdots \otimes_s \Phi'_n)^{out} \rangle$,
respectively, by a standard procedure.
It should be noted, however, that neither the
above physically mandatory equality of state spaces nor
the more stringent requirement that every state has an
interpretation in terms of incoming and outgoing
scattering states, that is, $\mathcal{H} = \mathcal{H}^{in} = \mathcal{H}^{out}$ (asymptotic
completeness), has been fully established in any
interacting relativistic field theoretic model so far. This
intriguing problem will be touched upon in the last
section of this article.

Before going into details, let us state the few
physically motivated postulates entering into the
analysis. As discussed, the point of departure is a
family of algebras $\mathcal{A}(\mathcal{O})$, more precisely a net,
associated with the open subregions $\mathcal{O}$ of Minkowski
space and acting on $\mathcal{H}$. Restricting attention
to the case of bosons, we may assume that this net is
local in the sense that if $\mathcal{O}_1$ is spacelike separated
from $\mathcal{O}_2$, then all elements of $\mathcal{A}(\mathcal{O}_1)$ commute with
all elements of $\mathcal{A}(\mathcal{O}_2)$. (In the presence of fermions,
these algebras contain also fermionic operators
which anticommute.) This is the mathematical
expression of the principle of Einstein causality.
The unitary representation $U$ of $\mathcal{P}_+^\uparrow$ acting on $\mathcal{H}$ is
assumed to satisfy the relativistic spectrum condition
(positivity of energy in all Lorentz frames) and, in
the sense of equality of sets,
$U(\lambda)\mathcal{A}(\mathcal{O})U(\lambda)^{-1} = \mathcal{A}(\lambda\mathcal{O})$
for all $\lambda \in \mathcal{P}_+^\uparrow$ and regions $\mathcal{O}$, where $\lambda\mathcal{O}$
denotes the Poincaré transformed region. It is also
assumed that the subspace of $U(\mathcal{P}_+^\uparrow)$-invariant
vectors is spanned by a single unit vector $\Omega$,
representing the vacuum, which has the Reeh-Schlieder
property, that is, each set of vectors
$\mathcal{A}(\mathcal{O})\Omega$ is dense in $\mathcal{H}$. These standing assumptions
will subsequently be amended by further conditions
concerning the particle content of the theory.


Haag-Ruelle Theory 


Haag and Ruelle were the first to establish the 
existence of scattering states within this general 
framework (Jost 1965); further substantial improve- 
ments are due to Araki and Hepp (Araki 1999). In all 
of these investigations, the arguments were given for 
quantum field theories with associated particles (in 
the Wigner sense) which have strictly positive mass 
$m > 0$ and for which $m$ is an isolated eigenvalue of
the mass operator (upper and lower mass gap). 
Moreover, it was assumed that states of a single 
particle can be created from the vacuum by local 
operations. In physical terms, these assumptions 
allow only for theories with short-range interactions 
and particles carrying strictly localizable charges. 

In view of these limitations, Haag—Ruelle theory 
has been developed in a number of different 
directions. By now, the scattering theory of massive 
particles is under complete control, including also 
particles carrying nonlocalizable (gauge or topological)
charges and particles having exotic statistics
(anyons, plektons) which can appear in theories in 
low spacetime dimensions. Due to constraints of 
space, these results must go without further men- 
tion; we refer the interested reader to the articles 
Buchholz and Fredenhagen (1982) and Fredenhagen 
et al. (1996). Theories of massless particles and of 
particles carrying charges of electric or magnetic 
type (infraparticles) will be discussed in subsequent 
sections. 

We outline here a recent generalization of Haag- 
Ruelle scattering theory presented in Dybalski 
(2005), which covers massive particles with localiz- 
able charges without relying on any further con- 
straints on the mass spectrum. In particular, the 
scattering of electrically neutral, stable particles 
fulfilling a sharp dispersion law in the presence of 
massless particles is included (e.g., neutral atoms in 
their ground states). Mathematically, this assumption
can be expressed by the requirement that there
exists a subspace $\mathcal{H}_1 \subset \mathcal{H}$ such that the restriction of
$U(\mathcal{P}_+^\uparrow)$ to $\mathcal{H}_1$ is a representation of mass $m > 0$. We
denote by $P_1$ the projection in $\mathcal{H}$ onto $\mathcal{H}_1$.

To establish notation, let $\mathcal{O}$ be a bounded spacetime
region and let $A \in \mathcal{A}(\mathcal{O})$ be any operator such
that $P_1 A \Omega \ne 0$. The existence of such localized (in
brief, local) operators amounts to the assumption
that the particle carries a localizable charge. That
the particle is stable, that is, completely decouples
from the underlying continuum states, can be cast
into a condition first stated by Herbst: for all
sufficiently small $\mu > 0$

$$\|E_\mu (1 - P_1) A \Omega\| \le c \mu^\eta \qquad [3]$$

for some constants $c, \eta > 0$, where $E_\mu$ is the projection
onto the spectral subspace of the mass operator
corresponding to spectrum in the interval $(m - \mu, m + \mu)$.
In the case originally considered by Haag
and Ruelle, where $m$ is isolated from the rest of the
mass spectrum, this condition is certainly satisfied.

Setting $A(x) = U(x) A U(x)^{-1}$, where $U(x)$ is the
unitary implementing the spacetime translation
$x = (x_0, \mathbf{x})$ (the velocity of light and Planck’s
constant are set equal to 1 in what follows), one
puts, for $t \ne 0$,

$$A_t(f) = \int d^4x \, g_t(x_0) \, f_{x_0}(\mathbf{x}) \, A(x) \qquad [4]$$

Here $x_0 \mapsto g_t(x_0) = g((x_0 - t)/|t|^\kappa)/|t|^\kappa$ induces a
time averaging about $t$, $g$ being any test function
which integrates to 1 and whose Fourier transform
has compact support, and $1/(1 + \eta) < \kappa < 1$ with $\eta$
as above. The Fourier transform of $f_{x_0}$ is given by
$\tilde{f}_{x_0}(\mathbf{p}) = \tilde{f}(\mathbf{p}) e^{-i x_0 \omega(\mathbf{p})}$, where $f$ is some test function
on $\mathbb{R}^3$ with $\tilde{f}(\mathbf{p})$ having compact support, and
$\omega(\mathbf{p}) = (\mathbf{p}^2 + m^2)^{1/2}$. Note that $(x_0, \mathbf{x}) \mapsto f_{x_0}(\mathbf{x})$ is a
solution of the Klein-Gordon equation of mass $m$.
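The closing remark about the Klein-Gordon equation can be checked numerically on a single Fourier mode. The sketch below is only an illustration under simplifying assumptions: one space dimension instead of three, a single plane wave instead of a wave packet, and second derivatives approximated by central finite differences. The function names and the step size are hypothetical choices.

```python
import cmath
import math


def omega(p, m):
    """Relativistic dispersion law: omega(p) = sqrt(p^2 + m^2)."""
    return math.sqrt(p * p + m * m)


def plane_wave(x0, x, p, m):
    """One Fourier mode exp(i (p x - omega(p) x0)) in one space dimension."""
    return cmath.exp(1j * (p * x - omega(p, m) * x0))


def klein_gordon_residual(x0, x, p, m, h=1e-3):
    """Finite-difference value of (d^2/dx0^2 - d^2/dx^2 + m^2) applied to the mode."""
    f0 = plane_wave(x0, x, p, m)
    d2t = (plane_wave(x0 + h, x, p, m) - 2 * f0 + plane_wave(x0 - h, x, p, m)) / h**2
    d2x = (plane_wave(x0, x + h, p, m) - 2 * f0 + plane_wave(x0, x - h, p, m)) / h**2
    return d2t - d2x + m * m * f0
```

The residual vanishes up to discretization error because the time derivatives contribute a factor of minus omega squared, the space derivatives a factor of minus p squared, and these cancel against the mass term by the dispersion law. A superposition of such modes, weighted by a compactly supported function of momentum, inherits the property.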

With these assumptions, it follows by a straightforward
application of the harmonic analysis of
unitary groups that in the sense of strong convergence
$A_t(f)\Omega \to P_1 A(f)\Omega$ and $A_t(f)^*\Omega \to 0$ as $t \to \pm\infty$,
where $A(f) = \int d^3x \, f(\mathbf{x}) A(0, \mathbf{x})$. Hence, the operators
$A_t(f)$ may be thought of as creation operators
and their adjoints as annihilation operators. These
operators are the basic ingredients in the construction
of scattering states. Choosing local operators
$A_k$ as above and test functions $f^{(k)}$ with disjoint
compact supports in momentum space,
$k = 1, \ldots, n$, the scattering states are obtained as
limits of the Haag-Ruelle approximants

$$A_{1,t}(f^{(1)}) \cdots A_{n,t}(f^{(n)})\Omega \qquad [5]$$

Roughly speaking, the operators $A_{k,t}(f^{(k)})$ are localized
in spacelike separated regions at asymptotic
times $t$, due to the support properties of the Fourier
transforms of the functions $f^{(k)}$. Hence they commute
asymptotically because of locality and, by the
clustering properties of the vacuum state, the above
vector becomes a product state of single-particle
states. In order to prove convergence, one proceeds,
in analogy to Cook’s method in quantum-mechanical
scattering theory, to the time derivatives,

$$\frac{d}{dt} A_{1,t}(f^{(1)}) \cdots A_{n,t}(f^{(n)})\Omega$$
$$= \sum_{k \ne l} A_{1,t}(f^{(1)}) \cdots [\dot{A}_{k,t}(f^{(k)}), A_{l,t}(f^{(l)})] \cdots A_{n,t}(f^{(n)})\Omega$$
$$+ \sum_k A_{1,t}(f^{(1)}) \cdots \overset{k}{\vee} \cdots A_{n,t}(f^{(n)}) \, \dot{A}_{k,t}(f^{(k)})\Omega \qquad [6]$$

where the dot denotes the derivative with respect to $t$ and
$\overset{k}{\vee}$ denotes omission of $A_{k,t}(f^{(k)})$. Employing
techniques of Araki and Hepp, one can prove that
the terms in the first summation on the right-hand
side (RHS) of [6], involving commutators, decay
rapidly in norm as $t$ approaches infinity because of
locality, as indicated above. By applying condition
[3] and the fact that the vectors $\dot{A}_{k,t}(f^{(k)})\Omega$ do not
have a component in the single-particle space $\mathcal{H}_1$,
the terms in the second summation on the RHS of
[6] can be shown to decay in norm like $|t|^{-\kappa(1+\eta)}$.
Thus, the norm of the vector [6] is integrable in $t$,
implying the existence of the strong limits

$$(P_1 A_1(f^{(1)})\Omega \otimes_s \cdots \otimes_s P_1 A_n(f^{(n)})\Omega)^{in/out} = \lim_{t \to \mp\infty} A_{1,t}(f^{(1)}) \cdots A_{n,t}(f^{(n)})\Omega \qquad [7]$$
As indicated by the notation, these limits depend
only on the single-particle vectors $P_1 A_k(f^{(k)})\Omega \in \mathcal{H}_1$,
$k = 1, \ldots, n$, but not on the specific choice of
operators and test functions. In order to establish
their Fock structure, one employs results on clustering
properties of vacuum correlation functions in
theories without strictly positive minimal mass.
Using this, one can compute inner products of
arbitrary asymptotic states and verify that the maps

$$(P_1 A_1(f^{(1)})\Omega \otimes_s \cdots \otimes_s P_1 A_n(f^{(n)})\Omega) \mapsto (P_1 A_1(f^{(1)})\Omega \otimes_s \cdots \otimes_s P_1 A_n(f^{(n)})\Omega)^{in/out} \qquad [8]$$

extend by linearity to isomorphisms $\Omega^{in/out}$ from the
Fock space $\mathcal{H}_F$ onto the subspaces $\mathcal{H}^{in/out} \subset \mathcal{H}$
generated by the collision states. Moreover, the
asymptotic states transform under the Poincaré
transformations $U(\mathcal{P}_+^\uparrow)$ as

$$U(\lambda)(P_1 A_1(f^{(1)})\Omega \otimes_s \cdots \otimes_s P_1 A_n(f^{(n)})\Omega)^{in/out}$$
$$= (U_1(\lambda) P_1 A_1(f^{(1)})\Omega \otimes_s \cdots \otimes_s U_1(\lambda) P_1 A_n(f^{(n)})\Omega)^{in/out} \qquad [9]$$

Thus, the isomorphisms $\Omega^{in/out}$ intertwine the action
of the Poincaré group on $\mathcal{H}_F$ and $\mathcal{H}^{in/out}$. We
summarize these results, which are vital for the
physical interpretation of the underlying theory, in
the following theorem.


Theorem 1 Consider a theory of a particle of mass
$m > 0$ which satisfies the standing assumptions and
the stability condition [3]. Then there exist canonical
isometries $\Omega^{in/out}$, mapping the Fock space $\mathcal{H}_F$
based on the single-particle space $\mathcal{H}_1$ onto subspaces
$\mathcal{H}^{in/out} \subset \mathcal{H}$ of incoming and outgoing scattering
states. Moreover, these isometries intertwine the
action of the Poincaré transformations on the
respective spaces.


Since the scattering states have been identified
with Fock space, asymptotic creation and annihilation
operators act on $\mathcal{H}^{in/out}$ in a natural manner.
This point will be explained in the following section.


LSZ Formalism 


Prior to the results of Haag and Ruelle, an axiomatic 
approach to scattering theory was developed by 
Lehmann, Symanzik, and Zimmermann (LSZ), 
based on time-ordered vacuum expectation values 
of quantum fields. The relative advantage of their 
approach with respect to Haag—Ruelle theory is that 


useful reduction formulas for the S-matrix greatly 
facilitate computations, in particular in perturba- 
tion theory. Moreover, these formulas are the 
starting point of general studies of the momentum 
space analyticity properties of the S-matrix (disper- 
sion relations), as outlined in Dispersion Relations 
(cf. also Iagolnitzer (1993)). Within the present 
general setting, the LSZ method was established by 
Hepp. 

For simplicity of discussion, we consider again a
single particle type of mass $m > 0$ and integer spin $s$,
subject to condition [3]. According to the results of
the preceding section, one then can consistently
define asymptotic creation operators on the scattering
states, setting

$$A(f)^{in/out} (P_1 A_1(f^{(1)})\Omega \otimes_s \cdots \otimes_s P_1 A_n(f^{(n)})\Omega)^{in/out}$$
$$= \lim_{t \to \mp\infty} A_t(f) (P_1 A_1(f^{(1)})\Omega \otimes_s \cdots \otimes_s P_1 A_n(f^{(n)})\Omega)^{in/out}$$
$$= (P_1 A(f)\Omega \otimes_s P_1 A_1(f^{(1)})\Omega \otimes_s \cdots \otimes_s P_1 A_n(f^{(n)})\Omega)^{in/out} \qquad [10]$$

Similarly, one obtains the corresponding asymptotic
annihilation operators,

$$A(f)^{in/out\,*} (P_1 A_1(f^{(1)})\Omega \otimes_s \cdots \otimes_s P_1 A_n(f^{(n)})\Omega)^{in/out}$$
$$= \lim_{t \to \mp\infty} A_t(f)^* (P_1 A_1(f^{(1)})\Omega \otimes_s \cdots \otimes_s P_1 A_n(f^{(n)})\Omega)^{in/out} = 0 \qquad [11]$$

where the latter equality holds if the Fourier transforms
of the functions $f, f^{(1)}, \ldots, f^{(n)}$ have disjoint
supports. We mention as an aside that, by replacing
the time-averaging function $g$ in the definition of
$A_t(f)$ by a delta function, the above formulas still
hold. But the convergence is then to be understood
in the weak Hilbert space topology. In this form, the
above relations were anticipated by LSZ (asymptotic
condition).

It is straightforward to proceed from these
relations to reduction formulas. Let $B$ be any local
operator. Then one has, in the sense of matrix
elements between outgoing and incoming scattering
states,

$$B A(f)^{in} - A(f)^{out} B = \lim_{t \to \infty} \left( B A_{-t}(f) - A_t(f) B \right)$$
$$= \lim_{t \to \infty} \left( \int d^4x \, f_{-t}(x) \, B A(x) - \int d^4x \, f_t(x) \, A(x) B \right) \qquad [12]$$

where $f_t(x) = g_t(x_0) f_{x_0}(\mathbf{x})$. Because of the (essential)
support properties of the functions $f_{\pm t}$, the contributions
to the latter integrals arise, for asymptotic $t$,
from spacetime points $x$ where the localization
regions of $A(x)$ and $B$ have a negative timelike (first
term), respectively, positive timelike (second term)
distance. One may therefore proceed from the
products of these operators to the time-ordered
products $T(BA(x))$, where $T(BA(x)) = A(x)B$ if the
localization region of $A(x)$ lies in the future of that
of $B$, and $T(BA(x)) = BA(x)$ if it lies in the past. It is
noteworthy that a precise definition of the time
ordering for finite $x$ is irrelevant in the present
context; any reasonable interpolation between the
above relations will do. Similarly, one can define
time-ordered products for an arbitrary number of
local operators. The preceding limit can then be
recast into

$$\lim_{t \to \infty} \int d^4x \, (f_{-t}(x) - f_t(x)) \, T(BA(x)) \qquad [13]$$

The latter expression has a particularly simple form in
momentum space. Proceeding to the Fourier transforms
of $f_{\pm t}$ and noticing that, in the limit of large $t$,

$$(\tilde{f}_{-t}(p) - \tilde{f}_t(p)) / (p_0 - \omega(\mathbf{p})) \to -2\pi i \, \tilde{f}(\mathbf{p}) \, \delta(p_0 - \omega(\mathbf{p})) \qquad [14]$$


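one gets

$$B A(f)^{in} - A(f)^{out} B = -2\pi i \int d^3p \, \tilde{f}(\mathbf{p}) \, (p_0 - \omega(\mathbf{p})) \, \tilde{T}(BA(-p)) \Big|_{p_0 = \omega(\mathbf{p})} \qquad [15]$$

Here $\tilde{T}(BA(p))$ denotes the Fourier transform of
$T(BA(x))$, and it can be shown that the restriction of
$(p_0 - \omega(\mathbf{p})) \tilde{T}(BA(-p))$ to the manifold $\{p \in \mathbb{R}^4 : p_0 = \omega(\mathbf{p})\}$
(the “mass shell”) is meaningful in the sense of
distributions on $\mathbb{R}^3$. By the same token, one obtains

$$A(f)^{out\,*} B - B A(f)^{in\,*} = -2\pi i \int d^3p \, \overline{\tilde{f}(\mathbf{p})} \, (p_0 - \omega(\mathbf{p})) \, \tilde{T}(A^*(p)B) \Big|_{p_0 = \omega(\mathbf{p})} \qquad [16]$$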
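The large-time limit [14] rests on the distributional statement that sin(qt)/q tends to pi times the delta function as t grows, with q playing the role of the distance from the mass shell. The sketch below checks this numerically. It is an illustration under stated assumptions: the test function is a Gaussian (a convenient choice, not one used in the text), the time-averaging function g is ignored, and the integral is approximated by a midpoint rule on a truncated interval. For the Gaussian the smeared kernel is known in closed form, namely erf(t/2), which the code uses as a reference value. The function name and grid parameters are hypothetical.

```python
import math


def smeared_kernel(t, half_width=8.0, steps=4000):
    """Midpoint-rule value of the integral of exp(-q^2) * sin(q t) / (pi q) over q.

    With an even number of steps on a symmetric interval, no midpoint
    coincides with q = 0, so the quotient sin(q t) / q is always defined.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        q = -half_width + (i + 0.5) * h
        total += math.exp(-q * q) * math.sin(q * t) / (math.pi * q)
    return total * h


# The value approaches exp(-0^2) = 1, the test function evaluated at q = 0.
for t in (1.0, 4.0, 10.0, 40.0):
    print(t, smeared_kernel(t), math.erf(t / 2.0))
```

As t increases the printed values approach 1, which is the Gaussian evaluated on the "mass shell" q = 0; this is the mechanism by which the restriction to the mass shell appears in [15] and [16].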

Similar relations, involving an arbitrary number of 
asymptotic creation and annihilation operators, can 
be established by analogous considerations. Taking 
matrix elements of these relations in the vacuum state 
and recalling the action of the asymptotic creation 
and annihilation operators on scattering states, one 
arrives at the following result, which is central in all 
applications of scattering theory. 


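Theorem 2 Consider the theory of a particle of
mass $m > 0$ subject to the conditions stated in the
preceding sections and let $f^{(1)}, \ldots, f^{(n)}$ be any family
of test functions whose Fourier transforms have
compact and nonoverlapping supports. Then

$$\langle (P_1 A_1(f^{(1)})\Omega \otimes_s \cdots \otimes_s P_1 A_k(f^{(k)})\Omega)^{out}, (P_1 A_{k+1}(f^{(k+1)})\Omega \otimes_s \cdots \otimes_s P_1 A_n(f^{(n)})\Omega)^{in} \rangle$$
$$= (-2\pi i)^n \int \cdots \int d^3p_1 \cdots d^3p_n \, \overline{\tilde{f}^{(1)}(\mathbf{p}_1)} \cdots \overline{\tilde{f}^{(k)}(\mathbf{p}_k)} \, \tilde{f}^{(k+1)}(\mathbf{p}_{k+1}) \cdots \tilde{f}^{(n)}(\mathbf{p}_n)$$
$$\times \prod_{i=1}^n (p_{i0} - \omega(\mathbf{p}_i)) \, \langle \Omega, \tilde{T}(A_1^*(p_1) \cdots A_k^*(p_k) A_{k+1}(-p_{k+1}) \cdots A_n(-p_n))\Omega \rangle \Big|_{p_{i0} = \omega(\mathbf{p}_i)} \qquad [17]$$

in an obvious notation.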


Thus, the kernels of the scattering amplitudes in
momentum space are obtained by restricting the (by
the factor $\prod_{i=1}^n (p_{i0} - \omega(\mathbf{p}_i))$) amputated Fourier
transforms of the vacuum expectation values of the
time-ordered products to the positive and negative
mass shells, respectively. These are the famous LSZ
reduction formulas, which provide a convenient link
between the time-ordered (Green’s) functions of a
theory and its asymptotic particle interpretation.


Asymptotic Particle Counters 


The preceding construction of scattering states 
applies to a significant class of theories; but even if 
one restricts attention to the case of massive 
particles, it does not cover all situations of physical 
interest. For an essential input in the construction is 
the existence of local operators interpolating 
between the vacuum and the single-particle states. 
There may be no such operators at one’s disposal, 
however, either because the particle in question 
carries a nonlocalizable charge, or because the given 
family of operators is too small. The latter case 
appears, for example, in gauge theories, where in 
general only the observables are fixed by the 
principle of local gauge invariance, and the physical 
particle content as well as the corresponding inter- 
polating operators are not known from the outset. 
As observables create from the vacuum only neutral 
states, the above construction of scattering states 
then fails if charged particles are present. Never- 
theless, thinking in physical terms, one would expect 
that the observables contain all relevant information 
in order to determine the features of scattering 
states, in particular their collision cross section. That 
this is indeed the case was first shown by Araki and 
Haag (Araki 1999). 

In scattering experiments, the measured data are
provided by detectors (e.g., particle counters) and
coincidence arrangements of detectors. Essential
features of detectors are their lack of response in
the vacuum state and their macroscopic localization.
Hence, within the present mathematical setting, a
general detector is represented by a positive operator
$C$ on the physical Hilbert space $\mathcal{H}$ such that $C\Omega = 0$.
Because of the Reeh-Schlieder theorem, these conditions
cannot be satisfied by local operators.
However, they can be fulfilled by “almost-local”
operators. Examples of such operators are easy to
produce, putting $C = L^*L$ with

$$L = \int d^4x \, f(x) A(x) \qquad [18]$$

where $A$ is any local operator and $f$ any test function
whose Fourier transform has compact support in the
complement of the closed forward light cone (and
hence in the complement of the energy momentum
spectrum of the theory). In view of the properties of
$f$ and the invariance of $\Omega$ under translations, it
follows that $C = L^*L$ annihilates the vacuum and
can be approximated with arbitrary precision by
local operators. The algebra generated by these
operators $C$ will be denoted by $\mathcal{C}$.

When preparing a scattering experiment, the first
thing one must do with a detector is to calibrate it,
that is, test its response to sources of single-particle
states. Within the mathematical setting, this
amounts to computing the matrix elements of $C$ in
states $\Phi \in \mathcal{H}_1$:

$$\langle \Phi, C\Phi \rangle = \int\!\!\int d^3p \, d^3q \, \overline{\Phi(\mathbf{p})} \, \Phi(\mathbf{q}) \, \langle \mathbf{p}|C|\mathbf{q} \rangle \qquad [19]$$

Here $\mathbf{p} \mapsto \Phi(\mathbf{p})$ is the momentum space wave function
of $\Phi$, $\langle \cdot |C| \cdot \rangle$ is the kernel of $C$ in the
single-particle space $\mathcal{H}_1$, and we have omitted (summations
over) indices labeling internal degrees of freedom of
the particle, if any. The relevant information about
$C$ is encoded in its kernel. As a matter of fact, one
only needs to know its restriction to the diagonal,
$\mathbf{p} \mapsto \langle \mathbf{p}|C|\mathbf{p} \rangle$. It is called the sensitivity function of $C$
and can be shown to be regular under quite general
circumstances (Araki 1999, Buchholz and Fredenhagen
1982).

Given a state $\Psi \in \mathcal{H}$ for which the expectation
value $\langle \Psi, C(x)\Psi \rangle$ differs significantly from 0, one
concludes that this state deviates from the vacuum
in a region about $x$. For finite $x$, this does not mean,
however, that $\Psi$ has a particle interpretation at $x$.
For that spacetime point may, for example, be just
the location of a collision center. Yet, if one
proceeds to asymptotic times, one expects, in view
of the spreading of wave packets, that the probability
of finding two or more particles in the same
spacetime region is dominated by the single-particle
contributions. It is this physical insight which
justifies the expectation that the detectors $C(x)$
become particle counters at asymptotic times.
Accordingly, one considers for asymptotic $t$ the
operators

$$C_t(h) = \int d^3x \, h(\mathbf{x}/t) \, C(t, \mathbf{x}) \qquad [20]$$

where $h$ is any test function on $\mathbb{R}^3$. The role of the
integral is to sum up all single-particle contributions
with velocities in the support of $h$ in order to
compensate for the decreasing probability of finding
such particles at asymptotic times $t$ about the
localization center of the detector. That these ideas
are consistent was demonstrated by Araki and Haag,
who established the following result (Araki 1999).


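Theorem 3 Consider, as before, the theory of a
massive particle. Let $C^{(1)}, \ldots, C^{(n)} \in \mathcal{C}$ be any family
of detector operators and let $h^{(1)}, \ldots, h^{(n)}$ be any
family of test functions on $\mathbb{R}^3$. Then, for any state
$\Psi^{out} \in \mathcal{H}^{out}$ of finite energy,

$$\lim_{t \to \infty} \langle \Psi^{out}, C_t^{(1)}(h^{(1)}) \cdots C_t^{(n)}(h^{(n)}) \Psi^{out} \rangle$$
$$= \int \cdots \int d^3p_1 \cdots d^3p_n \, \langle \Psi^{out}, \rho^{out}(\mathbf{p}_1) \cdots \rho^{out}(\mathbf{p}_n) \Psi^{out} \rangle$$
$$\times \prod_{k=1}^n h^{(k)}(\mathbf{p}_k/\omega(\mathbf{p}_k)) \, \langle \mathbf{p}_k|C^{(k)}|\mathbf{p}_k \rangle \qquad [21]$$

where $\rho^{out}(\mathbf{p})$ is the momentum space density (the
product of creation and annihilation operators) of
outgoing particles of momentum $\mathbf{p}$, and (summations
over) possible indices labeling internal degrees
of freedom of the particle are omitted. An analogous
relation holds for incoming scattering states at
negative asymptotic times.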


This result shows, first of all, that the scattering
states have indeed the desired interpretation with
regard to the observables, as anticipated in the
preceding sections. Since the assertion holds for all
scattering states of finite energy, one may replace in the
above theorem the outgoing scattering states by any
state of finite energy, if the theory is asymptotically
complete, that is, $\mathcal{H} = \mathcal{H}^{in} = \mathcal{H}^{out}$. Then choosing, in
particular, any incoming scattering state and making
use of the arbitrariness of the test functions $h^{(k)}$ as well
as the knowledge of the sensitivity functions of the
detector operators, one can compute the probability
distributions of outgoing particle momenta in this state,
and thereby the corresponding collision cross sections.

The question of how to construct certain specific 
incoming scattering states by using only local 
observables was not settled by Araki and Haag, 
however. A general method to that effect was
outlined in Buchholz et al. (1991). As a matter of 
fact, for that method only the knowledge of states in 
the subspace of neutral states is required. Yet in this 
approach one would need for the computation of, 
say, elastic collision cross sections of charged 
particles the vacuum correlation functions involving 
at least eight local observables. This practical 
disadvantage of increased computational complexity 
of the method is offset by the conceptual advantage 
of making no appeal to quantities which are a priori 
nonobservable. 


Massless Particles 
and Huygens’ Principle 


The preceding general methods of scattering theory 
apply only to massive particles. Yet taking advan- 
tage of the salient fact that massless particles always 
move with the speed of light, Buchholz succeeded in 
establishing a scattering theory also for such 
particles (Haag 1992). Moreover, his arguments 
lead to a quantum version of Huygens’ principle. 

As in the case of massive particles, one assumes
that there is a subspace $\mathcal{H}_1 \subset \mathcal{H}$ corresponding to a
representation of $U(\mathcal{P}_+^\uparrow)$ of mass $m = 0$ and, for
simplicity, integer helicity; moreover, there must
exist local operators interpolating between the
vacuum and the single-particle states. These
assumptions cover, in particular, the important
examples of the photon and of Goldstone particles.
Picking any suitable local operator $A$ interpolating
between $\Omega$ and some vector in $\mathcal{H}_1$, one sets, in
analogy to [4],

$$A_t = \int d^4x \, g_t(x_0) \, (-1/2\pi) \, \varepsilon(x_0) \, \delta(x_0^2 - \mathbf{x}^2) \, \partial_0 A(x) \qquad [22]$$

Here $g_t(x_0) = (1/|\ln t|) \, g((x_0 - t)/|\ln t|)$ with $g$ as in
[4], and the solution of the Klein-Gordon equation
in [4] has been replaced by the fundamental solution
of the wave equation; furthermore, $\partial_0 A(x)$ denotes
the derivative of $A(x)$ with respect to $x_0$. Then, once
again, the strong limit of $A_t\Omega$ as $t \to \pm\infty$ is $P_1 A\Omega$,
with $P_1$ the projection onto $\mathcal{H}_1$.

In order to establish the convergence of $A_t$ as in
the LSZ approach, one now uses the fact that these
operators are, at asymptotic times $t$, localized in the
complement of some forward, respectively, backward,
light cone. Because of locality, they therefore
commute with all operators which are localized in
the interior of the respective cones. More specifically,
let $\mathcal{O} \subset \mathbb{R}^4$ be the localization region of $A$ and
let $\mathcal{O}_\pm \subset \mathbb{R}^4$ be the two regions having a positive,
respectively, negative, timelike distance from all
points in $\mathcal{O}$. Then, for any operator $B$ which is
compactly localized in $\mathcal{O}_\pm$, respectively, one obtains
$\lim_{t \to \pm\infty} A_t B\Omega = \lim_{t \to \pm\infty} B A_t\Omega = B P_1 A\Omega$. This
relation establishes the existence of the limits

$$A^{in/out} = \lim_{t \to \mp\infty} A_t \qquad [23]$$

on the (by the Reeh-Schlieder property) dense sets of
vectors $\{B\Omega : B \in \mathcal{A}(\mathcal{O}_\mp)\} \subset \mathcal{H}$. It requires some
more detailed analysis to prove that the limits have
all of the properties of a (smeared) free massless
field, whose translates $x \mapsto A^{in/out}(x)$ satisfy the wave
equation and have c-number commutation relations.
From these free fields, one can then proceed to
asymptotic creation and annihilation operators and
construct asymptotic Fock spaces $\mathcal{H}^{in/out} \subset \mathcal{H}$ of
massless particles and a corresponding scattering
matrix as in the massive case. The details of this
construction can be found in the original article, cf.
Haag (1992).

It also follows from these arguments that the
asymptotic fields $A^{in/out}$ of massless particles emanating
from a region $\mathcal{O}$, that is, for which the
underlying interpolating operators $A$ are localized in
$\mathcal{O}$, commute with all operators localized in $\mathcal{O}_\mp$,
respectively. This result may be understood as an
expression of Huygens’ principle. More precisely,
denoting by $\mathcal{A}^{in/out}(\mathcal{O})$ the algebras of bounded
operators generated by the asymptotic fields $A^{in/out}$,
respectively, one arrives at the following quantum
version of Huygens’ principle.


Theorem 4 Consider a theory of massless particles 
as described above and let A™°"(O) be the algebras 
generated by massless asymptotic fields A™/ with 
A € A(O). Then 


A'™(O) c A(O_)’ 
and [24] 
A Oe AO: 


Here the prime denotes the set of bounded operators 
commuting with all elements of the respective 
algebras (i.e., their commutants). 


Beyond Wigner’s Concept of Particle 


There is by now ample evidence that Wigner’s
concept of particle is too narrow to cover
all particle-like structures appearing in quantum 
field theory. Examples are the partons which show 
up in nonabelian gauge theories at very small 
spacetime scales as constituents of hadrons, but 
which do not appear at large scales due to the 
confining forces. Their mathematical description
requires a quite different treatment, which cannot be
discussed here. But even at large scales, Wigner’s 
concept does not cover all stable particle-like 
systems, the most prominent examples being parti- 
cles carrying an abelian gauge charge, such as the 
electron and the proton, which are inevitably 
accompanied by infinite clouds of (“on-shell”) 
massless particles. 

The latter problem was discussed first by Schroer, 
who coined the term “infraparticle” for such
systems. Later, Buchholz showed in full generality 
that, as a consequence of Gauss’ law, pure states 
with an abelian gauge charge can neither have a 
sharp mass nor carry a unitary representation of the 
Lorentz group, thereby uncovering the simple origin 
of results found by explicit computations, notably in 
quantum electrodynamics (Steinmann 2000). Thus, 
one is faced with the question of an appropriate 
mathematical characterization of infraparticles
which generalizes the concept of particle invented 
by Wigner. Some significant steps in this direction 
were taken by Fröhlich, Morchio, and Strocchi, who
based a definition of infraparticles on a detailed 
spectral analysis of the energy-momentum opera- 
tors. For an account of these developments and 
further references, cf. Haag (1992). 

We outline here an approach, originated by Buch- 
holz, which covers all stable particle-like structures 
appearing in quantum field theory at asymptotic times. 
It is based on Dirac’s idea of improper particle states 
with sharp energy and momentum. In the standard 
(rigged Hilbert space) approach to giving mathema- 
tical meaning to these quantities, one regards them as 
vector-valued distributions, whereby one tacitly 
assumes that the improper states can coherently be 
superimposed so as to yield normalizable states. This 
assumption is valid in the case of Wigner particles but 
fails in the case of infraparticles. A more adequate 
method of converting the improper states into normal- 
izable ones is based on the idea of acting on them with 
suitable localizing operators. In the case of quantum 
mechanics, one could take as a localizing operator any 
sufficiently rapidly decreasing function of the position 
operator. It would map the improper “plane-wave 
states” of sharp momentum into finitely localized 
states which thereby become normalizable. In quan- 
tum mechanics, these two approaches can be shown to 
be mathematically equivalent. The situation is differ- 
ent, however, in quantum field theory. 

In quantum field theory, the appropriate localizing
operators $L$ are of the form [18]. They constitute
a (nonclosed) left ideal $\mathcal{L}$ in the C*-algebra $\mathcal{A}$
generated by all local operators. Improper particle
states of sharp energy-momentum $p$ can then be
defined as linear maps $|\cdot\rangle_p : \mathcal{L} \to \mathcal{H}$ satisfying

$$U(x)\,|L\rangle_p = e^{ipx}\,|L(x)\rangle_p, \quad L \in \mathcal{L} \qquad [25]$$

It is instructive to (formally) replace $L$ here by the
identity operator, making it clear that this relation
indeed defines improper states of sharp energy-momentum.
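
The formal substitution just mentioned can be spelled out in a few lines. This is only a heuristic sketch: it assumes the usual notation $L(x) = U(x) L U(x)^{-1}$ for translated operators and writes $|p\rangle$ for the formal image of the identity, which is not itself an element of the left ideal.

```latex
% Heuristic check only; the identity operator does not belong to the left ideal.
% Assumption: L(x) = U(x) L U(x)^{-1}, so that the translate of the identity is the identity.
U(x)\,|L\rangle_p = e^{ipx}\,|L(x)\rangle_p
\quad\xrightarrow{\;L \,\to\, 1\;}\quad
U(x)\,|p\rangle = e^{ipx}\,|p\rangle .
% Conversely, setting |L\rangle_p = L|p\rangle and using U(x)|p\rangle = e^{ipx}|p\rangle:
U(x)\,L\,|p\rangle = U(x)\,L\,U(x)^{-1}\,U(x)\,|p\rangle = e^{ipx}\,L(x)\,|p\rangle .
```

Read this way, the defining relation says that $|p\rangle$ behaves like a joint eigenvector of the translations with eigenvalue $e^{ipx}$, which is the sense in which the maps describe sharp energy-momentum.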

In theories of massive particles, one can always find
localizing operators $L \in \mathcal{L}$ such that their images
$|L\rangle_p \in \mathcal{H}$ are states with a sharp mass. This is the
situation covered in Wigner’s approach. In theories
with long-range forces there are, in general, no such 
operators, however, since the process of localization 
inevitably leads to the production of low-energy 
massless particles. Yet improper states of sharp momen- 
tum still exist in this situation, thereby leading to a 
meaningful generalization of Wigner’s particle concept. 

That this characterization of particles covers all
situations of physical interest can be justified in the
general setting of relativistic quantum field theory as
follows. Picking $g_t$ as in [4] and any vector $\Psi \in \mathcal{H}$
with finite energy, one can show that the functionals
$\rho_t$, $t \in \mathbb{R}$, given by

$$\rho_t(L^*L) = \int d^4x \, g_t(x_0)\,\langle\Psi, (L^*L)(x)\Psi\rangle, \quad L \in \mathcal{L} \qquad [26]$$

are well defined and form an equicontinuous family
with respect to a certain natural locally convex
topology on the algebra $\mathcal{C} = \mathcal{L}^*\mathcal{L}$. This family of
functionals therefore has, as $t \to \pm\infty$, weak-* limit
points, denoted by $\sigma$. The functionals $\sigma$ are positive
on $\mathcal{C}$ but not normalizable. (Technically speaking,
they are weights on the underlying algebra $\mathcal{A}$.) Any
such $\sigma$ induces a positive-semidefinite scalar product
on the left ideal $\mathcal{L}$ given by

$$\langle L_1 | L_2 \rangle = \sigma(L_1^* L_2), \quad L_1, L_2 \in \mathcal{L} \qquad [27]$$

After quotienting out elements of zero norm and
taking the completion, one obtains a Hilbert space
and a linear map $L \mapsto |L\rangle$ from $\mathcal{L}$ into that space.
Moreover, the spacetime translations act on this
space by a unitary representation satisfying the
relativistic spectrum condition.

It is instructive to compute these functionals and
maps in theories of massive particles. Making use of
relation [21] one obtains, with a slight change of
notation,

$$\langle L_1 | L_2 \rangle = \int d\mu(p)\,\langle p|L_1^* L_2|p\rangle \qquad [28]$$

where $\mu$ is a measure giving the probability density
of finding at asymptotic times in state $\Psi$ a particle of
energy-momentum $p$. Once again, possible summations
over different particle types and internal
degrees of freedom have been omitted here. Thus,
setting $|L\rangle_p = L|p\rangle$, one concludes that the map
$L \mapsto |L\rangle$ can be decomposed into a direct integral of
improper particle states of sharp energy-momentum,
$|\cdot\rangle = \int^{\oplus} d\mu(p)^{1/2}\,|\cdot\rangle_p$. It is crucial that this result
can also be established without any a priori input
about the nature of the particle content of the
theory, thereby providing evidence of the universal
nature of the concept of improper particle states of
sharp momentum, as outlined here.


Theorem 5 Consider a relativistic quantum field
theory satisfying the standing assumptions. Then the
maps $L \mapsto |L\rangle$ defined above can be decomposed into
improper particle states of sharp energy-momentum $p$,

$$|\cdot\rangle = \int^{\oplus} d\mu(p)^{1/2}\,|\cdot\rangle_p \qquad [29]$$

where $\mu$ is some measure depending on the state $\Psi$
and the respective time limit taken.


It is noteworthy that whenever the space of 
improper particle states corresponding to fixed 
energy-momentum p is finite dimensional (finite 
particle multiplets), then in the corresponding Hilbert 
space there exists a continuous unitary representation 
of the little group of p. This implies that improper 
momentum eigenstates of mass $m = (p^2)^{1/2} > 0$ carry
definite (half)integer spin, in accordance with Wigner’s 
classification. However, if m = 0, the helicity need not 
be quantized, in contrast to Wigner’s results. 

Though a general scattering theory based on 
improper particle states has not yet been developed, 
some progress has been made in Buchholz et al. 
(1991). There it is outlined how inclusive collision 
cross sections of scattering states, where an unde- 
termined number of low-energy massless particles 
remains unobserved, can be defined in the presence 
of long-range forces, in spite of the fact that a 
meaningful scattering matrix may not exist. 


Asymptotic Completeness 


Whereas the description of the asymptotic particle 
features of any relativistic quantum field theory can be 
based on an arsenal of powerful methods, the question 
of when such a theory has a complete particle 
interpretation remains open to date. Even in concrete 
models there exist only partial results, cf. Iagolnitzer 
(1993) for a comprehensive review of the current state 
of the art. This situation is in striking contrast to the 
case of quantum mechanics, where the problem of 
asymptotic completeness has been completely settled. 
One may trace the difficulties in quantum field
theory back to the possible formation of superselection
sectors (Haag 1992) and the resulting complex particle
structures, which cannot appear in quantum-mechanical
systems with a finite number of degrees of freedom.
Thus, the first step in establishing a complete particle 
interpretation in a quantum field theory has to be the 
determination of its full particle content. Here the 
methods outlined in the preceding section provide a 
systematic tool. From the resulting data, one must then 
reconstruct the full physical Hilbert space of the theory 
comprising all superselection sectors. For theories in 
which only massive particles appear, such a construc- 
tion has been established in Buchholz and Fredenhagen 
(1982), and it has been shown that the resulting Hilbert 
space contains all scattering states. The question of 
completeness can then be recast into the familiar 
problem of the unitarity of the scattering matrix. It is 
believed that phase space (nuclearity) properties of the 
theory are of relevance here (Haag 1992). 

However, in theories with long-range forces, where
a meaningful scattering matrix may not exist, this
strategy is bound to fail. Nonetheless, since in most
high-energy scattering experiments only some very specific
aspects of the particle interpretation are really tested,
one may think of other meaningful formulations of
completeness. The interpretation of most scattering
experiments relies on the existence of conservation 
laws, such as those for energy and momentum. If a 
state has a complete particle interpretation, it ought to 
be possible to fully recover its energy, say, from its 
asymptotic particle content, that is, there should be no 
contributions to its total energy which do not manifest 
themselves asymptotically in the form of particles. 
Now the mean energy-momentum of a state $\Psi \in \mathcal{H}$ is
given by $\langle\Psi, P\Psi\rangle$, $P$ being the energy-momentum
operators, and the mean energy-momentum contained
in its asymptotic particle content is $\int d\mu(p)\,p$, where $\mu$
is the measure appearing in the decomposition [29].
Hence, in case of a complete particle interpretation,
the following should hold:

$$\langle\Psi, P\Psi\rangle = \int d\mu(p)\,p \qquad [30]$$


Similar relations should also hold for other con- 
served quantities which can be attributed to parti- 
cles, such as charge, spin, etc. It seems that such a 
weak condition of asymptotic completeness suffices 
for a consistent interpretation of most scattering 
experiments. One may conjecture that relation [30] 
and its generalizations hold in all theories admitting 
a local stress-energy tensor and local currents 
corresponding to the charges. 


See also: Algebraic Approach to Quantum Field Theory; 
Axiomatic Quantum Field Theory; Dispersion Relations; 
Perturbation Theory and its Techniques; Quantum 
Chromodynamics; Quantum Field Theory in Curved 
Spacetime; Quantum Mechanical Scattering Theory;
Scattering, Asymptotic Completeness and Bound States; 
Scattering in Relativistic Quantum Field Theory: The 
Analytic Program. 


Introduction to the Analytic Structures
of Quantum Field Theory


The importance of complex variables and of the 
concept of analyticity in theoretical physics finds 
one of its best illustrations in the analytic structure 
of relativistic quantum field theory (QFT). The latter
has been investigated from several viewpoints in
the last 50 years, according to the successive 
progress in QFT. 

In the two main axiomatic frameworks of QFT, 
namely the one based on Wightman axioms (for a 
short presentation, see Dispersion Relations and also 
Axiomatic Quantum Field Theory) and the Haag, 
Kastler, and Araki theory of “local observables” (see 
Algebraic Approach to Quantum Field Theory), 
there are general justifications of analyticity proper- 
ties for relevant “N-point structure functions” both 
in complexified spacetime variables and in complex- 
ified energy-momentum variables. 

In the Wightman framework, relativistic quantum
fields are operator-valued distributions $\Phi_j(x)$ on
four-dimensional Minkowski spacetime that transform
covariantly under a unitary representation of the
Poincaré group in the Hilbert space of states. The
basic quantities of QFT are (tempered) distributions
on $\mathbb{R}^{4N}$ of the form $\langle\Psi, \Phi(x_1)\cdots\Phi(x_N)\Psi'\rangle$, which
depend on pairs of states $\Psi, \Psi'$ belonging to the
Hilbert space of the QFT considered: they can be
called N-point structure functions of the field $\Phi$ “in
x-space,” namely in Minkowski spacetime (here, for
brevity, we assume that the system is defined in terms
of a single quantum field). In parallel, it is important to
consider the Fourier transform $\tilde\Phi(p) = \int e^{ip\cdot x}\,\Phi(x)\,dx$ of
the field in the Minkowskian energy-momentum
space ($p\cdot x = p_0x_0 - \mathbf{p}\cdot\mathbf{x}$ denoting the Minkowskian
scalar product). The corresponding quantities
$\langle\Psi, \tilde\Phi(p_1)\cdots\tilde\Phi(p_N)\Psi'\rangle$ can then be called N-point
structure functions of the field $\Phi$ “in p-space,” namely
in energy-momentum space.

In the algebraic QFT framework, each basic
local observable $B$ affiliated to a certain bounded
region of spacetime $\mathcal{O}$ generates a Haag-Kastler-Araki
quantum field $B(x)$ by the action of
the translations of spacetime, namely
$B(x) = U(x)BU(x)^{-1}$. Here $U(x)$ denotes the unitary
representation of the group of spacetime translations in
the Hilbert space of states: $B(x)$ is affiliated to the
translated region $\mathcal{O}(x) = \{y;\ y - x \in \mathcal{O}\}$. Then again
one can consider N-point structure functions of the
theory of the form $\langle\Psi, B(x_1)\cdots B(x_N)\Psi'\rangle$ and
$\langle\Psi, \tilde B(p_1)\cdots\tilde B(p_N)\Psi'\rangle$.

To summarize the situation as it occurs in both 
cases, one can say the following: 


1. A certain postulate of relativistic causality
implies the analyticity of structure functions of
a certain class, often called “Green functions,”
in the complex energy-momentum variables
$k_j = p_j + iq_j$, in particular for purely imaginary
energies.

2. “Stability properties” of the states $\Psi, \Psi'$ such as a
“bounded energy content” of these states imply
the analyticity of the previous structure functions
in the complex spacetime variables, in particular
for purely imaginary times.


In both cases, analyticity is obtained as a basic property
of the Fourier-Laplace transformation in several
variables. Let $V^+$ denote the forward cone of the
Minkowskian space ($V^+ = -V^- = \{x;\ x^2 = x\cdot x > 0,\ x_0 > 0\}$) and let

$$\tilde f(p + iq) = \int_{V_a^+} e^{i(p+iq)\cdot x} f(x)\,dx \qquad [1]$$

$$g(x + iy) = (2\pi)^{-4}\int_{V_P^+} e^{-ip\cdot(x+iy)}\,\tilde g(p)\,dp \qquad [2]$$

be the associated reciprocal Fourier formulas,
applied, respectively, to functions $f(x)$ with support
contained in the translated forward cone $V_a^+ = -a + V^+$,
$a \in V^+$ (or in its closure), and to functions $\tilde g(p)$
with support contained in the translated forward
cone $V_P^+ = -P + V^+$, $P \in V^+$ of energy-momentum
space (or in its closure). Then in view of the
convergence properties of the previous integrals, one
easily checks that $\tilde f(k)$ is holomorphic with possible
exponential increase in the imaginary directions
controlled by the bound $e^{a\cdot q}$ in the tube domain
$T^+ = \mathbb{R}^4 + iV^+$; similarly, $g(z)$ is holomorphic with
an increase controlled by the exponential bound $e^{-P\cdot y}$
in the tube domain $T^- = \mathbb{R}^4 + iV^-$.
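
The exponential bound quoted for the first transform can be checked directly from the support condition. The following is a minimal sketch, assuming that $f$ is bounded well enough for the integral to converge and using the elementary fact that $u\cdot v \ge 0$ whenever $u$ and $v$ both lie in the closed forward cone; the decomposition $x = -a + v$ is introduced here only for the purpose of the estimate.

```latex
% Assumptions: x = -a + v with v in the closed forward cone, a in V^+, q in V^+.
% The modulus of the exponential kernel depends only on the imaginary part q:
\bigl| e^{i(p+iq)\cdot x} \bigr|
  = e^{-q\cdot x}
  = e^{\,q\cdot a}\, e^{-q\cdot v}
  \;\le\; e^{\,a\cdot q},
\qquad x \in -a + \overline{V}^{+},\quad q \in V^{+}.
```

Since the kernel is entire in $k = p + iq$ and is dominated in this way on the support of $f$, differentiation under the integral sign gives holomorphy in the tube, with growth in the imaginary directions no worse than $e^{a\cdot q}$. The same argument, with the roles of $x$ and $p$ exchanged and $y \in V^-$, applies to the second transform.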

On the one hand, for each N the structure functions
$\langle\Psi, \tilde\Phi(p_1)\cdots\tilde\Phi(p_N)\Psi'\rangle$ (or $\langle\Psi, \tilde B(p_1)\cdots\tilde B(p_N)\Psi'\rangle$)
have conical support properties of the previous type in
the variables $p_j$, as a consequence of the relativistic
shape of the energy-momentum spectrum. In both
axiomatic frameworks, in fact, one postulates that
there is a state of zero energy-momentum $\Omega$, called the
vacuum, and that the energy-momentum spectrum $\Sigma$,
namely the joint spectrum of the generators $P_\mu$ of the
Lie algebra of the group $U(x)$, is contained in the
closure of $V^+$: this is the so-called spectral condition.
A more refined assumption introduced for the requirements
of particle physics is that $\Sigma$ contains discrete
parts localized on sheets of (mass-shell) hyperboloids
inside $V^+$. These support properties in p-space imply
that the corresponding inverse Fourier transforms
$\langle\Psi, \Phi(x_1)\cdots\Phi(x_N)\Psi'\rangle$ are boundary values of holomorphic
functions in appropriate tube domains of the
complex space variables $(z_1,\dots,z_N)$.

On the other hand, in order to exhibit structure
functions with conical support properties in x-space,
one needs to build appropriate algebraic combinations
of functions $\langle\Psi, \Phi(x_{i_1})\cdots\Phi(x_{i_N})\Psi'\rangle$ with
permuted arguments in order to exploit
the causality postulate, which is always formulated
in terms of the commutator of two field operators.


There are two versions of this postulate. In the
Wightman framework, causality is expressed by the
condition of local commutativity or microcausality,

$$[\Phi(x_1), \Phi(x_2)] = 0 \quad \text{for } (x_1 - x_2)^2 < 0 \qquad [3]$$

In the algebraic QFT framework, causality is
expressed by a similar property in terms of any
field $B(x)$ generated by a local observable $B = B(0)$
affiliated to a region of spacetime enclosed in a
given “double cone” $\mathcal{O}_b = V_b^+ \cap (-V_b^+)$. The corresponding
expression of causality is

$$[B(x_1), B(x_2)] = 0 \quad \text{for } (x_1 - x_2) \notin V_a^+ \cup (-V_a^+) \qquad [4]$$

for all $a$ such that $a > 2b$.

So, we see that basically, causality and spectral 
condition generate analyticity respectively in com- 
plexified p-space and x-space. However, the situa- 
tion is more intricate, since for each N there are 
always several holomorphic branches (two in the
case N=2) in the variables $(z_1,\dots,z_N)$ and also in
the variables $(k_1,\dots,k_N)$: each of these two sets is
obtained essentially by permutations of the N vector
variables. The important point is that these various 
branches can be seen to “communicate together,” 
thanks to the existence of “coincidence regions” of 
their boundary values on the reals. Here again the 
roles played by causality and stability are symmetric 
(but inverted): while causality produces coincidence 
regions for the holomorphic functions in complex 
spacetime, spectral conditions produce coincidence 
regions for the holomorphic functions in complex 
energy-momentum space. 

In view of a basic theorem of several complex
variable analysis, called the edge-of-the-wedge theorem
(see below in (4)), the two sets of communicating
holomorphic branches actually define by
mutual analytic continuation two holomorphic
functions $H_N^{(\Psi,\Psi')}(k_1,\dots,k_N)$ and $W_N^{(\Psi,\Psi')}(z_1,\dots,z_N)$ in
respective domains $D_N^{(\Psi,\Psi')}$ and $\Delta_N^{(\Psi,\Psi')}$. However, these
two primitive domains are not natural holomorphy
domains (a phenomenon which is particular to
complex geometry in several variables). The problem
of finding their holomorphy envelopes, namely
the smallest domains $\hat D_N^{(\Psi,\Psi')}$ and $\hat\Delta_N^{(\Psi,\Psi')}$ in which any
function holomorphic in the primitive domains can
be analytically continued, is the ideal goal of
what has been called the analytic program of
axiomatic QFT. So, we see that there is an analytic
program in x-space and there is an analytic program
in p-space. In practice, except for the case N=2,
where the complete answer is known, only a partial
knowledge of the holomorphy envelopes has been
obtained.

The analytic program in p-space, which is the
only one to be described in the rest of this article,
was often considered as physically more interesting,
in view of the fact that it aims to establish
analyticity properties of the scattering kernels on
the complex mass shell. As a matter of fact, an
important part of it concerns the derivation of the
analyticity domains of dispersion relations for two-particle
scattering amplitudes. This part is important
from the historical viewpoint as well as from
conceptual, physical, and pedagogical viewpoints
(the reader may find it useful to first check the
article Dispersion Relations, which illustrates how a
structure function of the form $H_2^{(\Psi,\Psi')}(k_1,k_2)$ can be
used for that purpose with a suitable choice of the
states $\Psi$ and $\Psi'$). In the general development of the
analytic program (in x-space as well as in p-space),
it is recommended to consider the infinite set of
structure functions $H_N = H_N^{(\Omega,\Omega)}(k_1,\dots,k_N)$ and
$W_N = W_N^{(\Omega,\Omega)}(z_1,\dots,z_N)$, where $\Omega$ is the privileged
vacuum state of the theory, in view of the fact that
each of these sets characterizes entirely the field
theory considered.

Before shifting to the analytic program in p-space, 
we would like to mention various points of interest 
of the analytic program in x-space: 


1. Various results of this program have been 
extensively used for proving fundamental prop- 
erties of QFT, such as the PCT-invariance 
theorem, the spin-statistics connection, etc. 
A good part of these can be found in the 
books by Streater and Wightman (1980) and by 
Jost (1965). 

2. The functions $H_N$ and $W_N$ are holomorphic in
their respective p-space and x-space “Euclidean
subspaces.” To make this clear, let us assume
that a Lorentz frame has been chosen once and for
all; the linear subspace of complex spacetime
(resp. energy-momentum) vectors of the form
$z = (iy_0, \mathbf{x})$ (resp. $k = (iq_0, \mathbf{p})$) is called the “Euclidean
subspace” of the corresponding complex
Minkowskian space, in view of the fact that the
quadratic form $z^2 = z\cdot z = -(y_0^2 + \mathbf{x}^2)$ (resp.
$k^2 = k\cdot k = -(q_0^2 + \mathbf{p}^2)$) has a definite (negative)
sign on that subspace. Then it has been established
that (for each N) the restrictions of $H_N$
and $W_N$ to the corresponding N-vector Euclidean
subspaces are the Fourier transforms of each
other. This fact participates in the foundation of
the Euclidean formulation of QFT or “QFT at
imaginary times”; the latter has provided many
important results in QFT, in particular for the
rigorous study of field models (initiated by
Glimm and Jaffe in the 1970s).


3. A more recent extension of QFT called thermal 
QFT (TQFT), which aims to study the behavior of 
quantum fields in a thermal bath, can be described 
in terms of a modified analytic program. In the 
latter, the spectral condition is replaced by the 
so-called KMS condition, which prescribes x-space 
analyticity properties of a particular type for the 
structure functions $W_N$: it requires analyticity
together with periodicity conditions with respect 
to imaginary times, the period being the inverse of 
the temperature (see Thermal Quantum Field 
Theory). The usual analytic structure for the 
theories with vacuum and spectral conditions is 
recovered in the zero-temperature limit. 

4. In more recent investigations concerning quan- 
tum fields on (holomorphic) curved spacetimes, 
analyticity properties of the structure functions 
similar to those of thermal QFT can be estab- 
lished. This is the case in particular with de Sitter 
spacetime, for which a notion of “temperature of 
geometrical origin” is most simply exhibited. 
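
The sign statement in item 2 of the list above follows from a one-line computation. The sketch below assumes only that the Minkowskian scalar product $z\cdot z = z_0^2 - \mathbf{z}^2$ is extended bilinearly (not sesquilinearly) to complex vectors.

```latex
% Minkowskian quadratic form, extended bilinearly to complex vectors: z.z = z_0^2 - \mathbf{z}^2.
z = (i y_0, \mathbf{x}),\quad y_0 \in \mathbb{R},\ \mathbf{x} \in \mathbb{R}^3
\;\Longrightarrow\;
z^2 = (i y_0)^2 - \mathbf{x}^2 = -\,\bigl(y_0^2 + \mathbf{x}^2\bigr) \;\le\; 0 .
% The same computation with k = (i q_0, \mathbf{p}) gives k^2 = -(q_0^2 + \mathbf{p}^2).
```

Up to the overall sign, the restricted form is therefore the Euclidean squared length of the real vector $(y_0, \mathbf{x})$, which is what justifies the name.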


In this article, an account of the general analytic
program of axiomatic QFT in complex energy-momentum
space will be presented; it will describe
some of the methods which have been used for
establishing analyticity properties of the N-point
structure functions of QFT and corresponding properties
of the $(n \to n')$-particle collision processes, for all
$n, n'$ such that $n \ge 2$, $n' \ge 2$, $n + n' = N$. (For a more
detailed study, in particular concerning the microlocal
methods, see the book by Iagolnitzer (1992).)

Concerning the important case N =4, this article 
gives complements to the results described in the 
article Dispersion Relations. In fact, the program 
allows one to justify other important analytic 
structures of the four-point functions and of two- 
particle scattering functions. They concern 


• the field-theoretical basis of analyticity in the
complexified variable of angular momentum, first
introduced and developed in potential theory
(Regge 1959);

• the Bethe-Salpeter (BS-) type structure (based on
the additional postulate of asymptotic completeness),
which is a relativistic field-theoretical generalization
of the Lippmann-Schwinger structure
of nonrelativistic scattering theory (for Schrödinger
equations with Yukawa-type potentials).


The latter allows one to introduce the concept of 
composite particle in the field-theoretical framework 
(including bound states and unstable particles or 
“resonances”) and also the concept of “Regge 
particle,” thanks to complex angular momentum 
analysis. 


Various Aspects of the General
Analytic Program of QFT in Complex
Energy-Momentum Space


The N-Point Structure Functions of QFT 


It is proved in the Wightman QFT axiomatic framework
that any QFT is completely characterized by the
(infinite) sequence of its “N-point functions” or
“vacuum expectation values” (also called “Wightman
functions”)

$$W_N(x_1,\dots,x_N) = \langle\Omega, \Phi(x_1)\cdots\Phi(x_N)\,\Omega\rangle$$

which are tempered distributions on $\mathbb{R}^{4N}$ satisfying a
set of general properties that can be split up into
linear and nonlinear conditions. (This is known as
the Wightman reconstruction theorem.)


Linear conditions Each individual N-point function
satisfies three sets of linear conditions which
result, respectively, from:


1. Poincaré invariance: typically, for every Poincaré
transformation $g$ of Minkowski spacetime

$$W_N(x_1,\dots,x_N) = W_N(gx_1,\dots,gx_N)$$

in particular, the $W_N$ are invariant under spacetime
translations and therefore defined on the
quotient subspace $\mathbb{R}^{4(N-1)} = \mathbb{R}^{4N}/\mathbb{R}^4$ of the differences
$x_j - x_k$.

2. Microcausality: support conditions on commutator
functions of the following form:

$$C_N^{(j,j+1)}(x_1,\dots,x_N) = W_N(x_1,\dots,x_j,x_{j+1},\dots,x_N) - W_N(x_1,\dots,x_{j+1},x_j,\dots,x_N) = 0$$

in the region of $\mathbb{R}^{4N}$ defined by $(x_j - x_{j+1})^2 < 0$.

3. Spectral condition: support conditions on the
Fourier transform $\tilde W_N(p_1,\dots,p_N) = \delta(p_1+\cdots+p_N) \times \tilde w_N(p_1,\dots,p_{N-1})$
of $W_N$, which assert that
$\tilde w_N(p_1,\dots,p_{N-1}) = 0$ if any one of the following
conditions is fulfilled: $p_1+\cdots+p_j \notin \Sigma$, for
$j = 1,\dots,N-1$.
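
The delta function appearing in the spectral condition can be traced back to the translation invariance stated under Poincaré invariance. The following is a short sketch, assuming the Fourier convention $\tilde W_N(p_1,\dots,p_N) = \int e^{i\sum_j p_j\cdot x_j}\, W_N(x_1,\dots,x_N)\,dx_1\cdots dx_N$ (the same sign convention as for the field; normalization factors play no role here).

```latex
% Translation invariance: W_N(x_1 + a, ..., x_N + a) = W_N(x_1, ..., x_N) for all a in R^4.
% Shift the integration variables x_j -> x_j + a (unit Jacobian), then use invariance:
\tilde W_N(p_1,\dots,p_N)
  = \int e^{i\sum_j p_j\cdot(x_j + a)}\, W_N(x_1 + a,\dots,x_N + a)\, dx_1\cdots dx_N
  = e^{i(p_1+\cdots+p_N)\cdot a}\;\tilde W_N(p_1,\dots,p_N) .
% Differentiating with respect to a at a = 0:
(p_1+\cdots+p_N)_\mu\,\tilde W_N(p_1,\dots,p_N) = 0 , \qquad \mu = 0,1,2,3 .
```

A distribution annihilated by all four components of the total momentum is supported on the hyperplane $p_1+\cdots+p_N = 0$ and factorizes as a delta function in the total momentum times a distribution in the remaining $N-1$ independent momenta.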


For each N, one can then construct a set of
distributions $R^{(\alpha)}(x_1,\dots,x_N)$, called “generalized
retarded functions” (Araki, Ruelle, Steinmann,
1960; see Iagolnitzer (1992, ref. [EGS])), which are
appropriate linear combinations of multiple commutator
functions built from $W_N$ and multiplied by
products of Heaviside step-functions $\theta(x_{j,0} - x_{k,0})$ of
the differences of time coordinates. Each of these
distributions $R^{(\alpha)}(x_1,\dots,x_N)$ has its support contained
in a convex salient cone $\mathcal{C}_\alpha$. This construction
can be seen as a generalization of the decomposition
[23] of the commutator $C_{\Psi,\Psi'}$ in the article
Dispersion Relations. Then in view of the Laplace-transform
theorem in several variables, the Fourier
transform $\tilde R^{(\alpha)}(p_1,\dots,p_N) = \delta(p_1+\cdots+p_N) \times \tilde r^{(\alpha)}([p]_N)$
is such that $\tilde r^{(\alpha)}([p]_N)$ is the boundary value
of a holomorphic function $\tilde r^{(\alpha)}([k]_N)$ defined in a
tube $T_\alpha = \mathbb{R}^{4(N-1)} + i\tilde{\mathcal{C}}_\alpha$. Here $[k]_N = [p]_N + i[q]_N$
belongs to a $4(N-1)$-dimensional complex linear
space $M_N^{(c)}$: this is the set of complex vectors
$[k]_N = (k_1,\dots,k_N)$ such that $k_1+\cdots+k_N = 0$. $\tilde{\mathcal{C}}_\alpha$ is
the dual cone of $\mathcal{C}_\alpha$ in the real ($4(N-1)$-dimensional)
$[q]_N$-space. Geometrically, each cone $\tilde{\mathcal{C}}_\alpha$ is
defined in terms of a certain “cell” of $[q]_N$-space
which is defined by prescribing consistent conditions
of the form $\epsilon_J q_J \in V^+$ with $q_J = \sum_{j\in J} q_j$ and $\epsilon_J = \pm 1$
for all proper subsets $J$ of the set $\{1,2,\dots,N\}$.
This is the expression of the microcausality postulate
(summarized in [3] or [4]) in complex energy-momentum
space. (Concerning the difference
between the two formulations [3] and [4], one can
see that there is no geometrical difference concerning
the analyticity domains, but differences for the
type of increase of the structure functions in their
tube domains: in the case of [3], they are bounded
by powers of the energy-momenta, while in the case
of [4] they may have an exponential increase
governed by factors of the type $e^{a\cdot q}$.)
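
The cell construction is easiest to see in the simplest case N = 2, where it reproduces the two holomorphic branches mentioned in the introduction. The sketch below only spells out the definitions given above; the set-brace labels on $\epsilon$ and $q$ are notation introduced for this illustration.

```latex
% N = 2: [q]_2 = (q_1, q_2) with q_1 + q_2 = 0.
% Proper subsets of {1,2}: J = {1} and J = {2}, with q_{\{2\}} = -q_{\{1\}} = -q_1,
% so a consistent assignment of signs requires \epsilon_{\{2\}} = -\epsilon_{\{1\}}.
\epsilon_{\{1\}} = +1:\quad q_1 \in V^+ \;\Longrightarrow\; k_1 \in T^+ = \mathbb{R}^4 + iV^+ ,
\qquad
\epsilon_{\{1\}} = -1:\quad q_1 \in V^- \;\Longrightarrow\; k_1 \in T^- = \mathbb{R}^4 + iV^- .
```

There are thus exactly two cells and two tubes for N = 2, usually associated with the retarded and advanced two-point functions; for larger N the number of consistent sign assignments grows rapidly.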

For each N, the linear space generated by all the
distributions $\tilde r^{(\alpha)}([p]_N)$ is constrained by a set of
linear relations (called Steinmann relations) which
result from algebraic expressions of discontinuities
of the following type, called (generalized) “absorptive
parts,”

$$\Delta^{(\alpha,\alpha')}([p]_N) = \tilde r^{(\alpha)}([p]_N) - \tilde r^{(\alpha')}([p]_N) = \langle\Omega, [\tilde R^{\alpha_1}([p]_{(J_1)}), \tilde R^{\alpha_2}([p]_{(J_2)})]\,\Omega\rangle \qquad [5]$$

for all pairs of adjacent cells $(\alpha,\alpha')_{(J_1,J_2)}$ in the
following sense: $\alpha$ and $\alpha'$ only differ by changing the
value of $\epsilon_{J_1} = -\epsilon_{J_2}$, $(J_1, J_2)$ denoting any given
partition of the set $\{1,2,\dots,N\}$. In [5], the symbols
$\tilde R^{\alpha_i}$ denote generalized retarded operators of lower
order and the argument $[p]_{(J)}$ stands for the set of
independent 4-momenta $\{p_j;\ j \in J\}$. Formula [5] may
be seen as an N-point generalization of formula [26]
of Dispersion Relations for the case when the state
$\Psi = \Psi'$ is replaced by $\Omega$.

Then by applying to [5] the same argument based
on spectral condition as in the exploitation of
eqn [26] in Dispersion Relations, one concludes
that the two distributions $\tilde r^{(\alpha)}$ and $\tilde r^{(\alpha')}$ coincide on
an open set $\mathcal{R}_{\alpha,\alpha'}$ of the form $p_{J_1}^2 = p_{J_2}^2 < M_{J_1}^2$, where
$p_{J_1} = \sum_{j\in J_1} p_j = -p_{J_2}$. It then follows from the general
“oblique edge-of-the-wedge theorem” (Epstein,
1960; see below) that the two corresponding
holomorphic functions $\tilde r^{(\alpha)}([k]_N)$ and $\tilde r^{(\alpha')}([k]_N)$
have a common analytic continuation in the union of
their tubes together with a certain complex “connecting
set,” bordered by $\mathcal{R}_{\alpha,\alpha'}$. Since this argument applies to
all pairs $(\alpha,\alpha')_{(J_1,J_2)}$, the following important property
holds (see Iagolnitzer (1992, refs. [B2], [EGS])):


Theorem 1


(i) All the holomorphic functions $\tilde r^{(\alpha)}([k]_N)$
admit a common analytic continuation
$H_N([k]_N)$, called the N-point structure function
(or Green function) of the given quantum field
in complex energy-momentum space. It is
holomorphic in a “primitive domain” $D_N$ of
$M_N^{(c)}$, which is the union of all tubes $T_\alpha$
together with complex “connecting sets” bordered
by all the coincidence regions $\mathcal{R}_{\alpha,\alpha'}$
defined previously.

(ii) For each N the complex domain $D_N$ contains the
whole Euclidean subspace $\mathcal{E}_N$ of $M_N^{(c)}$, which is
the set of all complex vectors $[k]_N = (k_1,\dots,k_N)$
such that $k_j = (k_{j,0}, \mathbf{k}_j)$; $k_{j,0} = iq_{j,0}$, $\mathbf{k}_j = \mathbf{p}_j$ for
$j = 1,2,\dots,N$. (This Euclidean subspace depends
on the choice of a given Lorentz frame in
Minkowski spacetime.)


Positivity Conditions The Hilbert space framework
which underlies the axioms of QFT implies (an
infinite set of) positivity inequalities on the N-point
structure functions of the fields. As a typical
example related to the previous formula [5] when
$|J_1| = |J_2| = N/2$ (for N even), one can mention the
positive-definiteness property of the absorptive parts
for appropriate pairs of adjacent cells $(\alpha_1, \alpha_2 = -\alpha_1)_{(J_1,J_2)}$,
which simply expresses the positivity of
the following Hilbertian squared norm:


[Fel fllely C) 
ale 


—A) (Pldldl oly, dlel cy 





2 


N 
= io mn sdp 29 [6 





Scattering Kernels of General (n — n’)-Particle 
Collisions and General Reduction Formulas 


The presentation of $(2 \to 2)$-particle scattering kernels
in the article Dispersion Relations can be
generalized to arbitrary $(n \to n')$-particle collision
processes, involving $n$ incoming massive particles
($n \ge 2$) and $n'$ outgoing massive particles ($n' \ge 2$).
The big “scattering matrix” or “S-matrix” in the
Hilbert space of states is the collection of all partial
scattering matrices $S_{n,n'}$ or of the equivalent kernels
$S_{n,n'}(p_{n,\mathrm{in}}; p'_{n',\mathrm{out}})$, defined by a straightforward
generalization of formula [20] of the quoted article:


aa! (eee ) Si out) 


= con (Prin )&n',out (Pout) 
Maz 


x Sin! (Pain Pow B ARTA (Di out) [7 | 


Here we have considered for simplicity the case of
collisions involving a single type of particle with
mass m. In the arguments of the wave packets, the
kernel, and the measures ($\mu_m^{(n)}$, $\mu_m^{(n')}$), $p_{n,\mathrm{in}}$ and $p_{n',\mathrm{out}}$,
respectively, denote the sets of incoming and
outgoing 4-momenta $(p_1,\dots,p_n)$ and $(p'_1,\dots,p'_{n'})$,
which all belong to the physical mass shell
$H_m^+ = \{p;\ p \in V^+,\ p^2 = m^2\}$. By supplementing these
mass-shell constraints with the relativistic law of
conservation of total energy-momentum
$p_1 + \dots + p_n = p'_1 + \dots + p'_{n'}$, one obtains the definition of the
mass-shell manifold $\mathcal{M}_{(n,n')}$ of (n → n')-particle
collision processes.

We shall reserve the name of scattering kernel (or
scattering amplitude), denoted by $T_{n,n'}(p_{n,\mathrm{in}}; p_{n',\mathrm{out}})$,
for the so-called “connected component” of the
S-matrix kernel $S_{n,n'}(p_{n,\mathrm{in}}; p_{n',\mathrm{out}})$. By analogy with
the definition of T in terms of S for the two-particle
collision processes (see Dispersion Relations), $T_{n,n'}$ is
defined by a recursive algorithm, which amounts to
subtracting from $S_{n,n'}$ all the components of the
(n → n')-collision processes that are decomposable
into independent collision processes involving smaller
numbers of particles, according to all admissible
partitions of the numbers n and n'.

For any given N, let us consider all the “affiliated”
scattering kernels $T_{n,n'}$ such that n + n' = N and whose
corresponding collision processes, also called
“channels,” are deduced from one another by the
relevant exchange of incoming particles and
outgoing antiparticles (e.g., $\Pi_1 + \Pi_2 + \Pi_3 \to \Pi_4 + \Pi_5 + \Pi_6$,
..., and $\Pi_1 + \Pi_2 \to \bar{\Pi}_3 + \Pi_4 + \Pi_5 + \Pi_6$).
There exist general reduction formulas according to which
all these scattering kernels are restrictions to the
mass-shell manifold $\mathcal{M}_{(n,n')}$
of appropriate boundary values of the (so-called)
“amputated N-point function”
$\hat{H}_N(k_1,\dots,k_N) = (k_1^2 - m^2)\cdots(k_N^2 - m^2) \times H_N(k_1,\dots,k_N)$.
More precisely, these reduction formulas can be written as
follows:

$$T_{n,n'}(-p_{n,\mathrm{in}}; p_{n',\mathrm{out}}) = \hat{H}_N^{(\alpha)}(p_1,\dots,p_N)\Big|_{\mathcal{M}^{(\alpha)}_{(n,n')}} \qquad [8]$$


In the latter, $\hat{H}_N^{(\alpha)}$ denotes a certain boundary value of
$\hat{H}_N$ on the reals: it is equal to a generalized retarded
function $\hat{r}_\alpha([p]_N)$ which depends in a specific way
on a region of the mass shell, called $\mathcal{M}^{(\alpha)}_{(n,n')}$, in which
the (n → n')-channel is considered. The important
thing to be noted in [8] is the sign convention which
attributes the notation $-p_j$ to the momentum of any
incoming particle and therefore implies that $p_j$
belongs to the negative sheet of hyperboloid $H_m^- = -H_m^+$.
This is the price to pay for expressing
symmetrically the energy-momentum conservation
law as $p_1 + p_2 + \dots + p_N = 0$ (according to the QFT
formalism), but it also displays, as a nice feature,
the fact that all the affiliated scattering kernels
$T_{n,n'}$ such that n + n' = N are located on the
various connected components of the mass shell
$\mathcal{M}_N = \{[p]_N;\ p_j^2 = m^2,\ 1 \le j \le N\}$: the choice of the
sheet $H_m^+$ or $H_m^-$ of $H_m$ is exactly linked to the
incoming or outgoing character of the particle
considered.


Remark 1 The reduction formulas are more usually
expressed in terms of the Fourier transforms of the
(connected parts of the) N-point amputated chronological
functions $\tau_N([p]_N)$ (see Scattering in Relativistic
Quantum Field Theory: Fundamental Concepts and
Tools). As a matter of fact, the latter coincide with the
boundary values $\hat{H}_N^{(\alpha)}([p]_N)$ of $\hat{H}_N$ in the corresponding
relevant regions $\mathcal{M}^{(\alpha)}_{(n,n')}$.


Remark 2 Coming back to the case of two-particle
scattering amplitudes (i.e., n = n' = 2, N = 4), one
can see that the general study presented here implies
the consideration of the four-point function
$H_4(k_1,k_2,k_3,k_4)$, which is a holomorphic function
of three independent complex 4-momenta (since
$k_1 + k_2 + k_3 + k_4 = 0$). In that case, the domain $D_4$
contains 32 tubes $T_\alpha$, which are specified by triplets
of conditions such as $q_1 \in V^+$, $q_2 \in V^+$, $q_3 \in V^+$, or
$q_4 \in V^+$, $q_1 + q_2 \in V^+$, $q_1 + q_3 \in V^+$, and those
obtained by permutations of the subscripts
(1,2,3,4) and also by a global substitution of the
cone $V^-$ for $V^+$.
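
The count of 32 tubes can be checked by a short enumeration. The sketch below assumes (this is not spelled out in the remark itself) that each tube corresponds to exactly one cell of the arrangement of hyperplanes $s_I = \sum_{i \in I} s_i = 0$ in the real space $s_1 + \dots + s_N = 0$, where I runs over the nonempty proper subsets of {1,...,N}. The function name `count_cells` and the integer scan are illustrative choices, not part of the original text.

```python
from itertools import combinations, product


def count_cells(n_points=4, radius=3):
    """Count the cells cut out by the hyperplanes s_I = 0 on s_1+...+s_N = 0."""
    indices = range(n_points)
    # All nonempty proper subsets I of {1, ..., N}.
    subsets = [c for r in range(1, n_points)
               for c in combinations(indices, r)]
    cells = set()
    # Scan integer points; the last coordinate is fixed by s_1+...+s_N = 0.
    for head in product(range(-radius, radius + 1), repeat=n_points - 1):
        s = head + (-sum(head),)
        partial_sums = [sum(s[i] for i in subset) for subset in subsets]
        if 0 in partial_sums:
            continue  # the point lies on a wall, not inside a cell
        # A cell is labelled by the signs of all partial sums.
        cells.add(tuple(x > 0 for x in partial_sums))
    return len(cells)


print(count_cells(3))  # three-point case
print(count_cells(4))  # four-point case of Remark 2
```

For N = 4 the scan finds 8 cells in which three of the $s_i$ have the same sign and 24 cells in which two are positive and two negative, which gives the 32 tubes quoted above. The default `radius=3` is large enough for N = 3 and N = 4 only; larger N would need a wider scan.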


Remark 3 The logical path from the postulates of 
QFT to the analyticity properties of two-particle 
scattering amplitudes that has been followed in the 
article Dispersion Relations can be seen as a partial 
exploitation of the general analyticity properties of 
the four-point function: one was specially interested 
there in the analyticity properties of H4 in a single 
4-momentum $k_1 = -k_3$ (at fixed real values of
$p_2 = -p_4$). The “partial reduction formula” [27] of
Dispersion Relations corresponds to the restriction
of eqn [8] (for N = 4) to the linear submanifold
($p_1 = -p_3$, $p_2 = -p_4$). It may also be worthwhile to
stress the fact that, in spite of the exponential 
bounds on H4 implied by the postulates of algebraic 
QFT, it has been possible to prove that the 
scattering function is still bounded by a power of s 


in its cut-plane (or crossing) domain; the dispersion 
relations with two subtractions are still justified in 
that case (Epstein, Glaser, Martin, 1969 (see Martin 
(1969, preprint))). 


Off-Shell Character of $D_N$: Nontriviality of the
Analytic Structure of the Scattering Kernels


One can now see that for each value of N ($N \ge 4$)
the situation created by complex geometry in the
space $\mathbb{C}^{4(N-1)}$ of $[k]_N$ is a mere generalization of the
one described in a simple situation in the article
Dispersion Relations.


1. There exists a fundamental (3N − 4)-dimensional
complex submanifold, namely the complex mass
shell $\mathcal{M}_N^{(c)}$ defined by the equations $k_j^2 = m^2$;
j = 1,...,N, which connects together the various
real mass-shell components $\mathcal{M}_{(n,n')}$, interpreted as
the various physical regions of a set of affiliated
(n → n')-collision processes. The problem of
proving the “analyticity of (n → n')-scattering
functions” thus amounts to constructing
holomorphic functions on the complex manifold
$\mathcal{M}_N^{(c)}$ whose boundary values on the various real
regions $\mathcal{M}^{(\alpha)}_{(n,n')}$ would reproduce the relevant
scattering kernels $T_{n,n'}(-p_{n,\mathrm{in}}; p_{n',\mathrm{out}})$.

2. All the tubes $T_\alpha$ which generate the primitive
domain $D_N$ are off-shell domains, namely their
intersections with $\mathcal{M}_N^{(c)}$ are empty. This simply
comes from the fact that the conditions $q_j \in V^\pm$
(included in their definition) and $k_j^2 = m^2 > 0$ are
incompatible: writing $k_j = p_j + i q_j$, the reality of
$k_j^2$ requires $p_j \cdot q_j = 0$, so that $p_j$ is spacelike
(or zero) and $k_j^2 = p_j^2 - q_j^2 < 0$. One can also check
that adding the coincidence regions $R_{\alpha,\alpha'}$ between
adjacent tubes does not improve the situation. However,
one can state the following program as a relevant goal.

3. Linear program (so-called because it only relies
on the linear conditions presented in the section
“N-point structure functions of QFT”): find parts
of the holomorphy envelope of $D_N$ (possibly
improved by the exploitation of the Steinmann
relations) whose intersections with the complex
mass shell $\mathcal{M}_N^{(c)}$ are nonempty. In the best case,
show that such intersections can exist which
connect two different regions $\mathcal{M}^{(\alpha)}_{(n,n')}$ together,
which means “proving the crossing property
between these two regions.”

4. We shall see in the following that, except for the
case N = 4, the results of this linear program
have been rather disappointing as far as reaching
the complex mass shell is concerned; however,
other interesting analytic structures also coming
from positivity conditions and from the additional
postulate of asymptotic completeness have
been investigated under the general name of
nonlinear program. The “synergy” created by the
combination of these two programs remains, to a
large extent, to be explored.


Results of Analytic Completion 
in the “Linear Program” 


We can only outline here some of the geometrical 
methods which allow one to compute parts of the 
holomorphy envelopes of the domains $D_N$. One
important method, which may be used after apply- 
ing suitable conformal mappings, reduces to the 
following basic theorem. 


The tube theorem The holomorphy envelope of a
“tube domain” of the form $T_B = \mathbb{R}^n + iB$, where B is
an arbitrary domain in $\mathbb{R}^n$ called the basis of the
tube, is the convex tube $T_{\hat{B}} = \mathbb{R}^n + i\hat{B}$, where $\hat{B}$ is the
convex hull of B.
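
As a small illustration of what the tube theorem gains, the sketch below takes a hypothetical L-shaped basis B in $\mathbb{R}^2$ (the union of two overlapping open rectangles, chosen here only for illustration) and computes its convex hull $\hat{B}$ with a standard monotone-chain routine. Any function holomorphic in the tube over B then extends to the tube over $\hat{B}$, which contains points of the imaginary part such as (1.4, 1.4) that are not in B.

```python
def cross(o, a, b):
    """Signed area of the parallelogram spanned by (a - o) and (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull, y):
    """True if y lies strictly inside the counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], y) > 0 for i in range(n))


def in_basis(y):
    """Hypothetical L-shaped basis B: union of two open rectangles."""
    y1, y2 = y
    return (0 < y1 < 2 and 0 < y2 < 1) or (0 < y1 < 1 and 0 < y2 < 2)


corners = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
hull = convex_hull(corners)
y = (1.4, 1.4)
print(hull)
print(in_basis(y), inside_hull(hull, y))
```

The re-entrant corner (1, 1) of the L-shape disappears from the hull, and the test point lies in $\hat{B}$ but not in B.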


The opposite or oblique edge-of-the-wedge theorem
(Epstein 1960 (see Streater and Wightman
(1980, ch. 2, ref. 18))) is a refined local version of
the tube theorem, in which the basis B is of the form
$B = C_1 \cup C_2$, where $C_1$, $C_2$ are two disjoint (opposite
or nonopposite) cones with apex at the origin
and where $T_B$ is replaced by a pair of “local tubes”
$T^{\mathrm{loc}}_{C_1}$, $T^{\mathrm{loc}}_{C_2}$. Here the adjective “local” means that
the real parts of the variables are confined in a given
open set U (which can be arbitrarily small). The
connectedness of $T_B$ is now replaced by the
consideration of any pair of functions $(f_1, f_2)$
holomorphic in these local tubes whose boundary
values on their common real set U coincide. The
result is that $f_1$ and $f_2$ admit a common analytic
continuation f in a local tube $T^{\mathrm{loc}}_{\hat{C}}$, where $\hat{C}$ is the
convex hull of $C_1 \cup C_2$. In the case of opposite cones
($C_1 = -C_2$), f is then analytic in the real set U, while
in the general oblique case f is only analytic in a
complex connecting set bordered by U (namely a set
which connects $T^{\mathrm{loc}}_{C_1}$ and $T^{\mathrm{loc}}_{C_2}$). There exists an
extended version of the edge-of-the-wedge theorem
in which the boundary values of $f_1$ and $f_2$ are only
defined as distributions.

For simplicity, we shall just give a very rough 
classification of the type of results obtained. We 
shall distinguish: 


• analyticity domains in the space of several
(possibly all) variables: they can be of global
type or of microlocal type, namely restricted to
complex neighborhoods of real points;

• analyticity domains in special families of
one-dimensional complex manifolds; and

• combinations of one-dimensional results which
generate domains in several variables by a refined
use of the tube theorem, called the Malgrange–Zerner
“flat tube theorem,” or “flat edge-of-the-wedge
theorem.” In the latter, the local tubes
$T^{\mathrm{loc}}_{C_1}$ and $T^{\mathrm{loc}}_{C_2}$ of $f_1$ and $f_2$ reduce to one-variable
local domains of the upper half-plane in separate
variables $z_1 = x_1 + i y_1$, $z_2 = x_2 + i y_2$ but with a
common range of real parts $(x_1, x_2) \in U$. The data
$f_1(z_1, x_2)$ and $f_2(x_1, z_2)$ have coinciding boundary
values ($f_1(x_1, x_2) = f_2(x_1, x_2)$) in the limit ($y_1 \to 0$,
$y_2 \to 0$). The result is again the existence of a
common analytic continuation to $f_1$ and $f_2$, which
is a function of two complex variables $f(z_1, z_2)$ in
the intersection of the quadrant ($y_1 > 0$, $y_2 > 0$)
with a complex neighborhood of U. (Note that
this result of complex analysis still holds when the
real boundary values of the holomorphic functions
have singularities, namely are only defined
in the sense of distributions.)


Global analyticity properties The following property
(discovered by Streater for three-point functions)
looks like an extension of the tube theorem.
The holomorphy envelope of the union of two tubes
$T_\alpha$, $T_{\alpha'}$ corresponding to adjacent pairs of cells
$(\alpha,\alpha')_{(J,J')}$ together with a complex connecting set
bordered by $R_{\alpha,\alpha'} = \{[p]_N;\ p_J^2 < M_J^2\}$ is the convex
hull $\hat{T}_{\alpha,\alpha'}$ of the union of these tubes minus the
following analytic hypersurface $\sigma_J$, which can be
called “a cut”: $\sigma_J = \{[k]_N;\ k_J^2 = M_J^2 + \rho,\ \rho \ge 0\}$. The
interest of this result (although it remains by itself an
off-shell result) is that it can generate larger
cut-domains by additional analytic completions, which
may have intersections with the complex mass shell
(see below for the case N = 4).


Microlocal analyticity properties In the case of the
four-point function $H_4$, it is possible to consider
opposite cut-domains of the previous type, for which
$\sigma_J = \sigma_{(1,2)}$ is the energy-cut of the channel (1,2 →
3,4), and for which the spectral conditions prescribe
an “edge-of-the-wedge situation” in the neighborhood
of the corresponding mass-shell component
$\mathcal{M}_{(1,2 \to 3,4)}$. The result is that $H_4$ is proved to be
analytic in a full complex cut-neighborhood of
$\mathcal{M}_{(1,2 \to 3,4)}$ in the ambient complex energy-momentum
space. The intersection of this local domain
with the complex mass shell $\mathcal{M}_4^{(c)}$ is of course a full
complex cut-neighborhood of $\mathcal{M}_{(1,2 \to 3,4)}$ in $\mathcal{M}_4^{(c)}$, and
this proves that the corresponding scattering kernel
is the boundary value of an analytic scattering function
defined as the restriction $F(s,t) = \hat{H}_4|_{\mathcal{M}_4^{(c)}}$ of $\hat{H}_4$: it is
holomorphic in a domain of complex (s,t) space
deprived of the s-cut.

In the general case N > 4, the results are less 
spectacular, although a more sophisticated microlocal 
method involving a “generalized edge-of-the-wedge 
theorem” has been applied. This method, which was
one of the three methods at the origin of the chapter
of mathematics called microlocal analysis (the other
two being Hörmander’s “analytic wave-front”
method and Sato’s “microfunctions” method), is
based on a local version of the Fourier–Laplace
transformation called the FBI transformation (see,
e.g., the book on “hypo-analytic structures” by
Treves (1992) and in the present context the article
“Causality and local analyticity” by Bros and
Iagolnitzer (1973) (see Iagolnitzer (1992, ref.
[BI1]))).

A first positive result (first obtained by Hepp in
1965) is the fact that the various real boundary
values of $H_N$ admit well-defined restrictions as
tempered distributions on the corresponding (real)
mass shell $\mathcal{M}^{(\alpha)}_{(n,n')}$; this result is in fact crucial for the
rigorous proof of general reduction formulas. However
(according to Bros, Epstein, Glaser, 1972 (see
Iagolnitzer (1992, ref. [BEG2]))), the local existence
of an analytic scattering function in $\mathcal{M}_N^{(c)}$ is not
ensured at all points of the mass shell, but only in
certain regions. A rather favourable situation still
occurs for (2 → 3)-particle collision amplitudes (i.e.,
for N = 5), but in the general case there are large
regions of the mass shell where it is only possible to
prove (at least in this linear program) that the
amplitude is a sum of a limited number of boundary
values of analytic functions, defined in local domains
of $\mathcal{M}_N^{(c)}$ (see in this connection, Iagolnitzer (1992)).


Analyticity at fixed total energy in momentum
transfer variables A remarkably simple situation
had already been exploited before the general
analysis of $H_N$ leading to Theorem 1 was carried
out. It is the section of the domain of the N-point
function in the space of the “initial relative
4-momentum” $k = (k_1 - k_2)/2$ of the s-channel
with initial 4-momenta $(k_1, k_2)$, when the total
energy-momentum $P = -(k_1 + k_2)$ with $P^2 = s$ is
kept fixed and real. The remaining 4-momenta
$p_3,\dots,p_N$ such that $p_3 + \dots + p_N = P$ are also kept
fixed and real. Consider the case when P is (positive)
timelike and such that $s > 4m^2$. Then it can be seen
that one obtains analyticity of (a certain “1-vector
restriction” of) $H_N$ with respect to the vector variable
k in the union of the two opposite tubes $T^+ = \mathbb{R}^4 + iV^+$,
$T^- = \mathbb{R}^4 + iV^-$. Moreover, an edge-of-the-wedge
situation holds in view of the spectral coincidence
region of the form $k_1^2 = (-P/2 + k)^2 < M_1^2$,
$k_2^2 = (-P/2 - k)^2 < M_2^2$. The corresponding holomorphy
envelope is given by a Jost–Lehmann–Dyson
domain (see Dispersion Relations), whose section by
the complex mass shell $k_1^2 = k_2^2 = m^2$ turns out
to give a “spherical tube domain” of the form
$\{k;\ k = p + iq;\ k \cdot P = 0,\ k^2 = -s/4 + m^2;\ |q^2| < b^2\}$. The
(2 → N − 2)-particle scattering kernel is therefore the
boundary value of a scattering function holomorphic
in the previous spherical domain of complex k-space.
In the special case of the two-particle scattering
amplitude F(s,t), one checks that the previous domain
yields for each s, $s > 4m^2$, an ellipse of analyticity for
F(s,t) in the t-plane with foci at t = 0 and
$u = 4m^2 - s - t = 0$; this ellipse is called the Lehmann ellipse. (We
have considered for simplicity the case of a single type
of particle with mass m and two-particle threshold at
2m.) In fact, the squared momentum transfer t is equal
to $(k - k')^2$, if $k' = (k_3 - k_4)/2$ denotes the “final
relative momentum” of the s-channel, which was
here taken to be fixed and real. Moreover, by a similar
argument the corresponding absorptive part, namely
the discontinuity across the s-cut of the scattering
amplitude, can be shown to be holomorphic in a larger
ellipse with the same foci called the large Lehmann
ellipse.
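
The geometry of such an ellipse can be made concrete with a short numerical sketch. It assumes the standard equal-mass kinematics $t = -2p^2(1 - \cos\theta)$ with $p^2 = (s - 4m^2)/4$, so that the foci t = 0 and u = 0 correspond to $\cos\theta = \pm 1$. The semi-major axis `x0` of the ellipse in the $\cos\theta$-plane is a free input here, since its actual value (which depends on s and on the spectral thresholds) is not given in the text; the function name is illustrative.

```python
import math


def lehmann_ellipse_boundary(s, m, x0, n=8):
    """Boundary points t(phi) of an ellipse with foci t = 0 and t = 4m^2 - s.

    x0 > 1 is the semi-major axis of the corresponding ellipse in the
    cos(theta)-plane (foci at -1 and +1); it is an assumed input.
    """
    p_cm2 = (s - 4 * m**2) / 4        # squared centre-of-mass momentum
    y0 = math.sqrt(x0**2 - 1)         # semi-minor axis in the cos(theta)-plane
    points = []
    for j in range(n):
        phi = 2 * math.pi * j / n
        z = complex(x0 * math.cos(phi), y0 * math.sin(phi))  # cos(theta)
        points.append(-2 * p_cm2 * (1 - z))  # t = -2 p^2 (1 - cos theta)
    return points


s, m, x0 = 10.0, 1.0, 1.25
focus_t, focus_u = 0.0, 4 * m**2 - s  # t-values where t = 0 and where u = 0
for t in lehmann_ellipse_boundary(s, m, x0):
    # The sum of the distances to the two foci is constant on an ellipse.
    print(round(abs(t - focus_t) + abs(t - focus_u), 9))
```

Every boundary point gives the same sum of focal distances, $(s - 4m^2)\,x_0$, which confirms that the image in the t-plane is an ellipse with the two foci stated above.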

It is interesting to compare the previous result
with the one that one obtains when the fixed vector
P is chosen to be spacelike, namely when s has a
negative, namely “unphysical” value with respect to
the distinguished channel (1,2 → 3,4). For that case,
the exploitation of the primitive domain $D_4$ shows
that for all negative (unphysical) values $\zeta_i = k_i^2 < 0$;
i = 1,2,3,4, of the squared mass variables, the
function $H_4$ is holomorphic in a cut-plane of the
variable t, where the cuts are the t-cut ($t = 4m^2 + \rho$,
$\rho \ge 0$) and the u-cut ($u = 4m^2 - s - t = 4m^2 + \rho$,
$\rho \ge 0$). This cut-plane has of course to be compared
with the off-shell cut-plane domain $\Delta_\zeta$ at the basis
of the proof of dispersion relations (see Dispersion
Relations). Here, however, the choice of the squared
momentum transfer t as the variable of analyticity
allows one to shift to another interpretation in terms
of the concept of angular momentum.


Analyticity in the complex angular momentum
variable In all the situations previously considered
for the case N = 4, one can see that at fixed real
values of the squared energy s and of the squared
masses $\zeta = \{\zeta_i;\ i = 1,2,3,4\}$, the complex initial and
final relative 4-momenta k and k' have directions
which vary on the complexified sphere $S^{(c)}$. Moreover,
the corresponding restriction of $H_4$ to that
sphere turns out to be always well defined and
analytic on the real part of that sphere: it therefore
defines a kernel on the sphere, which, in view of
Poincaré invariance, is invariant under the rotations
and therefore admits a convergent expansion in
Legendre polynomials. Let us call $h_\ell(s;\zeta)$ the
corresponding sequence of Legendre coefficients.
In the first case considered above, this sequence
coincides (all $\zeta_i$ being equal to $m^2$) with what the
physicists call the set of partial waves $f_\ell(s)$ of the
scattering amplitude. The analyticity of $\hat{H}_4$ on a
complex spherical tube of $S^{(c)}$, namely of F(s,t) in
the Lehmann ellipse, is then equivalent to a certain
exponential decrease property with respect to $\ell$ of
the sequence of partial waves.
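
The link between analyticity in an ellipse and exponential decrease of the Legendre coefficients can be seen on a textbook example, used here as a stand-in for an actual amplitude. The generating function $f(z) = (1 - 2rz + r^2)^{-1/2}$ with 0 < r < 1 is analytic inside the ellipse with foci ±1 passing through $z_0 = (r + 1/r)/2$, and its Legendre coefficients are exactly $r^\ell$. The sketch below recovers them numerically; the function names and the Simpson quadrature are illustrative choices.

```python
import math


def legendre(l, z):
    """P_l(z) from the three-term recurrence (n+1)P_{n+1} = (2n+1)zP_n - nP_{n-1}."""
    p_prev, p = 1.0, z
    if l == 0:
        return p_prev
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * z * p - n * p_prev) / (n + 1)
    return p


def legendre_coefficient(f, l, n=2000):
    """(2l+1)/2 times the integral of f(z) P_l(z) over [-1, 1] (composite Simpson)."""
    h = 2.0 / n
    total = 0.0
    for j in range(n + 1):
        z = -1.0 + j * h
        w = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += w * f(z) * legendre(l, z)
    return (2 * l + 1) / 2 * total * h / 3


r = 0.5  # singularity at z0 = (r + 1/r)/2 = 1.25, outside [-1, 1]


def f(z):
    """Generating function of the Legendre polynomials."""
    return 1.0 / math.sqrt(1.0 - 2.0 * r * z + r * r)


for l in range(6):
    print(l, legendre_coefficient(f, l), r**l)
```

The two printed columns agree to quadrature accuracy: the coefficients fall off like $r^\ell = e^{-\ell \eta}$ with $\cosh \eta = z_0$, so a larger ellipse of analyticity (larger $z_0$) means a faster decrease in $\ell$.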

In the second case, where s and the $\zeta_i$ are negative, it
can be seen that the sphere S describes 4-momentum
configurations which all belong to a certain Euclidean
subspace $\mathcal{E}_4$ of $M_4^{(c)}$. But this situation is much more
favourable from the viewpoint of analyticity, since $H_4$
can be seen to be holomorphic on the full complex
submanifold $S^{(c)} \times S$ minus two sets $\sigma_t$ and $\sigma_u$,
which correspond to the t- and u-cuts of the
complex t-plane. Then this larger analyticity property
turns out to be equivalent to the fact that the
sequence $h_\ell(s;\zeta)$ admits an interpolation $H(\lambda;s;\zeta)$
holomorphic in a certain half-plane of the form
$\mathrm{Re}\,\lambda > \ell_0$ such that for all integers $\ell > \ell_0$ one has:
$H(\ell;s;\zeta) = h_\ell(s;\zeta)$. The value of $\ell_0$ is linked to the
power bound at large momenta that must be
satisfied by $H_4$ as a consequence of the temperateness
property included in the Wightman axiomatic
framework (Bros and Viano 2000).

Of course, this nice analytic structure in a
complex angular momentum variable could extend
to the set of physical partial waves $f_\ell(s)$ if one could
establish the analytic continuation of F(s,t) in a
cut-plane of t containing the Lehmann ellipses, but this
seems to be beyond the reach of the linear program
at least.


The “Nonlinear Program” and 
Its Two Main Aspects 


The extension of the analyticity domains by positivity
and the derivation of bounds by unitarity Positivity
conditions of the form [6] have been extensively
applied to the case N = 4 (namely for subsets J with
two elements). The main result (Martin 1969) consists
in the possibility of differentiating the forward dispersion
relations with respect to t and, as a consequence,
of enlarging the analyticity domain in t at fixed s: the
Lehmann ellipse, whose size shrinks to zero when s
tends to infinity, can then be replaced by an ellipse
(i.e., the Martin ellipse) whose maximal point
$t = t_{\max} > 0$ is fixed when s goes to infinity. This
justifies the extension of dispersion relations in s to
positive values of t; then in a second step the use of
unitarity relations for the partial waves allows one to
obtain Froissart-type bounds on the scattering amplitudes
(see Martin (1969)).


Asymptotic completeness and BS-type structural
analysis The Bethe-Salpeter (BS) equations were first
introduced as identities of formal series in the
perturbative approach of QFT, and the idea of
considering such identities as exact equations having
a conceptual content in the general axiomatic
framework of QFT was introduced and developed
by Symanzik in 1960. However, it took a long
time before it was integrated into the analytic program of
QFT (Bros 1970 (see Iagolnitzer (1992, ref. [B1]))).
These developments belong to the nonlinear program
since they rely on quadratic integral equations
between the various N-point functions, which
express the postulate of asymptotic completeness
via the use of appropriate reduction formulas.

For brevity, the general set of BS-type equations 
for the N-point functions with N > 4 will not be 
presented. The simplest BS-type equation, which 
concerns the four-point function, can be written as 
follows: 


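$$\hat{H}_4(K;k,k') = B(K;k,k') + (\hat{H}_4 \circ_s B)(K;k,k') \qquad [9]$$

where

$$(\hat{H}_4 \circ_s B)(K;k,k') = \int_\Gamma \hat{H}_4(K;k,k'')\, B(K;k'',k')\, G\!\left(\frac{K}{2} + k''\right) G\!\left(\frac{K}{2} - k''\right) d^4k'' \qquad [10]$$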


In the latter, the s-channel is privileged, with
$s = K^2$, $K = -(k_1 + k_2)$; $H_4$ is seen as a K-dependent
kernel (k and k' are the initial and final relative
4-momenta already defined), and the new object B
to be studied is also a K-dependent kernel. The
function G(k) is holomorphic in $k^2$ in a cut-plane
except for a pole at $k^2 = m^2$ which plays a crucial
role. (It is essentially the “propagator” or two-point
function of the field theory considered.) Apart from
pathologies due to the Fredholm alternative, the
correspondence between $H_4$ and B is one-to-one, but
the peculiarity concerns the integration cycle $\Gamma$ of
[10]: it is a complex cycle of real dimension 4, which
coincides with the Euclidean space of the vector
variable k'' when all the 4-momenta are Euclidean,
and can always be distorted inside the analyticity
domain of $H_4$ together with the external variables.
The exploitation of the Fredholm equation in
complex space with “floating integration cycles”
then implies that B is holomorphic at least in the
primitive domain of $H_4$.
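
The algebraic content of the one-to-one correspondence between $H_4$ and B can be illustrated on a finite-dimensional toy model. In the sketch below the integral over k'' is replaced by a sum over two points, so that [9] becomes the matrix equation H = B + H W B, where the diagonal matrix W stands for the product of the two functions G and the integration measure. All numerical values are arbitrary, and the discretization ignores the contour-distortion aspects discussed above; it is only meant to show the Fredholm-type inversion and where the Fredholm alternative enters.

```python
from fractions import Fraction as Fr


def matmul(a, b):
    """Product of two small matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inverse_2x2(a):
    """Inverse of a 2x2 matrix; fails exactly when its determinant vanishes."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ZeroDivisionError("Fredholm alternative: vanishing determinant")
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


identity = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]
B = [[Fr(1, 2), Fr(1, 3)], [Fr(1, 3), Fr(1, 4)]]  # toy irreducible kernel
W = [[Fr(1, 5), Fr(0)], [Fr(0), Fr(2, 5)]]        # toy weights (G G d^4k'')

# Solve H = B + H W B, i.e. H (1 - W B) = B.
WB = matmul(W, B)
one_minus_WB = [[identity[i][j] - WB[i][j] for j in range(2)] for i in range(2)]
H = matmul(B, inverse_2x2(one_minus_WB))

# Conversely, recover B from H using H = (1 + H W) B.
one_plus_HW = add(identity, matmul(H, W))
B_back = matmul(inverse_2x2(one_plus_HW), H)

print(H == add(B, matmul(matmul(H, W), B)), B_back == B)
```

In this toy version H is a ratio whose denominator is det(1 − W B): the values of the parameters at which this determinant vanishes are where H develops poles, which is the finite-dimensional analogue of a denominator given by a Fredholm determinant.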

An important geometrical aspect of the integration
on the cycle $\Gamma$ in [10] is the fact that this cycle is
“pinched” between the pair of poles of the functions
G when $K^2$ tends to its threshold value ($s = 4m^2$).
The type of mathematical concept encountered here
is closely related to those used in the study of
analyticity properties and Landau singularities of the
Feynman amplitudes in the perturbative approach of
QFT (in this connection, see the books by Hwa and
Teplitz (1966) and by F Pham (2005) and references
therein).

The first basic result is that it is equivalent for $H_4$
to satisfy an asymptotic completeness equation in
the pure two-particle region $4m^2 < s < 9m^2$ and for
B to satisfy the following property called
two-particle irreducibility: B satisfies dispersion relations
in s such that the s-cut begins at the three-particle
threshold: $s = 9m^2$.

The consequence of this extended analyticity
property of B is that it generates the following type
of analyticity properties for $H_4$:


1. The existence of a two-sheeted analytic structure
for $H_4$ over a domain of the s-plane containing
the interval $4m^2 < s < 9m^2$, with a square-root-type
branch point at the threshold $s = 4m^2$.

2. Composite particles. There exists a Fredholm-type
expression

$$\hat{H}_4(K;k,k') = \frac{N(K;k,k')}{D(K^2)} \qquad [11]$$

where N and D are expressed in terms of B via
Fredholm determinants, which shows that in its
second sheet $\hat{H}_4$ may have poles in $s = K^2$,
generated by the zeros of D. These poles are
interpreted as resonances or unstable particles.
The generation of real poles in the first sheet (i.e.,
bound states) is also possible under special
spectral assumptions of QFT.

3. Complex angular momentum diagonalization of
BS-type equations (Bros and Viano 2000, 2003).
The operation $\circ_s$ in the BS-type equation [9]
contains not only an integration over squared-mass
variables, but also a convolution product on
the sphere S; the latter is transformed into a
product by the Legendre expansion of four-point
functions described previously in the subsection
“Analyticity in the complex angular momentum
variable.” As a result, there is a partially
diagonalized transform of eqn [9] in terms of
the functions $H(\lambda;s;\zeta)$ and $B(\lambda;s;\zeta)$, which
allows one to write a Fredholm formula similar
to [11], namely

$$H(\lambda;s;\zeta) = \frac{N(\lambda;s;\zeta)}{D(\lambda;s)} \qquad [12]$$

Then under suitable increase assumptions on B,
there may exist a half-plane of the form $\mathrm{Re}\,\lambda > \ell_1$
(with $\ell_1 < \ell_0$) such that $H(\lambda;s;\zeta)$ admits poles
in the joint variables $\lambda$ and s, corresponding to
the concept of Regge particle: the composite
particles introduced in (2) might then be integrated
in the Regge particle, although they
manifest themselves physically only for integral
values $\ell$ of $\lambda$ with the corresponding spin
interpretation. Of course, this scenario is by no
means proven to hold in the general analytic
program of QFT, but we have seen that the
relevant “embryonic structures” are conceptually
built-in, so that the phenomenon might
hopefully be produced in a definite quantum field
model.

4. By-products of BS-type structural analysis for
N = 5 and N = 6. Relativistic exact structural
equations for (3 → 3)-particle collision amplitudes,
which generalize the Faddeev structural
equations of nonrelativistic potential theory,
have been shown to be valid in the energy
region of “elastic” collisions (i.e., with total
energy bounded by 4m); relevant Landau singularities
of tree diagrams and triangular diagrams
have been exhibited as a by-product in this
low-energy region (Bros, and also Combescure,
Dunlop in two-dimensional field models, 1981
(see Iagolnitzer (1992, refs. [B3], [B4], [CD]))).
Moreover, crossing domains on the complex mass
shell for (2 → 3)-particle collision amplitudes have
been obtained (Bros 1986 (see Iagolnitzer (1992,
ref. [B1]))) by conjointly using (N = 5) BS-type
equations together with analytic completion properties
(see, e.g., the “Crossing lemma” in Dispersion
Relations).


See also: Algebraic Approach to Quantum Field Theory; 
Axiomatic Quantum Field Theory; Dispersion Relations; 
Scattering, Asymptotic Completeness and Bound States; 
Scattering in Relativistic Quantum Field Theory: 
Fundamental Concepts and Tools; Thermal Quantum 
Field Theory. 
Introduction


Relativistic quantum field theory (QFT) has been
mainly developed since the 1950s in the perturbative
framework. Quantities of interest then appear
as infinite sums of Feynman integrals, corresponding
to infinite series expansions with respect to
couplings. This approach has led to basic successes
for practical purposes, but suffers from crucial
defects from the conceptual and mathematical
viewpoints. First, individual terms were a priori infinite:
this was solved by perturbative renormalization.
However, even so, the series remain divergent. Two
rigorous approaches have been developed since the
1960s. The axiomatic approach aims to establish a
general framework independent of any particular
model (Lagrangian interaction) and to analyze
general properties that can be derived in that
framework from basic principles. The “constructive”
approach aims to rigorously establish the
existence of nontrivial QFT models (theories) and
to directly analyze their properties. Some of the
fundamental bases are described in this encyclopedia
in the articles by J Bros, D Buchholz and
J Summers, and by G Gallavotti, respectively. This
article aims at a deeper study of the particle analysis
and scattering properties of theories. In contrast to the
articles by Buchholz and Summers and G Gallavotti, it is
restricted to massive theories, a rather strong
restriction, but for these theories it goes much further in
particle analysis.

From a purely physical viewpoint, results remain
limited: the models rigorously defined so far are
weakly coupled models in spacetime dimensions 2
or 3, results on bound states depend on specific
kinematical factors in these dimensions, proofs
of asymptotic completeness (AC) are not yet
complete, and so on. On the positive side, we might say
that the analysis and results are of interest from both
conceptual and physical viewpoints; in addition,
these works have been closely related to, and
have contributed to, important, purely mathematical
developments, for example, in the domain of
analytic functions of several complex variables and
in microlocal analysis.

The general framework of QFT based on
Wightman axioms is introduced in the next
section. Massive theories are characterized in that
framework by a condition on the mass spectrum.
Haag–Ruelle asymptotic theory then allows one to
define, in the Hilbert space $\mathcal{H}$ of states, two
subspaces $\mathcal{H}_{\mathrm{in}}$ and $\mathcal{H}_{\mathrm{out}}$ corresponding to states
that are asymptotically tangent, before and after
interactions, respectively, to free-particle states. The
AC condition $\mathcal{H} = \mathcal{H}_{\mathrm{in}} = \mathcal{H}_{\mathrm{out}}$ introduces a further
important implicit particle content in the theory.
Collision amplitudes or scattering functions are then
well defined in the space of on-mass-shell initial and
final energy–momenta (satisfying energy-momentum
conservation). The LSZ “reduction formulas”
give their link with chronological functions of the
fields.

Basic properties of scattering amplitudes that
follow from the Wightman axioms are then outlined.
In particular, these axioms allow one to define
the “N-point functions,” which are analytic in a
domain of complex energy-momentum space containing
the Euclidean region (imaginary energy
components), and from which chronological and
scattering functions can be recovered. Other results
at that stage include the on-shell physical sheet
analyticity properties of four-point functions, as well as
general asymptotic causality and local analyticity
properties for N > 4.

Next, we describe results derived from AC and
regularity conditions on analyticity and asymptotic
causality in terms of particles. In particular, the
analysis of the links between analyticity properties
of irreducible kernels (satisfying Bethe-Salpeter type
equations) and AC in low-energy regions is
included, following ideas of K Symanzik.

The final three sections are devoted to the analysis 
of models. 

Models of QFT have been rigorously defined in 
Euclidean spacetime, through cluster and, more 
generally, phase-space expansions which are shown 
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to be convergent at small coupling (and replace the 
nonconvergent expansions, of perturbative QFT). 
Examples of such models are the super-renormalizable
massive $\varphi^4$ models in dimensions 2 or 3 (in the
1970s) and the “just renormalizable” massive
(fermionic) Gross-Neveu model in dimension 2
(in the 1980s). The N-point functions of these models
can be shown to have exponential fall-off in
Euclidean spacetime. By the usual Fourier-Laplace
transform theorem, one obtains in turn analyticity
properties in corresponding regions away from the
Euclidean energy-momentum space.
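
The step from exponential fall-off to analyticity can be made explicit by a standard estimate. The sketch below assumes a bound of the form $|S(x)| \le C e^{-\mu |x|}$ with some rate $\mu > 0$; the symbols $C$ and $\mu$ and the sign convention of the Fourier transform are illustrative assumptions, not taken from the models themselves.

```latex
% Assumption: S is a function on R^d with |S(x)| <= C e^{-\mu |x|}, \mu > 0.
\[
  \tilde S(k) = \int_{\mathbb{R}^d} e^{i k\cdot x}\, S(x)\, d^dx ,
  \qquad k = p + i q \in \mathbb{C}^d .
\]
% For complex k the integrand is bounded by
\[
  \bigl| e^{i k\cdot x} S(x) \bigr|
    = e^{-q\cdot x}\, |S(x)|
    \le C\, e^{(|q| - \mu)\,|x|} ,
\]
% which is integrable whenever |q| < \mu. Hence \tilde S extends to a
% function analytic in the tube
\[
  \{\, k \in \mathbb{C}^d : |\operatorname{Im} k| < \mu \,\} .
\]
```

A faster decay rate thus gives a larger tube around the Euclidean (real $k$) region.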

On the other hand, Osterwalder-Schrader-type
properties can be established in Euclidean spacetime.
By analytic continuation from imaginary to real 
times, it is in turn shown that a corresponding 
nontrivial theory satisfying the Wightman axioms is 
recovered on the Minkowskian side. This analysis is 
omitted here. However, no information is obtained 
in that way on the mass spectrum, AC, energy-momentum
space analyticity, and so on. Such results can
be obtained through the use of irreducible kernels.
This was initiated by T. Spencer in the 1970s and
then developed along the same line (Spencer and 
Zirilli, Dimock and Eckmann, Koch, Combescure, 
and Dunlop). We outline here the more general 
approach of the present authors. In the latter, 
irreducible kernels are directly defined through 
“higher-order” cluster expansions which are again 
convergent at sufficiently small coupling. They are 
shown to satisfy exponential fall-off in Euclidean 
spacetime with rates better than those of the 
N-point functions, and hence corresponding analy- 
ticity in larger regions around (and away from) the 
Euclidean energy-momentum space. Results will 
then be established by analytic continuation, from 
the Euclidean up to the Minkowskian energy- 
momentum space, of structure equations that 
express the N-point functions in terms of irreducible 
kernels. These structure equations are infinite series 
expansions, with again convergence properties at 
small coupling. In the cases N = 2 and N = 4 (even
theories), the re-summation of these structure equations
gives, respectively, the Lippmann-Schwinger and
Bethe-Salpeter (BS) integral equations (up to some
regularization).

The one-particle irreducible (1PI) two-point
kernel $G_1$ is analytic up to $s = (2m)^2 - \varepsilon$, where $\varepsilon$
is small at small coupling (s is the squared center-of-mass
energy of the channel). A simple argument
then allows one to show analyticity of the actual
two-point function in the same region up to a pole
at $k^2 = m_{\mathrm{ph}}^2$: this shows the existence of a first basic
physical mass $m_{\mathrm{ph}}$ (close at small coupling to the
bare mass m). In a free theory (zero coupling) with
one mass m, there is only one corresponding
particle. At small coupling, the existence of other 
(stable) particles is not a priori expected; never- 
theless, we will see that such particles (two-particle 
bound states) will occur in some models in view of 
kinematical threshold effects. 
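
The “simple argument” for the pole can be sketched as follows. This is an illustration under stated assumptions rather than the authors' proof: the relation between the two-point function (written $F_2$ here) and the 1PI kernel is taken in a Dyson-type form, and its normalization and sign conventions are assumed.

```latex
% Assumptions (illustrative): up to normalization and sign conventions,
\[
  F_2(k) = \frac{1}{\, m^2 - k^2 - G_1(k^2) \,} ,
\]
% with G_1 analytic for s = k^2 up to (2m)^2 - \varepsilon and of the order
% of the coupling \lambda there.
% Poles of F_2 are the zeros of the denominator
\[
  h(s) = m^2 - s - G_1(s) .
\]
% At \lambda = 0, h has the single simple zero s = m^2. For small \lambda,
% Rouche's theorem on a small circle around s = m^2 gives exactly one
% simple zero s = m_ph^2 inside it, with
\[
  m_{\mathrm{ph}}^2 = m^2 - G_1(m_{\mathrm{ph}}^2) = m^2 + O(\lambda) ,
\]
% and h has no other zero in the analyticity region of G_1 as long as
% |G_1(s)| < |m^2 - s| outside that circle.
```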

The 2PI four-point kernel $G_2$ is shown to be
analytic up to $s = (4m)^2 - \varepsilon$ in an even theory. On
the other hand, it satisfies a (regularized) BS
equation. In a way analogous to the section “AC
and analyticity,” starting here from the analyticity of
$G_2$, the actual four-point function F is in turn
analytic or meromorphic in that region up to the cut
at $s \geq 4m_{\mathrm{ph}}^2$, and the discontinuity formula associated
with AC in the low-energy region is obtained.

For some models (depending on the signs of some 
couplings), it will be shown that F has a pole in the 
physical sheet, below the two-particle threshold (at a 
distance from it which tends to zero as the coupling 
itself tends to zero). This pole then corresponds to a 
further stable particle. 

More generally, and up to some technical pro- 
blems, the structure equations should allow one to 
derive various discontinuity formulas of N-point 
functions including those associated with AC in 
increasingly higher-energy regions. Asymptotic caus- 
ality in terms of particles and related analyticity 
properties (Landau singularities...) should also 
follow. However, in this approach, results should 
be obtained only for very small couplings as the 
energy region considered increases. 

Note: Notations used are different in the next 
two sections on the one hand, and the final three 
sections on the other. These notations follow the 
use of, respectively, axiomatic and constructive 
field theory; for instance, x and p are real on 
the Minkowskian side in the next two sections 
whereas they are real on the Euclidean side in the 
last three sections. The mass m in the next two 
sections is a physical mass, whereas it is a bare 
mass in the last three sections (where a physical
mass is denoted $m_{\mathrm{ph}}$).


The General Framework of Massive 
Field Theories 


We denote by $x = (x_0, \vec{x})$ a (real) point in Minkowski
spacetime with respective time and space components
$x_0$ and $\vec{x}$ (in a given Lorentz frame); $x^2 = x_0^2 - \vec{x}^2$.
Besides the usual spacetime dimension d = 4, possible
values 2 or 3 will also be considered. In all that
follows, the unit system is such that the velocity c of
light is equal to 1. Energy-momentum variables, dual
(by Fourier transformation) to time and space
variables, respectively, are denoted by $p = (p_0, \vec{p})$;
$p^2 = p_0^2 - \vec{p}^2$.

We describe below the Wightman axiomatic 
framework, though alternative ones such as “local 
quantum physics” based on the Araki-Haag-Kastler
axioms may be used similarly for present purposes. 
For simplicity, unless otherwise stated, we consider 
a theory with only one basic (neutral, scalar) field A; 
A is defined on spacetime as an operator-valued 
distribution: for each test function f, A(f) (formally
$\int A(x) f(x)\,dx$) is an operator in a Hilbert space H of
states. A physical state is represented by a (normal- 
ized) vector in H modulo scalar multiples. It has to 
be physically understood as “sub specie aeternitatis” 
(i.e., “with all its evolution,” the Heisenberg picture 
of quantum mechanics being always adopted). It is 
assumed that there exists in H a representation of the 
Poincaré group (semidirect product of pure Lorentz 
transformations and spacetime translations). 

The Wightman axioms include: 


1. local commutativity: A(x) and A(y) commute if
$x - y$ is spacelike: $(x - y)^2 < 0$.

2. the spectral condition (= positivity of the energy
in relativistic form): the spectrum of the energy-momentum
operators (infinitesimal generators of
spacetime translations) is contained in the cone
$\overline{V}_+$ ($p^2 \geq 0$, $p_0 \geq 0$). In a massive theory, the
spectrum is more precisely assumed to be
contained in the union of the origin (that will
correspond to the vacuum vector introduced
next), of one or more discrete mass-shell hyperboloids
$H_+(m_i)$ ($p^2 = m_i^2$, $p_0 > 0$) with strictly
positive masses $m_i$, and of a continuum. For
simplicity, and unless otherwise stated, we con- 
sider in this section a theory with only one mass 
m and a continuum starting at 2m (but this will 
not be so in a theory with “two-particle bound 
states”). This condition introduces a first (partial) 
particle content of the theory. In models, physical 
masses will not be introduced at the outset but 
will have to be determined. 

3. existence in H of a vacuum vector $\Omega$, which is the
only invariant vector under Poincaré transforma- 
tions up to scalar multiples; it is moreover assumed 
that the vector space generated by the action of field 
operators on the vacuum is dense in H. 

4. Poincaré covariance of the theory. 


Subspaces $H_{\mathrm{in}}$ and $H_{\mathrm{out}}$ of H can be defined by
limiting procedures. To that purpose, one considers
test functions $f_{k,t}(x)$ with Fourier transforms of
the form $\tilde f_k(p)\, e^{i(p_0 - \sqrt{\vec{p}^2 + m^2})\,t}$, where the functions $\tilde f_k$
have their supports in a neighborhood of the mass-shell
$H_+(m)$. It can then be shown that vectors of the
form $\Psi_t = A(f_{1,t})A(f_{2,t})\cdots A(f_{n,t})\Omega$ converge to
limits in H when $t \to -\infty$ and $t \to +\infty$, respectively, and that
these limits depend only on the mass-shell restrictions
of the test functions $\tilde f_1, \ldots, \tilde f_n$.

$H_{\mathrm{in}}$ and $H_{\mathrm{out}}$ are interpreted physically as subspaces
of states that are “asymptotically tangent,”
before and after the interactions, respectively, to free-particle
states with particles of mass m. They are in
fact both isomorphic to the free-particle Fock space
F, namely the direct sum of n-particle spaces of
“wave functions” depending on n on-mass-shell
energy-momenta $p_1, p_2, \ldots, p_n$.

AC is the assertion that $H = H_{\mathrm{in}} = H_{\mathrm{out}}$, that is,
that each state in H is asymptotically tangent to a 
free-particle state, with particles of mass m, both 
before and after interactions (the two free-particle 
states are different if there are interactions). This 
condition cannot be expected to always hold in the 
general framework introduced above, even if we 
restrict our attention to “physically reasonable” 
theories in which states of H are asymptotically 
tangent to free-particle states before and after 
interactions: the absence of other stable particles 
with different masses is not guaranteed. For 
instance, even if A is “neutral,” the action of field 
operators on the vacuum might generate pairs of 
“charged” particles with opposite charges, whatever 
“charge” one might imagine. Individual charged 
particles cannot occur in the neutral space H and 
their mass thus does not appear in the spectral 
condition. Hence, such states of pairs of charged 
particles will not belong to $H_{\mathrm{in}}$ or $H_{\mathrm{out}}$ although
they belong to H. However, if the set of charged 
particles is known, it can be shown that the above 
framework might be enlarged by defining charged 
fields, in such a way that AC might still be valid in 
the enlarged framework (see the article of Buchholz 
and Summers). For simplicity, we restrict below our 
attention to the simplest theories in which AC holds 
in the way stated above. 

If AC holds, it is shown that there exists a linear 
operator S from H to H, called “collision operator” 
or “S-matrix,” that relates the “initial” and “final” 
free-particle states to which a state in H is tangent 
before and after interactions, respectively; if AC 
does not hold, S can also be defined as an operator in
F. Collision amplitudes or scattering functions are 
the energy-momentum kernels of S for given 
numbers m and n of initial and final particles. As is
easily seen, they are well-defined distributions on the
space of all initial and final on-shell energy-momenta.
For convenience, we will denote by $p_k$
the physical energy-momentum of a final particle
with index k ($p_k \in H_+(m)$), and by $-p_k$ the physical
energy-momentum of an initial particle
($-p_k \in H_+(m)$).


Wightman Functions, Chronological Functions,
and LSZ Reduction Formulas


The N-point Wightman “functions” $W_N$ are defined
as the vacuum expectation values (VEVs) of the
products of N field operators, namely:


$$W_N(x_1, x_2, \ldots, x_N) = \langle \Omega, A(x_1)A(x_2)\cdots A(x_N)\Omega \rangle$$


The chronological functions $T_N$ are the VEVs of the
chronological products of the fields $A(x_1), \ldots, A(x_N)$:
in the latter, fields are ordered according to
decreasing values of the time components of the
points $x_k$. $T_N$ is essentially well defined due to local
commutativity with, however, problems not treated
here at coinciding points.

$\tilde T_N(p_1, \ldots, p_N)$ will denote the Fourier transform
of $T_N$. In view of the invariance of the theory under
spacetime translations, the functions above are invariant
under global spacetime translation of all points $x_k$
together. Hence, their Fourier transforms contain an
energy-momentum conservation (e.m.c.) delta function
$\delta(p_1 + p_2 + \cdots + p_N)$. Connected N-point functions
are defined by induction (over N) via a
formula expressing each (nonconnected) function
as the sum of the corresponding connected function
and of products of connected functions depending
on subsets of points. In contrast to nonconnected
functions, the analysis shows that connected functions
in energy-momentum space do not contain in
general e.m.c. delta functions involving subsets of
energy-momenta.
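
The inductive definition of connected functions can be written out for the lowest orders. In the sketch below, $F_N$ stands for a generic nonconnected N-point function and $F_N^c$ for its connected part; both symbols are introduced here for illustration only.

```latex
% F_N: generic (nonconnected) N-point function; F_N^c: its connected part.
% The ordering of the points is kept inside every factor.
\begin{align*}
  F_1(x_1) &= F_1^c(x_1) , \\
  F_2(x_1,x_2) &= F_2^c(x_1,x_2) + F_1^c(x_1)\,F_1^c(x_2) , \\
  F_3(x_1,x_2,x_3) &= F_3^c(x_1,x_2,x_3)
      + F_1^c(x_1)\,F_2^c(x_2,x_3)
      + F_1^c(x_2)\,F_2^c(x_1,x_3) \\
    &\quad + F_1^c(x_3)\,F_2^c(x_1,x_2)
      + F_1^c(x_1)\,F_1^c(x_2)\,F_1^c(x_3) .
\end{align*}
% General N: sum over all partitions \pi of {1,...,N} into nonempty subsets I,
\[
  F_N(x_1,\ldots,x_N) = \sum_{\pi} \; \prod_{I \in \pi} F_{|I|}^c(x_I) .
\]
```

Each factor is separately translation invariant, so in energy-momentum space each product term carries one e.m.c. delta function per subset; only the fully connected term carries the global delta function alone.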

It can be shown that the two-point function
$\tilde T_2(p_1, p_2) = \delta(p_1 + p_2)\,\tilde T_2(p_1)$ has a pole of the form
$1/(p_1^2 - m^2)$ and that $\tilde T_N$ has similar poles for each
energy-momentum variable $p_k$ on the mass-shell. The
connected, amputated chronological function $\tilde T_N^{\mathrm{amp},c}$ is
defined by multiplying $(\tilde T_N)_{\mathrm{connected}} = \tilde T_N^{c}$ (for N > 2)
by the product of all factors $p_k^2 - m^2$ that cancel these
poles. It is then shown that it can be restricted as a
distribution to the mass-shell of any physical process
with m initial and n final particles, with m + n = N,
and that this restriction coincides with the collision
amplitude of the process. A process is here characterized
by fixing the initial and final indices.

The analyticity properties of interest (described 
below) will apply to the connected functions after 
factoring out their global e.m.c. delta functions. 


The Analytic N-point Functions 


The Wightman axioms (without so far AC) yield
general analyticity, as well as asymptotic causality,
properties that we now describe. The analysis is
essentially based on the interplay of support properties
in x-space arising from local commutativity and
the definition of chronological operators, and support
properties in p-space due to the spectral
condition. Support properties in x-space apply to
cell and more general “paracell” functions which are
VEVs of adequate combinations of products of
“partial” chronological operators. It is shown that
each such function has support in x-space in a closed
cone $C_S$ (with apex at the origin). Moreover, for cell
functions, the cone $C_S$ is convex and salient. Hence,
in view of the usual Laplace transform theorem, the
cell function in p-space (after Fourier transformation)
is the boundary value of a function analytic in
complex space in the tube (Re p arbitrary, Im p in the
open dual cone $\tilde C_S$ of $C_S$). It is also shown that, near
any real point $P = (P_1, \ldots, P_N)$, the chronological
function in p-space coincides with one or more cell
functions.
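
The Laplace transform theorem invoked here can be summarized in a few lines. The sketch below uses generic symbols ($f$, $C$, $\tilde C$, $T$) and an assumed sign convention for the Fourier transform; it illustrates the mechanism and does not restate the precise theorem used for cell functions.

```latex
% Assumptions: f is a tempered distribution with support in a closed convex
% salient cone C (apex at the origin); k.x is the pairing used in the
% Fourier transform; sign conventions are illustrative.
\[
  \tilde f(k) = \int e^{i k\cdot x}\, f(x)\, dx , \qquad k = p + i q .
\]
% Open dual cone of C:
\[
  \tilde C = \{\, q : q\cdot x > 0 \ \text{for all } x \in C\setminus\{0\} \,\} .
\]
% For q in \tilde C there is c(q) > 0 with q.x >= c(q)|x| on C, so that
\[
  \bigl| e^{i k\cdot x} \bigr| = e^{-q\cdot x} \le e^{-c(q)\,|x|}
  \qquad (x \in C) ,
\]
% and the integral defines a function analytic in the tube
\[
  T = \{\, k = p + i q : p \ \text{arbitrary},\ q \in \tilde C \,\} ,
\]
% whose boundary value for q -> 0 inside \tilde C is \tilde f(p).
```

Salience of the cone is what guarantees that the dual cone has a nonempty interior, so that the tube is a genuine open domain.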

Together with support properties in p-space
arising from the spectral condition and the use of
coincidence relations between some cell functions (in
adequate real regions in p-space), one then shows
the existence, for each N, of a well-defined, unique
analytic function $F_N$, called the “analytic N-point
function,” whose domain of analyticity, the “primitive
domain of analyticity,” in complex p-space
contains all the tubes $T_S$ associated with the cell
functions. It also contains in particular a complex
neighborhood of the Euclidean energy-momentum
space which consists of energy-momenta $P_k$ with
real $\vec{P}_k$ and imaginary energies $(P_k)_0$. Moreover, the
(connected, amputated) chronological function $\tilde T_N^{\mathrm{amp},c}$ is the boundary value
of $F_N$ at all real points P, from imaginary directions
which include those of the convex envelope of the
cones $\tilde C_S$ associated with cell functions that coincide
locally with $\tilde T_N^{\mathrm{amp},c}$.

However, the primitive domain has an empty 
intersection with the complex mass-shell, and thus 
gives no result on analyticity properties of collision 
amplitudes on the (real or complex) mass-shell. For 
N=4, it has been possible to largely extend the 
primitive domain (which is not a “natural domain of 
holomorphy”) by computing (parts of) its holomorphy 
envelope, which now has a nonempty intersection 
with the complex mass shell. It is shown in turn that 
the four-point function F4 can be restricted to the 
complex mass-shell in a one-sheeted domain, called 
the “physical sheet,” that admits each (real) physical 
region on its boundary (there is here one physical 
region for each choice of the two initial and the two 
final indices, the corresponding physical regions being 
disconnected from each other). In each physical 
region, the collision amplitude is the boundary value 
of the mass-shell restriction of $F_4$, from the corresponding
half-space of “$+i\varepsilon$” directions Im s > 0,
where s is the (squared) energy of the process. 
The analyticity domain on the complex mass-shell
contains paths of analytic continuation between the
various physical regions (“crossing property”) and
admits cuts $s_{ij}$ real $\geq (2m)^2$ covering the various
physical regions. From these analyticity properties in
the physical sheet, one can also derive “dispersion
relations” (see Dispersion Relations).


Asymptotic causality and analyticity 
properties for N > 4 


No similar result has been achieved at N > 4, and as 
a matter of fact, no similar result is expected if the 
AC condition is not assumed. The best results 
achieved so far are decompositions of the collision 
amplitude, in various parts of its physical region, as 
a sum of boundary values of functions analytic in 
domains of the complex mass-shell. In contrast to 
the case N =4, the sum reduces to one term only in 
a certain subset of the physical region. Near other 
points, the N-point analytic function cannot be 
restricted locally to the complex mass-shell, though 
it can be decomposed as a sum of terms which, 
individually, are locally analytic in a larger domain 
that intersects the complex mass-shell. 

These analyticity properties for N > 4 are a direct 
consequence of (and equivalent to) an asymptotic 
causality property that we now outline. Let $f_{k,\tau}(p)$
be, for each index k, a test function of the form


$$f_{k,\tau}(p) = e^{-i\tau\, p\cdot u_k}\, e^{-\gamma\tau\,|p - P_k|^2}$$


where each $u_k$ is a point in spacetime, $P_k$ is a given
on-shell energy-momentum, and $\tau$ will be a spacetime
dilatation parameter ($\gamma > 0$). It is well localized
in p-space around the point $P_k$, and its Fourier
transform is well localized in x-space around the
point $\tau u_k$ up to an exponential fall-off of width $\sqrt{\gamma\tau}$
which is small compared to $\tau$ as $\tau \to \infty$.
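
The localization claims can be checked on a one-dimensional Gaussian of this type. The function $f_\tau$ below, its phase convention and the Fourier convention are assumptions made for illustration, with $a = \gamma\tau$.

```latex
% One-dimensional illustration; a = \gamma\tau > 0, u and P real.
\[
  f_\tau(p) = e^{-i \tau u\, p}\; e^{-a (p - P)^2} .
\]
% Fourier transform (convention \hat f(x) = \int e^{i p x} f(p)\,dp),
% with y = x - \tau u and the substitution p = P + q:
\[
  \hat f_\tau(x) = e^{i P y} \int e^{i q y}\, e^{-a q^2}\, dq
               = \sqrt{\frac{\pi}{a}}\; e^{i P y}\; e^{-y^2/(4a)} .
\]
% Hence |\hat f_\tau(x)| is a Gaussian centred at x = \tau u with width of
% order \sqrt{a} = \sqrt{\gamma\tau}, and
\[
  \frac{\sqrt{\gamma\tau}}{\tau} = \sqrt{\gamma/\tau} \;\to\; 0
  \qquad \text{as } \tau \to \infty .
\]
```

At the same time $f_\tau$ itself has width of order $1/\sqrt{\gamma\tau}$ around $P$ in p-space, so both localizations sharpen relative to the dilatation scale as $\tau$ grows.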

We now consider the action of the (connected,
amputated) chronological function on such test
functions. A configuration $u = (u_1, \ldots, u_N)$ will be
called “noncausal” at $P = (P_1, \ldots, P_N)$ if this action
decays exponentially as $\tau \to \infty$. In mathematical
terms, u is then outside the “essential support” or
“microsupport” at P. The asymptotic causality
property that is established has roughly the following
content: the only possible causal configurations u
at P are those for which energy-momentum can be
transferred from the initial to the final points in
future cones. Moreover, at least two initial “extremal”
points must coincide, as must two extremal
final points. The simplest example is the case N = 4;
if, for example, indices 1,2 are initial and 3,4 final,
then the only a priori possible causal situations are
such that $u_3 = u_4$ is in the future cone of $u_1 = u_2$ (in
this particular case Lorentz invariance implies that
$u_3 - u_1$ must be proportional to $P_3 + P_4$). In more
general cases, the possible causal configurations u
depend on P.


AC and Analyticity 


Asymptotic Causality in Terms of Particles 
and Landau Singularities 


As a matter of fact, a better causality property “in 
terms of particles” — which is the best possible 
one — is expected for “physically reasonable” 
theories if the (stable) particles of the theory are 
known. (By physically reasonable, we mean the 
absence of “à la Martin” pathologies such as the
occurrence of an infinite number of unstable
particles with arbitrarily long lifetimes). That property
expresses the idea that the only causal
configurations u at P are those for which the 
energy-momentum can be transferred from the 
initial to the final points via intermediate stable 
particles in accordance with classical laws: there 
should exist a classical connected multiple scatter- 
ing diagram in spacetime joining the initial and 
final points $u_k$, with physical on-shell energy-momenta
for each intermediate particle and
energy-momentum conservation at each (point- 
wise) interaction vertex. 

This property, if it holds, yields in turn (and is 
equivalent to) improved analyticity of the analytic 
N-point function near real physical regions: the (on- 
shell) collision amplitude is the boundary value of a 
unique analytic function in its physical region, at 
least away from some “exceptional points.” The 
boundary value (namely the collision amplitude) is 
moreover analytic outside Landau surfaces $L_+(\Gamma)$ of
connected multiple scattering graphs $\Gamma$; and along
these surfaces (which are in general smooth
codimension-1 surfaces), it is in general obtained
from well-specified “$+i\varepsilon$” directions (that depend in
general on the real point P of $L_+$).

Exceptional points are those that lie at the
intersection of two (or several) surfaces $L_+(\Gamma_1)$,
$L_+(\Gamma_2), \ldots$, with opposite causal directions, and
hence having no $+i\varepsilon$ directions in common (in the
on-shell framework). Such points do not occur at
N = 4 for two-body processes, in which case the
surfaces $L_+$ are the n-particle thresholds $s = (nm)^2$,
with $n \geq 2$, $s = (p_1 + p_2)^2$. They do occur more
generally: in a 3-3 process, 1,2,3 initial, 4,5,6
final, this is the case of all points P such that
$-P_1 = P_4$, $-P_2 = P_5$, $-P_3 = P_6$, which all belong to
the Landau surfaces of the two graphs $\Gamma_1, \Gamma_2$ with
only one internal line joining two interaction
vertices: in the case of $\Gamma_1$ (resp., $\Gamma_2$), the first vertex
involves the external particles 1, 2, 4 (resp., 1, 3, 5),
while the second one involves 3, 5, 6 (resp., 2, 4, 6).
If moreover $P_1, P_2, P_3$ lie in a common plane, the
previous points P also lie on surfaces $L_+$ of
“triangle” graphs with again opposite causal
directions at P. The fact that $+i\varepsilon$ directions are
opposite can equally be checked for the corresponding
Feynman integrals of perturbative field
theory.


Remark The above points are no longer exceptional
in spacetime dimension 2. In fact, all surfaces
$L_+$ mentioned then coincide with the (on-shell)
codimension-1 surface $-p_1 = p_4$, $-p_2 = p_5$, $-p_3 = p_6$,
with two opposite causal directions. The previous
asymptotic causality property, together with a further 
“causal factorization” property for causal configura- 
tions, then yields along that surface an actual 
factorization of the three-body (nonconnected) 
S-matrix into a product of two-body scattering 
functions modulo an analytic background. The latter 
vanishes outside the surface, hence is identically zero, 
for some special two-dimensional models. 


In the absence of the AC condition, one clearly 
sees why the above causality in terms of particles 
cannot be established: as we have seen, there is 
a priori no control on the stable particles of the 
theory and on their masses, and pathologies such as 
those mentioned above cannot be excluded. Hope- 
fully, the first problem should be solved if AC is 
assumed, and the second one should be removed by 
adequate regularity assumptions. This is the purpose
of the so-called axiomatic nonlinear program,
in which one also wishes to examine further
problems, for example, analytic continuation into
unphysical sheets, with the occurrence of possible
unstable particle poles and other singularities,
the nature of singularities, and possible multiparticle
dispersion relations, to cite only a few. Results so
far remain limited but provide a first insight into
such problems.


The Nonlinear Axiomatic Program 


Results described below are based on discontinuity 
formulas arising from (and essentially equivalent in
adequate energy regions to) AC, together with
some regularity conditions. They can be established 
either with or without the introduction of adequate 
“irreducible” kernels. The methods rely on some 
general preliminary results on Fredholm theory in 
complex space (and with complex parameters). 
Irreducible kernels are defined through integral 
(Fredholm type) equations, first in the Euclidean
region (imaginary energies) and then by local
distortions of integration contours allowing one to 
reach the Minkowskian region. From discontinuity 
formulas and algebraic arguments, these irreducible 
kernels are shown to have analyticity (or meromor- 
phy) properties associated with the physical idea of 
irreducibility (see examples below). 

Results obtained so far with or without irreduci- 
ble kernels are comparable in the simplest cases. 
However, the method based on irreducible kernels 
gives more refined results and seems best adapted to 
“extricate” the analytic structure of N-point func- 
tions for N > 4. 


N=4, Two-Body Processes in the 
Low-Energy Region 


By an even theory, we mean a theory in which the N-point
functions vanish identically for N odd.

Standard results on two-body processes with
initial (resp., final) energy-momenta $p_1, p_2$ (resp.,
$p_1', p_2'$) in the low-energy region $(2m)^2 \leq s < (3m)^2$
($s = (p_1 + p_2)^2 = (p_1' + p_2')^2$) are based on the “off-shell
unitarity equation”


$$F_+ - F_- = F_+ * F_- \qquad [1]$$


where $F_+(p_1, p_2; p_1', p_2')$ and $F_-(p_1, p_2; p_1', p_2')$ denote,
respectively, the $+i\varepsilon$ and $-i\varepsilon$ boundary values of the
four-point function $F_4$ from above or below the cut
$s \geq (2m)^2$ in the physical sheet, and $*$ denotes on-shell
convolution over two intermediate energy-momenta.
This relation is a direct consequence of
AC for s less than $(3m)^2$, or less than $(4m)^2$ in an
even theory. When the four external energy-momentum
vectors $p_1, p_2, p_1', p_2'$ are put on the mass shell
(on both sides of that relation), one recovers the usual
elastic unitarity relation for the collision amplitude
$T_+$ and its complex conjugate $T_-$:


$$T_+ - T_- = T_+ * T_-$$


In the exploitation of these relations outlined below, 
a regularity condition is moreover needed, for 
example, the continuity of F, in the low-energy 
region. 

By considering the unitarity equation as a Fredholm
equation for $T_+$ at fixed s (in the complex mass
shell), one obtains the following result: $T_+$ can be
analytically continued as a meromorphic function
of s through the cut (in the low-energy region) in a
two-sheeted (d even) or multisheeted (d odd)
domain around the two-particle threshold. Possible
poles in the second sheet (generated by Fredholm
theory) will correspond physically to unstable
particles. The singularity at the two-particle threshold
is of the square-root type in s for d even, or in
$1/\log s$ for d odd. The difference between the two
cases is due to the power (d − 1)/2 of s, integer or
half-integer, in the kinematical factor arising from
on-shell convolution. This result can also be
extended to the off-shell function $F_4$ by applying a
further argument of analytic continuation making
use of the off-shell unitarity equation.

Restricting now our attention to an even theory
(for simplicity), a similar result also follows from the
introduction of a two-particle irreducible (2PI) BS-type kernel G
satisfying (and here defined from F through) a
regularized BS equation of the form


$$F = G + F \circ_M G \qquad [2]$$


where $\circ_M$ denotes convolution over two intermediate
energy-momenta with two-point functions on
the internal lines and a regularization factor in order
to avoid convergence problems at infinity (G then
depends on the choice of this factor but its properties
and the subsequent analysis do not). Alternatively,
one may also introduce a kernel satisfying a
renormalized BS equation, but this is not useful for
present purposes.

Starting from the above discontinuity formula [1], 
one shows in turn that G is indeed “2PI” in the 
analytic sense: 


$$G_+ = G_- \qquad [3]$$


in the low-energy region. More precisely, G is 
analytic or meromorphic (with poles that may arise 
from Fredholm theory) in a domain that includes the 
two-particle threshold $s = (2m)^2$, in contrast to F
itself. 

The proof of [3] is based on the relation


$$\circ_+ - \circ_- = * \qquad [4]$$


independent of M (and thus leaving the M dependence
implicit), which is a nontrivial adaptation of the decomposition
of a mass-shell delta function as a sum of plus
and minus $i\varepsilon$ poles. A simple algebraic argument
then shows essentially the equivalence between the
discontinuity formulas [1] and [3].

In turn, assuming that G has no poles, this 
analyticity allows one to recover the two-sheetedness 
(d even) or multisheetedness (d odd, singularity in 
1/log) of F, in view of the BS type equation. 
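
The algebraic argument relating the two discontinuity formulas can be sketched formally. The sketch assumes a BS-type relation $F = G + F \circ G$ for both boundary values, a relation $\circ_+ - \circ_- = *$ between the $\pm i\varepsilon$ boundary values of the convolution and the on-shell convolution, and the existence of the inverses used; these are working assumptions for illustration, and regularization and domain questions are ignored.

```latex
% Formal operator notation. Assumptions (illustrative):
%   F_\pm = G_\pm + F_\pm \circ_\pm G_\pm ,   \circ_+ - \circ_- = * ,
% and all inverses below exist.
% Solving the BS-type relation for F_\pm:
\[
  F_\pm = G_\pm \,(1 - \circ_\pm G_\pm)^{-1}
  \quad\Longrightarrow\quad
  F_\pm^{-1} = G_\pm^{-1} - \circ_\pm .
\]
% Subtracting the two relations,
\[
  F_-^{-1} - F_+^{-1} = \bigl( G_-^{-1} - G_+^{-1} \bigr) + * .
\]
% Multiplying by F_+ on the left and by F_- on the right,
\[
  F_+ - F_- = F_+ * F_- \;+\; F_+ \bigl( G_-^{-1} - G_+^{-1} \bigr) F_- .
\]
% Hence F_+ - F_- = F_+ * F_-  (formula [1]) holds if and only if
% G_+ = G_-  (formula [3]).
```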


N =6, 3-3 Process in the Low-Energy Region 
(Even Theory) 


The result, in the neighborhood of the 3-3 physical 
region, is here a “structure equation” expressing the 
3-3 function F in the low-energy region as a sum of
“à la Feynman contributions” associated with
graphs with one internal line and with triangle
graphs, with two-point functions on internal lines
and four-point functions at each vertex, plus a
remainder R. The latter is shown to be a boundary
value from $+i\varepsilon$ directions Im s positive, where
$s = (p_1 + p_2 + p_3)^2$, with $p_1, p_2, p_3$ denoting the energy-momentum
vectors of the initial particles. Further
regularity conditions are needed to recover its local 
physical region analyticity. The various explicit 
contributions that we have just mentioned yield the 
actual physical region Landau singularities expected 
in the low-energy 3-3 physical region. 

A more refined result, in the approach based on 
irreducible kernels outlined below, applies in a 
larger region and then includes further à la Feynman 
contributions associated with 2-loop and 3-loop 
diagrams (the latter do not contribute to “effective” 
singularities in the neighborhood of the physical 
region). 

The first result can be established from disconti- 
nuity formulas for the three-point function around 
two-particle thresholds, arising from AC, and 
“microsupport” analysis of all terms involved. In 
the approach based on irreducible kernels, it is 
useful to introduce in particular a 3PI kernel G3 
that, in contrast to the 3-3 function, will be analytic 
or meromorphic in a domain including the three- 
particle threshold. To that purpose, an adequate set 
of integral equations is introduced and the three- 
particle irreducibility of G3 in “the analytic sense” is 
then established. In turn it provides the complete 
structure equation mentioned above. 


More General Analysis 


There are so far only preliminary steps in more 
general situations, in view of (difficult) technical 
problems involved and the need of ad hoc regularity 
assumption at each stage. As already mentioned, the 
approach based on irreducible kernels seems best 
adapted. The analysis should clearly involve more 
general irreducible kernels with various irreducibil- 
ity properties with respect to various channels (and 
not only with respect to the basic channel consid- 
ered such as the 3-3 channel in the case above). 
From a heuristic viewpoint, one may first consider 
to that purpose adequate formal expansions into 
(infinite) sums of “à la Feynman contributions”
adapted to the energy regions under investigation.
These à la Feynman contributions will involve
adequate irreducible kernels in the graphical sense 
at each vertex, and the above expansions correspond 
formally to the best possible regroupings of 
Feynman integrals with respect to the energy region 
considered. From such expansions, one might 
determine adequate sets of integral equations allowing
one, together with regularity assumptions, to
carry out an analysis similar to above. 


The Models 


A Euclidean field-theoretical model can be defined
by a probability measure $d\mu(\varphi)$ on the space of
tempered distributions $\varphi$ in Euclidean spacetime,
whose moments verify the Osterwalder-Schrader (or
similar) axioms. The moments of $d\mu$ are, for each N,
the Euclidean (Schwinger) N-point functions:


$$S(x_1, \ldots, x_N) = \int \varphi(x_1)\cdots\varphi(x_N)\, d\mu(\varphi) \qquad [5]$$


In what follows, the measure $d\mu$ will be a
perturbed Gaussian measure which, for the massive
$\varphi^4$ model with a volume cutoff $\Lambda$ and an ultraviolet
cutoff $\rho$, is given in d dimensions by


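$$d\mu_{\Lambda,\rho}(\varphi) = e^{-\lambda(\rho)\int_\Lambda \varphi^4(x)\,dx + a(\rho)\int_\Lambda \varphi^2(x)\,dx}\; d\nu_\rho(\varphi)/Z_{\Lambda,\rho} \qquad [6]$$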


where $Z_{\Lambda,\rho}$ is the normalization factor and where
$d\nu_\rho(\varphi)$ is the Gaussian measure of mean zero
($\int \varphi\, d\nu_\rho = 0$) and covariance


$$C(x - y; \rho) = \int d^dp\; e^{ip\cdot(x-y)}\, e^{-p^2/\rho^2} \big/ \big(\zeta(\rho)\,p^2 + m^2\big)$$


where by convention m is called the bare mass. 

For d = 2 or 3 one can show that, for $\lambda(\rho) = \lambda$
small enough (depending on m) and $\zeta(\rho) = 1$, there
exists a function $a(\rho)$ ($a(\rho) = O(\lambda)$ as $\lambda \to 0$) such
that, for any set of N distinct points, the function
$S(x_1, \ldots, x_N) = \lim_{\Lambda,\rho\to\infty} S_{\Lambda,\rho}(x_1, \ldots, x_N)$ exists, is
not Gaussian (hence does not correspond to a trivial,
free theory), and satisfies the Osterwalder-Schrader
axioms. The connected part $S(x_1, \ldots, x_N)_{\mathrm{connected}}$ has
the following perturbative series:


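    \lim_{\Lambda,\rho\to\infty} \sum_n \frac{(-\lambda)^n}{n!} \int \varphi(x_1) \cdots \varphi(x_N) \Big[ \int_\Lambda \varphi(z)^4 \, dz \Big]^n d\nu_\rho(\varphi) \Big|_{\text{connected}}    [7]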





which is the (divergent) sum of the connected 
renormalized (Euclidean) Feynman graphs. 

The study of the perturbative series leads to the 
distinction of: 


1. the super-renormalizable theories, where it is
possible to take λ(ρ), ζ(ρ) not depending on ρ.
In dimension 2, all the models where λφ^4 is
replaced by


    c_{2p} \varphi^{2p} + c_{2p-1} \varphi^{2p-1} + \cdots + \lambda \varphi^4 + c_3 \varphi^3    [8]


also exist provided that c_{2p} > 0 is small enough
depending on m and on the other coefficients c
and λ, and

2. the just renormalizable theories where λ(ρ) (and
possibly ζ(ρ)) depend in general on ρ. In the models
mentioned below λ(ρ) → 0 as ρ → ∞; this characterizes
“asymptotic freedom.”


The proof of the existence of the N-point
functions makes use of Taylor-type expansions
with remainder. The first orders are used to compute
λ(ρ), ζ(ρ), a(ρ). The idea is to consider the functional
integral [5], at Λ, ρ finite, as an integral over
roughly Λρ^d “degrees of freedom” which are weakly
coupled. This corresponds to a decomposition of the
phase space (with cutoff both in x-space (the box Λ)
and in p-space (roughly |p| < ρ)). The coupling
between different regions in x-space comes from
the propagators C_ρ; the coupling between different
frequencies in p-space comes from the φ^4 term (the
interaction vertex). The expansion is then, for each
degree of freedom, a finite expansion in the coupling
between this degree and the others so that, even if
the expansion is perturbative up to the order Λρ^d,
the bound on each term is qualitatively the one on a
product of Λρ^d finite order-independent expansions,
the order of which can be fixed uniformly in ρ (and
depending only on λ). To achieve this program, the
propagator linking two points of distance of order L
must have a decrease of order e^{−L^{−1}|x−y|}, that is, have
momentum larger than L^{−1}, so that one must
localize both in x-space and p-space; for example,
the smallest cells of phase space correspond to fields
φ localized in x, p-spaces, the x-boxes being of side
ρ^{−1} and the p-localization consisting of values such
that roughly (ρ/2) < |p| < ρ. More generally, a
generic cell (of index i) corresponds to fields φ at
point x and momentum p, with x in a box of side
2^i ρ^{−1} and 2^{−i−1} ρ ≤ |p| ≤ 2^{−i} ρ.

These expansions mimic the renormalization group
à la Wilson. For just renormalizable theories
(where λ(ρ) depends on ρ), one is led to introduce
the effective coupling constant λ(2^{−i}ρ) whose perturbative
expansion is the value at momentum zero of
the sum of all the (connected, amputated) four-point
functions containing only propagators of momentum
(roughly) bigger than 2^{−i}ρ (plus λ(ρ) which in fact
tends to zero as ρ → ∞).

Then by small coupling we mean a theory where
λ(2^{−i}ρ)/ζ(2^{−i}ρ)^2 is small for all i.

By convention we write λ_ren, ζ_ren, a_ren for the
effective parameters of the theory at zero
momentum.

The expansion obtained expresses S_connected as a
sum of terms, each of them being associated with a
given set of phase-space cells which are “connected”
together by “links” that are either propagators or
vertices. Each term decreases exponentially with the
difference i_max − i_min of the upper and lower indices
of the phase-space cells involved. Moreover, each set
must contain the cells associated with the fields
φ(x_1) ⋯ φ(x_N), whose indices are fixed by the order
of magnitude of the distances between the points.
On the other hand, the difference between the
theory of cutoff ρ and the one of cutoff 2ρ consists of
terms containing at least one cell of momentum of
order ρ; these terms are thus exponentially small in
the corresponding index difference, with a constant
cst(x_1, ..., x_N) depending on the points, so that the
limit as ρ → ∞ exists.

So far, the “construction” of models is possible 
only at small coupling, apart from special cases. The
φ^4 theory in dimension 4 is just renormalizable
(from the perturbative viewpoint) but the above
condition of small coupling cannot be achieved (and
it is generally believed that this model cannot be
defined as a nontrivial theory). A just renormalizable
model has been shown to exist, namely the
Gross-Neveu model which is a fermionic theory in
dimension 2. The elementary particle physics models 
are just renormalizable but their construction has 
not been completed so far (in particular in view 
of the confinement problem). See Constructive 
Quantum Field Theory for details. 

To state the result in a form convenient for our
purposes here, we introduce a splitting of the
covariance into two parts:


    C(x - y; \rho) = C_M(x - y; \rho) + C_{>M}(x - y; \rho), \qquad M > m


    C_M(p; \rho) = \frac{e^{-p^2/\rho^2}}{p^2 + m^2} - \frac{e^{-p^2/\rho^2}}{p^2 + M^2}


so that C_M(x − y) behaves like C at large distances but
has an ultraviolet cutoff of size M, and |C_{>M}(x − y)| <
e^{−M|x−y|} decreases exponentially depending on the
(technical) choice of M. Let dν_M(φ) be the Gaussian
measure of covariance C_M.

One also divides Λ into unit cubes and obtains for
the connected N-point function an expansion as a
sum over connected trees; a tree T is composed of
lines ℓ and vertices v; each line joins two vertices or
one of the external points x_1, ..., x_N and a vertex;
moreover, there are no loops.

To each line ℓ is associated a propagator
C_M(z_ℓ, z′_ℓ) = C_M(ℓ).

To each vertex v are associated:


1. two subsets I_v, I′_v of {ℓ},

2. a connected set X_v of unit cubes such that all the
z_ℓ, ℓ ∈ I_v, and all the z′_ℓ, ℓ ∈ I′_v, are contained in
X_v; |X_v| is the volume of X_v, and

3. a kernel K_{X_v}({z, z′}_v; ρ).


Finally, the external points are by convention z_ℓ
points; then:


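    S(x_1, \ldots, x_N)_{\text{connected}}
      = \lim_{\Lambda,\rho\to\infty} \sum_T \sum_{\{X_v\}\ \text{nonoverlapping}}
        \int \prod_{\ell \in T} \big[ dz_\ell \, dz'_\ell \; C_M(\ell) \big]_{z_\ell\ \text{not external}}
        \times \prod_{v \in T} K_{X_v}(\{z, z'\}_v; \rho)    [9]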


where for coupling small enough: 


EOE Tmo (10 


vel veT 


The X’s are pairwise nonoverlapping; however, it will
suffice to sum over all X’s (without restriction) to
get a bound showing the convergence of the
expansion as Λ → ∞. In this formula the K’s
are still coupled by the measure dν_M(φ); all the
nonperturbativity is hidden in the K’s (in particular
the contribution of momentum bigger than M).

As a consequence of [9] and if a(ρ, Λ) has been
chosen such that a_ren = 0, for M large enough and at
small coupling (depending on M, m):


    |S(x, y)_{\text{connected}}|
      \le |C_M(x - y)| + \int dz_1 \, dz_2 \, |C_M(x - z_1) \, K(z_1, z_2) \, C_M(z_2 - y)|
      \le \text{cst} \; e^{-m(1-\varepsilon)|x - y|}    [11]


More generally, the connected N-point function
satisfies


    |S(x_1, \ldots, x_N)_{\text{connected}}| \le \text{cst} \; e^{-m(1-\varepsilon) \, d(x_1, \ldots, x_N)}    [12]


where d(x_1, ..., x_N) is the length of the smallest tree
joining x_1, ..., x_N, with possibly intermediate points.


The Irreducible Kernels 
The 1PI Kernel and a Lippmann-Schwinger Equation 


To then show that a theory (if the perturbation series
heuristically shows it) contains only one particle of
mass smaller than 2m(1 − ε), it is necessary to expand
further the coupling between the K’s in [9]. Each
perturbative step relative to this coupling will
generate a sum of terms such that in each one there is
a “new” propagator C_M between two K’s.

The fact that in [9] the X’s are nonoverlapping
has the consequence that an expansion where for
each pair of K_X the number of propagators C_M
remains bounded (say by n + 1) is convergent (for
small enough couplings depending on m, n); this is
because, for a given X, the others must be farther
and farther away as their number increases, and in view of
the exponential decrease (in x-space) of C_M.

We then consider the expansion where we have
further expanded the two-point function S(x, y) such
that each term can be decomposed in the channel
x − y into C_M propagators and 1PI contributions (in
the sense that any line cutting such a 1PI contribution
(and outside the X’s) cuts at least two
propagators); that means that these 1PI contributions
are no longer coupled by the dν_M(φ) measure.
They are made of propagators and of K_X which still
have nonoverlapping restrictions; the latter are
straightforwardly expanded using a kind of (convergent)
Mayer expansion; the result is finally a
Lippmann-Schwinger type equation:


    S(x, y)_{\text{connected}} = C_M(x - y) + \int dz_1 \, dz_2 \; C_M(x - z_1) \, G_1(z_1, z_2) \, C_M(z_2 - y) + \cdots    [13]

or

    S_{\text{connected}} = \Big[ C_M \sum_{p \ge 0} [G_1 C_M]^p \Big](x, y)

which is equivalent to

    S_{\text{connected}} = C_M + C_M G_1 C_M + C_M G_1 S_{\text{connected}}    [14]


where G_1 is a 1PI kernel that satisfies the bound
(Cira Ane Aan [15] 


In Fourier transform, eqn [14] becomes


    F(p) = C_M(p) + C_M(p) G_1(p) C_M(p) + C_M(p) G_1(p) F(p)    [16]


where δ(p + q) F(p) denotes the Fourier transform of
S(x, y)_connected. We can then compute F(p):


    F(p) = \frac{(p^2 + m^2) \, [C_M + C_M G_1 C_M](p)}{(p^2 + m^2) - (p^2 + m^2) \, C_M G_1(p)}    [17]


where (p^2 + m^2) C_M(p) → (1 − m^2/M^2) as p → 0 and
|G_1(p)| < λ_ren cst(m) so that (as expected) F has no
pole in the Euclidean region at small coupling; but,
as will be seen in the next section, it has a pole
outside the Euclidean region.
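The passage from [16] to [17] is a one-line algebraic step, spelled out here as a sketch. It assumes that, in Fourier space, all kernels act by multiplication in p, and (for the last line only) that the cutoff ρ has been removed, so that the factor e^{−p^2/ρ^2} in C_M can be dropped:

```latex
\begin{align*}
F - C_M G_1 F &= C_M + C_M G_1 C_M \\
F &= \frac{C_M + C_M G_1 C_M}{1 - C_M G_1}
   = \frac{(p^2+m^2)\,[C_M + C_M G_1 C_M]}{(p^2+m^2) - (p^2+m^2)\,C_M G_1}, \\
(p^2+m^2)\,C_M(p) &= 1 - \frac{p^2+m^2}{p^2+M^2} = \frac{M^2-m^2}{p^2+M^2}
   \;\xrightarrow[p \to 0]{}\; 1 - \frac{m^2}{M^2}.
\end{align*}
```

Since (p^2 + m^2) C_M(p) is bounded by 1 for real Euclidean p and G_1 is of order λ_ren, the denominator could only vanish where p^2 + m^2 is itself of order λ_ren, which does not happen in the Euclidean region at small coupling.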


The 2PI Kernel and a BS Equation 


From the previous discussion, it is clear that one can
extract from [9] as many propagators as one wants
between kernels K_X. If one considers a splitting of
the external points into incoming x_1, ..., x_p and
outgoing x_{p+1}, ..., x_N points, this defines a channel.
One then obtains nPI kernels (in the given channel).
In the same way as above, one obtains a relevant
structure equation; this equation makes sense only
if the kernels K_X have a decrease corresponding to
n-particle irreducibility; to that purpose we take
M > nm. The expansion converges for couplings
small enough depending on m and n.

In the case n=2 this gives a kind of BS equation 
(the Lippmann-Schwinger equation corresponding 
to the case n= 1); if we restrict, for simplicity, the 
analysis to even theories one is led to jump directly 
to the case n=3: 


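    S(x_1, x_2; x_3, x_4)_{\text{connected}}
      = \int dz_1 \, dt_1 \, dz_2 \, dt_2 \; (\circ_M)(x_1, x_2; z_1, t_1)
        \times G_2(z_1, t_1; z_2, t_2) \, (\circ_M)(z_2, t_2; x_3, x_4) + \cdots    [18]

    S = \circ_M \sum_{p \ge 1} [G_2 \circ_M]^p

or

    S = \circ_M G_2 \circ_M + \circ_M G_2 S    [19]

where

    (\circ_M)(x_1, x_2; x_3, x_4) = S(x_1, x_3) S(x_2, x_4) + S(x_1, x_4) S(x_2, x_3)

and where

    |G_2(t_1, t_2; u_1, u_2)| < \lambda_{\text{ren}} \exp\{ -4m(1 - \varepsilon) \max_{i,j} |t_i - u_j| \}    [20]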


Equation [19] once amputated, and after Fourier 
transformation, is eqn [2]. 
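The equivalence between the series form and the closed form [19] is purely algebraic and can be checked numerically in a finite-dimensional toy setting. In the sketch below, the convolution operator and the kernel G_2 are replaced by small 2 × 2 matrices; the names `circ`, `G`, `neumann_sum` and all numerical values are illustrative assumptions, not objects of the theory.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matadd(A, B):
    """Entrywise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def neumann_sum(circ, G, terms=60):
    """Truncated series circ * sum_{p=1}^{terms} (G circ)^p."""
    GC = matmul(G, circ)
    power = GC
    total = [[0.0] * len(GC) for _ in GC]
    for _ in range(terms):
        total = matadd(total, power)
        power = matmul(power, GC)
    return matmul(circ, total)


# Toy stand-ins: a "convolution" matrix and a small "kernel" (weak coupling).
circ = [[1.0, 0.3], [0.3, 0.8]]
G = [[0.10, -0.05], [-0.05, 0.20]]

S = neumann_sum(circ, G)
# Right-hand side of the closed form: circ G circ + circ G S.
rhs = matadd(matmul(matmul(circ, G), circ), matmul(matmul(circ, G), S))
residual = max(abs(S[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print("residual of the fixed-point equation:", residual)
```

The residual is of the size of the first neglected term of the series, which is why convergence requires the product of kernel and convolution operator to be small, as it is at small coupling.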


More General Irreducible Kernels 
and Structure Equations 


Irreducible kernels with various degrees of irreducibility
in various channels can be defined in a similar
way. Corresponding expansions of N-point functions
follow, in terms of integrals involving these
kernels and two-point functions. These kernels are
again convergent at small coupling (→ 0 as their
irreducibility → ∞) as well as the corresponding
structure equations (which generalize eqn [18]).


Analyticity, AC, and Bound States 


As explained in the introduction, we now proceed
by analytic continuation away from the Euclidean
region in complex energy-momentum space.
First, it is easily seen that the two-point function
is analytic in the region s < (2m)^2, apart from a
pole at s = m_ph^2, which defines the physical mass
m_ph (m_ph^2 is determined by the zero in p^2 of the denominator in
formula [17]). In view of the bounds of the previous
two sections, m_ph is close to the “bare” mass m.

The 2PI kernel, for even theories, is shown, again by
the Laplace transform theorem, to be analytic and bounded
in domains around and away from the Euclidean region
up to s = (4m)^2 − ε, and is of the order of λ_ren.

As we have seen in the section “AC and
analyticity,” the analyticity of G_2 entails the analytic
structure of F (two-sheeted or multisheeted at the
threshold). On the other hand, further poles of F can
be generated by the BS integral equation [2] in the
physical or unphysical sheets. If a pole in the
physical sheet occurs at s < (2m_ph)^2 real, it will
correspond to a new particle in the theory, namely a
two-particle bound state.


AC in the Low-Energy Region 


The analysis of possible bound states, which will be
presented in the following, will show that there
might be at most one two-particle bound state of
mass m_B < 2m_ph, which tends to 2m_ph as the
coupling tends to zero.

On the other hand, for even theories, in view of
the analyticity properties of the two-point function
and of the 2PI kernel G_2, equation [1] holds in the
region (2m_ph)^2 ≤ s < (4m_ph)^2 − ε, where ∗ is on-shell
convolution with particles of mass m_ph.

If there is no two-particle bound state, this
characterizes the AC of the theory for s < (4m_ph)^2 − ε.

If there is a bound state of mass m_B, AC is
established only in the region s < (Sma) — e. 

For non-even theories, the analysis is similar but 
requires the introduction of new irreducible kernels 
in view of the fact that the non-evenness opens new 
channels. AC in all cases can be established, for 
small couplings, up to s < (Sma) — e. 


Analysis of Possible Two-Particle Bound States 
for Even Theories at Small Coupling 


It can be checked that such poles of F, if any,
either lie far away in the unphysical sheet(s) or are
close to the two-particle threshold (s = (2m_ph)^2).
This is due to the convergence, at small coupling, of
the Neumann series F = G_2 + G_2 ∘_M G_2 + ⋯. Individual
terms G_2 ∘_M ⋯ ∘_M G_2 are, in fact, defined
away from the Euclidean region by analytic continuation
in a two-sheeted (d even) or multisheeted
(d odd) domain around the threshold: to that
purpose locally distorted integration contours (initially
the Euclidean region) are introduced as in the
section “AC and analyticity,” so as to avoid the pole
singularities of the two-point functions involved in
∘_M, the threshold singularities being due to the
pinching of this contour between the two poles as
s → (2m_ph)^2. If a fixed neighborhood of the threshold
is excluded, one does obtain uniform bounds of
the form (cst λ_ren)^q (for a term with q factors G_2) in
any bounded domain, which ensures the convergence
of the Neumann series.

It remains to study the neighborhood of the
threshold. To that purpose, the following method
is convenient. One shows that the convolution
operator ∘_M can be written in the form


    \circ_M = g(s) \, {*} + V    [21]


where ∗ is, as in the section “AC and analyticity,”
on-shell convolution for s > (2m_ph)^2 or is obtained
by analytic continuation for complex values of s
around the threshold; g(s) = 1/2 for d even and, if d
is odd, g(s) = (i/2π) log σ, where σ = 4m_ph^2 − s. In
view of this definition of g(s), the operator V is
regular: it is an analytic one-sheeted operation
around the threshold (this is equivalent to [4]), and
it has no pole singularities. This property of V can
be established by geometric methods or by an
explicit evaluation.

It is then useful to introduce a new kernel U
linked to G_2 by the integral equation


    U = G_2 + U V G_2    [22]


In view of the regularity and bounds of V and G_2,
one sees (e.g., by a series expansion) that U, like G_2,
is analytic in a neighborhood of the threshold and
behaves in the same way at small λ_ren.

By a simple algebraic argument F and U are
related by the integral equations


    F = U + g(s) \, U * F = U + g(s) \, F * U    [23]
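The algebraic argument behind [23] can be sketched as follows. The manipulation is formal: kernels are treated as operators, the Neumann series is used in the resummed form F = G_2 + F ∘_M G_2, and 1 − V G_2 is assumed to be invertible, which is plausible at small λ_ren since G_2 is of order λ_ren and V is regular:

```latex
\begin{align*}
U = G_2 + U V G_2 \;&\Longrightarrow\; U = G_2\,(1 - V G_2)^{-1}, \\
F = G_2 + F \circ_M G_2 &= G_2 + g(s)\, F * G_2 + F V G_2 \\
\Longrightarrow\; F\,(1 - V G_2) &= G_2 + g(s)\, F * G_2 \\
\Longrightarrow\; F &= U + g(s)\, F * U .
\end{align*}
```

The other form, F = U + g(s) U ∗ F, follows in the same way from F = G_2 + G_2 ∘_M F, using the identity G_2 (1 − V G_2)^{−1} = (1 − G_2 V)^{−1} G_2.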


Two-dimensional models We start the analysis with
the case d = 2. The mass shell is trivial in this case; let f
be the restriction of F to the mass shell; it depends only
on s = (p_1 + p_2)^2 due to the mass shell and e.m.c.
constraints (as also Lorentz invariance). On the mass
shell, the operation ∗ becomes a mere multiplication
and the integral equation [23] becomes


    f(s) = u(s) + \frac{1}{a(s)} \, f(s) \, u(s)    [24]


where u is the mass shell restriction of U and the
factor a(s) arising from ∗ is of the form
a(s) = cst s^{1/2} σ^{1/2}, σ = (2m_ph)^2 − s, which gives


    f(s) = \frac{a(s) \, u(s)}{a(s) - u(s)}    [25]


In turn one obtains


    F = U + \frac{U| \; |U}{a(s) - u(s)}    [26]


where U| (resp., |U) is U with p_3, p_4 (resp., p_1, p_2)
restricted to the mass shell. Equation [26] completely
characterizes the local structure of F in view of
the local analyticity of U.

The analysis of the possible poles follows from the
fact that U is equal to G_2 up to higher order in λ_ren;
on the other hand, G_2 is equal to a first known term
plus higher-order corrections in λ_ren (if we expand in
λ_ren the expression for G_2 obtained in the previous
section), so that the leading contribution of u(s) is
known and the results follow.

For a theory (see [8]) containing a λ_ren φ^4 term there is
exactly one pole, which corresponds to the zero of a(s) −
u(s), lying in the region (2m_ph)^2 − ε < s < (2m_ph)^2.
This pole is either in the physical sheet for λ_ren < 0 or in
the second sheet if λ_ren > 0. In the case λ_ren < 0, this
pole corresponds to a two-particle bound state of
physical mass m_B which tends to 2m_ph as λ_ren → 0.

In a model without φ^4 term (λ_ren = 0) the lowest-order
contribution to G_2, hence to U, is in general of
the order of the square of the leading coupling, in
which case there is always one bound state.

The treatment of the fermionic Gross-Neveu
model, which involves spin and color indices, is
analogous, with minor modifications. Equations now
involve, in the two-particle region, 4 × 4 matrices;
poles of F are now the zeros of det(a(s)I − m(s)u(s)),
where m(s) is the 4 × 4 matrix obtained from 2 × 2
residue matrices (whose leading matrix elements are
explicitly computable). The detailed analysis, which
requires the consideration of different channels
(various color and spin indices), is omitted.
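To see why such a pole sits close to the threshold, one can solve a(s) = u(s) to leading order. The sketch below assumes that u(s) may be replaced near the threshold by a constant u_0 of order λ_ren, and writes the unspecified constant in a(s) = cst s^{1/2} σ^{1/2} as c; both u_0 and c are notational assumptions. It does not address on which sheet the zero lies, which depends on the sign of λ_ren as stated in the text.

```latex
\begin{align*}
a(s) = c\, s^{1/2} \sigma^{1/2} = u_0
  \;&\Longrightarrow\; \sigma = \frac{u_0^2}{c^2 s} \approx \frac{u_0^2}{4 c^2 m_{\mathrm{ph}}^2}, \\
s = (2 m_{\mathrm{ph}})^2 - \sigma
  \;&\Longrightarrow\; m_B^2 = 4 m_{\mathrm{ph}}^2 - O(\lambda_{\mathrm{ren}}^2).
\end{align*}
```

The distance to the threshold is thus quadratic in the coupling, consistent with the statement that the bound-state mass tends to 2m_ph as the coupling tends to zero.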


Three-dimensional models The results are similar:
F is decomposed as F′ + F″, where F′ is the ℓ = 0
“partial wave component” of F, namely F′ = (1/2π)
∫ F dθ, where θ is the “scattering angle” of the
channel; its complement F″ is shown to be locally
bounded in view of a further factor σ. The analysis
is then analogous to the case d = 2 with a(s) now
behaving like cst/log σ as σ → 0. There is, a priori,
either no pole, or one pole in the physical sheet at
s = m_B^2 < (2m_ph)^2 with m_B = 2m_ph + O(e^{−cst/λ_ren}),
depending again on the signs of the couplings. For
the existing even models such as the φ^4 model, there
is no pole, hence no two-particle bound state.


Four-dimensional models The existence of the φ^4
model in dimension 4 is doubtful. If a four-dimensional
model were defined, and if the 2PI
kernel G_2 of a massive channel could be defined and
shown to satisfy analyticity properties analogous to
the above, there would be no two-particle bound state at
small coupling. In fact, the kinematical factor σ^{(d−3)/2}
(for d even) generated by the mass shell convolution
is no longer equal to σ^{−1/2} as in the d = 2 case but
now to σ^{1/2}. As a consequence, the Neumann series
giving F in terms of G_2 is convergent also in the
neighborhood of the two-particle threshold.


Non-even theories The analysis for the non-even
theories follows similar lines. As already mentioned,
the analysis requires the introduction of new irreducible
kernels. For the models λφ^4 + c_3φ^3, which do
exist at small couplings in dimensions 2 and 3, there
will be either exactly one or no two-particle bound
state, depending on the respective values of λ, c_3.


Structure Equations and AC in 
Higher-Energy Regions 


The structure equations of the previous section provide, 
after analytical continuation away from the Euclidean 
region, a rigorous version of the analysis presented at 
the end of the section “AC and analyticity.” The 
irreducible kernels can here be defined in a direct way 
following the previous section, together with their 
analyticity properties. One has then to derive the 
discontinuity formulas that in turn characterize AC. 
This program has been carried out in the 3 → 3 particle
region, and partly in the general case. It seems possible 
to complete general proofs up to some technical 
(difficult) problems. As already mentioned, in this 
approach, the coupling should be taken smaller and 
smaller as the energy region considered increases.

Schrödinger operators are linear partial differential
operators of the form


    H_V = -\Delta + V(x)    [1]


acting on a suitable dense domain dom(H_V) ⊆ L^2(Ω)
in the Hilbert space of square-integrable functions
on a spatial domain Ω ⊆ R^d, where d ∈ N. Here,
H_0 = −Δ = −Σ_{ν=1}^d ∂^2/∂x_ν^2 is (minus) the Laplacian
on Ω, and the potential V: Ω → R acts as a multiplication
operator, [Vψ](x) := V(x)ψ(x).


Historical Origin and Relation 
to Theoretical Physics 


In 1926, Schrödinger formulated quantum theory as
wave mechanics and proved later that it is equivalent
to Heisenberg’s matrix mechanics. He proposed
that the state of a physical system at time t ∈ R is
given by a normalized wave function ψ_t ∈ L^2(Ω)
whose dynamics is determined by a linear Cauchy
problem: ψ_0 is the state at time t = 0, and for t > 0,
it evolves according to


    i \frac{\partial \psi_t}{\partial t} = H_V \psi_t    [2]


the Schrödinger equation. More generally, ψ_0 is a
normalized element of a Hilbert space H, and
the Hamiltonian H_V is a self-adjoint operator,
that is, dom(H_V) = dom(H_V^*) ⊆ H and H_V = H_V^* on
dom(H_V). Formally, eqn [2] is solved by the
evolution operator or propagator exp(−itH_V) in
the form ψ_t = exp(−itH_V)ψ_0. The self-adjointness
of H_V ensures the existence and unitarity of
the propagator exp(−itH_V), for all t ∈ R, so
‖ψ_t‖ = ‖ψ_0‖ = 1. For physics, this unitarity is crucial,
because ‖ψ_t‖^2 is interpreted as the total probability
of the system to be at time t in some state in H. The
general validity of eqn [2] as the fundamental
dynamical law of all physical theories, including,
for example, nonrelativistic and (special) relativistic
quantum mechanics, quantum field theory, and
string theory, deserves appreciation.
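The link between self-adjointness and norm conservation can be made concrete in the simplest finite-dimensional case. The sketch below takes a 2 × 2 Hermitian matrix H = a·1 + n·σ (σ the Pauli matrices) as a stand-in for the Hamiltonian, uses the closed form of exp(−itH), and checks that the norm of the state is preserved. The function names and the numerical values are illustrative assumptions; nothing here is specific to Schrödinger operators on L^2(Ω).

```python
import cmath
import math


def propagator(a, n, t):
    """exp(-i t H) for the Hermitian matrix H = a*I + n . sigma (closed form)."""
    nx, ny, nz = n
    r = math.sqrt(nx * nx + ny * ny + nz * nz)
    phase = cmath.exp(-1j * a * t)
    c = math.cos(r * t)
    s = math.sin(r * t) / r if r > 0 else t  # limit of sin(r t)/r as r -> 0
    # exp(-i t n.sigma) = cos(r t) I - i sin(r t) (n.sigma)/r
    return [
        [phase * (c - 1j * s * nz), phase * (-1j * s * (nx - 1j * ny))],
        [phase * (-1j * s * (nx + 1j * ny)), phase * (c + 1j * s * nz)],
    ]


def apply(U, psi):
    """Apply a 2 x 2 matrix to a two-component state."""
    return [U[0][0] * psi[0] + U[0][1] * psi[1],
            U[1][0] * psi[0] + U[1][1] * psi[1]]


def norm(psi):
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(z) ** 2 for z in psi))


psi0 = [0.6, 0.8j]  # normalized initial state
psi_t = apply(propagator(0.5, (0.3, -0.2, 1.1), 2.7), psi0)
print(norm(psi0), norm(psi_t))
```

Replacing H by a non-Hermitian matrix (for instance by giving `a` an imaginary part) would make the norm grow or decay in time, which is the finite-dimensional shadow of why self-adjointness is required.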

If the physical system under consideration is a
nonrelativistic point particle of mass m > 0 in a
potential V: R^d → R, then, according to the principles
of classical (Newtonian) mechanics, its state is
determined by its momentum p ∈ R^d and its position
x ∈ R^d, its kinetic energy is (1/2m)p^2, its
potential energy is V(x), and the dynamics is given
by the Hamiltonian flow generated by the
Hamiltonian function H_class(p, x) = (1/2m)p^2 + V(x).
Schrödinger derived the Hamiltonian (operator)
H = −(ħ^2/2m)Δ + V(x) in [2] from the replacement
of the momentum p ∈ R^d by the momentum
operator −iħ∇_x. This prescription is called quantization
and is further discussed in the section
“Quantization and semiclassical limit.” The
Schrödinger operator H_V in [1] is then obtained after
an additional unitary rescaling, w(x) u4? (px), 
by :=h(2m)'/7, and a redefinition V(x) := V(x/p) 
of the potential. 

For more details, we refer the reader to
Schrödinger (1926) and Messiah (1962).


Self-Adjointness 


Led by the requirement of unitarity of the propagator,
the domain dom(H_V) in [1] is usually chosen
such that H_V is self-adjoint, which, in turn, is most
often established by means of the Kato-Rellich
perturbation theory, briefly described below. If
V = 0, then H_0 equals the Laplacian −Δ, which
is a positive self-adjoint operator, provided
dom(H_0) = W^{2,2}_{b.c.}(Ω) is the second Sobolev space
with suitable conditions on the boundary ∂Ω of Ω.
Typical examples are dom(H_0) = W^{2,2}(R^d), for
Ω = R^d, and W^{2,2}_{Dir}(Ω) and W^{2,2}_{Neu}(Ω) with Dirichlet
or Neumann boundary conditions on ∂Ω, respectively,
in case that Ω is a bounded, open domain in
R^d with smooth boundary ∂Ω. Starting from this
situation, V is required to be relatively H_0-bounded,
that is, that M(V, r) := V(−Δ + r·1)^{−1} defines
(extends to) a bounded operator on L^2(Ω), for any
r > 0. If lim_{r→∞} ‖M(V, r)‖ < 1, then H_V is self-adjoint
on dom(H_0) and semibounded, that is, the
infimum inf σ(H_V) of its spectrum σ(H_V) is finite; in
other words, H_V ≥ c·1, for some c ∈ R, as a
quadratic form. (The semiboundedness corresponds
to quasidissipativity, as a generator of the semigroup
exp(−βH_V).)

A fairly large class of potentials fulfilling these
requirements is defined by


    \lim_{\alpha \to 0} \, \sup_{x \in \Omega} \int_{|x - y| \le \alpha} |x - y|^{4 - d} \, V(y)^2 \, d^d y = 0    [3]


for d ≠ 4, and with |x − y|^{4−d} replaced by ln(|x − y|^{−1})
for d = 4. For d ≤ 3, [3] is equivalent to the
uniform local square integrability of V, that is,
sup_{x∈Ω} ∫_{|x−y|≤1} V(y)^2 d^d y < ∞. Note that [3] allows
for local singularities of V, provided they are not too
severe; in this respect, quantum mechanics is more
general than classical mechanics. Equation [3] is a
sufficient condition for H_V = −Δ + V to be self-adjoint
on dom(−Δ) because lim_{r→∞} ‖M(V, r)‖ = 0.
Moreover, as eqn [3] only misses some borderline
cases, it is also almost necessary for the self-adjointness
of H_V. By means of Kato’s inequality, the
conditions on V, especially on its positive part
V_+ := max{V, 0}, can be further relaxed. Also, if one
realizes H_V as the Friedrichs extension of a semibounded
quadratic form, the conditions to impose on
V are milder. One possibly loses, however, control
over the operator domain dom(H_V), and typically
dom(−Δ) is only a core for H_V.
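As an illustration (not taken from the text above), the Coulomb potential V(x) = −1/|x| in d = 3 can be checked against criterion [3], read with the weight |x − y|^{4−d} = |x − y|. The estimate below bounds the weight by α on the domain of integration and uses that, V^2 being radially decreasing, its integral over a ball of fixed radius is largest when the ball is centered at the origin:

```latex
\[
\sup_{x \in \mathbb{R}^3} \int_{|x-y| \le \alpha} \frac{|x-y|}{|y|^{2}} \, d^3 y
 \;\le\; \alpha \int_{|y| \le \alpha} \frac{d^3 y}{|y|^{2}}
 \;=\; \alpha \cdot 4\pi \int_0^{\alpha} dr
 \;=\; 4\pi \alpha^{2} \;\xrightarrow[\alpha \to 0]{}\; 0 .
\]
```

Hence the 1/|x| singularity is mild enough for the criterion, in line with the classical result that the hydrogen-type operator −Δ − 1/|x| is self-adjoint on the domain of the Laplacian.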

For further details on self-adjointness, we refer the 
reader to Reed and Simon (1980a,b), Kato (1976), 
and Cycon et al. (1987). 


Spectral Analysis 


The self-adjointness of H_V establishes a functional
calculus, generalizing the notion of diagonalizability of
finite-dimensional self-adjoint matrices: there exists a
unitary transformation W: L^2(Ω) → L^2(σ(H_V), dμ_{H_V})
such that H_V acts on elements φ of L^2(σ(H_V), dμ_{H_V})
as a multiplication operator, [H_V φ](ω) = ω φ(ω). The
spectral measure μ_{H_V} decomposes into an absolutely
continuous (ac) part μ_{H_V,ac}, a pure point (pp) part
μ_{H_V,pp}, and a singular continuous (sc) part μ_{H_V,sc},
mutually disjointly supported on the ac spectrum
σ_ac(H_V), the pp spectrum σ_pp(H_V), and the sc
spectrum σ_sc(H_V) ⊆ R, respectively, whose union is
the spectrum σ(H_V) of H_V. There is an additional
decomposition of the spectrum of H_V into the discrete
spectrum σ_disc(H_V), which consists of all isolated
eigenvalues of H_V of finite multiplicity, and its
complement σ_ess(H_V) = σ(H_V) \ σ_disc(H_V), the essential
spectrum of H_V, as its residual spectrum is void. One
of the main goals of the spectral analysis is to
determine the spectral measure for a given potential
V as precisely as possible.

In many applications, Ω = R^d and the potential V in
H_V is not only relatively H_0-bounded, but even
relatively H_0-compact, that is, M(V, 1) is compact. In
this case, lim_{r→∞} ‖M(V, r)‖ = 0, ensuring self-adjointness
on dom(H_0) and semiboundedness of H_V.
Moreover, a theorem of Weyl implies that its essential
spectrum agrees with the one of H_0, that is, with the
positive half-axis R_0^+, and the discrete spectrum is
contained in the negative half-axis R^−. If, furthermore,
(H_0 + 1)^{−1}[x · ∇V(x)](H_0 + 1)^{−1} is compact, then the
essential spectrum on the positive half-axis is purely
absolutely continuous, σ_ess(H_V) ∩ R^+ = σ_ac(H_V) ∩
R^+, and hence σ_disc(H_V) ⊆ σ_pp(H_V) ⊆ σ_disc(H_V) ∪
{0}; the singular continuous spectrum is void.

We remark that the absence of singular continuous
spectrum is by no means automatic. Indeed, it is
possible to explicitly construct potentials V such
that H_V has singular continuous spectrum. In
terms of the Baire category, singular continuous
spectrum is even typical. The appearance of singular
continuous spectrum can, perhaps, be more easily
understood in terms of the dynamical properties of
exp[−itH_V], rather than the spectral analysis of its
generator H_V: singular continuous spectrum occurs
when initially localized states are not bound states,
but move out to infinity very slowly.

The reader is referred to Simon (2000), Reed and 
Simon (1980a, b) and Cycon et al. (1987) for further 
detail. 


Properties of Eigenfunctions 


Let us assume Ω = R^d, that V ≤ 0 is nonpositive,
fulfills [3], and that lim_{|x|→∞} V(x) = 0. From the
statements in the last section we conclude that
H_V = −Δ + V(x) is semibounded, that the essential
spectrum is the positive half-axis and that all
eigenvalues are negative and of finite multiplicity,
possibly accumulating only at 0. We collect some
properties of the eigenfunctions ψ_j ∈ L^2(R^d) with
corresponding eigenvalue e_j < 0, that is, H_V ψ_j =
e_j ψ_j. The smallest eigenvalue e_0 := inf σ(H_V) (coinciding
with the bottom of the spectrum) is simple,
and the corresponding eigenfunction ψ_0(x) > 0 is
strictly positive a.e. Elliptic regularity implies that at
a given point x ∈ R^d, the eigenfunction ψ_j is almost
2 − d/2 degrees more regular than V. For example,
if V ∈ C^k[B_{2ε}(x)], for some ε > 0, then ψ_j ∈
C^{k+ℓ}[B_ε(x)], for all ℓ < 2 − d/2. Agmon estimates
(originally obtained by Šnol and also known in
mathematical physics as the Combes-Thomas argument)
furthermore show that, for unbounded Ω,
the eigenfunction ψ_j decays exponentially: |ψ_j(x)| ≤
C_α e^{−α|x|}, for any 0 < α < |e_j|^{1/2}.

For more details, see Reed and Simon (1978, 
1980a, b) and Cycon et al. (1987). 


One Dimension and Sturm-Liouville 
Theory 


For d = 1, the stationary Schrödinger equation
reduces to a second-order ordinary differential
equation known as a Sturm-Liouville problem,


    -\psi''(x) + V(x) \psi(x) = E \psi(x)    [4]


on L^2([a, b]), with V ∈ L^1([a, b]) and independent
boundary conditions at −∞ ≤ a < b ≤ ∞, say. Equation
[4] admits an almost explicit solution by means of
the Prüfer transformation defined by φ(x) :=
arctan[ψ(x)/ψ′(x)] and R(x) := ln([ψ(x)^2 + ψ′(x)^2]^{1/2}).
The key point about the Prüfer transformation is that it
effectively reduces the second-order differential equation
[4] to a (nonlinear) first-order equation for φ,


    \varphi'(x) = (E - V(x)) \sin^2[\varphi(x)] + \cos^2[\varphi(x)]    [5]


Note that [5] does not involve R and that the
boundary conditions on ψ and ψ′ at a and b can be
easily expressed in terms of φ(a) and φ(b). Moreover,
having determined φ on [a, b] from [5], the
function R is immediately obtained by integrating
R′(x) = [1 + V(x) − E] sin[φ(x)] cos[φ(x)]. In case of
a bounded interval, −∞ < a < b < ∞, or a confining
potential, lim_{x→±∞} V(x) = ∞, it is not difficult to
derive from [5] the following basic facts: the
spectrum of H_V consists only of simple eigenvalues
E_0 < E_1 < E_2 < ⋯ with lim_{n→∞} E_n = ∞. Moreover,
the corresponding eigenfunction ψ_n ≠ 0,
n ∈ N_0, with H_V ψ_n = E_n ψ_n, has precisely n zeros,
and Sturm’s oscillation theorem holds.
See Amrein et al. (2005) for more details. 
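The Prüfer equation [5] also gives a practical way to compute eigenvalues on a bounded interval, sketched below under stated assumptions. With Dirichlet conditions ψ(a) = ψ(b) = 0 the angle starts at φ(a) = 0, and E is the n-th eigenvalue when φ(b) = (n + 1)π; the sketch relies on the standard fact that φ(b) increases monotonically with E, so that bisection in E applies. The function names, the fourth-order Runge-Kutta integrator, the step count, and the free-particle test potential are all illustrative choices, not part of the text.

```python
import math


def prufer_angle(E, V, a, b, steps=2000):
    """Integrate phi' = (E - V(x)) sin^2(phi) + cos^2(phi) from a to b, phi(a) = 0 (RK4)."""
    h = (b - a) / steps

    def f(x, p):
        return (E - V(x)) * math.sin(p) ** 2 + math.cos(p) ** 2

    x, phi = a, 0.0
    for _ in range(steps):
        k1 = f(x, phi)
        k2 = f(x + h / 2, phi + h * k1 / 2)
        k3 = f(x + h / 2, phi + h * k2 / 2)
        k4 = f(x + h, phi + h * k3)
        phi += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return phi


def dirichlet_eigenvalue(n, V, a, b, E_lo, E_hi, tol=1e-10):
    """Bisection in E for phi(b; E) = (n + 1) * pi; [E_lo, E_hi] must bracket E_n."""
    target = (n + 1) * math.pi
    while E_hi - E_lo > tol:
        E_mid = 0.5 * (E_lo + E_hi)
        if prufer_angle(E_mid, V, a, b) < target:
            E_lo = E_mid
        else:
            E_hi = E_mid
    return 0.5 * (E_lo + E_hi)


# Free particle on [0, pi]: the exact Dirichlet eigenvalues are (n + 1)^2.
free = lambda x: 0.0
for n in range(3):
    print(n, dirichlet_eigenvalue(n, free, 0.0, math.pi, 0.0, 50.0))
```

Because the target value of φ(b) encodes the index n, the same routine also counts zeros: the eigenfunction found for index n has exactly n zeros in the open interval, as stated by the oscillation theorem.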


Quantization and Semiclassical Limit 


The quantization procedure postulated by Schrödinger
is the replacement of the classical momentum p ∈ R^d
by the quantum-mechanical momentum operator
−iħ∇_x. It is known (and, in fact, easy to see,
cf. Messiah (1962)) that the classical Hamiltonian
equations of motion are invariant under symplectic
transformations, but Schrödinger’s quantization
procedure does not commute with symplectic
changes of the classical variables. The question of
the geometrically sound definition of quantization,
with a general d-dimensional manifold replacing
the spatial domain Ω, has attracted many mathematicians
and has led to the mathematical fields
of geometric quantization and deformation
quantization.

It is remarkable, however, that Schrödinger himself
discovered already in his early paper the fact that
classical dynamics derives as the scaling limit ħ → 0
from quantum mechanics. The systematic study of
the convergence of wave functions and of operators
and their spectral properties is known as semiclassical
analysis, which is nowadays considered to be part of
microlocal analysis. We illustrate the type of results
one obtains by the following example on Ω = R^d.

Let $F \in C_0^\infty(\mathbb{R}; \mathbb{R})$ be a smooth characteristic
function, compactly supported in an interval $I \subset \mathbb{R}^-$
away from the essential spectrum of the semiclassical
Schrödinger operator $H_\hbar = -\hbar^2\Delta + V$ with a
smooth potential $V \in C_0^\infty(\mathbb{R}^d)$ of compact support.
We define the operator $F[H_\hbar]$ by functional calculus
(note that the spectrum of $H_\hbar$ in $I$ is discrete and $F[H_\hbar]$ is of trace class).

Let, furthermore, $A_\hbar = \sum_{|\alpha| \le M} a_\alpha(x)\, \partial_x^\alpha$ be a differential
operator representing an observable. Then
$\operatorname{tr}\{A_\hbar F[H_\hbar]\}$, which exists because the eigenfunctions
of $H_\hbar$ are smooth and decay exponentially, is, up to
normalization, interpreted to be the expectation of the
observable $A_\hbar$ in the state represented by the spectral
projection of $H_\hbar$ in $I$, approximated by $F[H_\hbar]$.

Semiclassical analysis then yields an asymptotic 
expansion of the form 


$$\operatorname{tr}\{A_\hbar F[H_\hbar]\} = \hbar^{-d}\left(c_0 + c_1\hbar + \cdots + c_n\hbar^n + o(\hbar^n)\right)$$


for arbitrarily large integers $n \in \mathbb{N}$. The leading-order
coefficient $c_0$ is determined by Bohr's correspondence
principle,


$$\operatorname{tr}\{A_\hbar F[H_\hbar]\} = \int a[x, p]\, F[p^2 + V(x)]\, \frac{d^dx\, d^dp}{(2\pi\hbar)^d} + O(\hbar^{1-d}) \qquad [6]$$





Semiclassical analysis thus provides the mathemati- 
cal link between quantum and classical mechanics. 
The proof of [6] usually involves pseudodifferential 
and/or Fourier integral operators, depending on the 
method. Advanced topics in semiclassical analysis 
studied more recently are the construction of
quasimodes, that is, wave functions $\psi_{E,\hbar,n}$ which
solve the eigenvalue problem $(H_\hbar - E)\psi_{E,\hbar,n} = O(\hbar^n)$
up to errors of order $\hbar^n$, for arbitrarily large $n \in \mathbb{N}$,
and the relation between semiclassical asymptotics
and the KAM (Kolmogorov–Arnold–Moser) theory
from classical mechanics.
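
The leading-order statement in [6] can be illustrated with a crude numerical experiment. The sketch below is not the setting of the article: it assumes d = 1, the confining potential $V(x) = x^2$, the trivial observable A = 1, and a sharp cutoff in place of the smooth function F. Under these assumptions the trace is the number of eigenvalues below an energy E, and the phase-space integral is the area of $\{p^2 + x^2 \le E\}$ divided by $2\pi\hbar$, that is, $E/(2\hbar)$. The function names are hypothetical.

```python
import numpy as np

def count_below(hbar, energy, num_points=1000, half_width=4.0):
    """Number of eigenvalues of a discretized -hbar^2 d^2/dx^2 + x^2 below `energy`."""
    x = np.linspace(-half_width, half_width, num_points)
    h = x[1] - x[0]
    # Finite-difference kinetic term scaled by hbar^2, plus the potential x^2.
    main = 2.0 * hbar**2 / h**2 + x**2
    off = -(hbar**2 / h**2) * np.ones(num_points - 1)
    hamiltonian = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return int(np.sum(np.linalg.eigvalsh(hamiltonian) < energy))

def weyl_prediction(hbar, energy):
    """Phase-space area of {p^2 + x^2 <= E}, which is pi*E, divided by 2*pi*hbar."""
    return energy / (2.0 * hbar)

if __name__ == "__main__":
    for hbar in (0.2, 0.1, 0.05):
        print(hbar, count_below(hbar, 1.0), weyl_prediction(hbar, 1.0))
```

The eigenvalue count and the phase-space volume should agree up to a correction that stays bounded as $\hbar$ decreases, in line with the relative error of order $\hbar$ in [6].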

For more details, see Dimassi and Sjöstrand 
(1999), and Robert (1987). See also Stability Theory 
and KAM, KAM Theory and Celestial Mechanics in 
this encyclopedia. 


Lieb-Thirring Inequalities 


Lieb-Thirring inequalities are estimates on eigenvalue
sums of $H_{-V} = -\Delta - V(x)$, where $V \ge 0$ is
assumed to be non-negative (note that we changed
the sign of V) and vanishing at $\infty$; the most
important examples for these sums are the number
of eigenvalues below a given $-E < 0$ and the sum of
its negative eigenvalues, counting multiplicities.
More generally, denoting by $[\lambda]_+ := \max\{\lambda, 0\}$ the
positive part of $\lambda \in \mathbb{R}$, Lieb-Thirring inequalities are
estimates on $\operatorname{tr}\{[-E - H_{-V}]_+^\gamma\}$, for $\gamma > 0$. The number
of eigenvalues below $-E$ is then obtained in the
limit $\gamma \to 0$, and the sum of the negative eigenvalues
corresponds to $E = 0$ and $\gamma = 1$. We henceforth
assume $E = 0$, for simplicity. A guess inspired by
[6] with $F[\lambda] := [-\lambda]_+^\gamma$, $A = 1$, and $\hbar = 1$ then is that
$\operatorname{tr}\{[-H_{-V}]_+^\gamma\}$ is approximately given by


$$\int [V(x) - p^2]_+^\gamma\, \frac{d^dx\, d^dp}{(2\pi)^d} = C_{sc}(\gamma, d) \int_{\mathbb{R}^d} V(x)^{\gamma + d/2}\, d^dx \qquad [7]$$





for a suitable constant $C_{sc}(\gamma, d) > 0$ depending only
on $\gamma$ and d (but not on V). While this guess is
wrong, it is nevertheless a useful guiding principle.
Namely, in a rather large range of $\gamma$ and d, there
exist constants $C_{LT}(\gamma, d) > 0$ such that


$$\operatorname{tr}\{[-H_{-V}]_+^\gamma\} \le C_{LT}(\gamma, d) \int_{\mathbb{R}^d} V(x)^{\gamma + d/2}\, d^dx \qquad [8]$$


for all $V \ge 0$, for which the right-hand side is finite
(with the understanding that this finiteness also
ensures that $[-H_{-V}]_+^\gamma$ is trace class, in the first place).

Of course, $C_{LT}(\gamma, d) \ge C_{sc}(\gamma, d)$, by [6]. The
Lieb-Thirring conjecture, which is still open today,
says that the best possible choice of $C_{LT}(1, 3)$ equals
$C_{sc}(1, 3)$ in the physically most relevant case $\gamma = 1$
and $d = 3$. It is known that $C_{LT}(\gamma, d) > C_{sc}(\gamma, d)$, for
$\gamma < 1$ or $d < 3$.
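
The semiclassical constant in [7] can be evaluated in closed form by carrying out the momentum integral: $C_{sc}(\gamma, d) = \Gamma(\gamma + 1) / ((4\pi)^{d/2}\, \Gamma(\gamma + d/2 + 1))$. The sketch below evaluates this expression and cross-checks it in d = 1 against a direct midpoint-rule quadrature of $\int [1 - p^2]_+^\gamma\, dp / (2\pi)$. The function names are hypothetical and the closed form is the standard evaluation of the integral, not a formula quoted from the article.

```python
import math

def semiclassical_constant(gamma, d):
    """C_sc(gamma, d) = Gamma(gamma+1) / ((4*pi)^(d/2) * Gamma(gamma + d/2 + 1))."""
    return math.gamma(gamma + 1.0) / (
        (4.0 * math.pi) ** (d / 2.0) * math.gamma(gamma + d / 2.0 + 1.0)
    )

def momentum_integral_1d(gamma, num_steps=200000):
    """Midpoint-rule value of (1/(2*pi)) * integral over [-1, 1] of (1 - p^2)^gamma dp."""
    step = 2.0 / num_steps
    total = 0.0
    for j in range(num_steps):
        p = -1.0 + (j + 0.5) * step
        total += (1.0 - p * p) ** gamma
    return total * step / (2.0 * math.pi)

if __name__ == "__main__":
    print("C_sc(1, 3) =", semiclassical_constant(1.0, 3))
    print("1/(15 pi^2) =", 1.0 / (15.0 * math.pi**2))
    print("d=1 check:", semiclassical_constant(1.0, 1), momentum_integral_1d(1.0))
```

For $\gamma = 1$ and $d = 3$ the expression reduces to $1/(15\pi^2)$, the value that enters the Lieb-Thirring conjecture stated above.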

Lieb-Thirring estimates have been derived for
various modifications of the original model, depending
on the application. One of these is the class of
pseudorelativistic Hamiltonians of the form $H = T(p) - V$,
where $T(p) = \sqrt{p^2 + m^2}$, with $m > 0$; another one
includes an external magnetic field, for example,
$H = (p - A)^2 - V$ (see the next and the last section).

The reader is referred to Thirring (1997), Reed and 
Simon (1978), and Simon (1979) for further details. 


Magnetic Schrödinger Operators


Magnetic Schrödinger operators are Hamiltonians
of the form


$$H_{mg}(A, V) = (p - A(x))^2 + V(x) \quad \text{on } L^2(\mathbb{R}^3) \qquad [9]$$

$$H_{Pauli}(A, V) = [\sigma \cdot (p - A(x))]^2 + V(x) \quad \text{on } L^2(\mathbb{R}^3) \otimes \mathbb{C}^2 \qquad [10]$$


where V is the (electrostatic) potential; as before,
$A: \mathbb{R}^3 \to \mathbb{R}^3$ is the vector potential of the magnetic
field $B = \nabla \wedge A$, and $\sigma = (\sigma_1, \sigma_2, \sigma_3)$ are the Pauli
matrices. $H_{mg}(A, V)$ and $H_{Pauli}(A, V)$ generate the
dynamics of a particle of spin $s = 0$ and spin $s = 1/2$,
respectively, moving in an external electromagnetic
field. The operator $H_{Pauli}(A, V)$ is usually called the Pauli
Hamiltonian, and we refer to $H_{mg}(A, V)$ as the
magnetic Hamiltonian. To keep the exposition simple,
we assume henceforth that $A_\nu$ and $\partial_\mu A_\nu$ are uniformly
bounded, which suffices to prove the self-adjointness
of both Hamiltonians.

At a first glance, the magnetic and the Pauli 
Hamiltonians may seem to differ only marginally, 
but in fact, some of their spectral properties are 
fundamentally different. 


1. The magnetic Hamiltonian fulfills the diamagnetic
inequality, $|e^{-\beta H_{mg}(A, V)}(x, y)| \le e^{-\beta H_{mg}(0, V)}(x, y)$, for $\beta > 0$ and
almost all $x, y \in \mathbb{R}^3$, where $m(x, y)$ denotes the
integral kernel of an operator m. As a consequence,
$\inf \sigma[H_{mg}(A, V)] \ge \inf \sigma[H_{mg}(0, V)] = \inf \sigma[H(V)]$,
and the quadratic form of the magnetic Hamiltonian
is semibounded, for all choices of A, provided
$H(V)$ is.

2. If $\inf \sigma[H_{mg}(A, V)]$ is an eigenvalue, the diamagnetic
inequality reflects the fact that the corresponding
eigenvector is not positive or of
constant phase. The determination of the nodal
set of eigenfunctions is a difficult task on its own.

3. For $V = 0$, the diamagnetic inequality and the
minimax principle imply that $p - A$ has no zero
eigenvalue.

4. The diamagnetic inequality fails to hold for the
Pauli Hamiltonian. On the contrary, if A is
carefully adjusted in $H_{Pauli}(A, -Z|x|^{-1})$, and Z is
sufficiently large, then the corresponding
quadratic form may assume arbitrarily small
values (even if the corresponding field energy is
added).

5. For many choices of A, the (Dirac) operator
$\sigma \cdot (p - A)$ has a nontrivial kernel.


From (1)-(4) it is clear that the proof of stability of 
matter (see the next section) in presence of a 
magnetic field is more difficult than in absence of it. 
This can be illustrated by the fact that magnetic Lieb- 
Thirring inequalities, being the natural analog of eqn 
[8], are more involved to derive than the original 
estimate [8]. The currently best bound is of the form 


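$$\operatorname{tr}\{[-H_{-V}]_+\} \le C_{mg} \int \left( V(x)^{5/2} + |B(x)|\, V(x)^{3/2} + \left( |B(x)| + L_c(x)^{-2} \right) L_c(x)^{-1}\, V(x) \right) d^3x \qquad [11]$$


for some universal $C_{mg} < \infty$, where $L_c(x)$ is a local
length scale associated with B. It is nonlocal in x
and somewhat reminiscent of a maximal function.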

We further remark that if restricted to two 
dimensions, d=2, both the magnetic and the Pauli 
Hamiltonians play an important role in the theory of 
the (integer) quantum Hall effect. 

For more details, see Simon (1979), Cycon et al.
(1987), Rauch and Simon (1997), and Erdős and
Solovej (2004). See also the article Quantum Hall
Effect in this encyclopedia.


N-Body Schrödinger Operators


The origin of quantum mechanics is atomic ($K = 1$
below) or molecular ($K \ge 2$) physics. If we regard
the nuclei of the molecule as fixed point charges
$Z := (Z_1, \ldots, Z_K) > 0$ at respective positions
$R := (R_1, \ldots, R_K) \in \mathbb{R}^{3K}$, then the Hamiltonian (in
convenient units) of this molecule with $N \in \mathbb{N}$
electrons is the following Schrödinger operator:


$$H_N(Z, R) = \sum_{n=1}^{N} \left\{ -\Delta_n - \sum_{k=1}^{K} \frac{Z_k}{|x_n - R_k|} \right\} + \sum_{1 \le m < n \le N} \frac{1}{|x_m - x_n|} \qquad [12]$$


defined on $\mathcal{H}^{(N)} := \bigwedge_{n=1}^{N} L^2[\mathbb{R}^3 \times \mathbb{Z}_2] \subset L^2[(\mathbb{R}^3 \times \mathbb{Z}_2)^N]$,
the space of totally antisymmetric, square-integrable
wave functions in N space-spin variables
$(x_1, \sigma_1), \ldots, (x_N, \sigma_N) \in \mathbb{R}^3 \times \mathbb{Z}_2$. The antisymmetry
of the wave function accounts for the fact that
electrons are fermions and is of crucial importance.
Note that the number N of electrons is possibly very
large. It is clear that we cannot expect to carry out
the spectral analysis of this Schrödinger operator
directly, but rather only suitable approximations.

In spite of the fact that $H_N(Z, R)$ was one of the
basic operators of quantum mechanics from its very
beginning in the late 1920s, $H_N(Z, R)$ was, strictly
speaking, not known to be self-adjoint before Kato
developed the perturbation theory (described in the
section “Self-adjointness”) some 20 years later, which
then also yielded the semiboundedness of $H_N(Z, R)$.
So, the ground-state energy $E_N(Z, R) := \inf \sigma[H_N(Z, R)] > -\infty$
is finite. From the HVZ (Hunziker–van Winter–Zhislin)
theorem it follows that $\inf \sigma_{ess}[H_N(Z, R)] = E_{N-1}(Z, R)$,
which particularly implies that
$E_N(Z, R)$ is monotonically decreasing in N and
negative (because $E_1(Z, R) < 0$).

It is known that $E_N(Z, R) = E_{N+1}(Z, R)$ and that
$H_N(Z, R)$ has no eigenvalue, for $N \ge 2Z_{tot} + 1$,
where $Z_{tot} := \sum_{k=1}^{K} Z_k$ is the total nuclear charge
of the molecule. On the other hand, it is known that
$E_N(Z, R)$ is an eigenvalue, provided $N \le Z_{tot}$. Thus,
defining $N_{crit}$ to be the smallest number such that
$E_N(Z, R)$ is not an eigenvalue, for all $N > N_{crit}$, that
is, $N_{crit}$ is the maximal number of electrons the
molecule can bind, we have that $Z_{tot} \le N_{crit} < 2Z_{tot} + 1$.
In increasing precision, asymptotic neutrality,
$N_{crit} = Z_{tot} + R(Z_{tot})$, with $R(Z_{tot}) = o(Z_{tot})$
and $R(Z) = O(Z^{5/7})$, was shown for atoms and for
molecules, respectively. The ionization conjecture
states that $N_{crit} \le Z_{tot} + C$, for some universal
constant C. It is still open for the full model
represented by $H_N(Z, R)$, but has been proved in
the Hartree-Fock approximation by Solovej.

The semiboundedness of $H_N(Z, R)$, for fixed Z, R,
and N, alone does not rule out a physical collapse of
the matter described by $H_N(Z, R)$, but the stronger
property of stability of matter does. It holds if there
exists a constant C, possibly depending on Z, such that


$$E_N(Z, R) + \sum_{1 \le k < l \le K} \frac{Z_k Z_l}{|R_k - R_l|} \ge -C(N + K) \qquad [13]$$


that is, if the ground-state energy plus the repulsive
electrostatic energy of the nuclei is bounded below
by a constant times the total number $N + K$ of
particles in the system. Equation [13] was shown to
hold for $H_N(Z, R)$.

In connection with stability of matter, Thomas-Fermi
theory and the question of the limit of large
nuclear charge came into the focus of research. For
simplicity, we restrict ourselves to atoms, $K = 1$, that
is, there is one nucleus of charge $Z := Z_1$ at the
origin, $R_1 = 0$, and we consider $E(Z) := \min_{N \in \mathbb{N}} E_N(Z, 0)$
(which amounts to fixing $N := N_{crit}$). An
asymptotic expansion for $E(Z)$ of increasing
precision in Z was obtained by ever-finer estimates;
presently, one knows that


$$E(Z) = E_{TF}\, Z^{7/3} + \tfrac{1}{4} Z^2 + C_{DS}\, Z^{5/3} + o(Z^{5/3}) \qquad [14]$$


where the leading contribution $E_{TF} Z^{7/3}$ is the
Thomas-Fermi energy, $(1/4)Z^2$ is the Scott correction,
and $C_{DS} Z^{5/3}$ is the Dirac-Schwinger term. The
computation of this last term requires semiclassical
analysis sketched in the section “Quantization and
semiclassical limit.”

For more details, see Cycon et al. (1987), Rauch 
and Simon (1997), Thirring (1997), and Solovej 
(2003). See also the article Stability of Matter in this 
encyclopedia. 


Scattering Theory 


The study of the properties of the propagator
$\exp(-itH)$ of a self-adjoint operator $H = H^*$, as
$t \to \infty$, is the concern of scattering theory. To
obtain a well-defined mathematical object in this
limit, it is necessary to compose $\exp(-itH)$ with
the inverse of some explicitly accessible comparison
dynamics before passing to the limit $t \to \infty$. If
V is a short-range potential, that is, V is relatively
$H_0$-compact and $|V(x)| \le C|x|^{-\nu}$, for some $\nu > 1$
and $C < \infty$, then the comparison dynamics appropriate
for $H_V$ is generated by $H_0$: the wave
operators $\Omega^\pm$ are defined as the strong limits

$$\Omega^\pm := \text{s-}\!\lim_{t \to \pm\infty} e^{-itH_V} e^{itH_0} \qquad [15]$$

A general technique in scattering theory to prove the
existence of such limits is Cook's argument, which
formally amounts to an application of the fundamental
theorem of calculus. For example, for the
existence of $\Omega^+$, one writes


$$\Omega^+ = 1 + \int_0^\infty dt\, \frac{d}{dt}\left( e^{-itH_V} e^{itH_0} \right) = 1 - i \int_0^\infty dt\, e^{-itH_V}\, V\, e^{itH_0} \qquad [16]$$


and additionally proves the absolute integrability of
$t \mapsto e^{-itH_V} V e^{itH_0}\psi$, for $\psi$ in a dense subset of $\mathcal{H}$, like
$\mathrm{dom}(H_0) = \mathrm{dom}(H_V)$.

Research in scattering theory in the past two
decades or so was focused around the question of
asymptotic completeness, which is a mathematically
precise formulation


$$\operatorname{Ran}\Omega^+ = \operatorname{Ran}\Omega^- = \mathcal{H}_{pp}^{\perp}(H_V) \qquad [17]$$


of the physical expectation that the states in $\mathcal{H}$ are
either bound states (eigenvectors) of $H_V$ or
scattering states (states in the range of $\Omega^\pm$) of $H_V$.
The intertwining property $H_V\Omega^\pm = \Omega^\pm H_0$ (which
easily follows from [15]) implies that the restriction
of $H_V$ to $\operatorname{Ran}\Omega^\pm$ is unitarily equivalent to $H_0$, hence
$\operatorname{Ran}\Omega^\pm \subseteq \mathcal{H}_{ac}(H_V) \subseteq \mathcal{H}_{pp}^{\perp}(H_V)$. The difficult part of
the proof of asymptotic completeness is to show that
$\mathcal{H}_{pp}^{\perp}(H_V) \subseteq \operatorname{Ran}\Omega^\pm$.

Much effort has been spent to prove asymptotic 
completeness for N-body Schrödinger operators on 


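$\mathcal{H}^{(N)} := \bigotimes_{n=1}^{N} L^2(\mathbb{R}^3)$ of the form


$$H_N(V) := -\sum_{n=1}^{N} \frac{1}{2m_n}\Delta_n + V(x), \quad \text{with } V(x) := \sum_{1 \le m < n \le N} V_{mn}(x_m - x_n) \qquad [18]$$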


where each pair potential $V_{mn}$ obeys
$|\partial_y^\alpha V_{mn}(y)| \le C_\alpha (1 + |y|)^{-\mu - |\alpha|}$ with $\alpha \in \mathbb{N}_0^3$ being a multi-index. If
$\mu > 1$ for all $m \neq n$ then V is called a short-range
potential. Conversely, if $0 < \mu \le 1$ then V is a long-range
potential. Note that even though each $V_{mn}$
decays at infinity, $|x|^2 = x_1^2 + x_2^2 + \cdots + x_N^2 \to \infty$
alone does not imply that $V(x) \to 0$. In fact, physical
intuition tells us that for a cluster C of N particles,
whose dynamics is generated by $H_N(V)$, several
scenarios for the long-time asymptotic behavior of
the evolution are possible:


1. The N particles stay together in their cluster C
whose center of mass moves in space at constant
velocity.

2. The cluster breaks up into two (or even more)
subclusters, $C_1$ and $C_2$, of $N_1$ and $N_2 = N - N_1$
particles, respectively, whose centers of mass drift
apart from each other at constant velocities (in
the short-range case). For each subcluster $C_1$ and
$C_2$, both scenarios may appear again, after waiting
sufficiently longer.

3. In the limit $t \to \infty$, possibly after going through
(1) and (2) several times, the initial cluster C is
broken up into $1 \le K \le N$ subclusters
$C_1, \ldots, C_K$, whose centers of mass drift apart
from each other at constant velocities according
to a free and independent dynamics of their
centers of mass.


In some sense, asymptotic completeness says that
nothing else than (1)-(3) can possibly happen.
(Strictly speaking, asymptotic completeness is a
statement about the limit $t \to \infty$ and only
involves (3); the actual behavior of $\exp[-itH_N]$
at intermediate times in terms of (1)-(3) is beyond
the reach of current mathematics.) It is a key
insight of scattering theory that the asymptotics of
the time evolution in the sense of (3) is completely
characterized by the asymptotic velocity defined
by the strong limit

$$P^\pm := \text{s-}\!\lim_{t \to \pm\infty} e^{itH_N(V)}\, \frac{x}{t}\, e^{-itH_N(V)} \qquad [19]$$

It is a nontrivial fact that $P^\pm$ exists, commutes with
$H_N(V)$, and that bound states are precisely the states
with zero asymptotic velocity, while states with
nonzero asymptotic velocity are scattering states in
$\operatorname{Ran}\Omega^\pm$. This then implies asymptotic completeness
for short-range potentials. The proof of this dichotomy
builds essentially upon positive commutator or
Mourre estimates. Given an interval J localized (in
energy) away from any eigenvalue of any possible
subcluster configuration $C_1, \ldots, C_K$ (called thresholds),
the Mourre estimate asserts the existence of a
positive constant $M > 0$ and a compact operator
$R \in \mathcal{B}(\mathcal{H}^{(N)})$ such that


$$1_J\, i[H_N(V), A]\, 1_J \ge M\, 1_J - R \qquad [20]$$


as a quadratic form, for some suitable operator
A. This operator A is often chosen to be the
dilation generator $A = (1/2)\{p \cdot x + x \cdot p\}$ or a variant
thereof.

Again, the proof of asymptotic completeness for
long-range potentials is still more difficult and has
been carried out only for $\mu > \sqrt{3} - 1$. The additional
problem is the comparison dynamics of the
relative motion of the clusters $C_1$ and $C_2$ in (2),
which is not the free one; the clusters rather
influence each other even at large distances.

For more details, see Reed and Simon (1980c) and 
Derezinski and Gérard (1997). See also the articles 
Scattering in Relativistic Quantum Field Theory: 
Fundamental Concepts and Tools, Scattering, 
Asymptotic Completeness and Bound States in this 
encyclopedia. 


Random Schrödinger Operators


Schrödinger operators $H(V_\omega)$ on $L^2(\mathbb{R}^d)$ or $\ell^2(\mathbb{Z}^d)$
with a random potential $V_\omega$ are called random
Schrödinger operators. (If $H(V_\omega)$ acts on $\ell^2(\mathbb{Z}^d)$,
then the (continuum) Laplacian $-\Delta$ is replaced by the
discrete Laplacian on $\mathbb{Z}^d$ defined by
$[-\Delta_{disc} f](x) = \sum_{\nu=1}^{d} \{2f(x) - f(x - e_\nu) - f(x + e_\nu)\}$.) More precisely,
given a probability space $(\Omega, P, \mu)$ and a random
variable $\Omega \ni \omega \mapsto V_\omega$, the family $\{H(V_\omega)\}_{\omega \in \Omega}$ defines
an operator-valued random variable that we refer to
as a random Schrödinger operator. Random quantum
systems are physically relevant as models for amorphous
materials, and for solids in very heterogeneous
external fields or coupled to quantized fields. Suitable
ergodicity assumptions on $\omega \mapsto V_\omega$ ensure that the
domain of $H(V_\omega)$ and even many spectral properties (in
particular, the spectrum $\sigma(H(V_\omega)) \subseteq \mathbb{R}$ itself) are
independent of $\omega$ P-almost surely. For example,
assuming an independent, identical distribution
(i.i.d.) of $V_\omega$ in the discrete case on $\mathbb{Z}^d$, one arrives
at the Anderson model, which has been most
thoroughly studied. Its counterpart for continuum
models is a Poisson-distributed $V_\omega$. A model which
also has ergodic properties, although deterministic, is
the Hofstadter or the Mathieu problem. Most
research has been focused on localization, that is,
spatial decay properties of the resolvent
$\{H(\lambda V_\omega) - E\}^{-1}(x, y)$ of $H(\lambda V_\omega)$, as $|x - y| \to \infty$, and particularly
the question of presence or absence of exponential
decay (localization), as this is an important indicator
for the transport properties of the material under
consideration. Exponential localization of eigenstates
has been established for $d = 1$ or strong disorder or
sufficiently high energies $E \gg 1$. Localization is also
intimately related to bounds on moments of the form 
|x/2 || <C,t?. The study of the asymptotic dis- 
tribution of eigenvalues close to the lowest threshold 
leads to the so-called Lifshitz tails. 
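
As an illustration of the discrete setting, the following sketch builds a finite one-dimensional Anderson-type matrix from the discrete Laplacian defined above. It makes several assumptions not fixed by the text: d = 1, a finite chain with Dirichlet ends, a potential uniformly distributed on $[-W/2, W/2]$, and the inverse participation ratio as a rough localization indicator. All function names are hypothetical.

```python
import numpy as np

def anderson_hamiltonian(num_sites, disorder, rng):
    """Finite 1D chain: discrete Laplacian plus an i.i.d. uniform random potential."""
    # [-Delta_disc f](x) = 2 f(x) - f(x-1) - f(x+1), truncated to a finite chain.
    laplacian = (
        2.0 * np.eye(num_sites)
        - np.eye(num_sites, k=1)
        - np.eye(num_sites, k=-1)
    )
    potential = disorder * rng.uniform(-0.5, 0.5, size=num_sites)
    return laplacian + np.diag(potential), potential

def inverse_participation_ratio(vectors):
    """Sum over sites of |psi(x)|^4 for each normalized eigenvector (one per column)."""
    return np.sum(np.abs(vectors) ** 4, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for disorder in (0.0, 1.0, 10.0):
        hamiltonian, _ = anderson_hamiltonian(200, disorder, rng)
        _, vectors = np.linalg.eigh(hamiltonian)
        print(disorder, inverse_participation_ratio(vectors).mean())
```

Eigenvectors spread over the whole chain have an inverse participation ratio of order 1/L, whereas strongly localized ones have a value of order one, so the printed averages should grow with the disorder strength. Since the finite discrete Laplacian has spectrum in [0, 4], the eigenvalues must lie between min V and 4 + max V.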

The reader is referred to Figotin and Pastur 
(1992), Cycon et al. (1987), and Stollmann (2001). 


(Pseudo)relativistic Schrödinger
Operators


Schrödinger operators of the form $H(V) = p^2 + V(x)$
do not observe the invariance principles of (special)
relativity, as their derivation is based on classical
(Newtonian) mechanics. The free Dirac operator
$D := \alpha \cdot p + m\beta$ (here, $\alpha_\nu$ and $\beta$ are self-adjoint
$4 \times 4$ matrices) possesses the desired relativistic
invariance, but it is not semibounded, and the
definition of an interacting Dirac operator is
notoriously difficult (and unsolved). The replacement
of the kinetic energy $(1/2m)p^2$ by the Klein-Gordon
operator $\sqrt{p^2 + m^2}$ is a step towards
relativistic invariance, which, at the same time,
yields a positive operator. This replacement may
also be viewed as the restriction of the free Dirac
operator to its positive-energy subspace. The virtue
of this replacement is that it immediately allows for
the study of interacting N-particle operators,

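$$H_N^{rel}(Z, R) = \sum_{n=1}^{N} \left\{ \sqrt{p_n^2 + m^2} - \sum_{k=1}^{K} \frac{Z_k}{|x_n - R_k|} \right\} + \sum_{1 \le \ell < n \le N} \frac{1}{|x_\ell - x_n|} \qquad [21]$$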


much like in [12]. Since $\sqrt{p^2 + m^2} \sim |p|$, as $p \to \infty$,
the pseudorelativistic kinetic energy $\sqrt{p^2 + m^2}$ can
balance only less severe local singularities of the
potential V than the nonrelativistic kinetic energy
$(1/2m)p^2$. Indeed, the quadratic form
$\sqrt{p^2 + m^2} - g|x|^{-1}$ on $C_0^\infty(\mathbb{R}^3)$ associated to a
hydrogen-like atom is unbounded from below if $g > 2/\pi$.
Hence, the stability of matter becomes a more subtle
property of pseudorelativistic matter. The relaxation
of the restriction onto the positive subspace of the free
Dirac operator also got into the focus of research.


For more details, we refer the reader to Thirring
(1997).


See also: Deformation Quantization; Elliptic Differential 
Equations: Linear Theory; h-Pseudodifferential Operators 
and Applications; Localization for Quasiperiodic 
Potentials; Nonlinear Schrödinger Equations; Normal 
Forms and Semiclassical Approximation; N-Particle 
Quantum Scattering; Quantum Hall Effect; Quantum 
Mechanical Scattering Theory; Scattering, Asymptotic 
Completeness and Bound States; Stability of Matter; 
Stationary Phase Approximation.


Introduction


Topological quantum field theories (TQFTs) provide
powerful tools to probe topology of manifolds,
specifically in low dimensions. This is achieved by
incorporating very large gauge symmetries in the
theory which lead to gauge-invariant sectors with
only topological degrees of freedom. These theories
are of two kinds: (1) Schwarz type and (2) Witten
type.

In a Witten-type topological field theory, the action is a
BRST exact form, and so is the stress-energy tensor $T_{\mu\nu}$, so
that their functional averages are zero (Witten 1988).
The BRST charge is associated with a certain shift
symmetry. The topological observables form cohomological
classes and semiclassical approximation turns
out to be exact. In four dimensions, such theories
involving Yang-Mills gauge fields provide a field-theoretic
representation for Donaldson invariants.

On the other hand, Schwarz-type TQFTs are
described by local action functionals which are not
total derivatives but are explicitly independent of
metric (Schwarz 1978, 1979, 1987, Witten 1989).
The examples of such theories are topological
Chern–Simons (CS) theories and BF theories.

Metric independence of the action S of a Schwarz-type
gauge theory implies that stress-energy tensor
is zero:


$$T_{\mu\nu} \propto \frac{\delta S}{\delta g^{\mu\nu}} = 0$$


More generally, in the gauge-fixed version of such
theories, stress-energy can be BRST exact, where
BRST charge corresponds to gauge fixing in contrast
to Witten-type theories where corresponding BRST
charge corresponds to a combination of shift
symmetry and gauge symmetry. There are no local
propagating degrees of freedom; the only degrees of
freedom are topological. Expectation values of
metric-independent operators W are also independent
of the metric:


$$\frac{\delta}{\delta g^{\mu\nu}} \langle W \rangle = 0$$


Three-dimensional CS theories are of particular
interest, for these provide a framework for the study 
of knots and links in any 3-manifold. Pioneering 
indications of the fact that topological invariants 
can be found in such a setting came in very early 
when A S Schwarz demonstrated that a particular 
topological invariant, Ray-Singer analytic torsion 
(which is equivalent to combinatorial Reidemeister— 
Franz torsion) can be interpreted in terms of the 
partition function of a quantum gauge field theory 
(Schwarz 1978, 1979). In particular, in the weak- 
coupling limit of CS theory of gauge group G on a 
manifold M, contribution from each topologically 
distinct flat connection (characterized by the equivalence
classes of homomorphisms: $\pi_1(M) \to G$) to the
partition function is given by metric-independent 
Ray-Singer torsion of the flat connection up to a 
phase. This phase factor is also a topological 
invariant of framed 3-manifold M (Witten 1989). 
It was Schwarz who first discussed CS theory as a 
topological field theory and also conjectured that 
the well-known Jones polynomial may be related to 
it (Schwarz 1987). In his famous paper Witten 
(1989) not only demonstrated this connection, but 
also set up a general field-theoretic framework to 
study the topological properties of knots and links in 
any arbitrary 3-manifold. In addition, this frame- 
work provides a method of obtaining some new 
manifold invariants. As discussed by A Achúcarro
and P K Townsend, CS theory also describes gravity 
in three-dimensional spacetime (Carlip 2003). 

BF theories in three dimensions provide another 
framework for field-theoretic description of 


topological properties of knots and links. These 
theories with bilinear action in fields can also be 
defined in higher dimensions. In particular in D = 4, 
BF theory, besides describing two-dimensional gen- 
eralizations of knots and links, also provides a field- 
theoretic interpretation of Donaldson invariants. 
This provides a connection of these theories with 
Witten-type TQFTs of Yang-Mills gauge fields. We 
shall not discuss BF theories in the following and 
refer to the article BF Theories in this Encyclopedia. 

Witten (1995) has also formulated CS theories in 
three complex dimensions described in terms of 
holomorphic 1-forms. Such a theory on Calabi-Yau 
spaces can be interpreted as a string theory in terms 
of a Witten-type topological field theory of a sigma 
model coupled to gravity. General topological sigma 
models in Batalin—Vilkovisky formalism have been 
constructed by Alexandrov et al. (1997). This is a 
Schwarz-type theory. However, in its gauge-fixed 
version, it can also be interpreted as a Witten-type 
theory. This construction provides a general for- 
mulation from which numerous topological field 
theories emerge. In particular, the Witten A and B 
models and also multidimensional CS theories are 
special cases of this construction. 

In the following, we shall survey three-dimensional 
CS theory as a description of knots/links, indicate 
how manifold invariants can be constructed from 
invariants for framed links, and also discuss its 
application to three-dimensional gravity. 


Three-Dimensional CS Theory with 
Gauge Group U(1) 


The simplest Schwarz-type topological field theory is 
the U(1) CS theory described by the action: 


$$S = \frac{1}{8\pi} \int_M d^3x\, \epsilon^{\mu\nu\rho} A_\mu \partial_\nu A_\rho \qquad [1]$$


where A is a connection 1-form $A = A_\mu dx^\mu$ and M is
the 3-manifold, which we shall take to be $S^3$ for the
discussion below. The action has no dependence on
the metric. Besides being U(1) gauge invariant, it
is also general coordinate invariant.

In quantum CS field theory, we are interested in
the functional averages of gauge-invariant and
metric-independent functionals $W[A]$:


$$\langle W[A] \rangle = Z^{-1} \int [DA]\, W[A] \exp\{ikS\}, \qquad Z = \int [DA] \exp\{ikS\} \qquad [2]$$

This theory captures some of the simple, but
interesting, topological properties of knots and links
in three dimensions. For a knot K, we associate a knot
operator $\oint_K A$ which is gauge invariant and also does
not depend on the metric of the 3-manifold. Then for
a link made of two knots $K_1$ and $K_2$ we have the loop
correlation function $\langle \oint_{K_1} A \oint_{K_2} A \rangle$, which can be
evaluated in terms of the two-point correlator
$\langle A_\mu(x) A_\nu(y) \rangle$ in $\mathbb{R}^3$ (with flat metric). This correlator
in Lorentz gauge ($\partial_\mu A^\mu = 0$) is


$$\langle A_\mu(x) A_\nu(y) \rangle = \frac{i}{k}\, \epsilon_{\mu\nu\rho}\, \frac{(x - y)^\rho}{|x - y|^3}$$


so that for two distinct knots $K_1$ and $K_2$


$$\left\langle \oint_{K_1} A \oint_{K_2} A \right\rangle = \frac{4\pi i}{k}\, L(K_1, K_2), \qquad L(K_1, K_2) = \frac{1}{4\pi} \oint_{K_1} dx^\mu \oint_{K_2} dy^\nu\, \epsilon_{\mu\nu\rho}\, \frac{(x - y)^\rho}{|x - y|^3} \qquad [3]$$


This integral is the well-known topological invariant
called “Gauss linking number” of two distinct
closed curves. It is an integer measuring the number
of times one knot $K_1$ goes through the other knot
$K_2$. Linking number does not depend on the
location, size, or shape of the knots. In electrodynamics,
it has the physical interpretation of work
done to move a monopole around a knot while
electric current runs through the other knot.
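
The Gauss double integral is easy to evaluate numerically. The sketch below approximates $(1/4\pi) \oint\oint \epsilon_{\mu\nu\rho}\, dx^\mu\, dy^\nu\, (x - y)^\rho / |x - y|^3$ by a midpoint rule on two closed polygons, using the identity $\epsilon_{\mu\nu\rho}\, dx^\mu dy^\nu r^\rho = (dx \times dy) \cdot r$. The example curves (two unit circles forming a Hopf link) and the function names are assumptions chosen for illustration. The overall sign depends on the orientations chosen for the two curves.

```python
import numpy as np

def gauss_linking_number(curve_a, curve_b):
    """Midpoint-rule approximation of the Gauss linking integral for two closed polygons.

    Each curve is an (n, 3) array of vertices; the last vertex connects back to the first.
    """
    da = np.roll(curve_a, -1, axis=0) - curve_a      # segment vectors dx
    db = np.roll(curve_b, -1, axis=0) - curve_b      # segment vectors dy
    mid_a = curve_a + 0.5 * da                       # segment midpoints x
    mid_b = curve_b + 0.5 * db                       # segment midpoints y
    r = mid_a[:, None, :] - mid_b[None, :, :]        # x - y for every pair of segments
    cross = np.cross(da[:, None, :], db[None, :, :])  # dx cross dy
    integrand = np.einsum("ijk,ijk->ij", cross, r) / np.linalg.norm(r, axis=2) ** 3
    return float(integrand.sum() / (4.0 * np.pi))

def hopf_link(num_points=200):
    """Two unit circles, each passing through the center of the other (illustrative choice)."""
    t = np.linspace(0.0, 2.0 * np.pi, num_points, endpoint=False)
    circle_a = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
    circle_b = np.stack([1.0 + np.cos(t), np.zeros_like(t), np.sin(t)], axis=1)
    return circle_a, circle_b

if __name__ == "__main__":
    a, b = hopf_link()
    print("linked circles:  ", gauss_linking_number(a, b))
    print("separated circles:", gauss_linking_number(a, b + np.array([5.0, 0.0, 0.0])))
```

Deforming or rescaling either circle without passing it through the other should leave the result unchanged up to discretization error, which is the numerical counterpart of the topological invariance described above.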

Abelian CS theory also provides a field-theoretic
representation for another topological quantity
called “self-linking number,” also known as “framing
number,” of the knot. It is related to the
functional average $\langle \oint_K A \oint_K A \rangle$ where the two loop
integrals are over the same knot. Coincidence
singularity is avoided by a topological loop-splitting
regularization. For a knot K given by $x^\mu(s)$,
parametrized along the length of the knot by s, we
associate another closed curve $K_f$ given by
$y^\mu(s) = x^\mu(s) + \epsilon n^\mu(s)$, where $\epsilon$ is a small parameter
and $n^\mu(s)$ is a principal normal to the curve at s. The
coincidence limit is then obtained at the end by
taking the limit $\epsilon \to 0$. Such a limiting procedure is
called framing and knot $K_f$ is the “frame” of knot K.
Linking number of the knot K and its frame $K_f$ is the
self-linking number of the knot:


$$SL(K, n^\mu) = \frac{1}{4\pi} \oint_K dx^\mu \oint_{K_f} dy^\nu\, \epsilon_{\mu\nu\rho}\, \frac{(x - y)^\rho}{|x - y|^3}$$


Hence coincidence two loop correlator is


$$\left\langle \oint_K A \oint_K A \right\rangle = \frac{4\pi i}{k}\, SL(K, n^\mu) \qquad [4]$$


Notice that the self-linking number of a knot is
independent of the regularization parameter $\epsilon$, but
does depend on the topological character of the
normal vector field $n^\mu(s)$. It is also related to two
geometric quantities called “twist” T(K) and “writhe”
w(K) through a theorem due to Calugareanu:


$$SL(K) = T(K) + w(K) \qquad [5]$$

where

$$T(K) = \frac{1}{2\pi} \oint_K ds\, \epsilon_{\mu\nu\rho}\, \frac{dx^\mu}{ds}\, n^\nu\, \frac{dn^\rho}{ds}$$

$$w(K) = \frac{1}{4\pi} \oint_K ds \oint_K dt\, \epsilon_{\mu\nu\rho}\, e^\mu\, \frac{de^\nu}{ds}\, \frac{de^\rho}{dt}$$

Here

$$e^\mu(s, t) = \frac{y^\mu(t) - y^\mu(s)}{|y(t) - y(s)|}$$


is a unit map from $K \otimes K \to S^2$ and $n^\mu(s)$ is a normal
unit vector field. T(K) and w(K) are not in general
integers and represent the amount of twist and coiling
of the knot. These are not topological invariants but
their sum, self-linking number, is indeed always an
integer and a topological invariant. This result has
found interesting applications in the studies of the
action of enzymes on circular DNA.


Nonabelian CS Theories 


Nonabelian CS theories provide far more informa- 
tion about the topological properties of the mani- 
folds as well as knots and links. 

Nonabelian CS theory in a 3-manifold M (which
as in the last section is taken to be $S^3$) is described by
the action functional


$$S = \frac{1}{4\pi} \int_M \operatorname{tr}\left( A \wedge dA + \frac{2}{3}\, A \wedge A \wedge A \right) \qquad [6]$$


where A is a gauge field 1-form which takes its value
in the Lie algebra $\mathcal{L}G$ of a compact semisimple Lie
group G. For example, we may take this group to be
SU(N) and $A = A^a T^a$, where $T^a$ is the fundamental
N-dimensional representation with $\operatorname{tr} T^a T^b = -\frac{1}{2}\delta^{ab}$.
Under homotopically nontrivial gauge transformations
this action is not invariant, but changes by an
amount $2\pi n$ where integers n are the winding
numbers characterizing the gauge transformations
which fall in homotopic classes given by $\Pi_3(G) = \mathbb{Z}$
for a compact semisimple group G. However, for
quantum theory what is relevant is $\exp[ikS]$ which
is invariant even under homotopically nontrivial
gauge transformations provided the coupling k
takes integer values. This quantized nature of the
coupling was pointed out by Deser et al. (1982a, b)
(who were also the first to introduce the nonabelian
CS term as a gauge-invariant topological
mass term in gauge theories). So for integer k, the
quantum field theory we discuss here is gauge
invariant.

The topological operators are Wilson loop operators
for an oriented knot K:


$$W_R[K] = \operatorname{tr}_R\, P \exp \oint_K A_R \qquad [7]$$


where $A_R = A^a T_R^a$ with $T_R^a$ as the representation
matrices of a finite-dimensional representation R of
the $\mathcal{L}G$. P stands for the path ordering of the
exponential. The observable Wilson link operator
for a link $L = \bigcup_{i=1}^{n} K_i$, carrying representations $R_i$ on
the respective component knots, is


$$W_{R_1 R_2 \ldots R_n}[L] = \prod_{i=1}^{n} W_{R_i}[K_i] \qquad [8]$$


Expectation values of these operators are:


$$V_{R_1 R_2 \ldots R_n}[L] = \frac{\int [DA]\, W_{R_1 R_2 \ldots R_n}[L]\, e^{ikS}}{\int [DA]\, e^{ikS}} \qquad [9]$$

The measure [DA] has to be metric independent.
These expectation values depend not only on the
isotopy of the link L but also on the set of the
representations $\{R_i\}$. These can be evaluated in
principle nonperturbatively. For example, when
$\mathcal{L}G = su(N)$ and each of the component knots of the
link carries the fundamental N-dimensional representation,
the Wilson link expectation values satisfy
a recursion relation involving three link diagrams
which are identical except for one crossing where
they differ as over crossing ($L_+$), under crossing
($L_-$), and no crossing ($L_0$) as shown in Figure 1.

The expectation values of these links are related
as (Witten 1989):


$$q^{N/2}\, V_N[L_+] - q^{-N/2}\, V_N[L_-] = \left( q^{1/2} - q^{-1/2} \right) V_N[L_0] \qquad [10]$$

where

$$q = \exp\left( \frac{2\pi i}{k + N} \right)$$

Figure 1 Skein related links $L_+$, $L_0$, and $L_-$.


This is precisely the well-known skein relation for
the HOMFLY polynomial. The famous Jones one-variable
polynomial (whose two-variable
generalization is the HOMFLY polynomial) corresponds
to the case of spin-1/2 representation of
SU(2) CS theory: $V_2[L]$ = Jones polynomial [L], up
to an overall normalization. These skein relations
are sufficient to recursively find all the expectation
values of links with only fundamental representation
on the components. To obtain invariants for any
other representation, more general methods have to
be developed. A complete and explicit solution of
the CS field theory is thus obtained. One such
method has been reviewed in Kaul (1999). The
method makes use of the following important
statement:


Proposition: CS theory on a 3-manifold $M$
with boundary $\Sigma$ is described by a WZNW
(Wess-Zumino-Novikov-Witten) conformal field
theory (CFT) on the boundary (Figure 2).


Using the same identification, the functional average
for Wilson lines ending at $n$ points on the boundary
$\Sigma$ is obtained from WZNW field theory on the
boundary with $n$ punctures carrying representations
$R_i$ (Figure 3).

We can represent the CS functional integral as a
vector (Witten 1989) in the Hilbert space $\mathcal{H}$
associated with the $n$-point vacuum expectation
values of primary fields in WZNW conformal field
theory on the boundary $\Sigma$. Next, to obtain a
complete and explicit nonperturbative solution of
the CS theory, the theory of knots and links and
their connection to braids is invoked.

Figure 2 Relation of CS to CFT.

Figure 3 CS functional integrals with Wilson lines and CFT on
punctured boundary.


Knots/Links and Braids


Braids have an intimate connection with knots and 
links which can be summarized as follows: 


1. An n-braid is a collection of nonintersecting
strands connecting n points on a horizontal rod
to n points on another horizontal rod below,
strictly excluding any backward traversing of the
strands. A general braid can be written as a word
in terms of elementary braid generators.

2. We associate representations R; of the group with 
the strands as their colors. We also put an 
orientation on each strand. When all the repre- 
sentations are identical and also all strands are 
unoriented, we get ordinary braids, otherwise we 
get colored oriented braids. 

3. The colored oriented braids form a groupoid 
where product of the different braids is obtained 
by joining them with both colors and orientations 
matching on the joined strands. Unoriented 
monochromatic braids form a group. 

4. A knot/link can be formed from a given braid by
a process called platting. In an even-strand braid,
we connect adjacent strands on both rods, namely
the (2i+1)th strand to the 2ith strand, which must
carry the same color and opposite orientations
(Figure 4a).

There is a theorem due to Birman which states 
that all colored oriented knots/links can be 
obtained through platting. This construction is 
not unique. 

5. There is another construction associated with
braids which relates them to knots and links. We
obtain the closure of a braid by connecting the ends
of the first, second, third, ... strands from above
to those of the respective first, second, third, ...
strands from below, as shown in Figure 4b.
There is a theorem due to Alexander which states
that any knot or link can be obtained as the closure
of a braid, though again not uniquely.
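
The closure construction in item 5 can be made concrete with a short sketch. It is only an illustration under stated assumptions: a braid on n strands is encoded as a list of signed integers (+i for the ith elementary generator, -i for its inverse), and the function name `closure_components` is hypothetical. Closing the braid identifies each top end with the bottom end in the same position, so the number of components of the resulting link equals the number of cycles of the permutation induced by the braid word.

```python
from typing import List


def closure_components(word: List[int], n: int) -> int:
    """Count the components of the closure of a braid on n strands.

    `word` lists generators as signed integers: +i for the ith elementary
    generator and -i for its inverse (1 <= i <= n - 1). Over- and
    under-crossings induce the same permutation, so the sign is ignored here.
    """
    # strands[p] = label of the strand currently sitting at position p
    strands = list(range(n))
    for g in word:
        i = abs(g) - 1
        if not 0 <= i < n - 1:
            raise ValueError(f"generator {g} is out of range for {n} strands")
        strands[i], strands[i + 1] = strands[i + 1], strands[i]

    # Closing the braid glues bottom position p to top position p, so each
    # cycle of the permutation p -> strands[p] is one closed component.
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if seen[start]:
            continue
        cycles += 1
        p = start
        while not seen[p]:
            seen[p] = True
            p = strands[p]
    return cycles
```

For example, the two-strand words [1, 1] and [1, 1, 1] close to the Hopf link (two components) and the trefoil knot (one component), respectively. The sketch tracks only the permutation, so it cannot distinguish links with the same number of components.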


Link Invariants 


This connection of braids to knots and links can be
used to construct link invariants, say in $S^3$. To do so,
two nonintersecting 3-balls are removed from the
3-manifold $S^3$ to obtain a manifold with two $S^2$
boundaries. Then we arrange $2n$ Wilson lines of, say,
SU(N) CS theory as a $2n$-strand oriented braid
carrying representations $R_i$ in this manifold. The CS
functional integral over this manifold is a state in
the tensor product of the Hilbert spaces
$\mathcal{H}_1 \otimes \mathcal{H}_2$ associated with
conformal field theory on the two boundaries. These
boundaries have $2n$ punctures carrying the sets of
representations $\{R_i\}$ and $\{R'_i\}$, respectively,
the two sets being permutations of each other. This
state can be expanded in terms of some convenient basis
given by the conformal blocks for the $2n$-point
correlation functions of the SU(N)$_k$ WZNW conformal
field theory. The duality of these correlation functions
represents the transformation between different bases
for the Hilbert space. Their monodromy properties allow
us to write down representations of the braid generators.
Since an arbitrary braid is just a word in terms of
these generators, this construction provides us with a
matrix representation $B(\{R_i\}, \{R'_i\})$ for the
colored oriented braid in the manifold with two $S^2$
boundaries. Then we plat this braid by gluing two balls
$B_1$ and $B_2$ with Wilson lines as shown in Figure 5.

Figure 4 (a) Platting and (b) closure of braids.

Each of the two caps again represents a state
$|\psi(\{R_i\})\rangle$ in the Hilbert space associated
with the conformal field theory on the punctured boundary
($S^2$). Platting of the braid then simply is the matrix
element of the braid representation $B(\{R_i\}, \{R'_i\})$
with respect to the states $|\psi(\{R_i\})\rangle$ and
$|\psi(\{R'_i\})\rangle$ corresponding to the two caps
$B_1$, $B_2$. Thus, for a link in $S^3$ the invariant is
given by the following theorem:


Theorem The vacuum expectation value of the Wilson
loop operator of a link $L$ constructed from platting
of a colored oriented $2n$-braid with representation
$B(\{R_i\}, \{R'_i\})$ is given by (Kaul 1999):

\[ V[L] = \langle \psi(\{R_i\}) |\, B(\{R_i\}, \{R'_i\}) \,| \psi(\{R'_i\}) \rangle \qquad [11] \]


This theorem can be used to calculate the
invariant for any arbitrary link. For an unknot $U$
carrying an $N$-dimensional representation in an
SU(N) CS theory, the knot invariant is:

\[ V_N[U] = [N], \qquad \text{where } [N] = \frac{q^{N/2} - q^{-N/2}}{q^{1/2} - q^{-1/2}} \]

Figure 5 Construction of the link invariant.
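
As a numerical illustration of how the skein relation [10] and the unknot value combine, the following sketch evaluates the invariant of a Hopf link whose two crossings are both of the $L_+$ type. It rests on assumptions that are not spelled out in the text: that the invariant of two unlinked unknots factorizes as $[N]^2$, and that framing factors can be ignored (standard framing). The function names are hypothetical. Changing one crossing of the Hopf link gives two unlinked unknots ($L_-$) and smoothing it gives a single unknot ($L_0$), so [10] can be solved for $V_N[L_+]$.

```python
import cmath
import math


def q_power(exponent: float, N: int, k: int) -> complex:
    """Return q**exponent for q = exp(2*pi*i/(k+N)), avoiding branch issues."""
    return cmath.exp(2j * math.pi * exponent / (k + N))


def unknot_invariant(N: int, k: int) -> float:
    """[N] = (q^{N/2} - q^{-N/2}) / (q^{1/2} - q^{-1/2}), written with sines."""
    x = math.pi / (k + N)
    return math.sin(N * x) / math.sin(x)


def hopf_link_invariant(N: int, k: int) -> complex:
    """Solve q^{N/2} V[L+] - q^{-N/2} V[L-] = (q^{1/2} - q^{-1/2}) V[L0].

    Assumption: V[L-] = [N]**2 (two unlinked unknots) and V[L0] = [N].
    """
    u = unknot_invariant(N, k)
    z = q_power(0.5, N, k) - q_power(-0.5, N, k)
    return q_power(-N / 2, N, k) * (q_power(-N / 2, N, k) * u * u + z * u)
```

In the limit of large $k$, $q \to 1$ and the values reduce to the classical dimensions: $[N] \to N$ for the unknot and $N^2$ for the Hopf link, which gives a quick sanity check on the sketch.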

Wilson link expectation values calculated this way 
depend on the regularization, that is, the definition 
of framing used in defining coincident loop correla- 
tors. One such regularization usually used is the 
standard framing, where the frame for every knot is 
so chosen that its self-linking number is zero. 

The procedure outlined here has been used for 
explicit computations of knot/link invariants. This 
has led to answers to several questions of knot 
theory. One such question relates to distinguishing 
chirality of knots (Kaul 1999). In this context, newer 
invariants constructed with arbitrary representations 
living on the knots are more powerful than the older 
polynomial invariants. For example, invariants with 
spin-3/2 representation in an SU(2) CS theory are 
sensitive to chirality of many knots which otherwise 
is not detected by Jones, HOMFLY, and Kauffman 
polynomials. However, invariants obtained from CS 
theories do not distinguish all chiral knots. There is 
a class of links known as “mutants” which are not 
distinguished by CS link invariants (Kaul 1999). A 
mutant link is obtained by removing a portion of
the weaving pattern in a link and then gluing it back
after rotating it about any one of three orthogonal
axes by an angle $\pi$.

The CS invariants of knots and links can also be 
used to construct special 3-manifold invariants. 
Hence, CS theory provides an important tool to 
study these. 


Manifold Invariants from CS Theory 


Different 3-manifolds can be constructed through a
procedure called "surgery of framed knots and
links" in $S^3$ (Lickorish-Wallace theorem). This
construction is not unique. That is, there are many
framed knots and links which give the same
manifold. However, the rules of this equivalence are
known: these are called "Kirby moves."
Classification of 3-manifolds would involve finding
a method of associating a quantity with the
manifold obtained by surgery on the corresponding
framed knot/link in $S^3$. If the Kirby moves on the
framed knot/link leave this quantity unchanged,
then it is a 3-manifold invariant. Knot/link invariants
of nonabelian CS theories provide a method of
finding such 3-manifold invariants. Equivalently,
this procedure gives an algebraic meaning to the
surgery construction of 3-manifolds. Details of this
method for generating manifold invariants are given
in Kaul (1999) and Kaul and Ramadevi (2001).


Surgery of Framed Knots/Links and Kirby Moves 


As discussed earlier, the frame of a knot $K$ is an
associated closed curve $K_f$ going along the length
of the knot and wrapping around it a certain number of
times. The self-linking number (also called the framing
number) is equal to the linking number of the knot
with its frame. There are several ways of fixing this
framing. The "standard" framing is one in which the
framing number of the knot, that is, the linking
number of the knot and its frame, is zero. On the
other hand, "vertical" framing is obtained by
choosing the frame vertically above the knot
projected on to a plane. In such a frame, the framing
number of a knot is the same as its crossing number.
In constructing the 3-manifold invariants from CS
theories, we need vertical framing. The framing
number may be denoted by writing the integer by
the side of the knot. We denote a framed $r$-component
link by $[L, f]$, where the framing
$f = (n(1), n(2), \ldots, n(r))$ is a set of integers
denoting the framing numbers of the component knots
$K_1, K_2, \ldots, K_r$ in the link $L$.

According to the Lickorish-Wallace theorem,
surgery over links with vertical framing in $S^3$ yields
all the 3-manifolds. This surgery is performed in the
following way.

Take a framed $r$-component link $[L, f]$ in $S^3$.
Thicken the component knots $K_1, K_2, \ldots, K_r$ such
that the solid tubes $N_1, N_2, \ldots, N_r$ so obtained are
nonintersecting. Then the complement
$S^3 - (N_1 + N_2 + \cdots + N_r)$ will have $r$ toral
boundaries. On the $i$th toral boundary, we imagine an
appropriate curve winding $n(i)$ times around the
meridian and once along the longitude. Perform a
modular transformation so that this curve bounds
a disk. This construction is done with each of the
toral boundaries. The tubes $N_1, N_2, \ldots, N_r$ are
then glued back into the respective gaps. This
surgery thus yields a new 3-manifold. This
construction is not unique. The rules of equivalence
for surgery on framed knots/links in $S^3$ are
two independent Kirby moves.


Kirby move I Take an arbitrary $r$-component
framed link $[L, f]$ in $S^3$ and consider a curve $C$
with framing number +1 going around the unlinked
strands of $L$ as in Figure 6a. We refer to this
$(r+1)$-component link as $H[X]$, where $X$ represents a
weaving pattern of the strands. Kirby move I
consists of twisting the disk enclosed by $C$ in the
clockwise direction from below by an amount $2\pi$.
This twisting thereby introduces new crossings
between the curve $C$ and the strands enclosed by it.
Then the curve $C$ is removed, giving us a new
$r$-component link $U[X]$ of Figure 6b. The framing
numbers $n'(i)$ of the component knots in the link $U[X]$
are related to the framing numbers $n(i)$ of the framed
link $[L, f]$ as $n'(i) = n(i) - (\mathcal{L}(K_i, C))^2$,
where $\mathcal{L}(K_i, C)$ is the linking number of the
knot $K_i$ and the closed curve $C$. The surgery of the
framed links in Figures 6a and 6b will give the same
3-manifold.

Figure 6 Kirby move I: (a) $H[X]$ and (b) $U[X]$.

Inverse Kirby move I involves removal of a curve
$C$ with framing number -1 (instead of +1) after
making one complete anticlockwise twist from
below on the disk enclosed by $C$. In the process the
unlinked strands get twisted in the anticlockwise
direction, leading to changed framing numbers
$n'(i) = n(i) + (\mathcal{L}(K_i, C))^2$ of the component
knots $K_i$.
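
The bookkeeping of framing numbers under Kirby move I and its inverse can be sketched in a few lines. This is only an illustration of the rule $n'(i) = n(i) \mp (\mathcal{L}(K_i, C))^2$ quoted above; the function name `kirby_move_one` and the list-based encoding of the framing $f$ are assumptions made for the example, and the change of the weaving pattern itself is not modeled.

```python
from typing import List, Sequence


def kirby_move_one(framings: Sequence[int],
                   linking_with_c: Sequence[int],
                   inverse: bool = False) -> List[int]:
    """Update the framing numbers n(i) after removing the curve C.

    framings[i] is n(i) for the component knot K_i, and linking_with_c[i]
    is the linking number of K_i with C. Kirby move I (C with framing +1)
    subtracts the squared linking number; the inverse move (C with
    framing -1) adds it.
    """
    if len(framings) != len(linking_with_c):
        raise ValueError("need one linking number per component knot")
    sign = 1 if inverse else -1
    return [n + sign * l * l for n, l in zip(framings, linking_with_c)]
```

For instance, a two-component link with framing (0, 3) whose components link $C$ once and twice goes to framing (-1, -1) under Kirby move I, and the inverse move with the same linking numbers restores (0, 3).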


Kirby move II This move consists of removing a 
disjoint unknot C with framing —1 from framed link 
[L,f] without changing the rest of the link as in 
Figure 7. Surgery of the two links in Figure 7 will 
give the same manifold. 

Inverse Kirby move II involves removal of a 
disjoint unknot with framing +1 (instead of —1) 
from a framed link. 


3-Manifold Invariants 


Now a 3-manifold invariant can be constructed by
an appropriate combination of the invariants of
framed links in such a way that this algebraic
expression is unchanged under the Kirby moves. We
need for this purpose invariants for links in $S^3$ with
vertical framing.

Figure 7 Kirby move II.

Let $M$ be the manifold obtained from surgery
of an $r$-component framed link $[L, f]$ in $S^3$. Then
a manifold invariant $\hat{F}^{(G)}[M]$ is given as a linear
combination of the framed link invariants
$V^{(G)}_{R_1 R_2 \cdots R_r}[L, f]$, with representations
$R_1, R_2, \ldots, R_r$ living on the component knots,
obtained from CS theory based on a compact semisimple
group $G$:

\[ \hat{F}^{(G)}[M] = \alpha^{-\sigma[L,f]} \sum_{R_1, \ldots, R_r} \left( \prod_{i=1}^{r} \mu_{R_i} \right) V^{(G)}_{R_1 R_2 \cdots R_r}[L, f] \qquad [12] \]

Here $\sigma[L, f]$ is the signature of the linking matrix
and $\mu_{R_i} = S_{0R_i}$, $\alpha = e^{i\pi c/4}$, where $c$
is the central charge of the associated WZNW conformal
field theory and $S_{0R_i}$ denotes the matrix element of
the modular matrix $S$. General $S$-matrix elements for
any compact group are given by

\[ S_{R_1 R_2} = (-i)^{(\dim G - \mathrm{rank}\, G)/2} \left| \frac{L_\omega}{L} \right|^{-1/2} (k + C_v)^{-\mathrm{rank}\, G/2} \sum_{w \in W} \epsilon(w) \exp\left( \frac{-2\pi i}{k + C_v} \left( w(\Lambda_{R_1} + \rho),\, \Lambda_{R_2} + \rho \right) \right) \]

where $W$ denotes the Weyl group and its elements $w$
are words constructed using the generators $s_{\alpha_i}$,
that is, $w = \prod_i s_{\alpha_i}$, and
$\epsilon(w) = (-1)^{\ell(w)}$ with $\ell(w)$ the length of
the word. Here the $\Lambda_{R_i}$ denote the highest weights
of the representations $R_i$ and $\rho$ is the Weyl vector.
The action of the Weyl generator $s_\alpha$ on a weight
$\Lambda_R$ is

\[ s_\alpha(\Lambda_R) = \Lambda_R - 2\alpha\, \frac{(\Lambda_R, \alpha)}{(\alpha, \alpha)} \]

and $|L_\omega/L|$ is the ratio of weight and coroot lattices
(equal to the determinant of the Cartan matrix for
simply laced algebras). Also, $C_v$ is the quadratic Casimir
invariant for the adjoint representation.

It is important to stress that the expression
$\hat{F}^{(G)}[M]$ is unchanged under both Kirby moves I
and II (for a detailed proof, see Kaul (1999) and Kaul
and Ramadevi (2001)). Notice that for every
compact gauge group, we have a new 3-manifold
invariant.


A few examples of 3-manifolds Table 1 lists the
algebraic expressions of this invariant calculated
explicitly from the formula in eqn [12] for a few
3-manifolds. All these examples can be constructed
by surgery on an unknot $U(f)$ with different framing
numbers $f$.

In Table 1, $L[p, q]$ stands for Lens spaces of the
type $(p, q)$ and $C_R$ is the quadratic Casimir invariant
for representation $R$ of the Lie algebra of the gauge
group $G$.

Table 1 Invariants for some simple manifolds

U(f)     M                  $\hat{F}^{(G)}[M]$
U(0)     $S^2 \times S^1$   $1/S_{00}$
U(+1)    $S^3$              $1$
U(+2)    $RP^3$             $\alpha^{-1} \sum_R (S_{0R}^2 / S_{00})\, q^{2 C_R}$
U(+p)    $L[p, 1]$          $\alpha^{-1} \sum_R (S_{0R}^2 / S_{00})\, q^{p C_R}$
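
The simplest entries of Table 1 can be checked numerically for $G$ = SU(2). The sketch below is illustrative only and rests on assumptions not stated in the text: the standard SU(2)$_k$ modular matrix $S_{ab} = \sqrt{2/(k+2)}\, \sin[\pi (a+1)(b+1)/(k+2)]$ with $a, b = 2j = 0, \ldots, k$, central charge $c = 3k/(k+2)$, Casimir $C_R = j(j+1)$, $q = \exp[2\pi i/(k+2)]$, and the vertical-framing unknot invariant $V_R[U(f)] = q^{f C_R} S_{0R}/S_{00}$. The function names are hypothetical.

```python
import cmath
import math
from typing import List


def su2_modular_s(k: int) -> List[List[float]]:
    """Modular S-matrix of SU(2) level k, indexed by a = 2j = 0..k."""
    norm = math.sqrt(2.0 / (k + 2))
    return [[norm * math.sin(math.pi * (a + 1) * (b + 1) / (k + 2))
             for b in range(k + 1)] for a in range(k + 1)]


def unknot_surgery_invariant(k: int, f: int) -> complex:
    """Evaluate eqn [12] for surgery on an unknot with framing number f.

    Assumes V_R[U(f)] = q^{f C_R} S_{0R} / S_{00} (vertical framing), with
    q = exp(2*pi*i/(k+2)) and C_R = j(j+1) for spin j = a/2.
    """
    s = su2_modular_s(k)
    c = 3.0 * k / (k + 2)                      # WZNW central charge
    alpha = cmath.exp(1j * math.pi * c / 4.0)
    sigma = (f > 0) - (f < 0)                  # signature of the 1x1 linking matrix [f]
    total = 0j
    for a in range(k + 1):
        casimir = (a / 2.0) * (a / 2.0 + 1.0)
        framing_factor = cmath.exp(2j * math.pi * f * casimir / (k + 2))
        total += s[0][a] * (s[0][a] / s[0][0]) * framing_factor
    return alpha ** (-sigma) * total
```

Under these assumptions the sketch returns $1/S_{00}$ for $f = 0$ (the manifold $S^2 \times S^1$) and 1 for $f = \pm 1$ (the manifold $S^3$), in agreement with the first two rows of Table 1. The $f = \pm 1$ case is also a small instance of invariance under Kirby move II.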

The partition function of a CS theory on $M$ is also an
invariant characterizing the 3-manifold. This has
been calculated for several manifolds by different
methods. The invariant $\hat{F}^{(G)}[M]$ listed above for
various manifolds is related to the CS partition function
$Z^{(G)}[M]$: $\hat{F}^{(G)}[M] = S_{00}^{-1} Z^{(G)}[M]$.
So the method of constructing 3-manifold invariants above
can also be used to calculate the partition function of CS
theories.


3D Gravity and CS Theory 


Three-dimensional CS theory also provides a
description of gravity. 3D gravity with a
cosmological constant was first discussed by
Deser and Jackiw (1984). The action with cosmological
constant $\Lambda = \pm 1/\ell^2$ is:

\[ S = \frac{1}{16\pi G} \int_M d^3x\, \sqrt{-g}\, (R - 2\Lambda) \qquad [13] \]

$G$ is Newton's constant, $g_{\mu\nu}$ is the metric on the
3-manifold $M$, and $R$ is the scalar curvature. Solutions
of the Einstein equations of motion have a constant
positive (negative) curvature if $\Lambda$ is positive
(negative). It is also well known that there are no
dynamical degrees of freedom for gravity in dimensions
$D \le 3$; it is indeed described by topological
field theories. The gravity action above can be
rewritten as a CS gauge theory in the first-order
formulation (Carlip 2003). For triads $e^a_\mu$ and spin
connection $\omega^a_\mu$ of Euclidean gravity, we define
1-forms $e = e^a_\mu T^a dx^\mu$, $\omega = \omega^a_\mu T^a dx^\mu$,
which have values in the Lie algebra of SU(2), whose
generators are $T^a = i\sigma^a/2$ with $\sigma^a$ the three
Pauli matrices. In terms of these we define two gauge field
1-forms $A$ and $\bar{A}$ as:

\[ A = \left( \omega^a + \frac{i}{\ell}\, e^a \right) T^a, \qquad \bar{A} = \left( \omega^a - \frac{i}{\ell}\, e^a \right) T^a \]

Then the Euclidean gravity action can be written
in terms of two CS actions, $S_{CS}[A]$ and $S_{CS}[\bar{A}]$, as

\[ S = k S_{CS}[A] - k S_{CS}[\bar{A}] \qquad [14] \]

where the coupling constant $k = \ell/(4G)$ for negative
cosmological constant $\Lambda = -1/\ell^2$. The gauge group
for this theory is SL(2, C). Infinitesimal diffeomorphisms
are described by field-dependent gauge
transformations. The corresponding gauge group for
Minkowski gravity with negative cosmological constant
$\Lambda$ is SL(2, R) $\otimes$ SL(2, R). For positive
$\Lambda$, one gets SO(3, 1) and SO(4) for Minkowski and
Euclidean metrics, respectively. For $\Lambda = 0$, we have
ISO(2, 1) (ISO(3)) as the gauge group for Minkowski
(Euclidean) gravity. Hence, the sign of the cosmological
constant determines the gauge group of the CS
theory.

Identification of 3D gravity with CS theory can be 
used with some advantage to find the partition 
function for a black hole in 3D gravity with negative 
cosmological constant. This in turn yields an 
expression for entropy of the black hole. 


BTZ Black Hole and Its Partition Function 


Only for negative A we have a black hole solution of 
the Einstein’s equations. This solution, known as the 
BTZ black hole (Carlip 2003), in Euclidean gravity 
is given by the metric 


It is specified by two parameters $M$ and $J$ (the mass
and angular momentum). By a coordinate transformation,
this metric can be rewritten as
$ds^2 = (\ell^2/z^2)(dx^2 + dy^2 + dz^2)$, with $z > 0$.
This is the 3D upper-half hyperbolic space and can be
rewritten using spherical polar coordinates as

\[ ds^2 = \frac{\ell^2}{R^2 \sin^2\chi} \left( dR^2 + R^2 d\chi^2 + R^2 \cos^2\chi\, d\theta^2 \right) \]

We have the identifications
$(R, \theta, \chi) \sim (R \exp\{2\pi r_+/\ell\},\ \theta + \{2\pi r_-/\ell\},\ \chi)$,
where $r_+$ and $r_-$ are the outer and inner horizon radii,
respectively. It is clear from this identification that
topologically the metric corresponds to a solid torus.
The functional integral over this manifold represents a
state in the Hilbert space specified by the mass and
angular momentum. It is the microcanonical ensemble
partition function and its logarithm is the entropy of
the black hole.

To evaluate this partition function, the connection
1-form is kept at a constant value on the toroidal
boundary through a gauge transformation. We
define local coordinates on the torus boundary
$z = x + \tau y$ such that $\oint_a dz = 1$,
$\oint_b dz = \tau$, where $a$ ($b$) stands for the
contractible (noncontractible) cycle of the solid torus
and $\tau = \tau_1 + i\tau_2$ is the modular
parameter of the boundary torus. Then the connection
describing the black hole is

\[ A = \left( \frac{-i\pi}{\tau_2}\, \tilde{u}\, dz + \frac{i\pi}{\tau_2}\, u\, d\bar{z} \right) T^3 \qquad [15] \]

where $u$ and $\tilde{u}$ are canonically conjugate with
commutation relation $[\tilde{u}, u] = 2\tau_2/(\pi(k+2))$.
These are related to black hole parameters
through holonomies of the gauge field $A$ around the
a- and b-cycles (for a classical black hole solution
$\Theta = 2\pi$):


1 2n(r_ + ilr_|) 
so if aaea se ae 


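For a fixed value of the connection, namely $u$, the
functional integral is described by a state $\psi_0$ with no
Wilson line in the bulk. The states with a Wilson line
carrying spin $j/2$ are given by Labastida and
Ramallo:

\[ \psi_j(u, \tau) = \exp\left\{ \frac{\pi k}{4\tau_2}\, u^2 \right\} \chi_j(u, \tau) \]

where the Weyl-Kac characters for affine su(2) are

\[ \chi_j(u, \tau) = \frac{\Theta^{(k+2)}_{j+1}(u, \tau) - \Theta^{(k+2)}_{-j-1}(u, \tau)}{\Theta^{(2)}_{1}(u, \tau) - \Theta^{(2)}_{-1}(u, \tau)} \]

and the $\Theta$ functions are defined by

\[ \Theta^{(k)}_{\mu}(u, \tau) = \sum_{n \in \mathbb{Z}} \exp\left\{ 2\pi i k \left[ \left( n + \frac{\mu}{2k} \right)^2 \tau + \left( n + \frac{\mu}{2k} \right) u \right] \right\} \]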

neZ 


Given the collection of states $\psi_j$, we write the
partition function by choosing an appropriate
ensemble for fixed mass and angular momentum.
This black hole partition function is:

\[ Z_{BH} = \int d\mu(\tau, \bar{\tau}) \left| \sum_{j=0}^{k} \overline{\psi_j(0, \tau)}\, \psi_j(u, \tau) \right|^2 \]

where the modular invariant measure is
$d\mu(\tau, \bar{\tau}) = d\tau\, d\bar{\tau}/\tau_2^2$.
This integral can be worked out for large
black hole mass and zero angular momentum in the
saddle-point approximation. The computation yields
(Govindarajan et al. 2001):

\[ Z_{BH} = \frac{\ell^2}{r_+^2} \sqrt{\frac{8 r_+ G}{\pi \ell^2}}\, \exp\left( \frac{2\pi r_+}{4G} \right) + \cdots \]


This gives not only the leading Bekenstein-Hawking
behavior of the black hole entropy $S$ but also a
subleading logarithmic term:

\[ S = \ln Z_{BH} = \frac{2\pi r_+}{4G} - \frac{3}{2} \ln \frac{2\pi r_+}{4G} + \cdots \]
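
To get a feeling for the size of the logarithmic term, the two leading contributions to the entropy, $S \approx A - (3/2) \ln A$ with $A = 2\pi r_+/(4G)$, can be evaluated directly. This is a minimal sketch under the assumption that the omitted terms are dropped and that $r_+$ and $G$ are given in the same units; the function name is hypothetical.

```python
import math


def btz_entropy(r_plus: float, newton_g: float) -> float:
    """Leading entropy of the BTZ black hole with its logarithmic correction.

    Uses S = A - (3/2) ln A, where A = 2*pi*r_plus / (4*G) is the
    Bekenstein-Hawking term; higher-order terms are ignored.
    """
    area_term = 2.0 * math.pi * r_plus / (4.0 * newton_g)
    if area_term <= 0.0:
        raise ValueError("the horizon radius and G must be positive")
    return area_term - 1.5 * math.log(area_term)
```

The correction grows only logarithmically, so its size relative to the Bekenstein-Hawking term shrinks as the horizon radius increases, which is the regime in which the saddle-point evaluation is used.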








This is an interesting application of CS theory to 
3D gravity. In fact, three-dimensional CS theory also 
has applications in the study of black holes in four- 
dimensional gravity: the boundary degrees of free- 
dom of a black hole in 4D are also described by an 
SU(2) CS theory. This allows a calculation of the 
degrees of freedom of, for example, Schwarzschild 
black hole. For large area black holes, this in turn 
results in an expression for the entropy which, besides 
a Bekenstein-—Hawking area term, has a logarithmic 
area correction with same coefficient —3/2 as above. 
This suggests a universal, dimension-independent
nature of these logarithmic corrections.


See also: BF Theories; The Jones Polynomial; Knot 
Theory and Physics; Large-N and Topological Strings; 
Quantum 3-Manifold Invariants; Topological Quantum 
Field Theory: Overview. 
Introduction


Gauge theory is the cornerstone of the standard 
model of elementary particles. The original motiva- 
tion for studying supersymmetric gauge theories was 
phenomenological (such as the hierarchy problem). 
They display a large number of interesting phenom- 
ena and become the models for the dynamics of 
strongly coupled field theories. They also offer 
valuable insights to nonsupersymmetric models. In 
N=1 gauge theory, the low-energy effective super- 
potential is holomorphic both in the superfields and 
in the coupling constants. This powerful holomor- 
phy principle, together with symmetry and various 
limits, often determines the effective superpotential 
completely. Such theories often have quantum 
moduli spaces where the classical singularities are 
smoothed out, continuous interpolation between 
Higgs and confinement phases, massless composite 
mesons and baryons, and dual theories weakly 
coupled at low energy. For N =2 pure gauge theory, 
the low-energy effective theory is an abelian gauge 
theory in which both the kinetic term and the 
coupling constant are determined by a holomorphic 
prepotential. The electric-magnetic duality is in the 
ambiguity of the low-energy description. Much 
physical information, such as the coupling constant, 
the Kähler metric on the quantum moduli, the
monodromy around the singularities, can be incor- 
porated in a family of elliptic curves. This low- 
energy exact solution is also useful to topological 
field theory that can be obtained from the N=2 
theory by twisting. Much of the above was the work 
of Seiberg and Witten in the mid-1990s. In this 
article, we review some of the fascinating aspects of 
N=1 and N=2 supersymmetric gauge theories. 


Seiberg-Witten Theory 503 


Schwarz AS (1987) New Topological Invariants in the Theory of
Quantized Fields. Abstracts in the Proceedings of International
Topological Conference, Baku, Part II.

Witten E (1988) Topological quantum field theory. Communications
in Mathematical Physics 117: 353-386.

Witten E (1989) Quantum field theory and the Jones polynomial.
Communications in Mathematical Physics 121: 351-399.

Witten E (1995) Chern-Simons gauge theory as a string theory.
Progress in Mathematics 133: 637-678.


N =1 Gauge Theory and Seiberg Dualities 
N=1 Yang-Mills Theory and QCD 


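Let $G$ be a compact Lie group and let $P$ be a principal
$G$-bundle over the Minkowski space $\mathbb{R}^{3,1}$. In pure
gauge theory, the dynamical variable is a connection $A$
in $P$; two connections are equivalent if they are related
by a gauge transformation. Let
$F \in \Omega^2(\mathbb{R}^{3,1}, \mathrm{ad}\, P)$ be
the curvature of $A$. It decomposes into the self-dual and
anti-self-dual parts, that is, $F = F^+ + F^-$, where
$F^\pm = (1/2)(F \pm \sqrt{-1} * F)$. With a suitably normalized
nondegenerate bilinear form $\langle \cdot, \cdot \rangle$ on the
Lie algebra $\mathfrak{g}$, the classical action is

\[ S_{YM}[A] = \int \left( -\frac{1}{2g^2} \langle F \wedge * F \rangle + \frac{\theta}{16\pi^2} \langle F \wedge F \rangle \right) = \int \left( \frac{\tau}{8\pi} \langle F^+ \wedge F^+ \rangle + \frac{\bar{\tau}}{8\pi} \langle F^- \wedge F^- \rangle \right) \]

Here $g > 0$ is the coupling constant and $\theta \in \mathbb{R}$,
the $\theta$ angle, and

\[ \tau = \frac{\theta}{2\pi} + \frac{4\pi \sqrt{-1}}{g^2} \]

is a complex number in the upper-half plane that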
incorporates both. Classically, the theory is conformally
invariant and the dynamics is independent
of the $\theta$-term. At the quantum level, $\theta$ (mod $2\pi$)
appears in the path integral and parametrizes
inequivalent vacua. The coupling constant runs as the
energy $\mu$ varies, satisfying the renormalization group
equation

\[ \mu \frac{dg}{d\mu} = -\frac{b_0}{(4\pi)^2}\, g^3 + o(g^5) \]

where the right-hand side is called the $\beta$-function
$\beta(g)$. This introduces, when $b_0 \neq 0$, a mass scale
$\Lambda$ given by

\[ (\Lambda/\mu)^{b_0} = e^{-8\pi^2/g(\mu)^2} \]

up to one-loop. Consequently, the classical scale
invariance is lost. It is convenient to redefine $\Lambda$ as a
complex quantity such that

\[ (\Lambda/\mu)^{b_0} = e^{2\pi \sqrt{-1}\, \tau} \]
For pure gauge theory, $b_0 = (11/3)h$, where $h$ is the
dual Coxeter number of $\mathfrak{g}$. At high energy
($\mu \to \infty$), the coupling becomes weak ($g \to 0$);
this is known as asymptotic freedom. By contrast, the
interaction becomes strong at low energy. It is believed that
the theory exhibits confinement and has a mass gap.
QCD, or quantum chromodynamics, is gauge
theory coupled to matter fields. Suppose the boson
$\phi$ and the fermion $\psi$ are in the (complex)
representations $R_b$ and $R_f$ of $G$, respectively. That is,
$\phi \in \Gamma(P \times_G R_b)$, or $\phi$ is a section of the
bundle $P \times_G R_b$, and
$\psi \in \Gamma(S \otimes (P \times_G R_f))$, where $S$ is the
spinor bundle over $\mathbb{R}^{3,1}$. The classical action is

\[ S_{QCD}[A, \phi, \psi] = S_{YM}[A] + \int d^4x \left( \frac{1}{2} |\nabla \phi|^2 + \sqrt{-1}\, \langle \bar{\psi}, \nabla\!\!\!\!/\, \psi \rangle + \cdots \right) \]

where $\nabla$ is the covariant derivative, $\nabla\!\!\!\!/$ is
the Dirac operator coupled to $A$, and we have omitted possible
mass and potential terms. The quantum theory
depends sensitively on the representations $R_b$ and
$R_f$. In the $\beta$-function, we have


where $\nu(R)$ is the Dynkin index of a representation
$R$. If $b_0 < 0$, the theory is free in the infrared but
strongly interacting in the ultraviolet. If $b_0 > 0$, the
converse is true; in particular, the theory exhibits
asymptotic freedom. If $b_0 = 0$, the situation depends
on the sign of the two- or higher-loop contributions.

Pure N=1 supersymmetric gauge theory is one on
the superspace $\mathbb{R}^{3,1|(2,2)}$ with a constraint that the
curvature vanishes in the odd directions. The
dynamical variables are in the superfield strength
$W$, a $1|(1,0)$-form valued in $\mathrm{ad}\, P$. In components,
the theory is a gauge field coupled to a Majorana or
Weyl fermion in the adjoint representation. Let $S^\pm$
be the spinor bundles of positive (negative) chiralities,
respectively, and let $\lambda$ be a section of
$S^+ \otimes \mathrm{ad}\, P$. The action, written both in
superspace and in ordinary spacetime, is


SNTMA, A] = m| [dtxdoriw, w)) 


4 \ wt 
+3 f xV—1(A, YA) 


= Sym [Al 


Since $b_0 = 3h$, the theory is asymptotically free but
strongly coupled at low energy. Classically, the
theory has a U(1)$_R$ chiral symmetry. However, due
to the anomaly, only the subgroup $\mathbb{Z}_{2h}$ survives at
the quantum level. Instanton effects yield gaugino
condensation $\langle \lambda\lambda \rangle \sim \Lambda^3$. The
symmetry is thus further broken to $\mathbb{Z}_2$, resulting in
$h$ inequivalent vacua.

The N=1 QCD has additional chiral superfields
$\Phi$ in a representation $R$, including the bosons
$\phi \in \Gamma(P \times_G R)$ and the fermions
$\psi \in \Gamma(S^+ \otimes (P \times_G R))$.
In the absence of superpotential, the action is


Seym l4; A| 
+5 dx d0 d0 taj 
§ 


SSocDIA, A, 4, Y] = 


In components, the second term is 
1 x 
a J (Ev + VT -DP +) 


where $D: R \to \mathfrak{g}^*$ is the moment map of the
Hamiltonian $G$-action on $R$, and we have omitted
other terms containing fermionic fields. The
moduli space of classical vacua is the symplectic
quotient $D^{-1}(0)/G = R/\!/G$. It is the same as the
Kähler quotient $R^s/G^{\mathbb{C}}$, where the stable subset
$R^s = \{\phi \in R \mid G^{\mathbb{C}} \cdot \phi \cap D^{-1}(0) \neq \emptyset\}$
is open and dense in $R$. Again, the quantum theory depends
on the representation $R$. Since $b_0 = 3h - (1/2)\nu(R)$, the
theory is asymptotically free, infrared free, or scale
invariant (to one-loop) when $\nu(R) < 6h$, $\nu(R) > 6h$,
$\nu(R) = 6h$, respectively. The moduli space may be lifted by a
superpotential or modified by other quantum effects.
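
The one-loop classification just described depends only on the sign of $b_0 = 3h - (1/2)\nu(R)$, which the following sketch evaluates with exact rational arithmetic. The function names are hypothetical. For the SU($N_c$) example, the sketch assumes a normalization in which the fundamental and the conjugate representation each have Dynkin index 1, so that $N_f$ quarks and $N_f$ antiquarks give $\nu(R) = 2N_f$ and $b_0 = 3N_c - N_f$.

```python
from fractions import Fraction


def one_loop_b0(dual_coxeter: int, dynkin_index) -> Fraction:
    """b0 = 3h - nu(R)/2 for N=1 QCD with chiral matter in representation R."""
    return 3 * Fraction(dual_coxeter) - Fraction(dynkin_index) / 2


def classify_one_loop(dual_coxeter: int, dynkin_index) -> str:
    """Classify the theory by the sign of the one-loop coefficient b0."""
    b0 = one_loop_b0(dual_coxeter, dynkin_index)
    if b0 > 0:
        return "asymptotically free"
    if b0 < 0:
        return "infrared free"
    return "scale invariant (to one-loop)"


def su_n_sqcd_b0(n_colors: int, n_flavors: int) -> Fraction:
    """SU(Nc) with Nf quarks and Nf antiquarks.

    Assumption: h = Nc and each (anti)fundamental chiral multiplet has
    Dynkin index 1, so nu(R) = 2 * Nf.
    """
    return one_loop_b0(n_colors, 2 * n_flavors)
```

For example, SU(3) with four flavors has $b_0 = 5$ and is asymptotically free, while the boundary case $\nu(R) = 6h$ is reached at $N_f = 3N_c$.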


SU(N.) Theories at Low Energy 


We now consider N=1 QCD with $G = SU(N_c)$; $N_c$
is the number of colors. The matter field consists of
$N_f$ copies of quarks $Q^i$ ($1 \le i \le N_f$) in the
fundamental representation of $SU(N_c)$ and $N_f$ copies of
antiquarks $\tilde{Q}_{\tilde\imath}$ ($1 \le \tilde\imath \le N_f$)
in the conjugate representation. Using the isomorphism of
$su(N_c)$ with its dual, the moment map is

\[ D(Q, \tilde{Q}) = \text{traceless part of } \sqrt{-1}\, (Q Q^\dagger - \tilde{Q}^\dagger \tilde{Q}) \]

So $(Q, \tilde{Q}) \in D^{-1}(0)$ if and only if
$Q Q^\dagger - \tilde{Q}^\dagger \tilde{Q} = c\, I_{N_c}$
for some $c \in \mathbb{R}$. If $N_f < N_c$, then $c = 0$ and

\[ Q = \tilde{Q}^\dagger \sim \begin{pmatrix} \mathrm{diag}(a_1, \ldots, a_{N_f}) \\ 0 \end{pmatrix} \]

for some $a_i \ge 0$. Generically, these $a_i > 0$ and the
gauge group $SU(N_c)$ is broken to $SU(N_c - N_f)$. If
$N_f \ge N_c$, then

\[ Q \sim \begin{pmatrix} \mathrm{diag}(a_1, \ldots, a_{N_c}) & 0 \end{pmatrix}, \qquad \tilde{Q}^\dagger \sim \begin{pmatrix} \mathrm{diag}(\tilde{a}_1, \ldots, \tilde{a}_{N_c}) & 0 \end{pmatrix} \]

where $a_i, \tilde{a}_i \ge 0$ satisfy $a_i^2 - \tilde{a}_i^2 = c$
for some $c \in \mathbb{R}$. The gauge group is completely broken.
The low-energy superfields are the mesons
$M^i_{\tilde\imath} = Q^i \tilde{Q}_{\tilde\imath}$ and, if
$N_f \ge N_c$, the baryons $B$ and $\tilde{B}$.

When $N_f < N_c$, Affleck et al. (1984) found a
dynamically generated superpotential

\[ W_{\mathrm{eff}}(M) = (N_c - N_f) \left( \frac{\Lambda^{3N_c - N_f}}{\det M} \right)^{1/(N_c - N_f)} \]

generated by instanton effects when $N_f = N_c - 1$ and by
gaugino condensation in the unbroken $SU(N_c - N_f)$
theory when $N_f < N_c - 1$. It is also the unique
superpotential (up to a multiplicative constant) that is
consistent with the global symmetry and supersymmetry. The
potential pushes the vacuum to infinity. Therefore,
contrary to the classical picture, theories with $N_f < N_c$
do not have a vacuum at the quantum level.

When $N_f \ge 3N_c$, the theory is not strongly interacting
at low energy, and perturbation methods are
reliable. (When $N_f = 3N_c$, the two-loop contribution
to the $\beta$-function is negative.) We now look at the
range $N_c \le N_f < 3N_c$. The cases $N_f = N_c, N_c + 1$
and $N_c + 2 \le N_f < 3N_c$ were studied in Seiberg
(1994) and Seiberg (1995), respectively.

When $N_f = N_c$, the classical moduli space is
$\det M = B\tilde{B}$. The quantum theory at low energy
consists of the fields $M$, $B$, $\tilde{B}$ satisfying the
constraint $\det M - B\tilde{B} = \Lambda^{2N_c}$. The quantum
moduli space is smooth everywhere, and there are no
additional massless particles. So the gluons are
heavy throughout the moduli space. This is due to
confinement near the origin, where the interaction is
strong, and due to the Higgs mechanism far out in
the flat direction, where the classical picture is a
good approximation. We see a smooth transition
between these two effects.

When $N_f = N_c + 1$, there is a dynamically generated
superpotential

\[ W_{\mathrm{eff}} = \frac{1}{\Lambda^{2N_c - 1}} \left( \tilde{B} M B - \det M \right) \]

The stationary points of $W_{\mathrm{eff}}$ are at
$B\tilde{B} - \wedge^{N_c} M = 0$, $BM = 0$, $M\tilde{B} = 0$;
these are precisely the constraints
that the classical configuration satisfies. However,
the moduli space is interpreted differently: it is
embedded into a larger space, and the constraints
are satisfied only at stationary points. At the
singularity $\langle M \rangle = 0$, the whole global symmetry
group is unbroken, and $B$, $\tilde{B}$ are the new massless
fields resolving the singularity. So we have a
continuous transition between confinement (without
chiral symmetry breaking) and the Higgs mechanism
in the semiclassical regime.

When $N_c + 2 \le N_f \le (3/2)N_c$, the original theory,
called the electric theory, is still strongly coupled in
the infrared. Seiberg (1995) proposed that there is a
dual, magnetic theory, which is infrared free. The
two theories are different classically, but are
equivalent at the quantum level. The dual theory
is an N=1 $SU(\tilde{N}_c)$ gauge theory with
$\tilde{N}_c = N_f - N_c$, coupled to dual quarks $q_i$,
$\tilde{q}^{\tilde\imath}$, where $1 \le i, \tilde\imath \le N_f$
are flavor indices. In addition, the mesons
$M^i_{\tilde\imath}$ become fundamental fields. They are not
coupled to the $SU(\tilde{N}_c)$ gauge field but interact with
the dual quarks through the superpotential

\[ W = \mu^{-1} M^i_{\tilde\imath}\, q_i\, \tilde{q}^{\tilde\imath} \]

The two theories have the same global symmetry
and the same gauge-invariant operators. The dual
quarks are fundamental in the magnetic theory but
are solitonic excitations in the electric theory. At
high energy, the electric theory is asymptotically
free, while the magnetic theory is strongly coupled.
At low energies, the converse is true. Therefore,
reliable perturbative calculations can be performed
by choosing an appropriate weakly coupled
theory.

When $(3/2)N_c < N_f < 3N_c$, the theory has a
nontrivial infrared fixed point. This is because, up
to two loops,

\[
\beta(g) = -\frac{g^3}{16\pi^2}\,(3N_c - N_f)
 + \frac{g^5}{128\pi^4}\Bigl(2N_cN_f - 3N_c^2 - \frac{N_f}{N_c}\Bigr) + O(g^7).
\]

There is a solution $g_* > 0$ to $\beta(g) = 0$. We have
$\beta(g) < 0$ when $0 < g < g_*$ and $\beta(g) > 0$ when $g > g_*$. In
the infrared limit, the coupling constant flows to
$g = g_*$, where we have a nontrivial, interacting
superconformal theory in four dimensions. The
conformal dimension becomes anomalous and is
equal to 3/2 of the charge of the chiral $U(1)_R$; for
example, that of the meson $M$ is
$3(N_f - N_c)/N_f > 1$ in this range.
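
The two-loop statement above can be explored numerically. The following is a minimal sketch, assuming only the truncated two-loop β-function quoted in the text; the function names are illustrative and not part of the article, and the truncated fixed point is only expected to be quantitatively meaningful when $N_f$ is close to $3N_c$.

```python
from fractions import Fraction
import math


def beta_coefficients(n_c, n_f):
    """Coefficients (b0, c2) in
    beta(g) = -g^3 * b0 / (16 pi^2) + g^5 * c2 / (128 pi^4) + O(g^7)."""
    b0 = Fraction(3 * n_c - n_f)
    c2 = Fraction(2 * n_c * n_f - 3 * n_c**2) - Fraction(n_f, n_c)
    return b0, c2


def two_loop_fixed_point(n_c, n_f):
    """Return g_*^2 solving the truncated equation beta(g) = 0,
    or None when the truncation has no positive root."""
    b0, c2 = beta_coefficients(n_c, n_f)
    if b0 <= 0 or c2 <= 0:
        return None
    # -b0/(16 pi^2) + g^2 * c2/(128 pi^4) = 0  =>  g^2 = 8 pi^2 b0 / c2
    return 8 * math.pi**2 * float(b0 / c2)


def meson_dimension(n_c, n_f):
    """Conformal dimension 3 (N_f - N_c) / N_f of the meson M."""
    return Fraction(3 * (n_f - n_c), n_f)


if __name__ == "__main__":
    print(beta_coefficients(3, 8))     # one- and two-loop coefficients
    print(two_loop_fixed_point(3, 8))  # truncated estimate of g_*^2
    print(meson_dimension(3, 8))       # dimension of M inside the window
```

For $N_c = 3$, $N_f = 8$ the one-loop coefficient is small and positive while the two-loop coefficient is positive, which is the sign pattern that produces a zero of β at small coupling.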
Other Classical Gauge Groups


We now consider $N = 1$ supersymmetric gauge
theory and QCD with gauge groups $Sp(N_c)$ and
$SO(N_c)$. The $Sp(N_c)$ theories, studied by Intriligator
and Pouliot (1995), are the simplest examples of
the $N = 1$ theories. We take $2N_f$ chiral superfields
$Q_i$ $(i = 1, \dots, 2N_f)$ in the fundamental representation
$\mathbb{C}^{2N_c} \cong \mathbb{H}^{N_c}$ of $Sp(N_c)$. The number of copies must
be even so that the quantum theory is free from
global gauge anomaly. The gauge-invariant quantities
are the mesons $M_{ij} = Q_i^a Q_j^b \omega_{ab}$, where $\omega$ is
the symplectic form on $\mathbb{C}^{2N_c}$, subject to the constraint
$\epsilon^{i_1 \cdots i_{2N_f}} M_{i_1 i_2} \cdots M_{i_{2N_c+1} i_{2N_c+2}} = 0$. Using the
decomposition $\mathfrak{u}(2N_c) = \mathfrak{sp}(N_c) \oplus \sqrt{-1}\{\mathbb{H}\text{-self-adjoint matrices}\}$,
the moment map $D(Q)$ is the projection of
$\sqrt{-1}\,QQ^\dagger$ on $\mathfrak{sp}(N_c)$. So $D(Q) = 0$ implies

\[
Q = \begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_{\min\{N_c, N_f\}} \end{pmatrix}
\otimes \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\]

where $a_i \ge 0$. At a generic point of the classical
moduli space, the gauge group is broken to $Sp(N_c - N_f)$
if $N_c > N_f$; it is completely broken if $N_c \le N_f$.
Since $b_0 = 3(N_c + 1) - N_f$, the quantum theory is
infrared free if $N_f \ge 3(N_c + 1)$. (When $b_0 = 0$, the
two-loop β-function is negative.) When $N_f \le N_c$,
there is a dynamically generated superpotential

\[
W_{\rm eff} = (N_c + 1 - N_f)
\left( \frac{2^{N_c-1}\,\Lambda^{3(N_c+1)-N_f}}{\operatorname{Pf} M} \right)^{1/(N_c+1-N_f)}
\]

pushing the vacuum to infinity.

When $N_f = N_c + 1$, the classical moduli space $\operatorname{Pf} M = 0$
has singularities. The quantum moduli space is
$\operatorname{Pf} M = 2^{N_c-1}\Lambda^{2(N_c+1)}$. The singularity is smoothed
out and there are no light fields other than the
mesons $M$. When $N_f = N_c + 2$, all components of $M$
become dynamical in the low-energy theory, and
there is a superpotential

\[
W_{\rm eff} = -\frac{\operatorname{Pf} M}{2^{N_c-1}\,\Lambda^{2N_c+1}}.
\]

At the most singular point $\langle M\rangle = 0$, the global
symmetry is unbroken, and all the light fields in $M$
become massless. In both cases, there is a transition
between confinement and the Higgs mechanism.

When $N_c + 3 \le N_f \le (3/2)(N_c + 1)$, there is a
dual, magnetic theory which is free in the infrared.
The dual theory has $2N_f$ quarks $q^i$ in the fundamental
representation of $Sp(\tilde N_c)$, where
$\tilde N_c = N_f - N_c - 2$. In addition, the mesons $M_{ij}$ become
elementary and couple to $q$ through a superpotential
$W = (2\mu)^{-1} M_{ij}\, q^i_a q^j_b\, \tilde\omega^{ab}$, where $\tilde\omega$ is the symplectic
form on $\mathbb{C}^{2\tilde N_c}$. When $(3/2)(N_c + 1) < N_f < 3(N_c + 1)$,
the theory flows to an interacting superconformal field
theory in the infrared.

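Theories with the $SO(N_c)$ gauge group were
studied by Seiberg (1995) and by Intriligator and
Seiberg (1995). Since the fundamental representation
is real, there is no constraint on the number $N_f$
of quarks $Q^i$ $(1 \le i \le N_f)$. The gauge invariants are
the mesons $M^{ij} = Q^i_a Q^j_a$ and, if $N_f \ge N_c$, the
baryons $B^{i_1 \cdots i_{N_c}} = Q^{i_1} \wedge \cdots \wedge Q^{i_{N_c}}$. They
satisfy $\operatorname{rank} M \le N_c$ and ${}^t\!BB = \wedge^{N_c} M$. Using the
decomposition $\mathfrak{u}(N_c) = \mathfrak{so}(N_c) \oplus \sqrt{-1}\{\mathbb{R}\text{-self-adjoint matrices}\}$,
the moment map $D(Q)$ is the projection of
$\sqrt{-1}\,QQ^\dagger$ on $\mathfrak{so}(N_c)$. If $D(Q) = 0$, then up to gauge
and global symmetries, $Q$ is of the form

\[
Q = \begin{pmatrix} a_1 & & & \\ & \ddots & & \\ & & a_r & \\ & & & 0 \end{pmatrix},
\]

where $a_1, \dots, a_r > 0$ if $r = \operatorname{rank} Q < N_c$, and
$a_1, \dots, a_{N_c-1} > 0$ and $a_{N_c} \ne 0$ if $r = N_c$. Generically,
the gauge group is broken to $SO(N_c - N_f)$ if
$N_c \ge N_f + 2$ and is totally broken if $N_c < N_f + 2$.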

We have $b_0 = 3(N_c - 2) - N_f$ if $N_c \ge 5$. For
$N_c = 4$, the group is $(SU(2) \times SU(2))/\mathbb{Z}_2$ and
$b_0 = 6 - N_f$ for each $SU(2)$ factor. If $N_c = 3$, the
group is $SU(2)/\mathbb{Z}_2$ and $b_0 = 6 - 2N_f$. The theory is
asymptotically free if $N_f < 3(N_c - 2)$ and infrared
free if $N_f \ge 3(N_c - 2)$.
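
The case distinction for $b_0$ can be written out explicitly. The sketch below only encodes the three formulas quoted above (the function name is illustrative) and checks that, despite the different normalizations for $N_c = 3, 4$, the sign of $b_0$ always changes at $N_f = 3(N_c - 2)$.

```python
def so_one_loop_b0(n_c, n_f):
    """One-loop coefficient b0 for SO(N_c) with N_f vector flavors,
    using the three cases quoted in the text."""
    if n_c >= 5:
        return 3 * (n_c - 2) - n_f
    if n_c == 4:
        return 6 - n_f          # per SU(2) factor of (SU(2) x SU(2))/Z_2
    if n_c == 3:
        return 6 - 2 * n_f      # SU(2)/Z_2 normalization
    raise ValueError("expected N_c >= 3")


def is_asymptotically_free(n_c, n_f):
    """Asymptotic freedom corresponds to b0 > 0."""
    return so_one_loop_b0(n_c, n_f) > 0


if __name__ == "__main__":
    for n_c in (3, 4, 5, 6):
        boundary = 3 * (n_c - 2)
        print(n_c, [is_asymptotically_free(n_c, n_f)
                    for n_f in range(boundary - 1, boundary + 2)])
```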

When $N_f \le N_c - 5$, there is a dynamically generated
superpotential

\[
W_{\rm eff} = \tfrac{1}{2}\,(N_c - 2 - N_f)
\left( \frac{16\,\Lambda^{3(N_c-2)-N_f}}{\det M} \right)^{1/(N_c-2-N_f)}
\]

lifting the classical vacuum degeneracy. The coefficient
is fixed by mass deformation and by matching
the $SU(4)$ theory when $N_c = 6$.

When $N_f = N_c - 4$, the unbroken gauge group is
$SO(4) \cong (SU(2) \times SU(2))/\mathbb{Z}_2$ at a generic point of
the moduli space. The superpotential of the original
theory is

\[
W_{\rm eff} = \tfrac{1}{2}\,(\epsilon_+ + \epsilon_-)
\left( \frac{16\,\Lambda^{2(N_c-1)}}{\det M} \right)^{1/2},
\]

where the choices $\epsilon_+, \epsilon_- = \pm 1$ correspond to the fact
that each of the $SU(2)$ theories has two vacua. There
are two physically inequivalent branches: $\epsilon_+ = \epsilon_-$
and $\epsilon_+ = -\epsilon_-$. For $\epsilon_+ = \epsilon_-$, the superpotential pushes
the vacuum to infinity. For $\epsilon_+ = -\epsilon_-$, $W_{\rm eff} = 0$. In the
quantum theory, the singularity is smoothed out and
all the massless fermions are in $M$, even at the origin
of the moduli space. Hence the quarks are confined.

When $N_f = N_c - 3$, the unbroken gauge group is
$SO(3)$ and the theory has two branches with

\[
W_{\rm eff} = 4\,(1 + \epsilon)\,\frac{\Lambda^{2N_c-3}}{\det M},
\]

where $\epsilon = \pm 1$. For $\epsilon = 1$, the quantum theory has no
vacuum. For $\epsilon = -1$, $W_{\rm eff} = 0$, but there are additional
light fields $q_i$ coupling to $M$ via the superpotential
$W \sim (2\mu)^{-1} M^{ij} q_i q_j$ near $M = 0$.

When $N_f = N_c - 2$, the low-energy theory is related
to the $N = 2$ gauge theory and will be addressed in the
subsection "Seiberg-Witten's low-energy solution."

When $N_f \ge N_c - 1$, we define a dual, magnetic
theory whose gauge group is $SO(\tilde N_c)$, where
$\tilde N_c = N_f - N_c + 4$. There are $N_f$ dual quarks $q_i$
$(1 \le i \le N_f)$ in the fundamental representation. This
theory is infrared free if $N_f \le (3/2)(N_c - 2)$. In the
effective theory, the mesons $M^{ij}$ become fundamental
and couple with the dual quarks through a
superpotential $W = (2\mu)^{-1} M^{ij} q_i q_j$ if $N_f \ge N_c$; there
is an additional term $\det M/(64\Lambda^{2N_c-5})$ if $N_f = N_c - 1$.
When $(3/2)(N_c - 2) < N_f < 3(N_c - 2)$, the theory
flows to an interacting superconformal field theory in
the infrared.
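
The dual ranks quoted for the three families of gauge groups can be cross-checked with a short script. This is only an illustrative sketch (all names are hypothetical): it uses the one-loop coefficients $b_0 = 3(N + s) - N_f$ with $s = 0, 1, -2$ for SU, Sp, SO as stated in the text, applied uniformly to the magnetic gauge group, and verifies two consistency properties: dualizing twice returns the original rank, and the magnetic theory is infrared free exactly when $N_f$ lies at or below the upper edge of the free magnetic range.

```python
from fractions import Fraction

# b0 = 3 * (N + SHIFT[group]) - N_f, as quoted in the text for each family.
SHIFT = {"SU": 0, "Sp": 1, "SO": -2}
# Dual rank = N_f - N_c + DUAL_OFFSET[group].
DUAL_OFFSET = {"SU": 0, "Sp": -2, "SO": 4}


def dual_rank(group, n_c, n_f):
    """Rank parameter of the magnetic gauge group."""
    return n_f - n_c + DUAL_OFFSET[group]


def b0(group, n, n_f):
    """One-loop beta-function coefficient (sign convention: b0 > 0 is
    asymptotically free)."""
    return 3 * (n + SHIFT[group]) - n_f


def magnetic_is_infrared_free(group, n_c, n_f):
    """True when the dual theory has b0 <= 0."""
    return b0(group, dual_rank(group, n_c, n_f), n_f) <= 0


def free_magnetic_upper_edge(group, n_c):
    """Upper end (3/2)(N_c + shift) of the free magnetic range."""
    return Fraction(3, 2) * (n_c + SHIFT[group])


if __name__ == "__main__":
    for group in ("SU", "Sp", "SO"):
        n_c, n_f = 7, 11
        n_dual = dual_rank(group, n_c, n_f)
        print(group, n_dual, dual_rank(group, n_dual, n_f),
              magnetic_is_infrared_free(group, n_c, n_f))
```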


N =2 Gauge Theory and Seiberg-Witten 
Duality 


N=2 Yang-Mills Theory 


Pure $N = 2$ supersymmetric gauge theory is a special
case of $N = 1$ QCD when $R = \mathfrak{g}^{\mathbb{C}}$ is the (complexified)
adjoint representation of $G$. The moment map
is $D(\phi) = (1/2\sqrt{-1})[\phi, \bar\phi] \in \mathfrak{g} \cong \mathfrak{g}^*$ $(\phi \in \mathfrak{g}^{\mathbb{C}})$. Since the
fermionic fields $\lambda$ and $\psi$ are sections of the same
bundle, there is a second set of supersymmetry
transformations obtained by interchanging the roles of $\lambda$ and
$\psi$. This makes the theory $N = 2$ supersymmetric.
The classical action is


SNA; A, p, o] = Sym |A| 
1 fa _ 
ak WAYA 


+ (0, Pu) +5 1V9P 
+ V-1((6, A, Y] + (6, a) 
-zlip 41? 


The energy reaches the minimum when $\phi$ takes a
constant value $\phi \in \mathfrak{g}^{\mathbb{C}}$ that can be conjugated by $G$
to the Cartan subalgebra $\mathfrak{t}^{\mathbb{C}}$. ($\mathfrak{t}$ is the Lie algebra of
the maximal torus $T$.) The classical moduli space is
$\mathfrak{g}^{\mathbb{C}}/\!/G^{\mathbb{C}} = \mathfrak{t}^{\mathbb{C}}/W$, where $W$ is the Weyl group. At a
generic $\phi \in \mathfrak{t}^{\mathbb{C}}$, the gauge group is broken to $T$ by
the Higgs mechanism. Classically, the massless
degrees of freedom are excitations of $\phi$ and
components of the gauge field in $\mathfrak{t}$. So the low-energy
physics can be described by these massless
fields. However, the moduli space is singular when $\phi$
is on the walls of the Weyl chambers. At these
values, the unbroken gauge group is larger and there
are extra massless fields that resolve the
singularities.

Since $b_0 = 2h > 0$, the quantum theory is asymptotically
free but strongly interacting at low energy.
It can be shown that $N = 1$ supersymmetry already
forbids a dynamically generated superpotential on
$\mathfrak{t}^{\mathbb{C}}/W$. Therefore, the vacuum degeneracy is not
lifted and the quantum moduli space is still a
continuum. However, there are corrections to the
part of classical moduli space where strong interactions
occur. The quantum theory has a dynamically
generated mass scale $\Lambda$. We pick the renormalization
scale $\mu$ to be $|\phi|$, the typical energy scale where
spontaneous symmetry breaking occurs. Far away
from the origin, that is, when $|\phi| \gg |\Lambda|$, the theory is
weakly interacting and the classical description of
the moduli space is a good approximation. However,
when $|\phi|$ is comparable to $|\Lambda|$, the classical
language and perturbation methods fail due to
strong interaction. At $\phi = 0$, the full gauge symmetry
is restored classically. But since the theory becomes
strongly interacting at low energy, it cannot be the
low-energy solution of the original theory.

The classical $U(1)_R$ symmetry extends to $U(2)_R$,
mixing $\lambda$ and $\psi$. The $U(1)_R$ subgroup in $U(2)_R$ is
anomalous except for a subgroup $\mathbb{Z}_{4h}$. So we have a
global $SU(2)_R \times_{\mathbb{Z}_2} \mathbb{Z}_{4h}$ symmetry at the quantum
level. This is consistent with a continuous moduli
space of vacua, if the group $SU(2)_R$ is to act
nontrivially. Also, the space is not a single orbit of
the global symmetry group. The generator of $\mathbb{Z}_{4h}$
acts on $\mathfrak{t}^{\mathbb{C}}$ by a phase $e^{\pi\sqrt{-1}/h}$. The group $\mathbb{Z}_{4h}$ is
spontaneously broken to the subgroup which
acts trivially on $\mathfrak{t}^{\mathbb{C}}/W$.

We study the general form of the low-energy effective
Lagrangian that is consistent with $N = 2$ supersymmetry.
We assume that the quantum effect does
not modify the topology of the moduli space $\mathfrak{t}^{\mathbb{C}}/W$,
though it may alter the singularity and its nature.
Suppose $U$ is the quantum moduli space. At a generic
point in $U$, the residual gauge group is $T$. In the
$N = 1$ language, the theory is a supersymmetric
gauged sigma model with target space $U$. It contains
$N = 1$ vector multiplets $W^I$ and chiral multiplets $\Phi^I$,
where $1 \le I \le r$, $r = \dim T$ being the rank of $G$.
$N = 1$ supersymmetry requires that $U$ is Kähler, with
possible singularities where the effective theory
breaks down. $N = 2$ supersymmetry requires further
that $U$ is special Kähler, that is, there is a flat,
torsion-free connection $\nabla$ on $TU$ such that the
Kähler form $\omega$ is parallel and such that $d_\nabla J = 0$,
where the complex structure $J$ is viewed as a 1-form
valued in $TU$. See, for example, Freed (1999).
Locally, there is a holomorphic prepotential $\mathcal{F}$ and
special coordinates $\{z^I\}$. Let $\tilde z_I = \partial\mathcal{F}/\partial z^I$ be the dual
coordinates and let $\tau_{IJ} = \partial^2\mathcal{F}/\partial z^I \partial z^J = \partial\tilde z_I/\partial z^J$. Then
$K = \operatorname{Im}(\tilde z_I \bar z^I)$ is a Kähler potential and
$\omega = (\sqrt{-1}/2)\operatorname{Im}(\tau_{IJ})\, dz^I \wedge d\bar z^J$ is the Kähler form. The effective
action is


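\[
S_{\rm eff}[W, \Phi] = \frac{1}{4\pi}\left(
\operatorname{Im}\int d^4x\, d^2\theta\; \tfrac{1}{2}\,\tau_{IJ}(\Phi)\,\langle W^I, W^J\rangle
+ \int d^4x\, d^2\theta\, d^2\bar\theta\; K(\Phi) \right).
\]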


Note that both the coupling constants $\tau_{IJ}$ and the
metric $\operatorname{Im}\tau_{IJ}$ on $U$ are determined by a holomorphic
function $\mathcal{F}$, which is the hallmark of $N = 2$
supersymmetry.

In the bare theory with an abelian gauge group $T$, the
action is given by choosing $\mathcal{F}_0(\Phi) = (1/2)\tau_{IJ}\Phi^I\Phi^J$,
where the $\tau_{IJ}$ (and hence the metric $\operatorname{Im}\tau$) are
constants. Due to one-loop and instanton effects,
$\mathcal{F}$ is no longer quadratic in the effective theory.
Since $\tau$ varies on $U$, it cannot simultaneously be holomorphic
(except at a few singular points), single valued, and
have a positive-definite imaginary part. The
solution to this apparent contradiction is that each
set of special coordinates and the expression of $\mathcal{F}$ is
valid only in part of $U$. Solving the $N = 2$ gauge
theory at low energy means understanding the
singularity of $U$ in the strong coupling regime and
obtaining the explicit form of $\mathcal{F}$ or $\tau_{IJ}$ in various
regions of the moduli space.


Seiberg-Witten’s Low-Energy Solution 


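We consider $N = 2$ gauge theory with $G = SU(2)$.
The Cartan subalgebra is $\mathfrak{t}^{\mathbb{C}} \cong \mathbb{C}$; each $a \in \mathbb{C}$ determines
an element $\phi = (1/2)\begin{pmatrix} a & 0 \\ 0 & -a \end{pmatrix}$ in $\mathfrak{t}^{\mathbb{C}}$. The Weyl
group $W \cong \mathbb{Z}_2$ acts on $\mathbb{C}$ by $a \mapsto -a$. The moduli
space of classical vacua is the $u$-plane $\mathbb{C}/\mathbb{Z}_2$
parametrized by $u = \operatorname{tr}\phi^2 = (1/2)a^2$. When $u \ne 0$,
the gauge group is broken to $U(1)$. The generator
of $\mathbb{Z}_{4h} = \mathbb{Z}_8 \subset U(1)_R$ acts as $a \mapsto \sqrt{-1}\,a$, $u \mapsto -u$.
The $\mathbb{Z}_8$ symmetry is broken to $\mathbb{Z}_4$; the quotient
$\mathbb{Z}_2 = \mathbb{Z}_8/\mathbb{Z}_4$ acts on the $u$-plane by $u \mapsto -u$.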
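
As a small worked check of the last statements (a sketch using only $u = (1/2)a^2$ and the action $a \mapsto \sqrt{-1}\,a$ of the generator quoted above), one can follow the generator and its square on the $u$-plane:

```latex
\begin{aligned}
\text{generator:}\quad & a \mapsto \sqrt{-1}\,a
  \;\Longrightarrow\; u = \tfrac12 a^2 \mapsto \tfrac12\bigl(\sqrt{-1}\,a\bigr)^2 = -u,\\
\text{its square:}\quad & a \mapsto (\sqrt{-1})^2 a = -a
  \;\Longrightarrow\; u \mapsto \tfrac12(-a)^2 = u .
\end{aligned}
```

The square of the generator therefore acts on $a$ like the Weyl reflection and trivially on $u$; it generates the $\mathbb{Z}_4$ subgroup, and only the quotient $\mathbb{Z}_2$ is visible on the $u$-plane.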

Abelian gauge theory and $N = 4$ supersymmetric
gauge theory exhibit exact electric-magnetic duality
in the sense that the quantum theories are identical
if the coupling constant $\tau$ undergoes an $SL(2, \mathbb{Z})$
transformation. Seiberg and Witten (1994a, b)
proposed that this is so for the low-energy effective
theory of the $N = 2$ gauge theory. An $SL(2, \mathbb{Z})$
transformation maps one description of the low-energy
theory to another, exchanging electricity and
magnetism. It is, however, not an exact duality of the
full $SU(2)$ theory. Rather, duality is in the ambiguity
of the choice of the low-energy description. More
precisely, $\tau$ is a section of a flat $SL(2, \mathbb{Z})$ bundle over
$U$. Thus, $\tau$ is multivalued and exists as a function in
local charts only. So we must use different Lagrangians
in different regions of the $u$-plane. Around the
singularities where $\tau$ is not defined, nontrivial
monodromy can appear.

Away from infinity, the electric theory is strongly
interacting but the magnetic theory is infrared free.
The dual field is $\tilde a = d\mathcal{F}(a)/da$, and $\tau_{\rm eff}(u) = d\tilde a/da$.
The group $SL(2, \mathbb{Z})$ is generated by

\[
S = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad
T = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.
\]

To see its action on $\binom{\tilde a}{a}$, we use the central
extension of the $N = 2$ super-Poincaré algebra. In
the classical theory, the central charge is
$Z = (n_e + \tau n_m)a$ from the boundary terms at infinity. As the
electric-magnetic duality transformation $S$ interchanges
$n_e$ and $n_m$, we have for any $\gamma \in SL(2, \mathbb{Z})$,
$\gamma: (n_m, n_e) \mapsto (n_m, n_e)\gamma^{-1}$. When $n_m = 0$, the classical
formula $Z = n_e a$ is valid. Invariance of $Z$ under
$SL(2, \mathbb{Z})$ requires that $Z = n_m\tilde a + n_e a$ at the quantum
level and that $SL(2, \mathbb{Z})$ acts on $\binom{\tilde a}{a}$ homogeneously
as a column vector.

When $u = (1/2)a^2$ is large, perturbation theory is reliable.
The classical and one-loop results are
$a(u) \sim \sqrt{2u}$, $\tilde a \sim (\sqrt{-1}/\pi)\, a\log a^2$. As $u$ goes around infinity,
the fields transform as $a \mapsto -a$, $\tilde a \mapsto -\tilde a + 2a$. The
monodromy is $M_\infty = PT^{-2}$. The mass $M$ of a
monopole state is bounded by $M^2 = P^\mu P_\mu \ge |Z|^2$,
which is precisely the Bogomol'nyi bound. Now as a
consequence of the $N = 2$ supersymmetry, it receives
no quantum corrections as long as supersymmetry is
not broken at the quantum level. The states that
saturate the bound are the BPS states. The BPS
spectrum at $u \in U$ is a subset of $H_1(E_u, \mathbb{Z}) \cong \mathbb{Z}^2$
containing the pairs $(n_m, n_e)$ realized by the dyon
charges. Near infinity, the condition is that either
$n_e = \pm 1$, $n_m = 0$ (for $W^\pm$ particles) or $n_m = \pm 1$ (for
monopoles or dyons). This spectrum is invariant
under the monodromy $M_\infty$.
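
The monodromy at infinity can be verified directly from the one-loop expression. The following is a short worked check, assuming only the asymptotic formulas quoted above and that the monodromy acts on the column vector $(\tilde a, a)^T$; the identification with $PT^{-2}$ at the end additionally assumes $P = -1$ and $T = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$.

```latex
\begin{aligned}
u \mapsto e^{2\pi\sqrt{-1}}\,u :\qquad
a \sim \sqrt{2u} &\mapsto -a, \qquad
\log a^2 \mapsto \log a^2 + 2\pi\sqrt{-1},\\[2pt]
\tilde a \sim \frac{\sqrt{-1}}{\pi}\,a\log a^2
 &\mapsto \frac{\sqrt{-1}}{\pi}\,(-a)\bigl(\log a^2 + 2\pi\sqrt{-1}\bigr)
 = -\tilde a + 2a,\\[2pt]
\begin{pmatrix}\tilde a\\ a\end{pmatrix}
 &\mapsto
 \begin{pmatrix}-1 & 2\\ 0 & -1\end{pmatrix}
 \begin{pmatrix}\tilde a\\ a\end{pmatrix},
 \qquad
 \begin{pmatrix}-1 & 2\\ 0 & -1\end{pmatrix}
 = -\begin{pmatrix}1 & -2\\ 0 & 1\end{pmatrix}.
\end{aligned}
```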

The nontrivial holonomy at infinity implies the
existence of at least one singularity at a finite value
$u = u_0$, where extra particles become massless.
Seiberg and Witten (1994a, b) propose that these
particles are collective excitations in the perturbative
regime. Suppose along a path connecting $u_0$ and
some base point near infinity, a monopole of charges
$(\pm 1, n) = (0, 1)(T^{\mp n}S^{\pm 1})^{-1}$ becomes massless at $u_0$.
Then by the renormalization group analysis
and duality, the monodromy at $u_0$ is
$M_{u_0} = (T^{\mp n}S^{\pm 1})\,T^2\,(T^{\mp n}S^{\pm 1})^{-1}$. It turns out that there are two
singularities $u = \pm\Lambda^2$ with monodromies
$M_{\Lambda^2} = ST^2S^{-1}$ and $M_{-\Lambda^2} = (TS)T^2(TS)^{-1}$. The particles that
become massless at $\pm\Lambda^2$ are of charges $(n_m, n_e) = (1, 0)$
and $(1, -1)$, respectively. The only BPS states in the
strong coupling regime are those which become
massless at the singularities; the others decay as $u$
deforms towards strong interaction.
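
The monodromy matrices quoted above can be multiplied out explicitly. The sketch below is illustrative only: it assumes $S = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, $T = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ and $P = -1$ (the conjugates $ST^2S^{-1}$ and $(TS)T^2(TS)^{-1}$ do not depend on the overall sign chosen for $S$), and it lets charges act as row vectors $(n_m, n_e)$.

```python
def matmul(a, b):
    """Product of two 2x2 integer matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def inverse(m):
    """Inverse of a 2x2 integer matrix of determinant 1."""
    (a, b), (c, d) = m
    assert a * d - b * c == 1
    return ((d, -b), (-c, a))


def conjugate(g, m):
    """Return g m g^{-1}."""
    return matmul(matmul(g, m), inverse(g))


def row_times(v, m):
    """Row vector (n_m, n_e) times a 2x2 matrix."""
    return (v[0] * m[0][0] + v[1] * m[1][0], v[0] * m[0][1] + v[1] * m[1][1])


# Assumed conventions (see the lead-in).
P = ((-1, 0), (0, -1))
S = ((0, 1), (-1, 0))
T = ((1, 1), (0, 1))
T2 = matmul(T, T)

M_PLUS = conjugate(S, T2)               # monodromy at u = +Lambda^2
M_MINUS = conjugate(matmul(T, S), T2)   # monodromy at u = -Lambda^2
M_INF = matmul(P, inverse(T2))          # monodromy at infinity


def in_gamma_2(m):
    """True when m is congruent to the identity modulo 2."""
    return all((m[i][j] - (1 if i == j else 0)) % 2 == 0
               for i in range(2) for j in range(2))


if __name__ == "__main__":
    print(M_PLUS, M_MINUS, M_INF)
    print(matmul(M_PLUS, M_MINUS) == M_INF)
```

In these conventions the product of the two finite monodromies reproduces the monodromy at infinity, each matrix is congruent to the identity modulo 2, and the charge vectors $(1, 0)$ and $(1, -1)$ are left invariant by the monodromy at the singularity where the corresponding particle becomes massless.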

The monodromies $M_{\pm\Lambda^2}$, $M_\infty$ (or any two of
them) generate the subgroup $\Gamma(2)$. The family of
elliptic curves with these monodromies can be
identified with $y^2 = (x - \Lambda^2)(x + \Lambda^2)(x - u)$, called
the Seiberg-Witten curve. The singularities are at
$u = \pm\Lambda^2$ and $u = \infty$, where the curve degenerates.
Let

\[
\lambda = \frac{\sqrt{2}}{2\pi}\,\frac{y\,dx}{x^2 - \Lambda^4}
\]

be the Seiberg-Witten differential (of second kind on
the total space $E$). Then in a suitable basis $(\alpha, \beta)$ of
$H_1(E_u, \mathbb{Z})$, we have $a = \oint_\alpha \lambda$, $\tilde a = \oint_\beta \lambda$. At a
singularity, if $\nu = n_m\beta + n_e\alpha$ is a vanishing cycle,
then the dyon of charges $(n_m, n_e)$ becomes massless.
This is because its central charge is
$Z = n_m\tilde a + n_e a = \oint_\nu \lambda$. The monodromy at a singularity where $\nu$
is a vanishing cycle is given by the Picard-Lefschetz
formula $M: \gamma \mapsto \gamma - 2(\gamma\cdot\nu)\nu$. At $u = \pm\Lambda^2$, the vanishing
cycles are $\beta$ and $\beta - \alpha$, respectively.
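
The location of the finite singularities can be read off from the discriminant of the cubic. As a short worked check, using only the factorized form of the curve with roots $e_1 = \Lambda^2$, $e_2 = -\Lambda^2$, $e_3 = u$:

```latex
\Delta(u) \;=\; \prod_{i<j}(e_i - e_j)^2
 \;=\; (2\Lambda^2)^2\,(\Lambda^2 - u)^2\,(-\Lambda^2 - u)^2
 \;=\; 4\Lambda^4\,\bigl(u^2 - \Lambda^4\bigr)^2 .
```

The discriminant vanishes exactly at $u = \pm\Lambda^2$, where two branch points collide and a cycle of the curve shrinks to zero size.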

We return to the $N = 1$ $SO(N_c)$ gauge theory with
$N_f = N_c - 2$. At a generic point in the moduli space,
the gauge group is broken to $SO(2)$, which is
abelian. Much of the above discussion applies to
this case. By $N = 1$ supersymmetry, the effective
coupling $\tau_{\rm eff}$ is holomorphic in $M$ but is not single
valued. In fact, $\tau_{\rm eff}$ depends on $u = \det M$, which is
invariant under the (anomaly free) $SU(N_f)$ symmetry.
For large $u$, we have $e^{2\pi\sqrt{-1}\,\tau_{\rm eff}} \sim \Lambda^{4N_c-8}/u^2$ and
the monodromy around infinity is $M_\infty = PT^{-2}$.
On the other hand, a large expectation value
of $M$ of rank $N_c - 3$ breaks the gauge group to
$SO(3)$ and the theory is the $N = 2$ theory discussed
earlier. Using these facts, Intriligator and Seiberg
(1995) identified the family of elliptic curves as
$y^2 = x(x - 16\Lambda^{2N_c-4})(x - u)$. There are two singularities
with inequivalent physics. At $u = 0$, the monodromy
is $ST^2S^{-1}$. A pair of monopoles $q_i^\pm$ becomes
massless. They couple with $M$ through the superpotential
$W \sim (2\mu)^{-1} M^{ij} q_i^+ q_j^-$. At $u = 16\Lambda^{2N_c-4}$, the
monodromy is $(T^2S)T^2(T^2S)^{-1}$. A pair of dyons $E^\pm$ of
charges $\pm 1$ becomes massless. The effective superpotential is
$W_{\rm eff} \sim (u - 16\Lambda^{2N_c-4})\,E^+E^-$.

Topological gauge theory is a twisted version of
$N = 2$ Yang-Mills theory in which the observables
at high energy are the Donaldson invariants. The
work of Seiberg and Witten (1994a, b) yields new
insight into it and has had a tremendous impact on the
geometry of 4-manifolds. See Witten (1994) for the
initial steps.

After the work of Seiberg and Witten (1994a, b),
there has been much progress on theories with other
gauge groups. If the gauge group is a compact Lie
group of rank $r$, the $u$-plane is replaced by $\mathfrak{t}^{\mathbb{C}}/W$;
the singularities are modified by quantum effects.
The duality group is $Sp(2r, \mathbb{Z})$ or its subgroup of
finite index, acting on the coupling matrix $\tau = (\tau_{IJ})$
by fractional linear transformations. For example, for
$G = SU(N_c)$, the moduli space is parametrized by
gauge invariants $u_2, \dots, u_{N_c}$ defined by
$\det(xI - \phi) = x^{N_c} - \sum_{i=2}^{N_c} u_i x^{N_c-i} = P_{N_c}(x, u_i)$. Classically, the singular
locus is a simple singularity of type $A_{N_c-1}$. At
the quantum level, the singularity consists of two
copies of such locus shifted by $\pm\Lambda^{N_c}$ in the $u_{N_c}$
direction. The monodromies correspond to a family
of hyperelliptic curves $y^2 = P_{N_c}(x, u_i)^2 - \Lambda^{2N_c}$ of
genus $N_c - 1$. The Seiberg-Witten differential is

\[
\lambda = \frac{\sqrt{2}}{2\pi}\,\frac{\partial P_{N_c}(x, u_i)}{\partial x}\,\frac{x\,dx}{y} + d(\cdots).
\]

The $N_c - 1$ independent eigenvalues $a^I$ of $\phi$ and
their duals $\tilde a_I = \partial\mathcal{F}/\partial a^I$ are the periods of $\lambda$ along
the $2N_c - 2$ homology cycles in the curve. For more
details, the reader is referred to Klemm et al. (1995)
and Argyres and Faraggi (1995).


N=2 QCD 


$N = 2$ supersymmetric QCD is $N = 2$ Yang-Mills
theory coupled to $N = 2$ matter. The latter consists
of $N = 1$ superfields $Q$ that form a quaternionic
representation $R$ of the gauge group $G$. The space $R$
has a $G$-invariant hyper-Kähler structure. The
hyper-Kähler moment map $\mu_{\mathbb{H}}: R \to \mathfrak{g}^* \otimes \operatorname{Im}\mathbb{H}$ consists
of a real moment map $\mu_{\mathbb{R}}: R \to \mathfrak{g}^*$ for the
Kähler structure and a complex moment map
$\mu_{\mathbb{C}}: R \to (\mathfrak{g}^*)^{\mathbb{C}}$ for the holomorphic symplectic
structure. As an $N = 1$ theory, the matter superfields
are valued in $R \times \mathfrak{g}^{\mathbb{C}}$ with a D-term
$D(Q, \Phi) = \mu_{\mathbb{R}}(Q) + (1/2\sqrt{-1})[\Phi, \bar\Phi]$ and a superpotential
$W(Q, \Phi) = \sqrt{2}\,\langle\mu_{\mathbb{C}}(Q), \Phi\rangle + m(Q)$, where the mass
term $m$ is a $G$-invariant quadratic form on $R$. The
classical moduli space of vacua has two branches.
On the Coulomb branch where $Q = 0$ and $\Phi \ne 0$,
the unbroken gauge group is abelian and the
photons are massless. If $Q \ne 0$ exists in the flat
directions, the gauge group is broken according to
the value of $Q$; these are the Higgs branches. If
$m = 0$, the moduli space of classical vacua is the
hyper-Kähler quotient $\mu_{\mathbb{H}}^{-1}(0)/G$. The branches of
two types touch at the origin, where the full gauge
group is restored, and at other subvarieties in $R$. The
global symmetry is the subgroup of $U(R)$ that
commutes with the $G$-action on $R$ and preserves
$m$; it contains $U(2)_R$.

Quantum mechanically, such a theory is free
from local gauge anomalies. Consistency under large
gauge transformations puts a torsion condition on $R$,
such as $\nu(R) \equiv 0 \pmod 2$. Since $b_0 = 2h - (1/2)\nu(R)$,
the theory is asymptotically free if $\nu(R) < 4h$. If
$\nu(R) = 4h$, the quantum theory is scale invariant up
to one-loop (and hence to all loops), and is expected
to be so nonperturbatively. If $\nu(R) > 4h$, the quantum
theory may not be defined but it can be the low-energy
solution of another asymptotically free theory.
Due to the axial anomaly, the $U(2)_R$ global symmetry
reduces to the subgroup $SU(2)_R \times_{\mathbb{Z}_2} \mathbb{Z}_{4h-\nu(R)}$. The
metric on the Coulomb branch can be corrected by
quantum effects, but those on the Higgs branches do
not change because of the uniqueness of the hyper-Kähler
metric. In the quantum theory, the Higgs
branches still touch the Coulomb branch, but the
photons of the Coulomb branch are the only massless
gauge bosons at the point where they meet.

When $G = SU(N_c)$ we take $N_f$ quarks
$Q^i$ $(i = 1, \dots, N_f)$ in the fundamental representation
and $N_f$ antiquarks $\tilde Q_i$ $(i = 1, \dots, N_f)$ in the
complex-conjugate representation. The moment map is the
same as in $N = 1$ QCD whereas the superpotential
is $W = \sqrt{2}\,\tilde Q_i \Phi Q^i + \sum_i m_i \tilde Q_i Q^i$. Consider the case
$G = SU(2)$ as in Seiberg and Witten (1994b). Since
$b_0 = 4 - N_f$, the asymptotically free theories have
$N_f \le 3$ whereas the $N_f = 4$ theory is scale invariant.
As the representations on $Q^i$ and $\tilde Q_i$ are isomorphic,
the classical global symmetry is $O(2N_f) \times U(2)_R$
when all $m_i = 0$. The appearance of the even number
of fundamental representations is necessary for the
consistency of the theory at the quantum level. The
$U(1)_R$ symmetry is anomalous if $N_f \ne 4$. When $N_f > 0$,
$SO(2N_f)$ is anomaly free, whereas $O(2N_f)/SO(2N_f) = \mathbb{Z}_2$
is anomalous. The anomaly free subgroup of $\mathbb{Z}_2 \times U(1)_R$
is $\mathbb{Z}_{4(4-N_f)}$. Its $\mathbb{Z}_2$ subgroup acts in the same way
as $\mathbb{Z}_2 \subset Z(SO(2N_f))$. A nonzero expectation value of
$u = \operatorname{tr}\phi^2$ further breaks the symmetry to $\mathbb{Z}_4$. The
quotient group that acts effectively on the $u$-plane (the
Coulomb branch) is $\mathbb{Z}_{4-N_f}$ if $N_f > 0$ and $\mathbb{Z}_2$ if $N_f = 0$.
When $N_f = 4$, the $U(1)_R$ symmetry is anomaly free but
$\mathbb{Z}_2 = O(8)/SO(8)$ is still anomalous.

The $N_f = 0$ theory is the $N = 2$ pure gauge theory.
In order to compare it to the $N_f > 0$ theories, we
multiply $n_e$ by 2 so that it has integer values on $Q^i$
and $\tilde Q_i$, and divide $a$ by 2 to preserve the formula
$Z = n_m\tilde a + n_e a$. The monodromies around the singularities
become $M_{\Lambda^2} = STS^{-1}$, $M_{-\Lambda^2} = (T^2S)T(T^2S)^{-1}$,
$M_\infty = PT^{-4}$. They generate the subgroup $\Gamma_0(4)$ of
$SL(2, \mathbb{Z})$. The coupling constant is

\[
\tau = \frac{\theta}{\pi} + \frac{8\pi\sqrt{-1}}{g^2}.
\]

The Seiberg-Witten curve is $y^2 = x^3 - ux^2 + (1/4)\Lambda_0^4\,x$,
related to the earlier one $y^2 = (x - u)(x^2 - \Lambda_0^4)$ by an
isogeny. Here and below, $\Lambda_{N_f}$ is the
dynamically generated scale.

For $N_f > 0$, we consider the case with zero bare
masses. The simplest BPS-saturated states are the
elementary quarks with mass $\sqrt{2}\,|a|$, which form
the vector representation of $SO(2N_f)$. In addition, the
quarks have fermion zero modes in the monopole
background. When $n_m = 1$, each $SU(2)$ doublet of
quarks has one zero mode. With $N_f$ hypermultiplets,
there are $2N_f$ zero modes in the vector representation
of $SO(2N_f)$. Upon quantization, the quantum states
are in the spinor representation. So the flavor
symmetry is really $\mathrm{Spin}(2N_f)$. The spectrum may
also include states with $n_m > 1$. For $N_f = 2, 3, 4$, the
centers $Z(\mathrm{Spin}(2N_f))$ are $\mathbb{Z}_2 \times \mathbb{Z}_2$, $\mathbb{Z}_4$, $\mathbb{Z}_2 \times \mathbb{Z}_2$,
whose generators act on states of charges $(n_m, n_e)$
by $((-1)^{n_e+n_m}, (-1)^{n_e})$, $\sqrt{-1}^{\,n_m+2n_e}$, $((-1)^{n_m}, (-1)^{n_e})$,
respectively.

Suppose at a singularity on the $u$-plane, the low-energy
theory is QED with $k$ hypermultiplets. Let $m_i$
be the bare mass and $S_i$ the $U(1)$ charge of the $i$th
hypermultiplet. With the expectation value of $\phi$, the
actual masses are $|\sqrt{2}\,a + m_i|$ $(1 \le i \le k)$. As the states
form a small representation of the $N = 2$ algebra, the
central charge is modified as
$Z = n_m\tilde a + n_e a + S\cdot m/\sqrt{2}$, where $m = (m_1, \dots, m_k)$ and $S = (S_1, \dots, S_k)$.
Under a duality transformation $M \in SL(2, \mathbb{Z})$, the
column vector $(m/\sqrt{2}, \tilde a, a)$ is multiplied by a matrix
of the form $\hat M = \begin{pmatrix} 1 & 0 \\ * & M \end{pmatrix}$. (For example, if $M = T$, $\hat M$
can be derived by one-loop analysis.) So the row
vector $W = (S, n_m, n_e)$ transforms as $W \mapsto W\hat M^{-1}$. The
transformation on $(n_m, n_e)$ is not homogeneous when
there are hypermultiplets. This phenomenon persists
even when all the bare masses $m_i$ are zero.

When $N_f = 1$, the global symmetry of the $u$-plane
is $\mathbb{Z}_3$. There are three singularities related by this
symmetry, where monopoles with charges $(n_m, n_e) =
(1, 0), (1, 1)$, and $(1, 2)$ become massless. The low-energy
theory at each singularity is QED with a
single light hypermultiplet. Besides the photon, no
other flat directions exist. This is consistent with the
absence of a Higgs branch in the original theory.
The monodromies at the singularities are $STS^{-1}$,
$(TS)T(TS)^{-1}$, $(T^2S)T(T^2S)^{-1}$, respectively, and the
corresponding Seiberg-Witten family of curves is
$y^2 = x^2(x - u) - (1/64)\Lambda_1^6$. The Seiberg-Witten
differential is


When $N_f = 2$, there are two singularities related by
the global symmetry $\mathbb{Z}_2$ of the $u$-plane. The massless
states at one singularity have $(n_m, n_e) = (1, 0)$ and
form a spinor representation of $SO(4)$ while those at
the other have $(n_m, n_e) = (1, 1)$ and form the other
spinor representation. The low-energy theory at each
singularity is QED with two light hypermultiplets.
There are additional flat directions along which
$SO(4) \times SU(2)_R$ is broken. They form the two Higgs
branches that touch the $u$-plane at the two singularities
rather than at the origin. The metric and pattern
of symmetry breaking are the same as classically.
The monodromies are $ST^2S^{-1}$, $(TS)T^2(TS)^{-1}$. The
Seiberg-Witten curve is $y^2 = (x^2 - (1/64)\Lambda_2^4)(x - u)$
and the differential is

\[
\lambda = -\frac{\sqrt{2}}{4\pi}\,\frac{y\,dx}{x^2 - \Lambda_2^4/64}.
\]
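
The positions of the singularities for $N_f = 1$ and $N_f = 2$ can be cross-checked from the discriminants of the cubic curves $y^2 = x^2(x - u) - \Lambda_1^6/64$ and $y^2 = (x^2 - \Lambda_2^4/64)(x - u)$. The sketch below is illustrative only; it uses the standard discriminant of a monic cubic, exact rational arithmetic, and function names that are not part of the article.

```python
from fractions import Fraction


def cubic_discriminant(b, c, d):
    """Discriminant of x^3 + b x^2 + c x + d."""
    return (b * b * c * c - 4 * c**3 - 4 * b**3 * d
            - 27 * d * d + 18 * b * c * d)


def nf1_discriminant(u, lam6):
    """N_f = 1 curve y^2 = x^3 - u x^2 - Lambda^6/64, with lam6 = Lambda^6."""
    return cubic_discriminant(-u, 0, -Fraction(lam6, 64))


def nf2_discriminant(u, lam4):
    """N_f = 2 curve y^2 = (x^2 - Lambda^4/64)(x - u), with lam4 = Lambda^4."""
    k = Fraction(lam4, 64)
    # (x^2 - k)(x - u) = x^3 - u x^2 - k x + k u
    return cubic_discriminant(-u, -k, k * u)


if __name__ == "__main__":
    # Scales chosen so that the singular values of u are integers.
    print(nf2_discriminant(1, 64), nf2_discriminant(-1, 64))
    print(nf1_discriminant(-3, 256), nf1_discriminant(0, 256))
```

With $\Lambda_2^4 = 64$ the $N_f = 2$ discriminant equals $4(u^2 - 1)^2$, so the two singularities $u = \pm\Lambda_2^2/8$ are exchanged by $u \mapsto -u$. For $N_f = 1$ the discriminant is proportional to $u^3 + 27\Lambda_1^6/256$, whose three roots are permuted by the $\mathbb{Z}_3$ symmetry of the $u$-plane.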

When $N_f = 3$, the $u$-plane has no global symmetry.
There are two singularities. At one of them, a
single monopole bound state with $(n_m, n_e) = (2, 1)$
becomes massless and there are no other light
particles. At the other singularity, the massless states
have $(n_m, n_e) = (1, 0)$ and form a (four-dimensional)
spinor representation of $SO(6)$ with a definite
chirality. Thus, the low-energy theory is QED with
four light hypermultiplets. Along the flat directions,
the $SO(6) \times SU(2)_R$ symmetry is further broken.
This corresponds to a single Higgs branch touching
the $u$-plane at the singularity. Again, the metric on
the Higgs branch is not modified by quantum
effects. The monodromies at the two singularities
are $(ST^2S)T(ST^2S)^{-1}$ and $ST^4S^{-1}$, respectively. The
Seiberg-Witten curve is $y^2 = x^2(x - u) - (1/64)\Lambda_3^2(x - u)^2$
and the differential is


A = toe(y + viR (+ 


When $N_f = 4$, the theory is characterized by the
classical coupling constant $\tau$, and there are no
corrections to $a = (1/2)\sqrt{2u}$, $\tilde a = \tau a$. There is only
one singularity at $u = 0$, where the monodromy is $P$.
Seiberg and Witten (1994b) postulate that the full
quantum theory is $SL(2, \mathbb{Z})$ invariant, just like the
$N = 4$ pure gauge theory. The elementary
hypermultiplet has $(n_m, n_e) = (0, 1)$ and forms the
vector representation $v$ of $SO(8)$. Fermion zero
modes give rise to hypermultiplets with
$(n_m, n_e) = (1, 0), (1, 1)$ that transform under the spinor
representations $s$, $c$ of $\mathrm{Spin}(8)$. $SL(2, \mathbb{Z})$ acts on the
spectrum via a homomorphism onto the outer-automorphism
group $S_3$ of $\mathrm{Spin}(8)$, which then permutes $v$,
$s$, and $c$. So duality is mixed in an interesting way with
the $SO(8)$ triality. In $v$, $s$, and $c$, the center $\mathbb{Z}_2 \times \mathbb{Z}_2$
acts as $(\epsilon, \epsilon') = (1, -1), (-1, 1), (-1, -1)$,
respectively. The full $SL(2, \mathbb{Z})$ invariance predicts the
existence of multimonopole bound states: for every
pair of relatively prime integers $(p, q)$, there are eight
states with $(n_m, n_e) = (p, q)$ that form a representation
of $\mathrm{Spin}(8)$ on which the center acts as $((-1)^p, (-1)^q)$.
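
The bookkeeping of center characters can be made explicit. The sketch below (hypothetical names, not from the article) assigns to each coprime pair $(p, q)$ the label $v$, $s$, or $c$ whose center character $(\epsilon, \epsilon')$ matches $((-1)^p, (-1)^q)$, using the three characters listed above. Since $p$ and $q$ are coprime they cannot both be even, so exactly one of the three labels always applies.

```python
from math import gcd


def sign(n):
    """Return (-1)^n for any integer n."""
    return 1 if n % 2 == 0 else -1


def center_character(n_m, n_e):
    """Action ((-1)^{n_m}, (-1)^{n_e}) of the center Z_2 x Z_2."""
    return (sign(n_m), sign(n_e))


# Center characters of the vector and the two spinor representations,
# as quoted in the text.
CHARACTER_TO_LABEL = {(1, -1): "v", (-1, 1): "s", (-1, -1): "c"}


def spin8_label(p, q):
    """Label (v, s or c) matching the center character of charge (p, q)."""
    if gcd(p, q) != 1:
        raise ValueError("(p, q) must be relatively prime")
    return CHARACTER_TO_LABEL[center_character(p, q)]


if __name__ == "__main__":
    for charge in [(0, 1), (1, 0), (1, 1), (2, 1), (3, 2)]:
        print(charge, spin8_label(*charge))
```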

Solutions when the bare masses are nonzero were
also obtained by Seiberg and Witten (1994b). The
masses can be deformed to relate theories with
different values of $N_f$. $N = 2$ QCD with a general
classical gauge group has also been studied. By
adding to these theories a mass term $m\operatorname{tr}\Phi^2$
that explicitly breaks the supersymmetry to $N = 1$,
the dualities of Seiberg can be recovered. For
$SU(N_c)$, $SO(N_c)$ and $Sp(2N_c)$ gauge groups,
see Hanany and Oz (1995), Argyres et al. (1996),
Argyres et al. (1997) and references therein.


See also: Anomalies; Brane Construction of Gauge
Theories; Donaldson-Witten Theory; Duality in
Topological Quantum Field Theory; Effective Field
Theories; Electric-Magnetic Duality; Floer Homology;
Gauge Theories from Strings; Gauge Theory:
Mathematical Applications; Nonperturbative and
Topological Aspects of Gauge Theory; Quantum
Chromodynamics; Topological Quantum Field Theory:
Overview; Supersymmetric Particle Models.
Introduction


The purpose of this article is to describe the so-called
"semiclassical trace formula" (SCTF) relating
the "spectrum" of a semiclassical Hamiltonian to
the "periods of closed orbits" of its classical limit.
The SCTF expresses the asymptotic behavior as
$\hbar \to 0$ ($\hbar = h/2\pi$) of the regularized density of states
as a sum of oscillatory contributions associated to
the closed orbits of the classical limit.

We will mainly present the case of the Schrödinger
operator on a Riemannian manifold, which
contains the purely Riemannian case.

We start with a section about the history of the 
subject. We then give a statement of the results and 
a heuristic proof using Feynman integrals. This 
proof can be transformed into a mathematical 
proof which we will not give here. After that we 
describe some applications of the SCTF. 


About the History 


The SCTF has several origins: on one side, the Selberg
trace formula (1956) is an exact summation formula
concerning the case of locally symmetric spaces; this
formula was interpreted by H Huber as a formula
relating eigenvalues of the Laplace operator and
lengths of closed geodesics (also called the "length
spectrum") on a closed surface of curvature $-1$.


On the other side, around 1970, two groups of 
physicists developed independently asymptotic trace 
formulas: 


• M Gutzwiller for the Schrödinger operator,
using the quasiclassical approximation of the
Green function (the "van Vleck formula"); it
is interesting to note that the term "trace
formula" is not used, but Gutzwiller instead
speaks of a new "quantization method" (the old
one being the "Einstein-Brillouin-Keller (EBK)" or
"Bohr-Sommerfeld" rules).

• R Balian and C Bloch, for the eigenfrequencies of
a cavity, use what they call a "multiple reflection
expansion." They asked about a possible application
to Kac's problem.


At the same time, under the influence of Mark 
Kac’s famous paper “Can one hear the shape of a 
drum?,” mathematicians became quite interested in 
inverse spectral problems, mainly using heat kernel 
expansions (for the state of the art around 1970, see 
Berger et al. (1971)). 

The SCTF was put into its final mathematical 
form for the Laplace operator on closed manifolds 
by three groups of people around 1973-75: 


• Y Colin de Verdière, in his thesis, used the
short-time expansion of the Schrödinger kernel
and an approximate Feynman path integral. He
proved that the spectrum of the Laplace operator
determines generically the lengths of closed
geodesics.

• J Chazarain derived the qualitative form of the
trace for the wave kernel using Fourier integral
operators.

• Using the full power of the symbolic calculus of
Fourier integral operators, H Duistermaat and
V Guillemin were able to compute the main term
of the singularity from the Poincaré map of the
closed orbit. Their paper became a canonical
reference on the subject.


After that, people were able to extend the SCTF to:

• general semiclassical Hamiltonians (Helffer-Robert,
Guillemin-Uribe, Meinrenken),

• manifolds with boundary (Guillemin-Melrose),

• surfaces with conical singularities and polygonal
billiards (Hillairet), and

• several commuting operators (Charbonnel-Popov).


Recently, some researchers have studied the
nonprincipal terms of the singularity expansion,
which come from the semiclassical Birkhoff normal
form (Zelditch, Guillemin).


Selberg Trace Formula 


We consider a compact hyperbolic surface $X$.
“Hyperbolic” means that the Riemannian metric is
locally $(dx^2 + dy^2)/y^2$ or, equivalently, is of constant
curvature $-1$. Such a surface is the quotient $X = H/\Gamma$, where $\Gamma$
is a discrete co-compact subgroup of the group of
isometries of the Poincaré half-plane $H$. Closed
geodesics of $X$ are in bijective correspondence with
nontrivial conjugacy classes of $\Gamma$. More precisely,
the set of loops $C(S^1, X)$ splits into connected
components associated to conjugacy classes, and
each component of nontrivial loops contains exactly
one periodic geodesic.


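Theorem 1 (Selberg trace formula). If $\rho$ is a real-valued
function on $\mathbb{R}$ whose Fourier transform $\hat\rho$ is
compactly supported and $\lambda_j = 1/4 + \mu_j^2$ is the spectrum
of the Laplace operator on $X$, we have:

$$\sum_{j=1}^{\infty} \rho(\mu - \mu_j) = \frac{A}{2\pi}\int_{\mathbb{R}} \rho(\mu + s)\, s \tanh(\pi s)\, ds + \sum_{\gamma \in P}\sum_{n=1}^{\infty} \frac{l_\gamma}{2\pi \sinh(n l_\gamma/2)}\, \mathrm{Re}\bigl(\hat\rho(n l_\gamma)\, e^{i n \mu l_\gamma}\bigr)$$

where $A$ is the area of $X$, $P$ the set of primitive
conjugacy classes of $\Gamma$ and, for $\gamma \in P$, $l_\gamma$ is the length
of the unique closed geodesic associated to $\gamma$.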


A nice recent presentation of the Selberg trace
formula can be found in Marklof (2003).


Semiclassical Schrödinger Operators
on Riemannian Manifolds


If $(X, g)$ is a (possibly noncompact) Riemannian
manifold and $V: X \to \mathbb{R}$ a smooth function which
satisfies $\liminf_{x \to \infty} V(x) = E_\infty > -\infty$, the differential
operator $\hat H = \tfrac{1}{2}\hbar^2 \Delta + V$ is semibounded from
below and admits self-adjoint extensions. For all
those extensions, the spectrum is discrete in the interval
$]-\infty, E_\infty[$ and eigenfunctions $\hat H \varphi_j = E_j \varphi_j$ are localized
in the domain $V \le E_j$. If $X$ is compact and $V = 0$,
we recover the case of the Laplace operator.
We will denote this part of the spectrum by

$$\inf V < E_1(\hbar) < E_2(\hbar) \le \cdots \le E_j(\hbar) \le \cdots < E_\infty$$


For the Laplace operator, we have $E_j = \tfrac{1}{2}\hbar^2 \lambda_j$, where
$\lambda_1 < \lambda_2 \le \cdots \le \lambda_j \le \cdots$ is the spectrum of the
Laplace operator.

The SCTF can also be derived the same way for 
Schrödinger operators with magnetic field. One can 
even extend it to Hamiltonian systems which are not 
obtained by Legendre transform from a regular 
Lagrangian. In this case, Morse indices have to be 
replaced by the more general Maslov indices. 


Classical Dynamics 
Newton Flows 


Euler–Lagrange equations for the Lagrangian
$L(x, v) = \tfrac{1}{2}\|v\|_g^2 - V(x)$ admit a Hamiltonian
formulation on $T^*X$ whose energy is given by
$H = \tfrac{1}{2}\|\xi\|_g^2 + V(x)$. We will denote by $X_H$ the
Hamiltonian vector field

$$X_H := \sum_j \left( \frac{\partial H}{\partial \xi_j}\,\partial_{x_j} - \frac{\partial H}{\partial x_j}\,\partial_{\xi_j} \right)$$

Preservation of $H$ by the dynamics shows immediately
that the Hamiltonian flow $\Phi_t$ restricted to $H < E_\infty$
is complete.

The Hamiltonian $H$ is the “classical limit” of $\hat H$;
in more technical terms, $H$ is the semiclassical
principal symbol of $\hat H$.

If $V = 0$, $H = \tfrac{1}{2} g^{ij}\xi_i \xi_j$ and the flow is the
geodesic flow.
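As a small numerical illustration of the Newton flow and of the preservation of $H$, the following sketch integrates Hamilton's equations with a leapfrog (velocity Verlet) scheme. The quartic potential, the flat metric on $X = \mathbb{R}$, the initial point and the step count are all assumptions made for the example; they do not come from the text.

```python
import numpy as np

def leapfrog_flow(x, xi, grad_V, t, n_steps=10_000):
    """Approximate the flow of H = |xi|^2/2 + V(x) (flat metric) up to time t."""
    dt = t / n_steps
    x, xi = np.array(x, float), np.array(xi, float)
    xi -= 0.5 * dt * grad_V(x)          # initial half kick
    for _ in range(n_steps - 1):
        x += dt * xi                    # drift: dx/dt = dH/dxi
        xi -= dt * grad_V(x)            # kick:  dxi/dt = -dH/dx
    x += dt * xi
    xi -= 0.5 * dt * grad_V(x)          # final half kick
    return x, xi

# Hypothetical example: quartic well V(x) = x^4/4 on the real line.
V = lambda x: 0.25 * np.sum(x**4)
grad_V = lambda x: x**3
H = lambda x, xi: 0.5 * np.sum(xi**2) + V(x)

x0, xi0 = np.array([1.0]), np.array([0.5])
x1, xi1 = leapfrog_flow(x0, xi0, grad_V, t=20.0)
energy_drift = abs(H(x1, xi1) - H(x0, xi0))
print("energy drift after t = 20:", energy_drift)
```

The scheme is symplectic, so the energy error stays small and bounded instead of drifting, which mirrors the exact conservation of $H$ along the flow.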


Periodic Orbits 


Definition 1 A periodic orbit $(\gamma, T)$ (also denoted
p.o.) of the Hamiltonian $H$ consists of an orbit $\gamma$
of $X_H$ which is homeomorphic to a circle and
a nonzero real number $T$ so that $\Phi_T(z) = z$ for all
$z \in \gamma$. We will denote by $T_0(\gamma) > 0$ (the primitive
period) the smallest $T > 0$ for which $\Phi_T(z) = z$.
If $(T, E)$ are given, $W_{T,E}$ is the set of $z$’s so that
$H(z) = E$ and $\Phi_T(z) = z$.


• The (linear) Poincaré map $\Pi_\gamma$ of a p.o. $(\gamma, T)$ with
$H(\gamma) = E$: we restrict the flow to $S_E := \{H = E\}$
and take a hypersurface $\Sigma$ inside $S_E$ transversal to
$\gamma$ at the point $z_0$. The associated return map $P$ is a
local diffeomorphism fixing $z_0$. Its linearization
$\Pi_\gamma := P'(z_0)$ is the linear Poincaré map, an
invertible (symplectic) endomorphism of the
tangent space $T_{z_0}\Sigma$.

• The Morse index $\mu(\gamma)$: the p.o. $(\gamma, T)$ is a critical point
of the action integral $\int_0^T L(\gamma(s), \dot\gamma(s))\, ds$ on the
manifold $C^\infty(\mathbb{R}/T\mathbb{Z}, X)$. It always has a
finite Morse index (Milnor 1967) which is denoted
by $\mu(\gamma)$. For general Hamiltonian systems, the Morse
index is replaced by the Conley–Zehnder index.

• The nullity index $\nu(\gamma)$ is the dimension of the
space of infinitesimal deformations of the p.o. $\gamma$
by p.o.’s of the same energy and period. We always
have $\nu(\gamma) \ge 1$ and $\nu(\gamma) = 1 + \dim\ker(\mathrm{Id} - \Pi_\gamma)$.


Example 1 (Geodesic flows)

• Riemannian manifolds with sectional curvature $< 0$:
in this case, we have for all periodic geodesics
$\mu(\gamma) = 0$, $\nu(\gamma) = 1$.

• Generic metrics: for a generic metric on a closed
manifold, we have $\nu(\gamma) = 1$ for all periodic
geodesics.

• For flat tori of dimension $d$: we have $\mu(\gamma) = 0$ and
$\nu(\gamma) = d$.

• For the sphere of dimension 2 with constant curvature:
if $\gamma_n$ is the $n$th iterate of the great circle, we
have $\mu(\gamma_n) = 2|n|$ and $\nu(\gamma_n) = 3$.


It is a beautiful result of J-P Serre that any pair of
points on a closed Riemannian manifold are endpoints
of infinitely many distinct geodesics. Counting
geometrically distinct periodic geodesics is much
harder, especially for simple manifolds like the
spheres. It is now known that, at least in some cases
and for generic metrics, closed Riemannian manifolds
admit infinitely many geometrically distinct periodic
geodesics (Berger 2000, chap. V). There exists
significant knowledge concerning more general
Hamiltonian systems as well.


Nondegeneracy 


There are several possible nondegeneracy assump- 
tions. They can be formulated “a la Morse—Bott” 
(critical point of action integrals) or purely 
symplectically. 


Definition 2 Two submanifolds $Y$ and $Z$ of $X$
intersect cleanly iff $Y \cap Z$ is a manifold whose
tangent space is the intersection of the tangent
spaces of $Y$ and $Z$.

Fixed points of a smooth map are clean if the
graph of the map intersects the diagonal cleanly.


Definition 3 We will denote by (ND) the following
property of the p.o. $(\gamma_0, T_0)$: the fixed points of the
associated (nonlinear) Poincaré map $P$ are clean.

The set $W_{T,E}$ is ND if all p.o.’s inside are ND.
$W_{T,E}$ is then a manifold of dimension $\nu(\gamma)$.


Example 2 


• Generic case: $\nu = 1$; (ND) is equivalent to “1 is
not an eigenvalue of the linear Poincaré map.”
In this case, we can deform the p.o. smoothly by
moving the energy. This family of p.o.’s is called
a cylinder of p.o.’s. The period $T(E)$ is then a
smooth function of $E$.

• Completely integrable systems: $\nu = d$; (ND) is then a
consequence of the so-called “isoenergetic KAM
condition”: assuming the Hamiltonian is expressed
as $H(I_1, \ldots, I_d)$ using action-angle coordinates, this
condition is that the mapping $I \mapsto [\nabla H(I)]$ from the
energy surface $H = E$ into the projective space is a
local diffeomorphism. This condition implies that
Diophantine invariant tori are not destroyed by a
small perturbation of the Hamiltonian.

• Maximally degenerated systems: this is the case
where all orbits are periodic ($\nu = 2d - 1$). Examples
are the two-body problem with Newtonian
potential and the geodesic flows on compact
rank-1 symmetric spaces.


Canonical Measures and Symplectic Reduction 


Under the hypothesis (ND), the manifold $W_{T,E}$ admits
a canonical measure $\mu_c$, invariant by $\Phi_t$. In the case
$\nu = 1$, this measure is given by $|dt| / \sqrt{|\det(\mathrm{Id} - \Pi_\gamma)|}$.

By using a Poincaré section, it is enough to
understand the following fact: if $A$ is a symplectic
linear map, the space $\ker(\mathrm{Id} - A)$ admits a canonical
Lebesgue measure.

We start with the following construction: let $L_1$
and $L_2$ be two Lagrangian subspaces of a symplectic
space $E$ and $\omega_j$, $j = 1, 2$, be half-densities on $L_j$,
denoted by $\omega_j \in \Omega^{1/2}(L_j)$. If $W = L_1 \cap L_2$, we have
the following canonical isomorphisms:
$\Omega^{1/2}(L_j) = \Omega^{1/2}(W) \otimes \Omega^{1/2}(L_j/W)$. So
$\Omega^{1/2}(L_1) \otimes \Omega^{1/2}(L_2) = \Omega^{1/2}(L_1/W) \otimes \Omega^{1/2}(L_2/W) \otimes \Omega^{1}(W)$.
The spaces $M_j = L_j/W$ are two Lagrangian subspaces of the
reduced space $W^\circ/W$ whose intersection is 0. Hence, by using
the Liouville measure on it, we get
$\Omega^{1/2}(M_1) \otimes \Omega^{1/2}(M_2) = \mathbb{C}$. Hence, we get a density $\omega_1 \star \omega_2$
on $W$. It turns out that the previous calculation is one
of the main algebraic pieces of the symbolic calculus of
Fourier integral operators and the density $\omega_1 \star \omega_2$
arises in stationary-phase computations.

The graph of a symplectic map is equipped with a
half-density by pullback of the Liouville half-density.
So we can apply the previous construction
to the intersection of the graph of $A$ and the graph
of the identity map.


Actions 


Definition 4 If $(\gamma, T)$ is a p.o., we define the
following quantity, which is called the action of $\gamma$:

$$A(\gamma) = \int_\gamma \xi\, dx$$

In the (ND) case, $A(\gamma)$ is constant on each connected
component of $W_{T,E}$.


In the generic case and if $T'(E) \ne 0$ (cylinder of
p.o.), the p.o.’s of the cylinder are also parametrized
by $T$ (i.e., we denote by $\gamma_E$ the p.o. of the cylinder of
energy $E$ and by $\gamma_T$ the p.o. of period $T$). If
$a(E) = A(\gamma_E)$ and $b(T) = -\int_0^T L(\gamma_T(s), \dot\gamma_T(s))\, ds$, then $a(E)$
and $b(T)$ are Legendre transforms of each other.
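The Legendre duality between $a(E)$ and $b(T)$ can be checked in a few lines. The sketch below assumes the standard relation $\xi\,\dot x = L + H$ along the orbit and the classical identity $a'(E) = T(E)$; neither is spelled out in the text above, so this should be read as a plausibility check and not as the author's argument.

```latex
\begin{align*}
A(\gamma_E) &= \int_{\gamma_E} \xi\,dx
  = \int_0^{T} \bigl(L + H\bigr)\,ds
  = \int_0^{T} L(\gamma(s),\dot\gamma(s))\,ds + E\,T, \\
\text{hence}\quad a(E) &= E\,T(E) - b\bigl(T(E)\bigr). \\
\text{With } a'(E) &= T(E) \text{ and } T'(E)\neq 0 \text{ (so that } E = E(T) \text{ locally):} \\
b(T) &= E(T)\,T - a\bigl(E(T)\bigr), \qquad
b'(T) = E'(T)\,T + E(T) - a'\bigl(E(T)\bigr)E'(T) = E(T).
\end{align*}
```

So $b$ is obtained from $a$ by evaluating $ET - a(E)$ at the point where $a'(E) = T$, which is the Legendre transform, and symmetrically $a$ is recovered from $b$.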


Playing with Spectral Densities 


We will define the “regularized spectral densities.”
The general idea is as follows: we want to study an
$\hbar$-dependent sequence of numbers $E_j(\hbar)$ (a spectrum)
in some interval $[a, b]$. We introduce a nonnegative
function $\rho \in \mathcal{S}(\mathbb{R})$ which satisfies $\int \rho(t)\, dt = 1$, and
also $D_{\rho,\varepsilon,\hbar}(E) = \sum_j \rho_\varepsilon(E - E_j)$, where
$\rho_\varepsilon(E) = \varepsilon^{-1}\rho(E/\varepsilon)$. It gives the analysis of the spectrum at
the scale $\varepsilon$. Of course, we will adapt the scaling $\varepsilon$
to the small parameter $\hbar$. If the scaling is of the size
of the mean spacing of the spectrum, we will get a
very precise resolution of the spectrum.
The general philosophy is: 


• If $\hbar$ is the semiclassical parameter of a semiclassical
Hamiltonian, the mean spacing of the eigenvalues
is of order $\hbar^d$ (Weyl’s law). The trace
formula gives the asymptotic behavior of
$D_{\rho,\varepsilon,\hbar}(E)$ for $\varepsilon \sim \hbar$ (and hence $\varepsilon \gg \Delta E$ except
if $d = 1$). This behavior is not “universal” and
thus contains a significant amount of information
(in our case, on periodic trajectories).

• Better resolution of the spectrum needs the use of
the long-time behavior of the classical dynamics and
is conjecturally universal. It means that eigenvalues
seen at very small scale behave like eigenvalues of an
ensemble of random matrices, the most common ones
being the Wigner Gaussian orthogonal ensemble
(GOE) and Gaussian unitary ensemble (GUE).


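We fix some interval $[a, b]$ with $b < E_\infty$.
We define $D(E) := \sum_{a \le E_j \le b} \delta(E_j)$ as the sum of
Dirac measures at the points $E_j$ and its $\hbar$-Fourier
transform as

$$Z(t) = \operatorname{trace}\bigl(e^{-it\hat H/\hbar}\bigr) = \sum{}' \exp(-itE_j/\hbar) \qquad [1]$$

where $\sum'$ is the sum over $E_j \in [a, b]$.

The Duistermaat–Guillemin trick relates the
previous behavior to asymptotics of the regularized
density of states. Let us give a function $\rho \in \mathcal{S}(\mathbb{R})$
so that $\hat\rho(t) = \int e^{-itE} \rho(E)\, dE$ is compactly
supported and

$$\hat\rho(t) = 1 + O(t^\infty), \quad t \to 0 \qquad [2]$$

(so that $\int \rho = 1$ and all higher moments of $\rho$ vanish). We introduce, for
$E \in [a, b]$, $D_\rho(E) := \sum' \hbar^{-1} \rho\bigl((E - E_j)/\hbar\bigr)$. $D_\rho(E)$ is independent,
modulo $O(\hbar^\infty)$, of $a, b$. We have

$$D_\rho(E) = \frac{1}{2\pi\hbar} \int e^{itE/\hbar}\, \hat\rho(t)\, Z(t)\, dt$$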

The idea is now to start from a semiclassical
approximation of $U(t) = e^{-it\hat H/\hbar}$ and to insert it into
eqn [1]. We need only a uniform approximation of
$U(t)$ for $t \in \mathrm{Support}(\hat\rho)$. From the asymptotic expansion
of $Z(t)$, we will deduce the asymptotic expansion
of $D_\rho$, the regularized eigenvalue density.
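The relation between the regularized density and $Z(t)$ can be checked numerically on a toy spectrum. The sketch below uses the levels $E_j = \hbar(j + 1/2)$ and a Gaussian $\rho$; both are assumptions chosen for convenience. In particular, the Gaussian's Fourier transform is not compactly supported, so it only mimics condition [2], but the Fourier identity $D_\rho(E) = (2\pi\hbar)^{-1}\int e^{itE/\hbar}\hat\rho(t)Z(t)\,dt$ holds for it all the same.

```python
import numpy as np

hbar = 0.1
E_levels = hbar * (np.arange(60) + 0.5)      # toy spectrum (assumption)

def rho(x):
    """Gaussian test function with integral 1."""
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def rho_hat(t):
    """Fourier transform of rho, convention: integral of exp(-i t x) rho(x) dx."""
    return np.exp(-0.5 * t**2)

def density_direct(E):
    """D_rho(E) = sum_j hbar^{-1} rho((E - E_j) / hbar)."""
    return np.sum(rho((E - E_levels) / hbar)) / hbar

def Z(t):
    """Z(t) = sum_j exp(-i t E_j / hbar) for an array of times t."""
    return np.exp(-1j * np.outer(t, E_levels) / hbar).sum(axis=1)

def density_fourier(E, t_max=10.0, n=8001):
    """D_rho(E) from the Fourier integral, by a plain Riemann sum in t."""
    t = np.linspace(-t_max, t_max, n)
    dt = t[1] - t[0]
    integrand = np.exp(1j * t * E / hbar) * rho_hat(t) * Z(t)
    return (integrand.sum() * dt).real / (2 * np.pi * hbar)

E = 2.03
d1, d2 = density_direct(E), density_fourier(E)
print(d1, d2)
```

Only times $t$ in the (effective) support of $\hat\rho$ enter the integral, which is why a good approximation of $U(t)$ on that time window is enough.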


The Smoothed Density of States 


The following statement expressing the smoothed 
density of eigenvalues is the main result of the 
subject. Under the (ND) assumption, it gives the 
existence of an asymptotic expansion for D,(E): 


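Theorem 2 If $E$ is not a critical value of $H$ and the
(ND) condition is satisfied for all p.o.’s of energy
$E \in [a, b]$ and period inside the support of $\hat\rho$,

$$D_\rho(E) = D_{\mathrm{Weyl}}(E) + \sum D_{W_{T,E}}(E) + O(\hbar^\infty) \qquad [3]$$

where:
(i)

$$D_{\mathrm{Weyl}}(E) = (2\pi\hbar)^{-d} \Bigl( \sum_{j=0}^{\infty} a_j(E)\, \hbar^j \Bigr)$$

with $a_0(E) = \int_{H=E} dL/dH$.
(ii) The sum is over all the manifolds $W_{T,E}$ so that
$T \in \mathrm{Support}(\hat\rho)$.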
(iii) 
E€ —1(y)r/2 
Dwre) = 5 aye E 


A(q)/b ` b;(E)b! 


j ≥ 0
with


if T'(E) > 0 
if T’(E) < 0 


If $\nu(\gamma) = 1$, we get $b_0 = \hat\rho(T)\, T_0(\gamma)\, |\det(\mathrm{Id} - \Pi_\gamma)|^{-1/2}$.


The Weyl Expansion 


If $\mathrm{Support}(\hat\rho)$ is contained in $]-T_{\min}, T_{\min}[$, where
$T_{\min}$ is the smallest period of a p.o. $\gamma$ with $H(\gamma) = E$,
and if $E$ is not a critical value of $H$, formula [3]
reduces to

$$D_\rho(E) \sim (2\pi\hbar)^{-d} \Bigl( \sum_{j=0}^{\infty} a_j(E)\, \hbar^j \Bigr)$$

From the previous formula, it is possible to deduce
the following estimates:


Theorem 3 If $a, b$ are not critical values of $H$:

$$\#\{ j \mid a \le E_j(\hbar) \le b \} = (2\pi\hbar)^{-d} \bigl( \mathrm{volume}(a \le H \le b) + O(\hbar) \bigr)$$

This remainder estimate is optimal and was first
shown in rather great generality by Hörmander
(1968).
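A quick sanity check of the counting estimate is possible on an explicitly solvable model. The sketch below assumes the two-dimensional harmonic oscillator $H = \tfrac12(|\xi|^2 + |x|^2)$ on $\mathbb{R}^2$, whose eigenvalues are $\hbar(n_1 + n_2 + 1)$; this model, the energy level $b = 1$ and the values of $\hbar$ are choices made for the example and are not taken from the text.

```python
import numpy as np

def count_levels(b, hbar):
    """Number of eigenvalues hbar*(n1 + n2 + 1) <= b, with n1, n2 >= 0."""
    N = int(np.floor(b / hbar))                  # need n1 + n2 <= N - 1
    return N * (N + 1) // 2 if N >= 1 else 0

def weyl_leading(b, hbar):
    """(2 pi hbar)^{-2} times the volume of {H <= b} in R^4."""
    volume = 0.5 * np.pi**2 * (2.0 * b) ** 2     # ball of radius sqrt(2b) in R^4
    return volume / (2 * np.pi * hbar) ** 2

b = 1.0
for hbar in (0.013, 0.0013):
    exact = count_levels(b, hbar)
    weyl = weyl_leading(b, hbar)
    # Theorem 3 predicts a difference of size O(hbar^{-(d-1)}) = O(1/hbar) for d = 2.
    print(hbar, exact, weyl, (exact - weyl) * hbar)
```

For this model all orbits are periodic, and the rescaled difference `(exact - weyl) * hbar` stays of order one instead of tending to zero, which is consistent with the remainder estimate being optimal.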


Derivation from the Feynman Integral 
The Feynman Integral 


R Feynman (Feynman and Hibbs 1965) found
a geometric representation of the propagator,
that is, the kernel $p(t, x, y)$ of the unitary group
$\exp(-it\hat H/\hbar)$, using an integral (FPI := Feynman path
integral) on the manifold $\Omega_{t,x,y} := \{\gamma: [0, t] \to X \mid \gamma(0) = x,\ \gamma(t) = y\}$
of paths from $x$ to $y$ in the
time $t$; if $L(\gamma, \dot\gamma)$ is the Lagrangian, we have, for
$t > 0$:

$$p(t, x, y) = \int_{\Omega_{t,x,y}} \exp\Bigl( \frac{i}{\hbar} \int_0^t L(\gamma(s), \dot\gamma(s))\, ds \Bigr)\, |d\gamma|$$

where $|d\gamma|$ is a “Riemannian measure” on the
manifold $\Omega_{t,x,y}$ with the natural Riemannian
structure.

There is no rigorous mathematical justification of the
FPI. Nevertheless, the FPI gives good heuristics
and the right formulas.


The Trace and Loop Manifolds 


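Let us try a formal calculation of the partition
function and its semiclassical limit. We get

$$Z(t) = \int_X |dx| \int_{\Omega_{t,x,x}} \exp\Bigl( \frac{i}{\hbar} \int_0^t L(\gamma(s), \dot\gamma(s))\, ds \Bigr)\, |d\gamma|$$

If we denote by $\Omega_t$ the manifold of paths
$\gamma: \mathbb{R}/t\mathbb{Z} \to X$ (loops) and we apply Fubini (sic!),
we get

$$Z(t) = \int_{\Omega_t} \exp\Bigl( \frac{i}{\hbar} \int_0^t L(\gamma(s), \dot\gamma(s))\, ds \Bigr)\, |d\gamma|$$


The Semiclassical Limit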


We want to apply stationary phase in order to get
the asymptotic expansion of $Z(t)$; critical points of
$J_t: \Omega_t \to \mathbb{R}$ are the p.o.’s of the Euler–Lagrange flow
and hence of the Hamiltonian flow of period $t$. We
require the ND assumption (Morse–Bott), the Morse
index, and the determinant of the Hessian:


1. The ND assumption is the original Morse–Bott
one in Morse theory: we have smooth manifolds
of critical points and the Hessian is transversally
ND.

2. The Morse index is the Morse index of the action
functional on periodic loops:
$J_t(\gamma) := \int_0^t L(\gamma(s), \dot\gamma(s))\, ds$.

3. The Hessian is associated to a periodic Sturm–Liouville
operator for which many regularizations
have already been proposed.


In this manner, we get a sum of contributions 
given by the components W;,; of W;: 


Z(t (t ) = (ib)~ Vj /2 a ( el/P)LO) c(h) 
with c(h) ~ d/o GB and 
ea iu(n/2) 


where $\mu$ is the Morse index and $\delta$ is a regularized
determinant.
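The finite-dimensional mechanism behind this formal computation is the stationary-phase formula. As a one-variable illustration, the sketch below compares $\int a(x)\,e^{if(x)/\hbar}dx$ with its leading approximation $a(x_0)\,e^{if(x_0)/\hbar}\sqrt{2\pi\hbar/|f''(x_0)|}\;e^{i\pi\,\mathrm{sgn} f''(x_0)/4}$. The phase $f(x) = \cosh x - 1$ and the amplitude $a(x) = e^{-x^2}$ are assumptions chosen so that there is a single nondegenerate critical point at $x_0 = 0$.

```python
import numpy as np

def oscillatory_integral(hbar, n=400_001, x_max=5.0):
    """Riemann-sum value of the integral of a(x) exp(i f(x) / hbar)."""
    x = np.linspace(-x_max, x_max, n)
    dx = x[1] - x[0]
    phase = np.cosh(x) - 1.0          # f, with f(0) = 0, f'(0) = 0, f''(0) = 1
    amplitude = np.exp(-x**2)         # a, with a(0) = 1
    return np.sum(amplitude * np.exp(1j * phase / hbar)) * dx

def stationary_phase(hbar):
    """Leading term: one critical point x0 = 0 with f''(x0) = 1 > 0."""
    return np.sqrt(2 * np.pi * hbar) * np.exp(1j * np.pi / 4)

def rel_error(hbar):
    exact = oscillatory_integral(hbar)
    return abs(exact - stationary_phase(hbar)) / abs(exact)

for hbar in (0.01, 0.0025):
    print(hbar, rel_error(hbar))      # the relative error shrinks roughly like hbar
```

In this toy case the phase factor $e^{i\pi/4}$ plays the role of the index-dependent phase, and $|f''(x_0)|^{-1/2}$ the role of the determinant of the Hessian.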


The Integrable Case 


As observed by Berry–Tabor, the trace formula in
this case comes from the Poisson summation formula
using action-angle coordinates. Asymptotics of the
eigenvalues to any order can then be given in the
so-called quantum integrable case by Bohr–Sommerfeld
rules.
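The Poisson summation mechanism can be seen explicitly in the simplest integrable model. The sketch below assumes the one-dimensional harmonic oscillator with Bohr–Sommerfeld levels $E_j = \hbar(j + 1/2)$ and a Gaussian $\rho$ of width `sigma`; these choices are assumptions made for the example. Summing over levels and summing over "periodic orbits" of period $2\pi k$, with action $2\pi k E$ and a sign $(-1)^k$ coming from the half-integer shift, give the same regularized density.

```python
import numpy as np

hbar, sigma = 0.05, 0.2      # illustrative values (assumptions)

def rho(x):
    """Gaussian of width sigma with integral 1."""
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def rho_hat(t):
    """Fourier transform of rho: integral of exp(-i t x) rho(x) dx."""
    return np.exp(-0.5 * (sigma * t) ** 2)

def density_levels(E, j_max=400):
    """Spectral side: sum over the levels E_j = hbar (j + 1/2)."""
    j = np.arange(j_max)
    return np.sum(rho((E - hbar * (j + 0.5)) / hbar)) / hbar

def density_orbits(E, k_max=10):
    """Orbit side: k = 0 is the Weyl term, k != 0 are the periods 2 pi k."""
    k = np.arange(-k_max, k_max + 1)
    terms = ((-1.0) ** np.abs(k)) * rho_hat(2 * np.pi * k) \
        * np.exp(2j * np.pi * k * E / hbar)
    return terms.sum().real / hbar

for E in (3.0, 3.02):
    print(E, density_levels(E), density_orbits(E))
```

The $k = 0$ term equals $1/\hbar = (2\pi\hbar)^{-1}\cdot 2\pi$, the Weyl term for $d = 1$, and the terms with $k \neq 0$ oscillate like $e^{ikA(E)/\hbar}$ with $A(E) = 2\pi E$.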


The Maximally Degenerated Case 


Let us assume that $(X, g)$ is a compact Riemannian
manifold for which all geodesics have the same
smallest period $T_0 = 2\pi$. Then we have the following
clustering property:


Theorem 4 There exists some constant $C$ and some
integer $\alpha$ so that

(i) the spectrum of $\Delta$ is contained in the union of
the intervals

$$I_k = \Bigl[ \bigl(k + \tfrac{\alpha}{4}\bigr)^2 - C,\ \bigl(k + \tfrac{\alpha}{4}\bigr)^2 + C \Bigr]$$

(ii) $N(k) = \#\,\mathrm{Spectrum}(\Delta) \cap I_k$ is a polynomial function
of $k$ for $k$ large enough.


Property (ii) is a consequence of the trace
formula.


Applications to the Inverse 
Spectral Problem 


We will now restrict ourselves to the case of the 
Laplace operator on a compact Riemannian mani- 
fold (X, g). The main result is as follows: 


Theorem 5 (Colin de Verdière). If $X$ is given, there
exists a generic subset $G_X$, in the sense of Baire
category, of the set of smooth Riemannian metrics on
$X$, so that, if $g \in G_X$, the length spectrum of $(X, g)$ can
be recovered from the Laplace spectrum. The set $G_X$
contains all metrics with $< 0$ sectional curvature and
(conjecturally) all metrics with $\le 0$ sectional curvature.


We can take for $G_X$ the set of metrics for which all
periodic geodesics are nondegenerate and the length
spectrum is simple.

Some cancellations may occur between the asymptotic
expansions of two ND periodic trajectories with the
same actions if the Morse indices differ by 2 mod 4.


The Case with Boundary 


If $(X, g)$ is a smooth compact manifold with boundary,
one introduces the broken geodesic flow by extending
the trajectories by reflection on the boundary. SCTFs
have been extended to that case by Guillemin and
Melrose. Periodic geodesics which are transversal to
the boundary contribute to the density of states in the
same way as for manifolds without boundary. Periodic
geodesics inside the boundary are in general accumulation
points of periodic geodesics near the boundary: their
contribution is therefore very complicated analytically.


Bifurcations


Let us denote by $C_H \subset \mathbb{R}^2_{T,E}$ the set of pairs $(T, E)$
for which $W_{T,E}$ is not empty. The previous results
apply to the “smooth” part of the set $C_H$. Among
other interesting points are points $(0, E)$ with $E$ a critical
value of $H$ (Brummelhuis–Paul–Uribe) and points
corresponding to bifurcation of p.o.’s when moving
the energy.

Detailed studies of some of these points have been
carried out, for example by applying the theory of
singularities of functions of finitely many variables
and of their deformations (catastrophe theory) to the
stationary-phase method; a significant body of
knowledge on these subjects now exists.


SCTF and Eigenvalue Statistics 


One of the main open mathematical problems is:
“can one really use appropriate forms of the SCTF
as quantization rules and use them in order to derive
eigenvalue statistics?”

This problem is related to the fine-scale study of the
eigenvalue spacings ($\varepsilon \ll \hbar$). It is one of the important
unsolved problems of so-called “quantum chaos.”
Many people think that progress in this field will allow
us to solve the Bohigas–Giannoni–Schmit conjecture:
“if the geodesic flow is hyperbolic, the eigenvalue
distribution follows random matrix asymptotics.”


See also: Billiards in Bounded Convex Domains; 
h-Pseudodifferential Operators and Applications; 
Quantum Ergodicity and Mixing of Eigenfunctions; 
Random Matrix Theory in Physics; Regularization for 
Dynamical Zeta Functions; Resonances.


Introduction


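A semilinear wave equation is an equation of the
form

$$\Box u = F(u, u'), \qquad u: \Omega \subset \mathbb{R} \times \mathbb{R}^n \to \mathbb{R} \qquad [1]$$

where $F: \mathbb{R}^{n+2} \to \mathbb{R}$ is a smooth function, the
d’Alembert operator $\Box$ is defined as

$$\Box = D_t^2 - D_{x_1}^2 - \cdots - D_{x_n}^2, \qquad D_t = \frac{\partial}{\partial t}, \quad D_{x_j} = \frac{\partial}{\partial x_j} \qquad [2]$$

and $u'$ denotes the vector of all first-order derivatives
of $u$:

$$u' = (D_t u, D_{x_1} u, \ldots, D_{x_n} u) \equiv (u_t, u_{x_1}, \ldots, u_{x_n})$$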


Sometimes the term “semilinear” is used in a more
restrictive sense and refers to the special class of
equations

$$\Box u = f(u) \qquad [3]$$

The very particular case $f(u) = -mu$, $m > 0$, corresponds
to the Klein–Gordon equation, used to model
relativistic particles. True nonlinear terms of the form
$f(u) = -mu - u^3$, $m > 0$ (meson equation), or
$f(u) = -\sin u$ (sine-Gordon equation) have been proposed
as models of self-interacting fields with a local
interaction. Notice that for the physical applications it
is natural to consider complex-valued functions $u(t, x)$;
in the general case of eqn [1], this actually means that
we are considering a $2 \times 2$ system in $\mathrm{Re}\, u$ and $\mathrm{Im}\, u$.
However, the natural physical requirement of gauge
invariance restricts the possible nonlinearities to the
functions satisfying the condition

$$f(e^{i\theta} u) = f(u)\, e^{i\theta}, \qquad \forall \theta \in \mathbb{R} \qquad [4]$$

Thus, in particular $f(0) = 0$ and we see that $f$ must
be of the form $f(u) = g(|u|^2)u$ for some $g$. Since the
gauge-invariant wave equation

$$\Box u = g(|u|^2)\, u \qquad [5]$$

has essentially the same properties as the real-valued
equation [3], it is not too restrictive to study only
real-valued functions as we shall mostly do in the
following.

The more general equations of the form [1],
involving the derivatives of $u$, are encountered in
several physical theories, including the nonlinear
$\sigma$-models and general relativity.

However, beyond the concrete physical applica- 
tions, eqn [1] is important since it is a simplified but 
relevant model of much more general equations and 
systems of mathematical physics; despite its simple 
structure, the semilinear wave equation presents 
already all the main difficulties and phenomena of 
nonlinear wave interaction, and it represents an 
ideal laboratory for such problems. 

In this article we plan to give a concise but, as far 
as possible, comprehensive review of the main 
research directions concerning eqn [1], and in 
particular we shall focus on the global existence of 
both large and small nonlinear waves, and the 
problem of local existence for low-regularity solutions.
A large part of the theory extends to nonlinear
perturbations of the form $\Box u = F(u, u', u'')$ and to
the fully nonlinear case; we have no space here to
give an account of these developments and we must
refer the reader to the books and papers cited in the
“Further reading” section.


Classical Results 


Equations [1] and [3] are hyperbolic with respect to
the variable $t$. This is a precise way of stating that
the “correct” problem for them is an initial-value
problem (IVP) with data at some fixed time, or
more generally on some spacelike surface: this
means that we assign two functions $u_0(x)$, $u_1(x)$,
called the “initial data,” and we look for a function
$u(t, x)$ satisfying the IVP:

$$\Box u = F(u, u'), \qquad u(0, x) = u_0(x), \quad u_t(0, x) = u_1(x) \qquad [6]$$

This setting is in agreement with the physical picture
of an evolution problem: the data represent the
complete state of a system at a fixed time, and they
uniquely determine the evolution of the system,
which is described by the differential equation.
This rough statement of the problem is sufficient
when working with smooth functions, as in the
classical approach. By purely classical methods, that
is, energy inequalities and nonlinear estimates, it is
not difficult to prove the following local existence
result, where $H^k = H^k(\mathbb{R}^n)$ denotes the Sobolev
space of functions with $k$ derivatives in $L^2(\mathbb{R}^n)$:


Theorem 1 Assume $F$ is $C^\infty$. Let $(u_0, u_1) \in H^k \times H^{k-1}$
for some $k > 1 + n/2$. Then there exists a time
$T = T(\|u_0\|_{H^k} + \|u_1\|_{H^{k-1}}) > 0$ such that problem
[6] has a unique solution with
$(u, u_t) \in C([-T, T]; H^k) \times C([-T, T]; H^{k-1})$.

If $F = F(u)$ depends only on $u$, the result holds for
all $k > n/2$.


Proof We decided to include a sketchy but com- 
plete proof of this result since it shows the basic 
approach to nonlinear wave equations: many results 
of the theory, even some of the most delicate ones, 
are obtained by suitable variations of the contrac- 
tion method, and are similar in spirit to this classical 
theorem. 

Assume for a moment that the equation is linear
so that $F = F(t, x)$ is a given smooth function of $(t, x)$.
For the linear equation $\Box u = F$, we can construct a
solution $u$ using explicit formulas. Moreover, $u$
satisfies the energy inequality

$$E_k(t) \le E_k(0) + \int_0^t \|F(s, \cdot)\|_{H^{k-1}}\, ds \qquad [7]$$

where the energy $E_k(t)$ is defined as

$$E_k(t) = \|u(t, \cdot)\|_{H^k} + \|u_t(t, \cdot)\|_{H^{k-1}} \qquad [8]$$

Now we introduce the space $X_T = C([-T, T]; H^k) \cap C^1([-T, T]; H^{k-1})$,
the space $Y_T = C([-T, T]; H^{k-1})$,
the mapping $\Phi: F \mapsto u$ that takes the function $F(t, x)$
into the solution of $\Box u = F$ (with fixed data $u_0, u_1$),
and the mapping $\Psi(u) = F(u, u')$ which is the original
right-hand side of the equation.

The energy inequality tells us that $\Phi$ is bounded
from $Y_T$ to $X_T$. Actually, for $M$ large enough with
respect to $E_k(0)$ (the $H^k$ norm of the data), $\Phi$ takes
any ball $B_Y(0, N)$ of $Y_T$ into the ball $B_X(0, M + NT)$
of $X_T$. Moreover, if we apply [7] to the difference of
two equations $\Box u = F$ and $\Box v = G$, we also see that
$\Phi$ is Lipschitz continuous from $Y_T$ to $X_T$, with a
Lipschitz constant $CT$.

On the other hand, $\Psi(u) = F(u, u')$ takes $X_T$ to $Y_T$,
provided $k > 1 + n/2$; we can even say that it is
Lipschitz continuous from $B_X(0, M)$ to $B_Y(0, C(M))$
for some function $C(M)$, with a Lipschitz constant
$C_1(M)$ also depending on $M$. This follows easily
from Moser type estimates like

$$\|F(u, u')\|_{H^{k-1}} \le \phi(\|u\|_{H^k})\, \|u\|_{H^k}, \qquad k > \frac{n}{2} + 1$$

or

$$\|F(u)\|_{H^k} \le \phi(\|u\|_{H^k})\, \|u\|_{H^k}, \qquad k > \frac{n}{2}$$


Now it is easy to conclude: the composition $\Phi \circ \Psi$
maps $X_T$ into itself, and actually is a contraction of
$B_X(0, M)$ into itself provided $M$ is large enough with
respect to the data, and $T$ is small enough with
respect to $M$. The unique fixed point is the required
solution. □
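The last step can be made explicit. The sketch below uses only the constants named in the proof ($M$, $C(\cdot)$, $C_1(\cdot)$, and the Lipschitz constant $CT$ of $\Phi$); the radius $R = 2M$ and the two smallness conditions on $T$ are one convenient choice introduced here for illustration, not necessarily the one the author had in mind.

```latex
\begin{align*}
&\text{For } u, v \in B_X(0, R),\ R := 2M: \\
&\|\Phi\Psi(u)\|_{X_T} \le M + C(R)\,T \le R
   \quad\text{as soon as}\quad T \le \frac{M}{C(2M)}, \\
&\|\Phi\Psi(u) - \Phi\Psi(v)\|_{X_T}
   \le C\,T\,\|\Psi(u) - \Psi(v)\|_{Y_T}
   \le C\,T\,C_1(R)\,\|u - v\|_{X_T}
   \le \tfrac12\,\|u - v\|_{X_T}
   \quad\text{as soon as}\quad T \le \frac{1}{2\,C\,C_1(2M)} .
\end{align*}
```

Under both conditions $\Phi \circ \Psi$ is a contraction of the complete metric space $B_X(0, 2M)$ into itself, so the Banach fixed-point theorem gives a unique $u$ with $u = \Phi(\Psi(u))$, that is, a unique solution of [6] in that ball.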


The wave operator has an additional important
property called the finite speed of propagation,
which can be stated as follows: given an IVP for $\Box$ with data
$u(0, x) = u_0(x)$, $u_t(0, x) = u_1(x)$,
if we modify the data “outside” a ball $B(x_0, R) \subset \mathbb{R}^n$,
the values of the solution inside the cone

$$K(x_0, R) = \{ (t, x) : t \ge 0,\ |x - x_0| < R - t \}$$

do not change. Notice that $K(x_0, R)$ is the cone with
basis $B(x_0, R)$ and tip $(R, x_0)$; the slope of its mantle
represents the speed of propagation of the signals,
which for the wave operator $\Box$ is equal to 1. The
property extends without modification to the semilinear
problem [6], at least for the smooth solutions
given by Theorem 1. Actually, it is not difficult to
modify the proof of the theorem to work on cones
instead of bands $[-T, T] \times \mathbb{R}^n$; in other words, given
a ball $B = B(x_0, R)$, we can assign two data
$u_0 \in H^k(B)$, $u_1 \in H^{k-1}(B)$ ($k > n/2 + 1$) and prove
the existence of a local solution on the cone
$K(x_0, R)$ for some time interval $t \in [0, T]$.

In general, the finite speed of propagation allows
us to localize in space most of the results and the
estimates; as a rule of thumb, we expect that what is
true on a band $[0, T] \times \mathbb{R}^n$ should also be true on
any truncated cone $K(x_0, R) \cap \{0 \le t \le T\}$.


Symmetries


The linear wave equation can be written as the
Euler–Lagrange equation of a suitable Lagrangian.
This is still true for the semilinear perturbations of
the form

$$\Box u + f(u) = 0 \qquad [9]$$

Indeed, denoting with $F(s) = \int_0^s f(\sigma)\, d\sigma$ the primitive
of $f$, the Lagrangian of [9] is

$$\mathcal{L}(u) = \iint \Bigl[ -\tfrac{1}{2}|u_t|^2 + \tfrac{1}{2}|\nabla_x u|^2 + F(u) \Bigr]\, dt\, dx \qquad [10]$$

The functional $\mathcal{L}$ is not positive definite; hence, the
variational approach gives only weak results. However,
this point of view allows us to apply Noether’s
principle: any invariance of the functional is related
to a conservation law of the equation. These
conserved quantities can also be obtained by taking
the product of the equation by a suitable multiplier,
although this method is far from obvious in many
cases. We describe here this circle of ideas briefly.

The functional $\mathcal{L}$ is invariant under the Poincaré
group, generated by time and space translations and
the Lorentz transformations ($\lambda > 1$):

$$t \mapsto \frac{\lambda t - x_j}{\sqrt{\lambda^2 - 1}}, \qquad x_j \mapsto \frac{\lambda x_j - t}{\sqrt{\lambda^2 - 1}} \qquad [11]$$

The infinitesimal generators of the translations are
simply the partial derivatives $D_t$ and $D_{x_j}$. The Lorentz
transformations can be decomposed as a rotation
followed by a boost, and indeed a corresponding
complete set of infinitesimal generators are the operators

$$\Gamma_j = x_j D_t + t D_{x_j}, \qquad \Omega_{jk} = x_j D_{x_k} - x_k D_{x_j} \qquad [12]$$

All the operators in the Poincaré group commute
with $\Box$ exactly.

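The conservation law related to time translations
(time derivative) is the fundamental “conservation of
energy”

$$E(t) = \int \Bigl[ \tfrac{1}{2}|u_t|^2 + \tfrac{1}{2}|\nabla_x u|^2 + F(u) \Bigr]\, dx = E(0) \qquad [13]$$

while spatial translations (spatial derivatives) lead to
the conservation of momenta

$$\int u_t\, u_{x_j}\, dx = \text{const.}, \qquad j = 1, \ldots, n$$

On the other hand, infinitesimal rotations and
boosts [12] are connected to the conservation of
angular momenta

$$\int \bigl[ x_j D_{x_k} u - x_k D_{x_j} u \bigr] \cdot D_t u\, dx = \text{const.}, \qquad j, k = 1, \ldots, n \qquad [14]$$

and

$$\int \bigl[ x_k\, e(u) + t\, D_{x_k} u\, D_t u \bigr]\, dx = \text{const.}, \qquad k = 1, \ldots, n \qquad [15]$$

where

$$e(u) = \tfrac{1}{2} u_t^2 + \tfrac{1}{2} |\nabla_x u|^2 + F(u) \qquad [16]$$

is the energy density.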

The Poincaré group does not exhaust the invariance
properties of the free wave equation. Among
the other transformations which commute or almost
commute with $\Box$, we mention the spacetime dilations
and inversions (which together with translations and
Lorentz transformations generate the larger conformal
group), the scaling $u \mapsto \lambda u$, the spatial dilations,
and, in the complex-valued case, the gauge transformation
$u \mapsto e^{i\theta} u$. In this way several useful conservation
laws can be obtained, including the conformal
energy identities of K Morawetz.


Strichartz Estimates 


Energy estimates are very useful tools but they have 
some major shortcomings. The main one is clearly 
the large number of derivatives necessary to estimate 
the nonlinear term. This is why the modern theory 
of semilinear wave equations relies mainly on 
different tools, which go under the umbrella name 
of Strichartz estimates and express the decay
properties of solutions when measured in $L^p$ or
related norms. In this section we summarize these
estimates in their most general form, and try to give 
a feeling of the techniques involved. 

Consider the following IVP for a homogeneous
linear wave equation:

$$\Box u = 0, \qquad u(0, x) = 0, \quad u_t(0, x) = f(x) \qquad [17]$$

The conservation of energy states that

$$\|u_t(t, \cdot)\|_{L^2}^2 + \|\nabla_x u(t, \cdot)\|_{L^2}^2 = \|f\|_{L^2}^2 \qquad [18]$$

for all times $t$. Thus, we see that $L^2$-type norms of
the solution do not decay. The interesting fact is that
if we measure the solution $u$ in a different $L^p$-norm,
$p > 2$, the norm decays as $t \to \infty$, and the decay is
fastest for the $L^\infty$-norm.

To appreciate the dispersive phenomena at their
best, let us assume that the Fourier transform of the
data is localized in an annulus of order 1:

$$\operatorname{supp} \hat f(\xi) \subset \{ 1/2 \le |\xi| \le 2 \} \qquad [19]$$

Then the corresponding solution $u(t, x)$ has the same
property, and we see that

$$\|u\|_{L^2} = \|\hat u\|_{L^2} \le 2 \||\xi| \hat u\|_{L^2} = 2 \|\nabla u\|_{L^2} \le 4 \|u\|_{L^2}$$

We condense the last line in the shorthand notation
$\|u\|_{L^2} \simeq \|\nabla u\|_{L^2}$.
We shall also write

$$\|v\|_X \lesssim \|w\|_Y \iff \|v\|_X \le C \|w\|_Y \text{ for some } C$$

We can now rewrite the conservation of energy
[18] in a very simple form; for localized data (and
hence a localized solution) as in [19], we have

$$\|u(t, \cdot)\|_{L^2} \lesssim \|f\|_{L^2} \qquad [20]$$


The basic $L^\infty$-estimate for a solution of [17] with
localized data as in [19] is simply

$$\|u(t, \cdot)\|_{L^\infty} \lesssim t^{-(n-1)/2}\, \|f\|_{L^1} \qquad [21]$$

This estimate has been well known since the 1960s; it can
be proved easily by several techniques, notably by
the stationary-phase method. Property [21] measures
the fact that as time increases, the total
energy of the solution remains constant but spreads
over a region of increasing volume, due to the
propagation of waves. If we interpolate between
[20] and [21], we obtain the full set of dispersive
estimates

$$\|u(t, \cdot)\|_{L^q} \lesssim t^{-(n-1)(1/2 - 1/q)}\, \|f\|_{L^p}, \qquad \frac{1}{p} + \frac{1}{q} = 1, \quad 2 \le q \le \infty \qquad [22]$$

Recall that we are working with localized solutions
on the annulus $|\xi| \sim 1$; it is easy to extend the
above estimates to general solutions by a rescaling
argument, exploiting the fact that, if $u(t, x)$ is a
solution of the homogeneous wave equation,
$u(\lambda t, \lambda x)$ is also a solution for any constant $\lambda$.
Indeed, if $\hat f$ (and hence $\hat u$) is supported in the
annulus $2^{j-1} \le |\xi| \le 2^{j+1}$, $j \in \mathbb{Z}$, by rescaling [21],
we obtain

$$\|u(t, \cdot)\|_{L^\infty} \lesssim t^{-(n-1)/2}\, 2^{j(n-1)/2}\, \|f\|_{L^1} \qquad [23]$$
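The rescaling step behind [23] can be written out as follows. The auxiliary functions $v$ and $g$ are names introduced here for the sketch; the only inputs are the scaling invariance of the homogeneous wave equation and the localized estimate [21].

```latex
\begin{align*}
&v(t,x) := u(2^{-j}t,\, 2^{-j}x), \qquad \Box v = 0, \quad v(0,x) = 0, \quad
  v_t(0,x) = 2^{-j} f(2^{-j}x) =: g(x), \\
&\operatorname{supp}\hat g \subset \{1/2 \le |\xi| \le 2\}, \qquad
  \|g\|_{L^1} = 2^{-j}\,2^{jn}\,\|f\|_{L^1} = 2^{j(n-1)}\|f\|_{L^1}, \\
&\|v(t,\cdot)\|_{L^\infty} \lesssim t^{-(n-1)/2}\,\|g\|_{L^1}
  = t^{-(n-1)/2}\, 2^{j(n-1)}\,\|f\|_{L^1} \qquad \text{by [21]}, \\
&\|u(t,\cdot)\|_{L^\infty} = \|v(2^{j}t,\cdot)\|_{L^\infty}
  \lesssim (2^{j}t)^{-(n-1)/2}\, 2^{j(n-1)}\,\|f\|_{L^1}
  = t^{-(n-1)/2}\, 2^{j(n-1)/2}\,\|f\|_{L^1}.
\end{align*}
```

The factor $2^{j(n-1)/2}$ is what later becomes the smoothness index $(n-1)/2$ of the Besov norm when the dyadic pieces are summed.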


If $f$ is any smooth function, not localized in
frequency, we can still write it as a series

$$f = \sum_{j \in \mathbb{Z}} f_j$$

where $\operatorname{supp} \hat f_j \subset \{ 2^{j-1} \le |\xi| \le 2^{j+1} \}$. The quantity

$$\|f\|_{\dot B^{s}_{1,1}} = \sum_{j \in \mathbb{Z}} 2^{js}\, \|f_j\|_{L^1}$$

is by definition the $\dot B^{s}_{1,1}$ Besov norm of $f$. Thus,
summing the estimates [23] over $j$, we conclude that
a general solution of [17] satisfies the dispersive
estimate

$$\|u(t, \cdot)\|_{L^\infty} \lesssim t^{-(n-1)/2}\, \|f\|_{\dot B^{(n-1)/2}_{1,1}} \qquad [24]$$


The Strichartz estimates can be obtained as a
consequence of the above dispersive estimates, plus
some subtle functional analytic arguments. In the
general form we give here, they were proved by
J Ginibre and G Velo, and in the most difficult
endpoint cases by M Keel and T Tao. The solution of
the homogeneous problem [17] studied above can be
written as

$$u(t, x) = \frac{\sin(t|D|)}{|D|}\, f, \qquad |D| = \mathcal{F}^{-1} |\xi| \mathcal{F}$$

(here $\mathcal{F}$ denotes the Fourier transform). On the
other hand, the solution of the complete nonhomogeneous
problem

$$\Box u = F(t, x), \qquad u(0, x) = u_0(x), \quad u_t(0, x) = u_1(x) \qquad [25]$$

can be written by Duhamel’s formula as

$$u(t, x) = \cos(t|D|)\, u_0 + \frac{\sin(t|D|)}{|D|}\, u_1 + \int_0^t \frac{\sin((t - s)|D|)}{|D|}\, F(s, \cdot)\, ds$$

and we see that the above estimates [22] apply to all
the operators appearing here. If we consider problem
[25] and we assume that the data $F(t,x), u_0, u_1$ are
localized in frequency so that $\hat F(t,\xi), \hat u_0, \hat u_1$ have
support in the annulus $|\xi| \sim 1$, the Strichartz estimate
takes the following form:

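$$\|u\|_{L^p_I L^q} \lesssim \|u_0\|_{L^2} + \|u_1\|_{L^2} + \|F\|_{L^{\tilde p'}_I L^{\tilde q'}} \qquad [26]$$

Here the dimension is $n \ge 2$; $L^p_I L^q$ denotes the space
with norm

$$\|u\|_{L^p_I L^q} = \left( \int_I \|u(t,\cdot)\|_{L^q(\mathbb{R}^n)}^p \, dt \right)^{1/p}, \qquad I = [0,T] \ \text{or} \ I = \mathbb{R};$$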


the indices $p, q$ satisfy the conditions

$$\frac{1}{p} + \frac{n-1}{2q} \le \frac{n-1}{4}, \qquad p \in [2,\infty], \quad (n,p,q) \ne (3,2,\infty) \qquad [27]$$

while $\tilde p, \tilde q$ satisfy an identical condition (and $p'$
denotes the conjugate index to $p$). The constant in
inequality [26] is uniform with respect to the
interval $I$.
To get the most general form of the estimates,
some additional function space trickery is required.
As before, a simple rescaling argument extends
estimate [26] to the case of data $F, u_0, u_1$, whose
spatial Fourier transforms are localized in the
annulus $2^{j-1} \le |\xi| \le 2^{j+1}$; we obtain


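$$2^{j(1/p+n/q)}\,\|u\|_{L^p_I L^q} \lesssim 2^{jn/2}\,\|u_0\|_{L^2} + 2^{j(n/2-1)}\,\|u_1\|_{L^2} + 2^{j(1/\tilde p'+n/\tilde q'-2)}\,\|F\|_{L^{\tilde p'}_I L^{\tilde q'}}$$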


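Finally, if the data are arbitrary, we may decompose
them as series of localized functions, and summing
the corresponding estimates we obtain the general
Strichartz estimates for the wave equation [25]: for
all $(p,q)$ and $(\tilde p,\tilde q)$ as in [27],

$$\|u\|_{L^p_I \dot B^{1/p+n/q}_{q,2}} \lesssim \|u_0\|_{\dot H^{n/2}} + \|u_1\|_{\dot H^{n/2-1}} + \|F\|_{L^{\tilde p'}_I \dot B^{1/\tilde p'+n/\tilde q'-2}_{\tilde q',2}} \qquad [28]$$

Here, given a decomposition $f = \sum_{j\in\mathbb{Z}} f_j$, the
homogeneous Besov and Sobolev norms are defined,
respectively, by the identities (obvious modification
for $r = \infty$):

$$\|f\|_{\dot B^{s}_{q,r}} = \Big( \sum_{j\in\mathbb{Z}} 2^{jsr}\,\|f_j\|_{L^q}^{r} \Big)^{1/r}, \qquad \|u\|_{\dot H^{s}} = \big\||D|^{s}u\big\|_{L^2} = \|u\|_{\dot B^{s}_{2,2}}.$$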


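It is easy to convert the estimates [28] into a form
that uses only the more traditional norms

$$\|f\|_{\dot H^{s}_{q}} = \big\||D|^{s} f\big\|_{L^q}, \qquad |D|^{s} f = \mathcal{F}^{-1}\big(|\xi|^{s}\hat f\,\big),$$

since by the Besov-Sobolev embedding we have

$$\dot B^{s}_{q,2} \subset \dot H^{s}_{q} \quad \text{for } 2 \le q < \infty, \qquad \dot B^{s}_{q,2} \supset \dot H^{s}_{q} \quad \text{for } 1 < q \le 2.$$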


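Notice that if we apply to the equation and the
data the operator $|D|^{\sigma} = \mathcal{F}^{-1}|\xi|^{\sigma}\mathcal{F}$, which commutes
with $\Box$, the Strichartz estimate [28] can be rewritten
in an apparently more general form:

$$\|u\|_{L^p_I \dot B^{1/p+n/q+\sigma}_{q,2}} \lesssim \|u_0\|_{\dot H^{n/2+\sigma}} + \|u_1\|_{\dot H^{n/2-1+\sigma}} + \|F\|_{L^{\tilde p'}_I \dot B^{1/\tilde p'+n/\tilde q'-2+\sigma}_{\tilde q',2}} \qquad [29]$$

In particular, it is possible to choose the indices in
such a way that no derivatives appear on $u$ and $F$:
this choice gives

$$\|u\|_{L^{p}(\mathbb{R}^{n+1})} \lesssim \|u_0\|_{\dot H^{1/2}} + \|u_1\|_{\dot H^{-1/2}} + \|F\|_{L^{p'}(\mathbb{R}^{n+1})}, \qquad p = \frac{2(n+1)}{n-1},$$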


which is the estimate originally proved by Strichartz. 
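
The index bookkeeping behind this special case can be checked mechanically. The sketch below is illustrative only: it assumes that the derivative orders in [29] are $1/p+n/q+\sigma$ on $u$, $n/2+\sigma$ on $u_0$, $n/2-1+\sigma$ on $u_1$ and $1/\tilde p'+n/\tilde q'-2+\sigma$ on $F$ (the values dictated by the rescaling argument), and the helper names are hypothetical.

```python
from fractions import Fraction


def conjugate(r):
    """Conjugate exponent r' defined by 1/r + 1/r' = 1 (finite r > 1)."""
    return r / (r - 1)


def derivative_orders(n, p, q, p_tilde, q_tilde, sigma):
    """Orders of differentiation on u, u0, u1 and F (assumed form of [29])."""
    one = Fraction(1)
    return {
        "u": one / p + Fraction(n) / q + sigma,
        "u0": Fraction(n, 2) + sigma,
        "u1": Fraction(n, 2) - 1 + sigma,
        "F": one / conjugate(p_tilde) + Fraction(n) / conjugate(q_tilde) - 2 + sigma,
    }


def classical_strichartz_orders(n):
    """Choice p = q = p~ = q~ = 2(n+1)/(n-1) with sigma = -(n-1)/2."""
    p = Fraction(2 * (n + 1), n - 1)
    sigma = -Fraction(n - 1, 2)
    return derivative_orders(n, p, p, p, p, sigma)


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, classical_strichartz_orders(n))
```

With this choice the orders on $u$ and $F$ vanish for every $n \ge 2$, while the data are measured in $\dot H^{1/2} \times \dot H^{-1/2}$.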


Global Large Waves 


As for ordinary differential equations (ODEs), the
local solutions constructed in Theorem 1 can be
extended to a maximal time interval $[0, T^*)$, and a
natural question arises: are these maximal solutions
global, that is, is $T^* = \infty$?

For generic nonlinearities and large data, the
answer is negative, and in a dramatic way: in general
the norm $\|u(t,\cdot)\|_{L^\infty}$ is unbounded as $t \uparrow T^* < \infty$.
The reason for this is simple: using the finite speed
of propagation, we can localize the equation and
work on a cone; then if we take constant functions
as initial data, the solution inside the cone does not
depend on $x$, and the equation restricted to the cone
effectively reduces to an ODE:

$$\Box u = f(u) \quad\Longrightarrow\quad y''(t) = f(y(t)), \qquad y(t) = u(t,x)$$

By this remark it is elementary to construct solutions 
of the IVP [6] that blow up in a finite time. 
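
A minimal numerical sketch of this mechanism, assuming the illustrative nonlinearity $f(y) = 6y^2$ (a choice made here, not taken from the text): then $y(t) = (1-t)^{-2}$ solves $y'' = f(y)$ with $y(0) = 1$, $y'(0) = 2$ and becomes unbounded as $t \uparrow 1$. The hypothetical helper `blowup_time` integrates the ODE with a classical fourth-order Runge-Kutta step and reports the first time $y$ exceeds a large threshold.

```python
def blowup_time(y0, v0, dt=1e-4, threshold=1e6, t_max=2.0):
    """Integrate y'' = 6*y**2 with RK4; return the first time y > threshold.

    Returns None if the threshold is not reached before t_max.
    The nonlinearity 6*y**2 is an illustrative assumption.
    """
    def rhs(y, v):
        # First-order system: y' = v, v' = f(y) = 6 y^2
        return v, 6.0 * y * y

    t, y, v = 0.0, y0, v0
    while t < t_max:
        k1y, k1v = rhs(y, v)
        k2y, k2v = rhs(y + 0.5 * dt * k1y, v + 0.5 * dt * k1v)
        k3y, k3v = rhs(y + 0.5 * dt * k2y, v + 0.5 * dt * k2v)
        k4y, k4v = rhs(y + dt * k3y, v + dt * k3v)
        y += dt * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        t += dt
        if y > threshold:
            return t
    return None


if __name__ == "__main__":
    # Exact solution 1/(1 - t)^2 blows up at t = 1.
    print(blowup_time(1.0, 2.0))
```

The reported time should lie just below the exact blow-up time $t = 1$, since the threshold is reached slightly before the singularity.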
This construction does not apply if the equation
has some positive conserved quantity. Indeed, consider
a general gauge-invariant equation

$$\Box u + g(|u|^2)u = 0, \qquad u(0,x) = u_0(x), \quad u_t(0,x) = u_1(x) \qquad [31]$$

for some smooth function $g(s)$. Writing $G(s) = \int_0^s g(\sigma)\,d\sigma$,
multiplying the equation by $u_t$, and
integrating over $\mathbb{R}^n$, it is easy to check that the
nonlinear energy

$$E(t) = \int \left[\,|u_t|^2 + |\nabla_x u|^2 + G(|u|^2)\,\right] dx = E(0) \qquad [32]$$

is constant in time, provided the solution $u$ is
smooth enough. When $G(s)$ has no definite sign,
we can proceed as above and construct solutions
that blow up in finite time; this is usually called the
“focusing” case. However, if we assume that
$G(s) \ge 0$ (“defocusing” case), the energy $E(t)$ is
non-negative. The corresponding ODE, which is
$y'' + g(y^2)y = 0$, has only global solutions, and one
may guess that also the solutions of [31] can be 
extended to global ones. 
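
The role of the conserved energy at the ODE level can be illustrated numerically. The sketch below assumes the model choice $g(s) = s$ (so $G(s) = s^2/2$ and the ODE is $y'' + y^3 = 0$), which is an assumption made here for illustration; the conserved quantity is then $y'^2 + G(y^2) = y'^2 + y^4/2$, and it bounds $|y|$ for all time. The helper name and the velocity-Verlet scheme are illustrative choices.

```python
def integrate_defocusing(y0, v0, dt=1e-3, steps=20000):
    """Velocity-Verlet integration of y'' = -y**3 (assumed model g(s) = s).

    Returns (initial energy, largest energy drift, largest |y| observed),
    with energy(y, v) = v**2 + y**4 / 2.
    """
    def accel(y):
        return -y ** 3

    def energy(y, v):
        return v * v + 0.5 * y ** 4

    y, v = y0, v0
    e0 = energy(y, v)
    max_drift = 0.0
    max_abs_y = abs(y)
    a = accel(y)
    for _ in range(steps):
        v_half = v + 0.5 * dt * a      # half kick
        y = y + dt * v_half            # drift
        a = accel(y)
        v = v_half + 0.5 * dt * a      # half kick
        max_drift = max(max_drift, abs(energy(y, v) - e0))
        max_abs_y = max(max_abs_y, abs(y))
    return e0, max_drift, max_abs_y


if __name__ == "__main__":
    print(integrate_defocusing(1.0, 0.0))
```

Since $y^4/2 \le E$, the solution stays in $|y| \le (2E)^{1/4}$, which is the ODE analogue of the a priori bound that the energy provides in the defocusing case.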

This innocent-looking guess turns out to be one of 
the most difficult problems of the theory of nonlinear 
waves, and is actually largely unsolved at present. 

The only general result for eqns [31] is Segal’s
theorem, stating that the IVP always has a global
weak solution:

Theorem 2 Let $g(s)$ be a $C^1$ non-negative function
on $[0,+\infty)$, write $G(s) = \int_0^s g(\sigma)\,d\sigma$ and assume that
for some constant $C$

$$s\,g(s^2) \le C\,G(s^2), \qquad \lim_{s\to+\infty} G(s) = +\infty \qquad [33]$$

Then for any $(u_0,u_1) \in H^1 \times L^2$ such that $G(|u_0|^2) \in L^1$,
the IVP [31] has a global solution $u(t,x)$ in the
sense of distributions, such that $u' \in L^\infty(\mathbb{R}, L^2(\mathbb{R}^n))$
and $F(u) \in L^\infty(\mathbb{R}, L^1(\mathbb{R}^n))$.


The proof (see Shatah and Struwe (1998)) is 
delicate but elementary in spirit: by truncating the 
nonlinear term, we can approximate the problem at 
hand with a sequence of problems with global 
solution; then the conservation law [32] yields 
some extra compactness, which allows us to extract 
a subsequence converging to a solution of the 
original equation. 

Thus we see that, despite its generality, this result 
does not shed much light on the difficulties of the 
problem. Indeed, the weak solution obtained might 
not be unique, nor smooth, and in these questions 
the real obstruction to solving [31] is hidden. 

Notice that in the one-dimensional case $n = 1$ the
solution is always unique and smooth when the data
are smooth, since in this case $E(t)$ controls the $L^\infty$-norm
of $u$. For higher dimensions $n \ge 2$, something
more can be proved if we assume that the nonlinear
term has a polynomial growth:

$$s\,g(s^2) = |s|^{p-1}s \quad \text{for } s \text{ large}, \quad p > 1 \qquad [34]$$

In particular, the defocusing wave equation with a
power nonlinearity

$$\Box u + |u|^{p-1}u = 0 \qquad [35]$$

has been studied extensively. Notice that when $p$ is
close to 1, the term $|u|^{p-1}u$ becomes singular near 0;
this introduces additional difficulties in the problem;
for this reason, it is better to consider a smooth term
as in [34].

We can summarize the best-known results concerning
[31] under [34] as follows. Let $p_0(n)$ be the
number

$$p_0(1) = p_0(2) = \infty, \qquad p_0(n) = 1 + \frac{4}{n-2} \quad \text{for } n \ge 3.$$

Then

• in the subcritical case $1 < p < p_0(n)$, for any data
$(u_0,u_1) \in H^1 \times L^2$, there exists a unique solution
$u \in C(\mathbb{R}; H^1)$ such that $u' \in C(\mathbb{R}; L^2)$;

• the same result holds in the critical case $p = p_0(n)$
for $n \ge 3$; and

• when $3 \le n \le 7$, $1 < p \le p_0(n)$, the solution is
smoother if the data are smoother. 
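
The classification by the power $p$ can be sketched as a small helper. It assumes the critical exponent reads $p_0(n) = 1 + 4/(n-2)$ for $n \ge 3$ and $p_0(1) = p_0(2) = \infty$; the function names are hypothetical.

```python
import math
from fractions import Fraction


def critical_power(n):
    """Critical exponent p0(n): infinite for n = 1, 2, else 1 + 4/(n - 2)."""
    if n < 1:
        raise ValueError("dimension must be a positive integer")
    if n <= 2:
        return math.inf
    return 1 + Fraction(4, n - 2)


def regime(n, p):
    """Classify a power p > 1 relative to p0(n)."""
    p0 = critical_power(n)
    if p < p0:
        return "subcritical"
    if p == p0:
        return "critical"
    return "supercritical"


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 6):
        print(n, critical_power(n))
```

For instance, in dimension $n = 3$ the cubic equation is subcritical and the quintic one is critical.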


These results have been achieved in the course of 
more than 30 years through the works of several 
authors (it is indispensable to mention at least the 
names of K Jörgens, I Segal, W Strauss, W von
Wahl, P Brenner, H Pecher, J Ginibre, G Velo, 
R Glassey and the more recent contributions of 
J Shatah, M Struwe, L Kapitanski, M Grillakis, 
omitting many others). Actually modern proofs are 
remarkably simple, and are based again on a 
variation of the fixed-point argument. Roughly 
speaking, the linear equation $\Box u + g(|v|^2)v = 0$
defines a mapping $v \mapsto u$; the Strichartz estimates
localized on a cone imply that this mapping is 
Lipschitz continuous in suitable spaces, the Lipschitz 
constant being estimated by the nonlinear energy of 
the solution restricted to the cone. In order to show 
that this mapping is actually a contraction, it is 
sufficient to prove that the localized energy tends to 
zero near the tip of the cone, that is, it cannot 
concentrate at a point. Once this is known, it is easy 
to continue the solution beyond any maximal time 
of existence and prove the global existence and 
uniqueness of the solution. 

In the supercritical case $p > p_0(n)$, very little is
known at present; there is some indication that the 
problem is much more unstable than in the 
subcritical case (Kumlin, Brenner, Lebeau), and 
there is some numerical evidence in the same 
direction. 


Global Small Waves 


It was noted already in the 1960s (Segal, Strauss)
that the equation in dimension $n \ge 2$

$$\Box u = f(u), \qquad u(0,x) = \varepsilon u_0(x), \quad u_t(0,x) = \varepsilon u_1(x), \qquad f(u) = O(|u|^{\gamma}) \ \text{for } u \sim 0$$



with small data can be considered as a perturbation of 
the free wave equation and admits global solutions. 
The phenomenon may be regarded as follows: the 
wave operator tends to spread waves and reduce their 
size (see [21]); the nonlinear term tends to concen- 
trate the peaks and make them higher, but at the same 
time it makes small waves smaller. If the rate of 
dispersion is fast enough, the initial data are small 
enough, and the power of the nonlinear term is high 
enough, the peaks have no time to concentrate, and 
the solution quickly flattens out to 0. Notice that in 
dimension 1 there is no dispersion, and this kind of 
mechanism does not occur. 

It was, however, F John who initiated the modern
study of this question by giving the complete picture
in dimension 3: for the IVP

$$\Box u = |u|^{\gamma}, \qquad u(0,x) = \varepsilon u_0(x), \quad u_t(0,x) = \varepsilon u_1(x)$$

he proved that, for fixed $u_0, u_1 \in C_0^\infty$,

• if $\gamma > 1+\sqrt{2}$ and $\varepsilon$ is small enough, the solution
is global and

• if $1 < \gamma < 1+\sqrt{2}$ and the data are not identically
zero, the solutions blow up in a finite time for all $\varepsilon$
(i.e., the $L^\infty$-norm is unbounded).

Later Schaeffer proved that blow-up occurs also at
the critical value $\gamma = 1+\sqrt{2}$.

W Strauss guessed the correct critical value for all
dimensions: $\gamma_0(n)$ is the positive root of the
algebraic equation

$$(n-1)\gamma^2 - (n+1)\gamma - 2 = 0$$

and conjectured that the same picture as in dimension
3 is valid for all dimensions $n \ge 2$.
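
As a quick numerical illustration, the critical value can be computed from the quadratic $(n-1)\gamma^2 - (n+1)\gamma - 2 = 0$, the standard equation defining the Strauss exponent (stated here as an assumption, since the displayed equation is not legible in the source). In dimension 3 it reproduces John's value $1+\sqrt{2}$. The function name is hypothetical.

```python
import math


def strauss_exponent(n):
    """Positive root of (n - 1) g^2 - (n + 1) g - 2 = 0, for n >= 2."""
    if n < 2:
        raise ValueError("the exponent is defined for dimensions n >= 2")
    a, b, c = n - 1, -(n + 1), -2
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, strauss_exponent(n))
```

The values decrease with the dimension, reflecting the stronger dispersion available in higher dimensions.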

Soon Sideris proved that, for $1 < \gamma < \gamma_0(n)$ and
quite general small data, one always has blow-up.
Also it was proved by Klainerman, Shatah, Christodoulou,
and others that the positive part of the
conjecture was true for $\gamma > \gamma_0(n)$, with a small gap
near the critical value. The gap was closed by
Georgiev, Lindblad, Sogge, who proved global existence
for all $\gamma > \gamma_0(n)$. We also mention that the
solution at the critical value $\gamma = \gamma_0(n)$ always seems to
blow up; this is settled for low dimension (Schaeffer,
Yordanov, Zhang and others), but the question is still
not completely clear for large dimensions.

This problem has spurred a great deal of 
creativity, eventually leading to very fruitful results: 
the different approaches have proved useful in a 
variety of problems, sometimes quite different from 
the original semilinear equation. We mention a few: 


• The weighted estimates of F John are estimates of
the solution in spacetime L? norms with weights 
of the form $(1+|t|+|x|)^{\alpha}\,(1+\big||t|-|x|\big|)^{\beta}$. An
extension of this method was also used in the
final complete proof of the conjecture.

• The vector field approach of S Klainerman. If we
regard energy estimates as norms generated by the 
plain derivatives, it is natural to extend them to 
more general norms generated by vector fields 
commuting, or quasicommuting, with the wave 
operator. The conservation of energy expressed in 
these generalized norms has a built-in decay that 
allows us to prove global existence of small waves. 
This circle of ideas led very far, and we might even 
regard Christodoulou and Klainerman’s proof of 
the stability of Minkowski space for the Einstein 
equation as an extreme consequence of this 
approach. 

• The normal forms of J Shatah. The idea is to
apply a nonlinear (and nonlocal) transformation
to the equation in order to increase the power $\gamma$.
This method is effective for a variety of equations,
including the semilinear wave, Klein-Gordon, and
Schrödinger equations.

• The conformal transform method of D Christodoulou.
The Penrose transform takes the wave
operator on $\mathbb{R}^{n+1}$ to the wave operator on a
bounded subset of $\mathbb{R} \times S^n$, the so-called Einstein
diamond (here $S^n$ is the $n$-dimensional sphere).
Thanks to the fact that a problem of global
existence is converted into a problem of local
existence, the proof reduces to showing that the
lifespan of the local solution becomes large
enough to cover the whole diamond when $\varepsilon$
decreases.


A similar theory has been developed for the more 
general semilinear equation 


$$\Box u = F(u,u'), \qquad F(u,u') = O(|u,u'|^{\gamma}) \ \text{for } u \sim 0$$

but the results are less complete. The general picture
is similar: for $\gamma \ge 2$ when $n \ge 4$, and for $\gamma \ge 3$ when
$n = 3$, one has global small solutions, while for $\gamma$
close to 1 one in general has blow-up.

A very interesting phenomenon in this context
was discovered by S Klainerman: some nonlinearities
with a special structure, called “null structure,”
behave better than the others. This structure
is clearly related to the wave operator, and in the
end it can be precisely explained in terms of
interaction of waves in phase space. We illustrate
these ideas in the most interesting special case.
Consider the equation in three dimensions

$$\Box u = F(D_t u, D_x u), \qquad F = O(|u'|^{\gamma}), \qquad n = 3$$



In the “cubic” case $\gamma = 3$, one has global existence
for all data small enough. On the other hand, in the
“quadratic” case $\gamma = 2$, it is possible to construct
examples where the solution blows up in a finite
time no matter how small the data. Now, assume
that the nonlinear term has the following structure:

$$F(u') = a\,Q_0(u') + \sum_{0 \le j < k \le 3} c_{jk}\,Q_{jk}(u') + O(|u'|^{3}) \qquad [36]$$

which is called a “null structure”. Here $a, c_{jk}$ are
constants, and the quadratic forms $Q$ are the
following:

$$Q_0(u') = |D_t u|^2 - |D_1 u|^2 - |D_2 u|^2 - |D_3 u|^2 \qquad [37]$$

$$Q_{0j}(u') = D_t u \cdot D_j u - D_j u \cdot D_t u, \qquad j = 1,2,3 \qquad [38]$$

$$Q_{jk}(u') = D_j u \cdot D_k u - D_k u \cdot D_j u, \qquad 1 \le j < k \le 3 \qquad [39]$$



Then the problem has a global solution for all small 
enough data. The extensions and applications of this 
idea are very wide (see the “Further reading” section 
for further information). Another situation where 
the null structure plays an important role is 
discussed in the next section. 


Low Regularity 


Theorem 1, although optimal in the classical framework,
is not satisfactory for a few reasons. From a
physicist’s point of view, requiring $n/2 + 1$ derivatives
of the data is not meaningful, since the
measurable quantities involve only low-order derivatives,
the most important one being the energy,
that is, the $H^1$-norm of the solution. Moreover, the
wave equation has a rich set of conserved quantities,
symmetries and decay properties which may be
useful to prove stronger results, and in particular the
global existence. However, many of these structures
appear only at a low-regularity level ($H^1$ or even
$L^2$); in order to exploit them it is essential to work
with low-regularity solutions.

As an example, if we were able to prove Theorem 1
for $k = 1$, then we could deduce that the local
solutions can be extended to global ones in all cases
when the $H^1$-norm is conserved. For instance, this
would allow us to solve globally the equations of
the form

$$\Box u + G'(|u|^2)u = 0, \qquad G(s) \ge 0$$

The problem of the lowest value of $s$ such that
a unique local solution exists in $H^s$ is quite
difficult, and still not completely solved. In order
to state the results we make precise the definition of
solution as follows: the IVP is said to be locally
well posed in $H^s$ if, for all $(u_0,u_1)$ in a bounded
set $B$ of $H^s \times H^{s-1}$, there exist a $T > 0$, a Banach
space $X_T$ (depending on $B$) continuously
embedded in $C([0,T]; H^s)$, and a unique solution
$u \in X_T$, such that the map $(u_0,u_1) \mapsto u$ is continuous
from $B$ to $X_T$.

For the wave equation with a power nonlinearity

$$\Box u = |u|^{p} \qquad [40]$$

or more generally

$$\Box u = F(u), \qquad F(0) = 0, \qquad |F(u) - F(v)| \le C\,|u-v|\,\big(|u|^{p-1} + |v|^{p-1}\big) \qquad [41]$$

the picture is almost complete. Indeed, by using the
scaling

$$t \mapsto \lambda t, \qquad x \mapsto \lambda x \qquad [42]$$

and the Lorentz transformation

$$t \mapsto \frac{\lambda t - x_1}{\sqrt{\lambda^2-1}}, \qquad x_1 \mapsto \frac{\lambda x_1 - t}{\sqrt{\lambda^2-1}}$$


it is possible to show by explicit constructions that

• the equation is not locally well posed for $p(n/2 - s) > n/2 + 2 - s$ (scaling) and

• the equation is not locally well posed for $p(n/4 + 1/4 - s) > n/4 + 5/4 - s$ (Lorentz).
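
Solving the two borderline relations $p(n/2 - s) = n/2 + 2 - s$ and $p(n/4 + 1/4 - s) = n/4 + 5/4 - s$ for $s$ gives the threshold regularities $s = n/2 - 2/(p-1)$ (scaling) and $s = (n+1)/4 - 1/(p-1)$ (Lorentz). The following sketch, with hypothetical function names, evaluates both in exact arithmetic and takes the larger one as the borderline index.

```python
from fractions import Fraction


def scaling_regularity(n, p):
    """Borderline s from p(n/2 - s) = n/2 + 2 - s, i.e. s = n/2 - 2/(p - 1)."""
    return Fraction(n, 2) - Fraction(2) / (Fraction(p) - 1)


def lorentz_regularity(n, p):
    """Borderline s from p((n+1)/4 - s) = (n+1)/4 + 1 - s."""
    return Fraction(n + 1, 4) - Fraction(1) / (Fraction(p) - 1)


def borderline_regularity(n, p):
    """Larger of the two borderline values for given dimension n and power p > 1."""
    return max(scaling_regularity(n, p), lorentz_regularity(n, p))


if __name__ == "__main__":
    for p in (2, 3, 5):
        print(p, scaling_regularity(3, p), lorentz_regularity(3, p))
```

In dimension 3 the two curves cross at $p = 3$, and the Lorentz curve reaches $s = 0$ at $p = (n+5)/(n+1) = 2$, the endpoint mentioned in the text.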


On the positive side, local well-posedness has been
almost fully proved in the complementary region of
indices, with the exception of a tiny spot near the
endpoint $s = 0$, $p = (n+5)/(n+1)$ where the problem
is still open (and the conjecture is that the
equation is ill posed for indices in that region).
These results are due to several authors; among
others we cite C Kenig, G Ponce, L Vega,
H Lindblad, C Sogge, L Kapitanski, and T Tao.

When the nonlinearity depends also on the first-order
derivatives of $u$, the situation becomes more
complex. In the general case, the best result
available is still the local existence theorem
(Theorem 1); the only possible refinement is the
use of fractional Sobolev spaces $H^s$, but in general
local solvability only holds for $s > n/2 + 1$. If we
assume that $F = F(u')$ is a quadratic form in the
first-order derivatives, a clever use of Strichartz
estimates allows us to prove local solvability down
to $s > n/2 + 1/2$ for $n \ge 3$ and $s > 7/4$ for $n = 2$
(Ponce and Sideris).

However, exactly as in the case of the small
nonlinear waves examined in the previous section, if
the nonlinear term has a null structure the result can
be improved. Indeed, when $F(u')$ is a combination of
the forms [37]-[39], then local solvability and
uniqueness can be proved for all $s > n/2$, as in the
case of a nonlinear term of the type $F(u)$. This result
is due to Klainerman, Machedon, and Selberg.
Again, the proof is based on a variation of the 
contraction method; the additional ingredient here 
is the use of suitable function spaces, which are 
the counterpart for the wave equation of the spaces 
used by Bourgain in the study of the nonlinear 
Schrödinger equation. The norm of these spaces is 
defined as follows: 


$$\|u\|_{H^{s,\theta}} = \big\| \langle\xi\rangle^{s}\,\langle |\tau| - |\xi| \rangle^{\theta}\,\hat u(\tau,\xi) \big\|_{L^2}$$

where $\langle\xi\rangle = (1+|\xi|^2)^{1/2}$ and $\hat u$ is the spacetime
Fourier transform of $u(t,x)$. The wave operator can
be regarded as a spacetime Fourier multiplier of the
form $\tau^2 - |\xi|^2 = (|\tau| - |\xi|)(|\tau| + |\xi|)$, and we see that
“inverting” the operator $\Box$ has a regularizing effect
in the scale of $H^{s,\theta}$ spaces, since it decreases both
$s$ and $\theta$ by one unit. Substantiating this formal
argument and complementing it with suitable esti- 
mates for the nonlinear term requires some hard work, 
which is contained in the theory of bilinear estimates 
developed by Klainerman and his school. 


See also: Evolution Equations: Linear and Nonlinear; 
Symmetric Hyperbolic Systems and Shock Waves; Wave 
Equations and Diffraction. 
Introduction


The method of separation of variables (SoV) is a
way of finding particular and general solutions of
certain types of partial differential equations (PDEs).
Its main idea is to consider the additive ansatz
$u(x) = \sum_i w_i(x^i,\alpha)$ or the multiplicative ansatz
$u(x) = \prod_i u_i(x^i,\alpha)$ for a solution of a PDE that
allows for reducing this PDE to a set of (uncoupled)
ordinary differential equations (ODEs) for the
unknown functions $w_i(x^i,\alpha)$ or $u_i(x^i,\alpha)$ of one
variable $x^i$, where $x = (x^1,\ldots,x^n)$. Locally, the
additive ansatz is, through the change of variables
$u(x) = \exp\big(\sum_i w_i(x^i,\alpha)\big)$, equivalent to the multiplicative
ansatz.

Many well-known equations of mathematical 
physics such as the heat equation, the wave 


equation, the Schrödinger equation, and the 
Hamilton-Jacobi equation are solved by separating 
variables in suitably chosen systems of coordinates. 


Fourier Method 


The SoV method can be attributed to Fourier
(1945), who solved the heat equation

$$\partial_t u = \partial_x^2 u \qquad [1]$$

for distribution of temperature $u(x,t)$ in a one-dimensional
metal rod (of length $L$) by looking
first for special solutions of the product type
$u(x,t) = X(x)T(t)$. This ansatz, substituted into [1],
reduces it to two ODEs: $\partial_t T = -k^2 T$ and $\partial_x^2 X = -k^2 X$
that can be solved by quadratures:

$$T(t) = A e^{-k^2 t}, \qquad X(x) = B\cos(kx) + C\sin(kx)$$

Due to linearity of [1], any formal linear combination
$u(x,t) = \sum_k c_k X_k(x) T_k(t)$ is again a solution of
the heat equation and can be used for solving an
initial boundary-value problem (IBVP). For instance,
in the case of the IBVP on the interval $0 < x < L$
and with zero boundary conditions

$$\partial_t u = \partial_x^2 u, \qquad 0 < t, \ 0 < x < L$$
$$u(0,t) = u(L,t) = 0, \qquad 0 < t$$
$$u(x,0) = f(x), \qquad 0 < x < L$$


only a countable set of values for the separation
constant $k$ is admissible: $k_n = n\pi/L$, $n = 1,2,\ldots$.
Then the general solution has the form of the
Fourier series

$$u(x,t) = \sum_{n=1}^{\infty} c_n \exp(-k_n^2 t)\,\sin(k_n x)$$

where the coefficients $c_n$ are given by the integrals

$$c_n = \frac{2}{L}\int_0^L f(x)\,\sin(k_n x)\,dx$$


The sequence of functions $\sin(k_n x)$ is complete on
the interval $[0,L]$. That means that any regular
(continuous and differentiable) initial data function
$f(x)$ such that $f(0) = f(L) = 0$ can be uniquely
expressed as an infinite convergent sum of the
orthogonal set of functions $\sin(k_n x)$. The study of
mathematical properties of the Fourier expansion 
gave rise to the classical theory of Fourier series and 
Fourier integrals. 
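
The Fourier method for the rod with zero boundary conditions can be sketched numerically as follows. This is an illustrative implementation, not part of the original text: it assumes the normalization $c_n = (2/L)\int_0^L f(x)\sin(k_n x)\,dx$ with $k_n = n\pi/L$, approximates the integrals by a midpoint rule, and truncates the series after a finite number of terms. The function names are hypothetical.

```python
import math


def sine_coefficients(f, L, n_terms, n_quad=2000):
    """Approximate c_n = (2/L) * integral_0^L f(x) sin(n pi x / L) dx, n = 1..n_terms."""
    h = L / n_quad
    coeffs = []
    for n in range(1, n_terms + 1):
        k = n * math.pi / L
        total = 0.0
        for i in range(n_quad):
            x = (i + 0.5) * h          # midpoint of the i-th subinterval
            total += f(x) * math.sin(k * x)
        coeffs.append(2.0 / L * total * h)
    return coeffs


def heat_solution(coeffs, L, x, t):
    """Truncated series u(x,t) = sum_n c_n exp(-k_n^2 t) sin(k_n x)."""
    u = 0.0
    for n, c in enumerate(coeffs, start=1):
        k = n * math.pi / L
        u += c * math.exp(-k * k * t) * math.sin(k * x)
    return u


if __name__ == "__main__":
    L = 1.0
    c = sine_coefficients(lambda x: math.sin(math.pi * x / L), L, n_terms=5)
    print(c)
    print(heat_solution(c, L, 0.5, 0.1))
```

For the initial datum $f(x) = \sin(\pi x/L)$ only the first mode is present, so the solution is the single decaying mode $e^{-(\pi/L)^2 t}\sin(\pi x/L)$.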


Separability of PDEs in General Setting 


A general setting for an additive separability of a
single, usually nonlinear, PDE has been developed
by Levi-Civita (1904) and by Kalnins and Miller
(1980) (see also Miller (1983)). Let

$$H(x^1,\ldots,x^n;\, u, u_i, u_{ij}, u_{ijk}, \ldots) = E, \qquad 1 \le i,j,k \le n \qquad [2]$$

be a finite-order PDE for an unknown function $u(x)$,
where $u_i = \partial_i u$, $u_{ij} = \partial_i\partial_j u$, etc., and $E$ is a
constant. A separable solution $u(x) = \sum_i W_i(x^i)$
satisfies the simpler equation

$$E = H(x;\, u, u_i, u_{ii}, \ldots) \equiv H[x,u] \qquad [3]$$

where all mixed derivatives $u_{ij}$, etc., disappear. If a
separable solution is admissible by eqn [2], then the
function $H(x; u, u_i, u_{ii}, \ldots)$ has to satisfy a set of
integrability conditions following from the total
derivatives of [3]. Let


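$$D_i = \partial_{x^i} + u_{i,1}\,\partial_{u} + u_{i,2}\,\partial_{u_{i,1}} + \cdots + u_{i,m_i+1}\,\partial_{u_{i,m_i}} = \tilde D_i + u_{i,m_i+1}\,\partial_{u_{i,m_i}}$$

(where $u_{i,1} = u_i$, $u_{i,j+1} = \partial_{x^i} u_{i,j}$, etc., and $m_i$ is the
largest number $l$ such that $\partial_{u_{i,l}} H \ne 0$) denote the
operator of total derivative with respect to (w.r.t.) $x^i$;
then, $D_i H[x,u] = 0$ or

$$u_{i,m_i+1} = -\frac{\tilde D_i H}{H_{u_{i,m_i}}}$$

where $H_{u_{i,m_i}} = \partial_{u_{i,m_i}} H$. The integrability conditions
$D_j u_{i,m_i+1} = 0$, $j \ne i$, give rise to a large set of
differential conditions to be satisfied by $H[x,u]$:

$$H_{u_{i,m_i}} H_{u_{j,m_j}}\,(\tilde D_i \tilde D_j H) + H_{u_{i,m_i} u_{j,m_j}}\,(\tilde D_i H)(\tilde D_j H) = H_{u_{j,m_j}}\,(\tilde D_i H)(\tilde D_j H_{u_{i,m_i}}) + H_{u_{i,m_i}}\,(\tilde D_j H)(\tilde D_i H_{u_{j,m_j}}) \qquad [4]$$
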

In general, the conditions [4] are restrictions for
both $H$ and the form of a particular separable
solution $u(x)$. If [4] is satisfied identically w.r.t. all
$u$, $u_{i,k}$, we say that the corresponding coordinate
system $x^i$ is a regular separable coordinate system;
then the PDE [3] admits a $(\sum_i m_i + 1)$-parameter
family of separable solutions. Most cases considered
in the literature are regular, since then the separable
solution is usually sufficiently general for solving
various IBVPs.

A given PDE, however, usually does not satisfy 
[4]; since these equations are not of tensorial type, 
the natural question arises if there exists a suitable 
change of coordinates y(x) such that the transformed 
PDE satisfies [4]. Such separation coordinates may 
or may not exist; it is usually very difficult to decide. 

Here and in what follows, we speak about 
separability of a single (scalar) PDE. The theory of 
separability of systems of PDEs is still not developed 
fully, although it is of relevance in the theory of 
Maxwell equations and of the Dirac equation. 

We present here the most classical part of SoV theory:
orthogonal separability of the Hamilton-Jacobi
equation for geodesic motions on Riemannian
manifolds.


Configurational Separation 
of Hamilton-Jacobi Equation 
on Riemannian Manifolds 


Around 1842, C G J Jacobi invented the method of
generating function for solving the canonical
Hamilton equations

$$\dot x = \frac{\partial H(x,y)}{\partial y}, \qquad \dot y = -\frac{\partial H(x,y)}{\partial x}, \qquad x = (x^1,\ldots,x^n), \quad y = (y_1,\ldots,y_n) \qquad [5]$$

where $H(x,y)$ is a Hamiltonian and dot denotes the
time derivative (Landau and Lifshitz 1976). In this
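method, one looks for a generating function $W(x,\alpha)$
of a canonical transformation

$$y = \frac{\partial W(x,\alpha)}{\partial x}, \qquad \beta = \frac{\partial W(x,\alpha)}{\partial \alpha}$$

that transforms Hamiltonian equations [5] into simple
equations for the new variables $\beta \in \mathbb{R}^n$, $\alpha \in \mathbb{R}^n$. Since
the transformation is canonical, the transformed
equations are again Hamiltonian with the new
Hamiltonian $\tilde H(\beta,\alpha) = H(x(\beta,\alpha), y(\beta,\alpha))$. If we
choose this transformation so that $\tilde H(\beta,\alpha) = \alpha_1$, then
the transformed Hamilton equations become

$$\dot\beta = \frac{\partial \tilde H(\beta,\alpha)}{\partial \alpha} = (1,0,\ldots,0), \qquad \dot\alpha = -\frac{\partial \tilde H(\beta,\alpha)}{\partial \beta} = 0$$

so that $\beta(t) = (t + \beta_{10}, \beta_{20}, \ldots, \beta_{n0})$,
$\alpha(t) = (\alpha_{10},\ldots,\alpha_{n0}) = \text{const.}$ and the solution $x(t), y(t)$ of
the Hamilton equations [5] is then given implicitly
by the equations

$$\beta(t) = \frac{\partial W(x(t),\alpha)}{\partial \alpha}, \qquad y(t) = \frac{\partial W(x(t),\alpha)}{\partial x}.$$
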
The generating function $W(x,\alpha)$ has to satisfy (identically
w.r.t. $(x,\alpha)$) the first-order nonlinear PDE

$$H\Big(x, \frac{\partial W(x,\alpha)}{\partial x}\Big) = \alpha_1 \qquad [6]$$

This equation is called the Hamilton-Jacobi
equation for the generating function $W(x,\alpha)$. It is
solved when its complete integral $W(x,\alpha)$,
depending on $n$ independent constants $\alpha$, is known;
complete means that

$$\det\Big[\frac{\partial^2 W}{\partial x^i\,\partial \alpha_j}\Big] \ne 0.$$

In general, it is very difficult to find solutions of [6].
The most important method is the method of
separation of variables when one looks for a
solution in the form $W(x,\alpha) = \sum_{k=1}^{n} W_k(x^k,\alpha)$
which is a sum of $n$ functions $W_k(x^k,\alpha)$, each
depending on a single variable $x^k$ and, possibly, all
constants $\alpha$. If the Hamilton-Jacobi equation [6]
admits such a solution, then integrating this
equation is reduced to integrating $n$ (uncoupled)
first-order ODEs for functions $W_k(x^k,\alpha)$. The
constants $\alpha_i$ then acquire the meaning of integration
constants. 
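
As a worked illustration of the procedure (an example added here, not taken from the original text), consider the one-dimensional harmonic oscillator $H(x,y) = \tfrac12 y^2 + \tfrac12 x^2$, with a single constant $\alpha = \alpha_1$ and on the branch where $y \ge 0$:

```latex
% Illustrative example: n = 1, H(x,y) = (y^2 + x^2)/2, branch y >= 0
\begin{align*}
\frac12\Big(\frac{dW}{dx}\Big)^{2} + \frac12 x^{2} &= \alpha
  && \text{(Hamilton--Jacobi equation)}\\
W(x,\alpha) &= \int \sqrt{2\alpha - x^{2}}\;dx
  && \text{(complete integral)}\\
\beta = \frac{\partial W}{\partial \alpha}
  &= \int \frac{dx}{\sqrt{2\alpha - x^{2}}}
   = \arcsin\frac{x}{\sqrt{2\alpha}}
  && \text{(new coordinate)}\\
\beta(t) = t + \beta_{0}
  \;&\Longrightarrow\;
  x(t) = \sqrt{2\alpha}\,\sin(t+\beta_{0}),\quad
  y(t) = \frac{\partial W}{\partial x} = \sqrt{2\alpha}\,\cos(t+\beta_{0}).
\end{align*}
```

Here $\alpha$ is the conserved energy and $\beta_0$ the initial phase, so the two integration constants are exactly the ones produced by the generating function.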


A separable solution $W(x,\alpha)$ of [6] exists whenever
the Hamiltonian $H(x,y)$ satisfies (identically)
the integrability conditions [4] which in this case
acquire the (nonlinear) form

$$L_{ij}(H) = \partial_i H\,\partial_j H\,\partial^i\partial^j H + \partial^i H\,\partial^j H\,\partial_i\partial_j H - \partial_i H\,\partial^j H\,\partial^i\partial_j H - \partial^i H\,\partial_j H\,\partial_i\partial^j H = 0 \quad \text{for all } i \ne j \qquad [7]$$

($\partial_i = \partial/\partial x^i$, $\partial^i = \partial/\partial y_i$) found by Levi-Civita (1904).
In classical mechanics the most important
Hamiltonians are natural ones:

$$H(x,y) = \frac12 \sum_{i,j=1}^{n} g^{ij}(x)\,y_i y_j + V(x) = G + V \qquad [8]$$

They are defined on the cotangent bundle $T^*Q$ of a
configurational Riemannian manifold $Q$ with the
metric tensor $g$. The function $G$ is the geodesic
Hamiltonian associated with the metric tensor $g$. For
such natural Hamiltonians, the Levi-Civita condition
$L_{ij}(G+V) = 0$ splits into the condition $L_{ij}(G) = 0$
and a condition for the potential $V(x)$. The condition
$L_{ij}(G) = 0$, depending solely on the kinetic energy
term, is thus a necessary condition for coordinates $x^i$
on $Q$ to be separation coordinates for [8].

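In the fundamental case of orthogonal separation
(i.e., when $g^{ij} = 0$ for $i \ne j$), the Levi-Civita conditions
$L_{ij}(G + V) = 0$ read

$$\partial_i\partial_j g^{kk} - (\partial_i \ln g^{jj})\,\partial_j g^{kk} - (\partial_j \ln g^{ii})\,\partial_i g^{kk} = 0, \qquad i \ne j \qquad [9]$$

$$\partial_i\partial_j V - (\partial_i \ln g^{jj})\,\partial_j V - (\partial_j \ln g^{ii})\,\partial_i V = 0, \qquad i \ne j \qquad [10]$$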
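
A small numerical check of the potential condition in polar coordinates $(r,\theta)$ on the plane may help fix ideas. The sketch assumes the condition has the form $\partial_i\partial_j V - (\partial_i \ln g^{jj})\,\partial_j V - (\partial_j \ln g^{ii})\,\partial_i V = 0$ and uses $g^{rr} = 1$, $g^{\theta\theta} = r^{-2}$; derivatives are approximated by central differences, and the function name is hypothetical.

```python
import math


def polar_potential_residual(V, r, theta, h=1e-3):
    """Left-hand side of the orthogonal Levi-Civita condition for V(r, theta).

    Uses the polar metric g^{rr} = 1, g^{theta theta} = r**-2 and
    central finite differences with step h.
    """
    V_r_theta = (V(r + h, theta + h) - V(r + h, theta - h)
                 - V(r - h, theta + h) + V(r - h, theta - h)) / (4.0 * h * h)
    V_theta = (V(r, theta + h) - V(r, theta - h)) / (2.0 * h)
    V_r = (V(r + h, theta) - V(r - h, theta)) / (2.0 * h)
    dr_ln_g_thth = -2.0 / r   # d/dr ln(r^-2)
    dth_ln_g_rr = 0.0         # g^{rr} = 1 does not depend on theta
    return V_r_theta - dr_ln_g_thth * V_theta - dth_ln_g_rr * V_r


if __name__ == "__main__":
    separable = lambda r, th: r ** 2 + math.cos(th) / r ** 2
    not_separable = lambda r, th: r * math.cos(th)
    print(polar_potential_residual(separable, 1.3, 0.7))
    print(polar_potential_residual(not_separable, 1.3, 0.7))
```

The first potential has the form $f_1(r) + f_2(\theta)/r^2$ and gives a residual that vanishes up to discretization error; the second, $V = r\cos\theta$, gives the nonzero value $-3\sin\theta$, so it is not separable in polar coordinates.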


The main questions arising here are 


1. What is the algebraic form of orthogonally 
separable Riemannian metrics? 

2. What is the form of separable coordinates on 
Riemannian manifolds? 


The first question is answered by the Stäckel 
theorem (Stäckel 1891) that provides an algebraic 
characterization of orthogonal separability of a 
natural Hamiltonian H=G + V. 


Theorem 1 The Hamilton-Jacobi equation for the
natural Hamiltonian

$$H = G + V = \frac12 \sum_{i=1}^{n} g^{ii}(x)\,y_i^2 + V(x)$$

is separable in the (orthogonal) coordinates $x$ if and
only if

(i) There exists a matrix $\Phi = [\varphi_{ij}(x^i)]$, $\det(\Phi) \ne 0$
(so that the row $i$ depends only on $x^i$) such that
$[g^{11},\ldots,g^{nn}]$ is the first row of the inverse
matrix $\Psi = \Phi^{-1}$.

(ii) The potential $V$ is of the form $V(x) = \sum_i g^{ii} f_i(x^i)$
where each $f_i(x^i)$ is a function depending on $x^i$
only.

Such a matrix $\Phi$ is called a Stäckel matrix.


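Proof If

$$[g^{11},\ldots,g^{nn}] \begin{bmatrix} \varphi_{11}(x^1) & \cdots & \varphi_{1n}(x^1) \\ \vdots & & \vdots \\ \varphi_{n1}(x^n) & \cdots & \varphi_{nn}(x^n) \end{bmatrix} = [1,0,\ldots,0] \qquad [11]$$

then the Hamilton-Jacobi equation for $H$ can be
written as

$$\frac12 \sum_{i=1}^{n} g^{ii}\Big(\frac{\partial W}{\partial x^i}\Big)^{2} + \sum_{i=1}^{n} g^{ii} f_i(x^i) = \alpha_1 = \alpha_1 \sum_{i=1}^{n} g^{ii}\varphi_{i1}(x^i) + \alpha_2 \sum_{i=1}^{n} g^{ii}\varphi_{i2}(x^i) + \cdots + \alpha_n \sum_{i=1}^{n} g^{ii}\varphi_{in}(x^i) \qquad [12]$$

This equation admits an additively separable
solution $W = \sum_i W_i(x^i)$, where the functions $W_i$
satisfy $n$ ODEs (separation equations):

$$\frac12\Big(\frac{dW_i}{dx^i}\Big)^{2} + f_i(x^i) = \alpha_1\varphi_{i1}(x^i) + \alpha_2\varphi_{i2}(x^i) + \cdots + \alpha_n\varphi_{in}(x^i), \qquad i = 1,\ldots,n \qquad [13]$$

By differentiating [13] w.r.t. $\alpha_j$, we get

$$\varphi_{ij}(x^i) = \frac{\partial W_i}{\partial x^i}\,\frac{\partial^2 W_i}{\partial x^i\,\partial\alpha_j}$$

and thus

$$\det\Big[\frac{\partial^2 W}{\partial x^i\,\partial\alpha_j}\Big] = \det(\Phi)\Big/\prod_{i=1}^{n} \frac{\partial W_i}{\partial x^i} \ne 0$$

so that $W = \sum_i W_i(x^i)$ is indeed a complete integral of
the Hamilton-Jacobi equation [12]. Conversely, if
$W = \sum_i W_i(x^i)$ is a complete integral of the Hamilton-Jacobi
equation [12], then by differentiating it w.r.t. $\alpha_j$
we get for $j = 1$

$$\sum_{i=1}^{n} g^{ii}\,\frac{\partial W_i}{\partial x^i}\,\frac{\partial^2 W_i}{\partial x^i\,\partial\alpha_1} = 1$$

and

$$\sum_{i=1}^{n} g^{ii}\,\frac{\partial W_i}{\partial x^i}\,\frac{\partial^2 W_i}{\partial x^i\,\partial\alpha_j} = 0$$

(for $j = 2,\ldots,n$), that is, the condition [11] for the
Stäckel matrix

$$\varphi_{ij} = \frac{\partial W_i}{\partial x^i}\,\frac{\partial^2 W_i}{\partial x^i\,\partial\alpha_j}.$$

Further, we see that

$$V = \alpha_1 - \frac12\sum_{i=1}^{n} g^{ii}\Big(\frac{\partial W_i}{\partial x^i}\Big)^{2} = \sum_{i=1}^{n} g^{ii}\Big[\alpha_1\varphi_{i1}(x^i) - \frac12\big(\partial_i W_i\big)^{2}\Big] = \sum_{i=1}^{n} g^{ii} f_i(x^i). \qquad □$$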


Remark 2 The Stäckel characterization of orthogo- 
nal separability is equivalent to Levi-Civita conditions 
[9] and [10]. It is in fact a solution of these conditions. 


Remark 3 With every Stäckel matrix, one can
relate a family of $n$ quadratic in momenta Hamiltonians
defined by $n$ rows of the inverse Stäckel matrix
$\Psi = \Phi^{-1} = [\psi_{ki}]$:

$$H_k = \frac12\sum_{i=1}^{n} \psi_{ki}\,y_i^2, \qquad k = 1,\ldots,n \qquad [14]$$

(so that $H_1 = G$). These Hamiltonians are linearly


and functionally independent; they Poisson- 
commute (so that they form a Liouville integrable 
system) and are all diagonal so that they have 
common eigenvectors. 
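
A concrete instance may be useful here. For polar coordinates $(r,\theta)$ on the plane, one admissible Stäckel matrix is $\Phi = \begin{bmatrix} 1 & -r^{-2} \\ 0 & 1 \end{bmatrix}$ (row 1 depends only on $r$, row 2 only on $\theta$); this particular matrix is an illustrative choice made here, and the helper names are hypothetical. The sketch checks that the first row of $\Phi^{-1}$ reproduces $(g^{rr}, g^{\theta\theta}) = (1, r^{-2})$, while the second row gives the additional quadratic integral $\tfrac12 y_\theta^2$.

```python
def stackel_matrix_polar(r):
    """Illustrative Stäckel matrix for polar coordinates (r, theta)."""
    return [[1.0, -1.0 / (r * r)],
            [0.0, 1.0]]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det],
            [-c / det, a / det]]


if __name__ == "__main__":
    r = 2.0
    phi = stackel_matrix_polar(r)
    psi = inverse_2x2(phi)
    print("first row of inverse (metric components):", psi[0])
    print("second row of inverse (second integral):", psi[1])
```

The second Hamiltonian $H_2 = \tfrac12 y_\theta^2$ is half the squared angular momentum, which Poisson-commutes with the geodesic Hamiltonian $H_1 = \tfrac12\big(y_r^2 + y_\theta^2/r^2\big)$.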


These properties are the main ingredients of an 
intrinsic (coordinate-independent) characterization 
of separable geodesic Hamiltonians G in terms of 
involutive Killing tensors that is due to works of 
Eisenhart (1934), Kalnins and Miller (1980), and 
Benenti (1997). 


Theorem 4 A necessary and sufficient condition
for the existence of an orthogonal additive separable
coordinate system $x$ for the Hamilton-Jacobi
equation of the geodesic Hamiltonian $H_1 = G$
on an $n$-dimensional (pseudo)-Riemannian manifold
is that there exist $n$ quadratic forms
$H_r = \sum_{i,j} a_{(r)}^{ij}(x)\,y_i y_j$ such that

(i) They all Poisson-commute: $\{H_r, H_s\} = 0$, $1 \le r, s \le n$.

(ii) The set $\{H_r\}_{r=1}^{n}$ is linearly independent.

(iii) There is a basis $\{\omega_{(j)}\}_{j=1}^{n}$ of $n$ simultaneous
eigenforms for all $H_r$.

If conditions (i)-(iii) are satisfied, then there exist
functions $g_j(x)$ such that $\omega_{(j)} = g_j\,dx^j$, $j = 1,\ldots,n$.

This theorem has been further refined
by Benenti (1997), who has shown that for separability
it is sufficient that $g^{ij}$ admits a single Killing
2-tensor with simple eigenvalues and normal eigenvectors.
He has also explained the role of ignorable
coordinates.

These results are key ingredients of an answer to
question (2). Eisenhart (1934), starting from the fact
that every separable geodesic Hamiltonian $H = G$
admits $n$ quadratic (w.r.t. momenta $y_i$) integrals of
motion, derived a set of nonlinear PDEs characterizing
separable Riemannian metrics. He has solved these
equations for spaces of constant curvature. This
solution is the basis of Kalnins and Miller’s
(1986) diagrammatic classification of all orthogonal
separation coordinates on $\mathbb{R}^n$ and the sphere $S^n$.
Separable coordinates on the Minkowski space $M^n$
have not been classified yet.

Since the work of Robertson (1927) and Eisenhart 
(1934), it is known that in R”, S” and, in general, in 
the space with diagonal Ricci tensor, the (additive) 
separability of Hamilton-Jacobi equation for the 
natural Hamiltonian H=G+V is equivalent 
to multiplicative separability of the stationary 
Schrodinger equation with the same potential V: 


(A + V(x))O(x) = EO(x) [15] 


where 


A= TES ag? a( der(g)8"ð 


is the Laplace-Beltrami operator. Usually, multi- 
plicative separated solutions ©(x)= [[?_, O,(x) is 
considered but the change of the dependent variable 
u=I|nO transforms it into an additive separable 
solution. If we restrict our considerations to ortho- 
gonal separation coordinates (g’ =0 for i Æ j), eqn 
[15] becomes 


n 


p 1 
"(u;; +u?) + — ô; 
5 (e (ui + ut) + Te 


i=1 


x ( det(g)s" ) m) + V(x) =E 


where u;= ðu, uj; =0;0;u. The integrability condi- 
tions [4] for regular separation lead to the Levi-Civita 
condition [9] on the components g” of the metric 
tensor, upon comparison of the coefficients at 17. 


The coefficients at u; yield the Robertson condition 


$$\partial_i\partial_j \ln\!\left(\sqrt{\det(g)}\,g^{ii}\right) = 0, \qquad i \neq j$$


and the constant terms in [4] give the Levi-Civita
equation [10], meaning that $V(x) = \sum_{i=1}^{n} g^{ii} f_i(x_i)$.

Eisenhart has shown that the Robertson condition is
equivalent to the requirement that the Ricci tensor is
diagonal, $R_{ij} = 0$ for $i \neq j$, in the variables $x$, so that the
Robertson condition is satisfied automatically in
Euclidean space, in spaces of constant curvature and in
Einstein spaces. Thus every orthogonal coordinate
system permitting multiplicative separation of the
Schrödinger equation corresponds to the Stäckel form.


Jacobi Problem of Separability 


In order to apply the separability theory to physical
Hamiltonians $H = \tfrac12 p^2 + V(q)$, $p = (p_1, \ldots, p_n)$,
$q = (q^1, \ldots, q^n)$, it is essential to solve the following
problem: “given a potential $V(q)$, decide if there
exists a point transformation $x(q)$ to some curvilinear
coordinates $x$ such that the Hamilton-Jacobi
equation associated with $H$ is separable in the coordinates
$x$, and if such a transformation exists, determine
it and solve the obtained Hamilton-Jacobi
equation.”

This problem was raised by Jacobi (1884) in
connection with the problem of finding geodesic
motions on a 3-axial ellipsoid. To solve it,
Jacobi introduced his “remarkable change
of coordinates” to the generalized elliptic coordinates
$x(q)$, defined through the zeros of the rational
function

$$1 + \sum_{i=1}^{n} \frac{(q^i)^2}{z - \lambda_i} = \frac{\prod_{j=1}^{n}(z - x^j)}{\prod_{k=1}^{n}(z - \lambda_k)} \tag*{[16]}$$

where the constants $\lambda_i > 0$ are all different. From
the graph of the left-hand side of [16], it is easy to
see that there are exactly $n$ simple, real zeros. For
given values of the elliptic coordinates $x^j$, the values of
$(q^i)^2$ are uniquely determined as residues at $\lambda_i$, while
the Cartesian coordinates $q$ are determined uniquely
only in each n-tant of $\mathbb{R}^n$.
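The construction of the elliptic coordinates can be illustrated numerically. The following is a minimal sketch, assuming that [16] has the form $1 + \sum_i (q^i)^2/(z - \lambda_i) = \prod_j (z - x^j)/\prod_k (z - \lambda_k)$, that the $\lambda_i$ are given in increasing order, and that every $q^i$ is nonzero. The function names `elliptic_coordinates` and `squared_cartesian` are illustrative only. The zeros are found by bisection, using the fact that the left-hand side decreases between consecutive poles, and the squares $(q^i)^2$ are then recovered as residues.

```python
def elliptic_coordinates(q, lam, iterations=200):
    """Zeros x^1 < ... < x^n of f(z) = 1 + sum_i q_i^2 / (z - lam_i).

    Assumes lam is strictly increasing and every q_i is nonzero, so that
    f decreases from a non-negative value to -infinity on each bracket.
    """
    def f(z):
        return 1.0 + sum(qi * qi / (z - li) for qi, li in zip(q, lam))

    r2 = sum(qi * qi for qi in q)
    # Brackets: [lam_1 - |q|^2, lam_1) and (lam_{i-1}, lam_i) for i >= 2.
    left_ends = [lam[0] - r2] + list(lam[:-1])
    roots = []
    for lo, hi in zip(left_ends, lam):
        for _ in range(iterations):
            mid = 0.5 * (lo + hi)
            if mid == lo or mid == hi:  # bracket exhausted in floating point
                break
            if f(mid) > 0.0:
                lo = mid   # f is decreasing, so the zero lies to the right
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    return roots


def squared_cartesian(x, lam):
    """Recover (q^i)^2 as the residue of prod(z - x_j)/prod(z - lam_k) at lam_i."""
    result = []
    for i, li in enumerate(lam):
        num = 1.0
        for xj in x:
            num *= li - xj
        den = 1.0
        for k, lk in enumerate(lam):
            if k != i:
                den *= li - lk
        result.append(num / den)
    return result


q = [1.0, 2.0]      # Cartesian point (sample values)
lam = [1.0, 3.0]    # parameters lambda_1 < lambda_2 (sample values)
x = elliptic_coordinates(q, lam)
q_squared = squared_cartesian(x, lam)
```

For this sample the zeros are the roots of $z^2 + z - 4$, they interlace with the $\lambda_i$, and the residues return only the squares of the $q^i$, which reflects the sign ambiguity mentioned above.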

The Jacobi elliptic coordinates play a pivotal role
in orthogonal separability on $\mathbb{R}^n$ and $S^n$, since they
are the mother of all other separation coordinates,
which can be obtained through proper and improper
degenerations of the $\lambda_i$'s. By using these coordinates
Jacobi solved not only the geodesic motion on the
ellipsoid but also the motion on the ellipsoid under
the action of the harmonic potential $V(q) = \tfrac12 q^2$. He
also found separation coordinates for a system
of three interacting particles on the line known
today as the Calogero system. In general, however,
Jacobi considered the problem of finding separation
coordinates for a given potential $V(q)$ to be very
difficult. In Vorlesungen über Dynamik, ch. 26, he
writes: “The main difficulty in integrating a given
differential equation lies in introducing convenient
variables, which there is no rule for finding. Therefore,
we must travel the reverse path and after
finding some notable substitution, look for problems
to which it can be successfully applied”. This
statement had a profound influence on the further
development of SoV theory, which concentrated on
characterizing separable Hamiltonians (as expressed
in terms of separation coordinates) and on describing
and classifying separation coordinates.

The original problem of Jacobi of finding separation
variables for a given natural Hamiltonian was
taken up by Rauch-Wojciechowski (1986),
who found a characterization of separable potentials
$V(q)$ in terms of Cartesian coordinates $q_i$. Its
invariant geometric form has been given by Benenti.
A complete criterion of separability that allows for
effective testing and calculation of separation
coordinates (if they exist) for $V(q)$ has been given
by Waksjö and Rauch-Wojciechowski (2003). This
criterion is directly applicable to the problem of
finding SoV for the Schrödinger equation.


Criterion of Separability for n=2 


The criterion of separability for n=2 can be read 
from the Bertrand—Darboux theorem. 


Theorem 5 (Bertrand-Darboux). For the
Hamiltonian


$$H = \tfrac12\left(p_1^2 + p_2^2\right) + V(q_1, q_2)$$


the following statements are equivalent:


(i) $H$ has a functionally independent integral of
motion $K$, $\{H, K\} = 0$, of the form


$$K = (a q_2^2 + b q_2 + c)\,p_1^2 + (a q_1^2 + \bar b q_1 + \bar c)\,p_2^2 + (-2a q_1 q_2 - \bar b q_2 - b q_1 + d)\,p_1 p_2 + k(q_1, q_2)$$


(ii) The potential $V(q_1, q_2)$ satisfies the following linear
second-order PDE with quadratic coefficients


$$0 = 2(a q_2^2 - a q_1^2 + b q_2 - \bar b q_1 + c - \bar c)\,\partial_1\partial_2 V + (-2a q_1 q_2 - \bar b q_2 - b q_1 + d)\,(\partial_2^2 V - \partial_1^2 V) + (6a q_2 + 3b)\,\partial_1 V - (6a q_1 + 3\bar b)\,\partial_2 V \tag*{[17]}$$


where $a, b, \bar b, c, \bar c, d$ are some constants and
$\partial_i = \partial/\partial q_i$.

(iii) The Hamilton-Jacobi equation for $H$ is separable
in one of the four orthogonal coordinate
systems in the plane: elliptic, parabolic, polar,
or Cartesian.


Remark 6 If the potential $V(q_1, q_2)$ is separable,
then it admits an integral of motion $K$ that is
quadratic w.r.t. momenta, and $V$ satisfies (identically
w.r.t. $q_1$, $q_2$) eqn [17] for certain values of the
undetermined constants $a, b, \bar b, c, \bar c, d$. Since the
coefficients at linearly independent expressions in $q_1$, $q_2$
have to be equal to zero, the parameters
$a, b, \bar b, c, \bar c, d$ have to satisfy a set of linear, algebraic,
homogeneous equations. If there is a nonzero
solution for $a, b, \bar b, c, \bar c, d$, then there exists an
integral of motion $K$, and separation coordinates
can be determined as characteristic variables for
equation [17].


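Example 7 Separable cases of the Hénon-Heiles
potential


$$V = \tfrac12\left(\omega_1 q_1^2 + \omega_2 q_2^2\right) + \alpha q_1^2 q_2 - \tfrac13\beta q_2^3$$


By substituting this form of $V$ into [17], we get two
sets of admissible solutions for the parameters $\alpha$, $\beta$,
$\omega_1$, $\omega_2$: (i) $\beta = -\alpha$, $\omega_1 = \omega_2$, with $V$ separable in
Cartesian coordinates rotated by $\pi/4$; (ii)
$\beta = -6\alpha$, $\omega_1$, $\omega_2$ arbitrary, with $V$ separable in
shifted parabolic coordinates. In case (ii) eqn [17]
becomes


$$2\left(q_2 - \frac{1}{4\alpha}(4\omega_1 - \omega_2)\right)\partial_1\partial_2 V + q_1\left(\partial_1^2 V - \partial_2^2 V\right) + 3\,\partial_1 V = 0$$


and in its characteristic coordinates defined by
$q_1 = \sqrt{\xi\eta}$, $q_2 = \tfrac12(\xi - \eta) + \frac{1}{4\alpha}(4\omega_1 - \omega_2)$ it
takes the form $(\xi + \eta)\,\partial_\xi\partial_\eta V + \partial_\xi V + \partial_\eta V = 0$, which is solved
by $V(\xi, \eta) = \frac{1}{\xi + \eta}\left[f(\xi) + g(\eta)\right]$, a potential separable
in the parabolic coordinates.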
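The two separable cases of Example 7 can be spot-checked numerically. The sketch below assumes eqn [17] in the form $0 = 2(A - B)\,\partial_1\partial_2 V + C(\partial_2^2 V - \partial_1^2 V) + (6aq_2 + 3b)\,\partial_1 V - (6aq_1 + 3\bar b)\,\partial_2 V$, where $A$, $B$, $C$ are the coefficients of $p_1^2$, $p_2^2$, $p_1p_2$ in $K$, and the Hénon-Heiles potential $V = \tfrac12(\omega_1 q_1^2 + \omega_2 q_2^2) + \alpha q_1^2 q_2 - \tfrac13\beta q_2^3$. The helper names and the sample values are illustrative choices; evaluating the residual at a few points is only a sanity check, not a proof of separability.

```python
def bertrand_darboux_residual(derivs, q1, q2, a, b, b_bar, c, c_bar, d):
    """Right-hand side of eqn [17] at the point (q1, q2).

    derivs(q1, q2) must return (V_1, V_2, V_11, V_12, V_22), the first and
    second partial derivatives of the potential.
    """
    V1, V2, V11, V12, V22 = derivs(q1, q2)
    A = a * q2**2 + b * q2 + c                       # coefficient of p1^2 in K
    B = a * q1**2 + b_bar * q1 + c_bar               # coefficient of p2^2 in K
    C = -2 * a * q1 * q2 - b_bar * q2 - b * q1 + d   # coefficient of p1*p2 in K
    return (2 * (A - B) * V12 + C * (V22 - V11)
            + (6 * a * q2 + 3 * b) * V1 - (6 * a * q1 + 3 * b_bar) * V2)


def henon_heiles_derivs(w1, w2, alpha, beta):
    """Derivatives of V = (w1 q1^2 + w2 q2^2)/2 + alpha q1^2 q2 - beta q2^3/3."""
    def derivs(q1, q2):
        V1 = w1 * q1 + 2 * alpha * q1 * q2
        V2 = w2 * q2 + alpha * q1**2 - beta * q2**2
        V11 = w1 + 2 * alpha * q2
        V12 = 2 * alpha * q1
        V22 = w2 - 2 * beta * q2
        return V1, V2, V11, V12, V22
    return derivs


points = [(0.3, -1.2), (1.7, 0.4), (-2.0, 2.5)]   # arbitrary sample points

# Case (i): beta = -alpha, w1 = w2; only d is nonzero (K is p1*p2 plus k).
case_i = henon_heiles_derivs(w1=1.5, w2=1.5, alpha=0.7, beta=-0.7)
res_i = [bertrand_darboux_residual(case_i, x, y, 0, 0, 0, 0, 0, 1)
         for x, y in points]

# Case (ii): beta = -6 alpha; b = 1 and c - c_bar = -(4 w1 - w2)/(4 alpha).
w1, w2, alpha = 1.0, 2.0, 0.5
shift = -(4 * w1 - w2) / (4 * alpha)
case_ii = henon_heiles_derivs(w1, w2, alpha, beta=-6 * alpha)
res_ii = [bertrand_darboux_residual(case_ii, x, y, 0, 1, 0, shift, 0, 0)
          for x, y in points]

# A generic choice (beta = alpha) fails the test for the same parameters.
generic = henon_heiles_derivs(w1, w2, alpha, beta=alpha)
res_generic = [bertrand_darboux_residual(generic, x, y, 0, 1, 0, shift, 0, 0)
               for x, y in points]
```

In both separable cases the residuals vanish up to rounding, while for the generic choice the residual reduces to $14\alpha q_1 q_2$, which is nonzero away from the coordinate axes.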


Effective Criterion of Separability 
for Arbitrary Dimension 


For n >2, a similar theorem characterizing separ- 
ability in generalized elliptic coordinates has been 
formulated by Rauch-Wojciechowski (1986). 


Theorem 8 (Elliptic Bertrand-Darboux). For a
natural Hamiltonian $H = \tfrac12 p^2 + V(q)$, the
following statements are equivalent:


(i) $H$ has $n$ global, functionally independent and
involutive integrals of motion, $\{H, K_i\} = 0$,
$\{K_i, K_j\} = 0$, $i, j = 1, \ldots, n$, having the form


$$K_i = \sum_{r \neq i} \frac{l_{ir}^2}{\lambda_i - \lambda_r} + p_i^2 + k_i(q), \qquad l_{ir} = q_i p_r - q_r p_i \tag*{[18]}$$

(ii) The potential $V$ satisfies the following system of
linear second-order PDEs


$$(\lambda_i - \lambda_j)\,\partial_i\partial_j V - S_{ij}(2 + R)V = 0, \qquad i, j = 1, \ldots, n,\ i \neq j \tag*{[19]}$$


$$\lambda_i\,\partial_i S_{jk} V + \lambda_j\,\partial_j S_{ki} V + \lambda_k\,\partial_k S_{ij} V = 0, \qquad \text{all } i, j, k \text{ different} \tag*{[20]}$$


where $S_{ij} = q_i\partial_j - q_j\partial_i$, $R = \sum_{r=1}^{n} q_r\partial_r$.

(iii) The Hamilton-Jacobi equation for $H$ is separable
in the generalized elliptic coordinates [16]
with parameters $\lambda_i$.


Remark 9 Equations [19]-[20] follow from the
compatibility conditions requiring that the mixed derivatives of
$k_i(q)$, calculated from the conditions $\{H, K_i\} = 0$, are
equal. This leads to the overdetermined system [19]-[20]
of PDEs for $V(q)$. Equations [19]-[20] are not linearly
independent, but we keep both sets in the
formulation of this theorem because eqns [19] give rise
to the basic Bertrand-Darboux equations [21] used in
the criterion of separability, while eqns [20] give rise to the
cyclic Bertrand-Darboux equations [22] used for testing
the level of spherical symmetry in the potential.


For testing elliptic separability of any given potential
$V(q)$, it is necessary to introduce into eqns [19] and
[20] the freedom of choice of the Euclidean reference
frame (as described by the Euclidean transformation
$\tilde q = A^t(q - b)$, $A \in SO(n)$, $b \in \mathbb{R}^n$). By substituting it
into [19]-[20], omitting tildes and summing over one
of the indices, we obtain the new equations


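$$0 = \sum_{k=1}^{n}\Big((\alpha q_i q_k + \beta_i q_k + \beta_k q_i + \gamma_{ik})\,\partial_k\partial_j V - (\alpha q_j q_k + \beta_j q_k + \beta_k q_j + \gamma_{jk})\,\partial_k\partial_i V\Big) + 3\Big((\alpha q_i + \beta_i)\,\partial_j V - (\alpha q_j + \beta_j)\,\partial_i V\Big) \tag*{[21]}$$


$$0 = \sum_{l=1}^{n}\Big((\gamma_{il} q_j - \gamma_{jl} q_i)\,\partial_k\partial_l V + (\gamma_{jl} q_k - \gamma_{kl} q_j)\,\partial_i\partial_l V + (\gamma_{kl} q_i - \gamma_{il} q_k)\,\partial_j\partial_l V\Big) \tag*{[22]}$$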


with the new coefficients $\alpha$, $\beta_i$, $\gamma_{ij}$, which are
unconstrained even though the orthogonal matrix $A$
satisfies the quadratic algebraic constraint $AA^t = I$.
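As a consistency check, the basic Bertrand-Darboux equations can be specialized to $n = 2$ and compared with eqn [17] of Theorem 5. The sketch below assumes that [21] has the form written in the comment (with the first-order terms outside the sum over $k$) and that the constants of [17] are $a, b, \bar b, c, \bar c, d$; under these assumptions the two equations coincide after a linear relabelling of the constants.

```latex
% Assumed form of the basic Bertrand-Darboux equations [21] (i < j):
%   0 = \sum_k [ (\alpha q_i q_k + \beta_i q_k + \beta_k q_i + \gamma_{ik}) \partial_k\partial_j V
%              - (\alpha q_j q_k + \beta_j q_k + \beta_k q_j + \gamma_{jk}) \partial_k\partial_i V ]
%       + 3 [ (\alpha q_i + \beta_i) \partial_j V - (\alpha q_j + \beta_j) \partial_i V ]
\begin{align*}
n = 2,\ (i,j) = (1,2):\quad
0 ={}& (\alpha q_1^2 - \alpha q_2^2 + 2\beta_1 q_1 - 2\beta_2 q_2
        + \gamma_{11} - \gamma_{22})\,\partial_1\partial_2 V \\
     &+ (\alpha q_1 q_2 + \beta_1 q_2 + \beta_2 q_1 + \gamma_{12})\,
        (\partial_2^2 V - \partial_1^2 V) \\
     &+ 3(\alpha q_1 + \beta_1)\,\partial_2 V
      - 3(\alpha q_2 + \beta_2)\,\partial_1 V .
\end{align*}
% Term-by-term identification with eqn [17]:
\[
\alpha = -2a, \qquad \beta_1 = -\bar b, \qquad \beta_2 = -b, \qquad
\gamma_{12} = d, \qquad \gamma_{11} - \gamma_{22} = 2(c - \bar c).
\]
```

Only the difference $\gamma_{11} - \gamma_{22}$ enters for $n = 2$, in agreement with the fact that [17] depends on $c$ and $\bar c$ only through $c - \bar c$.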

Theorem 8 provides the following test of elliptic 
separability for a potential V(q) given in Cartesian 
coordinates. 


1. Insert $V(q)$ into the Bertrand-Darboux equations
[21]. This gives a system of linear, homogeneous,
algebraic equations for the unknown parameters
$\alpha$, $\beta_i$, $\gamma_{ij}$. If $\alpha = 0$, then $V(q)$ is not separable in
elliptic coordinates.

2. If $\alpha \neq 0$, set $b = -\alpha^{-1}\beta$, $S = bb^t - \alpha^{-1}\gamma$ and
diagonalize $S$: $S = A\,\mathrm{diag}(\lambda_1, \ldots, \lambda_n)\,A^t$. If some
eigenvalues $\lambda_i$ coincide, then $V(q)$ is not separable
in elliptic coordinates. Otherwise $V(q)$ is
separable in the elliptic coordinates
$x = (x^1, \ldots, x^n)$ given by


$$1 + \sum_{i=1}^{n} \frac{(\tilde q_i)^2}{z - \lambda_i} = \frac{\prod_{j=1}^{n}(z - x^j)}{\prod_{k=1}^{n}(z - \lambda_k)}$$


(compare with [16]), where $q = A\tilde q + b$, with $b$ and
$A$ found as above.


If $\alpha = 0$, $\beta \neq 0$, then there exists a similar
algorithm for separability in generalized parabolic
coordinates, and for $\alpha = 0$, $\beta = 0$, with $\gamma$ not a multiple
of the identity $I$, we have separability in Cartesian
coordinates if all $\lambda_i$ are different. To give an idea of what
happens when degenerations occur, consider the case
$\alpha = 0$, $\beta = 0$. Then the Bertrand-Darboux equations
[21] are Euclidean equivalent to the canonical form
$(\lambda_i - \lambda_j)\,\partial_i\partial_j V = 0$ and, if all $\lambda_i$ are different,
the equations $\partial_i\partial_j V = 0$ imply that $V(q)$ is a
sum of functions of one variable only:
$V(q) = \sum_{i=1}^{n} V_i(q^i)$.

The main problem is to handle all possible
degenerations when certain $\lambda_i$'s coincide. Let
$\lambda_1 = \cdots = \lambda_j < \lambda_{j+1} < \cdots < \lambda_n$, where $1 < j < n$.
Then $V(q) = V_1(q^1, \ldots, q^j) + V_{j+1}(q^{j+1}) + \cdots + V_n(q^n)$,
which means that the variables $q^{j+1}, \ldots, q^n$
separate off while the potential $V_1(q^1, \ldots, q^j)$ has to
be tested again on $\mathbb{R}^j$ with the use of eqns [21].
Degenerations for $\alpha \neq 0$ or $\beta \neq 0$ are more complicated
and the cyclic Bertrand-Darboux equations
[22] have to be used. They unfold the level of
spherical degeneracy of spheres and embedded subspheres.
A complete analysis of all possible degenerations
is technical. It requires considering all possible
degenerations of the sequences $\lambda_1 < \cdots < \lambda_n$ and of
the related equations [21]-[22] for the potential $V(q)$.
It has been proved by Waksjö and
Rauch-Wojciechowski (2003) that there is a one-to-one
correspondence between all possible sets of PDEs
[21]-[22] characterizing separable potentials and all
possible types of Riemannian metrics (in the Kalnins
and Miller (1986) classification of all separable
coordinates on $\mathbb{R}^n$ and $S^n$), so that no completely
separable case is missed. Most importantly,
separation coordinates are always determined (if they
exist) after finitely many steps by a sequential use of
the Bertrand-Darboux and cyclic Bertrand-Darboux
equations [21]-[22].
Separation of Eigenvalue Problems


Eigenvalue problems (in a given domain $D$) of the
form

$$\Delta w(q) + \lambda\rho(q)w(q) = 0, \qquad \rho > 0 \tag*{[23]}$$
(where $\Delta$ is the Laplace operator) arise when separating
the wave equation $\rho(q)u_{tt} = \Delta u$ and the diffusion
equation $\rho(q)u_t = \Delta u$ (Courant and Hilbert 1989).
The multiplicative ansatz $u(q, t) = w(q)g(t)$ yields
eqn [23] together with $\ddot g = -\lambda g$ or $\dot g = -\lambda g$. The problem
[23] is also used for solving the inhomogeneous
equation $\Delta u = f$ with the zero boundary condition
$u|_{\partial D} = 0$. In general, the properties of the eigenvalues $\lambda_i$
and of the corresponding eigenfunctions $w_i$ of the
problem [23] depend on the regularity requirements for
$w_i$ and on the boundary conditions at $\partial D$.

For the zero boundary conditions $w(q)|_{\partial D} = 0$, one
seeks a nontrivial ($w \neq 0$) solution having continuous
first- and second-order derivatives in the
region $D$. General theorems (Courant and Hilbert 1989)
state that for such problems there exists a growing
sequence $\{\lambda_i\}_{i=1}^{\infty}$ of positive eigenvalues such that
$\lambda_i \to \infty$ as $i$ increases, and that there is a related
sequence of normalized eigenfunctions $\sqrt{\rho}\,w_1$,
$\sqrt{\rho}\,w_2, \ldots$ that form a complete weighted-orthogonal
system of functions (in the sense that
$\int_D \rho\,w_i w_j = \delta_{ij}$, $i, j = 1, 2, \ldots$), so that every
regular initial function $v(q)$ with $v(q)|_{\partial D} = 0$ may be
expanded in terms of the eigenfunctions $w_m$ in an
absolutely and uniformly convergent series
$v(q) = \sum_{m=1}^{\infty} c_m w_m(q)$ with
$c_m = \int_D \rho\,v\,w_m$. This makes it possible to express a
solution of the IBVP for the wave or for the diffusion
equation with zero boundary conditions,


$$\rho(q)u_{tt} = \Delta u \quad\text{respectively}\quad \rho(q)u_t = \Delta u, \qquad u(q, t)|_{\partial D} = 0, \quad u(q, t = 0) = v(q) \tag*{[24]}$$


as a convergent infinite series $u(q, t) =
\sum_{m=1}^{\infty} c_m w_m(q)\,g_m(t)$, where the $g_m(t)$ satisfy
$\ddot g_m = -\lambda_m g_m$ respectively $\dot g_m = -\lambda_m g_m$. Further
determination of properties of the eigenfunctions $w_m$ is possible only in
special domains $D$, when the problem [23] can be
reduced to one-dimensional eigenvalue problems by
separating variables in some suitable coordinates.


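Example 10 Consider the spherical domain
$r^2 = x^2 + y^2 + z^2 < 1$. Equation [23] with $\rho = 1$
attains in the spherical coordinates $(r, \varphi, \theta)$ the form


$$\Delta w + \lambda w = \frac{1}{r^2}\left[\partial_r\left(r^2\,\partial_r w\right) + \frac{1}{\sin\theta}\left(\partial_\varphi\left(\frac{1}{\sin\theta}\,\partial_\varphi w\right) + \partial_\theta\left(\sin\theta\,\partial_\theta w\right)\right) + r^2\lambda w\right] = 0$$


The ansatz $w = f(r)Y(\theta, \varphi)$ gives the separated
equation


$$\frac{(r^2 f')'}{f} + \lambda r^2 = -\frac{1}{Y\sin\theta}\left(\partial_\varphi\left(\frac{1}{\sin\theta}\,\partial_\varphi Y\right) + \partial_\theta\left(\sin\theta\,\partial_\theta Y\right)\right)$$


so that both of its sides must be equal to a constant $\kappa$.
Continuity of $Y$ implies that it has to be periodic in $\varphi$
(with period $2\pi$) and regular at $\theta = 0$, $\theta = \pi$. This can
only be satisfied for $\kappa = n(n + 1)$. The left-hand side of
the above equation then yields $(r^2 f')' - n(n + 1)f +
\lambda r^2 f = 0$. Solutions that are regular at $r = 0$ are the
Bessel functions $(1/\sqrt{r})\,J_{n+(1/2)}(\sqrt{\lambda}\,r)$. The equation for
spherical harmonics


$$\frac{1}{\sin\theta}\left(\partial_\varphi\left(\frac{1}{\sin\theta}\,\partial_\varphi Y\right) + \partial_\theta\left(\sin\theta\,\partial_\theta Y\right)\right) + n(n + 1)Y = 0$$


can be further multiplicatively separated by assuming
$Y = \Theta(\theta)\Phi(\varphi)$. The function $P(z = \cos\theta) = \Theta(\theta)$
then satisfies the Legendre equation


$$\left((1 - z^2)P'(z)\right)' + \left(n(n + 1) - \frac{\sigma}{1 - z^2}\right)P(z) = 0$$
$P(z)$ is regular at $z = \pm 1$ only when $\sigma = k^2$,
$k = 0, 1, 2, \ldots$. The function $\Phi(\varphi)$ then satisfies
$\Phi'' = -k^2\Phi$ with solutions $\Phi_k(\varphi) = a_k\cos(k\varphi) + b_k\sin(k\varphi)$.
The full solution of the eigenvalue problem
$\Delta w + \lambda w = 0$ has the form of an infinite series


$$w = \sum_{n=0}^{\infty} \frac{1}{\sqrt{r}}\,J_{n+(1/2)}\left(\sqrt{\lambda}\,r\right)\left[a_{n,0}P_n(\cos\theta) + \sum_{k=1}^{n}\left(a_{n,k}\cos(k\varphi) + b_{n,k}\sin(k\varphi)\right)P_{n,k}(\cos\theta)\right]$$


where the constants $\lambda_{n,m}$, $m = 1, 2, \ldots$, are determined
by the transcendental equation $J_{n+(1/2)}(\sqrt{\lambda}) = 0$ that
follows from the boundary condition $u(q, t)|_{\partial D} = 0$.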
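The lowest angular mode gives a case in which everything can be written in elementary functions. The following worked computation is a sketch for $n = 0$ only, assuming the radial equation $(r^2 f')' - n(n+1)f + \lambda r^2 f = 0$ and the standard closed form of the Bessel function of order $1/2$.

```latex
% Radial problem for n = 0 on the unit ball, using J_{1/2}(x) = \sqrt{2/(\pi x)}\,\sin x.
\begin{align*}
f_0(r) &= \frac{1}{\sqrt{r}}\,J_{1/2}\!\left(\sqrt{\lambda}\,r\right)
        = \sqrt{\frac{2}{\pi\sqrt{\lambda}}}\;\frac{\sin\!\left(\sqrt{\lambda}\,r\right)}{r}, \\
(r^2 f_0')' &= r\,(r f_0)'' = -\lambda\, r^2 f_0
        \quad\Longrightarrow\quad (r^2 f_0')' + \lambda r^2 f_0 = 0, \\
f_0(1) = 0 &\iff \sin\sqrt{\lambda} = 0
        \iff \lambda_{0,m} = (m\pi)^2, \qquad m = 1, 2, \ldots
\end{align*}
```

For $n \geq 1$ the zeros of $J_{n+(1/2)}$ are no longer equally spaced, so the corresponding $\lambda_{n,m}$ have to be computed numerically.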


Almost all BVPs that can be reduced to one-dimensional
eigenvalue problems may be considered
as a special or limiting case of the Lamé problem,
where the boundary $\partial D$ is given by pieces of confocal
quadrics corresponding to some separation coordinates.
If $D \subset \mathbb{R}^3$ is a
domain defined by parametrizing $q$ with the elliptic
coordinates $x_i$, $i = 1, 2, 3$, given by [16], then the eigenvalue
problem $\Delta w + \lambda w = 0$ splits into three one-dimensional
equations of the form


$$\varphi(s)\,Y''(s) + \tfrac12\varphi'(s)\,Y'(s) + (\lambda s + \mu)\,Y(s) = 0$$
where $\varphi(s) = 4(s - e_1)(s - e_2)(s - e_3)$ and $e_i$ are parameters
of the elliptic coordinates. This is the Lamé
equation; its solutions define new transcendental functions
that depend on the choice of the constants $\lambda$, $\mu$.

The approach presented here extends to diverse
modifications such as vibrations with a forcing term,
$\Delta w(q) + \lambda w(q) = f(q)$, vibrations of a nonhomogeneous
medium, $\Delta w(q) + \lambda\rho(q)w(q) = 0$, and the stationary
Schrödinger equation $\Delta w(q) + V(q)w(q) = \lambda w(q)$,
whenever the functions $\rho(q)$, $f(q)$, $V(q)$ are compatible
with the separation coordinates.

Separation equations for the second-order BVP
are the source of one-dimensional eigenvalue problems
of the Sturm-Liouville type


$$(p(s)u')' - q(s)u + \lambda\rho(s)u = 0$$


with singularities that may occur at the endpoints of
the fundamental domain. The majority of orthogonal
polynomials and special functions appearing in mathematical
physics are solutions of Sturm-Liouville
problems.

In the complex domain, the study of singularities
of Laurent series solutions of the same equations led
to the development of the theory of linear ODEs with
singular points of the Fuchs class and the Bôcher
class.


Constructive Approach to Separability 
of Liouville Integrable Systems 


In the constructive approach to separability, one
considers simultaneously all Hamilton-Jacobi equations
following from a set of $n$ functionally
independent, commuting integrals $H_1(x, y), \ldots,
H_n(x, y)$, $\{H_i, H_j\} = 0$, that define a Liouville
integrable system (Sklyanin 1995).

One starts with the separation equations, a set
of $n$ decoupled ODEs for the functions $W_i(x_i, \alpha)$,
each depending on one variable $x_i$ and on the parameters
$\alpha \in \mathbb{R}^n$:


$$f_i\!\left(x_i,\ y_i = \frac{\partial W_i(x_i, \alpha)}{\partial x_i},\ \alpha\right) = 0, \qquad i = 1, \ldots, n \tag*{[25]}$$


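Assume that the dependence on $\alpha_i$ is essential (i.e.,
that $\det(\partial f_i/\partial\alpha_j) \neq 0$), so that we can resolve eqns
[25] w.r.t. $\alpha_i$, obtaining $\alpha_i = H_i(x, y)$ for some functions
$H_i$. If the functions $W_i(x_i, \alpha)$ solve [25] identically
w.r.t. $x$ and $\alpha$, then the function $W(x, \alpha) =
\sum_{i=1}^{n} W_i(x_i, \alpha)$ is simultaneously an additively
separable solution of eqns [25] and of the equations


$$H_i\!\left(x,\ y = \frac{\partial W}{\partial x}\right) = \alpha_i, \qquad i = 1, \ldots, n \tag*{[26]}$$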


since solving [25] w.r.t. $\alpha$ is a purely algebraic
operation. We can treat eqns [26] as a set of
simultaneously separable (in the canonical variables
$(x, y)$) Hamilton-Jacobi equations related to the
Hamiltonians $H_i$. Assume now that


$$\det\!\left(\frac{\partial^2 W}{\partial x_i\,\partial\alpha_j}\right) \neq 0$$


i.e., that $W$ is a complete integral for [26]. Then the
Hamiltonians $H_i(x, y) = \alpha_i$ Poisson-commute, since
the $\alpha_i$ can be treated as new canonical variables
obtained by the canonical transformation $(x, y) \to
(\beta, \alpha)$ given by


$$y = \frac{\partial W(x, \alpha)}{\partial x}, \qquad \beta = \frac{\partial W(x, \alpha)}{\partial\alpha}$$


Thus, any set of separation relations [25] that is solvable
w.r.t. $\alpha$ defines a Liouville integrable system.

If we perform a canonical transformation from
$(x, y)$ to new variables $(q, p)$, then the new set of
commuting Hamiltonians $\tilde H_i(q, p) = H_i(x(q, p),
y(q, p))$ is also called separable.

The main problem for any given set of commuting
Hamiltonians $H_i(q, p)$ is to decide if there exists a
canonical transformation $(q, p) \to (x, y)$ to the
separation variables $(x, y)$ so that the related
Hamilton-Jacobi equations [26] are simultaneously
separable. An answer to this problem is known for
integrable Hamiltonians solvable through the spectral
curve method (Sklyanin 1995) and for the whole
class of natural Hamiltonians discussed earlier.

This approach brings a new, wider perspective to the
classical separability mechanism stated in the Stäckel
theorem. It contains the majority of known separable
Hamiltonian systems. For example, if we specify the
separation relations [25] to be affine in $\alpha_i$,


$$\sum_{k=1}^{n} f_{ik}(x_i, y_i)\,\alpha_k = g_i(x_i, y_i), \qquad i = 1, \ldots, n \tag*{[27]}$$


then [27] are called generalized Stäckel separability
conditions. To recover the explicit form of the Hamiltonians
$H_k = \alpha_k$, it is enough to solve relations [27] w.r.t.
$\alpha_k$. It has been proved that the Stäckel Hamiltonians in
[27] constitute a quasi-bi-Hamiltonian chain. If we
further specify relations [27] by assuming that the functions
$f_{ik}$ do not depend on $y_i$ and the functions $g_i$ are
quadratic in $y_i$, then we obtain the classical Stäckel
separability conditions (see Theorem 1)


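$$\sum_{k=1}^{n} f_{ik}(x_i)\,\alpha_k = \tfrac12 g_i(x_i)\,y_i^2 + h_i(x_i) \tag*{[28]}$$


that can be solved for $\alpha_k$, yielding


$$\alpha_k = \frac12\sum_{i=1}^{n}\left(\Phi^{-1}\right)_{ki}\left(y_i^2 + \frac{2h_i(x_i)}{g_i(x_i)}\right)$$
that is, the Stäckel Hamiltonians [14] with the Stäckel
matrix $\Phi = [\varphi_{ik}]$, where $\varphi_{ik} = f_{ik}(x_i)/g_i(x_i)$. By specifying
[28] further, we obtain the separation relations


$$x_i^{n-1}\alpha_1 + x_i^{n-2}\alpha_2 + \cdots + \alpha_n = \tfrac12 g(x_i)\,y_i^2 + h(x_i)$$


which give the so-called Benenti systems associated
with conformal Killing tensors and cofactor pair
systems.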
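To make the step "solve the separation relations for the $\alpha_k$" concrete, here is a worked sketch for $n = 2$. It assumes separation relations of the Benenti type with $g = 1$, namely $x_i\alpha_1 + \alpha_2 = \tfrac12 y_i^2 + h(x_i)$ for $i = 1, 2$; the abbreviation $E_i$ is introduced only for this illustration.

```latex
% Two separation relations, linear in (\alpha_1, \alpha_2), with E_i := y_i^2/2 + h(x_i):
%   x_1 \alpha_1 + \alpha_2 = E_1,   x_2 \alpha_1 + \alpha_2 = E_2.
\begin{align*}
H_1 = \alpha_1 &= \frac{E_1 - E_2}{x_1 - x_2}
   = \frac{\tfrac12\,(y_1^2 - y_2^2) + h(x_1) - h(x_2)}{x_1 - x_2}, \\
H_2 = \alpha_2 &= \frac{x_1 E_2 - x_2 E_1}{x_1 - x_2}
   = \frac{x_1\left(\tfrac12 y_2^2 + h(x_2)\right) - x_2\left(\tfrac12 y_1^2 + h(x_1)\right)}{x_1 - x_2}.
\end{align*}
% Check: x_1 H_1 + H_2 = \frac{x_1 E_1 - x_1 E_2 + x_1 E_2 - x_2 E_1}{x_1 - x_2} = E_1,
% and similarly x_2 H_1 + H_2 = E_2.
```

Both Hamiltonians are quadratic in the momenta, as expected for the classical Stäckel case, and the only algebraic operation involved is inverting the $2 \times 2$ matrix with rows $(x_i, 1)$.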

Relations [27], with $g_i(x_i, y_i)$ depending exponentially
on the momenta $y_i$, contain several well-known
systems such as the periodic Toda lattice, the KdV
dressing chain, and the Ruijsenaars-Schneider system.
Relations with $g_i$ cubic in the momenta $y_i$ yield
stationary flows of the Boussinesq hierarchy and integrable
systems on the loop algebra $sl(3)$.


See also: Boundary-Value Problems for Integrable 
Equations; Calogero—Moser—Sutherland Systems of 
Nonrelativistic and Relativistic Type; Elliptic Differential 
Equations: Linear Theory; Evolution Equations: Linear 
and Nonlinear; Integrable Systems: Overview; Multi- 
Hamiltonian Systems; Ordinary Special Functions; 
Recursion Operators in Classical Mechanics; Toda Lattices. 
Separatrices are asymptotic manifolds in dynamical
systems. However, this term is usually applied in the
case of a phase space of small dimension, where
these manifolds are hypersurfaces. In the context of
separatrix splitting, manifolds asymptotic to hyperbolic
tori are usually considered, where tori of
dimension 0 and 1 are called equilibrium positions
and periodic trajectories, respectively. A separatrix
can be stable (asymptotic as $t \to +\infty$) or unstable
(asymptotic as $t \to -\infty$).
In this article we consider the case of systems with
finite-dimensional phase space. Basically we deal with
nonautonomous Hamiltonian systems $2\pi$-periodic in
time. However, it is useful to keep in mind
that the cases of autonomous Hamiltonian systems
and symplectic maps are dynamically the same. Some
results for non-Hamiltonian perturbations will also
be presented. Hamiltonian systems with one-and-a-half
or two degrees of freedom, as well as
area-preserving two-dimensional maps, are especially
important for us because the results on separatrix
splitting in this case are clearer and more complete.
The dynamics in such systems is essentially the same.
Below we call these systems two dimensional.

We assume that all systems are at least C°-smooth. 
Poincaré Integral


Consider a Liouville integrable Hamiltonian system. 
Then any separatrix either goes to infinity or joins 
two hyperbolic tori. From a dynamical point of 
view, the latter case is more interesting. If these tori 
are different, the situation is called heteroclinic, 
otherwise homoclinic. Poincaré was the first to 
notice that after a generic perturbation stable and 
unstable separatrices become different submanifolds 
of the phase space. This phenomenon is called the 
separatrix splitting. 

Poincaré (1987) considered perturbations of
separatrices homoclinic to a periodic solution in a
Hamiltonian system with one-and-a-half degrees of
freedom. In this case the system has the form


$$\dot x = \frac{\partial H}{\partial y}, \qquad \dot y = -\frac{\partial H}{\partial x}, \qquad (x, y) \in D \subset \mathbb{R}^2 \tag*{[1]}$$


where $D$ is an open domain and
$$H(x, y, t, \varepsilon) = H_0(x, y) + \varepsilon H_1(x, y, t) + O(\varepsilon^2) \tag*{[2]}$$


We assume that $H$ is $2\pi$-periodic in $t$ and $\varepsilon$ is a
small parameter. Let $(x_0, y_0) \in D$ be an equilibrium
position for the unperturbed ($\varepsilon = 0$) system:
$\operatorname{grad} H_0(x_0, y_0) = 0$. Without loss of generality,
$(x_0, y_0) = 0$. In the extended phase space $D \times \mathbb{T}$
($\mathbb{T} = \{t \bmod 2\pi\}$ is a one-dimensional torus), instead
of the equilibrium we have a $2\pi$-periodic solution
$0 \times \mathbb{T}$. Suppose that the equilibrium (and therefore
the periodic solution) is hyperbolic and the corresponding
stable and unstable separatrices $\Lambda^{s,u}$ are
doubled: $\Lambda^s = \Lambda^u = \Lambda$. Let $\gamma(t)$ be a natural
parametrization of $\Lambda$, that is, $(x(t), y(t)) = \gamma(t)$ is a
solution of eqns [1]. In the extended phase space,
we have the asymptotic surface


$$\{(x, y, t) : (x, y) = \gamma(t + \tau),\ \tau \in \mathbb{R},\ t \in \mathbb{T}\}$$


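For small values of $\varepsilon$, the perturbed system has a
hyperbolic periodic solution $(\sigma_\varepsilon(t), t)$, $\sigma_\varepsilon(t) = O(\varepsilon) \in D$,
and the separatrices


$$\{(\gamma_\varepsilon^{s,u}(t, \tau), t)\}, \qquad \gamma_0^{s,u}(t, \tau) = \gamma(t + \tau)$$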


Since the addition to the Hamiltonian of a function
depending only on $t$ and $\varepsilon$ does not change the
dynamics, without loss of generality we can assume
that $H_1(0, 0, t) = 0$. Hence the Poincaré integral


$$P(\tau) = \int_{-\infty}^{+\infty} H_1(\gamma(t + \tau), t)\,dt$$
converges. The function $P$ carries all information on
the separatrix splitting in the first approximation
in $\varepsilon$.


Periodicity of $H_1$ in $t$ implies $2\pi$-periodicity of
$P(\tau)$. There is also the following obvious identity:


$$\frac{dP(\tau)}{d\tau} = \int_{-\infty}^{+\infty} \{H_0, H_1\}(\gamma(t + \tau), t)\,dt$$


where $\{\,,\}$ is the Poisson bracket.


Melnikov Integral 


Melnikov (1963) considered general (not necessarily
Hamiltonian) perturbations, $2\pi$-periodic in $t$:


$$\dot x = \frac{\partial H_0}{\partial y} + \varepsilon v_1(x, y, t) + O(\varepsilon^2), \qquad \dot y = -\frac{\partial H_0}{\partial x} + \varepsilon v_2(x, y, t) + O(\varepsilon^2)$$
In this case, information on the separatrix splitting
in the first approximation is contained in the
Melnikov integral


$$M(\tau) = \int_{-\infty}^{+\infty} vH_0(\gamma(t + \tau), t)\,dt$$
where $vH_0 = v_1\,\partial H_0/\partial x + v_2\,\partial H_0/\partial y$.

Note that if the vector field $v$ is Hamiltonian and
$H_1$ is the corresponding Hamiltonian function, we
have $vH_0 = -\{H_0, H_1\}$. Hence in Hamiltonian
systems we have $M(\tau) = -dP(\tau)/d\tau$.

A multidimensional version of the Melnikov
integral is presented in Wiggins (1988).


Geometric Meaning of M(7) 


Let $\Gamma_T$ be a compact piece of the unperturbed
separatrix


$$\Gamma_T = \{(x, y) \in D : (x, y) = \gamma(t),\ |t| \leq T\}$$


Then for any $T > 0$ there exists a neighborhood $U$ of
$\Gamma_T$ and symplectic coordinates (time-energy coordinates)
$\tau, h$ on $U$ such that the section of the perturbed
separatrices $\Lambda_\varepsilon^{s,u}$ by the plane $\{t = 0\}$ is as follows:


$$\Lambda_\varepsilon^{s,u}\big|_{t=0} = \{(\tau, h) : h = h_\varepsilon^{s,u}(\tau)\}$$
where 


1. $h_\varepsilon^u(\tau) = O(\varepsilon^2)$,
2. $h_\varepsilon^s(\tau) = -\varepsilon M(\tau) + O(\varepsilon^2)$.

Moreover, let $g_\varepsilon^t : D \to D$ be the phase flow of
the perturbed system. The map $g_\varepsilon^{2\pi}$ is called the
Poincaré map. The following statement holds.

3. For any two points $z_0, z_1 \in U$ such that $z_1 = g_\varepsilon^{2\pi}(z_0)$,
let $(\tau_0, h_0)$ and $(\tau_1, h_1)$ be their time-energy
coordinates. Then


$$\tau_1 = \tau_0 + 2\pi + O(\varepsilon), \qquad h_1 = h_0 + O(\varepsilon)$$





Figure 1 Perturbed separatrices in time—energy coordinates. 


Existence of such coordinates has several
corollaries.

• If $P$ is not identically constant, the separatrices
split and this splitting is of the first order in $\varepsilon$.

• Let $\tau_*$ be a simple zero of $M$. Then the
perturbed separatrices intersect transversally at
a point $z_*(\varepsilon)$ with time-energy coordinates
$(\tau_* + O(\varepsilon), O(\varepsilon^2), t = 0)$. Such a point $z_*(\varepsilon)$ is
called a transversal homoclinic point. It generates
a doubly asymptotic solution in the
perturbed system.

• Consider a lobe domain $\mathcal{L}(\tau_*, \varepsilon)$ bounded by two
segments of separatrices on the section $\{t = 0\}$
(see Figure 1). Let another “corner point” of the
lobe $\mathcal{L}(\tau_*, \varepsilon)$ correspond to the simple zero $\tau_*'$ of
$M$. Then the symplectic area of $\mathcal{L}(\tau_*, \varepsilon)$ equals


$$A_{\mathcal{L}}(\tau_*, \varepsilon) = -\varepsilon\int_{\tau_*}^{\tau_*'} M(\tau)\,d\tau + O(\varepsilon^2)$$


A Standard Example 


Consider as an example a pendulum with a periodically
oscillating suspension point. The Hamiltonian
of the system can be presented in the form


$$H(x, y, t, \varepsilon) = \tfrac12 y^2 + \Omega^2\cos x + \varepsilon\theta(t)\cos x \tag*{[3]}$$


where $\Omega$ is the “internal” frequency of the pendulum.
The function $\theta$ is $2\pi$-periodic in time. Hence the
frequency of the suspension point oscillation equals
1. In this case, the unperturbed homoclinic solution
$\gamma(t)$ can be computed explicitly. In particular,


$$\cos(x(t)) = 1 - 2\cosh^{-2}(\Omega t)$$
Therefore, $P(\tau) = \int_{-\infty}^{+\infty}\theta(t)\left(\cos(x(t + \tau)) - 1\right)dt$. For
example, if $\theta(t) = \cos t$, we have


$$P(\tau) = -\frac{2\pi\cos\tau}{\Omega^2\sinh(\pi/(2\Omega))}$$


In this case, different lobes have the same area


$$A_{\mathcal{L}} = \frac{4\pi\varepsilon}{\Omega^2\sinh(\pi/(2\Omega))} + O(\varepsilon^2)$$
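The closed form of the Poincaré integral for this example can be compared with direct quadrature. The sketch below assumes $\theta(t) = \cos t$ and $\cos x(t) = 1 - 2\cosh^{-2}(\Omega t)$, so that the integrand is $-2\cos t/\cosh^2(\Omega(t + \tau))$; the function names, the truncation width and the step count are illustrative choices, not part of the original text. The first-order lobe area is obtained from the zeros $\tau = 0$ and $\tau = \pi$ of $M = -dP/d\tau$.

```python
import math


def poincare_integral(tau, omega, half_width=40.0, steps=20000):
    """Trapezoidal approximation of
    P(tau) = int cos(t) * (cos x(t + tau) - 1) dt,
    with cos x(t) = 1 - 2 / cosh(omega * t)**2.
    """
    # The integrand is centred at t = -tau and decays like exp(-2*omega*|t + tau|),
    # so a window of half-width half_width/omega around -tau is sufficient.
    lo = -tau - half_width / omega
    hi = -tau + half_width / omega
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        t = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.cos(t) * (-2.0 / math.cosh(omega * (t + tau)) ** 2)
    return total * h


def poincare_closed_form(tau, omega):
    """P(tau) = -2 pi cos(tau) / (omega^2 sinh(pi / (2 omega)))."""
    return -2.0 * math.pi * math.cos(tau) / (omega**2 * math.sinh(math.pi / (2.0 * omega)))


def lobe_area_first_order(eps, omega):
    """eps * |P(0) - P(pi)|: M = -dP/dtau has simple zeros at tau = 0 and tau = pi."""
    return eps * abs(poincare_closed_form(0.0, omega) - poincare_closed_form(math.pi, omega))


numeric = poincare_integral(0.3, 1.0)
exact = poincare_closed_form(0.3, 1.0)
area = lobe_area_first_order(0.01, 1.0)
```

Because the integrand is analytic and decays exponentially, the plain trapezoidal rule already agrees with the closed form to many digits, and `area` agrees with $4\pi\varepsilon/(\Omega^2\sinh(\pi/(2\Omega)))$.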
Multidimensional Case


The multidimensional generalization of the Poincaré-Melnikov
construction is strongly connected to the
concept of a (partially) hyperbolic torus. Let
$(M, \omega, H)$ be a Hamiltonian system on the $2m$-dimensional
symplectic manifold $(M, \omega)$.

An invariant $n$-torus $N \subset M$ ($0 < n < m$) is called
hyperbolic if there exist coordinates $x, y, z$ on $M$ in a
neighborhood of $N$ such that


1. $y = (y_1, \ldots, y_n)$, $x = (x_1, \ldots, x_n) \bmod 2\pi$,
$z = (z^s, z^u)$, $z^{s,u} = (z_1^{s,u}, \ldots, z_l^{s,u})$;

2. $\omega = dy \wedge dx + dz^u \wedge dz^s$;

3. $N = \{(x, y, z) : y = 0,\ z = 0\}$; and

4. $H = \langle \nu, y\rangle + \tfrac12\langle Ay, y\rangle + \langle z^u, \Omega(x)z^s\rangle + O_3(y, z)$,


where $\nu \in \mathbb{R}^n$ is a constant vector, $A$ is a constant
$n \times n$ matrix, $\Omega$ is an $l \times l$ matrix such that
$\Omega(x) + \Omega^T(x)$ is positive definite for any $x \bmod 2\pi$,
the symbol $O_3$ denotes terms of order not less than
3, and $\langle a, b\rangle = \sum_j a_j b_j$.

If $\det A \neq 0$, the torus is called nondegenerate. If $\nu$
is Diophantine, that is, for some $\alpha, \beta > 0$ and any
$0 \neq k \in \mathbb{Z}^n$


$$|\langle \nu, k\rangle| \geq \alpha|k|^{-\beta}$$


the torus $N$ is called Diophantine. The coordinates
$(x, y, z)$ are called canonical for $N$.

Now suppose that the Hamiltonian $H$ depends
smoothly on the parameter $\varepsilon$:


$$H = H_0 + \varepsilon H_1 + O(\varepsilon^2)$$


and for $\varepsilon = 0$ the system is Liouville integrable with
the commuting first integrals $F_1, \ldots, F_m$:


$$\{F_j, F_k\} = 0, \qquad 1 \leq j, k \leq m$$


Let $M_0 = \{F_1 = \cdots = F_m = 0\} \subset M$ be their zero
common level and let $N \subset M$ be an $n$-dimensional
nondegenerate Diophantine hyperbolic torus. The
torus $N$ generates the invariant Lagrangian asymptotic
manifolds $\Lambda^{s,u} \subset M$. Suppose that the separatrices
are doubled, that is, there is a Lagrangian
manifold $\Lambda \subset \Lambda^s \cap \Lambda^u$.

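Consider the perturbed Hamiltonian $H = H_0 +
\varepsilon H_1 + O(\varepsilon^2)$. The torus $N$ as well as the asymptotic
manifolds $\Lambda^{s,u}$ survive the perturbation. Let $N_\varepsilon$ be the
corresponding hyperbolic torus in the perturbed
system and $\Lambda_\varepsilon^{s,u}$ its asymptotic manifolds: $N_\varepsilon$ and
$\Lambda_\varepsilon^{s,u}$ depend smoothly on $\varepsilon$ and $N_0 = N$, $\Lambda_0^{s,u} = \Lambda^{s,u}$.

Let the function $\chi(x)$ satisfy the equation


$\langle \nu, \partial\chi(x)/\partial x\rangle + H_1(x, 0, 0)$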


1 
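This equation has a smooth solution unique up to an
additive constant.

Consider a solution of the unperturbed Hamiltonian
equations $\gamma(t) \subset \Lambda$. Let $I'_j$, $I''_{jl}$, $1 \leq j, l \leq m$ be the
following quantities (Treschev 1994):


$$I'_j = \lim_{T\to+\infty}\left(-\int_{-T}^{T}\{F_j, H_1\}(\gamma(t))\,dt + \{F_j, \chi\}(\gamma(T)) - \{F_j, \chi\}(\gamma(-T))\right)$$


$$I''_{jl} = \lim_{T\to+\infty}\left(-\int_{-T}^{T}\{F_j, \{F_l, H_1\}\}(\gamma(t))\,dt + \{F_j, \{F_l, \chi\}\}(\gamma(T)) - \{F_j, \{F_l, \chi\}\}(\gamma(-T))\right)$$


The numbers $I'_j$, $I''_{jl}$ play the role of the first and
second derivatives of the Poincaré integral at some
point.

If any of the quantities $I'_j$, $I''_{jl}$ does not vanish,
the asymptotic manifolds $\Lambda_\varepsilon^{s,u}$ split. Moreover, suppose
that $I'_1 = \cdots = I'_m = 0$ and the rank of the matrix
$(I''_{jl})$ equals $m - 1$. Then for small values of $\varepsilon$, the
manifolds $\Lambda_\varepsilon^s$ and $\Lambda_\varepsilon^u$ intersect transversally on the
energy level at points of the solution $\gamma_\varepsilon(t)$, where
$\gamma_\varepsilon \to \gamma$ as $\varepsilon \to 0$.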


Poincaré Integral in Multidimensional 
Case 


Suppose that the Hamiltonian from the previous
section equals

$$H(x, y, u, v, t, \varepsilon) = H_0(y, u, v) + \varepsilon H_1(x, y, u, v, t) + O(\varepsilon^2)$$

Here $x = (x_1, \ldots, x_n) \bmod 2\pi$, $y = (y_1, \ldots, y_n) \in \mathbb{R}^n$ and
$(u, v) \in \mathbb{R}^2$. The symplectic structure is $\omega = dy \wedge dx +
dv \wedge du$.


We assume that in the unperturbed integrable
system the variables separate:


$$H_0(y, u, v) = F(y) + f(u, v)$$


and the system with one degree of freedom and
Hamiltonian $f$ has a hyperbolic equilibrium
$(u, v) = 0$ with a homoclinic solution $\gamma(t)$. Any torus


$$N(y^0) = \{(x, y, u, v, t) : y = y^0,\ u = v = 0\}$$


is a hyperbolic torus of the unperturbed system with
frequency vector


$$\begin{pmatrix} \nu(y^0) \\ 1 \end{pmatrix}, \qquad \nu(y) = \partial F/\partial y$$


Suppose that $N = N(0)$ is Diophantine and nondegenerate.
Then in the perturbed system there is a
hyperbolic torus $N_\varepsilon$ smooth in $\varepsilon$, $N_0 = N$. Consider
the Poincaré function


$$P(\xi, \tau) = \int_{-\infty}^{+\infty}\Big(H_1(\xi + \nu(t + \tau), 0, \gamma(t + \tau), t) - H_1(\xi + \nu(t + \tau), 0, 0, 0, t)\Big)\,dt$$


Obviously, $P(\xi, \tau)$ is $2\pi$-periodic in $\xi$ and $\tau$.

If $P$ is not identically constant, the asymptotic
surfaces of $N_\varepsilon$ split in the first approximation in $\varepsilon$.
Nondegenerate critical points of $P$ correspond to
transversal homoclinic solutions of the perturbed
system.

Other results on the splitting of multidimensional 
asymptotic manifolds are presented in Arnol’d et al. 
(1988) and Lochak et al. (2003). 


Exponentially Small Separatrix Splitting 


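If in the unperturbed (integrable) system there are no
asymptotic manifolds, they can appear after a
perturbation. Consider, for example, a perturbation
of a real-analytic Liouville integrable system near a
simple resonance:
$$\dot x = \frac{\partial H}{\partial y}, \qquad \dot y = -\frac{\partial H}{\partial x}, \qquad x \in \mathbb{T}^m, \quad y \in D \subset \mathbb{R}^m$$
$$H(x, y, t, \varepsilon) = H_0(y) + \varepsilon H_1(x, y, t, \varepsilon)$$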
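As usual, we assume $2\pi$-periodicity in $t$. A simple
resonance corresponds to a value of the action
variable $y = y^0$ such that the frequency vector


$$\hat\nu = \begin{pmatrix} \nu \\ 1 \end{pmatrix} = \begin{pmatrix} \partial H_0/\partial y\,(y^0) \\ 1 \end{pmatrix} \in \mathbb{R}^{m+1}$$


(here 1 is the frequency corresponding to the time
variable) admits only one resonance. More precisely,
there exists a nonzero $\hat k \in \mathbb{Z}^{m+1}$ satisfying $\langle \hat k, \hat\nu\rangle = 0$,
and any $k \in \mathbb{Z}^{m+1}$ such that $\langle k, \hat\nu\rangle = 0$ is collinear
with $\hat k$.

Without loss of generality, we can assume that
$y^0 = 0$ and $\hat\nu = \begin{pmatrix} 0 \\ \bar\nu \end{pmatrix}$, $\bar\nu \in \mathbb{R}^m$. Then the vector
$\bar\nu$ is nonresonant.

In a $\sqrt{\varepsilon}$-neighborhood of the resonance we have a
system with fast variables $X = (x_2, \ldots, x_m, t) \bmod 2\pi$
and slow variables $Y = (x_1, \varepsilon^{-1/2}y_1, \ldots, \varepsilon^{-1/2}y_m)$:


$$\dot Y = O(\sqrt{\varepsilon}), \qquad \dot X = \bar\nu + O(\sqrt{\varepsilon}) \tag*{[4]}$$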


If the frequency vector $\bar\nu$ is Diophantine, by using
the Neishtadt averaging procedure, we can reduce
the dependence of the right-hand sides of eqns [4] on
the fast variables to terms exponentially small in $\varepsilon$.
This means that there exist new symplectic variables


$$P = Y + O(\sqrt{\varepsilon}), \qquad Q = X + O(\sqrt{\varepsilon})$$


(new time coincides with the old one) such that
system [4] takes the form


$$\dot P = \sqrt{\varepsilon}\,F(P, \sqrt{\varepsilon}) + O(\exp(-a\varepsilon^{-b}))$$
$$\dot Q = \bar\nu + \sqrt{\varepsilon}\,G(P, \sqrt{\varepsilon}) + O(\exp(-a\varepsilon^{-b}))$$


with positive constants $a$, $b$.

If we neglect the exponentially small remainders,
the system turns out to be integrable. Generically, it
has a family of hyperbolic $m$-tori of the form
$\{(P, Q) : P = \text{const.}\}$ with doubled asymptotic manifolds.
However, the terms $O(\exp(-a\varepsilon^{-b}))$ generically
cannot be removed completely. They produce
an exponentially small splitting of the asymptotic
manifolds. This splitting implies nonintegrability,
chaotic behavior, Arnol'd diffusion, and other
dynamical effects.

It is important to note that exponentially small
splitting appears only in the analytic case. In smooth
systems the splitting is much stronger.

Unfortunately, at present there are no quantitative
methods for studying such splittings except obvious
upper estimates and the case of two-dimensional
systems.


Exponentially Small Splitting 
in Two-Dimensional Systems 


The main results on exponentially small separatrix splitting were obtained by Lazutkin and his students (Gelfreich and others). Another effective approach was proposed by Treschev. There are no general theorems in this situation; however, many examples have been studied. We discuss the splitting in the pendulum with a rapidly oscillating suspension point. The Hamiltonian of the system has the form


$H = \tfrac{1}{2} y^2 + (1 + 2b\cos(t/\varepsilon))\cos x$


(cf. [3]). For any value of $\varepsilon$ the circle $\{(x, y, t) : x = y = 0\}$ is a periodic trajectory. For small $\varepsilon > 0$ the trajectory is hyperbolic.

The Poincaré integral can be formally written in this system. It predicts the area of lobes $16\pi b\, \varepsilon^{-1} e^{-\pi/(2\varepsilon)}$. However, there is no reason to expect that this asymptotics of the splitting is correct. Indeed, its value is exponentially small in $\varepsilon$, while the error of the Poincaré-Melnikov method is in general quadratic in the perturbation. To obtain the correct asymptotics of the separatrix splitting, one has to study singularities of the solutions with respect to complex time. The area of lobes in this system equals (Treschev 1997)


$A_{\mathrm{lobe}} = 4b\, f(b, \varepsilon)\, \varepsilon^{-1} e^{-\pi/(2\varepsilon)}.$


Here $f(b, \varepsilon)$, $\varepsilon \ge 0$, is a smooth function. The function $f(b, 0)$ is even and entire. It can be computed numerically as a solution of a problem which does not contain $\varepsilon$. The value $f(0, 0) = 4\pi$ corresponds to the Poincaré integral, but the function $f(b, 0)$ is not constant. It is possible to prove that $f$ can be expanded in a power series in $\varepsilon$. Apparently, this series diverges for any $b \ne 0$.
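To get a feeling for the orders of magnitude, the first-order prediction $16\pi b\, \varepsilon^{-1} e^{-\pi/(2\varepsilon)}$ can be tabulated against $b^2$, the nominal size of a second-order error. The following Python sketch is illustrative only: the function name `poincare_lobe_area` and the sample values of $b$ and $\varepsilon$ are assumptions made here, not taken from the text.

```python
import math


def poincare_lobe_area(b: float, eps: float) -> float:
    """First-order (Poincare integral) prediction 16*pi*b/eps * exp(-pi/(2*eps))."""
    return 16.0 * math.pi * b / eps * math.exp(-math.pi / (2.0 * eps))


if __name__ == "__main__":
    b = 0.1  # hypothetical perturbation amplitude
    for eps in (0.2, 0.1, 0.05):
        predicted = poincare_lobe_area(b, eps)
        # Compare the prediction with b**2, the nominal second-order error size.
        print(f"eps={eps}: prediction={predicted:.3e}, b^2={b * b:.3e}")
```

For fixed $b$ the predicted value drops below $b^2$ very quickly as $\varepsilon$ decreases, which is the reason the first-order formula cannot be trusted without further analysis.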


Separatrix Splitting and Dynamics 


1. Separatrix splitting can be regarded as an obstacle to the integrability of the perturbed system. However, this statement needs some comments. Doubled asymptotic surfaces in an integrable Hamiltonian system can have self-intersections. In the case of an equilibrium, such intersections can even be transversal. In the literature, there is no general result saying that separatrix splitting implies nonintegrability. Some particular cases (studied by Kozlov, Ziglin, Bolotin, and others) are presented in Arnol’d et al. (1989). For example, in the two-dimensional case the implication is known to hold.

2. The conceptual reason for the nonintegrability discussed in the previous item is the complicated dynamics near the split separatrices. In many situations, it is possible to find a Smale horseshoe in this domain. This implies positive topological entropy, existence of nontrivial hyperbolic sets, symbolic dynamics, etc.

3. Consider a near-integrable area-preserving two-dimensional map. In the perturbed system, the so-called stochastic layer is formed in the vicinity of the split separatrices of a hyperbolic fixed point $z_\varepsilon$. Here we mean the domain bounded by the invariant curves closest to the separatrices. An important quantity, describing the rate of chaos, is the area of the stochastic layer $A_{SL}(\varepsilon)$. It turns out (Treschev 1998b) that $A_{SL}(\varepsilon)$ is connected with the area of the largest lobe $A_{\mathrm{lobe}}(\varepsilon)$ by the simple formula


$c_1 A_{SL}(\varepsilon) < \dfrac{A_{\mathrm{lobe}}(\varepsilon)\, \log(A_{\mathrm{lobe}}(\varepsilon))}{\log^2 \mu} < c_2 A_{SL}(\varepsilon)$


with some constants $c_1, c_2 > 0$, where $\mu$ is the largest multiplier (Lyapunov exponent) of the fixed point $z_0$.

4. Let $z$ be a hyperbolic fixed point of an area-preserving two-dimensional map. The point $z$ divides the corresponding separatrices $\Lambda^{s,u}$ into four branches $\Lambda^s_{1,2}$ and $\Lambda^u_{1,2}$. Suppose that the pair of branches $\Lambda^s_1$ and $\Lambda^u_1$ satisfies the following conditions:


• $\Lambda^s_1$ and $\Lambda^u_1$ lie in a compact invariant domain;
• $\Lambda^s_1$ and $\Lambda^u_1$ do not coincide and intersect at a homoclinic point.


Then the closures $\overline{\Lambda^s_1}$ and $\overline{\Lambda^u_1}$ are compact invariant sets. Very little is known about these sets. For example, it is not known whether their measure is positive. However, by using the Poincaré recurrence theorem, it is possible to prove (Treschev 1998a) that $\overline{\Lambda^s_1} = \overline{\Lambda^u_1}$.


See also: Averaging Methods; Billiards in Bounded Convex 
Domains; Hamiltonian Systems: Obstructions to Integrability; 
Hamiltonian Systems: Stability and Instability Theory. 
Introduction


The rubric “several complex variables” is attached to a 
wide area of mathematics which involves the study of 
holomorphic phenomena in dimensions higher than 
one. In this area there are viewpoints, methods and 
results which range from those on the analytic side, 
where analytic techniques of partial differential equa- 
tions (PDEs) are involved, to those of algebraic geometry 
which pertain to varieties defined over finite fields. Here 
we outline selected basic methods which are aimed at 
understanding global geometric phenomena. Detailed 
presentations of most results discussed here can be 
found in the basic texts (Demailly, Grauert and 
Fritzsche 2001, Griffiths and Harris 1978, Grauert et 
al. 1994, Grauert and Remmert 1979, 1984). 


Domains in $\mathbb{C}^n$


Complex analysis begins with the study of holomorphic functions on domains $D$ in $\mathbb{C}^n$. These are smooth complex-valued functions $f$ which satisfy


$\bar\partial f = \sum_i \dfrac{\partial f}{\partial \bar z_i}\, d\bar z_i = 0.$


Some results from the one-dimensional theory extend 
to the case where n > 1. However, even at the early 
stages of development, one sees that there are many 
new phenomena in the higher-dimensional setting. 


Extending Results from the One-Dimensional 
Theory 


For local results one may restrict considerations to functions $f$ which are holomorphic in a neighborhood of $0 \in \mathbb{C}^n$. The restriction of $f$ to, for example, any complex line through 0 is holomorphic, and therefore the maximum principle can be immediately transferred to the higher-dimensional setting. The zero-set $V(f)$ of a nonconstant holomorphic function is one-codimensional over the complex numbers (two-codimensional over the reals). Thus the identity principle must be formulated in a different way from its one-dimensional version. For example, under the usual connectivity assumptions, if $f$ vanishes on a set $E$ with Hausdorff dimension bigger than $2n - 2$, then it vanishes identically. Here is another useful version: if $M$ is a real submanifold such that the real tangent space $T_zM$ generates the full complex tangent space at one of its points, that is, $T_zM + iT_zM = T_z\mathbb{C}^n$, and $f|_M \equiv 0$, then $f \equiv 0$.

In the one-dimensional theory, after choosing appropriate holomorphic coordinates, $f(z) = z^k$ for some $k$. This local normal form implies that nonconstant holomorphic functions are open mappings. Positive results in the mapping theory of several complex variables are discussed below. The simple example $F : \mathbb{C}^2 \to \mathbb{C}^2$, $(z, w) \mapsto (z, zw)$, shows that the open mapping theorem cannot be transferred without further assumptions.
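A short check of why this map fails to be open may be helpful. The computation below is a sketch written out here, not part of the original exposition; it only uses the formula for $F$.

```latex
\begin{align*}
F(z,w) &= (z, zw), \\
F(0,w) &= (0,0) \quad \text{for every } w \in \mathbb{C}, \\
F(\mathbb{C}^2) &= \{(a,b) : a \neq 0\} \cup \{(0,0)\}
  \quad \text{since } (a,b) = F(a, b/a) \text{ for } a \neq 0 .
\end{align*}
% The image contains (0,0) but no point (0,b) with b \neq 0,
% so it is not a neighborhood of (0,0): F maps the open set C^2
% onto a set that is not open.
```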

The local normal-form theorem in several complex variables is called the “Weierstraß preparation theorem.” It states that after appropriate normalization of the coordinates, $f$ is locally the product of a nonvanishing holomorphic function with a “polynomial”


$P(z, z') = z^k + a_{k-1}(z')\, z^{k-1} + \cdots + a_0(z'),$


where $z$ is a single complex variable, $z'$ denotes the remaining $n - 1$ variables, and the coefficients are holomorphic in $z'$. This is a strong inductive device for the local theory.

If $D$ is a product $D = D_1 \times \cdots \times D_n$ of relatively compact domains in the complex plane $\mathbb{C}$, then repeated integration transfers the one-variable Cauchy integral formula from the $D_i$ to $D$. The resulting integral is over the product $\mathrm{bd}(D_1) \times \cdots \times \mathrm{bd}(D_n)$ of the boundaries, which is topologically a small set in $\mathrm{bd}(D)$. Complex analytically it is, however, large in the sense of the above identity principle.

It follows from, for example, the $n$-variable Cauchy integral formula that holomorphic functions agree with their convergent power series developments. As in the one-variable theory, the appropriate topology on the space $\mathcal{O}(D)$ of holomorphic functions on $D$ is that of uniform convergence on compact subsets. In this way $\mathcal{O}(D)$ is equipped with the topology of a Fréchet space.


First Theorems on Analytic Continuation 


Analytic continuation is a fundamental phenomenon 
in complex geometry. One type of continuation 
theorem which is known in the one-variable theory 
is of the following type: If E is a small closed set in 
$D$ and $f \in \mathcal{O}(D \setminus E)$ is a holomorphic function which
satisfies some growth condition near E, then it 
extends holomorphically to D. The notion “small” 
can be discussed in terms of measure, but it is more 
appropriate to discuss it in complex analytic terms. 

An analytic subset $A$ of $D$ is locally the common zero set $\{a \in D : f_1(a) = \cdots = f_m(a) = 0\}$ of finitely many holomorphic functions. A function $g$ on $A$ is said to be holomorphic if at each $a \in A$ it is the restriction of a holomorphic function on some neighborhood of $a$ in $D$. There is an appropriate notion of an irreducible component of $A$. If $A$ is irreducible, it contains a dense open set $A_{\mathrm{reg}}$, which is a connected $k$-dimensional complex manifold; that is, at each of its points $a$ there are functions $f_1, \ldots, f_k$ which define a map $F := (f_1, \ldots, f_k)$ that is a holomorphic diffeomorphism of a neighborhood of $a$ in $A_{\mathrm{reg}}$ onto an open set in $\mathbb{C}^k$. The boundary $A_{\mathrm{sing}}$ is the set of singular points of $A$, which is a lower-dimensional analytic set. The dimension of an analytic set is the maximum of the dimensions of its irreducible components.

Here are typical examples of theorems on continuing holomorphic functions across small analytic sets $E$. If $\mathrm{codim}\, E \ge 2$, then every function which is holomorphic on $D \setminus E$ extends to a holomorphic function on $D$. The same is true of meromorphic functions, that is, functions which are locally defined as quotients $m = f/g$ of holomorphic functions. If $f$ is holomorphic on $D$, then $g := 1/f$ is holomorphic outside the analytic set $E := V(f)$. Thus $g$ cannot be holomorphically continued across this one-codimensional set. However, Riemann’s Hebbarkeitssatz is valid in several complex variables: if $f$ is locally bounded outside an analytic subset $E$ of any positive codimension, then it extends holomorphically to $D$.

With a bit of care, continuation results of this type can be proved for (reduced) complex spaces. These are defined as paracompact Hausdorff spaces which possess charts $(U_\alpha, \varphi_\alpha)$, where the local homeomorphism $\varphi_\alpha$ identifies the open set $U_\alpha$ with a closed analytic subset $A_\alpha$ of a domain $D_\alpha$ in some $\mathbb{C}^n$. As indicated above, a continuous function on $A_\alpha$ is holomorphic if at each point it can be holomorphically extended to some neighborhood of that point in $D_\alpha$. Finally, just as in the case of manifolds, the compatibility between charts is guaranteed by requiring that each coordinate change $\varphi_{\alpha\beta} : U_{\alpha\beta} \to U_{\beta\alpha}$ is biholomorphic, that is, it is a homeomorphism such that it and its inverse are given by holomorphic maps of the form $F = (f_1, \ldots, f_m)$. The discussion of irreducible components, sets of singularities, and dimension for complex spaces goes exactly in the same way as that above for analytic sets.

If $E$ is everywhere at least two-codimensional, then the above result on continuation of meromorphic functions holds in complete generality. The Hebbarkeitssatz requires the additional condition that the complex space is normal. In many situations this causes no problem at all, because, in general, there is a canonically defined associated normal complex space $\tilde X$ and a proper, surjective, finite-fibered holomorphic map $\tilde X \to X$ which is biholomorphic outside a nowhere-dense proper analytic subset. Difficulties can be overcome by simply lifting functions to this normalization and applying the Hebbarkeitssatz.

Continuation theorems of Hartogs type reflect the fact that complex analysis in dimensions larger than one is really quite different from the one-variable version. The following is such a theorem. Let $(z, w)$ be the standard coordinates in $\mathbb{C}^2$ and think of the $z$-axis as a parameter space for geometric figures in the $w$-plane. For example, let $D_z := \{(z, w) : |w| < 1\}$ be a disk and $A_z := \{(z, w) : 1 - \varepsilon < |w| < 1\}$ be an annulus. An example of a Hartogs figure $H$ in $\mathbb{C}^2$ is the union of the family of disks $D_z$ for $|z| < 1 - \delta$ with the family $A_z$ of annuli for $1 - \delta \le |z| < 1$. One should visualize the moving disks which suddenly change to moving annuli. One speaks of filling in the Hartogs figure to obtain the polydisk $\hat H := \{(z, w) : |z|, |w| < 1\}$. Hartogs’ continuation theorem states that a function which is holomorphic on $H$ extends holomorphically to $\hat H$.
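The extension can be written down explicitly with a Cauchy integral in the $w$-variable. The following is a sketch of the standard argument; the auxiliary radius $r$ (any value with $1 - \varepsilon < r < 1$) and the notation $\hat f$ are introduced here for illustration and do not appear in the text above.

```latex
\[
  \hat f(z,w) \;:=\; \frac{1}{2\pi i} \oint_{|\zeta| = r}
      \frac{f(z,\zeta)}{\zeta - w}\, d\zeta ,
  \qquad |z| < 1,\quad |w| < r .
\]
% The circle {|\zeta| = r} lies in the Hartogs figure for every |z| < 1:
% inside the disk D_z when |z| < 1 - \delta, inside the annulus A_z otherwise.
% Hence \hat f is holomorphic in (z,w) on {|z| < 1, |w| < r}.
% For |z| < 1 - \delta the function f(z, \cdot) is holomorphic on the full disk,
% so the one-variable Cauchy formula gives \hat f = f there, and by the identity
% principle \hat f = f wherever both are defined. Letting r \to 1 yields the
% extension to the polydisk.
```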


Cartan-Thullen Theorem 


One of the major developments in complex analysis in several variables was the realization that certain convexity concepts lie behind the strong continuation properties. At the analytic level one such concept is defined as follows by the full algebra of holomorphic functions $\mathcal{O}(D)$. If $K$ is a compact subset of $D$, then its holomorphic convex hull $\hat K$ is defined as the intersection of the sets $P(f) := \{p \in D : |f(p)| \le |f|_K\}$ as $f$ runs through $\mathcal{O}(D)$. One says that $D$ is holomorphically convex if $\hat K$ is compact for every compact subset $K$ of $D$.

The theorem of H. Cartan and Thullen relates this concept to analytic continuation phenomena as follows. A domain $D$ is said to be a domain of holomorphy if, given a divergent sequence $\{z_n\} \subset D$, there exists $f \in \mathcal{O}(D)$ which is unbounded along it. In other words, the phenomenon of being able to extend all holomorphic functions on $D$ to a truly larger domain $\hat D$ does not occur. The Cartan-Thullen theorem states that $D$ is a domain of holomorphy if and only if it is holomorphically convex. In the next paragraph the relation between this type of convexity and a certain complex geometric convexity of the boundary $\mathrm{bd}(D)$ will be indicated.


Levi Theorem and the Levi Problem 


Consider a smooth (local) real hypersurface $\Sigma$ containing $0 \in \mathbb{C}^n$ with $n > 1$. It is the zero-set $\{\rho = 0\}$ in some neighborhood $U$ of 0 of a smooth function $\rho$ with $d\rho \ne 0$ on $U$. This is viewed as a piece of the boundary of a domain $D$, where $U \cap D = \{\rho < 0\}$. The real tangent space $T_0\Sigma = \mathrm{Ker}(d\rho(0))$ contains a unique maximal (one-codimensional) complex subspace $T_0^{\mathbb{C}}\Sigma = \mathrm{Ker}(\partial\rho(0)) =: H$. The signature of the restriction of the complex Hessian (or Levi form) $i\partial\bar\partial\rho$ to $H$ is a biholomorphic invariant of $\Sigma$. In this notation the Hessian is a real alternating 2-form which is compatible with the complex structure, and its signature is defined to be the signature of the associated symmetric form.

If the restriction of this Levi form to the complex tangent space has a negative eigenvalue, that is, if the boundary $\mathrm{bd}(D)$ has a certain degree of concavity, then there is a map $F : \Delta \to U$ of the unit disk $\Delta$ which is biholomorphic onto its image with $F(0) = 0$ and otherwise $F(\mathrm{cl}(\Delta)) \subset D$. The reader can imagine pushing the image of this map into the domain to obtain a family of disks which are in the domain, and pushing it in the outward pointing direction to obtain annuli which are also in the domain. Making this precise, one builds a (higher-dimensional) Hartogs figure $H$ at the base point 0 so that $\hat H$ is an open neighborhood of 0. In particular this proves the theorem of E. E. Levi: every function holomorphic on $U \cap D$ extends to a neighborhood of 0. This can be globally formulated as follows:


Theorem If $D$ is a domain of holomorphy with smooth boundary in $\mathbb{C}^n$, then $\mathrm{bd}(D)$ is Levi-pseudoconvex.


Here the terminology Levi-pseudoconvex is used to 
denote the condition that the restriction of the Levi 
form to the complex tangent space of every 
boundary point is positive semidefinite. 

One of the guiding problems of complex analysis in higher dimensions is the Levi problem. This is the converse statement to that of Levi’s theorem:


Levi Problem Is a domain $D$ with smooth Levi-pseudoconvex boundary in a complex manifold necessarily a domain of holomorphy?


Stated in this form it is not true, but for domains in $\mathbb{C}^n$ it is true. As will be sketched below, under stronger assumptions on the Levi form it is almost true. However, there are still interesting open problems in complex analysis which are related to the Levi problem.


Bounded Domains and Their Automorphisms 


The unit disk in the complex plane is particularly important, because, with the exceptions of projective space $\mathbb{P}_1(\mathbb{C})$, the complex plane $\mathbb{C}$, the punctured plane $\mathbb{C} \setminus \{0\}$, and compact complex tori, it is the universal cover of every (connected) one-dimensional complex manifold.

In higher dimensions it should first be underlined that, without some further condition, there is no best bounded domain in $\mathbb{C}^n$. For example, two randomly chosen small perturbations of the unit ball $B_2 := \{(z, w) : |z|^2 + |w|^2 < 1\}$, with, for example, real analytic boundary, are not biholomorphically equivalent.

On the other hand, the following theorem of 
H. Cartan shows that bounded domains D are good 
candidates for covering spaces: 


Theorem Equipped with the compact open topol- 
ogy, the group Aut(D) of holomorphic automorph- 
isms of D is a Lie group acting properly on D. 


The notion of a proper group action of a topological group on a topological space is fundamental and should be underlined. It means that if $\{x_n\}$ is a convergent sequence in the space where the group is acting, then a sequence of group elements $\{g_n\}$, with the property that $\{g_n(x_n)\}$ is convergent, itself possesses a convergent subsequence. As a consequence, isotropy groups are compact and orbits are closed.

In the context of bounded domains $D$ this implies that if $\Gamma$ is a discrete subgroup of $\mathrm{Aut}(D)$, then $X = D/\Gamma$ carries a natural structure of a complex space. If in addition $\Gamma$ is acting freely, something that, with minor modifications, can be arranged, then $X$ is a complex manifold.

Many nontrivial compact complex manifolds arise as quotients $D/\Gamma$ of bounded domains. Even very concrete quotients, for example, where $D = B_2$, are extremely interesting. Conversely, if $\mathrm{Aut}(D)$ contains a discrete subgroup $\Gamma$ so that $D/\Gamma$ is compact, then $D$ is probably very special. For example, it is known to be holomorphically convex!

Any compact quotient $X = D/\Gamma$ of a bounded domain is projective algebraic in the sense that it can be realized as a complex (algebraic) submanifold of some complex projective space. In fact the embedding can be given by quite special $\Gamma$-invariant holomorphic tensors on $D$, and this in turn implies that $X$ is of general type (see below). For further details, in particular on Cartan’s theorem on the automorphism group of a bounded domain, the reader is referred to Narasimhan (1971).


Stein Manifolds 


The founding fathers of the first phase of “modern 
complex analysis” (Cartan, Oka, and Thullen) 


realized that domains of holomorphy form the 
basic class of spaces where it would be possible to 
solve the important problems of the subject con- 
cerning the existence of holomorphic or mero- 
morphic functions with reasonably prescribed 
properties. In fact, Oka formulated a principle 
which more or less states that if a complex analytic 
problem which is well formulated on a domain of 
holomorphy has a continuous solution, then it 
should have a holomorphic solution. Given the 
flexibility of continuous functions and the rigidity 
of holomorphic functions, this would seem impos- 
sible but in fact is true! 

Beginning in the late 1930s, Stein worked on problems related to this Oka principle, in particular on those related to what we would now call the algebraic topological aspects of the subject, and he was led to formulate conditions on a general complex manifold $X$ which should hold if problems of the above type are to be solved. First, his axiom of holomorphic convexity was simply that, given a divergent sequence $\{x_n\}$ in $X$, there should be a function $f \in \mathcal{O}(X)$ such that $\{f(x_n)\}$ is unbounded. Secondly, holomorphic functions should separate points in the sense that, given distinct points $x_1, x_2 \in X$, there exists $f \in \mathcal{O}(X)$ with $f(x_1) \ne f(x_2)$. Finally, globally defined holomorphic functions should give local coordinates. Assuming that $X$ is $n$-dimensional, this means that, given a point $x \in X$, there exist $f_1, \ldots, f_n \in \mathcal{O}(X)$ such that $df_1(x) \wedge \cdots \wedge df_n(x) \ne 0$.

Assuming Stein’s axioms, Cartan and Serre then produced a powerful theory in the context of sheaf cohomology which proved certain vanishing theorems that led to the desired existence theorems. This theory and typical applications are sketched below. Before going into this, we would like to mention that Grauert’s version of the Cartan-Serre theory requires only very weak versions of Stein’s axioms: (1) The connected component containing $K$ of the holomorphic convex hull $\hat K$ of every compact set $K$ should be compact. (2) Given $x \in X$, there are functions $f_1, \ldots, f_m \in \mathcal{O}(X)$ so that $x$ is an isolated point in the fiber of the map $F := (f_1, \ldots, f_m) : X \to \mathbb{C}^m$. Of course the results also hold for complex spaces.

Holomorphically convex domains in $\mathbb{C}^n$ are Stein manifolds, and since closed complex submanifolds of Stein manifolds are Stein, it follows that any closed complex submanifold of $\mathbb{C}^n$ is Stein. In particular, affine varieties are Stein spaces. Remmert’s theorem states the converse: an $n$-dimensional Stein manifold can be embedded as a closed complex submanifold of $\mathbb{C}^{2n+1}$. A nontrivial result of Behnke and Stein implies that every noncompact Riemann surface is also Stein.
Basic Formalism


The following first Cousin problem is typical of those which can be solved by Stein theory. Let $X$ be a complex manifold which is covered by open sets $U_i$. Suppose that on each such set a meromorphic function $m_i$ is given so that on the overlap $U_{ij} := U_i \cap U_j$ the difference $m_i - m_j =: f_{ij}$ is holomorphic. This means that the distribution of polar parts of these functions is well defined. The question is whether or not there exists a globally defined meromorphic function $m \in \mathcal{M}(X)$ with these prescribed polar parts, that is, with $m - m_i =: f_i \in \mathcal{O}(U_i)$.

If one applies the Oka principle, this problem can be easily solved. For this one can assume that the covering is locally finite and take $\chi_i$ to be a partition of unity subordinate to the cover. Using standard shrinking and cut-off arguments, one extends the $f_{ij}$ to the full space $X$ as smooth functions so that the alternating cocycle relations $f_{ij} + f_{jk} + f_{ki} = 0$ and $f_{ij} = -f_{ji}$ still hold. Then $f_i := \sum_k \chi_k f_{ki}$ is a smooth function on $U_i$ which satisfies $f_j - f_i = f_{ij}$ on the overlap $U_{ij}$. It follows that $f := m_i + f_i = m_j + f_j$ is a globally well-defined “smooth” function with the prescribed polar parts. The Oka principle would then imply that there is a globally defined meromorphic function with the same property.
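The key identity in this argument is a one-line computation. The sketch below assumes the index convention $f_i := \sum_k \chi_k f_{ki}$ for the partition of unity $\{\chi_k\}$ (one of several equivalent conventions) and uses only the two alternating cocycle relations.

```latex
\begin{align*}
f_j - f_i
  &= \sum_k \chi_k \,(f_{kj} - f_{ki})
   = \sum_k \chi_k \,(f_{ik} + f_{kj})
      && \text{since } f_{ki} = -f_{ik} \\
  &= \sum_k \chi_k \, f_{ij}
      && \text{since } f_{ik} + f_{kj} + f_{ji} = 0 \\
  &= f_{ij}
      && \text{since } \textstyle\sum_k \chi_k = 1 .
\end{align*}
```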

The basic sheaf cohomological formalism for Stein theory can be seen in the above argument. Suppose that instead of applying extension and cut-off techniques from the smooth category, we could answer positively the question “given the holomorphic functions $\{f_{ij}\}$ on the $U_{ij}$, do there exist holomorphic functions $\{f_i\}$ on the $U_i$ such that $f_j - f_i = f_{ij}$ on the $U_{ij}$?” Then we would immediately have the desired globally defined meromorphic function $m := m_i + f_i$. This question is exactly the question of whether or not the Čech cohomology class of the alternating cocycle $\{f_{ij}\}$ vanishes.

Let us quickly summarize the language of Čech cohomology. A presheaf of abelian groups is a mapping $U \mapsto S(U)$ which associates to every open subset of $X$ an abelian group. Typical examples are $U \mapsto \mathcal{O}(U)$, $U \mapsto C^\infty(U)$, $U \mapsto H^*(U, \mathbb{Z})$, and so on. The last example, which associates to $U$ its topological cohomology, does not localize well in terms of the following basic axioms for a sheaf: (1) Given a covering $\{U_i\}$ of an open subset $U$ of $X$ and elements $s_i \in S(U_i)$ with $s_i - s_j = 0$ on $U_{ij}$, there exists $s \in S(U)$ with $s|_{U_i} = s_i$. (2) If $s, t \in S(U)$ are such that $s|_{U_i} = t|_{U_i}$ for all $i$, then $s = t$. For this we have assumed that the restriction mappings have been built into the definition of a presheaf.

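Associated to a sheaf $S$ on $X$ and a covering $\mathcal{U} = \{U_i\}$ is the space of alternating $q$-cochains $C^q(\mathcal{U}, S)$, which is the set of alternating maps $\xi$ from the set of $(q+1)$-fold indices of the form $(i_0, \ldots, i_q) \mapsto \xi_{i_0 \ldots i_q} \in S(U_{i_0 \ldots i_q})$. Here $U_{i_0 \ldots i_q} := U_{i_0} \cap \cdots \cap U_{i_q}$. The boundary mapping $\delta : C^q \to C^{q+1}$ is defined by $\delta(\xi)_{i_0 \ldots i_{q+1}} = \sum_k (-1)^k\, \xi_{i_0 \ldots \hat{i}_k \ldots i_{q+1}}$, where the hat denotes omission of an index. It follows that $\delta^2 = 0$, and $H^*(\mathcal{U}, S)$ is defined to be the cohomology of the associated complex.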
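The coboundary formula and the identity $\delta^2 = 0$ can be tried out on a toy example. The following Python sketch makes several simplifying assumptions that are not in the text: the coefficients are integers (a constant sheaf), every finite intersection of the five covering sets is taken to be nonempty, and an alternating cochain is stored only by its values on strictly increasing index tuples. The names `coboundary`, `xi`, `d_xi` and `dd_xi` are hypothetical.

```python
from itertools import combinations


def coboundary(xi, indices, q):
    """Apply delta: C^q -> C^{q+1} to a cochain with integer values.

    xi maps strictly increasing (q+1)-tuples of cover indices to integers.
    """
    result = {}
    for simplex in combinations(indices, q + 2):
        total = 0
        for k in range(q + 2):
            # Omit the k-th index and add the value with sign (-1)^k.
            face = simplex[:k] + simplex[k + 1:]
            total += (-1) ** k * xi[face]
        result[simplex] = total
    return result


indices = list(range(5))  # a cover by five open sets, all intersections nonempty
# An arbitrary 1-cochain, chosen only for illustration.
xi = {s: 3 * s[0] - 2 * s[1] ** 2 for s in combinations(indices, 2)}
d_xi = coboundary(xi, indices, 1)     # a 2-cochain
dd_xi = coboundary(d_xi, indices, 2)  # a 3-cochain, expected to vanish
```

Every value of `dd_xi` is zero, while `d_xi` itself is not identically zero, so the chosen `xi` is a cochain that is not a cocycle.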

In any consideration it is necessary to refine coverings, shrink, etc., and therefore one goes to the limit $H^*(X, \mathcal{S})$ over all refinements of the coverings. The script notation $\mathcal{S}$ is used to denote that we have then localized the sheaf to the germ level. Due to a theorem of Leray one can, however, always take a suitable covering so that $H^q(\mathcal{U}, S) = H^q(X, \mathcal{S})$ for all $q$, where now $S(U)$ satisfies the above axioms.

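One of the important facts in this cohomology theory is that a short exact sequence of sheaves $0 \to \mathcal{S}' \to \mathcal{S} \to \mathcal{S}'' \to 0$ yields a long exact sequence


$0 \to H^0(X, \mathcal{S}') \to H^0(X, \mathcal{S}) \to H^0(X, \mathcal{S}'')$
$\to H^1(X, \mathcal{S}') \to H^1(X, \mathcal{S}) \to H^1(X, \mathcal{S}'') \to \cdots$


in cohomology.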

A fundamental theorem of Stein theory, Theorem B, states that for the basic analytic sheaves $\mathcal{S}$ of complex analysis, the so-called coherent sheaves, the cohomology spaces $H^q(X, \mathcal{S})$ vanish for all $q \ge 1$. In the above example of the first Cousin problem the desired vanishing is that of $H^1(\mathcal{U}, \mathcal{O})$.


Coherent Sheaves 


Numerous important sheaves in complex analysis are associated to vector bundles on complex manifolds. A holomorphic vector bundle $\pi : E \to X$ over a complex manifold is a holomorphic surjective maximal rank fibration. Every fiber $E_x := \pi^{-1}(x)$ is a complex vector space, and the vector space structure is defined holomorphically over $X$. For example, addition is a holomorphic map $E \times_X E \to E$. Such bundles are locally trivial, that is, there is a covering $\{U_i\}$ of the base such that $\pi^{-1}(U_i)$ is isomorphic to $U_i \times \mathbb{C}^r$ and on the overlap the gluing maps in the fibers are holomorphic maps $\varphi_{ij} : U_{ij} \to \mathrm{GL}_r(\mathbb{C})$. The number $r$ is called the rank of the bundle. Holomorphic bundles of rank 1 are referred to as holomorphic line bundles. Of course all of these definitions make sense in other categories, for example, topological, smooth, real analytic, etc.

A holomorphic section of $E$ over an open set $U$ is a holomorphic map $s : U \to E$ with $\pi \circ s = \mathrm{Id}_U$. The space of these sections is denoted by $\mathcal{E}(U)$, and the map $U \mapsto \mathcal{E}(U)$ defines a sheaf which is locally just $\mathcal{O}^r$. It is therefore called a locally free sheaf of $\mathcal{O}$-modules. Conversely, by taking bases of a locally free sheaf $\mathcal{S}$ on the open sets where it is isomorphic to a direct sum $\mathcal{O}^r$, one builds an associated holomorphic vector bundle $E$ so that $\mathcal{E} = \mathcal{S}$.

It is not possible to restrict our attention to these 
locally free sheaves or equivalently to holomorphic 
vector bundles. One important reason is that images 
of holomorphic vector bundle maps are not necessa- 
rily vector bundles. A related reason is that the sheaf 
of ideals of holomorphic functions which vanish on 
a given analytic set A is not always a vector bundle. 
This is caused by the presence of singularities in A. 
There are many other reasons, but these should 
suffice for this sketch. 

The sheaves $\mathcal{S}$ that arise naturally in complex analysis are almost vector bundles. If $X$ is the base complex manifold or complex space under consideration, then $\mathcal{S}$ will come from a vector bundle on some big open subset $X_0$ whose boundary is an analytic set $X_1$, and then on the irreducible components of $X_1$ it will come from vector bundles on such big open sets, etc. These sheaves are called coherent analytic sheaves of $\mathcal{O}_X$-modules. The correct algebraic definition is that locally there exists an exact sequence


$0 \to \mathcal{O}^{d_m} \to \cdots \to \mathcal{O}^{d_1} \to \mathcal{O}^{d_0} \to \mathcal{S} \to 0 \qquad [1]$


of sheaves of $\mathcal{O}$-modules. This implies in particular that, although $\mathcal{S}$ might not be locally free, it is locally finitely generated, and the relations among the generators are also finitely generated.


Selected Theorems 


The following efficiently formulated fundamental 
theorem contains a great deal of information about 
Stein manifolds. 


Theorem B A complex space $X$ is Stein if and only if for every coherent sheaf $\mathcal{S}$ of $\mathcal{O}_X$-modules $H^q(X, \mathcal{S}) = 0$ for all $q \ge 1$.


Since $\mathcal{S}$ is a sheaf, it follows that $H^0(X, \mathcal{S}) = \mathcal{S}(X)$. This is referred to as the space of sections of $\mathcal{S}$ over $X$. As a result of Theorem B, we are able to construct sections with prescribed properties. Let us give two concrete applications (there are many more!).


Example Let $A$ be a closed analytic subset of a Stein space $X$, and let $\mathcal{I}$ denote the subsheaf of $\mathcal{O}_X$ which consists of those functions which vanish on $A$. Note that this must be defined for every open subset $U$ of $X$. Then we have the short exact sequence $0 \to \mathcal{I} \to \mathcal{O}_X \to \mathcal{O}_X/\mathcal{I} \to 0$. The restriction of $\mathcal{O}_X/\mathcal{I}$ to $A$ is called the (reduced) structure sheaf $\mathcal{O}_A$ of $A$. In other words, for $U$ open in $A$ the space $\mathcal{O}_A(U)$ should be regarded as the space of holomorphic functions on $U$.

Now, $\mathcal{I}$ is a coherent sheaf on $X$ and therefore by Theorem B the cohomology group $H^1(X, \mathcal{I})$ vanishes. Consequently, the associated long exact sequence in cohomology implies that the restriction mapping $\mathcal{O}_X(X) \to \mathcal{O}_A(A)$ is surjective. This special case of Theorem A means that every (global!) holomorphic function on $A$ is the restriction of a holomorphic function on $X$.
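The step from the vanishing of $H^1$ to surjectivity of restriction is just a reading of the relevant piece of the long exact sequence. Written out, with $\mathcal{I}$ denoting the ideal sheaf of $A$ (a notation assumed here) and using that global sections of $\mathcal{O}_X/\mathcal{I}$ are the holomorphic functions on $A$:

```latex
\[
  H^0(X, \mathcal{O}_X) \;\longrightarrow\; H^0(X, \mathcal{O}_X/\mathcal{I})
  \;\longrightarrow\; H^1(X, \mathcal{I}) = 0 ,
  \qquad
  H^0(X, \mathcal{O}_X/\mathcal{I}) = \mathcal{O}_A(A) .
\]
% Exactness at the middle term says that the image of the first map is the
% kernel of the second; since the target of the second map is zero, the first
% map (restriction of functions from X to A) is surjective.
```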


Example Let us consider the multiplicative (second) Cousin problem. In this case meromorphic functions $m_i$ are given on the open subsets $U_i$ of a covering $\mathcal{U}$ with the property that $m_i = f_{ij} m_j$, where $f_{ij}$ is holomorphic and nowhere vanishing on the overlap $U_{ij}$. This is a distribution $D$ of the zero and polar parts of meromorphic functions, which in complex geometry is called a divisor, and the interesting question is whether or not there exists a globally defined meromorphic function which has $D$ as its divisor.

Now we note that $\mathrm{GL}_1(\mathbb{C}) = \mathbb{C}^*$ and thus $f_{ij} : U_{ij} \to \mathbb{C}^*$ defines a line bundle $L$ on $X$, and we regard it as an element of the space $H^1(X, \mathcal{O}^*)$ of equivalence classes of line bundles on $X$. Here $\mathcal{O}^*$ is the sheaf of nowhere-vanishing holomorphic functions on $X$. It is not even a sheaf of $\mathcal{O}$-modules; therefore coherence is not discussed in this case.

The long exact sequence in cohomology associated to the short exact sequence $0 \to \mathbb{Z} \to \mathcal{O} \to \mathcal{O}^* \to 1$ yields an element $c_1(L) \in H^2(X, \mathbb{Z})$, which is a purely topological invariant. It is called the Chern class of $L$, and one knows that $L$ is topologically trivial if and only if $c_1(L) = 0$.

Coming back to the Cousin II problem, using the
same argument as in the Cousin I problem, we can
solve it if and only if we can find nowhere-vanishing
functions $f_i \in \mathcal{O}^*(U_i)$ with $f_i = f_{ij} f_j$. This is equivalent
to finding a nowhere-vanishing section of L. But a
line bundle has a nowhere-vanishing section if and
only if it is isomorphic to the trivial bundle. In other
words, the Cousin II problem can be solved for a
given divisor D if and only if the associated line
bundle L(D) is trivial in $H^1(X,\mathcal{O}^*)$. For this, a
necessary condition is that the Chern class $c_1(L(D))$
vanishes. But if X is Stein, this is also sufficient,
because the vanishing of $H^1(X, \mathcal{O})$ together with the
long exact sequence in cohomology shows that
$H^1(X, \mathcal{O}^*) \to H^2(X, \mathbb{Z})$ is injective.
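
The injectivity used here can be read off from the following piece of the long exact sequence. This is a sketch; the explicit form $f \mapsto e^{2\pi i f}$ of the map $\mathcal{O} \to \mathcal{O}^*$ is the usual convention and is an assumption, since it is not spelled out in the text.

```latex
% Exponential sequence (the explicit map is the customary choice, assumed here)
\[
  0 \to \mathbb{Z} \to \mathcal{O} \xrightarrow{\ f \mapsto e^{2\pi i f}\ } \mathcal{O}^* \to 1
\]
% Relevant piece of the associated long exact sequence
\[
  H^1(X,\mathcal{O}) \longrightarrow H^1(X,\mathcal{O}^*)
  \xrightarrow{\ c_1\ } H^2(X,\mathbb{Z}) \longrightarrow H^2(X,\mathcal{O})
\]
% On a Stein space Theorem B gives H^1(X,O) = H^2(X,O) = 0, hence
\[
  c_1 : H^1(X,\mathcal{O}^*) \xrightarrow{\ \cong\ } H^2(X,\mathbb{Z}).
\]
```

Injectivity needs only the vanishing of $H^1(X,\mathcal{O})$; the vanishing of $H^2(X,\mathcal{O})$ gives surjectivity in addition.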

Hence, in this case we have the following precise
formulation of the Oka principle: “A given divisor
D on a Stein manifold is the divisor of a globally
defined meromorphic function if and only if the
associated line bundle is topologically trivial.”

A slight refinement of the statement above is the
fact that on a Stein manifold the space of topological
line bundles is the same as the space of
holomorphic line bundles. In the case of (higher
rank) vector bundles this is a deep and important
theorem of Grauert. It can be formulated as follows.


Grauert’s Oka principle On a Stein space the map
$F: \mathrm{Vect}_{\mathrm{holo}}(X) \to \mathrm{Vect}_{\mathrm{top}}(X)$ from the space of holomorphic
vector bundles to the space of topological
vector bundles which forgets the complex structure
is bijective.

In closing this section, a few words concerning the 
proofs of the major theorems, for example, Theorem B, 
should be mentioned. In all cases one must solve 
something like an additive Cousin problem and one 
first does this on special relatively compact subsets. For 
this step there are at least two different ways to 
proceed. One is to delicately piece together solutions 
which are known to exist on very special polyhedral- 
type domains or build up from lower-dimensional 
pieces of such. 

Another method is to solve certain systems of PDEs
on relatively compact domains where control at the
boundary is given by the positivity of the Levi form.
An example of how such PDEs occur can already be
seen at the level of the above Cousin I problem. At the
point where we have solved it topologically, that is, the
holomorphic cocycle $\{f_{ij}\}$ is a coboundary $f_{ij} = f_j - f_i$ of
smooth functions, we observe that since $\bar\partial f_{ij} = 0$, it
follows that $\alpha = \bar\partial f_i$ is a globally defined (0, 1)-form. It
is $\bar\partial$-closed, that is, the compatibility condition for
solving the system $\bar\partial u = \alpha$ is fulfilled. If this system can
be solved, then we use the solution u to adjust the
topological solutions of the Cousin problem by
replacing $f_i$ by $f_i - u$. We still have $f_{ij} = f_j - f_i$, but
now the $f_i$ are holomorphic on $U_i$.
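
The individual steps of this correction argument can be checked line by line. The following is a sketch in the notation of the paragraph above, with $f_i$ the smooth solutions and $u$ a solution of the $\bar\partial$-equation.

```latex
% Cousin I data: holomorphic f_{ij} on U_{ij}, smooth f_i on U_i with f_{ij} = f_j - f_i.
\begin{align*}
  \bar\partial f_j - \bar\partial f_i &= \bar\partial f_{ij} = 0 \quad\text{on } U_{ij}
    && \Longrightarrow\ \alpha := \bar\partial f_i \text{ is globally defined},\\
  \bar\partial \alpha &= \bar\partial\,\bar\partial f_i = 0
    && \Longrightarrow\ \text{the compatibility condition holds},\\
  \bar\partial (f_i - u) &= \alpha - \alpha = 0
    && \Longrightarrow\ f_i - u \text{ is holomorphic on } U_i,\\
  (f_j - u) - (f_i - u) &= f_j - f_i = f_{ij}.
\end{align*}
```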

To obtain the global solution to a Cousin-type
problem, one exhausts the Stein space by the special
relatively compact subsets $U_n$ where, by one method
or another, we have solved the problem with
solutions $s_n$. One would like to say that the $s_n$
converge to a global solution s. However, there is no
way to guarantee this a priori without making some
sort of estimates. One main way of handling this
problem is to adjust the solutions as $n \to \infty$ by an
approximation procedure. For this one needs to
know that holomorphic objects, for example, functions
on $U_n$, can be approximated on $U_n$ by objects
of the same type which are defined on the bigger set
$U_{n+1}$. This Runge-type theorem, which is a nontrivial
ingredient in the whole theory, requires the
introduction of an appropriate Fréchet structure on
the spaces of sections of a coherent sheaf. This is in
itself a point that needs some attention.


Montel’s Theorem and Fredholm Mappings


If U is an open subset of a complex space X, then
$\mathcal{O}(U)$ has the Fréchet topology of convergence on
compact subsets K defined by the seminorms $|\cdot|_K$.
Using resolutions of type (1) above, one shows that
the space of sections $\mathcal{S}(U)$ of every coherent sheaf $\mathcal{S}$
also possesses a canonical Fréchet topology. This is
then extended to the spaces $C^q(\mathcal{U},\mathcal{S})$, and consequently
one is able to equip the cohomology spaces
$H^q(X,\mathcal{S})$ with a (often non-Hausdorff) quotient
topology.

Elements of such cohomology groups can be 
regarded as obstructions to solving complex analytic 
problems. One often expects such obstructions, and 
is satisfied whenever it can be shown that there are
only finitely many, that is, a finiteness theorem of
the type $\dim H^q(X,\mathcal{S}) < \infty$ is desirable. Here we
sketch two finiteness theorems which hold in
seemingly different contexts, but whose proofs are
based on one principle: use the compactness 
guaranteed by Montel’s theorem as the necessary 
input for the Fredholm theorem in the context of 
Fréchet spaces. 

Recall that a continuous linear map $T: E \to F$
between topological vector spaces is said to be
compact if there is an open neighborhood U of $0 \in E$
such that T(U) is relatively compact in F. If Y is a
relatively compact open subset of a complex space
X, then Montel’s theorem states that the restriction
map $r_Y: \mathcal{O}(X) \to \mathcal{O}(Y)$ is compact. This can be
extended to coherent sheaves, and using the Fredholm
theorem for certain natural restriction and
boundary maps, one proves the following fundamental
fact.


Lemma 1 If the restriction map
$r_Y: H^q(X,\mathcal{S}) \to H^q(Y,\mathcal{S})$ is surjective,
then $H^q(Y,\mathcal{S})$ is finite dimensional.


Since the methods for the proof are basic in complex
analysis, we outline it here. Take a covering $\mathcal{U}$ of X
such that $H^q(\mathcal{U}, \mathcal{S}) = H^q(X, \mathcal{S})$. Then intersect its
elements with Y to obtain a covering $\mathcal{U} \cap Y$ of Y. Finally,
refine that covering with refinement mapping $\tau$ to a
covering $\mathcal{V}$ of Y such that $H^q(\mathcal{V}, \mathcal{S}) = H^q(Y, \mathcal{S})$ and so
that $U_{\tau(i)}$ contains $V_i$ as a relatively compact subset
for all i. Let $Z^q(\mathcal{U},\mathcal{S})$ denote the kernel of
the boundary map $\delta$ for the covering $\mathcal{U}$, and consider
the map $Z^q(\mathcal{U},\mathcal{S}) \oplus C^{q-1}(\mathcal{V},\mathcal{S}) \to Z^q(\mathcal{V},\mathcal{S})$ which is
the direct sum $\tau \oplus \delta$ of the restriction and boundary
maps. By assumption it is surjective. Since $\delta$ is the
difference of this map and the compact map $\tau$,
L. Schwartz’s version of the Fredholm theorem for
Fréchet spaces implies that its image is of finite
codimension, that is, $H^q(Y,\mathcal{S}) = H^q(\mathcal{V},\mathcal{S})$ is finite
dimensional.
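
In formulas, the Fredholm step of this outline reads as follows. This is a sketch in the notation of the outline ($\tau$ the restriction along the refinement, $\delta$ the coboundary); the form of Schwartz’s theorem used, namely that a surjective continuous linear map between Fréchet spaces minus a compact one has an image of finite codimension, is assumed as stated.

```latex
\[
  \tau \oplus \delta :\; Z^q(\mathcal{U},\mathcal{S}) \oplus C^{q-1}(\mathcal{V},\mathcal{S})
  \longrightarrow Z^q(\mathcal{V},\mathcal{S}),
  \qquad (\xi,\eta) \longmapsto \tau(\xi) + \delta(\eta).
\]
% \tau \oplus \delta is surjective by hypothesis; \tau \oplus 0 is compact by Montel's theorem.
\[
  0 \oplus \delta \;=\; (\tau \oplus \delta) - (\tau \oplus 0)
  \quad\Longrightarrow\quad
  \dim \, Z^q(\mathcal{V},\mathcal{S}) \,/\, \delta\, C^{q-1}(\mathcal{V},\mathcal{S}) < \infty .
\]
```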

Applying this Lemma in the case of compact 
spaces where X = Y, one has the following theorem 
of Cartan and Serre: 


Theorem If X is a compact complex space and $\mathcal{S}$ is
a coherent sheaf on X, then $\dim H^q(X,\mathcal{S}) < \infty$ for
all q.


Grauert made use of this technique in solving the 
Levi problem for a strongly pseudoconvex relatively 
compact domain D with smooth boundary in a 
complex manifold X. Here strongly pseudoconvex 
means that the restriction of the Levi form to the 
complex tangent space of every boundary point is 
positive definite. To do this he sequentially made 
“bumps” at boundary points to obtain a finite 
sequence of domains $D = D_0 \subset D_1 \subset \cdots \subset D_m$ in
such a way that the restriction mappings at the
level of qth cohomology, $q \ge 1$, are all surjective
and such that at the last step D is relatively
compact in $D_m$. Applying the above Lemma,
$\dim H^q(D, \mathcal{S}) < \infty$. Using another bumping procedure,
it then follows that D is holomorphically
convex and, in fact, that D is almost Stein.

This last statement means that one can guarantee 
that O(D) separates points outside of some compact 
subset which could contain compact subvarieties on
which the global holomorphic functions are constant.
In this situation one can apply Remmert’s reduction
theorem which implies that there is a canonically
defined proper surjective holomorphic map $\pi: D \to Z$
to a Stein space which is biholomorphic outside of 
finitely many fibers. One says that, in order to obtain 
the Stein space Z, finitely many compact analytic 
subsets must be blown down to points. 

The above mentioned reduction theorem is a 
general result which applies to any holomorphically 
convex complex space X. For this one observes that 
if X is holomorphically convex, then for $x \in X$ the
level set $L(x) := \{y \in X;\ f(y) = f(x) \text{ for all } f \in \mathcal{O}(X)\}$
is a compact analytic subset of X. One then defines
an equivalence relation: $x \sim y$ if and only if the
connected component of L(x) containing x and that
of L(y) which contains y are the same. One then
equips $X/\!\sim$ with the quotient topology and proves
that the canonical quotient $\pi: X \to X/\!\sim\ =: Z$ is
proper. Finally, for U open in Z one defines
$\mathcal{O}_Z(U) = \mathcal{O}_X(\pi^{-1}(U))$ and proves that, equipped
with this structure, Z is a Stein space. This Remmert
reduction is universal with respect to holomorphic
maps to holomorphically separable complex spaces,
that is, if $\varphi: X \to Y$ is a holomorphic map and
$\mathcal{O}_Y(Y)$ separates the points
of Y, then there exists a uniquely defined holomorphic
map $\tau_\varphi: Z \to Y$ so that $\tau_\varphi \circ \pi = \varphi$. It should
be noted that, even if the original space X is a
complex manifold, the associated Stein space Z may
be singular. This reflects the fact that it is difficult to
avoid singularities in complex geometry.
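
A standard illustration of a singular Remmert reduction, which is not taken from the text above but is a well-known example, is provided by the total space of a negative line bundle over the projective line. The notation $\mathcal{O}(-2)$, $E$ and the coordinates $x, y, z$ are introduced here only for this sketch.

```latex
% X = total space of the line bundle O(-2) over P_1, E = its zero-section.
% X is holomorphically convex, and E is its only compact analytic subset of positive dimension.
\[
  \pi : X \longrightarrow Z = \{(x,y,z) \in \mathbb{C}^3 ;\ xy = z^2\} \cong \mathbb{C}^2/\{\pm 1\},
  \qquad \pi(E) = \{0\}.
\]
% X is a manifold, but Z has an isolated singularity at 0.
% For O(-1) instead, X is the blow-up of C^2 at 0 and Z = C^2 is smooth.
```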


Mapping Theory 


Above we have attempted to make it clear that 
holomorphic maps play a central role in complex 
geometry. It is even important to regard a holo- 
morphic function as a map. Here we outline the 
basic background necessary for dealing with maps 
and then state three basic theorems which involve 
proper holomorphic mappings. 


Basic Facts 


A holomorphic map F:X — Y between (reduced) 
complex spaces is a continuous map which can be 
represented locally as a holomorphic map between 
analytic subsets of the spaces in which X and Y are 
locally embedded. In other words, F is the restriction
of a map $F = (f_1,\ldots,f_m)$ which is defined by
holomorphic functions.

If X is irreducible and X and Y are one-dimensional,
then a nonconstant holomorphic map
$F: X \to Y$ is an open mapping. This statement is far
from being true in the higher-dimensional setting.
The reader need only consider the example
$F: \mathbb{C}^2 \to \mathbb{C}^2$, $(z,w) \mapsto (zw, z)$.
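
That this map is not open can be verified directly by computing its image. The following short computation is a sketch; the target coordinates $(a,b)$ are introduced only for this purpose.

```latex
% Solve (a,b) = F(z,w) = (zw, z).
%   b \neq 0 :  z = b, w = a/b is a preimage.
%   b = 0    :  z = 0 forces a = zw = 0.
\[
  F(\mathbb{C}^2) = \{(a,b) \in \mathbb{C}^2 ;\ b \neq 0\} \,\cup\, \{(0,0)\}.
\]
% The point (0,0) lies in the image, but every neighborhood of it contains
% points (a,0) with a \neq 0 which are not in the image. Hence F(C^2) is not open.
```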

Despite the fact that holomorphic maps can be
quite complicated, they have properties that in
certain respects render them tractable. Let us sketch
these in the case where X is irreducible. First, one
notes that every fiber $F^{-1}(y)$ is a closed analytic
subset of X. One defines $\mathrm{rank}_x F$ to be the codimension
at x of the fiber $F^{-1}(F(x))$. Then
$\mathrm{rank}\, F := \max\{\mathrm{rank}_x F;\ x \in X\}$. It then can be
shown that $\{x \in X;\ \mathrm{rank}_x F \le k\}$ is a closed analytic
subset of X for every k. Applying this for
$k = \mathrm{rank}\, F - 1$ we see that, outside a proper closed
analytic subset, F has constant maximal rank.

If $F: X \to Y$ has constant rank k in a neighborhood
of some point $x \in X$, then one can choose
neighborhoods U of x in X and V of F(x) in Y so
that $F|U$ maps U onto a closed analytic subset of V.
By restricting F to the sets where it has lower rank
and applying this local-image theorem, it follows
that the local images of the set where F has lower
rank are at least two dimensions smaller than those
of top rank. Conversely, the fiber dimension
$d_F(x) := \dim_x F^{-1}(F(x))$ is semicontinuous in the
sense that $d_F(x) \ge d_F(z)$ for all z near x. Finally, we
note that if Y is m-dimensional, then $F: X \to Y$ is an
open map if and only if it is of constant rank m.


Proper Mappings


By definition a mapping $F: X \to Y$ between topological
spaces is proper if and only if the inverse image
$F^{-1}(K)$ of an arbitrary compact subset K in Y is
compact in X. This is a more delicate condition
than meets the eye. For example, if $F: X \to Y$ is a
proper map and one removes one point from some
fiber, then it is normally no longer proper! On the
other hand, the restriction of a proper map to a
closed subset is still proper.

Remmert’s “Proper mapping theorem” is the first 
basic theorem on proper holomorphic maps: 


Theorem The image of a proper holomorphic map 
F:X — Y is a closed analytic subset of Y. 


Given another basic theorem of complex analysis, 
the reader can imagine how this might be proved. 
This is the continuation theorem for analytic sets 
due to Remmert and Stein: 


If X is a complex space and Y is a closed analytic
subset with $\dim_y Y \le k$ for all $y \in Y$ and Z is a closed
analytic subset of the complement $X \setminus Y$ with
$\dim_z Z \ge k+1$ at all $z \in Z$, then the topological closure cl(Z) of
Z in X is a closed analytic subset of X with
$E = \mathrm{cl}(Z) \setminus Z = \mathrm{cl}(Z) \cap Y$ a proper analytic subset of cl(Z).


Similar results hold for more general complex
analytic objects. For example, closed positive currents
with (locally) finite volume can be continued
across any proper analytic subset (Skoda 1982). A
sketch of the proof of the proper mapping theorem
(for X irreducible) goes as follows. From the
assumption that F is proper, the image F(X)
is closed. If F has constant rank k, then, by the
local result stated above, its image is everywhere
locally a k-dimensional analytic set. Since the image
is closed, the desired result follows. If $\mathrm{rank}\, F = k$
and $E := \{x \in X;\ \mathrm{rank}_x F < k\} \neq \emptyset$, then by induction
F(E) is a closed analytic subset of dimension at
most $k - 2$. Let $A := F^{-1}(F(E))$ and apply the
previous discussion for constant rank maps to
$F|(X \setminus A): X \setminus A \to Y \setminus F(E)$. The image is a closed
k-dimensional analytic subset of $Y \setminus F(E)$ and its
Remmert-Stein extension is the full image F(X).

In this framework the Stein factorization theorem 
is an important tool. Here $F: X \to Y$ is again a
proper holomorphic map which we may now
assume to be surjective. Analogous to the construction
of the reduction of a holomorphically convex
space, one says that two points in X are equivalent
if they are in the same connected component of an
F-fiber. This is indeed an equivalence relation, and
the quotient $Z := X/\!\sim$ is a complex space equipped
with the direct image sheaf. Thus one decomposes F
into two maps $X \to Z \to Y$, where $X \to Z$ is a
canonically associated surjective map with connected
fibers, and $Z \to Y$ is a finite map.

This geometric proper mapping theorem is a preview 
of one of the deepest results in complex analysis: 
Grauert’s direct image theorem. This concerns the 
images of sheaves, not just the images of points. For this, 
given a sheaf $\mathcal{S}$ on X one defines the qth direct image
sheaf on Y as the sheaf associated to the presheaf which
attaches to an open set U in Y the cohomology space
$H^q(F^{-1}(U), \mathcal{S})$. Grauert’s “Bildgarbensatz” states the
following: “If $F: X \to Y$ is a proper holomorphic map,
then all direct image sheaves of any coherent sheaf on X 
are coherent on Y.” 


Complex Analysis and Algebraic Geometry


The interplay between these subjects has motivated 
research and produced deep results on both sides. 
Here we indicate just a few results of the type which 
show that objects which are a priori of an analytic 
nature are in fact algebraic geometric. 


Projective Varieties 


Let us begin with the algebraic geometric side of the 
picture where we consider algebraic subvarieties X of 
projective space $\mathbb{P}_n(\mathbb{C})$. If $[z_0 : z_1 : \cdots : z_n]$ are homogeneous
coordinates of $\mathbb{P}_n$, such a variety is the
simultaneous zero-set, $X := V(P_1,\ldots,P_m)$, of finitely
many (holomorphic) homogeneous polynomials
$P_i = P_i(z_0,\ldots,z_n)$. Chow’s theorem states that in this
context there are no further analytic phenomena: 


Theorem Closed complex analytic subsets of projective
space $\mathbb{P}_n(\mathbb{C})$ are algebraic subvarieties.


This observation has numerous consequences. For 
example, if F: X — Y is a holomorphic map between 
algebraic varieties, then, by applying Chow’s theorem 
to its graph, it follows that F is algebraic. 

Chow’s theorem can be proved via an application
of the Remmert-Stein theorem in a very simple
situation. For this, let $\pi: \mathbb{C}^{n+1} \setminus \{0\} \to \mathbb{P}_n(\mathbb{C})$ be the
standard projection, and let $Z := \pi^{-1}(X)$. Since Z is
positive dimensional, by the Remmert-Stein theorem it
can be extended to an analytic subset of $\mathbb{C}^{n+1}$. The
resulting subvariety K(X) (the cone over X) is invariant
by the $\mathbb{C}^*$-action which is defined by $v \mapsto \lambda v$ for
$\lambda \in \mathbb{C}^*$. If f is a holomorphic function on $\mathbb{C}^{n+1}$ which
vanishes on K(X), then we develop it in homogeneous
polynomials $f = \sum_d P_d$ and note that
$\lambda(f)(z) = f(\lambda z) = \sum_d \lambda^d P_d(z)$ also vanishes on K(X) for all $\lambda$.
Hence, all $P_d$ vanish on K(X) and therefore the
ideal of holomorphic functions which vanish on K(X)
is generated by the homogeneous polynomials which
vanish on K(X) and consequently finitely many of
these define X as a subvariety of $\mathbb{P}_n(\mathbb{C})$.
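
The passage from the vanishing of f to the vanishing of each homogeneous part can be made explicit as follows. This is a sketch, with $P_d$ denoting the homogeneous part of degree d of f.

```latex
% Fix z in K(X). Since K(X) is C^*-invariant, f(\lambda z) = 0 for all \lambda \in C^*.
\[
  0 = f(\lambda z) = \sum_{d \ge 0} \lambda^d P_d(z)
  \qquad \text{for all } \lambda \in \mathbb{C}^* .
\]
% The right-hand side is a convergent power series in \lambda which vanishes identically,
% so every coefficient vanishes:
\[
  P_d(z) = 0 \quad \text{for all } d \ge 0 \text{ and all } z \in K(X).
\]
```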

Complements of subvarieties in projective varieties 
occur in numerous applications and are important 
objects in complex geometry. Even complements $\mathbb{P}_n \setminus Y$
of subvarieties Y in the full projective space are not
well understood. If Y is the intersection of a compact
projective variety X with a projective hyperplane, that
is, Y is a hyperplane section, then $X \setminus Y$ is affine. If Y is
q-codimensional in X, then $X \setminus Y$ possesses a certain
degree of Levi convexity and general theorems of 
Andreotti and Grauert (1962) on the finiteness and 
vanishing of cohomology indeed apply. However, not 
nearly as much is understood in this case as in the case 
of a hyperplane section. 


Kodaira Embedding Theorem 


Given that analytic subvarieties of projective space 
are algebraic, one would like to understand whether 
a given compact complex manifold or complex 
space can be realized as such a subvariety. Kodaira’s 
theorem is a prototype of such an embedding 
theorem. Most often one formulates projective 
embedding theorems in the language of bundles. 

For this, observe that if $L \to X$ is a holomorphic
line bundle over a compact complex manifold, then
its space $\Gamma(X, L)$ of holomorphic sections is a finite-dimensional
vector space V. The zero-set of a nonzero section
$s \in V$ is a one-codimensional subvariety of X.
Let us restrict our attention to bundles which are
generated by their sections, which for line bundles
simply means that for every $x \in X$ there is some
section $s \in V$ with $s(x) \neq 0$. It then follows that for
every $x \in X$ the space $H_x := \{s \in V;\ s(x) = 0\}$ is a
one-codimensional vector subspace of V. Thus L
defines a holomorphic map $\varphi_L: X \to \mathbb{P}(V^*)$, $x \mapsto H_x$.
Note that we must go to the projective space $\mathbb{P}(V^*)$,
because a linear function defining such an $H_x$ is only
unique up to a complex multiple.
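
In terms of a basis this map takes a familiar form. The following is a sketch; the basis $s_0,\ldots,s_N$ of the space of sections and the identification of the target with $\mathbb{P}_N$ via the dual basis are assumptions made only for this illustration.

```latex
% Assumption: s_0, ..., s_N is a basis of the space of sections of L,
% read in a local trivialization of L near x.
\[
  x \longmapsto [\, s_0(x) : s_1(x) : \cdots : s_N(x) \,] \in \mathbb{P}_N .
\]
% Changing the trivialization multiplies all s_j(x) by the same nonzero factor,
% so the point is well defined; it is defined at every x because some s_j(x) \neq 0.
% The hyperplane of sections vanishing at x is the kernel of the linear function
%   \sum_j c_j s_j \longmapsto \sum_j c_j s_j(x).
```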

Projective embedding theorems state that under 
certain conditions on L the map $\varphi_L$ is a holomorphic
embedding, that is, it is injective and is everywhere 
of maximal rank in the analytic sense that its 
differential has maximal rank. Here we outline a 
complex analytic approach of Grauert for proving 
embedding theorems. It makes strong use of the 
complex geometry of bundle spaces. 

Let $L \to X$ be a holomorphic line bundle over a
compact complex manifold. A Hermitian bundle metric
is a smoothly varying metric h in the fibers of L. This
defines a norm function $v \mapsto |v|^2 := h(v,v)$ on the
bundle space L. One says that L is positive if the tubular
neighborhood $T := \{v \in L;\ |v|^2 < 1\}$ is strongly
pseudoconcave, that is, when regarded from outside
T, its boundary is strongly pseudoconvex.

To prove an embedding theorem, one must
produce sections with prescribed properties. Sections
of powers $L^k$ are closely related to holomorphic
functions on the dual bundle space $L^*$. This is due to
the fact that if $\pi: L^* \to X$ is the bundle projection,
$\pi^{-1}(U_\alpha) \cong U_\alpha \times \mathbb{C}$ is a local trivialization, and $z_\alpha$ is
a fiber coordinate, then a holomorphic function f on
$L^*$ has a Taylor series development


$$f(v) = \sum_{n} s_\alpha^{(n)}(\pi(v))\, z_\alpha^n(v).$$


The function f is well defined on $L^*$. Hence, the
transformation law for the $z_\alpha^n$ must be canceled out
by a transformation law for the coefficient functions
$s_\alpha^{(n)}$. This implies that the $s_\alpha^{(n)}$ are sections of $L^n$.
Hence, proving the existence of sections in the
powers of L with prescribed properties amounts to
the same thing as proving the existence of holomorphic
functions on $L^*$ with analogous properties.
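
The cancellation of transformation laws can be written out explicitly. This is a sketch under an assumed convention: $g_{\alpha\beta}$ denotes the transition functions of L on overlaps of trivializing sets, a symbol not used in the text above, and the fiber coordinates of the dual bundle transform by the inverse.

```latex
% Assumption (convention): L has transition functions g_{\alpha\beta} on U_\alpha \cap U_\beta,
% so fiber coordinates on the dual bundle L^* transform by the inverse:
\[
  z_\alpha = g_{\alpha\beta}^{-1}\, z_\beta .
\]
% f is a function on L^*, so both local expansions agree term by term:
\[
  s_\alpha^{(n)}\, z_\alpha^{\,n} = s_\beta^{(n)}\, z_\beta^{\,n}
  \quad\Longrightarrow\quad
  s_\alpha^{(n)} = g_{\alpha\beta}^{\,n}\, s_\beta^{(n)} ,
\]
% which is the transformation law of a section of the nth power of L.
```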

The positivity assumption on L is equivalent to
assuming that the tubular neighborhoods of the zero-section
in $L^*$ defined by the norm function associated
to the dual metric are strongly pseudoconvex. The
solution to the Levi problem, which was sketched
above, then shows that $L^*$ is holomorphically convex,
and its Remmert reduction is achieved by simply
blowing down its zero-section. In other words, $L^*$ is
essentially a Stein manifold, and using Stein theory, it
is possible to produce enough holomorphic functions
on $L^*$ to show that some power $L^k$ defines a
holomorphic embedding $\varphi_{L^k}: X \to \mathbb{P}(\Gamma(X,L^k)^*)$.
Bundles with this property are said to be ample, and
thus we have outlined the following fact: “a line
bundle which is Grauert-positive is ample.”

It should be underlined that we defined the Chern
class of L as the image in $H^2(X,\mathbb{Z})$ of its equivalence
class in $H^1(X,\mathcal{O}^*)$, that is, in this formulation the
Chern class is a Čech cohomology class. It is, however,
often more useful to consider it as a deRham class
where it lies in the (1, 1)-part of $H^2_{\mathrm{deR}}(X, \mathbb{C})$. If h is a
bundle metric as above, then the Levi form of the norm
function is a representative $-c_1(L,h)$ of the Chern
class of $L^*$. Thus $c_1(L,h)$ is an integral (1, 1)-form
which represents $c_1(L)$. It is called the Chern form of L
associated to the metric h. The following is Kodaira’s
formulation of his embedding theorem:


Theorem A line bundle L is ample if and only if it 
possesses a metric h so that $c_1(L, h)$ is positive definite.


Kodaira’s proof of this fact follows from his 
vanishing theorem (see Several Complex Variables: 
Compact Manifolds) in the same way the example 
of Theorem A was derived from Theorem B in the
first example in the subsection “Selected theorems.”
That an ample bundle is positive follows immediately
from the fact that if $\varphi_{L^k}$ is an embedding, then
its pullback of the (positive) hyperplane bundle on
projective space agrees with $L^k$.

Finally, one asks the question “under what natural 
conditions can one construct a bundle L which is 
positive?” The following is an example of an answer 
which is related to geometric quantization. 

Suppose that X is a compact complex manifold
equipped with a symplectic structure $\omega$, that is, $\omega$ is
a d-closed, nondegenerate 2-form. One says that $\omega$ is
Kählerian if it is compatible with the complex
structure J in the sense that $\omega(Jv, Jw) = \omega(v,w)$ for all v and w
and $\omega(Jv,v) > 0$ for every nonzero v in every tangent space
of X. Note that if L is a positive line bundle, then it
possesses a Hermitian metric h such that $\omega = c_1(L, h)$
is a Kählerian structure on X.

It should be underlined that there are Kähler
manifolds without positive bundles. For example,
every compact complex torus $T = \mathbb{C}^n/\Gamma$ possesses
the Kählerian structure which comes from the
standard linear structure on $\mathbb{C}^n$. However, for $n > 1$
most such tori are not projective algebraic and
therefore do not have positive bundles.

If, on the other hand, the Kählerian structure is
integral, a condition that is automatic for the Chern
form $c_1(L,h)$ of a bundle, then there is indeed a line
bundle $L \to X$ equipped with a Hermitian metric h
such that $c_1(L, h) = \omega$. The condition of integrality can
be formulated in terms of the integrals of $\omega$ over
homology classes being integral or that its deRham
class is in the image of $H^2(X, \mathbb{Z})$ under the deRham isomorphism from
the Čech cohomology $H^2(X, \mathbb{Z}) \otimes \mathbb{C}$ to $H^2_{\mathrm{deR}}(X, \mathbb{C})$.
Coupling this with the embedding theorem for positive
bundles, we have the following theorem of Kodaira:


Theorem If $(X,\omega)$ is Kählerian and $\omega$ is integral,
then X is projective algebraic.


This result has been refined in the following 
important way (a conjecture of Grauert and 
Riemenschneider proved with different methods by 
Siu (1984) and by Demailly (1985)): the same result 
holds if $\omega$ is only assumed to be semipositive and
positive in at least one point. 

For Grauert’s proof of the Kodaira embedding 
theorem and a number of other important and 
beautiful results, we recommend the original paper 
(Grauert 1962). 


Quotients of Bounded Domains 


Let D be a bounded domain in $\mathbb{C}^n$ and $\Gamma$ be a discrete
subgroup of Aut(D) which is acting freely on D with a
compact quotient $X := D/\Gamma$. For $\gamma \in \Gamma$ let $J(\gamma, z)$ be the
determinant of the Jacobian $d\gamma/dz$ and, given a
holomorphic function f, consider (at least formally)
the Poincaré series $\sum_{\gamma \in \Gamma} f(\gamma(z))\, J(\gamma, z)^k$ of weight k. If f is
bounded and $k \ge 2$, then this series converges to a
holomorphic function P(f) on D which satisfies the
transformation rule $P(f)(\gamma(z)) = J(\gamma,z)^{-k} P(f)(z)$.
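
The transformation rule follows from the chain rule for Jacobian determinants. The following derivation is a sketch which assumes absolute convergence of the series, so that the sum may be reindexed; $\gamma'$ is an auxiliary summation variable.

```latex
% Chain rule (cocycle identity) for Jacobian determinants:
\[
  J(\gamma'\gamma, z) = J(\gamma', \gamma(z))\, J(\gamma, z).
\]
% Substituting into the series and reindexing the sum by \gamma'\gamma:
\begin{align*}
  P(f)(\gamma(z))
    &= \sum_{\gamma' \in \Gamma} f(\gamma'\gamma(z))\, J(\gamma', \gamma(z))^{k} \\
    &= J(\gamma, z)^{-k} \sum_{\gamma' \in \Gamma} f(\gamma'\gamma(z))\, J(\gamma'\gamma, z)^{k}
     = J(\gamma, z)^{-k}\, P(f)(z).
\end{align*}
```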

Now the differential volume form $\Omega := dz_1 \wedge \cdots \wedge dz_n$
transforms in the opposite way (for k=1).
Therefore $s(f) = P(f)\,\Omega^k$ is a $\Gamma$-invariant section of
the kth power of the determinant bundle
$K := \Lambda^n T^*D$ of the holomorphic cotangent bundle
of D. In other words, $s(f) \in \Gamma(X,K^k)$. Since the
choice of f may be varied to show that there are 
sufficiently many sections to separate points and to 
guarantee the maximal rank condition, it follows 
that the canonical bundle K of X is ample. Compact 
complex manifolds with ample canonical bundle are 
examples of manifolds which are said to be of 
general type (see Several Complex Variables: Compact 
Manifolds). Thus, this construction with Poincaré 
series proves the following: “Every compact quotient 
$D/\Gamma$ of a bounded domain is of general type and is
in particular projective algebraic.” 


See also: Gauge Theoretic Invariants of 4-Manifolds; 
Moduli Spaces: An Introduction; Riemann Surfaces; 
Several Complex Variables: Compact Manifolds; Twistor 
Theory: Some Applications [in Integrable Systems, 
Complex Geometry and String Theory].


Introduction


The aim of this article is to give an overview of the 
classification theory of compact complex manifolds. 
Very roughly, compact manifolds can be divided 
into three disjoint classes: 


• Projective manifolds, that is, manifolds which can
be embedded into some projective space, or
manifolds birational to those, usually called
Moishezon manifolds. These manifolds are treated
by algebraic geometric methods, but very often
transcendental methods are also indispensable.

• Compact (nonalgebraic) Kähler manifolds, that is,
manifolds carrying a positive closed (1, 1)-form,
or manifolds bimeromorphic to those. This class
is treated mainly by transcendental methods from
complex analysis and complex differential geometry.
However, some algebraic methods are also
of use here.

• General compact manifolds which are not bimeromorphic
to Kähler manifolds. For two reasons
we will essentially ignore this class in our survey.
First, because of the lack of methods, not much is
known; for example, there is still no complete
classification of compact complex surfaces, and it
is still unknown whether or not the 6-sphere
carries a complex structure. And second, for the
purpose of this encyclopedia, this class seems to
be less important.


The main problems of classification theory can be 
described as follows. 


• Birational classification: describe all projective
(Kähler) manifolds up to birational (bimeromorphic)
equivalence; find good models in every equivalence
class. This includes the study of invariants.

• Biholomorphic classification: classify all projective
(Kähler) manifolds with some nice property,
for example, curvature, many symmetries, etc.

• Topological classification and moduli: study all
complex structures on a given topological manifold,
including the study of topological invariants of
complex manifolds; describe complex structures
up to deformations and describe moduli spaces.

• Symmetries: describe group actions and invariants;
this is deeply related to the moduli problem.


In this article we will assume familiarity with 
basic notions and methods from several complex 
variables and/or algebraic geometry. In particular 
we refer to Several Complex Variables: Basic 
Geometric Theory in this encyclopedia. 

We first note some standard notation used in this 
article. If X is a complex manifold of dimension n, 
then $T_X$ will denote its holomorphic tangent bundle
and $\Omega^p_X$ the sheaf of holomorphic p-forms, that is,
the sheaf of sections of the bundle $\Lambda^p T_X^*$. The
bundle $\Lambda^n T_X^*$ is usually denoted by $K_X$, the
canonical bundle of X, and its sheaf of sections is
the dualizing sheaf $\omega_X$, but frequently we will not
distinguish between vector bundles and their sheaves
of sections. An effective (Cartier) divisor on a
normal space X is a finite linear combination
$\sum n_i Y_i$, where $n_i > 0$ and $Y_i \subset X$ are irreducible
reduced subvarieties of codimension 1, which are
locally given by one equation. If L is a line bundle,
then instead of $L^{\otimes m}$ we often write mL. If X is a
compact variety and E a vector bundle or coherent
sheaf, then the dimension of the finite-dimensional
vector space $H^q(X, E)$ will be denoted by $h^q(X, E)$.


Birational Classification 


Two compact manifolds X and Y are bimeromorphically
equivalent, if there exist nowhere dense
analytic subsets $A \subset X$ and $B \subset Y$ and a biholomorphic
map $X \setminus A \to Y \setminus B$ such that the closure of
the graph is an analytic set in $X \times Y$. In case X and
Y are algebraic, one rather says that X and Y are
birationally equivalent. This induces an isomorphism
between the function fields of X and Y. If X and
Y are projective or Moishezon (see below), then
conversely an isomorphism of their function fields
induces a birational equivalence between X and Y.
Important examples are blow-ups of submanifolds;
locally they can be described as follows. Suppose
that locally X is an open set $U \subset \mathbb{C}^n$ with coordinates
$z_1,\ldots,z_n$ and that $A \subset X$ is given by
$z_1 = \cdots = z_m = 0$. Then the blow-up $\hat{X} \to X$ is the
submanifold $\hat{X} \subset U \times \mathbb{P}_{m-1}$ given by the
equations


$$z_i t_j - z_j t_i = 0, \qquad 1 \le i < j \le m,$$


where $t_j$ are homogeneous coordinates in $\mathbb{P}_{m-1}$.
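
The simplest instance, the blow-up of a point in the plane, may help to make these equations concrete. The following is a sketch; the affine coordinate u and the notation $\hat{X}$ for the blow-up are introduced only for this illustration.

```latex
% Simplest case: blow-up of the origin in C^2 (two coordinates, center z_1 = z_2 = 0).
\[
  \hat{X} = \{ (z_1, z_2, [t_1 : t_2]) \in \mathbb{C}^2 \times \mathbb{P}_1 ;\ z_1 t_2 - z_2 t_1 = 0 \}.
\]
% In the chart t_1 \neq 0 with affine coordinate u = t_2/t_1 the equation reads z_2 = u z_1,
% so (z_1, u) are coordinates there and the projection to C^2 is
\[
  (z_1, u) \longmapsto (z_1,\ u z_1).
\]
% The fiber over 0 is all of P_1 (the exceptional curve);
% over a point z \neq 0 it is the single point [z_1 : z_2].
```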

The Chow lemma says that any birational, even
rational, map can be dominated by a sequence of
blow-ups with smooth centers. Recently other
factorizations (“weak factorization,” using blow-ups
and blow-downs) have been established.

A projective manifold is a compact manifold which
is a submanifold of some projective space $\mathbb{P}_N$. Of
course, a projective manifold can be embedded into
projective spaces in many ways. According to Chow’s
theorem (see Several Complex Variables: Basic
Geometric Theory), $X \subset \mathbb{P}_N$ is automatically given
by polynomial equations and is therefore an algebraic
variety. This is part of Serre’s GAGA principle which
roughly says that all global analytic objects on a
projective manifold, for example, vector bundles or
coherent sheaves and their cohomology, are automatically
algebraic. A compact manifold which is
bimeromorphically equivalent to a projective manifold
is called a Moishezon manifold. These arise
naturally, for example, as quotients of group actions,
compactifications, etc.

The most important birational invariant of compact
manifolds is certainly the Kodaira dimension
$\kappa(X)$. It is defined in three steps:


• $\kappa(X) = -\infty$ iff $h^0(mK_X) = 0$ for all $m \ge 1$.

• $\kappa(X) = 0$ iff $h^0(mK_X) \le 1$ for all m, and
$h^0(mK_X) = 1$ for some m.

• In all other cases we can consider the meromorphic
map $f_m: X \to \mathbb{P}_{N(m)}$ associated to $H^0(mK_X)$ for all
those m for which $h^0(mK_X) \ge 2$. Let $V_m$ denote
the (closure of the) image of $f_m$. Then $\kappa(X)$ is
defined to be the maximal possible $\dim V_m$.


Recall that $f_m$ is defined by $[s_0 : \cdots : s_{N(m)}]$ for a given
basis $s_i$ of $H^0(mK_X)$, cf. Several Complex Variables:
Basic Geometric Theory.
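
For compact Riemann surfaces the three cases of this definition correspond exactly to the genus. The following table of values is a sketch based on the Riemann-Roch theorem, which is not discussed above; C and g are notation introduced only for this illustration.

```latex
% C a compact Riemann surface of genus g, so \deg(mK_C) = m(2g-2).
\begin{align*}
  g = 0 &: \quad h^0(mK_C) = 0 \ \text{ for all } m \ge 1
          && \Longrightarrow\ \kappa(C) = -\infty,\\
  g = 1 &: \quad K_C \text{ trivial},\ h^0(mK_C) = 1 \ \text{ for all } m
          && \Longrightarrow\ \kappa(C) = 0,\\
  g \ge 2 &: \quad h^0(mK_C) = (2m-1)(g-1) \ \text{ for } m \ge 2
          && \Longrightarrow\ \kappa(C) = 1 .
\end{align*}
```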

In the same way one defines the Kodaira (or
Iitaka) dimension $\kappa(L)$ of a holomorphic line bundle
L (instead of $L = K_X$).

We are now going to describe geometrically the 
different birational equivalence classes and how to 
single out nice models in each class. Using methods 
in characteristic p, Miyaoka and Mori proved the 
following theorem: 


Theorem 1 Let X be a projective manifold and
suppose that through a general point $x \in X$ there is a
curve C such that $K_X \cdot C < 0$. Then X is uniruled, that
is, there is a family of rational curves covering X.


A rational curve is simply the image of a nonconstant
map $f : \mathbb{P}_1 \to X$. It is a simple matter to prove
that uniruled manifolds have $\kappa(X) = -\infty$, but the
converse is an important open problem. A step
towards this conjecture has recently been made by
Boucksom et al. (2004): if $K_X$ is not pseudoeffective,
that is, $K_X$ "cannot be approximated by effective
divisors," then X is uniruled. Here one also finds a
discussion of the case when $K_X$ is pseudoeffective.

Mori theory is central in birational geometry. 
To state the main results in this theory, we recall the 


notion of ampleness: a line bundle L is ample if L
carries a metric of positive curvature. Alternatively,
some tensor power of L has enough global sections to
separate points and tangents and therefore gives an
embedding into some projective space; see Several
Complex Variables: Basic Geometric Theory for
more details. The notion of nefness, which is in a
certain sense the degenerate version of ampleness,
plays a central role in Mori theory: a line bundle or
divisor L is nef if


$$L \cdot C = \deg(L|_C) \ge 0$$


for all curves $C \subset X$. Examples are those L carrying
a metric of semipositive curvature, but the converse
is not true. However, if L is nef, there exists for all
$\epsilon > 0$ a metric $h_\epsilon$ with curvature $\Theta_{h_\epsilon} \ge -\epsilon\omega$,
where $\omega$ is a fixed positive form. In this context
singular metrics on L are also important. Locally
they are given by $e^{-\varphi}$ with a locally integrable
weight function $\varphi$ and they still have a curvature
current $\Theta$. If L has a singular metric with $\Theta$
bounded from below as current by a Kähler form,
then L is big, that is, $\kappa(L) = \dim X$, the birational
version of ampleness. If one simply has $\Theta \ge 0$ as
current, then L is pseudoeffective (and vice versa).
All these positivity notions only depend on the
Chern class $c_1(L)$ of L and therefore one considers
the ample cone


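$$\mathcal{K}_{\mathrm{amp}} \subset \left(H^{1,1}(X) \cap H^{2}(X,\mathbb{Z})\right) \otimes \mathbb{R}$$

and the cone of curves

$$\overline{NE}(X) \subset \left(H^{n-1,n-1}(X) \cap H^{2n-2}(X,\mathbb{Z})\right) \otimes \mathbb{R}$$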


The ample cone is by definition the closed cone of 
nef divisors, the interior being the ample classes, 
while the cone of curves is the closed cone generated 
by the fundamental classes of irreducible curves. 

A basic result says that these cones are dual to
each other. The structure of $\overline{NE}(X)$ in the part
where $K_X$ is negative is very nice; one has the
following cone theorem:


Theorem 2 $\overline{NE}(X)$ is locally finite polyhedral in
the half-space $\{K_X < 0\}$; the (geometrically) extremal
rays contain classes of rational curves.


A ray $R = \mathbb{R}_+[a]$ is said to be extremal in a closed
cone K if the following holds: given $b, c \in K$ with
$b + c \in R$, then $b, c \in R$. Given such an extremal ray
$R \subset \overline{NE}(X)$, one can find an ample line bundle H
and a rational number $t$ such that $K_X + tH$ is nef
and $(K_X + tH) \cdot R = 0$. Using the Kawamata-Viehweg
vanishing theorem, a generalization of Kodaira's
vanishing theorem, which is one of the technical
cornerstones of the theory, one proves the so-called


Base point free theorem Some multiple of $K_X + tH$
is spanned by global sections and therefore defines a
holomorphic map $f : X \to Y$ to some normal projective
variety Y contracting exactly those curves whose
classes belong to R.


These maps are called "contractions of extremal
rays" or "Mori contractions." In dimension 2 they
are classical: either $X = \mathbb{P}_2$ and f is the constant
map, or f is a $\mathbb{P}_1$-bundle, or f is birational and the
contraction of a $\mathbb{P}_1$ with normal bundle $\mathcal{O}(-1)$, that
is, f contracts a $(-1)$-curve. In particular Y is again
smooth. In the first two cases X has a very precise
structure, but in the third birational case one
proceeds by asking whether or not $K_Y$ is nef. If it
is not nef, we start again by choosing the contraction
of an extremal ray; if $K_Y$ is nef, then a
fundamental result says that a multiple of $K_Y$ is
spanned. The class of manifolds with this property
will be discussed later.

The situation in higher dimensions is much more 
complicated. For example, Y need no longer be 
smooth. However the singularities which appear are 
rather special. 


Definition 1 A normal variety X is said to have
only terminal singularities if first some multiple of
the canonical (Weil) divisor $K_X$ is a Cartier divisor,
that is, a line bundle (one says that X is
$\mathbb{Q}$-Gorenstein) and second if for some (hence for
every) resolution of singularities $\pi : \hat{X} \to X$ the
following holds:


$$K_{\hat{X}} = \pi^*(K_X) + \sum a_i E_i$$


where the $E_i$ run over the irreducible $\pi$-exceptional
divisors and the $a_i$ are strictly positive.


A brief remark concerning Weil divisors is in
order: a Weil divisor is a finite linear combination
$\sum a_i Y_i$ with $Y_i$ irreducible of codimension 1, but $Y_i$
is not necessarily locally defined by one equation.
Recall that if each $Y_i$ is given locally by one
equation, then the Weil divisor is Cartier. On a
smooth variety these notions coincide.

One important consequence is that $\kappa(X) = \kappa(\hat{X})$ in
case of terminal singularities, which is completely
false for arbitrary singularities. Also notice that
terminal singularities are rational: $R^q\pi_*(\mathcal{O}_{\hat{X}}) = 0$ for
$q \ge 1$. Terminal singularities occur in codimension
at least 3. Thus they are not present on surfaces. In
dimension 3 terminal singularities are well understood.
The main point in this context is that for a
birational Mori contraction the image Y often has
terminal singularities.

Now the scheme of Mori theory is the following.
Start with a projective manifold X. If $K_X$ is nef, we
stop; this class is discussed later. If $K_X$ is not nef,
then perform a Mori contraction $f : X \to Y$. There
are two cases:


• If $\dim Y < \dim X$, then the general fiber F is a
manifold with ample $-K_F$, that is, a Fano
manifold (discussed in the next section). Here we
stop and observe that $\kappa(X) = -\infty$. Of course one
can still investigate Y and try to say more on the
structure of the fibration f.

• If $\dim Y = \dim X$, then Y has terminal singularities,
unless f is a small contraction, which means that no
divisors are contracted. Thus if f is not small, we may
attempt to proceed by substituting X by Y.


As a result one must develop the entire theory for
varieties with terminal singularities. The big problem
arises from small contractions f. In that case
$K_Y$ cannot be $\mathbb{Q}$-Cartier and the machinery stops. So
new methods are required. At this stage, other
aspects of the theory lead one to attempt a certain
surgery procedure which should improve the situation
and allow one to continue as above. The
expected surgery $Y \dashrightarrow Y'$, which takes place in
codimension at least 2, is a "flip." The idea is that
we should substitute a small set, namely the
exceptional set of a small contraction, by some
other small set (on which the canonical bundle will
be positive) to improve the situation. Of course $Y'$
should possess only terminal singularities. The
existence of flips is very deep and has been proved
by S Mori in dimension 3. Moreover, there cannot
be an infinite sequence of flips, at least in dimension
at most 4.

In summary, by performing contractions and flips
one constructs from X a birational model $X'$ with
terminal singularities such that either


• $K_{X'}$ is nef, in which case we call $X'$ a minimal
model for X, or

• $X'$ admits a Fano fibration $f' : X' \to Y'$ (discussed
below), in which case $\kappa(X) = \kappa(X') = -\infty$.


Up to now, Mori theory (via the work of 
Kawamata, Kollár, Mori, Reid, Shokurov, and
others) works well in dimension 3 (and possibly in 
the near future in dimension 4) but in higher 
dimensions there are big problems with the existence 
of flips. Of course there might be completely 
different and possibly less precise ways to construct 
a minimal model. One way is to consider the 
canonical ring R of a manifold of general type: 


$$R = \bigoplus_{m \ge 0} H^0(mK_X)$$


If R is finitely generated as a $\mathbb{C}$-algebra, then
Proj(R) would be at least a canonical model, which
has slightly more complicated singularities than a
minimal model. However, it is known that this
"finite generatedness problem" is equivalent to the
existence of minimal models. On the other hand, if
X is of general type with $K_X$ nef (hence essentially
ample) or more generally when some positive
multiple $mK_X$ is generated by global sections, then
R is finitely generated.

We now must discuss the case of a nef canonical 
bundle. The behavior is predicted by the 


Abundance conjecture. If X has only terminal 
singularities and Kx is nef, then some multiple 
mKx is spanned. 


Up to now this conjecture is known only in
dimension 3 (Kawamata, Kollár, Miyaoka). In
higher dimensions it is even unknown if there is a
single section in some multiple $mK_X$. If $mK_X$ is
spanned, one considers the Stein factorization
$f : X \to Y$ of the associated map, which is called the
Iitaka fibration (if not birational) and we have
$\dim Y = \kappa(X)$ by definition. The general fiber F is a
variety with $K_F \equiv 0$, a class discussed in the next
section. If f is birational, then Y will be slightly
singular (so-called canonical singularities) and $K_Y$
will be ample. Essentially we are in the case of
negative Ricci curvature.

Everything that was outlined above holds for 
projective manifolds. In the Kähler case one would
expect the same picture, but the methods completely 
fail, and new, analytic methods must be found. Only 
very few results are known in this context. 

We come back to the case of a Fano fibration
$f : X \to Y$. By definition the anticanonical bundle $-K_X$
is relatively ample so that the general fiber is a Fano
variety. In this case there are no constraints on Y.

To see how much of the geometry of X is dictated 
by the rational curves, one considers the so-called 
rational quotient of X. Here we identify two very 
general points on X if they can be joined by a chain 
of rational curves. In that way we obtain the 
rational quotient 


$$f : X \dashrightarrow Y$$


This map is merely meromorphic, but has the 
remarkable property of being “almost holo- 
morphic,” that is, the set of indeterminacies does 
not project onto Y. In other words, one has nice 
compact fibers not meeting the indeterminacy set. If 
Y is just a point, then all points of X can be joined 
by chains of rational curves and X is called 
rationally connected. This notion is clearly biration- 
ally invariant. 

A deep theorem of Graber-Harris-Starr states
that, given a Fano fibration (or a fibration with
rationally connected fibers) $f : X \to Y$, X is
rationally connected if and only if Y is.

Manifolds X which are birational to $\mathbb{P}_n$ are
called rational. If there merely exists a surjective
("dominant") rational map $\mathbb{P}_n \dashrightarrow X$, then X is said
to be unirational. Of course rational (resp. unirational)
manifolds are rationally connected, but to
decide whether a given manifold is rational/unirational
is often a very deep problem. Therefore,
rational connectedness is often viewed as a practical
substitute for (uni)rationality.

Often it is very important to compute the Kodaira
dimension of fiber spaces. Let us fix a holomorphic
surjective map $f : X \to Y$ between projective manifolds
and suppose that f has connected fibers. Then
the so-called conjecture $C_{n,m}$ states that


$$\kappa(X) \ge \kappa(F) + \kappa(Y)$$


where F is the general fiber of f. This conjecture is
known in many cases, for example, when the
general fiber is of general type, but it is wide open
in general. It is deeply related to the existence of
minimal models (Kawamata).


Biholomorphic Classification 
In this section we discuss manifolds X with 


• ample anticanonical bundles $-K_X$ (Fano manifolds),
• trivial canonical bundles, and
• ample canonical bundles $K_X$.


Due to the solution of the Calabi conjecture by
Yau and Aubin, these classes are characterized by a
Kähler metric of positive (resp. zero, resp. negative)
Ricci curvature. In principle, in view of the results of
Mori theory, one should rather consider varieties
with terminal singularities, but we ignore this aspect
completely. Philosophically, up to birational equivalence
all manifolds are somehow composed of those
classes via fibrations, possibly
also up to étale coverings.

Examples of Fano manifolds are hypersurfaces of
degree at most $n+1$ in $\mathbb{P}_{n+1}$, Grassmannians, or
more generally homogeneous varieties G/P with G
semisimple and P a parabolic subgroup. Fano
manifolds are simply connected. This can be seen
either by classical differential geometric methods
using a Kähler metric of positive curvature or via the
fundamental


Theorem 3 Fano manifolds are rationally connected.


The only known proof of this fact uses, as in the 
uniruled criterion mentioned above, characteristic p 
methods. By just using complex methods it is not 


known how to construct a single rational curve (of 
course, in concrete examples the rational curves are 
seen immediately). One still has to observe that 
rationally connected manifolds are simply con- 
nected, which is not so surprising, since rational 
curves lift to the universal cover. 

At least in principle, Fano manifolds can be 
classified: 


Theorem 4 There are only finitely many families of 
Fano manifolds in every dimension. 


A family (of Fano manifolds) is a submersion
$\pi : \mathcal{X} \to S$ (with S irreducible) such that all fibers are
Fano manifolds. The essential step is to bound $(-K_X)^n$.
An actual classification has been carried out only
in dimension up to 3; in dimension 2 one finds
$\mathbb{P}_2$, $\mathbb{P}_1 \times \mathbb{P}_1$ and the so-called del Pezzo surfaces ($\mathbb{P}_2$
blown up in at most eight points in general position).
In dimension 3 there are already 17 families of Fano
3-folds with $b_2 = 1$ and 88 families with $b_2 \ge 2$.

An extremely hard question is to decide whether a 
given Fano manifold is rational or unirational. Even 
in dimension 3 this is not completely decided. 
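
The hypersurface examples in this section can be checked with the adjunction formula. The following is a short worked computation, added as an illustration under the assumption that $X \subset \mathbb{P}_{n+1}$ is a smooth hypersurface of degree $d$ (the symbol $d$ is introduced here and does not appear in the text).

```latex
\begin{align*}
K_{\mathbb{P}_{n+1}} &= \mathcal{O}_{\mathbb{P}_{n+1}}(-n-2), \\
K_X &= \left(K_{\mathbb{P}_{n+1}} \otimes \mathcal{O}_{\mathbb{P}_{n+1}}(X)\right)\big|_X
     = \mathcal{O}_X(d - n - 2).
\end{align*}
% Consequently:
%   d \le n+1  : -K_X is ample, so X is Fano;
%   d  =  n+2  :  K_X is trivial;
%   d \ge n+3  :  K_X is ample.
```

So the degree bound "at most $n+1$" is exactly the condition for $-K_X$ to be ample, and degree $n+2$ is the borderline case of trivial canonical bundle.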

The next class to be discussed are the manifolds
with trivial canonical class $K_X$. This means that
there is a holomorphic n-form without zeros
($n = \dim X$). Important examples are tori and
hypersurfaces in $\mathbb{P}_{n+1}$ of degree $n+2$. Simply
connected manifolds with trivial canonical bundles
are further divided into irreducible Calabi-Yau
manifolds and irreducible symplectic manifolds.
The first class is defined by requiring that there are
no holomorphic p-forms for $0 < p < \dim X$ whereas the
second is characterized by the existence of a
holomorphic 2-form of everywhere maximal rank.
A completely different characterization is by holonomy:
an irreducible Calabi-Yau manifold has SU-holonomy
whereas irreducible symplectic manifolds have
Sp-holonomy (with respect to a suitable Kähler metric).

The splitting theorem of Beauville-Bogomolov-Kobayashi
says


Theorem 5 Let X be a projective (or compact
Kähler) manifold with trivial canonical bundle.
Then there exists a finite unbranched cover $\tilde{X} \to X$
such that


$$\tilde{X} = A \times \prod X_i \times \prod Y_j$$


with A a torus, $X_i$ irreducible Calabi-Yau, and $Y_j$
irreducible symplectic.


The key to the proof of this theorem is the
existence of a Ricci-flat Kähler metric on X, that is, a
Kähler-Einstein metric with zero Ricci curvature.
Actually one has a stronger result: instead of
assuming $K_X$ to be trivial, just assume that
$c_1(X) = 0$ in $H^2(X,\mathbb{R})$. Then there exists a finite
unramified cover $\tilde{X} \to X$ such that $K_{\tilde{X}}$ is trivial. In
view of Mori theory, normal projective varieties X
with at most terminal singularities and $K_X \equiv 0$ (i.e.,
$K_X \cdot C = 0$ for all curves) should also be investigated.
It is expected that similar structure theorems hold;
in particular $\pi_1(X)$ should be finite. The main
difficulty is that there are no differential methods
available; on the other hand an algebraic proof even
for the splitting theorem in the smooth case is
unknown.

Calabi-Yau manifolds play an important role in 
string theory and mirror symmetry (see Mirror 
Symmetry: A Geometric Survey). Here we mention 
two basic problems. The first is the problem of 
boundedness: 

Are there only finitely many families of Calabi- 
Yau manifolds in any dimension? 

This problem is wide open; in particular one 
might ask: 

Is the Hodge number $h^{1,2}$ bounded for Calabi-Yau
3-folds?

The other problem asks for the existence of 
rational curves. In all known examples there are 
rational curves, but a general existence proof is not 
known. The case where $b_2(X) = 1$ seems to be
particularly difficult. If $b_2(X) \ge 2$, then in many
cases one can hope to find a fibration or a birational
map, at least for 3-folds. Given such a map, the
existence of rational curves is simple. For example,
if $D \subset X$ is an irreducible hypersurface which is not
nef, choose H ample and consider the a priori
positive real number $p$ such that $D + pH$ is on the
boundary of the ample cone. Then actually $p$ is
rational and a suitable multiple $m(D + pH)$ is
spanned and defines a contraction on X. This
comes from "logarithmic Mori theory."

The above splitting theorem exhibits a torus
factor and all holomorphic 1-forms on X come
from this torus. This principle generalizes: given any
projective or compact Kähler manifold X, there
exists a "universal object," the Albanese torus


$$\mathrm{Alb}(X) = H^0(\Omega^1_X)^* / H_1(X,\mathbb{Z})$$


(which is algebraic if X is) together with a
holomorphic map


$$\alpha : X \to \mathrm{Alb}(X)$$


the Albanese map. This Albanese map is given by
integrating 1-forms and is often far from being
surjective. The important property is now that,
given a holomorphic 1-form $\omega$ on X, there exists a
holomorphic 1-form $\eta$ on the Albanese torus such
that $\omega = \alpha^*(\eta)$. The universal property reads as
follows: every map $X \to T$ to a torus factors via an
affine map $\mathrm{Alb}(X) \to T$.

There is a nonabelian analog, the so-called
Shafarevich map, but at the moment this map is
only known to be meromorphic. It is an important
tool to study the fundamental group $\pi_1(X)$. We refer
to Campana (1996) and Kollár (1995).

In the following, Chern classes of holomorphic
vector bundles will be important. Let X be a
compact complex manifold and E a holomorphic
vector bundle on X. The jth Chern class of E is an
element


$$c_j(E) \in H^{2j}(X,\mathbb{Q}) \cap H^{j,j}(X)$$


It can be defined, for example, by putting a Hermitian
metric on E, computing the curvature of the canonical
connection compatible with both the metric and the
holomorphic structure and then by applying certain
linear operators coming from symmetric functions
such as determinant and trace. Actually Chern classes
can be attached to every complex topological vector
bundle on a topological manifold; then $c_j(E)$ will
simply live in $H^{2j}(X,\mathbb{R})$. There is also a purely
algebraic construction by Grothendieck. We refer, for
example, to Fulton (1984), also for a discussion of
the elementary functorial properties of Chern classes.
Here we just recall that for a rank-r vector bundle E the
first Chern class satisfies


$$c_1(E) = c_1(\Lambda^r E)$$


where the Chern class of the line bundle $\Lambda^r E$ as
given in Several Complex Variables: Basic Geometric
Theory actually lives in $H^2(X,\mathbb{Z})$.

Finally we discuss manifolds with ample canonical
class $K_X$. Here moduli questions often play a central
role. Moduli spaces of surfaces with fixed $c_1^2$ and $c_2$
are very intensively studied (by Catanese, Ciliberto,
and others). Here, without going into details, we
will concentrate on the very interesting topic of
Kähler-Einstein metrics.

A Kähler metric $\omega$ is said to be Kähler-Einstein if
its Ricci curvature $\mathrm{Ric}(\omega)$ is proportional to $\omega$. The
proportionality factor $\lambda$ can be taken to be $-1$, 0, 1. In
case $K_X$ is ample or trivial, Kähler-Einstein metrics
always exist by Yau and Aubin (cases $\lambda = -1$, resp.
$\lambda = 0$). However, if X is Fano, there are obstructions,
and a Kähler-Einstein metric does not always exist.
An important consequence of the existence of a
Kähler-Einstein metric on a manifold $X_n$ with ample
canonical class is the Miyaoka-Yau inequality:

$$c_1(X)^2 \cdot \omega^{n-2} \le \frac{2(n+1)}{n}\, c_2(X) \cdot \omega^{n-2}$$

where $\omega$ denotes the Kähler-Einstein form. In case of
equality, X is covered by the n-dimensional unit ball.

The same inequality holds in case $K_X \equiv 0$, and as a
consequence the Chern class $c_2(X)$ is in some sense
semipositive. If $c_2(X) = 0$, then some finite unramified
cover of X is a torus.

There is an interesting relation to stability. Recall
that a vector bundle E of rank r on a compact Kähler
manifold $X_n$ is semistable with respect to a given
Kähler form $\omega$, if for all proper coherent subsheaves
$\mathcal{F} \subset E$ of rank s with $0 < s < r$ the following inequality holds:

$$\frac{c_1(\mathcal{F}) \cdot \omega^{n-1}}{s} \le \frac{c_1(E) \cdot \omega^{n-1}}{r}$$

In case of strict inequality, E is said to be stable.

The basic observation is now that the tangent
bundle of a manifold with a Kähler-Einstein metric
is semistable (with respect to the Kähler-Einstein
metric). It is expected that Fano manifolds with
$b_2 = 1$ have (semi?-)stable tangent bundles, although
in certain situations they do not admit a Kähler-Einstein
metric.

Again the first two Chern classes of a semistable
vector bundle E of rank r fulfill an inequality:


$$c_1(E)^2 \cdot \omega^{n-2} \le \frac{2r}{r-1}\, c_2(E) \cdot \omega^{n-2}$$

Equally important, semistable bundles with fixed
numerical data form moduli spaces, this being the
origin of the stability notion (Mumford). In this
context, the notion of a Hermite-Einstein bundle is
also important. Given a holomorphic vector bundle
E with a Hermitian metric $h$, there is a unique
connection on E compatible with both $h$ and the
complex structure. Its curvature $F_h$ is a (1,1)-form with values in
End(E). Now suppose $(X,\omega)$ is Kähler and let $\Lambda F_h$ be
the contraction of $F_h$ with $\omega$. Then $(E,h)$ is said to be
Hermite-Einstein on $(X,\omega)$, if


$$\Lambda F_h = \gamma\, \mathrm{id}$$


with some constant $\gamma$ and $\mathrm{id} : E \to E$ the identity.
Notice that $(X,\omega)$ is Kähler-Einstein if $(T_X,h)$ is
Hermite-Einstein over $(X,\omega)$ with $h$ the Kähler
metric with Kähler form $\omega$. It is not so difficult to
see that Hermite-Einstein bundles are semistable
(with respect to the underlying Kähler form) and
actually are direct sums of stable Hermite-Einstein
bundles. Conversely, a very deep theorem of
Uhlenbeck-Yau says that every stable vector bundle
on a compact Kähler manifold is Hermite-Einstein.
This is known as the Kobayashi-Hitchin correspondence;
see Lübke and Teleman (1995).


Topology, Invariants and Cohomology 


Besides the Kodaira dimension there are other
important invariants of compact complex manifolds.
Of course there are topological invariants such as the
Betti numbers $b_i(X) = \dim H^i(X,\mathbb{R})$ or the fundamental
group $\pi_1(X)$. The fundamental group has been
studied intensively in the last decade. A central
question asks which groups can occur as fundamental
groups of compact Kähler manifolds; another problem
is the so-called Shafarevich conjecture which
says that the universal cover of a compact Kähler
manifold should be holomorphically convex. We
refer to Campana (1996) and Kollár (1995).
The plurigenera,


$$P_m(X) = \dim H^0(mK_X)$$


are also extremely important. Here, Siu recently
proved that $P_m(X)$ is constant in families of
projective manifolds. Other important invariants are
$h^0(X, (\Omega^1_X)^{\otimes m})$. For example, it is conjectured that if


$$h^0(X, (\Omega^1_X)^{\otimes m}) = 0$$


for all positive m, then X is rationally connected.
Tensor powers of the cotangent bundle somehow
capture more of the structure of X than the Kodaira
dimension but they are more difficult to treat. The
relevance of the dimensions


$$h^0(X, \Omega^p_X)$$


of holomorphic forms is easier to understand. More
generally one has the Hodge numbers


$$h^{p,q}(X) = \dim H^q(X, \Omega^p_X)$$


For compact Kähler manifolds, the Hodge decomposition
states


$$H^r(X,\mathbb{C}) = \bigoplus_{p+q=r} H^{p,q}(X)$$


Furthermore, Hodge duality,

$$H^{q,p}(X) = \overline{H^{p,q}(X)}$$


holds. These results form a cornerstone for the
geometry of compact Kähler manifolds and the
starting point of Hodge theory. Hodge theory is,
for example, extremely important in the study of
families of manifolds and moduli.

Concerning the topology of projective (Kähler)
manifolds, the following two questions are very
basic.


• Which invariants are topological (or diffeomorphic)
invariants?
• What are the projective or Kähler structures on a
given compact topological manifold?

Concerning the first, Hodge decomposition
implies that the irregularity $h^0(\Omega^1_X)$ is actually a
topological invariant. However it is unknown
whether the number of holomorphic 2-forms is a
topological invariant of Kähler 3-folds. Both questions
have been intensively studied in dimension 2.
However, in higher dimensions almost nothing is
known. For example, it is not known whether there
is a projective manifold of general type of even
dimension which is homeomorphic to a quadric,
that is, a hypersurface of degree 2 in projective
space.
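
The claim about the irregularity can be made explicit in two lines. The following worked computation is an added illustration; it uses only the Hodge decomposition and Hodge duality for a compact Kähler manifold X stated above.

```latex
\begin{align*}
b_1(X) &= h^{1,0}(X) + h^{0,1}(X) = 2\,h^{1,0}(X)
        \quad\Longrightarrow\quad h^0(X,\Omega^1_X) = \tfrac{1}{2}\, b_1(X), \\
b_2(X) &= h^{2,0}(X) + h^{1,1}(X) + h^{0,2}(X) = 2\,h^{2,0}(X) + h^{1,1}(X).
\end{align*}
```

The first line shows that the irregularity is determined by the topology of X. The second line shows why the analogous statement for holomorphic 2-forms is not automatic: $b_2(X)$ alone fixes only the sum $2h^{2,0} + h^{1,1}$, not $h^{2,0}$ itself.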

Other important tools in the study of projective/
Kähler manifolds are listed below.


• Cohomological methods: Riemann-Roch theorem
and holomorphic Morse inequalities; vanishing
theorems (Kodaira, Kawamata-Viehweg, etc.);
Serre duality. References: Demailly (2000),
Demailly and Lazarsfeld (2001), Fulton (1984), Grauert
et al. (1994), Lazarsfeld (2004).

• $L^2$ methods: extension theorems, singular metrics,
multiplier ideals, etc. References: Demailly and
Lazarsfeld (2001), Lazarsfeld (2004).

• Theory of currents. Reference: Demailly (2000).

• Cycle space and Douady space, resp. Chow
scheme and Hilbert scheme. References: Fulton
(1984), Grauert et al. (1994), Kollár (1996).


We restrict our remarks to just one of these
topics, vanishing theorems. The classical Kodaira-Nakano
vanishing theorem says that if X is a
compact manifold of dimension n with a positive
(ample) line bundle L, then


$$H^q(X, L \otimes \Omega^p_X) = 0$$


for $p + q > n$. This is usually proved via harmonic
theory, that is, by representing the cohomology
space by harmonic (p,q)-forms with values in L
and by computing integrals of these forms. For
many purposes, for example, for Mori theory, it is
important to generalize this to line bundles which
have some positivity properties but which are not
ample. This works only for $p = n$; however, this is
the most important part of the Kodaira-Nakano
vanishing. The Kawamata-Viehweg vanishing theorem
in its most basic version says that given a nef
and big line bundle L, Kodaira vanishing still
holds:


$$H^q(X, L \otimes K_X) = 0$$


for $q \ge 1$. But actually it is not necessary to assume
L nef; in fact the following is true. Let


$$D = \sum a_i D_i$$

be an effective $\mathbb{Q}$-divisor, that is, all $a_i$ are positive
rational numbers. Let $\langle a_i \rangle$ be the fractional part of $a_i$
and suppose that the $\mathbb{Q}$-divisor $\sum \langle a_i \rangle D_i$ has normal
crossings. Let $\lceil a_i \rceil$ be the roundup of $a_i$ and put
$L = \sum \lceil a_i \rceil D_i$. If D is big and nef, then


$$H^q(X, L \otimes K_X) = 0$$


for $q \ge 1$. Of course L itself need not be nef! This
generalization is technically very important and yields
substantial freedom for birational manipulations. We
refer to Kawamata et al. (1987) and Lazarsfeld (2004).
Even this is not the end of the story: the Kawamata-Viehweg
theorem is embedded in the broader context
of the Nadel vanishing theorem where multiplier ideal
sheaves come into play. See Demailly and
Lazarsfeld (2001) and Lazarsfeld (2004).
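
To make the fractional-part and round-up bookkeeping concrete, here is a small worked instance. It is an added illustration with hypothetical data: $D_1$, $D_2$ are assumed to be two prime divisors with normal crossings, and the coefficients $3/2$ and $1/3$ are chosen only for the example.

```latex
\begin{align*}
D &= \tfrac{3}{2} D_1 + \tfrac{1}{3} D_2, \\
\langle a_1 \rangle &= \tfrac{1}{2}, \quad \langle a_2 \rangle = \tfrac{1}{3}, \qquad
\lceil a_1 \rceil = 2, \quad \lceil a_2 \rceil = 1, \\
L &= 2 D_1 + D_2 = D + \tfrac{1}{2} D_1 + \tfrac{2}{3} D_2 .
\end{align*}
% If D is big and nef, the theorem gives H^q(X, L \otimes K_X) = 0 for q \ge 1,
% although L differs from D by the effective divisor (1/2) D_1 + (2/3) D_2
% and need not be nef itself.
```

The point of the example is that the positivity hypothesis is placed on the $\mathbb{Q}$-divisor D, while the vanishing is obtained for the integral divisor L.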


Homogeneous Manifolds 


In this section we consider vector fields and
holomorphic group actions on compact (Kähler)
manifolds. Our main reference is Huckleberry
(1990) with further literature given there.

We denote by Aut(X) the group of holomorphic
automorphisms of the compact manifold X (well
known to be a complex Lie group), and by
$G := \mathrm{Aut}^0(X)$ the connected component containing
the identity. The tangent space at any point of
$\mathrm{Aut}^0(X)$ can naturally be identified with $H^0(X, T_X)$,
the (finite-dimensional) space of holomorphic
vector fields on X. In fact, by integration, a
vector field determines a one-parameter group of
automorphisms.

One says that X is homogeneous if G acts
transitively on X. Therefore, one can write


$$X = G/H$$


where H is the isotropy subgroup of any point
$x_0 \in X$, that is, the subgroup of automorphisms
fixing $x_0$. Conversely one can take a complex Lie
group G and a closed subgroup H and form the
quotient G/H which is again a complex manifold
and in fact homogeneous (of course not necessarily
compact).

Going back to a compact manifold X, the
condition to be homogeneous can be rephrased by
saying that the tangent bundle is generated by
global sections, that is, if $x \in X$ and $e \in T_{X,x}$, then
there exists $v \in H^0(X, T_X)$ such that $v(x) = e$. The
easiest case is when $T_X$ is trivial. If X is Kähler, this
is exactly the case when X is a torus, $X = \mathbb{C}^n/\Gamma$ with
$\Gamma \simeq \mathbb{Z}^{2n}$ a lattice, but without the Kähler assumption
there are many more examples (the so-called
parallelizable manifolds).


More generally, let us consider the case that the
compact Kähler manifold X admits a vector field v
without zeros, but X is not required to be homogeneous.
Then a theorem of Lieberman says that
there is a finite unramified cover $f : \tilde{X} \to X$ and a
splitting


$$\tilde{X} \simeq F \times T$$


with T a torus, such that $f^*(v)$ is the pullback of a
vector field on T. On the other hand, if v has a zero,
then a classical theorem of Rosenlicht says that X is
covered by rational curves, that is, X is uniruled. In
particular $\kappa(X) = -\infty$. Notice also that a manifold
of general type can never carry a vector field, in
other words, the automorphism group is discrete,
even finite.

Coming back to compact homogeneous Kähler
manifolds, the first thing to study is the Albanese
map. The Borel-Remmert theorem says that


$$X \simeq T \times O$$


where T is the Albanese torus. This is proved using a
maximal compact subgroup $K \subset G$ and by some
averaging process over K. Moreover, O is a rational
homogeneous manifold. The structure of O is more
precisely the following. One can write $O = S/P$ with
S a semisimple Lie group and $P \subset S$ parabolic,
which means that P contains a maximal connected
solvable subgroup (the so-called Borel subgroup).
The main ingredients of the proof are the Tits
fibration, the Levi-Malcev decomposition of a Lie
group into its radical and a semisimple group, and
the Borel fixed point theorem:


Theorem 6 Let $G \subset GL_n(\mathbb{C})$ be a connected
solvable subgroup and $X \subset \mathbb{P}_{n-1}$ be a G-stable
subvariety. Then G has a fixed point on X.


In the homogeneous Kähler case, the rationality of
O is seen by exhibiting an open subset in O which is
algebraically isomorphic to $\mathbb{C}^n$.

Now things come down to classifying all rational
homogeneous manifolds S/P, which is of course
classical. Notice that all rational homogeneous
manifolds are Fano. One knows that a rational
homogeneous manifold with Betti number $b_2 \ge 2$
can be fibered over another rational homogeneous
manifold with fibers rational homogeneous; this is
actually a fiber bundle. The case that $b_2 = 1$ can be
rephrased by saying that P is maximal parabolic.
This fiber bundle might not be trivial as shown by
the projectivized tangent bundle $\mathbb{P}(T_{\mathbb{P}_n})$.

Compact Hermitian symmetric spaces form a
particularly interesting subclass of homogeneous
Kähler manifolds. A manifold equipped with a
Hermitian metric is called Hermitian symmetric, if for
every $x \in X$ there exists an involutive holomorphic
isometry fixing x. Mok has shown the remarkable fact
that the simply connected compact Hermitian symmetric
spaces are exactly those simply connected
compact manifolds carrying a Kähler metric with
semipositive holomorphic bisectional curvature. The
only manifold having a metric with positive holomorphic
bisectional curvature is $\mathbb{P}_n$ (Siu-Yau, Mori).


See also: Classical Groups and Homogeneous Spaces; 
Einstein Manifolds; Mirror Symmetry: A Geometric 
Survey; Moduli Spaces: An Introduction; Riemann 
Surfaces; Several Complex Variables: Basic Geometric 
Theory; Topological Sigma Models; Twistor Theory: 
Some Applications [in Integrable Systems, Complex 
Geometry and String Theory].

Introduction


In the standard model of cosmology, the expanding
universe of galaxies is described by a Friedmann-
Robertson-Walker (FRW) metric, which in spherical
coordinates has a line element given by (Blau and Guth
1987, Weinberg 1972)

$$ds^2 = -dt^2 + R^2(t)\left\{\frac{dr^2}{1-kr^2} + r^2\left[d\theta^2 + \sin^2\theta\,d\phi^2\right]\right\} \qquad [1]$$


In this model, which accounts for things on the 
largest length scale, the universe is approximated by a 
space of uniform density and pressure at each fixed 
time, and the expansion rate is determined by the 
cosmological scale factor R(t) that evolves according 
to the Einstein equations. Astronomical observations 
show that the galaxies are uniform on a scale of 
about one billion light years, and the expansion is 
critical (that is, k=0 in [1]), and so, according to
[1], on the largest scale, the universe is infinite flat
Euclidean space $\mathbb{R}^3$ at each fixed time. Matching the
Hubble constant to its observed values, and invoking
the Einstein equations, the FRW model implies that
the entire infinite universe $\mathbb{R}^3$ emerged all at once
from a singularity (R=0), some 14 billion years ago,
and this event is referred to as the big bang.

In this article, which summarizes the work of the
authors in Smoller and Temple (1995, 2003), we 
describe a two-parameter family of exact solutions 
of the Einstein equations that refine the FRW metric 
by a spherical shock wave cutoff. In these exact 
solutions, the expanding FRW metric is reduced to a 
region of finite extent and finite total mass at each 
fixed time, and this FRW region is bounded by an 
entropy-satisfying shock wave that emerges from the 
origin (the center of the explosion), at the instant of 
the big bang, t=0. The shock wave, which marks 
the leading edge of the FRW expansion, propagates 
outward into a larger ambient spacetime from time
t=0 onward. Thus, in this refinement of the FRW
metric, the big bang that set the galaxies in motion 
is an explosion of finite mass that looks more like a 
classical shock wave explosion than does the big 
bang of the standard model. (The fact that the entire
infinite space $\mathbb{R}^3$ emerges at the instant of the big
bang, is, loosely speaking, a consequence of the 
Copernican principle, the principle that the Earth is 
not in a special place in the universe on the largest 
scale of things. With a shock wave present, the 
Copernican principle is violated, in the sense that 
the Earth then has a special position relative to the 
shock wave. But, of course, in these shock wave 
refinements of the FRW metric, there is a spacetime 
on the other side of the shock wave, beyond the 
galaxies, and so the scale of uniformity of the FRW 
metric, the scale on which the density of the galaxies 
is uniform, is no longer the largest length scale.) 

In order to construct a mathematically simple
family of shock wave refinements of the FRW metric
that meet the Einstein equations exactly, we assume
k=0 (critical expansion), and we restrict to the case
that the sound speed in the fluid on the FRW side of
the shock wave is constant. That is, we assume an
FRW equation of state $p = \sigma\rho$, where $\sigma$, the square
of the sound speed $\sqrt{\partial p/\partial\rho}$, is constant, $0 < \sigma \le c^2$.
At $\sigma = c^2/3$, this catches the important equation of
state $p = (c^2/3)\rho$ which is correct at the earliest stage
of big bang physics (Weinberg 1972). Also, as $\sigma$
ranges from 0 to $c^2$, we obtain qualitatively correct
approximations to general equations of state.
Taking c=1 (we use the convention that c=1, and
Newton's constant G=1 when convenient), the
family of solutions is then determined by two
parameters, $0 < \sigma \le 1$ and $r_* \ge 0$. The second
parameter, $r_*$, is the FRW radial coordinate r of
the shock in the limit $t \to 0$, the instant of the
big bang. (Since, when k=0, the FRW metric is
invariant under the rescaling $r \to \alpha r$ and $R \to \alpha^{-1}R$,
we fix the radial coordinate r by fixing the scale
factor $\alpha$ with the condition that $R(t_0) = 1$ for some
time $t_0$, say present time.) The FRW radial
coordinate r is singular with respect to radial
arclength $\bar r = rR$ at the big bang R=0, so setting
$r_* > 0$ does not place the shock wave away from the
origin at time t=0. The distance from the FRW
center to the shock wave tends to zero in the limit
$t \to 0$ even when $r_* > 0$. In the limit $r_* \to \infty$, we
recover from the family of solutions the usual
(infinite) FRW metric with equation of state $p = \sigma\rho$;
that is, we recover the standard FRW metric in the
limit that the shock wave is infinitely far out. In this
sense our family of exact solutions of the Einstein
equations considered here represents a two-parameter
refinement of the standard FRW metric.

The exact solutions for the case $r_* = 0$ were first
constructed in Smoller and Temple (1995) (see also
the notes by Smoller and Temple (1999)), and are
qualitatively different from the solutions when $r_* > 0$,
which were constructed later in Smoller and
Temple (2003). The difference is that, when $r_* = 0$,
the shock wave lies closer than one Hubble length
from the center of the FRW spacetime throughout
its motion (Smoller and Temple 2000), but when
$r_* > 0$, the shock wave emerges at the big bang at a
distance beyond one Hubble length. (The Hubble
length depends on time, and tends to zero as $t \to 0$.)
We show in Smoller and Temple (2003) that one
Hubble length, equal to c/H, where $H = \dot R/R$, is a
critical length scale in a k=0 FRW metric because
the total mass inside one Hubble length has a
Schwarzschild radius equal exactly to one Hubble
length. (Since c/H is a good estimate for the age of
the universe, it follows that the Hubble length c/H
is approximately the distance of light travel starting
at the big bang up until the present time. In this
sense, the Hubble length is a rough estimate for the
distance to the furthermost objects visible in the
universe.) That is, one Hubble length marks precisely
the distance at which the Schwarzschild radius $\bar r_s = 2M$
of the mass M inside a radial shock wave at distance
$\bar r$ from the FRW center crosses from inside ($\bar r_s < \bar r$)
to outside ($\bar r_s > \bar r$) the shock wave. If the shock wave
is at a distance closer than one Hubble length from
the FRW center, then $2M < \bar r$ and we say that the
solution lies outside the black hole, but if the shock
wave is at a distance greater than one Hubble
length, then $2M > \bar r$ at the shock, and we say that
the solution lies “inside” the black hole. Since M
increases like $\bar r^3$, it follows that $2M < \bar r$ for $\bar r$
sufficiently small, and $2M > \bar r$ for $\bar r$ sufficiently
large, so there must be a critical radius at which
$2M = \bar r$, and we show in what follows (see also
Smoller and Temple (2003)) that when k=0, this
critical radius is exactly the Hubble length. When
the parameter $r_* = 0$, the family of solutions for
$0 < \sigma \le 1$ starts at the big bang, and evolves thereafter
“outside” the black hole, satisfying $2M/\bar r < 1$ everywhere
from t=0 onward. But, when $r_* > 0$, the
shock wave is further out than one Hubble length
at the instant of the big bang, and the solution
begins with $2M/\bar r > 1$ at the shock wave. From this
time onward, the spacetime expands until eventually
the Hubble length catches up to the shock
wave at $2M/\bar r = 1$, and then passes the shock wave,
making $2M/\bar r < 1$ thereafter. Thus, when $r_* > 0$,
the whole spacetime begins inside the black hole
(with $2M/\bar r > 1$ for sufficiently large $\bar r$), but
eventually evolves to a solution outside the black
hole. The time when $\bar r = 2M$ actually marks the
event horizon of a white hole (the time reversal of
a black hole) in the ambient spacetime beyond the
shock wave. We show that, when $r_* > 0$, the time
when the Hubble length catches up to the shock
wave comes after the time when the shock wave
comes into view at the FRW center, and when
$2M = \bar r$ (assuming t is so large that we can neglect
the pressure from this time onward), the whole
solution emerges from the white hole as a finite
ball of mass expanding into empty space, satisfying
$2M/\bar r < 1$ everywhere thereafter. In fact, when $r_* > 0$,
the zero pressure Oppenheimer-Snyder solution
outside the black hole gives the large-time asymptotics
of the solution (Oppenheimer and Snyder
1939, Smoller and Temple 1988, 2004 and the
comments after Theorems 6-8 below).

The exact solutions in the case $r_* = 0$ give a
general-relativistic version of an explosion into a
static, singular, isothermal sphere of gas, qualitatively
similar to the corresponding classical explosion
outside the black hole (Smoller and Temple
1995). The main difference physically between the
cases $r_* > 0$ and $r_* = 0$ is that, when $r_* > 0$ (the case
when the shock wave emerges from the big bang at a
distance beyond one Hubble length), a large region
of uniform expansion is created behind the shock
wave at the instant of the big bang. Thus, when $r_* > 0$,
lightlike information about the shock wave
propagates inward from the wave, rather than
outward from the center, as is the case when $r_* = 0$
and the shock lies inside one Hubble length. (One
can imagine that when $r_* > 0$, the shock wave can
get out through a great deal of matter early on when
everything is dense and compressed, and still not
violate the speed of light bound. Thus, when $r_* > 0$,
the shock wave “thermalizes,” or more accurately
“makes uniform,” a large region at the center, early
on in the explosion.) It follows that, when $r_* > 0$,
an observer positioned in the FRW spacetime inside
the shock wave will see exactly what the standard
model of cosmology predicts, up until the time when
the shock wave comes into view in the far field. In
this sense, the case $r_* > 0$ gives a black hole
cosmology that refines the standard FRW model of
cosmology to the case of finite mass. One of the
surprising differences between the case $r_* = 0$ and the
case $r_* > 0$ is that, when $r_* > 0$, the important
equation of state $p = \rho/3$ comes out of the analysis as
special at the big bang. When $r_* > 0$, the shock
wave emerges at the instant of the big bang at a
finite nonzero speed (the speed of light) only for the
special value $\sigma = 1/3$. In this case, the equation of
state on both sides of the shock wave tends to the
correct relation $p = \rho/3$ as $t \to 0$, and the shock
wave decelerates to subluminous speed for all
positive times thereafter (see Smoller and Temple
(2003) and Theorem 8 below).

In all cases $0 < \sigma \le 1$, $r_* \ge 0$, the spacetime
metric that lies beyond the shock wave is taken to
be a metric of Tolman-Oppenheimer-Volkoff
(TOV) form (Oppenheimer and Volkoff 1939):

$$d\bar s^2 = -B(\bar r)\,d\bar t^2 + A^{-1}(\bar r)\,d\bar r^2 + \bar r^2\left[d\theta^2 + \sin^2\theta\,d\phi^2\right] \qquad [2]$$

The metric [2] is in standard Schwarzschild coordinates
(diagonal with radial coordinate equal to the
area of the spheres of symmetry), and the metric
components depend only on the radial coordinate $\bar r$.
Barred coordinates are used to distinguish TOV
coordinates from unbarred FRW coordinates for
shock matching. The mass function $M(\bar r)$ enters as a
metric component through the relation

$$A = 1 - \frac{2M(\bar r)}{\bar r} \qquad [3]$$

The TOV metric [2] has a very different character
depending on whether A > 0 or A < 0; that is,
depending on whether the solution lies outside the
black hole or inside the black hole. In the case A > 0,
$\bar r$ is a spacelike coordinate, and the TOV metric
describes a static fluid sphere in general relativity.
(When A > 0, for example, the metric [2] is the
starting point for the stability limits of Buchdahl
and Chandrasekhar for stars (Weinberg 1972,
Smoller and Temple 1997, 1998).) When A < 0, $\bar r$
is the timelike coordinate, and [2] is a dynamical metric
that evolves in time. The exact shock wave solutions are
obtained by taking $\bar r = R(t)r$ to match the spheres of
symmetry, and then matching the metrics [1] and [2] at
an interface $\bar r = \bar r(t)$ across which the metrics are
Lipschitz continuous. This can be done in general.
In order for the interface to be a physically meaningful
shock surface, we use the result in Theorem 4
below (see Smoller and Temple (1994)) that a single
additional conservation constraint is sufficient to rule
out $\delta$-function sources at the shock (the Einstein
equations $G = \kappa T$ are second order in the metric, and
so $\delta$-function sources will in general be present at a
Lipschitz continuous matching of metrics), and
guarantee that the matched metric solves the Einstein
equations in the weak sense. The Lipschitz matching
of the metrics, together with the conservation
constraint, leads to a system of ordinary differential
equations (ODEs) that determine the shock position,
together with the TOV density and pressure at the
shock. Since the TOV metric depends only on $\bar r$, the
equations thus determine the TOV spacetime beyond
the shock wave. To obtain a physically meaningful
outgoing shock wave, we impose the constraint $\bar p \le \bar\rho$
to ensure that the equation of state on the TOV side
of the shock is physically reasonable, and as the
entropy condition we impose the condition that the
shock be compressive. For an outgoing shock wave,
this is the condition $\rho > \bar\rho$, $p > \bar p$, that the pressure
and density be larger on the side of the shock that
receives the mass flux (the FRW side when the
shock wave is propagating away from the FRW
center). This condition breaks the time-reversal symmetry
of the equations, and is sufficient to rule out
rarefaction shocks in classical gas dynamics (Smoller
1983, Smoller and Temple 2003). The ODEs,
together with the equation-of-state bound and the
conservation and entropy constraints, determine a
unique solution of the ODEs for every $0 < \sigma \le 1$ and
$r_* \ge 0$, and this provides the two-parameter family of
solutions discussed here (Smoller and Temple 1995,
2003). The Lipschitz matching of the metrics implies
that the total mass M is continuous across the
interface, and so when $r_* > 0$, the total mass of the
entire solution, inside and outside the shock wave, is
finite at each time t > 0, and both the FRW and
TOV spacetimes emerge at the big bang. The total
mass M on the FRW side of the shock has the
meaning of total mass inside the radius $\bar r$ at fixed
time, but on the TOV side of the shock, M does not
evolve according to equations that give it the
interpretation as a total mass because the metric is
inside the black hole. Nevertheless, after the spacetime
emerges from the black hole, the total mass
takes on its usual meaning outside the black
hole, and time asymptotically the big bang ends
with an expansion of finite total mass in the usual
sense. Thus, when $r_* > 0$, our shock wave refinement
of the FRW metric leads to a big bang of
finite total mass.

A final comment is in order regarding our overall
philosophy. The family of exact shock wave solutions
described here are rough models in the sense that
the equation of state on the FRW side satisfies the
condition $\sigma$ = const., and the equation of state on the
TOV side is determined by the equations, and
therefore cannot be imposed. Nevertheless, the
bounds on the equations of state imply that the
equations of state are qualitatively reasonable, and
we expect that this family of solutions will capture
the gross dynamics of solutions when more general
equations of state are imposed. For more general
equations of state, other waves, such as rarefaction
waves and entropy waves, would need to be present
to meet the conservation constraint, and thereby
mediate the transition across the shock wave. Such
transitional waves would be very difficult to model in
an exact solution. But the fact that we can find
global solutions that meet our physical bounds, and
that are qualitatively the same for all values of
$\sigma \in (0,1]$ and all initial shock positions, strongly suggests
that such a shock wave would be the dominant wave
in a large class of problems.

In the next section, the FRW solution is derived
for the case $\sigma$ = const., and the Hubble length is
discussed as a critical length scale. Subsequently,
the general theorems in Smoller and Temple (1994)
for matching gravitational metrics across shock
waves are employed. This is followed by a discussion
of the construction of the family of solutions in
the case $r_* = 0$. Finally, the case $r_* > 0$ is discussed.
(Details can be found in Smoller and Temple (1995,
2003, 2004).)


The FRW Metric 


According to Einstein's theory of general relativity,
all properties of the gravitational field are determined
by a Lorentzian spacetime metric tensor g,
whose line element in a given coordinate system
$x = (x^0, \ldots, x^3)$ is given by

$$ds^2 = g_{ij}\,dx^i\,dx^j \qquad [4]$$

(We use the Einstein summation convention,
whereby repeated up-down indices are assumed
summed from 0 to 3.) The components $g_{ij}$ of the
gravitational metric g satisfy the Einstein equations

$$G^{ij} = \kappa T^{ij}, \qquad T^{ij} = (\rho c^2 + p)\,w^i w^j + p\,g^{ij} \qquad [5]$$

where we assume that the stress-energy tensor T
corresponds to that of a perfect fluid. Here G is the
Einstein curvature tensor,

$$\kappa = \frac{8\pi G}{c^4} \qquad [6]$$

is the coupling constant, G is Newton's gravitational
constant, c is the speed of light, $\rho c^2$ is the energy
density, p is the pressure, and $w = (w^0, \ldots, w^3)$ are
the components of the 4-velocity of the fluid (cf.
Weinberg 1972), and again we use the convention
that c=1 and G=1 when convenient.

Putting the metric ansatz [1] into the Einstein
equations [5] gives the equations for the FRW metric
(Weinberg 1972),

$$H^2 = \left(\frac{\dot R}{R}\right)^2 = \frac{\kappa}{3}\rho - \frac{k}{R^2} \qquad [7]$$

$$\dot\rho = -3(p + \rho)H \qquad [8]$$

The unknown quantities R, $\rho$, and p are assumed to
be functions of the FRW coordinate time t alone, and
the “dot” denotes differentiation with respect to t.

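To verify that the Hubble length $\bar r_{crit} = 1/H$ is the
limit for FRW-TOV shock matching outside a black
hole, write the FRW metric [1] in standard
Schwarzschild coordinates $\bar x = (\bar r, \bar t)$, where the
metric takes the form

$$ds^2 = -B(\bar r, \bar t)\,d\bar t^2 + A(\bar r, \bar t)^{-1}\,d\bar r^2 + \bar r^2\,d\Omega^2 \qquad [9]$$

and the mass function $M(\bar r, \bar t)$ is defined through the
relation

$$A(\bar r, \bar t) = 1 - \frac{2M}{\bar r} \qquad [10]$$

It is well known that a general spherically symmetric
metric can be transformed to the form [9] by
coordinate transformation (see Weinberg (1972) and
Groah and Temple (2004)). Substituting $\bar r = Rr$ into
[1] and diagonalizing the resulting metric, we obtain
(see Smoller and Temple (2004) for details)

$$ds^2 = -\frac{1}{\psi^2}\left\{\frac{1-kr^2}{1-kr^2-H^2\bar r^2}\right\}d\bar t^2 + \left\{\frac{1}{1-kr^2-H^2\bar r^2}\right\}d\bar r^2 + \bar r^2\,d\Omega^2 \qquad [11]$$

where $\psi$ is an integrating factor that solves the
equation

$$\frac{\partial}{\partial\bar r}\left(\psi\,\frac{1-kr^2-H^2\bar r^2}{1-kr^2}\right) - \frac{\partial}{\partial t}\left(\psi\,\frac{H\bar r}{1-kr^2}\right) = 0 \qquad [12]$$

and the time coordinate $\bar t = \bar t(t, \bar r)$ is defined by the
exact differential

$$d\bar t = \left(\psi\,\frac{1-kr^2-H^2\bar r^2}{1-kr^2}\right)dt + \left(\psi\,\frac{H\bar r}{1-kr^2}\right)d\bar r \qquad [13]$$

Now using [10] in [7], it follows that

$$M(t, \bar r) = \frac{\kappa}{2}\int_0^{\bar r}\rho(t)\,s^2\,ds = \frac{1}{3}\,\frac{\kappa}{2}\,\rho\,\bar r^3 \qquad [14]$$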

Since in the FRW metric, $\bar r = Rr$ measures arclength
along radial geodesics at fixed time, we see from
[14] that $M(t, \bar r)$ has the physical interpretation as
the total mass inside radius $\bar r$ at time t in the FRW
metric. Restricting to the case of critical expansion
k=0, we see from [7], [14], and [13] that $\bar r = H^{-1}$ is
equivalent to $2M/\bar r = 1$, and so at fixed time t, the
following equivalences are valid:

$$\bar r = H^{-1} \quad\text{iff}\quad \frac{2M}{\bar r} = 1 \quad\text{iff}\quad A = 0 \qquad [15]$$

We conclude that $\bar r = H^{-1}$ is the critical length scale
for the FRW metric at fixed time t in the sense that
$A = 1 - 2M/\bar r$ changes sign at $\bar r = H^{-1}$, and so the
universe lies inside a black hole beyond $\bar r = H^{-1}$, as
claimed above. Now, we proved in Smoller and
Temple (1998) that the standard TOV metric outside
the black hole cannot be continued into A=0
except in the very special case $\rho = 0$. (It takes an
infinite pressure to hold up a static configuration at
the event horizon of a black hole.) Thus, shock
matching beyond one Hubble length requires a
metric of a different character, and for this purpose,
we introduce the TOV metric inside the black hole:
a metric of TOV form, with A < 0, whose fluid is
comoving with the timelike radial coordinate
$\bar r$ (Smoller and Temple 2004).
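
As a worked check of the equivalences in [15], the step from [7] and [14] to the sign change of A can be written out explicitly. The lines below only combine relations already stated (the mass formula [14], the FRW equation [7] with k=0, and the definition [10]); no new assumption is introduced.

```latex
\[
\frac{2M}{\bar r}
  = \frac{2}{\bar r}\cdot\frac{1}{3}\,\frac{\kappa}{2}\,\rho\,\bar r^{3}
  = \frac{\kappa}{3}\,\rho\,\bar r^{2}
  = H^{2}\,\bar r^{2}
  \qquad (k = 0),
\]
\[
A \;=\; 1 - \frac{2M}{\bar r} \;=\; 1 - H^{2}\bar r^{2},
\qquad\text{so}\qquad
A > 0 \iff \bar r < H^{-1},
\quad
A = 0 \iff \bar r = H^{-1},
\quad
A < 0 \iff \bar r > H^{-1}.
\]
```

Read this way, the Hubble length is exactly the radius at which the dimensionless ratio $2M/\bar r = (H\bar r)^2$ passes through 1.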

The Hubble length $\bar r_{crit} = c/H$ is also the critical
distance at which the outward expansion of the FRW
metric exactly cancels the inward advance of a radial
light ray impinging on an observer positioned at the
origin of a k=0 FRW metric. Indeed, by [1], a light
ray traveling radially inward toward the center of an
FRW coordinate system satisfies the condition

$$c\,dt = -R\,dr \qquad [16]$$

so that

$$\frac{d\bar r}{dt} = \dot R r + R\dot r = H\bar r - c = H\left(\bar r - \frac{c}{H}\right) > 0 \qquad [17]$$

if and only if

$$\bar r > \frac{c}{H}$$

Thus, the arclength distance from the origin to an
inward moving light ray at fixed time t in a k=0
FRW metric will actually increase as long as the light
ray lies beyond the Hubble length. An inward moving
light ray will, however, eventually cross the Hubble
length and reach the origin in finite proper time, due
to the increase in the Hubble length with time.

We now calculate the infinite redshift limit in terms
of the Hubble length. It is well known that light emitted
at $(t_e, r_e)$ at wavelength $\lambda_e$ in an FRW spacetime will be
observed at $(t_0, r_0)$ at wavelength $\lambda_0$ if

$$\frac{R_0}{R_e} = \frac{\lambda_0}{\lambda_e}$$

Moreover, the redshift factor z is defined by

$$z = \frac{\lambda_0}{\lambda_e} - 1$$

Thus, infinite redshifting occurs in the limit $R_e \to 0$,
where R=0, t=0 is the big bang. Consider now a
light ray emitted at the instant of the big bang, and
observed at the FRW origin at present time $t = t_0$.
Let $r_\infty$ denote the FRW coordinate at time $t \to 0$ of
the furthest objects that can be observed at the FRW
origin before time $t = t_0$. Then $r_\infty$ marks the position
of objects at time t=0 whose radiation would be
observed as infinitely redshifted (assuming no scattering).
Note then that a shock wave emanating from
$\bar r = 0$ at the instant of the big bang will be observed at
the FRW origin before present time $t = t_0$ only if its
position r at the instant of the big bang satisfies the
condition $r < r_\infty$. To estimate $r_\infty$, note first that from
[16] it follows that an incoming radial light ray in an
FRW metric follows a lightlike trajectory r = r(t) if

$$r = r_\infty - \int_0^t \frac{d\tau}{R(\tau)} \qquad [18]$$

Using this, the following theorem can be proved
(Smoller and Temple 2004).


Theorem 1 If the pressure p satisfies the bounds

$$0 \le p \le \tfrac{1}{3}\rho \qquad [19]$$

then, for any equation of state, the age of the
universe $t_0$ and the infinite red shift limit $r_\infty$ are
bounded in terms of the Hubble length by

$$\frac{1}{2H_0} \le t_0 \le \frac{2}{3H_0} \qquad [20]$$

$$\frac{1}{H_0} \le r_\infty \le \frac{2}{H_0} \qquad [21]$$

(We have assumed in Theorem 1 that R=0 when
t=0 and R=1 when $t = t_0$, $H = H_0$.)

The next theorem gives closed-form solutions of
the FRW equations [7], [8] in the case when
$\sigma$ = const. As a special case, we recover the bounds
in [20] and [21] from the cases $\sigma = 0$ and 1/3.


Theorem 2 Assume k=0 and the equation of state

$$p = \sigma\rho \qquad [22]$$

where $\sigma$ is taken to be constant,

$$0 \le \sigma \le 1$$

then (assuming an expanding universe $\dot R > 0$), the
solution of system [7], [8] satisfying R=0 at t=0
and R=1 at $t = t_0$ is given by

$$\rho = \frac{4}{3\kappa(1+\sigma)^2}\,\frac{1}{t^2} \qquad [23]$$

$$R = \left(\frac{t}{t_0}\right)^{2/[3(1+\sigma)]} \qquad [24]$$

$$\frac{H}{H_0} = \frac{t_0}{t} \qquad [25]$$

Moreover, the age of the universe $t_0$ and the infinite
red shift limit $r_\infty$ are given exactly in terms of the
Hubble length by

$$t_0 = \frac{2}{3(1+\sigma)}\,\frac{1}{H_0} \qquad [26]$$

$$r_\infty = \frac{2}{1+3\sigma}\,\frac{1}{H_0} \qquad [27]$$


From [27] we conclude that a shock wave will be
observed at the FRW origin before present time
$t = t_0$ only if its position r at the instant of the big
bang satisfies the condition

$$r < \frac{2}{1+3\sigma}\,\frac{1}{H_0}$$

Note that $r_\infty$ ranges from one-half to two Hubble
lengths as $\sigma$ ranges from 1 to 0, taking the
intermediate value of one Hubble length at $\sigma = 1/3$
(cf. [21]).
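
The closed forms [26] and [27] are easy to tabulate. The following is a minimal sketch in Python, working in units with $H_0 = 1$ and using exact rational arithmetic; the function names are illustrative and not taken from the cited papers. It also recomputes $r_\infty$ from the light-ray integral [18] with the scale factor [24], as a consistency check on [27].

```python
from fractions import Fraction


def expansion_exponent(sigma):
    """Exponent a in R = (t / t0)**a from [24]: a = 2 / (3 (1 + sigma))."""
    return Fraction(2) / (3 * (1 + Fraction(sigma)))


def age_times_hubble(sigma):
    """t0 * H0 from [26]; since H = a / t, this equals the exponent a."""
    return expansion_exponent(sigma)


def redshift_limit_times_hubble(sigma):
    """r_inf * H0 from [27]: 2 / (1 + 3 sigma)."""
    return Fraction(2) / (1 + 3 * Fraction(sigma))


def redshift_limit_from_integral(sigma):
    """r_inf * H0 from [18]: integral_0^t0 dt / R = t0 / (1 - a), with t0 = a / H0."""
    a = expansion_exponent(sigma)
    return a / (1 - a)


if __name__ == "__main__":
    for sigma in (Fraction(0), Fraction(1, 3), Fraction(1)):
        print(sigma, age_times_hubble(sigma), redshift_limit_times_hubble(sigma))
```

At $\sigma = 0$ and $\sigma = 1/3$ the two functions reproduce the endpoints of the bounds [20] and [21].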

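Note that using [23] and [24] in [14], it follows
that

$$M = \frac{\kappa}{2}\int_0^{\bar r}\rho(t)\,s^2\,ds = \frac{2r^3}{9(1+\sigma)^2\,t_0^{2/(1+\sigma)}}\;t^{-2\sigma/(1+\sigma)} \qquad [28]$$

so $\dot M < 0$ if $\sigma > 0$. It follows that if $p = \sigma\rho$,
$\sigma$ = const. > 0, then the total mass inside radius
r = const. decreases in time.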


The General Theory of Shock Matching 


The matching of the FRW and TOV metrics in the next
two sections is based on the following theorems that
were derived in Smoller and Temple (1994). (Theorems
3 and 4 apply to non-lightlike shock surfaces. The
lightlike case was discussed by Scott (2002).)


Theorem 3 Let $\Sigma$ denote a smooth, three-dimensional
shock surface in spacetime with spacelike
normal vector n relative to the spacetime metric g;
let K denote the second fundamental form on $\Sigma$; and
let G denote the Einstein curvature tensor. Assume
that the components $g_{ij}$ of the gravitational metric g
are smooth on either side of $\Sigma$ (continuous up to the
boundary on either side separately), and Lipschitz
continuous across $\Sigma$ in some fixed coordinate
system. Then the following statements are
equivalent:

(i) [K] = 0 at each point of $\Sigma$.

(ii) The curvature tensors $R^i_{jkl}$ and $G_{ij}$, viewed as
second-order operators on the metric components
$g_{ij}$, produce no $\delta$-function sources on $\Sigma$.

(iii) For each point $P \in \Sigma$, there exists a $C^{1,1}$
coordinate transformation defined in a neighborhood
of P, such that, in the new coordinates
(which can be taken to be the Gaussian normal
coordinates for the surface), the metric components
are $C^{1,1}$ functions of these coordinates.

(iv) For each $P \in \Sigma$, there exists a coordinate frame
that is locally Lorentzian at P, and can be
reached within the class of $C^{1,1}$ coordinate
transformations.

Moreover, if any one of these equivalencies holds,
then the Rankine-Hugoniot jump conditions,
$[G^{ij}]n_i = 0$ (which express the weak form of conservation
of energy and momentum across $\Sigma$ when
$G = \kappa T$), hold at each point on $\Sigma$.


Here [f] denotes the jump in the quantity f across
$\Sigma$ (this being determined by the metric separately on
each side of $\Sigma$ because $g_{ij}$ is only Lipschitz
continuous across $\Sigma$), and by $C^{1,1}$ we mean that
the first derivatives are Lipschitz continuous.

In the case of spherical symmetry, the following
stronger result holds. In this case, the jump conditions
$[G^{ij}]n_i = 0$, which express the weak form of
conservation across a shock surface, are implied by a
single condition $[G^{ij}]n_i n_j = 0$, so long as the shock is
non-null, and the areas of the spheres of symmetry
match smoothly at the shock and change monotonically
as the shock evolves. Note that, in general,
assuming that the angular variables are identified
across the shock, we expect conservation to entail
two conditions, one for the time and one for the
radial components. The fact that the smooth
matching of the spheres of symmetry reduces
conservation to one condition can be interpreted as
an instance of the general principle that directions of
smoothness in the metric imply directions of
conservation of the sources.


Theorem 4 Assume that $g$ and $\bar g$ are two spherically
symmetric metrics that match Lipschitz continuously
across a three-dimensional shock interface
$\Sigma$ to form the matched metric $g \cup \bar g$. That is, assume
that $g$ and $\bar g$ are Lorentzian metrics given by

$$ds^2 = -a(t,r)\,dt^2 + b(t,r)\,dr^2 + c(t,r)\,d\Omega^2 \qquad [29]$$

and

$$d\bar s^2 = -\bar a(\bar t,\bar r)\,d\bar t^2 + \bar b(\bar t,\bar r)\,d\bar r^2 + \bar c(\bar t,\bar r)\,d\Omega^2 \qquad [30]$$

and that there exists a smooth coordinate transformation
$\Psi: (t,r) \to (\bar t,\bar r)$, defined in a neighborhood of a
shock surface $\Sigma$ given by r = r(t), such that the metrics
agree on $\Sigma$. (We implicitly assume that $\theta$ and $\phi$ are
continuous across the surface.) Assume that

$$c(t,r) = \bar c(\Psi(t,r)) \qquad [31]$$

in an open neighborhood of the shock surface $\Sigma$, so
that, in particular, the areas of the 2-spheres of
symmetry in the barred and unbarred metrics agree
on the shock surface. Assume also that the shock
surface r = r(t) in unbarred coordinates is mapped to
the surface $\bar r = \bar r(\bar t)$ by $(\bar t, \bar r(\bar t)) = \Psi(t, r(t))$. Assume,
finally, that the normal n to $\Sigma$ is non-null, and that

$$n(c) \ne 0 \qquad [32]$$

where n(c) denotes the derivative of the function c in
the direction of the vector n. Then the following are
equivalent to the statement that the components of
the metric $g \cup \bar g$ in any Gaussian normal coordinate
system are $C^{1,1}$ functions of these coordinates across
the surface $\Sigma$:

$$[G^{ij}]n_i = 0 \qquad [33]$$

$$[G^{ij}]n_i n_j = 0 \qquad [34]$$

$$[K] = 0 \qquad [35]$$

Here again, $[f] = \bar f - f$ denotes the jump in the
quantity f across $\Sigma$, and K is the second fundamental
form on the shock surface.


We assume in Theorem 4 that the areas of the
2-spheres of symmetry change monotonically in the
direction normal to the surface. For example, if
$c = r^2$, then $\partial c/\partial t = 0$, so the assumption $n(c) \ne 0$ is
valid except when $n = \partial/\partial t$, in which case the rays
of the shock surface would be spacelike. Thus, the
shock speed would be faster than the speed of light
if our assumption $n(c) \ne 0$ failed in the case $c = r^2$.


FRW-TOV Shock Matching Outside the
Black Hole - The Case $r_* = 0$


To construct the family of shock wave solutions for
parameter values $0 < \sigma \le 1$ and $r_* = 0$, we match
the exact solution [23]-[25] of the FRW metric [1]
to the TOV metric [2] outside the black hole,
assuming A > 0. In this case, we can bypass the
problem of deriving and solving the ODEs for the
shock surface and constraints discussed above, by
actually deriving the exact solution of the Einstein
equations of TOV form that meets these equations.
This exact solution represents the general-relativistic
version of a static, singular isothermal sphere:
singular because it has an inverse square density
profile, and isothermal because the relationship
between the density and pressure is $\bar p = \bar\sigma\bar\rho$, $\bar\sigma$ = const.

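Assuming the stress tensor for a perfect fluid, and
assuming that the density and pressure depend only
on $\bar r$, the Einstein equations for the TOV metric [2]
outside the black hole (i.e., when $A = 1 - 2M/\bar r > 0$)
are equivalent to the Oppenheimer-Volkoff system

$$\frac{dM}{d\bar r} = 4\pi\bar r^2\bar\rho \qquad [36]$$

$$-\bar r^2\frac{d\bar p}{d\bar r} = GM\bar\rho\left\{1 + \frac{\bar p}{\bar\rho}\right\}\left\{1 + \frac{4\pi\bar r^3\bar p}{M}\right\}\left\{1 - \frac{2GM}{\bar r}\right\}^{-1} \qquad [37]$$

Integrating [36], we obtain the usual interpretation
of M as the total mass inside radius $\bar r$,

$$M(\bar r) = \int_0^{\bar r} 4\pi\xi^2\bar\rho(\xi)\,d\xi \qquad [38]$$

The metric component $B = B(\bar r)$ is determined from $\bar\rho$
and M through the equation

$$\frac{B'(\bar r)}{B} = -2\,\frac{\bar p'(\bar r)}{\bar p + \bar\rho} \qquad [39]$$

Assuming

$$\bar p = \bar\sigma\bar\rho, \qquad \bar\rho(\bar r) = \frac{\gamma}{\bar r^2} \qquad [40]$$

for some constants $\bar\sigma$ and $\gamma$, and substituting into
[3], we obtain

$$M(\bar r) = 4\pi\gamma\bar r \qquad [41]$$

Putting [40] and [41] into [37] and simplifying yields
the identity

$$\gamma = \frac{1}{2\pi G}\left(\frac{\bar\sigma}{1 + 6\bar\sigma + \bar\sigma^2}\right) \qquad [42]$$

From [38] we obtain

$$A = 1 - 8\pi G\gamma < 1 \qquad [43]$$

Applying [39] leads to

$$B = B_0\left(\frac{\bar\rho}{\bar\rho_0}\right)^{-2\bar\sigma/(1+\bar\sigma)} = B_0\left(\frac{\bar r}{\bar r_0}\right)^{4\bar\sigma/(1+\bar\sigma)} \qquad [44]$$

By rescaling the time coordinate, we can take $B_0 = 1$
at $\bar r_0 = 1$, in which case [44] reduces to

$$B = \bar r^{4\bar\sigma/(1+\bar\sigma)} \qquad [45]$$

We conclude that when [42] holds, [40]-[43] and
[44] provide an exact solution of the Einstein field
equations of TOV type, for each $0 < \bar\sigma \le 1$. (In this
case, an exact solution of TOV type was first found
by Tolman (1939), and rediscovered in the case
$\bar\sigma = 1/3$ by Misner and Zapolsky (cf. Weinberg
(1972 p. 320)).) By [43], these solutions are defined
outside the black hole, since $2M/\bar r < 1$. When
$\bar\sigma = 1/3$, [42] yields $\gamma = 3/(56\pi G)$ (cf. Weinberg
(1972, equation (11.4.13))).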
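
The identity [42] can be checked numerically by substituting the ansatz [40], [41] into the Oppenheimer-Volkoff equation [37]. The sketch below is written in Python with G=1 by default; the function names are illustrative only, and it assumes $\bar\sigma > 0$ so that the density is nonzero.

```python
import math


def gamma(sigma_bar, G=1.0):
    """Density constant gamma from [42]."""
    return sigma_bar / (2.0 * math.pi * G * (1.0 + 6.0 * sigma_bar + sigma_bar**2))


def ov_residual(r_bar, sigma_bar, G=1.0):
    """Left side minus right side of [37] for the profile [40], [41]."""
    g = gamma(sigma_bar, G)
    rho = g / r_bar**2                      # [40]: inverse square density
    p = sigma_bar * rho                     # [40]: isothermal equation of state
    M = 4.0 * math.pi * g * r_bar           # [41]
    dp_dr = -2.0 * sigma_bar * g / r_bar**3  # derivative of p = sigma_bar * g / r^2
    lhs = -r_bar**2 * dp_dr
    rhs = (G * M * rho
           * (1.0 + p / rho)
           * (1.0 + 4.0 * math.pi * r_bar**3 * p / M)
           / (1.0 - 2.0 * G * M / r_bar))
    return lhs - rhs


if __name__ == "__main__":
    print(gamma(1.0 / 3.0), 3.0 / (56.0 * math.pi))
    print(ov_residual(2.5, 1.0 / 3.0))
```

The residual vanishes to rounding error at any radius, which reflects the scale invariance of the inverse square profile.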

To match the FRW exact solution [23]-[25] with
equation of state $p = \sigma\rho$ to the TOV exact solution
[40]-[45] with equation of state $\bar p = \bar\sigma\bar\rho$ across a
shock interface, we first set $\bar r = Rr$ to match the
spheres of symmetry, and then match the timelike
and spacelike components of the corresponding
metrics in standard Schwarzschild coordinates. The
matching of the $d\bar r^2$ coefficient $A^{-1}$ yields the
conservation of mass condition that implicitly gives
the shock surface $\bar r = \bar r(t)$,

$$M(\bar r) = \frac{4\pi}{3}\,\rho(t)\,\bar r^3 \qquad [46]$$


Using this together with [41] gives the following two
relations that hold at the shock surface:

$$\bar r = \sqrt{\frac{3\gamma}{\rho(t)}}, \qquad \rho = \frac{3}{4\pi}\,\frac{M}{\bar r(t)^3} = \frac{3\gamma}{\bar r(t)^2} = 3\bar\rho \qquad [47]$$

Matching the coefficient B of $d\bar t^2$ on the shock
surface determines the integrating factor $\psi$ in a
neighborhood of the shock surface by assigning
initial conditions for [44]. Finally, the conservation
constraint $[T^{ij}]n_i n_j = 0$ leads to the single condition


0=(1—-A)(p+p)(p +p) 


+ (1-4) EHDE 0 +0 Ao—p) [88 





which upon using p=op and p=ðōp is satisfied 
assuming the condition 


5 = vy 907 + 5404 49 —30 — 4 = H(o) [49] 


Alternatively, we can solve for $\sigma$ in [49] and write
this relation as

$$\sigma = \frac{\bar\sigma(\bar\sigma + 7)}{3(1 - \bar\sigma)} \qquad [50]$$

This guarantees that conservation holds across the
shock surface, and so it follows from Theorem 4 that
all of the equivalencies in Theorem 3 hold across the
shock surface. Note that $H(0) = 0$, and to leading
order $\bar\sigma = (3/7)\sigma + O(\sigma^{2})$ as $\sigma \to 0$. Within the
physical region $0 < \sigma, \bar\sigma < 1$, we have $H'(\sigma) > 0$, $\bar\sigma < \sigma$, and
$H(1/3) = \sqrt{17} - 4 \approx 0.1231$, $H(1) = \sqrt{112}/2 - 5 \approx
0.2915$.
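
The relation between the FRW parameter $\sigma$ and the TOV parameter $\bar\sigma$ can be checked numerically. The sketch below assumes the readings $H(\sigma) = \tfrac12\sqrt{9\sigma^2 + 54\sigma + 49} - \tfrac32\sigma - \tfrac72$ for [49] and $\sigma = \bar\sigma(\bar\sigma + 7)/(3(1 - \bar\sigma))$ for [50]; the function names `H` and `H_inverse` are illustrative.

```python
import math


def H(sigma):
    """TOV equation-of-state parameter sigma_bar as a function of sigma (reading of [49])."""
    return 0.5 * math.sqrt(9.0 * sigma**2 + 54.0 * sigma + 49.0) - 1.5 * sigma - 3.5


def H_inverse(sigma_bar):
    """FRW parameter sigma recovered from sigma_bar (reading of [50])."""
    return sigma_bar * (sigma_bar + 7.0) / (3.0 * (1.0 - sigma_bar))


for sigma in (1.0 / 3.0, 0.5, 1.0):
    sigma_bar = H(sigma)
    # Print sigma, sigma_bar, and the round trip back through the inverse relation.
    print(sigma, sigma_bar, H_inverse(sigma_bar))

# Small-sigma behaviour: the ratio H(sigma)/sigma should approach 3/7.
print(H(1e-6) / 1e-6, 3.0 / 7.0)
```

The two formulas are inverse to each other on $0 < \sigma < 1$, and the values at $\sigma = 1/3$ and $\sigma = 1$ match the ones quoted in the text.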

Using the exact formulas for the FRW metric in
[23]-[25], and setting $R_0 = 1$ at $\rho = \rho_0$, $t = t_0$, we
obtain the following exact formulas for the shock
position:

$$\bar r(t) = \alpha t \qquad [51]$$

$$r(t) = \bar r(t)R(t)^{-1} = \beta\,t^{(1+3\sigma)/(3+3\sigma)} \qquad [52]$$

where

$$\alpha = 3(1+\sigma)\sqrt{\frac{\bar\sigma}{1+6\bar\sigma+\bar\sigma^{2}}}, \qquad \beta = \alpha^{(1+3\sigma)/(3+3\sigma)}\left(\frac{3\gamma}{\rho_0}\right)^{1/(3+3\sigma)} \qquad [53]$$

It follows from [41] that $A > 0$, and from [52] that
$r_* = \lim_{t\to 0} r(t) = 0$. The entropy condition that the
shock wave be compressive follows from the fact
that $\bar\sigma = H(\sigma) < \sigma$. Thus, we conclude that for each
$0 < \sigma < 1$, $r_* = 0$, the solutions constructed in
[40]-[53] define a one-parameter family of shock
wave solutions that evolve everywhere outside
the black hole, which implies that the distance
from the shock wave to the FRW center is less than
one Hubble length for all $t > 0$.

Using [51] and [52], one can determine the shock 
speed, and check when the Lax characteristic 
condition (Smoller 1983) holds at the shock. The 
result is the following theorem. (Note that even 
when the shock speed is larger than c, only the 
wave, and not the sound speeds or any other 
physical motion, exceeds the speed of light. See Scott 
(2002) for the case when the shock speed is equal to the 
speed of light.) The reader is referred to Smoller and 
Temple (1995) for details. 


Theorem 5 There exist values $0 < \sigma_1 < \sigma_2 < 1$
($\sigma_1 \approx 0.458$, $\sigma_2 = \sqrt{5}/3 \approx 0.745$), such that, for
$0 < \sigma < 1$, the Lax characteristic condition holds at
the shock if and only if $0 < \sigma < \sigma_1$; and the shock
speed is less than the speed of light if and only if
$0 < \sigma < \sigma_2$.


The explicit solution in the case $r_* = 0$ can be
interpreted as a general-relativistic version of a 
shock wave explosion into a static, singular, 
isothermal sphere, known in the Newtonian case as 


a simple model for star formation (Smoller and 
Temple 2000). As the scenario goes, a star begins as 
a diffuse cloud of gas. The cloud slowly contracts 
under its own gravitational force by radiating energy 
out through the gas cloud as gravitational potential 
energy is converted into kinetic energy. This 
contraction continues until the gas cloud reaches 
the point where the mean free path for transmission 
of light is small enough that light is scattered, 
instead of being transmitted, through the cloud. The 
scattering of light within the gas cloud has the effect 
of equalizing the temperature within the cloud, and 
at this point the gas begins to drift toward the most 
compact configuration of the density that balances 
the pressure when the equation of state is isother- 
mal. This configuration is a static, singular, iso- 
thermal sphere, the general-relativistic version of 
which is the exact TOV solution beyond the shock 
wave when $r_* = 0$. This solution in the Newtonian
case is also inverse square in the density and 
pressure, and so the density tends to infinity at the 
center of the sphere. Eventually, the high densities at 
the center ignite thermonuclear reactions. The
result is a shock wave explosion emanating from 
the center of the sphere, and this signifies the birth 
of the star. The exact solutions when $r_* = 0$
represent a general-relativistic version of such a 
shock wave explosion. 


Shock Wave Solutions Inside the Black
Hole - The Case $r_* > 0$


When the shock wave is beyond one Hubble length
from the FRW center, we obtain a family of shock
wave solutions for each $0 < \sigma < 1$ and $r_* > 0$ by
shock matching the FRW metric [1] to a TOV
metric of form [2] under the assumption that

$$A(\bar r) = 1 - \frac{2M(\bar r)}{\bar r} \equiv 1 - N(\bar r) < 0 \qquad [54]$$

In this case, $\bar r$ is the timelike variable. Assuming that
the stress tensor $T$ is taken to be that of a perfect
fluid comoving with the TOV metric, the Einstein
equations $G = \kappa T$, inside the black hole, take the
form (see Smoller and Temple (2004) for details)

$$\bar p' = \frac{\bar p + \bar\rho}{2}\,\frac{N'}{N - 1} \qquad [55]$$

$$N' = -\left\{\frac{N}{\bar r} + \kappa\bar p\,\bar r\right\} \qquad [56]$$

The system [55]-[57] defines the simplest class of
gravitational metrics that contain matter, evolve
inside the black hole, and such that the mass function
$M(\bar r) < \infty$ at each fixed time $\bar r$. System [55]-[57] for
$A < 0$ differs substantially from the TOV equations
for $A > 0$ because, for example, the energy density
$T^{00}$ is equated with the timelike component $G^{rr}$ when
$A < 0$, but with $G^{tt}$ when $A > 0$. In particular, this
implies that, inside the black hole, the mass function
$M(\bar r)$ does not have the interpretation as a total mass
inside the radius $\bar r$ as it does outside the black hole.

Equations [56], [57] do not have the same
character as [54], [55] and the relation $\bar p = \bar\sigma\bar\rho$ with
$\bar\sigma = \text{const.}$ is inconsistent with [56], [57] together with
the conservation constraint and the FRW assumption
$p = \sigma\rho$ for shock matching. Thus, instead of looking
for an exact solution of [56], [57] ahead of time, as in
the case $r_* = 0$, we assume the FRW solution [23]-
[25], and derive the ODEs that describe the TOV
metrics that match this FRW metric Lipschitz-
continuously across a shock surface, and then impose
the conservation, entropy, and equation of state
constraints at the end. Matching a given $k = 0$ FRW
metric to a TOV metric inside the black hole across a
shock interface leads to the system of ODEs (see
Smoller and Temple (2004) for details),

$$\frac{du}{dN} = -\left\{\frac{(1+u)}{2(1+3u)N}\right\}\left\{\frac{(3u-1)(\sigma-u)N + 6u(1+u)}{(\sigma-u)N + (1+u)}\right\} \qquad [58]$$

$$\frac{d\bar r}{dN} = -\frac{1}{1+3u}\,\frac{\bar r}{N} \qquad [59]$$

with conservation constraint

$$v = \frac{-\sigma(1+u) + (\sigma-u)N}{(1+u) + (\sigma-u)N} \qquad [60]$$

where

$$u = \frac{\bar p}{\rho}, \qquad v = \frac{\bar\rho}{\rho}, \qquad \sigma = \frac{p}{\rho} \qquad [61]$$


Here $\rho$ and $p$ denote the (known) FRW density and
pressure, and all variables are evaluated at the
shock. Solutions of [58]-[60] determine the
(unknown) TOV metrics that match the given
FRW metric Lipschitz-continuously across a shock
interface, such that conservation of energy and
momentum hold across the shock, and such that
there are no $\delta$-function sources at the shock (Israel
1966, Smoller and Temple 1997). Note that the
dependence of [58]-[60] on the FRW metric is only
through the variable $\sigma$, and so the advantage of
taking $\sigma = \text{const.}$ is that the whole solution is
determined by the inhomogeneous scalar equation
[58] when $\sigma = \text{const.}$ We take as the entropy
constraint the condition that

$$0 < \bar p < p, \qquad 0 < \bar\rho < \rho \qquad [62]$$

and to ensure a physically reasonable solution, we
impose the equation of state constraint on the TOV
side of the shock (this is equivalent to the dominant
energy condition (Blau and Guth 1987))

$$0 < \bar p < \bar\rho \qquad [63]$$


Condition [62] implies that outgoing shock waves
are compressive. Inequalities [62] and [63] are both
implied by the single condition (Smoller and Temple
2004),

$$\frac{1}{N} < \left(\frac{1-u}{1+u}\right)\left(\frac{\sigma-u}{\sigma+u}\right) \qquad [64]$$
Since $\sigma$ is constant, eqn [58] uncouples from [59],
and thus solutions of system [58]-[60] are determined
by the scalar nonautonomous equation [58].
Making the change of variable $S = 1/N$, which
transforms the “big bang” $N \to \infty$ over to a rest
point at $S \to 0$, we obtain

$$\frac{du}{dS} = \left\{\frac{(1+u)}{2(1+3u)S}\right\}\left\{\frac{(3u-1)(\sigma-u) + 6u(1+u)S}{(\sigma-u) + (1+u)S}\right\} \qquad [65]$$
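
The change of variable $S = 1/N$ can be verified numerically with the chain rule $du/dS = -N^{2}\,du/dN$. The sketch below assumes the readings of [58] and [65] given by the formulas in the code comments; the function names `du_dN` and `du_dS` and the sample point are illustrative. It also checks that the $S = 0$ part of the numerator of [65], $(3u-1)(\sigma-u)$, vanishes at $u = \min\{1/3, \sigma\}$, the rest-point value that appears in the limit $S \to 0$.

```python
def du_dN(u, N, sigma):
    """Right-hand side of the shock ODE in the variable N (assumed reading of [58])."""
    prefactor = -(1.0 + u) / (2.0 * (1.0 + 3.0 * u) * N)
    numerator = (3.0 * u - 1.0) * (sigma - u) * N + 6.0 * u * (1.0 + u)
    denominator = (sigma - u) * N + (1.0 + u)
    return prefactor * numerator / denominator


def du_dS(u, S, sigma):
    """Right-hand side of the shock ODE in the variable S = 1/N (assumed reading of [65])."""
    prefactor = (1.0 + u) / (2.0 * (1.0 + 3.0 * u) * S)
    numerator = (3.0 * u - 1.0) * (sigma - u) + 6.0 * u * (1.0 + u) * S
    denominator = (sigma - u) + (1.0 + u) * S
    return prefactor * numerator / denominator


# Chain-rule consistency at an arbitrary sample point in 0 < u < sigma, 0 < S < 1.
u, S, sigma = 0.1, 0.4, 1.0 / 3.0
N = 1.0 / S
lhs = du_dS(u, S, sigma)
rhs = -(N**2) * du_dN(u, N, sigma)
print(lhs, rhs)

# The S = 0 numerator vanishes at u_bar = min(1/3, sigma).
for sig in (0.2, 1.0 / 3.0, 0.5):
    u_bar = min(1.0 / 3.0, sig)
    print(sig, (3.0 * u_bar - 1.0) * (sig - u_bar))
```

Because the right-hand side of [65] carries a factor $1/S$, a direct numerical integration starting at $S = 0$ is singular; any integration would have to start at a small positive $S$ near the rest point.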








Note that the conditions $N > 1$ and $0 < \bar p < p$
restrict the domain of [65] to the region $0 < u <
\sigma < 1$, $0 < S < 1$. The next theorem gives the existence
of solutions for $0 < \sigma < 1$, $r_* > 0$, inside the
black hole (Smoller and Temple 2003).


Theorem 6 For every $\sigma$, $0 < \sigma < 1$, there exists a
unique solution $u_\sigma(S)$ of [65], such that [64] holds
on the solution for all $S$, $0 < S < 1$, and on this
solution, $0 < u_\sigma(S) < \bar u$, $\lim_{S\to 0} u_\sigma(S) = \bar u$, where

$$\bar u = \min\{1/3, \sigma\} \qquad [66]$$

and

$$\lim_{S\to 1}\bar p = 0 = \lim_{S\to 1}\bar\rho \qquad [67]$$

For each of these solutions $u_\sigma(S)$, the shock position
is determined by the solution of [59], which in turn
is determined uniquely by an initial condition which
can be taken to be the FRW radial position of the
shock wave at the instant of the big bang,

$$r_* = \lim_{S\to 0} r(S) > 0 \qquad [68]$$


Concerning the shock speed, we have

Theorem 7 Let $0 < \sigma < 1$. Then the shock wave is
everywhere subluminous, that is, the shock speed
$s_\sigma(S) \equiv s(u_\sigma(S)) < 1$ for all $0 < S \le 1$, if and only if
$\sigma \le 1/3$.


Concerning the shock speed near the big bang
$S = 0$, the following is true:


Theorem 8 The shock speed at the big bang $S = 0$
is given by

$$\lim_{S\to 0} s_\sigma(S) = 0, \qquad \sigma < 1/3 \qquad [69]$$

$$\lim_{S\to 0} s_\sigma(S) = \infty, \qquad \sigma > 1/3 \qquad [70]$$

$$\lim_{S\to 0} s_\sigma(S) = 1, \qquad \sigma = 1/3 \qquad [71]$$


Theorem 8 shows that the equation of state
$p = \rho/3$ plays a special role in the analysis when $r_* > 0$,
and only for this equation of state does the shock
wave emerge at the big bang at a finite nonzero
speed, the speed of light. Moreover, [66] implies that
in this case, the correct relation $\bar p/\bar\rho = \bar\sigma$ is also
achieved in the limit $S \to 0$. The result [67] implies
that (neglecting the pressure $p$ at this time onward),
the solution continues to a $k = 0$ Oppenheimer-
Snyder solution outside the black hole for $S > 1$.

It follows that the shock wave will first become
visible at the FRW center $\bar r = 0$ at the moment
$t = t_0$ $(R(t_0) = 1)$, when the Hubble length
$H_0^{-1} = H^{-1}(t_0)$ satisfies

$$\frac{1}{H_0} = \frac{1+3\sigma}{2}\,r_*$$

where $r_*$ is the FRW position of the shock at the
instant of the big bang. At this time, the number of
Hubble lengths $\sqrt{N_0}$ from the FRW center to the
shock wave at time $t = t_0$ can be estimated by

$$1 \le \frac{2}{1+3\sigma} \le \sqrt{N_0} \le \frac{2}{1+3\sigma}\,e^{\sqrt{3\sigma}\left(\frac{1+3\sigma}{1+\sigma}\right)}$$

Thus, in particular, the shock wave will still lie
beyond the Hubble length $1/H_0$ at the FRW time $t_0$
when it first becomes visible. Furthermore, the time
$t_{\rm crit} > t_0$ at which the shock wave will emerge from
the white hole, given that $t_0$ is the first instant at
which the shock becomes visible at the FRW center,
can be estimated by

$$\frac{2}{1+3\sigma}\,e^{\frac{1}{4}\sigma} \le \frac{t_{\rm crit}}{t_0} \le \frac{2}{1+3\sigma}\,e^{\frac{2\sqrt{3\sigma}}{1+\sigma}} \qquad [73]$$

for $0 < \sigma < 1/3$, and by the better estimate

$$e^{\sqrt{6}/4} \le \frac{t_{\rm crit}}{t_0} \le e^{3/2} \qquad [74]$$

in the case $\sigma = 1/3$. Inequalities [73], [74] imply, for
example, that at the Oppenheimer-Snyder limit $\sigma = 0$,

$$\sqrt{N_0} = 2, \qquad \frac{t_{\rm crit}}{t_0} = 2$$

and in the limit $\sigma = 1/3$,

$$1 < \sqrt{N_0} < 4.5, \qquad 1.8 < \frac{t_{\rm crit}}{t_0} \le 4.5$$

We can conclude that at the moment $t_0$ when the
shock wave first becomes visible at the FRW center,
the shock wave must lie within 4.5 Hubble lengths of
the FRW center. Throughout the expansion up until
this time, the expanding universe must lie entirely
within a white hole; the universe will eventually
emerge from this white hole, but not until some later
time $t_{\rm crit}$, where $t_{\rm crit}$ does not exceed $4.5t_0$.


Conclusion 


We believe that the existence of a wave at the 
leading edge of the expansion of the galaxies is the 
most likely possibility. The alternatives are that 
either the universe of expanding galaxies goes on out 
to infinity, or else the universe is not simply 
connected. Although the first possibility has been 
believed for most of the history of cosmology based 
on the Friedmann universe, we find this implausible 
and arbitrary in light of the shock wave refinements 
of the FRW metric discussed here. The second 
possibility, that the universe is not simply connected, 
has received considerable attention recently (Klarreich 
2003). However, since we have not seen, and 
cannot create, any non-simply-connected 3-spaces 
on any other length scale, and since there is no 
observational evidence to support this, we view this 
as less likely than the existence of a wave at the leading 
edge of the expansion of the galaxies, left over from the 
big bang. Recent analysis of the microwave back- 
ground radiation data shows a cutoff in the angular 
frequencies consistent with a length scale of around 
one Hubble length (Andy Albrecht, private communication).
This certainly makes one wonder whether
this cutoff is evidence of a wave at this length scale,
especially given the consistency of this possibility
with the case $r_* > 0$ of the family of exact solutions
discussed here. 


Acknowledgments 


The work of JS was supported in part by NSF 
Applied Mathematics Grant Number DMS-010- 
3998, and that of BT by NSF Applied Mathematics 
Grant Number DMS-010-2493. 




See also: Black Hole Mechanics; Cosmology: 
Mathematical Aspects; Newtonian Limit of General 
Relativity; Symmetric Hyperbolic Systems and Shock 
Waves. 


Further Reading 


Blau SK and Guth AH (1987) Inflationary cosmology. In: Hawking 
SW and Israel W (eds.) Three Hundred Years of Gravitation, 
pp. 524-603. Cambridge: Cambridge University Press. 

Groah J, Smoller J, and Temple B (2003) Solving the Einstein 
equations by Lipschitz continuous metrics: shockwaves in 
general relativity. In: Friedlander S and Serre D (eds.) 
Handbook of Mathematical Fluid Dynamics, vol. 2, 
pp. 501-597. Amsterdam: North Holland. 

Groah J and Temple B (2004) Shock-Wave Solutions of the Einstein 
Equations: Existence and Consistency by a Locally Inertial Glimm 
Scheme. Memoirs of the AMS, 84 pages, vol. 172, no. B13. 

Israel W (1966) Singular hypersurfaces and thin shells in general 
relativity. Il Nuovo Cimento XLIV B(1): 1-14.

Klarreich E (2003) The shape of space. Science News 164-165. 

Oppenheimer JR and Snyder H (1939) On continued gravitational
contraction. Physical Review 56: 455-459.

Scott M (2002) General Relativistic Shock Waves Propagating at 
the Speed of Light. Ph.D. thesis, UC-Davis. 


Smoller J (1983) Shock Waves and Reaction Diffusion Equations. 
New York, Berlin: Springer. 

Smoller J and Temple B (1994) Shock-wave solutions of the 
Einstein equations: the Oppenheimer-Snyder model of grav- 
itational collapse extended to the case of non-zero pressure. 
Archive for Rational Mechanics and Analysis 128: 249-297.

Smoller J and Temple B (1995) Astrophysical shock-wave solutions of 
the Einstein equations. Physical Review D 51(6): 2733-2743. 

Smoller J and Temple B (1997) Solutions of the Oppenheimer- 
Volkoff equations inside 9/8’ths of the Schwarzschild radius. 
Communications in Mathematical Physics 184: 597-617. 

Smoller J and Temple B (1998) On the Oppenheimer-Volkoff
equations in general relativity. Archive for Rational
Mechanics and Analysis 142: 177-191.

Smoller J and Temple B (2000) Cosmology with a shock wave. 
Communications in Mathematical Physics 210: 275-308. 

Smoller J and Temple B (2003) Shock wave cosmology inside a
black hole. Proceedings of the National Academy of Sciences
of the United States of America 100(20): 11216-11218.

Smoller J and Temple B (2004) Cosmology, black holes, and 
shock waves beyond the Hubble length. Methods and 
Applications of Analysis 11(1): 77-132. 

Tolman R (1939) Static solutions of Einstein’s field equations for 
spheres of fluid. Physical Review 55: 364-374. 

Weinberg S (1972) Gravitation and Cosmology: Principles and 
Applications of the General Theory of Relativity. New York: 
Wiley. 


Shock Waves see Symmetric Hyperbolic Systems and Shock Waves 


Short-Range Spin Glasses: The Metastate Approach 


C M Newman, New York University, New York, NY, 
USA 
D L Stein, University of Arizona, Tucson, AZ, USA 


© 2006 Elsevier Ltd. All rights reserved. 


Introduction 


The nature of the low-temperature spin glass phase in 
short-range models remains one of the central problems 
in the statistical mechanics of disordered systems (Binder 
and Young 1986, Chowdhury 1986, Mézard et al. 1987, 
Stein 1989, Fischer and Hertz 1991, Dotsenko 2001, 
Newman and Stein 2003). While many of the basic 
questions remain unanswered, analytical and rigorous
work over the past decade has greatly narrowed the
number of possible scenarios for pure state structure and 
organization at low temperatures, and have clarified the 
thermodynamic behavior of these systems. 

The unifying concept behind this work is that of 
the “metastate.” It arose independently in two 
different constructions (Aizenman and Wehr 1990, 


Newman and Stein 1996b), which were later shown 
to be equivalent (Newman and Stein 1998a). The 
metastate is a probability measure on the space of 
all thermodynamic states. Its usefulness arises in 
situations where multiple “competing” pure states 
may be present. In such situations it may be 
difficult to construct individual states in a measur- 
able and canonical way; the metastate avoids this 
difficulty by focusing instead on the statistical 
properties of the states. 

An important aspect of the metastate approach is 
that it relates, by its very construction (Newman and 
Stein 1996b), the observed behavior of a system in 
large but finite volumes with its thermodynamic 
properties. It therefore serves as a (possibly indis- 
pensable) tool for analyzing and understanding both 
the infinite-volume and finite-volume properties of a 
system, particularly in cases where a straightforward 
interpolation between the two may be incorrect, or 
their relation otherwise difficult to analyze. 

We will focus on the Edwards–Anderson (EA)
Ising spin glass model (Edwards and Anderson
1975), although most of our discussion is relevant to
a much larger class of realistic models. The EA
model is described by the Hamiltonian

$$\mathcal{H}_{\mathcal J}(\sigma) = -\sum_{\langle x,y\rangle} J_{xy}\sigma_x\sigma_y \qquad [1]$$

where $\mathcal J$ denotes a particular realization of all of the
couplings $J_{xy}$ and the brackets indicate that the sum is
over nearest-neighbor pairs only, with $x, y \in \mathbb{Z}^d$. We
will take Ising spins $\sigma_x = \pm 1$; although this will affect
the details of our discussion, it is unimportant for our
main conclusions. The couplings $J_{xy}$ are quenched,
independent, identically distributed random variables
whose common distribution $\mu$ is symmetric about zero.


States and Metastates 


We are interested in both finite-volume and infinite-
volume Gibbs states. For the cube of length scale $L$,
$\Lambda_L = \{-L, -L+1, \ldots, L\}^d$, we define $\mathcal{H}_{\mathcal J,L}$ to be
the restriction of the EA Hamiltonian to $\Lambda_L$ with a
specified boundary condition such as free, fixed, or
periodic. Then the finite-volume Gibbs distribution
$\rho^{(L)}_{\mathcal J,\beta}$ on $\Lambda_L$ (at inverse temperature $\beta = 1/T$) is

$$\rho^{(L)}_{\mathcal J,\beta}(\sigma) = Z_{L,\beta}^{-1}\exp\{-\beta\mathcal{H}_{\mathcal J,L}(\sigma)\} \qquad [2]$$

where the partition function $Z_{L,\beta}$ is such that
the sum of $\rho^{(L)}_{\mathcal J,\beta}$ over all $\sigma$ yields 1. (In this and all
succeeding definitions, the dependence on spatial
dimension $d$ will be suppressed.)
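
For very small volumes the finite-volume Gibbs distribution can be computed by brute-force enumeration. The sketch below is only an illustration under stated assumptions: it takes $d = 1$ with periodic boundary conditions (a ring of $2L + 1$ sites), Gaussian couplings (one choice of distribution symmetric about zero), and a fixed random seed. The function name `gibbs_distribution` is hypothetical.

```python
import itertools
import math
import random


def gibbs_distribution(couplings, beta):
    """Finite-volume Gibbs distribution of an EA Ising ring with periodic boundary.

    couplings[i] is the coupling between site i and site (i + 1) mod n.
    Returns a dict mapping each spin configuration (tuple of +-1) to its probability.
    """
    n = len(couplings)
    weights = {}
    for spins in itertools.product((-1, 1), repeat=n):
        # H = - sum over nearest-neighbor pairs of J_xy * sigma_x * sigma_y
        energy = -sum(couplings[i] * spins[i] * spins[(i + 1) % n] for i in range(n))
        weights[spins] = math.exp(-beta * energy)
    partition_function = sum(weights.values())
    return {spins: w / partition_function for spins, w in weights.items()}


rng = random.Random(0)          # fixed seed so the coupling realization is reproducible
L = 2
n_sites = 2 * L + 1             # sites -L, ..., L in one dimension
J = [rng.gauss(0.0, 1.0) for _ in range(n_sites)]
rho = gibbs_distribution(J, beta=1.0)

print(sum(rho.values()))        # normalization check
all_up = tuple([1] * n_sites)
all_down = tuple([-1] * n_sites)
print(rho[all_up], rho[all_down])  # equal by global spin-flip symmetry
```

The enumeration cost grows as $2^{|\Lambda_L|}$, so this is useful only as a check of the definition, not as a method for studying large volumes.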

Thermodynamic states are described by infinite-
volume Gibbs measures. At fixed inverse temperature
$\beta$ and coupling realization $\mathcal J$, a thermodynamic state
$\rho_{\mathcal J,\beta}$ is the limit, as $L \to \infty$, of some sequence of such
finite-volume measures (each with a specified boundary
condition, which may remain the same or may
change with $L$). A thermodynamic state $\rho_{\mathcal J,\beta}$ can also
be characterized intrinsically through the Dobrushin–
Lanford–Ruelle (DLR) equations (see, e.g., Georgii
1988): for any $\Lambda_L$, the conditional distribution of $\rho_{\mathcal J,\beta}$
(conditioned on the sigma-field generated by
$\{\sigma_x : x \in \mathbb{Z}^d \setminus \Lambda_L\}$) is $\rho^{(L),\bar\sigma}_{\mathcal J,\beta}$, where $\bar\sigma$ is given by the
conditioned values of $\sigma_x$ for $x$ on the boundary of $\Lambda_L$.

Consider now the set $\mathcal G = \mathcal G(\mathcal J, \beta)$ of all thermodynamic
states at a fixed $(\mathcal J, \beta)$. The set of extremal,
or pure, Gibbs states is defined by

$$\mathrm{ex}\,\mathcal G = \mathcal G \setminus \{a\rho_1 + (1-a)\rho_2 : a \in (0,1);\ \rho_1, \rho_2 \in \mathcal G;\ \rho_1 \ne \rho_2\} \qquad [3]$$

and the number of pure states $\mathcal N(\mathcal J, \beta)$ at $(\mathcal J, \beta)$ is the
cardinality $|\mathrm{ex}\,\mathcal G|$ of $\mathrm{ex}\,\mathcal G$. It is not hard to show that, in
any $d$ and for a.e. $\mathcal J$, the following two statements are
true: (1) $\mathcal N = 1$ at sufficiently low $\beta > 0$; (2) at any
fixed $\beta$, $\mathcal N$ is constant a.s. with respect to the $\mathcal J$'s. (The
last assertion follows from the measurability and
translation invariance of $\mathcal N$, and the translation
ergodicity of the disorder distribution of $\mathcal J$.)

A pure state $\rho_\alpha$ (where $\alpha$ is a pure-state index) can
also be intrinsically characterized by a “clustering
property”; for two-point correlation functions, this
reads

$$\langle\sigma_x\sigma_y\rangle_{\rho_\alpha} - \langle\sigma_x\rangle_{\rho_\alpha}\langle\sigma_y\rangle_{\rho_\alpha} \to 0 \qquad [4]$$

as $|x - y| \to \infty$. A simple observation (Newman and
Stein 1992), with important consequences for spin
glasses, is that if many pure states exist, a sequence
of $\rho^{(L)}_{\mathcal J,\beta}$'s, with boundary conditions and $L$'s chosen
independently of $\mathcal J$, will generally not have a
(single) limit. We call this phenomenon “chaotic
size dependence” (CSD).

We will be interested in the properties of $\mathrm{ex}\,\mathcal G$ at
low temperatures. If the spin-flip symmetry present
in the EA Hamiltonian equation [1] is spontaneously
broken above some dimension $d_0$ and below some
temperature $T_c(d)$, there will be at least a pair of
pure states such that their even-spin correlations
are identical and their odd-spin correlations have the
opposite sign. Assuming that such broken spin-flip
symmetry indeed exists for $d > d_0$ and $T < T_c(d)$, the
question of whether there exists more than one
such pair (of spin-flip related extremal infinite-
volume Gibbs distributions) is a central unresolved
issue for the EA and related models. If many such
pairs should exist, we can ask about the structure of
their relations with one another, and how this
structure would manifest itself in large but finite
volumes. To do this, we use an approach, introduced
by Newman and Stein (1996b), to study inhomogeneous
and other systems with many competing pure
states. This approach, based on an analogy with
chaotic dynamical systems, requires the construction
of a new thermodynamic quantity which is called the
“metastate”: a probability measure $\kappa_{\mathcal J}$ on the
thermodynamic states. The metastate allows an
understanding of CSD by analyzing the way in
which $\rho^{(L)}_{\mathcal J}$ “samples” from its various possible limits
as $L \to \infty$.

The analogy with chaotic dynamical systems can 
be understood as follows. In dynamical systems, the 
chaotic motion along a deterministic orbit is 
analyzed in terms of some appropriately selected 
probability measure, invariant under the dynamics. 
Time along the orbit is replaced, in our context, by 
L and the phase space of the dynamical system is 
replaced by the space of Gibbs states. 
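
The dynamical-systems side of this analogy can be illustrated with a standard toy example that is not taken from the text: the logistic map $x \mapsto 4x(1-x)$ on $[0,1]$. A single orbit never settles down, yet averages of a function along the orbit approach its integral against the invariant (arcsine) measure, whose mean is $1/2$. In the analogy, the step index plays the role of $L$ and the point $x$ plays the role of the finite-volume Gibbs state. The function name `orbit_average`, the starting point, and the orbit length are arbitrary illustrative choices.

```python
def orbit_average(g, x0, n_steps):
    """Average of g along the first n_steps points of a logistic-map orbit.

    Each of the n_steps orbit points gets weight 1/n_steps, which is the
    dynamical analogue of weighting each volume in a sequence equally.
    """
    x = x0
    total = 0.0
    for _ in range(n_steps):
        total += g(x)
        x = 4.0 * x * (1.0 - x)   # one step of the chaotic map
    return total / n_steps


# The orbit itself keeps fluctuating over [0, 1] ...
x = 0.1234
first_points = []
for _ in range(5):
    first_points.append(x)
    x = 4.0 * x * (1.0 - x)
print(first_points)

# ... but the empirical average of g(x) = x is close to the invariant-measure mean 1/2.
avg = orbit_average(lambda x: x, 0.1234, 100_000)
print(avg)
```

The individual orbit has no limit, but the empirical distribution of orbit points does; the metastate plays the same role for a sequence of finite-volume Gibbs states that fails to converge.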

Newman and Stein (1996b) considered a “microcanonical
ensemble” (as always, at fixed $\beta$, which
will hereafter be suppressed for ease of notation) $\kappa_N$
in which each of the finite-volume Gibbs states
$\rho^{(L_1)}_{\mathcal J}, \rho^{(L_2)}_{\mathcal J}, \ldots, \rho^{(L_N)}_{\mathcal J}$ has weight $N^{-1}$. The ensemble
$\kappa_N$ converges to a metastate $\kappa_{\mathcal J}$ as $N \to \infty$, in the
following sense: for every (nice) function $g$ on states
(e.g., a function of finitely many correlations),

$$\lim_{N\to\infty} N^{-1}\sum_{\ell=1}^{N} g\bigl(\rho^{(L_\ell)}_{\mathcal J}\bigr) = \int g(\Gamma)\,d\kappa_{\mathcal J}(\Gamma) \qquad [5]$$

The information contained in $\kappa_{\mathcal J}$ effectively specifies
the fraction of cube sizes $L_\ell$ which the system spends
in different (possibly mixed) thermodynamic states $\Gamma$
as $\ell \to \infty$.

A different, but in the end equivalent, approach
based on $\mathcal J$-randomness is due to Aizenman and
Wehr (1990). Here one considers the random pair
$(\mathcal J, \rho^{(L)}_{\mathcal J})$, defined on the underlying probability space
of $\mathcal J$, and takes the limit $\kappa^\dagger$ (with conditional
distribution $\kappa^\dagger_{\mathcal J}$ given $\mathcal J$), via finite-dimensional
distributions along some subsequence. The details
are omitted here, and the reader is referred to the
work by Aizenman and Wehr (1990) and Newman
and Stein (1998a). We note, however, the important
result that a “deterministic” subsequence of volumes
can be found on which [5] is valid and also $(\mathcal J, \rho^{(L)}_{\mathcal J})$
converges, with $\kappa^\dagger_{\mathcal J} = \kappa_{\mathcal J}$ (Newman and Stein
1998a).

In what follows we use the term “metastate” as
shorthand for the $\kappa_{\mathcal J}$ constructed using periodic
boundary conditions on a sequence of volumes
chosen independently of the couplings, and along
which $\kappa_{\mathcal J} = \kappa^\dagger_{\mathcal J}$. We choose periodic boundary
conditions for specificity; the results and claims
discussed are expected to be independent of the
boundary conditions used, as long as they are
chosen independently of the couplings.


Low-Temperature Structure 
of the EA Model 


There have been several scenarios proposed for the 
spin-glass phase of the Edwards—Anderson model at 
sufficiently low temperature and high dimension. 
These remain speculative, because it has not even 
been proved that a phase transition from the high- 
temperature phase exists at positive temperature in 
any finite dimension. 

As noted earlier, at sufficiently high temperature 
in any dimension (and at all nonzero temperatures in 
one and presumably two dimensions, although the 
latter assertion has not been proved), there is a 
unique Gibbs state. It is conceivable that this 
remains the case in all dimensions and at all nonzero 
temperatures, in which case the metastate $\kappa_{\mathcal J}$ is, for
a.e. $\mathcal J$, supported on a single, pure Gibbs state $\rho_{\mathcal J}$.
(It is important to note, however, that in principle
such a trivial metastate could occur even if $\mathcal N > 1$;
indeed, just such a situation of “weak uniqueness” 
(van Enter and Frohlich 1985, Campanino et al. 
1987) happens in very long range spin glasses at 
high temperatures (Frohlich and Zegarlinski 1987, 
Gandolfi et al. 1993).) 

A phase transition has been proved to exist 
(Aizenman et al. 1987) in the Sherrington- 
Kirkpatrick (SK) model (Sherrington and Kirkpa- 
trick 1975), which is the infinite-range version of 
the EA model. Numerical (Ogielski 1985, Ogielski 
and Morgenstern 1985, Binder and Young 1986, 
Kawashima and Young 1996) and some analytical 
(Fisher and Singh 1990, Thill and Hilhorst 1996) work 
has led to a general consensus that above some 
dimension (typically around three or four) there does 
exist a positive-temperature phase transition below 
which spin-flip symmetry is broken, that is, in which 
pure states come in pairs, as discussed below eqn [4]. 
Because much of the literature has focused on this 
possibility, we assume it in what follows, and the 
metastate approach turns out to be highly useful in 
restricting the scenarios that can occur. The simplest 
such scenario is a two-state picture in which, below the
transition temperature $T_c$, there exists a single pair of
global flip-related pure states $\rho^{\alpha}_{\mathcal J}$ and $\rho^{-\alpha}_{\mathcal J}$. In this case,
there is no CSD for periodic boundary conditions and
the metastate can be written as

$$\kappa_{\mathcal J} = \delta_{\frac{1}{2}\rho^{\alpha}_{\mathcal J} + \frac{1}{2}\rho^{-\alpha}_{\mathcal J}} \qquad [6]$$

That is, the metastate is supported on a single
(mixed) thermodynamic state.

The two-state scenario that has received the most
attention in the literature is the “droplet/scaling”
picture (McMillan 1984, Fisher and Huse 1986,
1988, Bray and Moore 1985). In this picture a low-
energy excitation above the ground state in $\Lambda_L$ is a
droplet whose surface area scales as $l^{d_s}$, with $l \sim
O(L)$ and $d_s < d$, and whose surface energy scales as
$l^{\theta}$, with $\theta > 0$ (in dimensions where $T_c > 0$). More
recently, an alternative picture has arisen (Krzakala
and Martin 2000, Palassini and Young 2000) in
which the low-energy excitations differ from those
of droplet/scaling, in that their energies scale as $l^{\theta'}$,
with $\theta' = 0$.

The low-temperature picture that has perhaps 
generated the most attention in the literature is 
the replica symmetry breaking (RSB) scenario 
(Binder and Young 1986, Marinari et al. 1994, 
1997, Franz et al. 1998, Marinari et al. 2000, 
Marinari and Parisi 2000, 2001, Dotsenko 2001), 
which assumes a rather complicated pure-state 
structure, inspired by Parisi’s solution of the SK 
model (Parisi 1979, 1983, Mézard et al. 1984, 
1987). This is a many-state picture ($\mathcal N = \infty$ for a.e.
$\mathcal J$) in which the ordering is described in terms of the
“overlaps” between states. There has been some
ambiguity in how to describe such a picture for
short-range models; the prevailing, or standard,
view is as follows. Consider any reasonably constructed
thermodynamic state $\rho_{\mathcal J}$ (see Newman and Stein (1998a)
for more details), e.g., the “average” over the
metastate $\kappa_{\mathcal J}$

$$\rho_{\mathcal J} = \int \Gamma\,d\kappa_{\mathcal J}(\Gamma) \qquad [7]$$

Now choose $\sigma$ and $\sigma'$ from the product distribution
$\rho_{\mathcal J}(\sigma)\rho_{\mathcal J}(\sigma')$. The overlap $Q$ is defined as

$$Q = \lim_{L\to\infty} |\Lambda_L|^{-1}\sum_{x\in\Lambda_L}\sigma_x\sigma'_x \qquad [8]$$

and $P_{\mathcal J}(q)$ is defined to be its probability
distribution.
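
A finite-volume analogue of the overlap distribution can be computed by enumeration, which makes the definition concrete. The sketch below is an illustration under stated assumptions rather than the construction in the text: it uses a finite-volume Gibbs state on a small one-dimensional periodic ring in place of a thermodynamic state, takes no $L \to \infty$ limit, and draws Gaussian couplings with a fixed seed. The function names `gibbs_distribution` and `overlap_distribution` are hypothetical.

```python
import itertools
import math
import random
from collections import defaultdict


def gibbs_distribution(couplings, beta):
    """Gibbs distribution of an EA Ising ring (periodic boundary) by enumeration."""
    n = len(couplings)
    weights = {}
    for spins in itertools.product((-1, 1), repeat=n):
        energy = -sum(couplings[i] * spins[i] * spins[(i + 1) % n] for i in range(n))
        weights[spins] = math.exp(-beta * energy)
    z = sum(weights.values())
    return {spins: w / z for spins, w in weights.items()}


def overlap_distribution(rho):
    """Distribution of q = (1/n) * sum_x sigma_x * sigma'_x for two independent replicas.

    Both replicas are drawn from the same distribution rho (same couplings).
    Returns a dict mapping each attainable overlap value q to its probability.
    """
    n = len(next(iter(rho)))
    dist = defaultdict(float)
    for s, prob_s in rho.items():
        for t, prob_t in rho.items():
            k = sum(a * b for a, b in zip(s, t))   # integer in {-n, ..., n}
            dist[k] += prob_s * prob_t
    return {k / n: p for k, p in sorted(dist.items())}


rng = random.Random(1)                               # fixed coupling realization
couplings = [rng.gauss(0.0, 1.0) for _ in range(5)]  # ring of 5 sites
P = overlap_distribution(gibbs_distribution(couplings, beta=2.0))

for q, prob in P.items():
    print(q, prob)
```

Because the Hamiltonian is invariant under a global spin flip, the resulting distribution is symmetric under $q \to -q$, which is the finite-volume counterpart of the paired structure of pure states discussed in the text.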

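In the standard RSB picture, $\rho_{\mathcal J}$ is a mixture of
infinitely many pure states, each with a specific
$\mathcal J$-dependent weight $W$:

$$\rho_{\mathcal J}(\sigma) = \sum_{\alpha} W^{\alpha}_{\mathcal J}\,\rho^{\alpha}_{\mathcal J}(\sigma) \qquad [9]$$

If $\sigma$ is drawn from $\rho^{\alpha}_{\mathcal J}$ and $\sigma'$ from $\rho^{\beta}_{\mathcal J}$, then the
expression in eqn [8] equals its thermal mean,

$$q^{\alpha\beta}_{\mathcal J} = \lim_{L\to\infty} |\Lambda_L|^{-1}\sum_{x\in\Lambda_L}\langle\sigma_x\rangle_{\alpha}\langle\sigma_x\rangle_{\beta} \qquad [10]$$

and hence $P_{\mathcal J}$ is given by

$$P_{\mathcal J}(q) = \sum_{\alpha,\beta} W^{\alpha}_{\mathcal J}W^{\beta}_{\mathcal J}\,\delta(q - q^{\alpha\beta}_{\mathcal J}) \qquad [11]$$

The “self-overlap,” or EA order parameter, is given
by $q_{EA} = q^{\alpha\alpha}_{\mathcal J}$ and (at fixed $T$) is thought to be
independent of both $\alpha$ and $\mathcal J$ (with probability 1).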

According to the standard RSB scenario, the $W^{\alpha}_{\mathcal J}$'s
and $q^{\alpha\beta}_{\mathcal J}$'s are non-self-averaging (i.e., $\mathcal J$-dependent)
quantities, except for $\alpha = \beta$ or its global flip, where
$q^{\alpha\beta}_{\mathcal J} = \pm q_{EA}$. The average $P_s(q)$ of $P_{\mathcal J}(q)$ over the
disorder distribution of $\mathcal J$ is predicted to be a
mixture of two delta-function components at $\pm q_{EA}$
and a continuous part between them. However, it
was proved by Newman and Stein (1996c) that this
scenario cannot occur, because of the translation
invariance of $P_{\mathcal J}(q)$ and the translation ergodicity of
the disorder distribution. Nevertheless, the metastate
approach suggests an alternative, nonstandard, RSB
scenario, which is described next.

The idea behind the nonstandard RSB picture 
(referred to by us as the nonstandard SK picture in 
earlier papers) is to produce the finite-volume 
behavior of the SK model to the maximum extent 
possible. We therefore assume in this picture that in 


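each $\Lambda_L$, the finite-volume Gibbs state $\rho^{(L)}_{\mathcal J}$ is well
approximated deep in the interior by a mixed
thermodynamic state $\Gamma^{(L)}$, decomposable into many
pure states $\rho_{\alpha_L}$ (explicit dependence on $\mathcal J$ is
suppressed for ease of notation). More precisely,
each $\Gamma$ in $\kappa_{\mathcal J}$ satisfies

$$\Gamma = \sum_{\alpha_\Gamma} W^{\alpha_\Gamma}_{\Gamma}\,\rho_{\alpha_\Gamma} \qquad [12]$$

and is presumed to have a nontrivial overlap
distribution for $\sigma, \sigma'$ from $\Gamma(\sigma)\Gamma(\sigma')$:

$$P_{\Gamma}(q) = \sum_{\alpha_\Gamma,\beta_\Gamma} W^{\alpha_\Gamma}_{\Gamma}W^{\beta_\Gamma}_{\Gamma}\,\delta(q - q_{\alpha_\Gamma\beta_\Gamma}) \qquad [13]$$

as did $\rho_{\mathcal J}$ in the standard RSB picture.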

Because $\kappa_{\mathcal J}$, like its counterpart $\rho_{\mathcal J}$ in the standard
SK picture, is translation covariant, the resulting
ensemble of overlap distributions $P_{\Gamma}$ is independent
of $\mathcal J$. Because of the CSD present in this scenario,
the overlap distribution for $\rho^{(L)}_{\mathcal J}$ varies with $L$, no
matter how large $L$ becomes. So, instead of
averaging the overlap distribution over $\mathcal J$, the
averaging must now be done over the states $\Gamma$
within the metastate $\kappa_{\mathcal J}$, all at fixed $\mathcal J$:

$$P_{ns}(q) = \int P_{\Gamma}(q)\,\kappa_{\mathcal J}(\Gamma)\,d\Gamma \qquad [14]$$

The $P_{ns}(q)$ is the same for a.e. $\mathcal J$, and has a form
analogous to the $P_s(q)$ in the standard RSB picture.

However, the nonstandard RSB scenario seems 
rather unlikely to occur in any natural setting, 
because of the following result: 


Theorem (Newman and Stein 1998b). Consider
two metastates constructed along (the same) deterministic
sequence of $\Lambda_L$'s, using two different
sequences of flip-related, coupling-independent
boundary conditions (such as periodic and antiperiodic).
Then with probability one, these two
metastates are the same.


The proof is given by Newman and Stein (1998b),
but the essential idea can be easily described here.
As discussed earlier, $\kappa_{\mathcal J} = \kappa^\dagger_{\mathcal J}$, but $\kappa^\dagger_{\mathcal J}$ is constructed
by a limit of finite-dimensional distributions, which
means averaging over other couplings including the
ones near the system boundary, and hence gives the
same metastate for two flip-related boundary
conditions.

This invariance with respect to different sequences
of periodic and antiperiodic boundary conditions
means essentially that the frequency of appearance
of various thermodynamic states $\Gamma^{(L)}$ in finite
volumes $\Lambda_L$ is independent of the choice of
boundary conditions. Moreover, this same
invariance property holds among any two sequences
of fixed boundary conditions (and the fixed boundary
condition of choice may even be allowed to vary
arbitrarily along any single sequence of volumes)! It
follows that, with respect to changes of boundary
conditions, the metastate is extraordinarily robust.

This should rule out all but the simplest overlap 
structures, and in particular the nonstandard RSB 
and related pictures (for a full discussion, 
see Newman and Stein 1998b). It is therefore 
natural to ask whether the property of metastate 
invariance allows any many-state picture. 

There is one such picture, namely the “chaotic pairs” 
picture, which is fully consistent with metastate 
invariance (our belief is that it is the only many-state 
picture that fits naturally and easily into results 
obtained about the metastate.) 

Here the periodic boundary condition metastate is 
supported on infinitely many pairs of pure states, 
but instead of eqn [12] one has 


\Gamma = (1/2)\rho^{\alpha_\Gamma} + (1/2)\rho^{-\alpha_\Gamma}   [15] 


with overlap 

P_\Gamma = (1/2)\delta(q - q_{EA}) + (1/2)\delta(q + q_{EA})   [16] 


So there is CSD in the states but not in the overlaps, 
which have the same form as a two-state picture in 
every volume. The difference is that, while in the latter 
case, one has the “same” pair of states in every volume, 
in chaotic pairs the pure-state pair varies chaotically as 
volume changes. If the chaotic pairs picture is to be 
consistent with metastate invariance in a natural way, 
then the number of pure-state pairs should be 
“uncountable.” This allows for a “uniform” distribu- 
tion (within the metastate) over all of the pure states, 
and invariance of the metastate with respect to 
boundary conditions could follow naturally. 


Open Questions 


We have discussed how the metastate approach to the 
EA spin glass has narrowed considerably the set of 
possible scenarios for low-temperature ordering in any 
finite dimension, should broken spin-flip symmetry 
occur. The remaining possibilities are either a two-state 
scenario, such as droplet/scaling, or the chaotic-pairs 
picture if there exist many pure states at some (β, d). 
Both have simple overlap structures. The metastate 
approach appears to rule out more complicated 
scenarios such as RSB, in which the approximate 
pure-state decomposition in a typical large, finite 
volume is a nontrivial mixture of many pure-state pairs. 

Of course, this does not answer the question of 
which, if either, of the remaining pictures actually 


does occur in real spin glasses. In this section we list 
a number of open questions relevant to the above 
discussion. 


Open Question 1 Determine whether a phase 
transition occurs in any finite dimension greater 
than one. If it does, find the lower critical dimension. 

Existence of a phase transition does not necessarily 
imply two or more pure states below T_c. It could 
happen, for example, that in some dimension there 
exists a single pure state at all nonzero temperatures, 
with two-point spin correlations decaying 
exponentially above T_c and more slowly (e.g., as a 
power law) below T_c. This leads to: 


Open Question 2 If there does exist a phase 
transition above some lower critical dimension, 
determine whether the low-temperature spin-glass 
phase exhibits broken spin-flip symmetry. 

If broken symmetry does occur in some dimen- 
sion, then of course an obvious open question is to 
determine the number of pure-state pairs, and hence 
the nature of ordering at low temperature. A 
(possibly) easier question (but still very difficult), 
and one which does not rely on knowing whether a 
phase transition occurs, is to determine the zero- 
temperature — i.e., ground state — properties of spin 
glasses as a function of dimensionality. A ground 
state is an infinite-volume spin configuration whose 
energy (governed by eqn [1]) cannot be lowered by 
flipping any finite subset of spins. That is, all ground 
state spin configurations must satisfy the constraint 


\sum_{\langle x,y \rangle \in C} J_{xy}\, \sigma_x \sigma_y \ge 0   [17] 


along any closed loop C in the dual lattice. 
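
The constraint can be made concrete on a small finite lattice. The following is only an illustrative sketch, not part of the original discussion: it assumes the usual EA form H = -sum J_xy s_x s_y for eqn [1], uses a 3 x 3 lattice with free boundaries and Gaussian couplings, and replaces "closed loops in the dual lattice" by the boundaries of arbitrary proper subsets of sites. All function names are hypothetical.

```python
import itertools
import random


def grid_bonds(n):
    """Nearest-neighbour bonds of an n x n lattice with free boundaries."""
    bonds = []
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:
                bonds.append((i, i + 1))
            if r + 1 < n:
                bonds.append((i, i + n))
    return bonds


def energy(spins, couplings):
    """Assumed EA energy: H = -sum over bonds of J_xy * s_x * s_y."""
    return -sum(J * spins[x] * spins[y] for (x, y), J in couplings.items())


def brute_force_ground_state(num_sites, couplings):
    """Exhaustive search over all 2^num_sites spin configurations."""
    best = min(itertools.product((1, -1), repeat=num_sites),
               key=lambda s: energy(s, couplings))
    return list(best)


def boundary_sum(spins, couplings, subset):
    """Sum of J_xy s_x s_y over bonds with exactly one endpoint in subset."""
    return sum(J * spins[x] * spins[y]
               for (x, y), J in couplings.items()
               if (x in subset) != (y in subset))


def satisfies_constraint(spins, couplings, tol=1e-12):
    """True if no flip of a proper subset of sites lowers the energy."""
    sites = range(len(spins))
    for k in range(1, len(spins)):
        for subset in itertools.combinations(sites, k):
            if boundary_sum(spins, couplings, set(subset)) < -tol:
                return False
    return True


rng = random.Random(0)
couplings = {b: rng.gauss(0.0, 1.0) for b in grid_bonds(3)}
ground = brute_force_ground_state(9, couplings)
print(satisfies_constraint(ground, couplings))
```

Flipping a subset S changes the energy by twice the boundary sum, which is why a configuration minimizing the energy passes the check while a configuration obtained from it by a single spin flip does not.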


Open Question 3 How many ground state pairs is 
the T=0 periodic boundary condition metastate 
supported on, as a function of d? 

The answer is known to be one for 1D, and a partial 
result (Newman and Stein 2000, 2001a) points 
towards the answer being one for 2D as well. There 
are no rigorous, or even heuristic (except based on 
underlying “ansatze”) arguments in higher dimension. 

An interesting — but unrealistic — spin-glass model 
in which the ground state structure can be exactly 
solved (although not yet completely rigorously) was 
proposed by the authors (Newman and Stein 1994, 
1996a) (see also Banavar 1994). This “highly 
disordered” spin glass is one in which the coupling 
magnitudes scale nonlinearly with the volume (and so 
are no longer distributed independently of the 
volume, although they remain independent and 
identically distributed for each volume). The model 
displays a transition in ground state multiplicity: 
below eight dimensions, it has only a single pair of 
ground states, while above eight it has uncountably 
many such pairs. The mechanism behind the transi- 
tion arises from a mapping to invasion percolation 
and minimal spanning trees (Lenormand and Bories 
1980, Chandler et al. 1982, Wilkinson and Willemsen 
1983): the number of ground state pairs can be 
shown to equal 2^N, where N = N(d) is the number of 
distinct global components in the “minimal spanning 
forest.” The zero-temperature free boundary condi- 
tion metastate above eight dimensions is supported 
on a uniform distribution (in a natural sense) on 
uncountably many ground state pairs. 
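
The link to minimal spanning trees can be illustrated on a finite graph. The sketch below is a hedged illustration rather than the authors' construction: it assumes that "highly disordered" can be modelled by coupling magnitudes so widely separated that each exceeds the sum of all smaller ones (powers of 3 here), in which case a greedy, Kruskal-like procedure that satisfies bonds in order of decreasing magnitude, skipping any bond that would close a loop, yields the ground state. The energy convention H = -sum J_xy s_x s_y and all names are assumptions.

```python
import itertools
import random


def grid_bonds(n):
    """Nearest-neighbour bonds of an n x n lattice with free boundaries."""
    bonds = []
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:
                bonds.append((i, i + 1))
            if r + 1 < n:
                bonds.append((i, i + n))
    return bonds


def energy(spins, couplings):
    """Assumed EA energy: H = -sum over bonds of J_xy * s_x * s_y."""
    return -sum(J * spins[x] * spins[y] for (x, y), J in couplings.items())


def greedy_ground_state(num_sites, couplings):
    """Satisfy bonds from largest |J| down, skipping bonds that close a loop.

    Uses a union-find forest in which every site stores its spin sign
    relative to its parent. Assumes the graph is connected.
    """
    parent = list(range(num_sites))
    rel = [1] * num_sites

    def find(x):
        sign = 1
        while parent[x] != x:
            sign *= rel[x]
            x = parent[x]
        return x, sign  # root and sign of x relative to the root

    for (x, y), J in sorted(couplings.items(), key=lambda kv: -abs(kv[1])):
        rx, sx = find(x)
        ry, sy = find(y)
        if rx == ry:
            continue  # relative orientation already fixed by stronger bonds
        want = 1 if J > 0 else -1  # a satisfied bond has s_x * s_y = sign(J)
        parent[ry] = rx
        rel[ry] = want * sx * sy
    spins = [find(i)[1] for i in range(num_sites)]
    return [s * spins[0] for s in spins]  # fix the global flip: spins[0] = +1


rng = random.Random(1)
bonds = grid_bonds(3)
magnitudes = [3.0 ** k for k in range(len(bonds))]
rng.shuffle(magnitudes)
couplings = {b: m * rng.choice((1, -1)) for b, m in zip(bonds, magnitudes)}

greedy = greedy_ground_state(9, couplings)
brute = min(itertools.product((1, -1), repeat=9),
            key=lambda s: energy(s, couplings))
brute = [s * brute[0] for s in brute]
print(greedy == brute)
```

The accepted bonds form a spanning tree of the finite graph, so the spin configuration is fixed up to a global flip; on an infinite lattice the same procedure can produce several disconnected trees, which is where the count 2^N of ground state pairs comes from.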

Interestingly, the high-dimensional ground state 
multiplicity in this model can be shown to be 
unaffected by the presence of frustration, although 
frustration still plays an interesting role: it leads to 
the appearance of chaotic size dependence when free 
boundary conditions are used. 

Returning to the more difficult problem of ground 
state multiplicity in the EA model, we note as a final 
remark that there could, in principle, exist ground 
state pairs that are not in the support of metastates 
generated through the use of coupling-independent 
boundary conditions. If such states exist, they may 
be of some interest mathematically, but are not 
expected to play any significant physical role. A 
discussion of these putative “invisible states” is 
given by Newman and Stein (2003). 


Open Question 4 If there exists broken spin-flip 
symmetry at a range of positive temperatures in 
some dimensions, then what is the number of pure- 
state pairs as a function of (β, d)? 

Again, the answer to this is not known above one 
dimension; indeed, the prerequisite existence of 
spontaneously broken spin-flip symmetry has not 
been proved in any dimension. A speculative paper 
by the authors (Newman and Stein 2001b), using a 
variant of the highly disordered model, suggests that 
there is at most one pair of pure states in the EA 
model below eight dimensions; but no rigorous 
arguments are known at this time. 


Introduction 

The sine-Gordon equation 

\phi_{xx} - \phi_{tt} = \sin\phi   [1] 

may be viewed as a prototype for a nonlinear 
integrable field theory. It is manifestly invariant 
under spacetime translations and Lorentz boosts, 

(x,t) \mapsto (x\cosh\theta - t\sinh\theta,\ t\cosh\theta - x\sinh\theta), \quad \theta \in \mathbb{R}   [2] 

It shares this relativistic invariance property with the 
linear Klein-Gordon equation, which is obtained 
upon replacing sin φ by φ. (The name sine-Gordon 
equation is derived from this relation, and was 
introduced by Kruskal.) The sine-Gordon equation 
can also be defined and studied in the form 





\partial_u \partial_v \phi = \sin\phi   [3] 
where 
u = (x+t)/2, \quad v = (x-t)/2   [4] 


are the so-called light-cone variables. 

There are two interpretations of the field φ(t,x) 
that are quite different, both from a physical and 
from a mathematical viewpoint. The first one 
consists in viewing it as a real-valued function, so 
that [1] is simply a nonlinear PDE in two variables. 
In the second version, one views φ(t,x) as an 
operator-valued distribution on a Hilbert space. 
(Thus, one should smear φ(t,x) with a test function 
f(t,x) in Schwartz space to obtain a genuine 
operator on the Hilbert space.) In spite of their 
different character, the classical and quantum field 
theory versions have several striking features in 
common, including the presence of an infinite 
number of conservation laws and the occurrence of 
solitonic excitations. 

The classical sine-Gordon equation has been used 
as a model for various wave phenomena, including 
the propagation of dislocations in crystals, phase 
differences across Josephson junctions, torsion 
waves in strings and pendula, and waves along 
lipid membranes. It was already studied in the 
nineteenth century in connection with the theory of 
pseudospherical surfaces. The quantum version is 
used as a simple model for solid-state excitations. 

The designation “sine-Gordon” is also used for 
various equations that generalize [1] or bear 
resemblance to it. These include the so-called 
homogeneous and symmetric space sine-Gordon 
models, discrete and supersymmetric versions, and 
generalizations to higher-dimensional spacetimes 
(i.e., in [1] the spatial derivative is replaced by the 
Laplace operator in several variables). In this 
contribution we focus on [1], however. 

Our main goal is to discuss the integrability and 
solitonic properties, both at the classical and at the 
quantum level. First, we sketch the inverse-scattering 
transform (IST) solution to the Cauchy problem for 
[1]. Following Faddeev and Takhtajan, we emphasize 
the interpretation of the IST as an action-angle 
transformation for an infinite-dimensional Hamilto- 
nian system. Next, the particle-like solutions are 
surveyed by using a description in terms of variables 
that may be viewed as relativistic action-angle 
coordinates. This is followed by a section on the 
quantum field theory version, paying special atten- 
tion to the factorized scattering that is the quantum 
analog of the solitonic classical scattering. Finally, we 
sketch the intimate relation between the N-particle 
subspaces of the classical and quantum sine-Gordon 
field theory and certain integrable relativistic systems 
of N point particles on the line. 


The Classical Version: An Integrable 
Hamiltonian System 


In order to tie in the hyperbolic evolution equation 
[1] with the notion of infinite-dimensional integrable 
system, it is necessary to restrict attention to initial 
data φ(0,x) = φ(x) and ∂_tφ(0,x) = π(x) with special 
properties. First of all, the energy functional 


H = \int_{\mathbb{R}} \left( \tfrac{1}{2}\pi^2 + \tfrac{1}{2}(\partial_x\phi)^2 + (1 - \cos\phi) \right) dx   [5] 


and symplectic form 


\omega = \int_{\mathbb{R}} d\pi(x) \wedge d\phi(x)\, dx   [6] 


should be well defined on the phase space of initial 

data. Indeed, in that case [1] amounts to the 

Hamilton equation associated to [5] via [6]. 
Second, there exists a sequence of functionals 


I_{2l+1}(\phi,\pi), \quad l \in \mathbb{Z}   [7] 


that formally Poisson-commute with H and among 
themselves. 

In particular, H equals 2(I_1 + I_{-1}), whereas 
2(I_{-1} - I_1) equals the momentum functional 


P = -\int_{-\infty}^{\infty} \pi\, \partial_x\phi\, dx   [8] 

The functional I_{2l+1} contains x-derivatives of order 
up to |2l+1|, so one needs to require that the 
functions ∂_xφ(x) and π(x) be smooth and that all of 
their derivatives have sufficient decrease for 
x → ±∞. 

A natural choice guaranteeing the latter require- 
ments is 


\partial_x\phi(x),\ \pi(x) \in S_{\mathbb{R}}(\mathbb{R})   [9] 


where S_R(R) denotes the Schwartz space of 
real-valued functions on the line. To render the integral 
over 1 − cos φ(x) (and similar integrals occurring for 
the sequence [7]) finite, one also needs to require 


\phi(x) \to 2\pi k_{\pm}, \quad x \to \pm\infty, \quad k_{\pm} \in \mathbb{Z}   [10] 


On this phase space Ω of initial data, the Cauchy 
problem for the evolution equation [1] is not only 
well posed, but can be solved in explicit form by 
using the IST. More generally, the Hamiltonians 
I_{2l+1} give rise to evolution equations that are 
simultaneously solved via the IST, yielding an 
infinite sequence of commuting Hamiltonian flows 
on Ω. 

Before sketching the overall picture resulting from 
the IST, it should be mentioned at this point that [1] 
admits explicit solutions of interest that do not 
belong to Ω. First, there is a class of 
algebro-geometric solutions that have no limits as 
x → ±∞. These solutions can be obtained via finite-gap 
integration methods, yielding formulas involving 
the Riemann theta functions associated to compact 
Riemann surfaces. Second, there are the tachyon 
solutions. They arise from the particle-like solutions 
that do belong to Ω by the transformation 


\phi(t,x) \to \phi(x,t) + \pi   [11] 


(Observe that the equation of motion [1] is invariant 
under [11], whereas due to the finite-energy 
requirement [10] this is not true for solutions 
evolving in Ω.) 

The IST via which the above Cauchy problem can 
be solved starts from an auxiliary system of two 
linear ordinary differential equations involving 
φ(0,x) and ∂_tφ(0,x). It is beyond our scope to 
describe the system in detail. The results derived 
from it, however, are to a large extent the same as 
those obtained via a simpler auxiliary linear opera- 
tor that is associated to the light-cone Cauchy 
problem. The latter operator is of the Ablowitz— 
Kaup—Newell-Segur (AKNS) form. That is, the 
linear operator is an ordinary differential operator 
of Dirac type given by 


L = \begin{pmatrix} i\,\frac{d}{du} & -i\,q(u) \\ i\,r(u) & -i\,\frac{d}{du} \end{pmatrix}   [12] 


where the external potentials r(u) and q(u) depend 
on the evolution equation at hand. For the 
light-cone sine-Gordon equation [3], one needs to choose 


r = -q = (\partial_u\phi)(u,0)/2   [13] 


In both settings, the associated spectral features 
are invariant under the sine-Gordon evolution and 
all of the evolutions generated by the Hamiltonians 
I_{2l+1}, yielding the so-called isospectral flows. More 
specifically, if the initial data give rise to bound-state 
solutions of the linear problem (square-integrable 
wave functions), then the corresponding eigenvalues 
are time independent. Furthermore, due to the decay 
requirements on the potential in the linear system, 
there exist scattering solutions with plane-wave 
asymptotics for all initial data in Ω. A suitable 
normalization leads to the so-called Jost solutions 
Ψ(x, λ). (Here λ is the spectral parameter, which 
varies over the real line for scattering solutions.) 
Their x → ±∞ asymptotics is encoded in transition 
coefficients a(λ) and b(λ), with a(λ) and |b(λ)| being 
time independent, whereas arg b(λ) has a linear 
dependence on time when the potential evolves 
according to the sine-Gordon equation. The bound 
states correspond to special λ-values λ_1,…,λ_N with 
positive imaginary part (namely the zeros of the 
coefficient a(λ), which is analytic in the upper-half 
λ-plane); their normalization coefficients ν_1,…,ν_N 
have an essentially linear time evolution, just 
as b(λ). 


The crux of the IST is now that the potentials can 
be reconstructed from the spectral data 


\{ b(\lambda), \lambda_1,\ldots,\lambda_N, \nu_1,\ldots,\nu_N \}   [14] 


by solving a linear integral equation of Gelfand- 
Levitan-Marchenko (GLM) type. (Alternatively, 
Riemann-Hilbert problem techniques can be used.) 
Hence, the nonlinear Cauchy problem can be 
replaced by the far simpler linear problems of 
determining the spectral data [14] of a linear 
operator (the direct problem) and then solving the 
linear GLM equation for the time-evolved scattering 
data (the inverse problem). 

From the Hamiltonian perspective, the IST may 
be reinterpreted as a transformation to action-angle 
variables. The action variables are defined in terms 
of |b(λ)| and λ_1,…,λ_N. They are time independent 
under the sine-Gordon and higher Hamiltonian 
flows. The angle variables are arg b(λ) and suitable 
functions of the normalization coefficients. They 
depend linearly on the evolution times of the flows. 
The Hamiltonians can be explicitly expressed in 
action variables. 

Next, we point out that there is a large subspace 
of Cauchy data (φ(x), π(x)) that do not give rise to 
bound states in the auxiliary linear problem. The 
associated solutions are the so-called radiation 
solutions: they decrease to 0 for large times. These 
solutions can be obtained from the inverse transform 
involving the GLM equation by only taking b(λ) 
into account. 

The other extreme is to choose b(λ) = 0 and 
arbitrary bound states and normalization 
coefficients in the GLM equation. This special case of 
vanishing reflection leads to the particle-like 
solutions that are studied in the next section. For 
general Cauchy data, one has both b(λ) ≠ 0 and a finite 
number of bound states. These so-called mixed 
solutions have a radiation component (encoded in 
b(λ)) which decays for asymptotic times, whereas 
the bound states show up for t → ±∞ as isolated 
solitons, antisolitons, and breathers. 


Classical Solitons, Antisolitons, 
and Breathers 


Just as for other classical soliton equations, the case 
of reflectionless data can be handled in complete 
detail, since the GLM equation reduces to an N x N 
system of linear equations. The case N = 1 yields the 
1-soliton and 1-antisoliton solutions. Resting at the 
origin, these one-particle solutions are given by 


\pm 4\arctan(e^{-x})   [15] 


and have energy 8 (cf. [5]). (We normalize all 
solutions by requiring 


\lim_{x\to\infty} \phi(t,x) = 0   [16] 


Note that one can add arbitrary multiples of 2π 
without changing the energy H [5].) A spatial 
translation and Lorentz boost then yields the general 
solutions 


\phi_{\pm}(t,x) = \pm 4\arctan(\exp(q - x\cosh\theta + t\sinh\theta))   [17] 


with energy 8 cosh θ and momentum 8 sinh θ (cf. [8]). 
Defining the topological charge of a solution 
(with normalization [16]) by 


Q = \frac{1}{2\pi} \lim_{x\to-\infty} \phi(t,x)   [18] 

the different charges Q = 1 and Q = −1 of the 
soliton and antisoliton reflect a signature associated 
to the special value of the spectral parameter on the 
imaginary axis for which a bound state in the linear 
problem occurs. More generally, for bound-state 
eigenvalues on the imaginary axis these signatures 
must be specified in the IST setting, a point glossed 
over in the previous section. 
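
The stated values for the one-particle solutions can be checked numerically. The sketch below is illustrative only and rests on assumptions that are not spelled out in this form in the text: the equation of motion is taken as phi_xx - phi_tt = sin(phi), the energy density as pi^2/2 + phi_x^2/2 + (1 - cos phi), and the momentum as minus the integral of pi * phi_x, with the sign chosen so that the boosted soliton comes out with momentum 8 sinh(theta). Derivatives are approximated by central differences and integrals by the trapezoid rule; all names are hypothetical.

```python
import math


def soliton(t, x, theta=0.0, q=0.0, sign=1):
    """Boosted (anti)soliton: sign * 4 arctan(exp(q - x cosh(theta) + t sinh(theta)))."""
    return sign * 4.0 * math.atan(
        math.exp(q - x * math.cosh(theta) + t * math.sinh(theta)))


def pde_residual(phi, t, x, h=1e-3):
    """Finite-difference value of phi_xx - phi_tt - sin(phi)."""
    phi_xx = (phi(t, x + h) - 2.0 * phi(t, x) + phi(t, x - h)) / h ** 2
    phi_tt = (phi(t + h, x) - 2.0 * phi(t, x) + phi(t - h, x)) / h ** 2
    return phi_xx - phi_tt - math.sin(phi(t, x))


def energy_momentum(phi, t=0.0, half_width=40.0, n=8000, h=1e-4):
    """Trapezoid-rule estimates of the energy and momentum at time t."""
    dx = 2.0 * half_width / n
    energy = momentum = 0.0
    for k in range(n + 1):
        x = -half_width + k * dx
        weight = 0.5 if k in (0, n) else 1.0
        pi = (phi(t + h, x) - phi(t - h, x)) / (2.0 * h)      # time derivative
        phi_x = (phi(t, x + h) - phi(t, x - h)) / (2.0 * h)   # space derivative
        density = 0.5 * pi ** 2 + 0.5 * phi_x ** 2 + 1.0 - math.cos(phi(t, x))
        energy += weight * density * dx
        momentum += weight * (-pi * phi_x) * dx
    return energy, momentum


def topological_charge(phi, t=0.0, far=40.0):
    """phi(t, x) / (2 pi) evaluated far to the left, assuming phi -> 0 on the right."""
    return phi(t, -far) / (2.0 * math.pi)


theta = 0.5
moving = lambda t, x: soliton(t, x, theta=theta, q=0.3)
E, P = energy_momentum(moving)
print(E, 8.0 * math.cosh(theta))
print(P, 8.0 * math.sinh(theta))
print(topological_charge(moving))
```

Passing `sign=-1` gives the antisoliton, for which the energy is unchanged and the charge changes sign.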

Bound states in the linear problem can also arise 
from A-values off the imaginary axis, which come in 
pairs ia ± b, with a,b > 0. Such pairs give rise to 
solutions containing breathers, which can be viewed as 
bound states of a soliton and an antisoliton. The one- 
breather solution breathing at the origin is given by 


4\arctan\left( \cot\eta\, \frac{\sin(t\sin\eta)}{\cosh(x\cos\eta)} \right), \quad \eta \in (0,\pi/2)   [19] 

and has energy 16 cos η. A spacetime translation and 
Lorentz boost then yields the general solution 


\phi_b(t,x) = 4\arctan\left( \cot\eta\, \frac{\sin[-\gamma/2 + \sin\eta\,(t\cosh\theta - x\sinh\theta)]}{\cosh[y/2 - \cos\eta\,(x\cosh\theta - t\sinh\theta)]} \right)   [20] 


which has energy 16 cosh θ cos η and momentum 
16 sinh θ cos η. It may be obtained by analytic 
continuation from the solution describing a collision 
between a soliton with velocity tanh θ_1 and an 
antisoliton with velocity tanh θ_2, taking θ_2 < θ_1. 
The latter is given by 


\phi_{+-}(t,x) = 4\arctan\left( \coth((\theta_1 - \theta_2)/2)\, \frac{\sinh((\mu_1 - \mu_2)/2)}{\cosh((\mu_1 + \mu_2)/2)} \right), \quad \theta_2 < \theta_1   [21] 

where 


\mu_j = q_j - x\cosh\theta_j + t\sinh\theta_j, \quad q_j \in \mathbb{R}   [22] 


and φ_b results from φ_{+−} by substituting 

\theta_{1,2} \to \theta \pm i\eta, \quad q_{1,2} \to (y \mp i\gamma)/2   [23] 


(For the case θ_1 < θ_2, one needs an extra minus sign 
on the right-hand side of [21].) 
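
The analytic continuation from the soliton-antisoliton solution to the breather can be checked with complex arithmetic. This is a sketch under stated assumptions: the soliton-antisoliton solution is taken as 4 arctan(coth((theta_1 - theta_2)/2) sinh((mu_1 - mu_2)/2) / cosh((mu_1 + mu_2)/2)) with mu_j = q_j - x cosh(theta_j) + t sinh(theta_j), the substitution as theta_{1,2} = theta ± i eta and q_{1,2} = (y ∓ i gamma)/2, and the breather in the form with parameters (theta, eta, y, gamma). The function names are hypothetical.

```python
import cmath
import math


def mu(q, theta, t, x):
    """mu = q - x cosh(theta) + t sinh(theta); q and theta may be complex."""
    return q - x * cmath.cosh(theta) + t * cmath.sinh(theta)


def soliton_antisoliton(t, x, theta1, theta2, q1, q2):
    """Assumed form of the soliton-antisoliton solution, complex parameters allowed."""
    m1, m2 = mu(q1, theta1, t, x), mu(q2, theta2, t, x)
    coth = 1.0 / cmath.tanh((theta1 - theta2) / 2.0)
    return 4.0 * cmath.atan(
        coth * cmath.sinh((m1 - m2) / 2.0) / cmath.cosh((m1 + m2) / 2.0))


def continued_soliton_antisoliton(t, x, theta, eta, y, gamma):
    """Insert theta_{1,2} = theta +/- i eta and q_{1,2} = (y -/+ i gamma)/2."""
    return soliton_antisoliton(
        t, x,
        theta1=theta + 1j * eta, theta2=theta - 1j * eta,
        q1=(y - 1j * gamma) / 2.0, q2=(y + 1j * gamma) / 2.0)


def breather(t, x, theta, eta, y, gamma):
    """Real breather with rapidity theta and internal parameter eta in (0, pi/2)."""
    num = math.sin(-gamma / 2.0
                   + math.sin(eta) * (t * math.cosh(theta) - x * math.sinh(theta)))
    den = math.cosh(y / 2.0
                    - math.cos(eta) * (x * math.cosh(theta) - t * math.sinh(theta)))
    return 4.0 * math.atan(num / (math.tan(eta) * den))


params = dict(theta=0.3, eta=0.7, y=0.4, gamma=0.5)
for t, x in [(0.0, 0.0), (1.3, -0.7), (-2.0, 1.1)]:
    z = continued_soliton_antisoliton(t, x, **params)
    print(z.real, breather(t, x, **params), abs(z.imag))
```

The continued expression stays real up to rounding because coth(i eta) = -i cot(eta) and sinh of an imaginary argument is i times a sine, so the two factors of i cancel.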

There is yet another possibility for an eigenvalue 
on the imaginary axis we have not mentioned thus 
far: it may have an arbitrary multiplicity, giving rise 
to the so-called multipole solutions. This is 
illustrated by the breather solution φ_b: when one sets 
γ = −2q_0η and lets η tend to 0, one obtains a 
solution 


\phi_{sep}(t,x) = 4\arctan\left( \frac{q_0 + t\cosh\theta - x\sinh\theta}{\cosh[y/2 - x\cosh\theta + t\sinh\theta]} \right)   [24] 


From a physical viewpoint, the soliton and 
antisoliton have just enough energy to prevent a bound 
state from being formed. Notice that in this case the 
distance between soliton and antisoliton diverges 
logarithmically in |t| as t → ±∞, whereas for φ_{+−} 
one obtains linear increase. 

The 2-soliton and 2-antisoliton solutions can also be 
obtained by analytic continuation of φ_{+−}. They read 


\phi_{\pm\pm}(t,x) = \mp 4\arctan\left( \coth((\theta_1 - \theta_2)/2)\, \frac{\cosh((\mu_1 - \mu_2)/2)}{\sinh((\mu_1 + \mu_2)/2)} \right), \quad \theta_2 < \theta_1   [25] 


where μ_j is given by [22]. Thus, they arise by 
taking q_2 → q_2 + iπ and q_1 → q_1 + iπ in [21], resp. 
The equal-signature eigenvalues corresponding to 
these two solutions cannot collide and move off 
the imaginary axis; physically speaking, 
equal-charge particles repel each other. The energy and 
momentum of the solutions [25] and [21] are given 
by 8 cosh θ_1 + 8 cosh θ_2 and 8 sinh θ_1 + 8 sinh θ_2, 
respectively. 

Up to scale factors, the above variables θ_1, θ_2 and 
θ, η are the action variables resulting from the IST, 
whereas q_1, q_2 and y, γ are the canonically 
conjugated angle variables. Accordingly, the time and 
space translation flows (generated by H [5] and P 
[8], resp.) shift the angles linearly in the evolution 
parameters t and x. 

We conclude this section with a description of the 
N-soliton solution and its large time asymptotics. It 
can be expressed in terms of the N x N matrix 


L_{jk} = \exp(\mu_j) \prod_{l \ne j} |\coth((\theta_j - \theta_l)/2)| \cdot \frac{1}{\cosh((\theta_j - \theta_k)/2)}   [26] 

where μ_j is given by [22] with 


q_1,\ldots,q_N \in \mathbb{R}, \quad \theta_N < \cdots < \theta_1   [27] 


Specifically, one has 


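\phi_{N+}(t,x) = 4\,\mathrm{Tr}\arctan(L) 
  = -2i \ln\big( |1_N + iL| / |1_N - iL| \big) 
  = -2i \ln\left( \frac{1 + \sum_{l=1}^{N} i^l S_l}{1 + \sum_{l=1}^{N} (-i)^l S_l} \right)   [28] 


where S_l is the l-th symmetric function of L. Using 
Cauchy's identity, one obtains the explicit formula 


S_l = \sum_{I \subset \{1,\ldots,N\},\ |I| = l} \exp\Big( \sum_{j \in I} \mu_j \Big) \prod_{j \in I,\ k \notin I} |\coth((\theta_j - \theta_k)/2)|   [29] 
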

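In order to specify the t → ±∞ asymptotics of φ_{N+}, 
we introduce the 1-soliton solutions 


\phi_j^{\pm}(t,x) = 4\arctan(\exp(\mu_j \pm \Delta_j/2))   [30] 

where 

\Delta_j = \Big( -\sum_{k<j} + \sum_{k>j} \Big)\, \delta(\theta_j - \theta_k)   [31] 

\delta(\theta) = \ln(\coth(\theta/2)^2)   [32] 


Then, one has 


\sup_{x \in \mathbb{R}} \Big| \phi_{N+}(t,x) - \sum_{j=1}^{N} \phi_j^{\pm}(t,x) \Big| = O(\exp(-|t| r)), \quad t \to \pm\infty   [33] 

where the decay rate is given by 

r = \min_{j \ne k} \big( \cosh(\theta_j)\, |\tanh\theta_j - \tanh\theta_k| \big)   [34] 

Thus, the soliton profile with velocity tanh θ_j incurs 
a shift Δ_j/cosh θ_j as a result of the collision. The 
factor 1/cosh θ_j may be viewed as a Lorentz 
contraction factor. 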
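
For N = 2 the asymptotic decoupling into shifted one-soliton profiles can be observed numerically. The sketch below makes several assumptions that should be read as such: the symmetric functions are taken to be S_1 = coth(|theta_1 - theta_2|/2)(e^{mu_1} + e^{mu_2}) and S_2 = e^{mu_1 + mu_2}, the solution is evaluated as 4 arg(1 + i S_1 - S_2), which is a branch choice valid for N = 2 only, and theta_2 < theta_1. The shift of each profile is written directly as ± ln coth(|theta_1 - theta_2|/2) rather than through Δ_j, so no sign convention for Δ_j is presupposed. Names are hypothetical.

```python
import cmath
import math


def mu(q, theta, t, x):
    """mu = q - x cosh(theta) + t sinh(theta)."""
    return q - x * math.cosh(theta) + t * math.sinh(theta)


def coth_half(theta1, theta2):
    """|coth((theta1 - theta2)/2)|."""
    return abs(1.0 / math.tanh((theta1 - theta2) / 2.0))


def two_soliton(t, x, theta1, theta2, q1=0.0, q2=0.0):
    """2-soliton field as 4 * arg(det(1 + iL)) with det = 1 + i S_1 - S_2."""
    e1 = math.exp(mu(q1, theta1, t, x))
    e2 = math.exp(mu(q2, theta2, t, x))
    s1 = coth_half(theta1, theta2) * (e1 + e2)
    s2 = e1 * e2
    return 4.0 * cmath.phase(complex(1.0 - s2, s1))


def asymptotic_sum(t, x, theta1, theta2, q1=0.0, q2=0.0):
    """Sum of two shifted 1-soliton profiles, assuming theta2 < theta1.

    For t > 0 the faster soliton (index 1) is shifted forward and the slower
    one backward; for t < 0 the shifts are reversed.
    """
    half_shift = math.log(coth_half(theta1, theta2))
    s = 1.0 if t > 0 else -1.0
    first = 4.0 * math.atan(math.exp(mu(q1, theta1, t, x) + s * half_shift))
    second = 4.0 * math.atan(math.exp(mu(q2, theta2, t, x) - s * half_shift))
    return first + second


theta1, theta2 = 0.5, -0.5
for t in (-30.0, 30.0):
    worst = max(abs(two_soliton(t, x, theta1, theta2)
                    - asymptotic_sum(t, x, theta1, theta2))
                for x in range(-40, 41))
    print(t, worst)
```

The total displacement of the faster soliton between t → −∞ and t → +∞ is 2 ln coth(|theta_1 − theta_2|/2) divided by cosh(theta_1), in line with the Lorentz contraction factor mentioned above.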


The Quantum Version: A Soliton 
Quantum Field Theory 


From a perturbation-theoretic viewpoint, the quan- 
tum sine-Gordon Hamiltonian is given by 


H = \int :\left( \tfrac{1}{2}(\partial_t\phi)^2 + \tfrac{1}{2}(\partial_x\phi)^2 + \frac{m^2}{\beta^2}(1 - \cos\beta\phi) \right): dx, \quad m, \beta > 0   [35] 


Here, φ(0,x) is a neutral Klein-Gordon field with 
mass m and the double dots denote a suitable 
ordering prescription. The associated equation of 
motion 


\phi_{xx} - \phi_{tt} = \frac{m^2}{\beta} \sin\beta\phi   [36] 


is equivalent to [1] on the classical level, but not on 
the quantum level. (If φ(t,x) is a classical solution to 
[36], then βφ(t/m,x/m) solves [1].) This difference 
is due to the extremely singular character of 
interacting relativistic quantum field theory, a 
context in which “solving” the field theory has 
slowly acquired a meaning that is vastly different 
from the classical notion. Indeed, one can at best 
hope to verify [36] in the sense of expectation values 
in suitable quantum states, and this is precisely what 
has been achieved within the form-factor program 
sketched later on. 

From the perspective of functional analysis, the 
existence of a well-defined Wightman field theory with 
all of the features mentioned below is wide open. More 
precisely, beginning with pioneering work by Fröhlich 
some 30 years ago, various authors have contributed 
to a mathematically rigorous construction of a sine- 
Gordon quantum field theory version, but to date it 
seems not feasible to verify that the resulting Wight- 
man field theory has any of the explicit features we are 
going to sketch. (For example, not even the free 
character of the field theory for β² = 4π has been 
established; cf. below.) 

That said, we proceed to sketch some highlights 
of the impressive, but partly heuristic lore that has 
been assembled in a great many theoretical physics 
papers. A key result we begin with is the equivalence 
to a field theory that looks very different at face 
value. This is the massive Thirring model, formally 
given by the Hamiltonian 


Hr= | (w(- id, + PM)U +5 =i is rt) : dx 
M € (0,œ), gER [37] 


Here, Ψ(0,x) is the charged Dirac field with mass M 
and the double dots stand for normal ordering. For 
the γ-algebra, one may choose 


-(° 1 
14 © 


and J, is the Dirac current, 


h=V A= 


The equivalence argument (due to Coleman) consists 
in showing that the quantities 


Oo ay m° 
z 57 Em Q, A : cos Bo : [40] 
in the sine-Gordon theory have the same vacuum 
expectation values (in perturbation theory) as the 


massive Thirring quantities 


ores -M : %*y Y: [41] 
resp., provided the parameters are related by 

\frac{4\pi}{\beta^2} = 1 + \frac{g}{\pi}   [42] 


This yields an equivalence between the charge-0 
sector of the massive Thirring model and the sector 
of the sine-Gordon theory obtained by the action of 
the fields [40] on the vacuum vector. But the 
charged sectors of the Thirring model can also be 
viewed as new sectors in the sine-Gordon theory, 
obtained by a solitonic field construction (first 
performed by Mandelstam). 

In this picture, the fermions and antifermions in 
the massive Thirring model correspond to new 
excitations in the sine-Gordon theory, the quantum 
solitons and antisolitons. The latter are viewed as 
coherent states of the sine-Gordon “mesons” in the 
vacuum sector, the rest masses being related by 

M \sim 8m/\beta^2   [43] 

in the semiclassical limit β² → 0. 

Even at the formal level involved in the corre- 
spondence, the theories are not believed to exist for 
β² > 8π and g < −π/2, since there is positivity 
breakdown for this range of couplings. The free 
Dirac case g = 0 corresponds to β² = 4π. In 
particular, there is no interaction between the 
sine-Gordon solitons and antisolitons for this β-value. 
In the range β² ∈ (4π, 8π) there is interaction, but 
bound soliton-antisoliton pairs (quantum breathers, 
alias sine-Gordon mesons) do not occur. 

By contrast, for β² < 4π there exist breathers with 
rest masses 


m_{n+1} = 2M \sin((n+1)\alpha), \quad \alpha = m/2M, \quad n+1 = 1,2,\ldots,L < \pi/2\alpha   [44] 


Thus, the "particle spectrum" consists of solitons 
and antisolitons with mass M and mesons C_1,…,C_L 
with masses m_1,…,m_L given by [44]. The latter 
formula was first established by semiclassical 
quantization of the classical breathers 
(Dashen-Hasslacher-Neveu), and ever since is usually called 
the DHN formula. Notice that for α near zero m_1 and 
m are nearly equal, and that for β² > 4π there are no 
longer any sine-Gordon mesons present in the theory. 
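
The parameter relations above lend themselves to a small calculator. This is a sketch under the assumptions that the coupling relation reads 4π/β² = 1 + g/π and that the breather masses are 2M sin(nα) for every integer n ≥ 1 with n < π/(2α); the soliton mass M and the angle α are taken as inputs, since no closed formula for α in terms of β is used here. Function names are hypothetical.

```python
import math


def thirring_coupling(beta_sq):
    """Thirring coupling g corresponding to beta^2 via 4*pi/beta^2 = 1 + g/pi."""
    return math.pi * (4.0 * math.pi / beta_sq - 1.0)


def breather_masses(soliton_mass, alpha):
    """Masses 2 M sin(n alpha) for n = 1, 2, ... with n < pi / (2 alpha)."""
    masses = []
    n = 1
    while n * alpha < math.pi / 2.0:
        masses.append(2.0 * soliton_mass * math.sin(n * alpha))
        n += 1
    return masses


print(thirring_coupling(4.0 * math.pi))   # free Dirac point
print(thirring_coupling(8.0 * math.pi))   # edge of the allowed range
print(breather_masses(1.0, 0.5))          # three breathers for alpha = 0.5
print(breather_masses(1.0, math.pi / 2))  # no breathers at alpha = pi/2
```

All breather masses lie below the threshold 2M, consistent with their interpretation as soliton-antisoliton bound states.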

A priori, the existence of infinitely many classical 
conserved Hamiltonians does not even formally 
imply the same feature for the quantum field theory, 
as anomalies may occur. For the sine-Gordon and 
massive Thirring cases, anomalies have been shown 
to be absent, however. This entails not only that the 
number of solitons, antisolitons, and breathers in a 
scattering process is conserved, but also that the set 
of incoming rapidities equals the set of outgoing 
rapidities. 

The latter stability features and the DHN formula 
[44] are corroborated by the S-matrix, which is 
known in complete detail. The two-body amplitudes 
involving solitons and antisolitons can be written in 
terms of the function 


u(z) 
= exp (i [ cata =e sin 2xz) [45] 


x sinh ax cosh 1x /2 


They are given by 


CER TEE E (7) 
7 sinh(70/2a) 
= u(0/2) (nope aay 


isin(1 /2a) 
sinh (r(ir — 6)/2a)’ 1) ad 


where 0 denotes the rapidity difference. (Due to 
fermion statistics, one gets only one amplitude for a 
soliton or antisoliton pair. But a soliton and an 
antisoliton have opposite charge, so they can be 
distinguished. In that case, therefore, the notion of 
reflection and transmission coefficients makes sense.) 

The S-matrix involving an arbitrary number of 
solitons, antisolitons, and their bound states is also 
explicitly known. The amplitudes involving no 
breathers are readily described in terms of the above 
two-body amplitudes. Indeed, the S-matrix factorizes 
as a sum of products of the amplitudes [46], yielding a 
picture of particles scattering independently in pairs, 
just as at the classical level. The factorization can be 
performed irrespective of the temporal ordering 
assumed for the pair scattering processes, since the 
four functions occurring inside the parentheses of 
[46] satisfy the Yang—Baxter equations. 

Roughly speaking, the S-matrix for processes invol- 
ving breathers can be calculated by analytic continua- 
tion from the soliton—antisoliton S-matrix. The details 
are however quite substantial. We only add that 
scattering amplitudes involving solely breathers can 
be expressed using only hyperbolic functions. 

Since the 1980s, a lot of information has also 
been gathered concerning matrix elements of 
suitable sine-Gordon field quantities between 
special quantum states (form factors). Unfortu- 
nately, the correlation functions involve infinite 
sums of form factors that are quite difficult to 
control analytically. Hence, it is not known whether 
the correlation functions associated with the form 
factors give rise to a Wightman field theory with 
the usual axiomatic properties. 


The Relation to Relativistic 
Calogero-Moser Systems 


The behavior of the special classical solutions 
discussed earlier is very similar to that of classical 
point particles. Furthermore, the picture of classical 
solitons, antisolitons, and their bound states scatter- 
ing independently in pairs is essentially preserved on 
the quantum level, just as one would expect for the 
quantization of an integrable particle system. 

Next, we note that from the quantum viewpoint 
there is no physical distinction between wave 
functions and point particles, whereas a classical 
wave is a physical entity that is clearly very different 
from a point particle. Even so, it is a natural 
question whether there exist classical Hamiltonian 
systems of N point particles on the line whose 
physical characteristics (charges, bound states, scat- 
tering, etc.) are the same as those of the particle-like 
sine-Gordon solutions. If so, a second question is 
equally obvious: does the quantum version of the 
N-particle systems still have the same features as 
that of the quantum sine-Gordon excitations? 

As we now sketch, the first question has been 
answered in the affirmative, whereas the second one 
has not been completely answered yet. However, all 
of the information on the pertinent quantum 
N-particle systems collected thus far points to an 
affirmative answer. The systems at issue are relati- 
vistic versions of the well-known nonrelativistic 
Calogero—Moser N-particle systems. 

To begin with the classical two-particle system, its 
Hamiltonian is given by 


H = (\cosh p_1 + \cosh p_2)\, \coth((x_1 - x_2)/2)   [47] 

on the phase space 


\Omega = \{ (x,p) \in \mathbb{R}^4 \mid x_2 < x_1 \}, \quad \omega = dx \wedge dp   [48] 


Taking x_2 → x_2 + iπ yields the particle-antiparticle 
Hamiltonian 


\tilde{H} = (\cosh p_1 + \cosh p_2)\, |\tanh((x_1 - x_2)/2)|   [49] 


on the phase space 

\tilde{\Omega} = \{ (x,p) \in \mathbb{R}^4 \}, \quad \tilde{\omega} = dx \wedge dp   [50] 


The two-antiparticle Hamiltonian is again given by 
[47] and [48]. The interaction potential in [47] is 
repulsive, whereas it is attractive in [49]. Hence, any 
initial point in Ω gives rise to a scattering state, 
whereas points in Ω̃ yield scattering states if and 
only if the reduced Hamiltonian 


H_r = \cosh p\, |\tanh(x/2)|, \quad p = (p_1 - p_2)/2, \quad x = x_1 - x_2   [51] 


satisfies H_r > 1. More specifically, in both cases the 
distance |x_1(t) − x_2(t)| increases linearly as t → ±∞, 
the scattering (position shift) being encoded by the 
same function [32] as for the sine-Gordon solitons. 
The phase-space points on the separatrix {H_r = 1} 
have the same temporal asymptotics as the multipole 
solution [24], whereas the bound-state oscillations 
for H_r < 1 match those of the breathers [20]. 
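
The two-particle statements can be illustrated with a few lines of code. The sketch assumes the equal-charge Hamiltonian (cosh p_1 + cosh p_2) coth((x_1 − x_2)/2), the opposite-charge Hamiltonian with |tanh| in place of coth, and the reduced Hamiltonian cosh(p)|tanh(x/2)|; the classification labels and function names are hypothetical.

```python
import cmath
import math


def h_equal_charge(p1, p2, x1, x2):
    """(cosh p1 + cosh p2) coth((x1 - x2)/2); x2 may be complex."""
    return (math.cosh(p1) + math.cosh(p2)) / cmath.tanh((x1 - x2) / 2.0)


def h_opposite_charge(p1, p2, x1, x2):
    """(cosh p1 + cosh p2) |tanh((x1 - x2)/2)| for a particle-antiparticle pair."""
    return (math.cosh(p1) + math.cosh(p2)) * abs(math.tanh((x1 - x2) / 2.0))


def h_reduced(p, x):
    """Reduced Hamiltonian cosh(p) |tanh(x/2)| with p = (p1 - p2)/2, x = x1 - x2."""
    return math.cosh(p) * abs(math.tanh(x / 2.0))


def classify(p, x, tol=1e-12):
    """Scattering for H_r > 1, bound for H_r < 1, separatrix at H_r = 1."""
    h = h_reduced(p, x)
    if h > 1.0 + tol:
        return "scattering"
    if h < 1.0 - tol:
        return "bound"
    return "separatrix"


# The shift x2 -> x2 + i*pi turns coth into tanh (here x1 > x2, so no modulus is needed).
continued = h_equal_charge(0.3, -0.2, 1.5, 0.4 + 1j * math.pi)
print(continued.real, h_opposite_charge(0.3, -0.2, 1.5, 0.4))
print(classify(0.0, 1.0), classify(2.0, 1.0))
```

For a bound state with reduced energy E < 1, the largest separation is reached at p = 0 and equals 2 artanh(E), which mirrors the finite spatial extent of a breather.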

More generally, the Hamiltonian for N, particles 
and N_ antiparticles is given by the function 


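H = \sum_{j=1}^{N_+} \cosh(p_j^+) \prod_{k \ne j} |\coth((x_j^+ - x_k^+)/2)| \prod_{l=1}^{N_-} |\tanh((x_j^+ - x_l^-)/2)| 
  + \sum_{j=1}^{N_-} \cosh(p_j^-) \prod_{k \ne j} |\coth((x_j^- - x_k^-)/2)| \prod_{l=1}^{N_+} |\tanh((x_j^- - x_l^+)/2)|   [52] 


on the phase space 


\Omega_{N_+,N_-} = \{ (x^+,p^+) \in \mathbb{R}^{2N_+},\ (x^-,p^-) \in \mathbb{R}^{2N_-} \mid x^+_{N_+} < \cdots < x^+_1,\ x^-_{N_-} < \cdots < x^-_1 \}   [53] 


\omega_{N_+,N_-} = dx^+ \wedge dp^+ + dx^- \wedge dp^-   [54] 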


This defining Hamiltonian can be supplemented by 
(N, + N_ —1) independent Hamiltonians that pair- 
wise commute. The action-angle map of this integr- 
able system can be used to relate the scattering and 
bound-state behavior to that of the sine-Gordon 
solutions from an earlier section, yielding an exact 
correspondence. Indeed, the variables we used to 
describe the particle-like sine-Gordon solutions 
amount to the action-angle variables associated to 
[52]. Moreover, the matrix $\mathcal{L}$ [26] with $t = x = 0$
equals the Lax matrix for the N-particle system, which
is the manifestation of a remarkable self-duality 
property of the equal-charge case. There is an equally 
close relation between the general particle-like solu- 
tions and the general systems encoded in [52]. 

As a matter of fact, the connection can be further 
strengthened by introducing spacetime trajectories 
for the solitons, antisolitons, and breathers, which 
are defined in terms of the evolution of an initial
point in $\Omega_{N_+,N_-}$ under the time translation generator
[52] and the space translation generator, obtained
from [52] by the replacement $\cosh \to \sinh$. These
point particle and antiparticle trajectories make it 
possible to follow the motion of the solitons, 
antisolitons, and breathers during the temporal 
interval in which the nonlinear interaction takes 
place, whereas for large times the trajectories are 
located at the (then) clearly discernible positions of 
the individual solitons, antisolitons, and breathers. 

Before sketching the soliton-particle correspon- 
dence at the quantum level, we add a remark on the 
finite-gap solutions of the classical sine-Gordon 
equation, already mentioned in the paragraph 
containing [11]. These solutions may be viewed as 
generalizations of the particle-like solutions dis- 
cussed earlier, and they can also be obtained via 
relativistic N-particle Calogero—Moser systems. The 
pertinent systems are generalizations of the hyper- 
bolic systems just described to the elliptic level. 

Turning now to the quantum level, we begin by 
mentioning that the Poisson-commuting Hamilto- 
nians admit a quantization in terms of commuting 
analytic difference operators. This involves a special 
ordering choice of the p-dependent and x-dependent 
factors in the classical Hamiltonians, which is 
required to preserve commutativity. The resulting 
quantum two-body problem can be explicitly solved 
in terms of a generalization of the Gauss hypergeometric
function. For the case of equal charges, the
scattering is encoded in the sine-Gordon amplitudes
$u_{++}(\theta)$ (cf. [45] and [46]). For the unequal-charge
case, one should distinguish an even and odd
channel. The scattering on these channels is encoded
in the sine-Gordon amplitudes $t_{+-}(\theta) \pm r_{+-}(\theta)$.
Moreover, the bound-state spectrum agrees with 
the DHN formula [44] and the bound-state wave 
functions are given by hyperbolic functions. 

As a consequence of these results, the physics 
encoded in the two-body subspace of the sine- 
Gordon quantum field theory is indistinguishable 
from that of the corresponding two-body relativistic 
Calogero—Moser systems. To extend this equivalence 
to the arbitrary-N case, one needs first of all
sufficiently explicit solutions to the N-body
Schrödinger equation. To date, this has only been
achieved for the case of N equal charges and the
special couplings for which the reflection amplitude
$r_{+-}$ vanishes. The asymptotics of the pertinent
solutions is factorized in terms of $u_{++}(\theta)$, in agreement
with the sine-Gordon picture.


See also: Bäcklund Transformations; Boundary-Value
Problems for Integrable Equations; Calogero- 
Moser—Sutherland Systems of Nonrelativistic and 
Relativistic Type; Infinite-dimensional Hamiltonian 
Systems; Integrability and Quantum Field Theory; 
Integrable Systems and Discrete Geometry; Integrable 
Systems and Inverse Scattering Method; Integrable 
Systems: Overview; Ljusternik—Schnirelman Theory; 
Solitons and Other Extended Field Configurations; 
Solitons and Kac—Moody Lie Algebras; Symmetries and 
Conservation Laws; Two-Dimensional Models; 
Yang—Baxter Equations. 
Introduction


Fix a closed n-dimensional manifold M, and let $\mathcal{M}$
be the space of Riemannian metrics on M. As in the
reasoning leading to the Einstein equations in
general relativity, there is basically a unique simple
and natural vector field on the space $\mathcal{M}$. Namely, the
tangent space $T_g\mathcal{M}$ consists of symmetric bilinear
forms; besides multiples of the metric itself, the Ricci
curvature $\mathrm{Ric}_g$ of g is the only symmetric form that
depends on at most the second derivatives of the
metric, and is invariant under coordinate changes,
that is, a (0, 2)-tensor formed from the metric. Thus,
consider


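$$X_g = \mu\,\mathrm{Ric}_g + \lambda g$$

where $\mu$, $\lambda$ are scalars. Setting $\mu = -2$, the corresponding
equation for the flow of X is

$$\frac{d}{dt} g(t) = -2\,\mathrm{Ric}_{g(t)} + \lambda g(t) \qquad [1]$$

The Ricci flow, introduced by Hamilton (1982), is
obtained by setting $\lambda = 0$:

$$\frac{d}{dt} g(t) = -2\,\mathrm{Ric}_{g(t)} \qquad [2]$$

Rescaling the metric and time variable t transforms
[2] into [1], with $\lambda = \lambda(t)$. For example, rescaling the
Ricci flow [2] so that the volume of (M, g(t)) is
preserved leads to the flow equation [1] with
$\lambda = (2/n)\bar{R}$, where $\bar{R}$ is the mean value of the scalar
curvature R over M. Indeed, under [1] the volume form
evolves by $\frac{d}{dt}\,dV = (-R + n\lambda/2)\,dV$, and the total
volume is constant exactly when $\lambda = (2/n)\bar{R}$.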

The Ricci flow [2] bears some relation with the
metric part of the β-function or renormalization
group (RG) flow equation

$$\frac{d}{dt} g(t) = \beta(g(t))$$

for the two-dimensional sigma model of maps
$\Sigma^2 \to M$. The β-function is a vector field on $\mathcal{M}$,
invariant under diffeomorphisms, which has an
expansion of the form

$$-\beta(g) = \mathrm{Ric}_g + \varepsilon\,\mathrm{Riem}^2 + \cdots$$

where $\mathrm{Riem}^2$ is quadratic in the Riemann curvature
tensor. The Ricci flow corresponds to the one-loop
term or semiclassical limit in the RG flow
(cf. D’Hoker (1999) and Friedan (1985)).


Recently, G Perelman (2002, 2003a, b) has developed
new insights into the geometry of the Ricci flow
which have led to a solution of long-standing mathematical
conjectures on the structure of 3-manifolds,
namely the Thurston geometrization conjecture 
(Thurston 1982), and hence the Poincaré conjecture. 


Basic Properties of the Ricci Flow 


In charts where the coordinate functions are locally
defined harmonic functions in the metric g(t), [2]
takes the form

$$\frac{d}{dt} g_{ij} = \Delta g_{ij} + Q_{ij}(g, \partial g)$$

where $\Delta$ is the Laplace operator on functions with
respect to the metric g = g(t) and Q is a lower-order
term quadratic in g and its first-order partial
derivatives. This is a nonlinear heat-type equation
for $g_{ij}$ and leads to the existence and uniqueness of
solutions to the Ricci flow on some time interval
starting at any smooth initial metric. This is the
reason for the minus sign in [2]; a plus sign gives a
backwards heat-type equation, which has no solutions
in general.

The flow [2] gives a natural method to try to
construct canonical metrics on the manifold M.
Stationary points of the flow [2] are Ricci-flat
metrics, while stationary points of the flow [1] are
(Riemannian) Einstein metrics, where $\mathrm{Ric}_g = (R/n)g$,
with R the scalar curvature of g. One of Hamilton’s
motivations for studying the Ricci flow was a set of
results on an analogous question for nonlinear sigma
models. Consider maps f between Riemannian
manifolds M, N with Lagrangian given by the
Dirichlet energy. Eells-Sampson studied the heat
equation for this action and proved that when the
target N has nonpositive curvature, the flow exists
for all time and converges to a stationary point of
the action, that is, a harmonic map $f_\infty : M \to N$. The
idea is to see if an analogous program can be
developed on the space of metrics $\mathcal{M}$.

There are a number of well-known obstructions to 
the existence of Einstein metrics on manifolds, in 
particular, in dimensions 3 and 4. Thus, the Ricci 
flow will not exist for all time on a general 
manifold. Hence, it must develop singularities. A 
fundamental issue is to try to relate the structure of 
the singularities of the flow with the topology of the 
underlying manifold M. 

A few simple qualitative features of the Ricci flow 
[2] are as follows: if Ric(x,t) > 0, then the flow 
contracts the metric g(t) near x, to the future, while 


if Ric(x, t) < 0, then the flow expands g(t) near x. At 
a general point, there will be directions of positive 
and negative Ricci curvature, along which the metric 
locally contracts or expands. The flow preserves 
product structures of metrics, and preserves the 
isometry group of the initial metric. 

The form of [2] shows that the Ricci flow 
continues as long as Ricci curvature remains 
bounded. On a bounded time interval where $\mathrm{Ric}_{g(t)}$
is bounded, the metrics g(t) are quasi-isometric, that 
is, they have bounded distortion compared with the 
initial metric g(0). Thus, one needs to consider 
evolution equations for the curvature, induced by 
the flow for the metric. The simplest of these is the 
evolution equation for the scalar curvature R: 


$$\frac{d}{dt} R = \Delta R + 2|\mathrm{Ric}|^2 \qquad [3]$$

Evaluating [3] at a point realizing the minimum $R_{\min}$
of R on M shows that $R_{\min}$ is monotone nondecreasing
along the flow. In particular, the Ricci flow
preserves positive scalar curvature. Moreover, since
$|\mathrm{Ric}|^2 \ge R^2/n$, if $R_{\min}(0) > 0$, then

$$t \le \frac{n}{2 R_{\min}(0)} \qquad [4]$$

Thus, the Ricci flow exists only up to a maximal
time $T \le n/(2R_{\min}(0))$ when $R_{\min}(0) > 0$. In contrast,
in regions where the Ricci curvature stays negative
definite, the flow exists for infinite time.
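
The bound on the existence time can be checked on the simplest explicit solution. The sketch below is an illustration under stated assumptions, not part of the original article: it takes the round n-sphere of radius r, for which Ric = (n - 1) g_0 with g_0 the unit round metric and R = n(n - 1)/r^2, so that the Ricci flow [2] reduces to d(r^2)/dt = -2(n - 1). The function names are hypothetical.

```python
def scalar_curvature(n, r):
    """Scalar curvature R = n(n-1)/r^2 of the round n-sphere of radius r."""
    return n * (n - 1) / r**2


def radius_squared(n, r0, t):
    """r(t)^2 under the Ricci flow: d(r^2)/dt = -2(n-1), r(0) = r0."""
    return r0**2 - 2.0 * (n - 1) * t


def extinction_time(n, r0):
    """Time at which the round sphere shrinks to a point."""
    return r0**2 / (2.0 * (n - 1))


def existence_time_bound(n, r0):
    """Right-hand side n / (2 R_min(0)) of the existence-time estimate."""
    return n / (2.0 * scalar_curvature(n, r0))
```

For the round sphere the extinction time coincides with n/(2 R_min(0)), so in this example the estimate is attained with equality.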
The evolution of the Ricci curvature has the same
general form as [3]:

$$\frac{d}{dt} R_{ij} = \Delta R_{ij} + \widetilde{Q}_{ij} \qquad [5]$$

The expression for $\widetilde{Q}$ is much more complicated
than the Ricci curvature term in [3] but involves
only quadratic expressions in the curvature.
However, $\widetilde{Q}$ involves the full Riemann curvature
tensor Riem of g, and not just the Ricci curvature (as
[3] involves Ricci and not just scalar curvature). An
important feature of dimension 3 is that the full
Riemann curvature Riem is determined algebraically
by the Ricci curvature. So the Ricci flow has a much
better chance of “working” in dimension 3. For
example, an analysis of $\widetilde{Q}$ shows that the Ricci flow
preserves positive Ricci curvature in dimension 3; if
$\mathrm{Ric}_{g(0)} > 0$, then $\mathrm{Ric}_{g(t)} > 0$, for t > 0. This is not the
case in higher dimensions. On the other hand, in any
dimension >2, the Ricci flow does not preserve
negative Ricci curvature, or even a general lower
bound $\mathrm{Ric} \ge -\lambda$, for $\lambda > 0$. For the remainder of the
article, we usually assume then that dim M = 3.

The first basic result on the Ricci flow is the 
following, due to Hamilton (1982). 
Space-form theorem. If g(0) is a metric of positive
Ricci curvature on a 3-manifold M, then the volume
normalized Ricci flow exists for all time, and
converges to the round metric on $S^3/\Gamma$, where $\Gamma$ is
a finite subgroup of SO(4), acting freely on $S^3$.

Thus, the Ricci flow “geometrizes” 3-manifolds of 
positive Ricci curvature. There are two further 
important structural results on the Ricci flow. 

Curvature pinching estimate (Hamilton 1982,
Ivey 1993). For g(t) a solution to the Ricci flow on
a closed 3-manifold M, there is a nonincreasing
function $\phi : (-\infty, \infty) \to \mathbb{R}$, tending to 0 at $\infty$, and a
constant C, depending only on g(0), such that

$$\mathrm{Riem}(x, t) \ge -C - \phi(R(x, t)) \cdot |R(x, t)| \qquad [6]$$


This estimate does not imply a lower bound on
Riem(x, t) uniform in time. However, when combined
with the fact that the scalar curvature R(x, t)
is uniformly bounded below (cf. [3]), it implies that
$|\mathrm{Riem}|(x, t) \gg 1$ only where $R(x, t) \gg 1$. To control
the size of |Riem|, it thus suffices to obtain just an upper
bound on R. This is remarkable, since the scalar
curvature is a much weaker invariant of the metric
than the full curvature. Moreover, at points where the
curvature is sufficiently large, [6] shows that
$\mathrm{Riem}(x, t)/R(x, t) \ge -\delta$, for $\delta$ small. Thus, if one scales
the metric to make R(x, t) = 1, then $\mathrm{Riem}(x, t) \ge -\delta$. In
such a scale, the metric then has almost non-negative
curvature near (x, t).

Harnack estimate (Hamilton 1982). Let (N, g(t))
be a solution to the Ricci flow with bounded and
non-negative curvature $\mathrm{Riem} \ge 0$, and suppose g(t)
is a complete Riemannian metric on N. Then for
$0 < t_1 < t_2$,

$$R(x_2, t_2) \ge \frac{t_1}{t_2} \exp\left(-\frac{d_{t_1}^2(x_1, x_2)}{2(t_2 - t_1)}\right) R(x_1, t_1) \qquad [7]$$

where $d_{t_1}$ is the distance function on $(N, g(t_1))$. This
allows one to control the geometry of the solution at
different spacetime points, given control at an initial
point.


Singularity Formation 


The deeper analysis of the Ricci flow is concerned 
with the singularities that arise in finite time. 
Equation [3] shows that the Ricci flow will not 
exist for arbitrarily long time in general. In the case 
of initial metrics with positive Ricci curvature, this is 
resolved by rescaling the Ricci flow to constant 
volume. However, the general situation is necessarily 
much more complicated. For example, any manifold
which is a connected sum of $S^3/\Gamma$ or $S^2 \times S^1$ factors
has metrics of positive scalar curvature. For obvious
topological reasons, the volume normalized Ricci
flow then cannot converge nicely to a round metric;
even the renormalized flow must develop
singularities.

The usual method to understand the structure of 
singularities, particularly in geometric PDEs, is to 
rescale or renormalize the solution on a sequence 
converging to the singularity to make the solution 
bounded, and try to pass to a limit of the 
renormalization. Such a limit solution models the 
singularity formation, and one hopes (or expects) 
that the singularity models have special features 
making them much simpler than an arbitrary 
solution of the flow. 

A singularity forms for the Ricci flow only where
the curvature becomes unbounded. Suppose then
that $\lambda_i^2 = |\mathrm{Riem}|(x_i, t_i) \to \infty$, on a sequence of points
$x_i \in M$, and times $t_i \to T < \infty$. Consider the
rescaled or blow-up metrics and times

$$\bar{g}_i(\bar{t}_i) = \lambda_i^2\,\phi_i^* g(t), \qquad \bar{t}_i = \lambda_i^2 (t - t_i) \qquad [8]$$

where $\phi_i$ are diffeomorphisms giving local dilations
of the manifold near $x_i$ by the factor $\lambda_i$.

The flow $\bar{g}_i$ is also a solution of the Ricci flow,
and has bounded curvature at $(x_i, 0)$. For suitable
choices of $x_i$ and $t_i$ the curvature will be bounded
near $x_i$, and for nearby times to the past, $\bar{t}_i \le 0$; for
example, one might choose points $(x_i, t_i)$ where the
curvature is maximal on (M, g(t)), $t \le t_i$.

The rescaling [8] expands all distances by the
factor $\lambda_i$ and time by the factor $\lambda_i^2$. Thus, in
effect one is studying very small regions, of
spatial size on the order of $r_i = \lambda_i^{-1}$ about $(x_i, t_i)$,
and “using a microscope” to examine the small-scale
features in this region on a scale of size
about 1.

A limit solution of the Ricci flow, defined at least
locally in space and time, will exist provided that the
local volumes of the rescalings are bounded below
(Gromov compactness). In terms of the original
unscaled flow, this requires that the metric g(t)
should not be locally collapsed on the scale of its
curvature, that is,

$$\mathrm{vol}\,B_{x_i}(r_i, t_i) \ge \nu r_i^n \qquad [9]$$

for some fixed but arbitrary $\nu > 0$. A maximal
connected limit $(N, \bar{g}(t), x)$ containing the base point
$x = \lim x_i$, is then called a “singularity model.”
Observe that the topology of the limit N may well 
be distinct from the original manifold M, most of 
which may have been blown off to infinity in the 
rescaling. 
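
The scaling behind the blow-up procedure can be made concrete on the shrinking round sphere. This is a minimal sketch under assumptions that are not in the source: the solution is the round n-sphere with r(t)^2 = r0^2 - 2(n - 1)t, the diffeomorphisms are taken to be the identity (the sphere is homogeneous), and the scale is chosen as lam^2 = 1/r(t_i)^2, the sectional curvature at time t_i. All function names are hypothetical.

```python
def radius_squared(n, r0, t):
    """r(t)^2 for the round n-sphere under the Ricci flow."""
    return r0**2 - 2.0 * (n - 1) * t


def rescaled_radius_squared(n, r0, t_i, t_bar):
    """Radius^2 of the blow-up metric lam^2 * g(t_i + t_bar / lam^2).

    lam^2 = 1 / r(t_i)^2 is the curvature scale at the base time t_i;
    t_bar = lam^2 * (t - t_i) is the rescaled time.
    """
    lam2 = 1.0 / radius_squared(n, r0, t_i)
    return lam2 * radius_squared(n, r0, t_i + t_bar / lam2)


def unit_sphere_flow(n, t_bar):
    """Ricci flow starting from the unit sphere at rescaled time 0."""
    return 1.0 - 2.0 * (n - 1) * t_bar
```

In this example the rescaled family is again a Ricci flow: it has radius 1 at rescaled time 0 and coincides with the flow of the unit sphere, however close t_i is to the extinction time. This illustrates why the rescalings stay bounded while the original curvature blows up.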

To see the potential usefulness of this process, 
suppose one does have local noncollapse on the scale 


of the curvature, and that base points of maximal
curvature in space and time $t \le t_i$ have been chosen.
At least in a subsequence, one then obtains a limit
solution to the Ricci flow (N, g(t), x), based at x,
defined at least for times $(-\infty, 0]$, with g(t) a
complete Riemannian metric on N. Such solutions
are called ancient solutions of the Ricci flow. The
estimate [6] shows that the limit has non-negative
curvature in dimension 3, and so [7] holds on N.
Thus, the limit is indeed quite special. The topology
of complete manifolds N of non-negative curvature
is completely understood in dimension 3. If N is
noncompact, then N is diffeomorphic to $\mathbb{R}^3$, $S^2 \times \mathbb{R}$,
or a quotient of these spaces. If N is compact, then
a slightly stronger form of the space-form theorem
implies N is diffeomorphic to $S^3/\Gamma$, $S^2 \times S^1$, or
$S^2 \times_{\mathbb{Z}_2} S^1$.

The study of the formation of singularities in 
the Ricci flow was initiated by Hamilton (1995). 
Recently, Perelman has obtained an essentially 
complete understanding of the singularity behavior 
of the Ricci flow, at least in dimension 3. 


Perelman’s Work 
Noncollapse 


Consider the Einstein-Hilbert action

$$\mathcal{R}(g) = \int_M R_g\, dV_g \qquad [10]$$

as a functional on $\mathcal{M}$. Critical points of $\mathcal{R}$ are Ricci-flat
metrics. It is natural and tempting to try to
relate the Ricci flow with the gradient flow of $\mathcal{R}$
(with respect to a natural $L^2$ metric on the space $\mathcal{M}$).
However, it has long been recognized that this
cannot be done directly. In fact, the gradient flow of
$\mathcal{R}$ does not even exist, since it implies a backwards
heat-type equation for the scalar curvature R
(similar to [3] but with a minus sign before $\Delta$).

Consider however the following functional
extending $\mathcal{R}$:

$$\mathcal{F}(g, f) = \int_M (R + |\nabla f|^2)\, e^{-f}\, dV_g \qquad [11]$$

as a functional on the larger space $\mathcal{M} \times C^\infty(M, \mathbb{R})$, or
equivalently a family of functionals on $\mathcal{M}$, parametrized
by $C^\infty(M, \mathbb{R})$. The functional [11] also arises in
string theory as the low-energy effective action; the
scalar field f is called the dilaton. Fix any smooth
measure dm on M and define the Perelman coupling
by requiring that (g, f) satisfy

$$e^{-f}\, dV_g = dm \qquad [12]$$


The resulting functional

$$\mathcal{F}^m(g, f) = \int_M (R + |\nabla f|^2)\, dm \qquad [13]$$

becomes a functional on $\mathcal{M}$. (This coupling does not
appear to have been considered in string theory.)
The $L^2$ gradient flow of $\mathcal{F}^m$ is given simply by

$$\frac{dg}{dt} = -2(\mathrm{Ric}_g + D^2 f) \qquad [14]$$

where $D^2 f$ is the Hessian of f with respect to g. The
evolution equation [14] for g is just the Ricci flow [2]
modified by an infinitesimal diffeomorphism:
$2D^2 f = (d/dt)(\phi_t^* g)$, where $(d/dt)\phi_t = \nabla f$. Thus, the
gradient flow of $\mathcal{F}^m$ is the Ricci flow, up to
diffeomorphisms. The evolution equation for the
scalar field f,

$$f_t = -\Delta f - R \qquad [15]$$

is a backward heat equation (balancing the forward
evolution of the volume form of g(t)). Thus, this
flow will not exist for general f, going forward in t.
However, one of the basic points of view is to let the
(pure) Ricci flow [2] flow for a time $t_0 > 0$. At $t_0$,
one may then take an arbitrary $f = f(t_0)$ and flow
this f backward in time ($\tau = t_0 - t$) to obtain an
initial value f(0) for f. The choice of $f(t_0)$ determines,
together with the choice of volume form of
g(0) (or $g(t_0)$), the measure dm and so the choice
of $\mathcal{F}^m$. The process of passing from $\mathcal{F}$ to $\mathcal{F}^m$
corresponds to a reduction of the symmetry group of
all diffeomorphisms $\mathcal{D}$ of $\mathcal{F}$ to the group $\mathcal{D}_0$ of
volume-preserving diffeomorphisms; the quotient
space $\mathcal{D}/\mathcal{D}_0$ has been decoupled into a space
$C^\infty(M, \mathbb{R})$ of parameters.

The functionals $\mathcal{F}^m$ are not scale invariant. To
achieve scale invariance, Perelman includes an
explicit insertion of the scale parameter, related to
time, by setting

$$\mathcal{W}(g, f, \tau) = \int_M \big[\tau(|\nabla f|^2 + R) + f - n\big]\,(4\pi\tau)^{-n/2} e^{-f}\, dV \qquad [16]$$

with coupling so that $dm = (4\pi\tau)^{-n/2} e^{-f}\, dV$ is fixed.
The entropy functional $\mathcal{W}$ is invariant under
simultaneous rescaling of $\tau$ and g, and $\tau_t = -1$.
Again, the gradient flow of $\mathcal{W}$ is the Ricci flow
modulo diffeomorphisms and rescalings and the
stationary points of the gradient flow are the
gradient Ricci solitons,

$$\mathrm{Ric}_g + D^2 f - \frac{1}{2\tau}\, g = 0$$
for which the metrics evolve by diffeomorphisms
and rescalings. Gradient solitons arise naturally as
singularity models, due to the rescalings and
diffeomorphisms in the blow-up procedure [8]. An
important example is the cigar soliton on $\mathbb{R}^2 \times \mathbb{R}$
(or $\mathbb{R}^2 \times S^1$),

$$g = (1 + r^2)^{-1} g_{\mathrm{Eucl}} + ds^2 \qquad [17]$$
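
The geometry of the two-dimensional factor of the cigar can be probed numerically. The sketch below assumes the conformal form g = e^{2u}(dx^2 + dy^2) with e^{2u} = 1/(1 + r^2) and uses the standard formula K = -e^{-2u} Δu for the Gauss curvature of a conformally flat surface; the finite-difference step `h` and the function names are illustrative choices, not taken from the source.

```python
import math


def conformal_exponent(x, y):
    """u with exp(2u) = 1 / (1 + r^2), r^2 = x^2 + y^2."""
    return -0.5 * math.log(1.0 + x * x + y * y)


def gauss_curvature(x, y, h=1e-3):
    """K = -exp(-2u) * Laplacian(u), using a 5-point finite difference."""
    u = conformal_exponent
    laplacian = (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
                 - 4.0 * u(x, y)) / h**2
    return -math.exp(-2.0 * u(x, y)) * laplacian


def gauss_curvature_exact(x, y):
    """Closed form K = 2 / (1 + r^2) obtained by differentiating u by hand."""
    return 2.0 / (1.0 + x * x + y * y)
```

The curvature is positive everywhere, equals 2 at the tip r = 0, and decays like 2/r^2 at infinity, where the metric approaches a flat cylinder of fixed circumference.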


Perelman then uses the scalar field f to probe the
geometry of g(t). For instance, the collapse or
noncollapse of the metric g(t) near a point $x \in M$
can be detected from the size of $\mathcal{W}(g(t))$ by choosing
$e^{-f}$ to be an approximation to a delta function
centered at (x, t). The more collapsed g(t) is near x,
the more negative the value of $\mathcal{W}(g(t))$. The collapse
of the metric g(t) on any scale in finite time is then
ruled out by combining this with the fact that the
entropy functional $\mathcal{W}$ is increasing along the Ricci
flow.

Much more detailed information can be obtained
by studying the path integral associated to the
evolution equation [15] for f, given by

$$\mathcal{L}(\gamma) = \int \sqrt{\tau}\,\big[|\dot{\gamma}(\tau)|^2 + R(\gamma(\tau))\big]\, d\tau$$

where R and $|\dot{\gamma}(\tau)|$ are computed with respect to the
evolving metrics $g(\tau)$. In particular, the study of the
geodesics and the associated variational theory of
the length functional $\mathcal{L}$ are important in understanding
the geometry of the Ricci flow near the
singularities.


Singularity Models 


A major accomplishment of Perelman is essentially a
classification of all complete singularity models
(N, g(t)) that arise in finite time. In the simple case
where N is compact, then as noted above, N is
diffeomorphic to $S^3/\Gamma$, $S^2 \times S^1$, or $S^2 \times_{\mathbb{Z}_2} S^1$.

In the much more important case where N is
complete and noncompact, Perelman proves that the
geometry of N near infinity is that of a union of
ε-necks. Thus, at time 0, and at points x with
$r(x) = \mathrm{dist}(x, x_0) \gg 1$, for a fixed base point $x_0$, a
region of radius $\varepsilon^{-1}$ about x, in the scale where
R(x) = 1, is ε-close to such a region in the standard
round product metric on $S^2 \times \mathbb{R}$; ε may be made
arbitrarily small by choosing r(x) sufficiently large.
For example, this shows that the cigar soliton [17]
cannot arise as a singularity model. Moreover, this
structure also holds on a time interval on the order
of $\varepsilon^{-1}$ to the past, so that on such regions the
solution is close to the (backwards) evolving Ricci
flow on $S^2 \times \mathbb{R}$.
Perelman shows that this structural result for the
singularity models themselves also holds for the 
solution g(t) very near any singularity time T. Thus, 
at any base point (x,t) where the curvature is 
sufficiently large, the rescaling as in [8] of the 
spacetime by the curvature is smoothly close, on 
large compact domains, to corresponding large 
domains in a complete singularity model. The 
“ideal” complete singularity models do actually 
describe the geometry and topology near any 
singularity. Consequently, one has a detailed under- 
standing of the small-scale geometry and topology in 
a neighborhood of every point where the curvature 
is large on (M, g(t)), for t near T. 

The main consequence of this analysis is the
existence of canonical, almost round 2-spheres $S^2$ in
any region of (M, g(t)) where the curvature is
sufficiently large; the radius of the $S^2$’s is on the
order of the curvature radius. One then disconnects
the manifold M into pieces, by cutting M along a
judicious choice of such 2-spheres, and gluing in
round 3-balls in a natural way. This surgery process
allows one to excise the regions of (M, g(t))
where the Ricci flow is almost singular, and thus
leads to a naturally defined Ricci flow with surgery,
valid for all times $t \in [0, \infty)$.

The surgery process disconnects the original
connected 3-manifold M into a collection of disjoint
(connected) 3-manifolds $M_i$, with the Ricci flow
running on each. However, topologically, there is a
canonical relation between M and the components
$M_i$; M is the connected sum of $\{M_i\}$. An analysis of
the long-time behavior of the volume-normalized
Ricci flow confirms the expectation that the flow
approaches a fixed point, that is, an Einstein metric,
or collapses along 3-manifolds admitting an $S^1$
fibration. This then leads to the proof of Thurston’s
geometrization conjecture for 3-manifolds and
consequently the proof of the Poincaré conjecture.
It gives a full classification of all closed 3-manifolds,
much like the classification of surfaces given by the
classical uniformization theorem.


See also: Einstein Manifolds; Evolution Equations: Linear 
and Nonlinear; Minimal Submanifolds; Renormalization: 
General Theory; Topological Sigma Models. 
Introduction


Dynamical systems first developed from the geometry
of Newton’s equations (see Goodstein and Goodstein
(1997)), and the question of the stability of the solar
system motivated further research inspired by
celestial mechanics (cf. Siegel and Moser (1971)).
Then dynamical systems developed intensively from
stability theory (Lyapunov’s theory) to generic properties
(based on functional analysis techniques), hyperbolic
structures (Anosov’s flows, Smale axiom A) and
to perturbation theory (Pugh’s closing lemma, KAM
theorem). There are many links with ergodic theory
dating back to Birkhoff’s ergodic theorem (motivated
by Boltzmann-Gibbs contributions to thermodynamics).
These aspects have been developed in several
articles of the encyclopedia (see Generic Properties of
Dynamical Systems; Ergodic Theory; Hyperbolic
Dynamical Systems). This article develops another
aspect of dynamical systems, namely bifurcation
theory. In contrast, the mathematics involved relates
more to local analytic geometry in the broad sense; it
provides local models like normal forms and uses blow-up
techniques and asymptotic developments. This contains
the singularity theory of functions (related to
singularities of gradient flows). A recent development
of the whole subject deals with bifurcation theory of
fast-slow systems.


Singularity Theory of Functions 


A singular point of a gradient dynamics

$$\frac{dx}{dt} = \mathrm{grad}\, V(x)$$

is a critical point of the function V. Assume that the
function $V : U \to \mathbb{R}$ is defined and infinitely differentiable
on an open set U. Let $x_0 \in U$ be a critical
point of V.


Definition 1 The critical point $x_0$ is said to be of
Morse type if the Hessian of V at $x_0$, $D^2V(x_0)$, is of
maximal rank n. The corank of a singular point $x_0$ is
the corank of the matrix $D^2V(x_0)$.


Denote by $\mathcal{O}$ the local ring of germs of $C^\infty$
functions at point $x_0$.


Definition 2 The Jacobian ideal of the function V
at $x_0$, denoted as Jac(V), is the ideal generated in
the ring $\mathcal{O}$ by the partial derivatives of
V: $\partial V/\partial x_i$, $i = 1, \ldots, n$, considered as elements of
the local ring $\mathcal{O}$.


The singularity (or the singular point) is isolated if

$$\dim_{\mathbb{R}} \mathcal{O}/\mathrm{Jac}(V) < \infty$$

In that case, the Milnor number is defined as the
dimension

$$\mu = \dim_{\mathbb{R}} \mathcal{O}/\mathrm{Jac}(V)$$


Local models of singularities at a point are simple 
expressions that germs of functions singular at this 
point have in local coordinates. 
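
As a worked instance of the Milnor number defined above (a standard textbook computation, not taken from the source), consider the one-variable germ $V(x) = x^{k+1}/(k+1)$ at $x_0 = 0$, where the integer $k \ge 1$ is an assumption of this example. Its derivative is $x^k$, and every smooth germ is a polynomial of degree less than k plus a multiple of $x^k$, so:

```latex
\[
\mathrm{Jac}(V) = (x^{k}), \qquad
\mathcal{O}/\mathrm{Jac}(V) = \operatorname{span}_{\mathbb{R}}\{1, x, \dots, x^{k-1}\}, \qquad
\mu = \dim_{\mathbb{R}} \mathcal{O}/\mathrm{Jac}(V) = k .
\]
```

For k = 1 the critical point is of Morse type and $\mu = 1$; the values k = 2 and k = 3 give the first degenerate cases in one variable.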

R Thom proposed to focus more particularly on the 
singularities whose Milnor number is less than or 
equal to 4 and whose corank is less than or equal to 2. 

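The list of local models $V_\lambda(x)$ of functions whose
singularities at 0 display a Milnor number less than
or equal to 4 and a corank less than or equal to 2 is
the following:


$V_\lambda(x) = \frac{1}{3}x^3 + \lambda_1 x$, the fold,

$V_\lambda(x) = \frac{1}{4}x^4 + \frac{1}{2}\lambda_1 x^2 + \lambda_2 x$, the cusp,

$V_\lambda(x) = \frac{1}{5}x^5 + \frac{1}{3}\lambda_1 x^3 + \frac{1}{2}\lambda_2 x^2 + \lambda_3 x$, the swallow tail,

$V_\lambda(x) = \frac{1}{6}x^6 + \frac{1}{4}\lambda_1 x^4 + \frac{1}{3}\lambda_2 x^3 + \frac{1}{2}\lambda_3 x^2 + \lambda_4 x$, the
butterfly,

$V_\lambda(x, y) = x^3 - 3xy^2 + \lambda_1(x^2 + y^2) + \lambda_2 x + \lambda_3 y$, the
elliptic umbilic,

$V_\lambda(x, y) = x^3 + y^3 + \lambda_1 xy + \lambda_2 x + \lambda_3 y$, the hyperbolic
umbilic, and

$V_\lambda(x, y) = y^4 + x^2 y + \lambda_1 x^2 + \lambda_2 y^2 + \lambda_3 x + \lambda_4 y$, the
parabolic umbilic.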


Consider more particularly the first four cases.
The “state equation” defines the critical points of $V_\lambda$:

$$\frac{\partial V_\lambda}{\partial x} = 0$$

which contains the subset of the stable equilibrium
points of the associated gradient dynamics. The
nature of these equilibrium states changes at points
contained in the set defined by the equation

$$\frac{\partial^2 V_\lambda}{\partial x^2} = 0$$

The projection of this set on the space of parameters
contains the set of values of the parameters for which
the equilibrium position is susceptible to change of
topological type (in other terms to undergo a bifurcation).
This set is called the catastrophe set (see Figure 1).
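
For the cusp the elimination can be carried out by hand, and a short sketch makes the resulting catastrophe set explicit. It assumes the cusp model $V_\lambda(x) = x^4/4 + \lambda_1 x^2/2 + \lambda_2 x$, so that the state equation is $x^3 + \lambda_1 x + \lambda_2 = 0$ and the degeneracy condition is $3x^2 + \lambda_1 = 0$; eliminating x gives $4\lambda_1^3 + 27\lambda_2^2 = 0$. The function names and the tolerance are hypothetical.

```python
def cusp_curve_point(x):
    """Parameters (l1, l2) at which x is a degenerate critical point.

    Solves 3x^2 + l1 = 0 and x^3 + l1*x + l2 = 0 for (l1, l2).
    """
    return (-3.0 * x**2, 2.0 * x**3)


def catastrophe_function(l1, l2):
    """4*l1^3 + 27*l2^2, which vanishes exactly on the catastrophe set."""
    return 4.0 * l1**3 + 27.0 * l2**2


def on_catastrophe_set(l1, l2, tol=1e-9):
    """True if (l1, l2) lies on the catastrophe set up to 'tol'."""
    return abs(catastrophe_function(l1, l2)) <= tol


def number_of_critical_points(l1, l2):
    """Number of distinct real roots of x^3 + l1*x + l2 = 0."""
    c = catastrophe_function(l1, l2)
    if c < 0:
        return 3  # inside the cusp region: two minima and one maximum
    if c > 0:
        return 1  # outside: a single minimum
    return 1 if (l1 == 0 and l2 == 0) else 2  # on the curve itself
```

Crossing the curve $4\lambda_1^3 + 27\lambda_2^2 = 0$ changes the number of critical points from three to one, which is the change of topological type described above.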

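Consider now the case of umbilics where there are
two state equations:

$$\frac{\partial V}{\partial x} = \frac{\partial V}{\partial y} = 0$$

The catastrophe set S is determined by one further
equation:

$$\frac{\partial^2 V}{\partial x^2}\,\frac{\partial^2 V}{\partial y^2} - \left(\frac{\partial^2 V}{\partial x\,\partial y}\right)^2 = 0$$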
In both cases of hyperbolic and elliptic umbilics, the set 
S is a singular surface. For the last case of the parabolic 
umbilic, the set S is of dimension 3 and again it is only 
possible to represent it by a family of its sections by a 
variable hyperplane (see Figure 2). 

All possible deformations (in the space of func- 
tions) of a function with an isolated singularity can 
be induced by a single μ-dimensional family of
deformations named the “universal deformation.” In 
general, the “codimension” of a bifurcation is the 
minimal number of parameters needed to display all 
possible phase diagrams of all possible unfoldings. 
Several deep mathematical techniques, like the 
Malgrange division theorem and preparation theo- 
rem, allowed J Mather to prove the theorem (local, 
then global) of existence of the universal unfolding. 
Figure 1 Examples of catastrophe sets. Adapted with permission from Françoise J-P (2005) Oscillations en Biologie: Analyse
Qualitative et Modèles (Mathématiques et Applications, vol. 46). Heidelberg: Springer.


The theory of unfoldings of singularities can be 
used, for instance, to provide asymptotic expression 
of stationary phase integrals when critical points of 
the phase are not of Morse type. This relates to 
monodromy, Bernstein polynomials, Milnor fibra- 
tion near a singular point, and simultaneous local 
models of forms and functions (cf. Malgrange
(1974) and see Feynman Path Integrals).


Singularity Theory of Vector Fields 
Transcritical Bifurcation 


The transcritical bifurcation is the standard mechan- 
ism for changes in stability. The local model is given by 


$$\dot{x} = rx - x^2$$

For r < 0, there is an unstable fixed point at $x^* = r$
and a stable fixed point at $x^* = 0$. As r increases, the
unstable and the stable fixed points coalesce when
r = 0 and when r > 0, they exchange their stability.


A simplified model of the essential physics of a laser
is due to Haken (1983). It is given by

$$\dot{n} = GnN - kn$$

where n is the number of photons in the laser field, N is
the number of excited atoms, and the gain term comes
from the process of stimulated emission, which occurs
at a rate proportional to the product nN. Furthermore,
the number of excited atoms drops through the
emission of photons: $N = N_0 - \alpha n$. Then we obtain

$$\dot{n} = (GN_0 - k)n - \alpha G n^2$$

This model displays a transcritical bifurcation, which
explains in elementary terms the laser threshold.
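
The threshold can be read off from the fixed points of the reduced equation. The sketch below assumes the model $\dot{n} = (GN_0 - k)n - \alpha G n^2$ with positive constants G, k and α; the function names are hypothetical. Writing $r = GN_0 - k$ and $f(n) = rn - \alpha G n^2$, the fixed points are $n^* = 0$ with $f'(0) = r$ and $n^* = r/(\alpha G)$ with $f'(n^*) = -r$.

```python
def threshold_pump(G, k):
    """Pump strength N0 at which r = G*N0 - k changes sign."""
    return k / G


def laser_fixed_points(G, k, N0, alpha):
    """Fixed points of dn/dt = (G*N0 - k)*n - alpha*G*n**2 with stability.

    Returns a list of (n_star, label) pairs; the label follows from the
    sign of the linearization f'(n_star).
    """
    r = G * N0 - k
    if r == 0:
        return [(0.0, "marginal")]
    n_star = r / (alpha * G)
    return [
        (0.0, "stable" if r < 0 else "unstable"),    # f'(0) = r
        (n_star, "stable" if r > 0 else "unstable"),  # f'(n*) = -r
    ]
```

Below threshold (N0 < k/G) the only physically meaningful state is n = 0, which is stable; above threshold n = 0 loses stability to the lasing state $n^* = (GN_0 - k)/(\alpha G) > 0$.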


Pitchfork Bifurcation 


The local model for supercritical pitchfork bifurca- 
tion is 


$$\dot{x} = rx - x^3$$
Figure 2 Sections of catastrophe sets. Adapted with permission from Françoise J-P (2005) Oscillations en Biologie: Analyse
Qualitative et Modèles (Mathématiques et Applications, vol. 46). Heidelberg: Springer.


When the parameter r < 0, it displays one stable
equilibrium position. As r increases, this equilibrium
bifurcates (for r > 0) into two stable equilibria and
an unstable equilibrium. Its drawing suggests “the
pitchfork.” In the case of the subcritical pitchfork
bifurcation

$$\dot{x} = rx + x^3$$

the origin is stable for r < 0 and is flanked by two
unstable equilibria at $x^* = \pm\sqrt{-r}$; these merge with
the origin at r = 0, and for r > 0 the origin is the only
equilibrium and is unstable.
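
The equilibria of both pitchfork models and their stability follow from the sign of the linearization, as the following minimal sketch shows. It assumes the local models $\dot{x} = rx - x^3$ (supercritical) and $\dot{x} = rx + x^3$ (subcritical); the function name and the `supercritical` flag are hypothetical.

```python
import math


def pitchfork_equilibria(r, supercritical=True):
    """Equilibria of dx/dt = r*x + s*x**3 with s = -1 (super) or +1 (sub).

    Returns a sorted list of (x_star, label) pairs, where the label is
    decided by the sign of the derivative r + 3*s*x_star**2.
    """
    s = -1.0 if supercritical else 1.0
    equilibria = [0.0]
    # Nonzero equilibria solve x**2 = -r/s, which requires -r/s > 0.
    if -r / s > 0:
        root = math.sqrt(-r / s)
        equilibria = [-root, 0.0, root]

    def label(x):
        slope = r + 3.0 * s * x * x
        if slope < 0:
            return "stable"
        if slope > 0:
            return "unstable"
        return "marginal"

    return [(x, label(x)) for x in equilibria]
```

In the supercritical case the outer equilibria exist for r > 0 and are stable; in the subcritical case they exist for r < 0 and are unstable, while the origin is stable for r < 0 and unstable for r > 0 in both models.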


Normal Forms 


Local analysis of vector fields proceeds with local 
models called normal forms. A local vector field 
near a singular point (zero) is seen as a derivation of 
the local ring of functions which preserves the 
unique maximal ideal (of the functions which vanish 
at the singular point). It yields a linear operator on
the finite-dimensional vector spaces of truncated
Taylor expansions of functions. This leads to a
decomposition of the vector field into semisimple
and nilpotent parts (at the level of formal series). A
normal form is a formal coordinate system in which
the semisimple part is linear. If the vector field
preserves a structure (such as a volume form or a
symplectic form), the change of coordinates that brings
it to its normal form can be chosen to preserve that
structure as well. The simplicity of the normal form
depends on the number of allowed resonances for 
the eigenvalues of the first-order jet of the vector 
field at the singular point. The best-known example 
is the Birkhoff normal form of Hamiltonian vector 
fields that we recall now, but we should also 
mention the Sternberg normal form of volume- 
preserving vector fields. 

Local analysis of a Hamiltonian vector field under 
symplectic changes of coordinates is the same as the 
local analysis of functions (namely its associated 
Hamiltonian). Birkhoff normal form deals with the
case of a Hamiltonian that is a perturbation at the
origin of


H_0(p) = \sum_{j=1}^{m} \lambda_j p_j,

p_j = x_j^2 + y_j^2,  j = 1, ..., m


where the symplectic form is

\omega = \sum_{j=1}^{m} dx_j \wedge dy_j


If the eigenvalues \lambda_j are assumed to be independent
over the integers (no resonances), then there is
a formal system of symplectic coordinates p_j, q_j,
j = 1, ..., m, called action-angle variables, in which
the Hamiltonian depends only on the action variables
p_j. Such a coordinate system is generically divergent
because, under generic assumptions on the 3-jet of
the Hamiltonian, the system displays isolated periodic
orbits in any neighborhood of the origin (see Moser,
Vey, Françoise). Normal forms are normally used in
applications (e.g., Nekhoroshev theorem, Hopf bifurcation
theorem) in their truncated versions. The Birkhoff
normal form was conjectured (A Weinstein) to enter
into the asymptotic expansion of the fundamental
solution of the wave equation on a Riemannian
manifold near elliptic geodesics. This conjecture was
recently proved by V Guillemin.


Stability Theory of Hamiltonian Systems, 
Nekhoroshev Theorem, Arnol’d Diffusion 


The generic divergence of the Birkhoff normal form does 
not allow one to conclude about the stability of the 
elliptic singular point. In the case where it is convergent, 
the motion is trapped inside invariant tori (conservation 
of the actions). The KAM theorem (see Gallavotti 
(1983)) provides the existence of many invariant tori 
but, except in low dimensions, this does not exclude the 
existence of trajectories that would escape to infinity. 
Arnol’d indeed provided a mechanism and examples of
such situations (this is now called Arnol’d diffusion) (see
Introductory Articles: Classical Mechanics). This diffusion
process takes time, and a theorem of Nekhoroshev
bounds that time from below.

Consider the Hamiltonian 


H_\varepsilon(p, q) = h(p) + \varepsilon f(p, q)


where h(p) is strictly convex, analytic, anisochronous
on the closure \bar{U} of an open bounded region U
of R^n and the perturbation f(p, q) is analytic on
\bar{U} \times R^n. Nekhoroshev’s theorem states that there
are positive constants a, b, d, g, \tau such that for any
initial data p_0, q_0, the actions p do not change by more
than a \varepsilon^g before a time bounded below by
\tau \exp(b \varepsilon^{-d}) (see Gallavotti (1983)).


Bifurcations of Periodic Orbits 


Consider a one-parameter family of vector fields X_\lambda
of class C^k, k \geq 3,


\dot{x} = F(x, \lambda)


Assume that X_\lambda(0) = 0 and that the linear part of the
vector field at 0 has two complex-conjugate
eigenvalues \mu(\lambda) and \bar{\mu}(\lambda) such that Re(\mu(\lambda)) > 0
for \lambda > 0, Re(\mu(0)) = 0 and d(Re(\mu(\lambda)))/d\lambda |_{\lambda=0} \neq 0.
Then, for \lambda > 0 but small enough, the vector field
X_\lambda has a periodic orbit \gamma_\lambda which tends to 0 as \lambda
tends to 0.

This bifurcation of codimension 1 is named Hopf 
bifurcation and it occurs in many models. 
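The statement can be illustrated numerically on a standard planar model. The sketch below assumes the supercritical normal form \dot{x} = \lambda x - y - x(x^2 + y^2), \dot{y} = x + \lambda y - y(x^2 + y^2), which is a choice made for this illustration and not a model taken from the text. In polar coordinates it reads \dot{r} = \lambda r - r^3, so for \lambda > 0 the periodic orbit is the circle of radius \sqrt{\lambda}, which shrinks to the origin as \lambda tends to 0. The integrator is a hand-written fourth-order Runge-Kutta step.

```python
import math


def hopf_field(x, y, lam):
    """Assumed planar normal form with eigenvalues lam +/- i at the origin."""
    r2 = x * x + y * y
    return lam * x - y - x * r2, x + lam * y - y * r2


def rk4_step(x, y, lam, dt):
    """One classical Runge-Kutta step for the planar field."""
    k1x, k1y = hopf_field(x, y, lam)
    k2x, k2y = hopf_field(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, lam)
    k3x, k3y = hopf_field(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, lam)
    k4x, k4y = hopf_field(x + dt * k3x, y + dt * k3y, lam)
    return (x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0,
            y + dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0)


def final_radius(lam, x0=0.5, y0=0.0, dt=0.01, steps=20000):
    """Distance from the origin after integrating for steps*dt time units."""
    x, y = x0, y0
    for _ in range(steps):
        x, y = rk4_step(x, y, lam, dt)
    return math.hypot(x, y)


# For lam > 0 the orbit settles on a cycle of radius sqrt(lam);
# for lam < 0 it spirals into the origin.
print(final_radius(0.1), math.sqrt(0.1))
print(final_radius(-0.1))
```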

When several oscillators (conservative or dissipative)
are weakly coupled, they may display frequency
locking (existence of an attractive periodic
orbit), phase locking, and synchronization. The fact
that we always see the same face of the Moon from
the Earth can be explained by a synchronization of
the rotation of the Moon about its own axis with its
revolution around the Earth. Synchronization also plays a
fundamental role in living organisms (e.g., heart, 
population dynamics: see D Attenborough’s movie 
“The Trials of Life”). It is sometimes possible to be 
convinced of synchronization via computer experi- 
ments, but the main theoretical approach is due to 
Malkin. See Bifurcations of Periodic Orbits, where a 
full mathematical proof is included. 


Homoclinic Bifurcation, Newhouse’s Phenomenon 


Homoclinic bifurcation occurs in the family X_\lambda at
the bifurcation value of the parameter \lambda = 0 if X_0
displays a singular orbit which tends to 0 both for
t \to +\infty and for t \to -\infty. In dimension 2, if \lambda is
slightly varied around 0, one periodic orbit may
appear (or disappear). For planar systems, the
Bogdanov-Takens bifurcation is the codimension-2
bifurcation, which mixes the homoclinic and the
Hopf bifurcations. In dimension 3, more complicated
phase diagrams may occur (such as in the Shilnikov
bifurcation) with the appearance of infinitely many
periodic orbits or homoclinic loops (in a stable way:
Newhouse phenomenon). This eventually gives rise to
strange attractors (the Rössler attractor).


The Poincaré Center-Focus Problem, Local 
Hilbert’s 16th Problem, Abel Equations, Algebraic 
Moments 


Hopf bifurcation theory for two-dimensional sys- 
tems deals with the first case of a general situation 


often referred to as degeneracies of Hopf bifurca- 
tions or alternatively Hopf—Takens bifurcations. 
Consider more generally a planar vector field, 
tangent at the origin to a linear focus: 


\dot{x} = y + \mu x + f(x, y)
\dot{y} = -x + \mu y + g(x, y)


The Poincaré center-focus problem asks for
necessary and sufficient conditions on the perturbation
terms so that all orbits are periodic in a
neighborhood of the origin. This problem is still
open in the case, for instance, where f and g are
homogeneous of degrees 4 and 5. It was solved a
long time ago for degrees 2 and 3. Part (b) of
Hilbert’s 16th Problem asks for a bound, in
terms of the degrees of polynomial perturbations, on
the number of limit cycles (isolated periodic orbits)
in the neighborhood of the origin. In the case of
homogeneous perturbations, a Cherkas transformation
allows the reduction of both problems to the
so-called one-dimensional periodic Abel equations:


dy/dx = p(x) y^2 + q(x) y^3


where p and q are trigonometric polynomials in x.
A perturbative approach has been developed over several
years and yields a theory of algebraic moments
related to Livsic’s generalized problem of moments.


Fast-Slow Systems 


Fast-slow systems


\varepsilon \dot{x} = f(x, y),  \dot{y} = g(x, y)


are characterized by the existence of two timescales. 
Variables x are called fast variables and y are called 
slow variables. Different approximation techniques 
can be used (averaging method, multiscale approach 
(see Multiscale Approaches)). The behavior of 
solutions is approximated as follows (when the
scale \varepsilon is small). The orbit jumps to an attractor of
the fast dynamics. This attractor may eventually lose 
its stability and/or bifurcate as time evolves. Then 
the orbit jumps to another attractor of the fast 
dynamics. Once again, this attractor may evolve/ 
bifurcate/disappear, depending on the slow variables 
y. This explains why bifurcation theory enters in the 
process in a crucial way, and it has to be adapted to 
this special context where some new phenomena may 
occur (e.g., singular Hopf bifurcation theory, 
Canards, etc.). Fundamental tools to be used in this
context are the Takens theorem, the Fenichel central
manifold theorem, and blowing-up (Dumortier-Roussarie).

Excitability is also an important feature which occurs
in some fast-slow systems. Consider initial data in a
neighborhood of an excitable attractive point. For some
initial data, the orbit goes very quickly to the attractor.
For some others instead (usually below some threshold),
the orbit undergoes a long excursion in the phase
diagram before turning back to the attractive point.

Singular Hopf bifurcation, hysteresis, and excit- 
ability can, for instance, occur in the electrodissolu- 
tion and passivation of iron in sulfuric acid 
(see Alligood et al. (1997)). 

Sometimes, the orbit leaves the neighborhood of a 
first attractor to jump to a second one and then this 
second one disappears and the orbit jumps back to 
the initial attractor as the slow variables have 
undergone a cycle. This is called a hysteresis cycle. 
In case one of the attractors is a point while the 
other is an attractive periodic orbit, it may lead to 
bursting oscillations. These oscillations are charac- 
terized by the periodic succession of silent phases 
(attractor of the fast dynamics) and active (pulsatile) 
phases (periodic attractor of the fast dynamics). 
They are ubiquitous in physiology, where they were
first discovered, and can also be observed in physics
(laser beams) and in population dynamics.


Example 


The Hindmarsh-Rose model displays bursting
oscillations:


\varepsilon \dot{x} = y - x^3 + 3x^2 + I - z
\varepsilon \dot{y} = 1 - 5x^2 - y
\dot{z} = s(x - x_1) - z


The fast dynamics is two dimensional. For some values 
of the parameters, it displays an attractive node, a 
saddle and a repulsive focus. Under the slow variation 
of z, the fast dynamics displays a saddle-node
bifurcation and a Hopf bifurcation from which emerges
a stable limit cycle, which disappears in a homoclinic
bifurcation. The fast-slow system undergoes a hysteresis
loop, which gives rise to bursting oscillations.
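A computer experiment gives a feeling for this behavior. The sketch below integrates the three equations of the example, with fast variables x, y and slow variable z, using a hand-written fourth-order Runge-Kutta step. The parameter values (\varepsilon = 0.006, I = 3, s = 4, x_1 = -1.6), the initial condition, and the spike threshold are assumptions chosen for illustration; they are commonly used in the literature on this model but are not specified in the text.

```python
import math


def hindmarsh_rose(state, eps, current, s, x1):
    """Right-hand side of the fast-slow system; x and y are fast, z is slow."""
    x, y, z = state
    dx = (y - x**3 + 3.0 * x**2 + current - z) / eps
    dy = (1.0 - 5.0 * x**2 - y) / eps
    dz = s * (x - x1) - z
    return (dx, dy, dz)


def rk4_step(f, state, dt, *params):
    """One classical Runge-Kutta step for a system given as a tuple."""
    def shifted(k, c):
        return tuple(v + c * dt * kv for v, kv in zip(state, k))

    k1 = f(state, *params)
    k2 = f(shifted(k1, 0.5), *params)
    k3 = f(shifted(k2, 0.5), *params)
    k4 = f(shifted(k3, 1.0), *params)
    return tuple(v + dt * (a + 2 * b + 2 * c + d) / 6.0
                 for v, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate(eps=0.006, current=3.0, s=4.0, x1=-1.6, dt=1.5e-4, steps=60000):
    """Return the x-component along a trajectory (assumed parameter values)."""
    state = (-1.6, -11.8, 3.0)           # illustrative initial condition
    xs = []
    for _ in range(steps):
        state = rk4_step(hindmarsh_rose, state, dt, eps, current, s, x1)
        xs.append(state[0])
    return xs


def count_spikes(xs, threshold=1.0):
    """Number of upward crossings of the threshold by x."""
    return sum(1 for a, b in zip(xs, xs[1:]) if a < threshold <= b)


xs = simulate()
print("spikes:", count_spikes(xs), "x range:", min(xs), max(xs))
```

Plotting xs against time (not done here) is the natural way to see the alternation of silent and active phases; the spike count alone only confirms that the active phase is reached.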


Conclusions 


Over the past three decades, mathematical tech- 
niques gathered under the names of singularity 
theory and bifurcation theory of dynamical systems 
have offered a powerful means to explore nonlinear 
phenomena in diverse settings. These include 
mechanical vibrations, lasers, superconducting cir- 
cuits, and chemical oscillators. Many such instances 
are further developed in this encyclopedia. 




See also: Bifurcation Theory; Bifurcations of Periodic 
Orbits; Chaos and Attractors; Entropy and Quantitative 
Transversality; Ergodic Theory; Feynman Path Integrals; 
Generic Properties of Dynamical Systems; Gravitational 
Lensing; Homoclinic Phenomena; Hyperbolic Dynamical 
Systems; Multiscale Approaches; Optical Caustics; 
Poisson Reduction; Stationary Phase Approximation; 
Symmetry and Symmetry Breaking in Dynamical 
Systems; Symmetry and Symplectic Reduction; 
Synchronization of Chaos; Weakly Coupled Oscillators. 


Further Reading 


Alligood KT, Sauer TD, and Yorke JA (1997) Chaos, An 
Introduction to Dynamical Systems, Textbooks in Mathema- 
tical Sciences. New York: Springer. 

Alpay D and Vinnikov V (eds.) (2001) Operator Theory, System
Theory and Related Topics, The Moshe Livsic Anniversary
Volume, Operator Theory, Advances and Applications
vol. 123. Birkhäuser.

Briskin M, Francoise JP, and Yomdin Y (2001) Generalized
Moments, Center-Focus Conditions and Compositions of
Polynomials. Operator Theory, Advances and Applications
123 (in honor of M Livsic, 80th birthday).

Diener M (1994) The canard unchained, or how fast-slow dynamical
systems bifurcate? The Mathematical Intelligencer 6: 38-49.

Francoise JP and Guillemin V (1991) On the period spectrum of a
symplectic map. Journal of Functional Analysis 100: 317-358.

Gallavotti G (1983) The Elements of Mechanics. New York: Springer.

Goodstein DL and Goodstein JR (1997) Feynman’s Lost Lecture.
London: Vintage.

Guckenheimer J (2004) Bifurcations of relaxation oscillations. In: 
Ilyashenko Y, Rousseau C, and Sabidussi G (eds.) Normal Forms, 
Bifurcations and Finiteness Problems in Differential Equations. 
Séminaire de mathématiques supérieures de Montréal, Nato 
Sciences Series, II. Mathematics, vol. 137, pp. 295-316. Kluwer. 

Haken H (1983) Synergetics, 3rd edn. Berlin: Springer. 

Keener J and Sneyd J (1998) Mathematical Physiology. Inter- 
disciplinary Applied Mathematics, vol. 8. New York: Springer. 

Malgrange B (1974) Intégrales asymptotiques et monodromie.
Annales de l’ENS 7: 405-430.

May R-M (1976) Simple mathematical models with very 
complicated dynamics. Nature 261: 459-467. 

Nekhoroshev V (1977) An exponential estimate of the time of 
stability of nearly integrable Hamiltonian systems. Russian 
Mathematical Surveys 32(6): 1-65. 

Palis J and de Melo W (1982) Geometric Theory of Dynamical 
Systems, An Introduction. New York: Springer. 

Perko L (2000) Differential Equations and Dynamical Systems, 3rd
edn, Texts in Applied Mathematics, vol. 7. New York: Springer.

Siegel C-L and Moser J (1971) Lectures on Celestial Mechanics,
Die Grundlehren der mathematischen Wissenschaften,
vol. 187. Berlin: Springer.

Smale S (1998) Mathematical problems for the next century. The 
Mathematical Intelligencer 20: 7-15. 

Smale S. Dynamics retrospective: great problems, attempts that 
failed. Physica D 51: 267-273. 


Sobolev Spaces see Inequalities in Sobolev Spaces 


Solitons and Kac—Moody Lie Algebras 


E Date, Osaka University, Osaka, Japan 
© 2006 Elsevier Ltd. All rights reserved. 


Introduction 


Solitons and Kac-Moody Lie algebras were born at 
almost the same time in the 1960s, although they 
did not have a connection at first. They both have 
roots in the history of mathematics. From the 1970s 
on, they became intersection points for many 
(previously known and new) results. 

The notion of solitons has many facets and it is 
difficult to give a mathematically precise definition; 
closely related to solitons is the notion of “com- 
pletely integrable systems.” The latter is usually used 
in a much broader sense. 

The terminology “soliton” was originally used for 
a particular phenomenon in shallow water waves. 
Now, in its broadest sense, it is used to represent an 


area of research relating to this particular phenom- 
enon in direct or indirect ways. From the viewpoint 
of solitons, particular solutions of differential 
equations are of special interest. Although particular 
solutions have been studied for a long time, interest 
in them was overshadowed by the method of 
functional analysis in the 1950s. In the late nine- 
teenth century, in parallel with the theory 
of algebraic functions, several studies undertook 
the solution of mechanical problems by elliptic or 
hyperelliptic integrals. Subsequently, however, there 
was a drop in activity in this area of work. 

Originally it was hoped that this kind of phenom- 
enon could be used for practical applications. No 
mention of practical application of solitons will be 
made in this article. 

First we list several topics which constitute the 
main body of the notion of solitons in the early 
stages; we will then explain relations with Kac- 
Moody Lie algebras. 


Birth of Solitons 


The name “soliton” itself was coined by Martin D
Kruskal around 1965. It was originally employed for
the solitary wave solution of the Korteweg-de Vries (KdV)
equation


u_t - \frac{1}{4}(6uu_x + u_{xxx}) = 0,  u = u(x, t)  [1]


The coefficients here are not important. We can
change them arbitrarily. The unknown function u,
or rather -u, represents the height of the wave.
The solitary wave solution in question is given by


u(x, t) = -2c \,\mathrm{sech}^2(\sqrt{c}(x - ct - d))  [2]


This is a traveling-wave solution with the height of 
the wave proportional to the speed. This is one 
feature of the nonlinearity of this differential 
equation. 
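The relation between height and speed can be checked numerically. Since the coefficients may be changed arbitrarily, the sketch below assumes the widely used normalization u_t + 6uu_x + u_{xxx} = 0, which differs from [1] by rescalings and sign changes of u, x, and t. In that normalization the solitary wave is u = (c/2) sech^2((\sqrt{c}/2)(x - ct - d)), and the code evaluates the residual of the equation with central finite differences. Function names and step sizes are illustrative.

```python
import math


def soliton(x, t, c, d=0.0):
    """Solitary wave of u_t + 6*u*u_x + u_xxx = 0 (assumed normalization)."""
    return 0.5 * c / math.cosh(0.5 * math.sqrt(c) * (x - c * t - d)) ** 2


def kdv_residual(u, x, t, h=1e-3):
    """Central-difference estimate of u_t + 6*u*u_x + u_xxx at (x, t)."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
    u_xxx = (u(x + 2 * h, t) - 2 * u(x + h, t)
             + 2 * u(x - h, t) - u(x - 2 * h, t)) / (2 * h ** 3)
    return u_t + 6 * u(x, t) * u_x + u_xxx


grid = [-10 + 0.25 * i for i in range(81)]
for c in (1.0, 2.0):
    profile = lambda x, t, c=c: soliton(x, t, c)
    worst = max(abs(kdv_residual(profile, x, 0.3)) for x in grid)
    # The peak height c/2 grows linearly with the speed c.
    print("speed", c, "height", soliton(0.0, 0.0, c), "max residual", worst)
```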

A reason for this nomenclature comes from the
particle-like property of solitary waves observed via
numerical computations. That is, if we have two
solitons [2] with different speeds, with the faster one 
on the left and the slower one on the right, then after 
some time they collide and their shapes are distorted. 
After a long enough time, they are separated and 
recover their original shapes, the only difference 
being in the change of the phase shift d in [2]. 

Solitary waves in shallow water (like a canal) 
were first observed by Scott Russell in Scotland in 
the middle of the nineteenth century. Differential 
equations which possess solitary waves in shallow 
water as solutions were sought after Scott Russell’s 
report. Boussinesq derived one (now called the
Boussinesq equation, which contains second partial
derivatives with respect to time) from the Euler
equations for water waves; then in 1895 Korteweg and
his student de Vries derived the KdV equation. They
also showed that the KdV equation possesses 
solutions expressible in terms of elliptic functions. 

In the 1960s Kruskal and Zabusky carried out 
numerical computations for the Fermi—Pasta-Ulam 
problem; they also came across the KdV equation 
and found the aforementioned phenomenon. 


Inverse-Scattering Method 


Kruskal and his co-workers further pursued the 
origin of the particle-like property of solitons and 
proposed the so-called inverse-scattering method. 
The inverse problem of scattering theory of the 
one-dimensional Schrödinger operator


-\left(\frac{d}{dx}\right)^2 + u(x)


was studied by Gelfand-Dikii, Marchenko, and
Krein in the 1950s, motivated by scattering theory 
in quantum mechanics. 

It gives a one-to-one correspondence between rapidly
decreasing potentials u(x) and scattering data, which
consist of the discrete eigenvalues -\eta_j^2, the normalizations
c_j, j = 1, ..., n, of the eigenfunctions corresponding to
them, and the reflection coefficient r(\xi). The reflection
coefficient represents the ratio of reflection of the unit
plane wave e^{i\xi x} by the potential field. The scattering
data {r(\xi), \eta_j, c_j, j = 1, ..., n} are a mathematical idealization
of observable data in quantum scattering. The
procedure of reconstructing a potential from given
scattering data is called the inverse problem. The heart
of this procedure is solving an integral equation (the
Gelfand-Dikii-Marchenko equation). In the reflectionless
case (r(\xi) = 0), this integral equation reduces to a
system of linear algebraic equations.

Kruskal and co-workers found that the scattering 
data of these operators with solutions of [1] as 
potentials depend very simply on t: 


\eta_j(t) = \eta_j(0),  c_j(t) = c_j(0) e^{2\eta_j^3 t},
r(\xi, t) = r(\xi, 0) e^{2i\xi^3 t}  [3]


It was realized at the same time that soliton solutions
correspond to a reflectionless potential (r(\xi) = 0) with
only one discrete eigenvalue, while reflectionless
potentials correspond to a nonlinear “superposition”
of soliton solutions (called multisoliton solutions) and
describe the interaction of solitons.

As was pointed out by Zakharov and others, the
inverse-scattering method has an intimate relation
with the Riemann-Hilbert problem.


Lax Representation 


Looking at this invariance of the spectrum, Lax 
reformulated the KdV equation [1] as an evolution 
equation for the one-dimensional Schrödinger operator: 


\frac{dL}{dt} = [A, L],

L = \left(\frac{d}{dx}\right)^2 + u,  A = \left(\frac{d}{dx}\right)^3 + \frac{3}{2} u \frac{d}{dx} + \frac{3}{4} u_x  [4]

Here we have changed the sign of the operator for
later convenience. This form of representation
together with the inverse-scattering method gave a
framework for finding nonlinear differential (difference)
equations that have solutions with properties
similar to solitons (soliton equations).
Among such are the sine-Gordon equation


u_{tt} - u_{xx} = \sin u


the nonlinear Schrödinger equation

i u_t + u_{xx} + |u|^2 u = 0

the modified KdV equation


u_t - \frac{1}{4}(6u^2 u_x + u_{xxx}) = 0


the Toda lattice equation


\ddot{q}_n = -\exp(q_n - q_{n+1}) + \exp(q_{n-1} - q_n)  [5]


and so on. The first three are obtained by replacing
L by a 2 \times 2 matrix differential operator of first
order. For eqn [5], the linear operator corresponding 
to L in the case of the KdV equation is a difference 
operator of order 2 and has a connection with the 
theory of orthogonal polynomials in one variable as 
well as with the theory of moment problems. 
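The Toda lattice is easy to explore numerically. The following sketch integrates eqn [5] for a finite chain with free ends, which is an assumption made here so that the example is self-contained, and monitors two of its conserved quantities: the total momentum and the energy H = \sum p_n^2/2 + \sum \exp(q_n - q_{n+1}). The integrator is velocity Verlet, and the initial data are arbitrary illustrative values.

```python
import math


def toda_force(q):
    """Right-hand side of eqn [5] for a finite chain with free ends (assumed)."""
    n = len(q)
    f = [0.0] * n
    for i in range(n):
        if i + 1 < n:
            f[i] -= math.exp(q[i] - q[i + 1])
        if i > 0:
            f[i] += math.exp(q[i - 1] - q[i])
    return f


def energy(q, p):
    """H = sum p_n**2 / 2 + sum exp(q_n - q_{n+1})."""
    kinetic = 0.5 * sum(v * v for v in p)
    potential = sum(math.exp(q[i] - q[i + 1]) for i in range(len(q) - 1))
    return kinetic + potential


def verlet(q, p, dt, steps):
    """Velocity-Verlet integration of the chain; returns the final (q, p)."""
    q, p = list(q), list(p)
    f = toda_force(q)
    for _ in range(steps):
        p = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]
        q = [qi + dt * pi for qi, pi in zip(q, p)]
        f = toda_force(q)
        p = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]
    return q, p


q0, p0 = [0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, -1.0]
q1, p1 = verlet(q0, p0, dt=1e-3, steps=5000)
print("energy drift:  ", energy(q1, p1) - energy(q0, p0))
print("momentum drift:", sum(p1) - sum(p0))
```

Energy and momentum are only the first two of the conserved quantities; the remaining ones come from the spectrum of the difference operator mentioned above and are not computed in this sketch.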

Later it was remarked that the differential
operator A in eqn [4] is nothing but the differential
operator part of the fractional power of
L: A = (L^{3/2})_+. By replacing A in [4] by (L^{(2n+1)/2})_+
we obtain higher (nth) KdV equations.


Basic Representations of Affine 
Lie Algebras 


In the 1960s Kac and Moody introduced indepen- 
dently a class of infinite-dimensional Lie algebras 
which are in many respects close to finite-dimensional 
semisimple Lie algebras. Each of them is constructed 
for a given generalized Cartan matrix (GCM), 


C = (a_{ij}),  a_{ij} \in \mathbb{Z},  a_{ii} = 2,  a_{ij} \leq 0 for i \neq j,
and if a_{ij} = 0 then a_{ji} = 0  [6]


There is a special class of Kac-Moody Lie algebras 
that are now called affine Lie algebras. They 
correspond to positive-semidefinite GCM and are 
realized as central extensions of loop algebras 
(current algebras) 


\mathbb{C}[\lambda, \lambda^{-1}] \otimes \mathfrak{g}


of finite-dimensional semisimple Lie algebras g. 
They have many applications in physics, in parti- 
cular as current algebras. The Sugawara construc- 
tion in current algebra plays an essential role in 
conformal field theory. Note that finite-dimensional 
semisimple Lie algebras correspond to positive- 
definite GCMs. 
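The conditions [6] and the distinction between positive-definite and positive-semidefinite matrices can be tested mechanically on small examples. The sketch below checks the GCM axioms and, for symmetric indecomposable matrices only (a simplifying assumption), classifies the type from the leading principal minors: all positive means finite type, all proper ones positive with vanishing determinant means affine type. The function names are hypothetical.

```python
from fractions import Fraction


def is_gcm(a):
    """Check the conditions [6] for a square integer matrix given as lists."""
    n = len(a)
    for i in range(n):
        if len(a[i]) != n or a[i][i] != 2:
            return False
        for j in range(n):
            if not isinstance(a[i][j], int):
                return False
            if i != j and (a[i][j] > 0 or (a[i][j] == 0) != (a[j][i] == 0)):
                return False
    return True


def det(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    n, result = len(m), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            m[r] = [x - factor * y for x, y in zip(m[r], m[c])]
    return result


def leading_minors(a):
    """Determinants of the upper-left k x k blocks, k = 1, ..., n."""
    return [det([row[:k] for row in a[:k]]) for k in range(1, len(a) + 1)]


def classify_symmetric(a):
    """Type of a symmetric indecomposable GCM (sketch; other cases untreated)."""
    minors = leading_minors(a)
    if all(m > 0 for m in minors):
        return "finite"
    if all(m > 0 for m in minors[:-1]) and minors[-1] == 0:
        return "affine"
    return "other"


A2 = [[2, -1], [-1, 2]]                            # finite type A_2
A1_affine = [[2, -2], [-2, 2]]                     # affine type A_1^(1)
A2_affine = [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]  # affine type A_2^(1)
for name, m in (("A2", A2), ("A1^(1)", A1_affine), ("A2^(1)", A2_affine)):
    print(name, is_gcm(m), classify_symmetric(m))
```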

In the late 1970s, there was interest in construct- 
ing representations of these algebras after the 
general theory of representations was constructed. 


Among them was the work of Lepowsky-Wilson,
who constructed basic representations of the affine
Lie algebra A_1^{(1)} = \widehat{\mathfrak{sl}}_2 using differential operators of
infinite order in infinitely many variables. These
operators were called vertex operators by Garland, 
in view of the resemblance to objects in string 
theory. Character formulas for these new Lie 
algebras were intensively studied and many combi- 
natorial identities were (re)derived. 


Geometric Interpretation 


How do Kac—Moody Lie algebras enter into this 
picture? 

In the early stages of the history of solitons 
Kac-Moody Lie algebras appeared rather artifi- 
cially. Some authors tried to understand solitons 
from geometric viewpoints. A typical example is the 
sine-Gordon equation. This equation appears as the
Gauss-Codazzi equation in the theory of embeddings
of two-dimensional surfaces of constant negative
curvature into three-dimensional Euclidean space,
while the Gauss-Weingarten equation is the linear
equation that appears in the Lax representation of
the sine-Gordon equation. Another approach of a
geometric nature, involving the prolongation struc- 
ture, was the direction initiated by Wahlquist- 
Estabrook. In this approach, the Lie algebra 
appeared in a natural way, although the nature of 
such Lie algebras was not so clear. This direction of 
research is close in spirit to the method of Cartan for 
treating partial differential equations. 

Several authors considered generalizations of the 
Toda lattice equation. Bogoyavlenskii and others 
observed that the original Toda lattice equation [5] 
is related to the Cartan matrix of the affine Lie 
algebra of type A. Viewed in this way, it was
straightforward to generalize the Toda lattice
equation to Cartan matrices of other types of
affine Lie algebras and also to ordinary Cartan
matrices. These were typical appearances of Kac-Moody
Lie algebras in the theory of solitons; they
were used to produce soliton equations. The climax 
of this is the work of Drinfel’d—Sokolov. 

It took some time to understand another role of
affine Lie algebras in the theory of solitons.


Bäcklund Transformation


In the theory of two-dimensional surfaces of
constant negative curvature, a method of obtaining
another surface of constant negative curvature from
a given one, depending on a parameter, was known from
the work of Bäcklund. If we apply this to the trivial
solution u = 0 of the sine-Gordon equation, we
obtain a one-soliton solution of the sine-Gordon
equation. For this reason, a transformation of
solutions of soliton equations into other solutions is
called a Bäcklund transformation. The original
Darboux transformation is a special case of a
Bäcklund transformation.


Hamiltonian Formalism 


Another discovery of Gardner—Greene-Kruskal- 
Miura was the Hamiltonian structure of the KdV 
equation. In the process of showing the existence of 
infinitely many conservation laws, they used the 
so-called Miura transformation, which relates the 
KdV and the modified KdV equation. Faddeev— 
Zakharov showed that the transformation to 
scattering data is a canonical transformation, and 
conserved quantities are obtained from the expan- 
sion of the reflection coefficients. 

Gelfand—Dikii studied Hamiltonian structures of 
the KdV equation using the formal variational 
calculus they initiated. 

M Adler was the first to try to study the KdV 
equation by using the orbit method known for 
finite-dimensional Lie algebras. It was known by the 
works of Kostant and Kirillov or even earlier by Lie 
that the co-adjoint orbits of Lie algebras admit 
symplectic structures (the Kostant—Kirillov bracket). 
Adler considered the algebra of pseudodifferential
operators in one variable. This acquires the structure
of a Lie algebra under the commutator. This
algebra admits a natural triangular decomposition
by order. He showed that the KdV equation can be
viewed as a Hamiltonian system on the co-adjoint
orbit of the one-dimensional Schrödinger operator
with the Kostant-Kirillov bracket. By introducing
the notion of residue of pseudodifferential operators
he rederived conserved quantities. The work of
Drinfel’d-Sokolov can be regarded as a thorough
generalization of this direction. Hamiltonian struc- 
tures of the KdV equation and other soliton 
equations are now understood in this way. 

The method is also applicable to finite-dimensional 
Lie algebras. Symes, Kostant, and others treated the 
finite Toda lattice in this way. 

The motion of tops, including that of Kovalevs- 
kaya, was also studied in this way. 


Hirota’s Method 


There was another approach to soliton equations, quite 
different from the above. This was the method initiated 
by Hirota. He placed stress on the form of multisoliton 
solutions of the KdV equation, the sine-Gordon
equation, and so on. He made a dependent-variable
transformation of the KdV equation [1],


u = 2 \left(\frac{\partial}{\partial x}\right)^2 \log f


This form naturally arises when we reconstruct the 
potential of the one-dimensional Schrödinger 
operator from the scattering data by solving the 
Gelfand—Dikii—Marchenko integral equation. In this 
new dependent variable, eqn [1] takes the following 
form: 


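(D_x^4 - 4 D_x D_t) f(x, t) \cdot f(x, t) = 0


where the operator D_x is defined by


D_x^m f \cdot g = (\partial_x - \partial_{x'})^m f(x) g(x') |_{x'=x}

and D_t is defined in the same way with respect to t.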


This operator is called Hirota’s bilinear differential 
operator. In such transformed form, he tried to solve 
the resulting equation in a perturbative way, 


f = 1 + \sum_{j=1}^{n} \exp(2p_j x + 2p_j^3 t + q_j)

  + \sum_{1 \leq j < k \leq n} c_{jk} \exp(2(p_j + p_k)x + 2(p_j^3 + p_k^3)t + q_j + q_k) + \cdots  [8]


It is rather miraculous that in the soliton equation 
case we can truncate such a perturbative procedure 
at a finite point. The number of steps corresponds to 
the number of solitons. 
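The truncation can be verified numerically in the two-soliton case. The sketch below takes the n = 2 form of the expansion [8], assumes the interaction coefficient c_{12} = ((p_1 - p_2)/(p_1 + p_2))^2 (the value obtained by inserting [8] into the bilinear equation; it is not stated explicitly in the text), forms u = 2 (\partial/\partial x)^2 \log f analytically, and evaluates the residual of eqn [1] by central finite differences. Function names and the values of p_1, p_2 are illustrative.

```python
import math


def tau_two_soliton(x, t, p1, p2, q1=0.0, q2=0.0):
    """Return f, f_x, f_xx for the n = 2 truncation of the expansion [8]."""
    c12 = ((p1 - p2) / (p1 + p2)) ** 2       # assumed interaction coefficient
    terms = [
        (1.0, 0.0, 0.0, 0.0),
        (1.0, 2 * p1, 2 * p1 ** 3, q1),
        (1.0, 2 * p2, 2 * p2 ** 3, q2),
        (c12, 2 * (p1 + p2), 2 * (p1 ** 3 + p2 ** 3), q1 + q2),
    ]
    f = fx = fxx = 0.0
    for coeff, k, w, q in terms:
        e = coeff * math.exp(k * x + w * t + q)
        f += e
        fx += k * e
        fxx += k * k * e
    return f, fx, fxx


def u_two_soliton(x, t, p1=0.6, p2=1.0):
    """u = 2 (d/dx)^2 log f, written as 2 (f f_xx - f_x^2) / f^2."""
    f, fx, fxx = tau_two_soliton(x, t, p1, p2)
    return 2.0 * (f * fxx - fx * fx) / (f * f)


def kdv_residual(u, x, t, h=1e-3):
    """Central-difference estimate of u_t - (1/4)(6 u u_x + u_xxx) at (x, t)."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
    u_xxx = (u(x + 2 * h, t) - 2 * u(x + h, t)
             + 2 * u(x - h, t) - u(x - 2 * h, t)) / (2 * h ** 3)
    return u_t - 0.25 * (6 * u(x, t) * u_x + u_xxx)


grid = [(-8 + 0.5 * i, t) for i in range(33) for t in (-1.0, 0.0, 1.0)]
worst = max(abs(kdv_residual(u_two_soliton, x, t)) for x, t in grid)
print("max residual of eqn [1] on the grid:", worst)
```

With a single exponential term the same construction gives u = 2p^2 sech^2(px + p^3 t + q/2), so each p_j fixes both the height and the speed of one soliton.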

Most of the soliton equations are rewritten in 
bilinear form with such bilinear differentiation after 
a suitable dependent-variable transformation. (Some 
equations need several new dependent variables.) 
Once we have a differential equation in Hirota’s 
bilinear differential form, it always has two-soliton 
solutions. 

Up to 1980, keywords characterizing solitons
were: inverse-scattering method, Bäcklund transformation,
multisolitons, Hirota’s method, quasiperiodic
solutions, etc. No explicit mention was
made of representation theory.


Hierarchy of Soliton Equations 


As was stated above, soliton equations viewed as 
Hamiltonian systems have infinitely many conserva- 
tion laws. This implies that we can introduce infinitely 
many independent time variables consistently. From 
this viewpoint, it is natural to consider the KdV 
equation and its higher-order analogs simultaneously. 
They have many properties in common. For example, 
the t-dependence of the scattering data of the higher
KdV equation is given by replacing \xi^3 by \xi^{2n+1} and \eta^3
by \eta^{2n+1} in eqn [3]. The totality of soliton equations
organized in this way is called a hierarchy of soliton
equations; in the KdV case, it is called the KdV
hierarchy. This notion of hierarchy was introduced by
M Sato. He tried to understand the nature of the
bilinear method of Hirota. First, he counted the
number of Hirota bilinear operators of given degree
for hierarchies of soliton equations. For the number of
bilinear equations, M Sato and Y Sato made extensive
computations and made many conjectures that involve
enumeration of partitions.


Kadomtsev-Petviashvili Hierarchy 


Although it was included in a family of soliton 
equations slightly later, the Kadomtsev—Petviashvili 
(KP) equation is a soliton equation in three 
independent variables, which first appeared in 
plasma physics: 


\frac{3}{4} u_{yy} - \left(u_t - \frac{1}{4}(6uu_x + u_{xxx})\right)_x = 0  [9]


For this equation we have to replace the Lax 
representation by 


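\left[\frac{\partial}{\partial y} - B_2, \frac{\partial}{\partial t} - B_3\right] = 0,

B_2 = \left(\frac{\partial}{\partial x}\right)^2 + u,  B_3 = \left(\frac{\partial}{\partial x}\right)^3 + \frac{3}{2} u \frac{\partial}{\partial x} + w  [10]

where w is an auxiliary function determined by u through the compatibility condition.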


This form of representation was introduced by 
Zakharov-Shabat. Sometimes it is referred to as 
the zero-curvature representation or the Zakharov- 
Shabat representation. The KP equation is universal 
in the sense that it contains the KdV equation [1] 
and the Boussinesq equation as special cases. If u 
does not depend on y, resp. t, this gives the KdV, 
resp. the Boussinesq equation. 


Work of Sato 


Sato stressed the importance of the study of the KP 
equation. He first introduced the KP hierarchy. 
Instead of the one-dimensional Schrödinger operator 
in the KdV case consider a pseudo- (micro) 
differential operator of first order, 


L = \partial + u_2(x)\partial^{-1} + u_3(x)\partial^{-2} + \cdots,  \partial = \frac{\partial}{\partial x_1}  [11]

Setting B_n = (L^n)_+, the KP hierarchy is defined by
the Zakharov-Shabat representation


\left[\frac{\partial}{\partial x_m} - B_m, \frac{\partial}{\partial x_n} - B_n\right] = 0

If we assume that L^2 is a differential operator, we
have the KdV hierarchy, and the constraint that L^3 is
a differential operator gives the Boussinesq
hierarchy. This process is called reduction.

Sato found that character polynomials (Schur 
functions) solve the KP hierarchy and, based on 
this observation, he created the theory of the 
infinite-dimensional (universal) Grassmann manifold 
and showed that the Hirota bilinear equations are
nothing but the Plücker relations for this Grassmann
manifold.

Sato also gave an (infinite-dimensional) determinantal
formula for Hirota’s dependent variable and
called the latter the \tau-function. Using this
\tau-function, the wave function (the eigenfunction
corresponding to the KP hierarchy) is expressed as


w(x, k) = \exp\left(\sum_{j=1}^{\infty} x_j k^j\right) \frac{\tau(x - \epsilon(k^{-1}))}{\tau(x)},

\epsilon(k) = \left(k, \frac{k^2}{2}, \frac{k^3}{3}, \ldots\right)  [12]


Lw = kw


where L is given by eqn [11].


Affine Lie Algebras as Infinitesimal 
Transformation Groups for Soliton 
Equations 


Date—Jimbo-—Kashiwara—Miwa found another rela- 
tion among soliton equations and affine Lie alge- 
bras. After noticing some similarity between the 
formula in the paper by Lepowsky—Wilson on the 
Rogers—Ramanujan identity using the vertex opera- 
tors for All) and the formula in the computation of 
numbers of bilinear operators in Sato’s paper, they 
applied the vertex operator for AY, 


X(p) = exp © saag?) 


j=l 


2 o 
x exp -opm 5) 


j=l 


to 1 (which is the simplest $\tau$-function for the KdV
hierarchy), where p is a parameter. They found that
the result is the $\tau$-function corresponding to the
one-soliton solution of the KP hierarchy. They also
found that successive application of X(p)’s to 1
produced all multisoliton $\tau$-functions. Therefore,
applications of vertex operators are precisely
Bäcklund transformations. This implies that the
affine Lie algebra $A_1^{(1)}$ is the infinitesimal
transformation group for solutions of the KdV hierarchy.
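The simplest case can be checked numerically. Applying X(p) to 1 gives a $\tau$-function of the form $1 + c\exp(2(px_1 + p^3x_3))$. The sketch below assumes the common relation $u = 2\,\partial_x^2 \log\tau$ and the normalization $4u_t = 6uu_x + u_{xxx}$ with $x = x_1$, $t = x_3$. Neither is fixed in this section, so both are assumptions, and the function names are illustrative only.

```python
import math

def soliton_u(x, t, p=0.7, c=1.0):
    """u = 2 (log tau)_xx for tau = 1 + c*exp(2*(p*x + p**3*t)), in closed form."""
    e = c * math.exp(2.0 * (p * x + p**3 * t))   # tau = 1 + e
    return 8.0 * p**2 * e / (1.0 + e) ** 2

def kdv_residual(x, t, p=0.7, c=1.0, h=1e-3):
    """Central-difference estimate of 4*u_t - 6*u*u_x - u_xxx (assumed normalization)."""
    f = lambda a, b: soliton_u(a, b, p, c)
    u_t = (f(x, t + h) - f(x, t - h)) / (2 * h)
    u_x = (f(x + h, t) - f(x - h, t)) / (2 * h)
    u_xxx = (f(x + 2 * h, t) - 2 * f(x + h, t)
             + 2 * f(x - h, t) - f(x - 2 * h, t)) / (2 * h**3)
    return 4 * u_t - 6 * f(x, t) * u_x - u_xxx

# Largest residual over a small grid of sample points at t = 0.3.
worst = max(abs(kdv_residual(0.25 * i, 0.3)) for i in range(-12, 13))
print(f"largest residual on the sample grid: {worst:.2e}")
```

The residual is limited only by finite-difference error, which is consistent with the one-soliton profile $u = 2p^2\,\mathrm{sech}^2(px + p^3t + \text{const})$ solving the equation in this normalization.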

After this discovery, it was realized that the
totality of $\tau$-functions of the KdV hierarchy is
the group orbit of the highest weight vector (=1)
of the basic representation of $A_1^{(1)}$.

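The vertex operators for the KP hierarchy were
also found:

$$X(p,q) = \exp\left(\sum_{j=1}^{\infty} x_j\left(p^j - q^j\right)\right) \exp\left(-\sum_{j=1}^{\infty} \frac{1}{j}\left(p^{-j} - q^{-j}\right)\frac{\partial}{\partial x_j}\right)$$

If we put $q = -p$, the vertex operator for $A_1^{(1)}$ ([12])
is recovered.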

Viewed in this way, the Lie algebra corresponding
to the KP hierarchy is $\mathfrak{gl}_\infty$ ($= A_\infty$), and an
embedding of $A_1^{(1)}$ into $A_\infty$ was also found. Subsequently,
the method using free fermions (Clifford algebras)
was established. Frenkel-Kac had already used free
fermions to construct basic representations. In this
approach, the $\tau$-functions are defined as vacuum
expectation values. Based on this connection with
affine Lie algebras, many conjectures of Sato on the
number of bilinear equations were (re)proved by using
specialized characters of affine Lie algebras.

The use of free fermions was exploited by
Ishibashi-Matsuo-Ooguri to relate soliton equations
with conformal field theory on Riemann surfaces.
This aspect was further studied by
Tsuchiya-Ueno-Yamada using D-modules.

Once such a viewpoint was established, it was
easy to construct soliton equations corresponding to
other affine Lie algebras. Hierarchies similar to the
KP hierarchies (the simplest equation contains three
variables) were also found, which correspond to Lie
algebras like $\mathfrak{go}_\infty$, $\mathfrak{sp}_\infty$ (the BKP hierarchy, the CKP
hierarchy, and so on).

Summarizing these developments, we can say that
affine Lie algebras, or slightly larger ones like $\mathfrak{gl}_\infty$,
appear naturally as infinitesimal transformation
groups for soliton equations and the solution spaces
are the (completed) group orbits of highest weight
vector $\tau$-functions of level-1 representations. The
Hirota bilinear equations are the equations describing
these orbits (analogs of Plücker relations).

Soon afterwards, the notion of $\tau$-functions was
introduced in the study of Painlevé equations by
Okamoto, revealing Hamiltonian structures in
Painlevé equations.


The Method of Drinfeld-Sokolov


The KdV or the KP hierarchies are related to scalar 
linear differential operators. A parallel treatment 
using matrix differential operators is also possible. 
In fact, the nonlinear Schrödinger equation, modi- 
fied KdV equation, the sine-Gordon equation, etc., 
are treated in this way. 

Drinfel’d and Sokolov gave a general framework 
along these lines. The first step is to choose the starting 
(matrix-valued) linear differential operator of order 
one. For that they use the language of Lie algebras. 

Let us start with a matrix realization of a Lie 
algebra (for an affine Lie algebra, the elements are 
Laurent polynomials in one variable). Consider a 
linear differential operator of the following form: 


$$L = \frac{d}{dx} + q(x) + \Lambda$$

where q(x) is an element of the Borel subalgebra and $\Lambda$
is a sum of positive Chevalley generators in the case of
affine Lie algebras. By using gauge transformations
(adjoint group), they consider several normal forms.
One normal form is obtained by choosing a node of 
the corresponding Dynkin diagram. The resulting 
matrix system is equivalent to the one obtained by 
scalar Lax representation (or a slight generalization of 
it). In this way, the generalized KdV equations for 
affine Lie algebras are obtained. Another normal form
is to make q $\mathfrak{h}$-valued, where $\mathfrak{h}$ is the Cartan
subalgebra. Soliton equations obtained in
this way are called the modified KdV equations. This is
a generalization of the Miura transformation. They
also comment on the construction of partially mod- 
ified soliton equations, which correspond to taking 
various parabolic subalgebras. The Hamiltonian 
formalism is also treated from their viewpoint. 

In summary, in their approach affine algebras are 
used to construct soliton equations, or one can say 
that they consider the space of initial values of 
soliton equations. 

They also discuss two-dimensional Toda lattices 
in their setting and show that modified equations in 
their sense are symmetries of the two-dimensional 
Toda lattices. 


Common Features of the Roles of Affine 
Lie Algebras in Solitons 


In the $\tau$-function approach as well as in the method
of Drinfeld-Sokolov, the existence of a triangular
decomposition of Lie algebras was essential. In the
former case, it was basic when considering
highest-weight representations and, for the latter, it was
used for the setup.
Special Solutions of Soliton Equations
(Multisoliton and Rational Solutions) 


One of the characteristic features of soliton equa- 
tions is that they allow rich special solutions. 
Multisoliton solutions were the starting point of 
the whole story. They directly relate to vertex 
operators of affine Lie algebras. 

Rational solutions (in terms of the $\tau$-function,
polynomial solutions) can be viewed as degenerations of
multisoliton solutions. Motions of poles (or zeros) of
the solutions are interesting. Airault-McKean-Moser
studied the motion of poles of rational solutions of
the KdV equation and found that they are identical to
the motion of particles on a line
(Calogero-Moser-Sutherland system). This viewpoint has now been
generalized by Veselov and others.

Another discovery of Sato was that polynomial
$\tau$-functions of the KP hierarchy are precisely Schur
functions (character polynomials).

In accordance with the process of reduction,
polynomial $\tau$-functions of the KdV hierarchy are
Schur functions of special type.
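Sato's statement can be made concrete with a short computation. The sketch below assumes the standard definitions, which are not spelled out in this section: elementary Schur polynomials $p_n$ defined by $\exp(\sum_j x_j k^j) = \sum_n p_n k^n$, and the Jacobi-Trudi determinant $\det(p_{\lambda_i - i + j})$. Function names are illustrative.

```python
def elementary_schur(x, nmax):
    """Values p_0..p_nmax defined by exp(sum_j x_j k^j) = sum_n p_n k^n.

    x[j-1] holds x_j; the recursion n*p_n = sum_j j*x_j*p_{n-j} follows
    from differentiating the generating function with respect to k.
    """
    p = [1.0]
    for n in range(1, nmax + 1):
        total = sum(j * x[j - 1] * p[n - j] for j in range(1, min(n, len(x)) + 1))
        p.append(total / n)
    return p

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def schur(partition, x):
    """Jacobi-Trudi determinant det(p_{lambda_i - i + j}), with p_n = 0 for n < 0."""
    size = len(partition)
    p = elementary_schur(x, max(partition) + size)
    entry = lambda n: p[n] if n >= 0 else 0.0
    return det([[entry(partition[i] - i + j) for j in range(size)]
                for i in range(size)])

x = [0.3, -1.1, 0.7]                      # sample values of (x_1, x_2, x_3)
print(schur([2, 1], x), x[0] ** 3 / 3 - x[2])
```

For the partition (2, 1) the determinant reduces to $x_1^3/3 - x_3$, which does not depend on $x_2$. This is the kind of special Schur function one expects after the reduction to the KdV hierarchy.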


Quasiperiodic Solutions of 
Soliton Equations 


As mentioned above, the KdV equation admits 
solutions expressible in terms of elliptic functions. 
Dubrovin—Novikov and Its—Matveev, almost at the 
same time, studied solutions of the KdV equation 
with periodic initial condition. 

To the Sturm-Liouville (i.e., one-dimensional
Schrödinger) operator with periodic potential

$$L = \left(\frac{d}{dx}\right)^2 + u(x), \qquad u(x + l) = u(x)$$

there corresponds the discriminant, which is an
entire function of the spectral parameter. Its zeros
represent the periodic and antiperiodic spectrum $\lambda_j$
of the operator:

$$L f_j(x) = \lambda_j f_j(x), \qquad f_j(x + l) = \pm f_j(x)$$


It turns out that, except for a finite number of zeros,
the other zeros are double. Such a potential is called a
finite-zone potential. These zones correspond to the
spectrum of the operator in the $L^2$-sense. To a
finite-zone potential u(x) there corresponds a hyperelliptic
curve

$$w^2 = \prod_{j} (\lambda - \lambda_j)$$

with simple zeros $\lambda_j$ of the discriminant as zeros of
polynomials defining the curve. If we consider the
Dirichlet boundary value problem for the operator L, 


$$L f = \mu_j f, \qquad f(s, \mu_j) = 0 = f(s + l, \mu_j)$$

the eigenvalues are discrete and each eigenvalue $\mu_j$ is
located in a zone:

$$\lambda_{2j-1} \le \mu_j(s) \le \lambda_{2j}$$

So, for the double zeros ($\lambda_{2j-1} = \lambda_{2j}$), the
corresponding Dirichlet eigenvalue $\mu_j(s)$ does not depend
on s.

Dubrovin-Novikov also showed that a finite-zone
potential is a stationary solution of the higher-order
KdV equation (the order being equal to the number
of nontrivial zones) and the n-zonal potentials form
a finite-dimensional integrable system. In other
words, the linear operators L and $A_n$ defining the
nth-order KdV equation commute,

$$[L, A_n] = 0$$


In passing, it was later found that such a pair of 
commuting linear differential operators was first 
studied by Burchnall-Chaundy in the 1920s. 
H F Baker remarked on the corresponding simulta- 
neous eigenfunctions by relating them to multi- 
plicative functions on algebraic curves. 
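A classical worked example of such a commuting pair, given here as an illustration under the convention $L = (d/dx)^2 + u$ and not taken from the article itself, uses the rational potential $u = -2/x^2$:

```latex
\begin{aligned}
L &= \frac{d^2}{dx^2} - \frac{2}{x^2}, \qquad
A = \frac{d^3}{dx^3} - \frac{3}{x^2}\frac{d}{dx} + \frac{3}{x^3},\\[4pt]
[L, A] &= 0, \qquad A^2 = L^3,\\[4pt]
\psi(x,k) &= \left(1 - \frac{1}{kx}\right)e^{kx}
\;\Longrightarrow\; L\psi = k^2\psi, \quad A\psi = k^3\psi .
\end{aligned}
```

The simultaneous eigenfunction $\psi$ carries the spectral parameter k, and the eigenvalues $(k^2, k^3)$ trace out the cuspidal curve $\mu^2 = \lambda^3$. This is a degenerate (rational) instance of the algebraic curves discussed here.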


The Work of Krichever 


Krichever reversed the above argument, utilizing the
properties of the corresponding eigenfunctions as
functions of the spectral parameter. In this approach,
we start with a compact Riemann surface C
(= nonsingular algebraic curve) of genus g. Here
we apply his method to the KP hierarchy. Take a
point $P_0$ on C together with the inverse of a local
parameter $k^{-1}$. Also take a general divisor $\delta$ on C of
degree g. Consider a function $\psi(x, P)$, $x = (x_1, x_2, \ldots)$,
with the following properties:


1. $\psi$ is meromorphic on $C \setminus P_0$ with the pole divisor
$\delta$, and
2. near $P_0$, $\psi$ behaves like

$$\psi(x, P) = \exp\left(\sum_{j=1}^{\infty} x_j k^j\right)\left(1 + O(k^{-1})\right)$$


Such a $\psi$ exists uniquely and can be constructed
using the theory of abelian integrals and the Jacobi
problems on algebraic curves. Such a function was
called the Baker-Akhiezer function, since Akhiezer
constructed it by using abelian integrals and Jacobi’s
problem in his study of moment problems
(orthogonal polynomials).

It was later realized that Schur had much earlier 
considered such functions in the study of ordinary 
differential equations. 

It is easy to show that such a function satisfies the 
following linear differential equations: 


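$$\frac{\partial \psi}{\partial x_n} = B_n\psi, \qquad B_n = \left(\frac{\partial}{\partial x_1}\right)^n + \sum_{j=0}^{n-2} u_{nj}(x)\left(\frac{\partial}{\partial x_1}\right)^j$$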


In this way, we obtain a solution of the KP 
hierarchy. 

If there exists a rational function f(P) on C with
poles only at $P_0$ with singular part $k^n$, $\psi$ can be
factorized as

$$\psi(x, P) = \exp\left(x_n f(P)\right)\psi'(x', P)$$

where x’ indicates the set of variables other than $x_n$.
Consequently, we have

$$\frac{\partial \psi}{\partial x_n}(x, P) = f(P)\,\psi(x, P)$$





In this way, for a hyperelliptic curve C and a
branch point of it, viewed as the double cover of
$\mathbb{CP}^1$, we recover the case of the KdV hierarchy.

Multisolitons correspond to rational algebraic 
curves with ordinary double points, while rational 
solutions correspond to further degeneration. 

The study of quasiperiodic solutions of soliton 
equations revealed an intimate relationship with 
the theory of algebraic curves. One particular out- 
come was the characterization of Jacobian varieties 
among abelian varieties. This was originally posed 
by Schottky and subsequently reformulated by 
S P Novikov using soliton equations (Schottky 
problem, Novikov conjecture). This problem was 
solved through studies by Shiota, Mulase, and 
Arbarello—De Concini. 

Another aspect was finding commutative subalge- 
bras in the ring of linear differential operators. This 
problem is related to the theory of stable vector 
bundles on algebraic curves. 


Similarity Solutions of Soliton Equations 


Ablowitz and Segur have shown that the Painlevé 
transcendent of the second kind solves the KdV 
equation as a similarity solution. This was the 
starting point of the study of similarity solutions of 
soliton equations. 

Flaschka and Newell tried to construct the theory of
multisimilarity solutions. As a by-product, they
discussed modulation of the KdV equation by using
the averaging method of Whitham. This opened the
way to the study of the quasiclassical limit of soliton
equations. This aspect was further studied by
Dubrovin and others in connection with topological field
theory.

Quite recently, Noumi and Yamada gave a
generalization of the Painlevé equation in many variables by
using the idea of similarity solutions of soliton
equations. In the work of Noumi-Yamada, the affine
Weyl group and $\tau$-functions play an essential role in
constructing generalizations of the Painlevé equation.
The shift or the unit of difference corresponds to
imaginary null roots of affine Lie algebras. The idea is
further applied to elliptic Painlevé equations.


Integrable Many-Body Problems 


As mentioned in relation with the rational solutions
of soliton equations, the theory of integrable
many-body problems has an intimate relationship with
the theory of solitons. Recently, Veselov and his
co-workers introduced the notion of Baker-Akhiezer
functions of many variables. This concerns a
commutative subring of differential operators in
many variables. The structure of vector bundles on
algebraic varieties of higher dimensions is quite
different from that of algebraic curves. For this
reason, a naive generalization of soliton equations to
higher dimensions is not possible. Veselov and
others have set up a class of functions which they
call multidimensional Baker-Akhiezer functions.
They are defined by giving a finite set of vectors in
a Euclidean space. The first problem is that of existence:
for the multidimensional Baker-Akhiezer function
to exist, the set must satisfy several
constraints. This is quite different from the case of
solitons. Root systems satisfy these constraints, and
the corresponding Baker-Akhiezer function becomes
the common eigenfunction of the linear differential
operators appearing in the Calogero-Sutherland-Moser
model corresponding to root systems.


Ball-Box Systems 


Satsuma-Takahashi found a soliton-like phenomenon
in cellular automata. A mathematical explanation
of this took a long time to emerge. It is now
understood that these systems are obtained by a limiting
procedure from soliton equations. Sometimes this is
called ultra-discretization. The system thus obtained
can also be obtained from the theory of crystal bases
of affine Lie algebras. They are now called ball-box
systems.
Other Topics


A quantized version of the inverse-scattering method
was initiated by Faddeev and his co-workers; it
makes a connection with two-dimensional solvable
lattice models and produced the notion of quantum
groups. Through the Bethe ansatz, another relation
between two-dimensional lattice models and ball-box
systems has been discussed.


See also: Affine Quantum Groups; Bäcklund
Transformations; Bi-Hamiltonian Methods in Soliton
Theory; Coherent States; Current Algebra; Integrable
Systems and Algebraic Geometry; Integrable Systems:
Overview; Multi-Hamiltonian Systems; Painlevé
Equations; Partial Differential Equations: Some Examples;
q-Special Functions; Recursion Operators in Classical
Mechanics; Sine-Gordon Equation; Toda Lattices.
Introduction


A soliton is a localized lump (or string or wall, etc.) 
of energy, which can move without distortion, 
dispersion, or dissipation, and which is stable under 
perturbations (and collisions with other solitons). The 
word was coined by Zabusky and Kruskal in 1965 to 
describe a solitary wave with particle-like properties 
(as in electron, proton, etc.). Solitons are relevant to 
numerous areas of physics — condensed matter, 
cosmology, fluids/plasmas, biophysics (e.g., DNA), 
nuclear physics, high-energy physics, etc. Mathema- 
tically, they are modeled as solutions of appropriate 
partial differential equations. 

Systems which admit solitons may be classified
according to the mechanism by which stability is
ensured. Such mechanisms include complete
integrability, nontrivial topology plus dynamical
balancing, and Q-balls/breathers.


Sometimes the term “soliton” is used in a 
restricted sense, to refer to stable localized lumps 
which have purely elastic interactions: solitons 
which collide without any radiation being emitted. 
This is possible only in very special systems, namely, 
those that are completely integrable. For these 
systems, soliton stability (and the elasticity of 
collisions) arises from a number of characteristic 
properties, including a precise balance between 
dispersion and nonlinearity, solvability by the 
inverse scattering transform from linear data, infi- 
nitely many conserved quantities, a Lax formulation 
(associated linear problem), and Bäcklund
transformations. Examples of such integrable soliton
systems are the sine-Gordon, Korteweg-de Vries, and
nonlinear Schrödinger equations.

The category of topological solitons is the most 
varied, and includes such examples as kinks, 
vortices, monopoles, skyrmions, and instantons. 
The requirement of dynamical balancing for these 
can be understood in terms of Derrick’s theorem, 
which provides necessary conditions for a classical 
field theory to admit static localized solutions. The
Derrick argument involves studying what
happens to the energy of a field when one changes
the scale of space. If one has a scalar field (or
multiplet of scalar fields) $\phi$, and/or a gauge field $F_{\mu\nu}$,
then the static energy E is the sum of terms such as

$$E_0 = \int V(\phi)\, d^n x, \qquad E_d = \int T_d(D_j\phi)\, d^n x, \qquad E_F = \int F_{jk} F_{jk}\, d^n x$$

where each integral is over (n-dimensional) space
$\mathbb{R}^n$, $D_j\phi$ denotes the covariant spatial derivative of $\phi$,
and $T_d(\xi_j)$ is a real-valued polynomial of degree d.
In particular, for example, we could have $T_2(D_j\phi) =
(D_j\phi)(D_j\phi)$, the standard gradient term. Under the
dilation $x^j \mapsto \lambda x^j$, these functionals transform as

$$E_0 \mapsto \lambda^{-n} E_0, \qquad E_d \mapsto \lambda^{d-n} E_d, \qquad E_F \mapsto \lambda^{4-n} E_F$$

In order to have a static solution (critical point of
the static energy functional), one needs to have a zero
exponent on $\lambda$, and/or a balance between positive and
negative exponents. A negative exponent indicates a
compressing force (tending to implode a localized
lump), whereas a positive exponent indicates an
expanding force; so to have a static lump solution,
these two forces have to balance each other. For
n = 1, a system involving only a scalar field, with
terms of the form $E_0$ and $E_2$, can admit static solitons
(e.g., kinks); the scaling argument implies a virial
theorem, which in this case says that $E_0 = E_2$. For
n = 2, one can have a scalar system with only $E_2$,
since in this case the relevant exponent is zero (e.g.,
the two-dimensional sigma model). Another n = 2
example is that of vortices in the abelian Higgs model,
where the energy contains terms $E_0$, $E_2$, and $E_F$. For
n = 3, interesting systems have $E_2$ together with either
$E_4$ (e.g., skyrmions) or $E_F$ (e.g., monopoles). An $E_0$
term is optional in these cases; its presence affects, in
particular, the long-range properties of the solitons.
For n = 4, one can have instantons in a pure gauge
theory (term $E_F$ only).
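The n = 1 case of the scaling argument can be checked numerically. The sketch below takes the dilated field to be $\phi_\lambda(x) = \phi(\lambda x)$ (an assumption about how the dilation acts on the field) and uses the profile $\phi = \tanh x$ with $V = \frac{1}{2}(1 - \phi^2)^2$ as a test case. The helper names are illustrative.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def scaled_energies(lam, cutoff=20.0):
    """Return (E_0, E_2) for the dilated profile phi(lam * x) = tanh(lam * x)."""
    gradient = lambda x: lam / math.cosh(lam * x) ** 2       # d/dx tanh(lam x)
    potential = lambda x: 1.0 - math.tanh(lam * x) ** 2      # W evaluated on the profile
    e2 = simpson(lambda x: 0.5 * gradient(x) ** 2, -cutoff, cutoff)
    e0 = simpson(lambda x: 0.5 * potential(x) ** 2, -cutoff, cutoff)
    return e0, e2

for lam in (0.5, 1.0, 2.0):
    e0, e2 = scaled_energies(lam)
    print(f"lambda={lam}: E0={e0:.6f}  E2={e2:.6f}  total={e0 + e2:.6f}")
```

With n = 1 the two terms scale as $\lambda^{-1}E_0$ and $\lambda E_2$. The total is stationary at $\lambda = 1$ exactly when $E_0 = E_2$, which is the virial relation quoted above.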

It should be noted that if there are no restrictions on
the fields $\phi$ and $A_j$ (such as those arising, e.g., from
nontrivial topology), then there is a more obvious mode
of instability, which will inevitably be present: $\phi \mapsto \mu\phi$
and/or $A_j \mapsto \mu A_j$, where $0 \le \mu \le 1$. In other words, the
fields can simply be scaled away altogether, so that the
height of the soliton (and its energy) go smoothly to
zero. This can be prevented by nontrivial topology.

Another way of preventing solitons from shrinking
is to allow the field to have some “internal” time
dependence, so that it is stationary rather than
static. For example, one could allow the complex
scalar field $\phi$ to have the form $\phi = \psi\exp(i\omega t)$, where
$\psi$ is independent of time t. This leads to something
like a centrifugal force, which can have a stabilizing
effect in the absence of Skyrme or magnetic terms.
The corresponding solitons are Q-balls.


Kinks and Breathers 


The simplest topological solitons are kinks, in
systems involving a real-valued scalar field $\phi(t,x)$ in
one spatial dimension. The dynamics is governed by
the Lagrangian density

$$\mathcal{L} = \frac{1}{2}\left[(\phi_t)^2 - (\phi_x)^2 - W(\phi)^2\right]$$

where $W(\phi)$ is a (fixed) smooth function. The system
can admit kinks if $W(\phi)$ has at least two zeros, for
example, W(A) = W(B) = 0 with $W(\phi) > 0$ for $A <
\phi < B$. Two well-known systems are: sine-Gordon
(where $W(\phi) = 2\sin(\phi/2)$, A = 0, and $B = 2\pi$) and $\phi^4$
(where $W(\phi) = 1 - \phi^2$, $A = -1$, and B = 1). The
corresponding field equations are the Euler-Lagrange
equations for $\mathcal{L}$; for example, the sine-Gordon equation is

$$\phi_{tt} - \phi_{xx} + \sin\phi = 0 \qquad [1]$$


Configurations satisfying the boundary conditions
$\phi \to A$ as $x \to -\infty$ and $\phi \to B$ as $x \to \infty$ are
called kinks (and the corresponding ones with
$x = \infty$ and $x = -\infty$ interchanged are antikinks).
For kink (or antikink) configurations, there is a
lower bound, called the Bogomol’nyi bound, on the
static energy $E[\phi]$; for kink boundary conditions,
we have

$$E[\phi] = \frac{1}{2}\int_{-\infty}^{\infty}\left[(\phi_x)^2 + W(\phi)^2\right] dx = \frac{1}{2}\int_{-\infty}^{\infty}\left[\phi_x - W(\phi)\right]^2 dx + \int_{-\infty}^{\infty} W(\phi)\,\phi_x\, dx \ge \int_A^B W(\phi)\, d\phi$$

with equality if and only if the Bogomol’nyi equation

$$\frac{d\phi}{dx} = W(\phi) \qquad [2]$$

is satisfied. A static solution of the Bogomol’nyi
equation is a kink solution; it is a static minimum
of the energy functional in the kink sector. For
example, for the sine-Gordon system, we get $E[\phi] \ge 8$,
with equality for the sine-Gordon kink

$$\phi(x) = 4\tan^{-1}\exp(x - x_0)$$

while for the $\phi^4$ system, we get $E[\phi] \ge 4/3$, with
equality for the phi-four kink

$$\phi(x) = \tanh(x - x_0)$$
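Both bounds can be confirmed by direct quadrature. The sketch below evaluates the static energy $\frac{1}{2}\int[(\phi_x)^2 + W(\phi)^2]\,dx$ for the two kinks with $x_0 = 0$ and measures how well each profile satisfies $d\phi/dx = W(\phi)$. The table `SYSTEMS` and the helper names are illustrative choices.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# (kink profile, its x-derivative, W) for the two systems, with x0 = 0.
SYSTEMS = {
    "sine-Gordon": (lambda x: 4 * math.atan(math.exp(x)),
                    lambda x: 2 / math.cosh(x),
                    lambda f: 2 * math.sin(f / 2)),
    "phi-four": (math.tanh,
                 lambda x: 1 / math.cosh(x) ** 2,
                 lambda f: 1 - f * f),
}

def kink_report(name, cutoff=30.0):
    """Return (static energy, largest violation of d(phi)/dx = W(phi))."""
    phi, dphi, W = SYSTEMS[name]
    energy = simpson(lambda x: 0.5 * (dphi(x) ** 2 + W(phi(x)) ** 2), -cutoff, cutoff)
    defect = max(abs(dphi(0.1 * i) - W(phi(0.1 * i))) for i in range(-100, 101))
    return energy, defect

for name in SYSTEMS:
    energy, defect = kink_report(name)
    print(f"{name}: E = {energy:.8f}, Bogomol'nyi defect = {defect:.1e}")
```

The energies agree with $\int_A^B W(\phi)\,d\phi$, which is $[-4\cos(\phi/2)]_0^{2\pi} = 8$ for sine-Gordon and $[\phi - \phi^3/3]_{-1}^{1} = 4/3$ for the $\phi^4$ system.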
These kinks are stable topological solitons; the
nontrivial topology corresponds to the fact that the
boundary value of $\phi(t,x)$ at $x = \infty$ is different from
that at $x = -\infty$. With trivial boundary conditions
(say $\phi \to A$ as $x \to \pm\infty$), stable static solitons are
unlikely to exist, but solitons with periodic time
dependence (which in this context are called
breathers) may exist. For example, the sine-Gordon
equation and the nonlinear Schrödinger equation
both admit breathers, but these owe their existence
to complete integrability. By contrast, the $\phi^4$ system
(which is not integrable) does not admit breathers; a
collision between a $\phi^4$ kink and an antikink (with
suitable impact speed) produces a long-lived state
which looks like a breather, but eventually decays
into radiation.

In lattice systems, however, breathers are more
generic. In a one-dimensional lattice system, the
continuous space $\mathbb{R}$ is replaced by the lattice $\mathbb{Z}$, so
$\phi(t,x)$ is replaced by $\phi_n(t)$, where $n \in \mathbb{Z}$. The
Lagrangian is

$$L = \frac{1}{2}\sum_{n}\left[(\dot{\phi}_n)^2 - h^{-2}(\phi_{n+1} - \phi_n)^2 - W(\phi_n)^2\right]$$

where h is a positive parameter, corresponding to the
dimensionless ratio between the lattice spacing and the
size of a kink. The continuum limit is $h \to 0$. This
system admits kink solutions as in the continuum case;
and for h large enough, it admits breathers as well, but
these disappear as h becomes small.

Interpreted in three dimensions, the kink becomes 
a domain wall separating two regions in which the 
order parameter ¢ takes distinct values; this has 
applications in such diverse areas as cosmology and 
condensed matter physics. 




Sigma Models and Skyrmions 


In a sigma model or Skyrme system, the field is a
map $\phi$ from spacetime to a Riemannian manifold M;
generally, M is taken to be a Lie group or a
symmetric space. The energy density of a static
field can be constructed as follows (the
Lorentz-invariant extension of this gives a relativistic
Lagrangian for fields on spacetime). Let $\phi^a$ be local
coordinates on the m-dimensional manifold M, let
$h_{ab}$ denote the metric of M, and let $x^j$ denote the
spatial coordinates on space $\mathbb{R}^n$. An m × m matrix D
is defined by

$$D_a{}^b = (\partial_j\phi^c)\, h_{ac}\, (\partial_j\phi^b)$$

where $\partial_j$ denotes derivatives with respect to the $x^j$.
Then the invariants $\mathcal{E}_2 = \operatorname{tr}(D) = |\partial_j\phi^a|^2$ and
$\mathcal{E}_4 = (1/2)[(\operatorname{tr} D)^2 - \operatorname{tr}(D^2)]$ can be used as terms in the
energy density, as well as a zeroth-order term
$\mathcal{E}_0 = V(\phi^a)$ not involving derivatives of $\phi$. A term
of the form $\mathcal{E}_4$ is called a Skyrme term.

The boundary condition on field configurations
is that $\phi$ tends to some constant value $\phi_0 \in M$ as
$|x| \to \infty$ in $\mathbb{R}^n$. From the topological point of view,
this compactifies $\mathbb{R}^n$ to $S^n$. In other words, $\phi$ extends
to a map from $S^n$ to M; and such maps are classified
topologically by the homotopy group $\pi_n(M)$. For
topological solitons to exist, this group has to be
nontrivial.

In one spatial dimension (n = 1) with $M = S^1$ (say),
the expression $\mathcal{E}_4$ is identically zero, and we just have
kink-type systems such as sine-Gordon. The simplest
two-dimensional example (n = 2) is the O(3) sigma
model, which has $M = S^2$ with its standard metric. In
this system, the field is often expressed as a unit
3-vector field $\vec\phi = (\phi^1, \phi^2, \phi^3)$, with $\mathcal{E}_2 = (\partial_j\vec\phi)\cdot(\partial_j\vec\phi)$.
Here the configurations are classified topologically by
their degree (or winding number, or topological
charge) $N \in \pi_2(S^2) \cong \mathbb{Z}$, which equals

$$N = \frac{1}{4\pi}\int \vec\phi\cdot\partial_1\vec\phi\times\partial_2\vec\phi\; dx^1\, dx^2$$

Instead of $\vec\phi$, it is often convenient to use a single
complex-valued function W related to $\vec\phi$ by the
stereographic projection $W = (\phi^1 + i\phi^2)/(1 - \phi^3)$. In
terms of W, the formula for the degree N is

$$N = \frac{i}{2\pi}\int \frac{W_1\overline{W}_2 - W_2\overline{W}_1}{(1 + |W|^2)^2}\; dx^1\, dx^2$$

and the static energy is (with $z = x^1 + ix^2$)

$$E = \int \mathcal{E}_2\, d^2x = 8\int \frac{|W_z|^2 + |W_{\bar z}|^2}{(1 + |W|^2)^2}\, d^2x = 8\pi N + 16\int \frac{|W_{\bar z}|^2}{(1 + |W|^2)^2}\, d^2x$$

From this, one sees that E satisfies the Bogomol’nyi
bound $E \ge 8\pi N$, and that minimal-energy solutions
correspond to solutions of the Cauchy-Riemann
equations $W_{\bar z} = 0$. To have finite energy, W(z) has to
be a rational function, and so solutions with
winding number N correspond to rational meromorphic
functions W(z), of degree |N|. (If N < 0, then W is a
rational function of $\bar z$.) The energy is scale invariant
(conformally invariant), and consequently these
solutions are not solitons in the strict sense: they are
not quite stable, since their size is not fixed. Adding
terms $\mathcal{E}_4$ and $\mathcal{E}_0$ to the energy density fixes the
soliton size, and the resulting two-dimensional Skyrme
systems admit true topological solitons.
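As a concrete check of the sigma-model bound, the sketch below evaluates the degree and energy of the holomorphic map $W(z) = z^k$, a choice made here purely for illustration. It assumes the expressions $N = \frac{1}{\pi}\int |W_z|^2/(1+|W|^2)^2\, d^2x$ and $E = 8\int |W_z|^2/(1+|W|^2)^2\, d^2x$, which hold when $W_{\bar z} = 0$.

```python
import math

def lump_charge_and_energy(k, n=20000):
    """Degree N and energy E of W(z) = z**k by radial quadrature.

    For W = z**k: |W_z|^2 = k^2 r^(2k-2), W_zbar = 0 and |W|^2 = r^(2k).
    The substitution r = tan(s) maps [0, inf) onto [0, pi/2), and the
    midpoint rule is applied in the variable s.
    """
    ds = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * ds
        r = math.tan(s)
        density = k**2 * r ** (2 * k - 2) / (1 + r ** (2 * k)) ** 2
        # Area element: d^2x = 2*pi*r dr, with dr = (1 + r^2) ds.
        total += density * 2 * math.pi * r * (1 + r * r) * ds
    return total / math.pi, 8 * total

for k in (1, 2, 3):
    N, E = lump_charge_and_energy(k)
    print(f"k={k}: N = {N:.6f}, E / (8 pi) = {E / (8 * math.pi):.6f}")
```

The computed energy equals $8\pi k$ and the degree equals k, so these maps saturate the bound. Because the energy is unchanged under $W(z) \mapsto W(\lambda z)$, the size of the lump is not fixed, as noted in the text.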

The three-dimensional case (n = 3), with M being
a simple Lie group, is the original Skyrme model of
nuclear physics. If M = SU(2), then the integer $N \in
\pi_3(SU(2)) \cong \mathbb{Z}$ is interpreted as the baryon number.
The (quantum) excitations of the $\phi$-field correspond
to the pions, whereas the (semiclassical) solitons
correspond to the nucleons. This model emerges as
an effective theory of quantum chromodynamics
(QCD), in the limit where the number of colors is
large. If we express the field as a function $U(x^j)$
taking values in a Lie group, then $L_j = U^{-1}\partial_j U$ takes
values in the corresponding Lie algebra, and $\mathcal{E}_2$ and
$\mathcal{E}_4$ take the form

$$\mathcal{E}_2 = -\frac{1}{2}\operatorname{tr}(L_j L_j), \qquad \mathcal{E}_4 = -\frac{1}{16}\operatorname{tr}\left([L_j, L_k][L_j, L_k]\right)$$


The static energy density in the basic Skyrme system
is the sum of these two terms. The static energy
satisfies a Bogomol’nyi bound $E \ge 12\pi^2|N|$, and it is
believed that stable solitons (skyrmions) exist for
each value of N. Classical skyrmions have been
investigated numerically; for values of N up to ~25,
they turn out to resemble polyhedral shells.
Comparison with nucleon phenomenology requires
semiclassical quantization, and this leads to results which
are at least qualitatively correct.

A variant of the Skyrme model is the
Skyrme-Faddeev system, which has n = 3 and $M = S^2$; the
solitons in this case resemble loops which can be
linked or knotted, and which are classified by their
Hopf number $N \in \pi_3(S^2)$. In this case, the energy
satisfies a lower bound of the form $E \ge cN^{3/4}$.
Numerical experiments indicate that for each N,
there is a minimal-energy solution with Hopf
number N, and with energy close to this topological
lower bound.


Abelian Higgs Vortices 


Vortices live in two spatial dimensions; viewed in
three dimensions, they are string-like. Two of their
applications are as cosmic strings and as magnetic
flux tubes in superconductors. They occur as static
topological solitons in the abelian Higgs model
(or Ginzburg-Landau model), and involve a
magnetic field $B = \partial_1 A_2 - \partial_2 A_1$, coupled to a complex
scalar field $\phi$, on the plane $\mathbb{R}^2$. The energy density is

$$\mathcal{E} = \frac{1}{2}(D_j\phi)\overline{(D_j\phi)} + \frac{1}{2}B^2 + \frac{1}{8}\lambda\left(1 - |\phi|^2\right)^2 \qquad [3]$$

where $D_j\phi := \partial_j\phi - iA_j\phi$, and where $\lambda$ is a positive
constant. The boundary conditions are


$$D_j\phi = 0, \qquad B = 0, \qquad |\phi| = 1 \qquad [4]$$

as $r \to \infty$. If we consider a very large circle C on $\mathbb{R}^2$,
so that [4] holds on C, then $\phi|_C$ is a map from the
circle C to the circle of unit radius in the complex
plane, and therefore it has an integer winding
number N. Thus configurations are labeled by this
vortex number N.

Note that if $\mathcal{E}$ vanishes, then B = 0 and $|\phi| = 1$: the
gauge symmetry is spontaneously broken, and the
photon “acquires a mass”; this is a standard
example of spontaneous symmetry breaking.

The total magnetic flux $\int B\, d^2x$ equals $2\pi N$; a
proof of this is as follows. Let $\theta$ be the usual polar
coordinate around C. Because $|\phi| = 1$ on C, we can
write $\phi = \exp[if(\theta)]$ for some function f; this f need
not be single-valued, but must satisfy $f(2\pi) -
f(0) = 2\pi N$ with N being an integer (in order that
$\phi$ be single-valued). In fact, this defines the winding
number. Now since $D_j\phi = \partial_j\phi - iA_j\phi = 0$ on C,
we have

$$A_j = -i\phi^{-1}\partial_j\phi = \partial_j f$$

on C. So, using Stokes’ theorem, we get

$$\int B\, d^2x = \oint_C A_j\, dx^j = \oint_C \partial_j f\, dx^j = f(2\pi) - f(0) = 2\pi N$$


If $\lambda = 1$, then the total energy $E = \int \mathcal{E}\, d^2x$
satisfies the Bogomol’nyi bound $E \ge \pi N$; $E = \pi N$
if and only if a set of partial differential equations
(the Bogomol’nyi equations) are satisfied. Since
like charges repel, the magnetic force between
vortices is repulsive. However, there is also a
force from the Higgs field, and this is attractive.
The balance between the two forces is determined
by $\lambda$: if $\lambda > 1$, the vortices repel each other;
whereas if $\lambda < 1$, the vortices attract. In the
critical case $\lambda = 1$, the force between vortices is
exactly balanced, and there exist static
multivortex solutions. In fact, one has the following:
given N points in the plane, there exists an
N-vortex solution of the Bogomol’nyi equations
(and hence of the full field equations) with $\phi$
vanishing at the chosen points (and nowhere
else). All static solutions are of this form. These
solutions cannot, however, be written down
explicitly in terms of elementary functions (except
of course for N = 0).
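The vortex number that labels these configurations is easy to extract numerically from the values of $\phi$ on a large circle. The sketch below is a minimal illustration: it accumulates the phase increments of $\phi(\theta)$ around the circle and divides by $2\pi$. The function name and the sample fields are assumptions made for the example.

```python
import cmath
import math

def winding_number(phi, samples=2000):
    """Number of times phi(theta) winds around 0 as theta runs over [0, 2*pi].

    phi must be nonvanishing on the circle; each phase increment is taken
    in (-pi, pi], which is unambiguous once the sampling is fine enough.
    """
    total = 0.0
    prev = phi(0.0)
    for i in range(1, samples + 1):
        cur = phi(2 * math.pi * i / samples)
        total += cmath.phase(cur / prev)
        prev = cur
    return round(total / (2 * math.pi))

print(winding_number(lambda th: cmath.exp(3j * th)))
print(winding_number(lambda th: (1 + 0.3 * math.cos(th)) * cmath.exp(-2j * th)))
```

On the circle at infinity, where $A_j = \partial_j f$, the accumulated phase is the line integral of $A_j$. Multiplying the returned integer by $2\pi$ therefore gives the total magnetic flux.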
Monopoles


The abelian Higgs model does not admit
three-dimensional solitons, but a nonabelian generalization
does; such nonabelian Higgs solitons are called
magnetic monopoles. The field content, in the
simplest version, is as follows. First, there is a
gauge (Yang-Mills) field $F_{\mu\nu}$, with gauge potential
$A_\mu$, and with the gauge group being a simple Lie
group G. Second, there is a Higgs scalar field $\phi$,
transforming under the adjoint representation of G
(thus $\phi$ takes values in the Lie algebra of G). For
simplicity, G is taken to be SU(2) in what follows.
So we may write $A_\mu = iA_\mu^a\sigma_a$, $F_{\mu\nu} = iF_{\mu\nu}^a\sigma_a$, and
$\phi = i\phi^a\sigma_a$, where $\sigma_a$ are the Pauli matrices. The
energy of static ($\partial_0\phi = 0 = \partial_0 A_j$), purely magnetic
($A_0 = 0$) configurations is

$$E = \int\left[\frac{1}{2}B_j^a B_j^a + \frac{1}{2}(D_j\phi)^a(D_j\phi)^a + \frac{1}{4}\lambda\left(1 - \phi^a\phi^a\right)^2\right] d^3x$$

where $B_j^a = (1/2)\varepsilon_{jkl}F_{kl}^a$ is the magnetic field. The
boundary conditions are $B_j^a \to 0$ and $\phi^a\phi^a \to 1$ as
$r \to \infty$; so $\phi$ restricted to a large spatial 2-sphere
becomes a map from $S^2$ to the unit 2-sphere in the
Lie algebra su(2), and as such it has a degree $N \in \mathbb{Z}$.
An analytic expression for N is

$$\int B_j^a (D_j\phi)^a\, d^3x = 2\pi N \qquad [5]$$


At long range, the field resembles an isolated
magnetic pole (a Dirac magnetic monopole), with
magnetic charge $2\pi N$. Asymptotically, the SU(2)
gauge symmetry is spontaneously broken to U(1),
which is interpreted as the electromagnetic gauge
group.

In 1974, it was observed that this system admits a
smooth, finite-energy, stable, spherically symmetric
N = 1 solution: this is the ’t Hooft-Polyakov
monopole. There is a Bogomol’nyi lower bound on
the energy E: from $0 \le (B - D\phi)^2 = B^2 + (D\phi)^2 -
2B\cdot D\phi$, we get

$$E \ge 2\pi N + \int \frac{1}{4}\lambda\left(1 - \phi^a\phi^a\right)^2 d^3x \qquad [6]$$

where [5] has been used. The inequality [6] is
saturated if and only if the Prasad-Sommerfield
limit $\lambda = 0$ is used, and the Bogomol’nyi equations

$$(D_j\phi)^a = B_j^a \qquad [7]$$

hold. The corresponding solitons are called
Bogomol’nyi-Prasad-Sommerfield (BPS) monopoles.

The Bogomol’nyi equations [7], together with the
boundary conditions described above, form a
completely integrable elliptic system of partial
differential equations. For any positive integer N, the space
of BPS monopoles of charge N, with gauge freedom
factored out, is parametrized by a
(4N − 1)-dimensional manifold $M_N$. This is the moduli space of N
monopoles. Roughly speaking, each monopole has a
position in space (three parameters) plus a phase
(one parameter), making a total of 4|N| parameters;
an overall phase can be removed by a gauge
transformation, leaving (4|N| − 1) parameters. In
fact, it is often useful to retain the overall phase, and
to work with the corresponding 4|N|-dimensional
manifold $\widetilde{M}_N$. This manifold has a natural metric,
which corresponds to the expression for the kinetic
energy of the system. A point in $\widetilde{M}_N$ represents an
N-monopole configuration, and the slow-motion
dynamics of N monopoles corresponds to geodesics
on $\widetilde{M}_N$; this is the geodesic approximation of
monopole dynamics.

The N = 1 monopole is spherically symmetric, and
the corresponding fields take a simple form; for
example, the Higgs field of a 1-monopole located at
r = 0 is


$$\Phi^a = \left( \frac{\coth 2r}{r} - \frac{1}{2r^2} \right) x^a$$


For N > 1, the expressions tend to be less explicit; 
but monopole solutions can nevertheless be char- 
acterized in a fairly complete way. The Bogomol’nyi 
equations [7] are a dimensional reduction of the
self-dual Yang-Mills equations in $\mathbb{R}^4$, and BPS
monopoles correspond to holomorphic vector bundles
over a certain two-dimensional complex manifold
(“mini-twistor space”). This leads to various other 
characterizations of monopole solutions, for exam- 
ple, in terms of certain curves (“spectral curves”) on 
mini-twistor space, and in terms of solutions of a set 
of ordinary differential equations called the Nahm 
equations. Having all these descriptions enables one 
to deduce much about the monopole moduli space, 
and to characterize many monopole solutions. In 
particular, there are explicit solutions of the Nahm 
equations involving elliptic functions, which corre- 
spond to monopoles with certain discrete symme- 
tries, such as a 3-monopole with tetrahedral 
symmetry, and a 4-monopole with the appearance 
and symmetries of a cube. 


Yang-Mills Instantons 


Consider gauge fields in four-dimensional Euclidean
space $\mathbb{R}^4$, with gauge group G. For simplicity, in
what follows, G is taken to be SU(2); one can extend
much of the structure to more general groups, for
example, the simple Lie groups. Let $A_\mu$ and $F_{\mu\nu}$
denote the gauge potential and gauge field. The
Yang-Mills action is


$$S = -\frac{1}{4} \int \mathrm{tr}(F_{\mu\nu} F_{\mu\nu}) \, d^4x \qquad [8]$$


where we assume a boundary condition, at infinity
in $\mathbb{R}^4$, such that this integral converges. The Euler-
Lagrange equations which describe critical points of
the functional S are the Yang-Mills equations


$$D_\mu F_{\mu\nu} = 0 \qquad [9]$$


Finite-action Yang-Mills fields are called instantons. 
The Euclidean action [8] is used in the path-integral 
approach to quantum gauge field theory; therefore, 
instantons are crucial in understanding the path 
integral. 

The dual of the field tensor $F_{\mu\nu}$ is


$${*F}_{\mu\nu} = \tfrac{1}{2} \varepsilon_{\mu\nu\alpha\beta} F_{\alpha\beta}$$


The gauge field is self-dual if ${*F}_{\mu\nu} = F_{\mu\nu}$, and
anti-self-dual if ${*F}_{\mu\nu} = -F_{\mu\nu}$. In view of the Bianchi
identity $D_\mu {*F}_{\mu\nu} = 0$, any self-dual or anti-self-dual
gauge field is automatically a solution of the Yang-
Mills equations [9]. This fact also follows from the
discussion below, where we see that self-dual
instantons give local minima of the action.

The Yang-Mills action (and Yang-Mills equations)
are conformally invariant; any finite-action
solution of the Yang-Mills equations on $\mathbb{R}^4$ extends
smoothly to the conformal compactification $S^4$.
Gauge fields on $S^4$, with gauge group SU(2), are
classified topologically by an integer N, namely, the
second Chern number


$$N = c_2 = -\frac{1}{8\pi^2} \int \mathrm{tr}(F_{\mu\nu}\, {*F}_{\mu\nu}) \, d^4x \qquad [10]$$


From [8] and [10] a topological lower bound on the
action is given as follows:


$$0 \le -\int \mathrm{tr}\left[ (F_{\mu\nu} - {*F}_{\mu\nu})(F_{\mu\nu} - {*F}_{\mu\nu}) \right] d^4x = 8S - 16\pi^2 N$$


and so $S \ge 2\pi^2 N$, with equality if and only if the
field is self-dual. If N < 0, we get $S \ge 2\pi^2 |N|$, with
equality if and only if F is anti-self-dual. So the
self-dual (or anti-self-dual) fields minimize the action in
each topological class.

For the remainder of this section, we restrict to
self-dual instantons with instanton number N > 0. The
space (moduli space) of such instantons, with gauge
equivalence factored out, is an (8N − 3)-dimensional
real manifold. In principle, all these gauge fields can
be constructed using algebraic-geometry (twistor)
methods: instantons correspond to holomorphic vector
bundles over complex projective 3-space (twistor
space). One large class of solutions which can be
written out explicitly is as follows: for N = 1 and
N = 2 it gives all instantons, while for N ≥ 3 it gives a
(5N + 4)-dimensional subfamily of the full (8N − 3)-
dimensional solution space. The gauge potentials in
this class have the form


$$A_\mu = i \sigma_{\mu\nu} \partial_\nu \log \phi \qquad [11]$$


where the $\sigma_{\mu\nu}$ are constant matrices (antisymmetric
in $\mu\nu$) defined in terms of the Pauli matrices $\sigma_a$ by


$$\sigma_{10} = \sigma_{23} = \tfrac{1}{2}\sigma_1, \qquad
\sigma_{20} = \sigma_{31} = \tfrac{1}{2}\sigma_2, \qquad
\sigma_{30} = \sigma_{12} = \tfrac{1}{2}\sigma_3$$


The real-valued function $\phi = \phi(x^\mu)$ is a solution of
the four-dimensional Laplace equation given by


$$\phi(x) = \sum_{k=0}^{N} \frac{\lambda_k}{|x - x_k|^2}$$


where the $x_k$ are N + 1 distinct points in $\mathbb{R}^4$, and the
$\lambda_k$ are N + 1 positive constants: a total of 5N + 5
parameters. It is clear from [11] that the overall
scale of $\phi$ is irrelevant, leaving a (5N + 4)-parameter
family. For N = 1 and N = 2, symmetries reduce the
parameter count further, to 5 and 13, respectively.
Although $\phi$ has poles at the points $x = x_k$, the gauge
potentials are smooth (possibly after a gauge
transformation).
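
As a numerical sanity check, the following minimal sketch assumes that the harmonic function has the standard multi-pole form $\phi(x) = \sum_k \lambda_k / |x - x_k|^2$ and verifies by finite differences that its four-dimensional Laplacian vanishes away from the poles. The pole positions, weights, and evaluation point are hypothetical values chosen only for illustration.

```python
def phi(x, poles, weights):
    """phi(x) = sum_k weights[k] / |x - poles[k]|^2 for points in R^4."""
    return sum(w / sum((xi - pk) ** 2 for xi, pk in zip(x, p))
               for w, p in zip(weights, poles))

def laplacian(f, x, h=1e-3):
    """Second-order central-difference Laplacian in len(x) dimensions."""
    total = 0.0
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        dn = list(x)
        dn[i] -= h
        total += (f(up) - 2 * f(x) + f(dn)) / (h * h)
    return total

# Hypothetical N = 2 data: N + 1 = 3 poles with positive weights.
poles = [(0.0, 0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0), (0.0, 2.0, 0.0, 0.0)]
weights = [1.0, 0.5, 2.0]
point = [0.4, 0.7, -0.3, 0.9]          # an arbitrary point away from the poles

residual = laplacian(lambda y: phi(y, poles, weights), point)
print(residual)                        # small compared with phi(point) itself
```

The residual is only zero up to discretization error; the check works in four dimensions because $1/|x|^2$ is the fundamental harmonic function there.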

Finally, it is worth noting that (as one might 
expect) there is a gravitational analog of the gauge- 
theoretic structures described here. In other words, 
one has self-dual gravitational instantons — these are 
four-dimensional Riemannian spaces for which the 
conformal-curvature tensor (the Weyl tensor) is 
self-dual, and the Ricci tensor satisfies Einstein’s 
equations $R_{\mu\nu} = \Lambda g_{\mu\nu}$. As before, such spaces
can be constructed using a twistor-geometrical 
correspondence. 


Q-Balls 


A Q-ball (or nontopological soliton) is a soliton
which has a periodic time dependence in a degree of
freedom which corresponds to a global symmetry.
The simplest class of Q-ball systems involves a
complex scalar field $\phi$, with an invariance under the
constant phase transformation $\phi \mapsto e^{i\theta}\phi$; the Q-balls
are soliton solutions of the form


$$\phi(t, x) = e^{i\omega t} \psi(x) \qquad [12]$$


where $\psi(x)$ is a complex scalar field depending only
on the spatial variables x. The best-known case is
the 1-soliton solution


$$\phi(t, x) = a\sqrt{2}\, \exp(i a^2 t)\, \mathrm{sech}(a x)$$


of the nonlinear Schrödinger equation
$i\phi_t + \phi_{xx} + \phi|\phi|^2 = 0$.
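
The claim that the 1-soliton solves the nonlinear Schrödinger equation can be checked numerically. The sketch below evaluates $i\phi_t + \phi_{xx} + \phi|\phi|^2$ by central finite differences at a few sample points; the value of the parameter `a`, the step `h`, and the sample points are arbitrary choices made for illustration.

```python
import cmath
import math

def soliton(t, x, a=1.3):
    """1-soliton phi(t, x) = a*sqrt(2)*exp(i a^2 t)*sech(a x)."""
    return a * math.sqrt(2) * cmath.exp(1j * a * a * t) / math.cosh(a * x)

def nls_residual(t, x, a=1.3, h=1e-3):
    """Finite-difference estimate of i*phi_t + phi_xx + |phi|^2 * phi."""
    phi = soliton(t, x, a)
    phi_t = (soliton(t + h, x, a) - soliton(t - h, x, a)) / (2 * h)
    phi_xx = (soliton(t, x + h, a) - 2 * phi + soliton(t, x - h, a)) / (h * h)
    return 1j * phi_t + phi_xx + abs(phi) ** 2 * phi

max_residual = max(abs(nls_residual(t, x))
                   for t in (0.0, 0.7)
                   for x in (-1.0, 0.0, 0.5, 2.0))
print(max_residual)   # limited only by the finite-difference error
```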
More generally, consider a system (in n spatial
dimensions) with Lagrangian


$$\mathcal{L} = \tfrac{1}{2} (\partial_\mu \phi)(\partial^\mu \bar\phi) - U(|\phi|)$$


where $\phi(x^\mu)$ is a complex-valued field. Associated
with the global phase symmetry is the conserved
Noether charge $Q = \int \mathrm{Im}(\bar\phi\, \phi_t) \, d^n x$. Minimizing the
energy of a configuration subject to Q being fixed
implies that $\phi$ has the form [12]. Without loss of
generality, we may take $\omega > 0$. Note that $Q = \omega I$,
where $I = \int |\psi|^2 \, d^n x$. The energy of a configuration
of the form [12] is $E = E_q + E_k + E_p$, where


$$E_q = \frac{1}{2} \int |\nabla\psi|^2 \, d^n x, \qquad
E_k = \frac{1}{2} \omega^2 I = \frac{Q^2}{2I}, \qquad
E_p = \int U(|\psi|) \, d^n x$$


Let us take $U(0) = 0 = U'(0)$, with the field satisfying
the boundary condition $\psi \to 0$ as $r \to \infty$.

A stationary Q-lump is a critical point of the
energy functional $E[\psi]$, subject to Q having some
fixed value. The usual (Derrick) scaling argument
shows that any stationary Q-lump must satisfy


$$(2 - n) E_q - n E_p + n E_k = 0 \qquad [13]$$


For simplicity, in what follows, let us take n ≥ 3.
Define m > 0 by $U''(0) = m^2$; then, near spatial
infinity, the Euler-Lagrange equations give
$\nabla^2 \psi - (m^2 - \omega^2)\psi = 0$. So, in order to satisfy the boundary
condition $\psi \to 0$ as $r \to \infty$, we need $\omega < m$.

It is clear from [13] that if $U \ge \tfrac{1}{2} m^2 |\psi|^2$
everywhere, then there can be no solution. So
$K = \min[2U(|\psi|)/|\psi|^2]$ has to satisfy $K < m^2$. Also,
we have


$$E_p = \int U \ge \tfrac{1}{2} K I = (K/\omega^2) E_k > (K/\omega^2) E_p \qquad [14]$$


where the final inequality comes from [13], which
gives $E_k > E_p$ for n ≥ 3. As a
consequence, we see that $\omega^2$ is restricted to the range


$$K < \omega^2 < m^2 \qquad [15]$$


An example which has been studied in some detail is
$U(f) = f^2 [1 + (1 - f^2)^2]$; here $m^2 = 4$ and K = 2, so
the range of frequency for Q-balls in this system is
$\sqrt{2} < \omega < 2$. The dynamics of Q-balls in systems
such as these turns out to be quite complicated.
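
The frequency window for the example potential can be reproduced numerically. The following sketch estimates $m^2 = U''(0)$ by a finite difference and $K = \min[2U(f)/f^2]$ by a grid scan; the step size and the grid are illustrative assumptions, not part of the original analysis.

```python
import math

def U(f):
    """Example potential U(f) = f^2 [1 + (1 - f^2)^2]."""
    return f * f * (1 + (1 - f * f) ** 2)

# m^2 = U''(0), estimated by a central second difference.
h = 1e-4
m_squared = (U(h) - 2 * U(0.0) + U(-h)) / (h * h)

# K = min over f > 0 of 2 U(f) / f^2, scanned on a grid of f in (0, 3].
grid = [i / 1000 for i in range(1, 3001)]
K = min(2 * U(f) / (f * f) for f in grid)

# Allowed Q-ball frequencies satisfy K < omega^2 < m^2, as in [15].
omega_min, omega_max = math.sqrt(K), math.sqrt(m_squared)
print(omega_min, omega_max)
```

The minimum of $2U/f^2 = 2[1 + (1 - f^2)^2]$ is attained at f = 1, which is why a grid containing that point recovers K accurately.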


See also: Abelian Higgs Vortices; Homoclinic 
Phenomena; Integrable Systems: Overview; Instantons: 
Topological Aspects; Noncommutative Geometry from 
Strings; Sine-Gordon Equation; Topological Defects and 
Their Homotopy Classification. 
Introduction


Two key issues of classical and quantum informa- 
tion theory are storage and transmission of informa- 
tion. An information source produces some outputs 
(or signals) more frequently than others. Due to this 
redundancy, one can reduce the amount of space 
needed for its storage without compromising on its 
content. This data compression is done by a suitable 
encoding of the output of the source. In contrast, in 
the transmission of information through a channel, 
it is often advantageous to add redundancy to a 
message, in order to combat the effects of noise. 
This is done in the form of error-correcting codes. 
The amount of redundancy which needs to be added 
to the original message depends on how much noise 
is present in the channel (see, e.g., Nielsen and
Chuang (2000)). Hence, redundancy plays comple- 
mentary roles in data compression and transmission 
of data through a noisy channel. In this review we 
focus only on data compression in quantum infor- 
mation theory. 

In classical information theory, Shannon showed 
that there is a natural limit to the amount of 
compression that can be achieved. It is given by 
the Shannon entropy. The analogous concept in 
quantum information theory is the von Neumann 
entropy. Here, we review some of the main results 
of quantum data compression and the significance of 
the von Neumann entropy in this context. 

The review is structured as follows. We first give 
a brief introduction to the Shannon entropy and 
classical data compression. This is followed by a 
discussion of quantum entropy and the idea behind 
quantum source coding. We elaborate on data 
compression schemes for three different classes of 
quantum sources, namely memoryless sources, 
ergodic sources, and sources modeled by Gibbs 
states of quantum spin systems. In the bulk of the 
review, we concentrate on source-dependent, fixed- 
length coding schemes. We conclude with a brief 
discussion of universal and variable-length coding. 
We would like to point out that this review article 
is by no means complete. Due to a restriction on its 
length, we had to leave out various important 
aspects and developments of quantum source 
coding. 


Classical Data Compression 
Entropy and Source Coding 


A simple model of a classical information source
consists of a sequence of discrete random variables
$X_1, X_2, \ldots, X_n$, whose values represent the output of
the source. Each random variable $X_i$, $1 \le i \le n$,
takes values $x_i$ from a finite set, the source alphabet
$\mathcal{X}$. Hence, $X^{(n)} := (X_1, \ldots, X_n)$ takes values
$x^{(n)} := (x_1, \ldots, x_n) \in \mathcal{X}^n$. We recall the definition of entropy
(or information content) of a source.

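If the discrete random variables $X_1, \ldots, X_n$, which
take values from a finite alphabet $\mathcal{X}$, have joint
probabilities


$$P(X_1 = x_1, \ldots, X_n = x_n) = p_n(x_1, \ldots, x_n)$$


then the Shannon entropy of this source is defined by


$$H(X_1, \ldots, X_n) = -\sum_{x_1 \in \mathcal{X}} \cdots \sum_{x_n \in \mathcal{X}}
p_n(x_1, \ldots, x_n) \log p_n(x_1, \ldots, x_n) \qquad [1]$$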


Here and in the following, the logarithm is taken to 
the base 2. This is because the fundamental unit of 
classical information is a “bit,” which takes two 
values 0 and 1. Notice that $H(X_1, \ldots, X_n)$ in fact
only depends on the (joint) probability mass function
(p.m.f.) $p_n$, and can also be denoted as $H(p_n)$.

There are several other concepts of entropy, for 
example, relative entropy, conditional entropy, and 
mutual information. See, for example, Cover and 
Thomas (1991) and Nielsen and Chuang (2000). It
is easy to see that 


1. $0 \le H(X_1, \ldots, X_n) \le n \log|\mathcal{X}|$, where $|\mathcal{X}|$ denotes
the number of letters in the alphabet $\mathcal{X}$. Two
other important properties are as follows:

2. $H(X_1, \ldots, X_n)$ is jointly concave in $X_1, \ldots, X_n$
and

3. $H(X_1, \ldots, X_n) \le H(X_1, \ldots, X_m) + H(X_{m+1}, \ldots, X_n)$
for m < n.


The latter property is called subadditivity. 
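
To make the definition [1] and the subadditivity property concrete, here is a minimal sketch that computes the Shannon entropy (in bits) of a joint distribution and of its marginals. The joint p.m.f. `joint` is a hypothetical example of two correlated binary variables, and the helper names are illustrative.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a dict {outcome: probability}; 0 log 0 := 0."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, keep):
    """Marginal p.m.f. of the coordinates listed in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

# Hypothetical correlated pair (X1, X2) on the alphabet {0, 1}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

h_joint = entropy(joint)
h_sum = entropy(marginal(joint, [0])) + entropy(marginal(joint, [1]))
print(h_joint, h_sum)   # subadditivity: h_joint <= h_sum
```

Equality in the subadditivity bound would require the two variables to be independent, which is not the case for this example.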

In the next section, analogous quantities are 
introduced for quantum information and the corre- 
sponding properties are stated. 

Suppose that the random variables X1, X2,...,Xn 
are independent and identically distributed (i.i.d.). 
Then the entropy of each random variable modeling 
the source is the same and can be denoted by H(X). 
From the point of view of classical information 
theory, the Shannon entropy has an important 
operational definition. It quantifies the minimal 
physical resources needed to store data from a
classical information source and provides a limit to 
which data can be compressed reliably (i.e., in a 
manner in which the original data can be recovered 
later with a low probability of error). Shannon 
showed that the original data can be reliably 
obtained from the compressed version only if the 
rate of compression is greater than the Shannon 
entropy. This result is formulated in Shannon’s 
noiseless channel coding theorem (Shannon 1948,
Cover and Thomas 1991, Nielsen and Chuang
2000) given later. 


The Asymptotic Equipartition Property 


The main idea behind Shannon’s noiseless channel 
coding theorem is to divide the possible values 
$x_1, x_2, \ldots, x_n$ of random variables $X_1, \ldots, X_n$ into
two classes — one consisting of sequences which have 
a high probability of occurrence, known as “typical 
sequences,” and the other consisting of sequences 
which occur rarely, known as “atypical sequences.” 
The idea is that there are far fewer typical sequences 
than the total number of possible sequences, but 
they occur with high probability. The existence 
of typical sequences follows from the so-called 
“asymptotic equipartition property”: 


Theorem 1 (AEP). If $X_1, X_2, X_3, \ldots$ are i.i.d.
random variables with p.m.f. p(x), then


$$-\frac{1}{n} \log p_n(X_1, \ldots, X_n) \xrightarrow{\;P\;} H(X) \qquad [2]$$


where H(X) is the Shannon entropy for a single
variable, and $p_n(X_1, \ldots, X_n)$ denotes the random
variable taking values $p_n(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i)$
with probabilities $p_n(x_1, \ldots, x_n)$.


This theorem has been generalized to the case of
sequences of dependent variables $(X_n)_{n \in \mathbb{Z}}$ which are
ergodic for the shift transformation defined below.
It is easiest to formulate this for an information
stream which extends from $-\infty$ to $+\infty$:


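Definition A sequence $(X_n)_{n \in \mathbb{Z}}$ is called “stationary”
if for any $n_1 < n_2$ and any $x_{n_1}, \ldots, x_{n_2} \in \mathcal{X}$,


$$P(X_{n_1} = x_{n_1}, \ldots, X_{n_2} = x_{n_2})
= P(X_{n_1 + 1} = x_{n_1}, \ldots, X_{n_2 + 1} = x_{n_2})$$


We define the shift transformation $\tau$ by


$$\tau((x_n)_{n \in \mathbb{Z}}) = (x'_n)_{n \in \mathbb{Z}}, \qquad x'_n = x_{n+1} \qquad [3]$$


Then $(X_n)_{n \in \mathbb{Z}}$ is called “ergodic” if it is stationary
and if every subset $A \subset \mathcal{X}^{\mathbb{Z}}$ such that $\tau(A) = A$ has
probability 0 or 1, that is, $P((X_n)_{n \in \mathbb{Z}} \in A) = 0$ or 1.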


It is known that $(X_n)_{n \in \mathbb{Z}}$ is ergodic if and only if
its probability distribution is extremal in the set of 
invariant probability measures. The generalization 
of Theorem 1 (McMillan 1953, Breiman 1957) now 
reads: 


Theorem 2 (Shannon–McMillan–Breiman theorem).
Suppose that the sequence $(X_n)_{n \in \mathbb{Z}}$ is
ergodic. Then


$$\lim_{n \to \infty} \left[ -\frac{1}{n} \log p_n(X_1, \ldots, X_n) \right] = h_{KS} \qquad [4]$$


with probability 1,


where $h_{KS}$ is the Kolmogorov-Sinai entropy defined by


$$h_{KS} = \lim_{n \to \infty} \frac{1}{n} H(X_1, \ldots, X_n) = \inf_{n} \frac{1}{n} H(X_1, \ldots, X_n) \qquad [5]$$


Remark. It follows from the subadditivity property
(3) above that the sequence $(1/n)H(p_n)$ is decreasing,
and it is obviously bounded below by 0.


We now define the set of typical sequences (or more 
precisely, e-typical sequences) as follows: 


Definition Let $X_1, \ldots, X_n$ be i.i.d. random variables
with p.m.f. p(x). Given $\epsilon > 0$, the $\epsilon$-typical set $T_\epsilon^{(n)}$
is the set of sequences $(x_1 \ldots x_n)$ for which


$$2^{-n(H(X) + \epsilon)} \le p(x_1 \ldots x_n) \le 2^{-n(H(X) - \epsilon)} \qquad [6]$$


In the case of an ergodic sequence, H(X) is replaced
by $h_{KS}$ in [6].


Let $|T_\epsilon^{(n)}|$ denote the total number of typical
sequences and $P\{T_\epsilon^{(n)}\}$ denote the probability of the
typical set. Then the following is an easy consequence
of Theorem 1.


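Theorem 3 (Theorem of typical sequences). Fix $\epsilon > 0$. For
any $\delta > 0$ there exists $n_0(\delta) > 0$ such that for all
$n \ge n_0(\delta)$ the following hold:


(i) $P\{T_\epsilon^{(n)}\} > 1 - \delta$ and
(ii) $(1 - \delta)\, 2^{n(H(X) - \epsilon)} \le |T_\epsilon^{(n)}| \le 2^{n(H(X) + \epsilon)}$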
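
The two statements of the theorem can be explored by brute force for a small binary source. The sketch below enumerates all sequences of length n from a Bernoulli(p) source, collects those satisfying the typicality condition [6], and compares the size and probability of the typical set with the bounds; the parameters `p`, `n`, and `eps` are arbitrary illustrative choices.

```python
import math
from itertools import product

def typical_set(p, n, eps):
    """Return (H, list of eps-typical sequences) for a Bernoulli(p) source."""
    H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    typical = []
    for seq in product((0, 1), repeat=n):
        ones = sum(seq)
        log_prob = ones * math.log2(p) + (n - ones) * math.log2(1 - p)
        # Condition [6], written in terms of log2 of the probability.
        if -n * (H + eps) <= log_prob <= -n * (H - eps):
            typical.append(seq)
    return H, typical

def probability(seq, p):
    """Probability of a binary sequence under the i.i.d. Bernoulli(p) law."""
    ones = sum(seq)
    return p ** ones * (1 - p) ** (len(seq) - ones)

p, n, eps = 0.3, 16, 0.2
H, typical = typical_set(p, n, eps)
weight = sum(probability(s, p) for s in typical)   # P{T}, i.e. 1 - delta

print(len(typical), 2 ** n, weight)
print(weight * 2 ** (n * (H - eps)), 2 ** (n * (H + eps)))   # bounds in (ii)
```

For such a small n the typical set does not yet carry nearly all the probability; the theorem only guarantees this for n large enough.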


Shannon’s Noiseless Channel Coding Theorem 


Shannon’s noiseless channel coding theorem is a 
simple application of the theorem of typical 
sequences and says that the optimal rate at which 
one can reliably compress data from an i.i.d. 
classical information source is given by the Shannon 
entropy H(X) of the source. 

A “compression scheme” $C^n$ of rate R maps
possible sequences $x = (x_1, \ldots, x_n)$ to a binary string
of length $\lceil nR \rceil$: $C^n : x \mapsto y = (y_1, \ldots, y_{\lceil nR \rceil})$, where
$x_i \in \mathcal{X}$, $|\mathcal{X}| = d$ and $y_i \in \{0, 1\}$ for all $1 \le i \le \lceil nR \rceil$. The
corresponding decompression scheme takes the $\lceil nR \rceil$
compressed bits and maps them back to a string of n
letters from the alphabet $\mathcal{X}$: $D^n : y \in \{0, 1\}^{\lceil nR \rceil} \mapsto x' =
(x'_1, \ldots, x'_n)$. A compression-decompression scheme
is said to be “reliable” if the probability that $x' \ne x$
tends to 0 as $n \to \infty$. Shannon’s noiseless channel
coding theorem (Shannon 1948, Cover and Thomas
1991) now states


Theorem 4 (Shannon). Suppose that $\{X_i\}$ is an i.i.d.
information source, with X; ~ p(x) and Shannon 
entropy H(X). If R> H(X) then there exists a 
reliable compression scheme of rate R for the 
source. Conversely, any compression scheme with 
rate R < H(X) is not reliable. 


Proof (sketch). Suppose R > H(X). Choose $\epsilon > 0$
such that $H(X) + \epsilon < R$. Consider the set $T_\epsilon^{(n)}$ of
typical sequences. The method of compression is
then to examine the output of the source, to see if it
belongs to $T_\epsilon^{(n)}$. If the output is a typical sequence,
then we compress the data by simply storing an
index for the particular sequence using $\lceil nR \rceil$ bits in
the obvious way. If the input string is not typical,
then we compress the string to some fixed $\lceil nR \rceil$-bit
string, for example, (00...000). In this case, data
compression effectively fails, but, in spite of this, the
compression-decompression scheme succeeds with
probability tending to 1 as $n \to \infty$, since by Theorem 3
the probability of atypical sequences can be made
small by choosing n large enough.

If R < H(X), then any compression scheme of rate
R is not reliable. This also follows from Theorem 3
by the following argument. Let S(n) be a collection
of sequences $x^{(n)}$ of size $|S(n)| \le 2^{\lceil nR \rceil}$. Then the
subset of atypical sequences in S(n) is highly
improbable, whereas the corresponding subset of
typical sequences has probability bounded by
$2^{\lceil nR \rceil}\, 2^{-n(H(X) - \epsilon)} \to 0$ as $n \to \infty$, provided $\epsilon$ is
chosen such that $R < H(X) - \epsilon$. □
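
The compression method used in the first half of the proof can be sketched in a few lines. The code below indexes the typical sequences of a small Bernoulli(p) source and stores each index in $\lceil nR \rceil$ bits, sending every atypical sequence to the all-zero string. The function names and the parameters `p`, `n`, `eps` are hypothetical; the rate is chosen as $R = H + 2\epsilon$ purely for illustration.

```python
import math
from itertools import product

def build_code(p, n, eps):
    """Index the eps-typical sequences of a Bernoulli(p) source of length n."""
    H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    typical = []
    for seq in product((0, 1), repeat=n):
        ones = sum(seq)
        log_prob = ones * math.log2(p) + (n - ones) * math.log2(1 - p)
        if -n * (H + eps) <= log_prob <= -n * (H - eps):
            typical.append(seq)
    return H, typical, {seq: i for i, seq in enumerate(typical)}

def compress(seq, index, width):
    """Typical sequences map to their index; atypical ones to the all-zero string."""
    return format(index.get(seq, 0), "0{}b".format(width))

def decompress(bits, typical):
    """Look the stored index up in the list of typical sequences."""
    return typical[int(bits, 2)]

p, n, eps = 0.1, 12, 0.15
H, typical, index = build_code(p, n, eps)
R = H + 2 * eps                 # any rate with H + eps < R works
width = math.ceil(n * R)        # number of stored bits, ceil(nR)

sample = typical[-1]
assert decompress(compress(sample, index, width), typical) == sample
print(n, width, len(typical))
```

Every typical sequence is recovered exactly, while an atypical input decodes to the wrong sequence; reliability then rests on the atypical set having small probability for large n.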


Quantum Data Compression 
Quantum Sources and Entropy 


In quantum information processing systems, infor- 
mation is stored in quantum states of physical 
systems. The most general description of a quantum 
state is provided by a density matrix. 

A “density matrix” $\rho$ is a positive semidefinite
operator on a Hilbert space $\mathcal{H}$, with $\mathrm{tr}\,\rho = 1$, and the
expected value of an operator A on $\mathcal{H}$ is given by


$$\phi(A) = \mathrm{tr}(\rho A) \qquad [7]$$

The functional $\phi$ on $\mathcal{M} = \mathcal{B}(\mathcal{H})$, the algebra of linear
operators on $\mathcal{H}$, is positive (i.e., $\phi(A) \ge 0$ if $A \ge 0$)
and maps the identity $I \in \mathcal{M}$ to 1. Such a functional
is also called a state. Conversely, given such a state
on a finite-dimensional algebra $\mathcal{M}$, there exists a
unique density matrix $\rho_\phi$ such that [7] holds, so the
concepts can be used interchangeably. (This is not
true in the infinite-dimensional case.)

The quantum analog of the Shannon entropy is
called the von Neumann entropy. For any quantum
state $\phi$ (or equivalently $\rho_\phi$), it is defined by


$$S(\phi) = S(\rho_\phi) := -\mathrm{tr}(\rho_\phi \log \rho_\phi) \qquad [8]$$


Here we use log to denote $\log_2$ and define
0 log 0 = 0, as for the Shannon entropy. Let the density
matrix $\rho_\phi$ have a spectral decomposition


$$\rho_\phi = \sum_{i=1}^{d} \lambda_i |\psi_i\rangle\langle\psi_i| \qquad [9]$$


Here $\{|\psi_i\rangle\}$ is the set of eigenvectors of $\rho_\phi$. They
form an orthonormal basis of the Hilbert space $\mathcal{H}$.
By the fact that $\rho_\phi$ is positive semidefinite and has trace 1,
the eigenvalues $\lambda_i$ of $\rho_\phi$ determine a probability
distribution. When expressed in terms of the $\lambda_i$, the
von Neumann entropy of $\rho$ reduces to the Shannon
entropy corresponding to this probability distribution
(henceforth, the subscript $\phi$ of $\rho_\phi$ will be
omitted): $S(\rho) = H(\lambda)$, where $\lambda = \{\lambda_1, \ldots, \lambda_d\}$.
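
The identity $S(\rho) = H(\lambda)$ is easy to illustrate for a single qubit, where the eigenvalues of a density matrix are available in closed form. The sketch below assumes the parametrization $\rho = [[a, b], [\bar b, 1 - a]]$, which is an illustrative choice, and computes the von Neumann entropy as the Shannon entropy of the two eigenvalues.

```python
import math

def shannon(probs):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def qubit_eigenvalues(a, b):
    """Eigenvalues of the density matrix [[a, b], [conj(b), 1 - a]]."""
    radius = math.sqrt((a - 0.5) ** 2 + abs(b) ** 2)
    return [0.5 + radius, 0.5 - radius]

def von_neumann_entropy(a, b):
    """S(rho) = H(lambda), evaluated from the eigenvalues of rho."""
    return shannon(qubit_eigenvalues(a, b))

s_pure = von_neumann_entropy(0.5, 0.5)        # projector onto (|0> + |1>)/sqrt(2)
s_mixed = von_neumann_entropy(0.5, 0.0)       # maximally mixed state I/2
s_partial = von_neumann_entropy(0.75, 0.25j)  # a generic mixed state

print(s_pure, s_mixed, s_partial)
```

The pure state has zero entropy and the maximally mixed state attains the upper bound log(dim H) = 1 bit, in line with the bounds on S stated in the text.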

The von Neumann entropy has properties analogous
to $H(X_1, \ldots, X_n)$, in particular (Ohya and Petz
1993, Nielsen and Chuang 2000)


1. $0 \le S(\phi) \le \log(\dim \mathcal{H})$;

2. $S(\phi)$ is concave in $\phi$; and

3. if $\phi$ is a state on $\mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2$ then $S(\phi) \le S(\phi_1) +
S(\phi_2)$ if $\phi_1$ and $\phi_2$ are the restrictions of $\phi$ to
$\mathcal{H}_1 \otimes I$ and $I \otimes \mathcal{H}_2$ respectively.


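A “quantum information source” in general is
defined by a sequence of density matrices $\rho^{(n)}$ on
Hilbert spaces $\mathcal{H}_n$ of increasing dimensions $N_n$ given
by a decomposition


$$\rho^{(n)} = \sum_k p_k^{(n)} |\Psi_k^{(n)}\rangle\langle\Psi_k^{(n)}| \qquad [10]$$


where the states $|\Psi_k^{(n)}\rangle$ are interpreted as the signal
states, and the numbers $p_k^{(n)} \ge 0$, with $\sum_k p_k^{(n)} = 1$, as
their probabilities of occurrence. The vectors $|\Psi_k^{(n)}\rangle \in
\mathcal{H}_n$ need not be mutually orthogonal.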


Compression–Decompression
Scheme and Fidelity


To compress data from such a source, one encodes
each signal state $|\Psi_k^{(n)}\rangle$ by a state $\tilde\rho_k^{(n)} \in \mathcal{B}(\tilde{\mathcal{H}}_n)$, where
$\dim \tilde{\mathcal{H}}_n = d_c(n) < N_n$. Thus, a compression scheme
is a map $C^{(n)} : |\Psi_k^{(n)}\rangle\langle\Psi_k^{(n)}| \mapsto \tilde\rho_k^{(n)} \in \mathcal{B}(\tilde{\mathcal{H}}_n)$. The state
$\tilde\rho_k^{(n)}$ is referred to as the compressed state. A
corresponding decompression scheme is a map
$D^{(n)} : \mathcal{B}(\tilde{\mathcal{H}}_n) \to \mathcal{B}(\mathcal{H}_n)$. Both $C^{(n)}$ and $D^{(n)}$ must be
completely positive maps. In particular, this implies
that $D^{(n)}$ must be of the form


$$D^{(n)}(\rho) = \sum_i D_i \rho D_i^* \qquad [11]$$


for linear operators $D_i : \tilde{\mathcal{H}}_n \to \mathcal{H}_n$ such that
$\sum_i D_i^* D_i = I$ (see Nielsen and Chuang 2000).
Obviously, in order to achieve the maximum
possible compression of Hilbert space dimensions
per signal state, the goal must be to make the
dimension $d_c(n)$ as small as possible, subject to the
condition that the information carried in the signal
states can be retrieved with high accuracy upon
decompression.
The “rate of compression” is defined as


$$R_n := \frac{\log(\dim \tilde{\mathcal{H}}_n)}{\log(\dim \mathcal{H}_n)} = \frac{\log d_c(n)}{\log N_n}$$


It is natural to consider the original Hilbert space
$\mathcal{H}_n$ to be the n-qubit space. In this case $N_n = 2^n$ and
hence $\log N_n = n$. As in the case of classical data
compression, we are interested in finding the
optimal limiting rate of data compression, which in
this case is given by


$$R_\infty := \lim_{n \to \infty} \frac{\log d_c(n)}{n} \qquad [12]$$


Unlike classical signals, quantum signal states are 
not completely distinguishable. This is because they 
are, in general, not mutually orthogonal. As a result, 
perfectly reconstructing a quantum signal state from 
its compressed version is often an impossible task 
and therefore too stringent a requirement for the 
reliability of a compression—decompression scheme. 
Instead, a reasonable requirement is that a state can 
be reconstructed from the compressed version which 
is nearly indistinguishable from the original signal 
state. A measure of indistinguishability useful for 
this purpose is the average fidelity defined as 
follows: 


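$$F_n := \sum_k p_k^{(n)} \langle\Psi_k^{(n)}| D^{(n)}(\tilde\rho_k^{(n)}) |\Psi_k^{(n)}\rangle \qquad [13]$$


This fidelity satisfies $0 \le F_n \le 1$ and $F_n = 1$ if
and only if $D^{(n)}(\tilde\rho_k^{(n)}) = |\Psi_k^{(n)}\rangle\langle\Psi_k^{(n)}|$ for all k. A
compression-decompression scheme is said to be
reliable if $F_n \to 1$ as $n \to \infty$.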

The key idea behind data compression is the fact 
that some signal states have a higher probability of 
occurrence than others (these states playing a role 
analogous to the typical sequences of classical 
information theory). These signal states span a
subspace of the original Hilbert space of the source,
which is referred to as the typical subspace.


Schumacher’s Theorem for Memoryless 
Quantum Sources 


The notion of a typical subspace was first 
introduced in the context of quantum information 
theory by Schumacher (1995) in his seminal paper. 
He considered the simplest class of quantum 
information sources, namely quantum memoryless 
or i.i.d. sources. For such a source the density matrix
$\rho^{(n)}$, defined through [10], acts on a tensor product
Hilbert space $\mathcal{H}_n = \mathcal{H}^{\otimes n}$ and is itself given by a
tensor product


$$\rho^{(n)} = \pi^{\otimes n} \qquad [14]$$


Here $\mathcal{H}$ is a fixed Hilbert space (representing an
elementary quantum subsystem) and $\pi$ is a density
matrix acting on $\mathcal{H}$; for example, $\mathcal{H}$ can be a single-
qubit Hilbert space, in which case $\dim \mathcal{H} = 2$, $\mathcal{H}_n$ is
the Hilbert space of n qubits and $\pi$ is the density
matrix of a single qubit. If the spectral decomposition
of $\pi$ is given by


$$\pi = \sum_{i=1}^{\dim \mathcal{H}} q_i |\phi_i\rangle\langle\phi_i| \qquad [15]$$


then the eigenvalues and eigenvectors of $\rho^{(n)}$ are
given by


$$\lambda_k^{(n)} = q_{k_1} q_{k_2} \cdots q_{k_n} \qquad [16]$$
and
$$|\psi_k^{(n)}\rangle = |\phi_{k_1}\rangle \otimes |\phi_{k_2}\rangle \otimes \cdots \otimes |\phi_{k_n}\rangle \qquad [17]$$


Thus, we can write the spectral decomposition of
the density matrix $\rho^{(n)}$ of an i.i.d. source as


$$\rho^{(n)} = \sum_k \lambda_k^{(n)} |\psi_k^{(n)}\rangle\langle\psi_k^{(n)}| \qquad [18]$$


where the sum is over all possible sequences
$k = (k_1 \ldots k_n)$, with each $k_i$ taking $(\dim \mathcal{H})$ values.
Hence, we see that the eigenvalues of $\rho^{(n)}$ are labeled
by a classical sequence of indices $k = k_1 \ldots k_n$.

The von Neumann entropy of such a source is
given by


$$S(\rho^{(n)}) = S(\pi^{\otimes n}) = n S(\pi) = n H(X) \qquad [19]$$


where X is the classical random variable with
probability distribution $\{q_i\}$.

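Let $T_\epsilon^{(n)}$ be the classical typical subset of indices
$(k_1 \ldots k_n)$ for which


$$\left| -\frac{1}{n} \log(q_{k_1} \cdots q_{k_n}) - S(\pi) \right| \le \epsilon \qquad [20]$$


as in the theorem of typical sequences. Defining
$\mathcal{T}_\epsilon^{(n)}$ as the space spanned by the eigenvectors $|\psi_k^{(n)}\rangle$
with $k \in T_\epsilon^{(n)}$ then immediately yields the quantum
analog of the theorem of typical sequences, Theorem
4 given below. We refer to $\mathcal{T}_\epsilon^{(n)}$ as the typical subspace
(or more precisely, the $\epsilon$-typical subspace).


Theorem 4 (Typical subspace theorem). Fix $\epsilon > 0$.
Then for any $\delta > 0$ there exists $n_0(\delta) > 0$ such that for all
$n \ge n_0(\delta)$ and $\rho^{(n)} = \pi^{\otimes n}$, the following are true:


(i) $\mathrm{tr}(P_\epsilon^{(n)} \rho^{(n)}) > 1 - \delta$ and

(ii) $(1 - \delta)\, 2^{n(S(\pi) - \epsilon)} \le \dim(\mathcal{T}_\epsilon^{(n)}) \le 2^{n(S(\pi) + \epsilon)}$, where
$P_\epsilon^{(n)}$ is the orthogonal projection onto the
subspace $\mathcal{T}_\epsilon^{(n)}$.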


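Note that $\mathrm{tr}(P_\epsilon^{(n)} \rho^{(n)})$ gives the probability of the
typical subspace. As $\mathrm{tr}(P_\epsilon^{(n)} \rho^{(n)})$ approaches unity for
n sufficiently large, $\mathcal{T}_\epsilon^{(n)}$ carries almost all the weight
of $\rho^{(n)}$. Let $\mathcal{T}_\epsilon^{(n)\perp}$ denote the orthocomplement of the
typical subspace, that is, for any pair of vectors
$|\psi\rangle \in \mathcal{T}_\epsilon^{(n)}$ and $|\phi\rangle \in \mathcal{T}_\epsilon^{(n)\perp}$, $\langle\phi|\psi\rangle = 0$. It follows from
the above theorem that the probability of a signal
state belonging to $\mathcal{T}_\epsilon^{(n)\perp}$ can be made arbitrarily
small for n sufficiently large.

Let $P_\epsilon^{(n)}$ denote the orthogonal projection onto the
typical subspace $\mathcal{T}_\epsilon^{(n)}$. The encoding (compression)
of the signal states $|\Psi_k^{(n)}\rangle$ of [10] is done in the
following manner: $C^{(n)} : |\Psi_k^{(n)}\rangle\langle\Psi_k^{(n)}| \mapsto \tilde\rho_k^{(n)}$, where


$$\tilde\rho_k^{(n)} = \alpha_k^2\, |\tilde\Psi_k^{(n)}\rangle\langle\tilde\Psi_k^{(n)}| + \beta_k^2\, |\Phi_0\rangle\langle\Phi_0| \qquad [21]$$

Here

$$|\tilde\Psi_k^{(n)}\rangle = \frac{P_\epsilon^{(n)} |\Psi_k^{(n)}\rangle}{\| P_\epsilon^{(n)} |\Psi_k^{(n)}\rangle \|}, \qquad
\alpha_k = \| P_\epsilon^{(n)} |\Psi_k^{(n)}\rangle \|, \qquad
\beta_k = \| (I - P_\epsilon^{(n)}) |\Psi_k^{(n)}\rangle \| \qquad [22]$$


and $|\Phi_0\rangle$ is any fixed state in $\mathcal{T}_\epsilon^{(n)}$.

Note that $\tilde\rho_k^{(n)} \in \mathcal{B}(\mathcal{T}_\epsilon^{(n)})$, and hence the typical
subspace $\mathcal{T}_\epsilon^{(n)}$ plays the role of the compressed space.
The decompression $D^{(n)}(\tilde\rho_k^{(n)})$ is defined as the
extension of $\tilde\rho_k^{(n)}$ on $\mathcal{T}_\epsilon^{(n)}$ to $\mathcal{H}_n$:


$$D^{(n)}(\tilde\rho_k^{(n)}) = \tilde\rho_k^{(n)} \oplus 0$$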

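The fidelity of this compression-decompression
scheme satisfies


$$\begin{aligned}
F_n &= \sum_k p_k^{(n)} \langle\Psi_k^{(n)}| \tilde\rho_k^{(n)} |\Psi_k^{(n)}\rangle \\
&= \sum_k p_k^{(n)} \left[ \alpha_k^2 |\langle\Psi_k^{(n)}|\tilde\Psi_k^{(n)}\rangle|^2 + \beta_k^2 |\langle\Psi_k^{(n)}|\Phi_0\rangle|^2 \right] \\
&\ge \sum_k p_k^{(n)} \alpha_k^2 |\langle\Psi_k^{(n)}|\tilde\Psi_k^{(n)}\rangle|^2 = \sum_k p_k^{(n)} \alpha_k^4 \\
&\ge \sum_k p_k^{(n)} (2\alpha_k^2 - 1) = 2A_n - 1
\end{aligned} \qquad [23]$$


where $A_n = \mathrm{tr}(P_\epsilon^{(n)} \rho^{(n)})$.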
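
For an i.i.d. qubit source the quantities entering this bound can be computed exactly, because the eigenvalues of $\pi^{\otimes n}$ depend only on how many tensor factors carry each eigenvalue of $\pi$. The sketch below, a minimal illustration with assumed parameters `q = 0.2` and `eps = 0.1`, evaluates the dimension of the typical subspace, its weight $\mathrm{tr}(P\rho)$, and the fidelity lower bound $2\,\mathrm{tr}(P\rho) - 1$ for growing n.

```python
import math

def typical_subspace(q, n, eps):
    """Dimension and weight tr(P rho) of the eps-typical subspace of
    rho = pi^{(x) n}, where the qubit state pi has eigenvalues q and 1 - q."""
    S = -q * math.log2(q) - (1 - q) * math.log2(1 - q)
    dim, weight = 0, 0.0
    for j in range(n + 1):                   # j = number of factors with eigenvalue q
        log_lam = j * math.log2(q) + (n - j) * math.log2(1 - q)
        if abs(-log_lam / n - S) <= eps:     # typicality condition on the index
            mult = math.comb(n, j)           # multiplicity of this eigenvalue
            dim += mult
            weight += mult * 2.0 ** log_lam
    return S, dim, weight

results = []
for n in (50, 200, 800):
    S, dim, weight = typical_subspace(0.2, n, 0.1)
    rate = math.log2(dim) / n                # qubits used per signal qubit
    results.append((n, rate, weight, 2 * weight - 1))
    print(n, round(rate, 3), round(weight, 4), round(2 * weight - 1, 4))
```

The rate stays below $S(\pi) + \epsilon$ while the weight, and with it the fidelity bound, approaches 1 as n grows. `math.comb` requires Python 3.8 or later.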
Using the typical subspace theorem, Schumacher
(1995) proved the following analog of Shannon’s
noiseless channel coding theorem for memoryless
quantum information sources:


Theorem 5 (Schumacher’s quantum coding theorem).
Let $\{\rho_n, \mathcal{H}_n\}$ be an i.i.d. quantum source:
$\rho_n = \pi^{\otimes n}$ and $\mathcal{H}_n = \mathcal{H}^{\otimes n}$. If $R > S(\pi)$, then there exists
a reliable compression scheme of rate R. If $R < S(\pi)$,
then any compression scheme of rate R is not reliable.


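Proof


(i) $R > S(\pi)$. Choose $\epsilon > 0$ such that $R > S(\pi) + \epsilon$.
For a given $\delta > 0$, choose the typical subspace as
above and choose n large enough so that (i) and (ii)
in the typical subspace theorem hold. In particular,
$A_n = \mathrm{tr}(P_\epsilon^{(n)} \rho_n) > 1 - \delta$, so that $F_n \ge 2A_n - 1 > 1 - 2\delta$
by [23]. Thus, the fidelity tends to 1
as $n \to \infty$.

(ii) Suppose $R < S(\pi)$. Let the compression map
be $C^{(n)}$. We may assume that $\tilde{\mathcal{H}}_n$ is a subspace of $\mathcal{H}_n$
with $\dim \tilde{\mathcal{H}}_n = 2^{nR}$. We denote the projection onto
$\tilde{\mathcal{H}}_n$ as $\tilde P_n$ and let $\tilde\rho_k^{(n)} = C^{(n)}(|\Psi_k^{(n)}\rangle\langle\Psi_k^{(n)}|)$. Since
$\tilde\rho_k^{(n)}$ is concentrated on $\tilde{\mathcal{H}}_n$, we have $\tilde\rho_k^{(n)} \le \tilde P_n$
and hence $D^{(n)}(\tilde\rho_k^{(n)}) \le D^{(n)}(\tilde P_n)$, for any decompression
map $D^{(n)}$. Inserting into the definition of the
fidelity, and writing the resulting trace
$\mathrm{tr}(\rho^{(n)} D^{(n)}(\tilde P_n))$ in the eigenbasis [18], we then have


$$\begin{aligned}
F_n &= \sum_k p_k^{(n)} \langle\Psi_k^{(n)}| D^{(n)}(\tilde\rho_k^{(n)}) |\Psi_k^{(n)}\rangle \\
&\le \sum_{k \in T_\epsilon^{(n)}} \lambda_k^{(n)} \langle\psi_k^{(n)}| D^{(n)}(\tilde P_n) |\psi_k^{(n)}\rangle + \sum_{k \notin T_\epsilon^{(n)}} \lambda_k^{(n)}
\end{aligned} \qquad [24]$$


By the typical subspace theorem, the latter sum
tends to 0 as $n \to \infty$, and in the sum over $k \in T_\epsilon^{(n)}$
we have $\lambda_k^{(n)} \le 2^{-n(S(\pi) - \epsilon)}$. The first sum can therefore
be bounded as follows:


$$\begin{aligned}
\sum_{k \in T_\epsilon^{(n)}} \lambda_k^{(n)} \langle\psi_k^{(n)}| D^{(n)}(\tilde P_n) |\psi_k^{(n)}\rangle
&\le 2^{-n(S(\pi) - \epsilon)} \sum_k \langle\psi_k^{(n)}| D^{(n)}(\tilde P_n) |\psi_k^{(n)}\rangle \\
&= 2^{-n(S(\pi) - \epsilon)}\, \mathrm{tr}(D^{(n)}(\tilde P_n)) \\
&= 2^{-n(S(\pi) - \epsilon)}\, \mathrm{tr}(\tilde P_n) \\
&= 2^{-n(S(\pi) - \epsilon - R)} \to 0
\end{aligned} \qquad [25]$$


by the cyclic property of the trace and the fact that
$\sum_i D_i^* D_i = I$, provided $\epsilon$ is chosen such that $R < S(\pi) - \epsilon$. □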


Even for a quantum source with memory, reliable
data compression is achieved by looking for a
typical subspace $\mathcal{T}_\epsilon^{(n)}$ of the Hilbert space $\mathcal{H}_n$ for a
given $\epsilon > 0$. In the following subsections, we discuss
two different classes of such sources for which one
can find typical subspaces $\mathcal{T}_\epsilon^{(n)}$ such that the fidelity
$F_n$ tends to 1 as $n \to \infty$.


Ergodic Quantum Sources 


A quantum generalization of classical ergodic 
sources is defined as follows. First consider the 
analog of an infinite sequence of random variables 
which is a state on the infinite tensor product of a 
finite-dimensional *-algebra M. The latter is given 
by the norm closure of the increasing sequence of 
finite tensor products 


A translation-invariant state $\phi_\infty$ on $\mathcal{M}_\infty$ is said to be
ergodic if it cannot be decomposed as a (nontrivial)
convex combination of other translation-invariant
states. The analog of the Kolmogorov–Sinai entropy
[5] for an ergodic state $\phi_\infty$ is called the mean
entropy and is given by


$$S_M(\phi_\infty) := \lim_{n \to \infty} \frac{1}{n} S(\phi_n) = \inf_{n} \frac{1}{n} S(\phi_n) \qquad [27]$$

where $\phi_n$ is the restriction of $\phi_\infty$ to $\mathcal{M}_n := \mathcal{M}^{\otimes n}$.
Following Hiai and Petz (1991), we define the
following quantity for any state $\phi$ on an arbitrary
finite-dimensional *-algebra $\mathcal{M}$ and a given $\delta > 0$:


$$\beta_\delta(\phi) = \inf\{ \log \mathrm{tr}(q) : q \in \mathcal{M},\ q^* = q,\ q^2 = q,\ \phi(q) \ge 1 - \delta \} \qquad [28]$$


We also define a state $\phi_\infty$ on $\mathcal{M}_\infty$ to be completely
ergodic if it is ergodic under transformations on $\mathcal{M}_\infty$
induced by l-fold shifts on $\mathbb{Z}$, for arbitrary $l \in \mathbb{N}$. The
following theorem is due to Hiai and Petz (1991),
who proved it in a slightly more general setting:


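Theorem 6 (Hiai and Petz). Suppose that $\phi_\infty$ is a
completely ergodic state on $\mathcal{M}_\infty$ and $d := \dim \mathcal{M} < \infty$,
and set $\phi_n = \phi_\infty|_{\mathcal{M}_n}$. Then, for any $\delta > 0$, the following
hold:


$$\text{(i)} \quad \limsup_{n \to \infty} \frac{1}{n} \beta_\delta(\phi_n) \le S_M(\phi_\infty) \qquad [29]$$


$$\text{(ii)} \quad \liminf_{n \to \infty} \frac{1}{n} \beta_\delta(\phi_n) \ge S_M(\phi_\infty) - \delta \log d \qquad [30]$$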


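Proof of (i) Choose $r > S_M(\phi_\infty)$ and let $\epsilon < r -
S_M(\phi_\infty)$ and $h = r - \epsilon$. By the definition of $S_M(\phi_\infty)$,
there exists $l \in \mathbb{N}$ such that $S(\phi_l) < lh$. Let $\{|e_i\rangle\}_{i=1}^{d^l}$
be an orthonormal set of eigenvectors of $\rho_{\phi_l}$ with
corresponding eigenvalues $\lambda_i$, that is, let


$$\rho_{\phi_l} = \sum_{i=1}^{d^l} \lambda_i p_i \qquad [31]$$


where $p_i = |e_i\rangle\langle e_i|$ is the projection onto $|e_i\rangle$, be the
spectral decomposition for $\rho_{\phi_l}$. Denote the spectrum
$\mathcal{X} = \{\lambda_i\}_{i=1}^{d^l}$. For $n \in \mathbb{N}$, introduce the probability
measures $\nu_n$ on $\mathcal{X}^n$ by


$$\nu_n(A) = \phi_{nl}(q_A) \qquad [32]$$


where, for any $A \subset \mathcal{X}^n$, the projection $q_A$ is defined by


$$q_A = \sum_{(\lambda_{i_1}, \ldots, \lambda_{i_n}) \in A} p_{i_1} \otimes \cdots \otimes p_{i_n} \qquad [33]$$


Similarly, we define $\nu_\infty$ on $\mathcal{X}^{\mathbb{Z}}$. The sequence of
random variables $(X_n)_{n \in \mathbb{Z}}$ with distribution $\nu_\infty$ is
then ergodic since $\phi_\infty$ is completely ergodic (and
hence l-ergodic).


By the Shannon–McMillan–Breiman theorem
(Theorem 2),

$$-\frac{1}{n} \log \nu_n(\{(x_1, \ldots, x_n)\}) \to h_{KS} \qquad [34]$$


almost surely w.r.t. $\nu_\infty$, where $h_{KS}$ is the Kolmogorov-
Sinai entropy. The latter is given by $h_{KS} = \lim_{n \to \infty}
(1/n) H_n = \inf_{n \in \mathbb{N}} (1/n) H_n$, where


$$H_n = -\sum_{(x_1, \ldots, x_n) \in \mathcal{X}^n} \nu_n(\{(x_1, \ldots, x_n)\}) \log \nu_n(\{(x_1, \ldots, x_n)\}) \qquad [35]$$

Notice in particular that

$$h_{KS} \le H_1 = S(\phi_l) < lh \qquad [36]$$


If we let $T_\epsilon^{(n)}$ be the (typical) subset of $\mathcal{X}^n$ such that


$$-\frac{1}{n} \log \nu_n(\{(x_1, \ldots, x_n)\}) \in (h_{KS} - \epsilon,\ h_{KS} + \epsilon) \qquad [37]$$


for $(x_1, \ldots, x_n) \in T_\epsilon^{(n)}$, then we have $\nu_n(T_\epsilon^{(n)}) \ge 1 - \delta$
for n large enough. Moreover, since $\nu_n(\{(x_1, \ldots, x_n)\}) >
e^{-n(h_{KS} + \epsilon)}$ for all $(x_1, \ldots, x_n) \in T_\epsilon^{(n)}$, and the total
measure is 1,


$$|T_\epsilon^{(n)}| \le e^{n(h_{KS} + \epsilon)} \le e^{n(lh + \epsilon)} \qquad [38]$$


It follows that $\mathrm{tr}(q_{T_\epsilon^{(n)}}) \le e^{n(lh + \epsilon)}$, whereas $\phi_{nl}(q_{T_\epsilon^{(n)}}) =
\nu_n(T_\epsilon^{(n)}) \ge 1 - \delta$, and we conclude that


$$\frac{1}{nl} \beta_\delta(\phi_{nl}) \le h + \frac{\epsilon}{l} \le r \qquad [39]$$


from which [29] follows upon taking $n \to \infty$, since
$r > S_M(\phi_\infty)$ was arbitrary. (Notice that $\beta_\delta(\phi_n)$ is
decreasing in n since $\mathcal{M}_n \subset \mathcal{M}_{n+1}$.) □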


Proof of (ii) Given $\epsilon, \delta > 0$ and $n \in \mathbb{N}$, choose a
projection $q_n$ with $\phi_n(q_n) \ge 1 - \delta$ and $\log \mathrm{tr}(q_n) <
\beta_\delta(\phi_n) + \epsilon$. Since $S_M(\phi_\infty) = \inf_n (1/n) S(\phi_n)$, we have
$S_M(\phi_\infty) \le (1/n) S(\phi_n)$. We now use the following
lemma:


Lemma 7 If $\phi$ is a state on a finite-dimensional
*-algebra $\mathcal{M}$, and $q \in \mathcal{M}$ is a projection, then


$$S(\phi) \le H(p) + \phi(q) \log \mathrm{tr}(q) + (1 - \phi(q)) \log \mathrm{tr}(1 - q) \qquad [40]$$

where $H(p) = -p \log p - (1 - p) \log(1 - p)$ (the binary
entropy) with $p = \phi(q)$.


Proof First notice that if $[\rho_\phi, q] = 0$ then the result
[40] follows from the simple inequality:

$-\sum_{i=1}^{m} \lambda_i \log \lambda_i \leq \log m$ if $\sum_{i=1}^{m} \lambda_i = 1$ [41]

Indeed, diagonalizing $\rho_\phi$, the eigenvalues $\lambda_i$ divide into
two subsets with corresponding eigenvectors belonging
to the range of q, respectively, its complement.
Considering the first set, we have, if $m = \dim(\mathrm{Ran}(q))$,
and taking $\lambda'_i = \lambda_i / (\sum_{j=1}^{m} \lambda_j)$ in [41],

$-\sum_{i=1}^{m} \lambda_i \log \lambda_i \leq -\sum_{i=1}^{m} \lambda_i \left[ \log \sum_{j=1}^{m} \lambda_j - \log m \right] = -\mathrm{tr}(q\rho_\phi) \left[ \log \mathrm{tr}(q\rho_\phi) - \log \mathrm{tr}(q) \right]$

Adding the analogous inequality for the part of the
spectrum corresponding to $1 - q$, we obtain [40].

In the general case, that is, if $[\rho_\phi, q] \neq 0$, define
the unitary $u = 2q - 1$ and the state

$\phi'(x) = \frac{1}{2} [\phi(x) + \phi(uxu)]$ [42]

Then $[\rho_{\phi'}, q] = 0$ and by concavity of $S(\phi)$ and the
result for the previous case

$H(p) + \phi(q) \log \mathrm{tr}(q) + (1 - \phi(q)) \log \mathrm{tr}(1 - q) \geq S(\phi') \geq S(\phi)$ [43]

since $\phi'(q) = \phi(q)$. □
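
The commuting case of Lemma 7 is easy to test numerically. The following is a minimal sketch under stated assumptions: the density matrix is taken diagonal, so it is represented by its list of eigenvalues, and the projection q is represented by a list of booleans marking which basis vectors lie in its range. The function names are hypothetical and logarithms are natural.

```python
import math
import random

def entropy(probs):
    """Shannon entropy (nats) of a list of non-negative weights summing to 1."""
    return -sum(x * math.log(x) for x in probs if x > 0)

def lemma7_sides(eigs, in_range):
    """Return (S(phi), right-hand side of [40]) in the commuting case.

    eigs     : eigenvalues of the (diagonal) density matrix
    in_range : booleans, True where the basis vector lies in Ran(q)
    """
    p = sum(l for l, r in zip(eigs, in_range) if r)   # p = phi(q)
    m = sum(in_range)                                 # tr(q)
    d = len(eigs)                                     # tr(1)
    bound = entropy([p, 1.0 - p])                     # binary entropy H(p)
    if m > 0:
        bound += p * math.log(m)
    if d - m > 0:
        bound += (1.0 - p) * math.log(d - m)
    return entropy(eigs), bound

random.seed(0)
for _ in range(5):
    weights = [random.random() for _ in range(6)]
    total = sum(weights)
    eigs = [w / total for w in weights]
    mask = [random.random() < 0.5 for _ in eigs]
    s, bound = lemma7_sides(eigs, mask)
    print(f"S = {s:.4f}  <=  bound = {bound:.4f}")
```

Equality holds when the eigenvalues are constant on the range of q and constant on its complement, which is the equality case of [41] applied to each block.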


Continuing with the proof of (ii), we conclude that

$S(\phi_n) \leq H(p) + \phi_n(q_n) \log \mathrm{tr}(q_n) + (1 - \phi_n(q_n)) \log \mathrm{tr}(1 - q_n) \leq 1 + \beta_\delta(\phi_n) + \epsilon + \delta n \log d$

Dividing by n and taking the limit we obtain [30].
□


It follows from this theorem that we can define a
typical subspace in the same way as in Schumacher’s
theorem. Indeed, given $\delta > 0$ and $\epsilon > 0$, we have
that for n large enough, there exists a subspace $\mathcal{T}^{(n)}_\epsilon$
equal to the range of a projection $q_n$ such that
$\phi_n(q_n) > 1 - \delta$ and
$e^{n(S_M(\phi_\infty) - \delta \log d - \epsilon)} < \dim(\mathcal{T}^{(n)}_\epsilon) = \mathrm{tr}(q_n) < e^{n(S_M(\phi_\infty) + \epsilon)}$.
The proof of the quantum
analog of the Shannon–McMillan theorem is then
similar to that of Schumacher’s theorem (Petz and
Mosonyi 2001, Bjelaković et al. 2004):


Theorem 8 Let $\phi_\infty$ be a completely ergodic
stationary state on the infinite tensor product
algebra $M_\infty$. If $R > S_M(\phi_\infty)$, then for any
decomposition of the form

$\rho^{(n)} = \sum_k p^{(n)}_k |\Psi^{(n)}_k\rangle\langle\Psi^{(n)}_k|$ [44]

there exists a reliable quantum code of rate R.
Conversely, if $R < S_M(\phi_\infty)$ then any quantum
compression–decompression scheme of rate R is
not reliable.


Remarks Theorem 6 also holds for higher-dimensional
information streams, with essentially
the same proof. (The existence of the mean entropy
is more complicated in that case.) The condition of
complete ergodicity in this theorem is unnecessary.
Indeed, Bjelaković et al. (2004) showed that the
result remains valid (also in more than one dimension)
if the state $\phi_\infty$ of the source is simply ergodic.
They achieved this by decomposing a general
ergodic state into a finite number of $\ell$-ergodic states,
and then applying the above strategy to each. It
should also be mentioned that a weaker version of
Theorem 6 was proved by King and Lesniewski
(1998). They considered the entropy of an associated
classical source, but did not show that this
classical entropy can be optimized to approximate
the von Neumann entropy. This had in fact already
been proved by Hiai and Petz (1991). The relevance
of the latter work for quantum information theory
was finally pointed out by Mosonyi and Petz (2001).


Source Coding for Quantum 
Spin Systems 


In this section we consider a class of quantum
sources modeled by Gibbs states of a finite strongly
interacting quantum spin system in $\Lambda \subset \mathbb{Z}^d$ with
$d \geq 2$. Due to the interaction between spins, the
density matrix of the source is not given by a tensor
product of the density matrices of the individual
spins and hence the quantum information source is
non-i.i.d. We consider the density matrix to be
written in the standard Gibbsian form:

$\rho^{\omega,\Lambda} = \frac{e^{-\beta H^{\omega}_{\Lambda}}}{\mathrm{tr}\, e^{-\beta H^{\omega}_{\Lambda}}}$ [45]

where $\beta > 0$ is the inverse temperature. Here $\omega$
denotes the boundary condition, that is, the configuration
of the spins in $\Lambda^c = \mathbb{Z}^d \setminus \Lambda$, and $H^{\omega}_{\Lambda}$ is the
Hamiltonian acting on the spin system in $\Lambda$ under
this boundary condition (see Datta and Suhov (2002)
for precise definitions of these quantities). The
denominator on the right-hand side of [45] is the
partition function.

Note that any faithful density matrix can be
written in the form [45] for some self-adjoint
operator $H^{\omega}_{\Lambda}$ with discrete spectrum, such that
$e^{-\beta H^{\omega}_{\Lambda}}$ is trace class. However, we consider $H^{\omega}_{\Lambda}$ to
be a small quantum perturbation of a classical
Hamiltonian and require it to satisfy certain
hypotheses (see Datta and Suhov (2002)). In
particular, we assume that $H_\Lambda = H_{0\Lambda} + \lambda V_\Lambda$, where
(1) $H_{0\Lambda}$ is a classical, finite-range, translation-invariant
Hamiltonian with a finite number of
periodic ground states, and the excitations of these
ground states have an energy proportional to the
size of their boundaries (Peierls condition); (2) $\lambda V_\Lambda$
is a translation-invariant, exponentially decaying,
quantum perturbation, $\lambda$ being the perturbation
parameter. These hypotheses ensure that the quantum
Pirogov–Sinai theory of phase transitions in
lattice systems (see, e.g., Datta et al. (1996)) applies.

The power of quantum Pirogov—Sinai theory is 
such that, in proving reliable data compression for 
such sources, we do not need to invoke the concept 
of ergodicity. 

Using the concavity of the von Neumann entropy
$S(\rho^{\omega,\Lambda})$, one can prove that the von Neumann
entropy rate (or mean entropy) of the source

$h = \lim_{\Lambda \nearrow \mathbb{Z}^d} \frac{S(\rho^{\omega,\Lambda})}{|\Lambda|}$

exists. For a general van Hove sequence, this follows
from the strong subadditivity of the von Neumann
entropy (see, e.g., Ohya and Petz (1993)).

Let $\rho^{\omega,\Lambda}$ have a spectral decomposition

$\rho^{\omega,\Lambda} = \sum_j \lambda_j |\psi_j\rangle\langle\psi_j|$

where the eigenvalues $\lambda_j$, $1 \leq j \leq 2^{|\Lambda|}$, and the
corresponding eigenstates $|\psi_j\rangle$, depend on $\omega$ and $\Lambda$.
Let $P^{\omega,\Lambda}$ denote the probability distribution $\{\lambda_j\}$ and
consider a random variable $K^{\omega,\Lambda}$ which takes a value
$\lambda_j$ with probability $\lambda_j$:

$K^{\omega,\Lambda}(\psi_j) = \lambda_j$, $P^{\omega,\Lambda}(K^{\omega,\Lambda} = \lambda_j) = \lambda_j$

The data compression limit is related to asymptotic
properties of the random variables $K^{\omega,\Lambda}$ as
$\Lambda \nearrow \mathbb{Z}^d$. As in the case of i.i.d. sources, we prove
the reliability of data compression by first proving
the existence of a typical subspace. The latter
follows from Theorem 9 below. The proof of this
crucial theorem relies on results of quantum
Pirogov–Sinai theory (Datta et al. 1996).

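Theorem 9 Under the above assumptions, for $\beta$
large and $\lambda$ small enough, for all $\epsilon > 0$

$\lim_{\Lambda \nearrow \mathbb{Z}^d} P^{\omega,\Lambda}\left( \left| -\frac{1}{|\Lambda|} \log K^{\omega,\Lambda} - h \right| \leq \epsilon \right) = \lim_{\Lambda \nearrow \mathbb{Z}^d} \sum_j \lambda_j \, \chi_{\{ | -|\Lambda|^{-1} \log \lambda_j - h | \leq \epsilon \}} = 1$ [46]

where $\chi_{\{\cdot\}}$ denotes an indicator function.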


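Theorem 9 is essentially a law of large numbers
for random variables $(-\log K^{\omega,\Lambda})$. The statement of
the theorem can be alternatively expressed as
follows. For any $\epsilon > 0$,

$\lim_{\Lambda \nearrow \mathbb{Z}^d} P^{\omega,\Lambda}\left( 2^{-|\Lambda|(h+\epsilon)} \leq K^{\omega,\Lambda} \leq 2^{-|\Lambda|(h-\epsilon)} \right) = 1$ [47]

Thus, we can define a typical subspace $\mathcal{T}^{\omega,\Lambda}_\epsilon$ by

$\mathcal{T}^{\omega,\Lambda}_\epsilon = \mathrm{span}\{ |\psi_j\rangle : 2^{-|\Lambda|(h+\epsilon)} \leq \lambda_j \leq 2^{-|\Lambda|(h-\epsilon)} \}$ [48]

It clearly satisfies the analogs of (i) and (ii) of the
typical subspace theorem, which implies as before
that a compression scheme of rate R is reliable if and
only if $R > h$.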


Universal and Variable Length Data Compression 


Thus far we discussed source-dependent data compression
for various classes of quantum sources. In
each case data compression relied on the identification
of the typical subspace of the source, which in
turn required a knowledge of its density matrix. In
classical information theory, there exists a generalization
of the theorem of typical sequences due to
Csiszár and Körner (1981) where the typical set is
universal, in that it is typical for every possible
probability distribution with a given entropy. This
result was used by Jozsa et al. (1998) to construct a
universal compression scheme for quantum i.i.d.
sources with a given von Neumann entropy S using
a counting argument for symmetric subspaces. This
was generalized to ergodic sources by Kaltchenko
and Yang (2003) along the lines of Theorem 6.
Hayashi and Matsumoto (2002) supplemented the
work of Jozsa et al. (1998) with an estimation of the
eigenvalues of the source (using the measurement
smearing technique) to show that a reliable compression
scheme exists for any quantum i.i.d. source,
independent of the value of its von Neumann entropy
S, the limiting rate of compression being given by S. If
one admits variable length coding, the Lempel–Ziv
algorithm gives a completely universal compression
scheme, independent of the value of the entropy, in
the classical case (Cover and Thomas 1991). This
algorithm was generalized to the quantum case for
i.i.d. sources by Jozsa and Presnell (2003), and to
sources modeled by Gibbs states of free bosons or
fermions on a lattice by Johnson and Suhov (2002).

Another important question is the efficiency of the
various coding schemes. The above-mentioned
schemes for quantum i.i.d. sources are not efficient,
in the sense that they have no polynomial time
implementation. Recently, it was shown by Bennett
et al. (2004) that an efficient, universal compression
scheme for i.i.d. sources can be constructed by
employing quantum state tomography.
The Value of Topological Reasoning
in General Relativity


Solving the equations of Einstein’s general relativity 
(see General Relativity: Overview) can be an exceed- 
ingly complicated business; it is commonly found 
necessary to resort to numerical solutions involving 
very complex computer codes (see Computational 
Methods in General Relativity: The Theory). The 
essential content of the basic equations of the theory 
itself is, however, something that can be phrased in 
simple geometrical terms, using only basic concepts 


of differential geometry (see General Relativity: 
Overview). By virtue of this, it is sometimes the 
case, in general relativity, that geometrical arguments 
of various kinds — including purely topological ones 
(i.e., arguments depending only upon the properties 
of continuity or smoothness) — can be used to great 
effect to obtain results that are not readily accessible 
by standard procedures of differential equation 
theory or by direct numerical calculation. 

One particularly significant family of situations 
where this kind of argument has a key role to play is 
in the important issue of the singularities that arise 
in many solutions of the Einstein equations, in 
which spacetime curvatures may be expected to 
diverge to infinity. These are exemplified, particu- 
larly, by two important classes of solutions of the 
Einstein field equations in which singularities arise.
In the first instance, we have cosmological models, 
which tend to exhibit the presence of an initial 
singularity referred to as the “Big Bang,” as was first 
noted in the standard Friedmann models (which are 
solutions of the Einstein equations with simple 
matter sources; see Cosmology: Mathematical 
Aspects). Secondly, we find a final singularity (for 
local observers) at the endpoint of gravitational 
collapse to a black hole (where in the relevant 
region, outside the collapsing matter, Einstein’s 
vacuum equations are normally taken to hold). In 
either case, there are canonical exact models, in 
which considerable symmetry is assumed, and where 
the models indeed become singular at places where 
the spacetime curvature diverges to infinity. For 
many years (prior to 1965), there had been much 
debate as to whether these singularities were an 
inevitable feature of the general physical situation 
under consideration, or whether the presence of 
singularities might be an artifact of the assumed 
high symmetry. The use of topological-type argu- 
ments has established that, in general terms, the 
occurrence of a singularity is not merely an artifact 
of symmetry, and cannot generally be removed by 
the introduction of small (finite) perturbations. 

Let us first consider the standard picture, put
forward in 1939 by Oppenheimer and Snyder (OS),
of the gravitational collapse of an over-massive star
to a black hole; see Figure 1 (and see Stationary
Black Holes). This assumes exact spherical symmetry.
The region external to the matter is described by
the well-known Schwarzschild solution of the
Einstein vacuum equations, appropriately extended
to inside the “Schwarzschild radius” $r = 2mG/c^2$
(G being Newton’s gravitational constant and c the
speed of light, and where m is the total mass of
the collapsing material; from now on, for convenience,
we choose units so that G = c = 1). In Figure 1,
this internal extension is conveniently expressed
using Eddington–Finkelstein coordinates $(r, v, \theta, \phi)$
(see Eddington (1924) and Finkelstein (1958)),
where $v = t + r + 2m \log(r - 2m)$, the metric form
being

$ds^2 = (1 - 2m/r)\,dv^2 - 2\,dv\,dr - r^2 (d\theta^2 + \sin^2\theta\, d\phi^2)$

(The signature convention + − − − is being adopted
here; see General Relativity: Overview.) We find
that, in this model, there is a singularity (at r=0) at 
the future endpoint of each world line of collapsing 
matter. Moreover, no future-timelike line starting 
inside the horizon can avoid reaching the singularity 
when we try to extend it, as a timelike curve, 


[Figure 1: spacetime diagram; labels in the original image: Observer, Horizon, Collapsing matter.]

Figure 1 Spacetime diagram of collapse to a black hole.
(One spatial dimension is suppressed.) Matter collapses inwards,
through the 3-surface that becomes the (absolute) event horizon.
No matter or information can escape the hole once it has been
formed. The null cones are tangent to the horizon and allow
matter or signals to pass inwards but not outwards. An external
observer cannot see inside the hole, but only the matter, vastly
dimmed and redshifted, just before it enters the hole.
(Reproduced with permission from Penrose R. (2004) The Road
to Reality: a Complete Guide to the Laws of the Universe.
London: Jonathan Cape.)


indefinitely into the future, where the “horizon” is 
the three-dimensional region obtained by rotating, 
over the (6,¢) 2-sphere, the null (lightlike) line 
which is r= 2m outside the matter region and which 
is the extension of this line, as a null line, into the 
past until it meets the axis. It is easy to see that any 
observer’s world line within this horizon is indeed 
trapped in this sense. 

The question naturally arises: how representative 
is this model? Here, the singularity occurs at the 
center (r=0), the place where all the matter is 
directed, and where it all reaches without rebound- 
ing. So it may be regarded as unsurprising that the 
density becomes infinite there. Now, let us suppose 
that the collapsing material is not exactly spherically 
symmetrical. Even if it is only slightly (though 
finitely) perturbed away from this symmetrical 
situation, having slight (but finite) transverse 
motions, the collapsing matter is now not all 
directed exactly towards the center, as it is in the 
OS model. One might imagine that the singularity 
could now be avoided, the different portions of
matter just “missing” each other and then being 
finally flung out again, after some complicated 
motions, where the density and spacetime curvatures 
might well become large but presumably still finite. 
To follow such an irregular collapse in full detail 
would present a very difficult task, and one would 
have to carry it out by numerical means. As yet, 
despite enormous advances in computational tech- 
nique, a fully effective simulation of such a 
“generic” collapse is still not in hand. In any case, 
it is hard to make a convincing case as to whether or 
not a singularity arises, because as soon as metric or 
curvature quantities begin to diverge, the computa- 
tion becomes fundamentally unreliable and simply 
“gives up.” So we cannot really tell whether the
failure is due to some genuine divergence or whether 
it is an artifact. It is thus fortunate that other 
mathematical techniques are available. Indeed, by 
use of a differential-topological-causal argument, 
we find that such perturbations do not help, at least 
so long as they are small enough not to alter the 
general character of the collapse, which we find has 
an “unstoppable” character, so long as a certain 
criterion is satisfied in its early stages.


Trapped Surfaces 


But how are we to characterize the collapse as 
“unstoppable,” where no symmetries are to be 
assumed, and the simple picture illustrated in 
Figure 1 cannot be appealed to? A convenient 
characterization is the presence of what is called a 
“trapped surface.” This notion generalizes a key 
feature of the 0 < r < 2m region inside the horizon 
of the vacuum (Eddington—Finkelstein) picture of 
Figure 1. To understand what this feature is, 
consider fixing a point s in the vacuum region of 
the (v,r)-plane of Figure 1. We must, of course, bear 
in mind that, because this plane is to be “rotated” 
about the central vertical axis ($r = 0$) by letting $\theta$ and
$\phi$ vary as coordinates on a 2-sphere $S^2$, the point s
actually describes a closed 2-surface S (coordinatized
by $\theta$ and $\phi$) with topology $S^2$ (so S is
intrinsically an ordinary 2-sphere). We shall be
concerned with the region $I^+(S)$, which is the
(chronological) “future” of S, that is, the locus of
points q for which a timelike curve exists having a
future endpoint at q and a past endpoint on S. We
shall also be interested, particularly, in the boundary
$\partial I^+(S)$ of $I^+(S)$. This boundary is described, in
Figure 1, by the pair of null curves v = const. and
2r + 4m log(r − 2m) = const., proceeding into the
future from s (and rotated in $\theta$ and $\phi$). The region
$I^+(S)$ itself is represented by that part of Figure 1
which lies between these null curves.

We observe that, in this symmetrical case (s being
chosen in the vacuum region), a characterization of s
as being “trapped,” in the sense that it lies in a
region that is within the horizon, is that the future
tangents to these null curves both point “inwards,”
in the sense of decreasing r. Since r is the metric
radius of the $S^2$ of rotation, so that the element of
surface area of this sphere is proportional to $r^2$, it
follows that the surface area of the boundary $\partial I^+(S)$
reduces, on both branches, as we move away from S
into the future. The three-dimensional region $\partial I^+(S)$
consists of two null surfaces joined along S, in
the sense that their Lorentzian normals are null
4-vectors. For each fixed value of $\theta$ and $\phi$, this
normal is a tangent to one or other of the two null
curves of Figure 1, starting at s. For a trapped s,
these normals point in the direction of decreasing r,
and it follows that the divergence of these normals is
negative (so $\rho > 0$ in what follows below).
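
The claim that both null directions point towards decreasing r inside the horizon can be read off from the Eddington–Finkelstein metric form. A minimal sketch, assuming only that metric with $d\theta = d\phi = 0$: setting $ds^2 = 0$ gives either $dv = 0$ (the ingoing rays) or $dr/dv = (1 - 2m/r)/2$ (the other family). The function name below is illustrative.

```python
def second_null_slope(r, m=1.0):
    """dr/dv along the null direction that is not v = const.

    From ds^2 = (1 - 2m/r) dv^2 - 2 dv dr = 0 with dv != 0,
    one gets dr/dv = (1 - 2m/r) / 2.
    """
    return 0.5 * (1.0 - 2.0 * m / r)

# r < 2m: negative slope (r decreases to the future), r = 2m: zero, r > 2m: positive.
for r in (0.5, 1.0, 1.9, 2.0, 2.1, 4.0):
    print(f"r = {r:4.1f}   dr/dv = {second_null_slope(r):+.4f}")
```

For r < 2m the slope is negative, so the nominally outgoing rays also move to smaller r, which is the trapped behaviour described above; at r = 2m the slope vanishes, corresponding to the horizon generator.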

In the general case, it is this property of negativity 
of the divergence, at S, of both sets of Lorentzian 
normals (i.e., of null tangents to $\partial I^+(S)$), that
characterizes S as a trapped surface, where in the 
general case we must also prescribe S to be compact 
and spacelike. But now there are to be no assump- 
tions of symmetry whatever. Such a characterization 
is stable against small, but finite, perturbations of 
the location of S, within the spacetime manifold M, 
and also against small, but finite, perturbations of M 
itself. 

We can think of a trapped surface in more direct 
physical/geometrical terms. Imagine a flash of light 
emitted all over some spacelike compact spherical 
surface such as S, but now in ordinary flat space- 
time, where for simplicity we suppose that S is 
situated in some spacelike (flat) 3-hypersurface H, of 
constant time t=0. There will be one component to 
the flash proceeding outwards and another proceed- 
ing inwards. Provided that S is convex, the outgoing 
flash will represent an initial increase of the surface 
area at every point of S and the ingoing flash, an 
initial decrease. In four-dimensional spacetime 
terms, we express this as positivity of the divergence 
of the outward null normal and the negativity of the 
divergence of the inward one. The characteristic 
feature of a trapped surface is that whereas the 
ingoing flash will still have an initially reducing 
surface area, the “outgoing” flash now has the 
curious property that its surface area is also initially 
decreasing, this holding at every point of S. 

Locally, this is not particularly strange. For a 
surface wiggling in and out, we are quite likely to 
find portions of ingoing flash with increasing area, 


620 Spacetime Topology, Causal Structure and Singularities 


and portions of outgoing flash with decreasing area. 
An extreme case in Minkowski spacetime has S as the 
intersection of two past light cones. All the null 
normals to S point along the generators of these past 
cones, and therefore all converge into the future. Such 
a surface S (indeed spacelike) looks “trapped” every- 
where locally, but fails to count as trapped, not being 
compact. Since there is nothing causally extreme about 
Minkowski space, it is appropriate not to count such 
surfaces as “trapped.” What is peculiar about a
trapped surface is that both ingoing and outgoing 
flashes are initially decreasing in area, over the entire 
compact S. (N. B. Hawking and Ellis (1973) adopt a 
slightly different terminology; the term “trapped,” 
used here, refers to their “closed trapped.”) 


The Null Raychaudhuri Equation 


What do we deduce from the existence of a trapped 
surface? A glance at Figure 1 gives us some 
indication of the trouble. As we trace $\partial I^+(S)$ into
the future, we find that its cross-sectional area 
continues to decrease, until becoming zero at the 
central singularity. This last feature need not reflect 
closely what happens in more general cases, with no 
spherical symmetry. But the reduction in surface 
area is a general property. This is the first point to 
appreciate in a theorem (Penrose 1965, 1968, 
Hawking and Ellis 1973) which indicates the 
profoundly disturbing physical implications of the 
existence of a trapped surface in physically realistic 
gravitational collapse, according to Einstein’s gen- 
eral relativity. The surface-area reduction arises 
from a result known as “Raychaudhuri’s equation,” 
in the case of null rays — where we refer to this as 
the “Sachs” equations. We come to this next. 

Although many different notations are used to 
express the needed quantities, we can here conve- 
niently employ the spin-coefficient formalism, as 
described elsewhere in this Encyclopedia (see Spi- 
nors and Spin Coefficients). 

Suppose that we have a congruence (smooth three-parameter
family) of rays (null geodesics) in four-dimensional
spacetime. Let $\ell^a$ be a real future-null
vector, tangent to a null geodesic $\gamma$ of the congruence,
and let $m^a$ be complex-null, also defined along $\gamma$,
where its real and imaginary parts are unit vectors
spanning a 2-surface element orthogonal to $\ell^a$ at each
point of $\gamma$, so we have

$\ell_a \ell^a = 0, \quad \ell_a m^a = 0, \quad m_a m^a = 0, \quad m_a \bar{m}^a = -1, \quad (\ell^a = \bar{\ell}^a)$

where it is assumed that each of $\ell^a, m^a$ is parallel-propagated
along $\gamma$:

$\ell^a \nabla_a \ell^b = 0, \quad \ell^a \nabla_a m^b = 0$

($\nabla_a$ denoting covariant derivative). The spin-coefficient
quantities

$\rho = m^a \bar{m}^b \nabla_a \ell_b$ and $\sigma = m^a m^b \nabla_a \ell_b$

are of importance. Here, the real part of $\rho$ measures the
convergence of the congruence and the imaginary part
defines its rotation; $\sigma$ measures its shear, where the
argument of $\sigma$ defines the direction (perpendicular
to $\gamma$) of the axis of shear, and whose strength is defined
by $|\sigma|$ (see Penrose and Rindler (1986) for a graphic
description of these quantities). Defining propagation
derivative along $\gamma$ by

$D = \ell^a \nabla_a$

we can write the Sachs equations as

$D\rho = \rho^2 + \sigma\bar{\sigma} + \Phi$
$D\sigma = 2\rho\sigma + \Psi$

where $\Phi = -(1/2) R_{ab} \ell^a \ell^b$ and $\Psi = C_{abcd} \ell^a m^b \ell^c m^d$,
conventions for the Ricci tensor $R_{ab}$ and the Weyl
tensor $C_{abcd}$ being those of General Relativity:
Overview (and of Penrose and Rindler (1984)). We
note that it is the real Ricci component $\Phi$ which
governs the propagation of the divergence and the
complex Weyl component $\Psi$ which governs the
propagation of shear, though there are some nonlinear
terms. The quantity $\Phi$ is normally taken non-negative,
since it measures the energy flux across $\gamma$
(with, in fact, $\Phi = 4\pi G T_{ab} \ell^a \ell^b$, where $T_{ab}$ is the
energy tensor). The condition that $\Phi \geq 0$ at all points
of spacetime and for all null directions $\ell^a$ is called
the “weak energy condition.” (Again there is a minor
discrepancy with Hawking and Ellis (1973) who
adopt a somewhat stronger “weak energy condition,”
which is the above but where $\ell^a$ is also allowed to be
future-timelike. Unfortunately, with this terminology,
their “weak energy condition” is not strictly weaker
than their “strong energy condition.”)
It will now be assumed that $\rho$ is real:

$\rho = \bar{\rho}$

which is always the case for propagation along the
generators of a null hypersurface. The weak energy
condition then has an important implication for us.
We find that if A is an element of 2-surface area
within the plane spanned by the real and imaginary
parts of $m^a$, then (this area element being propagated
by D along the lines $\gamma$)

$D A^{1/2} = -\rho A^{1/2}$

As a consequence, assuming $\Phi \geq 0$,

$D^2 A^{1/2} = -(\sigma\bar{\sigma} + \Phi) A^{1/2} \leq 0$

This tells us that once the divergence ($-\rho$) becomes
negative, then the area element must reduce to zero
sometime in the future along $\gamma$, assuming that $\gamma$ is
future-null-complete in the sense that it extends to
indefinitely large values of an affine parameter u
defined along it, where an affine parameter associated
with the parallel-propagated $\ell^a$ satisfies

$\ell^a \nabla_a u = 1$

Such a place where the cross-sectional area pinches
down to zero is a singularity of the congruence or null
hypersurface, referred to as a “caustic.” (There are
also terminological confusions arising from different
authors defining the term “caustic” in slightly
different ways. The terminology used here is slightly
discrepant from that of Arnol’d (1992) (Chapter 3).)
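
The focusing argument can be illustrated numerically. The sketch below integrates the second-order equation for $x = A^{1/2}$ along a single ray, with $x(0) = 1$ and $x'(0) = -\rho_0$. It assumes a prescribed non-negative source function standing for $\sigma\bar{\sigma} + \Phi$; the function name, step size, and source profiles are illustrative assumptions, not part of the original argument.

```python
import math

def caustic_parameter(rho0, source=lambda u: 0.0, du=1e-4, u_max=100.0):
    """Affine parameter at which x = A^(1/2) first reaches zero, or None.

    Integrates x'' = -source(u) * x with x(0) = 1, x'(0) = -rho0, where
    source(u) >= 0 plays the role of sigma * conj(sigma) + Phi along the ray.
    """
    x, dx, u = 1.0, -rho0, 0.0
    while u < u_max:
        dx -= source(u) * x * du          # semi-implicit Euler: update slope first
        x_new = x + dx * du
        if x_new <= 0.0:
            # linear interpolation for the zero crossing inside this step
            return u + du * x / (x - x_new)
        x, u = x_new, u + du
    return None

free = caustic_parameter(rho0=0.5)                              # no shear, no matter
focused = caustic_parameter(rho0=0.5, source=lambda u: 1.0)     # constant positive source
print(free, focused, math.atan(2.0))
```

With a vanishing source the area element shrinks linearly and the caustic sits at $u = 1/\rho_0$. Any non-negative source can only bring it closer; for the constant source used here the exact solution is $x = \cos u - \rho_0 \sin u$, whose first zero is at $\arctan(1/\rho_0)$.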

From this property, it follows that if we have a 
trapped surface S, then every generator of OI*(S), if 
extended indefinitely into the future, must eventually 
encounter a caustic. This, so far, tells us nothing about 
actual singularities in the spacetime M; even Minkowski 
space contains many null hypersurfaces with multitudes 
of caustic points. However, caustics do tell us some- 
thing significant about sets like 01*(S), which are the 
boundaries of future sets, and we come to this shortly. 


Causality Properties 


First, consider the basic causal relations. If a and b
are two points of M, then if there is a nontrivial
future-timelike curve in M from a to b we say that a
“chronologically” precedes b and write

$a \ll b$

(so it would be possible for some observer’s world line
to encounter first a and then b). If there is a future-null
curve in M from a to b (trivial or otherwise), we say that
a “causally” precedes b and write

$a \prec b$

(so it would be possible for a signal to get from a to
b). We have the following elementary properties (see
Penrose (1972)):

$a \prec a$
if $a \ll b$ then $a \prec b$
if $a \ll b$ and $b \ll c$ then $a \ll c$
if $a \ll b$ and $b \prec c$ then $a \ll c$
if $a \prec b$ and $b \ll c$ then $a \ll c$
if $a \prec b$ and $b \prec c$ then $a \prec c$
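
These properties can be checked by brute force in the simplest example. The sketch below assumes two-dimensional Minkowski space with coordinates (t, x) and units in which c = 1, where chronological precedence means lying strictly inside the future light cone and causal precedence means lying inside or on it. The function names and the integer grid of sample points are illustrative choices.

```python
import itertools

def chron(a, b):
    """Chronological precedence in 1+1 Minkowski space: b strictly inside the future cone of a."""
    dt, dx = b[0] - a[0], b[1] - a[1]
    return dt > abs(dx)

def causal(a, b):
    """Causal precedence: b inside or on the future cone of a (a itself included)."""
    dt, dx = b[0] - a[0], b[1] - a[1]
    return dt >= abs(dx)

def check_properties(points):
    """Verify the six elementary properties on every pair and triple of sample points."""
    for a in points:
        assert causal(a, a)
    for a, b in itertools.product(points, repeat=2):
        if chron(a, b):
            assert causal(a, b)
    for a, b, c in itertools.product(points, repeat=3):
        if chron(a, b) and chron(b, c):
            assert chron(a, c)
        if chron(a, b) and causal(b, c):
            assert chron(a, c)
        if causal(a, b) and chron(b, c):
            assert chron(a, c)
        if causal(a, b) and causal(b, c):
            assert causal(a, c)
    return True

points = [(t, x) for t in range(-2, 3) for x in range(-2, 3)]
print(check_properties(points))
```

Integer coordinates keep all comparisons exact. The mixed properties reflect the fact that composing a timelike step with a null one always lands strictly inside the cone.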


We generalize the definition of $I^+(S)$, above, to an
arbitrary subset Q in M, obtaining the chronological
future $I^+(Q)$ and past $I^-(Q)$ of Q in M by

$I^+(Q) = \{q \mid p \ll q \text{ for some } p \in Q\}$
$I^-(Q) = \{q \mid q \ll p \text{ for some } p \in Q\}$

(the notation {q | some property of q} denotes the set
of q’s with the stated property), and the causal future
$J^+(Q)$ and past $J^-(Q)$ of Q in M by

$J^+(Q) = \{q \mid p \prec q \text{ for some } p \in Q\}$
$J^-(Q) = \{q \mid q \prec p \text{ for some } p \in Q\}$

The $I^\pm(Q)$ are always open sets, but the $J^\pm(Q)$ are not
always closed (though they are for any closed set Q in
Minkowski space). Thus, the sets $I^\pm(Q)$ have a more
uniform character than the $J^\pm(Q)$, and it is simpler to
concentrate, here, on the $I^\pm(Q)$ sets.

The boundary $\partial I^+(Q)$ of $I^+(Q)$ has an elegant
characterization:

$\partial I^+(Q) = \{q \mid I^+(q) \subset I^+(Q), \text{ but } q \notin I^+(Q)\}$

and the corresponding statement holds for $\partial I^-(Q)$.
Boundaries of futures also have a relatively simple
structure, as is exhibited in the following result (for
which there is also a version with past and future
interchanged):


Lemma Let $Q \subset M$ be closed, and $p \in \partial I^+(Q) - Q$;
then there exists a null geodesic on $\partial I^+(Q)$ with
future endpoint at p and which either extends along
$\partial I^+(Q)$ indefinitely into the past, or until it reaches a
point of Q. It can only extend into the future along
$\partial I^+(Q)$ if p is not a caustic point of $\partial I^+(Q)$.


Beyond a caustic point, the null geodesic would
enter into the interior of $I^+(Q)$, but this also happens
(more commonly) when crossing another region of
null hypersurface on $\partial I^+(Q)$.

We wish to apply this to $\partial I^+(S)$, for a trapped
surface S, but we first need a further assumption that S
lies in the interior of the (future) domain of dependence
$D^+(H)$ of some spacelike hypersurface H. This region is
defined as the totality of points q for which every
timelike curve with future endpoint q can be extended
into the past until it meets H. One can consider domains
of dependence for regions H other than smooth spacelike
surfaces, but it is usual to assume, more generally,
that H is a closed achronal set, where “achronal” means
that H contains no pair of points a, b for which $a \ll b$.
We find that every point q in the interior int $D^+(H)$ of
$D^+(H)$ has the further property that all null curves into
the past from q will also eventually meet H if extended
sufficiently. The physical significance of $D^+(H)$ is that,
for fields with locally Lorentz-invariant and deterministic
evolution equations, the (appropriate) initial data
on H will fix the fields throughout $D^+(H)$ (and also
throughout the similarly defined past domain of
dependence $D^-(H)$). We find that the future
Cauchy horizon $H^+(H)$, which is the future boundary
of $D^+(H)$ defined by

$H^+(H) = D^+(H) - I^-(D^+(H))$,

has properties similar to the boundary of a past set, in
accordance with the above lemma, and likewise for the
past Cauchy horizon $H^-(H)$, defined correspondingly.


Singularity Theorems 
and Related Questions 


Now, applying our lemma to $\partial I^+(S)$, for a trapped
surface $S \subset$ int $D^+(H)$, we find that every one of its
points lies on a null-geodesic segment $\gamma$ on $\partial I^+(S)$,
with past endpoint on S (for if $\gamma$ did not terminate at S
it would have to reach H, which is impossible).
Assuming future-null completeness and weak energy
($\Phi \geq 0$), we conclude that if extended far enough into
the future, the family of such null geodesics $\gamma$ must
encounter a caustic, and therefore they must leave
$\partial I^+(S)$ and enter $I^+(S)$. We finally conclude that
$\partial I^+(S)$ must be a compact topological 3-manifold.
Using basic theorems, we construct an everywhere
timelike vector field in int $D^+(H)$ which provides a
(1–1) continuous map from the compact $\partial I^+(S)$ to H,
yielding a contradiction if H is noncompact, thereby
establishing the following (Penrose 1965, 1968):


Theorem The requirement that there be a trapped 
surface which, together with its closed future, lies in the 
interior of the domain of dependence of a noncompact 
spacelike hypersurface, is incompatible with future null 
completeness and the weak energy condition. 


We notice that this “singularity theorem” gives no 
indication of the nature of the failure of future null 
completeness in a spatially open spacetime subject to 
weak positivity of energy and containing a trapped 
surface. The natural assumption is that in an actual 
physical situation of such gravitational collapse, the 
failure of completeness would arise at places where 
curvatures mount to such extreme values that 
classical general relativity breaks down, and must be 
replaced by the appropriate “quantum geometry” (see 
Quantum Geometry and its Applications, etc.). 
Hawking (1965) showed how this theorem (in time- 
reversed form) could also be applied on a cosmolo- 
gical scale to provide a strong argument that the 
Big-Bang singularity of the standard cosmologies is 
correspondingly stable. He subsequently introduced 
techniques from “Morse theory” which could be 
applied to timelike rather than just null geodesics 
and, using arguments applied to Cauchy horizons, 


was able to remove assumptions concerning domains 
of dependence (e.g., Hawking (1967)). A later 
theorem (Hawking and Penrose 1970) encompassed 
most of the earlier ones and had, as one of its 
implications, that virtually all spatially closed uni- 
verse models, satisfying a reasonable energy condition 
and without closed timelike curves, would have to be 
singular, in this sense of “incompleteness,” but again 
the topological-type arguments used give little indica- 
tion of the nature or location of the singularities. 

Another issue that is not addressed by these 
arguments is whether the singularities arising from 
gravitational collapse are inevitably “hidden,” as in 
Figure 1, by the presence of a horizon — a conjecture 
referred to as “cosmic censorship” (see Penrose 
(1969, 1998)). Without this assumption, one cannot 
deduce that gravitational collapse, in which a trapped 
surface forms, will lead to a black hole, or to the 
alternative which would be a “naked singularity.” 
There are many results in the literature having a 
bearing on this issue, but it still remains open. 

A related issue is that of strong cosmic censorship 
which has to do with the question of whether 
singularities might be observable to local observers. 
Roughly speaking, a naked singularity would be one 
which is “timelike,” whereas the singularities in black 
holes might in general be expected to be spacelike 
(or future-null), and in the Big Bang, spacelike (or past-null). 
There are ways of characterizing these distinctions 
purely causally, in terms of past sets or future sets (sets $Q$ 
for which $Q = I^-(Q)$ or $Q = I^+(Q)$); see Penrose (1998). 
If (strong) cosmic censorship is valid, so there are no 
timelike singularities, the remaining singularities would 
be cleanly divided into past-type and future-type. In the 
observed universe, there appears to be a vast difference 
between the structure of the two, which is intimately 
connected with the second law of thermodynamics, 
there appearing to be an enormous constraint on 
the Weyl curvature (see General Relativity: Overview) 
in the initial singularities but not in the final ones. 

Despite the likelihood of singularities arising in their 
time evolution, it is possible to set up initial data for the 
Einstein vacuum equations for a wide variety of 
complicated spatial topologies (see Einstein Equations: 
Initial Value Formulation). On the observational side, 
however, there seems to be little evidence for anything 
other than Euclidean spatial topology in our actual 
universe (which includes black holes). Speculation on 
the nature of spacetime at the tiniest scales, 
where quantum gravity might be relevant, often 
involves non-Euclidean topology. It may be 
noted that an early theorem of Geroch established that 
the constraints of classical Lorentzian geometry do not 
permit the spatial topology to change without viola- 
tions of causality (closed timelike curves). 
Spectral Sequences 


Introduction 


Spectral sequences are a tool for collecting and 
distilling the information contained in an infinite 
number of long exact sequences. Their most 
common use is the calculation of homology by 
filtering the object under study and using a spectral 
sequence to pass from knowledge of the homology 
of the filtration quotients to that of the object itself. 
This article will discuss the construction of spectral 
sequences and the notion of convergence including 
conditions sufficient to guarantee convergence. 
Some sample applications of spectral sequences are 
given. 

A differential on an abelian group $G$ is a self-map 
$d: G \to G$ such that $d^2 = 0$. A morphism of differential 
groups is a map $f: G \to G'$ such that $d'f = fd$. 
The condition $d^2 = 0$ guarantees that $\operatorname{Im} d \subset \operatorname{Ker} d$, 
so to the differential group $(G,d)$ we can associate 
its homology, $H(G,d) := \operatorname{Ker} d/\operatorname{Im} d$. Often $G$ has 
extra structure and we require $d$ to satisfy some 
compatibility condition in order that $H(G,d)$ should 
also have this structure. For example, a differential 
graded Lie algebra $(L,d)$ requires a differential $d$ 
which satisfies the condition 
$d[x,y] = [dx,y] + (-1)^{|x|}[x,dy]$. While, for simplicity, throughout this 
article we will always assume that $G$ is an abelian 
group, the concepts are readily extended to the case 
where $G$ is an object of some abelian category, and 
generalizations to nonabelian situations have also 
been studied. 
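
As a small illustration (not taken from the article), the following worked computation checks the definition of $H(G,d)$ on a hypothetical example. The group $G = \mathbb{Z}\oplus\mathbb{Z}$ and the map $d$ are chosen here purely for illustration.

```latex
% Hypothetical example: G = Z (+) Z with d(a,b) = (2b, 0).
\begin{align*}
d^2(a,b) &= d(2b,0) = (0,0), \\
\operatorname{Ker} d &= \{(a,0) : a \in \mathbb{Z}\}, \qquad
\operatorname{Im} d = \{(2b,0) : b \in \mathbb{Z}\}, \\
H(G,d) &= \operatorname{Ker} d/\operatorname{Im} d \cong \mathbb{Z}/2 .
\end{align*}
```

Even though $G$ is free, its homology can carry torsion, which is one reason $H(G,d)$ is treated as an invariant in its own right.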

An important example of extra structure is the 
case where $G = \bigoplus_{n=-\infty}^{\infty} G_n$ is a graded abelian 
group. The appropriate compatibility condition for 
a differential graded group is that $d$ should be 
homogeneous of degree $-1$. That is, $d(G_n) \subset G_{n-1}$. 
In many contexts it is more natural to use superscripts 
and regard $d$ as having degree $+1$; the two 
concepts are equivalent via the reindexing convention 
$G^n := G_{-n}$. Another important example is that 
where $G$ forms a graded algebra, meaning that it has 
a multiplication $G_n \otimes G_k \to G_{n+k}$. To form a 
differential graded algebra, in addition to having 
degree $-1$, $d$ is required to satisfy the Leibniz rule 
$d(xy) = d(x)y + (-1)^{|x|} x\,d(y)$ (where $|x|$ denotes the 
degree of $x$) familiar from the differentiation of 
differential forms. 

In many cases, G itself is not the main object of 
interest, but is a relatively large and complicated 
object, G = G(X), formed by applying some functor 
G to the object X being studied. For example, X 
might be some manifold and G could be the set of 
all differential forms on X with the exterior 
derivative as d. The presumption is that H(G(X)) 
carries the information we want about X in a much 
simpler form than the whole of G(X). 

A spectral sequence (Leray 1946) is defined 
simply as a sequence $((E^r, d^r))_{r = n_0, n_0+1, \ldots}$ of differential 
abelian groups such that $E^{r+1} = H(E^r, d^r)$. By 
reindexing, we could always arrange that $n_0 = 1$, but 
sometimes it is more natural to begin with some 
other integer. If all terms $(E^r, d^r)$ of the spectral 
sequence have the appropriate additional structure, 
we might refer, for example, to a spectral sequence 
of Lie algebras. If there exists $N$ such that $E^r = E^N$ 
for all $r \ge N$ (equivalently $d^r = 0$ for all $r \ge N$), the 
spectral sequence is said to “collapse” at $E^N$. 

The definition of spectral sequence is so broad 
that we can say almost nothing of interest about 
them without putting on some additional condi- 
tions. We will begin by considering the most 
common type of spectral sequence, historically the 
one that formed the motivating example: the 
spectral sequence of a filtered chain complex. 


Filtered Objects 


To study a complicated object $X$, it often helps to 
filter $X$ and study it one filtration at a time. A 
filtration $F_X$ of a group $X$ is a nested collection of 
subgroups 


$$F_X := \cdots \subset F_n X \subset F_{n+1} X \subset \cdots \subset X, \qquad -\infty < n < \infty.$$


A morphism $f: F_X \to F_Y$ of filtered groups is a 
homomorphism $f: X \to Y$ such that $f(F_n X) \subset F_n Y$. 
The groups $F_n X/F_{n-1} X$ are called the “filtration 
quotients” and their direct sum $\operatorname{Gr}(F_X) := \bigoplus_n F_n X/F_{n-1} X$ 
is called the associated graded group of the 
filtered group $F_X$. In cases where $X$ has additional 
structure, we might define special types of filtrations 
satisfying some compatibility conditions so 
that $\operatorname{Gr}(F_X)$ inherits the additional structure. For 
example, an algebra filtration of an algebra $X$ is 
defined as one for which $(F_n X)(F_k X) \subset F_{n+k} X$. 


Since our plan is to study $X$ by computing 
$\operatorname{Gr}(F_X)$, the first question we need to consider is 
what conditions we need to place on our filtration 
so that $\operatorname{Gr}(F_X)$ retains enough information to 
recover $X$. Our experience from the “5-lemma” 
suggests that the appropriate way to phrase the 
requirement is to ask for conditions on the filtrations 
which are sufficient to conclude that $f: X \to Y$ 
is an isomorphism whenever $f: F_X \to F_Y$ is a 
morphism of filtered groups for which the induced 
$\operatorname{Gr}(f): \operatorname{Gr}(X) \to \operatorname{Gr}(Y)$ is an isomorphism. 

It is clear that $\operatorname{Gr}(F_X)$ can tell us nothing about 
$X - (\bigcup_n F_n X)$ so we require that $X = \bigcup_n F_n X$. Similarly 
we need that $\bigcap_n F_n X = 0$. However, the latter condition 
is insufficient as can be seen from the following 
example. 


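Example 1 Let $X := \bigoplus_{k=1}^{\infty} \mathbb{Z}$ and $Y := \prod_{k=1}^{\infty} \mathbb{Z}$. Set 

$$F_n X := \begin{cases} X & \text{if } n \ge 0 \\ \bigoplus_{k=-n}^{\infty} \mathbb{Z} & \text{if } n < 0 \end{cases} \qquad F_n Y := \begin{cases} Y & \text{if } n \ge 0 \\ \prod_{k=-n}^{\infty} \mathbb{Z} & \text{if } n < 0 \end{cases}$$

and let $f: X \to Y$ be the inclusion. Then $\operatorname{Gr}(f)$ is an 
isomorphism but $f$ is not. 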


To phrase the appropriate condition we need the 
concept of algebraic limits. Given a sequence of 
objects $\{X_n\}_{n\in\mathbb{Z}}$ and morphisms $f_n: X_n \to X_{n+1}$ in 
some category, the “direct limit” or “colimit” of the 
sequence, written $\varinjlim_n X_n$, is an object $X$ together 
with morphisms $g_n: X_n \to X$ satisfying $g_{n+1} \circ f_n = g_n$, 
having the universal property that given any object 
$X'$ together with maps $g'_n: X_n \to X'$ satisfying 
$g'_{n+1} \circ f_n = g'_n$, there exists a unique morphism $h: X \to X'$ 
such that $g'_n = h \circ g_n$ for all $n$. By the usual 
categorical argument the object $X$, if it exists, is 
unique up to isomorphism. The dual concept, 
“inverse limit” or simply “limit” of the sequence, 
written $\varprojlim_n X_n$, is obtained by reversing the 
directions of the morphisms. For intuition, we note 
that these notions share, with the notion of limits of 
sequences in calculus, the properties that changing 
the terms $X_n$ only for $n < N$ does not affect 
$\varinjlim_n X_n$ and if the sequence stabilizes at $N$ (i.e., 
the morphisms $f_n$ are isomorphisms for all $n \ge N$) 
then $\varinjlim_n X_n \cong X_N$. Similarly $\varprojlim_n X_n$ depends 
only upon behavior of the sequence as $n \to -\infty$. 
Limits over partially ordered sets other than $\mathbb{Z}$ can 
also be taken but we shall not need them in this 
article. Although limits need not exist in general, in 
the category of abelian groups, both the direct and 
inverse limit exist for any sequence and are given 
explicitly by the following constructions: 
$\varinjlim_n X_n = \bigoplus_n X_n/\sim$ where, letting $i_n: X_n \to \bigoplus_n X_n$ be the 
canonical inclusion, the equivalence relation is generated 
by $i_n(x) \sim i_{n+1} f_n(x)$ for $x \in X_n$; and 
$\varprojlim_n X_n = \{(x_n) \in \prod_n X_n \mid f_n(x_n) = x_{n+1}\ \forall n\}$. 
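
To make the two explicit constructions concrete, here is a short worked example. The sequence below (every $X_n$ equal to $\mathbb{Z}$, every $f_n$ multiplication by 2) is an illustrative choice, not one used in the article.

```latex
% Illustrative sequence: X_n = Z for all n in Z, f_n = multiplication by 2.
\begin{align*}
\varinjlim_n X_n &\cong \mathbb{Z}[\tfrac12],
  \qquad g_n(x) = x/2^{n}, \quad
  g_{n+1}(f_n(x)) = 2x/2^{n+1} = g_n(x); \\
\varprojlim_n X_n &= \{(x_n) : 2x_n = x_{n+1}\ \forall n\} = 0,
\end{align*}
% since x_n = 2^k x_{n-k} for every k >= 0 forces x_n to be divisible
% by every power of 2, hence x_n = 0.
```

The same sequence therefore has a large direct limit and a trivial inverse limit, reflecting the fact that the former depends on behavior as $n \to \infty$ and the latter on behavior as $n \to -\infty$.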

The condition needed is that our filtrations should 
be bicomplete, defined as follows. $F_X$ is called 
“cocomplete” if the canonical map $\varinjlim_n F_n X \to X$ 
is an isomorphism and $F_X$ is called “complete” if 
$X \to \varprojlim_n X/F_n X$ is an isomorphism. $F_X$ is called 
bicomplete if it is both complete and cocomplete. 
Note that $F_X$ cocomplete is equivalent to $\bigcup_n F_n X = X$ 
but $F_X$ complete is stronger than $\bigcap_n F_n X = 0$. 
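
The gap between completeness and $\bigcap_n F_n X = 0$ can be seen in the setting of Example 1. The sketch below assumes that the filtrations there are the "tail" filtrations written out in the first comment line; the exact index offset is an assumption and does not affect the conclusion.

```latex
% Assumption: F_{-n}X = \bigoplus_{k \ge n} Z and F_{-n}Y = \prod_{k \ge n} Z for n >= 1.
\begin{align*}
X/F_{-n}X &\cong \mathbb{Z}^{\,n-1} \cong Y/F_{-n}Y, \\
\varprojlim_n X/F_nX &\cong \prod_{k=1}^{\infty}\mathbb{Z} = Y
  \cong \varprojlim_n Y/F_nY .
\end{align*}
% The canonical map X -> lim X/F_nX is the inclusion of the direct sum
% into the product, which is not surjective: F_X is not complete,
% although the intersection of the F_nX is 0. The map Y -> lim Y/F_nY
% is the identity: F_Y is complete.
```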


Theorem 1 (Comparison theorem). Let $F_X$ be 
bicomplete and let $F_Y$ be cocomplete with 
$\bigcap_n F_n Y = 0$. Suppose that $f: F_X \to F_Y$ is a morphism 
such that $\operatorname{Gr}(f): \operatorname{Gr}(X) \to \operatorname{Gr}(Y)$ is an isomorphism. 
Then $f: X \to Y$ is an isomorphism. 


Filtered Chain Complexes 


A chain complex $(C,d)$ of abelian groups consists of 
abelian groups $C_n$ for $n \in \mathbb{Z}$ together with homomorphisms 
$d_n: C_n \to C_{n-1}$ such that $d_n \circ d_{n+1} = 0$ for 
all $n$. To the chain complex $(C,d)$ we can associate 
the differential (abelian) group $(C_*, d) := \bigoplus_{n=-\infty}^{\infty} C_n$ 
with $d|_{C_n}$ induced by $d_n$. We often write simply $C$ if 
the differential is understood. The dual notion in 
which $d$ has degree $+1$ is called a cochain complex 
and the concepts are equivalent through our 
convention $C^n := C_{-n}$. 


Theorem 2 (Homology commutes with direct 
limits). $H(\varinjlim_n C_n) = \varinjlim_n H(C_n)$. 


As we shall see later, failure of homology to 
commute with inverse limits is a source of great 
complication in working with spectral sequences. 

Let $F_C$ be a filtered chain complex. In many 
applications, our goal is to compute $H_*(C)$ from a 
knowledge of $H_*(F_nC/F_{n-1}C)$ for all $n$. The overall 
plan, which is not guaranteed to be successful in 
general, would be: 


1. use the given filtration on $C$ to define a filtration 
on $H_*(C)$, 

2. use our knowledge of $H_*(\operatorname{Gr} C)$ to compute 
$\operatorname{Gr} H_*(C)$, 

3. reconstruct $H_*(C)$ from $\operatorname{Gr} H_*(C)$. 


To begin, set $F_n(H_*C) := \operatorname{Im}(s_n)_*$, where 
$s_n: F_n(C) \to C$ is the inclusion (chain) map from the 
filtration. The spectral sequence which we will 
define for this situation can be regarded as a method 
of keeping track of the information contained in 
the infinite collection of long exact homology 
sequences coming from the short exact sequences 

$$0 \to F_{n-1}C \to F_nC \to F_nC/F_{n-1}C \to 0.$$

When working with a long exact sequence, knowledge of two of 
every three terms gives a handle on computing the 
remaining terms but does not, in general, completely 
determine those terms, which explains intuitively 
why we have some reason to hope that a spectral 
sequence might be useful and also why it is not 
guaranteed to solve our problem. 

Before proceeding with our motivating example, 
we digress to discuss spectral sequences formed from 
exact couples. 


Exact Couples 


In this section, we will define exact couples, show 
how to associate a spectral sequence to an exact 
couple, and discuss some properties of spectral 
sequences coming from exact couples. As we shall 
see, a filtered chain complex gives rise to an exact 
couple and we will examine this spectral sequence in 
greater detail. 

Exact couples were invented by Massey and many 
books use them as a convenient method of con- 
structing spectral sequences. Other books bypass 
discussion of exact couples and define the spectral 
sequence coming from a filtered chain complex 
directly. 


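Definition 1 An “exact couple” consists of a triangle 

$$D \xrightarrow{\ i\ } D \xrightarrow{\ j\ } E \xrightarrow{\ k\ } D$$

(drawn here unrolled, with all copies of $D$ identified) 
containing abelian groups $D$ and $E$ together with 
homomorphisms $i$, $j$, $k$ such that the diagram is 
exact at each vertex. 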


In the following, to avoid conflicting notation 
considering the many superscripts and subscripts 
which will be needed, we use the convention that an 
$n$-fold composition will be written $f^{\circ n}$ rather than 
the usual $f^n$. 

Given an exact couple, set $d := jk: E \to E$. By 
exactness, $kj = 0$, so $d^2 = jkjk = 0$ and therefore 
$(E,d)$ forms a differential group. To the exact 
couple we can associate another exact couple, called 
its derived couple, as follows. Set $D' := \operatorname{Im} i \subset D$ and 
$E' := H(E,d)$. Define $i' := i|_{D'}$ and let $j': D' \to E'$ be 
given by $j'(iy) := \overline{j(y)}$, where $\bar{x}$ denotes the 
equivalence class of $x$. The map $k': E' \to D'$ is defined by 
$k'(\bar{z}) := kz$. One checks that the maps $j'$ and $k'$ are 
well defined and that $(D', E', i', j', k')$ forms an exact 
couple. Therefore, from our original exact couple, 
we can inductively form a sequence of exact couples 
$(D^r, E^r, i^r, j^r, k^r)$, with $D^1 := D$, $E^1 := E$, $D^r := (D^{r-1})'$ 
and $E^r := (E^{r-1})'$. This gives a spectral sequence 
$(E^r, d^r)$ with $d^r = j^r k^r$. 

To the filtered chain complex $F_C$, we can 
associate an exact couple as follows. Set $D := \bigoplus_{p,q} D_{p,q}$ 
where $D_{p,q} = H_{p+q}(F_pC)$ and $E := \bigoplus_{p,q} E_{p,q}$ 
where $E_{p,q} = H_{p+q}(F_pC/F_{p-1}C)$. The long exact 
homology sequences coming from the sequences 
$0 \to F_{p-1}C \to F_pC \to F_pC/F_{p-1}C \to 0$ give rise, for 
each $p$ and $q$, to maps $a: D_{p-1,q+1} \to D_{p,q}$, 
$b: D_{p,q} \to E_{p,q}$, and $\partial: E_{p,q} \to D_{p-1,q}$. Define $i: D \to D$ 
to be the map whose restriction to $D_{p-1,q+1}$ is the 
composition of $a$ with the canonical inclusion 
$D_{p,q} \to D$. Similarly, define $j: D \to E$ and $k: E \to D$ 
to be the maps whose restrictions to each 
summand are the compositions of $b$ and $\partial$ with 
the inclusions. The indexing scheme for the bigradations 
is motivated by the fact that in many 
applications it causes all of the nonzero terms to 
appear in the first quadrant, so it is the most 
common choice, although one sometimes sees other 
conventions. 

There is actually a second exact couple we could 
associate to $F_C$, which yields the same spectral 
sequence: use the same $E$ as above but replace $D$ by 
$\bigoplus D_{p,q}$ with $D_{p,q} = H_{p+q+1}(C/F_pC)$, and define $i$, $j$, 
and $k$ in a manner similar to that above. 

When dealing with cohomology rather than 
homology, the usual starting point would be a 
system of inclusions of cochain complexes 
$\cdots \subset F^{n+1}C \subset F^nC \subset F^{n-1}C \subset \cdots \subset C$. This can be reduced to the 
previous case by replacing the cochain complex $C$ by 
a chain complex $C_*$, using the convention $C_p := C^{-p}$ 
and filtering the result by $F_nC_* := F^{-n}C$. The usual 
practice, equivalent to the above followed by a 
rotation of 180°, is to leave the original indices and 
instead reverse the arrows in the exact couple. In 
this case, it is customary to write $D^{p,q}$ and $E^{p,q}$ for 
the terms in the exact couple and spectral sequence. 

In applications, it is often the case that $E^1$ is 
known and that our goal includes computing $D^1$. 
The example of the filtered chain complex with the 
assumption that we know $H_*(F_pC/F_{p-1}C)$ for all $p$ 
is fairly typical. 

Since each $D^r$ is contained in $D^{r-1}$ and each $E^r$ is 
a subquotient of $E^{r-1}$, the terms of these exact 
couples get smaller as we progress. To get properties 
of the spectral sequence, we need to examine this 
process and, in particular, analyze that which 
remains in the spectral sequence as we let $r$ go to 
infinity. 

For $x \in E$, if $dx = 0$ then $\bar{x}$ belongs to $E^2$ and so 
$d^2(\bar{x})$ is defined. In the following, we shall usually 
simplify the notation by writing simply $x$ in place of 
$\bar{x}$ and writing $d^r x = 0$ to mean “$d^r x$ is defined and 
equals 0.” 


If $dx = 0, \ldots, d^{r-1}x = 0$, then $x$ represents an 
element of $E^r$ and $d^r x$ is defined. Set $Z^r := \{x \in 
E \mid d^m x = 0\ \forall m \le r\}$. Then $E^{r+1} \cong Z^r/\sim$ where $x \sim y$ 
if there exists $z \in E$ such that for some $t \le r$ we 
have $d^m z = 0$ for $m < t$ (thus $d^t z$ is defined) and 
$d^t z = x - y$. With this as motivation, we set $Z^\infty := 
\bigcap_r Z^r = \{x \in E \mid d^m x = 0\ \forall m\}$ (known as the “infinite 
cycles”) and define $E^\infty := Z^\infty/\sim$ where $x \sim y$ if 
there exists $z \in E$ such that for some $t$ we have 
$d^m z = 0$ for $m < t$ and $d^t z = x - y$. 

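Notice that $D^{r+1} = \operatorname{Im} i^{\circ r} \cong D/\operatorname{Ker} i^{\circ r}$. There is no 
analog of this statement for $r = \infty$. Instead we have 
separate concepts so we set $D^\infty := D/\bigcup_r \operatorname{Ker} i^{\circ r}$ 
and ${}^\infty D := \bigcap_r \operatorname{Im} i^{\circ r}$. The analog of the $r$th-derived 
exact couple when $r = \infty$ is the following exact 
sequence. 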


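Theorem 3 There are maps induced by $i$, $j$, and $k$ 
producing an exact sequence 


$$0 \to D^\infty \xrightarrow{\ i\ } D^\infty \xrightarrow{\ j\ } E^\infty \xrightarrow{\ k\ } {}^\infty D \xrightarrow{\ i\ } {}^\infty D.$$


The fact that we were able to add the 0 term to 
the left of this sequence but not the right can be 
traced to the fact that $\varinjlim$ preserves exactness but 
$\varprojlim$ does not. 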
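
The objects $D^r$, $E^r$, $D^\infty$, and ${}^\infty D$ can be traced through by hand on a very small ungraded exact couple. The couple below is a hypothetical illustration chosen for this note, not one discussed in the article.

```latex
% Hypothetical exact couple: D = Z, E = Z/2,
% i = multiplication by 2, j = reduction mod 2, k = 0.
\begin{align*}
&\operatorname{Im} i = 2\mathbb{Z} = \operatorname{Ker} j, \qquad
 \operatorname{Im} j = \mathbb{Z}/2 = \operatorname{Ker} k, \qquad
 \operatorname{Im} k = 0 = \operatorname{Ker} i
 && \text{(exactness)} \\
&d = jk = 0 \ \Longrightarrow\ E^r = \mathbb{Z}/2 \text{ for all } r,
 \qquad D^r = \operatorname{Im} i^{\circ (r-1)} = 2^{r-1}\mathbb{Z} \\
&D^\infty = D/\textstyle\bigcup_r \operatorname{Ker} i^{\circ r} = \mathbb{Z},
 \qquad {}^\infty D = \textstyle\bigcap_r 2^{r}\mathbb{Z} = 0,
 \qquad E^\infty = \mathbb{Z}/2 \\
&0 \to \mathbb{Z} \xrightarrow{\ 2\ } \mathbb{Z} \to \mathbb{Z}/2 \to 0 \to 0
 && \text{(the sequence of Theorem 3)}
\end{align*}
```

Here the spectral sequence collapses at $E^1$, while the groups $D^r$ keep shrinking, which is why $D^\infty$ and ${}^\infty D$ have to be treated as separate objects.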

In our motivating example, the terms of the initial 
exact couple came with a bigrading $D = \bigoplus D_{p,q}$ and 
$E = \bigoplus E_{p,q}$, and writing $|f|$ for the bidegree of a 
morphism $f$ we had: $|i| = (1,-1)$; $|j| = (0,0)$; $|k| = 
(-1,0)$; $|d| = (-1,0)$. It follows that $|i^r| = (1,-1)$; $|j^r| = 
(-r+1, r-1)$; $|k^r| = (-1,0)$; $|d^r| = (-r, r-1)$, which 
is considered the standard bigrading for a bigraded 
exact couple. Similarly, the standard bigrading for a 
bigraded spectral sequence is one such that 
$|d^r| = (-r, r-1)$. 
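
The bidegrees of the derived maps follow directly from the definition of the derived couple; the short derivation below spells this out, using only the formulas $j'(iy) = \overline{j(y)}$ and $d^r = j^r k^r$ from the construction.

```latex
% An element of D^r has the form i^{\circ (r-1)} y, and j^r sends it to the class of j(y).
\begin{align*}
|j^r| &= |j| - (r-1)\,|i| = (0,0) - (r-1)(1,-1) = (-r+1,\ r-1), \\
|k^r| &= |k| = (-1,0), \qquad |i^r| = |i| = (1,-1), \\
|d^r| &= |j^r| + |k^r| = (-r,\ r-1),
\end{align*}
% so d^r maps E^r_{p,q} to E^r_{p-r,\,q+r-1}; the total degree p+q drops by 1.
```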

We observed earlier that terms of an exact couple 
and its corresponding spectral sequence get smaller 
as $r \to \infty$ as each is a subquotient of its predecessor. 
Note that the bigrading is such that this applies to 
each pair of coordinates individually (e.g., $E^{r+1}_{p,q}$ is 
a subquotient of $E^r_{p,q}$) and so in particular if the 
$(p,q)$-position ever becomes 0 that position remains 0 
forevermore. 


Convergence of Graded Spectral 
Sequences 


As noted earlier, the definition of spectral sequence 
is so broad that we need to put some conditions on 
our spectral sequences to make them useful as a 
computational tool. From now on, we will restrict 
attention to spectral sequences arising from exact 
couples in which $D = \bigoplus D_p$ and $E = \bigoplus E_p$ are 
graded with $i(D_p) \subset D_{p+1}$, $j(D_p) \subset E_p$, and $k(E_p) \subset D_{p-1}$. 
All the spectral sequences which have been studied 
to date satisfy this condition and in fact most also 
have a second gradation as in the case of our 
motivating example. To see how to proceed, we 
examine that case more closely. 

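For a filtered chain complex $F_C$ with structure 
maps $s_p: F_pC \to C$ we defined $F_p(H_*(C)) = \operatorname{Im}(s_p)_*$. If 
$x = i^{\circ (r-1)} y$ belongs to 


$$D^r_{p,q} = \operatorname{Im}\left( i^{\circ (r-1)}: H_{p+q}(F_{p-r+1}C) \to H_{p+q}(F_pC) \right)$$


then $(s_p)_* x = (s_p)_* i^{\circ (r-1)} y = (s_{p+1})_* i^{\circ r} y = (s_{p+1})_* i x$. 
Therefore, we have a commutative diagram 

$$\begin{array}{ccc} D^r_{p,q} & \longrightarrow & F_p(H_{p+q}(C)) \\ \downarrow & & \downarrow \\ D^r_{p+1,q-1} & \longrightarrow & F_{p+1}(H_{p+q}(C)) \end{array}$$


yielding a map 


$$D^r_{p+1,q-1}/i(D^r_{p,q}) \to F_{p+1}(H_{p+q}(C))/F_p(H_{p+q}(C)) = \operatorname{Gr}_{p+1}(H_{p+q}(C)).$$

Letting $r$ go to infinity, we get an induced map 

$$\phi: D^\infty/i^\infty(D^\infty) \to \operatorname{Gr}(H(C)).$$


Theorem 4 If $F_C$ is cocomplete then 
(i) $D^\infty_n = F_n(H(C))$; 
(ii) $\phi: D^\infty/i^\infty(D^\infty) \to \operatorname{Gr}(H(C))$ is an isomorphism; 
(iii) There is an exact sequence 
$0 \to \operatorname{Gr}(H(C)) \to E^\infty \to {}^\infty D \to {}^\infty D$. 
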

We say that the spectral sequence $(E^r)$ “abuts” to 
$F_L$ if there is an isomorphism $\operatorname{Gr} L \to E^\infty$. Here we 
mean an isomorphism of graded abelian groups, 
which makes sense since under our assumptions $E^r$ 
inherits a grading from $E^1$ for each $r$. If in addition 
the filtration on $L$ is cocomplete, we say that $(E^r)$ 
“weakly converges” to $F_L$, and if it is bicomplete we 
say that $(E^r)$ “converges” (or strongly converges) to 
$F_L$. The notation $(E^r) \Rightarrow F_L$ (or simply $(E^r) \Rightarrow L$ 
when the filtration on $L$ is either understood or 
unimportant) is often used in connection with 
convergence but there is no universal agreement as 
to which of the three concepts (abuts, weakly 
converges, or converges) it refers to. In this article, 
we will also use the expression $(E^r)$ “quasiconverges” 
to $F_L$ to mean that the spectral sequence 
weakly converges to $F_L$ with $\bigcap_n F_nL = 0$. (Note: the 
terminology quasiconverges is nonstandard although 
the concept has appeared in the literature, sometimes 
under the name converges.) 

While it would be overstating things to claim that 
convergence of the spectral sequence shows that $E^\infty$ 
determines $H(C)$, it is clear that convergence is what 
we need in order to expect that $E^\infty$ contains enough 
information to possibly reconstruct $H(C)$. The sense 
in which this is true is stated more precisely in the 
following theorem. 


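Theorem 5 (Spectral sequence comparison 
theorem). Let $f = (f^r): (E^r) \to (\tilde{E}^r)$ be a morphism 
of spectral sequences. 

(i) If $f^N: E^N \to \tilde{E}^N$ is an isomorphism for some $N$, 
then $f^r$ is an isomorphism for all $r \ge N$ (including 
$r = \infty$). 

(ii) Suppose in addition that $(E^r)$ converges to $F_X$ 
and $(\tilde{E}^r)$ quasiconverges to $F_{\tilde{X}}$. Let $\phi: F_X \to F_{\tilde{X}}$ 
be a morphism of filtered abelian groups which 
is compatible with $f$ (i.e., there exist isomorphisms 
$\eta: \operatorname{Gr} X \to E^\infty$ and $\tilde{\eta}: \operatorname{Gr} \tilde{X} \to \tilde{E}^\infty$ such that 
$f^\infty \circ \eta = \tilde{\eta} \circ \operatorname{Gr}(\phi)$). Then $\phi: X \to \tilde{X}$ is an 
isomorphism. 
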

Within the constraints provided by Theorem 5, a 
spectral sequence might have many limits. A typical 
calculation of some group Y by means of spectral 
sequences might proceed as an application of 
Theorem 5 along the lines of the following plan. 


1. Subgroups F,Y forming a filtration of Y are 
defined, although usually not computable at this 
point. The subgroups are chosen in a manner that 
seems natural bearing in mind that to be useful it 
will be necessary to show convergence properties. 

2. Directly or by means of an exact couple, a 
spectral sequence is defined in a manner that 
seems to be related to the filtration. 

3. Some early term of the spectral sequence (usually 
$E^1$ or $E^2$) is calculated explicitly and the 
differentials $d^r$ are calculated successively, resulting 
in a computation of $E^\infty$. 

4. With the aid of the knowledge of $E^\infty$, a 
conjecture $Y = G$ is formulated for some $G$. 

5. A suitable filtration on $G$ and a map of filtrations 
$F_G \to F_Y$ or $F_Y \to F_G$ are defined. 

6. The spectral sequence arising from $F_G$ is demonstrated 
to converge to $G$. 

7. The original spectral sequence is demonstrated to 
converge to $Y$ and Theorem 5 is applied. 


The hardest steps are usually (3) and (7). For step 
(3), in most cases the calculations require knowledge 
which cannot be obtained from the spectral sequence 
itself, although the spectral sequence machinery plays 
its role in distilling the information and pointing the 
way to exactly what needs to be calculated. Steps 
(4)-(6) are frequently very easy, and often not stated 
explicitly, with “by construction of G” being the 
most common justification of (6). We now discuss 
the types of considerations involved in step (7). 

Convergence of a spectral sequence to a desired $L$ 
can be difficult to verify in general partly because 
the conditions are stated in terms of some filtration 
(usually understood only in a theoretical sense) on 
an initially unknown $L$ rather than in terms of 
properties of the spectral sequence itself or an exact 
couple from which it arose. Theorems 2 and 4(ii) 
give us the following extremely important special 
case in which we can conclude convergence to $H(C)$ 
of the spectral sequence for $F_C$ based on conditions 
that are often easily checked. 


Theorem 6 If $F_C$ is a filtered chain complex such 
that $F_C$ is cocomplete and there exists $M$ such that 
$H(F_nC) = 0$ for $n < M$, then the spectral sequence 
for $F_C$ converges to $H(C)$. 


Although the second hypothesis, which implies 
that ${}^\infty D = 0$, is very strong, it handles the large 
number of commonly used filtrations which are 0 
in negative degrees. 

Under the conditions of Theorem 6, inserting the 
bigradings into Theorem 4 gives a short exact 
sequence $0 \to D^\infty_{p-1,q+1} \to D^\infty_{p,q} \to E^\infty_{p,q} \to 0$ with 
$D^\infty_{p,q} = F_p(H_{p+q}(C))$; equivalently 


$$F_p(H_{p+q}(C))/F_{p-1}(H_{p+q}(C)) \cong E^\infty_{p,q}.$$


Thus, the only $E^\infty$-terms relevant to the computation 
of $H_n(C)$ are those on the diagonal $p + q = n$. In 
the important case of a first quadrant spectral 
sequence ($E^r_{p,q} = 0$ if $p < 0$ or $q < 0$), the number 
of nonzero terms on any diagonal is finite so the 
$E^\infty$-terms on the diagonal $p + q = n$ give a finite 
composition series for each $H_n(C)$. 
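
To see what the composition series looks like, the following sketch writes it out for a first quadrant spectral sequence. It assumes the hypotheses of Theorem 6 hold with the filtration vanishing in negative degrees, so that $F_{-1}H_n(C) = 0$.

```latex
% Assumption: hypotheses of Theorem 6, with F_{-1}H_n(C) = 0.
\begin{gather*}
0 = F_{-1}H_n(C) \subset F_0H_n(C) \subset F_1H_n(C) \subset \cdots
  \subset F_nH_n(C) = H_n(C), \\
F_pH_n(C)/F_{p-1}H_n(C) \cong E^\infty_{p,\,n-p} \qquad (0 \le p \le n).
\end{gather*}
% For n = 1 this is a single short exact sequence:
\[
0 \to E^\infty_{0,1} \to H_1(C) \to E^\infty_{1,0} \to 0 .
\]
```

The $E^\infty$-terms determine $H_n(C)$ only up to such extensions, which is why convergence alone does not compute $H(C)$ outright.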

Here is an elementary example of an application 
of a spectral sequence. 


Example 2 Let $S_*(\ )$ denote the singular chain 
complex, let $H_*(\ ) := H_*(S_*(\ ))$ denote singular 
homology, and let $H^{\mathrm{cell}}_*(\ )$ denote cellular homology. 
Let $X$ be a CW-complex with $n$-skeleton $X^{(n)}$. The 
inclusions $S_*(X^{(n)}) \to S_*(X)$ yield a filtration on 
$S_*(X)$. In the associated spectral sequence, 


$$E^1_{p,q} = H_{p+q}(X^{(p)}, X^{(p-1)}) = \begin{cases} \text{free abelian group on the $p$-cells of $X$} & \text{if } q = 0 \\ 0 & \text{if } q \ne 0 \end{cases}$$


The differential 


$$d^1_{p,0}: H_p(X^{(p)}, X^{(p-1)}) \to H_{p-1}(X^{(p-1)}, X^{(p-2)})$$


is the definition of the differential in cellular 
homology. Therefore, 


$$E^2_{p,q} = \begin{cases} H^{\mathrm{cell}}_p(X) & \text{if } q = 0 \\ 0 & \text{if } q \ne 0 \end{cases}$$


Looking at the bidegrees, the domain or range of $d^2_{p,q}$ 
is zero for each $p$ and $q$ so $d^2 = 0$, and similarly 
$d^r = 0$ for all $r \ge 2$. Therefore, the spectral sequence 
collapses with $E^2 = E^\infty$. The spectral sequence converges 
to $H_*(X)$ so the terms on the diagonal 
$p + q = n$ form a composition series for $H_n(X)$. 
Since the $(n,0)$ term is the only nonzero term on 
this diagonal, $H_n(X) = H^{\mathrm{cell}}_n(X)$. That is, “cellular 
homology equals singular homology.” 
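
As a hedged illustration of how this result is used in practice, the sketch below computes the ranks of cellular homology from boundary matrices. The function names `rank` and `betti_numbers` and the input format are assumptions made for this sketch, not notation from the article. It works over the rationals, so it sees only the free part of homology; torsion is invisible to it.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of an integer matrix over the rationals, by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(ncols):
        # Find a row at or below position rk with a nonzero entry in this column.
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        # Clear the entries below the pivot.
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def betti_numbers(cell_counts, boundaries):
    """Ranks of H_p for a cellular chain complex.

    cell_counts[p] is the number of p-cells. boundaries[p] is the matrix of
    d_p: C_p -> C_{p-1}, with rows indexed by (p-1)-cells and columns by p-cells.
    """
    top = len(cell_counts) - 1
    ranks = {p: rank(boundaries[p]) if p in boundaries else 0
             for p in range(top + 2)}
    # rank H_p = dim Ker d_p - rank d_{p+1} = (c_p - rank d_p) - rank d_{p+1}
    return [cell_counts[p] - ranks[p] - ranks[p + 1] for p in range(top + 1)]


# Torus: one 0-cell, two 1-cells, one 2-cell, all cellular boundaries zero.
torus = betti_numbers([1, 2, 1], {1: [[0, 0]], 2: [[0], [0]]})

# Real projective plane: one cell in each dimension, d_2 = multiplication by 2.
rp2 = betti_numbers([1, 1, 1], {1: [[0]], 2: [[2]]})
```

For the projective plane the rational ranks in degrees 1 and 2 vanish even though $H_1$ is $\mathbb{Z}/2$ integrally; recovering that torsion would need an integral method such as Smith normal form, which this sketch does not attempt.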
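Returning to the general situation, set $L_\infty := 
\varinjlim_n D_n$ and $L_{-\infty} := \varprojlim_n D_n$. Filter $L_\infty$ by $F_nL_\infty := 
\operatorname{Im}(D_n \to L_\infty)$ and $L_{-\infty}$ by $F_nL_{-\infty} := 
\operatorname{Ker}(L_{-\infty} \to D_n)$. It follows from the definitions that 
$F_nL_\infty = D^\infty_n$ and so $D^\infty_n/i(D^\infty_{n-1}) = \operatorname{Gr}_n L_\infty$. At the 
other end, the canonical map $L_{-\infty} \to D_n$ lifts to ${}^\infty D_n$, 
yielding an injection $L_{-\infty}/F_nL_{-\infty} \to {}^\infty D_n$. Therefore, 
for each $n$ there is an injection $\operatorname{Gr}_n L_{-\infty} \to K_n$, where 
$K_n = \operatorname{Ker}({}^\infty D_{n-1} \to {}^\infty D_n)$. In general, the map 
$L_{-\infty} \to {}^\infty D_n$ need not be surjective (an element 
could be in the image of $i^{\circ r}$ for each finite $r$ without 
being part of a consistent infinite sequence), although 
it is surjective in the special case when ${}^\infty D_s \to {}^\infty D_{s+1}$ 
is surjective for each $s$. In the latter case we get 
$\operatorname{Gr} L_{-\infty} \cong K$. As we will see in the next section, the 
exact sequence of Theorem 3 extends to the right 
(Theorem 8) giving $\varprojlim^1_r Z^r = 0$ as a sufficient condition 
that ${}^\infty D_s \to {}^\infty D_{s+1}$ be surjective for each $s$, where $\varprojlim^1$ 
is described in that section and $(Z^r)$ refers to the system 
of inclusions $\cdots \subset Z^{r+1} \subset Z^r \subset Z^{r-1} \subset \cdots$. Thus, 
$\varprojlim^1_r Z^r = 0$ is a sufficient condition for $\operatorname{Gr} L_{-\infty} \cong K$. 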


Taking into account the short exact sequence 
$0 \to D^\infty/i^\infty(D^\infty) \to E^\infty \to K \to 0$ coming from 
Theorem 3, the preceding discussion yields two 
obvious candidates for a suitable $F_L$: $F_{L_\infty}$ or $F_{L_{-\infty}}$. 
In theory there are other possibilities, but in 
practice one of these two cases usually occurs. We 
examine them individually and see what additional 
conditions are required for convergence. 


Case I: Conditions for convergence to $F_{L_\infty}$. It is 
easily checked from the definitions that $\varinjlim_n D^\infty_n = 
\varinjlim_n D_n$ so $F_{L_\infty}$ is always cocomplete. Therefore, 
besides $\operatorname{Gr} L_\infty \cong E^\infty$ (equivalently, $K = 0$), it is 
required to verify that $F_{L_\infty}$ is complete. As we will 
see in the next section, the completeness condition can 
be restated as $\varprojlim_n D^\infty_n = 0$ and $\varprojlim^1_n D^\infty_n = 0$. According to 
the preceding discussion, under the assumption that 
$L_{-\infty} = \varprojlim_n D_n = 0$, which we need anyway as part of the 
requirement that $F_{L_\infty}$ be complete, $\varprojlim^1_r Z^r = 0$ is 
sufficient to show $K = 0$. 


Case II: Conditions for convergence to $F_{L_{-\infty}}$. Any 
inverse limit is complete in its canonical filtration, so 
$F_{L_{-\infty}}$ is always complete and the issues are whether 
$\operatorname{Gr} L_{-\infty} \cong E^\infty$ and whether $F_{L_{-\infty}}$ is cocomplete. 
$F_{L_{-\infty}}$ is cocomplete if and only if every element of 
$L_{-\infty}$ lies in $\operatorname{Ker}(L_{-\infty} \to D_n)$ for some $n$, for which a 
sufficient condition is that $L_\infty = 0$ or equivalently 
$E^\infty = K$. Therefore, if the reason for the isomorphism 
$\operatorname{Gr} L_{-\infty} \cong E^\infty$ is that the maps $E^\infty \to K$ and 
$\operatorname{Gr} L_{-\infty} \to K$ are isomorphisms, then the rest of the 
convergence conditions are automatic. In particular, 
to deduce convergence to $F_{L_{-\infty}}$ it suffices to know 
that $L_\infty = 0$ and $\varprojlim^1_r Z^r = 0$. 


Derived Functors 


The left and right derived functors $L_nT$, $R^nT$ of a 
functor $T$ provide a measure of the amount by which 
the functor deviates from preserving exactness. 

The category $\mathcal{I}nv$ of inverse systems indexed over $\mathbb{Z}$ 
(i.e., the category whose objects are diagrams 
$\cdots \to A_{n-1} \to A_n \to A_{n+1} \to \cdots$ of abelian groups) 
forms an abelian category in which a sequence of 
morphisms $A' \to A \to A''$ is exact if and only if the 
sequence $A'_n \to A_n \to A''_n$ of abelian groups is exact 
for each $n$. The functor of interest to us is $\varprojlim: \mathcal{I}nv \to 
\mathcal{AB}$, where $\mathcal{AB}$ denotes the category of abelian groups. 

Let $T: \mathcal{A} \to \mathcal{B}$ be an additive functor between 
abelian categories. Suppose that $X$ in $\operatorname{Obj} \mathcal{A}$ has an 
injective resolution $I_X$. The definition of additive 
functor implies that $T$ takes zero morphisms to zero 
morphisms, so $TI_X$ forms a cochain complex in $\mathcal{B}$. 
The right derived functors of $T$ are defined by 
$(R^nT)(X) := H^n(TI_X)$. The result is independent of 
the choice of injective resolution (assuming one 
exists) and satisfies: 


1. If $T$ is “left exact” (meaning that $T$ preserves 
kernels, and in particular monomorphisms), then $R^0T(X) = T(X)$; 

2. If $T$ preserves exactness, then $(R^nT)(X) = 0$ for 
$n > 0$. 


Theorem 7 Let $0 \to X' \to X \to X'' \to 0$ be a short 
exact sequence in $\mathcal{A}$. Suppose $T$ is left exact and that 
all the objects have injective resolutions. Then there 
is a (long) exact sequence 


$$0 \to T(X') \to T(X) \to T(X'') \to (R^1T)(X') \to \cdots \to (R^{n-1}T)(X'') \to (R^nT)(X') \to (R^nT)(X) \to (R^nT)(X'') \to \cdots$$


Similarly, the left derived functors of T are defined 
by using projective resolutions and have similar 
properties with respect to the obvious duality. 

The functor $\varprojlim$ is left exact and in the category 
$\mathcal{I}nv$ every object has an injective resolution. Therefore 
$\varprojlim^q$ is defined and $\varprojlim^0 X_n = \varprojlim X_n$, where 
$\varprojlim^q$ denotes the derived functor $R^q(\varprojlim)$. It turns 
out that $\varprojlim^q$ is zero for $q > 1$, but we are particularly 
interested in $\varprojlim^1$. 

Let $(X_n)$ be an inverse system with structure maps 
$i_{n-1}: X_{n-1} \to X_n$. An explicit construction for $\varprojlim^1 X_n$ 
is as follows. Define $\phi: \prod_n X_n \to \prod_n X_n$ by 
letting $\phi((x_n))$ be the sequence whose $n$th component 
is $x_n - i_{n-1}x_{n-1}$. Then $\varprojlim^1 X_n \cong \operatorname{Coker} \phi$. Observe 
that $\operatorname{Ker} \phi \cong \varprojlim X_n$, according to the explicit formula 
for $\varprojlim$ given earlier. 

Recall that we defined ${}^\infty D = \bigcap_r \operatorname{Im} i^{\circ r} \cong \varprojlim_r D^r$. 
The exact sequence of Theorem 3 can be extended 
to give: 


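Theorem 8 There is an exact sequence 

$$0 \to D^\infty \xrightarrow{\ i\ } D^\infty \xrightarrow{\ j\ } E^\infty \xrightarrow{\ k\ } {}^\infty D \xrightarrow{\ i\ } {}^\infty D \to \varprojlim^1_r Z^r \to \varprojlim^1_r D^r \to \varprojlim^1_r D^r \to 0$$
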
It is clear from the explicit construction that if the 
system $(X_n)$ stabilizes with $X_n = G$ for all sufficiently 
small $n$, then $\varprojlim X = G$ and $\varprojlim^1 X = 0$. If the 
spectral sequence collapses at any stage then the 
system $(Z^r)$ stabilizes at that point, and so for a 
spectral sequence which collapses, the condition 
$\varprojlim^1_r Z^r = 0$, which arose in the discussion of 
convergence in the previous section, is automatic. 


Let $F_X$ be a filtered abelian group. Applying 
Theorem 7 to the short exact sequence $0 \to F_nX \to 
X \to X/F_nX \to 0$ of inverse systems gives an exact 
sequence 


$$0 \to \varprojlim_n F_nX \to \varprojlim_n X \to \varprojlim_n X/F_nX \to \varprojlim^1_n F_nX \to \varprojlim^1_n X.$$


Since $\varprojlim_n X = X$ and $\varprojlim^1_n X = 0$, we get 


Theorem 9 $F_X$ is complete if and only if 
$\varprojlim_n F_nX = 0$ and $\varprojlim^1_n F_nX = 0$. 
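
The role of the $\varprojlim^1$ term can be checked on a standard example, the $p$-adic filtration of the integers. This example is supplied here for illustration and is not taken from the article; $\mathbb{Z}_p$ denotes the $p$-adic integers.

```latex
% Illustrative filtration: X = Z, F_nX = Z for n >= 0, F_{-n}X = p^n Z for n >= 1.
\begin{align*}
\varprojlim_n F_nX &= \textstyle\bigcap_n p^n\mathbb{Z} = 0, \qquad
\varprojlim_n X/F_nX = \varprojlim_n \mathbb{Z}/p^n = \mathbb{Z}_p, \\
0 \to 0 \to \mathbb{Z} &\to \mathbb{Z}_p \to \varprojlim\nolimits^1_n F_nX \to 0
\quad\Longrightarrow\quad
\varprojlim\nolimits^1_n F_nX \cong \mathbb{Z}_p/\mathbb{Z} \ne 0 .
\end{align*}
```

So this filtration has zero intersection but is not complete, and the failure is measured exactly by the nonvanishing $\varprojlim^1$ term, as Theorem 9 predicts.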


When working with $\varprojlim^1$ the following sufficient 
condition for its vanishing, known as the Mittag-Leffler 
condition, is often useful. 


Theorem 10 Suppose $A$ is an inverse system in 
which for each $n$ there exists $k(n) \le n$ such that 
$\operatorname{Im}(A_i \to A_n)$ equals $\operatorname{Im}(A_{k(n)} \to A_n)$ for all $i \le k(n)$. 
Then $\varprojlim^1 A = 0$. 


Of course, this will not be (directly) useful in 
establishing $\varprojlim^1 F_nX = 0$ since the structure maps in 
that system are all monomorphisms. 


Some Examples of Standard Spectral 
Sequences and Their Use 


To this point we have considered the general theory of spectral sequences. The properties of the spectral sequences arising in many specific situations have been well studied. Usually the spectral sequence would be defined either directly, through an exact couple, or by giving some filtration on a chain complex. This defines the $E^1$-term. Typically, a theorem would then be proved giving some formula for the resulting $E^2$-term. In many cases, conditions under which the spectral sequence converges may also be well known.

In this section, we shall take a brief look at the 
Serre spectral sequence, Atiyah—Hirzebruch spectral 
sequence, spectral sequence of a double complex, 
Grothendieck spectral sequence, change of ring 
spectral sequence, and Eilenberg—Moore spectral 
sequence, and carry out a few sample calculations. 


Serre Spectral Sequence 


Let $F \to X \to B$ be a fiber bundle (or more generally a fibration) in which the base B is a CW-complex. Define a filtration on the total space by $F_nX := \pi^{-1}(B^{(n)})$, the preimage of the n-skeleton of B under the projection $\pi: X \to B$. This yields a filtration on $H_*(X)$ by setting $F_nH_*(X) := \operatorname{Im}(H_*(F_nX) \to H_*(X))$. The spectral sequence coming from the exact couple in which $D^1_{p,q} := H_{p+q}(F_pX)$ and $E^1_{p,q} := H_{p+q}(F_pX, F_{p-1}X)$ is called the “Serre spectral sequence” of the fibration. Theorems from topology guarantee that this filtration is cocomplete and that $E^1_{p,q} = 0$ if either p < 0 or q < 0. Therefore, the Serre spectral sequence is always a first quadrant spectral sequence converging to $H_*(X)$.


Theorem 11 (Serre). In the Serre spectral sequence of the fibration $F \to E \to B$ there is an isomorphism $E^2_{p,q} \cong H_p(B; {}'H_q(F))$.


Here ${}'H_*(F)$ denotes a “twisted” or “local” coefficient system in which the differential is modified to take into account the action, coming from the fibration, of the fundamental groupoid of the base B on the fiber F. In the special case where B is simply connected and $\operatorname{Tor}(H_*(B), H_*(F)) = 0$, the “universal coefficient theorem” says that the $E^2$-term reduces to $E^2_{p,q} \cong H_p(B) \otimes H_q(F)$.

The Serre spectral sequence for cohomology, $E_2^{p,q} \cong H^p(B; {}'H^q(F)) \Rightarrow H^{p+q}(X)$, has the advantage that it is a spectral sequence of algebras. This greatly simplifies calculation of the differentials $d_r$, which are restricted by the requirement that they satisfy the Leibniz rule with respect to the cup product on $H^*(B)$ and $H^*(F)$, and it also allows the computation of the cup product on $H^*(X)$. Since it is a first quadrant spectral sequence, convergence is not an issue.

Frequently in applications of the Serre spectral sequence, instead of using the spectral sequence to calculate $H_*(X)$ from knowledge of $H_*(F)$ and $H_*(B)$, it is $H_*(X)$ and one of the other two homologies which is known, and one is working backwards from the spectral sequence to find the homology of the third space.


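Example 3 The universal $S^1$-bundle is the bundle $S^1 \to S^\infty \to CP^\infty$ where $S^\infty$ is contractible. We will calculate $H^*(CP^\infty)$ from the Serre spectral sequence of this bundle, taking $H^*(S^1)$ and $H^*(S^\infty)$ as known. We also take as known that $CP^\infty$ is path connected, so $H^0(CP^\infty) = Z$.

$$E_2^{p,q} \cong H^p(CP^\infty) \otimes H^q(S^1) = \begin{cases} H^p(CP^\infty) & \text{if } q = 0 \text{ or } 1 \\ 0 & \text{otherwise} \end{cases}$$

$E_\infty$-terms on the diagonal p + q = n form a composition series for $H^n(S^\infty)$, which is zero for $n \ne 0$. Therefore $E_\infty^{p,q} = 0$ unless p = 0 and q = 0, with $E_\infty^{0,0} \cong Z$. Because all nonzero terms lie in the first quadrant, the bidegrees of the differentials show that $d_r(E_r^{1,0}) = 0$ for all $r \ge 2$, so $0 = E_\infty^{1,0} = E_2^{1,0} = H^1(CP^\infty)$. Since $E_2^{p,q} = E_2^{p,0} \otimes E_2^{0,q}$, it follows that $E_2^{1,q} = 0$ for all q. Taking into account the now zero terms, the bidegrees of the differentials show that $E_\infty^{0,1} = \operatorname{Ker}(d_2: E_2^{0,1} \to E_2^{2,0})$ and $E_\infty^{3,0} = E_2^{3,0}$. Similarly, $E_\infty^{2,0} = E_3^{2,0} \cong \operatorname{Coker}(d_2: E_2^{0,1} \to E_2^{2,0})$. Therefore, the vanishing of these $E_\infty$-terms shows that $d_2: E_2^{0,1} \cong E_2^{2,0}$ and in particular $H^2(CP^\infty) \cong E_2^{0,1} = H^1(S^1) = Z$. It follows that $E_2^{2,q} \cong E_2^{0,q}$ for all q. With the aid of the fact that we now know $E_2^{3,0} = 0$, we can repeat the argument used to show $E_2^{1,q} = 0$ for all q to conclude that $E_2^{3,q} = 0$ for all q. Repeating the procedure, we inductively find that $E_2^{p,q} \cong E_2^{p+2,q}$ for all $p \ge 0$ and all q and in particular

$$H^n(CP^\infty) \cong \begin{cases} Z & \text{if } n \text{ is even} \\ 0 & \text{if } n \text{ is odd} \end{cases}$$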

The cup products in $H^*(CP^\infty)$ can also be determined by taking advantage of the fact that the spectral sequence is a spectral sequence of algebras. Let $a \in E_2^{0,1} \cong Z$ be a generator and set $x := d_2 a$. By the preceding calculation, $d_2$ is an isomorphism so x is a generator of $H^2(CP^\infty)$. Therefore, $x \otimes a$ is a generator of $E_2^{2,1}$ and the isomorphism $d_2$ gives that $d_2(x \otimes a)$ is a generator of $H^4(CP^\infty)$. However, $d_2(x \otimes a) = d_2\bigl((x \otimes 1)(1 \otimes a)\bigr) = d_2(x \otimes 1)\,(1 \otimes a) + (-1)^2 (x \otimes 1)\,d_2(1 \otimes a) = 0 + x^2 \otimes 1$ and thus $x^2$ is a generator of $H^4(CP^\infty)$. Inductively, it follows that $x^n$ is a generator of $H^{2n}(CP^\infty)$ for all n and so $H^*(CP^\infty) = Z[x]$.

When working backwards from the Serre or other first quadrant spectral sequences in which $E^2_{p,q} \cong E^2_{p,0} \otimes E^2_{0,q}$, the following analog of the comparison theorem (Theorem 5) is often useful.


Theorem 12 (Zeeman comparison theorem). Let E and E' be first quadrant spectral sequences such that $E^2_{p,q} = E^2_{p,0} \otimes E^2_{0,q}$ and ${E'}^2_{p,q} = {E'}^2_{p,0} \otimes {E'}^2_{0,q}$. Let $f: E \to E'$ be a homomorphism of spectral sequences such that $f^2_{p,q} = f^2_{p,0} \otimes f^2_{0,q}$. Suppose that $f^\infty_{p,q}: E^\infty_{p,q} \to {E'}^\infty_{p,q}$ is an isomorphism for all p and q. Then the following are equivalent:


(i) $f^2_{p,0}: E^2_{p,0} \to {E'}^2_{p,0}$ is an isomorphism for $p < n-1$;
(ii) $f^2_{0,q}: E^2_{0,q} \to {E'}^2_{0,q}$ is an isomorphism for $q < n$.


There is a version of the Serre spectral sequence for generalized homology theories coming from the exact couple obtained by applying the generalized homology theory to the Serre filtration of X.


Theorem 13 (Serre spectral sequence for generalized homology). Let $F \to X \to B$ be a fibration and let Y be an (unreduced) homology theory satisfying the Milnor wedge axiom. Then there is a (right half-plane) spectral sequence with $E^2_{p,q} = H_p(B; Y_q(F))$ converging to $Y_{p+q}(X)$.


Cocompleteness of the filtration follows from the 
properties of generalized homology theories satisfy- 
ing the wedge axiom (Milnor 1962), and the rest of 
the convergence conditions are trivial since the 
filtration is 0 in negative degrees. Here, unlike 
the Serre spectral sequence for ordinary homology, 
the existence of terms in the fourth quadrant opens the 
possibility for composition series of infinite length, 
although in the case where B is a finite-dimensional 
complex all the nonzero terms of the spectral 
sequence will live in the strip between p=0 and 
p=dim B and so the filtrations will be finite. 

The special case of the fibration $* \to X \to X$ yields what is known as the “Atiyah-Hirzebruch spectral sequence”.


Theorem 14 (Atiyah-Hirzebruch spectral sequence). Let X be a CW-complex and let Y be an (unreduced) homology theory satisfying the Milnor wedge axiom. Then there is a (right half-plane) spectral sequence with $E^2_{p,q} = H_p(X; Y_q(*))$ converging to $Y_{p+q}(X)$.


In the cohomology Serre spectral sequence for generalized cohomology (including the cohomology Atiyah-Hirzebruch spectral sequence), convergence of the spectral sequence to $Y^*(X)$ is not guaranteed. Convergence to $\varprojlim_n Y^*(F_nX)$, should that occur, would be of the type discussed in case II in the section “Convergence of graded spectral sequences”. Since $X_n = \emptyset$ for n < 0, the system defining $L_\infty$ stabilizes to 0. Therefore, $L_\infty = 0$ and, by the discussion in that section, $\varprojlim^1 Z_rX = 0$ becomes a sufficient condition for convergence to $\varprojlim_n Y^*(F_nX)$.

However, since the real object of study is usually $Y^*(X)$, the spectral sequence is most useful when one is also able to show $\varprojlim^1 Y^*(F_nX) = 0$, in which case the Milnor exact sequence (Milnor 1962)

$$0 \to \varprojlim{}^1\, Y^{*-1}(F_nX) \to Y^*(X) \to \varprojlim Y^*(F_nX) \to 0$$

gives $Y^*(X) = \varprojlim Y^*(F_nX)$.

If $Y^*(\ )$ has cup products then the spectral sequence has the extra structure of a spectral sequence of $Y^*(*)$-algebras. In the case where B is finite dimensional, all convergence problems disappear since the spectral sequence lives in a strip and the filtrations are finite.


Example 4 Let $K^*(\ )$ be complex K-theory. Since $K^*(*) = Z[z, z^{-1}]$ with |z| = 2, in the Atiyah-Hirzebruch spectral sequence for $K^*(CP^n)$ we have

$$E_2^{p,q} = \begin{cases} Z & \text{if } q \text{ is even and } p \text{ is even with } 0 \le p \le 2n \\ 0 & \text{otherwise} \end{cases}$$

Because $CP^n$ is a finite complex, the spectral sequence converges to $K^*(CP^n)$. Since all the nonzero terms have even total degree and all the differentials have total degree +1, the spectral sequence collapses at $E_2$ and we conclude that $K^q(CP^n) = 0$ if q is odd and that it has a composition series consisting of (n + 1) copies of Z when q is even. Since Z is a free abelian group, this uniquely identifies the group structure of $K^q(CP^n)$ for even q as $Z^{n+1}$. To find the ring structure we can make use of the fact that this is a spectral sequence of $K^*(*)$-algebras. The result is $K^*(CP^n) = K^*(*)[x]/(x^{n+1})$,
where |x|= 
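
The parity argument in Example 4 can be checked mechanically. The following is a minimal sketch, not part of the original article: the function names are hypothetical, and the finite window of q values is an assumption made only so that the loop terminates. It records where the $E_2$-term is nonzero and confirms that no differential $d_r$, which moves bidegree (p, q) to (p + r, q - r + 1), can join two nonzero entries.

```python
def e2_is_nonzero(p, q, n):
    """True when E_2^{p,q} = H^p(CP^n; K^q(point)) is a copy of Z."""
    return 0 <= p <= 2 * n and p % 2 == 0 and q % 2 == 0


def differentials_vanish(n, q_window=6):
    """Check that no d_r (r >= 2) joins two nonzero E_2 entries.

    The cohomological differential d_r maps bidegree (p, q) to
    (p + r, q - r + 1), so it raises total degree by exactly 1.
    Only q in [-q_window, q_window] is inspected (illustrative choice).
    """
    for p in range(0, 2 * n + 1):
        for q in range(-q_window, q_window + 1):
            if not e2_is_nonzero(p, q, n):
                continue
            for r in range(2, 2 * n + 2):
                if e2_is_nonzero(p + r, q - r + 1, n):
                    return False
    return True


def copies_of_z(total_degree, n):
    """Number of copies of Z on the diagonal p + q = total_degree."""
    return sum(1 for p in range(0, 2 * n + 1)
               if e2_is_nonzero(p, total_degree - p, n))


collapses = differentials_vanish(3)   # n = 3, i.e. CP^3
even_count = copies_of_z(0, 3)        # length of the composition series
odd_count = copies_of_z(1, 3)
```

For n = 3 the even diagonal carries n + 1 = 4 copies of Z and the odd diagonal carries none, in line with the composition series described in the example.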


In the Atiyah-Hirzebruch spectral sequence for $K^*(CP^\infty)$ again all the terms have even total degree so the spectral sequence collapses at $E_2$. We noted earlier that collapse of the spectral sequence implies that $\varprojlim^1 Z_rX = 0$ and so the spectral sequence converges to $\varprojlim_n K^*(CP^n)$, where we used $F_{2n}CP^\infty = CP^n$. Since our preceding calculation shows that $K^*(CP^n) \to K^*(CP^{n-1})$ is onto, Mittag-Leffler (Theorem 10) implies that $\varprojlim^1_n K^*(CP^n) = 0$. Therefore, the spectral sequence converges to $K^*(CP^\infty)$ and we find that $K^*(CP^\infty) = \varprojlim_n K^*(CP^n)$, which is isomorphic to the power series ring $K^*(*)[[x]]$, where |x| =

In topology one might be interested in the Atiyah-Hirzebruch spectral sequence in the case where X is a spectrum rather than a space (a spectrum being a generalization in which cells in negative degrees are allowed including the possibility that the dimensions of the cells are not bounded below). In such cases,
the spectral sequence is no longer constrained to lie 
in the right half-plane and convergence criteria are 
not well understood for either the homology or 
cohomology version. 


Spectral Sequence of a Double Complex 


A double complex is a chain complex of chain complexes. That is, it is a bigraded abelian group $C_{p,q}$ together with two differentials $d': C_{p,q} \to C_{p-1,q}$ and $d'': C_{p,q} \to C_{p,q-1}$ satisfying $d' \circ d' = 0$, $d'' \circ d'' = 0$, and $d'd'' = d''d'$. Given a double complex C its total complex Tot C is defined by $(\operatorname{Tot} C)_n := \bigoplus_{p+q=n} C_{p,q}$ with differential defined by $d|_{C_{p,q}} := d' + (-1)^p d'': C_{p,q} \to C_{p-1,q} \oplus C_{p,q-1} \subset \operatorname{Tot}_{n-1} C$.

There are two natural filtrations, $F'(\operatorname{Tot} C)$ and $F''(\operatorname{Tot} C)$, on Tot C given by

$$(F'_p(\operatorname{Tot} C))_n = \bigoplus_{\substack{s+t=n \\ s \le p}} C_{s,t}$$

$$(F''_p(\operatorname{Tot} C))_n = \bigoplus_{\substack{s+t=n \\ t \le p}} C_{s,t}$$

yielding two spectral sequences abutting to $H_*(\operatorname{Tot} C)$. In the first $E^2_{p,q} = H_p(H_q(C_{*,*}))$ and in the other $E^2_{p,q} = H_q(H_p(C_{*,*}))$. Convergence of these spectral sequences is not guaranteed, although the first will always converge if there exists N such that $C_{p,q} = 0$ for p < N and the second will converge if there exists N such that $C_{p,q} = 0$ for q < N. From the double complex C one could instead form the product total complex $(\operatorname{Tot}^{\Pi} C)_n := \prod_{p+q=n} C_{p,q}$ and proceed in a similar manner to construct the same spectral sequences with different convergence problems. In the important special case of a first quadrant double complex both spectral sequences converge and information is often obtained by playing one off against the other.


Example 5 Let M and N be R-modules. Let $\operatorname{Tor}'(M, N)$ and $\operatorname{Tor}''(M, N)$ be the derived functors of $(\_) \otimes N$ and $M \otimes (\_)$, respectively. Let $P_*$ and $Q_*$ be projective resolutions of M and N respectively. Define a first quadrant double complex by $C_{p,q} := P_p \otimes Q_q$. Since $P_p$ is projective,

$$H_q(C_{p,*}) = P_p \otimes H_q(Q_*) = \begin{cases} 0 & \text{if } q \ne 0 \\ P_p \otimes N & \text{if } q = 0 \end{cases}$$

and so in the first spectral sequence of the double complex,

$$E^2_{p,q} = \begin{cases} 0 & \text{if } q \ne 0 \\ \operatorname{Tor}'_p(M, N) & \text{if } q = 0 \end{cases}$$

Therefore, the spectral sequence collapses to give $H_n(\operatorname{Tot} C) = \operatorname{Tor}'_n(M, N)$. Similarly, the second spectral sequence shows that $H_n(\operatorname{Tot} C) = \operatorname{Tor}''_n(M, N)$. Thus, Tor(M, N) can be computed equally well from a projective resolution of either variable.


The technique of using a double complex in which one spectral sequence yields the homology of the total complex to which both converge can be used to prove the following.


Theorem 15 (Grothendieck spectral sequence). Let $\mathcal{C} \xrightarrow{F} \mathcal{B} \xrightarrow{G} \mathcal{A}$ be a composition of additive functors, where $\mathcal{C}$, $\mathcal{B}$, and $\mathcal{A}$ are abelian categories. Assume that all objects in $\mathcal{C}$ and $\mathcal{B}$ have projective resolutions. Suppose that F takes projectives to projectives. Then for all objects C of $\mathcal{C}$ there exists a (first quadrant) spectral sequence with $E^2_{p,q} = (L_pG)((L_qF)(C))$ converging to $(L_{p+q}(GF))(C)$.


Naturally, there is a corresponding version for 
right derived functors. 

An application of the Grothendieck spectral sequence is the following “change of rings spectral sequence.” Let $f: R \to S$ be a ring homomorphism, let M be a right S-module and let N be a left R-module. Let $F(A) = S \otimes_R A$ and $G(B) = M \otimes_S B$, and note that $GF(A) = M \otimes_R A$. Applying the Grothendieck spectral sequence to the composition (left R-modules) $\xrightarrow{F}$ (left S-modules) $\xrightarrow{G}$ (abelian groups) yields a convergent spectral sequence $E^2_{p,q} = \operatorname{Tor}^S_p(M, \operatorname{Tor}^R_q(S, N)) \Rightarrow \operatorname{Tor}^R_{p+q}(M, N)$.


Eilenberg-Moore Spectral Sequence


For a topological group G, Milnor showed how to construct a universal G-bundle $G \to EG \to BG$ in which EG is the infinite join $G^{*\infty}$ with diagonal G-action. There is a natural filtration $F_nBG := G^{*(n+1)}/G$ on BG and therefore an induced filtration on the base of any principal G-bundle. This filtration yields a spectral sequence including as a special case a tool for calculating $H_*(BG)$ from knowledge of $H_*(G)$.


Theorem 16 Let $G \to X \to B$ be a principal G-bundle and let $H_*(\ )$ denote homology with coefficients in a field. Then there is a first quadrant spectral sequence with $E^2_{p,q} = \operatorname{Tor}^{H_*(G)}_{p,q}(H_*(X), H_*(*))$ converging to $H_{p+q}(B)$.


Here the group structure makes $H_*(G)$ into an algebra and $\operatorname{Tor}^A_{p,q}(M, N)$ denotes degree q of the graded object formed as the pth-derived functor of the tensor product of the graded modules M and N over the graded ring A.

There is also a version (Eilenberg and Moore 
1962) which, like the Serre spectral sequence, is 
suitable for computing H*(G) from H*(BG). 


Theorem 17 Let

$$\begin{array}{ccc} W & \to & Y \\ \downarrow & & \downarrow \pi \\ X & \to & B \end{array}$$

be a pullback square in which $\pi$ is a fibration and X and B are simply connected. Suppose that $H^*(X)$, $H^*(Y)$, and $H^*(B)$ are flat R-modules of finite type, where $H^*(\ )$ denotes cohomology with coefficients in the Noetherian ring R. Then there is a (second quadrant) spectral sequence with $E_2^{p,q} = \operatorname{Tor}^{p,q}_{H^*(B)}(H^*(X), H^*(Y))$ converging to $H^{p+q}(W)$.


The cohomological version of the Eilenberg—Moore 
spectral sequence, stated above, contains the more 
familiar Tor for modules over an algebra. For the 
homological version, one must dualize these notions 
appropriately to define the cotensor product of como- 
dules over a coalgebra, and its derived functors Cotor. 

Provided the action of the fundamental group of B 
is sufficiently nice there are extensions of the 
Eilenberg—Moore spectral sequence to the case 
where B is not simply connected, although they do 
not always converge, and extensions to generalized 
(co)homology theories have also been studied. 


See also: Cohomology Theories; Derived Categories;
K-Theory; Spectral Theory for Linear Operators.


Introduction


We begin with the study of linear operators 
on normed vector spaces (for definitions, see, e.g., 
Schechter (2002) or the appendix at the end of this 
article). If the scalars are complex numbers, we shall 


call the space complex. If the scalars are real, we 
shall call it real. 

Let X, Y be normed vector spaces. A mapping A which assigns to each element x of a set $D(A) \subset X$ a unique element $y \in Y$ is called an operator (or transformation). The set D(A) on which A acts is called the domain of A. The operator A is called linear if


1. D(A) is a subspace of X, and
2. $A(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 A x_1 + \alpha_2 A x_2$


for all scalars $\alpha_1, \alpha_2$ and all elements $x_1, x_2 \in D(A)$.
To begin, we shall only consider operators A with D(A) = X.

An operator A is called bounded if there is a constant M such that

$$\|Ax\| \le M\|x\|, \quad x \in X \qquad [1]$$

The norm of such an operator is defined by

$$\|A\| = \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|} \qquad [2]$$

It is the smallest M which works in [1]. An operator A is called continuous at a point $x \in X$ if $x_n \to x$ in X implies $Ax_n \to Ax$ in Y. A bounded linear operator is continuous at each point. For if $x_n \to x$ in X, then

$$\|Ax_n - Ax\| \le \|A\| \cdot \|x_n - x\| \to 0$$
We also have 


Theorem 1 If a linear operator A is continuous at one point $x_0 \in X$, then it is bounded, and hence continuous at every point.


We let B(X,Y) be the set of bounded linear 
operators from X to Y. Under the norm [2], one 
easily checks that B(X, Y) is a normed vector space. 


The Adjoint Operator 


An assignment F of a number to each element x of a vector space is called a functional and denoted by F(x). If it satisfies

$$F(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 F(x_1) + \alpha_2 F(x_2) \qquad [3]$$

for $\alpha_1, \alpha_2$ scalars, it is called linear. It is called bounded if

$$|F(x)| \le M\|x\|, \quad x \in X \qquad [4]$$

If F is a bounded linear functional on a normed vector space X, the norm of F is defined by

$$\|F\| = \sup_{x \in X,\, x \ne 0} \frac{|F(x)|}{\|x\|} \qquad [5]$$

It is equal to the smallest number M satisfying [4].

For any normed vector space X, let X' denote the set of bounded linear functionals on X. If $f, g \in X'$, we say that f = g if

$$f(x) = g(x) \quad \text{for all } x \in X$$

The “zero” functional is the one assigning zero to all $x \in X$. We define h = f + g by

$$h(x) = f(x) + g(x), \quad x \in X$$

and $g = \alpha f$ by

$$g(x) = \alpha f(x), \quad x \in X$$

Under these definitions, X' becomes a vector space. The expression

$$\|f\| = \sup_{x \ne 0} \frac{|f(x)|}{\|x\|} \qquad [6]$$

is easily seen to be a norm. Thus, X' is a normed vector space. It is therefore natural to ask when X' will be complete. A rather surprising answer is given by


Theorem 2 X’ is a Banach space whether or not 
X is. 

(For the definition of a Banach space, see, e.g., 
Schechter (2002) or the appendix at the end of this 
article.) 

Suppose X, Y are normed vector spaces and $A \in B(X, Y)$. For each $y' \in Y'$, the expression y'(Ax) assigns a scalar to each $x \in X$. Thus, it is a functional F(x). Clearly F is linear. It is also bounded since

$$|F(x)| = |y'(Ax)| \le \|y'\| \cdot \|Ax\| \le \|y'\| \cdot \|A\| \cdot \|x\|$$

Thus, there is an $x' \in X'$ such that

$$y'(Ax) = x'(x), \quad x \in X \qquad [7]$$

This functional x' is unique. Thus, to each $y' \in Y'$ we have assigned a unique $x' \in X'$. We designate this assignment by A' and note that it is a linear operator from Y' to X'. Thus, [7] can be written in the form

$$y'(Ax) = A'y'(x) \qquad [8]$$

The operator A' is called the adjoint (or conjugate) of A. We note


Theorem 3 $A' \in B(Y', X')$, and $\|A'\| = \|A\|$.


The adjoint has the following easily verified properties:

$$(A + B)' = A' + B' \qquad [9]$$
$$(\alpha A)' = \alpha A' \qquad [10]$$
$$(AB)' = B'A' \qquad [11]$$
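
In finite dimensions these properties are easy to test. The sketch below is an illustration under stated assumptions rather than anything from the source: it takes X and Y to be real coordinate spaces, represents a functional y' by its coefficient vector, and uses the matrix transpose as the adjoint. The helper names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose; plays the role of the adjoint A' in this model."""
    return [list(col) for col in zip(*a)]


def apply(a, x):
    """Apply the matrix a to the vector x."""
    return [sum(entry * xi for entry, xi in zip(row, x)) for row in a]


def pair(functional, x):
    """Value y'(x) of the functional with the given coefficient vector."""
    return sum(c * xi for c, xi in zip(functional, x))


B = [[1, 2], [0, -1], [3, 1]]   # B maps R^2 to R^3
A = [[2, 0, 1], [-1, 4, 0]]     # A maps R^3 to R^2
x = [1, -2]
y_prime = [3, 5]

# Identity [8], checked for the composite C = AB: y'(Cx) = (C'y')(x)
C = mat_mul(A, B)
lhs = pair(y_prime, apply(C, x))
rhs = pair(apply(transpose(C), y_prime), x)

# Property [11]: (AB)' = B'A'
adjoint_of_product = transpose(C)
product_of_adjoints = mat_mul(transpose(B), transpose(A))
```

Integer entries keep every comparison exact, so the two sides of [8] and of [11] can be compared with plain equality.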


Why should we consider adjoints? One reason is 
as follows. Many problems in mathematics and its 
applications can be put in the form: given normed 
vector spaces X, Y and an operator A € B(X, Y), one 
wishes to solve 


$$Ax = y \qquad [12]$$

The set of all y for which one can solve [12] is called the “range” of A and is denoted by R(A). The set of all x for which Ax = 0 is called the “null space” of A and is denoted by N(A). Since A is linear, it is easily checked that N(A) and R(A) are subspaces of X and Y, respectively (for definitions, see, e.g., Schechter (2002) or the appendix at the end of this article). The dimension of N(A) is denoted by $\alpha(A)$.

If $y \in R(A)$, there is an $x \in X$ satisfying [12]. For any $y' \in Y'$ we have

$$y'(Ax) = y'(y)$$

Taking adjoints we get

$$A'y'(x) = y'(y)$$

If $y' \in N(A')$, this gives y'(y) = 0. Thus, a necessary condition that $y \in R(A)$ is that y'(y) = 0 for all $y' \in N(A')$. Obviously, it would be of great interest to know when this condition is also sufficient.


The Spectrum and Resolvent Sets 


From this point on we shall assume that X = Y. We can then speak of the identity operator I defined by

$$Ix = x, \quad x \in X$$

For a scalar $\lambda$, the operator $\lambda I$ is given by

$$\lambda I x = \lambda x, \quad x \in X$$

We shall denote the operator $\lambda I$ by $\lambda$.

We shall denote the space B(X, X) by B(X). For any operator $A \in B(X)$, a scalar $\lambda$ for which $\alpha(A - \lambda) \ne 0$ is called an eigenvalue of A. Any element $x \ne 0$ of X such that $(A - \lambda)x = 0$ is called an eigenvector (or eigenelement). The points $\lambda$ for which $(A - \lambda)$ has a bounded inverse in B(X) comprise the resolvent set $\rho(A)$ of A (for definitions, see, e.g., Schechter (2002) or the appendix at the end of this article). If X is a Banach space, it is the set of those $\lambda$ such that $\alpha(A - \lambda) = 0$ and $R(A - \lambda) = X$. The spectrum $\sigma(A)$ of A consists of all scalars not in $\rho(A)$. The set of eigenvalues of A is sometimes called the point spectrum of A and is denoted by $P\sigma(A)$.

We note that 


Theorem 4 For A in B(X), $\sigma(A') = \sigma(A)$.


We are now going to examine the sets $\rho(A)$ and $\sigma(A)$ for arbitrary $A \in B(X)$.


Theorem 5 $\rho(A)$ is an open set and hence $\sigma(A)$ is a closed set.


Does every operator $A \in B(X)$ have points in its resolvent set? Yes. In fact, we have


Theorem 6 For A in B(X), set

$$r_\sigma(A) = \inf_n \|A^n\|^{1/n} \qquad [13]$$

Then $\rho(A)$ contains all scalars $\lambda$ such that $|\lambda| > r_\sigma(A)$.

Let p(t) be a polynomial of the form

$$p(t) = \sum_{k=0}^{n} a_k t^k$$

Then for any operator $A \in B(X)$, we define the operator

$$p(A) = \sum_{k=0}^{n} a_k A^k$$

where we take $A^0 = I$. We have


Theorem 7 If $\lambda \in \sigma(A)$, then $p(\lambda) \in \sigma(p(A))$ for any polynomial p(t).


Proof Since $\lambda$ is a root of $p(t) - p(\lambda)$, we have

$$p(t) - p(\lambda) = (t - \lambda)q(t)$$

where q(t) is a polynomial with real coefficients. Hence,

$$p(A) - p(\lambda) = (A - \lambda)q(A) = q(A)(A - \lambda) \qquad [14]$$

Now, if $p(\lambda)$ is in $\rho(p(A))$, then [14] shows that $\alpha(A - \lambda) = 0$ and $R(A - \lambda) = X$. This means that $\lambda \in \rho(A)$, and the theorem is proved. □

A symbolic way of writing Theorem 7 is

$$p(\sigma(A)) \subset \sigma(p(A)) \qquad [15]$$

Note that, in general, there may be points in $\sigma(p(A))$ which are not of the form $p(\lambda)$ for some $\lambda \in \sigma(A)$. As an example, consider the operator on $R^2$ given by

$$A(\alpha_1, \alpha_2) = (-\alpha_2, \alpha_1)$$

A has no spectrum; $A - \lambda$ is invertible for all real $\lambda$. However, $A^2$ has $-1$ as an eigenvalue. What is the reason for this? It is simply that our scalars are real. Consequently, imaginary numbers cannot be considered as eigenvalues. We shall see later that in order to obtain a more complete theory, we shall have to consider complex Banach spaces. Another question is whether every operator $A \in B(X)$ has points in its spectrum. For complex Banach spaces, the answer is yes.
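
The rotation example can be verified by direct computation. The following sketch is illustrative only; the helper names and the sample of real scalars are assumptions. It uses the fact that a 2 × 2 real matrix is invertible exactly when its determinant is nonzero.

```python
A = [[0, -1], [1, 0]]   # matrix of A(a1, a2) = (-a2, a1)


def det_shifted(m, lam):
    """det(M - lam*I) for a 2x2 matrix M."""
    return (m[0][0] - lam) * (m[1][1] - lam) - m[0][1] * m[1][0]


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# det(A - lam) = lam^2 + 1 >= 1 for every real lam, so A - lam is invertible.
samples = [k / 4 for k in range(-40, 41)]
always_invertible = all(det_shifted(A, lam) >= 1 for lam in samples)

# A^2 = -I, so A^2 - (-1) is the zero operator and -1 is an eigenvalue of A^2.
A_squared = mat_mul(A, A)
det_at_minus_one = det_shifted(A_squared, -1)
```

Here p(t) = t² sends no real number to −1, which is exactly why −1 lies in σ(p(A)) without being of the form p(λ) for a real λ in σ(A).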


The Spectral Mapping Theorem 
Suppose we want to solve an equation of the form

$$p(A)x = y, \quad x, y \in X \qquad [16]$$

where p(t) is a polynomial and $A \in B(X)$. If 0 is not in the spectrum of p(A), then p(A) has an inverse in B(X) and, hence, [16] can be solved for all $y \in X$. So a natural question to ask is: what is the spectrum of p(A)? By Theorem 7 we see that it contains $p(\sigma(A))$, but by the remark at the end of the preceding section it can contain other points. If it were true that

$$p(\sigma(A)) = \sigma(p(A)) \qquad [17]$$

then we could say that [16] can be solved uniquely for all $y \in X$ if and only if $p(\lambda) \ne 0$ for all $\lambda \in \sigma(A)$. For a complex Banach space we have


Theorem 8 If X is a complex Banach space, then $\mu \in \sigma(p(A))$ if and only if $\mu = p(\lambda)$ for some $\lambda \in \sigma(A)$, that is, if [17] holds.


Proof We have proved it in one direction already (Theorem 7). To prove it in the other, let $\gamma_1, \ldots, \gamma_n$ be the (complex) roots of $p(t) - \mu$. For a complex Banach space they are all scalars. Thus,

$$p(A) - \mu = c(A - \gamma_1) \cdots (A - \gamma_n), \quad c \ne 0$$

Now suppose that all of the $\gamma_j$ are in $\rho(A)$. Then each $A - \gamma_j$ has an inverse in B(X). Hence, the same is true for $p(A) - \mu$. In other words, $\mu \in \rho(p(A))$. Thus, if $\mu \in \sigma(p(A))$, then at least one of the $\gamma_j$ must be in $\sigma(A)$, say $\gamma_k$. Hence, $\mu = p(\gamma_k)$, where $\gamma_k \in \sigma(A)$. This completes the proof. □

Theorem 8 is called the “spectral mapping theorem” for polynomials. As mentioned before, it has the useful consequence:


Corollary 1 If X is a complex Banach space, then eqn [16] has a unique solution for every y in X if and only if $p(\lambda) \ne 0$ for all $\lambda \in \sigma(A)$.
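
A small matrix illustrates Corollary 1. The sketch below is an assumed example, not taken from the source: A is upper triangular, so its spectrum is its diagonal {1, 4}, and the polynomial p(t) = t² − 3t + 2 = (t − 1)(t − 2) vanishes at the spectral point 1. The function names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def poly_of_matrix(coeffs, a):
    """Evaluate p(A) = sum_k coeffs[k] * A^k, taking A^0 = I."""
    n = len(a)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        result = [[result[i][j] + c * power[i][j] for j in range(n)]
                  for i in range(n)]
        power = mat_mul(power, a)
    return result


def poly_at(coeffs, t):
    """Evaluate the scalar polynomial p(t)."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


A = [[1, 5], [0, 4]]          # upper triangular: spectrum {1, 4}
coeffs = [2, -3, 1]           # p(t) = 2 - 3t + t^2

P = poly_of_matrix(coeffs, A)
spectrum_of_P = sorted([P[0][0], P[1][1]])         # P is upper triangular
image_of_spectrum = sorted(poly_at(coeffs, lam) for lam in (1, 4))
determinant = P[0][0] * P[1][1] - P[0][1] * P[1][0]
```

The diagonal of p(A) is (p(1), p(4)) = (0, 6), matching [17], and the zero determinant shows that p(A)x = y cannot be solved uniquely for every y, as the corollary predicts when p vanishes somewhere on σ(A).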


Operational Calculus 


Other things can be done in a complex Banach space that cannot be done in a real Banach space. For instance, we can get a formula for $p(A)^{-1}$ when it exists. To obtain this formula, we first note


Theorem 9 If X is a complex Banach space, then $(z - A)^{-1}$ is a complex analytic function of z for $z \in \rho(A)$.


By this, we mean that in a neighborhood of each $z_0 \in \rho(A)$, the operator $(z - A)^{-1}$ can be expanded in a “Taylor series,” which converges in norm to $(z - A)^{-1}$, just like analytic functions of a complex variable.

Now, by Theorem 6, $\rho(A)$ contains the set $|z| > \|A\|$. We can expand $(z - A)^{-1}$ in powers of $z^{-1}$ on this set. In fact, we have

Lemma 1 If $|z| > \limsup \|A^n\|^{1/n}$, then

$$(z - A)^{-1} = \sum_{n=1}^{\infty} z^{-n} A^{n-1} \qquad [18]$$

where the convergence is in the norm of B(X).
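
The expansion in Lemma 1 can be compared numerically with a directly computed inverse. This is a minimal sketch under assumptions: the 2 × 2 matrix, the point z = 2, and the number of terms are arbitrary illustrative choices, and the function names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def resolvent_series(a, z, terms):
    """Partial sum of sum_{n>=1} z^(-n) A^(n-1) with the given number of terms."""
    size = len(a)
    power = [[1.0 if i == j else 0.0 for j in range(size)]
             for i in range(size)]            # A^0 = I
    total = [[0.0] * size for _ in range(size)]
    for n in range(1, terms + 1):
        coeff = z ** (-n)
        total = [[total[i][j] + coeff * power[i][j] for j in range(size)]
                 for i in range(size)]
        power = mat_mul(power, a)             # now A^n
    return total


def resolvent_direct(a, z):
    """(zI - A)^(-1) for a 2x2 matrix, via the adjugate formula."""
    m = [[z - a[0][0], -a[0][1]], [-a[1][0], z - a[1][1]]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


A = [[0.5, 0.25], [0.0, 0.5]]
z = 2.0                                        # |z| exceeds the norm of A

approx = resolvent_series(A, z, 60)
exact = resolvent_direct(A, z)
max_error = max(abs(approx[i][j] - exact[i][j])
                for i in range(2) for j in range(2))
```

Because the terms decay geometrically, roughly like (‖A‖/|z|)ⁿ, sixty terms already agree with the direct inverse to within floating-point rounding.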


Let C be any circle with center at the origin and radius greater than, say, $\|A\|$. Then, by Lemma 1,

$$\oint_C z^n (z - A)^{-1}\,dz = \sum_{k=1}^{\infty} A^{k-1} \oint_C z^{n-k}\,dz = 2\pi i A^n \qquad [19]$$

or

$$A^n = \frac{1}{2\pi i} \oint_C z^n (z - A)^{-1}\,dz \qquad [20]$$

where the line integral is taken in the right direction.

Note that the line integrals are defined in the same way as is done in the theory of functions of a complex variable. The existence of the integrals and their independence of path (so long as the integrands remain analytic) are proved in the same way. Since $(z - A)^{-1}$ is analytic on $\rho(A)$, we have


Theorem 10 Let C be any closed curve containing $\sigma(A)$ in its interior. Then [20] holds.


As a direct consequence of this, we have


Theorem 11 $r_\sigma(A) = \max_{\lambda \in \sigma(A)} |\lambda|$ and $\|A^n\|^{1/n} \to r_\sigma(A)$ as $n \to \infty$.
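
The limit in Theorem 11 can be observed on a matrix whose norm is much larger than its spectral radius. The sketch below is illustrative: the matrix is an assumed example with the single eigenvalue 0.5, and the norm used is the operator norm induced by the max norm on $R^2$ (largest absolute row sum), which is one admissible choice among many.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def row_sum_norm(a):
    """Operator norm induced by the max norm: largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in a)


def root_norms(a, n_max):
    """Return the list of ||A^n||^(1/n) for n = 1, ..., n_max."""
    power = [row[:] for row in a]
    values = []
    for n in range(1, n_max + 1):
        values.append(row_sum_norm(power) ** (1.0 / n))
        power = mat_mul(power, a)
    return values


A = [[0.5, 1.0], [0.0, 0.5]]   # only eigenvalue is 0.5, but ||A|| = 1.5
values = root_norms(A, 200)
first, tenth, last = values[0], values[9], values[-1]
```

For this matrix ‖Aⁿ‖ = 0.5ⁿ(1 + 2n), so the nth roots start at 1.5 and decrease toward 0.5 only slowly, which shows why the norm alone can badly overestimate the spectral radius.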


We can now put Lemma 1 in the following form: 


Theorem 12 If $|z| > r_\sigma(A)$, then [18] holds with convergence in B(X).


Now let b be any number greater than $r_\sigma(A)$, and let f(z) be a complex-valued function that is analytic in |z| < b. Thus,

$$f(z) = \sum_{k=0}^{\infty} a_k z^k, \quad |z| < b \qquad [21]$$

We can define f(A) as follows: the operators

$$\sum_{k=0}^{n} a_k A^k$$

converge in norm, since

$$\sum_{k=0}^{\infty} |a_k| \cdot \|A^k\| < \infty$$

This last statement follows from the fact that if c is any number satisfying $r_\sigma(A) < c < b$, then

$$\|A^k\|^{1/k} \le c$$

for k sufficiently large, and the series

$$\sum_{k=0}^{\infty} |a_k| c^k$$

is convergent. We define f(A) to be

$$\sum_{k=0}^{\infty} a_k A^k \qquad [22]$$

By Theorem 10, this gives

$$f(A) = \frac{1}{2\pi i} \sum_{k=0}^{\infty} a_k \oint_C z^k (z - A)^{-1}\,dz = \frac{1}{2\pi i} \oint_C \sum_{k=0}^{\infty} a_k z^k (z - A)^{-1}\,dz = \frac{1}{2\pi i} \oint_C f(z)(z - A)^{-1}\,dz \qquad [23]$$

where C is any circle about the origin with radius greater than $r_\sigma(A)$ and less than b.
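We can now give the formula that we promised. Suppose f(z) does not vanish for |z| < b. Set g(z) = 1/f(z). Then g(z) is analytic in |z| < b, and hence g(A) is defined. Moreover,

$$f(A)g(A) = \frac{1}{2\pi i} \oint_C f(z)g(z)(z - A)^{-1}\,dz = \frac{1}{2\pi i} \oint_C (z - A)^{-1}\,dz = I$$

Since f(A) and g(A) clearly commute, we see that $f(A)^{-1}$ exists and equals g(A). Hence,

$$f(A)^{-1} = \frac{1}{2\pi i} \oint_C \frac{1}{f(z)} (z - A)^{-1}\,dz \qquad [24]$$

In particular, if

$$g(z) = 1/f(z) = \sum_{k=0}^{\infty} c_k z^k, \quad |z| < b$$

then

$$f(A)^{-1} = \sum_{k=0}^{\infty} c_k A^k \qquad [25]$$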
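
Formula [25] can be tested exactly when the power series terminates. The following sketch rests on assumptions chosen for convenience: A is a nilpotent 3 × 3 matrix N (so $r_\sigma(N) = 0$), f(z) = 1 − z, b = 1, and g(z) = 1/(1 − z) with all coefficients $c_k = 1$. Since N³ = 0, the series for g(N) stops after three terms. The function names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    """The n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def series_of_matrix(coeffs, a):
    """Finite sum sum_k coeffs[k] * A^k, taking A^0 = I."""
    n = len(a)
    power = identity(n)
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        result = [[result[i][j] + c * power[i][j] for j in range(n)]
                  for i in range(n)]
        power = mat_mul(power, a)
    return result


N = [[0, 1, 2], [0, 0, 3], [0, 0, 0]]   # strictly upper triangular, N^3 = 0

f_of_N = series_of_matrix([1, -1], N)      # f(z) = 1 - z
g_of_N = series_of_matrix([1, 1, 1], N)    # g(z) = 1 + z + z^2 + ..., cut at N^3 = 0
left_product = mat_mul(f_of_N, g_of_N)
right_product = mat_mul(g_of_N, f_of_N)
```

Both products equal the identity, so g(N) is the inverse of f(N), which is the content of [25] in this special case.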


Now, suppose f(z) is analytic in an open set $\Omega$ containing $\sigma(A)$, but not analytic in a disk of radius greater than $r_\sigma(A)$. In this case, we cannot say that the series [22] converges in norm to an operator in B(X). However, we can still define f(A) in the following way: there exists an open set $\omega$ whose closure $\bar\omega \subset \Omega$ and whose boundary $\partial\omega$ consists of a finite number of simple closed curves that do not intersect, and such that $\sigma(A) \subset \omega$. (That such a set always exists is left as an exercise; see, e.g., Schechter (2002).) We now define f(A) by

$$f(A) = \frac{1}{2\pi i} \oint_{\partial\omega} f(z)(z - A)^{-1}\,dz \qquad [26]$$

where the line integrals are to be taken in the proper directions. It is easily checked that $f(A) \in B(X)$ and is independent of the choice of the set $\omega$. By [23], this definition agrees with the one given above for the case when $\Omega$ contains a disk of radius greater than $r_\sigma(A)$. Note that if $\Omega$ is not connected, f(z) need not be the same function on different components of $\Omega$.

Now suppose f(z) does not vanish on $\sigma(A)$. Then we can choose $\omega$ so that f(z) does not vanish on $\bar\omega$ (this is also an exercise). Thus, g(z) = 1/f(z) is analytic on an open set containing $\bar\omega$ so that g(A) is defined. Since f(z)g(z) = 1, one would expect that f(A)g(A) = g(A)f(A) = I, in which case it would follow that $f(A)^{-1}$ exists and is equal to g(A). This follows from


Lemma 2 If f(z) and g(z) are analytic in an open set $\Omega$ containing $\sigma(A)$ and

$$h(z) = f(z)g(z)$$

then h(A) = f(A)g(A).


Therefore, it follows that we have


Theorem 13 If A is in B(X) and f(z) is a function analytic in an open set $\Omega$ containing $\sigma(A)$ such that $f(z) \ne 0$ on $\sigma(A)$, then $f(A)^{-1}$ exists and is given by

$$f(A)^{-1} = \frac{1}{2\pi i} \oint_{\partial\omega} \frac{1}{f(z)} (z - A)^{-1}\,dz$$

where $\omega$ is any open set such that


(i) $\sigma(A) \subset \omega$, $\bar\omega \subset \Omega$,
(ii) $\partial\omega$ consists of a finite number of simple closed curves, and
(iii) $f(z) \ne 0$ on $\bar\omega$.


Now that we have defined f(A) for functions analytic in a neighborhood of $\sigma(A)$, we can show that the spectral mapping theorem holds for such functions as well (see Theorem 8). We have


Theorem 14 If f(z) is analytic in a neighborhood of $\sigma(A)$, then

$$\sigma(f(A)) = f(\sigma(A)) \qquad [27]$$

that is, $\mu \in \sigma(f(A))$ if and only if $\mu = f(\lambda)$ for some $\lambda \in \sigma(A)$.

Complexification 


What we have just done is valid for complex Banach spaces. Suppose, however, we are dealing with a real Banach space. What can be said then?

Let X be a real Banach space. Consider the set Z of all ordered pairs (x, y) of elements of X. We set

$$(x_1, y_1) + (x_2, y_2) = (x_1 + x_2, y_1 + y_2)$$
$$(\alpha + i\beta)(x, y) = ((\alpha x - \beta y), (\beta x + \alpha y)), \quad \alpha, \beta \in R$$

With these definitions, one checks easily that Z is a complex vector space. The set of elements of Z of the form (x, 0) can be identified with X. We would like to introduce a norm on Z that would make Z into a Banach space and satisfy

$$\|(x, 0)\| = \|x\|, \quad x \in X$$

An obvious suggestion is

$$(\|x\|^2 + \|y\|^2)^{1/2}$$

However, it is soon discovered that this is not a norm on Z (why?). We have to be more careful. One that works is given by

$$\|(x, y)\| = \max_{\alpha^2 + \beta^2 = 1} (\|\alpha x - \beta y\|^2 + \|\beta x + \alpha y\|^2)^{1/2}$$

With this norm, Z becomes a complex Banach space having the desired properties.

Now let A be an operator in B(X). We define an operator $\hat{A}$ in B(Z) by

$$\hat{A}(x, y) = (Ax, Ay)$$

Then

$$\|\hat{A}(x, y)\| = \max_{\alpha^2 + \beta^2 = 1} (\|\alpha Ax - \beta Ay\|^2 + \|\beta Ax + \alpha Ay\|^2)^{1/2} = \max_{\alpha^2 + \beta^2 = 1} (\|A(\alpha x - \beta y)\|^2 + \|A(\beta x + \alpha y)\|^2)^{1/2} \le \|A\| \cdot \|(x, y)\|$$

Thus,

$$\|\hat{A}\| \le \|A\|$$

But,

$$\|\hat{A}\| \ge \sup_{x \ne 0} \frac{\|(Ax, 0)\|}{\|(x, 0)\|} = \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|} = \|A\|$$

Hence,

$$\|\hat{A}\| = \|A\|$$

If $\lambda$ is real, then

$$(\hat{A} - \lambda)(x, y) = ((A - \lambda)x, (A - \lambda)y)$$

This shows that $\lambda \in \rho(\hat{A})$ if and only if $\lambda \in \rho(A)$. Similarly, if p(t) is a polynomial with real coefficients, then

$$p(\hat{A})(x, y) = (p(A)x, p(A)y)$$

showing that $p(\hat{A})$ has an inverse in B(Z) if and only if p(A) has an inverse in B(X). Hence, we have


Theorem 15 Equation [16] has a unique solution for each y in X if and only if $p(\lambda) \ne 0$ for all $\lambda \in \sigma(\hat{A})$.


In the example given earlier, the operator $\hat{A}$ has eigenvalues $i$ and $-i$. Hence, $-1$ is in the spectrum of $\hat{A}^2$ and also in that of $A^2$. Thus, the equation

$$(A^2 + 1)x = y$$

cannot be solved uniquely for all y.
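
The eigenvalues of the complexified rotation can be confirmed with Python's built-in complex numbers. This sketch makes one identification that is an assumption of the illustration: for $X = R^2$, the pair (x, y) in Z is written as the complex vector x + iy, so that $\hat{A}$ acts by the same real matrix. The function names are hypothetical.

```python
A = [[0, -1], [1, 0]]   # matrix of A(a1, a2) = (-a2, a1), now acting on C^2


def apply(a, v):
    """Apply the matrix a to the (possibly complex) vector v."""
    return [sum(entry * vi for entry, vi in zip(row, v)) for row in a]


def is_eigenpair(a, lam, v, tol=1e-12):
    """True when a v = lam v holds up to the tolerance, for a nonzero v."""
    if all(abs(vi) <= tol for vi in v):
        return False
    image = apply(a, v)
    return all(abs(wi - lam * vi) <= tol for wi, vi in zip(image, v))


# v = (1, -i) corresponds to the pair x = (1, 0), y = (0, -1) in Z.
plus_i = is_eigenpair(A, 1j, [1, -1j])
minus_i = is_eigenpair(A, -1j, [1, 1j])

# (A^2 + 1) is the zero operator, so (A^2 + 1)x = y has no solution for y != 0.
v = [3, 7]
a2_plus_one_v = [w + vi for w, vi in zip(apply(A, apply(A, v)), v)]
```

Both checks succeed, and the last line shows that A² + 1 annihilates every vector, in agreement with the statement that the equation cannot be solved uniquely for all y.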


Compact Operators 


Let X, Y be normed vector spaces. A linear operator
K from X to Y is called compact (or completely
continuous) if D(K) = X and for every sequence
$\{x_n\} \subset X$ such that $\|x_n\| \le C$, the sequence $\{Kx_n\}$ has
a subsequence which converges in Y. The set of all
compact operators from X to Y is denoted by
K(X, Y).

A compact operator is bounded. Otherwise, there
would be a sequence $\{x_n\}$ such that $\|x_n\| \le C$, while
$\|Kx_n\| \to \infty$. Then $\{Kx_n\}$ could not have a convergent
subsequence. The sum of two compact operators
is compact, and the same is true of the product
of a scalar and a compact operator. Hence, K(X, Y)
is a subspace of B(X, Y).

If $A \in B(X, Y)$ and $K \in K(Y, Z)$, then $KA \in K(X, Z)$.
Similarly, if $L \in K(X, Y)$ and $B \in B(Y, Z)$,
then $BL \in K(X, Z)$.

Suppose $K \in B(X, Y)$, and there is a sequence $\{F_n\}$
of compact operators such that

\[ \|K - F_n\| \to 0 \quad \text{as } n \to \infty \qquad [28] \]

We claim that if Y is a Banach space, then K is
compact.

Theorem 16 Let X be a normed vector space and
Y a Banach space. If L is in B(X, Y) and there is a
sequence $\{K_n\} \subset K(X, Y)$ such that

\[ \|L - K_n\| \to 0 \quad \text{as } n \to \infty \]

then L is in K(X, Y).
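A minimal sketch of how Theorem 16 is typically used, under assumptions that are not in the text: take the diagonal operator K e_k = e_k / k on the sequence space l^2, and let K_n keep only the first n diagonal entries. Each K_n has finite rank and is therefore compact, and the norm of a diagonal operator is the largest modulus of its entries. The function name `tail_norm` and the `cutoff` parameter are hypothetical.

```python
def tail_norm(n, cutoff=10_000):
    """Operator norm of K - K_n for the diagonal operator K e_k = e_k / k.

    K - K_n is diagonal with entries 1/k for k > n, so its norm is the
    largest of these. `cutoff` truncates the infinite tail for illustration.
    """
    return max(1.0 / k for k in range(n + 1, cutoff + 1))

norms = [tail_norm(n) for n in (1, 10, 100, 1000)]
print(norms)  # decreasing toward 0, so K is a norm limit of compact operators
```

Since the norm of K - K_n equals 1/(n + 1), the hypothesis of Theorem 16 holds and K is compact.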


Theorem 17 Let X be a Banach space and let K be
an operator in K(X). Set $A = I - K$. Then, R(A) is
closed in X and $\dim N(A) = \dim N(A')$ is finite.
In particular, either R(A) = X and N(A) = {0}, or
$R(A) \ne X$ and $N(A) \ne \{0\}$.

The last statement of Theorem 17 is known as the
“Fredholm alternative.”

Let X, Y be Banach spaces. An operator $A \in B(X, Y)$
is said to be a Fredholm operator from X to Y if

1. $\alpha(A) = \dim N(A)$ is finite,
2. R(A) is closed in Y, and
3. $\beta(A) = \dim N(A')$ is finite.

The set of Fredholm operators from X to Y is
denoted by $\Phi(X, Y)$. If X = Y and $K \in K(X)$, then,
clearly, $I - K$ is a Fredholm operator. The index of a
Fredholm operator is defined as

\[ i(A) = \alpha(A) - \beta(A) \qquad [29] \]

For $K \in K(X)$, we have shown that $i(I - K) = 0$
(Theorem 17).
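As a sanity check on definition [29], the finite-dimensional case can be worked out directly. The following is a sketch under the assumption that X and Y are finite-dimensional; the symbols n, m and r are illustrative and do not appear in the text. In that setting every linear operator is Fredholm.

```latex
% Assumption: X = \mathbb{C}^n, Y = \mathbb{C}^m, and A is an m x n matrix of rank r.
\begin{aligned}
\alpha(A) &= \dim N(A) = n - r, \\
\beta(A)  &= \dim N(A') = m - r, \\
i(A)      &= \alpha(A) - \beta(A) = n - m .
\end{aligned}
```

The index then depends only on the two spaces and not on A. For n = m it vanishes, which is consistent with $i(I - K) = 0$.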


Theorem 18 Let X, Y be normed vector spaces, 
and assume that K is in K(X,Y). Then K' is in 
K(Y', X’). 


Let X be a Banach space, and suppose $K \in K(X)$.
If $\lambda$ is a nonzero scalar, then

\[ \lambda I - K = \lambda (I - \lambda^{-1} K) \in \Phi(X) \qquad [30] \]

For an arbitrary operator $A \in B(X)$, the set of all
scalars $\lambda$ for which $\lambda I - A \in \Phi(X)$ is called the $\Phi$-set
of A and is denoted by $\Phi_A$. Thus, [30] gives

Theorem 19 If X is a Banach space and K is in
K(X), then $\Phi_K$ contains all scalars $\lambda \ne 0$.

Theorem 20 Under the hypothesis of Theorem 19,
$\alpha(K - \lambda) = 0$ except for, at most, a denumerable set
S of values of $\lambda$. The set S depends on K and has 0 as
its only possible limit point. Moreover, if $\lambda \ne 0$ and
$\lambda \notin S$, then $\alpha(K - \lambda) = 0$, $R(K - \lambda) = X$ and $K - \lambda$
has an inverse in B(X).


Unbounded Operators 


In many applications, one runs into unbounded
operators instead of bounded ones. This is particularly
true in the case of differential equations. For
instance, consider the operator d/dt on C[0, 1] with
domain consisting of continuously differentiable
functions. It is clearly unbounded. In fact, the
sequence $x_n(t) = t^n$ satisfies $\|x_n\| = 1$, $\|dx_n/dt\| = n \to \infty$
as $n \to \infty$. It would, therefore, be useful if
some of the results that we have stated for bounded
operators would also hold for unbounded ones. We
shall see that, indeed, many of them do. Unless
otherwise specified, X, Y, Z, and W will denote
Banach spaces in this article.
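The unboundedness of d/dt can be checked numerically. The sketch below samples $x_n(t) = t^n$ and its exact derivative on a uniform grid of [0, 1] and reports both sup norms. The helper names `sup_norm` and `sample` and the grid size are illustrative choices.

```python
def sup_norm(values):
    """Maximum modulus of a list of sampled function values."""
    return max(abs(v) for v in values)

def sample(n, points=1001):
    """Return (sup norm of t^n, sup norm of its derivative) on a grid of [0, 1]."""
    ts = [i / (points - 1) for i in range(points)]
    x = [t ** n for t in ts]
    dx = [n * t ** (n - 1) for t in ts]  # exact derivative of t^n
    return sup_norm(x), sup_norm(dx)

for n in (1, 5, 50):
    print(n, sample(n))  # first entry stays at 1, second grows like n
```

Both maxima are attained at t = 1, so the ratio of the two norms is n and no bound of the form ||dx/dt|| <= C ||x|| can hold.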

Let X, Y be normed vector spaces, and let A be
a linear operator from X to Y. We now officially
lift our restriction that D(A) = X. However, if
$A \in B(X, Y)$, it is still to be assumed that D(A) = X.

The operator A is called closed if whenever $\{x_n\} \subset D(A)$
is a sequence satisfying

\[ x_n \to x \text{ in } X, \quad Ax_n \to y \text{ in } Y \qquad [31] \]

then $x \in D(A)$ and Ax = y. Clearly, all operators in
B(X, Y) are closed.
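The operator d/dt on C[0, 1] is a standard example of a closed operator that is not bounded. A sketch of the argument, assuming sup norms and the domain of continuously differentiable functions, with the symbols $x_n$, x and y as in [31]:

```latex
% Assumption: A = d/dt, D(A) = C^1[0,1], with x_n -> x and x_n' -> y uniformly on [0,1].
x_n(t) = x_n(0) + \int_0^t x_n'(s)\,ds
\;\longrightarrow\;
x(t) = x(0) + \int_0^t y(s)\,ds
\quad (n \to \infty),
\qquad \text{hence } x \in C^1[0,1] \text{ and } x' = y .
```

Uniform convergence is what allows the limit to pass under the integral sign. The conclusion is exactly the requirement that $x \in D(A)$ and Ax = y.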

To define A' for an unbounded operator, we
follow the definition for bounded operators, and
exercise a bit of care. We want

\[ A'y'(x) = y'(Ax), \quad x \in D(A) \qquad [32] \]

Thus, we say that $y' \in D(A')$ if there is an $x' \in X'$
such that

\[ x'(x) = y'(Ax), \quad x \in D(A) \qquad [33] \]

Then we define A'y' to be x'. In order that this
definition make sense, we need x' to be unique, that
is, that x'(x) = 0 for all $x \in D(A)$ should imply that
x' = 0. This is true if and only if D(A) is dense in X.
To summarize, we can define A' for any linear
operator from X to Y provided D(A) is dense in X.
We take D(A') to be the set of those $y' \in Y'$ for
which there is an $x' \in X'$ satisfying [33]. This x' is
unique, and we set A'y' = x'. Note that if

\[ |y'(Ax)| \le C\|x\|, \quad x \in D(A) \]

then a simple application of the Hahn-Banach
theorem (see e.g., Schechter (2002) or the appendix)
shows that $y' \in D(A')$.

We define unbounded Fredholm operators in the
following way: let X, Y be Banach spaces. Then the
set $\Phi(X, Y)$ of Fredholm operators from X to Y
consists of linear operators from X to Y such that

1. D(A) is dense in X,
2. A is closed,
3. $\alpha(A) = \dim N(A) < \infty$,
4. R(A) is closed in Y, and
5. $\beta(A) = \dim N(A') < \infty$.


The Essential Spectrum 


Let A be a linear operator on a normed vector space
X. We say that $\lambda \in \rho(A)$ if $R(A - \lambda)$ is dense in X
and there is a $T \in B(X)$ such that

\[ T(A - \lambda) = I \text{ on } D(A), \quad (A - \lambda)T = I \text{ on } R(A - \lambda) \qquad [34] \]

Otherwise, $\lambda \in \sigma(A)$. As before, $\rho(A)$ and $\sigma(A)$ are
called the resolvent set and spectrum of A, respectively.
To show the relationship of this definition to
the one given before, we note the following.

Lemma 3 If X is a Banach space and A is closed,
then $\lambda \in \rho(A)$ if and only if

\[ \alpha(A - \lambda) = 0, \quad R(A - \lambda) = X \qquad [35] \]


Throughout the remainder of this section, we shall
assume that X is a Banach space, and that A is a
densely defined, closed linear operator on X. We ask
the following question: what points of $\sigma(A)$ can be
removed from the spectrum by the addition of a
compact operator to A? The answer to this question is
closely related to the set $\Phi_A$. We define this to be the
set of all scalars $\lambda$ such that $A - \lambda \in \Phi(X)$. We have

Theorem 21 The set $\Phi_A$ is open, and $i(A - \lambda)$ is
constant on each of its components.

We also have

Theorem 22 $\Phi_{A+K} = \Phi_A$ for all K which are
A-compact, and $i(A + K - \lambda) = i(A - \lambda)$ for all
$\lambda \in \Phi_A$.

Set

\[ \sigma_e(A) = \bigcap_{K \in K(X)} \sigma(A + K) \]

We call $\sigma_e(A)$ the essential spectrum of A (there are
other definitions). It consists of those points of $\sigma(A)$
which cannot be removed from the spectrum by the
addition of a compact operator to A. We now
characterize $\sigma_e(A)$.

Theorem 23 $\lambda \notin \sigma_e(A)$ if and only if $\lambda \in \Phi_A$ and
$i(A - \lambda) = 0$.


Normal Operators 


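A sequence of elements $\{\varphi_n\}$ in a Hilbert space is
called orthonormal if

\[ (\varphi_m, \varphi_n) = \delta_{mn} = \begin{cases} 0, & m \ne n \\ 1, & m = n \end{cases} \qquad [36] \]

(for definitions, see, e.g., Schechter (2002) or the
appendix at the end of this article).

Let $\{\varphi_n\}$ be an orthonormal sequence (finite or
infinite) in a Hilbert space H. Let $\{\lambda_k\}$ be a sequence
(of the same length) of scalars satisfying

\[ |\lambda_k| \le C \]

Then for each element $f \in H$, the series

\[ \sum_k \lambda_k (f, \varphi_k) \varphi_k \]

converges in H. Define the operator A on H by

\[ Af = \sum_k \lambda_k (f, \varphi_k) \varphi_k \qquad [37] \]

Clearly, A is a linear operator. It is also bounded,
since

\[ \|Af\|^2 = \sum_k |\lambda_k|^2 |(f, \varphi_k)|^2 \le C^2 \|f\|^2 \qquad [38] \]

by Bessel's inequality

\[ \sum_{k=1}^{\infty} |(f, \varphi_k)|^2 \le \|f\|^2 \qquad [39] \]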


For convenience, let us assume that each $\lambda_k \ne 0$ (just
remove those $\varphi_k$ corresponding to the $\lambda_k$ that vanish).
In this case, N(A) consists of precisely those $f \in H$
which are orthogonal to all of the $\varphi_k$. Clearly, such f
are in N(A). Conversely, if $f \in N(A)$, then

\[ 0 = (Af, \varphi_k) = \lambda_k (f, \varphi_k) \]

Hence, $(f, \varphi_k) = 0$ for each k. Moreover, each $\lambda_k$ is
an eigenvalue of A with $\varphi_k$ the corresponding
eigenvector. This follows immediately from [37].
Since $\sigma(A)$ is closed, it also contains the limit points
of the $\lambda_k$.

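Next, we shall see that if $\lambda \ne 0$ is neither one of the $\lambda_k$
nor a limit point of the $\lambda_k$, then $\lambda \in \rho(A)$. To show this, we solve

\[ (\lambda - A)u = f \qquad [40] \]

for any $f \in H$. Any solution of [40] satisfies

\[ \lambda u = f + Au = f + \sum_k \lambda_k (u, \varphi_k) \varphi_k \qquad [41] \]

Hence,

\[ \lambda (u, \varphi_k) = (f, \varphi_k) + \lambda_k (u, \varphi_k) \]

or

\[ (u, \varphi_k) = \frac{(f, \varphi_k)}{\lambda - \lambda_k} \qquad [42] \]

Substituting back in [41], we obtain

\[ \lambda u = f + \sum_k \frac{\lambda_k (f, \varphi_k) \varphi_k}{\lambda - \lambda_k} \qquad [43] \]

Since $\lambda$ is neither one of the $\lambda_k$ nor a limit point of them,
there is a $\delta > 0$ such that

\[ |\lambda - \lambda_k| \ge \delta, \quad k = 1, 2, \ldots \]

Hence, the series in [43] converges for each $f \in H$. It
is an easy exercise to verify that [43] is indeed a
solution of [40]. To see that $(\lambda - A)^{-1}$ is bounded,
note that

\[ |\lambda| \, \|u\| \le \|f\| + C\|f\|/\delta \qquad [44] \]

(cf. [38]). Thus, we have proved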


Lemma 4 If the operator A is given by [37], then
$\sigma(A)$ consists of the points $\lambda_k$, their limit points and
possibly 0. N(A) consists of those u which are
orthogonal to all of the $\varphi_k$. For $\lambda \in \rho(A)$, the
solution of [40] is given by [43].
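Formula [43] can be verified in a small finite-dimensional model. The sketch below assumes H = R^3 with two orthonormal vectors and real eigenvalues 2 and 0.5; these values, and the helper names `dot`, `apply_A` and `solve`, are illustrative choices rather than anything fixed by the text. It builds u from [43] and checks that $(\lambda - A)u = f$.

```python
import math

def dot(u, v):
    """Real scalar product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def apply_A(f, lams, phis):
    """Af = sum_k lam_k (f, phi_k) phi_k, as in [37]."""
    out = [0.0] * len(f)
    for lam_k, phi in zip(lams, phis):
        c = lam_k * dot(f, phi)
        out = [o + c * p for o, p in zip(out, phi)]
    return out

def solve(f, lam, lams, phis):
    """u from [43]: lam*u = f + sum_k lam_k (f, phi_k) phi_k / (lam - lam_k)."""
    rhs = list(f)
    for lam_k, phi in zip(lams, phis):
        c = lam_k * dot(f, phi) / (lam - lam_k)
        rhs = [r + c * p for r, p in zip(rhs, phi)]
    return [r / lam for r in rhs]

s = 1 / math.sqrt(2)
phis = [[s, s, 0.0], [s, -s, 0.0]]  # orthonormal, but not a basis of R^3
lams = [2.0, 0.5]
f = [1.0, 2.0, 3.0]
lam = 1.0                            # nonzero and different from every lam_k

u = solve(f, lam, lams, phis)
Au = apply_A(u, lams, phis)
residual = [lam * ui - aui - fi for ui, aui, fi in zip(u, Au, f)]
print(residual)  # entries at rounding-error level
```

The third coordinate of f is orthogonal to both $\varphi_k$, so it lies in N(A) and is simply divided by $\lambda$, as [43] predicts.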


We see from all this that the operator [37] has 
many useful properties. Therefore, it would be 
desirable to determine conditions under which 
operators are guaranteed to be of that form. For
this purpose, we note another property of A. It is 
expressed in terms of the Hilbert space adjoint of A. 

Let $H_1$ and $H_2$ be Hilbert spaces, and let A be an
operator in $B(H_1, H_2)$. For fixed $y \in H_2$, the expression
Fx = (Ax, y) is a bounded linear functional on
$H_1$. By the Riesz representation theorem (see, e.g.,
Schechter (2002) or the appendix at the end of this
article), there is a $z \in H_1$ such that Fx = (x, z) for all
$x \in H_1$. Set $z = A^* y$. Then $A^*$ is a linear operator
from $H_2$ to $H_1$ satisfying

\[ (Ax, y) = (x, A^* y) \qquad [45] \]

$A^*$ is called the Hilbert space adjoint of A. Note the
difference between $A^*$ and the operator A' defined
for a Banach space. As in the case of the operator A',
we note that $A^*$ is bounded and

\[ \|A^*\| = \|A\| \qquad [46] \]


Returning to the operator A, we remove the
assumption that each $\lambda_k \ne 0$ and note that

\[ (Au, v) = \sum_k \lambda_k (u, \varphi_k)(\varphi_k, v) = \Big( u, \sum_k \bar{\lambda}_k (v, \varphi_k) \varphi_k \Big) \]

showing that

\[ A^* v = \sum_k \bar{\lambda}_k (v, \varphi_k) \varphi_k \qquad [47] \]

(If H is a complex Hilbert space, then the complex
conjugates $\bar{\lambda}_k$ of the $\lambda_k$ are required. If H is a real
Hilbert space, then the $\lambda_k$ are real, and it does not
matter.) Now, by Lemma 4, we see that each $\bar{\lambda}_k$
is an eigenvalue of $A^*$ with $\varphi_k$ a corresponding
eigenvector. Note also that

\[ \|A^* f\|^2 = \sum_k |\lambda_k|^2 |(f, \varphi_k)|^2 \qquad [48] \]

showing that

\[ \|A^* f\| = \|Af\|, \quad f \in H \qquad [49] \]

An operator satisfying [49] is called normal. An
important characterization is given by

Theorem 24 An operator is normal and compact
if and only if it is of the form [37] with $\{\varphi_k\}$ an
orthonormal set and $\lambda_k \to 0$ as $k \to \infty$.

We also have


Lemma 5 If A is normal, then

\[ \|(A^* - \bar{\lambda})u\| = \|(A - \lambda)u\|, \quad u \in H \qquad [50] \]

Corollary 2 If A is normal and $A\varphi = \lambda\varphi$, then
$A^*\varphi = \bar{\lambda}\varphi$.

Lemma 6 If A is normal and compact, then it has
an eigenvalue $\lambda$ such that $|\lambda| = \|A\|$.

We also have

Corollary 3 If A is a normal compact operator,
then there is an orthonormal sequence $\{\varphi_k\}$ of
eigenvectors of A such that every element u in H
can be written in the form

\[ u = h + \sum_k (u, \varphi_k) \varphi_k \qquad [51] \]

where $h \in N(A)$.


Hyponormal Operators 


An operator A in B(H) is called hyponormal if

\[ \|A^* u\| \le \|Au\|, \quad u \in H \qquad [52] \]

or, equivalently, if

\[ ([AA^* - A^*A]u, u) \le 0, \quad u \in H \qquad [53] \]

Of course, a normal operator is hyponormal. An
operator $A \in B(H)$ is called seminormal if either A
or $A^*$ is hyponormal. We have

Theorem 25 If A is seminormal, then

\[ r_\sigma(A) = \|A\| \qquad [54] \]

We have earlier defined the essential spectrum of
an operator A to be

\[ \sigma_e(A) = \bigcap_{K \in K(H)} \sigma(A + K) \qquad [55] \]

It was shown that $\lambda \notin \sigma_e(A)$ if and only if $\lambda \in \Phi_A$
and $i(A - \lambda) = 0$ (Theorem 23). Let us show that we
can be more specific in the case of seminormal
operators.

Theorem 26 If A is a seminormal operator, then
$\lambda \in \sigma(A) \setminus \sigma_e(A)$ if and only if $\lambda$ is an isolated
eigenvalue with $r(A - \lambda) = \lim_{n \to \infty} \alpha[(A - \lambda)^n] < \infty$.


Lemma 7 If A is hyponormal, then so is $B = A - \lambda$
for any complex $\lambda$.

Lemma 8 If B is hyponormal with 0 an isolated
point of $\sigma(B)$ and either $\alpha(B)$ or $\beta(B)$ is finite, then
$B \in \Phi(H)$ and $i(B) = 0$.

There is a simple consequence of Lemma 8.

Corollary 4 If A is seminormal and $\lambda$ is an isolated
point of $\sigma(A)$, then $\lambda$ is an eigenvalue of A.

We also have the following:

Theorem 27 Let A be a seminormal operator such
that $\sigma(A)$ has no nonzero limit points. Then A is
compact and normal. Thus, it is of the form [37]
with the $\{\varphi_k\}$ orthonormal and $\lambda_k \to 0$.

Corollary 5 If A is seminormal and compact, then
it is normal.


Spectral Resolution 


We saw in the section “Operational calculus” that, 
in a Banach space X, we can define f(A) for any 
A € B(X) provided f(z) is a function analytic in a 
neighborhood of o(A). In this section, we shall show 
that we can do better in the case of self-adjoint 
operators. 

A linear operator A on a Hilbert space X is called
self-adjoint if it has the property that $x \in D(A)$ and
Ax = f if and only if

\[ (x, Ay) = (f, y), \quad y \in D(A) \]

In particular, it satisfies

\[ (Ax, y) = (x, Ay), \quad x, y \in D(A) \]

A bounded self-adjoint operator is normal.
To get an idea, let A be a compact, self-adjoint
operator on H. Then by Theorem 24,

\[ Au = \sum_k \lambda_k (u, \varphi_k) \varphi_k \qquad [56] \]

where $\{\varphi_k\}$ is an orthonormal sequence of eigenvectors
and the $\lambda_k$ are the corresponding eigenvalues of
A. Now let p(t) be a polynomial with real
coefficients having no constant term

\[ p(t) = \sum_{k=1}^{m} a_k t^k \qquad [57] \]
1 


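Then p(A) is compact and self-adjoint. Let $\mu \ne 0$ be
a point in $\sigma(p(A))$. Then $\mu = p(\lambda)$ for some $\lambda \in \sigma(A)$
(Theorem 8). Now $\lambda \ne 0$ (otherwise we would have
$\mu = p(0) = 0$). Hence, it is an eigenvalue of A (see the
section “The spectrum and resolvent sets”). If $\varphi$ is a
corresponding eigenvector, then

\[ \begin{aligned}
[p(A) - \mu]\varphi &= \sum_k a_k A^k \varphi - \mu\varphi \\
&= \sum_k a_k \lambda^k \varphi - \mu\varphi \\
&= [p(\lambda) - \mu]\varphi = 0
\end{aligned} \]

Thus $\mu$ is an eigenvalue of p(A) and $\varphi$ is a
corresponding eigenvector. This shows that

\[ p(A)u = \sum_k p(\lambda_k)(u, \varphi_k) \varphi_k \qquad [58] \]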

Now, the right-hand side of [58] makes sense if p(t) 
is any function bounded on o(A) (see the section 
“Normal operators”). Therefore it seems plausible 
to define p(A) by means of [58]. Of course, for such 
a definition to be useful, one would need certain 
relationships to hold. In particular, one would want 
f(t)g(t)=h(t) to imply f(A)g(A)=h(A). We shall 
discuss this a bit later. 
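The multiplicative property asked for here is easy to observe in finite dimensions. The sketch below assumes the symmetric matrix with rows (2, 1) and (1, 2), whose eigenvalues 3 and 1 and normalized eigenvectors are standard; it defines p(A) by the right-hand side of [58] and checks that f(t)g(t) = h(t) gives f(A)g(A) = h(A). The matrix and the names `eigen`, `func_of_A` and `matmul` are illustrative assumptions.

```python
import math

s = 1 / math.sqrt(2)
# (lambda_k, phi_k) pairs for the assumed matrix [[2, 1], [1, 2]].
eigen = [(3.0, [s, s]), (1.0, [s, -s])]

def func_of_A(p):
    """Matrix of u -> sum_k p(lambda_k) (u, phi_k) phi_k, following [58]."""
    return [[sum(p(lam) * phi[i] * phi[j] for lam, phi in eigen)
             for j in range(2)] for i in range(2)]

def matmul(P, Q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def f(t):
    return t

def g(t):
    return t * t

def h(t):
    return f(t) * g(t)

fA, gA, hA = func_of_A(f), func_of_A(g), func_of_A(h)
product = matmul(fA, gA)
print(fA)       # recovers the matrix [[2, 1], [1, 2]] up to rounding
print(product)  # agrees with hA up to rounding
```

The identity holds because the $\varphi_k$ are orthonormal, so cross terms vanish when the two sums are multiplied.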

If A is not compact, we cannot, in general, obtain 
an expansion in the form [56]. However, we can 
obtain something similar. In fact, we have 


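Theorem 28 Let A be a self-adjoint operator in
B(H). Set

\[ m = \inf_{\|u\| = 1} (Au, u), \quad M = \sup_{\|u\| = 1} (Au, u) \]

Then there is a family $\{E(\lambda)\}$ of orthogonal projection
operators on H depending on a real parameter $\lambda$ and
such that:

(i) $E(\lambda_1) \le E(\lambda_2)$ for $\lambda_1 < \lambda_2$;
(ii) $E(\lambda)u \to E(\lambda_0)u$ as $\lambda_0 < \lambda \to \lambda_0$, $u \in H$;
(iii) $E(\lambda) = 0$ for $\lambda < m$, $E(\lambda) = I$ for $\lambda \ge M$;
(iv) $AE(\lambda) = E(\lambda)A$; and
(v) if $a < m$, $b \ge M$ and p(t) is any polynomial,
then

\[ p(A) = \int_a^b p(\lambda) \, dE(\lambda) \qquad [59] \]

This means the following. Let $a = \lambda_0 < \lambda_1 < \cdots < \lambda_n = b$
be any partition of [a, b], and let $\lambda'_k$ be any
number satisfying $\lambda_{k-1} \le \lambda'_k \le \lambda_k$. Then

\[ \sum_{k=1}^{n} p(\lambda'_k)[E(\lambda_k) - E(\lambda_{k-1})] \to p(A) \qquad [60] \]

in B(H) as $\eta = \max(\lambda_k - \lambda_{k-1}) \to 0$.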
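The sums in [60] can be made concrete for a diagonal matrix, where $E(\lambda)$ is the projection onto the eigenvectors with eigenvalue at most $\lambda$. The sketch below is an illustration under that finite-dimensional assumption. The eigenvalues, the polynomial, the interval [-1.5, 2.5] and the midpoint choice of $\lambda'_k$ are arbitrary illustrative values, and only diagonals are stored.

```python
def E(lam, eigenvalues):
    """Diagonal of the spectral projection E(lam): 1 where eigenvalue <= lam."""
    return [1.0 if mu <= lam else 0.0 for mu in eigenvalues]

def stieltjes_sum(p, eigenvalues, a, b, n):
    """Diagonal of sum_k p(lam'_k) [E(lam_k) - E(lam_{k-1})].

    Uses a uniform partition of [a, b] into n pieces, with lam'_k taken
    to be the midpoint of each piece.
    """
    total = [0.0] * len(eigenvalues)
    for k in range(1, n + 1):
        lo = a + (b - a) * (k - 1) / n
        hi = a + (b - a) * k / n
        mid = 0.5 * (lo + hi)
        jump = [e1 - e0 for e1, e0 in zip(E(hi, eigenvalues), E(lo, eigenvalues))]
        total = [t + p(mid) * j for t, j in zip(total, jump)]
    return total

def p(t):
    return t * t + t

eigenvalues = [-1.0, 0.5, 2.0]           # here m = -1 and M = 2
exact = [p(mu) for mu in eigenvalues]    # diagonal of p(A)

errors = []
for n in (4, 40, 400):
    approx = stieltjes_sum(p, eigenvalues, -1.5, 2.5, n)
    errors.append(max(abs(x - y) for x, y in zip(approx, exact)))
print(errors)  # shrinks as the mesh of the partition goes to 0
```

Each eigenvalue falls in exactly one subinterval, so the sum replaces $p(\mu)$ by p evaluated at a nearby point, and the error is controlled by the mesh $\eta$.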


Theorem 29 Let A be a self-adjoint operator on H.
Then there is a family $\{E(\lambda)\}$ of orthogonal projection
operators on H satisfying (i) and (ii) of
Theorem 28 and

(i) $E(\lambda) \to 0$ as $\lambda \to -\infty$ and $E(\lambda) \to I$ as $\lambda \to +\infty$;
(ii) $E(\lambda)A \subset AE(\lambda)$;
(iii) $p(A) = \int_{-\infty}^{\infty} p(\lambda) \, dE(\lambda)$

for any polynomial p(t).

These theorems are known as the spectral
theorems for self-adjoint operators.


Appendix 


Here we include some background material related 
to the text. 

Consider a collection C of elements or “vectors”
with the following properties:

1. They can be added. If f and g are in C, so is f + g.
2. $f + (g + h) = (f + g) + h$, $f, g, h \in C$.
3. There is an element $0 \in C$ such that h + 0 = h
   for all $h \in C$.
4. For each $h \in C$ there is an element $-h \in C$ such
   that $h + (-h) = 0$.
5. $g + h = h + g$, $g, h \in C$.
6. For each real number $\alpha$, $\alpha h \in C$.
7. $\alpha(g + h) = \alpha g + \alpha h$.
8. $(\alpha + \beta)h = \alpha h + \beta h$.
9. $\alpha(\beta h) = (\alpha\beta)h$.
10. To each $h \in C$ there corresponds a real number
    $\|h\|$ with the following properties:
11. $\|\alpha h\| = |\alpha| \, \|h\|$.
12. $\|h\| = 0$ if, and only if, h = 0.
13. $\|g + h\| \le \|g\| + \|h\|$.
14. If $\{h_n\}$ is a sequence of elements of C such
    that $\|h_n - h_m\| \to 0$ as $m, n \to \infty$, then there is
    an element $h \in C$ such that $\|h_n - h\| \to 0$ as
    $n \to \infty$.

A collection of objects which satisfies statements
(1)-(9) and the additional statement

15. 1h = h

is called a vector space or linear space.

A set of objects satisfying statements (1)-(13) is
called a normed vector space, and the number $\|h\|$
is called the norm of h. Although statement (15) is
not implied by statements (1)-(9), it is implied by
statements (1)-(13). A sequence satisfying

\[ \|h_n - h_m\| \to 0 \quad \text{as } m, n \to \infty \]

is called a Cauchy sequence. Property (14) states
that every Cauchy sequence converges in norm to
a limit (i.e., satisfies $\|h_n - h\| \to 0$ as $n \to \infty$).
Property (14) is called completeness, and a normed
vector space satisfying it is called a complete normed
vector space or a Banach space.

We shall write

\[ h_n \to h \quad \text{as } n \to \infty \]

when we mean

\[ \|h_n - h\| \to 0 \quad \text{as } n \to \infty \]

A subset U of a vector space V is called a subspace
of V if $\alpha_1 x_1 + \alpha_2 x_2$ is in U whenever $x_1, x_2$ are in U
and $\alpha_1, \alpha_2$ are scalars.

A subset U of a normed vector space X is called 
closed if for every sequence {x,} of elements in U 
having a limit in X, the limit is actually in U. 

Consider a vector space X having a mapping (f, g)
from pairs of its elements to the reals such that

1. $(\alpha f, g) = \alpha (f, g)$
2. $(f + g, h) = (f, h) + (g, h)$
3. $(f, g) = (g, f)$
4. $(f, f) > 0$ unless f = 0.

Then

\[ (f, g)^2 \le (f, f)(g, g), \quad f, g \in X \qquad [61] \]

An expression (f, g) that assigns a real number to
each pair of elements of a vector space and satisfies
the aforementioned properties is called a scalar
(or inner) product.

If a vector space X has a scalar product (f, g), then
it is a normed vector space with norm $\|f\| = (f, f)^{1/2}$.
A vector space which has a scalar product and is
complete with respect to the induced norm is called
a Hilbert space. Every Hilbert space is a Banach
space, but the converse is not true. Inequality [61] is
known as the Cauchy-Schwarz inequality. $\mathbb{R}^n$ is a
Hilbert space.

Let H be a Hilbert space and let (x, y) denote its
scalar product. If we fix y, then the expression
(x, y) assigns to each $x \in H$ a number. An assignment
F of a number to each element x of a vector
space is called a functional and denoted by F(x).
The scalar product is not the first functional we
have encountered. In any normed vector space, the
norm is also a functional. The functional
F(x) = (x, y) satisfies

\[ F(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 F(x_1) + \alpha_2 F(x_2) \qquad [62] \]

for $\alpha_1, \alpha_2$ scalars. A functional satisfying [62] is
called linear. Another property is

\[ |F(x)| \le M\|x\|, \quad x \in H \qquad [63] \]

which follows immediately from Schwarz's inequality
(cf. [61]). A functional satisfying [63] is called
bounded. The norm of such a functional is defined
to be

\[ \|F\| = \sup_{x \in H, \, x \ne 0} \frac{|F(x)|}{\|x\|} \]

Thus for y fixed, F(x) = (x, y) is a bounded linear
functional in the Hilbert space H. We have

Theorem 30 For every bounded linear functional F
on a Hilbert space H there is a unique element
$y \in H$ such that

\[ F(x) = (x, y) \quad \text{for all } x \in H \qquad [64] \]

Moreover,

\[ \|y\| = \sup_{x \in H, \, x \ne 0} \frac{|F(x)|}{\|x\|} = \|F\| \qquad [65] \]

Theorem 30 is known as the “Riesz representation
theorem.”

For any normed vector space X, let X' denote the
set of bounded linear functionals on X. If $f, g \in X'$,
we say that f = g if

\[ f(x) = g(x) \quad \text{for all } x \in X \]

The “zero” functional is the one assigning zero to all
$x \in X$. We define h = f + g by

\[ h(x) = f(x) + g(x), \quad x \in X \]

and $g = \alpha f$ by

\[ g(x) = \alpha f(x), \quad x \in X \]

Under these definitions, X' becomes a vector space.
We have been employing the expression

\[ \|f\| = \sup_{x \ne 0} \frac{|f(x)|}{\|x\|}, \quad f \in X' \qquad [66] \]

This is easily seen to be a norm. Thus X' is a normed
vector space.
We also have 


Theorem 31 Let M be a subspace of a normed vector
space X, and suppose that f(x) is a bounded linear
functional on M. Set

\[ \|f\| = \sup_{x \in M, \, x \ne 0} \frac{|f(x)|}{\|x\|} \]

Then there is a bounded linear functional F(x) on
the whole of X such that

\[ F(x) = f(x), \quad x \in M \qquad [67] \]

and

\[ \|F\| = \sup_{x \in X, \, x \ne 0} \frac{|F(x)|}{\|x\|} = \|f\| = \sup_{x \in M, \, x \ne 0} \frac{|f(x)|}{\|x\|} \qquad [68] \]

Theorem 31 is known as the “Hahn-Banach theorem.”
If A is a linear operator from X to Y, with
R(A) = Y and N(A) = {0} (i.e., consists only of the
vector 0), we can assign to each $y \in Y$ the unique
solution of

\[ Ax = y \]

This assignment is an operator from Y to X and is
usually denoted by $A^{-1}$ and called the inverse
operator of A. It is linear because of the linearity
of A. One can ask: “when is $A^{-1}$ continuous?” or,
equivalently, “when is it bounded?” A very
important answer to this question is given by

Theorem 32 If X, Y are Banach spaces and A is a
closed linear operator from X to Y with
R(A) = Y, N(A) = {0}, then $A^{-1} \in B(Y, X)$.

This theorem is sometimes referred to as the
“bounded inverse theorem.”
If A is self-adjoint and

\[ (A - \lambda)x = 0, \quad (A - \mu)y = 0 \]

with $\lambda \ne \mu$, then

\[ (x, y) = 0 \]

If A has a compact inverse, its eigenvalues cannot
have limit points. If $A^{-1}$ is compact, then the
eigenelements corresponding to the same eigenvalue
form a finite-dimensional subspace.


See also: Ljusternik-Schnirelman Theory; Quantum
Mechanical Scattering Theory; Regularization for
Dynamical Zeta Functions; Spectral Sequences;
Stochastic Resonance.

Introduction


In loop quantum gravity (LQG) (see Loop Quantum
Gravity), a background-independent formulation of
quantum gravity, the full quantum dynamics is
governed by the following (constraint) operator
equations or quantum Einstein equations:

Gauss law

\[ \hat{G}_i(A, E)|\Psi\rangle := \widehat{D_a E^a_i} \, |\Psi\rangle = 0 \]

Vector constraint

\[ \hat{V}_a(A, E)|\Psi\rangle := \widehat{E^b_i F^i_{ab}(A)} \, |\Psi\rangle = 0 \]

Scalar constraint

\[ \hat{S}(A, E)|\Psi\rangle := \Big[ \widehat{\sqrt{\det E}^{\,-1} E^a_i E^b_j F^{ij}_{ab}(A) + \cdots} \Big] |\Psi\rangle = 0 \qquad [1] \]

where $A^i_a$ is an SU(2) connection ($i = 1, 2, 3$,
$a = 1, 2, 3$), $E^a_i$ is its conjugate momentum (the triad
field), $F^i_{ab}(A)$ is the curvature of $A^i_a$, and $D_a$ is the
covariant derivative (see Canonical General Relativity).
The hat means that the classical phase-space
functions are promoted to operators in a kinematical
Hilbert space $H_{kin}$; the solutions are in the so-called
physical Hilbert space $H_{phys}$. The goal of the spin foam
approach is to construct a mathematically well-defined
notion of path integral for LQG as a device for
computing the solutions of the previous equations.
The space of solutions of the Gauss and vector
constraints [1] is well understood in LQG (see Loop
Quantum Gravity), and is often also called the kinematical
Hilbert space $H_{kin}$. The solutions of the scalar
constraint can be characterized by the definition of
the generalized projection operator P from the
kinematical Hilbert space $H_{kin}$ into the kernel of
the scalar constraint $H_{phys}$. Formally, one can write
P as

\[ P = \text{“} \prod_{x \in \Sigma} \delta(\hat{S}(x)) \text{”} = \int D[N] \exp\left[ i \int_\Sigma N(x) \hat{S}(x) \right] \qquad [2] \]

A formal argument shows that P can also be defined
in a manifestly covariant manner as a regularization
of the formal path integral of general relativity. In
first-order variables, it becomes

\[ P = \int D[e] \, D[A] \, \mu[A, e] \exp[i S_{GR}(e, A)] \qquad [3] \]

where e is the tetrad field, A is the spacetime connection,
and $\mu[A, e]$ denotes the appropriate measure.

In both cases, P characterizes the space of
solutions of the quantum Einstein equations: for
any state $|\phi\rangle \in H_{kin}$, $P|\phi\rangle$ is a
(formal) solution of [1]. Moreover, the matrix
elements of P define the physical inner product
($\langle \, , \rangle_p$), providing the vector space of solutions of
[1] with the Hilbert space structure that defines
$H_{phys}$. Explicitly,

\[ \langle s, s' \rangle_p := \langle Ps, s' \rangle \]

for $s, s' \in H_{kin}$.

When these matrix elements are computed in
the spin network basis (see Figure 1) (see Loop
Quantum Gravity), they can be expressed as a
sum over amplitudes of “spin network histories”:
spin foams (Figure 2). The latter are naturally
given by foam-like combinatorial structures
whose basic elements carry quantum numbers of
geometry (see Loop Quantum Gravity). A spin
foam history, from the state $|s\rangle$ to the state $|s'\rangle$,
is denoted by a pair $(F_{s \to s'}, \{j\})$, where $F_{s \to s'}$ is the
2-complex with boundary given by the graphs of
the spin network states $|s'\rangle$ and $|s\rangle$, respectively,
and $\{j\}$ is the set of spin quantum numbers
labeling its edges (denoted $e \in F_{s \to s'}$) and faces
(denoted $f \in F_{s \to s'}$). Vertices are denoted
$v \in F_{s \to s'}$. The physical inner product can be
expressed as a sum over spin foam amplitudes

\[ \langle s', s \rangle_p = \langle Ps', s \rangle = \sum_{F_{s \to s'}} N(F_{s \to s'}) \sum_{\{j\}} \prod_{f \in F_{s \to s'}} A_f(j_f) \prod_{e \in F_{s \to s'}} A_e(j_e) \prod_{v \in F_{s \to s'}} A_v(j_v) \qquad [4] \]

where $N(F_{s \to s'})$ is a (possible) normalization
factor, and $A_f(j_f)$, $A_e(j_e)$, and $A_v(j_v)$ are the 2-cell
or face amplitude, the edge or 1-cell amplitude,
and the 0-cell or vertex amplitude, respectively.
These local amplitudes depend on the spin quantum
numbers labeling neighboring cells in $F_{s \to s'}$
(e.g., the vertex amplitude of the vertex magnified
in Figure 2 is $A_v(j, k, l, m, n, s)$).

Figure 1 A spin network state is given by a graph embedded
in space whose links and nodes are labeled by unitary
irreducible representations of SU(2). These states form a
complete basis of the kinematical Hilbert space of LQG where
the operator equations [1] are defined.

Figure 2 A spin foam as the “colored” 2-complex representing
the transition between three different spin network states. A
transition vertex is magnified on the right.

The underlying discreteness discovered in LQG
is crucial: in the spin foam representation, the
functional integral for gravity is replaced by a sum
over amplitudes of combinatorial objects given by
foam-like configurations (spin foams) as in [4]. A
spin foam represents a possible history of the
gravitational field and can be interpreted as a set
of transitions through different quantum states of
space. Boundary data in the path integral are given
by the polymer-like excitations (spin network
states, Figure 1) representing 3-geometry states in
LQG.


Spin Foams in 3D Quantum Gravity 


Now we introduce the concept of spin foams in a 
more explicit way in the context of the quantization 
of three-dimensional (3D) Riemannian gravity. Later 
in this section we will present the definition of P 
from the canonical and covariant viewpoint for- 
mally stated in the introduction by eqns [2] and [3], 
respectively. 


The Classical Theory 


Riemannian gravity in 3D is a theory with no local
degrees of freedom, that is, a topological theory (see
Topological Quantum Field Theory: Overview). Its
action (in the first-order formalism) is given by

\[ S(e, \omega) = \int_M \mathrm{tr}(e \wedge F(\omega)) \qquad [5] \]

where $M = \Sigma \times \mathbb{R}$ (for $\Sigma$ an arbitrary Riemann
surface), $\omega$ is an SU(2) connection, and the triad e
is an su(2)-valued 1-form. The gauge symmetries of
the action are the local SU(2) gauge transformations

\[ \delta e = [e, \alpha], \quad \delta\omega = d_\omega \alpha \qquad [6] \]

where $\alpha$ is an su(2)-valued 0-form, and the
“topological” gauge transformation

\[ \delta e = d_\omega \eta, \quad \delta\omega = 0 \qquad [7] \]

where $d_\omega$ denotes the covariant exterior derivative
and $\eta$ is an su(2)-valued 0-form. The first invariance
is manifest from the form of the action, while the
second is a consequence of the Bianchi identity,
$d_\omega F(\omega) = 0$. The gauge symmetries are so large that
all the solutions to the equations of motion are
locally pure gauge. The theory has only global or
topological degrees of freedom.

Upon the standard 2 + 1 decomposition (see Canonical
General Relativity), the phase space in these
variables is parametrized by the pullback to $\Sigma$ of $\omega$ and
e. In local coordinates, one can express them in terms of
the 2D connection $A^i_a$ and the triad field
$E^b_j = \epsilon^{bc} e^k_c \eta_{jk}$, where a = 1, 2 are space coordinate
indices and i, j = 1, 2, 3 are su(2) indices. The symplectic
structure is defined by

\[ \{A^i_a(x), E^b_j(y)\} = \delta^b_a \, \delta^i_j \, \delta^{(2)}(x, y) \qquad [8] \]

Local symmetries of the theory are generated by the
first-class constraints

\[ D_b E^b_j = 0, \quad F^i_{ab}(A) = 0 \qquad [9] \]

which are referred to as the Gauss law and the
curvature constraint, respectively; the quantization
of these is the analog of [1] in 4D. This simple
theory has been quantized in various ways in the
literature; here we will use it to introduce the spin
foam quantization.


Kinematical Hilbert Space 


In analogy with the 4D case, one follows Dirac's
procedure finding first a representation of the basic
variables in an auxiliary or kinematical Hilbert
space $H_{kin}$. The basic states are functionals of the
connection depending on the parallel transport
along paths $\gamma \subset \Sigma$: the so-called holonomy. Given
a connection $A^i_a(x)$ and a path $\gamma$, one defines the
holonomy $h_\gamma[A]$ as the path-ordered exponential

\[ h_\gamma[A] = P \exp \int_\gamma A \qquad [10] \]

The kinematical Hilbert space, $H_{kin}$, corresponds
to the Ashtekar-Lewandowski (AL) representation
of the algebra of functions of holonomies or
generalized connections. This algebra is in fact a
C*-algebra and is denoted Cyl (see Loop Quantum
Gravity). Functionals of the connection act in the
AL representation simply by multiplication. For
example, the holonomy operator acts as follows:

\[ \widehat{h_\gamma[A]} \, \Psi[A] = h_\gamma[A] \Psi[A] \qquad [11] \]

As in 4D, an orthonormal basis of $H_{kin}$ is defined
by the spin network states. Each spin network is
labeled by a graph $\gamma \subset \Sigma$, a set of spins $\{j_\ell\}$ labeling
links $\ell \in \gamma$, and a set of intertwiners $\{\iota_n\}$ labeling
nodes $n \in \gamma$ (Figure 3), namely:

\[ s_{\gamma, \{j_\ell\}, \{\iota_n\}}[A] = \bigotimes_{n \in \gamma} \iota_n \bigotimes_{\ell \in \gamma} \Pi^{j_\ell}(h_\ell[A]) \qquad [12] \]

where $\Pi^j(h_\ell[A])$ is the unitary irreducible representation matrix
of spin j (for a precise definition, see Loop Quantum
Gravity). For simplicity, we will often denote spin
network states $|s\rangle$ omitting the graph and spin labels.

Figure 3 A spin network state in 2 + 1 LQG. The decomposition
of a 4-valent node in terms of basic 3-valent intertwiners is
shown.


Spin Foams from the Hamiltonian Formulation 


The physical Hilbert space, $H_{phys}$, is defined by
those “states” that are annihilated by the constraints.
By construction, spin-network states solve
the Gauss constraint, $\widehat{D_a E^a_i} \, |s\rangle = 0$, as they
are manifestly SU(2) gauge invariant (see Loop
Quantum Gravity). To complete the quantization,
one needs to characterize the space of solutions of
the quantum curvature constraints ($\hat{F}^i_{ab}$), and to
provide it with the physical inner product. The
existence of $H_{phys}$ is granted by the following:

Theorem 1 There exists a normalized positive
linear form P over Cyl, that is, $P(\psi^* \psi) \ge 0$ for $\psi \in \mathrm{Cyl}$
and P(1) = 1, yielding (through the GNS
construction (see Algebraic Approach to Quantum
Field Theory)) the physical Hilbert space $H_{phys}$ and
the physical representation $\pi_P$ of Cyl.


The state P contains a very large Gelfand ideal (set
of zero norm states) $J := \{\alpha \in \mathrm{Cyl} \text{ s.t. } P(\alpha^* \alpha) = 0\}$. In
fact, the physical Hilbert space $H_{phys} := \mathrm{Cyl}/J$ corresponds
to the quantization of finitely many degrees of
freedom. This is expected in 3D gravity as the theory
does not have local excitations (no “gravitons”) (see
Topological Quantum Field Theory: Overview). The
representation $\pi_P$ of Cyl solves the curvature constraint
in the sense that for any functional $f_\gamma[A] \in \mathrm{Cyl}$
defined on the subalgebra of functionals defined on
contractible graphs $\gamma \subset \Sigma$, one has that

\[ \pi_P[f_\gamma] \Psi = f_\gamma[0] \Psi \qquad [13] \]

This equation expresses the fact that “F = 0” in $H_{phys}$
(for flat connections, parallel transport is trivial
around a contractible region). For $s, s' \in H_{kin}$, the
physical inner product is given by

\[ \langle s, s' \rangle_p = P(s^* s') \qquad [14] \]

where the *-operation and the product are defined
in Cyl.

The previous equation admits a “sum over
histories” representation. We shall introduce the
concept of the spin foam representation as an
explicit construction of the positive linear form P
which, as in [2], is formally given by

\[ P = \int D[N] \exp\left( i \int_\Sigma \mathrm{tr}[N \hat{F}(A)] \right) = \text{“} \prod_{x \in \Sigma} \delta[\hat{F}(A)] \text{”} \qquad [15] \]

where $N(x) \in su(2)$. One can make the previous
formal expression a rigorous definition if one introduces
a regularization. Given a partition of $\Sigma$ in terms
of 2D plaquettes of coordinate area $\epsilon^2$, one has that

\[ \int_\Sigma \mathrm{tr}[N F(A)] = \lim_{\epsilon \to 0} \sum_{p^i} \epsilon^2 \, \mathrm{tr}[N_{p^i} F_{p^i}] \qquad [16] \]

where $N_{p^i}$ and $F_{p^i}$ are values of $N^i$ and $\epsilon^{ab} F^i_{ab}[A]$
at some interior point of the plaquette $p^i$ and $\epsilon^{ab}$ is
the Levi-Civita tensor. Similarly, the holonomy
$W_{p^i}[A]$ around the boundary of the plaquette $p^i$
(see Figure 4) is given by

\[ W_{p^i}[A] = 1 + \epsilon^2 F_{p^i}(A) + O(\epsilon^2) \qquad [17] \]

Figure 4 Cellular decomposition of the space manifold
$\Sigma$ (a square lattice in this example), and the infinitesimal
plaquette holonomy $W_p[A]$.


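where $F_{p^i} = \tau_j \epsilon^{ab} F^j_{ab}(x_{p^i})$ ($\tau_j$ are the generators of
su(2) in the fundamental representation). The previous
two equations lead to the following definition:
given $s \in \mathrm{Cyl}$ (think of a spin network state based on a
graph $\gamma$), the linear form P(s) is defined as

\[ P(s) := \lim_{\epsilon \to 0} \Big\langle \Omega \prod_{p^i} \int dN_{p^i} \exp(i \, \mathrm{tr}[N_{p^i} W_{p^i}]), \; s \Big\rangle \qquad [18] \]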
pi 


where $\langle\,,\rangle$ is the inner product in the AL
representation and $|\Omega\rangle$ is the “vacuum” ($1 \in \mathrm{Cyl}$)
in the AL representation. The partition is chosen so
that the links of the underlying graph $\gamma$ border the
plaquettes. One can easily perform the integration
over the $N_{p^i}$ using the identity (Peter-Weyl
theorem)


$$\int dN\, \exp(i\, \mathrm{tr}[N W]) = \sum_j (2j+1)\, \mathrm{tr}[\Pi^j(W)] \qquad [19]$$


Using the previous equation


$$P(s) = \lim_{\epsilon \to 0} \prod_{p^i} \sum_{j(p^i)} (2j(p^i)+1)\, \left\langle \Omega\; \mathrm{tr}[\Pi^{j(p^i)}(W_{p^i})],\; s \right\rangle \qquad [20]$$


where $j(p^i)$ is the spin labeling element of the sum
[19] associated to the ith plaquette. Since the
$\mathrm{tr}[\Pi^j(W)]$ commute, the ordering of plaquette operators
in the previous product does not matter. It can be
shown that the limit $\epsilon \to 0$ exists and one can give a
closed expression of P(s).
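
The sum over spins in [19] is the character expansion of the delta function on SU(2), and it relies on the orthonormality of the characters under the Haar measure. The following is a minimal numerical sketch of that property, not part of the derivation above. It assumes the standard character formula $\chi_j(\theta) = \sin((2j+1)\theta/2)/\sin(\theta/2)$ and the Weyl integration measure $\sin^2(\theta/2)\, d\theta/\pi$ on $[0, 2\pi]$; the function names are illustrative.

```python
import math


def su2_character(j, theta):
    """Character of the spin-j representation at class angle theta (0 < theta < 2*pi)."""
    return math.sin((2 * j + 1) * theta / 2) / math.sin(theta / 2)


def haar_inner_product(j, k, steps=4000):
    """Midpoint-rule estimate of the Haar integral of chi_j * chi_k over SU(2)."""
    h = 2 * math.pi / steps
    total = 0.0
    for n in range(steps):
        theta = (n + 0.5) * h  # midpoints avoid the removable singularity at theta = 0
        weight = math.sin(theta / 2) ** 2 / math.pi  # Weyl integration measure (assumed form)
        total += weight * su2_character(j, theta) * su2_character(k, theta)
    return total * h


def delta_paired_with(k, j_max=5):
    """Pair the truncated sum of (2j+1) chi_j over j <= j_max with chi_k."""
    spins = [n / 2 for n in range(int(2 * j_max) + 1)]
    return sum((2 * j + 1) * haar_inner_product(j, k) for j in spins)
```

For any spin k not larger than the cutoff, `delta_paired_with(k)` should be close to 2k + 1, the value of the character at the identity, which is how a delta function at the identity is expected to act.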

Now in the AL representation (see eqn [11]), each
$\mathrm{tr}[\Pi^{j(p^i)}(W_{p^i})]$ acts by creating a closed loop in the $j_{p^i}$
representation at the boundary of the corresponding
plaquette (Figures 5 and 6).

One can introduce a (nonphysical) time parameter 
that works simply as a coordinate providing the means 
of organizing the series of actions of plaquette loop 
operators in [20]; that is, one assumes that each of the 
loop actions occurs at different “times.” We have 
introduced an auxiliary time slicing (arbitrary para- 
metrization). If one inserts the AL partition of unity 


$$1 = \sum_{\gamma \subset \Sigma,\, \{j\}_\gamma} |\gamma, \{j\}\rangle \langle \gamma, \{j\}| \qquad [21]$$


where the sum is over the complete basis of spin
network states $\{|\gamma, \{j\}\rangle\}$ (based on all graphs $\gamma \subset \Sigma$
and with all possible spin labeling) between each time
slice, one arrives at a sum over spin network histories
representation of P(s). More precisely, P(s) can be
expressed as a sum over amplitudes corresponding to a
series of transitions that can be viewed as the “time
evolution” between the “initial” spin network s and
the “final” “vacuum state” $\Omega$. The physical inner
product between spin networks s and s’ is defined as


$$\langle s, s' \rangle_p := P(s^* s')$$


and can be expressed as a sum over amplitudes
corresponding to transitions interpolating between
the “initial” spin network s’ and the “final” spin
network s (e.g., Figures 7 and 8).


Figure 5 Graphical notation representing the action of one plaquette holonomy on a spin network state. On the right is the result written in terms of the spin network basis. The amplitude $N_{j,m,k}$ can be expressed in terms of Clebsch-Gordan coefficients.


Figure 6 Graphical notation representing the action of one plaquette holonomy on a spin network vertex. The object in brackets ({ }) is a 6j-symbol and $\Delta_j := 2j + 1$.





Figure 7 A set of discrete transitions in the loop-to-loop physical inner product obtained by a series of transitions as in Figure 5. On the right, the continuous spin foam representation in the limit $\epsilon \to 0$.


Figure 8 A set of discrete transitions representing one of the contributing histories at a fixed value of the regulator. On the right, the continuous spin foam representation when the regulator is removed.


Spin network nodes evolve into edges while spin
network links evolve into 2D faces. Edges inherit the
intertwiners associated to the nodes and faces inherit
the spins associated to links. Therefore, the series of
transitions can be represented by a 2-complex whose
1-cells are labeled by intertwiners and whose 2-cells
are labeled by spins. The places where the action of
the plaquette loop operators creates new links
(Figures 6 and 8) define 0-cells or vertices. These
foam-like structures are the so-called spin foams.
The spin foam amplitudes are purely combinatorial 
and can be explicitly computed from the simple 
action of the loop operator in the AL representation 
(see Loop Quantum Gravity). A particularly simple 
case arises when the spin network states s and s’ 
have only 3-valent nodes. Explicitly, 


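$$\langle s, s' \rangle_p = P(s^* s') = \sum_{\{j\}} \prod_{f \subset F_{s \to s'}} (2j_f+1)^{\nu_f/2} \prod_{v \subset F_{s \to s'}} \left\{ \begin{matrix} j_1 & j_2 & j_3 \\ j_4 & j_5 & j_6 \end{matrix} \right\}_v \qquad [22]$$


where the notation is that of [4], and $\nu_f = 0$ if
$f \cap s \neq \emptyset \wedge f \cap s' \neq \emptyset$, $\nu_f = 1$ if $f \cap s \neq \emptyset \vee f \cap s' \neq \emptyset$,
and $\nu_f = 2$ if $f \cap s = \emptyset \wedge f \cap s' = \emptyset$. The tetrahedral
diagram (written here in braces) denotes a 6j-symbol: the amplitude obtained
by means of the natural contraction of the four
intertwiners corresponding to the 1-cells converging
at a vertex. More generally, for arbitrary spin
networks, the vertex amplitude corresponds to 3nj-symbols,
and $\langle s, s' \rangle_p$ takes the general form [4].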


Spin Foams from the Covariant Path Integral 


In this section we re-derive the spin foam represen- 
tation of the physical scalar product of 2+1 
(Riemannian) quantum gravity directly as a regular- 
ization of the covariant path integral. The formal 
path integral for 3D gravity can be written as 


$$P = \int D[e]\, D[A]\, \exp\left[i \int_M \mathrm{tr}[e \wedge F(A)]\right] \qquad [23]$$


Assume $M = \Sigma \times I$, where $I \subset \mathbb{R}$ is a closed (time)
interval (for simplicity, we ignore boundary
terms).

In order to give a meaning to the formal
expression above, one replaces the 3D manifold
(with boundary) M with an arbitrary cellular
decomposition $\Delta$. One also needs the notion of the
associated dual 2-complex of $\Delta$ denoted by $\Delta^*$. The
dual 2-complex $\Delta^*$ is a combinatorial object defined
by a set of vertices $v \subset \Delta^*$ (dual to 3-cells in $\Delta$),
edges $e \subset \Delta^*$ (dual to 2-cells in $\Delta$), and faces $f \subset \Delta^*$
(dual to 1-cells in $\Delta$). The intersection of the dual
2-complex $\Delta^*$ with the boundaries defines two
graphs $\gamma_1, \gamma_2 \subset \Sigma$ (see Figure 9). For simplicity, we
ignore the boundaries until the end of this section.
The fields e and A are discretized as follows. The
su(2)-valued 1-form field e is represented by the
assignment of $e_f \in su(2)$ to each 1-cell in $\Delta$. We
use the fact that faces in $\Delta^*$ are in one-to-one
correspondence with 1-cells in $\Delta$ and label $e_f$ with a
face subindex (Figure 9). The connection field A is
represented by the assignment of group elements
$g_e \in SU(2)$ to each edge $e \subset \Delta^*$ (see Figure 10).


Figure 9 The cellular decomposition of $M = \Sigma \times I$ ($\Sigma = T^2$ in this example). The illustration shows part of the induced graph on the boundary and the detail of a tetrahedron in $\Delta$ and a face $f \subset \Delta^*$ in the bulk.

With all this, [23] becomes the regularized version
$P_\Delta$ defined as


$$P_\Delta = \int \prod_{f \subset \Delta^*} de_f \prod_{e \subset \Delta^*} dg_e\, \exp\left(i\, \mathrm{tr}[e_f W_f]\right) \qquad [24]$$


where $de_f$ is the regular Lebesgue measure on $\mathbb{R}^3$,
$dg_e$ is the Haar measure on SU(2), and $W_f$ denotes
the holonomy around (spacetime) faces, that is,
$W_f = g_e^1 \cdots g_e^N$ for N being the number of edges
bounding the corresponding face (see Figure 10).
The discretization procedure is reminiscent of the
one used in standard lattice gauge theory (see Lattice
Gauge Theory). The previous definition can be
motivated by an analysis equivalent to the one
presented in [16].


Figure 10 A (2-cell) face $f \subset \Delta^*$ in a cellular decomposition of the spacetime manifold M and the corresponding dual 1-cell. The connection field is discretized by the assignment of the parallel transport group elements $g_e^i \in SU(2)$ to edges $e \subset \Delta^*$ ($i = 1, \ldots, 5$ in the face shown here).

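Integrating over $e_f$, and using [19], one obtains


$$P_\Delta = \sum_{\{j\}} \int \prod_{e \subset \Delta^*} dg_e \prod_{f \subset \Delta^*} (2j_f+1)\, \mathrm{tr}\left[\Pi^{j_f}(g_e^1 \cdots g_e^N)\right] \qquad [25]$$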





Now it remains to integrate over the lattice connection
$\{g_e\}$. If an edge $e \subset \Delta^*$ bounds n faces $f \subset \Delta^*$
there will be n traces of the form $\mathrm{tr}[\Pi^{j_f}(\cdots g_e \cdots)]$ in
[25] containing $g_e$ in the argument. In order to
integrate over $g_e$ we can use the following identity:


$$P^n_{inv} := \int dg\, \Pi^{j_1}(g) \otimes \Pi^{j_2}(g) \otimes \cdots \otimes \Pi^{j_n}(g) = \sum_\iota C^{\iota}_{j_1 j_2 \cdots j_n}\, C^{*\iota}_{j_1 j_2 \cdots j_n} \qquad [26]$$


where $P^n_{inv}$ is the projector from the tensor product of
irreducible representations $\mathcal{H}_{j_1 \cdots j_n} = j_1 \otimes j_2 \otimes \cdots \otimes j_n$
onto the invariant component $\mathcal{H}^0_{j_1 \cdots j_n} = \mathrm{Inv}[j_1 \otimes j_2 \otimes \cdots \otimes j_n]$.
On the right-hand side, we have chosen an
orthonormal basis of invariant vectors (intertwiners)
in $\mathcal{H}_{j_1 \cdots j_n}$ to express the projector. Notice that the
assignment of intertwiners to edges is a consequence
of the integration over the connection. Using [26]
one can write $P_\Delta$ in the general spin foam
representation form [4]


$$P_\Delta = \sum_{\{j\}} \sum_{\{\iota\}} \prod_{f \subset \Delta^*} (2j_f+1) \prod_{v \subset \Delta^*} A_v(\iota_v, j_v) \qquad [27]$$


where $A_v(\iota_v, j_v)$ is given by the appropriate trace of
the intertwiners corresponding to the edges bounded
by the vertex. As in the previous section, this
amplitude is given in general by an SU(2) 3Nj-symbol.
When $\Delta$ is a simplicial complex, all the
edges in $\Delta^*$ are 3-valent and vertices are 4-valent.
Consequently, the vertex amplitude is given by the
contraction of the corresponding four 3-valent
intertwiners, that is, a 6j-symbol. In that case, the
path integral takes the (Ponzano-Regge) form


$$P_\Delta = \sum_{\{j\}} \prod_{f \subset \Delta^*} (2j_f+1) \prod_{v \subset \Delta^*} \left\{ \begin{matrix} j_1 & j_2 & j_3 \\ j_4 & j_5 & j_6 \end{matrix} \right\}_v \qquad [28]$$
The labeling of faces that intersect the boundary
naturally induces a labeling of the edges of the
graphs $\gamma_1$ and $\gamma_2$ induced by the discretization.
Thus, the boundary states are given by spin network
states on $\gamma_1$ and $\gamma_2$, respectively. A careful analysis
of the boundary contribution shows that only the
face amplitude is modified to $(\Delta_{j_\ell})^{\nu_f/2}$ and that the
spin foam amplitudes are as in eqn [22].
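
Since the vertex amplitude for a simplicial decomposition is a 6j-symbol, it may help to see how one can be evaluated numerically. The sketch below uses the standard Racah single-sum formula, which is not given in this article and is brought in here as an assumption. The function names are illustrative, and spins are expected to be integers or half-integers (for example `Fraction(1, 2)`).

```python
from fractions import Fraction
from math import factorial, sqrt


def triangle_sq(a, b, c):
    """Squared triangle coefficient Delta(a, b, c)^2, or None if the triad is not admissible."""
    x, y, z = a + b - c, a - b + c, -a + b + c
    if min(x, y, z) < 0 or (a + b + c).denominator != 1:
        return None
    return Fraction(
        factorial(int(x)) * factorial(int(y)) * factorial(int(z)),
        factorial(int(a + b + c) + 1),
    )


def six_j(j1, j2, j3, j4, j5, j6):
    """Wigner 6j-symbol {j1 j2 j3; j4 j5 j6} via the Racah formula (returns a float)."""
    a, b, c, d, e, f = (Fraction(x) for x in (j1, j2, j3, j4, j5, j6))
    prefactor_sq = Fraction(1)
    # The four triads that must each satisfy the triangle conditions.
    for triad in ((a, b, c), (a, e, f), (d, b, f), (d, e, c)):
        tri = triangle_sq(*triad)
        if tri is None:
            return 0.0
        prefactor_sq *= tri
    lower = [a + b + c, a + e + f, d + b + f, d + e + c]
    upper = [a + b + d + e, b + c + e + f, a + c + d + f]
    total = Fraction(0)
    for t in range(int(max(lower)), int(min(upper)) + 1):
        denom = 1
        for s in lower:
            denom *= factorial(t - int(s))
        for s in upper:
            denom *= factorial(int(s) - t)
        total += Fraction((-1) ** t * factorial(t + 1), denom)
    return float(total) * sqrt(prefactor_sq)
```

With all six spins equal to 1 the sum has two terms and evaluates to 1/6. A state sum of the form [28] would multiply one such factor per vertex, with the spins taken from the faces meeting there.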

A crucial property of the path integral in 3D
gravity (and of the transition amplitudes in general)
is that it does not depend on the discretization $\Delta$;
this is due to the absence of local degrees of freedom
in 3D gravity and is not expected to hold in 4D. Given
two different cellular decompositions $\Delta$ and $\Delta'$,
one has


$$\tau^{-n_0} P_\Delta = \tau^{-n_0'} P_{\Delta'} \qquad [29]$$


where $n_0$ is the number of 0-simplexes in $\Delta$ (and $n_0'$ the number in $\Delta'$), and
$\tau = \sum_j (2j+1)^2$. As $\tau$ is given by a divergent sum,
the discretization independence statement is formal.
Moreover, the sum over spins in [28] is typically 
divergent. Divergences occur due to infinite gauge- 
volume factors in the path integral corresponding to 
the topological gauge freedom [7]. Freidel and 
Louapre have shown how these divergences can be 
avoided by gauge-fixing unphysical degrees of free- 
dom in [24]. In the case of 3D gravity with positive 
cosmological constant, the state sum generalizes to 
the Turaev-Viro invariant (see Topological Quantum
Field Theory: Overview) defined in terms of the
quantum group $SU_q(2)$ with $q^n = 1$, where the
representations are finitely many and thus $\tau < \infty$.
Equation [29] is a rigorous statement in that case.
No such infrared divergences appear in the canoni- 
cal treatment of the previous section. 


Spin Foams in 4D 
Spin Foam from the Canonical Formulation 


There is no rigorous construction of the physical 
inner product of LQG in 4D. The spin foam 
representation as a device for its definition has 
been introduced formally by Rovelli. In 4D LQG, 
difficulties in understanding dynamics are centered
around the quantum scalar constraint
$\hat{S} = \sqrt{\det E}^{\,-1} E^a_i E^b_j F^{ij}_{ab}(A) + \cdots$ (see [1]); the vector
constraint $\hat{V}_a(A, E)$ is solved in a simple manner
(see Loop Quantum Gravity). The physical inner
product formally becomes


$$\langle s, s' \rangle_p = \langle P s, s' \rangle_{diff} = \int D[N] \left\langle \exp\left[i \int_\Sigma N(x) \hat{S}(x)\right] s, s' \right\rangle_{diff} = \int D[N] \sum_{n=0}^{\infty} \frac{i^n}{n!} \left\langle \left[\int_\Sigma N(x) \hat{S}(x)\right]^n s, s' \right\rangle_{diff} \qquad [30]$$

where $\langle\,,\rangle_{diff}$ denotes the inner product in the
Hilbert space of solutions of the vector constraint,
and the exponential has been expanded in powers in
the last expression on the right-hand side.


Figure 11 The action of the scalar constraint and its spin foam representation. $N(x_n)$ is the value of N at the node and $S_{nop}$ are the matrix elements of $\hat{S}$.

From early on, it was realized that smooth loop
states are naturally annihilated by $\hat{S}$ (independently
of any quantization ambiguity). Consequently,
$\hat{S}$ acts only on spin network nodes.
Generically, it does so by creating new links and
nodes modifying the underlying graph of the spin
network states (Figure 11).

Therefore, each term in the sum [30] represents a
series of transitions (given by the local action of $\hat{S}$
at spin network nodes) through different spin
network states interpolating the boundary states s
and s’, respectively. The action of $\hat{S}$ can be
visualized as an “interaction vertex” in the “time”
evolution of the node (Figure 11). As in the explicit
3D case, eqn [30] can be expressed as a sum over
“histories” of spin networks pictured as a system of
branching surfaces described by a 2-complex whose
elements inherit the representation labels on the
intermediate states. The value of the “transition”
amplitudes is controlled by the matrix elements of
$\hat{S}$. Therefore, although the qualitative picture is
independent of quantization ambiguities, transition
amplitudes are sensitive to them.

Before even considering the issue of convergence 
of [30], the problem with this definition is evident: 
every single term in the sum is a divergent integral! 
Therefore, this way of presenting spin foams has to 
be considered as formal until a well-defined regular- 
ization of [2] is provided. That is the goal of the spin 
foam approach. 

Instead of dealing with an infinite number of
constraints, Thiemann recently proposed to impose
one single master constraint defined as


$$M = \int_\Sigma dx^3\, \frac{S^2(x) - q^{ab} V_a(x) V_b(x)}{\sqrt{\det q(x)}} \qquad [31]$$


Using techniques developed by Thiemann, this
constraint can indeed be promoted to a quantum
operator acting on $\mathcal{H}_{kin}$. The physical inner product
is given by


$$\langle s, s' \rangle_p = \lim_{T \to \infty} \left\langle s, \int_{-T}^{T} dt\, e^{it\hat{M}} s' \right\rangle \qquad [32]$$

A spin foam representation of the previous expression
could now be achieved by the standard
skeletonization that leads to the path-integral representation
in quantum mechanics. In this context,
one splits the t-parameter in discrete steps and
writes


$$e^{it\hat{M}} = \lim_{N \to \infty} \left[e^{it\hat{M}/N}\right]^N = \lim_{N \to \infty} \left[1 + it\hat{M}/N\right]^N \qquad [33]$$

The spin foam representation follows from the fact 
that the action of the basic operator $1 + it\hat{M}/N$ on a
spin network can be written as a linear combination 
of new spin networks whose graphs and labels have 
been modified by the creation of new nodes (in a 
way qualitatively analogous to the local action 
shown in Figure 11). An explicit derivation of the 
physical inner product of 4D LQG along these lines 
is under current investigation. 
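
The limit in [33] can be illustrated in a finite-dimensional toy setting. The sketch below replaces the master constraint operator by the 2x2 Pauli matrix $\sigma_x$, which is purely an assumption made for illustration and has nothing to do with the actual operator on the kinematical Hilbert space. It compares the N-fold product of $1 + it\sigma_x/N$ with the exact exponential $\cos t + i \sin t\, \sigma_x$.

```python
import math


def mat_mul(a, b):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def skeletonized_exponential(t, n_steps):
    """Return (1 + i t M / N)^N for the toy choice M = sigma_x."""
    step = [[1.0, 1j * t / n_steps], [1j * t / n_steps, 1.0]]
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n_steps):
        result = mat_mul(result, step)
    return result


def exact_exponential(t):
    """Closed form of exp(i t sigma_x) = cos(t) + i sin(t) sigma_x."""
    return [[math.cos(t), 1j * math.sin(t)], [1j * math.sin(t), math.cos(t)]]


def max_error(t, n_steps):
    """Largest entrywise deviation between the product and the exact exponential."""
    approx = skeletonized_exponential(t, n_steps)
    exact = exact_exponential(t)
    return max(abs(approx[i][j] - exact[i][j]) for i in range(2) for j in range(2))
```

Calling `max_error(1.0, n)` for increasing n shows the deviation shrinking roughly like 1/n. In the spin foam setting each factor would instead act on spin networks and create new nodes.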


Spin Foams from the Covariant Formulation 


In 4D, the spin foam representation of the dynamics 
of LQG has been investigated more intensively in 
the covariant formulation. This has led to a series of 
constructions which are referred to as spin foam 
models. These treatments are related more closely to 
the construction based on the covariant path- 
integral approach of the last section. Here we 
illustrate the formulation which has captured much 
interest in the literature: the Barrett-Crane (BC) 
model. 


Spin foam models for gravity as constrained quantum BF theory

The BC model is one of the most
extensively studied spin foam models for quantum
gravity. To introduce the main ideas involved, we
concentrate on the definition of the model in the
Riemannian sector. The BC model can be formally
viewed as a spin foam quantization of SO(4)
Plebanski’s formulation of general relativity. Plebanski’s
Riemannian action depends on an SO(4)
connection A, a Lie-algebra-valued 2-form B, and
Lagrange multiplier fields $\lambda$ and $\mu$. Writing explicitly
the Lie algebra indices, the action is given by


$$S[B, A, \lambda, \mu] = \int \left[ B^{IJ} \wedge F_{IJ}(A) + \lambda_{IJKL}\, B^{IJ} \wedge B^{KL} + \mu\, \epsilon^{IJKL} \lambda_{IJKL} \right] \qquad [34]$$


where $\mu$ is a 4-form and $\lambda_{IJKL} = -\lambda_{JIKL} = -\lambda_{IJLK} = \lambda_{KLIJ}$
is a tensor in the internal space.
Variation with respect to $\mu$ imposes the constraint
$\epsilon^{IJKL} \lambda_{IJKL} = 0$ on $\lambda_{IJKL}$. The Lagrange multiplier
tensor $\lambda_{IJKL}$ has then 20 independent components.
Variation with respect to $\lambda$ imposes 20 algebraic
equations on the 36 components of B. The (non-degenerate)
solutions to the equations obtained by
varying the multipliers $\lambda$ and $\mu$ are


$$B^{IJ} = \pm\, \epsilon^{IJKL} e_K \wedge e_L \quad \text{and} \quad B^{IJ} = \pm\, e^I \wedge e^J \qquad [35]$$


in terms of the 16 remaining degrees of freedom of
the tetrad field $e^I_a$. If one substitutes the first solution
into the original action, one obtains Palatini’s 
formulation of general relativity; therefore, on shell 
(and on the right sector), the action is that of 
classical gravity. 
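
The component counting quoted above can be checked by hand. The tally below is a sketch that uses only the index symmetries stated for $\lambda_{IJKL}$; treating each antisymmetric index pair as a single six-valued index is a bookkeeping device introduced here.

```latex
\begin{align*}
\#\{[IJ]\} &= \binom{4}{2} = 6, \\
\#\lambda_{[IJ][KL]} \ (\text{symmetric } 6 \times 6 \text{ matrix}) &= \tfrac{6 \cdot 7}{2} = 21, \\
\#\lambda \ \text{after imposing } \epsilon^{IJKL}\lambda_{IJKL} = 0 &= 21 - 1 = 20, \\
\#B^{IJ}_{ab} &= 6 \times 6 = 36, \\
36 - 20 &= 16 = 4 \times 4 = \#e^I_a .
\end{align*}
```

The 20 multiplier components thus remove 20 of the 36 components of B, leaving exactly the number of components of a tetrad.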

The key idea in the definition of the model is that 
the path integral for the theory corresponding to the 
action S[B, A, 0,0], namely 


$$P_{topo} = \int D[B]\, D[A]\, \exp\left[i \int B^{IJ} \wedge F_{IJ}(A)\right] \qquad [36]$$


can be given a meaning as a spin foam sum, [4], in 
terms of a simple generalization of the construction 
of the previous section. In fact, S[B,A,0,0] corre- 
sponds to a simple theory known as BF theory that 
is formally very similar to 3D gravity (see BF 
Theories). The result is independent of the chosen 
discretization because BF theory does not have local 
degrees of freedom (just as 3D gravity). 

The BC model aims at providing a definition of 
the path integral of gravity pursuing a well-posed 
definition of the formal expression 


$$P_{BC} = \int D[B]\, D[A]\, \delta\left[B \to \epsilon^{IJKL} e_K \wedge e_L\right] \exp\left[i \int B^{IJ} \wedge F_{IJ}(A)\right] \qquad [37]$$


where $D[B]D[A]\,\delta(B \to \epsilon^{IJKL} e_K \wedge e_L)$ means that one
must restrict the sum in [36] to those configurations
of the topological theory satisfying the constraints
$B = *(e \wedge e)$ for some tetrad e. The remarkable fact
is that this restriction can be implemented in a
systematic way directly on the spin foam configurations
that define $P_{topo}$.

In $P_{topo}$ spin foams are labeled with spins corresponding
to the unitary irreducible representations of
SO(4) (given by two spin quantum numbers $(j_R, j_L)$).
Essentially, the factor “$\delta(B \to \epsilon^{IJKL} e_K \wedge e_L)$” restricts
the set of spin foam quantum numbers to the so-called
simple representations (for which $j_R = j_L = j$).
This is the “quantum” version of the solution to the
constraints [35]. There are various versions of this
model. The simplest definition of the transition
amplitudes in the BC model is given by


$$P_{BC}(s^* s') = \sum_{\{j\}} \prod_{f \subset F_{s \to s'}} (2j_f+1)^{\nu_f} \prod_{v \subset F_{s \to s'}} \sum_{\iota_1 \cdots \iota_5} \left[\text{pair of 15j-symbol graphs at } v\right] \qquad [38]$$

















where we use the notation of [22], the graphs denote
15j-symbols, and $\iota_i$ are half-integers labeling SU(2)
normalized 4-intertwiners. No rigorous connection
with the Hilbert space picture of LQG has yet been 
established. The self-dual version of Plebanski’s 
action leads, through a similar construction, to 
Reisenberger’s model. 

The simplest amplitude in the BC model corresponds
to a single 4-simplex, which can be viewed
as the simplest triangulation of the 4D spacetime
given by the interior of a 3-sphere (the corresponding
2-complex is shown in Figure 12). States of the
4-simplex are labeled by ten spins j (labeling the ten
edges of the boundary spin network, see Figure 12)
which can be shown to be related to the area in
Planck units of the ten triangular faces that form the
4-simplex. A first indication of the connection of the
model with gravity was that the large-j asymptotics
appeared to be dominated by the exponential of the
Regge action (the action derived by Regge as a
discretization of general relativity). This estimate
was done using the stationary-phase approximation
to the integral that gives the amplitude of a
4-simplex in the BC model. However, more detailed
calculations showed that the amplitude is dominated
by configurations corresponding to degenerate
4-simplexes. This seems to invalidate a simple
connection to general relativity and is one of the
main puzzles in the model.


Figure 12 The dual of a 4-simplex.


Spin Foams as Feynman Diagrams 


The main problem with the models of the previous
section is that they are defined on a discretization $\Delta$
of M and that, contrary to what happens with a
topological theory, for example, 3D gravity
(eqn [29]), the amplitudes depend on the discretization
$\Delta$. Various possibilities to eliminate this regulator
have been discussed in the literature but no
explicit results are yet known in 4D. An interesting
proposal is a discretization-independent definition of
spin foam models achieved by the introduction of an
auxiliary field theory living on an abstract group
manifold: $\mathrm{Spin}(4)^4$ and $SL(2,C)^4$ for Riemannian
and Lorentzian gravity, respectively. The action of
the auxiliary group field theory (GFT) takes the form


$$S[\phi] = \int_{G^4} \phi^2 + \frac{\lambda}{5!} \int M^{(5)}[\phi] \qquad [39]$$

where $M^{(5)}[\phi]$ is a fifth-order monomial, and
G is the corresponding group. In the simplest
model, $M^{(5)}[\phi] = \phi(g_1, g_2, g_3, g_4)\, \phi(g_4, g_5, g_6, g_7)\, \phi(g_7, g_3, g_8, g_9)\, \phi(g_9, g_6, g_2, g_{10})\, \phi(g_{10}, g_8, g_5, g_1)$. The
field $\phi$ is required to be invariant under the
(simultaneous) right action of the group on its
four arguments in addition to other symmetries
(not described here for simplicity). The perturbative
expansion in $\lambda$ of the GFT Euclidean path
integral is given by


$$P = \int D[\phi]\, e^{-S[\phi]} = \sum_{F_N} \frac{\lambda^N}{\mathrm{sym}[F_N]}\, A[F_N] \qquad [40]$$


where $A[F_N]$ corresponds to a sum of Feynman-
diagram amplitudes for diagrams with N interaction
vertices, and $\mathrm{sym}[F_N]$ denotes the standard symmetry
factor. A remarkable property of this expansion
is that $A[F_N]$ can be expressed as a sum over spin
foam amplitudes, that is, 2-complexes labeled by
unitary irreducible representations of G. Moreover,
for very simple interaction $M^{(5)}[\phi]$, the spin foam
amplitudes are in one-to-one correspondence to
those found in the models of the previous section 
(e.g., the BC model). This duality is regarded as a 
way of providing a fully combinatorial definition of 
quantum gravity where no reference to any dis- 
cretization or even a manifold structure is made. 
Transition amplitudes between spin network states 
correspond to n-point functions of the field theory. 
These models have been inspired by generalizations 
of matrix models applied to BF theory. 

Divergent transition amplitudes can arise by the 
contribution of “loop” diagrams as in standard 
quantum field theory. In spin foams, diagrams 
corresponding to 2D bubbles are potentially divergent 
because spin labels can be arbitrarily high leading to 
unbounded sums in [4]. Such divergences do not occur 
in certain field theories dual (in the sense above) to the 
BC model. However, little is known about the 
convergence of the series in $\lambda$ and the physical meaning
of this constant. Nevertheless, Freidel and Louapre 
have shown that the series can be re-summed in certain 
models dual to lower-dimensional theories. 


Causal Spin Foams 


Let us conclude by presenting a fundamentally 
different construction leading to spin foams. Using 
the kinematical setting of LQG with the assumption 
of the existence of a microlocal (in the sense of 
Planck scale) causal structure, Markopoulou and 
Smolin define a general class of (causal) spin foam 
models for gravity. The elementary transition amplitude
$A_{s_I \to s_{I+1}}$ from an initial spin network $s_I$ to
another spin network $s_{I+1}$ is defined by a set of
simple combinatorial rules based on a definition of
causal propagation of the information at nodes. The
rules and amplitudes have to satisfy certain causal
restrictions (motivated by the standard concepts
in classical Lorentzian physics). These rules generate
surface-like excitations of the same kind one
encounters in the previous formulations. Spin foams
$F^N_{s_i \to s_f}$ are labeled by the number of times, N, these
elementary transitions take place. Transition
amplitudes are defined as


$$\langle s_i, s_f \rangle = \sum_N A(F^N_{s_i \to s_f}) \qquad [41]$$


which is of the generic form [4]. The models are not
related to any continuum action. The only guiding 
principles in the construction are the restrictions 
imposed by causality, and the requirement of the 
existence of a nontrivial critical behavior that 
reproduces general relativity at large scales. Some 
indirect evidence of a possible nontrivial continuum 
limit has been obtained in certain versions of these 
models in 1 + 1 dimensions. 


See also: Algebraic Approach to Quantum Field Theory; 
BF Theories; Canonical General Relativity; Chern-Simons
Models: Rigorous Results; Lattice Gauge
Theory; Loop Quantum Gravity; Quantum Dynamics in 
Loop Quantum Gravity; Quantum Geometry and its 
Applications; Topological Quantum Field Theory: 
Overview. 

Introduction


From a physical point of view, spin glasses, as dilute 
magnetic alloys, are very interesting systems. They 
are characterized by such features as exhibiting a new 
magnetic phase, where magnetic moments are frozen 
into disordered equilibrium orientations, without any 
long-range order. See, for example, Young (1987) for 
general reviews, and also Stein (1989) for a very 
readable account about the physical properties of 
spin glasses. The experimental laboratory study of 
spin glasses is a very difficult subject, because of their 
peculiar properties. In particular, the existence of 
very slowly relaxing modes, with consequent memory 
effects, makes it difficult to realize the very basic 
physical concept of a system at thermodynamical 
equilibrium, at a given temperature. 

From a theoretical point of view some models 
have been proposed, which try to capture the 
essential physical features of spin glasses, in the 
frame of very simple assumptions. 

The basic model was proposed by Edwards
and Anderson (1975). It is a simple
extension of the well-known nearest-neighbor Ising
model. On a large region $\Lambda$ of the unit lattice in d
dimensions, we associate an Ising spin $\sigma(n)$ to each
lattice site n, and then we introduce a lattice
Hamiltonian


$$H_\Lambda(\sigma, J) = -\sum_{(n, n')} J(n, n')\, \sigma(n)\, \sigma(n') \qquad [1]$$


Here, the sum runs over all couples of nearest-
neighbor sites in $\Lambda$, and J are quenched random
couplings, assumed for simplicity to be independent
identically distributed random variables, with centered
unit Gaussian distribution. The quenched
character of the J means that they do not contribute
to thermodynamic equilibrium, but act as a kind of
random external noise on the coupling of the $\sigma$
variables. In the expression of the Hamiltonian, we
have indicated with $\sigma$ the set of all $\sigma(n)$, and with J
the set of all $J(n, n')$. The region $\Lambda$ must be taken
very large, by letting it invade the whole lattice in the limit.
The physical motivation for this choice is that for 
real spin glasses the interaction between the spins 
dissolved in the matrix of the alloy oscillates in sign 
according to distance. This effect is taken into 
account in the model through the random character 
of the couplings between spins. 
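
To make the definition [1] concrete, here is a small sketch that evaluates the Edwards-Anderson energy on a finite square lattice. The choices made here (two dimensions, free boundary conditions, the helper names, and a seeded generator for the couplings) are assumptions for illustration only and are not taken from the text.

```python
import random


def nearest_neighbor_bonds(side):
    """Nearest-neighbor bonds of a side x side square lattice with free boundaries."""
    bonds = []
    for x in range(side):
        for y in range(side):
            if x + 1 < side:
                bonds.append(((x, y), (x + 1, y)))
            if y + 1 < side:
                bonds.append(((x, y), (x, y + 1)))
    return bonds


def sample_couplings(bonds, rng):
    """Independent centered unit Gaussian couplings J(n, n'), one per bond."""
    return {bond: rng.gauss(0.0, 1.0) for bond in bonds}


def energy(spins, couplings):
    """H(sigma, J) = - sum over bonds of J(n, n') sigma(n) sigma(n')."""
    return -sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())


def random_spins(side, rng):
    """Assign an Ising spin +1 or -1 to every lattice site."""
    return {(x, y): rng.choice((-1, 1)) for x in range(side) for y in range(side)}
```

The couplings are drawn once and then held fixed while the spins vary, which is the quenched character described above. Flipping every spin leaves the energy unchanged, since each term contains a product of two spins.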

Even though very drastic simplifications have 
been introduced in the formulation of this model, 
as compared to the extremely complicated nature 
of physical spin glasses, nevertheless a rigorous 
study of all properties emerging from the static 
and dynamic behavior of a thermodynamic system 
of this kind is far from being complete. In particular, 
with reference to static equilibrium properties, it 
is not yet possible to reach a completely substan- 
tiated description of the phases emerging in the 
low-temperature region. Even physical intuition 
gives completely different guesses for different 
people. 

In the same way as a mean-field version can be 
associated to the ordinary Ising model, so it is possible 
for the disordered model described by [1]. Now we 
consider a number of sites $i = 1, 2, \ldots, N$, and let each
spin $\sigma(i)$ at site i interact with all other spins, with the
intervention of a quenched noise $J_{ij}$. The precise form
of the Hamiltonian will be given in the following. 

This is the mean-field model for spin glasses,
introduced by Sherrington and Kirkpatrick (1975).
It is a celebrated model. Numerous articles have
been devoted to its study over the years, appearing
in the theoretical physics literature.

The relevance of the model stems surely from the 
fact that it is intended to represent some important 
features of the physical spin glass systems, of great 
interest for their peculiar properties, at least at the 
level of the mean-field approximation. 

But another important source of interest is 
connected with the fact that disordered systems of
the Sherrington-Kirkpatrick type, and their generalizations,
seem to play a very important role for
theoretical and practical assessments about hard 
optimization problems, as it is shown, for example, 
by Mézard et al. (2002). 

It is interesting to remark that the original paper 
was entitled “Solvable model of a spin-glass,” while 
a previous draft, as told by David Sherrington, 
contained the even stronger designation “Exactly 
solvable.” However, it turned out that the very 
natural solution devised by the authors is valid only 
at high temperatures, or for large external magnetic 
fields. At low temperatures, the proposed solution 
exhibits a nonphysical drawback given by a negative 
entropy, as properly recognized by the authors in 
their very first paper. 

It took some years to find an acceptable solution. 
This was done by Giorgio Parisi in a series of 
papers, marking a radical departure from the 
previous methods. In fact, a very intense method of 
“spontaneous replica symmetry breaking” was 
developed. As a consequence, the physical content 
of the theory was encoded in a functional order 
parameter of new type, and a remarkable structure 
emerged for the pure states of the theory, a kind of 
hierarchical, ultrametric organization. These very 
interesting developments, due to Parisi, and his 
coworkers, are explained in a brilliant way in the 
classical book by Mézard et al. (1987). Part of this 
structure will be recalled in the following. 

It is important to remark that the Parisi solution is 
presented in the form of an ingenious and clever 
“ansatz.” Until a few years ago, it was not known
whether this ansatz would give the true solution for 
the model, in the so-called thermodynamic limit, 
when the size of the system becomes infinite, or it 
would be only a very good approximation for the 
true solution. 

The general structures offered by the Parisi solu- 
tion, and their possible generalizations for similar 
models, exhibit an extremely rich and interesting 
mathematical content. Very appropriately, Talagrand 
(2003) has used a strongly suggestive sentence in the 
title to his recent book: “Spin glasses: a challenge for 
mathematicians.” 


As a matter of fact, how to face this challenge is a 
very difficult problem. Here we would like to recall 
the main features of a very powerful method, yet 
extremely simple in its very essence, based on a 
comparison and interpolation argument on sets of 
Gaussian random variables. 

The method found its first simple application in 
Guerra (2001), where it was shown that the 
Sherrington-Kirkpatrick replica symmetric approximate
solution was a rigorous lower bound for the
quenched free energy of the system, uniformly in 
the size. Then, it was possible to reach a long- 
awaited result (Guerra and Toninelli 2002): the 
convergence of the free energy density in the 
thermodynamic limit, by an intermediate step 
where the quenched free energy was shown to be 
subadditive in the size of the system. 

Moreover, still by interpolation on families of 
Gaussian random variables, the first mentioned result 
was extended to give a rigorous proof that the 
expression given by the Parisi ansatz is also a lower 
bound for the quenched free energy of the system, 
uniformly in the size (Guerra 2003). The method gives 
not only the bound, but also the explicit form of the 
correction in a complex form. As a recent and very 
important result, along the task of facing the challenge, 
Michel Talagrand has been able to dominate these 
correction terms, showing that they vanish in the 
thermodynamic limit. This milestone achievement was 
first announced in a short note, containing only a 
synthetic sketch of the proof, and then presented with 
all details in a long paper (Talagrand 2006). 

The interpolation method is also at the basis of 
the far-reaching generalized variational principle 
proved by Aizenman et al. (2003). 

In our presentation, we will try to be as self- 
contained as possible. We will give all definitions, 
explain the basic structure of the interpolation 
method, and show how some of the results are 
obtained. We will concentrate mostly on questions 
connected with the free energy, its properties of 
subadditivity, the existence of the infinite-volume 
limit, and the replica bounds. 

For the sake of comparison, and in order to 
provide a kind of warm-up, we will recall also some 
features of the standard elementary mean-field 
model of ferromagnetism, the so-called Curie- 
Weiss model. We will concentrate also here on the 
free energy, and systematically exploit elementary 
comparison and interpolation arguments. This will 
show the strict analogy between the treatment of the 
ferromagnetic model and the developments in the 
mean-field spin glass case. Basic roles will be played 
in the two cases, but with different expressions, by 
positivity and convexity properties. 


Then, we will consider the problem of connecting 
results for the mean-field case to the short-range case. 
An intermediate position is occupied by the so-called 
diluted models. They can be studied through a 
generalization of the methods exploited in the mean- 
field case, as shown, for example, in De Sanctis (2005). 

The organization of the paper is as follows. We 
first introduce the ferromagnetic model and discuss 
behavior and properties of the free energy in the 
thermodynamic limit, by emphasizing, in this very 
elementary case, the comparison and interpolation 
methods that will be also exploited, in a different 
context, in the spin glass case. 

The basic features of the mean-field spin glass 
models are discussed next, by introducing all 
necessary definitions. This is followed by the 
introduction, for generic Gaussian interactions, of 
some important formulas, concerning the derivation 
with respect to the strength of the interaction, and 
the Gaussian comparison and interpolation method. 

We then give simple applications to the mean-field 
spin glass model, in particular to the existence of the 
infinite-volume limit of the quenched free energy 
(Guerra and Toninelli 2002), and to the proof of 
general variational bounds, by following the useful 
strategy developed in Aizenman et al. (2003). 

The main features of the Parisi representation are 
recalled briefly, and the main theorem concerning 
the free energy is stated. This is followed by a brief 
mention of results for diluted models. 

We also attack the problem of connecting the 
results for the mean-field case to the more realistic 
short-range models. 

Finally, we provide conclusions and an outlook on
foreseeable future developments.

Our treatment will be as simple as possible, relying
on the basic structural properties and describing
methods that presumably will remain useful for a long
time. The emphasis given to the mean-field case
reflects the status of research. A few years from
now, this review would perhaps be written along
completely different lines.


A Warm-up. The Mean-field 
Ferromagnetic Model: Structure 
and Results 


The mean-field ferromagnetic model is among the 
simplest models of statistical mechanics. However, it 
contains very interesting features, in particular a 
phase transition, characterized by spontaneous 
magnetization, at low temperatures. We refer to
standard textbooks for a full treatment and a
complete appreciation of the model in the frame of
the theory of ferromagnetism. Here we first consider
some properties of the free energy, easily obtained
through comparison methods.

The generic configuration of the mean-field
ferromagnetic model is defined through Ising spin
variables $\sigma_i = \pm 1$, attached to each site
$i = 1, 2, \dots, N$.

The Hamiltonian of the model, in some external
field of strength $h$, is given by the mean-field expression

$$H_N(\sigma,h) = -\frac{1}{N}\sum_{(i,j)} \sigma_i\sigma_j - h\sum_i \sigma_i \qquad [2]$$

Here, the first sum extends to all $N(N-1)/2$ site
couples, and the second to all sites.

For a given inverse temperature $\beta$, let us now
introduce the partition function $Z_N(\beta,h)$ and the
free energy per site $f_N(\beta,h)$, according to the well-
known definitions

$$Z_N(\beta,h) = \sum_{\sigma_1\dots\sigma_N} \exp(-\beta H_N(\sigma,h)) \qquad [3]$$

$$-\beta f_N(\beta,h) = N^{-1}\log Z_N(\beta,h) \qquad [4]$$


It is also convenient to define the average spin
magnetization

$$m(\sigma) = \frac{1}{N}\sum_i \sigma_i \qquad [5]$$


Then, it is immediately seen that the Hamiltonian
in [2] can be equivalently written as

$$H_N(\sigma,h) = -\tfrac{1}{2} N m^2 - h\sum_i \sigma_i \qquad [6]$$

where an unessential constant term has been
neglected. In fact, we have

$$\sum_{(i,j)} \sigma_i\sigma_j = \tfrac{1}{2}\sum_{i,j;\, i\neq j} \sigma_i\sigma_j = \tfrac{1}{2} N^2 m^2 - \tfrac{1}{2} N \qquad [7]$$

where the sum over all couples has been equivalently
written as one half the sum over all $i, j$ with $i \neq j$,
and the diagonal terms with $i = j$ have been added
and subtracted out. Notice that they give a constant
because $\sigma_i^2 = 1$.

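Therefore, the partition function in [3] can be
equivalently substituted by the expression

$$Z_N(\beta,h) = \sum_{\sigma_1\dots\sigma_N} \exp\bigl(\tfrac{1}{2}\beta N m^2\bigr)\exp\Bigl(\beta h\sum_i \sigma_i\Bigr) \qquad [8]$$

which will be our starting point.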

Our interest will be in the $\lim_{N\to\infty} N^{-1}\log Z_N(\beta,h)$.
To this purpose, let us establish the important
subadditivity property, holding for the splitting of the
large-$N$ system in two smaller systems with $N_1$ and $N_2$
sites, respectively, with $N = N_1 + N_2$,

$$\log Z_N(\beta,h) \le \log Z_{N_1}(\beta,h) + \log Z_{N_2}(\beta,h) \qquad [9]$$


The proof is very simple. Let us denote, in the most
natural way, by $\sigma_1,\dots,\sigma_{N_1}$ the spin variables for the
first subsystem, and by $\sigma_{N_1+1},\dots,\sigma_N$ the $N_2$ spin
variables of the second subsystem. Introduce also the
subsystem magnetizations $m_1$ and $m_2$, by adapting
the definition [5] to the smaller systems, in such a
way that

$$N m = N_1 m_1 + N_2 m_2 \qquad [10]$$


Therefore, we see that the large system magnetization
$m$ is the linear convex combination of the
smaller system ones, according to the obvious

$$m = \frac{N_1}{N} m_1 + \frac{N_2}{N} m_2 \qquad [11]$$

Since the mapping $m \to m^2$ is convex, we also have
the general bound, holding for all values of the $\sigma$
variables

$$m^2 \le \frac{N_1}{N} m_1^2 + \frac{N_2}{N} m_2^2 \qquad [12]$$


Then, it is enough to substitute the inequality in the
definition [8] of $Z_N(\beta,h)$, and recognize that we
achieve factorization with respect to the two subsystems,
and therefore the inequality $Z_N \le Z_{N_1} Z_{N_2}$.
So we have established [9]. From subadditivity, the
existence of the limit follows by standard arguments.
In fact, we have

$$\lim_{N\to\infty} N^{-1}\log Z_N(\beta,h) = \inf_N N^{-1}\log Z_N(\beta,h) \qquad [13]$$
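
The subadditivity [9] can be checked numerically for small sizes. The following is a minimal sketch, not part of the original treatment: it assumes the partition function in the form [8], and it groups the configurations by the number of down spins, counted by a binomial coefficient. The function names `log_Z` and `subadditivity_gap` are illustrative.

```python
import math

def log_Z(N, beta, h):
    """log Z_N in the form [8], summing over the N + 1 values of m."""
    terms = []
    for k in range(N + 1):              # k = number of spins equal to -1
        m = (N - 2 * k) / N             # magnetization of these configurations
        # log of the binomial coefficient C(N, k): number of such configurations
        log_count = math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
        terms.append(log_count + 0.5 * beta * N * m * m + beta * h * N * m)
    top = max(terms)                    # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def subadditivity_gap(N1, N2, beta, h):
    """log Z_{N1} + log Z_{N2} - log Z_{N1+N2}; [9] states that this is >= 0."""
    return log_Z(N1, beta, h) + log_Z(N2, beta, h) - log_Z(N1 + N2, beta, h)

if __name__ == "__main__":
    for N1, N2 in [(1, 1), (3, 7), (20, 30)]:
        print(N1, N2, subadditivity_gap(N1, N2, beta=1.5, h=0.1))
```

The gap is nonnegative for every split, which is all that the standard subadditivity argument behind [13] requires.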


Now we will calculate explicitly this limit, by
introducing an order parameter $M$, a trial function,
and an appropriate variational scheme. In order to
get a lower bound, we start from the elementary
inequality $m^2 \ge 2mM - M^2$, holding for any value
of $m$ and $M$. By inserting the inequality in the
definition [8] we arrive at a factorization of the sum
over $\sigma$'s. The sum can be explicitly calculated, and
we arrive immediately at the lower bound, uniform
in the size of the system,

$$N^{-1}\log Z_N(\beta,h) \ge \log 2 + \log\cosh\beta(h+M) - \tfrac{1}{2}\beta M^2 \qquad [14]$$

holding for any value of the trial order parameter $M$.


Clearly, it is convenient to take the supremum over $M$.
Then, we establish the optimal uniform lower bound

$$N^{-1}\log Z_N(\beta,h) \ge \sup_M\bigl(\log 2 + \log\cosh\beta(h+M) - \tfrac{1}{2}\beta M^2\bigr) \qquad [15]$$


It is simple to realize that the supremum coincides
with the limit as $N \to \infty$. To this purpose we follow
a simple procedure. Let us consider all
possible values of the variable $m$. There are $N+1$ of
them, corresponding to any number $K$ of possible
spin flips, starting from a given $\sigma$ configuration,
$K = 0, 1, \dots, N$. Let us consider the trivial
decomposition of the identity, holding for any $m$,

$$\sum_M \delta_{mM} = 1 \qquad [16]$$

where $M$ in the sum runs over the $N+1$ possible
values of $m$, and $\delta$ is the Kronecker delta, equal to 1
if $M = m$, and zero otherwise. Let us now insert [16]
in the definition [8] of the partition function inside
the sum over $\sigma$'s, and invert the two sums. Because of
the forcing $m = M$ given by the $\delta$, we can write
$m^2 = 2mM - M^2$ inside the sum. Then if we neglect
the $\delta$, by using the trivial $\delta \le 1$, we have an upper
bound, where the sum over $\sigma$'s can be explicitly
performed as before. Then it is enough to take the
upper bound with respect to $M$, and consider that
there are $N+1$ terms in the now trivial sum over $M$,
in order to arrive at the upper bound

$$N^{-1}\log Z_N(\beta,h) \le \sup_M\bigl(\log 2 + \log\cosh\beta(h+M) - \tfrac{1}{2}\beta M^2\bigr) + N^{-1}\log(N+1) \qquad [17]$$


Therefore, by going to the limit as $N \to \infty$, we can
collect all our results in the form of the following
theorem giving the full characterization of the
thermodynamic limit of the free energy.


Theorem 1 For the mean-field ferromagnetic
model we have

$$\lim_{N\to\infty} N^{-1}\log Z_N(\beta,h) = \inf_N N^{-1}\log Z_N(\beta,h) \qquad [18]$$

$$= \sup_M\bigl(\log 2 + \log\cosh\beta(h+M) - \tfrac{1}{2}\beta M^2\bigr) \qquad [19]$$


This ends our discussion about the free energy in 
the ferromagnetic model. 
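
The two-sided control given by [15] and [17] can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than part of the original argument: `log_Z` evaluates [8] exactly by grouping configurations with the same magnetization, and `sup_trial` approximates the supremum over $M$ by a plain grid search on $[-1, 1]$ (a choice made here only for robustness when two local maxima are present).

```python
import math

def log_Z(N, beta, h):
    """Exact log Z_N from [8], summing over the number k of down spins."""
    terms = []
    for k in range(N + 1):
        m = (N - 2 * k) / N
        log_count = math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
        terms.append(log_count + 0.5 * beta * N * m * m + beta * h * N * m)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def trial(M, beta, h):
    """Right-hand side of [14] for a trial order parameter M."""
    return math.log(2.0) + math.log(math.cosh(beta * (h + M))) - 0.5 * beta * M * M

def sup_trial(beta, h, points=4001):
    """Grid approximation (from below) of the supremum over M in [15]."""
    return max(trial(-1.0 + 2.0 * k / (points - 1), beta, h) for k in range(points))

if __name__ == "__main__":
    beta, h = 1.5, 0.1
    s = sup_trial(beta, h)
    for N in (10, 100, 1000):
        value = log_Z(N, beta, h) / N
        print(N, s, value, s + math.log(N + 1) / N)   # lower bound, value, upper bound
```

As $N$ grows, the exact value is squeezed between the two bounds, whose distance $N^{-1}\log(N+1)$ vanishes.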

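Other properties of the model can be easily
established. Introduce the Boltzmann-Gibbs state

$$\omega_N(A) = Z_N^{-1}\sum_{\sigma_1\dots\sigma_N} A\,\exp\bigl(\tfrac{1}{2}\beta N m^2\bigr)\exp\Bigl(\beta h\sum_i\sigma_i\Bigr) \qquad [20]$$

where $A$ is any function of $\sigma_1,\dots,\sigma_N$.
The observable $m(\sigma)$ becomes self-averaging under
$\omega_N$, in the infinite-volume limit, in the sense that

$$\lim_{N\to\infty}\omega_N\bigl((m - M(\beta,h))^2\bigr) = 0 \qquad [21]$$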


This property of $m$ is the deep reason for the success
of the strategy exploited earlier for the convergence
of the free energy. Easy consequences are the
following. In the infinite-volume limit, for $h \neq 0$,
the Boltzmann-Gibbs state becomes a factor state

$$\lim_{N\to\infty}\omega_N(\sigma_1\cdots\sigma_s) = M(\beta,h)^s \qquad [22]$$

A phase transition appears in the form of spontaneous
magnetization. In fact, while for $h = 0$ and
$\beta \le 1$ we have $M(\beta,h) = 0$, on the other hand, for
$\beta > 1$, we have the discontinuity

$$\lim_{h\to 0^+} M(\beta,h) = -\lim_{h\to 0^-} M(\beta,h) \equiv M(\beta) > 0 \qquad [23]$$


Fluctuations can also be easily controlled. In fact,
one proves that the rescaled random variable
$\sqrt{N}\,(m - M(\beta,h))$ tends in distribution, under $\omega_N$,
to a centered Gaussian with variance given by the
susceptibility

$$\chi(\beta,h) = \frac{\partial}{\partial h} M(\beta,h) \equiv \frac{\beta(1-M^2)}{1-\beta(1-M^2)} \qquad [24]$$

Notice that the variance becomes infinite only at the
critical point $h = 0$, $\beta = 1$, where $M = 0$.
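
The order parameter and the susceptibility can be computed with a few lines of code. This is a hedged sketch: it assumes that $M(\beta,h)$ is the maximizer of the trial function in [19], which then satisfies the stationarity condition $M = \tanh\beta(h+M)$, and that the susceptibility takes the form $\beta(1-M^2)/(1-\beta(1-M^2))$ obtained by differentiating that condition with respect to $h$. The fixed-point iteration and the function names are illustrative choices.

```python
import math

def magnetization(beta, h, tol=1e-14, max_iter=100000):
    """Solve M = tanh(beta (h + M)) by fixed-point iteration.

    The iteration starts from +1 for h >= 0 and from -1 for h < 0, so that
    for h = 0 and beta > 1 it selects the positive spontaneous magnetization.
    """
    M = 1.0 if h >= 0 else -1.0
    for _ in range(max_iter):
        new = math.tanh(beta * (h + M))
        if abs(new - M) < tol:
            return new
        M = new
    return M

def susceptibility(beta, h):
    """dM/dh from the stationarity condition (assumed closed form)."""
    M = magnetization(beta, h)
    c = beta * (1.0 - M * M)
    return c / (1.0 - c)

if __name__ == "__main__":
    print(magnetization(0.5, 0.0))   # high temperature, no field: close to 0
    print(magnetization(1.5, 0.0))   # low temperature: spontaneous magnetization
    print(susceptibility(0.8, 0.3))
```

Convergence becomes slow near $h = 0$, $\beta = 1$, where the denominator of the susceptibility vanishes.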

Now we are ready to attack the much more 
difficult spin glass model. But it will be surprising to 
see that, by following a simple extension of the 
methods described here, we will arrive at similar 
results. 


Basic Definitions for the Mean-Field Spin 
Glass Model 


As in the ferromagnetic case, the generic configuration
of the mean-field spin glass model is defined
through Ising spin variables $\sigma_i = \pm 1$, attached to
each site $i = 1, 2, \dots, N$.

But now there is an external quenched disorder
given by the $N(N-1)/2$ independent and identically
distributed random variables $J_{ij}$, defined for each
pair of sites. For the sake of simplicity, we assume
each $J_{ij}$ to be a centered unit Gaussian with averages
$E(J_{ij}) = 0$, $E(J_{ij}^2) = 1$. By quenched disorder we mean
that the $J$ have a kind of stochastic external
influence on the system, without contributing to
the thermal equilibrium.

Now the Hamiltonian of the model, in some
external field of strength $h$, is given by the mean-
field expression

$$H_N(\sigma,h,J) = -\frac{1}{\sqrt{N}}\sum_{(i,j)} J_{ij}\sigma_i\sigma_j - h\sum_i\sigma_i \qquad [25]$$

Here, the first sum extends to all site pairs, and the
second to all sites. Notice the $\sqrt{N}$, necessary to
ensure a good thermodynamic behavior to the free
energy.

For a given inverse temperature $\beta$, let us now
introduce the disorder-dependent partition function
$Z_N(\beta,h,J)$ and the quenched average of the
free energy per site $f_N(\beta,h)$, according to the
definitions

$$Z_N(\beta,h,J) = \sum_{\sigma_1\dots\sigma_N}\exp(-\beta H_N(\sigma,h,J)) \qquad [26]$$

$$-\beta f_N(\beta,h) = N^{-1} E\log Z_N(\beta,h,J) \qquad [27]$$


Notice that in [27] the average $E$ with respect to the
external noise is made "after" the log is taken. This
procedure is called quenched averaging. It represents
the physical idea that the external noise does not
contribute to the thermal equilibrium. Only the $\sigma$'s
are thermalized.

For the sake of simplicity, it is also convenient to
write the partition function in the following equivalent
form. First of all let us introduce a family of
centered Gaussian random variables $K(\sigma)$, indexed
by the configurations $\sigma$, and characterized by the
covariances

$$E\bigl(K(\sigma)K(\sigma')\bigr) = q^2(\sigma,\sigma') \qquad [28]$$

where $q(\sigma,\sigma')$ are the overlaps between two generic
configurations, defined by

$$q(\sigma,\sigma') = N^{-1}\sum_i \sigma_i\sigma'_i \qquad [29]$$

with the obvious bounds $-1 \le q(\sigma,\sigma') \le 1$, and
the normalization $q(\sigma,\sigma) = 1$. Then, starting from
the definition [25], it is immediately seen that the
partition function in [26] can also be written, by
neglecting unessential constant terms, in the form

$$Z_N(\beta,h,J) = \sum_{\sigma_1\dots\sigma_N}\exp\Bigl(\beta\sqrt{\tfrac{N}{2}}\,K(\sigma)\Bigr)\exp\Bigl(\beta h\sum_i\sigma_i\Bigr) \qquad [30]$$

which will be the starting point of our treatment. 
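
The passage from [25] to [30] rests on a covariance computation that can be verified directly. The sketch below is an illustration, not part of the original text: for two configurations it evaluates the covariance of the $J$-dependent part of [25], using only the independence of the couplings and their unit variance, and compares it with $N q^2/2 - 1/2$, that is, $N/2$ times [28] up to the constant that is neglected in [30]. The helper names are hypothetical.

```python
import itertools
import random

def overlap(s, t):
    """Overlap q(s, t) between two spin configurations, as in [29]."""
    return sum(a * b for a, b in zip(s, t)) / len(s)

def interaction_covariance(s, t):
    """E[X(s) X(t)] with X(s) = N^(-1/2) sum_{(i,j)} J_ij s_i s_j.

    Independent unit-variance couplings give E(J_ij J_kl) = 1 only when the
    two pairs coincide, so the expectation reduces to a sum over pairs.
    """
    N = len(s)
    pairs = itertools.combinations(range(N), 2)
    return sum(s[i] * s[j] * t[i] * t[j] for i, j in pairs) / N

if __name__ == "__main__":
    rng = random.Random(0)
    N = 8
    s = [rng.choice((-1, 1)) for _ in range(N)]
    t = [rng.choice((-1, 1)) for _ in range(N)]
    q = overlap(s, t)
    print(interaction_covariance(s, t), N * q * q / 2 - 0.5)
```

The two printed numbers coincide, and the leftover constant $-1/2$ is independent of the configurations, which is why it can be dropped.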


Basic Formulas of Derivation 
and Interpolation 


We work in the following general setting. Let $U_i$
be a family of centered Gaussian random variables,
$i = 1, \dots, K$, with covariance matrix given by
$E(U_i U_j) \equiv S_{ij}$. We treat the index $i$ now as configuration
space for some statistical mechanics system, with
partition function $Z$ and quenched free energy given by

$$E\log\sum_i w_i\exp(\sqrt{t}\,U_i) \equiv E\log Z \qquad [31]$$

where $w_i \ge 0$ are generic weights, and $t$ is a
parameter ruling the strength of the interaction.

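It would be hard to overestimate the relevance of
the following derivation formula

$$\frac{d}{dt}E\log\sum_i w_i\exp(\sqrt{t}\,U_i) = \frac{1}{2}E\Bigl(Z^{-1}\sum_i w_i\exp(\sqrt{t}\,U_i)\,S_{ii}\Bigr) - \frac{1}{2}E\Bigl(Z^{-2}\sum_i\sum_j w_i w_j\exp(\sqrt{t}\,U_i)\exp(\sqrt{t}\,U_j)\,S_{ij}\Bigr) \qquad [32]$$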


The proof is straightforward. First we perform
directly the $t$-derivative. Then, we notice that the
random variables appear in expressions of the form
$E(U_i F)$, where $F$ are functions of the $U$'s. These can
be easily handled through the following integration
by parts formula for generic Gaussian random
variables, strongly reminiscent of the Wick theorem
in quantum field theory,

$$E(U_i F) = \sum_j S_{ij}\,E\Bigl(\frac{\partial}{\partial U_j}F\Bigr) \qquad [33]$$

Therefore, we see that always two derivatives are
involved. The two terms in [32] come from the
action of the $U_j$ derivatives, the first acting on the
Boltzmann factor, and giving rise to a Kronecker $\delta_{ij}$,
the second acting on $Z^{-1}$, and giving rise to the
minus sign and the duplication of variables.

The derivation formula can be expressed in a
more compact form by introducing replicas and
suitable averages. In fact, let us introduce the state $\omega$
acting on functions $F$ of $i$ as follows

$$\omega(F(i)) = Z^{-1}\sum_i w_i\exp(\sqrt{t}\,U_i)\,F(i) \qquad [34]$$

together with the associated product state $\Omega$ acting
on replicated configuration spaces $i_1, i_2, \dots, i_s$. By
performing also a global $E$ average, finally we define
the averages

$$\langle F\rangle_t = E\,\Omega(F) \qquad [35]$$

where the subscript is introduced in order to recall
the $t$ dependence of these averages.

Then, eqn [32] can be written in a more compact
form

$$\frac{d}{dt}E\log\sum_i w_i\exp(\sqrt{t}\,U_i) = \frac{1}{2}\langle S_{i_1 i_1}\rangle_t - \frac{1}{2}\langle S_{i_1 i_2}\rangle_t \qquad [36]$$
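
The derivation formula can be tested on the smallest nontrivial example. The following sketch is only an illustration under explicit assumptions: it takes $K = 2$, unit variances $S_{11} = S_{22} = 1$ and $S_{12} = \rho$, evaluates Gaussian averages by a midpoint quadrature rule (a choice made here, not in the source), and compares a finite-difference $t$-derivative of [31] with the replica expression $\frac{1}{2}\langle S_{i_1 i_1}\rangle_t - \frac{1}{2}\langle S_{i_1 i_2}\rangle_t$. All function names are hypothetical.

```python
import math

def gauss_rule(n=80, zmax=8.0):
    """Midpoint quadrature nodes and weights for the unit Gaussian measure."""
    dz = 2.0 * zmax / n
    rule = []
    for k in range(n):
        z = -zmax + (k + 0.5) * dz
        rule.append((z, math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * dz))
    return rule

def expect(func, rho, rule):
    """E[func(U1, U2)] for unit Gaussians with E(U1 U2) = rho."""
    c = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for z1, p1 in rule:
        for z2, p2 in rule:
            total += p1 * p2 * func(z1, rho * z1 + c * z2)
    return total

def phi(t, rho, w, rule):
    """Quenched free energy [31] for two configurations."""
    r = math.sqrt(t)
    return expect(lambda u1, u2: math.log(w[0] * math.exp(r * u1) + w[1] * math.exp(r * u2)),
                  rho, rule)

def replica_rhs(t, rho, w, rule):
    """(1/2)<S_{i1 i1}>_t - (1/2)<S_{i1 i2}>_t with S = [[1, rho], [rho, 1]]."""
    r = math.sqrt(t)
    def integrand(u1, u2):
        a, b = w[0] * math.exp(r * u1), w[1] * math.exp(r * u2)
        p1, p2 = a / (a + b), b / (a + b)          # Gibbs weights of the state omega
        return 0.5 - 0.5 * (p1 * p1 + p2 * p2 + 2.0 * rho * p1 * p2)
    return expect(integrand, rho, rule)

if __name__ == "__main__":
    rule, t, rho, w, eps = gauss_rule(), 0.7, 0.3, (1.0, 2.0), 1e-4
    finite_diff = (phi(t + eps, rho, w, rule) - phi(t - eps, rho, w, rule)) / (2 * eps)
    print(finite_diff, replica_rhs(t, rho, w, rule))
```

The two numbers agree up to the finite-difference error, which illustrates how the second term of the formula arises from two independent replicas of the same Gibbs state.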


Our basic comparison argument will be based on 
the following very simple theorem. 


Theorem 2 Let $U_i$ and $\tilde U_i$, for $i = 1, \dots, K$, be
independent families of centered Gaussian random
variables, whose covariances satisfy the inequalities
for generic configurations

$$E(U_i U_j) \equiv S_{ij} \ge E(\tilde U_i\tilde U_j) \equiv \tilde S_{ij} \qquad [37]$$

and the equalities along the diagonal

$$E(U_i U_i) \equiv S_{ii} = E(\tilde U_i\tilde U_i) \equiv \tilde S_{ii} \qquad [38]$$

then for the quenched averages we have the inequality
in the opposite sense

$$E\log\sum_i w_i\exp(U_i) \le E\log\sum_i w_i\exp(\tilde U_i) \qquad [39]$$

where the $w_i \ge 0$ are the same in the two
expressions.


Considerations of this kind are present in the 
mathematical literature, as mentioned, for example, 
in Talagrand (2003). 

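The proof is extremely simple and amounts to a
straightforward calculation. In fact, let us consider
the interpolating expression

$$E\log\sum_i w_i\exp\bigl(\sqrt{t}\,U_i + \sqrt{1-t}\,\tilde U_i\bigr) \qquad [40]$$

where $0 \le t \le 1$. Clearly, the two expressions under
comparison correspond to the values $t = 1$ (the $U$
family) and $t = 0$ (the $\tilde U$ family). By taking the
derivative with respect to $t$, with the help of the
previous derivation formula, we arrive at the
evaluation of the $t$ derivative in the form

$$\frac{d}{dt}E\log\sum_i w_i\exp\bigl(\sqrt{t}\,U_i + \sqrt{1-t}\,\tilde U_i\bigr) = \frac{1}{2}E\Bigl(Z^{-1}\sum_i w_i\exp\bigl(\sqrt{t}\,U_i + \sqrt{1-t}\,\tilde U_i\bigr)(S_{ii} - \tilde S_{ii})\Bigr) - \frac{1}{2}E\Bigl(Z^{-2}\sum_i\sum_j w_i w_j\exp\bigl(\sqrt{t}\,U_i + \sqrt{1-t}\,\tilde U_i\bigr)\exp\bigl(\sqrt{t}\,U_j + \sqrt{1-t}\,\tilde U_j\bigr)(S_{ij} - \tilde S_{ij})\Bigr) \qquad [41]$$

where $Z$ now denotes the interpolating sum appearing in [40].
From the conditions assumed for the covariances,
the first term vanishes and the second is nonpositive, so
the interpolating function is
nonincreasing in $t$, and the theorem follows.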
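
The comparison theorem can be seen at work in a toy case. The sketch below is an illustration with assumptions stated up front: two configurations, unit variances on the diagonal, and off-diagonal covariance $\rho$, so that increasing $\rho$ increases the covariance while leaving the diagonal unchanged. Gaussian averages are computed with a midpoint quadrature rule chosen here for simplicity; the names are hypothetical.

```python
import math

def gauss_rule(n=80, zmax=8.0):
    """Midpoint quadrature nodes and weights for the unit Gaussian measure."""
    dz = 2.0 * zmax / n
    rule = []
    for k in range(n):
        z = -zmax + (k + 0.5) * dz
        rule.append((z, math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * dz))
    return rule

def quenched_log_sum(rho, weights, rule):
    """E log(w1 exp(U1) + w2 exp(U2)) with E(Ui Ui) = 1 and E(U1 U2) = rho."""
    c = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for z1, p1 in rule:
        for z2, p2 in rule:
            u1, u2 = z1, rho * z1 + c * z2     # correlated pair built from z1, z2
            total += p1 * p2 * math.log(weights[0] * math.exp(u1) + weights[1] * math.exp(u2))
    return total

if __name__ == "__main__":
    rule = gauss_rule()
    for rho in (-0.9, -0.3, 0.0, 0.5, 0.9):
        print(rho, quenched_log_sum(rho, (1.0, 2.0), rule))
```

The printed values decrease as $\rho$ grows, in agreement with the inequality "in the opposite sense" stated in the theorem.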

The derivation formula and the comparison
theorem are not restricted to the Gaussian case.
Generalizations in many directions are possible. For
the diluted spin glass models and optimization
problems we refer, for example, to Franz and
Leone (2003), and to De Sanctis (2005), and
references therein.


Thermodynamic Limit and the 
Variational Bounds 


We give here some striking applications of the basic
comparison theorem. Guerra and Toninelli (2002)
have given a very simple proof of a long-awaited
result, about the convergence of the free energy per
site in the thermodynamic limit. Let us show the
argument. Let us consider a system of size $N$ and
two smaller systems of sizes $N_1$ and $N_2$ respectively,
with $N = N_1 + N_2$, as before in the ferromagnetic
case. Let us now compare

$$E\log Z_N(\beta,h,J) = E\log\sum_{\sigma_1\dots\sigma_N}\exp\Bigl(\beta\sqrt{\tfrac{N}{2}}\,K(\sigma)\Bigr)\exp\Bigl(\beta h\sum_i\sigma_i\Bigr) \qquad [42]$$

with

$$E\log\sum_{\sigma_1\dots\sigma_N}\exp\Bigl(\beta\sqrt{\tfrac{N_1}{2}}\,K^{(1)}(\sigma^{(1)})\Bigr)\exp\Bigl(\beta\sqrt{\tfrac{N_2}{2}}\,K^{(2)}(\sigma^{(2)})\Bigr)\exp\Bigl(\beta h\sum_i\sigma_i\Bigr) \equiv E\log Z_{N_1}(\beta,h,J) + E\log Z_{N_2}(\beta,h,J) \qquad [43]$$


where $\sigma^{(1)}$ stands for $\sigma_i$, $i = 1, \dots, N_1$, and $\sigma^{(2)}$ for
$\sigma_i$, $i = N_1 + 1, \dots, N$. Covariances for $K^{(1)}$ and $K^{(2)}$
are expressed as in [28], but now the overlaps are
substituted with the partial overlaps of the first and
second block, $q_1$ and $q_2$, respectively. It is very
simple to apply the comparison theorem. All one has
to do is to observe that the obvious

$$N q = N_1 q_1 + N_2 q_2 \qquad [44]$$

analogous to [10], implies, as in [12],

$$q^2 \le \frac{N_1}{N} q_1^2 + \frac{N_2}{N} q_2^2 \qquad [45]$$

Therefore, the comparison gives the superadditivity
property, to be compared with [9],

$$E\log Z_N(\beta,h,J) \ge E\log Z_{N_1}(\beta,h,J) + E\log Z_{N_2}(\beta,h,J) \qquad [46]$$


From the superadditivity property the existence of the
limit follows in the form

$$\lim_{N\to\infty} N^{-1}E\log Z_N(\beta,h,J) = \sup_N N^{-1}E\log Z_N(\beta,h,J) \qquad [47]$$


to be compared with [13].

The second application is in the form of the
Aizenman-Sims-Starr generalized variational principle.
Here, we will need to introduce some auxiliary
system. The denumerable configuration space is
given by the values of $\alpha = 1, 2, \dots$. We introduce
also weights $w_\alpha \ge 0$ for the $\alpha$ system, and suitably
defined overlaps between two generic configurations
$p(\alpha,\alpha')$, with $p(\alpha,\alpha) = 1$.

A family of centered Gaussian random variables
$K(\alpha)$, now indexed by the configurations $\alpha$, will be
defined by the covariances

$$E\bigl(K(\alpha)K(\alpha')\bigr) = p^2(\alpha,\alpha') \qquad [48]$$


We will also need a family of centered Gaussian
random variables $\eta_i(\alpha)$, indexed by the sites $i$ of our
original system and the configurations $\alpha$ of the
auxiliary system, so that

$$E\bigl(\eta_i(\alpha)\eta_{i'}(\alpha')\bigr) = \delta_{ii'}\,p(\alpha,\alpha') \qquad [49]$$

Both the probability measure $w_\alpha$, and the overlaps
$p(\alpha,\alpha')$ could depend on some additional external
quenched noise, which does not appear explicitly in
our notation.

In the following, we will denote by E averages 
with respect to all random variables involved. 

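In order to start the comparison argument, we
will consider first the case where the two $\sigma$ and $\alpha$
systems are not coupled, so as to appear factorized
in the form

$$E\log\sum_{\sigma_1\dots\sigma_N}\sum_\alpha w_\alpha\exp\Bigl(\beta\sqrt{\tfrac{N}{2}}\,K(\sigma)\Bigr)\exp\Bigl(\beta\sqrt{\tfrac{N}{2}}\,K(\alpha)\Bigr)\exp\Bigl(\beta h\sum_i\sigma_i\Bigr) \equiv E\log Z_N(\beta,h,J) + E\log\sum_\alpha w_\alpha\exp\Bigl(\beta\sqrt{\tfrac{N}{2}}\,K(\alpha)\Bigr) \qquad [50]$$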


In the second case, the $K$ fields are suppressed and
the coupling between the two systems will be taken
in a very simple form, by allowing the $\eta$ field to act
as an external field on the $\sigma$ system. In this way
the $\sigma$'s appear as factorized, and the sums can
be explicitly performed. The chosen form for the
second term in the comparison is

$$E\log\sum_{\sigma_1\dots\sigma_N}\sum_\alpha w_\alpha\exp\Bigl(\beta\sum_i\eta_i(\alpha)\sigma_i\Bigr)\exp\Bigl(\beta h\sum_i\sigma_i\Bigr) \equiv N\log 2 + E\log\sum_\alpha w_\alpha\,(c_1 c_2\cdots c_N) \qquad [51]$$

where we have defined

$$c_i = \cosh\beta\bigl(h + \eta_i(\alpha)\bigr) \qquad [52]$$

as arising from the sums over $\sigma$'s.

Now we apply the comparison theorem. In the
first case, the covariances involve the sums of
squares of overlaps

$$\tfrac{1}{2}\bigl(q^2(\sigma,\sigma') + p^2(\alpha,\alpha')\bigr) \qquad [53]$$

In the second case, a very simple calculation shows
that the covariances involve the overlap products

$$q(\sigma,\sigma')\,p(\alpha,\alpha') \qquad [54]$$


Therefore, the comparison is very easy and, by
collecting all expressions, we end up with the useful
estimate, as in Aizenman et al. (2003), holding for
any auxiliary system as defined before,

$$N^{-1}E\log Z_N(\beta,h,J) \le \log 2 + N^{-1}\Bigl(E\log\sum_\alpha w_\alpha\,c_1 c_2\cdots c_N - E\log\sum_\alpha w_\alpha\exp\Bigl(\beta\sqrt{\tfrac{N}{2}}\,K(\alpha)\Bigr)\Bigr) \qquad [55]$$
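
The comparison step behind [55] can be spelled out as a one-line worked inequality. The sketch below assumes the normalizations of [50] and [51], with exponents $\beta\sqrt{N/2}\,K$ and $\beta\sum_i\eta_i\sigma_i$ respectively, so that the two covariances are $\beta^2 N$ times [53] (read with a factor $1/2$) and [54]; the symbols $q$ and $p$ abbreviate $q(\sigma,\sigma')$ and $p(\alpha,\alpha')$.

```latex
\frac{\beta^2 N}{2}\bigl(q^2 + p^2\bigr) \;-\; \beta^2 N\, q\, p
  \;=\; \frac{\beta^2 N}{2}\,(q - p)^2 \;\ge\; 0 ,
\qquad \text{with equality on the diagonal, where } q = p = 1 .
```

Under these assumptions the decoupled system [50] has the larger covariance off the diagonal and the same variance on it, so Theorem 2 bounds [50] from above by [51], which rearranges into [55].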


The Parisi Representation 
for the Free Energy 


We refer to the original papers, reprinted in the
extensive review given in Mézard et al. (1987), for
the general motivations, and the derivation of the
broken replica ansatz, in the frame of the ingenious 
replica trick. Here, we limit ourselves to a synthetic 
description of its general structure, independently 
from the replica trick. 

First of all, let us introduce the convex space $\mathcal{X}$ of
the functional order parameters $x$, as nondecreasing
functions of the auxiliary variable $q$, both $x$ and $q$
taking values on the interval $[0,1]$, that is,

$$\mathcal{X} \ni x : [0,1] \ni q \to x(q) \in [0,1] \qquad [56]$$

Notice that we call $x$ the function, and $x(q)$ its
values. We introduce a metric on $\mathcal{X}$ through the
$L^1([0,1], dq)$-norm, where $dq$ is the Lebesgue
measure.

For our purposes, we will consider the case of
piecewise constant functional order parameters,
characterized by an integer $K$, and two sequences
$q_0, q_1, \dots, q_K$, $m_1, m_2, \dots, m_K$ of numbers satisfying

$$0 = q_0 \le q_1 \le \dots \le q_{K-1} \le q_K = 1, \qquad 0 \le m_1 \le m_2 \le \dots \le m_K \le 1 \qquad [57]$$

such that

$$x(q) = m_1 \ \text{for}\ 0 = q_0 \le q < q_1, \qquad x(q) = m_2 \ \text{for}\ q_1 \le q < q_2, \qquad \dots, \qquad x(q) = m_K \ \text{for}\ q_{K-1} \le q \le q_K \qquad [58]$$

In the following, we will find it convenient to
define also $m_0 = 0$, and $m_{K+1} = 1$. The replica
symmetric case of Sherrington and Kirkpatrick
corresponds to

$$K = 2, \quad q_1 = \bar q, \quad m_1 = 0, \quad m_2 = 1 \qquad [59]$$

Let us now introduce the function $f$, with values
$f(q,y;x,\beta)$, of the variables $q \in [0,1]$, $y \in \mathbb{R}$,
depending also on the functional order parameter
$x$, and on the inverse temperature $\beta$, defined
as the solution of the nonlinear antiparabolic
equation

$$(\partial_q f)(q,y) + \tfrac{1}{2}(f'')(q,y) + \tfrac{1}{2}x(q)\,(f')^2(q,y) = 0 \qquad [60]$$

where primes denote derivatives with respect to $y$,
with final condition

$$f(1,y) = \log\cosh(\beta y) \qquad [61]$$

Here, we have stressed only the dependence of $f$ on $q$
and $y$.

It is very simple to integrate eqn [60] when $x$ is
piecewise constant. In fact, consider $x(q) = m_a$, for
$q_{a-1} \le q < q_a$, firstly with $m_a > 0$. Then, it is
immediately seen that the correct solution of eqn
[60] in this interval, with the right final boundary
condition at $q = q_a$, is given by

$$f(q,y) = \frac{1}{m_a}\log\int\exp\bigl(m_a f(q_a, y + z\sqrt{q_a - q})\bigr)\,d\mu(z) \qquad [62]$$

where $d\mu(z)$ is the centered unit Gaussian measure
on the real line. On the other hand, if $m_a = 0$, then
[60] loses the nonlinear part and the solution is
given by

$$f(q,y) = \int f(q_a, y + z\sqrt{q_a - q})\,d\mu(z) \qquad [63]$$

which can be seen also as deriving from [62] in the
limit $m_a \to 0$. Starting from the last interval $K$, and
using [62] iteratively on each interval, we easily get
the solution of [60], [61], in the case of piecewise
constant order parameter $x$, as in [58], through a chain of
interconnected Gaussian integrations.

Now, we introduce the following important
definitions. The trial auxiliary function, associated
to a given mean-field spin glass system, as described
earlier, depending on the functional order parameter
$x$, is defined as

$$\log 2 + f(0,h;x,\beta) - \frac{\beta^2}{2}\int_0^1 q\,x(q)\,dq \qquad [64]$$


Notice that in this expression the function $f$ appears
evaluated at $q = 0$, and $y = h$, where $h$ is the value of
the external magnetic field. This trial expression
should be considered as the analog of that appearing
in [14] for the ferromagnetic case.
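
For a piecewise constant order parameter, the trial function [64] can be evaluated by a short recursive routine. The sketch below is an illustration under stated assumptions: it reads [62] and [63] as Gaussian integrals with argument $y + z\sqrt{q_a - q}$, takes the last term of [64] as $\frac{\beta^2}{2}\int_0^1 q\,x(q)\,dq$, and replaces exact Gaussian integration by a midpoint quadrature rule. The comparison value `replica_symmetric` is the standard Sherrington-Kirkpatrick replica symmetric expression, which follows from these formulas for the choice [59]. All names are hypothetical, and the cost grows exponentially with the number of intervals.

```python
import math

def gauss_rule(n=60, zmax=8.0):
    """Midpoint quadrature nodes and weights for the unit Gaussian measure."""
    dz = 2.0 * zmax / n
    rule = []
    for k in range(n):
        z = -zmax + (k + 0.5) * dz
        rule.append((z, math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * dz))
    return rule

def log_cosh(x):
    """Overflow-safe log cosh(x)."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)

def parisi_f(a, y, q, m, beta, rule):
    """f(q[a], y) for q = [q_0, ..., q_K] and m = [m_1, ..., m_K]."""
    if a == len(m):                          # q[a] = q_K = 1: final condition [61]
        return log_cosh(beta * y)
    step = math.sqrt(q[a + 1] - q[a])
    m_a = m[a]                               # value of x on [q[a], q[a+1])
    if m_a == 0.0:                           # linear case, as in [63]
        return sum(w * parisi_f(a + 1, y + z * step, q, m, beta, rule) for z, w in rule)
    total = sum(w * math.exp(m_a * parisi_f(a + 1, y + z * step, q, m, beta, rule))
                for z, w in rule)
    return math.log(total) / m_a             # nonlinear case, as in [62]

def trial_functional(q, m, beta, h, rule=None):
    """Trial function [64] for a piecewise constant order parameter."""
    rule = rule or gauss_rule()
    integral = sum(m[a] * (q[a + 1] ** 2 - q[a] ** 2) / 2.0 for a in range(len(m)))
    return math.log(2.0) + parisi_f(0, h, q, m, beta, rule) - 0.5 * beta ** 2 * integral

def replica_symmetric(qbar, beta, h, rule=None):
    """Closed replica symmetric form, used here only as a cross-check."""
    rule = rule or gauss_rule()
    avg = sum(w * log_cosh(beta * (h + z * math.sqrt(qbar))) for z, w in rule)
    return math.log(2.0) + avg + 0.25 * beta ** 2 * (1.0 - qbar) ** 2

if __name__ == "__main__":
    beta, h, qbar = 1.2, 0.3, 0.4
    print(trial_functional([0.0, qbar, 1.0], [0.0, 1.0], beta, h))
    print(replica_symmetric(qbar, beta, h))
```

The two printed values coincide up to quadrature error, which is a useful consistency check on the chain of Gaussian integrations before trying more than two intervals.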

The Parisi spontaneously broken replica symmetry
expression for the free energy is given by the definition

$$-\beta f_P(\beta,h) = \inf_x\Bigl(\log 2 + f(0,h;x,\beta) - \frac{\beta^2}{2}\int_0^1 q\,x(q)\,dq\Bigr) \qquad [65]$$

where the infimum is taken with respect to all
functional order parameters $x$. Notice that the
infimum appears here, as compared to the supremum
in the ferromagnetic case.

By exploiting a kind of generalized comparison 
argument, involving a suitably defined interpolation 
function, Guerra (2003) has established the follow- 
ing important result. 


Theorem 3 For all values of the inverse temperature
$\beta$, and the external magnetic field $h$, and for
any functional order parameter $x$, the following
bound holds:

$$N^{-1}E\log Z_N(\beta,h,J) \le \log 2 + f(0,h;x,\beta) - \frac{\beta^2}{2}\int_0^1 q\,x(q)\,dq$$

uniformly in $N$. Consequently, we have also

$$N^{-1}E\log Z_N(\beta,h,J) \le \inf_x\Bigl(\log 2 + f(0,h;x,\beta) - \frac{\beta^2}{2}\int_0^1 q\,x(q)\,dq\Bigr)$$

uniformly in $N$.


However, this result can also be understood in the
framework of the generalized variational principle
established by Aizenman-Sims-Starr as described
earlier.

In fact, one can easily show that there exist $\alpha$
systems such that

$$N^{-1}E\log\sum_\alpha w_\alpha\,c_1 c_2\cdots c_N = f(0,h;x,\beta) \qquad [66]$$

$$N^{-1}E\log\sum_\alpha w_\alpha\exp\Bigl(\beta\sqrt{\tfrac{N}{2}}\,K(\alpha)\Bigr) = \frac{\beta^2}{2}\int_0^1 q\,x(q)\,dq \qquad [67]$$

uniformly in $N$. This result stems from earlier work
of Derrida, Ruelle, Neveu, Bolthausen, Sznitman,
Aizenman, Contucci, Talagrand, Bovier, and others,
and in a sense is implicit in the treatment given in
Mézard et al. (1987). It can be reached in a very
simple way. Let us sketch the argument.

First of all, let us consider the Poisson point
process $y_1 \ge y_2 \ge y_3 \ge \dots$, uniquely characterized by
the following conditions. For any interval $A$,
introduce the occupation numbers $N(A)$, defined by

$$N(A) = \sum_\alpha \chi(y_\alpha \in A) \qquad [68]$$

where $\chi(\cdot) = 1$ if the random variable $y_\alpha$ belongs to
the interval $A$, and $\chi(\cdot) = 0$ otherwise. We assume
that $N(A)$ and $N(B)$ are independent if the intervals
$A$ and $B$ are disjoint, and moreover that for each $A$,
the random variable $N(A)$ has a Poisson distribution
with parameter

$$\mu(A) = \int_a^b \exp(-y)\,dy \qquad [69]$$

if $A$ is the interval $(a,b)$, that is,

$$P(N(A) = k) = \exp(-\mu(A))\,\mu(A)^k/k! \qquad [70]$$


We will exploit $-y_\alpha$ as energy levels for a statistical
mechanics system with configurations indexed by $\alpha$.
For a parameter $0 < m < 1$, playing the role of inverse
temperature, we can introduce the partition function

$$v = \sum_\alpha \exp\Bigl(\frac{y_\alpha}{m}\Bigr) \qquad [71]$$

For $m$ in the given interval it turns out that $v$ is a
very well defined random variable, with the sum
over $\alpha$ extending to infinity. In fact, there is a strong
inbuilt smooth cutoff in the very definition of the
stochastic energy levels.

From the general properties of Poisson point
processes, it is very well known that the following
basic invariance property holds. Introduce a random
variable $b$, independent of $y$, subject to the condition
$E(\exp b) = 1$, and let $b_\alpha$ be independent copies.
Then, the randomly biased point process $y'_\alpha = y_\alpha + b_\alpha$,
$\alpha = 1, 2, \dots$, is equivalent to the original one in
distribution. An immediate consequence is the following.
Let $f$ be a random variable, independent of $y$, such
that $E(\exp f) < \infty$, and let $f_\alpha$ be independent copies.
Then, the two random variables

$$\sum_\alpha \exp\Bigl(\frac{y_\alpha}{m}\Bigr)\exp(f_\alpha) \qquad [72]$$

$$\sum_\alpha \exp\Bigl(\frac{y_\alpha}{m}\Bigr)\,E\bigl(\exp(m f)\bigr)^{1/m} \qquad [73]$$

have the same distribution. In particular, they can be
freely substituted under averages.
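
The point process described above is easy to sample, which helps to build intuition for the random weights. The sketch below is an illustration and not part of the original argument: it uses the standard change of variables $y = -\log\Gamma$, where $\Gamma$ runs over the arrival times of a unit-rate Poisson process, so that the expected number of points in $(a,b)$ is $\int_a^b \exp(-y)\,dy$. Only a finite number of points is generated, so `truncated_v` is a partial sum of the partition function $v$; the function names are hypothetical.

```python
import math
import random

def sample_levels(n_points, rng):
    """First n_points of the process with intensity exp(-y) dy, in decreasing order."""
    levels, arrival = [], 0.0
    for _ in range(n_points):
        arrival += rng.expovariate(1.0)      # arrival times of a unit-rate Poisson process
        levels.append(-math.log(arrival))    # y = -log(arrival)
    return levels

def truncated_v(levels, m):
    """Partial sum of v = sum_alpha exp(y_alpha / m) over the sampled points."""
    return sum(math.exp(y / m) for y in levels)

def normalized_weights(levels, m):
    """Gibbs-like weights exp(y_alpha / m) / v on the sampled points."""
    v = truncated_v(levels, m)
    return [math.exp(y / m) / v for y in levels]

if __name__ == "__main__":
    rng = random.Random(0)
    levels = sample_levels(2000, rng)
    weights = normalized_weights(levels, m=0.5)
    print(levels[:3])
    print(weights[:3], sum(weights))
```

For $0 < m < 1$ the terms decay roughly like $\alpha^{-1/m}$, so the sum is dominated by the first few points, in line with the remark that $v$ is a well defined random variable.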

The auxiliary system which gives rise to the Parisi
representation according to [66] and [67], for a
piecewise constant order parameter, is expressed in
the following way. Now $\alpha$ will be a multi-index
$\alpha = (\alpha_1, \alpha_2, \dots, \alpha_K)$, where each $\alpha_a$ runs on
$1, 2, 3, \dots$. Define the Poisson point process $y_{\alpha_1}$, then,
independently, for each value of $\alpha_1$ processes $y_{\alpha_1\alpha_2}$,
and so on up to $y_{\alpha_1\alpha_2\dots\alpha_K}$. Notice that in the cascade of
independent processes $y_{\alpha_1}, y_{\alpha_1\alpha_2}, \dots, y_{\alpha_1\alpha_2\dots\alpha_K}$ the last
index refers to the numbering of the various points of
the process, while the first indices denote independent
copies labeled by the corresponding $\alpha$'s.

The weights $w_\alpha$ have to be chosen according to
the definition

$$w_\alpha = \exp\frac{y_{\alpha_1}}{m_1}\,\exp\frac{y_{\alpha_1\alpha_2}}{m_2}\cdots\exp\frac{y_{\alpha_1\alpha_2\dots\alpha_K}}{m_K} \qquad [74]$$

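The cavity fields $\eta$ and $K$ have the following
expression in terms of independent unit Gaussian
random variables $J^i_{\alpha_1}, J^i_{\alpha_1\alpha_2}, \dots, J^i_{\alpha_1\alpha_2\dots\alpha_K}$,
$J'_{\alpha_1}, J'_{\alpha_1\alpha_2}, \dots, J'_{\alpha_1\alpha_2\dots\alpha_K}$,

$$\eta_i(\alpha) = \sqrt{q_1 - q_0}\,J^i_{\alpha_1} + \sqrt{q_2 - q_1}\,J^i_{\alpha_1\alpha_2} + \dots + \sqrt{q_K - q_{K-1}}\,J^i_{\alpha_1\alpha_2\dots\alpha_K} \qquad [75]$$

$$K(\alpha) = \sqrt{q_1^2 - q_0^2}\,J'_{\alpha_1} + \sqrt{q_2^2 - q_1^2}\,J'_{\alpha_1\alpha_2} + \dots + \sqrt{q_K^2 - q_{K-1}^2}\,J'_{\alpha_1\alpha_2\dots\alpha_K} \qquad [76]$$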


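It is immediate to verify that $E(\eta_i(\alpha)\eta_{i'}(\alpha'))$ is zero if
$i \neq i'$, while

$$E\bigl(\eta_i(\alpha)\eta_i(\alpha')\bigr) = \begin{cases} 0 & \text{if } \alpha_1 \neq \alpha'_1, \\ q_1 & \text{if } \alpha_1 = \alpha'_1,\ \alpha_2 \neq \alpha'_2, \\ q_2 & \text{if } \alpha_1 = \alpha'_1,\ \alpha_2 = \alpha'_2,\ \alpha_3 \neq \alpha'_3, \\ \ \vdots & \\ 1 & \text{if } \alpha_1 = \alpha'_1,\ \alpha_2 = \alpha'_2, \dots, \alpha_K = \alpha'_K \end{cases} \qquad [77]$$

Similarly, we have

$$E\bigl(K(\alpha)K(\alpha')\bigr) = \begin{cases} 0 & \text{if } \alpha_1 \neq \alpha'_1, \\ q_1^2 & \text{if } \alpha_1 = \alpha'_1,\ \alpha_2 \neq \alpha'_2, \\ q_2^2 & \text{if } \alpha_1 = \alpha'_1,\ \alpha_2 = \alpha'_2,\ \alpha_3 \neq \alpha'_3, \\ \ \vdots & \\ 1 & \text{if } \alpha_1 = \alpha'_1,\ \alpha_2 = \alpha'_2, \dots, \alpha_K = \alpha'_K \end{cases} \qquad [78]$$

This ends the definition of the $\alpha$ system, associated
to a given piecewise constant order parameter.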


Now, it is simple to verify that [66] and [67]
hold. Let us consider, for example, [66]. With the
$\alpha$ system chosen as before, the repeated application
of the stochastic equivalence of [72] and [73]
will give rise to a sequence of interchained
Gaussian integrations exactly equivalent to those
arising from the expression for $f$, as solution of
the eqn [60]. For [67], there are equivalent
considerations.

Therefore, we see that the estimate in Theorem 3 
is also a consequence of the generalized variational 
principle. 

Up to this point we have seen how to obtain
upper bounds. The problem arises whether, as in the
ferromagnetic case, we can also get lower bounds,
so as to shrink the thermodynamic limit to the value
given by the infimum in Theorem 3. After a short
announcement, Talagrand (2006) has firmly established
the complete proof of the control of the lower
bound. We refer to the original paper for the
complete details of this remarkable achievement.
About the methods, here we only recall that in
Guerra (2003) we have given also the corrections to
the bounds appearing in Theorem 3, albeit in a quite
complicated form. Talagrand has been able to
establish that these corrections do in fact vanish in
the thermodynamic limit.

In conclusion, we can establish the following 
extension of Theorem 1 to spin glasses. 


Theorem 4 For the mean-field spin glass model we
have

$$\lim_{N\to\infty} N^{-1}E\log Z_N(\beta,h,J) = \sup_N N^{-1}E\log Z_N(\beta,h,J) \qquad [79]$$

$$= \inf_x\Bigl(\log 2 + f(0,h;x,\beta) - \frac{\beta^2}{2}\int_0^1 q\,x(q)\,dq\Bigr) \qquad [80]$$


Diluted Models 


Diluted models, in a sense, play a role intermediate 
between the mean-field case and the short-range 
case. In fact, while in the mean-field model each site 
is interacting with all other sites, on the other hand, 
in the diluted model, each site is interacting with 
only a fixed number of other sites. However, while 
for the short-range models there is a definition of 
distance among sites, relevant for the interaction, no 
such definition appears in the diluted models, where 
all sites are in any case equivalent. From this point
of view, the diluted models are structurally similar
to the mean-field models, and most of the
techniques and results explained before can be
extended to them.

Let us define a typical diluted model. The
quenched noise is described as follows. Let $K$ be a
Poisson random variable with parameter $\alpha N$, where
$N$ is the number of sites, and $\alpha$ is a parameter
entering the theory, together with the temperature.
We consider also a sequence of independent centered
random variables $J_1, J_2, \dots$, and a sequence of
discrete independent random variables $i_1, j_1,
i_2, j_2, \dots$, uniformly distributed over the set of sites
$1, 2, \dots, N$. Then we assume as Hamiltonian

$$H_N(\sigma) = -\sum_{k=1}^{K} J_k\,\sigma_{i_k}\sigma_{j_k} \qquad [81]$$

Only the variables $\sigma$ contribute to thermodynamic
equilibrium. All noise coming from $K$, $J_k$, $i_k$, $j_k$ is
considered quenched, and it is not explicitly indicated
in our notation for $H$.

The role played by Gaussian integration by
parts in the Sherrington-Kirkpatrick model is here
assumed by the following elementary derivation
formula, holding for Poisson distributions,

$$\frac{d}{dt}P(K = k, t\alpha N) \equiv \frac{d}{dt}\exp(-t\alpha N)\,(t\alpha N)^k/k! = \alpha N\bigl(P(K = k-1, t\alpha N) - P(K = k, t\alpha N)\bigr) \qquad [82]$$
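
The Poisson derivation formula [82] can be checked directly by finite differences. The sketch below is an illustration: the values of $\alpha$, $N$, and $t$ are arbitrary, the convention $P(K = -1) = 0$ is used for the term with $k = 0$, and the function names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for a Poisson variable of parameter lam, with P(K = -1) = 0."""
    if k < 0:
        return 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def derivative_lhs(k, t, alpha, N, eps=1e-6):
    """Central finite difference of t -> P(K = k, t alpha N)."""
    up = poisson_pmf(k, (t + eps) * alpha * N)
    down = poisson_pmf(k, (t - eps) * alpha * N)
    return (up - down) / (2.0 * eps)

def derivative_rhs(k, t, alpha, N):
    """Right-hand side of [82]."""
    lam = t * alpha * N
    return alpha * N * (poisson_pmf(k - 1, lam) - poisson_pmf(k, lam))

if __name__ == "__main__":
    for k in range(6):
        print(k, derivative_lhs(k, 0.5, 1.5, 4), derivative_rhs(k, 0.5, 1.5, 4))
```

The shift from $k$ to $k - 1$ on the right-hand side is what produces, inside an interpolation, the comparison between a system with one more coupling and the original one.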


Then, all the machinery of interpolation can be easily
extended to the diluted models, as first recognized
by Franz and Leone (2003).

In this way, the superadditivity property, the
thermodynamic limit, and the generalized variational
principle can be easily established. We refer to
Franz and Leone (2003), and De Sanctis (2005), for
a complete treatment.

There is an important open problem here. In the
fully connected case, the Poisson probability
cascades provide the right auxiliary $\alpha$ systems to be
exploited in the variational principle. In the
diluted case, on the other hand, more complicated
probability cascades have been proposed, as shown, for
example, in Franz and Leone (2003), and in
Panchenko and Talagrand (2004). However,
in De Sanctis (2005), the very interesting
proposal has been made that also in the case of
diluted models the Poisson probability cascades play
a very important role. Of course, here the auxiliary
system interacts with the original system differently,
and involves a multi-overlap structure as explained
in De Sanctis (2005). In this way a kind of very deep
universality is emerging. Poisson probability cascades
are a kind of universal class of auxiliary
systems. The different models require different
cavity fields ruling the interaction between the
original system and the auxiliary system. But further
work will be necessary in order to clarify this very
important issue. For results about diluted models in
the high-temperature region, we refer to Guerra and
Toninelli (2004).


Short-Range Model and Its Connections 
with the Mean-Field Version 


The investigations of the connections between the 
short-range version of the model and its mean-field 
version are at the beginning. Here, we limit ourselves 
to a synthetic description of what should be done, and 
to a short presentation of the results obtained so far. 

First of all, according to the conventional wisdom, 
the mean-field version should be a kind of limit of the 
short-range model on a lattice in dimension $d$, when 
$d \to \infty$, with a proper rescaling of the strength of the 
Hamiltonian, of the form $d^{-1/2}$. Results of this kind 
are very well known in the ferromagnetic case, but 
the present technology of interpolation does not seem 
sufficient to assure a proof in the spin glass case. So, 
this very basic result is still missing. In analogy with 
the ferromagnetic case, it would be necessary to 
arrive at the notion of a critical dimension, beyond 
which the features of the mean-field case still hold, 
for example, in the expression of the critical 
exponents and in the ultrametric hierarchical struc- 
ture of the pure phases, or at least for the overlap 
distributions. For physical dimensions less than the 
critical one, the short-range model would need 
corrections with respect to its mean-field version. 
Therefore, this is a completely open problem. 

Moreover, again according to the conventional 
wisdom, the mean-field version should be a kind of 
limit of the short-range models, in finite fixed 
dimensions, as the range of the interaction goes to 
infinity, with proper rescaling. Important work of 
Franz and Toninelli shows that this is indeed the 
case, if a properly defined Kac limit is performed. 
Here, interpolation methods are effective, and we 
refer to Franz and Toninelli (2004), and references 
quoted there, for full details. 

Due to the lack of efficient analytical methods, it is 
clear that numerical simulations play a very important 
role in the study of the physical properties emerging 
from short-range spin glass models. In particular, we 
refer to Marinari et al. (2000) for a detailed account of 
the evidence, coming from theoretical considerations 
and extensive computer simulations, that some of the 
more relevant features of the spontaneous replica 
symmetry breaking scheme of the mean-field case are 
also present in short-range models in three 
dimensions. Different views are expressed, for 
example, in Newman and Stein (1998), where it is 
argued that the phase-space structure of short-range 
spin glass models is much simpler than that foreseen 
by the Parisi spontaneous replica symmetry breaking 
mechanism. 

Such very different views, both apparently 
strongly supported by reasonable theoretical con- 
siderations and powerful numerical simulations, are 
a natural consequence of the extraordinary difficulty 
of the problem. 

It is clear that extensive additional work will be 
necessary before the physical features exhibited by 
realistic short-range spin glass models can be 
clarified. 


Conclusion and Outlook for Future 
Developments 


As we have seen, in these last few years there has 
been impressive progress in the understanding of 
the mathematical structure of spin glass models, 
mainly due to the systematic exploration of comparison 
and interpolation methods. However, many 
important problems are still open. The most 
important one is to establish rigorously the full 
hierarchical ultrametric organization of the overlap 
distributions, as appears in Parisi theory, and to 
fully understand the decomposition in pure states of 
the glassy phase, at low temperatures. 

Moreover, it would be important to extend these 
methods to other important disordered models, such 
as neural networks. Here the difficulty is 
that the positivity arguments, so essential in comparison 
methods, do not seem to emerge naturally 
inside the structure of the theory. 

Finally, the problem of connecting the properties of 
the short-range model with those arising in the 
mean-field case is still almost completely open. 


Acknowledgment 


We gratefully acknowledge useful conversations 
with Michael Aizenman, Pierluigi Contucci, Giorgio 
Parisi, and Michel Talagrand. The strategy 
explained in this report grew out of a systematic 
exploration of comparison and interpolation methods, 
developed in collaboration with Fabio Lucio 
Toninelli and Luca De Sanctis. 

This work was supported in part by MIUR 
(Italian Ministry of Instruction, University and 
Research), and by INFN (Italian National Institute 
for Nuclear Physics). 


See also: Glassy Disordered Systems: Dynamical 
Evolution; Large Deviations in Equilibrium Statistical 
Mechanics; Mean Field Spin Glasses and Neural 
Networks; Short-Range Spin Glasses: The Metastate 
Approach; Statistical Mechanics and Combinatorial 
Problems. 


Further Reading 


Aizenman M, Sims R, and Starr S (2003) Extended variational 
principle for the Sherrington–Kirkpatrick spin-glass model. 
Physical Review B 68: 214403. 

De Sanctis L (2005) Structural Approaches to Spin Glasses and 
Optimization Problems. Ph.D. thesis, Department of 
Mathematics, Princeton University. 

Edwards SF and Anderson PW (1975) Theory of spin glasses. 
Journal of Physics F: Metal Physics 5: 965-974. 

Franz S and Leone M (2003) Replica bounds for optimization 
problems and diluted spin systems. Journal of Statistical 
Physics 111: 535-564. 

Franz S and Toninelli FL (2004) The Kac limit for finite-range 
spin glasses. Physical Review Letters 92: 030602. 

Guerra F (2001) Sum rules for the free energy in the 
mean field spin glass model. Fields Institute Communica- 
tions 30: 161. 

Guerra F (2003) Broken replica symmetry bounds in the mean 
field spin glass model. Communications in Mathematical 
Physics 233: 1-12. 

Guerra F and Toninelli FL (2002) The thermodynamic limit in 
mean field spin glass models. Communications in 
Mathematical Physics 230: 71-79. 

Guerra F and Toninelli FL (2004) The high temperature region of 
the Viana–Bray diluted spin glass model. Journal of Statistical 
Physics 115: 531-555. 

Marinari E, Parisi G, Ricci-Tersenghi F, Ruiz-Lorenzo JJ, and 
Zuliani F (2000) Replica symmetry breaking in short range 
spin glasses: A review of the theoretical foundations and of the 
numerical evidence. Journal of Statistical Physics 98: 
973-1074. 

Mézard M, Parisi G, and Virasoro MA (1987) Spin Glass Theory 
and Beyond. Singapore: World Scientific. 

Mézard M, Parisi G, and Zecchina R (2002) Analytic and 
algorithmic solution of random satisfiability problems. Science 
297: 812. 

Newman CM and Stein DL (1998) Simplicity of state and overlap 
structure in finite-volume realistic spin glasses. Physical 
Review E 57: 1356-1366. 

Panchenko D and Talagrand M (2004) Bounds for diluted 
mean-field spin glass models. Probability Theory and Related 
Fields 130: 319-336. 

Sherrington D and Kirkpatrick S (1975) Solvable model of a spin- 
glass. Physical Review Letters 35: 1792-1796. 

Stein DL (1989) Disordered systems: Mostly spin glasses. In: Stein 
DL (ed.) Lectures in the Sciences of Complexity. New York: 
Addison-Wesley. 

Talagrand M (2003) Spin Glasses: A Challenge for Mathemati- 
cians. Mean Field Models and Cavity Method. Berlin: 
Springer. 

Talagrand M (2006) The Parisi formula. Annals of Mathematics 
163: 221-263. 

Young P (ed.) (1987) Spin Glasses and Random Fields. Singapore: 
World Scientific. 


Spinors and Spin Coefficients 


K P Tod, University of Oxford, Oxford, UK 
© 2006 Elsevier Ltd. All rights reserved. 


Introduction 


Spinors were invented by the mathematician 
E Cartan (see, e.g., Cartan (1981)) in the early 
years of the last century in the course of his study of 
rotation groups. The physicist Pauli reinvented what 
Cartan would have called the spinors of SU(2), 
which is the double cover of the rotation group 
SO(3), in order to explain the spectroscopy of alkali 
atoms and the anomalous Zeeman effect. For this, 
he needed an essential two-valuedness of the 
electron, an internal quantum number to contribute 
to the angular momentum, which is now called spin. 
Now the wave function becomes a two-component 
column vector. It is worth noting that, despite the 
name, Pauli resisted the picture of an electron as a 
spinning “thing” on the grounds that, as a repre- 
sentation of SU(2) which was not a representation of 
SO(3), it should have no classical kinematic model, 
which a spinning object would have. 

According to the review article of van der Waerden 
(1960), the term “spinor” is due to Ehrenfest in 
1929, and was introduced in the flurry of interest 
after the next important step in the evolution of 
spinors in the physics literature, which was the 
introduction of a relativistic equation for the 
electron by Dirac (1928). 

Dirac sought a linear, first-order but Lorentz-invariant 
equation for the electron which was to be 
the square root of the linear, Lorentz-invariant but 
second-order Klein–Gordon equation. He assumed 
the equation for the wave function $\psi$ would take 
the form 


$$D\psi := (\gamma^a p_a + mcI)\psi = 0 \qquad [1]$$


where $p_a = -i\hbar\,\partial/\partial x^a$ for $a = 0, 1, 2, 3$, but where $\gamma^a$ 
are complex square matrices, of a size to be 
determined, and $I$ is the corresponding identity 
matrix. Differentiating [1] again, one obtains the 
Klein–Gordon equation for $\psi$ provided these 
matrices satisfy the equation 

$$\gamma^a\gamma^b + \gamma^b\gamma^a = 2\eta^{ab} I \qquad [2]$$


where $\eta^{ab}$ is the Minkowski metric, $\mathrm{diag}(1, -1, 
-1, -1)$. 

Assuming the $\gamma^a$ have been found, the usual 
substitution $p \to p - ieA$, for a particle in a magnetic 
field with vector potential $A$, leads to the 
correct magnetic moment for the electron, so that 
this equation does describe an electron with spin in 
the form made familiar by Pauli. 

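To decide on the size of the matrices $\gamma^a$ and 
therefore the dimension of the space of $\psi$'s, one 
notices, with the aid of [2], that the following are a 
basis for the algebra generated by the $\gamma^a$'s: 


$$1, \quad \gamma^a, \quad \gamma^{[a}\gamma^{b]}, \quad \gamma^{[a}\gamma^b\gamma^{c]}, \quad \gamma^{[a}\gamma^b\gamma^c\gamma^{d]} \qquad [3]$$
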
There are 16 elements in this basis, assuming that 
there are no extra identities among them, so that we 
might hope to find a representation as $4 \times 4$ 
matrices. This can be done, and Dirac gave explicit 
formulas in terms of Pauli matrices. The space of 
Dirac spinors is now a complex four-dimensional 
vector space, which turns out to split as the sum of a 
complex two-dimensional vector space $S$, which is 
referred to as a spin space, and its complex 
conjugate $\bar{S}$ (the relationship between a complex 
vector space and its complex conjugate is described 
in the text below and eqn [9]). Under proper, 
orthochronous Lorentz transformations, $S$ transforms 
into itself by an $SL(2,\mathbb{C})$ transformation, but 
space and time reflections relate $S$ to $\bar{S}$. The fact that 
there are two spin spaces $S$ and $\bar{S}$ in dimension 4 is 
the basis of chirality: an electron is represented by a 
Dirac spinor, which is a pair of spinors, one in each 
of $S$ and $\bar{S}$, which are related under space reflection; 
a particle represented just by a spinor in $S$ cannot be 
invariant under space reflection. 

The Clifford algebra (see Clifford Algebras and 
their Representations) associated with a vector space 
$V$ with metric $g$ is defined as the algebra generated 
by elements $v, w$ of $V$ with the multiplication $\vee$ 
satisfying 


$$v \vee w + w \vee v = 2g(v, w) \qquad [4]$$

The matrices $\gamma^a$ define a representation of the 
Clifford algebra by associating a covector $v_a$ with a 
matrix $v = v_a\gamma^a$, since [2] is then equivalent to [4]. 
This part of the process works in any dimension $n$ 
and signature $s$. For odd $n$, as, for example, with Pauli 
spinors, the $\gamma^a$ are square matrices of size $2^N \times 2^N$, 
where $N = (n - 1)/2$, and there is a single spin space 
of dimension $2^N$. For even $n$, as with the original Dirac 
spinors, the $\gamma^a$ are square matrices of size $2^N \times 2^N$, 
where $N = n/2$, but there are two spin spaces each of 
dimension $2^{N-1}$. Reality properties of the spin spaces 
and the existence of other structures on them depend 
in an intricate way on $n$ and $s$ (Penrose and Rindler 
1984, 1986, Benn and Tucker 1987). 
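
The four-dimensional case can be checked by hand or by machine. The sketch below builds one standard set of $4 \times 4$ matrices from the Pauli matrices (the Dirac representation, chosen here purely as an example; the text does not say which explicit formulas are meant) and verifies the anticommutation relation with the sign convention of eqn [4], namely $\gamma^a\gamma^b + \gamma^b\gamma^a = 2\eta^{ab}I$ with $\eta = \mathrm{diag}(1,-1,-1,-1)$. All function and variable names are hypothetical.

```python
import math


def matmul(x, y):
    """Product of two square matrices stored as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def neg(m):
    return [[-entry for entry in row] for row in m]


def block(a, b, c, d):
    """Assemble a 4x4 matrix from four 2x2 blocks [[a, b], [c, d]]."""
    top = [a[r] + b[r] for r in range(2)]
    bottom = [c[r] + d[r] for r in range(2)]
    return top + bottom


ZERO = [[0, 0], [0, 0]]
ONE = [[1, 0], [0, 1]]
PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

# Dirac representation (an assumption of this sketch, not taken from the text).
GAMMA = [block(ONE, ZERO, ZERO, neg(ONE))] + \
        [block(ZERO, s, neg(s), ZERO) for s in PAULI]
ETA = [1, -1, -1, -1]  # diagonal of the Minkowski metric


def clifford_defect(a, b):
    """Largest entry of gamma^a gamma^b + gamma^b gamma^a - 2 eta^{ab} I."""
    ab = matmul(GAMMA[a], GAMMA[b])
    ba = matmul(GAMMA[b], GAMMA[a])
    target = 2 * ETA[a] if a == b else 0
    return max(abs(ab[i][j] + ba[i][j] - (target if i == j else 0))
               for i in range(4) for j in range(4))


def basis_size(n=4):
    """Number of antisymmetrized products of n gamma matrices: sum of C(n, k)."""
    return sum(math.comb(n, k) for k in range(n + 1))
```

The count `basis_size(4)` reproduces the 16 basis elements, which is what makes a representation by $4 \times 4$ matrices plausible.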




The dimension of the space of spinors rises rapidly 
with n, which is one reason why historically spinors 
have been most useful in spaces of dimensions 3 and 4, 
where the spin space has dimension 2. In a space of 
dimension 11, a case considered in supergravity, the 
spin space already has dimension 32. 
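
The dimension counts quoted in this section follow from the rule for odd and even $n$ stated earlier. A small sketch, with a hypothetical function name, makes the arithmetic explicit:

```python
def spinor_sizes(n):
    """Return (gamma-matrix size, dimension of each spin space, number of spin spaces).

    Uses the rule from the text: for odd n, N = (n - 1) / 2 and there is one
    spin space of dimension 2**N; for even n, N = n / 2 and there are two
    spin spaces, each of dimension 2**(N - 1).
    """
    if n < 1:
        raise ValueError("dimension must be a positive integer")
    if n % 2 == 1:
        big_n = (n - 1) // 2
        return 2 ** big_n, 2 ** big_n, 1
    big_n = n // 2
    return 2 ** big_n, 2 ** (big_n - 1), 2
```

With this rule, dimensions 3 and 4 both give two-dimensional spin spaces, while dimension 11 gives a single spin space of dimension 32.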


Spinors in General Relativity: Spinor 
Algebra 


In this section, we start again with a different 
emphasis. Conventions follow Penrose and Rindler 
(1984, 1986). To introduce spinors as a calculus in a 
four-dimensional, Lorentzian spacetime $M$, one can 
begin by choosing an orthonormal tetrad of vectors 
$(e_0, e_1, e_2, e_3)$ at a point $p$. The following 
conventions are used: 


$$g(e_a, e_b) = \eta_{ab} = \mathrm{diag}(1, -1, -1, -1)$$


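Any vector $v$ in the tangent space $V = T_pM$ at $p$ has 
components $v^a$ in this basis, which we arrange as a 
matrix and label in two ways: 


$$\Psi(v) = \frac{1}{\sqrt{2}}\begin{pmatrix} v^0 + v^3 & v^1 + iv^2 \\ v^1 - iv^2 & v^0 - v^3 \end{pmatrix} = \begin{pmatrix} v^{00'} & v^{01'} \\ v^{10'} & v^{11'} \end{pmatrix} \qquad [5]$$

The reason for the factor $1/\sqrt{2}$ will be seen below, 
as will the rationale for the second form of the 
matrix. Note that $\Psi(v)$ is Hermitian and that 


$$2\det\Psi(v) = g(v, v) = \eta_{ab}v^a v^b \qquad [6]$$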


Clearly, there is a one-to-one correspondence 
between elements of $V$ and Hermitian $2 \times 2$ 
matrices. Further, if $t$ is any matrix in $SL(2,\mathbb{C})$, then 
the transformation 


$$\Psi(v) \to t\,\Psi(v)\,t^\dagger \qquad [7]$$


where $t^\dagger$ is the Hermitian conjugate of $t$, is linear in $v$, 
and preserves both Hermiticity and the norm of $v$. 
Thus, it must represent a Lorentz transformation. It is 
straightforward to check that it is a proper, orthochronous 
Lorentz transformation and that all such 
transformations arise in this way (recall that “proper” 
means transformations of determinant 1 so that 
orientation is preserved, and “orthochronous” means 
that future-pointing timelike or null vectors are taken 
to future-pointing timelike or null vectors, so that time 
orientation is preserved; the proper, orthochronous 
Lorentz group is equivalently the identity-connected 
component of the Lorentz group). Since both $t$ and $-t$ 
give the same Lorentz transformation, this provides an 
explicit demonstration of the (2–1)-homomorphism 
of $SL(2,\mathbb{C})$ with the proper, orthochronous Lorentz 
group $O^\uparrow_+(1,3)$. 
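
The claims about the transformation [7] can be tested numerically. The following sketch, in plain Python with hypothetical helper names, maps a real vector to the Hermitian matrix of eqn [5], conjugates it by a unimodular matrix $t$, and maps back. The helper `sl2c`, which completes three arbitrary complex numbers to a matrix of determinant 1, is a convenience introduced only for this example.

```python
import math

SQRT2 = math.sqrt(2.0)


def to_matrix(v):
    """Hermitian matrix Psi(v) of eqn [5] for a real vector (v0, v1, v2, v3)."""
    v0, v1, v2, v3 = v
    return [[(v0 + v3) / SQRT2, (v1 + 1j * v2) / SQRT2],
            [(v1 - 1j * v2) / SQRT2, (v0 - v3) / SQRT2]]


def to_vector(m):
    """Inverse of to_matrix; assumes m is Hermitian."""
    v0 = (m[0][0] + m[1][1]) / SQRT2
    v1 = (m[0][1] + m[1][0]) / SQRT2
    v2 = (m[0][1] - m[1][0]) / (1j * SQRT2)
    v3 = (m[0][0] - m[1][1]) / SQRT2
    return tuple(complex(x).real for x in (v0, v1, v2, v3))


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(x):
    return [[complex(x[j][i]).conjugate() for j in range(2)] for i in range(2)]


def lorentz_action(t, v):
    """Apply Psi(v) -> t Psi(v) t^dagger and read off the new components."""
    return to_vector(matmul(matmul(t, to_matrix(v)), dagger(t)))


def minkowski(u, v):
    return u[0] * v[0] - u[1] * v[1] - u[2] * v[2] - u[3] * v[3]


def sl2c(a, b, c):
    """Hypothetical helper: choose d so that a d - b c = 1 (requires a != 0)."""
    return [[a, b], [c, (1 + b * c) / a]]
```

Applying `lorentz_action` with `t` and with `-t` to the same vector should return the same result, which is the two-to-one property mentioned above, and the Minkowski norm of the vector should be unchanged.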


If the vector $v$ in [5] is null, then the matrix has 
vanishing determinant, or, equivalently, it has rank 
1, and so it can be written as the outer product of a 
two-component column vector $\alpha = (\alpha^0, \alpha^1)^T$ and its 
Hermitian conjugate: 


$$\Psi(v) = \alpha\alpha^\dagger \qquad [8]$$

Furthermore, under [7], $\alpha$ transforms as 

$$\alpha \to t\alpha \qquad [9]$$


The two-complex-dimensional space to which $\alpha$ 
belongs is the spin space $S$ at $p$, already met in the 
previous section, and it follows from [8], since null 
vectors span $V$, that the tensor product $S \otimes \bar{S}$ of $S$ with 
its complex conjugate vector space $\bar{S}$ is the 
complexification of $V$. Complex conjugation gives an antilinear 
map from $S$ to $\bar{S}$. (One associates the complex-conjugate 
vector space $\bar{V}$ to any given complex vector 
space $V$ as follows: scalar multiplication for $V$ can be 
considered as a function $\phi: \mathbb{C} \times V \to V$ given by 
$\phi(z, v) = zv$, while vector addition is a map $\psi: V \times 
V \to V$ given by $\psi(u, v) = u + v$. Define another 
complex vector space by taking the same vectors and 
the same $\psi$ but with scalar multiplication $\bar\phi$, where 
$\bar\phi(z, v) = \phi(\bar{z}, v)$. This is the complex-conjugate vector 
space $\bar{V}$. Given a choice of basis, we think of $V$ as, say, 
$n$-component column vectors of complex numbers, 
and then $\bar{V}$ is the corresponding complex-conjugate 
columns.) 

Conventionally, $S$ is the space of unprimed spinors 
and $\bar{S}$ the space of primed spinors, and one also has 
the two duals $S'$ and $\bar{S}'$, which are associated in the 
corresponding way to the dual $V'$ of $V$. Analogously 
to the situation with vectors and covectors, index 
conventions for spinors are as follows: 

$$\alpha^A \in S, \quad \beta^{A'} \in \bar{S}, \quad \gamma_A \in S', \quad \delta_{A'} \in \bar{S}'$$

where $A = 0, 1$ and $A' = 0', 1'$. 

Spinor algebra mirrors tensor algebra: a spinor 
$\chi^{A_1 \ldots A_p A'_1 \ldots A'_q}{}_{B_1 \ldots B_r B'_1 \ldots B'_s}$ is an element of the tensor 
product of $p$ copies of $S$, $q$ copies of $\bar{S}$, $r$ copies of $S'$, 
and $s$ copies of $\bar{S}'$. The second way of writing the 
matrix in [5] enables the identification of a vector 
with a matrix to be conventionally written as 


$$v^a = v^{AA'} \qquad [10]$$


and then extended to any tensor $T^{a \ldots c}{}_{b \ldots d}$ by replacing 
each vector index, say $b$, with a pair $BB'$ of spinor 
indices. In particular, from [8], it follows that any 
real null vector $n^a$ can be written in the form 


$$n^a = \nu^A\bar\nu^{A'}$$


for some spinor $\nu^A$. 
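
To make the correspondence between spinors and null vectors concrete, the sketch below starts from a two-component spinor, forms the outer product of eqn [8], and reads off the vector components by inverting eqn [5]. The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def null_vector_from_spinor(alpha):
    """Vector components v^a whose matrix [5] equals alpha alpha^dagger (eqn [8])."""
    a0, a1 = complex(alpha[0]), complex(alpha[1])
    m00, m01 = a0 * a0.conjugate(), a0 * a1.conjugate()
    m10, m11 = a1 * a0.conjugate(), a1 * a1.conjugate()
    v0 = (m00 + m11) / SQRT2
    v1 = (m01 + m10) / SQRT2
    v2 = (m01 - m10) / (1j * SQRT2)
    v3 = (m00 - m11) / SQRT2
    return tuple(x.real for x in (v0, v1, v2, v3))


def minkowski_norm(v):
    """eta_{ab} v^a v^b with eta = diag(1, -1, -1, -1)."""
    return v[0] ** 2 - v[1] ** 2 - v[2] ** 2 - v[3] ** 2
```

The resulting vector is null with positive time component, and multiplying the spinor by a phase leaves the vector unchanged, so a null vector determines its spinor only up to phase.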


One must pay attention to the order of spinor 
indices of a given type, primed or unprimed, but by 
convention may permute primed and unprimed 
indices. A spinor with an equal number n of primed 
and unprimed indices corresponds to a tensor of 
valence n, and the tensor is real if the spinor satisfies 
a suitable Hermiticity relation. 

Spinors may have various symmetries among their 
indices, much as tensors have. However, since S is two 
dimensional, there is only a one-dimensional space of 
2-forms on S. This has two consequences: no spinor 
can be antisymmetric over more than two indices; and 
if we make a choice of canonical 2-form, all spinors 
can be written in terms of symmetric spinors and the 
canonical 2-form. This is a decomposition of spinors 
into irreducibles for SL(2, C). 

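One makes a choice of 2-form $\epsilon_{AB}$ according to 


$$\epsilon_{AB} = -\epsilon_{BA}, \qquad \epsilon_{01} = 1$$


There is an inverse $\epsilon^{AB}$ defined by 


$$\epsilon^{AB}\epsilon_{CB} = \delta^A_C \qquad [11]$$

where $\delta^A_C$ is the Kronecker delta. The complex 
conjugate of $\epsilon_{AB}$ is conventionally written without 
an overbar as $\epsilon_{A'B'}$, and analogously $\epsilon^{A'B'}$ is the 
complex conjugate of $\epsilon^{AB}$. 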

Because of the antisymmetry of $\epsilon_{AB}$, the order of 
indices is crucial in equations such as [11]. The 
2-form $\epsilon_{AB}$ has a role akin to that of a metric as it 
provides an identification of $S$ and its dual, 
according to 


$$\alpha^A \mapsto \alpha_B = \alpha^A\epsilon_{AB}$$


$$\beta_A \mapsto \beta^A = \epsilon^{AB}\beta_B$$


with corresponding formulas for primed spinors. 
Note that, because of the antisymmetry of $\epsilon_{AB}$, 
necessarily $\alpha_A\alpha^A = 0$ for any $\alpha^A$. 

With conventions made so far, it can be checked 
that 


$$g_{ab}v^a v^b = \epsilon_{AB}\epsilon_{A'B'}v^{AA'}v^{BB'} \qquad [12]$$


for any vector $v^a$, where $g_{ab}$ is the spacetime metric 
at $p$, so that 


$$g_{ab} = \epsilon_{AB}\epsilon_{A'B'}$$


It is the desire to have this formula without 
constants that necessitates the choice of the factor 
$1/\sqrt{2}$ in [5]. 
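
Equation [12] can be verified by direct summation. In the sketch below, the components $v^{AA'}$ are taken from the second form of the matrix in [5] and contracted with two copies of the antisymmetric 2-form, normalized so that its 01 component is 1; the result is compared with the Minkowski norm. The names are hypothetical, and the same array is used for the primed and unprimed 2-form because their components coincide.

```python
import math

SQRT2 = math.sqrt(2.0)
EPS = [[0, 1], [-1, 0]]  # components eps_{AB}; also used for eps_{A'B'}


def spinor_components(v):
    """Components v^{AA'} read off from the matrix in eqn [5]."""
    v0, v1, v2, v3 = v
    return [[(v0 + v3) / SQRT2, (v1 + 1j * v2) / SQRT2],
            [(v1 - 1j * v2) / SQRT2, (v0 - v3) / SQRT2]]


def spinor_norm(v):
    """eps_{AB} eps_{A'B'} v^{AA'} v^{BB'}, summed over all four indices."""
    s = spinor_components(v)
    total = 0
    for a in range(2):
        for b in range(2):
            for ap in range(2):
                for bp in range(2):
                    total += EPS[a][b] * EPS[ap][bp] * s[a][ap] * s[b][bp]
    return total


def metric_norm(v):
    """eta_{ab} v^a v^b with eta = diag(1, -1, -1, -1)."""
    return v[0] ** 2 - v[1] ** 2 - v[2] ** 2 - v[3] ** 2
```

Dropping the factor $1/\sqrt{2}$ in `spinor_components` would make the two norms differ by a factor of 2, which is the point of the normalization.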

One final piece of spinor algebra that we note is 
the following: given a symmetric spinor $\phi_{A_1 \ldots A_n}$ there 
is a factorization 


$$\phi_{A_1 \ldots A_n} = \alpha^{(1)}_{(A_1} \cdots \alpha^{(n)}_{A_n)}$$

where the round brackets indicate symmetrization 
over the indices $A_1, \ldots, A_n$, and the $n$ spinors 
$\alpha^{(1)}_A, \ldots, \alpha^{(n)}_A$, which are determined only up to 
reordering and rescaling, are known as the principal 
spinors of $\phi$. To prove this, note that the principal 
spinors can be identified with the solutions $\zeta^A$ of the 
equation 


$$\phi_{A_1 \ldots A_n}\zeta^{A_1} \cdots \zeta^{A_n} = 0$$


and there are $n$ of these, counting multiplicities, by 
the “fundamental theorem of algebra.” 
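
For a two-index symmetric spinor the argument reduces to solving a quadratic, which the following sketch carries out. It builds $\phi_{AB}$ from two given lower-index spinors, sets $\zeta^A = (1, z)$, and solves for $z$. The restriction to $n = 2$, the assumption that the component $\phi_{11}$ is nonzero (so that neither root lies at infinity), and the function names are all simplifications made for this example.

```python
import cmath


def symmetrize(alpha, beta):
    """phi_{AB} = alpha_(A beta_B) for lower-index spinors alpha_A, beta_A."""
    return [[(alpha[a] * beta[b] + alpha[b] * beta[a]) / 2 for b in range(2)]
            for a in range(2)]


def principal_directions(phi):
    """Solve phi_{AB} zeta^A zeta^B = 0 with zeta^A = (1, z).

    Expanding gives phi_11 z**2 + 2 phi_01 z + phi_00 = 0.
    Assumes phi_11 != 0, so both solutions are finite.
    """
    a, b, c = phi[1][1], 2 * phi[0][1], phi[0][0]
    disc = cmath.sqrt(b * b - 4 * a * c)
    return [(1, (-b + disc) / (2 * a)), (1, (-b - disc) / (2 * a))]


def contract(lower, upper):
    """The scalar alpha_A zeta^A."""
    return lower[0] * upper[0] + lower[1] * upper[1]
```

Each solution $\zeta^A$ is annihilated by one of the two factors, so it is proportional to the corresponding principal spinor with its index raised.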


Spinors in General Relativity: Spinor 
Calculus 


We now want to define spinor fields on the 
spacetime $M$ as sections of a spinor bundle $\mathbb{S}$ 
whose fiber at each point is $S$ and such that the 
tensor product $\mathbb{S} \otimes \bar{\mathbb{S}}$ is the complexified tangent 
bundle. The existence of such an $\mathbb{S}$ imposes global 
restrictions on M: M must be orientable and time 
orientable, and a certain characteristic class, the 
second Stiefel-Whitney class, must vanish (for an 
explanation of these terms see, e.g., Penrose and 
Rindler (1984, 1986)). Assuming that M satisfies 
these conditions, spinor fields can be defined. It is 
convenient to retain the algebraic formulas from the 
previous section (e.g., [10] or [12]) but with indices 
now regarded as abstract (a note on the abstract 
index convention appears in Twistors). 

By an argument analogous to that for the 
fundamental theorem of Riemannian geometry, 
there is a unique covariant derivative that satisfies 
the Leibniz condition, coincides with the Levi-Civita 
derivative on tensors and the gradient on 
scalars, and annihilates $\epsilon_{AB}$ and $\epsilon_{A'B'}$. Following the 
conventions of the previous section, the spinor 
covariant derivative will be denoted as $\nabla_{AA'}$. The 
commutator of derivatives can be written in terms 
of irreducible parts (for $SL(2,\mathbb{C})$) according to the 
formula 


$$\nabla_{AA'}\nabla_{BB'} - \nabla_{BB'}\nabla_{AA'} = \epsilon_{A'B'}\Box_{AB} + \epsilon_{AB}\Box_{A'B'}$$

where $\Box_{AB} = \nabla_{A'(A}\nabla_{B)}{}^{A'}$. The definition of the 
Riemann curvature tensor is in terms of the Ricci 
identity 

$$(\nabla_a\nabla_b - \nabla_b\nabla_a)V^d = R_{abc}{}^d V^c$$

and then this translates into two Ricci identities for 
a spinor field: 

$$\Box_{AB}\alpha_C = X_{ABCD}\alpha^D$$


$$\Box_{A'B'}\alpha_C = \Phi_{A'B'CD}\alpha^D$$

The curvature spinors $X_{ABCD}$ and $\Phi_{A'B'CD}$ are related 
to the curvature tensor. The Ricci spinor $\Phi_{A'B'AB}$ is 
Hermitian and symmetric on both index pairs and is 
a multiple of the trace-free part of the Ricci tensor: 


$$\Phi_{ABA'B'} = -\tfrac{1}{2}\left(R_{ab} - \tfrac{1}{4}Rg_{ab}\right)$$


The spinor $X_{ABCD}$ is symmetric on the first and last 
pairs of indices and decomposes into irreducibles 
according to 


$$X_{ABCD} = \Psi_{ABCD} - 2\Lambda\,\epsilon_{D(A}\epsilon_{B)C}$$


where $\Lambda = R/24$ in terms of the Ricci scalar or scalar 
curvature $R$, while $\Psi_{ABCD}$, which is totally symmetric 
and is known as the Weyl spinor, is related to 
the Weyl tensor $C_{abcd}$ by the equation 


$$C_{abcd} = \Psi_{ABCD}\epsilon_{A'B'}\epsilon_{C'D'} + \bar\Psi_{A'B'C'D'}\epsilon_{AB}\epsilon_{CD}$$


Thus, the ten real components of the Weyl tensor 
are coded into the five complex components of the 
Weyl spinor. 

Following the last remark in the previous section, 
the Weyl spinor has four principal spinors, each of 
which defines a null direction, the principal null 
directions of the Weyl tensor. There is a classification 
of Weyl tensors, the Petrov–Pirani–Penrose 
classification, based on coincidences among the 
principal null directions (Penrose and Rindler 
1984, 1986). 

As a final exercise in spinor calculus, we recall the 
zero-rest-mass equations (see Twistors). In flat 
spacetime, these are the equations 


$$\nabla^{AA'}\phi_{AB \ldots C} = 0$$


on a totally symmetric spinor field $\phi_{AB \ldots C}$. The field 
is said to have spin $s$ if it has $2s$ indices, and the 
cases $s = 1/2$, 1, or 2, respectively, are the Weyl 
neutrino equation, the Maxwell equation, and the 
linearized Einstein equation. In flat spacetime, these 
hyperbolic equations are well understood and 
solvable in a variety of ways. In curved spacetime, 
however, if $s \geq 3/2$, then there are curvature 
obstructions to the existence of solutions, known 
as Buchdahl conditions. This can be seen at once by 
differentiating again, say by $\nabla^B{}_{A'}$, and using the 
spinor Ricci identity. After a little algebra, one finds 


$$\Psi^{ABC}{}_{(D}\phi_{E \ldots F)ABC} = 0$$


so that, whenever the field has three or more indices, 
there are algebraic constraints on its components in 
terms of the Weyl spinor. 


The Spin-Coefficient Formalism 


The spin-coefficient formalism of Newman and 
Penrose is a formalism for spinor calculus in space- 
times (see, e.g., Penrose and Rindler (1984, 1986) 
and Stewart (1990)). It finds application in 
any calculation dealing with curvature tensors, 
including solving the Einstein equations. The form- 
alism exploits the compression of terminology which 
the introduction of complex quantities permits. 

The formalism starts with a choice of spinor dyad, 
a basis of spinor fields $(o^A, \iota^A)$ normalized so that 
$o_A\iota^A = 1$. From the dyad, one constructs a null 
tetrad, which is a basis of vector fields, according to 
the scheme 


$$\ell^a = o^A\bar{o}^{A'}, \quad n^a = \iota^A\bar\iota^{A'}, \quad m^a = o^A\bar\iota^{A'}, \quad \bar{m}^a = \iota^A\bar{o}^{A'}$$


Given the normalization of the spinor dyad, each of 
the vectors in the null tetrad is null (hence the name) 
and all inner products are zero, except for 


$$\ell_a n^a = 1 = -m_a\bar{m}^a$$


It follows that the metric can be written in the 
basis as 


$$g_{ab} = 2\ell_{(a}n_{b)} - 2m_{(a}\bar{m}_{b)}$$
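
The inner products of the null tetrad can be checked in components. The sketch below assumes the usual Newman–Penrose assignment of tetrad vectors to the dyad, $\ell^a = o^A\bar{o}^{A'}$, $n^a = \iota^A\bar\iota^{A'}$, $m^a = o^A\bar\iota^{A'}$, $\bar{m}^a = \iota^A\bar{o}^{A'}$, and evaluates inner products with the spinor form of the metric, eqn [12]. The function names and the sample dyad are hypothetical.

```python
EPS = [[0, 1], [-1, 0]]  # eps_{AB} with eps_{01} = 1; same components for eps_{A'B'}


def outer(x, y):
    """Components x^A conj(y)^{A'} as a 2x2 array indexed [A][A']."""
    return [[x[a] * complex(y[b]).conjugate() for b in range(2)]
            for a in range(2)]


def inner(u, v):
    """g(u, v) = eps_{AB} eps_{A'B'} u^{AA'} v^{BB'}."""
    return sum(EPS[a][b] * EPS[c][d] * u[a][c] * v[b][d]
               for a in range(2) for b in range(2)
               for c in range(2) for d in range(2))


def normalized_dyad(o, iota):
    """Rescale iota so that o_A iota^A = o^0 iota^1 - o^1 iota^0 equals 1."""
    norm = o[0] * iota[1] - o[1] * iota[0]
    return o, (iota[0] / norm, iota[1] / norm)


def null_tetrad(o, iota):
    """Tetrad built from the dyad under the assignment stated in the lead-in."""
    return {"l": outer(o, o), "n": outer(iota, iota),
            "m": outer(o, iota), "mbar": outer(iota, o)}
```

With a normalized dyad, `inner` should return 1 for the pair (`l`, `n`), -1 for (`m`, `mbar`), and 0 for every other pair, including each vector with itself.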


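The components of the covariant derivative in the 
null tetrad are given separate names according to the 
following scheme: 


$$\ell^a\nabla_a = D, \quad n^a\nabla_a = \Delta, \quad m^a\nabla_a = \delta, \quad \bar{m}^a\nabla_a = \bar\delta$$


and the spin coefficients are the 12 components of 
the covariant derivative of the basis. Each is labeled 
with a Greek letter according to the following 
scheme: 


$$\begin{aligned}
Do^A &= \varepsilon o^A - \kappa\iota^A, & \Delta o^A &= \gamma o^A - \tau\iota^A, \\
\delta o^A &= \beta o^A - \sigma\iota^A, & \bar\delta o^A &= \alpha o^A - \rho\iota^A, \\
D\iota^A &= \pi o^A - \varepsilon\iota^A, & \Delta\iota^A &= \nu o^A - \gamma\iota^A, \\
\delta\iota^A &= \mu o^A - \beta\iota^A, & \bar\delta\iota^A &= \lambda o^A - \alpha\iota^A
\end{aligned} \qquad [14]$$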

The spin coefficients code the 24 real Ricci rotation 
coefficients into 12 complex quantities. Some of the 
spin coefficients have direct geometrical interpretation. 
For example, the vanishing of $\kappa$ is the 
condition for the integral curves of $\ell^a$ to be geodesic, 
while, if $\sigma$ is also zero, this congruence of geodesics 
is shear free. The same role is played by $\nu$ and $\lambda$ for 
the $n^a$-congruence. The real and imaginary parts of $\rho$ 
are, respectively, (minus) the expansion and the 
twist of the congruence of integral curves of $\ell^a$. 


In practice, it is often simpler to calculate the spin 
coefficients from the commutators of the basis 
vectors, now regarded as directional derivatives, as 
follows: 


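$$\begin{aligned}
\Delta D - D\Delta &= (\gamma+\bar\gamma)D + (\varepsilon+\bar\varepsilon)\Delta - (\bar\tau+\pi)\delta - (\tau+\bar\pi)\bar\delta \\
\delta D - D\delta &= (\bar\alpha+\beta-\bar\pi)D + \kappa\Delta - (\bar\rho+\varepsilon-\bar\varepsilon)\delta - \sigma\bar\delta \\
\delta\Delta - \Delta\delta &= -\bar\nu D + (\tau-\bar\alpha-\beta)\Delta + (\mu-\gamma+\bar\gamma)\delta + \bar\lambda\bar\delta \\
\bar\delta\delta - \delta\bar\delta &= (\bar\mu-\mu)D + (\bar\rho-\rho)\Delta + (\alpha-\bar\beta)\delta - (\bar\alpha-\beta)\bar\delta
\end{aligned} \qquad [15]$$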


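The commutator of second derivatives applied to 
the spinor dyad expresses the components of the 
curvature tensor in terms of the derivatives of 
the spin coefficients. Before presenting these, we 
adopt a convention for labeling the components of 
curvature. The components of the Weyl spinor are 
given as follows: 


$$\begin{aligned}
\Psi_0 &= \Psi_{ABCD}\,o^A o^B o^C o^D \\
\Psi_1 &= \Psi_{ABCD}\,o^A o^B o^C \iota^D \\
\Psi_2 &= \Psi_{ABCD}\,o^A o^B \iota^C \iota^D \\
\Psi_3 &= \Psi_{ABCD}\,o^A \iota^B \iota^C \iota^D \\
\Psi_4 &= \Psi_{ABCD}\,\iota^A \iota^B \iota^C \iota^D
\end{aligned} \qquad [16]$$

so that these five complex scalars encode the ten real 
components of the Weyl tensor. For the Ricci spinor, set 

$$\begin{aligned}
\Phi_{00} &= \Phi_{ABA'B'}\,o^A o^B \bar{o}^{A'} \bar{o}^{B'}, & \Phi_{01} &= \Phi_{ABA'B'}\,o^A o^B \bar{o}^{A'} \bar\iota^{B'}, \\
\Phi_{02} &= \Phi_{ABA'B'}\,o^A o^B \bar\iota^{A'} \bar\iota^{B'}, & \Phi_{11} &= \Phi_{ABA'B'}\,o^A \iota^B \bar{o}^{A'} \bar\iota^{B'}, \\
\Phi_{12} &= \Phi_{ABA'B'}\,o^A \iota^B \bar\iota^{A'} \bar\iota^{B'}, & \Phi_{22} &= \Phi_{ABA'B'}\,\iota^A \iota^B \bar\iota^{A'} \bar\iota^{B'}
\end{aligned}$$


together with $\Phi_{10} = \bar\Phi_{01}$, $\Phi_{20} = \bar\Phi_{02}$, and $\Phi_{21} = \bar\Phi_{12}$. 
The nine components of the trace-free Ricci tensor 
are encoded in these scalars, of which three are real 
and three complex. The Ricci scalar, as before, is 
replaced by the real scalar $\Lambda = R/24$. 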

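Now the commutators of covariant derivatives on 
the spinor dyad lead to the following system: 


$$\begin{aligned}
D\rho - \bar\delta\kappa &= \rho^2 + \sigma\bar\sigma + (\varepsilon+\bar\varepsilon)\rho - \bar\kappa\tau - (3\alpha+\bar\beta-\pi)\kappa + \Phi_{00} \\
D\sigma - \delta\kappa &= (\rho+\bar\rho+3\varepsilon-\bar\varepsilon)\sigma - (\tau-\bar\pi+\bar\alpha+3\beta)\kappa + \Psi_0 \\
D\tau - \Delta\kappa &= (\tau+\bar\pi)\rho + (\bar\tau+\pi)\sigma + (\varepsilon-\bar\varepsilon)\tau - (3\gamma+\bar\gamma)\kappa + \Psi_1 + \Phi_{01} \\
D\alpha - \bar\delta\varepsilon &= (\rho+\bar\varepsilon-2\varepsilon)\alpha + \beta\bar\sigma - \bar\beta\varepsilon - \kappa\lambda - \bar\kappa\gamma + (\varepsilon+\rho)\pi + \Phi_{10} \\
D\beta - \delta\varepsilon &= (\alpha+\pi)\sigma + (\bar\rho-\bar\varepsilon)\beta - (\mu+\gamma)\kappa - (\bar\alpha-\bar\pi)\varepsilon + \Psi_1 \\
D\gamma - \Delta\varepsilon &= (\tau+\bar\pi)\alpha + (\bar\tau+\pi)\beta - (\varepsilon+\bar\varepsilon)\gamma - (\gamma+\bar\gamma)\varepsilon + \tau\pi - \nu\kappa + \Psi_2 - \Lambda + \Phi_{11} \\
D\lambda - \bar\delta\pi &= (\rho-3\varepsilon+\bar\varepsilon)\lambda + \bar\sigma\mu + (\pi+\alpha-\bar\beta)\pi - \nu\bar\kappa + \Phi_{20} \\
D\mu - \delta\pi &= (\bar\rho-\varepsilon-\bar\varepsilon)\mu + \sigma\lambda + (\bar\pi-\bar\alpha+\beta)\pi - \nu\kappa + \Psi_2 + 2\Lambda \\
D\nu - \Delta\pi &= (\pi+\bar\tau)\mu + (\bar\pi+\tau)\lambda + (\gamma-\bar\gamma)\pi - (3\varepsilon+\bar\varepsilon)\nu + \Psi_3 + \Phi_{21} \\
\Delta\lambda - \bar\delta\nu &= -(\mu+\bar\mu+3\gamma-\bar\gamma)\lambda + (3\alpha+\bar\beta+\pi-\bar\tau)\nu - \Psi_4 \\
\delta\rho - \bar\delta\sigma &= (\bar\alpha+\beta)\rho - (3\alpha-\bar\beta)\sigma + (\rho-\bar\rho)\tau + (\mu-\bar\mu)\kappa - \Psi_1 + \Phi_{01} \\
\delta\alpha - \bar\delta\beta &= \mu\rho - \lambda\sigma + \alpha\bar\alpha + \beta\bar\beta - 2\alpha\beta + (\rho-\bar\rho)\gamma + (\mu-\bar\mu)\varepsilon - \Psi_2 + \Lambda + \Phi_{11} \\
\delta\lambda - \bar\delta\mu &= (\rho-\bar\rho)\nu + (\mu-\bar\mu)\pi + (\alpha+\bar\beta)\mu + (\bar\alpha-3\beta)\lambda - \Psi_3 + \Phi_{21} \\
\Delta\mu - \delta\nu &= -(\mu+\gamma+\bar\gamma)\mu - \lambda\bar\lambda + \bar\nu\pi + (\bar\alpha+3\beta-\tau)\nu - \Phi_{22} \\
\Delta\beta - \delta\gamma &= (\bar\alpha+\beta-\tau)\gamma - \mu\tau + \sigma\nu + \varepsilon\bar\nu + (\gamma-\bar\gamma-\mu)\beta - \alpha\bar\lambda - \Phi_{12} \\
\Delta\sigma - \delta\tau &= -(\mu-3\gamma+\bar\gamma)\sigma - \bar\lambda\rho - (\tau+\beta-\bar\alpha)\tau + \kappa\bar\nu - \Phi_{02} \\
\Delta\rho - \bar\delta\tau &= (\gamma+\bar\gamma-\bar\mu)\rho - \sigma\lambda + (\bar\beta-\alpha-\bar\tau)\tau + \nu\kappa - \Psi_2 - 2\Lambda \\
\Delta\alpha - \bar\delta\gamma &= (\rho+\varepsilon)\nu - (\tau+\beta)\lambda + (\bar\gamma-\bar\mu)\alpha + (\bar\beta-\bar\tau)\gamma - \Psi_3
\end{aligned} \qquad [17]$$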


Finally, it is possible to write out the Bianchi 
identities in this formalism. For simplicity, and 
with a view to an application, we do this below 
only for vacuum, so that the Ricci tensor is zero: 


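$$\begin{aligned}
D\Psi_1 - \bar\delta\Psi_0 &= (\pi-4\alpha)\Psi_0 + 2(2\rho+\varepsilon)\Psi_1 - 3\kappa\Psi_2 \\
\Delta\Psi_0 - \delta\Psi_1 &= (4\gamma-\mu)\Psi_0 - 2(2\tau+\beta)\Psi_1 + 3\sigma\Psi_2 \\
D\Psi_2 - \bar\delta\Psi_1 &= -\lambda\Psi_0 + 2(\pi-\alpha)\Psi_1 + 3\rho\Psi_2 - 2\kappa\Psi_3 \\
\Delta\Psi_1 - \delta\Psi_2 &= \nu\Psi_0 + 2(\gamma-\mu)\Psi_1 - 3\tau\Psi_2 + 2\sigma\Psi_3 \\
D\Psi_3 - \bar\delta\Psi_2 &= -2\lambda\Psi_1 + 3\pi\Psi_2 + 2(\rho-\varepsilon)\Psi_3 - \kappa\Psi_4 \\
\Delta\Psi_2 - \delta\Psi_3 &= 2\nu\Psi_1 - 3\mu\Psi_2 + 2(\beta-\tau)\Psi_3 + \sigma\Psi_4 \\
D\Psi_4 - \bar\delta\Psi_3 &= -3\lambda\Psi_2 + 2(\alpha+2\pi)\Psi_3 + (\rho-4\varepsilon)\Psi_4 \\
\Delta\Psi_3 - \delta\Psi_4 &= 3\nu\Psi_2 - 2(\gamma+2\mu)\Psi_3 + (4\beta-\tau)\Psi_4
\end{aligned} \qquad [18]$$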

The whole system is then loosely described as the 
spin-coefficient equations. 


As a simple application, we shall prove the 
Goldberg–Sachs theorem: for vacuum spacetimes, a 
spinor field $o^A$ is geodesic and shear free iff it is a 
repeated principal spinor of the Weyl spinor. 

In the spin-coefficient formalism, $o^A$ is geodesic 
and shear free iff $\kappa$ and $\sigma$ vanish, and, from [16], is a 
repeated principal spinor of the Weyl spinor 
provided $\Psi_0 = \Psi_1 = 0$. It will be repeated three 
times if also $\Psi_2 = 0$ and four times if also $\Psi_3 = 0$, but 
one must have $\Psi_k \neq 0$ for some $k$ if the spacetime is 
not to be flat. 

Suppose that $o^A$ is a (twice) repeated principal 
spinor of the Weyl spinor, then at once from the first 
two expressions in [18] both $\kappa$ and $\sigma$ vanish. If it is 
repeated three times, one gets the same result from 
the third and fourth expressions in [18], while if $o^A$ 
is repeated four times then the fifth and sixth 
expressions of [18] should be used. 

For the converse, suppose that $\kappa = \sigma = 0$. Then, by 
the first equation in [14], $o^A$ can be rescaled to ensure 
that $\varepsilon = 0$ and a spinor field $\iota^A$ can be chosen which is 
normalized against $o^A$ and parallelly propagated along 
$\ell^a$, so that, by the fifth equation in [14], $\pi = 0$. From 
the second expression in [17], one can see at once that 
$\Psi_0 = 0$, so that the first two equations in [18] simplify 
to give expressions for $D\Psi_1$ and $\delta\Psi_1$. By commuting $D$ 
and $\delta$ on $\Psi_1$ and using the second expression of [15] 
with the relevant parts of [17], it can be concluded that 
$\Psi_1 = 0$, as required. 

Another application which is easy to describe is 
the solution of the type-D vacuum equations. A 
type-D solution is one for which the Weyl spinor has 
two (linearly independent) repeated principal spinors. 
If these are taken as the normalized dyad, then 
from [16] only $\Psi_2$ is nonzero among the $\Psi_k$. By the 
Goldberg–Sachs theorem, both spinors are geodesic 
and shear free, so that the spin coefficients $\sigma$, $\kappa$, $\lambda$, 
and $\nu$ all vanish. With these conditions, the 
spin-coefficient equations simplify to the point that 
careful choices of coordinates and the remaining 
freedom in the dyad enable the equations to be 
solved explicitly. One obtains metrics that depend 
only on a few parameters. Analogous methods 
reduce the Einstein equations to simpler systems 
for the other vacuum algebraically special metrics, 
that is, the other vacuum metrics for which the Weyl 
spinor does not have four distinct principal null 
directions (Mason 1998). 

The spin-coefficient formalism has also been 
extensively used in the study of asymptotically flat 
spacetimes and gravitational radiation (Penrose and 
Rindler 1984, 1986, Stewart 1990). 




The Positive-Mass Theorem 


A very important application of spinor calculus in 
recent years was the proof by Witten (1981) of the 


positive-mass (or positive-energy) theorem. The 
proof was motivated by ideas from supergravity 
and gave rise to an increased interest in spinors in 
general relativity. 

The positive-mass theorem is the following assertion: 
given an asymptotically flat spacetime $M$ with 
a spacelike hypersurface $\Sigma$, which is topologically 
$\mathbb{R}^3$ and in which the dominant-energy condition 
holds, the total (or Arnowitt–Deser–Misner (ADM)) 
momentum is timelike and future-pointing. (The 
dominant-energy condition is the requirement that 
$T_{ab}U^aV^b$ is non-negative for every pair of future-pointing 
timelike or null vectors $U^a$ and $V^a$.) 

We follow the notation of Penrose and Rindler
(1984, 1986), where the proof begins by considering
the 2-form $\Xi$ defined in terms of a spinor field $\lambda^A$ on
$\Sigma$ by


$$\Xi = -i\, \bar\lambda_{B'} \nabla_a \lambda_B \, dx^a \wedge dx^b$$


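If $\lambda^A$ tends to a constant spinor at spatial infinity on
$\Sigma$, then


$$\frac{1}{4\pi G} \oint_S \Xi \to p_a \lambda^A \bar\lambda^{A'}$$ [19]


as the spacelike spherical surface $S$ tends to spatial
infinity, where $p_a$ is the ADM momentum. Suppose
$\Sigma$ has unit normal $t^a$, intrinsic metric $h_{ab} = g_{ab} - t_a t_b$
and the dual-volume 3-form is $d\Sigma_a = t_a \, d\Sigma$. Then
Stokes' theorem states that


$$\oint_S \Xi = \int_\Sigma d\Xi$$


We calculate


$$d\Xi = \alpha + \beta$$


where
$$\alpha = 4\pi G \, T_{ab} \, \ell^a \, d\Sigma^b$$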
pacia VA Vix dS 


where $\ell^a = \lambda^A \bar\lambda^{A'}$ and we have used the Einstein field
equations to replace curvature terms in $\alpha$ by the
energy-momentum tensor $T_{ab}$. Provided the matter
satisfies the dominant-energy condition, $\alpha$ is everywhere
a positive multiple of the volume form on $\Sigma$
and its integral is positive (it can vanish only in
vacuum). To make the integral of $\beta$ positive, $\lambda^A$ is
required to satisfy


$$D_{AA'} \lambda^A = 0$$ [20]


where $D_a = h_a{}^b \nabla_b$, which is the projection of the
four-dimensional covariant derivative rather than the
intrinsic covariant derivative of $\Sigma$. Equation [20] is
the Sen-Witten equation; it is elliptic and reduces to
the Dirac equation on a maximal surface; furthermore,
given an asymptotically constant value for $\lambda^A$ on an
asymptotically flat 3-surface $\Sigma$ with the topology of
$\mathbb{R}^3$, it has a unique solution. Equation [20] removes
part of the derivative of $\lambda^A$ from $\beta$ to leave


B= —h”’D,cDpAc dX 


Now $h_{ab}$ is negative definite and $\Sigma$ has timelike
normal so that $\beta$ is a positive multiple of the volume
form on $\Sigma$ (unless $\lambda^A$ is covariantly constant, a case
which is dealt with separately). Thus, the integral of
$d\Xi$ is non-negative and therefore, by [19], so is the
inner product of the ADM momentum $p_a$ with any
null vector constructed from asymptotically constant 
spinors. Furthermore, this inner product is strictly 
positive, except in a vacuum spacetime admitting a 
constant spinor. Such spacetimes can be found 
explicitly and cannot be asymptotically flat, so that 
the ADM momentum is always timelike and future 
pointing, and vanishes only in flat spacetime. 

The basic positive-energy theorem outlined above 
can be extended in several directions: 


• to prove that the total momentum at future null
infinity is also timelike and future pointing;

• to deal with surfaces $\Sigma$ which have inner
boundaries, for example, at black holes;

• to prove inequalities between charge and mass; and

• to deal with spacetimes which are asymptotically
anti-de Sitter rather than flat.


Further Applications of Spinors 


Supersymmetry is a symmetry in quantum field 
theory relating bosons and fermions. In the language 
of spinors, bosons are represented by fields with an 
even number of spinor indices and fermions by fields 
with an odd number of indices. Thus, the gauge 
transformations of supersymmetry are generated by 
spinors with a single index. 

Supergravity is supersymmetry in the case that one of 
the fields is the graviton. A supergravity theory is 
labeled by an integer N for the number of independent 
supersymmetries and much of the numerology of these 
theories follows from properties of spinors. N=1 
supergravity contains a graviton and a spin-3/2 field 
coupled together, and the presence of the super- 
symmetry allows the Buchdahl condition to 
be evaded. Supergravity theory with one supersymme- 
try in 11 spacetime dimensions depends on one spinor, 
which, in 11 dimensions, has 32 components. This is as 
many components as eight Dirac spinors in a four- 
dimensional spacetime, and, by a process of dimen- 
sional reduction, N = 1 supergravity in 11 dimensions 
is related to N = 8 supergravity in four dimensions. For 
reasons related to the Buchdahl conditions, 8 is the
largest N that is considered in four dimensions. 
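The component counting in this paragraph can be checked with a short sketch. It assumes the standard rule that a Dirac spinor in d spacetime dimensions has 2^floor(d/2) components and ignores reality (Majorana) conditions; the function name is illustrative and not taken from the article.

```python
def dirac_spinor_components(d: int) -> int:
    """Components of a Dirac spinor in d spacetime dimensions.

    Assumption of this sketch: the count is 2**floor(d/2), with no
    Majorana or Weyl condition imposed.
    """
    if d < 1:
        raise ValueError("dimension must be a positive integer")
    return 2 ** (d // 2)


components_11d = dirac_spinor_components(11)  # one spinor in 11 dimensions
components_4d = dirac_spinor_components(4)    # one Dirac spinor in 4 dimensions

# Number of four-dimensional Dirac spinors carrying the same component count
equivalent_4d_spinors = components_11d // components_4d
```

Under these assumptions the ratio reproduces the link between N = 1 in 11 dimensions and N = 8 in four dimensions quoted above.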

In superstring theory and in some supergravity 
theories, one often wishes to consider spaces 
with “residual supersymmetry,” by which is meant 
that there is a spinor field satisfying a condition of 
covariant constancy in some connection (Candelas et 
al. 1985). The existence of such constant spinors, as a 
result of spinor Ricci identities analogous to those 
given above, typically imposes strong restrictions on 
the curvature. Riemannian manifolds admitting con- 
stant spinors for the Levi-Civita connection are Ricci- 
flat (Hitchin 1974); Lorentzian ones can often be 
found in terms of a few functions. Manifolds of
special holonomy, which are of interest in superstring
theory, can usually be characterized as admitting
special spinors (Wang 1989).


See also: Clifford Algebras and Their Representations; 
Dirac Operator and Dirac Field; Einstein Equations: Exact 
Solutions; Einstein’s Equations with Matter; General 
Relativity: Overview; Geometric Flows and the Penrose 
Inequality; Index Theorems; Relativistic Wave Equations 
Including Higher Spin Fields; Spacetime Topology, 
Causal Structure and Singularities; Supergravity; Twistor 
Theory: Some Applications [in Integrable Systems, 
Complex Geometry and String Theory]; Twistors. 
Introduction


This article gives a brief discussion of a topic with 
an enormous literature, namely the stability/instabil- 
ity of fluid flows. Following the seminal observa- 
tions and experiments of Reynolds in 1883, the issue 
of stability of a fluid flow became one of the central 
problems in fluid dynamics: stable flows are robust 
under inevitable disturbances in the environment, 
while unstable flows may break up, sometimes 
rapidly. These possibilities were demonstrated in a 
relatively simple experiment where flow in a pipe is 
examined at increasing speeds. As a dimensionless 
parameter (now known as the Reynolds number) 
increases, the flow completely changes its nature 
from a stable flow to a completely different regime 
that is irregular in space and time. Reynolds called 
this “turbulence” and observed that the transition 
from the simple flow to the chaotic flow was caused 
by the phenomenon of instability. 

Even though the topic has been the subject of
intense study over more than a century, Reynolds'
experiment is still not fully explained by current
theory. Although there is no rigorous proof of 
stability of the simple flow (known as Poiseuille 
flow in a circular pipe), analytical and numerical 
investigations of the equations suggest theoretical 
stability for all Reynolds numbers. However, experi- 
ments show instability for sufficiently large 
Reynolds numbers. A plausible explanation for this 
phenomenon is the instability of such flows with 
respect to small but finite disturbances combined 
with their stability to infinitesimal disturbances. 

The issue of fluid stability, in contexts much
more complex than the fundamental experiment of
Reynolds, arises in a multitude of branches of
science, including engineering, physics, astrophysics,
oceanography, and meteorology. It is far
beyond the scope of this short article to even
touch upon most of the extensive literature. In the
bibliography we list just a few of the substantive
books where classical results can be found
(Chandrasekhar 1961, Drazin and Reid 1981,
Gershuni and Zhukovitskii 1976, Joseph 1976,
Lin 1967, Swinney and Gollub 1985). Recent
extensive bibliographies on mathematical aspects
of fluid instability are given in several articles in the
Handbook of Mathematical Fluid Dynamics
(Friedlander and Serre 2003) and the compendium
of articles on hydrodynamics and nonlinear
instabilities in Godrèche and Manneville (1998).


The Equations of Motion 


The Navier-Stokes equations for the motion of an
incompressible, constant density, viscous fluid are


$$\frac{\partial q}{\partial t} + (q \cdot \nabla) q = -\frac{1}{\rho} \nabla P + \nu \nabla^2 q$$ [1a]


$$\operatorname{div} q = 0$$ [1b]


where $q(x, t)$ denotes the velocity vector, $P(x, t)$ the
pressure, and the constants $\rho$ and $\nu$ are the density
and kinematic viscosity, respectively. This system is
considered in three (or sometimes two) spatial
dimensions with a specified initial velocity field


$$q(x, 0) = q_0(x)$$ [1c]


and physically appropriate boundary conditions: for 
example, zero velocity on a rigid boundary, or 
periodicity conditions for flow on a torus. This 
nonlinear system of partial differential equations 
(PDEs) has proved to be remarkably challenging, 
and in three dimensions the fundamental issues of 
existence and uniqueness of physically reasonable 
solutions are still open problems. 

It is often useful to consider the Navier-Stokes 
equations in nondimensional form by scaling the 
velocity and length by some intrinsic scale in the 
problem, for example, in Reynolds’ experiment by 
the mean speed U and the diameter of the pipe d. 
This leads to the nondimensional equations 


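$$\frac{\partial q}{\partial t} + (q \cdot \nabla) q = -\nabla P + \frac{1}{R} \nabla^2 q$$ [2a]


$$\operatorname{div} q = 0$$ [2b]

where the Reynolds number $R$ is

$$R = \frac{U d}{\nu}$$ [3]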


In many situations, the size of R has a crucial
influence on stability. Roughly speaking, when R is
small the flow is very sluggish and likely to be
stable. However, the effects of viscosity are actually
very complicated: viscosity can smooth and
stabilize fluid motions, but in some cases it
also destabilizes flows.
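As a small illustration of definition [3], the sketch below evaluates the Reynolds number for a pipe flow. The numerical values (mean speed, pipe diameter, and a kinematic viscosity of roughly 1e-6 m^2/s, typical of water near room temperature) are assumptions chosen for the example, and the function name is hypothetical.

```python
def reynolds_number(mean_speed: float, diameter: float, kinematic_viscosity: float) -> float:
    """Reynolds number R = U d / nu for mean speed U, pipe diameter d, viscosity nu."""
    if kinematic_viscosity <= 0.0:
        raise ValueError("kinematic viscosity must be positive")
    return mean_speed * diameter / kinematic_viscosity


# Illustrative values only (SI units): U = 0.5 m/s, d = 0.02 m, nu = 1e-6 m^2/s
R_example = reynolds_number(mean_speed=0.5, diameter=0.02, kinematic_viscosity=1.0e-6)
```

Because R is dimensionless, any consistent system of units gives the same value.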

The Euler equations, which predate the Navier-Stokes
equations by many decades, neglect the
effects of viscosity and are obtained from [1a] by
setting the viscosity parameter $\nu$ to zero. Since this
removes the highest-derivative term from the equations,
the nature of the Euler equations is fundamentally
different from that of the Navier-Stokes
equations and the limit of vanishing viscosity (or
infinite Reynolds number) is a very singular limit.
Since all real fluids are at least very weakly viscous,
it could be argued that only the Navier-Stokes
equations are physically relevant. However, many
important physical phenomena, such as turbulence,
involve flows at very high Reynolds numbers ($10^4$ or
higher). Hence, an understanding of turbulence is
likely to involve the asymptotics of the Navier-Stokes
equations as $R \to \infty$. The first step towards
the construction of such asymptotics is the study of
inviscid fluids governed by the Euler equations:


$$\frac{\partial q}{\partial t} + (q \cdot \nabla) q = -\nabla P$$ [4a]


$$\operatorname{div} q = 0$$ [4b]


Stability issues for the Euler equations are in many 
respects distinct from those of the Navier-Stokes 
equations and in this article we will briefly touch 
upon stability results for both systems. 


Comments on Some “Classical” 
Instabilities 


To illustrate the complexity of the structure of 
instabilities that can arise in the Navier-Stokes 
equations, we mention one classical example, 
namely the centrifugal instabilities called Taylor- 
Couette instabilities. Consider a fluid between two 
concentric cylinders rotating with different angular 
velocities. If the inner cylinder rotates sufficiently 
faster than the outer one, the centrifugal force is 
stronger on inside particles than outside particles 
and a disturbance which exchanges the radial 
position of particles is enhanced, that is, the 
configuration is unstable. As the angular velocity 
of the inner cylinder is increased above a certain 
critical rate, the instability is manifested in a series 
of small toroidal (Taylor) vortices that fill the space 
between the cylinders. There follows a hierarchy of 
successive instabilities: azimuthal traveling waves, 
twisting regimes, and quasiperiodic regimes until 
chaotic solutions appear. Such a sequence of
bifurcations is a scenario for a transition to
turbulence postulated by Ruelle and Takens. Details
concerning bifurcation theory and fluid behavior 
can be found in the book of Chossat and Iooss 
(1994). 

We note that phenomena of successive bifurcations
connected with loss of stability, such as
regimes of Taylor-Couette instabilities, occur at
moderately large Reynolds numbers. Fully developed
turbulence is a phenomenon associated with
very high Reynolds numbers. These are parameter
regimes basically inaccessible in current numerical
investigations of the Navier-Stokes equations and
turbulent models. The Euler equations lie at the
limit as $R \to \infty$. It is an interesting observation that
results at the limit of infinite Reynolds number are
sometimes also applicable and consistent with
experiments for flows with only moderate Reynolds
number.

There is a huge diversity of forces that couple 
with fluid motion to produce instability. We will 
merely mention a few of these which an interested 
reader could pursue in consultation with texts listed 
in the “Further reading” section and references 
therein. 


1. The so-called Bénard problem of convective 
instability concerns a horizontal layer of fluid 
between parallel plates and subject to a tempera- 
ture gradient. The governing equations are the 
Navier-Stokes equation for a nonconstant den- 
sity fluid and the heat equation. In this problem, 
the critical parameter governing the onset of 
instability is called the Rayleigh number. The 
patterns that can develop as a result of instability 
are strongly influenced by the boundary condi- 
tions in the horizontal coordinates. With lattice 
type conditions, bifurcating solutions include 
rolls, rectangles, and hexagons. Convection rolls 
are themselves subject to secondary instabilities 
that may break the translation symmetry and 
deform the rolls into meandering shapes. Further 
refinements of convective instabilities include 
doubly diffusive convection, where the density 
depends on concentration as well as temperature. 
Competition between stabilizing diffusivity and 
destabilizing diffusivity can lead to the so-called 
“salt-finger” instabilities. 

2. Of considerable interest in astrophysics and
plasma physics are the instabilities that occur in
electrically conducting fluids. Here the fluid
equations are coupled with Maxwell’s equations.
Much work has been done on the topic of
magnetohydrodynamical (MHD) stability, which
was developed to address various important
physical issues such as thermonuclear fusion,
stellar and planetary interiors, and dynamo
theory. For example, dynamo theory addresses
the issue of how a magnetic field can be
generated and sustained by the motion of an
electrically conducting fluid. In the simplest
scenario, the fluid motion is assumed to be a
given divergence-free vector field and the study of
the instabilities that may occur in the evolution
of the magnetic field is called the kinematic
dynamo problem. This gives rise to interesting
problems in dynamical systems and actually is
closely analogous to the topic of vorticity
generation in the three-dimensional (3D) fluid
equations in the absence of MHD effects.


In the next section we discuss certain mathema- 
tical results that have been rigorously proved for 
particular problems in the stability of fluid flows. 
We restrict our attention to the “basic” equations, 
that is, [2a] and [2b], [4a] and [4b], observing that 
even in rather simple configurations there are still 
more open problems than precise rigorous results. 


The Navier-Stokes Equations: 
Mathematical Definitions of 
Stability/Instability 


Instability occurs when there is some disturbance of 
the internal or external forces acting on the fluid 
and, loosely speaking, the question of stability or 
instability considers whether there exist disturbances 
that grow with time. There are many mathematical 
definitions of stability of a solution to a PDE. Most 
of these definitions are closely related but they may 
not be equivalent. Because of the distinctly different 
nature of the Navier-Stokes equations for a viscous 
fluid and the Euler equations for an inviscid fluid, 
we will adopt somewhat different precise definitions 
of stability for the two systems of PDEs. Both 
definitions are related to the concept known as 
Lyapunov stability. A steady state described by a
velocity field $U_0(x)$ is called Lyapunov stable if
every state $q(x,t)$ “close” to $U_0(x)$ at $t = 0$ stays
close for all $t > 0$. In mathematical terms, “closeness”
is defined by considering metrics in a normed
space X. While in finite-dimensional systems the
choice of norm is not significant because all Banach
norms are equivalent, in infinite-dimensional systems,
such as a fluid configuration, this choice is
crucial. The point was emphasized by Yudovich
(1989) and it is a version of the definition of
stability given in that book that we will adopt in
connection with the parabolic Navier-Stokes
equations.
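The remark that the choice of norm matters in infinite dimensions can be illustrated numerically. The sketch below uses a family of narrowing triangular spikes on [-1, 1], chosen here purely as an example and not taken from the article: their maximum stays at 1 while their L^2 norm tends to zero, so "close to zero" depends on the norm.

```python
import numpy as np


def spike(x: np.ndarray, n: int) -> np.ndarray:
    """Triangular spike of height 1 and half-width 1/n centred at x = 0."""
    return np.maximum(0.0, 1.0 - n * np.abs(x))


def sup_norm(f: np.ndarray) -> float:
    """Discrete approximation of the maximum (L-infinity) norm."""
    return float(np.max(np.abs(f)))


def l2_norm(f: np.ndarray, x: np.ndarray) -> float:
    """Riemann-sum approximation of the L^2 norm on a uniform grid."""
    dx = x[1] - x[0]
    return float(np.sqrt(np.sum(f ** 2) * dx))


x = np.linspace(-1.0, 1.0, 200001)
# (n, sup norm, L^2 norm) for increasingly narrow spikes
norm_table = [(n, sup_norm(spike(x, n)), l2_norm(spike(x, n), x)) for n in (1, 10, 100, 1000)]
```

For this family the exact L^2 norm is sqrt(2/(3n)), so a disturbance can be arbitrarily small in the energy norm while remaining of order one pointwise.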


Definitions for a General Nonlinear 
Evolution Equation 


Consider an evolution equation for $u(x,t)$ whose
phase space is a Banach space X:


$$\frac{\partial u}{\partial t} = Lu + N(u, u)$$

We assume that if the initial value $u(x,0) \in X$ is
given, the future evolution $u(x,t)$, $t > 0$, of the
equation is uniquely defined (at least for sufficiently
small initial data). Without loss of generality, we
can assume that zero is a steady state.

We define a version of Lyapunov (nonlinear) 
stability and its converse instability. 


Definition 1 Let (X, Z) be a pair of Banach spaces.
The zero steady state is called (X, Z) nonlinearly
stable if, no matter how small $\epsilon > 0$, there exists
$\delta > 0$ so that $u(x,0) \in X$ and


$$\|u(x, 0)\|_Z < \delta$$


imply the following two assertions:


(i) there exists a global in time solution such that
$u(x,t) \in C([0, \infty); X)$;
(ii) $\|u(x,t)\|_Z < \epsilon$ for a.e. $t \in [0, \infty)$.


The zero state is called nonlinearly unstable if either
of the above assertions is violated. We note that
under this strong definition of stability, loss of
existence of a solution is a particular case of
instability. The concept of existence that we will
invoke in considering the Navier-Stokes equations is
the existence of “mild” solutions introduced by Kato
and Fujita (1962). Local-in-time existence of mild
solutions is known in $X = L^q$ for $q > n$, where n
denotes the space dimension. ($L^q$ denotes the usual
Lebesgue space.)

We now state two theorems for the Navier-Stokes
equations [2a] and [2b]. The theorems are valid in any
space dimension n and in finite or infinite domains. Of
course, the most physically relevant cases are n = 3 or
2. Both theorems relate properties of the spectrum of
the linearized Navier-Stokes equations to stability or
instability of the full nonlinear system. Let
$U_0(x)$, $P_0(x)$ be a steady state flow:


$$(U_0 \cdot \nabla) U_0 = -\nabla P_0 + \frac{1}{R} \nabla^2 U_0 + F$$ [5a]


$$\nabla \cdot U_0 = 0$$ [5b]


where $U_0 \in C^\infty$ vanishes on the boundary of the
domain D and F is a suitable external force. We
write [2a] and [2b] in perturbation form as


$$q(x, t) = U_0(x) + u(x, t)$$ [6]

$$\frac{\partial u}{\partial t} = L_{NS} u + N(u, u)$$ [7a]

$$\nabla \cdot u = 0$$ [7b]

with

$$L_{NS} u = -(U_0 \cdot \nabla) u - (u \cdot \nabla) U_0 + \frac{1}{R} \nabla^2 u - \nabla P_1$$ [8]


$$N(u, u) = -(u \cdot \nabla) u - \nabla P_2$$ [9]


Here $P_1$ and $P_2$ are, respectively, the portions of the
pressure required to ensure that $L_{NS} u$ and $N(u, u)$
remain divergence free. The operators $L_{NS}$ and $N$ act
on the space of divergence-free vector-valued functions
in the closure of the Sobolev space $W^{s,p}$ that
vanish on the boundary of D.

We note that the spectrum of the elliptic linear
operator $L_{NS}$ with appropriate boundary conditions
in a bounded domain is purely discrete: that is, it
consists of a countable number of eigenvalues of
finite multiplicity with the sole limit point being at
infinity.


Theorem 2 (Nonlinear instability). Let $1 < p < \infty$
be arbitrary. Suppose that the operator $L_{NS}$ over $L^p$
has spectrum in the right half of the complex plane.
Then the flow $U_0(x)$ is $(L^q, L^p)$ nonlinearly unstable
for any $q > \max(p, n)$.


Theorem 3 (Asymptotic Lyapunov stability). Let
$q > n$ be arbitrary. Assume that the operator $L_{NS}$
over $L^q$ has spectrum confined to the left half of the
complex plane. Then the flow $U_0(x)$ is $(L^q, L^q)$
nonlinearly stable.


A recent proof of these theorems is given in
Friedlander et al. (2006) using a bootstrap type
argument. In Theorem 2, the space $L^q$, $q > n$, is used
as an auxiliary space in which the norm of the
nonlinear term is controlled, while the final instability
result is proved in $L^p$ for $p \in (1, \infty)$. We note
that this includes the most physically relevant case
of instability in the $L^2$ energy norm. An earlier proof
of the theorems under the restriction p > n was
given by Yudovich (1989).

To apply Theorem 2 or 3 to conclude nonlinear
instability or stability of a given flow $U_0$, it is
necessary to have information concerning the spectrum
of the linear operator $L_{NS}$. Obtaining such
information has been the goal of much of the
literature concerning fluid stability (see the bibliography
and the references therein). However, except
in the case of some relatively simple flows, the
eigenvalues of $L_{NS}$ have not yet been calculated
explicitly. Perhaps the example that is the most
tractable is plane parallel shear flows. Here the
eigenvalue problem is governed by an ordinary
differential equation (ODE) known as the Orr-Sommerfeld
equation, which has been the subject of
extensive analytical and numerical investigations.
Consider the parallel flow $U_0 = (U(z), 0, 0)$ in the
strip $-1 < z < 1$. For disturbances of the form


$$\phi(z) \, e^{i(k_1 x + k_2 y)} \, e^{\lambda t}$$ [10]


the eigenvalue $\lambda$ is determined by the following
equation with $k^2 = k_1^2 + k_2^2$:








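$$\left(U - \frac{i\lambda}{k}\right) \left[\frac{d^2}{dz^2} - k^2\right] \phi - U'' \phi = \frac{1}{ikR} \left[\frac{d^2}{dz^2} - k^2\right]^2 \phi$$ [11]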





with boundary conditions $\phi = 0$ at $z = \pm 1$. We note
that the discreteness of the spectrum is preserved if
periodicity conditions are imposed in the (x,y)
plane.

The complexity of the spectral problem [11] is
apparent even for the simple case $U(z) = 1 - z^2$
(known as plane Poiseuille flow). Unstable eigenvalues
exist but only in certain regions of (k,R)
parameter space. There is a critical Reynolds number,
$R_c = 5772$, below which $\mathrm{Re}\,\lambda < 0$ for all wave
numbers k. For $R > R_c$, instability occurs in a band
of wave numbers and the thickness of this band
shrinks to zero as $R \to \infty$ (i.e., the inviscid limit).
Hence, Poiseuille flow with $R < R_c$ can be considered
as an example where the stability Theorem 3 can be
applied, that is, the flow is nonlinearly stable to
infinitesimal disturbances. However, extremely careful
experiments are needed to obtain agreement with
the theoretical value of $R_c = 5772$. Rather it is more
usual in an experiment with R ~ 2000 that the flow
exhibits instability in the form of streamwise streaks
that appear near the walls. These structures do not
look like traveling waves of the form given by
expression [10], rather they are finite-amplitude
effects of nonmodal growth. Such linear growth of
disturbances, along with energy growth and pseudospectra,
have recently been investigated extensively.
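The dependence of the spectrum on R for plane Poiseuille flow can be explored numerically. The following is a minimal sketch, not the method of the works cited above: it discretizes the Orr-Sommerfeld problem for $U(z) = 1 - z^2$ by Chebyshev collocation, assumes a two-dimensional disturbance ($k_2 = 0$) and clamped conditions $\phi = \phi' = 0$ at $z = \pm 1$, and returns the largest real part of $\lambda$. The function names, the resolution N, and the treatment of the boundary conditions are all choices made for this illustration.

```python
import numpy as np


def cheb(N: int):
    """Chebyshev differentiation matrix D and nodes z on [-1, 1] (N + 1 points)."""
    z = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.ones(N + 1)
    c[0] = c[N] = 2.0
    c *= (-1.0) ** np.arange(N + 1)
    dZ = z[:, None] - z[None, :]
    D = np.outer(c, 1.0 / c) / (dZ + np.eye(N + 1))
    D -= np.diag(D.sum(axis=1))  # diagonal entries from the row-sum identity
    return D, z


def max_growth_rate(R: float, k: float = 1.0, N: int = 64) -> float:
    """Largest Re(lambda) for plane Poiseuille flow, disturbance exp(i k x + lambda t).

    Discretized equation (assumed form for this sketch):
        lambda (D^2 - k^2) phi = (1/R)(D^2 - k^2)^2 phi
                                 - i k U (D^2 - k^2) phi + i k U'' phi
    """
    D, z = cheb(N)
    D2 = D @ D
    # Clamped fourth derivative: write phi = (1 - z^2) v with v(+-1) = 0, so that
    # phi'''' = (1 - z^2) v'''' - 8 z v''' - 12 v''.
    S = np.diag(np.concatenate(([0.0], 1.0 / (1.0 - z[1:N] ** 2), [0.0])))
    D3 = D2 @ D
    D4 = (np.diag(1.0 - z ** 2) @ (D3 @ D) - 8.0 * np.diag(z) @ D3 - 12.0 * D2) @ S
    D2i, D4i = D2[1:N, 1:N], D4[1:N, 1:N]  # interior nodes only
    I = np.eye(N - 1)
    U = 1.0 - z[1:N] ** 2
    U_second = -2.0
    laplacian = D2i - k ** 2 * I
    A = ((D4i - 2.0 * k ** 2 * D2i + k ** 4 * I) / R
         - 1j * k * np.diag(U) @ laplacian
         + 1j * k * U_second * I)
    eigenvalues = np.linalg.eigvals(np.linalg.solve(laplacian, A))
    return float(eigenvalues.real.max())


growth_subcritical = max_growth_rate(2000.0)
growth_near_critical = max_growth_rate(5772.0)
growth_supercritical = max_growth_rate(10000.0)
```

With k = 1 one would expect a negative growth rate at R = 2000, a value close to zero near R = 5772, and a small positive value at R = 10000, in line with the critical Reynolds number quoted above; the precise digits depend on N and should be checked by refining the grid.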

An example where Theorem 2, proving nonlinear
instability, can be applied is the so-called
Kolmogorov flow. This is also a shear flow with the
spectral problem for the linearized operator given by
eqn [11]. In this example, the profile is oscillatory in z
with $U(z) = \sin mz$. In an elegant paper, Meshalkin
and Sinai (1961) used continued fractions to prove
the existence of a real positive (unstable) eigenvalue. It
is interesting, and in some sense surprising, that the
particular case of sinusoidal profiles leads to a
nonconstant-coefficient eigenvalue problem, where
it is possible to construct in explicit form the
transcendental characteristic equation that relates
the eigenvalues $\lambda$ and the wave numbers. Usually,
this can be done only for constant-coefficient equations.
In the case $U(z) = \sin mz$, a Fourier series
representation for the eigenfunctions leads to a
tridiagonal infinite matrix for the algebraic system
satisfied by the Fourier coefficients. This is amenable
to examination using continued fractions. Analysis of
the characteristic equation shows that there exist real
eigenvalues $\lambda > 0$ provided R is larger than some
critical value for each wave number k with $k^2 < m^2$.


The Euler Equation: Linear and 
Nonlinear Stability/Instability 


We conclude this brief article with some discussion
of instabilities in the inviscid Euler equations whose
existence is likely to be important as a “trigger” for
the development of instabilities in high-Reynolds-number
viscous flows. As we mentioned, the Euler
equations are very different from the Navier-Stokes
equations in their mathematical structure. The
Euler equations are degenerate and nonelliptic. As
such, the spectrum of the linearized operator $L_E$ is
not amenable to standard spectral theory of elliptic
operators. For example, unlike the Navier-Stokes
operator, the spectrum of $L_E$ is not purely discrete
even in bounded domains. To define $L_E$ we consider
a steady Euler flow $\{U_0(x), P_0(x)\}$, where


$$(U_0 \cdot \nabla) U_0 = -\nabla P_0$$ [12a]


$$\nabla \cdot U_0 = 0$$ [12b]


We assume that $U_0 \in C^\infty$. For the Euler equations,
appropriate boundary conditions include zero normal
component of $U_0$ on a rigid boundary, or
periodicity conditions (i.e., flow on a torus) or
suitable decay at infinity in an unbounded domain.
The theorems that we will be describing have been
proved mainly in the cases of the second and third
conditions stated above. There are many classes of
vector fields $U_0(x)$, in two and three dimensions,
that satisfy [12a] and [12b]. We write [4a] and [4b]
in perturbation form as


$$q(x, t) = U_0(x) + u(x, t)$$ [13]

with

$$\frac{\partial u}{\partial t} = L_E u + N(u, u)$$ [14a]

$$\nabla \cdot u = 0$$ [14b]

Here


$$L_E u = -(U_0 \cdot \nabla) u - (u \cdot \nabla) U_0 - \nabla P_1$$ [15]


$$N(u, u) = -(u \cdot \nabla) u - \nabla P_2$$ [16]
Linear (spectral) instability of a steady Euler flow
$U_0(x)$ concerns the structure of the spectrum of $L_E$.
Assuming $U_0 \in C^\infty(T^n)$, the linear equation


$$\frac{\partial u}{\partial t} = L_E u, \qquad \nabla \cdot u = 0$$ [17]


defines a strongly continuous group in every Sobolev
space $W^{s,p}$ with generator $L_E$. We denote this group
by $\exp\{L_E t\}$. For the issue of spectral instability of
the Euler equation it proves useful to study not only
the spectrum of $L_E$ but also the spectrum of the
evolution operator $\exp\{L_E t\}$. This permits the
development of an explicit formula for the growth
rate of a small perturbation due to the essential (or
continuous) spectrum. It was proved by Vishik
(1996) that a quantity $\Lambda$, referred to as a “fluid
Lyapunov exponent,” gives the maximum growth
rate of the essential spectrum of $\exp\{L_E t\}$. This
quantity is obtained by computing the exponential
growth rate of a certain vector that satisfies a
specific system of ODEs over the trajectories of the
flow $U_0(x)$. This proves to be an effective mechanism
for detecting instabilities in the essential
spectrum which result from high-spatial-frequency
perturbations. For example, for this reason any flow
$U_0(x)$ with a hyperbolic fixed point is linearly
unstable with growth in the sense of the $L^2$-norm.
In two dimensions, $\Lambda$ is equal to the maximal
classical Lyapunov exponent (i.e., the exponential
growth of a tangent vector over the ODE $\dot{x} = U_0(x)$).
In three dimensions, the existence of a nonzero
classical Lyapunov exponent implies that $\Lambda > 0$.
However, in three dimensions there are also examples
where the classical Lyapunov exponent is zero
and yet $\Lambda > 0$. We note that the delicate issue of the
unstable essential spectrum is strongly dependent on
the function space for the perturbations and that $\Lambda$,
for a given $U_0$, will vary with this function space.
More details and examples of instabilities in the
essential spectrum can be found in references in the
bibliography.
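The role of hyperbolic fixed points can be illustrated with a short computation. The sketch below takes the cellular flow with stream function sin(x) sin(y), chosen here as an assumed example of a steady 2D Euler flow rather than one discussed in the article, and evaluates the eigenvalues of the velocity gradient at a stagnation point. For the trajectory that sits at a fixed point, the classical Lyapunov exponent is the largest real part of these eigenvalues, so a positive value gives a lower bound on the maximal exponent.

```python
import numpy as np


def cellular_flow(p: np.ndarray) -> np.ndarray:
    """Velocity (dpsi/dy, -dpsi/dx) for the stream function psi = sin(x) sin(y)."""
    x, y = p
    return np.array([np.sin(x) * np.cos(y), -np.cos(x) * np.sin(y)])


def jacobian(field, p: np.ndarray, h: float = 1e-6) -> np.ndarray:
    """Central-difference approximation of the velocity gradient at the point p."""
    p = np.asarray(p, dtype=float)
    n = p.size
    J = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        J[:, j] = (field(p + step) - field(p - step)) / (2.0 * h)
    return J


def local_exponent(field, p: np.ndarray) -> float:
    """Largest real part of the velocity-gradient eigenvalues at a fixed point p."""
    return float(np.linalg.eigvals(jacobian(field, p)).real.max())


# Stagnation point at the corner of a cell (saddle) and at the cell centre (centre)
rate_at_saddle = local_exponent(cellular_flow, np.array([0.0, 0.0]))
rate_at_centre = local_exponent(cellular_flow, np.array([np.pi / 2, np.pi / 2]))
```

In this example the corner of the cell is a saddle with a positive rate, while the cell centre has purely imaginary eigenvalues and contributes no exponential growth.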

In contrast with instabilities in the essential
spectrum, the existence of discrete unstable eigenvalues
is independent of the norm in which growth is
measured. From this point of view, such instabilities
can be considered as “strong.” However, for most
flows $U_0(x)$ it is not known whether such
unstable eigenvalues exist. For fully 3D flows there are no
examples, to our knowledge, where such unstable
eigenvalues have been proved to exist for flows with
standard metrics. The case that has received the
most attention in the literature is the “relatively
simple” case of plane parallel shear flow. The
eigenvalue problem is governed by the Rayleigh
equation (which is the inviscid version of the Orr-Sommerfeld
equation [11]):


$$\left(U - \frac{i\lambda}{k}\right) \left[\frac{d^2}{dz^2} - k^2\right] \phi - U'' \phi = 0$$


$$\phi = 0 \quad \text{at } z = \pm 1$$ [18]


The celebrated Rayleigh stability criterion says that
a sufficient condition for the eigenvalues $\lambda$ to be
pure imaginary is the absence of an inflection point
in the shear profile U(z). It is more difficult to prove
the converse; however, there have been several
recent results that show that oscillating profiles
indeed produce unstable eigenvalues. For example, if
$U(z) = \sin mz$ the continued fraction proof of
Meshalkin and Sinai can be adapted to exhibit the
full unstable spectrum for [18]. We note the “fluid
Lyapunov exponent” $\Lambda$ is zero for all shear flows;
thus the only way the unstable spectrum can be
nonempty for shear flows is via discrete unstable
eigenvalues.
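The Rayleigh criterion is easy to test on a given profile. The sketch below is an illustration under stated assumptions: it samples U(z) on a uniform grid in [-1, 1], estimates U'' by second differences, and reports whether U'' takes both signs. The function name, grid size, and tolerance are choices made here. A positive answer only means the criterion does not guarantee stability; it does not by itself prove instability.

```python
import numpy as np


def has_inflection_point(U, n: int = 2001, tol: float = 1e-8) -> bool:
    """True if the second difference of U changes sign on the strip -1 < z < 1.

    U is a callable accepting a NumPy array of z values. Values of U'' smaller
    than tol in magnitude are treated as zero to absorb rounding noise.
    """
    z = np.linspace(-1.0, 1.0, n)
    h = z[1] - z[0]
    u = U(z)
    second_derivative = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / h ** 2
    has_positive = bool(np.any(second_derivative > tol))
    has_negative = bool(np.any(second_derivative < -tol))
    return has_positive and has_negative


poiseuille_inflection = has_inflection_point(lambda z: 1.0 - z ** 2)
couette_inflection = has_inflection_point(lambda z: z)
sinusoidal_inflection = has_inflection_point(lambda z: np.sin(np.pi * z))
```

The parabolic and linear profiles have no inflection point, so the criterion applies to them, whereas the sinusoidal profile (here with m = pi as an example) does have one.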

As we have discussed, it is possible to show that 
many classes of steady Euler flows are linearly 
unstable, either due to a nonempty unstable essential 
spectrum (i.e., cases where $\Lambda > 0$) or due to unstable
eigenvalues or possibly for both reasons. It is natural 
to ask what this means about the stability/instability 
of the full nonlinear Euler equations [14]-[16]. The 
issue of nonlinear stability is complex and there are 
several natural precise definitions of nonlinear 
stability and its converse instability. 

One definition is to consider nonlinear stability
in the energy norm $L^2$ and the enstrophy norm $H^1$,
which are natural function spaces to measure
growth of disturbances but are not “correct” spaces
for the Euler equations in terms of proven properties
of existence and uniqueness of solutions to the
nonlinear equation. Falling under this definition is
the most frequently employed method to prove
nonlinear stability, which is an elegant technique
developed by Arnol’d (cf. Arnol’d and Khesin
(1998) and references therein). This is based on
the existence of the so-called energy-Casimirs. The
vorticity curl q is transported by the motion of
the fluid so that at time t it is obtained from the
vorticity at time t = 0 by a volume-preserving
diffeomorphism. In the terminology of Arnol’d,
the velocity fields obtained in this manner at any
two times are called isovortical. For a given field
$U_0(x)$, the class of isovortical fields is an infinite-dimensional
manifold M, which is the orbit of the
group of volume-preserving diffeomorphisms in the
space of divergence-free vector fields. The steady
flows are exactly the critical points of the energy
functional E restricted to M. If a critical point is a
strict local maximum or minimum of E, then the
steady flow is nonlinearly stable in the space $J_1$ of
divergence-free vectors u(x,t) (satisfying the boundary
conditions) that have finite norm,


$$\|u\|_{J_1} = \|u\|_{L^2} + \|\operatorname{curl} u\|_{L^2}$$ [19]


This theory can be applied, for example, to show
that any shear flow with no inflection points in the
profile U(z) is nonlinearly stable in the function
space $J_1$, that is, the classical Rayleigh criterion
implies not only spectral stability but also nonlinear
stability.
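To make the norm in [19] concrete, the sketch below evaluates the sum of the $L^2$ norms of a velocity field and of its curl for a 2D field sampled on a periodic square of side 2π. The periodic setting, the spectral (FFT) derivatives, and the sample field u = (sin y, 0) are assumptions made for this illustration; the function name is hypothetical.

```python
import numpy as np


def j1_norm(u: np.ndarray, v: np.ndarray, L: float = 2.0 * np.pi) -> float:
    """||(u, v)||_{L^2} + ||curl (u, v)||_{L^2} on a periodic square of side L.

    u and v are n-by-n arrays indexed as [ix, iy] on a uniform grid.
    Derivatives are computed spectrally, which is exact for resolved modes.
    """
    n = u.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)  # angular wave numbers
    dv_dx = np.fft.ifft(1j * k[:, None] * np.fft.fft(v, axis=0), axis=0).real
    du_dy = np.fft.ifft(1j * k[None, :] * np.fft.fft(u, axis=1), axis=1).real
    curl = dv_dx - du_dy
    cell_area = (L / n) ** 2
    velocity_l2 = np.sqrt(np.sum(u ** 2 + v ** 2) * cell_area)
    curl_l2 = np.sqrt(np.sum(curl ** 2) * cell_area)
    return float(velocity_l2 + curl_l2)


n = 64
grid = 2.0 * np.pi * np.arange(n) / n
X, Y = np.meshgrid(grid, grid, indexing="ij")
u_sample = np.sin(Y)           # shear flow u = (sin y, 0)
v_sample = np.zeros_like(X)
norm_value = j1_norm(u_sample, v_sample)
```

For this sample field both terms equal π√2, so the norm is 2π√2, which gives a simple check on the discretization.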

We note that Arnol’d’s stability method cannot be 
applied to the Euler equations in three dimensions 
because the second variation of the energy defined 
on the tangent space to M is never definite at a 
critical point Uo(x). This result is suggestive, but 
does not prove, that most Euler flows in three 
dimensions are nonlinearly unstable in the Arnol’d 
sense. To quote Arnol’d, in the context of the Euler 
equations “there appear to be an infinitely great 
number of unstable configurations.” 

In recent years, there have been a number of 
results concerning nonlinear instability for the 
Euler equation. Most of these results prove non- 
linear instability under certain assumptions on the 
structure of the spectrum of the linearized Euler 
operator. To date, none of the approaches prove 
the definitive result that in general linear instability 
implies nonlinear instability. As we have remarked, 
this is a much more delicate issue for Euler than for 
Navier-Stokes because of the existence, for a 
generic Euler flow, of a nonempty essential 
unstable spectrum. To give a flavor of the mathe- 
matical treatment of nonlinear instability for the 
Euler equations, we present one recent result and 
refer the interested reader to articles listed in the 
“Further reading” section for further results and 
discussions. 

In the context of Euler equations in two dimen- 
sions, we adopt the following definition of Lyapu- 
nov stability. 


Definition 4 An equilibrium solution $U_0(x)$ is
called Lyapunov stable if for every $\epsilon > 0$ there exists
$\delta > 0$ so that for any divergence-free vector $u(x,0) \in W^{1+s,p}$,
$s > 2/p$, such that $\|u(x,0)\|_{L^2} < \delta$, the unique
solution u(x,t) to [14]-[16] satisfies

\[ \|u(x,t)\|_{L^2} < \epsilon \quad \text{for } t \in [0,\infty) \]

We note that we require the initial value u(x,0) to
be in the Sobolev space $W^{1+s,p}$, $s > 2/p$, since it is
known that the 2D Euler equations are globally in
time well posed in this function space.


Definition 5 Any steady flow $U_0(x)$ for which the
conditions of Definition 4 are violated is called
nonlinearly unstable in $L^2$.


Observe that the open issues (in three dimensions) 
of nonuniqueness or nonexistence of solutions to 
[14]-[16] would, under Definition 5, be scenarios 
for instability. 


Theorem 6 (Nonlinear instability for 2D Euler
flows). Let $U_0(x) \in C^\infty(T^2)$ satisfy [12]. Let $\Lambda$
be the maximal Lyapunov exponent of the ODE
$\dot{x} = U_0(x)$. Assume that there exists an eigenvalue $\lambda$
in the $L^2$ spectrum of the linear operator $L_E$ given
by [15] with $\mathrm{Re}\,\lambda > \Lambda$. Then in the sense of
Definition 5, $U_0(x)$ is Lyapunov unstable with
respect to growth in the $L^2$-norm.


The proof of this result is given in Vishik and 
Friedlander (2003) and uses a so-called “bootstrap” 
argument whose origins can be found in references 
in that article. We remark that the above result gives 
nonlinear instability with respect to growth of the 
energy of a perturbation which seems to be a 
physically reasonable measure of instability. 

In order to apply Theorem 6 to a specific 2D flow
it is necessary to know that the linear operator $L_E$
has an eigenvalue with $\mathrm{Re}\,\lambda > \Lambda$. As we have
discussed, such knowledge is lacking for a generic
flow $U_0(x)$. Once again, we turn to shear flows. Since,
as we noted, $\Lambda = 0$ for shear flows, any shear profile for
which unstable eigenvalues have been proved to
exist provides an example of nonlinear instability
with respect to growth in the energy.

We conclude with the observation that it is 
tempting to speculate that, given the complexity 
of flows in three dimensions, most, if not all, such 
inviscid flows are nonlinearly unstable. It is clear 
from the concept of the fluid Lyapunov exponent 
that stretching in a flow is associated with 
instabilities and there are more mechanisms for 
stretching in three, as opposed to two, dimensions. 
However, to date there are virtually no mathema- 
tical results for the nonlinear stability problem for 
fully 3D flows and many challenging issues remain 
entirely open.
Introduction


The theorem on stability of matter is one of the most 
celebrated results in mathematical physics. It is one 
of the rare cases where a result of such great 
importance to our understanding of the world 
around us appeared first in a completely rigorous 
formulation. 

Issues of stability are, of course, extremely impor- 
tant in physics. One of the major triumphs of the 
theory of quantum mechanics is the explanation it 
gives of the stability of the hydrogen atom (and the 
complete description of its spectrum). Quantum 
mechanics or, more precisely, the uncertainty princi- 
ple explains not only the stability of tiny microscopic 
objects, but also the stability of gigantic stellar 
objects such as white dwarfs. Chandrasekhar’s 
famous theory on the stability of white dwarfs 
required, however, not only the usual uncertainty 
principle, but also the Pauli exclusion principle for 
the fermionic electrons. 

Whereas both the stability of atoms and the 
stability of white dwarfs were early triumphs of 
quantum mechanics, it, surprisingly, took nearly 
40 years before the question of stability of everyday 
macroscopic objects was even raised (Fisher and 
Ruelle 1966). The rigorous answer to the question 
came shortly thereafter in what came to be known 
as the “theorem on stability of matter” proved first 
by Dyson and Lenard (1967). 

Both the stability of hydrogen and the stability of 
white dwarfs simply mean that the total energy of 
the system cannot be arbitrarily negative. If there 
were no such lower bound to the energy, one would 
have a system from which it would be possible, in 
principle, to extract an infinite amount of energy. 
One often refers to this kind of stability as stability 
of the first kind. 

Stability of matter is somewhat different. Stability 
of the first kind for atoms generalizes, as noted later, 
to objects of macroscopic size. The question arises 
as to how the lowest possible energy depends on the 
size or, more precisely, on the (macroscopic) number 
of particles in the object. Stability of matter in its 
precise mathematical formulation is the requirement 
that the lowest possible energy depends at most 
linearly on the number of particles. Put differently, 
the lowest possible energy calculated per particle 


cannot be arbitrarily negative as the number of 
particles increases. This is often referred to as 
“stability of the second kind.” If stability of the 
second kind does not hold, one would be able to 
extract an arbitrarily large amount of energy by 
adding a single atomic particle to a sufficiently large 
macroscopic object. 

A perhaps more intuitive notion of stability is 
related to the volume occupied by a macroscopic 
object. More precisely, the volume of the object, 
when its total energy is close to the lowest possible 
energy, grows at least linearly in the number of 
particles. This volume dependence is a fairly simple 
consequence of stability of matter as formulated 
above. 

The first mention of stability of the second kind 
for a charged system is perhaps by Onsager (1939), 
who studied a system of charged classical particles 
with a hard core and proved the stability of the 
second kind. The proof of stability of matter by 
Dyson and Lenard, which does not rely on any hard- 
core assumption, but rather on the properties of 
fermionic quantum particles, used results from 
Onsager’s paper. 

The real relevance of the notion of stability of the 
second kind was first realized by Fisher and Ruelle 
(1966) in an attempt to understand the thermo- 
dynamic properties of matter and to give meaning 
to thermodynamic quantities such as the energy 
density (energy per volume). Stability of matter is a 
necessary ingredient in explaining the existence of 
thermodynamics, that is, that the energy per 
volume has a well-defined limit as the volume and 
number of particles tend to infinity, with the ratio 
(i.e., the density of particles) kept fixed. The 
existence of this limit is, however, not just a simple 
consequence of stability of matter. The existence of 
the thermodynamic limit for ordinary charged 
matter was proved rigorously by Lieb and Lebowitz 
(1972) using the result on stability of matter as an 
input. 

After the original proof of stability of matter by 
Dyson and Lenard, several other proofs were given 
(see, e.g., reviews by Lieb (1976, 1990, 2004) for 
detailed references). Lieb and Thirring (1975) in 
particular presented an elegant and simple proof 
relying on an uncertainty principle for fermions. As 
explained in a later section, the best mathematical 
formulation of the usual uncertainty principle is in 
terms of a Sobolev inequality. The method of Lieb 
and Thirring is related to a Sobolev type inequality 
for antisymmetric functions. The Lieb-Thirring 
inequality is discussed later. The proof by Dyson 


and Lenard gave a very poor bound on the lowest 
possible energy per particle. The proof by Lieb and 
Thirring gave a much more realistic bound on this 
quantity (see below). Two proofs of stability of 
matter will be sketched here. Both proofs rely on the 
Lieb-Thirring inequality. The first proof described is 
mathematically simple to explain, whereas the 
second proof (Lieb-Thirring) is based on the 
Thomas—Fermi theory. It is mathematically some- 
what more involved but, from a physical point of 
view, more intuitive. 

As in the case of white dwarfs, stability of matter 
relies on the fermionic property of electrons. Dyson 
(1967) proved that the stability of the second kind 
fails if we ignore the Pauli exclusion principle. In 
physics textbooks, the importance of the Pauli 
exclusion principle for the stability of white dwarfs 
is often emphasized. Its importance for the stability 
of everything around us is usually ignored. 

As mentioned above, the result on stability of
matter appeared from the beginning as a completely 
rigorously proved theorem. In contrast, the stability 
of white dwarfs was only derived rigorously by Lieb 
and Thirring (1984) and Lieb and Yau (1987) over 
50 years after the original work of Chandrasekhar. 

The original formulation of stability of matter, 
which is given in the next section, dealt with 
charged matter consisting of electrons and nuclei 
interacting only through electrostatic interactions 
and being described by nonrelativistic quantum 
mechanics. Over the years, many generalizations of 
stability of matter have been derived in order to 
include relativistic effects and electromagnetic inter- 
actions. Some of these generalizations will be 
discussed in this article. A complete understanding 
of stability of matter in quantum electrodynamics 
(QED) does not exist as yet, which is intimately 
related to the fact that this theory still awaits a 
mathematically satisfactory formulation. 


The Formulation of Stability of Matter 


Consider K nuclei with nuclear charges $z_1,\ldots,z_K > 0$
at positions $r_1,\ldots,r_K \in \mathbb{R}^3$, and N electrons with
charges −1 (this amounts to a choice of units) at
positions $x_1,\ldots,x_N \in \mathbb{R}^3$. In order to discuss
stability, it turns out that one can consider the 
nuclei as fixed in space, whereas the electrons are 
dynamic. More precisely, this means that the 
kinetic energy of the nuclei is ignored. It is 
important to realize that if stability holds for static 
nuclei, it also holds for dynamic nuclei. This is 
simply because the kinetic energy is positive, so that 
the effect of ignoring it is to lower the total energy.
Since we consider only electrostatic interactions,
the quantum Hamiltonian describing this system is

\[ H_N = \sum_{i=1}^N T_i - \sum_{i=1}^N \sum_{k=1}^K \frac{z_k}{|x_i - r_k|} + \sum_{1 \le i < j \le N} \frac{1}{|x_i - x_j|} + \sum_{1 \le k < l \le K} \frac{z_k z_l}{|r_k - r_l|} \qquad [1] \]

The kinetic energy operator $T_i$ is (minus half) the Laplacian in
the variable $x_i$, i.e., $T_i = -(1/2)\Delta_i$. Atomic units are
used, where not only the electron charge is −1, but the
mass of the electron is also 1 and $\hbar = 1$. The unit of
energy is then 2 Ry.

The Hamiltonian $H_N$ depends on the parameters
$z = (z_1,\ldots,z_K)$ and $r = (r_1,\ldots,r_K)$. It acts on the
Hilbert space of fermionic, that is, antisymmetric
wave functions. More precisely, the fermionic
Hilbert space is

\[ \mathcal{H}_N^F = \bigwedge^N L^2(\mathbb{R}^3; \mathbb{C}^2) \]

Here the target space is $\mathbb{C}^2$, in order to describe
spin-1/2 particles. One can, of course, also consider
the Hamiltonian $H_N$ on the full Hilbert space,

\[ \mathcal{H}_N = \bigotimes^N L^2(\mathbb{R}^3; \mathbb{C}^2) = L^2(\mathbb{R}^{3N}; \mathbb{C}^{2^N}) \]

of which $\mathcal{H}_N^F$ is a subspace.
The quantity of interest is the ground-state energy

\[ E^F(z,N,K) = \inf_r \inf \operatorname{spec}_{\mathcal{H}_N^F} H_N = \inf_r \inf \{ (\Psi, H_N \Psi) : \Psi \in \mathcal{H}_N^F \cap C_0^\infty(\mathbb{R}^{3N}; \mathbb{C}^{2^N}),\ \|\Psi\| = 1 \} \qquad [2] \]

and likewise for the ground-state energy E(z,N,K)
on the full space $\mathcal{H}_N$. Clearly, $E^F(z,N,K) \ge E(z,N,K)$.
It turns out that the energy E(z,N,K) is
the same as one would get by restricting to
symmetric functions instead of antisymmetric
ones. Therefore, the energy E(z,N,K) is often
referred to as the lowest possible energy for bosonic
particles.

The Hamiltonian $H_N$ is an unbounded operator
and we must discuss its domain to be able to talk
about its spectrum. Also, it should be self-adjoint. It
turns out that these questions are intimately related
to stability. The operator $H_N$ is well defined on
smooth, compactly supported (i.e., $C_0^\infty$) functions. Thus, the last definition
of $E^F(z,N,K)$ in [2] is meaningful. If this ground-state
energy is finite (i.e., not $-\infty$), then the Hamiltonian
has an extension, the Friedrichs extension, to a
self-adjoint operator with the property that the
second equality in [2] holds.

In the definition of $E^F$, we have minimized over
all the positions r of the nuclei. Even though the 
nuclear dynamics is not considered, one is still 
interested in finding the lowest possible energy 
independent of where they are located. 


Theorem 1 (Stability of the first kind). For all N,
K, and z, we have

\[ E(z,N,K) > -\infty \]


Theorem 2 (Stability of matter). There exists a
constant $C_{|z|} > 0$ depending only on $|z| = \max\{z_1,\ldots,z_K\}$
such that

\[ E^F(z,N,K) \ge -C_{|z|}(N + K) \]

The constant $C_{|z|}$ bounds the binding energy per
particle. In the case of hydrogen atoms, when
|z| = 1, Dyson and Lenard arrived at a bound with
$C_1 \sim 10^{14}$ Ry. Lieb and Thirring arrived at $C_1 \approx 5$ to $10$ Ry.
Since the binding energy of a single
hydrogen atom is 1 Ry, it is easy to see that one
must have $C_1 \ge 1/4$ in the atomic units used here (i.e., 1/2 Ry).
Over the years, there have
been some improvements on the estimated value of
this constant in the theory of stability of matter.
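
The lower bound on $C_1$ can be checked with one line of arithmetic. The sketch below assumes (this is not spelled out in the text) that Theorem 2 is compared with a configuration of M hydrogen atoms placed far apart, so that N = K = M and the energy approaches M times the hydrogen ground-state energy:

```latex
% Assumption: M well-separated hydrogen atoms, N = K = M.
% In atomic units 1 Ry = 1/2, so each atom contributes energy -1/2.
\[
  -C_1 (N + K) \;\le\; E^F(z,N,K) \;\le\; -\tfrac12 M ,
  \qquad N = K = M ,
\]
\[
  \Longrightarrow\quad 2 C_1 M \;\ge\; \tfrac12 M
  \quad\Longrightarrow\quad
  C_1 \;\ge\; \tfrac14 \ \text{(atomic units)} \;=\; \tfrac12\ \mathrm{Ry}.
\]
```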

That the Pauli exclusion principle, that is, the 
fermionic character of the electrons, is necessary for 
stability of matter is a consequence of the next 
theorem. 


Theorem 3 (The $N^{5/3}$ law for bosons). If N = K
and $z_1 = \cdots = z_K = z > 0$, then there exist constants
$C_\pm > 0$ depending on z such that

\[ -C_- N^{5/3} \le E(z,N,K) \le -C_+ N^{5/3} \]


It is the superlinear (exponent 5/3) behavior in N 
of the upper bound that violates stability of matter. 
This upper bound was proved by Lieb (1979) by a 
fairly simple variational argument. The lower bound 
above, which shows that the exponent 5/3 is 
optimal, was proved by Dyson and Lenard (1968) 
in their original paper on stability of matter. 
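
To see concretely why the upper bound is incompatible with stability of the second kind, one can divide by the particle number. A short sketch, using only the upper bound of Theorem 3 with N = K:

```latex
% Energy per particle for bosons, with N = K so that N + K = 2N.
\[
  \frac{E(z,N,K)}{N+K}
  \;\le\; -\frac{C_+ N^{5/3}}{2N}
  \;=\; -\frac{C_+}{2}\, N^{2/3}
  \;\longrightarrow\; -\infty
  \qquad (N \to \infty),
\]
% so no bound of the form E >= -C (N + K) can hold on the full
% (bosonic) Hilbert space.
```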

This theorem leaves open the possibility that the 
stability of matter could be recovered by introducing 
finite nuclear masses. That this, indeed, is not the case 
was proved by Dyson (1967) by a complicated 
variational argument based on the Bogolubov pair 
theory for superfluid helium. We now add the kinetic
energy $\sum_{k=1}^K -(1/2)\Delta_{r_k}$ of the nuclei (assuming, for
simplicity, that they have the same mass as the
electrons) to the Hamiltonian $H_N$ and consider
the case where $z_1 = z_2 = \cdots = z_K = 1$. We denote the
ground-state energy over the space $L^2(\mathbb{R}^{3(N+K)})$
(ignoring spin) by E(N,K). Then, Dyson proved that

\[ \min_{N+K=M} E(N,K) \le -C M^{7/5} \]

for some constant C > 0. It was later shown by 
Conlon et al. (1988) that the exponent 7/5 is indeed 
optimal. Dyson (1967) made a conjecture for the 
precise asymptotic behavior of this energy. This 
conjecture, which was proved by Lieb and Solovej 
(2005) and Solovej (2004), is given in the next 
theorem. 


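Theorem 4 (Dyson's 7/5-law for the charged
Bose gas).

\[ \lim_{M \to \infty} \min_{N+K=M} \frac{E(N,K)}{M^{7/5}} = \inf \left\{ \frac12 \int |\nabla\phi|^2 - J \int \phi^{5/2} \ :\ \phi \ge 0,\ \int \phi^2 = 1 \right\} \qquad [3] \]

where

\[ J = \left( \frac{4}{\pi} \right)^{3/4} \frac{\Gamma(1/2)\,\Gamma(3/4)}{5\,\Gamma(5/4)} \]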


Generalizations of Stability of Matter 


Over the years, generalizations of stability of matter 
including relativistic effects and interactions with the 
electromagnetic field have been attempted. Since the 
relativistic Dirac operator is not bounded below, we 
cannot simply replace the standard nonrelativistic 
kinetic energy operator $T_i = -(1/2)\Delta_i$ by the free
Dirac operator. 

Relativistic effects have been included by considering
the (pseudo) relativistic kinetic energy

\[ T_i^{\mathrm{Rel}} = \sqrt{-c^2 \Delta_i + c^4} - c^2 \]

In the units used in this article, the physical value
of the speed of light c is approximately 137 or,
more precisely, the reciprocal of the fine-structure
constant $\alpha$.
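
The relation of this operator to the nonrelativistic $T_i$ can be made explicit in momentum space. The following sketch assumes the usual Fourier convention in which $-\Delta$ acts as multiplication by $|p|^2$; the symbol p is introduced here only for illustration:

```latex
% Symbol of the relativistic kinetic energy and its expansion for |p| << c.
\[
  \sqrt{c^2 |p|^2 + c^4} - c^2
  \;=\; c^2\Big(\sqrt{1 + |p|^2/c^2} - 1\Big)
  \;=\; \frac{|p|^2}{2} - \frac{|p|^4}{8c^2} + O\!\Big(\frac{|p|^6}{c^4}\Big).
\]
% Elementary two-sided bound, valid for all p:
\[
  c|p| - c^2 \;\le\; \sqrt{c^2 |p|^2 + c^4} - c^2 \;\le\; c|p| .
\]
```

The leading term reproduces $-(1/2)\Delta$, while the two-sided bound shows that at large momentum the kinetic energy grows only linearly in |p|. It then scales like the Coulomb potential under dilations, which is consistent with the appearance of a critical coupling in the results quoted next.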

For this relativistic kinetic energy, Lieb and Yau
(1988) proved that stability of matter holds in the
sense formulated in Theorem 2 if $\alpha\ (= c^{-1})$ is small
enough and $\max_k\{z_k\}\,\alpha \le 2/\pi$. It is known that
the value $2/\pi$ is the best possible, since it is so
for the one-atom case. The one-atom case had
been studied by Herbst. The corresponding case of 
a one-electron molecule was studied by Lieb and 
Daubechies. Less optimal results on the stability of 
matter with relativistic kinetic energy had been 


obtained prior to the work of Lieb and Yau by 
Conlon and later by Fefferman and de la Llave. 
References to these works can be found in the work 
of Lieb and Yau (1988). 

The relativistic kinetic energy $T_i^{\mathrm{Rel}}$ agrees with the
free Dirac operator on the positive spectral subspace
of the free Dirac operator (i.e., a subspace of
$L^2(\mathbb{R}^3; \mathbb{C}^4)$). Therefore, the stability of matter
follows if $T_i$ is replaced by the free Dirac operator
and if one restricts to the Hilbert space obtained as
in [2] but with $L^2(\mathbb{R}^3; \mathbb{C}^2)$ replaced by the positive
spectral subspace of the free Dirac operator. This 
formulation is often referred to as the “no-pair” 
model. In the usual Dirac picture, the negative 
spectral subspace, the Dirac sea, is occupied. As long 
as one ignores pair creation, only the positive 
spectral subspace is available. 

Magnetic fields may be included by considering
the “magnetic kinetic energy”

\[ T_i^{\mathrm{Mag}} = \tfrac12 \big( -i\nabla_i - A(x_i) \big)^2 \]

It turns out that the stability of matter theorem
(Theorem 2) holds for all magnetic vector potentials
$A : \mathbb{R}^3 \to \mathbb{R}^3$ with a constant $C_{|z|}$ independent of A.
This is, therefore, also the case if we consider the
magnetic field (or rather the vector potential) as a
dynamic variable and add the (positive) field energy

\[ U = \frac{1}{8\pi} \int_{\mathbb{R}^3} |\nabla \times A(x)|^2 \, dx \qquad [4] \]

to the Hamiltonian. The resulting Hamiltonian
describes charged spinless particles interacting
with a classical electromagnetic field.

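A more complicated situation is described by the
“magnetic Pauli kinetic energy”

\[ T_i^{\mathrm{Pauli}} = \tfrac12 \big( (-i\nabla_i - c^{-1} A(x_i)) \cdot \sigma_i \big)^2 \]

where the coupling of the spin to the magnetic field
is included through the vector of 2×2 Pauli
matrices acting on the spin components of particle i,
that is, $\sigma = (\sigma^1, \sigma^2, \sigma^3)$, with

\[ \sigma^1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma^2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma^3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \]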

For the Pauli kinetic energy, stability of matter will 
not hold independently of the magnetic field (or even 
for a fixed unbounded magnetic field) unless the field 
energy U in [4] is included in the Hamiltonian. If the 
field energy is included, stability of matter holds 
independently of the magnetic field, that is, even if 
one minimizes over the dynamic variable A, if
$\alpha\ (= c^{-1})$ and $\max_k\{z_k\}\,\alpha^2$ are small enough. This was
proved by Fefferman (1997) and by Lieb et al.
(1995). The latter result includes the physical value of
$\alpha$. The fact that a bound on $\alpha$ is needed had been
proved by Loss and Yau. Stability for a one-electron
atom had been proved in this model by Fröhlich, 
Lieb, and Loss. The many-electron atom and the one- 
electron molecule had been studied by Lieb and Loss. 
Most relevant references may be found in the work of 
Lieb et al. (1995). 
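
The role of the field energy becomes more transparent if the Pauli kinetic energy is expanded. The sketch below uses the standard identity $(\sigma \cdot a)(\sigma \cdot b) = a \cdot b + i\,\sigma \cdot (a \times b)$ for the Pauli matrices; the abbreviations $\pi$ and $B = \nabla \times A$ are introduced here for illustration and are not notation from the text:

```latex
% Abbreviation (assumption for this sketch): \pi = -i\nabla - c^{-1} A(x),
% and B = \nabla \times A is the magnetic field.
\[
  (\sigma \cdot \pi)^2 \;=\; \pi \cdot \pi + i\, \sigma \cdot (\pi \times \pi),
  \qquad
  \pi \times \pi \;=\; \frac{i}{c}\, \nabla \times A \;=\; \frac{i}{c}\, B ,
\]
\[
  \Longrightarrow\quad
  \tfrac12 (\sigma \cdot \pi)^2
  \;=\; \tfrac12 \big( -i\nabla - c^{-1} A(x) \big)^2 \;-\; \frac{1}{2c}\, \sigma \cdot B(x).
\]
```

The first term is the magnetic kinetic energy with A rescaled by $c^{-1}$. The additional spin-field term has no definite sign and grows with the field strength, which is consistent with the statement above that a bound uniform in the field requires the field energy U to be included.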

The possibility of quantizing the magnetic field has 
also been studied. In this case, one must introduce an 
ultraviolet cutoff in the momentum modes of the 
vector potential. Stability of matter in the resulting 
model of (ultraviolet cutoff) QED coupled to non- 
relativistic matter was proved by Fefferman et al. 
improving results of Bugliaro, Fröhlich, and Graf. 

Finally, one may include both relativistic effects 
and electromagnetic interactions. Let us first discuss 
the case of classical electromagnetic fields. If instead 
of the Pauli kinetic energy one uses the Dirac 
operator with a magnetic vector potential then 
there would be no lower bound on the energy. But, 
as previously described, one can study a no-pair 
formulation of relativistic particles coupled to 
electromagnetic fields. The question arises which 
subspace of $L^2(\mathbb{R}^3; \mathbb{C}^4)$ one should restrict to (i.e.,
which subspace is filled and which one is available). 
There are two obvious choices. Either one should, as 
before, restrict to the positive spectral subspace of 
the free Dirac operator or one should restrict to the 
positive spectral subspace of the magnetic Dirac 
operator. It is proved by Lieb et al. (1997) that the 
former choice leads to instability, whereas stability 
of matter holds for the latter choice under some 
conditions on $\alpha$ and $\max_k\{z_k\}$. Stability requires that
the field energy U is included in the Hamiltonian. It 
then holds independently of the magnetic field. 

This final stability result also holds if the magnetic 
field is quantized with an ultraviolet cutoff as 
proved by Lieb and Loss (2002). 

The no-pair model even with the ultraviolet cutoff 
quantized field is not fully relativistically invariant. 
As mentioned above, there is still no mathematical 
formulation of QED, a fully relativistically invariant 
model for quantum particles interacting with elec- 
tromagnetic fields. 


The Proof of Stability of the First Kind 


The proof of stability of the first kind will now be 
sketched for charged quantum systems. 

As mentioned in the introduction, stability of the 
first kind is a consequence of the uncertainty 
principle. Contrary to what is often stated in physics 
textbooks, stability does not follow from the 
Heisenberg formulation of the uncertainty principle.
A mathematically more flexible formulation
is provided by the classical Sobolev inequality,
which states that for all square-integrable functions
$\psi \in L^2(\mathbb{R}^3)$, one has

\[ \int |\nabla\psi|^2 \ge C_S \left( \int |\psi|^6 \right)^{1/3} \qquad [5] \]

for $C_S > 0$. It follows from this inequality that for
any attractive potential V, there is a lower bound on
the energy expectation

\[ \frac12 \int |\nabla\psi|^2 - \int V |\psi|^2 \ge \frac{C_S}{2} \left( \int |\psi|^6 \right)^{1/3} - \left( \int V^{5/2} \right)^{2/5} \left( \int |\psi|^6 \right)^{1/5} \left( \int |\psi|^2 \right)^{2/5} \ge -C \int V^{5/2} \int |\psi|^2 \]

for some C > 0. Thus, the lowest possible energy of
one particle moving in the potential V is bounded
below by $-C \int V^{5/2}$. For N (noninteracting) particles,
the lower bound is $-CN \int V^{5/2}$. This holds whether
or not the particles have spin. If, more generally, the
potential can be written as V = U + W, with U, W ≥ 0,
where $\int U^{5/2} < \infty$ and W is bounded, $W \le \|W\|_\infty$,
then the energy of N noninteracting particles moving
in the potential V is bounded below by

\[ -NC \int U^{5/2} - N \|W\|_\infty \qquad [6] \]

For the Hamiltonian $H_N$ from [1], one can get a
lower bound on the energy E(z,N,K) by ignoring all
the positive potential terms, that is, the last two
sums in [1]. The remaining Hamiltonian describes N
independent particles moving in the potential

\[ -\sum_{k=1}^K \frac{z_k}{|x - r_k|} = -\sum_{k=1}^K \big( U_k(x) + W_k(x) \big) \]

where $U_k$ is the restriction of $z_k/|x - r_k|$ to the set
$|x - r_k| < R$ for some R > 0 and $W_k$ is the restriction
to the complementary set. Using [6], one can easily
see that the energy expectation is bounded below by

\[ -C N K^{5/2} \max_k\{z_k\}^{5/2} R^{1/2} - N K \max_k\{z_k\} R^{-1} \ge -C' N K^2 \max_k\{z_k\}^2 \]

where we have made the optimal choice for
$R \sim (K \max_k\{z_k\})^{-1}$.
This finite lower bound on the energy proves 
the stability of the first kind, but it clearly does 


not have the form required for the stability of the 
second kind. 
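
The powers of K, R, and the charges in this estimate can be traced through an elementary computation. The sketch below writes $z = \max_k\{z_k\}$ and uses the convexity bound $(\sum_k a_k)^{5/2} \le K^{3/2} \sum_k a_k^{5/2}$; the function g and the constants a, b are names introduced only here:

```latex
% Integral of a truncated Coulomb potential (polar coordinates):
\[
  \int U_k^{5/2}
  = z_k^{5/2} \int_{|x| < R} |x|^{-5/2}\, dx
  = 4\pi z_k^{5/2} \int_0^R r^{-1/2}\, dr
  = 8\pi z_k^{5/2} R^{1/2}.
\]
% Sum over the K nuclei (convexity) and sup bound on the tails:
\[
  \int \Big( \sum_{k=1}^K U_k \Big)^{5/2} \le 8\pi K^{5/2} z^{5/2} R^{1/2},
  \qquad
  \Big\| \sum_{k=1}^K W_k \Big\|_\infty \le K z R^{-1}.
\]
% Optimization in R of g(R) = a R^{1/2} + b R^{-1},
% with a = 8\pi C N K^{5/2} z^{5/2} and b = N K z:
\[
  g'(R) = 0 \iff R^{3/2} = \frac{2b}{a} \propto (K z)^{-3/2}
  \quad\Longrightarrow\quad
  R \propto (K z)^{-1}, \qquad g(R) \propto N K^2 z^2 .
\]
```

The resulting bound grows like $N K^2$, which is finite for fixed N and K but far from the linear growth in N + K demanded by stability of matter.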


The Proof of Stability of Matter 


The proof of stability of the first kind presented in 
the previous section must be improved in two ways 
in order to conclude the stability of matter. 

For fermions, it turns out that the lower bound in 
[6] can be improved in such a way that there is no 
factor N in the first term. This is the content of the 
bound of Lieb and Thirring discussed in the 
introduction. 


Theorem 5 (Lieb-Thirring inequality 1975). The
sum of all the negative eigenvalues of the operator
$-(1/2)\Delta - V(x)$ is bounded below by

\[ -L_{LT} \int V(x)^{5/2} \, dx \]

for some constant $L_{LT} > 0$.


For N noninteracting fermions moving in the 
potential V, the lowest possible energy is given by 
the sum of the N lowest eigenvalues of the operator 
in the above theorem. Thus, the theorem gives a 
lower bound on this energy independently of N. 

The second point where the argument from the 
previous section has to be improved is the control of 
the electrostatic energy. In the above discussion, all 
repulsive terms have simply been ignored. For 
stability of matter, a much more delicate bound is 
needed. Many versions of such bounds have been 
given going back to the work of Onsager (1939). 
Here, a result of Baxter (1980) will be used. 


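Theorem 6 (Baxter’s correlation estimate). For all
positions $x_1,\ldots,x_N, r_1,\ldots,r_K \in \mathbb{R}^3$ and all charges
$z_1,\ldots,z_K > 0$, we have the pointwise inequality

\[ -\sum_{k=1}^K \sum_{i=1}^N \frac{z_k}{|x_i - r_k|} + \sum_{1 \le i < j \le N} \frac{1}{|x_i - x_j|} + \sum_{1 \le k < l \le K} \frac{z_k z_l}{|r_k - r_l|} \ge -\sum_{j=1}^N V(x_j) \]

where $V(x) = (1 + 2\max_k\{z_k\}) \max_k\{|x - r_k|^{-1}\}$.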


This theorem simply states that, for a lower 
bound, one can replace the full electrostatic Cou- 
lomb energy by the energy of independent electrons 
moving in the potential where they always see only 
the closest nuclei (with a modified charge). Baxter 
(1980) used probabilistic techniques to prove the 
inequality. An improved version of the inequality 
was given by Lieb and Yau (1988), with an analytic 
proof. 


Similarly to the argument in the previous section,
one can write V(x) = U(x) + W(x), where U is the
restriction of V to the set where $\min_k\{|x - r_k|\} < R$
for some R > 0 and W is the restriction to the
complementary set. It then follows from Baxter’s
correlation estimate and the Lieb-Thirring inequality
that the lowest eigenvalue of the Hamiltonian $H_N$ on
the fermionic Hilbert space $\mathcal{H}_N^F$ is bounded below by

\[ -L_{LT} \int U^{5/2} - N (1 + 2\max_k\{z_k\}) R^{-1} \ge -C (1 + 2\max_k\{z_k\})^{5/2} K R^{1/2} - N (1 + 2\max_k\{z_k\}) R^{-1} \ge -C' (1 + 2\max_k\{z_k\})^2 (N + K) \]

where $R \sim (1 + 2\max_k\{z_k\})^{-1}$. This lower bound is
linear in the total particle number N + K, as
required by stability of matter.
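
The constants in this chain can be made explicit. The sketch below writes $\zeta = 1 + 2\max_k\{z_k\}$ (a shorthand introduced here) and takes $R = \zeta^{-1}$ exactly; it relies on the fact that on the set where U is nonzero, $U^{5/2}$ is dominated by the sum of the truncated single-nucleus terms:

```latex
% Shorthand (assumption for this sketch): \zeta = 1 + 2 \max_k z_k.
\[
  \int U^{5/2}
  \;\le\; \sum_{k=1}^K \int_{|x - r_k| < R} \frac{\zeta^{5/2}}{|x - r_k|^{5/2}}\, dx
  \;=\; 8\pi\, \zeta^{5/2} K R^{1/2},
  \qquad
  \|W\|_\infty \le \zeta R^{-1}.
\]
% With R = \zeta^{-1}:
\[
  -L_{LT} \int U^{5/2} - N \|W\|_\infty
  \;\ge\; -\big( 8\pi L_{LT} K + N \big) \zeta^2
  \;\ge\; -\max\{ 8\pi L_{LT}, 1 \}\, \zeta^2\, (N + K).
\]
```

Compared with the proof of stability of the first kind, two things have changed: the Lieb-Thirring term carries no factor N, and each point sees only its nearest nucleus, so the factor $K^{5/2}$ is replaced by K.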


From Thomas-Fermi Theory to Stability 
of Matter 


In this final section, the proof of stability of matter 
by Lieb and Thirring (1975), where they use the 
Thomas-Fermi theory, is discussed briefly. First note 
that there is a dual formulation of the Lieb-Thirring 
inequality theorem (Theorem 5), which makes the 
connection to the Sobolev inequality [5] much more 
transparent. 


Theorem 7 (Lieb-Thirring inequality as a kinetic
energy bound). For any normalized antisymmetric
wave function $\Psi \in \mathcal{H}_N^F$ we have, with
$C_{LT} = \frac35 (\frac52 L_{LT})^{-2/3}$, the following lower bound on the
kinetic energy:

\[ \sum_{i=1}^N \frac12 \int \|\nabla_i \Psi(x_1,\ldots,x_N)\|^2 \, dx_1 \cdots dx_N \ge C_{LT} \int_{\mathbb{R}^3} \rho(x)^{5/3} \, dx \]

where $\|\cdot\|$ is the norm in spin space ($\mathbb{C}^{2^N}$) and the
one-electron density is given by

\[ \rho(x) = N \int \|\Psi(x, x_2, \ldots, x_N)\|^2 \, dx_2 \cdots dx_N \]

This estimate follows immediately from Theorem
5, which implies that

\[ \sum_{i=1}^N \frac12 \int \|\nabla_i \Psi\|^2 - \int \rho V \ge -L_{LT} \int V^{5/2} \]

To arrive at Theorem 7, simply choose
$V = (\frac52 L_{LT})^{-2/3} \rho^{2/3}$.

One should compare the Lieb-Thirring kinetic
energy bound with the expression $(3/10)(3\pi^2)^{2/3} \rho^{5/3}$
for the (thermodynamic) energy density of a
free Fermi gas. One of the yet unproven conjectures
is that the Lieb-Thirring bound holds with $C_{LT}$
replaced by the free Fermi constant $(3/10)(3\pi^2)^{2/3}$.
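
The relation between the constants in Theorems 5 and 7 can be verified directly. The following sketch inserts a trial potential $V = \kappa \rho^{2/3}$ into the inequality implied by Theorem 5 and then optimizes over the parameter $\kappa > 0$, which is a symbol introduced only for this computation:

```latex
% Insert V = \kappa \rho^{2/3} into  T - \int \rho V >= -L_{LT} \int V^{5/2},
% where T denotes the kinetic energy on the left-hand side of Theorem 7.
\[
  T \;\ge\; \int \rho V - L_{LT} \int V^{5/2}
  \;=\; \big( \kappa - L_{LT}\, \kappa^{5/2} \big) \int \rho^{5/3}.
\]
% Maximize over \kappa:  1 = (5/2) L_{LT} \kappa^{3/2}.
\[
  \kappa = \Big( \tfrac52 L_{LT} \Big)^{-2/3},
  \qquad
  \kappa - L_{LT}\, \kappa^{5/2} = \kappa \Big( 1 - \tfrac25 \Big)
  = \tfrac35 \Big( \tfrac52 L_{LT} \Big)^{-2/3}.
\]
```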

The idea in the Lieb-Thirring proof of stability of 
matter is to bound the energy below by an 
expression depending only on the one-electron 
density. Theorem 7 achieves this for the kinetic 
energy. What is missing is a lower bound on the 
electrostatic Coulomb energy depending only on the 
density. One can show (see Lieb (1976) or Lieb and 
Thirring (1975)) that, except for an error of the
form “−const × N,” the total energy expectation
$(\Psi, H_N \Psi)$ may be bounded below by

\[ C \int \rho^{5/3} - \sum_{k=1}^K \int \frac{z_k \rho(x)}{|x - r_k|} \, dx + \frac12 \iint \frac{\rho(x) \rho(y)}{|x - y|} \, dx \, dy + \sum_{1 \le k < l \le K} \frac{z_k z_l}{|r_k - r_l|} \qquad [7] \]

Here, as before, $\rho$ is the one-electron density of the
N-body wave function $\Psi$, and C > 0 is a constant. The expression [7] is the
famous Thomas-Fermi energy functional. It has
been studied rigorously by Lieb and Simon (1977).
The Thomas-Fermi energy is the infimum of the
expression [7] over all $\rho$ with $\int \rho = N$. One of the
important results about the Thomas-Fermi energy is
Teller’s no-binding theorem (Lieb and Simon 1977).
It states that in Thomas-Fermi theory atoms do not
bind to form molecules. This means that the
Thomas-Fermi energy is greater than the sum of
the individual atomic energies (these energies in turn
depend only on the nuclear charges).

The above Thomas-Fermi lower bound on the
energy expectation $(\Psi, H_N \Psi)$ together with the
no-binding theorem implies stability of matter.

The generalizations to stability of matter discussed
earlier are proved in a way similar to the
proof presented in the previous section.





See also: h-Pseudodifferential Operators and 
Applications; Quantum Statistical Mechanics: Overview; 
Schrödinger Operators.
Introduction


The Minkowski space, which is the simplest solution 
of the Einstein field equations in vacuum, that is, in 
the absence of matter, plays a fundamental role in 
modern physics as it provides the natural mathema- 
tical background of the special theory of relativity. It 
is most reasonable to ask whether it is stable under 
small perturbations. In other words, can arbitrarily
small perturbations of flat initial conditions lead to 
developments which are radically different, in the 
large, from the flat Minkowski space? It turns out to 
be a highly nontrivial problem as the Einstein 
equations are of a quasilinear hyperbolic character. 
Typical systems of this type, in three space dimen- 
sions, do form singularities in finite time even for 
small disturbances of their trivial initial data. To 
avoid finite-time singularities, we must require that 
sufficiently small perturbations of Minkowski space 
are geodesically complete. This, however, is not
enough; one should also insist that the corresponding
spacetimes become flat along all possible directions, 
that is, globally asymptotically flat. This is measured 
by the decay of the curvature tensor to zero. The 
precise rate of decay is also of interest. One expects 
that various null-frame components of the curvature 
tensor decay at different rates along outgoing null 
hypersurfaces; this goes under the name of “peeling 
estimates.” It turns out in fact that we cannot prove 
geodesic completeness without establishing at the 
same time sufficiently fast rates of decay to flatness 
corresponding to at least some peeling. 

The problem of stability of Minkowski space is 
intimately related to that of describing the asympto- 
tic properties of the gravitational field at large 
distances from an isolated, weakly radiating physical 
system. Precise laws of gravitational radiation can 
be deduced from the assumption that the spacetime 
(M,g) under consideration can be conformally 
compactified by adding a boundary $\mathscr{I}$, called scri,
to M so that an appropriate conformal rescaling of g
can be extended smoothly to the new manifold
$(\bar{M}, \bar{g})$ with boundary. In reality, the compactified
spacetime cannot be smooth at the particular point
$i^0$ corresponding to spacelike infinity. A spacetime
(M,g) is called asymptotically simple (AS) if its
conformal completion is smooth everywhere except
$i^0$ and every null geodesic intersects $\mathscr{I}$ at precisely
two endpoints. The AS assumption allows one to
derive precise decay asymptotics for various curvature
components of (M,g) along null geodesics which
are referred to as strong peeling. The obvious
questions raised by this procedure are: do there exist 
nontrivial AS spacetimes and, if so, do they contain 
a sufficiently large class of radiating spacetimes 
including those which appear in all relevant 
applications? 

Clearly, the two problems mentioned above are 
related but not equivalent. Asymptotically simple 
spacetimes verify strong peeling, in particular they 
are globally asymptotically flat, that is, their 
curvature tensor tends to zero along all geodesics. 
Yet, it is perfectly possible that arbitrarily small 
perturbations of the Minkowski space are geodesi- 
cally complete and globally asymptotically flat 
without being asymptotically simple. 

The first global stability result of the Minkowski 
metric was proved by Christodoulou and Klainer- 
man (1993). Their result proves sufficiently strong 
peeling estimates to allow one to derive the most 
important properties of gravitational radiation, such 
as the Bondi mass-law formula, but not as strong as 
those consistent with asymptotic simplicity. A 
companion result was proved by Klainerman and 
Nicolo (2003). Recently, Rodnianski and Lindblad 
(submitted) have obtained a surprising global 
stability of Minkowski result for the Einstein vacuum 
equations in the Lorentz gauge, which provides 
considerably weaker peeling than Christodoulou and
Klainerman (1993) and Klainerman and Nicolo 
(1999) but is much easier to prove. 

The goal of this article is to describe various results 
obtained since the early 1980s concerning both 
aspects of the problem of stability of Minkowski 
mentioned above. 


Initial Data Formulation 


The proper mathematical context for the stability of 
Minkowski is that provided by the initial-value 
problem for vacuum solutions to the Einstein field 
equations, that is, Ricci flat spacetimes (M,g),
$R_{\alpha\beta} = 0$. We recall the following simple definitions:


Definition 1 An initial data set is a triplet (Σ, g, k)
consisting of a three-dimensional complete Rieman-
nian manifold (Σ, g) and a 2-covariant symmetric
tensor k on Σ satisfying the constraint equations:


$$\nabla^j k_{ij} - \nabla_i \operatorname{tr} k = 0, \qquad R - |k|^2 + (\operatorname{tr} k)^2 = 0$$

where ∇ is the covariant derivative and R the scalar
curvature of (Σ, g). An initial data set is said to be
maximal if tr k =0. This is a gauge condition which 
can be imposed without loss of generality. For 
simplicity we shall assume, throughout this article, 
that all initial data sets we consider are maximal. 


Definition 2 An initial data set is said to be flat, or 
trivial, if it corresponds to a complete spacelike 
hypersurface in Minkowski space with its induced 
metric and second fundamental form. An initial data 
set is said to be asymptotically flat if there exists a
system of coordinates $(x^1, x^2, x^3)$ defined in a
neighborhood of infinity on Σ, with
$r = \sqrt{(x^1)^2 + (x^2)^2 + (x^3)^2}$, relative to which the
metric g approaches the Euclidean metric and k
approaches zero as $r \to \infty$. We assume, for simpli-
city, that Σ has only one end. A neighborhood of
infinity means the complement of a sufficiently large
compact set on Σ.


Remark 1 Because of the constraint equations, the 
asymptotic behavior cannot be arbitrarily pre- 
scribed. A precise definition of asymptotic flatness 
has to involve the ADM mass of (Σ, g). Taking the
mass into account, we write


$$g_{ij} = \left(1 + \frac{2M}{r}\right)\delta_{ij} + o(r^{-1})$$

According to the positive-mass theorem, $M \ge 0$ and
M = 0 implies that the initial data set is flat.


Definition 3 We say that an initial data set is 
strongly asymptotically flat if, for some 61/2, 
relative to the coordinate system mentioned above, 


2M 
Zij — (1 +) by = OF"), ky = Or) 
as r —> OO 


Moreover, every derivative of $g - (1 + 2M/r)\delta$ and k
improves the asymptotics by one.


Definition 4 A Cauchy development of an initial
data set (Σ, g, k) is a spacetime manifold (M, g)
satisfying the Einstein equations together with an
embedding $i: \Sigma \to M$ such that $i_*(g)$, $i_*(k)$ are the
first and second fundamental forms of i(Σ) in M.
A development is also required to be globally
hyperbolic (which means that i(Σ) is a Cauchy
hypersurface, i.e., each causal curve in M intersects
i(Σ) at precisely one point) in order to assure the
unique dependence of solutions on the data. A
future development of (Σ, g, k) consists of a globally
hyperbolic manifold (M, g) with boundary, satisfy-
ing the Einstein equations, and an embedding i as
before which identifies Σ with the boundary of M.

The most primitive question asked about the
initial-value problem, solved in a satisfactory way, 
for very large classes of evolution equations, is that of 
local existence and uniqueness of solutions. For the 
Einstein equations, this type of result was first 
established by Bruhat (1952) with the help of wave 
coordinates which allowed her to cast the Einstein 
equations in the form of a system of nonlinear wave 
equations to which one can apply the standard theory 
of symmetric hyperbolic systems. A stronger result, 
due to Hughes et al. (1976), states the following: 


Theorem 1 Let (Σ, g, k) be an initial data set for
the Einstein vacuum equations. Assume that Σ can
be covered by a locally finite system of coordinate
charts $U_\alpha$ related to each other by $C^1$ diffeomorph-
isms, such that $(g, k) \in H^s_{\mathrm{loc}}(U_\alpha) \times H^{s-1}_{\mathrm{loc}}(U_\alpha)$ with
s > 5/2. Then there exists a unique (up to an
isometry) globally hyperbolic, Hausdorff develop-
ment (M, g) for which Σ is a Cauchy hypersurface.


In Theorem 1, the uniqueness up to an isometry 
requires additional regularity, s > (5/2) +1, on the 
data. One has uniqueness, however, without addi- 
tional regularity for the reduced Einstein equations 
system in wave coordinates. 


Remark 2 In the case of nonlinear systems of 
differential equations, the local existence and 
uniqueness result leads, through a straightforward 
extension argument, to a global result. The formula- 
tion of the same type of result for the Einstein 
equations is a little more subtle; it was done by 
Bruhat and Geroch. 


Theorem 2 (Bruhat–Geroch). For each smooth
initial data set, there exists a unique maximal future 
development. 


Thus, any construction, obtained by an evolution- 
ary approach from a specific initial data set, must be 
necessarily contained in its maximal development. 
This may be said to solve the problem of global 
existence and uniqueness in general relativity. This is
of course misleading: for equations defined on a fixed
background, a global solution is one which exists for all
time. In general relativity, however, we have no such
background, as the spacetime itself is the unknown.
The connection with the classical meaning of a global 
solution requires a special discussion concerning the 
proper time of timelike geodesics; all further ques- 
tions may be said to concern the qualitative properties 
of the maximal development. The central issue is that 
of existence and character of singularities. First, we 
can define a regular maximal development as one 
which is complete in the sense that all future timelike
and null geodesics can be indefinitely extended
relative to their proper time (or affine parameter in
the case of null geodesics). If the initial data set is
sufficiently far off from the trivial one, the corre- 
sponding future development may not be regular. 
This is the content of the following well-known 
theorem of Penrose (1979). 


Theorem 3 If the manifold support of an initial 
data set is noncompact and contains a closed 
trapped surface, the corresponding maximal devel- 
opment is incomplete. 

At the opposite end of Penrose’s trapped-surface
condition, the problem of stability of Minkowski 
space concerns the development of asymptotically 
flat initial data sets which are sufficiently close to 
the trivial one. Although it may be reasonable to 
expect the existence of a sufficiently small neighbor- 
hood of the trivial initial data set, in an appropriate 
topology, such that all corresponding developments 
are geodesically complete and globally asymptoti- 
cally flat, such a result was by no means preor- 
dained. First, all known explicit asymptotically 
flat solutions of the Einstein vacuum equations, 
that is, the Kerr family, are singular. The attempts 
to construct nonexplicit, dynamic, solutions based 
on the conformal compactification method, due 
to Penrose (1962), were obstructed by the irregular
behavior of initial data sets at $i^0$. (The problem is
that the singularity at $i^0$ could propagate and thus
destroy the expected smoothness of scri. This
problem has been recently solved by constructing 
initial data sets which are precisely stationary at 
spacelike infinity.) Finally, the attempts, using 
partial differential equation hyperbolic methods, 
to extend the classical local result of Bruhat 
ran into the usual difficulties of establishing global 
in time existence to solutions of quasilinear hyper- 
bolic systems. Indeed, as mentioned above, the 
wave coordinate gauge allows one to express 
the Einstein vacuum equations in the form of 
a system of nonlinear wave equations which does 
not satisfy Klainerman’s null condition (the null 
condition (Klainerman 1983, 1986) identifies an 
important class of quasilinear systems of wave 
equations in four spacetime dimensions for which 
one can prove global in time existence of small 
solutions) and thus was thought to lead to formation
of singularities. (The conjectured singular behavior of
wave coordinates was thought, however, to reflect
only the instability of the specific choice of gauge
condition and not a true singularity of the equations.)
According to Bruhat (personal communication),
Einstein himself had reasons to believe that the
Minkowski space may not be stable. The problem 
of stability of the Minkowski space was first settled 
by Christodoulou and Klainerman (1990). 


Theorem 4 (Global stability of Minkowski). Any 
asymptotically flat initial data set which is suffi- 
ciently close to the trivial one has a complete 
maximal future development. 


A related result (Theorem 5), proved recently by
Klainerman and Nicolo (2003a), solves the problem
of radiation for arbitrary asymptotically flat initial
data sets. A proof of the result below can also be
derived, indirectly, from Christodoulou and Klainer-
man (1993). The proof of Klainerman and Nicolo
(2003a) avoids, however, a great deal of the
technical complications of that proof.


Theorem 5 For any suitably defined asymptoti-
cally flat initial data set (Σ, g, k) with maximal
future development (M, g), one can find a suitable
domain $\Omega_0 \subset \Sigma$ with compact closure in Σ such that
the boundary $\mathcal{D}_0^+$ of its domain of influence $C^+(\Omega_0)$,
or causal future of $\Omega_0$, in M has complete null
geodesic generators with respect to the correspond-
ing affine parameters.


Both the results of Christodoulou-Klainerman and 
Klainerman-Nicolo prove in fact a lot more than 
stated above. They provide a wealth of information 
concerning the behavior of null hypersurfaces as well 
as the rate at which various components of the 
Riemann curvature tensor approach zero along time-
like and null geodesics. Here are more precise
versions of Theorems 4 and 5.


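Theorem 4 (Expanded version). Assume that
(Σ, g, k) is maximal and strongly asymptotically
flat, $g - (1 + 2M/r)\delta = O(r^{-3/2})$, $k = O(r^{-5/2})$, plus
an appropriate global smallness assumption. We can
construct a complete spacetime (M, g) together with a
maximal foliation $\Sigma_t$ given by the level hypersurfaces
of a time function t and a null foliation $C_u$ given by the
level hypersurfaces of an outgoing optical function u
such that relative to an adapted null frame $e_4 = L$,
$e_3 = \underline{L}$, and $(e_a)_{a=1,2}$ we have, along the null hyper-
surfaces $C_u$, the weak peeling decay,


$$\begin{aligned}
\alpha_{ab} &= R(L, e_a, L, e_b) = O(r^{-7/2}) \\
2\beta_a &= R(L, \underline{L}, L, e_a) = O(r^{-7/2}) \\
4\rho &= R(L, \underline{L}, L, \underline{L}) = O(r^{-3}) \\
4\sigma &= {}^{*}R(L, \underline{L}, L, \underline{L}) = O(r^{-3}) \\
2\underline{\beta}_a &= R(\underline{L}, L, \underline{L}, e_a) = O(r^{-2}) \\
\underline{\alpha}_{ab} &= R(\underline{L}, e_a, \underline{L}, e_b) = O(r^{-1})
\end{aligned} \qquad [1]$$

as $r \to \infty$, with $4\pi r^2 = \mathrm{Area}(S_{t,u})$, $S_{t,u} = \Sigma_t \cap C_u$. Also,
$\rho - \bar\rho,\ \sigma = O(r^{-7/2})$, with $\bar\rho$ the average of ρ over the
compact 2-surfaces $S_{t,u} = \Sigma_t \cap C_u$.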


Three points are noteworthy. (1) The outgoing
optical function refers to the solution of the eikonal
equation $g^{\alpha\beta}\partial_\alpha u\,\partial_\beta u = 0$ whose level hypersurfaces
$C_u$ intersect $\Sigma_t$ in expanding wave fronts for
increasing t; (2) the generators L and $\underline{L}$ are given
as follows: $L = -g^{\alpha\beta}\partial_\beta u\,\partial_\alpha$ is the null geodesic generator of
$C_u$; $\underline{L}$ is then the null conjugate of L, perpendicular
to $S_{t,u} = C_u \cap \Sigma_t$; and (3) $(e_a)_{a=1,2}$ is an orthonormal frame
on $S_{t,u}$.


Theorem 5 (Expanded version). For any asympto-
tically flat initial data set (Σ, g, k), verifying the same
asymptotic flatness conditions as in Theorem 4, one
can find a suitable domain $\Omega_0 \subset \Sigma$ with compact
closure in Σ such that its future domain of influence
$C^+(\Omega_0)$ can be foliated by two null foliations: one
outgoing, $C(u)$, whose leaves are complete towards the
future, and a second one, $\underline{C}(\underline{u})$, which is incoming.
Let $S(u, \underline{u}) = C(u) \cap \underline{C}(\underline{u})$ denote the compact
2-surfaces of intersection between the outgoing and
incoming null hypersurfaces, whose area is denoted
by $4\pi r^2$, and consider an adapted null frame
$L, \underline{L}, (e_a)_{a=1,2}$ (that is, L is the geodesic null generator
of $C(u)$, $\underline{L}$ its null conjugate perpendicular to
$S(u, \underline{u})$, and $e_a$ an orthonormal frame on $S(u, \underline{u})$) at every
point along an outgoing null cone $C(u)$. Then,
denoting by $\alpha, \beta, \rho, \sigma, \underline{\beta}, \underline{\alpha}$ the null components of
the curvature tensor, as in Theorem 4, we have, along
$C(u)$ as $r \to \infty$,


$$\alpha,\ \beta,\ \rho - \bar\rho,\ \sigma = O(r^{-7/2}), \qquad \underline{\beta} = O(r^{-2}), \qquad \underline{\alpha} = O(r^{-1}) \qquad [2]$$


Observe that the rates of decay in [1] and [2] are
the same. This will be referred to as weak peeling to
distinguish it from the rates of decay compatible with
asymptotic simplicity, that is,


$$\alpha = O(r^{-5}), \quad \beta = O(r^{-4}), \quad \rho, \sigma = O(r^{-3}), \quad \underline{\beta} = O(r^{-2}), \quad \underline{\alpha} = O(r^{-1}) \qquad [3]$$

to which we shall refer as strong peeling. We shall
discuss more about these in the next section,
following a review of a recent result of Lindblad–
Rodnianski.

Even the expanded forms of Theorems 4 and 5
stated here do not exhaust all the information
provided by the global stability results in Christodoulou
and Klainerman (1993) and Klainerman and Nicolo
(2003a). Of particular interest are the main
asymptotic conclusions which can be derived
with the help of this information, the most
important being the Bondi mass-law formula, which
calculates the gravitational energy radiated at null
infinity.

The simplest gauge condition in which the
hyperbolic character of the Einstein field equations
is easiest to exhibit is the wave coordinate
condition; that is, one solves the Einstein vacuum
equations relative to a special system of coordinates
$x^\alpha$ which satisfy the equation $\Box_g x^\alpha = 0$. Then,
denoting $h_{\alpha\beta} = g_{\alpha\beta} - m_{\alpha\beta}$, with m the standard
Minkowski metric, we obtain the following system
of quasilinear wave equations in h,


$$g^{\mu\nu}\partial_\mu\partial_\nu h = N(h, \partial h) \qquad [4]$$


with $N(h, \partial h)$ a nonlinear term, quadratic in $\partial h$,
which can be exhibited explicitly. This form of the 
Einstein field equations, called the wave coordinates 
reduced Einstein equations, is precisely the one 
which allowed Bruhat (1952) to prove the first 
local existence result. Later, she also pointed out
that the first nontrivial iterate of [4] behaves like
$t^{-1}\log t$ rather than $t^{-1}$, as expected from the decay
properties of solutions to $\Box h = 0$ in Minkowski
space. This seems to indicate that the wave
coordinates may not be suitable to study the long- 
time behavior of solutions to the Einstein field 
equations. This negative conclusion is also consis-
tent with the fact that eqns [4] do not verify
Klainerman’s null condition. (Klainerman’s null
condition (Klainerman 1983) is an algebraic condi-
tion on systems of nonlinear wave equations in
(1 + 3) dimensions, similar to [4], which allows one
to extend all local solutions, corresponding to small
initial data, for all time. Moreover, these solutions
decay at the rate of $t^{-1}$ as $t \to \infty$, consistent with the
decay of free waves.) Lindblad and Rodnianski
(2003) were able to isolate a new condition, which 
they call the weak null condition, verified by the 
wave coordinates reduced Einstein eqns [4], for 
which one can prove a small data global existence 
result consistent with the weaker decay rates 
suggested by the linear asymptotic analysis of 
Bruhat. Although the new result provides far
weaker peeling information than [1], it is much
simpler to prove than both Theorems 4 and 5.
Moreover, the result seems to apply to a broader
class of initial data than in Theorems 4 and 5. It
remains an intriguing open problem whether the
result of Lindblad–Rodnianski can be used as a
stepping stone towards the more complete results of
Theorems 4 and 5; that is, once a complete
solution, with limited peeling, is known to exist,
whether one can improve it to the
weak peeling properties of [1], using the more precise
techniques employed in Theorems 4 and 5 minus an
important part of their technical complications.


Strong Peeling 


The weak peeling properties [1] derived in Theorems 
4 and 5 are consistent, from a scaling point of view, 
with the SAF condition. To derive strong peeling, 
see [3], one needs stronger asymptotic conditions. 
Recently, Corvino–Schoen and Chruściel and Delay
(2002) have proved the existence of a large class of
asymptotically flat initial data sets (Σ, g, k) which
are precisely stationary, $g = g_{\mathrm{Kerr}}$, $k = k_{\mathrm{Kerr}}$,
outside a sufficiently large compact set (here
$g_{\mathrm{Kerr}}$, $k_{\mathrm{Kerr}}$ are the initial data of a Kerr solution
in standard coordinates). Moreover, they have proved the existence
of sufficiently small solutions in this class which 
satisfy the requirements needed in Friedrich’s con- 
formal compactification method (see Friedrich 
(2002) and the references within) to produce 
asymptotically simple spacetimes, that is, spacetimes 
satisfying Penrose’s regular compactification condi- 
tion (Penrose 1962). Simultaneously, Klainerman 
and Nicolo (1999) were able to refine the methods 
used in the proof of Theorem 5 to prove the
following: 


Theorem 6 Assume that the initial data set (Σ, g, k)
of Theorem 5 satisfies the stronger assumption,


eo), AA oi 
for some γ > 3/2. Here


$$g_S = \left(1 - \frac{2M}{r}\right)^{-1} dr^2 + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)$$


denotes the restriction of the Schwarzschild metric to t = 0
in standard polar coordinates. Then, in addition to
the results reported in Theorem 5, we have the
strong peeling estimates,


$$\alpha = O(r^{-5}), \qquad \beta = O(r^{-4})$$


as $r \to \infty$ along the outgoing null leaves $C(u)$.
Moreover, the same conclusions hold true if [5] is
replaced by


§ — 8kerr = OE AF, k — Riss = Gor) [6] 


for some γ > 5/2.


The first part of the theorem was proved in 
Klainerman and Nicolò (2003b). The second part is 
work in progress by Klainerman and Nicolò. The 
existence of initial conditions of the type required in 
Theorem 6 was established in the works of Corvino 
(2000) and Chruściel and Delay (2002). 


Open Problems 


Problem 1 Extend results of Theorems 5 and 6 to 
the whole domain of dependence, for small data sets. 


The results of Theorems 5 and 6 give a 
satisfactory description of gravitational radiation of 
general classes of asymptotically flat initial data sets 
outside the domain of dependence of a sufficiently 
large compact set. It would be desirable to extend 
these results to the whole domain of dependence of 
initial data sets which satisfy an additional global 
smallness assumption similar to that of Theorem 4. 


Problem 2 Is strong peeling (and implicitly asymp- 
totic simplicity) consistent with physically relevant 
data? If not, is weak peeling a good substitute? 


Damour and Christodoulou (2000) have given
conclusive evidence that under a no-incoming-
radiation condition the future null infinity cannot
be smooth. In fact, $\beta = O(r^{-4}\log r)$ as $r \to \infty$.


Problem 3 Can one weaken the AF conditions to 
include, for example, initial data sets with infinite 
ADM angular momentum? 


It is reasonable to expect a global stability of
Minkowski result for small initial data sets which
verify, for arbitrarily small ε > 0,


$$g - \left(1 + \frac{2M}{r}\right)\delta = O(r^{-1-\epsilon}), \qquad k = O(r^{-2-\epsilon})$$


One expects in this case that the top null components
α and β decay only like $O(r^{-3-\epsilon})$ as $r \to \infty$ along the
null hypersurfaces $C(u)$. It seems that the methods of
Lindblad–Rodnianski can treat this case but can only
give decay estimates for α, β of the form $O(r^{-3+\epsilon})$.


Problem 4 Is the Kerr solution in the exterior of
the black hole stable?


The problem remains wide open. 


See also: Asymptotic Structure and Conformal Infinity; 
Classical Groups and Homogeneous Spaces; Critical 
Phenomena in Gravitational Collapse; Einstein 
Equations: Exact Solutions; Geometric Analysis and 
General Relativity; Supergravity. 

Introduction


The long-term stability of planets and satellites might
be inferred from the regular dynamics that we
constantly observe. However, the ultimate fate of 
the solar system is an intriguing question, which has 
puzzled scientists since antiquity. In the past cen- 
turies, the common belief of a regular motion of the 
main planets was strengthened by the discovery of a 
simple law, due to J D Titius and J E Bode (eight- 
eenth century), which provides a recipe to compute 
the approximate distances of the planets from the 
Sun. Adopting astronomical units as a measure of the 
distance, the Titius—Bode law can be stated as 


$$d_n = 0.4 + 0.3 \times 2^n \ \text{AU} \qquad [1]$$


where the index n must be selected as provided in 
Table 1, which compares the distances computed 
according to [1] with the observed values. Titius and 
Bode already noticed that it was necessary to skip 
one unit in n from Mars to Jupiter; indeed, the
quantity $d_3 = 2.8$ AU might correspond to an aver-
age distance of some minor bodies of the asteroid
belt, which have been discovered since the beginning
of the nineteenth century. The studies of the N-body
problem, namely the dynamics of N mutually 
attracting bodies (according to Newton’s law), 
inspired several mathematical and physical theories: 
from the development of perturbation methods to 
the discovery of chaotic systems, as attested by the 
masterly work of H Poincaré (1892). In particular, 
perturbation theory had relevant applications in
celestial mechanics; for example, it led to the
prediction of the existence of Neptune in the
nineteenth century by J C Adams and U Leverrier
and later to the discovery of Pluto by C Tombaugh,
as a result of unexplained perturbations on Uranus
and Neptune, respectively. Modern advances in
perturbation theories have been provided by the
Kolmogorov–Arnol’d–Moser (KAM) and Nekhoroshev
theorems, which find broad applications in
celestial mechanics insofar as simple model
problems are concerned.


Table 1 Titius–Bode law and observed data

Planet     Index n     Distance computed    Observed
           (of [1])    from [1] (AU)        distance (AU)

Mercury    −∞          0.4                  0.39
Venus      0           0.7                  0.72
Earth      1           1                    1
Mars       2           1.6                  1.52
Jupiter    4           5.2                  5.2
Saturn     5           10                   9.54
Uranus     6           19.6                 19.19
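
As an illustrative aside, the law [1] is simple enough to evaluate directly. The following minimal Python sketch recomputes the "computed distance" column of Table 1; the function name titius_bode and the dictionary INDICES are introduced here only for illustration and are not part of the original text.

```python
import math


def titius_bode(n: float) -> float:
    """Distance in AU predicted by d_n = 0.4 + 0.3 * 2**n (law [1])."""
    # For n = -inf (Mercury) the power 2**n vanishes, leaving 0.4 AU.
    return 0.4 + 0.3 * 2.0 ** n


# Indices n as listed in Table 1; n = 3 (the asteroid belt) is skipped there.
INDICES = {
    "Mercury": -math.inf, "Venus": 0, "Earth": 1, "Mars": 2,
    "Jupiter": 4, "Saturn": 5, "Uranus": 6,
}

if __name__ == "__main__":
    for planet, n in INDICES.items():
        print(f"{planet:8s} {titius_bode(n):5.1f} AU")
```

Evaluating titius_bode(3) gives the skipped value of about 2.8 AU associated in the text with the asteroid belt.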

The stability of the solar system can also be 
approached through numerical investigations, which 
allow one to predict the motion of the celestial 
bodies using more realistic models. The results of 
the numerical integrations undermine in some cases 
the apparent regularity of the solar system: in the 
following sections, we shall review many examples 
of regular and chaotic motions in different contexts 
of celestial mechanics, from the N-body problem to 
the rotational dynamics. 


The Restricted Three-Body Problem 


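Let $P_1, \ldots, P_N$ be N bodies with masses $m_1, \ldots, m_N$,
which interact through Newton’s law. Let
$u^{(i)} \in \mathbb{R}^3$, $i = 1, 2, \ldots, N$, denote the position of the bodies
in an inertial reference frame. Normalizing the
gravitational constant to 1, the equations of motion
of the N-body problem have the form

$$\frac{d^2 u^{(i)}}{dt^2} = -\sum_{j=1,\, j\neq i}^{N} \frac{m_j\,(u^{(i)} - u^{(j)})}{|u^{(i)} - u^{(j)}|^3}, \qquad i = 1, \ldots, N \qquad [2]$$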




In the case N=2, one reduces to the two-body 
problem, which can be explicitly solved by means of 
Kepler’s laws as follows. Consider, for example, the 
Earth-Sun case: for negative values of the energy, 
the trajectory of the Earth is an ellipse with one 
focus coinciding with the barycenter, which can 
practically be identified with the Sun; the Earth—Sun 
radius vector describes equal areas in equal times; 
the cube of the semimajor axis is proportional to the 
square of the period of revolution. 

Consider now an extension to the study of three
bodies such that in the Keplerian approximation $P_2$
and $P_3$ move around $P_1$ and such that the
semimajor axis of $P_2$ is greater than that of $P_3$ (an
example is obtained identifying $P_1$ with the Sun, $P_2$
with Jupiter, and $P_3$ with an asteroid of the
main belt). The three-body problem is described by
[2] setting N = 3; a special case is given by the
restricted three-body problem, which describes the
evolution of a “zero-mass” body under the gravita-
tional attraction exerted by an assigned two-body
system. Setting N = 3 and $m_3 = 0$ in [2], the


equations governing the restricted three-body pro- 
blem are given by 





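$$\begin{aligned}
\frac{d^2 u^{(1)}}{dt^2} &= -\frac{m_2\,(u^{(1)} - u^{(2)})}{|u^{(1)} - u^{(2)}|^3} \\
\frac{d^2 u^{(2)}}{dt^2} &= -\frac{m_1\,(u^{(2)} - u^{(1)})}{|u^{(2)} - u^{(1)}|^3} \\
\frac{d^2 u^{(3)}}{dt^2} &= -\frac{m_1\,(u^{(3)} - u^{(1)})}{|u^{(3)} - u^{(1)}|^3} - \frac{m_2\,(u^{(3)} - u^{(2)})}{|u^{(3)} - u^{(2)}|^3}
\end{aligned}$$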


The first two equations concern the motion of the 
primaries $P_1$ and $P_2$ and they correspond to a
Keplerian two-body problem, whose solution can
be inserted in the equation for $u^{(3)}$, which becomes a
periodically forced second-order equation. The 
restricted three-body problem can be conveniently 
described in terms of suitable action-angle coordi- 
nates, known as Delaunay variables. The present 
discussion is restricted to the planar case, namely we 
assume that the motion of the three bodies takes 
place on the same plane. The corresponding Delau-
nay variables, say $(L, G, \ell, \gamma) \in \mathbb{R}^2 \times \mathbb{T}^2$, are defined
as follows (Szebehely 1967). Let a and e be,
respectively, the semimajor axis and the eccentricity
of the osculating orbit of $P_3$ and let $\mu = 1/m_1^{2/3}$; then
Delaunay’s action variables are given by


$$L = \mu\sqrt{m_1 a}, \qquad G = L\sqrt{1 - e^2}$$


Next, introduce the angle variables: we denote by λ
and φ the longitudes of Jupiter and of the asteroid;
let γ be the argument of perihelion, namely the angle
formed by the periapsis direction with a preassigned
reference line, and let u denote the eccentric
anomaly, which can be defined through

$$\tan\frac{\varphi - \gamma}{2} = \sqrt{\frac{1+e}{1-e}}\,\tan\frac{u}{2} \qquad [3]$$

Let ℓ be the mean anomaly, which is related to the
eccentric anomaly by means of Kepler’s equation


$$\ell = u - e\sin u \qquad [4]$$
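
Kepler's equation cannot be inverted in closed form, so in practice the eccentric anomaly is obtained numerically. The sketch below uses a plain Newton iteration, which is one common choice and not a method prescribed by the text; it then recovers the true anomaly φ − γ from the half-angle relation [3]. The function names, the starting guess u = π, and the tolerance are illustrative assumptions.

```python
import math


def solve_kepler(mean_anomaly: float, e: float,
                 tol: float = 1e-12, max_iter: int = 100) -> float:
    """Solve l = u - e*sin(u) for the eccentric anomaly u (0 <= e < 1)."""
    if not 0.0 <= e < 1.0:
        raise ValueError("eccentricity must satisfy 0 <= e < 1")
    # Reduce l to [0, 2*pi) and remember the number of full turns.
    turns, l = divmod(mean_anomaly, 2.0 * math.pi)
    u = math.pi  # starting guess (an assumption of this sketch)
    for _ in range(max_iter):
        # Newton step for f(u) = u - e*sin(u) - l, with f'(u) = 1 - e*cos(u) > 0.
        delta = (u - e * math.sin(u) - l) / (1.0 - e * math.cos(u))
        u -= delta
        if abs(delta) < tol:
            break
    return u + 2.0 * math.pi * turns


def true_anomaly(u: float, e: float) -> float:
    """True anomaly f from tan(f/2) = sqrt((1+e)/(1-e)) * tan(u/2)."""
    return 2.0 * math.atan2(math.sqrt(1.0 + e) * math.sin(u / 2.0),
                            math.sqrt(1.0 - e) * math.cos(u / 2.0))
```

For a circular orbit (e = 0) the three anomalies coincide, which provides a simple check of both functions.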


Delaunay’s angle variables are represented by the
mean anomaly ℓ and by the argument of perihelion
γ. For completeness, it should be remarked that
the distance r between the minor body $P_3$ and the
primary $P_1$ is related to the longitude and to the
eccentric anomaly by means of the relations


$$r = a(1 - e\cos u) = \frac{a(1 - e^2)}{1 + e\cos(\varphi - \gamma)} \qquad [5]$$


In a reference frame centered at one of the
primaries, say $P_1$, let $H = H(L, G, \ell, \gamma, \lambda)$ denote
the Hamiltonian function describing the planar
problem; notice that $H(L, G, \ell, \gamma, \lambda)$ has two degrees
of freedom and an explicit time dependence through
the longitude λ of $P_2$. If the primaries are assumed to
move in circular orbits around their common center
of mass, the Hamiltonian function reduces to two
degrees of freedom, where a new variable g is
introduced as the difference between the argument
of perihelion γ and the longitude λ of the primary.
Normalizing the units of measure so that the
distance between the primaries and the sum of
their masses are unity, the Hamiltonian function H
describing the circular, planar, restricted three-body
problem is given by


$$H(L, G, \ell, g) = -\frac{1}{2L^2} - G + \varepsilon F(L, G, \ell, g) \qquad [6]$$

where $\varepsilon = \mu m_2$. The perturbing function takes the
form


$$F = r\cos(f + g) - \frac{1}{\sqrt{1 + r^2 - 2r\cos(f + g)}}$$


where $f = \varphi - \gamma$ represents the true anomaly, namely
the angle formed by the instantaneous orbital radius 
with the periapsis line. Notice that the quantities r 
and f are functions of the Delaunay variables 
through the relations [3]-[5]. As a consequence, 
one can expand the perturbing function in the form 
(Delaunay 1860) 


$$F(L, G, \ell, g) = \sum_{j,k \ge 0} F_{jk}(\ell, g)\, e^j a^k$$

where $F_{jk}$ are cosine terms with arguments given by
a linear combination of the variables ℓ and g. For
example, the first few terms of the series develop- 
ment are given by the following expression: 


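$$\begin{aligned}
F(L, G, \ell, g) = {} & -1 - \frac{L^4}{4} - \frac{9}{64}L^8 - \frac{3}{8}L^4 e^2 + \frac{1}{2}L^4 e\cos\ell \\
& - \left(\frac{3}{8}L^6 + \frac{15}{64}L^{10}\right)\cos(\ell + g) \\
& + \frac{9}{4}L^4 e\cos(\ell + 2g) \\
& - \left(\frac{3}{4}L^4 + \frac{5}{16}L^8\right)\cos(2\ell + 2g) \\
& - \frac{3}{4}L^4 e\cos(3\ell + 2g) \\
& - \left(\frac{5}{8}L^6 + \frac{35}{128}L^{10}\right)\cos(3\ell + 3g) \\
& - \frac{35}{64}L^8\cos(4\ell + 4g) + \cdots
\end{aligned}$$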
where the eccentricity is a function of the actions
through $e = \sqrt{1 - G^2/L^2}$. We remark that the
Hamiltonian [6] is nearly integrable with perturbing
parameter ε; indeed, for ε = 0 one recovers the two-
body problem describing the interaction between $P_1$
and $P_3$, which can be explicitly solved according to
Kepler’s laws.
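
The correspondence between orbital elements and Delaunay actions can be made concrete with a short Python sketch. It assumes the normalisation μ = m_1^(-2/3) and L = μ√(m_1 a), G = L√(1 − e²), as read from the definitions above; the function names are hypothetical. With this choice the unperturbed frequency ∂h/∂L = 1/L³ coincides with the Keplerian mean motion, so Kepler's third law n² a³ = m_1 holds.

```python
import math


def delaunay_actions(a: float, e: float, m1: float):
    """Map semimajor axis a and eccentricity e to the actions (L, G)."""
    mu = m1 ** (-2.0 / 3.0)          # assumed normalisation mu = 1 / m1**(2/3)
    L = mu * math.sqrt(m1 * a)
    G = L * math.sqrt(1.0 - e * e)
    return L, G


def orbital_elements(L: float, G: float, m1: float):
    """Inverse map: recover (a, e) from the actions (L, G)."""
    mu = m1 ** (-2.0 / 3.0)
    a = (L / mu) ** 2 / m1
    # e = sqrt(1 - G^2 / L^2); clamp tiny negative rounding errors to zero.
    e = math.sqrt(max(0.0, 1.0 - (G / L) ** 2))
    return a, e


def mean_motion(L: float) -> float:
    """Unperturbed frequency: derivative of -1/(2 L^2) - G with respect to L."""
    return 1.0 / L ** 3
```

For example, delaunay_actions(0.5, 0.2, 0.999) followed by orbital_elements returns the original pair (a, e) up to rounding.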


KAM Stability 


Classical perturbation theory, as developed by 
Laplace, Lagrange, Delaunay, Poincaré, etc., does 
not allow investigation of the stability of the N-body 
problem, since the series defining the solution are 
generally divergent. In order to justify this state- 
ment, let us start by rewriting the unperturbed 
Hamiltonian in [6] as 


$$h(L, G) = -\frac{1}{2L^2} - G$$

so that [6] becomes
$H(L, G, \ell, g) = h(L, G) + \varepsilon F(L, G, \ell, g)$. In order to remove the perturbation
to the second order in the perturbing parameter, one
looks for a change of variables $(L, G, \ell, g) \to (L', G', \ell', g')$
close to the identity, that is,


$$\begin{aligned}
L &= L' + \varepsilon\,\frac{\partial\Phi}{\partial\ell}(L', G', \ell, g) \\
G &= G' + \varepsilon\,\frac{\partial\Phi}{\partial g}(L', G', \ell, g)
\end{aligned}$$

where $\Phi(L', G', \ell, g)$ is the generating function of the
transformation. Let

$$\omega(L) \equiv \frac{\partial h}{\partial L}(L, G) = \frac{1}{L^3}$$

In order to perform a first-order perturbation
theory, we look for a generating function
$\Phi(L', G', \ell, g)$ such that the transformed Hamilto-
nian is integrable up to $O(\varepsilon^2)$, namely


$$h\!\left(L' + \varepsilon\frac{\partial\Phi}{\partial\ell},\, G' + \varepsilon\frac{\partial\Phi}{\partial g}\right) + \varepsilon F(L', G', \ell, g) + O(\varepsilon^2) = h_1(L', G') + O(\varepsilon^2)$$

where $h_1(L', G')$ is the new unperturbed Hamilto-
nian. If we denote by $F_0(L', G')$ the average of the
perturbing function over the angle variables, the
new unperturbed Hamiltonian takes the form


$$h_1(L', G') = h(L', G') + \varepsilon F_0(L', G')$$


Expanding F in Fourier series as
$F(L, G, \ell, g) = \sum_{n,m \in \mathbb{Z}} F_{nm}(L, G)\, e^{i(n\ell + mg)}$, the generating function
is given by the following expression:


$$\Phi(L', G', \ell, g) = i \sum_{n,m \in \mathbb{Z}\setminus\{0\}} \frac{F_{nm}(L', G')}{\omega(L')\,n - m}\, e^{i(n\ell + mg)}$$

The occurrence of small divisors of the form


$$\frac{1}{\omega(L')\,n - m}, \qquad n, m \in \mathbb{Z}\setminus\{0\}$$

might prevent the convergence of the series defining
the generating function. In particular, we remark
that zero divisors occur whenever ω(L) = m/n. This
situation, which is called an m:n orbit–orbit
resonance, implies that during a given interval of
time the body $P_3$ makes m revolutions, whereas $P_2$
makes exactly n orbits about $P_1$.
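
The effect of resonances on the divisors can be explored numerically. The following Python sketch, with the hypothetical helper smallest_divisor, scans the orders n up to a cutoff and, for each n, takes the nearest positive integer m; it returns the smallest value of |ωn − m| found. Passing ω as a Fraction keeps the arithmetic exact, so a resonance shows up as an exactly vanishing divisor.

```python
from fractions import Fraction


def smallest_divisor(omega, max_order: int):
    """Return (|omega*n - m|, n, m) minimised over 1 <= n <= max_order.

    For each n only the nearest positive integer m is tested, since it
    gives the smallest divisor at that order. Ties keep the lowest n.
    """
    best = None
    for n in range(1, max_order + 1):
        m = max(1, round(omega * n))
        d = abs(omega * n - m)
        if best is None or d < best[0]:
            best = (d, n, m)
    return best


if __name__ == "__main__":
    # Resonant frequency omega = 1/2: the divisor vanishes at n = 2, m = 1.
    print(smallest_divisor(Fraction(1, 2), 5))
    # A strongly irrational frequency (golden-mean based) stays away from zero.
    print(smallest_divisor((5 ** 0.5 - 1) / 2, 50))
```

For the golden-mean frequency the minimum up to order 50 is attained at the Fibonacci pair n = 34, m = 21, which illustrates the kind of lower bound on |ωn − m| that a diophantine condition requires.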

The control of the occurrence of the small divisors 
was obtained through a theorem by A N Kolmogorov, 
who made a major breakthrough in the study 
of nearly integrable systems. He proved, under 
general assumptions, that some regions of the 
phase space are almost filled by maximal invariant 
tori. The theorem provides a constructive algorithm 
to give estimates on the perturbing parameter, 
ensuring the existence of some invariant surfaces. 
Kolmogorov’s theorem was later extended by
V I Arnol’d and J Moser, giving rise to the so-called
KAM theory. More precisely, the KAM theorem
can be stated as follows (see, e.g., Arnol’d et al.
(1997)): consider a real-analytic, nearly integrable
Hamiltonian function and fix a rationally independent
frequency vector ω; if the unperturbed
Hamiltonian is not degenerate and if the frequency
satisfies a strong nonresonance assumption (called
the diophantine condition), then, for sufficiently small
values of the perturbing parameter, there exists an
invariant torus on which a quasiperiodic motion
with frequency ω takes place. A preliminary
investigation of the stability of the N-body problem
by means of KAM theory (Arnol’d et al. 1997)
leads to the existence of large regions filled by
quasiperiodic motions, provided the masses of the
planets are sufficiently small. Arnol’d’s version of
the KAM theorem has been applied by J Laskar and P
Robutel to the spatial three-body planetary problem
(the planetary problem concerns the study of the
dynamics of two bodies with comparable masses,
moving in the gravitational field of a larger primary),
and the existence of quasiperiodic motions has been
proved for values of the ratio of the semimajor axes less
than 0.8 and for inclinations up to about 1°.

Concrete estimates on the strength of the perturbation
were given by M Hénon: in the context of the
three-body problem, the application of the original
version of Arnol’d’s theorem allows one to prove the
existence of invariant tori for values of the perturbing
parameter (representing the Jupiter-Sun mass ratio)
not exceeding 10^{-333}, while the implementation of Moser’s
theorem provides an estimate of 10^{-50}. We remark that the
astronomical value of the Jupiter-Sun mass ratio
amounts to about 10^{-3}, showing a large discrepancy
between KAM results and physical measurements.
More recently, KAM estimates have been refined and 
adapted to the study of significant problems of celestial 
mechanics (Celletti and Chierchia 1995). Strong 
improvements have been obtained combining accurate 
estimates with a computer-assisted implementation, 
where the computer is used to perform long computa- 
tions concerning the development of the perturbing 
series and the check of KAM estimates. The numerical 
errors are controlled through the implementation of a 
suitable technique, known as interval arithmetic. In 
the framework of the planar, circular, restricted three- 
body problem, the stability of some asteroids has been 
proved by A Celletti and L Chierchia for realistic 
values of the perturbing parameter (e.g., for ε = 10^{-3}).
A suitable approximation of the disturbing function
(namely, a finite truncation of the series development
as in [7]) has been considered. The result relies on an
implementation of a computer-assisted isoenergetic 
KAM theorem and on the following remark: in the 
four-dimensional phase space, on a fixed energy level 
the invariant two-dimensional surfaces separate the 
phase space, providing the stability of the actions for 
all motions trapped between any two invariant tori. 
Since the action variables are related to the semimajor 
axis and to the eccentricity of the orbit, one obtains 
that the elliptic elements remain close to their initial 
values. 

A computer-assisted KAM theorem has been 
applied by A Giorgilli and U Locatelli to the 
planetary (Jupiter-Saturn) problem. Using a suitable 
secular approximation, it can be shown that this 
model admits two invariant tori, which bound the 
orbits corresponding to the initial data of Jupiter 
and Saturn. 


Nekhoroshev Stability 


A different approach to the study of the stability
of nearly integrable systems is provided by
Nekhoroshev’s theorem (see, e.g., Arnol’d et al.
(1997)), which guarantees, under smallness requirements,
the stability of the motions on an open set of
initial conditions for exponentially long times.
Consider a Hamiltonian function of the form


\[ H(y, x) = h(y) + \varepsilon f(y, x), \qquad (y, x) \in B \times \mathbb{T}^n \]   [9]


where B is an open subset of R^n. We assume that h
and f are analytic functions and that the integrable
Hamiltonian h satisfies a geometric condition, called
steepness. We remark that functions such as h(L, G)
in [8] satisfy the steepness condition. For sufficiently
small values of ε, Nekhoroshev’s theorem states that
any motion (y(t), x(t)) satisfying Hamilton’s equations
associated with [9] is bounded for a finite (but
exponentially long) time, that is,


\[ \| y(t) - y(0) \| \le y_0 \varepsilon^{a} \qquad \text{for } |t| \le t_0 \, e^{(\varepsilon_0/\varepsilon)^{b}}, \]


where y_0, t_0, ε_0, a, and b are suitable positive
constants.
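
The practical meaning of an exponentially long time can be seen by evaluating the two bounds for decreasing values of ε. The sketch below does only that; the function name and all default constants are placeholders chosen for illustration and are not values from the literature.

```python
import math


def nekhoroshev_bounds(eps, y0=1.0, t0=1.0, eps0=1.0, a=0.5, b=0.5):
    """Return (bound on the action drift, stability time) for a given eps.

    All default constants are hypothetical placeholders.
    """
    action_bound = y0 * eps**a
    stability_time = t0 * math.exp((eps0 / eps)**b)
    return action_bound, stability_time


for eps in (1e-2, 1e-3, 1e-4):
    drift, time = nekhoroshev_bounds(eps)
    print(f"eps = {eps:.0e}: drift below {drift:.3g} for |t| up to {time:.3g}")
```

Reducing ε by a factor of 100 shrinks the admissible drift only by a factor of 10 with these exponents, whereas the guaranteed time grows from exp(10) to exp(100).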

Nekhoroshev’s theorem can be conveniently 
applied to the three-body problem, where it provides 
a confinement of the action variables, representing 
the semimajor axis and the eccentricity of the 
osculating orbit. Interesting applications of 
Nekhoroshev’s theorem concern the investigation 
of the triangular Lagrangian points in the spatial, 
restricted three-body problem. (The Lagrangian 
points are five equilibrium positions of the planar, 
restricted three-body problem in a synodic reference 
frame, which rotates with the angular velocity of the 
primaries. Two of such positions are called trian- 
gular, since the configuration of the three bodies is 
an equilateral triangle in the orbital plane.) Effective 
estimates were developed by A Giorgilli and
C Skokos, showing the existence of a stability
region around the Lagrangian point L4, large
enough to include some known asteroids. In the
same framework, the exponential stability was
proven by G Benettin, F Fassò, and M Guzzo for
all values of the mass-ratio parameter, except for a
few values of the reduced mass μ, up to μ ≃ 0.038.


Numerical Results 


The stability of the N-body problem can
be investigated by performing numerical integrations
of the equations of motion. The dynamics of the
outer planets of the solar system (from Jupiter to 
Pluto) has been explored by Sussman and Wisdom 
(1992) using a dedicated computer, the Digital 
Orrery. The integration of the equations of motion 
was performed over 845 million years; the results 
provided evidence of the stability of the major 
planets and a chaotic behavior of Pluto. An
alternative approach, based on an average of the
equations of motion over fast angles, was adopted
by Laskar (1995), where the perturbing function of
the spatial problem was expanded up to the second
order in the masses and up to the fifth powers of the
eccentricity and the inclination. The dynamics of all
planets (excluding Pluto) was investigated by means
of frequency analysis over a time span ranging from
−15 Gyr to +10 Gyr. The numerical integrations
provided evidence of the regularity of the external 
planets (from Jupiter to Neptune), a moderate 
chaotic behavior of Venus and the Earth, and a 
marked chaotic dynamics of Mercury and Mars. 
The computations show that the inner solar system 
is chaotic, with a Lyapunov time of ~5 Myr, thus 
preventing any prediction of the evolution over 
100 Myr. 
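
The link between a Lyapunov time of about 5 Myr and the loss of predictability over 100 Myr is a one-line computation: an initial uncertainty grows roughly like exp(t/T_L). The helper below is a minimal sketch of that arithmetic; the function name is an assumption introduced here.

```python
import math


def error_growth(horizon_myr, lyapunov_time_myr=5.0):
    """Amplification factor exp(t / T_L) of a small initial error."""
    return math.exp(horizon_myr / lyapunov_time_myr)


print(f"{error_growth(100.0):.2e}")   # amplification over 100 Myr
```

Over 100 Myr the factor is exp(20), close to 5 x 10^8, so that even a very small initial uncertainty is amplified by almost nine orders of magnitude.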


The Spin-Orbit Problem 


The dynamics of the bodies of the solar system
results from a combination of a revolution
around a primary body and a rotation
about an internal axis. A simple mathematical
model describing the spin-orbit interaction can
be introduced as follows. Let S be a triaxial
ellipsoidal satellite, which moves about a central
planet P. We denote by T_rev and T_rot the periods of
revolution and rotation. A p:q spin-orbit resonance
occurs if


\[ \frac{T_{\mathrm{rev}}}{T_{\mathrm{rot}}} = \frac{p}{q}, \qquad \text{for } p, q \in \mathbb{N}, \; q \neq 0. \]





Whenever p=q=1, the satellite always points the 
same face to the host planet. Most of the evolved 
satellites or planets are trapped in a 1:1 resonance, 
with the only exception of Mercury, which is 
observed in a nearly 3:2 resonance. In order to 
introduce a simple mathematical model which 
describes the spin-orbit interaction, we assume that: 


1. the satellite moves on a Keplerian orbit around the 
planet (with semimajor axis a and eccentricity e); 

2. the spin axis is perpendicular to the orbit plane; 

3. the spin axis coincides with the shortest physical 
axis; and 

4. dissipative effects as well as perturbations due to 
other planets or satellites are neglected. 


We denote by A < B < C the principal moments
of inertia of the satellite and by r and f, respectively,
the instantaneous orbital radius and the true
anomaly of the Keplerian orbit. Let x be the angle
between the longest axis of the ellipsoid and a
preassigned reference line. From the standard Euler
equations for a rigid body, the equation of motion in
normalized units (i.e., assuming that the period of
revolution is 2π) takes the form

\[ \ddot{x} + \varepsilon \left( \frac{a}{r} \right)^{3} \sin(2x - 2f) = 0 \]   [10]

where ε = (3/2)(B − A)/C. This equation is integrable
whenever A = B or in the case of zero orbital
eccentricity. Due to the assumption of Keplerian
motion, both r and f are known functions of the
time. Therefore, we can expand [10] in Fourier
series as


\[ \ddot{x} + \varepsilon \sum_{m \neq 0, \; m = -\infty}^{\infty} W\!\left( \frac{m}{2}, e \right) \sin(2x - mt) = 0 \]   [11]


where the coefficients W(m/2, e) decay as
W(m/2, e) ∝ e^{|m−2|}. A further simplification of the
model is obtained as follows. According to assumption (4), we
neglect the dissipative forces and the perturbations
due to other bodies. The most important dissipative
contribution is due to the nonrigidity of the satellite,
which gives rise to a tidal torque caused by the internal
friction. The dissipative effects are
small compared to the gravitational
terms. Therefore, we retain in [11] only
those terms which are of the same order as, or larger
than, the average effect of the tidal torque. The
following equation results:


\[ \ddot{x} + \varepsilon \sum_{m \neq 0, \; m = N_1}^{N_2} \tilde{W}\!\left( \frac{m}{2}, e \right) \sin(2x - mt) = 0 \]   [12]


where N_1 and N_2 are suitable integers, which depend
on the physical and orbital parameters of the satellite,
while \tilde{W}(m/2, e) are suitable truncations of the
coefficients W(m/2, e). We remark that eqn [12] can
be derived from Hamilton’s equations associated
with a one-dimensional, time-dependent, nearly
integrable Hamiltonian function with perturbing
parameter ε and a trigonometric disturbing function.
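
As an illustration of how the untruncated equation [10] can be explored numerically, the sketch below integrates it with a classical fourth-order Runge-Kutta scheme, obtaining r and f from Kepler's equation at each evaluation. This is a minimal sketch under stated assumptions: the function names, the step size, the choice of integrator, and the sample parameters (an eccentricity close to that of Mercury and an illustrative value of ε) are introduced here and are not taken from the text.

```python
import math


def kepler_radius_anomaly(t, e):
    """Return (r/a, f) at time t, for mean motion 1 (period of revolution 2*pi)."""
    M = t                                    # mean anomaly in normalized units
    E = M                                    # initial guess, adequate for moderate e
    for _ in range(20):                      # Newton iteration on E - e*sin(E) = M
        E -= (E - e * math.sin(E) - M) / (1.0 - e * math.cos(E))
    r_over_a = 1.0 - e * math.cos(E)
    f = 2.0 * math.atan2(math.sqrt(1.0 + e) * math.sin(E / 2.0),
                         math.sqrt(1.0 - e) * math.cos(E / 2.0))
    return r_over_a, f


def acceleration(t, x, eps, e):
    """Right-hand side of [10]: x'' = -eps * (a/r)^3 * sin(2x - 2f)."""
    r_over_a, f = kepler_radius_anomaly(t, e)
    return -eps * math.sin(2.0 * x - 2.0 * f) / r_over_a**3


def integrate_spin_orbit(x0, v0, eps, e, t_end, h=0.01):
    """Fourth-order Runge-Kutta integration; returns the final (t, x, x')."""
    t, x, v = 0.0, x0, v0
    for _ in range(int(round(t_end / h))):
        k1x, k1v = v, acceleration(t, x, eps, e)
        k2x, k2v = v + 0.5 * h * k1v, acceleration(t + 0.5 * h, x + 0.5 * h * k1x, eps, e)
        k3x, k3v = v + 0.5 * h * k2v, acceleration(t + 0.5 * h, x + 0.5 * h * k2x, eps, e)
        k4x, k4v = v + h * k3v, acceleration(t + h, x + h * k3x, eps, e)
        x += h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        t += h
    return t, x, v


# Hypothetical run started near the 3:2 resonance (spin rate 1.5 per unit mean motion).
t, x, v = integrate_spin_orbit(x0=0.0, v0=1.5, eps=1.5e-4, e=0.2056, t_end=20.0 * math.pi)
print(t, x, v)
```

For zero eccentricity one has r = a and f = t, so that y = x − t obeys the pendulum equation and the quantity (1/2)(x' − 1)^2 − (ε/2) cos(2x − 2t) is conserved; this gives a simple check of the integrator in the integrable case mentioned above.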


Analytical Results 


The phase space associated with [12] admits a 
Poincaré map showing a pendulum-like structure: 
the periodic orbits are surrounded by librational 
curves and the chaotic separatrix divides the libra- 
tional regime from the region where rotational 
motions can take place. The three-dimensional 
phase space is separated by KAM rotational tori 
into invariant regions, providing a strong stability 
property for all motions confined between any pair 
of KAM rotational tori. Let us denote by P(p/q) a
periodic orbit associated with the p:q resonance; in
the context of the model associated with [12], the
stability of the periodic orbit P(p/q) is obtained by
showing the existence of two invariant tori
T(ω_1) and T(ω_2) with ω_1 < p/q < ω_2. A refined
computer-assisted KAM theorem has been implemented
(Celletti 1990) with the aim of proving the
existence of trapping invariant surfaces. Realistic
estimates, in agreement with the physical values of
the parameters (namely, the equatorial oblateness ε
and the eccentricity e), have been obtained in several
examples of spin-orbit commensurabilities, like the
1:1 Moon-Earth interaction or the 3:2 Mercury-Sun
resonance.

Concerning Nekhoroshev-type estimates, the 
classical D’Alembert problem has been studied by 
Biasco and Chierchia (2002). In particular, an 
equatorially symmetric oblate planet moving on a 
Keplerian orbit around a primary body has been 
investigated; the model does not assume any further 
constraint on the spin axis. Although the Hamilto- 
nian describing this model is properly degenerate, it 
is shown that Nekhoroshev-like results apply to the 
D’Alembert problem in the proximity of a 1:1 
resonance. 


Numerical Results 


The model introduced in [10]-[12] often represents an
unrealistic simplification of the spin-orbit dynamics.
In particular, assumption (1) implies that secular 
perturbations of the orbital parameters are neglected, 
whereas the hypothesis (2) corresponds to disregarding 
the spin-orbit obliquity, namely the angle formed by 
the rotational axis with the normal to the orbital 
plane. Due to the presence of an equatorial bulge, the 
gravitational attraction of the other bodies of the solar 
system induces a torque, resulting in a precessional 
motion. It is also important to take into account the 
changes of the obliquity angle, whose variations 
might affect the climatic behavior. 

A realistic model for the precession and the 
variation of the obliquity has been presented by 
Laskar (1995). The numerical simulations and the 
frequency-map analysis show that the Earth’s 
obliquity is actually stable, although a large
chaotic region is found in the interval between
60° and 90°. Since the present obliquity of the
Earth amounts to ~23.3°, the Earth is outside the
dangerous region. An interesting simulation was
performed to evaluate the role played by the
Moon. Without the Moon, the extent of the
chaotic region would greatly increase, eventually
preventing the development of evolved life. Among
the inner planets, Mars’ obliquity shows a larger
chaotic zone, which leads to variations from
0° to 60° in a few million years. By contrast,
the external planets do not show significant
chaotic regions and their obliquities are essentially
stable.


See also: Averaging Methods; Dynamical Systems in 
Mathematical Physics: An Illustration from Water Waves; 
Gravitational N-Body Problem (Classical); Hamiltonian 
Systems: Stability and Instability Theory; KAM Theory 
and Celestial Mechanics; Multiscale Approaches; 
Stability Theory and KAM.

Introduction


A Hamiltonian system is a dynamical system whose
equations of motion can be written in terms of a
scalar function, called the Hamiltonian of the system:
if one uses coordinates (p, q) in a domain (phase space)
D ⊂ R^{2N}, where N is the number of independent
variables one needs to identify a configuration of the
system (degrees of freedom), there is a function H(p, q)
such that ṗ = −∂H/∂q and q̇ = ∂H/∂p. An integrable
(Hamiltonian) system is a Hamiltonian system which,
in suitable coordinates (A, α) ∈ 𝒜 × T^N, where 𝒜 is an
open subset of R^N and T = R/2πZ is the standard
torus, can be described by a Hamiltonian H_0(A),
that is, depending only on A. The coordinates
(A, α) are called action-angle variables. In such a
case the dynamics is trivial: any initial condition
(A_0, α_0) evolves in such a way that the action
variables are constants of motion (i.e., A(t) = A_0 for
all t ∈ R), while the angles grow linearly in time as
α(t) = α_0 + ωt, where ω = ω(A_0) = ∂_A H_0(A_0) is
called the rotation (or frequency) vector. An
integrable system can be thought of as a collection
of decoupled (i.e., independent) rotators: the entire
phase space 𝒜 × T^N is foliated into invariant tori
and all motions are quasiperiodic. Integrable
systems are stable, in the sense that nearby initial
conditions separate at most linearly in time (in
particular, the actions do not separate at all):
mathematically, this is expressed by the fact that
all the Lyapunov exponents are nonpositive.
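
The statement about linear separation can be checked directly from Hamilton's equations in action-angle variables. The short derivation below is a sketch to first order in the initial displacement (δA_0, δα_0); the symbols δA and δα are introduced here for illustration.

```latex
\begin{aligned}
\dot{A} &= -\frac{\partial H_0}{\partial \alpha} = 0,
&\qquad \dot{\alpha} &= \frac{\partial H_0}{\partial A} = \omega(A), \\
A(t) &= A_0,
&\qquad \alpha(t) &= \alpha_0 + \omega(A_0)\, t, \\
\delta A(t) &= \delta A_0,
&\qquad \delta\alpha(t) &= \delta\alpha_0 + \bigl[ \partial_A \omega(A_0)\, \delta A_0 \bigr]\, t .
\end{aligned}
```

The displacement in the actions stays constant and the displacement in the angles grows at most linearly in t, so no exponential divergence can occur.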

An example of an integrable system is any one- 
dimensional conservative mechanical system, in any 
region of phase space in which motions are 
bounded. By increasing the number of degrees of 
freedom, exhibiting nontrivial integrable systems 
can become a difficult task. The problem of studying 
the effects of even small Hamiltonian perturbations 
on integrable systems and of understanding if the 
latter remain stable, in the aforementioned sense, 
was considered by Poincaré to be the fundamental 
problem of dynamics. For a long time, it was 
commonly thought that all motions could be 
reduced to superpositions of periodic motions, 
hence to quasiperiodic motions, but at the end of the
nineteenth century it was realized by Boltzmann and
Poincaré that such a picture was too naive, and that
in reality more complicated motions were possible.


As a consequence of this, it became a widespread 
belief that, even when starting from an integrable 
system, the introduction of an arbitrarily small 
perturbation would break integrability. 

This belief was strengthened by the work of
Poincaré (1898), who showed that the series
describing the solution in a perturbation theory
approach are in general divergent. The source of
divergence in perturbation series is the presence of
small divisors, that is, of denominators of the kind
ω · ν, where ω is the rotation vector that should
characterize the invariant torus (if existent) and ν is
any integer vector. Despite this, however, perturbation
series (known as Lindstedt series) continued to
be extensively used by astronomers in problems of 
celestial mechanics, such as the study of planetary 
motions, for the simple reason that they provided 
predictions in good agreement with the observa- 
tions. But the feeling that the underlying mathema- 
tical tools were unsatisfactory persisted. 

In fact, the well-known Fermi-Pasta-Ulam
numerical experiment, in 1955, was originally
conceived in the spirit of confirming that integrability
would in general be easily lost. Consider a
chain of N harmonic oscillators, with, say,
periodic boundary conditions, coupled with cubic
and quartic two-body potentials, so that the
Hamiltonian is


\[ H(p, q) = \sum_{i=1}^{N} \left[ \frac{1}{2} p_i^2 + W(q_{i+1} - q_i) \right], \qquad W(x) = \frac{1}{2} x^2 + \frac{\alpha}{3} x^3 + \frac{\beta}{4} x^4 \]   [1]


for α, β real parameters and (p, q) ∈ R^N × R^N. One
can introduce new variables such that the Hamiltonian,
for α = β = 0, can be written as


\[ H_0(A) = \frac{1}{2} \sum_{k=1}^{N} \left( P_k^2 + \omega_k^2 Q_k^2 \right) = \omega \cdot A \]   [2]


for a suitable rotation vector ω = (ω_1, ..., ω_N) ∈ R^N
(an explicit computation gives ω_k = 2 sin(kπ/N)).
Consider an initial condition in which all the
energy is confined to a few modes, that is, A_k ≠ 0 at
t = 0 only for a few values of k. For α = β = 0, the
system is integrable, so that A_k(t) = 0 for all t ∈ R
and for all k such that A_k(0) = 0. If the system ceases
to be integrable when the perturbation is switched 
on, the energy is likely to start to be shared among 
the various modes, and after a long enough time has 


elapsed, an equidistribution of the energy among all 
modes (thermalization) might be expected. At least 
this behavior was expected by Fermi, Pasta, and 
Ulam, but it was not what they found numerically: 
on the contrary, all the energy seemed to remain 
associated with the modes close to the few initially 
excited ones. 
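
The experiment just described can be reproduced on a small scale. The sketch below integrates the chain [1] with periodic boundary conditions and reports the harmonic energy of each Fourier mode. It is a minimal illustration and not the original 1955 computation: the velocity-Verlet integrator, the function names, and the sample values N = 8, α = 0.25, β = 0 are assumptions made here.

```python
import cmath
import math


def fpu_force(q, alpha, beta):
    """Force -dH/dq_i for the periodic chain with pair potential W."""
    n = len(q)
    dW = lambda x: x + alpha * x**2 + beta * x**3          # W'(x)
    return [dW(q[(i + 1) % n] - q[i]) - dW(q[i] - q[i - 1]) for i in range(n)]


def fpu_energy(p, q, alpha, beta):
    """Total energy H(p, q) of the chain [1]."""
    n = len(q)
    W = lambda x: x**2 / 2 + alpha * x**3 / 3 + beta * x**4 / 4
    return sum(p[i]**2 / 2 + W(q[(i + 1) % n] - q[i]) for i in range(n))


def verlet(p, q, alpha, beta, dt, steps):
    """Velocity-Verlet (symplectic) integration; returns the final (p, q)."""
    p, q = list(p), list(q)
    f = fpu_force(q, alpha, beta)
    for _ in range(steps):
        p = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]
        q = [qi + dt * pi for qi, pi in zip(q, p)]
        f = fpu_force(q, alpha, beta)
        p = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]
    return p, q


def mode_energies(p, q):
    """Harmonic energy of each Fourier mode k, with omega_k = 2 sin(k*pi/N)."""
    n = len(q)
    energies = []
    for k in range(n):
        phase = [cmath.exp(-2j * math.pi * j * k / n) for j in range(n)]
        Qk = sum(qj * z for qj, z in zip(q, phase)) / math.sqrt(n)
        Pk = sum(pj * z for pj, z in zip(p, phase)) / math.sqrt(n)
        wk = 2.0 * math.sin(math.pi * k / n)
        energies.append(0.5 * (abs(Pk)**2 + wk**2 * abs(Qk)**2))
    return energies


# Only the lowest-frequency pair of modes (k = 1 and k = N - 1) is excited at t = 0.
N = 8
q0 = [0.5 * math.sin(2.0 * math.pi * j / N) for j in range(N)]
p0 = [0.0] * N
p1, q1 = verlet(p0, q0, alpha=0.25, beta=0.0, dt=0.01, steps=2000)
print([round(E, 6) for E in mode_energies(p1, q1)])
```

Setting α = β = 0 in the same run leaves the energy of every initially unexcited mode at zero up to rounding, which is the integrable behavior described above; with the nonlinearity switched on, the printed list shows how much energy has reached the other modes by the final time.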

At about the same time, Kolmogorov (1954) 
published a breakthrough paper going exactly in 
the opposite direction: if one perturbs an integrable 
system, under some mild conditions on the integr- 
able part, most of the tori are preserved, although 
slightly deformed. A more precise statement is the 
following. 


Theorem 1 Let an N-degree-of-freedom Hamiltonian
system be described by an analytic Hamiltonian
of the form


\[ H(A, \alpha) = H_0(A) + \varepsilon f(A, \alpha) \]   [3]


with ε a real parameter (perturbation parameter),
f a 2π-periodic function of each angle variable
(potential or perturbation), and H_0(A) satisfying
the nondegeneracy condition det ∂²_A H_0(A) ≠ 0
(anisochrony condition). If ω = ω(A) = ∂_A H_0(A) is
fixed to satisfy the Diophantine condition


\[ |\omega \cdot \nu| > \frac{C_0}{|\nu|^{\tau}} \qquad \forall \nu \in \mathbb{Z}^N \setminus 0 \]   [4]


for some constants C_0 > 0 and τ > N − 1 (here
|ν| = |ν_1| + ... + |ν_N| and · denotes the standard
inner product: ω · ν = ω_1 ν_1 + ... + ω_N ν_N), then
there is an invariant torus with rotation vector ω
for ε small enough, say for ε smaller than some value
ε_0 depending on C_0 and τ (and on the function f).
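
The Diophantine condition [4] can be probed numerically on a finite range of integer vectors. The sketch below computes the smallest value of |ω · ν| |ν|^τ over all nonzero ν with |ν| up to a cutoff; a value bounded away from zero is compatible with [4], while a value of zero signals an exact resonance. A finite search of this kind cannot prove the condition; the function name, the cutoff, and the sample frequency vectors are assumptions made for illustration.

```python
import itertools
import math


def diophantine_constant(omega, tau, max_norm):
    """Minimum of |omega . nu| * |nu|**tau over integer nu != 0 with |nu| <= max_norm."""
    best = math.inf
    ranges = [range(-max_norm, max_norm + 1)] * len(omega)
    for nu in itertools.product(*ranges):
        norm = sum(abs(c) for c in nu)            # |nu| = |nu_1| + ... + |nu_N|
        if norm == 0 or norm > max_norm:
            continue
        dot = sum(w * c for w, c in zip(omega, nu))
        best = min(best, abs(dot) * norm**tau)
    return best


golden = (math.sqrt(5.0) - 1.0) / 2.0
print(diophantine_constant((1.0, golden), tau=1.5, max_norm=60))   # stays away from zero
print(diophantine_constant((1.0, 0.5), tau=1.5, max_norm=60))      # exact resonance
```

For N = 2 the theorem requires τ > 1, which is why τ = 1.5 is used in the example.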


By saying that there is an invariant torus with
rotation vector ω, one means that there is an
invariant surface in phase space on which, in
suitable coordinates, the dynamics is the same as in
the unperturbed case, and the conjugation (i.e., the
change of variables which leads to such coordinates)
is analytic in the angle variables and in the
perturbation parameter. One also says that the
torus of an integrable system (ε = 0) is preserved
(or even persists) under a small perturbation.

Note that, a posteriori, this proves convergence of 
the perturbation series: however, a direct check of 
convergence was performed only recently by 
Eliasson (1996). Kolmogorov’s proof was based on 
a completely different idea, that is, by performing 
iteratively a sequence of canonical transformations 
(which are changes of coordinates preserving the
Hamiltonian structure of the equations of motion)
such that at each step the size of the perturbation is
reduced. Of course, on the basis of Poincaré’s result,
this iterative procedure cannot work for all initial
conditions (e.g., when ω does not satisfy [4]). The
key point in Kolmogorov’s scheme is to fix the
rotation vector ω of the torus one is looking for, in
such a way that the small divisors are controlled
through the Diophantine condition [4] and the
exponentially fast convergence of the algorithm.
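
Why a fast-converging iteration can absorb the losses caused by small divisors may be seen in a toy recursion. The sketch below assumes, purely schematically, that the size of the perturbation after step k obeys ε_{k+1} = K^k ε_k², where the quadratic term models the Newton-like step and the growing factor K^k models the small-divisor loss. This recursion and the names used are illustrative assumptions, not Kolmogorov's actual estimates.

```python
def perturbation_sizes(eps0, loss, steps):
    """Iterate eps_{k+1} = loss**k * eps_k**2 (schematic model only)."""
    sizes = [eps0]
    for k in range(steps):
        sizes.append(loss**k * sizes[-1]**2)
    return sizes


print(perturbation_sizes(0.01, 4.0, 5))   # shrinks despite the growing loss factor
print(perturbation_sizes(0.5, 4.0, 6))    # initial size too large: the scheme fails
```

In this model the squaring wins over the geometric loss factor only if the initial perturbation is small enough, which mirrors the smallness requirement on ε in the theorem.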

New proofs and extensions of Kolmogorov’s 
theorem were given later by Arnol’d (1962) and by 
Moser (1962); hence, the acronym KAM to denote 
such a theorem. Arnol’d gave a more detailed (and 
slightly different) proof compared to the original 
one by Kolmogorov, and applied the result to the 
planar three-body problem, thus showing that 
physical applications of the theorem were possible. 
Moser, on the other hand, proposed a modified 
method using a technique introduced by Nash 
(which approximates smooth functions with analy- 
tical ones) to deal with the case of systems with 
finite smoothness. 

For fixed small enough ε, the surviving invariant
tori cover a large portion of the phase space, called
the Kolmogorov set; the relative measure of the
region of phase space which is not filled by such tori
tends to zero at least as √ε for ε → 0. A system
described by a Hamiltonian like [3] is then called a
quasi-integrable Hamiltonian system.

The excluded region of phase space corresponds 
to the unperturbed tori which are destroyed by the 
perturbation: the rotation vectors of such tori are 
close to a resonance, that is, to a value ω such that
ω · ν = 0 for some integer vector ν, and these are
exactly the vectors which do not satisfy the
Diophantine condition [4] for any value C_0. A
subset of phase space of this kind is called a 
resonance region. 

At first sight, this would seem to provide an 
explanation for the results found by Fermi, Pasta, 
and Ulam, but this is not quite the case. First, the
threshold value ε_0 depends on N, and goes to zero
very fast as N → ∞ (in general as N!^{−α} for some
α > 0); however, the results of the numerical
experiments apparently were insensitive to the 
number N of oscillators. Second, the KAM theorem 
deals with maximal tori, that is, tori characterized 
by rotation vectors which have as many components 
as the number of degrees of freedom, while the 
rotation vectors of the numerical quasiperiodic 
solutions seem to involve just a small number of 
components. 

Finally, as an extra problem, the
nondegeneracy condition for the unperturbed
Hamiltonian is violated, because the unperturbed
Hamiltonian is linear in the action variables (one
says that the Hamiltonian is isochronous). Recently,
Rink (2001), by continuing the work by Nishida,
showed that in the Fermi-Pasta-Ulam problem it is
possible to perform a canonical change of coordi- 
nates such that in the new variables the Hamiltonian 
becomes anisochronous: one uses part of the 
perturbation to remove isochrony. But the other 
two obstacles remain. 


Lower-Dimensional Tori 


A natural question is what happens to the invariant
tori corresponding to rotation vectors which are not
rationally independent, that is, vectors satisfying n
resonance conditions, such as ω · ν_i = 0 for n
independent vectors ν_1, ..., ν_n, with 1 ≤ n ≤ N − 2
(the case n = N − 1 corresponds to periodic orbits
and is comparatively easy); for instance, one can
take ω = (ω_1, ..., ω_n, 0, ..., 0) and, by a suitable
linear change of coordinates, one can always make
the reduction to a case of this kind. In particular,
one can ask if a result analogous to the KAM
theorem holds for these tori. Such a problem for the
model [3] has not been studied very widely in the
literature. What has usually been considered is a
system of n rotators coupled with a system with
s = N − n degrees of freedom near an equilibrium
point: then one calls normal coordinates the
coordinates describing the latter, and the role of
the parameter ε is played by the size of the normal
coordinates (if their initial conditions are chosen
near the equilibrium point). In the absence of
perturbation (i.e., for ε = 0), one has either hyperbolic
or elliptic or, more generally, mixed tori,
according to the nature of the equilibrium points:
one refers to these tori as lower-dimensional tori, as
they represent n-dimensional invariant surfaces in a
system with N degrees of freedom. Then one can
study the preservation of such tori.

One can prove that, in such a case, at least if
certain generic conditions are satisfied, in suitable
coordinates, n angles rotate with frequencies
ω_1, ..., ω_n, respectively, while the remaining N − n
angles have to be fixed close to some values
corresponding to the extremal points of the function
obtained by averaging the potential over the rotating
angles.

The case of hyperbolic tori is easier, as in the case
of elliptic tori one has to exclude some values of ε to
avoid some further resonances between
the rotation vector ω and the normal frequencies λ_k
(i.e., the eigenvalues of the linearized system
corresponding to the normal coordinates). The
nonresonance conditions to be imposed are known
as the first and second Melnikov conditions:


\[ |\omega \cdot \nu \pm \lambda_k| > \frac{C_0}{|\nu|^{\tau}} \qquad \forall \nu \in \mathbb{Z}^n \setminus 0, \quad \forall\, 1 \le k \le s \]

\[ |\omega \cdot \nu \pm \lambda_k \pm \lambda_{k'}| > \frac{C_0}{|\nu|^{\tau}} \qquad \forall \nu \in \mathbb{Z}^n \setminus 0, \quad \forall\, 1 \le k, k' \le s \]   [5]


Such conditions appear, with the values of the
normal frequencies slightly modified by terms
depending on ε, at each iterative step, and at the
end only for values of ε belonging to some Cantor
set can one have elliptic lower-dimensional tori.

The second Melnikov conditions are not really 
necessary, and in fact they can be relaxed as Bourgain 
(1994) has shown; this is an important fact, as it 
allows degenerate normal frequencies, which were 
forbidden in the previous works by Kuksin (1987),
Eliasson (1988), and Pöschel (1989).

Similar results also apply in the case of lower-dimensional
tori for the model [3], which represents
a sort of degenerate situation, as the normal
frequencies vanish for ε = 0. Again, one has to use
part of the perturbation to remove the complete
degeneracy of normal frequencies.


Quasiperiodic Solutions in Partial 
Differential Equations 


To explain the Fermi-Pasta-Ulam experiment,
one has to deal with systems with arbitrarily many
degrees of freedom. Hence, it is natural to investigate
systems which have ab initio infinitely many
degrees of freedom, such as the nonlinear wave
equation, u_tt − u_xx + V(x)u = φ(u), the nonlinear
Schrödinger equation, i u_t − u_xx + V(x)u = φ(u), the
nonlinear Korteweg-de Vries equation, u_t + u_xxx −
6 u_x u = φ(u), and other systems of nonlinear partial
differential equations (PDEs); the continuum limit of
the Fermi-Pasta-Ulam model gives indeed a nonlinear
Korteweg-de Vries equation, as shown by
Zabusky and Kruskal (1965). Here (t, x) ∈ R × [0, π]^d,
if d is the space dimension, and either periodic
(u(0, t) = u(π, t)) or Dirichlet (u(0, t) = u(π, t) = 0)
boundary conditions can be considered; φ(u) is a
function analytic in u and starting from orders strictly
higher than one, while V(x) is an analytic function of
x, depending on extra parameters ξ_1, ..., ξ_n. Such a
function is introduced essentially for technical reasons,
as we shall see that the eigenvalues λ_k of the
Sturm-Liouville operator −∂_x^2 + V(x) must satisfy
some Diophantine conditions. If we set V(x) = μ ∈ R
in the nonlinear wave equation, we obtain the Klein-
Gordon equation, which, in the particular case μ = 0,
reduces to the string equation. Again, the role of the
perturbation parameter is played by the size of the
solution itself.

Small-amplitude periodic and quasiperiodic
solutions for PDE systems have been extensively
studied, among others, by Kuksin, Wayne, Craig,
Pöschel, and Bourgain. Results for such systems read
as follows. Consider for concreteness the one-dimensional
nonlinear wave equation with Dirichlet boundary
conditions and with φ(u) = u^3 + O(u^5). When the
nonlinear function φ(u) is absent, any solution of the
linear wave equation u_tt − u_xx + V(x)u = 0 is a superposition
of either finitely or infinitely many periodic
solutions with frequencies λ_k determined by the
function V(x). Let u_0(ωt, x) be a quasiperiodic
solution of the linear wave equation with rotation
vector ω ∈ R^n, where ω_k = λ_{m_k}, for some n-tuple
{m_1, ..., m_n}. Then for ε small enough there exists a
subset Ξ_ε of the space of parameters with large
Lebesgue measure (more precisely, with complementary
Lebesgue measure which tends to zero when
ε → 0) such that for all ξ = (ξ_1, ..., ξ_n) ∈ Ξ_ε there is a
solution u_ε(t, x) of the nonlinear wave equation and a
rotation vector ω_ε satisfying the conditions


\[ |u_\varepsilon(t, x) - \sqrt{\varepsilon}\, u_0(\omega_\varepsilon t, x)| \le C \varepsilon, \qquad |\omega_\varepsilon - \omega| \le C \varepsilon \]   [6]


for some positive constant C.

The case n = 1 (periodic solutions) is not as easy
as the finite-dimensional case, because there are
infinitely many normal frequencies, so that there are
small divisor problems which for finite-dimensional
systems appear only for n ≥ 2.

For the nonlinear wave equation and the
Schrödinger equation, if n = 1, one can take
V(x) = μ, but one needs μ ≠ 0; for n > 1, one can
take V(x) = μ, as one can perform a preliminary
transformation leading to an equation in which a
function depending on parameters naturally
appears, as shown by Kuksin and Pöschel (1996).
For n = 1, the case μ = 0 has been very recently
solved by Gentile et al. (2005).
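
The special role of the value μ = 0 can be seen from the linear frequencies alone. For V(x) = μ with Dirichlet boundary conditions on [0, π], the linear wave equation has the normal modes sin(kx) with frequencies sqrt(k² + μ); the sketch below tabulates them and tests one resonance relation. The function name and the sample value μ = 0.5 are assumptions made for illustration.

```python
import math


def linear_frequencies(mu, k_max):
    """Frequencies sqrt(k**2 + mu) of u_tt - u_xx + mu*u = 0 on [0, pi], Dirichlet."""
    return [math.sqrt(k * k + mu) for k in range(1, k_max + 1)]


for mu in (0.0, 0.5):
    lam = linear_frequencies(mu, 4)
    print(mu, lam, "lambda_2 - 2*lambda_1 =", lam[1] - 2.0 * lam[0])
```

For μ = 0 the frequencies are the integers 1, 2, 3, ..., so every frequency is a multiple of the first one and the linear problem is completely resonant; for μ ≠ 0 the relation tested above no longer holds exactly.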

Statements for more general situations can also
be obtained, while extensions to space dimensions
d ≥ 2 are not trivial and have been obtained only
recently by Bourgain (1998). The above result also
holds if the number of components of the rotation
vector is less than the number of parameters: one
uses such parameters because one needs to impose
some Diophantine conditions such as [5], now for
all the frequencies λ_k with k ∉ {m_1, ..., m_n}. Again,
the second Melnikov conditions were shown by
Bourgain to be unnecessary, and this is an essential
ingredient for the higher-dimensional case.
Even if systems of the type considered above have
been widely studied, they remain significantly
different from a discrete system such as the chain
of oscillators [1] for N large enough (also in the
limit N → ∞), so that the results which have been
found for PDE systems do not really provide an
explanation for the numerical findings.

Also in the case of lower-dimensional tori for finite- 
dimensional systems the main problem is that, even if 
such tori exist, it is not clear what relevance they can 
have for the dynamics (a case in which hyperbolic tori 
play a role is considered later). An important feature of 
maximal tori is that they fill most of the phase space, a 
property which certainly does not hold for lower- 
dimensional tori, which lie outside the Kolmogorov set. 

In the Fermi-Pasta-Ulam experiment, one considers
initial conditions close to lower-dimensional
tori; hence, an interesting problem is to study their 
stability, that is, how fast the trajectories starting 
from such initial conditions drift away. 


Arnol’d Diffusion and Nekhoroshev’s 
Theorem 


Consider again the maximal tori. For N=2, the 
preservation of most of the invariant tori prevents the 
possibility of diffusion in phase space: the tori 
represent two-dimensional surfaces in a three-dimen- 
sional space (as dynamics occur on the level surfaces 
of the energy in a four-dimensional space), so that, if 
an initial condition is trapped in a gap between two 
tori, the corresponding trajectory remains confined 
forever between them. The situation is quite different 
for N > 3: in such a case, the tori do not represent a 
topological obstruction to diffusion any more. 
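
The dimension count behind this argument can be sketched in a few lines. This is only an illustration; the function names are hypothetical and not part of the original text. A maximal torus has dimension N, the energy surface has dimension 2N - 1, and a torus can separate the energy surface into disjoint regions only if its codimension there equals 1.

```python
def torus_codimension_in_energy_surface(n_dof: int) -> int:
    """Codimension of an N-dimensional invariant torus inside the
    (2N - 1)-dimensional energy surface of an N-degree-of-freedom system."""
    if n_dof < 1:
        raise ValueError("n_dof must be a positive integer")
    energy_surface_dim = 2 * n_dof - 1
    torus_dim = n_dof
    return energy_surface_dim - torus_dim


def tori_can_confine(n_dof: int) -> bool:
    """A torus can act as a topological barrier only if it has codimension 1."""
    return torus_codimension_in_energy_surface(n_dof) == 1


# Print N, the codimension, and whether confinement between tori is possible.
for n in (2, 3, 4):
    print(n, torus_codimension_in_energy_surface(n), tori_can_confine(n))
```

Only N = 2 gives codimension 1, which is the case in which trajectories stay trapped between neighboring tori.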

That mechanisms of diffusion are really possible 
was shown by Arnol’d (1963). Because of the 
perturbation, lower-dimensional hyperbolic tori 
appear inside the resonance regions, with their 
stable and unstable manifolds (whiskers). It is 
possible that these manifolds of the same torus 
intersect with a nonvanishing angle (homoclinic 
angle); as a consequence, the angles between the 
stable and unstable manifolds of nearby tori 
(heteroclinic angles) can also be different from 
zero, and one can find a set of hyperbolic lower- 
dimensional tori such that the unstable manifold of 
each of them intersects the stable manifold of the 
torus next to it: one says that such tori form a 
transition chain of heteroclinic connections. Then 
there can be trajectories moving along such connections,
producing at the end a drift of order 1 (in $\varepsilon$) in
the action variables. Such a phenomenon is referred 
to as Arnol’d diffusion.

Of course, diffusing trajectories should be located
in the region of phase space where there are no
invariant tori (hence, a very small region when $\varepsilon$ is
small), but an important consequence is that, unlike 
what happens in the unperturbed case, not all 
motions are stable: in particular, the action variables 
can change by a large amount over long times. 

Providing interesting examples of Hamiltonian 
systems in which Arnol’d diffusion can occur is not 
so easy: in fact, for the diffusion to really occur, one 
needs a lower bound on the homoclinic angles, and 
to evaluate these angles can be difficult. For 
instance, Arnol’d’s (1963) original example, which
describes a system near a resonance region, is a two- 
parameter system given by 


$$H = \tfrac{1}{2}\left(A_1^2 + A_2^2\right) + A_3 + \mu\,(\cos\alpha_1 - 1) + \varepsilon\mu\,(\cos\alpha_1 - 1)(\sin\alpha_2 + \cos\alpha_3) \qquad [7]$$


and the angles can be proved to be bounded from
below only by assuming that the perturbation parameter
$\varepsilon$ is exponentially small with respect to the other
parameter $\mu$, which in turn implies a situation not
really convincing from a physical point of view. More
generally, for all the examples which are discussed in
the literature, the relation with physics (as in the d’Alembert
problem on the possibility for a planet to change the
inclination of the precession cone) is not obvious.

So the question naturally arises as to how fast
such a mechanism of diffusion can be, and how relevant
it is for practical purposes. A first answer is
provided by a theorem of Nekhoroshev (1977), 
which states the following result. 


Theorem 2 Suppose we have an N-degree-of-freedom
quasi-integrable Hamiltonian system,
where the unperturbed Hamiltonian satisfies some
condition such as convexity (or a weaker one,
known as steepness, which is rather involved to
state in a concise way); for concreteness consider a
function $H_0(A)$ in [2] which is quadratic in $A$. Then
there are two positive constants $a$ and $b$ such that
for times $t$ up to $O(\exp(\varepsilon^{-b}))$ the variations of the
action variables cannot be larger than $O(\varepsilon^{a})$.


The constants $a$ and $b$ depend on $N$, and they tend
to zero when $N \to \infty$; Lochak and Neishtadt (1992)
and Pöschel (1993) found estimates $a = b = 1/(2N)$,
which are probably in general optimal. Nekhor- 
oshev’s theorem is usually stated in the form above, 
but it provides more information than that explicitly 
written: the trajectories, when trapped into a 
resonance region, drift away and come close to 
some invariant torus, and then they behave like 
quasiperiodic motions, up to very small corrections, 
for a long time, until they enter some other
resonance region, and so on. Of course, for initial
conditions on some invariant torus, the KAM theorem
applies, but the new result concerns initial conditions
which do not belong to any tori.
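
To get a feeling for the scales in Theorem 2, the following sketch evaluates the two bounds with the exponents $a = b = 1/(2N)$ quoted above. It is only an order-of-magnitude illustration: all $O(1)$ prefactors are set to 1, which is an assumption made here and not a statement of the theorem, and the function names are hypothetical.

```python
import math
from typing import Tuple


def nekhoroshev_exponents(n_dof: int) -> Tuple[float, float]:
    """Exponents a = b = 1/(2N) quoted in the text for the convex case."""
    if n_dof < 1:
        raise ValueError("n_dof must be a positive integer")
    a = 1.0 / (2 * n_dof)
    return a, a


def nekhoroshev_scales(eps: float, n_dof: int) -> Tuple[float, float]:
    """Return (eps**a, exp(eps**(-b))): the size of the allowed action drift
    and the time over which it holds, with all prefactors set to 1
    (an illustrative assumption, not part of the theorem)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    a, b = nekhoroshev_exponents(n_dof)
    return eps ** a, math.exp(eps ** (-b))


# Compare the bounds for a fixed perturbation size and growing N.
for n in (2, 3, 10):
    drift, time = nekhoroshev_scales(1e-4, n)
    print(f"N={n}: drift ~ {drift:.3g}, stability time ~ {time:.3g}")
```

For fixed $\varepsilon$ both bounds deteriorate as $N$ grows, which is the difficulty met when one tries to apply the theorem to systems with many degrees of freedom.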

Nekhoroshev’s theorem gives a lower bound for 
the diffusion time, that is, the time required for a 
drift of order 1 to occur in the action variables. But, 
of course, an upper bound would also be desirable. 
The diffusion times are related to the amplitude of 
the homoclinic angles, which are very small (and 
difficult to estimate as stated before). The strongest 
results in this direction have been obtained with 
variational methods, for instance, by Bessi, Bernard, 
Berti, and Bolle: at best, for the diffusion time, one
finds an estimate $O(\mu^{-1} \log \mu^{-1})$, if $\mu$ is the amplitude
of the homoclinic angles (which in turn are
exponentially small in some power of $\varepsilon$, as one can
expect as a consequence of Nekhoroshev’s theorem).

Then one can imagine that the results of the Fermi-Pasta-Ulam
experiment can also be interpreted in the
light of Nekhoroshev’s theorem. The solutions one 
finds numerically certainly do not correspond to 
maximal tori, but one could expect that they could be 
solutions which appear to be quasiperiodic for long 
but finite times (e.g., moving near some lower- 
dimensional torus determined by the initial condi- 
tions), and that if one really insists on observing the 
time evolution for a very long time, then deviations 
from quasiperiodic behavior could be detected. This 
is an appealing interpretation, and the most recent 
numerical results make it plausible: Galgani and
Giorgilli (2003) have found numerically that the
energy, even if initially confined to the lower modes,
tends to be shared among all the other modes, and
the higher the modes, the longer the time needed for the
energy to flow to them. Of course, this does not settle
the problem, as there is still the issue of the large 
number of degrees of freedom; furthermore, for large 
N the spacing between the frequencies is small, and 
they become almost degenerate. Hence, the problem 
still has to be considered as open. 


Stability versus Chaos 


The main problem in applying the KAM theorem 
seems to be related to the small value of the threshold 
$\varepsilon_0$ which is required. In general, when the size of the
perturbation parameter is very large, the region of 
phase space filled with invariant tori decreases (or even 
disappears), and chaotic motions appear. By the latter, 
one generally means motions which are highly 
sensitive to the initial conditions: a small variation of 
the initial conditions produces a catastrophic variation 
in the corresponding trajectories (this is due to the 
appearance of strictly positive Lyapunov exponents). 
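
The contrast between regular and chaotic motion can be illustrated numerically. The sketch below uses the Chirikov standard map, which is not discussed in the text and is chosen here only as a familiar two-dimensional stand-in for a perturbed integrable system; the parameter values and initial conditions are likewise assumptions. It estimates the largest Lyapunov exponent by evolving a tangent vector and renormalizing it at each step.

```python
import math


def standard_map_lyapunov(k, x0, p0, n_steps=5000):
    """Estimate the largest Lyapunov exponent of the standard map
        p' = p + k*sin(x),   x' = x + p'   (x taken mod 2*pi)
    by evolving a tangent vector (dx, dp) and renormalizing it every step."""
    x, p = x0, p0
    dx, dp = 1.0, 0.0
    log_growth = 0.0
    two_pi = 2.0 * math.pi
    for _ in range(n_steps):
        # Tangent (linearized) map, evaluated at the current point.
        dp = dp + k * math.cos(x) * dx
        dx = dx + dp
        # The map itself.
        p = p + k * math.sin(x)
        x = (x + p) % two_pi
        # Accumulate the logarithmic growth and renormalize.
        norm = math.hypot(dx, dp)
        log_growth += math.log(norm)
        dx, dp = dx / norm, dp / norm
    return log_growth / n_steps


# Small perturbation, orbit near the elliptic fixed point (pi, 0): regular.
lam_regular = standard_map_lyapunov(0.1, 3.0, 0.0)
# Large perturbation: invariant curves are destroyed and the orbit is chaotic.
lam_chaotic = standard_map_lyapunov(10.0, 1.0, 0.5)
print(lam_regular, lam_chaotic)
```

For the regular orbit the estimate decays toward zero as the number of steps grows, while for the strongly perturbed map it settles at a strictly positive value.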


A natural question is then how such a result as the 
KAM theorem is meaningful in physical situations: 
in other words, for which systems the KAM theorem 
can really apply. 

One of the main motivations to study such a 
problem was to explain astronomical observations 
and to study the stability of the solar system. In 
order to apply the KAM theorem to the solar 
system, one has to interpret the gravitational forces 
between the planets as perturbations of a collection 
of several decoupled two-body systems (each planet 
with the Sun). One can write the masses of the 
planets as $\varepsilon m_i$, and $\varepsilon$ plays the role of the
perturbation parameter. The corresponding Hamil- 
tonian (after suitable reductions and scalings) is 


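$$H = \sum_{i=1}^{N} \left( \frac{|p_i|^2}{2\mu_i} - \frac{m_0 m_i}{|q_i|} \right) + \varepsilon \sum_{1 \le i < j \le N} \left( \frac{p_i \cdot p_j}{m_0} - \frac{m_i m_j}{|q_i - q_j|} \right) \qquad [8]$$
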
where $i = 0$ corresponds to the Sun, while
$i = 1, \ldots, N$ correspond to the planets (hence
$N = 9$), $m_0$ is the mass of the Sun, and $\varepsilon\mu_i$ are the
reduced masses ($\mu_i^{-1} = m_i^{-1} + \varepsilon m_0^{-1}$); here
$(q_i, p_i) \in \mathbb{R}^3 \times \mathbb{R}^3$, $i = 0, \ldots, N$,
the inner product in $p_i \cdot p_j$ is in $\mathbb{R}^3$,
and the norm $|\cdot|$ is the Euclidean one.

A first difficulty is that the solar system is a properly 
degenerate system; that is, the unperturbed Hamilto- 
nian does not depend on all the action variables. But 
such a degeneracy can be removed by performing a 
canonical change of coordinates which produces a new 
Hamiltonian in which the integrable part contains new 
terms of order $\varepsilon$ depending on all action variables and
is nondegenerate, while the perturbation becomes of
order $\varepsilon^2$: the angle variables corresponding to the
actions not originally appearing in the unperturbed 
Hamiltonian are called the slow variables, while the 
others are called the fast variables. 

However, a naive implementation of the KAM 
theorem, in general, even for simplified but still 
realistic systems, would provide a preposterously 
small value of the threshold $\varepsilon_0$. The problem could
be just a computational one: in principle, a very 
refined estimate of the threshold could give a better 
value, so that it is very difficult to decide analytically 
if the real values of the planetary masses allow the 
solar system to fall inside the regime of appli- 
cability of the KAM theorem. Results in this 
direction have been obtained, but only for special 
situations: for instance, by considering the restri- 
cted planar circular three-body problem (which 
provides a simplified description of the system 
“Sun + Jupiter + asteroid”), Celletti and Chierchia
(1997) found analytical bounds on the perturbation
parameters comparable with the physical values. Of 
course, this is not at all conclusive for the general 
situation in which all planets (with their satellites 
and the asteroids) are considered together; in 
particular, it does not shed light on the problem of 
the stability of the entire solar system. 

On the contrary, extensive numerical simulations 
performed by Laskar (starting from 1989) seem to 
suggest that the solar system is unstable. Deflections 
from the current orbits could be produced to such an 
extent that collisions between planets could not be 
avoided: Mercury could collide with Venus or be
ejected from the solar system. An important issue is
to consider the times over which such phenomena 
can occur. Laskar’s numerical simulations show that 
such times are less than the estimated age of the solar 
system, and that one can make accurate predictions 
for the planetary motions only for a finite amount of 
time (~100 Myr). Furthermore, the assumed partial 
instability of the solar system has also been used by 
Laskar (2004) to explain some observed phenomena 
such as the evolution of the obliquity (which is the 
angle between equator and orbital plane) of some 
planets. Of course, these simulations have been 
carried out with several approximations, such as that of
averaging over the fast variables, which allows one to 
use a large integration step in the numerical integra- 
tion of the equations of motion for the resulting 
system. This is the so-called secular system intro- 
duced by Lagrange: instead of the fast motion of the 
planets, one describes the slow deformations of the 
planetary orbits (imagining the planets as regions of 
mass spread along their orbits). 


See also: Averaging Methods; Bifurcation Theory; 
Billiards in Bounded Convex Domains; Diagrammatic 
Techniques in Perturbation Theory; Dynamical Systems 
and Thermodynamics; Gravitational N-Body Problem 
(Classical); Hamiltonian Systems: Stability and Instability 
Theory; Hamilton-Jacobi Equations and Dynamical 
Systems: Variational Aspects; Integrable Systems and 
Discrete Geometry; KAM Theory and Celestial 
Mechanics; Localization for Quasiperiodic Potentials; 
Stability Problems in Celestial Mechanics; 
Synchronization of Chaos; Weakly Coupled Oscillators.


Introduction


The standard model (SM) is a consistent, finite,
and, within the limitations of our present technical
ability, computable theory of fundamental microscopic
interactions that successfully explains most
of the known phenomena in elementary particle 
physics. The SM describes strong, electromagnetic, 
and weak interactions. All microscopic phenomena 
observed to date can be attributed to one or the 
other of these interactions. For example, the forces 
that hold together the protons and the neutrons in
the atomic nuclei are due to strong interactions; the
binding of electrons to nuclei in atoms or of atoms 
in molecules is caused by electromagnetism; and the 
energy production in the Sun and the other stars 
occurs through nuclear reactions induced by weak 
interactions. In principle, gravitational forces 
should also be included in the list of fundamental 
interactions but their impact on fundamental 
particle processes at accessible energies is totally 
negligible. 

The structure of the SM is a generalization of
that of quantum electrodynamics (QED), in the
sense that it is a renormalizable field theory based
on a local symmetry (i.e., separately valid at each
spacetime point $x$) that extends the gauge invariance
of electrodynamics to a larger set of
conserved currents and charges. There are eight
strong charges, called “color” charges, and four
electroweak charges (which, in particular, include
the electric charge). The commutators of these
charges form the $SU(3) \otimes SU(2) \otimes U(1)$ algebra. In
QED, the interaction between two matter particles 
with electric charges (e.g., two electrons) is 
mediated by the exchange of one (or more) photons 
emitted by one electron and reabsorbed by the 
second. In the SM the matter fields, all of spin 1/2, 
are the quarks, the constituents of protons, neutrons,
and all hadrons, endowed with both color
and electroweak charges, and the leptons (the
electron $e^-$, the muon $\mu^-$, the tauon $\tau^-$, plus the
three associated neutrinos $\nu_e$, $\nu_\mu$, and $\nu_\tau$) with no
color but with electroweak charges. The matter
fermions come in three generations or families with 
identical quantum numbers but different masses. 
The pattern is as follows: 


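$$\begin{bmatrix} u & u & u & \nu_e \\ d & d & d & e \end{bmatrix}, \quad \begin{bmatrix} c & c & c & \nu_\mu \\ s & s & s & \mu \end{bmatrix}, \quad \begin{bmatrix} t & t & t & \nu_\tau \\ b & b & b & \tau \end{bmatrix} \qquad [1]$$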


Each family contains a weakly charged doublet of 
quarks, in three color replicas, and a colorless 
weakly charged doublet with a neutrino and a 
charged lepton. At present, there is no explanation 
for this triple repetition of fermion families. The 
force carriers, of spin 1, are the photon $\gamma$, the weak
interaction gauge bosons $W^+$, $W^-$, and $Z^0$, and the
eight gluons $g$ that mediate the strong interactions.
The photon and the gluons have zero masses as a
consequence of the exact conservation of the
corresponding symmetry generators, the electric
charge and the eight color charges. The weak
bosons $W^+$, $W^-$, and $Z^0$ have large masses
($m_W \simeq 80.4$ GeV, $m_Z = 91.2$ GeV), signaling that the
corresponding symmetries are badly broken. In the SM,
the spontaneous breaking of the electroweak gauge 
symmetry is induced by the Higgs mechanism, 
which predicts the presence of one (or more) spin 0 
particles in the physical spectrum, the Higgs 
boson(s), not yet experimentally observed. A tre- 
mendous experimental effort is underway or 
planned to reveal the Higgs sector as the last crucial 
missing link in the SM verification. 


Quantum Chromodynamics 


The statement that quantum chromodynamics 
(QCD) is a renormalizable gauge theory based on
the group $SU(3)$ with color triplet quark matter
fields fixes the QCD Lagrangian density to be


$$\mathcal{L} = -\frac{1}{4} \sum_{A=1}^{8} F^{A\mu\nu} F^{A}_{\mu\nu} + \sum_{j=1}^{n_f} \bar{q}_j \left( i \not{D} - m_j \right) q_j \qquad [2]$$


Here $q_j$ are the quark fields (of $n_f$ different flavors)
with mass $m_j$; $\not{D} = D_\mu \gamma^\mu$, where $\gamma^\mu$ are the Dirac
matrices and $D_\mu$ is the covariant derivative


$$D_\mu = \partial_\mu - i e_s \sum_{A} t^A g^A_\mu \qquad [3]$$


$e_s$ is the gauge coupling (in analogy with QED,


$$\alpha_s = \frac{e_s^2}{4\pi} \qquad [4]$$


here and throughout this article natural units,
$\hbar = c = 1$, are used); $g^A_\mu$, $A = 1, \ldots, 8$, are the gluon
fields, and $t^A$ are the $SU(3)$ group generators in the
triplet representation of quarks (i.e., $t^A$ are $3 \times 3$
matrices acting on $q$); the generators obey the
commutation relations $[t^A, t^B] = i C_{ABC} t^C$, where
$C_{ABC}$ are the complete antisymmetric structure
constants of $SU(3)$ (the normalization of $C_{ABC}$ and
of $e_s$ is specified by $\mathrm{tr}[t^A t^B] = \tfrac{1}{2}\delta^{AB}$);


$$F^A_{\mu\nu} = \partial_\mu g^A_\nu - \partial_\nu g^A_\mu - e_s C_{ABC}\, g^B_\mu g^C_\nu \qquad [5]$$


The physical vertices in QCD include the gluon-quark-antiquark
vertex, analogous to the QED
photon-fermion-antifermion coupling, but also the
three-gluon and four-gluon vertices, of order $e_s$ and
$e_s^2$, respectively, which have no analog in an abelian
theory like QED. In QED, the photon (a neutral
particle) is coupled to all electrically charged
particles. In QCD, the gluons are colored,
hence self-coupled. This is reflected in the fact that
in QED $F_{\mu\nu}$ is linear in the gauge field, so that the
term $F_{\mu\nu}^2$ in the Lagrangian is a pure kinetic term,
while in QCD $F^A_{\mu\nu}$ is quadratic in the gauge field, so
that in $F^{A2}_{\mu\nu}$ we find cubic and quartic vertices
beyond the kinetic term.

The QCD Lagrangian in eqn [2] has a simple 
structure but a very rich dynamical content, includ- 
ing the observed complex spectroscopy with a large 
number of hadrons. The most prominent properties 
of QCD are asymptotic freedom and confinement. 
In field theory, the effective coupling of a given 
interaction vertex is modified by the interaction. As 
a result, the measured intensity of the force depends 
on the transferred (four)momentum squared, $Q^2$,
among the participants. In QCD, the relevant
coupling parameter that appears in physical processes
is $\alpha_s$ (see eqn [4]). Asymptotic freedom means
that the effective coupling becomes a function of
$Q^2$: $\alpha_s(Q^2)$ decreases for increasing $Q^2$ and vanishes
asymptotically. Thus, the QCD interaction becomes
very weak in processes with large $Q^2$, called hard
processes or deep inelastic processes (i.e., with a 
final-state distribution of momenta and a particle 
content very different from that in the initial state). 
One can prove that in four spacetime dimensions all 
gauge theories based on a noncommuting group of 
symmetry are asymptotically free, and conversely. 
The effective coupling decreases very slowly at large 
momenta with the inverse logarithm of $Q^2$:
$\alpha_s(Q^2) = 1/[b \log(Q^2/\Lambda^2)]$, where $b$ is a known constant
and $\Lambda$ is an energy of the order of a few
hundred MeV. Since in quantum mechanics large 
momenta imply short wavelengths, the result is that 
at short distances the potential between two color 
charges is similar to the Coulomb potential, that is, 
proportional to $\alpha_s(r)/r$, with an effective color
charge which is small at short distances. On the
contrary the interaction strength becomes large at
large distances or small transferred momenta, of
order $Q \lesssim \Lambda$. In fact, the observed hadrons are tightly
bound composite states of quarks, with compensating 
color charges so that they are overall neutral in color. 
The property of confinement is the impossibility of 
separating color charges, like individual quarks and 
gluons. This is because in QCD the interaction 
potential between color charges increases, at long 
distances, linearly in r. When we try to separate the 
quark and the antiquark that form a color-neutral 
meson the interaction energy grows until pairs of 
quarks and antiquarks are created from the vacuum 
and new neutral mesons are coalesced instead of free 
quarks. For example, consider the process $e^+e^- \to q\bar{q}$
at large center-of-mass energies. The final-state quark 
and antiquark have large energies, so they separate in 
opposite directions very fast. But the color-confine- 
ment forces create new pairs in between them. Two 
back-to-back jets of colorless hadrons are observed 
with a number of slow pions that make the exact 
separation of the two jets impossible. In some 
cases, a third well-separated jet of hadrons is also 
observed: these events correspond to the radiation 
of an energetic gluon from the parent quark- 
antiquark pair. 
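
The slow logarithmic decrease of the effective coupling can be made concrete with a short numerical sketch. The leading-order value $b = (33 - 2 n_f)/(12\pi)$ used below is the standard one-loop coefficient, which the text refers to only as "a known constant"; the choices $\Lambda = 0.2$ GeV and $n_f = 5$ are illustrative assumptions, and the function name is hypothetical.

```python
import math


def alpha_s_one_loop(q_gev, lambda_gev=0.2, n_f=5):
    """Leading-order running coupling alpha_s(Q^2) = 1 / (b * log(Q^2 / Lambda^2)),
    with b = (33 - 2*n_f) / (12*pi).

    lambda_gev and n_f are illustrative assumptions, not values from the text.
    """
    if q_gev <= lambda_gev:
        raise ValueError("the formula is meaningful only for Q well above Lambda")
    b = (33.0 - 2.0 * n_f) / (12.0 * math.pi)
    return 1.0 / (b * math.log(q_gev ** 2 / lambda_gev ** 2))


# The coupling decreases, but only logarithmically, as Q grows.
for q in (2.0, 10.0, 91.2, 1000.0):
    print(f"Q = {q:7.1f} GeV  alpha_s = {alpha_s_one_loop(q):.4f}")
```

Toward $Q \to \Lambda$ the expression grows without bound, which signals the breakdown of perturbation theory at large distances rather than a physical divergence.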


Electroweak Interactions 


We split the electroweak Lagrangian into two parts 
by separating the Higgs boson couplings: 


$$\mathcal{L} = \mathcal{L}_{\mathrm{symm}} + \mathcal{L}_{\mathrm{Higgs}} \qquad [6]$$


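We start by specifying $\mathcal{L}_{\mathrm{symm}}$, which involves only
gauge bosons and fermions (a sum over all flavors of
quarks and leptons, generally indicated by $\psi$, is
understood):


$$\mathcal{L}_{\mathrm{symm}} = -\frac{1}{4} \sum_{A=1}^{3} F^A_{\mu\nu} F^{A\mu\nu} - \frac{1}{4} B_{\mu\nu} B^{\mu\nu} + \bar{\psi}_L i \gamma^\mu D_\mu \psi_L + \bar{\psi}_R i \gamma^\mu D_\mu \psi_R \qquad [7]$$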


This is the Yang-Mills Lagrangian for the gauge
group $SU(2) \otimes U(1)$ with fermion matter fields. Here


$$B_{\mu\nu} = \partial_\mu B_\nu - \partial_\nu B_\mu, \qquad F^A_{\mu\nu} = \partial_\mu W^A_\nu - \partial_\nu W^A_\mu - g\,\epsilon_{ABC}\, W^B_\mu W^C_\nu \qquad [8]$$


are the gauge antisymmetric tensors constructed out
of the gauge field $B_\mu$ associated with $U(1)$, and $W^A_\mu$
corresponding to the three $SU(2)$ generators; $\epsilon_{ABC}$
are the group structure constants (see eqn [11]),
which, for $SU(2)$, coincide with the totally antisymmetric
Levi-Civita tensor (recall the familiar
angular-momentum commutators).

The fermion fields are described through their 
left- and right-hand components: 


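$$\psi_{L,R} = [(1 \mp \gamma_5)/2]\,\psi, \qquad \bar{\psi}_{L,R} = \bar{\psi}\,[(1 \pm \gamma_5)/2] \qquad [9]$$


Note that, as given in eqn [9],


$$\bar{\psi}_L = \psi_L^\dagger \gamma_0 = \psi^\dagger [(1 - \gamma_5)/2]\,\gamma_0 = \bar{\psi}\,\gamma_0 [(1 - \gamma_5)/2]\,\gamma_0 = \bar{\psi}\,[(1 + \gamma_5)/2]$$


The matrices $P_\pm = (1 \pm \gamma_5)/2$ are projectors. They
satisfy the relations $P_\pm P_\pm = P_\pm$, $P_\pm P_\mp = 0$,
$P_+ + P_- = 1$.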
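
These relations are easy to verify with explicit matrices. The sketch below assumes the chiral (Weyl) representation, in which $\gamma_5$ is diagonal; this choice of basis and the helper names are assumptions made for illustration, since the text does not fix a representation. It also checks the identity $\gamma_0 P_- \gamma_0 = P_+$ that underlies the last step in the chain of equalities for $\bar{\psi}_L$.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matadd(a, b):
    """Entry-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
ZERO = [[0.0] * 4 for _ in range(4)]

# Chiral representation (an assumed basis): gamma5 = diag(-1, -1, 1, 1),
# gamma0 exchanges the upper and lower two-component blocks.
GAMMA5 = [[-1.0, 0.0, 0.0, 0.0],
          [0.0, -1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
GAMMA0 = [[0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0],
          [1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0]]


def projector(sign):
    """P_sign = (1 + sign * gamma5) / 2 for sign = +1 or -1."""
    return [[(IDENTITY[i][j] + sign * GAMMA5[i][j]) / 2.0 for j in range(4)]
            for i in range(4)]


P_PLUS, P_MINUS = projector(+1), projector(-1)

print(matmul(P_PLUS, P_PLUS) == P_PLUS)                   # idempotent
print(matmul(P_PLUS, P_MINUS) == ZERO)                    # orthogonal
print(matadd(P_PLUS, P_MINUS) == IDENTITY)                # complete
print(matmul(GAMMA0, matmul(P_MINUS, GAMMA0)) == P_PLUS)  # gamma0 flips chirality
```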

The standard electroweak theory is a chiral
theory, in the sense that $\psi_L$ and $\psi_R$ behave
differently under the gauge group. In particular, all
$\psi_R$ are singlets and all $\psi_L$ are doublets in the
minimal SM (MSM). Thus, mass terms for fermions
(of the form $\bar{\psi}_L \psi_R$ + h.c.) are forbidden in the
symmetric limit. Fermion masses are introduced,
together with $W^\pm$ and $Z$ masses, by the mechanism
of symmetry breaking. The covariant derivatives
$D_\mu \psi_{L,R}$ are explicitly given by


$$D_\mu \psi_{L,R} = \left[ \partial_\mu + i g \sum_{A=1}^{3} t^A_{L,R} W^A_\mu + i g' \tfrac{1}{2} Y_{L,R} B_\mu \right] \psi_{L,R} \qquad [10]$$


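where $t^A_{L,R}$ and $\tfrac{1}{2} Y_{L,R}$ are the $SU(2)$ and $U(1)$
generators, respectively, in the reducible representations
$\psi_{L,R}$. The commutation relations of the $SU(2)$
generators are given by


$$[t^A_L, t^B_L] = i \epsilon_{ABC}\, t^C_L, \qquad [t^A_R, t^B_R] = i \epsilon_{ABC}\, t^C_R \qquad [11]$$


We use the normalization $\mathrm{tr}[t^A t^B] = \tfrac{1}{2}\delta^{AB}$ in the
fundamental representation of $SU(2)$. The electric
charge generator $Q$ (in units of $e$, the positron
charge) is given by


$$Q = t^3_L + \tfrac{1}{2} Y_L = t^3_R + \tfrac{1}{2} Y_R \qquad [12]$$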
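
All fermion couplings to the gauge bosons can be
derived directly from eqns [7] and [10]. The charged-current
(CC) couplings are the simplest. From


$$g\left(t^1 W^1_\mu + t^2 W^2_\mu\right) = g\left\{ \left[(t^1 + i t^2)/\sqrt{2}\right] \left[(W^1_\mu - i W^2_\mu)/\sqrt{2}\right] + \mathrm{h.c.} \right\} = g\left\{ \left[ t^+ W^-_\mu/\sqrt{2} \right] + \mathrm{h.c.} \right\} \qquad [13]$$


where $t^\pm = t^1 \pm i t^2$ and $W^\pm = (W^1 \pm i W^2)/\sqrt{2}$, we
obtain the vertex


$$V_{\bar{\psi}\psi W} = g\, \bar{\psi} \gamma_\mu \left[ (t^+_L/\sqrt{2})(1 - \gamma_5)/2 + (t^+_R/\sqrt{2})(1 + \gamma_5)/2 \right] \psi\, W^{-\mu} + \mathrm{h.c.} \qquad [14]$$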


In the neutral-current (NC) sector, the photon $A_\mu$
and the mediator $Z_\mu$ of the weak NC are orthogonal
and normalized linear combinations of $B_\mu$ and $W^3_\mu$:


$$A_\mu = \cos\theta_W B_\mu + \sin\theta_W W^3_\mu, \qquad Z_\mu = -\sin\theta_W B_\mu + \cos\theta_W W^3_\mu \qquad [15]$$


Equations [15] define the weak mixing angle $\theta_W$.
The photon is characterized by equal couplings to
left and right fermions with a strength equal to the
electric charge. Recalling eqn [12] for the charge
matrix $Q$, we immediately obtain


$$g \sin\theta_W = g' \cos\theta_W = e \qquad [16]$$


or, equivalently,


$$\tan\theta_W = g'/g \qquad [17]$$


Once $\theta_W$ has been fixed by the photon couplings, it
is a simple matter of algebra to derive the $Z$
couplings, with the result


$$\Gamma_{\bar{\psi}\psi Z} = \frac{g}{2\cos\theta_W}\, \bar{\psi} \gamma_\mu \left[ t^3_L (1 - \gamma_5) + t^3_R (1 + \gamma_5) - 2 Q \sin^2\theta_W \right] \psi\, Z^\mu \qquad [18]$$


where $\Gamma_{\bar{\psi}\psi Z}$ is a notation for the vertex. In the
MSM, $t^3_R = 0$ and $t^3_L = \pm 1/2$. Note that the CC and
NC weak couplings do not conserve P (parity) and C
(charge conjugation).
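
The bracket in eqn [18] can be split into a vector and an axial part, $[\ldots] = g_V - g_A \gamma_5$, with $g_V = t^3_L + t^3_R - 2Q\sin^2\theta_W$ and $g_A = t^3_L - t^3_R$. The sketch below tabulates these for the MSM fermions. The numerical value $\sin^2\theta_W = 0.2312$ is an assumption used for illustration (it is not quoted in the text), and the names `z_couplings` and `MSM_FERMIONS` are hypothetical.

```python
def z_couplings(t3_left, charge, sin2_theta_w=0.2312, t3_right=0.0):
    """Vector and axial coefficients of the bracket in eqn [18],
    written as g_V - g_A * gamma5.

    sin2_theta_w = 0.2312 is an assumed illustrative value.
    """
    g_v = t3_left + t3_right - 2.0 * charge * sin2_theta_w
    g_a = t3_left - t3_right
    return g_v, g_a


# (t3_L, Q) for the fermions of one family in the MSM, where t3_R = 0.
MSM_FERMIONS = {
    "neutrino": (0.5, 0.0),
    "charged lepton": (-0.5, -1.0),
    "u-type quark": (0.5, 2.0 / 3.0),
    "d-type quark": (-0.5, -1.0 / 3.0),
}

for name, (t3, q) in MSM_FERMIONS.items():
    g_v, g_a = z_couplings(t3, q)
    print(f"{name:15s} g_V = {g_v:+.4f}  g_A = {g_a:+.4f}")
```

Since $g_A \neq 0$ for every fermion, the $Z$ couples differently to left- and right-handed components, which is the parity violation noted above.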

In order to derive the effective four-fermion
interactions that are equivalent, at low energies, to
the CC and NC couplings given in eqns [14] and
[18], we anticipate that large masses, as experimentally
observed, are provided for $W^\pm$ and $Z$ by $\mathcal{L}_{\mathrm{Higgs}}$.
For left-left CC couplings, when the momentum
transfer squared can be neglected with respect to
$m_W^2$ in the propagator of Born diagrams with single
$W$ exchange, from eqn [14], we can write


$$\mathcal{L}^{CC}_{\mathrm{eff}} \simeq \frac{g^2}{8 m_W^2} \left[ \bar{\psi} \gamma_\mu (1 - \gamma_5)\, t^+_L \psi \right] \left[ \bar{\psi} \gamma^\mu (1 - \gamma_5)\, t^-_L \psi \right] \qquad [19]$$


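By specializing further in the case of doublet fields
such as $\nu_e - e^-$ or $\nu_\mu - \mu^-$, we obtain the tree-level
relation of $g$ with the Fermi coupling constant
$G_F$ measured from $\mu$ decay
($G_F = 1.16639(2) \times 10^{-5}$ GeV$^{-2}$):


$$\frac{G_F}{\sqrt{2}} = \frac{g^2}{8 m_W^2} \qquad [20]$$


By recalling that $g \sin\theta_W = e$, we can also cast this
relation in the form


$$m_W = \frac{\mu_{\mathrm{Born}}}{\sin\theta_W} \qquad [21]$$


with


$$\mu_{\mathrm{Born}} = \left( \frac{\pi\alpha}{\sqrt{2}\, G_F} \right)^{1/2} \simeq 37.2802\ \mathrm{GeV} \qquad [22]$$


where $\alpha$ is the fine-structure constant of QED
($\alpha \equiv e^2/4\pi = 1/137.036$).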
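
The number in eqn [22] follows directly from the quoted values of $\alpha$ and $G_F$, as the following sketch shows. The tree-level $W$ mass additionally requires a value of $\sin^2\theta_W$; the figure 0.2312 used below is an assumption for illustration and is not taken from the text.

```python
import math

ALPHA = 1.0 / 137.036   # fine-structure constant, as quoted in the text
G_F = 1.16639e-5        # Fermi constant in GeV^-2, as quoted in the text


def mu_born(alpha=ALPHA, g_f=G_F):
    """mu_Born = sqrt(pi * alpha / (sqrt(2) * G_F)), in GeV (eqn [22])."""
    return math.sqrt(math.pi * alpha / (math.sqrt(2.0) * g_f))


def m_w_tree(sin2_theta_w):
    """Tree-level W mass m_W = mu_Born / sin(theta_W), in GeV (eqn [21])."""
    return mu_born() / math.sqrt(sin2_theta_w)


print(f"mu_Born = {mu_born():.4f} GeV")
# sin^2(theta_W) = 0.2312 is an assumed input, not a value from the text.
print(f"tree-level m_W = {m_w_tree(0.2312):.2f} GeV")
```

With this input the tree-level estimate comes out a few GeV below the measured $m_W \simeq 80.4$ GeV, which illustrates why the radiative corrections mentioned below matter in precision comparisons.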

In the same way, for neutral currents we obtain,
in Born approximation, from eqn [18], the effective
four-fermion interaction given by


$$\mathcal{L}^{NC}_{\mathrm{eff}} \simeq \sqrt{2}\, G_F\, \rho_0\, \bar{\psi} \gamma_\mu [\ldots] \psi\; \bar{\psi} \gamma^\mu [\ldots] \psi \qquad [23]$$


where


$$[\ldots] \equiv t^3_L (1 - \gamma_5) + t^3_R (1 + \gamma_5) - 2 Q \sin^2\theta_W \qquad [24]$$


and


$$\rho_0 = \frac{m_W^2}{m_Z^2 \cos^2\theta_W} \qquad [25]$$


All couplings given in this section are obtained at
tree level and are modified in higher orders of
perturbation theory. In particular, the relations
between $m_W$ and $\sin\theta_W$ (eqns [21] and [22]) and
the observed values of $\rho$ ($\rho = \rho_0$ at tree level) in
different NC processes are altered by computable
small electroweak radiative corrections.

The gauge-boson self-interactions can be derived
from the $F_{\mu\nu}$ term in $\mathcal{L}_{\mathrm{symm}}$, by using eqn [15] and
$W^\pm = (W^1 \pm i W^2)/\sqrt{2}$. For the three-gauge-boson
vertex $W^+ W^- V$ with $V \equiv Z, \gamma$, we obtain


$$V_{W^- W^+ V} = i g_{W^- W^+ V} \left[ g_{\mu\nu} (q - p)_\lambda + g_{\mu\lambda} (p - r)_\nu + g_{\nu\lambda} (r - q)_\mu \right] \qquad [26]$$


with


$$g_{W^- W^+ \gamma} = g \sin\theta_W = e \quad \text{and} \quad g_{W^- W^+ Z} = g \cos\theta_W \qquad [27]$$


This form of the triple gauge vertex is very special: in
general, there could be departures from the above SM
expression, even restricting us to $SU(2) \otimes U(1)$ gauge
symmetric and C and P invariant couplings. In fact,
some small corrections are already induced by the
radiative corrections. The SM form of the triple gauge
vertex has been experimentally confirmed by measuring
the cross section $e^+e^- \to W^+W^-$ at LEP.

We now turn to the Higgs sector of the electroweak
Lagrangian. The Higgs Lagrangian is specified
by the gauge principle and the requirement of
renormalizability to be


$$\mathcal{L}_{\mathrm{Higgs}} = (D_\mu \phi)^\dagger (D^\mu \phi) - V(\phi^\dagger \phi) - \bar{\psi}_L \Gamma \psi_R \phi - \bar{\psi}_R \Gamma^\dagger \psi_L \phi^\dagger \qquad [28]$$


where $\phi$ is a column vector including all Higgs
scalar fields; it transforms as a reducible representation
of the gauge group. The quantities $\Gamma$ (which
include all coupling constants) are matrices that
make the Yukawa couplings invariant under the
Lorentz and gauge groups. The potential $V(\phi^\dagger \phi)$,
symmetric under $SU(2) \otimes U(1)$, contains, at most,
quartic terms in $\phi$ so that the theory is
renormalizable:


$$V(\phi^\dagger \phi) = -\tfrac{1}{2} \mu^2 \phi^\dagger \phi + \tfrac{1}{4} \lambda (\phi^\dagger \phi)^2 \qquad [29]$$


Spontaneous symmetry breaking is induced if the
minimum of $V$, which is the classical analog of
the quantum-mechanical vacuum state (both are the
states of minimum energy), is obtained for nonvanishing
$\phi$ values. This occurs because we have taken
$\mu^2$ and $\lambda$ positive in $V$ (note the “wrong” sign of the
mass term). Precisely, we denote the vacuum
expectation value (VEV) of $\phi$, that is, the position
of the minimum, by $v$:


$$\langle 0 | \phi(x) | 0 \rangle = v \neq 0 \qquad [30]$$
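
The position of the minimum can be made explicit for the potential of eqn [29]. Treating $V$ as a function of $x = \phi^\dagger\phi$, the condition $dV/dx = 0$ gives $x = \mu^2/\lambda$, so that $v^2 = \mu^2/\lambda$ with this normalization. The sketch below checks this numerically; the parameter values are arbitrary illustrative choices and the function names are hypothetical.

```python
def higgs_potential(x, mu2, lam):
    """V of eqn [29] as a function of x = phi^dagger phi."""
    return -0.5 * mu2 * x + 0.25 * lam * x * x


def vev_squared(mu2, lam):
    """Value of phi^dagger phi at the minimum, from dV/dx = 0."""
    if mu2 <= 0.0 or lam <= 0.0:
        raise ValueError("a nontrivial minimum requires mu2 > 0 and lam > 0")
    return mu2 / lam


# Arbitrary illustrative parameters (not physical values).
mu2, lam = 2.0, 0.5
x_min = vev_squared(mu2, lam)
print(x_min, higgs_potential(x_min, mu2, lam))
# The symmetric point x = 0 has higher energy than the minimum.
print(higgs_potential(0.0, mu2, lam) > higgs_potential(x_min, mu2, lam))
```

With the opposite sign of the mass term the function would be minimized at $x = 0$ and no symmetry breaking would occur.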


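The fermion mass matrix is obtained from the
Yukawa couplings by replacing $\phi(x)$ by $v$:


$$M = \bar{\psi}_L \mathcal{M} \psi_R + \bar{\psi}_R \mathcal{M}^\dagger \psi_L \qquad [31]$$

with

$$\mathcal{M} = \Gamma \cdot v \qquad [32]$$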


In the SM, where all left fermions, $\psi_L$, are doublets
and all right fermions, $\psi_R$, are singlets, only Higgs
doublets can contribute to fermion masses. There
are enough free couplings in $\Gamma$, so that one single
complex Higgs doublet is indeed sufficient to
generate the most general fermion mass matrix. It
is important to observe that by a suitable change of
basis we can always make the matrix $\mathcal{M}$ Hermitian,
$\gamma_5$-free, and diagonal. In fact, we can make separate
unitary transformations on $\psi_L$ and $\psi_R$ according to


$$\psi'_L = U \psi_L, \qquad \psi'_R = V \psi_R \qquad [33]$$


and consequently


$$\mathcal{M} \to \mathcal{M}' = U^\dagger \mathcal{M} V \qquad [34]$$


This transformation does not alter the general
structure of the fermion couplings in $\mathcal{L}_{\mathrm{symm}}$.

If only one Higgs doublet is present, the change of 
basis that makes M diagonal will at the same time 
diagonalize also the fermion-Higgs Yukawa cou- 
plings. Thus, in this case, no flavor-changing neutral 
Higgs exchanges are present. This is not true, in 
general, when there are several Higgs doublets. But 
one Higgs doublet for each electric charge sector, 
that is, one doublet coupled only to u-type quarks, 
one doublet to d-type quarks, one doublet to charged 
leptons would also be satisfactory, because the mass 
matrices of fermions with different charges are 
diagonalized separately. In fact, at the moment, the 
simplest model with only one Higgs doublet seems 
adequate for describing all observed phenomena. 

Weak charged currents are the only tree-level
interactions in the SM that change flavor: by
emission of a $W$, a $u$-type quark is turned into a
$d$-type quark, or a $\nu_l$ neutrino is turned into an
$l^-$ charged lepton (all fermions are left-handed). If
we start from a $u$-type quark that is a mass
eigenstate, emission of a $W$ turns it into a $d$-type
quark state $d'$ (the weak isospin partner of $u$) that in
general is not a mass eigenstate. In general, the mass
eigenstates and the weak eigenstates do not coincide
and a unitary transformation connects the two sets:


$$\begin{pmatrix} d' \\ s' \\ b' \end{pmatrix} = V \begin{pmatrix} d \\ s \\ b \end{pmatrix} \qquad [35]$$


or, in shorthand, $D' = VD$, where $V$ is the Cabibbo-Kobayashi-Maskawa
(CKM) matrix. Thus, in terms
of mass eigenstates the charged weak current of
quarks is of the form


$$J^+_\mu \propto \bar{U} \gamma_\mu (1 - \gamma_5)\, t^+ V D \qquad [36]$$


Since $V$ is unitary (i.e., $VV^\dagger = V^\dagger V = 1$) and commutes
with $T^2$, $T_3$, and $Q$ (because all $d$-type quarks
have the same isospin and charge) the neutral current
couplings are diagonal both in the primed and
unprimed basis (if the $Z$ $d$-type quark current is
abbreviated as $\bar{D}' \Gamma D'$ then by changing basis we get
$\bar{D} V^\dagger \Gamma V D$ and $V$ and $\Gamma$ commute because, as seen
from eqn [24], $\Gamma$ is made of Dirac matrices and $T_3$ and
$Q$ generator matrices). It follows that $\bar{D}' \Gamma D' = \bar{D} \Gamma D$.
This is the Glashow-Iliopoulos-Maiani (GIM)
mechanism that ensures natural flavor conservation
of the neutral current couplings at the tree level. For
three generations of quarks, the CKM matrix depends
on four physical parameters: three mixing angles and
one phase. This phase is the unique source of CP
violation in the SM.
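
The count of four physical parameters follows from a standard argument that the text does not spell out: an $n \times n$ unitary matrix has $n^2$ real parameters, of which $2n - 1$ relative phases can be absorbed by redefining the quark fields, and $n(n-1)/2$ of the remaining parameters are rotation angles. The sketch below, with a hypothetical function name, carries out this bookkeeping.

```python
def mixing_matrix_parameters(n_gen):
    """Return (mixing angles, physical phases) of the quark mixing matrix
    for n_gen generations, using the standard counting argument."""
    if n_gen < 1:
        raise ValueError("n_gen must be a positive integer")
    total_real = n_gen * n_gen          # real parameters of a unitary matrix
    removable_phases = 2 * n_gen - 1    # absorbed by quark-field rephasing
    physical = total_real - removable_phases
    angles = n_gen * (n_gen - 1) // 2   # parameters of a real rotation
    phases = physical - angles
    return angles, phases


for n in (2, 3, 4):
    print(n, mixing_matrix_parameters(n))
```

With two generations no physical phase survives, so at least three generations are needed for this source of CP violation.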

We now consider the gauge-boson masses and their
couplings to the Higgs. These effects are induced by
the $(D_\mu \phi)^\dagger (D^\mu \phi)$ term in $\mathcal{L}_{\mathrm{Higgs}}$ (eqn [28]), where


$$D_\mu \phi = \left[ \partial_\mu + i g \sum_{A=1}^{3} t^A W^A_\mu + i g' (Y/2) B_\mu \right] \phi \qquad [37]$$


Here $t^A$ and $\tfrac{1}{2} Y$ are the $SU(2) \otimes U(1)$ generators in
the reducible representation spanned by $\phi$. Not only
doublets but all non-singlet Higgs representations can
contribute to gauge-boson masses. The condition that
the photon remains massless is equivalent to the
condition that the vacuum is electrically neutral:


$$Q |v\rangle = \left( t^3 + \tfrac{1}{2} Y \right) |v\rangle = 0 \qquad [38]$$


The charged $W$ mass is given by the quadratic terms
in the $W$ field arising from $\mathcal{L}_{\mathrm{Higgs}}$, when $\phi(x)$ is
replaced by $v$. We obtain


$$m_W^2 W^+_\mu W^{-\mu} = g^2 \left| (t^+ v/\sqrt{2}) \right|^2 W^+_\mu W^{-\mu} \qquad [39]$$


whilst for the $Z$ mass we get (recalling eqn [15])


$$\tfrac{1}{2} m_Z^2 Z_\mu Z^\mu = \left| \left[ g \cos\theta_W\, t^3 - g' \sin\theta_W\, (Y/2) \right] v \right|^2 Z_\mu Z^\mu \qquad [40]$$

where the factor of 1/2 on the left-hand side is the
correct normalization for the definition of the mass
of a neutral field. For Higgs doublets


$$\phi = \begin{pmatrix} \phi^+ \\ \phi^0 \end{pmatrix}, \qquad v = \begin{pmatrix} 0 \\ v \end{pmatrix} \qquad [41]$$


we obtain


$$m_W^2 = \tfrac{1}{2} g^2 v^2, \qquad m_Z^2 = \tfrac{1}{2} g^2 v^2 / \cos^2\theta_W \qquad [42]$$


Note that by using eqn [20] we obtain


$$v = 2^{-3/4} G_F^{-1/2} = 174.1\ \mathrm{GeV} \qquad [43]$$


It is also evident that for Higgs doublets


$$\rho_0 = \frac{m_W^2}{m_Z^2 \cos^2\theta_W} = 1 \qquad [44]$$


This relation is typical of one or more Higgs doublets
and would be spoiled by the existence of, for example,
Higgs triplets. This result is valid at the tree level and is
modified by calculable small electroweak radiative
corrections. The $\rho_0$ parameter has been measured from
the intensity of NC interactions (recall eqn [25]) and
confirmed to be close to unity at the few per mille level.

In the MSM only one Higgs doublet is present. Then the
fermion-Higgs couplings are in proportion to the
fermion masses. In fact, from the Yukawa couplings
$g_{\phi \bar{f} f} (\bar{f}_L \phi f_R + \mathrm{h.c.})$, the mass $m_f$ is obtained by replacing
$\phi$ by $v$, so that $m_f = g_{\phi \bar{f} f}\, v$. In the MSM, three out of the
four Hermitian fields are removed from the physical
spectrum by the Higgs mechanism and become the
longitudinal modes of $W^+$, $W^-$, and $Z$, which acquire a
mass. The fourth neutral Higgs is physical and should
be found. If more doublets are present, two more
charged and two more neutral Higgs scalars should be
around for each additional doublet.
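
The value of $v$ in eqn [43] and the statement that $\rho_0 = 1$ holds for doublets but not for triplets can both be checked numerically. For a set of neutral Higgs components with weak isospin $T$, third component $T_3$ and VEV $v_i$, the standard tree-level generalization is $\rho_0 = \sum_i [T_i(T_i+1) - T_{3i}^2]\, v_i^2 \,/\, 2\sum_i T_{3i}^2\, v_i^2$. This formula is not written in the text; it follows from applying the mass terms of eqns [39] and [40] to a general representation. The function names and the sample triplet VEV below are illustrative assumptions.

```python
import math

G_F = 1.16639e-5  # Fermi constant in GeV^-2, as quoted in the text


def vev_from_fermi_constant(g_f=G_F):
    """v = 2**(-3/4) * G_F**(-1/2), in GeV (eqn [43])."""
    return 2.0 ** (-0.75) / math.sqrt(g_f)


def rho_tree(multiplets):
    """Tree-level rho_0 for neutral Higgs components given as
    (T, T3, vev) triples; standard formula, not stated in the text."""
    num = sum((t * (t + 1.0) - t3 * t3) * vev ** 2 for t, t3, vev in multiplets)
    den = sum(2.0 * t3 * t3 * vev ** 2 for t, t3, vev in multiplets)
    if den == 0.0:
        raise ValueError("no VEV contributes to the Z mass")
    return num / den


v = vev_from_fermi_constant()
print(f"v = {v:.1f} GeV")
print(rho_tree([(0.5, -0.5, v)]))                    # one doublet
print(rho_tree([(0.5, -0.5, 120.0), (0.5, -0.5, 126.0)]))  # two doublets
# A doublet plus a small triplet VEV (10 GeV is an arbitrary choice).
print(rho_tree([(0.5, -0.5, v), (1.0, 1.0, 10.0)]))
```

Any number of doublets leaves $\rho_0$ at 1, while even a small triplet VEV shifts it by an amount far larger than the per mille accuracy of the measurement, which is why the data constrain such extensions tightly.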

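The couplings of the physical Higgs $H$ to the
gauge bosons can be simply obtained from $\mathcal{L}_{\rm Higgs}$, by
the replacement

\[ \phi(x) = \begin{pmatrix}\phi^+(x)\\ \phi^0(x)\end{pmatrix} \rightarrow \begin{pmatrix}0\\ v + H/\sqrt{2}\end{pmatrix} \qquad [45] \]

(so that $(D_\mu\phi)^\dagger(D^\mu\phi) = \tfrac{1}{2}(\partial_\mu H)^2 + \cdots$), with the
result

\[ \mathcal{L}[H,W,Z] = g^2 (v/\sqrt{2})\, W^+_\mu W^{-\mu} H + (g^2/4)\, W^+_\mu W^{-\mu} H^2 + \left[g^2 v/(2\sqrt{2}\cos^2\theta_W)\right] Z_\mu Z^\mu H + \left[g^2/(8\cos^2\theta_W)\right] Z_\mu Z^\mu H^2 \qquad [46] \]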


In the MSM, the Higgs mass $m_H^2 \sim \lambda v^2$ is of the order of
the weak scale $v$ but cannot be predicted because the
value of $\lambda$ is not fixed. The dominant decay mode of
the Higgs is in the $b\bar b$ channel below the $WW$
threshold, while the $W^+W^-$ channel is dominant for
sufficiently large $m_H$. The width is small below the
$WW$ threshold, not exceeding a few MeV, but
increases steeply beyond the threshold, reaching the
asymptotic value of $\Gamma \sim \tfrac{1}{2} m_H^3$ at large $m_H$, where
all energies and masses are in TeV.

A central role in the experimental verification of
the standard electroweak theory has been played by
CERN, the European Laboratory for Particle Physics,
located near Geneva, between France and Switzerland.
The indirect effects of the $Z_0$, that is, the
occurrence of weak processes induced by the neutral
current, were first observed in 1974 at CERN by the
Gargamelle Collaboration (Gargamelle being the name of the bubble
chamber used in the experiment). Later, in 1982, the
$W^\pm$ and the $Z_0$ were, for the first time, directly
produced and observed in proton-antiproton collisions
by the UA1 and UA2 collaborations and then
further studied with the same technique both at
CERN and subsequently at the Tevatron of Fermilab
near Chicago. Starting in 1989, LEP, the large $e^+e^-$
collider, operated at CERN until 2000. In the LEP
circular ring of circumference ~27 km, electrons and
positrons were accelerated in opposite directions to an
equal energy in the range between 45 and 103 GeV.
The beams were made to cross and collide at
four experimental areas where the
ALEPH, DELPHI, L3, and OPAL detectors were
located to study the final states produced in the
collisions. In its first phase, called LEP1, from 1989
to 1995, LEP operation was completely
dedicated to a precise study of the $Z_0$ properties
(mass, lifetime, and decay modes) in order to accurately
test the predictions of the SM. The main lessons of the
precision tests of the standard electroweak theory can
be summarized as follows. It has been checked that the
couplings of quarks and leptons to the weak gauge
bosons $W^\pm$ and $Z$ are indeed precisely those prescribed
by the gauge symmetry. The accuracy of a few tenths
of 1% for these tests implies that not only the tree
level, but also the structure of quantum corrections has
been verified. Then, from the end of 1995, the energy
of LEP was increased and the LEP2 phase was
started. The total energy was gradually increased up to
206 GeV. The main physics goals of LEP2 were the
search for the Higgs and for possible new particles, the
precise measurement of $m_W$, and the experimental
study of the triple gauge vertices $WW\gamma$ and $WWZ_0$.
The Higgs particle of the SM could in principle be
produced at LEP2 in the reaction $e^+e^- \to Z_0 H$,
which proceeds by $Z_0$ exchange. The nonobservation
of the Higgs particle at LEP2 has allowed a
lower limit on its mass to be established: $m_H > 114$ GeV. Indirect
indications on the Higgs mass were also obtained
from the precision tests of the SM, as the radiative
effects depend logarithmically on $m_H$. The indication
is that the Higgs cannot be too heavy if the SM is
valid: $m_H < 219$ GeV at 95% c.l. In 2001, LEP was
dismantled and, in its tunnel, a new double ring of
superconducting magnets is being installed. The new
accelerator, the LHC (Large Hadron Collider), will be
a proton-proton collider of total center-of-mass
energy 14 TeV. Two large experiments, ATLAS and
CMS, will continue to search for the Higgs starting in
the year 2007. The sensitivity of LHC experiments to
the SM Higgs will go up to masses $m_H$ of ~1 TeV.


See also: Effective Field Theories; Electric-Magnetic
Duality; Electroweak Theory; General Relativity:
Experimental Tests; Noncommutative Geometry and the
Standard Model; Perturbative Renormalization Theory
and BRST; Quantum Chromodynamics; Quantum
Electrodynamics and its Precision Tests; Quantum Field
Theory; Supersymmetric Particle Models.


Introduction


This article treats a specific class of stationary
solutions to the Einstein field equations which read

\[ R_{\mu\nu} - \tfrac{1}{2} R\, g_{\mu\nu} = \frac{8\pi G}{c^4}\, T_{\mu\nu} \qquad [1] \]

Here $R_{\mu\nu}$ and $R = g^{\mu\nu}R_{\mu\nu}$ are, respectively, the Ricci
tensor and the Ricci scalar of the spacetime metric
$g_{\mu\nu}$, $G$ the Newton constant, and $c$ the speed of light.
The tensor $T_{\mu\nu}$ is the stress-energy tensor of matter.
Spacetimes, or regions thereof, where $T_{\mu\nu} = 0$ are
called vacuum.

Stationary solutions are of interest for a variety 
of reasons. As models for compact objects at rest, 
or in steady rotation, they play a key role in 
astrophysics. They are easier to study than nonsta- 
tionary systems because stationary solutions are 
governed by elliptic rather than hyperbolic equa- 
tions. Finally, like in any field theory, one expects 
that large classes of dynamical solutions approach 
(“settle down to”) a stationary state in the final 
stages of their evolution. 

The simplest stationary solutions describing com- 
pact isolated objects are the spherically symmetric 


ones. In the vacuum region, these are all given by the 
Schwarzschild family. A theorem of Birkhoff shows 
that in the vacuum region any spherically symmetric 
metric, even without assuming stationarity, belongs to 
the family of Schwarzschild metrics, parametrized by a 
positive mass parameter m. Thus, regardless of 
possible motions of the matter, as long as they remain 
spherically symmetric, the exterior metric is the 
Schwarzschild one for some constant m. This has the 
following consequence for stellar dynamics: imagine 
following the collapse of a cloud of pressureless fluid 
(“dust”). Within Newtonian gravity, this dust cloud 
will, after finite time, contract to a point at which the 
density and the gravitational potential diverge. How- 
ever, this result cannot be trusted as a sensible physical 
prediction because, even if one supposes that New- 
tonian gravity is still valid at very high densities, a 
matter model based on noninteracting point particles 
is certainly not. Consider, next, the same situation in 
the Einstein theory of gravity: here a new question 
arises, related to the form of the Schwarzschild metric 
outside of the spherically symmetric body: 


\[ g = -V^2\,dt^2 + V^{-2}\,dr^2 + r^2\,d\Omega^2, \qquad V^2 = 1 - \frac{2Gm}{rc^2} \qquad [2] \]

Here $d\Omega^2$ is the line element of the standard
2-sphere. Since the metric [2] seems to be singular as
$r = 2m$ is approached (from now on, we use units in
which $G = c = 1$), there arises the need to understand
what happens at the surface of the star when the
radius $r = 2m$ is reached. One thus faces the need of
a careful study of the geometry of the metric [2]
when $r = 2m$ is approached, and crossed.

The first key feature of the metric [2] is its
stationarity, of course, with Killing vector field $X$
given by $X = \partial_t$. A Killing field, by definition, is a
vector field the local flow of which generates isometries.
A spacetime (the term spacetime denotes a
smooth, paracompact, connected, orientable, and
time-orientable Lorentzian manifold) is called stationary
if there exists a Killing vector field $X$ which
approaches $\partial_t$ in the asymptotically flat region (where $r$
goes to $\infty$; see below for precise definitions) and
generates a one-parameter group of isometries. A
spacetime is called static if it is stationary and if the
stationary Killing vector $X$ is hypersurface orthogonal,
that is, $X^\flat \wedge dX^\flat = 0$, where $X^\flat = X_\mu\,dx^\mu = g_{\mu\nu}X^\nu dx^\mu$.
A spacetime is called axisymmetric if there exists a
Killing vector field $Y$, which generates a one-parameter
group of isometries and which behaves like a rotation
in the asymptotically flat region, with all orbits
$2\pi$-periodic. In asymptotically flat spacetimes, this
implies that there exists an axis of symmetry, that is, a
set on which the Killing vector vanishes. Killing vector
fields which are a nontrivial linear combination of a
time translation and of a rotation in the asymptotically
flat region are called stationary rotating, or helical.

There exists a technique, due independently to
Kruskal and Szekeres, of attaching together two
regions $r > 2m$ and two regions $r < 2m$ of the
Schwarzschild metric, as in Figure 1, to obtain a
manifold with a metric which is smooth at $r = 2m$.
In the extended spacetime, the hypersurface $\{r = 2m\}$
is a null hypersurface $\mathscr{E}$, the Schwarzschild event
horizon. The stationary Killing vector $X = \partial_t$
extends to a Killing vector in the extended spacetime
which becomes tangent to and null on $\mathscr{E}$. The global
properties of the Kruskal-Szekeres extension of the
exterior Schwarzschild spacetime make this spacetime
a natural model for a nonrotating black hole. It is
worth noting here that the exterior Schwarzschild
spacetime [2] admits an infinite number of nonisometric
vacuum extensions, even in the class of
maximal, analytic, simply connected ones. The
Kruskal-Szekeres extension is singled out by the
properties that it is maximal, vacuum, analytic, simply
connected, with all maximally extended geodesics
either complete, or with the area $4\pi r^2$ of the orbits of the
isometry groups tending to zero along them.

We can now come back to the problem of the 
contracting dust cloud according to the Einstein 
theory. For simplicity, we take the density of the 
dust to be uniform — the so-called Oppenheimer- 
Snyder solution. It then turns out that, in the course 
of collapse, the surface of the dust will eventually 
cross the Schwarzschild radius, leaving behind a 
Schwarzschild black hole. If one follows the dust 
cloud further, a singularity will eventually form, but 
will not be visible from the “outside region” where 
$r > 2m$. For a collapsing body of the mass of the
Sun, say, one has $2m \approx 3$ km. Thus, standard
phenomenological matter models such as that for 
dust can still be trusted, so that the previous 
objection to the Newtonian scenario does not apply. 
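
The value of about 3 km quoted for the Sun follows from restoring the constants, $r = 2Gm/c^2$. A minimal sketch of that arithmetic, with rounded SI constants as assumed inputs and a hypothetical function name:

```python
# SI values (assumed inputs, rounded).
G = 6.674e-11        # Newton constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg

def schwarzschild_radius_m(mass_kg: float) -> float:
    """Radius r = 2Gm/c^2 (in metres) at which V^2 = 1 - 2Gm/(r c^2) vanishes."""
    return 2.0 * G * mass_kg / C**2

r_sun_km = schwarzschild_radius_m(M_SUN) / 1000.0
print(f"Schwarzschild radius of the Sun: {r_sun_km:.2f} km")
```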

There is a rotating generalization of the Schwarz- 
schild metric, namely the two-parameter family of 
exterior Kerr metrics, which in Boyer—Lindquist 
coordinates takes the form 


\[ g = -\frac{\Delta - a^2\sin^2\theta}{\Sigma}\,dt^2 - \frac{2a\sin^2\theta\,(r^2 + a^2 - \Delta)}{\Sigma}\,dt\,d\varphi + \frac{(r^2 + a^2)^2 - \Delta a^2\sin^2\theta}{\Sigma}\,\sin^2\theta\,d\varphi^2 + \frac{\Sigma}{\Delta}\,dr^2 + \Sigma\,d\theta^2 \qquad [3] \]

with $0 \le a < m$. Here $\Sigma = r^2 + a^2\cos^2\theta$, $\Delta = r^2 + a^2 - 2mr$,
and $r_+ < r < \infty$, where $r_+ = m + (m^2 - a^2)^{1/2}$.
When $a = 0$, the Kerr metric reduces to the
Schwarzschild metric. The Kerr metric is again a
vacuum solution, and it is stationary with $X = \partial_t$ the
asymptotic time translation, as well as axisymmetric
with $Y = \partial_\varphi$ the generator of rotations. Similarly to
the Schwarzschild case, it turns out that the metric
can be smoothly extended across $r = r_+$, with $\{r = r_+\}$
being a smooth null hypersurface $\mathscr{E}$ in the extension.
The null generator $K$ of $\mathscr{E}$ is the limit of the
stationary-rotating Killing field $X + \omega Y$, where
$\omega = a/(2mr_+)$. On the other hand, the Killing vector
$X$ is timelike only outside the hypersurface $\{r = m +
(m^2 - a^2\cos^2\theta)^{1/2}\}$, on which $X$ becomes null. In the
region between $r_+$ and $r = m + (m^2 - a^2\cos^2\theta)^{1/2}$,
which is called the ergoregion, $X$ is spacelike. It is
also spacelike on and tangent to $\mathscr{E}$, except where the
axis of rotation meets $\mathscr{E}$, where $X$ is null. Based on
the above properties, the Kerr family provides
natural models for rotating black holes.

Figure 1 The Kruskal-Szekeres extension of the Schwarzschild solution. (Adapted with permission from Nicolas J-P (2002) Dirac fields
on asymptotically flat space-times. Dissertationes Mathematicae 408: 1-85.)
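
The horizon radius, the angular velocity of the horizon, and the boundary of the ergoregion quoted above are simple closed-form expressions in $m$ and $a$. The following sketch evaluates them in units $G = c = 1$; the function names and the sample values are illustrative assumptions, not part of the original text.

```python
import math

def kerr_horizon_radius(m: float, a: float) -> float:
    """r_+ = m + sqrt(m^2 - a^2), for 0 <= a <= m (units G = c = 1)."""
    return m + math.sqrt(m * m - a * a)

def horizon_angular_velocity(m: float, a: float) -> float:
    """omega = a / (2 m r_+), the constant in the horizon generator X + omega*Y."""
    return a / (2.0 * m * kerr_horizon_radius(m, a))

def ergosurface_radius(m: float, a: float, theta: float) -> float:
    """r = m + sqrt(m^2 - a^2 cos^2(theta)), where X becomes null."""
    return m + math.sqrt(m * m - (a * math.cos(theta)) ** 2)

m, a = 1.0, 0.6  # illustrative values
r_plus = kerr_horizon_radius(m, a)
print(r_plus, horizon_angular_velocity(m, a))
# On the axis the ergosurface touches the horizon; at the equator it sits at r = 2m.
print(ergosurface_radius(m, a, 0.0), ergosurface_radius(m, a, math.pi / 2))
```

For $a = 0$ the two radii coincide everywhere and the angular velocity vanishes, which recovers the Schwarzschild case.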

Unfortunately, as opposed to the spherically 
symmetric case, there are no known explicit collap- 
sing solutions with rotating matter, in particular no 
known solutions having the Kerr metric as final 
state. 

The aim of the theory outlined below is to
understand the general geometrical features of
stationary black holes, and to give a classification
of models satisfying the field equations.


Model-Independent Concepts 


Some of the notions used informally in the 
introductory section will now be made more 
precise. The mathematical notion of black hole is 
meant to capture the idea of a region of spacetime 
which cannot be seen by “outside observers.” Thus, 
at the outset, one assumes that there exists a family 
of physically preferred observers in the spacetime 
under consideration. When considering isolated 
physical systems, it is natural to define the “exterior 
observers” as observers which are “very far” away 
from the system under consideration. The standard 
way of making this mathematically precise is by 
using conformal completions, discussed in more 
detail in the article about asymptotic structure in 
this encyclopedia: a pair $(\tilde{\mathscr M}, \tilde g)$ is called a conformal
completion at infinity, or simply conformal
completion, of $(\mathscr M, g)$ if $\tilde{\mathscr M}$ is a manifold with
boundary such that:


1. $\mathscr M$ is the interior of $\tilde{\mathscr M}$;

2. there exists a function $\Omega$, with the property that
the metric $\tilde g$, defined as $\Omega^2 g$ on $\mathscr M$, extends by
continuity to the boundary of $\tilde{\mathscr M}$, with the
extended metric remaining of Lorentzian signature; and

3. $\Omega$ is positive on $\mathscr M$, differentiable on $\tilde{\mathscr M}$, vanishes
on the boundary

\[ \mathscr I := \tilde{\mathscr M} \setminus \mathscr M \]

with $d\Omega$ nowhere vanishing on $\mathscr I$.


The boundary of $\tilde{\mathscr M}$ is called Scri, a phonic
shortcut for “script I.” The idea here is the
following: forcing $\Omega$ to vanish on $\mathscr I$ ensures that $\mathscr I$
lies infinitely far away from any physical object, a
mathematical way of capturing the notion “very far
away.” The condition that $d\Omega$ does not vanish is a
convenient technical condition which ensures that $\mathscr I$
is a smooth three-dimensional hypersurface, instead
of some, say, one- or two-dimensional object, or of a
set with singularities here and there. Thus, $\mathscr I$ is an
idealized description of a family of observers at
infinity.

To distinguish between various points of $\mathscr I$, one
sets

\[ \mathscr I^+ = \{\text{points in } \mathscr I \text{ which are to the future of the physical spacetime}\} \]

\[ \mathscr I^- = \{\text{points in } \mathscr I \text{ which are to the past of the physical spacetime}\} \]

(Recall that a point $q$ is to the future, respectively to
the past, of $p$ if there exists a future directed,
respectively past directed, causal curve from $p$ to $q$.
Causal curves are curves $\gamma$ such that their tangent
vector $\dot\gamma$ is causal everywhere, $g(\dot\gamma,\dot\gamma) \le 0$.) One
then defines the black hole region $\mathscr B$ as

\[ \mathscr B := \{\text{points in } \mathscr M \text{ which are not in the past of } \mathscr I^+\} \qquad [4] \]

By definition, points in the black hole region cannot
thus send information to $\mathscr I^+$; equivalently, observers
on $\mathscr I^+$ cannot see points in $\mathscr B$. The white-hole region
$\mathscr W$ is defined by changing the time orientation in [4].


A key notion related to the concept of a black hole is
that of future ($\mathscr E^+$) and past ($\mathscr E^-$) event horizons,

\[ \mathscr E^+ := \partial\mathscr B, \qquad \mathscr E^- := \partial\mathscr W \qquad [5] \]

Under mild assumptions, event horizons in stationary
spacetimes with matter satisfying the null-energy
condition,

\[ T_{\mu\nu}\ell^\mu\ell^\nu \ge 0 \quad \text{for all null vectors } \ell^\mu \qquad [6] \]

are smooth null hypersurfaces, analytic if the metric
is analytic.

In order to develop a reasonable theory, one
also needs a regularity condition for the interior of
spacetime. This has to be a condition which does not
exclude singularities (otherwise the Schwarzschild
and Kerr black holes would be excluded), but which
nevertheless guarantees a well-behaved exterior
region. One such condition, assumed in all the
results described below, is the existence in $\mathscr M$ of an
asymptotically flat spacelike hypersurface $\mathscr S$ with
compact interior. Further, either $\mathscr S$ has no boundary
or the boundary of $\mathscr S$ lies on $\mathscr E^+ \cup \mathscr E^-$. To make
things precise, for any spacelike hypersurface let $g_{ij}$
be the induced metric, and let $K_{ij}$ denote its extrinsic
curvature. A spacelike hypersurface $\mathscr S_{\rm ext}$ diffeomorphic
to $\mathbb R^3$ minus a ball will be called asymptotically
flat if the fields $(g_{ij}, K_{ij})$ satisfy the fall-off
conditions

\[ |g_{ij} - \delta_{ij}| + r|\partial_\ell g_{ij}| + \cdots + r^k|\partial_{\ell_1\cdots\ell_k} g_{ij}| + r|K_{ij}| + \cdots + r^k|\partial_{\ell_1\cdots\ell_{k-1}} K_{ij}| \le C r^{-1} \qquad [7] \]

for some constants $C$, $k \ge 1$. A hypersurface $\mathscr S$ (with
or without boundary) will be said to be asymptotically
flat with compact interior if $\mathscr S$ is of the form $\mathscr S_{\rm int} \cup
\mathscr S_{\rm ext}$ with $\mathscr S_{\rm int}$ compact and $\mathscr S_{\rm ext}$ asymptotically flat.

There exists a canonical way of constructing a 
conformal completion with good global properties 
for stationary spacetimes which are asymptotically 
flat in the sense of [7], and which are vacuum 
sufficiently far out in the asymptotic region. This 
conformal completion is referred to as the standard 
completion and will be assumed from now on. 

Returning to the event horizon $\mathscr E = \mathscr E^+ \cup \mathscr E^-$,
it is not very difficult to show that every Killing
vector field $X$ is necessarily tangent to $\mathscr E$. Since
the latter set is a null Lipschitz hypersurface, it
follows that $X$ is either null or spacelike on $\mathscr E$. This
leads to a preferred class of event horizons, called
Killing horizons. By definition, a Killing horizon
associated with a Killing vector $K$ is a null hypersurface
which coincides with a connected component of
the set

\[ H(K) := \{p \in \mathscr M : g(K,K)(p) = 0,\ K(p) \ne 0\} \qquad [8] \]


A simple example is provided by the “boost Killing
vector field” $K = z\partial_t + t\partial_z$ in Minkowski spacetime:
$H(K)$ has four connected components,

\[ H_{\epsilon\delta} := \{t = \epsilon z,\ \delta t > 0\}, \qquad \epsilon, \delta \in \{\pm 1\} \]

The closure $\bar H$ of $H$ is the set $\{|t| = |z|\}$, which is not
a manifold, because of the crossing of the null
hyperplanes $\{t = \pm z\}$ at $t = z = 0$. Horizons of this
type are referred to as bifurcate Killing horizons,
with the set $\{K(p) = 0\}$ being called the bifurcation
surface of $H(K)$. The bifurcate horizon structure in
the Kruskal-Szekeres-Schwarzschild spacetime can
be clearly seen in Figures 1 and 2.
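
The boost example can be checked by a one-line computation. The worked equation below assumes the Minkowski metric with signature $(-,+,+,+)$, a convention that is not stated explicitly in the text:

```latex
\begin{align*}
\eta &= -dt^2 + dx^2 + dy^2 + dz^2, \qquad K = z\,\partial_t + t\,\partial_z,\\
\eta(K,K) &= -z^2 + t^2 = (t - z)(t + z).
\end{align*}
```

Hence $\eta(K,K)$ vanishes exactly on $\{|t| = |z|\}$, while $K$ itself vanishes only where $t = z = 0$. Removing that set from the two null hyperplanes $\{t = \pm z\}$ leaves the four connected components listed above.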
The Vishveshwara-Carter lemma shows that if a
Killing vector $K$ is hypersurface orthogonal, $K^\flat \wedge
dK^\flat = 0$, then the set $H(K)$ defined in [8] is a union
of smooth null hypersurfaces, with $K$ being
tangent to the null geodesics threading $H$ (“$H$ is
generated by $K$”), and so is indeed a Killing
horizon. It has been shown by Carter that the
same conclusion can be reached if the hypothesis
of hypersurface orthogonality is replaced by that
of existence of two linearly independent Killing
vector fields.

In stationary-axisymmetric spacetimes, a Killing 
vector K tangent to the generators of a Killing 
horizon $H$ can be normalized so that $K = X + \omega Y$,
where $X$ is the Killing vector field which asymptotes
to a time translation in the asymptotic region, and $Y$
is the Killing vector field which generates rotations
in the asymptotic region. The constant $\omega$ is called
the angular velocity of the Killing horizon $H$.

On a Killing horizon $H(K)$, one necessarily has

\[ \nabla^\mu (K^\nu K_\nu) = -2\kappa K^\mu \qquad [9] \]


Assuming the so-called dominant-energy condition
on $T_{\mu\nu}$ (see Positive Energy Theorem and Other
Inequalities in GR), it can be shown that $\kappa$ is constant
(recall that Killing horizons are always connected in
the terminology used in this article); it is called the
surface gravity of $H$. A Killing horizon is called
degenerate when $\kappa = 0$, and nondegenerate otherwise;
by an abuse of terminology, one similarly talks
of degenerate black holes, etc. In Kerr spacetimes we
have $\kappa = 0$ if and only if $m = a$. A fundamental
theorem of Boyer shows that degenerate horizons
are closed. This implies that a horizon $H(K)$ such that
$K$ has zeros in $\bar H$ is nondegenerate, and is of bifurcate
type, as described above. Further, a nondegenerate
Killing horizon with complete geodesic generators
always contains zeros of $K$ in its closure. However, it
is not true that existence of a nondegenerate horizon
implies that of zeros of $K$: take the Killing vector field
$z\partial_t + t\partial_z$ in Minkowski spacetime from which the
2-plane $\{z = t = 0\}$ has been removed. The universal
cover of that last spacetime provides a spacetime in
which one cannot restore the points which have been
artificially removed, without violating the manifold
property.

The domain of outer communications (DOC) of a
black hole spacetime is defined as

\[ \langle\langle \mathscr M \rangle\rangle := \mathscr M \setminus \{\mathscr B \cup \mathscr W\} \qquad [10] \]

Thus, $\langle\langle \mathscr M \rangle\rangle$ is the region lying outside of the white-hole
region and outside of the black hole region; it is
the region which can both be seen by the outside
observers and influenced by those.

The subset of $\langle\langle \mathscr M \rangle\rangle$ where $X$ is spacelike is called
the ergoregion. In the Schwarzschild spacetime,
$\omega = 0$ and the ergoregion is empty, but neither of
these is true in Kerr with $a \ne 0$.

A very convenient method for visualizing the 
global structure of spacetimes is provided by the 
Carter-Penrose diagrams. An example of such a 
diagram is given in Figure 2. 
Figure 2 The Carter-Penrose diagram for the Kruskal-Szekeres spacetime. There are actually two asymptotically flat regions, with
corresponding $\mathscr I^\pm$ and $\mathscr E^\pm$ defined with respect to the second region, but not indicated on this diagram. Each point in this diagram represents
a two-dimensional sphere, and coordinates are chosen so that light cones have slopes $\pm 1$. Regions are numbered as in Figure 1. (Adapted
with permission from Nicolas J-P (2002) Dirac fields on asymptotically flat space-times. Dissertationes Mathematicae 408: 1-85.)





A corollary of the topological censorship theorem
of Friedman, Schleich, and Witt is that DOCs of
regular black hole spacetimes satisfying the dominant-energy
condition are simply connected. This
implies that connected components of event horizons
in stationary spacetimes have $\mathbb R \times S^2$
topology.

The discussion of the concepts associated with
stationary black hole spacetimes can be concluded
by summarizing the properties of the Schwarzschild
and Kerr geometries: the extended
Kerr spacetime with $m > a$ is a black hole spacetime
with the hypersurface $\{r = r_+\}$ forming a
nondegenerate, bifurcate Killing horizon generated
by the vector field $X + \omega Y$ and surface gravity
given by

\[ \kappa = \frac{(m^2 - a^2)^{1/2}}{2m\left[m + (m^2 - a^2)^{1/2}\right]} \]

In the case $a = 0$, where the angular velocity $\omega$
vanishes, $X$ is hypersurface orthogonal and becomes
the generator of $H$. The bifurcation surface in this
case is the totally geodesic 2-sphere, along which the
four regions in Figure 1 are joined.
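
The surface gravity formula above is easy to evaluate at its two limits. The following is a minimal sketch in units $G = c = 1$; the function name is a hypothetical choice made for illustration.

```python
import math

def kerr_surface_gravity(m: float, a: float) -> float:
    """kappa = sqrt(m^2 - a^2) / (2 m [m + sqrt(m^2 - a^2)]), units G = c = 1."""
    root = math.sqrt(m * m - a * a)
    return root / (2.0 * m * (m + root))

print(kerr_surface_gravity(1.0, 0.0))  # Schwarzschild limit a = 0: kappa = 1/(4m)
print(kerr_surface_gravity(1.0, 1.0))  # m = a: kappa = 0, a degenerate horizon
```

The surface gravity decreases monotonically from $1/(4m)$ to zero as $a$ increases from 0 to $m$, consistent with the statement that the horizon is degenerate if and only if $m = a$.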


Classification of Stationary Solutions 
(“No-Hair Theorems”) 


We confine attention to the “outside region” of 
black holes, the DOC. (Except for the degenerate 
case discussed later, the “inside”(black hole) 
region is not stationary, so that this restriction 
already follows from the requirement of stationar- 
ity.) For reasons of space, we only consider 
vacuum solutions; there exists a similar theory 
for electro-vacuum black holes. (There is a some- 
what less developed theory for black hole space- 
times in the presence of nonabelian gauge fields.) 
In connection with a collapse scenario, the vacuum 
condition begs the question: collapse of what? The 
answer is twofold: first, there are large classes of 
solutions of Einstein equations describing pure 
gravitational waves. It is believed that sufficiently 
strong such solutions will form black holes. 
(Whether or not they will do that is related to the
cosmic censorship conjecture, see Spacetime Topology,
Causal Structure and Singularities.) Consider, next, a
dynamical situation in which matter is initially present. 
The conditions imposed in this section correspond 
then to a final state in which matter has either been 
radiated away to infinity, or has been swallowed by 
the black hole (as in the spherically symmetric 
Oppenheimer-Snyder collapse described above). 
Based on the facts below, it is expected that the
DOCs of appropriately regular, stationary, vacuum 
black holes are isometrically diffeomorphic to those 
of Kerr black holes: 


1. The rigidity theorem (Hawking). Event horizons in
regular, nondegenerate, stationary, analytic
vacuum black holes are either Killing horizons for
$X$, or there exists a second Killing vector in $\langle\langle \mathscr M \rangle\rangle$.

2. The Killing horizons theorem (Sudarsky—Wald). 
Nondegenerate stationary vacuum black holes 
such that the event horizon is the union of Killing 
horizons of X are static. 

3. The Schwarzschild black holes exhaust the family
of static regular vacuum black holes (Israel,
Bunting and Masood-ul-Alam, Chruściel).

4. The Kerr black holes satisfying

\[ m > a \qquad [11] \]


exhaust the family of nondegenerate, stationary- 
axisymmetric, vacuum, connected black holes. 
Here m is the total Arnowitt-Deser-Misner 
(ADM) mass, while the product am is the total 
ADM angular momentum. (Of course, these 
quantities generalize the constants a and m 
appearing in the Kerr metric.) The framework 
for the proof has been set up by Carter, and the 
statement above is due to Robinson. 


The above results are collectively known under 
the name of no-hair theorems, and they have not 
provided the final answer to the problem so far. 
There are no a priori reasons known for the 
analyticity hypothesis in the rigidity theorem. 
Further, degenerate horizons have been completely 
understood in the static case only. 

Yet another key open question is that of the
existence of nonconnected regular stationary-axisymmetric
vacuum black holes. The following
result is due to Weinstein: let $\partial\mathscr S_a$, $a = 1,\ldots,N$, be
the connected components of $\partial\mathscr S$. Let $X^\flat = g_{\mu\nu}X^\mu dx^\nu$,
where $X^\mu$ is the Killing vector field which asymptotically
approaches the unit normal to $\mathscr S_{\rm ext}$. Similarly, set
$Y^\flat = g_{\mu\nu}Y^\mu dx^\nu$, $Y^\mu$ being the Killing vector field
associated with rotations. On each $\partial\mathscr S_a$, there exists
a constant $\omega_a$ such that the vector $X + \omega_a Y$ is tangent
to the generators of the Killing horizon intersecting
$\partial\mathscr S_a$. The constant $\omega_a$ is called the angular velocity of
the associated Killing horizon. Define


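\[ m_a = -\frac{1}{8\pi}\int_{\partial\mathscr S_a} *dX^\flat \qquad [12] \]

and let $L_a$ be the corresponding, suitably normalized, integral of $*dY^\flat$ over $\partial\mathscr S_a$ [13].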
Such integrals are called Komar integrals. One
usually thinks of $L_a$ as the angular momentum of
each connected component of the black hole. Set

\[ \mu_a = m_a - 2\omega_a L_a \qquad [14] \]

Weinstein shows that one necessarily has $\mu_a > 0$.
The problem at hand can be reduced to a harmonic-map
equation, also known as the Ernst equation,
involving a singular map from $\mathbb R^3$ with Euclidean
metric $\delta$ to the two-dimensional hyperbolic space.
Let $r_a > 0$, $a = 1,\ldots,N-1$, be the distance in $\mathbb R^3$
along the axis between neighboring black holes as
measured with respect to the (unphysical) metric $\delta$.
Weinstein proved that for nondegenerate regular
black holes the inequality [11] holds, and that the
metric on $\langle\langle \mathscr M \rangle\rangle$ is determined up to isometry by the
$3N - 1$ parameters

\[ (\mu_1,\ldots,\mu_N,\ L_1,\ldots,L_N,\ r_1,\ldots,r_{N-1}) \qquad [15] \]

just described, with $r_a, \mu_a > 0$. These results by
Weinstein contain the no-hair theorem of Carter
and Robinson as a special case. Weinstein also
shows that, for every $N \ge 2$ and for every set of
parameters [15] with $\mu_a, r_a > 0$, there exists a
solution of the problem at hand. It is known that
for some sets of parameters [15] the solutions will
have “strut singularities” between some pairs of
neighboring black holes, but the existence of the
“struts” for all sets of parameters as above is not
known, and is one of the main open problems in our
understanding of stationary-axisymmetric electro-vacuum
black holes. The existence and uniqueness
results of Weinstein remain valid when strut
singularities are allowed in the metric at the outset,
although such solutions do not fall into the category
of regular black holes discussed here.
Introduction


An oscillatory integral is an integral of the form

\[ I(\omega) = \int e^{i\omega\varphi(\theta)}\, a(\theta)\, d\theta \qquad [1] \]

Here the integration is over a smooth $k$-dimensional
manifold $\Theta$ which is provided with a smooth density
$d\theta$. The real variable $\omega$ plays the role of a frequency
variable, whereas the real-valued smooth function $\varphi$ on
$\Theta$ is called the phase function. The amplitude function $a$
is assumed to be a compactly supported complex
(vector-) valued smooth function on $\Theta$. The topic of
this article is the asymptotic behavior of the oscillatory
integral $I(\omega)$ as the frequency $\omega$ tends to infinity.
When the manifold $\Theta$ is not compact and the
amplitude function is not compactly supported, then a
smooth cutoff function may be used to write the
integral as the sum of an integral with a compactly
supported amplitude and one with an amplitude which
is equal to zero in a large compact subset of $\Theta$. The
latter integral can be studied if suitable assumptions
are made about the asymptotic behavior of the phase
function and the amplitude at infinity, but this is not
the subject of this article. The use of the exponential
function with purely imaginary argument instead of
the sine and the cosine is just a matter of convenience.

The first observation about oscillatory integrals in
the next section is the principle of stationary phase,
which states that the contributions to the integral
which are not rapidly decreasing as $\omega \to \infty$ only
come from the stationary points of $\varphi$, the points $\theta \in \Theta$
where the total derivative $d\varphi(\theta)$ of $\varphi$ is equal to zero.
This principle is closely related to the observation that
a superposition of waves is maximal at points where
the waves are in phase, an observation which goes
back to Huygens (1690).

Assume that $\theta_0$ is a nondegenerate stationary point of
$\varphi$. That is, $d\varphi(\theta_0) = 0$ and the Hessian $D^2\varphi(\theta_0)$ of $\varphi$ at
$\theta_0$ is nondegenerate. Then $\theta_0$ is an isolated stationary
point of $\varphi$, and the contribution to $I(\omega)$ of a neighborhood
of $\theta_0$ has an asymptotic expansion of the form

\[ I(\omega) \sim e^{i\omega\varphi(\theta_0)} \sum_{r=0}^{\infty} c_r\, \omega^{-k/2 - r}, \qquad \omega \to \infty \]

Here the leading coefficient $c_0$ is the product of $a(\theta_0)$
with a nonzero constant which only depends on
$D^2\varphi(\theta_0)$ and the density $d\theta$ at $\theta_0$. For increasing $r$ the
coefficients $c_r$ depend on the derivatives of $\varphi$ and $a$
at $\theta_0$ of increasing order (see the section “The
method of stationary phase”).
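
The leading term of this expansion can be compared with a direct numerical evaluation in the simplest case $k = 1$. The sketch below makes several assumptions for illustration only: the phase is $\varphi(\theta) = \theta^2/2$ with stationary point $\theta_0 = 0$, the amplitude is the Gaussian $a(\theta) = e^{-\theta^2}$ (rapidly decaying rather than compactly supported), and the integral is truncated to $[-8, 8]$. For this phase the leading coefficient is $c_0 = a(0)\sqrt{2\pi}\,e^{i\pi/4}$.

```python
import cmath
import math

def oscillatory_integral(omega: float, half_width: float = 8.0, n: int = 200_000) -> complex:
    """Trapezoidal approximation of I(omega) = int exp(i*omega*theta^2/2) * exp(-theta^2) dtheta."""
    h = 2.0 * half_width / n
    total = 0j
    for j in range(n + 1):
        theta = -half_width + j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * cmath.exp((0.5j * omega - 1.0) * theta * theta)
    return total * h

def leading_term(omega: float) -> complex:
    """c_0 * omega^(-1/2) with c_0 = a(0) * sqrt(2*pi) * exp(i*pi/4), for k = 1 and unit Hessian."""
    return math.sqrt(2.0 * math.pi / omega) * cmath.exp(0.25j * math.pi)

omega = 50.0
numeric = oscillatory_integral(omega)
approx = leading_term(omega)
rel_err = abs(numeric - approx) / abs(numeric)
print(f"numeric = {numeric:.6f}, leading term = {approx:.6f}, relative error = {rel_err:.4f}")
```

For this Gaussian amplitude the integral is known in closed form, $\sqrt{\pi/(1 - i\omega/2)}$, so the relative error of the leading term is of order $1/\omega$, in line with the next term of the expansion being smaller by one power of $\omega$.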

Usually, even if all the objects are analytic in a
neighborhood of $\theta_0$, the asymptotic power series
does not converge. However, there are exceptional
cases where the stationary phase approximation is
exact. Assume, for instance, that $\Theta$ is a compact
manifold provided with a symplectic form $\sigma$, $\varphi$ is the
Hamiltonian function of a Hamiltonian circle action
on $\Theta$ with isolated fixed points, and $a(\theta)\,d\theta$ is the Liouville density $\sigma^{k/2}/(k/2)!$.
Then the stationary points of $\varphi$ are the fixed points
of the circle action, each stationary point of $\varphi$ is
nondegenerate and $I(\omega)$ is equal to the sum over the
finitely many stationary points of only the leading
terms of the asymptotic expansions at the stationary
points. This Duistermaat-Heckman formula is a
consequence of a more general localization formula
in equivariant cohomology (see the section “Exact
stationary phase”).

For the purpose of applications, but also in the
analysis of oscillatory integrals, it is worthwhile to
allow complex-valued phase functions, but with a
local minimum for the imaginary part at the
stationary point $\theta_0$ of the real part. That is, the
real part of the exponent $i\omega\varphi(\theta)$ has a local
maximum at $\theta_0$. An extreme case occurs when
$\varphi(\theta) = i\psi(\theta)$ for a real-valued function $\psi$ which has a
nondegenerate local minimum at $\theta_0$, in which case
the integrand is a sharply peaking Gaussian density
at $\theta_0$. When $\varphi$ and $a$ are analytic near $\theta_0$, then the
method of steepest descent consists of deforming the
path of integration in the complex domain in such a
way that the integrand becomes such a sharply
peaking Gaussian density. During the deformation,
the integral does not change because of Cauchy’s
integral theorem.

An important extension of the theory occurs if the
real-valued phase function and the amplitude are
allowed to depend smoothly on additional parameters
$x$, which vary in an $n$-dimensional smooth
manifold $M$. The amplitude is also allowed to
depend on $\omega$, with an asymptotic expansion of
the form

\[ a(x,\theta,\omega) \sim \sum_{r=0}^{\infty} a_r(x,\theta)\, \omega^{m-r} \quad \text{as } \omega \to \infty \qquad [2] \]

The expansion is supposed to be locally uniform in
$(x,\theta)$ and to allow termwise differentiations of any
order with respect to the variables $(x,\theta)$. Then the
integral

\[ I(x,\omega) = \int_\Theta e^{i\omega\varphi(x,\theta)}\, a(x,\theta,\omega)\, d\theta \]

is called an oscillatory integral of order $m$. Here the
function $x \mapsto I(x,\omega)$ is viewed as a continuous
superposition of the $\theta$-dependent family of oscillatory
functions $x \mapsto e^{i\omega\varphi(x,\theta)}\, a(x,\theta,\omega)$.

The example which formed the point of departure
of Airy (1838) is that e^{iωφ(x,θ)} a(x,θ,ω) is the wave
which arrives at the point x in spacetime and which is
sent out by a point θ on a reflecting mirror. That is,
at x one collects (= integrates over Θ) all the waves
sent out by the various points θ of the mirror Θ. The
main point of the theory, however, is that in great
generality the solutions of linear partial differential
equations, such as classical wave equations or
quantum mechanical Schrödinger equations, can be
represented, as functions of x, as oscillatory
integrals. This construction has led to decisive progress
in the general theory of linear partial differential
equations with smoothly varying coefficients.

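According to the principle of stationary phase, the
main asymptotic contributions to the integral come
from the points θ such that ∂φ(x,θ)/∂θ = 0. The
phase function φ ∈ C^∞(M × Θ) is called nondegenerate
if the (n + k) × k-matrix

\[ \frac{\partial^2 \varphi(x,\theta)}{\partial(x,\theta)\,\partial\theta} \ \text{ has rank } k \text{ when } \ \frac{\partial\varphi(x,\theta)}{\partial\theta} = 0. \tag{3} \]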




This is the natural condition to ensure that the set

\[ S_\varphi := \{ (x,\theta) \in M \times \Theta \mid \partial\varphi(x,\theta)/\partial\theta = 0 \} \]

is a smooth n-dimensional submanifold of M × Θ.
The condition [3], moreover, implies that the mapping

\[ \iota_\varphi : S_\varphi \ni (x,\theta) \mapsto \left( x, \frac{\partial\varphi(x,\theta)}{\partial x} \right) \in T^*M \]

is a smooth immersion from S_φ into the cotangent
bundle T*M of M. Note that ξ = ∂φ(x,θ)/∂x is
coordinate invariantly defined as a linear form on
the tangent space T_xM of M at the point x. That
is, ξ ∈ (T_xM)* = the dual space of T_xM, and (T_xM)*
is the fiber of T*M over x. In classical mechanics,
T*M is the phase space of the position space M,
and a linear form ξ on T_xM is called a momentum
vector at the position x. If σ denotes the canonical
symplectic form on T*M, then ι_φ*σ = 0. The
immersion ι_φ locally embeds S_φ onto a smooth
n-dimensional submanifold Λ_φ of T*M, which is a
Lagrangian manifold in T*M, which by definition
means that the restriction of σ to Λ_φ is equal to zero.

Oscillatory integrals with very different phase
functions and amplitudes can define the same
ω-dependent functions on M. The theory of
Hörmander (1971, section 3.1) says that the germs
of the Lagrangian manifolds Λ_φ and Λ_ψ are the same
if and only if φ and ψ define the same class of
oscillatory integrals. Moreover, every Lagrangian
submanifold Λ of T*M is locally of the form Λ_φ
for some nondegenerate phase function φ. In this
way, the mapping φ ↦ Λ_φ defines a bijection
between the set of equivalence classes of germs of
nondegenerate phase functions and the set of germs
of Lagrangian submanifolds of T*M. Let Λ be an
immersed Lagrangian submanifold of T*M. A
global oscillatory integral of order m on M, defined
by Λ, is a locally finite sum u(x,ω) of oscillatory
integrals of order m with nondegenerate phase
functions φ such that Λ_φ ⊂ Λ. The leading terms of
the amplitudes correspond to a section s of a
canonically defined complex line bundle 𝕃 over Λ,
which is called the principal symbol of u (see the
section “The principal symbol on the Lagrangian
manifold”).

If P is a linear partial differential operator, such as
the wave operators, in which the coefficients may
depend in a smooth way on x and in a polynomial
way on ω, then the condition that Pu is asymptotically
small implies that p = 0 on Λ, in which p is a
smooth function on T*M, called the principal
symbol of P. Because Λ is a Lagrangian manifold,
the equation p = 0 implies that Λ is invariant under
the flow of the Hamiltonian system with Hamilton
function equal to p. Furthermore, the principal
symbol s of u satisfies a homogeneous first-order
ordinary differential equation along the solution
curves of the Hamiltonian system. Conversely, these
properties can be used to construct global oscillatory
integrals u which asymptotically satisfy Pu = 0 and
have prescribed initial values. This theory, due to
Maslov (1972), may be viewed as a far-reaching
generalization of the WKB method.

Let π: T*M → M: (x,ξ) ↦ x denote the canonical
projection from T*M onto M. The projections into
M of the solution curves in a Lagrangian submanifold
Λ of T*M, of a Hamiltonian system which
leaves Λ invariant, are the ray bundles of geometrical
optics. If Λ is not transversal to the fiber of
T*M at (x,ξ), then the ray bundle exhibits a caustic
at the point x ∈ M, and the oscillatory integral is
asymptotically of larger order than ω^m near x.
Applying the theory of unfoldings of singularities
to the phase function, one can determine the
structurally stable caustics and obtain normal
forms of the oscillatory integrals in the structurally
stable cases (see the section “Caustics”).

If we also integrate over the frequency variable ω,
then we obtain the Fourier integral distributions u of
Hörmander (1971, sections 1.2 and 3.2). In this case
the corresponding Lagrangian manifold is conic in
the sense that if (x,ξ) ∈ Λ, then (x,τξ) ∈ Λ for every
τ > 0. The wave front set of u, which is the
microlocal singular locus of the distribution u, is
contained in Λ, with equality if the principal symbol
of u is not equal to zero at the corresponding
stationary points of the phase function. Fourier
integral operators are defined as the linear operators
acting on distributions, of which the distribution
kernels are Fourier integral distributions. Under a
suitable transversality condition for the Lagrangian
manifolds of the distribution kernels, the composition
of two Fourier integral operators is again a
Fourier integral operator, and the principal symbol
of the composition is a product of the principal
symbols. The proof is an application of the method
of stationary phase. Fourier integral operators are a
very powerful tool in the analysis of linear partial
differential operators with smoothly varying
coefficients (see Hörmander (1985)).


The Principle of Stationary Phase 


The principle of stationary phase says that if the
phase function φ has no stationary points in
the support of the amplitude function a, then the
oscillatory integral [1] is rapidly decreasing, in the
sense that for every N we have I(ω) = O(ω^{−N}) as
ω → ∞. For the proof, one introduces a vector field v
on Θ such that vφ = 1 on a neighborhood of the
support of a. Then e^{iωφ} = (iω)^{−1} v(e^{iωφ}), and an
integration by parts in [1] yields that

\[ I(\omega) = \frac{1}{i\omega} \int e^{i\omega\varphi(\theta)}\, ({}^t v\, a)(\theta)\, d\theta \]

where ᵗv denotes the transpose of the linear partial
differential operator v. Iterating this, the rapid
decrease of I(ω) follows.
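
The rapid decrease can be observed numerically. The following is only a minimal sketch under assumptions that are not part of the text: Θ = [−1, 1], the phase φ(θ) = θ (which has no stationary points), and a standard bump function as amplitude. The names `bump`, `linear_phase` and `oscillatory_integral` are hypothetical helpers chosen for this illustration.

```python
import cmath
import math


def bump(theta):
    """Smooth amplitude supported in (-1, 1) (an illustrative choice)."""
    if abs(theta) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - theta * theta))


def linear_phase(theta):
    """Phase without stationary points: its derivative is 1 everywhere."""
    return theta


def oscillatory_integral(omega, phase, amplitude, n=20000):
    """Equispaced sum for the integral of exp(i*omega*phase)*amplitude on [-1, 1].

    The amplitude vanishes to all orders at the endpoints, so this
    plain sum is very accurate for the frequencies used here.
    """
    h = 2.0 / n
    total = 0j
    for j in range(n + 1):
        theta = -1.0 + j * h
        total += cmath.exp(1j * omega * phase(theta)) * amplitude(theta)
    return total * h


for omega in (10.0, 50.0, 200.0):
    value = abs(oscillatory_integral(omega, linear_phase, bump))
    print(f"omega = {omega:6.1f}   |I(omega)| = {value:.3e}")
```

The printed absolute values shrink much faster than any fixed power of 1/ω would suggest, in line with the integration-by-parts argument.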

Using cutoff functions, I(ω) is, modulo a rapidly
decreasing function, equal to an oscillatory integral
with phase function φ and an amplitude which
has support in an arbitrarily small neighborhood of
the set of stationary points of φ. In this sense,
the contributions to the integral which are not
rapidly decreasing come only from the stationary
points of φ.


The Method of Stationary Phase 


Assume that θ₀ is a nondegenerate stationary point
of φ. Then θ₀ is an isolated stationary point of φ.
Using local coordinates near θ₀, the contribution to
[1] from the neighborhood of θ₀ can be written as an
oscillatory integral with Θ = R^k and a phase function
φ which has a nondegenerate stationary point at 0.
Write Q = D²φ(0). According to the Morse lemma,
there is a smooth substitution of variables θ = T(y)
such that T(0) = 0, DT(0) = I, and φ(T(y)) = φ(0) +
⟨Qy, y⟩/2 for all y in a neighborhood of 0 in R^k.
Applying this substitution of variables to [1] we
obtain

\[ I(\omega) = e^{i\omega\varphi(0)} \int_{\mathbf{R}^k} e^{i\omega\langle Qy,y\rangle/2}\, b(y)\, dy \]

where b is a compactly supported smooth function
on R^k with b(0) = a(0). Now the Fourier transform
of the function y ↦ e^{iω⟨Qy,y⟩/2} is equal to the function

\[ \eta \mapsto \left( \det\left( \frac{\omega}{2\pi i}\, Q \right) \right)^{-1/2} e^{-i\langle Q^{-1}\eta,\eta\rangle/(2\omega)} \tag{4} \]

Both in the definition of the square root of the
determinant and in the proof one uses the analytic
continuation to the domain of complex-valued
symmetric bilinear forms Q for which the imaginary
part of Q is positive definite. For purely imaginary
Q we have the familiar formula for the Fourier
transform of a Gaussian density (see Hörmander
(1990, theorem 7.6.1)). The Taylor expansion of the
exponential factor in [4] then yields that

\[ I(\omega) \sim e^{i\omega\varphi(0)} \left( \det\left( \frac{\omega}{2\pi i}\, Q \right) \right)^{-1/2} \sum_{r=0}^{\infty} \frac{1}{r!} \left( \frac{i}{2\omega} \left\langle Q^{-1} \frac{\partial}{\partial y}, \frac{\partial}{\partial y} \right\rangle \right)^{r} b(0) \]

as ω → ∞ (see Hörmander (1990, lemma 7.7.3)).

It is important for the applications that, if the 
phase function and amplitude depend smoothly on 
parameters, all the constructions can be made to 
depend smoothly on the parameters. 
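
As a numerical sanity check of the expansion, consider the following sketch. It rests on assumptions that are not in the text: k = 1, the exactly quadratic phase φ(θ) = θ²/2 (so Q = 1 and the Morse substitution is the identity, b = a), and a bump amplitude a(θ) = exp(−1/(1 − θ²)) with a(0) = 1/e and a''(0) = −2/e. The function names are hypothetical.

```python
import cmath
import math


def bump(theta):
    """Amplitude a(theta) = exp(-1/(1 - theta^2)) on (-1, 1), zero elsewhere."""
    if abs(theta) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - theta * theta))


def integral(omega, n=40000):
    """Equispaced sum for I(omega) = int exp(i*omega*theta^2/2) a(theta) dtheta."""
    h = 2.0 / n
    total = 0j
    for j in range(n + 1):
        theta = -1.0 + j * h
        total += cmath.exp(0.5j * omega * theta * theta) * bump(theta)
    return total * h


def stationary_phase(omega, terms=1):
    """First `terms` (1 or 2) terms of the stationary phase expansion at 0."""
    # Q = 1, hence (det(omega*Q/(2*pi*i)))^(-1/2) = sqrt(2*pi*i/omega).
    prefactor = cmath.sqrt(2j * math.pi / omega)
    a0 = math.exp(-1.0)           # a(0)
    a2 = -2.0 * math.exp(-1.0)    # a''(0), since a = exp(-1 - theta^2 - ...)
    series = a0
    if terms >= 2:
        series += (1j / (2.0 * omega)) * a2
    return prefactor * series


omega = 400.0
exact = integral(omega)
for terms in (1, 2):
    error = abs(exact - stationary_phase(omega, terms)) / abs(exact)
    print(f"{terms} term(s): relative error {error:.2e}")
```

Including the r = 1 term should reduce the relative error from order 1/ω to order 1/ω², which is the behaviour the asymptotic series predicts.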


Exact Stationary Phase 


Suppose that we are given an action of a Lie group
G on the manifold Θ. Let 𝔤 denote the Lie algebra of
G. For any g ∈ G and X ∈ 𝔤 the corresponding
diffeomorphism of Θ and vector field on Θ are
denoted by g_Θ and X_Θ, respectively. If Ω(Θ) denotes
the algebra of smooth differential forms on Θ, then
we consider the algebra S𝔤* ⊗ Ω(Θ) of all Ω(Θ)-valued
polynomials on 𝔤, where S𝔤* denotes the
algebra of all polynomial functions on 𝔤. On S𝔤* ⊗
Ω(Θ) we have the action of g ∈ G which sends α to
X ↦ g_Θ*(α(Ad g X)). Let A = (S𝔤* ⊗ Ω(Θ))^G denote
the subalgebra of all G-invariant elements of S𝔤* ⊗
Ω(Θ). The equivariant exterior derivative D is
defined by

\[ (D\alpha)(X) = d(\alpha(X)) - i_{X_\Theta}(\alpha(X)) \]

If α is homogeneous as a differential form of degree p
and homogeneous as a polynomial on 𝔤 of degree q,
then r = p + 2q is called the total degree of α. Let A^r
denote the space of sums of such α ∈ A of total degree r.
Then D_r = D: A^r → A^{r+1} and D_r ∘ D_{r−1} = 0. The space
H^r_G(Θ) := ker D_r / Im D_{r−1} is called the equivariant
cohomology in degree r, in the model of Cartan (1950).

Assume that Θ is compact and oriented, and that
the action of G preserves the orientation. If α ∈ A,
then we denote by α(X)^{[k]} the volume part of the
differential form α(X), and

\[ \left( \int \alpha \right)(X) = \int_{\Theta} \alpha(X)^{[k]}, \qquad X \in \mathfrak{g} \]

defines an Ad G-invariant function ∫α on 𝔤. Now
α = Dβ implies that α(X)^{[k]} is equal to the exterior
derivative of β(X)^{[k−1]}, and therefore ∫α = 0, in view
of Stokes’ theorem. It follows that integration over Θ
yields a linear mapping ∫ from H_G(Θ) to (S𝔤*)^{Ad G},
which is called integration in equivariant cohomology.

Now assume that also the Lie group G is
compact, and let X ∈ 𝔤. Then the zero-set Z_X of
X_Θ in Θ has finitely many connected components F,
each of which is a smooth and compact submanifold
of Θ. In general, the F’s can have different
dimensions. The linearization LX of the vector
field X_Θ along F acts linearly on the normal bundle
NF of F. If Ω is the curvature form of NF, then

\[ \varepsilon(X) := \det\nolimits_{\mathbf{C}} \left( \frac{1}{2\pi i} (LX - \Omega) \right) \]

is called the equivariant Euler form of NF. ε(X) is an
invertible element in the algebra Ω^{even}(F). The
localization formula of Berline–Vergne (1982) and
Atiyah–Bott (1984) now says that if Dα = 0 then

\[ \left( \int \alpha \right)(X) = \sum_{F} \int_{F} \frac{\alpha(X)}{\varepsilon(X)} \]

Assume that σ is a symplectic form on Θ, which
implies that k = 2l is even. Furthermore, assume that
the infinitesimal action of 𝔤 on Θ is Hamiltonian,
which means that there exists a G-equivariant
smooth mapping μ: Θ → 𝔤*, called the momentum
mapping, such that i_{X_Θ}σ = −d(μ(X)) for every X ∈ 𝔤.
Here μ is viewed as an element of (𝔤* ⊗ Ω⁰(Θ))^G ⊂ A.
Then σ̂(X) := σ − μ(X) defines an element σ̂ ∈ A such
that Dσ̂ = 0. In turn, this implies that the form

\[ \beta(X) = e^{-i\omega\hat\sigma(X)} = e^{i\omega\mu(X)} \sum_{r=0}^{l} \frac{(-i\omega\sigma)^r}{r!} \]

is equivariantly closed, and the localization formula
of equivariant cohomology applied to this case
yields the Duistermaat–Heckman (1982, 1983)
formula. Because β(X)^{[k]} = e^{iωμ(X)} (−iωσ)^l / l!, its
integral over Θ is an oscillatory integral with phase
function μ(X). The stationary points of μ(X) are the
zeros of X_Θ and the stationary points of μ(X) are
nondegenerate if and only if the zeros of X_Θ are
isolated. It follows that in this case the oscillatory
integral is equal to the leading term in the
stationary-phase approximation.
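
A standard illustration, which is not taken from the text and is offered only as a sketch, is the unit sphere S² with its area form as symplectic form and the rotation about the vertical axis as circle action. The Hamiltonian function is then the height z, with fixed points at the two poles, and the integral of e^{iωz} over the sphere equals 4π sin(ω)/ω. The code below compares a direct numerical integration with the sum of the two leading stationary phase terms; the function names are hypothetical.

```python
import cmath
import math


def sphere_integral(omega, n=20000):
    """Midpoint rule for the integral of exp(i*omega*z) over the unit sphere.

    In spherical coordinates z = cos(t) and dA = sin(t) dt dphi, so the
    azimuthal integration contributes a factor 2*pi.
    """
    h = math.pi / n
    total = 0j
    for j in range(n):
        t = (j + 0.5) * h
        total += cmath.exp(1j * omega * math.cos(t)) * math.sin(t)
    return 2.0 * math.pi * total * h


def leading_terms(omega):
    """Sum of the leading stationary phase terms at the two poles.

    Each term is exp(i*omega*z0) * (2*pi/omega) * exp(i*pi*sgn(Q)/4),
    with |det Q| = 1 and amplitude 1 at both poles.
    """
    # North pole: z0 = 1, Hessian of z is minus the identity (signature -2).
    north = cmath.exp(1j * omega) * (2.0 * math.pi / omega) * cmath.exp(-0.5j * math.pi)
    # South pole: z0 = -1, Hessian of z is the identity (signature +2).
    south = cmath.exp(-1j * omega) * (2.0 * math.pi / omega) * cmath.exp(0.5j * math.pi)
    return north + south


for omega in (1.0, 5.0, 20.0):
    print(omega, sphere_integral(omega), leading_terms(omega))
```

In this example the two leading terms add up to 4π sin(ω)/ω for every ω, not only asymptotically, which is the exactness phenomenon described above.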


The Principal Symbol on the 
Lagrangian Manifold 


Let u(x,ω) be a global oscillatory integral of order m
defined by Λ, and let (x₀,ξ₀) ∈ Λ. One way to define
the principal symbol of u at (x₀,ξ₀) ∈ Λ is to test u
with an oscillatory function of the form e^{−iωψ(x)} b(x),
in which dψ(x₀) = ξ₀, the support of b is contained
in a small neighborhood of x₀, and b(x₀) = 1. If u is
locally represented by the phase function φ and
amplitude a, and (x₀,ξ₀) = ι_φ(x₀,θ₀), then the phase
function φ(x,θ) − ψ(x) in the oscillatory integral

\[ \langle u, e^{-i\omega\psi} b \rangle = \int_M \int_\Theta e^{i\omega(\varphi(x,\theta) - \psi(x))}\, a(x,\theta,\omega)\, b(x)\, d\theta\, dx \]

has a stationary point at (x₀,θ₀), which means that
Λ and dψ intersect at (x₀,ξ₀). Here the 1-form dψ on
M, which is a section of π: T*M → M, is viewed as a
submanifold of T*M. Locally the Lagrangian
submanifolds of T*M which are transversal to the fibers
of π: T*M → M are precisely the manifolds of the
form dψ. The stationary point of φ − ψ is
nondegenerate if and only if L := T_{(x₀,ξ₀)}Λ and
L_ψ := T_{(x₀,ξ₀)}(dψ) are transversal. In this case, the
method of stationary phase can be applied in order
to obtain an asymptotic expansion in terms of
powers of ω. The coefficient of the leading term of
order ω^m depends only on the Lagrangian plane L_ψ,
which is transversal to both L and the tangent space
of the fiber of T*M, and not on the other data of ψ
and b. If ℒ denotes the set of all Lagrangian planes
in T_{(x₀,ξ₀)}(T*M) which are transversal to both L and
the fiber, then the complex-valued functions on ℒ
which arise in this way form a one-dimensional
complex vector space 𝕃_{(x₀,ξ₀)}. The 𝕃_{(x₀,ξ₀)} for
(x₀,ξ₀) ∈ Λ form a complex line bundle 𝕃 over Λ
which is canonically isomorphic to the tensor
product of the line bundle of half-densities and the
Maslov line bundle, a line bundle with structure
group Z/4Z (see Duistermaat (1974, section 1.2)).
In this way, the principal symbol s of u can be
viewed as a section of the line bundle 𝕃 over Λ.


Caustics 


Let (x₀,ξ₀) be a point in the Lagrangian submanifold
Λ of T*M. The restriction to Λ of the projection
π: T*M → M is a diffeomorphism from an open
neighborhood of (x₀,ξ₀) in Λ onto an open
neighborhood of x₀ in M, if and only if Λ is transversal
to the fiber of T*M at (x₀,ξ₀). If Λ = Λ_φ for a
nondegenerate phase function φ, (x₀,θ₀) ∈ S_φ and
(x₀,ξ₀) = ι_φ(x₀,θ₀), then this condition is in turn
equivalent to the condition that θ₀ is a nondegenerate
stationary point of θ ↦ φ(x₀,θ). An application of the
method of stationary phase shows that in this case the
oscillatory integral is equal to a progressing wave of
the form e^{iωψ(x)} b(x,ω). Here ψ(x) = φ(x,θ(x)), where
θ(x) is the stationary point of θ ↦ φ(x,θ), and b(x,ω)
has an asymptotic expansion as in [2] with k = 0.

If θ₀ is a degenerate stationary point of
θ ↦ φ(x₀,θ) and a₀(x₀,θ₀) ≠ 0, then the oscillatory
integral is not of order O(ω^m). That is, it is of larger
order than at points where we have a nondegenerate
stationary point. For this reason, the points (x₀,ξ₀)
at which Λ is not transversal to the fibers of
π: T*M → M are called the caustic points of Λ.
Their projections x₀ ∈ M form the caustic set in M.

In the theory of unfoldings of singularities, the
germs of the families of functions x ↦ (θ ↦ φ(x,θ))
and y ↦ (μ ↦ ψ(y,μ)) are called equivalent if there
exists a germ of a diffeomorphism of the form
H: (x,θ) ↦ (y(x), μ(x,θ)) and a smooth function χ(x)
such that ψ(y(x), μ(x,θ)) = φ(x,θ) + χ(x). If J(y,ω) is
an oscillatory integral with phase function ψ,
integration variable μ and parameter y, then the
substitution of variables μ = μ(x,θ) in the integral,
followed by the substitution of variables y = y(x) in
the parameters, yields that J(y,ω) = e^{iωχ(x)} I(x,ω), in
which I(x,ω) is an oscillatory integral with phase
function φ and an amplitude function of the same
order as the amplitude function of J. The germ φ is
called stable if every nearby germ ψ is equivalent to
φ. The Morse lemma with parameters implies that
this is the case if θ ↦ φ(x₀,θ) has a nondegenerate
stationary point at θ₀. However, the theory of
unfoldings of singularities of Thom and Mather
shows that there are many stable germs with
degenerate critical points. Moreover, in dimension
n ≤ 5 the generic germ is stable, and is equivalent to
a germ in a finite list of normal forms.

The simplest example of a normal form with
degenerate critical points is φ(x,θ) = θ³ + x₁θ. Here
we have taken k = 1, but still allowed an arbitrary
dimension n ≥ 1 of M. In this normal form, the
stationary points correspond to 3θ² + x₁ = 0, which
is a manifold which over the x-space folds over at
x₁ = 0. The stationary point is degenerate if and only
if θ = 0, hence x₁ = 0, which means that x₁ = 0 is
the caustic set. If the amplitude is equal to 1, then
the oscillatory integral is equal to
2π (3ω)^{−1/3} Ai(3^{−1/3} ω^{2/3} x₁),
in which Ai(z) denotes the Airy function. If the
amplitude is nonzero at a degenerate critical point,
then the oscillatory integral near the corresponding
caustic point is asymptotically of the same order as
the amplitude times ω^{−1/3} Ai(3^{−1/3} ω^{2/3} x₁), which
implies that the oscillatory integral is a factor ω^{1/6}
larger at these caustic points than at the points away
from the caustic set. In Airy
(1838), where the Airy function was introduced, Airy
considered light in a neighborhood of a caustic as an
oscillatory integral. Then, under suitable genericity
conditions, he brought the phase function into the
normal form θ³ + x₁θ. Even for stable normal forms
in low dimensions, the interference patterns near the
caustic points can be very intricate (see, e.g., Berry
et al. (1979)). A survey of the application of the
theory of unfoldings to caustics in oscillatory
integrals can be found in Duistermaat (1974).
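
The relation between the normal form θ³ + x₁θ and the Airy function can be checked numerically. The sketch below makes several choices that are not in the text: the amplitude is taken equal to 1, the integral over the real line is evaluated by deforming the path of integration onto the two rays arg θ = π/6 and arg θ = 5π/6 (a steepest descent deformation, on which the cubic term of the exponent becomes −ωs³), and Ai is computed from its Maclaurin series. All function names are hypothetical.

```python
import cmath
import math


def airy_ai(z, terms=60):
    """Maclaurin series of the Airy function Ai (adequate for moderate |z|)."""
    c1 = 1.0 / (3.0 ** (2.0 / 3.0) * math.gamma(2.0 / 3.0))   # Ai(0)
    c2 = 1.0 / (3.0 ** (1.0 / 3.0) * math.gamma(1.0 / 3.0))   # -Ai'(0)
    f_term, g_term = 1.0, z
    f_sum, g_sum = f_term, g_term
    for k in range(terms):
        f_term *= z ** 3 / ((3 * k + 2) * (3 * k + 3))
        g_term *= z ** 3 / ((3 * k + 3) * (3 * k + 4))
        f_sum += f_term
        g_sum += g_term
    return c1 * f_sum - c2 * g_sum


def fold_integral(omega, x1, n=4000):
    """Integral of exp(i*omega*(theta^3 + x1*theta)) over the real line,
    computed with Simpson's rule on the two rotated rays."""
    cutoff = 4.0 * omega ** (-1.0 / 3.0) + 2.0 * math.sqrt(abs(x1))
    h = cutoff / n
    right = cmath.exp(1j * math.pi / 6.0)    # outgoing ray: theta = s * right
    left = cmath.exp(5j * math.pi / 6.0)     # incoming ray: theta = s * left

    def integrand(s):
        decay = -omega * s ** 3              # value of i*omega*theta^3 on both rays
        on_right = right * cmath.exp(decay + 1j * omega * x1 * s * right)
        on_left = left * cmath.exp(decay + 1j * omega * x1 * s * left)
        return on_right - on_left            # the left ray is traversed inwards

    total = integrand(0.0) + integrand(cutoff)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * integrand(j * h)
    return total * h / 3.0


def airy_formula(omega, x1):
    """2*pi*(3*omega)^(-1/3) * Ai(3^(-1/3) * omega^(2/3) * x1)."""
    scale = (3.0 * omega) ** (-1.0 / 3.0)
    return 2.0 * math.pi * scale * airy_ai(scale * omega * x1)


for omega, x1 in ((10.0, -0.5), (10.0, 0.0), (10.0, 0.5)):
    print(omega, x1, fold_integral(omega, x1).real, airy_formula(omega, x1))
```

At x₁ = 0 the value decays like ω^{−1/3}, whereas a nondegenerate stationary point with the same amplitude would contribute at order ω^{−1/2}; the ratio is the factor ω^{1/6} mentioned above.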


See also: Equivariant Cohomology and the Cartan 
Model; Feynman Path Integrals; Functional Integration in 
Quantum Physics; Hamiltonian Group Actions; 
h-Pseudodifferential Operators and Applications; 
Multiscale Approaches; Normal Forms and Semiclassical 
Approximation; Optical Caustics; Path Integrals in 
Noncommutative Geometry; Perturbation Theory and its 
Techniques; Schrödinger Operators; Singularity and
Bifurcation Theory; Wave Equations and Diffraction.


Statistical Mechanics and Combinatorial Problems


Introduction


Equilibrium statistical mechanics and combinatorial
optimization (viewed here as a branch of
discrete mathematics and theoretical computer
science) have common roots. Phase transitions are
mathematical phenomena which are not limited to
physical systems but are typical of many combinatorial
problems, one famous example being the
percolation transition in random graphs. Similarly,
the understanding of relevant physical problems,
such as three-dimensional lattice statistics or
two-dimensional quantum statistical mechanics problems,
is strictly related to the purely combinatorial
question of solving counting problems
over nonplanar lattices. Most of the tools and
concepts which have made it possible to solve problems
in one field have a natural counterpart in the other.
While the possibility of solving physical models
exactly is always connected to the presence of some
algebraic properties which guarantee integrability, in
the combinatorial approach the emphasis is more on
algorithms that can be applied to problem instances
in which the symmetries behind integrability might
be absent. Also at the level of out-of-equilibrium
phenomena, there exists a deep connection between
physics and combinatorics: just like physical
processes, local algorithms have to deal with an
exponentially large set of possible configurations
and their out-of-equilibrium analysis constitutes a
theory of how problems are actually solved.

Computational complexity theory deals with
classifying problems in terms of the computational
resources, typically time, required for their solution.
What can be measured (or computed) is the time
that a particular algorithm uses to solve the
problem. This time in turn depends on the
implementation of the algorithm as well as on the
computer the program is running on. The theory of
computational complexity provides us with a notion
of complexity that is largely independent of
implementational details and the computer at hand. This
is not surprising, since it is related to a highly
nontrivial question: what do we mean by
saying that a combinatorial problem is solvable?
Problems which can be solved in polynomial time
are considered to be tractable and compose the
so-called polynomial (P) class. The harder problems are
grouped in a larger class called NP, where NP stands
for “nondeterministic polynomial time.” These
problems are such that a potential solution can be
checked in polynomial time, while finding a
solution may require exponential time in the worst
case. In turn, the hardest problems in NP belong to a
subclass called NP-complete: an efficient algorithm
for solving one NP-complete problem could be
easily modified to solve any problem in NP
efficiently. By now, a huge number of NP-complete
problems have been identified, and the lack of such
an algorithm corroborates the widespread conjecture
P ≠ NP, that is, that no such algorithm exists.
However, NP-complete problems are not always
hard: when their resolution complexity is measured
with respect to some underlying probability
distribution of problem instances, NP-complete problems
are often easy to solve on average. To deepen the
understanding of the average-case complexity (and
of the huge variability of running times observed in
numerical experiments), computer scientists,
mathematicians, and physicists have focused their
attention on the study of random instances of hard
combinatorial problems, seeking a link between
the onset of exponential-time complexity and some
intrinsic (i.e., algorithm-independent) properties of
the randomized NP-complete problems. These types
of questions have merged combinatorial optimization
with the statistical physics of disordered systems.

Computational complexity theory can also be
formulated for counting problems: similarly to
optimization problems, equivalence classes can be
defined which separate polynomially solvable
counting problems from the hard ones, the so-called #P
and #P-complete problems. Complexity theory for
counting problems makes the connections with
statistical mechanics even more direct in that
counting solutions is nothing but a computation of
a partition function.

Two simple theorems by Jerrum and Sinclair 
(1989) (which can be easily extended to many 
combinatorial problems) can help in clarifying 
these connections. 

The first theorem tells us that any randomized 
algorithm (e.g., Monte Carlo) for approximating the 
partition function of a generic spin glass model — the 
so-called spin glass problem — could be used to solve 
all the other NP combinatorial problems. The 
second theorem tells us that an algorithm for 
evaluating exactly the partition function of the 
ferromagnetic Ising model over a general graph 
would again solve any other problem in the class #P, 
which, as mentioned above, is the generalization of
the NP class to counting problems and obviously
contains the class NP as a particular case. 

Let us consider the following slightly simplified
definition of the Ising and the spin glass problems.


Problem instance A symmetric matrix J_{ij} with
entries in {−1, 0, 1} and an inverse temperature β.


Output The partition function Z = Σ_σ e^{−βH(σ)},
where H(σ) = −Σ_{ij} J_{ij} σ_i σ_j with σ_i ∈ {−1, 1}.
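
To make the problem statement concrete, here is a minimal brute-force sketch. It enumerates all 2^n spin configurations, so its running time is exponential in n, which is what the hardness results below are about. The function name is hypothetical, and counting each pair {i, j} once in the Hamiltonian is an assumption about the convention; counting both orders would only rescale β.

```python
import itertools
import math


def ising_partition_function(J, beta):
    """Exact Z = sum over sigma of exp(-beta * H(sigma)), by enumeration.

    J is a symmetric n x n matrix (list of lists) with entries in
    {-1, 0, 1}. Assumed convention: each pair {i, j} is counted once,
    H(sigma) = -sum_{i<j} J[i][j] * sigma[i] * sigma[j].
    """
    n = len(J)
    z = 0.0
    for sigma in itertools.product((-1, 1), repeat=n):
        energy = -sum(J[i][j] * sigma[i] * sigma[j]
                      for i in range(n) for j in range(i + 1, n))
        z += math.exp(-beta * energy)
    return z


# Two ferromagnetically coupled spins: Z = 4 * cosh(beta).
print(ising_partition_function([[0, 1], [1, 0]], 0.7), 4 * math.cosh(0.7))
```

A matrix with non-negative entries gives an instance of the Ising problem, while mixed signs give a spin glass instance.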


Moreover, let us define the fully polynomial
randomized approximation scheme (FPRAS) for
counting and decision problems. An FPRAS for a
function f from problem instances to real numbers is
a probabilistic algorithm that, in time polynomial in
the problem size n and in 1/ε, where ε ∈ (0, 1] is the
relative error, outputs with high probability a number
which approximates the value of f within a ratio 1 + ε.
Given the above definitions, the theorems can be
stated as follows:


Theorem 1 There can be no FPRAS for the spin 
glass problem unless P = NP, that is, all problems in 
NP turn out to be solvable in polynomial time. 


Theorem 2 The Ising problem is #P-complete even
when the matrix J_{ij} is non-negative, that is, an
algorithm which outputs in polynomial time the
exact Ising partition function for an arbitrary graph
could be used to solve any other counting problem
in #P.


The above theorems hold for arbitrary graphs, 
in particular for those graph or lattice realizations 
which are particularly hard to analyze, the so-called 
worst cases. There exist no similar proofs of 
computational hardness for more restricted and 
realistic structures, such as, for instance, three- 
dimensional regular lattices for the Ising problem 
or finite connectivity random graphs for spin glasses. 

As a final introductory remark, it is worth
mentioning that the connection between worst-case
complexity and average-case complexity is a
building block of modern cryptography and
communication theory. On the one hand, the so-called
RSA cryptosystem is based on factoring large
integers, a problem which is believed to be hard on
average while it is not known to be so in the worst
case. On the other hand, alternative cryptographic
systems have been proposed which rely on a
worst-case/average-case equivalence (see, e.g., the theorem
of Ajtai (1996) concerning some hidden vector
problems in high-dimensional lattices).

As far as communication theory is concerned,
average-case complexity is indeed crucial: while
Shannon’s theorem (1948) provides a very general
result stating that many optimal codes do exist (in
fact, random codes are optimal), the decoding
problem is in general NP-complete and therefore
potentially intractable. However, since the choice of
the coding scheme is part of the design, what
matters is the average-case behavior of the decoding
algorithm (and its large deviations). Very
efficient codes are known for which the decoding
problem can be solved on average close to Shannon’s
bounds.

In what follows, we will limit the discussion to 
two basic examples of combinatorial and counting 
problems which are representative and central to 
both computer science and statistical physics. 


Constraint Satisfaction Problems 


Combinatorial problems are usually written as
constraint satisfaction problems (CSPs): n discrete
variables are given which have to satisfy m
constraints, all at the same time. Each constraint
can take different forms depending on the problem
under study: famous examples are the
K-satisfiability (K-SAT) problem, in which constraints
are an “OR” function of K variables in the ensemble
(or their negations), and the graph Q-coloring
problem, in which constraints simply enforce the
condition that the endpoints of the edges in the graph
must not have the same color (among the Q possible
ones). Quite generally, a CSP can be written
as the problem of finding a zero-energy ground state
of an appropriate energy function, and its analysis
amounts to performing a zero-temperature statistical
physics study. Hard combinatorial problems are
those which correspond to frustrated physical model
systems.
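
As a small illustration of this energy-function viewpoint, the sketch below treats graph Q-coloring: the energy of a color assignment is the number of violated constraints, and a proper coloring is a zero-energy ground state. The exhaustive search and the function names are illustrative assumptions, not an algorithm proposed in the text.

```python
import itertools


def coloring_energy(edges, colors):
    """Number of violated constraints: edges whose endpoints share a color."""
    return sum(1 for u, v in edges if colors[u] == colors[v])


def ground_state(n_vertices, edges, q):
    """Exhaustive search over all q^n colorings (exponential in n).

    Returns the minimal energy and one coloring that attains it.
    """
    best = min(itertools.product(range(q), repeat=n_vertices),
               key=lambda colors: coloring_energy(edges, colors))
    return coloring_energy(edges, best), best


triangle = [(0, 1), (1, 2), (0, 2)]
print(ground_state(3, triangle, 2))   # two colors: the triangle is frustrated
print(ground_state(3, triangle, 3))   # three colors: a proper coloring exists
```

With two colors the triangle cannot reach zero energy, a simple instance of the frustration mentioned above.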

Given an instance of a CSP, one wants to know
whether there exists a solution, that is, an
assignment of the variables which satisfies all the
constraints (e.g., a proper coloring). When it exists,
the instance is called SAT, and one wants to find
a solution. Most of the interesting CSPs are
NP-complete: in the worst case, the number of
operations needed to decide whether an instance
is SAT or not is expected to grow exponentially
with the number of variables. But recent years
have seen an upsurge of interest in the theory of
typical-case complexity, where one tries to identify
random ensembles of CSPs which are hard to solve,
and the reason for this difficulty. As already
mentioned, random ensembles of CSPs are also
of great theoretical and practical importance in
communication theory, since some of the best
modern error-correcting codes (the so-called
low-density parity check codes) are based on such
constructions.


Satisfiability and Spin Glass Models


The archetypical example of CSP is satisfiability 
(SAT). This is a core problem in computational 
complexity: it is the first one to have been shown 
NP-complete, and since then thousands of problems 
have been shown to be computationally equivalent 
to it. Yet it is not so easy to find difficult instances. 
The main ensemble which has been used for this 
goal is the random K-SAT ensemble (for K > 2, 
K-SAT is NP-complete). 

The SAT problem is defined as follows. Given a
vector of {0,1} Boolean variables x = {x_i}_{i∈I}, where
I = {1,...,n}, consider a SAT formula defined by

\[ F(x) = \bigwedge_{a \in A} C_a(x) \]

where A is an arbitrary finite set (disjoint with I)
labeling the clauses C_a; C_a(x) = ⋁_{i∈I(a)} J_{a,i}(x_i); any
literal J_{a,i}(x_i) is either x_i or ¬x_i (“not x_i”); and
finally, I(a) ⊂ I for every a ∈ A. Similarly to I(a), we
can define the set A(i) ⊂ A as A(i) = {a : i ∈ I(a)}, that
is, the set of clauses containing variable x_i or its
negation.

Given a formula F, the problem of finding a
variable assignment s such that F(s) = 1, if it exists,
can also be written as a spin glass problem
as follows: if we consider a set of n Ising spins
σ_i ∈ {±1} in place of the Boolean variables
(σ_i = −1, 1 ↔ x_i = 0, 1), we may write the energy
function associated to each clause as follows:

\[ E_a = \prod_{r=1}^{K} \frac{1 + J_{a,i_r}\,\sigma_{i_r}}{2} \]

where i_1, ..., i_K are the variables appearing in clause a,
and J_{a,i} = −1 (resp. J_{a,i} = 1) if x_i (resp. ¬x_i) appears
in clause a. The total energy of a configuration,
E = Σ_{a=1}^{|A|} E_a, is nothing but a K-spin spin glass
model.

Random K-SAT is a version of SAT in which each 
clause is taken to involve exactly K distinct 
variables, randomly chosen and negated with uni- 
form distribution. Its energy function corresponds to 
a spin glass system over a finite connectivity 
(diluted) random graph. 

In recent years random K-SAT has attracted much
interest in computer science and in statistical
physics. The interesting limit is the thermodynamic
limit n → ∞, m = |A| → ∞ at fixed clause density
α = m/n.
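
The random K-SAT ensemble and its clause energy can be sketched in a few lines. The representation of a clause as a dictionary from variable index to the sign J, the function names, and the brute-force satisfiability check (exponential in n) are assumptions made for this illustration only.

```python
import itertools
import random


def random_ksat(n, m, k, rng):
    """m random clauses, each a dict {variable index: J} on k distinct variables.

    J = -1 encodes the literal x_i and J = +1 the literal (not x_i),
    following the sign convention of the clause energy.
    """
    return [{i: rng.choice((-1, 1)) for i in rng.sample(range(n), k)}
            for _ in range(m)]


def clause_energy(clause, sigma):
    """Product of (1 + J*sigma_i)/2: equals 1 if the clause is violated, else 0."""
    energy = 1
    for i, j in clause.items():
        energy *= (1 + j * sigma[i]) // 2
    return energy


def total_energy(formula, sigma):
    """Number of violated clauses in the spin configuration sigma."""
    return sum(clause_energy(clause, sigma) for clause in formula)


def is_satisfiable(formula, n):
    """Brute force: look for a zero-energy configuration among all 2^n."""
    return any(total_energy(formula, sigma) == 0
               for sigma in itertools.product((-1, 1), repeat=n))


rng = random.Random(0)
n, trials = 8, 10
for alpha in (2.0, 8.0):
    m = int(alpha * n)
    sat = sum(is_satisfiable(random_ksat(n, m, 3, rng), n) for _ in range(trials))
    print(f"alpha = {alpha}: {sat}/{trials} random 3-SAT instances satisfiable")
```

Such tiny instances only hint at the dependence on the clause density α; the sharp behaviour discussed next concerns the limit n → ∞.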

Its most striking feature is certainly its sharp
threshold. It is strongly believed that there exists a
phase transition for this problem: numerical and
heuristic analytical arguments are in support of the
so-called satisfiability threshold conjecture:


Conjecture There exists α_c(K) such that with high
probability:


• if α < α_c(K), a random instance is satisfiable;
• if α > α_c(K), a random instance is unsatisfiable.


Although this conjecture remains unproven, the
existence of a nonuniform sharp threshold has been
established by Friedgut (1997). A lot of effort has been
devoted to understanding this phase transition. This is
interesting from both the physics and the computer
science points of view, because the random instances
with α close to α_c are the hardest to solve. There exist
rigorous results that give bounds for the threshold
α_c(K): using these bounds, it was shown that α_c(K)
scales as 2^K ln(2) when K → ∞.

On the statistical physics side, the cavity method
(which is the generalization to disordered systems
characterized by ergodicity breaking of the iterative
method used to solve exactly physical models on the
Bethe lattice) is a powerful tool which is claimed to
be able to compute the exact value of the threshold,
giving for instance α_c(3) ≈ 4.2667... It is a
nonrigorous method but the self-consistency of its
results has been checked by a “stability analysis,”
and it has also led to the development of a new
family of algorithms, the so-called “survey
propagation,” which can solve efficiently very large
instances at clause densities which are very close to
the threshold (for technical details see Mézard and
Zecchina (2002) and Braunstein et al. (2005) and
references therein).

The main hypothesis on which the cavity analysis
of random K-SAT relies is the existence, in a region
of clause density $[\alpha_d, \alpha_c]$ close to the threshold, of
an intermediate phase called the “hard-SAT” phase;
see Figure 1. In this phase the set S of solutions
(a subset of the vertices in an n-dimensional
hypercube) is supposed to split into many disconnected
clusters $S = S_1 \cup S_2 \cup \cdots$. If one considers
two solutions X, Y in the same cluster $S_j$, it is
possible to walk from X to Y (staying in S) by
flipping at each step a finite number of variables. If,
on the other hand, X and Y are in different clusters,
in order to walk from X to Y (staying in S), at least
one step will involve an extensive number (i.e., $\propto n$)
of flips. This clustered phase is held responsible for
entrapping many local-search algorithms into nonoptimal
metastable states. This phenomenon is not
exclusive to random K-SAT. It is also predicted to
appear in many other hard SAT and optimization
problems such as “coloring,” and corresponds to the
so-called “one-step replica symmetry breaking”
(1RSB) phase in the language of statistical physics.
It is also a crucial limiting feature for decoding
algorithms in some error-correcting codes.

Figure 1 A pictorial representation of the clustering transition
in random K-SAT.
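The notion of a cluster can be made concrete on a toy instance. The sketch below enumerates all solutions of a small formula and groups them into connected components, linking two solutions when they differ in at most d variables. The clause encoding, the function names, and the choice d = 1 are assumptions made for illustration; for large random instances the clusters of the text cannot be found by enumeration.

```python
from itertools import product

def solutions(n, clauses):
    """All satisfying assignments; a clause is a list of (index, negated) pairs with 0/1 flags."""
    return [x for x in product([0, 1], repeat=n)
            if all(any(x[i] != neg for i, neg in clause) for clause in clauses)]

def clusters(sols, d=1):
    """Connected components of the solution set under moves of Hamming distance <= d."""
    unvisited = set(sols)
    components = []
    while unvisited:
        start = unvisited.pop()
        component, stack = [start], [start]
        while stack:
            x = stack.pop()
            near = [y for y in unvisited
                    if sum(a != b for a, b in zip(x, y)) <= d]
            for y in near:
                unvisited.remove(y)
                component.append(y)
                stack.append(y)
        components.append(component)
    return components

# Hypothetical formula forcing x0 = x1 = x2:
# (not x0 or x1) and (not x1 or x2) and (not x2 or x0)
all_equal = [[(0, 1), (1, 0)], [(1, 1), (2, 0)], [(2, 1), (0, 0)]]
sols = solutions(3, all_equal)
print(sols, len(clusters(sols)))
```

Here the two solutions differ in all three variables, so no sequence of single flips staying inside the solution set connects them, and they form two clusters.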

The only CSP for which the existence of the
clustering phase has been established rigorously is
the polynomial problem of solving random linear
equations in GF(2) (Motwani and Raghavan 2000). For
random K-SAT, rigorous probabilistic bounds can
be used to prove the existence of the clustering
phenomenon, for large enough K, in some region of
$\alpha$ included in the interval $[\alpha_d(K), \alpha_c(K)]$ predicted by
the statistical physics analysis.

In the analysis of CSP like K-SAT, two main 
questions are in order. The first is of algorithmic 
nature and asks for an algorithm which decides 
whether for a given CSP instance all the constraints 
can be simultaneously satisfied or not. The second 
question is more theoretical and deals with large 
random instances, for which one wants to know the 
structure of the solution space and predict the 
typical behavior of classes of algorithms. 


Message-Passing Algorithms from 
Statistical Physics 


The algorithmic contributions of statistical 
mechanics to combinatorial optimization are numer- 
ous and important (a representative example being 
the celebrated “simulated annealing algorithm”). 
For the sake of brevity, here we limit the discussion 
to the so-called “message-passing algorithms” which 
are also of great interest in coding theory. 

The statistical analysis of the cavity equations
allows one to study the average properties of ensembles
of problems and it is totally equivalent to the replica
method in which the average over the ensemble is 
the first step in any calculation. The survey 
propagation (SP) equations are a formulation of 
the cavity equations which is valid for each specific 
instance and is able to provide information about 
the statistical behavior of the individual variables in 
the stable and metastable states of a given energy 
density (i.e., given fraction of violated constraints). 


The single-sample SP equations are nicely described 
in terms of the factor graph representation used in 
information theory to characterize error-correcting 
codes. In the factor graph, the N variables i,j,k,... are 
represented by circular “variable nodes,” whereas the 
M clauses a,b,c,... are represented by square “function
nodes.” For random K-SAT, the function nodes
have connectivity K, while the variable nodes have an
average Poisson connectivity $K\alpha$.

The iterative SP equations are examples of message- 
passing procedures. In message-passing algorithms 
such as the so-called “belief propagation (BP) 
algorithm” used in error-correcting codes and 
statistical inference problems, the unknowns which 
are self-consistently evaluated by iteration are the 
marginals over the solution space of the variables 
characterizing the combinatorial problem (the prob- 
ability space is the set of all solutions sampled with 
uniform measure). According to the physical inter- 
pretation, the quantities that are evaluated by SP are 
the probability distributions of local fields over the set 
of clusters. That is, while BP performs a “white” 
average over solutions, SP takes care of cluster-to- 
cluster fluctuations, telling us which is the probability 
of picking up a cluster at random and finding a given 
variable biased in a certain direction (or unfrozen if it 
is paramagnetic in the cluster). SP computes quantities 
which are probabilities over different pure states: the 
order parameter which is evaluated as fixed point of 
the SP equations is a probability measure in a space of 
functions, or for finite n, the full list of probability 
densities describing the cluster-to-cluster fluctuations 
of the variables. 

In both SP and BP one assumes knowledge of the 
marginals of all variables in the temporary absence 
of one of them and then writes the marginal 
probability induced on this “cavity” variable in 
absence of another third variable interacting with it 
(i.e., the so-called Bethe lattice approximation for 
the problem). These relations define a closed set of 
equations for such cavity marginals that can be 
solved iteratively (this is known as the message-passing
technique). The equations become exact if
the cavity variables acting as inputs are uncorrelated.
They are conjectured to be an asymptotically
exact approximation over random locally tree-like
factor graph. Both BP and SP can be derived in a 
variational framework. 
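The message-passing idea can be illustrated with plain BP on a tiny satisfiability formula whose factor graph is a tree, where the cavity marginals are exact. This is a minimal sketch under stated assumptions: the clause encoding, the parallel update schedule, and the fixed number of sweeps are choices made here for illustration, the formula is assumed satisfiable (otherwise a normalization would vanish), and it is BP over the uniform measure on solutions, not the SP equations.

```python
from itertools import product

def bp_marginals(n, clauses, sweeps=20):
    """Belief propagation for the uniform measure over solutions.

    clauses: list of clauses, each a list of (variable index, negated) pairs
             with negated in {0, 1}. Returns P(x_i = 1) for each variable.
    """
    edges = [(i, a) for a, clause in enumerate(clauses) for i, _ in clause]
    var_to_clause = {(i, a): [0.5, 0.5] for i, a in edges}
    clause_to_var = {(a, i): [0.5, 0.5] for i, a in edges}

    for _ in range(sweeps):
        # Clause -> variable: sum over the other variables of the clause,
        # keeping only assignments that satisfy the clause.
        for a, clause in enumerate(clauses):
            for i, _ in clause:
                others = [(j, neg) for j, neg in clause if j != i]
                message = [0.0, 0.0]
                for xi in (0, 1):
                    for xs in product((0, 1), repeat=len(others)):
                        assign = {j: xj for (j, _), xj in zip(others, xs)}
                        assign[i] = xi
                        if any(assign[j] != neg for j, neg in clause):
                            weight = 1.0
                            for (j, _), xj in zip(others, xs):
                                weight *= var_to_clause[(j, a)][xj]
                            message[xi] += weight
                clause_to_var[(a, i)] = message
        # Variable -> clause: product of messages from the other clauses.
        for (i, a) in list(var_to_clause):
            message = [1.0, 1.0]
            for (b, j), nu in clause_to_var.items():
                if j == i and b != a:
                    message = [message[0] * nu[0], message[1] * nu[1]]
            norm = message[0] + message[1]  # zero only if a contradiction is found
            var_to_clause[(i, a)] = [message[0] / norm, message[1] / norm]

    marginals = []
    for i in range(n):
        belief = [1.0, 1.0]
        for (a, j), nu in clause_to_var.items():
            if j == i:
                belief = [belief[0] * nu[0], belief[1] * nu[1]]
        marginals.append(belief[1] / (belief[0] + belief[1]))
    return marginals

# Hypothetical tree-like formula: (x0 or x1) and (not x1 or x2)
tree_formula = [[(0, 0), (1, 0)], [(1, 1), (2, 0)]]
print(bp_marginals(3, tree_formula))
```

This formula has the four solutions 011, 100, 101 and 111, so the exact marginals P(x_i = 1) are 3/4, 1/2 and 3/4, which is what the iteration converges to on this tree.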


Complexity of Counting Problems 


In order to describe the nature of computational 
complexity of counting in physical models, it is 
enough to consider the classical Ising problem. The 
computation of the Ising partition function or, more
generally, of the weighted matching polynomial, is
the root problem of lattice statistics.

For planar graphs like, for example, two-dimensional 
regular lattices, counting problems can often be 
solved by a variety of different methods, for 
example, transfer matrices and Pfaffians, which 
require a number of operations which are poly- 
nomial in the number of vertices. 

The complexity of the counting problems changes 
if one considers nonplanar graphs, that is, graphs 
with a nontrivial topological genus. In discrete 
mathematics, such problems are classified as 
#P-complete, meaning that the existence of an 
exact polynomial algorithm for the evaluation of the 
generating functions would imply the polynomial 
solvability of many known counting combinatorial 
problems, the most famous one being the evaluation of 
the permanent of 0-1 matrices. In statistical mechanics 
and mathematical chemistry, the interest in nonplanar 
lattices is obviously related to their D > 2 character: 
the three-dimensional cubic lattice is nothing but a 
nonplanar graph of topological genus g=1+N/4, 
where N is the number of sites. 

The planar two-dimensional Ising model was solved 
in 1944 by Onsager using the algebraic transfer matrix 
method. Subsequently, alternative exact solutions have
been proposed which resorted to simple combinatorial 
and geometrical reasoning. As is well known, the 
underlying idea of the combinatorial methods consists 
in recasting the sum over spin configurations of the 
Boltzmann weights as a sum over closed curves (loops) 
weighted by the activity of their bonds. Double 
counting is avoided by a proper cancellation mechan- 
ism which takes care of the different intrinsic 
topologies of loops which give rise to the same 
contribution in the partition function. Such an 
approach has been developed first by Kac and Ward 
(1952) and provides a direct way of taking the field 
theoretic continuum limit. In D > 2, the general- 
ization of the above method encounters enormous 
difficulties due to the variety of intrinsic topologies of 
surfaces immersed in D > 2 lattices. 

Another combinatorial method proposed in the 
1960s by Kasteleyn is the so-called Pfaffian method. 
It consists in writing the weighted sum over loops as 
a dimer covering or perfect matching generating
function. Once the relationship between loop count- 
ing and dimer coverings (or perfect matchings) over 
a suitably decorated and properly oriented lattice is 
established, the Pfaffian method turns out to be a 
simple technique for the derivation of exact solu- 
tions or for the definition of polynomial algorithms 
over planar lattices which are applicable also to the 
two-dimensional Ising spin glass. 


The generalization of the Pfaffian construction to 
the nonplanar case must deal with the ambiguity of 
orienting the homology cycles of the graph. Such a 
problem can be formally solved in full generality for 
any orientable lattice and leads to an expression of 
the Ising partition function or the dimer coverings 
generating function given as a sum over all possible 
inequivalent orientations of the lattice (or its embed- 
ding surface): for a graph of genus g, the homology 
basis is composed of 2g cycles and, therefore, there 
are $2^{2g}$ inequivalent orientations. It is only for graphs
of logarithmic genus that the generalized Pfaffian 
formalism provides a polynomial algorithm. 

Counting perfect matchings can be thought of as 
the problem of evaluating the permanent of 0-1 
matrices over properly constructed bipartite graphs, 
which is among the oldest and most famous 
#P-complete problems. 
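The link between perfect matchings and the permanent can be checked on small bipartite graphs. The sketch below evaluates the permanent straight from its definition as a sum over permutations, which takes a number of steps growing factorially with the matrix size; the function name and the example matrices are illustrative assumptions.

```python
from itertools import permutations

def permanent(a):
    """Permanent of a square matrix: like the determinant, but with all signs positive."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = 1
        for i, j in enumerate(perm):
            term *= a[i][j]
            if term == 0:
                break
        total += term
    return total

# Biadjacency matrix of the complete bipartite graph K_{3,3}.
complete = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
# Biadjacency matrix of a 6-cycle viewed as a bipartite graph.
hexagon = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]

print(permanent(complete), permanent(hexagon))
```

Each nonzero term corresponds to one perfect matching: row i is matched to column perm[i], and the term survives only if every such edge is present.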

The Pfaffian formalism when applied to the perma- 
nent problem leads to a simple general result, that is, it 
provides a general formula for writing the permanent 
of a matrix in terms of a number of determinants which 
is exponential in the genus of the underlying graph. 


See also: Combinatorics: Overview; Determinantal 
Random Fields; Dimer Problems; Phase Transitions in 
Continuous Systems; Spin Glasses; Two-Dimensional 
Ising Model. 
Introduction


When a fluid is in contact with another fluid, or 
with a gas, a portion of the total free energy of the 
system is proportional to the area of the surface of 
contact, and to a coefficient, the surface tension, 
which is specific for each pair of substances. 
Equilibrium will accordingly be obtained when the 
free energy of the surfaces in contact is a minimum. 

Suppose that we have a drop of some fluid, b, 
over a flat substrate, w, while both are exposed to 
air, a. We have then three different surfaces of 
contact, and the total free energy of the system 
consists of three parts, associated to these three 
surfaces. A drop of fluid b will exist provided its 
own two surface tensions exceed the surface tension 
between the substrate w and the air, that is, 
provided that 


$$\tau^{wb} + \tau^{ba} > \tau^{wa}$$


If equality is attained, then a film of fluid b is 
formed, a situation which is known as perfect, or 
complete wetting (see Figure 1). 

When one of the substances involved is aniso- 
tropic, such as a crystal, the contribution to the total 
free energy of each element of area depends on its 
orientation. The minimum surface free energy for a 
given volume then determines the ideal form of the 
crystal in equilibrium. 

It is only in recent times that equilibrium crystals 
have been produced in the laboratory, first, in 
negative crystals (vapor bubbles) of organic substances.
Most crystals grow under nonequilibrium
conditions and it is a subsequent relaxation of the
macroscopic crystal that restores the equilibrium.

Figure 1 Partial and complete wetting.

An interesting phenomenon that can be observed 
on these crystals is the roughening transition, 
characterized by the disappearance of the facets of 
a given orientation, when the temperature attains a 
certain particular value. The best observations have 
been made on helium crystals, in equilibrium with 
superfluid helium, since the transport of matter and 
heat is then extremely fast. Crystals grow to sizes of 
1-5 mm and relaxation times vary from milliseconds 
to minutes. Roughening transitions for three differ- 
ent types of facets have been observed (see, e.g., 
Wolf et al. (1983)). 

These are some classical examples among a 
variety of interesting phenomena connected with 
the behavior of the interface between two phases in 
a physical system. The study of the nature and 
properties of the interfaces, at least for some simple 
systems in statistical mechanics, is also an interesting 
subject of mathematical physics. Some aspects of 
this study will be discussed in the present article. 

We assume that the interatomic forces can be 
modeled by a lattice gas, and consider, as a simple 
example, the ferromagnetic Ising model. In a typical 
two-phase equilibrium state, there is a dense 
component, which can be interpreted as a solid or
liquid phase, and a dilute phase, which can be
interpreted as the vapor phase. Considering certain
particular cases of such situations, we first introduce
a precise definition of the surface tension and then
proceed to the mathematical analysis of some
preliminary properties of the corresponding interfaces.
The next topic concerns the wetting properties
of the system, and the final section is devoted to the
associated equilibrium crystal.


Pure Phases and Surface Tension 


The Ising model is defined on the cubic lattice $\mathcal{L} = \mathbb{Z}^3$,
with configuration space $\Omega = \{-1,1\}^{\mathcal{L}}$. If $\sigma \in \Omega$, the
value $\sigma(i) = -1$ or $1$ is the spin at the site
$i = (i_1, i_2, i_3) \in \mathcal{L}$, and corresponds to an empty or an
occupied site in the lattice gas version of the model.
The system is first considered in a finite box $\Lambda \subset \mathcal{L}$,
with fixed values of the spins outside.

In order to simplify the exposition, we shall
mainly consider the three-dimensional Ising model,
though some of the results to be discussed hold in
any dimension $d \ge 2$. We shall also, sometimes,
refer to the two-dimensional model, it being understood
that the definitions have been adapted in the
obvious way. We assume that the box $\Lambda$ is a
parallelepiped, centered at the origin of $\mathcal{L}$, of sides
$L_1, L_2, L_3$, parallel to the axes.

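A configuration of spins on $\Lambda$, $(\sigma(i), i \in \Lambda)$, denoted
$\sigma_\Lambda$, has an energy defined by the Hamiltonian

$$H_\Lambda(\sigma_\Lambda \mid \bar\sigma) = -J \sum_{\langle i,j \rangle \cap \Lambda \neq \emptyset} \sigma(i)\sigma(j)$$ [1]

where J is a positive constant (ferromagnetic or
attractive interaction). The sum runs over all
nearest-neighbor pairs $\langle i,j \rangle \subset \mathcal{L}$, such that at least
one of the sites belongs to $\Lambda$, and one takes
$\sigma(i) = \bar\sigma(i)$ when $i \notin \Lambda$, the configuration $\bar\sigma \in \Omega$
being the given boundary condition. The probability
of the configuration $\sigma_\Lambda$, at the inverse temperature
$\beta = 1/kT$, is given by the Gibbs measure

$$\mu_\Lambda(\sigma_\Lambda \mid \bar\sigma) = Z^{\bar\sigma}(\Lambda)^{-1} \exp(-\beta H_\Lambda(\sigma_\Lambda \mid \bar\sigma))$$ [2]

where $Z^{\bar\sigma}(\Lambda)$ is the partition function

$$Z^{\bar\sigma}(\Lambda) = \sum_{\sigma_\Lambda} \exp(-\beta H_\Lambda(\sigma_\Lambda \mid \bar\sigma))$$ [3]

Local properties at equilibrium can be described by
the correlation functions between the spins on finite
sets of sites,

$$\mu_\Lambda^{\bar\sigma}(\sigma(A)) = \sum_{\sigma_\Lambda} \mu_\Lambda(\sigma_\Lambda \mid \bar\sigma) \prod_{i \in A} \sigma(i)$$ [4]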
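The finite-volume definitions above can be evaluated by brute force on a very small box. The sketch below does this for the two-dimensional version of the model (the text notes that the definitions adapt in the obvious way); the function names, the dictionary representation of configurations, and the choice of box are assumptions made for illustration.

```python
import math
from itertools import product

def neighbors(site):
    """Nearest neighbors of a site of the square lattice."""
    x, y = site
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def hamiltonian(config, boundary, J=1.0):
    """Energy of the spins in the box, with spins outside fixed by `boundary`.

    config:   dict mapping each site of the box to its spin (+1 or -1).
    boundary: function giving the spin at any site outside the box.
    """
    energy = 0.0
    seen = set()
    for i in config:
        for j in neighbors(i):
            bond = frozenset((i, j))
            if bond in seen:
                continue  # each nearest-neighbor pair is counted once
            seen.add(bond)
            spin_j = config[j] if j in config else boundary(j)
            energy -= J * config[i] * spin_j
    return energy

def gibbs(box, boundary, beta, J=1.0):
    """Return the site list, the Gibbs probabilities and the partition function."""
    sites = list(box)
    weights = {}
    for spins in product([-1, 1], repeat=len(sites)):
        config = dict(zip(sites, spins))
        weights[spins] = math.exp(-beta * hamiltonian(config, boundary, J))
    Z = sum(weights.values())
    return sites, {s: w / Z for s, w in weights.items()}, Z

def magnetization(sites, probs, site):
    """Expectation of the spin at `site` under the finite-volume Gibbs measure."""
    k = sites.index(site)
    return sum(p * spins[k] for spins, p in probs.items())

box = [(x, y) for x in range(2) for y in range(2)]
sites, probs, Z = gibbs(box, lambda site: +1, beta=0.5)
print(Z, magnetization(sites, probs, (0, 0)))
```

For a box consisting of a single site with all outside spins equal to +1, the energy is -4J times the spin, so the partition function is 2 cosh(4 beta J) and the magnetization is tanh(4 beta J), which gives a quick sanity check of the code.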


The measures [2] determine (by the Dobrushin-Lanford-Ruelle
equations) the set of Gibbs states of
the infinite system, as measures on the set $\Omega$ of all
configurations. If a Gibbs state happens to be equal
to $\lim \mu_\Lambda(\cdot \mid \bar\sigma)$, when $L_1, L_2, L_3 \to \infty$, under a fixed
boundary condition $\bar\sigma$, we shall call it the Gibbs
state associated to the boundary condition $\bar\sigma$. One
also says that this state exists in the thermodynamic
limit. Then, equivalently, the correlation functions
[4] converge to the corresponding expectation values
in this state.

This model presents, at low temperatures (i.e., for
$\beta > \beta_c$, where $\beta_c$ is the critical inverse temperature),
two different thermodynamic pure phases, a dense
and a dilute phase in the lattice gas language (called
here the positive and the negative phase). This
means two extremal translation-invariant Gibbs
states, $\mu^+$ and $\mu^-$, obtained as the Gibbs states
associated with the boundary conditions $\bar\sigma$, respectively
equal to the ground configurations $\bar\sigma(i) = 1$
and $\bar\sigma(i) = -1$, for all $i \in \mathcal{L}$. The spontaneous
magnetization

$$m^*(\beta) = \mu^+(\sigma(i)) = -\mu^-(\sigma(i))$$ [5]

is then strictly positive. On the other hand, if $\beta < \beta_c$,
then the Gibbs state is unique and $m^* = 0$.

Each configuration inside $\Lambda$ can be described in a
geometric way by specifying the set of Peierls
contours which indicate the boundaries between
the regions of spin 1 and the regions of spin −1.
Unit-square faces are placed midway between the
pairs of nearest-neighbor sites i and j, perpendicularly
to these bonds, whenever $\sigma(i)\sigma(j) = -1$. The
connected components of this set of faces are the
Peierls contours. Under the boundary conditions (+)
and (−), the contours form a set of closed surfaces.
They describe the defects of the considered configuration
with respect to the ground states of the
system (the constant configurations 1 and −1), and
are a basic tool for the investigation of the model at
low temperatures.
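As a short worked illustration of why contours measure the energy of defects, take the (+) boundary condition and write $N_b(\Lambda)$ for the number of nearest-neighbor pairs that meet the box and $|\gamma|$ for the total number of faces in the contours of a configuration (both symbols are introduced here only for convenience). Each pair contributes $-J$ to the Hamiltonian when its two spins agree and $+J$ when they differ, and there is exactly one face per disagreeing pair, so

$$H(\sigma \mid +) = -J\,\bigl(N_b(\Lambda) - |\gamma|\bigr) + J\,|\gamma| = -J\,N_b(\Lambda) + 2J\,|\gamma|.$$

The first term is the energy of the ground configuration, and every face costs an extra energy $2J$. The Gibbs weight of a configuration is therefore proportional to $e^{-2\beta J |\gamma|}$, which makes large contours exponentially unlikely at low temperatures.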

In order to study the interface between the two
pure phases, one needs to construct a state describing
the coexistence of these phases. This can be done
by means of a new boundary condition. Let
$n = (n_1, n_2, n_3)$ be a unit vector in $\mathbb{R}^3$, such that
$n_3 > 0$, and introduce the mixed boundary condition
$(\pm, n)$, for which

$$\bar\sigma(i) = \begin{cases} 1 & \text{if } i \cdot n \ge 0 \\ -1 & \text{if } i \cdot n < 0 \end{cases}$$ [6]

This boundary condition forces the system to
produce a defect going transversally through the
box $\Lambda$, a large Peierls contour that can be
interpreted as the microscopic interface (also called
a domain wall). The other defects that appear above
and below the interface can be described by closed
contours inside the pure phases.

The free energy per unit area due to the presence
of the interface is the surface tension. It is defined by

$$\tau(n) = \lim_{L_1, L_2 \to \infty} \lim_{L_3 \to \infty} -\frac{n_3}{\beta L_1 L_2} \ln \frac{Z^{\pm,n}(\Lambda)}{Z^{+}(\Lambda)}$$ [7]

In this expression the volume contributions proportional
to the free energy of the coexisting phases, as
well as the boundary effects, cancel, and only the
contributions to the free energy due to the interface
are left. The existence of such a quantity indicates
that the macroscopic interface, separating the
regions occupied by the pure phases in a large
volume $\Lambda$, has a microscopic thickness and can
therefore be regarded as a surface in a thermodynamic
approach.


Theorem 1 The interfacial free energy per unit
area, $\tau(n)$, exists, is bounded, and its extension by
positive homogeneity, $f(x) = |x|\,\tau(x/|x|)$, is a convex
function on $\mathbb{R}^3$. Moreover, $\tau(n)$ is strictly positive
for $\beta > \beta_c$, and vanishes if $\beta < \beta_c$.


The existence of $\tau(n)$ and also the last statement
were proved by Lebowitz and Pfister (1981), in the
particular case n = (0, 0, 1), with the help of correlation
inequalities. A complete proof of the theorem
was given later with similar arguments. The convexity
of f is equivalent to the fact that the surface
tension $\tau$ satisfies a thermodynamic stability condition
known as the pyramidal inequality (see
Messager et al. (1992)).


Gibbs States and Interfaces 


In this section we consider the $(\pm, n_0)$ boundary
condition, also simply denoted $(\pm)$, associated to the
vertical direction $n_0 = (0, 0, 1)$,

$$\bar\sigma(i) = 1 \ \text{if } i_3 \ge 0, \qquad \bar\sigma(i) = -1 \ \text{if } i_3 < 0$$ [8]

The corresponding surface tension is $\tau^\pm = \tau(n_0)$. We
shall first recall some classical results which concern
the Gibbs states and interfaces at low temperatures.

According to the geometrical description of the
configurations introduced in the last section, we
observe that

$$Z^{\pm}(\Lambda)/Z^{+}(\Lambda) = \sum_{\lambda} \exp(-2\beta J |\lambda| - U_\Lambda(\lambda))$$ [9]

where the sum runs over all microscopic interfaces $\lambda$
compatible with the boundary condition and $|\lambda|$ is
the number of faces of $\lambda$ (inside $\Lambda$). The term $U_\Lambda(\lambda)$
equals $-\ln (Z^+(\Lambda, \lambda)/Z^+(\Lambda))$, the sum in the partition
function $Z^+(\Lambda, \lambda)$ being extended to all configurations
whose associated contours do not intersect $\lambda$.
Each term in sum [9] gives a weight proportional to
the probability of the corresponding microscopic
interface.
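A rough worked estimate shows what this sum gives at very low temperatures. It is only a heuristic sketch: it keeps a single term of the sum and neglects the interaction term, assuming that the latter contributes a correction per unit area that vanishes as $\beta \to \infty$. The flat interface has one face above each of the $L_1 L_2$ columns of the box, so $|\lambda| = L_1 L_2$, and keeping only this term gives

$$\frac{Z^{\pm}(\Lambda)}{Z^{+}(\Lambda)} \approx e^{-2\beta J L_1 L_2}, \qquad \text{hence} \qquad \tau^{\pm} \approx -\frac{1}{\beta L_1 L_2} \ln e^{-2\beta J L_1 L_2} = 2J,$$

using the definition [7] with $n_3 = 1$. In other words, at zero temperature the surface tension of the horizontal interface reduces to the energy cost $2J$ of one broken bond per unit area.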

At low (positive) temperatures, we expect the
microscopic interface corresponding to this boundary
condition, which at zero temperature coincides
with the plane $i_3 = -1/2$, to be modified by small
deformations. Each microscopic interface $\lambda$ can then
be described by its defects, with respect to the
interface at $\beta = \infty$. To this end, one introduces some
objects, called walls, which form the boundaries
between the horizontal plane portions of the microscopic
interface, also called the ceilings of the
interface.

More precisely, one says that a face of $\lambda$ is a
ceiling face if it is horizontal and such that
the vertical line passing through its center does not
have other intersections with $\lambda$. Otherwise, one
says that it is a wall face. The set of wall faces splits
into maximal connected components. The set of
walls, associated to $\lambda$, is the set of these components,
each component being identified by its
geometric form and its projection on the plane
$i_3 = -1/2$. Every wall w, with projection $\pi(w)$,
increases the energy of the interface by a quantity
$2J\|w\|$, where $\|w\| = |w| - |\pi(w)|$, and two walls are
compatible if their projections do not intersect. In
this way, the microscopic interfaces may be interpreted
as a “gas of walls” on the two-dimensional
lattice.

Dobrushin, who developed the above analysis,
also proved the dilute character of this “gas” at low
temperatures. This implies that the microscopic
interface is essentially flat, or rigid. One can understand
this fact by noticing first that the probability
of a wall is less than $\exp(-2\beta J \|w\|)$ and, second,
that in order to create a ceiling in $\lambda$ which is not in
the plane $i_3 = -1/2$, one needs to surround it by a
wall, which has to grow as the ceiling is made
over a larger area.

Using correlation inequalities one proves that the
Gibbs state $\mu^\pm$, associated to the $(\pm)$ boundary
conditions, always exists, and that it is invariant
under horizontal translations of the lattice, that is,
$\mu^\pm(\sigma(A + a)) = \mu^\pm(\sigma(A))$ for all $a = (a_1, a_2, 0)$. It is
also an extremal Gibbs state. Let m(z) be the
magnetization $\mu^\pm(\sigma(z))$ at the site (0, 0, z). The
function m(z) is monotone increasing and satisfies
the symmetry property $m(-z) = -m(z - 1)$, which
expresses the reflection symmetry through the plane
$i_3 = -1/2$. Some
consequences of Dobrushin’s work are the following
properties.
Theorem 2 If the temperature is low enough, that
is, if $\beta J > c_1$, where $c_1$ is a given constant, then

$m^\pm(0)$ is strictly positive [10]
$m^\pm(z) \to m^*$, when $z \to \infty$, exponentially fast [11]


Equation [10] is just another way of saying that
the interface is rigid and that the state $\mu^\pm$ is not
translation invariant (in the vertical direction).
Then, the correlation functions $\mu^\pm(\sigma(A))$ describe
the local properties, or local structure, of the
macroscopic interface. In particular, the function
m(z) represents the magnetization profile. Then
statement [11], together with the symmetry property,
tells us that the thickness of this interface is
finite, with respect to the unit lattice spacing.

The statistics of interfaces has been rewritten in
terms of a gas of walls and this system may further
be studied by cluster expansion techniques. There is
an interaction between the walls, coming from the
term $U_\Lambda(\lambda)$ in eqn [9], but a convenient mathematical
description of this interaction can be obtained
by applying the standard low-temperature cluster
expansion, in terms of contours, to the regions
above and below the interface.

This method was introduced by Gallavotti in his
study (mentioned below) of the two-dimensional
Ising model. It has been applied by Bricmont and
co-workers to examine the interface structure in the
present case. As a consequence, it follows that
the surface tension, more exactly $\beta\tau^\pm(\beta)$, and also
the correlation functions, are analytic functions at
low temperatures. They can be obtained as explicit
convergent series in the variable $\zeta = e^{-2\beta J}$.

The same analysis applied to the two-dimensional
model shows a very different behavior at low
temperatures. In this case, the microscopic interface
$\lambda$ is a polygonal line and the walls belong to the
one-dimensional lattice. One can then increase the size of
a ceiling without modifying the walls attached to it.

Indeed, Gallavotti turned this observation into a
proof that the Gibbs state $\mu^\pm$ is now translation
invariant. The line $\lambda$ undergoes large fluctuations of
order $\sqrt{L}$, and disappears from any finite region of
the lattice, in the thermodynamic limit. In particular,
we have then $\mu^\pm = (1/2)(\mu^+ + \mu^-)$, a result that
extends to all boundary conditions $(\pm, n)$.

Using these results Bricmont and co-workers also
studied the local structure of the interface at low
temperatures and showed that its intrinsic thickness
is finite. To study the global fluctuations, one can
compute the magnetization profile by introducing,
before taking the thermodynamic limit, a change of
scale: $\mu^\pm(\sigma(zL^\delta))$, with $\delta = 1/2$ or near to this value.
This is an exact computation that has been done by
Abraham and Reed.

Let us come back to the three-dimensional Ising 
model where we know that the interface orthogonal 
to a lattice axis is rigid at low temperatures. 


Question 1 At higher temperatures, but before 
reaching the critical temperature, do the fluctuations 
of this interface become unbounded, in the thermo- 
dynamic limit, so that the corresponding Gibbs state 
is translation invariant? 


One says then that the interface is rough, and it is
believed that, effectively, the interface becomes
rough when the temperature is raised, undergoing
a roughening transition at an inverse temperature
$\beta_R \ge \beta_c$.

It is known that $\beta_R \le \beta_c^{d=2}$, the critical inverse
temperature of the two-dimensional Ising model,
since van Beijeren proved, using correlation inequalities,
that above this value, the state $\mu^\pm$ is not
translation invariant. Recalling that the rigid interface
may be viewed as a two-dimensional system,
the system of walls, a representation that would
become inappropriate for a rough interface, one
might think that the phase transition of the
two-dimensional Ising model is relevant for the roughening
transition, and that $\beta_R$ is somewhere near
$\beta_c^{d=2}$. Indeed, approximate methods, used by Weeks
and co-workers, give some evidence for the existence
of such a $\beta_R$ and suggest a value slightly smaller
than $\beta_c^{d=2}$, as shown in Table 1. To this day,
however, there appears to be no proof of the fact
that $\beta_R > \beta_c$, that is, that the roughening transition
for the three-dimensional Ising model really occurs.

At present one is able to study the roughening
transition rigorously only for some simplified models
with a restricted set of admissible microscopic
interfaces. Moreover, the closed contours, describing
the defects above and below $\lambda$, are neglected, so that
these two regions have the constant configurations 1
or −1, and one has $U_\Lambda(\lambda) = 0$ in eqn [9].

The best known of these models is the classic SOS
(solid-on-solid) model in which the interfaces $\lambda$ have
the property of being cut only once by all vertical
lines of the lattice. This means that $\lambda$ is the graph of
a function that can equivalently be used to define
the possible configurations of $\lambda$. If $\lambda$ contains the
horizontal face with center $(i_1, i_2, i_3 - 1/2)$, then
the value at $(i_1, i_2)$ of the associated function is
$\phi(i_1, i_2) = i_3$.

Table 1 Some temperature values

d = 3    $\beta_c J \approx 0.22$    approximate critical temperature
d = 3    $\beta_R J \approx 0.41$    conjectured roughening temperature
d = 2    $\beta_c J = 0.44$    exact critical temperature
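Since an SOS interface is the graph of a height function, its number of faces, and hence its weight in eqn [9] with the interaction term set to zero, can be computed directly from the heights. The sketch below assumes a rectangular array of heights surrounded by columns of fixed height (a hypothetical stand-in for the boundary condition); the function names and this representation are illustrative choices.

```python
import math

def sos_area(phi, boundary_height=0):
    """Number of unit faces of the SOS interface with heights phi[r][c].

    There is one horizontal face per column, plus vertical faces whose number
    between two neighboring columns is the absolute height difference.
    Columns outside the array are assumed to sit at `boundary_height`.
    """
    rows, cols = len(phi), len(phi[0])

    def height(r, c):
        inside = 0 <= r < rows and 0 <= c < cols
        return phi[r][c] if inside else boundary_height

    area = rows * cols  # one horizontal face above each column
    for r in range(rows):
        for c in range(cols):
            # Vertical faces shared with the right and lower neighbors.
            area += abs(height(r, c) - height(r, c + 1))
            area += abs(height(r, c) - height(r + 1, c))
            # Columns on the left and upper edges also touch the boundary.
            if c == 0:
                area += abs(height(r, c) - boundary_height)
            if r == 0:
                area += abs(height(r, c) - boundary_height)
    return area

def sos_weight(phi, beta, J=1.0):
    """Unnormalized Boltzmann weight exp(-2 beta J |lambda|) of the interface."""
    return math.exp(-2.0 * beta * J * sos_area(phi))

flat = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
bump = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(sos_area(flat), sos_area(bump))
```

Raising a single column by one unit adds four vertical faces, so it costs an energy 8J relative to the flat interface.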

The proof that the SOS model with the boundary
condition $(\pm)$ has a roughening transition is a highly
nontrivial result due to Fröhlich and Spencer. When
$\beta$ is small enough, the fluctuations of $\lambda$ are of order
$\sqrt{\ln L}$ (in a cubic box of side L).

Moreover, other interface models, with additional 
conditions on the allowed microscopic interfaces, 
are exactly solvable. The BCSOS (body-centered 
SOS) model, introduced by van Beijeren, belongs to 
this class. It is, in fact, the first model for which the 
existence of a roughening transition has been 
proved. More recently, also the TISOS (triangular 
Ising SOS) model, introduced by Blöte and Hilhorst
and further studied by Nienhuis and co-workers, has 
been considered in this context. 

The interested reader can find more information 
and references, concerning the subject of this 
section, in the review article by Abraham (1986). 


Wetting Phenomena 


Next we consider the Ising model over a plane
horizontal substrate (also called a wall) and study
the difference of surface tensions which governs the
wetting properties of this system.

We first describe the approach developed by
Fröhlich and Pfister (1987) and briefly report some
results of their study. We consider the model on the
semi-infinite lattice

$$\mathcal{L}' = \{ i \in \mathcal{L} : i_3 \ge 0 \}$$ [12]

A magnetic field, K > 0, is added on the boundary
sites, $i_3 = 0$, which describes the interaction with the
substrate, supposed to occupy the complementary
region $\mathcal{L} \setminus \mathcal{L}'$.

We constrain the model in the finite box $\Lambda' = \Lambda \cap \mathcal{L}'$,
with $\Lambda$ as above, and impose the value of the spins
outside. The Hamiltonian becomes


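$$H^{w}_{\Lambda'}(\sigma_{\Lambda'} \mid \bar\sigma) = -J \sum_{\langle i,j \rangle \cap \Lambda' \neq \emptyset} \sigma(i)\sigma(j) - K \sum_{i \in \Lambda',\, i_3 = 0} \sigma(i)$$ [13]

Here $\sigma_{\Lambda'}$ represents the configuration inside $\Lambda'$, the
pairs $\langle i,j \rangle$ are contained in $\mathcal{L}'$, and $\sigma(i) = \bar\sigma(i)$ when
$i \notin \Lambda'$, the configuration $\bar\sigma$ being the given boundary
condition (see Figure 2). The corresponding partition
function is denoted by $Z^{w\bar\sigma}(\Lambda')$.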

Since there are two pure phases in the model, we
must consider two surface free energies, or surface
tensions, $\tau^{w+}$ and $\tau^{w-}$, between the wall and the
positive or negative phase present in the bulk. They
are defined through the choice of the boundary
condition, $\bar\sigma(i) = 1$ or $\bar\sigma(i) = -1$, for all $i \in \mathcal{L}'$. Let us
consider first the case of the (−) boundary condition.

Figure 2 Boundary conditions for the cubic lattice. Above, the
box $\Lambda$ with the $(\pm)$ and (step) boundary conditions. Below, the
box $\Lambda'$ and the wall W with the (w −) boundary conditions.

The surface free energy contribution per unit area
due to the presence of the wall, when we have the
negative phase in the bulk, is

$$\tau^{w-}(\beta, K) = \lim_{L_1, L_2 \to \infty} \lim_{L_3 \to \infty} -\frac{1}{\beta L_1 L_2} \ln \frac{Z^{w-}(\Lambda')}{Z^{-}(\Lambda)^{1/2}}$$ [14]

The division by $Z^{-}(\Lambda)^{1/2}$ allows us to subtract from
the total free energy, $\ln Z^{w-}(\Lambda')$, the bulk term and
all boundary terms which are not related to the
presence of the wall. The existence of limit [14]
follows from correlation inequalities, and we have
$\tau^{w-} < \infty$.

One can prove, as well, the existence of the Gibbs
state $\mu^{w-}$ of the semi-infinite system, associated to
the (−) boundary condition. This state is the limit of
the finite volume Gibbs measures $\mu_{\Lambda'}(\sigma_{\Lambda'} \mid (-))$
defined by the Hamiltonian [13]. It describes the
local equilibrium properties of the system near
the wall, when deep inside the bulk the system is
in the negative phase. Similar definitions give the
surface tension $\tau^{w+}$ and the Gibbs state $\mu^{w+}$,
corresponding to the boundary condition $\bar\sigma(i) = 1$,
for all $i \in \mathcal{L}'$.

We remark that the states $\mu^{W+}$ and $\mu^{W-}$ are invariant
by translations parallel to the plane $i_3 = 0$, and
introduce the magnetizations $m^{W-}(z) = \mu^{W-}(\sigma(z))$,
where $z$ denotes the site $(0,0,z)$, $m^{W-} = m^{W-}(0)$, and
similarly $m^{W+}(z)$ and $m^{W+}$. Their connection with
the surface free energies is given by the formula

$$ \tau^{W-}(\beta,K) - \tau^{W+}(\beta,K) = \int_0^K \bigl(m^{W+}(\beta,s) - m^{W-}(\beta,s)\bigr)\,ds \qquad [15] $$


We mention in the following theorem some
results of Fröhlich and Pfister's study. Here $\tau^{\pm}$ is,
as before, the usual surface tension between the
two pure phases of the system, for a horizontal
interface.


Theorem 3 With the above definitions, we have

$$ \tau^{W-}(\beta,K) - \tau^{W+}(\beta,K) \le \tau^{\pm}(\beta) \qquad [16] $$
$$ m^{W+}(\beta,K) - m^{W-}(\beta,K) \ge 0 \qquad [17] $$

and the difference in [17] is a monotone decreasing
function of the parameter $K$. Moreover, if $m^{W+} = m^{W-}$,
then the Gibbs states $\mu^{W+}$ and $\mu^{W-}$ coincide.


The proof is a subtle application of correlation
inequalities. Since, from Theorem 3, the integrand in
eqn [15] is a positive and decreasing function, the
difference $\Delta\tau = \tau^{W-} - \tau^{W+}$ is a monotone increasing
and concave (and hence continuous) function of the
parameter $K$. On the other hand, one can prove that
$\Delta\tau = \tau^{\pm}$, if $K \ge J$. This justifies the following
definition:

$$ K_w(\beta) = \min\{K : \Delta\tau(\beta,K) = \tau^{\pm}(\beta)\} \qquad [18] $$


In the thermodynamic description of wetting, the
partial-wetting regime is characterized by the strict
inequality in [16] or, equivalently, by $K < K_w(\beta)$. We
must have then $m^{W+} \neq m^{W-}$, because of eqn [15].
This shows that, in the case of partial wetting, $\mu^{W+}$
and $\mu^{W-}$ are different Gibbs states.

The complete-wetting regime is characterized by the
equality in [16], that is, by $K \ge K_w(\beta)$. Then, we have
$m^{W+} = m^{W-}$, and taking into account the last statement
in Theorem 3, also $\mu^{W+} = \mu^{W-}$. This last result implies
that there is only one Gibbs state. Thus, complete
wetting corresponds to unicity of the Gibbs state.

In this case, we also have $\lim m^{W-}(z) = m^{*}$, when
$z \to \infty$, because this is always true for $m^{W+}(z)$. This
indicates that we are in the positive phase of the
system although we have used the $(-)$ boundary
condition, so that the bulk negative phase cannot
reach the wall anymore. The film of positive phase,
which wets the wall completely, has an infinite
thickness with respect to the unit lattice spacing, in
the thermodynamic limit.

When $\beta = \infty$, only a few particular ground
configurations contribute to the partition functions,
such as the configuration $\sigma(i) = -1$ for the partition
function $Z^{W-}$, etc., and we obtain $\Delta\tau = 2K$ and
$\tau^{\pm} = 2J$. For nonzero but low temperatures, the
small perturbations of these ground states have to be
considered, a problem that can be treated by the
method of cluster expansions. In fact, the corresponding
defects can be described by closed contours
as in the case of pure phases.


Theorem 4 For $K < J$, the functions $\beta\tau^{W-}(\beta,K)$
and $\beta\tau^{W+}(\beta,K)$ are analytic at low temperatures,
that is, provided that $\beta(J - K) \ge c_2$, where $c_2$ is a
given constant. Moreover, $m^{W+}(z)$ and $m^{W-}(z)$ tend,
respectively, to $m^{*}$ and to $-m^{*}$, when $z \to \infty$,
exponentially fast.


The last statement in Theorem 4 tells us that the 
wall affects only a layer of finite thickness (with 
respect to the lattice spacing). From a macroscopic 
point of view, the negative phase reaches the wall, 
and we are in the partial-wetting regime. Indeed, a 
strict inequality holds in [16]. 

Thus, for K <J there is always partial wetting at 
low temperatures. Then the following question arises: 


Question 2 Is there a situation of complete wetting
at higher temperatures? It is understood here that K
takes a fixed value, characteristic of the substrate,
such that $0 < K < J$.


This is known to be the case in dimension d = 2,
where the exact value of $K_w(\beta)$ can be obtained
from Abraham's solution of the model:

$$ \cosh 2\beta K_w = \cosh 2\beta J - e^{-2\beta J}\,\sinh 2\beta J $$

Then complete wetting occurs for $\beta$ in the interval
$\beta_c < \beta \le \beta_w(K)$, where $\beta_c$ is the critical inverse
temperature and $\beta_w(K)$ is the solution of $K_w(\beta) = K$.
The case d = 2 has been reviewed in Abraham
(1986).
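
As a small numerical illustration of the two-dimensional formula above, the following sketch solves $\cosh 2\beta K_w = \cosh 2\beta J - e^{-2\beta J}\sinh 2\beta J$ for $K_w$. The function names and the choice J = 1 are assumptions made for this example only; the critical value uses the relation $\sinh 2J\beta_c = 1$ quoted later in the article.

```python
import math

def wetting_threshold(beta: float, J: float = 1.0) -> float:
    """Solve cosh(2*beta*K_w) = cosh(2*beta*J) - exp(-2*beta*J)*sinh(2*beta*J) for K_w."""
    rhs = math.cosh(2 * beta * J) - math.exp(-2 * beta * J) * math.sinh(2 * beta * J)
    # Guard against rounding slightly below 1 near the critical point.
    return math.acosh(max(1.0, rhs)) / (2 * beta)

def critical_beta(J: float = 1.0) -> float:
    """Critical inverse temperature of the square-lattice model: sinh(2*J*beta_c) = 1."""
    return math.asinh(1.0) / (2 * J)

if __name__ == "__main__":
    bc = critical_beta()
    for beta in (bc, 0.5, 1.0, 2.0, 5.0):
        print(f"beta = {beta:.4f}   K_w = {wetting_threshold(beta):.6f}")
```

In this sketch $K_w$ vanishes at $\beta_c$ and increases towards J as $\beta$ grows, which is consistent with complete wetting occurring for $\beta_c < \beta \le \beta_w(K)$.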

To our knowledge, the above question remains an
open problem for the Ising model in dimension
d = 3. The problem has, however, been solved for
the simpler case of a SOS interface model. In this
case, a nice and rather brief proof of the following
result has been given by Chalker (1982): one has
$m^{W-} = m^{W+}$, and hence complete wetting, if

$$ 2\beta(J - K) < -\ln\bigl(1 - e^{-2\beta J}\bigr) $$

It is very plausible that a similar statement is valid
for the semi-infinite Ising model, and also that
Chalker's method could play a role in extending the
proof to this case, provided an additional assumption
is made, namely, that $\beta$ is sufficiently large,
and hence $J - K$ small enough, in order to ensure the
convergence of the cluster expansions and to be able
to use them.


Equilibrium Crystals 


The shape of an equilibrium crystal is obtained,
according to thermodynamics, by minimizing the
surface free energy between the crystal and the
medium, for a fixed volume of the crystal phase.
Given the orientation-dependent surface tension
$\tau(n)$, the solution to this variational problem,
known under the name of Wulff construction, is
the following set:

$$ W = \{x \in \mathbb{R}^d : x \cdot n \le \tau(n) \ \text{for all } n\} \qquad [19] $$


Notice that the problem is scale invariant, so that if we
solve it for a given volume of the crystal, we get the
solution for other volumes by an appropriate scaling.
We notice also that the symmetry $\tau(n) = \tau(-n)$ is not
required for the validity of formula [19]. In the present
case, $\tau(n)$ is obviously a symmetric function, but
nonsymmetric situations are also physically interesting
and appear, for instance, in the case of a drop on a wall
discussed in the last section.
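
The Wulff set [19] is an intersection of half-spaces, one for each orientation n, so membership can be tested approximately by sampling finitely many unit normals. The sketch below does this in the plane. The surface tension `tau_example`, given by $|n_1| + |n_2|$, is a hypothetical choice made only for illustration (its Wulff shape is the square $[-1,1]^2$); it is not the Ising surface tension.

```python
import math

def in_wulff_shape(x, tau, n_dirs=720):
    """Approximate test of x in W = {x : x.n <= tau(n) for all unit n} in the plane,
    using n_dirs equally spaced unit normals."""
    for k in range(n_dirs):
        theta = 2 * math.pi * k / n_dirs
        n = (math.cos(theta), math.sin(theta))
        if x[0] * n[0] + x[1] * n[1] > tau(n) + 1e-12:
            return False
    return True

def tau_example(n):
    # Hypothetical surface tension tau(n) = |n1| + |n2| (illustration only).
    return abs(n[0]) + abs(n[1])

if __name__ == "__main__":
    print(in_wulff_shape((0.9, 0.9), tau_example))   # point inside the square
    print(in_wulff_shape((1.1, 0.0), tau_example))   # point outside the square
```

Replacing `tau` by a multiple $c\,\tau$ rescales the accepted region by the factor c, which is the scale invariance noted above.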

The surface tension in the Ising model between
the positive and negative phases has been defined in
eqn [7]. In the two-dimensional case, this function
$\tau(n)$ has (as shown by Abraham) an exact expression
in terms of an Onsager function. It follows (as
explained in Miracle-Solé (1999)) that the Wulff
shape W, in the plane $(x_1, x_2)$, is given by

$$ \cosh \beta x_1 + \cosh \beta x_2 \le \frac{\cosh^2 2\beta J}{\sinh 2\beta J} $$

This shape reduces to the empty set for $\beta \le \beta_c$, since
the critical $\beta_c$ satisfies $\sinh 2J\beta_c = 1$. For $\beta > \beta_c$, it is
a strictly convex set with smooth boundary.

In the three-dimensional case, only certain interface
models can be exactly solved (see the section
"Gibbs states and interfaces"). Consider the Ising
model at zero temperature. The ground configurations
have only one defect, the microscopic interface
$\lambda$, imposed by the boundary condition $(\pm, n)$. Then,
from eqn [9], we may write

$$ \tau(n) = \lim_{L_1,L_2\to\infty} \frac{n_3}{L_1 L_2}\,\bigl(E_\Lambda(n) - \beta^{-1}\ln N_\Lambda(n)\bigr) \qquad [20] $$

where $E_\Lambda = 2J|\lambda|$ is the energy (all $\lambda$ have the
same minimal area) and $N_\Lambda$ the number of
ground states. Every such $\lambda$ has the property of
being cut only once by all straight lines orthogonal
to the diagonal plane $i_1 + i_2 + i_3 = 0$, provided
that $n_k > 0$, for k = 1, 2, 3. Each $\lambda$ can then be
described by an integer function defined on a
triangular plane lattice, the projection of the
cubic lattice $\mathcal{L}$ on the diagonal plane. The model
defined by this set of admissible microscopic
interfaces is precisely the TISOS model. A similar
definition can be given for the BCSOS model that
describes the ground configurations on the
body-centered cubic lattice.

From a macroscopic point of view, the roughness 
or the rigidity of an interface should be apparent 
when considering the shape of the equilibrium 
crystal associated with the system. A typical equili- 
brium crystal at low temperatures has smooth plane 
facets linked by rounded edges and corners. The 
area of a particular facet decreases as the tempera- 
ture is raised and the facet finally disappears at a 
temperature characteristic of its orientation. It can 
be argued that the disappearance of the facet 
corresponds to the roughening transition of the 
interface whose orientation is the same as that of the 
considered facet. 

The exactly solvable interface models mentioned
above, for which the function $\tau(n)$ has been
computed, are interesting examples of this behavior,
and provide valuable information on several
aspects of the roughening transition. This subject
has been reviewed by Abraham (1986), van Beijeren
and Nolden (1987), and Kotecký (1989).

For example, we show in Figure 3 the shape
predicted by the TISOS model (one-eighth of the
shape because of the condition $n_k > 0$). In this
model, the interfaces orthogonal to the three
coordinate axes are rigid at low temperatures.

For the three-dimensional Ising model at positive
temperatures, the description of the microscopic
interface, for any orientation n, appears as a very
difficult problem. It has been possible, however,
to analyze the interfaces which are very near to
the particular orientations $n_0$, discussed in the
section "Gibbs states and interfaces." This analysis
allows us to determine the shape of the facets in a
rigorous way.

Figure 3 Cubic equilibrium crystal shown in a projection
parallel to the (1,1,1) direction. The three regions (1, 2, and 3)
indicate the facets and the remaining area represents a curved
part of the crystal surface.

We first observe that the appearance of a facet
in the equilibrium crystal shape is related, according
to the Wulff construction, to the existence of a
discontinuity in the derivative of the surface
tension with respect to the orientation. More
precisely, assume that the surface tension satisfies
the convexity condition of Theorem 1, and let this
function $\tau(n) = \tau(\theta, \phi)$ be expressed in terms of the
spherical coordinates of n, the vector $n_0$ being taken
as the $x_3$-axis. A facet orthogonal to $n_0$ appears in
the Wulff shape if and only if the derivative
$\partial\tau(\theta,\phi)/\partial\theta$ is discontinuous at the point $\theta = 0$,
for all $\phi$. The facet $F \subset \partial W$ consists of the points
$x \in \mathbb{R}^3$ belonging to the plane $x_3 = \tau(n_0)$ and such
that, for all $\phi$ between 0 and $2\pi$,

$$ x_1 \cos\phi + x_2 \sin\phi \le \left.\frac{\partial\tau(\theta,\phi)}{\partial\theta}\right|_{\theta = 0^{+}} \qquad [21] $$


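The step free energy is expected to play an
important role in the facet formation. It is defined
as the free energy associated with the introduction
of a step of height 1 on the interface, and can be
regarded as an order parameter for the roughening
transition. Let $\Lambda$ be a parallelepiped as in the section
"Pure phases and surface tension," and introduce
the (step, m) boundary conditions (see Figure 2),
associated to the unit vectors $m = (\cos\phi, \sin\phi) \in \mathbb{R}^2$, by

$$ \bar\sigma(i) = \begin{cases} 1 & \text{if } i_3 > 0 \text{ or if } i_3 = 0 \text{ and } i_1 m_1 + i_2 m_2 \ge 0 \\ -1 & \text{otherwise} \end{cases} \qquad [22] $$

Then, the step free energy per unit length for a step
orthogonal to m (with $m_2 > 0$) on the horizontal
interface, is

$$ \tau^{\mathrm{step}}(\phi) = \lim_{L_1\to\infty}\ \lim_{L_2\to\infty}\ \lim_{L_3\to\infty}\ -\frac{\cos\phi}{\beta L_1}\,\ln\frac{Z^{\mathrm{step},m}(\Lambda)}{Z^{\pm,n_0}(\Lambda)} \qquad [23] $$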


A first result concerning this point was obtained
by Bricmont and co-workers, by proving a correlation
inequality which establishes $\tau^{\mathrm{step}}(0)$ as a lower
bound to the one-sided derivative $\partial\tau(\theta,0)/\partial\theta$ at
$\theta = 0^{+}$ (the inequality extends also to $\phi \neq 0$). Thus,
when $\tau^{\mathrm{step}} > 0$, a facet is expected.

Using the perturbation theory of the horizontal
interface, it is possible to also study the microscopic
interfaces associated with the (step, m) boundary
conditions. When considering these configurations,
the step may be viewed as an additional defect on
the rigid interface described in the section "Pure
phases and surface tension." It is, in fact, a long wall
going from one side to the other side of the box $\Lambda$.
The step structure at low temperatures can then be
analyzed with the help of a new cluster expansion.
As a consequence of this analysis, we have the
following theorem.


Theorem 5 If the temperature is low enough, that
is, if $\beta J \ge c_3$, where $c_3$ is a given constant, then the
step free energy, $\tau^{\mathrm{step}}(\phi)$, exists, is strictly positive,
and extends by positive homogeneity to a strictly
convex function. Moreover, $\beta\tau^{\mathrm{step}}(\phi)$ is an analytic
function of $\zeta = e^{-2\beta J}$, for which an explicit convergent
series expansion can be found.


Using the above results on the step structure, 
similar methods allow us to evaluate the increment 
in surface tension of an interface tilted by a very 
small angle 0 with respect to the rigid horizontal 
interface. This increment can be expressed in terms 
of the step free energy, and one obtains the 
following relation. 


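Theorem 6 For $\beta J \ge c_3$, we have

$$ \left.\frac{\partial\tau(\theta,\phi)}{\partial\theta}\right|_{\theta = 0^{+}} = \tau^{\mathrm{step}}(\phi) \qquad [24] $$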


This relation, together with eqn [21], implies that 
one obtains the shape of the facet by means of the 
two-dimensional Wulff construction applied to the 
step free energy. The reader will find a detailed 
discussion on these points, as well as the proofs of 
Theorems 5 and 6, in Miracle-Sole (1995). 

From the properties of $\tau^{\mathrm{step}}$ stated in Theorem 5,
it follows that the Wulff equilibrium crystal presents 
well-defined boundary lines, smooth and without 
straight segments, between a rounded part of the 
crystal surface and the facets parallel to the three 
main lattice planes. 

It is expected, but not proved, that at a higher
temperature, but before reaching the critical
temperature, the facets associated with the Ising
model undergo a roughening transition. It is then
natural to believe that the equality [24] is true for
any $\beta$ larger than $\beta_R$, allowing us to determine the
facet shape from eqns [21] and [24], and that for
$\beta \le \beta_R$, both sides in this equality vanish, so
that the facet disappears.
However, the condition that the temperature is
low enough is needed in the proofs of Theorems 5
and 6.


See also: Dimer Problems; Phase Transitions in 
Continuous Systems; Phase Transition Dynamics; 
Two-Dimensional Ising Model; Wulff Droplets. 
Introduction


Stochastic differential equations (SDEs) appear
today as a modeling tool in several sciences, such as
telecommunications, economics, finance, biology,
and quantum field theory.

An SDE is essentially a classical differential 
equation which is perturbed by a random noise. 
When nothing else is specified, SDE means in fact 
ordinary SDE; in that case it corresponds to the 
perturbation of an ordinary differential equation. 
Stochastic partial differential equations (SPDEs) are 
obtained as random perturbation of partial differ- 
ential equations (PDEs). 

One of the most important differences between
deterministic and stochastic ordinary differential
equations is described by the so-called Peano-type
phenomenon. A classical differential equation with
continuous coefficients of linear growth admits
global existence but not uniqueness, as classical
calculus textbooks illustrate by studying equations of
the type

$$ \frac{dX}{dt}(t) = \sqrt{|X(t)|}, \qquad X(0) = 0 $$

However, if one perturbs the right member of the
equality with an additive Gaussian white noise $(\xi_t)$
(even with very small intensity), then the problem
becomes well posed. A similar phenomenon happens
with linear PDEs of evolution type perturbed with a
spacetime white noise.
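
The lack of uniqueness in the deterministic problem can be checked directly: both $X \equiv 0$ and $X(t) = t^2/4$ start at 0 and satisfy the equation with right-hand side $\sqrt{|X|}$. The short sketch below verifies this numerically on a grid; the helper names are illustrative assumptions.

```python
import math

def rhs(x: float) -> float:
    """Right-hand side sqrt(|x|) of the ODE."""
    return math.sqrt(abs(x))

def residual(solution, derivative, t: float) -> float:
    """|X'(t) - sqrt(|X(t)|)| for a candidate solution X with known derivative X'."""
    return abs(derivative(t) - rhs(solution(t)))

# Two distinct candidate solutions with X(0) = 0, given as (X, X') pairs.
zero = (lambda t: 0.0, lambda t: 0.0)
parabola = (lambda t: t * t / 4.0, lambda t: t / 2.0)

if __name__ == "__main__":
    grid = [k / 10 for k in range(0, 31)]
    for name, (x, dx) in (("zero", zero), ("parabola", parabola)):
        worst = max(residual(x, dx, t) for t in grid)
        print(f"{name}: max residual on grid = {worst:.2e}")
```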

SDEs constitute a vast subject and account for a
very large number of relevant contributions. We try
to orient the reader along the main axes and to
indicate references for the different subfields. We will
prefer to refer to monographs when available,
instead of articles.


Motivation and Preliminaries 


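In the whole article T will be a strictly positive real
number. Let us consider continuous functions
$b: \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}^d$, $a: \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}^{d\times m}$ and $x_0 \in \mathbb{R}^d$.
We consider a differential problem of the following
type:

$$ \frac{dX_t}{dt} = b(t, X_t), \qquad X_0 = x_0 \qquad [1] $$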


Let $(\Omega, \mathcal{F}, P)$ be a complete probability space.
Suppose that the previous equation is perturbed by a
random noise $(\xi_t)_{t\ge 0}$. For modeling reasons, it
is reasonable to suppose that $(\xi_t)_{t\ge 0}$ satisfies the
following properties.


1. It is a family of independent random variables
(r.v.'s).

2. $(\xi_t)_{t\ge 0}$ is "stationary", that is, for any positive
integer n and positive reals $h, t_0, t_1, \ldots, t_n$, the law of
$(\xi_{t_0+h}, \ldots, \xi_{t_n+h})$ does not depend on h.

More precisely we perturb eqn [1] as follows:

$$ \frac{dX_t}{dt} = b(t, X_t) + a(t, X_t)\,\xi_t, \qquad X_0 = x_0 \qquad [2] $$


We suppose for a moment that d = m = 1. In reality no
reasonable real-valued process $(\xi_t)_{t\ge 0}$ fulfilling the
previous assumptions exists. In particular, if the process $(\xi_t)$
exists (resp. $(\xi_t)$ exists and each $\xi_t$ is a
square-integrable r.v.), then the process cannot have
continuous paths (resp. it cannot be measurable with respect
to $\Omega \times \mathbb{R}_+$). However, suppose that such a process
exists; we set $B_t = \int_0^t \xi_s\,ds$. In that case, properties (1)
and (2) can be translated into the following on $(B_t)$.


(P1) It has independent increments, which means
that for any $0 \le t_0 \le t_1 \le \cdots \le t_n$, the r.v.'s
$B_{t_1} - B_{t_0}, \ldots, B_{t_n} - B_{t_{n-1}}$ are independent.

(P2) It has stationary increments, which means that
for any $t_0, \ldots, t_n, h \ge 0$, the law of
$(B_{t_1+h} - B_{t_0+h}, \ldots, B_{t_n+h} - B_{t_{n-1}+h})$ does not depend on h.


On the other hand, it is natural to require that


(C1) $B_0 = 0$ a.s.,
(C2) it is a continuous process, that is, it has
continuous paths a.s.


Equation [2] should be rewritten in some integral form

$$ X_t = X_0 + \int_0^t b(s, X_s)\,ds + \int_0^t a(s, X_s)\,dB_s \qquad [3] $$

Clearly the paths of the process $(B_t)$ cannot be
differentiable, so one has to give meaning to the integral
$\int_0^t a(s, X_s)\,dB_s$. This will be understood in the "Itô"
sense; see the considerations below.

An important result of probability theory says
that a stochastic process $(B_t)$ fulfilling properties P1,
P2 and C1, C2 is essentially a "Brownian motion".
More precisely, there are real constants $b, \sigma$ such
that $B_t = bt + \sigma W_t$, where $(W_t)$ is a classical
Brownian motion defined below.


Definition 1


(i) A (continuous) stochastic process $(W_t)$ is called
classical "Brownian motion" if $W_0 = 0$ a.s.,
it has independent increments and the law of
$W_t - W_s$ is that of a Gaussian $N(0, t - s)$ r.v.

(ii) An m-dimensional Brownian motion is a vector
$(W^1, \ldots, W^m)$ of independent classical
Brownian motions.
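
A classical Brownian motion can be sampled on a time grid by summing independent Gaussian increments, which mirrors Definition 1 directly. The following sketch uses only the Python standard library; the function names, the grid, and the sample sizes are assumptions made for illustration. It also estimates the variance of an increment, which should be close to t - s.

```python
import math
import random

def brownian_path(T: float, n_steps: int, rng: random.Random):
    """Sample W on the grid k*T/n_steps: W_0 = 0, independent N(0, dt) increments."""
    dt = T / n_steps
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path

def increment_variance(s_idx, t_idx, T, n_steps, n_paths, seed=0):
    """Empirical variance of W_t - W_s over n_paths independent sampled paths."""
    rng = random.Random(seed)
    incs = []
    for _ in range(n_paths):
        w = brownian_path(T, n_steps, rng)
        incs.append(w[t_idx] - w[s_idx])
    mean = sum(incs) / n_paths
    return sum((x - mean) ** 2 for x in incs) / (n_paths - 1)

if __name__ == "__main__":
    # s = 0.2, t = 0.8 on a grid of 10 steps over [0, 1]; t - s = 0.6.
    print(increment_variance(2, 8, 1.0, 10, 20000))
```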


Let $(\mathcal{F}_t)_{t\ge 0}$ be a filtration fulfilling the usual
conditions, see Karatzas and Shreve (1991, section 1.1).
There one can find basic concepts of the theory of
stochastic processes, such as the concepts of adapted and
progressively measurable process. An adapted process
is also said to be nonanticipating with respect to the
filtration $(\mathcal{F}_t)$, which represents the state of the
information at each time t. A process $(X_t)$ is said to
be adapted if for any t, $X_t$ is $\mathcal{F}_t$-measurable. The
notion of progressively measurable process is a slight
refinement of the notion of adapted process.


Definition 2


(i) A (continuous) $(\mathcal{F}_t)$-adapted process $(W_t)$ is called
(classical) $(\mathcal{F}_t)$-Brownian motion if $W_0 = 0$ and if,
for any $s < t$, $W_t - W_s$ is an $N(0, t - s)$ distributed
r.v. which is independent of $\mathcal{F}_s$.

(ii) An $(\mathcal{F}_t)$-m-dimensional Brownian motion is a
vector $(W^1, \ldots, W^m)$ of independent $(\mathcal{F}_t)$-classical
Brownian motions.


From now on, we will consider a probability
space $(\Omega, \mathcal{F}, P)$ equipped with a filtration $(\mathcal{F}_t)_{t\ge 0}$
fulfilling the usual conditions. From now on all the
considered filtrations will have that property.

Let $W = (W_t)_{t\ge 0}$ be an $(\mathcal{F}_t)_{t\ge 0}$-m-dimensional classical
Brownian motion. In Karatzas and Shreve (1991,
chapter 3) and Revuz and Yor (1999, chapter 4), one
introduces the notion of stochastic Itô integral
announced before. Let $Y = (Y^1, \ldots, Y^m)$ be a progressively
measurable m-dimensional process such that
$\int_0^T \|Y_s\|^2\,ds < \infty$; then the Itô integral $\int_0^T Y_s\,dW_s$ is well
defined. In particular, the indefinite integral $\int_0^{\cdot} Y_s\,dW_s$ is
an $(\mathcal{F}_t)$-progressively measurable continuous process.
If Y is an $\mathbb{R}^{d\times m}$ matrix-valued process, the integral
$\int_0^t Y_s\,dW_s$ is defined componentwise and it is a
vector in $\mathbb{R}^d$. The analog of differential calculus in
the framework of stochastic processes is Itô calculus,
see again Karatzas and Shreve (1991, chapter 3) and
Revuz and Yor (1999, chapter 4). An important tool is
the concept of quadratic variation $[X]$ of a stochastic
process, when it exists. For instance, the quadratic
variation $[W]_t$ of a classical Brownian motion equals t.
If $M_t = \int_0^t Y_s\,dW_s$, then $[M]_t = \int_0^t \|Y_s\|^2\,ds$. One celebrated
theorem of P Lévy states the following: if $(M_t)$
is a continuous $(\mathcal{F}_t)$-local martingale such that
$[M]_t = t$, then M is an $(\mathcal{F}_t)$-classical Brownian motion.
That theorem is called the "Lévy characterization
theorem of Brownian motion." The Itô formula constitutes
the natural generalization of the fundamental theorem of
differential calculus to stochastic calculus. Another
significant tool is the Girsanov theorem; it states essentially
the following: suppose that the following so-called
"Novikov condition" is verified:

$$ E\left(\exp\left(\frac{1}{2}\int_0^T \|Y_t\|^2\,dt\right)\right) < \infty $$

Then the process $\tilde W_t = W_t + \int_0^t Y_s\,ds$, $t \in [0,T]$, is
again an m-dimensional $(\mathcal{F}_t)$-classical Brownian
motion under a new probability measure Q on
$(\Omega, \mathcal{F}_T)$ defined by

$$ dQ = dP\,\exp\left(-\int_0^T Y_s\,dW_s - \frac{1}{2}\int_0^T \|Y_s\|^2\,ds\right) $$
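
The change of measure in the Girsanov theorem can be illustrated by Monte Carlo in the simplest case of a constant integrand $Y \equiv \theta$ with d = m = 1. The sketch below assumes the sign convention in which the density of the new measure with respect to P is $Z = \exp(-\theta W_T - \theta^2 T/2)$ and the shifted process is $W_T + \theta T$; it then checks that Z has mean close to 1 and that the shifted variable has mean close to 0 under the reweighted measure. Function names and sample sizes are illustrative assumptions.

```python
import math
import random

def girsanov_check(theta: float, T: float, n_samples: int, seed: int = 0):
    """For constant Y = theta, return Monte Carlo estimates of E_P[Z] and
    E_P[Z * (W_T + theta*T)], where Z = exp(-theta*W_T - theta^2*T/2)."""
    rng = random.Random(seed)
    sum_z = 0.0
    sum_zw = 0.0
    for _ in range(n_samples):
        w_T = rng.gauss(0.0, math.sqrt(T))           # W_T under P
        z = math.exp(-theta * w_T - 0.5 * theta * theta * T)
        sum_z += z
        sum_zw += z * (w_T + theta * T)              # shifted process, reweighted
    return sum_z / n_samples, sum_zw / n_samples

if __name__ == "__main__":
    mass, mean = girsanov_check(theta=0.5, T=1.0, n_samples=200000)
    print(f"E_P[Z] ~ {mass:.4f}, mean of shifted process under new measure ~ {mean:.4f}")
```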


Let $\xi$ be an $\mathcal{F}_0$-measurable r.v., for instance,
$\xi \equiv x \in \mathbb{R}^d$. We are interested in the SDE

$$ dX_t = a(t, X_t)\,dW_t + b(t, X_t)\,dt, \qquad X_0 = \xi \qquad [4] $$


Definition 3 A progressively measurable process
$(X_t)_{t\in[0,T]}$ is said to be a solution of [4] if a.s.

$$ X_t = \xi + \int_0^t a(s, X_s)\,dW_s + \int_0^t b(s, X_s)\,ds \qquad \forall t \in [0,T] \qquad [5] $$

provided that the right-hand side makes
sense. In particular, such a solution is continuous.
The function a (resp. b) is called the diffusion (resp. drift)
coefficient of the SDE. The coefficients a and b may
sometimes be allowed to be random; however, this dependence
has to be progressively measurable. Clearly, we can
define the notion of solution $(X_t)_{t\ge 0}$ on the whole
positive real axis.


We remark that those equations are called Itô
SDEs. A solution of the previous equation is called a
diffusion process.
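
To make the integral formulation concrete, a solution of an Itô SDE with drift b and diffusion coefficient a can be approximated on a time grid by the Euler-Maruyama scheme, which replaces the two integrals by left-point sums. This scheme is a standard numerical method and is not discussed in the article; the sketch below treats only the scalar case d = m = 1, and all names are illustrative assumptions.

```python
import math
import random

def euler_maruyama(a, b, x0, T, n_steps, rng):
    """Approximate dX = a(t, X) dW + b(t, X) dt with X_0 = x0 (scalar case)."""
    dt = T / n_steps
    t, x = 0.0, x0
    path = [x]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment over [t, t + dt]
        x = x + a(t, x) * dw + b(t, x) * dt
        t += dt
        path.append(x)
    return path

if __name__ == "__main__":
    rng = random.Random(0)
    # Ornstein-Uhlenbeck-type example: dX = 0.3 dW - X dt, X_0 = 1.
    path = euler_maruyama(lambda t, x: 0.3, lambda t, x: -x, 1.0, 1.0, 1000, rng)
    print(path[-1])
```

With a = 0 the scheme reduces to the explicit Euler method for the deterministic equation [1], which gives a simple sanity check.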


The Lipschitz Case 


The most natural framework for studying the 
existence and uniqueness for SDEs appears when 
the coefficients are Lipschitz. 

A function $\gamma: [0,T] \times \mathbb{R}^m \to \mathbb{R}^d$ is said to have
"polynomial growth" (with respect to x uniformly in
t), if for some n there is a constant C > 0 with

$$ \sup_{t\in[0,T]} \|\gamma(t,x)\| \le C\,(1 + \|x\|^n) \qquad [6] $$

The same function is said to have "linear growth" if
[6] holds with n = 1. A function $\gamma: \mathbb{R}_+ \times \mathbb{R}^m \to \mathbb{R}^d$ is
said to be "locally Lipschitz" (with respect to
x uniformly in t), if for every K > 0, the restriction
$\gamma|_{[0,T]\times[-K,K]^m}$ is Lipschitz (with respect to x uniformly
with respect to t).

Let $a: \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}^{d\times m}$, $b: \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}^d$ be
Borel functions, $\xi$ an $\mathcal{F}_0$-measurable $\mathbb{R}^d$-valued r.v.,
and $(W_t)_{t\ge 0}$ an m-dimensional $(\mathcal{F}_t)$-Brownian
motion.

Classical fixed-point theorems allow one to establish
the following classical result.

Theorem 1 We suppose a and b locally Lipschitz
with linear growth. Let $\xi$ be a square-integrable r.v.
that is $\mathcal{F}_0$-measurable. Then [4] has a unique
solution X. Moreover,

$$ E\Bigl(\sup_{t\le T} |X_t|^2\Bigr) < \infty $$


Remark 1


(i) Equation [4] can be settled similarly by putting the
initial condition x at some time s. In that case
the problem is again well posed. If $\xi \equiv x$ is a
deterministic point of $\mathbb{R}^d$, then we will often
denote by $X^{s,x}$ the solution of that problem.

(ii) If the coefficients are only locally Lipschitz, the
equation may be solved until a stopping time. If
d = 1, it is possible to state necessary and sufficient
conditions for nonexplosion (Feller test).

(iii) The theorem above admits several generalizations.
For instance, the Brownian motion can be
replaced by general semimartingales (possibly
with jumps, such as Lévy processes).



An important feature of diffusion processes is
that they provide probabilistic representations of
PDEs of parabolic (and even elliptic) type. We will
only mention here the parabolic framework.

We denote $A(t,x) = a(t,x)\,a(t,x)^{*}$, where $*$ means
transposition for matrices. $(t,x) \mapsto A(t,x) = (A_{ij}(t,x))$
is a $d \times d$ matrix-valued function. Let us consider also
continuous functions $k: [0,T] \times \mathbb{R}^d \to \mathbb{R}_+$, $g: [0,T] \times
\mathbb{R}^d \to \mathbb{R}$ with polynomial growth or non-negative.

Given a solution of [4], we can associate its
generator $(L_t, t \in [0,T])$ setting

$$ L_t f(x) = \frac{1}{2}\sum_{i,j=1}^{d} A_{ij}(t,x)\,\partial^2_{ij} f(x) + b(t,x)\cdot\nabla f(x) $$

The Feynman-Kac theorem is stated below; it
provides a probabilistic representation of an associated
linear parabolic PDE.


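Theorem 2 Suppose there is a function $v: [0,T] \times
\mathbb{R}^d \to \mathbb{R}$, continuous with polynomial growth and of
class $C^{1,2}([0,T] \times \mathbb{R}^d)$, satisfying the following
Cauchy problem:

$$ \partial_t v(t,x) + L_t v(t,x) - k(t,x)\,v(t,x) = g(t,x), \qquad v(T,x) = f(x) \qquad [7] $$

Then

$$ v(s,x) = E\left( f(X_T)\exp\left(-\int_s^T k(\theta, X_\theta)\,d\theta\right) - \int_s^T g(t, X_t)\exp\left(-\int_s^t k(\theta, X_\theta)\,d\theta\right) dt \right) $$

for $(s,x) \in [0,T] \times \mathbb{R}^d$, where $X = X^{s,x}$. In particular,
such a solution is unique.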
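
A quick numerical check of the Feynman-Kac representation is possible in the simplest special case, chosen here as an assumption for illustration: d = 1, a = 1, b = 0, k = 0, g = 0, so that the PDE is the backward heat equation $\partial_t v + \tfrac{1}{2}\partial^2_{xx} v = 0$ with $v(T,x) = f(x)$ and the representation reads $v(s,x) = E(f(x + W_T - W_s))$. For $f(y) = y^2$ the exact solution is $v(s,x) = x^2 + (T - s)$.

```python
import math
import random

def feynman_kac_heat(f, s, x, T, n_samples, seed=0):
    """Monte Carlo estimate of v(s, x) = E[f(x + W_T - W_s)], i.e. the
    Feynman-Kac formula with a = 1, b = 0, k = 0, g = 0 (special case)."""
    rng = random.Random(seed)
    sd = math.sqrt(T - s)
    return sum(f(x + rng.gauss(0.0, sd)) for _ in range(n_samples)) / n_samples

if __name__ == "__main__":
    estimate = feynman_kac_heat(lambda y: y * y, s=0.0, x=1.0, T=1.0, n_samples=100000)
    exact = 1.0 ** 2 + (1.0 - 0.0)
    print(f"Monte Carlo: {estimate:.4f}   exact: {exact:.4f}")
```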


Remark 2


(i) In order to obtain "classical solutions" of the
above Cauchy problem, one needs some conditions.
It is the case, for instance, when the
following ellipticity condition holds on A:

$$ \exists c > 0,\ \forall (t,x) \in [0,T] \times \mathbb{R}^d,\ \forall (\xi_1, \ldots, \xi_d) \in \mathbb{R}^d: \qquad \sum_{i,j=1}^{d} A_{ij}(t,x)\,\xi_i \xi_j \ge c \sum_{i=1}^{d} |\xi_i|^2 \qquad [8] $$

In the degenerate case, it is possible to deal with
viscosity solutions, in the sense of P L Lions.
This theorem establishes an important link
between deterministic PDEs and SDEs.

(ii) A natural generalization of the Feynman-Kac theorem
comes from the system of forward-backward
SDEs in the sense of Pardoux and Peng.

(iii) Other types of probabilistic representation do
appear in stochastic control theory through the
so-called verification theorems, see for instance,
Fleming and Soner (1993) and Yong and Zhou
(1999). In that case, the (nonlinear)
Hamilton-Jacobi-Bellman deterministic equation is
represented by a controlled SDE.

(iv) Another bridge between nonlinear PDEs and
diffusions can be provided in the framework of
interacting particle systems with chaos propagation,
see Graham et al. (1996) for a survey on
those problems. Among the most significant
nonlinear PDEs investigated probabilistically, we
quote the case of porous media equations. For
instance, for a positive integer m, a solution to

$$ \partial_t u = \frac{1}{2}\,\partial^2_{xx}\bigl(u^{2m+1}\bigr) \qquad [9] $$

can be represented by a (nonlinear) diffusion of
the type, see Benachour et al. (1996),

$$ dX_s = u^{m}(s, X_s)\,dW_s, \qquad u(t, \cdot) = \text{law density of } X_t \qquad [10] $$

Different Notions of Solutions 


Let a and b be as at the beginning of the previous
section. Let $(\Omega, \mathcal{F}, P)$ be a probability space with a
filtration $(\mathcal{F}_t)_{t\ge 0}$ fulfilling the usual conditions and an
$(\mathcal{F}_t)_{t\ge 0}$-classical Brownian motion $(W_t)_{t\ge 0}$. Let $\xi$ be
an $\mathcal{F}_0$-measurable r.v. In the section "Motivation
and preliminaries," we defined the notion of solution
of the following equation:

$$ dX_t = b(t, X_t)\,dt + a(t, X_t)\,dW_t \qquad [11] $$

This equation will be denoted by E(a, b) (without initial
condition). However, as we will see, the general
concept of solution of an SDE is more sophisticated
and subtle than in the deterministic case. We distinguish
several variants of existence and uniqueness.


Definition 4 (Strong existence). We will say that
equation E(a, b) admits strong existence if the
following holds. Given any probability space
$(\Omega, \mathcal{F}, P)$, a filtration $(\mathcal{F}_t)_{t\ge 0}$, an $(\mathcal{F}_t)_{t\ge 0}$-Brownian
motion $(W_t)_{t\ge 0}$, and an $\mathcal{F}_0$-measurable and
square-integrable r.v. $\xi$, there is a process $(X_t)_{t\ge 0}$ solution
to E(a, b) with $X_0 = \xi$ a.s.


Definition 5 (Pathwise uniqueness). We will say
that equation E(a, b) admits pathwise uniqueness if
the following property is fulfilled. Let $(\Omega, \mathcal{F}, P)$ be a
probability space with a filtration $(\mathcal{F}_t)_{t\ge 0}$ and an $(\mathcal{F}_t)_{t\ge 0}$-Brownian
motion $(W_t)_{t\ge 0}$. If two processes $X, \tilde X$ are
two solutions such that $X_0 = \tilde X_0$ a.s., then $X$ and $\tilde X$
coincide.


Definition 6 (Existence in law or weak existence).
Let $\nu$ be a probability law on $\mathbb{R}^d$. We will say that
$E(a, b; \nu)$ admits weak existence if there is a
probability space $(\Omega, \mathcal{F}, P)$, a filtration $(\mathcal{F}_t)_{t\ge 0}$, an
$(\mathcal{F}_t)_{t\ge 0}$-Brownian motion $(W_t)_{t\ge 0}$, and a process
$(X_t)_{t\ge 0}$ solution of E(a, b) with $\nu$ being the law of $X_0$.

We say that E(a, b) admits weak existence if
$E(a, b; \nu)$ admits weak existence for every $\nu$.


Definition 7 (Uniqueness in law). Let $\nu$ be a
probability law on $\mathbb{R}^d$. We say that $E(a, b; \nu)$ has a
unique solution in law if the following holds. We
consider an arbitrary probability space $(\Omega, \mathcal{F}, P)$ and
a filtration $(\mathcal{F}_t)_{t\ge 0}$ on it; we consider also another
probability space $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde P)$ equipped with another
filtration $(\tilde{\mathcal{F}}_t)_{t\ge 0}$; we consider an $(\mathcal{F}_t)_{t\ge 0}$-Brownian
motion $(W_t)_{t\ge 0}$ and an $(\tilde{\mathcal{F}}_t)_{t\ge 0}$-Brownian motion
$(\tilde W_t)_{t\ge 0}$; we suppose having a process $(X_t)_{t\ge 0}$ (resp. a
process $(\tilde X_t)_{t\ge 0}$) solution of E(a, b) on the first (resp.
on the second) probability space such that both the
laws of $X_0$ and $\tilde X_0$ are identical to $\nu$. Then $X$ and $\tilde X$
must have the same law as r.v.'s with values in
$E = C(\mathbb{R}_+)$ (or $C[0,T]$).

We say that E(a, b) has a unique solution in law if
$E(a, b; \nu)$ has a unique solution in law for every $\nu$.


There are important theorems which establish 
bridges among the preceding notions. One of the 
most celebrated is the following. 


Proposition 1 (Yamada-Watanabe). Consider the
equation E(a, b).


(i) Pathwise uniqueness implies uniqueness in law. 
(ii) Weak existence and pathwise uniqueness imply 
strong existence. 


A version can be stated for E(a,b;v) where v is a 
fixed probability law. 


Remark 3 


(i) If a and b are locally Lipschitz with linear
growth, Theorem 1 implies that E(a, b) admits
strong existence and pathwise uniqueness.

(ii) If a and b are only locally Lipschitz, then 
pathwise uniqueness is fulfilled. 


Existence and Uniqueness in Law 


A way to create weak solutions of E(1, b) when
$(t,x) \mapsto b(t,x)$ is Borel with linear growth is the
Girsanov theorem. Suppose d = 1 for simplicity. Let
us consider an $(\mathcal{F}_t)$-classical Brownian motion $(X_t)$.
We set

$$ W_t = X_t - \int_0^t b(s, X_s)\,ds $$

Under some suitable probability Q, $(W_t)$ is an $(\mathcal{F}_t)$-classical
Brownian motion. Therefore, $(X_t)$ provides
a solution to $E(1, b; \delta_0)$.

We continue with an example where E(a, b) does 
not admit pathwise uniqueness, even though it 
admits uniqueness in law. 


Example 1 We consider the stochastic equation

$$ X_t = \int_0^t \operatorname{sign}(X_s)\,dW_s \qquad [12] $$

with

$$ \operatorname{sign}(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 \end{cases} $$

It corresponds to $E(a, b; \delta_0)$ with b = 0 and
$a(x) = \operatorname{sign}(x)$.

If $(W_t)_{t\ge 0}$ is an $(\mathcal{F}_t)$-classical Brownian motion,
then $(X_t)_{t\ge 0}$ is a continuous $(\mathcal{F}_t)_{t\ge 0}$-local martingale
vanishing at zero such that $[X]_t = t$. According to the
Lévy characterization theorem stated earlier, X is an
$(\mathcal{F}_t)_{t\ge 0}$-classical Brownian motion. This shows in
particular that $E(a, b; \delta_0)$ admits uniqueness in law.
In the sequel, we will show that $E(a, b; \delta_0)$ also
admits weak existence.

Let now $(\Omega, \mathcal{F}, P)$ be a probability space with a filtration,
$(W_t)_{t\ge 0}$ an $(\mathcal{F}_t)_{t\ge 0}$-classical Brownian motion,
and $(X_t)_{t\ge 0}$ such that [12] is verified. Then
$\tilde X_t = -X_t$ can also be shown to be a solution.
Therefore, $E(a, b; \delta_0)$ does not admit pathwise
uniqueness.
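
The failure of pathwise uniqueness in this example can be explored on a discretised path. The sketch below is an illustration under stated assumptions, not a proof: it starts from a sampled Brownian path X, defines the driving increments by dW = sign(X) dX, and then forms the left-point (Itô-type) sums of sign(X) dW and of sign(-X) dW. The first sum rebuilds X exactly. The second rebuilds -X except for the single step at which X sits at 0 (the starting point), which leaves a discrepancy of twice the first increment, of order $\sqrt{dt}$, that vanishes as the mesh is refined. All names are hypothetical.

```python
import math
import random

def sign(x: float) -> float:
    """sign(x) = 1 for x >= 0 and -1 for x < 0."""
    return 1.0 if x >= 0 else -1.0

def tanaka_demo(T: float, n_steps: int, seed: int = 0):
    """Return (X_T, sum of sign(X) dW, sum of sign(-X) dW, first increment of X)
    for a discretised Brownian path X and driving noise dW = sign(X) dX."""
    rng = random.Random(seed)
    dt = T / n_steps
    x = 0.0
    int_x = 0.0       # left-point sum of sign(X) dW, rebuilds X
    int_minus = 0.0   # left-point sum of sign(-X) dW, rebuilds -X up to O(sqrt(dt))
    first_dx = None
    for _ in range(n_steps):
        dx = rng.gauss(0.0, math.sqrt(dt))
        if first_dx is None:
            first_dx = dx
        dw = sign(x) * dx
        int_x += sign(x) * dw
        int_minus += sign(-x) * dw
        x += dx
    return x, int_x, int_minus, first_dx

if __name__ == "__main__":
    x_T, int_x, int_minus, first_dx = tanaka_demo(1.0, 10000)
    print(f"X_T = {x_T:.6f}, integral for X = {int_x:.6f}")
    print(f"-X_T = {-x_T:.6f}, integral for -X = {int_minus:.6f}")
    print(f"discrepancy = {int_minus + x_T:.6f}, 2 * first increment = {2 * first_dx:.6f}")
```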


We continue stating a result true in the multi- 
dimensional case. 
Proposition 3 (Stroock-Varadhan). Let $\nu$ be a
probability on $\mathbb{R}^d$ such that

$$ \int_{\mathbb{R}^d} \|x\|^{2m}\,\nu(dx) < +\infty \qquad [13] $$

for a certain m > 1. We suppose that a, b are
continuous with linear growth. Then $E(a, b; \nu)$
admits weak existence.


From now on, a function $\gamma: [0,T] \times \mathbb{R}^m \to \mathbb{R}^d$ will
be said to be Hölder-continuous if it is Hölder-continuous
in the space variable $x \in \mathbb{R}^m$ uniformly with respect
to the time variable $t \in [0,T]$.

Stroock and Varadhan (1979) also provide the
following result, which is an easy consequence of
their theorem 7.2.1.


Proposition 4 We suppose a, b both Hölder-continuous
and bounded, such that condition [8] is
fulfilled. Then the SDE $E(a, b; \nu)$ admits weak uniqueness.


Remark 4


(i) The Hölder condition and [8] in Proposition 4
may be relaxed and replaced with the solvability
of a Cauchy problem of a parabolic PDE
with suitable terminal value.
(ii) In the case d = 1, if a, b are bounded and just
Borel with [8] for x on each compact, then
$E(a, b; \nu)$ admits weak existence and uniqueness
in law. See Stroock and Varadhan (1979,
exercises 7.3.2 and 7.3.3).
(iii) If d = 2, the same holds as in the previous point
provided that, moreover, a does not depend on
time.

We proceed with some more specifically one-dimensional material, stating some results of H J Engelbert and W Schmidt, who furnished necessary and sufficient conditions for weak existence and uniqueness in law of SDEs.

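For a Borel function $\sigma : \mathbb{R} \to \mathbb{R}$, we first define

$$Z(\sigma) = \{x \in \mathbb{R} \mid \sigma(x) = 0\}$$

then we define the set $I(\sigma)$ as the set of real numbers $x$ such that

$$\int_{x-\varepsilon}^{x+\varepsilon} \frac{dy}{\sigma^2(y)} = \infty, \quad \forall \varepsilon > 0$$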

Proposition 5 (Engelbert-Schmidt criterion). Suppose that $a : \mathbb{R} \to \mathbb{R}$, that is, $a$ does not depend on time, and consider the equation without drift $E(a,0)$.

(i) $E(a,0)$ admits weak existence (without explosion) if and only if

$$I(a) \subset Z(a) \qquad [14]$$

(ii) $E(a,0)$ admits weak existence and uniqueness in law if and only if

$$I(a) = Z(a) \qquad [15]$$

Remark 5

(i) If $a$ is continuous, then [14] is always verified. Indeed, if $a(x) \ne 0$, there is $\varepsilon > 0$ such that $|a(y)| > 0$, $\forall y \in [x-\varepsilon, x+\varepsilon]$, so that $1/a^2$ is bounded on this compact interval. Therefore, $x$ cannot belong to $I(a)$.

(ii) Equation [14] is verified also for some discontinuous functions as, for instance, $a(x) = \operatorname{sign}(x)$. This confirms what was affirmed previously, that is, the weak existence (and uniqueness in law) for $E(a,0)$.

(iii) If $a(x) = 1_{\{0\}}(x)$, [14] is not verified.

(iv) If $a(x) = |x|^\alpha$, $\alpha \ge 1/2$, then

$$Z(a) = I(a) = \{0\}$$

So there is at most one solution in law for $E(a,0)$.

(v) The proof is technical and makes use of the Lévy characterization theorem of Brownian motion.


Results on Pathwise Uniqueness 


Proposition 6 (Yamada-Watanabe). Let $a, b : \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R}$ and consider again $E(a,b)$. Suppose $b$ globally Lipschitz and $h : \mathbb{R}_+ \to \mathbb{R}_+$ strictly increasing and continuous such that

(i) $h(0) = 0$;
(ii) $\int_0^\varepsilon (1/h^2)(y)\,dy = \infty$, $\forall \varepsilon > 0$; and
(iii) $|a(t,x) - a(t,y)| \le h(|x-y|)$.

Then pathwise uniqueness is verified.

Remark 6

(i) In Proposition 6, one typical choice is $h(u) = u^\alpha$, $\alpha \ge 1/2$.

(ii) Pathwise uniqueness for $E(a,b)$ holds therefore if $b$ is globally Lipschitz and $a$ is Hölder-continuous with parameter equal to 1/2.


Corollary 1 Suppose that the assumptions of 
Proposition 6 are verified and a,b continuous with 
linear growth. Then E(a,b;v) admits strong exis- 
tence and pathwise uniqueness, whenever v verifies 
condition [13]. 


Proof It follows from Propositions 6 and 3 together with Proposition 1 (ii). $\square$


Remark 7 Suppose $d = 1$. Pathwise uniqueness for $E(a,b)$ also holds under the following assumptions.

(i) $a, b$ are bounded, $a$ is time independent and $a \ge \mathrm{const.} > 0$, $h$ as in Proposition 6. This result has an analogous form in the case of spacetime white noise driven SPDEs of parabolic type, as proved by Bally, Gyöngy, and Pardoux in 1994.

(ii) $a$ independent of time, $b$ bounded and $a \ge \mathrm{const.} > 0$; moreover, $|a(x) - a(y)| \le |f(y) - f(x)|$ and $f$ is increasing and bounded.


For illustration we provide some significant 
examples. 


Example 2

$$X_t = \int_0^t |X_s|^\alpha\,dW_s, \quad t \ge 0 \qquad [16]$$

We set $a(x) = |x|^\alpha$, $0 < \alpha < 1$. This is equation $E(a,0)$ with $a(x) = |x|^\alpha$. According to the Engelbert-Schmidt notations, we have $Z(a) = \{0\}$. Moreover

(i) If $\alpha \ge 1/2$, then $I(a) = \{0\}$.
(ii) If $\alpha < 1/2$, then $I(a) = \emptyset$.

Therefore, according to Proposition 5, $E(a,0)$ admits weak existence. On the other hand, if $\alpha \ge 1/2$,

$$\bigl|\,|x|^\alpha - |y|^\alpha\,\bigr| \le h(|x-y|) \qquad [17]$$

where $h(z) = z^\alpha$. According to Proposition 6, [16] admits pathwise uniqueness and, by Corollary 1, also strong existence. The unique solution is $X \equiv 0$. If $\alpha < 1/2$, $X \equiv 0$ is always a solution. This is not the only one; even uniqueness in law is not true.
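
The sets $Z(a)$ and $I(a)$ of Example 2 can be checked by a direct computation, sketched here for convenience (the computation is standard and is not quoted from the source). Since $a(x) = |x|^\alpha$ is continuous and vanishes only at $0$, only $x = 0$ can belong to $I(a)$, and

```latex
\int_{-\varepsilon}^{\varepsilon} \frac{dy}{a^{2}(y)}
  = 2\int_{0}^{\varepsilon} y^{-2\alpha}\,dy
  = \begin{cases}
      \dfrac{2\,\varepsilon^{1-2\alpha}}{1-2\alpha} < \infty, & 0 < \alpha < \tfrac12, \\[2ex]
      +\infty, & \alpha \ge \tfrac12 .
    \end{cases}
```

Hence $I(a) = \{0\}$ exactly when $\alpha \ge 1/2$ and $I(a) = \emptyset$ otherwise. In both cases $I(a) \subset Z(a)$, while $I(a) = Z(a)$ holds only for $\alpha \ge 1/2$.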


Example 3 Let $a(x) = \sqrt{|x|}$, $b$ Lipschitz. Then $E(a,b)$ admits strong existence and pathwise uniqueness. In fact, $a$ is Hölder-continuous with parameter 1/2 and the second item of Remark 6 applies; so pathwise uniqueness holds. Strong existence is a consequence of Propositions 3 and 1 (ii).

An interesting particular case is provided by the following equation. Let $x_0, \sigma, \delta \ge 0$, $k \in \mathbb{R}$. The following equation admits strong existence and pathwise uniqueness.

$$Z_t = x_0 + \sigma\int_0^t \sqrt{|Z_s|}\,dW_s + \int_0^t (\delta - kZ_s)\,ds, \quad t \in [0,T] \qquad [18]$$

Equation [18] is widely used in mathematical finance and it constitutes the model of Cox-Ingersoll-Ross: the solution of the mentioned equation represents the short interest rate.
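
A short simulation can make the equation concrete. The sketch below is an Euler-Maruyama discretization of a square-root diffusion with drift written as $\delta - kZ_s$ (an assumed sign convention) and with the absolute value kept under the square root. The function names `simulate_cir` and `cir_mean`, the parameter values and the step sizes are hypothetical choices for illustration, not taken from the source. The closed-form mean used for comparison follows from taking expectations in the equation, which gives a linear ODE for the mean.

```python
import numpy as np


def simulate_cir(x0, sigma, delta, k, T=1.0, n_steps=500, n_paths=20_000, seed=0):
    """Euler-Maruyama sketch for
        Z_t = x0 + sigma * int sqrt|Z_s| dW_s + int (delta - k Z_s) ds.

    Returns the simulated values Z_T (one per path). The absolute value under
    the square root keeps the scheme defined if a discrete step overshoots
    below zero.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    z = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)
        z = z + sigma * np.sqrt(np.abs(z)) * dw + (delta - k * z) * dt
    return z


def cir_mean(x0, delta, k, T):
    # Mean of Z_T from the linear ODE m' = delta - k m, m(0) = x0.
    if k == 0:
        return x0 + delta * T
    return x0 * np.exp(-k * T) + (delta / k) * (1.0 - np.exp(-k * T))


z_T = simulate_cir(x0=1.0, sigma=0.5, delta=0.3, k=1.5)
print("Monte Carlo mean:", z_T.mean(), " closed-form mean:", cir_mean(1.0, 0.3, 1.5, 1.0))
```

With `k=0` and `sigma=2` the same function discretizes the special case of the equation with constant drift $\delta$.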

Consider now the particular case where $k = 0$, $\sigma = 2$. According to a comparison theorem for SDEs, the solution $Z$ is always non-negative and therefore the absolute value may be omitted. The equation becomes

$$Z_t = x_0 + 2\int_0^t \sqrt{Z_s}\,dW_s + \delta t \qquad [19]$$

Definition 8 The unique solution $Z$ to

$$Z_t = x_0 + 2\int_0^t \sqrt{Z_s}\,dW_s + \delta t \qquad [20]$$

is called “square $\delta$-dimensional Bessel process” starting at $x_0$; it is denoted by $\mathrm{BESQ}^\delta(x_0)$; for fine properties of this process, see Revuz and Yor (1999, ch. IX.3).

Since $Z \ge 0$, we call $\delta$-dimensional Bessel process starting from $\sqrt{x_0}$ the process $X = \sqrt{Z}$. It is denoted by $\mathrm{BES}^\delta(\sqrt{x_0})$.


Remark 8 Let $d \ge 1$. Let $W = (W^1, \ldots, W^d)$ be a classical $d$-dimensional Brownian motion. We set $X_t = \|W_t\|$. Then $(X_t)_{t\ge 0}$ is a $d$-dimensional Bessel process.


Remark 9 If $\delta > 1$, it is possible to see that

$$X_t = X_0 + W_t + \frac{\delta - 1}{2}\int_0^t \frac{ds}{X_s}$$


The Case with Distributional Drift 


Pioneering work about diffusions with generalized 
drift was presented by N I Portenko, but in the 
framework of semimartingale processes. Recently, 
some work was done characterizing solutions in the 
class of the so-called Dirichlet processes, with some 
motivations in random irregular environment. 

A useful transformation in the theory of SDEs is the so-called “Zvonkin transformation.” Let $(W_t)$ be an $(\mathcal{F}_t)$-classical Brownian motion. Let $a$ (resp. $b$) $: \mathbb{R} \to \mathbb{R}$ (resp. $C^1$) be locally bounded. We suppose moreover $a > 0$. We fix $x_0 \in \mathbb{R}$. Let $(X_t)_{t\ge 0}$ be a solution of

$$X_t = x_0 + \int_0^t b(X_s)\,ds + \int_0^t a(X_s)\,dW_s \qquad [21]$$


We set

$$\Sigma(x) = \int_0^x \frac{2b}{a^2}(y)\,dy$$

and we define $h : \mathbb{R} \to \mathbb{R}$ such that

$$h' = e^{-\Sigma}, \qquad h(0) = 0;$$

$h$ is strictly increasing. We set $\tilde a(x) = (a h')(h^{-1}(x))$, where $h^{-1}$ is the inverse of $h$. We set $Y_t = h(X_t)$. Without entering into details, the classical Itô formula allows one to show that $(Y_t)$ defines a solution of

$$dY_t = \tilde a(Y_t)\,dW_t, \qquad Y_0 = h(x_0) \qquad [22]$$


Now, eqn [22] fulfills the requirements of the Engelbert-Schmidt criterion, so that it admits weak existence and uniqueness in law. Consequently, provided there is no explosion, one can easily establish the same well-posedness for [21].
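
The removal of the drift can be made explicit in a smooth case. As a sketch, assume in addition that $b$ is continuous, so that $h$ is twice continuously differentiable (an assumption made here only so that the classical Itô formula applies directly). Writing $\Sigma(x) = \int_0^x (2b/a^2)(y)\,dy$ and $h' = e^{-\Sigma}$, one has $h'' = -(2b/a^2)\,h'$, and for $Y_t = h(X_t)$

```latex
\begin{aligned}
dY_t &= h'(X_t)\,dX_t + \tfrac12\,h''(X_t)\,a^2(X_t)\,dt \\
     &= h'(X_t)\,a(X_t)\,dW_t
        + \Bigl[\,h'(X_t)\,b(X_t) - \frac{b(X_t)}{a^2(X_t)}\,h'(X_t)\,a^2(X_t)\Bigr]dt \\
     &= (a\,h')\bigl(h^{-1}(Y_t)\bigr)\,dW_t .
\end{aligned}
```

The bracket vanishes identically, which is the reason for choosing $h$ with $h' = e^{-\Sigma}$.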

Zvonkin transformation also allows one to prove strong existence and pathwise uniqueness results for [21]; for instance, when

• $a$ has linear growth, and
• $x \mapsto \int_0^x \frac{b(s)}{a^2(s)}\,ds$ is a bounded function.





In fact, problem [22] satisfies pathwise uniqueness 
and strong existence since the coefficients are 
Lipschitz with linear growth. Therefore, one can 
deduce the same for [21]. 

Veretennikov generalized the Zvonkin transformation to the $d$-dimensional case in some cases which include the case $a = 1$ and $b$ bounded Borel.

Zvonkin’s procedure also suggests considering a formal equation of the type

$$dX_t = a(X_t)\,dW_t + \gamma'(X_t)\,dt \qquad [23]$$

where $\gamma$ is only a continuous function and so $b = \gamma'$ is a Schwartz distribution; $\gamma$ could be, for instance, the realization of a Brownian motion independent of $W$. Therefore, eqn [23] is motivated by the study of irregular random media. When $a = 1$, $b = \gamma'$, SDE [22], with $h' = e^{-\Sigma}$ and $\Sigma = 2(\gamma - \gamma(0))$, still makes sense.

Using the Engelbert-Schmidt criterion, one can see that problem [22] still admits weak existence and uniqueness in law. If $Y$ is a solution of [22], $X = h^{-1}(Y)$ provides a natural candidate solution for [21]. R F Bass and Z-Q Chen, and F Flandoli, F Russo, and J Wolf investigated generalized SDEs such as [23]: in particular, they made the previous reasoning rigorous, respectively, in the case of strong and weak solutions, see Flandoli et al. (2003).


Connected Topics 

We aim here at giving some basic references about topics which are closely connected to SDEs.

Stochastic Partial Differential Equations (SPDEs)

If an SDE is a random perturbation of an ordinary differential equation, an SPDE is a random perturbation of a PDE. Several studies were performed in the parabolic (evolution equation) and hyperbolic (wave equation) cases. Most of the work was done in the case of a fixed underlying probability space. We only quote two basic monographs which should be consulted first before getting into the subject: that of Walsh (1986) and that of Da Prato and Zabczyk (1992).

However, it was possible to establish some results 
about weak existence and uniqueness in law for 
SPDEs. One possible tool was a generalization of 
Girsanov theorem to the case of Gaussian spacetime 
white noise. Weak existence for the stochastic 
quantization equation was proved with the help of 
infinite-dimensional Dirichlet forms by S Albeverio 
and M Röckner. 

We also indicate a beautiful recent monograph by Da Prato (2004) which pays particular attention to Kolmogorov equations with infinitely many variables.


Numerical Approximations 


Relevant work was done in numerical approximation of solutions to SDEs and related approximations of solutions to linear parabolic equations via the Feynman-Kac probabilistic representation (see Theorem 2). It seems that stochastic simulations (of improved Monte Carlo type and related topics) for solving deterministic problems are efficient when the space dimension is greater than 4.
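
As a minimal illustration of the probabilistic approach, the sketch below treats only the simplest case, the heat equation $\partial_t u = \tfrac12 \partial_{xx} u$ with initial datum $f$, whose solution is $u(t,x) = E[f(x + W_t)]$. This special case, the function name `heat_mc` and the test datum $f(x) = x^2$ (for which $u(t,x) = x^2 + t$) are choices made for the example and are not taken from Theorem 2.

```python
import numpy as np


def heat_mc(f, t, x, n_samples=200_000, seed=0):
    """Monte Carlo estimate of u(t, x) = E[f(x + W_t)], which solves
    du/dt = (1/2) d^2u/dx^2 with u(0, .) = f (illustrative helper)."""
    rng = np.random.default_rng(seed)
    w_t = rng.normal(0.0, np.sqrt(t), n_samples)   # W_t ~ N(0, t)
    values = f(x + w_t)
    estimate = values.mean()
    std_error = values.std(ddof=1) / np.sqrt(n_samples)
    return estimate, std_error


est, err = heat_mc(lambda y: y**2, t=0.5, x=1.0)
print(f"Monte Carlo: {est:.4f} +/- {err:.4f}, exact: {1.0**2 + 0.5:.4f}")
```

The statistical error of such an estimator decays like the inverse square root of the number of samples, whatever the space dimension, which is the usual reason why simulation becomes competitive with grid-based methods in higher dimension.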


Malliavin Calculus 


Malliavin calculus is a wide topic (see Malliavin Calculus). Relevant applications of it concern stochastic (ordinary and partial) differential equations. We only quote a monograph of Nualart (1995) on those applications. Two main objects were studied.

• Given a solution $(X_t)$ of an SDE, sufficient conditions so that $X_t$, $t > 0$, has a (smooth) density $p(t,\cdot)$. Small-time asymptotics of this density, when $t \to 0$, and small-drift perturbations were investigated, refining Freidlin-Ventsell large-deviation estimates.

• Coming back to SDE [11], one can consider coefficients $a, b$ nonadapted with respect to the underlying filtration $(\mathcal{F}_t)$. On the other hand, the initial condition $\xi$ may be anticipating, that is, not $\mathcal{F}_0$-measurable. In that case, the Itô integral $\int_0^t a(s, X_s)\,dW_s$ is not defined. A replacement tool is the so-called “Skorohod integral.”


Rough Paths Approach 


A very successful and significant research field is rough path theory. In the case of dimension $d = 1$, the Doss-Sussmann method allows one to transform the solution of an SDE into the solution of an ordinary (random) differential equation. In particular, that solution can be seen as depending (pathwise) continuously on the driving Brownian motion $(W_t)$ with respect to the usual topology of $C([0,T])$. Apart from exceptional cases, this continuity does not hold in general dimension $d > 1$. Rough path theory, introduced by T Lyons, allows one to recover this continuity in a suitable sense and establishes a true pathwise stochastic integration.
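
A concrete one-dimensional instance may help. For the Stratonovich equation $dX_t = X_t \circ dW_t$ (chosen here as an assumption for illustration; the source does not single out an example), the Doss-Sussmann representation reduces to $X_t = x_0 \exp(W_t)$, which is a continuous functional of the path $W$. The Python sketch below compares this pathwise formula with a Heun (Stratonovich-consistent) discretization; the function name and the step sizes are hypothetical.

```python
import numpy as np


def doss_sussmann_vs_heun(x0=1.0, T=1.0, n_steps=10_000, seed=0):
    """Compare X_t = x0 * exp(W_t) with a Heun scheme for dX = X o dW."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), n_steps)
    W = np.concatenate(([0.0], np.cumsum(dW)))

    # Pathwise (Doss-Sussmann) solution: flow of the ODE y' = y evaluated at W_t.
    exact = x0 * np.exp(W)

    # Heun predictor-corrector scheme, consistent with the Stratonovich integral.
    heun = np.empty(n_steps + 1)
    heun[0] = x0
    for i in range(n_steps):
        predictor = heun[i] + heun[i] * dW[i]
        heun[i + 1] = heun[i] + 0.5 * (heun[i] + predictor) * dW[i]
    return W, exact, heun


W, exact, heun = doss_sussmann_vs_heun()
print("max relative gap along the path:", np.max(np.abs(heun / exact - 1.0)))
```

Because the solution is an explicit continuous function of $W_t$, two driving paths that are uniformly close on $[0,T]$ give uniformly close solutions in this example.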


SDEs Driven by Non-semimartingales 


At the moment, there is a very intense activity 
towards SDEs driven by processes which are not 
semimartingales. In this perspective, we list SDEs 
driven by fractional Brownian motion with the help 
of rough paths theory, using fractional and Young 
type integrals and involving finite cubic variation 
processes. Among the contributors in that area we
quote L Coutin, R Coviello, M Errami, M Gubinelli,
Z Qian, F Russo, P Vallois, and M Zähle.


See also: Fractal Dimensions in Dynamics; Image 
Processing: Mathematics; Interacting Stochastic Particle 
Systems; Lagrangian Dispersion (Passive Scalar); 
Malliavin Calculus; Path Integrals in Noncommutative 
Geometry; Quantum Dynamical Semigroups; Quantum 
Fields with Indefinite Metric: Non-Trivial Models; Random 
Dynamical Systems; Random Walks in Random 
Environments; Stochastic Hydrodynamics; Stochastic 
Resonance.

Introduction


Mathematical models in hydrodynamics are intro- 
duced to describe the motion of fluids. The basic 
equations for Newtonian incompressible fluids are 
the Euler and the Navier-Stokes equations, for 
inviscid and viscous fluids, respectively. For a given 
set of body forces acting on the fluid, these 
nonlinear partial differential equations (PDEs) 
model the evolution in time of the velocity and 
pressure at each point of the fluid, given the initial 
velocity and suitable boundary conditions (see 
Partial Differential Equations: Some Examples). 
The equations of hydrodynamics offer challenging 
mathematical problems, like proving the existence 
and uniqueness of solutions, determining their 
regularity, their asymptotic behavior for large time, 
and their stability. To gain some insight into the behavior of fluids, stochastic analysis is introduced into hydrodynamics. In fact, there are various attempts to describe the turbulent regime (see Turbulence Theories). But analyzing individual solutions that determine the flow at any time, for a given initial condition, is a hopeless task, since the dynamics in a turbulent regime is chaotic and highly unstable. This is a particular chaotic motion with some characteristic statistical properties (see Monin and Yaglom (1987)). The aim of a statistical description of turbulent flow is to single out some relevant collective properties of the flow that, hopefully, make it possible to grasp the salient features of the dynamics. In this sense, stochastic hydrodynamics is germane to the kinetic gas theory. In the next section we shall review a typical topic of stochastic hydrodynamics, the evolution of probability measures. Results on stationary probability measures will be given in the subsequent sections.

Another characteristic of turbulent flows is the lack of space regularity of the velocity field. We shall introduce in the section “The stochastic Navier-Stokes equations” a stochastic model of turbulence, which exhibits lack of regularity of the solutions.

The Euler equations are a singular limit of the 
Navier-Stokes equations, since they are first order, 
instead of second-order PDEs. It is little surprise if they 
involve different mathematical techniques. A full sec- 
tion will be devoted to a discussion of Euler equations 
and another to the Navier-Stokes equations. Statistics 
of an inviscid flow, when approximated by vortex 
motion, will be described in the final section. 


Statistical Solutions 


Let $u(t,x)$ be the fluid velocity at time $t$ and point $x \in D \subset \mathbb{R}^d$; since the initial velocity is always affected by experimental errors, it is reasonable to assign a measure $\nu$ determining the probability that the initial velocity belongs to a Borel set $\Gamma$ of the space $H$ of all admissible velocity fields $u = u(x)$.

A spatial statistical solution is a family of probability measures $\mu(t,\cdot)$, $t \ge 0$, each supported on the set $H$ such that, given any Borel set $\Gamma$ in $H$, we have

$$\mathrm{Prob}\{u(t,\cdot) \in \Gamma\} = \mu(t,\Gamma), \quad \forall t \ge 0 \qquad [1]$$

with the initial condition $\mu(0,\Gamma) = \nu(\Gamma)$. The construction and analysis of statistical solutions $\mu(t,\cdot)$ is one of the crucial mathematical problems in stochastic hydrodynamics (see, e.g., Vishik and Fursikov (1988)).

Hopf gave the first mathematical formulation of the problem of describing turbulent flows by statistical solutions. The first result on the existence of statistical solutions is by Foias in 1973. Hopf (1952) presented an equation in variational derivatives satisfied by the characteristic functional $\chi(t,\varphi)$ of the family of measures $\mu(t,\cdot)$ associated with the Navier-Stokes equations. The characteristic functional $\chi(t,\varphi)$ is the Fourier transform of the measure $\mu(t,\cdot)$:

$$\chi(t,\varphi) = \int_H e^{i(u,\varphi)}\,\mu(t,du) \qquad [2]$$

defined for any smooth test function $\varphi$.

We now derive the evolution equation for $\chi(t,\varphi)$, by assuming that the dynamics takes place in the phase space $H$ and follows the nonlinear equation

$$\frac{du}{dt} = F(u) \qquad [3]$$

If $u^v(t)$ is the solution started from $v$ at time $t = 0$, then its probability distribution is represented by the time-evolved measure $\mu(t,\cdot)$. Therefore, we have that

$$\int_H e^{i(v,\varphi)}\,\mu(t,dv) = \int_H e^{i(u^v(t),\varphi)}\,\mu(0,dv) \qquad [4]$$

Differentiating in time, we obtain

$$\frac{d}{dt}\chi(t,\varphi) = i\int_H e^{i(u^v(t),\varphi)}\,(\varphi, F(u^v(t)))\,\mu(0,dv) = i\int_H e^{i(v,\varphi)}\,(\varphi, F(v))\,\mu(t,dv) \qquad [5]$$

The last integral is uniquely determined by $\chi$, since the measure $\mu(t,\cdot)$ is uniquely determined by $\chi(t,\varphi)$. We denote by $\Phi_\chi(t,\varphi)$ the last integral in [5]. The evolution equation thus obtained for the characteristic functional $\chi$ is

$$\frac{d}{dt}\chi(t,\varphi) = i\,\Phi_\chi(t,\varphi), \quad \forall \varphi \qquad [6]$$

This is called the Hopf equation associated with the dynamical system [3].
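
Equation [5] can be checked numerically in a one-dimensional toy model. The sketch below assumes $H = \mathbb{R}$, $F(u) = -u$ and a standard Gaussian initial measure, none of which comes from the source. For this choice $u^v(t) = v e^{-t}$ and $\chi(t,\varphi) = \exp(-\varphi^2 e^{-2t}/2)$, so the Monte Carlo value of the right-hand side of [5] can be compared with the exact time derivative of $\chi$. All function names are hypothetical.

```python
import numpy as np


def hopf_rhs_mc(t, phi, n_samples=200_000, seed=0):
    """Monte Carlo value of i * int e^{i v phi} (phi * F(v)) mu(t, dv)
    for the toy dynamics du/dt = F(u) = -u with mu(0, .) = N(0, 1)."""
    rng = np.random.default_rng(seed)
    v0 = rng.normal(0.0, 1.0, n_samples)      # samples of the initial measure
    u_t = v0 * np.exp(-t)                     # u^v(t): samples of mu(t, .)
    return 1j * np.mean(np.exp(1j * u_t * phi) * phi * (-u_t))


def chi_exact(t, phi):
    # Characteristic function of N(0, e^{-2t}).
    return np.exp(-0.5 * phi**2 * np.exp(-2.0 * t))


def dchi_dt_exact(t, phi):
    # Time derivative of chi_exact, computed by hand.
    return phi**2 * np.exp(-2.0 * t) * chi_exact(t, phi)


t, phi = 0.3, 1.2
print("right-hand side of [5] (Monte Carlo):", hopf_rhs_mc(t, phi))
print("d chi / dt (exact):                  ", dchi_dt_exact(t, phi))
```

In this linear Gaussian example the imaginary part of the Monte Carlo value is pure sampling noise, since the exact derivative is real.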

Another way to analyze the evolution of measures is through the moments; instead of the measure $\mu(t,\cdot)$ describing the spatial statistical solution, we deal with the moments of $\mu(t,\cdot)$ of any order. For a nonlinear dynamics [3], the moment equations are an infinite chain of coupled equations, the so-called Friedmann-Keller equations.

A prominent role among statistical solutions is played by stationary solutions. They contain all the statistical information in the case of equilibrium in time. The characteristic functional of an invariant measure is constant in time. Therefore,

$$\frac{d}{dt}\chi(t,\varphi) = 0$$

Bearing in mind equation [5], this is equivalent to saying that the signed measure $(\varphi, F(v))\,\mu(t,dv)$ vanishes, for any test function $\varphi$ and time $t$. Setting $t = 0$, we obtain that an invariant measure $\nu$ in the space $H$ satisfies the Liouville equation

$$\int_H (\varphi(v), F(v))\,d\nu(v) = 0 \qquad [7]$$

for appropriate test functions $\varphi$. This equation is also called the relation of infinitesimal invariance and the measure $\nu$ is said to be infinitesimally invariant.

The stationary measures are natural candidates to describe the statistical asymptotic behavior of the system when $t \to \infty$. Notice that, in a chaotic system, two motions that are arbitrarily close to one another at $t = 0$ can evolve in completely different ways. So, to describe the dynamics satisfactorily, we take averages over a large number of experiments. This is the so-called ensemble average. These averages are assumed to be with respect to an invariant measure $\mu$. The invariant measures must exist and either they are unique or at most one has physical meaning and enters in the functional integral defining the ensemble average. According to the ergodic principle (an assumption not yet proved in hydrodynamics), ensemble averages replace long-time averages: for every initial velocity field $v$, except for a set of initial values negligible in some sense, the time average of an observable $\psi$ tends, as time goes to infinity, to the ensemble average

$$\lim_{T\to\infty} \frac{1}{T}\int_0^T \psi(u^v(t))\,dt = \int_H \psi\,d\mu \qquad [8]$$
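
The meaning of [8] can be illustrated on a toy system for which ergodicity is known. The sketch below uses the one-dimensional Ornstein-Uhlenbeck equation $dX = -X\,dt + \sqrt{2}\,dW$, whose invariant measure is $N(0,1)$, and the observable $\psi(x) = x^2$. This model, the Euler discretization and all names are assumptions chosen for the example; they say nothing about the hydrodynamic case, where the ergodic principle remains unproved.

```python
import numpy as np


def time_average_ou(psi, T=2_000.0, dt=0.01, x0=3.0, seed=0):
    """Time average (1/T) int_0^T psi(X_t) dt along one Euler path of
    dX = -X dt + sqrt(2) dW (toy ergodic system, invariant law N(0, 1))."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    noise = rng.normal(0.0, np.sqrt(2.0 * dt), n_steps)
    x = x0
    total = 0.0
    for i in range(n_steps):
        total += psi(x) * dt          # left-point Riemann sum of psi(X_t)
        x = x - x * dt + noise[i]     # Euler step
    return total / T


ensemble_average = 1.0                # int x^2 dN(0,1)(x)
print("time average    :", time_average_ou(lambda x: x * x))
print("ensemble average:", ensemble_average)
```

The starting point `x0=3.0` is deliberately far from typical values of the invariant law: its influence on the time average fades as the averaging window grows.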





However, it is extremely difficult to prove the existence of stationary probability measures for the Navier-Stokes equations by solving equation [7] directly. The situation is formally the same as in equilibrium statistical mechanics, where the Liouville equation is in fact solved, leading to the Boltzmann-Gibbs distribution. However, the results in statistical hydrodynamics are far from being satisfactory.


Recent studies to prove the existence of invariant 
measures for the Navier-Stokes equations are based 
on stochastic models (see the section “The stochastic 
Navier-Stokes equations”). On the other hand, for 
the Euler equations it is possible to construct 
formally invariant measures, by means of invariant 
quantities of the classical motion (see the next 
section). 

Finally, we point out that there are techniques 
using invariant measures to show some results for 
the time evolution (e.g., the motion exists for almost 
all initial values with respect to an invariant 
measure). 


The Euler Equations 


We start by recalling some basic facts on the Euler equations (see Incompressible Euler Equations: Mathematical Theory).

The motion of an inviscid, incompressible, and homogeneous fluid is described by the Euler equations, which in Eulerian coordinates read as

$$\begin{aligned} \frac{\partial u}{\partial t} + (u\cdot\nabla)u + \nabla p &= f && \text{in } D \\ \nabla\cdot u &= 0 && \text{in } D \\ u\cdot n &= 0 && \text{on } \partial D \end{aligned} \qquad [9]$$

where, at time $t \ge 0$ and position $x \in D$, $u = u(t,x)$ is the velocity vector and $p = p(t,x)$ the hydrodynamic pressure. The units have been chosen so that the mass density $\rho = 1$. $\nabla$ denotes the nabla vector operator, so

$$(u\cdot\nabla)u = \sum_{j=1}^{d} u_j\,\frac{\partial u}{\partial x_j}$$

Finally, $f$ denotes the external force. If the spatial domain $D$ has a boundary $\partial D$, then the velocity is assumed to be tangent to the boundary ($n$ denotes the exterior normal vector to the boundary). Some initial condition $u_0$ at time $t = 0$ is assigned.

When $f = 0$, there are invariant quantities for system [9]. In the literature, there are many works suggesting a Gaussian stationary statistics (see, e.g., the paper by Kraichnan (1980)). We consider invariants that are quadratic in the velocity so as to construct (formally) invariant measures of Gibbs type: the energy

$$\mathcal{E}(u) := \frac12\int_D |u|^2\,dx$$

and, only in the two-dimensional case ($d = 2$), the enstrophy

$$\mathcal{S}(u) := \frac12\int_D |\operatorname{curl} u|^2\,dx$$

(with $\operatorname{curl} u = \nabla^{\perp}\cdot u = \partial u_2/\partial x_1 - \partial u_1/\partial x_2$ for $d = 2$).

It is natural to look for velocity fields in the following function spaces: the space $H^0$ of finite kinetic energy and the space $H^1$ of finite enstrophy. Clearly, the admissible fields should also obey the boundary conditions and the divergence-free condition. If $P$ is the projection operator onto the space of divergence-free vectors, and $B$ is the bilinear form $B(u,v) := P[(u\cdot\nabla)v]$, the Euler equations can be given the structure of an evolution equation,

$$\frac{du}{dt} = -B(u,u) \qquad [10]$$

obtained by applying the projection operator $P$ to the first equation in [9]. The pressure disappears and can be regarded as a Lagrange multiplier associated with the divergence-free constraint ($\nabla\cdot u = 0$); it can be fully recovered once the velocity field is known. The dynamics is considered in the phase space of divergence-free velocity vectors $H$ (a large space containing $H^0$ and $H^1$), which is an infinite-dimensional functional space. More precisely, identifying $H^0$ with its dual $(H^0)'$, we introduce the Gelfand triplet

$$H^1 \subset H^0 \simeq (H^0)' \subset H^{-1}$$

The spaces $H^\alpha$, with $\alpha = 1, 2, \ldots$, are the usual Sobolev spaces but with the additional divergence-free and boundary conditions. For $\alpha > 0$ noninteger, the spaces $H^\alpha$ are defined by interpolation, whereas those with $\alpha < 0$ are defined by duality. As usual, regularity in space is related to the spaces $H^\alpha$ with higher exponent $\alpha$. We have that $H = \bigcup_{\alpha\in\mathbb{R}} H^\alpha$.

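Invariance of $\mathcal{E}$ and $\mathcal{S}$ can be proved resorting to eqn [9] and assuming that $u$ is a smooth vector field. For instance,

$$\frac{d}{dt}\mathcal{E}(u(t)) = \int_D u\cdot\frac{\partial u}{\partial t}\,dx = -\int_D u\cdot\bigl[(u\cdot\nabla)u + \nabla p\bigr]\,dx$$
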

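By integrating by parts and bearing in mind the divergence-free condition and the boundary condition, we conclude that

$$\frac{d}{dt}\mathcal{E}(u) = 0$$

In the same way, the invariance of $\mathcal{S}$ can be proved.

As a consequence, the following Gibbs measures, which are defined on the space $H$,

$$\mu_{\mathcal{E}}(du) = \frac{1}{Z_{\mathcal{E}}}\,e^{-\mathcal{E}(u)}\,du, \qquad \mu_{\mathcal{S}}(du) = \frac{1}{Z_{\mathcal{S}}}\,e^{-\mathcal{S}(u)}\,du \qquad [11]$$

are heuristically invariant in time. In [11], $Z_{\mathcal{E}}$ and $Z_{\mathcal{S}}$ are the partition functions, that is, they are normalization constants needed to guarantee that $\mu_{\mathcal{E}}$ and $\mu_{\mathcal{S}}$ are genuine probability measures (e.g., $Z_{\mathcal{E}} = \int_H e^{-\mathcal{E}(u)}\,du$).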
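
For completeness, the integration by parts behind the invariance of the energy can be written out as follows. This is a standard computation for smooth $u$ with $f = 0$, added here as a worked step rather than quoted from the source; $d\sigma$ denotes the surface measure on $\partial D$.

```latex
\begin{aligned}
\int_D u\cdot\bigl[(u\cdot\nabla)u\bigr]\,dx
  &= \frac12\int_D (u\cdot\nabla)\,|u|^2\,dx
   = -\frac12\int_D (\nabla\cdot u)\,|u|^2\,dx
     + \frac12\int_{\partial D} (u\cdot n)\,|u|^2\,d\sigma = 0, \\
\int_D u\cdot\nabla p\,dx
  &= -\int_D (\nabla\cdot u)\,p\,dx + \int_{\partial D} (u\cdot n)\,p\,d\sigma = 0 .
\end{aligned}
```

Both expressions vanish because $\nabla\cdot u = 0$ in $D$ and $u\cdot n = 0$ on $\partial D$, so the time derivative of the energy along a smooth solution is zero.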

Actually, these measures $\mu$ solve the Liouville equation

$$\int_H (\varphi(u), B(u,u))\,\mu(du) = 0 \qquad [12]$$

for any test function $\varphi$ that is cylindrical, infinitely differentiable, bounded, and with bounded derivatives.

On the other hand, (global and not only infinitesimal) invariance means that if there exists a global flow in time which is well defined in a phase space of full measure $\mu$, then the measure $\mu$ is invariant under this dynamics. The measures $\mu_{\mathcal{E}}$ and $\mu_{\mathcal{S}}$ are centered Gaussian measures whose support is in a space larger than $H^0$, as can be proved by standard methods in the theory of Gaussian measures on infinite-dimensional spaces. By the very definition, $\mu_{\mathcal{E}}$ is a cylindrical measure in $H^0$ and $\mu_{\mathcal{S}}$ is cylindrical in $H^1$. Then the support of $\mu_{\mathcal{E}}$ is any Hilbert space $\mathcal{H}$ such that $H^0 \subset \mathcal{H}$ is a Hilbert-Schmidt embedding, and the support of $\mu_{\mathcal{S}}$ is any space $\mathcal{H}$ such that $H^1 \subset \mathcal{H}$ is a Hilbert-Schmidt embedding. When the spatial dimension $d$ is 2, $\mathrm{supp}(\mu_{\mathcal{E}}) = \bigcap_{\alpha<-1} H^\alpha$ and $\mathrm{supp}(\mu_{\mathcal{S}}) = \bigcap_{\alpha<0} H^\alpha$. When $d$ is 3, $\mathrm{supp}(\mu_{\mathcal{E}}) = \bigcap_{\alpha<-3/2} H^\alpha$.

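Moreover, $\mu_{\mathcal{E}}(H^0) = \mu_{\mathcal{S}}(H^0) = 0$, that is, the space of finite energy $H^0$ is negligible with respect to these measures. Let us show this property for the “enstrophy measure” $\mu_{\mathcal{S}}$ when $d = 2$. Let $\{e_j\}_j$ be a complete orthonormal system in $H^0$. Hence, for $u = \sum_j u_j e_j$, we have $\|u\|^2_{H^0} = \sum_j |u_j|^2$ and $\|u\|^2_{H^1} = \sum_j \lambda_j |u_j|^2$ (with $0 < \lambda_1 \le \lambda_2 \le \cdots$ and $\lambda_j \sim j$ as $j \to \infty$). Keeping in mind its definition, the measure $\mu_{\mathcal{S}}$ can be considered as a measure on the space of the sequences $\{u_j\}_j$ and written as an infinite product of one-dimensional centered Gaussian measures

$$\mu_{\mathcal{S}}(du) = \bigotimes_j \sqrt{\frac{\lambda_j}{2\pi}}\; e^{-(\lambda_j/2)|u_j|^2}\,du_j \qquad [13]$$

The energy is

$$\mathcal{E}(u) = \frac12\sum_j |u_j|^2$$

and the renormalized energy is

$${:}\mathcal{E}{:}(u) = \frac12\sum_j \Bigl(|u_j|^2 - \int |u_j|^2\,\mu_{\mathcal{S}}(du)\Bigr)$$

Since, as can be easily shown, $\int ({:}\mathcal{E}{:}(u))^2\,\mu_{\mathcal{S}}(du) < \infty$, ${:}\mathcal{E}{:}(u)$ is finite for $\mu_{\mathcal{S}}$-almost every $u$. On the contrary, since $\sum_j \int |u_j|^2\,\mu_{\mathcal{S}}(du) = \sum_j \lambda_j^{-1} = +\infty$, $\mathcal{E}(u)$ is infinite for $\mu_{\mathcal{S}}$-almost every $u$.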
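
The contrast between the energy and the renormalized energy can be seen in a truncated simulation. The sketch below samples the first $N$ coordinates $u_j \sim N(0, 1/\lambda_j)$ of the product measure [13], with the model choice $\lambda_j = j$ (an assumption made here; the text only states $\lambda_j \sim j$). The mean of the truncated energy is then $\tfrac12\sum_{j\le N} 1/j$, which grows logarithmically in $N$, while the truncated renormalized energy has variance $\tfrac12\sum_{j\le N} 1/j^2$, which stays bounded. The function name `sample_energies` is hypothetical.

```python
import numpy as np


def sample_energies(n_modes, n_samples=2_000, seed=0):
    """Sample the truncated energy and renormalized energy under the
    product Gaussian measure with u_j ~ N(0, 1/lambda_j), lambda_j = j."""
    rng = np.random.default_rng(seed)
    lam = np.arange(1, n_modes + 1, dtype=float)           # assumed eigenvalues
    u = rng.normal(0.0, 1.0, (n_samples, n_modes)) / np.sqrt(lam)
    energy = 0.5 * np.sum(u**2, axis=1)
    renormalized = 0.5 * np.sum(u**2 - 1.0 / lam, axis=1)  # subtract E|u_j|^2
    return energy, renormalized


for n_modes in (10, 100, 1000):
    energy, renorm = sample_energies(n_modes)
    print(f"N={n_modes:5d}  mean energy={energy.mean():6.3f}  "
          f"var renormalized energy={renorm.var():6.3f}")
```

Increasing the number of modes keeps pushing the mean energy up, whereas the variance of the renormalized energy settles near a constant, in line with the statement that only the renormalized quantity is finite almost everywhere.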

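We also note in passing that, for any $\gamma > 0$ and $\beta > -\gamma$,

$$\int_H e^{-\beta\,{:}\mathcal{E}{:}(u) - \gamma\,\mathcal{S}(u)}\,du < \infty$$

so that

$$\mu_{\beta,\gamma}(du) = \frac{1}{Z_{\beta,\gamma}}\,e^{-\beta\,{:}\mathcal{E}{:}(u) - \gamma\,\mathcal{S}(u)}\,du \qquad [14]$$

is a probability measure, which is infinitesimally invariant for the Euler flow.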

Since the space of finite-energy velocity is negligible with respect to these measures, it is necessary to replace the classical solutions having finite energy with generalized solutions. This is not an easy task in the three-dimensional case, whereas some results have been proved for the two-dimensional problem, where the following existence result holds. Let us analyze the quadratic term $B(u,u) = P[(u\cdot\nabla)u]$. The term $(u\cdot\nabla)u$ can be rewritten as $\nabla\cdot(u\otimes u)$, taking into account the divergence-free condition. Trivially, we have that $\nabla\cdot(u\otimes u) = \nabla\cdot(u\otimes u - {:}u\otimes u{:})$, where ${:}u\otimes u{:} = \int u\otimes u\,\mu_{\mathcal{S}}(du)$. We consider the quadratic expression $(u\otimes u - {:}u\otimes u{:})$. This is integrable with respect to the measure $\mu_{\mathcal{S}}$ in the sense that


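$$\int \bigl\|u\otimes u - {:}u\otimes u{:}\bigr\|^{2}_{H^{-\varepsilon}}\,\mu_{\mathcal{S}}(du) < \infty \qquad [15]$$

for any $\varepsilon > 0$. We remark that this property is similar to the integrability of the renormalized energy, which is a quadratic expression as well. This implies that the $H^{-1-\varepsilon}$-norm of $\nabla\cdot(u\otimes u)$ is integrable with respect to the measure $\mu_{\mathcal{S}}$. Therefore, $B(u,u)$ is defined for $\mu_{\mathcal{S}}$-a.e. $u$.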

Now, let us replace eqn [10] with an infinite system of equations for all the components $u_j$ with respect to the orthonormal basis $\{e_j\}_j$, obtained by taking the scalar product with $e_j$ of both sides of eqn [10]:

$$\frac{d}{dt}u_j = -B_j(u,u), \quad j = 1, 2, \ldots \qquad [16]$$

Each component $B_j(u,u)$ is defined for $\mu_{\mathcal{S}}$-a.e. $u$. These estimates lead to the definition of a weak solution (see Albeverio and Cruzeiro (1990)):


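Theorem 1 Let $d = 2$. There exists a flow $U(t,\omega)$ defined on a probability space $(\Omega,\mathcal{F},P)$ with values in $H^{-1-\varepsilon}$ for any $\varepsilon > 0$, $U(\cdot,\omega) \in C(\mathbb{R}, H^{-1-\varepsilon})$ for $P$-a.e. $\omega$, such that for each component $U_j$ we have

$$U_j(t,\omega) = U_j(0,\omega) - \int_0^t B_j(U(s,\omega),U(s,\omega))\,ds, \quad P\text{-a.e. } \omega,\ \forall t \in \mathbb{R}$$

Moreover, the measure $\mu_{\mathcal{S}}$ is invariant under this flow.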


We point out that uniqueness is an open problem even for $d = 2$. But already in the classical analysis of the Euler equations in a bounded domain, uniqueness for initial velocity of finite energy is not known. Working with the measure $\mu_{\mathcal{E}}$ is even worse, especially when $d = 3$, because its support is a larger space within which more irregular velocity vectors live. The more irregular the spaces where the flow lives, the more difficult it is to handle the nonlinear term $B(u,u)$.

On the other hand, for $d = 1$, the mathematical analysis is much easier. For instance, it can be proved (see Robert (2003)) that the one-dimensional inviscid Burgers equation on the line

$$\frac{\partial u}{\partial t} + \frac{\partial}{\partial x}\Bigl(\frac12 u^2\Bigr) = 0 \qquad [17]$$

has intrinsic invariant statistical solutions, given by a class of Lévy processes with negative jumps.


The Stochastic Navier-Stokes Equations 


The Navier-Stokes equations describe advection with velocity $u$ and diffusion with kinematic viscosity $\nu > 0$ (see Viscous Incompressible Fluids: Mathematical Theory)

$$\begin{aligned} \frac{\partial u}{\partial t} - \nu\Delta u + (u\cdot\nabla)u + \nabla p &= f && \text{in } D \\ \nabla\cdot u &= 0 && \text{in } D \\ u &= 0 && \text{on } \partial D \end{aligned} \qquad [18]$$

where $\Delta$ is the Laplace operator. Nonslip boundary conditions are assumed. Although the Euler equations [9] are formally obtained from [18] by setting $\nu = 0$, the presence of the second-order operator $-\nu\Delta$ makes the analysis needed to prove the existence, uniqueness, and regularity of solutions easier than for the Euler equations. However, at variance with the Euler equations, the Navier-Stokes equations do not possess invariants, since the viscosity dissipates energy. Hence, it is difficult to find explicit expressions of invariant measures for the deterministic Navier-Stokes equations, except the trivial invariant measures concentrated on a stationary solution. However, as soon as a stochastic force is introduced in these equations, it is possible to have nontrivial invariant measures. It is impossible to review here the wide literature concerning the stochastic Navier-Stokes equations and we confine ourselves to some remarks. Most results are concerned with proving the existence and/or uniqueness of an invariant measure $\mu$, without giving an explicit representation, apart from some attempts like Gallavotti (2002), where a formal representation of stationary distributions is given in terms of functional integrals. Some properties of these non-explicit invariant measures are known, for instance estimates of moments and exponential convergence of the statistical solution for large time.

Stochastic forces can enter in the Navier-Stokes equations in different ways. We can consider randomness in the forcing term, so that the force $f$ in [18] has a deterministic component, which represents its slowly varying mean, and a stochastic one, which accounts for small, very rapidly varying fluctuations around the mean. Alternatively, since the molecules are not rigidly connected to one another in the fluid, they are subjected to fluctuations. A complete description of fluctuations relating the microscopic and macroscopic motion has not been achieved at present. However, we shall introduce some models for which rigorous mathematical results can be proved.

The first part of this section concerns the Navier-Stokes equations with noise $n$:

$$\begin{aligned} \frac{\partial u}{\partial t} - \nu\Delta u + (u\cdot\nabla)u + \nabla p &= n \\ \nabla\cdot u &= 0 \end{aligned} \qquad [19]$$

for which invariant measures exist, one of which can be ergodic provided that the noise is suitably chosen. In the second part, a Navier-Stokes-type stochastic system is described, which has irregular solutions, as expected in turbulence.

Let us introduce the stochastic Navier-Stokes equations with time white noise. The first equation in [19] is an Itô equation:

$$\partial_t u + \bigl[-\nu\Delta u + (u\cdot\nabla)u + \nabla p\bigr] = \partial_t w \qquad [20]$$


Here $w = (w_1, \ldots, w_d)$ is a Brownian motion, that is, its time derivative $n = \partial w/\partial t$ is a Gaussian stochastic field with zero mean and correlation function given by

$$E\bigl[n_j(t,x)\,n_k(t',x')\bigr] = \delta_{jk}\,q(x-x')\,\delta(t-t'), \quad j,k = 1, \ldots, d \qquad [21]$$


We shall use the differential form for the Itô equation [20], always understood in the integral form

$$u(t) - u(0) + \int_0^t \bigl[-\nu\Delta u(s) + (u(s)\cdot\nabla)u(s) + \nabla p(s)\bigr]\,ds = w(t) \qquad [22]$$

Modeling perturbations by a white noise process 
represents the first step to understand how a random 
perturbation acts in the mathematical equations, 
rather than a good physical or numerical model. The 
first results are in a paper by Bensoussan and 
Temam (1973). 

Obviously, the regularity of the solutions depends 
on the spatial covariance q of the noise. 

Let us consider the following cases. 


e g=5: the noise is white also in space. 


An invariant measure is known explicitly. Indeed, 
assume periodic boundary conditions on the square 
(d=2) or the cube (d=3) D, which makes the 
spatial domain a torus. In this case, the Euler and 
Navier-Stokes equations are set in the same func- 
tional spaces. The generator of the stochastic 
Navier-Stokes equations [20] corresponds to the 
sum of the generator of the Euler equations [9] and 
of the stochastic Stokes equations 


$$du = \left[\nu \Delta u - \nabla p\right] dt + dw, \qquad \nabla \cdot u = 0 \qquad [23]$$


Since the first equation in [23] is linear in the 
unknown velocity u, the Stokes system has a unique 
invariant measure which is a centered Gaussian 
measure. In particular, when the noise is a space- 
time white noise and d=2, this is the invariant 
measure [14] of the enstrophy: 


7 1 
pee (du) E ze Sdu 


On a bidimensional torus, it is proved that this 
measure is not only infinitesimally invariant, but 
also globally invariant for a unique flow [20] 
defined for almost every initial velocity with respect 
to this measure. We recall that initial velocities of 
finite energy are negligible with respect to this measure. 
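
To illustrate why the linear Stokes system relaxes to a centered Gaussian measure, the following sketch tracks the variance of a single Fourier mode. It rests on assumptions that are not in the text: on the torus each mode is taken to decouple into a scalar Ornstein-Uhlenbeck equation $d\hat{u}_k = -\nu |k|^2 \hat{u}_k\, dt + \sqrt{q_k}\, dw_k$, where `q` is a hypothetical noise intensity for that mode, and the function names are illustrative only.

```python
import math


def stationary_variance(nu, k2, q):
    """Variance of the Gaussian invariant law of one mode (assumed OU model)."""
    return q / (2.0 * nu * k2)


def propagate_variance(v0, nu, k2, q, dt, steps):
    """Exact evolution of the variance of the OU mode over `steps` steps of size dt."""
    lam = nu * k2                       # damping rate nu * |k|^2
    decay = math.exp(-2.0 * lam * dt)   # variance contraction over one step
    v = v0
    for _ in range(steps):
        v = decay * v + q * (1.0 - decay) / (2.0 * lam)
    return v


# Any initial variance is driven to the same stationary value q / (2 nu |k|^2).
limit_from_zero = propagate_variance(0.0, nu=0.1, k2=4.0, q=1.0, dt=0.1, steps=2000)
limit_from_large = propagate_variance(50.0, nu=0.1, k2=4.0, q=1.0, dt=0.1, steps=2000)
```

Under this model the stationary variance grows as the viscosity decreases, which is one way to see why the choice of the spatial covariance q controls the regularity of the invariant measure.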

• $q$ more regular than above, that is, the noise is colored in space. 


As soon as the forcing term is more regular in space, 
the Navier-Stokes system has a solution of finite 
energy. These are solutions close to those of the 
deterministic equation. Techniques similar to those 
used to prove the existence and/or uniqueness of 
solutions for the deterministic equations work also 
in the stochastic case with an additive noise (or even 
a multiplicative noise) to get weak or strong 
solutions. Global existence in the space $H^0$ is proved 
for d=2,3 and uniqueness only for d=2, as is the 
case for the deterministic Navier-Stokes equations. 

The interesting feature is that by adding a noise 
which acts on all the components with respect to a 
Hilbert basis (or at least on many components), the 
stochastic Navier-Stokes system has a unique 
invariant measure, which is ergodic. This is proved 
for the spatial dimension d=2. By means of the 
Krylov-Bogoliubov method, existence of at least 
one invariant measure is proved by compactness of a 
family of averaged measures; the limit measures are 
stationary measures. But, when many modes are 
perturbed by a noise, there is a mixing effect on the 
dynamics, avoiding existence of many stationary 
measures. For the spatial dimension d= 2, the best 
result in this context is in Hairer and Mattingly 
(2004), where the noise acts on very few modes. For 
the spatial dimension d=3, the result in Da Prato 
and Debussche (2003) shows the existence of an 
invariant measure; even if there is no uniqueness of 
the solutions (as in the deterministic case), by a 
selection principle, they construct a transition 
semigroup, which has a unique invariant measure, 
ergodic and strongly mixing. 

Mathematical proofs are given for very different 
noises. (The reader is urged to consult, among 
others, the papers by E, Mattingly and Sinai; Flandoli 
and Maslowski; Mikulevicius and Rozovski; Vishik 
and Fursikov. The latter authors also study statistical 
solutions in two and three dimensions. For a kick noise 
$n = \sum_k \delta(t - k)\, \eta_k(x)$ in equations [19], there are results 
for d=2 by Bricmont, Kupiainen and Lefevere; Kuksin 
and Shirikyan.) 

We conclude that, as far as invariant measures 
and their ergodicity are concerned, the stochastic 
Navier-Stokes equations have richer results than the 
deterministic Navier-Stokes equations. It is appeal- 
ing to investigate the limit as the intensity of the 
noise goes to zero, so as to recover the deterministic 
equation. Now, consider equation [19] with a noise 
$\varepsilon n$, for n fixed and $\varepsilon \to 0$. Due to the sensitive 
dependence on initial conditions, even a small noise 
may have important effects on the dynamics. A 
conjecture by Kolmogorov is that the unique 
invariant measure $\mu_\varepsilon$ tends, as $\varepsilon \to 0$, to a specific 
measure, the so-called Kolmogorov measure, which 
would enter into the ergodic principle. This is a 
difficult problem, not yet solved. 

We also mention the analysis of the inviscid limit. 
Kuksin (2004) showed that the solution $u_\nu$ of the 
two-dimensional stochastic Navier-Stokes equations 

$$\frac{\partial u}{\partial t} - \nu \Delta u + (u \cdot \nabla) u + \nabla p = \sqrt{\nu}\, n, \qquad \nabla \cdot u = 0, \qquad 0 < \nu \le 1 \qquad [24]$$

on the torus converges in distribution to a stationary 
solution of the Euler equations. Here n is a random 
force white in time and smooth in space. More 
precisely, for each subsequence $\nu_j$, 

$$\lim_{\nu_j \to 0}\, \lim_{T \to \infty} u_{\nu_j}(T + t) = U(t) \qquad [25]$$


and almost every trajectory of the nontrivial limit 
process U solves the Euler equations [9] without the 
forcing term. Moreover, the process U keeps 
memory of some features of the noise force n, since 
the mean values of the enstrophy and of the energy 
of U depend on the noise n. 

We now present the second part on stochastic 
models for viscous fluids. In his 1884 paper, 
Reynolds introduced the decomposition of turbulent 
flow into mean and fluctuating flows. The equations 
obtained are difficult to study. We shall show now a 
tractable model for a one-dimensional problem 
(d=1) with a suitable model of fluctuations. 
Decompose the velocity field into the sum of a 
mean flow $\bar{u}$ and a fluctuation $\tilde{u}$: 

$$u = \bar{u} + \tilde{u}$$

The fluctuation is assumed to be highly irregular; it 
is reasonable to model it by a stochastic process. If 
we choose 

$$\tilde{u}(t, x) = b(x)\, \frac{dw}{dt}(t)$$

where b is a given velocity field and dw/dt is white 
noise, then the motion of the fluid is governed by a 
stochastic equation of Itô type. Indeed, the Navier- 
Stokes equations are balance equations of linear 
momentum: 


$$\frac{Du}{Dt} = \nu \Delta u - \nabla p \qquad [26]$$

where Du/Dt is the material time derivative along 
the trajectory of a particle which is at time t in 
position x(t) moving with velocity u (so 
$u(t, x(t)) = (dx/dt)(t)$): 

$$\frac{Du}{Dt} = \frac{d}{dt}\, u(t, x(t)) = \frac{\partial u}{\partial t} + (u \cdot \nabla) u \qquad [27]$$

According to the mathematical model for the 
fluctuation, we have 


dx(t) = u(t, x(t))dt + b(x(t))dw(t) [28] 


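Therefore, Du is computed by means of Itô's 
formula 

$$Du = \left[\frac{\partial u}{\partial t} + (u \cdot \nabla) u\right] dt + (b \cdot \nabla) u\, dw + \frac{1}{2} \sum_{i,j} b_i b_j \frac{\partial^2 u}{\partial x_i\, \partial x_j}\, dt \qquad [29]$$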


This leads to the stochastic Navier-Stokes-type 
equations (we drop the overline symbol) 

$$d_t u + \left[-\nu \Delta u + (u \cdot \nabla) u + \nabla p + \tfrac{1}{2} Q u\right] dt = -(b \cdot \nabla) u\, dw(t), \qquad \nabla \cdot u = 0 \qquad [30]$$

where Q is the second-order differential operator 
given by the last term in [29]. 
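
The passage from the particle model [28] to the system [30] can be sketched in two lines. This is only an outline under stated assumptions: w is taken to be a scalar Brownian motion, so that $d\langle x_i, x_j \rangle = b_i b_j\, dt$, and Q is read as $\sum_{i,j} b_i b_j\, \partial_{x_i} \partial_{x_j}$; the momentum balance [26] is assumed to hold in the form $Du = (\nu \Delta u - \nabla p)\, dt$.

```latex
\begin{align*}
Du &= \partial_t u\, dt + (dx \cdot \nabla) u
      + \tfrac{1}{2} \sum_{i,j} \partial_{x_i} \partial_{x_j} u \; d\langle x_i, x_j \rangle \\
   &= \left[ \partial_t u + (u \cdot \nabla) u \right] dt + (b \cdot \nabla) u\, dw + \tfrac{1}{2} Q u\, dt, \\[4pt]
Du &= (\nu \Delta u - \nabla p)\, dt
   \;\Longrightarrow\;
   \partial_t u\, dt + \left[ -\nu \Delta u + (u \cdot \nabla) u + \nabla p + \tfrac{1}{2} Q u \right] dt
   = -(b \cdot \nabla) u\, dw .
\end{align*}
```

The first line is Itô's formula applied to $u(t, x(t))$ with $dx = u\, dt + b\, dw$; the second equates it with the momentum balance and moves the noise term to the right-hand side.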

Rigorous mathematical results for the above 
equations have been proved for the one-dimensional 
case, that is, the Burgers equations on the line. 
Given an initial velocity of finite energy $u_0 \in H^0$, 
there exists a unique solution $u \in C([0,T]; H^0) \cap L^2(0,T; H^1)$ 
(P-a.s.). But it can be shown that for a 
more regular initial velocity there is no higher 
regularity of the solution of eqn [30], if $b \ne 0$. This 
means that these stochastic Burgers equations 
cannot have too regular nontrivial solutions, as 
expected in turbulent motion. 


Statistics of Vortices and Bidimensional 
Turbulence 


Onsager (1949) proposed to investigate bidimen- 
sional turbulent flows, extending in a rigorous way 
to hydrodynamics the statistical mechanics approach 
of Boltzmann. If we are interested in flows of finite 
energy, the results of the section “The Euler 
equations” provide no answer to the problem. 
Another way to proceed is by approximating the 
Euler equations in a suitable way. Actually, in a 
two-dimensional turbulent flow, there appears a 
large-scale organization leading to coherent struc- 
tures. These are hydrodynamical vortices, whose 
dynamics is governed by the Euler equations. 
Onsager suggested to approximate the continuous 
Euler equations by a great (but finite) number of 
point vortices. This leads to a finite-dimensional 
Hamiltonian system, to which the methods of 
statistical mechanics can be successfully applied. Of 
course, the crucial point is to pass to the limit, to 
recover the continuous system. But there are many 
different ways to approximate a continuous vorticity 
by a cloud of point vortices and different approx- 
imations may lead to very different statistical 
equilibrium states. 

We follow here the approach of Lions 
(1997). For a completely different 
approximation, see, for example, Robert (2003). 

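Let D be a bounded open smooth simply 
connected subset of $\mathbb{R}^2$. Then there exists a function 
$\psi$ (the stream function) such that $u = \nabla^{\perp} \psi$ and 
$\psi|_{\partial D} = 0$. Given the velocity u, we recover the stream 
function by means of the vorticity $\omega = \operatorname{curl} u = -\Delta \psi$, 
so $\psi(x) = \int_D g(x, y)\, \omega(y)\, dy$ (here g is the Green's 
function of the Laplacian $-\Delta$ and x, y are points in 
D). The Euler equations can be written as 

$$\frac{\partial \omega}{\partial t} + u \cdot \nabla \omega = 0, \qquad \omega = \operatorname{curl} u \qquad [31]$$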


Consider now a solution given by vorticity concentrated 
in a finite number N of points: 

$$\omega = \sum_{i=1}^{N} \lambda_i\, \delta_{x_i(t)} \qquad [32]$$

Here the vortex intensities $\lambda_i$ are real values and 
$x_i(t)$ are distinct points in D for i = 1, ..., N. 
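According to the Euler equations, these points evolve 
as follows (see also Marchioro and Pulvirenti (1994)): 

$$\frac{dx_i}{dt} = \sum_{j \ne i} \lambda_j \nabla^{\perp}_{x_i} g(x_i, x_j) + \lambda_i \nabla^{\perp}_{x_i} \tilde{g}(x_i), \qquad i = 1, \ldots, N \qquad [33]$$

where $\tilde{g}$ is related to the Green's function g. This is a 
Hamiltonian system in $D^N$. Hereafter, we shall 
suppose that the vortex intensities are the same 
($\lambda_i = \lambda$ for all i), so that the Hamiltonian is 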


$$H(x_1, \ldots, x_N) = \frac{1}{2} \sum_{i \ne j} g(x_i, x_j) + \sum_{i=1}^{N} \tilde{g}(x_i) \qquad [34]$$


By means of H, we define the canonical Gibbs 
measure 

$$\mu^N(dx_1\, dx_2 \cdots dx_N) = \frac{1}{Z(N)}\, e^{-\beta \lambda H(x_1, \ldots, x_N)}\, dx_1\, dx_2 \cdots dx_N \qquad [35]$$

where Z(N) is the partition function. If $Z(N) < \infty$, 
then $\mu^N$ is a well-defined probability measure on $D^N$ 
and, by construction, it is an invariant measure for 
system [33]. We can prove that Z(N) is finite for 
$\beta \lambda \in (-8\pi/N, 4\pi)$, so that it is natural to choose as 
a scaling $\beta \lambda N = \tilde{\beta}$. Hence, 

$$\mu^N(dx_1\, dx_2 \cdots dx_N) = \frac{1}{Z(N)}\, e^{-(\tilde{\beta}/N) H}\, dx_1\, dx_2 \cdots dx_N \qquad [36]$$

is considered for $-8\pi < \tilde{\beta} < 0$, or $\tilde{\beta} > 0$ with 
$N > \tilde{\beta}/4\pi$. 

Bearing in mind the Onsager approach of approximating 
the turbulent Euler motion by means of point 
vortices, we are interested in the limit as N goes to 
$+\infty$, for $\tilde{\beta}$ fixed in $(-8\pi, +\infty)$. It turns out that, 
when the number of point vortices becomes very 
large, their statistical behavior corresponds to a very 
large number of independent particles moving in a 
mean force field that they create. 

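More precisely, consider $\lambda = 1/N$, $\beta = \tilde{\beta}$. The 
empirical measure 

$$\frac{1}{N} \sum_{i=1}^{N} \delta_{x_i(t)}$$

describing the vorticity, weakly converges to a 
probability density $\rho$ and each correlation function 

$$\rho^N_j(x_1, \ldots, x_j) = \int dx_{j+1} \cdots \int dx_N\, \mu^N(x_1, \ldots, x_N) \qquad \text{for } j \ge 1 \qquad [37]$$

weakly converges to $\prod_{i=1}^{j} \rho(x_i)$. 
The equation satisfied by $\rho$, also called the mean-field 
equation, is 

$$\rho(x) = \frac{e^{-\beta U(x)}}{\int_D e^{-\beta U(y)}\, dy}, \qquad \text{with } U(x) = \int_D g(x, y)\, \rho(y)\, dy \qquad [38]$$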


The relation between U and $\rho$ can also be written as 
$-\Delta U = \rho$ in D, U = 0 on $\partial D$. We point out that 
$u = \nabla^{\perp} U$ is a stationary solution of the Euler 
equations. Indeed, $\omega = -\Delta U = \rho$ and $\rho$ is a function 
of U, let us say $\rho = F(U)$. This gives that 
$\nabla \omega = F'(U)\, \nabla U$ and thus the term $u \cdot \nabla \omega$ in the 
Euler equation [31] vanishes. 

It can be proved that there exists a solution of the 
mean-field equation when $\beta > 0$ or when $\beta < 0$ and 
D is simply connected. Uniqueness is known in some 
cases, for instance, when D is a bounded open 
smooth simply connected domain and the velocity is 
assumed tangent to the boundary. 

There is numerical evidence supporting this approximation 
approach (see references in Lions (1997) 
referring to the periodic case). It shows that for 
large time and large Reynolds number (viscosity $\nu$ 
close to 0), the vorticity of the solution of the 
Navier-Stokes equations appears in a simple and 
organized structure. This stays intact until the 
viscous dissipation damps it. The important observation 
is that the organized structure is described 
quite precisely by the solution of the mean-field 
equation for some specific $\beta$. 

Actually, to say that a fluid is inviscid is an 
approximation (which may be justified in many 
contexts), since every fluid displays some kind of 
viscosity. But turbulence is a phenomenon occurring 
at very small viscosity. In this sense, the above result 
provides a description of stationary regime in an 
ideal fluid, which is a good approximation of some 
numerical simulations of real fluids. Despite this 
good agreement with numerical simulations, there is 
no proof of how to deduce the mean-field equation 
from the Euler equations (e.g., which parameter $\beta$ 
has to be chosen in eqn [38]?). 


Remark The extension of this analysis to three- 
dimensional flows involves vortex filaments, instead 
of point vortices. There are attempts to describe 
interacting vortex filaments as proposed by Chorin. 
Idealizations of behavior of vortices are introduced 
to have a tractable mathematical model. The reader 
is referred to Lions (1997) for a description of nearly 
parallel vortex filaments and to Flandoli and Bessaih 
(2003) for more realistic filaments which fold. 


See also: Cauchy Problem for Burgers-Type Equations; 
Hamiltonian Fluid Dynamics; Incompressible Euler 
Equations: Mathematical Theory; Malliavin Calculus; 
Non-Newtonian Fluids; Partial Differential Equations: 
Some Examples; Stochastic Differential Equations; 
Turbulence Theories; Viscous Incompressible Fluids: 
Mathematical Theory; Vortex Dynamics. 


Introduction 


The stochastic Loewner evolution or Schramm- 
Loewner evolution (SLE) is a family of random curves 
that appear as scaling limits of curves or cluster 
boundaries of discrete statistical mechanical models in 
two dimensions at criticality. The stochastic Loewner 
evolution was introduced by Oded Schramm as a 
candidate for the limit of loop-erased random walk 
and the boundary of percolation clusters, and it is now 
believed that SLE curves appear in most planar critical 
systems whose scaling limit satisfies conformal invariance. 
The curves are defined by solving a Loewner 
differential equation with a random input. 


Definition 


There are three major one-parameter families of SLE 
curves (chordal, radial, and whole-plane), which 
correspond to curves connecting two boundary points 
in a domain, a boundary point and an interior point in 
a domain, and two points in $\mathbb{C}$, respectively. The 
parameter is usually denoted $\kappa > 0$. The starting point 
for defining SLE is to write down the assumptions 
that one expects from a scaling limit, assuming that 
the limit is conformally invariant. 

In the chordal case, we assume that there is a 
family of probability measures $\{\mu_D(z,w)\}$, indexed 
by simply connected proper domains $D \subset \mathbb{C}$ and 
distinct boundary points $z, w \in \partial D$, supported on 
continuous curves $\gamma: [0, t_\gamma] \to \overline{D}$ with $\gamma(0) = z$, 
$\gamma(t_\gamma) = w$, which satisfies the following: 


• Conformal invariance. If $f: D \to D'$ is a conformal 
transformation, then the image of $\mu_D(z, w)$ 
under f is the same as $\mu_{D'}(f(z), f(w))$, up to a time 
change. 

• Conformal Markov property for $\mu_D(z,w)$. 
Suppose $\gamma[0,t]$ is known, and let $g_t$ be 
a conformal transformation of the slit domain 
$D \setminus \gamma[0,t]$ onto D with $g_t(\gamma(t)) = z$, $g_t(w) = w$ 
(see Figure 1). Then the conditional distribution 
on $g_t \circ \gamma[t, t_\gamma]$, given $\gamma[0,t]$, is the same, up to a 
change of parametrization, as the original distribution. 
(Implicit in this is the assumption that 
$\gamma(t)$ is on the boundary of $D \setminus \gamma(0,t]$, which will be 
true, e.g., if $\gamma$ is non-self-intersecting and 
$\gamma(0, t_\gamma) \subset D$.) 


Using the Riemann mapping theorem, one can see 
that such a family $\{\mu_D(z,w)\}$ is determined (up 
to reparametrization) by $\mu_{\mathbb{H}}(0, \infty)$, where $\mathbb{H} = \{x + iy : y > 0\}$ 
denotes the upper half-plane. Suppose 
$\gamma: [0, \infty) \to \mathbb{C}$ is a simple (i.e., no self-intersections) 
curve with $\gamma(0) = 0$, $\gamma(0, \infty) \subset \mathbb{H}$, and 
$\sup_t \mathrm{Im}[\gamma(t)] = \infty$. Let $H_t = \mathbb{H} \setminus \gamma[0,t]$. There is a unique 
conformal transformation $g_t: H_t \to \mathbb{H}$ whose 
expansion at infinity is 

$$g_t(z) = z + \frac{b(t)}{z} + O(|z|^{-2}), \qquad z \to \infty$$


(see Figure 2). The coefficient b(t), which is sometimes 
called the half-plane capacity of $\gamma[0,t]$ and 
denoted $\mathrm{hcap}[\gamma[0,t]]$, is continuous, strictly increasing, 
and tends to $\infty$. In fact, 

$$b(t) = \lim_{y \to \infty} y\, \mathbb{E}\left[\mathrm{Im}[X_\tau] \mid X_0 = iy\right]$$

where $X_s$ denotes a complex Brownian motion and 
$\tau = \tau_{\gamma[0,t]}$ is the first time s such that $X_s \in \mathbb{R} \cup \gamma[0,t]$. 
By reparametrizing $\gamma$, we may assume b(t) = 2t. With this 
parametrization, the maps $g_t$ satisfy the Loewner 
differential equation 

$$\partial_t g_t(z) = \frac{2}{g_t(z) - U_t}, \qquad g_0(z) = z$$


where $U: [0, \infty) \to \mathbb{R}$ is a continuous function with 
$U_0 = 0$. In fact, $U_t = g_t(\gamma(t))$. Schramm observed that 
the measure $\mu_{\mathbb{H}}(0, \infty)$, at least if it were supported 
on simple curves and the curves were parametrized 
using half-plane capacity, would produce a random 
$U_t$. If the assumptions above on $\{\mu_D(z,w)\}$ are 
translated into assumptions on the "driving function" 
$U_t$, one shows readily that $U_t$ must be a 
driftless Brownian motion, that is, $U_t = \sqrt{\kappa}\, B_t$, for a 
standard one-dimensional Brownian motion $B_t$. 
Chordal $\mathrm{SLE}_\kappa$ (in $\mathbb{H}$ connecting 0 and $\infty$) is 
defined to be the random collection of conformal 
maps $g_t$ obtained by solving the initial-value problem 

$$\partial_t g_t(z) = \frac{2}{g_t(z) - \sqrt{\kappa}\, B_t}, \qquad g_0(z) = z \qquad [1]$$



Figure 1 The map $g_t$ from $D \setminus \gamma(0,t]$ onto D. 

Figure 2 The map $g_t$ from $\mathbb{H} \setminus \gamma[0,t]$ onto $\mathbb{H}$. 


where $B_t$ is a standard one-dimensional Brownian 
motion. Equation [1] is often given in terms of the 
inverse $f_t = g_t^{-1}$: 

$$\partial_t f_t(z) = -f_t'(z)\, \frac{2}{z - \sqrt{\kappa}\, B_t}$$

This equation describes a random evolution of 
conformal maps $f_t$ from $\mathbb{H}$ into subdomains of $\mathbb{H}$. 
For each $z \in \mathbb{H}$, the solution of [1] is defined up to a 
time $T_z \in [0, \infty]$ with $T_z > 0$ for $z \ne 0$. For fixed 
t, $g_t$ is the unique conformal transformation of 
$H_t := \{z \in \mathbb{H} : T_z > t\}$ onto $\mathbb{H}$ with expansion 

$$g_t(z) = z + \frac{2t}{z} + \cdots, \qquad z \to \infty$$

















The chordal $\mathrm{SLE}_\kappa$ path is the random curve 
$\gamma: [0, \infty) \to \overline{\mathbb{H}}$ such that for each t, $H_t$ is the 
unbounded component of $\mathbb{H} \setminus \gamma[0,t]$. It is not 
immediate from the definition that such a curve $\gamma$ 
exists, but its existence has been proved. If $G_t = g_{t/\kappa}$, 
then we can write eqn [1] as 

$$\partial_t G_t(z) = \frac{a}{G_t(z) + W_t}, \qquad G_0(z) = z \qquad [2]$$

where $a = 2/\kappa$ and $W_t := -\sqrt{\kappa}\, B_{t/\kappa}$ is a standard 
Brownian motion. Then $Z_t^z := G_t(z) + W_t$ satisfies 
the Bessel stochastic differential equation 

$$dZ_t^z = \frac{a}{Z_t^z}\, dt + dW_t, \qquad Z_0^z = z \qquad [3]$$

This equation is valid up to time $\kappa T_z$, which is the 
first time that $Z_t^z = 0$. 

Although chordal $\mathrm{SLE}_\kappa$ is defined with a particular 
parametrization, one generally thinks of it as a 
measure on curves modulo reparametrization. The 
scaling properties of Brownian motion imply that 
this measure is invariant under dilations of $\mathbb{H}$. If D 
is a simply connected domain and z, w are distinct 
boundary points of D, chordal $\mathrm{SLE}_\kappa$ in D connecting 
z and w is defined to be the conformal image of 
$\mathrm{SLE}_\kappa$ in $\mathbb{H}$ from 0 to $\infty$ under a conformal 
transformation of $\mathbb{H}$ onto D taking 0 to z and $\infty$ 
to w. There is a one-parameter family of such 
transformations, but the scale invariance of $\mathrm{SLE}_\kappa$ in 
$\mathbb{H}$ shows that the image measure is independent of 
the choice of transformation. 
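
A minimal numerical sketch of the chordal Loewner evolution [1] may help fix ideas. It relies on assumptions that are not in the text: the driving function is frozen on each time step of length `dt`, so that each increment of $g_t$ is the explicit vertical-slit map $z \mapsto U + \sqrt{(z - U)^2 + 4\,dt}$, and the tip of the curve is recovered by composing the inverse maps. The function names are illustrative, and the scheme is only a first-order approximation of the random curve.

```python
import cmath
import math
import random


def slit_inverse(w, u, dt):
    """Inverse of the incremental map z -> u + sqrt((z - u)^2 + 4 dt)."""
    s = cmath.sqrt((w - u) ** 2 - 4.0 * dt)
    if s.imag < 0:  # choose the branch that maps into the upper half-plane
        s = -s
    return u + s


def chordal_trace(driving, dt):
    """Approximate gamma(n * dt), n = 1..len(driving), for a piecewise-constant
    driving function whose value on step k is driving[k]."""
    trace = []
    for n in range(1, len(driving) + 1):
        w = complex(driving[n - 1], 0.0)
        for k in range(n - 1, -1, -1):  # undo the last step first
            w = slit_inverse(w, driving[k], dt)
        trace.append(w)
    return trace


def brownian_driving(kappa, steps, dt, seed=0):
    """Samples of sqrt(kappa) * B_t on a grid of mesh dt, with B_0 = 0."""
    rng = random.Random(seed)
    b, path = 0.0, []
    for _ in range(steps):
        path.append(math.sqrt(kappa) * b)
        b += rng.gauss(0.0, math.sqrt(dt))
    return path


# With U = 0 the curve is the vertical segment gamma(t) = 2i sqrt(t).
straight = chordal_trace([0.0] * 50, 0.01)
# A rough sample of the kappa = 2 curve on 200 steps.
sample = chordal_trace(brownian_driving(2.0, 200, 0.005), 0.005)
```

The cost is quadratic in the number of steps, which is acceptable for a sketch; the choice of the vertical-slit increment keeps the half-plane capacity equal to 2t exactly at the grid times.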

The geometric and fractal properties of the curve 
$\gamma$ vary greatly as the parameter $\kappa$ changes: 


• if $\kappa \le 4$, $\gamma$ is a simple curve; 

• if $4 < \kappa < 8$, $\gamma$ has self-intersections, but is not 
space filling; and 

• if $\kappa \ge 8$, $\gamma$ is a space-filling curve. 


To see this, one notes that the conformal Markov 
property implies that there can be double points 
with positive probability if and only if $T_x < \infty$ 
occurs with positive probability for x > 0. In addition, 
the curve is space filling if and only if $T_z < \infty$ 
for all z and $T_w \ne T_z$ for $w \ne z$. The problem is then 
reduced to a problem about the Bessel equation [3] 
for which the following holds: 


• if $a \ge 1/2$ and $z \ne 0$, the probability that $T_z < \infty$ 
is zero. If $a < 1/2$, this probability equals 1. 

• if $1/4 < a < 1/2$, and w, z are distinct points in $\mathbb{H}$, 
then there is a positive probability that $T_w = T_z$. 

• if $0 < a \le 1/4$, then with probability 1, $T_w \ne T_z$ 
for all $w \ne z$. 





This kind of argument is typical when studying 
SLE: geometric properties of the curve are 
established by analyzing a stochastic differential 
equation. The Hausdorff dimension of the path $\gamma$ 
is given by 

$$\dim[\gamma[0, \infty)] = \min\left\{1 + \frac{\kappa}{8},\ 2\right\}$$
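
The three regimes and the dimension formula can be tabulated with a few lines of exact arithmetic. This is only an illustrative sketch: the function names are hypothetical, the boundary cases are placed according to the Bessel thresholds a = 1/2 and a = 1/4 with a = 2/κ, and exact fractions are used to avoid rounding at those thresholds.

```python
from fractions import Fraction


def sle_phase(kappa):
    """Classify the curve using the Bessel parameter a = 2 / kappa."""
    a = Fraction(2) / Fraction(kappa)
    if a >= Fraction(1, 2):   # kappa <= 4: the Bessel process never hits 0
        return "simple"
    if a > Fraction(1, 4):    # 4 < kappa < 8
        return "self-intersecting, not space filling"
    return "space filling"    # kappa >= 8


def hausdorff_dimension(kappa):
    """min(1 + kappa / 8, 2), as an exact fraction."""
    return min(1 + Fraction(kappa) / 8, Fraction(2))
```

For instance, `hausdorff_dimension(Fraction(8, 3))` evaluates to 4/3 and `hausdorff_dimension(6)` to 7/4.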


The radial Loewner equation describes the evolution 
of a curve from the boundary of the unit disk 
$\mathbb{D} = \{z : |z| < 1\}$ to the origin. Suppose $\gamma: [0, \infty) \to \overline{\mathbb{D}}$ 
is a simple curve with $\gamma(0) = 1$, $\gamma(0, \infty) \subset \mathbb{D} \setminus \{0\}$, 
and $\gamma(t) \to 0$ as $t \to \infty$. Let $g_t$ be the unique 
conformal transformation of $\mathbb{D} \setminus \gamma[0,t]$ onto $\mathbb{D}$ such 
that $g_t(0) = 0$, $g_t'(0) > 0$. One can check that $g_t'(0)$ is 
continuous and strictly increasing in t, and hence we 
can parametrize $\gamma$ in such a way that $g_t'(0) = e^t$. 
Using this reparametrization, there is a continuous 
$U: [0, \infty) \to \mathbb{R}$ with $U_0 = 0$ such that $g_t$ satisfies 
the radial Loewner equation 

$$\partial_t g_t(z) = g_t(z)\, \frac{e^{iU_t} + g_t(z)}{e^{iU_t} - g_t(z)}, \qquad g_0(z) = z$$

If $z \ne 0$, then we can define $h_t(z) = -i \log g_t(z)$ 
locally near z, and this equation becomes 

$$\partial_t h_t(z) = \cot\left(\frac{h_t(z) - U_t}{2}\right)$$


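Radial $\mathrm{SLE}_\kappa$ (connecting 1 and 0 in $\mathbb{D}$) is obtained 
by setting $U_t = \sqrt{\kappa}\, B_t$. If D is a simply connected 
domain, $z \in D$, $w \in \partial D$, then radial $\mathrm{SLE}_\kappa$ in D 
connecting w and z is obtained by conformal 
transformation using the unique transformation 
f of $\mathbb{D}$ onto D with f(0) = z, f(1) = w. Again, we 
think of this as being defined modulo time change. If 
$a = 2/\kappa$ and $v_t = h_{t/\kappa}$, then 

$$\partial_t v_t(z) = \frac{a}{2} \cot\left(\frac{v_t(z) + W_t}{2}\right) \qquad [4]$$

where $W_t := -\sqrt{\kappa}\, B_{t/\kappa}$ is a standard Brownian 
motion. If $L_t = v_t(z) + W_t$, then we get 

$$dL_t = \frac{a}{2} \cot\left(\frac{L_t}{2}\right) dt + dW_t$$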
Radial and chordal SLE are closely related. In fact, if 
$\gamma$ is a chordal SLE path in $\mathbb{H}$ from 0 to $\infty$, $\tilde{\gamma}$ is a 
radial SLE path in $\mathbb{D}$ from 1 to 0, and $\gamma^* = -i \log \tilde{\gamma}$, 
then for small t the distribution of $\gamma^*$ is absolutely 
continuous with respect to the distribution of a (random time 
change of) $\gamma$. Showing this involves understanding 
the behavior of the Loewner equation under 
conformal transformations. Suppose $\gamma$, $\tilde{\gamma}$ have been 
parametrized as in [2] and [4] with $a = 2/\kappa$. Let $g_t^*$ 
be the conformal transformation of $\mathbb{H} \setminus \gamma^*[0,t]$ onto 
$\mathbb{H}$ such that 

$$g_t^*(z) = z + \frac{a^*(t)}{z} + \cdots, \qquad z \to \infty$$

and let $U^*$ be the Loewner driving function such that 

$$\partial_t g_t^*(z) = \frac{\dot{a}^*(t)}{g_t^*(z) - U_t^*}$$

Here $a^*(t) = \mathrm{hcap}[\gamma^*[0,t]]$. If we consider a time 
change $\sigma$ such that $a^*(\sigma(t)) = at$ and let $\tilde{U}_t = U^*_{\sigma(t)}$ 
be the time-changed driving function, Itô's formula 
can be used to show that 

$$d\tilde{U}_t = \tfrac{1}{2}(1 - 3a)\, F_t\, dt + dW_t \qquad [5]$$

where the $F_t$ in the drift term depends on $\gamma^*[0,t]$ and 
is independent of a, and W is a standard Brownian 
motion. Girsanov's theorem implies that Brownian 
motions with the same variance but different drifts 
have absolutely continuous distributions. In particular, 
qualitative properties such as existence of 
double points or Hausdorff dimension of paths are 
the same for radial and chordal SLE. $\tilde{U}_t$ is a driftless 
Brownian motion if a = 1/3, $\kappa = 6$. 

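Whole-plane $\mathrm{SLE}_\kappa$ from 0 to $\infty$ is a path 
$\gamma: (-\infty, \infty) \to \mathbb{C}$ with $\gamma(-\infty) = 0$, $\gamma(\infty) = \infty$, such 
that given $\gamma(-\infty, t]$, the distribution of $\gamma[t, \infty)$ is 
that of radial $\mathrm{SLE}_\kappa$ from boundary point $\gamma(t)$ to 
interior point $\infty$ in the domain $\mathbb{C} \setminus \gamma[-\infty, t]$. One can 
define whole-plane $\mathrm{SLE}_\kappa$ connecting two distinct 
points in $\mathbb{C}$ by conformal transformation. 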


Locality and Restriction 


There are two special values of $\kappa$: $\kappa = 6$, a = 1/3, which 
satisfies the "locality" property, and $\kappa = 8/3$, a = 3/4, 
which satisfies the "restriction" property. Suppose $\gamma$ is a 
chordal $\mathrm{SLE}_\kappa$ curve from 0 to $\infty$ in $\mathbb{H}$ parametrized 
as in [2]. Suppose $\Phi: \mathcal{N} \to \mathbb{H}$ is a conformal map 
taking a neighborhood $\mathcal{N}$ of 0 in $\mathbb{H}$ to $\Phi(\mathcal{N})$ and that 
locally maps $\mathbb{R}$ into $\mathbb{R}$. Let $\gamma^*(t) = \Phi \circ \gamma(t)$, which is 
defined for sufficiently small t. Let $g_t^*$ be the 
conformal transformation of $\mathbb{H} \setminus \gamma^*[0,t]$ onto $\mathbb{H}$ with 














$$g_t^*(z) = z + \frac{a^*(t)}{z} + \cdots, \qquad z \to \infty$$

and let $U_t^*$ be the driving function such that 

$$\partial_t g_t^*(z) = \frac{\dot{a}^*(t)}{g_t^*(z) - U_t^*}$$

Here $a^*(t) = \mathrm{hcap}[\gamma^*[0,t]]$. If we change time, 
W = Fou) So that a*(o(t)) =at, then an application 
of It6’s formula shows that Us := Uss satisfies 


ae ( 


x(t) (Wot) ? 
(3a — H- O oar wy, 
p’ ) 


dU* = 
o(t) (Woe) 


“2 


Here $W_t$ is a standard Brownian motion, $\Phi_t = g_t^* \circ \Phi \circ g_t^{-1}$, 
and $g_t$ is the conformal map associated to $\gamma$. 
In particular, if a = 1/3, $\kappa = 6$, the time-changed driving 
function is a standard Brownian motion; hence, $\gamma^*$ has 
the distribution of $\mathrm{SLE}_6$. The locality property for $\mathrm{SLE}_6$ can be stated 
as "the conformal image of $\mathrm{SLE}_6$ is (a time change 
of) $\mathrm{SLE}_6$." Intuitively, the $\mathrm{SLE}_6$ path in a restricted 
domain does not feel the boundary of the domain 
until it reaches it. Radial $\mathrm{SLE}_6$ satisfies a similar 
locality property. Moreover, [5] can be used to 
show that the image of chordal $\mathrm{SLE}_6$ under the 
exponential map is the same (for small time t) as 
radial $\mathrm{SLE}_6$. The locality property explains why 
$\mathrm{SLE}_6$ is a natural candidate for the boundary of 
percolation clusters. 

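If $\kappa \le 4$, $\mathrm{SLE}_\kappa$ paths are simple, that is, with no 
self-intersections. Suppose $A \subset \mathbb{H} \setminus \{0\}$ is a compact 
set such that $\mathbb{H} \setminus A$ is simply connected. Let $\gamma$ denote 
a chordal $\mathrm{SLE}_\kappa$ in $\mathbb{H}$ connecting 0 and $\infty$ and 
let $E_A$ be the event $E_A = \{\gamma(0, \infty) \cap A = \emptyset\}$. Let 
$\Phi_A: \mathbb{H} \setminus A \to \mathbb{H}$ be the unique conformal transformation 
with $\Phi_A(0) = 0$, $\Phi_A(\infty) = \infty$, $\Phi_A'(\infty) = 1$. On 
the event $E_A$, we can define $\tilde{\gamma}(t) = \Phi_A \circ \gamma(t)$. Chordal 
$\mathrm{SLE}_\kappa$ is said to satisfy the restriction property if the 
conditional distribution of $\tilde{\gamma}$ given $E_A$ is the same as 
(a time change of) $\gamma$. The only $\kappa \le 4$ that satisfies 
this property is $\kappa = 8/3$. The proof of this fact also 
establishes the formula: if $\gamma$ is a chordal $\mathrm{SLE}_{8/3}$ 
curve in $\mathbb{H}$ from 0 to $\infty$, then 

$$\mathbb{P}\{\gamma(0, \infty) \cap A = \emptyset\} = \Phi_A'(0)^{5/8} \qquad [6]$$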


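There is a similar formula for radial $\mathrm{SLE}_{8/3}$, which 
establishes a radial restriction property. Suppose 
$A \subset \mathbb{D} \setminus \{0, 1\}$ is a compact set such that $\mathbb{D} \setminus A$ is 
simply connected. Let $\Psi_A$ be the unique conformal 
transformation of $\mathbb{D} \setminus A$ onto $\mathbb{D}$ with $\Psi_A(0) = 0$, $\Psi_A'(0) > 0$. 
Then, if $\gamma$ is a radial $\mathrm{SLE}_{8/3}$ curve from 1 to 0 in $\mathbb{D}$, 

$$\mathbb{P}\{\gamma(0, \infty) \cap A = \emptyset\} = |\Psi_A'(0)|^{5/48}\, |\Psi_A'(1)|^{5/8}$$

The restriction property makes $\mathrm{SLE}_{8/3}$ a natural candidate 
for the scaling limit of self-avoiding walks. 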


Relation to Conformal Field Theory 


The Schramm-Loewner evolution is one of the tools 
used to rigorously prove predictions made using 
powerful, yet nonrigorous, arguments of conformal 
field theory. In conformal field theory, there is a 
parameter c, called the central charge, which 
classifies theories. To each $c \le 1$, there corresponds 
a $\kappa \le 4$ and a "dual" $\kappa' = 16/\kappa \ge 4$: 

$$c = c_\kappa = \frac{(8 - 3\kappa)(\kappa - 6)}{2\kappa}$$

In particular, $\kappa = 8/3$, $\kappa' = 6$ corresponds to central 
charge zero. It is expected, and has been proved in a 
number of cases, that $\mathrm{SLE}_\kappa$ or $\mathrm{SLE}_{\kappa'}$ curves will 
appear in scaling limits of systems with central 
charge $c_\kappa$. These systems can also be parametrized 
by the boundary scaling exponent or conformal 
weight 

$$\alpha = \alpha_\kappa = \frac{6 - \kappa}{2\kappa}$$

For $\kappa = 8/3$, $\alpha = 5/8$, which is the exponent in [6]. 
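
As a quick check of these relations, the sketch below evaluates the central charge, the dual parameter and the boundary exponent with exact fractions. The function names are illustrative, and the formulas are the ones quoted in this section: c = (8 − 3κ)(κ − 6)/(2κ), κ' = 16/κ and α = (6 − κ)/(2κ).

```python
from fractions import Fraction


def central_charge(kappa):
    """c = (8 - 3 kappa)(kappa - 6) / (2 kappa)."""
    k = Fraction(kappa)
    return (8 - 3 * k) * (k - 6) / (2 * k)


def dual_kappa(kappa):
    """Dual parameter kappa' = 16 / kappa."""
    return Fraction(16) / Fraction(kappa)


def boundary_exponent(kappa):
    """Boundary scaling exponent alpha = (6 - kappa) / (2 kappa)."""
    k = Fraction(kappa)
    return (6 - k) / (2 * k)
```

The central charge is unchanged under κ → 16/κ, which is why a single value of c is shared by a simple curve and its non-simple dual.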





In studying the relationship between SLE and 
conformal field theories, two other probabilistic 
objects, restriction measures and the (Brownian) 
loop soup, arise. An $\mathbb{H}$-hull (connecting 0 and $\infty$) is 
an unbounded, connected, closed set $K \subset \overline{\mathbb{H}}$ with 
$K \cap \mathbb{R} = \{0\}$ and such that $\mathbb{H} \setminus K$ consists of two 
connected components, one whose boundary 
includes the positive reals and the other whose 
boundary includes the negative reals. A (chordal) 
restriction measure on hulls K is a probability 
measure with the property that for any A as in [6], 
the distribution of $\Phi_A \circ K$ given $\{K \cap A = \emptyset\}$ is the 
same as the original measure. The (Brownian) loop 
measure is a measure on unrooted loops derived 
from Brownian bridges. It is the scaling limit of the 
measure on random walk loops that gives each 
unrooted simple random walk loop of length 2n 
measure $4^{-2n}$. The loop measure in a bounded 
domain is obtained by restricting to loops that stay 
in that domain. We can consider this as a measure 
on "hulls" by filling in the bounded holes (so that 
the complement of the hull is connected). By doing 
this we get a family of infinite measures on hulls, 
indexed by domains D, and this family satisfies 
conformal invariance and the restriction property. 
The loop soup with parameter $\lambda$ is a Poissonian 
realization from this measure with parameter $\lambda$. 

The set of all restriction measures is parametrized 
by $\alpha \ge 5/8$; the $\alpha$-restriction measure has the 
property that 

$$\mathbb{P}\{K \cap A = \emptyset\} = \Phi_A'(0)^{\alpha}$$

For $\alpha = 5/8$, K is given by the path of $\mathrm{SLE}_{8/3}$. For 
integer $\alpha$, the hull K can be constructed by taking 
$\alpha$ independent Brownian excursions in $\mathbb{H}$ (Brownian 
motions starting at 0 conditioned to stay in $\mathbb{H}$ for all 
times), and letting K be the hull obtained by taking 
the union of the paths and filling in the bounded 
holes. If $\kappa \le 8/3$, $c_\kappa \le 0$, then the restriction measure 
with exponent $\alpha_\kappa \ge 5/8$ can also be constructed 
as follows: take a chordal $\mathrm{SLE}_\kappa$ path and 
an independent realization of the loop soup with 
intensity $\lambda_\kappa = -c_\kappa$; add to the $\mathrm{SLE}_\kappa$ path all the 
loops in the soup that intersect the $\mathrm{SLE}_\kappa$ curve; and 
then fill in all the bounded holes. The limiting case 
$\alpha = 5/8$, $\lambda = 0$ gives the only measure supported on 
simple curves that is also a restriction measure, 
$\mathrm{SLE}_{8/3}$. 

For $8/3 < \kappa \le 4$, $0 < c_\kappa \le 1$, it is conjectured, 
and proved for small $c_\kappa$, that $\mathrm{SLE}_\kappa$ curves can be 
found by taking a loop soup with parameter $\lambda = c_\kappa$ 
and looking at connected curves in the fractal set 
given by the complement of the union of all the hulls 
generated by the loops. 


Examples 


The scaling limit of simple random walk, Brownian 
motion, is known to be conformally invariant. A 
two-dimensional Brownian bridge or loop is a 
Brownian motion, $B_t$, $0 \le t \le 1$, conditioned so 
that $B_0 = B_1$. The frontier or outer boundary of the 
Brownian motion is the boundary of the unbounded 
component of the complement. Benoit Mandelbrot 
first observed numerically that the outer boundary 
of Brownian motion had fractal dimension approximately 4/3. 
Gregory Lawler, Oded Schramm, and Wendelin 
Werner used SLE to prove that the boundary has 
Hausdorff dimension 4/3. In fact, the outer boundary 
can be considered as an $\mathrm{SLE}_{8/3}$ loop. 

$\mathrm{SLE}_6$ and $\mathrm{SLE}_{8/3}$ arise in the scaling limit of 
critical percolation on the triangular lattice. Suppose 
that each vertex in the upper half-plane triangular 
lattice is colored black or white, each with 
probability 1/2. Suppose the real line gives a 
boundary condition of black on the negative real 
line and white on the positive real line. Then if we 
represent the vertices in the lattice as hexagons as 
in the figure, a curve is formed which is the 
boundary between the black and white clusters. 
This curve is called the "percolation exploration 
process." Stanislav Smirnov proved that the scaling 
limit of this curve is conformally invariant, and from 
this it can be concluded that the curve is chordal 
$\mathrm{SLE}_6$. In particular, the Hausdorff dimension is 7/4 
and the scaling limit has double points. In the 
scaling limit, the "outer boundary" of this curve has 
Hausdorff dimension 4/3 and its distribution is 
absolutely continuous with respect to that of 
$\mathrm{SLE}_{8/3}$. While this result is expected for other 
critical percolation models, such as bond percolation 
in $\mathbb{Z}^2$ with critical probability 1/2, it has only been 
proved for the triangular lattice. Percolation has 
central charge 0 and the "locality" property can be 
seen in the lattice model. The outer boundary of the 
curve has the same distribution as the outer 
boundary of a Brownian motion that is reflected at 
angle $\pi/3$ off the real line. Locally, the outer 
boundary of percolation, the outer boundary of 
complex Brownian motion, and $\mathrm{SLE}_{8/3}$ all look the 
same, and it is expected that this will also be true for 
the scaling limit of self-avoiding walks. 

There are three models derived in some way from 
simple random walk that have been proved to have 
scaling limits of SLE,. The loop-erased random walk 
(LERW) in a finite subset V of Z connecting two 
distinct points is obtained by taking a simple 
random walk from one point to the other and 
erasing loops chronologically. The LERW is closely 
related to uniform spanning trees; in fact, if one 


chooses a spanning tree of V from the uniform 
distribution on all spanning trees, then the distribu- 
tion of the unique path connecting the two points is 
exactly that of the LERW (see Figure 3). Another 
description of the LERW is as the Laplacian random 
walk: the LERW from z to w in V chooses a new 
step weighted by the value of the function that is 
harmonic on the complement of w and the path up 
to that point with boundary values 0 on the path 
and 1 on w. The LERW in the discrete upper half- 
plane can be obtained by erasing loops from a 
simple random walk excursion. The LERW and the 
uniform spanning tree are systems with central 
charge $c = -2$. It has been proved that the scaling
limit of the LERW is $\mathrm{SLE}_2$; hence, the paths have
Hausdorff dimension 5/4.
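The chronological loop erasure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the finite set V is taken to be a square box of the lattice, and the names `neighbors`, `random_walk`, `loop_erase` and `lerw` are hypothetical helpers introduced here, not notation from the article.

```python
import random


def neighbors(v, size):
    """Lattice neighbors of v inside the box {0, ..., size-1}^2 (our choice of V)."""
    x, y = v
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in candidates if 0 <= a < size and 0 <= b < size]


def random_walk(start, target, size, rng):
    """Simple random walk on the box, stopped when it first hits target."""
    path = [start]
    while path[-1] != target:
        path.append(rng.choice(neighbors(path[-1], size)))
    return path


def loop_erase(path):
    """Erase loops in the order in which the walk creates them."""
    erased = []
    position = {}  # vertex -> its index in the erased path
    for v in path:
        if v in position:
            # The walk has closed a loop at v: drop everything after
            # the first visit to v and continue from there.
            cut = position[v]
            for u in erased[cut + 1:]:
                del position[u]
            del erased[cut + 1:]
        else:
            position[v] = len(erased)
            erased.append(v)
    return erased


def lerw(start, target, size=8, seed=0):
    """Loop-erased random walk from start to target in the box."""
    rng = random.Random(seed)
    return loop_erase(random_walk(start, target, size, rng))
```

For example, `loop_erase(["a", "b", "c", "b", "d"])` removes the loop b, c, b and keeps the self-avoiding path a, b, d.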

There is another path associated to spanning trees 
given by the one-to-one correspondence between 
spanning trees and Hamiltonian walks on a corre- 
sponding directed (Manhattan) lattice on the dual 
graph (see Figure 4). If the spanning trees, or 
equivalently the Hamiltonian walks, are chosen 
using the uniform distribution, then the scaling 
limit of this walk is the space-filling curve $\mathrm{SLE}_8$.
Note that 2 and 8 are the dual values of $\kappa$ associated
to $c = -2$.


Figure 3 A spanning tree and the path between two vertices. 
If the tree has the uniform distribution, the path has the 
distribution of the LERW. 













Figure 4 A spanning tree and the corresponding Hamiltonian 
walk. 


Another discrete process derived from simple 
random walk, the harmonic explorer, has a scaling 
limit of $\mathrm{SLE}_4$. There is a particular property of $\mathrm{SLE}_4$
that leads to the definition of this discrete process.
Consider a chordal $\mathrm{SLE}_\kappa$ curve, let $z \in \mathbb{H}$, and let $Z_t$
be as in [3] with $a = 2/\kappa$. Itô's formula shows that
$\Theta_t := \arg(Z_t)$ satisfies

$$d\Theta_t = \left(\frac{1}{2} - a\right) \frac{\sin(2\Theta_t)}{|Z_t|^2}\,dt - \frac{\sin\Theta_t}{|Z_t|}\,dW_t$$

In particular, $\Theta_t$ is a martingale if and only if
$a = 1/2$, $\kappa = 4$. The probability that a complex
Brownian motion starting at $z \in \mathbb{H}$ first hits $\mathbb{R}$ on
the negative half-line can be shown to be $\arg(z)/\pi$. If
$\kappa < 4$, then we can see that $\Theta_\infty$ equals 0 or $\pi$,
depending on whether z is on the right or left side
of the path $\gamma(0,\infty)$. For the martingale case $\kappa = 4$,
$\Theta_t/\pi$ represents the probability that z is on the left
side of $\gamma(0,\infty)$, given $\gamma(0,t]$. The harmonic
explorer is a process on the hexagonal lattice 
defined to have this property. In a way similar to 
the percolation process, the walk is defined as the 
boundary between black and white hexagons on 
the triangular lattice. However, when an unex- 
plored hexagon is reached in the harmonic 
explorer, it is colored black with probability q, 
where g is the probability that a simple random 
walk on the triangular lattice starting at that 
hexagon (considered as a vertex in the triangular 
lattice) hits a black hexagon before hitting a white 
hexagon. It is not difficult to show that this process 
has the property that for z away from the curve, 
the “probability of z ending on the left given the 
curve of n steps” is a martingale. 
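The drift computation behind the martingale criterion $a = 1/2$ can be spelled out as follows. This is a sketch under the assumption that the process of [3] solves $dZ_t = (a/Z_t)\,dt + dW_t$ with a standard Brownian motion $W_t$; that normalization is not restated in this section, so it is an assumption here.

```latex
% Ito's formula applied to log Z_t, using (dZ_t)^2 = dt:
d\log Z_t = \frac{dZ_t}{Z_t} - \frac{(dZ_t)^2}{2Z_t^2}
          = \left(a - \frac12\right)\frac{dt}{Z_t^2} + \frac{dW_t}{Z_t}.
% Write Z_t = |Z_t| e^{i\Theta_t}, so that
\operatorname{Im}\frac{1}{Z_t^2} = -\frac{\sin(2\Theta_t)}{|Z_t|^2},
\qquad
\operatorname{Im}\frac{1}{Z_t} = -\frac{\sin\Theta_t}{|Z_t|}.
% Taking imaginary parts, with \Theta_t = \operatorname{Im}\log Z_t:
d\Theta_t = \left(\frac12 - a\right)\frac{\sin(2\Theta_t)}{|Z_t|^2}\,dt
          - \frac{\sin\Theta_t}{|Z_t|}\,dW_t.
```

The drift vanishes identically exactly when $a = 1/2$, that is, $\kappa = 2/a = 4$.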

There are many other models for which $\mathrm{SLE}_\kappa$
curves are expected in the limit, but it has not been 
established. The most difficult part is to show the 
existence of a limit that is conformally invariant. 
One example is the self-avoiding walk (SAW). It is 
an open problem to establish that there exists a 
scaling limit of the uniform measure on SAWs and 
to establish conformal invariance of the limit. 
However, the nature of the discrete model is such 
that if the limit exists, it must satisfy the restriction 
property. Hence, under the assumption of conformal
invariance, the only possible limit is $\mathrm{SLE}_{8/3}$.
Numerical simulations strongly support the conjecture
that $\mathrm{SLE}_{8/3}$ is the limit of SAWs, and this
gives strong evidence for the conformal invariance 
conjecture for SAWs. Critical exponents for SAWs 
(as well as critical exponents for many other 
models) can be predicted nonrigorously from 
rigorous scaling exponents for the corresponding 
SLE paths. 
Generalizations


One of the reasons that the theory of SLE is nice for 
simply connected domains is that a simply connected 
domain with an arc connected to the boundary of the 
domain removed is again simply connected. For 
nonsimply connected domains, it is more difficult to 
describe because the conformal type of the slit 
domain changes as time evolves. In the case of a 
curve crossing an annulus, this can be done with an 
added parameter referring to the conformal type of 
the annulus (two annuli of the form $\{z : r_j < |z| < s_j\}$
are conformally equivalent if and only if
$r_1/s_1 = r_2/s_2$). It is not immediately obvious what
the correct definition of SLE should be in general
domains and, more generally, on Riemann surfaces.
One possibility for $\kappa < 4$ is to consider a configurational
(equilibrium statistical mechanics) view of
SLE. Consider a family of measures $\{\mu_D(z,w)\}$,
where D ranges over domains and z, w are distinct
boundary points at which $\partial D$ is locally analytic, supported
on simple curves from z to w (modulo time change).
Let $\mu_D^{\#}(z,w) = \mu_D(z,w)/|\mu_D(z,w)|$ be the corresponding
probability measures, which may be defined even
if $\partial D$ is not smooth at z, w. Then the following
axioms should hold:


• Conformal invariance. If $f : D \to D'$ is a conformal
transformation, then $f \circ \mu_D^{\#}(z,w) = \mu_{D'}^{\#}(f(z), f(w))$.

• Conformal Markov property.

• Perturbation of domains. Suppose $D_1 \subset D$ and
$\partial D_1$, $\partial D$ agree near z, w. Then $\mu_{D_1}(z,w)$ should
be absolutely continuous with respect to $\mu_D(z,w)$.
Let Y denote the Radon–Nikodym derivative of
$\mu_{D_1}(z,w)$ with respect to $\mu_D(z,w)$. Then

$$Y(\gamma) = 1\{\gamma(0,t_\gamma) \subset D_1\}\, F_c(D; \gamma, D \setminus D_1)$$


where $F_c$ is to be determined. In the case where
$D$, $D_1$ are simply connected, $F_c(D; \gamma, D \setminus D_1) =$
J(y, D, D1) €, where J(y, D, D1) denotes the prob- 
ability that there is a loop in the Brownian loop 
soup in D that intersects both y and D\ D1. (There 
is no problem defining this quantity in nonsimply 
connected domains, but it is not clear that it is the 
right quantity.) Here $c = c_\kappa$. The restriction property
tells us that $F_0 = 1$.


• Conformal covariance. If f is as above, and $\partial D$, $\partial D'$ are
smooth near z, w and f(z), f(w), respectively, then

$$f \circ \mu_D(z,w) = |f'(z)|^{a}\,|f'(w)|^{a}\,\mu_{D'}(f(z), f(w))$$

Here $a = a_\kappa$ is the boundary scaling exponent.


See also: Boundary Conformal Field Theory; Percolation 
Theory; Random Walks in Random Environments. 
Introduction


The concept of stochastic resonance was introduced 
by physicists. It originated in a toy model designed 
for a qualitative description of periodicity phenomena
in the recurrences of glacial eras in Earth’s
history. Its popularity has since spread to numerous areas
of the natural sciences: neuronal response to periodic
stimuli, variations of magnetization in a ferromag- 
netic system, voltage variations in the simple Schmitt 
trigger electronic circuit or in more complicated 
devices, behavior of lasers in optical bi-stability, etc. 
The interest in this ubiquitous phenomenon is 
enhanced by signal analysis: an optimal dose of 
noise in some system can essentially boost signal 
transduction. Noise in this context does not enter the 
system as an impurity perturbing its performance, but 
on the contrary as a catalyst triggering amplified 
stochastic response to weak periodic signals. 


The Climate Paradigm 


The phenomenon of stochastic resonance was first 
discovered in an elementary climate model serving in 
an explanation of major transitions in paleoclimatic 
time series confining glacial cycles. Data collected 
for instance from ice or deep sea cores allow one to
deduce estimates of the average temperature on
Earth over the last 700 000 years. They exhibit
periodic switching between ice and warm ages with
fast spontaneous transitions. The average periodicity
of the glaciation time series obtained is ~$10^5$ years.
In order to explain temperature variations, Benzi 
et al. (1981) introduced random perturbations into 
an energy balance model of the Budyko-Sellers type. 
This model describes the evolution of the seasonal 
and global average temperature X caused by defects 
in the balance between incoming and outgoing 
radiation 


$$c\,\frac{dX(t)}{dt} = E_{\mathrm{in}} - E_{\mathrm{out}}$$

where c is the active thermal inertia of the system.
The incoming energy is modeled as proportional
to the “solar constant” Q:

$$E_{\mathrm{in}} = Q\left(1 + A\cos\frac{2\pi t}{T}\right)$$

with $T \sim 92\,000$ years and $A \approx 0.1\%$, that is, a
modulation of about 0.1% of Q. This exceedingly small
variation of the solar constant is caused by a modulation
of the orbital eccentricity of the Earth’s trajectory
(Figure 1).

Figure 1 Modulation of the orbital eccentricity.

The outgoing radiation $E_{\mathrm{out}}$ is composed
of two essential parts. The first part $a(X)E_{\mathrm{in}}$ is
dominated by the albedo a(X) representing the
proportion of energy reflected back to space. It is a
decreasing function of temperature, since low
temperatures imply a bigger volume of ice and hence a
brighter, more reflective Earth.
The second part of the outgoing radiation comes
from the fact that the Earth radiates energy like a
black body, and is given by the Stefan–Boltzmann law $\gamma X^4$,
where $\gamma$ is the Stefan constant. Describing the
balance of energy terms as a slowly and weakly
time-varying gradient of a potential U, the balance
model can be expressed by

$$\frac{dX(t)}{dt} = -\frac{\partial U}{\partial x}\left(\frac{t}{T}, X(t)\right)$$

where the time period 1 is blown up to (large) T by 
time scaling. The roles of deep and shallow wells 
switch periodically (Figure 2). Since the variation of 
the solar constant is extremely small, we can assume 
that the height of the barrier between the two wells 
is lower-bounded by a positive constant. The system 
then admits three steady states, two of which are
stable and separated by roughly 10 K. Like the solar
constant, they fluctuate slowly and very weakly.
Therefore, this deterministic system cannot account
for climate changes with temperature variations of
~10 K. These can only be explained by allowing
transitions between the two stable steady states, which
become possible by adding noise to the system. In
general, short timescale phenomena such as annual
fluctuations in solar radiation are modeled by
Gaussian white noise of intensity $\varepsilon$ and lead to
equations of the type

$$dX_t^\varepsilon = -\frac{\partial U}{\partial x}\left(\frac{t}{T}, X_t^\varepsilon\right)dt + \sqrt{\varepsilon}\,dW_t \qquad [1]$$


which are generic for studying stochastic resonance 
in numerous physical and biological models. Generally,
the input of noise amplifies a weak periodic
signal by creating trajectories that switch in a randomly
periodic way between meta-stable states. An optimal
tuning of noise intensity to period length (“stochastic
resonance”) significantly enhances the response
of the random system to weak perturbations with
long periods.

Figure 2 Deep and shallow wells switching periodically.
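A small simulation can make equations of type [1] concrete. The sketch below uses the Euler–Maruyama scheme for $dX = -(\partial U/\partial x)(t/T, X)\,dt + \sqrt{\varepsilon}\,dW$. The potential $U(u,x) = x^4/4 - x^2/2 - A\cos(2\pi u)\,x$, the amplitude `amp`, the step size and the function names are all hypothetical choices made for illustration; they are not the climate model of the text.

```python
import math
import random


def drift(u, x, amp=0.1):
    """-dU/dx for the assumed potential U(u, x) = x^4/4 - x^2/2 - amp*cos(2*pi*u)*x."""
    return -(x ** 3 - x) + amp * math.cos(2.0 * math.pi * u)


def simulate(eps, period, n_periods=2, dt=0.01, x0=-1.0, seed=0):
    """Euler-Maruyama path of dX = drift(t/period, X) dt + sqrt(eps) dW."""
    rng = random.Random(seed)
    n_steps = int(round(n_periods * period / dt))
    x = x0
    path = [x0]
    for k in range(n_steps):
        t = k * dt
        # Brownian increment over one step has standard deviation sqrt(dt).
        noise = math.sqrt(eps * dt) * rng.gauss(0.0, 1.0)
        x += drift(t / period, x) * dt + noise
        path.append(x)
    return path
```

With `eps=0.0` the path stays in the well it starts in, which mirrors the remark that the deterministic system cannot produce the large transitions; whether and how regularly a noisy path switches wells depends on how `eps` is tuned against `period`.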


Strongly Damped Brownian Particle 


It is useful to roughly compare solutions of 
stochastic differential equations and motions of 
Brownian particles in double-well landscapes 
(Figure 3) in order to understand properties of 
their trajectories (see Schweitzer 2003, Mazo 2002). 
As in the previous section, let us concentrate on a 
one-dimensional setting, remarking that we shall 
give a treatment that easily generalizes to the finite- 
dimensional setting. Due to Newton’s law, the 
motion of a particle is governed by the impact of 
all forces acting on it. Let us denote by F the sum of
these forces, m the mass, x the space coordinate, and
v the velocity of the particle. Then


$$m\,\dot v = F$$


Let us first assume the potential to be switched off. 
In their pioneering work at the turn of the 
twentieth century, Marian v. Smoluchowski and 
Paul Langevin introduced stochastic concepts to 
describe the Brownian particle motion by claiming 
that at time t 


$$F(t) = -\gamma_0\,v(t) + \sqrt{2k_BT\gamma_0}\;\dot W_t$$


The first term results from friction $\gamma_0$ and is velocity
dependent. An additional stochastic force represents
random interactions between Brownian particles and
their simple molecular random environment. The
white noise $\dot W$ (formal derivative of the Wiener
process) plays the crucial role. The diffusion coefficient
(standard deviation of the random impact) is composed
of Boltzmann’s constant $k_B$, friction, and
environmental temperature T. It satisfies the condition
of the fluctuation-dissipation theorem expressing the
balance of energy loss due to friction and energy gain
resulting from noise. The equation of motion becomes


$$\frac{dx(t)}{dt} = v(t), \qquad dv(t) = -\frac{\gamma_0}{m}\,v(t)\,dt + \frac{\sqrt{2k_BT\gamma_0}}{m}\,dW_t$$





\ 


Figure 3 Brownian particle in a double-well landscape.

Figure 4 Resonance pictures for diffusions.


Its solution is provided by the Ornstein–Uhlenbeck process

$$v(t) = v(0)\,e^{-(\gamma_0/m)t} + \frac{\sqrt{2k_BT\gamma_0}}{m}\int_0^t e^{-(\gamma_0/m)(t-s)}\,dW_s$$

The ratio $\beta := \gamma_0/m$ determines the dynamic behavior.
Let us focus on the over-damped situation with
large friction and very small mass. Then for
$t \gg 1/\beta = \tau$ (relaxation time), the first term in the
expression for velocity can be neglected, while the
stochastic integral represents a Gaussian process. By
integrating, we obtain in the over-damped limit
($\beta \to \infty$) that v and thus x is Gaussian with almost
constant mean

$$m(t) = x(0) + \frac{1 - e^{-\beta t}}{\beta}\,v(0) \approx x(0)$$

and covariance close to the covariance of Brownian
motion (see Nelson 1967):

$$K(s,t) = \frac{2k_BT}{\gamma_0}\min(s,t) + \frac{k_BT}{\gamma_0\beta}\left(-2 + 2e^{-\beta t} + 2e^{-\beta s} - e^{-\beta|t-s|} - e^{-\beta(t+s)}\right) \approx \frac{2k_BT}{\gamma_0}\min(s,t)$$


Hence, the time-dependent change of the velocity of
the Brownian particle can be neglected: the velocity
rapidly thermalizes ($\dot v \approx 0$), while the spatial
coordinate remains far from equilibrium. In the so-called
adiabatic transformation, the evolution of the
particle’s position is thus given by the transformed
Langevin equation

$$dx(t) = \frac{\sqrt{2k_BT\gamma_0}}{\gamma_0}\,dW_t$$
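The over-damped approximation can be checked numerically. The sketch below evaluates the position covariance of the Ornstein–Uhlenbeck model, assuming $v(0) = 0$ and the closed form $\frac{2k_BT}{\gamma_0}\min(s,t) + \frac{k_BT}{\gamma_0\beta}(-2 + 2e^{-\beta t} + 2e^{-\beta s} - e^{-\beta|t-s|} - e^{-\beta(t+s)})$, and compares it with the Brownian covariance $\frac{2k_BT}{\gamma_0}\min(s,t)$ of the transformed Langevin equation. The function names and the default values `kT = 1`, `gamma0 = 1` are illustrative assumptions.

```python
import math


def position_covariance(s, t, beta, kT=1.0, gamma0=1.0):
    """Cov(x(s), x(t)) for the Ornstein-Uhlenbeck velocity model, assuming v(0) = 0."""
    diffusive = 2.0 * kT / gamma0 * min(s, t)
    correction = kT / (gamma0 * beta) * (
        -2.0
        + 2.0 * math.exp(-beta * t)
        + 2.0 * math.exp(-beta * s)
        - math.exp(-beta * abs(t - s))
        - math.exp(-beta * (t + s))
    )
    return diffusive + correction


def overdamped_covariance(s, t, kT=1.0, gamma0=1.0):
    """Covariance of the limiting process dx = sqrt(2 kT / gamma0) dW."""
    return 2.0 * kT / gamma0 * min(s, t)
```

For times well beyond the relaxation time the correction term is of order $k_BT/(\gamma_0\beta)$, so the two covariances agree up to that order when $\beta$ is large.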


Let us next suppose that we have a Brownian
particle in an external field of force (see Figure 3),
associated with a potential U(t,x). This leads to the
Langevin equation

$$\frac{dx(t)}{dt} = v(t), \qquad m\,dv(t) = -\gamma_0\,v(t)\,dt - \frac{\partial U}{\partial x}(t, x(t))\,dt + \sqrt{2k_BT\gamma_0}\,dW_t$$

In the over-damped limit, after relaxation time, the
adiabatic elimination of the fast variables (Gardiner
2004) leads to an equation similar to the one
encountered in the previous section:

$$dx(t) = -\frac{1}{\gamma_0}\,\frac{\partial U}{\partial x}(t, x(t))\,dt + \frac{\sqrt{2k_BT\gamma_0}}{\gamma_0}\,dW_t$$


In the particular case of some double-well potential
$x \mapsto U(t,x)$ with slow periodic variation, the following
patterns of behavior of the solution trajectories
will be experienced. If temperature is high, noise has
a predominant influence on the motion, and the
particle often crosses the barrier separating the two
wells during one period. The behavior of the particle
does not seem to be periodic but rather chaotic. If
temperature is low, the particle stays for a very
long time in the starting well, fluctuating weakly
around the equilibrium position. Its energy is too
low to follow the periodic variation of the
potential. So in this case too, the trajectories do
not look periodic. Between these two extreme
situations, there exists a regime of noise intensities
for which the energy transmitted by the noise is
sufficient to cross the barrier almost twice per
period. The parameters are then close to the
resonance point and the motion exhibits periodic
switching (Figure 4).





Transition Criteria 
and Quasideterministic Motion 


Studying stochastic resonance accordingly means 
looking for the range of regimes for which periodic 
behavior is enhanced and eventually optimal. The 
optimal relation between period T and noise 


intensity $\varepsilon$ emerges in the small noise limit. To
explain this, let us focus on the basic indicator for
periodic transitions: the time the Brownian particle
needs to exit from the starting well, say the left one.
In the “frozen” case, that is, if the time variation of
the potential term is eliminated just by freezing it at
some time s, the asymptotics of the exit time is
derived from the classical large deviation theory of
randomly perturbed dynamical systems (see Freidlin
and Wentzell 1998). Let us assume that U is locally
Lipschitz. We denote by $D_l$ (resp. $D_r$) the domain
corresponding to the left (resp. right) well and by $\chi$
their common boundary. The law of the first exit
time $\tau^\varepsilon_{D_l} = \inf\{t > 0 : X_t^\varepsilon \notin D_l\}$ is described by some
particular functional related to large deviations. For
$t > 0$, we introduce the “action functional” on the
space of continuous functions $C([0,t])$ on $[0,t]$ by

$$S_t^s(\varphi) = \begin{cases} \dfrac{1}{2}\displaystyle\int_0^t \left(\dot\varphi_u + \frac{\partial U}{\partial x}(s, \varphi_u)\right)^2 du & \text{if } \varphi \text{ is absolutely continuous} \\ +\infty & \text{otherwise} \end{cases}$$


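which is non-negative and vanishes on the set
of solutions of the ordinary differential equation
$\dot x = -(\partial U/\partial x)(s, x)$. Let $x, y \in \mathbb{R}$. In relation with
the action functional, we define the quasipotential

$$V(x,y) = \inf\{S_t^s(\varphi) : \varphi \in C([0,t]),\ \varphi_0 = x,\ \varphi_t = y,\ t \ge 0\}$$

It represents the minimal work the diffusion starting
in x has to do in order to reach y. To switch wells,
the Brownian particle starting in the left well’s
bottom $x_l$ has to overcome the barrier. So we let

$$V_l = \inf_{y \in \chi} V(x_l, y)$$

This minimal work needed to exit from the left well
can be computed explicitly, and is seen to equal
twice its depth. The asymptotic behavior of the exit
time is expressed by

$$\lim_{\varepsilon \to 0} \varepsilon \log \mathbb{E}_x\left[\tau^\varepsilon_{D_l}\right] = V_l$$

and

$$\lim_{\varepsilon \to 0} \mathbb{P}_x\left(e^{(V_l - \delta)/\varepsilon} < \tau^\varepsilon_{D_l} < e^{(V_l + \delta)/\varepsilon}\right) = 1 \quad \text{for any } \delta > 0$$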


The prefactor complementing the exponential rate derived by
Freidlin and Wentzell (1998) was first given by
Eyring and Kramers and then by Bovier et al. (2004).
Let us now assume that the left well is the deeper
one at time s, and let $V_r$ denote the minimal work needed
to exit from the right well, defined like $V_l$. If the Brownian
particle has enough time to cross the barrier, that is, if
$T > e^{V_r/\varepsilon}$, then whatever the starting point x is,
Freidlin (2000) proved that it should stay near $x_l$ in
the following sense:

$$\Lambda\left(t \in [0,1] : |X^\varepsilon_{tT} - x_l| > \delta\right) \to 0$$

in probability as $\varepsilon \to 0$. Here $\Lambda$ denotes Lebesgue
measure on $\mathbb{R}$. If $T < e^{V_r/\varepsilon}$, the time left is not long
enough for crossings: the particle stays in the
starting well, near the stable equilibrium point:

$$\Lambda\left(t \in [0,1] : \left|X^\varepsilon_{tT} - \left(x_l 1_{\{x \in D_l\}} + x_r 1_{\{x \in D_r\}}\right)\right| > \delta\right) \to 0$$

This observation is at the basis of Freidlin’s law of
quasideterministic periodic motion discussed in the
subsequent section. The lesson it teaches is this: to
observe switching of the position to the energetically
most favorable well, T should be larger than some
critical level $e^{\lambda/\varepsilon}$. Measuring time in exponential
scales by $\mu$ through the equation $T^\varepsilon = e^{\mu/\varepsilon}$, the
condition becomes $\mu > \lambda$.
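The exponential time scale can be made concrete with a few lines of arithmetic. The sketch below converts a period T and a noise intensity $\varepsilon$ into the scale $\mu = \varepsilon \log T$ and compares it with a critical level $\lambda$. The function names are hypothetical, and only the exponential order is captured: subexponential prefactors are ignored.

```python
import math


def exponential_scale(period, eps):
    """Scale mu defined through period = exp(mu / eps)."""
    return eps * math.log(period)


def exit_time_order(work, eps):
    """Exponential order exp(work / eps) of the exit time (prefactor ignored)."""
    return math.exp(work / eps)


def switching_possible(period, eps, lam):
    """True if the period exceeds the critical level exp(lam / eps), i.e. mu > lam."""
    return exponential_scale(period, eps) > lam
```

For instance, with $\varepsilon = 0.1$ and $\lambda = 1$, a period of $e^{12}$ corresponds to $\mu = 1.2 > \lambda$, while a period of $e^{8}$ gives $\mu = 0.8 < \lambda$, so only the first is long enough for the particle to leave the shallow well.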


Stochastic Resonance for Landscapes, 
Frozen on Half-Periods 


This particular case has analytical advantages, since 
it allows one to employ classical techniques of 
semigroup and operator theory. The situation is the 
following: let U be a double-well potential with 
minima $x_l = -1$ and $x_r = 1$ and a saddle point at
the origin. We assume that $U(x) \to \infty$ as $|x| \to \infty$
and $U(-1) = -V/2 = -V_l/2$, $U(1) = -v/2 = -V_r/2$,
$U(0) = 0$, and $0 < v < V$. We define the 1-periodic
potential by $U(t,x) = U(x)$ for $t \in [0, 1/2)$ and
$U(t,x) = U(t + 1/2, -x)$. Hence on each
half-period the corresponding diffusion is time
homogeneous. The critical level $\lambda$ is then easily defined by
$\lambda = v$, that is, twice the depth of the shallow well.
Denoting by

$$\phi(t) = \begin{cases} -1 & \text{for } t \in [k, k + \tfrac12) \\ \phantom{-}1 & \text{for } t \in [k + \tfrac12, k + 1) \end{cases} \qquad k = 0, 1, 2, \ldots$$

the periodic function which describes the location of
the global minimum of the potential, we get in the
small noise limit

$$\Lambda\left(t \in [0,1] : |X^\varepsilon_{tT} - \phi(t)| > \delta\right) \to 0$$

in probability as $\varepsilon \to 0$. This result expresses
Freidlin’s law of quasideterministic motion: for 
large periods, the trajectories of the particle 
approach a periodic deterministic function. But the 
sense in which this notion measures periodicity does 
not take into account that for large periods short 
excursions to the wrong well may occur in an erratic 
way without counting much for Lebesgue measure 
of time. In fact, if the period is too large, that is,
$\mu > V$, the time available in one period permits the
exit of not only the shallow well but also that of the
deep well. So, whatever the starting position of
the particle is, the number of observed transitions in
one half period becomes very large. Indeed the first
time $\sigma$ the particle starting in $x_l$ hits again $x_l$ after
visiting the position $x_r$ satisfies

$$E(\sigma) \approx e^{V/\varepsilon} \ll T^\varepsilon = e^{\mu/\varepsilon}$$

The motion of the particle appears more chaotic
than periodic: noise intensity is too large compared
to period length. We avoid this range of chaotic
spontaneous transitions by defining the resonance
interval $I_R = [v, V]$, as the range of admissible energy
parameters $\mu$ for randomly periodic behavior. In this
regime, the trajectories possess periodicity properties.
In these terms the resonance point describes the
tuning rate $\mu_R \in I_R$ for which the stochastic response
to weak external periodic forcing is optimal. To
make sense, this point has to refer to some measure 
of quality for periodicity of random trajectories. In 
the huge physics literature concerning resonance, 
two families of criteria can be distinguished. The 
first one is based on invariant measures and spectral 
properties of the infinitesimal generator associated 
with the diffusion $X^\varepsilon$. Now, $X^\varepsilon$ is not a time-homogeneous
Markov process and consequently does not admit invariant
measures. But by taking into account deterministic
motion of time in the interval of periodicity and
considering the process $Z_t = (t \bmod T^\varepsilon, X_t^\varepsilon)$, we
obtain a Markov process with an invariant measure
$\nu_t(x)\,dx$. In other words, the law of $X_t \sim \nu_t(x)\,dx$ and
the law of $X_{t+T} \sim \nu_{t+T}(x)\,dx$, under this measure,
are the same for all $t \ge 0$. Let us present the most
important criteria of this family:


• the spectral power amplification (SPA), which
plays an eminent role in the physics literature,
describes the energy carried by the spectral
component of the averaged trajectories of $X^\varepsilon$
corresponding to the period:

$$M_{\mathrm{SPA}}(\varepsilon, T) = \left|\int_0^1 E_\nu\left[X^\varepsilon_{sT}\right] e^{2\pi i s}\,ds\right|^2$$

• the SPA-to-noise ratio, giving the ratio of the
amplitude of the response and the noise intensity,
which is also related to the signal-to-noise ratio:

$$M_{\mathrm{SPN}}(\varepsilon, T) = M_{\mathrm{SPA}}(\varepsilon, T)/\varepsilon^2$$

• the total energy of the averaged trajectories

$$M_{\mathrm{En}}(\varepsilon, T) = \int_0^1 \left(E_\nu\left[X^\varepsilon_{sT}\right]\right)^2 ds$$

Figure 5 Resonance pictures for Markov chain.


The second family of criteria is more probabilistic. 
It refers to quality measures based on transition 
times between the domains of attraction of the local 
minima, residence times distributions measuring the 
time spent in one well between two transitions, or 
interspike times. This family is certainly less popular 
in the physics community. 

However, measures related to invariant measures 
may suffer from robustness deficiency (Imkeller and 
Pavlyukevich 2002). To explain what we mean by 
robustness, let us introduce a model reduction first 
discussed by McNamara and Wiesenfeld (1989). 
Instead of studying the diffusion $X^\varepsilon$ in the double-well
landscape, they introduce a two-state Markov
chain $Y^\varepsilon$ (Figure 5), the dynamics of which just
takes account of the domain of attraction the diffusion
is in, and therefore with state space $\{-1, 1\}$. A
reasonable choice of the infinitesimal generator should
retain the dynamics of the diffusion’s transitions
characterized by Kramers’ rate. We may take

$$Q(t) = \begin{pmatrix} -\varphi & \varphi \\ \psi & -\psi \end{pmatrix}, \quad 0 \le t < \frac{T}{2}, \qquad Q(t) = \begin{pmatrix} -\psi & \psi \\ \varphi & -\varphi \end{pmatrix}, \quad \frac{T}{2} \le t < T$$

periodically continued on $\mathbb{R}_+$. Here, $\varphi = p\,e^{-V/\varepsilon}$ and
$\psi = q\,e^{-v/\varepsilon}$. The prefactors of subexponential order
are beyond the scope of large deviation theory. They
are related to the curvature of the potential in the
minima and the saddle point of the landscape and
given by

$$p = \frac{1}{2\pi}\sqrt{U''(-1)\,|U''(0)|}, \qquad q = \frac{1}{2\pi}\sqrt{U''(1)\,|U''(0)|}$$

On the intervals $[kT/2, (k+1)T/2[$, $k \ge 0$, the
Markov chain $Y^\varepsilon$ is time-homogeneous and its
transition probabilities can be expressed in terms
of $\varphi$ and $\psi$. For instance, the probability with which
the chain jumps from state $-1$ to state $+1$ in the time
window $[t, t+h]$ equals $\varphi h + o(h)$, if this time
interval is contained in $[kT/2, (k+1)T/2[$ for
some even k. The stationary measure of the Markov
chain, denoted by $\nu$, can be explicitly calculated, and
so can the classical quality measures based on the
spectral notions. For instance, the spectral power
amplification coefficient equals

$$M_{\mathrm{SPA}}(\varepsilon, T) = \left|\int_0^1 E_\nu\left[Y^\varepsilon_{sT}\right] e^{2\pi i s}\,ds\right|^2 = \frac{4}{\pi^2}\,\frac{T^2(\varphi - \psi)^2}{(\varphi + \psi)^2 T^2 + \pi^2}$$

This simple expression admits asymptotically a
unique maximum which exhibits the resonance
point:

$$T_{\mathrm{opt}}(\varepsilon) = \frac{\pi}{\sqrt{2pq}}\,\sqrt{\frac{v}{V - v}}\;e^{(V+v)/2\varepsilon}\left\{1 + O\left(e^{-(V-v)/\varepsilon}\right)\right\}$$


The optimal period is then exponentially large — as 
was suggested by large deviation theory — and the 
growth rate is the sum of the two wells’ depths. The 
simple Markov chain model is popular since the
usual physical quantities are easily computable and
since it is believed to mimic the dynamics of a
Brownian particle in the corresponding double-well
landscape. However, the models are not as similar
as expected (Freidlin 2003). Indeed, in a reasonably
large time window around the resonance point for
$Y^\varepsilon$, the tuning picture of the spectral power
amplification for the diffusion is different. Under 
weak regularity conditions on the potential, it 
exhibits strict monotonicity in the window. Hence, 
optimal tuning points for diffusion and Markov 
chain differ essentially. In other words, the SPA 
tuning behavior of the diffusion is not robust for 
passage to the reduced model. This strange deficiency
is difficult to explain. The main reason for this
subtle effect appears to be that the diffusive nature
of the Brownian particle is neglected in the reduced
model. In order to point out this feature, we may
compute the SPA coefficient of $g(X^\varepsilon)$, where g is a
particular function designed to cut out the small
fluctuations of the diffusion in the neighborhood of
the bottoms of the wells, by identifying all states
there. So $g(x) = -1$ (resp. 1) in some neighborhood
of $-1$ (resp. 1) and otherwise g is the identity. This
results in

$$\tilde M_{\mathrm{SPA}}(\varepsilon, T) = \left|\int_0^1 E_\nu\left[g(X^\varepsilon_{sT})\right] e^{2\pi i s}\,ds\right|^2$$

In the small noise limit this quality function admits a
local maximum close to the resonance point of the
reduced model: the growth rate of $T_{\mathrm{opt}}(\varepsilon)$ is also given
by the sum of the wells’ depths. So the lack of 
robustness seems to be due to the small fluctuations 
of the particle in the wells’ bottoms. In any case, this 
clearly calls for other quality measures to be used to 
transfer properties of the reduced model to the 
original one. Our discussion indicates that due to 
their emphasis on the pure transition dynamics, the 
second family of quality measures should be used. 
For these notions there is no need to restrict to 
landscapes frozen in time-independent potential 
states on half-period intervals. 
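The tuning picture of the two-state chain can be reproduced numerically. The sketch below evaluates the closed form $\frac{4}{\pi^2}\,T^2(\varphi-\psi)^2/((\varphi+\psi)^2T^2+\pi^2)$ with $\varphi = p\,e^{-V/\varepsilon}$, $\psi = q\,e^{-v/\varepsilon}$, and locates the maximizing noise intensity for a fixed period by a grid search. It assumes that the resonance relation reads $T = \frac{\pi}{\sqrt{2pq}}\sqrt{v/(V-v)}\,e^{(V+v)/2\varepsilon}$ to leading order; the values $V = 2$, $v = 1$, $p = q = 1$ and all function names are illustrative assumptions.

```python
import math


def rates(eps, V, v, p=1.0, q=1.0):
    """Kramers-type rates phi = p*exp(-V/eps) and psi = q*exp(-v/eps)."""
    return p * math.exp(-V / eps), q * math.exp(-v / eps)


def spa_chain(eps, T, V, v, p=1.0, q=1.0):
    """Spectral power amplification of the two-state chain (assumed closed form)."""
    phi, psi = rates(eps, V, v, p, q)
    numerator = T ** 2 * (phi - psi) ** 2
    denominator = (phi + psi) ** 2 * T ** 2 + math.pi ** 2
    return 4.0 / math.pi ** 2 * numerator / denominator


def t_opt(eps, V, v, p=1.0, q=1.0):
    """Leading-order resonance relation between period and noise intensity."""
    return (math.pi / math.sqrt(2.0 * p * q)
            * math.sqrt(v / (V - v))
            * math.exp((V + v) / (2.0 * eps)))


def argmax_eps(T, V, v, lo=0.05, hi=0.2, n=3001):
    """Noise intensity on a uniform grid that maximizes the SPA for period T."""
    grid = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    return max(grid, key=lambda e: spa_chain(e, T, V, v))
```

Choosing the period as `t_opt(0.1, 2.0, 1.0)` and maximizing over the noise intensity should return a value close to 0.1, which is a way to see that the growth rate of the optimal period is $(V+v)/2$, the sum of the two wells' depths.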


Stochastic Resonance for Continuously 
Varying Landscapes 


From now on the potential U(t, x) is supposed to be 
continuously varying in (t,x). For simplicity, its 
local minima are assumed to be located at $\pm 1$, and
its only saddle point at 0, independently of time. So
the only meta-stable states on the whole time axis
are $\pm 1$. Let us denote by $\Delta_-(t)$ (resp. $\Delta_+(t)$) the
depth of the left (resp. right) well at time t. Together
with U, these functions are continuous and
1-periodic. Assume that they are strictly monotone
between their global extrema. Let us now come
back to the motion of a Brownian particle in this
landscape. The exit time law by Eyring–Kramers–Freidlin
entails that trajectories get close to the
global minimum, if the period is large enough.
Stated as before in exponential rates $T^\varepsilon = e^{\mu/\varepsilon}$, with
$\mu > \max_{i=\pm} \sup_{t \ge 0} 2\Delta_i(t)$, that is, $\mu$ exceeds the
maximal work needed to cross the barrier, the
particle often switches between the two wells and
should stay close to the deepest position in the
landscape. This position being described by the
function $\phi(t) = 2 \cdot 1_{\{\Delta_+(t) > \Delta_-(t)\}} - 1$, we get in the small
noise limit

$$\Lambda\left(t \in [0,1] : |X^\varepsilon_{tT} - \phi(t)| > \delta\right) \to 0$$

in probability. But on these long timescales, many 


short excursions to the wrong well are observed, and 
trajectories look chaotic instead of periodic. So we
have to look at smaller periods even at the cost that
the particle may not stay close to the global
minimum. Let us study the transition dynamics.
Assume that the starting point is $-1$ corresponding
to the bottom of the deep well. If the minimal work
$2\Delta_-(t)$ needed to leave the well is always larger than
$\mu = \varepsilon \log T^\varepsilon$, the particle has too
little time during one period to climb the barrier, and
should stay in the starting well. If, on the contrary,
the minimal work to leave the starting well, given by
$2\Delta_-(s)$, becomes smaller than $\mu$ at some time s, then
the transition can and will happen. More formally,
for $\mu \in [\inf_{t \ge 0} 2\Delta_-(t), \sup_{t \ge 0} 2\Delta_-(t)]$, we define
(Figure 6)

$$a_\mu^-(s) = \inf\{t > s : 2\Delta_-(t) \le \mu\}$$

The first transition time from $-1$ to 1, denoted $\tau_1$,
has the following asymptotic behavior as
$\varepsilon \to 0$: $\tau_1/T^\varepsilon \to a_\mu^-(0)$. At the second transition the
particle returns to the starting well. If $a_\mu^+$ is defined
analogously with respect to the depth function $\Delta_+$,
this transition will occur near the deterministic time
$a_\mu^+(a_\mu^-(0))\,T^\varepsilon$. In order to observe periodicity, and to
exclude chaoticity from all parts of its trajectories,
the particle has to stay for some time in the other
well before returning. This will happen under the
assumption $2\Delta_+(a_\mu^-(0)) > \mu$, that is, the right well is
the deep one at transition time. In fact, we can
define the resonance interval $I_R$ (Figure 7), as the set
of all scales $\mu$ for which trajectories exhibit
periodicity in the small noise limit, by

$$I_R = \left]\,\max_{i=\pm}\,\inf_{t \ge 0} 2\Delta_i(t),\ \inf_{t \ge 0}\,\max_{i=\pm} 2\Delta_i(t)\,\right[$$





Figure 6 Definition of $a_\mu$.

Figure 7 Resonance interval.


On this interval they get close to deterministic 
periodic ones. Again, periodicity is quantified by a 
quality measure, to be maximized in order to obtain 
resonance as the best possible response to periodic 
forcing. One interesting measure is based on the 
probability that random transitions happen in some 
small time window around a deterministic time, in 
the small noise limit (Herrmann and Imkeller 2005).
Formally, for $h > 0$, the measure gives

$$M_h(\varepsilon, T) = \min_{i=\pm 1} P_i\left(\tau_1/T^\varepsilon \in [a_\mu^i - h, a_\mu^i + h]\right)$$

where $P_i$ is the law of the diffusion starting in i. In
the small noise limit, this quality measure tends to 1,
and optimal tuning can be related to the exponential
rate at which this happens. This is due to the
following large deviations principle:

$$\lim_{\varepsilon \to 0} \varepsilon \log\left(1 - M_h(\varepsilon, T)\right) = \max_{i=\pm 1}\left\{\mu - 2\Delta_i(a_\mu^i - h)\right\}$$

for $\mu \in I_R$, with uniform convergence on each
compact subset of $I_R$. The result is established
using classical large-deviation techniques applied to
locally time homogeneous approximations of the
diffusion. Maximizing the transition probability in
the time window position means minimizing the
default rate obtained by the large deviations
principle. This can be easily achieved. In fact, if the
window length 2h is small, then $\mu - 2\Delta_i(a_\mu^i - h) \approx
2h\,\Delta_i'(a_\mu^i)$, since $2\Delta_i(a_\mu^i) = \mu$ by definition. The value
$\Delta_i'(a_\mu^i)$ is negative, so we have to find the position
where its absolute value is maximal. In this position
the depth of the starting well has the most rapid
drop under the level $\mu$, characterizing the link
between the noise intensity and the period. So the
transition time is best concentrated around it.

It is clear that a good candidate for the resonance 
point is given by the eventually existing limit of the 
global minimizer pr(h) as the window length / tends 
to 0. This limit is therefore called the resonance 
point of the diffusion with time-periodic landscape 
U. Let us note that for sinusoidal depth functions 


$$A_-(t) = \frac{V+v}{4} + \frac{V-v}{4}\cos(2\pi t) \quad\text{and}\quad A_+(t) = A_-\left(t + \tfrac{1}{2}\right)$$


the optimal tuning is given by $T^\mu = \exp(\mu_R/\varepsilon)$ with 
$\mu_R = (v+V)/2$. This optimal rate is equivalent to 
the optimal rate given by the SPA coefficient of the 
reduced dynamics’ Markov chain in the preceding 
section. 
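
The recipe above (find the time at which the well depth drops fastest, then read off the level $\mu = 2A(a)$) can be checked numerically. The following is a minimal sketch, assuming the sinusoidal depth $A_-(t) = (V+v)/4 + (V-v)/4 \cos(2\pi t)$ on a period normalized to 1; the function names and the grid search are illustrative choices, not part of the original analysis.

```python
import math

def depth_minus(t, V, v):
    """Assumed sinusoidal depth A_-(t) = (V+v)/4 + (V-v)/4 * cos(2*pi*t)."""
    return (V + v) / 4 + (V - v) / 4 * math.cos(2 * math.pi * t)

def depth_slope(t, V, v):
    """Exact derivative of depth_minus with respect to t."""
    return -(V - v) / 4 * 2 * math.pi * math.sin(2 * math.pi * t)

def resonance_point(V, v, grid=10_000):
    """Scan one period for the steepest descent of the depth.

    Returns (a, mu) where a is the time of most negative slope and
    mu = 2 * A_-(a) is the corresponding exponential scale of the period.
    """
    best_t = min((k / grid for k in range(grid)),
                 key=lambda t: depth_slope(t, V, v))
    return best_t, 2 * depth_minus(best_t, V, v)

a, mu_R = resonance_point(V=2.0, v=1.0)
print(a, mu_R)
```

For this depth profile the steepest descent sits a quarter period after the maximum, where the depth equals its mean value, which is why the scan returns $\mu = (v+V)/2$.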

The big advantage of the quality measure $M_h$ is its 
robustness. Indeed, consider the reduced model 
consisting of a two-state Markov chain with 
infinitesimal generator 


$$Q(t) = \begin{pmatrix} -\varphi(t) & \varphi(t) \\ \psi(t) & -\psi(t) \end{pmatrix}$$

where $\varphi(t) = \exp(-2A_-(t/T)/\varepsilon)$ and $\psi(t) = 
\exp(-2A_+(t/T)/\varepsilon)$. The law of transition times of 
this Markov chain is readily computed from Laplace 
transforms. Normalized by $T^\mu$ it converges to $a^i_\mu$. 
This calculation even reveals a rigorous underlying 
pattern for the second- and higher-order transition 
times interpreting the interspike distributions of 
the physics literature. The dynamics of diffusion 
and Markov chain are similar. Resonance points 
provided by $M_h$ for the diffusion and its analog for 
the Markov chain agree. 


Related Notions: Synchronization 


In the preceding sections, we interpreted stochastic 
resonance as optimal response of a randomly 
perturbed dynamical system to weak periodic forcing, 
in the spirit of the physics literature (see Gammaitoni 
et al. (1998)). Our crucial assumption concerned the 
barrier heights a Brownian particle has to overcome 
in the potential landscape of the dynamical system: they 
are uniformly bounded below in time. Measures for the 
quality of tuning were based on essentially two 
concepts: one concerning spectral criteria, with the 
spectral power amplification as most prominent 
member, the other one concerning the pure transition 
dynamics between the domains of attraction of 
the local minima. A number of different criteria can 
be used to create an optimal tuning between the 
intensity of the noise perturbation and the large 
period of the dynamical system. The relations have to 
be of an exponential type $T = \exp(\mu/\varepsilon)$, since the 
Brownian particle needs exponentially long times to 
cross the barrier separating the wells according to the 
Eyring–Kramers–Freidlin transition law. Our barrier 
height assumption seems natural in many situations, 
but can fail in others. If it becomes small periodically, 
and eventually scales with the noise-intensity para- 
meter, the Brownian particle does not need to wait an 
exponentially long time to climb it. So periodicity 
obtains for essentially smaller timescales. In this 
setting, the slowness of periodic forcing may also be 
assumed to be essentially subexponential in the noise 
intensity. 

If it is fast enough to allow for substantial changes 
before large deviation effects can take over, we are 
in the situation of Berglund and Gentz (2002). They 
in fact consider the case in which the barrier 
between the wells becomes low twice per period, 
to the effect of modulating periodically a bifurcation 
parameter: at time zero the right-hand well becomes 
almost flat, and at the same time the bottom of the 
well and the saddle approach each other; half a 
period later, a spatially symmetric scenario is 
encountered. In this situation, there is a threshold 
value for the noise intensity under which transitions 
become unlikely. Above this threshold, the trajec- 
tories typically contain two transitions per period. 
Results are formulated in terms of concentration 
properties for random trajectories. The intuitive 
picture is this: with overwhelming probability, 
sample paths will be concentrated in spacetime sets 
scaling with the small parameters of the problem. In 
higher dimensions, these sets may be given by 
adiabatic or center manifolds of the deterministic 
system, which allow model reduction of higher- 
dimensional systems to lower-dimensional ones. 
Asymptotic results hold for any choice of the small 
parameters in a whole parameter region. A passage 
to the small noise limit as for optimal tuning in the 
preceding sections is not needed. 

Related problems studied by Berglund and Gentz 
in the multidimensional case concern the noise- 
induced passage through periodic orbits, where 
unexpected phenomena arise. Here, as opposed to 
the classical Freidlin—Wentzell theory, the distribu- 
tion of first-exit points depends nontrivially on the 
noise intensity. Again aiming at results valid for 
small but nonvanishing parameters in subexponen- 
tial scale ranges, they investigate the density of first- 
passage times in a large regime of parameter values, 
and obtain insight into the transition from the 
stochastic resonance regime into the synchronization 
regime. 


See also: Dynamical Systems in Mathematical Physics: 
An Illustration from Water Waves; Magnetic Resonance 
Imaging; Spectral Theory for Linear Operators; 
Stochastic Differential Equations. 


Introduction 


String field theory (SFT) is the second-quantized 
approach to string theory. In the usual, first- 
quantized, formulation of string perturbation the- 
ory, one postulates a recipe for the string S-matrix in 
terms of a sum over two-dimensional (2D) world 
sheets embedded in spacetime. Very schematically, 


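$$\langle\langle V_1(k_1) \cdots V_n(k_n) \rangle\rangle = \sum_{\text{topologies}} g_s^{-\chi} \int [d\mu_\alpha]\, \langle V_1(k_1) \cdots V_n(k_n) \rangle_{\{\mu_\alpha\}} \qquad [1]$$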


Here the left-hand side stands for the S-matrix of the 
physical string states $\{V_\alpha(k_\alpha)\}$. The symbol $\langle \cdots \rangle_{\{\mu_\alpha\}}$ 
denotes a correlation function on the 2D world sheet, 
which is a punctured Riemann surface of Euler number 
$\chi$ and given moduli $\{\mu_\alpha\}$. In SFT, one aims to recover 
this standard prescription from the Feynman rules of a 
second-quantized spacetime action $S[\Phi]$. The string 
field $\Phi$, the fundamental dynamical variable, can be 
thought of as an infinite-dimensional array of space- 
time fields $\{\phi^i(x^\mu)\}$, one field for each basis state in the 
Fock space of the first-quantized string. 

The most straightforward way to construct $S[\Phi]$ 
uses the unitary light-cone gauge. Light-cone SFT is 
an almost immediate transcription of Mandelstam’s 


light-cone diagrams in a second-quantized language. 
While often useful as a bookkeeping device, light- 
cone SFT seems unlikely to represent a real 
improvement over the first-quantized approach. By 
contrast, from our experience in ordinary quantum 
field theory, we should expect Poincaré-covariant 
SFTs to give important insights into the issues of 
vacuum selection, background independence and the 
nonperturbative definition of string theory. 

Covariant SFT actions are well established for the 
open (Witten 1986), closed (Zwiebach 1993) and 
open/closed (Zwiebach 1998) bosonic string. These 
theories are based on the BRST formalism, where the 
world sheet variables include the bc ghosts intro- 
duced in gauge-fixing the world sheet metric to the 
conformal gauge $g_{ab} \sim \delta_{ab}$. (An alternative approach 
(Hata et al.), based on covariantizing light-cone SFT, 
will not be described in this article.) Much less is 
presently known for the superstring: classical actions 
have been established for the Neveu-Schwarz sector 
of the open superstring (Berkovits 2001) and for the 
heterotic string (Berkovits et al. 2004). 

During the first period of intense activity in SFT 
(1985-1992), the covariant bosonic actions were 
constructed and shown to pass the basic test of 
reproducing the S-matrix [1] to each order in the 
perturbative expansion. The more recent revival of 
the subject (since 1999) was triggered by the 
realization that SFT contains nonperturbative infor- 
mation as well: D-branes emerge as solitonic 
solutions of the classical equations of motion in 


open SFT (OSFT). We can hope that the nonpertur- 
bative string dualities will also be understood in the 
framework of SFT, once covariant SFTs for the 
superstring are better developed. 

In this article, we review the basic formalism of 
covariant SFT, using for illustration purposes the 
simplest model — cubic bosonic OSFT. We then 
briefly sketch the generalization to bosonic SFTs 
that include closed strings. Finally, we turn to the 
subjects of classical solutions in OSFT and the 
physics of the open-string tachyon. 


Open Bosonic SFT 


The standard formulation of string theory starts with 
the choice of an on-shell spacetime background where 
strings propagate. In the bosonic string, the closed 
string background is described by a conformal field 
theory of central charge 26 (the “matter” CFT). The 
total world sheet CFT is the direct sum of this matter 
CFT and of the universal ghost CFT, of central charge 
−26. To describe open strings, we must further specify 
boundary conditions for the string endpoints. The 
open-string background is encoded in a boundary CFT 
(BCFT), a CFT defined in the upper-half plane, with 
conformal boundary conditions on the real axis 
(see Boundary Conformal Field Theory in this encyclo- 
pedia). In modern language, the choice of BCFT 
corresponds to specifying a D-brane state. 

In classical OSFT, we fix the closed-string back- 
ground (the bulk CFT) and consider varying the 
D-brane configuration (the boundary conditions). 
To lowest order in $g_s$, we can neglect the back- 
reaction of the D-brane on the closed-string fields, 
since this is a quantum effect from the open-string 
viewpoint. Let us prepare the ground by recalling 
the standard σ-model philosophy. To describe off- 
shell open-string configurations, we should allow for 
general (not necessarily conformal) boundary condi- 
tions. We can imagine proceeding as follows: 


1. We choose an initial open-string background, a 
reference BCFT that we shall call BCFT$_0$. For 
example, a Dp brane in flat 26 dimensions 
(Neumann boundary conditions on p + 1 coordi- 
nates, Dirichlet on 25 − p coordinates). 

2. We then write a basis of boundary perturbations 
around this background. Taking, for example, 
BCFT$_0$ to be a D25 brane in flat space, the world 
sheet action $S_{WS}$ takes the schematic form 


$$S_{WS} = \frac{1}{2\pi\alpha'} \int_{\mathrm{UHP}} d^2z\, \partial X_\mu \bar\partial X^\mu + \int_{\mathbb{R}} dt \left[ \tilde T(X^\mu) + \tilde A_\nu(X^\mu)\, \partial_t X^\nu + \tilde B_\nu(X^\mu)\, \partial_t^2 X^\nu + \cdots \right] \qquad [2]$$

Here to the standard free bulk action (integrated 
over the upper-half complex plane UHP) we have 
added perturbations localized on the real axis R. 
Notice that the basis of perturbations depends on 
the chosen BCFT$_0$. 

3. We interpret the coefficients $\{\tilde\phi^i(x^\mu)\}$ of the 
perturbations as spacetime fields. (The tilde on 
$\tilde\phi^i(x)$ serves as a reminder that these fields are not 
quite the same as the fields $\phi^i(x)$ that will appear 
in the OSFT action.) We are after a spacetime 
action $S[\{\tilde\phi^i\}]$ such that solutions of its classical 
equations of motion correspond to conformal 
boundary conditions: 


$$\frac{\delta S}{\delta \tilde\phi^i} = 0 \ (\text{spacetime}) \quad \Longleftrightarrow \quad \beta_i[\{\tilde\phi\}] = 0 \ (\text{world sheet}) \qquad [3]$$


We recognize in [2] the familiar open-string 
tachyon $T(x)$ and gauge field $A_\mu(x)$, which are the 
lowest modes in an infinite tower of fields. Relevant 
perturbations on the world sheet (with conformal 
dimension h < 1) correspond to tachyonic fields in 
spacetime ($m^2 < 0$), whereas marginal world sheet 
perturbations (h = 1) give massless spacetime fields. 
To achieve a complete description, we must include 
all the higher massive open-string modes as well, 
which correspond to nonrenormalizable boundary 
perturbations (h > 1). In the traditional σ-model 
approach, this appears like a daunting task. The 
formalism of OSFT will automatically circumvent 
this difficulty. 
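
The correspondence between world sheet dimension and spacetime mass can be made explicit in a short worked relation. This is a sketch under standard assumptions that are not spelled out in the text: $\alpha'$ is the Regge slope, the metric signature is mostly plus so that $m^2 = -k^2$, and the boundary operator $e^{ik\cdot X}$ has dimension $\alpha' k^2$.

```latex
% A boundary perturbation of zero-momentum dimension h, dressed with momentum k:
V(k) = \mathcal{O}_h \, e^{i k \cdot X}, \qquad \dim V = h + \alpha' k^2 .
% Requiring the dressed operator to be exactly marginal, with m^2 = -k^2:
h + \alpha' k^2 = 1 \quad\Longrightarrow\quad \alpha' m^2 = h - 1 .
% Examples: the tachyon (h = 0) has m^2 = -1/\alpha';
% the gauge field (h = 1) is massless; h > 1 gives massive modes.
```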


The Open-String Field 


In covariant SFT the reparametrization ghosts play 
a crucial role. The ghost CFT consists of the 
Grassmann odd fields $b(z)$, $c(z)$, $\bar b(\bar z)$, $\bar c(\bar z)$, of dimen- 
sions (2,0), (−1,0), (0,2), (0, −1), respectively. The 
boundary conditions on the real axis are 
$b = \bar b$, $c = \bar c$. The state space $\mathcal{H}_{BCFT_0}$ of the full 
matter+ghost BCFT can be broken up into 
subspaces of definite ghost number, 


$$\mathcal{H}_{BCFT_0} = \bigoplus_{G=-\infty}^{\infty} \mathcal{H}^{(G)}_{BCFT_0} \qquad [4]$$

We use conventions where the SL(2, R) vacuum $|0\rangle$ 
carries zero ghost number, $G(|0\rangle) = 0$, while 
$G(c) = +1$ and $G(b) = -1$. As is familiar from the 
first-quantized treatment, physical open-string states 
are identified with G = +1 cohomology classes of 
the BRST operator, 


$$Q|V_{\rm phys}\rangle = 0, \qquad |V_{\rm phys}\rangle \sim |V_{\rm phys}\rangle + Q|\Lambda\rangle, \qquad G(|V_{\rm phys}\rangle) = +1 \qquad [5]$$

where the nilpotent BRST operator Q has the 
standard expression 


$$Q = \frac{1}{2\pi i} \oint \left( c\, T_{\rm matter} + {:}\, b c \partial c\, {:} \right) \qquad [6]$$


Though not a priori obvious, it turns out that the 
simplest form of the OSFT action is achieved by 
taking as the fundamental off-shell variable an 
arbitrary G=+1 element of the first-quantized 
Fock space, 


$$|\Phi\rangle \in \mathcal{H}^{(1)}_{BCFT_0} \qquad [7]$$


By the usual state—operator correspondence of CFT, 
we can also represent $|\Phi\rangle$ as a local (boundary) 
vertex operator acting on the vacuum, 


$$|\Phi\rangle = \Phi(0)|0\rangle \qquad [8]$$


The open-string field $|\Phi\rangle$ is really an infinite- 
dimensional array of spacetime fields. We can 
make this transparent by expanding it as 


$$|\Phi\rangle = \sum_i \int d^{p+1}k\; |\Phi_i(k)\rangle\, \phi^i(k) \qquad [9]$$


where $\{|\Phi_i(k)\rangle\}$ is some convenient basis of $\mathcal{H}^{(1)}_{BCFT_0}$ 
that diagonalizes the momentum $k_\mu$. The fields 
$\phi^i$ are a priori complex. This is remedied by 
imposing a suitable reality condition on the string 
field, which will be stated momentarily. Notice that 
there are many more elements in $\{|\Phi_i(k)\rangle\}$ than in the 
physical subspace (the cohomology classes of Q). 
Some of the extra fields will turn out to be 
nondynamical and could be integrated out, but at 
the price of making the OSFT action look much 
more complicated. 
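
To make the expansion concrete, the lowest terms can be written in oscillator language. The sketch below uses one common convention that is an assumption here, not something fixed by the text: $|k\rangle = e^{ik\cdot X}(0)|0\rangle$, matter oscillators $\alpha^\mu_{-n}$, ghost modes $b_n$, $c_n$, and the field name $\beta(k)$ for the extra level-one component.

```latex
% Every term carries ghost number +1, since G(c_1 |k>) = +1 and b_{-1} c_0 has G = 0.
|\Phi\rangle = \int d^{p+1}k \,
  \Big( T(k) + A_\mu(k)\, \alpha^\mu_{-1} + i\, \beta(k)\, b_{-1} c_0 + \cdots \Big)\, c_1 |k\rangle
% T(k): tachyon;  A_\mu(k): gauge field;
% \beta(k): an extra field absent from the physical spectrum, of the kind
% the text describes as nondynamical.
```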

It is often useful to think of the string field in 
terms of its Schrödinger representation, that is, as a 
functional on the configuration space of open 
strings. Consider the unit half-disk in the upper- 
half plane, $D_H = \{|z| \le 1,\ \Im z \ge 0\}$, with the vertex 
operator $\Phi(0)$ inserted at the origin. Impose BCFT$_0$ 
open string boundary conditions for the fields $X(z, \bar z)$ 
on the real axis (here $X(z, \bar z)$ is a short-hand notation 
for all matter and ghost fields), and boundary 
conditions $X(\sigma) = X_b(\sigma)$ on the curved boundary of 
$D_H$, $z = \exp(i\sigma)$, $0 \le \sigma \le \pi$. The path integral over 
$X(z, \bar z)$ in the interior of the half-disk assigns a 
complex number to any given $X_b(\sigma)$, so we obtain a 
functional $\Phi[X_b(\sigma)]$. This is the Schrödinger wave 
function of the state $\Phi(0)|0\rangle$. Thus, we can think of 
open-string functionals $\Phi[X_b(\sigma)]$ as the fundamental 
variables of OSFT. This is as it should be: the 


first-quantized wave functions are promoted to 
dynamical fields in the second-quantized theory. 
Finally, let us quote the reality condition for the 
string field, which takes a compact form in the 
Schrödinger representation: 


$$\Phi[X^\mu(\sigma), b(\sigma), c(\sigma)] = \Phi^*[X^\mu(\pi - \sigma), b(\pi - \sigma), c(\pi - \sigma)] \qquad [10]$$


where the superscript $*$ denotes complex conjugation. 


The Classical Action 


With all the ingredients in place, it is immediate to 
write the quadratic part of the OSFT action. The 
linearized equations of motion must reproduce the 
physical-state condition [5]. This suggests 


$$S \sim \langle \Phi | Q | \Phi \rangle \qquad [11]$$


Here $\langle \cdot | \cdot \rangle$ is the usual BPZ inner product of BCFT$_0$, 
which is defined in terms of a two-point correlator on 
the disk, as we review below. The ghost anomaly 
implies that on the disk we must have $G_{\rm tot} = +3$, 
which happily is the case in [11]. Moreover, since the 
inner product is nondegenerate, variation of [11] gives 


$$Q|\Phi\rangle = 0 \qquad [12]$$


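as desired. The equivalence relation $|V_{\rm phys}\rangle \sim 
|V_{\rm phys}\rangle + Q|\Lambda\rangle$ is interpreted in the second-quantized 
language as the spacetime gauge invariance 


$$\delta_\Lambda |\Phi\rangle = Q|\Lambda\rangle, \qquad |\Lambda\rangle \in \mathcal{H}^{(0)}_{BCFT_0} \qquad [13]$$


valid for the general off-shell field. This equation is 
a very compact generalization of the linearized 
gauge invariance for the massless gauge field. 
Indeed, focusing on the level-zero components, 
$|\Phi\rangle \sim A_\mu(x)(c\,\partial X^\mu)(0)|0\rangle$ and $|\Lambda\rangle \sim \Lambda(x)|0\rangle$, we find 
$\delta A_\mu(x) = \partial_\mu \Lambda(x)$. It is then plausible to guess that the 
nonlinear gauge invariance should take the form 


$$\delta_\Lambda |\Phi\rangle = Q|\Lambda\rangle + |\Phi\rangle * |\Lambda\rangle - |\Lambda\rangle * |\Phi\rangle \qquad [14]$$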


where * is some suitable product operation that 
conserves ghost number 


$$* : \mathcal{H}^{(n)}_{BCFT_0} \otimes \mathcal{H}^{(m)}_{BCFT_0} \to \mathcal{H}^{(n+m)}_{BCFT_0} \qquad [15]$$


Based on a formal analogy with 3D nonabelian 
Chern-Simons theory, Witten proposed the cubic 
action 


$$S = -\frac{1}{g_o^2}\left( \frac{1}{2} \langle \Phi | Q | \Phi \rangle + \frac{1}{3} \langle \Phi | \Phi * \Phi \rangle \right) \qquad [16]$$

The string field $|\Phi\rangle$ is analogous to the Chern- 
Simons gauge potential $A = A_i dx^i$, the $*$ product to 
the $\wedge$ product of differential forms, Q to the exterior 
derivative d, and the ghost number G to the degree 


of the form. The analogy also suggests a number of 
algebraic identities: 


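$$\begin{aligned}
Q^2 &= 0 \\
\langle QA | B \rangle &= -(-1)^{G(A)} \langle A | QB \rangle \\
Q(A * B) &= (QA) * B + (-1)^{G(A)} A * (QB) \\
\langle A | B \rangle &= (-1)^{G(A)G(B)} \langle B | A \rangle \\
\langle A | B * C \rangle &= \langle B | C * A \rangle \\
A * (B * C) &= (A * B) * C
\end{aligned} \qquad [17]$$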

Note in particular the associativity of the *-product. 
It is straightforward to check that this algebraic 
structure implies the gauge invariance of the cubic 
action under [14]. A *-product satisfying all required 
formal properties can indeed be defined. The most 
intuitive presentation is in the functional language. 
Given an open-string curve $X(\sigma)$, $0 \le \sigma \le \pi$, we 
single out the string mid-point $\sigma = \pi/2$ and define 
the left and right “half-string” curves 


$$X_L(\sigma) = X(\sigma), \qquad X_R(\sigma) = X(\pi - \sigma), \qquad \text{for } 0 \le \sigma \le \frac{\pi}{2} \qquad [18]$$


A functional $\Phi[X(\sigma)]$ can, of course, be regarded as 
a functional of the two half-strings, $\Phi[X] \to 
\Phi[X_L, X_R]$. We define 


$$(\Phi_1 * \Phi_2)[X_L, X_R] = \int [dY]\, \Phi_1[X_L, Y]\, \Phi_2[Y, X_R] \qquad [19]$$


where $\int [dY]$ is meant as the functional integral over 
the space of half-strings $Y(\sigma)$, with $Y(\pi/2) = 
X_L(\pi/2) = X_R(\pi/2)$. As Figure 1a shows, two open 
strings interact (to form a single open string) if 
and only if the right half of the first string precisely 
overlaps with the left half of the second string. 
Associativity is transparent (Figure 1b). 
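
A finite-dimensional caricature may help to see why associativity is transparent. In the sketch below, which is only an illustration, the space of half-string configurations is replaced by a finite index set, a string functional becomes a square array indexed by (left half, right half), and the functional integral over the shared half-string becomes a finite sum. The product then reduces to ordinary matrix multiplication; the function name `star` and the sample arrays are assumptions made for the example.

```python
def star(A, B):
    """Discretized midpoint-overlap product.

    A[l][y] and B[y][r] play the role of functionals of (left, right)
    half-strings; the sum over y stands in for the integral over the
    overlapping half-string.
    """
    n = len(A)
    return [[sum(A[l][y] * B[y][r] for y in range(n)) for r in range(n)]
            for l in range(n)]

A = [[1, 2], [0, 1]]
B = [[0, 1], [1, 1]]
C = [[2, 0], [1, 3]]

left = star(A, star(B, C))    # A * (B * C)
right = star(star(A, B), C)   # (A * B) * C
print(left == right)          # associative
print(star(A, B) == star(B, A))  # but not commutative in general
```

The toy model ignores everything specific to the string (ghosts, the midpoint constraint, the measure), so it illustrates only the gluing pattern, not the actual vertex.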

Figure 1 Midpoint overlaps of open strings. 

We can now translate this formal construction in 
the precise CFT language. Very generally, an n-point 
vertex of open strings can be defined by specifying 
an n-punctured disk, that is, a disk with marked 
points on the boundary (punctures) and a choice of 
local coordinates around each puncture. Local 
coordinates are essential since we are dealing with 
off-shell open-string states. The BPZ inner product 
(two-point vertex) is given by 


$$\langle \Phi_1 | \Phi_2 \rangle = \langle I \circ \Phi_1(0)\, \Phi_2(0) \rangle_{\mathrm{UHP}}, \qquad I(z) = -\frac{1}{z} \qquad [20]$$

The symbol $f \circ \Phi(0)$, where f is a complex map, 
means the conformal transform of $\Phi(0)$ by f. For 
example, if $\Phi$ is a dimension-d primary field, then 
$f \circ \Phi(0) = f'(0)^d\, \Phi(f(0))$. If $\Phi$ is nonprimary, the 
transformation rule will be more complicated and 
involve extra terms with higher derivatives of f. By 
performing the SL(2, C) transformation 


$$w = h(z) = \frac{1 + iz}{1 - iz} \qquad [21]$$

we can represent the two-point vertex as a corre- 
lator on the unit disk $D = \{|w| \le 1\}$, 


$$\langle \Phi_1 | \Phi_2 \rangle = \langle f_1 \circ \Phi_1(0)\; f_2 \circ \Phi_2(0) \rangle_D, \qquad f_1(z_1) = -h(z_1), \quad f_2(z_2) = h(z_2) \qquad [22]$$


The vertex operators are inserted at $w = -1$ and 
$w = +1$ on D (see Figure 2a) and correspond to the 
two open strings at (Euclidean) world sheet time 
$\tau = -\infty$ (we take $z = \exp(i\sigma + \tau)$). The left half of 
D is the world sheet of the first open string; the right 
half of D is the world sheet of the second string. The 
two strings meet at $\tau = 0$ on the imaginary w axis. 
The three-point Witten vertex is given by 


$$\langle \Phi_1, \Phi_2, \Phi_3 \rangle = \langle g_1 \circ \Phi_1(0)\; g_2 \circ \Phi_2(0)\; g_3 \circ \Phi_3(0) \rangle_D \qquad [23]$$


where 


$$g_1(z_1) = e^{2\pi i/3} \left( \frac{1 + i z_1}{1 - i z_1} \right)^{2/3}, \quad g_2(z_2) = \left( \frac{1 + i z_2}{1 - i z_2} \right)^{2/3}, \quad g_3(z_3) = e^{-2\pi i/3} \left( \frac{1 + i z_3}{1 - i z_3} \right)^{2/3} \qquad [24]$$

Figure 2 Representation of the quadratic and cubic vertices as 
2- and 3-punctured unit disks. 

The 3-punctured disk is depicted in Figure 2b, and 
describes the symmetric mid-point overlap of the 
three strings at $\tau = 0$. Finally, the relation between 
the three-point vertex and the $*$-product is 


$$\langle \Phi_1 | \Phi_2 * \Phi_3 \rangle = \langle \Phi_1, \Phi_2, \Phi_3 \rangle \qquad [25]$$


Knowledge of the right-hand side (RHS) in [25] for 
all $\Phi_1$ allows one to reconstruct the $*$-product. All formal 
properties [17] are easily shown to hold in the CFT 
language. This completes the definition of the OSFT 
action. 
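
As a worked consequence of the algebraic identities [17], the classical equation of motion follows in a few lines. The derivation below is a sketch that assumes the overall normalization of [16] is written as $-1/g_o^2$ and that both $\Phi$ and its variation $\delta\Phi$ carry ghost number $+1$.

```latex
% Quadratic term: use <QA|B> = -(-1)^{G(A)} <A|QB> and <A|B> = (-1)^{G(A)G(B)} <B|A>.
\delta\, \tfrac{1}{2}\langle \Phi | Q \Phi \rangle
  = \tfrac{1}{2}\big( \langle \delta\Phi | Q\Phi \rangle + \langle \Phi | Q\,\delta\Phi \rangle \big)
  = \langle \delta\Phi | Q\Phi \rangle
% Cubic term: use cyclicity <A|B*C> = <B|C*A> to bring \delta\Phi to the first slot.
\delta\, \tfrac{1}{3}\langle \Phi | \Phi * \Phi \rangle = \langle \delta\Phi | \Phi * \Phi \rangle
% Nondegeneracy of the inner product then gives the equation of motion.
\delta S = -\frac{1}{g_o^2}\, \langle \delta\Phi | Q\Phi + \Phi * \Phi \rangle = 0
\quad\Longrightarrow\quad Q\Phi + \Phi * \Phi = 0
```

In the Chern-Simons analogy this is the counterpart of the flatness condition $dA + A \wedge A = 0$.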

Evaluation of the classical action is completely 
algorithmic and can be carried out for arbitrary 
massive states, with no fear of divergences, since in 
all required correlators the operators are inserted 
well apart from each other. 
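
The statement that operators are inserted well apart can be illustrated by locating the punctures numerically. The sketch below assumes the maps $h(z) = (1+iz)/(1-iz)$ and $g_k(z) = e^{2\pi i (2-k)/3}\, h(z)^{2/3}$ for $k = 1, 2, 3$, the usual way of writing the Witten vertex; the function names are illustrative.

```python
import cmath

def h(z):
    """Map w = (1 + i z) / (1 - i z): sends the real axis to the unit circle."""
    return (1 + 1j * z) / (1 - 1j * z)

def g(k, z):
    """Assumed three-vertex maps, k = 1, 2, 3: a cube-root phase times h(z)^(2/3)."""
    phase = cmath.exp(2j * cmath.pi * (2 - k) / 3)
    return phase * h(z) ** (2 / 3)

# Puncture positions on the unit disk (images of the origin z = 0).
two_point = [-h(0), h(0)]
three_point = [g(k, 0) for k in (1, 2, 3)]

# Smallest separation between any two punctures of the cubic vertex.
min_gap = min(abs(p - q) for i, p in enumerate(three_point)
              for q in three_point[i + 1:])
print(two_point, three_point, min_gap)
```

With these maps the three punctures sit at the cube roots of unity, so they are separated by a distance of order one regardless of which states are inserted.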


Quantization 


Quantization is defined by the path integral over the 
second-quantized string field. The first step is to deal 
with the gauge invariance [14] of the classical action. 
The gauge symmetry is reducible: not all gauge 
parameters $\Lambda^{(0)}$ (the superscript labels ghost number) 
lead to a gauge transformation. This is clear at the 
linearized level; indeed, if $\Lambda^{(0)} = Q\Lambda^{(-1)}$, then 
$\delta_{\Lambda^{(0)}} \Phi = Q^2 \Lambda^{(-1)} = 0$. Thus, the set $\{\Lambda^{(0)}\}$ gives a 
redundant parametrization of the gauge group. 
Characterizing this redundancy is somewhat subtle, 
since fields of the form $\Lambda^{(-1)} = Q\Lambda^{(-2)}$ do not really 
lead to a redundancy in $\Lambda^{(0)}$, and so on, ad infinitum. 
It is clear that we need to introduce an infinite tower 
of (second-quantized) ghosts for ghosts. 

The Batalin—Vilkovisky formalism is a powerful way 
to handle the problem. The basic object is the master 
action $S(\phi^s, \phi^*_s)$, which is a function of the “fields” $\phi^s$ 
and of the “antifields” $\phi^*_s$. Each field is paired with a 
corresponding antifield of opposite Grassmanality. 
(“Grassmanality” is defined to be even or odd: a 
Grassmann even (odd) field is a commuting (antic- 
ommuting) field.) The master action must obey the 
boundary condition of reducing to the classical action 
when the antifields are set to zero. (Note that in general 
the set of fields $\phi^s$ will be larger than the set of fields $\phi^i$ 
that appear in the classical action.) Independence of the 
S-matrix of the gauge-fixing procedure is equivalent to 
the BV master equation 


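$$\frac{1}{2}\{S, S\} = -\hbar \Delta S \qquad [26]$$

The antibracket $\{\cdot,\cdot\}$ and the $\Delta$ operator are defined as 


$$\{A, B\} = \frac{\partial_r A}{\partial \phi^s} \frac{\partial_l B}{\partial \phi^*_s} - \frac{\partial_r A}{\partial \phi^*_s} \frac{\partial_l B}{\partial \phi^s}, \qquad \Delta = \frac{\partial_r}{\partial \phi^s} \frac{\partial_l}{\partial \phi^*_s} \qquad [27]$$


where $\partial_l$ and $\partial_r$ are derivatives from the left and 
from the right. It is often convenient to expand S in 
powers of $\hbar$, $S = S_0 + \hbar S_1 + \hbar^2 S_2 + \cdots$, with 


$$\{S_0, S_0\} = 0, \qquad \{S_0, S_1\} + \{S_1, S_0\} = -2 \Delta S_0, \ \ldots \qquad [28]$$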


With these definitions in place, we shall simply 
describe the answer, which is extremely elegant. In 
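OSFT the full set of fields and antifields is packaged 
in a single string field $|\Phi\rangle$ of unrestricted ghost 
number. If we write 


$$|\Phi\rangle = |\Phi_-\rangle + |\Phi_+\rangle, \qquad \text{with } G(\Phi_-) \le 1 \text{ and } G(\Phi_+) \ge 2 \qquad [29]$$


all the fields are contained in $|\Phi_-\rangle$ and all the 
antifields in $|\Phi_+\rangle$. To make the pairing explicit, we 
pick a basis $\{|\Phi_s\rangle\}$ of $\mathcal{H}_{BCFT_0}$, and define a conjugate 
basis $\{|\Phi^c_s\rangle\}$ by 


$$\langle \Phi^c_r | \Phi_s \rangle = \delta_{rs} \qquad [30]$$

Clearly, $G(\Phi^c_s) + G(\Phi_s) = 3$. Then 


$$|\Phi_-\rangle = \sum_{G(\Phi_s) \le 1} |\Phi_s\rangle\, \phi^s, \qquad |\Phi_+\rangle = \sum_{G(\Phi_s) \le 1} |\Phi^c_s\rangle\, \phi^*_s \qquad [31]$$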

Basis states $|\Phi_s\rangle$ with even (odd) ghost number 
$G(\Phi_s)$ are defined to be Grassmann even (odd). The 
full string field $|\Phi\rangle$ is declared to be Grassmann 
odd. It follows that $\phi^s$ is Grassmann even (odd) for 
$G(\Phi_s)$ odd (even), and that the corresponding 
antifield $\phi^*_s$ has the opposite Grassmanality of $\phi^s$, 
as it must be. With this understanding of $|\Phi\rangle$, the 
classical master action $S_0$ is identical in form to the 
Witten action [16]! The boundary condition is 
satisfied; indeed, setting $|\Phi_+\rangle = 0$, the ghost number 
anomaly implies that only the terms with G = +1 
survive. The equation $\{S_0, S_0\} = 0$ follows from 
straightforward manipulations using the algebraic 
identities [17]. On the other hand, the issue of 
whether $\Delta S_0 = 0$, or whether instead quantum 
corrections are needed to satisfy the full BV master 
equation, is more subtle and has never been fully 
resolved. The $\Delta$ operator receives singular contri- 
butions from the same region of moduli space 
responsible for the appearance of closed-string 
poles, which are discussed below. (See Thorn 
(1989) for a classic statement of this issue.) It 
seems possible to choose a basis in $\mathcal{H}_{BCFT_0}$ such that 
there are no quantum corrections to $S_0$ (Erler and 
Gross 2004). In the following we shall derive the 
Feynman rules implied by $S_0$ alone. 
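
The bookkeeping of ghost numbers and parities in the field/antifield pairing is easy to tabulate. The following sketch encodes only two rules stated above: conjugate basis states have ghost numbers adding up to 3, and the full string field is Grassmann odd, so a coefficient has the parity opposite to that of its basis state. The function names are illustrative.

```python
def coefficient_parity(basis_ghost_number):
    """Parity (0 = even, 1 = odd) of the coefficient multiplying a basis state,
    given that the basis state has parity G mod 2 and the total field is odd."""
    return (1 - basis_ghost_number) % 2

def antifield_partner(field_ghost_number):
    """Ghost number of the conjugate basis state paired with a field basis state."""
    if field_ghost_number > 1:
        raise ValueError("fields sit at ghost number <= 1")
    return 3 - field_ghost_number

# Rows: (G of field basis state, G of conjugate state,
#        parity of the field, parity of the antifield)
table = [(g, antifield_partner(g),
          coefficient_parity(g),
          coefficient_parity(antifield_partner(g)))
         for g in range(1, -4, -1)]
for row in table:
    print(row)
```

Each row shows a field and its antifield with opposite parity, and every conjugate state lands at ghost number 2 or higher, consistent with the split into the two parts of the string field.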


SFT Diagrams and Minimal Area Metrics 


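Imposing the Siegel gauge condition $b_0 \Phi = 0$, one 
finds the gauge-fixed action 


$$S_{GF} = -\frac{1}{g_o^2}\left( \frac{1}{2}\langle\Phi|Q|\Phi\rangle + \frac{1}{3}\langle\Phi|\Phi * \Phi\rangle + \langle\beta|b_0|\Phi\rangle \right) \qquad [32]$$


where $\beta$ is a Lagrange multiplier. The propagator 
reads 


$$\frac{b_0}{L_0} = b_0 \int_0^\infty dT\, e^{-T L_0} \qquad [33]$$


Since $L_0$ is the first-quantized open-string Hamilto- 
nian, $e^{-T L_0}$ is the operator that evolves the open- 
string wave functions $\Psi[X(\sigma)]$ by Euclidean world 
sheet time T. It can be visualized as a flat 
rectangular strip of “horizontal” width $\pi$ and 
“vertical” height T. Each propagator comes with 
an antighost insertion 


$$b_0 = \oint \frac{dz}{2\pi i}\, z\, b(z) \qquad [34]$$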


integrated on a horizontal trajectory. 
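
The Schwinger representation of the propagator can be checked on a single eigenvalue of $L_0$. The sketch below is a plain trapezoidal estimate with an arbitrary cutoff `T_max`; both the cutoff and the step count are illustrative choices. It only makes sense for $L_0 > 0$, since the integral diverges otherwise.

```python
import math

def schwinger_integral(L0, T_max=60.0, steps=60_000):
    """Trapezoidal estimate of the integral of exp(-T * L0) over 0 <= T <= T_max.

    For L0 > 0 and large T_max this approximates 1 / L0, i.e. the strip of
    height T integrated over all heights reproduces the inverse of L0.
    """
    dT = T_max / steps
    total = 0.5 * (1.0 + math.exp(-T_max * L0))
    total += sum(math.exp(-k * dT * L0) for k in range(1, steps))
    return total * dT

for L0 in (1.0, 2.0):
    print(L0, schwinger_integral(L0), 1.0 / L0)
```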

Figure 3 The cubic vertex represented as the mid-point gluing 
of three strips. 

The only elementary interaction vertex is the mid- 
point three-string overlap, visualized in Figure 3. We 
are instructed to draw all possible diagrams with 
given external legs (represented as semi-infinite 
strips), and to integrate over all Schwinger para- 
meters $T_i \in [0, \infty)$ associated with the internal 
propagators. The claim is that this prescription 
reproduces precisely the first-quantized result [1]. 
This follows if we can show that (1) the OSFT 
Feynman rules give a unique cover of the moduli 
space of open Riemann surfaces; (2) the integration 
measure agrees with the measure $[d\mu_\alpha]$ in [1]. The 
latter property holds because the antighost insertion 
[34] is precisely the one prescribed by the Polyakov 
formalism for integrating over the moduli $T_i$. To 
show point (1), we introduce the concept of 
minimal-area metrics, which has proved very 
fruitful. (Here and below, our discussion of 
minimal-area metrics will summarize ideas devel- 
oped mainly by Zwiebach.) Quite generally, the 
Feynman rules of an SFT provide us with a cell 
decomposition of the appropriate moduli space of 
Riemann surfaces, a way to construct surfaces in 
terms of vertices and propagators. Given a Riemann 
surface (for fixed values of its complex moduli), the 
SFT must associate with it one and only one string 
diagram. The diagram has more structure than the 
Riemann surface: it defines a metric on it. In all 
known covariant SFTs, this is the metric of minimal 
area obeying suitable length conditions. Consider 
the following: 


Minimal-area problem for open SFT Let $R_o$ be a 
Riemann surface with at least one boundary 
component and possibly punctures on the boundary. 
Find the (conformal) metric of minimal area on $R_o$ 
such that all nontrivial Jordan open curves have 
length greater than or equal to $\pi$. (A curve is said to 
be nontrivial if it cannot be continuously shrunk to a 
point without crossing a puncture.) 


An OSFT diagram (for fixed values of its $T_i$) 
defines a Riemann surface $R_o$ endowed with a 
metric solving this minimal-area problem. This is the 
metric implicit in its picture: flat everywhere except 
at the conical singularities of defect angle $(n - 2)\pi$ 
when n propagators meet symmetrically. (For n = 3, 
these are the elementary cubic vertices; for n > 3, 
they are effective vertices, obtained when propaga- 
tors joining cubic vertices collapse to zero length.) It 
is not difficult to see both that the length conditions 
are obeyed, and that the area cannot be made 
smaller without violating a length condition. Con- 
versely, any surface $R_o$ endowed with a minimal- 
area metric corresponds to an OSFT diagram. The 
idea is that the minimal-area metric must have open 
geodesics (“horizontal trajectories”) of length $\pi$ 
foliating the surface. The geodesics intersect on a 
set of measure zero — the “critical graph” where the 
propagators are glued. Bands of open geodesics of 
infinite height are the external legs of the diagram, 
while bands of finite height are the internal 
propagators. 

The single cover of moduli space is then ensured 
by an existence and uniqueness theorem for metrics 
solving the minimal-area problem for OSFT. These 
metrics are seen to arise from Jenkins—Strebel 
quadratic differentials. Existence shows that the 
Feynman rules of OSFT generate each Riemann 
surface $R_o$ at least once. Uniqueness shows that 
there is no overcounting: since different diagrams 
correspond to different metrics (by inspection of 
their picture), no Riemann surface can be generated 
twice. 


Closed Strings in OSFT 


As is familiar, the open-string S-matrix contains 
poles due to the exchange of on-shell open and 
closed strings. The closed-string poles are present in 
nonplanar loop amplitudes. We have seen that 
OSFT reproduces the standard S-matrix. Factoriza- 
tion over the open-string poles is manifest: it 
corresponds to propagator lengths $T_i$ going to 
infinity. Surprisingly, the closed-string poles are 
also correctly reproduced, despite the fact that 
OSFT treats only the open strings as fundamental 
dynamical variables. In some sense, closed strings 
must be considered as derived objects in OSFT. 
Factorizing the amplitudes over the closed-string 
poles, one finds that on-shell closed-string states can 
be represented, at least formally, as certain singular 
open-string fields with G = +2, closely related to the 
(formal) identity string field. The picture is that of a 
folded open string, whose left and right halves 
precisely overlap, with an extra closed-string vertex 
operator inserted at the mid-point. The correspond- 
ing open/closed vertex is given by 


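$$\langle \Psi_{\rm phys} | \Phi \rangle_{OC} = \langle \Psi_{\rm phys}(0)\; f_{OC} \circ \Phi(0) \rangle_D, \qquad f_{OC}(z) = \left( \frac{1 + i z}{1 - i z} \right)^2 \qquad [35]$$


and describes the coupling to the open-string field of 
a nondynamical, on-shell closed string $|\Psi_{\rm phys}\rangle$. It is 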
possible to add this open/closed vertex to the OSFT 
action. Remarkably, the resulting Feynman rules 
give a single cover of the moduli space of Riemann 
surfaces with at least one boundary, with open and 
closed punctures. This is shown using the same 
minimal-area problem as above, but now allowing 
for surfaces with closed punctures as well. 

We should finally mention that the structure of 
OSFT emerges frequently in topological string 
theory, in contexts where open/closed duality plays 
a central role. Two examples are the interpretation 
of Chern-Simons theory as the OSFT for the 
A-model on the conifold, and the interpretation of 
the Kontsevich matrix integral for topological 
gravity as the OSFT on FZZT branes in (2,1) 
minimal string theory. 


Closed Bosonic SFT 


The generalization to covariant closed SFT is 
nontrivial, essentially because the requisite closed- 
string decomposition of moduli space is much more 
complicated. 

The free theory parallels the open case, with a 
minor complication in the treatment of the CFT zero 


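modes. The closed-string field is taken to live in a 
subspace of the matter + ghost state space, $|\Psi\rangle \in 
\tilde{\mathcal{H}}_{CFT}$, where the tilde means that we impose the 
subsidiary conditions 


$$b_0^- |\Psi\rangle = L_0^- |\Psi\rangle = 0, \qquad b_0^- \equiv b_0 - \bar b_0, \quad L_0^- \equiv L_0 - \bar L_0 \qquad [36]$$
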
In the classical theory, the string field carries ghost 
number G = +2, since it is the off-shell extension of 
the familiar closed-string physical states, and the 
quadratic action reads 


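$$S \sim \langle \Psi, Q_c \Psi \rangle \qquad [37]$$


Here $Q_c$ is the usual closed BRST operator. The inner 
product $\langle \cdot, \cdot \rangle$ is defined in terms of the BPZ inner 
product, with an extra insertion of $c_0^- \equiv c_0 - \bar c_0$, 


$$\langle A, B \rangle \equiv \langle A | c_0^- | B \rangle \qquad [38]$$


In [37] $G_{\rm tot} = +6$, as it should be. Without the 
extra ghost insertion and the subsidiary conditions 
[36] it would not be possible to write a quadratic 
action. The linearized equations of motion and 
gauge invariance, 


$$Q_c |\Psi\rangle = 0, \qquad |\Psi\rangle \sim |\Psi\rangle + Q_c |\Lambda\rangle, \quad |\Lambda\rangle \in \tilde{\mathcal{H}}^{(1)}_{CFT} \qquad [39]$$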


give the expected cohomological problem. The fact 
that the cohomology is computed in the semirelative 
complex, $b_0^- |\Psi\rangle = b_0^- |\Lambda\rangle = 0$, well known from the 
operator formalism of the first-quantized theory, is 
recovered naturally in the second-quantized treatment. 

The interacting action is constructed iteratively, 
by demanding that the resulting Feynman rules give 
a (unique) cover of moduli space. This requires the 
introduction of infinitely many elementary string 
vertices $V_{g,n}$, where n is the number of closed-string 
punctures and g the genus. This decomposition of 
moduli space is more intricate than the decomposi- 
tion that arises in OSFT, but is in fact analogous to 
it, when characterized in terms of the following. 


Minimal-area problem for closed SFT Let $\mathcal{R}_c$ be a
closed Riemann surface, possibly with punctures.
Find the (conformal) metric of minimal area on $\mathcal{R}_c$
such that all nontrivial Jordan closed curves have
length greater than or equal to $2\pi$.


The minimal-area metric induces a foliation of
$\mathcal{R}_c$ by closed geodesics of length $2\pi$. In the classical
theory (g = 0), the minimal-area metrics arise from
Jenkins-Strebel quadratic differentials (as in the open
case), and geodesics intersect on a measure-zero set.
For g > 0, however, there can be foliation bands of
geodesics that cross. By inspecting the foliation, we can
break up the surface into vertices and propagators. In
correspondence with each puncture, there is a band of
infinite height, a flat semi-infinite cylinder of circumference
$2\pi$, which we identify as an external leg of the
diagram. We mark a closed geodesic on each semi-infinite
cylinder, at a distance $\pi$ from its boundary.
Bands of finite height (internal bands not associated with
punctures) correspond to propagators if their height is
greater than $2\pi$; otherwise they are considered part of
an elementary vertex. Along any internal cylinder of
height greater than $2\pi$, we mark two closed geodesics,
each at a distance $\pi$ from a boundary of the cylinder. If we
now cut the surface open along all the marked curves, it
decomposes into a number of semi-infinite cylinders
(external legs), finite cylinders (internal propagators)
and surfaces with boundaries (elementary interactions).
Each elementary interaction of genus g
with n boundaries is an element of $\mathcal{V}_{g,n}$. A crucial point
of this construction is that we took care to leave a
“stub” of length $\pi$ attached to each boundary. Stubs
ensure that sewing of surfaces preserves the length
condition on the metric (no closed curve shorter
than $2\pi$).

These geometric data can be translated into an 
iterative algebraic construction of the full quantum 
action $S[\Psi]$. The $\mathcal{V}_{g,n}$ satisfy geometric recursion
relations whose algebraic counterpart is the quantum
BV master equation for $S[\Psi]$. Remarkably, the
singularities of the $\Delta$ operator encountered in OSFT
are absent here, precisely because of the presence of 
the stubs. We refer to Zwiebach (1993) for a 
complete discussion of closed SFT. 


Open/Closed SFT 


There is also a covariant SFT that includes both open 
and closed strings as fundamental variables. The 
Feynman rules arise from the following problem. 


Minimal-area problem for open/closed SFT Let
$\mathcal{R}_{oc}$ be a Riemann surface, with or without
boundaries, possibly with open and closed punctures.
Find the (conformal) metric of minimal area on
$\mathcal{R}_{oc}$ such that all nontrivial Jordan open curves have
length greater than or equal to $l_o = \pi$, and all
nontrivial Jordan closed curves have length greater
than or equal to $l_c = 2\pi$.


The surface $\mathcal{R}_{oc}$ is decomposed in terms of
elementary vertices $\mathcal{V}^{g,n}_{b,m}$ (of genus g, b boundary
components, n closed-string punctures and m open-string
punctures) joined by open and closed propagators.
Degenerations of the surface always correspond
to propagators becoming of infinite length:
factorization is manifest both in the open and in the
closed channel.

The SFT described in the section “Closed strings
in OSFT” (Witten OSFT augmented with the single
open/closed vertex [35]) corresponds to taking $l_o = \pi$
and $l_c = 0$. Varying $l_c \in [0, 2\pi]$, we find a whole
family of interpolating SFTs. This construction 
clarifies the special status of the Witten theory: 
moduli space is covered by a single cubic open 
overlap vertex, with no need to introduce dynamical 
closed strings, but at the price of a somewhat 
singular formulation. 


Classical Solutions in Open SFT 


In the present formulation of SFT, a background (a 
classical solution of string theory) must be chosen from 
the outset. The very definition of the string field 
requires one to specify a (B)CFT$_0$. Intuitively, the string
field lives in the “tangent” to the “theory space” at a 
specific point — where “theory space” is some notion of 
a “space of 2D (boundary) quantum field theories,” 
not necessarily conformal. In the early 1990s indepen- 
dence from the choice of background was demon- 
strated for infinitesimal deformations: the SFT actions 
written using neighboring (B)CFTs are indeed related 
by a field redefinition. In recent years, it has become 
apparent that at least the open-string field reaches out 
to open-string backgrounds a finite distance away — 
possibly covering the whole of theory space. (Classical 
solutions of closed SFT are beginning to be investi- 
gated at the time of this writing (2005)). 

The OSFT action written using BCFT$_0$ data is just
the full world-volume action of the D-brane with
BCFT$_0$ boundary conditions. Which classical solutions
should we expect in this OSFT? In the bosonic
string, Dp-branes carry no conserved charge and are
unstable. This instability is reflected in the presence
of a mode with $m^2 = -1/\alpha'$, the open-string
tachyon $T(x^\mu)$, $\mu = 0, \ldots, p$. From this physical picture,
Sen argued that:


1. the tachyon potential, obtained by eliminating
the higher modes of the string field by their
equations of motion, must admit a local minimum
corresponding to the vacuum with no
D-brane at all (henceforth, the tachyon vacuum,
$T(x^\mu) = T_0$);

2. the value of the potential at $T_0$ (measured with
respect to the BCFT$_0$ point T = 0) must be
exactly equal to minus the tension of the brane
with BCFT$_0$ boundary conditions;

3. there must be no perturbative open-string excitations
around the tachyon vacuum; and

4. there must be space-dependent “lump” solutions
corresponding to lower-dimensional branes. For
example, a lump localized along one world-volume
direction, say $x^1$, such that $T(x^1) \to T_0$
as $x^1 \to \pm\infty$, is identified with a D(p − 1)-brane.




Sen’s conjectures have all been verified in OSFT. 
(See Sen (2004) and Taylor and Zwiebach (2003) 
for reviews). The deceptively simple-looking equa- 
tions of motion (in Siegel gauge) 


$L_0 |\Phi\rangle + b_0 \left( |\Phi\rangle * |\Phi\rangle \right) = 0$ [40]


are really an infinite system of coupled equations, 
and no analytic solutions are known. Turning on a 
vacuum expectation value (VEV) for the tachyon 
drives into condensation an infinite tower of modes. 
Fortunately, the approximation technique of “level 
truncation” is surprisingly effective. The string field 
is restricted to modes with an $L_0$ eigenvalue smaller
than a prescribed maximal level L. For any finite L, 
the truncated OSFT contains a finite number of 
fields and numerical computations are possible. 
Numerical results for various classical solutions 
converge quite rapidly as the level L is increased. 
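
As a minimal illustration of level truncation, one can keep only the tachyon mode t. The sketch below assumes the conventions usual in the level-truncation literature ($\alpha' = 1$ and a potential normalized by the D-brane tension), neither of which is fixed in the text above; under those assumptions the level-0 potential is $f(t) = 2\pi^2 (-t^2/2 + K^3 t^3/3)$ with $K = 3\sqrt{3}/4$. The function names are illustrative only.

```python
import math

# Level-0 truncation of the OSFT tachyon potential, in units of the
# D-brane tension (assumed normalization: alpha' = 1, K = 3*sqrt(3)/4).
K = 3 * math.sqrt(3) / 4


def potential(t: float) -> float:
    """f(t) = 2 pi^2 (-t^2/2 + K^3 t^3 / 3)."""
    return 2 * math.pi**2 * (-0.5 * t**2 + K**3 * t**3 / 3)


def tachyon_vacuum() -> tuple[float, float]:
    """Return (t0, f(t0)) at the nontrivial stationary point f'(t0) = 0."""
    t0 = 1 / K**3  # solves -t + K^3 t^2 = 0 with t != 0
    return t0, potential(t0)


if __name__ == "__main__":
    t0, f0 = tachyon_vacuum()
    print(f"t0 = {t0:.4f}, f(t0) = {f0:.4f}")
```

With these conventions $f(t_0) = -4096\pi^2/59049 \approx -0.684$, so the tachyon alone already accounts for roughly 68% of the value −1 required by Sen's second conjecture; including higher levels closes the gap.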

The most important solution is the string field $|\mathcal{T}\rangle$
that corresponds to the tachyon vacuum. A remarkable
feature of $|\mathcal{T}\rangle$ is universality: it can be written
as a linear combination of modes obtained by acting
on the tachyon $c_1|0\rangle$ with ghost oscillators and
matter Virasoro operators,


$|\mathcal{T}\rangle = T_0\, c_1|0\rangle + u\, L^{\rm m}_{-2} c_1|0\rangle + v\, c_{-1}|0\rangle + \cdots$


This implies that the properties of $|\mathcal{T}\rangle$ are independent
of any detail of BCFT$_0$, since all computations
involving $|\mathcal{T}\rangle$ can be reduced to purely combinatorial
manipulations involving the ghosts and the
Virasoro algebra. The numerical results strongly
confirm Sen’s conjectures, and indicate that the 
tachyon vacuum is located at a non-singular point in 
configuration space. Numerical solutions describing 
lower-dimensional branes and exactly marginal 
deformations are also available. For example, the 
full family of solutions interpolating between a 
D1 and a D0 brane at the self-dual radius has
been found. There is increasing evidence that the 
open-string field provides a faithful map of the 
open-string landscape. 


Vacuum SFT: D-branes as Projectors 


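In the absence of a closed-form expression for $|\mathcal{T}\rangle$,
we are led to guesswork. When expanded around
$|\mathcal{T}\rangle$, the OSFT is still cubic, only with a different
kinetic term $\mathcal{Q}$,


$S = -\kappa_0 \left[ \frac{1}{2} \langle \Phi | \mathcal{Q} | \Phi \rangle + \frac{1}{3} \langle \Phi | \Phi * \Phi \rangle \right]$ [41]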


The operator Q must obey all the formal properties 
[17], must be universal (constructed from ghosts and 
matter Virasoro operators), and must have trivial 
cohomology at G= +1. Another constraint comes 


from requiring that [41] admits classical solutions in 
Siegel gauge. The choice 


$\mathcal{Q} = \frac{1}{2i} \left( c(i) - c(-i) \right) = c_0 - (c_2 + c_{-2}) + (c_4 + c_{-4}) - \cdots$ [42]


satisfies all these requirements. The conjecture 
(Rastelli et al. 2001) is that, by a field redefinition, 
the kinetic term around the tachyon vacuum can be 
cast into this form. This “purely ghost” Q is 
somewhat singular (it acts at the delicate string 
mid-point), and presumably should be regarded as 
the leading term of a more complicated operator 
that includes matter pieces as well. The normalization
constant $\kappa_0$ is formally infinite. Nevertheless,
a regulator (e.g., level truncation) can be introduced,
and physical observables are finite and independent
of the regulator. The vacuum SFT ([41]-[42])
appears to capture the correct physics, at least at 
the classical level. Taking a matter/ghost factorized 
ansatz 


$|\Phi\rangle = |\Phi_g\rangle \otimes |\Phi_m\rangle$ [43]


and assuming that the ghost part is universal for all
D-brane solutions, the equations of motion reduce
to the following equation for the matter part:


$|\Phi_m\rangle * |\Phi_m\rangle = |\Phi_m\rangle$ [44]


A solution $|\Phi_m\rangle$ can be regarded as a projector
acting in “half-string space.” Recall that the 
*-product looks formally like a matrix multiplica- 
tion [19]: the matrices are the string fields, whose 
“indices” run over the half-string curves. These 
projector equations have been exactly solved by 
many different techniques (see Rastelli (2004) for a 
review). In particular, there is a general BCFT 
construction that shows that one can obtain solu- 
tions corresponding to any D-brane configuration, 
including multiple branes — the rank of the projector 
is the number of branes. A rank-one projector 
corresponds to an open-string functional which is 
left/right split, $\Phi[X(\sigma)] = F_L(X_L)\, F_R(X_R)$. There is
also a clear analogy between these solutions and the
soliton solutions of noncommutative field theory. 
The analogy can be made sharper using a formalism 
that rewrites the open-string *-product as the tensor 
product of infinitely many Moyal products. (See 
Bars (2002) and references therein). 
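
The matrix analogy can be made concrete in a finite-dimensional toy model. The sketch below is only an illustration of that analogy: ordinary matrices stand in for string fields and matrix multiplication for the *-product. It is not a computation in the actual star algebra, and all names are hypothetical.

```python
from fractions import Fraction

Matrix = list[list[Fraction]]


def outer(v: list[Fraction]) -> Matrix:
    """Rank-one matrix v v^T: the analog of a left/right split functional."""
    return [[a * b for b in v] for a in v]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product, standing in for the star product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def trace(a: Matrix) -> Fraction:
    """Trace; for a projector this equals its rank."""
    return sum((a[i][i] for i in range(len(a))), Fraction(0))


# Two orthonormal vectors with exact rational entries.
v = [Fraction(1), Fraction(0), Fraction(0)]
w = [Fraction(0), Fraction(3, 5), Fraction(4, 5)]

one_brane = outer(v)                  # rank-one projector
two_branes = add(outer(v), outer(w))  # rank-two projector

if __name__ == "__main__":
    print(matmul(one_brane, one_brane) == one_brane)    # P * P = P
    print(matmul(two_branes, two_branes) == two_branes)
    print(trace(one_brane), trace(two_branes))          # ranks of the projectors
```

In this toy picture the rank-one projector factorizes as an outer product, mirroring the left/right split form above, and the sum of two orthogonal rank-one projectors is again a projector whose rank, 2, plays the role of the number of branes.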

It is unclear whether or not multiple-brane 
solutions (should) exist in the original OSFT — they 
are yet to be found in level truncation. Understanding
this and other issues, like the precise role of
closed strings in the quantum theory, seems to
require a precise characterization of the allowed 


space of open-string functionals. In principle, the 
path integral over such functionals would define the 
theory at the full nonperturbative level. This remains 
a challenge for the future. 


Note Added in Proof Very recently, M Schnabl, 
building on previous work on star algebra projectors 
and related surface states (Rastelli (2004) and
references therein), was able to find the exact
solution for the universal tachyon condensate in 
OSFT. This breakthrough is likely to lead to rapid 
new developments in SFT. 


See also: Boundary Conformal Field Theory; BRST 
Quantization; Chern—Simons Models: Rigorous Results; 
Fedosov Quantization; The Jones Polynomial; Large-N 
and Topological Strings; Large-N Dualities; 
Noncommutative Geometry from Strings; 
Noncommutative Tori, Yang-Mills, and String Theory; 
Operads; Superstring Theories; Topological Quantum 
Field Theory: Overview; Two-Dimensional Conformal 
Field Theory and Vertex Operator Algebras.


String Theory and Compactification


String theory provides a setup in which gauge
and gravitational interactions can be described in a 
unified framework consistently at the quantum level. 
As such, it provides a candidate theory in which to 
describe the standard model of particle physics 
(describing quarks and leptons and their strong and 
electroweak interactions) and gravity within the 
same quantum theory.

String theory has a unique fundamental scale
$M_s$, fixed by the string tension, often encoded in the
parameter $\alpha'$ of dimension (length)$^2$. All other
scales are derived from this one and are background
dependent.

Most of the string theory phenomenological 
model building has centered on the critical super- 
strings, which are ten dimensional (10D) and 
involve spacetime (as well as world-sheet) super- 
symmetry. There are five such different 10D 
theories: type IIA, type IIB, type I, and the $E_8 \times E_8$
and SO(32) heterotic theories. The heterotic theories 
include nonabelian gauge fields and charged fer- 
mions in ten dimensions; hence, they constitute a 
promising setup to embed the standard model. On 
the other hand, the possibility of including D-branes
(which carry nonabelian gauge symmetries and
charged matter) in compactifications of type II 
theories (and orientifolds thereof, like the type I 
theory itself) makes the latter reasonable alternative 
setups to embed the standard model as a brane 
world. The different 10D theories (as well as the 
11D M-theory) are related by diverse dualities, also 
upon compactification. This suggests that they are 
just different limits of a unique underlying theory. 
For 4D models, this implies that the different classes 
of constructions are ultimately related by dualities, 
and that often a given model may be realized using 
different string theory constructions as starting 
points. 

In order to recover 4D physics at low energies, 
compactification of the theory is required. In 
geometrical terms, the theory is required to propagate
on a spacetime with geometry $M_4 \times X_6$, where
$M_4$ is 4D Minkowski space, and $X_6$ is a compact
manifold. This description is valid in the regime of a
large compactification volume, $\alpha'/R^2 \ll 1$ (where R
is the overall scale of the compact manifold), where
$\alpha'$ string theory corrections are negligible. Other 4D
string models may be constructed using abstract 
conformal field theories. They may often be 
regarded as extrapolations of geometric compactifi- 
cations to the regime of sizes comparable with the 
string length, where string theory corrections are 
relevant and the classical geometric picture does not 
hold. 

In the simplest situation of geometrical compacti- 
fication, not including additional backgrounds 
beyond the metric, the requirement of 4D spacetime 
supersymmetry (useful for the stability of the model, 
as well as of phenomenological interest) implies that 
the space $X_6$ is endowed with an SU(3) holonomy
metric. Existence of such metrics is guaranteed for
Calabi-Yau spaces, namely Kähler manifolds with
vanishing first Chern class. 

There are a very large number of 4D super- 
symmetric string models that can be constructed 
using different starting string theories and different 
compactification manifolds. They lead to different 
4D spectra, often including nonabelian gauge sym- 
metries and charged chiral fermions (but only rarely 
resembling the actual standard model). In addition, 
for each given model, there exist, in general, a large 
number of massless 4D scalars, known as moduli, 
whose vacuum expectation values are not fixed. 
They parametrize different choices of the compacti- 
fication data in a given topological sector (e.g., 
Kähler and complex structure moduli of the internal
Calabi-Yau space). All physical parameters of the 
4D theory vary continuously with the vacuum 
expectation values of these scalars. 


All such models are on equal footing from the 
point of view of the theory. Hence, 4D string models 
suffer from a large arbitrariness. Although the 
breaking of supersymmetry clearly changes the 
picture qualitatively (e.g., flat directions associated 
to moduli are lifted by radiative corrections), it is 
difficult to evaluate this impact. 

In this situation, most of the research in string 
theory phenomenology has centered on the study of 
generic properties of certain classes of compactifica- 
tions, with the potential to lead to realistic struc- 
tures (such as N=1 or no supersymmetry, 
nonabelian gauge symmetries with replicated sets 
of charged chiral fermions). Within each class, 
explicit models (as close as possible to the standard 
model) have also been constructed. Generic predic- 
tions or expectations for phenomenology can be 
obtained within each setup, but quantitative results, 
even for explicit models, are always functions of 
undetermined moduli vacuum expectation values. 
Tractable mechanisms for moduli stabilization are 
under active research, although only preliminary 
results are available presently. 

The better-studied classes of models are compac- 
tifications of heterotic theories on Calabi-Yau 
spaces, and compactifications of type II theories (or 
orientifolds thereof) with D-branes. Other possibi- 
lities include the heterotic M-theory, the M-theory 
on G2 holonomy varieties, the F-theory on Calabi- 
Yau 4-folds, etc. As already mentioned, different 
classes (or even explicit models) are often related by 
string duality. 


Heterotic String Phenomenology 


A large class of phenomenologically interesting 
string vacua, which has been explored in depth, is 
provided by 4D compactifications of (any of the 
two) perturbative heterotic string theories. Compac- 
tification on large volume manifolds can be 
described in the supergravity approximation. As 
described by Candelas, Horowitz, Strominger, and
Witten, 4D N = 1 supersymmetry
requires the internal manifold to be of SU(3)
holonomy, a condition which is satisfied by
Calabi-Yau manifolds. In the presence of curvature,
the Bianchi identity for the Kalb-Ramond
2-form B is modified, so that, in general, it reads


$dH = {\rm tr}\, R^2 - \frac{1}{30}\, {\rm Tr}\, F^2$ [1]


where H is the field strength 3-form, R is the curvature
2-form, and F is the field strength, in the adjoint
representation, of the 10D gauge fields. Regarding
the above equation in cohomology leads to a 


consistency condition, forcing the background gauge 
bundle V to be topologically nontrivial, with 


$c_2(V) = c_2(TX_6)$ [2]


where $c_2$ denotes the second Chern class, and $TX_6$ is
the tangent bundle of the compactification space.

The condition of supersymmetry implies that the 
gauge fields must be solutions of the Donaldson- 
Uhlenbeck—Yau equations. Existence of such a solu- 
tion is guaranteed for holomorphic and stable gauge 
bundles. The simplest solution to these conditions is 
the so-called standard embedding, where the gauge 
connection is locally identical to the spin connection, 
but more general solutions exist and have been 
characterized for particular classes of Calabi-Yau 
manifolds (e.g., when they are elliptically fibered). 
The gauge background bundle V, with structure 
group H, breaks the 10D gauge symmetry G to its 
commutant subgroup $G_{4D}$. The latter corresponds to
the 4D gauge symmetry. Moreover, the background 
bundle modifies the Kaluza—Klein reduction of the 
10D charged fermions, leading to a nonzero number 
of replicated 4D chiral fermions. Decomposing the 
adjoint representation of G (in which 10D fermions 
transform) with respect to $G_{4D} \times H$,


${\rm Adj}\, G = \bigoplus_i \left( R_{G,i}, R_{H,i} \right)$ [3]


the net number of 4D chiral fermions in the
representation $R_{G,i}$ is given by the index of the
Dirac operator coupled to V in the representation
$R_{H,i}$. Condition [1] implies proper cancellation of
chiral anomalies in the resulting theory. A simple 
and well-studied class is provided by standard 
embedding compactifications of the $E_8 \times E_8$ heterotic
string theory, whose unbroken 4D gauge group is
$E_6 \times E_8$. The number of families (i.e., chiral multiplets
in the representation 27 of $E_6$) and conjugate
families (in the $\overline{27}$) are given by the Hodge numbers


$n_{27} = h_{1,1}(X_6), \qquad n_{\overline{27}} = h_{2,1}(X_6)$ [4]


More specifically, the harmonic representatives in 
each cohomology class represent the internal profile 
of the corresponding 4D fields. The net number of
families is thus determined by the Euler characteristic
$\chi(X_6)$


$N_{\rm fam} = |h_{1,1} - h_{2,1}| = \frac{1}{2} |\chi(X_6)|$ [5]
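
As a worked instance of [4] and [5], consider the quintic hypersurface in $CP^4$, whose Hodge numbers $h_{1,1} = 1$ and $h_{2,1} = 101$ are standard. The example is not taken from the text above, and the function names are illustrative.

```python
def euler_characteristic(h11: int, h21: int) -> int:
    """Euler characteristic of a Calabi-Yau threefold: chi = 2 (h11 - h21)."""
    return 2 * (h11 - h21)


def net_families(h11: int, h21: int) -> int:
    """Net number of families for the standard embedding, eq. [5]."""
    return abs(euler_characteristic(h11, h21)) // 2


if __name__ == "__main__":
    # Quintic threefold: h11 = 1, h21 = 101.
    print(euler_characteristic(1, 101))  # -200
    print(net_families(1, 101))          # 100
```

By the same arithmetic, three net families in a standard embedding compactification would require $|\chi(X_6)| = 6$.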


Recently, much progress in heterotic model building 
has been achieved in nonstandard embedding com- 
pactifications by the detailed construction of holo- 
morphic stable bundles and the computation of the 
diverse indexes. In particular, explicit models with 
just the minimal supersymmetric standard model 
spectrum have been constructed.

The above geometric approach has several limitations.
On the technical side, the construction of
explicit holomorphic and stable gauge bundles is 
nontrivial from the mathematical viewpoint. On the 
more fundamental side, it allows one to explore only 
the large volume limit of heterotic compactifications. 

Further insight into the latter aspect can be 
obtained via constructions based on exactly solvable 
conformal field theories (CFTs), which describe the 
world-sheet string dynamics in compactifications, 
including all $\alpha'$ corrections, and, therefore, allowing
one to enter the small volume regime. The simplest 
such compactifications are provided by toroidal 
orbifolds, which describe string propagation in 
quotients of toroidal compactifications by a discrete
group $\Gamma$. From the world-sheet viewpoint, they are
described by free 2D CFTs, which nevertheless include sectors
of closed strings with boundary conditions twisted
by elements of $\Gamma$. The resulting 4D theory contains
chiral fermions, arising from the untwisted and
twisted sectors. In the former, the nonchiral spectrum
of toroidal compactification undergoes a projection
onto the $\Gamma$-invariant states, which leads to
chirality. Twisted sectors are localized at the fixed 
points of the orbifold action, where the local 
supersymmetry is reduced, leading naturally to 
chiral fermions. 

Many of these models can be regarded as
compactifications on Calabi-Yau spaces in the limit
in which they become locally flat and develop 
conical singularities (and similarly, their gauge 
bundles become locally flat and with curvature 
localized near the singular points). Indeed, flat 
directions involving moduli fields in the twisted 
sector often exist, which correspond to geometric 
blow-ups of the singular point that resolve the 
conical singularities to yield a smooth Calabi-Yau. 

The theories remain simple and solvable for any 
value of the untwisted moduli (namely moduli of the 
underlying toroidal compactification). This allows 
the discussion of their low-energy effective action 
including the explicit dependence on the untwisted 
moduli, while only partial results for the dependence 
on twisted moduli are known. 

Other approaches, such as free fermion construc- 
tions or Gepner models, also provide exact descrip- 
tions of compactifications, although only at a point 
of the moduli space, deep inside the small volume 
regime. 

Exact CFT constructions provide a small volume 
description of Calabi-Yau compactifications, at 
least for particular models. Moreover, their consis- 
tency conditions (modular invariance of the parti- 
tion function) provide a stringy version of the large 
volume geometric condition implied by eqn [2]. The
constructions also show the existence of full-fledged
string theory constructions with properties similar to
geometric compactifications, but incorporating all $\alpha'$
corrections. 

Within the general class of perturbative heterotic 
string models, a certain number of phenomenologi- 
cally interesting statements are quite generic. 


• The 4D Planck scale $M_P$ and gauge coupling $g_{YM}$
(at the string scale) are related to the fundamental
string scale by


$M_s = M_P\, g_{YM}$ [6]


This implies that the string scale is close to the 4D 
Planck scale. In this situation, supersymmetry can 
stabilize the electroweak scale against radiative 
corrections. 

• 4D heterotic models contain certain U(1) symmetries,
whose gauge bosons actually get Stückelberg
masses due to $B \wedge F$ couplings to components of
the 2-form. Such U(1)’s would correspond to
global symmetries, but are violated at tree level by
$\alpha'$ nonperturbative effects, namely world-sheet
instantons. Hence, no continuous global symme- 
tries exist, even perturbatively, in these models. 
Proton decay might, however, be avoided by 
discrete global symmetries. In any event, even 
without such symmetries, the large fundamental 
scale suppresses the processes mediating proton 
decay. Thus, the proton lifetime is naturally larger 
than present experimental bounds. 

• Gauge coupling constants for the different gauge
factors in the standard model unify at the string 
scale. This agrees with extrapolation from their 
electroweak values, assuming the minimal super- 
symmetric standard model content between the 
electroweak and string scale, up to a mismatch of 
scales (by a factor of 20). The latter may be 
addressed in diverse ways, such as threshold 
corrections, intermediate scales, or in the heterotic 
M-theory. 

• Yukawa couplings are, in principle, computable.
Explicit computations have been carried out in 
standard embedding geometric compactifications 
(where they amount to the overlap integral of the 
internal profiles of the 4D fields, namely a 
topological intersection number), and in orbifold 
models. They are in general moduli dependent, so 
their quantitative analysis is involved. Qualita- 
tively, however, interesting patterns, such as 
hierarchical structures, are possible, for example, 
in specific orbifold models. 


Heterotic models have been studied beyond the 
perturbative regime. For instance, the construction 


of compactifications including nonperturbative 
objects, namely 5-branes, has been pursued; so has 
been the strong coupling limit of the $E_8 \times E_8$
heterotic theory, described by compactifications of
M-theory on an interval (the so-called heterotic
M-theory or Hořava-Witten theory). The strong
coupling phenomena of the SO(32) heterotic theory 
can be addressed using dual type I (or other type II 
orientifold) constructions. 


D-Brane Phenomenology 


A different setup for realistic string theory compac- 
tifications, within the so-called brane-world con- 
structions, is provided by compactifications of type II 
string theories containing D-branes, or quotients 
thereof. A particularly relevant class of quotients 
involves quotienting out by world-sheet parity, 
accompanied by some Z2 geometric action. The 
resulting theories are denoted type II orientifolds, and 
contain orientifold planes, subspaces fixed under the 
geometric action, corresponding to regions where the 
orientation of a string can flip. Type II compactifica- 
tions with D-branes filling the noncompact dimen- 
sions must satisfy a set of consistency conditions, 
known as RR tadpole cancellation. This is the 
condition that, in the compact space, the charge of 
D-branes and orientifold planes under the different 
RR forms must cancel. For the Z-valued charges, the
conditions read


$\sum_a N_a Q_a + Q_{Op} = 0$ [7]


where $N_a$ denotes the multiplicity of D-branes with
charge vector $Q_a$ under the RR fields, and $Q_{Op}$ is the
charge vector of the orientifold planes. Additional
discrete conditions may be present if the relevant 
K-theory group (classifying D-brane charges in the 
corresponding background) contains torsion pieces. 
The most familiar example of these constructions 
is provided by the type I string theory, which is an 
orientifold quotient of the type IIB theory by world- 
sheet parity (with no geometric action). The model 
can be regarded as containing one orientifold 
9-plane and 32 D9 branes (all filling out 10D 
spacetime), such that their RR charges with respect 
to the (nondynamical) RR 10-form cancel. 
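
Condition [7] can be checked mechanically once all charges are expressed in common units. In the sketch below the charges are measured in units of the D9-brane charge, in which the O9-plane carries charge −32; this normalization is an assumption made for illustration, chosen to reproduce the type I count quoted above, and the helper names are hypothetical.

```python
from typing import Sequence

Charge = tuple[int, ...]


def tadpole(stacks: Sequence[tuple[int, Charge]], q_op: Charge) -> Charge:
    """Left-hand side of [7]: sum_a N_a Q_a + Q_Op, componentwise."""
    total = list(q_op)
    for n_a, q_a in stacks:
        for i, q in enumerate(q_a):
            total[i] += n_a * q
    return tuple(total)


def cancels(stacks: Sequence[tuple[int, Charge]], q_op: Charge) -> bool:
    """True when every component of the total RR charge vanishes."""
    return all(c == 0 for c in tadpole(stacks, q_op))


if __name__ == "__main__":
    # Type I: 32 D9-branes of unit charge against one O9-plane of charge -32.
    print(cancels([(32, (1,))], (-32,)))  # True
    print(tadpole([(16, (1,))], (-32,)))  # (-16,)
```

The same function applies to charge vectors with several components, one per RR form or homology class, which is the situation in the compactifications discussed next.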
Supersymmetric geometric compactifications of 
type II theories and orientifolds must correspond to 
compactification on Calabi-Yau spaces in order to 
have a preserved spinor. Models with D-branes 
filling the noncompact dimensions may be broadly 
classified into two classes: type IIB compactifications 
with D(3 + 2p)-branes, wrapped on holomorphic 
2p-cycles, and carrying holomorphic and stable 


world-volume gauge bundles, and type IIA compac- 
tifications with D6 branes wrapped on special 
Lagrangian 3-cycles (in general, models with D4 
and D8 branes are not allowed since Calabi-Yau 
spaces do not have nontrivial 1- or 5-cycles on 
which to wrap the branes). This classification is a 
large volume realization of the general classification 
of supersymmetric configurations of D-branes into 
two classes, denoted A and B. 


Intersecting Brane Worlds 


Type IIA compactifications with A-branes corre- 
spond to compactifications of type IIA theory (or 
orientifolds thereof) with D6 branes wrapped on 
3-cycles of the internal Calabi-Yau space. In these 
models, each stack of N D6 branes generically leads 
to a U(N) gauge factor. Chirality arises from open 
strings stretched between pairs of branes at the 
corresponding intersections. The chiral fermions 
from an open string stretched between branes a 
and b transform in the bifundamental representation 
$(\square_a, \overline{\square}_b)$ of the gauge factors $U(N_a) \times U(N_b)$ of the
intersecting D6 brane stacks. In general, two 
3-cycles in a 6D manifold intersect at points of the 
internal space. Hence, such fermions arise in several 
families, whose (net) number is given by the (net) 
number of intersections of the corresponding 
3-cycles $\Pi_a$, $\Pi_b$, namely the topological invariant
intersection number of their homology classes


$I_{ab} = [\Pi_a] \cdot [\Pi_b]$ [8]


Simple modifications of the above rules arise in 
some sectors in the presence of orientifold planes 
(e.g., the reduction of the gauge symmetry from 
unitary to orthogonal or symplectic factors for 
branes on top of orientifold planes). 
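
For a concrete instance of [8], take the compact space to be a factorized six-torus $T^2 \times T^2 \times T^2$ and each 3-cycle to be a product of 1-cycles with wrapping numbers $(n^i, m^i)$ on the i-th two-torus. This restriction is an assumption made for illustration (the text covers general Calabi-Yau spaces); in that case the intersection number factorizes as $I_{ab} = \prod_i (n_a^i m_b^i - m_a^i n_b^i)$, up to an overall sign convention. The wrapping numbers used below are hypothetical.

```python
from math import prod

Wrapping = tuple[tuple[int, int], tuple[int, int], tuple[int, int]]


def intersection_number(a: Wrapping, b: Wrapping) -> int:
    """I_ab for factorizable 3-cycles on T^2 x T^2 x T^2."""
    return prod(na * mb - ma * nb for (na, ma), (nb, mb) in zip(a, b))


if __name__ == "__main__":
    brane_a: Wrapping = ((1, 0), (1, 0), (1, 0))
    brane_b: Wrapping = ((1, 3), (1, 1), (1, 1))
    print(intersection_number(brane_a, brane_b))  # 3
    print(intersection_number(brane_b, brane_a))  # -3
```

In this example the two stacks have net intersection number 3, corresponding to three replicated bifundamental fermions. The intersection number is antisymmetric under exchange of the two 3-cycles.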

The RR tadpole cancellation conditions specify 
that the total homological charge carried by the D6 
branes (and the orientifold 6-planes) cancel. They 
imply automatic cancellation of cubic nonabelian 
anomalies, and the cancellation of mixed U(1) 
anomalies by a Green—Schwarz mechanism mediated 
by 4D scalars from the RR closed-string sector. 

Explicit models with SM spectrum have been 
constructed in orientifolds of toroidal compactifica- 
tions in the nonsupersymmetric case, and in orbi- 
folds thereof in supersymmetric cases. The 
generalization of the above construction beyond 
toroidal situations is, in principle, possible, but 
difficult, due to the mathematically challenging 
task of constructing special Lagrangian submani- 
folds for general Calabi-Yau manifolds. 

Certain phenomenologically interesting quantities, 
such as gauge couplings and their threshold
corrections, Yukawa couplings, and other diverse
correlation functions have been computed in toroidal
cases, where the corresponding correlators are
computable exactly in $\alpha'$. Particularly interesting is
the computation of Yukawa couplings, or, in 
general, of couplings involving only fields at inter- 
sections. These couplings arise from open-string 
world-sheet instantons, namely disks with bound- 
aries on the D-branes corresponding to those 
intersections. 


Type IIB Orientifolds 


Type IIB compactifications with B-type branes 
contain several familiar classes of 4D models, for 
instance, compactifications of type I string theory on 
smooth Calabi-Yau spaces (whose description may 
be carried out using the effective supergravity 
action, in close analogy with the heterotic compac- 
tifications). Compactifications of type I string theory 
on orbifolds can be regarded as a particular 
realization of this, easily described using exact 
CFTs (although from the viewpoint of the general 
description as B-branes, the appearance of lower- 
dimensional branes requires their mathematical 
description to involve coherent sheaves). Since
open strings at orbifolds do not have twisted
boundary conditions, chirality arises from the orbi- 
fold projection of the toroidally compactified theory 
on the spectrum. 

Another example within this kind is provided by 
the so-called magnetized D-brane models. These 
correspond to toroidal compactifications of type I 
theory, with D9 branes carrying constant magnetic 
backgrounds for the internal components of the 
world-volume gauge fields. In this kind of model, 
although the closed-string sector is highly super- 
symmetric, the open-string spectrum has reduced 
supersymmetry, or no supersymmetry (if the bundle 
stability condition is relaxed). Chirality arises from 
the nontrivial index of the Dirac operator for open 
strings ending on D-branes with different world- 
volume magnetic fields. Explicit models have mainly 
centered on nonsupersymmetric models from orientifolds
of $T^6$, and on supersymmetric models from
orientifolds of the $T^6/(Z_2 \times Z_2)$ orbifold. In both
contexts, models with semirealistic spectra have 
been obtained: concretely nonsupersymmetric mod- 
els with just the standard model spectrum, or 
supersymmetric models with the minimal super- 
symmetric standard model spectrum, plus nonchiral 
matter. Further, properties of the gauge coupling 
constants and the computation of the Yukawa 
couplings have been studied as functions of unde- 
termined moduli. 

Finally, a second large class of models constructed
using B-type branes is given by lower-dimensional
D-branes, for example, D3 branes, located at singular 
points in the internal compactification space. Since the 
massless sector of open strings is determined only in 
terms of the local structure of the singularity, these 
models have been mostly studied in noncompact 
setups. Resulting spectra can be encoded in quiver 
diagrams, related to those in the mathematical litera- 
ture on the McKay correspondence. Semirealistic three- 
family models have been constructed based on systems 
of D3 and D7 branes at the $C^3/Z_3$ orbifold singularity.

Type IIB orientifold compactifications are also 
intimately related to F-theory compactifications on 
Calabi-Yau 4-folds, which provide a nonperturba- 
tive completion for such models. 

Mirror symmetry exchanges type IIB and IIA
compactifications with B- and A-type branes. Hence, 
it provides a map between the above two kinds of 
compactifications. This shows that type IIB orienti- 
fold models lead to spectra with structure similar to 
that of intersecting-branes worlds, and that they 
share many of their general properties. 

As a particular example, toroidal models of 
intersecting D6 branes are mapped under mirror 
symmetry to models of magnetized D9 branes. This 
mirror map has been exploited to construct the same 
theories from both starting points and to recover 
certain quantities, such as the $\alpha'$-exact Yukawa
couplings in the IIA picture from a purely classical
(no $\alpha'$ corrections) computation in the mirror IIB
model. This is a particular application of the general 
proposal of homological mirror symmetry in com- 
pactifications with branes. 

Type II orientifold compactifications with 
D-branes have also been explored beyond the 
geometric regime, using exact CFTs to describe the 
(analog of the) internal space, and crosscap and 
boundary states to describe (the analogs of) orienti- 
fold planes and D-branes. Formal developments in 
the construction of the latter in Gepner models have 
been successfully applied to obtain large classes of 
semirealistic 4D string models in this setup. 

As compared with heterotic compactifications, the 
setup of D-brane models leads to several generic 
features: 


• Since gauge sectors are localized on D-branes, and
have a dilaton dependence different from gravitational
interactions, the relation between the
fundamental string scale and the 4D Planck scale
and gauge coupling reads


$M_P^2 \, g_{YM}^2 = \frac{M_s^{11-p} \, V_\perp}{g_s}$ [9]


for gauge sectors on Dp-branes, where $M_s$ is the
string scale, $V_\perp$ is a measure of the volume in the
directions transverse to the brane, and $g_s$ is the
10D string coupling. The above relation shows that
it is possible to achieve large 4D Planck mass with 
a lower fundamental string scale by adjusting the 
transverse volume and the string coupling. This has 
been proposed by Antoniadis, Arkani-Hamed, 
Dimopoulos, and Dvali as an alternative to explain 
the Planck/weak hierarchy without supersymmetry. 

• The compactifications contain several U(1) gauge
symmetries. For some of the corresponding gauge
bosons, the 4D effective theory contains Stückelberg
masses of order $M_s$, due to $B \wedge F$ couplings
to fields in the RR sector. These couplings make
the U(1) gauge bosons massive; hence, they are
absent from the low-energy physics. Nevertheless,
the U(1)’s remain as global symmetries exact in $\alpha'$
and to all orders in the perturbation theory in $g_s$.
They are violated by D-brane instantons, which
are nonperturbative in $g_s$. In many realistic
models, the baryon number is one such global 
symmetry, and it prevents proton decay, even if 
the string scale is not large. 

• In general, each gauge factor in the standard
model arises from a different brane stack, and 
their gauge couplings at the string scale are 
controlled by different moduli. This implies that, 
generically, it is not natural to have gauge 
coupling unification in D-brane models. Particular 
models may enjoy enhanced discrete global 
symmetries at special points in moduli space 
where unification is achieved, thus making uni- 
fication appear more natural in such examples. 
Similar statements apply for constructions which 
realize complete or partial unification of gauge 
groups at large scales (like string models of grand 
unification or of Pati-Salam type). 

• As already mentioned, important quantities such
as Yukawa couplings are, in principle, computa- 
ble, although quantitative expressions have been 
derived only in a few examples, mostly in toroidal 
compactifications or quotients thereof. The results 
are moduli dependent, making it difficult to 
derive model-independent patterns. 
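
The relation [9] in the first item of this list can be motivated by a rough dimensional-reduction estimate. The sketch below assumes Dp-branes that fill the four noncompact dimensions and wrap a (p-3)-dimensional cycle of volume $V_\parallel$ inside a six-dimensional internal space of total volume $V_6 = V_\parallel V_\perp$. The symbols $V_\parallel$ and $V_6$ are introduced here only for illustration, and all numerical factors are dropped.

```latex
% Closed-string (gravitational) sector: 10D Einstein term reduced on V_6
M_P^2 \;\sim\; \frac{M_s^{8}\, V_6}{g_s^{2}}
% Open-string (gauge) sector: Dp-brane world-volume action reduced on V_\parallel
\frac{1}{g_{YM}^{2}} \;\sim\; \frac{M_s^{\,p-3}\, V_\parallel}{g_s}
% Taking the product, with V_6 = V_\parallel V_\perp
M_P^2\, g_{YM}^{2} \;\sim\; \frac{M_s^{8}\, V_6}{g_s^{2}} \cdot \frac{g_s}{M_s^{\,p-3}\, V_\parallel}
\;=\; \frac{M_s^{\,11-p}\, V_\perp}{g_s}
```

In this estimate the different powers of $g_s$ in the two sectors reflect the different dilaton dependence of gauge and gravitational interactions, and a large $V_\perp$ is what allows a low string scale at fixed $M_P$.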


M-Theory Phenomenology 


Most of the phenomenological models from
M-theory have been constructed using the Hořava-Witten
theory (compactification of M-theory on
$S^1/Z_2$) as starting point. This theory provides a
description of the strong coupling regime of the
$E_8 \times E_8$ heterotic theory, and many of its basic
features are similar to those in the perturbative 
regime. In particular, the techniques used in model
building involve the construction of stable and
holomorphic vector bundles and the computation
of the relevant indexes to obtain the 4D gauge group
and charged matter content. An important difference
is that gauge interactions propagate only over the 
10D boundaries of spacetime, while gravity propa- 
gates over the 11 dimensions. This makes the setup 
share some features of brane-world constructions, 
and, in particular, it allows one to lower the 
fundamental scale of the theory (the 11D Planck
scale) to reconcile it with the traditional unification
scale. 

A different setup for M-theory phenomenology 
involves the compactification of the 11D theory on a 
7-manifold $X_7$ of $G_2$ holonomy, in order to lead to
N = 1 supersymmetry in four dimensions. Although
a fundamental formulation of M-theory is
lacking, duality arguments and indirect evidence
can be used to show that nonabelian gauge
symmetries of the A-D-E classical groups arise if
$X_7$ contains 3-cycles of codimension-4 singularities,
locally of the form $C^2/\Gamma$, with $\Gamma$ an A-D-E Kleinian
subgroup of SU(2). Similarly, it can be shown that
chiral multiplets charged under these gauge symmetries
arise if $X_7$ contains certain codimension-7
singularities. The local geometry of the latter has 
been explicitly described, and can be regarded as lying 
at the intersections of codimension-4 singularities. 

The direct construction of such singular $G_2$
holonomy manifolds is very difficult, and there are
no known topological conditions that guarantee 
existence of such a metric for a fixed topology. 
However, the existence of large classes of such 
models can be indirectly shown by using duality 
arguments. Namely, any type IIA model of intersecting
D6 branes and O6 planes, preserving N = 1
supersymmetry, lifts to an M-theory compactification
on a singular $G_2$ holonomy manifold. In fact,
the local structure of the codimension-4 and -7 
singularities agrees in particular cases with the local 
structure of D6 branes on 3-cycles and D6 brane 
intersections. 


Further Topics 


Some additional topics related to the phenomenology
of string theory, but not covered by the
above model-building description, are discussed in
the following.


Effective Actions 


The construction of effective actions for such classes 
of models has been carried out in general in 
supersymmetric compactifications, using the 
parametrization of the general 4D N = 1 supergravity
action in terms of the Kähler potential for
the moduli and matter fields, the gauge kinetic
functions, and the superpotential. The moduli action
is quite universal, at least for geometric compactifications
and for untwisted moduli in orbifold
compactifications. For instance, the Kähler potential
for the 4D dilaton multiplet S and the modulus T
controlling the size of the internal manifold, in the 
large volume and weak coupling regime, reads 


$K = -\log(S + S^*) - 3 \log(T + T^*)$ [10]


The corresponding expression including matter 
fields is more model dependent, but known within 
each particular class. 
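
As a short worked example, the Kähler metric that follows from the moduli Kähler potential [10] can be computed directly. The notation $K_{S\bar S}$, $K_T$, and $K^{T\bar T}$ for derivatives and the inverse metric is an assumption made here for illustration; it is the standard N = 1 supergravity convention, not notation taken from the text above.

```latex
% Kähler metric components from K = -\log(S + S^*) - 3\log(T + T^*)
K_{S\bar S} = \partial_S \partial_{\bar S} K = \frac{1}{(S + S^*)^{2}},
\qquad
K_{T\bar T} = \partial_T \partial_{\bar T} K = \frac{3}{(T + T^*)^{2}}
% First derivative and inverse metric for T
K_T = -\frac{3}{T + T^*},
\qquad
K^{T\bar T} = \frac{(T + T^*)^{2}}{3}
% The combination entering the scalar potential
K^{T\bar T}\, K_T\, K_{\bar T} = 3
```

The last identity is the familiar no-scale property of the factor $-3\log(T + T^*)$; it holds at the level of this leading-order expression only.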


Moduli Stabilization and Supersymmetry 
Breaking 


Both issues are often related. Although moduli 
stabilization preserving supersymmetry is possible, 
it often occurs that the potential stabilizing moduli 
has its origin in mechanisms related to super- 
symmetry breaking. 

The description of purely string theoretical 
mechanisms to break supersymmetry is difficult, 
and most approaches rely on field-theoretical 
mechanisms in the effective action. One of the better- 
studied mechanisms, mostly in the heterotic string 
setup (but also in type II compactifications), is 
gaugino condensation in a strongly coupled hidden 
sector, interacting with the standard model sector 
via gravitational (or perhaps additional gauge) 
interactions. Although explicit models with such 
hidden sectors and strong dynamics exist, they 
often result in runaway potentials for moduli. 
Racetrack scenarios where several condensates 
balance each other are possible but contrived. 

A second mechanism to break supersymmetry, 
mostly explored in type IIB/F-theory compactifica- 
tions, is the introduction of field-strength fluxes for 
p-form fields. Interestingly, such fluxes lead to 
nontrivial potentials depending on moduli, and 
generically breaking supersymmetry. The existence 
of several remnant flat directions in the leading $\alpha'$, $g_s$
approximation leaves unanswered the question of
possible runaway moduli potentials in those direc- 
tions. However, evidence for nonperturbative con- 
tributions stabilizing the remaining moduli at finite 
distance has been proposed. Preliminary results in the 
analysis of flux stabilized vacua have been obtained 
in simple examples of (still unrealistic) Calabi-Yau 
compactifications with small number of moduli. 

Most explored mechanisms propose supersymmetry 
breaking below the Kaluza-Klein compactification
scale, and, therefore, can be described in the 4D
effective theory. They can be nicely parametrized in 
terms of vacuum expectation values for the dilaton 
and geometric moduli of the compactification. This 
description allows for a computation of the soft 
terms using the expansion of the N = 1 supergravity 
formulas in components. Concrete patterns, such as 
the universality of squark masses, or the complex 
phases of diverse soft terms, can be explored using 
this approach. 

Alternative mechanisms of breaking supersymme- 
try at higher scales, such as the introduction of 
antibranes or nonsupersymmetric compactifications, 
lead to generic difficulties with stability. 

Related to the question of supersymmetry break- 
ing is the question of the cosmological constant. 
Unfortunately, there is no manifest mechanism in 
the string theory that explains the smallness of the 
observed value of this scale. Given that many 
aspects of both quantum gravity in the string theory 
and realistic model building (with proper super- 
symmetry breaking and moduli stabilization) are 
still in progress, it seems prudent to keep an open mind
about this problem and its proposed solutions.


Cosmology 


Although somewhat different from the traditional 
focus of string phenomenology, recent progress in 
observational cosmology has triggered much interest 
in string theory realizations of inflationary models 
(or alternatives such as pre-big bang scenarios). 
Most inflationary models have centered on using 
moduli as the inflaton field, due to their flat 
potentials. A simple setup in type II compactifica- 
tions, known as brane inflation models, uses the 
modulus controlling a brane position as the inflaton 
field, which has a flat enough potential with a 
moderate fine-tuning. Such setups may lead to 
interesting additional features, such as a moderate 
but potentially observable density of cosmic strings 
created in the reheating process. 

On the other hand, many interesting questions in 
string cosmology await further understanding of 
time-dependent backgrounds in the string theory. 


Retrospect 


It is remarkable that the formal framework of
string theory admits tractable solutions with
reasonable resemblance to the structure of the
standard model. In particular, features such
as nonabelian gauge symmetry and chirality, coupled
to gravity, are generic in 4D compactifications. This
is already a success. In addition, much progress has 
been made in the general description of the relevant 
mathematical tools, and physical mechanisms and 
ingredients involved in these vacua, as well as in the 
explicit construction of models with the standard 
model spectrum (or supersymmetric extensions of 
it). Yet, many questions remain open and much 
more work is needed in order to make contact with 
the physics observed in nature. 


See also: Brane Worlds; Compactification of Superstring 
Theory; Cosmology: Mathematical Aspects; Superstring 
Theories. 

String topology is a new field of study involving the
geometric and algebraic topology of spaces of loops 
and paths in manifolds. The subject was initiated in 
the important work of Chas and Sullivan (1999) 
who uncovered previously unknown algebraic struc- 
ture in the homology and equivariant homology of 
loop spaces. While the structure is purely topologi- 
cal, it was motivated by formalisms in quantum field 
theory and string theory. Since that time this subject 
has attracted the attention of many mathematicians, 
but one of the main lines of research continues to be 
motivated by the attempt to understand the relation 
between this structure (and its generalizations) with 
topological and conformal field theories. 

In order to describe some of the recent advances in 
this field, we begin with some notation. Throughout 
this article $M^n$ will denote a closed, n-dimensional,
oriented manifold. LM will denote the free loop space,


$LM = \mathrm{Map}(S^1, M)$


For $D_1, D_2 \subset M$ closed submanifolds, $P_M(D_1, D_2)$
will denote the space of paths in M that start at $D_1$
and end at $D_2$,


$P_M(D_1, D_2) = \{\gamma : [0,1] \to M,\ \gamma(0) \in D_1,\ \gamma(1) \in D_2\}$


The paths and loops we consider will always be 
assumed to be piecewise smooth. Such spaces of paths 
and loops are well known to be infinite-dimensional 
manifolds, and roughly speaking, string topology is the 
study of the intersection theory in these manifolds. 

Recall that for closed, oriented manifolds, there is 
an intersection pairing, 


$H_r(M) \times H_s(M) \to H_{r+s-n}(M)$


which is defined to be Poincaré dual to the cup
product,


$H^{n-r}(M) \times H^{n-s}(M) \xrightarrow{\ \cup\ } H^{2n-r-s}(M)$


The geometric significance of this pairing is that if
the homology classes are represented by submanifolds
$P^r$ and $Q^s$ with transverse intersection, then
the image of the intersection pairing is represented
by the geometric intersection, $P \cap Q$.
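
The degree count r + s - n can be checked in a toy case. The sketch below is an illustration under stated assumptions, not part of the construction above: it takes M to be the n-torus $T^n$ and represents homology classes by coordinate subtori, each described by the set of coordinate directions it spans. The function names `intersect_subtori` and `expected_degree` are hypothetical, and orientations (signs) are ignored.

```python
from typing import FrozenSet, Optional


def expected_degree(r: int, s: int, n: int) -> int:
    """Degree of the intersection product of classes of degree r and s in an n-manifold."""
    return r + s - n


def intersect_subtori(
    n: int, a: FrozenSet[int], b: FrozenSet[int]
) -> Optional[FrozenSet[int]]:
    """Intersect two coordinate subtori of T^n.

    A subtorus is given by the set of coordinate directions it spans
    (all other coordinates are fixed to 0). Returns the directions spanned
    by the intersection when the two subtori meet transversally, and None
    when they do not (their tangent directions fail to span all of T^n).
    """
    all_dirs = frozenset(range(n))
    if not (a <= all_dirs and b <= all_dirs):
        raise ValueError("directions must lie in range(n)")
    if a | b != all_dirs:
        return None  # not transverse: some direction is missed by both
    return a & b


# Two 2-tori inside T^3 meeting transversally along a circle.
P = frozenset({0, 1})
Q = frozenset({1, 2})
PQ = intersect_subtori(3, P, Q)
print(sorted(PQ), len(PQ) == expected_degree(len(P), len(Q), 3))
```

In the transverse case the dimension of the intersection is |A| + |B| - |A ∪ B| = r + s - n, which is the degree shift appearing in the pairing.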

The remarkable result of Chas and Sullivan says 
that even without Poincaré duality, there is an 
intersection type product 


$\mu : H_p(LM) \times H_q(LM) \to H_{p+q-n}(LM)$


that is compatible with both the intersection product
on $H_*(M)$ via the map $ev : LM \to M$ ($\gamma \mapsto \gamma(0)$), and
with the Pontrjagin product in $H_*(\Omega M)$.

The construction of this pairing involves consid- 
eration of the diagram, 


$LM \xleftarrow{\ \gamma\ } \mathrm{Map}(8, M) \xrightarrow{\ e\ } LM \times LM$ (1)


Here Map(8, M) is the mapping space from the
figure 8 to M, which can be viewed as the subspace
of $LM \times LM$ consisting of those pairs of loops that
agree at the basepoint. $\gamma : \mathrm{Map}(8, M) \to LM$ is the
map on mapping spaces induced by the pinch map
$S^1 \to S^1 \vee S^1$.

Chas and Sullivan constructed this pairing by 
studying intersections of chains in loop spaces. 
A more homotopy-theoretic viewpoint was taken 
by Cohen and Jones (2002) who viewed
$e : \mathrm{Map}(8, M) \to LM \times LM$ as an embedding, and showed
there is a tubular neighborhood homeomorphic to a
normal bundle given by the pullback bundle, $ev^*(TM)$,
where $ev : LM \to M$ is the evaluation map mentioned
above. They then constructed a Pontrjagin-Thom
collapse map whose target is the Thom space of the
normal bundle, $\tau_e : LM \times LM \to \mathrm{Map}(8, M)^{ev^*(TM)}$.
Computing $\tau_e$ in homology and applying the Thom
isomorphism defines an “umkehr map,”


$e_! : H_*(LM \times LM) \to H_{*-n}(\mathrm{Map}(8, M))$


The Chas-Sullivan loop product is defined to be the 
composition 


$\mu_* = \gamma_* \circ e_! : H_*(LM \times LM) \to H_{*-n}(\mathrm{Map}(8, M)) \to H_{*-n}(LM)$


Notice that the umkehr map $e_!$ can be defined for a
generalized homology theory $h_*$ whenever one has a
Thom isomorphism of the tangent bundle, TM,
which is to say a generalized homology theory $h_*$ for
which the representing spectrum is a ring spectrum,
and which supports an orientation of M.

By twisting the Pontrjagin-Thom construction by
the virtual bundle $-TM$, one obtains a map of
spectra,


$\tau_e : LM^{-TM} \wedge LM^{-TM} \to \mathrm{Map}(8, M)^{ev^*(-TM)}$


where $LM^{-TM}$ is the Thom spectrum of the pullback
of the virtual bundle $ev^*(-TM)$. Now we can
compose, to obtain a multiplication,


$LM^{-TM} \wedge LM^{-TM} \xrightarrow{\ \tau_e\ } \mathrm{Map}(8, M)^{ev^*(-TM)} \xrightarrow{\ \gamma\ } LM^{-TM}$


The following was proved by Cohen and Jones 
(2002). 

Theorem 1 Let M be a closed manifold, then
$LM^{-TM}$ is a ring spectrum. If M is orientable the ring
structure on $LM^{-TM}$ induces the Chas-Sullivan loop
product on $H_*(LM)$ by applying homology and the
Thom isomorphism.


The ring structure on the spectrum $LM^{-TM}$ was
also observed by Dwyer and Miller using different 
methods. 

Cohen and Godin (2004) generalized the loop 
product in the following way. Observe that the 
figure 8 is homotopy equivalent to the pair of pants 
surface P, which we think of as a genus 0 cobordism 
between two circles and one circle. 

Furthermore, Figure 1 is homotopic to the 
diagram of mapping spaces, 


$LM \xleftarrow{\ \rho_{out}\ } \mathrm{Map}(P, M) \xrightarrow{\ \rho_{in}\ } (LM)^2$


where $\rho_{in}$ and $\rho_{out}$ are restriction maps to the
“incoming” and “outgoing” boundary components
of the surface P. So the loop product can be viewed
as a composition,


$\mu_* = \mu_P = (\rho_{out})_* \circ (\rho_{in})_! : (H_*(LM))^{\otimes 2} \to H_*(\mathrm{Map}(P, M)) \to H_*(LM)$


where using the figure 8 to replace the surface P can
be viewed as a technical device that allows one to
define the umkehr map $(\rho_{in})_!$.

In general if one considers a surface of genus g,
viewed as a cobordism from p incoming circles to q
outgoing circles, $\Sigma_{g,p+q}$, one gets a similar diagram
(Figure 2)


$(LM)^q \xleftarrow{\ \rho_{out}\ } \mathrm{Map}(\Sigma_{g,p+q}, M) \xrightarrow{\ \rho_{in}\ } (LM)^p$


Figure 1 Pair of pants P.


Figure 2 $\Sigma_{g,p+q}$ (figure labels: p circles, q circles).


Cohen and Godin (2004) used the theory of “fat” or 
“ribbon” graphs to represent surfaces as developed 
by Harer (1985), Penner (1987), and Strebel (1984), 
in order to define Pontrjagin-Thom maps, 


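$\tau_{\Sigma_{g,p+q}} : (LM)^p \to \mathrm{Map}(\Sigma_{g,p+q}, M)^{\nu(\Sigma_{g,p+q})}$


where $\nu(\Sigma_{g,p+q})$ is the appropriately defined normal
bundle of $\rho_{in}$. By applying (perhaps generalized)
homology and the Thom isomorphism, they defined
the umkehr map,


$(\rho_{in})_! : H_*((LM)^p) \to H_{*+\chi(\Sigma_{g,p+q}) \cdot n}(\mathrm{Map}(\Sigma_{g,p+q}, M))$


where $\chi(\Sigma_{g,p+q}) = 2 - 2g - p - q$ is the Euler characteristic.
Cohen and Godin then defined the string
topology operation to be the composition,


$\mu_{\Sigma_{g,p+q}} = (\rho_{out})_* \circ (\rho_{in})_! : H_*((LM)^p) \to H_{*+\chi(\Sigma_{g,p+q}) \cdot n}(\mathrm{Map}(\Sigma_{g,p+q}, M)) \to H_{*+\chi(\Sigma_{g,p+q}) \cdot n}((LM)^q)$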


They proved that these operations respect gluing of 
surfaces, 


$\mu_{\Sigma_1 \# \Sigma_2} = \mu_{\Sigma_2} \circ \mu_{\Sigma_1}$


where $\Sigma_1 \# \Sigma_2$ is the glued surface as shown in
Figure 3.
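
The gluing property is consistent with the degree shifts, because the Euler characteristic 2 - 2g - p - q is additive when surfaces are glued along circles. The following sketch checks this bookkeeping. It assumes connected surfaces glued along all q outgoing circles of the first surface, in which case the glued surface has genus g1 + g2 + q - 1; the class `Cobordism` and the function `glue` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cobordism:
    """A connected surface of given genus with incoming and outgoing boundary circles."""

    genus: int
    incoming: int
    outgoing: int

    def euler_characteristic(self) -> int:
        # chi = 2 - 2g - (number of boundary circles)
        return 2 - 2 * self.genus - self.incoming - self.outgoing

    def degree_shift(self, n: int) -> int:
        # The string topology operation shifts homological degree by chi * n.
        return self.euler_characteristic() * n


def glue(first: Cobordism, second: Cobordism) -> Cobordism:
    """Glue all outgoing circles of `first` to the incoming circles of `second`."""
    if first.outgoing != second.incoming:
        raise ValueError("outgoing and incoming circle counts must match")
    q = first.outgoing
    if q < 1:
        raise ValueError("need at least one circle to glue along")
    # Gluing two connected surfaces along q circles creates q - 1 new handles.
    return Cobordism(first.genus + second.genus + q - 1, first.incoming, second.outgoing)


pants = Cobordism(genus=0, incoming=2, outgoing=1)    # the loop product surface
copants = Cobordism(genus=0, incoming=1, outgoing=2)  # one circle in, two out
n = 3  # dimension of M, chosen arbitrarily for the check

glued = glue(copants, pants)  # one circle in, one circle out, genus 1
print(glued, glued.degree_shift(n), copants.degree_shift(n) + pants.degree_shift(n))
```

For the pair of pants the shift is -n, matching the degree of the loop product; for the glued surface the shift equals the sum of the two individual shifts, as composition of operations requires.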

The coherence of these operations is summarized 
in the following theorem. 


Theorem 2 (Cohen and Godin 2004). Let $h_*$ be
any multiplicative generalized homology theory that
supports an orientation of M. Then the assignment


$\Sigma_{g,p+q} \mapsto \mu_{\Sigma_{g,p+q}} : h_*((LM)^p) \to h_*((LM)^q)$


is a positive boundary topological quantum field 
theory. “Positive boundary” refers to the fact that 
the number of outgoing boundary components, q, 
must be positive. 


A theory with open strings was initiated
by Sullivan (2004) and developed further by
A Ramirez (2005) and by Harrelson (2004). In this
setting one has a collection of submanifolds, $D_i \subset M$,
referred to as “D-branes.” This theory studies
intersections in the path spaces $P_M(D_i, D_j)$.


Figure 3 $\Sigma_1 \# \Sigma_2$ (figure labels: p circles, q circles, r circles).

A theory with D-branes involves “open—closed 
cobordisms” which are cobordisms between com- 
pact one-dimensional manifolds whose boundary is 
partitioned into three parts: 


1. Incoming circles and intervals. 

2. Outgoing circles and intervals. 

3. The rest is the “free boundary” which is itself a 
cobordism between the boundary of the incom- 
ing and boundary of the outgoing intervals. Each 
connected component of the “free boundary” is 
labeled by a D-brane (see Figure 4). 


In a topological field theory with D-branes,
one associates to each boundary circle a vector
space $V_{S^1}$ (in our case $V_{S^1} = H_*(LM)$) and to an
interval whose endpoints are labeled by $D_i$, $D_j$, one
associates a vector space $V_{D_i, D_j}$
(in our case $V_{D_i, D_j} = H_*(P_M(D_i, D_j))$).

To an open-closed cobordism as above, one 
associates an operation from the tensor product of 
these vector spaces corresponding to the incoming 
boundaries to the tensor product of the vector 
spaces corresponding to the outgoing boundaries. 
Of course, these operations have to respect the 
relevant gluing of open—closed cobordisms. 

By developing a theory of fat graphs that encode 
the open-closed boundary data, Ramirez was able 
to prove that there are string topology operations 
that form a positive boundary, topological quantum 
field theory with D-branes (Ramirez 2005). 

We end these notes by a discussion of three 
applications of string topology to classifying spaces 
of groups. 


Figure 4 Open-closed cobordism.


Example 1 Application to Poincaré duality groups
(Abbaspour et al. to appear). For G any discrete
group, one has that the loop space of the classifying
space satisfies


$LBG \simeq \coprod_{[g]} BC_g$


where [g] is the conjugacy class determined by
$g \in G$, and $C_g < G$ is the centralizer of g.
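
For a finite group the decomposition of LBG into components indexed by conjugacy classes can be enumerated by brute force. The sketch below does this for the symmetric group on three letters, chosen here purely as an illustration (it is a finite group, not a Poincaré duality group of positive dimension). Permutations are stored as tuples, and the helper names `compose`, `inverse`, and `loop_space_components` are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Perm = Tuple[int, ...]


def compose(a: Perm, b: Perm) -> Perm:
    """Composition a∘b of permutations, i.e. (a∘b)(i) = a[b[i]]."""
    return tuple(a[b[i]] for i in range(len(a)))


def inverse(a: Perm) -> Perm:
    inv = [0] * len(a)
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)


def loop_space_components(group: Sequence[Perm]) -> List[Tuple[Perm, List[Perm]]]:
    """One (representative g, centralizer C_g) pair per conjugacy class [g].

    Each pair corresponds to one component BC_g in the decomposition of LBG.
    """
    remaining = set(group)
    components = []
    for g in sorted(group):
        if g not in remaining:
            continue
        conj_class = {compose(compose(h, g), inverse(h)) for h in group}
        centralizer = [h for h in group if compose(h, g) == compose(g, h)]
        components.append((g, centralizer))
        remaining -= conj_class
    return components


S3 = list(permutations(range(3)))
components = loop_space_components(S3)
centralizer_orders = sorted(len(c) for _, c in components)
print(centralizer_orders)
```

The three components correspond to the identity, the transpositions, and the 3-cycles. The class equation, sum over classes of |G| / |C_g| = |G|, provides a quick consistency check on the enumeration.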

When BG is represented by a closed manifold, or 
more generally, when G is a Poincaré duality group, 
the Chas-Sullivan loop product then defines pairings 
among the homologies of the centralizer subgroups. 
Abbaspour et al. describe this loop product entirely 
in terms of group homology, thus giving structure 
to the homology of Poincaré-duality groups that 
previously had not been known. 


Example 2 Applications to 3-manifolds
(Abbaspour 2005). Let $\iota : H_*(M) \to H_*(LM)$ be
induced by inclusion of constant loops. This is a
split injection of rings. Write $H_*(LM) = H_*(M) \oplus A_M$.
We say $H_*(LM)$ has nontrivial extended loop
products if the composition


$A_M \otimes A_M \to H_*(LM) \otimes H_*(LM) \to H_*(LM)$


is nontrivial. 

Let M be a closed, irreducible 3-manifold. In a 
remarkable piece of work, Abbaspour showed the 
relationship between having a trivial extended loop 
product and M being “algebraically hyperbolic.” 
This means that M is a $K(\pi, 1)$ and its fundamental
group has no rank-2 abelian subgroup. (If the
geometrization conjecture is true, this is equivalent to M
admitting a complete hyperbolic metric.)


Example 3 The string topology of classifying
spaces of compact Lie groups (Gruher (to appear)
and Gruher and Salvatore (to appear)). The goal
of Gruher’s work is to construct string topological
invariants of $LBG \simeq EG \times_G G$, where G acts on
itself via conjugation. Ultimately, one would like to
understand the relationship between this structure
and the work of Freed (2003) on twisted equivariant
K-theory, $K_G(G)$, and the Verlinde algebra.


The first observation in this program was to 
notice that the key ingredient in the forming of the 
Chas-Sullivan loop product is that the fibration 
$ev : LM \to M$ is a fiberwise monoid over a closed
oriented manifold. The fiber is $\Omega M$, which has the
usual Pontrjagin product.

The following was proved by Gruher and
Salvatore:


Lemma 3 Let $G \to E \to M$ be a fiberwise monoid
over a closed manifold M. Then $E^{-TM}$ is a ring
spectrum.

The following construction gives a large supply of
examples of such fiberwise monoids over manifolds.

Let $G \to P \to M$ be a principal G-bundle over a
closed manifold M. We can construct the corresponding
adjoint bundle,


$\mathrm{Ad}(P) = P \times_G G \to M$


It is an easy observation that $G \to \mathrm{Ad}(P) \to M$ is a
fiberwise monoid.


Theorem 4 $\mathrm{Ad}(P)^{-TM}$ is a ring spectrum. This ring
structure is natural with respect to maps of principal
G-bundles.


Let BG be the classifying space of a compact Lie
group G. It is possible to construct a filtration of BG,


$M_1 \hookrightarrow M_2 \hookrightarrow \cdots \hookrightarrow M_i \hookrightarrow M_{i+1} \hookrightarrow \cdots \hookrightarrow BG$


where the $M_i$’s are compact, closed manifolds. An
example of this is filtering BU(n) by Grassmannians.

Let $G \to P_i \to M_i$ be the restriction of $EG \to BG$.
By the above theorem one obtains an inverse system
of ring spectra


$\cdots \leftarrow \mathrm{Ad}(P_i)^{-TM_i} \leftarrow \mathrm{Ad}(P_{i+1})^{-TM_{i+1}} \leftarrow \cdots$

Theorem 5 The homotopy type of this pro-ring- 
spectrum is a well-defined invariant of BG. It is 
referred to as the “string topology of BG.” 


Potential Application: Twisted K-theory 
and the Verlinde Algebra 


Let G be a connected, compact Lie group. Using the 
observation that the loop space of a classifying space 
is the classifying space of the loop group, 
$L(BG) \simeq B(LG)$, the string topology gives new
structure on the classifying space of these loop 
groups. In particular, one has new structure on the 
K-theory of these classifying spaces. Now classical 
results of Atiyah and Segal suggest that K-theory of 
classifying spaces should be related to the representa- 
tion theory of the group. In this case, the representa- 
tion theory of loop groups has been widely studied 
and is very important in conformal field theory. 
Understanding the precise relationship between the 
string topology of the classifying space and 
this representation theory is an interesting area of 
current research. To motivate this, first recall that the 
loop space, LBG, has a well-known description as 


$LBG \simeq EG \times_G G$


where the right-hand side refers to the homotopy 
orbit space of the conjugation (or adjoint) action of 
G on itself. Thus, the homology $H_*(LBG)$ is the
equivariant homology $H_*^G(G)$. Similarly, the
K-theory $K^*(LBG)$ maps to the equivariant K-theory,
$K_G(G)$. Now in recent work of Freed (2003) twisted
equivariant K-homology, $K_G(G)$, was shown to be
isomorphic to the Verlinde algebra. This algebra is a 
space of representations of the loop group, LG. The 
multiplication in this algebra is the “fusion product,” 
coming from conformal field theory. One topic of 
current research is to understand the relationship 
between multiplicative structure coming from the 
string topology of BG, and this fusion product in the 
Verlinde algebra. More generally, the goal is to bring 
to bear the considerable calculational techniques of 
algebraic topology that are available in string 
topology, to understand the recently uncovered field 
theoretic structure of twisted K-theory (Freed 2003), 
and its applications to string theory. 

Introduction


Superfluidity has been known to exist since the 
1930s. This widespread phenomenon occurs in 
many-particle Bose and Fermi systems as different 
as liquid ⁴He, liquid ³He, atomic gases like Rb and
Li, atomic nuclei, pulsars and last, but not least, in 
metals, where the itinerant electrons may become 
superfluid. This article is devoted to a unifying 
theoretical description of Bose and Fermi super- 
fluidity. The mechanisms leading to superfluidity 
include Bose-Einstein condensation (BEC) and 
Bardeen, Cooper, and Schrieffer (BCS)-Leggett
pairing correlations. We hope to be able to
demonstrate why this fascinating phenomenon is,
even roughly 80 years after its experimental discovery
and its first theoretical explanation, still a
subject of intensive research.

The phenomenon of superfluidity is closely 
connected with the apparent lack of any measurable 
flow resistance, which scales with the shear viscosity 
of the fluid. Its complete absence implies that
the system flows without friction, that is, with zero viscosity.
The observation of superfluidity is usually precluded
by the solidification of most liquids as the temperature
is lowered. Only systems with particularly
light atoms (like the helium isotopes ³He and ⁴He)
stay liquid down to the lowest temperatures.
These systems are referred to as “quantum liquids,”
since their liquid state is caused by the quantum-mechanical
zero-point motion of the atoms. It
should be noted that the helium isotopes
belong to two different kinds of
particles which can be distinguished by their
statistics: ⁴He is a spin-0 boson and ³He a spin-1/2
fermion.

In 1924, Satyendra Nath Bose and Albert Einstein 
proposed that below a characteristic degeneracy 
temperature Tp, a macroscopic number of bosons 
can condense into the state of lowest energy eg = 0. 
In the 1930s, Fritz London and Heinz London 
showed that this so-called Bose-Einstein condensate 
can be described by a macroscopic quantum- 
mechanical wave function like the one for a single 
elementary particle, but with the probability density 
replaced by the density of the condensed particles. 
By the end of the 1930s, the experimental results of
Allen, Kamerlingh-Onnes, Keesom, Kapitza,
Misener, Wolfke, and others accumulated the
evidence that liquid ⁴He undergoes a second-order
phase transition at $T_\lambda = 2.17$ K to a state referred to
as a superfluid, since the liquid could flow without
any sign of a flow resistance. This superfluid state
was interpreted in terms of Bose condensation of the
⁴He atoms in the liquid (London 1938).

In Figure 1 the P-T phase diagram of liquid ⁴He is
shown with a normal liquid phase, a solid phase and
the superfluid phase below the λ-line at about 2 K.

Fermions cannot condense in a way similar to the 
BEC, due to the Pauli exclusion principle. In 1957 
Bardeen, Cooper, and Schrieffer came up with their 
ingenious proposal that the superfluidity of the 
electron system (usually referred to as superconduc- 
tivity) comes about through the formation of 
fermion pairs (quasibosons) in k-space in a spin- 
singlet state. In 1971, several superfluid phases of 
liquid ³He at a few mK were discovered by Lee,
Osheroff, and Richardson at Cornell University.
Experimental aspects connected with the spin
degrees of freedom of the quantum liquid gave
strong evidence for Cooper pairing of the ³He atoms
in a spin-triplet state. In Figure 2 the zero-field P-T
phase diagram of liquid ³He is shown with a normal
(Fermi) liquid phase, a solid phase and the superfluid
A and B phases.

Immediately after this discovery, Anthony
J. Leggett applied the BCS ideas to liquid ³He and
introduced a generalized scheme that allowed for
triplet-pairing correlations. His theory turned out to
describe a large variety of experimental results
accurately. A new and exciting development set in
when Bose-Einstein condensates were observed for
the first time in dilute gases of alkali atoms in 1995
by Cornell, Wieman et al. (Rb), Ketterle et al.
(Na), and Hulet et al. (Li).

Figure 1 The phase diagram of liquid ⁴He. Courtesy of Erkki
Thuneberg.

Figure 2 The phase diagram of liquid ³He. Courtesy of Erkki
Thuneberg.


Boson and Fermion Degeneracy 


In what follows, the energy dispersion of Bose and
Fermi systems is denoted as ε_k (free bosons/fermions
would be represented by ε_k = ħ²k²/2m). A large
number of bosons can occupy the Bose quantum states
|k⟩; the average occupation is dictated by the Bose-
Einstein distribution

    n_k = \frac{1}{e^{(\epsilon_k - \mu)/k_B T} - 1}    [1]


For Bose systems, the chemical potential is negative,
μ = −k_B T α, and α is fixed by the condition

    n = \frac{1}{V} {\sum_k}' n_k = \frac{1}{\lambda_T^3} B_{3/2}(\alpha)    [2]

where the prime indicates the summation over
excited states |k| > 0. In [2], λ_T = h/\sqrt{2π m k_B T}
denotes the thermal de Broglie wavelength, which
provides a criterion for the importance of quantum
effects or degeneracy through nλ_T³ ≥ O(1). The Bose
integrals B_σ(α) originate from the conversion of the
momentum sum into an energy integral and read, for
parabolic dispersion,

    B_\sigma(\alpha) = \frac{1}{\Gamma(\sigma)} \int_0^\infty \frac{dy\, y^{\sigma-1}}{e^{y+\alpha} - 1} = \sum_{\nu=1}^{\infty} \frac{e^{-\nu\alpha}}{\nu^\sigma}    [3]

with B_σ(0) = ζ(σ), Γ the Euler Γ-function and ζ denoting
the Riemann ζ-function. It is important to understand
that in order to have a constant total density
n, B_{3/2}(α) has to increase ∝ T^{−3/2} in the same way as
λ_T³. This is, however, impossible at all temperatures
since the chemical potential of the Bose gas vanishes
(α → 0) at a finite temperature T_B given by

    T_B = \frac{2\pi\hbar^2}{m k_B} \left[ \frac{n}{\zeta(3/2)} \right]^{2/3}    [4]

for which nλ_T³ = B_{3/2}(0) = ζ(3/2) = 2.612....
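As a hedged numerical illustration of eqns [3] and [4], the sketch below evaluates the Bose integral from its series form and estimates T_B for an ideal gas with the mass and density of liquid ⁴He. The density of about 145 kg/m³, the function names and the truncation scheme are assumptions made for this example; they are not taken from the text. The estimate comes out near 3 K, the same order as the observed T_λ = 2.17 K, although the real liquid is strongly interacting.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J/K


def bose_integral(sigma, alpha, terms=2000):
    """Series form of eq. [3]: B_sigma(alpha) = sum_{nu>=1} exp(-nu*alpha) / nu**sigma.

    Intended for alpha = 0 (with sigma > 1) or for alpha large enough that
    exp(-terms * alpha) is negligible.
    """
    total = sum(math.exp(-nu * alpha) / nu**sigma for nu in range(1, terms + 1))
    if alpha == 0.0:
        # Euler-Maclaurin estimate of the slowly converging tail sum_{nu > terms} nu**(-sigma)
        n = float(terms)
        total += (n ** (1.0 - sigma) / (sigma - 1.0)
                  - 0.5 * n ** (-sigma)
                  + sigma * n ** (-sigma - 1.0) / 12.0)
    return total


def bose_temperature(density, mass):
    """Eq. [4]: T_B = (2 pi hbar^2 / (m k_B)) * (n / zeta(3/2))**(2/3)."""
    zeta = bose_integral(1.5, 0.0)
    return 2.0 * math.pi * HBAR**2 / (mass * K_B) * (density / zeta) ** (2.0 / 3.0)


# Assumed inputs (not from the text): 4He atomic mass and liquid mass density.
M_HE4 = 6.6464731e-27   # kg
RHO_LIQUID = 145.0      # kg/m^3
zeta_3_2 = bose_integral(1.5, 0.0)
t_b = bose_temperature(RHO_LIQUID / M_HE4, M_HE4)
print(f"zeta(3/2) = {zeta_3_2:.4f}, ideal-gas T_B = {t_b:.2f} K")
```

The tail correction is only needed at α = 0, where the series converges like 1/√ν; for α > 0 the terms decay exponentially.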


In sharp contrast, fermions obey the Pauli exclu-
sion principle, which states that only one fermion
can occupy a quantum state |k,σ⟩ specified in
addition by the spin projection σ. The average
statistical occupation is given by the Fermi-Dirac
distribution

    f_k = \frac{1}{e^{(\epsilon_k - \mu)/k_B T} + 1}    [5]


Figure 3 shows a comparison of Bose-Einstein
and Fermi-Dirac momentum distributions n_k plotted
vs. ε_k. The chemical potential is shown for fermions
only; μ_F = k_B T α is always positive and the total
density can be expressed as

    n = \frac{1}{V} \sum_{k\sigma} f_k = \frac{2}{\lambda_T^3} F_{3/2}(\alpha)    [6]

where the factor of 2 originates from the spin
degeneracy. For parabolic dispersion, the Fermi
integral reads

    F_\sigma(\alpha) = \frac{1}{\Gamma(\sigma)} \int_0^\infty \frac{dy\, y^{\sigma-1}}{e^{y-\alpha} + 1} \;\xrightarrow{T \to 0}\; \frac{(\mu/k_B T)^\sigma}{\Gamma(\sigma+1)}    [7]

One recognizes that the degeneracy condition
nλ_T³ ≥ O(1) corresponds to the limit T < T_F =
μ(0)/k_B, which is connected with the formation of
a “Fermi sea,” with μ(0) = E_F the Fermi energy:

    \mu(0) = \frac{\hbar^2}{2m} \left( 3\pi^2 n \right)^{2/3} = E_F    [8]
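To make the Fermi degeneracy scale concrete, the following sketch evaluates eqn [8] for an ideal gas of spin-1/2 fermions with the bare ³He mass and an assumed liquid density of about 82 kg/m³ (an illustrative value, not from the text), and checks that nλ_T³ is of order one at T = T_F. Interactions in the real liquid are ignored here, so the number is only an order-of-magnitude guide.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J/K


def fermi_energy(density, mass):
    """Eq. [8]: E_F = hbar^2 (3 pi^2 n)^(2/3) / (2 m) for spin-1/2 fermions."""
    return HBAR**2 * (3.0 * math.pi**2 * density) ** (2.0 / 3.0) / (2.0 * mass)


def thermal_wavelength(temperature, mass):
    """lambda_T = h / sqrt(2 pi m k_B T), with h = 2 pi hbar."""
    return 2.0 * math.pi * HBAR / math.sqrt(2.0 * math.pi * mass * K_B * temperature)


# Assumed inputs (not from the text): bare 3He atomic mass and liquid mass density.
M_HE3 = 5.0082e-27   # kg
RHO_LIQUID = 82.0    # kg/m^3
n = RHO_LIQUID / M_HE3
t_f = fermi_energy(n, M_HE3) / K_B
degeneracy = n * thermal_wavelength(t_f, M_HE3) ** 3
print(f"ideal-gas T_F = {t_f:.2f} K, n * lambda_T^3 at T_F = {degeneracy:.3f}")
```

Analytically, nλ_T³ at T = T_F equals 8/(3√π) for any density, which is the O(1) criterion used in the text.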


To summarize, quantum behavior in Bose and
Fermi systems sets in below the degeneracy tempera-
ture T*, defined through nλ_T³ = O(1). For bosons,
T* = T_B is the temperature at which the chemical
potential vanishes, whereas for fermions T* = T_F is
the Fermi temperature.

Figure 3 The Fermi and Bose momentum distribution.


London Quantum Hydrodynamics 


For a general treatment of the quantum-mechanical
origin of the equations describing Bose and Fermi
superfluidity, it is convenient to introduce a para-
meter ν which describes single bosons (ν = 1) or
fermion pairs (ν = 2) of mass M = νm. The basic
assumption (London 1938) is that the laws of
quantum mechanics are applicable also to a macro-
scopic number of single (ν = 1) or composite (ν = 2)
particles of density ρ^s/νm, the so-called condensate,
which is represented by a macroscopic wave func-
tion ψ(r,t). ψ has the property

    \psi(r,t)\, \psi^*(r,t) = \frac{\rho^s(r,t)}{\nu m}

The dynamics of the condensate is governed by
the Schrödinger equation

    \frac{i\hbar}{\nu} \frac{\partial \psi}{\partial t} = \left( -\frac{\hbar^2 \nabla^2}{2\nu^2 m} + \mu \right) \psi    [9]

in which μ represents the condensate's chemical
potential. After performing a Madelung transforma-
tion (Madelung 1926),

    \psi(r,t) = a(r,t)\, e^{i\varphi(r,t)}, \qquad a^2 = \frac{\rho^s}{\nu m}

one arrives at two coupled hydrodynamic equations,
the first of which reads

    \frac{\partial \rho^s}{\partial t} + \nabla \cdot j^s_m = 0, \qquad j^s_m = \rho^s v^s, \qquad v^s = \frac{\hbar}{\nu m} \nabla\varphi    [10]

Equation [10] can be interpreted as a continuity
equation, which represents the conservation law for
the condensate mass density ρ^s. The second equation,

    -\frac{\hbar}{\nu} \frac{\partial \varphi}{\partial t} = \frac{1}{2} m (v^s)^2 + \mu - \frac{\hbar^2}{2\nu^2 m} \frac{\nabla^2 a}{a}    [11]

assumes the form of the Hamilton-Jacobi equation
for the action field ħφ of classical mechanics, if the
quasiclassical limit (terms ∝ O(ħ²∇²a) → 0) is taken.

From [10] and [11] a condensate acceleration
equation can be derived, which resembles the Euler
equation of classical hydrodynamics (μ = μ_0 + δμ):





    \frac{\partial v^s}{\partial t} + (v^s \cdot \nabla) v^s = -\frac{1}{m} \nabla \delta\mu    [12]

The physical nature of the driving force becomes
evident after applying the Gibbs-Duhem relation
nδμ = δP − σδT, with σ the entropy density. Finally,
the acceleration of the mass supercurrent j^s_m is of the form

    \frac{\partial j^s_m}{\partial t} = -\frac{\rho^s}{\rho} \nabla \left( \delta P - \sigma\, \delta T \right)    [13]

It turns out that the London equations [10] and
[13], in which ρ^s is an unknown phenomenological
parameter, explain many experimental observations
such as persistent currents, U-tube oscillations,
thermomechanical (e.g., fountain) effects, beaker
flow phenomena, and many others.


Bose-Einstein Condensation (BEC) 


In order to understand the macroscopic quantum
state in the case of Bose systems, we consider first the
simple case of a Bose gas. Let us decompose the
energy eigenstates ε_k into those with ε_k = ε_0 = 0
(condensate) and average occupation number

    n_0 = \frac{N_0}{V} = \frac{1}{V} \frac{1}{e^{\alpha} - 1}    [14]

and those with ε_k > 0 (excited states) and average
occupation number

    n_{ex} = \frac{N_{ex}}{V} = \frac{1}{V} {\sum_k}' n_k = n\, \frac{B_{3/2}(\alpha)}{B_{3/2}(0)} \left( \frac{T}{T_B} \right)^{3/2}    [15]

with the total density n = n_ex + n_0. The consequence of
the chemical potential vanishing at T_B clearly is a mac-
roscopic occupation of the ground state of the Bose gas:

    N_0 = \frac{1}{e^{\alpha} - 1} = \frac{1}{1 + \alpha + \cdots - 1} \;\xrightarrow{\alpha \to 0}\; \frac{1}{\alpha}    [16]

This phenomenon is referred to as BEC. Below
T_B, α = 0 and from [15] we see that

    n_{ex} \overset{\text{free bosons}}{=} n \left( \frac{T}{T_B} \right)^{3/2}, \qquad T < T_B    [17]

The average occupation of the ground state is given by

    n_0(T) = n - n_{ex}(T), \qquad T < T_B    [18]
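Equations [17] and [18] combine into the condensate fraction n_0/n = 1 − (T/T_B)^{3/2} of the free Bose gas. A minimal sketch follows; the function name and the clamping to zero above T_B are choices made for this illustration.

```python
def condensate_fraction(temperature, t_bose):
    """Free Bose gas: n_0 / n = 1 - (T / T_B)**1.5 below T_B (eqs [17], [18]), zero above."""
    if temperature < 0.0 or t_bose <= 0.0:
        raise ValueError("temperatures must satisfy T >= 0 and T_B > 0")
    return max(0.0, 1.0 - (temperature / t_bose) ** 1.5)


# Reduced temperatures T / T_B from 0 to 1.2 in steps of 0.2
fractions = [condensate_fraction(0.2 * i, 1.0) for i in range(7)]
print([round(f, 3) for f in fractions])
```

The fraction is complete at T = 0 and vanishes continuously at T_B, which is the origin of the cusp rather than a jump in the specific heat of the ideal Bose gas.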


It is important to understand that the number
density of condensed particles n_0 has nothing to
do with the current response function ρ^s (eqn [10]).
A derivation of ρ^s will be given in the section “Local
response of condensates and excitation gases.”

Let us now discuss the structure of the excitation
spectrum, which will turn out to be crucial for the
observability of superfluidity, in some more detail.
Suppose that a macroscopic object of mass M moves
through the superfluid. Then one may ask the question
at what velocity this motion causes the creation of
an excitation of energy E_p and momentum p. The
condition can be formulated in terms of the velocity
difference v_i − v_f as E_p = M(v_i² − v_f²)/2 and
p = M(v_i − v_f). Eliminating v_f yields E_p = p·v_i +
O(M⁻¹), so that the condition for the creation of an excit-
ation leads to the so-called Landau critical velocity

    v_c = \min_p \left\{ \frac{E_p}{|p|} \right\} \geq 0    [19]

It is immediately clear that for free bosons v_c = 0.
This means that a free Bose gas can never be a
superfluid, since drag forces on moving objects will
start to act even at the smallest velocities.

It turns out that interaction effects can drastically
modify the nature of the elementary excitations. In
1947, Nikolai Bogoliubov showed (for the first time
using the method of second quantization) that even in
the limit of weak repulsive interactions the excitation
spectrum is phonon-like, E_p = c|p|, with c the sound
velocity. Lev Landau and Richard Feynman investi-
gated the situation for superfluid ⁴He, where the
interactions between the atoms are far from weak.
Landau (1947) postulated the following form for the
excitation spectrum, for which Feynman (1953) gave
the microscopic justification. At low momenta, the
spectrum is phonon-like and linear in p:

    \lim_{p \to 0} E_p = E_p^{phon} = c|p|    [20]

At higher momenta, the spectrum is reminiscent
of that of crystal phonons in that E_p passes through a
maximum, and then, at a characteristic momentum
p_0, approaches the next minimum, which, however,
is located at a finite energy Δ. Feynman called this
part of the spectrum the “roton” (mass m_r) in an
analogy with a “smoke ring,” since it is connected
with the forward motion of a particle accompanied
by a ring of back-flowing other particles:

    \lim_{p \to p_0} E_p = E_p^{rot} = \Delta + \frac{(|p| - p_0)^2}{2 m_r}    [21]


Figure 4 shows a sketch of the phonon-roton
spectrum of superfluid ⁴He. Clearly, the Landau
critical velocity for the phonon-roton spectrum is
characterized by the roton minimum and is given by
v_c ≈ Δ/p_0.

Figure 4 The phonon-roton spectrum.
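The Landau criterion [19] can be evaluated numerically for the roton branch [21] and compared with the estimate Δ/p_0. The roton parameters used below (Δ/k_B ≈ 8.65 K, p_0/ħ ≈ 1.92 Å⁻¹, m_r ≈ 0.16 m_4) are typical literature values inserted as assumptions for this sketch; they do not appear in the text. A free-particle dispersion is scanned as well, to illustrate that v_c tends to zero for free bosons.

```python
import math

HBAR = 1.054571817e-34  # J s
K_B = 1.380649e-23      # J/K
M_HE4 = 6.6464731e-27   # kg

# Assumed roton parameters for superfluid 4He (illustrative values, not from the text).
DELTA = 8.65 * K_B      # roton gap, J
P0 = 1.92e10 * HBAR     # roton momentum, kg m/s
M_R = 0.16 * M_HE4      # roton effective mass, kg


def roton_energy(p):
    """Eq. [21]: E = Delta + (p - p0)^2 / (2 m_r)."""
    return DELTA + (p - P0) ** 2 / (2.0 * M_R)


def free_energy(p):
    """Free-particle dispersion p^2 / 2m."""
    return p ** 2 / (2.0 * M_HE4)


def landau_velocity(energy, p_min, p_max, steps=200000):
    """Eq. [19]: minimum of E(p)/p on a uniform momentum grid."""
    best = math.inf
    for i in range(steps + 1):
        p = p_min + (p_max - p_min) * i / steps
        best = min(best, energy(p) / p)
    return best


v_roton = landau_velocity(roton_energy, 0.5 * P0, 1.5 * P0)
v_estimate = DELTA / P0
v_free = landau_velocity(free_energy, 1e-6 * P0, 1.5 * P0)
print(f"roton v_c = {v_roton:.1f} m/s, Delta/p0 = {v_estimate:.1f} m/s, free gas = {v_free:.1e} m/s")
```

The exact minimum lies slightly to the right of p_0, so the scanned value falls a little below Δ/p_0; for the free gas the result simply tracks the smallest momentum on the grid.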


BCS-Leggett Pair Condensation 


The key assumptions of the weak-coupling mean- 
field BCS—Leggett pairing model can be summarized 
as follows: one first assumes that at sufficiently low 
temperatures it is energetically favorable that a 
temperature-dependent part of the fermions forms 
so-called Cooper pairs. This pair formation is caused 
by an attractive interaction in k-space near the 
Fermi surface: 

    \Gamma^{(s)}_{kp} < 0, \qquad |\xi_k|, |\xi_p| < \epsilon_c

Here ξ_k = ε_k − μ measures the energy from the
chemical potential and ε_c is a cutoff energy. The
index s denotes the total
spin of the pair. Classical superconductors have
pairs in a relative singlet state s = 0, m_s = 0 whereas
the superfluid phases of liquid ³He have pairs in a
relative spin-triplet state s = 1, m_s = 0, ±1, with m_s
the magnetic quantum number. The amplitude of
spontaneous pair formation is

    g_{k\sigma_1\sigma_2} = \langle c_{k\sigma_1} c_{-k\sigma_2} \rangle \neq 0, \qquad T \leq T_c    [22]


with k = k₁ − k₂ the relative momentum of the
pair. The attractive interaction that drives the
Cooper-pair formation connects the pairing ampli-
tude g_{kσ₁σ₂} with a new energy scale, the so-called
pair potential

    \Delta_{k\sigma_1\sigma_2} = \sum_p \Gamma^{(s)}_{kp}\, g_{p\sigma_1\sigma_2}    [23]

As a consequence of triplet pairing the spin part of
the pair potential is “even” upon interchange of σ₁
and σ₂: Δ_{kσ₂σ₁} = Δ_{kσ₁σ₂}. Then the Pauli principle
requires that Δ_{kσ₁σ₂} must be “odd” with respect
to the interchange of k₁ and k₂ or, equivalently,
k → −k. The k-dependence can now be classified by
an orbital quantum number ℓ with the special cases
of ℓ = 1 (p-wave) pairing, ℓ = 3 (f-wave) pairing, etc.
All superfluid phases of ³He are characterized by
p-wave orbital symmetry.
The transition temperature T_c from [23] reads

    k_B T_c = \frac{2 e^{\gamma}}{\pi}\, \epsilon_c\, e^{-1/(N_F |\Gamma^{(s)}|)}

with ε_c the cutoff energy of the pairing interaction,
N_F = 3n/2E_F the density of states at the Fermi
level and γ = 0.577... the Euler constant. The
energies ξ_k can trivially be divided into particle-like
(ξ_k > 0) and hole-like (ξ_k < 0) terms. The presence
of the pair potential \hat{Δ}_k leads to a mixing of particle-
and hole-like contributions to the energy, which
becomes a matrix in particle-hole, or Nambu space
(Nambu 1960), and generates what is referred to as
off-diagonal long-range order (ODLRO):

    \hat{\xi}_k = \begin{pmatrix} \xi_k \mathbf{1} & \hat{\Delta}_k \\ \hat{\Delta}_k^\dagger & -\xi_k \mathbf{1} \end{pmatrix}

As usual, the diagonalization of \hat{ξ}_k (Bogoliubov
1958) leads to the energy dispersion of the relevant
thermal excitations of the superfluid state, the so-
called Bogoliubov quasiparticles or “bogolons”:

    E_k = \sqrt{\xi_k^2 + \Delta_k^2}, \qquad \Delta_k^2 = \frac{1}{2} \mathrm{tr} \left( \hat{\Delta}_k \hat{\Delta}_k^\dagger \right)    [25]


In Figure 5, the dispersion E_p of Bogoliubov
quasiparticles vs. |p| is shown. It turns out that the
superfluid phases (A and B) of liquid ³He in zero
magnetic field are characterized by unitary matrices
\hat{Δ}_k, so that the scalar quantity Δ_k can be interpreted
as the energy gap in the bogolon spectrum, which, in
general, may be anisotropic in k-space.

The energy gap Δ_k of the superfluid B-phase can
be represented in the simple nodeless (pseudoiso-
tropic) and BCS-like form (Balian and Werthamer
(BW), 1963):

    \Delta_k = \Delta(T), \qquad \frac{\Delta(0)}{k_B T_c} = \frac{\pi}{e^{\gamma}}    [26]

Its spin structure is characterized by the presence of all
three triplet components m_s = 0, ±1 and will be
discussed further with respect to the magnetization
response (see next section). The gap symmetry of ³He-A
is uniaxial with respect to an axis ℓ̂ (Anderson and
Morel 1960; Anderson and Brinkman 1973),

    \Delta_k = \Delta_0(T) \sin\theta_k, \qquad \frac{\Delta_0(0)}{k_B T_c} = \frac{\pi e^{5/6}}{2 e^{\gamma}}    [27]

where cos θ_k = k̂·ℓ̂, and is characterized by two point
nodes of Δ_k at the zeros (θ_k = 0, π) on the Fermi
surface. It has furthermore turned out that only the
m_s = ±1 components of the spin triplet contribute
to its spin dependence (equal spin pairing (ESP)).

Figure 5 The bogolon energy dispersion.
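For orientation, the zero-temperature weak-coupling gap ratios can be evaluated numerically. The sketch below assumes the standard weak-coupling values Δ(0)/k_B T_c = π/e^γ for the isotropic (BW) state and Δ_0(0)/k_B T_c = π e^{5/6}/(2e^γ) for the axial (A-phase) state; the function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler constant


def bw_gap_ratio():
    """Isotropic BCS-like (BW) state: Delta(0) / (k_B T_c) = pi / e^gamma."""
    return math.pi / math.exp(EULER_GAMMA)


def axial_gap_ratio():
    """Axial (A-phase) state: Delta_0(0) / (k_B T_c) = pi e^(5/6) / (2 e^gamma)."""
    return math.pi * math.exp(5.0 / 6.0) / (2.0 * math.exp(EULER_GAMMA))


def axial_gap(theta, gap_max):
    """Angular dependence of the A-phase gap, Delta_0 |sin(theta)|, with point nodes at theta = 0, pi."""
    return gap_max * abs(math.sin(theta))


ratio_bw = bw_gap_ratio()
ratio_axial = axial_gap_ratio()
print(f"BW: {ratio_bw:.3f}, axial maximum gap: {ratio_axial:.3f}")
```

The maximum A-phase gap exceeds the B-phase gap by the factor e^{5/6}/2 ≈ 1.15, while the A-phase gap vanishes along ±ℓ̂.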


Local Response of Condensates 
and Excitation Gases 


In the previous sections we have seen that the 
structure (energy dispersion, statistics, critical flow 
velocity) of the relevant thermal excitations is of 
crucial importance for the superfluidity. We can 
now aim at a generalized statistical description of 
bosonic (phonons, rotons) and fermionic (bogolons) 
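excitation gases, by introducing a generalized
momentum distribution

    n_\theta\{E_k\} = \frac{1}{e^{E_k/k_B T} - \theta}    [28]

and its energy derivative

    \phi_{k\theta} = -\frac{\partial n_\theta\{E_k\}}{\partial E_k} = \frac{1}{2 k_B T \left[ \cosh(E_k/k_B T) - \theta \right]}    [29]

Special cases are

    \theta = +1,\ s = 0: \quad \text{Bose (phonons, rotons)}
    \theta = -1,\ s = \tfrac{1}{2}: \quad \text{Fermi (bogolons)}

Introducing the spin s = (1 − θ)/4, the total
momentum density response to the presence of a
superfluid velocity

    v^s = \frac{\hbar \nabla \varphi}{(2s+1) m}, \qquad s = 0, \tfrac{1}{2}

and a normal fluid velocity v^n can be written in the
general form

    j_m = \frac{2s+1}{V} \sum_k p\, n_\theta\{E_k + \delta E_k\} + \rho v^s    [30]

After Taylor-expanding n_θ with respect to the
small energy shifts δE_k = p·(v^s − v^n), one may
introduce the so-called normal fluid density tensor

    \rho^n_{ij} = \frac{2s+1}{V} \sum_k \phi_{k\theta}\, p_i p_j    [31]

and the momentum density assumes the form

    j_m = \rho^s \cdot v^s + \rho^n \cdot v^n, \qquad \rho^s = \rho \mathbf{1} - \rho^n    [32]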
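As a worked illustration of [28], [29] and [31], the sketch below evaluates the normal fluid density of a gas of phonons with E_p = c|p| (θ = +1, s = 0). In the isotropic case, ρ^n = (k_B T)⁴ I / (6π²ħ³c⁵) with the dimensionless integral I = ∫₀^∞ x⁴/[2(cosh x − 1)] dx = 4π⁴/15, which reproduces the known phonon result ρ^n = (2π²/45)(k_B T)⁴/(ħ³c⁵). The sound velocity of 238 m/s and the temperature of 0.5 K are assumed values for illustration only.

```python
import math

HBAR = 1.054571817e-34  # J s
K_B = 1.380649e-23      # J/K


def n_theta(energy, kt, theta):
    """Eq. [28]: generalized distribution, theta = +1 (Bose) or -1 (Fermi)."""
    return 1.0 / (math.exp(energy / kt) - theta)


def phi_theta(energy, kt, theta):
    """Eq. [29]: minus the energy derivative of n_theta."""
    return 1.0 / (2.0 * kt * (math.cosh(energy / kt) - theta))


def phonon_integral(x_max=60.0, steps=60000):
    """Midpoint rule for I = int_0^inf x^4 / (2 (cosh x - 1)) dx (theta = +1, k_B T = 1)."""
    h = x_max / steps
    return h * sum(((i + 0.5) * h) ** 4 * phi_theta((i + 0.5) * h, 1.0, 1.0)
                   for i in range(steps))


def phonon_normal_density(temperature, sound_velocity):
    """Isotropic eq. [31] for E_p = c|p|: rho_n = (k_B T)^4 I / (6 pi^2 hbar^3 c^5)."""
    kt = K_B * temperature
    return kt**4 * phonon_integral() / (6.0 * math.pi**2 * HBAR**3 * sound_velocity**5)


integral = phonon_integral()
rho_n = phonon_normal_density(0.5, 238.0)  # assumed: T = 0.5 K, c = 238 m/s
print(f"I = {integral:.5f}, 4 pi^4 / 15 = {4.0 * math.pi**4 / 15.0:.5f}, rho_n = {rho_n:.3e} kg/m^3")
```

The T⁴ dependence shows how quickly the phonon contribution to the normal fluid freezes out at low temperature.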


Equation [32] forms the central result of this
essay because it represents the microscopic counter-
part of the generalized London equation [10]. It is
clearly seen how the phenomenon of superfluidity
originates from ρ^s > 0 due to a qualitative change
in the dispersion of the elementary excitations,
which may in particular be characterized by a gap in
the excitation spectrum. Equation [32] is more general
than [10] in that it introduces a two-fluid picture in
which the mass supercurrent j^s_m = ρ^s v^s (eqn [10]) is
complemented by a normal (excitation) mass current
j^n_m = ρ^n v^n in the presence of a macroscopic velocity
field v^n of the excitation gas obeying arbitrary
statistics. The temperature dependence of ρ^s(T) can
now be computed via [31] and the result depends on
the dispersion of the thermal excitation under con-
sideration. Figure 6 shows the temperature depen-
dence of the normal and superfluid density of
superfluid ⁴He. The normal fluid density of superfluid
³He is, in general, a tensor quantity

    \rho^n_{ij} = \begin{cases} \rho^n_{\parallel}\, \hat\ell_i \hat\ell_j + \rho^n_{\perp} \left( \delta_{ij} - \hat\ell_i \hat\ell_j \right) & {}^3\text{He-A} \\ \rho^n\, \delta_{ij} & {}^3\text{He-B} \end{cases}    [33]

The short-range Fermi liquid interaction leads to a
quasiparticle mass enhancement m*/m = 1 + F_1^s/3
characterized by the pressure-dependent dimensionless
Landau parameter F_1^s. In Figure 7, the normal fluid
density (ρ^n_∥ and ρ^n_⊥ for ³He-A, ρ^n for ³He-B) is shown as a
function of reduced temperature at a pressure of 27
bar, where F_1^s = 12.53. The entropy density of an
excitation system of arbitrary statistics below the
transition can be written as

    \sigma(T) = k_B\, \frac{2s+1}{V} \sum_k \mathcal{S}_{k\theta}, \qquad \mathcal{S}_{k\theta} = \theta (1 + \theta n_\theta) \ln(1 + \theta n_\theta) - n_\theta \ln n_\theta    [34]


with n_θ = n_θ{E_k}, from which one may derive the
specific heat capacity

    c_V(T)\, \delta T = T\, \delta\sigma(T) = \frac{2s+1}{V} \sum_k E_k \left[ \frac{1}{e^{\left( E_k + (\partial E_k/\partial T)\, \delta T \right)/k_B (T + \delta T)} - \theta} - \frac{1}{e^{E_k/k_B T} - \theta} \right]    [35]

Figure 6 The normal and superfluid density for He-II.

Figure 7 The normal fluid density for ³He-A, B.

After a Taylor expansion of n_θ with respect to the small
local temperature change δT, the result for c_V(T) reads

    c_V(T) = \frac{2s+1}{V} \sum_k \phi_{k\theta} \left[ \frac{E_k^2}{T} - E_k \frac{\partial E_k}{\partial T} \right]    [36]
In Figures 8 and 9 we show the cusp-like specific heat
of a Bose gas as compared with the specific heat of
³He-A, B, which display discontinuities at T_c.
Finally, the superfluid phases of ³He are char-
acterized in addition by the spin degrees of freedom,
reflected by the bogolon spin magnetization
response to an external magnetic field B:

    M^n = \frac{\gamma\hbar}{2} \frac{1}{V} \sum_{k\sigma} \sigma\, n_{-1}\{E_k - \gamma\hbar\sigma B/2\} = \chi_0 B    [37]

where γ denotes the gyromagnetic ratio of the fermions.
The bogolon spin susceptibility χ_0 is obtained after a
Taylor expansion of n_{−1} with respect to B as

    \chi_0 = \left( \frac{\gamma\hbar}{2} \right)^2 \frac{1}{V} \sum_{k\sigma} \phi_{k,-1} = \left( \frac{\gamma\hbar}{2} \right)^2 N_F\, Y(T)    [38]

Figure 8 The specific heat capacity of a Bose gas.

Figure 9 The specific heat of ³He-A, B.


Note that eqn [38] accounts only for the m_s = 0
(bogolon) contribution to the spin-triplet suscept-
ibility, the temperature dependence of which is given
by the so-called Yosida function
Y(T) = N_F⁻¹ (1/V) Σ_{kσ} φ_{k,−1}. The total susceptibility reads

    \chi = \underbrace{\chi_0}_{\text{bogolons}} + \underbrace{\chi_{+1} + \chi_{-1}}_{\text{condensate}}    [39]

with the condensate contributing through χ_{m_s=±1} a
fraction of 2/3 of the normal state Pauli suscept-
ibility. In Figure 10, the reduced spin susceptibility
χ/χ_N of ³He-A, B is plotted vs. reduced tempera-
ture. While the constant susceptibility is character-
istic of the ESP pairing state, the reduction of the
B-phase susceptibility is due to the lack of the
nonmagnetic m_s = 0 contribution to the spin triplet
in the low-temperature limit. Exchange interaction
effects, characterized by the dimensionless Landau
parameter F_0^a, lead to a further reduction of the
Balian-Werthamer (BW)-state susceptibility, which
is shown for 27 bar, where F_0^a = −0.755. Note that
the theoretical picture reflected in Figure 10, and
also in Figures 6, 7, and 9, is in quantitative
agreement with experimental observations.
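The Yosida function is easy to evaluate numerically for an isotropic gap. In the sketch below the momentum sum is replaced by an energy integral over x = ξ/k_B T with constant density of states N_F, so that Y = ∫ dx / [4 cosh²(E/2)] with E = √(x² + (Δ/k_B T)²). The B-phase susceptibility is then estimated from the commonly quoted Fermi-liquid form χ/χ_N = (1 + F_0^a) y / (1 + F_0^a y) with y = (2 + Y)/3; this closed form is an assumption taken from the standard literature on the BW state, not a formula stated in the text.

```python
import math


def yosida(gap_over_kt, x_max=60.0, steps=120000):
    """Yosida function for an isotropic gap, by the midpoint rule in x = xi / (k_B T)."""
    h = 2.0 * x_max / steps
    total = 0.0
    for i in range(steps):
        x = -x_max + (i + 0.5) * h
        energy = math.hypot(x, gap_over_kt)          # E / (k_B T)
        total += 1.0 / (4.0 * math.cosh(0.5 * energy) ** 2)
    return total * h


def bw_susceptibility_ratio(y_value, f0a):
    """Assumed BW-state form: chi/chi_N = (1 + F0a) y / (1 + F0a y), y = (2 + Y) / 3."""
    y = (2.0 + y_value) / 3.0
    return (1.0 + f0a) * y / (1.0 + f0a * y)


F0A = -0.755  # Landau parameter at 27 bar, as quoted in the text
y_normal = yosida(0.0)   # gapless limit, T -> T_c
y_cold = yosida(5.0)     # Delta / (k_B T) = 5
chi_zero_t = bw_susceptibility_ratio(0.0, F0A)
chi_normal = bw_susceptibility_ratio(1.0, F0A)
print(f"Y(gap=0) = {y_normal:.4f}, Y(gap=5 kT) = {y_cold:.4f}, chi_B(0)/chi_N = {chi_zero_t:.3f}")
```

Without the exchange correction the T → 0 ratio would be 2/3; with F_0^a = −0.755 it drops to roughly one third, consistent with the further reduction described above.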

In summary, superfluidity is a quantum-mechanical
phenomenon seen on a macroscopic scale. It occurs
below the degeneracy temperature T* ∝ n^{2/3}/m of
both Bose and Fermi many-particle systems (like liquid
⁴He and ³He) and is a property of a macroscopic
number of particles, the condensate. The role of (weak
or strong) interactions is manifested in the structure of
the relevant elementary excitations, which always exist
in addition to the condensate at finite temperatures and
above certain critical velocities. These excitations form
a gas, referred to as the normal fluid, since it gives rise
to temperature-dependent thermodynamic and response
functions and contributes to the entropy and the flow
dissipation. Superfluidity is now well understood using
various aspects of the concept of the macroscopic wave
function. On the microscopic level, the mechanisms of
BEC and BCS-Leggett pair formation have been
successfully invoked to understand the fascinating
properties of Bose and Fermi superfluids.

Figure 10 The spin susceptibility of ³He-A, B.


See also: Bose-Einstein Condensates; Bosons and
Fermions in External Fields; High T_c Superconductor
Theory; Topological Knot Theory and Macroscopic
Physics; Variational Techniques for Ginzburg-Landau
Energies; Vortex Dynamics.

Introduction: Minimal D = 4 Supergravity


The essential idea of supersymmetry is an extension of
the relativistic structure group of spacetime, which in
ordinary four-dimensional physics in the absence of
gravity is the Poincaré group ISO(3, 1). In a minimal
supersymmetric theory in flat D = 4 spacetime, the
minimal supersymmetry algebra (the “graded Poincaré
algebra”) adds spinorial generators Q_α to the Lorentz
generators M_{mn} and the translational generators
(momenta) P_m, where m = 0, 1, 2, 3. The core relation
is the “anticommutator” of two Q_α:

    \{ Q_\alpha, \bar{Q}_\beta \} = 2 \gamma^m_{\alpha\beta} P_m    [1]

where Q̄ = Q†γ⁰ and the γ^m are the Dirac gamma
matrices. In the minimal D = 4 supersymmetry
algebra, the spinor generator Q_α is taken to be
Majorana: Q = C(Q̄)^T, where C is the charge-
conjugation matrix and A^T denotes the transpose
of the matrix A. The full supersymmetry algebra
adjoins to the anticommutation relation [1] the
usual commutation relations among the Lorentz
generators and the commutators of the Lorentz
generators with the momenta and the spinors Q_α;
the latter express respectively the vectorial and
spinorial characters of P_m and Q_α:

    i [M_{mn}, M_{pq}] = \eta_{np} M_{mq} - \eta_{mp} M_{nq} - \eta_{nq} M_{mp} + \eta_{mq} M_{np}    [2]
    i [M_{mn}, P_p] = \eta_{np} P_m - \eta_{mp} P_n    [3]
    i [M_{mn}, Q_\alpha] = \tfrac{1}{2} (\gamma_{mn} Q)_\alpha    [4]

where γ_{mn} = (1/2)(γ_m γ_n − γ_n γ_m) and η_{mn} = diag(−1,
1, 1, 1) is the Minkowski metric. The final relation
in the supersymmetry algebra expresses the flatness
of Minkowski space:

    [P_m, P_n] = 0    [5]
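The index structure on the right-hand side of the Lorentz commutator can be checked numerically in the spinor representation. The sketch below builds gamma matrices obeying {γ_m, γ_n} = 2η_{mn} with η = diag(−1, 1, 1, 1) (an assumed Clifford-algebra convention; the text does not state one), forms Σ_{mn} = γ_{mn}/2, and verifies that [Σ_{mn}, Σ_{pq}] = η_{np}Σ_{mq} − η_{mp}Σ_{nq} − η_{nq}Σ_{mp} + η_{mq}Σ_{np}. It does not fix the factors of i or the overall sign relating Σ_{mn} to M_{mn}, which depend on conventions.

```python
ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal of eta_mn = diag(-1, 1, 1, 1)


def eta(a, b):
    return ETA[a] if a == b else 0.0


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def combine(terms):
    """Return sum_i coeff_i * matrix_i for a list of (coeff, matrix) pairs."""
    return [[sum(c * m[i][j] for c, m in terms) for j in range(4)] for i in range(4)]


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(4) for j in range(4))


def blocks(tl, tr, bl, br):
    """Assemble a 4x4 matrix from four 2x2 blocks."""
    return [tl[0] + tr[0], tl[1] + tr[1], bl[0] + br[0], bl[1] + br[1]]


def scale2(c, m):
    return [[c * x for x in row] for row in m]


ZERO2 = [[0, 0], [0, 0]]
ONE2 = [[1, 0], [0, 1]]
PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# Dirac representation for signature (+,-,-,-), multiplied by i to reach diag(-1,1,1,1).
dirac = [blocks(ONE2, ZERO2, ZERO2, scale2(-1, ONE2))]
dirac += [blocks(ZERO2, s, scale2(-1, s), ZERO2) for s in PAULI]
gamma = [combine([(1j, g)]) for g in dirac]

clifford_ok = all(
    close(combine([(1, matmul(gamma[m], gamma[n])), (1, matmul(gamma[n], gamma[m]))]),
          combine([(2 * eta(m, n), IDENTITY)]))
    for m in range(4) for n in range(4)
)

# Sigma_mn = (1/2) gamma_mn = (1/4) (gamma_m gamma_n - gamma_n gamma_m)
sigma = [[combine([(0.25, matmul(gamma[m], gamma[n])), (-0.25, matmul(gamma[n], gamma[m]))])
          for n in range(4)] for m in range(4)]


def commutator(a, b):
    return combine([(1, matmul(a, b)), (-1, matmul(b, a))])


algebra_ok = all(
    close(commutator(sigma[m][n], sigma[p][q]),
          combine([(eta(n, p), sigma[m][q]), (-eta(m, p), sigma[n][q]),
                   (-eta(n, q), sigma[m][p]), (eta(m, q), sigma[n][p])]))
    for m in range(4) for n in range(4) for p in range(4) for q in range(4)
)
print(clifford_ok, algebra_ok)
```

Plain nested lists are used instead of an array library so that the check runs with the standard library alone.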


This algebra has been considered as an extension of the
symmetry algebra of particle physics since the work of
Gol'fand and Likhtman in 1971, and especially since
the linearly realized supersymmetric model of Wess
and Zumino in 1974. That model contains a pair of
D = 4 scalar fields and a D = 4 Majorana spinor, so
the numbers of bosonic and fermionic degrees of
freedom are equal; this is a fundamental characteristic
of supersymmetric theories.

The work of Wess and Zumino led to an 
explosion of interest in supersymmetry, especially 


once it was realized that renormalizable supersym- 
metric models display a cancellation of some of the 
divergences that have plagued relativistic quantum 
field theory since its inception in the 1930s. In 
particular, in renormalizable flat-space field theory 
models, divergences quadratic in a high-momentum 
cutoff vanish as a result of cancellations between 
virtual bosonic and fermionic particles. This is a 
very attractive feature for control of the “hierarchy 
problem” in particle physics, especially for the 
instability inherent in having vastly different scales 
within the same theory, for example, the TeV scale
of ordinary electroweak physics and the 10^16 GeV
scale where unification with the strong interactions 
might come in. 

When one includes gravity, the stability problems 
of particle physics become much more severe. 
Einstein’s theory of general relativity is itself non- 
renormalizable, that is, its ultraviolet divergences are 
of different forms from the terms present in the 
original “classical” action and there is no acceptable 
finite set of correction terms that can be added to it 
to remove this defect. Moreover, when otherwise 
tolerably behaved matter field theories that are 
renormalizable in a flat-spacetime context are 
coupled to general relativity, the gravitational 
couplings pollute the matter theories with non- 
renormalizable divergences. This is a key aspect of 
the great difficulty that has been encountered in 
interpreting gravity as a quantum theory. 

Supersymmetry, with its divergence-canceling
powers, was thus a very attractive option in the
struggle to formulate a quantum theory of gravity, and
the creation of a supergravity theory became a
high-priority task. This was achieved in 1976 by
Freedman, Ferrara, and Van Nieuwenhuizen using the 
technique of iterative Noether coupling to build up this 
nonlinear theory order-by-order in powers of the 
fermionic fields. The fermionic partner of the massless 
spin-2 “graviton” field is a massless fermionic spin-3/2 
field that has come to be called the “gravitino.” 

A second 1976 paper by Deser and Zumino soon 
followed, emphasizing how supergravity manages to 
circumvent the well-known problems of coupling 
spins higher than 1 to gravity. A key point in 
achieving this result is the role played by the local 
version of the supersymmetry algebra [1]-[5]. As 
one can see from the translations occurring on the 
right-hand side of [1], when one replaces translation 
symmetry by local general coordinate invariance in a 
gravitational context, the supersymmetry transfor- 
mations must themselves become local as well. Local 
symmetries allow for transformation parameters 


that are local in the spacetime coordinates x^m, and
in interacting theories they require coupling of the
corresponding “gauge field” to a conserved current.
In the case of supergravity, the gravitino field ψ_{mα}
plays this gauge-field role, and its coupling to the
conserved current of supersymmetry is the key to
allowing a consistent coupling between the spin-2
graviton and the spin-3/2 gravitino.


The Minimal Supergravity Action 


The action for minimal supergravity in D = 4
dimensions can be written, using the vierbein
formalism where the metric is expressed as a
quadratic expression in a nonsymmetric 4 × 4
vierbein matrix e_m^a, g_{mn} = e_m^a e_n^b η_{ab}, as


I -5 / d*x det(e)R(e,w(e) + K()) 


= >| d x6 Mb n¥5nD p(€, wle) + K(p))bq l6] 


where κ = √(8πG) is the gravitational coupling
constant,
ab a b b 1 „nb 
Wm (e) =e" C? z é. z 7 e” Ce a én) 


+ 1 ete"? (Oer Crane [7] 


is the usual vierbein formalism spin connection (in 


which Com = One? and e” is the matrix inverse of 
ema), and 
ab ik? 5 a, jb 7a b TEN 
Kin W) = om PO? + Pm? =Y Y) [8] 


is the fermionic contorsion, an additional part of the
covariant derivative D_m(e, ω(e) + K(ψ)) appearing in the
action [6]. (Indices m, n are taken to be “world”
indices while indices a, b are “tangent space” indices;
one can convert from one type to another using the
vierbein e_m^a and its inverse, e.g., ψ_{aα} = e_a^m ψ_{mα}.)

Keeping the terms in the action grouped as above
using the nonstandard covariant derivative with connection
ω_m^{ab} + K_m^{ab} is what has been called “1.5 order formalism”: this
greatly simplifies the writing and analysis of the
supergravity action [6]. In the action [6], one has the
Ricci scalar R(e, ω(e) + K(ψ)) written in terms of this
generalized torsional spin connection. One may of
course expand out all the ω_m^{ab} + K_m^{ab} combinations
and write the nonlinear fermionic terms separately.
Doing this produces a quartic term


2 
L= 5 pry’ (Wea T 2a) 


= Ao) Pape]
showing the highly nonlinear nature of supergravity
theory: when expanded out, the theory becomes
much more cumbersome to study. The 1.5 order
formalism trick is one of a large number of algebraic
simplifications that had to be developed in order to
master the technical aspects of supergravity. It also
reveals a characteristic physical feature: this theory
naturally involves a connection with torsion built
from the fermionic fields.

In terms of the torsional covariant derivative
D_m ε(x) = (∂_m + (1/4)(ω_m^{ab}(e) + K_m^{ab}(ψ)) γ_{ab}) ε(x) of the
infinitesimal supersymmetry parameter ε(x), the
local supersymmetry transformations which leave
the action [6] invariant (up to the integral of a total
derivative) are


bem = 1€Yam [9] 
    \delta\psi_m = 2\kappa^{-1} D_m \epsilon    [10]

The inhomogeneous part 2κ⁻¹∂_m ε in the gravitino
transformation [10] demonstrates the gauge-field
nature of the gravitino field. For a distribution of
“supermatter” fields (e.g., Wess-Zumino model
scalars and spinors), the integrated “charge” that
one would get from a Gauss's law surface integral at
spatial infinity using the gravitino gauge field is the
total supercharge Q_α, which in turn plays the role of
the supersymmetry generator in the original matter-
sector supersymmetry algebra [1].

Both the gravitational field and the gravitino field 
are thus effectively gauge fields, albeit not of a 
standard Yang-Mills type. The local algebra is a 
deformation of the rigid supersymmetry algebra [1]- 
[5], generalizing the relation between general covar-
iance and flat-space Poincaré symmetry. Some basic
consequences of the flat-space algebra are preserved, 
however. An extremely important instance of this is 
energy positivity. As one can see by multiplying [1]
by γ⁰ and then contracting on the spinor index,

    E = P^0 = \frac{1}{8} \sum_\alpha \{ Q_\alpha, Q_\alpha^\dagger \}

The right-hand side is manifestly non-negative
provided the theory is quantized in a positive-metric
Hilbert space. One can see this even more explicitly
in a Majorana spinor basis, where Q_α† = Q_α.
Accordingly, for flat-space supersymmetric theories,
one obtains directly the result that energy is
non-negative. This carries over to the local algebra
of supergravity, where the total energy is obtained
from a Gauss's law integral over the sphere at
spatial infinity.

In general relativity, an integrated energy can be 
defined with respect to an asymptotic timelike 
Killing vector at spatial infinity. Showing that this
energy is non-negative remained for decades a 
famously unsolved problem in gravitational physics; 
it was ultimately proven in Yau’s positive-energy 
theorem. The algebraic structure of supergravity 
makes energy positivity much more transparent, 
however. Since pure general relativity can be 
obtained by setting the gravitino field to zero, this 
result is inherited by pure Einstein theory as a 
consequence of its being embeddable into super- 
gravity. Energy positivity can thus be proved even at 
the classical level using ideas taken from super- 
gravity, as was done by Witten and later streamlined 
by Nester, in an argument much simpler than Yau’s 
proof. This argument writes the energy as an 
integral over a positive-semidefinite expression 
quadratic in a commuting spinor field which is 
analogous to the (anticommuting) spinor parameter 
of supergravity in the transformations [9] and [10]. 


Auxiliary Fields and Superspace 


Supergravity shares with flat-space supersymmetric 
theories a curious technical feature that gives a hint 
of a new underlying geometry. Standard counting of
the gauge-invariant continuous degrees of freedom of
the graviton and the gravitino in momentum space
yields the same result per momentum value: two
bosonic degrees of freedom and two fermionic 
degrees of freedom. This accords with the general 
requirement in supersymmetric theories that the 
numbers of bosonic and fermionic degrees of free- 
dom match. This count follows from the Einstein 
and spin-3/2 equations of motion, or “on-shell.” 
If one compares the count of nongauge degrees 
of freedom without using the equations of motion 
(i.e., “off-shell”), one obtains an imbalance, how-
ever: six nongauge graviton field components versus 12
nongauge gravitino field components. This is directly related to another
puzzling feature of the supergravity realization of 
local supersymmetry: the local supersymmetry alge- 
bra closes onto a finite set of transformations only 
when the equations of motion are imposed. 

As in flat-space supersymmetry, the cure for this 
problem is to add nondynamical “auxiliary” fields 
to the action. In the supergravity case, the 
imbalance in the off-shell bose-fermi field count 
indicates that an additional six bosonic fields are 
needed. In the minimal set of auxiliary fields, these
organize into a vector b_m and a scalar-pseudoscalar
pair M, N; the additional terms in the action [6] are
simply

    \int d^4x\, \det(e) \left( -\tfrac{1}{3} M^2 - \tfrac{1}{3} N^2 + \tfrac{1}{3} b_m b^m \right)


while the local supersymmetry transformations are
changed to include the auxiliary fields, e.g., the
gravitino transformation becomes

    \delta\psi_m = 2\kappa^{-1} D_m(\omega, K)\, \epsilon + i\gamma_5 \left( b_m - \tfrac{1}{3} \gamma_m \gamma^n b_n \right) \epsilon - \tfrac{1}{3} \gamma_m (M + \gamma_5 N)\, \epsilon

while the auxiliary fields transform into expressions
that vanish on-shell. Since the field equations for the
auxiliary fields are algebraic in character and since
for source-free supergravity they have the simple
solution b_m = M = N = 0, one can directly regain the
on-shell formalism by algebraically eliminating the
auxiliary fields.
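The off-shell counting quoted above can be reproduced with simple bookkeeping. The sketch assumes the usual breakdown: a vierbein with 16 components minus 4 general coordinate and 6 local Lorentz gauge parameters, and a Majorana vector-spinor with 16 components minus 4 local supersymmetry parameters. This breakdown is the standard argument and is stated here as an assumption, since the text quotes only the totals.

```python
def offshell_counts():
    """Return (bosonic, fermionic) nongauge field components without auxiliary fields."""
    vierbein = 4 * 4             # e_m^a: world index times tangent-space index
    bosonic_gauge = 4 + 6        # general coordinate + local Lorentz parameters
    gravitino = 4 * 4            # psi_m^alpha: world index times Majorana spinor index
    fermionic_gauge = 4          # local supersymmetry parameters
    return vierbein - bosonic_gauge, gravitino - fermionic_gauge


AUXILIARY_FIELDS = {"b_m": 4, "M": 1, "N": 1}   # minimal auxiliary set from the text

bosonic, fermionic = offshell_counts()
auxiliary = sum(AUXILIARY_FIELDS.values())
print(f"without auxiliaries: {bosonic} bosonic vs {fermionic} fermionic")
print(f"with auxiliaries:    {bosonic + auxiliary} bosonic vs {fermionic} fermionic")
```

The six missing bosonic components are exactly supplied by the vector b_m and the pair M, N, which is why the minimal auxiliary set has this content.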

The inclusion of auxiliary fields is not an empty 
trick, however. The local supersymmetry transfor- 
mations including the auxiliary fields form a closed 
set without the use of equations of motion (“off- 
shell closure”). This standardizes the form of the 
supersymmetry transformations so that they remain 
the same even when supermatter is coupled to 
supergravity instead of needing a case-by-case 
Noether construction as in the case without the 
auxiliary fields. In this way, a standard set of 
coupling rules can be drawn up, known as the 
“tensor calculus.” This tensor calculus is of great 
importance as it allows for the construction of 
general models of supergravity coupled to super- 
matter (Wess-Zumino multiplets and super Yang- 
Mills multiplets consisting of spin-1 gauge fields and 
spin-1/2 “gaugino” fields). These general couplings 
form the basis for essentially all supersymmetric 
phenomenology, and in particular for the formula- 
tion of the Minimal Supersymmetric Standard 
Model. Since supersymmetry is not directly observed 
in low-energy physics, it must be spontaneously 
broken, like many other gauge symmetries. As it 
happens, the physically realistic mechanisms of 
supersymmetry breaking all originate from super- 
gravity couplings derived using the tensor calculus. 

Given the regular set of tensor calculus rules for 
coupling supergravity to supermatter, one is led to 
suspect that a geometrical structure lies in the 
background. This is indeed the case; the correspond- 
ing construction is known as “superspace.” 

The basic idea of superspace is a generalization of 
the coset space construction of Minkowski space as 
the coset space given by the Poincaré group divided 
by the Lorentz group: $M^4(x^m) = ISO(3,1)/SO(3,1)$.
For supersymmetric theories, one analogously constructs
Superspace$(x^m, \theta^\alpha)$ = Graded Poincaré/SO(3,1).
The basic ideas of superspace were introduced by
Akulov and Volkov in 1972, while the idea of
expanding in “functions” on this space, thus yielding
“superfields,” was introduced by Salam and Strathdee
in 1974. This led to a formulation of the Wess-Zumino
model in terms of a chiral superfield $\phi(x, \theta)$,
which is subjected to a covariant superspace
constraint.

In order to manage the formalism of superspace 
more efficiently, it is convenient to use a two- 
component spinor formalism corresponding to the 
Weyl basis for the Dirac gamma matrices, in which 
the Majorana spinor coordinate $\theta$ is represented as


(5) 


where two-component indices $\alpha, \dot\alpha = 1, 2$ are raised
and lowered with the covariant two-index antisymmetric
tensors $\epsilon_{\alpha\beta}$, $\epsilon_{\dot\alpha\dot\beta}$, which both take the numerical
value $i\sigma_2$. The flat-space fermionic covariant
derivatives are then 


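$$D_\alpha = \frac{\partial}{\partial\theta^\alpha} + i\sigma^m_{\alpha\dot\alpha}\bar\theta^{\dot\alpha}\partial_m, \qquad \bar D_{\dot\alpha} = \frac{\partial}{\partial\bar\theta^{\dot\alpha}} + i\theta^\alpha\sigma^m_{\alpha\dot\alpha}\partial_m \qquad [11]$$
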
where the $\sigma^m_{\alpha\dot\alpha} = (1, \sigma_i)$ for $m = (0, i)$ (where $\sigma_i$ are the
Pauli matrices) are the Van der Waerden matrices
which establish the mapping between vector indices
and (chiral, antichiral) spinor index pairs. The
Wess-Zumino multiplet is then described by a
complex chiral superfield satisfying the constraint
$\bar D_{\dot\alpha}\phi = 0$. Unlike the situation in Minkowski space,
where the only Lorentz-covariant solution to a
constraint that sets to zero the $\partial/\partial x^m$ derivatives is
a constant, superspace has a reducible set of
coordinates $(x^m, \theta^\alpha, \bar\theta^{\dot\alpha})$ and, as a result, requiring $\phi$
to be annihilated by $\bar D_{\dot\alpha}$ does not require the whole
superfield to be a constant.

Since the fermionic coordinates of superspace 
$\theta^\alpha, \bar\theta^{\dot\alpha}$ are anticommuting (i.e., they are elements of
a Grassmann algebra), and since $\alpha, \dot\alpha = 1, 2$ have an
index range of two, powers of them higher than the
second order necessarily vanish. As a result, superfields
like $\phi$ can be expanded into sets of component
fields, each of which is an ordinary field in
Minkowski space. In this way, a chiral superfield
expands into $(A(x), B(x), \chi_\alpha(x), \bar\chi_{\dot\alpha}(x), F(x), G(x))$,
where the fields $A$, $B$, $\chi$, and $\bar\chi$ are the physical
fields of the Wess-Zumino model, while F and G
are dimension-2 auxiliary fields. In this way, the
auxiliary fields of supersymmetry naturally fit into a 
superspace formalism as higher components in a 
superfield expansion. It is in this sense that they 
point toward the superspace formulations of super- 
symmetric theories. 
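
The termination of the expansion can be made concrete by listing the products of distinct anticommuting generators, since any product with a repeated generator vanishes. This is only a counting sketch; the generator labels and the function name are illustrative and not part of the formalism above.

```python
from itertools import combinations


def nonvanishing_monomials(generators):
    """Group the nonvanishing products of distinct anticommuting generators by degree."""
    return {degree: list(combinations(generators, degree))
            for degree in range(len(generators) + 1)}


# The two components of one two-component spinor coordinate (labels are illustrative).
theta = ["theta^1", "theta^2"]
for degree, terms in nonvanishing_monomials(theta).items():
    print(degree, terms)
```

For two generators the list stops at degree two, which is why a superfield has only finitely many component fields; with all four components of $\theta$ and $\bar\theta$ the same function returns 16 monomials.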

For supergravity, there are a number of different 
approaches to realizing the theory in superspace,
and these correspond naturally to the various
possible choices of auxiliary-field sets. With the
minimal set, the supergravity multiplet is described
by a superfield carrying a vector index, $H_m(x, \theta, \bar\theta)$;
this superfield is called the prepotential of supergravity.
Note that since the divisor group in
the coset-space construction of superspace is the
Lorentz group, superfields may carry indices corresponding
to any Lorentz representation. The component-field
expansion of the $H_m$ superfield yields
the physical fields $e_m^a$, $\psi_{m\alpha}$, $\bar\psi_{m\dot\alpha}$ and the auxiliary fields
$(b_m, M, N)$ together with a number of other components
of dimension lower than those of the physical
fields. This is not, however, all that surprising: even
the physical fields $e_m^a$, $\psi_{m\alpha}$, $\bar\psi_{m\dot\alpha}$ contain components
that are not directly related to the physical modes
because we are dealing with a gauge theory. What
occurs in superspace is a redundant expression of
the supergravity multiplet with the presence of
various component gauge fields.

The full expression of local supersymmetry in 
superspace can be given in a number of different 
formalisms. Suffice it here to indicate the transfor- 
mation of the linearized theory expanded in small 
fluctuations about empty flat superspace. Converting
the vector index of $H_m$ into a (chiral, antichiral)
spinor index pair via $H_{\alpha\dot\alpha} = \sigma^m_{\alpha\dot\alpha}H_m$, the linearized
local symmetry transformation of the supergravity
multiplet is


Le 12 


where the transformation parameter superfield $L_\alpha$
carrying a spinor index is antichiral: $D_\alpha L_\beta = 0$
(while the conjugate parameter superfield $\bar L_{\dot\alpha}$ is
chiral). Expanding in component fields and comparing
with the expansion of $H_m$, one sees that the
chiral spinor superfield contains precisely the components
needed to provide the standard gauge
symmetries of $e_m^a$ and $\psi_{m\alpha}$, $\bar\psi_{m\dot\alpha}$ and also to transform
the other gauge components of $H_m$.
One can then make various gauge choices according 
to taste in a given context. 

One frequently encountered superspace gauge 
choice sets to zero all the fields in $H_m$ except for
the physical and auxiliary fields $(e_m^a, \psi_{m\alpha}, \bar\psi_{m\dot\alpha}, b_m, M, N)$.
This is called a Wess-Zumino gauge
following the analogy to a similar construction for
super Maxwell theory (containing spins 1 and 1/2).
Wess-Zumino gauge choices are not, however,
supersymmetrically covariant. This shows up when
one works out the supersymmetry algebra in such a
gauge: the presence of auxiliary fields gives closure,
as required, without use of the equations of motion,
but the anticommutator of two supersymmetry
transformations when acting on a gauge field such
as the Maxwell field or the vierbein gives a 
combination of the anticipated translation with an 
admixture of a gauge transformation with a field- 
dependent parameter. 

The prepotential superfield of minimal super- 
gravity can itself be fit into larger formalisms in 
superspace that are analogous to standard differen- 
tial geometry, with supervielbeins, superspin con- 
nections and so forth. An unavoidable feature of 
these more seemingly geometric constructions, how- 
ever, is their high degree of redundancy: superspace 
vielbeins and spin connections carrying Lorentz 
indices have many component fields in addition to 
those found in the prepotential. This redundancy is 
then cut down in turn by imposing superspace 
constraints on the geometrical superfields, for 
example, on the components of the torsion tensor 
in superspace. 


Extended Supergravities and 
Supergravities in Higher Dimensions 


The possible graded extensions of the Poincaré 
algebra allow for more than one spinorial generator. 
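Thus, one can have N supersymmetry generators
$Q^i_\alpha$, $i, j = 1, \ldots, N$, with basic anticommutators
(in Lorentz two-component notation)


$$\{Q^i_\alpha, \bar Q_{\dot\alpha j}\} = 2\delta^i_j\,\sigma^m_{\alpha\dot\alpha}P_m \qquad [13]$$
$$\{Q^i_\alpha, Q^j_\beta\} = 2\epsilon_{\alpha\beta}\,a^{\ell ij}Z_\ell \qquad [14]$$
$$\{\bar Q_{\dot\alpha i}, \bar Q_{\dot\beta j}\} = 2\epsilon_{\dot\alpha\dot\beta}\,a^*_{\ell ij}Z^\ell \qquad [15]$$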


The right-hand sides of [14] and [15] allow for the
possibility of nonvanishing anticommutators between
supersymmetry generators of the same chirality. As
one can see from the overall symmetry in pairs of
indices $(\alpha i, \beta j)$, the coefficients $a^{\ell ij}$ must be antisymmetric
in the $i, j$ indices, so such nonvanishing same-chirality
anticommutators cannot occur for N=1.
The corresponding abelian generators $Z_\ell$ are called
central charges since they must commute with all the
other $(Q^i_\alpha, \bar Q_{\dot\alpha i}, P_m)$ elements of the algebra.

The $i, j$ indices may be endowed with a symmetry
meaning as well, although this is not obligatory in
every model. When the central charges are absent,
$Z_\ell = 0$, one has U(N) (or SU(N)) as the maximal
such external automorphism; the choice of index
placement on $Q^i_\alpha$ and $\bar Q_{\dot\alpha j}$ anticipates this. If such a
symmetry is realized in a given model, the fact that
the $Q^i_\alpha$, $\bar Q_{\dot\alpha j}$ carry representations both for that
symmetry and for the spacetime Poincaré symmetry
demonstrates how supersymmetry evades the no-go
theorem barring unified spacetime and internal
symmetries. This theorem (the Coleman-Mandula
theorem) can be evaded because, at the time it was
formulated, graded Lie symmetry algebras had not yet
been considered. For nonzero central charges, the external
automorphism algebra becomes a subalgebra of
U(N) determined by the requirement that invariant
antisymmetric tensors $a^{\ell ij}$ exist.

The representations of the algebra [13]-[15] span
an increasing range of spins as the number N of 
D = 4 supersymmetries increases. For massive repre- 
sentations without central charges, the spins of the 
smallest supersymmetry representation extend from 
states of spin 0 (scalars) up to spin N/2; with central 
charges, the spin range can be shortened down to a 
minimum range of N/4. For massless representations,
the range of helicities in a PCT (parity-charge-time
reversal) symmetric multiplet is from
$-N/4$ to $N/4$. This spin range has an important
implication for the maximal extension of super- 
symmetry that can be realized in an interacting 
supersymmetric field theory, because no interacting 
theories with a finite set of spins exist for spins >2. 
Accordingly, the maximal extension of supersym- 
metry is N =8 for massless theories, and in order to 
have massive states with spins that do not exceed 
spin 2 in an N=8 theory, the central charges have 
to be active for maximal multiplet shortening. 

The N=8 supergravity theory, found by Crem- 
mer and Julia in 1978, is thus the largest possible 
supergravity in D=4 dimensions. It contains the 
following “spin” range (allowing for a certain 
imprecision of expression: for massless fields one 
should really speak only of helicities) 


N = 8 supergravity
Spin          |  2  | 3/2 |  1  | 1/2 |  0
Multiplicity  |  1  |  8  | 28  | 56  | 70


In order to realize the automorphism SU(8) symme- 
try, one has to consider the field strengths for the 28 
spin-1 fields, separated into complex self-dual and 
anti-self-dual parts in their antisymmetric Lorentz 
indices. These complex field strengths can then be 
endowed with a complex 28-dimensional represen- 
tation of SU(8). The 70 scalars, on the other hand, 
fit precisely into the four-index antisymmetric
self-dual representation of SU(8), $\phi^{[ijkl]} = \frac{1}{4!}\epsilon^{ijklmnpq}\bar\phi_{[mnpq]}$; it is the
eight-index epsilon tensor here that restricts the automorphism
group to SU(8) instead of U(8).

The SU(8) automorphism symmetry of N=8 
supergravity theory is linearly realized. It plays an 
important role in another symmetry of this theory 
which is highly nonlinear. This theory has a
remarkable nonlinear $E_7$ symmetry. In fact, the 70
scalars form a nonlinear sigma model with the fields
taking their values in the coset space $E_7/SU(8)$ (of
dimension $133 - 63 = 70$), where the SU(8) divisor
is the linearly realized automorphism group discussed
above.
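
The multiplicities in the N=8 table and the dimension of the scalar coset can be reproduced by a short count. This is a sketch resting on one standard assumption not stated in the text: the massless multiplet is built by applying k of the 8 helicity-lowering supercharges to the helicity +2 state, which gives binomial multiplicities. The variable names are illustrative.

```python
from fractions import Fraction
from math import comb

N = 8
# Applying k distinct helicity-lowering supercharges to the helicity +2 state
# gives C(N, k) states of helicity 2 - k/2 (assumed construction).
multiplet = {Fraction(2) - Fraction(k, 2): comb(N, k) for k in range(N + 1)}

bosonic = sum(n for h, n in multiplet.items() if h.denominator == 1)
fermionic = sum(n for h, n in multiplet.items() if h.denominator == 2)

# Coset dimension, using dim E7 = 133 and dim SU(8) = 8**2 - 1.
scalar_manifold_dim = 133 - (8**2 - 1)

print([multiplet[Fraction(s)] for s in ("2", "3/2", "1", "1/2", "0")])
print(bosonic, fermionic, scalar_manifold_dim)
```

The helicity range runs from +2 to -2, that is from N/4 down to -N/4, and the bosonic and fermionic state counts agree, as supersymmetry requires.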

The extended supergravities point to another 
aspect of supergravity theory: the existence of 
higher-dimensional supergravities, from which the 
extended theories in D =4 spacetime can be derived 
by Kaluza-Klein dimensional reduction. If one
considers a $D'$-dimensional massless theory in a
spacetime where d dimensions form a compact
d-torus, then the theory can be viewed as a $D = D' - d$
dimensional theory in which the discrete Fourier
modes arising from the periodicity requirements on
the d-torus give rise to towers of equally spaced
massive Kaluza-Klein states, plus a massless sector
in $D' - d$ dimensions corresponding to the modes
with no dependence on the d-torus coordinates. 
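
As an illustration of how the equally spaced tower arises, consider the simplest case of a single massless scalar reduced on one circle. The scalar field $\phi$, the radius $R$, and the mostly-plus signature are assumptions made for this sketch and are not taken from the text.

```latex
\phi(x, y) = \sum_{n \in \mathbb{Z}} \phi_n(x)\, e^{i n y / R},
\qquad y \sim y + 2\pi R,
\\[4pt]
\left(\Box_{D} + \partial_y^2\right)\phi = 0
\;\Longrightarrow\;
\left(\Box_{D} - m_n^2\right)\phi_n = 0,
\qquad m_n = \frac{|n|}{R}.
```

The mode with n = 0 has no dependence on the circle coordinate and gives the massless sector, while the modes with n different from zero form the tower with spacing 1/R.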

Importantly, N=8 supergravity in four- 
dimensional spacetime can be obtained in this way 
from a supergravity theory that exists in 11 space- 
time dimensions. Upon dimensional reduction on a 
7-torus to four dimensions, one obtains N = 8, D = 4 
supergravity at the massless level, plus an infinite 
tower of massive N = 8 supermultiplets with central 
charges so that their spin range extends only up to 
spin 2. This D=11 supergravity was in fact found 
before the N=8 theory by Cremmer, Julia, and 
Scherk, with the details of the more complicated 
N=8, D=4 theory being worked out via the
techniques of Kaluza-Klein dimensional reduction.
The fields of the D=11 theory include an exotic
field type not encountered in D=4 theories: the
bosonic fields of the theory comprise the graviton $e_M^A$
plus a three-index antisymmetric tensor gauge field
$C_{MNP}$. Counting the number of propagating modes
of these fields for a given momentum value gives
44 + 84 = 128 bosonic degrees of freedom. This
precisely balances the 128 fermionic degrees of
freedom coming from the D=11 gravitino $\psi_{M\alpha}$.
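
The numbers 44, 84 and 128 follow from counting representations of the SO(9) little group of a massless state in D=11. The sketch below assumes the usual on-shell counting rules (a symmetric traceless tensor, a three-index antisymmetric tensor, and a gamma-traceless vector-spinor built on a 16-component real spinor); these rules are not spelled out in the text.

```python
from math import comb

D = 11
n = D - 2  # transverse directions of a massless state (little group SO(9))

graviton = n * (n + 1) // 2 - 1   # symmetric traceless tensor
three_form = comb(n, 3)           # antisymmetric three-index tensor
spinor = 16                       # real on-shell components of a D=11 Majorana spinor (assumed)
gravitino = (n - 1) * spinor      # vector-spinor with its gamma-trace removed

print(graviton, three_form, graviton + three_form, gravitino)
```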


Supergravity Effective Theories, Strings 
and Branes 


The hope for a cancellation of the ultraviolet 
divergences in a supersymmetric theory of gravity 
turned out to be ephemeral, although there is in fact 
a postponement of the divergence onset until a 
higher order in quantum field loops. There is 
agreement that the nonmaximal supergravities 
diverge at the three-loop order. For the 
N=8, D=4 theory, the situation remains unclear,
but divergences are nonetheless expected to occur at
some finite loop order. 

This persistence of nonrenormalizability in D=4 
supergravity theories is no longer seen as a disaster, 
however, because these theories are now seen as 
effective theories for the massless modes arising 
from a deeper microscopic quantum theory. In 
addition, the theories that are most directly con- 
nected to this underlying quantum theory are, 
surprisingly, the maximal supergravities in space- 
time dimensions 10 and 11. D=11 supergravity can 
be dimensionally reduced on a 1-torus (i.e., a circle)
to D=10 where the massless sector yields type IIA 
supergravity theory. This theory is the effective 
theory for a consistent quantum theory of type IIA 
superstrings in D=10. Theories of relativistic 
strings (i.e., one-dimensional extended objects) 
have strikingly different properties from theories of 
point particles. In particular, the spread-out nature 
of the interactions leads to a damping out of the 
quantum field theory divergences, while the under- 
lying supersymmetry causes a cancellation of other 
infinities that could have arisen owing to the two- 
dimensional nature of the string world sheets. This 
gives, for the first time, a perturbatively well-defined 
quantum theory including gravity. 

In addition to the type IIA theory, there are four 
other consistent superstring theories in D=10, and 
these are in turn related to various D=10 super- 
gravity effective theories for the massless modes: 
type IIB, $E_8 \times E_8$ heterotic, SO(32) heterotic, and
SO(32) type I. Remarkably, the maximal D=11 
supergravity enters into this picture as well, as a 
consequence of a pattern of duality symmetries that 
have been found among the superstring theories. 

The dualities of string theory are directly related 
to the nonlinear symmetries of the dimensionally 
reduced supergravities in D = 4. The string quantum 
corrections do not respect the $E_7$ symmetry of the
classical N = 8 theory, but they do respect a discrete
subgroup of this symmetry in which the $E_7$ group
elements are required to take integer values: $E_7(\mathbb{Z})$.

This quantum-level restriction to a discrete sub- 
group can be seen from another phenomenon 
characteristic of superstring theories: the existence 
of “electric” and “magnetic” brane solutions. The 
antisymmetric-tensor (or “form”) fields of the 
higher-dimensional supergravities naturally give rise 
to solitonic solutions in which p + 1 dimensions
form a flat Poincaré invariant subspace. This can be 
interpreted as the world volume of an infinite 
p-brane extended object. In the D=11 supergravity 
theory, the branes that emerge in this way are a 
2-brane and a 5-brane. The three-dimensional world 
volume of the 2-brane naturally couples to the
3-form field $C_{MNP}$, just as an ordinary Maxwell
vector field couples to the one-dimensional world
line of a point particle (or 0-brane). The 2-brane is
thus naturally electrically charged with respect to
the 3-form field; its charge can be obtained, in a
direct generalization of the Maxwell case, from a
Gauss’ law integral of the field strength $H_{[4]} = dC_{[3]}$
over a 7-sphere at spatial infinity in the eight
directions transverse to the brane worldvolume.
The 5-brane, on the other hand, has a magnetic
type charge; it is the 7-form dual to $H_{[4]}$ that is
integrated to give its charge. In addition to these 
static infinite p-branes, the theory contains dynami- 
cal finite-extent branes as well, although for these 
one generally does not have explicit solutions. 
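
The pairing of the 2-brane with the 5-brane in D=11 is an instance of a general dimension count. The sketch below assumes the standard rules that an electric p-brane couples to a (p+1)-form potential and that its magnetic dual has spatial dimension D - p - 4; the function and key names are illustrative only.

```python
def brane_data(D, p):
    """Form degrees and dual brane for an electrically charged p-brane in D dimensions."""
    potential_rank = p + 1
    field_strength_rank = p + 2
    return {
        "potential_rank": potential_rank,
        "field_strength_rank": field_strength_rank,
        "dual_field_strength_rank": D - field_strength_rank,
        "transverse_dimensions": D - (p + 1),
        "sphere_at_infinity": D - (p + 1) - 1,
        "magnetic_dual_brane": D - p - 4,
    }


print(brane_data(11, 2))
```

For D=11 and p=2 this gives a 3-form potential, a 4-form field strength with a 7-form dual, eight transverse directions bounded by a 7-sphere, and a 5-brane as the magnetic partner.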

As one reduces a higher-dimensional supergravity 
to lower and lower dimensions, there is a proliferation 
of solitonic brane solutions of varying dimensionality, 
and of both electric and magnetic charge types. In a 
quantum theory context, these electrically and magne- 
tically charged branes pair up in ways that must satisfy 
a generalization of the Dirac quantization condition 
for D =4 electric and magnetic point particles. This 
ends up requiring all the supergravity solitonic brane 
charges to lie on a charge lattice. It is the requirement 
that this discrete brane-charge lattice be respected that 
restricts the classical supergravity nonlinear symmetry 
groups to discrete duality subgroups. 
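
For orientation, the quantization conditions referred to here can be written schematically as follows. The normalization (units with $\hbar = c = 1$ and a factor of $2\pi$) is an assumption of this sketch; conventions differ between references.

```latex
e\, g = 2\pi n, \quad n \in \mathbb{Z}
\qquad \text{(D = 4 electric and magnetic point particles)},
\\[4pt]
Q_{\mathrm{e}}\, Q_{\mathrm{m}} = 2\pi n, \quad n \in \mathbb{Z}
\qquad \text{(electric $p$-brane and its magnetic dual brane)}.
```

Because every electric charge must satisfy such a condition with every magnetic charge, the allowed charges form a lattice, and only those transformations of the classical symmetry group that map this lattice to itself survive.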

The dualities relate brane solutions within a given 
theory and also between different string theories. 
They include transformations that invert the radii of 
compactifying tori, giving a large-small compactifi- 
cation scale duality. They also include transforma- 
tions that invert the string coupling constant, thus 
interchanging strong and weak coupling. The type 
IIB theory, for example, is self-dual under strong-weak
coupling duality. In the case of the type IIA
theory, however, something remarkable happens.
The strong coupling limit of this theory turns out to
be related by duality, not to another string theory,
but to the maximal D = 11 supergravity. The role of
the Kaluza-Klein massive modes for the 11 to 10
reduction is played by an infinite tower of extremal
charged black holes.

Thus, even D=11 supergravity theory has a role
to play in the effective theory of the underlying
quantum dynamics. This underlying theory has been
dubbed “M-theory.” It is still only partially understood,
but many of its most important properties are
presaged by the remarkable nonlinear structure of
the classical supergravities.


Introduction


A supermanifold is a generalization of a classical 
manifold to include coordinates that are in some 
sense anticommuting. Much of the motivation for 
the study of supermanifolds comes from super- 
symmetric physics, where it is useful to have a 
formalism which treats fermions and bosons in the 
same way. The underlying reason for the effective- 
ness of supermanifolds is that anticommuting 
coordinates allow the fermionic canonical anti- 
commutation relations to be handled in a way 
analogous to the bosonic canonical commutation 
relations. Supersymmetric methods have proved 
immensely effective in fundamental physics; they 
also play a considerable role in geometrical index 
theory in mathematics. In this article we describe 
supermanifolds from two points of view — geometric 
and algebraic — and consider some of the standard 
features of manifold calculus, including integration 
since this is an area where the distinctive features of 
this generalized geometry are particularly apparent. 
One situation where supermanifolds are used in 
physics is in the superspace formulation of super- 
gravity, where the physical fields are found in the 
component fields in the Taylor expansion of func- 
tions on the supermanifold in anticommuting vari- 
ables. More fundamentally, the symmetry groups of 
supersymmetric theories have commuting and anticommuting
generators, and are examples of super Lie
groups, which are supermanifolds with a compatible
group structure. 


Some Algebraic Preliminaries 


The coordinates of a supermanifold have particular 
algebraic features which are best understood by 
introducing some of the basic concepts of super- 
algebra. (The word super here does not imply 
superiority, simply the extension of some classical 
concept to have odd as well as even, anticommuting 
as well as commuting, elements.) A “super vector 
space” is a vector space V together with a direct sum 
decomposition 


$$V = V_0 \oplus V_1 \qquad [1]$$


The subspaces $V_0$ and $V_1$ are referred to, respectively,
as the even and odd parts of V. A general
element v of V thus has the unique decomposition
$v = v_0 + v_1$ with $v_0$ in $V_0$ and $v_1$ in $V_1$. We will
normally consider homogeneous elements, that is,
elements v which are either even or odd, with parity
denoted by $|v|$, so that $|v| = i$ if v is in $V_i$, $i = 0, 1$.
(Arithmetic of parity indices $i = 0, 1$ is always
modulo 2.) A superalgebra is a super vector space 
whose elements can be multiplied together in such a 
way that the product of an even element with an 
even element and that of an odd element with an 
odd element are both even, while the product of an 
odd element with an even element is odd; more 
formally: 


Definition 1 


(i) A “superalgebra” is a super vector space
$A = A_0 \oplus A_1$ which is also an algebra which
satisfies $A_i A_j \subset A_{i+j}$.

(ii) The superalgebra is “supercommutative” if, for
all homogeneous a, b in A, $ab = (-1)^{|a||b|}\,ba$.


If the algebra is supercommutative then odd 
elements anticommute, and the square of an odd 
element is zero. The basic supercommutative superalgebra
used is the real Grassmann algebra with
generators $1, \beta_1, \beta_2, \ldots$ and relations


$$1\beta_i = \beta_i 1 = \beta_i, \qquad \beta_i\beta_j = -\beta_j\beta_i \qquad [2]$$

A typical element of this algebra is then

$$a = a_\emptyset 1 + \sum_i a_i\beta_i + \sum_{i<j} a_{ij}\beta_i\beta_j + \cdots \qquad [3]$$


This algebra, which is denoted $\mathbb{R}_S$, is a superalgebra
with $\mathbb{R}_S := \mathbb{R}_{S,0} \oplus \mathbb{R}_{S,1}$, where $\mathbb{R}_{S,0}$ consists of linear
combinations of products of even numbers of the
anticommuting generators, while $\mathbb{R}_{S,1}$ is built similarly
from odd products.
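
The defining relations of the Grassmann algebra are easy to experiment with numerically. The following is a minimal sketch, not a library implementation: the class name `Grassmann`, its methods, and the dictionary representation (sorted generator indices mapped to real coefficients) are all choices made for this illustration.

```python
from itertools import product


class Grassmann:
    """Element of a real Grassmann algebra, stored as {sorted index tuple: coefficient}."""

    def __init__(self, terms=None):
        # Drop zero coefficients so that equality tests are meaningful.
        self.terms = {k: c for k, c in (terms or {}).items() if c != 0}

    @staticmethod
    def generator(i):
        return Grassmann({(i,): 1.0})

    @staticmethod
    def scalar(c):
        return Grassmann({(): float(c)})

    def __add__(self, other):
        out = dict(self.terms)
        for k, c in other.terms.items():
            out[k] = out.get(k, 0.0) + c
        return Grassmann(out)

    def __neg__(self):
        return Grassmann({k: -c for k, c in self.terms.items()})

    def __mul__(self, other):
        out = {}
        for (k1, c1), (k2, c2) in product(self.terms.items(), other.terms.items()):
            if set(k1) & set(k2):
                continue  # a repeated generator squares to zero
            merged = k1 + k2
            # Sign of the permutation that sorts the concatenated indices.
            inversions = sum(1 for a in range(len(merged))
                             for b in range(a + 1, len(merged))
                             if merged[a] > merged[b])
            key = tuple(sorted(merged))
            out[key] = out.get(key, 0.0) + (-1) ** inversions * c1 * c2
        return Grassmann(out)

    def __eq__(self, other):
        return self.terms == other.terms

    def part(self, parity):
        """Even (parity 0) or odd (parity 1) part of the element."""
        return Grassmann({k: c for k, c in self.terms.items() if len(k) % 2 == parity})


b1, b2 = Grassmann.generator(1), Grassmann.generator(2)
print((b1 * b2).terms, (b2 * b1).terms, (b1 * b1).terms)
```

With this representation, generators anticommute, the square of any odd element vanishes, and `part(0)` and `part(1)` give the decomposition of an element into its even and odd pieces.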

The Grassmann algebra $\mathbb{R}_S$ is used to build the
(m, n)-dimensional superspace $\mathbb{R}_S^{m,n}$ in the following
way:


Definition 2. An (m,n)-dimensional superspace is 
the space 


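$$\mathbb{R}_S^{m,n} = \underbrace{\mathbb{R}_{S,0} \times \cdots \times \mathbb{R}_{S,0}}_{m\ \text{copies}} \times \underbrace{\mathbb{R}_{S,1} \times \cdots \times \mathbb{R}_{S,1}}_{n\ \text{copies}} \qquad [4]$$


A typical element of $\mathbb{R}_S^{m,n}$ is written as
$(x^1, \ldots, x^m; \xi^1, \ldots, \xi^n)$, where the convention is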
used that lower case Latin letters represent even 
objects and lower case Greek letters represent odd 
objects, while small capitals are used for objects of 
mixed or unspecified parity. 


As will be described in more detail below, in the 
geometric approach supermanifolds are spaces 
locally modeled on $\mathbb{R}_S^{m,n}$. In order to define a
supermanifold, we will need to define a topology
on this space, and to have some notion of
differentiation. Consider first multilinear functions
of purely anticommuting variables. If there are n
such variables, $\xi^1, \ldots, \xi^n$, then a multilinear function
F can be expressed in the form


$$F(\xi^1, \ldots, \xi^n) = F_\emptyset + \sum_{i=1}^{n} F_i\xi^i + \sum_{1 \le i < j \le n} F_{ij}\xi^i\xi^j + \cdots \qquad [5]$$


where the coefficients $F_\emptyset, F_i$ and so on are real
numbers. Such functions will be known (anticipating
the terminology for functions of both odd and even
variables) as supersmooth. (A useful notation will be
to write


$$F(\xi^1, \ldots, \xi^n) = \sum_\mu F_\mu\xi^\mu \qquad [6]$$


with $\mu$ a multi-index $\mu = \mu_1 \cdots \mu_k$, and
$\xi^\mu = \xi^{\mu_1} \cdots \xi^{\mu_k}$. The set of multi-indices is restricted to
those where $1 \le \mu_1 < \cdots < \mu_k \le n$.) More general
supersmooth functions, with the coefficients $F_\emptyset, \ldots$
taking values in $\mathbb{C}$, $\mathbb{R}_S$, or some other algebra are
also possible.

Differentiation of supersmooth functions of anti- 
commuting variables is defined by linearity together 
with the rule 


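$$\frac{\partial(\xi^{\mu_1} \cdots \xi^{\mu_k})}{\partial\xi^j} = \begin{cases} (-1)^{\ell-1}\,\xi^{\mu_1} \cdots \widehat{\xi^{\mu_\ell}} \cdots \xi^{\mu_k} & \text{if } j = \mu_\ell \\ 0 & \text{otherwise} \end{cases} \qquad [7]$$


where the caret ^ indicates an omitted factor.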
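
The differentiation rule can be sketched in a few lines of code. The sketch assumes the left-derivative convention, in which the variable is first moved to the front of the monomial (picking up one sign per variable it passes) and then removed; the function name `d_dxi` and the dictionary representation are hypothetical choices for this example.

```python
def d_dxi(poly, j):
    """Left derivative with respect to xi^j of a polynomial in anticommuting variables.

    `poly` maps strictly increasing index tuples (mu_1, ..., mu_k) to real coefficients.
    """
    out = {}
    for mu, coeff in poly.items():
        if j in mu:
            pos = mu.index(j)  # zero-based position of xi^j within the monomial
            rest = mu[:pos] + mu[pos + 1:]
            out[rest] = out.get(rest, 0.0) + (-1) ** pos * coeff
    return out


# F = 5 + 3 xi^1 + xi^1 xi^2
F = {(): 5.0, (1,): 3.0, (1, 2): 1.0}
print(d_dxi(F, 1), d_dxi(F, 2))
```

Applying the two derivatives in opposite orders to the term with two variables gives results of opposite sign, so in this convention the derivatives themselves anticommute.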

In order to extend the notion of supersmoothness
to functions on the more general superspace $\mathbb{R}_S^{m,n}$,
we should strictly take note of the fact that an even
Grassmann variable is not simply a real or complex
variable, as explained in the appendix. Assuming
this done, a supersmooth function on the general
superspace $\mathbb{R}_S^{m,n}$ can then be defined as a function of
the form


$$F(x^1, \ldots, x^m; \xi^1, \ldots, \xi^n) = \sum_\mu F_\mu(x^1, \ldots, x^m)\,\xi^\mu \qquad [8]$$


with each coefficient function $F_\mu$ a smooth function
on $\mathbb{R}^m$.

The final preparatory idea needed is the topology
on the superspace $\mathbb{R}_S^{m,n}$. It turns out that a coarse,
non-Hausdorff topology leads to most of the supermanifolds
used in physics. In order to define this
topology, we introduce a mapping


$$\epsilon : \mathbb{R}_S \to \mathbb{R}$$

defined by

$$\epsilon\Big(a_\emptyset 1 + \sum_i a_i\beta_i + \sum_{i<j} a_{ij}\beta_i\beta_j + \cdots\Big) = a_\emptyset \qquad [9]$$


and the related mapping

$$\epsilon : \mathbb{R}_S^{m,n} \to \mathbb{R}^m$$


defined by

$$\epsilon(x^1, \ldots, x^m; \xi^1, \ldots, \xi^n) = (\epsilon(x^1), \ldots, \epsilon(x^m)) \qquad [10]$$

These maps project out all the nilpotent Grassmann
generators, leaving simply the real part. The
topology involves the inverse of these projection
maps: a subset U of $\mathbb{R}_S^{m,n}$ is said to be open if and
only if there exists an open set V in $\mathbb{R}^m$ such that
$U = \epsilon^{-1}(V)$. Thus, an open set is unlimited in the
nilpotent directions.

In the sequel, where we consider integration, the
superdeterminant of the matrix M of an endomorphism
of a super vector space V will be useful.
If V is an (m,n)-dimensional super vector space
(so that $V_0$ has dimension m and $V_1$ dimension n),
then M will have the block form

$$M = \begin{pmatrix} M_{00} & M_{01} \\ M_{10} & M_{11} \end{pmatrix}$$

where the entries of $M_{00}$ and $M_{11}$ are even, whereas
those of $M_{01}$ and $M_{10}$ are odd. If $N = M^{-1}$ has block
form

$$N = \begin{pmatrix} N_{00} & N_{01} \\ N_{10} & N_{11} \end{pmatrix}$$

then the superdeterminant of M is defined by

$$\mathrm{Sdet}\,M = \det M_{00}\,\det N_{11}$$


It can be shown that the superdeterminant obeys the
product rule, unlike the obvious generalization of
the determinant to the super case.
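
As a small worked case, not taken from the source, consider $m = n = 1$ with $M_{00} = a$, $M_{01} = \beta$, $M_{10} = \gamma$, $M_{11} = d$, where $a$ and $d$ are assumed to be invertible even elements and $\beta$, $\gamma$ are odd. Using the standard block-inverse formula for $N_{11}$ and the fact that $(\gamma\beta)^2 = 0$:

```latex
N_{11} = \left(d - \gamma\, a^{-1} \beta\right)^{-1}
       = d^{-1}\left(1 + \frac{\gamma\beta}{a d}\right),
\qquad
\mathrm{Sdet}\, M = a\, N_{11} = \frac{a}{d} - \frac{\beta\gamma}{d^{2}} .
```

When $\beta = \gamma = 0$ this reduces to $a/d$, for which the product rule is evident; the odd entries contribute only a nilpotent correction.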


The Geometric Approach to 
Supermanifolds 


A manifold is a space locally modeled on the
topological space $\mathbb{R}^m$, where m is the dimension of
the manifold. Thus, each point in a manifold has a
neighborhood which is essentially a neighborhood in
$\mathbb{R}^m$. The most geometrically intuitive approach to
supermanifolds is to generalize this directly by
modeling a space locally on an extension of $\mathbb{R}^m$ to
include anticommuting variables; the most straightforward
space with the required algebraic property
is the superspace $\mathbb{R}_S^{m,n}$ built from a Grassmann
algebra, leading to a supermanifold of dimension
(m,n). (The dimension of a supermanifold is a pair
of integers, indicating the numbers of even and odd
coordinates of each point.)

The formal definition of a supermanifold will now 
be given in a manner very closely analogous to that 
of a classical manifold. 


Definition 3. Let M be a set. 


(i) An (m,n) open chart on M is a pair $(U, \phi)$ such
that U is a subset of M and $\phi$ is an injective map
of U into $\mathbb{R}_S^{m,n}$, with the image $\phi(U)$ an open set
in $\mathbb{R}_S^{m,n}$.

(ii) An (m,n) atlas on M is a collection $\{(U_\alpha, \phi_\alpha)\}$ of
(m,n) charts on M such that the $U_\alpha$ cover M
and, whenever $U_\alpha \cap U_\beta$ is not empty, the change
of coordinate function $\phi_\alpha \circ \phi_\beta^{-1}$ is supersmooth.


An (m,n)-dimensional supermanifold is a set M
together with a maximal (m,n) atlas on M.

The space M is given a topology by defining $U \subset M$
to be open if and only if, for each $\alpha$ such that $U \cap U_\alpha$
is not empty, the set $\phi_\alpha(U \cap U_\alpha)$ is an open subset
of $\mathbb{R}_S^{m,n}$.

Examples of supermanifolds include $\mathbb{R}_S^{m,n}$ itself, and
also supermanifolds constructed from the data of a
vector bundle over a classical manifold in a manner
which will now be described. If N is a classical
m-dimensional real manifold and E is an n-dimensional
vector bundle over N, then an (m,n)-dimensional
supermanifold can be constructed in the following
way: suppose that $\{(V_\alpha, \psi_\alpha)\}$ is an atlas of charts on N,
so that each $V_\alpha$ is an open subset of N and each $\psi_\alpha$ is
an injective map of $V_\alpha$ onto an open subset of $\mathbb{R}^m$,
with $\psi_\alpha \circ \psi_\beta^{-1}$ smooth. Suppose further that the $V_\alpha$ are
also local trivialization neighborhoods of the bundle E
with transition functions $g_{\alpha\beta} : V_\alpha \cap V_\beta \to GL(n)$.
Then we build the supermanifold M by patching
together the sets $\psi_\alpha(V_\alpha) \times \mathbb{R}_S^{0,n}$ in a consistent
way. This leads to a supermanifold with coordinate
change functions


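$$\phi_\alpha \circ \phi_\beta^{-1}(x^1, \ldots, x^m; \xi^1, \ldots, \xi^n) = \big(\psi_\alpha \circ \psi_\beta^{-1}(x^1, \ldots, x^m);\ \xi'^1, \ldots, \xi'^n\big)$$


where


$$\xi'^i = \sum_{k=1}^{n} g_{\alpha\beta}{}^{i}{}_{k}(x^1, \ldots, x^m)\,\xi^k \qquad [11]$$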

(Here again we refer to the appendix for the way in 
which functions of even Grassmann variables, as 
opposed simply to real numbers, are handled.) 
Particular examples of this construction are the 
tangent bundle over N and bundles of spinors over 
N. It was actually shown by Batchelor that all real, 
supersmooth supermanifolds are of this form. 

A similar definition may be made of a complex 
supermanifold using a complex Grassmann algebra, 
with the coordinate transition functions required to 
be superanalytic. In this case, supermanifolds which
are not related to vector bundles in the manner
described above are possible, basically because
partitions of unity do not exist in the analytic
setting. An example is the twisted supertorus, which
is built over the standard torus and has transition
functions $(z, \zeta) \to (z + 1, \zeta)$ and $(z, \zeta) \to (z + a + \alpha\zeta, \zeta + \alpha)$,
extending the standard torus with transition
functions $z \to z + 1$, $z \to z + a$. (Here $a, \alpha$ are,
respectively, even and odd constants.) This supermanifold
is an example of a super Riemann surface;
such surfaces play an important role in the quanti- 
zation of the spinning string. 

As with classical manifolds, a natural class of 
functions can be defined on a supermanifold:
a function f on an open subset U of the supermanifold
M is said to be supersmooth if, for each $\alpha$
such that $U \cap U_\alpha$ is nonempty, the function $f \circ \phi_\alpha^{-1}$ is
supersmooth on $\phi_\alpha(U \cap U_\alpha)$. In local coordinates
supersmooth functions are such that
$f \circ \phi_\alpha^{-1}(x^1, \ldots, x^m; \xi^1, \ldots, \xi^n) = \sum_\mu f_{\alpha\mu}(x^1, \ldots, x^m)\,\xi^\mu$ with
each $f_{\alpha\mu}$ a smooth function.


The Algebraic Approach to 
Supermanifolds 


In the algebraic approach to supermanifolds, it is the 
algebra of functions, rather than the manifold 
itself, which is extended to include anticommuting 
elements. In this approach an (m,n)-dimensional
supermanifold is defined to be a pair (N, A), where 
N is an m-dimensional classical manifold and A is a 
sheaf of superalgebras over N with various proper- 
ties, described below. The statement that A is a 
sheaf of algebras over N means that corresponding 
to each open subset U of N there is an algebra A(U); 
also, if $V \subset U$, there is a “restriction map” $\rho_{U,V}$
mapping A(U) into A(V), and the various restriction
maps obey certain consistency conditions. A particular
example of such a sheaf (with trivial odd part)
is the sheaf $A_0$ of real-valued functions on N, with
$A_0(U) = C^\infty(U)$, the set of real-valued smooth functions
on U, and $\rho_{U,V}$ mapping a function in $C^\infty(U)$
to its restriction in $C^\infty(V)$. The defining property of
the sheaf corresponding to an (m,n)-dimensional
supermanifold is that there is a cover $\{U_\alpha\}$ of N for
which the algebras $A(U_\alpha)$ have the form
$A(U_\alpha) \cong C^\infty(U_\alpha) \otimes \Lambda(\mathbb{R}^n)$, so that a typical element f of
$A(U_\alpha)$ may be expressed as $f = \sum_\mu f_\mu \xi^\mu$, where
$f_\mu \in C^\infty(U_\alpha)$ and $\xi^1,\dots,\xi^n$ are generators of $\Lambda(\mathbb{R}^n)$. The
notation here is chosen to emphasize the close
correspondence with the algebra of smooth functions
described at the end of the previous section.
This makes it clear that, despite an apparent
difference, the two approaches lead to essentially
equivalent supermanifolds. 

The advantage of the algebraic approach is its
mathematical elegance and economy (there is no
need to introduce the auxiliary Grassmann algebra
$\mathbb{R}_S$ in which coordinate functions take values), but
from the point of view of physicists, the geometric
point of view has two advantages: first, it is closer to
the standard manifold picture and thus easier to 
grasp, and, second, it allows a wider class of 
supermanifolds, because Grassmann constants are 
allowed; for instance, the twisted supertorus 
described above cannot be included in the algebraic 
approach without either introducing an auxiliary 
algebra or moving to the more difficult concept of a 
family of supermanifolds. 

While there have been various attempts to develop 
infinite-dimensional supermanifolds, most of the 
constructions have been developed for very specific 
purposes, such as path integration and functional 
integration methods for theories with fermions. 
Even the question of defining a basic infinite- 
dimensional superalgebra with the necessary 
analytic properties, such as a Hilbert—Banach super- 
algebra, requires sophisticated procedures, so that 
the development of a theory of infinite-dimensional 
supermanifolds becomes extremely technical. 


Calculus on Supermanifolds 


Much of the calculus of functions on supermanifolds
proceeds in simple analogy to that of classical
manifolds, with additional sign factors occurring whenever
two odd quantities are transposed. For instance, a
vector field on M may be described as a superderivation
of the algebra of supersmooth functions
on M, that is, a linear mapping X of this space obeying
the super Leibniz rule $X(fg) = (Xf)\,g + (-1)^{|X||f|} f\,(Xg)$.
Standard examples of vector fields (defined locally) are
coordinate derivatives $\partial/\partial x^i$ and $\partial/\partial \xi^j$, defined by
$(\partial/\partial x^i) f = \partial_i(f \circ \phi^{-1})$ and $(\partial/\partial \xi^j) f = \partial_{j+m}(f \circ \phi^{-1})$, with $\phi$
the coordinate function corresponding to the coordinates
$(x^1,\dots,x^m;\xi^1,\dots,\xi^n)$. Equipped with this concept
of vector field, much of differential calculus on
manifolds can be directly generalized to supermani- 
folds in a relatively straightforward way. However, in 
the case of integration the situation is quite different. 
The standard approach to integration of anticommut- 
ing variables is the Berezin integral, which is a formal, 
algebraic integral that is not an antiderivative and has 
no measure-theoretic features. There are various 
reasons why such an integral is used: for instance, 
even the simple function $\xi$ of a single anticommuting
variable has no antiderivative, while the topology on
$\mathbb{R}_S^{m,n}$ does not allow open sets which discriminate in
odd directions. Additionally, when changing variables
on $\mathbb{R}_S^{m,n}$ it is the superdeterminant of the Jacobian
matrix which must be used. In the purely odd sector,
differentials thus transform the “wrong” way.

The Berezin integral of a function f of n anti- 
commuting variables is defined by 


$$\int d^n\xi\, f(\xi^1,\dots,\xi^n) = f_{1 \dots n}$$ [12]


In other words, Berezin integration simply picks out 
the coefficient of the highest-order term, thus 
resembling differentiation more than integration in 
the classical sense. Nonetheless, the Berezin integral 
has very useful properties, in particular allowing 
direct analogues of Fourier transformations and
integral kernels. Given that it is the algebra of
functions, and the operators acting on these alge- 
bras, which is the key element in supergeometry, 
these are vital properties of the integral. 
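
The "pick out the top coefficient" rule can be made concrete with a small sketch. The representation below (a dictionary from sorted index tuples to coefficients) and the names `gmul` and `berezin` are illustrative assumptions, not notation from this article; the ordering convention assumed is that the integral returns the coefficient of $\xi^1\xi^2\cdots\xi^n$.

```python
from itertools import product

def gmul(f, g):
    """Multiply two Grassmann polynomials.

    Each polynomial is a dict {sorted tuple of generator indices: coefficient},
    e.g. {(): 3.0, (1, 2): 5.0} stands for 3 + 5 xi^1 xi^2.
    """
    out = {}
    for (mu, a), (nu, b) in product(f.items(), g.items()):
        if set(mu) & set(nu):
            continue  # a repeated odd generator squares to zero
        seq = mu + nu
        # Sign of the permutation that sorts the concatenated index string.
        inversions = sum(
            1
            for i in range(len(seq))
            for j in range(i + 1, len(seq))
            if seq[i] > seq[j]
        )
        key = tuple(sorted(seq))
        out[key] = out.get(key, 0) + (-1) ** inversions * a * b
    return {k: v for k, v in out.items() if v != 0}

def berezin(f, n):
    """Berezin integral over xi^1..xi^n: the coefficient of xi^1 xi^2 ... xi^n."""
    return f.get(tuple(range(1, n + 1)), 0)

xi1 = {(1,): 1}
xi2 = {(2,): 1}
f = {(): 3.0, (1,): 2.0, (1, 2): 5.0}  # 3 + 2 xi^1 + 5 xi^1 xi^2
```

In this sketch `gmul(xi1, xi2)` and `gmul(xi2, xi1)` differ by a sign, and `berezin(f, 2)` simply reads off the top coefficient of `f`, which is the sense in which the integral resembles differentiation.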

The transformation rule under change of variable
is the inverse of that which one expects. For
instance, in the case of a single variable, if one
makes the transformation $\xi \to \eta = a\xi + \beta$ with $a$ and
$\beta$ constants, a direct calculation shows that the
integral is invariant provided that one sets $d\xi = a\,d\eta$.
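
The direct calculation can be sketched as follows. The names are assumptions made here for illustration: the old variable is written $\xi$, the new one $\eta = a\xi + \beta$, with $a$ an invertible even constant and $\beta$ an odd constant.

```latex
% A function of one odd variable has only two terms:
f(\xi) = f_0 + f_1\,\xi , \qquad \int d\xi\, f(\xi) = f_1 .

% Substituting \xi = a^{-1}(\eta - \beta):
f = \big(f_0 - f_1 a^{-1}\beta\big) + f_1 a^{-1}\,\eta ,
\qquad \int d\eta\, f = f_1 a^{-1} .

% Hence the two integrals agree only if
\int d\xi\, f = a \int d\eta\, f
\quad\Longleftrightarrow\quad d\xi = a\, d\eta ,
% whereas an ordinary differential would give d\eta = a\, d\xi.
```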

Integration on $\mathbb{R}_S^{m,n}$ is essentially defined by
combining classical integration for the even variables
with Berezin integration for odd variables, giving

$$\int_{\epsilon^{-1}(V)} d^m x\, d^n\xi \Big( \sum_\mu f_\mu(x^1,\dots,x^m)\,\xi^\mu \Big) = \int_V d^m x\, f_{1 \dots n}(x^1,\dots,x^m)$$ [13]


This also defines integration on supermanifolds, 
provided that we can find a rule for the change of 
variable. This, as indicated above, may be done by 
using the superdeterminant of the Jacobian matrix. 
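Suppose that $(y,\theta)$ are a new set of coordinates on
our supermanifold. Then an invariant definition of
integral is obtained if we set

$$d^m y\, d^n\theta = \mathrm{Sdet}\begin{pmatrix} \partial y/\partial x & \partial y/\partial \xi \\ \partial \theta/\partial x & \partial \theta/\partial \xi \end{pmatrix} d^m x\, d^n\xi$$ [14]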
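
For orientation, the superdeterminant used in the change-of-variable rule can be written out in block form. This is the standard formula, stated here as background rather than taken from this article; A and D denote the even blocks and B and C the odd blocks of the Jacobian, with D invertible.

```latex
\mathrm{Sdet}\begin{pmatrix} A & B \\ C & D \end{pmatrix}
  = \frac{\det\!\big(A - B\,D^{-1}C\big)}{\det D} .

% Example with one even and one odd variable, y = b\,x and \theta = a\,\xi:
\mathrm{Sdet}\begin{pmatrix} b & 0 \\ 0 & a \end{pmatrix} = \frac{b}{a} ,
% so the even differential scales by b while the odd one scales by 1/a,
% in agreement with the single-variable Berezin rule d\xi = a\,d\theta.
```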


Appendix 


We now describe the device which allows functions 
of even Grassmann variables to be handled simply as 
functions of conventional variables. The necessary 
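class of functions is captured by defining supersmooth
functions on $\mathbb{R}_S^{m,0}$ as extensions by Taylor
expansion from smooth functions on $\mathbb{R}^m$.

Definition 4. The function $F: \mathbb{R}_S^{m,0} \to \mathbb{R}_S$ is said to
be supersmooth if there exists a smooth function
$\tilde{F}: \mathbb{R}^m \to \mathbb{R}_S$ such that

$$F(x^1,\dots,x^m) = \sum_{i_1=0}^{\infty} \cdots \sum_{i_m=0}^{\infty} \frac{1}{i_1! \cdots i_m!}\, \partial_1^{i_1} \cdots \partial_m^{i_m} \tilde{F}(\epsilon(x)) \times (x^1 - \epsilon(x^1)1)^{i_1} \cdots (x^m - \epsilon(x^m)1)^{i_m}$$ [15]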


(Although this Taylor series will in general be 
infinite, it gives well-defined coefficients for each 
G,, in the expansion [3], so that the value of F is a 
well-defined element of Rs.) A number of different 
classes of function can be obtained, by varying the 
space in which the function F takes its value. 
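
A minimal sketch of this Taylor-expansion device, under simplifying assumptions that are not part of the article: a single even variable, a real-valued smooth function, and a Grassmann algebra with only two generators, so that an even element has the form x = b + s β1β2 and its nilpotent part squares to zero. The function name `extend` is hypothetical.

```python
import math

def extend(f, df, body, soul_coeff):
    """Evaluate the Taylor extension of a smooth f at x = body + soul_coeff * beta1*beta2.

    Because (beta1*beta2)**2 = 0, the Taylor series stops after the
    first-order term. Returns (body part, coefficient of beta1*beta2).
    """
    return f(body), df(body) * soul_coeff

# x = 3 + 0.5 * beta1*beta2, extended through f(t) = t**2:
# x**2 = 9 + 2*3*0.5 * beta1*beta2, computed here via the Taylor rule.
square = extend(lambda t: t * t, lambda t: 2 * t, 3.0, 0.5)

# The same device applied to sin at body 0 with nilpotent coefficient 2.
sine = extend(math.sin, math.cos, 0.0, 2.0)
```

With more generators the nilpotent part has a higher order of nilpotency and more terms of the series survive, but the series still terminates for any finite number of generators.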


See also: Batalin—Vilkovisky Quantization; BRST 
Quantization; Graded Poisson Algebras; Path-Integrals in 
Non Commutative Geometry; Random Matrix Theory in 
Physics; Supergravity; Superstring Theories; 
Supersymmetric Particle Models; Supersymmetric 
Quantum Mechanics. 
Introduction


String theory postulates that all elementary particles 
in nature correspond to different vibration states of 
an underlying relativistic string. In the quantum 
theory both the frequencies and the amplitudes of 
vibration are quantized, so that the quantum states 
of a string are discrete. They can be characterized by 
their mass, spin, and various gauge charges. One of 
these states has zero mass and spin equal to 2ℏ, and
can be identified with the messenger of gravitational 
interactions, the graviton. Thus, string theory is a 
candidate for a unified theory of all fundamental 
interactions, including quantum gravity. 

In this article, we discuss the theory of superstrings 
as consistent theories of quantum gravity. The aim is 
to provide a quick (mostly lexicographic and biblio- 
graphic) entry to some of the salient features of the 
subject for a nonspecialist audience. Our treatment is
thus neither complete nor comprehensive; for that, several
excellent expert books exist, in particular those
by Green et al. (1987) and by Polchinski (1998). An
introductory textbook by Zwiebach (2004) is also 
highly recommended for beginners. Several other 
complementary reviews on various aspects of super- 
string theories are available on the internet (see the 
“Further reading” section); some more will be given 
as we proceed. 





Figure 1 A four-particle and a four-string interaction. 
The Five Superstring Theories


Theories of relativistic extended objects are tightly 
constrained by anomalies, that is, quantum viola- 
tions of classical symmetries. These arise because the 
classical trajectory of an extended p-dimensional
object (or “p-brane”) is described by the embedding
$X^\mu(\zeta^a)$, where $\zeta^{a=0,\dots,p}$ parametrize the brane world
volume, and $X^{\mu=0,\dots,D-1}$ are coordinates of the
target space. The quantum mechanics of a single
p-brane is therefore a (p + 1)-dimensional quantum 
field theory, and as such suffers a priori from 
ultraviolet divergences and anomalies. The case 
p =1 is special in that these problems can be exactly 
handled. The story for higher values of p is much 
more complicated, as will become apparent later on. 

The theory of ordinary loops in space is called 
closed bosonic string theory. The classical trajectory 
of a bosonic string extremizes the Nambu—Goto 
action (proportional to the invariant area of the 
world sheet) 


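$$S_F = -\frac{1}{2\pi\alpha'} \int d^2\zeta\, \sqrt{-\det\big(G_{\mu\nu}\,\partial_a X^\mu\,\partial_b X^\nu\big)}$$ [1]


where $G_{\mu\nu}(X)$ is the target-space metric, and $\alpha'$ is
the Regge slope (which is inversely proportional to
the string tension and has dimensions of length
squared). In flat spacetime, and for a conformal
choice of world-sheet parameters $\zeta^\pm = \zeta^0 \pm \zeta^1$, the
equations of motion read:


$$\partial_+ \partial_- X^\mu = 0 \quad\text{and}\quad \eta_{\mu\nu}\,\partial_\pm X^\mu\,\partial_\pm X^\nu = 0$$ [2]


with $\eta_{\mu\nu}$ the Minkowski metric. The $X^\mu$ are thus free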
two-dimensional fields, subject to quadratic phase- 
space constraints known as the Virasoro conditions. 
These can be solved consistently at the quantum 
level in the critical dimension D=26. Otherwise, 
the symmetries of eqns [2] are anomalous: either 
Lorentz invariance is broken, or there is a conformal 
anomaly leading to unitarity problems. (For D < 26, 
unitary noncritical string theories in highly curved 
rather than in the originally flat background can be 
constructed.) 
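
To make "free two-dimensional fields subject to quadratic constraints" concrete, the flat-space equations can be solved explicitly. The notation $X_L$, $X_R$ for left- and right-moving parts is the usual textbook one and is an assumption here, not taken from this article.

```latex
% General solution of the wave equation in light-cone world-sheet coordinates:
\partial_+\partial_- X^\mu = 0
\quad\Longrightarrow\quad
X^\mu(\zeta^+,\zeta^-) = X_L^\mu(\zeta^+) + X_R^\mu(\zeta^-) .

% The quadratic (Virasoro) constraints then act separately on each mover:
\eta_{\mu\nu}\,\partial_+ X_L^\mu\,\partial_+ X_L^\nu = 0 ,
\qquad
\eta_{\mu\nu}\,\partial_- X_R^\mu\,\partial_- X_R^\nu = 0 .
```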

Even for D=26, bosonic string theory is, how- 
ever, sick because its lowest-lying state is a tachyon, 

The finiteness of string perturbation theory has
been, strictly speaking, only established up to two 
loops — for a recent review see D’Hoker and Phong 
(2002). However, even though the technical pro- 
blem is open and hard, the qualitative case for all- 
order finiteness is convincing. It can be illustrated 
with the torus diagram which makes a one-loop 
contribution to string amplitudes. The thin torus of 
Figure 2 could be traced either by a short, light 
string propagating (virtually) for a long time, or by a 
long, heavy string propagating for a short period of 
time. In conventional field theory, these two virtual 
trajectories would have made distinct contributions 
to the amplitude, one in the infrared and the second 
in the ultraviolet region. In string theory, on the 
other hand, they are related by a modular transformation
(that exchanges $\zeta^0$ with $\zeta^1$) and must not,
therefore, be counted twice. A similar kind of 
argument shows that all potential divergences of 
string theory are infrared — they are therefore 
kinematical (i.e., occur for special values of the 
external momenta), or else they signal an instability 
of the vacuum and should cancel if one expands 
around a stable ground state. 

The low-energy limit of the heterotic and type I 
string theories is N=1 supergravity plus super 
Yang-Mills. In addition to the N=1 graviton 
multiplet, the massless spectrum now also includes 
gauge bosons and their associated gauginos. The 
two-derivative effective action in the heterotic case 
reads: 


1 10 26 
Sue = 55 j d xv —Ge 


2 
R + 40,606 + — tr(F F”) 
+ fermions [7]





where $\omega_3^{\rm gauge} = \mathrm{tr}(A\,dA + (2/3)A^3)$ is the Chern-Simons
gauge 3-form. Again, supersymmetry fixes
completely the above action: the only freedom is in
the choice of the gauge group and of the Yang-Mills
coupling $g_{\rm YM}$. Thus, up to redefinitions of the fields,
the type I theory has necessarily the same low-energy
limit.

Figure 2 The same torus diagram viewed in two different
channels.

The D=10 supergravity plus super Yang-Mills
has a hexagon diagram that gives rise to gauge and
gravitational anomalies, similar to the triangle
anomaly in D=4. It turns out that for the two
special groups $E_8 \times E_8$ and SO(32), the structure of
these anomalies is such that they can be canceled by
a combination of local counter-terms. One of them
is of the form $\int B_2 \wedge X_8(F,R)$, where $X_8$ is an 8-form
quartic in the curvature and/or Yang-Mills field
strength. The other is already present in the lower
line of expression [7], with the replacement
$\omega_3^{\rm gauge} \to \omega_3^{\rm gauge} - \omega_3^{\rm Lorentz}$, where the second Chern-Simons
form is built out of the spin connection.
Note that these modifications of the effective action 
involve terms with more than two derivatives, and 
are not required by supersymmetry at the classical 
level. The discovery by Green and Schwarz that 
string theory produces precisely these terms (from 
integrating out the massive string modes) was called 
the “first superstring revolution.” 


D-Branes 


A large window into the nonperturbative structure 
of string theory has been opened by the discovery of 
D(irichlet)-branes, and of strong/weak-coupling 
duality symmetries. A Dp brane is a solitonic 
p-dimensional excitation, defined indirectly by the 
property that open string endpoints can attach to its 
world volume (see Figure 3). Stable Dp branes exist 
in the type IIA and type IIB theories for p even, 
respectively, odd, and in the type I theory for p=1 
and 5. They are charged under the R-R (p + 1)-form 
potential or, for p > 4, under its magnetic dual. 
Strictly speaking, only for 0 ≤ p ≤ 6 do D-branes
resemble regular solitons (the word stands for
“solitary waves”). The D7 branes are more like
cosmic strings, the D8 branes are domain walls,
while the D9 branes are spacetime filling. Indeed,
type I string theory can be thought of as arising from
type IIB through the introduction of an orientifold
9-plane (required for tadpole cancelation) and of 32
D9 branes.

Figure 3 D-branes and open strings.

The low-energy dynamics of a Dp brane is 
described by a supersymmetric abelian gauge theory, 
reduced from ten down to p+ 1 dimensions. The 
gauge field multiplet includes 9—p real scalars, 
plus gauginos in the spinor representation of the 
R-symmetry group SO(9 — p). These are precisely 
the massless states of an open string with endpoints 
moving freely on a hyperplane. The real scalar fields 
are Goldstone modes of the broken translation 
invariance, that is, they are the transverse coordinate 
fields $Y(\zeta^a)$ of the D-brane. The bosonic part of the
low-energy effective action is the sum of a
Dirac-Born-Infeld (DBI) and a Chern-Simons (CS) like
term:

$$I_{Dp} = -T_p \int d^{p+1}\zeta\; e^{-\phi} \sqrt{-\det\big(\hat{G}_{ab} + \mathcal{F}_{ab}\big)} \;-\; \rho_p \int \sum_n \hat{C}_n \wedge e^{\mathcal{F}}$$ [8]

where $\mathcal{F}_{ab} = \hat{B}_{ab} + 2\pi\alpha' F_{ab}$, hats denote pullbacks
on the brane of bulk tensor fields (e.g.,
$\hat{G}_{ab} = G_{\mu\nu}\,\partial_a Y^\mu\,\partial_b Y^\nu$), $F_{ab}$ is the field strength of the
world-volume gauge field, and in the CS term
one is instructed to keep the (p+1)-form of the
expression under the integration sign. The constants
$T_p$ and $\rho_p$ are the tension and charge density of the
D-brane. As was the case for the effective supergravities,
the above action receives curvature
corrections that are higher order in the $\alpha'$ expansion.
Note however that a class of higher-order
terms has already been resummed in expression
[8]. These involve arbitrary powers of $F_{ab}$, and are
closely related (more precisely T-dual, see later) to
relativistic effects which can be important even in
the weak-acceleration limit. When referring to the
D9 branes of the type I superstring, the action [8]
includes the GS terms required to cancel the gauge
anomaly.

The tension and charge density of a Dp brane can 
be extracted from its coupling to the (closed-string) 
graviton and R-R (p + 1)-form, with the result: 


$$T_p^2 = \rho_p^2 = \frac{\pi}{\kappa_{10}^2}\,(4\pi^2\alpha')^{3-p}$$ [9]


The equality of tension and charge follows from 
unbroken supersymmetry, and is also known as a 
Bogomol'nyi-Prasad-Sommerfeld (BPS) condition.
It implies that two or more identical D-branes
exert no net static force on each other, because 
their R-R repulsion cancels exactly their gravita- 
tional attraction. A nontrivial check of the result 
[9] comes from the Dirac quantization condition 
(generalized to extended objects by Nepomechie 
and Teitelboim). Indeed, a Dp brane and a 
D(6 — p)-brane are dual excitations, like electric 
and magnetic charges in four dimensions, so their 
couplings must obey 


$$2\kappa_{10}^2\, \rho_p\, \rho_{6-p} = 2\pi k \quad\text{where}\quad k \in \mathbb{Z}$$ [10]


This ensures that the Dirac singularity of the long- 
range R-R fields of the branes does not lead to an 
observable Bohm-Aharonov phase. The couplings 
[9] obey this condition with k=1, so that D-branes 
carry the smallest allowed R-R charges in the 
theory. 
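
The statement that the couplings saturate the Dirac condition with k = 1 can be checked numerically. The sketch below assumes the charge density has the form $\rho_p = (\sqrt{\pi}/\kappa_{10})(2\pi\sqrt{\alpha'})^{3-p}$, which is how the tensions appear in Table 1; the function names and the sample values of $\kappa_{10}$ and $\alpha'$ are arbitrary illustrations.

```python
import math

def rho(p, kappa10, alpha_prime):
    """Assumed Dp-brane charge density: (sqrt(pi)/kappa10) * (2*pi*sqrt(alpha'))**(3 - p)."""
    return math.sqrt(math.pi) / kappa10 * (2 * math.pi * math.sqrt(alpha_prime)) ** (3 - p)

def dirac_k(p, kappa10, alpha_prime):
    """Solve 2 * kappa10**2 * rho_p * rho_(6-p) = 2 * pi * k for k."""
    product = rho(p, kappa10, alpha_prime) * rho(6 - p, kappa10, alpha_prime)
    return 2 * kappa10 ** 2 * product / (2 * math.pi)

# k for every dual pair Dp / D(6-p), at arbitrary sample parameter values.
k_values = [dirac_k(p, kappa10=0.7, alpha_prime=2.3) for p in range(7)]
```

The powers of $2\pi\sqrt{\alpha'}$ cancel between a brane and its dual, and the factors of $\kappa_{10}$ cancel against the prefactor, so k does not depend on the parameter values chosen.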

A simple but important observation is that open
strings living on a collection of n identical D-branes
have matrix-valued wave functions $\psi^{ij}$, where
$i, j = 1,\dots,n$ label the possible endpoints of the
string. The low-energy dynamics of the branes is
thus described by a nonabelian gauge theory, with
group U(n) if the open strings are oriented, and
SO(n) or Sp(n) if they are not. We have already
encountered such Chan-Paton factors in our discussion
of the type I superstring. More generally, this
simple property of D-branes has led to many insights 
on the geometric interpretation and engineering of 
gauge theories, which are reviewed in the articles 
Brane Construction of Gauge Theories and Gauge 
Theories from Strings. It has also placed on a firmer 
footing the idea of a brane world, according to 
which the fields and interactions of the standard 
model would be confined to a set of D-branes, while 
gravitons are free to propagate in the bulk (for 
reviews, see Brane Worlds and Lüst
(2004)). It has, finally, inspired the gauge/string
theory or AdS/CFT correspondence (see AdS/CFT
Correspondence and Aharony et al. (2000)) on
which we will comment later. 


Dualities and M Theory 


One other key role of D-branes has been to provide 
evidence for the various nonperturbative duality 
conjectures. Dual descriptions of the same physics 
arise also in conventional field theory. A prime 
example is the Montonen—Olive duality of four- 
dimensional, N=4 supersymmetric Yang-Mills, 
which is the low-energy theory describing the 
dynamics of a collection of D3 branes. The action 
for the gauge field and six associated scalars $\Phi^I$ (all in
the adjoint representation of the gauge group G) is


1 
+ fermionic terms [11]


Consider for simplicity the case G=SU(2). The
scalar potential has flat directions along which the
six $\Phi^I$ commute. By an SO(6) R-symmetry rotation,
we can set all but one of them to zero, and let
$\langle \mathrm{tr}(\Phi^1\Phi^1) \rangle = v^2$ in the vacuum. In this “Coulomb
phase” of the theory, a U(1) gauge multiplet stays
massless, while the charged states become massive
by the Higgs effect. The theory admits furthermore
smooth magnetic-monopole and dyon solutions, and
there is an elegant formula for their mass:

$$M = v\,|n_{\rm el} + \tau\, n_{\rm mg}| , \quad\text{where}\quad \tau = \frac{\theta}{2\pi} + \frac{4\pi i}{g^2}$$ [12]

and $n_{\rm el}$ ($n_{\rm mg}$) denotes the quantized electric (magnetic)
charge. This is a BPS formula that receives
no quantum corrections. It exhibits the SL(2, Z)
covariance of the theory,

and [13]

Here a, b, c, d are integers subject to the condition
ad − bc = 1. Of special importance is the transformation
$\tau \to -1/\tau$, which exchanges electric and
magnetic charges and (at least for $\theta = 0$) the strong-
with the weak-coupling regimes. For more details
see the review by Harvey (1996).
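
The SL(2, Z) covariance of the BPS formula can be illustrated numerically. Since the explicit transformation law is not reproduced above, the sketch assumes one consistent convention: $\tau \to (a\tau+b)/(c\tau+d)$ together with $(n_{\rm el}, n_{\rm mg}) \to (a\,n_{\rm el} - b\,n_{\rm mg},\ -c\,n_{\rm el} + d\,n_{\rm mg})$. It checks the mathematical fact that $|n_{\rm el} + \tau n_{\rm mg}|^2 / \mathrm{Im}\,\tau$ is invariant under this action; the function names are hypothetical.

```python
def sl2z_act(tau, n_el, n_mg, a, b, c, d):
    """Act with an SL(2,Z) element on the coupling and on the charges (assumed convention)."""
    if a * d - b * c != 1:
        raise ValueError("matrix must have unit determinant")
    tau_new = (a * tau + b) / (c * tau + d)
    return tau_new, a * n_el - b * n_mg, -c * n_el + d * n_mg

def bps_invariant(tau, n_el, n_mg):
    """|n_el + tau * n_mg|**2 / Im(tau), the SL(2,Z)-invariant combination."""
    return abs(n_el + tau * n_mg) ** 2 / tau.imag

tau0 = 0.3 + 1.7j
before = bps_invariant(tau0, 2, -1)
after = bps_invariant(*sl2z_act(tau0, 2, -1, a=2, b=1, c=1, d=1))

# The special element tau -> -1/tau swaps the roles of the two charges.
s_dual = sl2z_act(1.0j, 1, 0, a=0, b=-1, c=1, d=0)
```

In this convention a purely electric state (1, 0) is mapped by $\tau \to -1/\tau$ to a purely magnetic one, which is the exchange of electric and magnetic charges described in the text.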

The extension of these ideas to string theory can be
illustrated with the strong/weak-coupling duality
between the type I theory and the Spin(32)/Z_2
heterotic string. Both have the same massless spectrum
and low-energy action, whose form is dictated
entirely by supersymmetry. The only difference lies in
the relations between the string and supergravity
parameters. Eliminating the latter, one finds

$$\lambda_{\rm het} = \frac{1}{2\lambda_I} \quad\text{and}\quad \alpha'_{\rm het} = \sqrt{2}\,\lambda_I\,\alpha'_I$$ [14]

It is thus tempting to conjecture that the strongly
coupled type I theory has a dual description as a
weakly coupled heterotic string. These are, indeed,
the only known ultraviolet completions of the
theory [7]. Furthermore, for $\lambda_I \gg 1$, the D1 brane
of the type I theory becomes light, and could be
plausibly identified with the heterotic string. This
conjecture has been tested successfully by comparing 
various supersymmetry-protected quantities (such as 
the tensions of BPS excitations and special higher- 
derivative terms in the effective action), which can be 
calculated exactly either semiclassically, or at a given 
order in the perturbative expansion. Testing the duality 
for nonprotected quantities is a hard and important 
problem, which looks currently out of reach. 

The other three string theories also have well-motivated
dual descriptions at strong coupling $\lambda$.
The type IIB theory is believed to have an SL(2, Z)
symmetry, similar to that of the N=4 super Yang-Mills.
(Note that $\lambda$ is a dynamical parameter, that
changes with the vacuum expectation value of the
dilaton $\langle\phi\rangle$. Thus, dualities are discrete gauge
symmetries of string theory.) The type IIA theory
has a more surprising strong-coupling limit: it grows 
one extra dimension (of radius $R_{11} = \lambda\sqrt{\alpha'}$), and
can be approximated at low energy by the maximal 
11-dimensional supergravity of Cremmer, Julia, and 
Scherk. The latter is a very economical theory — its 
massless bosonic fields are only the graviton and a 
3-form potential A3. The bosonic part of the action 
reads 


1 
The electric and magnetic charges of the 3-form are a
(fundamental?) membrane and a solitonic 5-brane.
Standard Kaluza-Klein reduction on a circle maps $S_{11D}$
to the IIA supergravity action [6], where $G_{\mu\nu}$, $\phi$, and $C_1$
descend from the 11-dimensional graviton, and $B_2$ and
$C_3$ from the 3-form $A_3$. Furthermore, all BPS excitations
of the type IIA string theory have a counterpart in
11 dimensions, as summarized in Table 1. Finally, if 
one compactifies the eleventh dimension on an interval 
(rather than a circle), one finds the conjectured strong- 
coupling limit of the $E_8 \times E_8$ heterotic string.

The web of duality relations can be extended by 
compactifying further to D < 9 dimensions. Readers 
interested in more details should consult Polchinski 
(1998) or one of the many existing reviews of the 
subject (Townsend (1996), see also “Further Read- 
ing” section). In nine dimensions, in particular, the 
two type II theories, as well as the two heterotic 
superstrings, are pairwise T-dual. T-duality is a
perturbative symmetry (thus firmly established, not
only conjectured) which exchanges momentum and
winding modes. Putting together all the links one
arrives at the fully connected web of Figure 4. This
makes the point that all five consistent superstrings,
and also 11-dimensional supergravity, are limits of a
unique underlying structure called M theory. (For
lack of a better definition, “M” is sometimes also
used to denote the D=11 supergravity plus
supermembranes, as in Figure 4.) A background-independent
definition of M theory has remained
elusive. Attempts to define it as a matrix model of
D0 branes, or by quantizing a fundamental membrane,
proved interesting but incomplete. A difficulty
stems from the fact that in a generic
background, or in D=11 Minkowski spacetime,
there is only a dimensionful parameter fixing the
scale at which the theory becomes strongly coupled.

Table 1 BPS excitations of type IIA string theory, and their counterparts in M theory compactified on a circle of radius $R_{11}$

Tension                                                  Type IIA      M on S^1            Tension
$(\sqrt{\pi}/\kappa_{10})(2\pi\sqrt{\alpha'})^{3}$       D0 brane      K-K excitation      $1/R_{11}$
$T_F = (2\pi\alpha')^{-1}$                               String        Wrapped membrane    $2\pi R_{11}\,(2\pi^2/\kappa_{11}^2)^{1/3}$
$(\sqrt{\pi}/\kappa_{10})(2\pi\sqrt{\alpha'})$           D2 brane      Membrane            $(2\pi^2/\kappa_{11}^2)^{1/3}$
$(\sqrt{\pi}/\kappa_{10})(2\pi\sqrt{\alpha'})^{-1}$      D4 brane      Wrapped 5-brane     $R_{11}\,(2\pi^2/\kappa_{11}^2)^{2/3}$
$(\pi/\kappa_{10}^2)(2\pi\alpha')$                       NS-5-brane    5-brane             $(1/2\pi)\,(2\pi^2/\kappa_{11}^2)^{2/3}$
$(\sqrt{\pi}/\kappa_{10})(2\pi\sqrt{\alpha'})^{-3}$      D6 brane      K-K monopole        $2\pi^2 R_{11}^2/\kappa_{11}^2$

From Bachas CP (1997) Lectures on D-branes. In: Olive DI and West PC (eds.) Duality and Supersymmetric Theories, Proceedings,
Easter School, Newton Institute, Euroconference, Cambridge, UK, April 7-18. With permission of Cambridge University Press.

Figure 4 Web of dualities in nine dimensions. From Bachas CP
(1997) Lectures on D-branes. In: Olive DI and West PC (eds.)
Duality and Supersymmetric Theories, Proceedings, Easter School,
Newton Institute, Euroconference, Cambridge, UK, April 7-18. With
permission of Cambridge University Press.
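
The exchange of momentum and winding modes under T-duality can be illustrated with the mass formula of a closed bosonic string on a circle of radius R. The formula used below is the standard textbook one, $M^2 = (n/R)^2 + (wR/\alpha')^2 + (2/\alpha')(N + \tilde{N} - 2)$, and is an assumption here since it does not appear in this article; the function name is hypothetical.

```python
def mass_squared(n, w, R, alpha_prime, N=0, N_tilde=0):
    """Assumed closed-string spectrum on a circle.

    n: momentum number, w: winding number, R: radius,
    N, N_tilde: left and right oscillator levels.
    """
    momentum = (n / R) ** 2
    winding = (w * R / alpha_prime) ** 2
    oscillators = 2.0 / alpha_prime * (N + N_tilde - 2)
    return momentum + winding + oscillators

alpha_prime = 2.0
R = 0.4
R_dual = alpha_prime / R  # T-dual radius

original = mass_squared(n=3, w=1, R=R, alpha_prime=alpha_prime, N=4, N_tilde=1)
dual = mass_squared(n=1, w=3, R=R_dual, alpha_prime=alpha_prime, N=4, N_tilde=1)
```

Swapping n with w while replacing R by $\alpha'/R$ leaves the spectrum unchanged, which is why a string cannot distinguish a small circle from a large one.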


Other Developments and Outlook 


We have not discussed in this brief review some 
important developments covered in other contributions
to the encyclopedia. For the reader’s convenience,
and for completeness, we enumerate (some
of) them giving the appropriate cross-references:

Compactification. To make contact with the
standard model of particle physics, one has to
compactify string theory on a six-dimensional
manifold. There is an embarrassment of riches,
but no completely realistic vacuum and, more
significantly, no guiding dynamical principle to 
help us decide (see Compactification of Superstring 
Theory). The controlled (and phenomenologically 
required) breaking of spacetime supersymmetry is 
also a problem. 

Conformal field theory and quantum geometry. 
The algebraic tools of 2D conformal field theory, 
both bulk and boundary (see Two-Dimensional 
Conformal Field Theory and Vertex Operator 
Algebras), play an important role in string theory. 
They allow, in certain cases, a resummation of $\alpha'$
effects, thereby probing the regime where classical 
geometric notions do not apply. 

Microscopic models of black holes. Charged extre- 
mal black holes can be modeled in string theory by BPS 
configurations of D-branes. This has led to the first 
microscopic derivation of the Bekenstein—-Hawking 
entropy formula, a result expected from any consistent 
theory of quantum gravity. As with the tests of duality, 
the extension of these results to neutral black holes is a 
difficult open problem — see Branes and Black Hole 
Statistical Mechanics. 

AdS/CFT and holography. A new type of (holo- 
graphic) duality is the one that relates supersym- 
metric gauge theories in four dimensions to string 
theory in asymptotically anti-de Sitter spacetimes. 
The sharpest and best-tested version of this duality 
relates N=4 super Yang-Mills to string theory in 
$AdS_5 \times S^5$. Solving the $\sigma$-model in this latter background
is one of the keys to further progress in the
subject (see AdS/CFT Correspondence). 

String phenomenology. Finding an experimental
confirmation of string theory is clearly one of the most
pressing outstanding questions. There exist several
interesting possibilities for this: cosmic strings, large
extra dimensions, modifications of gravity, primordial
cosmology (see String Theory: Phenomenology for a
review). Here we point out the one supporting piece
of experimental evidence: the unification of the
gauge couplings of the (supersymmetric, minimal)
standard model at a scale close to, but below, the
Planck scale, as illustrated in Figure 5. This is a
generic “prediction” of string theory, especially in its
heterotic version.

Figure 5 The unification of couplings.


See also: AdS/CFT Correspondence; Boundary 
Conformal Field Theory; Brane Construction of Gauge 
Theories; Brane Worlds; Branes and Black Hole 
Statistical Mechanics; Compactification of Superstring 
Theory; Derived Categories; Electroweak Theory; 
Gauge Theories from Strings; Noncommutative 
Geometry from Strings; Supermanifolds; String Field 
Theory; String Theory: Phenomenology; Supergravity; 
Two-Dimensional Conformal Field Theory and Vertex 
Operator Algebras; Wheeler—DeWitt Theory. 
Introduction


Supersymmetric quantum field theories (see Super- 
gravity) are characterized by the existence of one 
(N=1 supersymmetry) or several (N > 1 extended 
supersymmetry) conserved Noether-like charges 
$Q_A$, $A = 1,\dots,N$, which establish symmetry links
between particle states of different spin. Super- 
symmetry ensures equal numbers of bosonic and 
fermionic particle states. If it is exact, bosons and 
fermions related by supersymmetry transformations 
have equal masses. Moreover, supersymmetry 
imposes stringent relations between interactions
which involve particles of different spin. This gives 
rise to a special ultraviolet behavior of supersym- 
metric theories. Their ultraviolet divergences are 
much softer than in nonsupersymmetric theories. In 
particular, N=4 supersymmetric quantum field 
theories are finite and for any N they are free from 
quadratic divergences plaguing ordinary theories 
with elementary scalars. N >4 supersymmetric 
theories necessarily involve particles of spin higher 
than 1 and are not renormalizable. Supersymmetry 
promoted to a local symmetry includes gravity. 
Only N=1 supersymmetric theories allow for 
chiral fermions which are the fundamental objects in 
elementary particle interactions (see Standard Model 
of Particle Physics). This is because parity and 


charge conjugation symmetries are violated in weak 
interactions. Therefore, N > 1 theories may not be 
of immediate phenomenological relevance. How- 
ever, they may be useful for constructing super- 
symmetric theories in more than four dimensions 
(more than three spatial dimensions). Chiral (effec- 
tive) theory in four dimensions can be then obtained 
after compactification of extra dimensions. For 
instance, N=2 theory in five dimensions (x,, y) 
compactified on a circle with reflection symmetry 
y—-—y (orbifold compactification) gives chiral 
N =1 theory in four dimensions. 

Absence of quadratic divergences in supersym- 
metric theories is the main argument supporting the 
belief that fundamental interactions of elementary 
particles at energies not higher than O(1 TeV) should
be described by an (approximately) N=1 super- 
symmetric extension of the standard model (SM). 
Indeed, supersymmetric models elegantly solve the 
so-called hierarchy problem of the SM. At present, 
supersymmetry remains a theoretical hypothesis. 
No experimental evidence for it has been found yet 
(for experimental lower bounds on the masses of 
supersymmetric particles see Eidelman et al. (2004)). 
Supersymmetric models will be tested experimentally 
at the Large Hadron Collider at CERN (Geneva), after
the completion of its construction in 2007. Super- 
gravity theories may be physically relevant as an 
intermediate step in constructing phenomenologically 
viable models from superstring theories. 

The essence of the hierarchy problem of the
standard model (SM), the successful $SU(3)_c \times
SU(2)_L \times U(1)_Y$ gauge theory of interactions of
quarks and leptons at energies up to about 100 GeV,
is the following. By itself, the SM does not explain the
value of the Fermi scale $v$ of the electroweak
$SU(2)_L \times U(1)_Y$ symmetry breaking ($v \sim G_F^{-1/2}$, where
$G_F$ is the Fermi constant determined by the lifetime
of the muon). Indeed, in the SM, the electroweak
symmetry breaking is realized by an elementary Higgs
field $H$ (an $SU(2)_L$ doublet) with a potential

$$V = m^2 H^\dagger H + \frac{\lambda}{2}\,(H^\dagger H)^2$$   [1]


where $m^2$ and $\lambda$ are free parameters of the SM. When
$m^2 < 0$ is chosen, the minimum of the potential
occurs when

$$\langle H^\dagger H \rangle = -\frac{m^2}{\lambda} = \frac{v^2}{2}$$   [2]

that is, the Higgs doublet acquires an $SU(2)_L \times U(1)_Y$
breaking vacuum expectation value $v$, which is just
the Fermi scale. The masses of the intermediate
vector bosons $W^\pm$ and $Z^0$ are proportional to $v$ and
depend also on the gauge couplings. Within the SM
understood as a theory with the momentum cut-off
$\Lambda_{SM}$, quantum corrections to the mass parameter $m^2$
in eqn [1] are quadratically divergent:


3 
sm = = (383 bgt Anuta [3] 


Here, $g_1$, $g_2$, and $y_t$ are the gauge couplings of the
groups $U(1)_Y$, $SU(2)_L$, and the top-quark Yukawa
coupling, respectively. This means that if, above the
energy scale $\Lambda_{SM}$, the SM is replaced by some more
fundamental theory, in which there are particles of
masses $M > \Lambda_{SM}$, the quantum corrections to $m^2$ are
quadratically dependent on the new mass scale $M$.
For $M \gg v$, this is very unnatural even if the original
parameter $m^2$ remains a free parameter of this
underlying theory, and particularly difficult to accept
if in the underlying theory $m^2$ is fixed by some more
fundamental considerations. If the SM were the
correct theory up to, for example, the mass scale
suggested by the see-saw mechanism for the neutrino
masses, $\Lambda_{SM} \sim 10^{15}$ GeV, then

$$|\delta m^2| \sim 10^{28}\ \mathrm{GeV}^2 \sim 10^{24}\, v^2$$


Clearly, this excludes the possibility of understand- 
ing the magnitude of the Fermi scale v in any 
sensible way. Thus, for naturalness of the Higgs 
mechanism in the SM there should exist a new mass 
scale $M \gtrsim v$, say only one order of magnitude higher
than v and the theory describing the physics above 
that scale should be free of quadratic divergences. 
(Approximate) supersymmetry is at present the most 
elegant and theoretically most complete solution to 
the hierarchy problem of the SM. 


Supersymmetric Extensions of the SM 


In supersymmetry, the gauge fields $A^a_\mu$ are promoted
to vector superfields $V^a = (A^a_\mu, \lambda^a, D^a)$, one for each
gauge symmetry group generator, where the $\lambda^a$ are
Weyl fermions (called gauginos) and the $D^a$ are
nondynamical auxiliary fields. A renormalizable
supersymmetric gauge theory is completely defined
(see, e.g., Sohnius (1985) and Wess and Bagger
(1992)) by specifying the gauge group, the set of
chiral supermultiplets $\Phi_i = (\phi_i, \psi_i, F_i)$ representing
matter fields, and the superpotential, a holomorphic
polynomial function of at most third
order in the chiral superfields which determines the
Yukawa couplings of the fermions $\psi_i$ and scalars $\phi_j$.
The auxiliary fields $D^a$ and $F_i$ can be eliminated via their
(algebraic) equations of motion.

The so-called minimal supersymmetric SM
(MSSM) encodes the main features of any supersymmetric
extension of the SM. Its gauge group is
$SU(3) \times SU(2) \times U(1)$, the same as in the SM, and
a chiral superfield is associated with each of the
SM quark and lepton fields. Thus, quarks and
leptons get scalar (spin-zero) superpartners, the
squarks and sleptons, carrying the same quantum
numbers as their corresponding fermions, and the
vector superfields provide spin-1/2 superpartners for
the gauge fields: the gluinos, the winos, and the
bino. The SM Higgs doublet with weak hypercharge
$Y = 1/2$ becomes a scalar component of a chiral
superfield $\hat{H}_u$, which contains in addition one
doublet of Weyl fermions, the Higgsinos. The
chiral anomaly cancellation condition requires that
there be also a second Higgs chiral superfield $\hat{H}_d$
with $Y = -1/2$. Such a superfield is also required for
giving masses to all flavors of quarks; because of the
holomorphicity of the superpotential, the same Higgs
doublet cannot couple simultaneously to all quarks.

With the MSSM superfield content, the most 
general renormalizable superpotential consistent 
with the gauge symmetry has the form 


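$$W = Y_u \hat{U}^c \hat{Q} \hat{H}_u + Y_d \hat{D}^c \hat{Q} \hat{H}_d + Y_e \hat{E}^c \hat{L} \hat{H}_d + \mu \hat{H}_u \hat{H}_d$$
$$\qquad + \lambda_1 \hat{D}^c \hat{Q} \hat{L} + \lambda_2 \hat{E}^c \hat{L} \hat{L} + \lambda_3 \hat{U}^c \hat{D}^c \hat{D}^c + \mu' \hat{L} \hat{H}_u$$   [4]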


(flavor indices are suppressed) where the superfield $\hat{Q}$
contains the $SU(2)$ quark doublet $Q$ and its scalar
superpartner $\tilde{Q}$, and similarly for the lepton doublet
$\hat{L}$, quark singlet $\hat{U}^c$, $\hat{D}^c$, and lepton singlet $\hat{E}^c$
superfields. The first three terms in [4] give the SM-like
Yukawa couplings of quarks and leptons to the Higgs 
fields together with Yukawa couplings of the corre- 
sponding superpartners. The fourth term has no SM 
analogy; it gives supersymmetric masses to the Higgs 
scalar and Higgsinos. The interactions in the second 
line do not conserve baryon and lepton numbers, 
respectively B and L, and should be forbidden (or 
strongly suppressed) by some additional symmetry of 
the theory, as they would lead to rapid proton decay. A
discrete symmetry called R-parity, $R = (-1)^{2S+3B+L}$,
where $S$ is the spin of the field, is an interesting
possibility. R-parity acts differently on the different 
components of the superfields: it is even for all SM 
particles and odd for their superpartners. Its conserva- 
tion implies that superpartners must appear in pairs in 
any interaction vertex. Thus, with R-parity imposed, 
the lightest supersymmetric particle is stable and it is an 
excellent candidate for the dark matter in the universe. 
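
The assignment of R-parity to individual fields can be checked with a short calculation. The sketch below assumes the standard definition $R = (-1)^{2S+3B+L}$ and uses a small, purely illustrative table of fields with their conventional spin, baryon number, and lepton number; the function name `r_parity` and the table are hypothetical helpers introduced for this example only.

```python
from fractions import Fraction


def r_parity(spin, baryon, lepton):
    """Return R = (-1)**(2S + 3B + L), assuming the standard definition.

    Arguments may be ints or strings such as "1/2"; exact rational
    arithmetic avoids rounding problems with B = 1/3.
    """
    exponent = 2 * Fraction(spin) + 3 * Fraction(baryon) + Fraction(lepton)
    if exponent.denominator != 1:
        raise ValueError("2S + 3B + L must be an integer")
    return 1 if exponent.numerator % 2 == 0 else -1


# (spin S, baryon number B, lepton number L); illustrative entries only.
FIELDS = {
    "quark":    ("1/2", "1/3", 0),
    "squark":   (0,     "1/3", 0),
    "lepton":   ("1/2", 0,     1),
    "slepton":  (0,     0,     1),
    "gluon":    (1,     0,     0),
    "gluino":   ("1/2", 0,     0),
    "Higgs":    (0,     0,     0),
    "Higgsino": ("1/2", 0,     0),
}

if __name__ == "__main__":
    for name, quantum_numbers in FIELDS.items():
        print(f"{name:9s} R = {r_parity(*quantum_numbers):+d}")
```

With these inputs every SM field comes out R-even and every superpartner R-odd, which is the pattern described above.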

Supersymmetry cannot be an exact symmetry of 
nature because there do not exist elementary fermions 
and bosons degenerate in mass. The superpotential 
[4] does not break supersymmetry spontaneously but 
even if it did the elementary fermions and bosons 
would on average have equal masses (they would 
satisfy some mass sum rule), which is also
contradicted by the experimental data. Therefore, in
the MSSM, supersymmetry has to be broken expli- 
citly but in such a way that the soft ultraviolet 
behavior remains intact. Remarkably, the super- 
symmetry breaking terms which can be added to the 
MSSM Lagrangian without reintroducing quadratic 
divergences make heavy just those fields which are 
opposite-statistics superpartners of the SM gauge
bosons and fermions. These so-called soft terms are: 


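$$\mathcal{L}_{soft} = -\tfrac{1}{2} M_3\, \tilde{G}^a \tilde{G}^a - \tfrac{1}{2} M_2\, \tilde{W}^i \tilde{W}^i - \tfrac{1}{2} M_1\, \tilde{B}\tilde{B}$$
$$\qquad - m_Q^2 |\tilde{Q}|^2 - m_U^2 |\tilde{U}^c|^2 - m_D^2 |\tilde{D}^c|^2$$
$$\qquad - m_L^2 |\tilde{L}|^2 - m_E^2 |\tilde{E}^c|^2 - m_{H_u}^2 |H_u|^2$$
$$\qquad - m_{H_d}^2 |H_d|^2 - m_3^2 (H_u H_d + \text{c.c.})$$
$$\qquad + A_U \tilde{U}^c \tilde{Q} H_u + A_D \tilde{D}^c \tilde{Q} H_d + A_E \tilde{E}^c \tilde{L} H_d$$   [5]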


and yield gaugino (gluino $\tilde{G}$, wino $\tilde{W}$, and bino $\tilde{B}$)
and scalar mass terms as well as explicit trilinear
couplings between scalars (scalar mass terms and
A-terms are $3 \times 3$ matrices in the flavor space). As a
result, supersymmetry is broken in the mass spectra 
but not in the dimensionless couplings. 

The origin of the soft supersymmetry breaking 
remains an open issue. Terms [5] are most probably
remnants of the spontaneous supersymmetry break- 
ing in the so-called “hidden” sector — a hypothetical 
set of fields that do not interact directly with the 
MSSM fields. For example, in the popular scenario, 
they interact with the MSSM fields only gravitation- 
ally and spontaneous supersymmetry breaking in the 
hidden sector is communicated to the MSSM sector 
by gravitational interactions giving rise to terms [5]. 
Several other mechanisms of supersymmetry break- 
ing transmission have also been proposed (gauge 
mediation, anomaly mediation, etc.). 

The mass parameters and A-terms in [5] are free 
parameters of the low-energy supersymmetric theory 
and, combined with interactions like $Q\tilde{Q}\tilde{G}$
originating from supersymmetric kinetic terms, may 
be a new, troublesome, source of flavor changing 
neutral currents and of CP violation. 


Higgs Sector of the MSSM 
The MSSM Higgs potential reads 


$$V = m_1^2 |H_d|^2 + m_2^2 |H_u|^2 + m_3^2 (H_u H_d + \text{c.c.}) + \ldots$$   [6]

Its quartic part is uniquely determined by the
structure of the supersymmetric gauge theory. The
parameters $m_1^2$, $m_2^2$, and $m_3^2$ are determined by
the soft supersymmetry breaking Higgs boson
masses [5] and the $\mu$ parameter in [4]. The potential
[6] is bounded from below for $m_1^2 + m_2^2 > 2|m_3^2|$, and
for $m_1^2 m_2^2 - m_3^4 < 0$ it has the electroweak symmetry
breaking minimum at $v_u = \langle H_u^0 \rangle \neq 0$, $v_d = \langle H_d^0 \rangle \neq 0$.
The ratio $v_u/v_d = \tan\beta$ is then phenomenologically
a very important parameter.

Quantum corrections to the mass parameters in
[6] are controlled by the mass scale $M_{soft}$ of the
supersymmetry breaking terms [5]; at the one-loop
level, instead of [3], one finds
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mia ~ e? (323 + gi - 12y%,.) M rln Me . [7] 
where $y_b$ and $y_t$ are the bottom- and top-quark
Yukawa couplings, respectively, and $\Lambda_{NEW}$ is the scale
at which the soft supersymmetry breaking terms are
generated by the putative supersymmetry breaking
transmission mechanism. In gravity mediation scenarios,
$\Lambda_{NEW} \sim M_{Pl}$. In gauge mediation scenarios, $\Lambda_{NEW}$
is low, but it is a new scale, introduced by hand.

In the softly broken supersymmetric models, the
hierarchy problem is solved for $M_{soft} \lesssim O(10)\,v$.
Moreover, eqn [7] shows that via quantum corrections
the large top-quark Yukawa coupling $y_t$ drives
the mass parameter $m_2^2$ to a negative value, inducing
the electroweak symmetry breaking. This means that
in supersymmetric models the electroweak scale is
calculable in terms of the known coupling constants,
the (unknown) scale $M_{soft}$, and the cutoff scale
$\Lambda_{NEW}$ of the MSSM. If $M_{soft} \lesssim O(10)\,v$, the correct
electroweak scale is obtained for $\Lambda_{NEW} \sim M_{GUT}$.
This nicely fits with unification of the gauge
couplings.

In supersymmetric models, the quartic couplings 
in the Higgs potential are restricted. This typically 
leads to a strong upper bound on the mass of the 
lightest Higgs particle. In the minimal model with 
the potential [6], at the tree level 


$$M_{Higgs} < M_Z \approx 91\ \mathrm{GeV}$$   [8]


This bound is substantially modified by quantum
corrections. They depend quadratically on the top-quark
mass and logarithmically on the stop mass
scale $M_{\tilde{t}} \sim M_{soft}$:

$$M_{Higgs}^2 < \lambda v^2$$   [9]


where $\lambda$ is given by
AS 1(g5 F gi) cos” 23+ A 
3g3 mi Mi 


t 


(10) 


For $M_{\tilde{t}} < 1$ TeV, $M_{Higgs} < 130$ GeV.




The minimal-model bound on the Higgs mass can
be relaxed in models with an extended Higgs sector.
For instance, if an additional gauge group singlet
chiral superfield couples to the Higgs doublets, the
Higgs self-coupling $\lambda$ in [9] receives additional
contributions. Explicit calculations show that in
such and other models, with $M_{soft} \lesssim 1$ TeV, the
bound on the Higgs mass cannot be raised above
$\sim 150$ GeV if one wants to preserve perturbative
gauge coupling unification.


Supersymmetric Grand Unified Theories 


There are two striking aspects of the matter
spectrum in the SM. One is the cancellation of chiral
anomalies (Weinberg 1996-2000, Pokorski 2000),
which is necessary for a unitary (and renormalizable)
theory, and occurs thanks to a certain conspiracy
between quarks and leptons, suggesting a deeper link
between them. The second one is that the spectrum
fits into simple representations of the $SU(5)$ and
$SO(10)$ groups (Ross 1985). Indeed, each generation
of the SM matter fills the $5^* + 10 + 1$ (if the right-handed
neutrino is included in the spectrum) representations
of $SU(5)$, and for $SO(10)$, $16 = 5^* + 10 + 1$. The
assignment of fermions to the $SU(5)$ or $SO(10)$
representations fixes the normalization of the $U(1)_Y$
generator. Both facts suggest unification of the strong and
electroweak elementary forces in a grand unified
theory with some bigger gauge symmetry group. Such
unification implies that all the SM gauge forces
become of equal strength at some unification scale.
Their strength is measured by the running gauge
couplings $\alpha_i = g_i^2/4\pi$, $i = 1, 2, 3$, of the three group
factors $SU(3)_c \times SU(2)_L \times U(1)_Y$. The energy scale
dependence of $\alpha_i$ is governed by the renormalization
group equations. In the first nontrivial approximation,





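they read:

$$\frac{1}{\alpha_i(Q)} = \frac{1}{\alpha_i(M_Z)} - \frac{b_i^0}{2\pi} \ln\frac{Q}{M_Z}$$   [11]

Here, $1/\alpha_i(M_Z) = (58.98 \pm 0.04,\ 29.57 \pm 0.03,\
8.40 \pm 0.14)$ are the experimental values of the
gauge couplings at the Fermi scale and $b_i^0$ are the
coefficients which depend on the matter content of
the theory. They are

$$b_i^0 = \left(\tfrac{1}{10} + \tfrac{4}{3} N_g,\ -\tfrac{43}{6} + \tfrac{4}{3} N_g,\ -11 + \tfrac{4}{3} N_g\right)$$

in the SM and

$$b_i^0 = \left(\tfrac{3}{5} + 2N_g,\ -5 + 2N_g,\ -9 + 2N_g\right)$$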


in the MSSM, where $N_g$ is the number of fermion
generations. In the SM, the running gauge couplings 
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approach each other at high scale of order 101° GeV 
but never unify. 

In the MSSM, with a sparticle spectrum characterized
by $M_{soft} \approx 1$ TeV and for the initial Fermi scale
values given above, the three gauge couplings unify
with high precision at the scale $M_{GUT} \sim 10^{16}$ GeV.
Therefore, the MSSM can be embedded into supersymmetric
grand unified theories with no hierarchy
problem for the Fermi scale (it is stable with respect
to radiative corrections generated by particles with
masses $\sim M_{GUT}$) and no conflict with the measured
values of the gauge couplings.
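
The difference between the SM and the MSSM running can be illustrated numerically. The following is a minimal sketch, assuming the one-loop solution $1/\alpha_i(Q) = 1/\alpha_i(M_Z) - (b_i/2\pi)\ln(Q/M_Z)$ with the coefficients $b_i = (1/10 + 4N_g/3,\ -43/6 + 4N_g/3,\ -11 + 4N_g/3)$ for the SM and $(3/5 + 2N_g,\ -5 + 2N_g,\ -9 + 2N_g)$ for the MSSM, and the central values of $1/\alpha_i(M_Z)$ quoted above. The numerical value of $M_Z$, the function names, and the simplification of using the MSSM coefficients all the way down to $M_Z$ (that is, ignoring the sparticle threshold at $M_{soft}$ and all higher-loop effects) are assumptions of this example.

```python
import math

M_Z = 91.19  # GeV; assumed numerical value of the reference scale
INV_ALPHA_MZ = (58.98, 29.57, 8.40)  # central values of 1/alpha_i(M_Z) from the text


def beta_coefficients(model, n_gen=3):
    """One-loop coefficients (b_1, b_2, b_3) for the SM or the MSSM."""
    if model == "SM":
        return (0.1 + 4.0 * n_gen / 3.0,
                -43.0 / 6.0 + 4.0 * n_gen / 3.0,
                -11.0 + 4.0 * n_gen / 3.0)
    if model == "MSSM":
        return (0.6 + 2.0 * n_gen, -5.0 + 2.0 * n_gen, -9.0 + 2.0 * n_gen)
    raise ValueError("model must be 'SM' or 'MSSM'")


def inverse_couplings(scale, model, n_gen=3):
    """Return (1/alpha_1, 1/alpha_2, 1/alpha_3) at `scale` in GeV (one loop)."""
    t = math.log(scale / M_Z)
    b = beta_coefficients(model, n_gen)
    return tuple(a - b_i * t / (2.0 * math.pi) for a, b_i in zip(INV_ALPHA_MZ, b))


def crossing_scale(i, j, model, n_gen=3):
    """Scale in GeV at which alpha_i = alpha_j (0-based indices i, j)."""
    b = beta_coefficients(model, n_gen)
    t = 2.0 * math.pi * (INV_ALPHA_MZ[i] - INV_ALPHA_MZ[j]) / (b[i] - b[j])
    return M_Z * math.exp(t)


def unification_mismatch(model, n_gen=3):
    """Return the scale where alpha_1 = alpha_2 and 1/alpha_3 - 1/alpha_2 there."""
    q12 = crossing_scale(0, 1, model, n_gen)
    inv = inverse_couplings(q12, model, n_gen)
    return q12, inv[2] - inv[1]


if __name__ == "__main__":
    for model in ("SM", "MSSM"):
        q12, gap = unification_mismatch(model)
        print(f"{model}: alpha_1 = alpha_2 at {q12:.2e} GeV, "
              f"1/alpha_3 - 1/alpha_2 = {gap:+.2f}")
```

In this crude approximation the MSSM couplings meet near $2 \times 10^{16}$ GeV with a residual mismatch comparable to the experimental uncertainty on $1/\alpha_3(M_Z)$, whereas in the SM the third coupling misses the crossing point of the other two by several units of $1/\alpha$.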

In the SM, the baryon number is (perturbatively)
conserved since there are no renormalizable couplings
violating this symmetry. Experimental search for
proton decay, for example, $p \to e^+\pi^0$, $p \to K^+\bar\nu$, is
one of the most fundamental tests for particle physics.
The present limit on the proton lifetime is $\tau_p >
10^{33}$ yr. In grand unified theories, baryon number
conservation is violated by interactions mediated by
the heavy gauge bosons corresponding to the enlarged
gauge symmetry (e.g., $SU(5)$), spontaneously broken at
$M_{GUT}$ to the SM gauge symmetry. Such interactions
manifest themselves at low energy as additional,
nonrenormalizable interactions added to the SM
Lagrangian. Proton decay is then induced by the set
of dimension-6 operators of the form


$$O_i^{(6)} = \frac{c_i^{(6)}}{M_{(6)}^2}\; qqql$$   [12]


where $q$, $l$ denote quarks and leptons, respectively.
For $c_i^{(6)} \sim \alpha_{GUT} \approx 1/25$, the experimental limit on
$\tau_p$ requires $M_{(6)} \gtrsim 10^{15}$ GeV, consistent with
$M_{GUT} \approx 10^{16}$ GeV in supersymmetric GUTs.
However, in supersymmetric GUTs, there is still another,
genuinely supersymmetric, source of contributions 
to the proton decay amplitudes. These are the
dimension-5 operators

$$O_i^{(5)} = \frac{c_i^{(5)}}{M_{(5)}}\; qq\tilde{q}\tilde{l}$$   [13]





where $\tilde{q}$, $\tilde{l}$ denote squarks and sleptons, respectively.
Such operators originate from the exchange of the
color triplet scalars present in the Higgs boson GUT
multiplets, with $M_{(5)} \sim M_{GUT} \sim 10^{16}$ GeV, and
c)>10° is given by the Yukawa couplings. 
Inserted into diagrams with gaugino exchanges, they
give rise to dimension-6 operators of the form [12].
One then gets $c_i^{(6)} = c_i^{(5)} \alpha_{GUT}$, $M_{(6)}^2 = M_{(5)} M_{SUSY}$.
Given various uncertainties, for example, in the
unknown squark, gaugino, and heavy Higgs boson
mass spectrum, such contributions in supersymmetric
GUT models predict the proton lifetime to
be consistent with but close to the present experimental
limits.


Summary 


Supersymmetry is distinct in several very important 
points from all other proposed solutions to the 
hierarchy problem. First of all, it provides a general 
theoretical framework which allows one to address 
many physical questions. Supersymmetric models, 
like the MSSM or its simple extensions, satisfy a 
very important criterion of “perturbative calculabil- 
ity.” In particular, they are easily consistent with 
the precision electroweak data. The SM is their 
low-energy approximation in the sense of the 
Appelquist-Carazzone decoupling, so most of the 
successful structure of the SM is built into super- 
symmetric models. The quadratically divergent quan- 
tum corrections to the Higgs mass parameter (the 
origin of the hierarchy problem in the SM) are absent 
in any order of perturbation theory. Therefore, the 
cutoff to a supersymmetric theory can be as high as 
the Planck scale, and “small” scale of the electroweak 
breaking is still natural. Supersymmetry is not only 
consistent with grand unification of elementary forces 
but, in fact, makes it very successful. And, finally, 
supersymmetry is needed for string theory. 

However, there are also some problems to be solved:
the hierarchy problem of the electroweak scale is solved,
but the origin of the soft supersymmetry breaking scale
$M_{soft}$ remains an open question: spontaneous supersymmetry
breaking and its transmission to the visible
sector is a difficult problem, and a fully satisfactory
mechanism which would yield $M_{soft}$ hierarchically
smaller than the Planck (string) scale has not yet been
found. On the phenomenological side, there are new
potential sources of flavor-changing neutral current 
transitions and of CP violation, and baryon and lepton 
numbers are not automatically conserved by the 
renormalizable couplings. But even those problems 
can at least be discussed in a concrete quantitative way. 


See also: Brane Construction of Gauge Theories; 
Perturbation Theory and its Techniques; Seiberg—Witten 
Theory; Standard Model of Particle Physics; 
Supergravity; Supermanifolds. 
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Introduction 


Supersymmetric quantum mechanics is a specific 
extension of quantum mechanics with fermionic 
degrees of freedom. In quantum field theory and 
many-body theory, a fermionic degree of freedom is 
one which is subject to Pauli's principle: any
nondegenerate quantum state associated with a 
fermionic degree of freedom can be occupied at 
most once at any time. Similarly, in quantum 
mechanics, one associates a fermionic degree of 
freedom with an observable, the eigenvalue spec- 
trum of which is restricted to the discrete set (0, 1). 

The simplest example of a purely fermionic
quantum system is the fermionic oscillator. It is
represented by conjugate operators $(f, f^\dagger)$ such that

$$f^2 = 0, \qquad (f^\dagger)^2 = 0, \qquad f f^\dagger + f^\dagger f = 1$$   [1]

with a Hamiltonian $H_f$ given by the bilinear
expression

$$H_f = \varepsilon_f + \hbar\omega\, f^\dagger f$$   [2]


The state space of this system is spanned by two
independent state vectors $|0\rangle$ and $|1\rangle$, such that

$$f|0\rangle = 0, \qquad f^\dagger|0\rangle = |1\rangle, \qquad f|1\rangle = |0\rangle, \qquad f^\dagger|1\rangle = 0$$   [3]

By construction, the states $|n_f\rangle$ are eigenstates of
fermion number,

$$N_f = f^\dagger f, \qquad N_f |n_f\rangle = n_f |n_f\rangle$$   [4]

with eigenvalue $n_f = (0, 1)$; this implements the Pauli
principle. The states have energy eigenvalues

$$E_{n_f} = \varepsilon_f + n_f \hbar\omega, \qquad n_f = (0, 1)$$   [5]

differing in energy by $\Delta E = \hbar\omega$. Physically, the system
can be identified with a single fixed magnetic dipole in
an external magnetic field, the only polarization states
of the dipole being spin up or spin down.
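
The algebra of the fermionic oscillator can be made concrete with $2 \times 2$ matrices. The sketch below assumes the basis ordering $|0\rangle = (1, 0)$, $|1\rangle = (0, 1)$, which is a choice made for this illustration, and the Hamiltonian $H_f = \varepsilon_f + \hbar\omega f^\dagger f$; the function name and the numerical parameter values are hypothetical.

```python
import numpy as np

# Basis ordering (a choice made for this sketch): |0> = (1, 0), |1> = (0, 1).
KET0 = np.array([1.0, 0.0])
KET1 = np.array([0.0, 1.0])

# Lowering operator: f|1> = |0>, f|0> = 0. The raising operator is its transpose.
F = np.array([[0.0, 1.0],
              [0.0, 0.0]])
F_DAG = F.T

# Fermion number operator N_f = f^dagger f, with eigenvalues 0 and 1.
N_F = F_DAG @ F


def fermionic_hamiltonian(eps_f, hbar_omega):
    """Return H_f = eps_f + hbar*omega * f^dagger f as a 2x2 matrix."""
    return eps_f * np.eye(2) + hbar_omega * N_F


if __name__ == "__main__":
    print("f^2 =\n", F @ F)
    print("f f^dagger + f^dagger f =\n", F @ F_DAG + F_DAG @ F)
    print("spectrum:", np.linalg.eigvalsh(fermionic_hamiltonian(-0.5, 1.0)))
```

The two energy levels differ by $\hbar\omega$, independently of the choice of $\varepsilon_f$.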

In the Schrödinger representation of quantum 
mechanics (wave mechanics), fermionic degrees of 
freedom are represented by anticommuting Grassmann 
variables. These have no immediate classical analog, 
but can be used to construct quasiclassical obser- 
vables like spin. 

A supersymmetric quantum system is a system 
possessing both fermionic and bosonic degrees of 
freedom, characterized by a degeneracy between 
states with even and odd fermion number. In the 
Schrödinger representation, this is manifest in a 
symmetry transforming bosonic (Grassmann-even) 
into fermionic (Grassmann-odd) variables. The 
generators of the supersymmetry transformations 
square to the Hamiltonian of the system. 


The Supersymmetric Oscillator 


An elementary example of a supersymmetric quantum
system is the supersymmetric oscillator. It is a
physical system combining a standard bosonic
quantum oscillator with a fermionic oscillator of
the same frequency. The ordinary harmonic oscillator
is described by the pair of lowering and raising
operators $(b, b^\dagger)$, with commutator

$$b b^\dagger - b^\dagger b = 1$$   [6]

and the Hamiltonian

$$H_b = \varepsilon_b + \hbar\omega\, b^\dagger b$$   [7]

In this case, the eigenvalue spectrum of the occupation
number

$$N_b = b^\dagger b$$   [8]

consists of all non-negative integers $n_b = 0, 1, 2, \ldots$,
with corresponding energy eigenvalues. To construct
the supersymmetric oscillator, the harmonic oscillator
is combined with a fermionic oscillator [2] of the
same frequency:

$$H_s = \varepsilon_0 + \hbar\omega\,(b^\dagger b + f^\dagger f)$$   [9]

where $\varepsilon_0 = \varepsilon_b + \varepsilon_f$. The ground state of this system is
the state annihilated by both $b$ and $f$:

$$b|0,0\rangle = f|0,0\rangle = 0$$   [10]


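The full set of energy eigenstates of the system is
constructed by taking

$$|n_b, n_f\rangle = \frac{1}{\sqrt{n_b!}}\,(b^\dagger)^{n_b} (f^\dagger)^{n_f}\,|0,0\rangle, \qquad n_b = 0, 1, 2, \ldots;\ \ n_f = (0, 1)$$   [11]

with the energy eigenvalue spectrum

$$E(n_b, n_f) = \varepsilon_0 + n\hbar\omega, \qquad n = n_b + n_f$$   [12]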


Clearly, there is a degeneracy in energy between the
states $|n_b + 1, 0\rangle$ and $|n_b, 1\rangle$, which have the same
total occupation number $n$, but differ in the bosonic
and fermionic occupation number by one unit. This
is illustrated in Figure 1. Such pairs of states which
are degenerate in energy can be transformed into
each other by the operators

$$Q = \sqrt{2\hbar\omega}\; b^\dagger f, \qquad Q^\dagger = \sqrt{2\hbar\omega}\; f^\dagger b$$   [13]

The explicit transformations are

$$Q\,|n_b, 1\rangle = \sqrt{2\hbar\omega (n_b + 1)}\;|n_b + 1, 0\rangle, \qquad Q^\dagger |n_b + 1, 0\rangle = \sqrt{2\hbar\omega (n_b + 1)}\;|n_b, 1\rangle$$   [14]

The operations [14] are called supersymmetry
transformations, and the operators $Q$ and $Q^\dagger$ are
called supercharges.

As the zero point of energy is arbitrary in systems
without gravitational interactions, it is customary to
take $\varepsilon_0 = 0$, that is, $\varepsilon_f = -\varepsilon_b$; with the normalization
[13], the Hamiltonian $H$ is then the symmetrized
absolute square of the supercharges:

$$Q Q^\dagger + Q^\dagger Q = 2H$$   [15]


Figure 1 Spectrum of states of the supersymmetric oscillator. The levels $E/\hbar\omega = 0, 1, 2, \ldots$ are labeled by the states $(n_b, n_f)$: the single state $(0, 0)$ at $E = 0$, and the degenerate pair $(n-1, 1)$, $(n, 0)$ at each level $E/\hbar\omega = n \geq 1$.


whilst

$$Q^2 = (Q^\dagger)^2 = 0$$   [16]

The above relations suffice to guarantee that the
supercharges $(Q, Q^\dagger)$ are conserved:

$$[Q, H] = [Q^\dagger, H] = 0$$   [17]

a result re-expressing the degeneracy between states
with the same $n$ but different $n_b$ and $n_f$. The real
form of the supercharges is

$$Q_1 = \tfrac{1}{2}(Q + Q^\dagger), \qquad Q_2 = \tfrac{1}{2i}(Q - Q^\dagger)$$   [18]

In this representation

$$H = Q_1^2 + Q_2^2$$   [19]

An important observation is that the ground state is the
only state annihilated by both supersymmetry operators:

$$Q|0,0\rangle = 0, \qquad Q^\dagger|0,0\rangle = 0$$   [20]


Indeed, it is the only state with zero energy 
eigenvalue, and only such a state can be an 
invariant supersinglet; all other states have positive 
energy and they necessarily occur in supersymmetry 
pairs. 
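
The supersymmetry algebra of the oscillator can be verified numerically on a truncated Fock space. The following sketch assumes the supercharge $Q = \sqrt{2\hbar\omega}\, b^\dagger f$ and the Hamiltonian $H = \hbar\omega(b^\dagger b + f^\dagger f)$ with $\varepsilon_0 = 0$; the cutoff `n_max` on the bosonic occupation number, the ordering of the basis, and the function name are assumptions of this example. Because the truncation spoils $b b^\dagger - b^\dagger b = 1$ on the highest bosonic level, the anticommutator is compared with $2H$ only below the cutoff.

```python
import numpy as np


def supersymmetric_oscillator(n_max, hbar_omega=1.0):
    """Return (Q, H) on the space spanned by |n_b, n_f>, n_b <= n_max.

    Basis index = 2 * n_b + n_f, so that boson and fermion operators are
    Kronecker products of the single-mode matrices.
    """
    dim = n_max + 1
    # Bosonic lowering operator: b|n> = sqrt(n)|n-1>.
    b = np.diag(np.sqrt(np.arange(1, dim)), k=1)
    # Fermionic lowering operator in the basis |0> = (1, 0), |1> = (0, 1).
    f = np.array([[0.0, 1.0], [0.0, 0.0]])
    big_b = np.kron(b, np.eye(2))
    big_f = np.kron(np.eye(dim), f)
    supercharge = np.sqrt(2.0 * hbar_omega) * big_b.T @ big_f
    hamiltonian = hbar_omega * (big_b.T @ big_b + big_f.T @ big_f)
    return supercharge, hamiltonian


if __name__ == "__main__":
    n_max = 5
    q, h = supersymmetric_oscillator(n_max)
    keep = 2 * n_max  # states with n_b < n_max are unaffected by the cutoff
    anticommutator = q @ q.T + q.T @ q
    print("Q Q^dagger + Q^dagger Q = 2H below the cutoff:",
          np.allclose(anticommutator[:keep, :keep], 2.0 * h[:keep, :keep]))
    print("Q^2 = 0:", np.allclose(q @ q, 0.0))
    print("[Q, H] = 0:", np.allclose(q @ h - h @ q, 0.0))
    print("lowest levels:", np.sort(np.diag(h))[:7])
```

The printed levels show a single state at zero energy followed by doubly degenerate levels, as in the spectrum described above.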


Anticommuting Variables 


Fermionic degrees of freedom can be described in a
pseudoclassical formulation by anticommuting variables
$\xi$ taking values in an infinite-dimensional
Grassmann algebra:


cf + &€=0 [21] 


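With an anticommuting variable $\xi$, we can associate
a derivative operator $\partial/\partial\xi$, which is an element of
another Grassmann algebra such that

$$\frac{\partial}{\partial\xi}\,\xi + \xi\,\frac{\partial}{\partial\xi} = 1, \qquad \frac{\partial^2}{\partial\xi^2} = 0$$   [22]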


This extends the original Grassmann algebra to a
Clifford algebra. Integration with respect to an
anticommuting variable is defined in the same
way:

$$\int d\xi\; \xi = 1, \qquad \int d\xi\; 1 = 0$$   [23]

that is, integration is the same as differentiation for
anticommuting variables. With these definitions,
we can represent the fermionic raising and lowering
operators in terms of anticommuting variables as

$$f^\dagger \to \xi, \qquad f \to \frac{\partial}{\partial\xi}$$   [24]

and the states by

$$|0\rangle \to 1, \qquad |1\rangle \to \xi$$   [25]

Then an arbitrary state takes the form of a linear
superposition

$$|\Psi\rangle = \psi_0 |0\rangle + \psi_1 |1\rangle \ \to\ \Psi(\xi) = \psi_0 + \psi_1 \xi$$   [26]


and the standard positive-semidefinite inner product 
on the state space is represented on the wave 
functions by the double integral 


(|v) = / dé dze€ (BVE = dodo + pipi [27] 


By construction, $f^\dagger = \xi$ and $f = \partial/\partial\xi$ are conjugates
with respect to this inner product:


)= feg =) w/e) [28 


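The real (self-conjugate) forms of the fermion
operators are, therefore, defined by

$$\sigma_1 = \left(\xi + \frac{\partial}{\partial\xi}\right), \qquad \sigma_2 = i\left(\xi - \frac{\partial}{\partial\xi}\right)$$   [29]

which satisfy the Pauli-Dirac anticommutation
relations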


J dé dé eS O* (EEY (E 


$$\sigma_i \sigma_j + \sigma_j \sigma_i = 2\delta_{ij}$$   [30]

By taking the product, we obtain

$$\sigma_3 = -i\sigma_1\sigma_2 = 1 - 2\xi\frac{\partial}{\partial\xi} = 1 - 2N_f \quad\Rightarrow\quad N_f = \tfrac{1}{2}(1 - \sigma_3)$$   [31]

Thus, we may think of the wave functions as two-component
spinors, the components being labeled
either by the eigenvalues of the spin operator $\sigma_3$, or
equivalently by the fermion number $N_f$, which is a
projection operator on the states with negative spin.
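
The identification of the fermion operators with Pauli matrices can be checked explicitly. In the sketch below a wave function $\Psi(\xi) = \psi_0 + \psi_1\xi$ is stored as the column vector $(\psi_0, \psi_1)$, so that multiplication by $\xi$ and the derivative $\partial/\partial\xi$ become $2 \times 2$ matrices; this representation, and the definitions $\sigma_1 = \xi + \partial/\partial\xi$, $\sigma_2 = i(\xi - \partial/\partial\xi)$, $\sigma_3 = -i\sigma_1\sigma_2$, are the assumptions on which the example rests.

```python
import numpy as np

# Psi(xi) = psi_0 + psi_1 * xi is stored as the vector (psi_0, psi_1).
# Multiplication by xi maps 1 -> xi and xi -> xi^2 = 0.
XI = np.array([[0, 0],
               [1, 0]], dtype=complex)
# The derivative d/dxi maps xi -> 1 and 1 -> 0.
D_XI = np.array([[0, 1],
                 [0, 0]], dtype=complex)

SIGMA_1 = XI + D_XI
SIGMA_2 = 1j * (XI - D_XI)
SIGMA_3 = -1j * SIGMA_1 @ SIGMA_2

# Fermion number N_f = f^dagger f = xi d/dxi.
NUMBER = XI @ D_XI


def anticommutator(a, b):
    """Return the anticommutator a b + b a of two matrices."""
    return a @ b + b @ a


if __name__ == "__main__":
    sigmas = (SIGMA_1, SIGMA_2, SIGMA_3)
    for i, s_i in enumerate(sigmas):
        for j, s_j in enumerate(sigmas):
            expected = 2.0 * np.eye(2) if i == j else np.zeros((2, 2))
            assert np.allclose(anticommutator(s_i, s_j), expected)
    print("sigma_3 =\n", SIGMA_3.real)
    print("N_f = (1 - sigma_3)/2:",
          np.allclose(NUMBER, 0.5 * (np.eye(2) - SIGMA_3)))
```

In this basis $\sigma_3$ is diagonal with entries $(+1, -1)$, so the state with $n_f = 1$ is the one with negative spin.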

The action of the Hamiltonian on a wave function 
W(€) is represented by the integral 


HUI) = | de dgeOHEHHE) [82 
where H(E, £) is the ordered symbol of the Hamiltonian: 


A(E, £) = ef + hugg [33] 


This expression is to be considered as the classical 
Hamiltonian of the system. In particular, the 
exponent of the action 


= N; = 


2 
= J dt (ibzé — H(E,2) 


1 


2 ati = 
=h / di(igé toče + erta) [34] 




provides the integrand for the path-integral representation
of the evolution operator in the quantum
theory. The proof is not given here; the reader is
referred to the literature. In passing, note that as the
anticommuting variables $(\xi, \bar\xi)$ are taken to be
dimensionless, one actually should identify the
momentum conjugate to $\xi$ with $\pi = -i\hbar\bar\xi$; in the
quantum theory, this is replaced by the operator
$-i\hbar\,\partial/\partial\xi$.


Classical Supersymmetry 


The classical action for the supersymmetric oscilla- 
tor with bosonic amplitude x and fermionic ampli- 


tude € is 
2 
_ 12 
- f(s 


As inferred from the quantum theory, it is a
combination of a linear harmonic oscillator and a
fermionic oscillator of the same frequency. A factor
$\sqrt{\hbar}$ is also absorbed in $\xi$ and $\bar\xi$; equivalently, we can
use natural units in which $\hbar = 1$. In the following,
we use this convention.

The action [35] is invariant under infinitesimal 
symmetry transformations 


6x = —i(€€ + e£) 
ÔE = (x + iwx)e, 6& = 


z ~ i we) 35] 


(x — lwx)é ad 


with $(\epsilon, \bar\epsilon)$ Grassmann-odd parameters. The Noether
theorem then implies that there are conserved
fermionic charges


Q = (p - iwx)é&, Q=(pt+iux)E [37] 


with the momentum defined by $p = \dot{x}$. The other
conserved quantity is the energy, represented by the 
Hamiltonian 


H= ; (p? + wx?) + wl [38] 


The canonical phase-space formulation is obtained 
by defining brackets of two functions (A, B) on the 


phase space (x,p;&,€) b 
ðAðB OAOB 
a aT ðp Ox 


OAOB OAOB 


where (—1)" is the Grassmann parity of A. In terms 
of these brackets, the time evolution and super- 
symmetry transformations take the form 


A = —{H, A}, A=ifeO+c«O,A} [40] 
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Moreover, the charges O and O satisfy the bracket 
algebra

$$\{Q, \bar{Q}\} = -2iH, \qquad \{Q, H\} = \{\bar{Q}, H\} = 0$$   [41]

Thus, the action [35] is the classical counterpart of
the quantum theory [9]-[17] in the correspondence
limit $i\{A, B\} \to [A, B]_+ = AB + BA$. For these theories,
supersymmetry is rooted in the classical
transformations [36].


Supersymmetric Quantum Mechanics 


The construction for the supersymmetric oscillator
can be generalized to other dynamical systems in
two ways. First, the nature of the interactions as
represented by the potential can be modified.
Second, the number of degrees of freedom can be
varied. This section presents a generalization of the
supersymmetric oscillator to anharmonic interactions,
obtained by modification of the supercharges
[37] with a general function $\Phi(x)$ as follows:

$$Q = (p - i\Phi(x))\,\xi, \qquad \bar{Q} = (p + i\Phi(x))\,\bar\xi$$   [42]


The brackets [39] imply the supersymmetry algebra 
[41] with the Hamiltonian 


1 1 1_, 8 
=5P +5h (x) +5O(*)/(E-€) [43] 


In quantum mechanics, the supercharges become
operators $Q$ and $Q^\dagger$ upon reinterpretation of $(x, p)$
as canonically conjugate operators, and the replacement
$\xi \to f^\dagger$ and $\bar\xi \to f$; this procedure involves no
ordering ambiguity. The Hamiltonian operator
defined by the anticommutator of $Q$ and $Q^\dagger$ then
takes the operator form associated with [43]. With
the identification

$$A = \frac{1}{\sqrt{2}}\,(p - i\Phi), \qquad A^\dagger = \frac{1}{\sqrt{2}}\,(p + i\Phi)$$   [44]

and making use of the (anti)commutation relations

$$A A^\dagger - A^\dagger A = \Phi'(x), \qquad f f^\dagger + f^\dagger f = 1$$   [45]

this Hamilton operator can be written in normal-ordered
form as

$$H = \tfrac{1}{2}(Q Q^\dagger + Q^\dagger Q) = A^\dagger A + \Phi'(x)\, f^\dagger f$$   [46]

It is positive-semidefinite by construction. All results
for the supersymmetric oscillator are reproduced
upon taking $\Phi(x) = \omega x$.

As the Hamiltonian commutes with the fermion
number operator $N_f$, we can label all stationary
states $|E, n_f\rangle$ by the energy $E$ and the fermion
number $n_f = (0, 1)$. Moreover, all states of positive
energy are degenerate with respect to fermion
number, as they form pairs related by
supersymmetry:

$$Q\,|E, 0\rangle = \sqrt{2E}\;|E, 1\rangle, \qquad Q^\dagger |E, 1\rangle = \sqrt{2E}\;|E, 0\rangle$$   [47]


Only ground states with $E_0 = 0$ can occur as singlets
under supersymmetry. The existence of such a
ground state with fermion number $n_f$ amounts to
the existence of a state $|0, n_f\rangle$ satisfying

$$Q\,|0, n_f\rangle = Q^\dagger |0, n_f\rangle = 0$$   [48]

The corresponding wave functions are of the form

$$|0, 0\rangle \to \Psi_0(x, \xi) = \psi_-(x), \qquad |0, 1\rangle \to \Psi_0(x, \xi) = \psi_+(x)\,\xi$$   [49]

where $\psi_\pm(x)$ are solutions of the equations

$$A\,\psi_- = 0, \qquad A^\dagger \psi_+ = 0$$   [50]

These functions are formally given by the
expressions

$$\psi_\pm(x) = C\, e^{\pm\int_0^x \Phi(y)\, dy}$$   [51]

For a zero-energy ground state to exist, one of these
functions must be normalizable. For example, if
$\Phi(x)$ is a polynomial of positive odd degree $2k - 1$,
then, depending on the sign of the coefficient of
$x^{2k-1}$, one of the exponents is bounded, approaching
zero for $x \to \pm\infty$, and as a result becomes square
integrable.

If no normalizable wave functions of the form
[51] exist, the ground state cannot have zero energy
($E_0 > 0$) and all states necessarily belong to
superdoublets.
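
The pairing of positive-energy levels and the appearance of a zero-energy singlet can be illustrated numerically. The sketch below assumes units with $\hbar = 1$ and the sector Hamiltonians that follow from $H = A^\dagger A + \Phi'(x) f^\dagger f$ with $A = (p - i\Phi)/\sqrt{2}$, namely $H_0 = \tfrac{1}{2}(p^2 + \Phi^2 - \Phi')$ for $n_f = 0$ and $H_1 = \tfrac{1}{2}(p^2 + \Phi^2 + \Phi')$ for $n_f = 1$. The finite-difference grid, the box size, and the function names are choices made for this example, and the computed eigenvalues carry a small discretization error.

```python
import numpy as np


def sector_hamiltonians(phi, dphi, x_max=10.0, n_points=401):
    """Discretize H_0 and H_1 on a uniform grid with hard walls at +-x_max.

    `phi` and `dphi` are callables returning Phi(x) and Phi'(x) on an array.
    """
    x = np.linspace(-x_max, x_max, n_points)
    step = x[1] - x[0]
    # Second-order central difference for p^2 = -d^2/dx^2.
    p_squared = (2.0 * np.eye(n_points)
                 - np.eye(n_points, k=1)
                 - np.eye(n_points, k=-1)) / step**2
    w = phi(x)
    dw = dphi(x)
    h0 = 0.5 * (p_squared + np.diag(w**2 - dw))  # fermion number 0
    h1 = 0.5 * (p_squared + np.diag(w**2 + dw))  # fermion number 1
    return h0, h1


def lowest_levels(phi, dphi, count=4, **grid):
    """Return the `count` lowest eigenvalues of H_0 and of H_1."""
    h0, h1 = sector_hamiltonians(phi, dphi, **grid)
    return np.linalg.eigvalsh(h0)[:count], np.linalg.eigvalsh(h1)[:count]


if __name__ == "__main__":
    # Supersymmetric oscillator: Phi(x) = omega * x with omega = 1.
    e0, e1 = lowest_levels(lambda x: x, lambda x: np.ones_like(x))
    print("n_f = 0 levels:", np.round(e0, 3))
    print("n_f = 1 levels:", np.round(e1, 3))
```

For $\Phi(x) = \omega x$ with $\omega > 0$ the normalizable zero mode sits in the $n_f = 0$ sector, and every positive level of $H_0$ reappears in $H_1$. Replacing the two callables by another function and its derivative, for instance $\Phi(x) = x^3$, gives a rough check of the same pairing for an anharmonic case.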


Spinning-Particle Mechanics 


Minimal supersymmetric classical or quantum
mechanics requires an equal number of bosonic and
fermionic coordinates in configuration space $(x_i, \xi_i)$,
rather than an equal number of bosonic and fermionic
degrees of freedom in phase space. Specifically,
minimal free supersymmetric particle mechanics in
$n$ dimensions is described by the classical
Lagrangian

$$L = \tfrac{1}{2}\dot{x}_i^2 + \tfrac{i}{2}\,\xi_i\dot{\xi}_i, \qquad i = 1, \ldots, n$$   [52]

It is invariant modulo a total time derivative under
infinitesimal supersymmetry transformations

$$\delta x_i = -i\epsilon\,\xi_i, \qquad \delta\xi_i = \dot{x}_i\,\epsilon$$   [53]

The canonical phase-space formulation is phrased
in terms of the free-particle momentum and
Hamiltonian

$$p_i = \dot{x}_i, \qquad H = \tfrac{1}{2}\,p_i^2$$   [54]

and the brackets

$$\{A, B\} = \frac{\partial A}{\partial x_i}\frac{\partial B}{\partial p_i} - \frac{\partial A}{\partial p_i}\frac{\partial B}{\partial x_i} + i(-1)^A\, \frac{\partial A}{\partial \xi_i}\frac{\partial B}{\partial \xi_i}$$   [55]

The supersymmetry transformations are generated
by the supercharge

$$Q = p_i\xi_i, \qquad \delta A = i\epsilon\{Q, A\}$$   [56]

with the supersymmetry algebra

$$i\{Q, Q\} = 2H, \qquad \{Q, H\} = 0$$   [57]


An important quantity in these models is the bilinear
(Grassmann-even) antisymmetric tensor

$$\sigma_{ij} = -i\,\xi_i\xi_j$$   [58]

For a free particle, it is a set of constants of motion
forming a representation of $so(n)$, the Lie algebra of
$n$-dimensional rotations:

$$\{\sigma_{ij}, \sigma_{kl}\} = \delta_{ik}\sigma_{jl} - \delta_{il}\sigma_{jk} - \delta_{jk}\sigma_{il} + \delta_{jl}\sigma_{ik}$$   [59]

Therefore, the physical interpretation of $\sigma_{ij}$ is that it
represents the particle spin. For this reason, supersymmetric
particle mechanics is often called
spinning-particle mechanics.

Quantum mechanics of the spinning particle has
the same algebraic structure, with $(x_i, p_i)$ the
standard canonically conjugate operators, and the
fermionic coordinates $\xi_i$ represented by the generators
of a Clifford algebra; the irreducible representation
in terms of Pauli-Dirac matrices of dimension
$2^{[n/2]} \times 2^{[n/2]}$ is

$$\xi_i \to \frac{1}{\sqrt{2}}\,\gamma_i, \qquad \gamma_i\gamma_j + \gamma_j\gamma_i = 2\delta_{ij}$$   [60]

It follows that the wave functions have $2^{[n/2]}$
components, describing different polarization states.
Furthermore, in minimal supersymmetric quantum
mechanics, the supersymmetry operator is represented
by the Dirac operator:

$$Q \to \frac{1}{\sqrt{2}}\,\gamma\cdot p, \qquad (\gamma\cdot p)^2 = p^2 = 2H$$   [61]

Hence, the stationary states of the system solve the
Dirac equation

$$\gamma\cdot p\;\Psi = \sqrt{2E}\;\Psi$$   [62]
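
The statement that the supercharge squares to the free Hamiltonian can be checked in the simplest nontrivial case. The sketch below assumes $n = 3$, where the Clifford generators $\gamma_i$ can be taken to be the Pauli matrices, and the representation $Q = \gamma\cdot p/\sqrt{2}$ with $H = p^2/2$; the momentum is treated as a fixed numerical vector, and the function names are hypothetical.

```python
import numpy as np

# For n = 3 the Clifford algebra gamma_i gamma_j + gamma_j gamma_i = 2 delta_ij
# is realized by the 2x2 Pauli matrices (2^[3/2] = 2 components).
GAMMA = (
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
)


def supercharge(momentum):
    """Return Q = gamma . p / sqrt(2) for a fixed momentum 3-vector."""
    return sum(p_i * g for p_i, g in zip(momentum, GAMMA)) / np.sqrt(2.0)


def free_hamiltonian(momentum):
    """Return H = p^2 / 2 times the 2x2 unit matrix."""
    return 0.5 * float(np.dot(momentum, momentum)) * np.eye(2)


if __name__ == "__main__":
    p = np.array([0.3, -1.2, 2.0])
    q = supercharge(p)
    energy = 0.5 * np.dot(p, p)
    print("Q^2 = H:", np.allclose(q @ q, free_hamiltonian(p)))
    print("eigenvalues of gamma.p:", np.linalg.eigvalsh(np.sqrt(2.0) * q))
    print("sqrt(2E):", np.sqrt(2.0 * energy))
```

The eigenvalues of $\gamma\cdot p$ come out as $\pm\sqrt{2E} = \pm|p|$, corresponding to the two polarization states.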


The models can, without difficulty, be extended to 
include interactions with external fields. As an 
example, we consider the coupling to a magnetic
field described by a vector potential $A_i(x)$. An
extension of the free-particle action [52], invariant
under the same supersymmetry transformations
[53], is

$$S = \int dt\,\Big(\frac{1}{2}\dot{x}_i^2 + \frac{i}{2}\xi_i\dot{\xi}_i + qA_i(x)\dot{x}_i - \frac{iq}{2}F_{ij}(x)\xi_i\xi_j\Big) \qquad [63]$$

where $F_{ij} = \nabla_i A_j - \nabla_j A_i$ is the field strength. The
canonical momentum in this model is

$$p_i = \dot{x}_i + qA_i(x) \qquad [64]$$


with the result that the canonical expressions for the 
Hamiltonian and supercharge become 


$$H = \frac{1}{2}\big(p_i - qA_i(x)\big)^2, \qquad Q = \big(p_i - qA_i(x)\big)\xi_i \qquad [65]$$


In the quantum theory, these constants of motion
become the covariant Laplacian and Dirac operator
in an external vector potential $A_i(x)$. Observe that
supersymmetry requires the spin to couple to the
magnetic field with gyromagnetic ratio $g = 2$. Explicitly,
the equation of motion for $\xi$ can be transformed
into an equation for the spin precession:

$$\dot{\xi}_i = qF_{ij}\xi_j \quad\Rightarrow\quad \dot{\sigma}_{ij} = q\,(F_{ik}\sigma_{kj} - \sigma_{ik}F_{kj}) \qquad [66]$$


In three dimensions, this is equivalent to an equation 
in terms of axial vectors: 


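$$F_{ij} = \varepsilon_{ijk}B_k, \quad \sigma_{ij} = \varepsilon_{ijk}s_k \quad\Rightarrow\quad \dot{\mathbf{s}} = -q\,\mathbf{B}\times\mathbf{s} \qquad [67]$$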


showing that the precession rate of s is given by 
twice the Larmor frequency. 


Extended Supersymmetry 


It is possible to construct theories with more 
supersymmetries by associating with every bosonic 
coordinate several fermionic coordinates. An exam- 
ple is the supersymmetric oscillator and its general- 
izations considered earlier, which has equal number 
of bosonic and fermionic degrees of freedom in 
phase space, rather than equal number of bosonic 
and fermionic coordinates in configuration space. 
The classical phase space, spanned by variables
$(x_i, p_i, \xi_i, \bar{\xi}_i)$ with $i = 1,\ldots,n$, then has double the
number of fermionic variables compared to the
minimal supersymmetric particle models. Such models
can be constructed for systems with an
n-dimensional bosonic configuration space. Their
supercharges take the form

$$Q = \big(p_i - i\Phi_i(x)\big)\xi_i, \qquad \bar{Q} = \big(p_i + i\Phi_i(x)\big)\bar{\xi}_i \qquad [68]$$

whilst the Hamiltonian becomes

$$H = \frac{1}{2}p_i^2 + \frac{1}{2}\Phi_i^2(x) + \frac{1}{4}(\nabla_i\Phi_j + \nabla_j\Phi_i)(\xi_i\bar{\xi}_j - \bar{\xi}_i\xi_j) \qquad [69]$$
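The supercharges are conserved if the curl of $\Phi_i(x)$
vanishes: $\nabla_i\Phi_j - \nabla_j\Phi_i = 0$. It follows that at least
locally there exists a single function $W(x)$ such that

$$\Phi_i(x) = \nabla_i W(x) \qquad [70]$$

$W(x)$ is called the superpotential. Defining the
operators

$$A_i = p_i - i\Phi_i(x), \qquad A_i^\dagger = p_i + i\Phi_i(x), \qquad A_iA_j^\dagger - A_j^\dagger A_i = \nabla_i\Phi_j + \nabla_j\Phi_i \qquad [71]$$

the supersymmetric quantum theory is defined by

$$Q = A_if_i^\dagger, \qquad Q^\dagger = A_i^\dagger f_i, \qquad H = \frac{1}{2}(QQ^\dagger + Q^\dagger Q) \qquad [72]$$

The Hamiltonian is the direct operator translation of
the classical expression [69]; its normal-ordered form is

$$H = \frac{1}{2}A_i^\dagger A_i + \frac{1}{2}(\nabla_i\Phi_j + \nabla_j\Phi_i)\,f_i^\dagger f_j \qquad [73]$$

The total fermion number operator

$$N_f = f_i^\dagger f_i \qquad [74]$$

(summed over i) satisfying the commutation
relations

$$[N_f, f_i^\dagger] = f_i^\dagger, \qquad [N_f, f_i] = -f_i \qquad [75]$$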


commutes with the Hamiltonian. Hence, the stationary
states can be labeled by the energy E and the total
fermion number $n_f = 0,\ldots,n$. The energy spectrum
being positive semidefinite, all positive-energy states
occur in pairs of fermion number $(n_f, n_f + 1)$; zero-
energy states exist only if the equations

$$A_i\,\psi_-(x) = 0 \quad\text{or}\quad A_i^\dagger\,\psi_+(x) = 0 \qquad [76]$$


admit a normalizable solution. In this context, the
vanishing of the curl of $\Phi_i(x)$ is important, as it is a
necessary condition for the formal solutions

$$\psi_\pm(x) = C_\pm\exp\Big(\pm\int^{x}\Phi(y)\cdot dy\Big) = C_\pm\,e^{\pm W(x)} \qquad [77]$$

to be single-valued. If one of them is normalizable,
there exists a zero-energy ground state with $n_f = 0$
or $n_f = n$, represented by a wave function:

$$|0,0\rangle \leftrightarrow \Psi_0(x,\xi) = \psi_-(x), \qquad |0,n\rangle \leftrightarrow \Psi_n(x,\xi) = \psi_+(x)\,\xi_1\cdots\xi_n \qquad [78]$$


Alternatively, we can represent the wave functions
as spinors of dimension $2^n$, on which the fermion
operators $f_i^\dagger$ and $f_i$ act as a $2^n$-dimensional matrix
representation of the Clifford algebra with generators
$\gamma_a$, $a = 1,\ldots,2n$, defined by

$$\gamma_i = f_i + f_i^\dagger, \qquad \gamma_{i+n} = i\,(f_i - f_i^\dagger) \qquad [79]$$

These operators indeed satisfy the anticommutation
rule

$$\gamma_a\gamma_b + \gamma_b\gamma_a = 2\delta_{ab} \qquad [80]$$

Thus, the wave functions have $2^n$ components, as
compared to the $2^{[n/2]}$ polarization states of the
minimal models.
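
The construction [79] can be checked explicitly for n = 2. The sketch below assumes a Jordan-Wigner matrix representation of the fermion operators (one standard choice, not taken from the text) and verifies the anticommutation rule [80] on the resulting $2^n = 4$ dimensional space. All function names are illustrative.

```python
def matmul(a, b):
    n = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def dagger(a):
    """Hermitian conjugate."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]


def kron(a, b):
    """Kronecker product of two square matrices."""
    nb = len(b)
    size = len(a) * nb
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(size)]
            for i in range(size)]


def anticommutator(a, b):
    return add(matmul(a, b), matmul(b, a))


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


# Jordan-Wigner construction for n = 2 fermion modes (an assumed, standard choice).
LOWER = [[0, 1], [0, 0]]          # single-mode annihilation operator
SIGMA_Z = [[1, 0], [0, -1]]
f = [kron(LOWER, identity(2)), kron(SIGMA_Z, LOWER)]

# gamma_i = f_i + f_i^dagger and gamma_{i+n} = i (f_i - f_i^dagger), eq. [79]
gammas = [add(fi, dagger(fi)) for fi in f]
gammas += [scale(1j, add(fi, scale(-1, dagger(fi)))) for fi in f]


def clifford_ok():
    """Check gamma_a gamma_b + gamma_b gamma_a = 2 delta_ab, eq. [80]."""
    return all(
        anticommutator(gammas[a], gammas[b]) == scale(2 if a == b else 0, identity(4))
        for a in range(4) for b in range(4)
    )
```

The four generators act on 4-component spinors, in line with the count of $2^n$ components for n = 2.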


The Witten Index 


We have noted that for supersymmetric quantum 
systems, like the harmonic and anharmonic super- 
symmetric oscillator, states exist in pairs of different 
fermion number, degenerate in energy, except for 
possibly one or more zero-energy states which are 
superinvariant in the sense that 


$$Q|0,n\rangle = Q^\dagger|0,n\rangle = 0 \quad\Rightarrow\quad H|0,n\rangle = 0 \qquad [81]$$


In the Schrödinger representation, these states are
characterized as zero modes of the Dirac operator:

$$\gamma\cdot D\,\Psi = 0 \qquad [82]$$


where $D_i$ is an ordinary or field-dependent (e.g.,
covariant) derivative. Clearly, the existence of such 
states can, in some cases, be guaranteed if there is no 
state which can pair up with a given state to form a 
superdoublet. Witten developed a topological char- 
acterization of this condition, encoded in an index 


defined by 
$$I = \mathrm{tr}\,(-1)^{N_f} = n_b(E=0) - n_f(E=0) \qquad [83]$$


where $N_f$ is the fermion number operator, and
$n_{b,f}(E=0)$ are the number of bosonic and fermionic
zero-energy states. The trace is taken over the 
complete space of states, but as all nonzero energy 
states occur in pairs of a bosonic and a fermionic 
state, their contributions to the trace cancel, having 
opposite sign. Therefore, the trace is actually only 
over the zero-energy states, and counts the number 
of bosonic states with positive sign, and the number 
of fermionic states with negative sign. If the index
vanishes, $I = 0$, then any zero-energy states necessarily
exist in equal number of bosonic and fermionic
states; under perturbations of the potential, these
states can form pairs and change their energy to a
positive value. However, if the index does not
vanish, $I \neq 0$, then there are states which have no
partner of complementary fermion number; these
states can never get a nonzero energy under changes 
in the parameters of the potential, as long as the 
changes respect supersymmetry. Such systems, there- 
fore, necessarily possess exact zero-energy states 
which are invariant under all supersymmetries. 

Deformations of the potential respecting super- 
symmetry are those obtained by changing the 
parameters in the superpotential. The usefulness of 
this concept is, therefore, that the index for models 
with complicated superpotentials can be computed 
by comparing them with models with simple super- 
potentials having similar topological properties. 

Counting the number of states is not always a 
simple procedure, in particular when the spectrum 
includes continuum states. Therefore, in practice one 
often needs a regularization procedure, by taking the 
trace over the full state space of the exponentially 
damped quantity 


$$I(\beta) = \mathrm{tr}\,(-1)^{N_f}\,e^{-\beta H} \qquad [84]$$


and taking the limit $\beta \to 0$. The quantity [84] can be
computed in terms of a path integral with periodic 
boundary conditions for the fermionic degrees of 
freedom. 
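
To see why the regularized trace [84] does not depend on $\beta$ when the spectrum is discrete, consider a sketch for the supersymmetric harmonic oscillator. The spectrum used here, $E = \omega(n_b + n_f)$ with $n_f \in \{0, 1\}$, is the standard one and is an assumption of this example, as are the function and parameter names.

```python
import math


def regulated_index(beta, omega=1.0, max_level=50):
    """I(beta) = tr (-1)^{N_f} exp(-beta H), summed level by level.

    Assumed spectrum: E = omega * (n_b + n_f) with n_b = 0, 1, 2, ... and
    n_f in {0, 1}.  Level k = n_b + n_f holds one bosonic state (n_f = 0)
    and, for k >= 1, one fermionic state (n_f = 1).
    """
    total = 0.0
    for level in range(max_level + 1):
        n_bosonic = 1
        n_fermionic = 1 if level >= 1 else 0
        # Paired levels contribute (1 - 1) * exp(...) = 0; only E = 0 survives.
        total += (n_bosonic - n_fermionic) * math.exp(-beta * omega * level)
    return total
```

Summing level by level keeps each bosonic state together with its fermionic partner, so the cutoff `max_level` never splits a superdoublet and the result is the same for every `beta`.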


Finally, as the wave function representation of 
supersymmetric quantum mechanics [82] links the 
Witten index to the space of zero modes of a Dirac 
operator, in particular cases it can be used to 
describe topological aspects of sigma models and 
gauge theories, and related mathematical quantities 
such as the Atiyah-Singer index. 

More details and references to the original 
literature can be found in the reviews listed in the 
Further Reading section. 


See also: Path-Integrals in Non Commutative Geometry; 
Supermanifolds.
Introduction


A prominent theme of modern condensed matter 
physics is electronic transport — in particular, the 
electrical conductivity — of disordered metallic 
systems at very low temperatures. From the Landau 
theory of weakly interacting Fermi liquids, one 
expects the essential aspects of the situation to be 
captured by the single-electron approximation. 
Mathematical models that have been proposed and 
studied in this context include random Schrödinger 
operators and band random matrices. 

If the physical system has infinite size, two distinct 
possibilities exist: the quantum single-electron 
motion may either be bounded or unbounded. In the 
former case, the disordered electron system is an 
insulator, in the latter case, a metal with finite 
conductivity (if the electron motion is not critical 
but diffusive). Metallic behavior is expected for 
weakly disordered systems in three dimensions; 


insulating behavior sets in when the disorder strength 
is increased or the space dimension reduced. 

The main theoretical tool used in the physics 
literature on the subject is the “supersymmetry 
method” pioneered by Wegner and Efetov (1979-83). 
Over the past 20 years, physicists have applied the 
method in many instances, and a rather complete 
picture of weakly disordered metals has emerged. 
Several excellent reviews of these developments are 
available in print. 

From the perspective of mathematics, however, the 
method has not always been described correctly, and 
what is sorely lacking at present is an exposition of 
how to implement the method rigorously. (Unfortu- 
nately, the correct exposition by Schäfer and Wegner 
(1980) was largely ignored or forgotten by later 
authors.) In this article, an attempt is made to help 
remedy the situation, by giving a careful review of 
the Wegner-Efetov supersymmetry method for the
case of Hermitian band random matrices. 


Gaussian Ensembles 


Let V be a unitary vector space of finite dimension. 
A Hermitian random matrix model on V is defined
by some probability distribution on Herm(V), the
Hermitian linear operators on V. We may fix some 
orthonormal basis of V and represent the elements 
H of Herm(V) by Hermitian square matrices. 

Quite generally, probability distributions are 
characterized by their Fourier transform or char- 
acteristic function. In the present case this is 


$$\Omega(K) = \big\langle e^{-i\,\mathrm{tr}\,HK}\big\rangle$$


where the Fourier variable K is some other linear
operator on V, and $\langle\ldots\rangle$ denotes the expectation
value with respect to the probability distribution for
H. Later, it will be important that, if $\Omega(K)$ is an
analytic function of K, the matrix entries of K need
not be from $\mathbb{R}$ or $\mathbb{C}$ but can be taken from the even
part of some exterior algebra.

The probability distributions to be considered in
this article are Gaussian with zero mean, $\langle H\rangle = 0$.
Their Fourier transform is also Gaussian:

$$\Omega(K) = e^{-\frac{1}{2}J(K,K)}$$

with J some quadratic form. We now describe J for a
large family of hierarchical models that includes the 
case of band random matrices. 

Let V be given a decomposition by orthogonal 
vector spaces: 


$$V = V_1 \oplus V_2 \oplus \cdots \oplus V_{|\Lambda|}$$

We should imagine that every vector space $V_i$
corresponds to one site i of some lattice $\Lambda$, and the
total number of sites is $|\Lambda|$. For simplicity, we take
all dimensions to be equal: $\dim V_1 = \cdots = \dim V_{|\Lambda|} = N$.
Thus, the dimension of V is $N|\Lambda|$. The
integer N is called the number of orbitals per site.
If $\Pi_i$ is the orthogonal projector on the linear
subspace $V_i \subset V$, we take the bilinear form J to be

$$J(K, K') = \sum_{i,j=1}^{|\Lambda|}J_{ij}\,\mathrm{tr}\,(\Pi_iK\,\Pi_jK')$$


where the coefficients $J_{ij}$ are real, symmetric, and
positive. This choice of J implies invariance under
the group $\mathcal{U}$ of unitary transformations in each
subspace:

$$\mathcal{U} = \mathrm{U}(V_1)\times\mathrm{U}(V_2)\times\cdots\times\mathrm{U}(V_{|\Lambda|})$$

Clearly, $\Omega(K) = \Omega(UKU^{-1})$ or, equivalently, the
probability distribution for H is invariant under
conjugation $H \mapsto UHU^{-1}$, for $U \in \mathcal{U}$.

If $\{e_i^a\}_{a=1,\ldots,N}$ is an orthonormal basis of $V_i$, we
define linear operators $E_{ij}^{ab}: V_j \to V_i$ by $E_{ij}^{ab}\,e_j^b = e_i^a$.
By evaluating $J(E_{ij}^{ab}, E_{j'i'}^{b'a'}) = J_{ij}\,\delta_{ii'}\delta_{jj'}\delta^{aa'}\delta^{bb'}$, one sees
that the matrix entries of H all are statistically
independent.

By varying the lattice $\Lambda$, the number of orbitals N,
and the variances $J_{ij}$, one obtains a large class of
Hermitian random matrix models, two prominent
subclasses of which are the following:


1. For $|\Lambda| = 1$, one gets the Gaussian Unitary
Ensemble (GUE). Its symmetry group is $\mathcal{U} = \mathrm{U}(N)$,
the largest one possible in dimension
$N = \dim V$.

2. If $|i - j|$ denotes a distance function for $\Lambda$, and f a
rapidly decreasing positive function on $\mathbb{R}_+$, of
width W, the choice $J_{ij} = f(|i - j|)$ with $N = 1$
gives an ensemble of band random matrices with
bandwidth W and symmetry group $\mathcal{U} = \mathrm{U}(1)^{|\Lambda|}$.


Beyond being real, symmetric, and positive, the
variances $J_{ij}$ are required to have two extra properties
in order for all of the following treatment to go
through:


• They must be positive as a quadratic form. This is
to guarantee the existence of an inverse, which we
denote by $w_{ij} = (J^{-1})_{ij}$.

• The off-diagonal matrix entries of the inverse
must be nonpositive: $w_{ij} \le 0$ for $i \neq j$.
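
The two requirements can be checked numerically for a concrete variance profile. The sketch below assumes the exponential profile $J_{ij} = a^{|i-j|}$ on a one-dimensional lattice (our choice for illustration; the text allows a general f), for which the inverse is known to be tridiagonal with negative off-diagonal entries. Not every rapidly decreasing f passes the second test, so the code is a check, not a guarantee.

```python
import math


def variance_profile(size, a):
    """J_ij = a^{|i-j|} on a chain of `size` sites (illustrative choice of f)."""
    return [[a ** abs(i - j) for j in range(size)] for i in range(size)]


def is_positive_definite(matrix):
    """Attempt a Cholesky factorisation; it succeeds only for positive-definite input."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if partial <= 0.0:
                    return False
                lower[i][j] = math.sqrt(partial)
            else:
                lower[i][j] = partial / lower[j][j]
    return True


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [x / piv for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def offdiagonal_nonpositive(w, tol=1e-12):
    """Second requirement: w_ij <= 0 for i != j (up to rounding)."""
    return all(w[i][j] <= tol for i in range(len(w)) for j in range(len(w)) if i != j)


J = variance_profile(6, 0.5)
w = invert(J)          # w_ij = (J^{-1})_ij
```

For this profile the nearest-neighbour couplings come out as $w_{i,i+1} = -a/(1 - a^2)$, and all longer-range entries of w vanish up to rounding.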


Basic Tools 
Green’s Functions 


A major goal of random matrix theory is to 
understand the statistical behavior of the spectrum 
and the eigenstates of a random Hamiltonian H. 
Spectral and eigenstate information can be extracted 
from the Green’s function, that is, from matrix 
elements of the operator $(z - H)^{-1}$ with complex
parameter $z \in \mathbb{C}\setminus\mathbb{R}$. For the models at hand, the
good objects to consider are averages of $\mathcal{U}$-invariant
observables such as

$$G_i^{(1)}(z) = \big\langle\mathrm{tr}\,\Pi_i(z - H)^{-1}\big\rangle \qquad [1]$$

$$G_{ij}^{(2)}(z_1, z_2) = \big\langle\mathrm{tr}\,\Pi_i(z_1 - H)^{-1}\,\Pi_j(z_2 - H)^{-1}\big\rangle \qquad [2]$$

The discontinuity of $G_i^{(1)}(z)$ across the real z-axis
yields the local density of states. In the limit of
infinite volume ($|\Lambda| \to \infty$), the function $G_{ij}^{(2)}(z_1, z_2)$
for $z_1 = E + i\varepsilon$, $z_2 = E - i\varepsilon$, real energy E, and $\varepsilon > 0$
going to zero, gives information on transport,
for example, the electrical conductivity by the
Kubo-Greenwood formula.

Mathematically speaking, if $G_{ij}^{(2)}(E + i\varepsilon, E - i\varepsilon)$ is
bounded (for infinite volume) in $\varepsilon$ and decreases
algebraically with distance $|i - j|$ at $\varepsilon = 0+$, the
spectrum is absolutely continuous and the eigenstates
are extended at energy E. On the other hand,
a pure point spectrum and localized eigenstates are
signaled by the behavior $G_{ij}^{(2)} \sim \varepsilon^{-1}e^{-\lambda|i-j|}$ with
positive Lyapunov exponent $\lambda$.


Green’s Functions from Determinants 


For any pair of linear operators A, B on a finite- 
dimensional vector space V, the following formula 
from basic linear algebra holds if A has an inverse: 


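$$\frac{d}{dt}\det(A + tB)\Big|_{t=0} = \det(A)\,\mathrm{tr}\,(A^{-1}B)$$

Using it with $A = z - H$ and $z \in \mathbb{C}\setminus\mathbb{R}$, all Green’s
functions can be expressed in terms of determinants;
for example,

$$G_{ij}^{(2)}(w, z) = \sum_{a,b=1}^{N}\frac{\partial^2}{\partial s\,\partial t}\left\langle\frac{\det(w - H)\,\det(z - H + tE_{ij}^{ab})}{\det(w - H - sE_{ji}^{ba})\,\det(z - H)}\right\rangle\Bigg|_{s=t=0}$$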


It is clear that, given a formula of this kind, what 
one wants is a method to handle ensemble averages 
of ratios of determinants. This is what is reviewed in 
the sequel. 
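
The determinant identity used above is easy to test numerically. The following sketch does so for 2 x 2 matrices with $A = z - H$ and B a matrix unit, comparing a central finite difference with $\det(A)\,\mathrm{tr}(A^{-1}B)$. The sample values of H and z are arbitrary illustrative choices.

```python
def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inverse2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def trace_product(a, b):
    """tr(a b) for 2 x 2 matrices."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))


def det_derivative_numeric(a, b, step=1e-3):
    """Central difference for d/dt det(a + t b) at t = 0."""
    def shifted(t):
        return det2([[a[i][j] + t * b[i][j] for j in range(2)] for i in range(2)])
    return (shifted(step) - shifted(-step)) / (2 * step)


# Sample data (illustrative values): a Hermitian H, a complex energy z, and a
# matrix unit B that singles out one matrix element of the resolvent.
H = [[0.4, 0.3 - 0.2j], [0.3 + 0.2j, -1.1]]
z = 0.5 + 1.0j
A = [[z - H[0][0], -H[0][1]], [-H[1][0], z - H[1][1]]]
B = [[0, 1], [0, 0]]

lhs = det_derivative_numeric(A, B)
rhs = det2(A) * trace_product(inverse2(A), B)
```

Dividing `rhs` by `det2(A)` gives a single matrix element of the resolvent $(z - H)^{-1}$, which is how differentiating a ratio of determinants produces Green's functions.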


Determinants as Gaussian Integrals 


Let the Hermitian scalar product of the unitary
vector space V be written as $\varphi_1, \varphi_2 \mapsto (\varphi_1, \varphi_2)$,
and denote the adjoint or Hermitian conjugate
of a linear operator A on V by $A^*$. If
$\mathrm{Re}\,A := (1/2)(A + A^*) > 0$, the standard Lebesgue
integral of the Gaussian function $\varphi \mapsto e^{-(\varphi, A\varphi)}$
makes sense and gives

$$\int e^{-(\varphi, A\varphi)} = \det A^{-1} \qquad [3]$$

where it is understood that we are integrating with
the Lebesgue measure on (the normed vector space)
V normalized by $\int e^{-(\varphi,\varphi)} = 1$. The same integral
with anticommuting $\psi$ instead of the (commuting)
$\varphi \in V$ gives

$$\int e^{-(\psi, A\psi)} = \det A \qquad [4]$$

This basic formula from the field theory of
fermionic particles is a consequence of the integration
over anticommuting variables actually being
differentiation:

$$\int d\bar\psi_1\,d\psi_1\cdots f(\psi_1, \bar\psi_1, \ldots) = \frac{\partial^2}{\partial\bar\psi_1\,\partial\psi_1}\cdots f(\psi_1, \bar\psi_1, \ldots)$$
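
Formula [4] can be verified by brute force for small matrices with a minimal Grassmann-algebra implementation. The sketch below is our own illustration: elements are stored as dictionaries of monomials, and the Berezin integral is taken in the convention $\prod_i \partial^2/\partial\bar\psi_i\,\partial\psi_i$, which is an assumption about ordering and is stated again in the code.

```python
def gmul(x, y):
    """Multiply two Grassmann-algebra elements.

    An element is a dict {monomial: coefficient}; a monomial is a tuple of
    strictly increasing generator indices.
    """
    out = {}
    for mono_x, coef_x in x.items():
        for mono_y, coef_y in y.items():
            if set(mono_x) & set(mono_y):
                continue                      # a repeated generator squares to zero
            merged = list(mono_x + mono_y)
            sign = 1
            # Bubble sort; every swap of adjacent generators flips the sign.
            for end in range(len(merged) - 1, 0, -1):
                for k in range(end):
                    if merged[k] > merged[k + 1]:
                        merged[k], merged[k + 1] = merged[k + 1], merged[k]
                        sign = -sign
            key = tuple(merged)
            out[key] = out.get(key, 0) + sign * coef_x * coef_y
    return out


def gadd(x, y):
    out = dict(x)
    for mono, coef in y.items():
        out[mono] = out.get(mono, 0) + coef
    return out


def gscale(c, x):
    return {mono: c * coef for mono, coef in x.items()}


def fermionic_gaussian(a):
    """Berezin integral of exp(-(psi, A psi)) for an n x n matrix A.

    Generator 2*i stands for psibar_i and 2*i + 1 for psi_i.  The integral is
    taken as prod_i d^2/(d psibar_i d psi_i), which equals (-1)^n times the
    coefficient of psibar_1 psi_1 ... psibar_n psi_n.
    """
    n = len(a)
    action = {}
    for i in range(n):
        for j in range(n):
            bilinear = gmul({(2 * i,): 1}, {(2 * j + 1,): 1})
            action = gadd(action, gscale(a[i][j], bilinear))
    # exp(-action) as a terminating power series (action is nilpotent and even).
    result = {(): 1}
    term = {(): 1}
    for order in range(1, n + 1):
        term = gscale(-1 / order, gmul(term, action))
        result = gadd(result, term)
    top = result.get(tuple(range(2 * n)), 0)
    return (-1) ** n * top
```

For example, `fermionic_gaussian([[2, 1], [0.5, 3]])` reproduces the determinant $2\cdot 3 - 1\cdot 0.5$, because only the term of order n in the exponential series reaches the top monomial.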


Fermionic Variant 


The supersymmetry method of random matrix 
theory is a theme with many variations. The first 
variation to be described is the “fermionic” one. To
streamline the notation, we now write $d\mu_{N,J}(H)$ for
the density of the Gaussian probability distribution
of H:

$$\langle F(H)\rangle = \int F(H)\,d\mu_{N,J}(H)$$

All determinants and traces appearing below will be
taken over vector spaces that are clear from the
context.

Let $z_1,\ldots,z_n$ be any set of n complex numbers,
put $z := \mathrm{diag}(z_1,\ldots,z_n)$ for later purposes, and
consider

$$\Omega^{\rm fermi}_{n,N}(z, J) = \int\prod_{\alpha=1}^{n}\det(z_\alpha - H)\,d\mu_{N,J}(H) \qquad [5]$$


The supersymmetry method expresses this average 
of a product of determinants in an alternative way, 
by integrating over a “dual” measure as follows. 
Introducing an auxiliary unitary vector space
$\mathbb{C}^n$, one associates with every site i of the lattice
$\Lambda$ an object $Q_i \in \mathrm{Herm}(\mathbb{C}^n)$, the space of Hermitian
$n\times n$ matrices. If $dQ_i$ for $i = 1,\ldots,|\Lambda|$ are Lebesgue
measures on $\mathrm{Herm}(\mathbb{C}^n)$, one puts $DQ = \mathrm{const.}\times\prod_i dQ_i$ and

$$d\nu_{n,J}(Q) = e^{-\frac{1}{2}\sum_{i,j}w_{ij}\,\mathrm{tr}\,Q_iQ_j}\,DQ \qquad [6]$$

The multiplicative constant in DQ is fixed by
requiring the density to be normalized: $\int d\nu_{n,J}(Q) = 1$.
By completing the square, one finds that this Gaussian
probability measure has the characteristic function

$$\int e^{-i\sum_j\mathrm{tr}\,K_jQ_j}\,d\nu_{n,J}(Q) = e^{-\frac{1}{2}\sum_{i,j}J_{ij}\,\mathrm{tr}\,K_iK_j}$$

where the Fourier variables $K_1,\ldots,K_{|\Lambda|}$ are $n\times n$
matrices with matrix entries taken from $\mathbb{C}$ or
another commutative algebra.

The key relation of the fermionic variant of the
supersymmetry method is that the expectation of the
product of determinants [5] has another expression as

$$\Omega^{\rm fermi}_{n,N}(z, J) = \int\prod_{i=1}^{|\Lambda|}\det{}^{N}(z - iQ_i)\,d\nu_{n,J}(Q) \qquad [7]$$

($i = \sqrt{-1}$). The strategy of the proof is quite simple:
one writes the determinants in both expressions for
$\Omega^{\rm fermi}_{n,N}$ as Gaussian integrals over $nN|\Lambda|$ complex
fermionic variables $\psi_1,\ldots,\psi_n$ (each $\psi_\alpha$ is a vector in
V with anticommuting coefficients), using the basic
formula [4]. The integrals then encountered are
essentially the Fourier transforms of the distributions
$d\mu_{N,J}(H)$ resp. $d\nu_{n,J}(Q)$. The result is

$$\int e^{-\sum_\alpha z_\alpha(\psi_\alpha,\psi_\alpha)}\,e^{-\frac{1}{2}\sum_{i,j}J_{ij}\sum_{\alpha,\beta}(\psi_\alpha,\Pi_i\psi_\beta)(\psi_\beta,\Pi_j\psi_\alpha)}$$

for both expressions of $\Omega^{\rm fermi}_{n,N}$. In other words,
although the probability distributions $d\mu_{N,J}(H)$ and
$d\nu_{n,J}(Q)$ are distinct (they are defined on different
spaces), their characteristic functions coincide when
evaluated on the Fourier variables $K = \sum_\alpha\psi_\alpha(\psi_\alpha,\,\cdot\,)$
for H and $(K_i)_{\alpha\beta} = (\psi_\alpha,\Pi_i\psi_\beta)$ for $Q_i$. This establishes
the claimed equality of the expressions [5] and [7] for
$\Omega^{\rm fermi}_{n,N}(z, J)$.

What is the advantage of passing to the alternative 
expression by dv,,;(Q)? The answer is that, while H 
is made up of independent random variables, the new 
variables $Q_i$, called the Hubbard-Stratonovich field,
are correlated: they interact through the “exchange”
constants $w_{ij} = (J^{-1})_{ij}$. If that interaction creates
enough collectivity, a kind of mean-field behavior 
results. 

For the simple case of GUE ($|\Lambda| = 1$, $w_{11} = N/\lambda^2$)
with $z_1 = \cdots = z_n = E$, one gets the relation

$$\big\langle\det{}^{n}(E - H)\big\rangle = \int\det{}^{N}(E - iQ)\,e^{-(N/2\lambda^2)\,\mathrm{tr}\,Q^2}\,dQ$$

the right-hand side of which is easily analyzed by the
steepest descent method in the limit of large N.

For band random matrices in the so-called ergodic
regime, the physical behavior turns out to be governed
by the constant mode $Q_1 = \cdots = Q_{|\Lambda|}$, a fact that can
be used to establish GUE universality in that regime.


Bosonic Variant 


The bosonic variant of the present method, due to 
Wegner, computes averages of products of determi- 
nants placed in the denominator: 


$$\Omega^{\rm bose}_{n,N}(z, J) = \int\prod_{\alpha=1}^{n}\det{}^{-1}(z_\alpha - H)\,d\mu_{N,J}(H) \qquad [8]$$

where we now require $\mathrm{Im}\,z_\alpha \neq 0$ for all $\alpha = 1,\ldots,n$.
Complications relative to the fermionic case arise
from the fact that the integrand in [8] has poles. If
one replaces the anticommuting vectors $\psi_\alpha$ by
commuting ones $\varphi_\alpha$, and then simply repeats the
previous calculation in a naive manner, one arrives at

$$\Omega^{\rm bose}_{n,N}(z, J) \stackrel{?}{=} \int\prod_{i=1}^{|\Lambda|}\det{}^{-N}(z - Q_i)\,d\nu_{n,J}(Q) \qquad [9]$$

where the integral is still over $Q_i \in \mathrm{Herm}(\mathbb{C}^n)$. The
calculation is correct, and relation [9] therefore
holds true, provided that the parameters $z_1,\ldots,z_n$
all lie in the same half (upper or lower) of the 
complex plane. To obtain information on transport 
properties, however, one needs parameters in both 
the upper and lower halves; see the paragraph 
following [2]. The general case to be addressed 
below is $\mathrm{Im}\,z_\alpha > 0$ for $\alpha = 1,\ldots,p$, and $\mathrm{Im}\,z_\alpha < 0$
for $\alpha = p+1,\ldots,n$. Careful inspection of the steps
leading to eqn [9] reveals a convergence problem for
$0 < p < n$. In fact, [9] with $Q_i$ in $\mathrm{Herm}(\mathbb{C}^n)$ turns
out to be false in that range. Learning how to 
resolve this problem is the main step toward 
mathematical mastery of the method. Let us there- 
fore give the details. 

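If $s_\alpha := \mathrm{sgn}\,\mathrm{Im}\,z_\alpha$, the good (meaning convergent)
Gaussian integral to consider is

$$\int e^{i\sum_\alpha s_\alpha(\varphi_\alpha,(z_\alpha - H)\varphi_\alpha)} = \prod_\alpha\det{}^{-1}\big(-is_\alpha(z_\alpha - H)\big)$$

To avoid carrying around trivial constants, we now
assume $i^{(n-2p)N|\Lambda|} = 1$. Use of the characteristic
function of the distribution for H then gives

$$\Omega^{\rm bose}_{n,N}(z, J) = \int e^{-\frac{1}{2}\sum_{i,j}J_{ij}\sum_{\alpha,\beta}s_\alpha(\varphi_\alpha,\Pi_i\varphi_\beta)\,s_\beta(\varphi_\beta,\Pi_j\varphi_\alpha)}\times e^{i\sum_\alpha s_\alpha z_\alpha(\varphi_\alpha,\varphi_\alpha)} \qquad [10]$$

The difficulty of analyzing this expression stems
from the “hyperbolic” nature (due to the indefiniteness
of the signs $s_\alpha = \pm 1$) of the term quartic in the
$\varphi_\alpha$, $\bar\varphi_\alpha$.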


Fyodorov’s Method 


The integrand for $\Omega^{\rm bose}_{n,N}$ is naturally expressed in
terms of $n\times n$ matrices $M_i$ with matrix elements
$(M_i)_{\alpha\beta} = (\varphi_\alpha,\Pi_i\varphi_\beta)$. These matrices lie in
$\mathrm{Herm}^+(\mathbb{C}^n)$, that is, they are non-negative as well
as Hermitian. Fyodorov’s idea was to introduce
them as the new variables of integration. To do
that step, recall the basic fact that, given two
differentiable spaces X and Y and a smooth map
$\psi: X \to Y$, a distribution $\mu$ on X is pushed forward
to a distribution $\psi(\mu)$ on Y by $\psi(\mu)[f] := \mu[f\circ\psi]$,
where f is any test function on Y.

We apply this universal principle to the case at
hand by identifying X with $V^n$, and Y with
$(\mathrm{Herm}^+(\mathbb{C}^n))^{|\Lambda|}$, and $\psi$ with the mapping that sends

$$(\varphi_1,\ldots,\varphi_n) \in X \quad\text{to}\quad (M_1,\ldots,M_{|\Lambda|}) \in Y$$

by $(M_i)_{\alpha\beta} = (\varphi_\alpha,\Pi_i\varphi_\beta)$. On $X = V^n$ we are integrating
with the product Lebesgue measure normalized
by $\int e^{-\sum_\alpha(\varphi_\alpha,\varphi_\alpha)} = 1$. We now want the push-forward
of this flat measure (or distribution) by the mapping
$\psi$. In general, the push-forward of a measure is not
guaranteed to have a density but may be singular
(like a Dirac $\delta$-distribution). This is in fact what
happens if $N < n$. The matrices $M_i$ then have less
than the maximal rank, so they fail to be positive
but possess zero eigenvalues, which implies that
the flat measure on X is pushed forward by $\psi$ into the
boundary of Y. For $N \ge n$, on the other hand, the
push-forward measure does have a density on Y; and
that density is $\prod_i(\det M_i)^{N-n}\,dM_i$, as is seen by
transforming to the eigenvalue representation and
comparing Jacobians. The $dM_i$ are Lebesgue measures
on $\mathrm{Herm}(\mathbb{C}^n)$, normalized by the condition

$$\int_{M_i>0}e^{-\mathrm{tr}\,M_i}(\det M_i)^{N-n}\,dM_i = \int e^{-\sum_\alpha(\varphi_\alpha,\Pi_i\varphi_\alpha)} = 1$$
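
The rank statement behind the case distinction between N < n and larger N can be seen in the smallest example, n = 2 on a single site. The sketch below (sample vectors and helper names are illustrative assumptions) builds the Gram matrix $(M)_{\alpha\beta} = (\varphi_\alpha, \varphi_\beta)$ for N = 1 and for N = 2 orbitals.

```python
def gram_matrix(vectors):
    """(M)_{ab} = (phi_a, phi_b), with the scalar product antilinear in its first slot."""
    return [[sum(complex(x).conjugate() * y for x, y in zip(va, vb))
             for vb in vectors] for va in vectors]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# n = 2 vectors on a single site (|Lambda| = 1, so Pi_1 is the identity).
# N = 1 orbital: the Gram matrix has rank at most 1, so det M = 0.
rank_deficient = gram_matrix([[1 + 2j], [0.5 - 1j]])
# N = 2 orbitals: generic vectors give a strictly positive M.
full_rank = gram_matrix([[1, 1j], [1, 0]])
```

With N = 1 the image of the map always lies on the boundary $\det M = 0$ of the cone of non-negative matrices, which is why the push-forward measure has no density there.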


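Assembling the sign information for $\mathrm{Im}\,z_\alpha$ in a
diagonal matrix $s := \mathrm{diag}(s_1,\ldots,s_n)$, and pushing the
integral over X forward to an integral over Y with
measure $DM := \prod_i dM_i$, we obtain Fyodorov’s
formula:

$$\Omega^{\rm bose}_{n,N}(z, J) = \int_Y e^{-\frac{1}{2}\sum_{i,j}J_{ij}\,\mathrm{tr}\,(sM_isM_j)}\times e^{\sum_k\mathrm{tr}\,(iszM_k + (N-n)\ln M_k)}\,DM \qquad [11]$$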


This formula has a number of attractive features. 
One is ease of derivation, another is ready general- 
izability to the case of non-Gaussian distributions. 
The main disadvantage of the formula is that it does
not apply to the case of band random matrices
(because of the restriction $N \ge n$); nor does it
combine nicely with the fermionic formula [7] to
give a supersymmetric formalism, as one formula is
built on $J_{ij}$ and the other on $w_{ij}$.

Note that [11] clearly displays the dependence on
the signature of $\mathrm{Im}\,z$: you cannot remove the $s_1,\ldots,s_n$
from the integrand without changing the domain of
integration $Y = (\mathrm{Herm}^+(\mathbb{C}^n))^{|\Lambda|}$. This important
feature is missing from the naive formula [9].

Setting $q = n - p$, let $\mathrm{U}(p,q)$ be the pseudounitary
group of complex $n\times n$ matrices T with inverse
$T^{-1} = sT^*s$. Since $|\det T| = 1$ for $T \in \mathrm{U}(p,q)$, the
integration domain Y and density $DM = \prod_i dM_i$ of
Fyodorov’s formula are invariant under $\mathrm{U}(p,q)$
transformations $M_i \mapsto TM_iT^*$, and so is actually the
integrand in the limit where all parameters $z_1,\ldots,z_n$
become equal. Thus, the elements of $\mathrm{U}(p,q)$ are
global symmetries in that limit. This observation
holds the key to another method of transforming the
expression [10].
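
The defining relation of $\mathrm{U}(p,q)$ can be made concrete for p = q = 1. The sketch below uses a hyperbolic rotation times a phase as a sample group element (our choice for illustration) and checks $T^{-1} = sT^*s$, $|\det T| = 1$, and the consequence $s\,(TT^*)\,s = (TT^*)^{-1}$ for the positive matrix $TT^*$.

```python
import math

S = [[1, 0], [0, -1]]               # signature matrix s for p = q = 1


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def adjoint(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def boost(t, phase=0.0):
    """A sample element of U(1,1): a hyperbolic rotation times a U(1) x U(1) phase."""
    c, sh = math.cosh(t), math.sinh(t)
    u = complex(math.cos(phase), math.sin(phase))
    return [[c * u, sh], [sh * u, c]]


def max_deviation_from_identity(m):
    return max(abs(m[i][j] - (1 if i == j else 0)) for i in range(2) for j in range(2))


T = boost(0.7, phase=0.4)
pseudo_inverse = matmul(S, matmul(adjoint(T), S))        # s T* s
check_inverse = max_deviation_from_identity(matmul(T, pseudo_inverse))
det_T = T[0][0] * T[1][1] - T[0][1] * T[1][0]
# s (T T*) s inverts the positive Hermitian matrix T T*.
TTstar = matmul(T, adjoint(T))
check_cartan = max_deviation_from_identity(
    matmul(TTstar, matmul(S, matmul(TTstar, S))))
```

Both `check_inverse` and `check_cartan` should vanish up to rounding, and `abs(det_T)` should equal 1; unlike a unitary matrix, T itself is unbounded as the boost parameter grows.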


The Method of Schäfer and Wegner


To rescue the naive formula [9], what needs to be
abandoned is the integration domain $\mathrm{Herm}(\mathbb{C}^n)$ for
the matrices $Q_i$. The good domain to use was
constructed by Schäfer and Wegner, but was largely
forgotten in later physics work.

Writing $(M_k)_{\alpha\beta} = (\varphi_\alpha,\Pi_k\varphi_\beta)$ as before, consider
the function

$$F_M(Q) = e^{\frac{1}{2}\sum_{i,j}w_{ij}\,\mathrm{tr}\,(sQ_i+iz)(sQ_j+iz) - \sum_k\mathrm{tr}\,M_kQ_k} \qquad [12]$$

viewed as a holomorphic function of

$$Q = (Q_1,\ldots,Q_{|\Lambda|}) \in \mathrm{End}(\mathbb{C}^n)^{|\Lambda|}$$


If the Gaussian integral $\int F_M(Q)\,DQ$ with holomorphic
density $DQ = \prod_i dQ_i$ is formally carried
out by completing the square, one gets the integrand
of [10]. This is just what we want, as it would allow
us to pass to a Q-matrix formulation akin to the one
of the previous section. But how can that formal
step be made rigorous? To that end, one needs to (1)
construct a domain on which $|F_M(Q)|$ decreases
rapidly so that the integral exists, and (2) justify
completion of the square and shifting of variables.

To begin, take the absolute value of $F_M(Q)$.
Putting $(1/2)(Q_j + Q_j^*) =: \mathrm{Re}\,Q_j$ and $(1/2i)(Q_j - Q_j^*) =: \mathrm{Im}\,Q_j$,
we have $|F_M| = e^{-(1/4)(f_1+f_2+f_3)}$ with

$$f_1(Q) = \sum_{i,j}w_{ij}\,\mathrm{tr}\,(s\,\mathrm{Im}\,Q_i + z)(s\,\mathrm{Im}\,Q_j + z) + \mathrm{c.c.}$$

$$f_2(Q) = -2\sum_{i,j}w_{ij}\,\mathrm{tr}\,(s\,\mathrm{Re}\,Q_i)(s\,\mathrm{Re}\,Q_j)$$

$$f_3(Q) = 4\sum_i\mathrm{tr}\,\Big(M_i + s\,\mathrm{Im}\,z\sum_j w_{ij}\Big)\mathrm{Re}\,Q_i$$

These expressions suggest making the following
choice of integration domain for $Q_i$ ($i = 1,\ldots,|\Lambda|$).
Pick some real constant $\lambda > 0$ and put

$$\mathrm{Re}\,Q_i = \lambda\,T_iT_i^*, \qquad \mathrm{Im}\,Q_i = \begin{pmatrix}P_i^+ & 0\\ 0 & P_i^-\end{pmatrix}$$

with $T_i \in \mathrm{U}(p,q)$, $P_i^+ \in \mathrm{Herm}(\mathbb{C}^p)$, $P_i^- \in \mathrm{Herm}(\mathbb{C}^q)$.
The set of matrices $Q_i$ so defined is referred to as
the Schäfer-Wegner domain $X^\lambda_{p,q}$. The range of the
field $Q = (Q_1,\ldots,Q_{|\Lambda|})$ is the direct product
$\mathcal{X} := (X^\lambda_{p,q})^{|\Lambda|}$.

To show that this is a good choice of domain, we
first of all show convergence of the integral
$\int_{\mathcal{X}}F_M(Q)\,DQ$ over the product domain
$\mathcal{X} = (X^\lambda_{p,q})^{|\Lambda|}$. The matrices $P_i$ commute with s, so

$$f_1(Q)\big|_{\mathcal{X}} = 2\,\mathrm{Re}\sum_{i,j}w_{ij}\,\mathrm{tr}\,(P_i + sz)(P_j + sz)$$

Since the coefficients $w_{ij}$ are positive as a quadratic
form, this expression is convex (with a positive
Hessian) in the Hermitian matrices $P_i$. Second, the
function

$$f_2(Q)\big|_{\mathcal{X}} = -2\lambda^2\sum_{i,j}w_{ij}\,\mathrm{tr}\,(T_iT_i^*)^{-1}\,T_jT_j^*$$

is bounded from below by the constant $-2\lambda^2n\sum_i w_{ii}$.
This holds true because $w_{ij}$ is negative for $i \neq j$, and
because $T_iT_i^* > 0$ and the trace of a product of two
positive Hermitian matrices is always positive.

Third,

$$f_3(Q)\big|_{\mathcal{X}} = 4\lambda\sum_i\mathrm{tr}\,\Big(M_i + s\,\mathrm{Im}\,z\sum_j w_{ij}\Big)T_iT_i^*$$

is positive, as $(\ldots)$ is positive Hermitian. As long as
$s\,\mathrm{Im}\,z > 0$, the function $f_3$ goes to infinity for all
possible directions of taking the $T_i$ to infinity on
$\mathrm{U}(p,q)$.

Thus, when the matrices $Q_i$ are taken to vary on
the Schäfer-Wegner domain $X^\lambda_{p,q}$, the absolute value
$|F_M| = e^{-(1/4)(f_1+f_2+f_3)}$ decreases rapidly at infinity.
This establishes the convergence of $\int_{\mathcal{X}}F_M(Q)\,DQ$.

Next, let us count dimensions. The mapping
$T \mapsto TT^*$ for $T \in \mathrm{U}(p,q) =: G$ is invariant under
right multiplication of T by elements of the unitary
subgroup $H := \mathrm{U}(p)\times\mathrm{U}(q)$; it is called the “Cartan
embedding” of G/H into G. The real manifold G/H
has dimension 2pq and so does its image under the
Cartan embedding. Augmenting this by the dimension
of $\mathrm{Herm}(\mathbb{C}^p)$ and $\mathrm{Herm}(\mathbb{C}^q)$ (from $P_i^\pm$), one gets
$\dim X^\lambda_{p,q} = 2pq + p^2 + q^2 = (p+q)^2 = n^2$, which is as
it should be.

Finally, why can one shift variables and do the
Gaussian integral over Q (with translation-invariant
DQ) by completing the square? This question is
legitimate as the Schäfer-Wegner domain $X^\lambda_{p,q}$ lacks
invariance under the required shift, which is
$Q_i \to Q_i - isz + \sum_j J_{ij}\,sM_js$.

To complete the square in [12], introduce a
parameter $t \in [0,1]$ and consider the family of shifts

$$Q_i \to Q_i + t\,\Big(-isz + \sum_j J_{ij}\,sM_js\Big)$$

For fixed t, this shift takes $\mathcal{X} = (X^\lambda_{p,q})^{|\Lambda|}$ into another
domain, $\mathcal{X}(t)$. Inspection shows that the function
[12] still decreases rapidly (uniformly in the $M_i$) on
$\mathcal{X}(t)$, as long as $t < 1$. Without changing the
integral, one can add pieces to $\mathcal{X}(t)$ (for $t < 1$) at
infinity to arrange for the chain $\mathcal{X} - \mathcal{X}(t)$ to be a
cycle. Because $\mathcal{X}(t)$ is homotopic to $\mathcal{X}(0) = \mathcal{X}$, this
cycle is a boundary: there exists a manifold $\mathcal{Y}(t)$ of
dimension $\dim\mathcal{X} + 1$ such that $\partial\mathcal{Y}(t) = \mathcal{X} - \mathcal{X}(t)$.
Viewed as a holomorphic differential form of degree
$(n^2|\Lambda|, 0)$ in the complex space $\mathrm{End}(\mathbb{C}^n)^{|\Lambda|}$, the
integrand $\omega := F_M(Q)\,DQ$ is closed (i.e., $d\omega = 0$).
Therefore, by Stokes’ theorem,

$$\int_{\mathcal{X}}\omega - \int_{\mathcal{X}(t)}\omega = \int_{\partial\mathcal{Y}(t)}\omega = \int_{\mathcal{Y}(t)}d\omega = 0$$

which proves $\int_{\mathcal{X}(t)}F_M(Q)\,DQ = \int_{\mathcal{X}}F_M(Q)\,DQ$, independent
of t. (This argument does not go through
for the nonrigorous choice $sQ_i := T_iP_iT_i^{-1}$ usually
made!)
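
The mechanism of the Stokes argument (a holomorphic integrand that decays rapidly, so that the contour may be shifted without changing the integral) already shows up in one dimension. The following sketch is only a one-dimensional analogue under that stated simplification, not the matrix integral of the text: it evaluates $\int e^{-(x + ic)^2}dx$ on the real line for several shifts c and finds the same value each time.

```python
import cmath
import math


def shifted_gaussian_integral(shift, half_width=12.0, steps=24000):
    """Trapezoidal estimate of the integral of exp(-(x + i*shift)^2) over the real line.

    The integrand is holomorphic and decays like exp(-x^2 + shift^2), so the
    truncation to [-half_width, half_width] is harmless for moderate shifts.
    """
    h = 2 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(-(x + 1j * shift) ** 2)
    return total * h


values = [shifted_gaussian_integral(c) for c in (0.0, 0.5, 1.5)]
reference = math.sqrt(math.pi)
```

Each entry of `values` should agree with `reference` up to numerical error, in the same way that the integral of $F_M(Q)\,DQ$ is independent of the shift parameter t.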

In the limit $t \to 1$, one encounters the expression

$$\int_{\mathcal{X}(1)}F_M(Q)\,DQ = \int_{\mathcal{X}}d\nu_{n,J}(isQ)\times e^{-\frac{1}{2}\sum_{i,j}J_{ij}\,\mathrm{tr}\,(sM_isM_j) + i\sum_k\mathrm{tr}\,(szM_k)}$$

with $d\nu_{n,J}$ as in [6]. The normalization integral over
$\mathcal{X}$ is defined by taking the Hermitian matrices $P_i$ to
be the inner variables of integration. The outer
integrals over the $T_i$ then demonstrably exist, and
one can fix the (otherwise arbitrary) normalization
of DQ by setting $\int_{\mathcal{X}}d\nu_{n,J}(isQ) = 1$. Making that
choice, and comparing with [10], one has proved

$$\Omega^{\rm bose}_{n,N}(z, J) = \int\Big(\int_{\mathcal{X}}F_{M(\varphi,\bar\varphi)}(Q)\,DQ\Big)$$

The final step is to change the order of integration
over the Q- and $\varphi$-variables, which is permitted
since the Q-integral converges uniformly in $\varphi$.
Doing the Gaussian $\varphi$-integral and shifting
$Q_k \to Q_k - isz$, one arrives at the Schäfer-Wegner formula
for $\Omega^{\rm bose}_{n,N}$:

$$\Omega^{\rm bose}_{n,N}(z, J) = \int_{\mathcal{X}}e^{\frac{1}{2}\sum_{i,j}w_{ij}\,\mathrm{tr}\,(sQ_isQ_j)}\times e^{-N\sum_k\mathrm{tr}\,\ln(Q_k - isz)}\,DQ \qquad [13]$$

which is a rigorous version of the naive formula [9].
Compared to Fyodorov’s formula, it has the disadvantage
of not being manifestly invariant under
global hyperbolic transformations $Q_i \to TQ_iT^*$ (the
integration domain $\mathcal{X}$ is not invariant). Its best
feature is that it does apply to the case of band
random matrices with one orbital per site ($N = 1$).


Supersymmetric Variant 


We are now in a position to tackle the problem of 
averaging ratios of determinants. For concreteness, 
we shall discuss the case where the number of 
determinants is two for both the numerator and the 
denominator, which is what is needed for the 
calculation of the function $G_{ij}^{(2)}(z_1, z_2)$ defined in
eqn [2]. We will consider the case of relevance for
the electrical conductivity: $z_1 = E + i\varepsilon$, $z_2 = E - i\varepsilon$,
with $E \in \mathbb{R}$ and $\varepsilon > 0$.

A Q-integral formula for $G_{ij}^{(2)}(z_1, z_2)$ can be
derived by combining the fermionic method for

$$\big\langle\det(z_1 - H)\,\det(z_2 - H + t_2E_{ij}^{ab})\big\rangle$$

with the Schäfer-Wegner bosonic formalism for

$$\big\langle\det{}^{-1}(z_1 - H - t_1E_{ji}^{ba})\,\det{}^{-1}(z_2 - H)\big\rangle$$

and eventually differentiating with respect to $t_1$, $t_2$ at
$t_1 = t_2 = 0$ and summing over a, b; see the subsection
“Green’s functions from determinants.” All steps are 
formally the same as before, but with traces and 
determinants replaced by their supersymmetric 
analogs. Having given a great many technical details 
in the last two sections, we now just present the 
final formula along with the necessary definitions 
and some indication of what are the new elements 
involved in the proof. 

Let each of $Q_{BB}$, $Q_{FF}$, $Q_{BF}$, and $Q_{FB}$ stand for a
$2\times 2$ matrix. If the first two matrices have
commuting entries and the last two anticommuting
ones, they combine to a $4\times 4$ supermatrix:

$$Q = \begin{pmatrix}Q_{BB} & Q_{BF}\\ Q_{FB} & Q_{FF}\end{pmatrix}$$

Relevant operations on supermatrices are the
supertrace,

$$\mathrm{Str}\,Q = \mathrm{tr}\,Q_{BB} - \mathrm{tr}\,Q_{FF}$$

and the superdeterminant,

$$\mathrm{Sdet}\,Q = \frac{\det(Q_{BB})}{\det(Q_{FF} - Q_{FB}Q_{BB}^{-1}Q_{BF})}$$

These are related by the identity $\mathrm{Sdet} = \exp\circ\mathrm{Str}\circ\ln$
whenever the superdeterminant exists and is
nonzero.

In the process of applying the method described
earlier, a supermatrix $Q_i$ gets introduced at every
site i of the lattice $\Lambda$. The domain of integration for
each of the matrix blocks $(Q_i)_{BB}$ ($i = 1,\ldots,|\Lambda|$) is
taken to be the Schäfer-Wegner domain $X^\lambda_{1,1}$ (with
some choice of $\lambda > 0$); the integration domain for
each of the $(Q_i)_{FF}$ is the space of Hermitian $2\times 2$
matrices, as before.

Let Ext be the 4 x 4 (super)matrix with unit entry 
in the ope -left corner and zeros elsewhere; simi- 
larly, E% has unity in the lower-right corner and 
zeros elsewhere. Putting s=diag(1,—1,1,1) and 
z = diag(z1, 22, 21,22), the supersymmetric Q- ee 
formula for the generating function of GY 
obtained by combining the Schafer—Wegner Go oni 
method with the fermionic variant — is written as 


det(z — H) det(z2 — H + t)E%) 
det(z1 — H — t E?*) det(z2 = H) 


— JO Ean 


x et In (E,,<(Q,—isz)@ES +it EM, gE in EROE) [14] 


where the second supertrace includes a sum over
sites and orbitals, and on setting $t_1 = t_2 = 0$ becomes

$$e^{-N \sum_r \mathrm{Str} \ln(Q_r - isz)} = \prod_r \mathrm{Sdet}^{-N}(Q_r - isz)$$

The superintegral “measure” $DQ = \prod_r DQ_r$ is the
flat Berezin form, that is, the product of differentials
for all the commuting matrix entries in $(Q_r)_{BB}$ and
$(Q_r)_{FF}$, times the product of derivatives for all the
anticommuting matrix entries in $(Q_r)_{BF}$ and $(Q_r)_{FB}$.
To prove the formula [14], two new tools are
needed, a brief account of which is as follows.


Gaussian Superintegrals 


There exists a supersymmetric generalization of the 
Gaussian integration formulas given in the subsec- 
tion “Determinants as Gaussian integrals”: if 
A,D(B,C) are linear operators or matrices with 
commuting (resp., anticommuting) entries, and 


$\mathfrak{Re}\,A > 0$, one has


Verification of this formula is straightforward.
Using it, one writes the last factor in [14] as a
Gaussian superintegral over four vectors: $\varphi_1$, $\varphi_2$, $\psi_1$,
and $\psi_2$. The integrand then becomes Gaussian in the
matrices $Q_r$.


Shifting Variables 


The next step in the proof is to do the “Gaussian”
integral over the supermatrices $Q_r$. By definition, in
a superintegral, one first carries out the Fermi
integral, and afterwards the ordinary integrations.
The Gaussian integral over the anticommuting parts
$(Q_r)_{BF}$ and $(Q_r)_{FB}$ is readily done by completing the
square and shifting variables using the fact that
fermionic integration is differentiation:

$$\int d\xi\, f(\xi - \xi') = \frac{\partial}{\partial \xi} f(\xi - \xi') = \int d\xi\, f(\xi)$$

Similarly, the Gaussian integral over the Hermitian
matrices $(Q_r)_{FF}$ is done by completing the square
and shifting. The integral over $(Q_r)_{BB}$, however, is
not Gaussian, as the domain is not $\mathbb{R}^n$ but the
Schäfer–Wegner domain. Here, more advanced
calculus is required: these integrations are done by
using a supersymmetric change-of-variables theorem
due to Berezin to make the necessary shifts by
nilpotents. (There is not enough space to describe
this here, so please consult Berezin’s (1987) book.)
Without difficulty, one finds the result to agree with
the left-hand side of eqn [14], thereby establishing
that formula.
Approximations


All manipulations so far have been exact and, in 
fact, rigorous (or can be made so with little extra 
effort). Now we turn to a sequence of approxima- 
tions that have been used by physicists to develop a 
quantitative understanding of weakly disordered 
quantum dots, wires, films, etc. While physically 
satisfactory, not all of these approximations are 
under full mathematical control. We will briefly 
comment on their validity as we go along. 


Saddle-Point Manifold 


We continue to consider $G^{(2)}(E + i\varepsilon, E - i\varepsilon)$ and
focus on $E = 0$ (the center of the energy band) for
simplicity. By varying the exponent on the right-hand
side of [14] and setting the variation to zero
one obtains, for $t_1 = t_2 = 0$,

$$\sum_j w^{-1}_{ij}\, s Q_j s - N Q_i^{-1} = 0$$


which is called the saddle-point equation.
Let us now assume translational invariance,
$w_{ij} = f(|i - j|)$. Then, if $\lambda = \sqrt{N / \sum_j w^{-1}_{ij}}$, the saddle-point
equation has $i$-independent solutions of the form

$$Q_j = \lambda \begin{pmatrix} q_{BB} & 0 \\ 0 & q_{FF} \end{pmatrix}$$

where for $q_{FF}$ there are three possibilities: two
isolated points $q_{FF} = \pm 1$ (unit matrix) coexist with
a manifold

$$q_{FF} = \begin{pmatrix} \cos\theta_1 & \sin\theta_1\, e^{i\phi_1} \\ \sin\theta_1\, e^{-i\phi_1} & -\cos\theta_1 \end{pmatrix} \qquad [15]$$

which is two-dimensional; whereas the solution
space for $q_{BB}$ consists of a single connected
2-manifold:

$$q_{BB} = \begin{pmatrix} \cosh\theta_0 & \sinh\theta_0\, e^{i\phi_0} \\ \sinh\theta_0\, e^{-i\phi_0} & \cosh\theta_0 \end{pmatrix} \qquad [16]$$

The solutions $q_{FF} = \pm 1$ are usually discarded in the
physics literature. (The argument is that they break
supersymmetry and therefore get suppressed by
fermionic zero modes. For the simpler case of the
one-point function [1] and in three space dimensions,
such suppression has recently been proved by
Disertori, Pinson, and Spencer.) Other solutions for
$q_{BB}$ are ruled out by the requirement $\mathfrak{Re}\,Q_i > 0$ for
the Schäfer–Wegner domain.

The set of matrices [16] and [15], the “saddle-point
manifold,” is diffeomorphic to the product of
a 2-hyperboloid $H^2$ with a 2-sphere $S^2$. Moving
along that manifold $M := H^2 \times S^2$ leaves the $Q$-field
integrand [14] unchanged (for $z_1 = z_2 = t_1 = t_2 = 0$).
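
The parametrizations [15] and [16] can be checked numerically against the dimensionless saddle-point equation $sqs = q^{-1}$ with $s = \mathrm{diag}(1, -1, 1, 1)$. The sketch below is only an illustration: the function names and the sample angles are assumptions, and the matrices are read off from [15] and [16] as two-parameter families in $(\theta_1, \phi_1)$ and $(\theta_0, \phi_0)$.

```python
import numpy as np

# Signature matrix s = diag(1, -1, 1, 1) from the text.
S = np.diag([1.0, -1.0, 1.0, 1.0])

def q_bb(theta0, phi0):
    """Point on the hyperboloid H^2, read off from eqn [16]."""
    return np.array([
        [np.cosh(theta0), np.sinh(theta0) * np.exp(1j * phi0)],
        [np.sinh(theta0) * np.exp(-1j * phi0), np.cosh(theta0)],
    ])

def q_ff(theta1, phi1):
    """Point on the sphere S^2, read off from eqn [15]."""
    return np.array([
        [np.cos(theta1), np.sin(theta1) * np.exp(1j * phi1)],
        [np.sin(theta1) * np.exp(-1j * phi1), -np.cos(theta1)],
    ])

def q0(theta0, phi0, theta1, phi1):
    """Block-diagonal 4 x 4 matrix diag(q_BB, q_FF)."""
    q = np.zeros((4, 4), dtype=complex)
    q[:2, :2] = q_bb(theta0, phi0)
    q[2:, 2:] = q_ff(theta1, phi1)
    return q

def saddle_point_residual(q):
    """Norm of s q s - q^{-1}; it vanishes on the saddle-point manifold."""
    return np.linalg.norm(S @ q @ S - np.linalg.inv(q))

# Arbitrary sample angles (assumed values).
residual = saddle_point_residual(q0(0.7, 0.3, 1.1, -0.4))
bb_eigenvalues = np.linalg.eigvalsh(q_bb(0.7, 0.3))
print(residual, bb_eigenvalues)
```

The eigenvalues of $q_{BB}$ are $e^{\pm\theta_0}$, so the positivity requirement of the Schäfer–Wegner domain is met along the whole hyperboloid.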

One can actually anticipate the existence of such a
manifold from the symmetries at hand. These are
most transparent in the starting point of the
formalism as given by the characteristic function
$\langle e^{-iK_H} \rangle$ with

$$K_H = (\varphi_1, H\varphi_1) - (\varphi_2, H\varphi_2) + (\psi_1, H\psi_1) + (\psi_2, H\psi_2)$$

The signs of this quadratic expression are what is
encoded in the signature matrix $s = \mathrm{diag}(1, -1, 1, 1)$
(recall that the first two entries are forced by $\mathfrak{Im}\,z_1 > 0$
and $\mathfrak{Im}\,z_2 < 0$). The Hermitian form $K_H$ is
invariant under the product of two Lie groups:
U(1, 1) acting on the $\varphi$’s, and U(2) acting on the $\psi$’s.
This invariance gets transferred by the formalism to
the $Q$-side; the saddle-point manifold $M$ is in fact an
“orbit” of the group action of $G := \mathrm{U}(1,1) \times \mathrm{U}(2)$
on the $Q$-field. In the language of physics, the
degrees of freedom of $M$ correspond to the Goldstone
bosons of a broken symmetry.

$K_H$ also has some supersymmetries, mixing $\varphi$’s
with $\psi$’s. At the infinitesimal level, these combine
with the generators of $G$ to give a Lie superalgebra
of symmetries $\mathfrak{g} := \mathfrak{u}(1,1|2)$. One therefore expects
some kind of saddle-point supermanifold, say $\mathcal{M}$, on
the $Q$-side.

$\mathcal{M}$ can be constructed by extending the above
solution $q_0 := \mathrm{diag}(q_{BB}, q_{FF})$ of the dimensionless
saddle-point equation $sqs = q^{-1}$ to the full 4 × 4
supermatrix space. Putting $q = q_0 + q_1$ with

$$q_1 = \begin{pmatrix} 0 & q_{BF} \\ q_{FB} & 0 \end{pmatrix}$$

and linearizing in $q_1$, one gets

$$s q_1 s = -q_0^{-1} q_1 q_0^{-1} \qquad [17]$$

The solution space of this linear equation for $q_1$ has
dimension 4 for all $q_0 \in M$. Based on it, one expects
four Goldstone fermions to emerge along with the
four Goldstone bosons of $M$.
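
The dimension count for the linear equation [17] can be reproduced with ordinary linear algebra, since only the number of independent solutions matters and not the anticommuting nature of the entries. The sketch below is an illustration under that assumption: it treats the eight entries of the off-diagonal blocks as complex numbers, builds the matrix of the map $q_1 \mapsto s q_1 s + q_0^{-1} q_1 q_0^{-1}$, and counts its null space. All names and sample angles are hypothetical.

```python
import numpy as np

S = np.diag([1.0, -1.0, 1.0, 1.0])

def q0_point(theta0, phi0, theta1, phi1):
    """diag(q_BB, q_FF) with q_BB on the hyperboloid and q_FF on the sphere."""
    q = np.zeros((4, 4), dtype=complex)
    q[0, 0] = q[1, 1] = np.cosh(theta0)
    q[0, 1] = np.sinh(theta0) * np.exp(1j * phi0)
    q[1, 0] = np.sinh(theta0) * np.exp(-1j * phi0)
    q[2, 2] = np.cos(theta1)
    q[3, 3] = -np.cos(theta1)
    q[2, 3] = np.sin(theta1) * np.exp(1j * phi1)
    q[3, 2] = np.sin(theta1) * np.exp(-1j * phi1)
    return q

def solution_dimension(q0):
    """Dimension of {q1 off-diagonal : s q1 s = -q0^{-1} q1 q0^{-1}}."""
    q0_inv = np.linalg.inv(q0)
    # Index pairs belonging to the off-diagonal (BF and FB) blocks.
    odd_entries = [(r, c) for r in range(4) for c in range(4) if (r < 2) != (c < 2)]
    columns = []
    for r, c in odd_entries:
        basis = np.zeros((4, 4), dtype=complex)
        basis[r, c] = 1.0
        image = S @ basis @ S + q0_inv @ basis @ q0_inv
        columns.append(image.reshape(-1))
    operator = np.array(columns).T  # shape (16, 8)
    return len(odd_entries) - np.linalg.matrix_rank(operator, tol=1e-10)

dimension = solution_dimension(q0_point(0.7, 0.3, 1.1, -0.4))
print(dimension)
```

Two of the four solutions live in the BF block and two in the FB block, which matches the count of four Goldstone fermions.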

For the simple case under consideration, one can 
introduce local coordinates and push the analysis to 
nonlinear order, but things get quickly out of hand 
(when done in this way) for more challenging, 
higher-rank cases. Fortunately, there exists an 
alternative, coordinate-independent approach, as 
the mathematical object to be constructed is 
completely determined by symmetry! 


Riemannian Symmetric Superspace 


The linear equation [17] associates with every point
$x \in M$ a four-dimensional vector space of solutions,
$V_x$. As the point $x$ moves on $M$ the vector spaces $V_x$
turn and twist; thus, they form what is called a
vector bundle $V$ over $M$. (The bundle at hand turns
out to be nontrivial, i.e., there exists no global
choice of coordinates for it.)

A section of $V$ is a smooth mapping $v: M \to V$
such that $v(x) \in V_x$ for all $x \in M$. The sections of
$V$ are to be multiplied in the exterior sense, as they
represent anticommuting degrees of freedom;
hence the proper object to consider is the exterior
bundle, $\Lambda V$.

It is a beautiful fact that there exists a unique
action of the Lie superalgebra $\mathfrak{g}$ on the sections of
$\Lambda V$ by first-order differential operators, or derivations
for short. (Be advised however that this
canonical $\mathfrak{g}$-action is not well known in physics or
mathematics.)

The manifold $M$ is a symmetric space, that is, a
Riemannian manifold with $G$-invariant geometry.
Its metric tensor, $g$, uniquely extends to a second-rank
tensor field (still denoted by $g$) which maps
pairs of derivations of $\Lambda V$ to sections of $\Lambda V$, and is
invariant with respect to the $\mathfrak{g}$-action. This collection
of objects (the symmetric space $M$, the
exterior bundle $\Lambda V$ over it, the action of the Lie
superalgebra $\mathfrak{g}$ on the sections of $\Lambda V$, and the
$\mathfrak{g}$-invariant second-rank tensor $g$) forms what
the author calls a “Riemannian symmetric superspace,”
$\mathcal{M}$.


Nonlinear Sigma Model 


According to the Landau—Ginzburg—Wilson (LGW) 
paradigm of the theory of phase transitions, the 
large-scale physics of a statistical mechanical system 
near criticality is expected to be controlled by an 
effective field theory for the long-wavelength excita- 
tions of the order parameter of the system. 

Wegner is credited with the profound insight that
the LGW paradigm applies to the random matrix
situation at hand, with the role of the order
parameter being taken by the matrix $Q$. He argued
that transport observables (such as the electrical
conductivity) are governed by slow spatial variations
of the $Q$-field inside the saddle-point manifold.
Efetov skilfully implemented this insight in a supersymmetric
variant of Wegner’s method.

While the direct construction of the effective
continuum field theory by gradient expansion of
[14] is not an entirely easy task, the outcome of the
calculation is predetermined by symmetry. On
general grounds, the effective field theory has to be
a nonlinear sigma model for the Goldstone bosons
and fermions of $\mathcal{M}$: if $\{\phi^A\}$ are local coordinates for
the bundle $V$ with metric $g_{AB}(\phi) = g(\partial/\partial\phi^A, \partial/\partial\phi^B)$,
the action functional is

$$S = \sigma \int d^dx\; \partial_\mu \phi^A\, g_{AB}(\phi)\, \partial_\mu \phi^B$$

The coupling parameter $\sigma$ has the physical meaning
of bare (i.e., unrenormalized) conductivity. In the
present model $\sigma = N W^2 a^{2-d}$, where $W$ is essentially
the width of the band random matrix in units of the
lattice spacing $a$ (the short-distance cutoff of the
continuum field theory). $S$ is the effective action in
the limit $z_1 = z_2$. For a finite frequency $\omega = z_1 - z_2$, a
symmetry-breaking term of the form $i\omega\nu \int d^dx\, f(\phi)$,
where $\nu = N(\pi\lambda)^{-1} a^{-d}$ is the local density of states,
has to be added to $S$.

By perturbative renormalization group analysis, that
is, by integrating out the rapid field fluctuations, one
finds for $d = 2$ that $\sigma$ decreases on increasing the cutoff
$a$. This property is referred to as “asymptotic freedom”
in field theory. On its basis one expects exponentially
decaying correlations, and hence localization of all
states, in two dimensions. However, a mathematical
proof of this conjecture is not currently available.

In three dimensions and for a sufficiently large bare
conductivity, the renormalization flow goes toward
the metallic fixed point ($\sigma \to \infty$), where $G$-symmetry
is broken spontaneously. A rigorous proof of this
important conjecture (existence of disordered metals
in three space dimensions) is not available either.


Zero-Mode Approximation 


For a system in a box of linear size $L$, the cost of
exciting fluctuations in the sigma model field is
estimated as the Thouless energy $E_{\mathrm{Th}} = \sigma/\nu L^2$. In the
limit of small frequency, $|\omega| \ll E_{\mathrm{Th}}$, the physical
behavior is dominated by the constant modes
$\phi^A(x) = \phi^A$ (independent of $x$). By computing the
integral over these modes, Efetov found the energy-level
correlations in the small-frequency limit to be
those of the GUE.


See also: Random Matrix Theory in Physics; Symmetry 
Classes in Random Matrix Theory. 
Introduction


Many systems of partial differential equations 
arising in mathematical physics and differential 
geometry are quasilinear: the top-order derivatives 
enter only linearly. They may be cast in the form 
of first-order systems by introducing, if needed, 
derivatives of the unknowns as additional unknowns. 
For such systems, the theory of symmetric—hyperbolic 
(SH) systems provides a unified framework for 
proving the local existence of smooth solutions if 
the initial data are smooth. It is also convenient for 
constructing numerical schemes, and for studying 
shock waves. Despite what the name suggests, the 
impact of the theory of SH systems is not limited to 
hyperbolic problems, two examples being Tricomi’s 
equation, and equations of Cauchy—Kowalewska 
type. 

Application of the SH framework usually requires a 
preliminary reduction to SH form (“symmetrization”). 

After comparing briefly the theory of SH systems 
with other functional-analytic approaches, we col- 
lect basic definitions and notation. We then present 
two general rules, for symmetrizing conservation 
laws and strictly hyperbolic equations, respectively. 
We next turn to special features possessed by linear 
SH systems, and give a general procedure to prove 
existence, which covers both linear and nonlinear 
systems. We then summarize those results on shock 
waves, and on blow-up singularities, which are 
related to SH structure. Examples and applications 
are collected in the last section. 

The advantages of SH theory are: a standardized 
procedure for constructing solutions; the availability 
of standard numerical schemes; a natural way to 
prove that the speed of propagation of support is 
finite. On the other hand, the symmetrization 
process is sometimes ad hoc, and does not respect 


the physical or geometric nature of the unknowns; 
to obviate this defect to some extent, we remark that 
symmetrizers may be viewed as introducing a new 
Riemannian metric on the space of unknowns. The 
search for a comprehensive criterion for identifying 
equations and boundary conditions compatible with 
SH structure is still the object of current research. 
The most important fields of application of the 
theory today are general relativity and fluid 
dynamics, including magnetohydrodynamics. 


Context of SH Theory in Modern Terms 


The basic reason why the theory works may be 
summarized as follows for the modern reader; the 
history of the subject is, however, more involved. 

Let $H$ be a real Hilbert space. Consider a linear
initial-value problem $du/dt + Au = 0$; $u(0) = u_0 \in H$,
where $A$ is unbounded, with domain $D(A)$. By
Stone’s theorem, one can solve it in a generalized
sense, if the unbounded operator $A$ satisfies $A + A^* = 0$.
This condition contains two ingredients: a
symmetry condition on $A$, and a maximality condition
on $D(A)$, which incorporates boundary conditions
(von Neumann, Friedrichs). Semigroup theory
(Hille and Yosida, Phillips, and many others)
handles more general operators $A$: it is possible to
solve this problem in the form $u(t) = S(t)u_0$ for $t \ge 0$,
where $\{S(t)\}_{t \ge 0}$ is a continuous contraction semigroup,
if and only if $(Au, u) \ge 0$, and the equation $x + Ax = y$
has a solution for every $y$ in $H$ (this is a
maximality condition on $D(A)$). One then says that
$A$ is maximal monotone. For such operators, $A + A^* \ge 0$.
SH systems are systems $Qu_t + Au = F$,
satisfying two algebraic conditions ensuring formally
that $A + A^*$ is bounded, and that $Q$ is
symmetric and positive definite. This algebraic
structure enables one to solve the problem directly,
without explicit reference to semigroup theory.
Precise definitions are given next.

We assume throughout that all coefficients, 
nonlinearities, and data are smooth unless otherwise 
specified. 


Symmetric Hyperbolic Systems and Shock Waves 161 


Definitions 


Consider a quasilinear system

$$M^{A\alpha}_{B}(x, u)\,\partial_\alpha u^B = N^A(x, u) \qquad [1]$$

where $u = (u^A)_{A=1,\ldots,m}$, $x = (x^\alpha)_{\alpha=0,\ldots,n}$, and $\partial_\alpha = \partial/\partial x^\alpha$.
The components of $u$ may be real or
complex. We follow the summation convention on
repeated indices in different positions; $x^0 = t$ may be
thought of as the evolution variable; we write
$x = (t, \mathbf{x})$, with $\mathbf{x} = (x^1, \ldots, x^n)$. Indices $A, B, \ldots$ run
from 1 to $m$, indices $j, k, \ldots$ from 1 to $n$, and Greek
indices from 0 to $n$. The complex conjugate of $u^A$ is
written $\bar{u}^A$.


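• Equation [1] is symmetrizable if there are functions
$\sigma_{AB}(x, u)$ such that

$$M^\alpha_{AB} := \sigma_{AC}\, M^{C\alpha}_{B}$$

satisfies the condition $M^\alpha_{AB} = \bar{M}^\alpha_{BA}$ for every $\alpha$.

• It is symmetric if it is symmetrizable with
$\sigma_{AB} = \delta_{AB}$.

• It is symmetric-hyperbolic with respect to $k_\alpha$ if it
is symmetric and if $k_\alpha M^\alpha_{AB}$ is positive definite:

$$k_\alpha M^\alpha_{AB}\, \xi^A \bar{\xi}^B > 0 \quad \text{for } \xi = (\xi^A) \neq 0.$$

Thus, a symmetrizer $(\sigma_{AB})$ gives rise to a
Riemannian metric $(k_\alpha \sigma_{AC} M^{C\alpha}_{B})$ on the space of
unknowns, independent of any Riemannian structure
on $x$-space. The system is SH with respect to $x^0$
if $k_\alpha = \delta_{\alpha 0}$.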

The simplest class of SH systems is provided by
real semilinear systems of the form

$$A^0(x)\,\partial_t u + A^j(x)\,\partial_j u = N(x, u) \qquad [2]$$

where the $A^\alpha$ are real symmetric matrices, $A^0$ is
symmetric and positive definite, and $k_\alpha = \delta_{\alpha 0}$. Writing
$A^0 = P^2$, with $P$ symmetric and positive definite,
one finds that $v = Pu$ solves a SH system with $A^0 = I$
(identity matrix).

Conservation laws (with “reaction” or “source”
term $N^A$) are usually defined as quasilinear systems
of the form

$$\partial_\alpha f^{A\alpha}(x, u) = N^A(x, u) \qquad [3]$$


They are common in fluid dynamics and combustion.
They are limiting cases of nonlinear diffusion
equations of the typical form

$$\partial_\alpha f^{A\alpha}(x, u) = N^A(x, u) + \varepsilon\,\partial_j\left(B^{Ajk}_{B}\,\partial_k u^B\right) \qquad [4]$$

The determination of the form of the coefficients
$B^{Ajk}_{B}$ is a nontrivial modeling issue; they may reflect
varied physical processes such as heat conduction,
viscosity, or bulk viscosity. They may depend on $x$,
$u$, and the derivatives of $u$. The simplest case is
$B^{Ajk}_{B} = D^{jk}\delta^A_B$ with $(D^{jk})$ diagonal. Some authors
require the symmetry condition 


5acBOĂ = poB [5] 


Equations in which $f^{A\alpha} = u^A \delta^\alpha_0$ are called reaction-diffusion
equations; they arise in physical and
biological problems in which chemical reactions
and diffusion phenomena are combined, and in
population dynamics.

A conservation law is symmetric if and only if
$\partial f^{A\alpha}/\partial u^B$ is symmetric in $A$ and $B$, which means
that there are, locally, functions $g^\alpha(x, u)$ such that
$f^{A\alpha} = \partial g^\alpha/\partial u^A$.

A more fundamental derivation of conservation 
laws would take us beyond the scope of this survey. 


Symmetrization 


Two general procedures for symmetrization are 
available: one for conservation laws, the other for 
semilinear strictly hyperbolic problems. 


Conservation Laws with a Convex Entropy 


Consider, for simplicity, a conservation law of
the form

$$\partial_t u^A + \partial_j f^{Aj}(u) = 0 \qquad [6]$$

We, therefore, assume that $f^{A\alpha} = f^{A\alpha}(u)$ and
$f^{A0}(u) = u^A$. We show that the following three
statements are equivalent locally: (1) there is a
strictly convex function $U(u)$ such that $\sigma_{AB} = \partial^2 U/\partial u^A \partial u^B$
is a symmetrizer; (2) eqn [6] implies a
scalar relation of the form $\partial_\alpha U^\alpha = 0$, with $U^0$ strictly
convex; and (3) there is a change of unknowns
$v_A = v_A(u)$ such that the system satisfied by $v = (v_A)$
is SH and $(\partial v_A/\partial u^B)$ is positive definite.

In fluid dynamics, $U^0$ may sometimes be related
to specific entropy, and $U^j$ to entropy flux. For this
reason, if (2) holds, one says that $U^0$ is an entropy
for eqn [6], and that $(U^0, U^j)$ is an entropy pair. A
system may have several entropies in this sense; this
fact is sometimes useful in studying convergence
properties of approximate solutions of eqn [6].

Let us now prove the equivalence of these 
properties. 

Assume first (3): there are new unknowns
$v_A = v_A(u)$ and functions $g^\alpha(v)$ such that
$f^{A\alpha} = \partial g^\alpha/\partial v_A$. One finds that if eqn [6] holds,

$$\partial_\alpha U^\alpha = 0 \quad \text{where} \quad U^\alpha = v_A \frac{\partial g^\alpha}{\partial v_A} - g^\alpha \qquad [7]$$

Furthermore, we have $f^{A0} = u^A$; therefore, eqn [7]
gives: $U^0 = v_A u^A - g^0$, so that $U^0$ is the Legendre
transform (familiar from mechanics) of $g^0$. It follows
that $v_A = \partial U^0/\partial u^A$. Finally, $(\partial v_A/\partial u^B) = (\partial^2 U^0/\partial u^A \partial u^B)$
is positive definite, and $U^0$ is strictly
convex.

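We have proved that (3) implies (2). Next, assume
(2): the entropy equality $\partial_t U + \partial_j U^j = 0$ holds identically,
and not just for the solution at hand. Using
[6], we find

$$\partial_t U + \partial_j U^j = \frac{\partial U}{\partial u^A}\,\partial_t u^A + \frac{\partial U^j}{\partial u^B}\,\partial_j u^B = \left[ -\frac{\partial U}{\partial u^A} \frac{\partial f^{Aj}}{\partial u^B} + \frac{\partial U^j}{\partial u^B} \right] \partial_j u^B$$

Assumption (2), therefore, means that $U$ is strictly
convex and satisfies

$$\frac{\partial U}{\partial u^A} \frac{\partial f^{Aj}}{\partial u^B} = \frac{\partial U^j}{\partial u^B} \qquad [8]$$

Now, letting $v_A = \partial U/\partial u^A$ and $g^j(v) = v_A f^{Aj} - U^j$,
we find

$$\frac{\partial g^j}{\partial v_A} = f^{Aj} + \left[ \frac{\partial U}{\partial u^B} \frac{\partial f^{Bj}}{\partial u^C} - \frac{\partial U^j}{\partial u^C} \right] \frac{\partial u^C}{\partial v_A} = f^{Aj} \qquad [9]$$

Let $\sigma_{AB} = \partial^2 U/\partial u^A \partial u^B$. Since $U$ is strictly convex,
$(\sigma_{AB})$ is positive definite, and so is its inverse. We
have now proved (3). Note that $u^A = \partial g^0/\partial v_A$,
where $g^0(v) = u^A v_A - U(u)$ is the Legendre transform
of $U$.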

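Next, using eqn [9], and the relations $\sigma_{AB} = \partial v_A/\partial u^B = \partial v_B/\partial u^A$, we find

$$0 = \sigma_{AB}\left[\partial_t u^B + \partial_j f^{Bj}\right] = \sigma_{AB}\,\partial_t u^B + \frac{\partial v_B}{\partial u^A} \frac{\partial^2 g^j}{\partial v_B\, \partial v_C} \frac{\partial v_C}{\partial u^D}\,\partial_j u^D$$

which is SH; therefore, $\sigma_{AB}$ is a symmetrizer for eqn
[6], and (1) is proved. Thus, (2) implies (1) and (3).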
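Finally, if (1) holds, $\sigma_{AC}\,\partial f^{Cj}/\partial u^B$ is symmetric in
$A$ and $B$. It follows that

$$\frac{\partial}{\partial u^C}\left[\frac{\partial U}{\partial u^A} \frac{\partial f^{Aj}}{\partial u^B}\right] = \sigma_{AC} \frac{\partial f^{Aj}}{\partial u^B} + \frac{\partial U}{\partial u^A} \frac{\partial^2 f^{Aj}}{\partial u^B\, \partial u^C}$$

is symmetric in $B$ and $C$, so that there are, locally,
functions $U^j$ such that eqn [8] holds. Therefore,
$(U, U^j)$ is an entropy pair, and we see that (1)
implies (2).

This completes the proof of the equivalence of (1),
(2), and (3).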
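
Statement (1) can be illustrated on a concrete system. The sketch below uses the one-dimensional shallow-water equations, which are not discussed in the text and are chosen here only as an assumed example: unknowns $u = (h, m)$ (depth and momentum), flux $f(u) = (m,\; m^2/h + g h^2/2)$, and entropy $U = m^2/(2h) + g h^2/2$. It checks at one sample state that the Hessian of $U$ is positive definite and that multiplying the flux Jacobian by it gives a symmetric matrix.

```python
import numpy as np

G = 9.81  # gravitational acceleration (illustrative value)

def flux_jacobian(h, m):
    """Jacobian df/du of f(u) = (m, m^2/h + G h^2/2) with u = (h, m)."""
    return np.array([
        [0.0, 1.0],
        [-m**2 / h**2 + G * h, 2.0 * m / h],
    ])

def entropy_hessian(h, m):
    """Hessian sigma_AB of the entropy U = m^2/(2h) + G h^2/2."""
    return np.array([
        [m**2 / h**3 + G, -m / h**2],
        [-m / h**2, 1.0 / h],
    ])

def symmetrized_flux(h, m):
    """sigma_AC * df^C/du^B, which statement (1) requires to be symmetric."""
    return entropy_hessian(h, m) @ flux_jacobian(h, m)

h, m = 2.0, 1.5  # sample state with h > 0 (assumed values)
sigma = entropy_hessian(h, m)
product = symmetrized_flux(h, m)
print(product)
```

Both off-diagonal entries of the product equal $g - m^2/h^3$, so the Hessian of this entropy acts as a symmetrizer wherever $h > 0$.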


Strictly Hyperbolic Equations 


Consider the scalar equation $Pf = g(t, x)$, where $P$ is
the linear operator

$$P = \partial_t^N - \sum_{j=0}^{N-1} p_{N-j}(t, x, \partial_x)\,\partial_t^j$$

of order $N$. Let $\Lambda = (1 - \Delta)^{1/2}$, where $\Delta$ is the Laplace
operator on the space variables. Then $u = (u^A)$, where
$u^A = \partial_t^{A-1}\Lambda^{N-A} f$ for $A = 1, \ldots, N$, solves a first-order
pseudodifferential system of the form

$$\partial_t u - Lu = G$$

If $P$ is strictly hyperbolic, the principal symbol
$a_1(t, x, \xi)$ of $L$ has a diagonal form with real
eigenvalues $\lambda_j(t, x, \xi)$, and there are projectors
$p_j(t, x, \xi)$ ($p_j^2 = p_j$) which commute with $a_1$, such that
$1 = \sum_j p_j$ and $a_1 = \sum_j \lambda_j p_j$. Let $r_0 = \sum_j p_j^* p_j$, and
$r_0(D)$ the corresponding operator. Equation

$$r_0(D)\,\partial_t u - r_0(D) L u = r_0(D) G$$

is formally SH in the following sense: $r_0$ is positive
definite and $r_0 a_1$ is Hermitian.


Linear Problems 
Consider a linear system

$$Lu = Q(t, x)\,\partial_t u + A^j(t, x)\,\partial_j u + B(t, x)u = f(t, x) \qquad [10]$$

We assume that $Q$ and the $A^j$ are real and
symmetric, $Q \ge c$ with $c$ positive, and all coefficients
and their first-order derivatives are bounded.


Energy Identity 


Multiplying the equation by $2u^T$ (where $u^T$ is the transpose of $u$), one
derives the “energy identity”

$$\partial_t(u^T Q u) + \partial_j(u^T A^j u) + u^T C u = 2u^T f(t, x) \qquad [11]$$

where $C = 2B - \partial_t Q - \partial_j A^j$. $C$ is not necessarily
positive. However, $v := u \exp(-\lambda t)$ satisfies a linear
SH system for which $C$ is positive definite if $\lambda$ is
large enough.


Propagation of Support 


A basic property of wave-like equations is finite
speed of propagation of support: if the right-hand
side vanishes, and if the solution at time 0 is
localized in the ball of radius $r$, then the solution
at time $t$ is localized in the ball of radius $r + ct$ for a
suitable constant $c$.

This property also holds for SH systems. To see
this, let us consider the set where a solution $u$
vanishes: if the initial condition vanishes for $|x| < R$,
we claim that $u$ at some later time vanishes for $|x| < R - t/a$,
for $a$ large enough.

Indeed, let us integrate the energy identity on a
truncated cone $\Gamma := \{|x| \le a(t_0 - t)/t_0;\; 0 \le t \le t_1\}$
with $t_1 < t_0$. The boundary of $\Gamma$ consists of three
parts: $\partial\Gamma = \Omega_0 \cup \Omega_1 \cup S$, where $\Omega_0$ and $\Omega_1$ represent
the portions of the boundary on which $t = 0$ and $t = t_1$,
respectively. The outer normal to $S$ is proportional
to $(a, t_0 x^j/|x|)$. Let $E(s)$ denote the integral of $u^T Q u$
on $\Gamma \cap \{t = s\}$. Integrating eqn [11] by parts, we
obtain

$$E(t_1) - E(0) + \int_S u^T \Phi u\, ds = \iint_\Gamma \left(2u^T f - u^T C u\right) dt\, dx \qquad [12]$$

where $\Phi$ is proportional to $aQ + t_0 \sum_j A^j x^j/|x|$.
Take $a$ so large that $\Phi$ is positive definite. The
integral over $S$ is then non-negative. If $C$ is positive
definite, $f = 0$, and the initial data vanish on $\Omega_0$, so that $E(0) = 0$, we find that
$E(t_1) \le 0$. Since $Q$ is positive definite, this implies
$u = 0$ on $\Omega_1$, as claimed.


A Numerical Scheme 


System $Lu = f$ may be discretized, for example, by
the Lax–Friedrichs method: let $h$ be the discretization
step in space, and $k$ the time step; write
$T_j u(t, x) = u(t, x^1, \ldots, x^j + h, \ldots, x^n)$ (translation in
the $j$ direction). One replaces $\partial_j u$ by the centered
difference in the $j$ direction: $(T_j - T_j^{-1})u/2h$; and the
time derivative by

$$\left[u(t + k, x) - \frac{1}{2n} \sum_j \left(T_j u(t, x) + T_j^{-1} u(t, x)\right)\right]/k \qquad [13]$$

For consistency of the scheme, we require $k/h = \lambda > 0$
to be fixed as $k$ and $h$ tend to zero; stability then
holds if $\lambda$ is small.
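
The scheme can be sketched in one space dimension for the simplest case $Q = I$, $B = 0$, $f = 0$, that is, $\partial_t u + A\,\partial_x u = 0$ with a constant symmetric matrix $A$. Everything specific below is an assumption made for illustration: the periodic grid, the 2 × 2 matrix $A$ (a first-order form of the wave equation), the Gaussian initial data, and the function name `lax_friedrichs_step`.

```python
import numpy as np

def lax_friedrichs_step(u, A, lam):
    """One Lax-Friedrichs step for u_t + A u_x = 0 on a periodic grid.

    u   : array of shape (num_points, m), the unknowns at each grid point
    A   : symmetric (m, m) matrix
    lam : ratio k/h of time step to space step
    """
    u_right = np.roll(u, -1, axis=0)  # T u: value at x + h
    u_left = np.roll(u, 1, axis=0)    # T^{-1} u: value at x - h
    average = 0.5 * (u_right + u_left)
    centered = 0.5 * (u_right - u_left)
    # u(t + k) = average - (k/h) * A * (centered difference)
    return average - lam * centered @ A.T

num_points = 200
h = 1.0 / num_points
lam = 0.5  # k/h; here lam * max|eigenvalue of A| = 0.5 <= 1
A = np.array([[0.0, 1.0], [1.0, 0.0]])
x = np.arange(num_points) * h
u0 = np.stack([np.exp(-100.0 * (x - 0.5) ** 2), np.zeros(num_points)], axis=1)

u = u0.copy()
for _ in range(100):
    u = lax_friedrichs_step(u, A, lam)

energy_before = h * np.sum(u0 ** 2)
energy_after = h * np.sum(u ** 2)
print(energy_before, energy_after)
```

With $\lambda$ times the largest eigenvalue of $A$ at most 1, each characteristic component is updated by a convex combination of neighbouring values, so the discrete energy cannot grow; the scheme is dissipative, which is why the printed energy decreases.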


Nonlinear Problems and Singularities 


We give a simple setup for proving the existence of 
smooth solutions to SH systems for small times. 
Such solutions may develop singularities. We limit 
ourselves to two types of singularities, on which SH 
structure provides some information: jump disconti- 
nuities and blow-up patterns. Caustic formation is 
not considered. 


Construction of a Smooth Solution 


Consider a real SH system (eqn [1]). Recall that a
function of $x$ belongs to the Sobolev space $H^s$ if its
derivatives of order $s$ or less are square-integrable.
One constructs a solution defined for $t$ small, which
is in $H^s$, $s > n/2 + 1$, as a function of $x$, by the
following procedure:

(1) Replace spatial derivatives by regularized operators,
which should be bounded in Sobolev
spaces; the regularized equation is an ODE in
$H^s$; let $u_\varepsilon$ be its solution.

(2) Write the equation satisfied by derivatives of
order $s$ of $u_\varepsilon$, and apply the energy identity to it.

(3) Find a positive $T$ such that the solution is
bounded in $H^s$ for $|t| < T$, uniformly in $\varepsilon$; this
implies a $C^1$ bound.

(4) Prove the convergence of the approximations
in $L^2$.

(5) Prove the continuity in time of the $H^s$ norm;
conclude that the $u_\varepsilon$ tend to a solution in
$C(-T, T; H^s)$.


The result admits a local version, in which 
Sobolev spaces are replaced by Kato’s “uniformly 
local” spaces. Uniqueness of the solution is proved 
along similar lines. We do not attempt to identify 
the infimum of the values of s for which the Cauchy 
problem is well-posed. 


Jump Discontinuities: Shock Waves 


A “shock wave” is a weak solution of a system of
conservation laws admitting a jump discontinuity.
By definition, weak solutions satisfy, for any smooth
function $\phi_A(x)$ with compact support,

$$\iint \left\{ f^{A\alpha}\,\partial_\alpha \phi_A + N^A \phi_A \right\} dt\, dx = 0$$


The theory of shock waves is an attempt to 
understand solutions of conservation laws which are 
limits of solutions of diffusion equations; the hope is 
that the influence of second-derivative terms is 
appreciable only near shocks, and that, for given 
initial data, there is a unique weak solution of the 
conservation law which may be obtained as such a 
limit, if modeling has been done correctly. This 
problem may be difficult already for a single shock 
(“shock structure”). 

The theory of shock waves follows the one-dimensional
theory closely. We therefore describe
the main facts for a conservation law in one space
dimension ($u = u(t, x)$):

$$\partial_t u + \partial_x f(u) = 0$$

If a shock travels at speed $c$, the weak formulation
of the equations gives the Rankine–Hugoniot relation
$c[u] = [f(u)]$, where square brackets denote
jumps. There may be several weak solutions having
the same initial condition. One restricts solutions by
making two further requirements: (1) the system
admits an entropy pair $(U, F)$ with a convex entropy
and (2) to be admissible, weak solutions must be
limits of “viscous approximations”

$$\partial_t u + \partial_x f(u) = \varepsilon\,\partial_x^2 u$$

as $\varepsilon \to 0$. One then finds easily that the entropy
equality ($\partial_t U + \partial_x F = 0$) must be replaced, for such
weak solutions, by the entropy condition: $\partial_t U + \partial_x F \le 0$
in the weak sense. This condition admits a
concrete interpretation if the gradient of each
characteristic speed is never orthogonal to the
corresponding right eigenvector (“genuine nonlinearity”);
in that case, characteristics must impinge
on the shock (“shock inequalities”).
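
For a scalar law the Rankine–Hugoniot relation and the shock inequalities can be evaluated directly. The sketch below uses Burgers’ equation, $f(u) = u^2/2$, as an assumed example that does not appear in the text; the function names are hypothetical. For this flux the characteristic speed is $f'(u) = u$, so characteristics impinge on the shock exactly when $u_l > c > u_r$.

```python
def burgers_flux(u):
    """Flux f(u) = u^2 / 2 of Burgers' equation (illustrative choice)."""
    return 0.5 * u * u

def rankine_hugoniot_speed(u_left, u_right, flux=burgers_flux):
    """Shock speed c solving c [u] = [f(u)] for a scalar conservation law."""
    if u_left == u_right:
        raise ValueError("no jump: u_left equals u_right")
    return (flux(u_right) - flux(u_left)) / (u_right - u_left)

def satisfies_shock_inequalities(u_left, u_right):
    """Burgers case: f'(u_left) > c > f'(u_right), with f'(u) = u."""
    c = rankine_hugoniot_speed(u_left, u_right)
    return u_left > c > u_right

speed = rankine_hugoniot_speed(2.0, 0.0)
admissible = satisfies_shock_inequalities(2.0, 0.0)
inadmissible = satisfies_shock_inequalities(0.0, 2.0)
print(speed, admissible, inadmissible)
```

The jump from 2 to 0 travels at speed 1 and is admissible; the reversed jump satisfies the same Rankine–Hugoniot relation but violates the shock inequalities, which illustrates why the weak formulation alone does not single out one solution.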

For the equations of gas dynamics with polytropic
law ($pv^\gamma = \text{const.}$), there is a unique solution with
initial condition $u = u_l$ for $x < 0$, $u = u_r$ for $x > 0$,
where $u_l$ and $u_r$ are constant (“Riemann problem”)
which satisfies the entropy condition, provided $|u_l - u_r|$
is small. More generally, if the equation of state
$p = p(v, s) > 0$ satisfies $\partial p/\partial v < 0$ and $\partial^2 p/\partial v^2 > 0$,
the shock inequalities are equivalent to the fact that
the entropy increases after the passage of a shock
with $|u_l - u_r|$ small.

On the numerical side, one should mention: 
(1) the widely used idea of upstream differencing; 
(2) the Lax—Wendroff scheme, the complete analysis 
of which requires tools from soliton theory; and 
(3) the availability of general results for dissipative 
schemes for SH systems. 

Recent trends include: (1) admissibility conditions 
when genuine nonlinearity does not hold and 
(2) other approximations of shock wave problems, 
most notably kinetic formulations. 

Some of the ideas of shock wave theory have been 
applied to Hamilton-Jacobi equations and to 
motion by mean curvature, with applications to 
front propagation problems and “computer vision.” 


Stronger Singularities: Blow-Up Patterns 


The amplitude of a solution may also grow without 
bound. Examples include optical pulse propagation 
in Kerr media and singularities in general relativity. 
The phenomenon is common when reaction terms 
are allowed. As we now explain, this phenomenon is 
reducible to SH theory in many cases of interest. 
Blow-up singularities are usually not governed by 
the characteristic speeds defined by the principal 
part, because top-order derivatives are balanced by 
lower-order terms. In many applications, a systema- 
tic process (Fuchsian reduction) enables one to 
identify the correct model near blow-up; as a result, 


one can write the solution as the sum of a singular
part, known in closed form, and a regular part. If
the singularity locus is represented by $t = 0$, the
regular part solves a renormalized equation of the
typical form

$$tMu + Au = tN \qquad [14]$$

where $Mu = 0$ is SH. Under natural conditions, for
any initial condition $u_0$ such that $Au_0 = 0$, there is a
unique solution of eqn [14] defined for small $t$.

The upshot is an asymptotic representation of 
solutions which renders the same services as an 
exact solution, and is valid precisely where numeri- 
cal computation breaks down. 

Fuchsian reduction enables one in particular to 
study (1) the blow-up time; (2) how the singularity 
locus varies when Cauchy data, prescribed in the 
smooth region, are varied; and (3) expressions which 
remain finite at blow-up. It is the only known general 
procedure for constructing analytically singular 
spacetimes involving arbitrary functions, rather than 
arbitrary parameters, and is therefore relevant to the 
search for alternatives to the big bang. 


Examples and Applications 
Wave Equation with Variable Coefficients 


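Consider the equation

$$\partial_{tt} u + 2a^j(x)\,\partial_{jt} u - a^{jk}(x)\,\partial_{jk} u = f(t, x, u, \nabla u)$$

with $(a^{jk})$ positive definite. Letting $v = (v_0, \ldots, v_{n+1}) := (u, \partial_j u, \partial_t u)$,
we find the system

$$\partial_t v_0 = v_{n+1}$$
$$\partial_t v_j - \partial_j v_{n+1} = 0$$
$$\partial_t v_{n+1} + 2a^j\,\partial_j v_{n+1} - a^{jk}\,\partial_j v_k = f$$

It is symmetrizable, using the quadratic form
$\sigma_{AB} v^A v^B = v_0^2 + a^{jk} v_j v_k + v_{n+1}^2$.
One proves directly that, if $v_j = \partial_j v_0$ for $t = 0$, this
relation remains true for all $t$.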


Maxwell’s Equations 


Maxwell’s equations may be split into six evolution
equations: $\partial_t E - \mathrm{curl}\,B + j = 0$ and $\partial_t B + \mathrm{curl}\,E = 0$,
and two “constraints” $\mathrm{div}\,E - \rho = 0$, $\mathrm{div}\,B = 0$. The
system of evolution equations is already in symmetric
form; the quadratic form $\sigma_{AB} u^A u^B$ is here
$|E|^2 + |B|^2$.
Compressible Fluids


Consider first the case of a polytropic gas:

$$\partial_t v + (v \cdot \nabla)v + \rho^{-1}\nabla p = 0, \qquad \partial_t \rho + \mathrm{div}(\rho v) = 0 \qquad [15]$$

with $p$ proportional to $\rho^\gamma$. Taking $(p, v)$ as
unknowns, one readily finds the SH system

$$\frac{1}{\gamma p}\,\partial_t p + \frac{1}{\gamma p}(v \cdot \nabla)p + \mathrm{div}\,v = 0 \qquad [16]$$

$$\rho\,\partial_t v + \nabla p + \rho\,(v \cdot \nabla)v = 0 \qquad [17]$$

Symmetrization for more general compressible
fluids with dissipation, including bulk viscosity, so
as to satisfy the additional condition [5] may be
achieved if we take as thermodynamic variables $\rho$
and $T$, and assume pressure $p$ and internal energy $e$
satisfy $\partial p/\partial \rho > 0$ and $\partial e/\partial T > 0$, by taking as
unknowns $(\rho, \rho v, \rho(e + |v|^2/2))$. The specific entropy
$s$ satisfies $de = T\,ds - p\,d(1/\rho)$. If the viscosity and
heat conduction coefficients are positive, one finds
that $U = -\rho s$ is a convex entropy (in the sense of SH
theory) on the set where $\rho > 0$, $T > 0$.


Einstein’s Equations 


The computation of solutions of Einstein’s equations 
over long times, in particular in the study of 
coalescence of binary stars, has recently led to 
unexplained difficulties in the standard Arnowitt- 
Deser—Misner (ADM) formulation of the initial- 
value problem in general relativity. One way to 
tackle these difficulties is to rewrite the field 
equations in SH form; we focus on this particular 
aspect of recent research. 

Recall the problem: find a four-dimensional
metric $g_{ab}$ with Lorentzian signature, such that
$R_{ab} - \frac{1}{2} R g_{ab} = \chi T_{ab}$, with $\nabla^a T_{ab} = 0$, combined
with an equation of state if necessary. $R_{ab}$ is the
Ricci tensor and $R = g^{ab} R_{ab}$ is the scalar curvature;
they depend on derivatives of the metric up to order 2.
In addition to the metric, $T_{ab}$ involves physical
quantities such as fluid 4-velocity or an electromagnetic
field. The conservation laws of classical
mathematical physics are all contained in the
relation $\nabla^a T_{ab} = 0$.

Now, the field equations cannot be solved for
$\partial_t^2 g_{ab}$, and, as a consequence, the Taylor series of $g_{ab}$
with respect to time cannot be determined, even
formally, from the values of $g_{ab}$ and $\partial_t g_{ab}$ for $t = 0$
(i.e., the Cauchy data). Furthermore, these data
must satisfy four constraint equations. If the
constraints are satisfied initially, they “propagate.”
But in numerical computation, these constraints are
never exactly satisfied, and the computed solution
may deviate considerably from the exact solution.
Also, numerical computations depend heavily on the 
way Einstein’s equations are formulated. 

The simplest way to derive an SH system is to
replace $R_{ab}$ by $R^{(h)}_{ab} = R_{ab} - \frac{1}{2}[g_{bc}\partial_a F^c + g_{ac}\partial_b F^c]$,
where $F^c := g^{ab}\Gamma^c_{ab}$. It turns out that
$R^{(h)}_{ab} = -\frac{1}{2} g^{cd}\partial_{cd} g_{ab} + H_{ab}(g, \partial g)$, where the expression of
$H_{ab}$ is immaterial. Applying to each component of
the metric the treatment of the first example above
(wave equation with variable coefficients), one
easily derives an SH system of 50 equations for 50
unknowns: the ten independent components of the
metric, and their 40 first-order derivatives. Now, if
the $F^c$ are initially zero (coordinates are
“harmonic”), they remain so at later times.

Unfortunately, the harmonic coordinate condition 
does not seem to be stable in the large. More recent 
formulations start with one of the standard setups 
(ADM formalism, conformal equations, tetrad
formalism, Newman–Penrose formalism) and
proceed by adding combinations of the constraints to
the equations, multiplied by parameters adjusted so
as to ensure hyperbolicity or symmetric hyperbolicity
if needed. Another recent idea is to add a new
unknown $\lambda$ which monitors the failure of the
constraint equations; one adds to the equations a
new relation of the form $\partial_t\lambda = \alpha C - \beta\lambda$, where
$C = 0$ is equivalent to the constraints, and $\alpha$ and $\beta$
are parameters. One then adds coupling terms to
make the extended system SH. It is expected that the 
set of constraints acts as an attractor. 

Reported computations indicate that these meth- 
ods have resulted in an improvement of the time 
over which numerical computations are valid. 


Tricomi’s Equation 


Let $\varphi(x, y)$ solve $(y\,\partial_x^2 - \partial_y^2)\varphi = 0$. Letting
$u = e^{\lambda x}(\partial_x\varphi, \partial_y\varphi)$, one finds a symmetric system $Lu = 0$,


with 
0 0 1 
L=(7 1 )+a)= (4 Jð 


si t) 


we find that K = ZL = At ô; + A*0, + B, where 


i 
-roate (EY A) 


If 


is positive definite if $y$ is bounded, of arbitrary sign,
and $\lambda$ is small.




Cauchy-Kowalewska Systems 


Consider a complex system

$\partial_t u = \sum_j A^j(z, t, u)\,\partial_{z^j} u + B(z, t, u)$ [18]

where $u = (u^\alpha)$, $z = (z^1, \ldots, z^n)$. The coefficients are
analytic in their arguments when $z$ and $t$ are close to
the origin and $u$ is bounded by some constant $K$.
The Cauchy–Kowalewska theorem ensures that, for
any analytic initial condition near the origin, this
system has a unique analytic solution near $z = 0$,
even without any symmetry assumption on the $A^j$.

This result is a consequence of SH theory
(Garabedian).
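Indeed, write $z^j = x^j + i y^j$, $\partial_{z^j} = \frac{1}{2}(\partial_{x^j} - i\partial_{y^j})$, and
$\partial_{\bar z^j} = \frac{1}{2}(\partial_{x^j} + i\partial_{y^j})$. Recall that analytic functions
satisfy the Cauchy–Riemann equations $\partial_{\bar z^j} u = 0$.
Adding $\sum_j (A^j)^* \partial_{\bar z^j} u$ to [18], and using the definition of
$\partial_{z^j}$ and $\partial_{\bar z^j}$, we find the symmetric system

$\partial_t u = \sum_j \frac{1}{2}\,(A^j + (A^j)^*)\,\partial_{x^j} u + \sum_j \frac{1}{2i}\,(A^j - (A^j)^*)\,\partial_{y^j} u + B$ [19]

Solving this system, we find a candidate $u$ for a
solution of eqn [18]. To show that $u$ is analytic if the
data are, we solve a second SH system for
$w = (w^j) := \partial_{\bar z^j} u$. If the data are analytic, $w$ vanishes
initially, and therefore remains zero for all $t$.
Therefore, $u$ is indeed analytic.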
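
The symmetrization step can be checked on a small example. The sketch below is only an illustration under stated assumptions: the matrix `A` is an arbitrary non-Hermitian choice, and the helper names (`dagger`, `combine`, `is_hermitian`) are hypothetical. It verifies that the two coefficient matrices obtained by adding the conjugate-transpose term, $(A + A^*)/2$ and $(A - A^*)/(2i)$, are Hermitian, and that they recombine to the original matrix.

```python
def dagger(a):
    """Conjugate transpose of a square matrix stored as a list of rows."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def combine(a, b, ca, cb):
    """Entrywise linear combination ca*a + cb*b of two square matrices."""
    n = len(a)
    return [[ca * a[i][j] + cb * b[i][j] for j in range(n)] for i in range(n)]

def is_hermitian(a, tol=1e-12):
    """True if the matrix equals its conjugate transpose up to tol."""
    d = dagger(a)
    n = len(a)
    return all(abs(a[i][j] - d[i][j]) <= tol for i in range(n) for j in range(n))

# Arbitrary complex matrix with no symmetry (an assumption for illustration).
A = [[1 + 2j, 3 - 1j], [0.5j, -2 + 1j]]
Ad = dagger(A)

coeff_x = combine(A, Ad, 0.5, 0.5)          # (A + A*)/2, multiplies d/dx
coeff_y = combine(A, Ad, 1 / 2j, -1 / 2j)   # (A - A*)/(2i), multiplies d/dy

print("A Hermitian:       ", is_hermitian(A))
print("coeff_x Hermitian: ", is_hermitian(coeff_x))
print("coeff_y Hermitian: ", is_hermitian(coeff_y))
```

Since `coeff_x + 1j * coeff_y` reproduces `A`, no information about the original coefficients is lost; the symmetry is gained only by using the Cauchy–Riemann equations.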


See also: Computational Methods in General Relativity: 
The Theory; Einstein Equations: Initial Value 
Formulation; Evolution Equations: Linear and Nonlinear; 
Magnetohydrodynamics; Partial Differential Equations: 
Some Examples; Semilinear Wave Equations; Shock
Wave Refinement of the Friedman–Robertson–Walker
Metric.


Introduction: Spacetime Symmetries


Symmetries have played, and continue to play, an 
important role in fundamental physics, but the part 
they play is today seen as more complicated and 
many-sided than it was in the early days of particle 
physics, just after the Second World War. The area 
in which symmetries have had their most dramatic 
consequences is elementary particle physics, or 


high-energy physics, and the majority of this article 
is concerned with this subject. The article concludes 
with some observations about symmetries and 
conservation laws in general relativity. 

In the early days, considerations of symmetry 
were almost limited to Lorentz transformations: we 
begin by reviewing this crucially important topic. 
Invariance of the laws of nature under translations 
in space and time is actually necessary for the
existence of science itself; if experiments did not 
yield the same results today and tomorrow, and in 
Paris and Moscow and on the Moon, then in effect 
there would be no laws of nature. Almost as strong 


a statement could be made about invariance under 
rotations; if space were not isotropic, experimental 
results would depend on which direction the 
apparatus was aligned in, and again any laws 
would be extremely hard to find. Turning to the 
question of motion, Newton and Galileo realized 
that the laws of dynamics are the same in all inertial 
frames in relative motion. In the Newton—Galileo 
scheme, the rule for relating the space and time 
coordinates of two frames of reference is (for 
relative motion along the common x-axis) 

$x' = x - vt, \quad t' = t$ [1]
This principle of relativity was reaffirmed by 
Einstein, but with the crucial modification that the 
rules for relating coordinates in two frames are 
given by Lorentz transformations, so that [1] is 
replaced by 


$x' = \gamma(x - vt), \quad t' = \gamma(t - vx/c^2), \quad \gamma = (1 - v^2/c^2)^{-1/2}$ [2]
Time is absolute in [1] but relative in [2]. Einstein 
was of course motivated by the fact that Maxwell’s 
equations are covariant under Lorentz transforma- 
tions, but not under Newton—Galileo ones. 
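
The difference between the two rules can be made concrete with a short numerical sketch. It is only an illustration, assuming units in which c = 1 and using hypothetical function names: it applies the Newton-Galileo rule [1] and the Lorentz rule [2] to one event and compares the interval $c^2t^2 - x^2$, which is the quantity preserved by Lorentz transformations.

```python
import math

C = 1.0  # speed of light; units with c = 1 are an assumption of this sketch

def galilean(x, t, v):
    """Newton-Galileo rule [1]: x' = x - v t, t' = t."""
    return x - v * t, t

def lorentz(x, t, v, c=C):
    """Lorentz rule [2]: x' = gamma (x - v t), t' = gamma (t - v x / c^2)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return gamma * (x - v * t), gamma * (t - v * x / c ** 2)

def interval(x, t, c=C):
    """Spacetime interval c^2 t^2 - x^2 of an event, measured from the origin."""
    return (c * t) ** 2 - x ** 2

x, t, v = 3.0, 5.0, 0.6
print("original frame:      ", interval(x, t))
print("after Galilean boost:", interval(*galilean(x, t, v)))
print("after Lorentz boost: ", interval(*lorentz(x, t, v)))
```

The Galilean rule leaves t unchanged and alters the interval, while the Lorentz rule changes both x and t and leaves the interval fixed.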

The above considerations reveal that the laws of 
nature should be covariant under ten types of 
transformation: three translations in space, one in 
time, three parameters (angles) for rotations and 
three velocities. These transformations together 
form a group, the inhomogeneous Lorentz, or 
Poincaré group. It is a nonabelian group whose ten 
generators correspond to 4-momentum, angular 
momentum, and Lorentz boosts. The seminal work 
on the significance of this group in fundamental 
physics is that of Wigner in 1939. Assuming that the 
states of fundamental quantum systems (particles, 
atoms, molecules) form the basis states for repre- 
sentations of this group, these entities are described 
by two quantities, mass and spin. Spin, moreover, 
which was already familiar from earlier investiga- 
tions in quantum physics, was described by the 
rotation group (SU(2), which is homomorphic to 
SO(3)) only for states with timelike momentum. For 
photons, for example, with null momentum, spin is 
described by the (noncompact) Euclidean group in 
the plane, with the consequence that there are only 
two polarization states for this massless particle. 

Noether’s theorem provides the crucial link 
between symmetries and conservation laws, via the 
principle of least action. Noether showed that the 
invariance of the action under a continuous 
symmetry operation implied the existence of a 
conserved quantity. The conserved quantities 
corresponding to invariance under translation in
space and time are momentum and energy; con- 
servation of angular momentum follows from 
invariance under rotations and invariance under 
Lorentz transformations gives rise to conservation 
of motion of the center of mass. 


Gauge Theories: Electromagnetism 
and Yang-Mills Theories 


A quantity whose conservation has been well known 
for a long time is electric charge. The question may 
then be asked: invariance under what symmetry 
gives rise to conservation of electric charge? A 
classical complex field has the Lagrangian density 


$\mathcal{L} = (\partial_\mu\phi)(\partial^\mu\phi^*) - m^2\phi^*\phi$ [3]
which is invariant under
$\phi \to \exp(-iQ\Lambda)\phi$ [4]


$\Lambda$ being the parameter for the transformation.
Noether’s theorem then yields conservation of $Q$,
interpreted as electric charge. With $\Lambda$ a constant, as
above, the Lagrangian possesses a “global” symmetry.
This becomes a “local” symmetry when $\Lambda$
becomes space and time dependent, $\Lambda(\mathbf{r}, t)$ or
$\Lambda(x^\mu)$. In that case, however, the Lagrangian [3] is
no longer invariant under [4], because of the 
derivative terms. To preserve invariance an extra
field $A_\mu$ must be introduced, so that [4] then
becomes

$\phi \to \exp(-iQ\Lambda(x^\mu))\phi, \quad A_\mu \to A_\mu + \partial_\mu\Lambda$ [5]

and the Lagrangian acquires extra terms, involving
$A_\mu$. The field $A_\mu$ is called a gauge field and is
identified with the electromagnetic potential. The
transformation [5] is called a gauge transformation,
and since the phase factor $\exp(-iQ\Lambda)$ may be
regarded as a unitary $1 \times 1$ matrix, we have here a
theory with U(1) gauge invariance, which describes
electromagnetism and conservation of charge.

The notion of isospin had been introduced by
Heisenberg in 1932. Isospin (then called isotopic
spin) was a vector-like quantity conserved in strong
(nuclear) interactions. Yang and Mills in 1954 made
the pioneering suggestion that isospin conservation
could also be recast as a gauge theory, by enlarging
the U(1) group of electromagnetism to SU(2)
(corresponding to rotations in “isospin space”),
and at the same time treating the rotation angles as
functions of spacetime. Then, eqn [4] will change: if,
for example, $\psi$ is an isospinor field, then local
isospin rotations are given by

$\psi(x) \to \exp\{-\tfrac{i}{2}\,\boldsymbol{\tau}\cdot\boldsymbol{\theta}(x)\}\,\psi(x) = U(x)\psi(x)$ [6]


where $\boldsymbol{\tau}$ are the Pauli matrices: $\boldsymbol{\tau}/2$ are the generators
of SU(2). The gauge field then has three components
$A^i_\mu$ ($i = 1, 2, 3$) which may be written as a matrix

$A_\mu = A^i_\mu\,\frac{\tau^i}{2}$

transforming as

$A_\mu \to A'_\mu = U(x) A_\mu U^{-1}(x) - \frac{i}{g}\,(\partial_\mu U(x))\,U^{-1}(x)$ [7]


where g is the coupling constant, analogous to 
electric charge. The problem with this idea was that 
the isospin gauge field, analogous to the photon in 
electrodynamics, should, like the photon, be massless
and have polarization states $\pm 1$ (commonly, but
inaccurately, called spin 1; see the work of Wigner (1939)),
whereas the Yukawa particle, identified as the
$\pi$ meson, was massive and had spin 0, so could not act
as the isospin gauge field. 

The Yang-Mills idea really came into its own 
with the standard model (SM) of particle physics. 
This (gauge) model has an invariance group SU(2) ⊗
U(1) ⊗ SU(3), the first two groups corresponding to
electroweak interactions (a unification of weak 
interactions and electromagnetism) and the final 
SU(3) to quantum chromodynamics (QCD), the 
gauge theory describing quark interactions, which 
“glues” them together to make hadrons — protons, 
neutrons, pions, etc. This model is a dramatically 
successful one. The QCD sector of the theory 
requires essentially no further elaboration on the 
Yang-Mills idea than replacing the group SU(2) by 
SU(3). This is a straightforward matter of replacing 
the generators t/2 of SU(2) with the eight generators 
(3 x 3 matrices) of SU(3). U(x) then also becomes a 
3 x 3 matrix. The three degrees of freedom are the 
three quark “colors,” for which there is good 
experimental evidence, and the gluons, the quanta 
of the gauge fields, are indeed massless and have 
good experimental support. In the electroweak 
sector, however, the gauge fields, the W and Z 
bosons, were found with the predicted masses of 
80.3 and 91.2 GeV respectively (the proton mass, for
comparison, is 0.94 GeV). They are certainly not
massless, as the straightforward Yang-Mills theory 
would require, and the explanation for this requires 
the introduction of the concept of spontaneous 
symmetry breaking. 


Spontaneous Symmetry Breaking 


The general idea of spontaneous symmetry breaking 
is that the vacuum — the state of lowest energy — is 
not invariant under the symmetry in question. A 
simple and common illustration is a pencil balanced 
vertically on its tip on a horizontal plane. The pencil 
is in unstable equilibrium but the system has a 
symmetry under rotations in the plane about the 
axis coincident with the pencil. Eventually, the 
pencil will fall into its lowest-energy state (vacuum), 
lying on the table in some direction — and the 
rotational symmetry is then lost. In fact, under 
rotations the actual lowest-energy (vacuum) state 
will be changed into another such state. There is a 
degenerate vacuum. 

A similar scenario may be constructed in a 
complex scalar field theory. Consider such a theory 
with a Lagrangian given by 


$\mathcal{L} = (\partial_\mu\phi)(\partial^\mu\phi^*) - m^2\phi^*\phi - \lambda(\phi^*\phi)^2$ [8]


that is, with a potential energy function given by


$V(\phi, \phi^*) = m^2\phi^*\phi + \lambda(\phi^*\phi)^2$ [9]
where $m$ is the mass of the field (quantum) and $\lambda$ is the
coupling of its self-interaction. The ground state is
obtained by minimizing $V$, hence $\partial V/\partial\phi = 0$, giving
(assuming that $m^2 > 0$) a minimum at $\phi = \phi^* = 0$.
If, however, $m^2 < 0$, there is a local maximum at
$\phi = 0$ and a minimum at $|\phi|^2 = -m^2/2\lambda > 0$. In
quantum theory language, the vacuum expectation
value $\langle 0|\phi|0\rangle$ of the field is nonzero. Goldstone
showed that this implied the presence of a massless 
scalar particle — a Goldstone boson. There was some 
interest in this result in particle physics, where the 
hypothesis of “partial conservation of the axial vector 
current” (PCAC) might result in a Goldstone boson 
that could be identified with the pion; although not 
massless, the pion is the lightest hadron, so “almost” 
massless. 
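
The location of the minimum of the potential [9] is easy to check numerically. The following is a minimal sketch under stated assumptions: the potential is written as a function of $\rho = |\phi|^2$, the parameter values are arbitrary, and the function names are hypothetical. It compares the closed-form minimum $-m^2/2\lambda$ with a brute-force scan for both signs of $m^2$.

```python
def potential(rho, m2, lam):
    """V of eqn [9] as a function of rho = |phi|^2: V = m2*rho + lam*rho**2."""
    return m2 * rho + lam * rho ** 2

def vacuum_rho(m2, lam):
    """Closed-form minimizer for lam > 0, restricted to rho >= 0."""
    return max(0.0, -m2 / (2.0 * lam))

def scan_minimum(m2, lam, rho_max=5.0, steps=50001):
    """Brute-force check: grid point in [0, rho_max] with the smallest V."""
    grid = [rho_max * k / (steps - 1) for k in range(steps)]
    return min(grid, key=lambda rho: potential(rho, m2, lam))

# m2 > 0: symmetric vacuum at phi = 0; m2 < 0: degenerate vacuum, |phi|^2 > 0.
for m2 in (2.0, -2.0):
    print("m^2 =", m2, "closed form:", vacuum_rho(m2, 0.5),
          "scan:", scan_minimum(m2, 0.5))
```

Because V depends only on $|\phi|^2$, every phase of $\phi$ gives an equally good minimum when $m^2 < 0$; this is the degenerate vacuum referred to in the text.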

Higgs analyzed what happens to the Goldstone 
model if electromagnetism is included. The Lagran- 
gian [8] is invariant under the global transformation 
[4], but if this is made local, as in [5], a gauge field 
must be introduced and it is found that the massless 
Goldstone boson disappears and the massless gauge 
field (photon) becomes massive. Thus, spontaneous 
symmetry breaking of a gauge theory results in the 
appearance of a massive, rather than massless, gauge 
particle. (It is relevant to remark that a massless 
photon possesses two polarization states, but a 
massive one possesses three, so the number of spin- 
polarization states is preserved — the massless 
photon “eats” the Goldstone boson and becomes 
massive.) The Higgs model was generalized to the 


case of a nonabelian symmetry group by Guralnik, 
Hagen, and Kibble and invoked by Weinberg in his
1967 model for the electroweak interaction in which
the gauge quanta were massive. 

Higgs’ work was motivated by the theory of 
superconductivity, where the Meissner effect (expulsion
of magnetic flux from a superconductor), when
treated relativistically, implies that the effective mass of a
photon in a superconductor is nonzero; this is
the “reason” that the flux does not penetrate. In the
theory of Bardeen, Cooper, and Schrieffer (BCS), a 
superconductor is described by an effective scalar 
field, a composite of electron pairs (though paired in 
momentum space rather than coordinate space), and 
this provides a physical analogy with the model 
above. The SM of particle physics postulates a Higgs 
scalar field analogous to the BCS composite scalar 
field. If this field exists, Higgs particles should also 
exist, but they have not yet been found. This is an 
outstanding problem for the SM. 


Baryon and Lepton Numbers 


The fact that the proton p does not decay into
positron plus photon, $e^+ + \gamma$, or muon plus photon,
$\mu^+ + \gamma$, implies a conservation law of baryon
number $B$ (the proton possessing $B = 1$ and the
others $B = 0$). Furthermore, the stability of $\mu^-$ and
$\tau^-$ against decay into $e^- + \gamma$ implies conservation of
lepton numbers $L_e$, $L_\mu$, and $L_\tau$. These are regarded
as global, not local, symmetries, so there are no 
associated gauge fields or interactions. Interestingly, 
however, these symmetries are not built into the SM, 
so are not guaranteed by it. More interestingly, these 
symmetries are actually destroyed in one attempt to 
go beyond the SM. This is the hypothesis that QCD 
may be unified with electroweak interactions to 
produce a “grand unified” theory (GUT). The 
simplest GUT is the one in which the SU(2) ⊗ U(1) ⊗
SU(3) symmetry is assumed to be a subgroup of the
much tighter symmetry SU(5), and in that theory the 
proton is unstable: 


$p \to e^+ + \pi^0$ [10]


The predicted lifetime is $10^{30\pm 1}$ years, while a recent
estimate of the lifetime for this decay mode is $>
5 \times 10^{32}$ years. It may be that GUTs do not exist in
nature, but since the decay [10] violates conserva- 
tion of the quantities B and Le, even entertaining the 
idea that the decay might take place begs the 
question, “are these conservation laws sacrosanct?” 

Another recent development which leads to the 
same question is the subject of neutrino oscillations. 
A strong motivation for this is the solar neutrino 
problem; this is the problem that the number of
electron neutrinos detected on Earth, originating in 
the Sun, is less than the number predicted, by a 
factor close to 3. The mismatch could be at least 
partly, and perhaps completely, explained if electron 
neutrinos “oscillated” into muon and/or tau neutri- 
nos on their passage from the Sun to the Earth, since 
the reaction which detects the neutrinos on Earth is 
sensitive only to electron neutrinos, and not to the 
other species. But oscillation is only permitted if 
$L_e$, $L_\mu$, and $L_\tau$ are not separately conserved
quantities. Oscillation can also only take place if the
masses of the different neutrinos are different (the
oscillation rate depends on $\Delta m^2$), hence the
neutrinos cannot all be massless.


Discrete Symmetries 


Ever since parity violation was discovered in weak 
interactions (nuclear beta decay) by Wu in 1957, the 
whole subject of discrete symmetries has presented 
problems which are still not resolved. The symme- 
tries in question are 


P (space inversion): $(x, y, z) \to (-x, -y, -z)$

T (time reversal): $t \to -t$

C (particle–antiparticle conjugation): particle $\to$ antiparticle


Are the laws of physics invariant under these 
operations? The Wu experiment revealed that weak 
interactions are not invariant under P, but what 
about other interactions and other operations? In 
this context, the CPT theorem is highly important. 
According to this theorem (based on very general 
assumptions), all laws of nature must be invariant 
under the combined operation CPT, so that, for 
example, the fact that weak interactions are not 
invariant under P means that they are not invariant 
under the product CT either. 

The violation of P invariance in beta decay was 
soon related to the fact that the neutrino involved 
(the electron neutrino — or, to be precise, antineu- 
trino) was massless. Spin-1/2 particles like the 
electron and neutrino obey the Dirac equation, 
which may be written out as a pair of coupled 
equations for left- and right-handed states. In the 
case $m = 0$, however, these equations decouple so it
is possible to have a massless spin-1/2 particle which 
is either left-handed or right-handed. Any interac- 
tion involving this particle would automatically 
violate parity (which turns a left-handed state into 
a right-handed one). Experiments have verified that 
the neutrino is indeed left-handed. The SM incorporates
this in the sense that the left-handed electron
$e^-_L$ and the electron neutrino $\nu_e$ are assigned to a
weak isospin SU(2) doublet, while the right-handed
electron $e^-_R$ transforms as a singlet. A similar
pattern is repeated for the $\mu$ and $\tau$ particles and
their neutrinos. The phenomenon of neutrino
oscillations, on the other hand, does not allow all the
neutrino states also to be purely left-handed (since 
they cannot be massless). This poses a potential 
problem for the SM. 

For a few years after 1957 it was believed that beta 
decay violated C as well as P, but conserved the 
product CP; and indeed that all weak interactions 
were CP invariant. In 1964, however, it was found 
that there is a small element of CP violation in $K^0$
decay. CP-violating effects are also expected in $B^0$
decays. The physical origin of CP violation is still not 
understood, but its importance is that it implies T 
violation, so that in (at least some) weak interactions, 
there is an “arrow of time” on the subnuclear scale. 
(Such an arrow of time is, of course, familiar in 
thermodynamics.) This is used in a cosmological 
context to explain baryon—antibaryon asymmetry in 
the Universe. 


Baryon-Antibaryon Asymmetry 


In the standard model of cosmology it is shown that 
applying the known laws of physics to the early 
Universe (the first few minutes) leads to the 
conclusion that at an age of 226s nuclear fusion 
reactions took place resulting in a mixture of 74% 
protons and 26% $\alpha$ particles, so that, hundreds of
thousands of years later, when galactic condensation 
took place, it would involve precisely this admixture 
of hydrogen and helium gases. Just this amount of 
helium has been found in the Sun, giving great 
confidence to the “big bang” model. Assuming that 
at extremely small times the baryon number of the 
Universe was zero, B=0, and assuming also (a big 
assumption, but one nevertheless made by cosmol- 
ogists) that the Universe is made of matter and not 
antimatter, we may then ask, why is this — where 
has the antimatter gone? 

Surprisingly, this question was addressed as early 
as 1966 by Sakharov, who showed that, starting 
with an initial state with $B = 0$, it would be possible
to reach a state with $B \neq 0$ as long as three
conditions obtained: B violating interactions, CP 
and C violating interactions, and lack of thermal 
equilibrium. GUTs and ordinary weak interactions 
already provide possibilities for the first two of these 
conditions. Breakdown of thermal equilibrium will 
be expected to occur as the Universe expands. 
When the particle density is high, reactions such as 
$p + \bar{p} \to \gamma + \gamma$ will ensure an equal population of
baryons and antibaryons, even in the presence of B
violating interactions, but as the density decreases
and this reaction rate becomes less than the 
expansion rate, thermal equilibrium can no longer 
be maintained. Thus, GUTs offer an explanation of 
why there is no antimatter in the Universe. It might 
be thought that this sort of explanation is implau- 
sible, since the B-violating and CP-violating forces 
are so weak, but actually this is not a problem, since 
the ratio of baryon number to photon number in the 
Universe is of the order $N_B/N_\gamma \sim 10^{-9}$; so we may
conjure up a scenario in which the B and CP
violating forces give rise to a volume of space in
which there are, say, $10^9$ antibaryons, $10^9 + 1$
baryons and approximately the same number of
photons. Then, all the antibaryons become
annihilated, leaving one baryon and $10^9$ photons, as
observed.

A recent development in the area of discrete 
symmetries has been the suggestion by Kostelecky 
and coworkers that there might exist spontaneous 
violation of CPT and Lorentz symmetry. 


Topological Charges 


Conserved quantities of a quite different type have 
received a lot of attention in recent decades. Their 
conservation is a consequence of nontrivial bound- 
ary conditions for the fields. A famous example is 
the sine-Gordon “kink.” The sine-Gordon equation 

$\frac{\partial^2\phi}{\partial t^2} - \frac{\partial^2\phi}{\partial x^2} + \frac{1}{b^2}\sin(b\phi) = 0$ [11]
describes a scalar field in one space and one time 
dimension. It is a nonlinear equation which pos- 
sesses, among others, the interesting solution 


$f(\xi) = \frac{4}{b}\arctan\exp(\gamma\xi/\sqrt{b})$


where $\xi = x - vt$ and $\gamma = (1 - v^2)^{-1/2}$. This corresponds
to a solitary wave which moves, preserving
its shape and size — in distinction to usual waves, 
which spread out and dissipate. Waves of this type 
are called solitons, and solitons have in fact been 
observed moving along canals. In this case, they are 
solutions to the Korteweg–de Vries equation.
Equation [11] clearly possesses the constant solutions


$\phi = \frac{2\pi n}{b}, \quad n = 0, \pm 1, \pm 2, \ldots$

which, it may be shown, all have zero energy. We
may then construct a solution of the above type, but
with $n = 0$ as $x \to -\infty$ and $n = N$ as $x \to +\infty$. This
so-called “kink” solution has finite energy and is not 
continuously deformable into a solution with n = 0 
everywhere, since this would involve overcoming an 


infinite energy barrier. The “kink number” may be 
characterized as a charge: defining the current 


$J^\mu = \frac{b}{2\pi}\,\varepsilon^{\mu\nu}\partial_\nu\phi$

with $\varepsilon^{\mu\nu}$ the totally antisymmetric symbol, it is clear
that this is identically conserved, $\partial_\mu J^\mu = 0$. This is a
consequence of the definition of $\varepsilon^{\mu\nu}$; it is not a
consequence of invariance of the sine-Gordon
Lagrangian under a symmetry operation, so the
current $J^\mu$ is not a Noether current. The associated
conserved charge is

$Q = \int J^0\,dx = \frac{b}{2\pi}\int \frac{\partial\phi}{\partial x}\,dx = \frac{b}{2\pi}\,[\phi(\infty) - \phi(-\infty)] = N$
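
The kink solution and its charge can be checked numerically. The sketch below is an illustration under explicit assumptions: it takes the sine-Gordon equation in the form $\phi_{tt} - \phi_{xx} + b^{-2}\sin(b\phi) = 0$ with the travelling solution $f(\xi) = (4/b)\arctan\exp(\gamma\xi/\sqrt{b})$, $\gamma = (1 - v^2)^{-1/2}$, and the function names and parameter values are hypothetical. It evaluates the residual of the equation by finite differences and computes the kink number from the boundary values of the field.

```python
import math

def kink(x, t, v, b):
    """Moving kink f(xi) = (4/b) arctan exp(gamma*xi/sqrt(b)), xi = x - v*t."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (4.0 / b) * math.atan(math.exp(gamma * (x - v * t) / math.sqrt(b)))

def residual(x, t, v, b, h=1e-3):
    """Finite-difference value of phi_tt - phi_xx + sin(b*phi)/b**2."""
    phi = kink(x, t, v, b)
    phi_tt = (kink(x, t + h, v, b) - 2 * phi + kink(x, t - h, v, b)) / h ** 2
    phi_xx = (kink(x + h, t, v, b) - 2 * phi + kink(x - h, t, v, b)) / h ** 2
    return phi_tt - phi_xx + math.sin(b * phi) / b ** 2

def kink_number(v, b, t=0.0, box=60.0):
    """Charge (b / 2 pi) [phi(+box) - phi(-box)], box standing in for infinity."""
    return b / (2.0 * math.pi) * (kink(box, t, v, b) - kink(-box, t, v, b))

v, b = 0.5, 2.0
for x in (-1.0, 0.0, 0.7):
    print("residual at x =", x, ":", residual(x, 0.3, v, b))
print("kink number:", kink_number(v, b))
```

The residual is zero up to discretization error, and the charge depends only on the values of the field at the two ends, not on the velocity or on the detailed profile in between.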

Models of the above type may be written down in 
a spacetime with more than two dimensions. In that 
case the above solution depends only on one 
coordinate, so represents an infinite planar “domain 
wall,” on the two sides of which the field assumes 
different values. Such domain walls, as well as 
“cosmic strings,” are considered as serious possibi- 
lities in cosmology. 

Nonabelian gauge theories and the sigma model 
also provide a fertile ground for topological excita- 
tions — field configurations which for topological 
reasons do not decay. Gauge theories with sponta- 
neous symmetry breaking have two-dimensional 
solutions corresponding to vortex lines and three- 
dimensional solutions corresponding to magnetic 
monopoles. In spacetime (3+ 1 dimensions), there 
is a solution to the gauge field equations, with no 
spontaneous symmetry breaking, corresponding to 
an “instanton,” a finite-energy field configuration, 
localized in time as well as in space (hence the name). 
The gauge group here is SU(2), whose group space is 
$S^3$. Spacetime is “Euclideanized” into $\mathbb{R}^4$, whose
boundary is then $S^3$. Asymptotic field configurations
may then be characterized by mappings of $S^3$ in field
space into $S^3$ in parameter space, and since the third
homotopy group of $S^3$ is nontrivial, $\pi_3(S^3) = \mathbb{Z}$, these
field configurations belong to different classes and 
are not deformable into each other. These define 
“degenerate vacua” of the gauge field equations. In 
quantum theory, tunneling between these vacua is 
allowed and ’t Hooft has shown how this may give 
rise to deuteron decay $d \to e^+ + \bar\nu_\mu$. Other
examples of topologically nontrivial configurations are
so-called sphalerons, which may also contribute to 
baryon number violation in the early Universe, and 
skyrmions, constructs in the nonlinear sigma model 
which serve as a model for baryon number. 


Supersymmetry


Supersymmetry is a fermion—boson symmetry, pos- 
tulating that multiplets of fundamental particles 
contain both fermions and bosons. Thus, for 
example, since electrons exist there should also be 
“selectrons” — “scalar” electrons, with spin 0. There 
should also be photinos, with spin 1/2, to take their 
place alongside photons, and so on. If supersymme- 
try were exact, these particles would have the same 
mass as their partners and would have all been 
found, but in fact none have yet been discovered, so 
presumably supersymmetry is a broken symmetry. 
The feature that makes supersymmetry attractive is 
that it holds some promise for solving divergence 
problems in quantum field theory, since the radia- 
tive corrections from fermion and boson loops are 
opposite in sign and may exactly cancel.
Supersymmetric models can also help to solve the
so-called hierarchy problem in quantum field theory. 
If supersymmetry is made into a local symmetry, 
rather than simply a global one, extra fields must be 
introduced (as the photon field was introduced 
above), and it turns out that one of these is a spin-2 
field, which may be identified with the graviton. 
Local supersymmetry thus becomes supergravity. 


General Relativity 


Symmetries and conservation laws take on new aspects 
when general relativity is considered. Einstein’s field 
equations relate the energy-momentum tensor of 
matter (and radiation) to the Einstein tensor of spacetime,
built from the Ricci tensor and the scalar curvature.
The Einstein tensor has vanishing covariant divergence,
which means that the energy-momentum tensor 
possesses the same property, but conservation of 
energy and momentum requires that it is the ordinary 
derivative, not the covariant one, of this tensor that 
should vanish. It might be expected that this problem 
could be alleviated by including the contribution of the 
gravitational field itself in the energy-momentum tensor.
This is quite reasonable, but then problems of 
interpretation arise, since at any one point in a general 
spacetime, a coordinate system might be found which 
is inertial (this is the force of the equivalence principle), 
corresponding to no gravitational field, and therefore 
no energy. The usual procedure is to introduce an 
energy-momentum “pseudotensor,” and to conclude 
that energy in a gravitational field is not localizable. 
The role of symmetries in general relativity is rather 
different from its role in particle physics, which is set in 
Minkowski spacetime. In a general spacetime there are 
no symmetries, but many examples of particular 
spacetimes with their own symmetries are now 
known. The symmetry operations involved are 
isometries, with corresponding groups of motion (so
that the isometry group of Minkowski space is the 
Poincaré group). These groups are an important 
subject of study in cosmology; for example, there is a 
classification of homogeneous cosmological models, 
labeled according to the Bianchi classification. 


See also: Cotangent Bundle Reduction; Effective Field 
Theories; Electroweak Theory; General Relativity: 
Overview; Infinite-Dimensional Hamiltonian Systems; 
Noncommutative Geometry and the Standard Model; 
Quantum Field Theory: A Brief Introduction; 
Quasiperiodic Systems; Sine-Gordon Equation; 
Supergravity; Symmetries in Quantum Field Theory of 
Lower Spacetime dimensions; Symmetry and Symplectic 
Reduction; Symmetry Classes in Random Matrix Theory; 
Topological Defects and Their Homotopy Classification. 


Symmetries in Quantum Field Theory


Symmetries have proved to be one of the most 
powerful concepts in quantum theory, and in 
quantum field theory in particular. From the 
beginnings of quantum mechanics, it is well known 
that the presence of a symmetry allows one to 
predict relations between different measurements, to 
classify spectra (energy or other), and to understand 
the Pauli exclusion principle, to name only a few 
applications. Much more remarkably, in modern 
relativistic quantum field theory, designed to 
describe the interactions of elementary particles, 
fundamental interactions have been found to be 
induced by the principle of local gauge invariance. 
One distinguishes spacetime symmetries (Poincaré 
or conformal transformations), which change the 
position and orientation of the system in space and 
time, and internal symmetries, which preserve the 
localization, acting on certain internal degrees of 
freedom. The Coleman-Mandula (1967) theorem 
states that internal and spacetime symmetries cannot 
be mixed, in the sense that the generators of internal 
symmetries must be Lorentz scalars, hence the total 
group of symmetries factorizes into a direct product. 
Supersymmetries are an exception to this theorem 
because their generators form a graded Lie algebra 
rather than a Lie algebra, and they were in fact 
designed to circumvent the Coleman-Mandula theorem. 

It is well known that the structure of symmetries 
of quantum systems in low-dimensional spacetime 
differs significantly from that in four-dimensional 
spacetime. (“Low” means in our context two or 
three, depending on the type of charge localization, 
cf. below.) To name some examples: 


• Two-dimensional quantum systems may have much 
higher symmetries than four-dimensional ones: 

  - In two dimensions, there exist massive integrable 
    models with infinitely many conservation 
    laws and factorizable scattering matrices (see 
    Integrability and Quantum Field Theory). 
    These models exhibit solitonic superselection 
    sectors, cf. below. 

  - The conformal group of two-dimensional 
    spacetime is infinite dimensional, allowing for 
    the exact computation of correlation functions 
    with the help of Ward identities (Belavin, 
    Polyakov, and Zamolodchikov 1984). Only 
    the finite-dimensional Möbius group, however, 
    is also a symmetry of the vacuum state. 
    Möbius covariance implies that the theory 
    contains two subtheories of chiral fields 
    defined on the light rays t - x = constant 
    and t + x = constant, respectively, and that 
    these can be extended to fields defined on a 
    circle, by adding a “point at infinity” to the 
    light ray (Lüscher and Mack 1976). One thus 
    arrives at one-dimensional chiral quantum field 
    theories on a circle, which will play an important 
    role in the discussion below. 

• Continuous symmetries cannot be spontaneously 
broken in two dimensions. This is true not 
only for relativistic quantum field theory (Coleman 
1973), but also in quantum statistical 
mechanics (Mermin and Wagner 1966), where 
it is responsible for the absence of ferromagnetism 
(see Symmetry Breaking in Field Theory). 
Spontaneous symmetry breakdown requires 
long-range order, which is overcome by thermal 
fluctuations at all temperatures down to zero, 
because these diverge logarithmically (in the 
thermodynamic limit) in two dimensions. This theorem 
thus illustrates how the size of phase space, which 
depends on the spacetime dimension, has an effect on 
internal symmetries of quantum systems. A 
detailed mathematical analysis of the balance 
between phase space (thermal fluctuations) and 
long-range order (symmetry breakdown) has 
been given in a recent discussion of the Goldstone 
theorem (Buchholz, Doplicher, Longo, and 
Roberts 1992). 

• The Coleman-Mandula theorem, excluding a 
mixing between internal and spacetime symmetries 
(see above), is valid only in higher 
dimensions. 


In more recent times, it has become apparent that 
low-dimensional quantum systems do not only 
admit more symmetries, but they may exhibit 
internal symmetries of an entirely new type, not 
describable by groups of transformations. In this 
article, we shall focus on the various ways in which 
the new symmetries can arise, and how they can be 
understood. In order to properly appreciate these 
issues, let us first recall some basic symmetry 
concepts in the conventional case. 

In the traditional setting, symmetries arise in the 
form of groups of transformations of the quantum 
system which leave observable quantities (e.g., 
vacuum expectation values and correlation 
functions) invariant. The symmetries form a group 
of *-automorphisms of the algebra of fields: 


\[ \alpha_g(\phi_1 \phi_2) = \alpha_g(\phi_1)\,\alpha_g(\phi_2), \qquad \alpha_g(\phi)^* = \alpha_g(\phi^*) \qquad [1] \]


(typically given by linear transformations of field 
multiplets). In the strongest case, the automorphisms 
are implemented by unitary operators on the state 
space 


\[ U(g)\,\phi\,U(g)^* = \alpha_g(\phi) \qquad [2] \]


The implementers form a representation of the 
group of automorphisms, 


\[ U(g_1)\,U(g_2) = U(g_1 g_2) \qquad [3] \]


and there is an invariant vector state (a ground state, 
or the vacuum state in relativistic quantum field 
theory), 


\[ U(g)\,\Omega = \Omega \qquad [4] \]
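
As a finite-dimensional illustration of relations [1] to [4], the following sketch uses a toy model that is not taken from the article: the cyclic group Z_3 acting on C^3 by cyclic shifts. The names `U`, `alpha`, and `omega`, and the sample matrices `phi1` and `phi2`, are assumptions made only for this example; the point is merely to show the four relations holding simultaneously in the "strongest case."

```python
# Toy model (an assumption for illustration): Z_3 acts on C^3 by cyclic shifts.
N = 3


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose; for the real matrices used here this is the adjoint."""
    return [list(row) for row in zip(*a)]


def U(g):
    """Permutation matrix sending basis vector e_k to e_{(k+g) mod N}."""
    return [[1 if (col + g) % N == row else 0 for col in range(N)]
            for row in range(N)]


def alpha(g, phi):
    """Implemented automorphism alpha_g(phi) = U(g) phi U(g)^*, cf. [2]."""
    return matmul(matmul(U(g), phi), transpose(U(g)))


def apply(a, v):
    """Matrix a applied to the vector v."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]


# Shift-invariant (unnormalized) vector, playing the role of the vacuum in [4].
omega = [1, 1, 1]

phi1 = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
phi2 = [[0, 1, 1], [2, 0, 5], [1, 1, 0]]

# [3]: the implementers satisfy the group law.
print(all(matmul(U(a), U(b)) == U((a + b) % N)
          for a in range(N) for b in range(N)))
# [1]: alpha_g is multiplicative.
print(alpha(1, matmul(phi1, phi2)) == matmul(alpha(1, phi1), alpha(1, phi2)))
# [4]: omega is invariant under every U(g).
print(all(apply(U(g), omega) == omega for g in range(N)))
```

The weaker notions listed next correspond to dropping one or more of these checks.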


However, depending on the dynamics of the 
quantum system, these relations cannot always be 
fully realized. One therefore considers several 
weaker or more general notions of symmetries 
relevant in four dimensions: 


• Spontaneously broken symmetries. The transformations 
are given as automorphisms of an 
algebra, but they are not unitarily implemented 
in a given irreducible representation of the 
algebra. Invariant pure states do not exist. 

• Projective representations. The symmetries are 
unitarily implemented, but the implementers fail 
to satisfy the group law [3]. They give rise to ray 
(projective) representations or representations of a 
covering group. In particular, an invariant state 
vector as in [4] cannot exist in an irreducible 
representation. 

• Infinitesimal symmetries. Lie algebras of infinitesimal 
transformations, given as derivations of an 
algebra, which cannot be integrated to finite 
transformations. Derivations may or may not be 
implemented in a given representation of the algebra 
by commutators with self-adjoint generators. 

• Supersymmetry. The infinitesimal transformations 
form a graded Lie algebra. 

• Local gauge symmetries. These form an 
infinite-dimensional group which is, however, not 
realized by automorphisms of the quantum algebra. 
Quantization of classical gauge interactions 
usually proceeds by breaking the gauge invariance 
in some way and restoring it at a later stage. 
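
The projective case in the list above can be made concrete with a small sketch. It assumes a toy example not discussed in the article: the group Z_2 x Z_2 represented on C^2 by products of Pauli matrices. The function names `U`, `omega`, and `product` are illustrative. The implementers satisfy the group law [3] only up to a sign, and no nonzero vector is invariant under all of them.

```python
# Toy example (an assumption): a projective representation of Z_2 x Z_2 on C^2.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def scale(c, a):
    """Scalar multiple c * a."""
    return [[c * entry for entry in row] for row in a]


def product(g, h):
    """Group law of Z_2 x Z_2, with elements written as pairs of bits."""
    return ((g[0] + h[0]) % 2, (g[1] + h[1]) % 2)


def U(g):
    """Implementer U(a, b) = X^a Z^b."""
    m = I2
    if g[0]:
        m = matmul(m, X)
    if g[1]:
        m = matmul(m, Z)
    return m


def omega(g, h):
    """Sign by which U(g) U(h) differs from U(gh); it arises from Z X = -X Z."""
    return (-1) ** (g[1] * h[0])


elements = [(a, b) for a in (0, 1) for b in (0, 1)]

# The group law [3] holds only up to the phase omega(g, h).
print(all(matmul(U(g), U(h)) == scale(omega(g, h), U(product(g, h)))
          for g in elements for h in elements))
# X and Z anticommute, so the phase cannot be removed by rescaling the U(g).
print(matmul(Z, X) == scale(-1, matmul(X, Z)))
```

A vector fixed by Z is proportional to (1, 0), and X maps it to (0, 1), so there is no common invariant vector, in line with the remark on [4].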


The Connection between Symmetry 
and Superselection Sectors 


It is often convenient to describe a model in terms of 
localized fields which do not represent an observable 
(in the sense of quantum mechanics that an operator 
corresponds to some measurement prescription). For 
example, Fermi fields which violate the principle of 
causality because they anticommute with each other 
at spacelike distance rather than commute are not 
observables. Only fields which are quadratic in the 
Fermi fields (densities of charge, current, energy) are 
observables. This means that an internal symmetry 
is used in order to distinguish the observables as 
those operators which are invariant under the 
symmetry: in the example, the symmetry transformation 
multiplies each Fermi field by -1 (by the 
spin-statistics theorem, this transformation coincides 
with the univalence of the Lorentz group). We 
characterize this situation by writing 


\[ A(O) = F(O)^G \qquad [5] \]


where A(O) and F(O) stand for the algebras of 
observables and fields localized in some spacetime 
region O, respectively, G is the internal symmetry 
group acting by automorphisms on each F(O) 
without affecting the localization, and 
$F(O)^G = \{a \in F(O) : \alpha_g(a) = a \text{ for all } g \in G\}$ denotes the subalgebra 
of invariants. The internal symmetry group G which 
distinguishes the observables according to [5] is 
usually called the “(global) gauge group.” 

If the gauge symmetry G is unbroken in the 
vacuum state, then there is a well-known connec- 
tion between symmetry and superselection rules 
(see Symmetries and Conservation Laws): namely, 
the observables act reducibly on the vacuum 
Hilbert space representation of F because they 
commute with the unitary operators which imple- 
ment the symmetry (or with their infinitesimal 
generators, usually called charges). As a conse- 
quence, the validity of the superposition principle is 
restricted because two eigenstates of different 
eigenvalues of the charges cannot exhibit interfer- 
ence. In other words, they belong to different 
superselection sectors. Wick, Wightman, and 
Wigner (1952) were the first to point out this 
relation. We therefore call this scenario the “WWW 
scenario” for brevity. 
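
The mechanism can be seen in the smallest possible example. The sketch below is a toy model, not part of the article: a single fermionic mode on C^2 with parity operator P = diag(1, -1). The names `number`, `odd`, and `superposition` are illustrative. An even operator (one commuting with P) cannot detect the relative phase between states of different parity, whereas an odd operator, which is not an observable in this scenario, can.

```python
import cmath

# Toy model (an assumption): one fermionic mode, basis |0> (even), |1> (odd).
P = [[1, 0], [0, -1]]        # fermion parity
number = [[0, 0], [0, 1]]    # even operator psi^* psi
odd = [[0, 1], [1, 0]]       # psi + psi^*, odd under parity


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def commutes(a, b):
    """True if the matrices a and b commute."""
    return matmul(a, b) == matmul(b, a)


def superposition(theta):
    """Normalized vector (|0> + e^{i theta} |1>) / sqrt(2)."""
    s = 2 ** -0.5
    return [s, s * cmath.exp(1j * theta)]


def expectation(op, v):
    """Expectation value <v, op v>."""
    return sum(v[i].conjugate() * op[i][j] * v[j]
               for i in range(len(v)) for j in range(len(v)))


# The even operator gives the same value for every relative phase ...
print(expectation(number, superposition(0.3)), expectation(number, superposition(2.1)))
# ... while the odd operator does not.
print(expectation(odd, superposition(0.3)), expectation(odd, superposition(2.1)))
```

Restricted to even operators, the superposition is therefore indistinguishable from a mixture of the two parity eigenstates.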

In the WWW scenario, the decomposition of the 
Hilbert space is determined by the central decom- 
position of the internal symmetry group (the 
eigenvalues of the Casimir operators). In this way, 
the superselection sectors are in one-to-one corre- 
spondence with the irreducible representations of 
the internal symmetry group. 


Superselection sectors of two-dimensional models 
do not follow the scheme expected from the WWW 
scenario (see below). This was most strikingly 
demonstrated through the classification of the 
unitary highest-weight representations of the 
Virasoro algebra (Friedan, Qiu, and Shenker) 
which is nothing other than the classification of the 
superselection sectors of the observable algebra 
generated by the chiral stress-energy tensor, and 
through the determination of their fusion rules by 
Belavin, Polyakov, and Zamolodchikov (1984). 

In two dimensions, one is therefore lacking a 
compelling a priori ansatz, like the WWW scenario, 
for describing the system in terms of auxiliary 
nonobservable charged fields. At this point, one 
may argue that from an operational point of view, a 
quantum field theory, and in particular its symme- 
tries, should be understood entirely in terms of its 
observables. (This viewpoint is emphasized in the 
algebraic approach to QFT, see Algebraic Approach 
to Quantum Field Theory.) We shall therefore now 
ask the opposite question: suppose we are given an 
algebra A of local observables (without knowledge 
of a field algebra and its gauge group). We define 
the superselection sectors intrinsically as (the unitary 
equivalence classes of) the positive-energy represen- 
tations of A. Then the question is: do these sectors 
arise through a WWW scenario from some field 
algebra and a gauge symmetry, and if so, can the 
latter be reconstructed from the given observables 
alone? 

The answer in four dimensions is positive, thanks 
to a deep result due to Doplicher and Roberts 
(1990). Let us sketch the line of reasoning leading to 
this result in some detail, because it shows how the 
connection between (global) gauge symmetry on the 
one hand and spacetime geometry on the other hand 
emerges through the principle of causality (locality) 
of relativistic quantum field theory, and because it 
makes apparent what is different in low-dimensional 
spacetime. 

The analysis is based on the general structure 
theory of superselection sectors due to Doplicher, 
Haag, and Roberts (DHR, 1971). The latter starts 
with a selection criterion invoking the concept of a 
localized charge: a superselection sector which by 
measurements within the causal complement of 
some spacetime region O cannot be distinguished 
from the vacuum sector. The heuristic idea is, of 
course, that the sector is obtained from the vacuum 
sector by placing some charge in the region O (e.g., 
by the application of a localized charged field 
operator to the vacuum vector). 

It has been shown (Buchholz and Fredenhagen 
1982) that positive-energy representations of 
massive theories always satisfy this selection criterion 
with a localization region O of the form of a 
narrow cone extending in spacelike direction. (In 
massless theories with long-range interactions, such 
as QED, the situation is more complicated because 
the charge creates an electric field whose flux at 
infinity does not vanish (Gauss’ law) and is not 
Lorentz invariant.) DHR assume that the localiza- 
tion region is even compact, and can be chosen 
arbitrarily within the unitary equivalence class of the 
representation. 

Exploiting a strong version of locality (Haag 
duality) for the vacuum representation of the 
observables, DHR proceed to define an associative 
composition (or fusion) law for positive-energy 
representations. This law is commutative only up 
to unitary equivalence. The crucial point is that the 
unitary intertwiner establishing this equivalence (the 
statistics operator) can be chosen in a unique way 
provided any pair of spacelike disconnected locali- 
zation regions can be continuously deformed into 
any other such pair. 

This point marks the separation between high and 
low dimensions. In two dimensions, in each pair of 
spacelike disconnected regions, one region is to the 
left of the other, thus distinguishing the pair 
$(O_1, O_2)$ from $(O_2, O_1)$. Consequently, they cannot 
be deformed into each other, and there arise two 
statistics operators. The same holds in three dimensions 
when the localization regions are spacelike 
cones, and $O_1$, $O_2$ are taken within (the causal 
complement of) some larger spacelike cone. If the 
spacetime dimension is at least 4, or if in three 
dimensions the localization regions are compact, 
then the statistics operator is unique and, as a 
consequence, coincides with its inverse. 

The (non-)uniqueness of the statistics operator has 
far-reaching consequences concerning our original 
question about the underlying gauge symmetry. 
Namely, the DHR analysis proceeds to show that 
the set of positive-energy representations equipped 
with the composition law, and the linear spaces of 
intertwiners between different representations, 
together form the mathematical structure of a C* 
tensor category. The statistics operators which are 
distinguished intertwiners give additional structure 
to this category: this structure is called a (permutation) 
symmetry if the statistics operators coincide 
with their inverse, and it is called a braiding 
otherwise. (It gives rise to a representation of the 
permutation group or the braid group, respectively.) 
In other words, the spacetime topology, through the 
intervention of the uniqueness of the statistics 
operator, causes the tensor category to be symmetric 
in high dimensions, and braided in low dimensions. 
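
The algebraic difference between a permutation symmetry and a braiding can be checked on a small matrix example. The sketch below is an illustration under stated assumptions, not the statistics operator of any particular model: it uses the braid matrix of the two-dimensional representation of U_q(sl_2) in one common convention, with a real value of q chosen only so that exact rational arithmetic can be used (the matrix is then not unitary). The names `braid_matrix`, `braid_relation_holds`, and `is_involution` are hypothetical. At q = 1 the matrix is the flip of two tensor factors and squares to the identity; for q different from 1 the braid relation still holds, but the matrix no longer coincides with its inverse.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def identity(n):
    """n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def braid_matrix(q):
    """Braid matrix on C^2 (x) C^2 in the basis |11>, |12>, |21>, |22>."""
    c = q - 1 / q
    return [[q, 0, 0, 0],
            [0, 0, 1, 0],
            [0, 1, c, 0],
            [0, 0, 0, q]]


def braid_relation_holds(r):
    """Check b1 b2 b1 == b2 b1 b2 on three tensor factors."""
    one = identity(2)
    b1 = kron(r, one)   # acts on factors 1 and 2
    b2 = kron(one, r)   # acts on factors 2 and 3
    return matmul(matmul(b1, b2), b1) == matmul(matmul(b2, b1), b2)


def is_involution(r):
    """True if r coincides with its inverse."""
    return matmul(r, r) == identity(len(r))


flip = braid_matrix(Fraction(1))      # permutation symmetry
deformed = braid_matrix(Fraction(2))  # genuine braiding

print(braid_relation_holds(flip), is_involution(flip))
print(braid_relation_holds(deformed), is_involution(deformed))
```

In the first case the braid group representation factors through the permutation group; in the second it does not.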


At a more elementary level, one may think of 
statistics operators as reflecting commutation rela- 
tions between the searched-for charged fields. Mak- 
ing an ansatz for the commutation relations at 
spacelike separation, essentially the same topological 
argument as before implies, together with Poincaré 
invariance, that the coefficients appearing in this 
relation should form a representation of the permu- 
tation group, or of the braid group, respectively. The 
DHR approach, however, is entirely intrinsic, 
avoiding any a priori assumption of charged fields. 

The duality theorem due to Doplicher and 
Roberts (1990) now states that every symmetric C* 
tensor category (with some further qualifications 
valid in the DHR setting) is isomorphic to the 
category of unitary representations of a compact 
group, in which the composition law is the tensor 
product and the (permutation) symmetry is the 
natural one. Moreover, the category uniquely 
determines the group, and by a crossed product 
construction (an action of the category on the 
algebra A) one reconstructs a field algebra F such 
that [5] holds. If fermionic sectors are present, then 
there is some arbitrariness in the commutation 
relations among the corresponding fermionic fields, 
which can be exploited to produce the normal 
commutation relations (fermionic fields anticom- 
mute among each other, and bosonic fields commute 
with any field at spacelike separation). This fixes the 
field algebra F up to unitary equivalence. The 
conclusion is that the WWW scenario is the most 
general in four dimensions (apart from the reserva- 
tions due to long-range forces, see above). 


Generalized Symmetries in Low 
Dimensions 


In view of the success of this program in four 
dimensions and the advantage of the WWW 
scenario for model building, the obvious challenge 
is to search for an analogous understanding of 
superselection sectors (charges) in low dimensions in 
terms of an algebra of charged fields and a gauge 
symmetry distinguishing the observables. This gauge 
symmetry cannot, in general, be a group for several 
reasons: 


• As stated before, the tensor category of superselection 
sectors possesses only a braiding, rather 
than a (permutation) symmetry, hence the duality 
theorem fails. 

• One can associate a (statistical) dimension $d_\pi$ with 
each superselection sector $[\pi]$ which is multiplicative 
under the composition law (fusion), and 
additive under direct sums. In a symmetric 
category, the dimensions are necessarily positive 
integers. Indeed, in the WWW scenario, they 
coincide with the naive dimension of the associated 
representation of the gauge group. But in 
the low-dimensional models, the dimensions turn 
out to be nonintegers in general. 

• Moore and Seiberg (1988) have axiomatized the 
superselection structure of chiral and two-dimensional 
conformal field theories in terms 
of a system of recoupling and braiding coefficients 
controlling the fusion of sectors and its 
noncommutativity. (In fact, this system is 
basically equivalent to the DHR category.) For 
models such as SU(2) current algebras at level 
k, these coefficients turn out to coincide with 
the recoupling and braiding coefficients one can 
associate with a quantum group deformation 
(Drinfel’d 1986) of SU(2) with deformation 
parameter $q = -\exp(i\pi/k)$. Representations of 
quantum groups (quasitriangular Hopf algebras, 
see Hopf Algebras and q-Deformation Quantum 
Groups) have a tensor product defined in terms 
of a noncocommutative coproduct. Moreover, 
they possess a quantum dimension which is a 
q-deformation of an integer. The quantum 
dimensions precisely match the statistical dimensions 
of the superselection sectors. All this 
strongly suggests that quantum groups appear as 
generalized symmetries in two dimensions, at 
least in a large class of models. 
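
The noninteger dimensions mentioned in the list above are easy to evaluate numerically. The sketch below assumes the standard formula for the SU(2) level-k models, d = sin((n+1)π/(k+2)) / sin(π/(k+2)) for the sector of spin n/2 with n = 0, ..., k, and the widely used convention q = exp(iπ/(k+2)) for the deformation parameter, which may differ from the parametrization quoted in the text. The function names are illustrative.

```python
import cmath
import math


def statistical_dimension(n, k):
    """Dimension of the spin-n/2 sector (n = 0..k) of the SU(2) level-k model."""
    return math.sin((n + 1) * math.pi / (k + 2)) / math.sin(math.pi / (k + 2))


def quantum_integer(m, q):
    """q-deformed integer [m]_q = (q^m - q^-m) / (q - q^-1)."""
    return (q ** m - q ** (-m)) / (q - 1 / q)


k = 3
q = cmath.exp(1j * math.pi / (k + 2))  # assumed convention for q

for n in range(k + 1):
    d = statistical_dimension(n, k)
    # The quantum dimension [n + 1]_q agrees with d up to rounding errors.
    print(n, d, abs(quantum_integer(n + 1, q) - d))

# Multiplicativity under fusion: [1/2] x [1/2] = [0] + [1] for k >= 2.
print(statistical_dimension(1, k) ** 2,
      statistical_dimension(0, k) + statistical_dimension(2, k))
```

At level 3 the spin-1/2 sector has the golden ratio as its dimension, which no representation of a compact group can have.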


A natural testing ground for the search for 
appropriate generalized symmetry concepts in low 
dimensions is the abundance of models in chiral and 
two-dimensional conformal QFT (see Two- 
Dimensional Models). As mentioned before, confor- 
mal symmetry in two dimensions has far-reaching 
consequences, especially the existence of chiral quan- 
tum fields which are defined on a one-dimensional 
light ray. As a null direction in the two-dimensional 
spacetime, this ray unites both the spacelike property 
of carrying a causal structure, and the timelike 
property that the generator of translations has positive 
spectrum (energy). These two features together with 
Möbius covariance are so powerful that they allow for 
the exact construction of large classes of models. The 
most elementary ones (minimal models) are 
completely described by the chiral stress-energy 
density field, that is, the local generator of the 
conformal symmetry. Other models also contain 
currents which are the local generators of internal 
symmetries. These models exhibit many nontrivial 
superselection structures, which illustrate the wide 
range of possible deviations from higher-dimensional 
QFT, and at the same time exhibit possible 
approaches to appropriate symmetry concepts in 
low dimensions. 

Attempts to classify the possible algebraic struc- 
tures of generalized internal symmetries in a model- 
independent setting start from the idea that the 
representation category of the internal symmetries of 
a given model should be equivalent to the tensor 
category of its superselection sectors. Several alge- 
braic structures have been proposed as candidates, 
complying with this idea. They all assume specific 
modifications or deformations of eqns [1]-[5] above, 
highly constrained by self-consistency. Among these 
proposals are: 


• quantum groups (see, e.g., Fröhlich and Kerler 
1993), 

• weak quasiquantum groups (Mack and Schomerus 
1992) and rational Hopf algebras (Fuchs et al. 
1994), 

• weak C* Hopf algebras (Rehren 1997, Böhm and 
Szlachányi 1996) or quantum groupoids (Nikshych 
and Vainerman 1998), and 

• braided groups (Majid 1991). 


In several cases, the respective “symmetry alge- 
bra” can be reconstructed from the tensor category 
of superselection sectors, and a field algebra with 
linear transformation behavior can be constructed 
which contains the observables as invariant ele- 
ments as in [5]. However, the situation is unsatis- 
factory for various reasons. First, the class of QFT 
models for which these constructions have been 
performed is quite restricted (most constructions 
work only for rational models, i.e., models with a 
finite set of charges); second, the reconstructed 
symmetry algebra is not unique; and finally, the 
constructed field algebras have features which 
diverge significantly from the WWW scenario. For 
example, it is not always warranted that the 
quantum symmetries are consistent with the 
*-structure, indispensable for Hilbert space positiv- 
ity (a necessary prerequisite for the probability 
interpretation of quantum theory). Moreover, typi- 
cally there are global gauge transformations which 
are implemented by localized field operators, thus 
exhibiting a mixing of local and global concepts. It 
also happens that this holds for elements in the 
center of the symmetry algebra, which implies that 
the field algebra is not local relative to its gauge 
invariant elements, that is, the charged fields do not 
commute with the gauge-invariant elements at 
spacelike separation. In other constructions, the 
field algebra is not associative, or there are no finite 
field multiplets. 

Historically, the first candidate for a “symmetry 
algebra” compatible with braid group statistics was 
the structure of a quantum group, as mentioned 
above. However, in physically interesting 
models, the quantum group is not semisimple and 
thus has too many (namely, indecomposable) 
representations. Solutions to this problem have been: 


1. A BRS approach in an indefinite-metric frame- 
work (Hadjiivanov et al. 1991), 

2. “Truncation,” that is, discarding the “unphysical” 
representations. Fröhlich and Kerler (1993) 
have done this consistently in a categorical 
framework. In fact, they have given a complete 
classification of the possible braided tensor 
categories generated by a single irreducible object 
with statistical dimension d satisfying 1 < d < 2, 
in terms of categories constructed from the 
“truncated” representations of $U_q(sl_2)$. Truncation 
can also be performed by dividing the 
quantum group itself by the ideal which is 
annihilated by all “physical” representations, 
leading to a weak quasiquantum group (Mack 
and Schomerus 1992). 

3. Relaxing the axioms, thus admitting the more 
general structures mentioned above. 


All the above approaches assume a given general- 
ized symmetry concept and show to what extent 
field algebras complying with it can be constructed. 
They thus concern nonobservable objects, and it is 
no contradiction if different symmetry concepts can 
be associated with the same observable data. 

A more radical concept of global gauge symmetry, 
applicable to the low-dimensional case, has been 
developed by Longo and Rehren (1995). Its point of 
departure is the notion of a conditional expectation, 
which has the same abstract properties as a group 
average. In the WWW scenario, the Haar measure 
of the compact gauge group defines an average 


\[ \mu : F \ni a \mapsto \int_G dg\, \alpha_g(a) \in A \qquad [6] \]


which is a positive linear map respecting the 
localization, and the observables are invariant, 
$\mu(a) = a$. In fact, the observables are exactly the 
image of this map, that is, [5] is equivalently 
formulated, but without reference to the group 
transformations, as 


\[ A(O) = \mu(F(O)) \qquad [7] \]
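
The abstract properties of a group average that are retained in the notion of a conditional expectation can be checked in a finite toy model. The sketch below is an assumption made for illustration (Z_3 acting on 3 x 3 matrices by conjugation with cyclic shifts) and is not a construction from the article; the names `mu`, `alpha`, and `U` are hypothetical. It verifies that the average is idempotent, that its image consists of invariant elements, and that it is a bimodule map over the invariants.

```python
from fractions import Fraction

# Toy model (an assumption): "field algebra" = 3 x 3 matrices, gauge group Z_3.
N = 3


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose; for the real matrices used here this is the adjoint."""
    return [list(row) for row in zip(*a)]


def U(g):
    """Permutation matrix of the cyclic shift by g."""
    return [[1 if (col + g) % N == row else 0 for col in range(N)]
            for row in range(N)]


def alpha(g, a):
    """Gauge transformation alpha_g(a) = U(g) a U(g)^*."""
    return matmul(matmul(U(g), a), transpose(U(g)))


def mu(a):
    """Group average: (1/|G|) times the sum of alpha_g(a) over all g."""
    total = [[Fraction(0)] * N for _ in range(N)]
    for g in range(N):
        shifted = alpha(g, a)
        total = [[total[i][j] + shifted[i][j] for j in range(N)]
                 for i in range(N)]
    return [[entry / N for entry in row] for row in total]


a = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]   # generic element
b = [[1, 2, 3], [3, 1, 2], [2, 3, 1]]   # circulant, hence shift invariant

print(mu(mu(a)) == mu(a))                              # idempotent
print(all(alpha(g, mu(a)) == mu(a) for g in range(N))) # image is invariant
print(mu(b) == b)                                      # invariants are fixed
print(mu(matmul(b, a)) == matmul(b, mu(a)))            # bimodule property
```

Only these properties of the map, and not the group itself, enter the formulation [7].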


Turning to the observables A of a quantum field 
theory in low dimensions, one looks for a quantum 
field theory F, containing A and equipped with a 
conditional expectation $\mu$ such that [7] holds, and 
which preserves the vacuum state. F may not satisfy 
local commutativity, but it should be local relative 
to the observables in the sense mentioned before. In 
rational chiral CFT, such extensions can be classi- 
fied (and indeed constructed) in terms of the super- 
selection category of A, giving direct access to the 
decomposition of the vacuum Hilbert space of F into 
superselection sectors of A. The advantage here is 
that no problems with Hilbert space structure can 
arise (because the approach is entirely in terms of 
operator algebras); a drawback is that in general F is 
not unique, and nonvacuum representations of F 
also have to be considered in order to generate all 
sectors of A. 

The method can be used to classify and construct 
both nonlocal chiral extensions as candidates for 
sector-generating field algebras for a theory A of 
chiral observables, and local two-dimensional quan- 
tum field theories containing two given chiral 
subtheories, that is, observable algebras of two- 
dimensional models (Kawahigashi and Longo 2004). 
The chiral sector structure of the latter models is 
described by a “modular invariant.” In many cases, 
this means that their thermal partition functions are 
invariant under the group PSL(2,Z) of modular 
transformations of the temperature (see below). 

At this point, another link between spacetime and 
internal symmetries may be noted. The modular 
theory of von Neumann algebras (see Tomita- 
Takesaki Modular Theory) associates a one-para- 
meter group of automorphisms (called the “modular 
group”) with a state and an algebra “in standard 
position.” In quantum field theory, for the vacuum 
state and an algebra of observables localized in 
certain wedge regions of Minkowski spacetime, this 
group can be identified with a boost subgroup of the 
Lorentz group (Bisognano and Wichmann 1975). 
Similarly, in chiral CFT on the circle, the modular 
group associated with the observables in an interval 
and the vacuum coincides with a subgroup of the 
Möbius group. For nonlocal theories, there may be 
an obstruction, however. On the other hand, if a 
subalgebra is stable under the modular group of 
some algebra, then there is a conditional expectation 
from the larger algebra onto the smaller algebra. 
Combining these general theorems, the Möbius 
covariance of the inclusions $A(O) \subset F(O)$ implies 
the existence of a conditional expectation, that is, 
the above generalization of the average over the 
internal symmetry. Moreover, assuming a general- 
ized notion of compactness (“finite index”) for the 
generalized internal symmetry, the Bisognano-Wichmann 
property holds also for nonlocal theories 
(Longo and Rehren 2004). 

Of course, there is also a WWW scenario in chiral 
theories, that is, one may restrict a local theory to its 
invariants under some group of internal gauge 
symmetries (“orbifold models”). It then happens 
that the invariants not only have the expected 
superselection sectors in correspondence with the 
representations of the gauge group, but in addition 
“twisted” sectors appear which, together with the 
former, constitute a “quantum double” structure. 
The twisted sectors arise by restriction of solitonic 
sectors of the original theory, which are in one-to-one 
correspondence with the elements of the gauge 
group (Müger 2005). Solitonic sectors are localizable 
with respect to two different vacua, and do 
not admit an unrestricted composition law. 


Special Issues 


A particularly simple situation is the case of anyons, 
that is, when all sectors have statistical dimension 1. 
Then the sectors form an abelian group G under 
fusion, and one can construct a WWW scenario with 
global gauge group $\hat{G}$, the dual of G. The ensuing 
quantum fields satisfy generalized commutation rela- 
tions at spacelike separation, given by an abelian 
representation of the braid group, where the coeffi- 
cients can be arbitrary complex phases (responsible 
for the name “anyons”). However, it is known that 
there can arise an obstruction, which enforces the 
“local” global gauge transformations (mentioned 
before) to be present. In this case, the gauge 
symmetry can also be described by a quasiquantum 
group. It is noteworthy that free anyon fields have 
been constructed in two-dimensional spacetime, 
while in three dimensions there can be no (cone-) 
localized massive anyon fields which are free in the 
sense that they generate only single-particle states 
from the vacuum (Mund 1998). 

The charge structure of massive quantum field 
theories in two dimensions is very different both 
from that encountered in conformal quantum field 
theories, and from the charge structure in high 
dimensions. It has been observed long ago that, in 
contrast to four dimensions, the strong locality 
property (Haag duality) which is necessary to set 
up the DHR analysis of superselection sectors, fails 
for the algebra of invariants under an internal gauge 
group in two dimensions. This algebraic feature can 
be traced back to the fact that the causal comple- 
ment of a point is disconnected in two dimensions, 
or, in physical terms, that “a charge cannot be 
transported around a detector” without passing 
through its region of causal dependence. Müger 
(1998) has shown that any algebra of observables 
which satisfies Haag duality cannot possess any 
nontrivial DHR superselection sectors at all, and 
that the only sectors which can exist are solitonic 
sectors. This general result nicely complies with the 
experience with integrable models, as mentioned 
before. 

There are also some results giving interesting 
insight, which can be obtained intrinsically in terms 
of the observables. One of them concerns “central” 
observables (generalized Casimir operators). 

Casimir operators in the WWW scenario are 
functions of the generators of the internal symmetry 
which usually are integrals over densities belonging 
to the field algebra F (Noether’s theorem). Since 
they also commute with the generators, they can be 
approximated by local observables, and are there- 
fore defined in each representation of the latter. By 
Schur’s lemma, they are multiples of the identity in 
each irreducible sector. Since the eigenvalues of 
Casimir operators distinguish the representations of 
the gauge group, they also distinguish the sectors. 

In chiral CFT extended to the circle (see above), 
one can find global “charge measuring operators” 
$C_i$, one for each sector $\pi_i$, in the center of the 
observable algebra (Fredenhagen et al. 1992) which 
have similar properties. They arise as a consequence 
of an algebraic obstruction to defining the charged 
sectors on the circle, related to a nontrivial effect if a 
charge is “transported once around the circle,” and 
form an operator representation of the fusion rules 
within the global algebra of observables. Under 
rather natural conditions clarified by Kawahigashi, 
Longo, and Müger (2001), the matrix of eigenvalues 
$\pi_j(C_i)$ is nondegenerate, that is, the generalized 
Casimir operators completely distinguish the 
superselection sectors. In this case, the superselection 
category is a modular category (see Braided and 
Modular Tensor Categories): the matrix with entries 
$d_{\pi_j}\,\pi_j(C_i)$ and the diagonal matrix with entries $\pi_i(U)$ 
(where U is the Möbius rotation by $2\pi$) are multiples 
of the generators S and T of the “modular 
group” PSL(2, Z), in a matrix representation labeled 
by the superselection sectors of the chiral observables. 
The physical significance of this matrix 
representation is that it relates thermal expectation 
values for different values of the temperature (Cardy 
1986, Kac and Peterson 1984, Verlinde 1988). 
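
For a concrete instance of such a matrix representation, the following sketch evaluates the well-known modular data of the SU(2) level-k models: S with entries sqrt(2/(k+2)) sin((a+1)(b+1)π/(k+2)), and T diagonal with entries exp(2πi(h_a - c/24)), where h_a = a(a+2)/(4(k+2)) and c = 3k/(k+2). These formulas are assumed here as standard input and are not derived in the article; the function names are illustrative. The code checks the defining relations of the modular group and recovers fusion coefficients from S by the Verlinde formula.

```python
import cmath
import math


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def modular_data(k):
    """S and T matrices of the SU(2) level-k model, sectors a = 0..k."""
    n = k + 1
    norm = math.sqrt(2.0 / (k + 2))
    s = [[norm * math.sin((a + 1) * (b + 1) * math.pi / (k + 2))
          for b in range(n)] for a in range(n)]
    c = 3.0 * k / (k + 2)                                   # central charge
    h = [a * (a + 2) / (4.0 * (k + 2)) for a in range(n)]   # conformal weights
    t = [[cmath.exp(2j * math.pi * (h[a] - c / 24.0)) if a == b else 0.0
          for b in range(n)] for a in range(n)]
    return s, t


def is_identity(m, tol=1e-9):
    """True if m equals the identity matrix up to the tolerance tol."""
    return all(abs(m[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(len(m)) for j in range(len(m)))


def verlinde(s, a, b, c):
    """Fusion coefficient N_{ab}^c computed from S (S is real here)."""
    return sum(s[a][m] * s[b][m] * s[c][m] / s[0][m] for m in range(len(s)))


S, T = modular_data(2)
ST = matmul(S, T)

print(is_identity(matmul(S, S)))                # S^2 = 1
print(is_identity(matmul(matmul(ST, ST), ST)))  # (ST)^3 = 1
# Fusion of the spin-1/2 sector with itself at level 2.
print([round(verlinde(S, 1, 1, c)) for c in range(3)])
```

The same matrix S implements the exchange of high and low temperature on the characters of the sectors, which is the relation between thermal expectation values referred to above.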

These examples, together with the failure of the 
Coleman-Mandula theorem, may illustrate the 
intricate relations among spacetime geometry, covariance, 
and internal symmetry (charge structure) in 
low dimensions. In relativistic quantum field theory, 
the link is provided by the principle of locality, 
which “turns geometry into algebra.” 


See also: Algebraic Approach to Quantum Field Theory; 
Axiomatic Quantum Field Theory; Braided and Modular 
Tensor Categories; Hopf Algebras and g-Deformation 
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Quantum Groups; Integrability and Quantum Field 
Theory; Quantum Field Theory: A Brief Introduction; 
Quantum Fields with Topological Defects; Symmetries 
and Conservation Laws; Symmetries in Quantum Field 
Theory: Algebraic Aspects; Symmetry Breaking in Field 
Theory; Tomita—Takesaki Modular Theory; 
Two-Dimensional Conformal Field Theory and Vertex 
Operator Algebras; Two-Dimensional Models.


Introduction


This article treats the most important results and 
concepts relating to symmetry and conservation 
laws in quantum field theory. It includes such results 
as Wigner’s theorem, Goldstone’s theorem, the 
Bisognano—Wichmann theorem, the quantum 
Noether theorem, and the theorem on the existence 
of gauge groups and a field net. It is written within 
the framework of algebraic quantum field theory, 
this being the simplest setting capable of expressing 
all these concepts and results. 

Symmetries come in many guises. They are to a 
physical system what automorphisms are to a 
mathematical theory. In fact, when a physical 
system is described in mathematical terms, its 
symmetries correspond to the automorphisms of 
the mathematical structure and in particular form a 
group, its symmetry group. The reader should bear
in mind this simple picture throughout its diverse
variations. Readers unfamiliar with the mathematical
terminology should consult the appendix.


Elementary Quantum Mechanics 


Before turning to quantum field theory, let us
comment on symmetries in elementary quantum
mechanics. These systems have the density matrices,
that is, positive operators of trace 1, on an
infinite-dimensional separable Hilbert space as states, and the
self-adjoint operators as observables. The expectation
value of the bounded observable $A$ in the state
determined by $\rho$ is given by $\operatorname{tr}\rho A$. Having specified
the mathematical structure, the notion of symmetry
follows. With a suggestive notation, it is a pair of
mappings $A \mapsto \alpha A$, $\rho \mapsto \rho\alpha^{-1}$ such that


$$\operatorname{tr}\,(\rho\alpha^{-1})(\alpha A) = \operatorname{tr}\,\rho A$$


for all observables $A$ and states $\rho$.

If we take $\rho$ and $A$ to be the projections onto $\mathbb{C}\phi$
and $\mathbb{C}\psi$ for unit vectors $\phi$ and $\psi$, then the above
condition corresponds to the conservation of
transition probabilities $|(\phi,\psi)|^2$. This formed the
starting point for the analysis of Wigner, who concluded:


Theorem Every symmetry is of the form $A \mapsto UAU^{-1}$
and $\rho \mapsto U\rho U^{-1}$, where $U$ is a unitary or
antiunitary operator.


As could have been foreseen from the outset, this 
simple result in no way distinguishes one elementary 
quantum-mechanical system from another. A more 
useful notion of symmetry results if the Hamiltonian 
is reckoned as part of the information describing the 
system and, therefore, has to be left invariant by a 
symmetry. The operator $U$ above must therefore
satisfy the condition $UHU^{-1} = H$, that is, it commutes
with the Hamiltonian. As the Hamiltonian is the
generator of time translations, U is a constant of 
motion. This is the genesis of the relation between 
symmetries and conservation laws. 
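
The step from "U leaves the Hamiltonian invariant" to "U is a constant of motion" can be checked numerically on a toy model. The following is a minimal sketch under stated assumptions: the two-level Hamiltonian $H = 1 + \sigma_x$, the symmetry $U = \sigma_x$ (the swap of the two basis vectors), and the initial vector are all hypothetical choices made for illustration, not taken from the article.

```python
import cmath

def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def apply(a, v):
    """Apply a 2x2 matrix to a vector in C^2."""
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]

def expectation(a, v):
    """Expectation value (v, a v) for a normalized vector v."""
    av = apply(a, v)
    return sum(v[i].conjugate() * av[i] for i in range(2))

# Illustrative choices (assumptions, not taken from the article):
H = [[1 + 0j, 1 + 0j], [1 + 0j, 1 + 0j]]   # H = 1 + sigma_x
U = [[0j, 1 + 0j], [1 + 0j, 0j]]           # swap, U = sigma_x
SIGMA_Z = [[1 + 0j, 0j], [0j, -1 + 0j]]    # does not commute with H

def evolve(t):
    """exp(-i t H) = exp(-i t)(cos t - i sin t sigma_x), since sigma_x^2 = 1."""
    phase, c, s = cmath.exp(-1j * t), cmath.cos(t), cmath.sin(t)
    return [[phase * c, -1j * phase * s], [-1j * phase * s, phase * c]]

def conjugated_hamiltonian():
    """U H U^{-1}; it equals H when U is a symmetry of the dynamics."""
    return matmul(matmul(U, H), dagger(U))

psi0 = [0.6 + 0j, 0.8 + 0j]                # normalized initial vector
psi_t = apply(evolve(0.7), psi0)

print(conjugated_hamiltonian() == H)       # symmetry condition
print(expectation(U, psi0), expectation(U, psi_t))
print(expectation(SIGMA_Z, psi0), expectation(SIGMA_Z, psi_t))
```

In this sketch the expectation value of $U$ is the same before and after the evolution, while that of $\sigma_z$, which does not commute with $H$, changes.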


Quantum Field Theories 


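The simplest types of quantum field theories can be
described by von Neumann algebras $\mathfrak{A}(\mathcal{O})$ depending
on double cones $\mathcal{O}$ and subject to


$$\mathcal{O}_1 \subset \mathcal{O}_2 \;\Rightarrow\; \mathfrak{A}(\mathcal{O}_1) \subset \mathfrak{A}(\mathcal{O}_2),$$


a structure referred to as the net of observables.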

An alternative approach would be to use the 
Wightman formalism. This would need a discussion 
of pointlike fields and the domains of definition of 
unbounded operators, thus complicating a general 
exposition of symmetry. 

Comparing this description of a quantum field
theory with that of an elementary quantum-mechanical
system, the net clearly replaces the observables,
but nothing has yet been said about states.
Since the set of double cones is directed under
inclusion, the union of the $\mathfrak{A}(\mathcal{O})$ is a $*$-algebra $\mathfrak{A}$ and
a state of our system is a state on this algebra.

Most states are of no physical relevance. A 
characterization of the states of physical relevance, 
even say to elementary particle physics, is not 
known although some progress has been made. 

The net structure is the hallmark of a field theory
and allows us to distinguish two important classes of
symmetries. An internal symmetry $\alpha$ satisfies the
condition


$$\alpha(\mathfrak{A}(\mathcal{O})) = \mathfrak{A}(\mathcal{O})$$


for all double cones $\mathcal{O}$. By contrast, a spacetime
symmetry is an automorphism $\alpha_L$ implementing a
Poincaré transformation $L$ and hence satisfying the
condition


$$\alpha_L(\mathfrak{A}(\mathcal{O})) = \mathfrak{A}(L\mathcal{O})$$


for every double cone $\mathcal{O}$. It is usually the case that
internal symmetries commute with spacetime
symmetries.

The state of prime relevance to elementary particle
physics is the vacuum state $\omega_0$. The corresponding
Gelfand-Naimark-Segal (GNS) representation $\pi_0$ is
called the vacuum representation. Now the vacuum
state of a quantum field theory is typically unique
and as such invariant under a symmetry of the system:


$$\omega_0\alpha^{-1} = \omega_0.$$


Spacetime Symmetries 


Since the vacuum state is invariant, we have a 
unitary representation of the Poincaré group imple- 
menting the spacetime symmetries in the vacuum 
representation. To illustrate the role of representa- 
tions up to a factor, we take instead the GNS 
representation of a pure state corresponding to a 
particle of half-integral spin. Here we need a unitary
representation of the covering group of the Poincaré
group, inhomogeneous SL(2, C), to implement the
symmetries. The situation for the subgroup of
rotations is the same. 

The most important property of these representa- 
tions is positivity of the energy. More precisely, in a 
representation of relevance to elementary particle 
physics such as the vacuum representation, the
generator $P^0$ of time translations is a positive
operator, $P^0 \geq 0$. Expressed in a frame-independent
way, the spectrum of spacetime translations is 
contained in the closed forward light cone. It is 
one of the basic principles to be exploited in 
applying quantum field theory to elementary particle 
physics. Notice that the principle is no longer valid 
for an equilibrium state. 

A similar situation arises in conformal field 
theory. Here the role of double cones in Minkowski 
space is played by intervals on the circle and that of 
the Poincaré group by the Möbius group on the 
circle PSL(2, R). Again, the Möbius group cannot
always be unitarily implemented and conformal 
invariance is defined via a continuous unitary 
representation of its covering group. Most impor- 
tantly, there is an analog of positivity of the energy. 
The generator of rotations of the circle is a positive 
operator. 

A remarkable aspect of spacetime symmetries was
discovered by Bisognano and Wichmann in an
application of modular theory in the field-theoretical
context, looking not at double cones but at wedges.
A wedge $W$ is a Poincaré transform of the standard
wedge $x^1 > |x^0|$. They found that the modular
automorphisms of $\mathfrak{A}(W)$ and the vacuum vector $\Omega_0$
have a geometric significance. For the standard
wedge, they got the following result.


Theorem If the net is derived from Wightman
fields, the modular operator is $e^{2\pi K}$, where $K$ is the
generator of boosts in the 1-direction, and the
modular conjugation is $ZR\Theta$, where $\Theta$ is the
TCP-operator, $R$ is the rotation through $\pi$ about the
1-axis, and $Z$ is the unitary operator equal to 1 on
the Bose subspace and $-i$ on the Fermi subspace.


The modular data for $\mathfrak{A}(\mathcal{O})$ and $\Omega_0$ also admit a
geometric interpretation for the free massless scalar
field.

These facts enhance our understanding of space- 
time symmetries. The ideas have meanwhile been 
applied to curved spacetime to select a state with 
vacuum-like properties using the principle of the 
geometric action of the modular conjugation. 


Gauge Symmetry 


Gauge symmetries do not fit into our scheme in that
they act trivially on the observable algebra $\mathfrak{A}$. To
exhibit a gauge symmetry we need a larger net $\mathfrak{F}$
called the field net. The gauge group $G$ will be the
group of automorphisms of $\mathfrak{F}$ leaving the subnet $\mathfrak{A}$
pointwise fixed and $\mathfrak{A}$ the subnet of $\mathfrak{F}$ of fixed
points under $G$. This has the merit of indicating the
mathematical framework for gauge symmetry but
otherwise begs important questions. A priori one
does not know what properties $\mathfrak{F}$ should have nor
how it should be constructed.

The right approach is to understand what intrinsic
structure of $\mathfrak{A}$ governs the existence of a nontrivial
gauge group. This brings us back to the states or
representations relevant to elementary particle physics.
A condition for selecting some of these relevant
representations is that asymptotically they be like
the vacuum in spacelike directions. More precisely,
$\pi$ must be unitarily equivalent to the vacuum
representation $\pi_0$ on the spacelike complement of
every double cone.

The resulting theory of superselection sectors
hinges on the property of Haag duality that, for
each double cone $\mathcal{O}$,


$$\mathfrak{A}(\mathcal{O}) = \mathfrak{A}(\mathcal{O}')'$$


where $\mathcal{O}'$ denotes the spacelike complement of $\mathcal{O}$. It
implies that every representation satisfying the
selection criterion is unitarily equivalent to one of
the form $\pi_0 \circ \rho$, where $\rho$ is an endomorphism of $\mathfrak{A}$
localized in some fixed but arbitrary double cone $\mathcal{O}$,
that is, $\rho(A) = A$ if $A \in \mathfrak{A}(\mathcal{O}')$. The endomorphisms
thus obtained are closed under composition and
hence the objects of a full tensor subcategory $\mathcal{T}$ of
the category of all endomorphisms and their
intertwiners. There is a dimension function $d$ defined on
the objects of $\mathcal{T}$, $d(\rho) = 1, 2, \ldots, \infty$. If $\mathcal{T}_f$ denotes
the full subcategory whose objects have finite
dimension, then the following result holds.


Theorem $\mathcal{T}_f$ is equivalent to the tensor category of
finite-dimensional continuous unitary representations
of a canonical compact group $G$. There is a
canonical field net $\mathfrak{F}$ with Bose-Fermi commutation
relations extending $\mathfrak{A}$ such that $G$ is the group of
automorphisms of $\mathfrak{F}$ leaving $\mathfrak{A}$ pointwise fixed.


The first step in the proof is to define and analyze
the statistics of the representations in question. The
statistics of an irreducible representation $\rho$ can be
classified as being para-Bose or para-Fermi of order
$d(\rho)$. The second step is to show that each $\rho$ of finite
dimension has a well-defined conjugate up to
equivalence. The third and most difficult step is
showing that $\mathcal{T}_f$ can be embedded in the tensor
category of Hilbert spaces.


The Local Implementation of Symmetries


Gauge symmetry has its associated conservation 
laws in that the different sectors of the last section 
are labeled by conserved quantities such as baryon 
number, lepton number, or electric charge, gener- 
ically called charges. The theory is built round the 
idea of creating charge and elements of the field net 
carry charges. But there should be a dual approach 
based on measuring charges. One would like to 
prove the existence of local conserved currents 
corresponding to these charges. This has not proved 
possible but there is a good substitute, described 
below, which can be regarded as a weak version of a 
quantum Noether theorem. 

If $\mathcal{O}_1 \subset \mathcal{O}_2$ is a strict inclusion of double cones,
then the theory is said to satisfy the split property if
there is a type I factor $M$ such that


$$\mathfrak{A}(\mathcal{O}_1) \subset M \subset \mathfrak{A}(\mathcal{O}_2)$$


where a type I factor is a von Neumann algebra
isomorphic to some $B(H)$. In this case $M$ can be
chosen in a canonical fashion and there is an
isomorphism $\psi$, called the universal localizing map,
of $B(H)$ onto $M$, where $H$ is the underlying Hilbert
space. We have $\psi(A) = A$ for $A \in \mathfrak{A}(\mathcal{O}_1)$.


Theorem If $U$ is an implementing representation of
the internal symmetry group $G$, $\psi(U)$ will be a
representation of $G$ in $M$ that continues to implement
the symmetry on $\mathfrak{A}(\mathcal{O}_1)$. If $G$ is a Lie group,
then the infinitesimal generators in the representation
are an analog of locally integrated current
densities.


Spontaneously Broken Symmetry 


The standard physical example of a spontaneously 
broken symmetry is magnetization. Despite the 
overall rotational symmetry, a magnet picks out a 
preferred direction as its direction of magnetization. 
The chosen state breaks the symmetry. 

The phenomenon of spontaneously broken symmetry
involves an interplay of symmetries and
certain classes of states: vacuum states, ground
states, or equilibrium states. If such an $\omega$ is induced
by a vector cyclic and separating for a local algebra
$\mathfrak{A}(\mathcal{O})$, then, as explained in the appendix, given $\mathcal{O}$,
modular theory yields a canonical unitary representation
$V$ of the internal symmetry group $G$:

$$gA = V_g A V_g^{-1}, \quad A \in \mathfrak{A}(\mathcal{O})$$

The results concern the breaking of a one-parameter
group $\lambda \mapsto \alpha_\lambda$ of symmetries. More
precisely, one asks whether $\omega\delta = 0$ or not, where $\delta$
is the infinitesimal generator of $\lambda \mapsto \alpha_\lambda$,


$$\delta(F) = \lim_{\lambda \to 0} \lambda^{-1}(\alpha_\lambda(F) - F)$$


where norm convergence is understood and holds on a
dense domain. $\delta$, the derivation, is an infinitesimal
symmetry. Goldstone first showed that the spontaneous
breaking of such symmetries requires the
presence of massless bosons. The following result is
taken from a more modern treatment. $\mathcal{O}_R$ here denotes
the double cone whose base is the ball in $t = 0$ of radius
$R$ centered on the origin and $D$ the domain of $\delta$.


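Theorem Let $\delta$ be a derivation on a field net $\mathfrak{F}$ in
$s>1$ spatial dimensions such that for
$F \in \mathfrak{F}(\mathcal{O}_R) \cap D$


$$|\omega_0\delta(F)| \le c_{R,\varepsilon}\,(\|F\Omega\| + \|F^*\Omega\|) + \varepsilon\|\delta F\|$$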


(i) If lim infr% cr, -R“*)/* =0, then woô =0. 

(ii) If lim infr cre RI) < œ, then wd £ 0 is 
only possible if the spectrum of the translations
coincides with the forward light cone $V_+$ and the
boundary $\partial V_+ \setminus \{0\}$ has nontrivial spectral measure
(i.e., there are massless particles in the
theory).

(iii) If $c_{R,\varepsilon}$ is polynomially bounded in $R$, then
$\omega_0\delta \neq 0$ is only possible if the spectrum of
translations coincides with $V_+$, but there are
not necessarily any massless particles.


Symmetries of the S-matrix 


Scattering theory not only allows one to construct 
the multiparticle scattering states but also shows 
that internal symmetries and spacetime symmetries 
continue to act on these states and are therefore 
symmetries of the S-matrix. We can, however, ask 
what are all the symmetries of the S-matrix. An 
answer was provided by Coleman and Mandula, 
who showed that, when there is nontrivial scatter- 
ing, there are no further symmetries of the S-matrix. 


Appendix 


In an effort to make this article more self-contained,
this appendix collects together a few simple pertinent
concepts and results from the theory of
operator algebras. A C*-algebra is a $*$-algebra $\mathfrak{A}$
with a norm $\|\cdot\|$ making it into a Banach algebra
and satisfying


$$\|A^*A\| = \|A\|^2$$


for every $A \in \mathfrak{A}$. Any C*-algebra can be realized as a
norm closed $*$-subalgebra of the C*-algebra $B(H)$ of
all bounded operators on a Hilbert space $H$. A von
Neumann algebra $\mathcal{R}$ is a C*-algebra that is the dual
space of a Banach space. This Banach space $\mathcal{R}_*$, the
predual of $\mathcal{R}$, is intrinsically defined. The topology
on $\mathcal{R}$ determined by duality with $\mathcal{R}_*$ is called the
$\sigma$-topology. $B(H)$ is a von Neumann algebra and its
predual is the set of trace class operators. Any
von Neumann algebra can be realized as a $\sigma$-closed
unital $*$-subalgebra of some $B(H)$.

A state on a C*-algebra $\mathfrak{A}$ is a positive linear
functional $\omega$ of norm 1. If $\mathfrak{A}$ has a unit $I$ the
normalization condition can be expressed as
$\omega(I) = 1$. Of fundamental importance is the relation
between representations and states. A representation
of $\mathfrak{A}$ on a Hilbert space $H$ is just a structure-preserving
mapping or morphism of $\mathfrak{A}$ into $B(H)$.
For simplicity, we suppose that $\mathfrak{A}$ has a unit. Given
a state $\omega$, there is an associated representation $\pi_\omega$
defined by a vector $\Omega$ such that $\pi_\omega(\mathfrak{A})\Omega$ is dense in
the Hilbert space in question, that is, it is a cyclic
vector for the representation and


$$\omega(A) = (\Omega, \pi_\omega(A)\Omega), \quad A \in \mathfrak{A}$$

that is, the cyclic vector implements the given state.
This is referred to as the GNS construction. Given
any two such representations, there is a unique
unitary operator mapping the one cyclic vector onto
the other and realizing the equivalence of the
representations.

A state of a von Neumann algebra is said to be
normal if it is continuous in the $\sigma$-topology. If $\omega$ is
normal, then $\pi_\omega(\mathcal{R})$ is $\sigma$-closed.

An inclusion of unital von Neumann algebras has
the split property if there is an intermediate type I
factor, that is, if it has the form $\mathcal{R}_1 \subset B(H) \subset \mathcal{R}_2$.

The following elementary observation is often
used in treating symmetries. If $\alpha$ is an automorphism
of $\mathfrak{A}$ with $\omega\alpha^{-1} = \omega$, there is a unique unitary
operator $U$ leaving the cyclic vector $\Omega$ invariant and
inducing $\alpha$ in the representation $\pi_\omega$. In other words,


$$U\Omega = \Omega \quad\text{and}\quad U\pi_\omega(A)U^{-1} = \pi_\omega(\alpha A)$$


If we apply the above lemma to a group $G$ of
symmetries leaving a state invariant, it yields a
group $U(g)$ of unitaries satisfying the condition


$$U(gh) = U(g)U(h), \quad g, h \in G$$


since $U(g)$ is uniquely defined by the above
conditions.

When there is no invariant state, the situation is
more complicated. Suppose there is a group $G$ of
symmetries and a representation $\pi$ of $\mathfrak{A}$ where each
$g$ is unitarily implemented. Thus, there is a unitary
$U(g)$ with

$$U(g)\pi(A)U(g)^{-1} = \pi(gA), \quad A \in \mathfrak{A}$$

All we can now conclude is that

$$U(gh) = Z(g,h)\,U(g)U(h)$$

where $Z(g,h)$ is a unitary in $\pi'$, the commutant of
$\pi$, satisfying the 2-cocycle identity


$$Z(gh,k)\,Z(g,h) = Z(g,hk)\;{}^{g}Z(h,k)$$


where ${}^{g}X = U(g)XU(g)^{-1}$. $U$ is said to be a
representation up to a factor. It can be chosen to be a
representation if the cocycle $Z$ is a coboundary, that
is, if there is a unitary $Y(g)$ in $\pi'$ such that


$$Y(g)\;{}^{g}Y(h) = Y(gh)\,Z(g,h)$$


In general, little is known about solving problems
of this kind, but there are a number of results when
$\pi$ is irreducible and the unitary group of its
commutant reduces to the circle.
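
The 2-cocycle identity follows from associativity of the group law alone. As a short check, written with the conventions $U(gh) = Z(g,h)U(g)U(h)$ and ${}^{g}X = U(g)XU(g)^{-1}$ of this appendix, one can expand $U(ghk)$ in two ways:

```latex
\begin{align*}
U((gh)k) &= Z(gh,k)\,U(gh)\,U(k)
          = Z(gh,k)\,Z(g,h)\,U(g)U(h)U(k),\\
U(g(hk)) &= Z(g,hk)\,U(g)\,U(hk)
          = Z(g,hk)\,\bigl(U(g)Z(h,k)U(g)^{-1}\bigr)\,U(g)U(h)U(k)\\
         &= Z(g,hk)\;{}^{g}Z(h,k)\,U(g)U(h)U(k).
\end{align*}
```

Comparing the two lines and cancelling the unitary $U(g)U(h)U(k)$ gives $Z(gh,k)Z(g,h) = Z(g,hk)\,{}^{g}Z(h,k)$. In the same way, if $Y$ satisfies the coboundary condition, then $V(g) = Y(g)U(g)$ obeys $V(g)V(h) = Y(g)\,{}^{g}Y(h)\,U(g)U(h) = Y(gh)Z(g,h)U(g)U(h) = V(gh)$, so $V$ is a genuine representation that still implements the symmetries.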

We turn now to consider the modular theory of
von Neumann algebras. A vector $\Omega$ is said to be
separating for a von Neumann algebra $\mathcal{R}$ if $A\Omega = 0$
and $A \in \mathcal{R}$ implies $A = 0$. If $\Omega$ is both cyclic and
separating, there is a uniquely determined closed
antilinear involution $S$ with $SA\Omega = A^*\Omega$ for $A \in \mathcal{R}$.
If $S = J\Delta^{1/2}$ is the polar decomposition of $S$, then the
unitary operators $\Delta^{it}$ induce automorphisms $\sigma_t$ of $\mathcal{R}$
and $J\mathcal{R}J = \mathcal{R}'$. $J$ is called the modular conjugation, $\Delta$
the modular operator, and $\sigma_t$ the modular automorphisms.
The closure of $\{\Delta^{1/4}A\Omega : A \in \mathcal{R}, A \geq 0\}$
is a cone, called the natural cone. Every normal state
of $\mathcal{R}$ is implemented by a unique vector in the
natural cone. If $\alpha$ is an automorphism of $\mathcal{R}$, there is
therefore a unique vector $\Omega_\alpha$ in the natural cone
such that, for every $A \in \mathcal{R}$,


$$(\Omega, \alpha^{-1}(A)\Omega) = (\Omega_\alpha, A\Omega_\alpha)$$


There is now a canonical unitary operator $V_\alpha$
defined by

$$V_\alpha A\Omega = \alpha(A)\Omega_\alpha$$


$V_\alpha$ maps the natural cone into itself and $\alpha \mapsto V_\alpha$ is an
implementing representation of the group of
automorphisms of $\mathcal{R}$. Under these circumstances, we do
not have to deal with representations up to a factor.


See also: Algebraic Approach to Quantum Field Theory;
Quantum Field Theory of Lower Spacetime Dimensions;
Two-Dimensional Models.


Introduction


The same symmetries may underlie diverse contexts 
such as phase transitions of crystals (Landau 
theory), fluid dynamics, and problems in biology 
and chemical engineering. Hence, seemingly unre- 
lated systems may exhibit similar phenomena in 
regard to symmetries of patterns and transitions 
between patterns (spontaneous symmetry breaking). 
It is natural to focus attention on aspects of pattern 
formation that are universal or model independent — 
aspects depending on underlying symmetries rather 
than model-specific details. 

The general framework is that the underlying
system is governed by an evolution equation


$$\dot{x} = f(x) \qquad [1]$$


with symmetry group $\Gamma$. To avoid technicalities, we
assume that [1] is an ordinary differential equation
(ODE), the vector field $f : \mathbb{R}^n \to \mathbb{R}^n$ is as smooth as
desired, and $\Gamma$ is a compact Lie group acting linearly
on $\mathbb{R}^n$. An inner product may be chosen so that
$\Gamma$ acts orthogonally. The vector field in [1] is
$\Gamma$-equivariant if

$$f(\gamma x) = \gamma f(x) \quad \text{for all } x \in \mathbb{R}^n,\ \gamma \in \Gamma \qquad [2]$$

Equivalently, if $x(t)$ is a solution and $\gamma \in \Gamma$, then
$\gamma x(t)$ is a solution.
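
The equivalence between condition [2] and the statement that group elements map solutions to solutions can be illustrated on the smallest possible example. The sketch below assumes a hypothetical vector field $f(x) = \lambda x - x^3$ on $\mathbb{R}$ with $\Gamma = \mathbb{Z}_2$ acting by $x \mapsto -x$, and uses an explicit Euler scheme as a stand-in for the exact flow; the function names and parameter values are illustrative only.

```python
def f(x, lam=1.0):
    """Hypothetical Z_2-equivariant vector field on R (an odd function)."""
    return lam * x - x ** 3

GROUP = (1.0, -1.0)   # Z_2 = {1, -1} acting on R by multiplication

def is_equivariant(points, tol=1e-12):
    """Check f(gamma x) = gamma f(x) at the given sample points."""
    return all(abs(f(g * x) - g * f(x)) <= tol for g in GROUP for x in points)

def euler_orbit(x0, steps=500, h=0.01):
    """Explicit Euler approximation of the solution of dx/dt = f(x)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + h * f(xs[-1]))
    return xs

orbit = euler_orbit(0.3)        # approximate solution through x0 = 0.3
mirrored = euler_orbit(-0.3)    # approximate solution through gamma x0

# If gamma x(t) is again a solution, the two orbits are mirror images.
max_gap = max(abs(a + b) for a, b in zip(orbit, mirrored))
print(is_equivariant([-2.0, -0.5, 0.0, 0.7, 1.3]), max_gap)
```

The Euler map inherits the equivariance of $f$, which is why the mirrored orbit tracks the reflected one step by step in this sketch.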

In this article, we are interested in the dynamics to 
be expected for equivariant vector fields, and 
transitions that arise as parameters are varied. The
symmetry group $\Gamma$ is taken as given, whereas $f$ is a
general $\Gamma$-equivariant vector field. (Other features
such as energy conservation or time reversibility 
must be built into the general setup, but are 
excluded in this article.) 


Isotropy Subgroups and Commuting Linear Maps


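Let $\Gamma$ be a compact Lie group acting linearly on $\mathbb{R}^n$.
The isotropy subgroup of $x \in \mathbb{R}^n$ is defined to be


$$\Sigma_x = \{\gamma \in \Gamma : \gamma x = x\}$$


Note that $\Sigma_{\gamma x} = \gamma\Sigma_x\gamma^{-1}$ for all $x \in \mathbb{R}^n$, $\gamma \in \Gamma$.


Given an isotropy subgroup $\Sigma \subset \Gamma$, define the
fixed-point subspace


$$\operatorname{Fix}\Sigma = \{y \in \mathbb{R}^n : \sigma y = y \text{ for all } \sigma \in \Sigma\}$$


If $f : \mathbb{R}^n \to \mathbb{R}^n$ is a $\Gamma$-equivariant vector field, then
$f(\operatorname{Fix}\Sigma) \subset \operatorname{Fix}\Sigma$ for each isotropy subgroup $\Sigma$.
Hence $\operatorname{Fix}\Sigma$ is flow invariant.

The normalizer $N(\Sigma) = \{\gamma \in \Gamma : \gamma\Sigma\gamma^{-1} = \Sigma\}$ is the
largest subgroup of $\Gamma$ that acts on $\operatorname{Fix}\Sigma$, and
$f_\Sigma = f|_{\operatorname{Fix}\Sigma}$ is $(N(\Sigma)/\Sigma)$-equivariant.

An isotropy subgroup $\Sigma$ is axial if $\dim\operatorname{Fix}\Sigma = 1$,
and then $N(\Sigma)/\Sigma \cong \mathbb{Z}_2$ or $1$. More generally, $\Sigma$ is
maximal if there are no isotropy subgroups $T$ with
$\Sigma \subset T \subset \Gamma$ other than $T = \Sigma$ and $T = \Gamma$. Then
$N(\Sigma)/\Sigma$ acts fixed-point freely on $\operatorname{Fix}\Sigma$ and the
connected component of the identity $(N(\Sigma)/\Sigma)^0 \cong 1$,
SO(2) or SU(2). Correspondingly $\Sigma$ is called real,
complex, or quaternionic. In the complex case
$\dim\operatorname{Fix}\Sigma$ is even; in the quaternionic case
$\dim\operatorname{Fix}\Sigma \equiv 0 \bmod 4$.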

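The dihedral group $\Gamma = D_m$ of order $2m$ is the
symmetry group of the regular $m$-gon, $m \geq 3$. Its
standard action on $\mathbb{R}^2$ is generated by


$$\rho = \begin{pmatrix} \cos 2\pi/m & -\sin 2\pi/m \\ \sin 2\pi/m & \cos 2\pi/m \end{pmatrix}, \qquad \kappa = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$


For $m$ even, the isotropy subgroups up to conjugacy
are


$$D_m, \quad Z_2(\kappa), \quad Z_2(\rho\kappa), \quad 1$$


where $Z_j(g)$ denotes the cyclic group of order $j$
generated by $g$. The maximal isotropy subgroups
$\Sigma = Z_2(\kappa), Z_2(\rho\kappa)$ are axial with $N(\Sigma)/\Sigma \cong \mathbb{Z}_2$. For
$m$ odd, $Z_2(\rho\kappa)$ is conjugate to $Z_2(\kappa)$, leaving three
conjugacy classes of isotropy subgroups, and
$\Sigma = Z_2(\kappa)$ is axial with $N(\Sigma)/\Sigma = 1$.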
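
The list of isotropy subgroups of $D_m$ can be reproduced by brute force. The following minimal sketch identifies $\mathbb{R}^2$ with $\mathbb{C}$, so that $\rho$ becomes multiplication by $e^{2\pi i/m}$ and $\kappa$ becomes complex conjugation; the encoding of group elements as pairs `(k, flip)`, the tolerance, and the sample points are assumptions made for this illustration.

```python
import cmath

def dihedral_elements(m):
    """Elements of D_m as pairs (k, flip), meaning rho^k or rho^k kappa."""
    return [(k, flip) for k in range(m) for flip in (False, True)]

def act(g, z, m):
    """Action on R^2 = C: rho rotates by 2 pi / m, kappa conjugates."""
    k, flip = g
    rho_k = cmath.exp(2j * cmath.pi * k / m)
    return rho_k * (z.conjugate() if flip else z)

def isotropy(z, m, tol=1e-9):
    """Isotropy subgroup of the point z, up to the numerical tolerance tol."""
    return [g for g in dihedral_elements(m) if abs(act(g, z, m) - z) < tol]

m = 4
on_kappa_axis = 1 + 0j                              # fixed by kappa
on_rho_kappa_axis = cmath.exp(1j * cmath.pi / m)    # fixed by rho kappa
generic = 1 + 0.3j                                  # trivial isotropy

for z in (0j, on_kappa_axis, on_rho_kappa_axis, generic):
    print(z, isotropy(z, m))
```

For $m = 4$ the four sample points realize the four conjugacy classes $D_m$, $Z_2(\kappa)$, $Z_2(\rho\kappa)$ and $1$ listed for even $m$.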

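The space of commuting linear maps


$$\operatorname{Hom}_\Gamma(\mathbb{R}^n) = \{L : \mathbb{R}^n \to \mathbb{R}^n \text{ linear} : L\gamma = \gamma L \text{ for all } \gamma \in \Gamma\}$$


is completely described representation-theoretically.
Recall that $\Gamma$ acts irreducibly on $\mathbb{R}^n$ if the only
$\Gamma$-invariant subspaces of $\mathbb{R}^n$ are $\mathbb{R}^n$ and $\{0\}$. Then
$\operatorname{Hom}_\Gamma(\mathbb{R}^n)$ is a real division ring (skew field) $\mathcal{D} \cong \mathbb{R}$,
$\mathbb{C}$ or $\mathbb{H}$. The representation is called absolutely
irreducible when $\mathcal{D} = \mathbb{R}$ and nonabsolutely irreducible
when $\mathcal{D} = \mathbb{C}$ or $\mathbb{H}$.

If the action of $\Gamma$ is not irreducible, write
$\mathbb{R}^n = V_1 \oplus \cdots \oplus V_r$ (nonuniquely) as a sum of irreducible
subspaces. Summing together irreducible subspaces
that are isomorphic to form isotypic components $W_j$
gives the (unique) isotypic decomposition
$\mathbb{R}^n = W_1 \oplus \cdots \oplus W_\ell$. If $L \in \operatorname{Hom}_\Gamma(\mathbb{R}^n)$, then
$L(W_j) \subset W_j$ for each $j$, hence
$\operatorname{Hom}_\Gamma(\mathbb{R}^n) = \operatorname{Hom}_\Gamma(W_1) \oplus \cdots \oplus \operatorname{Hom}_\Gamma(W_\ell)$.
Each $W_j$ consists of
$k_j$ isomorphic copies of an irreducible representation
with division ring $\mathcal{D}_j$. Let $M_k(\mathcal{D})$ denote the space of
$k \times k$ matrices with entries in $\mathcal{D}$. Then


$$\operatorname{Hom}_\Gamma(\mathbb{R}^n) = M_{k_1}(\mathcal{D}_1) \oplus \cdots \oplus M_{k_\ell}(\mathcal{D}_\ell) \qquad [3]$$


Spectral properties of commuting linear maps can be
recovered from the decomposition [3], paying due
attention to multiplicity and complex conjugates of
eigenvalues.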
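
As a small worked example of these definitions, the commuting maps for the standard action of $D_m$ on $\mathbb{R}^2$ can be computed by hand. The calculation below assumes the generators $\rho$ (rotation by $2\pi/m$) and $\kappa = \operatorname{diag}(1,-1)$ with $m \geq 3$, and writes $J$ for the rotation by $\pi/2$; the entries $a, b, c, d$ are generic real numbers introduced only for this illustration.

```latex
\begin{align*}
L = \begin{pmatrix} a & b \\ c & d \end{pmatrix},\quad L\kappa = \kappa L
  &\;\Longrightarrow\; b = c = 0,\\
L = \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix},\quad L\rho = \rho L
  &\;\Longrightarrow\; (a - d)\sin(2\pi/m) = 0
   \;\Longrightarrow\; a = d .
\end{align*}
```

Hence $\operatorname{Hom}_{D_m}(\mathbb{R}^2) = \{aI\} \cong \mathbb{R}$ and the action is absolutely irreducible. If only the rotation subgroup generated by $\rho$ is kept, the single condition $L\rho = \rho L$ gives $L = aI + cJ$, so the commuting maps form a copy of $\mathbb{C}$ and the action is nonabsolutely irreducible.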


Equivariant Dynamics 


The dynamics of equivariant systems includes 
(relative) equilibria and periodic solutions, robust 
heteroclinic cycles/networks, and symmetric chaotic 
attractors. 


Equilibria 


Consider the ODE [1] with $\Gamma$-equivariant vector
field $f$ satisfying [2]. If $x(t) \equiv x_0$ is an equilibrium,
$f(x_0) = 0$, then there is a group orbit $\Gamma x_0$ of
equilibria.

Let $\Sigma = \Sigma_{x_0}$ be the isotropy subgroup of $x_0$. If
$\dim\Sigma = \dim\Gamma$, then generically (for an open dense
set of $\Gamma$-equivariant vector fields), the eigenvalues of
$(df)_{x_0}$ have nonzero real part, hence $x_0$ is hyperbolic.
If the eigenvalues all have negative real part, then $x_0$
is asymptotically stable. If at least one eigenvalue
has positive real part, then $x_0$ is unstable. Hyperbolic
equilibria are isolated and persist under
perturbations of $f$; the perturbed equilibria continue
to have isotropy $\Sigma$. Since $(df)_{x_0} \in \operatorname{Hom}_\Sigma(\mathbb{R}^n)$,
decomposition [3] for the action of $\Sigma$ on $\mathbb{R}^n$
facilitates stability computations for $x_0$.

If $\dim\Sigma < \dim\Gamma$, then $\Gamma x_0$ is a continuous group
orbit of equilibria. Generically, $\dim\ker(df)_{x_0} =
\dim\Gamma - \dim\Sigma$ and $\ker(df)_{x_0} = \{\xi x_0 : \xi \in L\Gamma\}$, where
$L\Gamma$ is the Lie algebra of $\Gamma$. The remaining
$k = n - \dim\Gamma + \dim\Sigma$ eigenvalues generically have nonzero
real part so $\Gamma x_0$ is normally hyperbolic. If all $k$
eigenvalues have negative real part, then $\Gamma x_0$ is
asymptotically stable. If at least one has positive real
part, then $\Gamma x_0$ is unstable. When $N(\Sigma)/\Sigma$ is finite,
generically $x_0$ is an isolated equilibrium in $\operatorname{Fix}\Sigma$ and
persists as an equilibrium with isotropy $\Sigma$ under
perturbation.


Relative Equilibria and Skew Products 


A point $x_0 \in \mathbb{R}^n$ (or the corresponding group orbit
$\Gamma x_0$) is a relative equilibrium if $f(x_0) \in T_{x_0}\Gamma x_0 =
L\Gamma x_0$. If $x_0$ has isotropy $\Sigma$, then $x_0$ is a relative
equilibrium if $f(x_0) \in LD_\Sigma x_0$, where $D_\Sigma = (N(\Sigma)/\Sigma)^0$.

Write $f(x_0) = \xi x_0$, where $\xi \in LD_\Sigma$. The closure of
the one-parameter subgroup $\exp(t\xi)$ is a maximal
torus in $D_\Sigma$ for almost every $\xi$. All maximal tori are
conjugate with common dimension $d = \operatorname{rank} D_\Sigma$.
The solution $x(t) = \exp(t\xi)x_0$ is typically a
$d$-dimensional quasiperiodic motion. “Typically”
holds in both the topological and probabilistic sense
and there is no phase-locking. When $d = 1$, $x(t)$ is
periodic, often called a rotating wave.
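
A rotating wave can be exhibited explicitly in the simplest case $d = 1$. The sketch below assumes a hypothetical SO(2)-equivariant vector field $f(z) = (\lambda + i\omega - |z|^2)z$ on $\mathbb{R}^2 = \mathbb{C}$, with illustrative parameter values; the point $z_0 = \sqrt{\lambda}$ then satisfies $f(z_0) = \xi z_0$ with $\xi = i\omega$ in the Lie algebra of the rotation group, and the code checks that $x(t) = \exp(t\xi)z_0$ solves the equation.

```python
import cmath
import math

LAM, OMEGA = 0.25, 2.0        # hypothetical parameter values

def f(z):
    """Hypothetical SO(2)-equivariant vector field on R^2 = C."""
    return (LAM + 1j * OMEGA - abs(z) ** 2) * z

def rotate(theta, z):
    """Action of the rotation by angle theta in SO(2)."""
    return cmath.exp(1j * theta) * z

z0 = complex(math.sqrt(LAM), 0.0)   # |z0|^2 = LAM, so f(z0) = i OMEGA z0

def rotating_wave(t):
    """Candidate solution x(t) = exp(t xi) z0 with xi = i OMEGA."""
    return rotate(OMEGA * t, z0)

def residual(t):
    """|dx/dt - f(x(t))|, using dx/dt = i OMEGA x(t) for the rotating wave."""
    x = rotating_wave(t)
    return abs(1j * OMEGA * x - f(x))

print(f(z0), 1j * OMEGA * z0)                        # f(z0) is tangent to the orbit
print(max(residual(0.1 * k) for k in range(100)))    # solution check
```

Here the isotropy of $z_0$ is trivial, so the group orbit is a circle and the relative equilibrium is a periodic solution with period $2\pi/\omega$.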

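Choose a $\Sigma$-invariant local cross section $X$ to the
group orbit $\Gamma x_0$ at $x_0$. There is a $\Gamma$-invariant
neighborhood of $\Gamma x_0$ that is $\Gamma$-equivariantly
diffeomorphic to $(\Gamma \times X)/\Sigma$, where $\Sigma$ acts freely on $\Gamma \times X$
by


$$\sigma \cdot (\gamma, x) = (\gamma\sigma^{-1}, \sigma x)$$


and $\Gamma$ acts by left multiplication on the first
factor. The $\Gamma$-equivariant ODE on $(\Gamma \times X)/\Sigma$ lifts
to a $(\Gamma \times \Sigma)$-equivariant skew product on $\Gamma \times X$


$$\dot{\gamma} = \gamma\xi(x), \qquad \dot{x} = h(x) \qquad [4]$$


where $\xi : X \to L\Gamma$, $h : X \to X$ satisfy the $\Sigma$-equivariance
conditions


$$\xi(\sigma x) = \mathrm{Ad}_\sigma\,\xi(x) = \sigma\xi(x)\sigma^{-1}, \qquad h(\sigma x) = \sigma h(x)$$


and $h(x_0) = 0$.

Thus, dynamics near the relative equilibrium
$\Gamma x_0 \subset \mathbb{R}^n$ reduces to dynamics near the ordinary
equilibrium $x_0 \in X$ for the $\Sigma$-equivariant vector field
$h : X \to X$, coupled with $\Gamma$ drifts. In particular, the
stability of $\Gamma x_0$ is determined by $(dh)_{x_0}$.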


Periodic Solutions 


A nonequilibrium solution $x(t)$ is periodic if $x(t + T) =
x(t)$ for some $T > 0$. The least such $T$ is the (absolute)
period. The spatial symmetry group $\Delta$ is the isotropy
subgroup of $x(t)$ for some, and hence all, $t \in \mathbb{R}$. The
periodic solution $P = \{x(t) : 0 \leq t < T\}$ lies inside
$\operatorname{Fix}\Delta$. Define the spatiotemporal symmetry group
$\Sigma = \{\gamma \in \Gamma : \gamma P = P\}$. Note that $\Delta$ is a normal subgroup
of $\Sigma$ and either $\Sigma/\Delta \cong S^1$ ($P$ is a rotating wave) or
$\Sigma/\Delta \cong \mathbb{Z}_q$ and $P$ is called a standing wave or a discrete
rotating wave. For each $\sigma \in \Sigma$, there exists $T_\sigma \in [0, T)$
such that $\sigma x(t) = x(t + T_\sigma)$. The relative period of $x(t)$
is the least $T > 0$ such that $x(T) \in \Gamma x_0$, where $x_0 = x(0)$.

If $\dim\Sigma = \dim\Gamma$, then generically $P$ is hyperbolic,
hence isolated, the stability of $P$ is determined by its
Floquet exponents, and $P$ persists under perturbation
as a periodic solution with spatial symmetry $\Delta$
and spatiotemporal symmetry $\Sigma$. For $\Gamma$ infinite and
$N(\Delta)/\Delta$ finite, generically $P$ is isolated in $\operatorname{Fix}\Delta$ and
the neutral Floquet exponent has multiplicity
$\dim\Gamma - \dim\Sigma + 1$.


Relative Periodic Solutions 


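A solution $x(t)$ is a relative periodic solution if it is
not a relative equilibrium and $x(T) \in \Gamma x(0)$ for some
$T > 0$. The least such $T$ is the relative period. The
spatial symmetry group is $\Delta = \Sigma_{x(t)}$ for some, hence
all, $t$. The spatiotemporal symmetry group $\Sigma$ is the
closed subgroup of $\Gamma$ generated by $\Delta$ and $\sigma$, where
$x(T) = \sigma x(0)$, and generically $\Sigma/\Delta \cong T^d \times \mathbb{Z}_q$ is a
maximal topologically cyclic (Cartan) subgroup of
$N(\Delta)/\Delta$ containing $\sigma\Delta$. Then $x(t)$ is a
$(d+1)$-dimensional quasiperiodic motion.

The dynamics near the relative periodic solution
is again governed by a skew product. There exists
$n \geq 1$ such that $\sigma^n = \exp(n\xi)$, where $\xi \in LZ(\Sigma)$
and $Z(\Sigma) \subset \Gamma$ is the centralizer of $\Sigma$. Define
$\alpha = \exp(-\xi)\sigma$. Form a semidirect product $\Delta \rtimes \mathbb{Z}_{2n}$
by adjoining to $\Delta$ an element $Q$ of order $2n$ such
that $Q\delta Q^{-1} = \sigma\delta\sigma^{-1}$ for $\delta \in \Delta$.

In a comoving frame with velocity $\xi$, a neighborhood
of the relative periodic orbit is $\Gamma$-equivariantly
diffeomorphic to $(\Gamma \times X \times S^1)/(\Delta \rtimes \mathbb{Z}_{2n})$, where $X$ is
a $\Delta \rtimes \mathbb{Z}_{2n}$-invariant cross section, $S^1 = \mathbb{R}/2n\mathbb{Z}$ and
$\Delta \rtimes \mathbb{Z}_{2n}$ acts on $\Gamma \times X \times S^1$ as


$$\delta \cdot (\gamma, x, \theta) = (\gamma\delta^{-1}, \delta x, \theta)$$
$$Q \cdot (\gamma, x, \theta) = (\gamma\alpha^{-1}, Qx, \theta + 1)$$


The $\Gamma$-equivariant ODE on $(\Gamma \times X \times S^1)/(\Delta \rtimes \mathbb{Z}_{2n})$
lifts to a $\Gamma \times (\Delta \rtimes \mathbb{Z}_{2n})$-equivariant skew product


$$\dot{\gamma} = \gamma\xi(x, \theta), \qquad \dot{x} = h(x, \theta), \qquad \dot{\theta} = 1 \qquad [5]$$


where $\xi : X \times S^1 \to L\Gamma$, $h : X \times S^1 \to X$ satisfy
appropriate $\Delta \rtimes \mathbb{Z}_{2n}$-equivariance conditions.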


Robust Heteroclinic Cycles 


Heteroclinic cycles, degenerate in systems without
symmetry, arise robustly in equivariant systems. Let
$x_1, \ldots, x_m \in \mathbb{R}^n$ be saddles with $W^u(x_j) \setminus \{x_j\} \subset
W^s(x_{j+1})$ (where $m + 1 = 1$). If $\Sigma_1, \ldots, \Sigma_m \subset \Gamma$ are
isotropy subgroups, $W^u(x_j) \subset \operatorname{Fix}\Sigma_j$, and $x_{j+1}$ is a
sink in $\operatorname{Fix}\Sigma_j$, then saddle-sink connections from $x_j$
to $x_{j+1}$ persist for nearby $\Gamma$-equivariant flows. The
union $\bigcup_{j=1}^{m} \Gamma W^u(x_j)$ forms a robust heteroclinic cycle
(see the subsection “Dynamics” for an example). Such
cycles, when asymptotically stable, are a mechanism
for intermittency or bursting, notably in rotating
Rayleigh-Bénard convection (where rolls disappear
and reorient themselves at approximately 60°), and
provide a possible intrinsic explanation for irregular
reversals of the Earth’s magnetic field.

Asymmetric perturbations (deterministic or noisy) 
destroy the cycles, but the perturbed attractors 
inherit the bursting behavior. 

Establishing the existence of heteroclinic connections
is often straightforward when $\dim\operatorname{Fix}\Sigma_j = 2$
and nontrivial when $\dim\operatorname{Fix}\Sigma_j \geq 3$. Criteria for
asymptotic stability of heteroclinic cycles are given
in terms of real parts of eigenvalues of $(df)_{x_j}$ and
depend on the geometry of the representation of $\Gamma$.

Robust cycles exist also between more complicated
dynamical states such as periodic solutions or chaotic
sets (cycling chaos). When $W^u(x_j)$ connects to two or
more distinct states, the collection of unstable
manifolds forms a heteroclinic network leading to
competition between various subnetworks.


Symmetric Attractors 


Suppose that $\Gamma$ is a finite group acting linearly on $\mathbb{R}^n$.
A closed subset $A \subset \mathbb{R}^n$ has symmetry groups
$\Delta = \{\gamma \in \Gamma : \gamma x = x \text{ for all } x \in A\}$, $\Sigma = \{\gamma \in \Gamma : \gamma A = A\}$.
Here, $\Delta$ is an isotropy subgroup and $\Delta \subset \Sigma \subset N(\Delta)$.
In applications, $\Delta$ corresponds to instantaneous
symmetry and $\Sigma$ to symmetry on average.

If $A$ is an attractor (a Lyapunov stable $\omega$-limit set)
for a $\Gamma$-equivariant vector field $f : \mathbb{R}^n \to \mathbb{R}^n$, then $\Sigma$
fixes a connected component of $\operatorname{Fix}\Delta \setminus L$, where $L$
is the union of proper fixed-point spaces in $\operatorname{Fix}\Delta$.

Provided $\dim\operatorname{Fix}\Delta \geq 3$, all pairs $\Delta, \Sigma$ satisfying
the above restrictions arise as symmetry groups of a
nonperiodic attractor $A$. If $\dim\operatorname{Fix}\Delta \geq 5$, then $A$ is
realized by a uniformly hyperbolic (Axiom A)
attractor.

If $\dim\operatorname{Fix}\Delta \geq 3$ and $\Sigma$ fixes a connected component
of $\operatorname{Fix}\Delta \setminus L$, then $A$ is realized by a periodic
sink provided $\Sigma/\Delta$ is cyclic. If $\dim\operatorname{Fix}\Delta = 2$, then in
addition either $\Sigma = \Delta$ or $\Sigma = N(\Delta)$.

Suppose $A$ is an attractor and $\gamma \in \Gamma \setminus \Sigma$. Then
$\gamma A \cap A = \emptyset$. Varying a parameter, $A$ may undergo a
symmetry-increasing bifurcation: $A$ grows until it
collides with $\gamma A$ producing a larger attractor with
symmetry on average generated by $\Sigma$ and $\gamma$.

Determining symmetries of an attractor by inspection
is often infeasible. A detective is a Γ-equivariant
polynomial φ : R^n → V where every subgroup of Γ is
an isotropy subgroup for the action on V, and each
component of φ is nonzero. Suppose that A ⊂ R^n is
an attractor with physical (Sinai-Ruelle-Bowen)
measure μ. By ergodicity, the time average

\[ \phi_A = \lim_{T\to\infty} \frac{1}{T} \int_0^T \phi(x(t))\,dt \in V \]

is well defined for almost every trajectory x(t) in
supp μ. Generically, Σ_{φ_A} = Σ_A, so computing the
symmetry of A reduces to computing the symmetry
of a point.
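
The time-average idea can be tried numerically. The following is a minimal sketch, not the article's method: it assumes Γ = Z_2 acting on R^2 by −I, takes V = R^2 with the same action and the identity map as detective, and uses two hypothetical equivariant vector fields (`limit_cycle`, `two_sinks`). The integrator, the tolerance 0.05, and all function names are illustrative assumptions.

```python
def rk4_step(f, x, dt):
    """One classical Runge-Kutta step for x' = f(x), with x a tuple of floats."""
    k1 = f(x)
    k2 = f(tuple(xi + 0.5 * dt * ki for xi, ki in zip(x, k1)))
    k3 = f(tuple(xi + 0.5 * dt * ki for xi, ki in zip(x, k2)))
    k4 = f(tuple(xi + dt * ki for xi, ki in zip(x, k3)))
    return tuple(xi + dt * (a + 2 * b + 2 * c + d) / 6.0
                 for xi, a, b, c, d in zip(x, k1, k2, k3, k4))


def time_average(f, phi, x0, dt=0.01, transient=50.0, horizon=200.0):
    """Approximate (1/T) * integral of phi(x(t)) dt after discarding a transient."""
    x = x0
    for _ in range(int(transient / dt)):
        x = rk4_step(f, x, dt)
    n = int(horizon / dt)
    total = [0.0] * len(phi(x))
    for _ in range(n):
        x = rk4_step(f, x, dt)
        for i, v in enumerate(phi(x)):
            total[i] += v
    return tuple(s / n for s in total)


# Both fields commute with gamma = -I on R^2 (hypothetical examples).
def limit_cycle(x):
    """Attracting unit circle, which -I maps to itself."""
    r2 = x[0] ** 2 + x[1] ** 2
    return ((1 - r2) * x[0] - x[1], (1 - r2) * x[1] + x[0])


def two_sinks(x):
    """Sinks at (1, 0) and (-1, 0), which -I swaps."""
    return (x[0] - x[0] ** 3, -x[1])


def phi(x):
    """Identity observable; V = R^2 carries the same action of -I."""
    return x


def symmetry_on_average(avg, tol=0.05):
    """-I fixes the averaged point only if that point is (numerically) zero."""
    return "Z2" if max(abs(a) for a in avg) < tol else "trivial"


avg_cycle = time_average(limit_cycle, phi, (0.3, 0.1))
avg_sink = time_average(two_sinks, phi, (0.3, 0.1))
print(symmetry_on_average(avg_cycle), symmetry_on_average(avg_sink))
```

In this toy setting the averaged point is a single vector in V, and its isotropy subgroup is read off by testing which group elements fix it; a numerical tolerance is unavoidable because the average is only approximated over a finite horizon.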

If Γ is an infinite compact Lie group, and A is an
ω-limit set containing points of trivial isotropy, then
A cannot be uniformly hyperbolic. Hence partially
hyperbolic flows arise naturally in systems with
continuous symmetry. Consider the skew product
[4] where Σ = 1 and h : X → X possesses a hyperbolic
basic set Λ ⊂ X with equilibrium measure μ (for a
Hölder potential). Let ν denote Haar measure on Γ.
Then Λ × Γ is partially hyperbolic, and μ × ν is
ergodic (even Bernoulli) for an open dense set of
equivariant flows. Such stably ergodic flows possess
strong statistical properties (rapid decay of correlations,
central-limit theorem), which offers a possible explanation
for hypermeander (Brownian-like motion) of spiral
waves in planar excitable media.


Forced Symmetry Breaking 


In applications, symmetry is not perfect and account 
should be taken of Γ′-equivariant perturbations of
[1] for Γ′ a subgroup of Γ (including Γ′ = 1). This
topic is not discussed in this article, except in the 
subsections “Robust  heteroclinic cycles” and 
“Branching patterns and finite determinacy.” 


Equivariant Bifurcation Theory 


Consider families of ODEs ẋ = f(x, λ), with bifurcation
parameter λ ∈ R and vector field f : R^n × R → R^n
satisfying f(0,0) = 0 and the Γ-equivariance
condition

\[ f(\gamma x, \lambda) = \gamma f(x, \lambda) \quad \text{for all } x \in \mathbb{R}^n,\ \lambda \in \mathbb{R},\ \gamma \in \Gamma. \]


A local bifurcation from the equilibrium x = 0
occurs if (df)_{0,0} is nonhyperbolic. The center subspace
E^c is the sum of generalized eigenspaces
corresponding to eigenvalues on the imaginary
axis, and is Γ-invariant. By center manifold theory,
local dynamics ((x, λ) near (0,0)) are captured by the
center manifold W^c. After center manifold reduction
(or Lyapunov-Schmidt reduction if the focus is on
equilibria), it may be assumed that R^n = E^c.

If (df)_{0,0} possesses zero eigenvalues, then there is
a steady-state bifurcation. Generically, (df)_{0,0} = 0
and E^c is absolutely irreducible. There are two
subcases.

If Γ acts trivially on R^n, then n = 1 and generically
there is a saddle-node (or limit point) bifurcation
where the zero sets of f(x, λ) and ±x² ± λ are
diffeomorphic for (x, λ) near (0,0). Higher-order
degeneracies can be treated using singularity theory.
The equilibria and their stability determine the
local dynamics. All bifurcating equilibria have
isotropy Γ, so there is no symmetry breaking.

From now on, consider the remaining subcase
where Γ acts absolutely irreducibly and nontrivially
on R^n. Then Fix Γ = {0}, f(0, λ) = 0, and (df)_{0,λ} =
c(λ)I, where generically c′(0) ≠ 0. Assume that
c′(0) > 0, so the “trivial solution” x = 0 is asymptotically
stable subcritically (λ < 0) and unstable
supercritically (λ > 0). Bifurcating solutions lie outside
Fix Γ and hence there is spontaneous symmetry
breaking.


Axial Isotropy Subgroups 


The “equivariant branching lemma” guarantees 
branches of equilibria with isotropy Σ for each
axial isotropy subgroup. There are three associated 
branching patterns, see Figure 1. 

If N(Σ)/Σ = Z_2, then f_Σ is odd. Generically,
∂_x^3 f_Σ(0,0) ≠ 0, since (x_1^2 + ··· + x_n^2)x is Γ-equivariant,
and there are two branches of equilibria
bifurcating supercritically or subcritically together,
and lying on the same group orbit. The branches
form a symmetric pitchfork whose direction of
branching is determined by sgn ∂_x^3 f_Σ(0,0).

If N(Σ)/Σ = 1, then generically f_Σ is even. If all
quadratic Γ-equivariant maps vanish on Fix Σ, then
the bifurcation is sub/supercritical depending on
sgn ∂_x^3 f_Σ(0,0) but the branches lie on distinct group
orbits. This is an asymmetric pitchfork.

If ∂_x^2 f_Σ(0,0) ≠ 0, then the equilibria exist
transcritically: for λ < 0 and λ > 0.

The natural actions of D_m on R^2 are absolutely
irreducible. The axial branches are symmetric
pitchforks for m ≥ 4 even, asymmetric pitchforks
for m ≥ 5 odd, and transcritical for m = 3.
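
The three D_m cases can be explored on a truncated normal form. The sketch below is an illustration under stated assumptions: it identifies R^2 with C, lets D_m act by rotation through 2π/m and complex conjugation, and uses the hypothetical truncated field ż = λz + a z̄^(m−1) − |z|²z with the arbitrary coefficient a = 0.5. It restricts to the real axis (the fixed-point space of the conjugation) and locates nonzero equilibria close to the origin.

```python
import cmath
import math


def f(z, lam, m, a=0.5):
    """Truncated D_m-equivariant field on C = R^2 (illustrative coefficients)."""
    return lam * z + a * z.conjugate() ** (m - 1) - abs(z) ** 2 * z


def is_equivariant(m, lam=0.1, z=0.3 + 0.2j, tol=1e-12):
    """Check f(gamma z) = gamma f(z) for the rotation and the reflection."""
    rot = cmath.exp(2j * math.pi / m)
    ok_rot = abs(f(rot * z, lam, m) - rot * f(z, lam, m)) < tol
    ok_ref = abs(f(z.conjugate(), lam, m) - f(z, lam, m).conjugate()) < tol
    return ok_rot and ok_ref


def axial_branch(lam, m, a=0.5, window=0.3, steps=6000):
    """Nonzero real-axis equilibria: roots of lam + a*x**(m-2) - x**2 near 0."""
    def h(x):
        return lam + a * x ** (m - 2) - x * x

    roots = []
    xs = [-window + 2 * window * i / steps for i in range(steps + 1)]
    for lo, hi in zip(xs, xs[1:]):
        if h(lo) * h(hi) < 0:          # sign change brackets a root
            for _ in range(60):        # refine by bisection
                mid = 0.5 * (lo + hi)
                if h(lo) * h(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


for m in (3, 4, 5):
    counts = [len(axial_branch(s * 0.01, m)) for s in (-1, 1)]
    print(m, is_equivariant(m), counts)
```

For this particular truncation, m = 3 gives one nearby equilibrium on each side of λ = 0, while m = 4 and m = 5 give two equilibria on one side only; for m = 4 they are negatives of each other, whereas for m = 5 they are not, which is the distinction between the symmetric and asymmetric pitchfork.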

The actions of D_m, m ≥ 5 odd, provide the
simplest instances of hidden symmetries, where
certain N(Σ)/Σ-equivariant mappings on Fix Σ do
not extend to smooth Γ-equivariant mappings on R^n.


Nonaxial Maximal Isotropy Subgroups 


For Σ a real maximal isotropy subgroup, dim Fix Σ
odd, there exist branches of equilibria with isotropy
Σ. When dim Fix Σ is even, there are examples
where equilibria exist and examples where no
equilibria exist. For Σ complex or quaternionic,
there exist branches of rotating waves with isotropy
Σ. In the quaternionic case, the rotating waves
foliate the SU(2) group orbits according to the Hopf
fibration.

Figure 1 Axial branches: (a) supercritical symmetric pitchfork,
(b) supercritical asymmetric pitchfork, and (c) transcritical
branches.


Submaximal Isotropy Subgroups 


It has been conjectured falsely that steady-state 
bifurcation leads generically to equilibria only with 
maximal isotropy. The simplest counterexample is 
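the 24-element group Γ = Z_3 ⋉ Z_2^3 generated by

\[ \rho = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \qquad \kappa = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix} \]

(Alternatively, Γ = T ⊕ Z_2(−I_3), where T ⊂ SO(3) is
the tetrahedral group.)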

The isotropy subgroup Σ = Z_2(κ) has two-dimensional
fixed-point subspace Fix Σ = {(x, y, 0)}.
The only one-dimensional fixed-point spaces contained
in Fix Σ are the x- and y-axes. The general
Γ-equivariant vector field is

\[ \dot x = g(x^2, y^2, z^2, \lambda)\,x, \qquad \dot y = g(y^2, z^2, x^2, \lambda)\,y, \qquad \dot z = g(z^2, x^2, y^2, \lambda)\,z. \]

After scaling,

\[ g(x^2, y^2, z^2, \lambda) = \lambda - x^2 - a y^2 - b z^2 + o(x^2, y^2, z^2, \lambda) \qquad [6] \]

Restricting to Fix Σ and dividing out the axial
solutions x = 0 and y = 0 yields at lowest order the
equations λ = x² + ay² = y² + bx². Submaximal
solutions exist provided sgn(a − 1) = sgn(b − 1).
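
The sign condition can be read off by solving the lowest-order equations explicitly. The following worked step is a sketch that assumes a ≠ 1, b ≠ 1 and ab ≠ 1, and ignores the higher-order terms in [6]; the symbols X and Y are introduced here only for the calculation.

```latex
% Put X = x^2, Y = y^2. The equations \lambda = X + aY = Y + bX are linear:
\[
  \begin{pmatrix} 1 & a \\ b & 1 \end{pmatrix}
  \begin{pmatrix} X \\ Y \end{pmatrix}
  = \begin{pmatrix} \lambda \\ \lambda \end{pmatrix}
  \quad\Longrightarrow\quad
  X = \frac{(1-a)\,\lambda}{1-ab}, \qquad
  Y = \frac{(1-b)\,\lambda}{1-ab}.
\]
% A genuine equilibrium with x \neq 0 and y \neq 0 needs X > 0 and Y > 0:
\[
  \frac{X}{Y} = \frac{1-a}{1-b} > 0
  \iff \operatorname{sgn}(a-1) = \operatorname{sgn}(b-1).
\]
```

When the condition holds, the branch lies on the side of λ = 0 for which λ(1 − a)/(1 − ab) > 0.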

In general, the existence of equilibria with 
submaximal isotropy must be treated on a case- 
by-case basis (for each absolutely irreducible representation
of Γ and isotropy subgroup Σ).


Asymptotic Stability 


Subcritical and axial transcritical branches are 
automatically unstable. Moreover, the existence of 
a quadratic Γ-equivariant mapping q : R^n → R^n and
x ∈ Fix Σ such that (dq)_x has eigenvalues with
nonzero real part guarantees that branches of
equilibria with axial isotropy Σ are generically
unstable (even when q|_{Fix Σ} = 0).

There are no general results for asymptotic 
stability, and calculations must be done on a case- 
by-case basis. (The remarks in the subsection 
“Equilibria” are useful here.) 


Branching Patterns and Finite Determinacy 


The following notion of finite determinacy is based
on equivariant transversality theory. Assume Γ acts
absolutely irreducibly. Consider the set F of
Γ-equivariant vector fields f : R^n × R → R^n satisfying
(df)_{0,0} = 0. For an open dense subset of F,
branches of relative equilibria near (0,0) are
normally hyperbolic. The collection of branches of
relative equilibria, together with their isotropy type,
direction of branching, and stability properties, is
called a branching pattern. These persist under small
perturbations and are finitely determined: there exist
q = q_Γ > 2 and an open dense subset U(q) ⊂ F such
that the branching patterns of f and f + g are
identical for f ∈ U(q), g ∈ F, provided g(x, λ) =
o(‖x‖^q).

Furthermore, branching patterns are strongly
finitely determined: there exist d > 2 and an open
dense subset S(d) ⊂ F such that the branching
patterns of f and f + g are identical for f ∈ S(d)
and all (not necessarily equivariant) g satisfying
g(x, λ) = o(‖x‖^d).

For example, consider the hyperoctahedral group
S_n ⋉ Z_2^n, n > 1. Here S_n acts by permutations of the
coordinates (x_1, ..., x_n) and Z_2^n consists of diagonal
matrices with entries ±1. Let Γ = T ⋉ Z_2^n, where T ⊂ S_n
is a transitive subgroup. Then Γ acts absolutely
irreducibly on R^n and is strongly 3-determined.
Submaximal branches of equilibria exist except when
T = S_n, T = A_n and, if n = 6, T = PGL_2(F_5).


Dynamics 


Absolutely irreducible representations have arbitrarily
high dimension, so steady-state bifurcation
leads to rich dynamics. The group Γ = Z_3 ⋉ Z_2^3 with
sgn(a − 1) ≠ sgn(b − 1) and a + b > 2 in [6] yields
asymptotically stable heteroclinic cycles with planar
connections connecting equilibria in the x-, y- and
z-axes (see Figure 2). In R^4, there is the possibility of
instant chaos where chaotic dynamics bifurcates
directly from the equilibrium 0.
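
The role of the two conditions on a and b can be checked on the cubic truncation of [6]. The sketch below is an illustration, not part of the original analysis: it drops the higher-order terms, picks the hypothetical values λ = 0.1, a = 0.5, b = 2, and computes the two transverse eigenvalues at the axial equilibrium (√λ, 0, 0) with a finite-difference Jacobian. The function names are assumptions made for this example.

```python
import math


def field(v, lam, a, b):
    """Cubic truncation of [6]: x' = (lam - x^2 - a*y^2 - b*z^2) * x, cyclically."""
    x, y, z = v

    def g(p, q, r):
        return lam - p - a * q - b * r

    return (g(x * x, y * y, z * z) * x,
            g(y * y, z * z, x * x) * y,
            g(z * z, x * x, y * y) * z)


def jacobian(v, lam, a, b, h=1e-6):
    """Central-difference Jacobian of the truncated field at the point v."""
    jac = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        vp, vm = list(v), list(v)
        vp[j] += h
        vm[j] -= h
        fp, fm = field(vp, lam, a, b), field(vm, lam, a, b)
        for i in range(3):
            jac[i][j] = (fp[i] - fm[i]) / (2 * h)
    return jac


def transverse_eigenvalues(lam, a, b):
    """Eigenvalues in the y- and z-directions at (sqrt(lam), 0, 0), sorted.

    The Jacobian is diagonal at a point on the x-axis, so the diagonal
    entries are the eigenvalues.
    """
    jac = jacobian((math.sqrt(lam), 0.0, 0.0), lam, a, b)
    return sorted([jac[1][1], jac[2][2]])


contracting, expanding = transverse_eigenvalues(0.1, 0.5, 2.0)
print(contracting, expanding, -contracting > expanding)
```

For the truncation the two eigenvalues are λ(1 − b) and λ(1 − a), so sgn(a − 1) ≠ sgn(b − 1) makes the axial equilibrium a saddle within each coordinate plane, and the contraction outweighs the expansion exactly when a + b > 2, consistent with the stability condition quoted above.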








Figure 2 Robust heteroclinic cycle for the group Γ = Z_3 ⋉ Z_2^3.

In the absence of quadratic equivariants, the
invariant-sphere theorem gives an open set of
equivariant vector fields for which an attracting
normally hyperbolic flow-invariant (n − 1)-dimensional
sphere bifurcates supercritically. This simplifies
computations of nontrivial dynamics.


Hopf Bifurcation and Mode Interactions 
Equivariant Hopf Bifurcation 


The setting is the same as in the last section, except
that L = (df)_{0,0} has imaginary eigenvalues ±iω of
algebraic and geometric multiplicity n/2. Generically,
R^n = E^c is Γ-simple: either the direct sum of
two isomorphic absolutely irreducible subspaces, or
nonabsolutely irreducible.

By Birkhoff normal-form theory (see below), for
any k > 1 there is a Γ-equivariant change of
coordinates after which f(x, λ) = f_k(x, λ) + o(‖x‖^k),
where f_k is (Γ × S^1)-equivariant. Here S^1 =
{exp(tL) : t ∈ R} acts freely on R^n and Γ × S^1 acts
complex irreducibly (D = C). Hence, dim Fix J is
even for each isotropy subgroup J ⊂ Γ × S^1, and
N(J)/J = S^1 when J is maximal. The equivariant
Hopf theorem guarantees, generically, branches of
rotating waves with absolute period approximately
2π/ω for each maximal isotropy subgroup J.

The notions of finite and strong finite determinacy
extend to complex irreducible representations and the
rotating waves persist as periodic solutions for the
original Γ-equivariant vector field f. Define the
spatial and spatiotemporal symmetry groups Δ ⊂
Σ ⊂ Γ as in the subsection “Periodic solutions.” Then
J = {(σ, θ(σ)) : σ ∈ Σ} is a twisted subgroup, with
θ : Σ → S^1 a homomorphism and Δ = J ∩ Γ = ker θ.

In the non-symmetry-breaking case, where Γ acts
trivially on R^2, phase-amplitude reduction leads to
Z_2-equivariant amplitude equations on R and
higher-order degeneracies are amenable to Z_2-equivariant
singularity theory. Similar comments
apply to O(2)-equivariant Hopf bifurcation where
the amplitude equations are D_4-equivariant. The
technique fails for general groups Γ.


Mode Interactions and Birkhoff Normal Form 


Steady-state and Hopf bifurcations are codimension 1
and occur generically in one-parameter
families of Γ-equivariant vector fields. Multiparameter
families may undergo higher-codimension
bifurcations called mode interactions. Suppressing
parameters, steady-state/steady-state bifurcation
occurs when R^n = E^c = V_1 ⊕ V_2, where V_1 and V_2
are absolutely irreducible and L = (df)_0 has zero
eigenvalues. If V_1 and V_2 are nonisomorphic then
L = 0, otherwise L is nilpotent and there is an
equivariant Takens-Bogdanov bifurcation. Similarly,
there are codimension-2 steady-state/Hopf and Hopf/Hopf
bifurcations.

Write L = S + N (uniquely), where S is semisimple,
N is nilpotent, and SN = NS. Then the closure of
{exp tS : t ∈ R} is a torus T^p, where p > 0 is the number
of rationally independent eigenvalues for L.

For each k > 1, there is a Γ-equivariant degree-k
polynomial change of coordinates P : R^n → R^n satisfying
P(0) = 0, (dP)_0 = I transforming f to Birkhoff
normal form f_k + o(‖x‖^k), where f_k is (Γ × T^p)-equivariant.

If N ≠ 0, then {exp tN^T : t ∈ R} ≅ R and f_k can be
chosen so that the nonlinear terms are (Γ × T^p × R)-equivariant.
The linear terms are not R-equivariant.

The study of mode interactions proceeds by first
analyzing (Γ × T^p)-equivariant normal forms, then
considering exponentially small effects of the
Γ-equivariant tail. Versions of the equivariant branching
lemma and equivariant Hopf theorem establish
existence of certain solutions. There are numerous
examples of robust heteroclinic cycles connecting
(relative) equilibria and periodic solutions, symmetric
chaos, and symmetry-increasing bifurcations.


Bifurcations from Relative Equilibria 
and Periodic Solutions 


Using the skew product [4], bifurcations from
a relative equilibrium with isotropy Σ for a
Γ-equivariant vector field reduce to bifurcations
from a fully symmetric equilibrium for a
Σ-equivariant vector field h coupled with Γ drifts.
If h possesses (relative) equilibria or periodic
solutions, then the drift is determined generically as
in the subsections “Relative equilibria and skew
products” and “Relative periodic solutions.” Nevertheless,
solving the drift equation can be useful for
understanding behavior in physical space. This is
facilitated by making equivariant polynomial
changes of coordinates (γQ(x), P(x)) putting h into
Birkhoff normal form and simplifying ξ.

Bifurcations from (relative) periodic solutions also
reduce, mainly, to bifurcations from equilibria (with
enlarged symmetry group). Based on the discussion
in the subsection “Relative periodic solutions,” it
suffices to consider bifurcations from isolated
periodic solutions P = {x(t)} with spatial symmetry
Δ and spatiotemporal symmetry Σ. Write x(T) =
σx(0), where T is the relative period and σ is chosen
so that the automorphism δ ↦ σ^{-1}δσ, δ ∈ Δ, has
finite order k. Form the semidirect product
Δ ⋊ Z_{2k} by adjoining to Δ an element τ of order
2k such that τ^{-1}δτ = σ^{-1}δσ, for δ ∈ Δ. Codimension-1
bifurcations from P are in one-to-one correspondence
(modulo tail terms) with bifurcations from
fully symmetric equilibria for a (Δ ⋊ Z_{2k})-equivariant
vector field. In particular, period-preserving and
period-doubling bifurcations from P reduce to
steady-state bifurcations, and Naimark-Sacker
bifurcations reduce to Hopf bifurcations. This
framework incorporates issues such as suppression
of period doubling. Similar results hold for
higher-codimension bifurcations.

The skew products [4] and [5] are valid for proper
actions of certain noncompact Lie groups Γ provided
the spatial symmetries are compact, leading to
explanations of spiral and scroll wave phenomena in
excitable media.

When the spatial symmetry group is noncompact,
E^c may be infinite-dimensional and center manifold
reduction may break down due to continuous-spectrum
issues. For Euclidean symmetry, there
is a theory of modulation or Ginzburg-Landau
equations.


See also: Bifurcation Theory; Bifurcations in Fluid 
Dynamics; Bifurcations of Periodic Orbits; Central 
Manifolds, Normal Forms; Chaos and Attractors; 
Electroweak Theory; Finite Group Symmetry Breaking; 
Hyperbolic Dynamical Systems; Quantum Spin Systems; 
Quasiperiodic Systems; Singularity and Bifurcation 
Theory. 
Introduction


The use of symmetries in the quantitative and 
qualitative study of dynamical systems has a long 
history that goes back to the founders of mechanics. 
In most cases, the symmetries of a system are used to 
implement a procedure generically known under the 


name of “reduction” that restricts the study of its 
dynamics to a system of smaller dimension. This 
procedure is also used in a purely geometric context 
to construct new nontrivial manifolds having var- 
ious additional structures. 

Most of the reduction methods can be seen as 
constructions that systematize the techniques of 
elimination of variables found in classical 
mechanics. These procedures consist basically of 
two steps. First, one restricts the dynamics to flow- 
invariant submanifolds of the system in question 
and, second, one projects the restricted dynamics 
onto the symmetry orbit quotients of the spaces 
constructed in the first step. Sometimes, the 


flow-invariant manifolds appear as the level sets of a 
momentum map induced by the symmetry of the 
system. 


Symmetry Reduction 
The Symmetries of a System 


The standard mathematical fashion to describe the
symmetries of a dynamical system (see Dynamical
Systems in Mathematical Physics: An Illustration
from Water Waves) X ∈ 𝔛(M) defined on a manifold
M (𝔛(M) denotes the Lie algebra of smooth
vector fields on M endowed with the Jacobi-Lie
bracket [·,·]) consists in studying its invariance
properties with respect to a smooth Lie group
Φ : G × M → M (continuous symmetries) or Lie
algebra φ : 𝔤 → 𝔛(M) (infinitesimal symmetry)
action. Recall that Φ is a (left) action if the map
g ∈ G ↦ Φ(g, ·) ∈ Diff(M) is a group homomorphism,
where Diff(M) denotes the group of smooth
diffeomorphisms of the manifold M. The map φ is a
(left) Lie algebra action if the map ξ ∈ 𝔤 ↦ φ(ξ) ∈
𝔛(M) is a Lie algebra antihomomorphism and the
map (m, ξ) ∈ M × 𝔤 ↦ φ(ξ)(m) ∈ TM is smooth. The
vector field X is said to be G-symmetric whenever it
is equivariant with respect to the G-action Φ, that is,
X ∘ Φ_g = TΦ_g ∘ X, for any g ∈ G. The space of
G-symmetric vector fields on M is denoted by
𝔛(M)^G. The flow F_t of a G-symmetric vector
field X ∈ 𝔛(M)^G is G-equivariant, that is,
F_t ∘ Φ_g = Φ_g ∘ F_t, for any g ∈ G. The vector field X is
said to be 𝔤-symmetric if [φ(ξ), X] = 0, for any ξ ∈ 𝔤.

If 𝔤 is the Lie algebra of the Lie group G (see Lie
Groups: General Theory) then the infinitesimal generators
ξ_M ∈ 𝔛(M) of a smooth G-group action
defined by

\[ \xi_M(m) := \left.\frac{d}{dt}\right|_{t=0} \Phi(\exp t\xi, m), \qquad \xi \in \mathfrak{g},\ m \in M \]

constitute a smooth Lie algebra 𝔤-action and we
denote in this case φ(ξ) = ξ_M.
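
As a concrete illustration of the defining formula, the sketch below approximates the infinitesimal generator for the simplest case, assumed here for the example: G = SO(2) acting on M = R^2 by rotations, with the Lie algebra identified with R so that exp(tξ) is the rotation through angle tξ. The function names and the finite-difference step are hypothetical choices.

```python
import math


def action(theta, m):
    """Phi(exp(t*xi), m) for SO(2) acting on R^2; theta stands for t*xi."""
    x, y = m
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


def generator(xi, m, h=1e-6):
    """Central-difference approximation of d/dt at t = 0 of Phi(exp(t*xi), m)."""
    forward = action(h * xi, m)
    backward = action(-h * xi, m)
    return tuple((a - b) / (2 * h) for a, b in zip(forward, backward))


value = generator(2.0, (1.0, 3.0))   # exact generator: xi * (-y, x) = (-6, 2)
at_origin = generator(2.0, (0.0, 0.0))
print(value, at_origin)
```

For this action the exact generator is ξ_M(x, y) = ξ(−y, x), which vanishes only at the origin.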

If m ∈ M, the closed Lie subgroup G_m := {g ∈ G |
Φ(g, m) = m} is called the isotropy or symmetry
subgroup of m. Similarly, the Lie subalgebra
𝔤_m := {ξ ∈ 𝔤 | φ(ξ)(m) = 0} is called the isotropy or
symmetry subalgebra of m. If 𝔤 is the Lie algebra of
G and the Lie algebra action is given by the
infinitesimal generators, then 𝔤_m is the Lie algebra
of G_m. The action is called free if G_m = {e} for every
m ∈ M and locally free if 𝔤_m = {0} for every m ∈ M.
We will write interchangeably Φ(g, m) = Φ_g(m) =
Φ^m(g) = g · m, for m ∈ M and g ∈ G.

In this article we will focus mainly on continuous 
symmetries induced by proper Lie group actions. 
The action Φ is called proper whenever for any
two convergent sequences {m_n}_{n∈N} and {g_n · m_n :=
Φ(g_n, m_n)}_{n∈N} in M, there exists a convergent
subsequence {g_{n_k}}_{k∈N} in G. Compact group actions
are obviously proper.


Symmetry Reduction of Vector Fields 


Let M be a smooth manifold and G a Lie group
acting properly on M. Let X ∈ 𝔛(M)^G and F_t be its
(necessarily equivariant) flow. For any isotropy
subgroup H of the G-action on M, the H-isotropy
type submanifold M_H := {m ∈ M | G_m = H} is preserved
by the flow F_t. This property is known as the
law of conservation of isotropy. The properness of
the action guarantees that G_m is compact and that
the (connected components of) M_H are embedded
submanifolds of M for any closed subgroup H of G.
The manifolds M_H are, in general, not closed in M.
Moreover, the quotient group N(H)/H (where N(H)
denotes the normalizer of H in G) acts freely and
properly on M_H. Hence, if π_H : M_H → M_H/(N(H)/H)
denotes the projection onto orbit space and
i_H : M_H → M is the injection, the vector field X
induces a unique vector field X^H on the quotient
M_H/(N(H)/H) defined by X^H ∘ π_H = Tπ_H ∘ X ∘ i_H,
whose flow F_t^H is given by F_t^H ∘ π_H = π_H ∘ F_t ∘ i_H. We
will refer to X^H ∈ 𝔛(M_H/(N(H)/H)) as the H-isotropy
type reduced vector field induced by X.

This reduction technique has been widely 
exploited in handling specific dynamical systems. 
When the symmetry group G is compact and we are 
dealing with a linear action, the construction of the 
quotient M_H/(N(H)/H) can be implemented in a
very explicit and convenient manner by using the 
invariant polynomials of the action and the theo- 
rems of Hilbert and Schwarz—Mather. 


Symplectic Reduction 


Symplectic or Marsden-Weinstein reduction is a
procedure that implements symmetry reduction for
the symmetric Hamiltonian systems defined on a
symplectic manifold (M, ω). The particular case in
which the symplectic manifold is a cotangent bundle
is dealt with separately (see Cotangent Bundle
Reduction). We recall that the Hamiltonian vector
field X_h ∈ 𝔛(M) associated to the Hamiltonian
function h ∈ C^∞(M) is uniquely determined by the
equality ω(X_h, ·) = dh. In this context, the symmetries
Φ : G × M → M of interest are given by symplectic
or canonical transformations, that is,
Φ_g^* ω = ω, for any g ∈ G. For canonical actions each
G-invariant function h ∈ C^∞(M)^G has an associated
G-symmetric Hamiltonian vector field X_h. A Lie
algebra action φ is called symplectic or canonical if
£_{φ(ξ)} ω = 0 for all ξ ∈ 𝔤, where £ denotes the Lie
derivative operator. If the Lie algebra action is
induced from a canonical Lie group action by taking
its infinitesimal generators, then it is also canonical.


Momentum Maps 


The symmetry reduction described in the previous
section for general vector fields does not produce a
well-adapted answer for symplectic manifolds (M, ω)
in the sense that the reduced spaces M_H/(N(H)/H)
are, in general, not symplectic. To solve this
problem one has to use the conservation laws
associated to the canonical action, which often
appear as momentum maps.

Let G be a Lie group acting canonically on the
symplectic manifold (M, ω). Suppose that for any ξ ∈ 𝔤,
the vector field ξ_M is Hamiltonian, with Hamiltonian
function J^ξ ∈ C^∞(M) and that ξ ∈ 𝔤 ↦ J^ξ ∈ C^∞(M) is
linear. The map J : M → 𝔤* defined by the relation
⟨J(z), ξ⟩ = J^ξ(z), for all ξ ∈ 𝔤 and z ∈ M, is called
a momentum map of the G-action (see Hamiltonian
Group Actions). Momentum maps, if they exist, are
determined up to a constant in 𝔤* for any connected
component of M.


Examples 1 


(i) (Linear momentum) The phase space of an
N-particle system is the cotangent space T*R^{3N}
endowed with its canonical symplectic structure.
The additive group R^3, whose Lie algebra
is abelian and is also equal to R^3, acts
canonically on it by spatial translation on each
factor: v · (q_i, p^i) = (q_i + v, p^i), with i = 1, ..., N.
This action has an associated momentum map
J : T*R^{3N} → R^3, where we identified the dual of
R^3 with itself using the Euclidean inner product,
which coincides with the classical linear
momentum J(q_i, p^i) = Σ_{i=1}^N p^i.

(ii) (Angular momentum) Let SO(3) act on R^3
and then, by lift, on T*R^3, that is, A · (q, p) =
(Aq, Ap). This action is canonical and has as
associated momentum map J : T*R^3 → so(3)* ≅
R^3, the classical angular momentum J(q, p) =
q × p.

(iii) (Lifted actions on cotangent bundles) The
previous two examples are particular cases of
the following situation. Let Φ : G × Q → Q be a
smooth Lie group action. The (left) cotangent
lifted action of G on T*Q is given by g · α_q :=
T*_{g·q}Φ_{g^{-1}}(α_q) for g ∈ G and α_q ∈ T*Q. Cotangent
lifted actions preserve the canonical 1-form
on T*Q and hence are canonical. They admit
an associated momentum map J : T*Q → 𝔤*
given by ⟨J(α_q), ξ⟩ = α_q(ξ_Q(q)), for any α_q ∈
T*Q and any ξ ∈ 𝔤.

(iv) (Symplectic linear actions) Let (V, ω) be a
symplectic linear space and let G be a subgroup
of the linear symplectic group, acting naturally
on V. By the choice of G this action is canonical
and has a momentum map given by
⟨J(v), ξ⟩ = (1/2) ω(ξ_V(v), v), for ξ ∈ 𝔤 and v ∈ V
arbitrary.


Properties of the Momentum Map 


The main feature of the momentum map that makes it of
interest for use in reduction is that it encodes conservation
laws for G-symmetric Hamiltonian systems.
Noether’s theorem states that the momentum map is a
constant of the motion for the Hamiltonian vector field
X_h associated to any G-invariant function h ∈ C^∞(M)^G
(see Symmetries and Conservation Laws).
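
Noether's theorem can be observed numerically for the angular momentum map J(q, p) = q × p. The sketch below is an illustration under assumptions: it takes the hypothetical SO(3)-invariant Hamiltonian h(q, p) = |p|²/2 + V(q) with V(q) = |q|²/2 + |q|⁴/4 and integrates it with a kick-drift-kick (leapfrog) scheme. That scheme is chosen because each sub-step moves p along q or q along p, so q × p is preserved up to rounding error.

```python
def grad_V(q):
    """Gradient of the assumed rotation-invariant potential |q|^2/2 + |q|^4/4."""
    r2 = sum(c * c for c in q)
    return tuple((1.0 + r2) * c for c in q)


def cross(u, v):
    """Cross product in R^3; J(q, p) = cross(q, p) is the angular momentum."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def leapfrog(q, p, dt, steps):
    """Kick-drift-kick integrator for h(q, p) = |p|^2/2 + V(q)."""
    for _ in range(steps):
        g = grad_V(q)
        p = tuple(pi - 0.5 * dt * gi for pi, gi in zip(p, g))   # half kick
        q = tuple(qi + dt * pi for qi, pi in zip(q, p))         # drift
        g = grad_V(q)
        p = tuple(pi - 0.5 * dt * gi for pi, gi in zip(p, g))   # half kick
    return q, p


q0, p0 = (1.0, 0.0, 0.5), (0.0, 0.8, -0.3)
q1, p1 = leapfrog(q0, p0, 0.01, 5000)
drift = max(abs(a - b) for a, b in zip(cross(q0, p0), cross(q1, p1)))
print(drift)
```

The state (q, p) changes along the trajectory while the value of J stays fixed, so the motion remains on a single level set of the momentum map.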

The derivative TJ of the momentum map satisfies
the following two properties: range(T_m J) = (𝔤_m)° and
ker T_m J = (𝔤 · m)^ω, for any m ∈ M, where (𝔤_m)°
denotes the annihilator in 𝔤* of the isotropy subalgebra
𝔤_m of m, 𝔤 · m := T_m(G · m) = {ξ_M(m) | ξ ∈ 𝔤} is the
tangent space at m to the G-orbit that contains this
point, and (𝔤 · m)^ω is the symplectic orthogonal space
to 𝔤 · m in the symplectic vector space (T_m M, ω(m)).
The first relation is sometimes called the bifurcation
lemma since it establishes a link between the symmetry
of a point and the rank of the momentum map at
that point.

The existence of the momentum map for a given
canonical action is not guaranteed. A momentum
map exists if and only if the linear map ρ : [ξ] ∈
𝔤/[𝔤, 𝔤] ↦ [ω(ξ_M, ·)] ∈ H^1(M, R) is identically zero.
Thus, if H^1(M, R) = 0 or 𝔤/[𝔤, 𝔤] = H^1(𝔤, R) = 0
then ρ ≡ 0. In particular, if 𝔤 is semisimple, the
“first Whitehead lemma” states that H^1(𝔤, R) = 0
and therefore a momentum map always exists for
canonical semisimple Lie algebra actions.

A natural question to ask is when the map
(𝔤, [·,·]) → (C^∞(M), {·,·}) defined by ξ ↦ J^ξ, ξ ∈ 𝔤,
is a Lie algebra homomorphism, that is,
J^{[ξ,η]} = {J^ξ, J^η}, ξ, η ∈ 𝔤. Here {·,·} : C^∞(M) ×
C^∞(M) → C^∞(M) denotes the Poisson bracket associated
to the symplectic form ω of M defined by
{f, h} := ω(X_f, X_h), f, h ∈ C^∞(M). This is the case if
and only if T_z J(ξ_M(z)) = −ad*_ξ J(z), for any ξ ∈ 𝔤,
z ∈ M, where ad* is the dual of the adjoint
representation ad : (ξ, η) ∈ 𝔤 × 𝔤 ↦ [ξ, η] ∈ 𝔤 of 𝔤 on
itself. A momentum map that satisfies this relation
is called infinitesimally equivariant. The reason
behind this terminology is that this is the infinitesimal
version of global or coadjoint equivariance: J is
G-equivariant if Ad*_{g^{-1}} ∘ J = J ∘ Φ_g or, equivalently,
J^{Ad_g ξ}(g · z) = J^ξ(z), for all g ∈ G, ξ ∈ 𝔤, and z ∈ M;
Ad* denotes the dual of the adjoint representation
Ad of G on 𝔤. Actions admitting infinitesimally
equivariant momentum maps are called Hamiltonian
actions and Lie group actions with coadjoint
equivariant momentum maps are called globally
Hamiltonian actions. If the symmetry group G is
connected then global and infinitesimal equivariance
of the momentum map are equivalent concepts. If 𝔤
acts canonically on (M, ω) and H^1(𝔤, R) = {0} then
this action admits at most one infinitesimally
equivariant momentum map.

Since momentum maps are not uniquely defined, 
one may ask whether one can choose them to be 
equivariant. It turns out that if the momentum map is 
associated to the action of a compact Lie group, this 
can always be done. Momentum maps of cotangent 
lifted actions are also equivariant as are momentum 
maps defined by symplectic linear actions. Canonical 
actions of semisimple Lie algebras on symplectic
manifolds admit infinitesimally equivariant momentum
maps, since the “second Whitehead lemma”
states that H^2(𝔤, R) = 0 if 𝔤 is semisimple. We shall
identify below a specific element of H^2(𝔤, R) which is
the obstruction to the equivariance of a momentum
map (assuming it exists).

Even though, in general, it is not possible to
choose a coadjoint equivariant momentum map, it
turns out that when the symplectic manifold is
connected there is an affine action on the dual of the
Lie algebra with respect to which the momentum
map is equivariant. Define the nonequivariance
1-cocycle associated to J as the map σ : G → 𝔤*
given by g ↦ J(Φ_g(z)) − Ad*_{g^{-1}}(J(z)). The connectivity
of M implies that the right-hand side of this equality
is independent of the point z ∈ M. In addition, σ is a
(left) 𝔤*-valued 1-cocycle on G with respect to the
coadjoint representation of G on 𝔤*, that is,
σ(gh) = σ(g) + Ad*_{g^{-1}} σ(h) for all g, h ∈ G. Relative to
the affine action Θ : G × 𝔤* → 𝔤* given by
(g, μ) ↦ Ad*_{g^{-1}} μ + σ(g), the momentum map J is
equivariant. The “reduction lemma,” the main
technical ingredient in the proof of the reduction
theorem, states that for any m ∈ M we have

\[ \mathfrak{g}_{J(m)} \cdot m = \mathfrak{g} \cdot m \cap \ker T_m J = \mathfrak{g} \cdot m \cap (\mathfrak{g} \cdot m)^{\omega} \]

where 𝔤_{J(m)} is the Lie algebra of the isotropy group
G_{J(m)} of J(m) ∈ 𝔤* with respect to the affine action
of G on 𝔤* induced by the nonequivariance
1-cocycle of J.


The Symplectic Reduction Theorem 


The symplectic reduction procedure that we now 
present consists of constructing a new symplectic 
manifold out of a given symmetric one in which the
conservation laws encoded in the form of a 
momentum map and the degeneracies associated to 
the symmetry have been eliminated. This strategy 
allows the reduction of a symmetric Hamiltonian 
dynamical system to a dimensionally smaller one. 
This reduction procedure preserves the symplectic 
category, that is, if we start with a Hamiltonian 
system on a symplectic manifold, the reduced system 
is also a Hamiltonian system on a symplectic 
manifold. The reduced symplectic manifold is 
usually referred to as the symplectic or Marsden- 
Weinstein reduced space. 


Theorem 2 Let Φ : G × M → M be a free proper
canonical action of the Lie group G on the connected
symplectic manifold (M, ω). Suppose that this action
has an associated momentum map J : M → 𝔤*, with
nonequivariance 1-cocycle σ : G → 𝔤*. Let μ ∈ 𝔤* be
a value of J and denote by G_μ the isotropy of μ under
the affine action of G on 𝔤*. Then:

(i) The space M_μ := J^{-1}(μ)/G_μ is a regular quotient
manifold and, moreover, it is a symplectic
manifold with symplectic form ω_μ uniquely
characterized by the relation

\[ \pi_\mu^* \omega_\mu = i_\mu^* \omega. \]

The maps i_μ : J^{-1}(μ) → M and π_μ : J^{-1}(μ) →
J^{-1}(μ)/G_μ denote the inclusion and the projection,
respectively. The pair (M_μ, ω_μ) is called the
symplectic point reduced space.

(ii) Let h ∈ C^∞(M)^G be a G-invariant Hamiltonian.
The flow F_t of the Hamiltonian vector field X_h
leaves the connected components of J^{-1}(μ)
invariant and commutes with the G-action, so
it induces a flow F_t^μ on M_μ defined by
π_μ ∘ F_t ∘ i_μ = F_t^μ ∘ π_μ.

(iii) The vector field generated by the flow F_t^μ on
(M_μ, ω_μ) is Hamiltonian with associated
reduced Hamiltonian function h_μ ∈ C^∞(M_μ)
defined by h_μ ∘ π_μ = h ∘ i_μ. The vector fields
X_h and X_{h_μ} are π_μ-related. The triple
(M_μ, ω_μ, h_μ) is called the reduced Hamiltonian
system.

(iv) Let k ∈ C^∞(M)^G be another G-invariant function.
Then {h, k} is also G-invariant and
{h, k}_μ = {h_μ, k_μ}_{M_μ}, where {·,·}_{M_μ} denotes the
Poisson bracket associated to the symplectic
form ω_μ on M_μ.
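
A standard textbook instance may help fix the notation of the theorem. The worked example below is a sketch under assumptions that are not in the original text: planar motion in a central potential V, with the origin removed so that the rotation action is free, and the sign convention ω = dq ∧ dp.

```latex
% Setting (assumed): M = T^*(\mathbb{R}^2 \setminus \{0\}), G = SO(2) acting by
% simultaneous rotation of q and p, h(q,p) = \tfrac12 |p|^2 + V(|q|).
% In polar coordinates (r, \theta, p_r, p_\theta):
\[
  \omega = dr \wedge dp_r + d\theta \wedge dp_\theta, \qquad
  J(r, \theta, p_r, p_\theta) = p_\theta, \qquad
  h = \tfrac12 p_r^2 + \frac{p_\theta^2}{2 r^2} + V(r).
\]
% G is abelian, so G_\mu = SO(2) for every \mu. The level set is
% J^{-1}(\mu) = \{ p_\theta = \mu \}, and the quotient forgets \theta:
\[
  M_\mu \cong T^*(0, \infty) \ni (r, p_r), \qquad
  \omega_\mu = dr \wedge dp_r, \qquad
  h_\mu(r, p_r) = \tfrac12 p_r^2 + \frac{\mu^2}{2 r^2} + V(r).
\]
% Check of (i): on J^{-1}(\mu) one has dp_\theta = 0, hence
% i_\mu^* \omega = dr \wedge dp_r = \pi_\mu^* \omega_\mu.
```

The reduced system is the familiar one-degree-of-freedom radial problem with the centrifugal term μ²/(2r²) added to the potential.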



Reconstruction of Dynamics 


We pose now the question converse to the reduction
of a Hamiltonian system. Assume that an integral
curve c_μ(t) of the reduced Hamiltonian system X_{h_μ}
on (M_μ, ω_μ) is known. Let m_0 ∈ J^{-1}(μ) be given. One
can determine from this data the integral curve of
the Hamiltonian system X_h with initial condition
m_0. In other words, one can reconstruct the solution
of the given system knowing the corresponding
reduced solution. The general method of reconstruction
is the following. Pick a smooth curve d(t) in
J^{-1}(μ) such that d(0) = m_0 and π_μ(d(t)) = c_μ(t). Then,
if c(t) denotes the integral curve of X_h with
c(0) = m_0, we can write c(t) = g(t) · d(t) for some
smooth curve g(t) in G_μ that is obtained in two
steps. First, one finds a smooth curve ξ(t) in 𝔤_μ
such that ξ(t)_M(d(t)) = X_h(d(t)) − ḋ(t). With the
ξ(t) ∈ 𝔤_μ just obtained, one solves the nonautonomous
differential equation ġ(t) = T_e L_{g(t)} ξ(t) on G_μ
with g(0) = e.


The Orbit Formulation of the Symplectic 
Reduction Theorem 


There is an alternative approach to the reduction 
theorem which consists of choosing as numerator of 
the symplectic reduced space the group invariant 
saturation of the level sets of the momentum map. 
This option produces as a result a space that is 
symplectomorphic to the Marsden—Weinstein quo- 
tient but presents the advantage of being more 
appropriate in the context of quantization problems. 
Additionally, this approach makes easier the com- 
parison of the symplectic reduced spaces corres- 
ponding to different values of the momentum map 
which is important in the context of Poisson 
reduction (see Poisson Reduction). In carrying out 
this construction, one needs to use the natural 
symplectic structures that one can define on the 
orbits of the affine action of a group on the dual of 
its Lie algebra and that we now quickly review. 

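Let G be a Lie group, $\sigma: G \to \mathfrak{g}^*$ a coadjoint
1-cocycle, and $\mu \in \mathfrak{g}^*$. Let $\mathcal{O}_\mu$ be the orbit through $\mu$
of the affine G-action on $\mathfrak{g}^*$ associated to $\sigma$. If
$\Sigma: \mathfrak{g} \times \mathfrak{g} \to \mathbb{R}$ defined by

$\Sigma(\xi, \eta) = \frac{d}{dt}\Big|_{t=0} \langle \sigma(\exp(t\xi)), \eta \rangle$

is a real-valued Lie algebra 2-cocycle (which is
always the case if $\Sigma$ is the derivative of a smooth
real-valued group 2-cocycle or if $\sigma$ is the nonequivariance
1-cocycle of a momentum map), that
is, $\Sigma: \mathfrak{g} \times \mathfrak{g} \to \mathbb{R}$ is skew-symmetric and
$\Sigma([\xi, \eta], \zeta) + \Sigma([\eta, \zeta], \xi) + \Sigma([\zeta, \xi], \eta) = 0$ for all $\xi, \eta, \zeta \in \mathfrak{g}$, then
the affine orbit $\mathcal{O}_\mu$ is a symplectic manifold with
G-invariant symplectic structure $\omega^\pm_{\mathcal{O}_\mu}$ given by

$\omega^\pm_{\mathcal{O}_\mu}(\nu)\big(\xi_{\mathfrak{g}^*}(\nu), \eta_{\mathfrak{g}^*}(\nu)\big) = \pm\langle \nu, [\xi, \eta] \rangle \mp \Sigma(\xi, \eta)$ \qquad [1]

for arbitrary $\nu \in \mathcal{O}_\mu$ and $\xi, \eta \in \mathfrak{g}$. The symbol
$\xi_{\mathfrak{g}^*}(\nu) := -\mathrm{ad}^*_\xi \nu + \Sigma(\xi, \cdot)$ denotes the infinitesimal
generator of the affine action on $\mathfrak{g}^*$ associated to
$\xi \in \mathfrak{g}$. The symplectic structures $\omega^\pm_{\mathcal{O}_\mu}$ on $\mathcal{O}_\mu$ are
called the ($\pm$)-orbit or Kostant-Kirillov-Souriau
(KKS) symplectic forms.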

This symplectic form can be obtained from
Theorem 2 by considering the symplectic reduction
of the cotangent bundle $T^*G$ endowed with the
magnetic symplectic structure $\omega_\Sigma := \omega_{can} - \pi^* B_\Sigma$,
where $\omega_{can}$ is the canonical symplectic form on
$T^*G$, $\pi: T^*G \to G$ is the projection onto the base,
and $B_\Sigma \in \Omega^2(G)^G$ is a left-invariant 2-form on G
whose value at the identity is the Lie algebra
2-cocycle $\Sigma: \mathfrak{g} \times \mathfrak{g} \to \mathbb{R}$. Since $\Sigma$ is a cocycle, it
follows that $B_\Sigma$ is closed and hence $\omega_\Sigma$ is a
symplectic form. Moreover, the lifting of the left
translations on G provides a canonical G-action on
$T^*G$ that has a momentum map given by
$J(g, \mu) = \Theta(g, \mu)$, $(g, \mu) \in G \times \mathfrak{g}^* \simeq T^*G$, where
$\Theta$ denotes the affine action and the
trivialization $G \times \mathfrak{g}^* \simeq T^*G$ is obtained via left
translations. Symplectic reduction using these ingredients
yields symplectic reduced spaces that are
naturally symplectically diffeomorphic to the affine
orbits $\mathcal{O}_\mu$ with the symplectic form [1].


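Theorem 3 (Symplectic orbit reduction). Let $\Phi: G \times M \to M$
be a free proper canonical action of the Lie
group G on the connected symplectic manifold $(M, \omega)$.
Suppose that this action has an associated momentum
map $J: M \to \mathfrak{g}^*$, with nonequivariance 1-cocycle
$\sigma: G \to \mathfrak{g}^*$. Let $\mathcal{O}_\mu := G \cdot \mu \subset \mathfrak{g}^*$ be the G-orbit of the
point $\mu \in \mathfrak{g}^*$ with respect to the affine action of G on
$\mathfrak{g}^*$ associated to $\sigma$. Then the set $M_{\mathcal{O}_\mu} := J^{-1}(\mathcal{O}_\mu)/G$
is a regular quotient symplectic manifold with
the symplectic form $\omega_{\mathcal{O}_\mu}$ uniquely characterized by
the relation $i^*_{\mathcal{O}_\mu}\omega = \pi^*_{\mathcal{O}_\mu}\omega_{\mathcal{O}_\mu} + J^*_{\mathcal{O}_\mu}\omega^+_{\mathcal{O}_\mu}$, where $J_{\mathcal{O}_\mu}$ is
the restriction of J to $J^{-1}(\mathcal{O}_\mu)$ and $\omega^+_{\mathcal{O}_\mu}$ is the (+)-
symplectic structure on the affine orbit $\mathcal{O}_\mu$. The maps
$i_{\mathcal{O}_\mu}: J^{-1}(\mathcal{O}_\mu) \hookrightarrow M$ and $\pi_{\mathcal{O}_\mu}: J^{-1}(\mathcal{O}_\mu) \to M_{\mathcal{O}_\mu}$ are the natural
injection and the projection, respectively. The pair
$(M_{\mathcal{O}_\mu}, \omega_{\mathcal{O}_\mu})$ is called the symplectic orbit reduced space.
Statements similar to (ii)-(iv) in Theorem 2 can be
formulated for the orbit reduced spaces $(M_{\mathcal{O}_\mu}, \omega_{\mathcal{O}_\mu})$.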


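We emphasize that given a momentum value $\mu \in \mathfrak{g}^*$,
the reduced spaces $M_\mu$ and $M_{\mathcal{O}_\mu}$ are symplectically
diffeomorphic via the projection to the quotients of the
inclusion $J^{-1}(\mu) \hookrightarrow J^{-1}(\mathcal{O}_\mu)$.

Reduction at a general point can be replaced by
reduction at zero at the expense of enlarging the
manifold by the affine orbit. Consider the canonical
diagonal action of G on the symplectic difference
$M \ominus \mathcal{O}^+_\mu$, which is the manifold $M \times \mathcal{O}_\mu$ with the
symplectic form $\pi_1^*\omega - \pi_2^*\omega^+_{\mathcal{O}_\mu}$, where $\pi_1: M \times \mathcal{O}_\mu \to M$
and $\pi_2: M \times \mathcal{O}_\mu \to \mathcal{O}_\mu$ are the projections.
A momentum map for this action is given by
$J \circ \pi_1 - \pi_2: M \ominus \mathcal{O}^+_\mu \to \mathfrak{g}^*$. Let
$(M \ominus \mathcal{O}^+_\mu)_0 := \big((J \circ \pi_1 - \pi_2)^{-1}(0)/G,\ (\omega \ominus \omega^+_{\mathcal{O}_\mu})_0\big)$
be the symplectic point reduced space at zero.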


Theorem 4 (Shifting theorem). Under the hypotheses
of the symplectic orbit reduction theorem
(Theorem 3), the symplectic orbit reduced space
$M_{\mathcal{O}_\mu}$, the point reduced space $M_\mu$, and $(M \ominus \mathcal{O}^+_\mu)_0$
are symplectically diffeomorphic.


Singular Reduction 


In the previous section we carried out symplectic
reduction for free and proper actions. The freeness
guarantees via the bifurcation lemma that the
momentum map J is a submersion and hence the
level sets $J^{-1}(\mu)$ are smooth manifolds. Freeness and
properness ensure that the orbit spaces
$M_\mu := J^{-1}(\mu)/G_\mu$ are regular quotient manifolds.
The theory of singular reduction studies the properties
of the orbit space $M_\mu$ when the hypothesis on
the freeness of the action is dropped. The main
result in this situation shows that these quotients are 
symplectic Whitney stratified spaces, in the sense 
that the strata are symplectic manifolds in a very 
natural way; moreover, the local properties of this 
Whitney stratification make it into what is called a 
cone space. This statement is referred to as the 
“symplectic stratification theorem” and adapts to 
the symplectic symmetric context the stratification 
theorem of the orbit space of a proper Lie group 
action by using its orbit type manifolds. In order to 
present this result, we review the necessary defini- 
tions and results on stratified spaces (see Singularity 
and Bifurcation Theory for more information on 
singularity theory). 


Stratified Spaces 


Let $\mathcal{Z}$ be a locally finite partition of the topological
space P into smooth manifolds $S_i \subset P$, $i \in I$. We
assume that the manifolds $S_i \subset P$, $i \in I$, with their
manifold topology are locally closed topological subspaces
of P. The pair $(P, \mathcal{Z})$ is a decomposition of P with
pieces in $\mathcal{Z}$ when the following condition is satisfied:


Condition (DS) If $R, S \in \mathcal{Z}$ are such that $R \cap \bar{S} \neq \emptyset$,
then $R \subset \bar{S}$. In this case we write $R \preceq S$. If, in
addition, $R \neq S$ we say that R is incident to S or that
it is a boundary piece of S and write $R \prec S$.


The above condition is called the frontier condition
and the pair $(P, \mathcal{Z})$ is called a decomposed space. The
dimension of P is defined as $\dim P = \sup\{\dim S_i \mid S_i \in \mathcal{Z}\}$.
If $k \in \mathbb{N}$, the k-skeleton $P^k$ of P is the union of all
the pieces of dimension smaller than or equal to k; its
topology is the relative topology induced by P. The
depth dp(z) of any $z \in (P, \mathcal{Z})$ is defined as

$\mathrm{dp}(z) := \sup\{k \in \mathbb{N} \mid \exists\, S_0, S_1, \ldots, S_k \in \mathcal{Z} \text{ with } z \in S_0 \prec S_1 \prec \cdots \prec S_k\}$

Since for any two elements $x, y \in S$ in the same piece
$S \subset P$ we have dp(x) = dp(y), the depth dp(S) of the
piece S is well defined by dp(S) := dp(x), $x \in S$.
Finally, the depth dp(P) of $(P, \mathcal{Z})$ is defined by
$\mathrm{dp}(P) := \sup\{\mathrm{dp}(S) \mid S \in \mathcal{Z}\}$.
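
The depth of a decomposition can be computed mechanically once the incidence relation between pieces is known. The following is a minimal sketch in Python, assuming a finite decomposition encoded as a dictionary that maps each piece to the set of pieces it is incident to; the function name `depths` and the example (the closed quadrant $[0,\infty)^2$ split into the origin, the two open half-axes, and the open interior) are illustrative choices, not part of the original text.

```python
from functools import lru_cache


def depths(strictly_below):
    """Return dp(S) for every piece S of a finite decomposition.

    `strictly_below[S]` is the set of pieces R with S incident to R
    (S strictly below R in the frontier order). The depth of S is the
    length k of the longest chain S = S_0 < S_1 < ... < S_k.
    """
    @lru_cache(maxsize=None)
    def dp(piece):
        above = strictly_below[piece]
        return 0 if not above else 1 + max(dp(r) for r in above)

    return {piece: dp(piece) for piece in strictly_below}


# Hypothetical example: the closed quadrant [0, oo)^2.
quadrant = {
    "origin": {"x_axis", "y_axis", "interior"},
    "x_axis": {"interior"},
    "y_axis": {"interior"},
    "interior": set(),
}

piece_depths = depths(quadrant)
space_depth = max(piece_depths.values())  # dp(P) = sup of dp(S)
```

In this example the open interior has depth 0, each open half-axis has depth 1, and the origin has depth 2, so the depth of the whole space is 2.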

A continuous mapping $f: P \to Q$ between the
decomposed spaces $(P, \mathcal{Z})$ and $(Q, \mathcal{Y})$ is a morphism
of decomposed spaces if, for every piece $S \in \mathcal{Z}$, there
is a piece $T \in \mathcal{Y}$ such that $f(S) \subset T$ and the
restriction $f|_S: S \to T$ is smooth. If $(P, \mathcal{Z})$ and $(P, \mathcal{T})$
are two decompositions of the same topological
space we say that $\mathcal{Z}$ is coarser than $\mathcal{T}$ or that $\mathcal{T}$ is
finer than $\mathcal{Z}$ if the identity mapping $(P, \mathcal{T}) \to (P, \mathcal{Z})$
is a morphism of decomposed spaces. A topological
subspace $Q \subset P$ is a decomposed subspace of $(P, \mathcal{Z})$
if, for all pieces $S \in \mathcal{Z}$, the intersection $S \cap Q$ is a
submanifold of S and the corresponding partition
$\mathcal{Z} \cap Q$ forms a decomposition of Q.

Let P be a topological space and $z \in P$. Two subsets
A and B of P are said to be equivalent at z if there is an
open neighborhood U of z such that $A \cap U = B \cap U$.
This relation constitutes an equivalence relation on the
power set of P. The class of all sets equivalent to a
given subset A at z will be denoted by $[A]_z$ and called
the set germ of A at z. If $A \subset B \subset P$, we say that $[A]_z$ is
a subgerm of $[B]_z$, and denote $[A]_z \subset [B]_z$.

A stratification of the topological space P is a map
$\mathcal{S}$ that associates to any $z \in P$ the set germ $\mathcal{S}(z)$ of a
closed subset of P such that the following condition
is satisfied:


Condition (ST) For every $z \in P$ there is a neighborhood
U of z and a decomposition $\mathcal{Z}$ of U such that
for all $y \in U$ the germ $\mathcal{S}(y)$ coincides with the set
germ of the piece of $\mathcal{Z}$ that contains y.


The pair (P,S) is called a stratified space. Any 
decomposition of P defines a stratification of P by 
associating to each of its points the set germ of the 
piece in which it is contained. The converse is, by 
definition, locally true. 


The Strata 


Two decompositions $\mathcal{Z}_1$ and $\mathcal{Z}_2$ of P are said to be
equivalent if they induce the same stratification of P.
If $\mathcal{Z}_1$ and $\mathcal{Z}_2$ are equivalent decompositions of P
then, for all $z \in P$, we have that $\mathrm{dp}_{\mathcal{Z}_1}(z) = \mathrm{dp}_{\mathcal{Z}_2}(z)$.
Any stratified space $(P, \mathcal{S})$ has a unique decomposition
$\mathcal{Z}_\mathcal{S}$ associated with the following maximality
property: for any open subset $U \subset P$ and any
decomposition $\mathcal{Z}$ of P inducing $\mathcal{S}$ over U, the
restriction of $\mathcal{Z}_\mathcal{S}$ to U is coarser than the restriction
of $\mathcal{Z}$ to U. The decomposition $\mathcal{Z}_\mathcal{S}$ is called the
canonical decomposition associated to the stratification
$(P, \mathcal{S})$. It is often denoted by $\mathcal{S}$ and its pieces are
called the strata of P. The local finiteness of the
decomposition $\mathcal{Z}_\mathcal{S}$ implies that for any stratum S
of $(P, \mathcal{S})$ there are only finitely many strata R with
$S \prec R$. Henceforth, the symbol $\mathcal{S}$ in the stratification
$(P, \mathcal{S})$ will denote both the map that associates to
each point a set germ and the set of pieces associated
to the canonical decomposition induced by the
stratification of P.


Stratified Spaces with Smooth Structure 


Let $(P, \mathcal{S})$ be a stratified space. A singular or
stratified chart of P is a homeomorphism
$\phi: U \to \phi(U) \subset \mathbb{R}^n$ from an open set $U \subset P$ to a
subset of $\mathbb{R}^n$ such that for every stratum $S \in \mathcal{S}$
the image $\phi(U \cap S)$ is a submanifold of $\mathbb{R}^n$ and
the restriction $\phi|_{U \cap S}: U \cap S \to \phi(U \cap S)$ is a
diffeomorphism. Two singular charts $\phi: U \to \phi(U) \subset \mathbb{R}^n$
and $\psi: V \to \psi(V) \subset \mathbb{R}^m$ are compatible if for any
$z \in U \cap V$ there exist an open neighborhood
$W \subset U \cap V$ of z, a natural number $N \geq \max\{n, m\}$,
open neighborhoods $O, O' \subset \mathbb{R}^N$ of $\phi(U) \times \{0\}$ and
$\psi(V) \times \{0\}$, respectively, and a diffeomorphism
$\Psi: O \to O'$ such that $i_m \circ \psi|_W = \Psi \circ i_n \circ \phi|_W$, where
$i_n$ and $i_m$ denote the natural embeddings of $\mathbb{R}^n$ and
$\mathbb{R}^m$ into $\mathbb{R}^N$ by using the first n and m coordinates,
respectively. The notion of singular or stratified
atlas is the natural generalization for stratifications 
of the concept of atlas existing for smooth mani- 
folds. Analogously, we can talk of compatible and 
maximal stratified atlases. If the stratified space 
(P,S) has a well-defined maximal atlas, then we say 
that this atlas determines a smooth or differentiable 
structure on P. We will refer to (P,S) as a smooth 
stratified space. 


The Whitney Conditions 


Let M be a manifold and $R, S \subset M$ two submanifolds.
We say that the pair (R, S) satisfies the
Whitney condition (A) at the point $z \in R$ if the
following condition is satisfied:


Condition (A) For any sequence of points $\{z_n\}_{n \in \mathbb{N}}$
in S converging to $z \in R$ for which the sequence of
tangent spaces $\{T_{z_n}S\}_{n \in \mathbb{N}}$ converges in the Grassmann
bundle of (dim S)-dimensional subspaces of TM
to $\tau \subset T_zM$, we have that $T_zR \subset \tau$.


Let $\phi: U \to \mathbb{R}^n$ be a smooth chart of M around
the point z. The Whitney condition (B) at the point
$z \in R$ with respect to the chart $(U, \phi)$ is given by the
following statement:


Condition (B) Let $\{x_n\}_{n \in \mathbb{N}} \subset R \cap U$ and $\{y_n\}_{n \in \mathbb{N}} \subset S \cap U$
be two sequences with the same limit

$z = \lim_{n \to \infty} x_n = \lim_{n \to \infty} y_n$

and such that $x_n \neq y_n$ for all $n \in \mathbb{N}$. Suppose that
the set of connecting lines $\overline{\phi(x_n)\phi(y_n)} \subset \mathbb{R}^n$ converges
in projective space to a line L and that the
sequence of tangent spaces $\{T_{y_n}S\}_{n \in \mathbb{N}}$ converges in
the Grassmann bundle of (dim S)-dimensional subspaces
of TM to $\tau \subset T_zM$. Then $(T_z\phi)^{-1}(L) \subset \tau$.


If the condition (A) (respectively (B)) is verified
for every point $z \in R$, the pair (R, S) is said to satisfy
the Whitney condition (A) (respectively (B)). It can
be verified that Whitney's condition (B) does not
depend on the chart used to formulate it. A stratified
space with smooth structure such that, for every pair
of strata, Whitney's condition (B) is satisfied is
called a Whitney space.


Cone Spaces and Local Triviality 


Let P be a topological space. Consider the equivalence
relation $\sim$ in the product $P \times [0, \infty)$ given by
$(z, a) \sim (z', a')$ if and only if $a = a' = 0$. We define the
cone CP on P as the quotient topological space
$P \times [0, \infty)/\sim$. If P is a smooth manifold then the cone
CP is a decomposed space with two pieces, namely,
$P \times (0, \infty)$ and the vertex which is the class
corresponding to any element of the form (z, 0),
$z \in P$, that is, $P \times \{0\}$. Analogously, if $(P, \mathcal{Z})$ is a
decomposed (stratified) space then the associated
cone CP is also a decomposed (stratified) space
whose pieces (strata) are the vertex and the sets of
the form $S \times (0, \infty)$, with $S \in \mathcal{Z}$. This implies, in
particular, that dim CP = dim P + 1 and
dp(CP) = dp(P) + 1.

A stratified space $(P, \mathcal{S})$ is said to be locally trivial
if for any $z \in P$ there exist a neighborhood U of z, a
stratified space $(F, \mathcal{S}^F)$, a distinguished point $0 \in F$,
and an isomorphism of stratified spaces

$\psi: U \to (S \cap U) \times F$

where S is the stratum that contains z and $\psi$ satisfies
$\psi^{-1}(y, 0) = y$ for all $y \in S \cap U$. When F is given by a
cone CL over a compact stratified space L then L is
called the link of z.

An important corollary of “Thom’s first isotopy 
lemma” guarantees that every Whitney stratified 
space is locally trivial. A converse to this implication 
needs the introduction of cone spaces. Their defini- 
tion is given by recursion on the depth of the space. 


Definition 5 Let $m \in \mathbb{N} \cup \{\infty, \omega\}$. A cone space of
class $C^m$ and depth 0 is the union of countably many
$C^m$ manifolds together with the stratification whose
strata are the unions of the connected components
of equal dimension. A cone space of class $C^m$ and
depth d + 1, $d \in \mathbb{N}$, is a stratified space $(P, \mathcal{S})$ with a
$C^m$ differentiable structure such that for any $z \in P$
there exists a connected neighborhood U of z, a
compact cone space L of class $C^m$ and depth d called
the link, and a stratified isomorphism

$\psi: U \to (S \cap U) \times CL$

where S is the stratum that contains the point z, the
map $\psi$ satisfies $\psi^{-1}(y, 0) = y$ for all $y \in S \cap U$, and 0
is the vertex of the cone CL.

If $m \neq 0$ then L is required to be embedded into a
sphere via a fixed smooth global singular chart
$\varphi: L \to S^l$ that determines the smooth structure
of CL. More specifically, the smooth structure of
CL is generated by the global chart $\tau: [z, t] \in CL \mapsto t\varphi(z) \in \mathbb{R}^{l+1}$.
The maps $\psi: U \to (S \cap U) \times CL$ and $\varphi: L \to S^l$
are referred to as a cone chart
and a link chart, respectively. Moreover, if $m \neq 0$
then $\psi$ and $\psi^{-1}$ are required to be differentiable of
class $C^m$ as maps between stratified spaces with a
smooth structure.


The Symplectic Stratification Theorem 


Let $(M, \omega)$ be a connected symplectic manifold acted
canonically and properly upon by a Lie group G.
Suppose that this action has an associated momentum
map $J: M \to \mathfrak{g}^*$ with nonequivariance 1-cocycle
$\sigma: G \to \mathfrak{g}^*$. Let $\mu \in \mathfrak{g}^*$ be a value of J, $G_\mu$ the
isotropy subgroup of $\mu$ with respect to the affine
action $\Theta: G \times \mathfrak{g}^* \to \mathfrak{g}^*$ determined by $\sigma$, and let
$H \subset G$ be an isotropy subgroup of the G-action on
M. Let $M_H^z$ be the connected component of the
H-isotropy type manifold that contains a given
element $z \in M$ such that $J(z) = \mu$ and let $G_\mu M_H^z$ be
its $G_\mu$-saturation. Then the following hold:


1. The set $J^{-1}(\mu) \cap G_\mu M_H^z$ is a submanifold of M.

2. The set $M_\mu^{(H)} := [J^{-1}(\mu) \cap G_\mu M_H^z]/G_\mu$ has a unique
quotient differentiable structure such that the
canonical projection $\pi_\mu^{(H)}: J^{-1}(\mu) \cap G_\mu M_H^z \to M_\mu^{(H)}$
is a surjective submersion.

3. There is a unique symplectic structure $\omega_\mu^{(H)}$ on
$M_\mu^{(H)}$ characterized by

$i_\mu^{(H)*}\omega = \pi_\mu^{(H)*}\omega_\mu^{(H)}$

where $i_\mu^{(H)}: J^{-1}(\mu) \cap G_\mu M_H^z \hookrightarrow M$ is the natural
inclusion. The pairs $(M_\mu^{(H)}, \omega_\mu^{(H)})$ will be called
singular symplectic point strata.

4. Let $h \in C^\infty(M)^G$ be a G-invariant Hamiltonian.
Then the flow $F_t$ of $X_h$ leaves the connected
components of $J^{-1}(\mu) \cap G_\mu M_H^z$ invariant and commutes
with the $G_\mu$-action, so it induces a flow $F_t^\mu$ on
$M_\mu^{(H)}$ that is characterized by
$\pi_\mu^{(H)} \circ F_t \circ i_\mu^{(H)} = F_t^\mu \circ \pi_\mu^{(H)}$.

5. The flow $F_t^\mu$ is Hamiltonian on $M_\mu^{(H)}$, with
reduced Hamiltonian function $h_\mu^{(H)}: M_\mu^{(H)} \to \mathbb{R}$
defined by $h_\mu^{(H)} \circ \pi_\mu^{(H)} = h \circ i_\mu^{(H)}$. The vector fields
$X_h$ and $X_{h_\mu^{(H)}}$ are $\pi_\mu^{(H)}$-related.

6. Let $k: M \to \mathbb{R}$ be another G-invariant function.
Then $\{h, k\}$ is also G-invariant and
$\{h, k\}_\mu^{(H)} = \{h_\mu^{(H)}, k_\mu^{(H)}\}_{M_\mu^{(H)}}$, where
$\{\cdot,\cdot\}_{M_\mu^{(H)}}$ denotes the Poisson bracket
induced by the symplectic structure on $M_\mu^{(H)}$.


Theorem 6 (Symplectic stratification theorem). The
quotient $M_\mu := J^{-1}(\mu)/G_\mu$ is a cone space when
considered as a stratified space with strata $M_\mu^{(H)}$.


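As was the case for regular reduction, this theorem
can also be formulated from the orbit reduction point
of view. Using that approach one can conclude
that the orbit reduced spaces $M_{\mathcal{O}_\mu}$ are cone
spaces symplectically stratified by the manifolds
$M_{\mathcal{O}_\mu}^{(H)} := G \cdot (J^{-1}(\mu) \cap M_H^z)/G$ that have symplectic
structure uniquely determined by the expression

$i_{\mathcal{O}_\mu}^{(H)*}\omega = \pi_{\mathcal{O}_\mu}^{(H)*}\omega_{\mathcal{O}_\mu}^{(H)} + J_{\mathcal{O}_\mu}^{(H)*}\omega^+_{\mathcal{O}_\mu}$

where $i_{\mathcal{O}_\mu}^{(H)}: G \cdot (J^{-1}(\mu) \cap M_H^z) \hookrightarrow M$ is the inclusion,
$J_{\mathcal{O}_\mu}^{(H)}: G \cdot (J^{-1}(\mu) \cap M_H^z) \to \mathcal{O}_\mu$ is obtained by restriction
of the momentum map J, and $\omega^+_{\mathcal{O}_\mu}$ is the
(+)-symplectic form on $\mathcal{O}_\mu$. Analogous statements
to (1)-(6) above with obvious modifications are valid.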


See also: Cotangent Bundle Reduction; Dynamical 
Systems in Mathematical Physics: An Illustration 

from Water Waves; Graded Poisson Algebras; 
Hamiltonian Group Actions; Lie Groups: General Theory; 
Poisson Reduction; Singularity and Bifurcation Theory; 
Symmetries and Conservation Laws. 
Introduction


Spontaneous symmetry breaking in its simplest form 
occurs when there is a symmetry of a dynamical 
system that is not manifest in its ground state or 
equilibrium state. It is a common feature of many 
classical and quantum systems. In quantum field 
theories, in the infinite-volume limit, there are new 
features, the appearance of unitarily inequivalent 
representations of the canonical commutation 
relations, and the possibility of a true phase 
transition — a point in the phase space where the 
thermodynamic free energy is nonanalytic. The 
spontaneous breaking of a continuous global sym- 
metry implies the existence of massless particles, the 
Goldstone bosons, while in the local-symmetry case 
some or all of these may be eliminated by the Higgs 
mechanism. Spontaneous symmetry breaking in 
gauge theories is however a more elusive concept. 


Breaking of Global Symmetries 


In a quantum-mechanical system a (time-independent)
symmetry is represented by a unitary operator U
acting on the Hilbert space of quantum states which
commutes with the Hamiltonian H. If the ground state
$|0\rangle$ of the system is not invariant under U, then
$|0'\rangle = U|0\rangle \neq c|0\rangle$ is also a ground state. In other
words, the ground state is degenerate.

For a system with a finite number of degrees
of freedom, whose states are represented by vectors
in a separable Hilbert space $\mathcal{H}$, symmetry breaking
of an abelian symmetry group G is impossible,
unless there are additional accidental symmetries.
Consider, for example, a particle in a double-well
potential

$V = \frac{m\omega^2}{8a^2}(x^2 - a^2)^2$ \qquad [1]

which has the discrete symmetry group $G = \mathbb{Z}_2$; the
inversion symmetry operator U satisfies $U^2 = 1$.
There are then two approximate ground states $|0\rangle$
and $|0'\rangle = U|0\rangle$, with wave functions proportional to
$\exp[-(1/2)m\omega(x \mp a)^2]$. However, there is an overlap
between these, and the off-diagonal matrix
element $\langle 0|H|0'\rangle$ is nonzero, although exponentially
small, so the true energy eigenstates are, approximately,
$|0\pm\rangle = (1/\sqrt{2})(|0\rangle \pm |0'\rangle)$. (More accurate
energy eigenfunctions and eigenvalues may be
found by using the WKB approximation.)
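
To make the phrase "exponentially small" concrete, the overlap of the two approximate ground states can be evaluated numerically. The sketch below assumes units with $\hbar = 1$ and normalized Gaussians $(m\omega/\pi)^{1/4}\exp[-(1/2)m\omega(x \mp a)^2]$, for which the overlap integral equals $\exp(-m\omega a^2)$; the function names and the integration window are illustrative assumptions.

```python
import math


def gaussian(x, center, m_omega):
    """Normalized oscillator ground state centred at `center` (hbar = 1)."""
    norm = (m_omega / math.pi) ** 0.25
    return norm * math.exp(-0.5 * m_omega * (x - center) ** 2)


def overlap(a, m_omega, half_width=20.0, steps=20001):
    """Trapezoidal estimate of <0|0'> on [-half_width, half_width]."""
    h = 2.0 * half_width / (steps - 1)
    total = 0.0
    for i in range(steps):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps - 1) else 1.0
        total += weight * gaussian(x, a, m_omega) * gaussian(x, -a, m_omega)
    return total * h


# Hypothetical parameters: wells at x = +3 and x = -3, m * omega = 1.
numeric = overlap(3.0, 1.0)
exact = math.exp(-1.0 * 3.0 ** 2)  # exp(-m * omega * a^2)
```

The overlap falls off like a Gaussian in the well separation, which is why the splitting between $|0+\rangle$ and $|0-\rangle$ is so small.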

Of course, if the symmetry group is nonabelian,
and the ground state belongs to a nontrivial
representation, then degeneracy is unavoidable. For
example, if G is the rotation group SO(3) (or SU(2))
and the ground state has angular momentum $j \neq 0$,
then it is (2j + 1)-fold degenerate.

The situation is different, however, in a quantum
field theory. In the infinite-volume limit, even abelian
symmetries can be spontaneously broken. Take, for
example, a real scalar field with Lagrangian

$\mathcal{L} = \frac{1}{2}\partial_\mu\phi\,\partial^\mu\phi - V = \frac{1}{2}\dot{\phi}^2 - \frac{1}{2}(\nabla\phi)^2 - V$ \qquad [2]

(where we set $c = \hbar = 1$), again with a double-well
potential

$V = \frac{1}{8}\lambda(\phi^2 - \eta^2)^2$ \qquad [3]

which exhibits a $\mathbb{Z}_2$ symmetry under
$\phi(x) \to -\phi(x)$.

At least in the semiclassical or tree approximation,
there are two degenerate vacuum states $|0\rangle$
and $|0'\rangle$, with

$\langle 0|\phi(x)|0\rangle = \eta$ and $\langle 0'|\phi(x)|0'\rangle = -\eta$ \qquad [4]

If we quantize the system in a box of finite volume
$\mathcal{V}$, then, as earlier, there is an off-diagonal matrix
element of the Hamiltonian connecting the two
states, so the true ground state is (approximately)
$(1/\sqrt{2})(|0\rangle + |0'\rangle)$. However, this matrix element
goes to zero exponentially as $\mathcal{V} \to \infty$. Even for large
but finite volume, the rate of transitions from $|0\rangle$ to
$|0'\rangle$ is exponentially slow.

Similarly, we can consider a complex scalar field
theory with a sombrero potential:

$\mathcal{L} = |\dot{\phi}|^2 - |\nabla\phi|^2 - V$, \quad $V = \frac{1}{2}\lambda\left(|\phi|^2 - \frac{1}{2}\eta^2\right)^2$ \qquad [5]

This model is invariant under the U(1) group of phase
transformations, $\phi(x) \to \phi(x)e^{i\alpha}$, so we now have a
continuously infinite set of degenerate vacuum states
$|0_\alpha\rangle$ labeled by an angle $\alpha$, and satisfying

$\langle 0_\alpha|\phi(x)|0_\alpha\rangle = \frac{\eta}{\sqrt{2}}e^{i\alpha}$ \qquad [6]

Once again, one finds that in the infinite-volume
limit there are no matrix elements connecting the
different vacuum states. Moreover, in this limit no
polynomial formed from the field operators $\phi(x)$ in
a finite volume can have nonzero matrix elements
between $|0_\alpha\rangle$ and $|0_\beta\rangle$ for $\alpha \neq \beta$. Applying the
operators $\phi(x)$ to any one of these vacuum states
$|0_\alpha\rangle$, we can construct a Fock space $\mathcal{H}_\alpha$, and the
representations of the canonical commutation relations
on these separate Hilbert spaces are unitarily
inequivalent. Formally, we can introduce operators
$U_\alpha$ that perform the symmetry transformations:

$U_\alpha\phi(x)U_\alpha^{-1} = \phi(x)e^{-i\alpha}$ \qquad [7]

However, these are not unitary operators on the
spaces $\mathcal{H}_\beta$, but rather maps from one space to
another, $U_\alpha: \mathcal{H}_\beta \to \mathcal{H}_{\beta+\alpha}$, or, alternatively, operators
on the nonseparable Hilbert space $\mathcal{H} = \bigoplus_\alpha \mathcal{H}_\alpha$.

So far, our discussion has been restricted to the
tree approximation. For a full quantum treatment,
$V(\phi)$ must be replaced by the effective potential
$V_{eff}(\phi)$, which may be defined as the minimum value
of the mean energy density in all states in which the
field $\phi$ has the uniform expectation value $\langle\phi(x)\rangle = \phi$.
$V_{eff}$ may be computed by summing vacuum loop
diagrams.

A point to note is that although the degenerate
vacua $|0_\alpha\rangle$ are mathematically distinct, in the
absence of any external definition of phase, they
are physically identical. There is no internal observational
test that will distinguish them.


Symmetry-Breaking Phase Transitions 


Spontaneous symmetry breaking often occurs in the
context of a phase transition. At high temperature,
$T \gg \eta$, there are large fluctuations in $\phi$ and the
central hump of the potential is unimportant. Then
the equilibrium state is symmetric, with $\langle\phi\rangle = 0$.
However, as the temperature falls, it becomes less
probable that the field will fluctuate over the top of
the hump. It will tend to fall into the trough, and
acquire a nonzero average value $\langle\phi\rangle$ (the order
parameter for the phase transition), thus breaking
the symmetry. The direction of symmetry breaking
(e.g., the phase of $\phi$ in the U(1) model) is random,
determined in practice by small preexisting fluctuations
or interactions with the environment.

One way of studying this process is to compute
the temperature-dependent effective potential
$V_{eff}(\phi, T)$. In the one-loop approximation, at high
temperature, the leading corrections to the zero-temperature
effective potential $V_{eff}(\phi, 0)$ are of the
form

$V_{eff}(\phi, T) = V_{eff}(\phi, 0) - \frac{\pi^2}{90}N_*T^4 + \frac{1}{24}M_*^2(\phi)T^2 + O(T)$ \qquad [8]

where $N_*$ is the total number of helicity states of light
particles (those with masses $\ll T$), and $M_*^2$, which
depends on $\phi$, is the sum of their squared masses.
(Fermions if present contribute to $N_*$ with a factor of
7/8 and to $M_*^2$ with a factor of 1/2.) In the simplest
case, where we have only a multiplet $\phi = (\phi_a)_{a=1,\ldots,N}$
of real scalar fields, $N_* = N$ and $M_*^2 = M^2_{aa}$
(summation over a implied), where the mass-squared
matrix is

$M^2_{ab} = \frac{\partial^2 V}{\partial\phi_a\,\partial\phi_b}$ \qquad [9]

For example, in an O(N) theory, with
$V = (1/8)\lambda(\phi^2 - \eta^2)^2$, where $\phi^2 = \phi_a\phi_a$, one has

$M^2_{ab} = \frac{1}{2}\lambda(\phi^2 - \eta^2)\delta_{ab} + \lambda\phi_a\phi_b$ \qquad [10]

whence

$V_{eff}(\phi, T) = \frac{1}{8}\lambda(\phi^2 - \eta^2)^2 - \frac{\pi^2}{90}NT^4 + \frac{1}{48}\lambda T^2[(N+2)\phi^2 - N\eta^2]$ \qquad [11]

It is then easy to see that the minimum occurs at
$\phi = 0$ for $T > T_c$, where in this approximation
$T_c^2 = 12\eta^2/(N+2)$, while below the critical temperature
the minimum is at

$\phi^2 = \phi_{eq}^2(T) \equiv \eta^2 - \frac{N+2}{12}T^2$ \qquad [12]

As $T \to 0$, the equilibrium state approaches one of
the vacuum states $|0_n\rangle$, labeled by an N-dimensional
unit vector n, such that $\langle 0_n|\phi|0_n\rangle = \eta n$.
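
The location of the minimum can be checked numerically. The sketch below assumes the one-loop high-temperature form $V_{eff} = \frac{1}{8}\lambda(\phi^2 - \eta^2)^2 - \frac{\pi^2}{90}NT^4 + \frac{1}{48}\lambda T^2[(N+2)\phi^2 - N\eta^2]$ for the O(N) model and compares a brute-force grid minimization over $\phi^2$ with the closed-form result $\phi^2 = \eta^2 - (N+2)T^2/12$; the function names, the grid, and the parameter values are illustrative assumptions.

```python
import math


def v_eff(phi_sq, T, lam, eta, N):
    """High-temperature one-loop effective potential of the O(N) model."""
    tree = (lam / 8.0) * (phi_sq - eta ** 2) ** 2
    radiation = -(math.pi ** 2 / 90.0) * N * T ** 4
    thermal_mass = (lam * T ** 2 / 48.0) * ((N + 2) * phi_sq - N * eta ** 2)
    return tree + radiation + thermal_mass


def phi_eq_sq(T, eta, N):
    """Closed-form minimizer; zero in the symmetric phase."""
    return max(0.0, eta ** 2 - (N + 2) * T ** 2 / 12.0)


def critical_temperature(eta, N):
    """Temperature at which the minimum reaches phi = 0."""
    return math.sqrt(12.0 / (N + 2)) * eta


def grid_minimum(T, lam, eta, N, step=1e-3, upper=2.0):
    """Brute-force minimizer of v_eff over phi^2 in [0, upper]."""
    grid = [i * step for i in range(int(upper / step) + 1)]
    return min(grid, key=lambda p: v_eff(p, T, lam, eta, N))


# Hypothetical parameters: lambda = 0.5, eta = 1, N = 4.
below = grid_minimum(0.5, 0.5, 1.0, 4)   # broken phase, T < T_c
above = grid_minimum(2.0, 0.5, 1.0, 4)   # symmetric phase, T > T_c
```

Note that the position of the minimum does not depend on $\lambda$ in this approximation, since $\lambda$ multiplies both $\phi$-dependent terms.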

It is often convenient to introduce a classical
symmetry-breaking potential. For example, in the
O(N) model, we may take $V_{sb} = -j \cdot \phi(x)$, where j
is a constant N-vector. This has the effect of tilting the
potential, thus removing the degeneracy. A characteristic
of spontaneous symmetry breaking is that the
limits $j \to 0$ and $\mathcal{V} \to \infty$ do not commute. If (for
$T < T_c$) we take the infinite-volume limit first, and
then let $j \to 0$, we get different equilibrium states,
depending on the direction from which j approaches
zero; if we fix n and let $j = jn$, $j \to 0$, then we find

$\lim_{j \to 0}\lim_{\mathcal{V} \to \infty}\langle\phi(x)\rangle_{jn} = \phi_{eq}(T)\,n$ \qquad [13]

We may also regard j as representing an interaction
with the external environment (e.g., other
fields). If such a term is present during the cooling
of the system through the phase transition, it will
constrain the direction of the spontaneous symmetry
breaking. Note that one always arrives in this way
at one of the degenerate vacua $|0_n\rangle$, not a linear
combination of them.


Goldstone Bosons 


The Goldstone theorem states that spontaneous
breaking of any continuous global symmetry leads
inevitably (except, as we discuss later, in the
presence of long-range forces) to the appearance of
massless modes, the Goldstone bosons.

The proof is straightforward. Associated with any
continuous symmetry there is a Noether current
satisfying the continuity equation $\partial_\mu j^\mu = 0$ and such
that infinitesimal symmetry transformations are
generated by the spatial integral of $j^0$. The fact that
the symmetry is broken means that there is some
scalar field $\phi(x)$ whose vacuum expectation value
$\langle 0|\phi(0)|0\rangle$ is not invariant under the symmetry
transformation. Hence,

$\lim_{\mathcal{V} \to \infty} i\int_{\mathcal{V}} d^3x\,\langle 0|[j^0(x), \phi(0)]|0\rangle\big|_{x^0=0} \neq 0$ \qquad [14]

Moreover, the time derivative of this integral is

$\lim_{\mathcal{V} \to \infty} i\int_{\mathcal{V}} d^3x\,\langle 0|[\partial_0 j^0(x), \phi(0)]|0\rangle\big|_{x^0=0} = -\lim_{\mathcal{V} \to \infty} i\int_{\partial\mathcal{V}} dS_k\,\langle 0|[j^k(x), \phi(0)]|0\rangle\big|_{x^0=0} = 0$ \qquad [15]

where $\partial\mathcal{V}$ is the bounding surface of $\mathcal{V}$. This vanishes
because the surface integral is zero: in a relativistic
theory, because the commutator vanishes at spacelike
separation, and more generally in the absence of
long-range interactions because it tends rapidly to
zero at large spatial separation.

Now, inserting a complete set of momentum
eigenstates $|n, p\rangle$ in [14], we can see that there must
exist states such that $\langle n, p|\phi(0)|0\rangle \neq 0$, with $p^0 \to 0$
in the limit $|p| \to 0$, that is, massless modes.

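One can see this more directly in the U(1) model
above. Consider a vacuum state $|0\rangle$ such that
$\langle 0|\phi|0\rangle = \eta/\sqrt{2}$ is real. Then it is useful to shift the
origin of $\phi$ by writing

$\phi(x) = \frac{1}{\sqrt{2}}[\eta + \varphi_1(x) + i\varphi_2(x)]$ \qquad [16]

where $\varphi_1$ and $\varphi_2$ are real. Then the Lagrangian
becomes

$\mathcal{L} = \frac{1}{2}\left[\dot{\varphi}_1^2 - (\nabla\varphi_1)^2 + \dot{\varphi}_2^2 - (\nabla\varphi_2)^2\right] - \frac{1}{2}\lambda\eta^2\varphi_1^2 - \frac{1}{2}\lambda\eta\,\varphi_1(\varphi_1^2 + \varphi_2^2) - \frac{1}{8}\lambda(\varphi_1^2 + \varphi_2^2)^2$ \qquad [17]

Evidently, the field $\varphi_1$, corresponding to radial
oscillations in $\phi$, is massive, with mass $\sqrt{\lambda}\,\eta$. But
there is no term in $\varphi_2^2$, so $\varphi_2$ is massless.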
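
The mass spectrum can be read off numerically from the curvature of the potential at the vacuum. The following is a minimal sketch, assuming the sombrero potential $V = \frac{1}{2}\lambda(|\phi|^2 - \frac{1}{2}\eta^2)^2$ with $\phi = (\eta + \varphi_1 + i\varphi_2)/\sqrt{2}$; it estimates the two diagonal second derivatives at $\varphi_1 = \varphi_2 = 0$ by central differences. The function names and parameter values are illustrative assumptions.

```python
def potential(phi1, phi2, lam, eta):
    """Sombrero potential expressed in the shifted real fields."""
    mod_sq = 0.5 * ((eta + phi1) ** 2 + phi2 ** 2)  # |phi|^2
    return 0.5 * lam * (mod_sq - 0.5 * eta ** 2) ** 2


def second_derivative(f, i, h=1e-4):
    """Central-difference estimate of d^2 f / d x_i^2 at the origin."""
    step = [0.0, 0.0]
    step[i] = h
    forward = f(step[0], step[1])
    backward = f(-step[0], -step[1])
    return (forward - 2.0 * f(0.0, 0.0) + backward) / h ** 2


def masses_squared(lam, eta):
    """Return (m_1^2, m_2^2) for the radial and angular fields."""
    v = lambda p1, p2: potential(p1, p2, lam, eta)
    return second_derivative(v, 0), second_derivative(v, 1)


# Hypothetical parameters: lambda = 0.5, eta = 2, so lambda * eta^2 = 2.
m1_sq, m2_sq = masses_squared(0.5, 2.0)
```

The radial curvature reproduces $\lambda\eta^2$, while the angular curvature vanishes up to discretization error, as expected for a Goldstone mode.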

In the case of spontaneous symmetry breaking of
nonabelian symmetries, there may be several Goldstone
bosons, one for each broken component of the
continuous symmetry. In our theory with symmetry
group G = O(N), the possible values of the vacuum
expectation value at T = 0 are $\langle 0_n|\phi(0)|0_n\rangle = \eta n$,
where n is an arbitrary unit vector. In this case, for
given n, there is an unbroken symmetry subgroup

$H = \{R \in O(N): Rn = n\} = O(N-1)$ \qquad [18]

and the number of broken symmetries is

$\dim G - \dim H = N - 1$ \qquad [19]

Thus, the radial component of $\phi$ is massive, and
there are N - 1 Goldstone bosons, the N - 1
transverse components.
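
The counting in [19] follows from $\dim O(N) = N(N-1)/2$. A short sketch of the arithmetic (the function names are illustrative):

```python
def dim_orthogonal(n):
    """Dimension of the Lie group O(n), i.e. of its Lie algebra so(n)."""
    return n * (n - 1) // 2


def goldstone_count(n):
    """Broken generators for the pattern O(n) -> O(n - 1)."""
    return dim_orthogonal(n) - dim_orthogonal(n - 1)


# One Goldstone boson per transverse direction, for a range of N.
counts = {n: goldstone_count(n) for n in range(2, 8)}
```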


Spontaneously Broken Gauge Theories 


As we shall see, symmetry breaking in gauge 
theories is a more problematic concept but, for the 
moment, these complications are ignored and the 
present discussion will continue with an approach 
similar to that used above. 

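The simplest local gauge symmetry theory is a
U(1) Higgs model, a model of a complex scalar field
$\phi(x)$ interacting with a gauge potential $A_\mu(x)$,
described by the Lagrangian

$\mathcal{L} = D_\mu\phi^* D^\mu\phi - \frac{1}{4}F_{\mu\nu}F^{\mu\nu} - V(|\phi|)$ \qquad [20]

where V is a sombrero potential as in [5], while the
covariant derivative $D_\mu\phi$ and gauge field $F_{\mu\nu}$ are
given by

$D_\mu\phi = \partial_\mu\phi + ieA_\mu\phi$, \quad $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ \qquad [21]

The model is invariant under the local U(1) gauge
transformations

$\phi(x) \to \phi(x)e^{i\alpha(x)}$, \quad $A_\mu(x) \to A_\mu(x) - \frac{1}{e}\partial_\mu\alpha(x)$ \qquad [22]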
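
The invariance rests on the covariant derivative transforming like the field itself, $D_\mu\phi \to e^{i\alpha(x)}D_\mu\phi$. The sketch below checks this numerically in one dimension, assuming $D\phi = \partial\phi + ieA\phi$, $\phi \to \phi e^{i\alpha}$ and $A \to A - (1/e)\partial\alpha$; the sample functions `phi`, `A`, `alpha` and the coupling value are arbitrary illustrative choices, and derivatives are taken by central differences.

```python
import cmath
import math

E = 0.7  # hypothetical value of the gauge coupling e


def phi(x):
    """Arbitrary smooth complex test field."""
    return complex(math.cos(x), 0.5 * math.sin(2.0 * x))


def A(x):
    """Arbitrary smooth test gauge potential."""
    return 0.3 * x ** 2


def alpha(x):
    """Arbitrary smooth gauge parameter."""
    return math.sin(3.0 * x)


def d(f, x, h=1e-5):
    """Central-difference derivative; works for complex-valued f."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def covariant_derivative(field, potential, x, e=E):
    """D phi = d phi/dx + i e A phi at the point x."""
    return d(field, x) + 1j * e * potential(x) * field(x)


def transformed_phi(x):
    return phi(x) * cmath.exp(1j * alpha(x))


def transformed_A(x, e=E):
    return A(x) - d(alpha, x) / e


x0 = 0.4
lhs = covariant_derivative(transformed_phi, transformed_A, x0)
rhs = cmath.exp(1j * alpha(x0)) * covariant_derivative(phi, A, x0)
```

The two sides agree up to finite-difference error, because the inhomogeneous term in the transformation of A cancels the derivative of the phase.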

The Goldstone theorem does not apply to local-symmetry
theories. The problem is that to have a
Hilbert space containing only physical states one
must eliminate the gauge freedom by choosing a
gauge condition (e.g., in the U(1) case the Coulomb
gauge $\partial_k A^k(x) = 0$, which has the effect of restricting
the number of polarization states of photons to
two). This necessarily breaks manifest Lorentz
invariance, although the theory is, of course, still
fully Lorentz invariant. The proof of the theorem
fails because the current is no longer local; the long-range
Coulomb interaction makes the commutator
fall off only like $1/r^2$, so the surface integral no
longer vanishes in the infinite-volume limit. (The
theorem also fails for nonrelativistic models with
long-range forces.)
Again, consider a vacuum state $|0\rangle$ in which
$\langle 0|\phi|0\rangle = \eta/\sqrt{2}$, and make the same decomposition,
[16]. Then, if we set

$A'_\mu = A_\mu + \frac{1}{e\eta}\partial_\mu\varphi_2$ \qquad [23]

we find that the kinetic term for $\varphi_2$ has been
absorbed into a mass term $(1/2)e^2\eta^2 A'_\mu A'^\mu$ for the
vector field. We have a model with only massive
fields: the "Higgs field" $\varphi_1$ with mass $\sqrt{\lambda}\,\eta$ and the
gauge field $A'_\mu$ with mass $e\eta$. The Goldstone bosons
have been "eaten up" by the vector field to provide
its longitudinal mode. This is the Higgs mechanism,
first noted by Anderson in the context of the photon
in a plasma becoming a massive plasmon.

A more elegant way of seeing this is to note that
we can always make a gauge transformation to
ensure that $\phi$ is real (at least so long as $\phi \neq 0$; where
it is zero, there may be problems). This means that
$\phi(x) = (1/\sqrt{2})(\eta + \varphi_1)$; $\varphi_2$ disappears altogether, and
its kinetic term reduces to $(1/2)e^2 A_\mu A^\mu(\eta + \varphi_1)^2$,
which includes the mass term for $A_\mu$ as well as cubic
and quartic interaction terms.

As before, the discussion can be generalized to
nonabelian theories, although there are additional
problems to be discussed later. If we have a local
symmetry group G that breaks spontaneously to
leave an unbroken subgroup H, then the gauge fields
associated with H remain massless. Each of the
(dim G - dim H) complementary fields "eats up"
one of the Goldstone bosons, becoming massive in
the process. We are left only with other, "radial"
components of $\phi$, the massive Higgs fields.

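Consider, for example, a local SO(3) model,
with scalar fields $\boldsymbol{\phi} = (\phi_a)_{a=1,2,3}$ and gauge potentials
$\boldsymbol{A}_\mu = (A_{a\mu})$. The infinitesimal gauge transformations are

$\delta\boldsymbol{\phi} = \delta\boldsymbol{\omega} \times \boldsymbol{\phi}, \qquad \delta\boldsymbol{A}_\mu = \delta\boldsymbol{\omega} \times \boldsymbol{A}_\mu - \frac{1}{e}\,\partial_\mu \delta\boldsymbol{\omega}$   [24]

where $\delta\boldsymbol{\omega}$ is the gauge parameter. The Lagrangian is

$\mathcal{L} = \frac{1}{2} D_\mu\boldsymbol{\phi} \cdot D^\mu\boldsymbol{\phi} - \frac{1}{4} \boldsymbol{F}_{\mu\nu} \cdot \boldsymbol{F}^{\mu\nu} - V(\boldsymbol{\phi})$   [25]

where the covariant derivative and gauge field are

$D_\mu\boldsymbol{\phi} = \partial_\mu\boldsymbol{\phi} + e\boldsymbol{A}_\mu \times \boldsymbol{\phi}$
$\boldsymbol{F}_{\mu\nu} = \partial_\mu\boldsymbol{A}_\nu - \partial_\nu\boldsymbol{A}_\mu + e\boldsymbol{A}_\mu \times \boldsymbol{A}_\nu$   [26]

If we take $\langle\boldsymbol{\phi}\rangle$ in the 3-direction, the fields $A_{1\mu}$ and
$A_{2\mu}$ absorb the Goldstone fields $\varphi_1$, $\varphi_2$ to become
massive. As in the abelian case, we can use the local
SO(3) invariance to rotate $\boldsymbol{\phi}$ everywhere to the
3-direction, and write $\boldsymbol{\phi} = (0, 0, \eta + \varphi_3)$. In this
gauge the kinetic term $(1/2)(e\boldsymbol{A}_\mu \times \boldsymbol{\phi})^2$ gives a mass
$e\eta$ to the fields $A_{1\mu}$, $A_{2\mu}$, while $A_{3\mu}$ remains massless,
and the Higgs field $\varphi_3$ again has mass $\sqrt{\lambda}\,\eta$.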
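
The mass pattern follows from the identity $|\boldsymbol{A} \times \boldsymbol{\phi}|^2 = A^2\phi^2 - (\boldsymbol{A}\cdot\boldsymbol{\phi})^2$, which turns the term $(1/2)(e\boldsymbol{A} \times \boldsymbol{\phi})^2$ into a quadratic form in the gauge fields. The sketch below builds that mass matrix for the vacuum value (0, 0, η); the function names and the numerical values of η and e are arbitrary choices for illustration.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def gauge_mass_matrix(phi, e):
    """M2 with (1/2)(e A x phi)^2 = (1/2) sum_ab A_a M2[a][b] A_b (isospin indices a, b)."""
    phi_sq = sum(p * p for p in phi)
    return [[e * e * ((phi_sq if a == b else 0.0) - phi[a] * phi[b])
             for b in range(3)] for a in range(3)]


def mass_term(A, phi, e):
    """Direct evaluation of (1/2)(e A x phi)^2 for one spacetime component of A."""
    return 0.5 * e * e * sum(x * x for x in cross(A, phi))


eta, e = 2.0, 0.5  # illustrative values only
M2 = gauge_mass_matrix((0.0, 0.0, eta), e)
for row in M2:
    print(row)
```

The matrix is diagonal with entries e²η², e²η², 0: two massive fields and one massless field, the latter belonging to rotations about the 3-direction.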




Elitzur’s Theorem; the Role of 
Gauge Fixing 


The concept of spontaneous symmetry breaking in 
the context of a local symmetry requires further 
discussion, in particular because of Elitzur’s theo- 
rem, proved in 1975, which states in essence that 
“spontaneous breaking of a local symmetry is 
impossible.” In the light of this theorem, it may 
seem that a “spontaneously broken gauge theory” is 
an oxymoron. In fact, it means something rather 
different, although even that is not unproblematic. 

The theorem was proved in the context of lattice 
gauge theory, where the spatial continuum is 
replaced by a discrete lattice. The scalar field is
then represented by values $\boldsymbol{\phi}_x$ at each lattice site, and
the gauge potential by values $A_{x,\mu}$ on the links of the
lattice. This is significant because on the lattice one 
can use a manifestly gauge-invariant formalism. 
Expectation values of gauge-invariant physical 
variables can be found, for example, by a Monte 
Carlo algorithm that effectively averages over all 
possible gauges. In this context, it is possible to 
show that the expectation value of any gauge-noninvariant
operator (such as $\boldsymbol{\phi}_x$) necessarily
vanishes identically.

To be more specific, suppose we incorporate a
symmetry-breaking term of the form $-\boldsymbol{j} \cdot \sum_x \boldsymbol{\phi}_x$, and
consider the limits $V \to \infty$ followed by $j \to 0$. In the
global-symmetry case, as we noted earlier, this yields
the nonzero result [13]. However, in the case of a
local gauge symmetry, one can show rigorously that

$\lim_{j \to 0} \lim_{V \to \infty} \langle \boldsymbol{\phi}_x \rangle_j = 0$   [27]


The essential reason for this is that we can make a
gauge transformation in the neighborhood of the
point x to make $\boldsymbol{\phi}_x$ have any value we like without
changing the energy by more than a very small
amount that goes to zero as $j \to 0$. Within this
manifestly gauge-invariant formalism, it is clear that
the expectation value of a gauge-noninvariant
operator such as $\boldsymbol{\phi}$ is not an appropriate order
parameter. One must instead look for a gauge-invariant
order parameter.

It is important to note, however, that this result 
applies only in the context of a manifestly gauge- 
invariant formalism. But, in general, gauge theories 
cannot be quantized in a manifestly gauge-invariant 
way. In a path-integral formalism, the action
functional, which appears in the exponent, is 
constant along the orbits of the gauge-group action. 
Consequently, the integral contains an infinite 
factor, the volume of the (infinite-dimensional) 
gauge group. There are corresponding divergences
in the perturbation series. As is well known, this
problem can be dealt with by introducing a gauge- 
fixing term, which explicitly breaks the gauge 
symmetry, and renders Elitzur’s theorem inapplic- 
able. But this procedure leaves a global symmetry 
unbroken, and it is in fact that global symmetry that 
is broken spontaneously. 

One example is the Landau-Ginzburg model of a
superconductor, which is essentially just the nonrelativistic
limit of the abelian Higgs model,
although there is one significant difference: here
the field $\phi$ annihilates a Cooper pair, a bound pair
of electrons with equal and opposite momenta and
spins, so $e$ above is replaced by the charge $2e$ of a
Cooper pair. The appearance of a condensate of
Cooper pairs in the low-temperature superconducting
phase corresponds to a state in which $\langle\phi\rangle$ is
nonzero. This would not be possible without fixing
a gauge. In the nonrelativistic context, the obvious
gauge to choose is the Coulomb gauge, defined by
the condition $\partial_k A^k = 0$. This gauge-fixing condition
breaks the local symmetry explicitly, but it leaves
unbroken the global symmetry $\phi(x) \to \phi(x)e^{i\alpha}$ with
constant $\alpha$. It is that global symmetry that is
spontaneously broken when $\langle\phi\rangle \neq 0$.

For a model with nonabelian local symmetry the 
standard procedure used to derive a perturbation 
expansion is that of Faddeev and Popov. Consider, 
for example, the SO(3) gauge theory discussed in the 
preceding section. To fix the gauge, we can choose a 
set of functions $\boldsymbol{F} = (F_a)$ of the fields, and introduce
into the path integral a gauge-fixing term of the form

$\mathcal{L}_{\rm gf} = -\frac{1}{2\xi}\,\boldsymbol{F}^2$   [28]

where $\xi$ is an arbitrary real constant. However, to
ensure that this does not bias the integral, so that the
gauge-fixed theory is at least formally equivalent to
the original gauge-invariant theory, one must also
include the determinant of the Jacobian matrix

$J_{ab}(x, y) = \frac{\delta F_a(x)}{\delta\omega_b(y)}$   [29]

The easiest way to do this is to introduce Faddeev-Popov
ghost fields $\boldsymbol{c}$, $\bar{\boldsymbol{c}}$, which are scalar Grassmann
variables, and an appropriate term in the
Lagrangian

$\mathcal{L}_{\rm FP} = \bar{\boldsymbol{c}} \cdot J \cdot \boldsymbol{c}$   [30]


For the SO(3) model, a convenient choice of gauge is
the $R_\xi$ gauge defined by

$\boldsymbol{F} = \partial_\mu \boldsymbol{A}^\mu - \xi e \eta\, \boldsymbol{n} \times \boldsymbol{\phi}$   [31]

where $\boldsymbol{n}$ is an arbitrarily chosen unit vector. It is
clear that the full Lagrangian $\mathcal{L} + \mathcal{L}_{\rm gf} + \mathcal{L}_{\rm FP}$ is no
longer invariant under the full SO(3) gauge group,
although there is a residual U(1) gauge invariance
corresponding to rotations about $\boldsymbol{n}$. In this gauge,
the arbitrary choice of $\boldsymbol{n}$ means that the global
SO(3) symmetry is also broken. However, for other
choices, such as the Lorentz gauge $\boldsymbol{F} = \partial_\mu \boldsymbol{A}^\mu$ or
axial gauge $\boldsymbol{F} = \boldsymbol{A}_3$, the Lagrangian is invariant
under global SO(3) rotations of all the fields. This
global symmetry is then spontaneously broken, with
$\boldsymbol{\phi}$ acquiring as before a nonzero expectation value of
the form $\langle \boldsymbol{\phi}(x) \rangle = \eta \boldsymbol{n}$.

It is interesting to look again at the particle
content of this model. By setting $\boldsymbol{\phi}(x) = \eta\boldsymbol{n} + \boldsymbol{\varphi}(x)$
with $\boldsymbol{n} = (0, 0, 1)$, one finds that in the quadratic part
of the Lagrangian, the cross-terms between $\boldsymbol{A}_\mu$ and $\boldsymbol{\varphi}$
combine to form a total divergence which can be
dropped. As before, $\varphi_3$ is the Higgs field, with
$m^2 = \lambda\eta^2$, $A_{3\mu}$ is the massless gauge field corresponding
to the unbroken gauge symmetry, and the
three transverse components of $A_{1\mu}$ and $A_{2\mu}$
represent the massive vector fields, with $m^2 = e^2\eta^2$.
There are, however, also unphysical fields with
$\xi$-dependent masses: $\varphi_{1,2}$, $c_{1,2}$, $\bar{c}_{1,2}$, and the longitudinal
components $\partial_\mu A^\mu_{1,2}$ all have $m^2 = \xi e^2\eta^2$. We
can now compute the effective potential $V_{\rm eff}(T, \phi)$.
One point that should be noted in performing this
calculation is that the ghost fields $\boldsymbol{c}$, $\bar{\boldsymbol{c}}$ contribute
negatively. Obviously, $V_{\rm eff}$, being $\xi$-dependent, is
not itself physically meaningful. Nevertheless, it can
be shown that the stationary points of $V_{\rm eff}$ are
physical, and correspond to the possible equilibrium
states of the theory. Moreover, the extremal values
of $V_{\rm eff}$ are independent of $\xi$ and give correctly the
thermodynamic potential in the corresponding equilibrium
states. The negative contributions from the
ghost fields to $N_*$ and $M_*^2$ ensure that the $\xi$
dependence cancels out, and we find as expected
$N_* = 9$ and $M_*^2 = (\lambda + 6e^2)\eta^2$.


Phase Transitions and Crossovers 


Our discussion so far has for the most part been 
restricted to a semiclassical or mean-field approx- 
imation. It is important to bear in mind, however, 
that this approximation does not suffice to deter- 
mine whether a phase transition (where the thermo- 
dynamic free energy is nonanalytic) exists, or what 
its nature is. Determining the detailed characteristics 
of phase transitions requires other methods, such as 
the renormalization group or lattice simulations. In 
many cases, it is far from trivial to establish the 
order of the transition, or even whether a true phase 
transition actually exists. 

Gauge theories pose particular problems because
of the infrared divergences in the thermal field 
theory at high temperature, and because in asymp- 
totically free nonabelian theories the coupling 
becomes large at very low energy. Even when they 
appear to exhibit spontaneous symmetry breaking, 
they do not necessarily undergo a true phase 
transition. Lattice gauge theory calculations have 
led to the conclusion that in nonabelian gauge 
theories with the Higgs field in the fundamental 
representation, there are values of the coupling 
constants for which there is no phase transition, 
only a rapid but smooth crossover from one type of 
behavior to another, so that the high- and low- 
temperature phases are analytically connected. If the 
coupling constant is small, there is a first-order 
phase transition, and for moderate values the theory 
exhibits a very rapid crossover that looks quite 
similar to a symmetry-breaking phase transition. 
Nevertheless, the analytic connection between the 
two phases implies that there cannot exist an order 
parameter that is strictly zero above the transition 
and nonzero below it. 

In particular, it appears that for physical values
of the Higgs mass, the electroweak theory does not
in fact undergo a true phase transition. It is
somewhat ironic that the most famous example of a 
spontaneously broken gauge theory probably does 
not, strictly speaking, exhibit a symmetry-breaking 
phase transition! 


Conclusions 


We have discussed the main features of spontaneous 
symmetry breaking in both the global- and local- 
symmetry cases, especially the appearance of Gold- 
stone bosons when a continuous global symmetry 
breaks, and their elimination in the local-symmetry 
case by the Higgs mechanism, as well as the 
problems attaching to the concept of spontaneous 
symmetry breaking in gauge theories. 


See also: Abelian Higgs Vortices; Effective Field 
Theories; Electroweak Theory; Finite Group Symmetry 
Breaking; Lattice Gauge Theory; Noncommutative 
Geometry and the Standard Model; Phase Transitions in 
Continuous Systems; Quantum Central Limit Theorems; 
Quantum Spin Systems; Symmetries in Quantum Field 
Theory of Lower Spacetime Dimensions; Topological 
Defects and their Homotopy Classification. 


Introduction


A classification of random matrix ensembles by 
symmetries was first established by Dyson, in an 
influential 1962 paper with the title “the threefold 
way: algebraic structure of symmetry groups and 
ensembles in quantum mechanics.” Dyson’s three- 
fold way has since become fundamental to various 
areas of theoretical physics, including the statistical 
theory of complex many-body systems, mesoscopic 
physics, disordered electron systems, and the field of 
quantum chaos. 

Over the last decade, a number of random matrix 
ensembles beyond Dyson’s classification have come 
to the fore in physics and mathematics. On the 
physics side, these emerged from work on the low- 
energy Dirac spectrum of quantum chromodynamics 
(QCD) and from the mesoscopic physics of low- 
energy quasiparticles in disordered superconductors. 
In the mathematical research area of number theory, 
the study of statistical correlations in the values of 
Riemann zeta and similar functions has prompted 
some of the same generalizations. 

In this article, Dyson’s fundamental result will be 
reviewed from a modern perspective, and the recent 
extension of Dyson’s threefold way will be moti- 
vated and described. In particular, it will be 
explained why symmetry classes are associated 
with large families of symmetric spaces. 


The Framework 


Random matrices have their physical origin in the 
quantum world, more precisely in the statistical 
theory of strongly interacting many-body systems 
such as atomic nuclei. Although random matrix 
theory is nowadays understood to be of relevance to
numerous areas of physics (see Random Matrix
Theory in Physics), quantum mechanics is still
where many of its applications lie. Quantum 
mechanics also provides a natural framework in 
which to classify random matrix ensembles. 
Following Dyson, the mathematical setting for 
classification consists of two pieces of data: 


• A finite-dimensional complex vector space V with
  a Hermitian scalar product $\langle\cdot,\cdot\rangle$, called a “unitary
  structure” for short. (In physics applications,
  V will usually be the truncated Hilbert space of
  a family of quantum Hamiltonian systems.)

• On V there acts a group G of unitary and
  antiunitary operators (the joint symmetry group
  of the multiparameter family of quantum systems).


Given this setup, one is interested in the linear space 
of self-adjoint operators on V — the Hamiltonians H 
— with the property that they commute with the 
G-action. Such a space is reducible in general, that 
is, the matrix of H decomposes into blocks. The goal 
of classification is to list all of the irreducible blocks 
that occur. 


Symmetry Groups 


Basic to classification is the notion of a symmetry 
group in quantum Hamiltonian systems, a notion 
that will now be explained. 

In classical mechanics, the symmetry group $G_0$ of
a Hamiltonian system is understood to be the group 
of canonical transformations that commute with the 
phase flow of the system. An important example is 
the rotation group for systems in a central field. 

In passing from classical to quantum mechanics, 
one replaces the classical phase space by a quantum- 
mechanical Hilbert space V and assigns to the 
symmetry group $G_0$ a (projective) representation by
unitary $\mathbb{C}$-linear operators on V. Besides the one-parameter
continuous subgroups, whose significance
is highlighted by Noether’s theorem, the components
of $G_0$ not connected with the identity play an
important role. A prominent example is provided by
the operator for space reflection. Its eigenspaces are 
the subspaces of states with positive and negative 
parity; these reduce the matrix of any reflection- 
invariant Hamiltonian to two blocks. 
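
The block reduction by parity can be checked on a small matrix. The sketch below is an illustration under stated assumptions: a hypothetical real symmetric Hamiltonian on a 4-site chain, invariant under the reflection i -> 3 - i, is rewritten in a basis of even and odd states. Neither the matrix entries nor the helper names come from the text.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix stored as nested lists."""
    return [list(col) for col in zip(*a)]


# Illustrative reflection-invariant Hamiltonian: H[i][j] == H[3 - i][3 - j].
H = [[1.0, 0.5, 0.0, 0.2],
     [0.5, 2.0, 0.3, 0.0],
     [0.0, 0.3, 2.0, 0.5],
     [0.2, 0.0, 0.5, 1.0]]

s = 1.0 / math.sqrt(2.0)
# Columns of B: two even-parity states followed by two odd-parity states.
B = [[s, 0.0, s, 0.0],
     [0.0, s, 0.0, s],
     [0.0, s, 0.0, -s],
     [s, 0.0, -s, 0.0]]

# Matrix of H in the parity basis; the even-odd entries vanish.
H_parity = matmul(transpose(B), matmul(H, B))
for row in H_parity:
    print([round(x, 12) for x in row])
```

The upper-left 2 x 2 block acts on the even states and the lower-right block on the odd states; the two off-diagonal blocks are zero up to rounding.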

Not all symmetries of a quantum-mechanical 
system are of the canonical, unitary kind: the 
prime counterexample is the operation of inverting 
the time direction, called time reversal for short. In 
classical mechanics, this operation reverses the sign 
of the symplectic structure of phase space; in 
quantum mechanics, its algebraic properties reflect 
the fact that inverting the time direction, $t \to -t$,
amounts to sending $i = \sqrt{-1}$ to $-i$. Indeed, time $t$
enters in the Dirac, Pauli, or Schrödinger equation
as $i\hbar\, d/dt$. Therefore, time reversal is represented in
the quantum theory by an antiunitary operator T,
which is to say that T is complex antilinear:

$T(zv) = \bar{z}\, Tv \qquad (z \in \mathbb{C},\ v \in V)$

and preserves the Hermitian scalar product or
unitary structure up to complex conjugation:

$\langle Tv_1, Tv_2 \rangle = \overline{\langle v_1, v_2 \rangle} = \langle v_2, v_1 \rangle$


Another operation of this kind is charge conjugation 
in relativistic theories such as the Dirac equation. 

By the symmetry group G of a quantum-mechanical 
system with Hamiltonian H, one then means the group 
of all unitary and antiunitary transformations g of V 
that leave the Hamiltonian invariant: $gHg^{-1} = H$. We
denote the unitary subgroup of G by $G_0$, and the set of
antiunitary operators in G by $G_1$ (not a group). If V
carries extra structure, as will be the case for some 
extensions of Dyson’s basic scheme, the action of G on 
V has to be compatible with that structure. 

The set $G_1$ may be empty. When it is not, the
composition of any two elements of $G_1$ is unitary, so
every $g \in G_1$ can be obtained from a fixed element of
$G_1$, say T, by right multiplication with some $U \in G_0$:
$g = TU$. In other words, when $G_1$ is nonempty the
coset space $G/G_0$ consists of exactly two elements, $G_0$
and $T \cdot G_0 = G_1$. We shall assume that T represents
some inversion symmetry such as time reversal or
charge conjugation. T must then be a (projective)
involution, that is, $T^2 = z \times \mathrm{Id}$ with z a complex
number of unit modulus, so that conjugation by $T^2$ is
the identity operation. Since T is complex antilinear,
the associative law $T^2 \cdot T = T \cdot T^2$ forces z to be real
(the left-hand side equals $zT$, the right-hand side $\bar{z}T$),
and hence $T^2 = \pm\mathrm{Id}$.
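
Both signs occur in simple matrix models. Writing an antiunitary operator as T = uK, with u a unitary matrix and K complex conjugation in a fixed basis, one has T² = u ū. The sketch below evaluates this for two standard choices of u; the representation T = uK and the function name are assumptions made for illustration.

```python
import cmath


def t_squared(u):
    """Matrix of T^2 for T = u K (K = complex conjugation): T^2 = u * conj(u)."""
    n = len(u)
    return [[sum(u[i][k] * u[k][j].conjugate() for k in range(n))
             for j in range(n)] for i in range(n)]


spinless = [[1, 0], [0, 1]]     # T = K
spin_half = [[0, 1], [-1, 0]]   # T = (i sigma_y) K

print(t_squared(spinless))      # +Id
print(t_squared(spin_half))     # -Id

# Multiplying u by a phase does not change T^2: the sign is intrinsic.
phase = cmath.exp(0.7j)
rephased = [[phase * x for x in row] for row in spin_half]
print(t_squared(rephased))
```

The last line shows that rephasing T leaves T² unchanged, which is the matrix version of the statement that z must be real.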

Finding the total symmetry group of a Hamiltonian 
system need not always be straightforward, but this 
complication will not be an issue here: we take the 
symmetry group G and its action on the Hilbert 
space V as fundamental and given, and then ask 
what are the corresponding symmetry classes,
meaning the irreducible spaces of Hamiltonians on 
V that commute with G. 

For technical reasons, we assume the group $G_0$ to
be compact; this is an assumption that covers most 
(if not all) of the cases of interest in physics. The 
noncompact group of space translations can be 
incorporated, if necessary, by wrapping the system 
around a torus, whereby translations are turned into 
compact torus rotations. 

While the primary objects to classify are the 
spaces of Hamiltonians H, we shall focus for 
convenience on the spaces of time evolutions 
$U_t = e^{-itH/\hbar}$ instead. This change of focus results in
no loss, as the Hamiltonians can always be retrieved 
by linearizing in t at t=0. 


Symmetric Spaces 


We appropriate a few basic facts from the theory of 
symmetric spaces. 

Let M be a connected m-dimensional Riemannian
manifold and p a point of M. In some open subset
$N_p$ of a neighborhood of p there exists a map
$s_p : N_p \to N_p$, the geodesic inversion with respect to
p, which sends a point $x \in N_p$ with normal
coordinates $(x_1, \ldots, x_m)$ to the point with normal
coordinates $(-x_1, \ldots, -x_m)$. The Riemannian manifold
M is called locally symmetric if the geodesic
inversion is an isometry, and is called globally
symmetric if $s_p$ extends to an isometry $s_p : M \to M$,
for all $p \in M$. A globally symmetric Riemannian
manifold is called a symmetric space for short.

The Riemann curvature tensor of a symmetric 
space is covariantly constant, which leads one to 
distinguish between three cases: the scalar curvature 
can be positive, zero, or negative, and the symmetric 
space is said to be of compact type, Euclidean type, 
or noncompact type, respectively. (In mesoscopic 
physics, each type plays a role: the first provides us 
with the scattering matrices and time evolutions, the 
second with the Hamiltonians, and the third with 
the transfer matrices.) The focus in the current 
article will be on compact type, as it is this type that 
houses the unitary time evolution operators of 
quantum mechanics. The compact symmetric spaces 
are subdivided into two major subtypes, both of 
which occur naturally in the present context, as 
follows. 


Type II


Consider first the case where the antiunitary
component $G_1$ of the symmetry group is empty, so
the data are (V, G) with $G = G_0$. Let $\mathcal{U}(V)$ denote
the group of all complex linear transformations that
leave the structure of the vector space V invariant.
Thus, $\mathcal{U}(V)$ is a group of unitary transformations if
V carries no more than the usual Hermitian scalar
product; and is some subgroup of the unitary group
if V does have extra structure (as is the case for the
Nambu space of quasiparticle excitations in a
superconductor). The symmetry group $G_0$, by acting
on V and preserving its structure, is contained as a
subgroup in $\mathcal{U}(V)$.

Let now H be any Hamiltonian with the prescribed
symmetries. Then the time evolution
$t \mapsto U_t = e^{-itH/\hbar}$ generated by H is a one-parameter
subgroup of $\mathcal{U}(V)$ which commutes with the
$G_0$-action. The total set of transformations $U_t$ that
arise in this way is called the (connected part of the)
“centralizer” of $G_0$ in $\mathcal{U}(V)$, and is denoted by Z.
This is the “good” set of unitary time evolutions — 
the set compatible with the given symmetries of an 
ensemble of quantum systems. 

The centralizer Z is obviously a group: if U and 
U’ belong to Z, then so do their inverses and their 
product. What can one say about the structure of 
the group Z? 

Since $G_0$ is compact by assumption, its group
action on V is completely reducible and V is
guaranteed to have an orthogonal decomposition

$V = \bigoplus_\lambda V_\lambda$

where the sum runs over isomorphism classes of
irreducible $G_0$-representations $\lambda$, and the vector
spaces $V_\lambda$ are called the $G_0$-isotypic components of
V. For example, if $G_0$ is the rotation group $SO_3$, the
$G_0$-isotypic component $V_\lambda$ of V is the subspace
spanned by all the states with total angular
momentum $\lambda$.

Consider now any $U \in Z$. Since U commutes with
the $G_0$-action, it does not connect different
$G_0$-isotypic components. (Indeed, in the example of
$SO_3$-invariant dynamics, angular momentum is
conserved and transitions between different angular
momentum sectors are forbidden.) Thus, every
$G_0$-isotypic component $V_\lambda$ is an invariant subspace
for the action of Z on V, and Z decomposes as
$Z = \prod_\lambda Z_\lambda$ with blocks $Z_\lambda = Z|_{V_\lambda}$.

To say more, fix a standard irreducible
$G_0$-module $R_\lambda$ of isomorphism class $\lambda$ and consider

$L_\lambda = \mathrm{Hom}_{G_0}(R_\lambda, V_\lambda)$

the linear space of $\mathbb{C}$-linear maps $l : R_\lambda \to V_\lambda$ that
intertwine the $G_0$-actions on $R_\lambda$ and $V_\lambda$. An element
of $L_\lambda$ is called a $G_0$-equivariant homomorphism. By
Schur’s lemma, $L_\lambda \simeq \mathbb{C}$ if $V_\lambda$ is $G_0$-irreducible. More
generally, $\dim L_\lambda =: m_\lambda$ counts the multiplicity of
occurrence of $R_\lambda$ in $V_\lambda$; for example, in the case of
$G_0 = SO_3$ we take $R_\lambda$ to be the standard irreducible
module of dimension $2\lambda + 1$; and $m_\lambda$ then is the
number of times a multiplet of states with total
angular momentum $\lambda$ occurs in $V_\lambda$.

The natural mapping $L_\lambda \otimes R_\lambda \to V_\lambda$ by $l \otimes r \mapsto l(r)$
is an isomorphism,

$V_\lambda \simeq L_\lambda \otimes R_\lambda$

and using it we can transfer the entire discussion
from $V_\lambda$ to $L_\lambda \otimes R_\lambda$. The group $G_0$ acts trivially on
$L_\lambda \simeq \mathbb{C}^{m_\lambda}$ and irreducibly on $R_\lambda$. Therefore, the
component $Z_\lambda$ of the centralizer Z is the unitary
group

$Z_\lambda = U(L_\lambda) \simeq U_{m_\lambda}$

if V is a unitary vector space with no extra structure.
In the presence of extra structure (which, by
compatibility with the $G_0$-action, restricts to every
subspace $V_\lambda$), the factor $Z_\lambda$ is some subgroup of
$U_{m_\lambda}$. In all cases, Z is a direct product of connected
compact Lie groups $Z_\lambda$.
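
As a concrete instance of the multiplicities m_λ, take V to be the state space of n spin-1/2 particles, with total spin rotations as the unitary symmetry group (the group SU_2 rather than SO_3; this choice of example is an assumption made for illustration). The multiplicity of total spin S is the number of product states with S_z = S minus the number with S_z = S + 1. The function names below are hypothetical.

```python
from math import comb


def spin_multiplicities(n_spins: int) -> dict:
    """Multiplicity of each total spin S, keyed by the integer 2S, for n spin-1/2's."""
    def n_states(two_m: int) -> int:
        # Number of product states with 2 * S_z equal to two_m.
        if abs(two_m) > n_spins or (n_spins - two_m) % 2:
            return 0
        return comb(n_spins, (n_spins + two_m) // 2)

    mult = {}
    for two_s in range(n_spins % 2, n_spins + 1, 2):
        m = n_states(two_s) - n_states(two_s + 2)
        if m > 0:
            mult[two_s] = m
    return mult


def centralizer_dimension(mult: dict) -> int:
    """Real dimension of a product of unitary groups U(m): sum of m^2."""
    return sum(m * m for m in mult.values())


mult = spin_multiplicities(3)
print(mult)
print(centralizer_dimension(mult))
```

For three spins the result is one spin-3/2 multiplet and two spin-1/2 multiplets, so that with no extra structure the centralizer is U_1 × U_2. The dimensions add up correctly: 1·4 + 2·2 = 8 = 2³.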

To make the connection with symmetric spaces, write
$M := Z_\lambda$. Since M is a group, the operation of taking
the inverse, $U \mapsto U^{-1}$, makes sense for all $U \in M$.
Moreover, being a compact Lie group, the manifold M
admits a left- and right-invariant Riemannian structure
in which the inversion $U \mapsto U^{-1}$ is an isometry. By
translation, one gets an isometry $s_{U_1} : U \mapsto U_1 U^{-1} U_1$
for every $U_1 \in M$. All of these maps $s_{U_1}$ are globally
defined, and the restriction of $s_{U_1}$ to some neighborhood
of $U_1$ coincides with the geodesic inversion with respect
to $U_1$. Thus, M is a symmetric space by the definition
given above. Symmetric spaces of this kind are called
type II.


Type I


Consider next the case of $G_1 \neq \emptyset$, where some
antiunitary symmetry T is present. As before, let Z
be the connected component of the centralizer of $G_0$
in $\mathcal{U}(V)$. Conjugation by T,

$U \mapsto \tau(U) := TUT^{-1}$

is an automorphism of $\mathcal{U}(V)$ and, owing to $T^2 = \pm\mathrm{Id}$,
$\tau$ is involutive. Because $G_0 \subset G$ is a normal
subgroup, $\tau$ restricts to an involutive automorphism
(still denoted by $\tau$) of Z. Now recall that T is
complex antilinear and the good Hamiltonians are
subject to $THT^{-1} = H$. The good time evolutions
$U_t = e^{-itH/\hbar}$ clearly satisfy $\tau(U_t) = U_{-t} = U_t^{-1}$. Thus,
the good set to consider is $\mathcal{M} := \{U \in Z \mid U = \tau(U)^{-1}\}$.
The set $\mathcal{M}$ is a manifold, but in general is not a
Lie group.

Further details depend on what $\tau$ does with the
factorization $Z = \prod_\lambda Z_\lambda$. If $V_\lambda$ is a $G_0$-isotypic
component of V, then so is $TV_\lambda$, since T normalizes
$G_0$. Thus, either $V_\lambda \cap TV_\lambda = 0$, or $TV_\lambda = V_\lambda$. In the
former case, the involutive automorphism $\tau$ just
relates $U \in Z_\lambda$ with $\tau(U) \in Z_{TV_\lambda}$, whence no intrinsic
constraint on $Z_\lambda$ results, and the time evolutions
$(U, \tau(U)^{-1}) \in Z_\lambda \times Z_{TV_\lambda}$ constitute a type-II symmetric
space, as before.

A novel situation occurs when $TV_\lambda = V_\lambda$, in which
case $\tau$ restricts to an automorphism of $Z_\lambda$. Let
therefore $TV_\lambda = V_\lambda$, put $K \equiv Z_\lambda$ for short, and
consider

$M := \{U \in K \mid U = \tau(U)^{-1}\}$

Note that if two elements $p$, $p_0$ of K are in M,
then so is the product $p_0 p^{-1} p_0$. The group K acts on
$M \subset K$ by

$k \cdot U = k U \tau(k)^{-1} \qquad (k \in K)$

and this group action is transitive, that is, every $U \in M$
can be written as $U = k\tau(k)^{-1}$ with some $k \in K$.
(Finding k for a given U is like taking a square root,
which is possible since $\exp : \mathrm{Lie}\,K \to K$ is surjective.)
There exists a K-invariant Riemannian structure
on M such that for all $p_0 \in M$ the mapping $s_{p_0} : M \to M$
defined by

$s_{p_0}(p) = p_0 p^{-1} p_0$

is the geodesic inversion with respect to $p_0 \in M$.
Thus, in this natural geometry M is a globally
symmetric Riemannian manifold and hence a symmetric
space. The present kind of symmetric space is
called type I. If $K_\tau$ is the set of fixed points of $\tau$ in K,
the symmetric space M is analytically diffeomorphic
to the coset space $K/K_\tau$ by

$K/K_\tau \to M \subset K, \qquad UK_\tau \mapsto U\tau(U)^{-1}$

which we call the “Cartan embedding” of $K/K_\tau$
into K.
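
The simplest concrete case may help. Assume K is the group of 2 × 2 unitary matrices and T is complex conjugation in a fixed basis, so that τ(U) is the complex conjugate of U and τ(k)⁻¹ is the transpose of k. The Cartan embedding then sends k to k kᵀ, a symmetric unitary matrix. The sketch below checks the defining condition U = τ(U)⁻¹, the invariance under right multiplication by a fixed point of τ (a real rotation), and the closure of M under p ↦ p₀ p⁻¹ p₀. All helper names and parameter values are hypothetical.

```python
import cmath
import math


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def dagger(a):
    """Conjugate transpose; equals the inverse when a is unitary."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def tau(u):
    """tau(U) = T U T^{-1} for T = complex conjugation: entrywise conjugate."""
    return [[x.conjugate() for x in row] for row in u]


def unitary(t, a, b, phase):
    """A 2 x 2 unitary matrix built from four real parameters."""
    al = math.cos(t) * cmath.exp(1j * a)
    be = math.sin(t) * cmath.exp(1j * b)
    ph = cmath.exp(1j * phase)
    return [[ph * al, -ph * be.conjugate()], [ph * be, ph * al.conjugate()]]


def cartan(k):
    """Cartan embedding k -> k tau(k)^{-1} (k is assumed unitary)."""
    return matmul(k, dagger(tau(k)))


def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


k = unitary(0.4, 0.3, 1.1, 0.2)
U = cartan(k)
print(close(U, dagger(tau(U))))        # U = tau(U)^{-1}

rot = [[math.cos(0.9), -math.sin(0.9)], [math.sin(0.9), math.cos(0.9)]]
print(close(cartan(matmul(k, rot)), U))  # k and k*rot give the same point of M

p0 = cartan(unitary(1.0, -0.5, 0.7, 0.9))
s = matmul(matmul(p0, dagger(U)), p0)  # geodesic inversion of U about p0
print(close(s, dagger(tau(s))))        # the image lies in M again
```

In this example the fixed-point group of τ is the orthogonal group, so the image of the embedding is a copy of U_2/O_2.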

In summary, the solution to the problem of 
finding the set of unitary time evolution operators 
that are compatible with a given symmetry group G 
and structure of Hilbert space V is always a 
symmetric space. This is a valuable insight, as 
symmetric spaces are rigid objects and have been 
completely classified by Cartan. 

If the dimension of V is kept variable, the 
irreducible symmetric spaces that occur belong to 
one of the large families listed in Table 1. 


Dyson’s Threefold Way 


Recall the goal: given a Hilbert space V and a 
symmetry group G acting on it, one wants to classify 
Table 1 The large families of symmetric spaces. The form of H
in the header applies to the last seven families

Family  Symmetric space                          Form of $H = \begin{pmatrix} W & Z \\ Z^\dagger & -W^{\mathrm{T}} \end{pmatrix}$

A       $U_N$                                    Complex Hermitian
AI      $U_N/O_N$                                Real symmetric
AII     $U_{2N}/USp_{2N}$                        Quaternion self-adjoint
C       $USp_{2N}$                               Z complex symmetric, $W = W^\dagger$
CI      $USp_{2N}/U_N$                           Z complex symmetric, $W = 0$
D       $SO_{2N}$                                Z complex skew, $W = W^\dagger$
DIII    $SO_{2N}/U_N$                            Z complex skew, $W = 0$
AIII    $U_{p+q}/U_p \times U_q$                 Z complex $p \times q$, $W = 0$
BDI     $SO_{p+q}/SO_p \times SO_q$              Z real $p \times q$, $W = 0$
CII     $USp_{2p+2q}/USp_{2p} \times USp_{2q}$   Z quaternion $2p \times 2q$, $W = 0$


the (irreducible) spaces of time evolution operators 
U that are “compatible” with G, meaning 


$U = g_0 U g_0^{-1} = g_1 U^{-1} g_1^{-1}$
(for all $g_\sigma \in G_\sigma$)


As we have seen, the spaces that arise in this way are 
symmetric spaces of type I or II depending on the 
nature of the time reversal (or other antiunitary 
symmetry) T. 

An even stronger statement can be made when 
more information about the Hilbert space V is 
specified. In Dyson’s classification, the Hermitian 
scalar product of V is assumed to be the only 
invariant structure that exists on V. With that 
assumption, only three large families of symmetric 
spaces arise; these correspond to what we call the 
“Wigner—Dyson symmetry classes.” 


Class A 


Recall that in Dyson’s case, the connected part of the
centralizer of $G_0$ in $\mathcal{U}(V)$ is a direct product of
unitary groups, each factor being associated with one
$G_0$-isotypic component $V_\lambda$ of V. The type-II situation
occurs when the set $G_1$ of antiunitary symmetries is
either empty or else exchanges different $V_\lambda$. In both
cases, the set of good time evolution operators
restricted to one $G_0$-isotypic component $V_\lambda$ is a
unitary group $U_{m_\lambda}$, with $m_\lambda$ being the multiplicity of
the irreducible $G_0$-representation $\lambda$ in $V_\lambda$.

The unitary groups $U_{N=m_\lambda}$ or, to be precise, their
simple parts $SU_N$, are called type-II symmetric spaces
of the A family or A series, hence the name class A.
The Hamiltonians H, the generators of time evolutions
$U_t = e^{-itH/\hbar}$, in this class are represented by
complex Hermitian $N \times N$ matrices. By putting a
$U_N$-invariant Gaussian probability measure

$\exp(-\mathrm{tr}\,H^2/2\sigma^2)\,dH \qquad (\sigma \in \mathbb{R})$

on that space, one gets what is called the GUE (the
Gaussian unitary ensemble), which defines the
Wigner-Dyson universality class of unitary symmetry.
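
A matrix can be drawn from this measure entry by entry. Since tr H² is the sum of the squared diagonal entries plus twice the squared moduli of the entries above the diagonal, the weight exp(−tr H²/2σ²) makes each diagonal entry a real Gaussian of variance σ², and the real and imaginary parts of each off-diagonal entry independent Gaussians of variance σ²/2. The following is a minimal sketch using only the Python standard library; the function name and the seeding are illustrative choices.

```python
import random


def sample_gue(n, sigma=1.0, rng=random):
    """Draw an n x n complex Hermitian matrix with weight exp(-tr H^2 / (2 sigma^2))."""
    h = [[0j] * n for _ in range(n)]
    off_diag_std = sigma / 2 ** 0.5  # std of real and imaginary parts above the diagonal
    for i in range(n):
        h[i][i] = complex(rng.gauss(0.0, sigma), 0.0)
        for j in range(i + 1, n):
            z = complex(rng.gauss(0.0, off_diag_std), rng.gauss(0.0, off_diag_std))
            h[i][j] = z
            h[j][i] = z.conjugate()  # enforce Hermiticity
    return h


H = sample_gue(3, sigma=1.0, rng=random.Random(0))
for row in H:
    print(row)
```

Passing a seeded `random.Random` instance keeps the draw reproducible. The eigenvalue statistics, which are what the universality class refers to, would require a diagonalization routine and are left out of this sketch.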


Classes AI and AII


Consider next the case $G_1 \neq \emptyset$, with antiunitary
generator T. Let $V_\lambda = TV_\lambda$ be any $G_0$-isotypic
component of V invariant under T (the type-I
situation). The mapping $U \mapsto TUT^{-1} = \tau(U)$ then is
an automorphism of the groups $\mathcal{U}(V_\lambda)$, $G_0$, and
$K = Z_\lambda = U_{m_\lambda}$. If $K_\tau$ is the subgroup of fixed points
of $\tau$ in K, the space of good time evolutions can be
identified with the symmetric space $K/K_\tau$ by the
Cartan embedding. Our task is to determine $K_\tau$.
To simplify the notation let us write $V_\lambda \equiv V$, $R_\lambda \equiv R$,
and $L_\lambda \equiv L$. We now ask what happens with
$T : V \to V$ in the process of transfer to $L \otimes R \simeq V$.
The answer, so we claim, is that T transfers to a
pure tensor made from antiunitary maps $\alpha : L \to L$
and $\beta : R \to R$,

$T = \alpha \otimes \beta$

To prove this claim, let C be the antilinear map
from V to the dual vector space $V^*$ by $v \mapsto \langle v, \cdot\rangle$.
Because the elements of $G_0$ are represented by
unitaries, the $\mathbb{C}$-linear operator $CT : V \to V^*$ intertwines
$G_0$-actions:

$CT\,a(g) = (g^{-1})^{\mathrm{t}}\,CT \qquad (g \in G_0)$

where a is the automorphism $a(g) = T^{-1}gT$. From
the irreducibility of R it follows that the space of
intertwiners $R \to R^*$ is one dimensional here (Schur’s
lemma). Therefore, $CT : L \otimes R \to L^* \otimes R^*$ must be a
pure tensor (as opposed to a sum of such tensors),
and since C is clearly a pure tensor, so is T. This
completes the proof.

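By the involutive property $T^2 = \epsilon_T\,\mathrm{Id}_V$ $(\epsilon_T = \pm 1)$,
the two antiunitary factors of $T = \alpha \otimes \beta$ cannot
but square to $\alpha^2 = \epsilon_\alpha\,\mathrm{Id}_L$ and $\beta^2 = \epsilon_\beta\,\mathrm{Id}_R$, where
$\epsilon_\alpha, \epsilon_\beta = \pm 1$ are related by $\epsilon_\alpha \epsilon_\beta = \epsilon_T$. The factor $\alpha$
determines a nondegenerate complex bilinear form
$Q : L \times L \to \mathbb{C}$ by

$Q(l_1, l_2) = \langle \alpha l_1, l_2 \rangle_L$

Since $\alpha$ is antiunitary one has the exchange
symmetry

$Q(l_1, l_2) = \overline{\langle \alpha^2 l_1, \alpha l_2 \rangle_L} = \epsilon_\alpha\, Q(l_2, l_1) \qquad (l_1, l_2 \in L)$

Thus, the complex bilinear form (or pairing) Q is
symmetric for $\epsilon_\alpha = +1$ and alternating for $\epsilon_\alpha = -1$.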


Knowing the sign of $\epsilon_\alpha = \pm 1$ we know the group
$K_\tau$. Indeed, an element $k \in K_\tau$ commutes with T and
after transfer from V to L still commutes with $\alpha$. But
since $K_\tau$ is a subgroup of $K = U_{m_\lambda}$, this means that
$k \in K_\tau$ preserves Q. In the case of $\epsilon_\alpha = +1$, what is
preserved is a symmetric pairing, and therefore $K_\tau =
O_{m_\lambda}$. For $\epsilon_\alpha = -1$, the multiplicity $m_\lambda$ must be even
and $K_\tau$ preserves an alternating pairing (or symplectic
structure); in that case $K_\tau \simeq USp_{m_\lambda}$, the unitary
symplectic group.

Thus, there is a dichotomy for the sets of good
time evolutions $M \simeq K/K_\tau$:

Class AI:   $K/K_\tau \simeq U_N/O_N$            $(N = m_\lambda)$
Class AII:  $K/K_\tau \simeq U_{2N}/USp_{2N}$    $(2N = m_\lambda)$

Again we are referring to symmetric spaces by the
names they (or rather their simple parts $SU_N/SO_N$
and $SU_{2N}/USp_{2N}$) have in the Cartan classification.

In general, there is no immediate means of
predicting the parity $\epsilon_\alpha$, and one has no choice but
to go through the steps of constructing $\alpha$. If
$\beta : R \to R$ happens to be $G_0$-invariant, however, the
situation simplifies. In that case $\beta$ determines a
$G_0$-invariant pairing $R \times R \to \mathbb{C}$ (in the same way as
$\alpha$ determines $Q : L \times L \to \mathbb{C}$ above). On general
grounds, an irreducible $G_0$-representation space
admits at most one such pairing. If that pairing is
symmetric, then, as we have seen, $\epsilon_\beta = 1$; if it is
alternating, then $\epsilon_\beta = -1$. The parity $\epsilon_\alpha$ is then given by
$\epsilon_\alpha \epsilon_\beta = \epsilon_T$.


Example Consider any physical system with spin-rotation
symmetry ($G_0 = SU_2$) and time-reversal
symmetry. The physical operation of time reversal,
T, commutes with spin rotations and, hence, here
is a case where the factor $\beta$ in $T = \alpha \otimes \beta$ is
$G_0$-invariant. On fundamental physics grounds one
has $T^2 = (-1)^{2S}$ on states with spin S. The spin-S
representation of $SU_2$ is known to carry an invariant
pairing which is symmetric or skew depending on
whether the integer 2S is even or odd. Therefore,
$\epsilon_\beta = \epsilon_T$ and $\epsilon_\alpha = +1$ in all cases.

Thus, T-invariant systems with no symmetries
other than energy and spin invariably are class AI.
By breaking spin-rotation symmetry ($G_0 = \{\mathrm{Id}\}$,
$\epsilon_\beta = +1$) while maintaining T-symmetry for states
with half-integer spin (say single electrons, which
carry spin S = 1/2), one gets $\epsilon_\alpha = -1$, thereby
realizing class AII.
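
The sign bookkeeping of this example is short enough to write out as a function. The sketch below encodes only the two cases discussed here (full spin-rotation symmetry, or none); the function name and its arguments are hypothetical.

```python
def wigner_dyson_class(two_s: int, spin_rotation_symmetric: bool) -> str:
    """Symmetry class (AI or AII) of a time-reversal-invariant system of spin S = two_s / 2."""
    eps_t = 1 if two_s % 2 == 0 else -1   # T^2 = (-1)^(2S)
    if spin_rotation_symmetric:
        # Invariant pairing on the spin-S representation: symmetric iff 2S is even.
        eps_beta = eps_t
    else:
        # Trivial symmetry group: R is one-dimensional and beta squares to +1.
        eps_beta = 1
    # eps_alpha * eps_beta = eps_T, and all signs are +1 or -1.
    eps_alpha = eps_t * eps_beta
    return "AI" if eps_alpha == 1 else "AII"


print(wigner_dyson_class(1, spin_rotation_symmetric=True))
print(wigner_dyson_class(1, spin_rotation_symmetric=False))
print(wigner_dyson_class(2, spin_rotation_symmetric=False))
```

Only the combination of half-integer spin with broken spin-rotation symmetry gives the alternating pairing, and with it class AII.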


The Hamiltonians By passing to the tangent space
of $K/K_\tau$ at unity one obtains Hermitian matrices
with entries that are real numbers (class AI) or real
quaternions (class AII). When $K_\tau$-invariant Gaussian
probability measures (called GOE and GSE, respectively) are
put on these spaces, one gets the Wigner-Dyson
universality classes of orthogonal and symplectic
symmetry, respectively. In mesoscopic physics, these are realized
in disordered metals with time-reversal invariance 
(absence of magnetic fields and magnetic impuri- 
ties). Spin-rotation symmetry is broken by strong 
spin-orbit scatterers such as gold impurities. 


Warning 


The word “symmetry class” is not synonymous with 
“universality class.” Indeed, inside a symmetry class 
many different types of physical behavior are 
possible. For example, random matrix models for 
disordered metallic grains with time-reversal sym- 
metry belong to the symmetry class of the example 
above (class AI), and so do Anderson tight-binding 
models with real hopping. The former are known to 
exhibit energy level statistics of universal GOE type, 
whereas the latter have localized eigenfunctions and 
hence level statistics which is expected to approach 
the Poisson limit when the system size goes to 
infinity. 


Disordered Superconductors 


When Dirac first wrote down his famous equation in 
1928, he assumed that he was writing an equation 
for the wave function of the electron. Later, because 
of the instability caused by negative-energy solu- 
tions, the Dirac equation was reinterpreted (via 
second quantization) as an equation for the ferm- 
ionic field operators of a quantum field theory. A 
similar change of viewpoint is carried out in reverse 
in the Hartree—Fock—Bogoliubov mean-field descrip- 
tion of quasiparticle excitations in superconductors. 
There, one starts from the equations of motion for 
linear superpositions of the electron creation and 
annihilation operators, and reinterprets them as a 
unitary quantum dynamics for what might be called 
the quasiparticle “wave function.” 

In both cases — the Dirac equation and the 
quasiparticle dynamics of a superconductor — there 
enters a structure not present in the standard 
quantum mechanics underlying Dyson’s classifica- 
tion: the field operators for fermionic particles are 
subject to a set of relations called the “canonical 
anticommutation relations,” and these are preserved 
by the quantum dynamics. Therefore, whenever 
second quantization is undone (assuming it can be 
undone) to return from field operators to wave 
functions, the wave-function dynamics is required to 
preserve some extra structure. This puts a linear 
constraint on the good Hamiltonians H. For our 
purposes, the best viewpoint to take is to attribute
the extra invariant structure to the Hilbert space V, 
thereby turning it into a Nambu space. 


Nambu Space 


Adopting the standard physics conventions of
second quantization, consider some set of single-particle
creation and annihilation operators $c_i^\dagger$ and
$c_i$, where $i = 1, 2, \ldots$ labels an orthonormal system
of single-particle states. Such operators are subject
to the canonical anticommutation relations (CARs)

$$c_i^\dagger c_j + c_j c_i^\dagger = \delta_{ij}, \qquad c_i^\dagger c_j^\dagger + c_j^\dagger c_i^\dagger = 0 = c_i c_j + c_j c_i \qquad [1]$$

When written in terms of $c_j + c_j^\dagger$ and $i(c_j - c_j^\dagger)$, these
become the standard defining relations of a Clifford
algebra over $\mathbb{R}$. Field operators are linear combinations
$\psi = \sum_i (u_i c_i + f_i c_i^\dagger)$ with complex coefficients $u_i$
and $f_i$.
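The passage from the CARs [1] to Clifford relations can be checked on a small example. The sketch below is only an illustration: it realizes two fermionic modes as 4 × 4 matrices through a Jordan-Wigner construction (a choice made here, not something the text prescribes), verifies the anticommutation relations, and then confirms that the combinations $c_j + c_j^\dagger$ and $i(c_j - c_j^\dagger)$ anticommute to $2\delta$. All function and variable names are hypothetical.

```python
# Minimal sketch: two fermionic modes as 4x4 matrices (Jordan-Wigner, assumed here).
I2 = [[1, 0], [0, 1]]
SZ = [[1, 0], [0, -1]]
A = [[0, 1], [0, 0]]  # single-mode annihilation operator


def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dagger(a):
    """Conjugate transpose."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]


def add(a, b, sign=1):
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(s, a):
    return [[s * x for x in row] for row in a]


def anticomm(a, b):
    return add(matmul(a, b), matmul(b, a))


def equals(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


IDENT = kron(I2, I2)
ZERO = scale(0, IDENT)
c = [kron(A, I2), kron(SZ, A)]   # c_1, c_2 (the sigma_z string enforces anticommutation)
cd = [dagger(x) for x in c]      # c_1^dagger, c_2^dagger


def cars_hold():
    """Check the three families of relations in [1] for all index pairs."""
    for i in range(2):
        for j in range(2):
            target = IDENT if i == j else ZERO
            if not equals(anticomm(cd[i], c[j]), target):
                return False
            if not equals(anticomm(c[i], c[j]), ZERO):
                return False
            if not equals(anticomm(cd[i], cd[j]), ZERO):
                return False
    return True


# Candidate Clifford generators: c_j + c_j^dagger and i (c_j - c_j^dagger).
gens = []
for j in range(2):
    gens.append(add(c[j], cd[j]))
    gens.append(scale(1j, add(c[j], cd[j], sign=-1)))


def clifford_holds():
    """Check g_a g_b + g_b g_a = 2 delta_ab for the four generators."""
    for a in range(4):
        for b in range(4):
            target = scale(2, IDENT) if a == b else ZERO
            if not equals(anticomm(gens[a], gens[b]), target):
                return False
    return True
```

Calling `cars_hold()` and `clifford_holds()` runs both checks; the same pattern extends to more modes by lengthening the string of `SZ` factors.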

Now take H to be some Hamiltonian which is
quadratic in the creation and annihilation operators:

$$H = \sum_{ij} W_{ij}\, c_i^\dagger c_j + \frac{1}{2} \sum_{ij} \left( Z_{ij}\, c_i^\dagger c_j^\dagger + \bar{Z}_{ij}\, c_j c_i \right)$$


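and let H act on field operators $\psi$ by the
commutator: $H \cdot \psi = [H, \psi]$. The time evolution of
$\psi$ is then determined by the Heisenberg equation of
motion

$$-i\hbar \frac{d\psi}{dt} = H \cdot \psi \qquad [2]$$

which integrates to $\psi(t) = e^{itH/\hbar} \cdot \psi(0)$, and is easily
verified to preserve the CARs [1].

The dynamical equation [2] is equivalent to a
system of linear differential equations for the
amplitudes $u_i$ and $f_i$. If these are assembled into
vectors, and the $W_{ij}$ and $Z_{ij}$ into matrices, eqn [2]
becomes

$$-i\hbar \frac{d}{dt} \begin{pmatrix} u \\ f \end{pmatrix} = \begin{pmatrix} -W^{T} & -\bar{Z} \\ Z & W \end{pmatrix} \begin{pmatrix} u \\ f \end{pmatrix}$$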


The Hamiltonian matrix on the right-hand side has
some special properties due to $Z_{ij} = -Z_{ji}$ (from
$c_i c_j = -c_j c_i$), and $W_{ij} = \bar{W}_{ji}$ (from H being
self-adjoint as an operator in Fock space). To keep
track of these properties while imposing some
unitary and antiunitary symmetries, it is best to put
everything in invariant form.

So, let U be the unitary vector space of annihilation
operators $u = \sum_i u_i c_i$ and view the creation
operators $f = \sum_i f_i c_i^\dagger$ as lying in the dual vector space
$U^*$. The field operators $\psi = u + f$ then are elements
of the direct sum $U \oplus U^* =: V$, called “Nambu
space.” On V there exists a canonical unitary
structure expressed by

$$\langle \psi, \tilde{\psi} \rangle = \sum_i \left( \bar{u}_i \tilde{u}_i + \bar{f}_i \tilde{f}_i \right)$$


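A second canonical structure on $V = U \oplus U^*$ is given
by the symmetric complex bilinear form

$$\{\psi, \tilde{\psi}\} = \sum_i \left( f_i \tilde{u}_i + \tilde{f}_i u_i \right) = f(\tilde{u}) + \tilde{f}(u)$$

where the last expression uses the meaning of f as a
linear function $f : U \to \mathbb{C}$. Note that $\{\psi, \tilde{\psi}\}$ agrees
with the anticommutator of the field operators,
$\psi\tilde{\psi} + \tilde{\psi}\psi$.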

Now recall that the quantum dynamics is determined
by a Hamiltonian H that acts on $\psi$ by the
commutator $H \cdot \psi = [H, \psi]$. The one-parameter
groups $t \mapsto e^{itH/\hbar}$ generated by this action (the time
evolutions) preserve the symmetric pairing:

$$\{\psi, \tilde{\psi}\} = \{ e^{itH/\hbar}\psi,\; e^{itH/\hbar}\tilde{\psi} \}$$

since the anticommutation relations [1] do not
change with time. They also preserve the unitary
structure,

$$\langle \psi, \tilde{\psi} \rangle = \langle e^{itH/\hbar}\psi,\; e^{itH/\hbar}\tilde{\psi} \rangle$$

because probability in Nambu space is conserved.
(Physically speaking, this holds true as long as H is
quadratic, i.e., many-body interactions are negligible.)

One can now pose Dyson’s question again: given 
Nambu space V and a symmetry group G acting on 
it, what is the set of time evolution operators that 
preserve the structure of V and are compatible with 
G? From the section “The framework,” we know 
the answer to be some symmetric space, but which 
are the symmetric spaces that occur? 


Class D 


Consider a superconductor with no symmetries in its
quasiparticle dynamics, so G = {Id}. (A concrete
example would be a disordered spin-triplet superconductor
in the vortex phase.) The time evolutions
$U_t = e^{itH/\hbar}$ are then constrained only by invariance
of the unitary structure and the symmetric pairing
{,} of Nambu space. These two structures are
consistent; they are related by particle-hole conjugation C:

$$\{\psi, \tilde{\psi}\} = \langle C\psi, \tilde{\psi} \rangle$$

which is an antiunitary operator with square $C^2 = +\mathrm{Id}$.
Let $V_{\mathbb{R}} \subset V$ denote the real vector space of fixed
points of C. (The field operators in $V_{\mathbb{R}}$ are said to be of
“Majorana” type in physics.) The condition
$\{\psi, \tilde{\psi}\} = \{U_t\psi, U_t\tilde{\psi}\}$ selects a complex orthogonal
group SO(V), and imposing unitarity yields a real
orthogonal subgroup $\mathrm{SO}(V_{\mathbb{R}})$ with $\dim V_{\mathbb{R}} \in 2\mathbb{N}$,
a symmetric space of the D family.

When expressed in some basis of Majorana
fermions (meaning a basis of $V_{\mathbb{R}}$), the matrix of
the time evolution generator $iH \in \mathfrak{so}(V_{\mathbb{R}})$ is real
skew, and that of H imaginary skew. The simplest
random matrix model for class D, the SO-invariant 
Gaussian ensemble of imaginary skew matrices, is 
analyzed in the second edition of Mehta’s (1991) 
book. From the expressions given by Mehta it is 
seen that the level correlation functions at high 
energy coincide with those of the Wigner—Dyson 
universality class of unitary symmetry. The level 
correlations at low energy, however, show different 
behavior defining a separate universality class. 
This universal behavior at low energies has immedi- 
ate physical relevance, as it is precisely the low- 
energy quasiparticles that determine the thermal 
transport properties of the superconductor at low 
temperatures. 


Class DIII


Let now magnetic fields and magnetic impurities
be absent, so that time reversal T is a symmetry of
the quasiparticle system: G = {Id, T}. Following the
section “The framework,” the set of good time
evolutions is $M \simeq K/K_\tau$ with $K = \mathrm{SO}(V_{\mathbb{R}})$ and $K_\tau$
the set of fixed points of $U \mapsto \tau(U) = TUT^{-1}$ in K.
What is $K_\tau$?

The time-reversal operator squares to $T^2 = -\mathrm{Id}$
(for particles with spin 1/2) and commutes with
particle-hole conjugation C, which makes $P := iCT$ a
useful operator to consider. Since C by definition
commutes with the action of K, and hence also with
that of $K_\tau$, the subgroup $K_\tau$ has an equivalent
description as

$$K_\tau = \{ k \in \mathrm{U}(V) \mid k = PkP^{-1} = \tau(k) \}$$

The operator P is easily seen to have the following
properties: (1) P is unitary, (2) $P^2 = \mathrm{Id}$, and (3) $\mathrm{tr}_V P = 0$.
Consequently, P possesses two eigenspaces
$V_\pm$ of equal dimension, and the condition $k = PkP^{-1}$
fixes a subgroup $\mathrm{U}(V_+) \times \mathrm{U}(V_-)$ of U(V). Since P
contains a factor $i = \sqrt{-1}$ in its definition, it anticommutes
with the antilinear operator T. Therefore,
the automorphism $\tau$ exchanges $\mathrm{U}(V_+)$ with $\mathrm{U}(V_-)$,
and the fixed-point set $K_\tau$ is the same as $\mathrm{U}(V_+) \simeq \mathrm{U}_{2N}$. Thus,

$$M \simeq K/K_\tau \simeq \mathrm{SO}_{4N}/\mathrm{U}_{2N} \qquad (\dim V_+ = 2N)$$

a symmetric space in the DIII family. Note that
for particles with spin 1/2 the dimension of $V_+$ has
to be even.


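By realizing the algebra of involutions C, T as
$C\psi = (1_{2N} \otimes \sigma_x)\bar{\psi}$ and $T\psi = (1_{2N} \otimes i\sigma_y)\bar{\psi}$, the
Hamiltonians H in class DIII are brought into the
standard form

$$H = \begin{pmatrix} 0 & Z \\ -\bar{Z} & 0 \end{pmatrix}$$

where the 2N × 2N matrix Z is complex and skew.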


Class C 


Next let the spin of the quasiparticles be
conserved, as is the case for a spin-singlet superconductor
with no spin-orbit scatterers present, and
let time-reversal invariance be broken by a magnetic
field. The symmetry group of the quasiparticle
system then is the spin-rotation group:
$G = G_0 = \mathrm{Spin}_3 = \mathrm{SU}_2$.

Nambu space V can be arranged to be a tensor
product $V = L \otimes R$ so that $G_0$ acts trivially on L and
by the spinor representation on the spinor space
$R = \mathbb{C}^2$. Since two spinors combine to give a scalar, the
latter comes with an alternating bilinear form
$a : R \times R \to \mathbb{C}$. In a suitable basis, the anticommutation
relations [1] factor on particle-hole and spin indices.
The symmetric bilinear form {,} of V correspondingly
factors under the tensor product decomposition
$V = L \otimes R$ as

$$\{ l_1 \otimes r_1,\; l_2 \otimes r_2 \} = [l_1, l_2] \times a(r_1, r_2)$$

where [,] is an alternating form on L, giving L the
structure of a complex symplectic vector space.

The good set M now consists of the time
evolutions that, in addition to preserving the
structure of Nambu space, commute with the
spin-rotation group $\mathrm{SU}_2$:

$$M = \{ U \in \mathrm{U}(V) \mid UC = CU;\ \forall g \in \mathrm{SU}_2 : gU = Ug \}$$


By the last condition, all time evolutions act trivially
on the factor R. The condition UC = CU, which
expresses invariance of the symmetric form of V,
then implies that time evolutions preserve the
alternating form of L. Time evolutions therefore
are unitary symplectic transformations of L, hence
$M = \mathrm{USp}(L) \simeq \mathrm{USp}_{2N}$, a symmetric space of the C
family. The Hamiltonian matrices in class C have
the standard form

$$H = \begin{pmatrix} W & Z \\ Z^{\dagger} & -W^{T} \end{pmatrix}$$

with W being Hermitian and Z complex and
symmetric.
Class CI


The next class is obtained by taking the time
reversal T as well as the spin rotations $g \in \mathrm{SU}_2$ to
be symmetries of the quasiparticle system.

By arguments that should be familiar by now, the
set of good time evolutions is a symmetric space
$M = K/K_\tau$ with $K = \mathrm{USp}(L)$ and $K_\tau$ the set of fixed
points of $\tau$ in K. Once again, the question to be
answered is: what is $K_\tau$? The situation here is very
similar to the one for class DIII, with L and USp(L)
taking the roles of V and $\mathrm{SO}(V_{\mathbb{R}})$. By adapting the
previous argument to the present case, one shows
that $K_\tau$ is the same as $\mathrm{U}(L_+) \simeq \mathrm{U}_N$, where $L_+$ is the
positive eigenspace of P = iCT viewed as a unitary
operator on L. Thus,

$$M \simeq K/K_\tau \simeq \mathrm{USp}_{2N}/\mathrm{U}_N$$

a symmetric space in the CI family.


Dirac Fermions: The Chiral Classes 


Three large families of symmetric spaces remain to 
be implemented. Although these, too, occur in 
mesoscopic physics, their most natural realization 
is by 4D Dirac fermions in a random gauge field 
background. 

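Consider the Lagrangian $\mathcal{L}$ for the Euclidean
spacetime version of QCD with $N_c \geq 3$ colors of
quarks coupled to an $\mathrm{SU}_{N_c}$ gauge field $A_\mu$:

$$\mathcal{L} = i\bar{\psi} \gamma^\mu (\partial_\mu - A_\mu)\psi + i m \bar{\psi}\psi$$

The massless Dirac operator $D = i\gamma^\mu(\partial_\mu - A_\mu)$ anticommutes
with $\gamma_5 = \gamma^0\gamma^1\gamma^2\gamma^3$. Therefore, in a basis
of eigenstates of $\gamma_5$ the matrix of D takes the form

$$D = \begin{pmatrix} 0 & Z \\ Z^\dagger & 0 \end{pmatrix} \qquad [3]$$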


If the gauge field carries topological charge $\nu \in \mathbb{Z}$,
the Dirac operator D has at least $|\nu|$ zero modes by the
index theorem. To make a simple model of the
challenging situation where $A_\mu$ is distributed according
to Yang-Mills measure, one takes the matrices Z to be
complex rectangular, of size p × q with $p - q = \nu$, and
puts a Gaussian probability measure on that space.
This random matrix model for D captures the
universal features of the QCD Dirac spectrum in the
massless limit.
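The link between a rectangular block Z and the zero modes of D can be made concrete with a small exact computation. The sketch below is an illustration under simplifying assumptions: it uses integer (hence real) entries for Z so that the rank can be computed exactly with rational arithmetic, whereas the model in the text takes Z complex and Gaussian distributed. It assembles D in the block form [3], checks that D anticommutes with the diagonal matrix playing the role of $\gamma_5$, and counts zero modes as the nullity of D. The function names and the example matrix are hypothetical.

```python
from fractions import Fraction


def chiral_matrix(Z):
    """Assemble D = [[0, Z], [Z^dagger, 0]] from a p x q block Z with real entries."""
    p, q = len(Z), len(Z[0])
    D = [[0] * (p + q) for _ in range(p + q)]
    for i in range(p):
        for j in range(q):
            D[i][p + j] = Z[i][j]
            D[p + j][i] = Z[i][j]  # real entries, so Z^dagger is just the transpose
    return D


def rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    A = [[Fraction(v) for v in row] for row in M]
    r = 0
    for col in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][col] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            factor = A[i][col] / A[r][col]
            A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
    return r


def anticommutes_with_gamma5(D, p):
    """gamma_5 = diag(+1 (p times), -1 (q times)); check D g5 + g5 D = 0 entrywise."""
    n = len(D)
    sign = [1 if i < p else -1 for i in range(n)]
    return all(D[i][j] * (sign[i] + sign[j]) == 0 for i in range(n) for j in range(n))


def zero_mode_count(Z):
    """Number of zero modes of D, i.e. the dimension of its kernel."""
    D = chiral_matrix(Z)
    return len(D) - rank(D)


# Example block with p = 5, q = 2, so nu = p - q = 3 (values chosen arbitrarily).
EXAMPLE_Z = [[1, 2], [0, 1], [3, -1], [2, 2], [1, 0]]
```

Since the rank of D is twice the rank of Z, a p × q block of full rank leaves exactly |p − q| zero modes, consistent with the lower bound from the index theorem quoted above.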

The exponential of the truncated Dirac operator,
$e^{itD}$ (where t is not the time), lies in a space
equivalent to $\mathrm{U}_{p+q}/\mathrm{U}_p \times \mathrm{U}_q$, a symmetric space of
the AIII family. We therefore say that the universal
behavior of the QCD Dirac spectrum is that of
symmetry class AIII.

But hold on! Why are we entitled to speak of a
symmetry class here? By definition, symmetries
always commute with the Hamiltonian, never do
they anticommute! (The relation $D = -\gamma_5 D \gamma_5$ is not
a symmetry in the sense of Dyson, nor is it a
symmetry in our sense.)


Class AIII


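To incorporate the massless QCD Dirac operator
into the present classification scheme, we adapt it
to the Nambu space setting. This is done by
reorganizing the four-component Dirac spinor
$\psi, \bar{\psi}$ as an eight-component Majorana spinor $\Psi$,
to write

$$\mathcal{L}_{m=0} = \tfrac{1}{2}\, \Psi\, i\Gamma^\mu (\partial_\mu - \mathcal{A}_\mu)\, \Psi$$

The 8 × 8 matrices $\Gamma^\mu$ are real symmetric besides
satisfying the Clifford relations $\Gamma^\mu\Gamma^\nu + \Gamma^\nu\Gamma^\mu = 2\delta^{\mu\nu}$.
A possible tensor product realization is

$$\Gamma^0 = 1 \otimes \sigma_x \otimes 1, \qquad \Gamma^1 = \sigma_x \otimes \sigma_y \otimes \sigma_y,$$
$$\Gamma^2 = \sigma_y \otimes \sigma_y \otimes 1, \qquad \Gamma^3 = \sigma_z \otimes \sigma_y \otimes \sigma_y$$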
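The stated properties of the $\Gamma^\mu$ can be verified mechanically for a candidate realization. The sketch below assumes the assignment $\Gamma^0 = 1 \otimes \sigma_x \otimes 1$, $\Gamma^1 = \sigma_x \otimes \sigma_y \otimes \sigma_y$, $\Gamma^2 = \sigma_y \otimes \sigma_y \otimes 1$, $\Gamma^3 = \sigma_z \otimes \sigma_y \otimes \sigma_y$, together with $\Gamma_5 = 1 \otimes \sigma_z \otimes 1$ and $Q = 1 \otimes 1 \otimes \sigma_y$; which Pauli matrix goes with which index is an assumption made for this check. It tests that each matrix is real symmetric, that the Clifford relations hold, that every $\Gamma^\mu$ anticommutes with $\Gamma_5$, and that every $\Gamma^\mu$ commutes with Q. All helper names are hypothetical.

```python
I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]


def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def kron3(a, b, c):
    return kron(kron(a, b), c)


def matmul(a, b):
    n = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(len(a))]


def add(a, b, sign=1):
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def is_zero(m, tol=1e-12):
    return all(abs(x) < tol for row in m for x in row)


# Assumed realization (index assignment is an assumption, see lead-in).
GAMMA = [kron3(I2, SX, I2), kron3(SX, SY, SY), kron3(SY, SY, I2), kron3(SZ, SY, SY)]
GAMMA5 = kron3(I2, SZ, I2)
Q = kron3(I2, I2, SY)
IDENT8 = [[1 if i == j else 0 for j in range(8)] for i in range(8)]


def real_symmetric(m, tol=1e-12):
    n = len(m)
    return all(abs(complex(m[i][j]).imag) < tol and abs(m[i][j] - m[j][i]) < tol
               for i in range(n) for j in range(n))


def clifford_ok():
    """Gamma^mu Gamma^nu + Gamma^nu Gamma^mu = 2 delta^{mu nu}."""
    for mu in range(4):
        for nu in range(4):
            ac = add(matmul(GAMMA[mu], GAMMA[nu]), matmul(GAMMA[nu], GAMMA[mu]))
            target = [[2 * x if mu == nu else 0 for x in row] for row in IDENT8]
            if not is_zero(add(ac, target, sign=-1)):
                return False
    return True


def chiral_ok():
    """Every Gamma^mu anticommutes with Gamma_5."""
    return all(is_zero(add(matmul(g, GAMMA5), matmul(GAMMA5, g))) for g in GAMMA)


def u1_ok():
    """Every Gamma^mu commutes with the generator Q."""
    return all(is_zero(add(matmul(g, Q), matmul(Q, g), sign=-1)) for g in GAMMA)
```

Real symmetry comes from each product containing an even number of $\sigma_y$ factors, which is the design constraint that any alternative realization would also have to respect.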


The gauge field in this Majorana representation 
is A,=181® (Ar — Al Hig) where A = (172) 
(A,,+ A’) are the symmetric and skew parts of 
A, E€ su(N,). 

The operator $H = i\Gamma^\mu(\partial_\mu - \mathcal{A}_\mu)$ is imaginary
skew, therefore $e^{itH}$ is real orthogonal. This means
that there exists a Nambu space V with unitary
structure $\langle\,,\rangle$ and symmetric pairing {,}, both of
which are preserved by the action of $e^{itH}$. No change
of physical meaning or interpretation is implied by
the identical rewriting from Dirac D to Majorana H.
The fact that Dirac fermions are not truly Majorana
is encoded in a $\mathrm{U}_1$-symmetry $H e^{i\theta Q} = e^{i\theta Q} H$ generated
by $Q = 1 \otimes 1 \otimes \sigma_y$.

Now comes the essential point: since H obeys
$H = -\bar{H}$, the chiral “symmetry” $H = -\Gamma_5 H \Gamma_5$ with
$\Gamma_5 = 1 \otimes \sigma_z \otimes 1$ can be recast as a true symmetry:

$$H = +\Gamma_5 \bar{H} \Gamma_5 = T H T^{-1}$$

with antilinear $T : \Psi \mapsto \Gamma_5 \bar{\Psi}$. Thus, the massless
QCD Dirac operator is indeed associated with a
symmetry class in the present, post-Dyson sense:
that is class AIII, realized by self-adjoint operators
on Nambu space with Dirac $\mathrm{U}_1$-symmetry and an
antiunitary symmetry T.


Classes BDI and CII


Consider Hamiltonians D still of the form [3] but
now with matrix entries taken from either the real
numbers or the real quaternions. Their one-parameter
groups $e^{itD}$ belong to two further families of
symmetric spaces, namely the classes BDI and CII
of Table 1. These large families are known to be
realized as symmetry classes by the massless Dirac
operator with gauge group $\mathrm{SU}_2$ (for BDI), or with
fermions in the adjoint representation (for CII). For
the details we must refer to Verbaarschot’s (1994)
paper and the recent article by Heinzner et al. (2005).


See also: Classical Groups and Homogeneous Spaces; 
Compact Groups and Their Representations; 
Determinantal Random Fields; Dirac Fields in Gravitation 
and Nonabelian Gauge Theory; Dirac Operator and Dirac 
Field; High T_c Superconductor Theory; Integrable
Systems in Random Matrix Theory; Lie Groups: General 
Theory; Random Matrix Theory in Physics; Random 
Partitions; Supersymmetry Methods in Random Matrix 
Theory; Symmetries and Conservation Laws. 
Introduction: Chaotic Systems Can
Synchronize


Synchronization is a ubiquitous phenomenon characteristic
of many processes in natural systems and
(nonlinear) science. It has long been an
objective of intensive research and is today considered
one of the basic nonlinear phenomena
studied in mathematics, physics, engineering, or life
science. The word has Greek roots, syn = common
and chronos = time, which means to share the
common time or to occur at the same time, that is,
correlation or agreement in time of different
processes (Boccaletti et al. 2002). Thus, synchronization
of two dynamical systems generally means
that one system somehow traces the motion of
another. Indeed, it is well known that many coupled
oscillators are able to adjust some common
relation between them through weak
interaction, which leads to a situation in which a
synchronization-like phenomenon takes place.

The original work on synchronization involved 
periodic oscillators. Indeed, observations of (peri- 
odic) synchronization phenomena in physics go back 
at least as far as C Huygens (1673), who, during his 
experiments on the development of improved pen- 
dulum clocks, discovered that two very weakly 
coupled pendulum clocks become synchronized in 
phase: two clocks hanging from a common support 
(on the same beam of his room) were found to 
oscillate with exactly the same frequency and 
opposite phase due to the (weak) coupling in terms 
of the almost imperceptible oscillations of the beam 
generated by the clocks. 

Since this discovery, periodic synchronization has 
found numerous applications in various domains, 
for instance, in biological systems and living nature 
where synchronization is encountered on different 
levels. Examples range from the modeling of the 
heart to the investigation of the circadian rhythm, 
phase locking of respiration with a mechanical 
ventilator, synchronization of oscillations of human 
insulin secretion and glucose infusion, neuronal 
information processing within a brain area and 
communication between different brain areas. Also, 
synchronization plays an important role in several
neurological diseases such as epilepsies and pathological
tremors, or in different forms of cooperative
behavior of insects, animals, or humans (Pikovsky
et al. 2001).

This process may also be encountered in celestial
mechanics, where it explains the locking of the revolution
periods of planets and satellites.

The scope of synchronization broadened considerably
with the developments in radio engineering and acoustics, due to
the work of Eccles and Vincent (1920), who found
synchronization of a triode generator. Appleton,
Van der Pol, and Van der Mark (1922-27)
extended this work, experimentally and theoretically,
to radio tube oscillators, where they
observed entrainment when driving such oscillators
sinusoidally; that is, the frequency of a generator
can be synchronized by a weak external signal of a
slightly different frequency.

But, even though the original notion and theory of
synchronization imply periodicity of the oscillators,
during the last decades the notion of synchronization
has been generalized to the case of interacting
chaotic oscillators. Indeed, the discovery of deterministic
chaos introduced new types of oscillating
systems, namely the chaotic generators.

Chaotic oscillators are found in many dynamical 
systems of various origins; the behavior of such 
systems is characterized by instability and, as a 
result, limited predictability in time. 

Roughly speaking, a system is chaotic if it is 
deterministic, has a long-term aperiodic behavior, 
and exhibits sensitive dependence on initial condi- 
tions on a closed invariant set (the chaos theory is 
discussed in more detail elsewhere in this encyclo- 
pedia) (see Chaos and Attractors). 

Consequently, for a chaotic system, trajectories
starting arbitrarily close to each other diverge
exponentially with time, and quickly become uncorrelated.
It follows that two identical but uncoupled chaotic systems
cannot synchronize. This means that they cannot
produce identical chaotic signals, unless they are
initialized at exactly the same point, which is in
general physically impossible. Thus, at first sight,
synchronization of chaotic systems seems
rather surprising, because one may intuitively (and
naively) expect that the sensitive dependence on
initial conditions would lead to an immediate
breakdown of any synchronization of coupled
chaotic systems. This scenario in fact led to the
belief that chaos is uncontrollable and thus unusable.
Despite this, in the last decades, the search for
synchronization has moved to chaotic systems.
Significant research has been done and, as a result,
Yamada and Fujisaka (1983), Afraimovich et al.
(1986), and Pecora and Carroll (1990) showed that
two chaotic systems could be synchronized by
coupling them: synchronization of chaos is real,
and chaos can therefore be exploited. Ever since,
many researchers have discussed the theory and the
design or applications of synchronized motion in
coupled chaotic systems. A broad variety of applications
has emerged, for example, to increase the
power of lasers, to synchronize the output of
electronic circuits, to control oscillations in chemical
reactions, or to encode electronic messages for
secure communications.

The publication of the seminal paper of Pecora
and Carroll (1990) had a very strong impact in the
domain of chaos theory and chaos synchronization,
and their applications. It has stimulated very intense
research activities, and the related studies continue to
attract great attention. Many authors have contributed
to developing this domain, theoretically or
experimentally (Boccaletti et al. 2002, Pecora et al.
1997, and references therein).

However, the special features of chaotic systems
make it impossible to directly apply the methods
developed for synchronization of periodic oscillators.
Moreover, in the study of coupled chaotic
systems, many different phenomena, all
usually referred to as synchronization, exist and
have been studied now for over a decade. Thus,
more precise descriptions of such systems are indeed
desirable.

Several different regimes of synchronization have
been investigated. In the following, the focus will be
on explaining the essentials of this large topic,
subdivided into four basic types of synchronization
of coupled or forced chaotic systems which have
been found and have received much attention, with
emphasis on the first three:


• identical (or complete) synchronization (IS),
which is defined as the coincidence of states of
interacting systems;

• generalized synchronization (GS), which extends
the IS phenomenon and implies the presence of
some functional relation between two coupled
systems; if this relationship is the identity, we
recover the IS;

• phase synchronization (PS), which means entrainment
of phases of chaotic oscillators, whereas
their amplitudes remain uncorrelated; and

• lag synchronization (LS), which appears as a
coincidence of time-shifted states of two systems.


Other regimes exist; some of them will be briefly
pointed out at the end of this article. We will also
briefly discuss the very relevant issue of the stability
of synchronous motions.


Our discussion and the examples given here are based
on unidirectionally coupled continuous systems; most of the
ideas presented can be easily extended to discrete
systems.

Let us also emphasize that the same year, 1990, 
saw the publication of another seminal paper, by 
Ott, Grebogi, and Yorke (OGY) on the control of 
chaos (Ott et al. 1990). Recently, it has been 
realized that synchronization and control of chaos 
share a common root in nonlinear control theory. 
Both topics were presented by many authors in a 
unified framework. However, synchronization of 
chaos has evolved in its own right, even if it is 
nowadays known as a part of the nonlinear control 
theory. 


Synchronization and Stability 


For the basic master-slave configuration, where an
autonomous chaotic system (the master)

$$\frac{dX}{dt} = F(X), \qquad X \in \mathbb{R}^n \qquad [1]$$

drives another system (the slave),

$$\frac{dY}{dt} = G(X, Y), \qquad Y \in \mathbb{R}^m \qquad [2]$$

synchronization takes place when Y asymptotically
copies, in a certain manner, a subset $X_1$ of X. That
is, there exists a relation between the two coupled
systems, which could be a smooth invertible function
$\psi$, which transforms the trajectories on the
attractor of the first system into those on the attractor
of the second system. In other words, if we know,
after a transient regime, the state of the first system,
it allows us to predict the state of the second:
$Y(t) = \psi(X(t))$. Generally, it is assumed that $n \geq m$;
however, for the sake of readability (even if this
is not a necessary restriction) only the case n = m will
be considered; thus, $X_1 = X$. Henceforth, if we
denote the difference $Y - \psi(X)$ by $X_\perp$, in order to
arrive at a synchronized motion, it is expected that

$$\| X_\perp \| \to 0 \quad \text{as } t \to +\infty \qquad [3]$$

If $\psi$ is the identity function, the process is called IS.


Definition of IS System [2] synchronizes with
system [1], if the set $M = \{(X, Y) \in \mathbb{R}^n \times \mathbb{R}^n,\ Y = X\}$
is an attracting set with a basin of attraction B ($M \subset B$)
such that $\lim_{t \to \infty} \|X(t) - Y(t)\| = 0$, for all
$(X(0), Y(0)) \in B$.


Thus, this regime corresponds to the situation 
where all the variables of two (or more) coupled 
chaotic systems converge. 


If $\psi$ is not the identity function, the phenomenon
is more general and is referred to as GS.


Definition of GS System [2] synchronizes with
system [1], in the generalized sense, if there exists a
transformation $\psi : \mathbb{R}^n \to \mathbb{R}^m$, a manifold
$M = \{(X, Y) \in \mathbb{R}^{n+m},\ Y = \psi(X)\}$ and a subset B ($M \subset B$),
such that for all $(X_0, Y_0) \in B$, the trajectory based
on the initial conditions $(X_0, Y_0)$ approaches M as
time goes to infinity. This is explained further in the
following.


Henceforth, in the case of IS, eqn [3] above means
that a certain hyperplane M, called the synchronization
manifold, within $\mathbb{R}^{2n}$, is asymptotically stable.
Consequently, to establish synchronous motion,
we have to prove that the origin of the transverse
system $X_\perp = Y - X$ is asymptotically stable, that is,
that the motion transversal to the synchronization
manifold dies out.

Significant progress has been made by
mathematicians and physicists in studying the
stability of synchronous motions. Two main tools
are used in the literature for this aim: conditional
Lyapunov exponents and asymptotic stability. In the
examples given below, we will essentially formulate
conditions for synchronization in terms of Lyapunov
exponents, which play a central role in chaos theory.
These quantities measure the sensitive dependence
on initial conditions for a dynamical system and also
quantify synchronization of chaos.

The Lyapunov exponents associated with the
variational equation corresponding to the transverse
system $X_\perp$:

$$\frac{dX_\perp}{dt} = DF(X)\, X_\perp \qquad [4]$$

where DF(X) is the Jacobian of the vector field
evaluated along the driving trajectory X, are referred
to as transverse or conditional Lyapunov exponents
(CLEs).

In the case of IS, it appears that the condition $L^\perp_{\max} < 0$
is sufficient to ensure synchronization, where $L^\perp_{\max}$ is
the largest CLE. Indeed, eqn [4] gives the dynamics of
the motion transverse to the synchronization manifold;
therefore, CLEs indicate whether this motion dies out or not,
and hence whether the synchronization state is stable
or not. Consequently, a negative $L^\perp_{\max}$ ensures the
stability of the synchronized state. This will be best
explained using two examples below.
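A largest CLE can be estimated numerically by integrating a transverse perturbation along the driving trajectory and averaging its logarithmic growth rate. The following sketch does this for one standard test case chosen here for illustration: the classical Lorenz system with parameters 10, 28, 8/3, whose x variable drives a replica of its (y, z) subsystem. The drive choice, the fixed-step fourth-order Runge-Kutta integrator, the step size, the transient cutoff, and all names are assumptions of this sketch.

```python
import math

SIGMA, R, B = 10.0, 28.0, 8.0 / 3.0  # classical Lorenz parameters (assumed test case)


def rhs(s):
    """Master (x, y, z) plus a tangent vector (ey, ez) of the x-driven (y, z) response."""
    x, y, z, ey, ez = s
    return (
        SIGMA * (y - x),
        R * x - y - x * z,
        x * y - B * z,
        -ey - x * ez,      # linearization of the response y-equation
        x * ey - B * ez,   # linearization of the response z-equation
    )


def rk4_step(s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(s)
    k2 = rhs(tuple(v + 0.5 * dt * k for v, k in zip(s, k1)))
    k3 = rhs(tuple(v + 0.5 * dt * k for v, k in zip(s, k2)))
    k4 = rhs(tuple(v + dt * k for v, k in zip(s, k3)))
    return tuple(v + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for v, a, b, c, d in zip(s, k1, k2, k3, k4))


def largest_cle(t_end=100.0, dt=0.005, t_skip=10.0):
    """Average log growth rate of the renormalized tangent vector after a transient."""
    s = (1.0, 1.0, 1.0, 1.0, 0.0)
    n_skip = int(round(t_skip / dt))
    log_sum, t_acc = 0.0, 0.0
    for step in range(int(round(t_end / dt))):
        s = rk4_step(s, dt)
        norm = math.hypot(s[3], s[4])
        s = (s[0], s[1], s[2], s[3] / norm, s[4] / norm)  # renormalize each step
        if step >= n_skip:
            log_sum += math.log(norm)
            t_acc += dt
    return log_sum / t_acc
```

For this particular drive, the squared length of the tangent vector obeys $\frac{d}{dt}(e_y^2 + e_z^2) = -2e_y^2 - 2b\,e_z^2$, so the estimate returned by `largest_cle()` must lie between $-b$ and $-1$; a negative value is the signature of a stable synchronized state in the sense described above.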

Even if there exist other approaches for studying
synchronization, one may ask if this condition on
$L^\perp_{\max}$ is true in general. To answer this question,
mathematicians have recently formulated it in terms
of properties of manifolds (or synchronization
hyperplanes). Some rigorous results on (generalized)
synchronization, when the system is smooth, are
given by Josic (2000). This approach relies on the
Fenichel theory of normally hyperbolic invariant
manifolds and quantities that resemble Lyapunov
exponents, and is referred to as differentiable GS.
However, many situations correspond to the case
where, in some region of values of the coupling
parameters, the function $\psi$ is only continuous but not
smooth, that is, the graph of $\psi$ is a complicated
geometrical object. This kind of synchronization
is called nonsmooth GS (Afraimovich et al. 2001).
Furthermore, the mathematical theory of IS often
assumes the coupled oscillators to be identical, even
if, in practice, no two oscillators are exact copies of
each other. This leads to small differences in system
parameters and then to synchronization errors.
These errors have been studied by many authors
(see, e.g., Illing (2002), and references therein).


Identical Synchronization 


Perhaps the best way to explain synchronization of 
chaos is through IS, also referred to as conventional 
or complete synchronization (Boccaletti et al. 2002). 
It is the simplest form of chaos synchronization and 
generalizes to the complete replacement which is 
explained below. It is also the most typical form of 
chaotic synchronization often observable in two 
identical systems. 

There are various processes leading to synchronization;
depending on the particular coupling configuration
used, these processes could be very different.
So, one has to distinguish between the following two
main situations, even if they are, in some sense,
similar: unidirectional and bidirectional
coupling. Indeed, synchronization of chaotic systems
is often studied for schemes of the form

$$\frac{dX}{dt} = F(X) + kN(X - Y), \qquad \frac{dY}{dt} = G(Y) + kM(X - Y) \qquad [5]$$

where F and G act in $\mathbb{R}^n$, $(X, Y) \in (\mathbb{R}^n)^2$, k is a scalar,
and M and N are coupling matrices belonging to
$\mathbb{R}^{n \times n}$. If F = G the two subsystems X and Y are
identical. Moreover, when both matrices are nonzero
then the coupling is called bidirectional, while
it is referred to as unidirectional if one is the zero
matrix, and the other nonzero.


Constructing Pairs of Synchronized Systems: 
Complete Replacement 


Pecora and Carroll (1990) proposed the use of
stable subsystems of given chaotic systems to
construct pairs of unidirectionally coupled synchronizing
systems. Since then generalizations of this
approach have been developed and various methods
now exist to synchronize systems (Wu 2002,
Hasler 1998).

One way to build a pair of synchronized
systems is then to use the basic construction method
introduced by Pecora and Carroll, who made an
important observation. They found that, when they
make a replica of part of a chaotic system and send
a system variable from the original system (transmitter)
to drive this replica (receiver), sometimes the
replica subsystem and the original chaotic one lock
in step and evolve together chaotically in
synchrony. This method can be described as follows.
Consider the autonomous n-dimensional dynamical
system,

$$\frac{du}{dt} = F(u) \qquad [6]$$

divide this system into two subsystems (u = (v, w)),

$$\frac{dv}{dt} = G(v, w), \qquad \frac{dw}{dt} = H(v, w) \qquad [7]$$

where $v = (u_1, \ldots, u_m)$, $w = (u_{m+1}, \ldots, u_n)$, $G = (F_1, \ldots, F_m)$,
and $H = (F_{m+1}, \ldots, F_n)$. Next, create a new
subsystem w′ identical to the w-subsystem. This
yields a (2n − m)-dimensional system:

$$\frac{dv}{dt} = G(v, w), \qquad \frac{dw}{dt} = H(v, w), \qquad \frac{dw'}{dt} = H(v, w') \qquad [8]$$

The first state-variable component v(t) of the (v, w)
system is then used as the input to the w′-system.
The coupling is unidirectional and the (v, w)
subsystem is referred to as the driving (or master)
system, the w′-subsystem as the response (or slave)
system. In this context, the following notions and
results are useful.


Definition If $\lim_{t \to +\infty} \|w'(t) - w(t)\| = 0$ and w′(t)
continues to remain in step with w(t) in the course
of time, the two subsystems are said to be
synchronized.


Definition The Lyapunov exponents of the 
response subsystem (w’) for a particular driven 
trajectory v(t) are called CLEs. 


Let $w(t)$ be a chaotic trajectory with initial
condition $w(0)$, and $w'(t)$ be a trajectory started at
a nearby point $w'(0)$. The basic idea of the Pecora-Carroll
approach is to establish the asymptotic
stability of the solutions of the $w'$-subsystem by means
of CLEs. They have shown the following result
(Pecora and Carroll 1990):


Theorem A necessary and sufficient condition for
the two subsystems, $w$ and $w'$, to be synchronized is
that all of the CLEs be negative.


Note that only a finite number of possible
decompositions (or couplings) v-w exist; this is
bounded by the number of different possible
subsystems, namely $N(N-1)/2$. (For a description
and mathematical analysis of various coupling
schemes see Wu (2002).) Furthermore, if the main
system [6] is split in a different way, (complete)
synchronization may fail to occur. Indeed, in general,
only a few of the possible response subsystems
possess negative CLEs, and may thus be used to
implement synchronizing systems using the Pecora-Carroll
method. In fact, it has been pointed out in the
literature that in some cases, the CLE criterion is not
as practical as some other criteria.
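To experiment with different drive-response decompositions, construction [8] can be written once for an arbitrary vector field. The following is only a minimal sketch: the helper name `make_drive_response`, its argument `drive_idx` (the indices of the components that form v), and the ordering of the combined state as (u, w') are assumptions made here for illustration, not notation from the cited papers. The vector field used as an example is system [9] introduced in the next paragraph.

```python
def make_drive_response(F, n, drive_idx):
    """Right-hand side of the (2n - m)-dimensional system [8].

    F maps a length-n state u to du/dt. drive_idx lists the m components
    of u that form the driving signal v; the remaining components form w.
    The combined state is ordered as (u, w'), where w' copies the w-part.
    """
    resp_idx = [i for i in range(n) if i not in drive_idx]

    def rhs(state):
        u, w_copy = list(state[:n]), state[n:]
        replica = list(u)                        # v is taken from the drive ...
        for i, value in zip(resp_idx, w_copy):   # ... and w is replaced by w'
            replica[i] = value
        du = F(u)
        d_replica = F(replica)
        return list(du) + [d_replica[i] for i in resp_idx]

    return rhs


def system9(u):
    """Generalized Lorenz system [9]."""
    x, y, z = u
    return [-9 * x - 9 * y, -17 * x - y - x * z, -z + x * y]


# x drives: this reproduces the right-hand side of system [10].
rhs_x_drive = make_drive_response(system9, 3, [0])
# y drives instead: a different decomposition of the same system.
rhs_y_drive = make_drive_response(system9, 3, [1])

print(rhs_x_drive([1.0, 2.0, -3.0, 0.5, 4.0]))
```

Whether a given decomposition actually synchronizes still has to be decided by its CLEs; the sketch only builds the equations.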

For simplicity, the idea will now be developed on
the following simple three-dimensional autonomous
system, which belongs to the class of dynamical
systems called generalized Lorenz systems (see
Derivière and Aziz-Alaoui (2003), and references
therein):

\[
\begin{aligned}
\dot{x} &= -9x - 9y \\
\dot{y} &= -17x - y - xz \\
\dot{z} &= -z + xy
\end{aligned}
\qquad [9]
\]

(This should be compared with the well-known
Lorenz system:

\[
\begin{aligned}
\dot{x} &= -10x + 10y \\
\dot{y} &= 28x - y - xz \\
\dot{z} &= -\tfrac{8}{3} z + xy
\end{aligned}
\]

which differs in the signs of various terms and the
values of coefficients.) From previous observations,
it was shown that system [9] oscillates chaotically;
its Lyapunov exponents are +0.601, 0.000, and
-16.470; it exhibits the chaotic attractor of Figure 1,
with a three-dimensional structure very similar to that
of the Lorenz attractor (in fact, it satisfies the condition
$z < 0$, but in our context this does not matter).

Let us divide system [9] into two subsystems
$v = x_1$ and $w = (y_1, z_1)$. By creating a copy
$w' = (y_2, z_2)$ of the w-subsystem, we obtain the
following five-dimensional dynamical system:

\[
\begin{aligned}
\dot{x}_1 &= -9x_1 - 9y_1 \\
\dot{y}_1 &= -17x_1 - y_1 - x_1 z_1 \\
\dot{z}_1 &= -z_1 + x_1 y_1 \\
\dot{y}_2 &= -17x_1 - y_2 - x_1 z_2 \\
\dot{z}_2 &= -z_2 + x_1 y_2
\end{aligned}
\qquad [10]
\]


In numerical experiments, it was observed that the
motion quickly leads to the two equalities
$\lim_{t \to +\infty} |y_2 - y_1| = 0$ and $\lim_{t \to +\infty} |z_2 - z_1| = 0$,
that is, $\lim_{t \to +\infty} \|w' - w\| = 0$. These
equalities persist as the system evolves. Hence, the
two subsystems $w$ and $w'$ are synchronized. Figure 2
illustrates this phenomenon.
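The numerical experiment described above can be repeated with a few lines of code. The sketch below integrates system [10] with a classical fourth-order Runge-Kutta scheme; the step size, the integration time, and the initial condition are arbitrary choices made here, not values taken from the source.

```python
import math


def rhs10(s):
    """Right-hand side of the five-dimensional system [10]."""
    x1, y1, z1, y2, z2 = s
    return [-9 * x1 - 9 * y1,
            -17 * x1 - y1 - x1 * z1,
            -z1 + x1 * y1,
            -17 * x1 - y2 - x1 * z2,    # response: driven by x1
            -z2 + x1 * y2]


def rk4_step(f, s, dt):
    """One classical Runge-Kutta step for ds/dt = f(s)."""
    k1 = f(s)
    k2 = f([a + 0.5 * dt * b for a, b in zip(s, k1)])
    k3 = f([a + 0.5 * dt * b for a, b in zip(s, k2)])
    k4 = f([a + dt * b for a, b in zip(s, k3)])
    return [a + dt / 6.0 * (p + 2 * q + 2 * r + w)
            for a, p, q, r, w in zip(s, k1, k2, k3, k4)]


def sync_error(s):
    """Euclidean distance ||w' - w|| between response and drive."""
    return math.hypot(s[3] - s[1], s[4] - s[2])


def simulate(s0, t_end=20.0, dt=0.005):
    s = list(s0)
    errors = [sync_error(s)]
    for _ in range(int(round(t_end / dt))):
        s = rk4_step(rhs10, s, dt)
        errors.append(sync_error(s))
    return s, errors


# (x1, y1, z1) is the drive; (y2, z2) starts far away from (y1, z1).
final_state, errors = simulate([1.0, 1.0, -1.0, -5.0, 8.0])
print("initial error:", errors[0])
print("final error:  ", errors[-1])
```

Plotting `errors` on a logarithmic scale shows how quickly the response forgets its initial condition.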

It is also easy to verify that the synchronization 
persists even if a slight change in the parameters of 
the system is made. The CLEs of the linearization of 
the system around the synchronous state, the 
negativity of which determines the stability of the 
synchronized solution, are also computed easily. 
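For system [10] the last statement can be checked by hand. The following short derivation is a sketch added for illustration, not a computation reproduced from the cited papers; the symbols $e_y = y_2 - y_1$ and $e_z = z_2 - z_1$ are introduced here. Subtracting the drive equations of [10] from the response equations gives

```latex
\begin{aligned}
\dot{e}_y &= -e_y - x_1 e_z, \\
\dot{e}_z &= -e_z + x_1 e_y, \\
\frac{d}{dt}\left(e_y^2 + e_z^2\right)
  &= 2 e_y \dot{e}_y + 2 e_z \dot{e}_z
   = -2\left(e_y^2 + e_z^2\right).
\end{aligned}
```

The terms containing $x_1$ cancel, so $\|w'(t) - w(t)\| = \|w'(0) - w(0)\|\, e^{-t}$ whatever the driving signal $x_1(t)$ is. Both CLEs of this particular response subsystem are therefore equal to $-1$, which is consistent with the synchronization observed numerically.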

Pecora and Carroll built system [10] in a similar way,
using the following steps. Starting with two copies
of system [9], a signal $x(t)$ is transmitted from the
first to the second: in the second system all x-components
are replaced with the signal from the
first system, that is, $x_2$ is replaced by $x_1$ in the
second system. Finally, the $dx_2/dt$ equation is
eliminated, since it is exactly the same as the $dx_1/dt$
equation, and is superfluous. This then results in
system [10]. For this reason, Pecora and Carroll called
this construction a complete replacement. Thus, it is
natural to think of the $x_1$ variable as driving the
second system, and to label the first system the
drive and the second system the response. In fact,
this method is a particular case of the unidirectional
coupling method explained below. Note also that
this method could be modified by using a partial
substitution approach, in which a response variable
is replaced with the drive counterpart only in certain
locations (Pecora et al. 1997).

Figure 1 The chaotic attractor of system [9]: x-y and x-z plane projections.


Unidirectional IS 


IS is also called one-way diffusive coupling,
drive-response coupling,
master-slave coupling, or negative feedback control.

With $F = G$ and $N = 0$, system [5] becomes unidirectionally
coupled, and reads

\[
\begin{aligned}
\frac{dX}{dt} &= F(X) \\
\frac{dY}{dt} &= F(Y) + kM(X - Y)
\end{aligned}
\qquad [11]
\]


M is then a matrix that determines the linear 
combination of X components that will be used 
in the difference, and k determines the strength of 
the coupling (see, for an interesting review on 
this subject, Pecora et al. (1997)). In unidirectional 
synchronization, the evolution of the first system 
(the drive) is unaltered by the coupling, the second 
system (the response) is then constrained to copy the 
dynamics of the first. Let us consider an example
with two copies of system [9], and for

\[
M = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\qquad [12]
\]

that is, by adding a damping term to the first equation
of the response system, we get the following unidirectionally
coupled system, coupled through a linear
term $k > 0$ according to variables $x_{1,2}$:

\[
\begin{aligned}
\dot{x}_1 &= -9x_1 - 9y_1 \\
\dot{y}_1 &= -17x_1 - y_1 - x_1 z_1 \\
\dot{z}_1 &= -z_1 + x_1 y_1 \\
\dot{x}_2 &= -9x_2 - 9y_2 - k(x_2 - x_1) \\
\dot{y}_2 &= -17x_2 - y_2 - x_2 z_2 \\
\dot{z}_2 &= -z_2 + x_2 y_2
\end{aligned}
\qquad [13]
\]

Figure 2 Complete replacement synchronization. Time series for (a) $y_i(t)$ and (b) $z_i(t)$, $i = 1, 2$, in system [10]. The difference
between the variable of the transmitter and the variable of the receiver tends to zero as time progresses, that is,
synchronization occurs after transients die down. (c) The plot of amplitudes $y_1$ against $y_2$, after transients die down, shows a diagonal
line, which also indicates that the receiver and the transmitter are maintaining synchronization. The plot of $z_1$ against $z_2$ shows a
similar behavior.


For $k = 0$, the two subsystems are uncoupled; for
$k > 0$ both subsystems are unidirectionally coupled;
and for $k \to +\infty$, we recover the complete replacement
coupling scheme explained above. Our numerical
computations yield the optimal value $\tilde{k}$ for the
synchronization; we found that for $k \ge \tilde{k} = 4.999$,
both subsystems of [13] synchronize. That is,
starting from random initial conditions, and after
some transient time, system [13] generates the same
attractor as system [9] (see Figure 1). Consequently,
all the variables of the coupled chaotic
subsystems converge: $x_2$ converges to $x_1$, $y_2$ to $y_1$,
and $z_2$ to $z_1$ (see Figure 3). Thus, the second system
(the response) is locked to the first one (the drive).
Alternatively, observation of diagonal lines in
correlation diagrams, which plot the amplitudes $x_1$
against $x_2$, $y_1$ against $y_2$, and $z_1$ against $z_2$, can also
indicate the occurrence of system synchronization.

IS was the first type of synchronization for which
examples of unidirectionally coupled chaotic systems
were presented. It
is important for potential applications of chaos
synchronization in communication systems, or for
time-series analysis, where the information flow is
also unidirectional.


Bidirectional IS 


A second brief example uses a bidirectional (also
called mutual or two-way) coupling. In this situation,
in contrast to the unidirectional coupling, both
drive and response systems are connected in such a
way that they influence each other's behavior. Many
biological or physical systems consist of bidirectionally
interacting elements or components; examples
range from cardiac and respiratory systems to
coupled lasers with feedback. Let us then take two
copies of the same system [9] as given above, but
two-way coupled through a linear constant term $k > 0$
according to variables $x_{1,2}$:

\[
\begin{aligned}
\dot{x}_1 &= -9x_1 - 9y_1 - k(x_1 - x_2) \\
\dot{y}_1 &= -17x_1 - y_1 - x_1 z_1 \\
\dot{z}_1 &= -z_1 + x_1 y_1 \\
\dot{x}_2 &= -9x_2 - 9y_2 - k(x_2 - x_1) \\
\dot{y}_2 &= -17x_2 - y_2 - x_2 z_2 \\
\dot{z}_2 &= -z_2 + x_2 y_2
\end{aligned}
\qquad [14]
\]

Figure 3 Time series for $x_i(t)$, $y_i(t)$, and $z_i(t)$ ($i = 1, 2$) in system [13] for the coupling constant $k = 5.0$, that is, beyond the threshold
necessary for synchronization. After transients die down, the two subsystems synchronize perfectly.


We can get an idea of the onset of synchronization
by plotting, for example, $x_1$ against $x_2$ for various
values of the coupling-strength parameter k. Our
numerical computations yield the optimal value $\tilde{k}$
for the synchronization, $\tilde{k} \approx 2.50$ (Figure 4): for
$k \ge \tilde{k}$, both $(x_i, y_i, z_i)$ subsystems synchronize and
system [14] also generates the attractor of Figure 1.
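Systems [13] and [14] differ only in whether the first system also feels the coupling, so both can be explored with one routine. The code below is a minimal sketch under assumptions made here: the function names, the Runge-Kutta step size, and the initial condition are illustrative choices, and the coupling values 5.0 and 2.8 are simply the ones quoted in the captions of Figures 3 and 4.

```python
import math


def f9(x, y, z):
    """Vector field of system [9]."""
    return -9 * x - 9 * y, -17 * x - y - x * z, -z + x * y


def coupled_rhs(k, bidirectional):
    """Right-hand side of system [13] (unidirectional) or [14] (bidirectional)."""
    def rhs(s):
        x1, y1, z1, x2, y2, z2 = s
        dx1, dy1, dz1 = f9(x1, y1, z1)
        dx2, dy2, dz2 = f9(x2, y2, z2)
        dx2 -= k * (x2 - x1)          # the response always feels the drive
        if bidirectional:
            dx1 -= k * (x1 - x2)      # system [14] only
        return [dx1, dy1, dz1, dx2, dy2, dz2]
    return rhs


def rk4_step(f, s, dt):
    k1 = f(s)
    k2 = f([a + 0.5 * dt * b for a, b in zip(s, k1)])
    k3 = f([a + 0.5 * dt * b for a, b in zip(s, k2)])
    k4 = f([a + dt * b for a, b in zip(s, k3)])
    return [a + dt / 6.0 * (p + 2 * q + 2 * r + w)
            for a, p, q, r, w in zip(s, k1, k2, k3, k4)]


def integrate(rhs, s0, t_end, dt=0.005):
    s = list(s0)
    for _ in range(int(round(t_end / dt))):
        s = rk4_step(rhs, s, dt)
    return s


def distance(s):
    """Distance of the state from the set x1 = x2, y1 = y2, z1 = z2."""
    return math.sqrt(sum((s[i + 3] - s[i]) ** 2 for i in range(3)))


start = [1.0, 1.0, -1.0, -2.0, 3.0, -4.0]
for label, bidir, k in (("[13]", False, 5.0), ("[14]", True, 2.8)):
    end = integrate(coupled_rhs(k, bidir), start, t_end=50.0)
    print(label, "k =", k, "distance after t = 50:", distance(end))
```

Repeating the loop for smaller values of k gives a rough picture of the synchronization threshold; close to the threshold the transient can be long, so the integration time may need to be increased.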


Synchronization manifold and stability Geometrically,
the fact that systems [13] and [14],
beyond synchronization, generate the same attractor
as system [9] implies that the attractors of these
combined drive-response six-dimensional systems
are confined to a three-dimensional hyperplane (the
synchronization manifold) defined by $Y = X$. After
the synchronization is reached, this manifold is a
stable submanifold of the full phase space $\mathbb{R}^6$.
Figure 5 gives an idea of what the geometry of the
synchronous attractor of system [13] or [14] looks
like, by exhibiting the projection of the phase space
$\mathbb{R}^6$ onto the $(x_1, y_1, y_2)$ subspace. One can similarly
plot any combination of the variables $x_i$, $y_i$, and
$z_i$ ($i = 1, 2$) and get the same result, since the
motion, in case of synchronization, is confined to
the hyperplane defined in $\mathbb{R}^6$ by the equalities
$x_1 = x_2$, $y_1 = y_2$, and $z_1 = z_2$.

Figure 4 Illustration of the onset of synchronization of system [14]. (a)-(c) Plots of amplitudes $x_1$ against $x_2$ for values of the coupling
parameter $k = 0.5, 1.5, 2.8$, respectively. The system synchronizes for $k > 2.5$. (d) Plot, for $k = 2.8$, of the norm $N(X) = \|x_1 - x_2\| +
\|y_1 - y_2\| + \|z_1 - z_2\|$ versus t, which shows that the system synchronizes very quickly.

This hyperplane is stable since small perturbations
which take the trajectory off the synchronization
manifold decay in time. Indeed, as stated earlier,
the CLEs of the linearization of the system around the
synchronous state determine the stability of
the synchronized solution. This leads to requiring
that the origin of the transverse system, $X_\perp$, is
asymptotically stable. To see this, for both systems
[13] and [14], we switch to the new set of
coordinates $X_\perp = Y - X$, that is, $x_\perp = x_2 - x_1$,
$y_\perp = y_2 - y_1$, and $z_\perp = z_2 - z_1$. The origin (0, 0, 0)
is obviously a fixed point for this transverse system,
within the synchronization manifold. Therefore, for
small deviations from the synchronization manifold,
this system reduces to a typical variational equation:

\[ \frac{dX_\perp}{dt} = DF(X)\, X_\perp \qquad [15] \]

where $DF(X)$ is the Jacobian of the vector field
evaluated on the driving trajectory X, that is,


\[
\begin{pmatrix} dx_\perp/dt \\ dy_\perp/dt \\ dz_\perp/dt \end{pmatrix}
= V \begin{pmatrix} x_\perp \\ y_\perp \\ z_\perp \end{pmatrix}
\qquad [16]
\]

For systems [13] and [14], we obtain

\[
V_i = \begin{pmatrix}
-9 - k_i & -9 & 0 \\
-17 - z_1 & -1 & -x_1 \\
y_1 & x_1 & -1
\end{pmatrix}
\qquad [17]
\]

with $k_i = k$ for system [13] and $k_i = 2k$ for system
[14]. Let us remark that the only difference between
both matrices $V_i$ is the coupling k, which has a factor
2 in the bidirectional case.

Figure 5 The motion of synchronized system [13] or [14] takes
place on a chaotic attractor which is embedded in the
synchronization manifold, that is, the hyperplane defined by
$x_1 = x_2$, $y_1 = y_2$, and $z_1 = z_2$.

Figure 6 The largest transverse Lyapunov exponent $L^{\perp}_{\max}$ as
a function of coupling strength k, in the unidirectional system [13]
(solid) and the bidirectional system [14] (dotted).

Figure 6 shows the
dependence of $L^{\perp}_{\max}$ on k, for both examples of
unidirectionally and bidirectionally coupled systems.
$L^{\perp}_{\max}$ becomes negative as k increases, which
ensures the stability of the synchronized state for
systems [13] and [14].
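A curve such as the one in Figure 6 can be approximated by integrating the variational equation along a trajectory of the drive and renormalizing the perturbation vectors at every step. The sketch below estimates all three transverse exponents with a Gram-Schmidt procedure; it is not the authors' code, and the function names, the argument `k_eff` (standing for k in [13] and 2k in [14]), the step size, and the integration time are assumptions made here. The transverse matrix is obtained by linearizing the difference of the response and drive equations about the synchronous state.

```python
import math


def f9(s):
    x, y, z = s
    return [-9 * x - 9 * y, -17 * x - y - x * z, -z + x * y]


def transverse_matrix(s, k_eff):
    """Linearization of the transverse dynamics on the drive state s."""
    x, y, z = s
    return [[-9.0 - k_eff, -9.0, 0.0],
            [-17.0 - z, -1.0, -x],
            [y, x, -1.0]]


def rhs(state, k_eff):
    """Drive (3 components) followed by three perturbation vectors (9 components)."""
    s = state[:3]
    V = transverse_matrix(s, k_eff)
    out = f9(s)
    for j in range(3):
        p = state[3 + 3 * j: 6 + 3 * j]
        out += [sum(V[r][c] * p[c] for c in range(3)) for r in range(3)]
    return out


def rk4_step(f, s, dt):
    k1 = f(s)
    k2 = f([a + 0.5 * dt * b for a, b in zip(s, k1)])
    k3 = f([a + 0.5 * dt * b for a, b in zip(s, k2)])
    k4 = f([a + dt * b for a, b in zip(s, k3)])
    return [a + dt / 6.0 * (p + 2 * q + 2 * r + w)
            for a, p, q, r, w in zip(s, k1, k2, k3, k4)]


def gram_schmidt(vectors):
    """Return an orthonormal basis and the norms found before normalization."""
    basis, norms = [], []
    for v in vectors:
        w = list(v)
        for b in basis:
            dot = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - dot * bi for wi, bi in zip(w, b)]
        n = math.sqrt(sum(wi * wi for wi in w))
        norms.append(n)
        basis.append([wi / n for wi in w])
    return basis, norms


def transverse_exponents(k_eff, t_end=20.0, dt=0.005, s0=(1.0, 1.0, -1.0)):
    state = list(s0) + [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
    sums = [0.0, 0.0, 0.0]
    steps = int(round(t_end / dt))
    for _ in range(steps):
        state = rk4_step(lambda u: rhs(u, k_eff), state, dt)
        vecs = [state[3 + 3 * j: 6 + 3 * j] for j in range(3)]
        basis, norms = gram_schmidt(vecs)
        sums = [a + math.log(n) for a, n in zip(sums, norms)]
        state = state[:3] + [c for b in basis for c in b]
    return [a / (steps * dt) for a in sums]


exponents = transverse_exponents(5.0)
print("finite-time transverse exponents for k_eff = 5:", exponents)
```

A useful consistency check is that the three exponents must add up to the trace of the transverse matrix, which is the constant $-(11 + k_i)$. The first entry is the finite-time estimate of the largest transverse exponent; a longer integration time is needed before its sign can be trusted near the threshold.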

Let us note that this can also be proved
analytically, as done by Derivière and Aziz-Alaoui
(2003), by using a suitable Lyapunov function and
an extended version of the LaSalle invariance
principle.


Desynchronization motion Synchronization depends
not only on the coupling strength, but also on the
vector field and the coupling function. For some
choices of these quantities, synchronization may
occur only within a finite range $[k_1, k_2]$ of coupling
strength; in such a case a desynchronization phenomenon
occurs: increasing k beyond the
critical value $k_2$ yields loss of the synchronized
motion ($L^{\perp}_{\max}$ becomes positive).


Generalized Synchronization 


Identical chaotic systems synchronize by following the 
same chaotic trajectory. However, real systems are in 
general not identical. For instance, when the para- 
meters of two coupled identical systems do not match, 
or when these coupled systems belong to different 
classes, complete IS may not be expected, because 
there does not exist such an invariant manifold Y = X, 
as for IS. For non-identical systems, the possibility of 
some type of synchronization has been investigated 
(Afraimovich et al. 1986). It was shown that when two
different systems are coupled with sufficiently strong
coupling strength, a general synchronous relation
between their states could exist and it could be
expressed by a smooth invertible function $\psi$,
$Y(t) = \psi(X(t))$. This phenomenon, called GS, is thus a
relaxed and extended form of IS in non-identical
systems.

However, it may also occur for pairs of identical 
systems, for example, for systems having reflection 
symmetry, F(—X)= —F(X). Besides these examples 
of GS, others also exist that exploit symmetries of 
the underlying systems (Parlitz and Kocarev 1999). 

GS was introduced for unidirectionally coupled 
systems by Rulkov et al. (1995). For simplicity, we 
also focus on unidirectionally coupled continuous 
time systems: 


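\[
\begin{aligned}
\frac{dX}{dt} &= F(X) \\
\frac{dY}{dt} &= G(Y, u(t))
\end{aligned}
\qquad [18]
\]

where $X \in \mathbb{R}^n$, $Y \in \mathbb{R}^m$, $F : \mathbb{R}^n \to \mathbb{R}^n$, $G : \mathbb{R}^m \times \mathbb{R}^k \to \mathbb{R}^m$, and $u(t) = (u_1(t), \ldots, u_k(t))$ with
$u_i(t) = h_i(X(t, X_0))$. Two (non-identical) dynamical
systems are said to be synchronized in a generalized
sense if there is a continuous function $\psi$ from the
phase space of the first to the phase space of the
second, taking orbits of the first system to orbits of
the second.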

The main problem is to know when and under 
what conditions system [18] undergoes GS. Many 
authors have addressed this question, and it has been 
shown that asymptotic stability is equally significant 
for this more universal concept (for some theoretical 
results, see Rulkov et al. (1995) and Parlitz and 
Kocarev (1999)). For unidirectionally coupled con- 
tinuous time systems, the following results hold: 


Theorem A necessary and sufficient condition for
system [18] to be synchronized in the generalized
sense is that for each $u(t) = u(X(t, X_0))$ the response
system $dY/dt = G(Y, u(t))$ is asymptotically stable.


When it is not possible to find a Lyapunov function 
in order to use this theorem, one can numerically 
compute the CLEs of the response system, and use the 
following result: 


Theorem The drive and response subsystems of 
system [18] synchronize in the generalized sense iff 
all of the CLEs of the response subsystem are 
negative. 


The definition of $\psi$ has the advantage that it allows
one to discuss the synchronization of non-identical
systems and, at the same time, to consider synchronization
in terms of the properties of the synchronization
manifold. Therefore, it is important to study the
existence of the transformation $\psi$ and its nature
(continuity, smoothness, ...). Unfortunately, except in
special cases (Afraimovich et al. 1986), one will rarely
be able to produce formulas exhibiting the mapping $\psi$.

An example of two unidirectionally coupled
chaotic systems which synchronize in the generalized
sense is given below. Consider the following Rössler
system driven by system [9]:

\[
\begin{aligned}
\dot{x}_1 &= -9x_1 - 9y_1 \\
\dot{y}_1 &= -17x_1 - y_1 - x_1 z_1 \\
\dot{z}_1 &= -z_1 + x_1 y_1 \\
\dot{x}_2 &= -y_2 - z_2 - k\left(x_2 - (x_1^2 + y_1^2)\right) \\
\dot{y}_2 &= x_2 + 0.2 y_2 - k\left(y_2 - (y_1^2 + z_1^2)\right) \\
\dot{z}_2 &= 0.2 + z_2 (x_2 - 9.0) - k\left(z_2 - (x_1^2 + z_1^2)\right)
\end{aligned}
\qquad [19]
\]

As shown in Figure 7, it appears impossible to tell
what the relation is between the transmitter subsystem
$(x_1, y_1, z_1)$ in eqn [19] and the Rössler
response subsystem $(x_2, y_2, z_2)$, either at $k = 1$ or at $k = 100$.

However, GS occurs for large values of the 
coupling-strength parameter k. Therefore, for such 
values we expect that orbits of [19] will lie in the 
vicinity of a certain synchronization manifold. 
Indeed, let us define the set

\[
S = \{ (x_1, y_1, z_1, x_2, y_2, z_2) \in \mathbb{R}^6 :
x_2 = x_1^2 + y_1^2,\; y_2 = y_1^2 + z_1^2,\; z_2 = x_1^2 + z_1^2 \}
\]

Since the projections of S onto the coordinates
$(x_1, y_1, x_2)$, $(y_1, z_1, y_2)$, and $(x_1, z_1, z_2)$ are paraboloids,
we can see how the synchronization manifold
is approached. This is illustrated in Figure 8, where
the $(x_1, y_1, x_2)$ projections of typical trajectories are
shown at four different coupling values. (See Josic
(2000) for other examples and further developments;
see also Pecora et al. (1997), where the
authors summarize a method in order to get an idea
of the functional relation occurring in case of GS
between two coupled systems.)


Phase Synchronization 


For coupled non-identical chaotic systems, other
types of synchronization exist. Recently, a rather
weak degree of synchronization, the PS, of chaotic
systems has been described (Pikovsky et al. 2001).
The Greek meaning of the word synchronization,
mentioned in the introduction, is closely related to
this type of process. The synchronous motion is
actually not visible. Indeed, in PS the phases of the
chaotic systems are locked, that is, there
exists a certain relation between them, whereas the
amplitudes vary chaotically and are practically
uncorrelated. Thus, it is closest to the synchronization
of periodic oscillators.


Definition PS of two coupled chaotic oscillators
occurs if, for some integers n and m, the phase
locking condition between the corresponding
phases, $|n\phi_1(t) - m\phi_2(t)| < \text{constant}$, holds and the
amplitudes of both systems remain uncorrelated.


Let us note that such a phenomenon occurs when 
a zero Lyapunov exponent of the response system 
becomes negative, while, as explained above, iden- 
tical chaotic systems synchronize by following the 
same chaotic trajectory, when their largest trans- 
verse Lyapunov exponent of the synchronized 
manifold decreases from positive to negative values. 

Figure 7 Projections onto the (x, y) plane of typical trajectories of system [19]. (a) $(x_1, y_1)$ projection, that is, a typical trajectory of
system [9]; (b) and (c) $(x_2, y_2)$ projections at, respectively, $k = 1$ and $k = 100$.

Figure 8 Generalized synchronization. $(x_1, y_1, x_2)$ projections of typical trajectories of system [19] after transients die out, with
(a) $k = 1$, (b) $k = 20$, (c) $k = 100$, and (d) $k = 200$. For the last value, the attractor lies in the set S, three-dimensional projections of
which are paraboloids.

Moreover, following the definition above, this
phenomenon is best observed when a well-defined
phase variable can be identified in both coupled
systems. This can be done for strange attractors that
spiral around a "hole," or a particular (fixed) point
in a two-dimensional projection of the attractor. The
typical example is given by the Rössler system, which,
for some range of parameters, exhibits a Möbius-strip-like
chaotic attractor with a central hole. In such
a case, a phase angle $\phi(t)$ can be defined that decreases
or increases monotonically. For an illustration, we
take the following two coupled Rössler oscillators:

\[
\begin{aligned}
\dot{x}_1 &= -\alpha_1 y_1 - z_1 + k(x_2 - x_1) \\
\dot{y}_1 &= \alpha_1 x_1 + 0.17 y_1 \\
\dot{z}_1 &= 0.2 + z_1 (x_1 - 9.0) \\
\dot{x}_2 &= -\alpha_2 y_2 - z_2 + k(x_1 - x_2) \\
\dot{y}_2 &= \alpha_2 x_2 + 0.17 y_2 \\
\dot{z}_2 &= 0.2 + z_2 (x_2 - 9.0)
\end{aligned}
\qquad [20]
\]

with a small parameter mismatch $\alpha_{1,2} = 0.95 \pm 0.04$; k governs the strength of coupling.
If we can define a Poincaré section surface for
the system, then, for each piece of a trajectory
between two crossings of this surface, we
define the phase, as done in Pikovsky et al. (2001),
as a piecewise linear function of time, so that the
phase increment is $2\pi$ at each rotation:

\[
\phi(t) = 2\pi \frac{t - t_n}{t_{n+1} - t_n} + 2\pi n, \qquad t_n \le t < t_{n+1}
\]

where $t_n$ is the time of the nth crossing of the secant
surface.

In our example, this surface has been chosen as the
negative x-axis and is represented by the wide segment
in Figure 9a. This definition of the phase is clearly
ambiguous since it depends on the choice of the
Poincaré section; nevertheless, defined in this way,
the phase has a physically important property: it
corresponds to the direction with the zero
Lyapunov exponent in the phase space, so its perturbations
neither grow nor decay in time. Figure 9c
shows that there is a transition from the nonsynchronous
phase regime, where the phase difference
increases almost linearly with time ($k = 0.01$ and
$k = 0.05$), to a synchronous state, where the relation
$|\phi_1(t) - \phi_2(t)| < \text{constant}$ holds ($k = 0.1$), that is,
the phase difference does not grow with time.
However, the amplitudes are obviously uncorrelated,
as seen in Figure 9b. This example shows that
PS can take place at a weaker degree of synchronization
in chaotic systems. Readers can find a more
rigorous mathematical discussion of this subject,
and of the definition of phases of chaotic oscillators,
in Pikovsky et al. (2001); see also Boccaletti et al.
(2002) and references therein.
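The piecewise-linear phase described above is straightforward to compute from a sampled trajectory. The sketch below first locates the crossings of the negative x-axis by linear interpolation and then evaluates the phase between consecutive crossings; the function names are hypothetical, crossings are counted from zero, and the demonstration uses a plain circular rotation rather than the Rössler system [20].

```python
import math


def crossing_times(ts, xs, ys):
    """Times at which the orbit crosses the negative x-axis (y goes from + to -)."""
    times = []
    for i in range(len(ts) - 1):
        if xs[i] < 0 and ys[i] > 0 >= ys[i + 1]:
            frac = ys[i] / (ys[i] - ys[i + 1])      # linear interpolation in y
            times.append(ts[i] + frac * (ts[i + 1] - ts[i]))
    return times


def phase(t, times):
    """Piecewise-linear phase that gains 2*pi between successive crossings."""
    for n in range(len(times) - 1):
        if times[n] <= t < times[n + 1]:
            return 2 * math.pi * ((t - times[n]) / (times[n + 1] - times[n]) + n)
    raise ValueError("t lies outside the recorded crossings")


# Demonstration on a uniform counterclockwise rotation with period 2*pi.
ts = [0.001 * i for i in range(20000)]
xs = [math.cos(t) for t in ts]
ys = [math.sin(t) for t in ts]
times = crossing_times(ts, xs, ys)
print(times)
print(phase(times[0] + 1.0, times))
```

For two coupled oscillators the same two functions are applied to each $(x_i, y_i)$ time series, and the phase difference is then monitored as a function of time.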


Other Treatments and Types 
of Synchronization 


Lag Synchronization 


Figure 9 (a) Rössler chaotic attractor projection onto the x-y plane. (b) Amplitudes $A_1$ versus $A_2$ for the phase-synchronized case at
$k = 0.1$. (c) Time series of the phase difference for different coupling strengths k; for $k = 0.01$ PS is not achieved, while for $k = 0.1$ PS takes
place. Although the phases are locked for $k = 0.1$, the amplitudes remain chaotic and uncorrelated.

PS occurs when non-identical chaotic
oscillators are weakly coupled: the phases are
locked, while the amplitudes remain uncorrelated.
When the coupling strength becomes larger, some
relationships between amplitudes may be established.
Indeed, it has been shown (Rosenblum et al.
1997), in symmetrically coupled non-identical oscillators
and in time-delayed systems, that there exists
a regime of LS. This process appears as a coincidence
of time-shifted states of two systems:

\[ \lim_{t \to +\infty} \|Y(t) - X(t - \tau)\| = 0 \]

where $\tau$ is a positive delay.
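In practice the delay $\tau$ is not known in advance and has to be estimated from recorded signals. One simple possibility, sketched below, is to scan candidate shifts and keep the one that minimizes the mean squared mismatch between $Y(t)$ and $X(t - \tau)$. This is an illustrative approach chosen here, not the procedure of the cited paper; the function names are hypothetical and scalar, uniformly sampled time series are assumed for simplicity.

```python
import math


def lag_mismatch(xs, ys, shift):
    """Mean squared difference between y(t) and x(t - shift*dt) on the overlap."""
    pairs = zip(ys[shift:], xs[:len(xs) - shift])
    diffs = [(y - x) ** 2 for y, x in pairs]
    return sum(diffs) / len(diffs)


def best_lag(xs, ys, max_shift):
    """Shift (in samples) that minimizes the mismatch, searched in 0..max_shift."""
    return min(range(max_shift + 1), key=lambda s: lag_mismatch(xs, ys, s))


# Demonstration: y is x delayed by 0.5 time units, sampled with dt = 0.01.
dt = 0.01
xs = [math.sin(dt * i) for i in range(2000)]
ys = [math.sin(dt * i - 0.5) for i in range(2000)]
shift = best_lag(xs, ys, 200)
print("estimated delay:", shift * dt)
```

For vector-valued states the squared difference is replaced by the squared norm of $Y(t) - X(t - \tau)$. A minimum that is small but not zero indicates only approximate lag synchronization.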


Projective Synchronization 


In coupled partially linear systems, it was reported
by Mainieri and Rehacek (1999) that two identical
systems could be synchronized up to a scaling factor.
This type of chaotic synchronization is referred to as
projective synchronization. Consider, for example, a
three-dimensional chaotic system $\dot{X} = F(X)$, where
$X = (x, y, z)$. Decompose X into a vector $v = (x, y)$
and a scalar z; the system can then be rewritten as

\[ \frac{dv}{dt} = M(z)\, v, \qquad \frac{dz}{dt} = f(v, z) \]

In projective synchronization, two identical systems
$X_1 = (x_1, y_1, z_1)$ (drive) and $X_2 = (x_2, y_2, z_2)$
(response) are coupled through the scalar variable z.
It occurs if the state vectors $v_1$ and $v_2$ synchronize up
to a constant ratio, that is, $\lim_{t \to +\infty} \|\alpha v_1(t) - v_2(t)\| = 0$,
where $\alpha$ is called a scaling factor. For
partially linear systems, it may occur automatically
provided that the systems satisfy some stability
conditions.
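The role of partial linearity can be seen on system [9], whose first two equations are linear in $(x, y)$ once z is given. In the sketch below, which is an illustration constructed here and not an example from Mainieri and Rehacek (1999), the response $(x_2, y_2)$ obeys these linear equations with z taken from the drive. The code checks the property that makes a scaling factor possible: if the response starts at $\alpha$ times the drive, it remains at $\alpha$ times the drive.

```python
def rhs(s):
    """Drive (x1, y1, z1): system [9]. Response (x2, y2): its linear part driven by z1."""
    x1, y1, z1, x2, y2 = s
    return [-9 * x1 - 9 * y1,
            (-17 - z1) * x1 - y1,
            -z1 + x1 * y1,
            -9 * x2 - 9 * y2,
            (-17 - z1) * x2 - y2]


def rk4_step(f, s, dt):
    k1 = f(s)
    k2 = f([a + 0.5 * dt * b for a, b in zip(s, k1)])
    k3 = f([a + 0.5 * dt * b for a, b in zip(s, k2)])
    k4 = f([a + dt * b for a, b in zip(s, k3)])
    return [a + dt / 6.0 * (p + 2 * q + 2 * r + w)
            for a, p, q, r, w in zip(s, k1, k2, k3, k4)]


def integrate(s0, t_end=5.0, dt=0.005):
    s = list(s0)
    for _ in range(int(round(t_end / dt))):
        s = rk4_step(rhs, s, dt)
    return s


alpha = 2.0
drive0 = [1.0, 1.0, -1.0]
final = integrate(drive0 + [alpha * drive0[0], alpha * drive0[1]])
print("x2 - alpha*x1 =", final[3] - alpha * final[0])
print("y2 - alpha*y1 =", final[4] - alpha * final[1])
```

Whether a response started elsewhere converges to some multiple of the drive, and to which one, depends on the stability conditions mentioned above and on the initial conditions; the sketch does not address that question.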

However, this process cannot be classified as
GS, even though there exists a linear relation between the
coupled systems, because the response system of
projective synchronization is not asymptotically
stable. For more information about this subject,
the reader is referred to Mainieri and Rehacek
(1999).


Anticipating Synchronization 


It is interesting to mention that a new form of 
synchronization has recently appeared, the so-called 
anticipating synchronization (Boccaletti et al. 2002). 
It shows that some coupled chaotic systems might 
synchronize such that their response anticipates the 
drivers by synchronizing with their future states. 

It is also interesting to mention the nonlinear $H_\infty$
synchronization method for nonautonomous
schemes introduced by Suykens et al. (1997).


Spatio-Temporal Synchronization 


Low-dimensional systems have rather limited usefulness
in modeling real-world applications. This is
why the synchronization of chaos has been carried
out in high dimensions (see Kocarev et al. (1997) for
a review). See also Chen and Dong (2001) for a
discussion of special high-dimensional systems,
namely large arrays of coupled chaotic systems.


Application to Transmission Systems 
and Secure Communication 


Synchronization principles are useful in practical 
applications. Use of chaotic signals to transmit 
information has been a very active research topic 
in the last decade. Thus, it has been established that 
chaotic circuits may be used to transmit information 
by synchronization. As a result, several proposals 
for secure-communication schemes have been 
advanced (see, e.g., Cuomo et al. (1993), Hasler 
(1998), and Parlitz et al. (1999)). The first labora- 
tory demonstration of a secure-communication 
system, which uses a chaotic signal for masking 
purposes, and which exploits the chaotic synchroni- 
zation techniques to recover the signal, was reported 
by Kocarev et al. (1992). 

It is difficult, within the scope of this article, to 
give a complete or detailed discussion, and it should 
be noted that there exist many competing and tested 
methods that are well established. 

The main idea of the communication schemes is 
to encode a message by means of a chaotic 
dynamical system (the transmitter), and to decode 
it using a second dynamical system (the receiver) 
that synchronizes with the first. In general, secure- 
communication applications assume additionally 
that the coupled systems used are identical. 

Different methods can be used to hide the useful
information, for example, chaotic masking, chaotic
switching, or direct chaotic modulation (Hasler
1998). For instance, in the chaotic masking method,
an analog information-carrying signal $s(t)$ is
added to the output $y(t)$ of the chaotic system in the
transmitter. The receiver tries to synchronize with the
component $y(t)$ of the transmitted signal $s(t) + y(t)$.
If synchronization takes place, the information
signal can be retrieved by subtraction (Figure 10).
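The masking scheme can be sketched with system [9] as the chaos generator. The choices below are assumptions made for illustration only: the transmitter output is taken to be $x_1(t)$, the receiver is a copy of system [9] in which the transmitted signal replaces x in the y- and z-equations, and the message is recovered by subtracting the receiver's own x from the transmitted signal. None of this is claimed to be the exact circuit of the cited works.

```python
import math


def masking_rhs(message):
    """Transmitter (x1, y1, z1) and receiver (xr, yr, zr) for a given message s(t)."""
    def rhs(t, s):
        x1, y1, z1, xr, yr, zr = s
        m = x1 + message(t)                 # transmitted signal s(t) + x1(t)
        return [-9 * x1 - 9 * y1,
                -17 * x1 - y1 - x1 * z1,
                -z1 + x1 * y1,
                -9 * xr - 9 * yr,           # receiver regenerates its own x
                -17 * m - yr - m * zr,      # received signal replaces x here
                -zr + m * yr]               # ... and here
    return rhs


def rk4_step(f, t, s, dt):
    k1 = f(t, s)
    k2 = f(t + 0.5 * dt, [a + 0.5 * dt * b for a, b in zip(s, k1)])
    k3 = f(t + 0.5 * dt, [a + 0.5 * dt * b for a, b in zip(s, k2)])
    k4 = f(t + dt, [a + dt * b for a, b in zip(s, k3)])
    return [a + dt / 6.0 * (p + 2 * q + 2 * r + w)
            for a, p, q, r, w in zip(s, k1, k2, k3, k4)]


def recover(message, t_end=25.0, dt=0.005, s0=(1.0, 1.0, -1.0, 0.0, 0.0, 0.0)):
    """Return a list of (t, recovered signal) pairs."""
    rhs = masking_rhs(message)
    s, t, out = list(s0), 0.0, []
    for _ in range(int(round(t_end / dt))):
        s = rk4_step(rhs, t, s, dt)
        t += dt
        out.append((t, s[0] + message(t) - s[3]))   # transmitted minus receiver x
    return out


silent = recover(lambda t: 0.0)
tone = recover(lambda t: 0.01 * math.sin(2.0 * t))
print("recovered value with no message:", silent[-1][1])
print("recovered value with a weak tone:", tone[-1][1])
```

With no message the receiver synchronizes and the recovered signal decays to zero. With a message present the receiver is driven by a perturbed signal, so the recovery is only approximate, which is one source of the degradation discussed below.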

It is interesting to note that, in all proposed 
schemes for secure communications using the idea of 
synchronization (experimental realization or com- 
puter simulation), there is an inevitable noise 
degrading the fidelity of the original message. 














Figure 10 A typical communication setup: the information signal $s(t)$ is combined with the chaotic output $y(t)$ to form the transmitted (chaotic) signal, from which the receiver produces the retrieved information signal.

Robustness to parameter mismatch was addressed
by many authors (Illing et al. 2002). Lozi et al. 
(1993) showed that, by connecting two identical 
receivers in cascade, a significant amount of the 
noise can be reduced, thereby allowing the recovery 
of a much higher quality signal. 

Furthermore, different implementations of chaotic 
secure communication have been proposed during 
the last decades, as well as methods for cracking this 
encoding. The methods used to crack such a chaotic 
encoding make use of the low dimensionality of the 
chaotic attractors. Indeed, since the properties of 
low-dimensional chaotic systems with one positive 
Lyapunov exponent can be reconstructed by analyz- 
ing the signal, such as through the delay-time 
reconstruction methods, it seems unlikely that these 
systems might provide a secure encryption method. 
The hidden message can often be retrieved easily by 
an eavesdropper without using the receiver. However,
chaotic masking and encoding are difficult to break
with state-of-the-art analysis tools if sufficiently
high-dimensional chaos generators with
multiple positive Lyapunov exponents (i.e., hyperchaotic
systems) are used (see Pecora et al. (1997), and
references therein).
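The delay-time reconstruction mentioned above starts from a single observed signal and builds vectors from time-shifted copies of it. A minimal sketch of this first step is given below; the function name and its parameters `dim` (embedding dimension) and `tau` (delay in samples) are chosen here for illustration, and the choice of suitable values is a separate question that the sketch does not address.

```python
def delay_embed(signal, dim, tau):
    """Return the delay vectors (s[i], s[i + tau], ..., s[i + (dim - 1) * tau])."""
    span = (dim - 1) * tau
    return [tuple(signal[i + j * tau] for j in range(dim))
            for i in range(len(signal) - span)]


samples = [0, 1, 2, 3, 4, 5]
print(delay_embed(samples, dim=2, tau=2))
```

Applied to an intercepted transmission, the resulting point cloud gives an eavesdropper access to the geometry of a low-dimensional attractor without any knowledge of the receiver.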


Conclusion 


In spite of the essential progress in theoretical and
experimental studies, synchronization of chaotic
systems continues to be a topic of active investigation
and will certainly continue to have a broad
impact in the future. The theory of synchronization
remains a challenging problem of nonlinear
science.


See also: Bifurcations of Periodic Orbits; Chaos and 
Attractors; Fractal Dimensions in Dynamics; Generic 
Properties of Dynamical Systems; Isochronous Systems; 
Lyapunov Exponents and Strange Attractors; Singularity 
and Bifurcation Theory; Stability Theory and KAM; 
Weakly Coupled Oscillators.


Introduction


Quantum field theory was initially invented in order 
to describe high-energy elementary particles, thereby 
unifying quantum mechanics and special relativity. 
In other words, quantum field theory was addressed 
to the so-called vacuum sector, that is, roughly 
speaking physics at zero temperature and zero 
particle density. 

The same applies to the various mathematically 
rigorous versions of quantum field theory that have 
been developed since the mid-1950s. Indeed, in 
Wightman’s axiomatic setting, quantum field theory 
is described in terms of a set of so-called vacuum
expectation values. The “algebraic approach” to 
quantum field theory developed by Araki, Haag, 
Kastler, and their collaborators is more flexible in 
nature. In fact, right from the beginning, the new 
algebraic tools were successfully applied to lattice 
models and other nonrelativistic systems with 
infinitely many degrees of freedom (see Operator 
algebras and quantum statistical mechanics by 
O Bratteli and D W Robinson). But the need to 
treat large systems of relativistic particles was 
apparently not felt. Even in Haag’s recent mono- 
graph, Local Quantum Physics, the subjects of 
algebraic quantum field theory and algebraic quan- 
tum statistical mechanics are treated separately. 

It is remarkable that constructive field theory 
was ahead of its time in this respect. The famous 
$P(\phi)_2$ model (first constructed by Glimm and Jaffe)
was adapted to thermal states by Hoegh-Krohn 
as early as 1974 (see Hoegh-Krohn (1974)). 
His paper was properly named “Relativistic quantum
statistical mechanics in two-dimensional
space-time,” but only recently has it received
proper attention.

At the same time, around 1974, cosmology and 
heavy-ion collisions drew the interest of physicists
towards the quantum statistical mechanics of hot 
relativistic quantum systems. Well-known papers 
from this early stage include those by Weinberg, 
Bernard, and Dolan and Jackiw. While most of the 
papers used Euclidean path integrals, Umezawa and his 
school developed a real-time framework called 
“thermo-field dynamics,” which involved a doubling 
of the degrees of freedom. The excellent review by 
Landsman and van Weert (1987) covers these early 
attempts; it also explains the basic connection to the 
algebraic approach. 

In the following years, it became evident that 
statistical mechanics (in its standard formulation) is 
barely sufficient to derive the properties of bulk 
matter from the underlying microscopic description 
provided by quantum field theory. Thus, various 
people began to establish mathematically rigorous 
foundations for the description of thermal field 
theory. The most successful approach was launched 
by D Buchholz (with various collaborators), who, 
from about 1985 onwards, started applying the 
KMS condition (which describes a thermal equili- 
brium state in the operator-algebraic framework of 
local quantum physics) to relativistic quantum field 
theory. In 1994, Buchholz and Bros managed to 
integrate the holomorphic structure of Wightman 
field theory into Haag’s operator-algebraic frame- 
work, which led them to the notion of a relativistic 
KMS condition. 

The advanced mathematical concepts involved in 
the formulation of entropy densities for thermal 
quantum fields (see Narnhofer (1994)) do not allow 
us to present this topic. The reader is referred to the 
excellent book Quantum Entropy and Its Use by 
M Ohya and D Petz for an introduction to the 
subject. A discussion of the so-called thermalization 
effects that occur as a result of a curved spacetime is 
provided in Quantum Field Theory in Curved 
Spacetime. Another subject, which is missing almost 
completely, is perturbation theory. This subject has 
been covered extensively in three well-known text- 
books by Kapusta, Le Bellac, and Umezawa. 


Observables and States 


Following Heisenberg, we start from the basic 
assumption that quantum theory can be formulated 
in terms of observables which form an algebra A, that 
is, a vector space with a (noncommutative) multi- 
plication law. Although our emphasis on the abstract 
algebraic structure may look strange, there is a 
profound reason for starting out with an abstract 
algebra of observables: as soon as one considers 
systems with infinitely many degrees of freedom, one 
encounters a possibility to realize the abstract elements 
of the algebra A as operators on a Hilbert space in 
various inequivalent ways. The famous equivalence 
between the Heisenberg and the Schrödinger picture 
simply breaks down. States which are macroscopically 
different (e.g., thermal equilibrium states for different 
temperatures) give rise — in a natural way, which will 
be discussed in the sequel — to unitarily inequivalent 
representations of the abstract algebra of observables 
A, while states which only differ microscopically can 
be accommodated by density matrices within the same 
Hilbert space. In other words, a physical state is 
described macroscopically by specifying a representa- 
tion, and microscopically by a density matrix in this 
representation. 

In a Lagrangian approach, the algebra of obser- 
vables A may be thought of as being generated by 
the underlying fields, currents, etc. This leads to the 
so-called polynomial algebras. It is mathematically 
convenient to assume that A is an algebra of 
bounded operators, generated by the bounded 
functions of the underlying quantum fields. If $\phi(x)$ 
is any such field and if $f \in \mathcal{S}(\mathbb{R}^{d+1})$ is any real test 
function with support in a bounded region of 
spacetime, then the corresponding operator 

$$ W(f) = \exp\Big( i \int dx\, f(x)\, \phi(x) \Big) $$

would be a typical element of $\mathcal{A}$. The set of 
operators $\{ W(f) \mid \mathrm{supp}\, f \subset \mathcal{O} \}$ will generate a subalgebra 
$\mathcal{A}(\mathcal{O})$ of $\mathcal{A}$. The underlying fields can be 
recovered by taking (functional) derivatives, once a 
representation of $\mathcal{A}$ on a Hilbert space is specified. 
The spacetime symmetry of Minkowski space 
manifests itself in the existence of a representation 


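$$ \alpha : (\Lambda, x) \mapsto \alpha_{\Lambda,x} \in \mathrm{Aut}(\mathcal{A}), \quad (\Lambda, x) \in \mathcal{P}_+^\uparrow $$

of the (orthochronous) Poincaré group $\mathcal{P}_+^\uparrow$. Here 
$\alpha_{\Lambda,x}$ is an automorphism of $\mathcal{A}$, that is, a mapping 
from $\mathcal{A}$ to $\mathcal{A}$ which preserves the algebraic structure. 
Once a Lorentz frame is fixed by choosing a timelike 
vector $e \in V_+$, the time evolution $t \mapsto \alpha_{1,te}$ will be 
denoted by $t \mapsto \tau_t$. 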

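For the free field, the group of automorphisms 
$(\Lambda, x) \mapsto \alpha_{\Lambda,x}$ is defined by 

$$ \alpha_{\Lambda,x}(W(f)) = W\big( f(\Lambda^{-1}(\,\cdot - x)) \big) $$

As before, $f \in \mathcal{S}(\mathbb{R}^{d+1})$ is a Schwartz function over 
the Minkowski space $\mathbb{R}^{d+1}$. 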

While the invariance of the equations of motion is 
reflected in the existence of a representation of the 
Poincaré group in terms of automorphisms in the 
Heisenberg picture, at least the invariance with 
respect to Lorentz boosts is spontaneously broken 
in the Schrödinger picture for a thermal equilibrium 
state. 

The usual notions of vector states and density 
matrices associated with a given Hilbert space 
(usually Fock space) are a priori not general enough 
to cover all cases of interest in thermal field theory. 
The following algebraic definition of a state substantially 
generalizes the notion of a state: A state $\omega$ 
is a positive, linear, and normalized functional, that 
is, a linear map $\omega : \mathcal{A} \to \mathbb{C}$ such that 

$$ \omega(a^* a) \ge 0 \quad \text{and} \quad \omega(1) = 1 $$

Once a state $\omega$ is distinguished on physical grounds, 
the GNS reconstruction theorem provides a Hilbert 
space $\mathcal{H}_\omega$ and a representation $\pi_\omega$ of $\mathcal{A}$, that is, a 
map from $\mathcal{A}$ to the set of bounded operators $\mathcal{B}(\mathcal{H}_\omega)$, 
which preserves the algebraic relations. 

It is instructive to consider the GNS representation 
of the Pauli matrices $\{\sigma_0 = 1, \sigma_1, \sigma_2, \sigma_3\}$. Given a 
state (a diagonal $2 \times 2$ matrix $\rho$ with positive entries 
and $\mathrm{tr}\,\rho = 1$), the left regular representation (a 
construction well known from group theory) 


$$ \pi(\sigma_i)\, |\sqrt{\rho}\,\rangle = |\sigma_i \sqrt{\rho}\,\rangle $$


defines a reducible representation on $\mathbb{C}^4$, unless one 
of the entries in the diagonal of $\rho$ is zero (which 
corresponds to a pure state). In the latter case, the 
GNS Hilbert space is $\mathbb{C}^2$. By construction, 

$$ \langle \sqrt{\rho}\, |\, \pi(\sigma_i)\, |\, \sqrt{\rho} \rangle = \mathrm{tr}\, \rho\, \sigma_i, \quad i = 1, 2, 3. $$
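The Pauli-matrix example can be checked numerically. The following is a minimal sketch, assuming the GNS vectors are 2 x 2 matrices with the Hilbert-Schmidt scalar product tr(k1* k2) and that the representation acts by left multiplication; the helper names `sqrt_state`, `gns_expectation`, and `gns_dimension` are illustrative and not taken from the text.

```python
import numpy as np

# Pauli matrices sigma_0 = 1, sigma_1, sigma_2, sigma_3.
PAULI = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def sqrt_state(rho):
    """Square root of a diagonal density matrix rho."""
    return np.diag(np.sqrt(np.diag(rho).real)).astype(complex)


def gns_expectation(rho, a):
    """<sqrt(rho)| pi(a) |sqrt(rho)> with <k1|k2> = tr(k1^* k2), pi(a) k = a k."""
    root = sqrt_state(rho)
    return np.trace(root.conj().T @ a @ root)


def gns_dimension(rho):
    """Dimension of span{pi(sigma_mu) sqrt(rho)}, i.e. of the cyclic subspace."""
    root = sqrt_state(rho)
    vectors = np.array([(s @ root).flatten() for s in PAULI])
    return int(np.linalg.matrix_rank(vectors))


rho_mixed = np.diag([0.7, 0.3])  # both diagonal entries positive
rho_pure = np.diag([1.0, 0.0])   # one diagonal entry zero (pure state)
```

Under these assumptions the cyclic subspace generated from the mixed state is all of the four-dimensional matrix space, while for the pure state it shrinks to two dimensions, in line with the statement above.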


Thermal Equilibrium 


The variety of nonequilibrium states ranges from 
mild perturbations of equilibrium states through 
steady states, whose properties are governed 
by external heat baths, or hydrodynamic flows 
up to totally chaotic states which no longer 
admit a description in terms of thermodynamic 
notions. Buchholz et al. (2002) have initiated an 
investigation of nonequilibrium states that are 
locally (but not globally) close to thermal equili- 
brium. Unfortunately, we will not be able to cover 
this topic. Instead, we will concentrate on states 
which deviate from a true equilibrium state only 
microscopically. 


Characterization of Thermal Equilibrium States 


When the time evolution $t \mapsto \tau_t \in \mathrm{Aut}(\mathcal{A})$ is changed 
by a local perturbation, which is slowly switched on 
and slowly switched off again, then an equilibrium 
state $\omega$ returns to its original form at the end of this 
procedure. This heuristic condition of adiabatic 
invariance can be expressed by the stability 
requirement 

$$ \lim_{T \to \infty} \int_{-T}^{T} dt\, \omega([a, \tau_t(b)]) = 0 \quad \forall a, b \in \mathcal{A} $$ [1]

In a pioneering work, Haag, Kastler, and Trych- 
Pohlmeyer showed that the characterization [1] of 
an equilibrium state leads to a sharp mathematical 
criterion, first encountered by Haag, Hugenholtz, 
and Winnink and more implicitly by Kubo, Martin, 
and Schwinger: 


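Definition 1 A state $\omega_\beta$ over $\mathcal{A}$ is called a KMS 
state for some $\beta > 0$, if for all $a, b \in \mathcal{A}$, there exists a 
function $F_{a,b}$ which is continuous in the strip 
$0 \le \mathrm{Im}\, z \le \beta$ and analytic and bounded in the open strip 
$0 < \mathrm{Im}\, z < \beta$, with boundary values given by 

$$ F_{a,b}(t) = \omega_\beta(a\, \tau_t(b)) \quad \text{and} \quad F_{a,b}(t + i\beta) = \omega_\beta(\tau_t(b)\, a) \quad \forall t \in \mathbb{R} $$ [2]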
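For a system with finitely many degrees of freedom, the KMS boundary condition can be verified directly. The sketch below assumes a finite-dimensional Hamiltonian H, the Gibbs state with density matrix proportional to exp(-beta H), and the time evolution tau_z(b) = exp(izH) b exp(-izH) continued to complex z; the names `gibbs_state`, `tau`, and `kms_defect` are illustrative only.

```python
import numpy as np


def gibbs_state(H, beta):
    """Density matrix exp(-beta H) / tr exp(-beta H) for Hermitian H."""
    w, v = np.linalg.eigh(H)
    weights = np.exp(-beta * (w - w.min()))  # shift for numerical stability
    rho = (v * weights) @ v.conj().T
    return rho / np.trace(rho).real


def tau(H, z, b):
    """tau_z(b) = exp(izH) b exp(-izH); z may be complex."""
    w, v = np.linalg.eigh(H)
    u_plus = (v * np.exp(1j * z * w)) @ v.conj().T
    u_minus = (v * np.exp(-1j * z * w)) @ v.conj().T
    return u_plus @ b @ u_minus


def kms_defect(H, beta, a, b, t):
    """|F_{a,b}(t + i beta) - omega(tau_t(b) a)| with F_{a,b}(z) = omega(a tau_z(b))."""
    rho = gibbs_state(H, beta)
    upper_boundary = np.trace(rho @ a @ tau(H, t + 1j * beta, b))
    expected = np.trace(rho @ tau(H, t, b) @ a)
    return abs(upper_boundary - expected)


rng = np.random.default_rng(0)


def random_matrix(n):
    return rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))


M = random_matrix(4)
H = (M + M.conj().T) / 2  # Hermitian test Hamiltonian
a, b = random_matrix(4), random_matrix(4)
```

In finite dimensions the Gibbs state is the only KMS state at a given beta; the point of the abstract definition is that it still makes sense in the thermodynamic limit, where exp(-beta H) is no longer trace class.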


Before we start analyzing the properties of KMS 
states, we should mention an alternative character- 
ization of thermal equilibrium states: passivity. The 
amount of work a cycle can perform when applied 
to a moving thermodynamic equilibrium state is 
bounded by the amount of work an ideal windmill 
or turbine could perform; this property is called 
semipassivity (Kuckert 2002): a state $\omega$ is called 
semipassive (passive) if there is an “efficiency 
bound” $E \ge 0$ ($E = 0$) such that 

$$ -(W\Omega_\omega, H_\omega W\Omega_\omega) \le E \cdot (W\Omega_\omega, |P_\omega|\, W\Omega_\omega) \quad \forall W \in \pi_\omega(\mathcal{A})'' $$

with $W^{-1} = W^*$, $[H_\omega, W] \in \pi_\omega(\mathcal{A})''$, and $[P_\omega, W] \in \pi_\omega(\mathcal{A})''$. 
Here $(H_\omega, P_\omega)$ denote the generators implementing 
the spacetime translations in the GNS 
representation $(\mathcal{H}_\omega, \Omega_\omega, \pi_\omega)$. Generalizing the notion 
of complete passivity, the state $\omega$ is called completely 
semipassive if all its finite tensorial powers are 
semipassive with respect to one fixed efficiency 
bound E. It has been shown by Kuckert (2002) 
that a state is completely semipassive in all inertial 
frames if and only if it is completely passive in some 
inertial frame. The latter implies that w is a KMS 
state or a ground state (a result due to Pusz and 
Woronowicz). 

Let us now turn to properties of thermal 
equilibrium states which are specific for relativistic 
models. It was first recognized by Bros and 
Buchholz (1994) that KMS states of a relativistic 
theory have stronger analyticity properties in con- 
figuration space than those imposed by the tradi- 
tional KMS condition: 


Definition 2 A KMS state $\omega_\beta$ satisfies the relativistic 
KMS condition (Bros and Buchholz 1994) if there 
exists a unit vector $e$ in the forward light cone $V_+$ 
such that for every pair of local elements $a, b$ of $\mathcal{A}$ 
the function $F_{a,b}$ 

$$ F_{a,b}(x_1, x_2) = \omega_\beta(\alpha_{x_1}(a)\, \alpha_{x_2}(b)) $$

extends to an analytic function in the tube domain 
$-\mathcal{T}_{\beta e/2} \times \mathcal{T}_{\beta e/2}$, where 
$\mathcal{T}_{\beta e/2} = \{ z \in \mathbb{C}^{d+1} \mid \mathrm{Im}\, z \in V_+ \cap (\beta e/2 - V_+) \}$. 


The relativistic KMS condition can be understood 
as a remnant of the relativistic spectrum condition in 
the vacuum sector. It has been rigorously established 
(Bros and Buchholz 1994) for the KMS states 
constructed by Buchholz and Junglas (1989) and by 
C Gérard and the author for the $P(\phi)_2$ model. In the 
thermal Wightman framework (Bros and Buchholz 
1996) it has been shown that the relativistic KMS 
condition implies existence of model-independent 
analyticity properties of thermal n-point functions. 
These properties also appear in perturbative compu- 
tations of the thermal Wightman functions 
(Steinmann 1995). 

We now turn to the properties of the set of KMS 
states. For given $\beta$, the convex set $S_\beta$ of all KMS 
states is known to form a simplex; the extreme 
points in the set $S_\beta$ are called extremal KMS states. 
As a consequence, the extremal states in $S_\beta$ can be 
distinguished with the help of “classical” (central) 
observables, that is, by observables which commute 
with all other observables. 

If $\omega$ is an extremal KMS state and $\gamma$ is an 
automorphism which commutes with the time 
evolution $t \mapsto \tau_t$, then the state $\omega'$ defined by 

$$ \omega'(a) = \omega(\gamma(a)), \quad a \in \mathcal{A} $$

is again an extremal KMS state to the same 
parameter values. If $\omega' \neq \omega$, one says that the 
symmetry is spontaneously broken. 




Lorentz invariance with respect to boosts is 
always broken by a KMS state, since the KMS 
condition distinguishes a rest frame. A KMS state 
might also break spatial translation or rotation 
invariance. However, by averaging over the different 
configurations one can usually construct a transla- 
tion- and rotation-invariant state. The situation is 
drastically different with respect to supersymmetry. 
Buchholz and Ojima (1997) have shown that super- 
symmetry is broken in any thermal state and it is 
impossible to proceed from it by “symmetrization” 
to states on which an action of supercharges can be 
defined. 


Existence of Thermal Equilibrium States 


Buchholz and Junglas (1989) demonstrated that the 
existence of KMS states can be guaranteed for a 
large class of quantum field-theoretic models. The 
basic assumption to be met concerns the phase-space 
properties of the model. A generalized trace norm 
(the so-called “nuclear norm”) is used to estimate 
the “number” of degrees of freedom in phase space. 

The first step is to construct a subspace $\mathcal{H}(\Lambda)$ 
of the vacuum Hilbert space $\mathcal{H}_{\mathrm{vac}}$, which represents 
excitations of the vacuum strictly localized inside of 
a bounded spacetime region $\hat{\mathcal{O}}$. Due to the strong 
correlations present in the vacuum state of any 
relativistic model, as a consequence of the Reeh- 
Schlieder property (see the section “Analyticity of n- 
point functions”) this is a delicate procedure, which 
involves the so-called “split property.” This property 
ensures that there exists a product vector $\eta$ in the 
vacuum Hilbert space $\mathcal{H}_{\mathrm{vac}}$ such that 

$$ (\eta, \pi_{\mathrm{vac}}(ab)\, \eta) = \omega_{\mathrm{vac}}(a) \cdot \omega_{\mathrm{vac}}(b) \quad \forall a \in \mathcal{A}(\mathcal{O}),\ b \in \mathcal{A}(\hat{\mathcal{O}})' $$ [3]


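Here $\mathcal{O} \subset \hat{\mathcal{O}}$ denotes a slightly smaller open spacetime 
region (such that the closure $\overline{\mathcal{O}}$ is inside the 
interior of $\hat{\mathcal{O}}$) and 
$\mathcal{A}(\hat{\mathcal{O}})' := \{ A \in \mathcal{A} \mid [A, B] = 0 \ \forall B \in \mathcal{A}(\hat{\mathcal{O}}) \}$. 
The existence of a product vector can be 
ensured if the nuclear norm satisfies some mild 
bounds which are expected to hold in all models of 
physical interest. Given a product vector $\eta$ which 
satisfies [3], the sought-after subspace is 

$$ \mathcal{H}(\Lambda) := \overline{\pi_{\mathrm{vac}}(\mathcal{A}(\mathcal{O}))\, \eta} $$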


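The crucial step in the proof of existence of KMS 
states is to show that 

$$ \mathrm{tr}\, E(\Lambda)\, e^{-\beta H} E(\Lambda) < \infty \quad \text{for } \beta > 0 $$

if the nuclearity condition holds. Here $E(\Lambda)$ denotes 
the projection onto the subspace $\mathcal{H}(\Lambda)$ representing 
localized excitations and $H$ denotes the Hamiltonian 
in the vacuum representation $\pi_{\mathrm{vac}}$. Next it is shown 
that the function 

$$ t \mapsto \omega_\beta^\Lambda(a\, \tau_t(b)) := \frac{\mathrm{tr}\, E(\Lambda)\, e^{-\beta H} E(\Lambda)\, \pi_{\mathrm{vac}}(a\, \tau_t(b))}{\mathrm{tr}\, E(\Lambda)\, e^{-\beta H} E(\Lambda)} $$

allows an analytic extension to a strip of width $\beta$ 
which satisfies the KMS boundary condition [2] for 
$|t| < \delta$ if $a, b \in \mathcal{A}(\mathcal{O}_\circ)$ and $\mathcal{O}_\circ + te \subset \mathcal{O}$ for $|t| < \delta$. In 
the final step, Buchholz and Junglas were able to 
demonstrate that bounds on the nuclear norm are 
even sufficient to control the thermodynamic limit. 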

Given a thermal field theory, a slight variation of 
the method used by Buchholz and Junglas allows 
one to construct a KMS state for a new temperature 
(Jäkel 2004), that is, to change the temperature of a 
thermal state. 


Thermal Representations 


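Given a KMS state $\omega_\beta$, the GNS construction gives 
rise to a Hilbert space $\mathcal{H}_\beta$ and a representation $\pi_\beta$, 
called a thermal representation, of $\mathcal{A}$. The algebra 
$\mathcal{R}_\beta := \pi_\beta(\mathcal{A})''$ possesses a cyclic (due to the GNS 
construction) and separating (due to the KMS 
condition) vector $\Omega_\beta$ such that 

$$ \omega_\beta(a) = (\Omega_\beta, \pi_\beta(a)\, \Omega_\beta) \quad \forall a \in \mathcal{A} $$

The KMS condition implies that $\omega_\beta$ is invariant 
under time translations, that is, $\omega_\beta \circ \tau_t = \omega_\beta$ for all 
$t \in \mathbb{R}$. Thus, 

$$ U(t)\, \pi_\beta(a)\, \Omega_\beta = \pi_\beta(\tau_t(a))\, \Omega_\beta, \quad a \in \mathcal{A} $$

defines a strongly continuous unitary group 
$\{U(t)\}_{t \in \mathbb{R}}$ implementing the time evolution in the 
representation $\pi_\beta$. By Stone’s theorem there exists a 
self-adjoint generator $L$ such that 

$$ U(t) = e^{iLt}, \quad t \in \mathbb{R} $$ [4]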


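For $0 < \beta < \infty$, the Liouville operator $L$ is not 
bounded from below; its spectrum is symmetric and 
consists typically of the whole real line. However, 
the negative part of $L$ is “suppressed” with respect 
to the algebra of observables $\mathcal{R}_\beta := \pi_\beta(\mathcal{A})''$ in the 
following sense (Haag 1992): let $P_{(-\infty,-\kappa]}$ be the 
spectral projection of $L$ for the interval $(-\infty, -\kappa] \subset \mathrm{Sp}(L)$, 
then 

$$ \| P_{(-\infty,-\kappa]}\, A\, \Omega_\beta \| \le e^{-\beta\kappa/2}\, \|A\| \quad \forall A \in \mathcal{R}_\beta $$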


We now turn to structural aspects which are 
characteristic for a relativistic model, namely the 
existence of strong spatial correlations and the 
connection between the decay of these correlations 
and the spectral properties of the Liouville operator. 


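Let $\omega_\beta$ be a state which satisfies the relativistic 
KMS condition. It follows (using a theorem of 
Glaser) that for $a \in \mathcal{A}$ the function $\Phi_a : \mathbb{R}^{d+1} \to \mathcal{H}_\beta$, 

$$ x \mapsto \pi_\beta(\alpha_x(a))\, \Omega_\beta $$

can be analytically continued from the real axis into 
the domain $\mathcal{T}_{\beta e/2}$ such that it is weakly continuous 
for $\mathrm{Im}\, z \searrow 0$. If the usual additivity assumption 
$\bigcup_i \mathcal{O}_i = \mathcal{O} \Rightarrow \bigvee_i \mathcal{R}_\beta(\mathcal{O}_i) = \mathcal{R}_\beta(\mathcal{O})$ for the local von 
Neumann algebras holds, then 

$$ \mathcal{H}_\beta = \overline{\pi_\beta(\mathcal{A}(\mathcal{O}))\, \Omega_\beta} $$ [5]

for any open spacetime region $\mathcal{O} \subset \mathbb{R}^{d+1}$. Junglas 
has shown that the thermal Reeh-Schlieder property 
[5] follows as well from the standard KMS condition, 
if $\omega_\beta$ is locally normal with respect to the 
vacuum representation. 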

The decay of spatial correlations depends on 
infrared properties of the model, and the essential 
ingredients for the following cluster theorem are the 
continuity properties of the spectrum of L near zero. 


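Theorem 3 Let $\Omega_\beta$ denote the unique (up to a 
phase) normalized eigenvector with eigenvalue 0 of 
the Liouvillean $L$ and let $P^+$ denote the projection 
onto the strictly positive part of the spectrum of $L$. 
Assume that there exist positive constants $m > 0$ 
and $C_1(\mathcal{O}) > 0$ such that 

$$ \| e^{-\lambda L} P^+ \pi_\beta(a)\, \Omega_\beta \| \le C_1(\mathcal{O})\, \lambda^{-m}\, \|a\| \quad \forall a \in \mathcal{A}(\mathcal{O}) $$

Here $\mathcal{O} \subset \mathbb{R}^{d+1}$ is an open and bounded spacetime 
region. Now consider two spacelike separated 
spacetime regions $\mathcal{O}_1, \mathcal{O}_2$, which can be embedded 
into $\mathcal{O}$ by translation and such that 
$\mathcal{O}_1 + \delta e \subset \mathcal{O}_2'$, $\delta \gg \beta$. Then, for $a \in \mathcal{A}(\mathcal{O}_1)$ and $b \in \mathcal{A}(\mathcal{O}_2)$, 

$$ | \omega_\beta(ba) - \omega_\beta(b)\, \omega_\beta(a) | \le C_2\, \delta^{-2m}\, \|a\|\, \|b\| $$

The constant $C_2(\beta, \mathcal{O}) \in \mathbb{R}^+$ may depend on the 
temperature $\beta^{-1}$ and the size of the region $\mathcal{O}$ but is 
independent of $\delta$, $a$, and $b$. 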


From explicit calculations one expects that 
$m = 1/2$ for free massless bosons in 3 + 1 spacetime 
dimensions. Consequently, the exponent given on 
the right-hand side is optimal since it is well known 
that in this case the correlations decay only like $\delta^{-1}$. 

A description of thermal representations would be 
inadequate without pointing out one of the deepest 
connections between pure mathematics and physics 
that emerged in the last century: consider a von 
Neumann algebra $\mathcal{R}$ which possesses a cyclic and 
separating vector $\Omega$. Then polar decomposition of 
the closable operator $S : A\Omega \mapsto A^*\Omega$, $A \in \mathcal{R}$, provides 
an antiunitary operator $J$ (the modular 
conjugation) and a self-adjoint operator $\Delta^{1/2}$. The 
connection to physics was established independently 
by Takesaki and Winnink, showing that the pair 
$(\mathcal{R}, \Omega)$ satisfies the KMS condition for $\beta = -1$, if one 
sets $\sigma_t(A) = \Delta^{it} A\, \Delta^{-it}$ for $A \in \mathcal{R}$. 

Taking advantage of the Reeh—Schlieder property 
[5], one can associate modular objects to certain 
spacetime regions O. In general, a physical inter- 
pretation of these modular objects is missing. But for 
two-dimensional thermal models, which factorize in 
light-cone coordinates, the modular group corre- 
sponding to the algebra of a spacelike wedge admits 
a simple description: at large distances (compared to 
$\beta$) from the boundary, the flow pattern is essentially 
the same as time translations. These are results due 
to Borchers and Yngvason (1999). 


Analyticity Properties of n-Point Functions 


The correlation functions describe the full physical 
content of the theory: all observable quantities can 
in principle be derived from them. This is so because 
according to the Wightman reconstruction theorem 
(which is closely related to the GNS construction) 
knowledge of the correlation functions allows the 
reconstruction of the full representation of the field 
algebra. The Wightman distributions $\{W_\beta^{(n)}\}_{n \in \mathbb{N}}$, 

$$ W_\beta^{(n)}(t_1, x_1, \ldots, t_n, x_n) = (\Omega_\beta, \phi_\beta(t_1, x_1) \cdots \phi_\beta(t_n, x_n)\, \Omega_\beta) $$ [6]

where $\pi_\beta(W(f)) =: \exp\big( i \int dt\, dx\, f(t, x)\, \phi_\beta(t, x) \big)$, satisfy 
a number of key properties: locality, positivity, 
Poincaré covariance, and temperedness. These properties 
have been formulated for thermal fields by Bros 
and Buchholz (1996), and this section is entirely 
based on their work. 

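The relativistic KMS condition implies that the 
Wightman distributions $\{W_\beta^{(n)}\}_{n \in \mathbb{N}}$ of a translation- 
invariant equilibrium state admit in the corresponding 
set of spacetime variables $(t_2 - t_1, x_2 - x_1), \ldots, (t_n - t_{n-1}, x_n - x_{n-1})$ 
an analytic continuation into 
the union of domains 

$$ (\lambda_1 \mathcal{T}_{\beta e}) \times \cdots \times (\lambda_{n-1} \mathcal{T}_{\beta e}) $$

for $\lambda_i > 0$, $i = 1, \ldots, n-1$, and $\sum_i \lambda_i = 1$. The 
tube domains $\mathcal{T}_{\beta e}$ were specified in Definition 2. 
For $\beta \to \infty$, the tube $\mathcal{T}_{\beta e}$ tends to the vacuum tube 
$\mathcal{T}_{\mathrm{vac}} = \mathbb{R}^{d+1} + iV_+$; thus, one recovers the spectrum 
condition for the vacuum expectation values. 

Let us now turn to the Fourier transformed 
Wightman correlation functions. Translation invariance 
implies 

$$ \tilde W_\beta^{(n)}(\nu_1, p_1, \ldots, \nu_n, p_n) = \delta(\nu_1 + \cdots + \nu_n)\, \delta(p_1 + \cdots + p_n)\, \hat W_\beta^{(n-1)}(\nu_1, p_1, \ldots, \nu_{n-1}, p_{n-1}) $$

The Wightman distribution $\hat W_\beta^{(n-1)}$ satisfies on the 
linear manifold $(\nu_1, p_1) + \cdots + (\nu_n, p_n) = 0$ the KMS 
relation in the energy variables: for any pair of 
multi-indices $(I, J)$ the identity 

$$ \hat W_\beta^{(n-1)}(J, I) = e^{-\beta \nu_I}\, \hat W_\beta^{(n-1)}(I, J) $$

holds, where $\hat W_\beta^{(n-1)}(J, I)$ is an abbreviation for 
$\hat W_\beta^{(n-1)}(\{p_j\}_{j \in J}, \{p_i\}_{i \in I})$ and $\nu_I = \sum_{i \in I} \nu_i$. 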

We now specialize to the two-point function $W_\beta^{(2)}$. 
The corresponding commutator function $C(x)$ is 
given by 

$$ C(x_1 - x_2) = W_\beta^{(2)}(x_1, x_2) - W_\beta^{(2)}(x_2, x_1) $$

Locality implies that $\mathrm{supp}\, C \subset \overline{V}_+ \cup \overline{V}_-$. The 
retarded and the advanced propagator $r$ and $a$, 
formally given by 

$$ r(x) = i\theta(x_0)\, C(x), \quad a(x) = -i\theta(-x_0)\, C(x) $$

satisfy the relation 

$$ r - a = iC $$

which corresponds to a partition of the support of 
$C$ in its convex components: $\mathrm{supp}\, r \subset \overline{V}_+$ and 
$\mathrm{supp}\, a \subset \overline{V}_-$. For the free scalar field of mass $m$ 
the commutator function is 

$$ C^{(0)}(x) = \frac{1}{(2\pi)^{d}} \int dp\, e^{-ipx}\, \tilde C^{(0)}(p) $$

with 

$$ \tilde C^{(0)}(p) = \mathrm{sgn}(\nu)\, \delta(\nu^2 - p^2 - m^2) $$

and subsequently the retarded and advanced propagators 
$r^{(0)}$ and $a^{(0)}$ are structural functions of the 
field algebra, which are determined by the c-number 
commutation relations of the fields. Thus, they are 
independent of the temperature, in contrast to the 
two-point function: 

$$ \tilde W_\beta^{(0)}(\nu, p) = \frac{\tilde C^{(0)}(p)}{1 - e^{-\beta\nu}} $$
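The relation between the thermal two-point function and the temperature-independent commutator function can be illustrated numerically in the energy variable alone. The sketch below assumes the form W(nu) = C(nu) / (1 - exp(-beta nu)) and replaces the delta function on the mass shell by a smooth odd profile; `smeared_commutator` is a hypothetical stand-in chosen only so that the functions can be evaluated pointwise.

```python
import numpy as np


def smeared_commutator(nu, mass=1.0, width=0.1):
    """Hypothetical odd spectral profile: sgn(nu) times a bump at |nu| = mass."""
    return np.sign(nu) * np.exp(-((np.abs(nu) - mass) ** 2) / (2 * width**2))


def thermal_two_point(nu, beta, c_tilde=smeared_commutator):
    """W(nu) = C(nu) / (1 - exp(-beta * nu)), defined for nu != 0."""
    return c_tilde(nu) / (1.0 - np.exp(-beta * nu))


beta = 2.0
nu = np.linspace(0.2, 3.0, 50)          # positive energies
w_plus = thermal_two_point(nu, beta)    # W(nu)
w_minus = thermal_two_point(-nu, beta)  # W(-nu)
```

Two consequences follow for any odd C: the difference W(nu) - W(-nu) returns C(nu), and W(-nu) = exp(-beta nu) W(nu), which is the two-point version of the KMS relation in the energy variable. Only W carries the temperature dependence.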


Let now $\tilde\tau(p)$ be the Fourier transform of the time- 
ordered function $\tau(x)$. The relation 

$$ \tilde\tau(p) = \frac{-i\tilde r(p) + i\tilde a(p)\, e^{-\beta\nu}}{1 - e^{-\beta\nu}} $$

shows that $\tilde\tau(p)$ and $-i\tilde r(p)$ only “coincide up to an 
exponential tail” at very high energies (Bros and 
Buchholz 1996). 


Particle Aspects 


The condition of locality (together with the relativistic 
KMS condition) leads to strong constraints on 
the general form of the thermal two-point functions 
that allow one to apply the techniques of the Jost– 
Lehmann–Dyson representation. As has been shown 
by Bros and Buchholz (1996), the interacting two- 
point function $W_\beta$ can be represented in the form 

$$ W_\beta(t, x) = \int_0^\infty dm\, D_\beta(x, m)\, W_\beta^{(0)}(t, x, m) $$

Here $D_\beta(x, m)$ is a distribution in $x, m$ which is 
symmetric in $x$, and 

$$ W_\beta^{(0)}(t, x, m) = (2\pi)^{-d} \int d\nu\, dp\, e^{-i(\nu t - px)}\, \tilde W_\beta^{(0)}(\nu, p) $$
is the two-point correlation function of the free 
thermal field of mass $m$. In contrast to the vacuum 
case, the damping factors $D_\beta(x, m)$ depend in a 
nontrivial way on the spatial variables $x$. The 
damping factors describe the dissipative effects of 
the thermal system on the propagation of sharply 
localized excitations. Bros and Buchholz suggested 
that the damping factor $D_\beta(x, m)$ can be decomposed 
into a discrete and an absolutely continuous 
part 

$$ D_\beta(x, m) = \delta(m - m_0)\, D_{\beta,d}(x) + D_{\beta,c}(x, m) $$

and that the $\delta$-contribution in the damping factors is 
due to stable constituent particles of mass $m_0$ out of 
which the thermal states are formed, whereas the 
collective quasiparticle-like excitations only contribute 
to the continuous part of the damping factors 
(Bros and Buchholz 1996). 

In the case of spontaneously broken internal 
symmetries Bros and Buchholz (1998) have shown 
that the damping factors $D_\beta^{\pm}(x, m)$ which appear in 
the representation of current-field correlation 
functions 

$$ (\Omega_\beta, j_0(t, x)\, \phi(0, 0)\, \Omega_\beta) = \int_0^\infty dm\, \big( D_\beta^{+}(x, m)\, \partial_t W_\beta^{(0)}(t, x, m) + D_\beta^{-}(x, m)\, W_\beta^{(0)}(t, x, m) \big) $$

indeed contain a discrete (in the sense of measures) 
zero-mass contribution and are slowly decreasing in 
$|x|$ for small values of $m$. Thus, these damping 
factors coincide locally with the Källén–Lehmann 
weights appearing in the case of spontaneous 
symmetry breaking in the vacuum sector (Bros and 
Buchholz 1998). It is easily seen in examples that 
there is no sharp energy-momentum dispersion law 
for the Goldstone particles. Thus, the Källén– 
Lehmann representation is better suited than Fourier 
transformation to uncover the particle aspects of 
thermal equilibrium states. 


Models of Thermal Field Theory 


In the simplest case, the classical Lagrangian density 
of the so-called $P(\phi)_2$ models is given by 

$$ \mathcal{L} = \frac{1}{2} (\partial_\nu \phi)(\partial^\nu \phi) - \frac{m^2}{2}\, \phi^2 - P(\phi) $$ [8]

Here $\phi(t, x)$ denotes a real scalar field over spacetime. 
The construction of the corresponding quantized 
thermal field presented in this section (Gérard 
and Jäkel 2005) is based on the original ideas of 
Høegh-Krohn (1974). 


Free Fields 


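Let $\mathfrak{h}_m$ denote the $L^2$-closure of $C_0^\infty(\mathbb{R})$ with respect to 
the norm $\|f\| = (f, (1/2\epsilon) f)^{1/2}$, where $\epsilon(k) = \sqrt{k^2 + m^2}$ 
denotes the one-particle energy for a single neutral 
scalar boson and the scalar product is the usual 
$L^2$-scalar product. The subspaces associated to a 
double cone $\mathcal{O}$ are given by 

$$ \mathfrak{h}_m(O) := \{ h \in \mathfrak{h}_m \mid \mathrm{supp}\, \mathrm{Re}\, h,\ \mathrm{supp}\, \epsilon^{-1} \mathrm{Im}\, h \subset O \} $$

where $O$ denotes the base of the double cone $\mathcal{O}$. 
The corresponding free quantum field is described by 
the Weyl algebra $\mathcal{W}(\mathfrak{h}_m) := \{ W(f) \mid f \in \mathfrak{h}_m \}$, together 
with the time evolution $\{\tau_t^\circ\}_{t \in \mathbb{R}}$, 

$$ \tau_t^\circ(W(f)) = W(e^{it\epsilon} f), \quad f \in \mathfrak{h}_m $$

If $m > 0$, the KMS condition allows exactly one 
(quasifree) $(\tau^\circ, \beta)$-KMS state: 

$$ \omega_\beta(W(f)) := e^{-\frac{1}{4}(f, (1 + 2\rho) f)}, \quad \rho := (e^{\beta\epsilon} - 1)^{-1} $$

The GNS representation associated to the pair 
$(\mathcal{W}(\mathfrak{h}_m), \omega_\beta)$ is the well-known Araki-Woods representation, 
given by 

$$ \mathcal{H}_{AW} := \Gamma(\mathfrak{h}_m \oplus \overline{\mathfrak{h}}_m), $$
$$ \pi_{AW}(W(h)) := W_F\big( (1 + \rho)^{1/2} h \oplus \rho^{1/2}\, \overline{h} \big), \quad h \in \mathfrak{h}_m, $$
$$ \Omega_{AW} := \Omega_F $$

Here $\overline{\mathfrak{h}}_m$ is the Hilbert space conjugate to $\mathfrak{h}_m$, $W_F(\cdot)$ 
denotes the usual Weyl operator on the Fock space 
$\Gamma(\mathfrak{h}_m \oplus \overline{\mathfrak{h}}_m)$ and $\Omega_F \in \Gamma(\mathfrak{h}_m \oplus \overline{\mathfrak{h}}_m)$ is the Fock 
vacuum. The Liouvillean $L_{AW}$ (see [4]) can be 
identified with $d\Gamma(\epsilon \oplus -\epsilon)$. 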
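The thermal weights entering the free KMS state and the Araki-Woods representation can be compared mode by mode. The sketch below assumes the Planck factor rho = 1 / (exp(beta * eps) - 1) with eps(k) = sqrt(k^2 + m^2), and checks that the weights (1 + rho) and rho of the two Fock-space components add up to 1 + 2 rho = coth(beta * eps / 2); the function names are illustrative.

```python
import numpy as np


def one_particle_energy(k, m):
    """eps(k) = sqrt(k^2 + m^2)."""
    return np.sqrt(k**2 + m**2)


def planck_factor(beta, eps):
    """rho = 1 / (exp(beta * eps) - 1)."""
    return 1.0 / np.expm1(beta * eps)


def araki_woods_weights(beta, eps):
    """Squared weights (1 + rho, rho) of the two components of the Araki-Woods vector."""
    rho = planck_factor(beta, eps)
    return 1.0 + rho, rho


def thermal_ratio(beta, eps):
    """(1 + exp(-beta * eps)) / (1 - exp(-beta * eps))."""
    x = np.exp(-beta * eps)
    return (1.0 + x) / (1.0 - x)


k = np.linspace(-5.0, 5.0, 101)
eps = one_particle_energy(k, m=1.0)
upper, lower = araki_woods_weights(0.8, eps)
```

As beta grows the Planck factor vanishes for m > 0, so the second component drops out and the Fock vacuum is recovered.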

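The local von Neumann algebra generated by 
$\{ \pi_{AW}(W(h)) \mid h \in \mathfrak{h}_m(O) \}$ is denoted by $\mathcal{R}_{AW}(\mathcal{O})$. The 
algebra of observables for the free quantum field 
(and, as we will see, the $P(\phi)_2$ model) is the norm 
closure 

$$ \mathcal{A} = \overline{\bigcup_{\mathcal{O} \subset \mathbb{R}^2} \mathcal{R}_{AW}(\mathcal{O})} $$

of the local von Neumann algebras. 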
The Thermal $P(\phi)_2$ Model 


In 1+1 spacetime dimensions Wick ordering is 
sufficient to eliminate the UV divergences of polynomial 
interactions. As it turns out, the leading 
order in the UV divergences is independent of the 
temperature (in agreement with the results found in 
Kopper et al. (2001)). Thus, it is a matter of 
convenience whether one uses the thermal covariance 
function $C_\beta$, 

$$ C_\beta(h_1, h_2) = \Big( h_1, \frac{1 + e^{-\beta\epsilon}}{2\epsilon\, (1 - e^{-\beta\epsilon})}\, h_2 \Big), \quad h_1, h_2 \in \mathcal{S}(\mathbb{R}) $$

or the vacuum covariance function $C_{\mathrm{vac}}$ to define 
the Wick ordering: 

$$ {:}\phi(h)^n{:}_C = \sum_{m=0}^{[n/2]} \frac{n!}{m!\, (n-2m)!}\, \phi(h)^{n-2m} \Big( -\frac{1}{2}\, C(h, h) \Big)^m $$
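For a single Gaussian variable the Wick-ordering sum can be checked against Hermite polynomials. The sketch below assumes the standard form in which the n-th Wick power with respect to a covariance c is the sum over m of n! / (m! (n - 2m)!) x^(n-2m) (-c/2)^m, treats the smeared field as a real number x and the covariance C(h, h) as a positive number c, and uses NumPy's probabilists' Hermite module; the function names are illustrative.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e


def wick_power(x, n, c):
    """:x^n:_c = sum_m n! / (m! (n-2m)!) x^(n-2m) (-c/2)^m."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for m in range(n // 2 + 1):
        coeff = math.factorial(n) / (math.factorial(m) * math.factorial(n - 2 * m))
        total = total + coeff * x ** (n - 2 * m) * (-0.5 * c) ** m
    return total


def wick_power_hermite(x, n, c):
    """Same quantity written as c^(n/2) He_n(x / sqrt(c))."""
    coeffs = [0.0] * n + [1.0]  # selects He_n
    scaled = np.asarray(x, dtype=float) / math.sqrt(c)
    return c ** (n / 2) * hermite_e.hermeval(scaled, coeffs)


def gaussian_mean(f, c, degree=20):
    """Expectation of f(X) for X ~ N(0, c) by Gauss-Hermite quadrature."""
    nodes, weights = hermite_e.hermegauss(degree)
    return np.sum(weights * f(math.sqrt(c) * nodes)) / math.sqrt(2 * math.pi)


x_grid = np.linspace(-2.0, 2.0, 9)
```

In this one-variable picture, Wick ordering with respect to c subtracts exactly the contractions that a Gaussian of variance c would produce, so the Wick powers have vanishing mean in that Gaussian. Changing c from the vacuum to the thermal value changes the Wick power only by lower-order terms.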


Now let $P(\lambda)$ be a real-valued polynomial, which is 
bounded from below. Then Euclidean techniques 
can be used to define the operator sum 

$$ H_l := L_{AW} + \int_{-l}^{l} {:}P(\phi(x)){:}_{C_\beta}\, dx $$

in the Araki-Woods representation and to show that 
$H_l$ is essentially self-adjoint (Gérard and Jäkel 2005). 
Thus, (the closure of) $H_l$ can be used to define a 
perturbed time evolution $t \mapsto \tau_t^l$ on $\mathcal{A}$ and the vector 

$$ \Omega_l := \frac{e^{-\frac{\beta}{2} H_l}\, \Omega_{AW}}{\| e^{-\frac{\beta}{2} H_l}\, \Omega_{AW} \|} $$

induces a KMS state $\omega_l$ for the dynamical system 

$$ (\pi_{AW}(\mathcal{A})'', \tau^l) $$


A finite propagation speed argument (using 
Trotter’s product formula) shows that 

$$ \tau_t^l(A) := e^{itH_l} A\, e^{-itH_l}, \quad t \in \mathbb{R} $$ [9]

is independent of $l$ for $A \in \mathcal{R}_{AW}(\mathcal{O})$, $t \in \mathbb{R}$ fixed and 
$l$ sufficiently large. Thus, there exists a limiting 
dynamics $\tau$ such that 

$$ \lim_{l \to \infty} \| \tau_t^l(A) - \tau_t(A) \| = 0 $$ [10]

for all $A \in \mathcal{R}_{AW}(\mathcal{O})$, $\mathcal{O}$ bounded. This norm convergence 
extends to the norm closure $\mathcal{A}$ of the local von 
Neumann algebras. 

The existence of weak* limit points (which are 
states) of the (generalized) sequence $\{\omega_l\}_{l > 0}$ is a 
consequence of the Banach–Alaoglu theorem. The 
fact that all limit states satisfy the KMS condition 
with respect to the pair $(\mathcal{A}, \tau)$ follows from [10]. To 
prove that the sequence $\{\omega_l\}_{l > 0}$ has only one 
accumulation point, 

$$ \omega_\beta = \lim_{l \to \infty} \omega_l $$ [11]


is more delicate. Following Høegh-Krohn, Nelson 
symmetry is used in Gérard and Jäkel (2005) to 
relate the interacting thermal theory on the real line 
to the $P(\phi)_2$ model on the circle $S^1$ of length $\beta$ at 
temperature 0. The existence of the limit [11] then 
follows from the uniqueness of the vacuum state on 
the circle. The relativistic KMS condition can be 
derived by Nelson symmetry as well, using the fact 
that the discrete spectrum of the model on the circle 
satisfies the spectrum condition. Since the limit [11] 
exists on the norm closure $\mathcal{A}$ of the weakly closed 
local algebras, it follows from a result of Takesaki 
and Winnink that $\omega_\beta$ is locally normal with respect 
to the Araki-Woods representation (which itself is 
locally normal with respect to the Fock representation). 
Consequently, 

$$ \mathcal{R}_\beta(\mathcal{O}) := \pi_\beta(\mathcal{A}(\mathcal{O}))'' \cong \mathcal{R}_{AW}(\mathcal{O}), \quad \mathcal{O} \text{ bounded} $$

that is, $\mathcal{R}_\beta(\mathcal{O})$ is (isomorphic to) the unique 
hyperfinite factor of type III$_1$. Moreover, the local 
Fock property implies that the split property holds. 


Perturbation Theory 


Steinmann (1995) has shown that perturbative expansions 
for the Wightman distributions of the ${:}\phi^4{:}_4$ model 
can be derived directly in the thermodynamic limit, 
using as only inputs the equations of motion and the 
(thermal) Wightman axioms. The result can be 
represented as a sum over generalized Feynman graphs. 

The method consists in solving the differential 
equations for the correlation functions which follow 
from the field equation, by a power series expansion 
in the coupling constant, using the axiomatic 
properties of the Wightman functions as subsidiary 
conditions. The Wightman axioms are expected to 
hold separately in each order of perturbation theory, 
with the exception of the cluster property. 

As expected, the UV renormalization can be 
chosen to be temperature independent, that is, one 
can use the same counterterms as in the vacuum 
case. But the infrared divergences are more severe; 
they cannot be removed by minor adjustments of the 
renormalization procedure. Various elaborate 
resummation techniques have been proposed to (at 
least partially) remove the infrared singularities. 

Another approach has been pursued by Kopper et al. 
(2001). They have investigated the perturbation expansion 
of the ${:}\phi^4{:}_4$ model in the imaginary-time formalism, 
using Wilson’s flow equations. The result is once 
again that all correlation functions become ultraviolet- 
finite in all orders of the perturbation expansion, once 
the theory has been renormalized at zero temperature 
by usual renormalization prescriptions. 


Asymptotic Dynamics of Thermal Fields 


Timelike asymptotic properties of thermal correlation 
functions cannot be interpreted in terms of free fields 
due to persistent dissipative effects of a thermal 
system. This well-known fact manifests itself in a 
softened pole structure of the Green’s functions in 
momentum space and is at the root of the failure of 
the conventional approach to thermal perturbation 
theory (Bros and Buchholz 2002). In fact, assuming 
a sharp dispersion law, one would be forced to 
conclude that the scattering matrix is trivial (a 
famous no-go theorem by Narnhofer et al. (1983)). 

However, there seems to be a possibility to find an 
effective theory, which is much simpler and still 
reproduces the correct asymptotic behavior of the full 
theory. Disregarding low-energy excitations, Bros and 
Buchholz (2002) have shown that the $\delta$-contributions 
in the damping factors give rise to asymptotically 
leading terms which have a rather simple form: they are 
products of the thermal correlation function of a free 
field and a damping factor describing the dissipative 
effects of the model-dependent thermal background. 
This result is based on the assumption that the 
truncated n-point functions satisfy 


lim TRO DPW (ty 301, seima S0 


T—o 


k>0 


while the $\delta$-contributions in the damping factors 
exhibit, for large timelike separations $T$, a $T^{-3/2}$ 
type behavior (in 3 + 1 spacetime dimensions). 
Bros and Buchholz (2002) have shown that the 
asymptotically dominating parts of the correlation 
functions can be interpreted in terms of quasifree 
states acting on the algebra generated by a Hermitian 
field $\phi_0$ satisfying the commutation relations 

$$ [\phi_0(t_1, x_1), \phi_0(t_2, x_2)] = \Delta_{m_0}(t_1 - t_2, x_1 - x_2)\, Z(x_1 - x_2) $$

Here $\Delta_{m_0}$ is the usual commutator function of a free 
scalar field of mass $m_0$ and $Z$ is an operator-valued 
distribution commuting with $\phi_0$ such that 
$\omega_\beta(Z(x_1 - x_2)) = D_{\beta,d}(x_1 - x_2)$. (Here $\omega_\beta$ denotes a KMS state 
for the algebra generated by $\phi_0$.) Intuitively speaking, 
the field $\phi_0$ carries an additional stochastic 
degree of freedom, which manifests itself in a central 
element that appears in the commutation relations 
and couples to the thermal background. 

As $\phi_0$ describes the interacting field asymptotically, 
one may expect that $\phi_0$ satisfies the field 
equation of the interacting field in an asymptotic 
sense. Bros and Buchholz (2002) have demonstrated 
that this assumption allows one to derive an explicit 
expression for the discrete part of the damping 
factors $D_{\beta,d}(x)$ in simple models. 


See also: Axiomatic Quantum Field Theory; Quantum 
Field Theory in Curved Spacetime; Scattering in 
Relativistic Quantum Field Theory: The Analytic 
Program; Tomita—Takesaki Modular Theory. 
Lattices, or differential-difference equations, are a 
special class of ordinary differential equations, with 
the independent variable $t$ playing the role of time and 
an infinite number of dependent variables $q_n = q_n(t)$ 
numbered by integer indices $n$, characterized by a 
translational invariance with respect to the shift 
$n \mapsto n + 1$. Due to this property, such equations are 
well suited for description of processes in 
translationally symmetric systems like crystals. On 
his search for lattice models admitting interesting 
explicit solutions, M Toda discovered in 1967 the 
lattice which nowadays carries his name: 


$$\ddot q_n = e^{q_{n+1}-q_n} - e^{q_n-q_{n-1}}$$    [1] 


The Toda lattice is one of the most celebrated systems of 
mathematical physics, and a large body of 
literature is devoted to it and to its various generalizations. 
Its most prominent property is “integrability,” 
so that it is amenable to a rather complete 
exact treatment; moreover, it can be regarded as one 
of the basic models, illustrating all the relevant 
paradigms, notions, methods, and results of the 
theory of integrable systems (sometimes called the 
theory of solitons). A first-hand presentation of a large body of 
relevant results, including the authentic story of the 
original discovery, can be found in Toda (1989). 


The Infinite Toda Lattice 
Model 


The classical infinite Toda lattice [1] describes a one-dimensional 
chain of unit mass particles, each one 
interacting with its nearest neighbors only, $q_n$ being 
the displacement of the nth particle from equilibrium. 
It can be treated within the Hamiltonian formalism 
of classical mechanics (with some care, because of 
the infinite number of degrees of freedom). In this 
framework, the second-order Newtonian equations 
of motion [1] are replaced by the first-order 
Hamiltonian ones, for the coordinates $q_n$ and 
canonically conjugate momenta $p_n$: 


$$\dot q_n = p_n, \qquad \dot p_n = e^{q_{n+1}-q_n} - e^{q_n-q_{n-1}}$$    [2] 


The corresponding Hamilton function is 


$$H_2(p,q) = \frac{1}{2}\sum_{n\in\mathbb{Z}} p_n^2 + \sum_{n\in\mathbb{Z}} e^{q_{n+1}-q_n}$$    [3] 


One can understand the infinite sums here formally, or, 
alternatively, one can impose suitable boundary conditions, 
such as $q_{n+1} - q_n \to 0$, $p_n \to 0$ as $|n| \to \infty$ (usually 
one requires decay faster than any power of $1/|n|$). 


Multisoliton Solutions 


M Toda found in 1967 a number of exact traveling 
wave solutions of this system, including the 1-soliton 
solution: 


$$q_n(t) = \log\frac{1 + e^{-2(\gamma_1 n - \beta_1 t + \delta_1)}}{1 + e^{-2(\gamma_1 (n-1) - \beta_1 t + \delta_1)}}$$    [4] 

or, equivalently, 

$$e^{q_{n+1}(t)-q_n(t)} = 1 + \frac{\sinh^2\gamma_1}{\cosh^2(\gamma_1 n - \beta_1 t + \delta_1)}$$    [5] 


where $\gamma_1 > 0$, $\beta_1 = \pm\sinh\gamma_1$, and $\delta_1$ is an arbitrary 
phase. Such a soliton moves with the velocity 
$v_1 = \beta_1/\gamma_1$ (to the right if $v_1 > 0$, and to the left if 
$v_1 < 0$). Note that the faster the soliton is, the larger its 
amplitude. Multisoliton solutions were constructed in 
1973 by R Hirota with the help of his ingenious 
“direct” (or bilinear) method. They can be written as 


$$e^{q_{n+1}(t)-q_n(t)} = \frac{\tau_{n+1}(t)\,\tau_{n-1}(t)}{\tau_n^2(t)}$$    [6] 


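where, for an M-soliton solution, $\tau_n(t)$ can be 
represented through the $M \times M$ determinant depending 
on 2M parameters $z_j \in (-1,1)$ and $c_j \in \mathbb{R}$: 


$$\tau_n(t) = \det\left(\delta_{ij} + \frac{c_i(t)\,c_j(t)\,(z_i z_j)^{n+1}}{1 - z_i z_j}\right)_{1\le i,j\le M}$$    [7] 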


where $c_j(t) = c_j e^{\beta_j t}$, $\beta_j = (1/2)(z_j^{-1} - z_j)$. If one sets 
$z_j = \pm e^{-\gamma_j}$ with $\gamma_j > 0$, then $\beta_j = \pm\sinh\gamma_j$, and one 
can show that asymptotically, both for $t \to -\infty$ and 
for $t \to +\infty$, the solution [6] looks like the sum of 
well-separated solitons [4] with the velocities 
$v_j = \beta_j/\gamma_j$ and the respective phases $\gamma_j n - \beta_j t + \delta_j^{(\pm)}$. 
This is usually interpreted as a particle-like behavior 
of solitons. One can show that the scattering of 
solitons is factorized: 


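$$\delta_j^{(+)} - \delta_j^{(-)} = \sum_{v_k < v_j} \log\left|\frac{1 - z_j z_k}{z_j - z_k}\right| - \sum_{v_k > v_j} \log\left|\frac{1 - z_j z_k}{z_j - z_k}\right|$$    [8] 
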
which means that the phase shifts of individual 
solitons can be interpreted as coming from the 
pairwise interactions only. 
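
The 1-soliton solution can be checked numerically. The sketch below assumes the τ-function form $q_n = \log(\tau_n/\tau_{n-1})$ with $\tau_n = 1 + e^{-2(\gamma_1 n - \beta_1 t + \delta_1)}$ and $\beta_1 = \sinh\gamma_1$; the function names, the parameter values, and the finite-difference step are illustrative choices, not part of the original text. It compares a central-difference estimate of $\ddot q_n$ with the right-hand side of [1].

```python
import math

def soliton_q(n, t, gamma=0.8, delta=0.3):
    """Assumed 1-soliton: q_n = log(tau_n / tau_{n-1}), tau_n = 1 + exp(-2(gamma n - beta t + delta))."""
    beta = math.sinh(gamma)
    tau = lambda m: 1.0 + math.exp(-2.0 * (gamma * m - beta * t + delta))
    return math.log(tau(n) / tau(n - 1))

def residual(n, t, dt=1e-3):
    """|q_n'' - (e^{q_{n+1}-q_n} - e^{q_n-q_{n-1}})|, with q_n'' from a central difference."""
    acc = (soliton_q(n, t + dt) - 2.0 * soliton_q(n, t) + soliton_q(n, t - dt)) / dt ** 2
    force = (math.exp(soliton_q(n + 1, t) - soliton_q(n, t))
             - math.exp(soliton_q(n, t) - soliton_q(n - 1, t)))
    return abs(acc - force)

# Largest deviation from the equation of motion over a few sites and times.
max_residual = max(residual(n, t) for n in range(-5, 6) for t in (-2.0, 0.0, 1.5))
```

The residual is limited only by the finite-difference error, so it shrinks as `dt` is reduced (until rounding errors take over).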


Integrability 


The infinite Toda lattice is completely integrable in 
the sense of the classical Hamiltonian mechanics: it 
admits an infinite number of functionally indepen- 
dent integrals of motion in involution. This was 
demonstrated in 1974 by M Hénon. An instance of 
these higher integrals of motion is given by 


$$H_3(p,q) = \frac{1}{3}\sum_{n\in\mathbb{Z}} p_n^3 + \sum_{n\in\mathbb{Z}} (p_n + p_{n+1})\, e^{q_{n+1}-q_n}$$    [9] 



Hamiltonian flows corresponding to the higher 
integrals of motion (usually referred to as higher 
Toda flows) form the “Toda lattice hierarchy.” A 
beautiful approach to this hierarchy is based on the 
Lax representation of the Toda lattice, discovered in 
1974 independently by H Flaschka and S Manakov. 
In the variables $a_n, b_n$, related to $q_n, p_n$ by 


$$a_n = e^{q_{n+1}-q_n}, \qquad b_n = p_n$$    [10] 

the equations of motion of the Toda lattice [2] are 
rewritten as 

$$\dot b_n = a_n - a_{n-1}, \qquad \dot a_n = a_n(b_{n+1} - b_n)$$    [11] 


It turns out that eqns [11] are equivalent to the 
operator equation 


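$$\dot L = [L, A_+] = [A_-, L]$$    [12] 


where L and $A_\pm$ are linear difference operators with 
coefficients depending on $a_n, b_n$: 


$$L = \sum_{n\in\mathbb{Z}} b_n E_{n,n} + \sum_{n\in\mathbb{Z}} a_n E_{n,n+1} + \sum_{n\in\mathbb{Z}} E_{n+1,n}$$    [13] 


$$A_+ = \sum_{n\in\mathbb{Z}} b_n E_{n,n} + \sum_{n\in\mathbb{Z}} E_{n+1,n}, \qquad A_- = \sum_{n\in\mathbb{Z}} a_n E_{n,n+1}$$    [14] 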


Here difference operators are represented as infinite 
matrices, $E_{m,n}$ being the matrix whose only 
nonvanishing element is equal to 1 and sits in the position 
(m,n). A diagonal similarity (gauge) transformation 
of the matrix L leads to an equivalent Lax 
representation of the Toda lattice: 


$$\dot L_0 = [L_0, A_0]$$    [15] 

with 

$$L_0 = \sum_{n\in\mathbb{Z}} b_n E_{n,n} + \sum_{n\in\mathbb{Z}} a_n^{1/2}\,(E_{n+1,n} + E_{n,n+1})$$    [16] 

$$A_0 = \frac{1}{2}\sum_{n\in\mathbb{Z}} a_n^{1/2}\,(E_{n+1,n} - E_{n,n+1})$$    [17] 


Being equivalent for the Toda lattice, these two Lax 
representations admit nonequivalent generalizations 
(see below). Note that the matrices $A_\pm$ in [14] may 
be interpreted as $A_\pm = \pi_\pm(L)$, where $\pi_\pm$ stands for 
the lower-triangular, resp., strictly upper-triangular 
part. The commuting higher members of the Toda 
lattice hierarchy (enumerated by $s \in \mathbb{N}$) are characterized 
by the Lax equations of the form [12] with 
the same Lax matrix L as in [13] and with 
$A_\pm = \pi_\pm(L^s)$. In the Lax representation [15], the 
higher Toda flows are obtained by choosing 
$A_0 = \frac{1}{2}\,\mathrm{skew}(L_0^s)$ (which reduces to [17] for s = 1), where “skew” denotes the skew-symmetric 
part (strictly lower-triangular part minus 
strictly upper-triangular part) of a symmetric 
matrix. The Hamilton functions of the higher flows 
are obtained as $H_s \sim \mathrm{tr}(L^s) = \mathrm{tr}(L_0^s)$. 
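
The conservation of the first integrals can be illustrated numerically. The following sketch integrates the Flaschka–Manakov form of the equations, $\dot b_n = a_n - a_{n-1}$, $\dot a_n = a_n(b_{n+1} - b_n)$, on a chain of four sites with periodic indices (an assumption made here only to keep all sums finite) and monitors $\sum b_n$, $\frac12\sum b_n^2 + \sum a_n$, and $\frac13\sum b_n^3 + \sum a_n(b_n + b_{n+1})$. The Runge–Kutta integrator, the initial data, and all names are illustrative.

```python
def rhs(a, b):
    """a_n' = a_n (b_{n+1} - b_n), b_n' = a_n - a_{n-1}, indices taken modulo N."""
    N = len(a)
    da = [a[n] * (b[(n + 1) % N] - b[n]) for n in range(N)]
    db = [a[n] - a[n - 1] for n in range(N)]
    return da, db

def rk4_step(a, b, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def move(x, dx, s):
        return [xi + s * di for xi, di in zip(x, dx)]
    k1 = rhs(a, b)
    k2 = rhs(move(a, k1[0], h / 2), move(b, k1[1], h / 2))
    k3 = rhs(move(a, k2[0], h / 2), move(b, k2[1], h / 2))
    k4 = rhs(move(a, k3[0], h), move(b, k3[1], h))
    def combine(x, i):
        return [xi + h / 6 * (p + 2 * q + 2 * r + s)
                for xi, p, q, r, s in zip(x, k1[i], k2[i], k3[i], k4[i])]
    return combine(a, 0), combine(b, 1)

def integrals(a, b):
    """The three lowest integrals of motion in the variables a_n, b_n."""
    N = len(a)
    h1 = sum(b)
    h2 = 0.5 * sum(x * x for x in b) + sum(a)
    h3 = sum(x ** 3 for x in b) / 3 + sum(a[n] * (b[n] + b[(n + 1) % N]) for n in range(N))
    return h1, h2, h3

a, b = [1.0, 0.5, 2.0, 1.5], [0.3, -0.7, 0.1, 0.4]
a_start = list(a)
before = integrals(a, b)
for _ in range(2000):              # integrate up to t = 2 with step 0.001
    a, b = rk4_step(a, b, 0.001)
after = integrals(a, b)
drift = max(abs(x - y) for x, y in zip(before, after))
```

The variables themselves change substantially over this time span, while `drift` stays at the level of the integration error.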


Inverse Scattering 


H Flaschka and S Manakov used the Lax representation 
as the basis for applying the inverse-scattering, 
or inverse-spectral, transformation 
method (IST) to the infinite Toda lattice. It was the 
first application of IST in the lattice context. The 
matrix $L_0$ in [16] is symmetric tridiagonal, which 
means that the operator $L_0$ is second order and self-adjoint. 
The direct and inverse-spectral problem for 
$L_0\psi = \mu\psi$ with such operators $L_0$ is well studied and 
parallel, to a large extent, to the corresponding 
theory for second-order differential operators. In the 
rapidly decaying case, the set of spectral data of the 
operator $L_0$, allowing for a solution of the inverse 
problem, consists of: 


1. eigenvalues $\mu_j = z_j + z_j^{-1}$ of the discrete spectrum, 
with $z_j \in (-1, 1)$; 

2. normalizing coefficients $\gamma_j$ of the corresponding 
eigenfunctions; and 

3. reflection coefficient $r(z)$ for $|z| = 1$, characterizing 
the continuous spectrum $\mu = z + z^{-1} \in [-2,2]$. 


The solution of the inverse-spectral problem is given 
in terms of the Riemann–Hilbert problem or its 
variants, like the Gelfand–Levitan equation. Equation 
[12] means that the evolution of the operator L, 
induced by the evolution of $q_n(t), p_n(t)$ by virtue of 
the Toda lattice equations [2], is “isospectral.” More 
precisely, the discrete eigenvalues are integrals of 
motion, while the evolution of the other spectral data is 
governed by simple linear equations: 


$$z_j = \text{const.}, \qquad \gamma_j(t) = \gamma_j(0)\, e^{(z_j^{-1} - z_j)t}, \qquad r(z,t) = r(z,0)\, e^{(z^{-1} - z)t}$$    [18] 


In particular, the multisoliton solutions correspond 
to the reflectionless case r(z,t) = 0. The IST solution 
of the initial-value problem for the infinite Toda 
lattice can be schematically depicted as in Figure 1. 


Bi-Hamiltonian Structure 


The canonical Poisson bracket for the variables $q_n, p_n$ 
turns in the Flaschka–Manakov variables [10] into 


$$\{b_n, a_n\}_1 = -a_n, \qquad \{a_n, b_{n+1}\}_1 = -a_n$$    [19] 








[Diagram: $q_n(0), p_n(0)$ → (direct-spectral problem) → $z_j, \gamma_j(0), r(z,0)$ → (linear evolution) → $z_j, \gamma_j(t), r(z,t)$ → (inverse-spectral problem) → $q_n(t), p_n(t)$] 

Figure 1 General scheme of the IST. 
(all other brackets of the coordinate functions 
vanish), and the system [11] is Hamiltonian with 
respect to this bracket, with the Hamilton function 


$$H_2 = \frac{1}{2}\sum_{n\in\mathbb{Z}} b_n^2 + \sum_{n\in\mathbb{Z}} a_n$$ 


However, one can also define a different Poisson 
bracket for the variables $a_n, b_n$: 


$$\{b_n, a_n\}_2 = -b_n a_n, \qquad \{a_n, b_{n+1}\}_2 = -a_n b_{n+1}$$ 
$$\{b_n, b_{n+1}\}_2 = -a_n, \qquad \{a_n, a_{n+1}\}_2 = -a_n a_{n+1}$$    [20] 


with the following properties: it is compatible with 
the first one (i.e., their linear combinations are again 
Poisson brackets), and the system [11] is Hamiltonian 
with respect to this bracket, with the Hamilton 
function $H_1 = \sum_n b_n$. So, the Toda lattice in the form 
[11] is a bi-Hamiltonian system. This result is due 
to M Adler (1979). The bi-Hamiltonian property, 
introduced by F Magri in 1978 on the example of 
the Korteweg–de Vries equation, has been established 
since then as an alternative (and highly 
effective and informative) definition of integrability. 
Actually, the Toda lattice [11] is even tri-Hamiltonian, 
since there exists one more local Poisson bracket for 
the variables $a_n, b_n$ with similar properties, discovered 
by B Kupershmidt in 1985. 
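
The bi-Hamiltonian property can be checked mechanically. The sketch below assumes the brackets in the form $\{b_n,a_n\}_1 = -a_n$, $\{a_n,b_{n+1}\}_1 = -a_n$ and $\{b_n,a_n\}_2 = -b_na_n$, $\{a_n,b_{n+1}\}_2 = -a_nb_{n+1}$, $\{b_n,b_{n+1}\}_2 = -a_n$, $\{a_n,a_{n+1}\}_2 = -a_na_{n+1}$, the convention $\dot f = \{H, f\}$, and a periodic chain of four sites (chosen only so that the tables are finite). All names and the sample point are illustrative.

```python
def poisson_table(entries):
    """Extend ((x, y), value) entries antisymmetrically: {y, x} = -{x, y}."""
    table = {}
    for (x, y), value in entries:
        table[(x, y)] = table.get((x, y), 0.0) + value
        table[(y, x)] = table.get((y, x), 0.0) - value
    return table

def first_bracket(a, b):
    N = len(a)
    entries = []
    for n in range(N):
        m = (n + 1) % N
        entries += [((("b", n), ("a", n)), -a[n]),
                    ((("a", n), ("b", m)), -a[n])]
    return poisson_table(entries)

def second_bracket(a, b):
    N = len(a)
    entries = []
    for n in range(N):
        m = (n + 1) % N
        entries += [((("b", n), ("a", n)), -b[n] * a[n]),
                    ((("a", n), ("b", m)), -a[n] * b[m]),
                    ((("b", n), ("b", m)), -a[n]),
                    ((("a", n), ("a", m)), -a[n] * a[m])]
    return poisson_table(entries)

def flow(table, gradient):
    """Vector field f' = {H, f} = sum over x of (dH/dx) {x, f}."""
    out = {}
    for (x, y), value in table.items():
        out[y] = out.get(y, 0.0) + gradient.get(x, 0.0) * value
    return out

a, b = [1.0, 0.5, 2.0, 1.5], [0.3, -0.7, 0.1, 0.4]
N = len(a)
grad_h2 = {**{("a", n): 1.0 for n in range(N)}, **{("b", n): b[n] for n in range(N)}}
grad_h1 = {("b", n): 1.0 for n in range(N)}
flow1 = flow(first_bracket(a, b), grad_h2)     # first bracket with H_2
flow2 = flow(second_bracket(a, b), grad_h1)    # second bracket with H_1
direct = {**{("a", n): a[n] * (b[(n + 1) % N] - b[n]) for n in range(N)},
          **{("b", n): a[n] - a[n - 1] for n in range(N)}}
```

Both `flow1` and `flow2` reproduce the vector field `direct` of the Toda lattice, which is the content of the bi-Hamiltonian statement at a single phase-space point.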


Darboux–Bäcklund Transformations and Discretization 


A further indispensable attribute of integrable 
systems is the existence of so-called Darboux–Bäcklund 
transformations. For the Toda lattice they were first 
found by M Toda and M Wadati in 1975. A 
Bäcklund transformation $(q_n, p_n) \mapsto (\tilde q_n, \tilde p_n)$ with the 
parameter h can be written as 

$$1 + h p_n = e^{\tilde q_n - q_n} + h^2 e^{q_n - \tilde q_{n-1}}$$ 
$$1 + h \tilde p_n = e^{\tilde q_n - q_n} + h^2 e^{q_{n+1} - \tilde q_n}$$    [21] 


This is a canonical transformation, possessing a 
classical generating function. These formulas can be 
given a fundamentally important interpretation in 
terms of the matrices 


$$U_+ = \sum_{n\in\mathbb{Z}} e^{\tilde q_n - q_n} E_{n,n} + h \sum_{n\in\mathbb{Z}} E_{n+1,n}$$    [22] 


$$U_- = I + h \sum_{n\in\mathbb{Z}} e^{q_{n+1} - \tilde q_n} E_{n,n+1}$$    [23] 


The first formula in [21] is equivalent to the 
factorization $I + hL = U_+U_-$, while the second one 
is equivalent to the factorization $I + h\tilde L = U_-U_+$ 
with the flipped factors, where $\tilde L$ is the matrix [13] 
built from the transformed variables. The Bäcklund transformation 
[21] serves also as an integrable discretization 
of the Toda flow [2] with the time step h. 


Finite Open-End Toda Lattice 
Model 


The infinite Toda lattice [1] can be reduced to finite-dimensional 
systems by imposing suitable boundary 
conditions, different from the rapidly decaying ones. 
Particularly important are “open-end boundary 
conditions,” which correspond to placing the particles 
0 and N + 1 at $q_0 = +\infty$ and $q_{N+1} = -\infty$, 
respectively. In terms of the Flaschka–Manakov 
variables, this means that $a_0 = a_N = 0$ and $b_0 = b_{N+1} = 0$. 
The Hamilton function of the resulting 
system with N degrees of freedom is 


$$H_2(p,q) = \frac{1}{2}\sum_{n=1}^{N} p_n^2 + \sum_{n=1}^{N-1} e^{q_{n+1}-q_n}$$    [24] 


This system consists of N particles subject to 
repulsive forces between nearest neighbors, and 
exhibits a scattering behavior both as $t \to -\infty$ and 
$t \to +\infty$. It admits a Lax representation of the same 
form [12] or [15] as in the infinite case, but with all 
the matrices being now of finite size $N \times N$, so that 
[13]–[14] and [16]–[17] are replaced by 


$$L = \sum_{n=1}^{N} b_n E_{n,n} + \sum_{n=1}^{N-1} a_n E_{n,n+1} + \sum_{n=1}^{N-1} E_{n+1,n}$$    [25] 


$$A_+ = \sum_{n=1}^{N} b_n E_{n,n} + \sum_{n=1}^{N-1} E_{n+1,n}, \qquad A_- = \sum_{n=1}^{N-1} a_n E_{n,n+1}$$    [26] 

and 

$$L_0 = \sum_{n=1}^{N} b_n E_{n,n} + \sum_{n=1}^{N-1} a_n^{1/2}\,(E_{n+1,n} + E_{n,n+1})$$    [27] 

$$A_0 = \frac{1}{2}\sum_{n=1}^{N-1} a_n^{1/2}\,(E_{n+1,n} - E_{n,n+1})$$    [28] 

The qualitative behavior of the solutions is easily 
understood: as a consequence of the repulsive interactions, 
the pairwise distances between particles grow 
infinitely, $a_n(t) = e^{q_{n+1}(t)-q_n(t)} \to 0$ as $t \to \pm\infty$, so that 
the matrix $L_0$ becomes asymptotically diagonal, 
with the limit velocities $b_n(\pm\infty) = \dot q_n(\pm\infty)$ as the 
diagonal entries. Due to the isospectral evolution of 
$L_0$, these limit velocities have to coincide with the 
eigenvalues $\mu_j$ of $L_0$, which are integrals of motion. 
As $t \to -\infty$, they appear on the diagonal in 
increasing order (the rightmost particle $q_1$ being the 
slowest, and the leftmost $q_N$ being the fastest), while 
as $t \to +\infty$, their order on the diagonal changes to 
the decreasing one (the particle $q_1$ becoming the 
fastest and $q_N$ becoming the slowest). 
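
This sorting behavior is easy to observe numerically. The sketch below integrates the open-end equations $\dot b_n = a_n - a_{n-1}$, $\dot a_n = a_n(b_{n+1} - b_n)$ with $a_0 = a_N = 0$ for N = 3, forward and backward in time, with a classical Runge–Kutta scheme. The initial data are an illustrative choice for which the characteristic polynomial of $L_0$ is $\mu^3 - 3\mu - 1$, whose roots are $2\cos 20°$, $2\cos 140°$, $2\cos 260°$; all names are hypothetical.

```python
import math

def rhs(state, N):
    """Open-end equations; state = (a_1..a_{N-1}, b_1..b_N), with a_0 = a_N = 0."""
    a, b = state[:N - 1], state[N - 1:]
    da = [a[n] * (b[n + 1] - b[n]) for n in range(N - 1)]
    db = [(a[n] if n < N - 1 else 0.0) - (a[n - 1] if n > 0 else 0.0) for n in range(N)]
    return da + db

def rk4(state, h, steps, N):
    """Classical fourth-order Runge-Kutta; a negative h integrates backward in time."""
    for _ in range(steps):
        k1 = rhs(state, N)
        k2 = rhs([s + 0.5 * h * k for s, k in zip(state, k1)], N)
        k3 = rhs([s + 0.5 * h * k for s, k in zip(state, k2)], N)
        k4 = rhs([s + h * k for s, k in zip(state, k3)], N)
        state = [s + h / 6.0 * (p + 2 * q + 2 * r + w)
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4)]
    return state

N = 3
start = [1.0, 1.0, 0.0, 1.0, -1.0]         # a_1, a_2, b_1, b_2, b_3
future = rk4(start, 0.005, 4000, N)        # state at t = +20
past = rk4(start, -0.005, 4000, N)         # state at t = -20
# Eigenvalues of L_0 for this initial data, in increasing order.
mu = sorted(2.0 * math.cos(math.radians(d)) for d in (20.0, 140.0, 260.0))
```

At t = +20 the entries `future[2:]` list the eigenvalues in decreasing order, at t = −20 the entries `past[2:]` list them in increasing order, and in both cases the off-diagonal variables are negligibly small.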


Moser’s Solution 


Integration of this system was first performed by 
J Moser in 1975. His solution can be interpreted 
within the general scheme of the IST (see Figure 1). 
The spectral data in this case consist, for example, of 
the eigenvalues $\mu_j$ ($j = 1,\ldots,N$) of the matrix $L_0$ and 
the first components $r_j$ of the corresponding orthonormal 
eigenvectors. The evolution of these data 
induced by the Toda flow [2] turns out to be simple: 


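$$\mu_j = \text{const.}, \qquad r_j^2(t) = \frac{r_j^2(0)\, e^{\mu_j t}}{\sum_{k=1}^{N} r_k^2(0)\, e^{\mu_k t}}$$    [29] 


The IST is expressed by the identity 


$$\sum_{j=1}^{N} \frac{r_j^2}{\mu - \mu_j} = \cfrac{1}{\mu - b_1 - \cfrac{a_1}{\mu - b_2 - \cdots - \cfrac{a_{N-1}}{\mu - b_N}}}$$    [30] 
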
both parts of which represent the entry (1,1) of the 
matrix $(\mu I - L_0)^{-1}$. It implies that all variables 
$a_n(t), b_n(t)$ are rational functions of $\mu_j$ and $e^{\mu_j t}$; in 
particular, one finds: 


$$a_n(t) = e^{q_{n+1}(t)-q_n(t)} = \frac{\tau_{n-1}(t)\,\tau_{n+1}(t)}{\tau_n^2(t)}$$    [31] 


where $\tau_n(t)$ can be represented as an $n \times n$ Hankel 
determinant 


$$\tau_n(t) = \det\left(c_{j+k}(t)\right)_{0\le j,k\le n-1}, \qquad c_j(t) = \sum_{i=1}^{N} \mu_i^{\,j}\, r_i^2(t)$$    [32] 

Factorization Solution 


The Lax representation [12] is a particular instance of 
a general construction, known as the 
Adler–Kostant–Symes (AKS) method and found 
around 1980. The ingredients of this construction are: 


• a Lie algebra $\mathfrak{g}$, equipped with a nondegenerate 
scalar product which is used to identify $\mathfrak{g}$ with its 
dual space $\mathfrak{g}^*$; 

• a splitting of $\mathfrak{g}$ into a direct sum of two 
subspaces $\mathfrak{g}_\pm$ which are also Lie subalgebras, with 
$\pi_\pm : \mathfrak{g} \to \mathfrak{g}_\pm$ being the corresponding projections; 

• the Lie group $G$ of the Lie algebra $\mathfrak{g}$, and its Lie 
subgroups $G_\pm$ with the Lie algebras $\mathfrak{g}_\pm$; and 

• a function $\phi : \mathfrak{g} \to \mathfrak{g}$ covariant with respect to the 
adjoint action of $G$ (in the case of matrix Lie 
algebras and groups, one can take, e.g., 
$\phi(L) = L^s$). 


The AKS method provides a formula for the solution 
of the initial-value problem for Lax equations of the 
form [12] with the Lax matrix $L \in \mathfrak{g}$ and 
$A_\pm = \pi_\pm(\phi(L))$. The solution is given by 

$$L(t) = U_+^{-1}(t)\, L(0)\, U_+(t) = U_-(t)\, L(0)\, U_-^{-1}(t)$$    [33] 

where the elements $U_\pm(t) \in G_\pm$ solve the factorization 
problem 


$$\exp(t\,\phi(L(0))) = U_+(t)\, U_-(t)$$    [34] 


For the open-end Toda lattice $\mathfrak{g} = gl(N)$, the Lie 
algebra of all $N \times N$ matrices, and $\mathfrak{g}_\pm$ consist of all 
lower-triangular, resp., strictly upper-triangular, 
matrices. Accordingly, $G = GL(N)$, the Lie group 
of all nondegenerate $N \times N$ matrices, and $G_\pm$ 
consist of all nondegenerate lower-triangular 
matrices, resp., of upper-triangular matrices with 
units on the diagonal. The corresponding factorization 
problem in G is well known in linear 
algebra under the name of LR factorization, and is 
related to Gaussian elimination. From [33] and 
the well-known expression of the diagonal elements 
of the lower-triangular factor in the LR 
factorization through the minors of the factorized 
matrix, we find: 


$$a_n(t) = a_n(0)\,\frac{\tau_{n+1}(t)\,\tau_{n-1}(t)}{\tau_n^2(t)}$$    [35] 


where $\tau_n(t)$ is the upper-left $n \times n$ minor of 
the matrix $\exp(tL(0))$. If $L(t)$ is the Lax matrix 
along the solution of the Toda flow ($\phi(L) = L$), then 
the sampling of the matrix $\exp(L(t))$ at the integer 
times $t \in \mathbb{Z}$ coincides with the result of applying 
Rutishauser's LR algorithm to the matrix 
$\exp(L(0))$. The LR algorithm applied to the matrix 
$I + hL(0)$ is nothing but the Bäcklund transformation 
[21] in the open-end situation. 
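
One step of this LR procedure can be written in a few lines for the tridiagonal open-end matrix. The sketch below assumes the factor shapes $U_+ = \mathrm{diag}(d_n) + h\sum E_{n+1,n}$ and $U_- = I + h\sum u_n E_{n,n+1}$, factorizes $I + hL = U_+U_-$, and reads the new variables off the product $U_-U_+$. The function name and the sample data are illustrative, and the recursion breaks down if some $d_n$ vanishes, which does not happen for small h.

```python
def lr_step(a, b, h):
    """One LR step for I + hL, L tridiagonal with diagonal b, superdiagonal a, subdiagonal 1."""
    N = len(b)
    d, u = [0.0] * N, [0.0] * (N - 1)
    d[0] = 1.0 + h * b[0]
    for n in range(N - 1):
        u[n] = a[n] / d[n]                          # superdiagonal: h a_n = d_n * h u_n
        d[n + 1] = 1.0 + h * b[n + 1] - h * h * u[n]  # diagonal: 1 + h b_n = d_n + h^2 u_{n-1}
    # Entries of U_- U_+: diagonal d_n + h^2 u_n, superdiagonal h u_n d_{n+1}, subdiagonal h.
    b_new = [(d[n] + h * h * (u[n] if n < N - 1 else 0.0) - 1.0) / h for n in range(N)]
    a_new = [u[n] * d[n + 1] for n in range(N - 1)]
    return a_new, b_new

a, b, h = [1.0, 0.5], [0.2, -0.4, 0.1], 1e-3
a_new, b_new = lr_step(a, b, h)
trace_1 = (sum(b), sum(b_new))                                   # tr L
trace_2 = (sum(x * x for x in b) + 2 * sum(a),
           sum(x * x for x in b_new) + 2 * sum(a_new))           # tr L^2
```

Because $U_-U_+ = U_+^{-1}(U_+U_-)U_+$, the step is a similarity transformation, so `trace_1` and `trace_2` are preserved; for small h the increments `(b_new[n] - b[n]) / h` approximate $a_n - a_{n-1}$, in line with the role of the map as a discretization of the flow.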


Finite Periodic Toda Lattice 
Model 


A different reduction of the infinite Toda lattice to a 
finite-dimensional system arises from imposing periodic 
boundary conditions, $q_{n+N}(t) = q_n(t)$ for all 
$n \in \mathbb{Z}$ (of course, such relations hold also for the 
Flaschka–Manakov variables $a_n, b_n$). The Hamilton 
function of the resulting system with N degrees of 
freedom is 


$$H_2(p,q) = \frac{1}{2}\sum_{n\in\mathbb{Z}/N\mathbb{Z}} p_n^2 + \sum_{n\in\mathbb{Z}/N\mathbb{Z}} e^{q_{n+1}-q_n}$$    [36] 


This system consists of N particles $q_n$ ($n = 1,\ldots,N$), 
and it is always assumed that $q_{N+1} = q_1$ and $q_0 = q_N$. 
Thus, the potential energy in [36] differs from the 
potential energy in [24] by one additional term $e^{q_1 - q_N}$. 
However, this modest difference leads to much more 
complicated dynamics of the system (quasiperiodic 
instead of scattering). It is convenient to replace the 
infinite matrices in the Lax representation [12] by 
finite ones, of size $N \times N$, but depending on an 
additional parameter $\lambda$ (called the spectral parameter): 


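$$L(\lambda) = \sum_{n\in\mathbb{Z}/N\mathbb{Z}} b_n E_{n,n} + \lambda^{-1}\sum_{n\in\mathbb{Z}/N\mathbb{Z}} a_n E_{n,n+1} + \lambda \sum_{n\in\mathbb{Z}/N\mathbb{Z}} E_{n+1,n}$$    [37] 

$$A_+(\lambda) = \sum_{n\in\mathbb{Z}/N\mathbb{Z}} b_n E_{n,n} + \lambda \sum_{n\in\mathbb{Z}/N\mathbb{Z}} E_{n+1,n}$$    [38] 

$$A_-(\lambda) = \lambda^{-1}\sum_{n\in\mathbb{Z}/N\mathbb{Z}} a_n E_{n,n+1}$$    [39] 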


The Lax representation [12] holds identically in $\lambda$, 
so that the spectral parameter drops out of the 
equations of motion. Note that, unlike the open-end 
case, L is no longer a tridiagonal matrix, because of 
the nonvanishing entries in the positions (N, 1) 
and (1, N). 


Inverse-Spectral Transformation 


The solution of the periodic lattice in terms of multidimensional 
theta functions was given independently 
by E Date and S Tanaka, and by I Krichever 
in 1976. In this case, the set of the spectral data is 
more complicated; it includes: 


• a hyperelliptic Riemann surface $\mathcal{R}$ of genus N − 1 
determined by the eigenvalues of the periodic 
boundary-value problem for the operator L, or, 
in other words, by the equation 
$R(\lambda,\mu) = \det(L(\lambda) - \mu I) = 0$; and 

• N − 1 points $P_n$ on $\mathcal{R}$, which correspond to the 
eigenvalues of L with vanishing boundary 
conditions. 


Due to [12], the Riemann surface $\mathcal{R}$ itself is an 
integral of motion, and the evolution of the points $P_n$ is 
such that the image of the divisor $P_1 + \cdots + P_{N-1}$ 
under the Abel map moves along a straight line in 
the Jacobi variety of $\mathcal{R}$. The solution of the inverse-spectral 
problem is given in terms of 
multidimensional theta-functions by formula [35] 
with $\tau_n(t) = \theta(nU - tV + D)$, where U, V, D are 
certain vectors on the Jacobian of $\mathcal{R}$ (the first two 
of them depending on the spectrum $\mathcal{R}$ only). 


Loop Algebras 


The periodic Toda lattice can be included into the 
general AKS scheme, if one interprets the Lax 
matrix L as an element of the loop algebra $\mathbf{g}$ 
which consists of Laurent polynomials (in $\lambda$) with 
coefficients from gl(N), singled out by the additional 
condition 


$$\mathbf{g} = \{L(\lambda) \in gl(N)[\lambda,\lambda^{-1}] : \Omega L(\lambda)\Omega^{-1} = L(\omega\lambda)\}$$ 


where $\Omega = \mathrm{diag}(1,\omega,\ldots,\omega^{N-1})$, $\omega = \exp(2\pi i/N)$. The subalgebras 
$\mathbf{g}_\pm$ consist of Laurent polynomials with 
non-negative, resp., strictly negative 
powers of $\lambda$. The Lie group G corresponding to the 
Lie algebra $\mathbf{g}$ consists of GL(N)-valued functions 
$U(\lambda)$ of the complex parameter $\lambda$, regular in 
$\mathbb{CP}^1 \setminus \{0,\infty\}$ and satisfying $\Omega U(\lambda)\Omega^{-1} = U(\omega\lambda)$. Its 
subgroups $G_\pm$ corresponding to the Lie algebras $\mathbf{g}_\pm$ 
are singled out by the following conditions: elements 
of $G_+$ are regular in the neighborhood of $\lambda = 0$, 
while elements of $G_-$ are regular in the neighborhood 
of $\lambda = \infty$ and take at $\lambda = \infty$ the value I. The 
corresponding factorization is called the generalized 
LR factorization. As opposed to the open-end case, 
finding such a factorization is a problem of the 
Riemann–Hilbert type which is solved in terms of 
algebraic geometry and theta-functions rather than in 
terms of linear algebra and exponential functions. This 
approach to the periodic Toda lattice is due to Reyman 
and Semenov-Tian-Shansky (1979) and, independently, 
to M Adler and P van Moerbeke (1980). 


Generalizations: Lie-Algebraic Systems 


The AKS interpretation of the finite Toda lattices 
leads directly to their generalizations by replacing 
the algebra gl(N), resp., the loop algebra over gl(N), 
by simple Lie algebras, resp. affine Lie algebras. 
These generalized Toda systems were introduced in 
1976 by O Bogoyavlensky and solved in 1979 
independently by M Olshanetsky, A Perelomov, 
and by B Kostant. 


Simple Lie Algebras 


Let $\mathfrak{g}$ be a simple Lie algebra (complex or real split), 
and $\mathfrak{h}$ its Cartan subalgebra. Let further $\Delta = \Delta_+ \cup \Delta_-$ 
be the root system of $\mathfrak{g}$, decomposed into the set of 
positive roots $\Delta_+$ and the set of negative roots $\Delta_-$. 
One has a direct sum of vector spaces $\mathfrak{g} = \mathfrak{g}_+ \oplus \mathfrak{g}_-$, where $\mathfrak{g}_+$ is 
spanned by the root spaces for positive roots and by $\mathfrak{h}$, 
while $\mathfrak{g}_-$ is spanned by the root spaces for negative 
roots (Borel decomposition). For $\alpha \in \Delta$ let $E_\alpha$ be a 
corresponding root vector, so that $[H, E_\alpha] = \alpha(H)E_\alpha$ for all 
$H \in \mathfrak{h}$. The root $\alpha \in \mathfrak{h}^*$ may be identified with $H_\alpha \in \mathfrak{h}$ 
defined by $\langle H_\alpha, H\rangle = \alpha(H)$ for all $H \in \mathfrak{h}$. It is easy to 
deduce that $[E_\alpha, E_{-\alpha}] = c_\alpha H_\alpha$, where $c_\alpha = \langle E_\alpha, E_{-\alpha}\rangle$. 
The system of simple roots will be denoted by $\Phi \subset \Delta_+$. 

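The generalized Toda lattice for the Lie algebra $\mathfrak{g}$ 
is the following system of differential equations on 
$\mathfrak{h} \times \mathfrak{h}$: 

$$\dot Q = P, \qquad \dot P = -\sum_{\alpha\in\Phi} e^{\alpha(Q)}\,[E_\alpha, E_{-\alpha}] = -\sum_{\alpha\in\Phi} c_\alpha\, e^{\alpha(Q)} H_\alpha$$    [40] 


This system can be given a Hamiltonian formulation, 
with the Hamilton function 


$$H_{\mathfrak{g}} = \frac{1}{2}\langle P, P\rangle + \sum_{\alpha\in\Phi} c_\alpha\, e^{\alpha(Q)}$$    [41] 


It is completely integrable, and has a Lax representation 
[12] with 


$$L = P + \sum_{\alpha\in\Phi} E_\alpha + \sum_{\alpha\in\Phi} e^{\alpha(Q)} E_{-\alpha}$$    [42] 

$$A_+ = P + \sum_{\alpha\in\Phi} E_\alpha, \qquad A_- = \sum_{\alpha\in\Phi} e^{\alpha(Q)} E_{-\alpha}$$    [43] 
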
The usual open-end Toda lattice corresponds to the 
algebra sl(N) (series $A_{N-1}$), so that the Hamilton 
function [24] can be denoted by $H_{A_{N-1}}$. The 
Hamilton functions of the generalized lattices 
corresponding to the other classical algebras so(2N + 1) 
(series $B_N$), sp(N) (series $C_N$), and so(2N) (series 
$D_N$) can be written in the canonically conjugate 
variables $q_n, p_n$ ($n = 1,\ldots,N$) as 

$$H_{\mathfrak{g}}(p,q) = H_{A_{N-1}}(p,q) + \begin{cases} e^{-q_N}, & \mathfrak{g} = B_N \\ e^{-2q_N}, & \mathfrak{g} = C_N \\ e^{-q_N - q_{N-1}}, & \mathfrak{g} = D_N \end{cases}$$    [44] 


Affine Lie Algebras 


Turning to the generalizations of the periodic Toda 
lattice, let $\theta$ be a Coxeter automorphism of a simple 
complex Lie algebra $\mathfrak{g}$, the order of $\theta$ being m. Introduce 
the loop algebra $\mathbf{g}$ as the Lie algebra of Laurent 
polynomials 


$$\mathbf{g} = \{L(\lambda) \in \mathfrak{g}[\lambda,\lambda^{-1}] : \theta(L(\lambda)) = L(\omega\lambda)\}$$ 


where $\omega = \exp(2\pi i/m)$. Denote by $\mathfrak{g}_j$ the eigenspaces 
of $\theta$ corresponding to the eigenvalues $\omega^j$ ($j \in \mathbb{Z}/m\mathbb{Z}$). 
Set $\mathfrak{a} = \mathfrak{g}_0$, and let s denote the dimension of $\mathfrak{a}$. By 
definition of the Coxeter automorphism, $\mathfrak{a}$ is an 
abelian subalgebra of $\mathfrak{g}$. Denote by $\Psi$ the set of $\alpha \in \mathfrak{a}^*$ 
for which there exist nonzero elements $E_\alpha \in \mathfrak{g}_1$ with 
$[H, E_\alpha] = \alpha(H)E_\alpha$ for all $H \in \mathfrak{a}$. The elements $E_{-\alpha} \in \mathfrak{g}_{-1}$ 
are defined similarly. It can be shown that $\Psi$ 
contains s + 1 elements, so that there 
exists exactly one linear relation between them. The elements of $\Psi$ 
are called simple weights of the loop algebra $\mathbf{g}$. The Lie 
algebra $\mathbf{g}$ is a direct sum of its two subspaces $\mathbf{g}_\pm$ 
consisting of Laurent polynomials with non-negative, 
resp., with strictly negative powers of $\lambda$; these 
subspaces are also Lie subalgebras. 

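Now the generalized Toda lattice related to the loop 
algebra $\mathbf{g}$ can be introduced as the system of differential 
equations on $\mathfrak{a} \times \mathfrak{a}$ which looks formally exactly like 
[40], and has the Hamilton function which looks 
exactly like [41], but with the set of simple roots $\Phi$ of $\mathfrak{g}$ 
replaced by the set of simple weights $\Psi$ of $\mathbf{g}$. The 
matrices participating in the Lax representation [12] 
now belong to the loop algebra $\mathbf{g}$: 


$$L(\lambda) = P + \lambda\sum_{\alpha\in\Psi} E_\alpha + \lambda^{-1}\sum_{\alpha\in\Psi} e^{\alpha(Q)} E_{-\alpha}$$    [45] 


$$A_+(\lambda) = P + \lambda\sum_{\alpha\in\Psi} E_\alpha, \qquad A_-(\lambda) = \lambda^{-1}\sum_{\alpha\in\Psi} e^{\alpha(Q)} E_{-\alpha}$$    [46] 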

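For the classical series of loop algebras, the 
Hamilton functions $H_{\mathbf{g}}$ in the canonically conjugate 
variables $q_n, p_n$ ($n = 1,\ldots,N$) can be presented as 


$$H_{\mathbf{g}}(p,q) = H_{A_{N-1}}(p,q) + \begin{cases} e^{-q_N} + e^{q_1+q_2}, & \mathbf{g} = B_N^{(1)} \\ e^{-2q_N} + e^{2q_1}, & \mathbf{g} = C_N^{(1)} \\ e^{-q_N-q_{N-1}} + e^{q_1+q_2}, & \mathbf{g} = D_N^{(1)} \\ e^{-q_N} + e^{2q_1}, & \mathbf{g} = A_{2N}^{(2)} \\ e^{-2q_N} + e^{q_1+q_2}, & \mathbf{g} = A_{2N-1}^{(2)} \\ e^{-q_N} + e^{q_1}, & \mathbf{g} = D_{N+1}^{(2)} \end{cases}$$    [47] 
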

Actually, one can find even more general integrable 
systems of the Toda type: one can add to $H_{A_{N-1}}(p,q)$ 
any of the two potentials $e^{-q_N-q_{N-1}}$ or $\alpha e^{-q_N} + \beta e^{-2q_N}$ 
on one end combined with any of the two potentials 
$e^{q_1+q_2}$ or $\gamma e^{q_1} + \delta e^{2q_1}$ on the other end, where 
$\alpha, \beta, \gamma, \delta$ are arbitrary constants. This result is due 
to E Sklyanin (1987). 


Generalizations: Lattices with 
Nearest-Neighbor Interactions 


There exist further integrable lattice systems with 
nearest-neighbor interaction apart from the 
classical exponential Toda lattice [1]. Those of the 
type $\ddot q_n = r(\dot q_n)\big(g(q_{n+1} - q_n) - g(q_n - q_{n-1})\big)$ were 
classified by R Yamilov in 1982, and the list 
contains, apart from the usual Toda lattice [1], the 
following ones: 


$$\ddot q_n = \dot q_n\left(e^{q_{n+1}-q_n} - e^{q_n-q_{n-1}}\right)$$    [48] 


$$\ddot q_n = \dot q_n\left(q_{n+1} - 2q_n + q_{n-1}\right)$$    [49] 


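$$\ddot q_n = -(\dot q_n^2 - \nu^2)\left(\frac{1}{q_{n+1} - q_n} - \frac{1}{q_n - q_{n-1}}\right)$$    [50] 


$$\ddot q_n = -(\dot q_n^2 - \nu^2)\big(\coth(q_{n+1} - q_n) - \coth(q_n - q_{n-1})\big)$$    [51] 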


Equations [48] are known as the “modified Toda 
lattice.” Equations [49] describe the “dual Toda lattice” 
which was instrumental in the original discovery by 
Toda (see Toda (1989)). All systems [49]–[51] can be 
obtained from [11] via suitable parametrizations of the 
variables $a_n, b_n$ by canonically conjugate ones $q_n, p_n$, 
similar to [10] for [1]; see Suris (2003). 

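A remarkable discovery of the integrable relativistic 
Toda lattice is due to S Ruijsenaars (1990). 
This lattice, with the equations of motion 


$$\ddot q_n = (1 + \alpha\dot q_n)\left( (1 + \alpha\dot q_{n+1})\,\frac{e^{q_{n+1}-q_n}}{1 + \alpha^2 e^{q_{n+1}-q_n}} - (1 + \alpha\dot q_{n-1})\,\frac{e^{q_n-q_{n-1}}}{1 + \alpha^2 e^{q_n-q_{n-1}}} \right)$$    [52] 


can be considered as a perturbation of the usual 
Toda lattice with the small parameter $\alpha$ (the inverse 
speed of light). 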

The class of integrable lattice systems of the relativistic 
Toda type $\ddot q_n = r(\dot q_n)\big(\dot q_{n+1} f(q_{n+1} - q_n) - \dot q_{n-1} f(q_n - q_{n-1}) + g(q_{n+1} - q_n) - g(q_n - q_{n-1})\big)$ 
is richer than 
that of the Toda type, and was isolated by Yu 
B Suris and by V Adler and A Shabat in 1997. The list 
contains, apart from the relativistic Toda lattice [52], 
two more $\alpha$-perturbations of the usual Toda lattice [1]: 


= (1+ aqn-1) 


An = (1 + Adari ETT — (1 + Agi. je" 
_ Ga O galama) [53] 


a= (1 — Adn) (a _ adn+1) edn+1—In 
ie aġn-1)e 0 ) [54] 


two $\alpha$-perturbations of the modified Toda lattice [48]: 


efn+1 —dn 


1 + aean 4n 


— ) 55] 


dh — Gi Cam — eln- |n-1 + Adn+1 


—— Q > — Cae, 
qn L 1 + medn—An-1 


2 . . GA 
Jn = Qn(1 — adn) (a — Qdn+1) Ere TE 
. edn In-1 
== ait) ee] [56] 


two $\alpha$-perturbations of the dual Toda lattice [49]: 
AGn+14n 
1 + a(dn+1 — An) 
|57] 


a = 04 — Lila T dnk) T 
_ QGnQn-1 
1+ a(n = Gn-1) 


Pe ; ; dn+1 — qn — Adn+1 
n = dn 1+ a? n (fap eet 
In = dnl dn) 1 +aļlgn+1 — qn) 


dn — qn-1 — Adn-1 
— 2 58 
1+ a(n = Gn-1) ) | | 


and one $\alpha$-perturbation of each of the systems [50] 
and [51]: 


°° -2 2 dn+1 — An — Adn+1 
Qn = (4 =] —— n rar 
(te — Qn) — (vay 
dn — Gn-1 — AGn-1 
— [<M 59 
(qn — qn)” — = ae 


dn = -5 (a E v’) 
9 sinh 2(dns1— qn) — vt sinh(2va)Gn+1 
sinh? (qn41 — qn) — sinh? (va) 


7 sinh 2(dn =g) = V sinh(2va)Gy_1 601 
sinh* (qn — dn—1) — sinh? (va) 


A detailed study of all these systems, their interrelations, 
and time discretizations can be found in Suris (2003). 

There exist also lattices with more complicated 
nearest-neighbor interactions, involving elliptic 
functions. They were discovered by A Shabat and 
R Yamilov (1990), and by I Krichever (2000). For 
example, the nonrelativistic elliptic Toda lattice is 
governed by the equations 


$$\ddot q_n = (\dot q_n^2 - 1)\big(V(q_n, q_{n+1}) + V(q_n, q_{n-1})\big)$$    [61] 


where $V(q,q') = \zeta(q + q') + \zeta(q - q') - \zeta(2q)$ is an 
elliptic function in both arguments q, q' (here $\zeta(q)$ is 
the Weierstrass ζ-function). 


Further Developments 
and Generalizations 


Sato’s Theory 


Formulas [6], [31], and [35] have the same structure, 
with the case-dependent functions $\tau_n(t)$ given by the 
determinants [7] for the multisoliton solution in the 
infinite case, by the Hankel determinants [32] or by the 
minors of the matrix $\exp(tL(0))$ in the open case, and 
by the multidimensional theta functions in the periodic 
case. All these seemingly different objects are actually 
particular cases of a beautiful construction due to M 
Sato (1981), developed by E Date, M Jimbo, M 
Kashiwara, T Miwa (1981–83), and by G Segal and G 
Wilson (1985), which provides one of the major 
unifying schemes for the theory of integrable 
systems. In this construction, integrable systems are 
interpreted as simple dynamical systems on an infinite-dimensional 
Grassmannian. The τ-function (first 
invented by R Hirota in 1971) receives in this theory 
a representation-theoretical interpretation in terms of 
the determinant bundle over the Grassmannian. 


Band Matrices 


The Lax matrices [13] and [16] in the Flaschka–Manakov 
variables can be easily generalized: in the 
symmetric matrix $L_0$ one can admit nonvanishing 
elements in the band of width $2s + 1 > 3$ around 
the main diagonal, and in the Hessenberg matrix L one 
can admit more nonvanishing diagonals in the 
upper-triangular part. A systematic presentation of 
a large body of relevant results is given in 
Kupershmidt (1985). In the setting of finite lattices, 
the integrability of such systems becomes a nontrivial 
problem (as opposed to the tridiagonal 
situation), because the number of independent 
conjugation-invariant functions $\mathrm{tr}(L^s)$ becomes 
less than the number of degrees of freedom. An 
effective approach to this problem based on 
semi-invariant functions was found by P Deift, 
L-Ch Li, T Nanda, and C Tomei in 1986. 


Two-Dimensional Toda Lattices 


Up to now, we considered integrable lattices with 
one continuous and one discrete independent variable. 
This allows for a further generalization. 
Integrable systems with two continuous and one 
discrete independent variables are well known and 
widely used as models of field theory. For 
instance, the Toda field theory deals with the system 


$$(q_n)_{xy} = e^{q_{n+1}-q_n} - e^{q_n-q_{n-1}}$$    [62] 


introduced in the soliton theory by A Mikhailov in 
1979. This two-dimensional system admits all possible 
kinds of reductions and generalizations mentioned 
above for the usual Toda lattice. In particular, the 
periodic two-dimensional Toda lattice is referred to 
as the affine Toda field theory (with the prominent 
example of the sine-Gordon field which corresponds 
to the period 2). Later, it was realized that the 
equivalent equation $(\log v_n)_{xy} = v_{n+1} - 2v_n + v_{n-1}$, 
which is obtained from [62] by setting 
$v_n = \exp(q_{n+1} - q_n)$, already appeared in studies by 
G Darboux in the 1880s, as the equation satisfied 
by the Laplace invariants of the chain of Laplace 
transformations of a given conjugate net. This 
relation to classical differential geometry was 
extensively studied by G Darboux, G Tzitzéica, and 
others long before the advent of the theory of 
integrable systems. Another link to differential 
geometry is a more recent observation, and relates the 
two-dimensional Toda lattice, with the d’Alembert 
operator $(\cdot)_{xy}$ on the left-hand side of [62] replaced by 
the Laplace operator $(\cdot)_{z\bar z}$, to harmonic maps. For 
instance, the sinh-Gordon equation $u_{z\bar z} = \sinh u$ governs 
harmonic maps from $\mathbb{C}$ into the unit sphere $S^2$, 
which can be interpreted also as Gauss maps of the 
constant mean curvature surfaces in $\mathbb{R}^3$. A review of 
this topic can be found in Guest (1997). 
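
The passage from the two-dimensional Toda lattice $(q_n)_{xy} = e^{q_{n+1}-q_n} - e^{q_n-q_{n-1}}$ to the equation for $v_n = \exp(q_{n+1} - q_n)$ is a one-line computation, sketched here for convenience:

```latex
\begin{aligned}
(\log v_n)_{xy} &= (q_{n+1})_{xy} - (q_n)_{xy} \\
  &= \left(e^{q_{n+2}-q_{n+1}} - e^{q_{n+1}-q_n}\right)
   - \left(e^{q_{n+1}-q_n} - e^{q_n-q_{n-1}}\right) \\
  &= v_{n+1} - 2v_n + v_{n-1}.
\end{aligned}
```

The right-hand side is the second difference of $v_n$ in the discrete variable n, so the equation couples a logarithmic wave operator in (x, y) to a discrete Laplacian in n.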

Discretization of Toda lattices, nonabelian Toda 
lattices, quantization of Toda lattices, dispersionless 
limit of Toda lattices, etc., are only some of the 
further relevant topics, which cannot be discussed in 
any detail in the restricted frame of this article, and 
the same holds, unfortunately, for such fascinating 
applications of the Toda lattice as the Frobenius 
manifolds, Laplacian growth problem, quantum 
cohomology, random matrix theory, two-dimensional 
gravity, etc. 


See also: Bäcklund Transformations; Bi-Hamiltonian 
Methods in Soliton Theory; Classical r-Matrices, 
Lie Bialgebras, and Poisson Lie Groups; Current Algebra; 
Dynamical Systems and Thermodynamics; Functional 
Equations and Integrable Systems; Integrable Discrete 
Systems; Integrable Systems and Discrete Geometry; 
Integrable Systems and the Inverse Scattering Method; 
Integrable Systems: Overview; Lie Groups: General 
Theory; Multi-Hamiltonian Systems; Quantum 
Calogero–Moser Systems; Separation of Variables for 
Differential Equations; Solitons and Kac–Moody Lie 
Algebras; WDVV Equations and Frobenius Manifolds. 
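Introduction 


A finite Toeplitz matrix is an $n \times n$ matrix with the 
following structure: 


$$\begin{pmatrix} a_0 & a_{-1} & a_{-2} & \cdots & a_{-n+1} \\ a_1 & a_0 & a_{-1} & \cdots & a_{-n+2} \\ a_2 & a_1 & a_0 & \cdots & a_{-n+3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n-1} & a_{n-2} & a_{n-3} & \cdots & a_0 \end{pmatrix}$$    [1] 


The entries depend on the difference i − j and hence 
they are constant down all the diagonals. There are 
two cases when the determinant is easy to compute. 
One is when the matrix is upper- or lower-triangular 
and the determinant is $a_0^n$. The other case is when 
the matrix is of the form 


$$\begin{pmatrix} a_0 & a_{n-1} & a_{n-2} & \cdots & a_1 \\ a_1 & a_0 & a_{n-1} & \cdots & a_2 \\ a_2 & a_1 & a_0 & \cdots & a_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n-1} & a_{n-2} & a_{n-3} & \cdots & a_0 \end{pmatrix}$$    [2] 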


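In this latter case, the matrix is called a circulant
matrix and the eigenvalues are given by the formula

$$\lambda_k = \sum_{j=0}^{n-1} a_j \rho_k^{\,j}, \qquad 0 \le k \le n-1$$

where

$$\rho_k = e^{2\pi i k/n}$$

The corresponding eigenvector for eigenvalue
$\lambda_k$ is

$$(1, \rho_k^{-1}, \rho_k^{-2}, \ldots, \rho_k^{-(n-1)})$$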


This can be verified by direct computation. The role 
of circulant matrices will not be emphasized in this 
article, although they are used in the computation of 
the generating function for certain dimer configura- 
tions and also in applications using the discrete 
Fourier transform. 
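
The direct computation can also be carried out numerically. The following is a minimal sketch, assuming the circulant matrix [2] has entries $a_{(j-k) \bmod n}$ and taking the $n$th roots of unity $\rho_k = e^{2\pi i k/n}$; the helper names and the sample entries are illustrative assumptions and do not come from the article.

```python
import cmath

def circulant(a):
    """Circulant matrix [2]: entry (j, k) is a[(j - k) mod n]."""
    n = len(a)
    return [[a[(j - k) % n] for k in range(n)] for j in range(n)]

def eigenpair(a, k):
    """Candidate eigenvalue sum_j a_j rho^j and eigenvector (rho^-m)_m."""
    n = len(a)
    rho = cmath.exp(2j * cmath.pi * k / n)
    lam = sum(a[j] * rho ** j for j in range(n))
    vec = [rho ** (-m) for m in range(n)]
    return lam, vec

def residual(a, k):
    """Largest entry of |C v - lambda v| for the k-th candidate pair."""
    n = len(a)
    c = circulant(a)
    lam, v = eigenpair(a, k)
    return max(abs(sum(c[j][m] * v[m] for m in range(n)) - lam * v[j])
               for j in range(n))

a = [2.0, -1.0, 0.5, 3.0, 1.5]  # arbitrary sample entries a_0..a_4
worst = max(residual(a, k) for k in range(len(a)))
print(worst)  # should be at rounding-error level
```

Any $n$th root of unity works in place of $\rho_k$, which is why the same eigenvectors diagonalize every circulant matrix of a given size.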

The most common way to generate a finite
Toeplitz matrix is with the Fourier coefficients of
an integrable function. Let $\phi : \mathbb{T} \to \mathbb{C}$ be a function
defined on the unit circle with Fourier coefficients

$$\phi_k = \frac{1}{2\pi} \int_0^{2\pi} \phi(e^{i\theta})\, e^{-ik\theta}\, d\theta \qquad [3]$$

We define $T_n(\phi)$ to be the Toeplitz matrix:

$$T_n(\phi) = (\phi_{j-k})_{j,k=0}^{n-1}$$


A basic problem that in large part has been
motivated by statistical mechanics is to determine
the asymptotic behavior of the determinant
of $T_n(\phi)$ as $n \to \infty$. The determinant will be
referred to as $D_n(\phi)$, where $\phi$ is called the generating
function of the determinant. If the generating
function has the property that its Fourier coefficients
vanish for negative index (positive index) then the
corresponding matrix is lower-triangular
(upper-triangular) and hence the determinant is $\phi_0^n$. For
other cases, the determinant is not easy to determine
and requires additional mathematical machinery.
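
As a small illustration of the definition $T_n(\phi) = (\phi_{j-k})$ and of the triangular case, here is a hedged sketch in plain Python. The dictionary of coefficients is a made-up example with $\phi_k = 0$ for $k < 0$, and `toeplitz` and `det` are illustrative helper names rather than anything defined in the article.

```python
def toeplitz(coeff, n):
    """T_n(phi): entry (j, k) is phi_{j-k}; missing coefficients are 0."""
    return [[coeff.get(j - k, 0.0) for k in range(n)] for j in range(n)]

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[p][c] == 0:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return d

# Hypothetical symbol whose coefficients vanish for negative index.
coeff = {0: 2.0, 1: -1.0, 2: 0.5}
t4 = toeplitz(coeff, 4)
d4 = det(t4)  # lower-triangular, so this should equal phi_0 ** 4
print(d4)
```

The matrix is lower-triangular because $\phi_{j-k}$ vanishes whenever $j < k$, so only the diagonal entry $\phi_0$ contributes to the determinant.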
Some of the primary motivation to study the
determinant of these matrices comes from the
two-dimensional Ising model. We consider the Onsager
lattice in the absence of a magnetic field with sites
labeled by $(i,j)$, and with a value $\sigma_{i,j} = \pm 1$
assigned to each site. In the Ising model, $\sigma_{i,j}$
signifies the state of the spin at the site $(i,j)$. To each
possible configuration of spins, we define an energy

$$E(\sigma) = -E_1 \sum_{i,j} \sigma_{i,j}\sigma_{i+1,j} - E_2 \sum_{i,j} \sigma_{i,j}\sigma_{i,j+1}, \qquad 0 \le i \le M,\ 0 \le j \le N$$

Let

$$Z = \sum_{\sigma} e^{-\beta E(\sigma)}$$

be the partition function. Then the probability of a
given configuration is

$$\frac{1}{Z}\, e^{-\beta E(\sigma)}$$
Here $E_1$, $E_2$, and $\beta = 1/kT$ are, without loss of
generality, assumed to be positive constants, $T$ is the
temperature, and $k$ is the Boltzmann constant. If $X$
is a random variable defined on the space of
configurations, the expectation is given by

$$E(X) = \frac{1}{Z} \sum_{\sigma = \pm 1} X(\sigma)\, e^{-\beta E(\sigma)}$$

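Let $n$ be fixed for the moment and assume toroidal
boundary conditions for the lattice and then let
$N, M \to \infty$. It is known that the random variable

$$X(\sigma) = \sigma_{0,0}\,\sigma_{0,n}$$

has expectation $\langle \sigma_{0,0}\sigma_{0,n} \rangle$ given by $D_n(\phi)$, where

$$\phi(e^{i\theta}) = \left[ \frac{(1 - \alpha_1 e^{i\theta})(1 - \alpha_2 e^{-i\theta})}{(1 - \alpha_1 e^{-i\theta})(1 - \alpha_2 e^{i\theta})} \right]^{1/2}$$

$$\alpha_1 = z_1\, \frac{1 - z_2}{1 + z_2}, \qquad \alpha_2 = z_1^{-1}\, \frac{1 - z_2}{1 + z_2}$$

and

$$z_1 = \tanh \beta E_1, \qquad z_2 = \tanh \beta E_2$$

The square root is taken so that $\phi(e^{i\pi}) = 1$. This
formula was first stated by Onsager and later
verified in a difficult computation by Montroll,
Potts, and Ward.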

The spontaneous magnetization $M$ for the Ising
model is defined by

$$M^2 = \lim_{n \to \infty} \langle \sigma_{0,0}\sigma_{0,n} \rangle = \lim_{n \to \infty} D_n(\phi)$$

Note that it is the square root of the correlation
between two distant sites. Hence, the asymptotics of
the Toeplitz determinants will determine whether
the magnetization is positive or tends to zero as
$n \to \infty$.


Strong Szegö Limit Theorem


To determine the behavior of the determinants, we
need to analyze the generating function $\phi$. Let us
first consider the case where $\alpha_2 < 1$. (It is always the
case that $0 < \alpha_1 < 1$.) This generating function is
differentiable, nonzero and has winding number
zero, and it is for functions of this type that a
second-order expansion of the Toeplitz determinants
can be described. The expansion, first formulated by
Szegö in response to the question concerning the
spontaneous magnetization, is called the “strong
Szegö limit theorem.”

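Before proving the Szegö theorem, it should be
remarked that we can view the finite Toeplitz matrix
as a truncation of an infinite array,

$$\begin{pmatrix}
\phi_0 & \phi_{-1} & \phi_{-2} & \cdots \\
\phi_1 & \phi_0 & \phi_{-1} & \cdots \\
\phi_2 & \phi_1 & \phi_0 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix} \qquad [4]$$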

The above infinite array is the matrix representation
for the Toeplitz operator

$$T(\phi) : H^2 \to H^2$$

defined by

$$T(\phi)f = P(\phi f)$$

where $H^2$ is the Hardy space

$$\{ f \in L^2(\mathbb{T}) \mid f_k = 0,\ k < 0 \}$$

the function $\phi \in L^\infty(\mathbb{T})$, and $P$ is the orthogonal
projection of $L^2(\mathbb{T})$ onto $H^2$. The matrix representation
given in [4] is with respect to the Hilbert space
basis of $H^2$,

$$\{ e^{ik\theta} \mid 0 \le k < \infty \}$$

and $\phi$ is called the symbol of the operator. Now
define $P_n : H^2 \to H^2$ by

$$P_n(f_0, f_1, f_2, \ldots) = (f_0, f_1, \ldots, f_{n-1}, 0, 0, \ldots)$$

The finite Toeplitz matrix can be thought of as the
upper-left corner of the array given in [4] or as
$P_n T(\phi) P_n$.

To prove the strong Szegö limit theorem, we
introduce the Banach algebra $B$ of bounded functions
$f$ satisfying $\sum_{k=-\infty}^{\infty} |k|\,|f_k|^2 < \infty$.


Theorem 1 (Strong Szegö limit theorem). Assume
$\phi = \phi_-\phi_+$, where $\phi_\pm$ have logarithms in $B$. Suppose
$\overline{\log \phi_-}, \log \phi_+ \in H^2$. Then

$$\lim_{n \to \infty} D_n(\phi)/G(\phi)^n = E(\phi) = \exp\left( \sum_{k=1}^{\infty} k\, s_k s_{-k} \right)$$

where $G(\phi) = \exp((\log \phi)_0)$ and $s_k = (\log \phi)_k$.


Since $B$ is a Banach algebra, it follows that if
$\log \phi_\pm$ belong to $B$ so do

$$\phi_-,\ \phi_+,\ \phi_-^{-1},\ \phi_+^{-1},\ \phi,\ \phi^{-1}$$

and hence they are bounded. Since $\phi_+$ is in $H^2$ as
well, its Fourier coefficients vanish for negative
index and the Toeplitz operator has a corresponding
infinite array that is lower-triangular. The Fourier
coefficients vanish for positive index for $\phi_-$ and
hence the infinite array is upper-triangular. From
this, it follows that


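$$T(\phi_+)\,T(\phi_+^{-1}) = T(\phi_-^{-1})\,T(\phi_-) = I \qquad [5]$$

$$T(\phi_-)\,T(\phi_+) = T(\phi) \qquad [6]$$

and

$$P_n T(\phi_-) P_n = T(\phi_-) P_n \qquad [7]$$

This yields

$$D_n(\phi) = \det P_n T(\phi) P_n \qquad [8]$$

$$= \det P_n T(\phi_+) T(\phi_+^{-1}) T(\phi) T(\phi_-^{-1}) T(\phi_-) P_n \qquad [9]$$

$$= \det P_n T(\phi_+) P_n T(\phi_+^{-1}) T(\phi) T(\phi_-^{-1}) P_n T(\phi_-) P_n \qquad [10]$$

$$= \det P_n T(\phi_+) P_n \, \det\left( P_n T(\phi_+^{-1}) T(\phi) T(\phi_-^{-1}) P_n \right) \det P_n T(\phi_-) P_n \qquad [11]$$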


The first and last determinants on the right-hand
side of the above expression are $((\phi_+)_0)^n$ and
$((\phi_-)_0)^n$, respectively. Now given the Banach algebra
conditions imposed on the symbol $\phi$, it follows that the
operator

$$T(\phi_+^{-1})\, T(\phi)\, T(\phi_-^{-1})$$

is of the form $I + K$, where $K$ is trace class. Hence,
the eigenvalues $\lambda_i$ of $K$ satisfy

$$\sum_i |\lambda_i| < \infty$$


and the infinite (Fredholm) determinant of $I + K$ is
defined. To verify the claim that the operator

$$T(\phi_+^{-1})\, T(\phi)\, T(\phi_-^{-1}) = T(\phi_+^{-1})\, T(\phi_-)\, T(\phi_+)\, T(\phi_-^{-1})$$

is $I$ plus a trace class operator, we use the identity

$$T(fg) - T(f)T(g) = H(f)\,H(\tilde g) \qquad [12]$$

where $H(f)$ has matrix form $(f_{j+k+1})_{j,k=0}^{\infty}$ and
$\tilde g(e^{i\theta}) = g(e^{-i\theta})$. Our Banach algebra conditions
show that if $f$ is in $B$ then the operator $H(f)$ satisfies
$\sum_{j,k} |a_{jk}|^2 < \infty$, where the $a_{jk}$ are the matrix entries
of the operator. Any operator satisfying this is called
a Hilbert-Schmidt operator, and it is known that the
product of two Hilbert-Schmidt operators is trace class.
Applying the identity to

$$T(\phi_+^{-1})\, T(\phi_-)$$

shows that this operator is $T(\phi_+^{-1}\phi_-)$ plus trace class.
The operator

$$T(\phi_+)\, T(\phi_-^{-1})$$

is thus $T(\phi_+\phi_-^{-1})$ plus trace class and one more
application of the identity combined with the fact
that trace class operators form an ideal yields the
desired result.

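From the theory of infinite determinants, as
$n \to \infty$,

$$\det P_n T(\phi_+^{-1}) T(\phi) T(\phi_-^{-1}) P_n \qquad [13]$$

converges to

$$\det\left( T(\phi_+^{-1}) T(\phi) T(\phi_-^{-1}) \right) \qquad [14]$$

At this point, we have proved that

$$\lim_{n \to \infty} \frac{D_n(\phi)}{((\phi_-)_0)^n ((\phi_+)_0)^n} = \lim_{n \to \infty} \det P_n T(\phi_+^{-1}) T(\phi) T(\phi_-^{-1}) P_n = \det\left( T(\phi_+^{-1}) T(\phi) T(\phi_-^{-1}) \right) \qquad [15]$$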
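It only remains to identify the constants. To see that

$$G(\phi)^n = ((\phi_-)_0)^n ((\phi_+)_0)^n$$

we note that

$$G(\phi) = \exp((\log \phi)_0) = \exp\left( \frac{1}{2\pi} \int_0^{2\pi} \log \phi(e^{i\theta})\, d\theta \right)$$

$$= \exp\left( \frac{1}{2\pi} \int_0^{2\pi} \left( \log \phi_-(e^{i\theta}) + \log \phi_+(e^{i\theta}) \right) d\theta \right)$$

$$= \exp((\log \phi_-)_0)\, \exp((\log \phi_+)_0) = (\phi_-)_0 (\phi_+)_0$$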


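To compute the determinant of

$$T(\phi_+^{-1})\, T(\phi)\, T(\phi_-^{-1})$$

we write

$$\det T(\phi_+^{-1}) T(\phi) T(\phi_-^{-1}) = \det T(\phi_+^{-1}) T(\phi_-\phi_+) T(\phi_-^{-1}) = \det T(\phi_+^{-1}) T(\phi_-) T(\phi_+) T(\phi_-^{-1})$$

This last expression is of the form

$$e^{A} e^{B} e^{-A} e^{-B}$$

where

$$A = -T(\log \phi_+) \quad \text{and} \quad B = T(\log \phi_-)$$

If $AB - BA$ is trace class then

$$\det e^{A} e^{B} e^{-A} e^{-B} = e^{\operatorname{tr}(AB - BA)}$$

The operator $AB - BA$ is

$$-T(\log \phi_+)\, T(\log \phi_-) + T(\log \phi_-)\, T(\log \phi_+)$$

which equals

$$-T(\log \phi_+)\, T(\log \phi_-) + T((\log \phi_-)(\log \phi_+))$$

and, by the identity from eqn [12], becomes

$$H(\log \phi_+)\, H(\widetilde{\log \phi_-})$$

It can be directly computed that

$$\operatorname{tr}\left( H(\log \phi_+)\, H(\widetilde{\log \phi_-}) \right)$$

equals

$$\sum_{k=1}^{\infty} k\, s_k s_{-k}$$

and the theorem is proved.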
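
The limit in Theorem 1 can be checked numerically on a symbol for which everything is explicit. The sketch below assumes the illustrative symbol $\phi(e^{i\theta}) = (1 - a e^{i\theta})(1 - a e^{-i\theta})$ with $0 < a < 1$, which is not taken from the article. Its only nonzero Fourier coefficients are $\phi_0 = 1 + a^2$ and $\phi_{\pm 1} = -a$, while $(\log \phi)_0 = 0$ and $(\log \phi)_{\pm k} = -a^k/k$ for $k \ge 1$, so $G(\phi) = 1$ and $E(\phi) = \exp(\sum_k a^{2k}/k) = 1/(1 - a^2)$. The helper names are hypothetical.

```python
import math

def toeplitz(coeff, n):
    """T_n(phi): entry (j, k) is phi_{j-k}; missing coefficients are 0."""
    return [[coeff.get(j - k, 0.0) for k in range(n)] for j in range(n)]

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[p][c] == 0:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return d

def szego_constant(a, terms=200):
    """E(phi) = exp(sum_k k s_k s_{-k}) with s_k = s_{-k} = -a**k / k."""
    return math.exp(sum(a ** (2 * k) / k for k in range(1, terms + 1)))

a = 0.5                                   # illustrative parameter
coeff = {0: 1 + a * a, 1: -a, -1: -a}     # Fourier coefficients of phi
E = szego_constant(a)
D = {n: det(toeplitz(coeff, n)) for n in (2, 5, 10, 20)}
for n in sorted(D):
    print(n, D[n], E)                     # D_n / G^n with G = 1
```

For this symbol the determinants satisfy $D_n = (1 - a^{2n+2})/(1 - a^2)$ exactly, so the convergence to $E(\phi)$ is geometric; slower convergence should be expected for less regular symbols.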


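Returning to the Ising model, one needs to
compute the asymptotics of the determinants for
the generating function

$$\phi(e^{i\theta}) = \left[ \frac{(1 - \alpha_1 e^{i\theta})(1 - \alpha_2 e^{-i\theta})}{(1 - \alpha_1 e^{-i\theta})(1 - \alpha_2 e^{i\theta})} \right]^{1/2}$$

The term $G(\phi) = 1$ and for $k > 0$

$$(\log \phi)_k = \frac{\alpha_2^k - \alpha_1^k}{2k}, \qquad (\log \phi)_{-k} = \frac{\alpha_1^k - \alpha_2^k}{2k}$$

from which it follows that

$$\lim_{n \to \infty} D_n(\phi) = \left[ \frac{(1 - \alpha_1^2)(1 - \alpha_2^2)}{(1 - \alpha_1\alpha_2)^2} \right]^{1/4}$$

Recalling the definition of $\alpha_1$ and $\alpha_2$ yields

$$\lim_{n \to \infty} \langle \sigma_{0,0}\sigma_{0,n} \rangle = \left[ 1 - \frac{1}{(\sinh 2\beta E_1 \sinh 2\beta E_2)^2} \right]^{1/4}$$

or the spontaneous magnetization $M$ as

$$M = \left[ 1 - \frac{1}{(\sinh 2\beta E_1 \sinh 2\beta E_2)^2} \right]^{1/8}$$

In order for this computation to be valid, it was
necessary for $0 < \alpha_2 < 1$, and by elementary
computations one can show that this is equivalent to the
inequality

$$\sinh 2\beta E_1 \sinh 2\beta E_2 > 1$$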
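
The elementary computation behind the last two formulas can be spot-checked numerically. The sketch below assumes the parameters $\alpha_1 = z_1(1 - z_2)/(1 + z_2)$ and $\alpha_2 = z_1^{-1}(1 - z_2)/(1 + z_2)$ with $z_i = \tanh \beta E_i$ (the standard choice for this model, stated here as an assumption); the function names and sample couplings are illustrative only.

```python
import math

def alphas(beta, e1, e2):
    """Assumed parameters alpha_1, alpha_2 built from z_i = tanh(beta * E_i)."""
    z1 = math.tanh(beta * e1)
    z2 = math.tanh(beta * e2)
    ratio = (1.0 - z2) / (1.0 + z2)
    return z1 * ratio, ratio / z1

def szego_limit(beta, e1, e2):
    """Limit of D_n(phi) given by the strong Szego theorem (alpha_2 < 1)."""
    a1, a2 = alphas(beta, e1, e2)
    return ((1 - a1 ** 2) * (1 - a2 ** 2) / (1 - a1 * a2) ** 2) ** 0.25

def magnetization(beta, e1, e2):
    """M from the sinh formula; 0 when sinh*sinh <= 1 (T >= T_c)."""
    s = math.sinh(2 * beta * e1) * math.sinh(2 * beta * e2)
    if s <= 1.0:
        return 0.0
    return (1.0 - s ** -2) ** 0.125

beta, e1, e2 = 0.6, 1.0, 0.8          # sample values below T_c
lhs = szego_limit(beta, e1, e2)
rhs = magnetization(beta, e1, e2) ** 2
beta_c = 0.5 * math.asinh(1.0)        # isotropic critical point, E_1 = E_2 = 1
print(lhs, rhs, beta_c, alphas(beta_c, 1.0, 1.0)[1])
```

In the isotropic case $E_1 = E_2 = E$ the critical condition $\sinh 2\beta E = 1$ gives $\beta_c E = \tfrac12 \ln(1 + \sqrt 2)$, and the sketch confirms that $\alpha_2 = 1$ there under the assumed definitions.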


Nonsmooth Symbols or $T = T_c$


A problem occurs in the analysis just outlined when
the inequality $0 < \alpha_2 < 1$ does not hold. There are
two separate possibilities, $\alpha_2 > 1$ or $\alpha_2 = 1$. First, we
consider the latter case. For fixed $E_1$ and $E_2$, this
happens for exactly one fixed value of the constant
$\beta_c = 1/kT_c$, and the corresponding temperature $T_c$ is
called the critical temperature. The “strong Szegö
limit theorem” does not apply since our generating
function is of the form

$$\phi(e^{i\theta}) = \left[ \frac{(1 - \alpha_1 e^{i\theta})(1 - e^{-i\theta})}{(1 - \alpha_1 e^{-i\theta})(1 - e^{i\theta})} \right]^{1/2} \qquad [16]$$


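In 1968, Fisher and Hartwig raised a conjecture
about $D_n(\phi)$ for nonsmooth $\phi$ which included the
above example. They considered generating functions
of the form

$$\phi(e^{i\theta}) = \psi(e^{i\theta}) \prod_{j=1}^{R} \phi_{\alpha_j,\beta_j}(e^{i(\theta - \theta_j)}) \qquad [17]$$

where

$$\phi_{\alpha,\beta}(e^{i\theta}) = (2 - 2\cos\theta)^{\alpha}\, e^{i\beta(\theta - \pi)}, \qquad 0 < \theta < 2\pi$$

$\operatorname{Re} \alpha > -1/2$, and $\beta$ is not an integer. The function $\psi$
is assumed to be a smooth function. Using the
Fisher-Hartwig notation, the symbol of interest in
the Ising model from eqn [16] can be written as

$$\psi(e^{i\theta})\, \phi_{0,-1/2}(e^{i\theta}), \qquad \psi(e^{i\theta}) = \left( \frac{1 - \alpha_1 e^{i\theta}}{1 - \alpha_1 e^{-i\theta}} \right)^{1/2}$$
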
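The conjecture of Fisher and Hartwig for general
symbols of this type stated that

$$D_n(\phi) \sim G(\psi)^n\, n^{p}\, E^*$$

where

$$p = \sum_{r=1}^{R} (\alpha_r^2 - \beta_r^2)$$

and $E^*$ is a constant whose value they did not
identify. The constant was later computed to be

$$E^*(\phi) = E(\psi) \prod_{j=1}^{R} \psi_+(e^{i\theta_j})^{-\alpha_j + \beta_j}\, \psi_-(e^{i\theta_j})^{-\alpha_j - \beta_j}$$

$$\times \prod_{1 \le s \ne r \le R} \left( 1 - e^{i(\theta_s - \theta_r)} \right)^{-(\alpha_s + \beta_s)(\alpha_r - \beta_r)} \times \prod_{j=1}^{R} \frac{G(1 + \alpha_j + \beta_j)\, G(1 + \alpha_j - \beta_j)}{G(1 + 2\alpha_j)}$$

where $G(z)$ is the Barnes G-function satisfying
$G(1 + z) = \Gamma(z)G(z)$
and is defined by

$$G(1 + z) = (2\pi)^{z/2}\, e^{-(z+1)z/2 - \gamma_E z^2/2} \prod_{k=1}^{\infty} \left( 1 + \frac{z}{k} \right)^{k} e^{-z + z^2/(2k)}$$

with $\gamma_E$ denoting Euler's constant.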

For the above factors, we normalize $\psi$ so that the
geometric mean is 1. Then we may assume that
the factors $\psi_+, \psi_-$ ($\psi_-\psi_+ = \psi$) are 1 at zero and
infinity, respectively, and this defines the logarithms
for the first product. The $E(\psi)$ term is the
constant in Szegö's theorem, and the argument of
a term of the form $(1 - e^{i(\theta_s - \theta_r)})$ is taken between
$-\pi/2$ and $\pi/2$.

In the case where $R = 1$, the conjecture is known
to hold if $\operatorname{Re} \alpha > -1/2$ and the function $\psi$ satisfies the
conditions of Szegö's theorem and is infinitely
differentiable. The theorem also has an extension
to the case where $\operatorname{Re} \alpha < -1/2$, with $2\alpha$ not an
integer, as long as the Fourier coefficients are
defined as the coefficients of a distribution.

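If we apply the theorem to the generating function
from [16]

$$\psi(e^{i\theta})\, \phi_{0,-1/2}(e^{i\theta}) = \left( \frac{1 - \alpha_1 e^{i\theta}}{1 - \alpha_1 e^{-i\theta}} \right)^{1/2} \phi_{0,-1/2}(e^{i\theta})$$

we see that the asymptotic expansion is given by

$$n^{-1/4} \left( \frac{1 + \alpha_1}{1 - \alpha_1} \right)^{1/4} G(1/2)\, G(3/2)$$

This last formula shows that, at the critical
temperature,

$$\lim_{n \to \infty} \langle \sigma_{0,0}\sigma_{0,n} \rangle = \lim_{n \to \infty} D_n(\phi) = 0$$

thus $M = 0$, and hence there is no correlation
between distant lattice points.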

It should be remarked here that the diagonal
correlation at the critical temperature is also given
by a singular Toeplitz determinant,

$$\langle \sigma_{0,0}\sigma_{n,n} \rangle = D_n(\phi_{0,-1/2}) \sim n^{-1/4}\, G(1/2)\, G(3/2)$$

and thus this limit is also zero.

The proof of the Fisher-Hartwig conjecture is
much more complicated than the proof of the
“strong Szegö limit theorem.” For an indication of
how it is proved, note that if we consider the
generating function $\phi_{0,\beta}$, the Fourier coefficients
are $(\sin \pi\beta)/[\pi(n - \beta)]$ and hence the matrix is
Cauchy and the determinant can be computed
exactly. From this the asymptotics can be derived
and they yield a special case of the Fisher-Hartwig
conjecture. The main idea in extending the result to
a symbol of the form

$$\psi(e^{i\theta})\, \phi_{0,\beta}(e^{i\theta})$$

is to prove that the limit of

$$\frac{D_n(\psi\phi_{0,\beta})}{D_n(\psi)\, D_n(\phi_{0,\beta})}$$

exists. The proof uses much of the same trace-class
approach used in proving the “strong Szegö limit
theorem,” although the results are more complicated.
These ideas are then extended for $R > 1$ and
also more general $\beta$ and $\alpha$.

It should be noted that the Fisher-Hartwig
conjecture, as stated in this article, does not always
hold. If we consider the function

$$\phi(e^{i\theta}) = \begin{cases} -1, & -\pi < \theta < 0 \\ 1, & 0 < \theta < \pi \end{cases}$$

then

$$\phi_k = \begin{cases} 0, & \text{if } k \text{ is even} \\ -2i/(\pi k), & \text{if } k \text{ is odd} \end{cases}$$

The matrix $T_n(\phi)$ is antisymmetric and, if $n$ is odd,
$D_n(\phi) = 0$. If $n$ is even, using elementary row and
column operations, the determinant can be put in
block form with each block of Cauchy type. The
determinant can then be evaluated to find

$$D_n(\phi) \sim (i)^n\, n^{-1/2}\, K$$

where $K$ is a certain constant.
It is instructive to note that

$$\phi(e^{i\theta}) = i\, \phi_{0,1/2}(e^{i\theta})\, \phi_{0,-1/2}(e^{i(\theta - \pi)}) = -i\, \phi_{0,-1/2}(e^{i\theta})\, \phi_{0,1/2}(e^{i(\theta - \pi)})$$

and thus that this particular symbol has two
representations of the type given in [17] and each
would give a different asymptotic expansion of the
determinant if the conjecture were true for this set of
parameters. Hence, it is clear that the conjecture
must fail to hold in this case.
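
The vanishing of the determinant for odd $n$ can be confirmed in exact arithmetic. In the sketch below the common factor $-2i/\pi$ is pulled out of every entry, so that $T_n(\phi) = (-2i/\pi) A_n$ with $(A_n)_{jk} = 1/(j - k)$ when $j - k$ is odd and $0$ otherwise; this rescaling and the helper names are assumptions made for the illustration, not notation from the article.

```python
from fractions import Fraction

def reduced_matrix(n):
    """A_n with T_n(phi) = (-2i/pi) * A_n: 1/(j-k) for odd j-k, else 0."""
    return [[Fraction(1, j - k) if (j - k) % 2 else Fraction(0)
             for k in range(n)] for j in range(n)]

def exact_det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    n = len(m)
    d = Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if m[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return d

dets = {n: exact_det(reduced_matrix(n)) for n in range(1, 8)}
for n, value in dets.items():
    print(n, value)  # D_n(phi) = (-2i/pi)**n * value
```

Since $A_n$ is a real antisymmetric matrix, its determinant is zero for odd $n$ and a perfect square for even $n$, so $D_n(\phi)$ has the sign of $(i)^n$ when $n$ is even.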
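However, this example indicates that there might
be a generalization of the original conjecture of
Fisher and Hartwig. If

$$\phi(e^{i\theta}) = \psi(e^{i\theta}) \prod_{j=1}^{R} \phi_{\alpha_j,\beta_j}(e^{i(\theta - \theta_j)})$$

then it is also the case that

$$\phi(e^{i\theta}) = \tilde\psi(e^{i\theta}) \prod_{j=1}^{R} \phi_{\alpha_j,\beta_j + n_j}(e^{i(\theta - \theta_j)})$$

where

$$\sum_{j=1}^{R} n_j = 0 \quad \text{and} \quad \tilde\psi = \psi \prod_{j=1}^{R} e^{i n_j \theta_j}$$
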
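In the example above, $\beta_1 = 1/2$, $\beta_2 = -1/2$,
$\theta_1 = 0$, $\theta_2 = \pi$, $n_1 = -1$, and $n_2 = 1$. The result for
the counterexample, combined with what is known
for the case of integer values of $\alpha$ and $\beta$, leads to the
following generalized conjecture. Suppose

$$\phi(e^{i\theta}) = \psi^{k}(e^{i\theta}) \prod_{j=1}^{R} \phi_{\alpha_j^k,\beta_j^k}(e^{i(\theta - \theta_j)})$$

for some set of indices $k$ (here $k$ labels the
representation and is not a power). Define
$Q(k) = \sum_{j=1}^{R} \left( (\alpha_j^k)^2 - (\beta_j^k)^2 \right)$. Let $Q = \max_k \operatorname{Re}(Q(k))$ and

$$\mathcal{K} = \{ k \mid \operatorname{Re}(Q(k)) = Q \}$$

The generalized asymptotic formula is conjectured
to be

$$D_n(\phi) = \sum_{k \in \mathcal{K}} G(\psi^k)^n\, n^{Q(k)}\, E_k + o\left( |G(\phi)|^n n^{Q} \right)$$

where $E_k$ is the constant $E^*$ computed from the $k$th
representation.

It may turn out that there is only one element in $\mathcal{K}$
and for these symbols there is a unique representation
that yields the highest power in the exponent of
the asymptotic expansion. These are the symbols for
which the original Fisher-Hartwig conjecture should
be true and it is now confirmed in these cases. For
example, the conjecture is known to hold for $R > 1$
when $|\operatorname{Re} \alpha_j| < 1/2$ and $|\operatorname{Re} \beta_j| < 1/2$.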


Symbols with Nonzero Index or $T > T_c$


The last possibility in computing the correlation
asymptotics is the case where $\alpha_2 > 1$. Note that, for
fixed $E_1$ and $E_2$, there is exactly one value of
$\beta = 1/kT$ where

$$\sinh 2\beta E_1 \sinh 2\beta E_2 = 1$$

For values of $T > T_c$, we have that the symbol

$$\left[ \frac{(1 - \alpha_1 e^{i\theta})(1 - \alpha_2 e^{-i\theta})}{(1 - \alpha_1 e^{-i\theta})(1 - \alpha_2 e^{i\theta})} \right]^{1/2}$$

is the same as

$$e^{-i\theta} \left[ \frac{(1 - \alpha_1 e^{i\theta})(1 - (1/\alpha_2) e^{i\theta})}{(1 - \alpha_1 e^{-i\theta})(1 - (1/\alpha_2) e^{-i\theta})} \right]^{1/2}$$

with the argument chosen so that the symbol is
positive at $\pi$. Except for the extra factor of $e^{-i\theta}$, this
is the same type of smooth symbol that was
considered earlier (see the section “Strong Szegö
limit theorem”). However, a factor of $e^{-i\theta}$ can change
the asymptotics considerably as can be seen by
considering the simple example of $\phi = 1$.
Fortunately, a variation of the Szegö theorem, first
considered by Fisher and Hartwig, holds for this
case of smooth symbols with nonvanishing index.


Theorem 2 Suppose that $\phi = \phi_-\phi_+$ satisfies the
condition of the “strong Szegö limit theorem” and
in addition is at least once continuously differentiable.
Then, if $b = \phi_-\phi_+^{-1}$ and $c = \phi_-^{-1}\phi_+$,


Dale g) 
~ (=1)'" 1" G(b)"E(b)G(c)” [18] 
br ed 
x | det + O(n?) 
Daim He De 
x (1+ O(n“)) [19] 


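Applying this to the symbol

$$e^{-i\theta} \phi(e^{i\theta}) = e^{-i\theta} \left[ \frac{(1 - \alpha_1 e^{i\theta})(1 - (1/\alpha_2) e^{i\theta})}{(1 - \alpha_1 e^{-i\theta})(1 - (1/\alpha_2) e^{-i\theta})} \right]^{1/2}$$

we have that $m = 1$, $G(\phi) = -1$, $G(c) = -1$, and

$$E(\phi) = \left[ (1 - \alpha_1^2) \left( 1 - \frac{1}{\alpha_2^2} \right) \left( 1 - \frac{\alpha_1}{\alpha_2} \right)^2 \right]^{1/4} \qquad [20]$$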


The determinant in the above formula is the 


constant 


1 20 1 « 
b, ee 1 — 10 1 = 18 
27 Jo ( — ( Q2 j ) 
1 LANATI 
x (1 — aye”) (1 — Ze) ) e dé 


Q2 


The last integral can be deformed to a segment of 
the real line and evaluated asymptotically to find 
that the leading term is 


a(i) aeia) 


9 T(n + 1/2) 
T(n+1) 


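Putting this together with the above constants, we
have, for $T > T_c$,

$$\langle \sigma_{0,0}\sigma_{0,n} \rangle \sim \frac{1}{(\pi n)^{1/2}\, \alpha_2^{\,n}} \cdot \frac{(1 - \alpha_1^2)^{1/4}}{(1 - \alpha_2^{-2})^{1/4}\, (1 - \alpha_1\alpha_2)^{1/2}}$$

This implies that the correlation tends to zero very
rapidly as $n \to \infty$.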


Further Remarks 


The interaction between statistical mechanics and
the theory of Toeplitz determinants has a long
history, and much of the motivation to describe the
asymptotics of the determinants was spurred by the
question of spontaneous magnetization in the
two-dimensional Ising model. The previous three sections
attempt to show how the very different physical
situations, $T < T_c$, $T = T_c$, and $T > T_c$, all
correspond to very different behavior in the symbols
of the generating functions. Critical systems predict
qualitatively different Szegö type theorems. For
example, the phase transition at $T_c$ predicts that
the asymptotics for singular symbols cannot be
predicted by the smooth symbols, that is, one cannot
use continuous functions to approximate the results
for singular symbols.

Onsager (1971) was the first to understand that
the correlation function could be expressed as a
Toeplitz determinant. This was made explicit by
Montroll et al. (1963). For more information about
the Ising model, the reader is referred to McCoy
and Wu (1973), where a clear and complete
description of the Ising model (and most of the
notation used here in reference to this model) can
be found.

Szegö (1915, 1952) had originally proved a weak
form of the “limit” theorem and he understood that
it was desirable to extend to a second-order term.
Szegö first proved the “strong Szegö limit theorem”
for positive generating functions and this was later
extended to the nonpositive case.

The first to understand that a different asymptotic
behavior was expected at the critical temperature
was Fisher and this resulted in the conjecture for the
class of determinants generated by what is now
known as Fisher-Hartwig symbols (Fisher and
Hartwig 1968). Progress on the conjecture was
made by many authors. Böttcher and Silbermann
(1998) have provided general results concerning
Toeplitz operators and determinants. Additional
information about the conjectures of Fisher and
Hartwig can be found in Böttcher and Silbermann
(1990, 1998), Ehrhardt (2001), and Ehrhardt and
Silbermann (1997).

Toeplitz determinants are also important in many
other applications. One more recent area of interest
is the connection between random-matrix theory
and Toeplitz determinants. Many statistical quantities
for the circular unitary ensemble can be
described as a Toeplitz determinant. For example,
the probability of finding no eigenvalues in an
interval can be expressed as a Toeplitz determinant.
It is also the case that many of the most interesting
statistics correspond to singular symbols. For basic
random-matrix theory information see Mehta
(1991), and for connections between the circular
unitary ensemble and Toeplitz determinants,
see Hughes (2001), Tracy and Widom (1993), and
Widom (1994).


See also: Integrable Systems in Random Matrix Theory; 
Two-Dimensional Ising Model. 
Basic Structure


The origins of Tomita—Takesaki modular theory lie 
in two unpublished papers of M Tomita in 1967 and 
a slim volume by Takesaki (1970). It has developed 
into one of the most important tools in the theory of 
operator algebras and has found many applications 
in mathematical physics. 

Although the modular theory has been formulated 
in a more general setting, it will be presented in the 
form in which it most often finds application in 
mathematical physics (for generalizations, details, 
and further references concerning the material 
covered in this article, the reader is referred to the 
Further Reading section). Let M be a von Neumann
algebra on a Hilbert space H containing a vector $\Omega$
which is cyclic and separating for M. Define the
operator $S_0$ on H as follows:

$$S_0 A\Omega = A^*\Omega, \quad \text{for all } A \in M$$

This operator extends to a closed antilinear operator
$S$ defined on a dense subset of H. Let $\Delta$ be the
unique positive, self-adjoint operator and $J$ the
unique antiunitary operator occurring in the polar
decomposition

$$S = J\Delta^{1/2} = \Delta^{-1/2}J$$

$\Delta$ is called the modular operator and $J$ the modular
conjugation (or modular involution) associated with
the pair $(M, \Omega)$. Note that $J^2$ is the identity operator
and $J = J^*$. Moreover, the spectral calculus may be
applied to $\Delta$ so that $\Delta^{it}$ is a unitary operator for
each $t \in \mathbb{R}$ and $\{\Delta^{it} \mid t \in \mathbb{R}\}$ forms a strongly
continuous unitary group. Let $M'$ denote the set of all
bounded linear operators on H which commute with
all elements of M. The modular theory begins with
the following remarkable theorem.


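Theorem 1 Let M be a von Neumann algebra
with a cyclic and separating vector $\Omega$. Then
$J\Omega = \Omega = \Delta\Omega$, and the following equalities hold:

$$JMJ = M'$$

and

$$\Delta^{it} M \Delta^{-it} = M, \quad \text{for all } t \in \mathbb{R}$$

Note that if one defines $F_0 A'\Omega = A'^*\Omega$, for all $A' \in M'$,
and takes its closure $F$, then one has the relations

$$\Delta = FS, \qquad \Delta^{-1} = SF, \qquad F = J\Delta^{-1/2}$$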
Modular Automorphism Group


By Theorem 1, the unitaries $\Delta^{it}$, $t \in \mathbb{R}$, induce a
one-parameter automorphism group $\{\sigma_t\}$ of M by

$$\sigma_t(A) = \Delta^{it} A \Delta^{-it}, \quad A \in M,\ t \in \mathbb{R}$$

This group is called the modular automorphism
group of M (relative to $\Omega$). Let $\omega$ denote the faithful
normal state on M induced by $\Omega$:

$$\omega(A) = \frac{1}{\|\Omega\|^2} \langle \Omega, A\Omega \rangle, \quad A \in M$$

From Theorem 1 it follows that $\omega$ is invariant under
$\{\sigma_t\}$, that is, $\omega(\sigma_t(A)) = \omega(A)$ for all $A \in M$ and $t \in \mathbb{R}$.

The modular automorphism group contains
information about both M and $\omega$. For example, the
modular automorphism group is an inner
automorphism on M if and only if M is semifinite. It is
trivial if and only if $\omega$ is a tracial state on M. Indeed,
for any $B \in M$, one has $\sigma_t(B) = B$ for all $t \in \mathbb{R}$ if and
only if $\omega(AB) = \omega(BA)$ for all $A \in M$. Let $M^\omega$
denote the set of all such $B$ in M.


The KMS Condition 


The modular automorphism group satisfies a condition
which had already been used in mathematical
physics to characterize equilibrium temperature
states of quantum systems in statistical mechanics
and field theory: the Kubo-Martin-Schwinger
(KMS) condition. If M is a von Neumann algebra
and $\{\alpha_t \mid t \in \mathbb{R}\}$ is a $\sigma$-weakly continuous
one-parameter group of automorphisms of M, then the
state $\phi$ on M satisfies the KMS condition at (inverse
temperature) $\beta$ ($0 < \beta < \infty$) with respect to $\{\alpha_t\}$ if
for any $A, B \in M$ there exists a complex function
$F_{A,B}(z)$ which is analytic on the strip $\{z \in \mathbb{C} \mid 0 <
\operatorname{Im} z < \beta\}$ and continuous on the closure of this strip
such that

$$F_{A,B}(t) = \phi(\alpha_t(A)B)$$
$$F_{A,B}(t + i\beta) = \phi(B\alpha_t(A))$$

for all $t \in \mathbb{R}$. In this case, $\phi(\alpha_{i\beta}(A)B) = \phi(BA)$, for all
$A, B$ in a $\sigma$-weakly dense, $\alpha$-invariant *-subalgebra
of M. Such KMS states are $\alpha$-invariant, that is,
$\phi(\alpha_t(A)) = \phi(A)$, for all $A \in M$, $t \in \mathbb{R}$, and are stable
and passive (cf. Bratteli and Robinson (1981) and
Haag (1992)).

Every faithful normal state satisfies the KMS
condition at $\beta = 1$ (henceforth called the modular
condition) with respect to the corresponding
modular automorphism group.


Theorem 2 Let M be a von Neumann algebra
with a cyclic and separating vector $\Omega$. Then the
induced state $\omega$ on M satisfies the modular condition
with respect to the modular automorphism
group $\{\sigma_t \mid t \in \mathbb{R}\}$ associated to the pair $(M, \Omega)$.
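
A finite-dimensional example makes the modular condition concrete. The sketch below assumes M is the algebra of $2 \times 2$ complex matrices with the faithful state $\omega(A) = \operatorname{Tr}(\rho A)$ for a diagonal density matrix $\rho$, and uses the standard fact that the modular group then acts on M as $\sigma_t(A) = \rho^{it} A \rho^{-it}$. This setting, the eigenvalues in `P`, the sample matrices and all function names are illustrative assumptions rather than material from the article.

```python
import cmath
import math

P = [0.7, 0.3]  # eigenvalues of a diagonal density matrix rho (hypothetical)

def omega(a):
    """State omega(A) = Tr(rho A) for the diagonal rho above."""
    return sum(P[j] * a[j][j] for j in range(2))

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sigma(z, a):
    """sigma_z(A) = rho^{iz} A rho^{-iz}, continued to complex z."""
    return [[cmath.exp(1j * z * (math.log(P[j]) - math.log(P[k]))) * a[j][k]
             for k in range(2)] for j in range(2)]

def F(z, a, b):
    """F_{A,B}(z) = omega(sigma_z(A) B), entire in z in this finite case."""
    return omega(matmul(sigma(z, a), b))

A = [[1.0, 2.0 + 1.0j], [0.5, -1.0]]
B = [[0.3, 1.0j], [2.0, 1.0]]
t = 0.8
boundary_gap = abs(F(t + 1j, A, B) - omega(matmul(B, sigma(t, A))))
invariance_gap = abs(omega(sigma(t, A)) - omega(A))
print(boundary_gap, invariance_gap)
```

Both printed gaps should be at rounding-error level: the first is the boundary relation $F_{A,B}(t + i) = \omega(B\sigma_t(A))$ at $\beta = 1$, the second the invariance of $\omega$ under the modular group. If $\rho$ is a multiple of the identity, the state is tracial and `sigma` acts trivially.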


The modular automorphism group is, therefore, 
endowed with the analyticity associated with the 
KMS condition, and this is a powerful tool in 
many applications of the modular theory to 
mathematical physics. In addition, the physical 
properties and interpretations of KMS states are 
often invoked when applying modular theory to 
quantum physics. 

Note that while the nontriviality of the modular
automorphism group gives a measure of the
nontracial nature of the state, the KMS condition for the
modular automorphism group provides the missing
link between the values $\omega(AB)$ and $\omega(BA)$, for all
$A, B \in M$ (hence the use of the term “modular,” as
in the theory of integration on locally compact
groups).

The modular condition is quite restrictive. Only
the modular group can satisfy the modular condition
for $(M, \Omega)$, and the modular group for one state can
satisfy the modular condition only in states differing
from the original state by the action of an element in
the center of M.


Theorem 3 Let M be a von Neumann algebra
with a cyclic and separating vector $\Omega$, and let $\{\sigma_t\}$
be the corresponding modular automorphism
group. If the induced state $\omega$ satisfies the modular
condition with respect to a group $\{\alpha_t\}$ of
automorphisms of M, then $\{\alpha_t\}$ must coincide with $\{\sigma_t\}$.
Moreover, a normal state $\psi$ on M satisfies the
modular condition with respect to $\{\sigma_t\}$ if and only
if $\psi(\cdot) = \omega(h\,\cdot) = \omega(h^{1/2} \cdot h^{1/2})$ for some unique
positive injective operator $h$ affiliated with the
center of M.


Hence, if M is a factor, two distinct states cannot 
share the same modular automorphism group. The 
relation between the modular automorphism groups 
for two different states will be described in more 
detail. 


One Algebra and Two States 


Consider a von Neumann algebra M with two
cyclic and separating vectors $\Omega$ and $\Phi$, and denote
by $\omega$ and $\phi$, respectively, the induced states on M.
Let $\{\sigma_t^\omega\}$ and $\{\sigma_t^\phi\}$ denote the corresponding modular
groups. There is a general relation between the
modular automorphism groups of these states.


Theorem 4 There exists a $\sigma$-strongly continuous
map $\mathbb{R} \ni t \mapsto U_t \in M$ such that

(i) $U_t$ is unitary for all $t \in \mathbb{R}$;
(ii) $U_{t+s} = U_t \sigma_t^\omega(U_s)$ for all $s, t \in \mathbb{R}$; and
(iii) $\sigma_t^\phi(A) = U_t \sigma_t^\omega(A) U_t^*$ for all $A \in M$ and $t \in \mathbb{R}$.


The 1-cocycle $\{U_t\}$ is commonly called the cocycle
derivative of $\phi$ with respect to $\omega$ and one writes
$U_t = (D\phi : D\omega)_t$. There is a chain rule for this
derivative, as well: If $\phi$, $\psi$, and $\rho$ are faithful normal
states on M, then $(D\psi : D\phi)_t = (D\psi : D\rho)_t (D\rho : D\phi)_t$,
for all $t \in \mathbb{R}$. More can be said about the cocycle
derivative if the states satisfy any of the conditions
in the following theorem.


Theorem 5 The following conditions are
equivalent:

(i) $\phi$ is $\{\sigma_t^\omega\}$-invariant;
(ii) $\omega$ is $\{\sigma_t^\phi\}$-invariant;
(iii) there exists a unique positive injective operator
$h$ affiliated with $M^\omega \cap M^\phi$ such that $\omega(\cdot) =
\phi(h\,\cdot) = \phi(h^{1/2} \cdot h^{1/2})$;
(iv) there exists a unique positive injective operator
$h'$ affiliated with $M^\omega \cap M^\phi$ such that $\phi(\cdot) =
\omega(h'\,\cdot) = \omega(h'^{1/2} \cdot h'^{1/2})$;
(v) the norms of the linear functionals $\omega + i\phi$ and
$\omega - i\phi$ are equal; and
(vi) $\sigma_t^\omega \sigma_s^\phi = \sigma_s^\phi \sigma_t^\omega$, for all $s, t \in \mathbb{R}$.


The conditions in Theorem 5 turn out to be
equivalent to the cocycle derivative being a
representation.


Theorem 6 The cocycle $\{U_t\}$ intertwining $\{\sigma_t^\omega\}$ with
$\{\sigma_t^\phi\}$ is a group representation of the additive group
of reals if and only if $\phi$ and $\omega$ satisfy the conditions
in Theorem 5. In that case, $U(t) = h^{-it}$.
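
The statements of Theorems 4 and 6 can be illustrated in the simplest commuting case. The sketch below assumes M is the $2 \times 2$ matrix algebra and that $\phi$ and $\omega$ are given by commuting diagonal density matrices $\rho_\phi$ and $\rho_\omega$, for which the standard finite-dimensional expression of the cocycle is $U_t = \rho_\phi^{it} \rho_\omega^{-it}$ and the operator of Theorem 5(iii) is $h = \rho_\phi^{-1}\rho_\omega$. These identifications, the eigenvalues and the function names are assumptions for the example, not notation from the article.

```python
import cmath
import math

Q = [0.6, 0.4]  # eigenvalues of rho_phi (hypothetical)
P = [0.8, 0.2]  # eigenvalues of rho_omega (hypothetical)

def flow(weights, t, a):
    """sigma_t(A) = rho^{it} A rho^{-it} for diagonal rho with these weights."""
    return [[cmath.exp(1j * t * (math.log(weights[j]) - math.log(weights[k])))
             * a[j][k] for k in range(2)] for j in range(2)]

def cocycle(t):
    """Diagonal of U_t = rho_phi^{it} rho_omega^{-it}."""
    return [cmath.exp(1j * t * (math.log(Q[j]) - math.log(P[j])))
            for j in range(2)]

def h_power(t):
    """Diagonal of h^{-it} with h = rho_phi^{-1} rho_omega."""
    return [cmath.exp(-1j * t * math.log(P[j] / Q[j])) for j in range(2)]

def conjugate(u, a):
    """U A U^* for a diagonal unitary U given by its diagonal."""
    return [[u[j] * a[j][k] * u[k].conjugate() for k in range(2)]
            for j in range(2)]

def gap(a, b):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(a[j][k] - b[j][k]) for j in range(2) for k in range(2))

A = [[1.0, 2.0 - 1.0j], [0.25j, 3.0]]
s, t = 0.4, 1.3
intertwining_gap = gap(flow(Q, t, A), conjugate(cocycle(t), flow(P, t, A)))
group_gap = max(abs(cocycle(s + t)[j] - cocycle(s)[j] * cocycle(t)[j])
                for j in range(2))
h_gap = max(abs(cocycle(t)[j] - h_power(t)[j]) for j in range(2))
print(intertwining_gap, group_gap, h_gap)
```

Because the two density matrices commute, $U_t$ is fixed by both modular groups, so the cocycle identity of Theorem 4(ii) reduces to the group law checked here. For noncommuting density matrices the cocycle identity still holds but the group law generally fails.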


The operator $h' = h^{-1}$ in Theorem 5 is called the
Radon-Nikodym derivative of $\phi$ with respect to $\omega$
(often denoted by $d\phi/d\omega$), due to the following
result, which, if the algebra M is abelian, is the
well-known Radon-Nikodym theorem from
measure theory.


Theorem 7 If $\phi$ and $\omega$ are normal positive linear
functionals on M such that $\phi(A) \le \omega(A)$, for all
positive elements $A \in M$, then there exists a unique
element $h^{1/2} \in M$ such that $\phi(\cdot) = \omega(h^{1/2} \cdot h^{1/2})$ and
$0 \le h^{1/2} \le 1$.


The analogies with measure theory are not
accidental, although these are not discussed in detail
here. Indeed, any normal trace on a (finite) von
Neumann algebra M gives rise to a noncommutative
integration theory in a natural manner. Modular
theory affords an extension of this theory to the
setting of faithful normal functionals $\eta$ on von
Neumann algebras M of any type, enabling the
definition of noncommutative $L^p$ spaces, $L^p(M, \eta)$.


Modular Invariants and the Classification 
of von Neumann Algebras 


As already mentioned, the modular structure carries 
information about the algebra. This is best evi- 
denced in the structure of type III factors. As this 
theory is rather involved, only a sketch of some of 
the results can be given. 

If M is a type III algebra, then its crossed
product $N = M \rtimes_{\sigma^\omega} \mathbb{R}$ relative to the modular
automorphism group of any faithful normal state
$\omega$ on M is a type II$_\infty$ algebra with a faithful
semifinite normal trace $\tau$ such that $\tau \circ \theta_t = e^{-t}\tau$,
$t \in \mathbb{R}$, where $\theta$ is the dual of $\sigma^\omega$ on N. Moreover,
the algebra M is isomorphic to the crossed product
$N \rtimes_\theta \mathbb{R}$, and this decomposition is unique in a very
strong sense. This structure theorem entails the
existence of important algebraic invariants for M,
which has many consequences, one of which is made
explicit here.

If $\omega$ is a faithful normal state of a von Neumann
algebra M induced by $\Omega$, let $\Delta_\omega$ denote the modular
operator associated to $(M, \Omega)$ and $\operatorname{sp} \Delta_\omega$ denote the
spectrum of $\Delta_\omega$. The intersection

$$S'(M) = \bigcap \operatorname{sp} \Delta_\omega$$

over all faithful normal states $\omega$ of M is an algebraic
invariant of M.


Theorem 8 Let M be a factor acting on a
separable Hilbert space. If M is of type III, then
$0 \in S'(M)$; otherwise, $S'(M) = \{0, 1\}$ if M is of type
I$_\infty$ or II$_\infty$ and $S'(M) = \{1\}$ if not. Let M now be a
factor of type III.

(i) M is of type III$_\lambda$, $0 < \lambda < 1$, if and only if
$S'(M) = \{0\} \cup \{\lambda^n \mid n \in \mathbb{Z}\}$.
(ii) M is of type III$_0$ if and only if $S'(M) = \{0, 1\}$.
(iii) M is of type III$_1$ if and only if $S'(M) = [0, \infty)$.

In certain physically relevant situations, the
spectra of the modular operators of all faithful
normal states coincide, so that Theorem 8 entails
that it suffices to compute the spectrum of any
conveniently chosen modular operator in order to
determine the type of M. In other such situations,
there are distinguished states $\omega$ such that
$S'(M) = \operatorname{sp} \Delta_\omega$. One such example is provided by
asymptotically abelian systems. A von Neumann
algebra M is said to be “asymptotically abelian” if
there exists a sequence $\{\alpha_n\}_{n \in \mathbb{N}}$ of automorphisms of
M such that the limit of $\{A\alpha_n(B) - \alpha_n(B)A\}_{n \in \mathbb{N}}$ in
the strong operator topology is zero, for all $A, B \in M$.
If the state $\omega$ is $\alpha_n$-invariant, for all $n \in \mathbb{N}$, then
$\operatorname{sp} \Delta_\omega$ is contained in $\operatorname{sp} \Delta_\phi$, for all faithful normal
states $\phi$ on M, so that $S'(M) = \operatorname{sp} \Delta_\omega$. If, moreover,
$\operatorname{sp} \Delta_\omega = [0, \infty)$, then $\operatorname{sp} \Delta_\omega = \operatorname{sp} \Delta_\phi$, for all $\phi$ as
described.


Self-Dual Cones 


Let $j : M \to M'$ denote the antilinear *-isomorphism
defined by $j(A) = JAJ$, $A \in M$. The natural positive
cone $P^\natural$ associated with the pair $(M, \Omega)$ is defined as
the closure, in H, of the set of vectors

$$\{ A j(A)\Omega \mid A \in M \}$$

Let $M_+$ denote the set of all positive elements of M.
The following theorem collects the main attributes
of the natural cone.


Theorem 9

(i) $P^\natural$ coincides with the closure in H of the set
$\{ \Delta^{1/4} A\Omega \mid A \in M_+ \}$.
(ii) $\Delta^{it} P^\natural = P^\natural$ for all $t \in \mathbb{R}$.
(iii) $J\Phi = \Phi$ for all $\Phi \in P^\natural$.
(iv) $A j(A) P^\natural \subset P^\natural$ for all $A \in M$.
(v) $P^\natural$ is a pointed, self-dual cone whose linear
span coincides with H.
(vi) If $\Phi \in P^\natural$, then $\Phi$ is cyclic for M if and only if
$\Phi$ is separating for M.
(vii) If $\Phi \in P^\natural$ is cyclic, and hence separating, for
M, then the modular conjugation and the
natural cone associated with the pair $(M, \Phi)$
coincide with $J$ and $P^\natural$, respectively.
(viii) For every normal positive linear functional $\phi$
on M, there exists a unique vector $\Phi_\phi \in P^\natural$
such that $\phi(A) = \langle \Phi_\phi, A\Phi_\phi \rangle$ for all $A \in M$.


In fact, the algebras M and $M'$ are uniquely
characterized by the natural cone $P^\natural$ [4]. In light of
(viii), if $\alpha$ is an automorphism of M, then

$$V(\alpha)\Phi_\phi = \Phi_{\phi \circ \alpha^{-1}}$$

defines an isometric operator on $P^\natural$, which by (v)
extends to a unitary operator on H. The map
$\alpha \mapsto V(\alpha)$ defines a unitary representation of the
group of automorphisms Aut(M) on H in such a
manner that $V(\alpha) A V(\alpha)^{-1} = \alpha(A)$ for all $A \in M$ and
$\alpha \in$ Aut(M). Indeed, one has the following:


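Theorem 10 Let $\mathcal{M}$ be a von Neumann algebra
with a cyclic and separating vector $\Omega$. The group $\mathcal{V}$
of all unitaries $V$ satisfying


$V\mathcal{M}V^* = \mathcal{M}, \quad VJV^* = J, \quad V\mathcal{P}^\natural = \mathcal{P}^\natural$


is isomorphic to $\mathrm{Aut}(\mathcal{M})$ under the above map
$\alpha \mapsto V(\alpha)$, which is called the "standard implementation"
of $\mathrm{Aut}(\mathcal{M})$.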


Often of particular physical interest are (anti-)automorphisms
of $\mathcal{M}$ leaving $\omega$ invariant. They can only
be implemented by (anti)unitaries which leave
the pair $(\mathcal{M},\Omega)$ invariant. In fact, if $U$ is a unitary
or antiunitary operator satisfying $U\Omega = \Omega$ and
$U\mathcal{M}U^* = \mathcal{M}$, then $U$ commutes with both $J$ and $\Delta$.


Two Algebras and One State 


Motivated by applications to quantum field theory,
the study of the modular structures associated with
one state and more than one von Neumann algebra
has begun (see Borchers (2000) for references and
details). Let $\mathcal{N} \subset \mathcal{M}$ be von Neumann algebras
with a common cyclic and separating vector $\Omega$,
and let $\Delta_\mathcal{N}, J_\mathcal{N}$ and $\Delta_\mathcal{M}, J_\mathcal{M}$ denote the corresponding
modular objects. The structure $(\mathcal{M},\mathcal{N},\Omega)$ is called
a $\pm$-half-sided modular inclusion if
$\Delta_\mathcal{M}^{it}\,\mathcal{N}\,\Delta_\mathcal{M}^{-it} \subset \mathcal{N}$, for all $\pm t \ge 0$.


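Theorem 11 Let $\mathcal{M}$ be a von Neumann algebra
with cyclic and separating vector $\Omega$. The following
are equivalent:


(i) There exists a proper subalgebra $\mathcal{N} \subset \mathcal{M}$ such that
$(\mathcal{M},\mathcal{N},\Omega)$ is a $\mp$-half-sided modular inclusion.

(ii) There exists a unitary group $\{U(t)\}$ with positive
generator such that


$U(t)\mathcal{M}U(t)^{-1} \subset \mathcal{M}$, for all $\pm t \ge 0$,
$U(t)\Omega = \Omega$, for all $t \in \mathbb{R}$.


Moreover, if these conditions are satisfied, then the
following relations must hold:


$\Delta_\mathcal{M}^{it}U(s)\Delta_\mathcal{M}^{-it} = \Delta_\mathcal{N}^{it}U(s)\Delta_\mathcal{N}^{-it} = U(e^{\mp 2\pi t}s)$


and


$J_\mathcal{M}U(s)J_\mathcal{M} = J_\mathcal{N}U(s)J_\mathcal{N} = U(-s)$


for all $s, t \in \mathbb{R}$. In addition, $\mathcal{N} = U(\pm 1)\mathcal{M}U(\pm 1)^{-1}$,
and if $\mathcal{M}$ is a factor, it must be type III$_1$.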
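
As a consistency check on the signs in Theorem 11 (not a proof of the theorem, whose hard part is the commutation relation itself), the following sketch takes the upper signs, assumes (ii) together with the stated relation $\Delta_\mathcal{M}^{it}U(s)\Delta_\mathcal{M}^{-it} = U(e^{-2\pi t}s)$, sets $\mathcal{N} = U(1)\mathcal{M}U(1)^{-1}$, and uses the Tomita-Takesaki invariance $\Delta_\mathcal{M}^{it}\mathcal{M}\Delta_\mathcal{M}^{-it} = \mathcal{M}$. The auxiliary symbol $a$ is introduced here for illustration only.

```latex
\begin{aligned}
\Delta_{\mathcal M}^{it}\,\mathcal N\,\Delta_{\mathcal M}^{-it}
 &= \bigl(\Delta_{\mathcal M}^{it}U(1)\Delta_{\mathcal M}^{-it}\bigr)
    \bigl(\Delta_{\mathcal M}^{it}\,\mathcal M\,\Delta_{\mathcal M}^{-it}\bigr)
    \bigl(\Delta_{\mathcal M}^{it}U(1)^{-1}\Delta_{\mathcal M}^{-it}\bigr) \\
 &= U(e^{-2\pi t})\,\mathcal M\,U(e^{-2\pi t})^{-1} \\
 &= U(1)\,\bigl[\,U(a)\,\mathcal M\,U(a)^{-1}\bigr]\,U(1)^{-1},
    \qquad a = e^{-2\pi t}-1 .
\end{aligned}
% For t <= 0 one has a >= 0, so by (ii) the bracket is contained in M, and
% the right-hand side is contained in U(1) M U(1)^{-1} = N.
```

Hence $\Delta_\mathcal{M}^{it}\mathcal{N}\Delta_\mathcal{M}^{-it} \subset \mathcal{N}$ for all $t \le 0$, which is the $-$-half-sided condition, in agreement with the $\mp$ appearing in (i).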


The richness of this structure is further suggested 
by the next theorem. 


Theorem 12 


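(i) Let $(\mathcal{M},\mathcal{N}_1,\Omega)$ and $(\mathcal{M},\mathcal{N}_2,\Omega)$ be $-$-half-sided,
resp. $+$-half-sided, modular inclusions satisfying
the condition $J_{\mathcal{N}_1}J_{\mathcal{N}_2} = J_\mathcal{M}J_{\mathcal{N}_2}J_{\mathcal{N}_1}J_\mathcal{M}$. Then
the modular unitaries $\Delta_\mathcal{M}^{it}$, $\Delta_{\mathcal{N}_1}^{is}$, $\Delta_{\mathcal{N}_2}^{iu}$, $s,t,u \in \mathbb{R}$,
generate a faithful continuous unitary representation
of the identity component of the
group of isometries of two-dimensional Minkowski
space.

(ii) Let $\mathcal{M}$, $\mathcal{N}$, $\mathcal{N} \cap \mathcal{M}$ be von Neumann algebras
with a common cyclic and separating vector $\Omega$. If
$(\mathcal{M},\mathcal{M}\cap\mathcal{N},\Omega)$ and $(\mathcal{N},\mathcal{M}\cap\mathcal{N},\Omega)$ are $-$-half-sided,
resp. $+$-half-sided, modular inclusions such
that $J_\mathcal{N}\mathcal{M}J_\mathcal{N} = \mathcal{M}$, then the modular unitaries
$\Delta_\mathcal{M}^{it}$, $\Delta_\mathcal{N}^{is}$, $\Delta_{\mathcal{N}\cap\mathcal{M}}^{iu}$, $s,t,u \in \mathbb{R}$, generate a faithful
continuous unitary representation of $SL(2,\mathbb{R})/\mathbb{Z}_2$.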


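This has led to a further useful notion. If $\mathcal{M}$ and $\mathcal{N}$
are von Neumann algebras and $\Omega$ is cyclic for $\mathcal{M} \cap \mathcal{N}$, then
$(\mathcal{M},\mathcal{N},\Omega)$ is said to
be a "$\pm$-modular intersection" if both $(\mathcal{M},\mathcal{M}\cap\mathcal{N},\Omega)$
and $(\mathcal{N},\mathcal{M}\cap\mathcal{N},\Omega)$ are $\pm$-half-sided modular inclusions
and


$J_\mathcal{N}\Bigl[\lim_{t\to\mp\infty}\Delta_\mathcal{N}^{it}\Delta_\mathcal{M}^{-it}\Bigr]J_\mathcal{N} = \lim_{t\to\mp\infty}\Delta_\mathcal{M}^{it}\Delta_\mathcal{N}^{-it}$


where the existence of the strong operator limits is
assured by the preceding assumptions. An example
of the utility of this structure is the following
theorem.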


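Theorem 13 Let $\mathcal{N}$, $\mathcal{M}$, $\mathcal{L}$ be von Neumann algebras
with a common cyclic and separating vector $\Omega$. If
$(\mathcal{M},\mathcal{N},\Omega)$ and $(\mathcal{N}',\mathcal{L},\Omega)$ are $-$-modular intersections
and $(\mathcal{M},\mathcal{L},\Omega)$ is a $+$-modular intersection, then the
unitaries $\Delta_\mathcal{M}^{it}$, $\Delta_\mathcal{N}^{is}$, $\Delta_\mathcal{L}^{iu}$, $s,t,u \in \mathbb{R}$, generate a faithful
continuous unitary representation of $SO^\uparrow(1,2)$.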


These results and their extensions to larger 
numbers of algebras were developed for application 
in algebraic quantum field theory, but one may 
anticipate that half-sided modular inclusions will 
find wider use. Modular theory has also been
applied fruitfully in the theory of inclusions $\mathcal{N} \subset \mathcal{M}$
of properly infinite algebras with finite or infinite
index.


Applications in Quantum Theory 


The Tomita—Takesaki theory has found many 
applications in quantum field theory and quantum 
statistical mechanics. As mentioned earlier, the 
modular automorphism group satisfies the KMS 
condition, a property of physical significance in the 
quantum theory of many-particle systems, which 
includes quantum statistical mechanics and quantum 
field theory. In such settings, for a suitable algebra
of observables $\mathcal{M}$ and state $\omega$, an automorphism
group $\{\sigma_{\beta t}\}$ representing the time evolution of the
system satisfies the modular condition. Hence, on
the one hand, $\{\sigma_{\beta t}\}$ is the modular automorphism
group of the pair $(\mathcal{M},\Omega)$, and, on the other, $\omega$ is an
equilibrium state at inverse temperature $\beta$, with all
the consequences which both of these facts have.
But it has become increasingly clear that the
modular objects $\Delta^{it}$, $J$, of certain algebras of
observables and states encode additional physical
information. In 1975, it was discovered that if one 
considers the algebras of observables associated with 
a finite-component quantum field theory satisfying 
the Wightman axioms, then the modular objects 
associated with the vacuum state and algebras of 
observables localized in certain wedge-shaped 
regions in Minkowski space have geometric content. 
In fact, the unitary group $\{\Delta^{it}\}$ implements the group
of Lorentz boosts leaving the wedge region invariant
(this property is now called modular covariance),
and the modular involution $J$ implements the spacetime
reflection about the edge of the wedge, along
with a charge conjugation. This discovery prompted
intense research activity (see Baumgärtel and
Wollenberg 1992, Borchers 2000, Haag 1992).


Positive Energy 


In quantum physics the time development of the 
system is often represented by a strongly continuous 
group $\{U(t) = e^{itH} \mid t \in \mathbb{R}\}$ of unitary operators, and
the generator H is interpreted as the total energy of 
the system. There is a link between modular 
structure and positive energy, which has found 
many applications in quantum field theory. This 
result was crucial in the development of Theorem 11 
and was motivated by the 1975 discovery mentioned 
above, now commonly called the Bisognano- 
Wichmann theorem. 


Theorem 14 Let $\mathcal{M}$ be a von Neumann algebra
with a cyclic and separating vector $\Omega$, and let $\{U(t)\}$
be a continuous unitary group satisfying
$U(t)\mathcal{M}U(-t) \subset \mathcal{M}$, for all $t \ge 0$. Then any two of the
following conditions imply the third:


(i) $U(t) = e^{itH}$, with $H \ge 0$;
(ii) $U(t)\Omega = \Omega$, for all $t \in \mathbb{R}$; and
(iii) $\Delta^{it}U(s)\Delta^{-it} = U(e^{-2\pi t}s)$ and $JU(s)J = U(-s)$, for
all $s, t \in \mathbb{R}$.


Modular Nuclearity and Phase Space Properties 


Modular theory can be used to express physically 
meaningful properties of quantum “phase spaces” 
by a condition of compactness or nuclearity of 
certain maps. In its initial form, the condition was 
formulated in terms of the Hamiltonian, the global 
energy operator of theories in Minkowski space. 
The above indications that the modular operators 
carry information about the energy of the system 
were reinforced when it was shown that a
formulation in terms of modular operators was
essentially equivalent.

Let $O_1 \subset O_2$ be nonempty bounded open subregions
of Minkowski space with corresponding algebras of
observables $\mathcal{A}(O_1) \subset \mathcal{A}(O_2)$ in a vacuum representation
with vacuum vector $\Omega$, and let $\Delta$ be the modular
operator associated with $(\mathcal{A}(O_2),\Omega)$ (by the Reeh-Schlieder
theorem, $\Omega$ is cyclic and separating for
$\mathcal{A}(O_2)$). For each $\lambda \in (0,1/2)$ define the mapping
$\Xi_\lambda: \mathcal{A}(O_1) \to \mathcal{H}$ by $\Xi_\lambda(A) = \Delta^\lambda A\Omega$. The compactness
of any one of these mappings implies the compactness
of all of the others. Moreover, the $\ell^p$ (nuclear) norms of
these mappings are interrelated and provide a measure 
of the number of local degrees of freedom of the 
system. Suitable conditions on the maps in terms of 
these norms entail the strong statistical independence 
condition called the split property. Conversely, the split 
property implies the compactness of all of these maps. 
Moreover, the existence of equilibrium temperature 
states on the global algebra of observables can be 
derived from suitable conditions on these norms in the 
vacuum sector. 

The conceptual advantage of the modular com- 
pactness and nuclearity conditions compared to 
their original Hamiltonian form lies in the fact that 
they are meaningful also for quantum systems in 
curved spacetimes, where global energy operators 
(i.e., generators corresponding to global timelike 
Killing vector fields) need not exist. 


Modular Position and Quantum Field Theory 


The characterization of the relative “geometric” 
position of algebras based on the notions of modular 
inclusion and modular intersection was directly 
motivated by the Bisognano—Wichmann theorem. 
Observable algebras associated with suitably chosen 
wedge regions in Minkowski space provided exam- 
ples whose essential structure could be abstracted 
for more general application, resulting in the notions 
presented in the preceding sections. 

Theorem 12(ii) has been used to construct, from 
two algebras and the indicated half-sided modular 
inclusions, a conformal quantum field theory on the 
circle (compactified light ray) with positive energy. 
Since the chiral part of a conformal quantum field 
model in two spacetime dimensions naturally yields 
such half-sided modular inclusions, studying the 
inclusions in Theorem 12(ii) is equivalent to study- 
ing such field theories. Theorems 12(i) and 13 
and their generalizations to inclusions involving up 
to six algebras have been employed to construct 
Poincaré-covariant nets of observable algebras (the 
algebraic form of quantum field theories) satisfying 
the spectrum condition on $(d+1)$-dimensional
Minkowski space for $d = 1, 2, 3$. Conversely, such
quantum field theories naturally yield such systems 
of algebras. 

This intimate relation would seem to open up the 
possibility of constructing interacting quantum field 
theories from a limited number of modular inclu- 
sions/intersections. 


Geometric Modular Action 


The fact that the modular objects in quantum field 
theory associated with wedge-shaped regions and the 
vacuum state in Minkowski space have geometric 
significance (“geometric modular action”) was origin- 
ally discovered in the framework of the Wightman 
axioms. As an algebraic quantum field theory (AQFT) 
does not rely on the concept of Wightman fields, it was 
natural to ask (i) when does geometric modular action 
hold in AQFT and (ii) which physically relevant 
consequences follow from this feature? 

There are two approaches to the study of 
geometric modular action. In the first, attention is 
focused on modular covariance, expressed in terms of 
the modular groups associated with wedge algebras 
and the vacuum state in Minkowski space. Modular 
covariance has been proven to obtain in conformally 
invariant AQFT, in any massive theory satisfying 
asymptotic completeness, and also in the presence of 
other, physically natural assumptions. To mention 
only three of its consequences, both the spin-statistics 
theorem and the PCT theorem, as well as the 
existence of a continuous unitary representation of 
the Poincaré group acting covariantly upon the 
observable algebras and satisfying the spectrum 
condition follow from modular covariance. 

In a second approach to geometric modular action, 
the modular involutions are the primary focus. Here, 
no a priori connection between the modular objects 
and isometries of the spacetime is assumed. The central 
assumption, given the state vector Q and the von 
Neumann algebras of localized observables {A(O)} on 
the spacetime, is that there exists a family $\mathcal{W}$ of subsets
of the spacetime such that $J_{W_1}\mathcal{R}(W_2)J_{W_1} \in
\{\mathcal{R}(W) \mid W \in \mathcal{W}\}$, for every $W_1, W_2 \in \mathcal{W}$. This
condition makes no explicit appeal to isometries or other
special attributes and is thus applicable, in principle, to 
quantum field theories on general curved spacetimes. 

It has been shown for certain spacetimes, including 
Minkowski space, that under certain additional 
technical assumptions, the modular involutions 
encode enough information to determine the 
dynamics of the theory, the isometry group of the 
spacetime, and a continuous unitary representation of 
the isometry group which acts covariantly upon the 
observables and leaves the state invariant. In certain 
cases including Minkowski space, it is even possible
to derive the spacetime itself from the group $\mathcal{J}$
generated by the modular involutions $\{J_W \mid W \in \mathcal{W}\}$.

The modular unitaries $\Delta_W^{it}$ enter in this approach
through a condition which is designed to assure the
stability of the theory, namely that $\Delta_W^{it} \in \mathcal{J}$, for all
$t \in \mathbb{R}$ and $W \in \mathcal{W}$. In Minkowski space, this
additional condition entails that the derived representation
of the Poincaré group satisfies the spectrum condition. 


Further Applications 


As previously observed, through the close connec- 
tion to the KMS condition, modular theory enters 
naturally into the equilibrium thermodynamics of 
many-body systems. But in recent work on the 
theory of nonequilibrium thermodynamics it also 
plays a role in making mathematical sense of the 
notion of quantum systems in local thermodynamic 
equilibrium. Modular theory has also proved to be 
of utility in recent developments in the theory of 
superselection rules and their attendant sectors, 
charges and charge-carrying fields. 


See also: Algebraic Approach to Quantum Field Theory;
Axiomatic Quantum Field Theory; Quantum Central-Limit
Theorems; Symmetries in Quantum Field Theory of
Lower Spacetime Dimensions; Thermal Quantum Field
Theory; Positive Maps on C*-Algebras; Two-Dimensional
Models; von Neumann Algebras: Introduction, Modular
Theory, and Classification Theory.


Further Reading 


Baumgärtel H and Wollenberg M (1992) Causal Nets of Operator
Algebras. Berlin: Akademie-Verlag.

Borchers HJ (2000) On revolutionizing quantum field theory with 
Tomita’s modular theory. Journal of Mathematical Physics 41: 
3604-3673. 

Bratteli O and Robinson DW (1981) Operator Algebras and 
Quantum Statistical Mechanics II. Berlin: Springer. 

Connes A (1974) Caractérisation des algèbres de von Neumann
comme espaces vectoriels ordonnés. Annales de l'Institut
Fourier 24: 121-155.

Haag R (1992) Local Quantum Physics. Berlin: Springer. 

Kadison RV and Ringrose JR (1986) Fundamentals of the Theory 
of Operator Algebras, vol. II. Orlando: Academic Press. 

Pedersen GK (1979) C*-Algebras and Their Automorphism 
Groups. New York: Academic Press. 

Stratila S (1981) Modular Theory in Operator Algebras.
Tunbridge Wells: Abacus Press.

Takesaki M (1970) Tomita’s Theory of Modular Hilbert Algebras 
and Its Applications, Lecture Notes in Mathematics, vol. 128. 
Berlin: Springer. 

Takesaki M (2003) Theory of Operator Algebras II. Berlin: 
Springer. 


Topological Defects and Their Homotopy Classification 


T W B Kibble, Imperial College, London, UK 
© 2006 Elsevier Ltd. All rights reserved. 


Introduction 


Symmetry-breaking phase transitions occur in a wide 
variety of systems — from condensed matter to the 
early universe. One of the common features of such 
transitions is the appearance, in the broken-symmetry 
phase, of topological defects, trapped regions in 
which the symmetry is restored, or at least changed. 
Examples are vortices in superfluids, domain walls in 
ferromagnets, and disclination lines in liquid crystals. 
Often these defects are stable for topological reasons, 
and play an important role in the dynamics of the 
system. An astonishingly rich variety of defects can be 
found in various systems. They can usefully be 
classified using the tools of homotopy theory. 


Spontaneous Symmetry Breaking 


Let us consider a quantum-mechanical system with a
symmetry group $G$. This means that each $g \in G$ is
represented on the Hilbert space of quantum states
by a unitary operator $U(g)$, which commutes with
the Hamiltonian. Spontaneous symmetry breaking
occurs if this symmetry is not shared by the ground
state or vacuum state $|0\rangle$ of the system. In other
words, for some $g \in G$, $U(g)|0\rangle \ne |0\rangle$. Then the
ground state is necessarily degenerate: $U(g)|0\rangle$ must
have the same energy as $|0\rangle$.

Spontaneous symmetry breaking is usually
describable in terms of an order-parameter field,
which vanishes above the transition and is nonzero
below it. We can find a scalar field $\phi(r)$, or multiplet
of fields $\phi = (\phi_i, i = 1, \ldots, n)$ transforming according
to some representation $D$ of $G$ (assumed not to
contain the trivial representation), whose expectation
value in the ground state is nonzero:


$\langle 0|\phi(r)|0\rangle = \phi_0 \ne 0$ [1]

This is the order parameter. Since

$\langle 0|U^{-1}(g)\phi(r)U(g)|0\rangle = D(g)\phi_0$ [2]


it follows that the only elements of $G$ that can be
symmetries of the ground state are those in the
stability subgroup $H$ of $\phi_0$ (the group of unbroken
symmetries in this ground state):


$H = \{g \in G : D(g)\phi_0 = \phi_0\}$ [3]


In terms of this subgroup, we can find a useful
characterization of the manifold $M$ of degenerate
ground states. As noted above, for each $g \in G$,
$U(g)|0\rangle$ is also a ground state. However, these are
not all distinct, because clearly $U(gh)|0\rangle = U(g)|0\rangle$
for all $h \in H$. Hence, the distinct ground states are
in one-to-one correspondence with the left cosets $gH$
of $H$ in $G$, and $M$ may be identified with the
quotient space $G/H$, the space of left cosets.

For example, suppose $G$ is the rotation group
$SO(3)$, and $\phi$ belongs to the three-dimensional
vector representation. If $\phi \ne 0$ in the ground state,
we may choose $\phi_0 = (0,0,v)$. Then, clearly,
$H = SO(2)$, the group of rotations about the z-axis,
and $M = SO(3)/SO(2) = S^2$, the 2-sphere. It is useful
to think of $M$ as the subset of the order-parameter
space comprising the possible expectation values
$\langle\phi\rangle$ for the various degenerate ground states. For
example, in this case, $M = \{\phi : \phi^2 = v^2\}$.
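
The identification $M = SO(3)/SO(2) = S^2$ can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`rot_z`, `rot_x`, `apply`, `ground_state`) and the value `V = 2.0` are illustrative assumptions, not part of the text. It shows that rotations about the z-axis leave $\phi_0$ fixed, that every rotated $\phi_0$ lies on the sphere of radius $v$, and that $g$ and $gh$ with $h \in H$ give the same ground-state value.

```python
import math

def rot_z(a):
    """Rotation by angle a about the z-axis (an element of H = SO(2))."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(a):
    """Rotation by angle a about the x-axis."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def apply(m, vec):
    """Matrix-vector product D(g) phi for a 3x3 matrix m."""
    return tuple(sum(m[i][j] * vec[j] for j in range(3)) for i in range(3))

def norm(vec):
    return math.sqrt(sum(x * x for x in vec))

V = 2.0                 # illustrative value of v
PHI0 = (0.0, 0.0, V)    # the chosen ground-state value phi_0 = (0, 0, v)

def ground_state(theta, alpha):
    """D(g) phi_0 for g = R_z(alpha) R_x(theta); sweeps out the whole sphere."""
    return apply(rot_z(alpha), apply(rot_x(theta), PHI0))

# Elements of H fix phi_0, so g and g*h label the same point of M.
g_phi = ground_state(1.1, 2.3)
gh_phi = apply(rot_z(2.3), apply(rot_x(1.1), apply(rot_z(0.4), PHI0)))
print(g_phi, gh_phi, norm(g_phi))
```

Only the two angles `theta` and `alpha` matter for the resulting point, which is the numerical counterpart of quotienting the three-parameter group by the one-parameter subgroup $H$.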


Defect Formation 


It is often possible to characterize the dynamics at
finite temperature in terms of a function of the order
parameter, the effective potential $V(\phi)$, which is
necessarily invariant under $G$, and whose minima
define the equilibrium states. At low temperatures, it
has a form like $V \propto (\phi^2 - v^2)^2$, whose minima
occur at nonzero values of $\phi$. But above the critical
temperature $T_c$, the only minimum is at $\phi = 0$, so the
equilibrium state is symmetric under $G$. In the high-temperature
phase, there may be large fluctuations
in $\phi$, but its mean value will be zero.

Now, when the system is cooled through the 
phase transition, $\phi$ will acquire a nonzero expectation
value, gradually approaching one of the
degenerate ground states characterized by a point 
of M. But the choice of which one is unpredict- 
able; the symmetry breaking is spontaneous. 
Moreover, in a large system, there is no reason 
why the same choice should be made everywhere. 
For example, a ferromagnet cooling through its 
Curie point may acquire a spontaneous magneti- 
zation in different directions in different parts of 
the sample. 

Of course, there is an energetic penalty to having 
a spatially varying order parameter, so it will tend to 
become more uniform as the temperature is lowered. 
But the question arises whether there may be any 
topological obstruction to this process. It can
happen that if we choose points on $M$ in a
continuous manner everywhere around the periphery
of some region, it is topologically impossible to
complete the process throughout its interior.
Continuity may require that there are points where
$\phi$ leaves the surface $M$. For example, if our
ferromagnet has two opposite possible directions of
easy magnetization, described by $\phi_0$ and $-\phi_0$, then
$M$ consists essentially of these two points. Regions
where $\phi \approx \phi_0$ and where $\phi \approx -\phi_0$ must be separated
by domain walls across which $\phi$ varies smoothly
from one to the other.


Homotopy Groups 


To classify the various possible types of defect, we 
need to consider the homotopy groups of the 
manifold M of degenerate ground states. In this 
section, we briefly review the necessary definitions. 

A path in $M$ is a map $\phi: I \to M$ from the unit
interval $I = [0,1] \subset \mathbb{R}$. We choose a base point $m_0 \in
M$ (which may be identified with $\phi_0$), and consider
loops in $M$, paths such that $\phi(0) = \phi(1) = m_0$. We
say that two loops are homotopic, and write $\phi \sim \psi$,
if one can be continuously deformed into the other
within $M$, that is, if there exists a map $\chi: I^2 \to M$
such that


$\chi(0,t) = \phi(t)$ and $\chi(1,t) = \psi(t)$ [4]


for all $t$, and


$\chi(s,0) = \chi(s,1) = m_0$ [5]


for all $s$. This is an equivalence relation. The set
$\pi_1(M)$ is the set of equivalence classes $[\phi]$ of loops
under this relation.

On the set of loops, we may define a product $\phi\psi$,
comprising the loop $\phi$ followed by $\psi$ (see Figure 1).
Explicitly,


$(\phi\psi)(t) = \begin{cases} \phi(2t), & 0 \le t \le \tfrac{1}{2} \\ \psi(2t-1), & \tfrac{1}{2} \le t \le 1 \end{cases}$ [6]


It is easy to show that if $\phi \sim \phi'$ and $\psi \sim \psi'$, then
$\phi\psi \sim \phi'\psi'$. Hence, this defines a product on $\pi_1(M)$,
by $[\phi][\psi] = [\phi\psi]$. So equipped, $\pi_1(M)$ becomes the
fundamental group or first homotopy group of $M$.
Note that the identity is the equivalence class $[\phi_0]$ of
the trivial loop with $\phi_0(t) = m_0$, while the inverse is
$[\phi]^{-1} = [\bar\phi]$, where the map $\bar\phi$ is the reverse of
$\phi$: $\bar\phi(t) = \phi(1-t)$.


Figure 1 The product of loops.
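
The product [6] and the inverse loop can be made concrete for the simplest nontrivial target, the unit circle. The sketch below is an illustration under stated assumptions: the choice $M = S^1$ with base point $m_0 = 1$, and the helper names `loop`, `product`, `reverse` and `winding_number`, are all hypothetical and do not come from the text. Homotopy classes of loops on the circle are labeled by an integer winding number, and the code checks that the product of loops adds winding numbers and that a loop followed by its reverse is trivial.

```python
import cmath
import math

def loop(n):
    """A loop on the unit circle, based at m0 = 1, that winds n times."""
    return lambda t: cmath.exp(2j * math.pi * n * t)

def product(phi, psi):
    """Product of loops as in eq. [6]: phi on [0, 1/2], then psi on [1/2, 1]."""
    return lambda t: phi(2 * t) if t <= 0.5 else psi(2 * t - 1)

def reverse(phi):
    """The reversed loop, t -> phi(1 - t)."""
    return lambda t: phi(1 - t)

def winding_number(phi, samples=2000):
    """Accumulate small phase increments along the loop and divide by 2*pi."""
    total = 0.0
    prev = phi(0.0)
    for k in range(1, samples + 1):
        cur = phi(k / samples)
        total += cmath.phase(cur / prev)  # increment lies in (-pi, pi]
        prev = cur
    return round(total / (2 * math.pi))

print(winding_number(product(loop(2), loop(-3))))
print(winding_number(product(loop(2), reverse(loop(2)))))
```

The phase-accumulation step is only reliable when consecutive samples differ by a phase well below $\pi$, so `samples` has to be increased for loops that wind many times.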

Strictly speaking, we should write $\pi_1(M, m_0)$ in
place of $\pi_1(M)$. However, for any path-connected
space, the groups $\pi_1(M, m_0)$ and $\pi_1(M, m_0')$ are
always isomorphic, and, more importantly, the same
is true for any coset space $M = G/H$, where $G$ is a
Lie group and $H$ a closed subgroup. For a general
manifold $M$, $\pi_1(M)$ is not necessarily abelian, but it
is so if $M$ is a Lie group, or more generally a
Riemannian symmetric space. The space $M$ is said to
be simply connected if $\pi_1(M) = 0$, the group comprising
only the identity element, $0 = \{[\phi_0]\}$. (Although
$\pi_1(M)$ is not always abelian, it is conventional for
homotopy groups to use an additive notation and
represent the trivial group by 0 rather than 1.)

The nth homotopy group $\pi_n(M)$ may be defined
similarly, as a set of equivalence classes of maps
$\phi: I^n \to M$ such that $\phi$ maps the entire boundary $\partial I^n$
to the base point $m_0$. Two such maps are homotopic
($\phi \sim \psi$) if there exists a map $\chi: I^{n+1} \to M$ such that


$\chi(0,t) = \phi(t)$ and $\chi(1,t) = \psi(t)$ [7]


for all $t = (t_1, \ldots, t_n)$, and, for each $s \in I$, $\chi(s,t) = m_0$
for all $t \in \partial I^n$. The product $\phi\psi$ is defined by


$(\phi\psi)(t_1, \ldots, t_n) = \begin{cases} \phi(2t_1, t_2, \ldots, t_n), & 0 \le t_1 \le \tfrac{1}{2} \\ \psi(2t_1 - 1, t_2, \ldots, t_n), & \tfrac{1}{2} \le t_1 \le 1 \end{cases}$ [8]


The choice of $t_1$ rather than any other $t_i$ is arbitrary;
all choices yield homotopic product maps. The
product again defines a product on $\pi_n(M)$, which
thereby becomes a group, the nth homotopy group.
One new feature is that, for all $n > 1$, $\pi_n(M)$ is
always abelian.

Note that since the entire boundary of $I^n$ is
mapped to a single point, it is possible to collapse it,
and talk instead about maps from the n-sphere $S^n$ to
$M$, taking one designated point to $m_0$. The fact that
$\pi_n(M)$ is nontrivial indicates the existence in $M$ of
closed n-surfaces that cannot be smoothly shrunk to
a point. In particular, it is worth noting that, for any $n$,
$\pi_n(S^n) = \mathbb{Z}$, the additive group of integers, while
$\pi_m(S^n) = 0$ for all $m < n$.

A special case is $n = 0$. Here, $S^0$ comprises two
points only, and since one of them is always mapped
to $m_0$, we really have to consider maps from a single
point to $M$, that is, points in $M$. Two points are
homotopic if they can be joined by a path in $M$.
Thus, $\pi_0(M)$ may be identified with the set of path-connected
components of $M$. Note, however, that in
general no product can be defined on $\pi_0(M)$, so
$\pi_0(M)$ should be called the zeroth homotopy set
(not group). There is an important exception,
however: if $G$ is a Lie group, and $G_0$ its connected
subgroup (the subset of elements joined by paths to
the identity $e$), then $\pi_0(G)$ may be identified with
the quotient group $G/G_0$. Note, however, that this
group $\pi_0(G) = G/G_0$ is not necessarily abelian.


Classification of Defects 


We now turn to the classification of defects by 
means of homotopy groups. It will be useful to start 
with simple specific examples in three-dimensional
space, $\mathbb{R}^3$.

First, suppose again that $\phi$ belongs to the vector
representation of $G = SO(3)$. Then $M = SO(3)/SO(2) = S^2$
may be identified with the sphere
$M = \{\phi : \phi^2 = v^2\}$ in $\phi$ space. Consider a closed surface
$S$, an embedding of a 2-sphere $S^2$ in $\mathbb{R}^3$. Assume
that everywhere on $S$ the field $\phi(r)$ has one of the
ground-state values. In other words, we have a map
$\phi: S \to M$, from one 2-sphere to another. The map $\phi$
can be extended to a map from the interior of $S$ to $M$
only if it belongs to the trivial homotopy class $[\phi_0] \in
\pi_2(M)$, where $\phi_0: I^2 \to M: (t_1,t_2) \mapsto m_0 = eH$. In all
other cases, there must be at least one point where
$\phi(r) = 0$; this is a point defect. The second homotopy
group in this case is $\pi_2(S^2) = \mathbb{Z}$, so the possible
point defects, or monopoles, are labeled by an integer
$n \in \mathbb{Z}$, the winding number. (An example of a map
with winding number $n$ is (in spherical polars)
$(r, \theta, \varphi) \mapsto (v, \theta, n\varphi)$.)

More generally, point defects in $\mathbb{R}^d$ are classified
by $\pi_{d-1}(M)$. A map $\phi$ from a closed $(d-1)$-dimensional
surface $S \subset \mathbb{R}^d$ to $M$ can be extended
to the interior of $S$ if and only if it belongs to the
trivial homotopy class $[\phi_0] \in \pi_{d-1}(M)$. If this is not
the case, there must be at least one point around
which $\phi(r)$ leaves the surface $M$, although in general
it is not required to vanish anywhere.

Second, take the case where $\phi$ is a single complex
field, and $G$ is the phase symmetry group $U(1)$. In
this case, $H$ is the subgroup $1 = \{1\} \subset G$. Thus,
$M = U(1)/1 = S^1$; this manifold may be identified
with the circle $\{\phi : |\phi| = v\}$ in the order-parameter
space. Now consider a closed loop $C$ in space, an
embedding of $S^1$ in $\mathbb{R}^3$ (see Figure 2). Suppose that
on $C$, $\phi(r)$ takes one of the ground-state values,
say $\phi(r) = v\exp[i\alpha(r)]$. If $S$ is some surface with
boundary $C$, then the map $\phi: C \to M$ can be
extended to a map $\phi: S \to M$ if and only if it
belongs to the trivial homotopy class $[\phi_0] \in \pi_1(M)$.
If it does not, then there must be at least one
point on $S$ within $C$ where $\phi = 0$. Moreover, this
must be true of every surface $S$ spanning $C$, so there
must be a curve passing through $C$ along which
$\phi = 0$. This is a linear defect, a string or vortex line.
In this case, the first homotopy group is $\pi_1(S^1) = \mathbb{Z}$,
so we see that the possible linear defects are
classified by an integer, the winding number $n$. An
example of a map with winding number $n$ is
$\varphi \mapsto v e^{in\varphi}$, where $\varphi$ is the azimuthal angle around $C$.


Figure 2 A linear defect.
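
To illustrate how the winding number detects a linear defect threading a loop, the following sketch samples a field of the form $v\,e^{in\varphi}$ on circles in a plane transverse to the defect. Everything named here is an assumption made for illustration: the function names `vortex_field` and `winding_on_circle`, the restriction to a two-dimensional slice, and the placement of the core at the origin.

```python
import cmath
import math

def vortex_field(x, y, n, v=1.0, core=(0.0, 0.0)):
    """Ground-state-valued field v*exp(i*n*angle) around a core point.

    Hypothetical configuration: |phi| = v everywhere except at the core,
    where the angle (and hence the field) is not defined.
    """
    angle = math.atan2(y - core[1], x - core[0])
    return v * cmath.exp(1j * n * angle)

def winding_on_circle(field, centre, radius, samples=720):
    """Winding number of field(x, y) around a circle, by phase accumulation."""
    total = 0.0
    prev = field(centre[0] + radius, centre[1])
    for k in range(1, samples + 1):
        a = 2 * math.pi * k / samples
        cur = field(centre[0] + radius * math.cos(a),
                    centre[1] + radius * math.sin(a))
        total += cmath.phase(cur / prev)  # small increment in (-pi, pi]
        prev = cur
    return round(total / (2 * math.pi))

field = lambda x, y: vortex_field(x, y, n=3)
print(winding_on_circle(field, centre=(0.0, 0.0), radius=1.0))  # loop encircles the core
print(winding_on_circle(field, centre=(5.0, 0.0), radius=1.0))  # loop misses the core
```

A loop that encircles the core picks up the full winding, while a loop that does not is homotopically trivial, so the field on it can be extended over the spanning disc without vanishing.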

Again, this result can easily be generalized. Linear
defects in $\mathbb{R}^d$ are classified by $\pi_{d-2}(M)$. If, on a
$(d-2)$-dimensional surface $C$, $\phi(r)$ takes values in
$M$, and if it does not belong to the trivial homotopy
class, there must be a linear defect threading
through $C$, around which $\phi$ leaves the surface $M$,
although again it need not necessarily vanish.

More generally yet, in the d-dimensional space $\mathbb{R}^d$,
defects of dimension $p$ are classified by the homotopy
group $\pi_{d-p-1}(M)$. For example, in three dimensions,
planar defects (domain walls) are classified by
$\pi_0(M)$.
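
The dimension counting above can be summarized in a few lines of code. This is only a bookkeeping sketch; the function and dictionary names are hypothetical and chosen for illustration.

```python
def classifying_homotopy_index(d: int, p: int) -> int:
    """Return k such that p-dimensional defects in R^d are classified by pi_k(M)."""
    if d < 1 or not 0 <= p <= d - 1:
        raise ValueError("need d >= 1 and 0 <= p <= d - 1")
    return d - p - 1

# Names used in the text for the low-dimensional cases.
DEFECT_NAMES = {0: "point defect", 1: "linear defect", 2: "planar defect"}

def describe(d: int):
    """List (defect type, classifying homotopy set) for every defect dimension in R^d."""
    return [
        (DEFECT_NAMES.get(p, f"{p}-dimensional defect"),
         f"pi_{classifying_homotopy_index(d, p)}(M)")
        for p in range(d)
    ]

for name, group in describe(3):
    print(f"{name}: {group}")
```

For $d = 3$ this reproduces the three cases discussed in the text: monopoles with $\pi_2(M)$, strings with $\pi_1(M)$, and domain walls with $\pi_0(M)$.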


The Exact Sequence 


There are mathematical theorems that greatly 
facilitate the computation of the homotopy group 
of homogeneous spaces, of the form M = G/H. 

We begin with the maps relating these spaces to
each other. There is a canonical injective homomorphism
$i: H \to G: h \mapsto h$, and a canonical projection
associating each element of $G$ with its coset:
$p: G \to M: g \mapsto gH$. Moreover, it is clear that the
image of $i$, namely the subgroup $H$, is also the kernel
of $p$, the inverse image $p^{-1}(m_0)$ of the distinguished
element $m_0 = eH$ of $M$. These statements can be
summarized by saying that


$H \xrightarrow{\;i\;} G \xrightarrow{\;p\;} M$


is an exact sequence: the image of each map is the
kernel of the following one (see Figure 3).


Figure 3 An exact sequence.

Next, we note that since any closed loops (or
n-surfaces) in $H$ belonging to the same homotopy
class are also homotopic as loops (or n-surfaces) in
$G$, there is an induced homomorphism $i_*: \pi_n(H) \to
\pi_n(G)$. Similarly, homotopic loops or n-surfaces in $G$
project to homotopic loops or n-surfaces in $M$, so
there is an induced homomorphism $p_*: \pi_n(G) \to
\pi_n(M)$. Moreover, it is easy to see that although $i_*$ is
not necessarily injective and $p_*$ not necessarily
surjective, it is true that the image of $i_*$ is the kernel
of $p_*$. For example, any loop in $G$ will be mapped to
a homotopically trivial loop in $M$ if and only if it is
homotopic to the image of a loop in $H$.

In addition, there is a boundary map that
relates homotopy groups of different dimension:
$\partial: \pi_{n+1}(M) \to \pi_n(H)$. To see this, it is useful to
think of $G$ as a fiber bundle with base space $M$ and
fiber $H$. Now consider a map $\phi: (I^{n+1}, \partial I^{n+1}) \to
(M, m_0)$. Since $p$ is a projection, $\phi$ can always be
lifted to a map $\tilde\phi: (I^{n+1}, \partial I^{n+1}) \to (G, H)$, that is, we
can find a (nonunique) map $\tilde\phi$ such that $\phi = p \circ \tilde\phi$
(see Figure 4). However, $\tilde\phi$ does not necessarily map
the boundary to a single point; what is true is that $\tilde\phi$
must map the boundary to a subset of $H$, and since
topologically $\partial I^{n+1} \sim S^n$, this defines a map $\hat\phi: S^n \to H$.
If we allow $\phi$ to vary over some homotopy class
of maps, and $\tilde\phi$ to vary continuously, then $\hat\phi$ will
also remain in one homotopy class. Thus, we have
defined a map $\partial: \pi_{n+1}(M) \to \pi_n(H): [\phi] \mapsto [\hat\phi]$.


Figure 4 Lift of a loop.


It is also easy to see that the image
of $\partial: \pi_{n+1}(M) \to \pi_n(H)$ is the kernel of $i_*: \pi_n(H) \to
\pi_n(G)$, because the n-surface in $H$ defined by $\hat\phi$ is
necessarily homotopically trivial in $G$. Similarly, one
can see that the image of $p_*: \pi_{n+1}(G) \to \pi_{n+1}(M)$ is
the kernel of $\partial: \pi_{n+1}(M) \to \pi_n(H)$.

Putting all these results together, we see that there
is a (semi-infinite) exact sequence connecting all the
homotopy groups:

$$\begin{aligned}
\cdots &\xrightarrow{p_*} \pi_{n+1}(M) \xrightarrow{\partial} \pi_n(H) \xrightarrow{i_*} \pi_n(G) \xrightarrow{p_*} \pi_n(M) \xrightarrow{\partial} \cdots \\
\cdots &\xrightarrow{i_*} \pi_1(G) \xrightarrow{p_*} \pi_1(M) \xrightarrow{\partial} \pi_0(H) \xrightarrow{i_*} \pi_0(G) \xrightarrow{p_*} \pi_0(M)
\end{aligned} \qquad [9]$$


This sequence makes it easy to compute most of
the low-dimensional homotopy groups of $M$. Let us
begin with $\pi_0(M)$, which merely labels its disconnected
components. As noted earlier, for the Lie
group $G$, $\pi_0(G)$ is the quotient group $\pi_0(G) = G/G_0$,
where $G_0$ is the connected subgroup of $G$. Now the
image of $\pi_0(H)$ under $i_*$ is clearly the set of
connected components of $G$ that contain elements
of $H$, so if $G$ has $m$ connected components, and $n$ of
them contain elements of $H$, then $\pi_0(M)$ has $m/n$
elements (see Figure 5).
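
As a worked instance of this counting, the following example uses $G = O(3)$, which has $m = 2$ connected components (determinant $+1$ and $-1$). The two choices of $H$ are illustrative assumptions, not cases taken from the text.

```latex
% Case 1: H = SO(2), rotations about the z-axis.
% H lies entirely in the identity component of O(3), so n = 1.
\pi_0\bigl(O(3)/SO(2)\bigr) \ \text{has} \ m/n = 2/1 = 2 \ \text{elements:}
\qquad O(3)/SO(2) \cong S^2 \sqcup S^2 .

% Case 2: H = O(2), the full stabilizer of the z-axis unit vector in O(3)
% (rotations about z together with reflections in planes containing z).
% H meets both components of O(3), so n = 2.
\pi_0\bigl(O(3)/O(2)\bigr) \ \text{has} \ m/n = 2/2 = 1 \ \text{element:}
\qquad O(3)/O(2) \cong S^2 .
```

In the first case the manifold of ground states is disconnected, so domain walls classified by $\pi_0(M)$ are possible; in the second it is connected and no such walls arise.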

Next, we note that, for all the higher homotopy
groups, disconnected pieces are irrelevant. Since a
loop, for example, starting at $m_0$ must remain
within its connected component $M_0 \subset M$, it
follows that $\pi_1(M) = \pi_1(M_0)$, and similarly
$\pi_n(M) = \pi_n(M_0)$ for all $n > 1$. So one can ignore
any disconnected parts of the symmetry group $G$,
and assume from now on that $\pi_0(G) = 0$. Moreover,
it is always possible to replace $G$ by its simply
connected covering group, replacing $SO(3)$, for
example, by $SU(2)$. Thus, we may also assume that
$\pi_1(G) = 0$. Then the section of the exact sequence in
the second line of [9] becomes


$0 \to \pi_1(M) \xrightarrow{\;\partial\;} \pi_0(H) \to 0$


which implies that the two groups in the center are
isomorphic:


$\pi_1(M) = \pi_0(H)$ [10]


Figure 5 The disconnected components of $G$ are shaded, those
of $H$ are cross-hatched. Here $\pi_0(M)$ has two elements.


For example, if the symmetry group $G = SO(3)$ is
completely broken, so that $H = 1$, then replacing $G$ by
$\tilde G = SU(2)$ requires replacing $H$ by $\tilde H = \{+1, -1\}
\simeq \mathbb{Z}_2$, hence also $\pi_1(M) = \pi_0(\tilde H) = \mathbb{Z}_2$; there is only
one nontrivial class of linear defects in this model.

To find $\pi_2(M)$, we need a standard theorem
about Lie groups, namely that the second homotopy
group of any Lie group is trivial: for any
$G$, $\pi_2(G) = 0$. (No details of the proof are given
here. It derives from the fact that a generic element 
g € G belongs to a unique one-parameter subgroup 
{exp (tX),t € R} c G, where X is an element of the 
Lie algebra of G. Thus, all the points on a surface in 
G may be joined by these paths to the identity, and 
the surface may then be shrunk along the resulting 
cone. There are exceptional elements for which 
this is not true, but it can be shown that in a d- 
dimensional group they lie on (d — 3)-dimensional 
surfaces, so any 2-surface can be smoothly deformed 
to avoid them.) 

It follows from this theorem that another section
of the exact sequence is


$$0 \to \pi_2(M) \to \pi_1(H) \to 0$$

which again implies an isomorphism:

$$\pi_2(M) \cong \pi_1(H)$$ [11]


For example, if G = SO(3) and H = SO(2), or
equivalently $\tilde G$ = SU(2) and $\tilde H$ = U(1) (a double
cover of the SO(2)), then $\pi_2(M) = \pi_1(\tilde H) = Z$, so
point defects in this theory are labeled by an integer
winding number.


Examples 


The simplest continuous symmetry is the U(1) phase
symmetry $\phi \mapsto \phi e^{i\alpha}$ of a complex field. In a weakly
interacting Bose gas, below the Bose-Einstein
condensation temperature, or in superfluid helium-4,
a macroscopic fraction of the atoms occupies a
single quantum state, and $\hat\phi$ acquires a nonzero
expectation value, $\langle\hat\phi\rangle = \phi$, whose phase is arbitrary,
so the symmetry is completely broken to H = 1.
Thus, $M = S^1$; we have a circle of equivalent
degenerate ground states. (This corresponds to
spontaneous breaking of the particle-number
symmetry. It is possible to describe the system in a
U(1)-invariant way, by projecting out a state of definite
particle number, a uniform superposition of all the
states in M, but it is generally less convenient to do
so.) In this case, the only nontrivial homotopy group
is $\pi_1(M) = Z$, so the only defects are linear defects
classified by a winding number $n \in Z$. The defects
with $n = \pm 1$ are stable vortices. Those with $|n| > 1$
are in general unstable and tend to break up into $|n|$
single-quantum vortices.
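
As an illustration of how the winding number $n \in Z$ labels a vortex, the following is a minimal sketch, assuming the phase of the order parameter has been sampled at discrete points on a closed loop around the defect. The function names `winding_number` and `vortex_phase` are hypothetical helpers introduced only for this example.

```python
import cmath
import math


def winding_number(phases):
    """Winding number of a list of phases sampled in order around a closed loop.

    Each jump between neighbouring samples is wrapped into [-pi, pi), so the
    sampling must be fine enough that the true jump is smaller than pi.
    """
    total = 0.0
    count = len(phases)
    for i in range(count):
        delta = phases[(i + 1) % count] - phases[i]
        delta = (delta + math.pi) % (2 * math.pi) - math.pi
        total += delta
    return round(total / (2 * math.pi))


def vortex_phase(n, points=64):
    """Phase of phi = exp(i n theta) sampled on a circle around the vortex core."""
    return [cmath.phase(cmath.exp(1j * n * 2 * math.pi * j / points))
            for j in range(points)]


for n in (-1, 0, 1, 2):
    print(n, winding_number(vortex_phase(n)))
```

The result is an integer that does not change under smooth deformations of the sampled phase, which is the discrete counterpart of the statement $\pi_1(S^1) = Z$.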

Low-temperature superconductors also have a 
U(1) symmetry, although there are important differ- 
ences. This is not a global symmetry but a local, 
gauge symmetry, with coupling to the electromag- 
netic field. Moreover, it is not single atoms that 
condense but Cooper pairs, pairs of electrons of 
equal and opposite momentum and spin. These 
systems too exhibit linear defects, magnetic flux 
tubes carrying a magnetic flux 4rnh/e. 

A less trivial example is a nematic liquid crystal. 
These materials are composed of rod-shaped mole- 
cules that tend, at low temperatures, to line up 
parallel to one another. The nematic state is 
characterized by a preferred orientation, described 
by a unit vector n, the director. (Note that n and —n 
are physically equivalent.) There is long-range 
orientational order, with molecules preferentially 
lining up parallel to n, but unlike a solid crystal 
there is no long-range translational order — the 
molecules move freely past each other as in a normal 
liquid. 

A convenient order parameter here is the mean
mass quadrupole tensor $\Phi$ of a molecule. In the
nematic state, $\Phi$ is proportional to $(3\mathbf{n}\mathbf{n} - 1)$; for
example, if n = (0, 0, 1), then $\Phi$ is diagonal with
diagonal elements proportional to (−1, −1, 2). In
this case, the symmetry group is SO(3) (or, more
precisely, O(3); but the inversion symmetry is not
broken, so we can restrict our attention to the
connected part of the group). The subgroup H that
leaves this $\Phi$ invariant is a semidirect product,
$H = SO(2) \rtimes Z_2$ (isomorphic to O(2)), composed of
rotations about the z-axis and rotations through $\pi$
about axes in the x-y plane. (If we enlarge G to its
simply connected covering group $\tilde G$ = SU(2), then H
becomes $\tilde H = [U(1) \rtimes Z_4]/Z_2$, where U(1) is
generated as before by $J_z$. The essential difference is that
the square of any of the elements in the disconnected
piece of $\tilde H$ is not now the identity but the element
$e^{2\pi i J_z} = -1 \in U(1)$.) The manifold M of degenerate
ground states in this case is the projective space $RP^2$
(obtained by identifying opposite points of $S^2$).
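
The equivalence of n and −n can be checked directly on the order parameter. The sketch below assumes the tensor is taken to be exactly $3 n_i n_j - \delta_{ij}$ with the overall proportionality constant dropped; the helper name `quadrupole` is hypothetical.

```python
def quadrupole(n):
    """Return the 3x3 tensor 3 n_i n_j - delta_ij for a unit vector n."""
    return [[3 * n[i] * n[j] - (1 if i == j else 0) for j in range(3)]
            for i in range(3)]


n = (0, 0, 1)
minus_n = tuple(-c for c in n)

Q = quadrupole(n)
diagonal = [Q[i][i] for i in range(3)]

print(diagonal)                      # diagonal elements for n along z
print(Q == quadrupole(minus_n))      # n and -n give the same tensor
```

Because the tensor is quadratic in n, a rotation through $\pi$ about any axis in the x-y plane, which sends n to −n, leaves it unchanged. This is why the ground-state manifold is $RP^2$ and not $S^2$.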

Since $\tilde H$ has disconnected pieces, we have
$\pi_1(M) = \pi_0(\tilde H) = Z_2$. Thus, there can be topologically
stable linear defects, here called disclination lines,
around which the director n rotates by $\pi$ (see Figure 6).
The fact that these defects are classified by $Z_2$ rather
than Z means that a line around which n rotates by $2\pi$
is topologically trivial; indeed, n can be smoothly
rotated near the line to run parallel to it, leaving a
configuration with no defect.

Figure 6 Orientation of molecules around a disclination line.

There are also point defects; since $\pi_2(M) =
\pi_1(\tilde H) = Z$, they are labeled by an integer winding
number n. In a defect with n = 1, the vector n points
radially outwards all round the defect position.


Helium-3 


Finally, let us turn to helium-3, one of the most
fascinating and complex examples of spontaneous
symmetry breaking, which becomes a superfluid at a
temperature of a few millikelvin. Unlike helium-4, this
is, of course, a Fermi liquid, so it is not the atoms that
condense, but bound pairs of atoms, analogous to
Cooper pairs. In this case, however, the most attractive
channel is not the $^1S$, but the $^3P$, so the pairs have both
orbital and spin angular momentum, L = S = 1.
Therefore, the order parameter is not a single complex scalar
field but a 3 × 3 complex matrix $\Phi_{jk}$, where the two
indices label the orbital and spin angular momentum
states.

To a good approximation, the system is invariant
under separate rotations of L and S (the effects of
the small spin-orbit coupling will be discussed
later), so the symmetry group is


$$G = U(1)_Y \times SO(3)_L \times SO(3)_S$$ [12]


where the subscripts denote the generators and $U(1)_Y$
represents multiplication by an overall phase factor,
$e^{i\alpha Y}: \Phi \mapsto \Phi e^{i\alpha}$. This complicated symmetry allows
much scope for a large variety of defects. There are, in
fact, two distinct superfluid phases, A and B, with
different symmetries (and indeed in the presence of a
magnetic field there is a third, A1).

In the $^3$He-A phase, the order parameter has the
form $\Phi_{jk} \propto (m_j + i n_j) d_k$, where m, n, d are unit
vectors, with $\mathbf{m} \perp \mathbf{n}$; if we set $\mathbf{l} = \mathbf{m} \wedge \mathbf{n}$, then
l defines the orbital angular momentum state by
$\mathbf{l} \cdot \mathbf{L} = 1$, while d defines the spin quantization axis,
such that $\mathbf{d} \cdot \mathbf{S} = 0$. The manifold $M_A$ for this
phase is

$$M_A = [SO(3) \times S^2]/Z_2$$ [13]


where the $Z_2$ is present because (m, n, d) and
(−m, −n, −d) represent the same state. If, for
example, we take l and d in the z-direction, the
unbroken symmetry subgroup is


$$H_A = SO(2)_{L_z - Y} \times [SO(2)_{S_z} \rtimes Z_2]$$ [14]


where the nontrivial element of $Z_2$ may be taken to
be $e^{i\pi(S_x + L_z)}$. The covering group of G is, of course,


$$\tilde G = R_Y \times SU(2)_L \times SU(2)_S$$ [15]

Correspondingly,

$$\tilde H_A = R_{L_z - Y} \times [U(1)_{S_z} \rtimes Z_4]$$ [16]


It follows that the homotopy groups are

$$\pi_0(M_A) = 0, \quad \pi_1(M_A) = Z_4, \quad \pi_2(M_A) = Z$$ [17]


There are linear defects labeled by a mod-4 quantum
number and point defects labeled by an integer.

For the $^3$He-B phase, by contrast, the order
parameter is of the form


$$\Phi_{jk} \propto R_{jk} e^{i\theta}$$ [18]

where R is a rotation matrix, $R \in SO(3)$. Here then,

$$M_B = SO(3) \times S^1$$ [19]

with homotopy groups


$$\pi_0(M_B) = 0, \quad \pi_1(M_B) = Z_2 \times Z, \quad \pi_2(M_B) = 0$$ [20]


In this phase, there are two distinct types of linear
defect, the mass vortices with an integer label, and
the spin vortices with a mod-2 label. (One can also
have a “spin-mass vortex” carrying both quantum
numbers.)


Composite Defects 


There are several cases, including in particular
helium-3, that exhibit symmetry breaking with
multiple length or energy scales. For example, there
may be two order parameters, say $\phi, \psi$, with
$|\phi| \gg |\psi|$. If $|\psi|$ is negligible, the symmetry G is
broken by $\phi$ to H, and the manifold of degenerate
ground states is M = G/H. However, these states
are not all exactly degenerate: $\psi$ breaks the
symmetry further to $K \subset H$, so the precisely
degenerate ground states form a submanifold M' = G/K.


The case of helium-3 is slightly different. Here it 
is the small spin-orbit coupling, arising from long- 
range dipole-dipole interactions, that introduces 
the second scale. Its effect is only significant over 
large distances. 

In the $^3$He-A phase, at short range the l and d
vectors are uncorrelated but, over large distances,
they tend to be aligned parallel or antiparallel. We
can use the $Z_2$ symmetry mentioned earlier to
choose l = d. Hence, the manifold $M'_A$ of true
ground states is only a submanifold of $M_A$, namely
$M'_A = SO(3)$, whose homotopy groups are


$$\pi_0(M'_A) = 0, \quad \pi_1(M'_A) = Z_2, \quad \pi_2(M'_A) = 0$$ [21]


Because of different behavior on different scales,
“composite” defects can arise. For example, because
$\pi_2(M_A) = Z$, there are short-range monopole
configurations. For the n = 1 monopole, we have a
configuration with uniform l, and with d pointing
outwards from the center. But, eventually the
misalignment of d with l is energetically disfavored,
and at large distances d tends to rotate to align with
l except around one particular direction where it is
oppositely aligned (see Figure 7). We have a
composite defect: a small monopole coupled to a
relatively fat string.

To see how the small- and large-scale structures fit
together, one has to look also at the relative
homotopy groups $\pi_n(M, M')$, whose elements are
homotopy classes of maps from $I^n$ to M such that
one face of the boundary is mapped into M', and the
remainder to the chosen base point $m_0$. For example,
$\pi_1(M, M')$ classifies paths that terminate at $m_0$ while
beginning at any point of M'. There is, in fact, a
long exact sequence, similar to [9], relating these
homotopy groups, of which a typical segment is


$$\cdots \to \pi_n(M') \to \pi_n(M) \to \pi_n(M, M') \to \pi_{n-1}(M') \to \cdots$$ [22]


Figure 7 Cross-section of a short-range monopole attached to
a fat string.

The relevant groups in the present case are

$$\pi_1(M_A, M'_A) = Z_2, \quad \pi_2(M_A, M'_A) = Z$$ [23]


Because $\pi_1(M_A) = Z_4$, there are three distinct classes
of linear defects at small scales, but only those with
quantum number n = 2 (mod 4) survive unchanged to
large scales; they correspond to the nontrivial element
of $\pi_1(M'_A) = Z_2$. On the other hand, the homotopy
classes $n = \pm 1$ (mod 4) are mapped to nontrivial
elements of $\pi_1(M_A, M'_A) = Z_2$, which indicates that
the corresponding linear defects are coupled at long
range to fat domain walls, across which d rotates
through $\pi$ with a compensating rotation through $\pi$
about l. Similarly, the nontrivial elements of
$\pi_2(M_A) = Z$ are mapped to nontrivial elements of
$\pi_2(M_A, M'_A)$, confirming that these short-range
monopoles are coupled to fat strings, as in Figure 7.

For $^3$He-B, the effect of the spin-orbit coupling
is to make the most energetically favorable
configurations those in which the rotation
matrix R in [18] represents a rotation about an
arbitrary axis n through the Leggett angle $\theta_L =
\arccos(-1/4) \approx 104°$: $R = \exp(-i\theta_L \mathbf{n} \cdot \mathbf{J})$.

Consequently,


$$M'_B = S^2 \times S^1$$ [24]

and so

$$\pi_0(M'_B) = 0, \quad \pi_1(M'_B) = Z, \quad \pi_2(M'_B) = Z$$ [25]

The relative homotopy groups are


$$\pi_1(M_B, M'_B) = Z_2, \quad \pi_2(M_B, M'_B) = 0$$ [26]


Here the mass vortex persists at long range, but the
configuration around the spin vortex deforms so
that they become attached to fat domain walls. The
“monopole” configurations corresponding to
nontrivial elements of $\pi_2(M'_B)$ have no short-range
singularity at all.


See also: Abelian Higgs Vortices; Leray—Schauder 
Theory and Mapping Degree; Liquid Crystals; Phase 
Transition Dynamics; Quantum Field Theory: A Brief 
Introduction; Quantum Fields with Topological Defects; 
Solitons and Other Extended Field Configurations; String 
Topology: Homotopy and Geometric Perspectives; 
Symmetries and Conservation Laws; Symmetry Breaking 
in Field Theory; Variational Techniques for 
Ginzburg—Landau Energies. 
Introduction


It is well known that large-N Hermitian matrix models
generate Feynman diagrams which represent the
triangulation of Riemann surfaces. For instance, if we
consider the integral of an N × N Hermitian matrix H


$$Z = \int dH \, \exp\left[-N\left(\tfrac{1}{2}\,\mathrm{tr}\,H^2 + \tfrac{\lambda}{4}\,\mathrm{tr}\,H^4\right)\right]$$ [1]


we find that the free energy F = log Z has the 1/N
expansion


$$F = \sum_{g=0}^{\infty} N^{2-2g} F_g(\lambda)$$ [2]


Inspection of the Feynman diagrams shows that $F_g$
reproduces the sum over the triangulations of genus
g Riemann surfaces. The theory [1] is obviously well
defined for $\lambda > 0$. In the large-N expansion, the
theory continues to exist also at negative values of
$\lambda$ down to the critical point $\lambda_c = -1/12$.

The double scaling limit of large-N matrix
models (Brézin and Kazakov 1990, Douglas and
Shenker 1990, Gross and Migdal 1990) is given by
adjusting the coupling $\lambda$ to $\lambda_c$ and at the same time
taking the limit $N \to \infty$. In this limit, contributions of
all genera survive, and the theory describes the
dynamics of fluctuating surfaces of arbitrary
topologies. Results obtained in this way do not, in fact,
depend on the detailed choice of the potential ($\phi^4$ type
in [1]) and have a high degree of universality. Thus, it
provides an interesting model of two-dimensional (2D)
quantum gravity.

Soon after the discovery of the double scaling limit of
matrix models, Witten observed that the correlation
functions of the 2D gravity theory may be given a
geometrical interpretation as topological invariants
of the moduli space $\mathcal{M}$ of Riemann surfaces, and
that the 2D gravity theory may be reformulated as a
topological field theory (Witten 1990). This
reformulation of the results of the 2D gravity theory is
called “2D topological gravity.”

In fact, 2D gravity theories come in a family
parametrized by a pair of integers (p, q). The double
scaling limit of [1] gives the simplest example
(p = 2, q = 1). Models with a chain of p − 1
Hermitian matrices give the (p, q) 2D gravity theories. The
label q stands for the order of criticality of the
model, and higher values of q are achieved by
fine-tuning the parameters of the potential. At q = 1, 2D
gravity theories possess a topological interpretation.
The most basic case (p = 2, q = 1) is called pure
topological gravity, and in theories at higher values
of p, topological gravity is coupled to a matter
system, that is, topological minimal models.
Topological minimal models are obtained by twisting
N = 2 superconformal field theories.

Let us first consider the case of pure gravity (p = 2, q = 1).
Let $O_n$ denote the observables in the theory and $t_n$ the
coupling constants to these operators. The correlation
functions of topological gravity are given by

$$\langle O_{n_1} O_{n_2} \cdots O_{n_s} \rangle_g, \qquad n_i = 1, 3, \ldots$$ [3]

where $\langle \cdots \rangle_g$ denotes the expectation value on a
surface with g handles. The precise significance of
eqn [3] as the intersection number on the moduli
space is discussed below. The string partition
function $\tau(t)$ is defined as the generating function
of all possible correlation functions


$$\tau(t) = \exp\left( \sum_{g=0}^{\infty} \left\langle \exp\left( \sum_n t_n O_n \right) \right\rangle_g \right)$$ [4]


The most striking aspect of topological gravity is
the connection of the intersection theory on $\mathcal{M}$ to
the theory of completely integrable systems, that is,
the Korteweg-de Vries (KdV) and KP hierarchies.
Witten conjectured that the generating function $\tau(t)$ of
intersection numbers on moduli space is the
τ-function of the KdV hierarchy. The KdV hierarchy is
obtained by generalizing the well-known KdV
equation


$$\frac{\partial u}{\partial t} = \frac{3}{2}\, u \frac{\partial u}{\partial x} + \frac{1}{4} \frac{\partial^3 u}{\partial x^3}$$ [5]

Identification of the KdV equation with topological
gravity is given by $u = 2\langle O_1 O_1 \rangle$, $x = t_1$, $t = t_3$.
Witten’s conjecture was verified by Kontsevich
(1991) by an explicit construction of a new type of
matrix model which generates the triangulation of
the moduli space of Riemann surfaces.

In the general case of (p, 1) topological gravity,
the partition function of the theory obeys the
equations of the pth generalized KdV hierarchy (p
reduction of the KP hierarchy).


Intersection Theory 


We now present some basic features of intersection
theory on the moduli space of Riemann surfaces. It
is known that 2D oriented surfaces $\Sigma$ with g handles
and s marked points $x_i$ (i = 1, ..., s) possess a
finite-dimensional family of inequivalent complex structures (complex
structures are identified when they differ only by
diffeomorphism). The space of inequivalent complex
structures is called the moduli space $\mathcal{M}_{g,s}$ of the
Riemann surface $\Sigma$. Its dimension is given by


$$\dim \mathcal{M}_{g,s} = 3g - 3 + s$$ [6]


For a mathematically rigorous treatment, we have to
consider a compactification $\bar{\mathcal{M}}_{g,s}$ of moduli space
$\mathcal{M}_{g,s}$ by adding suitable boundary components
which arise due to various types of degenerations
of Riemann surfaces. In the Deligne-Mumford or
stable compactification, one considers the following
three classes of singular Riemann surfaces $\Sigma$:


1. Two points, $x_i$ and $x_j$, on $\Sigma$ come close together. In
this case, an extra 2-sphere is pinched off from the
surface by forming a thin neck. The sphere contains
points $x_i$ and $x_j$ and also a third point at the end of
the neck (see Figure 1a). Since the original surface
now has one point less and the 2-sphere with three
points has no moduli, the degenerate surface has
3g − 4 + s parameters and forms a boundary
divisor of the moduli space $\bar{\mathcal{M}}_{g,s}$.

2. If a cycle of nontrivial homology class shrinks to
a point, we have a surface with one less genus
and two extra marked points. The singular surface
has 3(g − 1) − 3 + s + 2 moduli and
this is again a complex codimension-1
component (see Figure 1b).

3. Similarly, if a dividing cycle pinches, one obtains
two disconnected surfaces of genus $g_i$ with $s_i + 1$
marked points ($i = 1, 2$; $g_1 + g_2 = g$, $s_1 + s_2 = s$).
This type of degeneration also has the same
number of parameters $\sum_i (3g_i - 3) + \sum_i (s_i + 1) =
3g - 4 + s$ (see Figure 1c).

Figure 1 Degenerate Riemann surface obtained when (a) the
points $x_i$ and $x_j$ coincide; (b) a nontrivial cycle collapses, two
new points are created; (c) a pinching cycle collapses,
two new points are created.
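
The parameter counts for the three degenerations can be checked with a few lines of arithmetic. This is a minimal sketch based only on the dimension formula [6]; the function names `dim_moduli` and `boundary_dims` are hypothetical, and the split of genus and marked points in the third case is passed in by hand.

```python
def dim_moduli(g, s):
    """Complex dimension 3g - 3 + s of the moduli space, eqn [6]."""
    return 3 * g - 3 + s


def boundary_dims(g, s, g1, s1):
    """Parameter counts of the three degenerations of a genus-g surface with s points.

    g1, s1 give the genus and marked points carried by one of the two pieces
    when a dividing cycle pinches; the other piece carries g - g1 and s - s1.
    """
    g2, s2 = g - g1, s - s1
    # 1. two points collide: one point less, plus a three-punctured sphere
    colliding_points = dim_moduli(g, s - 1) + dim_moduli(0, 3)
    # 2. a nontrivial cycle shrinks: genus drops by one, two new points
    nonseparating = dim_moduli(g - 1, s + 2)
    # 3. a dividing cycle pinches: two surfaces, one new point on each
    separating = dim_moduli(g1, s1 + 1) + dim_moduli(g2, s2 + 1)
    return colliding_points, nonseparating, separating


print(boundary_dims(3, 4, 1, 2), 3 * 3 - 4 + 4)
```

In each case the count is one less than the dimension of the moduli space, which is what makes these loci boundary divisors.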


It is known that $\bar{\mathcal{M}}_{g,s}$ is a compact and smooth
orbifold space, and observables of topological gravity
are given by the cohomology classes on $\bar{\mathcal{M}}_{g,s}$. There
exist special cohomology classes introduced by
Mumford and Morita, which are defined as follows:
There are natural line bundles $\mathcal{L}_1, \ldots, \mathcal{L}_s$ on the
moduli space $\bar{\mathcal{M}}_{g,s}$. The fiber of the bundle $\mathcal{L}_i$ at a
point $\Sigma \in \bar{\mathcal{M}}_{g,s}$ is the cotangent space $T^*_{x_i}\Sigma$ to the
point $x_i$ on the surface $\Sigma$. These line bundles have the
first Chern classes $c_1(\mathcal{L}_i)$ and by taking their exterior
power we can define 2n-dimensional classes


$$\sigma_n(i) = c_1(\mathcal{L}_i)^n \in H^{2n}(\bar{\mathcal{M}}_{g,s})$$ [7]


Correlation functions are defined by integrating
these classes over the moduli space:


$$\langle \sigma_{n_1} \cdots \sigma_{n_s} \rangle_g = \int_{\bar{\mathcal{M}}_{g,s}} c_1(\mathcal{L}_1)^{n_1} \wedge \cdots \wedge c_1(\mathcal{L}_s)^{n_s}$$ [8]


These integrals are topological invariants of $\bar{\mathcal{M}}_{g,s}$
and are nonzero only when the degree of the
cohomology classes adds up to the dimension of
the moduli space


$$\sum_{i=1}^{s} n_i = 3g - 3 + s$$ [9]


$\sigma_n(i)$ is known as the nth descendant of the puncture
operator $\sigma_0(i)$, since it is associated with the marked
point $x_i$.

The above correlation functions are evaluated
using various recursion relations. First, one has the
puncture equation


$$\langle \sigma_0 \sigma_{n_1} \cdots \sigma_{n_s} \rangle_g = \sum_{i=1,\, n_i \neq 0}^{s} \langle \sigma_{n_1} \cdots \sigma_{n_i - 1} \cdots \sigma_{n_s} \rangle_g$$ [10]


which can be derived by considering a map
$\pi: \bar{\mathcal{M}}_{g,s+1} \to \bar{\mathcal{M}}_{g,s}$ where one forgets the position of
an extra point. Contributions arise when the
forgotten point coincides with the other points. This
relation can be used to eliminate $\sigma_0$’s from
correlation functions when they are well defined. At g = 0,
correlation functions with fewer than three insertions
are ill-defined and one has


$$\langle \sigma_0 \sigma_0 \sigma_0 \rangle_0 = 1$$ [11]


Another basic relation is the dilaton equation for
the operator $\sigma_1$:


$$\langle \sigma_1 \sigma_{n_1} \cdots \sigma_{n_s} \rangle_g = (2g - 2 + s) \langle \sigma_{n_1} \cdots \sigma_{n_s} \rangle_g$$ [12]


The dilaton equation follows from the fact that since
$\sigma_1$ is the first Chern class $c_1(\mathcal{L})$, it calculates the
degree of the canonical line bundle of a genus g
surface with s punctures. At g = 1, one insertion is
required and one has


$$\langle \sigma_1 \rangle_1 = \frac{1}{24}$$ [13]


By combining these recursion relations, one can
evaluate the correlation functions. For instance, at
g = 0 one finds


$$\langle \sigma_{n_1} \cdots \sigma_{n_s} \rangle_0 = \frac{(n_1 + \cdots + n_s)!}{n_1! \cdots n_s!}$$ [14]
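
The genus-0 result can be reproduced by iterating the puncture equation down to the three-point function. The following is a minimal sketch, assuming the puncture equation [10], the normalisation [11] and the selection rule [9] in their standard forms; the function names `genus0` and `closed_form` are hypothetical.

```python
from functools import lru_cache
from math import factorial


@lru_cache(maxsize=None)
def genus0(ns):
    """<sigma_{n_1} ... sigma_{n_s}>_0 from the puncture equation [10] and [11].

    `ns` must be a sorted tuple of non-negative integers.
    """
    s = len(ns)
    if s < 3 or sum(ns) != s - 3:        # selection rule [9] at g = 0
        return 0
    if ns == (0, 0, 0):
        return 1                          # normalisation [11]
    # sum(ns) < s forces at least one n_i = 0; in a sorted tuple it is ns[0].
    rest = ns[1:]
    total = 0
    for i, n in enumerate(rest):
        if n > 0:
            lowered = rest[:i] + (n - 1,) + rest[i + 1:]
            total += genus0(tuple(sorted(lowered)))
    return total


def closed_form(ns):
    """Right-hand side of [14]: (n_1 + ... + n_s)! / (n_1! ... n_s!)."""
    value = factorial(sum(ns))
    for n in ns:
        value //= factorial(n)
    return value


for ns in [(0, 0, 0, 1), (0, 0, 0, 1, 1), (0, 0, 0, 0, 1, 2)]:
    print(ns, genus0(ns), closed_form(ns))
```

The recursion terminates because each application removes one $\sigma_0$ and lowers one index, so the selection rule is preserved at every step.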
A powerful way of computing correlation functions is
given by the KdV hierarchies and Virasoro conditions
as discussed below. In the context of integrable
systems, it is convenient to redefine the observables as


$$O_{2n+1} = (2n+1)!! \cdot \sigma_n, \qquad n \geq 0$$ [15]


Topological Minimal Models 


Standard intersection theory applies to the case of
pure topological gravity, p = 2. At higher values of
p, the theory is generalized as follows: one
introduces the coupling of topological gravity to the
topological matter sector which is obtained by
twisting the N = 2 superconformal theories.

We recall that N = 2 superconformal symmetry is
generated by the stress tensor T(z), the U(1)
current J(z), and two types of supersymmetry
generators $G^\pm(z)$. (In the holomorphic sector of the
theory these operators depend on the holomorphic
coordinate z of the Riemann surface. In the
antiholomorphic sector they depend on the antiholomorphic
variable $\bar z$.) Mode expansion of the stress tensor and
U(1) current is given by


$$T(z) = \sum_n L_n z^{-n-2}, \qquad J(z) = \sum_n J_n z^{-n-1}$$ [16]


$L_n$ generates the Virasoro algebra


$$[L_m, L_n] = (m - n) L_{m+n} + \frac{c}{12}\, m(m^2 - 1)\, \delta_{m+n,0}$$ [17]

where c denotes the central charge of the theory.
Commutators of $J_m$ and $L_n$ are given by


$$[J_m, J_n] = \frac{c}{3}\, m\, \delta_{m+n,0}, \qquad [L_m, J_n] = -n J_{m+n}$$ [18]

It is known that there is a continuum of unitary
N = 2 conformal theories in the range $c \geq 3$;
however, only discrete values of the central charge
$c = 3k/(k+2)$, k = 1, 2, ... are allowed in the
region $3 > c \geq 1$. These are the N = 2 minimal
models labeled by the level k. Only a finite
number of primary fields exist in these theories.

In N = 2 theory, primary fields $\phi_a$ are characterized
by their conformal dimension and U(1) charge:


$$L_0 |\phi_a\rangle = h |\phi_a\rangle, \qquad J_0 |\phi_a\rangle = q |\phi_a\rangle$$ [19]


There exists a special set of primary operators, chiral
primary fields $\phi_\ell$ ($\ell = 0, \ldots, k$), which are annihilated
by the supercharge operator $G^+$:


$$\oint dz\, G^+(z) |\phi_\ell\rangle = 0$$ [20]


$\phi_\ell$ has the dimension and U(1) charge


$$q(\phi_\ell) = \frac{\ell}{k+2}, \qquad h(\phi_\ell) = \frac{1}{2} q(\phi_\ell), \qquad \ell = 0, 1, \ldots, k$$ [21]


By considering primary fields annihilated by $G^-$,
we can also define antichiral fields. Antichiral fields
have U(1) charge opposite to those of chiral fields.
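
The spectrum of chiral primaries is easy to tabulate with exact rational arithmetic. The sketch below assumes $c = 3k/(k+2)$, $q = \ell/(k+2)$ and $h = q/2$. The last column uses the shift $L_0 \to L_0 - \frac{1}{2} J_0$, which is what adding $\frac{1}{2}\partial J$ to the stress tensor does to the zero mode under the mode expansions [16]; this shift is a derived assumption, not a formula quoted from the text. The function name `minimal_model` is hypothetical.

```python
from fractions import Fraction


def minimal_model(k):
    """Central charge and chiral-primary data of the level-k N=2 minimal model."""
    c = Fraction(3 * k, k + 2)
    chiral = []
    for ell in range(k + 1):
        q = Fraction(ell, k + 2)     # U(1) charge, eqn [21]
        h = q / 2                    # conformal dimension, eqn [21]
        h_shifted = h - q / 2        # weight under L_0 - J_0/2 (assumption, see lead-in)
        chiral.append((ell, h, q, h_shifted))
    return c, chiral


c, chiral = minimal_model(3)
print("c =", c)
for ell, h, q, h_shifted in chiral:
    print(ell, h, q, h_shifted)
```

The vanishing last column shows that every chiral primary has weight zero with respect to the shifted zero mode, consistent with their later role as observables of a topological theory.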
If one defines the twisted stress tensor by

$$T'(z) = T(z) + \frac{1}{2} \partial J(z)$$ [22]


then T'(z) has a vanishing central charge.
Furthermore, the conformal dimensions of the
supersymmetry operators $G^\pm$ become shifted from 3/2 to
$h(G^+) = 1$ and $h(G^-) = 2$. It is then possible to
integrate $G^+$ on the Riemann surface and define a
fermionic scalar operator $G_0^+ = \oint dz\, G^+(z)$. From the
N = 2 algebra, one has

$$(G_0^+)^2 = 0, \qquad \{G_0^+, G^-(z)\} = 2T'(z)$$ [23]

If we identify $G_0^+$ as the Becchi-Rouet-Stora-Tyutin
(BRST) operator of the theory, then the twisted
stress tensor becomes BRST trivial, which is the
characteristic feature of topological field theory.
Thus, we obtain a topological field theory by twisting
N = 2 conformal theory (Eguchi and Yang 1990).
These are topological minimal models.
BRST-invariant observables are given by the chiral primary fields
[20]. (To be precise, when we take account of the
antiholomorphic sector, we may define either $Q =
G_0^+ + \bar G_0^+$ or $Q = G_0^+ + \bar G_0^-$ as the BRST operator.
Thus, in general, we obtain two different topological
field theories. This is the origin of the mirror
symmetry. In the context of topological gravity, one
takes the convention $Q = G_0^+ + \bar G_0^+$.)

Now, we consider the coupling of topological
gravity to topological minimal models. We identify
k = p − 2. Making use of chiral fields $\phi_\ell$ ($\ell = 0, \ldots,
p - 2$), observables are constructed:


$$\sigma_{n,\ell} = \sigma_n \otimes \phi_\ell$$ [24]


The N = 2 U(1) charge is identified as the degree of
a differential form on the moduli space. Thus, the
degree of $\sigma_{n,\ell}$ is $n + \ell/p$. Correlation functions
$\langle \prod_{i=1}^{s} \sigma_{n_i,\ell_i} \rangle_g$ are nonzero if the selection rule


$$\sum_{i=1}^{s} \left( n_i + \frac{\ell_i}{p} \right) = \left( 3 - \frac{p-2}{p} \right)(g - 1) + s$$ [25]


is obeyed.
We may assemble $\sigma_{n,\ell}$ into operators with one
index $O_m$ as


$$O_{np+\ell+1} = \prod_{r=0}^{n} (pr + \ell + 1) \cdot \sigma_{n,\ell}$$ [26]


where one introduces a convenient normalization
factor. Note that the operators $O_m$ do not exist
when m = 0 mod p and the corresponding parameters
$t_m$ are absent. This is a characteristic feature of the
p-reduced KP hierarchy.
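The puncture and dilaton equations for (p, 1)
theories read


$$\langle \sigma_{0,0}\, \sigma_{n_1,k_1} \cdots \sigma_{n_s,k_s} \rangle_g = \sum_{i=1,\, n_i \neq 0}^{s} \langle \sigma_{n_1,k_1} \cdots \sigma_{n_i - 1,k_i} \cdots \sigma_{n_s,k_s} \rangle_g$$ [27]


$$\langle \sigma_{1,0}\, \sigma_{n_1,k_1} \cdots \sigma_{n_s,k_s} \rangle_g = (2g - 2 + s) \langle \sigma_{n_1,k_1} \cdots \sigma_{n_s,k_s} \rangle_g$$ [28]

The special terms at g = 0 and g = 1 are given by


$$\langle \sigma_{0,0}\, \sigma_{0,i}\, \sigma_{0,p-i-2} \rangle_0 = 1, \qquad \langle \sigma_{1,0} \rangle_1 = \frac{p-1}{24}$$ [29]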
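
It is a useful sanity check that the special correlators at g = 0 and g = 1 are compatible with the selection rule of the (p, 1) theory. The sketch below assumes the rule in the form $\sum_i (n_i + \ell_i/p) = (3 - (p-2)/p)(g-1) + s$, as read here from eqn [25]; the function name `selection_rule_holds` is hypothetical.

```python
from fractions import Fraction


def selection_rule_holds(p, g, insertions):
    """Check the selection rule for insertions given as (n_i, l_i) pairs.

    Each pair stands for sigma_{n_i, l_i} with 0 <= l_i <= p - 2.
    """
    s = len(insertions)
    lhs = sum(Fraction(n) + Fraction(l, p) for n, l in insertions)
    rhs = (3 - Fraction(p - 2, p)) * (g - 1) + s
    return lhs == rhs


p = 5
for i in range(p - 1):
    three_point = [(0, 0), (0, i), (0, p - i - 2)]
    print(i, selection_rule_holds(p, 0, three_point))
print(selection_rule_holds(p, 1, [(1, 0)]))
```

For p = 2 the only allowed label is $\ell = 0$, and the rule reduces to the pure-gravity condition $\sum_i n_i = 3g - 3 + s$ of eqn [9].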


Integrable Hierarchy 


We now summarize some basic facts about the
integrable hierarchy (see for instance eqn [5]). We
introduce a pth order differential operator:


$$L = D^p + \sum_{i=0}^{p-2} u_i(x) D^i, \qquad D = \frac{\partial}{\partial x}$$ [30]


where the coefficient functions $u_i$ are arbitrary
functions of x. This Lax operator describes the pth
generalized KdV hierarchy. We consider the time
evolution of the operator L by an infinite set of
commuting Hamiltonians:


$$\frac{\partial L}{\partial t_n} = [H_n, L]$$ [31]

where $H_n$ is given by

$$H_n = (L^{n/p})_+$$ [32]

Here “+” denotes the non-negative part of a
pseudodifferential operator and is defined as


$$A = \sum_{i=-\infty}^{n} f_i(x) D^i, \qquad A_+ = \sum_{i=0}^{n} f_i(x) D^i$$ [33]

We also use the notation


$$\mathrm{res}\, A = f_{-1}(x), \qquad A_- = \sum_{i=-\infty}^{-1} f_i(x) D^i$$ [34]


Note that x is identified as the first time variable $t_1$,
that is, $x = t_1$.


It is a basic result of the calculus of
pseudodifferential operators that the above Hamiltonians satisfy
the zero-curvature condition


$$\frac{\partial H_i}{\partial t_j} - \frac{\partial H_j}{\partial t_i} + [H_i, H_j] = 0$$ [35]


Note that when m is a multiple of p, $H_m$ becomes a
power of L and trivially commutes with L. Thus, the
time variables $t_n$ are absent for n = 0 mod p. In the
simple case of p = 2, one has


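$$L = D^2 + u(x)$$ [36]

and $H_3 = D^3 + (3/2) u D + (3/4) u'$. One finds


$$\frac{\partial L}{\partial t_3} = \frac{\partial u}{\partial t_3} = [H_3, L] = \frac{3}{2}\, u \frac{\partial u}{\partial x} + \frac{1}{4} \frac{\partial^3 u}{\partial x^3}$$ [37]

which is the standard KdV equation.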
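
The commutator identity behind the KdV equation can be tested without a computer-algebra system by letting both sides act on polynomials with exact rational coefficients. This is a minimal sketch, assuming $L = D^2 + u$ and $H_3 = D^3 + (3/2) u D + (3/4) u'$; polynomials are stored as coefficient lists (constant term first) and all helper names are hypothetical.

```python
from fractions import Fraction


def add(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(n)]


def scale(c, a):
    return [c * x for x in a]


def mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def d(a, times=1):
    """Differentiate a polynomial `times` times."""
    for _ in range(times):
        a = [i * a[i] for i in range(1, len(a))] or [Fraction(0)]
    return a


def trim(a):
    a = list(a)
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a


def L(u, f):                       # L f = f'' + u f
    return add(d(f, 2), mul(u, f))


def H3(u, f):                      # H_3 f = f''' + (3/2) u f' + (3/4) u' f
    return add(add(d(f, 3), scale(Fraction(3, 2), mul(u, d(f)))),
               scale(Fraction(3, 4), mul(d(u), f)))


def commutator_on(u, f):           # [H_3, L] f
    return add(H3(u, L(u, f)), scale(-1, L(u, H3(u, f))))


def kdv_rhs_on(u, f):              # ((3/2) u u' + (1/4) u''') f
    rhs = add(scale(Fraction(3, 2), mul(u, d(u))),
              scale(Fraction(1, 4), d(u, 3)))
    return mul(rhs, f)


u = [Fraction(c) for c in (1, -2, 0, 3, 5)]
f = [Fraction(c) for c in (2, 1, -1, 4)]
print(trim(commutator_on(u, f)) == trim(kdv_rhs_on(u, f)))
```

The check confirms, for the chosen test polynomials, that all derivative terms in the commutator cancel and only multiplication by $(3/2) u u' + (1/4) u'''$ survives.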

In the case of the KP hierarchy, one starts with a
pseudodifferential operator


$$Q = D + \sum_{i=1}^{\infty} q_i D^{-i}$$ [38]

and considers the time evolution equations

$$\frac{\partial Q}{\partial t_n} = [H_n, Q], \qquad H_n = (Q^n)_+$$ [39]


The p-reduced KP hierarchy is obtained if one has

$$(Q^p)_- = 0$$ [40]


By introducing a pseudodifferential operator K, one
may bring Q to the simple derivative operator D as


$$Q = K D K^{-1}$$ [41]


K has an expansion of the form

$$K = 1 + \sum_{i=1}^{\infty} a_i D^{-i}$$ [42]


After time evolution, the coefficient functions $u_i(x)$
of the Lax operator depend also on the variables
$t_2, t_3, \ldots$ and become functions of $t = \{t_1, t_2, \ldots\}$.
These functions are expressed by the τ-function $\tau(t)$
of the hierarchy in the following manner:





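$$\mathrm{res}\, K = -\frac{\partial}{\partial t_1} \log \tau(t)$$ [43]

$$\mathrm{res}\, L^{i/p} = \frac{\partial^2}{\partial t_1 \partial t_i} \log \tau(t)$$ [44]


These residues are expressed in terms of $\{u_i\}$ and
their derivatives in x, and one can determine them in
terms of the τ-function.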


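In the case p = 2, one has


$$[H_k, L] = 2D\, \mathrm{res}(L^{k/2}) = D R_k, \qquad k \ \mathrm{odd}$$ [45]


Here $\{R_k\}$ are the Gelfand-Dikii potentials

$$R_1 = u, \qquad R_3 = \tfrac{1}{4}(3u^2 + u''), \qquad R_5 = \tfrac{1}{16}(10u^3 + 5u'^2 + 10uu'' + u'''')$$ [46]


and obey the recursion relation

$$D R_{k+2} = \tfrac{1}{4}\left(D^3 + 2(Du + uD)\right) R_k$$ [47]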
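
As a quick check of the recursion, the first step from $R_1 = u$ to $R_3$ can be worked out by hand. In the sketch below, $Du$ is read as the operator that multiplies by u and then differentiates, so that $(Du) R = (uR)'$; this reading is an assumption about the notation.

```latex
\begin{align*}
\tfrac14\bigl(D^3 + 2(Du + uD)\bigr) R_1
  &= \tfrac14\bigl(u''' + 2(u^2)' + 2uu'\bigr) \\
  &= \tfrac14\bigl(u''' + 6uu'\bigr) \\
  &= D\Bigl[\tfrac14\bigl(3u^2 + u''\bigr)\Bigr] = D R_3 .
\end{align*}
```

The right-hand side is $\frac{3}{2} u u' + \frac{1}{4} u'''$, which is the same expression that appears in the KdV equation for $\partial u / \partial t_3$.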


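If one uses the relation [44], Gelfand-Dikii potentials
are identified as


$$R_k = 2\langle O_1 O_k \rangle$$ [48]


By setting k = 1, we note $u = 2\langle O_1 O_1 \rangle$ and find that
the evolution equations [31] are all satisfied as


$$\frac{\partial L}{\partial t_k} = \frac{\partial u}{\partial t_k} = 2\frac{\partial}{\partial t_k} \langle O_1 O_1 \rangle = 2D \langle O_1 O_k \rangle$$ [49]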


Now it is possible to identify the initial condition
for the Lax operator in the case of topological (p, 1)
gravity. By using the definition


$$\log \tau(t) = \sum_{g=0}^{\infty} \left\langle \exp\left( \sum_n t_n O_n \right) \right\rangle_g$$ [50]

one has

$$\mathrm{res}\, L^{i/p}(0) = \langle O_1 O_i \rangle, \qquad i = 1, \ldots, p-1$$ [51]

From [29] one finds

$$\mathrm{res}\, L^{i/p}(0) = i x \cdot \delta_{i,p-1}$$ [52]


This gives the initial value of the Lax operator:

$$L(0) = D^p + px$$ [53]


Thus, only the lowest term $u_0(x) = px$ is nonzero
and higher coefficients all vanish at t = 0. This is the
special simplification which takes place in the
topological gravity theory.

We note a relation


$$\left[ \frac{1}{p} D, L(0) \right] = 1$$ [54]


This is the so-called “string equation” (at t = 0). At
nonzero values of t, the string equation takes the form


$$[P, L] = 1, \qquad P = -\frac{1}{p} \sum_{k=p+1}^{\infty} k\, t_k \left( L^{(k-p)/p} \right)_+$$ [55]


From [55], we see that (p, 1) theory corresponds to the
background value of the coupling $t_{p+1} = -1/(p+1)$.
In the case of (p, q) theory, the background value is given
by $t_{pq+1} = -1/(pq+1)$.


Virasoro Conditions 


A powerful algebraic machinery controlling
the structure of 2D gravity is the so-called “Virasoro
conditions.” One introduces differential operators


$$L_{-1} = -\frac{\partial}{\partial t_1} + \sum_{k=p+1}^{\infty} k\, t_k \frac{\partial}{\partial t_{k-p}} + \frac{1}{2} \sum_{i+j=p} i j\, t_i t_j$$ [56]


$$L_0 = -\frac{\partial}{\partial t_{p+1}} + \sum_{k=1}^{\infty} k\, t_k \frac{\partial}{\partial t_k} + \frac{p^2 - 1}{24}$$ [57]


By using the fact that a derivative in $t_n$ brings down
the operator $O_n$ when acting on the τ-function, it is
easy to show that


$$L_{-1} \tau = 0$$ [58]


$$L_0 \tau = 0$$ [59]


reproduce the puncture [27] and dilaton equation
[28], respectively. It is possible to show that the
$L_{-1}$-condition, $L_{-1} \tau = 0$, is equivalent to the
string equation [55].

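Together with the operators ($n \geq 1$)


$$L_n = -\frac{\partial}{\partial t_{(n+1)p+1}} + \sum_{k=1}^{\infty} k\, t_k \frac{\partial}{\partial t_{k+np}} + \frac{1}{2} \sum_{i+j=np} \frac{\partial^2}{\partial t_i \partial t_j}$$

they generate the Virasoro algebra ($L'_n = (1/p) L_n$)

$$[L'_m, L'_n] = (m - n) L'_{m+n}, \qquad m, n \geq -1$$ [60]


It is possible to show that the (p, 1) model obeys the
Virasoro conditions [6]


$$L_n \tau = 0, \qquad n \geq -1$$ [61]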


It is known that (p, 1) models with p > 2 also obey
constraints of the W-algebra.

The relationship of the Virasoro conditions to
the KdV hierarchy is summarized as


string equation + KdV hierarchy
$\Longleftrightarrow$ Virasoro and W-algebra constraints


Topological σ-Model


It is known that when the target space of a
supersymmetric nonlinear σ-model is a Kähler
manifold K, the theory acquires an enhanced N = 2
supersymmetry. Then we can twist the theory and
convert it into a topological field theory. This is the
topological σ-model [7]. The partition function of
the theory consists of a sum over world-sheet
instantons, that is, holomorphic maps from the
Riemann surface to the target space K. Due to
supersymmetry, functional determinants around
instantons cancel and the theory simply counts the
number of holomorphic curves inside the Kähler
manifold K. Thus, the topological σ-model has a
close relationship with enumerative problems in
algebraic geometry, that is, Gromov-Witten
invariants and quantum cohomology theory.

When the topological σ-model is coupled to
topological gravity, the BRST-invariant observables are given
by $\sigma_n(\Phi_i) = \sigma_n \otimes \Phi_i$, where $\Phi_i$ are cohomology classes
of K. Correlation functions are defined as


$$\left\langle \prod_{i=1}^{s} \sigma_{n_i}(\Phi_i) \right\rangle_{g,d} = \int_{\bar{\mathcal{M}}_{g,s}(K;d)} \prod_{i=1}^{s} c_1(\mathcal{L}_i)^{n_i} \wedge e_i^*(\Phi_i)$$ [62]


Here $\bar{\mathcal{M}}_{g,s}(K; d)$ denotes the (stable compactification
of) moduli space of degree d holomorphic maps
to K from genus g Riemann surfaces $\Sigma$. $e_i^*$ is the
pullback by the evaluation map $e_i: (f; x_1, \ldots, x_s) \in
\bar{\mathcal{M}}_{g,s}(K; d) \to f(x_i) \in K$, where f is a
holomorphic map. Correlation functions [62] give
topological (symplectic) invariants of the manifold
K. In the cases $n_i = 0$ (i = 1, ..., s), they are known as
Gromov-Witten invariants.

Equation [62] is nonvanishing if the selection rule


$$\sum_{i=1}^{s} (n_i + q_i) = \dim \bar{\mathcal{M}}_{g,s}(K; d) = c_1(K)\, d + (3 - \dim K)(g - 1) + s$$ [63]


is obeyed, where $q_i$ is the degree of cohomology
class $\Phi_i$ and $c_1(K)$ is the first Chern class of the
tangent bundle of K.
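
The selection rule can be exercised on a familiar example: genus-0 curves of degree d in the complex projective plane passing through s points. The sketch below assumes the rule in the form $\sum_i (n_i + q_i) = c_1(K) d + (3 - \dim K)(g - 1) + s$, uses the standard fact that $c_1$ of the projective plane pairs to 3 with the class of a line, and takes each point insertion to have $n_i = 0$ and complex degree $q_i = 2$. The function names are hypothetical.

```python
def expected_dimension(c1_dot_d, dim_k, g, s):
    """Right-hand side of the selection rule: c_1(K).d + (3 - dim K)(g - 1) + s."""
    return c1_dot_d + (3 - dim_k) * (g - 1) + s


def points_needed_in_cp2(d):
    """Number of point insertions saturating the rule for g = 0 and K = CP^2.

    Solves 2 s = 3 d + (3 - 2)(0 - 1) + s for s.
    """
    return 3 * d - 1


for d in range(1, 5):
    s = points_needed_in_cp2(d)
    print(d, s, 2 * s == expected_dimension(3 * d, 2, 0, s))
```

For d = 1 this gives s = 2, the statement that a line is fixed by two points, and for d = 2 it gives s = 5 for conics. Setting $c_1(K) d = 0$ and $\dim K = 0$ recovers the pure-gravity dimension $3g - 3 + s$.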

We see that there is a close parallel between the
topological σ-model and (p, 1) topological gravity.
If we formally set $q_i = \ell_i/p$, $c_1(K) = 0$, and $\dim K =
(p - 2)/p$, eqn [63] agrees with eqn [25]. Based on this
analogy, Eguchi, Hori, and Xiong proposed the
Virasoro conjecture [8], that is, generating functions
of the number of holomorphic maps to arbitrary
Kähler manifolds are annihilated by the Virasoro
operators which are constructed by taking an analogy
with those of (p, 1) gravity. The Virasoro conjecture is
a natural generalization of Witten’s conjecture, and
has recently been rigorously proved in the case of
curves and projective spaces.

Excellent reviews on the theory of 2D topological 
gravity are given in Witten (1991) and Dijkgraaf (1991). 


See also: Axiomatic Approach to Topological Quantum 
Field Theory; Large-N and Topological Strings; Mirror 
Symmetry: A Geometric Survey; Moduli Spaces: An 
Introduction; Riemann Surfaces; Topological Sigma 
Models; WDVV Equations and Frobenius Manifolds. 

Introduction to the Physical and 
Mathematical Contexts and Issues 


One of the most exciting developments of mathema- 
tical physics in the last three decades has been the 
discovery of numerous intimate relationships between 
the topology and the geometry of knot theory and the 
dynamics of many domains of “classical” and “new” 
macroscopic physics. Indeed, complex systems of 
knotted and entangled filamentary structures are 
ubiquitous in nature and arise in such disparate 
contexts as electrodynamics, magnetohydrodynamics, 
fluid dynamics (vortex structures), superfluidity, 
dynamical systems, plasma physics, cosmic string 
theory, chaos of magnetic flows and nonlinear 
phenomena, turbulence, polymer physics, and molecular 
biology. In recent years, mathematical tools 
have been developed to identify and analyze the 
geometrical and topological complex structures and 
behaviors of such systems and relate this information 
to energy levels and stable states. 

The influence of geometry and topology on 
macroscopic physics has been especially fruitful in 
the study and comprehension of the following topics. 


1. Knots and braids in dynamical systems. It is 
now clear that the chaotic behavior of the Hénon- 
Heiles system and other nonlinear systems is driven 
and controlled by topological properties. For example, 
it has been found that trajectories in the phase space 
form hyperbolic knots. The finding of knots in the 
Lorenz equations is another important theme closely 
related to the previous. By varying the Rayleigh 
number r, a parameter in the Lorenz equations, both 
chaotic and periodic behavior are observed. In recent 
years, the knots (notably several torus knots) corresponding 
to the different periodic solutions of the 
system have been found and classified. Finding 
hyperbolic knots, and in particular the hyperbolic figure-8 
knot, among the solutions of the Lorenz equations would 
strengthen the suspicion that there exists a new route 
to chaos. 

2. Topological structures of electromagnetic fields. 
Progress in space physics, astronomy, and 
astrophysics over the last decade increasingly reveals 
the significance of topological magnetic fields in these 
areas. In particular, the interaction of plasma and 
magnetic field can create an astonishing variety of 
structures, which often exhibit linked and knotted 
forms of magnetic flux. In these complex structures of 
the fields, huge amounts of magnetic energy can be 
stored. It is, however, a typical property of astrophysical 
plasmas that the dynamics of magnetic fields 
alternates between an ideal motion, where all forms 
of knottedness and linkage of the field are conserved 
(topology conservation), and a kind of disruption of 
the magnetic structure, the so-called magnetic recon- 
nection. In the latter, the magnetic structure breaks up 
and reconnects, a process often accompanied by 
explosive eruptions, where enormous amounts of 
energy are set free. Magnetic reconnection is in close 
analogy to splitting of knots, which makes us 
confident that the global dynamics of magnetic and 
electromagnetic fields can be characterized with the 
help of such topological quantities as well. 

3. Knotting and unknotting of phase singularities. 
It has long been known that dislocation lines can be 
closed, and recently it was shown that they can be 
knotted and linked. Moreover, Berry and Dennis 
(2001) constructed exact solutions of the Helmholtz 
equation representing torus knots and links; in fact, 
a straightforward application of this idea led to 
knotted and linked dislocation lines in stationary 
states of electrons in hydrogen. As a parameter, 
called α, is varied, the topology of dislocation lines 
can change, leading to the creation of knots and 
links from initially simple dislocation loops, and the 
reverse process of unknotting and unlinking. The 
main purpose here is to elucidate the mechanism of 
these changes of topology. All waves are solutions of 
monochromatic wave equations, that is, stationary 
waves, and α is an external parameter that could be 
manipulated experimentally. However, α could 
represent time, and then the analogous solutions of 
time-dependent wave equations would describe 
knotting and linking events in the history of waves. 
The methods of Berry and Dennis are based on exact 
stationary solutions of wave equations, and lead to 
knots and links threaded by multistranded helices. 


The Origins of Topological Vortex 
Dynamics Ideas 


The intimate relationship between three-dimensional 
vortex dynamics and topology was recognized as early 
as 1869 by W Thomson (Lord Kelvin) who tried to 
elaborate a theory of matter in which atoms were 
thought to be tiny vortex filaments embedded in an 
elastic-like fluid medium, called ether. Accordingly, 
the infinite variety of possible chemical compounds 
was given by the endless family of topological 
combinations of linked and knotted vortices. Kelvin 
was inspired by the work of Gauss, who, in an attempt 
to describe topologically the behavior of two inseparably 
linked closed circuits carrying electric current, 
found a relationship between the magnetic action 
induced by the currents and a pure number that 
depends only on the type of link, and not on the 
geometry: this number is the first topological invariant 
now known as the linking number. 

In modern mathematical terms, Gauss introduced 
an invariant of a link consisting of two simple closed 
curves γ_1, γ_2 in R³, namely the signed number of turns 
of one of the curves around the other, the linking 
coefficient {γ_1, γ_2} of the link. His formula for this is 

\[
N = \{\gamma_1, \gamma_2\}
  = \frac{1}{4\pi} \oint_{\gamma_1} \oint_{\gamma_2}
    \frac{\big( [d\gamma_1(t_1), d\gamma_2(t_2)],\; \gamma_1(t_1) - \gamma_2(t_2) \big)}
         {|\gamma_1(t_1) - \gamma_2(t_2)|^{3}}
\tag{1}
\]

where [ , ] denotes the vector (or cross) product of 
vectors in R³ and ( , ) the Euclidean scalar product. 
This integral always has an integer value N. If 
we take one of the curves to be the z-axis in R³ and 
the other to lie in the (x, y)-plane, then the formula 
[1] gives the net number of turns of the plane curve 
around the z-axis. It is interesting to note that the 
linking coefficient [1] may be zero even though the 
curves are nontrivially linked. Thus, its having 
nonzero value represents only a sufficient condition 
for nontrivial linkage of the loops. This last 
consideration leads naturally to the mathematical 
concepts of knots and links whose most striking 
properties have been investigated in our introduc- 
tory article (see Mathematical Knot Theory). 
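
The Gauss integral [1] can be checked numerically. The following is a minimal sketch, assuming each closed curve is supplied as an array of sample points and the double integral is approximated by a midpoint rule over polygon segments; the function name and the two test circles are illustrative choices, not part of the original text.

```python
import numpy as np


def gauss_linking_number(curve1, curve2):
    """Approximate the Gauss integral [1] for two disjoint closed curves.

    curve1, curve2: arrays of shape (n, 3) listing points along each curve
    in order; the last point is implicitly joined back to the first.
    """
    # Segment vectors play the role of d(gamma); midpoints sample the curves.
    d1 = np.roll(curve1, -1, axis=0) - curve1
    d2 = np.roll(curve2, -1, axis=0) - curve2
    m1 = curve1 + 0.5 * d1
    m2 = curve2 + 0.5 * d2
    # r[i, j] = gamma_1(t_i) - gamma_2(t_j), shape (n1, n2, 3).
    r = m1[:, None, :] - m2[None, :, :]
    cross = np.cross(d1[:, None, :], d2[None, :, :])
    integrand = np.einsum("ijk,ijk->ij", cross, r) / np.linalg.norm(r, axis=2) ** 3
    return integrand.sum() / (4.0 * np.pi)


t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
zeros = np.zeros_like(t)
# Unit circle in the (x, y)-plane and a unit circle in the (x, z)-plane
# passing through its center: together they form a Hopf link.
circle_xy = np.stack([np.cos(t), np.sin(t), zeros], axis=1)
circle_xz = np.stack([1.0 + np.cos(t), zeros, np.sin(t)], axis=1)
print(gauss_linking_number(circle_xy, circle_xz))
```

Reversing the orientation of one curve flips the sign of the result, and moving the second circle far away drives the value to zero, as expected for an unlinked pair.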

The other source of inspiration for Kelvin’s theory 
of matter was Helmholtz’s laws of vortex 
motion, which state that in an ideal fluid (where 
there is no viscosity) a vortex lives forever: two closed 
vortex rings, once linked, will always be linked. The 
classical results obtained by Helmholtz are basic to 
understanding the dynamics of Euler motions. The 
vorticity of a velocity field is its curl and is denoted 
ω_t(z) := curl(X(z, t)). In two dimensions, the vorticity 
is a real-valued function and ω_t = −ΔΨ, where Ψ is 
the stream function of X(z, t). Recall that the push-forward 
of a scalar field (0-form) s under a 
diffeomorphism f is f_* s = s ∘ f^{-1}. These results, in 
modern terms, can be stated as follows: 


Theorem (Helmholtz–Kelvin). An incompressible 
fluid motion (M_t, φ_t) with velocity field X and 
vorticity ω_t is Euler if and only if its vorticity is 
passively transported, 

\[ (\phi_t)_{*}\, \omega_0 = \omega_t \]

and the circulation around every smooth simple closed 
curve C is preserved under the flow, 

\[ \frac{d}{dt} \oint_{\phi_t(C)} X \cdot dr = 0 \]


One knows that in three dimensions, the Helmholtz- 
Kelvin theorem says that the vorticity (now a vector 
field) is transported. Thus, with generic initial 
vorticity a 3D time-periodic Euler fluid motion 
preserves a nontrivial vector field. One very interest- 
ing question that remains to be elucidated is the 
following: are there any chaotic, time-periodic Euler 
flows with stationary boundaries? 


The Connection between Topological and 
Numerical Invariants of Knots and the 
Physical Helicity of Vector Fields 


The writhing number of a curve in Euclidean three- 
dimensional space is the standard measure of the 
extent to which the curve wraps and coils around 
itself; it has proved its importance for electrodynamics 
and fluid mechanics in the study of the 
knotted structures of magnetic vortices and 
dynamical flows, and for molecular biologists in the 
study of knotted duplex DNA and the enzymes 
which affect it. The helicity of a divergenceless 
vector field defined on a domain in Euclidean 
3-space, introduced by Woltjer in 1958 in an 
astrophysical context and named by Moffatt in 
1969 in the study of its topological meaning, is the 
standard measure of the extent to which the field 
lines wrap and coil around one another; it plays 
important roles in fluid mechanics, magnetohydro- 
dynamics, and plasma physics. The “Biot—Savart 
operator” associates with each current distribution 
on a given domain the restriction of its magnetic 
field to the domain. When the domain is simply 
connected, the divergence-free fields which are 
tangent to the boundary and which minimize energy 
for given helicity provide models for stable force- 
free magnetic fields in space and laboratory plasmas; 
these fields appear mathematically as the extreme 
eigenfields for an appropriate modification of the 
Biot-Savart operator. Information about these fields 
can be converted into bounds on the writhing 
number of a given piece of DNA. 

Recent research (Cantarella et al. 2001) has 
obtained rough upper bounds for the writhing 
number of a knot or link in terms of its length and 
thickness, and rough upper bounds for the helicity 
of a vector field in terms of its energy and the 
geometry of its domain. It has also been shown that, in 
the case of classical electrodynamics in vacuum, the 
natural helicity invariant, called the electromagnetic 
helicity, has an important particle meaning: it is the 
difference between the numbers of right- and left-handed 
photons. Recently, a topological model of 
classical electrodynamics has been proposed in 
which the helicity is topologically quantized, in a 
relation that connects the wave and particle aspects 
of the fields (Trueba and Rañada 2000). 

Consider two disjoint closed space curves, C and 
C′, and Gauss’ integral formula for their linking 
number 

\[
\mathrm{Lk}(C, C') = \frac{1}{4\pi} \int_{C \times C'}
  \left( \frac{dx}{ds} \times \frac{dy}{dt} \right) \cdot \frac{x - y}{|x - y|^{3}} \, ds \, dt
\tag{2}
\]

The curves C and C′ are assumed to be smooth and 
to be parametrized by arclength. The question now 
is what happens to this integral when the 
two space curves C and C′ come together and 
coalesce as one curve C. At first glance, the 
integrand looks like it might blow up along the 
diagonal of C × C, but a careful calculation shows 
that in fact the integrand approaches zero on the 
diagonal, and so the integral converges. Its value is 
the writhing number Wr(C) of C defined above: 

\[
\mathrm{Wr}(C) = \frac{1}{4\pi} \int_{C \times C}
  \left( \frac{dx}{ds} \times \frac{dy}{dt} \right) \cdot \frac{x - y}{|x - y|^{3}} \, ds \, dt
\tag{3}
\]
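
The writhing integral can be approximated in the same spirit as the linking integral. The sketch below is an assumption-laden illustration: it samples a closed curve at finitely many points, replaces the double integral by a midpoint sum over polygon segments, and simply drops the diagonal terms, relying on the fact stated above that the integrand tends to zero there. The function name and the trefoil parametrization are chosen here for illustration.

```python
import numpy as np


def writhe(points):
    """Approximate Wr(C) for a closed curve sampled at the rows of `points`."""
    d = np.roll(points, -1, axis=0) - points   # segment vectors, approx (dx/ds) ds
    m = points + 0.5 * d                       # segment midpoints
    r = m[:, None, :] - m[None, :, :]          # x - y for every pair of midpoints
    dist = np.linalg.norm(r, axis=2)
    np.fill_diagonal(dist, np.inf)             # diagonal terms contribute zero
    cross = np.cross(d[:, None, :], d[None, :, :])
    integrand = np.einsum("ijk,ijk->ij", cross, r) / dist ** 3
    return integrand.sum() / (4.0 * np.pi)


t = np.linspace(0.0, 2.0 * np.pi, 300, endpoint=False)
# A planar circle, and a commonly used trefoil parametrization (an assumption
# of this sketch) together with its mirror image.
circle = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
trefoil = np.stack(
    [np.sin(t) + 2.0 * np.sin(2.0 * t),
     np.cos(t) - 2.0 * np.cos(2.0 * t),
     -np.sin(3.0 * t)],
    axis=1,
)
mirror = trefoil * np.array([1.0, 1.0, -1.0])
print(writhe(circle), writhe(trefoil), writhe(mirror))
```

A planar curve has zero writhe because the triple product in the integrand vanishes identically, and a mirror reflection reverses the sign, so the two trefoils return opposite values.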


A very useful result is due to Fuller (1978): 
the writhing number of a knot K is the average 
linking number of K with its slight perturbations in 
every possible direction, 

\[
\mathrm{Wr}(K) = \frac{1}{4\pi} \int_{W \in S^{2}} \mathrm{Lk}(K, K + \varepsilon W) \, d(\text{area})
\tag{4}
\]

This is helpful for getting a quick approximation to 
the writhing number of a knot which almost lies in a 
plane; for example, for a nearly planar trefoil knot, 
Wr(K) ≈ ±3, with the sign fixed by the handedness. 
Here, a very important result must be recalled, a 
“bridge theorem,” proved by Berger and Field 
(1984), see also Ricca and Moffatt (1992), which 
connects helicity of vector fields to writhing of knots 
and links, and which can be used to convert upper 
bounds on helicity into upper bounds on writhing. 


Proposition (Berger and Field). Let K be a smooth 
knot or link in 3-space and Ω = N(K, R) a tubular 
neighborhood of radius R about K. Let V be a 
vector field defined in Ω, orthogonal to the cross-sectional 
disks, with length depending only on 
distance from K. This makes V divergence-free and 
tangent to the boundary of Ω. Then the writhing 
number Wr(K) of K and the helicity H(V) of the 
vector field V are related by the formula 

\[ H(V) = \mathrm{Flux}(V)^{2} \, \mathrm{Wr}(K) \]

In the formula, Flux(V) denotes the flux of V 
through any of the cross-sectional disks D, 

\[ \mathrm{Flux}(V) = \int_{D} V \cdot n \; d(\text{area}) \]

where n is a unit normal vector field to D. 

A key feature of this formula is that the helicity of V 
depends on the writhing number of K, but not any 
further on its geometry; in particular, such quantities 
as the curvature and torsion of K do not enter into the 
formula. Berger and Field actually showed that the 
helicity H(V) is a sum of two terms: a “kink helicity,” 
which is given by the right-hand side of the above 
formula, and a “twist helicity,” which is easily shown 
in our case to be zero. Their proof assumes K is a knot, 
but it is straightforward to extend it to cover links. 

Let Ω be a compact domain in 3-space with 
smooth boundary ∂Ω; we allow both Ω and ∂Ω to be 
disconnected. Let V be a smooth vector field (where 
“smooth” means of class C^∞), defined on the 
domain Ω. The helicity H(V) of the vector field V 
is defined by the formula 

\[
H(V) = \frac{1}{4\pi} \int_{\Omega \times \Omega}
  V(x) \times V(y) \cdot \frac{x - y}{|x - y|^{3}} \, d(\mathrm{vol})_x \, d(\mathrm{vol})_y
\tag{5}
\]


Clearly, helicity for vector fields is the analog of 
writhing number for knots. Both formulas are 
variants of Gauss’ integral formula for the linking 
number of two disjoint closed space curves. 

In order to understand this formula for helicity, 
think of V as a distribution of electric current, and 
use the Biot-Savart law of electrodynamics to 
compute its magnetic field: 


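\[
\mathrm{BS}(V)(y) = \frac{1}{4\pi} \int_{\Omega} V(x) \times \frac{y - x}{|y - x|^{3}} \, d(\mathrm{vol})_x
\tag{6}
\]

Then the helicity of V can be expressed as an integrated 
dot product of V with its magnetic field BS(V): 

\[
\begin{aligned}
H(V) &= \frac{1}{4\pi} \int_{\Omega \times \Omega} V(x) \times V(y) \cdot \frac{x - y}{|x - y|^{3}} \, d(\mathrm{vol})_x \, d(\mathrm{vol})_y \\
     &= \int_{\Omega} V(y) \cdot \left[ \frac{1}{4\pi} \int_{\Omega} V(x) \times \frac{y - x}{|y - x|^{3}} \, d(\mathrm{vol})_x \right] d(\mathrm{vol})_y \\
     &= \int_{\Omega} V(y) \cdot \mathrm{BS}(V)(y) \, d(\mathrm{vol})_y \\
     &= \int_{\Omega} V \cdot \mathrm{BS}(V) \, d(\mathrm{vol})
\end{aligned}
\]
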
Cantarella et al. (2001) found two very interesting 
results. 


Theorem 1 Let K be a smooth knot or link in 
3-space, with length L and with an embedded 
tubular neighborhood of radius R. Then the writhing 
number Wr(K) of K is bounded by 

\[ |\mathrm{Wr}(K)| < \tfrac{1}{4} \, (L/R)^{4/3} \]

For the proof, see Cantarella et al. (2001). 


Theorem 2 The helicity of a unit vector field V 
defined on the compact domain Ω is bounded by 

\[ |H(V)| < \tfrac{1}{2} \, \mathrm{vol}(\Omega)^{4/3} \]
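
To get a feel for the size of these two bounds, they can be evaluated directly. The snippet below is only a convenience sketch: the function names are hypothetical, and it assumes the exponent 4/3 in both theorems as reconstructed above.

```python
def writhe_upper_bound(length, tube_radius):
    """Bound of Theorem 1: (1/4) (L/R)^(4/3), assuming the 4/3 exponent."""
    return 0.25 * (length / tube_radius) ** (4.0 / 3.0)


def helicity_upper_bound(volume):
    """Bound of Theorem 2 for a unit vector field: (1/2) vol^(4/3)."""
    return 0.5 * volume ** (4.0 / 3.0)


# Doubling the length-to-thickness ratio raises the writhing bound by 2^(4/3).
print(writhe_upper_bound(16.0, 1.0) / writhe_upper_bound(8.0, 1.0))
```

Both bounds grow faster than linearly, so long thin knots and large domains are allowed disproportionately more writhing and helicity.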


Let us now give a brief overview of the methods 
used to find sharp upper bounds for the helicity of 
vector fields defined on a given domain Ω in 
3-space. As usual, Ω will denote a compact domain 
with smooth boundary in 3-space. Let K(Ω) denote 
the set of all smooth divergence-free vector fields 
defined on Ω and tangent to its boundary. These 
vector fields, sometimes called “fluid knots,” are 
prominent for several reasons: (1) They are natural 
vector fields to study in a “fluid dynamics 
approach” to geometric knot theory. (2) They 
correspond to incompressible fluid flows inside a 
fixed container. (3) They are the vector fields most often 
studied in plasma physics. (4) Those that maximize 
helicity for given energy (equivalently, minimize energy 
for given helicity) provide models for stable force-free 
magnetic fields in gaseous nebulae and laboratory plasmas. 
(5) The search for these helicity-maximizing fields 
can be converted to the task of solving a system of 
partial differential equations. (6) The fluid knots can 
reveal some fundamental and still unknown 
mechanisms, which characterize the phenomenon 
of phase transition, and in particular the transition 
from chaotic (unstable) phases and behaviors of 
matter to ordered (stable) ones. 


Knots and Fluid Mechanics (Vortex Lines, 
Magnetic Helicity, and Turbulence) 


Kelvin’s theory explaining atoms as knotted 
vortices in a fluid ether was seminal in the development 
of topological fluid mechanics. The recent 
revival (starting in the 1970s) is mainly due to the 
work of Moffatt, on the topological interpretation of 
helicity, and Arnol’d, on the asymptotic linking number 
of space-filling curves. Modern developments have 
been influenced by recent progress in the theory of 
knots and links. 


Influence of Geometry and Topology 
on Fluid Flows 


Ideal topological fluid mechanics deals essentially 
with the study of fluid structures that are 
continuously deformed from one configuration to 
another by ambient isotopies. Since the fluid flow 
map φ is both continuous and invertible, 
φ_{t1}(K) and φ_{t2}(K) generate isotopies of a fluid 
structure K (e.g., a vortex filament) for any 
{t1, t2} ∈ I. Isotopic flows generate equivalence 
classes of (linked and knotted) fluid structures. In 
the case of (vortex or magnetic) fluid flux tubes, 
fluid actions induce continuous deformations in D. 
One of the simplest deformations is local stretch- 
ing of the tube. From a mathematical viewpoint, 
this deformation corresponds to a time-dependent, 
continuous reparametrization of the tube center- 
line. This reparametrization (via homotopy classes) 
generates ambient isotopies of the flux tube, with 
a continuous deformation of the integral curves. 

Moreover, in the context of the Euler equations, 
the Reidemeister moves (or isotopic plane deformations), 
which conserve the knot topology, 
are performed quite naturally by the action of local 
flows on flux tube strands. If the fluid in (D − K) is 
irrotational, then these fluid flows (with velocity u) 
must satisfy the Dirichlet problem for the Laplacian 
of the stream function ψ, that is, 

\[
u = \nabla \psi, \qquad \nabla^{2} \psi = 0 \qquad \text{in } (D - K)
\tag{7}
\]

with the normal component u_⊥ of the velocity at the tube 
boundary given. Equations [7] admit a unique 
solution in terms of local flows, and these flows are 
interpretable in terms of Reidemeister’s moves 
performed on the tube strands. Note that the boundary 
conditions prescribe only u_⊥, whereas no condition 
is imposed on the tangential component of the 
velocity. This is consistent with the fact that 
tangential effects do not alter the topology of the 
physical knot (or link). The three types of Reidemeister’s 
moves are therefore performed by local fluid flows, 
which are solutions to [7], up to arbitrary tangential 
actions. 


Knotted and Linked Tubes of Magnetic Flux 


Let T be the standard solid torus in R³ given by 

\[
\big( (2 + \varepsilon \cos\theta) \cos\varphi, \; (2 + \varepsilon \cos\theta) \sin\varphi, \; \varepsilon \sin\theta \big)
\tag{8}
\]

where 0 ≤ θ, φ < 2π and 0 ≤ ε ≤ 1. For relatively 
prime integers p and q, let F_{p,q} denote the foliation 
of T by the curves γ_{ε,θ} (where 0 < ε ≤ 1 and 
0 ≤ θ < 2π) given by 

\[
\gamma_{\varepsilon,\theta}(s) = \big( (2 + \varepsilon \cos(\theta + qs)) \cos(ps), \;
  (2 + \varepsilon \cos(\theta + qs)) \sin(ps), \; \varepsilon \sin(\theta + qs) \big)
\tag{9}
\]

where 0 ≤ s < 2π. 
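
The curves of eqn [9] are easy to generate, and two of their basic properties can be checked directly: each curve closes up after s runs over a full period, and it stays at constant distance ε from the core circle of radius 2. The helper names below are hypothetical and the parameter values are arbitrary.

```python
import math


def foliation_curve(p, q, eps, theta, s):
    """Point gamma_{eps,theta}(s) of eqn [9] in the standard solid torus."""
    rho = 2.0 + eps * math.cos(theta + q * s)
    return (rho * math.cos(p * s), rho * math.sin(p * s), eps * math.sin(theta + q * s))


def distance_to_core_circle(point):
    """Distance from a point to the core circle of radius 2 in the (x, y)-plane."""
    x, y, z = point
    return math.hypot(math.hypot(x, y) - 2.0, z)


# Sample a (p, q) = (2, 3) curve, which winds as a trefoil-type torus knot.
samples = [foliation_curve(2, 3, 0.5, 0.3, 2.0 * math.pi * k / 200) for k in range(200)]
print(max(abs(distance_to_core_circle(pt) - 0.5) for pt in samples))
```

Because p and q are integers, the point at s = 2π coincides with the point at s = 0, which is what makes each leaf a closed curve.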


Definition A magnetic tubular link (or magnetic 
link) is a smooth immersion into R³ of finitely many 
disjoint standard solid tori ⋃_{i=1}^{n} T_i, 

\[ L : \bigcup_{i=1}^{n} T_i \to \mathbb{R}^{3} \]

and a smooth magnetic field B on R³ such that 


(i) L is an imbedding when restricted to the 
interior of ⋃_{i=1}^{n} T_i, 
(ii) the bounding surface of ⋃_i L(T_i), that is, 
⋃_i L(∂T_i), is a magnetic surface, and 
(iii) for each component LT_i, there exist relatively 
prime nonzero integers p_i and q_i such that L 
maps the foliation F_{p_i,q_i} of T_i onto the integral 
curves of B in LT_i. 


Remark Thus, for every fixed i and j, the linking 
number between an arbitrary field line in LT_i and 
an arbitrary field line in LT_j is the same regardless 
of which integral curves are chosen from LT_i and 
LT_j, respectively. This is true even when i = j. 


It follows that a magnetic link ⋃_i LT_i remains a 
magnetic link under the action of the fluid flow g_t, that 
is, ⋃_i g_t LT_i is a magnetic link for t ≥ 0. 

Keeping in mind that the magnetic field B is frozen in the 
fluid, we can now find and study those properties of 
magnetic links that are invariant under the action of 
fluid flow. One obvious invariant is the volume V_i 
of each flux tube g_t LT_i, that is, 

\[
V_i = \mathrm{Vol}(LT_i) = \mathrm{Vol}(g_t LT_i) = \iiint_{g_t LT_i} d(\mathrm{vol})
\tag{10}
\]

which remains unchanged because of incompressibility. 
Another invariant of fluid flow is defined as 
follows: 


Definition Let L be a magnetic link. For each solid 
torus T_i, choose a meridional disk D_i. The magnetic 
flux Φ_i = Φ(LT_i) in the ith component is the surface 
integral defined as 

\[ \Phi_i = \Phi(LT_i) = \iint_{LD_i} B \cdot U \; d(\text{area}) \]

where U denotes the normal to the surface LD_i 
pointing in the positive direction induced by the B 
field. 


It can be shown that Φ_i is independent of the 
chosen meridional disk. It can also be shown that 
each Φ_i is a fluid flow invariant, that is, 

\[
\Phi_i(g_t LT_i) = \iint_{g_t LD_i} B \cdot U \; d(\text{area})
\tag{11}
\]

is independent of t. 

One more fluid invariant that will play a central 
role in the energy minimization of magnetic links is 
given by the following definition. 


Definition The helicity of a magnetic link L is 
defined as 

\[ H(L) = \iiint_{\bigcup_i LT_i} A \cdot B \; d(\mathrm{vol}) \]

with A the magnetic vector potential (B = ∇ × A). 

The term helicity was first introduced in a fluid 
context by Moffatt, and it was previously used in 
particle physics for the scalar product of the momentum 
and spin of a particle. In another connection, note 
that the helicity H(L) is the same as the Chern–Simons 
action: 

\[
H(L) = \int A \wedge dA = \int \mathrm{tr}\!\left( A \wedge dA + \tfrac{2}{3} A \wedge A \wedge A \right)
\tag{12}
\]

where A now denotes the magnetic vector potential 
as a 1-form. 

It can be shown that H(L) is gauge invariant, and 
hence well defined. 


Theorem (Moffatt). The helicity is invariant under 
fluid flow, that is, 

\[ \frac{d}{dt} H(g_t L) = 0 \]

Arnol’d (1998) defines the helicity in a more abstract 
setting and shows that it is invariant under the group 
SDiff of volume-preserving diffeomorphisms. 

The following theorem summarizes the many 
results due to Moffatt, Ricca, Berger, Lomonaco, 
Hornig, Kauffman, and others, relating the helicity 
of magnetic links to linking and to magnetic flux. 


Theorem Let L be a magnetic link. Then 

\[
H(L) = \sum_{1 \le i \le n} \Phi_i^{2} \, SL_{F_i} + 2 \sum_{1 \le i < j \le n} \Phi_i \Phi_j \, LK_{ij}
\]

where SL_{F_i} denotes the self-linking number of 
the axis curve of the tube LT_i with respect to the 
framing F_i induced by the integral curves of the 
magnetic field B within LT_i, and LK_{ij} denotes 
the linking number between any integral curve of 
the magnetic field B in LT_i with any integral curve 
of the magnetic field B in LT_j. 


Remark In fact, SL_{F_i} is the same as the linking 
number between any two integral curves of the 
magnetic field B within the tube LT_i. 


Thus, as many authors have shown, the helicity 
does reflect the topology and the geometry of the 
magnetic lines of force within a magnetic link. If, for 
example, L has only one component, that is, L is a 
magnetic knot, then 

\[
H(L) = \Phi^{2} \, SL_F(C)
\tag{13}
\]

where SL_F(C) is the self-linking number of the axis 
curve C of the knotted tube with respect to the 
framing F induced by the integral curves of the 
magnetic field B within the magnetic knot. If, for 
example, the tube is knotted in the form of a trefoil 
and if the magnetic lines of force appear to be 
parallel to the axis curve when the trefoil is placed 
on a flat plane surface, then SL_F = ±3 and 

\[
H = \pm 3 \Phi^{2}
\tag{14}
\]

On the other hand, if, for example, the magnetic 
lines of force induce the trivial framing in each 
component, then 

\[
H(L) = 2 \sum_{1 \le i < j \le n} \Phi_i \Phi_j \, LK_{ij}
\tag{15}
\]

Thus, if L is a magnetic two-component Hopf link 
with no twisting of the integral curves of the 
magnetic field within the components of L, then 

\[
H(L) = \pm 2 \, \Phi_1 \Phi_2
\tag{16}
\]

because the self-linking number based on the B-field 
framing is zero for each component, and the linking 
number between the two components is ±1. 
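
The helicity formula of the theorem above reduces to simple arithmetic once the fluxes, self-linking numbers, and pairwise linking numbers are known. The following sketch evaluates it; the function name and the numerical flux values are hypothetical and serve only to reproduce the trefoil and Hopf-link cases.

```python
def link_helicity(fluxes, self_linking, linking):
    """Helicity of a magnetic link from fluxes and linking data.

    fluxes[i]       : magnetic flux Phi_i of the i-th tube
    self_linking[i] : self-linking number SL_{F_i} of its axis curve
    linking[i][j]   : linking number LK_ij between tubes i and j (symmetric)
    """
    n = len(fluxes)
    # Self-linking (framing) contribution of each tube.
    helicity = sum(fluxes[i] ** 2 * self_linking[i] for i in range(n))
    # Mutual linking, each unordered pair counted once with a factor of 2.
    helicity += 2 * sum(
        fluxes[i] * fluxes[j] * linking[i][j]
        for i in range(n)
        for j in range(i + 1, n)
    )
    return helicity


# Trefoil knot with flux 2 and SL_F = 3, then a Hopf link with trivial framings.
print(link_helicity([2.0], [3], [[0]]))
print(link_helicity([1.5, 2.0], [0, 0], [[0, 1], [1, 0]]))
```

The first call corresponds to eqn [14] and the second to eqn [16], with the positive sign chosen in both.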


Energy of Magnetic Knots and Links 


Let us conclude this section with the definition of 
the energy of a magnetic link. 


Definition The magnetic energy E_M(L) of a magnetic 
link L is defined by the classical formula 

\[ E_M(L) = \frac{1}{8\pi} \iiint_{\bigcup_i LT_i} |B|^{2} \; d(\mathrm{vol}) \qquad \text{(Gaussian units)} \]

Although the energy E_M is not flow invariant, it will 
play a central role in magnetic relaxation of knots 
and minimum energy magnetic links. 

Consider a magnetic link L in a perfectly 
conducting, incompressible, viscous fluid. As a result 
of dissipative frictional fluid forces, the magnetic 
energy E_M(g_t L) of g_t L will decrease with time t. In 
losing energy, the magnetic lines of force will 
contract. On the other hand, since this is a volume-preserving 
process, the cross sections of the flux 
tubes of g_t L will at the same time expand. These 
changes of geometry occur while the flux Φ, volume 
V, and helicity of g_t L remain the same. In other 
words, knotted magnetic flux tubes left free to 
evolve in such a fluid will do so by conserving their 
magnetic flux Φ and volume V, but converting their 
magnetic energy into kinetic energy, which in turn 
dissipates by internal friction. Magnetic links and 
knots evolve from high to low magnetic energy 
levels, conserving topology; and because of the 
induced shortening of field lines under conservation 
of volume, they become fatter and fatter, with an 
increase of the average tube cross section. 

This process cannot continue indefinitely. Eventually, 
the magnetic flux tubes of g_t L must make 
contact with each other. In other words, the topology 
of the magnetic link g_t L, as expressed in knotting and 
linking, creates a barrier to the full dissipation of the 
magnetic link’s energy, that is, E_M(g_t L) has a positive 
lower bound that results from the topology of g_t L. 
Relaxation is therefore 
obstructed by the knottedness and entanglement of 
the field lines, and a minimum magnetic energy is 
reached. Thus, the magnetic link will reach a 
nontrivial stable and invariant energy state, much as 
Kelvin conjectured his atomic vortices would. 

Various estimates of magnetomechanical energy in 
terms of topological quantities have been put forward 
in recent years (see Freedman and He (1994)). These 
relations give lower bounds for the energy levels 
attainable by knot or link types by taking into account 
the effects that linking numbers and number of 
crossings have on the energy of the relaxed state. 
These bounds are expressed by relationships of the kind 

\[
E_{\min} = \phi(C_{\min}, \Phi, V, N)
\tag{17}
\]

where E_min is the equilibrium energy and φ gives the 
relationship between physical quantities (such as 
total flux Φ, number of tubes N, magnetic volume V) 
and topology, given here by the minimal possible 
number of crossings C_min. These relations offer 
numerous advantages due to the explicit dependence 
on qualitative properties of the flow field. A simple 
example is provided by the analysis of three braids, 
which shows that magnetic energy grows quadrati- 
cally in time due to random braiding. This means 
that the least possible amount of magnetic energy 
that can be attained by the physical knot or link is 
determined purely by its topology. If topological 
information sets the levels of minimum energy 
accessible to the knot or link, geometric properties 
may also influence the relaxation process. Considerations 
of helicity and linking numbers, for 
example, demonstrate that internal rearrangement 
of magnetic field geometry leads to a spectrum of 
different asymptotic endstates with the same topol- 
ogy. Moreover, magnetic knots have a natural 
tendency to get rid of excessive torsion of field 
lines and S-shaped tube geometries, and this may 
influence the relaxation process. 

Since the helicity H(g_t L) is both an invariant of 
fluid flow and an expression of the topology of the 
magnetic link g_t L, the following theorem, first stated by 
Moffatt, is a mathematical expression of this 
topological bound. 


Theorem Let L be a magnetic link. Then 

\[ E_M(L) \ge q_0 \, |H(L)| \]

where q_0 is a nonzero constant that is independent 
of the magnetic link. 


Freedman and He (1991) obtain more subtle and 
tighter topological bounds on the minimum energy 
of magnetic links. For example, for a magnetic knot 
K, they prove that 

\[
E_M(K) \ge \frac{1}{8\pi} \left( \frac{16}{\pi V(K)} \right)^{1/3} \Phi(K)^{2} \, ac(K)^{4/3}
       \ge \frac{1}{8\pi} \left( \frac{16}{\pi V(K)} \right)^{1/3} \Phi(K)^{2} \, (2g(K) - 1)^{4/3}
\qquad \text{(Gaussian units)}
\tag{18}
\]

where V(K) denotes the volume of the magnetic knot 
K, Φ(K) denotes the flux in K, ac(K) is the asymptotic 
crossing number, and g(K) is the genus of the knot K. 
Freedman and He conjecture that ac(K) = c(K), where 
c(K) is the crossing number, that is, the minimum 
number of crossings among all plane diagrams representing 
the knot K. In addition, Moffatt (1990) suggests 
that the minimum energy spectrum of a magnetic knot 
can be used to construct new knot invariants. 


Topological Changes, Dissipation, 
and Reconnection in Fluid Patterns 


As we saw above, topological changes do occur 
when dissipative effects become predominant over 
the coherency of structures. When this happens, 
there is a dramatic change of fluid patterns, often on 
small timescales compared to evolution. The change 
occurs through the formation and disappearance of 
physical reconnections in the fluid pattern. In real 
fluids, for example, vortex and magnetic tubes do 
interact and reconnect freely. From a dynamical 
system viewpoint, reconnections take place when the 
vector field lines (streamlines, vortex lines, or 
magnetic lines) cross each other. If two field lines 
meet, the point of crossing is a true nodal point, like 
a bifurcation in a path. Dissipative effects allow the 
reconnection to proceed through such points. 

In dissipative fluids, mathematical and physical 
properties are no longer conserved, and during the 
process we lose part of the original information. 
However, some of the invariants are rather robust 
and may only degrade slowly. One of them is magnetic 
helicity, the magnetic analog of the kinetic helicity. Its 
dissipation during reconnection can be modest; in 
particular, if the reconnection timescale is small 
compared to classical dissipation times, then helicity 
loss will be negligible. The robustness of magnetic 
helicity plays a central role in fusion plasma physics 
and in many astrophysical contexts. On the other hand, 
large changes in kinetic helicity are intimately related to 
qualitative changes in the topology of vortex flows. 

Under Euler’s equations, the helicity of a vortex 
tube of vorticity ω and velocity u is defined by 
H = ∫ u · ω dV, where the integral is taken over the tube 
volume V occupied by ω. Now, for N knotted and 
linked vortex tubes, each of (constant) strength 
(total vorticity) Φ_i (1 ≤ i ≤ N), the helicity of the 
whole system can be expressed in terms of the linking 
numbers Lk_{ij} as 

\[ H = \sum_{i,j} Lk_{ij} \, \Phi_i \Phi_j \]

where Lk_{ij} is equal to Lk_{ji}; this is a topological invariant 
whose value does not change under continuous 
deformation of the fluid structure. Since helicity 
and flux-tube strength are measurable conserved 
quantities, the above equation provides useful 
information about the topology of the flow field 
and flow structures. In addition, by direct measure- 
ments of helicity and application of conservation of 
topology, one can estimate average geometric 
quantities, such as the mean twist of field lines, 
and their contribution to the total energy. 


Brief Conclusion 


In this article, we have made an attempt to indicate 
how “classical” field theories, which have been 
successfully used to describe physics of fundamental 
structures and forces of nature, can also be used to 
study geometry and topology of low-dimensional 
manifolds. These developments not only provide new 
insights into old problems of topology of these 
manifolds but also have been responsible for pro- 
foundly interesting new mathematics (fluid 
mechanics, dynamical flows, and polymer biophysics 
are maybe the most significant examples in recent 
years). In particular, fluid dynamics, a topological 
macroscopic field theory, provides a powerful frame- 
work for modern theory of knots and links in 
3-manifolds. Moreover, as we saw here, it provides 
a physical interpretation of the link, self-linking, and 
writhing number of knots and links. The present 
article was essentially aimed to illustrate such a 
relationship. Thus, the most fundamental result we 
reported here is the relation (formula) connecting the 
helicity of vector (magnetic) fields to the writhing 
number of knots: $H(V) = \mathrm{Flux}(V)^2\, Wr(K)$. So, the writhing
number for knots is the analog of helicity for
vector fields. Both expressions of these invariants are 
variants of the (Gaussian) integral formula for the 
linking number of two disjoint closed space curves. 
Further investigations of these invariants and their 
mathematical properties might throw new light on 
the interfaces between many different areas of 
macroscopic and quantum physics. 


See also: The Jones Polynomial; Knot Theory and 
Physics; Magnetohydrodynamics; Mathematical Knot 
Theory; Stability of Flows; Superfluids; Topological 
Quantum Field Theory: Overview; Vortex Dynamics; 
Yang—Baxter Equations. 
Introduction


Topological quantum field theory (TQFT) has been
one of the most successful fields of mathematical
physics since it originated in the 1980s. It possesses
an inherent property which makes it unique: TQFT
provides predictions in mathematics which open
new fields of research. A well-known example is the
prediction of Seiberg-Witten invariants as building
blocks of Donaldson invariants. However, there are
others, such as the recent proposal for the coefficients
of the HOMFLY polynomial invariants for
knots as quantities related to enumerative geometry.
These developments have drawn the attention of
mathematicians and physicists to TQFT since the
1980s, a very fruitful period in which both communities
have benefited from each other.

Topology has always been present in mathematical 
physics, in particular when dealing with aspects of
quantum physics. Global effects play an important
role in quantum-mechanical models and topology 
becomes an essential ingredient in their description. 
TQFT itself appeared in the winter of 1987 after 
Witten’s work (Witten 1988a) on Donaldson theory 
(Donaldson 1990), but a series of papers during the 
1980s which dealt with topological aspects of field and 
string theory anticipated its existence. Two of these 
correspond to Witten’s works on supersymmetric 
quantum mechanics and supersymmetric sigma mod- 
els (Witten 1982) that led to a generalization of Morse 
theory. This generalization was considered by Floer 
(1987) in a new context that constituted the key 
element in Witten’s construction of TQFT. These 
developments were certainly influenced by Atiyah 
(1988). TQFT was born as a result of the interplay 
between physics and mathematics. This has been a 
constant feature all along its development. 

Soon after the formulation of the TQFT 
addressing Donaldson theory, now known 
as Donaldson—Witten theory, Witten formulated a 
new TQFT which focuses on knot invariants such as 
the Jones polynomial and its generalizations (Jones 
1985). Witten (1989) constructed Chern—Simons 
gauge theory and proved its relation to the theory 
of knot and link invariants. This theory has
features different from those of Donaldson-Witten theory,
and in fact these two theories fall
into two different general types of TQFTs, as will
be explained in the following section. Nevertheless,
despite their formal differences, both Donaldson-Witten
and Chern-Simons gauge theory emerged
as a novel way to express topological invariants in
terms of quantum field theory quantities as well as
to generalize their previous formulation. But there
was much more to them than was apparent at the
beginning. Once these topological invariants were
formulated in field theory language, one had a 
huge machinery to study them from different 
points of view. Theoretical physicists have devel- 
oped many useful tools to study quantum field 
theory. The use of these tools led to new frame- 
works for these topological invariants. 

In this overview we provide the basics
of TQFT and briefly describe two examples,
Donaldson-Witten theory and Chern-Simons gauge
theory, to explain how the general features are
implemented. Some excellent reviews on the subject
(Birmingham et al. 1991, Cordes et al. 1996,
Labastida and Mariño 2004) are available. The
organization of this work is as follows. In the
following section we present a general introduction
to TQFT from a functional integral point of view.
Next, we touch upon the twisting of extended
supersymmetry as a general constructive approach
to TQFT. This is followed by a section on
Donaldson-Witten theory where we discuss the
computation of its observables from a perturbative
approach, showing their relation to the Donaldson
invariants. Next, we introduce Chern-Simons gauge
theory as a theory of knot and link invariants. The
penultimate section deals with advanced developments
in TQFT. Finally, we end with some
concluding remarks.


Topological Quantum Field Theory 


We will start our overview by presenting the most 
general structure of a TQFT from a functional 
integral point of view which, though not rigorously 
defined, is the approach that has led to the most 
important developments. As in conventional quan- 
tum field theory, axiomatic approaches to TQFT do 
exist, but we will not follow that route here. 

Let us consider an n-dimensional Riemannian
manifold X endowed with a metric $g_{\mu\nu}$ and a
quantum field theory on it. We will say that this
theory is “topological” if there exist operators in the
theory such that their correlation functions do not
depend on the metric. If we denote these operators
by $O_i$ (where $i$ is a generic label), then


$$\frac{\delta}{\delta g^{\mu\nu}} \langle O_{i_1} O_{i_2} \cdots O_{i_n} \rangle = 0$$  [1]


where $\langle \cdots \rangle$ denotes a vacuum expectation value.
The operators that satisfy this equation are called
“topological observables.”

The simplest way to achieve metric independence is 
to consider a theory whose action and operators do not 
depend on the metric. In this situation, if no 
anomalous metric dependence is generated upon 
quantization, the correlation functions of these opera- 
tors satisfy [1] and lead to topological invariants on X. 
Theories of this sort are collectively referred to as 
Schwarz-type TQFTs, and well-known examples are 
Chern-Simons gauge theory and BF theories. How- 
ever, Schwarz-type theories are too restrictive. One 
would like to have a theory satisfying property [1] with 
a weaker condition on the action. This can be achieved 
with the help of a symmetry. The resulting TQFTs are
said to be of Witten or cohomological type, the main
examples being Donaldson—Witten theory and topo- 
logical sigma models (Witten 1988b). 

For TQFTs of Witten type, the action may depend
on the metric. However, the theory has an underlying
scalar symmetry $\delta$ acting on the fields $\phi_i$. Since $\delta$ is a
symmetry, the action of the theory satisfies $\delta S(\phi_i) = 0$.
In these theories, metric independence of the correlation
functions is achieved as follows. Let
$T_{\mu\nu} = (\delta/\delta g^{\mu\nu}) S(\phi_i)$ be the energy-momentum tensor of
the theory. It turns out that the energy-momentum
tensor is $\delta$-exact:


$$T_{\mu\nu} = -i\,\delta G_{\mu\nu}$$  [2]


$G_{\mu\nu}$ being some tensor. Indeed, if [2] is satisfied, it
follows that for any set of operators $O_i$ which are
$\delta$-invariant,


$$\frac{\delta}{\delta g^{\mu\nu}} \langle O_{i_1} O_{i_2} \cdots O_{i_n} \rangle = \langle O_{i_1} O_{i_2} \cdots O_{i_n} T_{\mu\nu} \rangle$$
$$= -i \langle O_{i_1} O_{i_2} \cdots O_{i_n}\, \delta G_{\mu\nu} \rangle$$
$$= \pm i \langle \delta ( O_{i_1} O_{i_2} \cdots O_{i_n} G_{\mu\nu} ) \rangle = 0$$  [3]

In this computation we have assumed that
the symmetry $\delta$ is not anomalous and that there are
no contributions coming from boundary terms since
we have integrated by parts in field space. This is not
always the case, and in fact the situations in which one
of these two properties fails lead to rich phenomena. In
those cases, for example, in Donaldson-Witten theory
on manifolds with $b_2^+ = 1$, the correlation functions fail
to be topological invariants in a controlled manner
which unveils many interesting properties.

We will now describe Witten-type theories in a
general context. The general structure of Schwarz-type
theories is much simpler and will be illustrated in
the example presented below. In Witten-type theories
the observables are the $\delta$-invariant operators. It is
simple to prove that $\delta$-exact operators decouple from
the theory. Indeed, if $O$ is $\delta$-exact, $O = \delta \tilde O$, then


$$\langle O\, O_{i_1} O_{i_2} \cdots O_{i_n} \rangle = \langle \delta \tilde O\, O_{i_1} O_{i_2} \cdots O_{i_n} \rangle = \langle \delta ( \tilde O\, O_{i_1} O_{i_2} \cdots O_{i_n} ) \rangle = 0$$  [4]


Thus, one can restrict the set of observables to the
cohomology of $\delta$:


$$O \in \frac{\mathrm{Ker}\, \delta}{\mathrm{Im}\, \delta}$$  [5]


There is no reason a priori why the $\delta$-symmetry
should be a scalar Grassmannian symmetry, but in
all known models of Witten-type TQFTs this turns
out to be the case. Thus, these theories violate the
spin-statistics theorem. In all these models the
algebra of the $\delta$ symmetry has the form


$$\delta^2 = Z$$  [6]


where Z is a symmetry transformation (typically a
gauge symmetry of some sort). This property forces
one to consider Z-invariant observables and to work in
the context of “equivariant cohomology.”

The observables of Witten-type theories fit into a
general pattern that we describe now. The key
ingredient is a map between the homology of X and
the equivariant cohomology of $\delta$. Given an operator
$\phi^{(0)}$ in the equivariant cohomology of $\delta$, let us
consider the following set of equations:


$$d\phi^{(n)} = \delta \phi^{(n+1)}$$  [7]


where the operators $\phi^{(n)}$ ($n = 1, \ldots, \dim X$) are differential
forms of degree n on X and d is the de Rham
differential. These differential equations are called
“descent equations” and their solutions $\phi^{(n)}$ ($n > 0$)
“topological descendants” of $\phi^{(0)}$. We will show how
to construct a solution to these equations on general
grounds.

The topological descendants lead to the construction
of a set of elements of the equivariant cohomology
of $\delta$. Let $\gamma_n$ be an n-cycle on X, $\gamma_n \in H_n(X)$,
and let us consider the following operator:


$$W_{\phi^{(0)}}^{(\gamma_n)} = \int_{\gamma_n} \phi^{(n)}$$  [8]


This operator is $\delta$-invariant,


$$\delta W_{\phi^{(0)}}^{(\gamma_n)} = \int_{\gamma_n} \delta \phi^{(n)} = \int_{\gamma_n} d\phi^{(n-1)} = \int_{\partial \gamma_n} \phi^{(n-1)} = 0$$  [9]


since $\partial \gamma_n = 0$. On the other hand, if $\gamma_n$ were trivial
in homology, that is, if $\gamma_n = \partial \gamma_{n+1}$, we would have
that $W_{\phi^{(0)}}^{(\gamma_n)}$ is $\delta$-exact:


$$W_{\phi^{(0)}}^{(\gamma_n)} = \int_{\partial \gamma_{n+1}} \phi^{(n)} = \int_{\gamma_{n+1}} d\phi^{(n)} = \delta \int_{\gamma_{n+1}} \phi^{(n+1)}$$  [10]


Thus, given the operator $\phi^{(0)}$, we have constructed a
map between the homology of X and the equivariant
cohomology of $\delta$. There are as many maps as
basic operators $\phi^{(0)}$ one finds in the theory.

To actually construct these maps, we need to find
a solution of the descent equations [7]. As
announced before, there is a general solution to
those equations in Witten-type theories. Since in this
type of theories [2] holds, there exists an operator


$$G_\mu = G_{0\mu}$$  [11]

that satisfies

$$P_\mu = T_{0\mu} = -i\,\delta G_\mu$$  [12]


Notice that $G_\mu$ is an anticommuting operator and a
1-form in spacetime. With the aid of this operator,
one constructs the following solution to the descent
equations [7]:


$$\phi^{(n)} = \frac{1}{n!}\, \phi^{(n)}_{\mu_1 \mu_2 \cdots \mu_n}\, dx^{\mu_1} \wedge \cdots \wedge dx^{\mu_n}$$  [13]


where


$$\phi^{(n)}_{\mu_1 \mu_2 \cdots \mu_n}(x) = G_{\mu_1} G_{\mu_2} \cdots G_{\mu_n} \phi^{(0)}(x), \qquad n = 1, \ldots, \dim X$$  [14]


One can easily check using [12] and the $\delta$-invariance
of $\phi^{(0)}$ that the operators [13] do satisfy the descent
equations [7].
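
The check mentioned above can be sketched for the first descendant. The following is only an illustration under stated assumptions: $\delta$ is taken to act on operators as the graded commutator with a generator (called $Q$ here), relation [12] is assumed to be realised as $\{Q, G_\mu\} = \partial_\mu$ (that is, $P_\mu$ acting as $-i\partial_\mu$), and the product $G_\mu \phi^{(0)}$ in [14] is read as the commutator $[G_\mu, \phi^{(0)}]$ with a bosonic $\phi^{(0)}$.

```latex
% Assumptions (not spelled out in the text): \delta X = [Q, X\},
% \{Q, G_\mu\} = \partial_\mu, and \phi^{(1)}_\mu = [G_\mu, \phi^{(0)}].
\begin{align*}
\delta \phi^{(1)}_\mu
  &= \{ Q, [G_\mu, \phi^{(0)}] \} \\
  &= [\{Q, G_\mu\}, \phi^{(0)}] - \{ G_\mu, [Q, \phi^{(0)}] \}
     && \text{(graded Jacobi identity)} \\
  &= \partial_\mu \phi^{(0)}
     && \text{(since } [Q, \phi^{(0)}] = 0 \text{)} .
\end{align*}
% With \phi^{(1)} = \phi^{(1)}_\mu \, dx^\mu this is the n = 0 descent equation:
\[
  \delta \phi^{(1)} = \partial_\mu \phi^{(0)} \, dx^\mu = d\phi^{(0)} .
\]
```

The higher descendants follow the same pattern, with one more factor of $G_\mu$ traded for a derivative at each step.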

We have seen that Witten-type TQFTs are char- 
acterized by property [2]. It would be desirable to have 
at hand a systematic procedure to build theories 
satisfying that property. It has been found that 
extended supersymmetry provides a very helpful 
starting point to build those theories. Although super- 
symmetry guarantees from first principles only the 
weaker condition [12] instead of [2], all TQFTs that 
have been constructed from extended supersymmetry 
actually satisfy [2]. To build a TQFT from a theory 
with extended supersymmetry, one needs to go 
through the twisting procedure that we now describe. 


Twisting of Extended Supersymmetry 


All known Witten-type theories are related to an 
underlying extended supersymmetric quantum field 
theory. The topological theory is a modified version of 
the supersymmetric theory in which the Lorentz 
transformation properties (spins) of some of the fields 
have been modified. This modification of spin assign- 
ments is known as twisting, and it can be carried out 
on any theory with extended supersymmetry in any 
spacetime dimension. We will not consider the 
procedure in such a general setting but instead we 
will illustrate it by considering the case of N = 2
supersymmetry in four dimensions. We will begin with
a general description and then we will apply it to a
specific example: Donaldson-Witten theory.

Let us consider the Euclidean version of the N = 2
supersymmetry algebra with no central charges. Central
charges can be included without much ado, but we will
not consider them for simplicity. The total symmetry
group of the theory is
$H = SU(2)_+ \times SU(2)_- \times SU(2)_R \times U(1)_R$,
$K = SU(2)_+ \times SU(2)_-$ being the rotation group,
and $SU(2)_R \times U(1)_R$ the internal symmetry group of
the N = 2 supersymmetry algebra. The generator
algebra takes the following form:


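$$\{Q_{\alpha v}, \bar Q_{\dot\beta w}\} = 2 \epsilon_{vw}\, \sigma^\mu_{\alpha\dot\beta} P_\mu, \qquad \{Q_{\alpha v}, Q_{\beta w}\} = 0$$

$$[P_\mu, Q_{\alpha v}] = 0, \qquad [P_\mu, \bar Q_{\dot\alpha v}] = 0$$

$$[M_{\alpha\beta}, Q_{\gamma v}] = \epsilon_{\gamma(\alpha} Q_{\beta) v}, \qquad [M_{\alpha\beta}, \bar Q_{\dot\gamma v}] = 0$$

$$[\bar M_{\dot\alpha\dot\beta}, Q_{\gamma v}] = 0, \qquad [\bar M_{\dot\alpha\dot\beta}, \bar Q_{\dot\gamma v}] = \epsilon_{\dot\gamma(\dot\alpha} \bar Q_{\dot\beta) v}$$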
iB’, On] = "Qn, B”, Of] = -1 0 
$$[Q_{\alpha v}, R] = Q_{\alpha v}, \qquad [\bar Q_{\dot\alpha v}, R] = -\bar Q_{\dot\alpha v}$$  [15]


In these relations $v, w \in \{1, 2\}$ are $SU(2)_R$ indices and
$\alpha$ and $\dot\alpha$ denote spinorial indices of $SU(2)_-$ and
$SU(2)_+$, respectively. The supersymmetry generators
$Q_{\alpha v}$ and $\bar Q_{\dot\alpha v}$ transform under H as $(0,2,2)^{1}$ and
$(2,0,2)^{-1}$, respectively. $\bar M_{\dot\alpha\dot\beta}$ and $M_{\alpha\beta}$ are the
generators of $SU(2)_+$ and $SU(2)_-$, respectively, while
$B$ and $R$ generate $SU(2)_R$ and $U(1)_R$, respectively.
The twisting of a supersymmetric theory involves a
modification of the couplings of the theory to a
background metric on the space where the theory is
defined. This modification is carried out by redefining
the Lorentz transformation properties of the different
fields making use of the internal symmetry $SU(2)_R$. In
particular, we will redefine the couplings of the fields
to the $SU(2)_+$ spin connection according to the way
they transform under $SU(2)_R$. This is easily done by
identifying the $SU(2)_R$ indices $v$ with the $SU(2)_+$
indices $\dot\alpha$. The procedure involves a redefinition of
the rotation group into $K' = SU'(2)_+ \otimes SU(2)_-$, where
$SU'(2)_+$ is generated by
$$M'_{\dot\alpha\dot\beta} = \bar M_{\dot\alpha\dot\beta} - B_{\dot\alpha\dot\beta}$$  [16]


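The supersymmetry generators $Q_{\alpha v}$ and $\bar Q_{\dot\alpha v}$ get
transformed in the following way:


$$Q_{\alpha v} \to Q_{\alpha\dot\beta}, \qquad \bar Q_{\dot\alpha v} \to \bar Q_{\dot\alpha\dot\beta}$$  [17]

which allows us to define the “topological
supercharge”:

$$Q = \epsilon^{\dot\alpha\dot\beta}\, \bar Q_{\dot\alpha\dot\beta}$$  [18]


It is simple to prove using [15] and [16] that this
quantity is a scalar under the new rotation group
$K'$: $[M_{\alpha\beta}, Q] = 0$ and $[M'_{\dot\alpha\dot\beta}, Q] = 0$. In addition, from
[15], it follows that Q is nilpotent (in the absence of
central charges):


$$Q^2 = 0$$  [19]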


The scalar generator Q leads to the topological
symmetry $\delta$ of the previous section. Actually, the
twisting procedure provides also the operator $G_\mu$ in
[12]. Defining


$$G_\mu = \frac{i}{4} (\bar\sigma_\mu)^{\dot\alpha\gamma} Q_{\gamma\dot\alpha}$$  [20]

one easily finds, after using [15] and [18],

$$\{Q, G_\mu\} = \partial_\mu$$  [21]


which is indeed equivalent to [12]. On general
grounds we cannot prove that twisted supersymmetric
theories lead to theories which satisfy [2].
However, relation [12], which is weaker, is guaranteed.
It turns out that in all the models originating
from extended supersymmetry which have been
studied, [2] is satisfied and thus the resulting
theories are TQFTs of Witten type.
Donaldson-Witten Theory


One of the greatest successes of TQFT has been the 
discovery of Seiberg—Witten invariants as building 
blocks of Donaldson invariants. This was achieved 
in two main steps. First, Donaldson theory was 
reformulated in field-theoretical terms, using pertur- 
bative methods. Second, the resulting TQFT was 
solved using nonperturbative methods. In this sec- 
tion we are going to describe in some detail the first 
step. The second one will be briefly addressed later 
and is the main object of a separate article in the 
encyclopedia (see Seiberg—Witten Theory). 

Let us consider N = 2 supersymmetric Yang-Mills
theory in four dimensions. The field content of the
theory is the following: a gauge field $A_\mu$, two spinors
$\lambda_{v\alpha}$, and a complex scalar $\phi$, all of them in the
adjoint representation of a gauge group G. In
addition, the theory possesses the auxiliary fields
$D_{vw}$ in the 3 of the internal $SU(2)_R$. The theory has
the following action:


J d*x tr (vuo Vib = io VA — TFE” 
1 1 1 
ye D =- H2 — „ow yo i 
S 4 2 ld, Q | D À lp b] Nigal 
l v ~wa 
Sre K al", 4) 22 


This action is invariant under the following N = 2
supersymmetric transformations:


5b = V 2e" Edw 
6A, SiE A — ido pe 
dv = Dy" Ewe — i€valdy 1] — io!” a bya F uv 
+iv2Eyor E Vad 
6D™ = 2i TV AY) + iV A THE” 
+ 2i V 2E 1, ol] + 2iV2E" X”, g 


|23] 


€, being spinorial N =2 supersymmetric parameters. 

We can now twist the above theory following the 
procedure explained in the previous section. Upon 
twisting, the fields of the theory change their spin 
content as follows: 


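$$A_\mu\, (2,2,0)^{0} \to A_\mu\, (2,2)^{0}$$
$$\lambda_{\alpha v}\, (2,0,2)^{1} \to \psi_{\alpha\dot\beta}\, (2,2)^{1}$$
$$\bar\lambda_{\dot\alpha v}\, (0,2,2)^{-1} \to \eta\, (0,0)^{-1},\ \chi_{\dot\alpha\dot\beta}\, (3,0)^{-1}$$
$$\phi\, (0,0,0)^{2} \to \phi\, (0,0)^{2}$$
$$\phi^\dagger\, (0,0,0)^{-2} \to \phi^\dagger\, (0,0)^{-2}$$
$$D_{vw}\, (0,0,3)^{0} \to D_{\dot\alpha\dot\beta}\, (3,0)^{0}$$  [24]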


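In this table the representations of the respective
rotation groups carried by the fields have been
indicated. The superindices refer to the $U(1)_R$ charge,
which is also called “ghost number” in the context
of TQFT. The fields $\eta$ and $\chi$ are given by the
antisymmetric and symmetric pieces of $\bar\lambda_{\dot\alpha\dot\beta}$:
$\chi_{\dot\alpha\dot\beta} = \bar\lambda_{(\dot\alpha\dot\beta)}$ and $\eta = \frac{1}{2} \epsilon^{\dot\alpha\dot\beta} \bar\lambda_{\dot\alpha\dot\beta}$.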

Notice that the twisted fields in [24] are differ- 
ential forms on X; therefore, the twisted theory 
makes sense globally on any arbitrary Riemannian 
4-manifold. This is not the case with the original 
N =2 supersymmetric Yang-Mills, which contains 
fermionic fields. Making global sense of those on 
arbitrary Riemannian 4-manifolds requires the 
manifold to be Spin. 

The dynamics of the twisted theory is governed by 
an action which can be obtained by twisting the 
action [22]. On an arbitrary Riemannian 4-manifold 
endowed with a metric $g_{\mu\nu}$, the twisted action
becomes 


S= / d*x./gtr (7.606! PDT kag 


: aa 1 V 1 ab 
—ideaV n — FF + 7D 54D p 
1 i ogg 
oe 172 _ _* 48 , 
519.9") a ID, Xa 
i 


Jd Wad 


where $\sqrt{g} = (\det(g_{\mu\nu}))^{1/2}$.

To obtain the transformations of the fields under 
the topological symmetry, we need to compute the 
Q-transformations. These are easily obtained using 
[18] and [23]. They turn out to be 


izle v0] 25) 


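$$[Q, \phi] = 0$$
$$[Q, A_\mu] = \psi_\mu$$
$$\{Q, \eta\} = i[\phi, \phi^\dagger]$$
$$\{Q, \psi_\mu\} = 2\sqrt{2}\, \nabla_\mu \phi$$
$$[Q, \phi^\dagger] = 2\sqrt{2}\, i\, \eta$$
$$\{Q, \chi_{\dot\alpha\dot\beta}\} = i (F^+_{\dot\alpha\dot\beta} - D_{\dot\alpha\dot\beta})$$
$$[Q, D] = (2\nabla\psi)^+ + 2\sqrt{2}\, [\phi, \chi]$$  [26]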


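where $\psi_\mu = \sigma_{\mu\,\alpha\dot\alpha} \psi^{\dot\alpha\alpha}$ and $F^+_{\dot\alpha\dot\beta} = \bar\sigma^{\mu\nu}_{\dot\alpha\dot\beta} F_{\mu\nu}$ is the self-dual
part of $F_{\mu\nu}$. Using these transformations, one easily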
finds that $Q^2$ is a gauge transformation. This is not
unexpected since the N = 2 supersymmetric transformations
[23] are in the Wess-Zumino gauge and
they close only up to gauge transformations. This 
property implies that one must consider the equiv- 
ariant cohomology of Q defined on the set of gauge- 
invariant operators. 


The action [25] is Q-exact up to a topological
term:


$$S = \{Q, V\} - \frac{1}{2} \int F \wedge F$$  [27]


where 
“i (panes os 


Sni. PF seta vigt) |28] 


2/2 


Actually, it turns out that in all the theories obtained
after twisting extended supersymmetry, the resulting
actions are Q-exact up to topological terms. In the
case of N = 2 theories, topological (theta) terms
$\int F \wedge F$ are generically not observable (due to a chiral
anomaly), so it is customary to pick


$$S_{DW} = \{Q, V\}$$  [29]


as the action of the theory, which immediately implies
[2] and therefore the topological character of the
theory. Notice, however, that [29] is stronger than [2].

As we described in the previous section, the 
observables of the theory can be constructed using 
the operator $G_\mu$ in [20]. Its action on the twisted
fields is easily obtained using [23]:





[Ga 6] = nam 

IG), Ay] = 58w — İXw 
[G,n] = -Spa 

IG, FT] =iVx + 5 * Vn 
{G, x} = ae Vo 
IG, D] = -2s y+ Vy 


We now need to fix the basic operator $\phi^{(0)}$ in [14].
The starting point must be a set of gauge-invariant,
Q-closed operators which are not Q-trivial. Since
$[Q, \phi] = 0$, these operators are the gauge-invariant
polynomials in the field $\phi$. For a simple gauge group
of rank r the algebra of these polynomials is
generated by r elements, and we shall denote this
basis by $O_n$, $n = 1, \ldots, r$. A simple choice for SU(N)
consists of the following Casimirs:


$$O_n = \mathrm{tr}(\phi^{n+1}), \qquad n = 1, \ldots, N-1$$  [31]
Using G we can now construct the map between
the homology of X and the equivariant cohomology
of Q. Let us consider the simple case SU(2). There
exists only one independent Casimir and, correspondingly,
only one basic operator:


$$O = \mathrm{tr}(\phi^2)$$  [32]


for which one finds the following set of descendants: 


OY = tr : on) dx” 


z 
s o(F, + Du) 


a 

“tr| —¢(F,, r 

v2 33] 
5s ) dx” A dx” 


o2) —— 


The map from the homology of X to the equivariant
cohomology of Q can now be constructed very
easily. Let $\gamma_i$ be an element of the homology group
$H_i(X)$. We associate to it the following observable:


$$I_i(\gamma_i) = \int_{\gamma_i} O^{(i)}$$  [34]

where $O^{(i)}$ is given in [33]. The construction assures
that $I_i(\gamma_i)$ is invariant under Q and gauge transformations.
Furthermore, it is also assured that $I_i(\gamma_i)$ is
not Q-exact.

Let us consider the computation of correlation
functions. The discussion will be presented for a
generic gauge group. We will consider the topological
theory defined by the Donaldson-Witten action


$$S_{DW} = \{Q, V\}$$  [35]


where V is defined in [28]. The property [35] has a
very important consequence. The action $S_{DW}$ shows
up in the correlation functions as $\exp(-S_{DW}/e^2)$,
where e is a free parameter which corresponds to
the coupling constant of the N = 2 theory. Since the
term involving the coupling constant is Q-exact, the
correlation functions of Q-invariant operators are
independent of e. Let us explain this in some detail.
The (unnormalized) correlation functions of the
theory are defined by


$$\langle \phi_1 \cdots \phi_n \rangle = \int \mathcal{D}\phi\; \phi_1 \cdots \phi_n\, e^{-\frac{1}{e^2} S_{DW}}$$  [36]


where $\phi_1, \ldots, \phi_n$ are invariant under Q transformations.
Using the fact that $S_{DW}$ is Q-exact, one obtains


$$\frac{\partial}{\partial e} \langle \phi_1 \cdots \phi_n \rangle = \frac{2}{e^3} \langle \phi_1 \cdots \phi_n S_{DW} \rangle = \frac{2}{e^3} \langle \{Q, \phi_1 \cdots \phi_n V\} \rangle = 0$$  [37]

where we have used the fact that Q is a symmetry of
the theory, and therefore as in [3] the last functional
integral gives zero. This result implies that one can
compute these correlation functions in different 
limits of e. In the weak-coupling limit (semiclassical 
or saddle point approximation), one establishes the 
connection with Donaldson theory. In the strong- 
coupling limit, Seiberg-Witten invariants appear and 
one finds the connection between these two types of 
invariants. We will briefly explore the weak- 
coupling limit e — 0. The functional integral [36] 
can be evaluated exactly in two steps: first one 
analyzes the zero modes or classical configurations 
that minimize the action, then one expands around 
them considering only quadratic fluctuations. The 
integration over these quadratic fluctuations 
involves ratios of determinants of kinetic operators 
that because of the Q-symmetry of the theory (which
in fact is a Bose-Fermi symmetry) are $\pm 1$. One is
then left with an integral over the bosonic zero
modes which leads to a finite-dimensional integral
over the space of bosonic collective coordinates, and
a finite Grassmannian integral over the zero modes
of the fermionic fields. A careful analysis of the zero
modes, first carried out by Witten, reveals that the
infinite-dimensional functional integral is replaced
by a finite-dimensional integral over the moduli
space of anti-self-dual (ASD) connections $M_{ASD}$,
that is, the space of connections satisfying $F^+_{\mu\nu} = 0$.

Therefore, the correlation functions [36] have the
form


$$\langle \phi_1 \cdots \phi_n \rangle = \int_{M_{ASD}} \hat\phi_1 \wedge \cdots \wedge \hat\phi_n$$  [38]


where the fields $\phi_1, \ldots, \phi_n$ are mapped to differential
forms $\hat\phi_1, \ldots, \hat\phi_n$ on $M_{ASD}$, the degree of each
form being given by the ghost number of its
partner. Notice that the integral on the right-hand
side vanishes unless the form has top degree. From
the field-theoretical point of view, this is the
requirement that the overall ghost number of the
correlation function must be equal to $\dim M_{ASD}$.
The quantities on the right-hand side of [38] are,
for gauge group SU(2), precisely the Donaldson
invariants. Thus, Witten’s work provided a new
point of view on these invariants by reformulating 
them in a quantum field theory language. This is a 
very important contribution since quantum field 
theory is a very rich framework and a wide variety 
of methods can be used to analyze the correlation 
functions. This opened an entirely new strategy to 
investigate the Donaldson invariants. The emergence 
of Seiberg—Witten invariants is perhaps the greatest 
achievement of the implementation of this strategy. 


We finish this section by pointing out that many 
features of the evaluation of the functional integral 
of the Donaldson—Witten theory developed here are 
common to most topological field theories of the 
Witten type. These features can be studied in the 
context of the Mathai—Quillen formalism which is 
the object of a separate article in the encyclopedia 
(see Mathai—Quillen Formalism). 


Chern-Simons Gauge Theory 
for Knots and Links 


Chern-Simons gauge theory is the most important 
example of Schwarz-type TQFTs. Let us begin by 
introducing its basic elements. Chern—Simons gauge 
theory is a quantum field theory whose action is 
based on the Chern-Simons form associated to a 
nonabelian gauge group. The theory is defined by 
the following data: a smooth 3-manifold M which 
will be taken to be compact, a gauge group G which 
will be taken semisimple and compact, and an 
integer parameter k. The action of the theory is 


$$S_{CS}(A) = \frac{k}{4\pi} \int_M \mathrm{tr}\left( A \wedge dA + \frac{2}{3} A \wedge A \wedge A \right)$$  [39]


where A is a gauge connection and the trace is taken 
in the fundamental representation. The exponential 
of i times this action is invariant under gauge 
transformations, 


$$A \to g^{-1} A g + g^{-1} dg$$  [40]


where g is a map g:M — G. 

Notice that the action [39] is independent of the 
metric on the 3-manifold M. In this theory, appro- 
priate observables lead to correlation functions 
which correspond to topological invariants. Candi- 
dates to be observables of this type must be metric 
independent and gauge invariant. Wilson loops 
satisfy these properties. They correspond to the 
holonomy of the gauge connection A along a loop. 
Given a representation R of the gauge group G and 
a 1-cycle $\gamma$ on M, the Wilson loop is defined as


$$W_\gamma^R(A) = \mathrm{tr}_R(\mathrm{Hol}_\gamma(A)) = \mathrm{tr}_R\, P \exp \oint_\gamma A$$  [41]


Products of these operators are the natural candi- 
dates to obtain topological invariants after comput- 
ing their correlation functions. These correlation 
functions are formally written as 


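$$\langle W_{\gamma_1}^{R_1} W_{\gamma_2}^{R_2} \cdots W_{\gamma_n}^{R_n} \rangle = \int [DA]\, W_{\gamma_1}^{R_1}(A)\, W_{\gamma_2}^{R_2}(A) \cdots W_{\gamma_n}^{R_n}(A)\, e^{i S_{CS}(A)}$$  [42]


where $\gamma_1, \gamma_2, \ldots, \gamma_n$ are 1-cycles on M and $R_1, R_2, \ldots, R_n$
are representations of G. In [42], the
quantity [DA] denotes the functional integral measure
and it is assumed that an integration over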
connections modulo gauge transformations is car- 
ried out. As usual in quantum field theory, this 
integration is not well defined. Field theorists have 
developed methods to assign a meaning to the right- 
hand side of [42]. These methods mainly fall into 
two categories — perturbative and nonperturbative — 
and their degree of success mostly depends on the 
quantum field theory under consideration. For gauge 
theories, it is also possible to take an alternative 
approach, the large-N expansion, which in general 
provides further insights into the theory. In Chern- 
Simons gauge theory all these three methods have 
proved of great value. 

Witten (1989) showed, using nonperturbative 
methods, that when one considers nonintersecting 
cycles $\gamma_1, \gamma_2, \ldots, \gamma_n$ without self-intersections, the
correlation functions [42] lead to the polynomial 
invariants of knot theory discovered a few years 
earlier starting with the work of Jones (1985). 

Knot theory studies embeddings $\gamma: S^1 \to M$. Any
two such embeddings are considered equivalent if
the image of one of them can be deformed into the 
image of the other by a homeomorphism on M. The 
main goal of knot theory is to classify the resulting 
equivalence classes. Each of these classes is a knot. 
Most of the work on knot theory has been carried 
out for the simple case $M = S^3$. Chern-Simons gauge
theory, however, being a formulation intrinsically 
three dimensional, provides a framework to study the 
case of more general 3-manifolds M. 

A powerful approach to classify knots is based on 
the construction of knot invariants. These are 
quantities which can be computed for a representa- 
tive of a class and are invariant within the class, that 
is, under continuous deformations of the chosen 
representative. At present, it is not known if there 
exist enough knot invariants to classify knots. 
Vassiliev invariants (Vassiliev 1990) are the most 
promising candidates, but it is already known that if 
they do provide such a classification, infinitely many 
of them are needed. 

The problem of the classification of knots in $S^3$
can be reformulated in a two-dimensional framework
using regular knot projections. Given a
representative of a knot in $S^3$, deform it continuously
in such a way that the projection on a plane has 
simple crossings. Draw the projection on the plane, 
and at each crossing use the convention that the line 
that goes under the crossing is erased in a neighbor- 
hood of the crossing. The resulting diagram is a set 
of segments on the plane, containing the relevant 
information at the crossings. The problem of
classifying knots is equivalent to the problem 
of classifying knot projections modulo a series of 
relations among them. These relations are known as 
Reidemeister moves. Invariance of a quantity under 
the three Reidemeister moves is called invariance 
under ambient isotopy. If a quantity is invariant 
under all but the first move, it is said to possess 
invariance under regular isotopy. 

The formalism described for knots generalizes to 
the case of links. For a link of n components, one 
considers n embeddings, $\gamma_i: S^1 \to M$ ($i = 1, \ldots, n$),
with no intersections among them. Again, the main 
problem that link theory faces is the problem of 
their classification modulo homeomorphisms on M. 
In this case one can also define regular projections 
and reformulate the problem in terms of their 
classification modulo the Reidemeister moves. 

The study of knot and link invariants saw
important progress in the 1980s. Jones (1985)
discovered a new invariant which carries his name.
The Jones polynomial can be defined very simply in 
terms of skein relations. These are a set of rules that 
can be applied to the diagram of a regular knot 
projection to construct the polynomial invariant. 
They establish a relation between the invariants 
associated to three links which only differ in a 
region as shown in Figure 1 where arrows have been 
introduced to take into account that the Jones 
polynomial is defined for oriented links. 

If one denotes by $V_L(t)$ the Jones polynomial
corresponding to a link L, t being the argument of
the polynomial, it must satisfy the skein relation:


$$\frac{1}{t} V_{L_+} - t\, V_{L_-} = \left( \sqrt{t} - \frac{1}{\sqrt{t}} \right) V_{L_0}$$  [43]


where $L_+$, $L_-$, and $L_0$ are the links shown in
Figure 1. This relation plus a choice of normalization
for the unknot (U) are enough to compute the
Jones polynomial for any link. The standard choice
for the unknot is


$$V_U = 1$$  [44]


though it is not the most natural one from the point
of view of Chern-Simons gauge theory. After Jones'
work in 1984, many other polynomial invariants
were discovered, such as the HOMFLY and the
Kauffman polynomial invariants.


Figure 1 Skein relations.
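
The statement that [43] and [44] determine the Jones polynomial of any link can be illustrated with a small computation. The sketch below is only an illustration under stated assumptions: Laurent polynomials in $s = \sqrt{t}$ are stored as dictionaries, the helper names (`add`, `mul`, `v_plus`) are hypothetical, and the crossing changes used for the Hopf link and the trefoil are the standard ones (switching one positive crossing of the Hopf link gives the two-component unlink, smoothing it gives the unknot; switching one positive crossing of the trefoil gives the unknot, smoothing it gives the Hopf link).

```python
# Laurent polynomials in s = sqrt(t), stored as {exponent of s: coefficient}.

def add(p, q):
    """Sum of two Laurent polynomials, dropping zero coefficients."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}

def mul(p, q):
    """Product of two Laurent polynomials, dropping zero coefficients."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def v_plus(v_minus, v_zero):
    """Solve the skein relation [43] for V(L_+):
    V(L_+) = t * ( t * V(L_-) + (sqrt(t) - 1/sqrt(t)) * V(L_0) )."""
    z = {1: 1, -1: -1}   # sqrt(t) - 1/sqrt(t)
    t = {2: 1}           # t = s^2
    return mul(t, add(mul(t, v_minus), mul(z, v_zero)))

unknot = {0: 1}              # normalization [44]
unlink = {1: -1, -1: -1}     # -(sqrt(t) + 1/sqrt(t)), two unlinked circles

hopf = v_plus(unlink, unknot)     # Hopf link with two positive crossings
trefoil = v_plus(unknot, hopf)    # trefoil with three positive crossings
```

In this encoding `hopf` represents $-t^{1/2} - t^{5/2}$ and `trefoil` represents $t + t^3 - t^4$. The value used for the unlink is itself fixed by [43]: adding a kink to the unknot gives $L_+ = L_- = U$ with $L_0$ the unlink, which is what `v_plus(unknot, unlink) == unknot` checks.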

The pioneering work of Witten in 1988 showed 
that the correlation functions of products of Wilson 
loops [42] correspond to the Jones polynomial when 
one considers SU(2) as gauge group and all the 
Wilson loops entering in the correlation function are 
taken in the fundamental representation F. For 
example, if one considers a knot K, Witten showed 
that 


    V_K(t) = \langle W_K^F \rangle    [45]


provided that one performs the identification


    t = \exp\left( \frac{2\pi i}{k + h} \right)    [46]


where $h = 2$ is the dual Coxeter number of the gauge
group SU(2). Witten also showed that if instead of 
SU(2) one considers SU(N) and the Wilson loop 
carries the fundamental representation, the resulting 
invariant is the HOMFLY polynomial. The second 
variable of this polynomial originates in this context 
from the N dependence. However, these cases are 
just a sample of the general framework intrinsic to 
Chern-Simons gauge theory. Taking other groups 
and other representations, one obtains an enormous
set of knot and link invariants. These invariants 
can also be obtained in the context of quantum 
groups. 

Many nonperturbative studies of Chern-Simons 
gauge theory have been carried out. The quantiza- 
tion of the theory has been studied from the point of 
view of the operator formalism as well as other 
more geometrical methods. Also, its connection to 
two-dimensional conformal field theory has been 
further elucidated, and a powerful method for the 
general computation of knot and link invariants has 
been developed by Kaul and collaborators. 

Chern-Simons theory is also amenable to pertur- 
bative analysis, which has provided important 
representations of the Vassiliev invariants. These 
invariants, proposed by Vassiliev in 1990, turned 
out to be the coefficients of the perturbative series 
expansion of the correlators of Chern-Simons gauge 
theory. Perturbative studies can be carried out in 
different gauges, giving rise to a variety of new
representations of Vassiliev invariants. Among the 
most relevant results related to these topics are the 
integral expressions for Vassiliev invariants by 
Kontsevich and by Bott and Taubes, as well as the 
recent combinatorial ones. These developments are 
not described here but the interested reader is 
referred to the recent review (Labastida 1999). 


Advanced Developments 


Topological sigma models are another important 
type of (Witten-type) TQFTs. These theories are 
obtained after twisting 2D $\mathcal{N} = 2$ supersymmetric
sigma models. The twisting can be done in two
different ways leading to two types of models, A and
B. Their existence is related to mirror symmetry.
Only type-A models will be described in what
follows. These models can be defined on an
arbitrary almost-complex manifold, though typically
they are considered on Kähler manifolds. The theory
involves maps from two-dimensional Riemann surfaces
$\Sigma$ to target spaces X, together with fermionic
degrees of freedom on $\Sigma$ which are mapped to
tangent vectors on X. The functional integral of the
resulting theory is localized on holomorphic maps, 
defining the corresponding moduli space. The 
corresponding Q-cohomology provides the set of 
physical observables, which can be mapped to 
cohomology classes on the moduli space and 
integrated to produce topological invariants. 

Topological sigma models keep fixed the complex
structure of the Riemann surface $\Sigma$. Motivated
by string theory, one also considers the
situation in which one integrates over complex 
structures. In this case, one ends up working with 
holomorphic maps in the entire moduli space of 
curves. The resulting theories are called topologi- 
cal strings. 

We will now review a particular example of
topological string theory which, besides being very 
interesting from the point of view of physics and 
mathematics, will be very useful in establishing a 
relation with Chern-Simons gauge theory. Let us 
consider topological strings with target manifold X 
a Calabi-Yau 3-fold. In this case, the virtual 
dimension of the moduli space of holomorphic 
maps turns out to be zero. Two situations can 
occur: either the space is given by a number of 
points (the real dimension is zero) or the moduli 
space is finite dimensional and possesses a bundle of 
the same dimension as the tangent bundle. In the 
first case, topological strings count the number of 
points weighted by the exponential of the area of the 
holomorphic map (the pullback of the Kähler form
integrated over the surface) times $x^{2g-2}$, where x is
the string-coupling constant and g is the genus of $\Sigma$.
In the second case, one computes the top Chern class
of the appropriate bundles (properly defined), again
weighted by the same factor. In both cases one can
classify the contributions according to the homology
class $\beta$ on X in which the image of the holomorphic
map is contained. The sum of the numbers obtained
for each $\beta$ and fixed g are known as Gromov-Witten
invariants, $N_\beta^g$. The topological string contribution
takes the form


    \sum_{g \ge 0} \sum_{\beta \in H_2(X,\mathbb{Z})} N_\beta^g \, e^{\int_\beta \omega} \, x^{2g-2}    [47]


where $\omega$ is the Kähler class of the Calabi-Yau manifold.
In general, the quantities $N_\beta^g$ are rational numbers.

The preceding discussion has shown how Gromov-
Witten invariants can be interpreted in terms of string
theory. One could think that this is just a fancy
observation and that no further insight on these
invariants can be gained from this formulation. The
situation turns out to be quite the opposite. Once a string
formulation has been obtained, the whole machinery of
string theory is at our disposal. One should look for new
ways to compute the quantity [47], where Gromov-
Witten invariants are packed. The hope is that, if this is
possible, the new emerging picture will provide new
insights on these invariants. This is indeed what
occurred recently. It turns out that the quantity [47]
can be obtained from an alternative point of view in
which the embedded Riemann surfaces are regarded as
D-branes. The outcome of this approach is that the
Gromov-Witten invariants can be written in terms of
other invariants which are integers and that possess a
geometrical interpretation. To be more specific, the
quantity [47] takes the form


    \sum_{g \ge 0} \sum_{\beta \in H_2(X,\mathbb{Z})} \sum_{d \ge 1} n_\beta^g \, \frac{1}{d} \left( 2 \sin \frac{d x}{2} \right)^{2g-2} e^{d \int_\beta \omega}    [48]


where $n_\beta^g$ are the new "integer" invariants. This
prediction has been verified in all the cases in which 
it has been tested. A similar structure will be found 
in the next section in the context of knot theory in 
the large-N limit. 
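
To make the passage from rational to integer invariants concrete, one can compare the leading power of x in [47] and [48]. Assuming [48] has the Gopakumar-Vafa form $\sum n_\beta^g \frac{1}{d} (2\sin(dx/2))^{2g-2} e^{d\int_\beta\omega}$ and that classes are labeled by a single positive integer degree (both assumptions made here for illustration), the $x^{-2}$ terms give $N^0_\beta = \sum_{d \mid \beta} n^0_{\beta/d} / d^3$. The sketch below evaluates this relation; the function name `gw_genus0` is hypothetical, and the input integers are the well-known genus-zero values for the quintic 3-fold, which are not taken from this article.

```python
from fractions import Fraction

def gw_genus0(n, beta):
    """Genus-zero Gromov-Witten number from integer invariants:
    N^0_beta = sum over divisors d of beta of n^0_{beta/d} / d^3."""
    return sum(Fraction(n[beta // d], d**3)
               for d in range(1, beta + 1) if beta % d == 0)

# Integer genus-zero invariants of the quintic 3-fold in degrees 1, 2, 3
# (standard values quoted from the literature, used only as sample input).
n_quintic = {1: 2875, 2: 609250, 3: 317206375}

N1 = gw_genus0(n_quintic, 1)   # 2875
N2 = gw_genus0(n_quintic, 2)   # 609250 + 2875/8
N3 = gw_genus0(n_quintic, 3)   # 317206375 + 2875/27
```

The degree-1 number is already an integer, while in degrees 2 and 3 the multiple-cover terms $2875/8$ and $2875/27$ make $N^0_\beta$ rational, which is the pattern described above.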

Let us now consider also Donaldson-Witten theory
from a new perspective. To be more specific, let us
consider the case in which the gauge group is SU(2),
and the 4-manifold X is simply connected and has
$b_2^+ > 1$ (the case $b_2^+ = 1$ is anomalous). In this situation
there are $1 + b_2$ physical observables [34], $\mathcal{O}$ and
$I(\Sigma_a)$ $(a = 1, \dots, b_2)$, where $\Sigma_a$ is a basis of
$H_2(X)$. These can be packed in a generating functional:


    \left\langle \exp\left( \sum_a \alpha_a I(\Sigma_a) + \lambda \mathcal{O} \right) \right\rangle    [49]


where $\lambda$ and $\alpha_a$ $(a = 1, \dots, b_2)$ are parameters. In
computing this quantity one can argue that the
contribution is localized on the moduli space of
instanton configurations and one ends up, after
taking into account the selection rule dictated by the
dimensionality of the moduli space, with integrations
over the moduli space of the selected forms. The
resulting quantities are Donaldson invariants.

As in the case of topological sigma models one could 
be tempted to argue that the observation leading to a 
field-theoretical interpretation of Donaldson invariants
does not provide any new insight. On the
contrary, once a field theory formulation is available,
one has at one's disposal a huge machinery which could
lead, on the one hand, to further generalizations of the 
theory and, on the other hand, to new ways to 
compute quantities such as [49], obtaining new 
insights on these invariants. This is indeed what 
happened in the 1990s, leading to an important 
breakthrough in 1994 when Seiberg and Witten 
calculated [49] in a different way and pointed out the 
relation of Donaldson invariants to new integer 
invariants that nowadays bear their names. 

The localization argument that led to the interpreta- 
tion of [49] as Donaldson invariants is valid because 
the theory under consideration is exact in the weak- 
coupling limit. Actually, the topological theory under 
consideration is independent of the coupling constant 
and thus calculations in the strong-coupling limit are 
also exact. These types of calculations were out of 
reach before 1994. The situation changed dramatically 
after the work of Seiberg and Witten in which $\mathcal{N} = 2$
super Yang-Mills theory was solved in the strong- 
coupling limit. Its application to the corresponding 
twisted version was immediate and it turned out that 
Donaldson invariants can be written in terms of new 
integer invariants now known as Seiberg—Witten 
invariants (Witten 1994). The development has a 
strong resemblance with the one described above for 
topological strings: certain noninteger invariants can 
be expressed in terms of new integer invariants. 

The Seiberg-Witten invariants are actually simpler 
to compute than Donaldson invariants. They corre- 
spond to partition functions of topological 
Yang-Mills theories where the gauge group is 
abelian. These contributions can be grouped into
classes labeled by $x = -2c_1(L)$, where $c_1(L)$ is the
first Chern class of the corresponding line bundle.
The sum of contributions, each being $\pm 1$, for a given
class x is the integer Seiberg-Witten invariant $n_x$. The
strong-coupling analysis of topological Yang-Mills
theory leads to the following expression for [49]:


    2^{1 + \frac{1}{4}(7\chi + 11\sigma)} \left( e^{\left( \frac{v^2}{2} + 2\lambda \right)} \sum_x n_x \, e^{v \cdot x}
        + i^{\frac{\chi + \sigma}{4}} \, e^{\left( -\frac{v^2}{2} - 2\lambda \right)} \sum_x n_x \, e^{-i v \cdot x} \right)    [50]


where $v = \sum_a \alpha_a \Sigma_a$, and $\chi$ and $\sigma$ are the Euler
number and the signature of the manifold X. This
result matches the known structure of [49] (structure
theorem of Kronheimer and Mrowka) and provides
a meaning to its unknown quantities in terms of the
new Seiberg-Witten invariants. Equation [50] is a
rather remarkable prediction that has been tested in 
many cases, and for which a general proof has been 
recently proposed. For a review of the subject, see 
Labastida and Lozano (1998). 

The situation for manifolds with $b_2^+ = 1$ involves a
metric dependence and has been worked out in
detail (Moore and Witten 1998). The formulation of
Donaldson invariants in field-theoretical terms has
also provided a generalization of these invariants.
This generalization has been carried out in several
directions: (1) the consideration of higher-rank
groups, (2) the coupling to matter fields after
twisting $\mathcal{N} = 2$ hypermultiplets, and (3) the twist of
theories involving $\mathcal{N} = 4$ supersymmetry.

We will now look at Chern-Simons gauge theory 
from the perspective that emerges from its treatment 
in the context of the large-N expansion. We will 
restrict the discussion to the case of knots on $S^3$ with
gauge group SU(N). Gauge theories with gauge group
SU(N) admit, besides the perturbative expansion, a
large-N expansion. In this expansion correlators are
expanded in powers of 1/N while keeping the
't Hooft coupling $t = Nx$ fixed, x being the coupling
constant of the gauge theory. For example, for the
free energy of the theory, one has the general form


    F = \sum_{g \ge 0,\, b \ge 1} C_{g,b} \, N^{2-2g} \, t^{2g-2+b}    [51]


In the case of Chern-Simons gauge theory, the coupling
constant is $x = 2\pi i/(k + N)$ after taking into account
the shift in k. The large-N expansion [51] resembles a
string-theory expansion and indeed the quantities $C_{g,b}$
can be identified with the partition function of a
topological open string with g handles and b boundaries,
with N D-branes on $S^3$ in an ambient
six-dimensional target space $T^*S^3$. This was pointed out by
Witten in 1992. The result makes a connection between 
a topological three-dimensional field theory and the 
topological strings described in the previous section. 
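
The power counting behind this identification can be written out in one line. The following is a sketch under the usual assumptions (not spelled out in the text) that an open-string surface with g handles and b boundaries is weighted by $x^{2g-2+b}$, and that each boundary ending on the N D-branes contributes a factor of N:

```latex
% Weight of a surface with g handles and b boundaries, rewritten with t = N x
x^{2g-2+b} \, N^{b}
  = \left( \frac{t}{N} \right)^{2g-2+b} N^{b}
  = N^{2-2g} \, t^{2g-2+b}
```

With these assumptions each term has the same dependence on N and t as the general term of [51], so that fixing g and summing over b at fixed t produces a series in $N^{2-2g}$.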

In 1998 an important breakthrough took place 
which provided a new approach to compute quan- 
tities such as [51]. Using arguments inspired by the 
AdS/CFT correspondence, Gopakumar and Vafa
(1999) provided a closed-string-theory interpretation
of the partition function [51]. They conjectured that
the free energy F can be expressed as


    F = \sum_{g \ge 0} N^{2-2g} F_g(t)    [52]


where $F_g(t)$ corresponds to the partition function of a
topological closed-string theory on the noncompact
Calabi-Yau manifold X called the resolved conifold,
$\mathcal{O}(-1) \oplus \mathcal{O}(-1) \to \mathbb{P}^1$, t being the flux of the B-field
through $\mathbb{P}^1$. The quantities $F_g(t)$ have been computed
using both physical and mathematical arguments, 
thus proving the conjecture. 

Once a new picture for the partition function of 
Chern-Simons gauge theory is available, one should 
ask about the form that the expectation values of 
Wilson loops could take in the new context. The 
question was faced by Ooguri and Vafa and they 
provided the answer, later refined by Labastida, 
Mariño, and Vafa. The outcome is an entirely new
point of view in the theory of knot and link 
invariants. The new picture provides a geometrical 
interpretation of the integer coefficients of the 
quantum group invariants, an issue that has been 
investigated for many years. To present an
account of these developments, one needs to review 
first some basic facts of large-N expansions. 

To consider the presence of Wilson loops, it is 
convenient to introduce a particular generating 
functional. First, one performs a change of basis 
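from representations R to conjugacy classes C(k) of
the symmetric group, labeled by vectors
$k = (k_1, k_2, \dots)$ with $k_j \ge 0$, and $|k| = \sum_j k_j > 0$.
The change of basis is $W_k = \sum_R \chi_R(C(k)) W_R$,
where $\chi_R$ are characters of the permutation group
$S_\ell$ of $\ell = \sum_j j k_j$ elements ($\ell$ is also the number of
boxes of the Young tableau associated to R).
Second, one introduces the generating functional:


    F(V) = \log Z(V) = \sum_k \frac{|C(k)|}{\ell!} W_k^{(c)} \Upsilon_k(V)    [53]


where


    Z(V) = \sum_k \frac{|C(k)|}{\ell!} W_k \Upsilon_k(V), \qquad \Upsilon_k(V) = \prod_j \left( \mathrm{Tr}\, V^j \right)^{k_j}
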
In these expressions |C(k)| denotes the number of
elements of the class C(k) in $S_\ell$. The reason behind
the introduction of this generating functional is that
the large-N structure of the connected Wilson loops,
$W_k^{(c)}$, turns out to be very simple:


    \frac{|C(k)|}{\ell!} W_k^{(c)} = \sum_{g=0}^{\infty} x^{2g-2+|k|} F_{g,k}(\lambda)    [54]


where $\lambda = e^t$ and $t = Nx$ is the 't Hooft coupling.
Writing $x = t/N$, it corresponds to a power series
expansion in 1/N. As before, the expansion looks
like a perturbative series in string theory where g is
the genus and |k| is the number of holes. Ooguri and
Vafa conjectured in 1999 the appropriate string-
theory description of [54]. It corresponds to an open
topological string theory (notice that the ones
described in the previous section were closed),
whose target space is the resolved conifold X. The
contribution from this theory will lead to open-
string analogs of Gromov-Witten invariants.

In order to describe in more detail the fact that one 
is dealing with open strings, some new data need to 
be introduced. Here is where the knot description 
intrinsic to the Wilson loop enters. Given a knot K on
$S^3$, let us associate to it a Lagrangian submanifold $C_K$
with $b_1 = 1$ in the resolved conifold X and consider a
topological open string on it. The contributions in
this open topological string are localized on holomorphic
maps $f : \Sigma_{g,b} \to X$ with $b = |k|$ which satisfy
$f_*[\Sigma_{g,b}] = Q$, and $f_*[C] = j[\gamma]$ for $k_j$ oriented circles
C. In these expressions $\gamma \in H_1(C_K, \mathbb{Z})$, and
$Q \in H_2(X, C_K, \mathbb{Z})$, that is, the map is such that $k_j$
boundaries of $\Sigma_{g,b}$ wrap the knot j times, and $\Sigma_{g,b}$
itself gets mapped to a relative two-homology class
characterized by the Lagrangian submanifold $C_K$.
The number of such maps (in the sense described in
the previous section) is the open-string analog of
Gromov-Witten invariants. They will be denoted by
$N_{g,k}^Q$. Comparing to the situation that led to [47] in
the closed-string case, one concludes that in this case
the quantities $F_{g,k}(\lambda)$ in [54] must take the form


    F_{g,k}(\lambda) = \sum_Q N_{g,k}^Q \, e^{\int_Q \omega}, \qquad t = \int_{\mathbb{P}^1} \omega    [55]


where $\omega$ is the Kähler class of the Calabi-Yau
manifold X and $\lambda = e^t$. For any Q, one can always
write $\int_Q \omega = Qt$, where Q is in general a half-integer
number. Therefore, $F_{g,k}(\lambda)$ is a polynomial in $\lambda^{\pm 1/2}$
with rational coefficients.

The result [55] is very impressive but still does not 
provide a representation where one can assign a 
geometrical interpretation to the integer coefficients 
of the quantum-group invariants. Notice that to 
match a polynomial invariant to [55], after obtaining
its connected part, one must expand it in x after
setting $q = e^x$ keeping $\lambda$ fixed. One would like to
have a refined version of [55], in the spirit of what
was described in the previous section leading from
the Gromov-Witten invariants $N_\beta^g$ of [47] to the
new integer invariants $n_\beta^g$ of [48]. It turns out that,
indeed, F(V) can be expressed in terms of integer
invariants in complete analogy with the description 
presented in the previous section for topological 
strings. A good review on the subject can be found 
in Mariño (2005). 


Concluding Remarks 


In this overview we have introduced key features of 
TQFTs and we have described some of the most 
relevant results that have emerged from them. We have
described how the many faces of TQFT provide a
variety of important insights in a selected set of
problems in topology. Outstanding among these are the
reformulation of Donaldson theory and the discovery
of the Seiberg-Witten invariants, and the string-theory 
description of the large-N expansion of Chern-Simons 
gauge theory, which provides an entirely new point of 
view in the study of knot and link invariants and points 
to an underlying fascinating interplay between string 
theory, knot theory, and enumerative geometry which 
opens new fields of study. 

In addition to their intrinsic mathematical inter- 
est, TQFTs have been found relevant to important 
questions in physics as well. This is so because, in a 
sense, TQFTs are easier to solve than conventional 
quantum field theories. For example, topological 
sigma models are relevant to the computation of 
certain couplings in string theory. Also, Witten-type 
gauge TQFTs such as Donaldson—Witten theories 
and their generalizations play a role in string theory as
effective world-volume theories of extended string
states (branes) wrapping curved spaces, and TQFTs
arising from $\mathcal{N} = 4$ gauge theories in four dimensions
have shed light on field- (and string-) theory
dualities. 

Most of these developments, and others that we 
have not touched upon or only mentioned in passing,
have their own entries in the encyclopedia, to which 
we refer the interested reader for further details. 


See also: Axiomatic Approach to Topological Quantum 
Field Theory; BF Theories; Chern—Simons Models: 
Rigorous Results; Donaldson—Witten Theory; Gauge 
Theoretic Invariants of 4-Manifolds; Gauge Theory: 
Mathematical Applications; Hamiltonian Fluid Dynamics; 
The Jones Polynomial; Knot Theory and Physics; 
Mathai—Quillen Formalism; Mathematical Knot Theory; 
Schwarz-Type Topological Quantum Field Theory; 
Introduction


Topological sigma models govern the quantum
mechanics of maps from a Riemann surface $\Sigma$ to a
target space M. In contrast to the standard supersymmetric
sigma model, the topological version has a
special local shift symmetry. This symmetry takes the
form $\delta u^i = \epsilon^i$, where $\epsilon^i$ is an arbitrary local function of
the coordinates on the base manifold $\Sigma$. In essence, this
topological shift symmetry ensures that all local
degrees of freedom of the model can be gauged away.
As a result, the dynamics of such a model resides in a
finite number of global topological degrees of freedom.
This feature is generic to all topological field theories
of Witten type, also known as cohomological field
theories (see Topological Quantum Field Theory:
Overview). The topological shift symmetry is responsible
for the special topological nature of the model,
which is seen most readily by BRST quantizing the
local shift symmetry. This gives rise to a nilpotent
BRST operator Q. The properties of this BRST
operator are crucial for establishing the topological
nature of the model. The key point in the construction
of any cohomological field theory is the fact that the
full quantum action $S_q$ can be written as a BRST
commutator $S_q = \{Q, V\}$, where V is a function of the
fields needed to define the path integral. In particular,
one can show that the partition function and all
correlation functions are independent of the metric on
both the base manifold $\Sigma$ and the target space M. For
example, let us define the path integral by


    Z = \int d\Phi \, e^{-\{Q, V\}}    [1]


where $\Phi$ denotes the full set of fields required at the
quantum level. In general, the function V depends
on geometric data of both $\Sigma$ and M. Nevertheless,
one can easily establish that the partition function is
independent of this data by noting the following.
Variation of Z with respect to the metric of the
target space g (for example) gives


    \delta_g Z = -\int d\Phi \, e^{-S_q} \, \{Q, \delta_g V\}    [2]


The right-hand side of this equation is nothing but
the vacuum expectation value of a BRST commutator,
and this vanishes by BRST invariance of the
vacuum. It is important to note here that the BRST
operator Q can be constructed to be independent of
g. Apart from the necessity of introducing the metric
tensor, these models also require additional geometric
data for their construction. The complex
structure of $\Sigma$, and at least an almost-complex
structure on M, is required. By a similar argument,
one can show that the partition function and 
correlation functions are independent of this extra 


geometric data. As mentioned above, these models 
possess no local degrees of freedom. One can then 
show that the path-integral expression for the 
correlation functions can be localized to a finite- 
dimensional moduli space of instanton configura- 
tions which minimize the classical action. 

We will first show how the full quantum action of 
the theory can be obtained as a BRST quantization of a 
classical action with a local gauge symmetry. How- 
ever, we shall then highlight the fact that the gauge 
algebra for this topological shift symmetry only closes 
on-shell. In order to proceed with a BRST quantization 
of the model, and obtain the complete quantum 
action, one must take recourse to the Batalin— 
Vilkovisky quantization scheme. This machinery is 
ideally tailored for such a problem, with the end result 
that quartic ghost terms are present in the action. 
However, the presence of such terms does not affect 
the arguments presented above, since the quantum action is
still obtained as a BRST commutator. Following this,
we construct all observables of the theory and
demonstrate their connection to the de Rham cohomology
of the target space. The special topological
properties of the observables are then discussed, and it
is shown how their computation is localized to the
moduli space $\mathcal{M}$ of holomorphic maps from $\Sigma$ to M.
As a particular example, we show how the computation
of a certain class of observables determines the
intersection numbers of the moduli space $\mathcal{M}$. We
present a brief discussion of the connection between
topological sigma models with Calabi-Yau target
space M, and the mirror symmetry of M.


Construction of the Model 


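We begin with the following classical action:


    S_c = \int_\Sigma d^2\sigma \, \sqrt{h} \, h_{\alpha\beta} \, g_{ij} \, K^{\alpha i} K^{\beta j}    [3]


where


    K^{\alpha i} = G^{\alpha i} - \frac{1}{2} \left( \partial^\alpha u^i + \epsilon^\alpha{}_\beta J^i{}_j \partial^\beta u^j \right)    [4]


The fields $G^{\alpha i}$ and $K^{\alpha i}$ both satisfy the self-duality
constraint


    G^{\alpha i} = P_+{}^{\alpha i}{}_{\beta j} G^{\beta j}, \qquad K^{\alpha i} = P_+{}^{\alpha i}{}_{\beta j} K^{\beta j}    [5]


where the self-dual and anti-self-dual projection
operators are defined as


    P_\pm{}^{\alpha i}{}_{\beta j} = \frac{1}{2} \left( \delta^\alpha{}_\beta \delta^i{}_j \pm \epsilon^\alpha{}_\beta J^i{}_j \right)    [6]


The above action describes a theory of maps $u^i(\sigma)$
from a Riemann surface $\Sigma$ to an almost complex
manifold M. The coordinates on $\Sigma$ are denoted by
$\sigma^\alpha$ $(\alpha = 1, 2)$, while those on the target manifold M
are denoted by $u^i$ $(i = 1, \dots, \dim M)$. The metric and
complex structure of $\Sigma$ are denoted by $h_{\alpha\beta}$ and $\epsilon^\alpha{}_\beta$,
respectively; they obey the relations $\epsilon^\alpha{}_\beta \epsilon^\beta{}_\gamma = -\delta^\alpha{}_\gamma$,
and $\epsilon_{\alpha\beta} = h_{\alpha\gamma} \epsilon^\gamma{}_\beta$. The metric tensor $g_{ij}$ and almost-
complex structure $J^i{}_j$ of M obey analogous relations
to the above. In the general model, the target space
need only be an almost-complex manifold. This
requires the existence of a globally defined tensor
field $J^i{}_j$ such that $J^i{}_j J^j{}_k = -\delta^i{}_k$.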
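
That the operators in [6] really are complementary projectors follows only from the two complex structures squaring to minus the identity. The sketch below checks this numerically in the smallest case; it is an illustration under stated assumptions: the projectors are read as $\frac{1}{2}(1 \pm \epsilon \otimes J)$ acting on the index pair $(\alpha, i)$, the target is taken two-dimensional, and the explicit matrices and helper names (`kron`, `matmul`, `combine`) are choices made here, not taken from the text.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i][j] * b[k][l] for j in range(n) for l in range(m)]
            for i in range(n) for k in range(m)]

def matmul(a, b):
    """Ordinary matrix product of nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def combine(a, b, sign):
    """Return (a + sign * b) / 2 entrywise."""
    return [[0.5 * (x + sign * y) for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]

eps = [[0, 1], [-1, 0]]   # sample worldsheet complex structure, eps^2 = -1
J = [[0, 1], [-1, 0]]     # sample target complex structure, J^2 = -1

identity = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
zero = [[0] * 4 for _ in range(4)]
eps_J = kron(eps, J)      # acts on the combined index (alpha, i)

P_plus = combine(identity, eps_J, +1)
P_minus = combine(identity, eps_J, -1)
```

Since $(\epsilon \otimes J)^2 = \epsilon^2 \otimes J^2 = 1$, one finds $P_\pm^2 = P_\pm$, $P_+ P_- = 0$ and $P_+ + P_- = 1$, so a field constrained as in [5] carries half of the components of an unconstrained one.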

The action [3] is invariant under the topological
shift symmetry


    \delta u^i = \epsilon^i    [7]


where $\epsilon^i$ is an arbitrary local function of the
coordinates on the base manifold $\Sigma$. Already, at
this level, we see the distinction with the standard
sigma model. The presence of this shift symmetry
means that all local degrees of freedom can be
gauged away, leaving only a finite number of global
topological degrees of freedom. It requires some
work to determine the corresponding transformation
for $G^{\alpha i}$, the key point being the preservation of the
self-duality constraint. We find


    \delta G^{\alpha i} = P_+{}^{\alpha i}{}_{\beta j} \left( D^\beta \epsilon^j + \frac{1}{2} \epsilon^\beta{}_\gamma (D_k J^j{}_l) \epsilon^k \partial^\gamma u^l \right)
        + \frac{1}{2} \epsilon^\alpha{}_\beta (D_k J^i{}_j) \epsilon^k G^{\beta j} - \Gamma^i{}_{jk} \epsilon^j G^{\alpha k}    [8]


where the covariant derivative is defined by
$D_\alpha \epsilon^i = \partial_\alpha \epsilon^i + \Gamma^i{}_{jk} \partial_\alpha u^j \epsilon^k$.
Having determined the classical symmetries of the
model, we can now proceed with the BRST quantized
form of the quantum action. As a topological field
theory of Witten type, one can show that the quantum
action can be written as a BRST commutator, that is,
$S_q = \{Q, V\}$, where the gauge fermion V is defined by
= 2 PY [A&i a ai 
V= | dovb Cui (Pu 7B ) 9 


where $\alpha$ is an arbitrary gauge-fixing parameter. The
BRST operator Q is nilpotent, $Q^2 = 0$, off-shell. It is
defined by $\delta\Phi = \epsilon\{Q, \Phi\}$, and takes the form


bu =C 

6C =0 
ôCai =€ (Ba a ; Ea” (Dp Ji) CC® +ECO ) 
5B“ = 7 (Ri, + a AE [10] 
_ =e (DF) Ck Bo 


+£ (ŒD; I',) (CD) Lae 3" 
In the above, the ghost field is denoted by $C^i$, while
the anti-ghost field $\bar{C}_{\alpha i}$ and the multiplier field $B_{\alpha i}$
obey the self-duality constraint [5]. The key point to
note in the above transformations is the fact that the
ghost field $C^i$ is BRST invariant. Again, this is a
feature which is generic to all cohomological field 
theories. The existence of such a field allows the 
construction of an entire set of topological correla- 
tion functions, as we shall see in the following 
section. 

While the gauge-fixing parameter $\alpha$ is arbitrary, a
conventional choice is to take $\alpha = 1$, and then
integrate out the multiplier field B. This yields the 
action in the form 


1 _ . J ee 
= / d?ovh p h” g;:0,,u' Ogu Pe Jjðaw ð 
+ Cai (pre + a (DiJ) u*C ) 


Le m ma T 
+zČa C "Rap GC 


+ OT OOJO a) 


It should be stressed that the classical gauge algebra 
[7] and [8] only closes on-shell. Quantization of the 
model is therefore more subtle, and requires use of 
the Batalin-Vilkovisky formalism. The on-shell 
closure problem automatically results in the pre- 
sence of quartic ghost coupling terms in the action 
and consequently cubic terms in the BRST transfor- 
mations. Despite this, we have established that the 
full quantum action can be written as a BRST 
commutator. 

The form of the action simplifies when the 
complex structure of the target manifold is 
covariantly constant, $D_k J^i{}_j = 0$. In this case, the
target manifold M is Kähler and we denote the
complex coordinates as $u^I$, with their complex
conjugates denoted by $u^{\bar{I}}$. The nonzero components
of the metric tensor are then $g_{I\bar{J}}$. Similarly,
the coordinates of $\Sigma$ are denoted $\sigma^\pm$, with nonzero
metric components $h_{+-}$. The nonzero components
of the ghost and anti-ghost are then given by
$C^I$, $C^{\bar{I}}$, $\bar{C}_{+\bar{I}}$, $\bar{C}_{-I}$. The action can be written in the
form 


= 1 _ = 
m J dovh b yOu da +5C,(D-C)h*-gy 
+ 


PS 
z Č- (D+4C')bt SIJ 
1 O3 7 
toh Cy CIR WOO | [12] 


Construction of Observables 


Having defined the quantum action, it is now of 
interest to consider the correlation functions of the 
model. In the functional integral, we integrate over 
all maps $\Sigma \to M$ in a fixed homotopy class. Let us
consider a correlation function


    \langle O \rangle = \int du^i \, d\bar{C}_{\alpha i} \, dC^i \, e^{-t S_q} \, O    [13]


where t > 0 is a parameter, and the observable O is
BRST invariant, $\{Q, O\} = 0$. From the BRST invariance
of the vacuum, it follows immediately that the
vacuum expectation value of a BRST commutator is
zero, $\langle \{Q, O\} \rangle = 0$. An operator which is a BRST
commutator is said to be Q-exact. Hence, our
interest is in the Q-cohomology classes of operators,
that is, BRST invariant operators modulo BRST 
exact operators. It is for this reason that such a 
model is called a cohomological field theory. 

One can now show that the variation of [13] with
respect to t is a BRST commutator, namely


    \delta_t \langle O \rangle = -\delta t \int du^i \, d\bar{C}_{\alpha i} \, dC^i \, e^{-t S_q} \, \{Q, V O\} = 0    [14]


As a result, one can evaluate the correlation function
in the large-t (weak-coupling) limit. In this limit, the
path integral is dominated by fluctuations around
the classical minima. For the sigma model under
study, the classical action is minimized by the
instanton configurations


    \partial_\alpha u^i + \epsilon_\alpha{}^\beta J^i{}_j \partial_\beta u^j = 0    [15]


Indeed, this localization of the path integral to the
moduli space of instantons can also be seen by
choosing the $\alpha = 0$ gauge in [9]. Integration over the
multiplier field then imposes a delta function
constraint to the instanton configurations. The key
point in the above derivation is the fact that the
quantum action is a BRST commutator, $S_q = \{Q, V\}$.
By a similar argument, one can show that variations
of $\langle O \rangle$ with respect to the metric and complex
structure of $\Sigma$ and M are also zero.
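
The mechanism behind [14], independence of t together with localization at large t, can be seen in a zero-dimensional toy analogue. This example is an assumption introduced purely for illustration and is not the sigma model of the text: for an action of the form $t\{Q, V\}$ built from one bosonic variable x, one multiplier and one ghost/anti-ghost pair, integrating out everything except x leaves $Z(t) = \sqrt{t/2\pi} \int dx \, W''(x) \, e^{-t W'(x)^2 / 2}$ for a chosen function W. The sketch below evaluates this integral numerically for the sample choice $W'(x) = x^3 - x$; the function name `z` and the integration parameters are hypothetical.

```python
import math

def z(t, lo=-4.0, hi=4.0, n=100_000):
    """Trapezoid estimate of sqrt(t / 2 pi) * integral of
    W''(x) * exp(-t * W'(x)**2 / 2) over [lo, hi], with W'(x) = x**3 - x."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w1 = x**3 - x          # W'(x), vanishing at x = -1, 0, 1
        w2 = 3 * x**2 - 1      # W''(x), the "fermion determinant"
        f = w2 * math.exp(-0.5 * t * w1 * w1)
        total += f if 0 < i < n else 0.5 * f
    return math.sqrt(t / (2 * math.pi)) * total * h

values = [z(t) for t in (0.5, 1.0, 10.0, 100.0)]
```

All entries of `values` agree with 1 to numerical accuracy. At large t the integrand is concentrated near the three zeros of $W'$, which play the role of the instanton configurations [15], and each contributes the sign of $W''$ there: $+1 - 1 + 1 = 1$.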

Our aim now is to construct the Q-cohomology
classes of operators in the theory. Let us first associate
an operator $O_A^{(0)}$ to each p-form
$A = A_{i_1 \cdots i_p} du^{i_1} \wedge \cdots \wedge du^{i_p}$
on the target space M, given by


    O_A^{(0)} = A_{i_1 \cdots i_p} C^{i_1} \cdots C^{i_p}    [16]


where $C^i$ is the ghost field. Under a BRST
transformation, we see that


    \{Q, O_A^{(0)}\} = -\partial_{i_0} A_{i_1 \cdots i_p} C^{i_0} C^{i_1} \cdots C^{i_p} = -O_{dA}^{(0)}    [17]


since the ghost fields are BRST invariant by [10].
Hence, $O_A^{(0)}$ is BRST invariant if and only if A is a
closed p-form. Similarly, if A is an exact p-form,
then the corresponding operator is Q-exact. Hence,
the BRST cohomology classes of these operators are
in one to one correspondence with the de Rham
cohomology classes on M. The reason for assigning
the peculiar superscript to the operator $O_A^{(0)}$ will
become clear at the end of this construction. Notice
also that operators of the form $O_A^{(0)}$ can be used as
building blocks for constructing new observables. If
we consider a set of closed forms $A_1, \dots, A_n$, then
the product of the associated operators $O_{A_1}^{(0)} \cdots O_{A_n}^{(0)}$
is clearly Q-invariant as well.

When considering the vacuum expectation values 
of operators which are polynomials in the fields, 
there is an implicit dependence on the points where 
the operators are located. In the case at hand 
however, the operator $O_A^{(0)}$ at the point $\sigma$ has a
vacuum expectation value which is a topological
invariant, and thus cannot depend on the chosen
point. To see this explicitly, we consider all fields
defined over $\Sigma$, and differentiate the operator with
respect to some local coordinates $\sigma^\alpha$:


    \frac{\partial}{\partial\sigma^\alpha} \left( A_{i_1 \cdots i_p} C^{i_1} \cdots C^{i_p} \right)
        = (\partial_j A_{i_1 \cdots i_p}) \frac{\partial u^j}{\partial\sigma^\alpha} C^{i_1} \cdots C^{i_p}
        + p A_{i_1 \cdots i_p} \frac{\partial C^{i_1}}{\partial\sigma^\alpha} C^{i_2} \cdots C^{i_p}    [18]


In terms of exterior derivatives, this takes the form,


    dO_A^{(0)} = \partial_{i_0} A_{i_1 \cdots i_p} du^{i_0} C^{i_1} \cdots C^{i_p} + p A_{i_1 \cdots i_p} dC^{i_1} C^{i_2} \cdots C^{i_p}
        = \{Q, O_A^{(1)}\}    [19]


where $O_A^{(1)} = -p A_{i_1 \cdots i_p} du^{i_1} C^{i_2} \cdots C^{i_p}$, and we have
used the fact that A is a closed p-form. If we let $\gamma$
represent any path between two arbitrary points P
and P', then this expression has the integral form,


    O_A^{(0)}(P) - O_A^{(0)}(P') = \left\{ Q, \int_\gamma O_A^{(1)} \right\}    [20]


and we see that the vacuum expectation value of
$O_A^{(0)}$ is point independent by the BRST invariance of
the vacuum. The same remark applies to any
product of operators of the form we are considering.
To continue our construction, consider a one- 
dimensional homology cycle y(ðôy = 0), and define 


w= f op 21] 


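This new operator $W_A^{(1)}(\gamma)$ is BRST invariant by
inspection,

$$\{Q, W_A^{(1)}(\gamma)\} = \int_\gamma \{Q, O_A^{(1)}\} = \int_\gamma dO_A^{(0)} = 0 \qquad [22]$$

Moreover, if $\gamma$ happens to be the boundary of a two-dimensional
surface ($\gamma = \partial\beta$), so that $\gamma$ is trivial in
homology, then this new operator is likewise trivial
in Q cohomology:

$$W_A^{(1)}(\gamma) = \int_\gamma O_A^{(1)} = \int_\beta dO_A^{(1)} = \Big\{ Q, \int_\beta O_A^{(2)} \Big\} \qquad [23]$$

where

$$O_A^{(2)} = -\frac{p(p-1)}{2} A_{i_1 \cdots i_p}\, du^{i_1} \wedge du^{i_2}\, C^{i_3} \cdots C^{i_p}$$

As before, let us now associate to each homology
2-cycle $\beta$ ($\partial\beta = 0$), another BRST invariant operator
$W_A^{(2)}(\beta)$ defined by

$$W_A^{(2)}(\beta) = \int_\beta O_A^{(2)} \qquad [24]$$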


The BRST invariance follows trivially as in [23]. 

In summary, we have produced three operators
$O_A^{(0)}$, $O_A^{(1)}$, and $O_A^{(2)}$ from any given closed form A,
which satisfy the relations:

$$0 = \{Q, O_A^{(0)}\}, \qquad dO_A^{(0)} = \{Q, O_A^{(1)}\}, \qquad dO_A^{(1)} = \{Q, O_A^{(2)}\}, \qquad dO_A^{(2)} = 0 \qquad [25]$$

The BRST observables are then given by arbitrary
products of the integrated operators $W_A^{(i)}(\gamma) = \int_\gamma O_A^{(i)}$,
where $\gamma$ is any i-cycle in homology.


Observables and Intersection Theory 


Let us consider the computation of the correlation 
function $\langle O \rangle$ in the background field method. We
first pick a background instanton configuration [15], 
and then integrate over the quantum fluctuations 
around that instanton. The relevant part of the 
quantum action is quadratic in the quantum fields, 
and localization of the model then ensures that such 
a computation is exact. The quantum fields are 
expanded into eigenfunctions of the operators that 
appear in the quadratic part of the action, and the 
functional integral is replaced by an integral over the 
eigenmodes. However, if there are fermionic zero 
modes, then those modes do not enter in the action. 
As a result, the fermionic integrals ($\int d\chi = 0$) over
those modes will cause $\langle O \rangle$ to vanish unless it has
the correct fermion content; the zero modes must be
absorbed. In our case, a glance at the quantum
action indicates that we should concern ourselves
with the zero modes of the ghost $C^i$ and anti-ghost
$\bar{C}_{\alpha i}$. A $C^i$ zero mode is clearly in the kernel of the
operator


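$$D^{i}_{\alpha j} = D_\alpha \delta^i_j + \epsilon_{\alpha\beta} J^i{}_j D^\beta + \epsilon_{\alpha\beta} (D_j J^i{}_k)\, \partial^\beta u^k \qquad [26]$$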


and a $\bar{C}_{\alpha i}$ zero mode is a zero eigenfunction of its
adjoint $D^*$. In the BRST quantization of the model,
the ghost fields $C^i$ are assigned ghost number +1,
while the anti-ghost fields $\bar{C}_{\alpha i}$ have ghost number
−1. It is therefore apparent that the vacuum
expectation value of any observable will vanish
unless that observable has a ghost number equal to
the number of D zero modes, a, minus the number
of $D^*$ zero modes, b. This difference, $w = a - b$, is
called the index of the operator D.
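
The selection rule just stated can be phrased as a simple bookkeeping check. The sketch below is only illustrative: the function name and arguments are hypothetical, and it assumes that an operator built from a p-form carries ghost number p (one unit for each ghost field it contains).

```python
def correlator_may_be_nonzero(form_degrees, num_d_zero_modes, num_dstar_zero_modes):
    """Ghost-number selection rule for a product of observables.

    form_degrees: degrees p_1, ..., p_s of the closed forms defining the
        operators (assumption: a p-form operator has ghost number p).
    num_d_zero_modes: a, the number of zero modes of D.
    num_dstar_zero_modes: b, the number of zero modes of the adjoint D*.
    """
    index = num_d_zero_modes - num_dstar_zero_modes  # w = a - b
    total_ghost_number = sum(form_degrees)
    # The vacuum expectation value must vanish unless the two agree.
    return total_ghost_number == index


# Two 2-forms saturate an index of 4; a 2-form and a 1-form do not.
print(correlator_may_be_nonzero([2, 2], 4, 0))
print(correlator_may_be_nonzero([2, 1], 4, 0))
```

The check is necessary, not sufficient: a correlator that passes it can still vanish for other reasons.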

There is a direct link between this index and the 
dimension of the moduli space of instantons. Recall 
that we are considering the space of maps $\Sigma \to M$ in a
specified homotopy class, which satisfy equation [15].
It is then of interest to determine the dimension of the
space of such solutions. To this aim, we examine the
constraint that arises by considering an instanton $u^i$,
and another neighboring solution $u^i + \delta u^i$, where $\delta u^i$ is
an infinitesimal deformation. To first order in $\delta u^i$, we
see that $\delta u^i$ must be a zero mode of the operator D.
This is no coincidence, and we can thus interpret the
ghost fields $C^i$ as cotangent vectors to instanton
moduli space $\mathcal{M}$. In particular, if $\mathcal{M}$ is a smooth
manifold, then $\dim \mathcal{M} = a$. The index of the operator
D is called the virtual dimension of the moduli space.
In generic situations, the virtual dimension is equal to
the actual dimension $\dim \mathcal{M}$.

It is possible to interpret some of the observables 
that we have described in terms of intersection 
theory applied to the moduli space of instantons. In 
particular, one can show that all correlation func- 
tions of the form 


$$\langle O_{A_1}^{(0)} \cdots O_{A_s}^{(0)} \rangle \qquad [27]$$


are intersection numbers of certain submanifolds of 
moduli space. In order to see this in a simple 
example, we first recall the notion of Poincaré 
duality and the relationship between cohomology 
and homology. 

Poincaré duality can be formulated as a relation- 
ship between de Rham cohomology (defined in 
terms of closed differential forms) and homology 
(defined in terms of subspaces of M). For our 
purposes here, it is sufficient to state that we can
associate to each boundaryless submanifold N of
codimension k, a cohomology class $[\phi] \in H^k(M)$,
such that

$$\int_M \phi \wedge \psi = \int_N \psi \qquad [28]$$

for all $[\psi] \in H^{n-k}(M)$. By $\psi$ on the right-hand side of
this equation, we mean the pullback $i^*\psi$ under the
inclusion $i: N \to M$. Conversely, to each closed
k-form $\phi$ on M, we can associate an (n − k)-cycle
N (it is in general a chain of subspaces), unique up 
to homology, such that the previous relation is 
satisfied. Furthermore, one can show that the 
Poincaré dual to N can be chosen in such a way 
that its support is localized within any given open 
neighborhood of N in M (essentially delta function 
support on N). 

Let us now define the notion of transversal 
intersection. For simplicity, we will first consider 
the intersection of two submanifolds $M_1$ and $M_2$
contained in M. We will say that these two
submanifolds have transversal intersection if the
tangent spaces satisfy

$$T_x(M_1) + T_x(M_2) = T_x(M) \qquad [29]$$

for all $x \in M_1 \cap M_2$. It is a theorem that a submanifold
of codimension k can be locally “cut-out” by k smooth
functions, that is, the submanifold is locally specified by
the zeros of this set of functions. It is a worthwhile
exercise to convince oneself that the definition of
transversal intersection is equivalent to the statement
that the functions which cut-out $M_1$ are independent
from those which cut-out $M_2$. Thus, we can write

$$\mathrm{codim}(M_1 \cap M_2) = \mathrm{codim}(M_1) + \mathrm{codim}(M_2) \qquad [30]$$


More generally, we say that the intersection $M_1 \cap \cdots \cap M_s$
of s submanifolds is transversal if the intersection of
every pair of them is transversal. It then follows
trivially by the previous argument that the codimensions
must satisfy

$$\mathrm{codim}(M_1 \cap \cdots \cap M_s) = \sum_{i=1}^{s} \mathrm{codim}(M_i) \qquad [31]$$


The special case which will be important for us 
occurs when the intersection of submanifolds is a 
collection of points, that is, when the codimension 
of the intersection is equal to the dimension of M. 
Since these points are isolated, the compactness of 
M guarantees that they are finite in number. 
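
As a small worked illustration of [29], [30] and [31] (an example added here, not part of the construction above; it uses the noncompact space $\mathbb{R}^3$ only to keep the formulas explicit), take $M = \mathbb{R}^3$ with coordinates (x, y, z) and the three coordinate planes

$$M_1 = \{x = 0\}, \qquad M_2 = \{y = 0\}, \qquad M_3 = \{z = 0\}.$$

Each plane is cut out by a single function, so each has codimension 1. At any point of $M_1 \cap M_2$ the tangent spaces are

$$T_x(M_1) = \mathrm{span}\{\partial_y, \partial_z\}, \qquad T_x(M_2) = \mathrm{span}\{\partial_x, \partial_z\}, \qquad T_x(M_1) + T_x(M_2) = \mathbb{R}^3 = T_x(M),$$

so the intersection is transversal, and the same holds for the other pairs. The codimensions then add as stated:

$$\mathrm{codim}(M_1 \cap M_2) = 1 + 1 = 2, \qquad \mathrm{codim}(M_1 \cap M_2 \cap M_3) = 1 + 1 + 1 = 3 = \dim M.$$

Indeed, $M_1 \cap M_2$ is the z-axis, and the triple intersection is the single isolated point at the origin, which is the situation relevant for intersection numbers.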

We are now in a position to describe in what sense
correlation functions of the form $\langle O_{A_1}^{(0)} \cdots O_{A_s}^{(0)} \rangle$
determine intersection numbers in the moduli space
$\mathcal{M}$ of instantons. By definition, this moduli space is
the set of maps from $\Sigma$ to M which satisfy [15]. Let
us consider the generic situation, where the virtual
dimension of $\mathcal{M}$ (i.e., the index of D) is equal to
$\dim \mathcal{M}$. For convenience, let us begin by choosing the
forms $A_i$ which represent de Rham cohomology
classes on M, together with their Poincaré duals $M_i$,
such that the forms have essentially delta function
support on their respective submanifolds. Since each
of the operators in the correlation function depends
on some fixed point $\sigma_i$, it is meaningful to define the
submanifolds $L_i = \{u \in \mathcal{M} \mid u(\sigma_i) \in M_i\} \subset \mathcal{M}$. Now,
the correlation function represents a functional
integral over the space of maps $\mathrm{Map}(\Sigma, M)$, and we
have argued that this integral only receives contributions
from the instanton configurations. Since the
operators $A_i(u(\sigma_i))$ vanish unless $u \in L_i$ by our choice
of the Poincaré duals, we see that the only contribution
to the functional integral can be from those maps
which lie in the intersection $L_1 \cap \cdots \cap L_s$. By ghost
number considerations, this correlation function must
vanish unless the codimension of the intersection
equals the virtual dimension of $\mathcal{M}$. In the generic
case where the virtual dimension is equal to $\dim \mathcal{M}$,
this means that the intersection is simply a finite
number of points. Intersection numbers ±1 can then
be assigned to each point in the intersection $L_1 \cap \cdots \cap L_s$,
by considering the relative orientation of the
submanifolds $L_i$ at the intersection points. From the
functional integral point of view, the computation
reduces to an evaluation of the ratio of the bosonic
determinant (integration over u) to the fermionic
determinant (integration over $C^i$ and $\bar{C}_{\alpha i}$). In the
Kähler case, for example, the intersection number
assigned to each point in the intersection is always +1.
This is due to the fact that the C!, C_’ determinant is 
the complex conjugate of the C’, Ce determinant. 


A and B Models and Mirror Symmetry 


The topological sigma model for a Kähler target
space [12] is also known as the topological A model.
In this case, the action can be recovered by twisting
the standard N=2 supersymmetric sigma model.
This twisting procedure amounts to a reassignment
of the spins of the fields in the theory. However,
there is an alternative twisting which can be done,
and this leads to another model known as the
topological B model. The usefulness of this observation
lies in the fact that the topological A model on a
Calabi-Yau target space M is related to the
topological B model on the mirror of M. This
relationship and the computation of correlation
functions in the A and B models thus shed light
on the nature of mirror symmetry.


Introduction


Turbulence was initially defined as an irregular
motion in fluids. The cloud formations in the
atmosphere and the motion of water in rivers make
this point clear. These are but a few readily available
examples of a multitude of flows which display
turbulent regimes: from the blood that flows in our
veins and arteries to the motion of air within our
lungs and around us; from the flow of water in
creeks to the atmospheric and oceanic currents;
from the flows past submarines, ships, automobiles,
and aircraft to the combustion processes propelling
them; and in the flow of gas, oil, and water, from
the prospecting end to the entrails of the cities. The
great majority of flows in nature and in engineering
applications are somehow turbulent.

Figure 1 Illustration of the irregular motion of a turbulent flow
over a flat plate (thin lines), and of the well-defined velocity
profile of the mean flow (thick lines).


But turbulent flows are much more than simply 
irregular. More refined definitions were desirable 
and were later coined. A definitive and precise one, 
however, may only come when the phenomenon is 
fully understood. Nevertheless, several characteristic 
properties of a turbulent flow can be listed: 


Irregularity and unpredictability A turbulent flow 
is irregular both in space and time, displaying 
unpredictable, random patterns. 

Statistical order From the irregularity of a turbu- 
lent motion there emerges a certain statistical 
order. Mean quantities and correlations are regular
and predictable (Figure 1). 

Wide range of active scales A wide range of scales of 
motion are active and display an irregular motion, 
yielding a large number of degrees of freedom. 

Mixing and enhanced diffusivity The fluid particles 
undergo complicated and convoluted paths, caus- 
ing a large mixing of different parts of fluid. This 
mixing significantly enhances diffusion, increasing 
the transport of momentum, energy, heat, and 
other advected quantities. 

Vortex stretching When a moving portion of fluid
also rotates transversally to its motion, an increase
in speed causes it to rotate faster, a phenomenon
called vortex stretching. This causes that portion 
of fluid to become thinner and elongated, and fold 
and intertwine with other such portions. This is 
an intrinsically three-dimensional mechanism 
which plays a fundamental role in turbulence 
and is associated with large fluctuations in the 
vorticity field. 


Turbulent Regimes 


Turbulence is studied from many perspectives. The 
subject of “transition to turbulence” attempts to 
describe the initial mechanisms responsible for the 
generation of turbulence starting from a laminar 
motion in particular geometries. This transition can 
be followed with respect to position in space (e.g., 
the flow becomes more complicated as we look 
further downstream on a flow past an obstacle or
over a flat plate) or to parameters (e.g., as we
increase the angle of attack of a wing or the pressure 
gradient in a pipe). This subject is divided into two 
cases: wall-bounded and free-shear flows. In the 
former, the viscosity, which causes the fluid to 
adhere to the surface of the wall, is the primary 
cause of the instability in the transition process. In 
the latter, inviscid mechanisms such as mixing layers 
and jets are the main factors. The tools for studying 
the transition to turbulence include linearization of 
the equations of motion around the laminar solu- 
tion, nonlinear amplitude equations, and bifurcation 
theory. 

“Fully developed turbulence,” on the other hand, 
concerns turbulence which evolves without imposed 
constraints, such as boundaries and external forces. 
This can be thought of as turbulence in its “pure”
form, and it is somewhat a theoretical framework
for research due to its idealized nature. Hypotheses
of homogeneity (when the mean quantities associated
with the statistical order characterizing a
turbulent flow are independent of position in space),
stationarity (idem in time), and isotropy (idem with respect
to rotations in space) concern fully developed
turbulent flows. The Kolmogorov theory was developed
in this context and it is the most fundamental
theory of turbulence. Current research is dedicated
in great part to unveiling the mechanisms behind a
phenomenon called intermittency and how it affects
the laws obtained from the conventional theory.
Research is also dedicated to deriving such laws as
much from first principles as possible, minimizing
the use of phenomenological and dimensional
analysis.

Real turbulent flows involve various regimes at
once. A typical flow past a blunt object, for
instance, displays laminar motion at its upstream
edge, a turbulent boundary layer further downstream,
and the formation of a turbulent wake
(Figure 2). The subject of turbulent boundary layers
is a world in itself, with current research aiming to
determine mean properties of flows over rough
surfaces and varied topography. Convective turbulence
involves coupling with active scalars such as
large heat gradients, occurring in the atmosphere,
and large salinity gradients, in the ocean. Geophysical
turbulence also involves stratification and the
anisotropy generated by Earth’s rotation. Anisotropic
turbulence is also crucial in astrophysics and
plasma theory. Multiphase and multicomponent
turbulence appear in flows with suspended particles
or bubbles and in mixtures such as gas, water, and
oil. Transonic and supersonic flows are also of great
importance and fall into the category of compressible
turbulence, much less explored than the
incompressible case.

Figure 2 Illustration of a flow past an object, with a laminar
boundary layer (light gray), a turbulent boundary layer (medium
gray), and a turbulent wake (dark gray).

In all those real situations one would like, from the 
engineering point of view, to compute mean proper- 
ties of the flow, such as drag and lift for more 
efficient designs of aircraft, ships, and other vehicles. 
Knowledge of the drag coefficient is also of funda- 
mental importance in the design of pipes and pumps, 
from pipelines to artificial human organs. Mean 
turbulent diffusion coefficients of heat and other 
passive scalars — quantities advected by the flow 
without interfering with it, such as chemical products,
nutrients, moisture, and pollutants — are also of 
major importance in industry, ecology, meteorology, 
and climatology, for instance. And in most of those 
cases a large amount of research is dedicated to the 
“control of turbulence,” either to increase mixing 
or reduce drag, for instance. From a theoretical 
point of view, one would like to fully understand 
and characterize the mechanisms involved in 
turbulent flows, clarifying this fascinating phe- 
nomenon. This could also improve practical appli- 
cations and lead to a better control of turbulence. 

The concept of “two-dimensional turbulence” is 
controversial. A two-dimensional flow may be 
irregular and display mixing, statistical order, and 
a wide range of active scales but definitely it does 
not involve vortex stretching since the velocity field 
is always perpendicular to the vorticity field. For this 
reason many researchers discard two-dimensional 
turbulence altogether. It is also argued that real 
two-dimensional flows are unstable at complicated 
regimes and soon develop into a three-dimensional 
flow. Nevertheless, many believe that two-dimensional 
turbulence, even lacking vortex stretching, is of 
fundamental theoretical importance. It may shed
some light on the three-dimensional theory and
modeling, and it can serve as an approximation to 
some situations such as the motion of the atmos- 
phere and oceans in the large and meso scales and 
some magnetohydrodynamic flows. The relative 
shallowness of the atmosphere and oceans or the 
imposition of a strong uniform magnetic field may 
force the flow into two-dimensionality, at least for 
a certain range of scales.

“Chaos” serves as a paradigm for turbulence, in
the sense that it is now accepted that turbulence is a
dynamic process in a sensitive deterministic
system. But not all chaotic motions in fluids are 
termed turbulent for they may not display mixing 
and vortex stretching or involve a wide range of 
scales. An important such example appears in the 
dispersive, nonlinear interactions of waves. 


The Equations of Motion 


It is usually stressed that turbulence is a continuum 
phenomenon, in the sense that the active scales are 
much larger than the collision mean free path 
between molecules. For this reason, turbulence is 
believed to be fully accounted for by the Navier- 
Stokes equations. 

In the case of incompressible homogeneous flows, 
the Navier-Stokes equations in the Eulerian form 
and in vector notation read 


$$\frac{\partial u}{\partial t} - \nu \Delta u + (u \cdot \nabla) u + \nabla p = f \qquad [1a]$$

$$\nabla \cdot u = 0 \qquad [1b]$$


Here, $u = u(x,t) = (u_1, u_2, u_3)$ denotes the velocity
vector of an idealized fluid particle located at
position $x = (x_1, x_2, x_3)$, at time t. The mass density
in a homogeneous flow is constant, denoted $\rho$. The
constant $\nu$ denotes the kinematic viscosity of the
fluid, which is the molecular viscosity $\mu$ divided by
$\rho$. The variable $p = p(x,t)$ is the kinematic pressure,
and $f = f(x,t) = (f_1, f_2, f_3)$ denotes the mass density
of volume forces.

Equation [1a] expresses the conservation of linear
momentum. The term $\nu \Delta u$ accounts for the dissipation
of energy due to molecular viscosity, and the
nonlinear term $(u \cdot \nabla) u$, also called the inertial term,
accounts for the redistribution of energy among
different structures and scales of motion. Equation
[1b] represents the incompressibility condition. In
Einstein’s summation convention, these equations
can be written as

$$\frac{\partial u_i}{\partial t} - \nu \frac{\partial^2 u_i}{\partial x_j^2} + u_j \frac{\partial u_i}{\partial x_j} + \frac{\partial p}{\partial x_i} = f_i, \qquad \frac{\partial u_j}{\partial x_j} = 0$$


The Reynolds Number 


The transition to turbulence was carefully studied by 
Reynolds in the late nineteenth century in a series of 
experiments in which water at rest in a tank was 
allowed to flow through a glass pipe. Starting with 
dimensional analysis, Reynolds argued that a critical
value of a certain nondimensional quantity was
likely to exist beyond which a laminar flow gives
rise to a “sinuous” motion. This was followed by
observations of the flow for tubes with different
diameter L, different mean velocities U across the
tube section, and with the kinematic viscosity
$\nu = \mu/\rho$ being altered through changes in temperature.
The experiments confirmed the existence of
such a critical value for what is now called the
Reynolds number:

$$\mathrm{Re} = \frac{LU}{\nu}$$

The dimensional analysis argument can be reproduced
in the following form: the physical dimension
for the inertial term in [1a] is $U^2/L$, while that for
the viscous term is $\nu U/L^2$. The ratio between them
is precisely $\mathrm{Re} = LU/\nu$. For small values of Re
viscosity dominates and the flow is laminar, whereas 
for large values of Re the inertial term dominates, 
and the flow becomes more complicated and 
eventually turbulent. In applications, different types 
of Reynolds number can be used depending on the 
choice of the characteristic velocity and length, but 
in any case, the larger the Reynolds number, the 
more complicated the flow. 
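
The dimensional argument can be checked numerically. The following is a minimal sketch; the function names and the sample values (a pipe of diameter 0.05 m, a mean velocity of 0.1 m/s, and a kinematic viscosity of roughly 1e-6 m²/s, a typical order of magnitude for water) are illustrative assumptions, not data from Reynolds's experiments.

```python
def reynolds_number(length, velocity, kinematic_viscosity):
    """Re = L U / nu for a characteristic length L and velocity U."""
    if kinematic_viscosity <= 0:
        raise ValueError("kinematic viscosity must be positive")
    return length * velocity / kinematic_viscosity


def inertial_to_viscous_ratio(length, velocity, kinematic_viscosity):
    """Ratio of the scale estimates U^2/L (inertial) and nu U/L^2 (viscous)."""
    inertial = velocity ** 2 / length
    viscous = kinematic_viscosity * velocity / length ** 2
    return inertial / viscous


# Hypothetical pipe flow; both expressions give the same number.
L, U, nu = 0.05, 0.1, 1.0e-6
print(reynolds_number(L, U, nu))
print(inertial_to_viscous_ratio(L, U, nu))
```

Both functions return the same value up to rounding, which is the content of the statement that the ratio of the two terms is precisely Re.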


The Reynolds Equations 


Another advance put forward by Reynolds in a
subsequent article was to decompose the flow into a
mean component and the remaining fluctuations. In
terms of the velocity and pressure fields this can be
written as

$$u = \bar{u} + u', \qquad p = \bar{p} + p' \qquad [2]$$

with $\bar{u}$ and $\bar{p}$ representing the mean components and
$u'$ and $p'$, the fluctuations. By substituting [2] into
[1], one finds the Reynolds-averaged Navier-Stokes
(RANS) equations for the mean flow:

$$\frac{\partial \bar{u}}{\partial t} - \nu \Delta \bar{u} + (\bar{u} \cdot \nabla) \bar{u} + \nabla \bar{p} = f + \nabla \cdot \tau, \qquad \nabla \cdot \bar{u} = 0$$

It differs from [1] only by the addition of the
Reynolds stress tensor:

$$\tau = -\overline{u' \otimes u'} = -\left( \overline{u_i' u_j'} \right)_{i,j=1}^{3}$$

In a laminar flow, the fluctuations are negligible;
otherwise this decomposition shows how they
influence the mean flow through these additional
turbulent stresses.
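
To make the decomposition concrete, the sketch below splits a handful of velocity samples at one point into a mean and fluctuations, and forms the corresponding stress tensor. It assumes that the mean is a plain sample average and that the kinematic Reynolds stress is minus the averaged product of fluctuation components; the sample values and function names are hypothetical.

```python
def mean_vector(samples):
    """Component-wise sample average of a list of 3-component velocities."""
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(3)]


def reynolds_decomposition(samples):
    """Return the mean velocity and the list of fluctuations u' = u - mean."""
    u_bar = mean_vector(samples)
    fluctuations = [[s[i] - u_bar[i] for i in range(3)] for s in samples]
    return u_bar, fluctuations


def reynolds_stress(samples):
    """tau_ij = -average(u'_i u'_j), a symmetric 3x3 matrix (assumed sign convention)."""
    _, fluctuations = reynolds_decomposition(samples)
    n = len(samples)
    return [[-sum(f[i] * f[j] for f in fluctuations) / n for j in range(3)]
            for i in range(3)]


# Hypothetical velocity samples (m/s) at one location.
samples = [(1.0, 0.0, 0.0), (1.2, 0.1, 0.0), (0.8, -0.1, 0.0), (1.0, 0.0, 0.0)]
u_bar, fluct = reynolds_decomposition(samples)
print(u_bar)
print(reynolds_stress(samples))
```

By construction the fluctuations average to zero, and the stress tensor vanishes when all samples coincide, which is the laminar limit.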


The Closure Problem and Turbulence 
Models 


The RANS equations cannot be solved directly for the 
mean flow since the Reynolds stresses are unknown. 
Equations for these stress terms can be derived but they 
involve further unknown moments. This continues 
with equations for moments of a given order depend- 
ing on new moments up to a higher order, leading to 
an infinite system of equations known as the Friedman-Keller
system. For practical applications,
approximations closing the system at some finite 
order are needed, in what is called the closure problem. 
Several ad hoc approximations exist, the most famous 
being the Boussinesq eddy-viscosity approximation, in 
which the turbulent fluctuations are regarded as 
increasing the viscosity of the flow. Prandtl’s mixing- 
length hypothesis yields a prescription for the compu- 
tation of this eddy viscosity, and together they form the 
basis of the algebraic models of turbulence. Other 
models involve additional equations, such as the k-ε
and k-ω models. Most of the practical computations of
industrial flows are based on such lower-order models, 
and a large amount of research is done to determine 
appropriate values for the various ad hoc parameters 
which appear in these models and which are highly 
dependent on the geometry of the flow. This depen- 
dency can be explained by the fact that the RANS is 
supposed to model the mean flow even at the large 
scales of motion, which are highly affected by the 
geometry. 
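
The algebraic closure mentioned above can be sketched in a few lines. The formulas used here are not given in this article; they are the usual textbook forms for a simple shear flow, stated as assumptions: the eddy viscosity is the squared mixing length times the magnitude of the mean shear, the turbulent shear stress is the eddy viscosity times the mean shear, and near a wall the mixing length is taken proportional to the wall distance with a constant of about 0.41. The names are hypothetical.

```python
VON_KARMAN = 0.41  # commonly quoted empirical value; an ad hoc model parameter


def mixing_length(wall_distance, kappa=VON_KARMAN):
    """Near-wall mixing length l_m = kappa * y (modeling assumption)."""
    return kappa * wall_distance


def eddy_viscosity(mean_shear, wall_distance, kappa=VON_KARMAN):
    """Mixing-length prescription: nu_t = l_m^2 * |dU/dy|."""
    l_m = mixing_length(wall_distance, kappa)
    return l_m ** 2 * abs(mean_shear)


def turbulent_shear_stress(mean_shear, wall_distance, kappa=VON_KARMAN):
    """Eddy-viscosity closure for the kinematic shear stress: nu_t * dU/dy."""
    return eddy_viscosity(mean_shear, wall_distance, kappa) * mean_shear


# Hypothetical values: mean shear 10 1/s at 0.1 m from the wall.
print(eddy_viscosity(10.0, 0.1))
print(turbulent_shear_stress(10.0, 0.1))
```

The closure replaces the unknown Reynolds stress by a quantity computed from the mean flow alone, at the price of an empirical parameter whose appropriate value depends on the flow.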

Computational fluid dynamics (CFD) is indeed a 
fundamental tool in turbulence, both for research and 
engineering applications. From the theoretical side, 
direct numerical simulations (DNS), which attempt to 
resolve all the active scales of the flow, reveal some 
fundamental mechanisms involved in the transition to 
turbulence and in vortex stretching. As for applica- 
tions, DNS applies to flows up to low-Reynolds 
turbulence, with the current computational power 
not allowing for a full resolution of all the scales 
involved in high-Reynolds flows. The current rate
of evolution of computational power suggests that
this will remain so for several decades.

An intermediate CFD method between RANS and 
DNS is the large-eddy simulation (LES), which 
attempts to fully resolve the large scales while 
modeling the turbulent motion at the smaller scales. 
Several models have been proposed which have their 
own advantages and limitations as compared to 
RANS and DNS. It is currently a subject of intense 
research, particularly for the development of suitable 
models for the structure functions near the boundary. 
Theoretical results on fully developed turbulence play 
a fundamental role in the modeling process. 


LESs are a promising tool and they have been 
successfully applied to a number of situations. The 
choice of the best method for a given application, 
however, depends very much on the Reynolds 
number of the flow and the prior knowledge of 
similar situations for adjusting the parameters. 


Elements of the Statistical Theory 


Several types of averages can be used. The ensemble
average is taken with respect to a number of experiments
at nearly identical conditions. Despite the
irregular motion of, say, the velocity vector $u^{(n)}(x,t)$
of each experiment n = 1,...,N, the average value

$$\langle u(x,t) \rangle = \frac{1}{N} \sum_{n=1}^{N} u^{(n)}(x,t)$$

is expected to behave in a more regular way. This
type of averaging is usually denoted with the symbol
$\langle \cdot \rangle$. This notion can be cast into the context of a
probability space $(M, \Sigma, P)$, where M is a set, $\Sigma$ is a
$\sigma$-algebra of subsets of M, and P is a probability
measure on $\Sigma$. The velocity field is a random
variable in the sense that it is a measurable function
$\omega \mapsto u(x,t,\omega)$ from M into the space of time-dependent
divergence-free velocity fields. The mean
velocity field in this context is regarded as

$$\langle u(x,t) \rangle = \int_M u(x,t,\omega)\, dP(\omega)$$

Other flow quantities such as energy and correlations
in space and time can be expressed by means
of a function $\varphi = \varphi(u(\cdot,\cdot))$ of the velocity field,
with their mean value given by

$$\langle \varphi(u(\cdot,\cdot)) \rangle = \int_M \varphi(u(\cdot,\cdot,\omega))\, dP(\omega)$$


In general, the statistics of the flow are allowed to
change with time. A particular situation is when
statistical equilibrium is reached, so that $\langle u(x,t) \rangle$
and, more generally, $\langle \varphi(u(\cdot, \cdot + t)) \rangle$ are independent
of t. In this case, an ergodic assumption is usually
invoked, which means that for “most” individual
flows $u(\cdot,\cdot,\omega_0)$ (i.e., for almost all $\omega_0$ with respect to
the probability measure P), the time averages along
this flow converge, as the period of the average
increases, to the mean value obtained by the
ensemble average:

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T \varphi(u(\cdot, \cdot + s, \omega_0))\, ds = \int_M \varphi(u(\cdot,\cdot,\omega))\, dP(\omega)$$

Based on this assumption, the averages may in
practice be calculated as time averages over a
sufficiently large period T. There is a related
argument, based on the mechanics of turbulence, for
substituting space averages by time averages,
which is called the “Taylor hypothesis.”
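
The ergodic assumption can be illustrated with a toy signal. The sketch below does not simulate a turbulent flow: it assumes a hypothetical one-component "velocity" u(t, ω) = sin(t + ω), with the phase ω uniformly distributed over one period, and compares the ensemble average of u² at a fixed time with the time average of u² along a single realization.

```python
import math


def phi(u):
    """Observable whose mean is computed; here the squared velocity."""
    return u * u


def ensemble_average(t, members=1000):
    """Average of phi(u(t, omega)) over phases omega spread uniformly on [0, 2*pi)."""
    d_omega = 2.0 * math.pi / members
    total = sum(phi(math.sin(t + (k + 0.5) * d_omega)) for k in range(members))
    return total / members


def time_average(omega0, period, steps=200000):
    """Midpoint-rule time average of phi(u(t, omega0)) over [0, period]."""
    dt = period / steps
    total = sum(phi(math.sin((k + 0.5) * dt + omega0)) for k in range(steps))
    return total / steps


# For this toy signal both averages approach 1/2 as the period grows.
print(ensemble_average(t=1.7))
print(time_average(omega0=0.3, period=2000.0))
```

For this signal the time average over a period T differs from the ensemble value by a term of order 1/T, so increasing the period brings the two averages together, as the assumption requires.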

Another fundamental concept in the statistical 
theory is that of homogeneity, which is the spatial 
analog of the statistical equilibrium in time. 
In homogeneous turbulence, the statistical quantities 
of a flow are independent of translations in space, 
that is, 


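$$\langle \varphi(u(\cdot + \ell, \cdot)) \rangle = \langle \varphi(u(\cdot, \cdot)) \rangle$$

for all $\ell \in \mathbb{R}^3$. The concept of isotropic turbulence
assumes further independence with respect to
rotations and reflections in the frame of reference,
that is,

$$\langle \varphi(Q^t u(Q\,\cdot, \cdot)) \rangle = \langle \varphi(u(\cdot, \cdot)) \rangle$$

for all orthogonal transformations Q in $\mathbb{R}^3$, with
adjoint $Q^t$.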
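Under the homogeneity assumption, mean quantities
can be defined independently of position in
space, such as the mean kinetic energy per unit mass

$$e = \frac{1}{2} \langle |u(x)|^2 \rangle = \frac{1}{2} \sum_{i=1}^{3} \langle u_i(x)^2 \rangle$$

and the mean rate of viscous energy dissipation per
unit mass and unit time

$$\epsilon = \nu \sum_{i=1}^{3} \langle |\nabla u_i(x)|^2 \rangle = \nu \sum_{i,j=1}^{3} \left\langle \left( \frac{\partial u_i(x)}{\partial x_j} \right)^2 \right\rangle$$

The mean kinetic energy can be written as
$e = \mathrm{tr}R(0)/2$, where

$$\mathrm{tr}R(\ell) = R_{11}(\ell) + R_{22}(\ell) + R_{33}(\ell)$$

is the trace of the correlation tensor

$$R(\ell) = \langle u(x) \otimes u(x + \ell) \rangle = \left( R_{ij}(\ell) \right)_{i,j=1}^{3} = \left( \langle u_i(x) u_j(x + \ell) \rangle \right)_{i,j=1}^{3}$$

which measures the correlation between the velocity
components at different positions in space. From the
homogeneity assumption, this tensor is a function
only of the relative position $\ell$. Then, assuming that
the Fourier transform of $\mathrm{tr}R(\ell)$ exists, and denoting
it by $Q(\boldsymbol{\kappa})$, for $\boldsymbol{\kappa} \in \mathbb{R}^3$, we have

$$\mathrm{tr}R(\ell) = \frac{1}{(2\pi)^{3/2}} \int_{\mathbb{R}^3} Q(\boldsymbol{\kappa})\, e^{i \ell \cdot \boldsymbol{\kappa}}\, d\boldsymbol{\kappa}, \quad \ell \in \mathbb{R}^3, \qquad \mathrm{tr}R(0) = 2 \int_0^\infty S(\kappa)\, d\kappa$$

where $S(\kappa)$ is the energy spectrum defined by

$$S(\kappa) = \frac{1}{2 (2\pi)^{3/2}} \int_{|\boldsymbol{\kappa}| = \kappa} Q(\boldsymbol{\kappa})\, d\Sigma(\boldsymbol{\kappa}), \qquad \forall \kappa > 0$$

with $d\Sigma(\boldsymbol{\kappa})$ denoting the area element of the
2-sphere of radius $|\boldsymbol{\kappa}|$. Then we can write

$$e = \frac{1}{2} \langle |u(x)|^2 \rangle = \frac{1}{2} \mathrm{tr}R(0) = \int_0^\infty S(\kappa)\, d\kappa$$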


By expanding the velocity coordinates into Fourier
modes $\exp(i \ell \cdot \boldsymbol{\kappa})$, with $\kappa \le |\boldsymbol{\kappa}| < \kappa + d\kappa$, and
interpreting them as “eddies” with characteristic
wave number $|\boldsymbol{\kappa}|$, the quantity $S(\kappa)\, d\kappa$ can be
interpreted as the energy of the component of the
flow formed by the “eddies” with characteristic
wave number between $\kappa$ and $\kappa + d\kappa$.

Similarly,

$$\epsilon = 2\nu \int_0^\infty \kappa^2 S(\kappa)\, d\kappa$$

and we obtain the dissipation spectrum $2\nu\kappa^2 S(\kappa)$,
which can be interpreted as the density of energy
dissipation occurring at wave number $\kappa$.
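
The two integrals, of the energy spectrum and of the dissipation spectrum, can be evaluated numerically once a spectrum is given. The sketch below uses a purely hypothetical model spectrum, exp(-κ), chosen only because its integrals are known in closed form (the energy is 1 and the dissipation is 4ν); it is not a realistic turbulence spectrum. The integration uses a plain trapezoidal rule on a truncated interval.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule for the integral of f over [a, b] with n panels."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


def model_spectrum(kappa):
    """Hypothetical energy spectrum, used only to test the formulas."""
    return math.exp(-kappa)


def kinetic_energy(spectrum, kappa_max=60.0, n=60000):
    """Integral of S(kappa) from 0 to kappa_max (truncated upper limit)."""
    return trapezoid(spectrum, 0.0, kappa_max, n)


def dissipation_rate(spectrum, nu, kappa_max=60.0, n=60000):
    """2 nu times the integral of kappa^2 S(kappa) from 0 to kappa_max."""
    return 2.0 * nu * trapezoid(lambda k: k * k * spectrum(k), 0.0, kappa_max, n)


nu = 1.0e-3  # hypothetical kinematic viscosity
print(kinetic_energy(model_spectrum))
print(dissipation_rate(model_spectrum, nu))
```

The factor κ² weights the dissipation integral toward large wave numbers, which is why dissipation is associated with the small scales even when most of the energy sits at the large ones.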

In the previous arguments it is assumed that the 
flow extends to all the space $\mathbb{R}^3$. This avoids the
presence of boundaries, addressing the idealized case 
of fully developed turbulence. It is sometimes 
customary to assume as well that the flow is 
periodic in space to avoid problems with unbounded 
domains such as infinite kinetic energy. 

The random nature of turbulent flows was greatly 
explored by Taylor in the early twentieth century, 
who introduced most of the concepts described 
above. Another important concept he introduced 
was the Taylor microlength r, which is a char- 
acteristic length for the small scales based on the 
correlation tensor. A microscale Reynolds number 
based on the Taylor microlength is very often used 
in applications. 


Kolmogorov Theory 


An inspiring concept in the theory of turbulence is
Richardson’s “energy cascade” process. For large
Reynolds numbers the nonlinear term dominates the
viscosity according to the dimensional analysis, but
this is valid only for the large-scale structures. The
small scales have their own characteristic length and
velocity. In the cascade process, the inertial term is
responsible for the transfer of energy to smaller and
smaller scales until small enough scales are reached
for which viscosity becomes important (Figure 3). At
those smallest scales kinetic energy is finally dissipated
into heat. It should be emphasized that
turbulence is a dissipative process; no matter how
large the Reynolds number is, viscosity plays a role
in the smallest scales.

Figure 3 Illustration of the eddy breakdown process in which
energy is transferred to smaller eddies and so on until the smallest
scales are reached and the energy is dissipated by viscosity.

The Kolmogorov theory of locally isotropic 
turbulence allows for inhomogeneity and anisotropy 
in the large scales, which contain most of the energy, 
assuming that with the cascade transfer of energy to 
smaller scales, the orienting effects generated in the 
large scales become weaker and weaker so that for 
sufficiently small eddies the motion becomes statis- 
tically homogeneous, isotropic, and independent of 
the particular energy-productive mechanisms. He 
proposed that the statistical regime of the small-scale
eddies is then universal and depends only on $\nu$
and $\epsilon$. The equilibrium range is defined as the range
of scales in which this universality holds.

Simple dimensional analysis shows that the only
algebraic combination of $\nu$ and $\epsilon$ with dimension of
length is $\ell_\epsilon = (\nu^3/\epsilon)^{1/4}$, which is then interpreted as
the length scale near which the viscous effect becomes
important and hence most of the energy dissipation takes
place. The scale $\ell_\epsilon$ is known as the Kolmogorov
dissipation length.
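
As a quick numerical illustration of the dissipation length, the sketch below evaluates (ν³/ε)^(1/4). The input values are hypothetical (a viscosity of about 1e-6 m²/s and a dissipation rate of 1e-2 m²/s³), chosen so that the result is easy to read.

```python
def kolmogorov_length(nu, epsilon):
    """Dissipation length (nu^3 / epsilon)^(1/4), in units of length."""
    if nu <= 0 or epsilon <= 0:
        raise ValueError("nu and epsilon must be positive")
    return (nu ** 3 / epsilon) ** 0.25


# Hypothetical values: the result is about one tenth of a millimetre.
print(kolmogorov_length(1.0e-6, 1.0e-2))
```

The exponents reflect the dimensional argument: ν has dimension length²/time and ε has dimension length²/time³, so ν³/ε has dimension length⁴.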

Kolmogorov theory gives particular attention to 
moments involving differences of velocities, such as 
the pth-order structure function 


$$S_p(\ell) = \langle (u(x + \ell e) \cdot e - u(x) \cdot e)^p \rangle$$

where e may be taken as an arbitrary unit vector,
thanks to the isotropy assumption. By restricting the
search for universal laws for the structure functions
to small values of $\ell$, anisotropy and inhomogeneity
are allowed in the large scales.


The theory assumes a wide separation between
the energy-containing scales, of order say $\ell_0$, and the
energy-dissipative scales, of order $\ell_\epsilon$, so that the
cascade process occurs within a wide range of scales
$\ell$ such that $\ell_0 \gg \ell \gg \ell_\epsilon$. In this range, termed the
inertial range, the viscous effects are still negligible
and the statistical regime should depend only on $\epsilon$.
Then, the Kolmogorov “two-thirds law” asserts that
within the inertial range the second-order correlations
must be proportional to $(\epsilon\ell)^{2/3}$, that is,


$$S_2(\ell) = C_K (\epsilon\ell)^{2/3}$$


for some constant $C_K$ known as the Kolmogorov
constant in physical space (there is a related constant
in spectral space). The argument extends to higher-
order structure functions, yielding


$$S_p(\ell) = C_p (\epsilon\ell)^{p/3}$$


Kolmogorov’s derivation of these results was not by
dimensional analysis; it was in fact a more convincing
self-similarity argument based on the universality
assumed for the equilibrium range. A different argument,
which does not resort to universality assumptions,
was applied to the third-order structure
function, yielding the more precise “four-fifths law”:


$$S_3(\ell) = -\frac{4}{5}\epsilon\ell$$


The “Kolmogorov five-thirds law” concerns the
energy spectrum $S(\kappa)$ and is the spectral version of
the two-thirds law, given by Obukhov:


$$S(\kappa) = C_K' \epsilon^{2/3} \kappa^{-5/3}$$


The constant $C_K'$ is the Kolmogorov constant
in spectral space. The spectral version of the
dissipation length is the Kolmogorov wave number
$\kappa_\epsilon = (\epsilon/\nu^3)^{1/4}$.

A typical distribution of energy in a turbulent
flow is depicted in Figure 4. The energy is
concentrated on the large scales, while the dissipation
is concentrated near the Kolmogorov scale $\ell_\epsilon$.
The five-thirds law becomes visible as a straight line
in the logarithmic scale.

A more precise mechanism for the energy cascade
assumes that in the inertial range, eddies with length
scale $\ell$ transfer kinetic energy to smaller eddies during
their characteristic timescale, also known as circulation
time. If $u_\ell$ is their characteristic velocity, then
$\tau_\ell = \ell/u_\ell$ is their circulation time, so that the kinetic
energy transferred from these eddies per unit time is


$$\epsilon_\ell \sim \frac{u_\ell^2}{\tau_\ell} = \frac{u_\ell^3}{\ell}$$


In statistical equilibrium, the energy lost to the
smaller scales equals the energy gained from the
larger scales, and that should also equal the total
kinetic energy dissipated by viscous effects. Hence,
$\epsilon_\ell = \epsilon$, and we find


$$\epsilon \sim \frac{u_\ell^3}{\ell}$$


It also follows that $\tau_\ell = \ell/u_\ell \sim \ell/(\epsilon\ell)^{1/3} = \epsilon^{-1/3}\ell^{2/3}$, so
that the circulation time decreases with the length
scale and becomes of the order of the viscous
dissipation time $(\nu/\epsilon)^{1/2}$ precisely when $\ell \sim \ell_\epsilon$.
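
The scaling relations of the cascade can be checked numerically. The sketch below drops all order-one constants and uses illustrative values of the viscosity and dissipation rate that are not taken from the article; the function names are hypothetical. It evaluates the circulation time at the Kolmogorov length and compares it with the viscous dissipation time.

```python
import math

def kolmogorov_length(nu, eps):
    """l_eps = (nu^3 / eps)^(1/4)."""
    return (nu ** 3 / eps) ** 0.25

def eddy_velocity(eps, ell):
    """u_l ~ (eps * l)^(1/3), from eps ~ u_l^3 / l (order-one constant dropped)."""
    return (eps * ell) ** (1.0 / 3.0)

def circulation_time(eps, ell):
    """tau_l = l / u_l ~ eps^(-1/3) * l^(2/3)."""
    return ell / eddy_velocity(eps, ell)

nu, eps = 1.0e-6, 1.0e-2      # illustrative SI values, not taken from the article
ell_eps = kolmogorov_length(nu, eps)
tau_at_ell_eps = circulation_time(eps, ell_eps)
viscous_time = math.sqrt(nu / eps)
```

With these inputs the two timescales agree up to rounding, and `circulation_time` decreases as the length scale decreases, as stated above.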

A similar relation between $\epsilon$ and the large scales
can also be obtained with heuristic arguments: let $e$
be the mean kinetic energy and $\ell_0$ a characteristic
length for the large scales. Then $u_0$ given by $e = u_0^2/2$
is a characteristic velocity for the large scales, and
$\tau_0 = \ell_0/u_0$ is the large-scale circulation time. In
statistical equilibrium, the rate $\epsilon$ of kinetic energy
dissipated per unit time and unit mass is expected to
be of the order of $e/\tau_0$, hence


$$\epsilon \sim \frac{u_0^3}{\ell_0}$$


which is called the “energy dissipation law.”


Figure 4 A typical distribution for the energy spectrum $S(\kappa)$ and the dissipation spectrum $2\nu\kappa^2 S(\kappa)$ in spectral space in
nonlogarithmic and logarithmic scales, with the inertial range and the equilibrium range marked. The energy is mostly concentrated on the large scales while the dissipation is concentrated
near the dissipation scale. In the logarithmic scale, the five-thirds law for the energy spectrum stands out as a straight line with
slope $-5/3$ over the inertial range.


Figure 5 A schematic representation of a flow structure
displaying a range of active scales and a three-dimensional
grid with linear dimension $\ell_0$ and mesh length $\ell_\epsilon$, sufficient to
represent all the active scales in a turbulent flow. The number of
degrees of freedom is the number of blocks: $(\ell_0/\ell_\epsilon)^3$.


From the energy dissipation law, several relations
between characteristic quantities of turbulent flows can
be obtained, such as $\ell_0/\ell_\epsilon \sim \mathrm{Re}^{3/4}$, for $\mathrm{Re} = \ell_0 u_0/\nu$.
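
The scale-ratio estimate follows by substituting the energy dissipation law $\epsilon \sim u_0^3/\ell_0$ into the definition of the Kolmogorov length. The short derivation below is a sketch in which order-one constants are ignored throughout.

```latex
\begin{align*}
\ell_\epsilon = \left(\frac{\nu^{3}}{\epsilon}\right)^{1/4}
  \sim \left(\frac{\nu^{3}\,\ell_0}{u_0^{3}}\right)^{1/4}
\quad\Longrightarrow\quad
\frac{\ell_0}{\ell_\epsilon}
  \sim \left(\frac{\ell_0^{3}\,u_0^{3}}{\nu^{3}}\right)^{1/4}
  = \mathrm{Re}^{3/4}.
\end{align*}
```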

Now, assuming the active scales in a turbulent
flow exist down to the Kolmogorov scale $\ell_\epsilon$, one
needs a three-dimensional grid with mesh spacing $\ell_\epsilon$
to resolve all the scales, which means that the
number $N$ of degrees of freedom of the system is of
the order of $N \sim (\ell_0/\ell_\epsilon)^3$ (see Figure 5). This
number can be estimated in terms of the Reynolds
number by $N \sim \mathrm{Re}^{9/4}$. This relation is important in
predicting the computational power needed to
simulate all the active scales in turbulent flows.
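
To give a feeling for how quickly the resolution requirement grows, the sketch below tabulates the two estimates for a few Reynolds numbers. The function names are hypothetical, and the results are order-of-magnitude figures only, since the proportionality constants are not specified.

```python
def scale_ratio(reynolds):
    """l_0 / l_eps ~ Re^(3/4), up to an order-one constant."""
    return reynolds ** 0.75

def degrees_of_freedom(reynolds):
    """N ~ (l_0 / l_eps)^3 ~ Re^(9/4), up to an order-one constant."""
    return scale_ratio(reynolds) ** 3

for reynolds in (1e3, 1e4, 1e6):
    print(f"Re = {reynolds:.0e}: l_0/l_eps ~ {scale_ratio(reynolds):.1e}, "
          f"N ~ {degrees_of_freedom(reynolds):.1e}")
```

Raising the Reynolds number by a factor of 100 multiplies the estimated number of grid points by roughly $100^{9/4} \approx 3 \times 10^4$.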

Several such universal laws can be deduced and 
extended to other situations such as turbulent 
boundary layers, with the famous logarithmic law 
of the wall. They play a fundamental role in 
turbulence modeling and closure, for the calculation 
of the mean flow and other quantities. 


Intermittency


The universality hypothesis based on a constant mean
energy dissipation rate throughout the flow received
some criticism and was later modified by Kolmogorov
in an attempt to account for observed large
deviations from the mean rate of energy dissipation. This
phenomenon of intermittency is related to the vortex
stretching and thinning mechanism, which leads to the
formation of coherent structures of vortex filaments of
high vorticity and low dissipation (Figure 6). These
filaments have diameters as small as the Kolmogorov
scale and longitudinal lengths extending from the
Taylor scale up to the large scales, with a lifetime
of the order of the large-scale circulation time.

Figure 6 A portion of rotating fluid gets stretched and thinned
as the flow speeds up, generating one of many coherent
structures of high vorticity and low dissipation.


It has been argued, based on experimental evidence,
that intermittency leads to modified power laws
$S_p(\ell) \propto \ell^{\zeta(p)}$, $\zeta(p) < p/3$, for high-order ($p > 3$) structure
functions. The issues of intermittency and
coherent structures and whether and how they could 
affect the deductions of the universality theory such as 
the power laws for the structure functions are far from 
settled and are currently one of the major and most 
fascinating issues being addressed in turbulence 
theory. Several phenomenological theories attempt to 
adjust the universality theory to the existence of such 
coherent structures. Multifractal models, for instance, 
suppose that the eddies generated in the cascade 
process do not fill up the space and form multifractal 
structures. Field-theoretic renormalization group 
develops techniques based on quantum field renor- 
malization theory. Intermediate asymptotics also 
exploits self-similar analysis and renormalization 
theory but with a somewhat different flavor. Detailed 
mathematical analysis of the vorticity equations is 
also playing a major role in the understanding of the 
dynamics of the vorticity field. 


Mathematical Aspects 
of Turbulence Theory 


From a mathematical perspective, it is fundamental to 
develop a rigorous background upon which to study 
the physical quantities of a turbulent flow. The first 
problem in the mathematical theory is related to the 
deterministic nature of chaotic systems assumed in 
dynamical system theory and believed to hold in 
turbulence. This has actually not been proved for the 
Navier-Stokes equations. It is in fact one of the most 
outstanding open problems in mathematics to deter- 
mine whether given an initial condition for the velocity 
field there exists, in some sense, a unique solution of 
the Navier-Stokes equations starting with this initial 
condition and valid for all later times. It has been
proved that a global solution (i.e., valid for all later
times) exists but may not be unique, and it has
been proved that unique solutions exist but may not
be global (i.e., they are guaranteed to exist as unique
solutions only for a finite time).

The difficulty here is the possible existence of 
singularities in the vorticity field (vorticity becoming 
infinite at some points in space and time). Depending 
on how large the singularity set is, uniqueness may fail 
in strictly mathematical terms. The existence of
singularities may not be a purely mathematical
curiosity; it may in fact be related to the intermittency
phenomenon. Rigorous studies of the vorticity
equation may continue to reveal more fundamental
aspects of vortex dynamics and coherent structures.

The statistical theory has also been put on a firm
foundation with the notion of statistical solution of the
Navier-Stokes equations. It addresses the existence
and regularity of the probability distribution assumed
for turbulent flows and of the fundamental elements of
the statistical theory such as correlation functions and
spectra. Based on that, a number of relations between
physical quantities of turbulent flows may be derived
in a mathematically sound and definitive way. This
does not replace other theories; it is mostly a
mathematical framework upon which other techniques
can be applied to yield rigorous results.

Despite the difficulties in the mathematical theory
of the Navier-Stokes equations, some successes have been
achieved, such as estimates for the number of degrees of freedom in
terms of fractal dimensions of suitable sets associated
with the solutions of the Navier-Stokes
equations, and partial estimates of a number of
relations derived in the statistical theory of fully
developed turbulence.


See also: Bifurcations in Fluid Dynamics; Geophysical 
Dynamics; Incompressible Euler Equations: 
Mathematical Theory; Intermittency in Turbulence; 
Inviscid Flows; Lagrangian Dispersion (Passive Scalar); 
Stochastic Hydrodynamics; Variational Methods in
Turbulence; Viscous Incompressible Fluids:
Mathematical Theory; Vortex Dynamics; Wavelets: 
Application to Turbulence. 
Introduction


Roger Penrose introduced twistor theory as a geome- 
trical framework for basic physics in order to unify 
quantum theory and gravity. This program has had
many successes along the way, but the long-term goals
of reformulating and superseding the established
theories of basic physics are still a long way from
being fulfilled. Nevertheless, the successes have had
many important applications across mathematics and 
mathematical physics. This article will concentrate on 
three areas of application: integrable systems, geome- 
try, and perturbative gauge theory (via twistor-string 
theory). It is intended to be self-contained as far as 
possible, but the reader may well find it easier to first 
read the article Twistors. 
Twistor Theory


A basic motivation of twistor theory is to bring out
the complex (holomorphic) geometry that underlies
real spacetime. In general relativity, a spacetime is a
4-manifold with metric $g$ of signature (1,3), and
when it is flat, that is, $g = dt^2 - dx^2 - dy^2 - dz^2$,
where $(t, x, y, z)$ are coordinates on $\mathbb{R}^4$, it is called
Minkowski space. The first appearance of a complex
structure arises from the fact that, at a given
event, the celestial sphere of light rays (directions of
zero length with respect to $g$) naturally has the
structure of the Riemann sphere, $\mathbb{CP}^1$, in such a way
that Lorentz transformations (linear transformations
of the tangent space preserving the metric) act on
this sphere by Möbius transformations. These form
the maximal group of complex analytic transformations
of $\mathbb{CP}^1$.

Twistor space extends this idea to the whole of
Minkowski space. Denoted $PT$, the twistor space for
Minkowski space is complex projective 3-space, $\mathbb{CP}^3$,
the space of one-dimensional subspaces of $\mathbb{C}^4$; it is a
three-dimensional complex manifold obtained by adding
a “plane at infinity” to $\mathbb{C}^3$. Explicitly, we can
introduce homogeneous coordinates $Z^\alpha \in \mathbb{C}^4 - \{0\}$
with $\alpha = 0, 1, 2, 3$ but where $Z^\alpha \sim \lambda Z^\alpha$ for $\lambda \in \mathbb{C} - \{0\}$.
Affine coordinates on a $\mathbb{C}^3$ chart $Z^3 \neq 0$ can
be obtained by setting $(z_1, z_2, \lambda) = (Z^0/Z^3, Z^1/Z^3, Z^2/Z^3)$.
Physically, points of twistor space correspond
to spinning massless particles in Minkowski
space. Mathematically, the correspondence can be
understood as the Klein correspondence.


The Klein Correspondence 


The correspondence between $PT$ and Minkowski
space can be extended first to complexified Minkowski
space so that the coordinates are allowed to take on
values in $\mathbb{C}$, and then to its conformal compactification
by including the “light cone at infinity.” It then
coincides with the classical complex Klein correspondence.
The Klein correspondence is the one-to-one
correspondence between lines in $\mathbb{CP}^3$ and points of a
four complex-dimensional quadric, $CM$, in $\mathbb{CP}^5$. The
4-quadric $CM$ can be understood as conformally
compactified complexified Minkowski space. Introducing
affine coordinates $(z_1, z_2, \lambda)$ on $PT$ and $(t, x, y, z)$
on $CM$, we find that a point $(t, x, y, z)$ in $CM$
corresponds to a line in $PT$ according to


$$\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} t - z & x + iy \\ x - iy & t + z \end{pmatrix} \begin{pmatrix} 1 \\ \lambda \end{pmatrix}$$


Alternatively, fixing $(\lambda, z_1, z_2)$ in these equations
gives a 2-plane in complex Minkowski space
corresponding to all the lines in $PT$ through
$(\lambda, z_1, z_2)$. Such 2-planes are called “$\alpha$-planes.”
They are totally null (i.e., the tangent vectors not
only have zero length but are also mutually
orthogonal) and also self-dual (under the differential
geometer’s notion of Hodge duality).
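
The total nullity of these 2-planes can be checked directly. The sketch below assumes the incidence relation $z_1 = (t - z) + \lambda(x + iy)$, $z_2 = (x - iy) + \lambda(t + z)$ and the flat metric $dt^2 - dx^2 - dy^2 - dz^2$ extended complex-bilinearly. The parameters `p` and `q` and all function names are introduced here for illustration only.

```python
def incidence(point, lam):
    """(z1, z2) for a (possibly complex) spacetime point (t, x, y, z) and parameter lam."""
    t, x, y, z = point
    return ((t - z) + lam * (x + 1j * y),
            (x - 1j * y) + lam * (t + z))

def alpha_plane_tangent(lam, p, q):
    """Displacement (dt, dx, dy, dz) that leaves (z1, z2) unchanged at fixed lam.

    p and q are free complex parameters (names introduced for this sketch):
    p = dx + i*dy and q = dt + dz.
    """
    dt = (q - lam * p) / 2
    dz = (q + lam * p) / 2
    dx = (p - lam * q) / 2
    dy = (p + lam * q) / 2j
    return (dt, dx, dy, dz)

def minkowski(v, w):
    """Complex-bilinear extension of g = dt^2 - dx^2 - dy^2 - dz^2."""
    return v[0] * w[0] - v[1] * w[1] - v[2] * w[2] - v[3] * w[3]

lam = 0.3 + 0.7j
v = alpha_plane_tangent(lam, 1.0, 0.0)   # first spanning vector of the 2-plane
w = alpha_plane_tangent(lam, 0.0, 1.0)   # second spanning vector

# Moving along the plane from a base point does not change (z1, z2).
base = (2.0, 0.5, 0.3, 0.1)
moved = tuple(b + 1.5 * vi - 0.4j * wi for b, vi, wi in zip(base, v, w))
```

Both spanning vectors have zero length and are mutually orthogonal, so every vector in the plane is null. The self-duality of the plane is not tested here.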

This complex correspondence can also be
restricted to give correspondences for $\mathbb{R}^4$ with
metrics of positive-definite (Euclidean) signature or
ultrahyperbolic (2,2) signature. A particular simplification
in Euclidean signature is that the complex
$\alpha$-planes intersect the real slice in a point. The
conformal compactification of Euclidean $\mathbb{R}^4$ is the
4-sphere $S^4$ given by adding a single point at infinity,
and so we have a projection $p: PT \to S^4$ whose
fibers are holomorphically embedded $\mathbb{CP}^1$s. These
fibers can be characterized as the lines in $PT$ that
are invariant under a quaternionic complex conjugation,
which is an antiholomorphic map $\hat{\ }: PT \to PT$
with no fixed points. (Here quaternionic means
that on the nonprojective twistor space, $T = \mathbb{C}^4$, the
conjugation has the property $\hat{\hat{Z}} = -Z$ so that it
defines a second complex structure anticommuting
with the standard one; this is sufficient to express
$T = \mathbb{H}^2$, where $\mathbb{H}$ denotes the quaternions. The
complex structures $i$, $j$, and $k$ of the quaternions
are given by identifying $i$ with $\sqrt{-1}$ on $\mathbb{C}^4$ and $j$
with $\hat{\ }$, and $k = ij$.)
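
The properties of the quaternionic conjugation can be illustrated with an explicit formula. The article does not give one, so the sketch below assumes one standard choice, $\hat{Z} = (\bar{Z}^1, -\bar{Z}^0, \bar{Z}^3, -\bar{Z}^2)$. The function names and the sample twistor are hypothetical.

```python
def hat(Z):
    """Quaternionic conjugation on C^4 (one standard choice, assumed here)."""
    z0, z1, z2, z3 = Z
    return (z1.conjugate(), -z0.conjugate(), z3.conjugate(), -z2.conjugate())

def hermitian(Z, W):
    """Standard Hermitian inner product on C^4, conjugate-linear in Z."""
    return sum(a.conjugate() * b for a, b in zip(Z, W))

def scale(c, Z):
    """Multiply every component of Z by the complex number c."""
    return tuple(c * z for z in Z)

Z = (1 + 2j, 0.5 - 1j, -0.3 + 0.2j, 2 - 0.7j)   # arbitrary sample twistor

twice = hat(hat(Z))                 # should equal -Z
left = hat(scale(1j, Z))            # conjugation applied after multiplying by i
right = scale(-1j, hat(Z))          # multiplying by -i after conjugation
overlap = hermitian(Z, hat(Z))      # vanishes for every Z
```

Because $\hat{Z}$ is orthogonal to $Z$ and has the same norm, it can never be a complex multiple of a nonzero $Z$, which is why the induced map on projective twistor space has no fixed points.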


The Penrose Transform 


A basic task of twistor theory is to transform 
solutions to the field equations of mathematical 
physics into objects on twistor space. This works 
well for linear massless fields such as the Weyl 
neutrino equation, Maxwell’s equations for electro- 
magnetism and linearized gravity. In its general 
form, this transform has become known as the 
Penrose transform. Such fields correspond to freely
prescribable holomorphic functions $f(\lambda, z_1, z_2)$ (or,
more precisely, analytic cohomology classes) on
regions of twistor space. The field can be obtained
from this function by means of a contour integral.
The simplest of these integral formulas is


$$\phi(x^a) = \oint f(\lambda,\, t - z + \lambda(x + iy),\, x - iy + \lambda(t + z))\, d\lambda$$


and differentiation under the integral sign leads
easily to the fact that $\phi$ satisfies the wave equation

$$\frac{\partial^2\phi}{\partial t^2} - \frac{\partial^2\phi}{\partial x^2} - \frac{\partial^2\phi}{\partial y^2} - \frac{\partial^2\phi}{\partial z^2} = 0$$
This formula was originally discovered by Bateman.
Note that $f$ must have singularities on twistor space
to yield a nontrivial $\phi$ and even then, there are many
choices of $f$ that yield zero. For a solution $\phi$ defined
over a region $U$ in spacetime, the function $f$ is
correctly understood as a representative of a Čech
cohomology class defined on the region $U'$ in twistor
space swept out by the lines corresponding to points
of $U$. Furthermore, the function $f$ should be taken
globally to be a function of homogeneity $-2$,
$f(\lambda Z^\alpha) = \lambda^{-2} f(Z^\alpha)$. This formula has generalizations
to massless fields of all helicities in which a field of
helicity $s$ corresponds to a function (Čech cocycle) of
homogeneity degree $2s - 2$.
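
A concrete instance of the contour integral may help. The sketch below takes the twistor function $f = 1/(z_1 z_2)$, which is an illustrative choice and not one made in the article, and integrates it numerically around the pole at $z_1 = 0$. By the residue theorem the result should be $-2\pi i/(t^2 - x^2 - y^2 - z^2)$, the elementary solution of the wave equation, up to the orientation of the contour. All function names are hypothetical.

```python
import cmath
import math

def twistor_coords(point, lam):
    """Incidence relation giving (z1, z2) on the twistor line of a spacetime point."""
    t, x, y, z = point
    return ((t - z) + lam * (x + 1j * y),
            (x - 1j * y) + lam * (t + z))

def f(lam, z1, z2):
    """Illustrative twistor function with poles at z1 = 0 and z2 = 0 (an assumption)."""
    return 1.0 / (z1 * z2)

def bateman_integral(point, n=400):
    """Integrate f over a small circle in the lam-plane around the zero of z1.

    Assumes x + i*y != 0 and t + z != 0, so that both poles are finite.
    """
    t, x, y, z = point
    pole = -(t - z) / (x + 1j * y)        # zero of z1
    other = -(x - 1j * y) / (t + z)       # zero of z2
    radius = 0.5 * abs(pole - other)      # keeps the second pole outside the contour
    total = 0j
    for k in range(n):
        phase = cmath.exp(2j * math.pi * k / n)
        lam = pole + radius * phase
        dlam = 1j * radius * phase * (2.0 * math.pi / n)
        total += f(lam, *twistor_coords(point, lam)) * dlam
    return total

def closed_form(point):
    """Residue-theorem value of the same integral."""
    t, x, y, z = point
    return -2j * math.pi / (t * t - x * x - y * y - z * z)

point = (2.0, 0.5, 0.3, 0.1)
numeric = bateman_integral(point)
exact = closed_form(point)
```

Choosing the contour around the other pole changes the overall sign, and a contour enclosing both poles or neither gives zero, which illustrates why $f$ must be singular and why different choices of $f$ and contour can represent the same field.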

The Penrose transform has found important 
applications in representation theory and integral 
geometry. For a review, the reader is referred to 
Baston and Eastwood (1989), the relevant survey 
articles in Bailey and Baston (1990), or Mason and 
Hughston (1990, chapter 1). 


Twistor Theory and Nonlinear Equations 


The Penrose transforms for the Maxwell equations
and linearized gravity turn out to be linearizations
of correspondences for the nonlinear analogs of
these equations: the Einstein vacuum equations and
the Yang-Mills equations. However, the constructions
only work when these fields are anti-self-dual.
This is the condition that the curvature 2-forms
satisfy $F^* = -iF$, where $*$ denotes the Hodge dual
(which, up to certain factors of $i$, has the effect of
interchanging electric and magnetic fields); it is a
nonlinear generalization of the right-handed circular
polarization condition. Explicitly, in terms of spacetime
indices, $F_{ab} = F_{[ab]}$ and $F^*_{ab} = (1/2)\epsilon_{abcd}F^{cd}$,
where $\epsilon_{0123} = 1$ and $\epsilon_{abcd} = \epsilon_{[abcd]}$. In Minkowski
signature, the $i$ factor in the anti-self-duality condition
implies that real fields cannot be anti-self-dual.
Thus, these extensions are not sufficient to fulfill the
ambitions of twistor theory to incorporate real
classical nonlinear physics in Minkowski space.
However, the factor of $i$ is not present in Euclidean
and ultrahyperbolic signature, so the anti-self-duality
condition is consistent with real fields in
these signatures and this is where the main applications
of these constructions have been.
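
The role of the factor of $i$ can be seen by applying the Hodge dual twice. The sketch below assumes a diagonal metric with entries $\pm 1$ and the convention $\epsilon_{0123} = 1$; the sample 2-form and the function names are hypothetical. In Minkowski signature the double dual is $-1$ on 2-forms, so the eigenvalues of the dual are $\pm i$, while in Euclidean and ultrahyperbolic signature it is $+1$ and the eigenvalues are real.

```python
from itertools import permutations

def permutation_sign(perm):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

# Totally antisymmetric symbol with EPSILON[(0, 1, 2, 3)] = 1.
EPSILON = {perm: permutation_sign(perm) for perm in permutations(range(4))}

def hodge_dual(F, signature):
    """(*F)_ab = (1/2) eps_abcd F^cd for a diagonal metric with entries +1 or -1."""
    dual = [[0.0] * 4 for _ in range(4)]
    for (a, b, c, d), sign in EPSILON.items():
        # Raising indices with a diagonal metric multiplies by its entries.
        dual[a][b] += 0.5 * sign * signature[c] * signature[d] * F[c][d]
    return dual

# Arbitrary antisymmetric sample 2-form (illustrative values only).
F = [[0, 1, 2, 3],
     [-1, 0, 4, 5],
     [-2, -4, 0, 6],
     [-3, -5, -6, 0]]

def double_dual_factor(signature):
    """Return c such that **F = c F, after checking every component."""
    FF = hodge_dual(hodge_dual(F, signature), signature)
    c = FF[0][1] / F[0][1]
    assert all(FF[a][b] == c * F[a][b] for a in range(4) for b in range(4))
    return c

minkowski_factor = double_dual_factor((1, -1, -1, -1))
euclidean_factor = double_dual_factor((1, 1, 1, 1))
ultrahyperbolic_factor = double_dual_factor((1, 1, -1, -1))
```

A real 2-form can therefore be an eigenform of the Hodge dual only in the last two signatures.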


The Nonlinear Graviton Construction 
and Its Generalizations 


The first nonlinear twistor construction was due to 
Penrose (1976), and was inspired by Newman’s 
(1976) construction of “heavens” from the infinities 
of asymptotically flat spacetimes in general 
relativity. 

The nonlinear graviton construction proceeds
from the definition of twistors in flat spacetime as
$\alpha$-planes in complexified Minkowski space. It is
natural to ask which complexified metrics admit a
full family of $\alpha$-surfaces, that is, 2-surfaces that are
totally null and self-dual. The answer is that a full
family of $\alpha$-surfaces exists iff the conformally
invariant part of the curvature tensor, the Weyl
tensor, is anti-self-dual. If this is the case, twistor
space can be defined to be the (necessarily three-dimensional)
space of such $\alpha$-surfaces.

The remarkable fact is that the twistor space, 
together with its complex structure, is sufficient to 
determine the original spacetime. Twistor space is 
again a three-dimensional complex manifold, and 
contains holomorphically embedded rational curves,
$\mathbb{CP}^1$s, at least one for each point of the spacetime.
However, holomorphic rigidity implies that the 
family of rational curves is precisely four- 
dimensional over the complex numbers. Further- 
more, incidence of a pair of curves can be taken to 
imply that the corresponding points in spacetime lie 
on a null geodesic and this yields a conformal 
structure on spacetime. Further structures on twistor 
space can be imposed to give the complex spacetime 
a metric that is vacuum, perhaps with a cosmologi- 
cal constant. The correspondence is stable under 
small deformations and so the data defining the 
twistor space is effectively freely prescribable, see 
Penrose (1976). 

In Euclidean signature, again the complex
$\alpha$-planes intersect the real spacetime in a point, so
the twistor space again fibers over spacetime. The 
twistor fibration can be constructed as the projecti- 
vized bundle of self-dual spinors or more commonly 
as the unit sphere bundle in the space of self-dual 
2-forms (Atiyah et al. 1978). In the latter formula- 
tion, the complex structure on the twistor space 
arises from the direct sum of the naturally defined 
complex structures on the horizontal and vertical 
tangent spaces to the bundle; that on the vertical 
subspace is the standard one on the sphere, and that 
on the horizontal subspace is a multiple of the self- 
dual 2-form at the given point of the fiber. 

There are now large families of extensions, 
generalizations, and reductions of this construction. 
They are all based on the idea of realizing a space 
with a given complexified geometric structure as the 
parameter space of a family of holomorphically 
embedded submanifolds inside a twistor space. In 
general, the most useful of these constructions are 
those in which the “spacetime” is obtained as the 
space of rational curves in a twistor space. This is 
because the equations that are solved on the 
corresponding spacetime can be thought of as a 
completely integrable system in which the integrability
condition for the generalized $\alpha$-surfaces is
interpreted as the consistency condition of a Lax
pair or more general linear system. For a more
detailed discussion from this point of view, see
Mason and Woodhouse (1996, chapter 13).


The Anti-Self-Dual Yang-Mills Equation 
and Its Twistor Correspondence 


The anti-self-dual Yang-Mills equations extend 
Maxwell’s equations for electromagnetism in the 
right-circularly polarized case. They are a family of 
equations that depend on a choice of Lie group G, 
usually taken to be a group of complex matrices; 
Maxwell’s equations arise when G = U(1). 

Introduce coordinates $x^a$, $a = 0, 1, 2, 3$, on $\mathbb{R}^4$ with
metric $ds^2 = dx^0\, dx^3 - dx^1\, dx^2$ (this is a metric of
ultrahyperbolic signature; Euclidean signature can
be obtained by choosing the coordinates to be
complex, but with $(x^3, -x^2)$ the complex conjugates
of $(x^0, x^1)$). The dependent variables are the components
$A_a$ of a connection $D_a = \partial_a - A_a$, where
$\partial_a = \partial/\partial x^a$ and $A_a = A_a(x^b) \in \mathrm{Lie}\, G$, the Lie algebra
of $G$. This connection defines a method of differentiating
vector-valued functions $s$ in some representation
of $G$. The freedom in changing bases for
the vector bundle induces the gauge transformations
$A_a \mapsto g^{-1}A_a g - g^{-1}\partial_a g$, $g(x) \in G$, on $A_a$; two connections
that are related by a gauge transformation are
deemed to be the same. The anti-self-dual Yang-Mills
equations are the condition


$$[D_0, D_2] = [D_1, D_3] = [D_0, D_3] + [D_1, D_2] = 0$$

They are the compatibility conditions

$$[D_0 + \lambda D_1, D_2 + \lambda D_3] = 0$$

for the linear system of equations

$$(D_0 + \lambda D_1)s = (D_2 + \lambda D_3)s = 0 \qquad [1]$$


where $\lambda \in \mathbb{C}$ and $s$ is an $n$-component column
vector. These latter equations form a “Lax pair”
for the system.
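
Expanding the compatibility condition in powers of $\lambda$ makes the component equations explicit. The expansion below is a sketch that takes the two operators exactly as they appear inside the commutator, $D_0 + \lambda D_1$ and $D_2 + \lambda D_3$, and uses only the bilinearity of the bracket; with these operators the cross term carries a plus sign.

```latex
\begin{align*}
[D_0 + \lambda D_1,\; D_2 + \lambda D_3]
  = [D_0, D_2]
  + \lambda\bigl([D_0, D_3] + [D_1, D_2]\bigr)
  + \lambda^{2}\,[D_1, D_3].
\end{align*}
```

Since the left-hand side must vanish for every value of $\lambda$, each of the three coefficients vanishes separately.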

The Ward (1977) construction provides a one-to-one
correspondence between gauge equivalence classes
of solutions of the anti-self-dual Yang-Mills equations
and holomorphic vector bundles on regions in
twistor space. The key point here is that eqn [1]
defines parallel propagation along $\alpha$-planes. To each
point $Z$ in twistor space, we can associate the vector
space $E_Z$ of solutions to eqn [1] along the
corresponding $\alpha$-plane. These vector spaces vary
holomorphically with $Z$ and that is what one means
by a holomorphic vector bundle $E \to PT$. The
remarkable fact is that the anti-self-dual Yang-Mills
field can be reconstructed up to gauge from $E$,
and, in effect, for local analytic solutions, $E$ can be
represented by freely prescribable “patching” data
consisting of local holomorphic matrix-valued functions
on twistor space. To construct the solution on
spacetime, one must first find a Birkhoff factorization
of the patching data on each Riemann sphere in
twistor space corresponding to points of the appropriate
region in spacetime. On each Riemann sphere,
the Birkhoff factorization starts with the given
patching function with values in $GL(n, \mathbb{C})$ on the
real axis in the complex plane, and expresses it as a
product of functions with values in $GL(n, \mathbb{C})$, one of
which extends over the upper half-plane, and the
other over the lower half-plane. The anti-self-dual
connection can be obtained by differentiating
the resulting matrices. See Penrose (1984, 1986),
Ward and Wells (1990), or Mason and Woodhouse
(1996) for a full discussion, and Atiyah (1979) for
the formulation appropriate to Euclidean signature.


Completely Integrable Systems 


In effect, the twistor constructions amount to 
providing a geometric general local solution to the 
anti-self-duality equations; the twistor data is, for a 
local solution, freely prescribable. In this sense, they 
demonstrate complete integrability of the anti-self- 
duality equations. The reconstruction of a solution
on spacetime from twistor data is not a quadrature:
it involves, in the anti-self-dual Yang-Mills case, a
Birkhoff factorization (also sometimes referred to as
the solution to a Riemann-Hilbert problem), and in
the case of the anti-self-dual Einstein equations, the
construction of a family of rational curves inside a 
complex manifold. Nevertheless, such constructions 
are a familiar part of the apparatus of the theory of 
integrable systems. 

In Ward (1985), this connection with integrable 
systems was developed further, and the anti-self- 
dual Yang-Mills equations were shown to yield 
many important integrable systems under symmetry 
reduction. Ward’s list has been extended and now 
includes many of the most famous examples of 
integrable systems such as the Painlevé equations, 
the Korteweg-de Vries (KdV) equation, the non- 
linear Schrödinger equation, the n-wave equations, 
and so on; see Mason and Woodhouse (1996) for a
review. There are some notable omissions from the
list such as the Kadomtsev-Petviashvili (KP) and
Davey-Stewartson equations (at least if one restricts 
oneself to finite-dimensional gauge groups; reduc- 
tions using infinite dimensional gauge groups have 
been obtained). 

The list of integrable systems obtainable by
symmetry reduction nevertheless remains impressive
and provides a route to the classification of at least
those integrable systems that can be obtained in this
way. Such systems can be classified by the choice of
ingredients required in the symmetry reduction: the
gauge group, the group of spacetime symmetries to
be reduced by, the choice of Euclidean or ultrahyperbolic
signature, and the choice of certain
constants of integration that arise in the reduction.

Another implication is that if an integrable system
can be obtained from one of the self-duality
equations by symmetry reduction, then it inherits a
reduced twistor correspondence because the twistor
correspondences share the symmetry groups of the
spacetime field equations. These twistor correspondences
can be seen to underlie much of the theory of
these equations; for example, Bäcklund transformations
of solutions correspond to elementary algebraic
operations on the twistor data, and the
Kac-Moody Lie algebras of hidden symmetries act
locally on the twistor data by matrix multiplication
of the appropriate loop algebras. Similarly, the
inverse-scattering transform for the KdV and nonlinear
Schrödinger equations can be seen to arise as
particular presentations of the twistor construction.

By and large, although twistor methods have 
yielded new insight into the geometry and structure 
of systems in dimensions 1 and 2, they have not 
necessarily superseded pre-existing techniques for
constructing solutions and analyzing the solution 
space. The systems for which twistor methods have 
been particularly effective for constructing solutions 
and characterizing their properties are in 2+1 or 
higher dimension. Key examples here are of course 
the anti-self-dual Yang-Mills and Einstein equations 
themselves, and their single translation reductions. 
In the anti-self-dual Yang-Mills case, these reduc- 
tions lead either to Ward’s or Manakov and 
Zakharov’s chiral model in Lorentzian signature, 
2+1, or the Bogomolny equations for monopoles, 
the reduction from Euclidean signature. In both 
cases, the twistor construction has played a major 
role in constructing and studying the solitonic 
solutions. 

See Ward and Wells (1990), Mason and Wood- 
house (1996), Ward’s article in Huggett et al. (1998) 
and the first few chapters of Mason et al. (1995), 
and Mason et al. (2001) for more examples of 
aspects of the theory of integrable systems arising 
from twistor correspondences. 


Applications to Geometry 


These applications are, to a large extent, higher- 
dimensional analogs of those discussed above; most 
of the problems in geometry to which twistor theory 
has been applied are those for which the underlying 
differential equations are integrable. These start 
with the Euclidean signature versions of the original
Ward construction for anti-self-dual Yang-Mills 
fields and Penrose’s nonlinear graviton construction 
for Ricci-flat anti-self-dual metrics but, as we will 
discuss, these constructions have a number of 
extensions and generalizations. 

The first dramatic application of these constructions
was the ADHM construction of Yang-Mills
instantons. These are absolute minima of the Yang-Mills
action, $S[A] = \int \mathrm{tr}(F \wedge F^*)$, on the 4-sphere, $S^4$,
with its round metric. A simple argument shows that
the action is bounded below by the second Chern
class of the bundle and that this bound is achieved
only for anti-self-dual fields. Thus, the problem was
to characterize all the anti-self-dual Yang-Mills
fields on $S^4$. In this Euclidean context, twistor
space, $\mathbb{CP}^3$, fibers over $S^4$ and the corresponding
Ward vector bundle is a bundle over all of $\mathbb{CP}^3$. It
turns out that all such bundles satisfying a certain 
stability condition had been constructed reasonably 
explicitly by algebraic geometers. Since the stability 
condition was implied by the context, this could be 
turned into an algebraic construction of the general 
instanton explicit enough to give some insight into 
both the local and global structure of the solution 
space. See Atiyah (1979) for a review. 

Hitchin used the Euclidean version of the nonlinear
graviton to develop the theory of gravitational
instantons that are asymptotically locally Euclidean
(i.e., asymptotically $\mathbb{R}^4/\Gamma$, where $\Gamma$ is a finite
subgroup of the rotation group). These were finally
constructed by Kronheimer who again used twistor
theory to identify the appropriate parameter space;
see his article in Mason et al. (2001) and Dancer’s
review of hyper-Kähler manifolds in LeBrun and
Wang (1999).

Even in four dimensions, there are a number of 
variants of the nonlinear graviton construction. The 
basic twistor correspondence produces a twistor 
space that is a complex 3-manifold PT for 
4-manifolds with conformal structures whose Weyl 
tensor is anti-self-dual. There are four natural 
specializations that have attracted study: (1) the 
Ricci-flat case, (2) the Einstein case (with nonzero
cosmological constant), (3) the scalar-flat Kähler
case, and (4) the hypercomplex case.

The twistor space in the Ricci-flat case admits the
additional structure of a fibration over $\mathbb{CP}^1$ together
with a holomorphic Poisson structure on the fibers
with values in the pullback of the 1-forms on $\mathbb{CP}^1$
(alternatively, the bundle of holomorphic 3-forms
should be the pullback of the square of the bundle of
holomorphic 1-forms on $\mathbb{CP}^1$). The Einstein case
with nonzero cosmological constant is a variant of
this in which the twistor space admits a
nondegenerate holomorphic contact structure, that
is, a distribution of 2-plane elements, which are only
integrable when the cosmological constant vanishes.
It also admits a Kähler form when the scalar
curvature is positive (in the negative case the
corresponding Kähler form is indefinite). For the
case of Kähler metrics with vanishing scalar curvature,
the twistor space admits a holomorphic volume
form with a double pole. The Ricci-flat case is
equivalent to the case of hyper-Kähler metrics, those
that are Kähler with respect to three different
complex structures $I$, $J$, and $K$ satisfying the standard
quaternionic relations $IJ = K$, etc. A hypercomplex
structure is obtained when one only has the
three integrable complex structures satisfying the
quaternion relations. Such manifolds admit an
underlying conformal structure that is anti-self-dual,
and the corresponding twistor space admits a
fibration to $\mathbb{CP}^1$.

These constructions have all played a significant 
role in the general analysis of these geometric 
structures, and the construction of examples. A 
striking example of an application of the nonlinear 
graviton construction to general properties is due to 
Donaldson and Friedman, who show that if two
4-manifolds admit anti-self-dual conformal structures,
then their connected sum does also.

In higher dimensions, most generalizations rely on 
quaternionic geometry and its reductions. The 
Euclidean signature formulation of the nonlinear 
graviton construction has natural extensions to 
quaternionic manifolds in 4k dimensions. These are
manifolds with metric whose holonomies are contained
in Sp(k) × Sp(1). The latter Sp(1) = SU(2)
factor leads to an associated $S^2$ bundle whose total
space is the twistor space PT and it naturally has
the structure of a (2k+1)-dimensional complex
manifold.

For a series of review articles, the reader is 
referred to Bailey and Baston (1990, chapters 3 
and 4) and also LeBrun and Wang (1999, chapters 
2, 5, 6, 10, and 14) which, despite being a book on 
the distinct subject of Einstein manifolds, is strongly 
influenced by twistor theory. Other applications 
along these lines are summarized in Mason et al. 
(2001, chapter 1). 

There are a number of applications that go 
beyond complete integrability. A striking application 
is the twistor framework of Merkulov for studying 
arbitrary geometric structures. This has led to a 
classification of all possible irreducible holonomies 
of torsion-free affine connections, see Merkulov’s 
article in Huggett et al. (1998). Another important 
area is in the field of conformal invariants in which 
the local twistor connection plays a prominent role.
This is a connection that is naturally defined on any
conformal manifold, being the spinor representation
of the Cartan conformal connection. An impressive
application here is the construction of conformally 
invariant differential operators and other conformal 
invariants. See the article by Baston and Eastwood 
in Bailey and Baston (1990). 


Beyond Classical Integrability: 
Twistor-String Theory 


Until Witten (2004), there was little indication that 
twistor theory would have much useful to say about 
Yang-Mills or gravitational fields that are not anti- 
self-dual. Furthermore, it was problematic to incor- 
porate quantum field theory into twistor ideas. 
However, twistor-string theory has transformed the 
situation and has furthermore had impressive appli- 
cations to the field of perturbative gauge theory. 

The story starts with a formulation by Nair of the
remarkable Parke–Taylor formulas for the so-called
maximal helicity violating (MHV) amplitudes in
gauge theory. These are scattering amplitudes at tree
level in which helicity conservation is maximally
violated; using crossing symmetry to take all the
particles to be outgoing, these are amplitudes in
which $n-2$ of the particles have helicity $-1$ and two
have helicity $+1$. These amplitudes can be expressed
simply as follows. Let the $n$ particles have color $t_i$ in
the Lie algebra of the gauge group and null
momenta $p_i$ with spinor decompositions $p_i^{AA'} = \tilde\pi_i^{A}\pi_i^{A'}$,
$i = 1,\ldots,n$, where the $\pi_i^{A'}$ are self-dual spinors and
the $\tilde\pi_i^{A}$ are anti-self-dual spinors, using the index notation
of Spinors and Spin Coefficients, and Twistors. Let
$i = r$ and $i = s$ be the two gluons of helicity $+1$. Then
the coefficient of the colour term $\mathrm{tr}(t_1 t_2 \cdots t_n)$ is

\[
\delta^4\Bigl(\sum_{i=1}^{n} p_i\Bigr)\,(\pi_r\cdot\pi_s)^4 \prod_{i=1}^{n} \frac{1}{\pi_i\cdot\pi_{i+1}}
\]

where $\pi_i\cdot\pi_j = \pi_{iA'}\pi_j^{A'}$ denotes the standard skew-symmetric
inner product on chiral spinors and
$\pi_{n+1} = \pi_1$. A striking feature is that, except for the
delta function, it is holomorphic in the $\pi_i$ except at
the simple poles $\pi_i\cdot\pi_{i+1} = 0$. Nair interprets these
poles as those associated to fermion correlators in a
current algebra on a $\mathbb{CP}^1$ parametrized by $\pi$. Using a
supersymmetric formulation adapted to N = 4 super
Yang-Mills, he formulated the amplitude as arising
from an integral over lines in supertwistor space
$\mathbb{CP}^{3|4}$.
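As a rough numerical illustration of the formula above, the following Python sketch evaluates the colour-ordered coefficient with the momentum-conserving delta function stripped off. The function names `bracket` and `parke_taylor`, the representation of each $\pi_i$ as a pair of complex numbers, and the overall sign convention chosen for the skew product are assumptions made for this example only; they are not taken from the article.

```python
def bracket(p, q):
    """Skew-symmetric product pi_i . pi_j of two 2-component spinors.

    Sign convention (an assumption for this sketch): p[0]*q[1] - p[1]*q[0].
    """
    return p[0] * q[1] - p[1] * q[0]


def parke_taylor(spinors, r, s):
    """MHV coefficient (pi_r . pi_s)^4 / prod_i (pi_i . pi_{i+1}).

    `spinors` is a list of n pairs of complex numbers, taken cyclically so
    that pi_{n+1} = pi_1; r and s index the two gluons singled out in the
    text. The delta function of total momentum is omitted.
    """
    n = len(spinors)
    denominator = 1
    for i in range(n):
        denominator *= bracket(spinors[i], spinors[(i + 1) % n])
    return bracket(spinors[r], spinors[s]) ** 4 / denominator


# Illustrative (made-up) spinor data for a five-particle amplitude.
example = [(1 + 2j, 0.5 - 1j), (2 - 1j, 1 + 1j), (-1 + 0.5j, 3 + 0j),
           (0.3 + 0.7j, -2 + 1j), (1 - 1j, 1 + 2j)]
value = parke_taylor(example, 0, 2)
```

The expression depends only on the $\pi_i$ and is invariant under cyclic relabelling, which is the feature Nair's current-algebra interpretation exploits.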

Witten extends these ideas to give, at least
conjecturally, a complete theory. He proposes that
full perturbative N = 4 super Yang-Mills theory on
spacetime is equivalent to a string theory, a topological
B model, on a supersymmetric version of twistor
space, $\mathbb{PT}_s = \mathbb{CP}^{3|4}$. This is the space obtained by
taking $\mathbb{C}^{4|4}$ with bosonic coordinates $Z^\alpha$, $\alpha = 0,\ldots,3$,
and fermionic coordinates $\eta^i$, $i = 1,\ldots,4$, modulo the
equivalence relation $(Z^\alpha, \eta^i) \sim \lambda(Z^\alpha, \eta^i)$, where
$\lambda \in \mathbb{C}$, $\lambda \neq 0$.

The number 4 here plays two crucial but different
roles. It is the maximum number of supersymmetries
that Yang-Mills can have; it has the effect of
incorporating both the positive and negative helicity
parts of the gauge field in the same supermultiplet. It
is also the only value of N for which $\mathbb{CP}^{3|N}$ is a
Calabi-Yau manifold and this is a necessary condition
for the topological twisted B model to be
anomaly-free. The Calabi-Yau condition is the
condition that the manifold admit a global holomorphic
volume form, which here is

\[
\Omega_s = \varepsilon_{\alpha\beta\gamma\delta}\, Z^\alpha\, dZ^\beta \wedge dZ^\gamma \wedge dZ^\delta
\wedge d\eta^1 \wedge d\eta^2 \wedge d\eta^3 \wedge d\eta^4
\]

This is invariant under $(Z^\alpha, \eta^i) \to (\lambda Z^\alpha, \lambda\eta^i)$ because
$d(\lambda\eta^i) = \lambda^{-1}\, d\eta^i$, $\lambda \in \mathbb{C}$, which follows from the Berezinian
rule of integration $\int \eta\, d\eta = 1$ for anticommuting
variables.
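The scaling argument can be checked by simple weight counting. The sketch below assumes the standard Berezin rule that each fermionic differential scales inversely to its coordinate, so that the bosonic part of the volume form carries weight +4 while each of the N fermionic differentials carries weight −1. The function name `volume_form_weight` is hypothetical and introduced only for this illustration.

```python
def volume_form_weight(n_bosonic=4, n_fermionic=4):
    """Weight k with Omega -> lam**k * Omega under (Z, eta) -> (lam*Z, lam*eta).

    Assumption of this sketch: each bosonic factor (Z or dZ) contributes +1,
    and each Berezin differential d(eta) contributes -1.
    """
    return n_bosonic - n_fermionic


# Values of N (number of fermionic coordinates) for which the candidate
# volume form on CP^{3|N} is scale invariant, i.e. descends to the quotient.
calabi_yau_N = [N for N in range(0, 9) if volume_form_weight(4, N) == 0]
```

Under these assumptions only N = 4 gives weight zero, in line with the statement that this is the only value for which the supertwistor space is Calabi-Yau.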

Open-string topological twisted B models are
known to correspond to holomorphic Chern-Simons
theories on their target space. A holomorphic Chern-Simons
theory is a theory whose basic variable is a
d-bar operator $\bar\partial_A = \bar\partial + A$ on a complex vector
bundle $E \to \mathbb{CP}^{3|4}$, where A is a Lie algebra valued
(0, 1)-form on the target space and whose action is

\[
S[A] = \int_{\mathbb{CP}^{3|4}} \mathrm{tr}\bigl(A\,\bar\partial A + \tfrac{2}{3}A^3\bigr) \wedge \Omega_s
\]

The field equations are $\bar\partial_A^2 = 0$. The classical solutions
therefore consist of holomorphic vector bundles on
the target space, here $\mathbb{CP}^{3|4}$. The twistor-space
representations of the fields are obtained by expanding
A in the anticommuting variables $\eta^i$ to obtain

\[
A = a + \eta^i b_i + \eta^i\eta^j c_{ij} + \eta^i\eta^j\eta^k d_{ijk} + \eta^1\eta^2\eta^3\eta^4 g
\]

where a has homogeneity zero, but because the
homogeneity of $\eta^i$ is of degree 1, $b_i$ has homogeneity
degree −1, and so on down to homogeneity degree
−4 for g. Via the Ward construction, the a
component corresponds to an anti-self-dual Yang-Mills
field on spacetime. The other components of A
can be seen to correspond to spacetime fields with
helicities −1/2 to +1 that are background coupled to
the anti-self-dual Yang-Mills field.

As it stands, although this holomorphic Chern-Simons
theory gives the correct field content of
N = 4 super Yang-Mills, the couplings are only
those of an anti-self-dual sector and more couplings
are needed to obtain full N = 4 super Yang-Mills.
The remarkable fact is that these can be naturally
introduced by coupling in certain D1 instantons.
The D1 instantons are algebraic curves C in twistor
space and the coupling is via a pair of spinor fields $\alpha$
and $\beta$ on C with values in E and E*, respectively,
with action

\[
S[\alpha, \beta, A] = \int_C \beta\, \bar\partial_A \alpha
\]


This leads to explicit expressions for Yang-Mills 
scattering amplitudes in terms of integrals of 
fermion correlators over the moduli spaces of such 
algebraic curves in supertwistor space. In principle, 
the integral is over all algebraic curves. However,
algebraic curves have two topological invariants,
their degree, denoted d, and genus g. An argument
based on a classical scaling symmetry gives that
integration over just those curves of degree d
gives the subset of processes for which

\[
d = q - 1 + l
\]

where q is the number of outgoing particles of
helicity +1 in the process and l is the number of
loops. It is also the case that $g \le l$.
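The selection rule lends itself to a small bookkeeping sketch. The helper names `curve_degree` and `allowed_genera` are hypothetical, and the genus bound is read here as g ≤ l, so that tree level (l = 0) allows only genus 0; that reading is an assumption of the example.

```python
def curve_degree(q, loops=0):
    """Degree d = q - 1 + l of the curves contributing to a process with
    q outgoing particles of helicity +1 and l loops."""
    return q - 1 + loops


def allowed_genera(loops):
    """Genera g compatible with the bound g <= l (assumed reading)."""
    return list(range(loops + 1))


# Tree-level MHV amplitudes have q = 2 and l = 0: degree-1 curves, i.e. lines.
mhv_tree_degree = curve_degree(2, loops=0)
```

For the MHV case this gives d = 1 and g = 0, consistent with Nair's description of the MHV amplitudes as integrals over lines in supertwistor space.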

An elegant formula for the amplitudes is that for
the on-shell generating functional for tree-level
scattering amplitudes $\mathcal{A}[A]$, where A is the on-shell
twistor field, being the above-mentioned (0, 1)-form.
The generating functional for processes with
q = d + 1 external fields of helicity +1 is then

\[
\mathcal{A}^d[A] = \int_{C \in \mathcal{M}^d} \det(\bar\partial + A)|_C \, d\mu
\]

where $d\mu$ is a natural measure on the moduli space
$\mathcal{M}^d$ of connected rational (genus 0) curves in $\mathbb{CP}^{3|4}$
of degree d. This approach has been successfully
exploited to obtain implicit algebraic formulas for
all tree-level scattering amplitudes.

In an alternative version, the curves of degree d 
can be taken to be maximally disconnected, being 
the union of d lines. However, in this approach, we 
need also to incorporate Chern–Simons propagators
which, for tree diagrams, join the lines into a tree. 
This gives a very flexible calculus for perturbative 
gauge theory in which scattering processes are 
obtained by gluing together MHV diagrams. It has 
been argued that the two formulations are equivalent.
On the one hand, the Chern–Simons propagator
has a simple pole when the lines meet and the
contour integral over the moduli space can be
performed using residues in such a way as to
eliminate the Chern–Simons propagators, leaving an
integral over d intersecting lines. On the other hand, 
the measure on the space of connected curves has a 
simple pole where the curve acquires double points 
and again the contour integral can be performed in 
such a way as to yield the same integral over d 
intersecting lines. 

It should be mentioned that Berkovits has given an 
alternative version of twistor-string theory which is a 
heterotic open-string theory with target supertwistor 
space in which the strings are taken to have boundary 
on the real slice $\mathbb{RP}^{3|4}$ in $\mathbb{CP}^{3|4}$ (this is appropriate to a
spacetime with split signature) and the D1-instanton 
expansions are replaced by expansions in the funda- 
mental modes of the string (this is not a topological 
theory). This gives rise to the same formulas for 
scattering amplitudes as Witten’s original model. 

There have been many applications now of these 
ideas, perhaps the most striking being the recursion 
relations of Britto, Cachazo, Feng, and Witten 
which give, at tree level, on-shell recurrence relations
for Yang-Mills scattering amplitudes that
suggest a hitherto unsuspected underlying structure
for Yang-Mills theory.

Despite all these successes, twistor-string theory is 
not thought by string theorists to be a good vehicle for 
basic physics. The most serious problem is that the 
closed-string sector gives rise to conformal supergravity 
which is an unphysical theory. This is particularly 
pernicious from the point of view of analyzing loop 
diagrams as from the point of view of string theory, 
loop diagrams will carry supergravity modes. From this 
point of view, twistor-string theory is another duality, 
like AdS-CFT etc., that gives insight into some standard 
physics but is fundamentally limited. 

From the point of view of a twistor theorist, 
however, twistor-string theory has overcome major 
obstacles to the twistor programme. Hodges has 
used the BCFW recursion relations to provide all 
twistor diagrams for gauge theory. In Mason (2005) 
it is shown how to derive the main generating 
function formulas from Yang-Mills and conformal 
gravity spacetime action principles via twistor
space actions for these theories. These twistor
actions can in the first instance be expressed purely 
bosonically and distinctly and the twistor-string 
generating function formulas are obtained by 
expanding and re-summing the classical limit of the 
path integral in a parameter that expands about the 
anti-self-dual sector. This allows one to decouple the 
Yang-Mills and conformal gravity modes, and 
indeed to work purely bosonically; one is not tied
to super Yang-Mills. Although there is much work
to be done to extend these ideas to provide a
consistent approach to the main equations of basic 
physics, obstacles that seemed insurmountable a few 
years ago have been overcome. 


See also: Chern–Simons Models: Rigorous Results;
Einstein Equations: Exact Solutions; General Relativity:
Overview; Instantons: Topological Aspects; Integrable
Systems and the Inverse Scattering Method; Riemann–Hilbert
Methods in Integrable Systems; Spinors and Spin
Coefficients; Twistors; Classical Groups and
Homogeneous Spaces; Quantum Mechanics:
Foundations; Several Complex Variables: Compact
Manifolds; Several Complex Variables: Basic Geometric
Theory.


Introduction


Twistor theory initially arose from two principal 
motivations: a desire for a conformally invariant 
calculus for spacetime geometry and fields on 
spacetime, and a desire to unify and account for 
the various occurrences of complex numbers and 
holomorphic functions in mathematical physics, 
especially in general relativity (Penrose and 
MacCallum 1973). The theory leads to a nonlocal 
relation between spacetime and twistor space, 
whereby a point in one is an extended object in 
the other. Part of the present-day motivation of the 
subject is that this nonlocal relation will be a 
fruitful way to approach the quantization of 
spacetime. A comparison is often invoked with 
Hamiltonian mechanics, which is a formal rephras- 
ing of classical mechanics that nonetheless provides 
a bridge from that theory to quantum mechanics. 
The hope is that the twistor theory has the right 
character to provide a bridge from general relativ- 
ity to quantum theory, specifically to quantum 
gravity. 

The principal successes of twistor theory in 
mathematical physics can be characterized as 
the linear Penrose transform, which provides a 
solution of the zero-rest-mass free-field equations 
in Minkowski space in terms of sheaf cohomology in 
twistor space, and the nonlinear Penrose transform, 
which provides solutions of certain nonlinear field 
equations in terms of holomorphic geometry. These 
are treated below, together with other applications 
of twistor theory, following a brief introduction to 
twistor geometry. 

Very recently, there has been a resurgence of interest 
in twistor theory following Witten’s introduction of 
twistor string theory (Witten 2003) as a string theory 
in twistor space. This is not treated here, but this 
article does provide the necessary background.

Ward RS and Wells RO (1990) Twistor Geometry and Field
Theory. Cambridge: Cambridge University Press.
Witten E (2004) Perturbative gauge theory as a string theory in
twistor space. Communications in Mathematical Physics 252:
189 (arXiv:hep-th/0312171).


Twistor Geometry 


General references for this section are the books by
Penrose and Rindler (1986) and Huggett and Tod
(1994). It will be convenient to use Penrose's
abstract index convention (Penrose and Rindler
1984, 1986), which is also used in Spinors and
Spin Coefficients. This can be used wherever vector
or tensor indices occur. Suppose that V is a (real or
complex) finite-dimensional vector space with dual
V'. Elements of V are written $v^a, u^b, w^c, \ldots$, where an
index a, b, c, ... is regarded not as an integer in the
range 1 to dim V but simply as an abstract label
indicating that the object to which it is attached is a
vector. Elements of V' are similarly written
$v_a, u_b, w_c, \ldots$ and elements of the tensor algebra as
$t^{a \ldots b}{}_{c \ldots d}$ according to valence, and so on. The usual
operations of tensor algebra are written in the way
that component calculations would suggest, but
without necessitating a choice of basis. The jump
to tensor fields on a manifold M is immediate. A
metric is a particular field $g_{ab}$ and determines a
Levi-Civita connection $\nabla_a$, which defines maps
$\nabla_a : v^b \to \nabla_a v^b$ and similarly for other valences. The virtue of
the formalism is that, while remaining invariant, it
can harness the strength and flexibility of calculations
in components.

With this understanding, twistors may first be
defined as the fundamental representation of
SU(2,2), so that they are elements $Z^\alpha$ of a
four-dimensional complex vector space T. T carries a
Hermitian form $\Sigma$ of signature $(+ + - -)$ which is
made explicit below and which provides an isomorphism
from the complex conjugate of T to its dual. This
isomorphism is used to eliminate all appearances of
complex-conjugate twistors from the formalism and is
therefore regarded as an antilinear map to the dual.

SU(2, 2) is the double cover of O(2, 4), the rotation
group of $E^{2,4}$, the six-dimensional space with flat
metric $\eta_{2,4}$ of signature $(+ - - - + -)$, which in turn is
the double cover of C(1,3), the conformal group of
Minkowski space M. This last group homomorphism
may be made explicit as follows (suspending the
abstract-index convention for the duration of this
aside): introduce pseudo-Cartesian coordinates
$x^a = (x^0, x^1, x^2, x^3)$ on M and
$y^\alpha = (y^0, y^1, y^2, y^3, y^4, y^5)$ on $E^{2,4}$.
The corresponding metrics are

\[
\eta_{1,3} = \eta_{ab}\, dx^a dx^b
= (dx^0)^2 - (dx^1)^2 - (dx^2)^2 - (dx^3)^2 \qquad [1]
\]

\[
\eta_{2,4} = \eta_{\alpha\beta}\, dy^\alpha dy^\beta
= (dy^0)^2 - (dy^1)^2 - (dy^2)^2 - (dy^3)^2 + (dy^4)^2 - (dy^5)^2 \qquad [2]
\]

We map M into $E^{2,4}$ by

\[
\phi(x^a) = \bigl(x^0, x^1, x^2, x^3, (1-\eta)/2, (1+\eta)/2\bigr) \qquad [3]
\]

where $\eta = \eta_{ab} x^a x^b$ with $\eta_{ab}$ as in [1], and it can be
checked that $\phi(M)$ is the intersection of the null cone
N of the origin in $E^{2,4}$ with the plane P defined by
$y^4 + y^5 = 1$. P is in fact a null hyperplane in $E^{2,4}$
and any point of N not on the null hyperplane
defined by

\[
y^4 + y^5 = 0 \qquad [4]
\]

can be mapped along the generators of N to a
unique point of P (recall that any point on a cone
lies on a line through the vertex: these lines are the
generators). Thus, the image of M under $\phi$ gives a
point on every generator of N except those satisfying
[4]. It can also be seen from [2] that the intrinsic
metric in $E^{2,4}$ on the intersection of N and P is
just $\eta_{1,3}$.
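The claims about the map [3] can be verified directly. The following sketch uses exact rational arithmetic and assumes only the signatures stated in [1] and [2]; the function names `eta13`, `eta24` and `phi` are labels chosen for this illustration.

```python
from fractions import Fraction


def eta13(x):
    """Minkowski quadratic form of signature (+ - - -), as in [1]."""
    x0, x1, x2, x3 = x
    return x0 * x0 - x1 * x1 - x2 * x2 - x3 * x3


def eta24(y):
    """Quadratic form on E^{2,4} of signature (+ - - - + -), as in [2]."""
    y0, y1, y2, y3, y4, y5 = y
    return y0 * y0 - y1 * y1 - y2 * y2 - y3 * y3 + y4 * y4 - y5 * y5


def phi(x):
    """The embedding [3] of Minkowski space into E^{2,4}."""
    n = eta13(x)
    return (x[0], x[1], x[2], x[3], (1 - n) / 2, (1 + n) / 2)


# Two sample points with rational coordinates (made-up values).
x = (Fraction(3, 2), Fraction(-1, 3), Fraction(2), Fraction(5, 7))
xp = (Fraction(1, 4), Fraction(2), Fraction(-3, 5), Fraction(1))
y, yp = phi(x), phi(xp)

on_null_cone = eta24(y) == 0              # phi(x) lies on N
on_plane = y[4] + y[5] == 1               # phi(x) lies on P
chord = tuple(a - b for a, b in zip(y, yp))
separation = tuple(a - b for a, b in zip(x, xp))
metric_preserved = eta24(chord) == eta13(separation)
```

The last check shows that the six-dimensional interval between two image points equals the Minkowski interval between the original points, which is the sense in which the intrinsic metric on the intersection of N and P is the Minkowski one.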

Now let PN be the projective null cone, or,
equivalently, the space of generators of N. This is a
compact manifold with topology $S^1 \times S^3$, as one can
see by intersecting N with the sphere

\[
(y^0)^2 + (y^1)^2 + (y^2)^2 + (y^3)^2 + (y^4)^2 + (y^5)^2 = 2
\]

Each generator meets this sphere twice at, say, $y^\alpha$
and $-y^\alpha$, and PN is the quotient by this identification
of the two surfaces

\[
(y^0)^2 + (y^4)^2 = 1 = (y^1)^2 + (y^2)^2 + (y^3)^2 + (y^5)^2
\]

which define the intersection. The metric $\eta_{2,4}$ defines
a degenerate metric on N, which, however, is
nondegenerate on any smooth cross section of N
which meets each generator once. Furthermore, the
map along the generators between any two such
cross sections is conformal. Thus, there is a
conformal metric on PN and it is conformal to
$\eta_{1,3}$. We call PN compactified Minkowski space $M^c$
as it is compact and has the same conformal metric
as Minkowski space. It can be thought of as M
compactified by the addition of some points, namely
the points of PN corresponding to the generators
satisfying [4]. To interpret these, we consider the
points satisfying the similar equation $y^4 - y^5 = 0$. By
inspection of $\phi$, [3], we see that these points
correspond to the light cone of the origin in M.
Thus, $M^c$ is obtained from M by adding a single
light cone, the light cone at infinity known as $\mathscr{I}$ and
read as "scri," short for "script-I."

Now the rotation group O(2, 4) of $E^{2,4}$ maps N to
itself preserving the metric and consequently maps
PN to itself, preserving the conformal metric. Thus,
O(2,4) defines conformal transformations of $M^c$
and a count of dimension shows that it is locally
isomorphic to the conformal group C(1, 3). The map
is two-to-one with $\pm I$ in O(2,4) mapping to I in
C(1,3). The fact that SU(2,2) is four-to-one homomorphic
to C(1,3) follows from calculations below.
It is because of this homomorphism of SU(2,2) and
C(1,3) that the geometry and analysis of twistors
(i.e., twistor theory) provides a formalism adapted
to conformally invariant or conformally covariant
notions in M or $M^c$.

A twistor may be expressed in terms of
two-component spinors of SL(2,C), the double cover of
the Lorentz group, as follows:

\[
Z^\alpha = (\omega^A, \pi_{A'}) \qquad [5]
\]

where again indices are abstract, so that

\[
T = S \oplus S'
\]

in terms of the spin space S and complex-conjugate
dual spin space S' of M. Now we can write
the action of infinitesimal elements of C(1, 3)
explicitly as


ms / 
o4= pi pw? GTO" gap he 


a. | (6) 
TAa = PË, Tp + Bina + Ama 


where $T^{AA'}$ (a real vector) defines an infinitesimal
translation, $B_{AA'}$ (another real vector) defines an
infinitesimal special conformal transformation, $\Lambda$ (a
real constant) defines a dilatation and the (real)
bivector $M_{ab} = \phi_{AB}\varepsilon_{A'B'} + \bar\phi_{A'B'}\varepsilon_{AB}$ defines an
infinitesimal rotation. This gives a total of 15 parameters
for the transformation, which is the correct
dimension for C(1, 3).
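The parameter count can be cross-checked against the dimensions of the groups introduced earlier. The sketch below uses the standard dimension formulas for pseudo-orthogonal and special pseudo-unitary groups; the helper names `dim_so` and `dim_su` are introduced only for this illustration.

```python
def dim_so(p, q):
    """Dimension of the (pseudo-)orthogonal group O(p, q): n(n-1)/2, n = p+q."""
    n = p + q
    return n * (n - 1) // 2


def dim_su(p, q):
    """Dimension of the special (pseudo-)unitary group SU(p, q): n^2 - 1."""
    n = p + q
    return n * n - 1


# Parameters of an infinitesimal conformal transformation of Minkowski space.
conformal_parameters = {
    "translation": 4,          # real vector
    "special conformal": 4,    # real vector
    "dilatation": 1,           # real constant
    "rotation": 6,             # real bivector
}
total = sum(conformal_parameters.values())
dimensions_agree = total == dim_so(2, 4) == dim_su(2, 2)
```

All three counts give 15, as required for the groups to be locally isomorphic.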
The Hermitian form $\Sigma(\,,)$ can be written as

\[
\Sigma(Z, Z) = Z^\alpha \bar Z_\alpha = \omega^A \bar\pi_A + \bar\omega^{A'} \pi_{A'} \qquad [7]
\]

and it can be checked that the transformations [6]
leave it invariant (and that its signature is $(+ + - -)$;
this establishes that SU(2, 2) is locally isomorphic to
C(1,3)). Equation [7] will be referred to as the norm
of a twistor.


From [6], a twistor $Z^\alpha = (\omega^A, \pi_{A'})$ gives rise, under
translation by a variable $x^{AA'}$, to a spinor field $\Omega^A$
given by

\[
\Omega^A = \omega^A - i x^{AA'} \pi_{A'} \qquad [8]
\]

Differentiating [8] and symmetrizing, we see that $\Omega^A$
satisfies the differential equation

\[
\nabla_{A'}{}^{(A} \Omega^{B)} = 0 \qquad [9]
\]

which is known as the twistor equation. In fact, the
general solution of [9] takes the form of [8] for
constant spinors $\omega^A$ and $\pi_{A'}$. Furthermore, the
conformal group can be shown directly to act on
solutions of [9], so that twistor theory can begin
with the study of [9] and its solutions. In this
approach, a twistor is precisely a solution of [9].

Given a spinor field $\Omega^A$ of the form of [8], we may
seek the points of M where it vanishes. In general, there
are none, but if we consider complexified Minkowski
space CM, then $\Omega^A$ vanishes on a two-dimensional
complex plane with the property that every tangent
vector is of the form $\lambda^A \pi^{A'}$ for varying $\lambda^A$ and fixed
$\pi^{A'}$. The 2-plane is flat and totally null, in that the
(analytically extended) Minkowski metric vanishes
identically on it, and it has a self-dual (SD) tangent
bivector determined by $\pi^{A'}$. Such a 2-plane is known as
an α-plane (reserving the term β-plane for a totally null
2-plane with anti-self-dual (ASD) tangent bivector). At
a given point p in CM, there is an α-plane for each
choice of $\pi_{A'}$ up to scale (in other words, for each
element of the projective (primed) spin space at p),
which is a copy of the complex projective line, $\mathbb{CP}^1$.

The α-plane is determined by the twistor up to
scale (in that a constant complex multiple of the
field $\Omega^A$ determines the same α-plane). Thus, we
consider the projective twistor space PT which,
since T is $\mathbb{C}^4$, is a copy of complex projective
3-space, $\mathbb{CP}^3$. This is now the space of α-planes, but
is also compact. We define complexified, compactified
Minkowski space $\mathrm{CM}^c$ as the space of all
(complex projective) lines in PT; then it is easy to
see that this includes CM as an open dense subset.
PT is the space of α-planes in $\mathrm{CM}^c$, and two lines
meet in PT iff the corresponding points in $\mathrm{CM}^c$ lie
on an α-plane, or, equivalently, iff they are null
separated. Thus, the conformal structure in $\mathrm{CM}^c$ is
determined by incidence of lines in PT.

To find M and $M^c$ in this picture, we seek α-planes
containing real points. If $\Omega^A$ from [8] vanishes at a
real $x^{AA'}$, then the contraction $\omega^A \bar\pi_A$ must be purely
imaginary, so that, by [7], the norm of the twistor is
zero. Conversely, one calculates that $\Omega^A$ can indeed
vanish at real points if the norm is zero, and that it
will then in fact vanish along a null geodesic with
tangent vector (proportional to) $\bar\pi^A \pi^{A'}$. Twistors with
norm zero are called null and the (five-dimensional,
real) submanifold of them in PT is PN. This is a
compactification of the space of (unscaled) null
geodesics in M by the inclusion of the 2-sphere of
null geodesics in $M^c$ which lie on the light cone at $\mathscr{I}$.
For use in the next section, we note the definition of
$\mathrm{PT}^+$ and $\mathrm{PT}^-$ as the projective twistors with positive
and negative norm, respectively.

To summarize, we have found M and $M^c$:
(complex projective) lines in PT define points of
$\mathrm{CM}^c$; lines in PN define points of $M^c$ with one such,
call it I, picked out as the vertex of the null cone $\mathscr{I}$;
lines in PN which meet I correspond to points of $\mathscr{I}$;
lines in PN which do not meet I correspond to
points in M. As for $\mathrm{CM}^c$, the conformal structure of
M and $M^c$ is determined by incidence in PN. We
may now note the nonlocal correspondence mentioned
in the introduction: points in $\mathrm{CM}^c$ are lines in
PT and points in PT are α-planes in $\mathrm{CM}^c$.

It will be convenient to refer to the line in PT
associated with a point x in $\mathrm{CM}^c$ as $L_x$. With this
notation, it is possible to characterize the forward or
future tube in terms of twistor space: a point x of
CM is in the forward tube iff its imaginary part is
timelike and past-pointing, and this is equivalent to
$L_x$ lying in $\mathrm{PT}^+$.
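A small numerical sketch can illustrate both the vanishing of the norm on lines through real points and the sign of the norm on lines through points of the forward tube. It assumes the usual identification of a 4-vector with a 2×2 matrix $x^{AA'}$ (Hermitian for real x), the incidence relation $\omega^A = i x^{AA'}\pi_{A'}$ read off from [8], and the norm [7] evaluated with $\bar\pi_A$ taken as the componentwise conjugate of $\pi_{A'}$. The function names and the factor $1/\sqrt{2}$ are conventions chosen for the example.

```python
def x_matrix(x):
    """2x2 matrix x^{AA'} for x = (x0, x1, x2, x3); entries may be complex.

    For real x the matrix is Hermitian (assumed standard convention).
    """
    x0, x1, x2, x3 = x
    s = 2 ** -0.5
    return [[s * (x0 + x3), s * (x1 + 1j * x2)],
            [s * (x1 - 1j * x2), s * (x0 - x3)]]


def twistor_on_line(x, pi):
    """Twistor (omega^A, pi_{A'}) on L_x, with omega^A = i x^{AA'} pi_{A'}."""
    m = x_matrix(x)
    omega = [1j * (m[A][0] * pi[0] + m[A][1] * pi[1]) for A in range(2)]
    return omega, list(pi)


def norm(Z):
    """Norm [7]: omega^A conj(pi)_A plus its complex conjugate."""
    omega, pi = Z
    contraction = sum(omega[A] * complex(pi[A]).conjugate() for A in range(2))
    return 2 * contraction.real


pi = (1 + 2j, -0.5 + 1j)

# A real point: every twistor on its line is null.
real_point = (1.0, 0.3, -0.2, 0.5)
null_norm = norm(twistor_on_line(real_point, pi))

# A complex point whose imaginary part (-2, 0.5, 0.1, -0.4) is timelike and
# past-pointing: with the conventions assumed here the norm is positive.
tube_point = (1.0 - 2.0j, 0.3 + 0.5j, -0.2 + 0.1j, 0.5 - 0.4j)
tube_norm = norm(twistor_on_line(tube_point, pi))
```

With other sign conventions for the matrix or the conjugation the sign of the second norm could flip, so the check should be read as a consistency test of one convention rather than as a derivation.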

The starting point for Riemannian twistor theory is
the fact that $\mathbb{CP}^3$ is a fibration with fiber $\mathbb{CP}^1$ over
$S^4$, where the fiber above a point p can be interpreted
as the almost-complex structures at p (since this is the
same as the projective primed spin space at p). In the
picture developed above, this means that there is an
$S^4$'s worth of lines filling out $\mathbb{CP}^3$, no two of which
intersect (so that there are no null vectors and the
metric is definite). The complexification of $S^4$ with its
conformal structure is again $\mathrm{CM}^c$.

If a twistor has nonzero norm, say $Z^\alpha \bar Z_\alpha = s \neq 0$,
then it can be interpreted as a massless particle with
spin s: the momentum is $p_a = \bar\pi_A \pi_{A'}$ and the
angular momentum bivector is
$M^{ab} = i\omega^{(A}\bar\pi^{B)}\varepsilon^{A'B'} - i\bar\omega^{(A'}\pi^{B')}\varepsilon^{AB}$.
The angular momentum transforms
appropriately under translation by virtue of [6]
and the (Pauli–Lubanski) spin vector is $s p_a$, as it
should be for a massless spinning particle.


The Linear Penrose Transform: 
Zero-Rest-Mass Free Fields 


A zero-rest-mass free field of spin s is a symmetric
spinor field $\phi_{AB \ldots C}$ with 2s indices which satisfies the
field equation

\[
\nabla^{A'A} \phi_{AB \ldots C} = 0 \qquad [10]
\]

The Weyl neutrino equation, source-free Maxwell
equation, and linearized Einstein vacuum equation
are examples of zero-rest-mass free-field equations,
with spins 1/2, 1, and 2, respectively, so that these
are equations of physical interest. Conventionally,
one takes the s = 0 case to be the wave equation, and
the complex-conjugate fields $\bar\phi_{A'B' \ldots C'}$ to have the
same spin but opposite helicity.

The conformal group acts on solutions of [10], so
that the equations are conformally invariant. The
equations can be solved by contour integral expressions
involving homogeneous functions of a twistor
variable. To be explicit, we define an operation $\rho_x$ of
restriction to the line $L_x$ for a function of a twistor
variable by the following:

\[
\rho_x f(Z^\alpha) = f(i x^{AA'} \pi_{A'}, \pi_{A'}) \qquad [11]
\]

Now suppose that $f(Z^\alpha)$ is holomorphic and homogeneous
of degree $-2s-2$ in the twistor variable for
positive integer 2s, but otherwise arbitrary, and
consider the integral

\[
\phi_{A'B' \ldots C'}(x) = \oint \pi_{A'} \pi_{B'} \cdots \pi_{C'}\, \rho_x f(Z^\alpha)\, \pi_{E'}\, d\pi^{E'} \qquad [12]
\]

where there are 2s indices on $\phi$ and the integration
is around a contour in the line $L_x$ in PT. The choice
of homogeneity ensures that the integral is well
defined but, to obtain a nonzero answer, $\rho_x f$ must
have some singularities as a function of $\pi_{A'}$ on $L_x$.
The answer then automatically gives a helicity-(−s)
solution of [10], as may be checked by differentiation
under the integral sign.

For a helicity-s solution, we take an arbitrary
function $f(Z^\alpha)$, holomorphic and of homogeneity
$(2s-2)$, and consider the integral

\[
\phi_{AB \ldots C}(x) = \oint \rho_x \Bigl( \frac{\partial}{\partial\omega^A} \frac{\partial}{\partial\omega^B} \cdots \frac{\partial}{\partial\omega^C} f(Z^\alpha) \Bigr)\, \pi_{E'}\, d\pi^{E'} \qquad [13]
\]

where there are 2s indices on $\phi$ and the integration is
again around a contour in the line $L_x$. As before,
one needs singularities to make the contour integral
nonzero, but again the result satisfies [10].

The correct framework in which to understand
these integrals is sheaf cohomology theory. For
[12], the functions with singularities are actually
elements of $H^1(\mathcal{U}, \mathcal{O}(-2s-2))$, the first cohomology
group of a region $\mathcal{U}$ in PT with coefficients in
the sheaf of germs of holomorphic functions of
homogeneity $-2s-2$, while the fields are elements
of $H^0(U, Z_s)$, the zeroth cohomology group of the
corresponding region U of M with coefficients in
helicity-s zero-rest-mass fields (thus, $\mathcal{U}$ must contain
the neighborhood of lines $L_x$ for points x in U).
Similarly, [13] is interpreted cohomologically in
terms of potentials modulo a gauge. With appropriate
conditions on U and $\mathcal{U}$ (for brevity, U is said
to be elementary), these groups can be shown to be
isomorphic and this isomorphism is known as the
Penrose transform (Ward and Wells 1990). A
particular instance of an elementary U is the
forward tube, when $\mathcal{U}$ is $\mathrm{PT}^+$. Since the definition
of positive frequency is holomorphicity on the
forward tube, this observation geometrizes the
notion of positive frequency in terms of twistor
space.

For free fields with mass, there are generalizations 
of [12] and [13] to solve the Dirac equation for 
different spins. However, the integrands now 
involve functions of more than one twistor variable, 
subject to an equation. This equation is a counterpart
of the Klein–Gordon equation and breaks the
conformal invariance (as it must, since mass does). It
can be imposed by a projection which can in turn be 
written as a contour integral over arbitrary holo- 
morphic functions. It has been argued that the 
appropriate description of leptons and hadrons in 
twistor theory is with functions of two and three 
twistor variables, respectively. Such a function has 
two or three integer quantum numbers determined 
by the homogeneities in different variables, and this 
leads to a twistor particle classification scheme (see, 
e.g., Hughston and Sheppard (1980) and Sparling 
(1981)), similar in many respects to, but not 
identical with, the standard classifications. 

Given that free fields, massive or massless, are 
determined from arbitrary twistor functions through 
contour integrals, one may translate the Feynman 
diagrams of a quantum field theory into contour 
integrals over twistor functions. In the massless case, 
the contours are compact, so that the integrals are 
finite without need for renormalization. The massive 
case is more complicated but essentially parallel. 
This is twistor diagram theory and there is a
substantial literature on it (see, e.g., the article by 
Hodges in the volume edited by Huggett et al. 
(1998)). There is currently no new physical theory, 
distinct from a known quantum field theory, to 
generate the relevant diagrams. 


The Nonlinear Penrose Transform: 
Curved Twistor Spaces 


The electromagnetic field, in Minkowski space say,
can be regarded as a spinor field subject to field
equations, in which case these equations can be
solved via the Penrose transform by contour
integrals. Alternatively, it can be seen as the
curvature of a connection on a U(1) bundle over
M, which is a more active role for the field in
curving a bundle. For SD or ASD electromagnetic
fields, there are analogous active twistor constructions.
From an ASD electromagnetic field, one may
define a connection on the primed spin space of CM
which is flat on α-planes: if the tangents to the
α-plane are of the form $\lambda^A \pi^{A'}$ for varying $\lambda^A$ and with
$\pi^{A'}$ fixed up to scale, then consider the propagation
of $\pi_{A'}$ around the α-plane given by

\[
\pi^{A'} (\nabla_{AA'} - i A_{AA'}) \pi_{B'} = 0 \qquad [14]
\]

where $A_{AA'}$ is a potential for the electromagnetic
field. This connection is flat provided

\[
\pi^{A'} \pi^{B'} \nabla_{AA'} A^{A}{}_{B'} = 0 \qquad [15]
\]

and if this is to hold for all $\pi_{A'}$ then $\nabla_{A(A'} A^{A}{}_{B')}$
vanishes and the electromagnetic field, defined as
usual as the exterior derivative of the potential, is
necessarily ASD. Now the space of α-planes in CM
is projective twistor space PT, so we define a
holomorphic C* bundle $\mathcal{T}$ over PT by taking the
fiber above an α-plane to be choices of $\pi_{A'}$ scaled as
in [14]. If we restrict attention to the α-planes
through a given point p of CM, then by comparing
the scalings at p we can trivialize the bundle; thus, $\mathcal{T}$
is trivial on lines in PT. There is a converse to this
construction and we have: there is a one-to-one
correspondence between holomorphic C* bundles
on a region $\mathcal{U}$ in PT which are trivial on lines and
ASD electromagnetic fields on the corresponding
region U of CM (for elementary U).

This construction can be extended to solve the 
ASD Yang-Mills equations with holomorphic vector 
bundles replacing holomorphic line bundles: with a 
region $\hat{U}$ in PT and the corresponding elementary 
region U in CM as above, there is a natural one-to-one 
correspondence between ASD GL(n,C) gauge 
fields on U and holomorphic rank-n vector bundles 
E over $\hat{U}$ which are trivial on $L_x$ for every x in U. 

ASD Yang-Mills fields cannot be real on M, but 
using Riemannian twistor theory, one can impose 
appropriate reality and globality conditions to 
ensure that these ASD Yang-Mills fields are both 
real and globally defined on $S^4$. These are then 
instantons. The Atiyah-Drinfeld-Hitchin-Manin 
(ADHM) construction of instantons (Atiyah et al. 
1978) proceeds via construction of the corresponding 
holomorphic vector bundles over twistor space. 

The construction of ASD Yang-Mills fields is 
also the starting point for the twistor theory of 
integrable systems (Mason and Woodhouse 1996), 
following the observation that many of the known 
completely integrable partial differential equations 
(PDEs) (including the sine-Gordon, Korteweg-de 
Vries (KdV) and nonlinear Schrödinger equations) 
are reductions of the ASD Yang-Mills equations. 
Solutions of these other integrable systems can be 
given in terms of a geometrical construction, 
usually of some structure in holomorphic geometry. 

The other major active twistor construction, 
which historically preceded the Yang-Mills one, is 
Penrose’s nonlinear graviton (Penrose 1976), which 
solves the ASD Einstein vacuum equations. For this, 
one starts from a complex, four-dimensional manifold 
M with holomorphic metric, vanishing Ricci 
curvature and ASD Weyl tensor. These conditions 
on the curvature are necessary and sufficient to 
allow the existence of α-surfaces, which generalize 
α-planes. They are two-dimensional totally null 
(complex) surfaces with SD tangent bivector, one 
for each choice of (null) SD bivector, or, equivalently, 
for each choice of primed spinor, at each 
point. 

The space of α-surfaces is a three-dimensional 
complex manifold, the curved twistor space $P\mathcal{T}$. 
This is curved inasmuch as it is not now (part of) 
$CP^3$, but it still contains complex projective lines: 
given a point p in M there is an α-surface through p 
for every primed spinor at p up to scale; these 
α-surfaces make up a projective line $L_p$ in $P\mathcal{T}$. The 
conditions on the curvature are equivalent to the 
statement that the Levi-Civita connection is flat on 
primed spinors, so that there exist constant primed 
spinors in M, and the tangent bivector to an 
α-surface can be taken to be constant, without loss of 
generality. The map associating a constant primed 
spinor with each α-surface defines a projection $\pi$ 
from $P\mathcal{T}$ to $CP^1$, so that $P\mathcal{T}$ is a fibration over 
$CP^1$. The lines $L_p$ define a four-parameter family of 
sections of this fibration. 

To define the metric of M from the curved twistor space 
$P\mathcal{T}$, one needs the notion of normal bundle: the 
normal bundle of a submanifold Y in a manifold X is 
$N = TX|_Y / TY$ in terms of the tangent bundles TX and 
TY. The normal bundle $N_p$ of a particular section $L_p$ 
is the same in $P\mathcal{T}$ as it was in PT, namely 
$H \oplus H$, where H is the hyperplane-section line 
bundle over $CP^1$ (Ward and Wells 1991). A section $s_V$ 
of $N_p$ corresponds to a vector V in $T_p M$ (think of it 
as an infinitesimally neighboring point in M) and V is 
defined to be null iff $s_V$ has a zero. Because of the 
nature of $N_p$, this defines a quadratic conformal 
metric, which, furthermore, agrees with the conformal 
metric on M and generalizes the definition of 
conformal metric for CM, in terms of incidence in 
PT. To define the actual metric, as opposed to just 
the conformal metric, one has a covariant-constant 
choice of $\epsilon_{A'B'}$ in M which defines a form 
$\epsilon$ on the base of the fibration, and a Poisson 
structure $\mu$ on the fibers of the projection. The 
definition of $\mu$ is more intricate, but the two 
structures enable the metric of M to be recovered from 
$P\mathcal{T}$. Penrose (1976) and Huggett and 
Tod (1994) provide more details. 

Now the metric and curvature properties of M 
are coded into holomorphic properties of the curved 
twistor space $P\mathcal{T}$ together with $\epsilon$ and 
$\mu$. These properties characterize 
M: subject to topological conditions on M, there is 
a one-to-one correspondence between holomorphic 
solutions M of the Einstein vacuum equations with 
ASD Weyl tensor and three-dimensional complex 
manifolds $P\mathcal{T}$ fibered over $CP^1$, with a 
four-parameter family of sections, each with normal bundle 
$H \oplus H$, and the forms $\epsilon$ and $\mu$ as above. 

In fact, one only needs to assume the existence of 
one section with the correct normal bundle and the 
full four-parameter family will automatically exist, 
at least near to the initial one. Penrose (1976) 
showed how curved twistor spaces with the neces- 
sary structures could be obtained by deforming the 
neighborhood of a line in the “flat” twistor space 
PT. The Kodaira-Spencer theory of complex defor- 
mations ensures that the necessary lines continue to 
exist under this deformation. 

The original nonlinear graviton construction has 
been extended in various ways including the follow- 
ing: to allow the possibility of a cosmological 
constant (Ward and Wells 1991); to produce real, 
Riemannian solutions (Hitchin 1995); to solve other 
but related field equations (e.g., those for 
hypercomplex metrics, scalar-flat Kähler metrics or 
Einstein-Weyl structures). 

The search for a twistor construction of the 
SD Einstein equations (distinct from a construction 
in terms of dual twistors, which is, of course, 
provided by deforming dual twistor space) is an 
active area of research. This and other applications of 
twistor theory, including a quasilocal definition of 
mass in general relativity, the classification of affine 
holonomies and the construction of four-dimensional 
conformal field theories, may be found in the 
literature cited in the “Further reading” section. 


See also: Classical Groups and Homogeneous Spaces; 
Clifford Algebras and Their Representations; Integrable 
Systems: Overview; Quantum Field Theory: A Brief 
Introduction; Quantum Mechanics: Foundations; 
Introduction 


For the last twenty years or so, two-dimensional 
(2D) conformal field theories have played an 
important role in different areas of modern theo- 
retical physics. One of the main applications of 
conformal field theory has been in string theory (see 
Compactification of Superstring Theory), where the 
excitations of the string are described, from the 
point of view of the world sheet, by a 2D conformal 
field theory. Conformal field theories have also been 
studied in the context of statistical physics, since the 
critical points of second-order phase transitions are 
typically described by a conformal field theory. 
Finally, conformal field theories are interesting 
solvable toy models of genuinely interacting quan- 
tum field theories. 

From an abstract point of view, conformal field 
theories are (Euclidean) quantum field theories 
that are characterized by the property that their 
symmetry group contains, in addition to the 
Euclidean symmetries, local conformal transforma- 
tions, that is, transformations that preserve angles 
but not necessarily lengths. The local conformal 
symmetry is of special importance in two dimen- 
sions since the corresponding symmetry algebra is 
infinite dimensional in this case. As a consequence, 
2D conformal field theories have an infinite 
number of conserved quantities, and are essentially 
solvable by symmetry considerations alone. The 
mathematical formulation of these symmetries has 
led to the concept of a vertex operator algebra, 
which has become a new branch of mathematics in 
its own right. In particular, it has played a major 
role in the explanation of “monstrous moonshine” 
for which Richard Borcherds received the Fields 
medal in 1998. 

In the following, we want to explain the main 
features of conformal field theory using an algebraic 
approach that will naturally lead to the concept of a 
vertex operator algebra. There are other approaches 
to the subject, most notably the formulation, 
pioneered by Segal, of conformal field theory as a 
functor from the category of Riemann surfaces to 
the category of vector spaces. Due to limitations of 
space, however, we will not be able to discuss any of 
these other approaches here. 


The Conformal Symmetry Group 


The conformal symmetry group of the n-dimensional 
Euclidean space $R^n$ consists of the (locally 
defined) transformations that preserve angles but 
not necessarily lengths. The transformations that 
preserve angles as well as lengths are the well-known 
translations and rotations. The conformal 
group contains (in any dimension) in addition the 
dilatations or scale transformations 


$x^\mu \mapsto \tilde{x}^\mu = \lambda x^\mu$ [1] 


where $\lambda \in R$ and $x^\mu \in R^n$, as well as the 
so-called special conformal transformations, 


$x^\mu \mapsto \tilde{x}^\mu = \frac{x^\mu + x^2 a^\mu}{1 + 2 (x \cdot a) + x^2 a^2}$ [2] 


where $a^\mu \in R^n$ and $x^2 = x^\mu x_\mu$. (Note that this 
last transformation is only defined for 
$x^\mu \neq -a^\mu / a^2$.) 
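A quick numerical check can make [2] more concrete. The sketch below relies on a standard fact that is not stated in the text above: a special conformal transformation can be written as an inversion $x \mapsto x / x^2$, followed by a translation by a, followed by a second inversion. The function names and the sample vectors are illustrative choices only.

```python
def inversion(x):
    """Inversion x -> x / x^2 in R^n (undefined at the origin)."""
    x2 = sum(c * c for c in x)
    return [c / x2 for c in x]


def special_conformal(x, a):
    """Special conformal transformation [2] with parameter vector a."""
    x2 = sum(c * c for c in x)
    a2 = sum(c * c for c in a)
    xa = sum(p * q for p, q in zip(x, a))
    denom = 1 + 2 * xa + x2 * a2
    return [(p + x2 * q) / denom for p, q in zip(x, a)]


def via_inversions(x, a):
    """Inversion, then translation by a, then inversion again."""
    y = inversion(x)
    y = [p + q for p, q in zip(y, a)]
    return inversion(y)


# Arbitrary sample point and parameter in R^3 (illustrative values).
x = [0.3, -1.2, 0.7]
a = [0.5, 0.1, -0.4]

direct = special_conformal(x, a)
composed = via_inversions(x, a)
max_diff = max(abs(p - q) for p, q in zip(direct, composed))
```

For generic points the two results agree to rounding error, which also explains the excluded point: the transformation is singular exactly where the translated inverse lands on the origin.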

If the dimension n of the space $R^n$ is larger than 2, 
one can show that the full conformal group is 
generated by these transformations. For n = 2, 
however, the group of (locally defined) conformal 
transformations is much larger. To see this, it is 
convenient to introduce complex coordinates for 
$(x, y) \in R^2$ by defining $z = x + iy$ and 
$\bar{z} = x - iy$. Then any (locally) analytic function 
f(z) defines a conformal transformation by 
$z \mapsto f(z)$, since analytic maps preserve angles. 
(Incidentally, the same also applies to 
$z \mapsto f(\bar{z})$, but this would reverse the 
orientation.) Clearly, the group of such transformations 
is infinite dimensional; this is a special feature 
of two dimensions. 

In this complex notation, the translations, rotations, 
dilatations, and special conformal transformations 
generate the Möbius group of automorphisms of the 
Riemann sphere 


$z \mapsto f(z) = \frac{az + b}{cz + d}$ [3] 


where a, b, c, d are complex constants with 
$ad - bc \neq 0$; since rescaling a, b, c, d by a common 
complex number does not modify [3], the Möbius 
group is isomorphic to $SL(2,C)/Z_2$. In addition to 
these transformations (that are globally defined on the 
Riemann sphere), we have an infinite set of 
infinitesimal transformations generated by 
$L_n : z \mapsto z + \epsilon z^{n+1}$ for all $n \in Z$. 
The generators $L_{\pm 1}$ and $L_0$ generate the 
subgroup of Möbius transformations, and their 
commutation relations are simply 


$[L_m, L_n] = (m - n) L_{m+n}$ [4] 


In fact, [4] also describes the commutation relations 
of all generators $L_n$ with $n \in Z$: this is the Lie 
algebra of (locally defined) 2D conformal 
transformations; it is called the Witt algebra. 
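The relation [4] can be checked mechanically. The sketch below assumes the common realization $L_n = -z^{n+1} d/dz$ as a vector field; the sign convention is an assumption, since the text only specifies the infinitesimal map. A vector field is stored as a dictionary from powers of z to coefficients, and the helper names are illustrative.

```python
def L(n):
    """Generator L_n = -z^(n+1) d/dz, stored as {power of z: coefficient}."""
    return {n + 1: -1}


def bracket(f, g):
    """Lie bracket [f d/dz, g d/dz] = (f g' - g f') d/dz of two vector fields."""
    out = {}
    for p, a in f.items():
        for q, b in g.items():
            # z^p * d/dz(z^q) - z^q * d/dz(z^p) = (q - p) z^(p+q-1)
            coeff = a * b * (q - p)
            out[p + q - 1] = out.get(p + q - 1, 0) + coeff
    return {k: v for k, v in out.items() if v != 0}


def scale(c, f):
    """Multiply a vector field by the number c, dropping zero terms."""
    return {k: c * v for k, v in f.items() if c * v != 0}


# Check [L_m, L_n] = (m - n) L_{m+n} on a finite window of indices.
witt_ok = all(
    bracket(L(m), L(n)) == scale(m - n, L(m + n))
    for m in range(-5, 6)
    for n in range(-5, 6)
)
```

Restricting m and n to -1, 0, 1 shows that these three generators close among themselves, which is the Möbius subalgebra mentioned above.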


The General Structure of Conformal 
Field Theory 


A 2D conformal field theory is determined (like any 
other field theory) by its space of states and the 
collection of its correlation functions (vacuum 
expectation values). The space of states is a vector 
space H (which, in many interesting examples, is a 
Hilbert space), and the correlation functions are 
defined for collections of vectors in some dense 
subspace of H. These correlation functions are 
defined on a 2D (Euclidean) space. We shall mainly 
be interested in the case where the underlying 2D 
space is a closed compact surface; the other 
important case concerning surfaces with boundaries 
(whose analysis was pioneered by Cardy) will be 
reviewed elsewhere (see the article Boundary Con- 
formal Field Theory). The closed surfaces are 
classified (topologically) by their genus g, which 
counts the number of handles; the simplest such 
surface, which we shall mainly consider, is the sphere 
with g = 0, while the surface with g = 1 is the torus, 
and so on. 

One of the special features of conformal field 
theory is the fact that the theory is naturally defined 
on a Riemann surface (or complex curve), that is, on 
a surface that possesses suitable complex coordi- 
nates. In the case of the sphere, the complex 
coordinates can be taken to be those of the complex 
plane that cover the sphere except for the point at 
infinity; complex coordinates around infinity are 
defined by means of the coordinate function 
$\gamma(z) = 1/z$ that maps a neighborhood of infinity to 
a neighborhood of zero. With this choice of complex 
coordinates, the sphere is usually referred to as the 
Riemann sphere, and this choice of complex 
coordinates is, up to Möbius transformations, 
unique. The correlation functions of a conformal 
field theory that is defined on the sphere are thus of 
the form 


$\langle 0| V(\psi_1; z_1, \bar{z}_1) \cdots V(\psi_n; z_n, \bar{z}_n) |0\rangle$ [5] 


where $V(\psi; z, \bar{z})$ is the field that is associated 
to the state $\psi$, and $z_i$ and $\bar{z}_i$ are complex 
conjugates of one another. Here $|0\rangle$ denotes the 
$SL(2,C)/Z_2$-invariant vacuum. The usual locality 
assumption of a 2D (bosonic) Euclidean quantum field 
theory implies that these correlation functions are 
independent of the order in which the fields appear 
in [5]. 

It is conventional to think of z = 0 as describing 
“past infinity,” and $z = \infty$ as “future infinity”; this 
defines a time direction in the Euclidean field theory 
and thus a quantization scheme (radial quantization). 
Furthermore, we identify the space of states 
with the space of “incoming” states; thus, the state 
$\psi$ is simply 


$\psi = V(\psi; 0, 0) |0\rangle$ [6] 


We can think of $z_i$ and $\bar{z}_i$ in [5] as independent 
variables, that is, we may relax the constraint that 
$\bar{z}_i$ is the complex conjugate of $z_i$. Then we have 
two commuting actions of the conformal group on these 
correlation functions: the infinitesimal action on 
the $z_i$ variables is described (as before) by the $L_n$ 
generators, while the generators for the action on 
the $\bar{z}_i$ variables are $\bar{L}_n$. In a conformal 
field theory, the space of states H thus carries two 
commuting actions of the Witt algebra. The generator 
$L_0 + \bar{L}_0$ can be identified with the 
time-translation operator, and thus describes the energy 
operator. The space of states of the physical theory 
should have a bounded energy spectrum, and it is thus 
natural to assume that the spectrum of both $L_0$ and 
$\bar{L}_0$ is bounded from below; representations with 
this property are usually called positive-energy 
representations. It is relatively easy to see that the 
Witt algebra does not have any unitary positive-energy 
representations except for the trivial representation. 
However, as is common in many instances in quantum 
theory, it possesses many interesting projective 
representations. These projective representations are 
conventional representations of the central extension 
of the Witt algebra 


$[L_m, L_n] = (m - n) L_{m+n} + \frac{c}{12} m (m^2 - 1) \delta_{m,-n}$ [7] 


which is the famous Virasoro algebra. Here c is a 
central element that commutes with all $L_m$; it is 
called the central charge (or conformal anomaly). 
Given the actions of the two Virasoro algebras 
(that are generated by $L_n$ and $\bar{L}_n$), one can 
decompose the space of states H into irreducible 
representations as 


$H = \bigoplus_{i,j} M_{ij} H_i \otimes \bar{H}_j$ [8] 


where $H_i$ ($\bar{H}_j$) denotes the irreducible 
representations of the algebra of $L_n$ ($\bar{L}_n$), and 
$M_{ij} \in N_0$ describe the multiplicities with which 
these combinations of representations occur. (We are 
assuming here that the space of states is completely 
reducible with 
respect to the action of the two Virasoro algebras; 
examples where this is not the case are the so-called 
logarithmic conformal field theories.) The 
positive-energy representations of the Virasoro algebra 
are characterized by the value of the central charge, as 
well as the lowest eigenvalue of $L_0$; the state $\psi$ 
whose $L_0$ eigenvalue is smallest is called the 
highest-weight state, and its eigenvalue h, defined by 
$L_0 \psi = h \psi$, is the conformal weight. The conformal 
weight determines the conformal transformation 
properties of $\psi$: under the conformal transformation 
$z \mapsto f(z)$, $\bar{z} \mapsto \bar{f}(\bar{z})$, we 
have 


$V(\psi; z, \bar{z}) \mapsto (f'(z))^{h} (\bar{f}'(\bar{z}))^{\bar{h}} V(\psi; f(z), \bar{f}(\bar{z}))$ [9] 


where $L_0 \psi = h \psi$ and $\bar{L}_0 \psi = \bar{h} \psi$. 
The corresponding field $V(\psi; z, \bar{z})$ is then 
called a primary field; if [9] only holds for the Möbius 
transformations [3], the field is called quasiprimary. 
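The central term in the Virasoro algebra [7] is constrained by the Jacobi identity. As a minimal sketch, the following check confirms on a finite window of indices that the coefficient $m(m^2 - 1)/12$ of c satisfies the required cocycle condition, and that it vanishes on the Möbius generators $L_{-1}$, $L_0$, $L_1$. The function names and the index window are illustrative choices.

```python
from fractions import Fraction


def central_term(m, n):
    """Coefficient of c in [L_m, L_n]: m(m^2 - 1)/12 if m + n = 0, else 0."""
    return Fraction(m * (m * m - 1), 12) if m + n == 0 else Fraction(0)


def jacobi_defect(l, m, n):
    """Central part of [[L_l, L_m], L_n] + cyclic permutations.

    The non-central parts already cancel because the Witt algebra [4]
    is a Lie algebra, so only the central contributions are summed here.
    """
    return ((l - m) * central_term(l + m, n)
            + (m - n) * central_term(m + n, l)
            + (n - l) * central_term(n + l, m))


window = range(-6, 7)
cocycle_ok = all(jacobi_defect(l, m, n) == 0
                 for l in window for m in window for n in window)

# The central term does not affect the Moebius subalgebra.
mobius_free = all(central_term(m, n) == 0
                  for m in (-1, 0, 1) for n in (-1, 0, 1))
```

The second check is consistent with the statement that the vacuum can remain invariant under the Möbius group even when c is nonzero.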

Since $L_m$ with m > 0 lowers the conformal weight 
of a state (see [7]), the highest-weight state $\psi$ is 
necessarily annihilated by all $L_m$ (and $\bar{L}_m$) with 
m > 0. However, in general the $L_m$ (and $\bar{L}_m$) 
with m < 0 do not annihilate $\psi$; they generate the 
descendants of $\psi$ that lie in the same representation. 
Their conformal transformation property is more 
complicated, but can be deduced from that of the primary 
state [9], as well as the commutation relations of 
the Virasoro algebra. 

The Möbius symmetry (whose generators annihilate 
the vacuum) determines the 1-, 2- and 3-point 
functions of quasiprimary fields up to numerical 
constants: the 1-point function vanishes, unless 
$h = \bar{h} = 0$, in which case 
$\langle 0| V(\psi; z, \bar{z}) |0\rangle = C$, independent 
of z and $\bar{z}$. The 2-point function of $\psi_1$ and 
$\psi_2$ vanishes unless $h_1 = h_2$ and 
$\bar{h}_1 = \bar{h}_2$; if the conformal weights agree, it 
takes the form 


$\langle 0| V(\psi_1; z_1, \bar{z}_1) V(\psi_2; z_2, \bar{z}_2) |0\rangle = C (z_1 - z_2)^{-2h} (\bar{z}_1 - \bar{z}_2)^{-2\bar{h}}$ [10] 


Finally, the structure of the 3-point function of three 
quasiprimary fields $\psi_1$, $\psi_2$, and $\psi_3$ is 


$\langle 0| V(\psi_1; z_1, \bar{z}_1) V(\psi_2; z_2, \bar{z}_2) V(\psi_3; z_3, \bar{z}_3) |0\rangle = C \prod_{i<j} (z_i - z_j)^{h_k - h_i - h_j} (\bar{z}_i - \bar{z}_j)^{\bar{h}_k - \bar{h}_i - \bar{h}_j}$ [11] 


where for each pair i < j, k labels the third field, that 
is, $k \neq i$ and $k \neq j$. The Möbius symmetry also 
restricts the higher correlation functions of 
quasiprimary fields: the 4-point function is determined up 
to an (undetermined) function of the Möbius 
invariant cross-ratio, and similar statements also 
hold for n-point functions with $n \geq 5$. The full 
Virasoro symmetry must then be used to restrict 
these functions further; however, since the generators 
$L_n$ with $n \leq -2$ do not annihilate the vacuum 
$|0\rangle$, the Virasoro symmetry leads to Ward identities 
that cannot be easily evaluated in general. (In typical 
examples, the Ward identities give rise to differential 
equations that must be obeyed by the correlation 
functions.) 
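The covariance of the 2-point function [10] under the Möbius transformations [3] can be tested numerically. The sketch below makes a simplifying assumption: it takes $h = \bar{h}$, so that [10] reduces to $C |z_1 - z_2|^{-4h}$ and each transformation factor in [9] becomes $|f'(z)|^{2h}$, which avoids branch choices for non-integer weights. The particular coefficients, points, and weight are arbitrary illustrative values.

```python
def mobius(a, b, c, d):
    """Return f(z) = (a z + b) / (c z + d) and its derivative f'(z)."""
    det = a * d - b * c

    def f(z):
        return (a * z + b) / (c * z + d)

    def df(z):
        return det / (c * z + d) ** 2

    return f, df


def two_point(z1, z2, h, C=1.0):
    """2-point function [10] for h = hbar: C |z1 - z2|^(-4h)."""
    return C * abs(z1 - z2) ** (-4 * h)


h = 0.125  # illustrative conformal weight, with hbar = h assumed
f, df = mobius(1 + 2j, 0.5, -0.3j, 2.0)
z1, z2 = 0.4 + 0.9j, -1.1 + 0.2j

# Transform both fields as in [9] and compare with the original correlator.
lhs = (abs(df(z1)) ** (2 * h) * abs(df(z2)) ** (2 * h)
       * two_point(f(z1), f(z2), h))
rhs = two_point(z1, z2, h)
rel_err = abs(lhs - rhs) / rhs
```

The agreement rests on the identity $f'(z_1) f'(z_2) / (f(z_1) - f(z_2))^2 = 1 / (z_1 - z_2)^2$, which holds for every Möbius map.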


Chiral Fields and Vertex Operator 
Algebras 


The decomposition [8] usually contains a special 
class of states that transform as the vacuum state 
with respect to $\bar{L}_m$; these states are the so-called 
chiral states. (Similarly, the states that transform as 
the vacuum state with respect to $L_m$ are the 
antichiral states.) Given the transformation properties 
described above, it is not difficult to see that the 
corresponding chiral fields $V(\psi; z, \bar{z})$ only 
depend on z in any correlation function, that is, 
$V(\psi; z, \bar{z}) \equiv V(\psi, z)$. (Similarly, the 
antichiral fields only depend on $\bar{z}$.) The chiral 
fields always contain the field corresponding to the 
state $L_{-2}|0\rangle$, that describes a 
specific component of the stress-energy tensor. 

In conformal field theory, the product of two 
fields can be expressed again in terms of the fields of 
the theory. The conformal symmetry restricts the 
structure of this operator product expansion: 


$V(\phi_1; z_1, \bar{z}_1) V(\phi_2; z_2, \bar{z}_2) = \sum_i (z_1 - z_2)^{\Delta_i} (\bar{z}_1 - \bar{z}_2)^{\bar{\Delta}_i} \sum_{r,s \geq 0} V(\phi^i_{r,s}; z_2, \bar{z}_2) (z_1 - z_2)^r (\bar{z}_1 - \bar{z}_2)^s$ [12] 


where $\Delta_i$ and $\bar{\Delta}_i$ are real numbers, and 
$r, s \in N_0$. (Here i labels the conformal 
representations that appear in the operator-product 
expansion, while r and s label the different 
descendants.) The actual form of this expansion (in 
particular, the representations that appear) can be read 
off from the correlation functions of the theory since 
the identity [12] has to hold in all correlation 
functions. 

Given that the chiral fields only depend on z in all 
correlation functions, it is then clear that the 
operator-product expansion of two chiral fields 
again only contains chiral fields. Thus, the subspace 
of chiral fields closes under the operator-product 
expansion, and therefore defines a consistent 
(sub)theory by itself. This subtheory is sometimes 
referred to as a meromorphic conformal field theory 
(Goddard 1989). (Obviously, the same also applies to the 
subtheory of antichiral fields.) The operator-product 
expansion defines a product on the space of 
meromorphic fields. This product involves the complex 
parameters $z_i$ in a nontrivial way, and therefore does 
not directly define an algebra structure; it is, however, 
very similar to an algebra, and is therefore usually 
called a vertex operator algebra in the mathematical 
literature. The formal definition involves formal 
power series calculus and is quite complicated; details 
can be found in Frenkel, Lepowsky, and Meurman (1988). 

By virtue of its definition as an identity that holds 
in arbitrary correlation functions, the 
operator-product expansion is associative, that is, 


$(V(\psi_1; z_1, \bar{z}_1) V(\psi_2; z_2, \bar{z}_2)) V(\psi_3; z_3, \bar{z}_3) = V(\psi_1; z_1, \bar{z}_1) (V(\psi_2; z_2, \bar{z}_2) V(\psi_3; z_3, \bar{z}_3))$ [13] 


where the brackets indicate which operator-product 
expansion is evaluated first. If we consider the case 
where both $\psi_1$ and $\psi_2$ are meromorphic fields, then 
the associativity of the operator-product expansion 
implies that the states in H form a representation of 
the vertex operator algebra. The same also holds for 
the vertex operator algebra associated to the 
antichiral fields. Thus the meromorphic fields encode in 
a sense the symmetries of the underlying theory: this 
symmetry always contains the conformal symmetry 
(since $L_{-2}|0\rangle$ is always a chiral field, and 
$\bar{L}_{-2}|0\rangle$ always an antichiral field). In 
general, however, the symmetry may be larger. In order to 
take full advantage of this symmetry, it is then useful to 
decompose the full space of states H not just with 
respect to the two Virasoro algebras, but rather with 
respect to the two vertex operator algebras; the 
structure is again the same as in [8], where, 
however, each $H_i$ and $\bar{H}_j$ is now an irreducible 
representation of the chiral and antichiral vertex 
operator algebra, respectively. 


Rational Theories and Zhu’s Algebra 


Of particular interest are the rational conformal 
field theories that are characterized by the property 
that the corresponding vertex operator algebras only 
possess finitely many irreducible representations. 
(The name “rational” stems from the fact that the 
conformal weights and the central charge of these 
theories are rational numbers.) The simplest 
examples of such rational theories are the so-called 
minimal models, for which the vertex operator 
algebra describes just the conformal symmetry: 
these models exist for a certain discrete set of 
central charges c<1 and were first studied by 
Belavin, Polyakov, and Zamolodchikov in 1984. 
(Their paper is contained in the reprint volume of 
Goddard and Olive (1988).) It was this seminal 
paper that started many of the modern developments 
in conformal field theory. Another important 
class of examples are the Wess-Zumino-Witten 
(WZW) models that describe the world-sheet theory 
of strings moving on a compact Lie group. The 
relevant vertex operator algebra is then generated by 
the loop group symmetries. There is some evidence 
that all rational conformal field theories can be 
obtained from the WZW models by means of two 
standard constructions, namely by considering 
cosets and taking orbifolds; thus rational conformal 
field theory seems to have something of the flavor of 
(reductive) Lie theory. 

Rational theories may be characterized in terms of 
Zhu’s algebra that can be defined as follows. The 
chiral fields $V(\psi, z)$ that only depend on z must by 
themselves define local operators; they can therefore 
be expanded in a Laurent expansion as 


$V(\psi, z) = \sum_{n \in Z} V_n(\psi) z^{-n-h}$ [14] 


where h is the conformal weight of the state $\psi$. For 
example, for the case of the holomorphic component 
of the stress-energy tensor one finds 


$T(z) = \sum_{n \in Z} L_n z^{-n-2}$ [15] 


where the $L_n$ are the Virasoro generators. By the 
state/field correspondence [6], it then follows that 


$V_n(\psi) |0\rangle = 0$ for $n > -h$ [16] 


and that 


$V_{-h}(\psi) |0\rangle = \psi$ [17] 


(For the example of the above component of the 
stress-energy tensor, [16] implies that 
$L_{-1}|0\rangle = L_0|0\rangle = L_n|0\rangle = 0$ for 
n > 0; thus the vacuum is in particular $SL(2,C)/Z_2$ 
invariant. Furthermore, [17] shows that $L_{-2}|0\rangle$ 
is the state corresponding to this component of the 
stress-energy tensor.) We denote by $H_0$ the space of 
states that can be generated by the action of the modes 
$V_n(\psi)$ from the vacuum $|0\rangle$. On $H_0$ we 
consider the subspace $O(H_0)$ that is spanned by the 
states of the form 


$V^{(N)}(\psi) \chi$, $N > 0$ [18] 


where $V^{(N)}(\psi)$ is defined by 


$V^{(N)}(\psi) = \sum_{n=0}^{h} \binom{h}{n} V_{-n-N}(\psi)$ [19] 


and h is the conformal weight of $\psi$. Zhu’s algebra is 
then the quotient space 


$A = H_0 / O(H_0)$ [20] 
It actually forms an associative algebra, where the 
algebra structure is defined by 


$\psi * \chi = V^{(0)}(\psi) \chi$ [21] 


This algebra structure can be identified with the 
action of the “zero-mode algebra” on an arbitrary 
highest-weight state. 

Zhu’s algebra captures much of the structure of 
the (chiral) conformal field theory: in particular, it 
was shown by Zhu in 1996 that the irreducible 
representations of A are in one-to-one correspondence 
with the irreducible representations of the full vertex 
operator algebra. A conformal field theory is thus 
rational (in the above, physicists’, sense) if Zhu’s 
algebra is finite dimensional. (In the mathematics 
literature, a vertex operator algebra is usually called 
rational if in addition every positive-energy 
representation is completely reducible. It has been 
conjectured that this is equivalent to the condition 
that Zhu’s algebra is semisimple.) 

In practice, the determination of Zhu’s algebra is 
quite complicated, and it is therefore useful to 
obtain more easily testable conditions for rationality. 
One of these is the so-called $C_2$ condition of 
Zhu: a vertex operator algebra is $C_2$-cofinite if the 
quotient space $H_0 / O_2(H_0)$ is finite dimensional, 
where $O_2(H_0)$ is spanned by the vectors of the form 


$V_{-n-h}(\psi) \chi$, $n \geq 1$ [22] 


It is easy to show that the $C_2$-cofiniteness condition 
implies that Zhu’s algebra is finite dimensional. 
Gaberdiel and Neitzke have shown that every 
$C_2$-cofinite vertex operator algebra has a simple 
spanning set; this observation can, for example, be 
used to prove that all the fusion rules (see below) of 
such a theory are finite. 


Fusion Rules and Verlinde’s Formula 


As explained above, the correlation function of three 
primary fields is determined up to an overall 
constant. One important question is whether or not 
this constant actually vanishes since this determines 
the possible “couplings” of the theory. This infor- 
mation is encoded in the so-called fusion rules of the 
theory. More precisely, the fusion rules $N_{ij}{}^{k} \in N_0$ 
determine the multiplicity with which the represen- 
tation of the vertex operator algebra labeled by k 
appears in the operator-product expansion of the 
two representations labeled by i and j. 

In 1988, Verlinde found a remarkable relation 
between the fusion rules of a vertex operator 
algebra and the modular transformation properties 
of its characters. For each irreducible representation 
$H_i$ of a vertex operator algebra, one can define the 
character 


$\chi_i(\tau) = tr_{H_i}(q^{L_0 - c/24})$, $q = e^{2 \pi i \tau}$ [23] 


For rational vertex operator algebras (in the 
mathematical sense) these characters transform under the 
modular transformation $\tau \mapsto -1/\tau$ as 


$\chi_i(-1/\tau) = \sum_j S_{ij} \chi_j(\tau)$ [24] 


where $S_{ij}$ are the entries of a constant matrix. 
Verlinde’s formula then states that, at least for unitary 
theories, 


$N_{ij}{}^{k} = \sum_l \frac{S_{il} S_{jl} S^{*}_{lk}}{S_{0l}}$ [25] 


where the “0” label denotes the vacuum representation. 
A general argument for this formula has been 
given by Moore and Seiberg in 1989; very recently, 
this has been made more precise by Huang. 
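Verlinde’s formula [25] is easy to evaluate in a small example. The sketch below uses the modular S-matrix of the c = 1/2 minimal model in the basis (1, ε, σ); this matrix is standard data that is not given in the text above and is quoted here as an assumption of the example. The function and variable names are illustrative.

```python
from math import sqrt

r = sqrt(2)
# Modular S-matrix of the c = 1/2 minimal model in the basis (1, epsilon, sigma).
# Standard data, quoted as an assumption of this example.
S = [[0.5, 0.5, r / 2],
     [0.5, 0.5, -r / 2],
     [r / 2, -r / 2, 0.0]]


def verlinde(S, i, j, k):
    """N_ij^k = sum_l S_il S_jl conj(S_lk) / S_0l, rounded to the nearest integer."""
    total = sum(S[i][l] * S[j][l] * complex(S[l][k]).conjugate() / S[0][l]
                for l in range(len(S)))
    return round(total.real)


n = len(S)
# fusion[i][j][k] is the multiplicity of representation k in the product i x j.
fusion = [[[verlinde(S, i, j, k) for k in range(n)]
           for j in range(n)]
          for i in range(n)]
```

With index 0 for the vacuum, 1 for ε and 2 for σ, `fusion[2][2]` gives the multiplicities in the product of σ with itself, and `fusion[0][j]` should reproduce representation j, since fusing with the vacuum is trivial.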


Modular Invariance and the Conformal 
Bootstrap 


Up to now, we have only considered conformal field 
theories on the sphere. In order for the theory to be 
well defined also on higher-genus surfaces, it is 
believed that the only additional requirement comes 
from the consistency of the torus amplitudes. In 
particular, the vacuum torus amplitude must only 
depend on the equivalence class of tori that is 
described by the modular parameter $\tau \in H$, up to 
the discrete identifications that are generated by the 
usual action of the modular group SL(2, Z) on the 
upper half-plane H. For the theory with decomposition 
[8] this requires that the function 


$Z(\tau, \bar{\tau}) = \sum_{i,j} M_{ij} \chi_i(\tau) \bar{\chi}_j(\bar{\tau})$ [26] 


is invariant under the action of SL(2, Z). This is a 
very powerful constraint on the multiplicity matrices 
$M_{ij}$ that has been analyzed for various vertex 
operator algebras. For example, Cappelli, Itzykson, 
and Zuber have shown that the modular invariant 
WZW models corresponding to the group SU(2) 
have an A-D-E classification. The case of SU(3) was 
solved by Gannon, using the Galois symmetries of 
these rational conformal field theories. 
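The constraint on the multiplicity matrix can be phrased as two matrix conditions, one for each generator of SL(2, Z). The sketch below tests them for the c = 1/2 minimal model, using its standard S-matrix and conformal weights 0, 1/2, 1/16 in the basis (1, ε, σ); these values are assumptions of the example and are not taken from the text. The conditions checked are sufficient for the invariance of [26], given the transformation law [24] and the behavior of the characters [23] under $\tau \mapsto \tau + 1$.

```python
import cmath
from math import sqrt

r = sqrt(2)
# Standard data for the c = 1/2 minimal model (assumed, not from the text).
S = [[0.5, 0.5, r / 2],
     [0.5, 0.5, -r / 2],
     [r / 2, -r / 2, 0.0]]
c = 0.5
weights = [0.0, 0.5, 1.0 / 16]
# Under tau -> tau + 1 the character chi_i picks up the phase T[i].
T = [cmath.exp(2j * cmath.pi * (h - c / 24)) for h in weights]


def is_invariant(M, tol=1e-12):
    """Check sufficient conditions for modular invariance of sum M_ij chi_i chibar_j."""
    n = len(M)
    # tau -> tau + 1: M_ij must vanish unless T_i * conj(T_j) = 1.
    t_ok = all(abs(M[i][j] * (T[i] * T[j].conjugate() - 1)) < tol
               for i in range(n) for j in range(n))
    # tau -> -1/tau: require (S^T M conj(S))_kl = M_kl.
    s_ok = all(
        abs(sum(S[i][k] * M[i][j] * complex(S[j][l]).conjugate()
                for i in range(n) for j in range(n)) - M[k][l]) < tol
        for k in range(n) for l in range(n))
    return t_ok and s_ok


diagonal = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
swapped = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]  # pairs the vacuum with epsilon

diagonal_ok = is_invariant(diagonal)
swapped_ok = is_invariant(swapped)
```

The diagonal matrix passes because S is unitary, while the swapped matrix fails already under $\tau \mapsto \tau + 1$, since the weights 0 and 1/2 do not differ by an integer.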

The condition of modular invariance is relatively 
easily testable, but it does not, by itself, guarantee that 
a given space of states H comes from a consistent 
conformal field theory. In order to construct a 
consistent conformal field theory, one needs to solve 
the conformal bootstrap, that is, one has to determine 
all the normalization constants of the correlators so 
that the resulting set of correlators is local and 
factorizes appropriately into 3-point correlators 
(crossing symmetry). This is typically a difficult 
problem which has only been solved explicitly for 
rather few theories, for example, the minimal models. 
Recently, it has been noticed that the conformal 
bootstrap can be more easily solved for the corre- 
sponding boundary conformal field theory. Further- 
more, Fuchs, Runkel, and Schweigert have shown that 
any solution of the boundary problem induces an 
associated solution for conformal field theory on 
surfaces without boundary. This construction relies 
heavily on the relation between 2D conformal field 
theory and 3D topological field theory (Turaev 1994). 


See also: Boundary Conformal Field Theory; 
Compactification of Superstring Theory; Current Algebra; 
Knot Theory and Physics; String Field Theory; 
Superstring Theories; Symmetries in Quantum Field 
Theory of Lower Spacetime Dimensions. 
Introduction 


The Ising model is a model of a classical ferromagnet 
on a lattice first introduced in 1925 in the 
one-dimensional case by E. Ising. At each lattice site 
α there is a “spin” variable $\sigma_\alpha$, which takes 
on the values +1 (spin up) and -1 (spin down). The mutual 
interaction energy of the pair of spins $\sigma_\alpha$ and 
$\sigma_{\alpha'}$, where α and α' are nearest neighbors, 
is $-E(\alpha, \alpha')$ if $\sigma_\alpha = \sigma_{\alpha'}$ 
and is $E(\alpha, \alpha')$ if 
$\sigma_\alpha = -\sigma_{\alpha'}$. In addition, the 
spins can interact with an external magnetic field as 
$-H \sigma_\alpha$. On a square lattice, where j specifies 
the row and k specifies the column, the interaction 
energy for the homogeneous case where 
$E^h(\alpha, \alpha')$ and $E^v(\alpha, \alpha')$ are 
independent of the position α, α' may 
be explicitly written as 


$\mathcal{E}(H) = -\sum_{j,k} [E^h \sigma_{j,k} \sigma_{j,k+1} + E^v \sigma_{j,k} \sigma_{j+1,k} + H \sigma_{j,k}]$ [1] 
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This very simple model [1] has the remarkable 
property that in two dimensions at H=0 many 
properties of physical interest can be computed 
exactly. Furthermore, the model has a ferromagnetic 
phase transition at a critical temperature T,, at 
which the specific heat diverges and the magnetic 
susceptibility diverges to infinity and below which 
there is a nonzero spontaneous magnetization. In 
addition, the microscopic correlations between spins 
can also be exactly computed. These exact calcula- 
tions are the basis of the modern theory of second- 
order phase transitions used to analyze real ferro- 
magnets and real fluids near their critical points in 
both two and three dimensions. The model may also 
be interpreted as a lattice gauge theory. 


Solvability 


The solvability of the Ising model at $H=0$ was 
discovered by Onsager in 1944 in one of the most 
profound and inventive papers ever written in 
mathematical physics. Onsager discovered that the 
model possesses an infinite-dimensional symmetry, 
which allowed him to exactly compute the free 
energy per site. This symmetry is generated by the 
relations 

$$\begin{aligned}
[A_l, A_m] &= 4G_{l-m}\\
[G_l, A_m] &= 2A_{m+l} - 2A_{m-l}\\
[G_l, G_m] &= 0
\end{aligned} \qquad [2]$$

This algebra of Onsager is a subalgebra of what is 
now called the loop algebra of the Lie algebra $sl_2$ 
and it is the first infinite-dimensional algebra to be 
used in physics. 

In the 60 years since Onsager first computed the 
free energy, several other methods of exact solution 
have been found. In 1949, Kaufman reduced the 
computation of the free energy to a problem of free 
fermions. A closely related combinatorial method 
was invented by Kac and Ward, Hurst and Green, 
and by Kasteleyn. Baxter (1982) has computed the 
free energy by means of star-triangle equations and 
functional equations in his book. 

The fermionic and the combinatorial methods are 
powerful enough to compute the correlation func- 
tions but are not generalizable to other models. The 
functional equation methods of Baxter generalize to 
many other important models but they do not give 
correlation functions. There are still aspects of 
Onsager’s method that remain unexplored. 

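The free energy per site in the thermodynamic 
limit is defined as 

$$F = -k_BT \lim_{N\to\infty} N^{-1}\ln Z_I(H) \qquad [3]$$

where $N$ is the total number of sites of the lattice 
and the partition function $Z_I(H)$ is defined as 

$$Z_I(H) = \sum_{\text{all }\sigma=\pm1} e^{-E(H)/k_BT} \qquad [4]$$

with the sum being over all values $\sigma_{j,k}=\pm1$ and $k_B$ 
is Boltzmann’s constant. The result of Onsager is 
that, at $H=0$, 

$$F = -k_BT\left\{\ln 2 + \frac{1}{8\pi^2}\int_0^{2\pi}d\theta_1\int_0^{2\pi}d\theta_2\,\ln\left[\cosh\frac{2E_h}{k_BT}\cosh\frac{2E_v}{k_BT} - \sinh\frac{2E_h}{k_BT}\cos\theta_1 - \sinh\frac{2E_v}{k_BT}\cos\theta_2\right]\right\} \qquad [5]$$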
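The double integral in [5] is easy to evaluate numerically, which gives a quick way to test one's reading of the formula. The following is only a sketch under stated assumptions: it takes [5] in the form $-F/k_BT = \ln 2 + (8\pi^2)^{-1}\int\!\!\int \ln[\cdots]$, uses a plain midpoint rule, and the function name `onsager_minus_beta_f`, the reduced couplings `K_h = E_h/k_BT`, `K_v = E_v/k_BT` and the grid size `n` are illustrative names that do not appear in the original text.

```python
import math


def onsager_minus_beta_f(K_h, K_v, n=400):
    """Midpoint-rule estimate of -F/(k_B T) from the double integral [5].

    K_h = E_h/(k_B T) and K_v = E_v/(k_B T) are reduced couplings
    (hypothetical argument names); n is the number of grid points per angle.
    """
    cosh_product = math.cosh(2 * K_h) * math.cosh(2 * K_v)
    sinh_h = math.sinh(2 * K_h)
    sinh_v = math.sinh(2 * K_v)
    step = 2 * math.pi / n
    # Midpoints avoid theta = 0, where the argument of the log vanishes at T_c.
    cosines = [math.cos((i + 0.5) * step) for i in range(n)]
    total = 0.0
    for c1 in cosines:
        for c2 in cosines:
            total += math.log(cosh_product - sinh_h * c1 - sinh_v * c2)
    integral = total * step * step
    return math.log(2) + integral / (8 * math.pi ** 2)
```

A convenient sanity check is the decoupled limit: with `K_v = 0` the lattice falls apart into independent chains, and the result should reduce to the one-dimensional value $\ln(2\cosh K_h)$. Near the critical point the integrand develops a logarithmic singularity, so a simple rule like this one converges slowly there.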


This free energy has a singularity at a temperature 
$T_c$ defined from 

$$\sinh(2E_v/k_BT_c)\,\sinh(2E_h/k_BT_c) = 1 \qquad [6]$$
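Condition [6] has no closed-form solution for unequal couplings, but it is simple to solve numerically. The sketch below, which assumes ferromagnetic couplings $E_h, E_v > 0$, solves $\sinh(2E_v/k_BT_c)\sinh(2E_h/k_BT_c)=1$ by bisection in the inverse temperature. The function name `critical_temperature` and the default `k_B = 1` are illustrative choices, not taken from the original text.

```python
import math


def critical_temperature(E_h, E_v, k_B=1.0):
    """Solve sinh(2 E_v / k_B T) * sinh(2 E_h / k_B T) = 1 for T by bisection.

    Assumes both couplings are positive; works in beta = 1/(k_B T),
    where the left-hand side increases monotonically from zero.
    """
    if E_h <= 0 or E_v <= 0:
        raise ValueError("both couplings must be positive")

    def g(beta):
        return math.sinh(2 * beta * E_v) * math.sinh(2 * beta * E_h) - 1.0

    lo, hi = 0.0, 1.0 / max(E_h, E_v)
    while g(hi) < 0:          # expand the bracket until the root is enclosed
        hi *= 2
    for _ in range(200):      # plain bisection; 200 halvings exhaust double precision
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 1.0 / (k_B * 0.5 * (lo + hi))
```

In the isotropic case $E_h = E_v = E$ the condition reduces to $\sinh(2E/k_BT_c) = 1$, so the routine should return $k_BT_c/E = 2/\ln(1+\sqrt{2}) \approx 2.269$.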


and near $T_c$ the specific heat diverges as 

$$c \sim -\frac{2}{\pi k_BT_c^2}\left(E_h^2\sinh^2\frac{2E_v}{k_BT_c} + 2E_vE_h + E_v^2\sinh^2\frac{2E_h}{k_BT_c}\right)\ln|1-T/T_c| \qquad [7]$$

The next property to be computed was the 
spontaneous magnetization, which is usually 
defined as 

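$$M_- = \lim_{H\to0^+} M(H) \qquad [8]$$

However, because the solution is only available at 
$H=0$, this definition cannot be used and instead 
$M_-$ is computed from an alternative definition in 
terms of the spin-spin correlation function 

$$\langle\sigma_{0,0}\sigma_{M,N}\rangle = \frac{1}{Z_I(0)}\sum_{\text{all }\sigma=\pm1}\sigma_{0,0}\sigma_{M,N}\,e^{-E(0)/k_BT} \qquad [9]$$

as 

$$M_-^2 = \lim_{M^2+N^2\to\infty}\langle\sigma_{0,0}\sigma_{M,N}\rangle \qquad [10]$$

The result for $M_-$, first announced by Onsager in 
1949, is 

$$M_- = \begin{cases}(1-k^2)^{1/8} & \text{for } T<T_c\\ 0 & \text{for } T>T_c\end{cases} \qquad [11]$$

where 

$$k = \left(\sinh\frac{2E_v}{k_BT}\,\sinh\frac{2E_h}{k_BT}\right)^{-1} \qquad [12]$$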
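The magnetization formula is simple enough to evaluate directly. The sketch below assumes the modulus is $k = (\sinh(2E_v/k_BT)\sinh(2E_h/k_BT))^{-1}$, so that $k<1$ below $T_c$ and $k>1$ above it; the function names `modulus_k` and `spontaneous_magnetization` and the default `k_B = 1` are illustrative and not part of the original text.

```python
import math


def modulus_k(E_h, E_v, T, k_B=1.0):
    """Modulus k = 1 / (sinh(2 E_v / k_B T) * sinh(2 E_h / k_B T))."""
    return 1.0 / (math.sinh(2 * E_v / (k_B * T)) * math.sinh(2 * E_h / (k_B * T)))


def spontaneous_magnetization(E_h, E_v, T, k_B=1.0):
    """M_- = (1 - k^2)^(1/8) for k < 1 (T < T_c), and 0 otherwise."""
    k = modulus_k(E_h, E_v, T, k_B)
    if k >= 1.0:
        return 0.0
    return (1.0 - k * k) ** 0.125
```

Because of the exponent 1/8, the magnetization stays close to 1 over most of the low-temperature phase and drops steeply only very near $T_c$.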


A key point in the computation of the magnetization 
[11] from [9] is that the spin-spin correlation 
function can be written as a determinant. In fact, 
there are many such different, but equal, determinantal 
representations and the size of the smallest 
one in general is $2(|M|+|N|)$. The simplest case is 
the diagonal correlation 

$$\langle\sigma_{0,0}\sigma_{N,N}\rangle = \begin{vmatrix}
a_0 & a_{-1} & a_{-2} & \cdots & a_{1-N}\\
a_1 & a_0 & a_{-1} & \cdots & a_{2-N}\\
a_2 & a_1 & a_0 & \cdots & a_{3-N}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
a_{N-1} & a_{N-2} & a_{N-3} & \cdots & a_0
\end{vmatrix} \qquad [13]$$

where 

$$a_n = \frac{1}{2\pi}\int_0^{2\pi}d\theta\,e^{-in\theta}\left[\frac{1-ke^{i\theta}}{1-ke^{-i\theta}}\right]^{1/2} \qquad [14]$$

Determinants of the form [13], where the elements 
on each diagonal are equal, are called Toeplitz. 
The study of the spin-spin correlations of the 
Ising model provides a microscopic picture of the 
behavior of the ferromagnet near the phase transition 
temperature $T_c$, and an entire branch of mathematics 
has developed from the study of the behavior of 
Toeplitz determinants when the size is large. The first 
such mathematical advance was the discovery by 
Szegő of a general formula for the limit as $N\to\infty$, 
from which the magnetization [11] is computed. 

The simplest result for the approach to this $N\to\infty$ 
limit is the behavior of the diagonal correlation at 
$T=T_c$ ($k=1$), where [13] exactly reduces to 

$$\langle\sigma_{0,0}\sigma_{N,N}\rangle = \left(\frac{2}{\pi}\right)^N\prod_{l=1}^{N-1}\left(1-\frac{1}{4l^2}\right)^{l-N} \qquad [15]$$

which behaves as $N\to\infty$ as 

$$\langle\sigma_{0,0}\sigma_{N,N}\rangle \sim AN^{-1/4} \qquad [16]$$

where $A \approx 0.6450\cdots$ is a transcendental constant. 
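The reduction of the Toeplitz determinant to a product at the critical point can be checked numerically for small $N$. The sketch below rests on an assumption that should be read as such: at $k=1$ the Toeplitz elements are taken to be $a_n = 1/(\pi(1/2-n))$, which is what the integral for $a_n$ gives when the square root of $-e^{i\theta}$ is taken on the branch equal to 1 at $\theta=\pi$. The function names and the pure-Python elimination routine are illustrative, not part of the original text.

```python
import math


def critical_toeplitz(N):
    """N x N matrix with entry (i, j) equal to a_{i-j}, using the assumed
    critical-point elements a_n = 1 / (pi * (1/2 - n))."""
    return [[1.0 / (math.pi * (0.5 - (i - j))) for j in range(N)] for i in range(N)]


def determinant(rows):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in rows]
    n = len(m)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for cc in range(c, n):
                m[r][cc] -= factor * m[c][cc]
    return det


def product_formula(N):
    """(2/pi)^N * prod_{l=1}^{N-1} (1 - 1/(4 l^2))^(l - N), summed in logs."""
    log_value = N * math.log(2.0 / math.pi)
    for l in range(1, N):
        log_value += (l - N) * math.log(1.0 - 1.0 / (4.0 * l * l))
    return math.exp(log_value)
```

For example, `determinant(critical_toeplitz(3))` and `product_formula(3)` should agree, both being $2048/(135\pi^3)$, and `product_formula(N) * N**0.25` approaches the constant $A$ as `N` grows.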
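Further results for large $N$ and $T$ fixed are for 
$T<T_c$ ($k<1$), 

$$\langle\sigma_{0,0}\sigma_{N,N}\rangle \sim M_-^2\left\{1 + \frac{k^{2(N+1)}}{2\pi N^2(1-k^2)^2} + \cdots\right\} \qquad [17]$$

and for $T>T_c$ ($k>1$), 

$$\langle\sigma_{0,0}\sigma_{N,N}\rangle \sim \frac{k^{-N}}{(\pi N)^{1/2}(1-k^{-2})^{1/4}} + \cdots \qquad [18]$$

By comparing [16] with [17] and [18], we see that 
at $T=T_c$ the correlations decay algebraically but for 
$T\neq T_c$ the decay is exponential. It is useful to write 
the exponential in [17] for $T<T_c$ as 

$$k^N = e^{-N/\xi} \quad\text{with}\quad \xi^{-1} = -\ln k \qquad [19]$$

and in [18] for $T>T_c$ as 

$$k^{-N} = e^{-N/\xi} \quad\text{with}\quad \xi^{-1} = \ln k \qquad [20]$$

The quantity $\xi$ is called the correlation length and as 
$T\to T_c$ the correlation length diverges as 

$$\xi \sim |1-k|^{-1} = \text{const}\cdot|T-T_c|^{-1} \qquad [21]$$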


A more profound property of the correlations is 
that they satisfy differential and difference equations. 
It was found by Jimbo and Miwa (1980) that 
the diagonal correlation function satisfies the 
nonlinear differential equation related to the sixth 
Painlevé function 

$$\left(t(t-1)\frac{d^2\sigma_N}{dt^2}\right)^2 = N^2\left((t-1)\frac{d\sigma_N}{dt}-\sigma_N\right)^2 - 4\frac{d\sigma_N}{dt}\left((t-1)\frac{d\sigma_N}{dt}-\sigma_N-\frac14\right)\left(t\frac{d\sigma_N}{dt}-\sigma_N\right) \qquad [22]$$

where for $T<T_c$ we set $t=k^2$ and 

$$\sigma_N(t) = t(t-1)\frac{d}{dt}\ln\langle\sigma_{0,0}\sigma_{N,N}\rangle - \frac{t}{4} \qquad [23]$$

and for $T>T_c$ we set $t=k^{-2}$ and 

$$\sigma_N(t) = t(t-1)\frac{d}{dt}\ln\langle\sigma_{0,0}\sigma_{N,N}\rangle - \frac14 \qquad [24]$$

Furthermore, it was found by McCoy et al. (1981) 
that for a given temperature the general two-spin 
correlation function and all multipoint correlations 
satisfy quadratic nonlinear partial difference equations 
in the locations of the spins. 


Scaling Theory 


It is evident that the results [17] and [18] do not 
reduce to [16] when $k\to1$. Therefore, in order to 
uniformly characterize the behavior of the correlation 
function in the critical region near $T_c$, it is 
necessary to introduce what is called the scaling 
function. This uniform expansion is obtained by 
introducing a scaled length defined as 

$$r = N/\xi \qquad [25]$$

and considering the joint (scaling) limit where 

$$N\to\infty \text{ and } T\to T_c \text{ with } r \text{ fixed} \qquad [26]$$

We define the scaled correlation function as 

$$G_\pm(r) = \lim_{\text{scaling}} M_\pm^{-2}\langle\sigma_{0,0}\sigma_{N,N}\rangle \qquad [27]$$

where the subscript $\pm$ means that the limit is taken 
from $T>T_c$ or $T<T_c$, respectively, $M_-$ is the 
spontaneous magnetization [11] and 

$$M_+ = (k^2-1)^{1/8} \qquad [28]$$


This concept of the scaling limit and scaling 
function is very general and can be defined for any 
system with a critical point that has an order 
parameter like $M_-$ that vanishes at $T_c$ and a 
correlation length that diverges at $T_c$. However, 
the Ising model has the further remarkable property 
discovered by Wu et al. (1976) that the scaled 
correlation function may be explicitly expressed in 
terms of a function which satisfies an ordinary 
nonlinear differential equation. Specifically, 

$$G_\pm(r) = \frac12\left[1\mp\eta(r/2)\right]\eta(r/2)^{-1/2}\exp\int_{r/2}^{\infty}d\theta\,\frac{\theta}{4}\,\eta^{-2}\left[(1-\eta^2)^2-(\eta')^2\right] \qquad [29]$$

where the function $\eta(\theta)$ satisfies the Painlevé III 
equation 

$$\eta'' = \frac{1}{\eta}(\eta')^2 - \frac{1}{\theta}\eta' + \eta^3 - \eta^{-1} \qquad [30]$$

with the boundary condition that 

$$\eta(\theta) \sim 1 - 2\lambda K_0(2\theta) \quad\text{as } \theta\to\infty \qquad [31]$$

where $K_0(r)$ is the modified Bessel function of the 
third kind and 

$$\lambda = \frac{1}{\pi} \qquad [32]$$

The leading behavior of $G_\pm(r)$ for $r\to\infty$ is 

$$G_+(r) \sim \lambda K_0(r) \qquad [33]$$

$$G_-(r) \sim 1 + \lambda^2\left\{r^2\left[K_1^2(r)-K_0^2(r)\right] - rK_0(r)K_1(r) + \frac12K_0^2(r)\right\} \qquad [34]$$

where $K_n(z)$ is the modified Bessel function of the 
third kind. When $\lambda$ is given by [32] these $r\to\infty$ 
limits of $G_\pm(r)$ agree with the behavior of 
$\langle\sigma_{0,0}\sigma_{N,N}\rangle$ for $N\gg1$ and $|T-T_c|$ small with 
$N|T-T_c|\gg1$ which is obtained from [18] and 
[17]. The behavior of $G_\pm(r)$ for $r\to0$ with the value 
of $\lambda$ given by [32] is 

$$G_\pm(r) = \text{const}\cdot r^{-1/4} \qquad [35]$$

where the constant agrees with that computed from 
the result [16] for $\langle\sigma_{0,0}\sigma_{N,N}\rangle$ at $T=T_c$, for $N\gg1$. 
For other values of the boundary condition constant $\lambda$, 
the scaling function $G_\pm(r)$ diverges with a power 
which differs from 1/4. The computation of the 
constant in [35] requires the evaluation of a nontrivial 
integral involving the Painlevé III function. 

The agreement of the limits $r\to\infty$ and $r\to0$ of 
the function $G_\pm(r)$ with the lattice results near $T_c$ 
means that this scaling function uniformly interpolates 
between $T\neq T_c$ and $T=T_c$ and that the 
lattice size (defined here as unity) and the 
self-generated correlation length $\xi$ are the only two 
length scales in the theory. This feature that the 
system generates only one new length scale near $T_c$ 
is referred to as one length scale scaling. 


Susceptibility 
The final quantity of macroscopic thermodynamic 
interest is the magnetic susceptibility 

$$\chi(T) = \left.\frac{\partial M(H)}{\partial H}\right|_{H=0} \qquad [36]$$

which is expressed in terms of the spin-spin 
correlation function as 

$$\chi(T) = \frac{1}{k_BT}\sum_{M,N}\left\{\langle\sigma_{0,0}\sigma_{M,N}\rangle - M_-^2\right\} \qquad [37]$$

The susceptibility may be studied by using the 
determinantal expression for the correlation function. 
The simplest result is obtained (for the 
isotropic case, $E_v=E_h$) by using the scaling form 
[27] to find for $T\sim T_c$ that 

$$k_BT\chi_\pm(T) \sim M_\pm^2\,\xi^2\,2\pi\int_0^\infty dr\,r\{G_\pm(r)-\Theta_\pm\} \qquad [38]$$

where $\Theta_+=0$ and $\Theta_-=1$, and thus $\chi_\pm(T)$ diverges 
at $T\to T_c$ as 

$$k_BT\chi_\pm(T) \sim C_\pm|1-T_c/T|^{-7/4} \qquad [39]$$

where $C_\pm$ are transcendental constants given as 
integrals over the scaling function $G_\pm(r)$, which 
were first evaluated by Barouch et al. in 1973 as 

$$C_- = 0.0255369719\ldots, \qquad C_+ = 0.9625817322\ldots \qquad [40]$$


Critical-Exponent Phenomenology 

Critical-Exponent Phenomenology 


From the behavior for the Ising model of the 
specific heat, magnetization, susceptibility, 
correlation length, and the correlation at $T_c$ given 
above we abstract for general systems the 
phenomenological critical-exponent parametrization 
for $T\to T_c\pm$ of 

$$c \sim A_c^\pm|T-T_c|^{-\alpha_\pm} \qquad [41]$$
$$M \sim A_M|T_c-T|^{\beta} \qquad [42]$$
$$\chi \sim A_\chi^\pm|T-T_c|^{-\gamma_\pm} \qquad [43]$$
$$\xi \sim A_\xi^\pm|T-T_c|^{-\nu_\pm} \qquad [44]$$

and at $T=T_c$ for $R\to\infty$ 

$$\langle\sigma_0\sigma_R\rangle \sim A_\sigma/R^{d-2+\eta} \quad\text{where } d \text{ is the dimension} \qquad [45]$$

The exponents $\alpha_\pm, \gamma_\pm, \nu_\pm$ above and below $T_c$ are usually 
found to be equal, and the exponent $\eta$ is usually called the 
anomalous dimension. If it is assumed that the scaling 
function [27] exists and that one length scale scaling holds 
then the exponents are related by what are called scaling 
laws, such as 

$$2\beta = \nu_-(d-2+\eta) \qquad [46]$$
$$\gamma_\pm = \nu_\pm(2-\eta) \qquad [47]$$
$$d\nu_- = 2-\alpha_- \qquad [48]$$
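The scaling laws can be checked against the exponents read off from the exact two-dimensional results quoted above: $\alpha=0$ (logarithmic specific heat), $\beta=1/8$, $\gamma=7/4$, $\nu=1$, $\eta=1/4$ and $d=2$. The sketch below does this with exact rational arithmetic. It assumes that the second law is the standard Fisher relation $\gamma=\nu(2-\eta)$, and the names `ISING_2D` and `scaling_law_residuals` are illustrative only.

```python
from fractions import Fraction

# Exponents of the two-dimensional Ising model, as read off from the exact results.
ISING_2D = {
    "alpha": Fraction(0),     # logarithmic divergence of the specific heat
    "beta": Fraction(1, 8),   # spontaneous magnetization
    "gamma": Fraction(7, 4),  # susceptibility
    "nu": Fraction(1),        # correlation length
    "eta": Fraction(1, 4),    # anomalous dimension
    "d": Fraction(2),         # lattice dimension
}


def scaling_law_residuals(e):
    """Return left-hand side minus right-hand side for each scaling law."""
    return {
        "2*beta = nu*(d - 2 + eta)": 2 * e["beta"] - e["nu"] * (e["d"] - 2 + e["eta"]),
        "gamma = nu*(2 - eta)": e["gamma"] - e["nu"] * (2 - e["eta"]),
        "d*nu = 2 - alpha": e["d"] * e["nu"] - (2 - e["alpha"]),
    }
```

All three residuals vanish for these exponents, which is the sense in which the exactly solved model supports the phenomenology.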


Thus, from the properties of the Ising model near 
$T_c$, we have obtained a phenomenology for use on 
all systems near the critical point. 

Fuchsian Equations and Natural 
Boundaries for Susceptibility 


This critical phenomenology, however, has not 
taken into account the fact that the susceptibility 
is a much more complicated function than either 
the spontaneous magnetization [11] or the free 
energy [5], which have only isolated singularities at 
$k^2=1$, and that there is more structure to the 
susceptibility than the singularity of [39]. 

For arbitrary $T$, the susceptibility was shown by 
Wu, McCoy, Tracy, and Barouch to be expressible 
in the form 

$$\chi(T) = M_\pm^2\sum_j\chi^{(j)}(T) \qquad [49]$$

where in the sum $j$ is odd (even) for $T$ above (below) 
$T_c$. The quantities $\chi^{(j)}(T)$ are explicitly given as 
$j$-fold integrals of algebraic functions and thus will 
satisfy linear differential equations with polynomial 
coefficients. Such functions can have only isolated 
singularities. The function $\chi^{(1)}(T)$ is elementary and 
has a double pole at $T_c$ and $\chi^{(2)}(T)$ is given in terms 
of complete elliptic integrals. Quite recently, 
remarkable Fuchsian linear differential equations 
for $\chi^{(3)}(T)$ and $\chi^{(4)}(T)$ of seventh and tenth orders, 
respectively, have been obtained by Zenine, Boukraa, 
Hassani, and Maillard for the isotropic lattice. 

Furthermore, it was shown by Orrick et al. (2001) 
that $\chi^{(j)}(T)$ has singularities in the complex $T$ plane at 

$$\cosh(2E_v/k_BT)\cosh(2E_h/k_BT) - \sinh(2E_v/k_BT)\cos(2\pi m/j) - \sinh(2E_h/k_BT)\cos(2\pi m'/j) = 0 \qquad [50]$$

with $m,m'=1,2,\ldots,j$. The form of the singularity 
in $\chi^{(j)}(T)$ for $T>T_c$ is as 

$$\epsilon^{(j^2-3)/2}\ln\epsilon \qquad [51]$$

and, for $T<T_c$, it is as 

$$\epsilon^{(j^2-3)/2} \qquad [52]$$

where $\epsilon$ measures the deviation from the singular 
point [50]. These singularities become dense as 
$j\to\infty$ and, therefore, the singularity at $T=T_c$ is 
not isolated and instead the critical point is 
embedded in a natural boundary. Such a function 
cannot satisfy a linear differential equation of finite 
order with polynomial coefficients. 

The existence of the natural boundary in the 
susceptibility is a new phenomenon which is not 
seen in either the free energy or magnetization and 
leads to the speculation that in the presence of a 
magnetic field the one length scale scaling property 
of the model at H =0 may fail. If this proves to be 


correct, there will be physical effects which are not 
incorporated in the phenomenological scaling theory 
of critical phenomena. 


Impure Ising Models 


The Ising model may also be studied when the 
interaction energies at sites j,k are not chosen to be 
independent of position but are allowed to vary 
from site to site. When these interactions are chosen 
randomly out of some probability distribution, this 
is a model of a ferromagnet with frozen (quenched) 
impurities. All real systems will be impure to some 
extent, so the study of such dirty systems is of great 
practical importance. 

The special case where the interactions are transla- 
tionally invariant in the horizontal direction but are 
allowed to vary in a layered fashion from row to row 
was introduced by McCoy and Wu in 1968 and 
found to be dramatically different from the pure Ising 
model described above. In particular, what is a 
critical temperature $T_c$ in the pure case is now spread 
out into a region bounded by the temperatures at which 
the pure model would be critical if all the bonds took on 
the minimum or maximum value allowed by the 
probability distribution. In this new region, the 
correlations (in the direction of translational 
invariance) are found to decay as a power law which 
depends on the temperature; the specific heat is never 
infinite but the susceptibility is infinite in an entire 
temperature region that includes the temperature at 
which the spontaneous magnetization first appears as 
T is lowered. The existence of this new region for 
Ising models with a general randomness in two and 
three dimensions has been demonstrated by Griffiths. 
More recently, this effect has been reinterpreted in 
terms of impurities in quantum spin chains. 


Quantum Field Theory 


The Ising model of [1] may be reinterpreted as a 
two-dimensional lattice gauge theory of the gauge field 

$$\begin{aligned}
s_{j+1/2,k} &= \pm1 \quad\text{on the vertical link between } (j,k) \text{ and } (j+1,k)\\
s_{j,k+1/2} &= \pm1 \quad\text{on the horizontal link between } (j,k) \text{ and } (j,k+1)
\end{aligned} \qquad [53]$$

and a “Higgs” field 

$$\phi_{j,k} = \pm1 \quad\text{on the site } (j,k) \qquad [54]$$

with the action 

$$S_g = -E_g\sum_{j,k}s_{j+1/2,k}\,s_{j+1,k+1/2}\,s_{j+1/2,k+1}\,s_{j,k+1/2} - E_h\sum_{j,k}\left(\phi_{j,k}\,s_{j+1/2,k}\,\phi_{j+1,k} + \phi_{j,k}\,s_{j,k+1/2}\,\phi_{j,k+1}\right) \qquad [55]$$

If we define 

$$z_{g,h} = \tanh(E_{g,h}/k_BT) \qquad [56]$$

the partition function of the gauge theory is expressed 
in terms of the Ising model partition function as 

$$Z_g = \left[8\cosh(E_g/k_BT)\cosh^2(E_h/k_BT)\,z_g^{1/2}z_h\right]^N Z_I(H) \qquad [57]$$

where we make the identification 

$$H/k_BT = -\tfrac12\ln z_g \quad\text{and}\quad E/k_BT = -\tfrac12\ln z_h \qquad [58]$$

This identification may be extended to correlation 
functions. Of particular interest for the gauge theory is 
the plaquette-plaquette correlation $\langle P_{0,0}P_{j,k}\rangle$, where 

$$P_{j,k} = s_{j+1/2,k}\,s_{j+1,k+1/2}\,s_{j+1/2,k+1}\,s_{j,k+1/2} \qquad [59]$$

which is expressed in terms of the Ising correlations 
at $H\neq0$ as 

$$\langle P_{0,0}P_{j,k}\rangle - \langle P_{0,0}\rangle^2 = \sinh^2(2H/k_BT)\left(\langle\sigma_{0,0}\sigma_{j,k}\rangle - \langle\sigma_{0,0}\rangle^2\right) \qquad [60]$$


To study this correlation further, we need to study 
the correlations of the Ising model in nonzero 
magnetic field. This has been done by McCoy and 
Wu in the scaling limit $H\to0$, $T\to T_c$ with 

$$h = H/|T-T_c|^{15/8} \text{ fixed} \qquad [61]$$

for $T<T_c$, where it is found that the scaling 
function $G(r,h)$ for small $h$ and large $r$ is 


G(r,h) ~ X` abKo (2 p pP Nr 
' 


_ = _»f2/3 
ao yr iie 2r V5 hae rh?/3, 
; 


where $\lambda_l$ are the solutions of 

$$J_{1/3}\left(\tfrac13\lambda^{3/2}\right) + J_{-1/3}\left(\tfrac13\lambda^{3/2}\right) = 0 \qquad [63]$$

with $J_n(z)$ the Bessel function of order $n$ and $K_0(z)$ 
the modified Bessel function of the third kind. 

A field theory is said to possess a particle spectrum 
if the Fourier transform of the two-point function 

$$\tilde G(k,h) = \int d^2r\,e^{i\mathbf{k}\cdot\mathbf{r}}\,G(r,h) \qquad [64]$$

has poles of the form $A_l/(k^2+m_l^2)$, where $m_l$ is the 
mass of the $l$th particle. If we note that the Fourier 
transform of $K_0(r)$ is 

$$\int d^2r\,e^{i\mathbf{k}\cdot\mathbf{r}}\,K_0(r) = \frac{2\pi}{k^2+1} \qquad [65]$$

we see that the Fourier transform of [62] is the sum of 
an infinite number of poles. This is to be compared 
with the Fourier transform of the scaled correlation 
function $G_-(r)$ at $H=0$ and $T<T_c$ [34], which does 
not contain any poles at all and may instead be 
interpreted as having a two-particle cut. This 
phenomenon of a cut at $h=0$ breaking up into an infinite 
number of poles for $h>0$ is a signal that at $h=0$ the 
theory has free unconfined two-particle states which 
become weakly confined by a linear confining 
potential for $h>0$. This confinement is thought to 
be a characteristic of most gauge theories. 


See also: Eight Vertex and Hard Hexagon Models; 
Holonomic Quantum Fields; Painlevé Equations; 
Percolation Theory; Phase Transitions in Continuous 
Systems; Statistical Mechanics and Combinatorial 
Problems; Toeplitz Determinants and Statistical 
Mechanics; Yang–Baxter Equations. 

History and Motivation 


Local quantum physics of systems with infinitely many 
interacting degrees of freedom leads to situations 
whose understanding often requires new physical 
intuition and mathematical concepts beyond that 
acquired in quantum mechanics and perturbative 
constructions in quantum field theory. In this situa- 
tion, two-dimensional soluble models turned out to 
play an important role. On the one hand, they 
illustrate new concepts and sometimes remove mis- 
conceptions in an area where new physical intuition is 
still in the process of being formed. On the other hand, 
rigorously soluble models confirm that the underlying 
physical postulates are mathematically consistent, a 
task which for interacting systems with infinite degrees 
of freedom is mostly beyond the capability of 
pedestrian methods or brute force application of hard 
analysis on models whose natural invariances have 
been mutilated by a cutoff. 

In order to underline these points and motivate 
the interest in two-dimensional QFT, let us briefly 
look at the history, in particular at the physical 
significance of the three oldest two-dimensional 
models of relevance for statistical mechanics and 
relativistic particle physics, in chronological order: 
the Lenz-Ising (L-I) model, Jordan’s model of 
bosonization/fermionization, and the Schwinger 
model (QED$_2$). (A more detailed account of the 
changeful history concerning their correct physical 
interpretation and generalizations to higher dimen- 
sions of these models and the increasing conceptual 
role of low-dimensional models in QFT can be 
found in Schroer (2005).) 

The L-I model was proposed in 1920 by Wilhelm 
Lenz (see Lenz (1925)) as the simplest discrete 
statistical mechanics model with a chance to go 
beyond the P Weiss phenomenological ansatz 
involving long-range forces and instead explain 
ferromagnetism in terms of nonmagnetic short-range 
interactions. Its one-dimensional version was solved 
four years later by his student Ernst Ising. Its changeful 
history reached a temporary conceptual climax when 
Onsager succeeded in rigorously establishing a 
second-order phase transition in two dimensions. 

Zenine N, Boukraa S, Hassani S, and Maillard JM (2004) The 
Fuchsian differential equation of the square lattice Ising model 
$\chi^{(3)}$ susceptibility. Journal of Physics A 37: 9651-9668. 

Zenine N, Boukraa S, Hassani S, and Maillard JM (2005) Ising 
model susceptibility: Fuchsian differential equation for $\chi^{(4)}$ 
and its factorization properties. Journal of Physics A 38: 
4149-4173. 

Another conceptually rich model which lay 
dormant for almost two decades as a result of a 
misleading speculative higher-dimensional general- 
ization by its protagonist is the bosonization/ 
fermionization model first proposed by Jordan 
(1937). This model establishes a certain equivalence 
between massless two-dimensional fermions and 
bosons and is related to Thirring’s massless 
4-fermion coupling model and also to Luttinger’s 
one-dimensional model of an electron gas (Schroer). 
One reason why even nowadays hardly anybody 
knows Jordan’s contribution is certainly the ambi- 
tious but unfortunate title “the neutrino theory of 
light” under which he published a series of papers. 

Both discoveries demonstrate the usefulness of 
having controllable low-dimensional models; at the 
same time, their complicated history also illustrates 
the danger of rushing to premature “intuitive” 
conclusions about extensions to higher dimensions. 

A review of the early historical benchmarks of 
conceptual progress through the study of solvable 
two-dimensional models would be incomplete 
without mentioning Schwinger’s (1962) proposed 
solution of two-dimensional quantum electrody- 
namics, afterwards referred to as the Schwinger 
model. He used this model in order to argue that 
gauge theories are not necessarily tied to zero-mass 
vector particles. Some work was necessary 
(Schroer) to unravel its physical content with the 
result that the would-be charge of that QED$_2$ 
model was “screened” and its apparent chiral 
symmetry broken; in other words, the model exists 
only in the so-called Schwinger-Higgs phase with 
massive free scalar particles accounting for its 
physical content. Another closely related aspect of 
this model which also arose in the Lagrangian 
setting of four-dimensional gauge theories was that 
of the $\theta$-angle parametrizing an ambiguity in the 
quantization. 

A coherent and systematic attempt at a mathema- 
tical control of two-dimensional models came in the 
wake of Wightman’s first rigorous programmatic 
formulation of QFT (Schroer 2005). This formula- 
tion stayed close to the physical ideas underlying the 
impressive success of renormalized QED perturba- 
tion theory, although it avoided the direct use of 
Lagrangian quantization. The early attempts 
towards a “constructive QFT” found their successful 
realization in two-dimensional QFT (the $P(\varphi)_2$ models 
(Glimm and Jaffe 1987)); the restriction to low 
dimensions is related to the mild short-distance 
singularity behavior (super-renormalizability) which 
these methods require. We will focus our main 
attention on alternative constructive methods which, 
even though not suffering from such short-distance 
restrictions, also suffer from a lack of mathematical 
control in higher spacetime dimensions; the illustra- 
tion of the constructive power of these new methods 
comes presently from massless d=1+ 1 conformal 
and chiral QFT as well as from massive factorizing 
models. 

There are several books and review articles 
(Furlan et al. 1989, Ginsparg 1990, Di Francesco 
et al. 1996) on d=1 + 1 conformal as well as on 
massive factorizing models (Abdalla et al. 1991). To 
the extent that concepts and mathematical structures 
are used which permit no extension to higher 
dimensions (Kac—Moody algebras, loop groups, 
integrability, presence of an infinite number of 
conservation laws), this line of approach will not 
be followed in this article since our primary interest 
will be the use of two-dimensional models of QFT 
as “theoretical laboratories” of general QFT. Our 
aim is twofold; on the one hand, we intend to 
illustrate known principles of general QFT in a 
mathematically controllable context and on the 
other hand, we want to identify new concepts 
whose adaptation to QFT in d=1+1 leads to their 
solvability (Schroer). 


General Concepts and Their 
Two-Dimensional Manifestation 


The general framework of QFT, to which the rich 
world of controllable two-dimensional models con- 
tributes as an important testing ground, exists in 
two quite different but nevertheless closely related 
formulations: the 1956 approach in terms of point- 
like covariant fields due to Wightman (see Streater 
and Wightman (1964)) (see Axiomatic Quantum 
Field Theory), and the more algebraic setting which 
can be traced back to ideas which Haag (1992) 
developed shortly after and which are based on 
spacetime-indexed operator algebras and related 
concepts which developed over a long period of 
time, with contributions of many other authors to 
what is now referred to as algebraic QFT (AQFT) or 
simply local quantum physics (LQP). Whereas the 
Wightman approach aims directly at the (not 
necessarily observable) quantum fields, the opera- 
tor-algebraic setting (see Algebraic Approach to 
Quantum Field Theory) is more ambitious. It starts 
from physically well-motivated assumptions about 
the algebraic structure of local observables and aims 
at the reconstruction of the full field theory 
(including the operators carrying the superselected 
charges) in the spirit of a local representation theory 
of (the assumed structure of the) local observables. 
This has the advantage that the somewhat myster- 
ious concept of an inner symmetry (as opposed to 
outer (spacetime) symmetry) can be traced back to 
its physical roots which is the representation- 
theoretical structure of the local observable algebra 
(see Symmetries in Quantum Field Theory of Lower 
Spacetime Dimensions). In the standard Lagrangian 
quantization approach, the inner symmetry is part of 
the input (multiplicity indices of field components 
on which subgroups of U(n) or O(n) act linearly) 
and hence it is not possible to problematize this 
fundamental question. When in low-dimensional 
spacetime dimensions the sharp separation (the 
Coleman—Mandula theorems) of inner versus outer 
symmetry becomes blurred as a result of the 
appearance of braid group statistics, the standard 
Lagrangian quantization setting of most of the 
textbooks is inappropriate and even the Wightman 
framework has to be extended. In that case, the 
algebraic approach is the most appropriate. 

The important physical principles which are shared 
between the Wightman approach (see Streater and 
Wightman (1964)) and the operator algebra (AQFT) 
setting (Haag 1992) are the spacelike locality or 
Einstein causality (in terms of pointlike fields or 
algebras localized in causally disjoint regions) and 
the existence of positive-energy representations of 
the Poincaré group implementing covariance and the 
stability of matter. In the algebraic approach, the 
observable content of the theory is encoded into a 
family of (weakly closed) operator algebras 
$\{\mathcal{A}(\mathcal{O})\}_{\mathcal{O}\in\mathcal{K}}$ indexed by a family of convex causally 
closed spacetime regions $\mathcal{O}$ (with $\mathcal{O}'$ denoting the 
spacelike complement and $\mathcal{A}'$ the von Neumann 
commutant) which act in one common Hilbert space. 
Covariant local fields lose their distinguished role 
which they have in the classical setting and which 
(via Lagrangian quantization) was at least partially 
inherited by the Wightman approach and, apart from 
their role as local generators of symmetries 
(conserved currents), become mere “field 
coordinatizations” of local algebras. (There is a denumerable set 
of such pointlike field generators which form a local 
equivalence (Borchers) class of fields and in the 
absence of interactions permits a neat description in 
terms of Wick-ordered free-field polynomials (Haag 
1992).) Certain properties cannot be naturally 
formulated in the pointlike field setting (e.g., Haag 
duality for convex regions $\mathcal{A}(\mathcal{O}') = \mathcal{A}(\mathcal{O})'$), but apart 
from those properties the two formulations are quite 
close; in particular for two-dimensional theories there 
are convincing arguments that one can pass between 
the two without imposing additional technical 
requirements. (Haag duality holds for observable 
algebras in the vacuum sector in the sense that any 
violation can be explained in terms of a sponta- 
neously broken symmetry; in local theories, it can 
always be enforced by dualization and the resulting 
Haag dual algebra has a charge superselection 
structure associated with the unbroken subgroup.) 
Haag duality is the statement that the commutant of 
observables not only contains the algebra of the 
causal complement, that is, $\mathcal{A}(\mathcal{O}') \subset \mathcal{A}(\mathcal{O})'$ (Einstein 
causality), but is even exhausted by it; it is deeply 
connected to the measurement process and its 
violation in the vacuum sector for convex causally 
complete regions signals spontaneous symmetry 
breaking in the associated charge-carrying field 
algebra (Haag 1992). It can always be enforced 
(assuming that the wedge-localized algebras fulfill 
[1] below) by symmetry-reducing extension called 
Haag dualization. Its violation for multilocal regions 
reveals the charge content of the model via charge- 
anticharge splitting in the neutral observable algebra 
(Schroer). 

Another physically important property which has 
a natural algebraic formulation is the split property: 
for regions $\mathcal{O}_i$ separated by a finite spacelike 
distance, one finds $\mathcal{A}(\mathcal{O}_1\cup\mathcal{O}_2) \simeq \mathcal{A}(\mathcal{O}_1)\otimes\mathcal{A}(\mathcal{O}_2)$, 
which can be derived from the Buchholz-Wichmann 
“nuclearity property” (Haag 1992) (an appropriate 
adaptation of the “finiteness of phase-space cell” 
property of QM to QFT). Related to the Haag 
duality is the local version of the “time slice 
property” (the QFT counterpart of the classical 
causal dependency property) sometimes referred to 
as “strong Einstein causality” $\mathcal{A}(\mathcal{O}'') = \mathcal{A}(\mathcal{O})''$. 

One of the most astonishing achievements of the 
algebraic approach (which justifies its emphasis on 
properties of “local observables”) is the DHR theory 
of superselection sectors (Doplicher et al. 1971), 


that is, the realization that the structure of charged 
(nonvacuum) representations (with the superposi- 
tion principle being valid only within one represen- 
tation) and the spacetime properties of the 
generating fields which are the carriers of these 
generalized charges (including their spacelike com- 
mutation relations which lead to the particle 
statistics and also to their internal symmetry proper- 
ties) are already encoded in the structure of the 
Einstein causal observable algebra (see Symmetries in 
Quantum Field Theory: Algebraic Aspects). The 
intuitive basis of this remarkable result (whose 
prerequisite is locality) is that one can generate 
charged sectors by spatially separating charges in the 
vacuum (neutral) sector and disposing of the 
unwanted charges at spatial infinity (Haag 1992). 

An important concept which, especially in d = 1 + 1,
has considerable constructive clout is “modular
localization.” It is a consequence of the above
algebraic setting if either the net of algebras has
pointlike field generators, or if the one-particle
masses are separated by spectral gaps so that the
formalism of time-dependent scattering can be
applied (Schroer 2005); in conformal theories, this
property holds automatically in all spacetime
dimensions. It rests on the basic observation
(Tomita-Takesaki Modular Theory) that a standard
pair $(\mathcal{A}, \Omega)$ of a von Neumann operator algebra and
a standard vector (standardness means that the
vector $\Omega$ is cyclic and separating for the operator
algebra $\mathcal{A}$) gives rise to a Tomita
operator $S$ through its star-operation, whose polar
decomposition yields two modular objects: a
one-parametric subgroup $\Delta^{it}$ of the unitary group of
operators in Hilbert space, whose Ad-action defines
the modular automorphism of $(\mathcal{A}, \Omega)$, and the
angular part $J$, the modular conjugation, which
maps $\mathcal{A}$ into its commutant $\mathcal{A}'$:


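$$
\begin{aligned}
S A\Omega &= A^{*}\Omega, \qquad S = J\Delta^{1/2} \\
J_W &= U(j_W) = S_{\mathrm{scat}} J_0, \qquad \Delta_W^{it} = U(\Lambda_W(2\pi t)) \qquad [1] \\
\sigma_W(t) &:= \mathrm{Ad}\,\Delta_W^{it}
\end{aligned}
$$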


The standardness assumption is always satisfied for
any field-theoretic pair $(\mathcal{A}(\mathcal{O}), \Omega)$ of an $\mathcal{O}$-localized
algebra and the vacuum state (as long as $\mathcal{O}$ has a
nontrivial causal disjoint $\mathcal{O}'$), but it is only for the
wedge region $W$ that the modular objects have a
physical interpretation in terms of the global
symmetry group of the vacuum as specified in the
second line of [1]; the modular unitary $\Delta_W^{it}$
represents the W-associated boost $\Lambda_W(\chi)$ and the
modular conjugation $J_W$ implements the TCP-like
reflection along the edge of the wedge (Bisognano
and Wichmann 1975). The third line is the definition
of the modular group. The importance of this
theory for local quantum physics results from the
fact that it leads to the concept of modular
localization, an intrinsic new scenario for
field-theoretic constructions which is different from the
Lagrangian quantization schemes (Schroer 2005).

A special feature of d = 1 + 1 Minkowski spacetime
is the disconnectedness of the right/left spacelike region,
leading to a right-left ordering structure. So in addition
to the Lorentz-invariant timelike ordering $x \prec y$ (x
earlier than y, which is independent of spacetime
dimensions), there is an invariant spacelike ordering
$x < y$ (x to the left of y) in d = 1 + 1. This opens the
possibility of more general Lorentz-invariant spacelike
commutation relations than those implemented by
Bose/Fermi fields, namely fields with a spacelike braid
group commutation structure (Rehren and Schroer 1987).
The appearance of such exotic statistics fields is not
compatible with their Fourier transforms being
creation/annihilation operators for Wigner particles;
rather, the state vectors which they generate from the
vacuum contain, in addition to the one-particle
contribution, a vacuum polarization cloud (Schroer
2005). This close connection between new kinematic
possibilities and interactions is one of the reasons why
low-dimensional QFT (unlike higher dimensions, where
interactions are prescribed by the recipe of local
couplings of free fields) offers a more intrinsic
access to the central issue of interactions.


Boson/Fermion Equivalence and 
Superselection Theory in a Special Model 


The simplest and oldest but conceptually still rich 
model is obtained, as first proposed by Jordan 
(1937), by using a two-dimensional massless Dirac 
current and showing that it may be expressed in 
terms of scalar canonical Bose creation/annihilation 
operators 


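$$
j_\mu = {:}\bar\psi\gamma_\mu\psi{:} = \partial_\mu\phi, \qquad
\phi(x) := \int_{-\infty}^{+\infty} \left\{ e^{ipx} a^{*}(p) + \mathrm{h.c.} \right\} \frac{dp}{2|p|} \qquad [2]
$$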
Although the potential $\phi(x)$ of the current, as a result
of its infrared divergence, is not a field in the
standard sense of an operator-valued distribution
in the Fock space of the $a^{*}(p)$ (it becomes an
operator only after smearing with test functions whose
Fourier transform vanishes at p = 0), the formal
exponential defined as the zero-mass limit of a
well-defined exponential free massive field
$$
{:}e^{i\alpha\phi(x)}{:} = \lim_{m \to 0} m^{\alpha^{2}/2}\, {:}e^{i\alpha\phi_m(x)}{:} \qquad [3]
$$
turns out to be a bona fide quantum field in a larger
Hilbert space (which extends the Fock space 
generated from applying currents to the vacuum). 
The power in front is determined by the requirement 
that all Wightman functions (computed with the 
help of free-field Wick combinatorics) stay finite in 
this massless limit; the necessary and sufficient 
condition for this is the charge conservation rule 


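$$
\left\langle \prod_i {:}e^{i\alpha_i\phi(x_i)}{:} \right\rangle =
\begin{cases}
\prod_{i<j} \left( \dfrac{-1}{(\xi_{+ij})_\varepsilon\,(\xi_{-ij})_\varepsilon} \right)^{\frac{1}{2}\alpha_i\alpha_j}, & \sum_i \alpha_i = 0 \\
0, & \text{otherwise}
\end{cases}
\qquad [4]
$$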


where the resulting correlation function has been
factored in terms of light-ray coordinates
$\xi_{\pm ij} = x_{\pm i} - x_{\pm j}$, $x_\pm = t \pm x$, and the $\varepsilon$-prescription
stands for taking the standard Wightman boundary
value $t \to t + i\varepsilon$, $\lim_{\varepsilon \to 0}$, which ensures the
positive-energy condition. The finiteness of the
limit ensures that the resulting zero-mass limiting
theory is a bona fide quantum field theory, that is,
its system of Wightman functions permits the
construction of an operator theory in a Hilbert
space with a distinguished vacuum vector.

The factorization into light-ray components [4]
shows that the exponential charge-carrying operators
inherit this factorization into two independent
chiral components,
${:}\exp i\alpha\phi(x){:} = {:}\exp i\alpha\phi_+(x_+){:}\,{:}\exp i\alpha\phi_-(x_-){:}$,
each one being covariant under
scaling $\xi \to \lambda\xi$ if one assigns the scaling dimension
$d = \alpha^2/2$ to the chiral exponential field and $d = 1$ to
the current. Like any Wightman field, this is a singular
object which only after smearing with Schwartz test
functions yields an (unbounded) operator. But the
above form of the correlation function belongs to a
class of distributions which admits a much larger
test-function space consisting of smooth functions
which, instead of decreasing rapidly, only need to be
bounded so that they stay finite on the compactified
light-ray line $\dot{\mathbb{R}} = S^1$. To make this visible, one uses
the Cayley transform (now x denotes either $x_+$
or $x_-$)


$$
z = \frac{1 + ix}{1 - ix} \in S^{1} \qquad [5]
$$


This transforms the Schwartz test functions into a space
of test functions on $S^1$ which have an infinite-order
zero at $z = -1$ (corresponding to $x = \pm\infty$), but the
rotationally transformed fields $j(z)$, ${:}\exp i\alpha\phi(z){:}$ permit
smearing with all smooth functions on $S^1$, a
characteristic feature of all conformally invariant
theories, as the present one turns out to be. There is an
additional advantage in the use of this compactification.
Fourier transforming the circular current actually
allows for a quantum-mechanical zero mode whose
possible nonzero eigenvalues indicate the presence of
additional charge sectors beyond the charge-zero
vacuum sector. For the exponential field, this leads to
a quantum-mechanical pre-exponential factor which
automatically ensures the charge selection rules, so that
Wick contraction rules unrestricted by charge
conservation can be applied. In this approach, the
original chiral Dirac fermion $\psi(x)$ (from which the
current was formed as the ${:}\bar\psi\psi{:}$ composite)
reappears as a charge-carrying exponential field
for $\alpha = 1$ and thus illustrates the meaning of
bosonization/fermionization. (It is interesting to
note that Jordan’s (1937) original treatment of
fermionization had such a pre-exponential
quantum-mechanical factor.) Naturally, this terminology
has to be taken with a grain of salt in view of
the fact that the bosonic current algebra only
generates a superselected subspace into which the
charge-carrying exponential field does not fit.
Only in the case of massive two-dimensional QFT
can fermions be incorporated into a Fock space of
bosons (see last section). At this point, it should
however be clear to the reader that the physical
content of Jordan’s paper had nothing to do with
its misleading title “neutrino theory of light” but
rather was an early illustration of charge
superselection rules in two-dimensional QFT.
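
The compactification of the light ray by the Cayley transform [5] can be checked numerically. The following is a minimal sketch, assuming the form z = (1 + ix)/(1 - ix) for [5]; the function names `cayley` and `inverse_cayley` are illustrative and not taken from the text. It shows that every real x lands on the unit circle and that x = ±∞ both approach z = -1.

```python
def cayley(x: float) -> complex:
    """Map a light-ray coordinate x in R to z = (1 + ix)/(1 - ix)."""
    return (1 + 1j * x) / (1 - 1j * x)


def inverse_cayley(z: complex) -> float:
    """Recover x from z on the unit circle (undefined at z = -1)."""
    return ((z - 1) / (1j * (z + 1))).real


for x in (0.0, 1.0, -1.0, 1e6, -1e6):
    z = cayley(x)
    # |z| stays equal to 1; large |x| of either sign approaches z = -1
    print(x, z, abs(z))
```

The point z = -1 is the single point added to the line, which is why test functions pulled back from Schwartz space must vanish to infinite order there.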

A systematic and rigorous approach consists in 
solving the problem of positive-energy representa- 
tion theory for the Weyl algebra on the circle (which 
is the rigorous operator-algebraic formulation of the 
abelian current algebra). (The Weyl algebra origi- 
nated in quantum mechanics around 1927; its use in 
QFT only appeared after the cited Jordan paper. By 
representation we mean here a regular representa- 
tion in which the exponentials can be differentiated 
in order to obtain (unbounded) smeared current 
operators.) It is the operator algebra generated by 
the exponential of a smeared chiral current (always 
with real test functions) with the following relation 
between the generators 


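$$
\begin{aligned}
W(f) &= e^{ij(f)}, \qquad j(f) = \oint \frac{dz}{2\pi i}\, j(z) f(z), \qquad [j(z), j(z')] = -\delta'(z - z') && [6a] \\
W(f)W(g) &= e^{-\frac{1}{2}s(f,g)}\, W(f+g), \qquad W^{*}(f) = W(-f) && [6b] \\
\mathcal{A}(S^1) &= \mathrm{alg}\{ W(f),\ f \in C^\infty(S^1) \}, \qquad \mathcal{A}(I) = \mathrm{alg}\{ W(f),\ \mathrm{supp} f \subset I \} && [6c]
\end{aligned}
$$

where

$$
s(f,g) = \oint \frac{dz}{2\pi i}\, f'(z)\, g(z)
$$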


is the symplectic form which characterizes the
Weyl algebra structure and [6c] denotes the
unique $C^*$-algebra generated by the unitary objects
$W(f)$. A particular representation of this algebra is
given by assigning the vacuum state to the
generators, $\langle W(f) \rangle_0 = e^{-\frac{1}{2}\|f\|_0^2}$,
$\|f\|_0^2 = \sum_{n \geq 1} n |f_n|^2$ (with $f_n$ the Fourier coefficients of $f$).
Starting with the vacuum Hilbert space representation
$\mathcal{A}(S^1)_0 = \pi_0(\mathcal{A}(S^1))$, one easily checks that
the formula


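$$
\begin{aligned}
\langle W(f) \rangle_\alpha &= e^{i\alpha f_0}\, \langle W(f) \rangle_0 && [7a] \\
\pi_\alpha(W(f)) &= e^{i\alpha f_0}\, \pi_0(W(f)) && [7b]
\end{aligned}
$$

(with $f_0$ the zero Fourier mode of $f$)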


defines a state with positive energy, that is, one
whose GNS representation for $\alpha \neq 0$ is unitarily
inequivalent to the vacuum representation. Its
incorporation into the vacuum Hilbert space [7b] is
part of the DHR formalism. It is convenient to view
this change as the result of an application of an
automorphism $\gamma_\alpha$ on the $C^*$-Weyl algebra $\mathcal{A}(S^1)$
which is implemented by a unitary charge-generating
operator $\Gamma_\alpha$ in a larger (nonseparable) Hilbert
space which contains all charge sectors $H_\alpha = \Gamma_\alpha H_0$,

$$
\begin{aligned}
H_0 &\equiv H_{\mathrm{vac}} = \overline{\mathcal{A}(S^1)\Omega} \\
\langle W(f) \rangle_\alpha &= \langle \gamma_\alpha(W(f)) \rangle_0 \qquad [8] \\
\gamma_\alpha(W(f)) &= \Gamma_\alpha W(f)\, \Gamma_\alpha^{*}
\end{aligned}
$$

$\Gamma_\alpha\Omega = \Omega_\alpha$ describes a state with a rotationally
homogeneous charge distribution; arbitrary charge distributions
$\rho_\alpha$ of total charge $\alpha$, that is, $\oint (dz/2\pi i)\,\rho_\alpha = \alpha$,
are obtained in the form

$$
\psi^{\zeta}_{\rho_\alpha} = \eta(\rho_\alpha)\, W(\hat\rho^{\zeta}_\alpha)\, \Gamma_\alpha \qquad [9]
$$

where $\eta(\rho_\alpha)$ is a numerical phase factor and the
net effect of the Weyl operator is to change the
rotationally homogeneous charge distribution into
$\rho_\alpha$. The necessary charge-neutral compensating
function $\hat\rho^{\zeta}_\alpha$ in the Weyl cocycle $W(\hat\rho^{\zeta}_\alpha)$ is uniquely
determined in terms of $\rho_\alpha$ up to the choice of one
point $\zeta \in S^1$ (the determining equation involves
the $\ln z$ function, which needs the specification of
a branch cut (Schroer 2005)). From this formula,
one derives the commutation relations
$\psi^{\zeta}_{\rho_\alpha}\psi^{\zeta}_{\rho_\beta} = e^{\pm i\pi\alpha\beta}\,\psi^{\zeta}_{\rho_\beta}\psi^{\zeta}_{\rho_\alpha}$
for spacelike separations of the $\rho$
supports; hence, these fields are relatively local
(bosonic) for $\alpha\beta \in 2\mathbb{Z}$. In particular, if only one
type of charge is present, the generating charge is
$\alpha_{\mathrm{gen}} = \sqrt{2N}$ and the composite charges are multiples,
that is, $\alpha_{\mathrm{gen}}\mathbb{Z}$. This locality condition
providing bosonic commutation relations does
not yet ensure the $\zeta$-independence. Since the
equation which controls the $\zeta$-change turns out
to be


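$$
\psi^{\zeta_1}_{\rho_\alpha} \left( \psi^{\zeta_2}_{\rho_\alpha} \right)^{*} = e^{i\pi\alpha^{2}}\, e^{2\pi i Q\alpha} \qquad [10]
$$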


one achieves $\zeta$-independence by restricting the
Hilbert space charges to be “dual” to that of the
operators, that is,

$$
Q \in \frac{1}{\sqrt{2N}}\,\mathbb{Z}
$$

The localized $\psi^{\zeta}_{\rho_\alpha}$ operators acting on the restricted
separable Hilbert space $H_{\mathrm{res}}$ generate a $\zeta$-independent
extended observable algebra $\mathcal{A}_N(S^1)$ (Schroer),
and it is not difficult to see that its representation in
$H_{\mathrm{res}}$ is reducible and that it decomposes into 2N
charge sectors

$$
\left\{ \frac{n}{\sqrt{2N}} \;\middle|\; n = 0, 1, \ldots, 2N-1 \right\}
$$

Hence, the process of extension has led to a charge
quantization with a finite (“rational”) number of
charges relative to the new observable algebra, which
is neutral in the new charge counting:

$$
\frac{1}{\alpha_{\mathrm{gen}}}\mathbb{Z} \Big/ \alpha_{\mathrm{gen}}\mathbb{Z} = \mathbb{Z}/\alpha_{\mathrm{gen}}^{2}\mathbb{Z} = \mathbb{Z}_{2N}
$$

The charge-carrying fields in the new setting are also
of the above form [9], but now the generating field
carries the charge

$$
\oint \frac{dz}{2\pi i}\, \rho_{\mathrm{gen}} = Q_{\mathrm{gen}}
$$


which is a (1/2N) fraction of the old $\alpha_{\mathrm{gen}}$. Their
commutation relations for disjoint charge supports
are “braidal” (or better “plektonic,” which is more
on par with being bosonic/fermionic). (In the abelian
case like the present, the terminology “anyonic”
enjoys widespread popularity, but in the present
context the “any” does not go well with charge
quantization.) These objects, considered as operators
localized on $S^1$, do depend on the cut $\zeta$, but using an
appropriate finite covering of $S^1$ this dependence is
removed (Schroer 2005). So the field algebra $\mathcal{F}_{\mathbb{Z}_{2N}}$
generated by the charge-carrying fields (as opposed
to the bosonic observable algebra $\mathcal{A}_N$) has its unique
localization structure on a finite covering of $S^1$. An
equivalent description which gets rid of $\zeta$ consists in
dealing with operator-valued sections on $S^1$. The
extension $\mathcal{A} \to \mathcal{A}_N$, which renders the Hilbert space
separable and quantizes the charges, seems to be
characteristic for abelian current algebra; in all other
models which have been constructed up to now the
number of sectors is at least denumerable and in the
more interesting ones even finite (rational models).
An extension is called maximal if there exists no
further extension which maintains the bosonic
commutation relation. For the case at hand, this
would require the presence of another generating
field of the same kind as above, which belongs to an
integer N′ and is relatively local to the first one. This is
only possible if N is divisible by a square.
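
The charge arithmetic of this extension can be made concrete with integers only. The sketch below rests on assumptions that are stated here rather than taken from the text: operator charges are written as a·√(2N) and Hilbert-space charges as q/√(2N) with integers a and q, and a further local extension is modeled as a coarser even lattice √(2N′)Z containing √(2N)Z, which forces N = k²N′ with k > 1. All function names are hypothetical.

```python
from math import isqrt


def sector_label(N: int, q: int) -> int:
    """Sector of the Hilbert-space charge Q = q / sqrt(2N).

    Two charges lie in the same sector when they differ by an operator
    charge a * sqrt(2N), i.e. when q changes by a multiple of 2N.
    """
    return q % (2 * N)


def operator_charge_product(N: int, a: int, b: int) -> int:
    """alpha * beta for alpha = a*sqrt(2N) and beta = b*sqrt(2N)."""
    return 2 * N * a * b


def further_local_extensions(N: int) -> list[int]:
    """Integers N' = N / k^2 with k > 1 (assumed model of a further extension)."""
    return [N // (k * k) for k in range(2, isqrt(N) + 1) if N % (k * k) == 0]


N = 3
print(sorted({sector_label(N, q) for q in range(-20, 20)}))  # 2N distinct labels
print(operator_charge_product(N, 2, -5) % 2)                 # even product: bosonic
print(further_local_extensions(12), further_local_extensions(6))
```

In this model a square-free N admits no further extension, which matches the divisibility statement above; the sketch does not address the exchange phases of the charged fields themselves.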

In passing, it is interesting to mention a somewhat
unexpected relation between the Schwinger model,
whose charges are screened, and the Jordan model.
Since the Lagrangian formulation of the Schwinger
model is a gauge theory, the analog of the
four-dimensional “asymptotic freedom” wisdom would
suggest the possibility of “charge liberation” in the
short-distance limit of this model. This seems to
contradict the statement that the intrinsic content
of the Schwinger model ($\mathrm{QED}_2$ with massless
fermions), after removing a classical degree of freedom,
is the QFT of a free massive Bose field, and such a
simple free field is at first sight not expected to contain
subtle information about asymptotic charge liberation.
(In its original gauge-theoretical form, the Schwinger
model has an infinite vacuum degeneracy. The
removal of this degeneracy (restoration of the cluster
property) with the help of the “θ-angle formalism”
leaves a massive free Bose field (the Schwinger-Higgs
mechanism). As expected in d = 1 + 1, the model only
possesses this phase.) Well, as we have seen above, the
massless limit really does have liberated charges and
the short-distance limit of the massive free field is the
massless model (Schroer).

As a result of the peculiar bosonization/fermioniza- 
tion aspect of the zero-mass limit of the derivative of 
the massive free field, Jordan’s model is also closely 
related to the massless Thirring model (and the related 
Luttinger model for an interacting one-dimensional 
electron gas) whose massive version is in the class of 
factorizing models (see later section). (Another struc- 
tural consequence of this aspect leads to Coleman’s 
theorem (Schroer 2005) which connects the Mermin— 
Wagner no-go theorem for two-dimensional sponta- 
neous continuous symmetry breaking with these 
zero-mass peculiarities.) The Thirring model is a 
special case in a vast class of “generalized” multi- 
coupling multicomponent Thirring models, that is, 
models with 4-fermion interactions. Under this name 
they were studied in the early 1970s (Schroer) with 
the aim to identify massless subtheories for which the 
currents form chiral current algebras. 
The counterpart of the potential of the conserved
Dirac current in the massive Thirring model is the 
sine-Gordon field, that is, a composite field which in 
the attractive regime of the Thirring coupling again 
obeys the so-called sine-Gordon equation of motion. 
Coleman gave a supportive argument (Schroer 2005) 
but some fine points about the range of its validity in 
terms of the coupling strength remained open. (It was 
noticed that the current potential of the free massive 
Dirac Fermion (g=0) does not obey the sine-Gordon 
equation (Schroer 2005).) A rigorous confirmation of 
these facts was recently given in the bootstrap form- 
factor setting (Schroer 2005). Massive models which 
have a continuous or discrete internal symmetry have 
“disorder” fields which implement a “half-space” 
symmetry on the charge-carrying field (acting as the 
identity in the other half-axis) and together with the 
basic pointlike field form composites which have 
exotic commutation relations (see the last section). 


The Conformal Setting, 
Structural Results 


Chiral theories play a special role within the setting 
of conformal quantum fields. General conformal 
theories have observable algebras which live on 
compactified Minkowski space ($S^1$ in the case of
chiral models) and fulfill the Huygens principle,
which in an even number of spacetime dimensions
means that the commutator is only nonvanishing for 
lightlike separation of the fields. The fact that this 
classically expected behavior breaks down for 
nonobservable conformal fields (e.g., the massless 
Thirring field) was noticed at the beginning of the 
1970s and considered paradoxical at that time 
(“reverberation” in the timelike (Huygens) region). 
Its resolution around 1974-75 confirmed that such 
fields are genuine conformal covariant objects but 
that some fine points about their causality needed to 
be addressed. The upshot was the proposal of two 
different but basically equivalent concepts about 
globally causal fields. They are connected by the 
following global decomposition formula: 


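$$
A(x_{\mathrm{cov}}) = \sum_{\alpha,\beta} A_{\alpha,\beta}(x), \qquad A_{\alpha,\beta}(x) \equiv P_\alpha A(x_{\mathrm{cov}}) P_\beta \qquad [11]
$$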


On the left-hand side, the spacetime point of the
field is a point on the universal covering of the
conformally compactified Minkowski space. These are
fields (Lüscher and Mack 1975; Schroer 2005)
which “live” in the sense of quantum (modular)
localization on the universal covering spacetime (or
on a finite covering, depending on the “rationality”
of the model) and fulfill the global causality
condition previously discovered by I. Segal (Schroer
2005). They are generally highly reducible with
respect to the center of the covering group. The
family of fields on the right-hand side, on the other
hand, are fields which were introduced (Schroer and
Swieca 1974; Schroer et al. 1975) with the aim to
have objects which live on the projection $x(x_{\mathrm{cov}})$,
that is, on the spacetime of the physics laboratory
instead of the “hells and heavens” of the covering
(Schroer 2005). They are operator-distribution-valued
sections in the compactification of ordinary
Minkowski spacetime. The connection is given by
the above decomposition formula into irreducible
conformal blocks with respect to the center $\mathcal{Z}$ of the
noncompact covering group $\widetilde{SO(2,n)}$, where $\alpha, \beta$ are
labels for the eigenspaces of the generating unitary $Z$
of the abelian center $\mathcal{Z}$. The decomposition [11] is
minimal in the sense that in general there
will be a refinement due to the presence of
additional charge superselection rules (and internal
group symmetries). The component fields are not
Wightman fields since they annihilate the vacuum if
the right-hand projection differs from $P_0 = P_{\mathrm{vac}}$.

Note that the Huygens (timelike) region in Minkowski
spacetime has a timelike ordering structure
$x \prec y$ or $x \succ y$ (earlier or later). In d = 1 + 1, the
topology allows in addition a spacelike left-right
ordering $x \lessgtr y$. In fact, it is precisely the presence of
these two orderings in conjunction with the factorization
of the vacuum symmetry group
$SO(2,2) \simeq PSL(2,\mathbb{R})_l \otimes PSL(2,\mathbb{R})_r$,
in particular $Z = Z_l \otimes Z_r$,
which is at the root of a significant simplification.
This situation suggested a tensor factorization into
chiral components and led to an extremely rich and
successful construction program of two-dimensional
conformal QFT as a two-step process: the classification
of chiral observable algebras on the light ray and
the amalgamation of left-right chiral theories to
two-dimensional local conformal QFT. The action on the
circular coordinates z is through fractional SU(1, 1)
transformations


$$
g(z) = \frac{az + b}{\bar{b}z + \bar{a}}
$$

whereas the covering group acts on the
Mack-Lüscher covering coordinates.
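
That fractional SU(1, 1) transformations map the circle to itself, and that they compose like 2×2 matrices, can be verified directly. The sketch below assumes the form g(z) = (az + b)/(b̄z + ā) with |a|² - |b|² = 1; the parametrization by (eta, theta, phi) and the names `su11`, `act`, and `compose` are illustrative choices, not notation from the text.

```python
import cmath
import math


def su11(eta: float, theta: float, phi: float) -> tuple[complex, complex]:
    """Return (a, b) with |a|^2 - |b|^2 = cosh^2(eta) - sinh^2(eta) = 1."""
    a = math.cosh(eta) * cmath.exp(1j * theta)
    b = math.sinh(eta) * cmath.exp(1j * phi)
    return a, b


def act(g: tuple[complex, complex], z: complex) -> complex:
    """Fractional action g(z) = (a z + b) / (conj(b) z + conj(a))."""
    a, b = g
    return (a * z + b) / (b.conjugate() * z + a.conjugate())


def compose(g1, g2):
    """(a, b) of the matrix product, so that act(compose(g1, g2), z) == act(g1, act(g2, z))."""
    a1, b1 = g1
    a2, b2 = g2
    return (a1 * a2 + b1 * b2.conjugate(), a1 * b2 + b1 * a2.conjugate())


g1, g2 = su11(0.7, 0.3, 1.1), su11(0.2, -1.0, 0.4)
z = cmath.exp(0.9j)                      # a point on the unit circle
print(abs(act(g1, z)))                   # remains on the unit circle
print(abs(act(g1, act(g2, z)) - act(compose(g1, g2), z)))  # composition defect
```

The pairs (a, b) and (-a, -b) give the same map, which is the reason the group acting effectively on the circle is the quotient PSL(2, R) rather than SU(1, 1) itself.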

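The presence of an ordering structure permits the
appearance of more general commutation relations
for the above $A_{\alpha,\beta}$ component fields, namely

$$
A_{\alpha,\beta}(x)\, B_{\beta,\gamma}(y) = \sum_{\beta'} R^{(\alpha,\gamma)}_{\beta,\beta'}\, B_{\alpha,\beta'}(y)\, A_{\beta',\gamma}(x), \qquad x > y \qquad [12]
$$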


with numerical R-coefficients which, as a result of 
associativity and relative commutativity with respect 
to observable fields, have to obey certain structure 
relations; in this way, Artin braid relations emerge 
as a new manifestation of the Einstein causality 
principle for observables in low-dimensional QFT 
(Rehren and Schroer 1989) (see Schroer 2005). 
Indeed, the DHR method to interpret charged fields 
as charge superselection carriers (tied by local 
representation theory to the bosonic local structure 
of observable algebras) leads precisely to such a 
plektonic statistics structure (Fredenhagen et al. 
1992, Gabbiani and Froehlich 1993) for systems in 
low spacetime dimension (see Symmetries in Quan- 
tum Field Theory of Lower Spacetime Dimensions). 
With an appropriately formulated adjustment to 
observables fulfilling the Huygens commutativity, 
this plektonic structure (but now disconnected from 
particle/field statistics) is also a possible manifesta- 
tion of causality for the higher-dimensional timelike 
structure (Schroer 2005). 

The only examples known up to the appearance 
of the seminal BPZ work (Belavin et al. 1984) were 
the abelian current models of the previous section 
which furnish a rather poor man’s illustration of the 
richness of the decomposition theory. The flood- 
gates of conformal QFT were only opened after the 
BPZ discovery of “minimal models,” which was 
preceded by the observation (Friedan et al. 1984) 
that the algebra of the stress-energy tensor came 
with a new representation structure which was not 
compatible with an underlying internal group 
symmetry (see Symmetries in Quantum Field The- 
ory: Algebraic Aspects). 

An important step in the structural study of chiral 
models was the recognition that the energy-momen- 
tum tensor has the commutation structure of a Lie 
field (Schroer 2005); in the next section, its algebraic 
structure and its representation theory will be 
presented. 


Chiral Fields and Two-Dimensional 
Conformal Models 


Let us start with a family which generalizes the
abelian model of the previous section. Instead of a
one-component abelian current we now take n
independent copies. The resulting multicomponent
Weyl algebra has the previous form except that the
current is n-component and the real function space
underlying the Weyl algebra consists of functions
with values in an n-component real vector space,
$f \in LV$, with the standard Euclidean inner product
denoted by $(\cdot\,,\cdot)$. The local extension now leads to
$(\alpha, \beta) \in 2\mathbb{Z}$, that is, an even-integer lattice $L$ in $V$,
whereas the restricted Hilbert subspace $H_{L^*}$ which
ensures $\zeta$-independence is associated with the dual
lattice $L^*$: $(\lambda_i, \alpha_k) = \delta_{ik}$, which contains $L$. The
resulting superselection structure (i.e., the
Q-spectrum) corresponds to the finite factor group
$L^*/L$. For self-dual lattices $L^* = L$ (which can only
occur if dim V is a multiple of 8), the resulting
observable algebra has only the vacuum sector; the
most famous case is the Leech lattice $\Lambda_{24}$ in
dim V = 24, also called the “moonshine” model.
The observation that the root lattices of the Lie
algebras of types A, D, or E (e.g., su(n) corresponding
to $A_{n-1}$) also appear among the even-integral
lattices suggests that the nonabelian
current algebras associated to those Lie algebras
can also be implemented. This turns out to be
indeed true as far as the level-1 representations are
concerned, which brings us to the second family:
the nonabelian current algebras of level k associated
to those Lie algebras; they are characterized
by the commutation relation


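$$
[J_\alpha(z), J_\beta(z')] = i f_{\alpha\beta}^{\gamma}\, J_\gamma(z)\, \delta(z - z') - \frac{1}{2} k\, g_{\alpha\beta}\, \delta'(z - z') \qquad [13]
$$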


where $f_{\alpha\beta}^{\gamma}$ are the structure constants of the underlying
Lie algebra, $g$ their Cartan-Killing form, and
$k$, the level of the algebra, must be an integer in
order that the current algebra can be globalized to a
loop group algebra. The Fourier decomposition of
the current leads to the so-called affine Lie algebras,
a special family of Kac-Moody algebras. For k = 1,
these currents can be constructed as bilinears in
terms of the multicomponent chiral Dirac field;
there exists also the mentioned possibility to obtain
them by constructing their maximal Cartan currents
within the above abelian setting and representing the
remaining nondiagonal currents as certain charge-carrying
(“vertex” algebra) operators. Level-k algebras
can be constructed from reducing tensor
products of k level-1 currents or directly via the
representation theory of infinite-dimensional affine
Lie algebras. (The global exponentiated algebras
(the analogs to the Weyl algebra) are called loop
group algebras.) Either way one finds that, for
example, the SU(2) current algebra of level k has
(together with the vacuum sector) k + 1 sectors
(inequivalent representations). The different sectors
are already distinguished by the structure of their
ground states of the conformal Hamiltonian $L_0$.
Although the computation of higher point correlation
functions for k > 1 is complicated, there is no problem in
securing the existence of the algebraic nets which
define these chiral models as well as their k + 1
representation sectors, or in identifying their generating
charge-carrying fields (primary fields) including
the R-matrices appearing in their plektonic commutation
relations. It is customary to use the
notation $SU(2)_k$ for the abstract operator algebras
associated with the current generators [13] and we
will denote their k + 1 equivalence classes of
representations by $\mathcal{A}_{SU(2)_k, n}$, $n = 0, \ldots, k$, whereas
representations of current algebras for higher rank
groups require a more complicated labeling (in
terms of Weyl chambers).
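
The sector counting by the factor group L*/L for the lattice models above can be illustrated with a short computation. It uses the standard fact, not stated in the text, that the order of L*/L equals the determinant of the Gram matrix of a basis of L. The sketch assumes the simply-laced root lattices are normalized so that their Gram matrices are the Cartan matrices; the helper names `int_det`, `chain`, and `e8` are hypothetical.

```python
def int_det(matrix):
    """Exact integer determinant via fraction-free (Bareiss) elimination."""
    m = [row[:] for row in matrix]
    n = len(m)
    sign, prev = 1, 1
    for k in range(n - 1):
        if m[k][k] == 0:  # look for a nonzero pivot further down
            swap = next((i for i in range(k + 1, n) if m[i][k] != 0), None)
            if swap is None:
                return 0
            m[k], m[swap] = m[swap], m[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                m[i][j] = (m[i][j] * m[k][k] - m[i][k] * m[k][j]) // prev
        prev = m[k][k]
    return sign * m[n - 1][n - 1]


def chain(n):
    """Cartan (Gram) matrix of the A_n root lattice: 2 on the diagonal, -1 for neighbors."""
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 2
        if i + 1 < n:
            m[i][i + 1] = m[i + 1][i] = -1
    return m


def e8():
    """Cartan matrix of E_8: a chain of 7 nodes plus one node attached to the third."""
    m = [row + [0] for row in chain(7)] + [[0] * 7 + [2]]
    m[2][7] = m[7][2] = -1
    return m


print(int_det([[2 * 3]]))   # one-component lattice sqrt(2N) Z with N = 3
print(int_det(chain(2)))    # A_2, the root lattice of su(3)
print(int_det(e8()))        # E_8
```

A determinant of 1 signals a self-dual lattice with only the vacuum sector, while the one-component lattice √(2N)Z reproduces the 2N sectors of the previous section.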

The third family of models are the so-called 
minimal models which are associated with the 
Lie-field commutation structure of the chiral 
stress—energy tensor which results from the chiral 
decomposition of a conformally covariant two- 
dimensional stress—energy tensor 


$$
[T(z), T(z')] = i\left( T(z) + T(z') \right) \delta'(z - z') + \frac{ic}{24\pi}\, \delta'''(z - z') \qquad [14]
$$
whose Fourier decomposition yields the Witt-Virasoro
algebra, that is, a central extension of
the Lie algebra of $\mathrm{Diff}(S^1)$. (The presence of
the central term in the context of QFT (the analog
of the Schwinger term) was noticed later; however,
the terminology Witt-Virasoro algebra in the
physics literature came to mean the Lie algebra
of diffeomorphisms of the circle including the
central extension.) The first two coefficients are
determined by the physical role of T(z) as the
generating field density for the Lie algebra of the
Poincaré group and by its connection with the generation
of the Moebius transformations, whereas the undetermined
parameter c > 0 (the central extension parameter, positive
by positivity of the two-point function)
is easily identified with the strength of the two-point
function. Although the structure of the
T-correlation functions resembles that of free
fields (in the sense that there is an algebraically
computable unique set of correlation functions
once one has specified the two-point function), the
realization that c is subject to a discrete quantization
if c < 1 came as a surprise. As already
mentioned, the observation that the superselection
sectors (the positive-energy representation structure)
of this algebra did not at all follow the logic
of a representation theory of an inner symmetry
group generated a lot of attention and stimulated a
flurry of publications on symmetry concepts
beyond groups (quantum groups). A concept of
fundamental importance is the DHR theory of
localized endomorphisms of operator algebras and
the concept of operator-algebraic inclusions (in
particular, inclusions with conditional expectations,
the V. Jones inclusions).

The $SU(2)_k$ current coset construction (Goddard
et al. 1985) revealed that the proof of existence and
the actual construction of the minimal models are
related to those of the $SU(2)_k$ current algebras.
Constructing a chiral model does not necessarily
mean the explicit determination of the n-point
Wightman functions of their generating fields
(which for most chiral models remains a prohibitively
complicated task) but rather a proof of their
existence by demonstrating that these models are
obtained from free fields by a series of computationally
complicated but mathematically controlled
operator-algebraic steps such as reduction of tensor
products, formation of orbifolds under group
actions, coset constructions, and a special kind of
extensions. The generating fields of the models are
nontrivial in the sense of not obeying free-field
equations (i.e., not being “on-shell”). The cases
where one can write down explicit n-point functions
of generating fields are very rare; in the case of the
minimal family this is limited to the field theory of
the Ising model (Schroer 2005).

To show the power of inclusion theory for the
determination of the charge content of a theory, let us
look at a simple illustration in the context of the above
multicomponent abelian current algebra. The vacuum
representation of the corresponding Weyl algebra is
generated from smooth V-valued functions on the
circle modulo constant functions (i.e., functions with
vanishing total integral), $f \in LV_0$. These functions,
equipped with the aforementioned complex structure
and scalar product, yield a Hilbert space. The
I-localized subalgebra is generated by the Weyl image
of I-supported functions (class functions whose
representing functions are constant in the complement I′).
The one-interval Haag duality $\mathcal{A}(I)' = \mathcal{A}(I')$ (the
commutant algebra equals the algebra localized in
the complement) is simply a consequence of the fact
that the symplectic complement $K(I)'$ in terms of
Im(f, g) consists of real functions in that space which
are localized in the complement, that is,
$K(I)' = K(I')$. The answer to the same question for
a double interval $I = I_1 \cup I_3$ (think of the first and
third quadrant on the circle) does not lead to duality
but rather to a genuine inclusion


$$
\begin{aligned}
K((I_1 \cup I_3)') &= K(I_2 \cup I_4) \subset K(I_1 \cup I_3)' \\
K(I_1 \cup I_3) &\subset K((I_1 \cup I_3)')' \qquad [16]
\end{aligned}
$$


The meaning of the left-hand side is clear; these
are functions which are constant in $I_2 \cup I_4$ with the
same constant in the two intervals, whereas the
functions on the right-hand side are less restrictive
in that the constants can be different. The
conversion of real subspaces into von Neumann
algebras by the Weyl functor leads to the algebraic
inclusion $\mathcal{A}(I_1 \cup I_3) \subset \mathcal{A}((I_1 \cup I_3)')'$. In physical
terms, the enlargement results from the fact that
within the charge-neutral vacuum algebra a charge
split occurs, with one charge in $I_1$ and the compensating
charge in $I_3$, for all values of the (unquantized)
charge. A more realistic picture is obtained
if one allows a charge split to be subjected to a
charge quantization implemented by a lattice
condition $f(I_2) - f(I_4) \in 2\pi L$ which relates the
two multicomponent constant functions (where
$f(I)$ denotes the constant value f takes in I). As
in the previous one-component case, the choice of
even lattices corresponds to the local (bosonic)
extensions. Although imposing such a lattice
structure destroys the linearity of the K, the
functions still define Weyl operators which generate
operator algebras $\mathcal{A}_L(I_1 \cup I_3)$. (The linearity
structure is recovered on the level of the operator
algebra.) But now the inclusion involves the dual
lattice $L^*$ (which of course contains the original
lattice),


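$$
\begin{aligned}
\mathcal{A}_L(I_1 \cup I_3) &\subset \mathcal{A}_{L^*}(I_1 \cup I_3) \\
\mathrm{ind}\{ \mathcal{A}_L(I_1 \cup I_3) \subset \mathcal{A}_L((I_1 \cup I_3)')' \} &= |G| \\
\mathcal{A}_L(I_1 \cup I_3) &= \mathrm{inv}_G\, \mathcal{A}_{L^*}(I_1 \cup I_3)
\end{aligned}
$$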


This time the possible charge splits correspond to 
the factor group G = L*/L, that is, the number of 
possibilities is |G| which measures the relative size 
of the bigger algebra in terms of the smaller. This is 
a special case of the general concept of the so-called 
Jones index of an inclusion which is a numerical 
measure of its depth. A prerequisite is that the 
inclusion permits a conditional expectation which 
is a generalization of the averaging under the
“gauge group” $G$ on $\mathcal{A}_{L^*}(I_1 \cup I_3)$ in the third
equation above, which identifies the invariant 
smaller algebra with the fix-point algebra (the 
invariant part) under the action of G. In fact, 
using the conceptual framework of Jones, one can 
show that the two-interval inclusion is independent
of the position of the disjoint intervals and is
characterized by the group $G$.
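
For a lattice inclusion of this kind the index reduces to a finite computation. The following is a minimal sketch, assuming the integral lattice $L$ is specified by its Gram matrix; the helper names and the sample lattices are illustrative choices, not taken from the text. It uses the standard fact that the order of $L^*/L$ equals the absolute value of the Gram determinant.

```python
from fractions import Fraction


def gram_determinant(gram):
    """Exact determinant of an integer Gram matrix (fraction-based elimination)."""
    n = len(gram)
    m = [[Fraction(v) for v in row] for row in gram]
    det = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return int(det)


def discriminant_group_order(gram):
    """|L*/L| for an integral lattice L with the given Gram matrix."""
    return abs(gram_determinant(gram))


# Illustrative even lattices (assumed examples):
A2 = [[2, -1], [-1, 2]]                      # root lattice A_2
A3 = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]   # root lattice A_3
RANK_ONE = [[6]]                             # one generator of squared length 6

orders = [discriminant_group_order(g) for g in (A2, A3, RANK_ONE)]
```

In this sketch the three lattices give groups $G = L^*/L$ of order 3, 4, and 6, which would be the value of the index in the corresponding two-interval inclusion.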

There exists another form of this inclusion which
is more suitable for generalizations. One starts from
the charge-quantized extended local algebra
$\mathcal{A}_L^{\mathrm{ext}} \supset \mathcal{A}$ described earlier in terms of
an even integer lattice $L$ (which lives in the separable
Hilbert space $H_{L^*}$) as our observable algebra. Again
Haag duality is violated and converted into an inclusion
$\mathcal{A}_L^{\mathrm{ext}}(I_1 \cup I_3) \subset \mathcal{A}_L^{\mathrm{ext}}((I_1 \cup I_3)')'$
which turns out to have the same
$G = L^*/L$ charge structure (it is in fact isomorphic
to the previous inclusion). In the general setting
(current algebras, minimal model algebras, ...), this
double interval inclusion is particularly interesting if
the associated Jones index is finite. One finds
(Kawahigashi et al. 2001, Schroer 2005):


Theorem 1 A chiral theory with finite Jones index
$\mu = \mathrm{ind}\{\mathcal{A}((I_1 \cup I_3)')' : \mathcal{A}(I_1 \cup I_3)\}$ for the double
interval inclusion (always assuming that $\mathcal{A}(S^1)$ is
strongly additive and split) is a rational theory and
the statistical dimensions $d_\rho$ of its charge sectors are
related to $\mu$ through the formula


$$\mu = \sum_\rho d_\rho^2 \qquad [17]$$
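
As a small numerical illustration of the index formula $\mu = \sum_\rho d_\rho^2$, the sketch below evaluates the sum for two familiar sets of statistical dimensions. The function name and the choice of examples (the three Ising sectors with dimensions $1, \sqrt{2}, 1$, and a model with three sectors of dimension 1) are assumptions made for illustration only.

```python
import math


def mu_index(statistical_dimensions):
    """Sum of squared statistical dimensions over all charge sectors."""
    return sum(d * d for d in statistical_dimensions)


# Ising sectors: vacuum, spin (sigma), energy (epsilon).
ising_dimensions = [1.0, math.sqrt(2.0), 1.0]
mu_ising = mu_index(ising_dimensions)

# A model whose sectors all have dimension 1 (e.g. a charge group of order 3).
mu_abelian = mu_index([1, 1, 1])
```

When all sectors have dimension 1 the sum simply counts sectors, which matches the lattice case where the index equals $|G|$.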


Instead of presenting more constructed chiral
models, it may be more informative to mention
some of the algebraic methods by which they are
constructed and explored. The already mentioned
DHR theory provides the conceptual basis for
converting the notion of positive-energy representation
sectors of the chiral model observable
algebras $\mathcal{A}$ (equivalence classes of unitary
representations) into localized endomorphisms $\rho$ of this
algebra. This is an important step because, contrary
to group representations which have a
natural tensor product composition structure,
representations of operator algebras generally do
not come with a natural composition structure.
The DHR endomorphism theory of $\mathcal{A}$ leads to
fusion laws and an intrinsic notion of generalized
statistics (for chiral theories: plektonic in addition
to bosonic/fermionic). The chiral statistics
parameters are complex numbers (Haag 1992) whose
phase is related to a generalized concept of spin
via a spin-statistics theorem and whose absolute
value (the statistics dimension) generalizes the
notion of multiplicities of fields known from the
description of inner symmetries in higher-dimensional
standard QFTs. The different sectors may
be united into one bigger algebra called the
exchange algebra $\mathcal{F}_{\mathrm{red}}$ in the chiral context (the
“reduced field bundle” of DHR) in which every
sector occurs by definition with multiplicity 1 and
the statistics data are encoded into exchange
(commutation) relations of charge-carrying operators
or generating fields (“exchange algebra
fields”) (Schroer 2005). Even though this algebra
is useful in that all properties concerning fusion
and statistics are nicely encoded, it lacks some
cherished properties of standard field theory:
there is no unique state-field relation, that
is, no Reeh-Schlieder property (a field $A_{\alpha\beta}$ whose
source projection $P_\beta$ does not coalesce with the
vacuum projection annihilates the vacuum); in
operator-algebraic terms, the local algebras are
not factors. This poses the question of how to
manufacture from the set of all sectors natural
(not necessarily local) extensions with these
desired properties. It was found that this problem
can be characterized in operator-algebraic terms
by the existence of the so-called DHR triples
(Schroer). In the case of rational theories, the number
of such extensions is finite, and in the aforementioned
“classical” current algebra and minimal
models they have all been constructed by this
method (thus confirming existing results and
completing the minimal family by adding some missing
models). The same method adapted to the chiral
tensor product structure of d=1+1 conformal
observables classifies and constructs all two-dimensional
local (bosonic/fermionic) conformal QFTs $\mathcal{B}_2$
which can be associated with the observable chiral
input. It turns out that this approach leads to
another of those pivotal numerical matrices which
encode structural properties of QFT: the coupling
matrix $Z$,


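$$\mathcal{A} \otimes \mathcal{A} \subset \mathcal{B}_2$$
$$\bigoplus_{\rho\sigma} Z_{\rho\sigma}\, \rho(\mathcal{A}) \otimes \sigma(\mathcal{A}) \subset \mathcal{A} \otimes \mathcal{A} \qquad [18]$$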


where the second line is an inclusion solely 
expressed in terms of observable algebras from 
which the desired (isomorphic) inclusion in the first 
line follows by a canonical construction, the so- 
called Jones basic construction. The numerical 
matrix Z is an invariant closely related to the so- 
called “statistics character matrix” (Schroer 2005) 
and in case of rational models it is even a modular 
invariant with respect to the modular SL(2, Z) group 
transformations (which are closely related to the 
matrix S in the final section). 


Integrability, the Bootstrap 
Form-Factor Program 


Integrability in QFT and the closely associated
bootstrap form-factor construction of a very rich
class of massive two-dimensional QFTs can be
traced back to two observations made during the
1960s and 1970s. On the one hand, there was
the time-honored idea to bypass the “off-shell”
field-theoretic approach to particle physics in favor of a
pure on-shell S-matrix setting (recommended in
particular for strong interactions) which, as a result of
the elimination of short distances via the mass-shell
restriction, would be free of ultraviolet divergences.
This idea was enriched in the 1960s by the crossing 
property which in turn led to the bootstrap idea, a 
highly nonlinear seemingly self-consistent proposal 
for the determination of the S-matrix. However, the 
protagonists of this S-matrix bootstrap program 
placed themselves into a totally antagonistic fruitless 
position with respect to QFT so that the strong 
return of QFT in the form of gauge theory under- 
mined their credibility. On the other hand, there 
were rather convincing quasiclassical calculations in 
certain two-dimensional massive QFTs as, for 
example, the sine-Gordon model which indicated 
that the obtained quasiclassical mass spectrum is 
exact and hence suggested that the associated 
QFTs are integrable (Dashen et al. 1975) and 
have no real particle creation. These provocative 
observations asked for a structural explanation 
beyond quasiclassical approximations, and it soon 
became clear that the natural setting for obtain- 
ing such mass formulas was that of the “fusion” 
of boundstate poles of unitary crossing-symmetric 
purely elastic S-matrices; first in the special 
context of the sine-Gordon model (Schroer et al. 
1976) and later as a classification program from 
which factorizing S-matrices can be determined 
by solving well-defined equations for the elastic 
two-particle S-matrix (Karowski et al. 1977). 
(It was incorrectly believed that the “nontrivial 
elastic scattering implies particle creation” 
statement of Aks (Aks, 1963) is also valid for 
low-dimensional QFTs.) Some equations in this
bootstrap approach resembled mathematical
structures which appeared in C N Yang’s work
on nonrelativistic $\delta$-function particle interactions
as well as relations for Boltzmann weights in
Baxter’s work on solvable lattice models; hence,
they were referred to as Yang–Baxter relations.
These results suggested that the old bootstrap 
idea, once liberated from its ideological dead 
freight (in particular from the claim that the 
bootstrap leads to a unique “theory of 
everything” (minus gravity)), generates a useful 
setting for the classification and construction 
of factorizing two-dimensional relativistic 
S-matrices. Adapting certain known relations 
between two-particle form factors of field opera- 
tors and the S-matrix to the case at hand 
(Karowski and Weisz 1978), and extending this 
with hindsight to generalized (multiparticle) form 
factors, one arrived at the axiomatized recipes of 
the bootstrap form-factor program of d=1+1 
factorizable models (Smirnov 1992). Although
this approach can be formulated within the
setting of the LSZ scattering formalism, the use of
a certain algebraic structure (Zamolodchikov and
Zamolodchikov 1979) which in the simplest
version reads


$$Z(\theta) Z^*(\theta') = S^{(2)}(\theta - \theta')\, Z^*(\theta') Z(\theta) + \delta(\theta - \theta')$$
$$Z(\theta) Z(\theta') = S^{(2)}(\theta' - \theta)\, Z(\theta') Z(\theta) \qquad [19]$$


(the $\delta$-term is due to Faddeev) brought
significant simplifications. In the general case, the
$Z$'s are vector valued and the $S^{(2)}$-structure function
is matrix valued. (The identification of the Z–F
structure coefficients with the elastic two-particle
S-matrix $S^{(2)}$ (which is preempted by our notation)
can be shown to follow from the physical
interpretation of the Z–F structure in terms of
localization.) In that case the associativity of the Z–F
algebra is equivalent to the Yang–Baxter equations.
Recently, it became clear that this algebraic relation 
has a deep physical interpretation; it is the simplest 
algebraic structure which can be associated with 
generators of nontrivial wedge-localized operator 
algebras (see the next section). 
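
The Yang–Baxter equation that expresses associativity for a matrix-valued structure function can be checked numerically on a small example. The sketch below is an illustration under stated assumptions: it uses Yang's rational solution $R(u) = u\,\mathbf{1} + cP$ on $\mathbb{C}^2 \otimes \mathbb{C}^2$ (with $P$ the permutation operator) as a stand-in for a matrix-valued $S^{(2)}$, and all function names are hypothetical. It verifies $R_{12}(u-v) R_{13}(u) R_{23}(v) = R_{23}(v) R_{13}(u) R_{12}(u-v)$ with plain nested lists.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


IDENTITY_2 = [[1, 0], [0, 1]]
SWAP = [[1, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1]]  # permutation operator P on C^2 (x) C^2


def yang_r_matrix(u, coupling=1j):
    """Yang's rational R-matrix R(u) = u*1 + coupling*P (assumed example)."""
    return [[u * (i == j) + coupling * SWAP[i][j] for j in range(4)]
            for i in range(4)]


def embed(u):
    """Return R_12(u), R_13(u), R_23(u) acting on three tensor factors."""
    swap_23 = kron(IDENTITY_2, SWAP)
    r12 = kron(yang_r_matrix(u), IDENTITY_2)
    r23 = kron(IDENTITY_2, yang_r_matrix(u))
    # Conjugating R_12 by the swap of factors 2 and 3 gives R_13.
    r13 = matmul(swap_23, matmul(r12, swap_23))
    return r12, r13, r23


def yang_baxter_defect(u, v):
    """Largest entry of |lhs - rhs| in the Yang-Baxter equation."""
    r12 = embed(u - v)[0]
    r13 = embed(u)[1]
    r23 = embed(v)[2]
    lhs = matmul(r12, matmul(r13, r23))
    rhs = matmul(r23, matmul(r13, r12))
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(8) for j in range(8))


defect = yang_baxter_defect(0.7, -1.3)
```

For this rational solution the defect vanishes up to floating-point rounding for any pair of spectral parameters; a structure function that violated the equation would give a defect of order one.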

Conceptually as well as computationally it is much
simpler to identify the intrinsic meaning of
integrability in QFT with the factorization of its S-matrix
or a certain property of wedge-localized algebras
(see next section) than to establish integrability (see
Integrability and Quantum Field Theory).

The first step of the bootstrap form-factor
program, namely the classification and construction
of model S-matrices, follows a combination of two
patterns: prescribing particle multiplets transforming
according to group symmetries and/or specifying
structural properties of the particle spectrum. The
simplest illustration for the latter strategy is supplied
by the $Z_N$ model. In terms of particle content, $Z_N$
demands the identification of the bound state of
$N-1$ particles with the antiparticle. The fusion
condition for the bound-state mass,
$m_{12}^2 = (p_1 + p_2)^2 = m_1^2 + m_2^2 + 2 m_1 m_2 \cosh(\theta_1 - \theta_2)$,
can only be satisfied for a purely imaginary
rapidity difference $\theta_{12} = \theta_1 - \theta_2 = i\alpha$ (“binding
angle”). Hence, the binding of two “elementary”
particles of mass $m$ gives


$$m_2 = m\, \frac{\sin \alpha}{\sin(\alpha/2)}$$


and more generally of $k$ particles


$$m_k = m\, \frac{\sin(k\alpha/2)}{\sin(\alpha/2)},$$

so that the antiparticle mass condition $m_{N-1} = \bar{m} = m$
fixes the binding angle to $\alpha = 2\pi/N$. (The quotation
marks are meant to indicate that in contrast to
Schrödinger QM there is “nuclear democracy” on
the level of particles. The inexorable presence of
interaction-caused vacuum polarization limits a
fundamental/fused hierarchy to the fusion of
charges.) The minimal (no additional physical
poles) two-particle S-matrix in terms of which the
n-particle S-matrix factorizes is therefore


$$S^{(2)}_{\min}(\theta) = \frac{\sin \frac{1}{2}(\theta + 2\pi i/N)}{\sin \frac{1}{2}(\theta - 2\pi i/N)}$$


(minimal = without so-called CDD poles). The
SU(N) model as compared with the U(N) model
requires a similar identification of bound states of
$N-1$ particles with an antiparticle. This S-matrix
enters in the equation for the vacuum to
n-particle meromorphic form factor of local operators;
together with the crossing and the so-called
“kinematical pole equation,” one obtains a recursive
infinite system linking a certain residue with a form
factor involving a lower number of particles. The
solutions of this infinite system form a linear space
from which the form factors of specific tensor fields
can be selected by a process which is analogous to, but
more involved than, the specification of a Wick basis
of composite free fields. Although the statistics
property of two-dimensional massive fields is not
intrinsic but a matter of choice, it would be natural
to realize, for example, the $Z_N$ fields as $Z_N$-anyons.
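
The bound-state spectrum and the minimal S-matrix of the $Z_N$ model lend themselves to a quick numerical consistency check. The sketch below assumes the standard form of the spectrum, $m_k = m \sin(k\pi/N)/\sin(\pi/N)$ (that is, binding angle $\alpha = 2\pi/N$), and the minimal S-matrix written with sines as in the text; the function names are hypothetical.

```python
import cmath
import math


def zn_mass(k, n_order, m=1.0):
    """Mass of the bound state of k elementary particles in the Z_N model."""
    alpha = 2.0 * math.pi / n_order  # binding angle
    return m * math.sin(k * alpha / 2.0) / math.sin(alpha / 2.0)


def fused_mass(m1, m2, alpha):
    """Bound-state mass from the fusion condition at rapidity difference i*alpha.

    Uses cosh(i*alpha) = cos(alpha) in m^2 = m1^2 + m2^2 + 2*m1*m2*cosh(theta_12).
    """
    return math.sqrt(m1 * m1 + m2 * m2 + 2.0 * m1 * m2 * math.cos(alpha))


def s_min(theta, n_order):
    """Minimal two-particle S-matrix of the Z_N model (no CDD poles)."""
    shift = 2j * math.pi / n_order
    return cmath.sin(0.5 * (theta + shift)) / cmath.sin(0.5 * (theta - shift))


N = 5
alpha = 2.0 * math.pi / N
antiparticle_mass = zn_mass(N - 1, N)             # should equal the elementary mass
two_particle_mass = fused_mass(1.0, 1.0, alpha)   # should equal zn_mass(2, N)
unitarity_product = s_min(0.3, N) * s_min(-0.3, N)
```

With these assumptions the bound state of $N-1$ particles has the elementary mass, the two-particle fusion condition reproduces $m_2$, and $S^{(2)}_{\min}(\theta) S^{(2)}_{\min}(-\theta) = 1$ holds identically.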

Another rich class of factorizing models are 
the Toda theories of which the sine-Gordon and 
sinh-Gordon are the simplest cases. For their 
descriptions, the quasiclassical use of Lagrangians 
(supported by integrability) turns out to be of some 
help in setting up their more involved bootstrap 
form-factor construction. 

The unexpected appearance of objects with new 
fundamental (solitonic) charges (e.g., the Thirring 
field as the carrier of a solitonic sine-Gordon charge) 
and the unexpected confinement of charges (e.g., the 
CP(1) model as a confined SU(2) model) turn out to 
be opposite sides of the same coin and both cases 
have realizations in the setting of factorizing models 
(Schroer 2005). 


[20]


Recent Developments 


There are two ongoing developments which place
the two-dimensional bootstrap form-factor program
into a more general setting which permits one to
understand its position in the general context of local
quantum physics.

One of these starts from the observation that the
smallest spacetime localization region in which it is
possible to find vacuum-polarization-free generators
(PFG) in the presence of interactions is the wedge
region. If one demands in addition that these
generators (necessarily unbounded operators) have
the standard domain properties of QFT (which
include stability of the domain under translations),
then one finds that this leads precisely to the
two-dimensional Z–F algebraic structure, which in
this way acquires a spacetime interpretation for the
first time. In these investigations (Schroer 2005),
modular localization theory plays a prominent role
and there are strong indications that with these
methods one can show the nontriviality of
intersections of wedge algebras, which is the algebraic
criterion for the existence of a model within local
quantum physics.

There is a second constructive idea based on light- 
front holography which uses the radical reorganiza- 
tion of spacetime properties of the algebraic structure 
while maintaining the physical content including the 
Hilbert space. Since spacetime localization aspects 
(apart from the remark about wedge algebras and 
their PFG generators made before) are traditionally 
related to the concept of fields, holographic methods 
tend to de-emphasize the particle structure in favor of 
“field properties.” Indeed, the transversely extended 
chiral theories which arise as the holographic image 
lead to simplification of many interesting properties 
with very similar aims to the old “light-cone 
quantization” except that light-front holography is 
another way of looking at the original local ambient 
theory without subjecting it to another quantization. 
(The price for this simplification is that as a result of 
the nonuniqueness of the holographic inversion 
certain problems cannot be formulated.) 

Actually, as a result of the absence of a transverse 
direction in the two-dimensional setting, the family 
of factorizing models provides an excellent theore- 
tical laboratory to study their rigorous “chiral 
encoding” which is conceptually very different 
from Zamolodchikov’s perturbative relation (which 
is based on identifying a factorizing model in terms 
of a perturbation on a chiral theory). 

It turns out that the issue of statistics of particles
loses its physical relevance for two-dimensional
massive models since it can be changed without
affecting the physical content. Instead, such notions
as order/disorder fields and solitons take their place
(Schroer 2005).

In accordance with its historical origin, the theory
of two-dimensional factorizing models may also be
viewed as an outgrowth of the quantization of
classical integrable systems (Integrability and
Quantum Field Theory). But in comparison with the
rather involved structure of integrability (verifying
the existence of sufficiently many commuting
conservation laws), the conceptual setting of factorizing
models within the scattering framework
(factorization follows from the existence of wedge-localized
tempered PFGs) is rather simple and intrinsic
(Schroer 2005).

Among the additional ongoing investigations
in which the conceptual relation with higher-dimensional
QFT is achieved via modular localization
theory, we will select three which have caught
our active attention. One is motivated by the recent
discovery of the adaptation of Einstein’s classical
principle of local covariance to QFT in curved
spacetime. The central question raised by this work
(see Algebraic Approach to Quantum Field Theory)
is whether all models of Minkowski spacetime QFT
permit a local covariant extension to curved
spacetime and, if not, which models do. In the realm of
chiral QFT, this would amount to asking whether all
Moebius-invariant models are also Diff($S^1$)-covariant.
It has been known for some time that a QFT
with all its rich physical content can be uniquely
defined in terms of a carefully chosen relative
position of a finite number of copies of one unique
von Neumann operator algebra within one common
Hilbert space. This is a perfect quantum
field-theoretical illustration of Leibniz’s philosophical
proposal that reality results from the relative
position of “monads” (as opposed to the more
common (Newtonian) view that material reality
originates from a material content being placed into
a spacetime vessel), if one takes the step of identifying
the hyperfinite type III$_1$ Murray-von Neumann
factor algebra with an abstract monad from which
the different copies result from different ways of
positioning in a shared Hilbert space (Schroer 2005).
In particular, Moebius-covariant chiral QFTs arise
from two monads with a joint intersection defining
a third monad in such a way that the relative
positions are specified in terms of natural modular
concepts (without reference to geometry). This raises
the question of whether one can extend these
modular-based algebraic ideas to pass from the global
vacuum-preserving Moebius invariance to local
Diff($S^1$) covariance, Moeb $\to$ Diff($S^1$). This would
be precisely the two-dimensional adaptation of the
crucial problem raised by the recent successful
generalization of the local covariance principle
underlying Einstein’s classical theory of gravity to
QFT in curved spacetime: does every Poincaré-covariant
Minkowski spacetime QFT allow a unique
correspondence with one on curved spacetime (having
the same abstract algebraic substrate but with a
totally different spacetime encoding)? In the chiral
context, one is led to the notion of “partially
geometric modular groups” which only act
geometrically if restricted to specific subalgebras (Schroer
2005). It is hard to imagine how one can combine
quantum theory and gravity without first understanding
the still mysterious links between spacetime
geometry, thermal properties, and relative positioning
of monads in a joint Hilbert space.

A second important umbilical cord with
higher-dimensional theories is the issue of
“Euclideanization,” in particular the chiral counterpart of
Osterwalder–Schrader localization and the closely
related Nelson–Symanzik duality. In concrete chiral
models (e.g., the models in the section “Chiral fields
and two-dimensional conformal models”), it has
been noted as a result of explicit calculations that
the analytic continuation in the angular
parametrization for thermal correlation functions leads to
a duality relation in

where the thermal correlation function is defined as

Compared with the thermally extended
Nelson–Symanzik relation for two-dimensional QFT, one
notices that in addition to the expected behavior of
real coordinates becoming imaginary and the
$2\pi$-periodicity changing role with the (suitably
normalized) KMS inverse temperature, there is a
rotation in the space of superselected charges in
terms of a unitary matrix $S$ whose origin lies in the
braid group statistics (the statistics character
matrix). The deeper structural explanation which
shows that this relation is not just a property of
special models, but rather a generic property of
chiral QFT, comes from a very deep angular
Euclideanization which is based on modular theory
(Schroer). Specializing $A$ = identity, one obtains a
relation for the partition function, the famous
Verlinde identity which is part of the transformation
law of the thermal angular correlation functions
under the SL(2, Z) modular group.
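
The role of the unitary matrix $S$ can be made concrete in the simplest rational example. The sketch below is an illustration, not part of the original argument: it takes the well-known $S$ matrix of the Ising model (sector ordering vacuum, energy, spin is a convention chosen here), checks that it is unitary, and recovers the fusion rules from Verlinde's formula $N_{ij}^{k} = \sum_l S_{il} S_{jl} \overline{S_{kl}} / S_{0l}$. Since this $S$ is real and symmetric, no complex conjugation is needed in the code.

```python
import math

R2 = math.sqrt(2.0)

# Ising S matrix; sector order: 0 = vacuum, 1 = energy (epsilon), 2 = spin (sigma).
S = [[0.5, 0.5, R2 / 2.0],
     [0.5, 0.5, -R2 / 2.0],
     [R2 / 2.0, -R2 / 2.0, 0.0]]


def fusion_coefficient(i, j, k):
    """Verlinde formula for the multiplicity of sector k in the fusion of i and j."""
    total = sum(S[i][l] * S[j][l] * S[k][l] / S[0][l] for l in range(3))
    return round(total)


def unitarity_defect():
    """Largest deviation of S*S^T from the identity (S is real here)."""
    return max(abs(sum(S[i][l] * S[j][l] for l in range(3)) - (1.0 if i == j else 0.0))
               for i in range(3) for j in range(3))


# Statistical dimensions d_i = S[i][0] / S[0][0].
dimensions = [S[i][0] / S[0][0] for i in range(3)]

# sigma x sigma = vacuum + epsilon
sigma_sigma = [fusion_coefficient(2, 2, k) for k in range(3)]
```

In this example the spin sector fuses with itself into the vacuum and the energy sector, and the statistical dimensions read off from the first column are $1, 1, \sqrt{2}$.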

There are many additional important observations 
on factorizing models whose relation to the physical 
principles of QFT, unlike the bootstrap form-factor 
program, is not yet settled. The meaning of the 
c-parameter outside the chiral setting and ideas on 
its renormalization group flow as well as the various 
formulations of the thermodynamic Bethe ansatz 
belong to a series of interesting observations whose
final relation to the principles of QFT still needs 
clarification. 


See also: Algebraic Approach to Quantum Field Theory; 
Axiomatic Quantum Field Theory; Bosons and Fermions 
in External Fields; Euclidean Field Theory; Integrability
and Quantum Field Theory; Operator Product Expansion
in Quantum Field Theory; Sine-Gordon Equation; 
Symmetries in Quantum Field Theory: Algebraic Aspects; 
Symmetries in Quantum Field Theory of Lower 
Spacetime Dimensions; Tomita—Takesaki Modular 
Theory. 


Further Reading 


Abdalla E, Abdalla MCB, and Rothe K (1991) Non-Perturbative 
Methods in 2-Dimensional Quantum Field Theory. Singapore: 
World Scientific. 

Belavin AA, Polyakov AM, and Zamolodchikov AB (1984) 
Infinite conformal symmetry in two-dimensional quantum 
field theory. Nuclear Physics B 241: 333. 

Bisognano JJ and Wichmann EH (1975) Journal of Mathematical 
Physics 16: 985. 

Dashen R, Hasslacher B, and Neveu A (1975) Physical Review D
11: 3424.

Di Francesco P, Mathieu P, and Sénéchal D (1996) Conformal 
Field Theory. Berlin: Springer. 

Doplicher S, Haag R, and Roberts JE (1971/1974) Communica- 
tions in Mathematical Physics 23: 199; 35: 49. 

Fredenhagen K, Rehren KH, and Schroer B (1992) Superselection
sectors with braid group statistics and exchange algebras II:
Geometric aspects and conformal invariance. Reviews of
Mathematical Physics 1 (special issue): 113.

Furlan P, Sotkov G, and Todorov I (1989) Two-dimensional 
conformal quantum field theory. Rivista del Nuovo Cimento 
12: 1-203. 

Gabbiani F and Froehlich J (1993) Operator algebras and 
conformal field theory. Communications in Mathematical 
Physics 155: 569. 

Glimm J and Jaffe A (1987) Quantum Physics. A Functional 
Integral Point of View. Berlin: Springer. 

Ginsparg P (1990) Applied conformal field theory. In: Brezin E 
and Zinn-Justin J (eds.) Fields, Strings and Critical Phenom- 
ena, Les Houches 1988. Amsterdam: North-Holland. 

Goddard P, Kent A, and Olive D (1985) Virasoro algebras and 
coset space models. Physics Letters B 152: 88. 

Haag R (1992) Local Quantum Physics. Berlin: Springer. 

Ising E (1925) Zeitschrift für Physik 31: 253.

Jordan P (1937) Beiträge zur Neutrinotheorie des Lichts. Zeitschrift
für Physik 114: 229 and earlier papers quoted therein.

Karowski M, Thun H-J, Truong TT, and Weisz P (1977) Physics
Letters B 67: 321.

Karowski M and Weisz P (1978) Nuclear Physics B 139: 455.

Kawahigashi Y, Longo R, and Mueger M (2001) Multi-interval 
subfactors and modularity in representations of conformal field 
theory. Communications in Mathematical Physics 219: 631. 

Lenz W (1920) Physikalische Zeitschrift 21: 613. 

Lüscher M and Mack G (1975) Global conformal invariance in
quantum field theory. Communications in Mathematical
Physics 41: 203.

Rehren K-H and Schroer B (1987) Exchange algebra and Ising 
n-point functions. Physics Letters B 198: 84. 




Rehren KH and Schroer B (1989) Einstein causality and Artin 
braids. Nuclear Physics B 312: 715. 

Schroer B (2005) Two-dimensional models, a testing ground for 
principles and concepts of QFT, Annals of Physics (in print) 
(hep-th/0504206). 

Schroer B and Swieca JA (1974) Conformal transformations
for quantized fields. Physical Review D 10: 480.

Schroer B, Swieca JA, and Voelkel AH (1975) Global operator
expansions in conformally invariant relativistic quantum field
theory. Physical Review D 11: 11.


Schroer B, Truong TT, and Weisz P (1976) Towards an explicit 
construction of the Sine-Gordon field theory. Annals of 
Physics (New York) 102: 156. 

Schwinger J (1962) Physical Review 128: 2425. 

Schwinger J (1963) Gauge theory of vector particles. In: 
Theoretical Physics Trieste Lectures 1962. Wien: IAEA. 

Smirnov FA (1992) Advanced Series in Mathematical Physics 14. 
Singapore: World Scientific. 

Streater RF and Wightman AS (1964) PCT, Spin and Statistics 
and All That. New York: Benjamin. 

Zamolodchikov AB and Zamolodchikov AB (1979) Annals of 
Physics (New York) 120: 253. 





Universality and Renormalization


M Lyubich, University of Toronto, Toronto, ON,
Canada and Stony Brook University, Stony Brook,
NY, USA


© 2006 Elsevier Ltd. All rights reserved.


Introduction


Discovery of the universality phenomenon and the
underlying renormalization mechanism by Feigenbaum
and independently by Coullet and Tresser in
the late 1970s was one of the most influential events
in dynamical systems theory in the last quarter
of the twentieth century. It was numerically
observed that the cascades of doubling bifurcations
leading to chaotic regimes in one-parameter
families of interval maps, as well as the dynamical
attractors that appear in the limits, exhibit
universal small-scale geometry. To explain this
surprising observation, a “Renormalization Conjecture”
was formulated which asserted that a
natural renormalization operator acting in the
space of dynamical systems has a unique hyperbolic
fixed point.

It took about two decades to prove this conjecture
rigorously (and without the help of computers). The
proof revealed rich mathematical structures behind
the universality phenomenon that linked it tightly to
holomorphic dynamics and conformal and hyperbolic
geometry.

Besides the universality per se, the renormalization
theory led to many other important results.
These include the proof of the regular or stochastic
dichotomy that gives us a complete understanding
of the real quadratic family (and more
general families of one-dimensional maps) from the
measure-theoretic point of view, as well as deep
advances in several key problems of holomorphic
dynamics.

Since the original discovery, many other manifestations
of the universality have been observed,
experimentally, numerically, and theoretically, in
various classes of dynamical systems. However, in
this article we will focus on mathematical aspects of
the original phenomenon.


General Terminology and Notations


We will use general notations and terminology from
Holomorphic Dynamics.


Unimodal Maps


Definitions and Conventions


Let us consider a smooth interval map $f : I \to I$. It is
called unimodal if it has a single critical point $c$ and
this point is an extremum. We assume that the critical
point is nondegenerate, unless explicitly stated
otherwise. A unimodal map is called S-unimodal if it
has a negative Schwarzian derivative:


$$Sf = \frac{f'''}{f'} - \frac{3}{2}\left(\frac{f''}{f'}\right)^2 < 0$$


For simplicity, we also assume that the map $f$ is
even, and normalize it so that $c = 0$ and one of the
endpoints of $I$ is a fixed point.


Topological Dynamics


Let $J \ni 0$ be a 0-symmetric periodic interval, that is,
$f^p(J) \subset J$ for some $p \in \mathbb{N}$, such that the intervals
$J_k = f^k(J)$, $k = 0, 1, \ldots, p-1$, have disjoint interiors.
Then we refer to $\bigcup J_k$ as a cycle of intervals of period $p$.

According to their topological dynamics,
S-unimodal maps can be divided into three possible
types (Sharkovskii, Singer, Guckenheimer, Misiurewicz,
van Strien, Blokh, etc.):

• Regular maps. Such a map has an attracting or
parabolic cycle $\alpha$. In this case, almost all trajectories
of $f$ converge to $\alpha$. In case $\alpha$ is attracting, the
map $f$ is also called hyperbolic (see Holomorphic
Dynamics).

• Topologically chaotic maps. For such a map,
there is a cycle of intervals $\bigcup J_k$ such that the
restriction $f|\bigcup J_k$ is topologically transitive (i.e., it
has a dense orbit). Moreover, for almost all $z \in I$,
orb $z$ eventually lands in this cycle.

• Infinitely renormalizable maps. For such a map,
there is a nested sequence of periodic intervals
$J^1 \supset J^2 \supset \cdots \ni 0$ of periods $p_n \to \infty$. Then the
intersection of the corresponding cycles of
intervals,


$$A_f = \bigcap_{n=1}^{\infty} \bigcup_{k=0}^{p_n - 1} J^n_k \qquad [1]$$


is a Cantor set endowed with a natural group
structure (inverse limit of cyclic groups $\mathbb{Z}/p_n\mathbb{Z}$)
such that $f|A_f$ becomes a group translation.
Moreover, $f^n z \to A_f$ for a.e. $z \in I$. This Cantor set
is also called the Feigenbaum attractor of $f$.
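
The negative-Schwarzian condition in the definition of S-unimodal maps can be verified directly for the two families that serve as examples in this article. The following sketch evaluates $Sf = f'''/f' - \tfrac{3}{2}(f''/f')^2$ from hand-computed derivatives; the function names are illustrative, and the formula is only meaningful away from the critical point, where $f' \neq 0$.

```python
import math


def schwarzian(d1, d2, d3):
    """Schwarzian derivative from the values f', f'', f''' at a point (f' != 0)."""
    return d3 / d1 - 1.5 * (d2 / d1) ** 2


def schwarzian_quadratic(x):
    """Sf for f(x) = x^2 + c: f' = 2x, f'' = 2, f''' = 0 (independent of c)."""
    return schwarzian(2.0 * x, 2.0, 0.0)


def schwarzian_sine(x, a=1.0):
    """Sf for f(x) = a*sin(x): f' = a*cos x, f'' = -a*sin x, f''' = -a*cos x."""
    return schwarzian(a * math.cos(x), -a * math.sin(x), -a * math.cos(x))


samples = [schwarzian_quadratic(0.5), schwarzian_sine(0.3), schwarzian_sine(1.2, a=2.0)]
```

For the quadratic family this gives $Sf(x) = -3/(2x^2)$, and for the sine family $Sf(x) = -1 - \tfrac{3}{2}\tan^2 x$, both negative wherever they are defined.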


Kneading Theory 


Kneading theory (Milnor and Thurston, mid-1970s)
gives a complete topological classification of S-unimodal
maps (and more general one-dimensional maps). Let $I_+$
and $I_-$ stand for the components of $I \setminus \{0\}$, where
$I_+ \ni f(0)$. To any point $x \in I$, let us associate its itinerary
$(\varepsilon_n)_{n=0}^{N}$, where $\varepsilon_n \in \{+, -, 0\}$ and
$N \in \mathbb{Z}_+ \cup \{\infty\}$, in the
following way. If $x$ is precritical then $N \in \mathbb{Z}_+$ is the
smallest number such that $f^N x = 0$, and we let $\varepsilon_N = 0$.
Otherwise, $N = \infty$. For $n < N$, $\varepsilon_n = +$ if $f^n x \in I_+$, and
$\varepsilon_n = -$ if $f^n x \in I_-$.

The kneading sequence of f is the itinerary of the 
critical value f(0). It essentially classifies S-unimodal 
maps: two nonregular S-unimodal maps are topolo- 
gically conjugate if and only if they have the same 
kneading sequence. (In the regular case, one should 
state if the map is hyperbolic or parabolic and 
specify the sign of the multiplier of the correspond- 
ing cycle.) 
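
The definition of the kneading sequence translates into a few lines of code. The sketch below computes a truncated itinerary of the critical value for the quadratic map $x \mapsto x^2 + c$. It assumes the convention that $I_+$ is the component of $I \setminus \{0\}$ containing the critical value $f(0)$, which is consistent with the sequences quoted for $c = 1/4$ and $c = -2$; the function name and the truncation length are illustrative.

```python
def kneading_sequence(c, length=12):
    """Itinerary of the critical value of x -> x*x + c, truncated to `length` symbols."""
    value = c  # critical value f(0)
    if value == 0.0:
        return "0"  # the critical point is fixed
    plus_is_positive = value > 0  # I_+ is the side that contains f(0)
    symbols = []
    x = value
    for _ in range(length):
        if x == 0.0:
            symbols.append("0")  # orbit hits the critical point: itinerary ends
            break
        symbols.append("+" if (x > 0) == plus_is_positive else "-")
        x = x * x + c
    return "".join(symbols)


chebyshev = kneading_sequence(-2.0, 5)   # maximal sequence
parabolic = kneading_sequence(0.25, 5)   # minimal sequence
period_two = kneading_sequence(-1.0)     # critical point of period 2
```

For generic parameters the comparison `x == 0.0` is unreliable in floating point; the three parameters used here produce exact orbits, so the sketch is safe for them but would need a tolerance or exact arithmetic in general.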

The kneading theory completely describes
admissible kneading sequences (realizable by some
unimodal maps), and orders them linearly in such a way
that a bigger sequence corresponds to a more
“complicated” map. The minimal admissible kneading
sequence, $+ + + \cdots$, is realized by the parabolic map
$x \mapsto x^2 + 1/4$, while the maximal one, $+ - - - \cdots$,
is realized by the Chebyshev map $x \mapsto x^2 - 2$.

A central result of the kneading theory is the
Intermediate Value Theorem asserting that a smooth
one-parameter family of S-unimodal maps $f_t$
containing two kneading sequences also contains all
intermediate kneading sequences. In particular, a
family that contains the above maximal and
minimal kneading sequences contains all admissible
kneading sequences. Such a family is called full. We
see that the real quadratic family $P_c$, $c \in [-2, 1/4]$,
is full: any S-unimodal map is topologically equivalent
to some quadratic polynomial. This indicates
the dynamical significance of the quadratic family.

We say that a one-parameter family of unimodal
maps $f_t$ is almost full if it contains all admissible
kneading sequences except possibly the minimal one.


Universality Phenomenon 


Universal Geometry of Doubling Bifurcations 
and the Feigenbaum Attractor 


Let us consider the real quadratic family $P_c : x \mapsto x^2 + c$,
$c \in [-2, 1/4]$. As the parameter $c$ moves down from 1/4, we
observe a sequence of doubling bifurcations $c_n$ where the attracting
cycle of period $2^n$ gives birth to an attracting cycle of period
$2^{n+1}$, $n = 0, 1, \ldots$ (see Holomorphic Dynamics and Figure 1).
This sequence converges to the Feigenbaum parameter $c_\infty$ at an
exponential rate: $c_n - c_\infty \sim \lambda^{-n}$, where
$\lambda \approx 4.6$. It turns out that if we consider a similar
one-parameter family of unimodal maps, say $x \mapsto a \sin x$, we
observe a similar sequence of doubling bifurcations converging to the
limit exponentially at the same rate $\lambda^{-n}$, independently of
the family under consideration.
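
The convergence rate can be checked numerically. The following is a
minimal sketch, not taken from the article: instead of the bifurcation
parameters themselves it locates the superstable parameters at which
the critical point 0 is periodic with period $2^n$ (one such parameter
lies between consecutive doubling bifurcations, so they accumulate at
the same limit at the same rate). The function names, the use of
Newton's method, and the seed value 4.7 (used only to place initial
guesses) are assumptions of this sketch.

```python
def superstable_parameter(n, c_guess, max_steps=60, tol=1e-14):
    """Newton's method for a root c of P_c^(2^n)(0) = 0 near c_guess,
    where P_c(x) = x^2 + c."""
    period = 2 ** n
    c = c_guess
    for _ in range(max_steps):
        x, dx = 0.0, 0.0              # x = P_c^k(0), dx = d/dc of that iterate
        for _ in range(period):
            dx = 2.0 * x * dx + 1.0   # chain rule, uses the old x
            x = x * x + c
        step = x / dx
        c -= step
        if abs(step) < tol:
            break
    return c


def doubling_ratios(n_max=9, seed_ratio=4.7):
    """Superstable parameters s_0..s_{n_max} and the ratios of successive gaps."""
    # s_0 = 0 (0 is a fixed point) and s_1 = -1 (0 -> -1 -> 0) are exact.
    params = [0.0, -1.0]
    for n in range(2, n_max + 1):
        # Extrapolate the previous gap to seed Newton; only the converged
        # roots enter the reported ratios.
        guess = params[-1] + (params[-1] - params[-2]) / seed_ratio
        params.append(superstable_parameter(n, guess))
    ratios = [
        (params[k] - params[k - 1]) / (params[k + 1] - params[k])
        for k in range(1, len(params) - 1)
    ]
    return params, ratios


if __name__ == "__main__":
    params, ratios = doubling_ratios()
    for n, r in enumerate(ratios, start=1):
        print(n, round(r, 5))
```

The successive gap ratios should settle near 4.67, in line with the
value $\lambda \approx 4.6$ quoted in the text.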

In the dynamical space, let us consider the Feigenbaum attractor
$A_f$ [1] of an infinitely renormalizable S-unimodal map $f$ that
appears in the limit of doubling bifurcations (so that the periods of
the periodic intervals $J^n$ are equal to $2^n$). Let us consider the
scaling factors $\sigma_n = |J^n| / |J^{n-1}|$. Then
$\sigma_n \to \sigma_\infty$, where the limiting scaling factor
$\sigma_\infty \approx 2.6$ is independent of the particular map $f$
under consideration. Thus, the small-scale geometry of $A_f$ is
universal.


Figure 1 Real quadratic family $P_c : x \mapsto x^2 + c$. This picture
presents how the limit set of the orbit $\{P_c^n(0)\}_{n=0}^{\infty}$
bifurcates as the parameter $c$ changes from 1/4 on the right to −2 on
the left (marked parameter values: $c = -2$, $c_\infty = -1.38$,
$c = 1/4$). Three topological types of regimes are intertwined in an
intricate way. The gaps correspond to the regular regimes. The black
regions correspond to the chaotic regimes (though, of course, there
are many narrow invisible gaps therein). In the beginning (on the
right) one can see the cascade of doubling bifurcations. This picture
became symbolic for one-dimensional dynamics.

This was historically the first observed manifesta- 
tion of the quantitative universality of dynamical 
and parameter structures. 


Feigenbaum-Coullet-Tresser Renormalization 
Conjecture 


To explain the above universality phenomenon, Feigenbaum and,
independently, Coullet and Tresser formulated the following
Renormalization Conjecture. Let us consider the space $\mathcal{U}$ of
S-unimodal maps $f : [-1, 1] \to [-1, 1]$. A map $f \in \mathcal{U}$
is called (doubling) renormalizable if it has a cycle of intervals
$J \to J_1 \to J$ of period 2. Then, for any
$n \in \mathbb{Z}_+ \cup \{\infty\}$, we can naturally define
$n$-times renormalizable maps, where $n = 0$ corresponds to the
non-renormalizable case, while $n = \infty$ corresponds to the
infinitely renormalizable case.

Let $\mathcal{U}' \subset \mathcal{U}$ be the space of doubling
renormalizable maps. If $f \in \mathcal{U}'$ then $f^2 : J \to J$ is
an S-unimodal map as well, and we define the (doubling)
renormalization operator $R : \mathcal{U}' \to \mathcal{U}$ as the
rescaling of this map:

$$Rf(x) = \sigma^{-1} f^2(\sigma x),$$

where $\sigma = |J|/2$.

The Renormalization Conjecture asserted that:


• The renormalization operator $R$ has a unique fixed point $f_*$,
  and this point is hyperbolic;

• the stable manifold $W^s(f_*)$ consists of infinitely
  renormalizable unimodal maps;

• the unstable manifold $W^u(f_*)$ is one dimensional and represents
  an almost full family of unimodal maps (see the section “Kneading
  theory”); and

• the quadratic family $\{P_c\}$ transversally intersects $W^s(f_*)$
  (see Figure 2).


Assuming this conjecture, one can see that for any curve
$t \mapsto g_t$ in $\mathcal{U}$ that transversally intersects the
stable manifold $W^s(f_*)$ at some moment $t_*$, the doubling
bifurcation parameters $t_n$ converge to $t_*$ at exponential rate
$\lambda^{-n}$, where $\lambda$ is the unstable eigenvalue of the
differential $DR(f_*)$. This explains the universal geometry of
doubling bifurcations.


Figure 2 Renormalization fixed point. (Labels in the figure: the
quadratic family and the stable manifold $W^s$.)


One can also show that the Feigenbaum attractor $A_f$ of any map
$f \in W^s(f_*)$ is smoothly equivalent to $A_{f_*}$, which explains
the universal small-scale geometry of these attractors.


Full Renormalization Horseshoe 


Along with period doublings, one can consider period triplings,
quadruplings, etc. A unimodal map $f \in \mathcal{U}$ is said to be
renormalizable with period $p$ if it has a cycle of intervals
$J \to J_1 \to \cdots \to J_{p-1} \to J$ of period $p$. The
corresponding renormalization operator is defined as
$Rf(x) = \sigma^{-1} f^p(\sigma x)$, where $\sigma = |J|/2$.

The combinatorics or type $\tau$ of the renormalization operator is
the order of the intervals $J_k$, $k = 0, 1, \ldots, p - 1$, on the
real line (up to reversal). (For instance, there are three admissible
combinatorics $\tau$ of period 5.) If we want to specify the
combinatorics of the renormalization operator under consideration, we
use the notation $R_\tau$. This operator is defined on the
“renormalization strip” $\mathcal{U}^\tau$ of unimodal maps
$f \in \mathcal{U}$ that are renormalizable with combinatorics $\tau$.

The Renormalization Conjecture admits a straightforward
generalization to any renormalization operator $R_\tau$. More
interestingly, one can formulate a stronger version of it by putting
all the admissible renormalization types together. Let $\mathcal{T}$
stand for the set of all minimal renormalization types, that is, the
types that cannot be factored through other types. Then the
renormalization strips $\mathcal{U}^\tau$, $\tau \in \mathcal{T}$, are
pairwise disjoint, and we can define the full renormalization operator

$$R : \bigcup_{\tau \in \mathcal{T}} \mathcal{U}^\tau \to \mathcal{U} \qquad [2]$$

by letting $R | \mathcal{U}^\tau = R_\tau$. Then the strong version of
the renormalization conjecture asserted that:


• there is an $R$-invariant hyperbolic subset
  $\mathcal{A} \subset \mathcal{U}$ called the full renormalization
  horseshoe such that the restriction $R | \mathcal{A}$ is
  topologically conjugate to the full shift $\sigma$ on the space
  $\Sigma$ of bi-infinite sequences
  $(\ldots, \tau_{-1}, \tau_0, \tau_1, \ldots)$ of symbols
  $\tau_n \in \mathcal{T}$;

• for any $f_* \in \mathcal{A}$, the stable manifold $W^s(f_*)$
  consists of infinitely renormalizable maps $f \in \mathcal{U}$ with
  the same combinatorics as $f_*$;

• for any $f_* \in \mathcal{A}$, the unstable manifold $W^u(f_*)$ is
  one-dimensional and represents an almost full family of unimodal
  maps; and

• the real quadratic family $\{P_c\}$ transversally intersects all
  stable manifolds $W^s(f_*)$.


Complex Renormalization
Polynomial-Like Maps


A polynomial-like map is a holomorphic branched covering of finite
degree $f : U \to U'$, where $U \Subset U' \subset \mathbb{C}$ are
topological disks. (In other words, the map $f$ is proper, that is,
full preimages $f^{-1}(K)$ of compact sets $K \subset U'$ are
compact.) For instance, if $f$ is a polynomial of degree $d$ then for
a sufficiently large radius $R > 0$, the map
$f : f^{-1}(D_R) \to D_R$ is a polynomial-like map of the same degree
$d$. We refer to such polynomial-like maps as “polynomials.”

The filled Julia set of $f$ is the set of nonescaping points:

$$K(f) = \{ z : f^n(z) \in U, \; n = 0, 1, \ldots \}.$$

The Julia set of $f$ is the boundary of its filled Julia set:
$J(f) = \partial K(f)$.

A polynomial-like map of degree $d$ has $d - 1$ critical points
counted with multiplicities. The Julia set (and the filled Julia set)
is connected if and only if all the critical points $c_i$ are
nonescaping, that is, $c_i \in K(f)$.

A polynomial-like map of degree 2 is called 
quadratic-like. The Julia set of a quadratic-like 
map is either connected or a Cantor set, depending 
on whether its critical point is nonescaping or 
otherwise. 
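
For the special case of an actual quadratic polynomial
$P_c : z \mapsto z^2 + c$, this dichotomy can be probed with a simple
escape test on the critical orbit. The sketch below is an illustration
only, with hypothetical function names: it uses the standard fact that
the orbit of 0 escapes to infinity once its modulus exceeds 2, and a
finite iteration budget can only give evidence of non-escape, not a
proof.

```python
def critical_orbit_escape_time(c, max_iter=1000, radius=2.0):
    """First n >= 1 with |P_c^n(0)| > radius, or None if no escape is
    detected within max_iter iterations (P_c(z) = z*z + c)."""
    z = 0j
    for n in range(1, max_iter + 1):
        z = z * z + c
        if abs(z) > radius:
            return n
    return None


def julia_set_looks_connected(c, max_iter=1000):
    """True if the critical point 0 did not escape within max_iter steps,
    which suggests (but does not prove) a connected Julia set."""
    return critical_orbit_escape_time(c, max_iter) is None


if __name__ == "__main__":
    for c in (0, -2, 1j, 0.26, 1):
        print(c, julia_set_looks_connected(c))
```

Parameters such as $c = 0$, $c = -2$, and $c = i$ keep the critical
orbit bounded, whereas for $c = 1$ the orbit $0, 1, 2, 5, \ldots$
leaves the disk of radius 2 at the third step.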

The domain of a polynomial-like map is allowed to be slightly
adjusted by taking $V'$ to be a topological disk such that
$U \subset V' \subset U'$ and letting $V = f^{-1}(V')$. We say that
two polynomial-like maps represent the same germ if one can be
obtained from the other by a sequence of such adjustments.

We will be mostly interested in the quadratic case; so let $Q$ be the
space of quadratic-like germs considered up to affine conjugacy, and
let $C$ be the connectedness locus in $Q$, that is, the subset of
$f \in Q$ with connected Julia set. The space $Q$ has a natural
complex analytic structure such that holomorphic curves in $Q$ are
represented by holomorphic families $f_\lambda(z)$ of quadratic-like
maps.

Two polynomial-like maps are called hybrid equivalent if they are
conjugate by a quasiconformal map $h$ such that $\bar\partial h = 0$
a.e. on $K(f)$ (in particular, $h$ is conformal on
$\operatorname{int} K(f)$). By the Straightening Theorem, any
polynomial-like map is hybrid equivalent (after an adjustment of its
domain) to a polynomial of the same degree (called the
“straightening” of $f$). The straightening depends only on the germ
of $f$.

For a quadratic-like map $f$ with connected Julia set, the
straightening $P_c : z \mapsto z^2 + c$ is unique, $c = \chi(f)$.
Thus, we obtain the straightening map $\chi : C \to M$, where $M$ is
the Mandelbrot set (see Holomorphic Dynamics). We let
$H_c = \chi^{-1}(c)$ be the hybrid class passing through a point
$c \in M$. One can show that $H_c$ is a codimension-one submanifold
in $Q$.

Any quadratic-like map has two fixed points counted with
multiplicity. In the case of connected Julia set, these fixed points
have a different dynamical meaning: one of them, called $\alpha$, is
either attracting, or neutral, or repelling separating, that is,
$J(f) \setminus \{\alpha\}$ is disconnected. The other one, called
$\beta$, is either parabolic with multiplier 1 (and then it coincides
with $\alpha$) or repelling nonseparating.

In what follows, we normalize quadratic-like 
maps so that 0 is their critical point. 


Complex Renormalization and Little 
Mandelbrot Sets 


A quadratic-like map $f : U \to U'$ with connected Julia set is
called renormalizable if there is a topological disk $V \ni 0$ and a
natural number $p \geq 2$ called the renormalization period such that:


• letting $g = f^p | V$ and $V' = g(V)$, the map $g : V \to V'$ is
  quadratic-like;

• the little Julia set $K(g)$ is connected; and

• the sets $f^k(K(g))$, $k = 1, \ldots, p - 1$, can intersect $K(g)$
  only at the $\beta$-fixed point of $g$.


Under these circumstances, the quadratic-like germ $g$ considered up
to affine conjugacy is called the renormalization of the
quadratic-like germ $f$; $g = Rf$. Moreover, one says that $f$ is
primitively renormalizable if the little Julia sets $f^k(K(g))$,
$k = 1, \ldots, p - 1$, are pairwise disjoint. Otherwise, $f$ is
satellite renormalizable.

As in the unimodal case, one can define the combinatorics or type
$\tau$ of the complex renormalization. Roughly speaking,
renormalizable maps with the same combinatorics have the same
renormalization period and the “same position” of the little Julia
sets $f^k(K(g))$ in $\mathbb{C}$ (the rigorous definition is based on
the notion of Thurston’s equivalence from Holomorphic Dynamics).


Theorem 1 (Douady and Hubbard 1986). The set of parameters $c$ for
which a quadratic map $P_c : z \mapsto z^2 + c$ is renormalizable
with a given combinatorics $\tau$ forms a homeomorphic copy $M^\tau$
of the Mandelbrot set $M$.


This theorem explains the presence of many little Mandelbrot sets
that are observable on the computer pictures of $M$ (see Figures 3
and 4). Moreover, the copies corresponding to the primitive
renormalization originate at primitive hyperbolic components (see
Holomorphic Dynamics), while the copies obtained by a satellite
renormalization originate at satellite hyperbolic components attached
to some “mother” hyperbolic component. (Satellite copies attached to
the main cardioid are particularly prominent on the pictures of $M$.)


Figure 3 A primitive copy of the Mandelbrot set.


Figure 4 The satellite copy of the Mandelbrot set attached to the
main cardioid at the point of doubling bifurcation.

Given a combinatorial type $\tau$, the set $Q^\tau$ of quadratic-like
germs $f \in Q$ that are renormalizable with combinatorics $\tau$
(the complex renormalization strip) is the union of hybrid classes
passing through the little copy $M^\tau$. As in the real case, let us
consider the set $\mathcal{T}_{\mathbb{C}}$ of all minimal
combinatorial types. Then the corresponding renormalization strips
$Q^\tau$ are pairwise disjoint, and we can define the full complex
renormalization operator
$R : \bigcup_{\tau \in \mathcal{T}_{\mathbb{C}}} Q^\tau \to Q$.


Renormalization Theorem 


The first proof of the Renormalization Conjecture in the
period-doubling case was based on rigorous computer estimates
(Lanford 1982). It was followed, in the 1980s, by works of Epstein,
Eckmann, Khanin, Sinai, among others, which gave a better conceptual
understanding and provided proofs of many ingredients of the picture
(without computer assistance).

The turning point in this development occurred 
when methods of holomorphic dynamics and con- 
formal geometry were introduced into the subject 
(Douady and Hubbard 1985, Sullivan 1986). This 
led to the proof of the renormalization conjecture in 
the space of quadratic-like germs: 


Theorem 2 (Sullivan-McMullen-Lyubich, the 1990s). For any real
combinatorics $\tau \in \mathcal{T}$, the operator $R_\tau$ has a
unique fixed point $f_\tau$ in the space $Q$. Moreover, $f_\tau$ is
hyperbolic, its stable manifold $W^s(f_\tau)$ coincides with the
hybrid class $H_c$, $c = \chi(f_\tau)$, while the real slice of the
unstable manifold represents an almost full family of unimodal maps.


This result was further extended to the smooth 
category by de Faria, de Melo, and Pinto. 


MLC, Density of Hyperbolicity, and 
Geometry of Feigenbaum Julia Sets 


The “Mandelbrot set is locally connected” (MLC) 
conjecture (see Holomorphic Dynamics) is intimately 
related to the renormalization phenomenon. This 
connection was first revealed by the following result: 


Theorem 3 (Yoccoz 1990, unpublished). Let us consider a
nonrenormalizable quadratic polynomial $P_c : z \mapsto z^2 + c$ with
connected Julia set and both fixed points repelling. Then the Julia
set $J(P_c)$ is locally connected and the Mandelbrot set is locally
connected at $c$.


This result was recently extended to higher-degree unicritical
polynomials $z \mapsto z^d + c$ (Kahn-Lyubich, preprint 2005).

The MLC Conjecture is still open for general infinitely 
renormalizable parameters. However, the similar pro- 
blem for the real quadratic family has been resolved. 
It implies the real version of the Fatou conjecture in 
the quadratic case (see Holomorphic Dynamics): 


Theorem 4 (Lyubich 1997). Hyperbolic maps are 
dense in the real quadratic family. 


This result was recently extended to higher-degree 
polynomials by Kozlovskii, Shen, and van Strien 
(preprint 2003). 

Infinitely renormalizable quadratic maps of bounded combinatorial
type (i.e., with bounded relative periods $p_{n+1}/p_n$) supply us
with a rich class of fractals with very interesting geometry. These
Julia sets are “hairy” at the origin, that is, their blow-ups fill in
densely the whole plane (this phenomenon is related to the universal
geometry of the Feigenbaum attractors; McMullen (1996)). However,
some of them have zero Lebesgue measure (Yarrington, thesis 1995) and
Hausdorff dimension smaller than 2 (Avila-Lyubich, preprint 2004). It
is unknown whether this happens for all of them or not (in
particular, the answer is unknown for the Feigenbaum map born in the
cascade of doubling bifurcations).


Regular or Stochastic Dichotomy 
Stochastic Maps 


An S-unimodal map $f$ is called stochastic if it has an absolutely
continuous invariant measure $\mu$. In this case, $f$ is
topologically chaotic (see the section “Topological dynamics”) and
$\mu$ is supported on the transitive cycle of intervals
$\bigcup J_k$. Moreover, $\mu$ has a positive characteristic
exponent,

$$\chi_\mu(f) = \int \log |Df| \, d\mu > 0,$$

and Lebesgue almost all orbits are equidistributed with respect to
$\mu$, that is, for Lebesgue a.e. $x \in I$,

$$\frac{1}{n} \sum_{k=0}^{n-1} \phi(f^k x) \to \int \phi \, d\mu$$

for any continuous function $\phi$. The map $f^p | J$ is mixing with
respect to $\mu$, and in fact, is weakly Bernoulli.
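
Equidistribution suggests a direct numerical estimate of the
characteristic exponent: average $\log |Df|$ along a single long
orbit. The sketch below is an illustration only, for the quadratic
family $P_c(x) = x^2 + c$; the function name, the starting point, and
the sample sizes are arbitrary choices of this sketch. For the
Chebyshev map ($c = -2$) the exponent is known to equal $\log 2$,
while for a regular parameter with an attracting fixed point the
average converges to the logarithm of the modulus of its multiplier,
which is negative.

```python
import math


def characteristic_exponent(c, x0=0.3, n_transient=1000, n_samples=100000):
    """Birkhoff average of log|P_c'(x)| = log|2x| along the orbit of x0
    under P_c(x) = x*x + c, after discarding a transient."""
    x = x0
    for _ in range(n_transient):
        x = x * x + c
    total = 0.0
    count = 0
    for _ in range(n_samples):
        if x != 0.0:                      # skip the critical point itself
            total += math.log(abs(2.0 * x))
            count += 1
        x = x * x + c
    return total / count


if __name__ == "__main__":
    print(characteristic_exponent(-2.0), math.log(2.0))   # stochastic case
    print(characteristic_exponent(0.2))                   # regular case
```

A floating-point orbit is only a pseudo-orbit, so the first number
should be read as an estimate that agrees with $\log 2$ up to
statistical error, not as an exact value.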
Here are two important criteria for stochasticity: 


• Collet-Eckmann condition (see Holomorphic Dynamics). These maps
  have extra strong stochastic properties, notably, the exponential
  decay of correlations.

• Martens-Nowicki condition. To state it, we need to define the
  principal nest of intervals,
  $I^0 \supset I^1 \supset \cdots \ni 0$. Here
  $I^0 = [-\alpha, \alpha]$, where $\alpha$ is the fixed point with
  negative multiplier, and $I^{n+1}$ is inductively defined as the
  component of $f^{-l_n}(I^n)$ containing 0, where $l_n$ is the
  moment of first return of the orbit of 0 to $I^n$. Let us consider
  the scaling factors $\sigma_n = |I^n| / |I^{n-1}|$. If
  $\sum \sqrt{\sigma_n} < \infty$ then $f$ is stochastic.


Let $\mathcal{N} \subset [-2, 1/4]$ be the set of parameters $c$ for
which the quadratic map $P_c$ is topologically chaotic. Not every
such map is stochastic. However, the set of stochastic parameters has
positive Lebesgue measure (Jakobson 1981), and in fact,


Theorem 5 (Lyubich 2000). For a.e. $c \in \mathcal{N}$, the map $P_c$
satisfies the Martens-Nowicki condition, and thus, is stochastic.


Avila and Moreira (2005) went on to prove that for a.e.
$c \in \mathcal{N}$, the map $P_c$ is Collet-Eckmann.


Renormalization Horseshoe 


Let us consider the complexification of the renormalization
operator [2],

$$R : \bigcup_{\tau \in \mathcal{T}} Q^\tau \to Q \qquad [3]$$

acting in the space of quadratic-like maps.


Theorem 6 (Lyubich 2002). The “Strong Renor- 
malization Conjecture” is valid for the operator [3]. 


Let $\mathcal{I} \subset [-2, 1/4]$ be the set of parameters for
which the quadratic map $P_c$ is infinitely renormalizable. The above
theorem implies that this set has zero Lebesgue measure. (Avila and
Moreira went on to prove that $\mathrm{HD}(\mathcal{I}) < 1$.)


Regular or Stochastic Dichotomy

Putting together Theorems 5 and 6, we obtain:


Theorem 7. For a.e. $c \in [-2, 1/4]$, the quadratic map $P_c$ is
either regular or stochastic.


This result gives a complete probabilistic picture 
of dynamics in the real quadratic family. It has been 
later transferred to any nondegenerate real analytic 
family of S-unimodal maps (Avila—Lyubich—de 
Melo), and further to a generic smooth family of S- 
unimodal maps (Avila—Moreira). 

Palis has formulated a strong general conjecture 
(in all dimensions) asserting that a typical (from 
the probabilistic point of view) smooth dynamical 
system f has finitely many attractors supporting 
SRB measures (see Lyapunov Exponents and 
Strange Attractors) that govern the behavior of 
Lebesgue a.e. trajectories of f. The above results 
confirm the Palis Conjecture in the setting of S- 
unimodal maps. 


Other Universality Classes 


From a more general point of view, renormalization 
is an appropriately rescaled return map to a relevant 
piece of the phase space, viewed as an operator in 
some class of dynamical systems. From this point of 
view, most dynamical systems are “renormalizable,” 
and the renormalization approach often provides a 
deep insight into the nature of the systems in 
question. 

Here is a partial list of classes of nonlinear systems that exhibit
universality with an underlying renormalization mechanism (we provide
a few relevant names, but there are many more people who contributed
to the corresponding theories):


• holomorphic germs near indifferent equilibria (Yoccoz, Shishikura,
  McMullen);

• critical circle maps (Kadanoff, Feigenbaum, Rand, Lanford, Swiatek,
  de Faria, Yampolsky);

• non-renormalizable quadratic-like maps of Fibonacci type
  (Lyubich-Milnor);

• conservative two-dimensional diffeomorphisms near the point of
  breaking of KAM tori (MacKay, Koch); and

• dissipative Hénon-like maps (Collet-Eckmann-Koch,
  de Carvalho-Lyubich-Martens).


See also: Fractal Dimensions in Dynamics; Holomorphic 
Dynamics; Lyapunov Exponents and Strange Attractors; 
Multiscale Approaches. 
Introduction


The problem of fluid turbulence is commonly 
regarded as one of the most challenging problems of 
theoretical physics and mathematics. There is general 
agreement that the Navier-Stokes equations (NSEs) 
provide a satisfactory basis for the description of 
turbulent motions of homogeneous Newtonian fluids 
such as gases and most liquids. But the difficulty of 
generating solutions of these equations for high- 
Reynolds-number flows has prevented accurate 
answers to simple questions such as the question of 
the discharge of turbulent pipe flow as a function of 
the pressure head or the question of the heat transport 
by turbulent convection in a fluid layer heated from 
below. In view of this difficulty, it has become an 
attractive idea to obtain rigorous bounds on turbulent 
transports. Variational methods have played an 
important role in the derivation of such bounds. 

There is another motivation for the use of varia- 
tional methods for the understanding of turbulent 
fluid systems. Experimenters have sometimes noted 
the tendency of turbulent flows to maximize trans- 
ports under given external conditions. In his pioneer- 
ing paper, Howard (1963) mentions that the Malkus 
hypothesis of a maximum heat transport by thermal 
convection had motivated him to derive upper bounds 
through the use of variational methods. The techni- 
ques developed by Howard have later been applied to 
other kinds of turbulent transports by Busse. While 
relatively simple ordinary differential equations are 
obtained when the equation of continuity is not 
imposed as a constraint, the Euler-Lagrange equa- 
tions for a stationary value of the variational 
functional lead to nonlinear partial differential equa- 
tions when solenoidal extremalizing vector fields are 
required. Nevertheless, using boundary layer methods 
one can derive approximate analytical solutions even 
in the limit of asymptotically large Rayleigh and 
Reynolds numbers (Busse 1969, 1978). 


In the following, we shall first discuss the energy 
method which provides necessary conditions for the 
existence of turbulent solutions of the underlying 
equations and then turn to the problem of upper 
bounds for the turbulent momentum transport in the 
plane Couette flow configuration as a particular 
example. The properties and physical relevance of 
the extremalizing vector fields will be discussed in a 
final section. 


Energy Method 


For simplicity, we consider the NSEs for a homogeneous incompressible
fluid with a constant kinematic viscosity $\nu$ in an arbitrary fixed
domain $D$. Using the diameter $d$ of the domain as length scale and
$d^2/\nu$ as timescale, we can write the NSEs of motion in
dimensionless form,

$$\frac{\partial}{\partial t} v + v \cdot \nabla v = -\nabla p + f + \nabla^2 v \qquad [1a]$$

$$\nabla \cdot v = 0 \qquad [1b]$$

where $f$ denotes some given steady distribution of a force density.
On the boundary $\partial D$ of the domain $D$, steady velocities
parallel to the boundary may be specified. We assume that the basic
steady solution of the problem is given by $v_s = Re\, \hat{v}$ where
the average of $|\hat{v}|^2/2$ over the domain $D$ (indicated by
angular brackets) is unity, $\langle |\hat{v}|^2 \rangle = 2$. Any
velocity field $v$ different from $v_s$, that is, with
$u = v - v_s \neq 0$, must obey the equations

$$\frac{\partial}{\partial t} u + v_s \cdot \nabla u + u \cdot \nabla v_s + u \cdot \nabla u = -\nabla \tilde{p} + \nabla^2 u \qquad [2a]$$

$$\nabla \cdot u = 0 \qquad [2b]$$

together with the homogeneous boundary conditions for $u$ on
$\partial D$. By multiplying eqn [2a] by $u$ and averaging the result
over the domain $D$ we obtain the relationship

$$\frac{1}{2} \frac{d}{dt} \langle u \cdot u \rangle = -\langle |\nabla u|^2 \rangle - Re\, \langle u \cdot (u \cdot \nabla) \hat{v} \rangle \qquad [3]$$
where the vanishing of $u$ on $\partial D$ and equations such as

$$\langle u \cdot (v_s \cdot \nabla u) \rangle = \tfrac{1}{2} \langle v_s \cdot \nabla (u \cdot u) \rangle = \tfrac{1}{2} \langle \nabla \cdot (v_s \, u \cdot u) \rangle = 0$$

have been used to prove that the terms $v_s \cdot \nabla u$,
$u \cdot \nabla u$ and $\nabla \tilde{p}$ do not enter the
balance [3]. This balance is called the Reynolds-Orr energy equation
and is the basis for the application of the energy method. The lowest
value of $Re$ for which the right-hand side of [3] can be
non-negative for some disturbance $u$ is called the energy Reynolds
number $Re_E$. For $Re < Re_E$ the steady solution $v_s$ is
absolutely stable and the energy of any disturbance $u$ must decay
exponentially in time. $Re > Re_E$ is a necessary condition for the
existence of a persistent turbulent state of fluid flow. $Re_E$ is
determined as the solution of the variational problem:


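For a given flow $\hat{v}$ in $D$ find the minimum $Re_E$ of the
functional

$$\mathcal{R}_E \equiv -\frac{\langle |\nabla \tilde{u}|^2 \rangle}{\langle \tilde{u} \cdot (\tilde{u} \cdot \nabla) \hat{v} \rangle} \qquad [4]$$

among all vector fields $\tilde{u}$ which satisfy the conditions
$\nabla \cdot \tilde{u} = 0$ in $D$, $\tilde{u} = 0$ on $\partial D$,
and $\langle \tilde{u} \cdot (\tilde{u} \cdot \nabla) \hat{v} \rangle < 0$.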


For $Re > Re_E$ there will exist at least one vector field $u$,
namely the minimizing solution $\tilde{u}$ of the variational
problem [4], the energy of which does not decay, at least not
initially. In the derivation of the Euler-Lagrange equations as
necessary conditions for stationary values of the variational
functional [4],

$$\tfrac{1}{2} G \, \tilde{u}_i (\partial_i \hat{v}_j + \partial_j \hat{v}_i) = -\partial_j \pi + \partial_k \partial_k \tilde{u}_j \qquad [5a]$$

$$\partial_j \tilde{u}_j = 0 \qquad [5b]$$


the constraint $\nabla \cdot \tilde{u} = 0$ has been taken into
account through the Lagrange multiplier function $\pi$. $G$ is a
stationary value of the functional [4] and in general there exist
many of those which are determined as eigenvalues of the linear
boundary value problem [5] together with its boundary condition
$\tilde{u}_j = 0$ on $\partial D$. Only the infimum of all $G$
provides the energy Reynolds number $Re_E$. Many details on the
energy method can be found in Joseph’s book (1976). Here we just wish
to remark that the Reynolds-Orr balance [3] remains valid when the
problem is considered in a system rotating with a constant angular
velocity $\Omega_D$ since the Coriolis force does not contribute to
the energy balance [3]. The values of $Re_E$ are usually much smaller
than the critical values $Re_c$ for the onset of infinitesimal
disturbances as can be seen from Table 1. Here the experimentally
determined values $Re_G$ for the instability of the basic flow state
have also been listed. A unique situation occurs in the small gap
limit of the Taylor-Couette system where $Re_E$ and $Re_c$ coincide
for a special value of the dimensionless mean rotation rate
$\Omega_D$ (Busse 2002).


Table 1 Reynolds numbers for shear flows

Flow                                       Re_E      Re_G (from exp.)   Re_c
Plane Couette flow                         82.6      ≈ 1300             ∞
Poiseuille flow (channel flow)             99.2 (a)  ≈ 2000 (a)         5772
Hagen-Poiseuille flow (pipe flow)          81.5 (a)  ≈ 2100 (a)         ∞
Circular Couette flow with Ω_D = Re_E/2    82.6      ≈ 82.6             82.6

(a) The maximum velocity and the channel width d (radius d in the
case of pipe flow) have been used in the definition of Re.


Variational Problem for Turbulent 
Momentum Transport 


In order to introduce the variational method for bounds on turbulent
transports we consider the simplest configuration for which a
nontrivial solution of the NSEs of motion exists: the configuration
of plane Couette flow (Figure 1). The Reynolds number is defined in
this case in terms of the constant relative motion $U_0 i$ between
the plates, $Re = U_0 d/\nu$, where $i$ is the unit vector parallel
to the plates and $\nu$ is the kinematic viscosity of the fluid.
Using the distance $d$ between the plates as length scale and
$d^2/\nu$ as timescale, the basic equations can be written in the
form

$$\frac{\partial}{\partial t} v + v \cdot \nabla v = -\nabla p + \nabla^2 v \qquad [6]$$

$$\nabla \cdot v = 0 \qquad [7]$$


We use a Cartesian system of coordinates with the $x$,
$z$-coordinates in the directions of $i$ and $k$, respectively, where
$k$ is the unit vector normal to the plates such that the boundary
conditions are given by

$$v = \mp \tfrac{1}{2} Re\, i \quad \text{at } z = \pm \tfrac{1}{2} \qquad [8]$$


Figure 1 Geometrical configuration of the plane Couette flow problem.

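After separating the velocity field $v$ into its mean and fluctuating
parts, $v = U + \tilde{v}$ with $\bar{v} = U$,
$\bar{\tilde{v}} = 0$, where the bar denotes the average over planes
$z = \text{const.}$, we obtain by multiplying eqn [6] by $\tilde{v}$
and averaging it over the entire fluid layer (indicated by angular
brackets)

$$\frac{1}{2} \frac{d}{dt} \langle \tilde{v} \cdot \tilde{v} \rangle = -\left\langle \tilde{u} \tilde{w} \cdot \frac{\partial U}{\partial z} \right\rangle - \langle |\nabla \tilde{v}|^2 \rangle \qquad [9]$$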


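Here $\tilde{u}$ denotes the component of $\tilde{v}$ perpendicular
to $k$ and $\tilde{w}$ is its $z$-component. We define fluid
turbulence under stationary conditions by the property that
quantities averaged over planes $z = \text{const.}$ are time
independent. Accordingly, the equation for the mean flow $U$ can be
integrated to yield

$$\frac{d}{dz} U = \overline{\tilde{w} \tilde{u}} - \langle \tilde{w} \tilde{u} \rangle - Re\, i \qquad [10]$$

where the boundary condition [8] has been employed. With this
relationship, $U$ can be eliminated from the problem and the energy
balance

$$\langle |\nabla \tilde{v}|^2 \rangle + \langle |\overline{\tilde{u} \tilde{w}} - \langle \tilde{u} \tilde{w} \rangle|^2 \rangle = Re\, \langle \tilde{u}_x \tilde{w} \rangle \qquad [11]$$

is obtained where the identity
$\langle |\overline{\tilde{u} \tilde{w}}|^2 \rangle - |\langle \tilde{u} \tilde{w} \rangle|^2 = \langle |\overline{\tilde{u} \tilde{w}} - \langle \tilde{u} \tilde{w} \rangle|^2 \rangle$
has been used.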
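
For readers who want the intermediate step, the passage from the
stationary form of the fluctuation energy balance and the mean-flow
relation [10] to the energy balance [11] can be written out as
follows. This is a sketch in the notation of this section (bar:
average over planes $z = \text{const.}$; angular brackets: average
over the layer; tildes: fluctuating parts), assuming statistical
stationarity so that the time derivative of the fluctuation energy
vanishes.

```latex
\begin{align*}
0 &= -\Big\langle \tilde u \tilde w \cdot \frac{dU}{dz} \Big\rangle
     - \big\langle |\nabla \tilde v|^2 \big\rangle
  && \text{(stationary fluctuation energy balance)} \\
\Big\langle \tilde u \tilde w \cdot \frac{dU}{dz} \Big\rangle
  &= \big\langle \tilde u \tilde w \cdot \overline{\tilde u \tilde w} \big\rangle
     - \big| \langle \tilde u \tilde w \rangle \big|^2
     - Re\, \langle \tilde u_x \tilde w \rangle
  && \text{(insert [10])} \\
\big\langle \tilde u \tilde w \cdot \overline{\tilde u \tilde w} \big\rangle
  &= \big\langle |\overline{\tilde u \tilde w}|^2 \big\rangle
  && \text{(the plane average depends on $z$ only)} \\
\big\langle |\overline{\tilde u \tilde w}|^2 \big\rangle
  - \big| \langle \tilde u \tilde w \rangle \big|^2
  &= \big\langle |\overline{\tilde u \tilde w}
     - \langle \tilde u \tilde w \rangle|^2 \big\rangle
  && \text{(since $\langle \overline{\tilde u \tilde w} \rangle
     = \langle \tilde u \tilde w \rangle$)} \\
\Longrightarrow \quad
\big\langle |\nabla \tilde v|^2 \big\rangle
  + \big\langle |\overline{\tilde u \tilde w}
  - \langle \tilde u \tilde w \rangle|^2 \big\rangle
  &= Re\, \langle \tilde u_x \tilde w \rangle
  && \text{(energy balance [11])}
\end{align*}
```

Both terms on the left-hand side are non-negative, which is why the
balance forces $\langle \tilde{u}_x \tilde{w} \rangle$ to be positive
for any turbulent state.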

Since the momentum transport in the $x$-direction between the moving
rigid plates is described by
$M = -dU_x/dz|_{z = \pm 1/2} = \langle \tilde{u}_x \tilde{w} \rangle + Re$,
we can conclude immediately that the momentum transport by turbulent
flow always exceeds the corresponding laminar value because
$\langle \tilde{u}_x \tilde{w} \rangle$ is positive according to the
relationship [11]. Since a lower bound on $M$ thus exists, an upper
bound $\mu$ on $\langle \tilde{u}_x \tilde{w} \rangle$ as a function
of $Re$ is of primary interest. Following Howard (1963), it can be
shown that $\mu(Re)$ is a monotonic function and it is therefore
equivalent to ask for a lower bound $R$ of $Re$ at a given value
$\mu$ of $\langle \tilde{u}_x \tilde{w} \rangle$. We are thus led to
the following formulation of the variational problem:


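Find the minimum $R(\mu)$ of the functional

$$\mathcal{R}(\tilde{v}, \mu) \equiv \frac{\langle |\nabla \tilde{v}|^2 \rangle}{\langle \tilde{u}_x \tilde{w} \rangle} + \mu \frac{\langle |\overline{\tilde{u} \tilde{w}} - \langle \tilde{u} \tilde{w} \rangle|^2 \rangle}{\langle \tilde{u}_x \tilde{w} \rangle^2} \qquad [12]$$

among all solenoidal vector fields $\tilde{v} = \tilde{u} + k \tilde{w}$
(with $\tilde{u} \cdot k = 0$) that satisfy the boundary condition
$\tilde{v} = 0$ at $z = \pm 1/2$ and the condition
$\langle \tilde{u}_x \tilde{w} \rangle > 0$.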




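The Euler-Lagrange equations as necessary conditions for an extremal
value of the functional are given by

$$\tilde{w} \frac{dU^*}{dz} + k \, \tilde{u} \cdot \frac{dU^*}{dz} = -\nabla \pi + \nabla^2 \tilde{v} \qquad [13]$$

$$\nabla \cdot \tilde{v} = 0 \qquad [14]$$

where $dU^*/dz$ is defined by

$$\frac{dU^*}{dz} \equiv \overline{\tilde{u} \tilde{w}} - \langle \tilde{u} \tilde{w} \rangle - i \left( R - \frac{\langle |\nabla \tilde{v}|^2 \rangle}{2 \mu} \right) \qquad [15]$$

and where $\mu = \langle \tilde{u}_x \tilde{w} \rangle$ has been set.
When eqns [13]-[15] are compared with the equations for $\tilde{v}$
and for $U$, a strong similarity can be noticed. The variational
problem does not exhibit any time dependence, but the Euler-Lagrange
equations may still be regarded as the symmetric analogue of the NSEs
for steady flow.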


Upper Bounds on the Turbulent 
Momentum Transport 


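A simple analytical solution of the variational problem can be
obtained when the constraint $\nabla \cdot \tilde{v} = 0$ is dropped.
In that case it is evident that the minimum of the functional [12] is
reached when $\tilde{v}$ is independent of $x$, $y$, and when
$\tilde{u}_x = \tilde{w} = f(z)$ holds. The Euler-Lagrange equations
then assume the form of an ordinary differential equation,

$$f'' = \left( \mu \frac{f^2}{\langle f^2 \rangle} - \mu - R + \frac{\langle f'^2 \rangle}{\langle f^2 \rangle} \right) f \qquad [16]$$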

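Since the variational functional [12] is homogeneous in $\tilde{v}$,
we are free to use a normalization condition for which we choose
$\max[f(z)] = 1$. Multiplication of eqn [16] by $f'$ and integration
yield

$$f'^2 = \frac{\mu}{2 k^2 \langle f^2 \rangle} (1 - f^2)(1 - k^2 f^2) \quad \text{with} \quad k^2 = \mu / [2 (R + \mu) \langle f^2 \rangle - 2 \langle f'^2 \rangle - \mu] \qquad [17]$$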


This equation can be solved in terms of elliptic integrals. The
minimum $R(\mu)$ is determined by the relationships

$$R = \tfrac{8}{3} \left[ K^2 (1 + k^2) + K^3/D - 3 k^2 K D \right], \qquad \mu = 8 k^2 K D \qquad [18]$$

where $D(k)$ and $K(k)$ are the complete elliptic integrals usually
labeled by these letters. For details, see the analysis by Howard
(1963) of an analogous problem. In the asymptotic case of large
Reynolds numbers, relationships [18] yield the upper bound

$$\mu(Re) = \frac{9}{128} Re^2 \qquad [19]$$
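
The passage from [18] to [19] can be checked numerically. The sketch
below is an illustration only: it evaluates $K(k)$ and $E(k)$ with
the arithmetic-geometric mean, assumes the standard definition
$D(k) = (K - E)/k^2$ for the third complete elliptic integral, and
takes the complementary modulus $k' = \sqrt{1 - k^2}$ as input so
that the limit $k \to 1$ can be approached without loss of precision.
All function names are hypothetical.

```python
import math


def elliptic_K_E(kp):
    """Complete elliptic integrals K(k), E(k) for the complementary
    modulus kp = sqrt(1 - k^2), 0 < kp <= 1, via the AGM iteration."""
    a, b = 1.0, kp
    s = 0.5 * (1.0 - kp * kp)        # running sum of 2^(n-1) c_n^2, c_0 = k
    weight = 1.0
    for _ in range(100):
        c = 0.5 * (a - b)            # c_{n+1} = (a_n - b_n) / 2
        a, b = 0.5 * (a + b), math.sqrt(a * b)
        s += weight * c * c
        weight *= 2.0
        if abs(c) <= 1e-15 * a:
            break
    K = math.pi / (2.0 * a)
    E = K * (1.0 - s)
    return K, E


def bound_without_continuity(kp):
    """R and mu from relationships [18], for 0 < kp < 1."""
    K, E = elliptic_K_E(kp)
    k2 = 1.0 - kp * kp
    D = (K - E) / k2                 # assumed definition of D(k)
    R = (8.0 / 3.0) * (K * K * (1.0 + k2) + K ** 3 / D - 3.0 * k2 * K * D)
    mu = 8.0 * k2 * K * D
    return R, mu


if __name__ == "__main__":
    for kp in (1e-5, 1e-20, 1e-100, 1e-300):
        R, mu = bound_without_continuity(kp)
        print(kp, R, mu / R ** 2, 9.0 / 128.0)
```

The ratio $\mu / R^2$ approaches 9/128 only slowly, with corrections
of order $1/K$, because $K$ grows logarithmically as $k' \to 0$. In
the opposite limit $k \to 0$ the same formulas give
$R \to 2\pi^2$ with $\mu \to 0$.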

In solving the full eqns [13]-[15], it is convenient to eliminate
eqn [14] through the general representation of the solenoidal vector
field $\tilde{v}$,

$$\tilde{v} = \nabla \times (\nabla \phi \times k) + \nabla \psi \times k \qquad [20]$$

We assume that the minimizing vector field $\tilde{v}$ does not
depend on $x$, although a rigorous proof for this property can be
given only for small values of $\mu$. Introducing the notations
$\theta \equiv \partial \psi / \partial y$ and
$w \equiv -\partial^2 \phi / \partial y^2$ we are thus led to the
general ansatz


N 
w= w™) = X az wn(z)bn(y) [21a] 
n=1 
N 
06=O0% = X On(z)bn(y) [21b] 
n—1 


where $N$ may tend to infinity and the functions $\phi_n(y)$ satisfy
the equation


or 2 


In the following, it will be assumed that the positive wavenumbers
$\alpha_n$ are ordered according to their size,
$\alpha_{n-1} < \alpha_n < \alpha_{n+1}$. The solutions of the
form [21] of the Euler-Lagrange equations exhibit a boundary layer
structure for large $\mu$ as sketched in Figure 2. Accordingly, the
$N$-$\alpha$ solutions are characterized by a hierarchy of $N$
boundary layers at each plate and provide the upper bound
sequentially with increasing $\mu$ starting with $N = 1$. The
extremalizing vector fields thus exhibit a bifurcation structure
similar to that found in many cases of the transition to turbulence.
The thicknesses of the boundary layers decrease with increasing $\mu$
and their ratio from one layer to the next approaches the factor 4 as
indicated in Figure 3. The typical scale of motion increases linearly
with distance from the wall as assumed in Prandtl’s mixing-length
theory. But the discreteness of the scales reflects the fact that
effective transports require preferred scales. Asymptotically, the
upper bound for the momentum transport approaches

$$\mu(Re) = 0.010\, Re^2 \qquad [23]$$

which represents a significant improvement over the
relationship [19]. Nevertheless, the upper bound still exceeds the
measured values of the momentum transport by more than a factor
of 10.


Figure 2 Qualitative sketch of the boundary layer structure of the
extremalizing $N$-$\alpha$ solution.


Figure 3 Qualitative sketch of the nested boundary layers that
characterize the vector field of maximum transport. The profile of
the mean shear is shown on the right side.


Discussion 


Bounds like those for the momentum transport have 
been obtained for many other kinds of turbulent 
transports. For details we refer to the review articles 
listed below. Usually, the formulation of the upper 
bound problem requires that the external conditions 
are homogeneous in two spatial dimensions such 
that a separation of the turbulent velocity, tempera- 
ture, or magnetic fields into mean and fluctuating 
parts is possible. In this respect, the variational 
methods for upper bounds are more restricted than 
those used for determination of the energy Reynolds
number $Re_E$. The latter problem, incidentally,
corresponds to the limit $\mu \to 0$ of variational
problems of the type [12] as can be seen from a
comparison with expression [4]. 

In recent years, the background field method has 
been introduced by Doering and Constantin (1994) as 
an alternative way for obtaining bounds on properties 
of turbulent flows. When optimized, it becomes 
equivalent to the variational method discussed in this 
article as has been demonstrated by Kerswell (1998). 
The fact that non-optimized bounds can be obtained
relatively easily emphasizes the point that the
extremalizing vector fields are the most interesting aspect of
the variational problems. They often exhibit simila- 
rities with the observed turbulent velocity fields, in 
particular as far as the mean flows are concerned. In 
the case of convection in a layer heated from below, 
the transition of the bound from the 1-a solution to 
the 2-a solution corresponds closely to the experi- 
mentally observed transition from convection rolls to 
bimodal convection (Busse 1969). 

The close similarities between variational functionals 
for rather different physical systems suggest corre- 
sponding similarities between the respective turbulent 
fields. For example, the analogy between the fluctuat- 
ing component of the temperature in turbulent convec- 
tion and the streamwise component of the fluctuating 
velocity field in shear flow turbulence has been 
demonstrated and employed in a theory of the atmo- 
spheric boundary layer (Busse 1978). Better bounds 
and more physically realistic properties of the extre- 
malizing vector fields can be expected when additional 
constraints are imposed. For example, the energy 
balances for poloidal and toroidal components of the 
velocity field can be applied separately. But these 
developments are still in their initial stages. 


See also: Bifurcations in Fluid Dynamics; Fluid
Mechanics: Numerical Methods; Turbulence Theories.


Ginzburg-Landau-type problems are variational
problems which consider a Dirichlet-type energy 
posed on complex-valued functions, penalized by a 
potential term which has a well in the unit circle of 
the complex plane. The denomination comes from 
the physical model of superconductivity of Ginzburg 
and Landau. They are phase-transition-type models 
in the sense that they describe the state of the 
material according to different “phases” which can 
coexist in a sample and be separated by various 
types of interfaces. We start by presenting the 
physical model (readers familiar with it may wish 
to skip the next two sections and go straight to the 
section “The simplified model”). 


Introduction to the Ginzburg-Landau Model 


The Ginzburg-Landau model was introduced by
Ginzburg and Landau in the 1950s as a phenomenological
model to describe superconductivity,
and was later justified as a limit of the quantum
BCS theory of Bardeen-Cooper-Schrieffer. It is a
model of great importance and recognition in physics
(with several Nobel prizes awarded for it: Landau,
Ginzburg, Abrikosov). In addition to its importance
in the modeling of superconductivity, the Ginzburg-Landau
model turns out to be mathematically
extremely close to the Gross-Pitaevskii model for
superfluidity, and models for rotating Bose-Einstein
condensates, which all have in common the appearance
of topological defects called “vortices.”

Superconductivity, which was discovered in 1911
by Kamerlingh Onnes, consists in the complete loss
of resistivity of certain metals and alloys at very low 
temperatures: the two most striking consequences of 
it being the possibility of permanent superconduct- 
ing currents and the particular behavior that an 
external magnetic field applied to the sample gets 
expelled from the material and can generate 
vortices, through which it penetrates the sample. 


The Energy Functional 


After a series of dimension reductions, the Ginzburg-Landau
model describes the state of the
superconducting sample occupying a region $\Omega$
and submitted to the external magnetic field $h_{ex}$,
below the critical temperature, through its Gibbs
energy:


$$G_\varepsilon(\psi, A) = \frac12\int_\Omega |\nabla_A\psi|^2 + \frac{(1-|\psi|^2)^2}{2\varepsilon^2} + \frac12\int_{\mathbb{R}^3} |\operatorname{curl} A - h_{ex}|^2 \qquad [1]$$


In this expression, the first unknown $\psi$ is the
“order parameter” in physics. It is a complex-valued
condensed wave function, indicating the local state
of the material, or the phase (in the Landau theory
approach of phase transitions): $|\psi|^2$ is the density of
the “Cooper pairs” of superconducting electrons
explaining superconductivity in the BCS approach.
With our normalization $|\psi| \le 1$, and where $|\psi| \simeq 1$
the material is in the superconducting phase, while
where $|\psi| \simeq 0$, it is in the normal phase (i.e., behaves
like a normal conductor), the two phases being able
to coexist in the sample.

The second unknown A is the electromagnetic
vector potential of the magnetic field, a function
from $\Omega$ to $\mathbb{R}^3$. The induced magnetic field in the
sample is deduced by $h = \operatorname{curl} A$. The notation $\nabla_A$
denotes the covariant derivative $\nabla - iA$. The superconducting
current is the vector j of components


$$j_k = (i\psi, (\nabla_A)_k\,\psi) \qquad [2]$$


where (.,.) denotes the scalar product in $\mathbb{C}$
identified with $\mathbb{R}^2$.

Finally, the parameter $\varepsilon$ is the inverse of the
“Ginzburg-Landau parameter” $\kappa$, a dimensionless
parameter (ratio of the penetration depth and
the coherence length) depending on the material only.

Most variational studies of Ginzburg-Landau
focus on the regime of large $\kappa$ or small $\varepsilon$,
corresponding to “extreme type-II” superconductors,
also called the London limit. In this limit, the
potential term acts as a singular perturbation, and
the characteristic size of the vortices is $\varepsilon \to 0$;
vortices become line-like topological singularities,
which makes it easier to extract and describe them.

This model is a U(1)-gauge theory, that is, it is
invariant under the gauge transformations:


$$\psi \mapsto \psi e^{i\Phi}, \qquad A \mapsto A + \nabla\Phi \qquad [3]$$

where $\Phi$ is a smooth real-valued function. The
physically relevant quantities are those that are
gauge invariant, such as the energy $G_\varepsilon$, $|\psi|$, $h$, and
the superconducting current j.
For more on the model, we refer to the physics 
literature (e.g., DeGennes (1966) and Tinkham 
(1996)). 
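
The gauge invariance can be checked by a direct computation. The lines below are a sketch that uses only the definition $\nabla_A = \nabla - iA$; the primed symbols $\psi'$ and $A'$ are notation introduced here for the transformed pair and are not part of the original text.

```latex
\begin{aligned}
\psi' &= \psi e^{i\Phi}, \qquad A' = A + \nabla\Phi,\\
\nabla_{A'}\psi' &= (\nabla - iA - i\nabla\Phi)\bigl(\psi e^{i\Phi}\bigr)
  = e^{i\Phi}\bigl(\nabla\psi + i\psi\nabla\Phi - iA\psi - i\psi\nabla\Phi\bigr)
  = e^{i\Phi}\,\nabla_A\psi,\\
|\nabla_{A'}\psi'| &= |\nabla_A\psi|, \qquad |\psi'| = |\psi|, \qquad
\operatorname{curl} A' = \operatorname{curl} A + \operatorname{curl}\nabla\Phi = \operatorname{curl} A,\\
(i\psi', \nabla_{A'}\psi') &= (i e^{i\Phi}\psi,\; e^{i\Phi}\nabla_A\psi) = (i\psi, \nabla_A\psi).
\end{aligned}
```

Every term of the energy, as well as the current j, is therefore unchanged by the transformation.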


Reductions of the Model 


The goal of variational studies of the Ginzburg- 
Landau model is to relate the energy to the vortices 
and the applied field. In three dimensions (3D), 
vortices are filaments, or lines of zeros of the order 
parameter $\psi$, around which $\psi$ has a nonzero
winding number. These are quite delicate to describe 
in 3D (we will mention some results below), so a 
simplification that is commonly made consists in 
reducing to a two-dimensional model. 

When reducing to 2D, one assumes that every- 
thing is independent of the vertical direction, and 
that the applied magnetic field is also vertical. The 
domain $\Omega$ is then a two-dimensional, bounded and
(for simplicity) simply connected open set, which is
the horizontal section of an infinite vertical
cylinder. One can also imagine it represents a thin
film.

In 2D, the energy is written the same way:

$$G_\varepsilon(\psi, A) = \frac12\int_\Omega |\nabla_A\psi|^2 + \frac{(1-|\psi|^2)^2}{2\varepsilon^2} + |\operatorname{curl} A - h_{ex}|^2 \qquad [4]$$


where this time A is $\mathbb{R}^2$-valued, and the induced
magnetic field $h = \operatorname{curl} A = \partial_1 A_2 - \partial_2 A_1$ is now a
real-valued function, which can be taken to be equal
to $h_{ex}$ (now a real positive number) in $\mathbb{R}^2 \setminus \Omega$.

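The stationary states of the system are the critical
points of $G_\varepsilon$, or the solutions of the Ginzburg-Landau
equations:


$$\begin{cases}
-(\nabla_A)^2\psi = \dfrac{1}{\varepsilon^2}\,\psi\,(1-|\psi|^2) & \text{in } \Omega\\
-\nabla^\perp h = (i\psi, \nabla_A\psi) & \text{in } \Omega\\
h = h_{ex} & \text{on } \partial\Omega\\
\nabla_A\psi\cdot\nu = 0 & \text{on } \partial\Omega
\end{cases} \qquad [5]$$


where $\nabla^\perp$ denotes $(-\partial_{x_2}, \partial_{x_1})$.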

A common simplification consists in suppressing 
the magnetic field, and thus in studying the 
simplified energy 


$$E_\varepsilon(u) = \frac12\int_\Omega |\nabla u|^2 + \frac{(1-|u|^2)^2}{2\varepsilon^2} \qquad [6]$$


where the order parameter is commonly denoted by 
u, and is still complex valued. This energy, which 
can be seen as a complex analog of the real-valued 
Allen—Cahn model of phase transitions, has been 
extensively studied, especially since the work of 
Bethuel-Brezis-Hélein, where the domain $\Omega$ is
assumed to be two dimensional and simply connected.
The higher-dimensional case has also been
considered.


Vortices and Critical Fields


We now need to explain more precisely what a 
vortex is. In two dimensions, a vortex is an object 
centered at an isolated zero of u (or $\psi$), around
which the phase of u has a nonzero winding number
called the “degree of the vortex.” It is the simplest
example of a topological defect. If the zero is located
at $x_0$, the winding number or degree is the integer
that can be computed by


$$\frac{1}{2\pi}\int_{\partial B(x_0,r)} \frac{\partial\varphi}{\partial\tau} = d \in \mathbb{Z} \qquad [7]$$

where r is small enough, and $\varphi$ is the phase of u, that
is, u can be written $u = |u|e^{i\varphi}$. For example, the phase
$\varphi = d\theta$, where $\theta$ is the polar angle centered at $x_0$, yields
a vortex of degree d. Observe that the phase is not a
well-defined function: it is multivalued (and defined
modulo $2\pi$); however, we have the important relation


$$\operatorname{curl}\nabla\varphi = 2\pi\sum_i d_i\,\delta_{a_i} \qquad [8]$$


where the $a_i$’s are the zeros of u, the $d_i$’s the associated
degrees, and $\delta_x$ denotes the Dirac mass at x.
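
The degree formula [7] can be illustrated numerically. The sketch below sums wrapped phase increments along a discretized circle; the helper names `winding_number` and `model_vortex`, the sample count, and the particular test field are illustrative assumptions, not objects taken from the text.

```python
import numpy as np

def winding_number(u, center, radius, samples=2048):
    """Approximate the degree [7] of the complex field u on the circle |x - center| = radius."""
    theta = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    values = u(center + radius * np.exp(1j * theta))
    # Phase increment between consecutive sample points, wrapped into (-pi, pi].
    increments = np.angle(np.roll(values, -1) / values)
    # The increments sum to 2*pi*d, the total change of phase around the circle.
    return int(round(float(increments.sum()) / (2.0 * np.pi)))

def model_vortex(d, x0=0j):
    """Hypothetical test field: phase d*theta around x0, modulus vanishing at x0."""
    def u(x):
        w = x - x0
        return np.abs(w) / np.sqrt(1.0 + np.abs(w) ** 2) * np.exp(1j * d * np.angle(w))
    return u

degrees = [winding_number(model_vortex(d), 0j, 0.5) for d in (-2, -1, 1, 3)]
print(degrees)
```

The circle must avoid the zeros of u, and the sampling must be fine enough that consecutive phase increments stay below $\pi$ in absolute value; otherwise the wrapping step miscounts.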

When $\varepsilon$ is small, it is clear from [4] or [6] that $|u|$
prefers to be close to 1, and a scaling argument hints
that $|u|$ is different from 1 in regions of characteristic
size $\varepsilon$. Of course this is an intuitive picture and several
mathematical notions are used to describe the vortices.

Vortices appear due to the applied field $h_{ex}$. For
type-II superconductors there are essentially three
critical fields, $H_{c_1}$, $H_{c_2}$, $H_{c_3}$, critical values of $h_{ex}$ for
which phase transitions occur. For $h_{ex} < H_{c_1} = O(|\log\varepsilon|)$,
there are no vortices and the
superconductor is in the superconducting phase
$|\psi| \simeq 1$ everywhere. At $H_{c_1}$, the first vortices appear,
and their number increases as $h_{ex}$ is raised. When
they become numerous they tend to arrange in
triangular lattices called Abrikosov lattices, as
observed in experiments and predicted by Abrikosov
from the Ginzburg-Landau model, in a very
influential work. At the second critical field
$H_{c_2} = O(1/\varepsilon^2)$ bulk superconductivity is destroyed,
and surface superconductivity remains until
$H_{c_3} = O(1/\varepsilon^2)$, the third critical field, above which
$\psi = 0$ and the material is normal.


Issues and Methods 


The variational approach to Ginzburg-Landau consists
in expressing the energy in terms of reduced
quantities or objects, in particular in terms of the
vortices. This requires developing mathematical tools
to describe and characterize the vortices (in particular
giving suitable definitions of a “vortex structure”
for a given u or $\psi$), and estimating precisely the energetic
cost of each vortex and of their interaction. This
allows us to obtain results of variational convergence
of the energies $G_\varepsilon$, $E_\varepsilon$ (or their variants), that is, to
derive $\Gamma$-limits, or “reduced problems” posed in terms
of the vortices, which are easier to minimize than the
original ones. These limits depend on the regime of
applied field, and allow one to characterize, in turn,
the critical fields, and the optimal repartition and
number of the vortices, if any.

Variational methods also serve to solve some 
inverse problems, that is, to prove the existence of 
solutions of the equation which have some given 
properties, such as a given repartition of vortices, 
through local minimization procedures, or the use of 
topological methods based on investigating the 
topology of the energy levels. 

Nonvariational approaches of Ginzburg-Landau 
are also very useful, in particular to identify the 
profiles of the solutions, to describe vortices of 
nonminimizing critical points, or to perform a bifurca- 
tion analysis around the normal solution at H,,. 


The Simplified Model 


We first present the variational study of $E_\varepsilon$ [6] in
dimension 2, together with the mathematical tools
used for both [6] and [4]. We will restrict to the
asymptotics $\varepsilon \to 0$, since this is the situation where
the most results are known.

Let us present informally the essential ingredients 
of the analysis. 


Tracing the Vortices 


The easiest way to trace the vortices is to use the
current $(iu, \nabla u)$ (or the “superconducting current”
$j = (i\psi, \nabla_A\psi)$ for the case with magnetic field). Here
we recall (.,.) denotes the scalar product in $\mathbb{C}$ as
identified with $\mathbb{R}^2$, that is, $(iu, \nabla u) = (u \times \partial_1 u,\ u \times \partial_2 u)$
with $\times$ the vector product in $\mathbb{R}^2$.

The curl of the current is the vorticity of the map u,
exactly like in fluid mechanics. Writing $u = \rho e^{i\varphi}$ we
have (at least formally) $(iu, \nabla u) = \rho^2\nabla\varphi$ and since
$\rho = |u|$ is close to 1 (other than in the small vortex
regions), we have the approximation


$$\operatorname{curl}(iu, \nabla u) = \operatorname{curl}(\rho^2\nabla\varphi) \simeq \operatorname{curl}\nabla\varphi = 2\pi\sum_i d_i\,\delta_{a_i} \qquad [9]$$


where the $a_i$’s are the zeros of u (or its vortices) and
the $d_i$’s their degrees, or


$$\operatorname{curl}(i\psi, \nabla_A\psi) + \operatorname{curl} A \simeq \operatorname{curl}\nabla\varphi$$

in the case with magnetic field. This can be made
rigorous (see Jerrard and Soner (2002) and Sandier
and Serfaty (to appear)), that is, one can express that


$$\operatorname{curl}(iu, \nabla u) - 2\pi\sum_i d_i\,\delta_{a_i} \to 0 \quad \text{as } \varepsilon \to 0 \qquad [10]$$


(or respectively $\operatorname{curl}(i\psi, \nabla_A\psi) + \operatorname{curl} A - 2\pi\sum_i d_i\,\delta_{a_i} \to 0$)
in some weak functional norm, thus giving a
rigorous use of [8]. The quantity


$$\mu(u) = \operatorname{curl}(iu, \nabla u) \qquad [11]$$

or

$$\mu(\psi, A) = \operatorname{curl}(i\psi, \nabla_A\psi) + \operatorname{curl} A = \operatorname{curl} j + h \qquad [12]$$


in the case with magnetic field, will thus be called
the vorticity and be used to trace the vortices, in this
limit $\varepsilon \to 0$. The relation


$$\mu - 2\pi\sum_i d_i\,\delta_{a_i} \to 0 \quad \text{as } \varepsilon \to 0 \qquad [13]$$


states that it is close to being a measure.

This is also called the Jacobian determinant if
written (with differential forms) $Ju = d(iu, du) = (i\,du, du) = 2(u_{x_1} \times u_{x_2})\,dx_1 \wedge dx_2$,
and under this
form it can be used in higher dimensions.


The Cost of Each Vortex 


Here we investigate informally the cost of a vortex 
of degree d. We know already that the characteristic
length scale of variation of u is $\varepsilon$, and that
$(1 - |u|^2)^2$ is strongly penalized. Thus, we may expect
that $|u|$ is close to 1 at a distance $\gg \varepsilon$ from the zeros.
Assuming that $x_0$ is a zero of u, and taking formally
$|u| = 1$ for $|x - x_0| \ge \varepsilon$, we may write $u = e^{i\varphi}$ and
$|\nabla u| = |\nabla\varphi|$ for $|x - x_0| \ge \varepsilon$.
Then, we have


$$\frac12\int_{R \ge |x - x_0| \ge \varepsilon} |\nabla u|^2 \ge \frac12\int_\varepsilon^R \int_{\partial B(x_0,r)} \left|\frac{\partial\varphi}{\partial\tau}\right|^2 dr$$

$$\ge \frac12\int_\varepsilon^R \left(\int_{\partial B(x_0,r)} \frac{\partial\varphi}{\partial\tau}\right)^2 \frac{1}{2\pi r}\, dr \qquad [14]$$

$$= \frac12\int_\varepsilon^R \frac{(2\pi d)^2}{2\pi r}\, dr = \pi d^2 \log\frac{R}{\varepsilon} \qquad [15]$$


where we have used the Cauchy-Schwarz inequality
for [14], and the characterization of the degree [7].
We may also observe that this lower bound is sharp
if $\partial\varphi/\partial\tau$ is constant, that is, if the phase is $d\theta$ (and
the vortex radial). The cost associated to $|u|$ in the
energy imposes the length scale $\varepsilon$ and is generally of
order 1 ($|\nabla u| \le C/\varepsilon$), thus negligible compared to
the cost associated to the phase, which blows up as
$\log(1/\varepsilon)$ as $\varepsilon \to 0$.

The above estimate is only valid as long as 
$B(x_0, R)$ does not contain any other zero of u. If
vortices get close to each other or become numer- 
ous, one needs refined techniques to estimate their 
cost. This can be done through a “ball-construction 
method” introduced independently by Jerrard and 
Sandier. 
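
The lower bound [15] and its sharpness for the radial phase can be checked numerically. The following is a minimal sketch under stated assumptions: the modulus is taken identically 1 on the annulus, only the tangential derivative of the phase is integrated (as in the first inequality above), and the function name `phase_energy`, the grid sizes, and the perturbed test phase are illustrative choices rather than anything from the text.

```python
import numpy as np

def phase_energy(u, eps, R, n_r=200, n_theta=512):
    """Approximate (1/2) * integral of |d(phase)/d(tau)|^2 over the annulus eps <= |x| <= R."""
    s = np.linspace(np.log(eps), np.log(R), n_r)          # s = log r, uniform grid
    r = np.exp(s)
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    dtheta = 2.0 * np.pi / n_theta
    ring = np.empty(n_r)
    for k, rk in enumerate(r):
        vals = u(rk * np.exp(1j * theta))
        dphi = np.angle(np.roll(vals, -1) / vals)         # wrapped phase increments
        dphi_dtau = dphi / (rk * dtheta)                  # tangential derivative of the phase
        ring[k] = np.sum(dphi_dtau ** 2) * rk * dtheta    # integral over the circle of radius rk
    # Integral over r of ring(r) dr, rewritten as integral over s of ring * r ds (trapezoid rule).
    integrand = ring * r
    return 0.5 * np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(s))

eps, R = 1e-3, 1.0
for d in (1, 2):
    radial = lambda x, d=d: np.exp(1j * d * np.angle(x))  # phase d * theta
    print(d, phase_energy(radial, eps, R), np.pi * d ** 2 * np.log(R / eps))

# A degree-1 phase that is not d * theta: same degree, strictly larger energy.
wobbly = lambda x: np.exp(1j * (np.angle(x) + 0.5 * np.sin(np.angle(x))))
print(phase_energy(wobbly, eps, R))
```

For the radial phase the two printed columns agree up to rounding, while the perturbed phase of the same degree gives a larger value, in line with the equality case of the Cauchy-Schwarz step.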


Evaluating the Total Interaction Cost of Vortices 


In a first approach, one studies configurations which
satisfy the upper bound $E_\varepsilon(u) \le C|\log\varepsilon|$. Then,
lower bounds of the type [15] show that the total
sum of the degrees (hence the total number of
vortices of nonzero degree) remains bounded as $\varepsilon \to 0$.
Up to extraction, we may assume these zeros $a_i$
converge as $\varepsilon \to 0$ to a finite set of points $p_i$, with a
total degree still denoted $d_i$. This can also be expressed
as $\mu(u_\varepsilon) \to 2\pi\sum_i d_i\,\delta_{p_i}$ as $\varepsilon \to 0$.

This is not the only case of interest, since 
unbounded numbers of vortices do arise, especially 
in the physical situation of the energy with magnetic 
field, as we will see in the next section. However, 
this hypothesis, which was made in the work of 
Bethuel—Brezis—Hélein, makes the analysis easier 
and already allows us to exhibit the main 
phenomena. 

Vortices in superconductors are generated by the
presence of the external magnetic field $h_{ex}$. For the
energy without magnetic field, this has to be
replaced by some boundary condition which forces
some degree. Bethuel-Brezis-Hélein considered the
fixed Dirichlet boundary condition $u_\varepsilon = g$ on $\partial\Omega$,
where g is a fixed unit-valued map on $\partial\Omega$, of degree
$d > 0$. This forces u to have a total degree d in $\Omega$.
However, the Neumann boundary condition, for
instance, can also be considered (the minimizers of
$E_\varepsilon$ are then simply constants, they are trivial, but
one can still look for other critical points).

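Let us return to lower bounds in order to look
for the next order term in the energy (still with
formal arguments). Cutting out holes $\cup_i B(p_i, \rho)$ of
fixed size $\rho$ around the limiting vortices $p_i$, we may
assume that $u = e^{i\varphi}$ in $\Omega \setminus \cup_i B(p_i, \rho) = \Omega_\rho$, with $\varphi$ a
real-valued function, defined modulo $2\pi$. Minimizing
the energy outside of the holes amounts to
solving


$$\min\left\{ \frac12\int_{\Omega_\rho} |\nabla u|^2 \;:\; |u| = 1 \text{ in } \Omega_\rho,\ u = g \text{ on } \partial\Omega,\ \deg(u, \partial B(p_i, \rho)) = d_i \right\}$$

This is a harmonic map problem, whose solution is
given in terms of $\varphi$ by


$$\begin{cases}
\Delta\varphi = 0 & \text{in } \Omega_\rho\\
\dfrac{\partial\varphi}{\partial\tau} = \left(ig, \dfrac{\partial g}{\partial\tau}\right) & \text{on } \partial\Omega\\
\displaystyle\int_{\partial B(p_i,\rho)} \frac{\partial\varphi}{\partial\tau} = 2\pi d_i
\end{cases}$$


and in terms of the harmonic conjugate $\Phi$, which is
the function (up to a constant) such that
$\nabla\varphi = \nabla^\perp\Phi$,

$$\begin{cases}
\Delta\Phi = 0 & \text{in } \Omega_\rho\\
\dfrac{\partial\Phi}{\partial\nu} = \left(ig, \dfrac{\partial g}{\partial\tau}\right) & \text{on } \partial\Omega\\
\displaystyle\int_{\partial B(p_i,\rho)} \frac{\partial\Phi}{\partial\nu} = 2\pi d_i
\end{cases} \qquad [16]$$


As $\rho \to 0$, $\Phi$ behaves like the solution $\Phi_0$ of

$$\begin{cases}
\Delta\Phi_0 = 2\pi\sum_i d_i\,\delta_{p_i} & \text{in } \Omega\\
\dfrac{\partial\Phi_0}{\partial\nu} = \left(ig, \dfrac{\partial g}{\partial\tau}\right) & \text{on } \partial\Omega
\end{cases} \qquad [17]$$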


Hence, we have


$$\frac12\int_{\Omega_\rho} |\nabla\Phi|^2 = \pi\sum_i d_i^2 \log\frac{1}{\rho} + W_d(p_1, \ldots, p_n) + o(1) \quad \text{as } \rho \to 0 \qquad [18]$$


where


$$W_d(p_1, \ldots, p_n) = -\pi\sum_{i \ne j} d_i d_j \log|p_i - p_j| + \frac12\int_{\partial\Omega} \Phi_0\left(ig, \frac{\partial g}{\partial\tau}\right) - \pi\sum_i d_i R(p_i) \qquad [19]$$


and $R(x) = \Phi_0(x) - \sum_i d_i \log|x - p_i|$. The function
W was introduced by Bethuel-Brezis-Hélein and
called the renormalized energy, since it consists in
the part of the energy that is left after subtracting
the “infinite part” in $|\log\varepsilon|$ from $E_\varepsilon$. It contains the
(logarithmic) interaction energy between the vortices:
we see that vortices with degrees of same sign
repel one another while vortices with degrees of
opposite signs attract one another. The $\pi d^2 \log(1/\rho)$
term corresponds to the self-interaction, or cost of
the vortex of core of size $\rho$; it is what replaces the
infinite term in the formal calculation.
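
The repulsion and attraction statements can be read off from the pairwise logarithmic term alone. The sketch below evaluates only that term; it deliberately drops the boundary-dependent contributions of the renormalized energy (an assumption made here for illustration), so `log_interaction` is a hypothetical helper and not the full W.

```python
import math
from itertools import permutations

def log_interaction(points, degrees):
    """Pairwise term -pi * sum_{i != j} d_i d_j log|p_i - p_j|.

    Assumption: the boundary-dependent terms of the renormalized energy are
    dropped, so this is only the explicit vortex-vortex part.
    """
    total = 0.0
    for i, j in permutations(range(len(points)), 2):  # ordered pairs with i != j
        total += degrees[i] * degrees[j] * math.log(math.dist(points[i], points[j]))
    return -math.pi * total

for separation in (0.25, 0.5, 1.0, 2.0):
    pts = [(0.0, 0.0), (separation, 0.0)]
    print(separation, log_interaction(pts, [1, 1]), log_interaction(pts, [1, -1]))
```

For two vortices of the same sign the value decreases as the separation grows (repulsion), and for opposite signs it decreases as they approach (attraction).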

Now [18] is a good estimate for the optimal
energy outside of the holes, while the energy in holes
of size $\rho$ can be bounded below by [15]. Given the
degree $d_i$ on the boundary $\partial B(p_i, \rho)$ of the small
hole, $B(p_i, \rho)$ contains one or several zeros of u of
degrees $\delta_k$, with total degree $\sum_k \delta_k = d_i$. In view of
[15], since the cost of a vortex of degree d grows like
$\pi d^2|\log\varepsilon|$, and since the infimum of $\sum_k \delta_k^2$ under the
constraint $\sum_k \delta_k = d_i$ is achieved for $\delta_k = \operatorname{sign}(d_i)$, the least costly
way to achieve this is to have $|d_i|$ vortices of degree
$\operatorname{sign}(d_i)$. The smallest lower bound possible is thus


$$\frac12\int_{B(p_i,\rho)} |\nabla u|^2 + \frac{(1-|u|^2)^2}{2\varepsilon^2} \ge \pi|d_i| \log\frac{\rho}{\varepsilon} + C \qquad [20]$$


where the constant C can be described explicitly.
Adding up the results of [20] and [18], we find


$$E_\varepsilon(u) \ge \pi\sum_i d_i^2 \log\frac{1}{\rho} + \pi\sum_i |d_i| \log\frac{\rho}{\varepsilon} + W_d(p_1, \ldots, p_n) + nC + o_\varepsilon(1)$$

$$\ge \pi\sum_i |d_i| \log\frac{1}{\varepsilon} + W_d(p_1, \ldots, p_n) + nC + o_\varepsilon(1) \qquad [21]$$


with equality only if u has $|d_i|$ zeros of degree
$\operatorname{sign}(d_i)$ in each $B(p_i, \rho)$.

This provides a lower bound of the energy in 
terms of the vortices. Moreover, this bound is sharp: 
one can construct test configurations which have the 
given limiting vortices (p;,d;), and an energy equal 
to the right-hand side of [21]. 

One can thus deduce the behavior of global
minimizers of the energy. Given the total degree
$d = \deg(g) > 0$ on $\partial\Omega$, we need $\sum_i d_i = d$, and the
lowest value achievable under this constraint in
the right-hand side of [21] is to have $d_i = 1$ for
every i, and thus to have exactly d vortices of
degree 1. Moreover, the limiting points $p_i$’s
should minimize W. We thus are led to the first
main result.


Theorem 1 (Bethuel-Brezis-Hélein). Minimizers of
$E_\varepsilon$ under the boundary condition u = g, deg(g) = d > 0,
have d zeros of degree 1, which converge as $\varepsilon \to 0$
to a minimizer of W.


This result can be rephrased as a result of
$\Gamma$-convergence of $E_\varepsilon - \pi d|\log\varepsilon|$. It reduces the
minimization of $E_\varepsilon$ to one of W, which is a finite-dimensional
problem (interaction of point charges).
Thus, we see again the interest of studying this
asymptotic limit $\varepsilon \to 0$ because the vortices become
pointlike and the problem reduces to a finite-dimensional
one, or one of minimizing the vortex
interaction.


Further Results 


A nonvariational approach also allowed Bethuel-Brezis-Hélein
to prove a further correspondence
between $E_\varepsilon$ and W: they obtained that critical points
of $E_\varepsilon$, under the upper bound $E_\varepsilon \le C|\log\varepsilon|$, have
vortices which converge to a critical point of W.
Other important results are the study of the blow-up
profiles or solutions in the whole plane, by
Brezis-Merle-Rivière and Mironescu.

In two dimensions, the variational approach is 
also used to solve inverse problems (construct 
solutions) and study variants of the energy with 
pinning (or weighted) terms. 

The variational approach is also fruitful in higher
dimensions. In dimension 3, for example, vortices are
not points but vortex lines, and the Jacobian
$Ju = d(iu, du)$ can be seen as a current carried by the
vortex line, with $\|Ju\|$, the total mass of the current, equal to
$\pi$ times the length of the line, and it was established by
Jerrard and Soner that $Ju_\varepsilon$ is compact in some weak
sense, and converges, up to extraction, to some $\pi$ times
integer-multiplicity rectifiable current J, with


$$\liminf_{\varepsilon \to 0} \frac{E_\varepsilon(u_\varepsilon)}{|\log\varepsilon|} \ge \|J\|$$


In fact, a complete $\Gamma$-convergence result of
$E_\varepsilon/|\log\varepsilon|$ can be proved, see the work of
Alberti-Baldo-Orlandi, and thus minimizing $E_\varepsilon$ reduces at
the limit to minimizing the length of the line, leading
to straight lines, or in higher dimensions, to
codimension-2 minimal currents. This is a nontrivial
problem, in contrast to dimension 2, where the $\Gamma$-limit
of $E_\varepsilon/|\log\varepsilon|$ is trivial, which requires going to
the lower-order term to find the nontrivial renormalized
energy limit W.


The Functional with Magnetic Field 


The aim here is to achieve the same objective: 
express or bound from below the energy by terms 
which depend only on the vortices and their degrees. 
The method consists in transposing the type of 
analysis above taking into account the magnetic 
field contribution to see how the external field 
triggers the sudden appearance of vortices, and for 
what values they appear (thus retrieving the critical 
fields, etc.). One of the main difficulties consists in the
fact that the number of vortices becomes divergent,
which requires more delicate estimates. Also, it is then
no longer possible to study the convergence of the
individual zeros of $\psi$, so one studies instead the limit of
rescalings of the vorticity measures $\mu(\psi, A)$.


Splitting of the Energy and Main Results 


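Let us recall that in the case with magnetic field, the
vorticity is given by [12]. In addition, we may
assume that the second set of equations in [5]


$$-\nabla^\perp h = j \ \text{ in } \Omega, \qquad h = h_{ex} \ \text{ on } \partial\Omega \qquad [22]$$

is satisfied (if not, keeping $\psi$ fixed and choosing A
which satisfies this equation always decreases the
energy). Taking the curl of this equation, we find
exactly


$$\begin{cases}
-\Delta h + h = \mu(\psi, A) & \text{in } \Omega\\
h = h_{ex} & \text{on } \partial\Omega
\end{cases} \qquad [23]$$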
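
The step from [22] to [23] can be spelled out as follows. This is a short derivation sketch that uses only the definition $\nabla^\perp = (-\partial_{x_2}, \partial_{x_1})$, the two-dimensional curl $\operatorname{curl}(v_1, v_2) = \partial_{x_1} v_2 - \partial_{x_2} v_1$, and the definition [12] of the vorticity.

```latex
\begin{aligned}
\operatorname{curl}(\nabla^\perp h) &= \partial_{x_1}(\partial_{x_1} h) - \partial_{x_2}(-\partial_{x_2} h) = \Delta h,\\
\operatorname{curl} j &= \mu(\psi, A) - h \qquad \text{(by [12])},\\
-\nabla^\perp h = j \;&\Longrightarrow\; -\Delta h = \operatorname{curl} j = \mu(\psi, A) - h
\;\Longrightarrow\; -\Delta h + h = \mu(\psi, A).
\end{aligned}
```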


Thus, the vorticity and the induced magnetic field 
are in one-to-one correspondence with each other. 
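Combining it with the relation [13], we are led to the
approximate relation


$$\begin{cases}
-\Delta h + h \simeq 2\pi\sum_i d_i\,\delta_{a_i} & \text{in } \Omega\\
h = h_{ex} & \text{on } \partial\Omega
\end{cases} \qquad [24]$$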


where again the $a_i$’s are the vortex centers and $d_i$’s
their degrees, well known in physics as the
“London equation.” It shows how the magnetic
field is induced by the vortices which act like
“charges,” and how the magnetic field “penetrates
the sample” around the positive vortex locations.
Of course this equation is only an approximation,
because the singularities at the $a_i$’s, where h would
become infinite, are really smoothed out in $\mu(\psi, A)$;
however, the approximation is good far from 
the vortex cores, just as [17] is an approximation 
for [16]. 

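It is then natural to introduce the field corresponding
to the vortex-free situation, which is $h_{ex} h_0$
where $h_0$ solves


$$\begin{cases}
-\Delta h_0 + h_0 = 0 & \text{in } \Omega\\
h_0 = 1 & \text{on } \partial\Omega
\end{cases} \qquad [25]$$

$h_0$ is thus a fixed smooth function, depending only
on $\Omega$, and when there are no vortices, we expect h to
be approximately $h_{ex} h_0$. Moreover, $h' := h - h_{ex} h_0$
then solves


$$\begin{cases}
-\Delta h' + h' = \mu(\psi, A) \simeq 2\pi\sum_i d_i\,\delta_{a_i} & \text{in } \Omega\\
h' = 0 & \text{on } \partial\Omega
\end{cases} \qquad [26]$$

Defining the Green kernel $G(\cdot, y)$ by


$$\begin{cases}
-\Delta G + G = \delta_y & \text{in } \Omega\\
G = 0 & \text{on } \partial\Omega
\end{cases} \qquad [27]$$

and S by $S(x, y) = 2\pi G(x, y) + \log|x - y|$, for x far
enough from the $a_i$’s, we may approximate $h'$ by


$$h'(x) \simeq 2\pi\sum_i d_i\, G(x, a_i) \qquad [28]$$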

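Using the second Ginzburg-Landau equation [22]
and the fact that $|\psi| \le 1$, we have $|\nabla_A\psi| \ge |j| = |\nabla h|$,
thus $G_\varepsilon(\psi, A) \ge \frac12\int_\Omega |\nabla h|^2 + |h - h_{ex}|^2$. Plugging
in the decomposition $h = h_{ex} h_0 + h'$ and using an
integration by parts and [26], one finds


$$G_\varepsilon(\psi, A) \ge \frac12 h_{ex}^2 \int_\Omega |\nabla h_0|^2 + |h_0 - 1|^2 + h_{ex}\int_\Omega \nabla h_0\cdot\nabla h' + (h_0 - 1)h' + \frac12\int_\Omega |\nabla h'|^2 + |h'|^2$$

$$= h_{ex}^2 J_0 + h_{ex}\int_\Omega (h_0 - 1)\,\mu(\psi, A) + \frac12\int_\Omega |\nabla h'|^2 + |h'|^2 \qquad [29]$$


where $J_0$ is the constant $\frac12\int_\Omega |\nabla h_0|^2 + |h_0 - 1|^2$.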
The right-hand side of eqn [29] can be expressed
in terms of the vortices. First, using [26], we
have $\int_\Omega (h_0 - 1)\,\mu(\psi, A) \simeq 2\pi\sum_i d_i (h_0 - 1)(a_i)$. Second,
the expression $\int_\Omega |\nabla h'|^2 + |h'|^2$ can be treated exactly
like $E_\varepsilon(u)$ in the previous section: using lower bounds for
the cost of vortices provided by the Jerrard-Sandier
method, we are led to the (approximate) relation


$$\frac12\int_\Omega |\nabla h'|^2 + |h'|^2 \simeq \pi\sum_i |d_i| \log\frac{1}{\varepsilon} - \pi\sum_{i \ne j} d_i d_j \log|a_i - a_j| + \pi\sum_{i,j} d_i d_j\, S(a_i, a_j) \qquad [30]$$

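Combining this with [29] we find the decomposition

$$G_\varepsilon(\psi, A) \ge h_{ex}^2 J_0 + \pi\sum_i |d_i|\,|\log\varepsilon| + 2\pi h_{ex}\sum_i d_i (h_0 - 1)(a_i) - \pi\sum_{i \ne j} d_i d_j \log|a_i - a_j| + \pi\sum_{i,j} d_i d_j\, S(a_i, a_j) \qquad [31]$$
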
On the other hand, this inequality is sharp: as
before, given vortices $a_i$ one can construct a
configuration $(\psi, A)$ for which this is an equality,
at leading order.

In that relation, $h_{ex}^2 J_0$ is a fixed energy, the energy
of the vortex-free configuration. To it are added the
intrinsic cost of each vortex $\pi|d_i||\log\varepsilon|$, the interaction
cost between vortices, and the interaction
between the vortices and the external field
$2\pi h_{ex}\sum_i d_i (h_0 - 1)(a_i)$.

It is then simple, by minimizing the right-hand
side with respect to the vortices for a given $h_{ex}$, and
observing that $h_0 - 1 \le 0$, to deduce a few basic
facts about vortices: vortices of positive degree (and
of degree +1) are preferred, each vortex costs
$\pi|\log\varepsilon|$, and allows one to gain at best an energy
$2\pi h_{ex} \max|h_0 - 1|$ when placed at the minimum of
$h_0 - 1$. Therefore, vortices become favorable when
their cost becomes smaller than the gain, that is,
when $h_{ex}$ becomes larger than the “first critical field”


$$H_{c_1} = \frac{|\log\varepsilon|}{2\,|\min(h_0 - 1)|} \qquad [32]$$


We have the first main result. 


Theorem 2 (Sandier-Serfaty). When $\varepsilon$ is small
enough and $h_{ex} < H_{c_1}$, then minimizers of $G_\varepsilon$ have
no vortices.
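
The leading-order formula [32] can be made concrete on a model domain. The sketch below assumes that $\Omega$ is a disk (an assumption made here, not in the text), in which case the vortex-free profile is radial, $h_0(r) = I_0(r)/I_0(R)$ with $I_0$ the modified Bessel function, and the minimum of $h_0 - 1$ sits at the center. The function names are illustrative, and $I_0$ is evaluated from its power series to keep the example self-contained.

```python
import math

def bessel_i0(x, terms=60):
    """Modified Bessel function I_0(x) from its power series sum_k (x^2/4)^k / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x * x / 4.0) / ((k + 1) ** 2)
    return total

def h0_disk(r, radius):
    """Radial solution of -Laplace(h0) + h0 = 0 in a disk, with h0 = 1 on its boundary."""
    return bessel_i0(r) / bessel_i0(radius)

def first_critical_field(eps, radius):
    """Leading-order first critical field from [32] for a disk of the given radius."""
    return abs(math.log(eps)) / (2.0 * abs(h0_disk(0.0, radius) - 1.0))

for radius in (0.5, 1.0, 5.0):
    print(radius, first_critical_field(1e-4, radius))
```

In this model the critical field decreases as the disk grows and stays above $|\log\varepsilon|/2$, since $|\min(h_0 - 1)| < 1$.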


On the other hand, if $h_{ex} > H_{c_1}$, the vortices
cannot all be located at the same minimum point of
$h_0 - 1$, because their repulsion $-\pi\sum_{i \ne j} \log|a_i - a_j|$
would be infinite. There is thus a trade-off between
their repulsion and the cost for being far from the
minimum of $h_0 - 1$. Only if n, the number of
vortices, is small compared to $h_{ex}$ do the vortices
tend to concentrate near the minimum of $h_0 - 1$. If
so, then, assuming for simplicity that the minimum
of $h_0 - 1$ is achieved at a unique point p, and
denoting by Q the Hessian of $h_0 - 1$ at p, in the
relation above $(h_0 - 1)(a_i)$ can be approximated by
$\min(h_0 - 1) + (1/2)Q(a_i - p)$ and thus $G_\varepsilon(\psi, A)$ by


$$G_\varepsilon(\psi, A) \simeq h_{ex}^2 J_0 + \pi n|\log\varepsilon| + 2\pi n h_{ex} \min(h_0 - 1) + \pi h_{ex}\sum_i Q(a_i - p) - \pi\sum_{i \ne j} \log|a_i - a_j| + \pi n^2 S(p, p) \qquad [33]$$


From this relation, optimizing on $\ell$, the characteristic
distance to p and characteristic distance
between the vortices, we find that $\ell = \sqrt{n/h_{ex}}$ is
optimal.
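
The scaling of the optimal length can be recovered by a one-line optimization. The sketch below keeps only the two terms of the expansion that depend on the length scale; the rescaled points $x_i$, the function $f$, and the constant $c := \frac{1}{n}\sum_i Q(x_i)$ (assumed of order 1) are notation introduced here for illustration.

```latex
\begin{aligned}
a_i &= p + \ell x_i, \qquad |x_i| \sim 1,\\
\pi h_{ex}\sum_i Q(a_i - p) &= \pi h_{ex}\,\ell^2 \sum_i Q(x_i) = \pi c\, h_{ex}\, n\, \ell^2,\\
-\pi\sum_{i\neq j}\log|a_i - a_j| &= -\pi n(n-1)\log\ell - \pi\sum_{i\neq j}\log|x_i - x_j|,\\
f(\ell) &:= \pi c\, h_{ex}\, n\, \ell^2 - \pi n(n-1)\log\ell, \qquad
f'(\ell) = 0 \iff \ell^2 = \frac{n-1}{2c\,h_{ex}} \sim \frac{n}{h_{ex}}.
\end{aligned}
```

The first identity uses that Q is a quadratic form, hence homogeneous of degree 2; the final equivalence holds up to a constant factor, which is all the scaling statement claims.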

Moreover, optimizing with respect to n, we find
that n should remain bounded (as $\varepsilon \to 0$) when
$h_{ex} \le H_{c_1} + O(\log|\log\varepsilon|)$. In that regime, rescaling
by setting $x_i = (a_i - p)/\ell$, we have the following
result:

Theorem 3 (Sandier-Serfaty). There exist fields
$H_n \sim H_{c_1} + C(n-1)\log|\log\varepsilon|$ such that when
$H_n < h_{ex} < H_{n+1}$, minimizers of $G_\varepsilon$ have n vortices
of degree 1, and the rescaled vortices $x_i$’s tend to
minimize:


$$w_n(x_1, \ldots, x_n) = -\pi\sum_{i \ne j} \log|x_i - x_j| + \pi n\sum_{i=1}^{n} Q(x_i) \qquad [34]$$

If $h_{ex} - H_{c_1} \gg \log|\log\varepsilon|$, then the optimal number
of vortices n becomes unbounded as $\varepsilon \to 0$. The
analysis above still holds, but in order to get a
convergence of the vortices, one needs to rescale the
vorticity measure by n. There is an intermediate
regime, for $\log|\log\varepsilon| \ll h_{ex} - H_{c_1} \ll |\log\varepsilon|$, for
which n should be $\gg 1$ but still $n \ll h_{ex}$, so $\ell \ll 1$:
vortices are numerous, but still concentrate around p.
Rescaling by the scale $\ell$ as above, we prove that the
density of vortices (after dividing it by n) converges to
a probability measure, minimizer of the energy


$$I(\mu) = -\pi\iint \log|x - y|\, d\mu(x)\, d\mu(y) + \pi\int Q(x)\, d\mu(x) \qquad [35]$$


This is a continuum form of [34].

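If $h_{ex} - H_{c_1}$ is of order $|\log\varepsilon|$, then the optimal
number n becomes of order $h_{ex}$ and the vortices no
longer concentrate around a single point.

The simplest approach is then to simply consider
the vorticity measure $\mu(\psi, A)$ and to rescale it by the
order n, hence by $h_{ex}$. Then $(1/h_{ex})\,\mu(\psi, A)$ converges,
after extraction, to some measure $\mu_*$. A
continuous version of [31] can thus be written, using
[12], as


$$G_\varepsilon(\psi, A) \gtrsim h_{ex}^2\left( \frac{|\log\varepsilon|}{2h_{ex}}\int_\Omega |\mu_*| + \frac12\int_\Omega |\nabla h_{\mu_*}|^2 + |h_{\mu_*} - 1|^2 \right) \qquad [36]$$


where $h_{\mu_*}$ solves


$$\begin{cases}
-\Delta h_{\mu_*} + h_{\mu_*} = \mu_* & \text{in } \Omega\\
h_{\mu_*} = 1 & \text{on } \partial\Omega
\end{cases}$$
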
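Again, this inequality can be proved to be sharp (by
a construction) and allows one to show that minimizers
of $G_\varepsilon$ have a vorticity $\mu(\psi, A)$ such that $\mu(\psi, A)/h_{ex}$
converges to a minimizer of


$$G(\mu) = \frac12\left(\lim_{\varepsilon \to 0}\frac{|\log\varepsilon|}{h_{ex}}\right)\int_\Omega |\mu| + \frac12\int_\Omega |\nabla h_\mu|^2 + |h_\mu - 1|^2$$


In fact, a stronger result holds:


Theorem 4 (Sandier-Serfaty). $G_\varepsilon/h_{ex}^2$ $\Gamma$-converges
to G.
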

The limit problem of minimizing G turns out to
have a simple solution in terms of an obstacle
problem: the optimal $\mu_*$ is a uniform density of
vortices on a subdomain of $\Omega$ determined through a
free boundary problem (and depending on $h_{ex}$),
which is nonzero.

In all these regimes, we have thus been able to
identify the optimal number and repartition of
vortices through a $\Gamma$-convergence-type approach,
that is, by reducing the minimization of the energy
to the minimization of a limiting problem: $w_n$, I, or G,
according to the regime.


Further Results 


Concerning vortices, in the same spirit as what was
done for $E_\varepsilon$, we can obtain necessary conditions
characterizing limiting vorticities obtained from
sequences of (nonminimizing) critical points of
the energy $G_\varepsilon$. They consist in passing to the limit
in the conservative form of the Ginzburg-Landau
equations [5].

Most of the results concerning the phase transitions
at the next critical fields $H_{c_2}$ and $H_{c_3}$ are also
obtained by nonvariational methods, and often by
linear analysis.

The study of the Ginzburg-Landau energy in non- 
simply-connected domains is also very interesting 
because it leads to nontrivial topological effects, since 
in such domains there exist unit-valued maps with 
nonzero degree (corresponding to permanent currents). 


See also: Abelian Higgs Vortices; Aharonov-Bohm Effect; 
Bose-Einstein Condensates; Gamma-Convergence and 
Homogenization; Gauge Theory: Mathematical 
Applications; Ginzburg-Landau Equation; High $T_c$
Superconductor Theory; Image Processing: 
Mathematics; Superfluids; Topological Defects and Their 
Homotopy Classification; Variational Techniques for 
Microstructures. 
Austenite-Martensite Transformations
and the Shape Memory Effect


Microstructures in materials that typically form in 
response to phase transformations in the solid state, 
and their impact on the elastic properties of these 
materials have been known for centuries. The 
discovery of the complex phase diagram of iron 
revolutionized the production of steels at the end of 
the nineteenth century. Starting in the 1980s, the 
mathematical description of microstructures in the 
framework of nonlinear elasticity has led to deep 
analytical questions and surprising developments in 
the calculus of variations and in nonlinear partial 
differential equations. 

The mathematical approach outlined here is based 
on the following fundamental assumptions: 


1. The observed configurations correspond to mini- 
mizers of or elements of minimizing sequences 
for an energy functional. 

2. The qualitative properties of low energy states 
are determined from the set of minima of the free 
energy density. 


Under these assumptions one aims to explain
experimental observations and to predict material
properties based on minimizing an energy functional
of the form

$$I(u) = \int_\Omega W(Du)\,dx$$

Here $\Omega$ is an ideal, unstressed reference configuration
in $\mathbb{R}^n$, $u : \Omega \to \mathbb{R}^m$ is an elastic deformation, and
$W : \mathbb{M}^{m \times n} \to \mathbb{R}$ is the stored energy density. In the
case of physical interest, $m = n = 2$ or $m = n = 3$. For
applications in elasticity we assume that $m = n$, but
this assumption is not needed in the general theory.
The energy density W and its structure depend 
critically on the temperature. However, since we are 
interested in the analysis of the material at a given 
temperature, we do not include this dependence 
explicitly. 

The key ingredient of this model is the stored 
energy density W which has to reflect the properties 
of the specific material one wants to model. 
Frequently these are alloys, in particular shape 
memory alloys that undergo an austenite-martensite
transformation. For most materials a closed analytic 


expression for W is not available. In the spirit of the 
fundamental assumption (2) one therefore focuses 
on the structure of the set of minima of W which is 
determined from general invariance and symmetry 
principles. We may assume that $W \ge 0$ and that
$K = \{X : W(X) = 0\} \neq \emptyset$. The principle of material
frame indifference then asserts that

$$W(RF) = W(F) \quad \text{for all } R \in SO(n)$$

Here $SO(n)$ is the group of proper rotations, that is,
the set of all matrices $R \in \mathbb{M}^{n \times n}$ with $R^T R = \mathrm{Id}$ and
$\det R = 1$.
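As a small numerical illustration of frame indifference, the sketch below checks $W(RF) = W(F)$ for a rotation $R \in SO(2)$. The density $W(F) = |F^T F - \mathrm{Id}|^2$ is a hypothetical model energy chosen only for this example (it is not taken from the text), and the sample matrix and angle are arbitrary.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def rotation(theta):
    """Proper rotation in SO(2): R^T R = Id and det R = 1."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def energy_density(F):
    """Hypothetical model density W(F) = |F^T F - Id|^2 (squared Frobenius norm)."""
    C = matmul(transpose(F), F)
    return sum((C[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(2) for j in range(2))

F = [[1.2, 0.3], [-0.1, 0.9]]          # arbitrary sample deformation gradient
R = rotation(0.7)                       # arbitrary rotation angle
w_F = energy_density(F)
w_RF = energy_density(matmul(R, F))     # energy after a superimposed rotation
frame_indifferent = math.isclose(w_F, w_RF, rel_tol=1e-12)
well_energy = energy_density(rotation(0.3))  # rotations lie in the zero set of this W
```

Because this model density depends on $F$ only through $F^T F$, the invariance holds for every rotation, and the density vanishes on $SO(2)$.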

The symmetry of the austenitic (high-temperature) 
phase implies that the energy density in the 
martensitic (low-temperature) phase is invariant 
under all changes of basis that leave the underlying 
lattice in the austenitic phase invariant. Therefore, 


$$W(R^T F R) = W(F) \quad \text{for all } R \in P_a$$

where $P_a$ is the point group of the austenite. In the
case of a cubic to tetragonal phase transformation,
this leads to $K = SO(3)$ in the austenitic phase and to

$$K = SO(3)U_1 \cup SO(3)U_2 \cup SO(3)U_3 \qquad [1]$$


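with

$$U_1 = \mathrm{diag}(\eta_2, \eta_1, \eta_1), \quad U_2 = \mathrm{diag}(\eta_1, \eta_2, \eta_1), \quad U_3 = \mathrm{diag}(\eta_1, \eta_1, \eta_2) \qquad [2]$$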


in the martensitic phase (see Figure 1). A set of the 
form $SO(n)U_i$ is often referred to as an energy well.

The origin of the shape memory effect is the 
availability of a rich class of geometric patterns in 
which the martensitic phases can be arranged, thus 
leading to a great flexibility of the material to 
accommodate macroscopic deformations. Upon heat- 
ing of the material above the transformation tem- 
perature, the martensitic phases lose their stability 
and the material returns to its unique shape in the 


Figure 1 Two-dimensional cartoon of a cubic to tetragonal
phase transformation in a single crystal: (a) a cubic lattice, (b)
and (c) tetragonal variants which are stretched in directions $e_1$
and $e_2$, respectively. (Sketch not to scale.)
Figure 2 Formation of phase boundaries in a single crystal.
(a) The upper right half of the lattice deforms into phase I with
the constant deformation gradient $U_1$, the lower left half of the
lattice deforms into phase II with constant deformation gradient
$U_2$. (b) An additional rotation is needed to accomplish a
continuous deformation, see formula [3]. (c) A different
configuration with a different orientation of the interface. (Sketch not
to scale.)


austenitic phase. The two solutions of Hadamard's
compatibility condition

$$Q U_1 - U_2 = a \otimes b, \qquad Q \in SO(3)$$

are given by

$$Q_1 = \frac{1}{\eta_1^2 + \eta_2^2} \begin{pmatrix} 2\eta_1\eta_2 & \eta_2^2 - \eta_1^2 & 0 \\ \eta_1^2 - \eta_2^2 & 2\eta_1\eta_2 & 0 \\ 0 & 0 & \eta_1^2 + \eta_2^2 \end{pmatrix} \qquad [3]$$

and $Q_2 = Q_1^T$ (see Figure 2). The normals (in the
reference configuration) are given by $(1, \pm 1, 0)/\sqrt{2}$.
It is one of the successes of the theory that it
provides an analytical derivation of the normals to
the twinning planes.
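The compatibility condition can be checked numerically. The following sketch assumes the variants $U_1 = \mathrm{diag}(\eta_2, \eta_1, \eta_1)$ and $U_2 = \mathrm{diag}(\eta_1, \eta_2, \eta_1)$, takes a rotation about the third axis with $\cos\theta = 2\eta_1\eta_2/(\eta_1^2 + \eta_2^2)$ and $\sin\theta = (\eta_1^2 - \eta_2^2)/(\eta_1^2 + \eta_2^2)$, and tests whether $QU_1 - U_2$ is of the form $a \otimes b$ for the normals $(1, \pm 1, 0)/\sqrt{2}$. The values of $\eta_1$, $\eta_2$ and all function names are hypothetical choices for illustration.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def diag(*d):
    return [[d[i] if i == j else 0.0 for j in range(len(d))] for i in range(len(d))]

def twin_rotation(eta1, eta2):
    """Rotation about e3 with cos = 2*eta1*eta2/s and sin = (eta1^2 - eta2^2)/s."""
    s = eta1 ** 2 + eta2 ** 2
    c, sn = 2.0 * eta1 * eta2 / s, (eta1 ** 2 - eta2 ** 2) / s
    return [[c, -sn, 0.0], [sn, c, 0.0], [0.0, 0.0, 1.0]]

def is_twin(Q, U1, U2, b, tol=1e-12):
    """True if Q is orthogonal and Q U1 - U2 = a (x) b for the unit normal b."""
    QtQ = matmul(transpose(Q), Q)
    orthogonal = all(abs(QtQ[i][j] - (1.0 if i == j else 0.0)) < tol
                     for i in range(3) for j in range(3))
    QU1 = matmul(Q, U1)
    M = [[QU1[i][j] - U2[i][j] for j in range(3)] for i in range(3)]
    a = [sum(M[i][k] * b[k] for k in range(3)) for i in range(3)]  # a = M b since |b| = 1
    rank_one = all(abs(M[i][j] - a[i] * b[j]) < tol
                   for i in range(3) for j in range(3))
    return orthogonal and rank_one

eta1, eta2 = 0.9, 1.2                      # hypothetical lattice parameters
U1, U2 = diag(eta2, eta1, eta1), diag(eta1, eta2, eta1)
Q1 = twin_rotation(eta1, eta2)
Q2 = transpose(Q1)
r = 1.0 / math.sqrt(2.0)
twin_plus = is_twin(Q1, U1, U2, [r, r, 0.0])      # normal (1, 1, 0)/sqrt(2)
twin_minus = is_twin(Q2, U1, U2, [r, -r, 0.0])    # normal (1, -1, 0)/sqrt(2)
mismatched = is_twin(Q1, U1, U2, [r, -r, 0.0])    # wrong normal for Q1
```

Under these assumptions each rotation is compatible with exactly one of the two normals, which mirrors the two twinning planes described above.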


The Direct Method in the Calculus 
of Variations 


The mathematical interest in the variational prob- 
lems described in the previous section lies in the fact 
that existence of minimizers cannot in general be 
obtained by a straightforward application of the 
direct methods in the calculus of variations. This
approach is based on the idea of (1) choosing a
minimizing sequence for the functional $I$, (2) showing
that this sequence is bounded and precompact,
and (3) proving that the functional is lower semicontinuous
with respect to the notion of convergence,

$$I(u) \le \liminf_{j \to \infty} I(u_j) \quad \text{if } u_j \to u$$


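The typical choice is to seek $u_j$ in a suitable Sobolev
space $W^{1,p}(\Omega; \mathbb{R}^m)$ with $1 < p \le \infty$ which is related
to growth and coercivity conditions for the energy
density $W$,

$$c_1 |F|^p - c_2 \le W(F) \le c_3 (|F|^p + 1) \quad \text{for all } F \in \mathbb{M}^{m \times n} \qquad [4]$$

This leads to weak compactness in $W^{1,p}(\Omega; \mathbb{R}^m)$
(weak-* compactness in $W^{1,\infty}(\Omega; \mathbb{R}^m)$) and to the
requirement of sequential weak lower semicontinuity
of the functional,

$$I(u) \le \liminf_{j \to \infty} I(u_j) \quad \text{if } u_j \rightharpoonup u \text{ in } W^{1,p}(\Omega; \mathbb{R}^m)$$

(sequential weak-* lower semicontinuity for $p = \infty$).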
Morrey’s fundamental work establishes a link 
between convexity conditions for the energy density 
and lower semicontinuity of the variational integral: 
under suitable growth and coercivity conditions, 
sequential weak-* lower semicontinuity is equivalent 
to quasiconvexity of the integrand. 


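Definition 1 A function $W : \mathbb{M}^{m \times n} \to \mathbb{R}$ is said to be
quasiconvex at $F$ if

$$\int_\Omega W(F)\,dx \le \int_\Omega W(F + D\varphi)\,dx \quad \text{for all } \varphi \in W_0^{1,\infty}(\Omega; \mathbb{R}^m)$$

and for all open and bounded domains $\Omega \subset \mathbb{R}^n$ with
$\mathcal{L}^n(\partial\Omega) = 0$. It is said to be quasiconvex if it is
quasiconvex at all $F$.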


In the language of nonlinear elasticity, W is 
quasiconvex if affine functions are minimizers of 
the energy functional subject to their own boundary 
conditions. The direct method implies the following 
classical existence theorem. 


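Theorem 1 Suppose that $W : \mathbb{M}^{m \times n} \to \mathbb{R}$ is quasiconvex
and satisfies the growth and coercivity
condition [4]. Let $u_0 \in W^{1,p}(\Omega; \mathbb{R}^m)$. Then the
variational problem: minimize $I(u)$ in

$$\mathcal{A} = \{u \in W^{1,p}(\Omega; \mathbb{R}^m) : u - u_0 \in W_0^{1,p}(\Omega; \mathbb{R}^m)\}$$

has a minimizer.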


The remarkable fact is that the structure of the
zero set of a typical energy $W$ modeling a phase-transforming
material in its low-temperature phase
prevents $W$ from being quasiconvex. In order to see
this, let $\Omega \subset \mathbb{R}^3$ be a cube with two of its sides
perpendicular to $b = (1, 1, 0)/\sqrt{2}$ and let $h$ be the
1-periodic function with $h' = 0$ on $(0, \lambda)$ and $h' = 1$ on
$(\lambda, 1)$ with $\lambda \in (0, 1)$. Define $v_j(x) = U_1 x + a\,h(j\,x \cdot b)/j$
and

$$u_j(x) = \min\{v_j(x), \mathrm{dist}(x, \partial\Omega)\} = \min\{U_1 x + a\,h(j\,x \cdot b)/j, \ \mathrm{dist}(x, \partial\Omega)\}$$

where $\mathrm{dist}(x, \partial\Omega) = \inf\{\|x - y\|_\infty, \ y \in \partial\Omega\}$. Then
$u_j \to u$, $u(x) = Cx$ strongly in $L^\infty(\Omega; \mathbb{R}^3)$ and weakly-*
in $W^{1,\infty}(\Omega; \mathbb{R}^3)$ with $C = \lambda U_1 + (1 - \lambda) Q_1 U_2 \notin K$
where $K$ is the zero set of $W$, see the previous section.





Figure 3 Construction of a minimizing sequence $u_j$ with $Du_j \to \{A, B\}$
in measure and affine boundary conditions $u(x) = (\lambda A + (1 - \lambda)B)x$.
Hadamard's compatibility condition requires that
$A - B = a \otimes b$ is a rank-1 matrix and that the planar interfaces are
perpendicular to $b$.


Moreover, $Du_j \in \{U_1, Q_1 U_2\}$ except in a small transition
layer of volume $O(1/j)$ close to $\partial\Omega$ and

$$I(u) = \int_\Omega W(C)\,dx > \liminf_{j \to \infty} I(u_j) = 0$$

This inequality shows that the functional is not
weakly-* lower semicontinuous and therefore $W$
fails to be quasiconvex. The oscillations of $u_j$ on a
scale 1/j are part of the mathematical model for the 
microstructures frequently observed in shape memory
alloys. More generally, whenever $u$ is a Sobolev
function on a domain $\Omega$ such that $Du$ takes only two
values, say $Du \in \{A, B\}$, on open sets which are not
empty and whose union is $\Omega$ (up to a set of measure
zero), then the tangential continuity of the derivatives
implies that the difference $A - B$ is a matrix of
rank 1, $A - B = a \otimes b$, and that the interfaces
between the regions with $Du = A$ and $Du = B$ are
hyperplanes with normal parallel to $b$. This statement
ment is usually referred to as “Hadamard’s compat- 
ibility condition.” Moreover, the pattern in Figure 3 
is known as a “simple laminate” and the matrices A 
and B are said to be rank-1 connected. 
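The failure of lower semicontinuity can be seen in a one-dimensional scalar analogue, which is a simplification and not the vector-valued construction above. The sketch assumes the hypothetical double-well density $W(p) = (p^2 - 1)^2$ on $(0, 1)$: sawtooth functions with slopes $\pm 1$ have zero energy and converge uniformly to $u = 0$, whose energy is 1.

```python
def sawtooth(x, j):
    """Sawtooth with slopes +1/-1, period 1/j and amplitude 1/(2j)."""
    y = (x * j) % 1.0
    return min(y, 1.0 - y) / j

def sawtooth_slope(x, j):
    """Derivative of the sawtooth away from its kinks."""
    return 1.0 if (x * j) % 1.0 < 0.5 else -1.0

def W(p):
    """Hypothetical double-well density with zero set {-1, +1}."""
    return (p * p - 1.0) ** 2

def energy(slope_at, n=4000):
    """Midpoint-rule approximation of the integral of W(u') over (0, 1)."""
    h = 1.0 / n
    return h * sum(W(slope_at((i + 0.5) * h)) for i in range(n))

j = 16
energy_oscillating = energy(lambda x: sawtooth_slope(x, j))   # energy of u_j
energy_limit = energy(lambda x: 0.0)                          # energy of the limit u = 0
sup_norm = max(abs(sawtooth((i + 0.5) / 4000, j)) for i in range(4000))
```

The oscillating functions stay within $1/(2j)$ of the limit, yet the limit has strictly larger energy, which is the scalar counterpart of the inequality $I(u) > \liminf I(u_j)$.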


Relaxation 


The discussion in the previous section shows that the 
variational problems related to models in materials 
science typically fail to be weakly lower semicon- 
tinuous. One approach which allows us to recover 
the macroscopic energy of the system and the macro- 
scopic stress-strain relation is to pass to the relaxed 
variational problem which involves the quasiconvex 
envelope of the energy density W. 


Definition 2 Let $W : \mathbb{M}^{m \times n} \to \mathbb{R}$ be given. The
function

$$W^{qc} = \sup\{f : f \le W, \ f \text{ quasiconvex}\}$$

is called the quasiconvex envelope of $W$. Equivalently,

$$W^{qc}(F) = \inf_{\varphi \in W_0^{1,\infty}(\Omega; \mathbb{R}^m)} \frac{1}{|\Omega|} \int_\Omega W(F + D\varphi)\,dx$$


This formula implies that $W^{qc}$ is the macroscopic
energy of the system in the sense that it characterizes
the smallest energy per unit volume that is required
to subject a volume element to a deformation with
affine boundary conditions. Here the system is
allowed to minimize its energy with microstructures
at any scale, a mechanism which was already
explored in the previous section. The arguments in
that section prove that $W^{qc}(C) = 0$ and this shows
that the zero set of $W^{qc}$ can be strictly larger than
the zero set $K$ of $W$, see Definition 4. The relaxed
functional is given by

$$I^{qc}(u) = \int_\Omega W^{qc}(Du)\,dx$$


Since $W^{qc}$ satisfies the growth and coercivity
conditions [4] if they are satisfied by $W$, the
functional $I^{qc}$ attains its minimum subject to given
boundary conditions. The functional $I^{qc}$ is the
weakly lower semicontinuous envelope of $I$ in the
sense that minimizing sequences for $I$ contain
subsequences that converge to minimizers of $I^{qc}$
and for all $u$ there exists a sequence $u_j$ which
converges weakly in $W^{1,p}(\Omega; \mathbb{R}^m)$ to $u$ such that the
energies converge, $I(u_j) \to I^{qc}(u)$. However, a lot of
information, in particular about oscillation patterns,
might be lost in the passage from $I$ to $I^{qc}$ since the
knowledge of a minimizer $u$ for $I^{qc}$ does not
provide any immediate information about the
behavior of any minimizing sequence for $I$ that
converges to $u$. Moreover, the minimization problem
required in the definition of the relaxed
energy has been solved explicitly only for very
special energy densities.

In this context, one often relies on two related
notions of convexity, one sufficient and the other
necessary for quasiconvexity. For $F \in \mathbb{M}^{m \times n}$ let
$M(F) \in \mathbb{R}^{d(m,n)}$ be the vector of all minors
(subdeterminants) of $F$. In the special case $m = n = 2$
we have $M(F) = (F, \det F) \in \mathbb{R}^5$ and for $m = n = 3$
we find $M(F) = (F, \mathrm{cof}\,F, \det F) \in \mathbb{R}^{19}$ where $\mathrm{cof}\,F$
is the $3 \times 3$ matrix of all $2 \times 2$ subdeterminants
of $F$.


Definition 3 Let $W : \mathbb{M}^{m \times n} \to \mathbb{R}$ be given. The
function $W$ is said to be polyconvex if there exists
a convex function $g : \mathbb{R}^{d(m,n)} \to \mathbb{R}$ such that
$W(F) = g(M(F))$. The function $W$ is rank-1 convex if
it is convex along all rank-1 lines in $\mathbb{M}^{m \times n}$, that is, the
function $t \mapsto W(F + tR)$ is convex for all $F \in \mathbb{M}^{m \times n}$
and all $R \in \mathbb{M}^{m \times n}$ with $\mathrm{rank}(R) = 1$.
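The determinant on $2 \times 2$ matrices is a standard example separating these notions: it is polyconvex (take $g$ to be the last coordinate of $M(F)$), hence rank-1 convex, but it is not convex. The sketch below illustrates this with second differences of $t \mapsto \det(F + tR)$; the matrices and helper names are arbitrary choices for the example.

```python
def det2(F):
    return F[0][0] * F[1][1] - F[0][1] * F[1][0]

def line_point(F, R, t):
    """Point F + t R on the line through F with direction R."""
    return [[F[i][k] + t * R[i][k] for k in range(2)] for i in range(2)]

def second_difference(f, F, R, t=0.5):
    """f(F + tR) - 2 f(F) + f(F - tR); nonnegative if f is convex along the line."""
    return f(line_point(F, R, t)) - 2.0 * f(F) + f(line_point(F, R, -t))

F = [[1.0, 2.0], [0.5, -1.0]]
rank_one = [[3.0, -1.0], [6.0, -2.0]]   # (1, 2) (x) (3, -1), a rank-1 direction
rank_two = [[1.0, 0.0], [0.0, -1.0]]    # a rank-2 direction with det = -1

sd_rank_one = second_difference(det2, F, rank_one)   # det is affine along rank-1 lines
sd_rank_two = second_difference(det2, F, rank_two)   # negative: det is not convex
```

Since $\det(F + tR) = \det F + t\,(\mathrm{cof}\,F : R) + t^2 \det R$, the second difference equals $2t^2 \det R$, which vanishes exactly when $R$ has rank at most 1.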
All notions of convexity reduce to classical
convexity if $m = 1$ or $n = 1$. In the vector-valued
case $m, n > 1$ the following implications are true:

$$f \text{ convex} \ \Rightarrow \ f \text{ polyconvex} \ \Rightarrow \ f \text{ quasiconvex} \ \Rightarrow \ f \text{ rank-1 convex}$$

The reverse statements for the first two implications
are not true. Rank-1 convexity does not imply
quasiconvexity for $m \ge 3$ and it is a fundamental
open problem with deep connections to harmonic
analysis to decide whether rank-1 convexity and
quasiconvexity are equivalent for $m = n = 2$.

The polyconvex and the rank-1 convex envelope
of an energy density $W$ are defined analogously to
Definition 2. In view of the implications between the
different notions of convexity, one has
$W^{pc} \le W^{qc} \le W^{rc}$ and essentially all explicitly known
relaxation formulas are based on the approach of
constructing a candidate $W^*$ for $W^{rc}$ and verifying that
$W^*$ is polyconvex. Then the inequalities become
equalities and one obtains a characterization for the
relaxed energy. This approach does not work for
extended-valued functions which are used in models
for incompressible materials since quasiconvexity
does not imply rank-1 convexity in this case.
However, for a model system of particular interest,
nematic elastomers, a complete characterization of
the relaxed energy, the macroscopic stress-strain
relation, and the macroscopic phase diagram have
been obtained.


Classical and Generalized Minimizers 


The discussion of observed configurations as elements
of minimizing sequences $\{u_j\}$ in the section
"The direct method in the calculus of variations"
leaves the question of the existence of minimizers
open. The answer cannot be obtained via the direct
methods since minimizing sequences do not need to
converge strongly to minimizers. One approach to
obtain the existence of solutions $u$ with $I(u) = 0$ is to
solve the differential relation $Du \in K$, $u(x) = Fx$ on
$\partial\Omega$, by constructing special minimizing sequences
that converge strongly so that one can pass to
the limit in the energy integral. This idea has led
to surprising solutions $u$ with affine boundary
conditions for the two-well problem where
$K = SO(2)\,\mathrm{diag}(\eta, 1/\eta) \cup SO(2)\,\mathrm{diag}(1/\eta, \eta)$. However, the
structure of the solutions is intrinsically complicated
in the sense that the phase boundary has infinite
length unless the boundary conditions are given by
$u(x) = Fx$ with $F \in K$.

More generally, the right tool to pass to the limit in
nonlinear functions of $z_j = Du_j$ like the energy is the
"Young measure" generated by a subsequence. It is
given by a family of probability measures $\nu_x$ that
provide statistical information about the distribution of
the values of $z_j$ close to a given point $x$. The existence
and the fundamental properties of Young measures are
described in the following theorem. For simplicity we
assume that the sequence $z_j$ is uniformly bounded.


Theorem 2 (Fundamental theorem on Young
measures). Let $E \subset \mathbb{R}^n$ be measurable, $\mathcal{L}^n(E) < \infty$,
and let $z_j : E \to \mathbb{R}^d$ be a measurable and bounded
sequence. Then there exists a subsequence $z_k$ and a
weakly-* measurable map $\nu : E \to \mathcal{M}(\mathbb{R}^d)$ such that
the following assertions are true:


(i) The measures $\nu_x$ are non-negative probability
measures.
(ii) If there exists a compact set $K$ such that $z_k \to K$
in measure, then $\mathrm{supp}\,\nu_x \subset K$ for a.e. $x \in E$.
(iii) If $f \in C(\mathbb{R}^d)$ and if $f(z_k)$ is relatively weakly
compact in $L^1(E)$, then $f(z_k) \rightharpoonup \bar{f}$ in $L^1(E)$
where $\bar{f}(x) = \langle \nu_x, f \rangle$.


Here $\langle \nu_x, f \rangle$ denotes the integration of the function
$f$ with respect to the measure $\nu_x$. For example,
the sequence $Du_j$ constructed in the section
"The direct method in the calculus of variations"
generates the Young measure
$\nu_x = (1/2)\delta_A + (1/2)\delta_B$ (see Figure 3) and

$$I(u_j) = \int_\Omega W(Du_j)\,dx \to \int_\Omega \int_{\mathbb{M}^{m \times n}} W(Y)\,d\nu_x(Y)\,dx = 0$$
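Assertion (iii) can be illustrated with a scalar oscillating sequence, which is a simplified stand-in for the gradients $Du_j$. The sketch assumes $z_j(x) = \pm 1$ on alternating half-periods of length $1/(2j)$ in $(0, 1)$, so that the generated Young measure is $\nu_x = (1/2)\delta_{1} + (1/2)\delta_{-1}$; the test function $\varphi(x) = x$ and the nonlinearity $f = \exp$ are arbitrary choices.

```python
import math

def z(x, j):
    """Oscillating sequence: +1 on the first half of each period 1/j, -1 on the second."""
    return 1.0 if (x * j) % 1.0 < 0.5 else -1.0

def weak_average(f, phi, j, n=20000):
    """Midpoint-rule approximation of the integral of f(z_j(x)) * phi(x) over (0, 1)."""
    h = 1.0 / n
    return h * sum(f(z((i + 0.5) * h, j)) * phi((i + 0.5) * h) for i in range(n))

f = math.exp
phi = lambda x: x

# Predicted limit: integral of <nu_x, f> * phi with nu_x = (1/2) delta_1 + (1/2) delta_{-1}.
limit = 0.5 * (f(1.0) + f(-1.0)) * 0.5

err_coarse = abs(weak_average(f, phi, 5) - limit)
err_fine = abs(weak_average(f, phi, 50) - limit)
```

The error decreases as $j$ grows, in line with the weak convergence $f(z_j) \rightharpoonup \bar{f}$, while $f(z_j)$ itself does not converge pointwise.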


A Young measure generated by a sequence of
gradients is called a gradient Young measure
(GYM). It is said to be homogeneous if $\nu_x = \nu$ is
independent of $x$. We restrict our attention in the
following to homogeneous GYMs generated by
sequences that are bounded in $L^\infty$. The importance
of quasiconvexity is also reflected in the following
characterization of homogeneous GYMs.


Theorem 3 A non-negative probability measure $\nu$
is a GYM if and only if there exists a compact set
$K \subset \mathbb{M}^{m \times n}$ with $\mathrm{supp}\,\nu \subset K$ and Jensen's inequality
$\langle \nu, f \rangle \ge f(\langle \nu, \mathrm{id} \rangle)$ holds for all quasiconvex functions
$f : \mathbb{M}^{m \times n} \to \mathbb{R}$.


This motivates the characterization of the generalized
limits of minimizing sequences as

$$\mathcal{M}^{qc}(K) = \{\nu \in \mathcal{M}(K) : f(\langle \nu, \mathrm{id} \rangle) \le \langle \nu, f \rangle \ \text{for all } f : \mathbb{M}^{m \times n} \to \mathbb{R} \text{ quasiconvex}\}$$

where $\mathcal{M}(K)$ is the set of all probability measures
supported on $K$. If $\nu$ is generated by a sequence of
functions with affine boundary conditions
$u_j(x) = Fx$, then $\langle \nu, \mathrm{id} \rangle = F$. The set of all affine
deformations of the material that can be recovered
by heating (shape memory effect) is therefore given
as the set of all centers of mass of homogeneous
GYMs supported on $K$, the so-called "quasiconvex
hull" $K^{qc}$ of $K$.


Definition 4 Suppose that $K \subset \mathbb{M}^{m \times n}$ is compact.
We define the quasiconvex hull of $K$ by

$$K^{qc} = \{F = \langle \nu, \mathrm{id} \rangle : \nu \in \mathcal{M}^{qc}(K)\}$$

There are several equivalent definitions of $K^{qc}$.
The foregoing definition corresponds to the definition
of the convex hull of a set as the set of all
centers of mass of probability measures supported
on $K$ (which satisfy Jensen's inequality for all
convex $f$). The set $K^{qc}$ can also be defined as the
set of all points that cannot be separated by
quasiconvex functions from $K$ or as the zero set of
the quasiconvex envelope of the distance function to
$K$. The "polyconvex hull" $K^{pc}$ and the "rank-1
convex hull" $K^{rc}$ are defined analogously by replacing
quasiconvexity with polyconvexity and rank-1
convexity in the foregoing definitions. It follows that
$K^{rc} \subset K^{qc} \subset K^{pc}$ and all of these inclusions can be
strict.

A particularly useful set of conditions are the
minors conditions

$$\langle \nu, M \rangle = M(\langle \nu, \mathrm{id} \rangle)$$

for all minors $M$ which follow from the weak
continuity of the minors. For example, if
$K = \{A, B\} \subset \mathbb{M}^{2 \times 2}$, then any probability measure
supported on $K$ is given by $\nu = \lambda\delta_A + (1 - \lambda)\delta_B$. The
minors condition with $M = \det$ implies that

$$\det(\lambda A + (1 - \lambda)B) = \det\langle \nu, \mathrm{id} \rangle = \langle \nu, \det \rangle = \lambda \det A + (1 - \lambda) \det B$$

This identity is equivalent to

$$\lambda(1 - \lambda) \det(A - B) = 0$$

and therefore the quasiconvex hull is equal to $K$ if
and only if $\det(A - B) \neq 0$. A very instructive
example is the set $K = \{(1, 3), (-1, -3), (-3, 1),
(3, -1)\}$ viewed as a subset of the space of all
diagonal matrices in $\mathbb{M}^{2 \times 2}$. It is frequently referred
to as a T4 configuration. The rank-1 convex hull is
equal to the quasiconvex hull and given by the four
points, the line segments, and the square in the
center, the polyconvex hull is bounded by four
hyperbolic arcs, and the convex hull is the square
with the points as corners, see Figure 4. It is
remarkable that the rank-1 convex hull is strictly
Figure 4 The four-point subset $K$ in the space of all diagonal
matrices and its convex hulls: $K^{rc} = K^{qc}$ are given by $K$, the line
segments and the shaded square, $K^{pc}$ is bounded by the dashed
hyperbolic arcs, and the convex hull is the outer square.


larger than the set K itself despite the fact that the 
set K does not contain any rank-1 connections. 
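Both statements lend themselves to a quick numerical check. The sketch below verifies, for arbitrary sample matrices, the $2 \times 2$ identity $\det(\lambda A + (1-\lambda)B) = \lambda \det A + (1-\lambda)\det B - \lambda(1-\lambda)\det(A - B)$ that underlies the minors condition, and then confirms that no two of the four diagonal matrices of the T4 configuration differ by a rank-1 matrix. Function and variable names are illustrative.

```python
from itertools import combinations

def det2(F):
    return F[0][0] * F[1][1] - F[0][1] * F[1][0]

def combine(A, B, lam):
    """Convex combination lam*A + (1 - lam)*B of two 2x2 matrices."""
    return [[lam * A[i][k] + (1.0 - lam) * B[i][k] for k in range(2)] for i in range(2)]

def identity_gap(A, B, lam):
    """Difference between the two sides of the determinant identity."""
    diff = [[A[i][k] - B[i][k] for k in range(2)] for i in range(2)]
    rhs = lam * det2(A) + (1.0 - lam) * det2(B) - lam * (1.0 - lam) * det2(diff)
    return det2(combine(A, B, lam)) - rhs

A = [[2.0, 1.0], [0.0, 1.0]]      # arbitrary sample matrices
B = [[1.0, -1.0], [3.0, 0.5]]
gaps = [abs(identity_gap(A, B, k / 10.0)) for k in range(11)]

# T4 configuration: diagonal matrices diag(x, y) identified with points (x, y).
T4 = [(1, 3), (-1, -3), (-3, 1), (3, -1)]
# For diagonal matrices, det(P - Q) is the product of the coordinate differences.
pair_dets = [(p[0] - q[0]) * (p[1] - q[1]) for p, q in combinations(T4, 2)]
no_rank_one_connections = all(d != 0 for d in pair_dets)
```

All six pairwise determinants are nonzero, so every difference has rank 2, which is why the nontrivial rank-1 convex hull of this set is remarkable.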

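There are only a few examples in which explicit
characterizations of the convex hulls for sets
invariant under $SO(n)$ have been obtained. For
$K = SO(3)U_1 \cup SO(3)U_2$ (see [2]), one finds

$$K^{qc} = \left\{F \in \mathbb{M}^{3 \times 3} : F^T F = \begin{pmatrix} a & c & 0 \\ c & b & 0 \\ 0 & 0 & \eta_1^2 \end{pmatrix}, \ ab - c^2 = \eta_1^2 \eta_2^2, \ a + b + 2|c| \le \eta_1^2 + \eta_2^2 \right\}$$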


The quasiconvex hull of the three-well problem [1] 
is not known. In two dimensions one finds for 


K =SO(2)U; U---USO(2)Up, 
det Up S19 = 1cca0 


that 


K¥ = [F € M?*?: detF = 1,|Fe|’ < max Uie | 
All examples in which envelopes of functions or hulls 
of sets have been obtained explicitly are based on the 
exceptional property that the polyconvex envelope 
coincides with the rank-1 convex envelope. The T4 
configuration in Figure 4 is one of the few cases where 
the quasiconvex hull is known to be different from the 
polyconvex hull. The construction of quasiconvex 
functions and the understanding of their properties is 
one of the challenges left for the future. 
Introduction
The Navier-Stokes equations

$$\rho(u_t + u \cdot \nabla u) = -\nabla p + \mu \Delta u + f \qquad [1]$$

$$\nabla \cdot u = 0 \qquad [2]$$


provide the simplest model for the motion of a 
viscous incompressible fluid that is consistent with 
the principles of mass and momentum conservation, 
and with Stokes’ hypothesis that the internal forces 
due to viscosity must be invariant with respect to 
any superimposed rigid motion of the reference 
frame. Despite their simplicity, they seem to govern 
the motion of air, water, and many other fluids very 
accurately over a wide range of conditions. Thus, 
their mathematical theory is central to the rigorous 
analysis of many experimental observations, from 
the asymptotics of steady wakes and jets, to the 
dynamics of convection cells, vortex shedding, and 
turbulence. During the last 80 years, a great deal of 
progress has been made on both the basic mathe- 
matical theory of the equations and on its applica- 
tion to the understanding of such phenomena. But 
one of the most important matters, that of estimat- 
ing the regularity of solutions over long periods of 
time, remains a vexing and fascinating challenge. 
Such an estimate will almost certainly be needed to 
prove the “global” existence of smooth solutions. By 
that we mean the existence of smooth solutions of 
the initial-value problem over indefinitely long 
periods of time without any restriction on the 
“size” of the data. To date we can prove the 
“local” existence of smooth solutions, but there 
remains a concern that if the data are large, 
solutions may develop singularities within a finite 
period of time. In fact, there is a great deal more at 
issue than this question of existence. A regularity 
estimate is required to prove the reliability of the 
equations as a predictive model. That is because any 
estimate for the continuous dependence of solutions 
on the prescribed data for a problem depends upon 
a regularity estimate, as do error estimates for 
numerical approximations. A global estimate for 
the regularity of solutions is also required for a 
mathematically rigorous theory of turbulence. In 
fact, it may be hoped that the insight which 
ultimately yields a global regularity estimate will 


also be pivotal to our understanding of turbulence, 
perhaps justifying Kolmogorov theory; see Heywood
(2003). In this article we aim to present a relatively 
simple approach to the local existence, uniqueness, 
and regularity theory for the initial boundary value 
problem for the Navier-Stokes equations, and to 
discuss some observations that bear on the question 
of global regularity. A wider-ranging review of open 
problems is given in Heywood (1990), and further 
observations concerning the problem of global 
regularity are given in Heywood (1994). 


Setting the Problem 


To focus on core issues, we shall make some 
simplifying assumptions. The fluid under considera- 
tion will be assumed to completely fill (without free 
boundaries or vacuums) a bounded, connected,
time-independent domain $\Omega \subset \mathbb{R}^n$, $n = 2$ or $3$, with
smooth boundary $\partial\Omega$. We are mainly interested in
the three-dimensional case, but comparisons with
the two-dimensional case are illuminating. The $\mathbb{R}^n$-valued
velocity $u(x, t) = (u_1(x, t), \ldots, u_n(x, t))$ and $\mathbb{R}$-valued
pressure $p(x, t)$ are functions of the position
$x = (x_1, \ldots, x_n) \in \Omega$ and time $t \ge 0$. Equation [1] is
an expression of Newton’s second law of motion, 
equating mass density times acceleration on the left 
with several force densities on the right, due to 
pressure and viscosity, and sometimes a prescribed 
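external force $f$. Written in full, using the summation
convention over repeated indices, its $i$th
component is

$$\rho\left(\frac{\partial u_i}{\partial t} + u_j \frac{\partial u_i}{\partial x_j}\right) = -\frac{\partial p}{\partial x_i} + \mu \frac{\partial^2 u_i}{\partial x_j^2} + f_i$$

We will assume the density $\rho$ and the coefficient of
viscosity $\mu$ are positive constants.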

In this article, we consider the initial boundary 
value problem consisting of the equations [1], [2] 
together with the initial and boundary conditions 


$$u|_{t=0} = u_0, \qquad u|_{\partial\Omega} = 0 \qquad [3]$$

The initial velocity $u_0(x)$ is prescribed. It will be
assumed to possess whatever smoothness is convenient,
and to satisfy $\nabla \cdot u_0 = 0$ and $u_0|_{\partial\Omega} = 0$. The
boundary condition is a reasonable one, since fluids 
adhere to rigid surfaces. 

Notice that a further condition would be needed 
to uniquely determine the pressure, since only its 
derivatives appear in the problem as posed. We 
prefer to do without auxiliary conditions for the 
pressure, and to refer to u by itself as a solution of 
the problem provided there exists a scalar function p
which together with u satisfies [1]-[3]. The problem 
is said to be uniquely solvable if there is a unique 
solution u, in which case the gradient of the pressure 
is also uniquely determined, along with the pressure 
up to a constant. Notice also that under our 
assumptions a potential force like gravity has no 
effect on u. If u solves the problem in the absence of 
such a force, then the inclusion of the force affects 
only the pressure, from which the potential must be 
subtracted. It turns out that the inclusion of a 
prescribed nonpotential force, while complicating 
many of the estimates below, does not affect in any 
essential way those parts of the theory to be 
presented here. Thus, for simplicity, we shall 
henceforth assume that f = 0. 


Reynolds Number 


We can make a slight further simplification of eqn [1]
by rescaling, with the objective of setting $\rho = 1$, or even
$\rho = 1$ and $\mu = 1$. This scaling is not required for the
existence theory we are presenting, but provides an
important insight for the study of stability, bifurcation,
and turbulence. The Reynolds number

$$R = \frac{\max|u| \cdot |\Omega| \cdot \rho}{\mu}$$


plays an important role in rescaling. It expresses the 
ratio of the inertial to viscous effects. The notation 
$|\Omega|$ represents a characteristic length, such as the
minimum diameter of a bounded domain. Generally 
speaking, a high Reynolds number corresponds to 
what is meant by “large” data, and the higher the 
Reynolds number the more inclined a flow is to 
instability and turbulence, and perhaps to the 
development of singularities. However, the size of 
the Reynolds number has precise implications only 
in comparing “dynamically similar” flows. We say 
that two vector fields $v(x, t)$ and $u(x, t)$ are dynamically
similar if and only if $v(x, t) = \alpha u(x/\beta, t/\gamma)$ for
some $\alpha, \beta, \gamma > 0$. In such a case, if $u$ is defined in
$\Omega \times [0, T)$, then $v$ will be defined in $\beta\Omega \times [0, \gamma T)$,
where $\beta\Omega = \{\beta x : x \in \Omega\}$. Furthermore, if $u$ satisfies
the Navier-Stokes equations, then $v$ will satisfy

$$\rho\alpha^{-1}\gamma\, v_t + \rho\alpha^{-2}\beta\, v \cdot \nabla v = -\beta \nabla p(x/\beta, t/\gamma) + \alpha^{-1}\beta^2 \mu \Delta v \qquad [4]$$

which has the form of the Navier-Stokes equations if
and only if the coefficients of the two inertial terms
on the left-hand side are equal. That is, if and only if

$$\alpha\gamma = \beta \qquad [5]$$

in which case

$$v_t + v \cdot \nabla v = -\nabla q + \nu \Delta v \qquad [6]$$

with

$$\nu = \alpha\beta\mu/\rho \qquad [7]$$

and $q(x, t) = \alpha^2 \rho^{-1} p(x/\beta, t/\gamma)$. We refer to such $u$
and $v$ as dynamically similar flows. The relation
[7], which follows from [5], is equivalent to the
equality of the Reynolds numbers for the two
flows,

$$R(u) = \frac{\max|u| \cdot |\Omega| \cdot \rho}{\mu} = \frac{\max|\alpha u| \cdot |\beta\Omega| \cdot 1}{\nu} = R(v)$$

The condition [5] can be satisfied simultaneously
with the condition $\nu = 1$. For example, one may
choose $\beta = 1$, $\alpha = \rho/\mu$, and $\gamma = \mu/\rho$. This achieves a
rescaling of the equation to

$$v_t + v \cdot \nabla v = -\nabla q + \Delta v \qquad [8]$$


without changing the domain. Different Reynolds 
numbers result from varying the magnitude of the 
velocity. In what follows, we will work with the 
Navier-Stokes equation in this simplest possible 
form. 
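The bookkeeping of the rescaling can be sketched numerically. The helper below takes the scaling factors $\alpha$ and $\beta$, computes $\gamma$ from the condition [5] and the rescaled viscosity coefficient from [7], and compares the Reynolds numbers of the original and rescaled flows. The fluid parameters, the characteristic speed and length, and the function names are hypothetical values chosen for illustration.

```python
import math

def reynolds(max_speed, length, rho, mu):
    """Reynolds number R = max|u| * |Omega| * rho / mu."""
    return max_speed * length * rho / mu

def rescale(alpha, beta, rho, mu):
    """Return (gamma, nu) with gamma fixed by alpha*gamma = beta and nu = alpha*beta*mu/rho."""
    gamma = beta / alpha
    nu = alpha * beta * mu / rho
    return gamma, nu

# Hypothetical data: a water-like fluid in a domain of size 5 cm.
rho, mu = 1000.0, 1.0e-3
max_speed, length = 0.2, 0.05

# Choice from the text: beta = 1, alpha = rho/mu, so that gamma = mu/rho and nu = 1.
alpha, beta = rho / mu, 1.0
gamma, nu = rescale(alpha, beta, rho, mu)

R_u = reynolds(max_speed, length, rho, mu)
# Rescaled flow v = alpha*u on beta*Omega, with unit density and viscosity nu.
R_v = reynolds(alpha * max_speed, beta * length, 1.0, nu)
```

With this choice the domain is unchanged and the whole dependence on the physical parameters is carried by the magnitude of the rescaled velocity.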


Continuous Dependence on the Data 


We begin our investigation of the initial boundary
value problem

$$u_t + u \cdot \nabla u = -\nabla p + \Delta u, \quad \nabla \cdot u = 0 \quad \text{for } (x, t) \in \Omega \times (0, \infty), \qquad u|_{t=0} = u_0, \quad u|_{\partial\Omega} = 0 \qquad [9]$$

by considering two smooth solutions, say $u$ and $v$,
taking possibly different initial values $u_0$ and $v_0$. Let
their difference be $w = v - u$, with initial value $w_0$,
and let $q$ be the difference of the corresponding
pressures. Then, subtracting one equation from the
other, one obtains

$$w_t + w \cdot \nabla w + u \cdot \nabla w + w \cdot \nabla u = -\nabla q + \Delta w \qquad [10]$$

Multiplying this by $w$, integrating over $\Omega$, and
integrating by parts, one then obtains

$$\frac{1}{2} \frac{d}{dt} \|w\|^2 + \|\nabla w\|^2 = -(w \cdot \nabla u, w) \qquad [11]$$
where

$$(w \cdot \nabla u, w) = \int_\Omega w_j \frac{\partial u_i}{\partial x_j} w_i \, dx$$

Notice that

$$(u \cdot \nabla w, w) = \int_\Omega u_j \frac{\partial w_i}{\partial x_j} w_i \, dx = 0$$

and similarly $(w \cdot \nabla w, w) = 0$. In deriving these we
have used the fact that the vector fields are
divergence free and vanish on the boundary. In the
following, we will use such identities without further
mention.

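We can estimate the nonlinear term on the right-hand
side of [11] by using the "Sobolev inequalities"

$$\|\varphi\|_4^2 \le \|\varphi\| \, \|\nabla\varphi\|, \ \text{if } n = 2; \qquad \|\varphi\|_4^2 \le \|\varphi\|^{1/2} \|\nabla\varphi\|^{3/2}, \ \text{if } n = 3 \qquad [12]$$

proved by Ladyzhenskaya (1969), though with
larger constants. These are valid for any smooth
function $\varphi$ which vanishes on the boundary of $\Omega$. It
may be either scalar or vector valued. The norms on
the left are $L^4$-norms; we use the notation
$\|\varphi\|_p = \left(\int_\Omega |\varphi|^p \, dx\right)^{1/p}$ for any $p \ge 1$, but usually
drop the subscript when $p = 2$. Using first Hölder's
inequality and then [12], one obtains

$$|(w \cdot \nabla u, w)| \le \|w\|_4^2 \|\nabla u\| \le \begin{cases} \|w\| \, \|\nabla w\| \, \|\nabla u\|, & \text{if } n = 2 \\ \|w\|^{1/2} \|\nabla w\|^{3/2} \|\nabla u\|, & \text{if } n = 3 \end{cases}$$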


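Young's inequality

$$ab \le \frac{1}{p} a^p + \frac{1}{q} b^q$$

holds if $a, b > 0$, $p, q > 1$ and $1/p + 1/q = 1$. Taking
$a = \sqrt{2}\,\|\nabla w\|$, along with $p = q = 2$ in the two-dimensional
case, and $a = (4/3)^{3/4} \|\nabla w\|^{3/2}$, along
with $p = 4/3$, $q = 4$ in the three-dimensional case,
one obtains

$$|(w \cdot \nabla u, w)| \le \begin{cases} \|\nabla w\|^2 + \frac{1}{4} \|\nabla u\|^2 \|w\|^2, & \text{if } n = 2 \\ \|\nabla w\|^2 + \frac{27}{256} \|\nabla u\|^4 \|w\|^2, & \text{if } n = 3 \end{cases} \qquad [13]$$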

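Using these estimates for the right-hand side of [11],
we obtain linear differential inequalities for $\|w\|^2$
that are easily integrated to give

$$\|w(t)\|^2 \le \begin{cases} \|w_0\|^2 \exp \int_0^t \frac{1}{2} \|\nabla u\|^2 \, d\tau, & \text{if } n = 2 \\ \|w_0\|^2 \exp \int_0^t \frac{27}{128} \|\nabla u\|^4 \, d\tau, & \text{if } n = 3 \end{cases} \qquad [14]$$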


It follows that if we can estimate the integrals on 
the right, which concern only the solution u, and if 
v is a second solution, perhaps differing only 
slightly from u when t=0, then we can estimate 
the difference ||v(t) — u(t)|| at later times. Moreover, 
at any particular time this difference will be 
bounded proportionally to ||v(0) — u(0)||. The inte- 
gral on the right-hand side of the two-dimensional 
version of [14] is easily estimated using the energy 
estimate [16] below. The estimation of the corre- 
sponding integral in the three-dimensional case, 
without a restriction on the size of the data, 
remains an open problem. It can be regarded as 
the most important open problem in the Navier- 
Stokes theory. It would never be enough to some- 
how prove that solutions are smooth without 
estimating this integral, or something equivalent 
to it. Of course, if solutions were known to be 
smooth one could infer their uniqueness from [14], 
since smoothness would imply that the integrals are 
finite, which is enough to conclude that ||w(t)|| is 
zero if ||wol|| is zero. 


Energy Estimate 


If one multiplies the Navier-Stokes equation for $u$
by $u$, and proceeds as in deriving [11], one obtains

$$\frac{1}{2} \frac{d}{dt} \|u\|^2 + \|\nabla u\|^2 = 0 \qquad [15]$$

and hence

$$\frac{1}{2} \|u(t)\|^2 + \int_0^t \|\nabla u\|^2 \, d\tau = \frac{1}{2} \|u_0\|^2 \qquad [16]$$

This settles the matter of continuous dependence in
the two-dimensional case. Together with [16], the
two-dimensional version of [14] implies

$$\|w(t)\|^2 \le \|w_0\|^2 \exp\left(\tfrac{1}{4} \|u_0\|^2\right), \quad \text{if } n = 2 \qquad [17]$$


We remark that the local rate of energy dissipation
is $2|Du|^2$ rather than $|\nabla u|^2$, where $Du$ is the
stress tensor $Du = (1/2)(\nabla u + (\nabla u)^T)$. However,
integrating over the domain, and integrating by
parts using the boundary condition $u|_{\partial\Omega} = 0$, one
may verify that the rate of total energy dissipation
$2\|Du\|^2$ equals $\|\nabla u\|^2$. For the purpose of this
article, it is convenient to write the energy identity
as [15].


Estimates for ||Vu(t)|| Pointwise in Time 


Of course, an estimate for ||Vu(t)|| pointwise in time 
would imply an estimate for the integral of ||Vu(t)||* 
on the right-hand side of [14]. We can prove such an 
estimate for at least a finite interval of time by an 
argument due to Prodi (1962). It requires, in 
preparation, some deep results concerning the 
regularity of solutions of the steady Stokes equa- 
tions. These cannot be proved here, but we can 
briefly summarize what will be needed. Let 


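$L^2(\Omega)$ = space of vector fields $\phi$, with finite
$L^2$-norms $\|\phi\|$,

$C_0^\infty(\Omega)$ = space of smooth vector fields with compact
support in $\Omega$,

$D(\Omega) = \{\phi \in C_0^\infty(\Omega) : \nabla\cdot\phi = 0\}$,

$J(\Omega)$ = completion of $D(\Omega)$ in the $L^2$-norm $\|\phi\|$,

$J_1(\Omega)$ = completion of $D(\Omega)$ in the norm $\|\nabla\phi\|$,

$G(\Omega) = \{\nabla p : p \in L^2(\Omega)$ with $\nabla p \in L^2(\Omega)\}$, and

$P: L^2(\Omega) \to J(\Omega)$ be the $L^2$-projection of $L^2(\Omega)$ onto
$J(\Omega)$,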


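and define the Sobolev $W_2^2(\Omega)$ norm by

$$\|u\|^2_{W_2^2(\Omega)} = \|u\|^2 + \|\nabla u\|^2 + \int_\Omega \sum_{i,j,k} \left|\partial^2 u_i/\partial x_j\partial x_k\right|^2 dx$$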


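Furthermore, observe that $(\nabla p, \phi) = 0$ for $\nabla p \in
G(\Omega)$ and $\phi \in J(\Omega)$, since it holds if $p$ is smooth
and $\phi \in D(\Omega)$. Therefore, $P\nabla p = 0$, since
$(P\nabla p, \phi) = (\nabla p, \phi) = 0$, for all $\phi \in J(\Omega)$. Later,
when we need it, we will also argue that
$L^2(\Omega) = J(\Omega) \oplus G(\Omega)$.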
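
The orthogonality of gradients and solenoidal fields rests on a single integration by parts. As a minimal sketch, assuming $p$ is smooth and $\phi \in D(\Omega)$, so that no boundary term appears:

```latex
(\nabla p, \phi) = \int_\Omega \nabla p \cdot \phi \, dx
                 = -\int_\Omega p \, (\nabla \cdot \phi) \, dx
                 = 0 ,
% the boundary term vanishes because \phi has compact support in \Omega,
% and the volume term vanishes because \nabla \cdot \phi = 0
```

The general case is then reached by passing to limits in $\phi$, using the definition of $J(\Omega)$ as a completion of $D(\Omega)$.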

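With these preparations, it is evident that every
smooth vector field $u$ satisfying $\nabla\cdot u = 0$ and
$u|_{\partial\Omega} = 0$ can be regarded as a solution of the steady
Stokes problem


$$-\Delta u + \nabla p = f \ \text{ and } \ \nabla\cdot u = 0 \ \text{ in } \Omega, \qquad u|_{\partial\Omega} = 0 \qquad [18]$$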


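with $f = -P\Delta u$. For such solutions, and hence for
all such $u$, we have the estimates


$$\|u\|_{W_2^2(\Omega)} \le c\,\|P\Delta u\| \qquad [19]$$

and

$$\sup_\Omega |u|^2 \le \begin{cases} c\,\|u\|\,\|P\Delta u\|, & \text{if } n=2 \\ c\,\|\nabla u\|\,\|P\Delta u\|, & \text{if } n=3 \end{cases} \qquad [20]$$

with constants independent of $u$. It can also be
shown that every such vector field $u$ belongs to $J_1(\Omega)$
and hence to $J(\Omega)$; see Heywood (1973).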

Some history and remarks are in order. The 
inequality [19] was proved independently by
Solonnikov (1964, 1966), and by Prodi's student
Cattabriga (1961). In fact, they gave $L^p$ versions of
it for all orders of the derivatives. Several proofs
specific to the $L^2$ case needed here have been given
by Solonnikov and Ščadilov (1973) and by Beirão da
Veiga (1997). The inequalities [20] can be proved by
combining [19] with appropriate Sobolev inequal- 
ities, or better, by combining [19] with recent 
inequalities of Xie (1991) which are of precisely 
the form [20], but with Au instead of PAu on the 
right-hand side, and without the requirement that 
V -u=0. The constant c in [19] depends upon the 
regularity of the boundary, and tends to infinity 
along with a bound for the boundary curvature. 
Through the work of Xie (1992, 1997), there is
reason to believe that the inequalities [20] are
probably valid for arbitrary domains, with the
constant $c = (2\pi)^{-1}$ if $n=2$, and $c = (3\pi)^{-1}$ if $n=3$.
Xie’s efforts to prove this have been continued by 
the author (Heywood 2001). If the inequalities 
[20] can be proved for arbitrary domains (i.e., 
arbitrary open sets), with these fixed constants, 
then the approach to Navier-Stokes theory pre- 
sented in this article will extend immediately to 
arbitrary domains, as explained in Heywood and 
Xie (1997), with estimates independent of the 
domain. 

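We go on now with an estimation of $\|\nabla u(t)\|$
based on [20]. Multiplying the Navier-Stokes
equation for $u$ by $-P\Delta u$, and integrating over $\Omega$,
one obtains


$$\frac{1}{2}\frac{d}{dt}\|\nabla u\|^2 + \|P\Delta u\|^2 = (u\cdot\nabla u, P\Delta u) \le \sup_\Omega |u|\,\|\nabla u\|\,\|P\Delta u\| \qquad [21]$$


since $(u_t, -P\Delta u) = (Pu_t, -\Delta u) = (u_t, -\Delta u) = (\nabla u_t, \nabla u)$
and $(\nabla p, P\Delta u) = 0$.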
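The right-hand side of [21] can be estimated using
[20] and Young’s inequality:


$$\sup_\Omega |u|\,\|\nabla u\|\,\|P\Delta u\| \le \begin{cases} c\,\|u\|^{1/2}\|\nabla u\|\,\|P\Delta u\|^{3/2}, & \text{if } n=2 \\ c\,\|\nabla u\|^{3/2}\|P\Delta u\|^{3/2}, & \text{if } n=3 \end{cases}$$

$$\le \begin{cases} \frac{1}{2}\|P\Delta u\|^2 + c\,\|u\|^2\|\nabla u\|^4, & \text{if } n=2 \\ \frac{1}{2}\|P\Delta u\|^2 + c\,\|\nabla u\|^6, & \text{if } n=3 \end{cases}$$


Thus,

$$\frac{d}{dt}\|\nabla u\|^2 + \|P\Delta u\|^2 \le \begin{cases} c\,\|u\|^2\|\nabla u\|^4, & \text{if } n=2 \\ c\,\|\nabla u\|^6, & \text{if } n=3 \end{cases} \qquad [22]$$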
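
The Young's inequality step can be written out explicitly. The following is a sketch for the three-dimensional case only, with $a = \|P\Delta u\|^{3/2}$, $b = c\,\|\nabla u\|^{3/2}$ and conjugate exponents $4/3$ and $4$; the weight $\varepsilon$ is an auxiliary parameter introduced here for illustration and is not notation from the article.

```latex
ab \le \tfrac{3}{4}\,(\varepsilon a)^{4/3} + \tfrac{1}{4}\,(b/\varepsilon)^{4}
% choose \varepsilon with \tfrac34\,\varepsilon^{4/3} = \tfrac12,
% i.e. \varepsilon^{4} = (2/3)^{3} = 8/27
   = \tfrac{1}{2}\,\|P\Delta u\|^{2} + \tfrac{27}{32}\,c^{4}\,\|\nabla u\|^{6} .
```

Inserting this into [21] and multiplying by 2 gives an inequality of the three-dimensional form in [22], with the generic constant renamed. The two-dimensional case follows the same pattern with $b = c\,\|u\|^{1/2}\|\nabla u\|$.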


These differential inequalities are at the core of 
present theory. Consider first the two-dimensional 
case. It can be viewed as a linear differential 
inequality 


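$$\frac{d}{dt}\|\nabla u\|^2 \le \left(c\,\|u\|^2\|\nabla u\|^2\right)\|\nabla u\|^2 \qquad [23]$$


with a coefficient $c\,\|u\|^2\|\nabla u\|^2$ that is integrable, in
view of the energy estimate [16]. Integrating it yields
a “global” estimate; for all $t > 0$,


$$\|\nabla u(t)\|^2 \le \|\nabla u_0\|^2 \exp\int_0^t c\,\|u\|^2\|\nabla u\|^2\,d\tau \le \|\nabla u_0\|^2 \exp\left(\frac{c}{2}\|u_0\|^4\right) \qquad [24]$$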
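
The passage from [23] to [24] is a standard Gronwall-type integration. A sketch, writing $\varphi(t) = \|\nabla u(t)\|^2$ and $a(t) = c\,\|u(t)\|^2\|\nabla u(t)\|^2$ (shorthand introduced here for illustration), and using the two consequences $\|u(t)\| \le \|u_0\|$ and $\int_0^t \|\nabla u\|^2 d\tau \le \frac12\|u_0\|^2$ of the energy estimate [16]:

```latex
\varphi' \le a\,\varphi
\;\Longrightarrow\;
\frac{d}{dt}\Bigl(\varphi(t)\,e^{-\int_0^t a\,d\tau}\Bigr) \le 0
\;\Longrightarrow\;
\varphi(t) \le \varphi(0)\,\exp\int_0^t a\,d\tau ,
\\
\int_0^t a\,d\tau
  \le c\,\|u_0\|^{2}\int_0^t \|\nabla u\|^{2}\,d\tau
  \le \frac{c}{2}\,\|u_0\|^{4} .
```

The bound is independent of $t$, which is why the two-dimensional estimate is global.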


However, if the three-dimensional version of [22]
is viewed as a linear differential inequality, the
coefficient to be integrated is $\|\nabla u(t)\|^4$. Thus, the
same integral which is crucial to proving continuous
dependence on the data is also crucial to proving
regularity. What we can do in the three-dimensional
case is view [22] as a nonlinear differential inequality
of the form


$$\varphi' \le c\,\varphi^3 \quad \text{or} \quad \varphi' \le c\,\|\nabla u\|^2\varphi^2 \qquad [25]$$


for $\varphi(t) = \|\nabla u(t)\|^2$. Integrating the first of these, one
obtains a local estimate


$$\|\nabla u(t)\|^2 \le \frac{\|\nabla u_0\|^2}{\sqrt{1 - 2c\,\|\nabla u_0\|^4\,t}}, \quad \text{for } 0 \le t < \frac{1}{2c\,\|\nabla u_0\|^4} \qquad [26]$$


without any restriction on the size of the data.
Integrating the second, one obtains a global estimate


$$\|\nabla u(t)\|^2 \le \frac{\|\nabla u_0\|^2}{1 - c\,\|\nabla u_0\|^2\int_0^t \|\nabla u\|^2\,d\tau} \le \frac{\|\nabla u_0\|^2}{1 - (c/2)\|u_0\|^2\|\nabla u_0\|^2} \qquad [27]$$

valid for all $t > 0$, provided

$$\|u_0\|^2\|\nabla u_0\|^2 < \frac{2}{c} \qquad [28]$$
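
Both integrations are separations of variables. The following sketch uses $\varphi(t) = \|\nabla u(t)\|^2$ and assumes $\varphi > 0$ on the interval considered; the final inequality of the global estimate then follows from the energy bound $\int_0^t \|\nabla u\|^2 d\tau \le \frac12\|u_0\|^2$.

```latex
% first form: \varphi' \le c\,\varphi^{3}
-\tfrac12\,\frac{d}{dt}\bigl(\varphi^{-2}\bigr) = \varphi^{-3}\varphi' \le c
\;\Longrightarrow\;
\varphi(t)^{-2} \ge \varphi(0)^{-2} - 2ct
\;\Longrightarrow\;
\varphi(t) \le \frac{\varphi(0)}{\sqrt{1 - 2c\,\varphi(0)^{2}\,t}} ,
\\
% second form: \varphi' \le c\,\|\nabla u\|^{2}\varphi^{2}
-\frac{d}{dt}\bigl(\varphi^{-1}\bigr) = \varphi^{-2}\varphi' \le c\,\|\nabla u\|^{2}
\;\Longrightarrow\;
\varphi(t)^{-1} \ge \varphi(0)^{-1} - c\int_0^t \|\nabla u\|^{2}\,d\tau
\;\Longrightarrow\;
\varphi(t) \le \frac{\varphi(0)}{1 - c\,\varphi(0)\int_0^t \|\nabla u\|^{2}\,d\tau} .
```

The first bound blows up at the finite time $t = 1/(2c\,\varphi(0)^2)$, which is why it is only local; the second stays finite for all time exactly when the smallness condition on the data holds.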


This is a good interpretation of what we mean by
“small data.” If Xie’s conjecture is correct, that the
constant in the three-dimensional version of [20] is
$c = (3\pi)^{-1}$, then we obtain [25]-[28] with the
constant $c = 3/(128\pi^2)$. Thus, $2/c \approx 842$.
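
The numerical value can be checked by hand. The sketch below assumes that the whole term $\|P\Delta u\|^2$ on the left of [21] is used to absorb the Young's inequality remainder, which is the choice that reproduces the stated constant; $c_0 = (3\pi)^{-1}$ is a label introduced here for the conjectured constant in [20].

```latex
\sup|u|\,\|\nabla u\|\,\|P\Delta u\|
  \le c_0^{1/2}\,\|\nabla u\|^{3/2}\,\|P\Delta u\|^{3/2}
  \le \|P\Delta u\|^{2} + \tfrac14\bigl(\tfrac34\bigr)^{3} c_0^{2}\,\|\nabla u\|^{6}
  = \|P\Delta u\|^{2} + \tfrac{27}{256}\,c_0^{2}\,\|\nabla u\|^{6} ,
\\
\tfrac12\,\tfrac{d}{dt}\|\nabla u\|^{2} \le \tfrac{27}{256}\,c_0^{2}\,\|\nabla u\|^{6}
\;\Longrightarrow\;
c = \tfrac{27}{128}\,c_0^{2}
  = \tfrac{27}{128}\cdot\tfrac{1}{9\pi^{2}}
  = \tfrac{3}{128\pi^{2}} ,
\qquad
\tfrac{2}{c} = \tfrac{256\,\pi^{2}}{3} \approx 842.2 .
```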


Further Regularity, Smoothing Estimates


Once one has an estimate of the form

$$\|\nabla u(t)\| \le M(t), \quad \text{for } 0 \le t < T \qquad [29]$$


as provided by [24], [26], or [27], one can estimate
the solution’s derivatives of all orders over the open
time interval $(0, T)$. The initial time $t=0$ must be
excluded from the interval, because the “imperfection”
of prescribed data generally causes an impulsive
acceleration along the boundary at time zero,
resulting in a thin boundary layer in which the
derivatives are so large that $\|\nabla u_t(t)\|$ and $\|u(t)\|_{W_2^3(\Omega)}$
tend to infinity as $t \to 0^+$. But the solution quickly
smooths and remains smooth as long as [29]
remains in force. Thus, our working assumption up
to this point, that solutions are $C^\infty$ smooth in $\Omega \times
[0, \infty)$, is not valid at $t=0$. However, we will see
that they are smooth in $\overline{\Omega} \times (0, T)$ and continuous in
$\Omega \times [0, T)$. They are also continuous on $[0, T)$ in the
$W_2^2(\Omega)$ norm. This is sufficient regularity to justify
everything that we have done to this point.

In this section, we give estimates for the derivatives 
of all orders with respect to time, of u and its first- and 
second-order derivatives with respect to space. In the 
next section, we will prove an existence theorem by 
Galerkin approximation. It will be easily seen that all 
of the estimates proved in this and previous sections, 
for solutions that are assumed to be smooth, also hold 
for the approximations, without any unproven 
assumptions. Therefore, they will be inherited by the 
solution that is obtained upon passing to the limit of 
the approximations. At first, this solution will be
something of a generalized solution, not fully classical,
but one which is $C^\infty$ with respect to time over the
interval $0 < t < T$, in the $W_2^2(\Omega)$ norm. In a final step,
viewing $u$ at any fixed time as a solution of the steady
Stokes equations, we can apply regularity estimates for
the Stokes equations to infer that it is $C^\infty$ in all
variables throughout $\Omega \times (0,T)$, with specific
estimates for each derivative.

The estimates of this section are obtained by
integrating an infinite sequence of differential inequalities,
for $\|u\|, \|\nabla u\|, \|u_t\|, \|\nabla u_t\|, \|u_{tt}\|, \|\nabla u_{tt}\|, \ldots$
The first two are [15] and [21], which have already
been dealt with. It turns out that after these first two, 
each succeeding differential inequality is linearized by 
the estimates obtained from its predecessor, which 
explains why the time intervals for these additional 
estimates do not become successively shorter. In fact, 
in the two-dimensional case, the energy estimate 
resulting from [15], which is valid for all time, already 
gives the linearization [23] of [21], which then 
provides an estimate valid for all time. Except for 
noting such differences between the two- and three- 
dimensional cases, we will henceforth deal with only 
the three-dimensional case. 

The differential inequalities just mentioned are 
obtained by estimating the right-hand sides of two 
sequences of differential identities, and ordering 
them by an iteration between the two sequences. 
The first sequence begins with and is patterned after 
the energy identity, 


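$$\begin{aligned}
\tfrac{1}{2}\tfrac{d}{dt}\|u\|^2 + \|\nabla u\|^2 &= 0 \\
\tfrac{1}{2}\tfrac{d}{dt}\|u_t\|^2 + \|\nabla u_t\|^2 &= -(u_t\cdot\nabla u, u_t) \\
\tfrac{1}{2}\tfrac{d}{dt}\|u_{tt}\|^2 + \|\nabla u_{tt}\|^2 &= -(u_{tt}\cdot\nabla u, u_{tt}) - 2(u_t\cdot\nabla u_t, u_{tt}) \\
&\ \ \text{etc.}
\end{aligned} \qquad [30]$$

while the second begins with and is patterned after
Prodi’s identity,

$$\begin{aligned}
\tfrac{1}{2}\tfrac{d}{dt}\|\nabla u\|^2 + \|P\Delta u\|^2 &= (u\cdot\nabla u, P\Delta u) \\
\tfrac{1}{2}\tfrac{d}{dt}\|\nabla u_t\|^2 + \|P\Delta u_t\|^2 &= (u_t\cdot\nabla u, P\Delta u_t) + (u\cdot\nabla u_t, P\Delta u_t) \\
\tfrac{1}{2}\tfrac{d}{dt}\|\nabla u_{tt}\|^2 + \|P\Delta u_{tt}\|^2 &= (u_{tt}\cdot\nabla u, P\Delta u_{tt}) + \cdots \\
&\ \ \text{etc.}
\end{aligned} \qquad [31]$$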
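
To illustrate how the members of the first sequence arise, here is a sketch of the derivation of its second identity. It assumes the equations are normalized as $u_t + u\cdot\nabla u = -\nabla p + \Delta u$, the form consistent with the identities of this section, and that $u$ is smooth enough to differentiate in time.

```latex
% differentiate the equation with respect to t
u_{tt} + u_t\cdot\nabla u + u\cdot\nabla u_t = -\nabla p_t + \Delta u_t
\\
% take the L^2 inner product with u_t, using \nabla\cdot u = \nabla\cdot u_t = 0
% and u_t|_{\partial\Omega} = 0
(u_{tt}, u_t) = \tfrac12\tfrac{d}{dt}\|u_t\|^{2}, \quad
(u\cdot\nabla u_t, u_t) = 0, \quad
(\nabla p_t, u_t) = 0, \quad
(\Delta u_t, u_t) = -\|\nabla u_t\|^{2}
\\
\Longrightarrow\quad
\tfrac12\tfrac{d}{dt}\|u_t\|^{2} + \|\nabla u_t\|^{2} = -(u_t\cdot\nabla u, u_t) .
```

Differentiating once more and testing with $u_{tt}$ produces the third identity, where the product rule accounts for the factor 2.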


Before going on, notice that we can return to [22] and
use [29] to infer a more complete estimate of the form


$$\|\nabla u(t)\|^2 + \int_0^t \|P\Delta u\|^2\,d\tau \le B(M,t), \quad \text{for } 0 < t < T \qquad [32]$$


containing an integral of $\|P\Delta u\|^2$ on the left-hand side.
We will use the notation $B(M, t)$ generically, for any
bound that depends only on the function $M(t)$ and $t$.
We remark that a term $\|u_t\|^2$ can also be included
under the integral sign on the left-hand side of [32],
because $\|u_t\|$ and $\|P\Delta u\|$ are of essentially the
same order, being the leading terms in the projection
$u_t + P(u\cdot\nabla u) = P\Delta u$ of the Navier-Stokes equation.
Finally, one can also include $\|u\|^2_{W_2^2(\Omega)}$ under the
integral sign, in view of [19].

Going on, we obtain a third differential inequality 
from the second identity of the sequence [30]. Its 
right-hand side admits the estimate 


$$-(u_t\cdot\nabla u, u_t) \le \|u_t\|_4^2\,\|\nabla u\| \le c\,\|u_t\|^{1/2}\|\nabla u_t\|^{3/2}\|\nabla u\| \le \frac{1}{2}\|\nabla u_t\|^2 + c\,\|\nabla u\|^4\|u_t\|^2 \qquad [33]$$


which, in view of [29] or [32], produces a linear
differential inequality with integrable coefficients.
Its integration yields an estimate of the form


$$\|u_t(t)\|^2 + \int_0^t \|\nabla u_t\|^2\,d\tau \le B(M, t, \|u_t(0)\|), \quad \text{for } 0 < t < T \qquad [34]$$


provided $\|u_t(0)\|$ is bounded. Since $u_t = P(\Delta u -
u\cdot\nabla u)$, we have the estimate


$$\|u_t(0)\| = \|P(\Delta u_0 - u_0\cdot\nabla u_0)\| \le \|\Delta u_0 - u_0\cdot\nabla u_0\| \le B(\|u_0\|_{W_2^2(\Omega)}) \qquad [35]$$


provided that $u$ is smooth in $\Omega \times [0, T)$. This is a
delicate point, since we have been forewarned of a
regularity breakdown at $t=0$. But we will be able to
replicate the estimate [35] for the Galerkin
approximations, ultimately validating [34] for the
approximations and the solution.

The integration of the next differential inequality,
which arises from the second of the identities [31],
requires that $\|\nabla u_t(0)\| < \infty$. Similarly to [35], we
have


$$\|\nabla u_t(0)\| = \|\nabla P(\Delta u_0 - u_0\cdot\nabla u_0)\| \le B(\|u_0\|_{W_2^3(\Omega)}) \qquad [36]$$


provided that $u$ is smooth in $\overline{\Omega} \times [0, T)$. However,
there is a big difference between [35] and [36]. In the
next section, we will not be able to obtain an analog of
[36] for the Galerkin approximations. Consequently,
the solution that is obtained will not be fully regular at
time $t=0$. It will satisfy $u \in C(\Omega \times [0, T)) \cap C^\infty(\Omega \times
(0,T))$, but not $u \in C^\infty(\Omega \times [0,T))$. It will satisfy
$\|u(t) - u_0\|_{W_2^2(\Omega)} \to 0$ but not $\|u(t) - u_0\|_{W_2^3(\Omega)} \to 0$,
as $t \to 0^+$.

One may wonder whether this is a fault or
deficiency in the Galerkin method. It is not,
remembering what was said at the beginning of
this section. For most prescribed values of $u_0$, no
matter how smooth, there is a breakdown in the
regularity of the solution as $t \to 0^+$. In fact, it was
proved in Heywood and Rannacher (1982) that if
$\|\nabla u_t(t)\|$ or any one of several other quantities,
including $\|u(t)\|_{W_2^3(\Omega)}$, remains bounded as $t \to 0^+$,
then there exists a solution $p_0$ of the overdetermined
Neumann problem


$$-\Delta p_0 = \nabla\cdot(u_0\cdot\nabla u_0) \ \text{ in } \Omega, \qquad \nabla p_0|_{\partial\Omega} = \Delta u_0|_{\partial\Omega} \qquad [37]$$


Generically speaking, this problem is not solvable,
and therefore


$$\limsup_{t\to 0^+} \|\nabla u_t(t)\| = \infty$$


We mention that under our assumption that $u_0$ is
smooth, the correctly posed Neumann problem,
with boundary condition $\partial p_0/\partial n|_{\partial\Omega} = \Delta u_0\cdot n|_{\partial\Omega}$, is
uniquely solvable for a solution po € W3()/R, and 
$\|\nabla p(t) - \nabla p_0\| \to 0$, as $t \to 0^+$; see Heywood and
Rannacher (1982).

Since solutions are smooth for $0 < t < T$, the
pressure in the Navier-Stokes equations satisfies the
overdetermined Neumann problem for all $t \in (0, T)$.
So it may seem appropriate to require that the
prescribed initial value $u_0$ be a function for which
problem [37] is solvable. We do not agree with that.
It is too difficult, if not impossible, to find such
functions, except by solving the Navier-Stokes
equations. For example, one might think that the
condition that [37] should be solvable might be
satisfied if $u_0 \in D(\Omega)$, since such functions are zero
in a neighborhood of the boundary. In fact, K
Masuda has shown that if $\Omega$ is a three-dimensional
sphere, then the overdetermined Neumann problem
[37] is never solvable for nonzero $u_0 \in D(\Omega)$. Hence,
the gradient of the initial pressure will have a
nonzero tangential component, causing an impulsive
tangential acceleration along the boundary.

If we are to use the Navier-Stokes equations to 
make predictions of the future, we must solve the 
initial boundary value problem for “man-made” 
initial values, and accept the fact that there is a 
momentary breakdown in regularity along the 
boundary, immediately following the initial time. 
Thereafter, the solution smooths as “nature” takes 
over. To prove the reliability of our predictions, we 
need continuous dependence estimates and error 
estimates for numerical methods that take into
account this initial breakdown in the regularity.
The continuous dependence estimate [14] meets this 
requirement. So also do the error estimates given in 
a series of four papers by Rannacher and the author, 
beginning with Heywood and Rannacher (1982). 
They were based on the “smoothing” regularity 
estimates for solutions that are being presented here. 
We go on with these now, as models for similar 
estimates for the Galerkin approximations. 

Estimating the right-hand side of the second of the
identities [31] using [20] and Young’s inequality,
and then multiplying through by $t$, we get the linear
differential inequality


$$\frac{d}{dt}\left(t\,\|\nabla u_t\|^2\right) + t\,\|P\Delta u_t\|^2 \le \|\nabla u_t\|^2 + c\left(\|\nabla u\|^4 + \|\nabla u\|^2 + \|P\Delta u\|^2\right)\left(t\,\|\nabla u_t\|^2\right) \qquad [38]$$


for $t\,\|\nabla u_t\|^2$, with coefficients that are integrable in
view of the previous estimates [32], [34], and [35].
Therefore, its integration yields an estimate
analogous to [32] of the form


$$t\,\|\nabla u_t(t)\|^2 + \int_0^t \tau\,\|P\Delta u_t\|^2\,d\tau \le B(M, t, \|u_0\|_{W_2^2(\Omega)}) \quad \text{for } 0 < t < T \qquad [39]$$


provided its “initial value” is finite. It is, due to the
time weight, in the sense that


$$\limsup_{t\to 0^+}\left(t\,\|\nabla u_t(t)\|^2\right) = 0 \qquad [40]$$


This is proved by noting that if the lim sup were
positive, then the integral on the left-hand side of
[34] would be infinite. Finally, a term $t\,\|u_{tt}\|^2$ can be
included under the integral sign on the left-hand side
of [39], because $\|u_{tt}\|$ and $\|P\Delta u_t\|$ are of essentially
the same order, being the leading terms in the
projection $u_{tt} + P(u_t\cdot\nabla u + u\cdot\nabla u_t) = P\Delta u_t$ of the
time differentiated Navier-Stokes equation.

We continue inductively. Estimating the right-hand
side of the third of the identities [30] using
[12], [20], and Young’s inequality, and then
multiplying through by $t^2$, we get the linear differential
inequality


d 
q (Pll?) +P Vue? 
< 2| juta [PH || Vun- PAn 
+ eVa Vut) (Pel) (4 


with coefficients that are integrable in view of
preceding estimates. In particular, the integrability
of the first term on the right-hand side follows from
the boundedness of the integral


$$\int_0^t \tau\,\|u_{tt}\|^2\,d\tau \qquad [42]$$


which, we have pointed out, can be included on the
left-hand side of [39]. Finally, notice that the
boundedness of the integral [42] implies


$$\limsup_{t\to 0^+}\left(t^2\,\|u_{tt}(t)\|^2\right) = 0 \qquad [43]$$


Therefore, we can integrate [41] to get the estimate

$$t^2\,\|u_{tt}(t)\|^2 + \int_0^t \tau^2\,\|\nabla u_{tt}\|^2\,d\tau \le B(M, t, \|u_0\|_{W_2^2(\Omega)}) \quad \text{for } 0 < t < T \qquad [44]$$


analogous to [34].

At this point, we have introduced every device
needed to proceed by induction to an infinite
sequence of time-weighted estimates, similar to
[39] and [44], but with successively higher orders
of time derivatives and weights. The dependence of
these estimates on $\|u_0\|_{W_2^2(\Omega)}$ was introduced through
[34] and [35]. It can be eliminated by beginning the
introduction of powers of $t$ as weight functions one
step earlier, with the added advantage that the initial
velocity $u_0$ needs only belong to $J_1(\Omega)$. In the
two-dimensional case, the weight functions can be
introduced even another step earlier, with the
advantage that the initial velocity $u_0$ needs only
belong to $J(\Omega)$. Each of these cases leads to an
existence theorem for solutions $u \in C^\infty(\Omega \times (0, T))$,
with the initial values assumed in the norms of $J_1(\Omega)$
and $J(\Omega)$, respectively.


Existence by Galerkin Approximation 


Let $\{a^1, a^2, \ldots\}$ and $\{\lambda_1, \lambda_2, \ldots\}$ denote the
eigenfunctions and eigenvalues of the Stokes equations,


$$-\Delta a^k + \nabla p^k = \lambda_k a^k, \quad \nabla\cdot a^k = 0 \ \text{ in } \Omega, \qquad a^k|_{\partial\Omega} = 0 \qquad [45]$$


chosen to be orthonormal in $L^2(\Omega)$. Clearly,
$-P\Delta a^k = \lambda_k a^k$, so they are also the eigenfunctions
and eigenvalues of the Stokes operator, $-P\Delta$. Using
regularity estimates for the Stokes equations, each
eigenfunction is known to be $C^\infty$ smooth in $\Omega$.

The $n$th Galerkin approximation for problem [9]
is the solution


$$u^n(x,t) = \sum_{k=1}^n c_{kn}(t)\,a^k(x)$$


of the system of ordinary differential equations


$$(u^n_t, a^l) + (u^n\cdot\nabla u^n, a^l) = (\Delta u^n, a^l) \quad \text{for } l = 1, \ldots, n \qquad [46]$$


satisfying the initial conditions $(u^n(0), a^l) = (u_0, a^l)$,
for $l = 1, 2, \ldots, n$. Of course, since $(u^n_t, a^l) = dc_{ln}/dt$
and $(\Delta u^n, a^l) = (P\Delta u^n, a^l) = -\lambda_l c_{ln}$, the differential
equations can be written as


$$\frac{d}{dt}c_{ln} = -\sum_{i,j=1}^n c_{in}c_{jn}\,(a^i\cdot\nabla a^j, a^l) - \lambda_l c_{ln}$$


and the initial conditions as $c_{ln}(0) = (u_0, a^l)$, for
$l = 1, \ldots, n$.
The system [46] is at least locally solvable, on
some interval $[0, T_n)$, with each coefficient satisfying
$c_{ln} \in C^\infty[0, T_n)$. Therefore, since the eigenfunctions
are also smooth, $u^n$ is $C^\infty$ smooth in $\Omega \times [0, T_n)$. It
also satisfies all of the identities [30] and [31] on the
interval $[0, T_n)$. Indeed, multiplying [46] by $c_{ln}$ and
summing over $l$ from 1 to $n$ has the effect of
converting $a^l$ into $u^n$. The resulting identity for $u^n$
leads immediately to the energy identity

$$\frac{1}{2}\frac{d}{dt}\|u^n\|^2 + \|\nabla u^n\|^2 = 0 \qquad [47]$$


The remaining identities in the sequence [30] are
obtained similarly. For example, the second is
obtained by taking the time derivative of [46],
multiplying through by $dc_{ln}/dt$ and summing over $l$.

Prodi’s identity is obtained by multiplying [46] by
$\lambda_l c_{ln}$ and summing, which has the effect of converting
$a^l$ into $-P\Delta u^n$. To obtain the second of the
identities [31] for $u^n$, one differentiates [46], multiplies
by $\lambda_l\,dc_{ln}/dt$ and sums. The remaining identities
in the sequence [31] are obtained similarly.

The initial conditions easily imply that $\|u^n(0)\| \le
\|u_0\|$, because $u_0 \in J(\Omega)$ and the eigenfunctions are
orthogonal and complete in $J(\Omega)$. Therefore,
integration of [47] yields the energy estimate


$$\frac{1}{2}\|u^n(t)\|^2 + \int_0^t \|\nabla u^n\|^2\,d\tau \le \frac{1}{2}\|u_0\|^2 \qquad [48]$$


which is uniform in $n$. Since $\|u^n(t)\|$ remains bounded,
the solution $u^n(t)$ can be continued for all time. Thus,
$T_n = \infty$, for all $n$. Hence, our early working assumption
about solutions, that they are smooth in $\Omega \times
[0, \infty)$, is actually valid for the Galerkin approximations.
The issue becomes one of obtaining estimates
for their derivatives that are uniform in $n$. All of the
estimates we have proved for solutions are proved in
exactly the same way for the approximations. The
only possible source of nonuniformity would arise
from the initial values of $\|\nabla u^n\|$ and $\|u^n_t\|$.
The estimates [24], [26], and [27] are uniform in
$n$, since $u_0 \in J_1(\Omega)$ and hence $\|\nabla u^n(0)\| \le \|\nabla u_0\|$,
due to the orthogonality of the eigenfunctions in the
inner-product $(\nabla u, \nabla v)$, and their completeness with
respect to functions in $J_1(\Omega)$. We also obtain a
uniform bound for $\|u^n_t(0)\|$ of the form [35], by
multiplying [46] by $dc_{ln}/dt$ and summing over $l$. In
the last step, we also need the inequality
$\|u^n(0)\|_{W_2^2(\Omega)} \le c\,\|u_0\|_{W_2^2(\Omega)}$, which follows from the
orthogonality of the eigenfunctions in the inner
product $(P\Delta u, P\Delta v)$, and their completeness with
respect to functions in $J_1(\Omega) \cap W_2^2(\Omega)$; see
Ladyzhenskaya (1969, p. 46). Any attempt to find
a bound for $\|\nabla u^n_t(0)\|$ analogous to [36] is certain to
fail, as it would lead to a contradiction with
aforementioned results from Heywood and Rannacher
(1982).


Passage to the Limit 


We now have $L^2$-bounds for $u^n$, $\nabla u^n$, $u^n_t$, $\partial^2 u^n/\partial x_i\partial x_j$,
and $\nabla u^n_t$ over any space-time region $\Omega \times (0, T')$, with
$0 < T' < T$. We also have $L^2$-bounds for all orders of
the time derivatives of these quantities over any
subregion $\Omega \times (\varepsilon, T')$, with $0 < \varepsilon < T' < T$. From
these $L^2$-bounds, we may infer the existence of a
subsequence of the Galerkin approximations, again
denoted by $\{u^n\}$, which converges, along with those of
its derivatives for which we have bounds, to a limit $u$
and its derivatives. The convergence $u^n \to u$ and
$\nabla u^n \to \nabla u$ is strong in $L^2(\Omega \times (0,T'))$; the convergence
of $u^n_t$ is strong in $L^2(\Omega \times (\varepsilon, T'))$ and weak in
$L^2(\Omega \times (0, T'))$; the convergence of $P\Delta u^n$ is weak in
$L^2(\Omega \times (0, T'))$; all time derivatives of $u^n$, $\nabla u^n$
converge strongly in $L^2(\Omega \times (\varepsilon, T'))$.

Because of estimates for the time derivatives, trace
arguments give the strong convergence $u^n \to u$,
$\nabla u^n \to \nabla u$, $u^n_t \to u_t$, and the weak convergence
$P\Delta u^n \rightharpoonup P\Delta u$, in $L^2(\Omega)$, for every $t > 0$.

For any fixed time, $u \in W_2^2(\Omega)$, and therefore $u$ is
continuous in $\overline{\Omega}$ by a well known Sobolev inequality.
Since $u \in J_1(\Omega)$, it must equal zero along the
boundary. The estimates for the time derivatives of
$u^n$, $\nabla u^n$, $\partial^2 u^n/\partial x_i\partial x_j$ imply that $u$ and its time
derivatives are time continuous in $W_2^2(\Omega)$. Therefore,
$u, u_t, u_{tt}, \ldots$ are classically continuous in
$\Omega \times (0, T)$.


Introduction of the Pressure 


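Because of the strong convergence $u^n \to u$, $\nabla u^n \to
\nabla u$, $u^n_t \to u_t$, and the weak convergence
$P\Delta u^n \rightharpoonup P\Delta u$, in $L^2(\Omega)$, for any $t > 0$, it is an easy
matter to let $n \to \infty$ in [46], obtaining, for all $t > 0$,


$$(u_t, a^l) + (u\cdot\nabla u, a^l) = (\Delta u, a^l) \quad \text{for } l = 1, 2, 3, \ldots \qquad [49]$$


Since the eigenfunctions are complete in $J(\Omega)$, and
$D(\Omega) \subset J(\Omega)$, this implies


$$(u_t + u\cdot\nabla u - \Delta u, \phi) = 0, \quad \text{for all } \phi \in D(\Omega) \qquad [50]$$


Therefore, there exists a vector field $\nabla p \in G(\Omega)$ such
that


$$u_t + u\cdot\nabla u - \Delta u = -\nabla p \qquad [51]$$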


Indeed, the usual test to determine whether a
smooth vector field $w$ is conservative in some
domain $\Omega$, and therefore representable as a gradient,
is to check whether the curve integrals


$$\oint_C w\cdot\tau\,ds \qquad [52]$$


vanish for every smooth closed curve $C \subset \Omega$. Here, $\tau$
is the unit tangent to the curve and $ds$ is its arc
length. With a little reflection, one will realize that
these curve integrals can be approximated by
volume integrals of the form $(w, \phi)$ with $\phi \in D(\Omega)$.
For this, one should choose $\phi$ to have its support in
a small tubular neighborhood of the curve, and its
streamlines parallel to the curve, with unit net flux
through any section of the tube. If $w$ is not smooth,
but only known to belong to $L^2(\Omega)$, one can
approximate it with its smooth mollifications. This
argument can be made rigorous. We previously
showed that $J(\Omega)$ and $G(\Omega)$ are orthogonal subspaces
of $L^2(\Omega)$. Now we have argued that
$L^2(\Omega) = J(\Omega) \oplus G(\Omega)$.


Classical $C^\infty$ Regularity


At any fixed time, we may regard $u$ as a solution of the
steady Stokes problem [18] with $f = -u_t - u\cdot\nabla u$.
Included in Cattabriga (1961) and Solonnikov (1964,
1966) are regularity estimates for all orders of
derivatives of the form


$$\|u\|_{W_2^{k+2}(\Omega)} \le c_k\,\|f\|_{W_2^k(\Omega)}$$


From our estimates above, we easily conclude that
$f = -u_t - u\cdot\nabla u \in W_2^1(\Omega)$. Hence, $u \in W_2^3(\Omega)$. In
fact, in view of the regularity we have proven
with respect to time, $f \in C^\infty(0,T; W_2^1(\Omega))$ and $u \in
C^\infty(0, T; W_2^3(\Omega))$. Thus begins a bootstrapping
argument. In the next step, we observe that $f \in
C^\infty(0,T; W_2^2(\Omega))$ and conclude that $u \in C^\infty(0, T;
W_2^4(\Omega))$. By induction, one obtains $u \in C^\infty(0,T;
W_2^k(\Omega))$ for every positive integer $k$. Then
well-known Sobolev inequalities imply that $u \in C^\infty
(\Omega \times (0, T))$.
Assumption of the Initial Values


We begin by showing that $u(t) \rightharpoonup u_0$, weakly in
$L^2(\Omega)$, as $t \to 0^+$. Of course, $\|u(t)\|$ remains bounded
as $t \to 0^+$, by virtue of [48], and the eigenfunctions
$\{a^l\}$ are complete in $J(\Omega)$. Writing


$$(u(t) - u_0, a^l) = (u(t) - u^n(t), a^l) + (u^n(t) - u^n(0), a^l) + (u^n(0) - u_0, a^l)$$

note that the first and third terms on the right-hand
side can be made small by choosing $n$ large. The
second can be written as


$$(u^n(t) - u^n(0), a^l) = \int_0^t (u^n_\tau, a^l)\,d\tau$$


which will be small if $t$ is small, in view of [34].
Thus, $(u(t) - u_0, a^l) \to 0$, as $t \to 0^+$, which implies
the desired weak convergence.

The strong convergence $u(t) \to u_0$ in $L^2(\Omega)$ follows
from the weak convergence if $\limsup_{t\to 0^+} \|u(t)\| \le
\|u_0\|$. The energy estimate [48] for the approximations
implies this also.

To conclude that $u(t) \to u_0$ strongly in $J_1(\Omega)$, it only
remains to be shown that $\limsup_{t\to 0^+} \|\nabla u(t)\| \le
\|\nabla u_0\|$. This readily follows from [29], provided the
bounding function $M(t)$ satisfies $M(t) \to \|\nabla u_0\|$, as
$t \to 0^+$. The bounding functions provided by our basic
estimates [24], [26], and [27] all have this property.

We may conclude that $u(t) \rightharpoonup u_0$ weakly in $W_2^2(\Omega)$,
provided $\|u(t)\|_{W_2^2(\Omega)}$ remains bounded as $t \to 0^+$. To
see this, remember that $\|P\Delta u\|$ and $\|u_t\|$ are of
essentially the same order. Thus the term $\|u_t(t)\|^2$ on
the left-hand side of [34] can be accompanied by a
term $\|u(t)\|^2_{W_2^2(\Omega)}$.

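Finally, to prove that $u(t) \to u_0$ strongly in $W_2^2(\Omega)$,
we need only show that $\limsup_{t\to 0^+} \|P\Delta u(t)\| \le
\|P\Delta u_0\|$, since $\|P\Delta\,\cdot\|$ and $\|\cdot\|_{W_2^2(\Omega)}$ are equivalent
norms on $J_1(\Omega) \cap W_2^2(\Omega)$. To this end, multiply [46]
by $\lambda_l\,dc_{ln}/dt$ and sum to get


$$\frac{1}{2}\frac{d}{dt}\|P\Delta u^n\|^2 + \|\nabla u^n_t\|^2 = (u^n\cdot\nabla u^n, P\Delta u^n_t) = \frac{d}{dt}(u^n\cdot\nabla u^n, P\Delta u^n) - (u^n_t\cdot\nabla u^n + u^n\cdot\nabla u^n_t, P\Delta u^n)$$


Integrating this gives

$$\begin{aligned}
\|P\Delta u^n(t)\|^2 - \|P\Delta u^n(0)\|^2 &= \int_0^t \frac{d}{ds}\|P\Delta u^n\|^2\,ds \\
&= 2(u^n\cdot\nabla u^n, P\Delta u^n)\big|_t - 2(u^n\cdot\nabla u^n, P\Delta u^n)\big|_0 \\
&\quad - 2\int_0^t \|\nabla u^n_s\|^2\,ds - 2\int_0^t (u^n_s\cdot\nabla u^n + u^n\cdot\nabla u^n_s, P\Delta u^n)\,ds
\end{aligned} \qquad [53]$$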
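
The differential identity preceding [53] can be traced term by term. As a sketch: multiplying [46] by $\lambda_l\,dc_{ln}/dt$ and summing over $l$ replaces $a^l$ by $-P\Delta u^n_t$, after which each inner product is evaluated using only $\nabla\cdot u^n_t = 0$, $u^n_t|_{\partial\Omega} = 0$, and the fact that $P$ is an orthogonal projection.

```latex
(u^n_t, -P\Delta u^n_t) = (u^n_t, -\Delta u^n_t) = \|\nabla u^n_t\|^{2} ,
\qquad
(\Delta u^n, -P\Delta u^n_t) = -(P\Delta u^n, P\Delta u^n_t)
                             = -\tfrac12\tfrac{d}{dt}\|P\Delta u^n\|^{2} ,
\\
\Longrightarrow\quad
\tfrac12\tfrac{d}{dt}\|P\Delta u^n\|^{2} + \|\nabla u^n_t\|^{2}
   = (u^n\cdot\nabla u^n, P\Delta u^n_t) ,
\\
(u^n\cdot\nabla u^n, P\Delta u^n_t)
   = \tfrac{d}{dt}\,(u^n\cdot\nabla u^n, P\Delta u^n)
     - (u^n_t\cdot\nabla u^n + u^n\cdot\nabla u^n_t,\; P\Delta u^n) .
```

The last line is the product rule in time; it moves the time derivative off $P\Delta u^n_t$, for which no bound uniform up to $t = 0$ is available.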


For the terms under the last integral we have 
| (uf - Vu”, PAu”) + (u” - Vut, PAu") | 
n||2 n\1/2 n\\3/2 
< |] Veer [tel] Ve" |! Pv" |”! 
Therefore, [53] implies 


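$$\|P\Delta u^n(t)\|^2 \le \|P\Delta u^n(0)\|^2 + 2(u^n\cdot\nabla u^n, P\Delta u^n)\big|_t - 2(u^n\cdot\nabla u^n, P\Delta u^n)\big|_0 + Kt$$


uniformly in $n$, as $t \to 0^+$, where $K$ is a constant
depending on the estimates [32] and [34]. Letting
$n \to \infty$ gives


$$\|P\Delta u(t)\|^2 \le \|P\Delta u(0)\|^2 + 2(u\cdot\nabla u, P\Delta u)\big|_t - 2(u\cdot\nabla u, P\Delta u)\big|_0 + Kt$$


Since $u\cdot\nabla u \to u_0\cdot\nabla u_0$ strongly in $L^2(\Omega)$, and
$P\Delta u \rightharpoonup P\Delta u_0$ weakly in $L^2(\Omega)$, we get the desired
result. The continuous assumption of the initial values
in $W_2^2(\Omega)$ also implies their continuous assumption in
the classical sense, and hence that $u \in C(\overline{\Omega} \times [0, T))$.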


Conclusion 


Years ago, mathematical questions concerning the 
Navier-Stokes equations were usually considered in 
the context of generalized or weak solutions, which was 
a technical barrier to many in the scientific community. 
Nowadays, realizing that solutions are at least locally 
classical, fundamental questions such as that of global 
regularity can be studied within the classical context. If 
the estimate [29] is proved for classical solutions, with
$T = \infty$, and without a restriction on the size of the data,
this particular matter will be settled.
Introduction


von Neumann algebras, as they are called now, 
first made their appearance under the name 
“rings of operators” in a series of seminal papers — 
see Murray and von Neumann (1936, 1937, 1943) 
and von Neumann (1936) — by F J Murray and J von 
Neumann starting in 1936. Murray and von 
Neumann (1936) specifically cite “attempts to 
generalize the theory of unitary group representa- 
tions” and “demands by various aspects of the 
quantum-mechanical formalism” among the reasons 
for the elucidation of this subject. 

In fact, the simplest definition of a von Neumann 
algebra is via unitary group representations: 
a collection M of continuous linear operators on a 
Hilbert space H (in order to avoid some potential 
technical problems, we shall restrict ourselves to
separable Hilbert spaces throughout this article) is
a von Neumann algebra precisely when there is a
representation $p$ of a group $G$ as unitary operators
on $H$ such that


$$M = \{x \in L(H) : x\,p(t) = p(t)\,x \ \ \forall t \in G\}$$


As above, we shall write $L(H)$ for the collection of
all continuous linear operators on the Hilbert space $H$;
recall that a linear mapping $x: H \to H$ is continuous
precisely when there exists a positive constant $K$ such
that $\|x\xi\| \le K\|\xi\| \ \forall \xi \in H$. If the norm $\|x\|$ of the
operator $x$ is defined as the smallest constant $K$ with
the above property, then the set $L(H)$ acquires the
structure of a Banach space. In fact $L(H)$ is a Banach
*-algebra with respect to the composition product, and
involution $x \mapsto x^*$ given by


$$\langle x\xi, \eta\rangle = \langle \xi, x^*\eta\rangle \quad \forall \xi, \eta \in H$$


The first major result in the subject is the 
remarkable “double commutant theorem,” which 
establishes the equivalence of a purely algebraic 
requirement to purely topological ones. We need 
two bits of terminology to be able to state the
theorem.

First, define the commutant $S'$ of a subset $S \subset L(H)$ by

$$S' = \{x' \in L(H) : xx' = x'x \ \ \forall x \in S\}$$


Second, the strong (resp., weak) operator topology is
the topology on $L(H)$ of “pointwise strong (resp.,
weak) convergence”: that is, $x_n \to x$ precisely when
$\|x_n\xi - x\xi\| \to 0 \ \forall \xi \in H$ (resp., $\langle x_n\xi - x\xi, \eta\rangle \to 0 \ \forall \xi,
\eta \in H$).


Theorem 1 The following conditions on a unital 
*-subalgebra M of L(H) are equivalent: 


(i) $M = M''\ (= (M')')$.
(ii) M is closed in the strong operator topology. 
(iii) M is closed in the weak operator topology. 


The conventional definition of a von Neumann 
algebra is that it is a unital *-subalgebra of L(H) 
which satisfies the equivalent conditions above. The 
equivalence with our earlier “simplest definition” is 
a consequence of the double commutant theorem 
and the fact that any element of a von Neumann 
algebra is a linear combination of four unitary 
elements of the algebra: simply take G to be the 
group of unitary operators in M’. 

Another consequence of the double commutant 
theorem is that von Neumann algebras are closed 
under any “canonical construction.” For instance, 
the uniqueness of the spectral measure $E \mapsto P_x(E)$
associated to a normal operator $x$ shows that if $u$ is
unitary, then $P_{uxu^*}(E) = uP_x(E)u^*$ for all Borel sets $E$.
In particular, if $x \in M$ and $u' \in U(M')$, then
$u'P_x(E)u'^* = P_{u'xu'^*}(E) = P_x(E)$, and hence, we may
conclude that $P_x(E) \in U(M')' = (M')' = M$ (we will
write $U(N)$ (resp., $P(N)$) to denote the collection of
unitary (resp., projection) operators in any von
Neumann algebra $N$); that is, if a von Neumann
algebra contains a normal operator, it also contains
all the associated spectral projections. This fact,
together with the spectral theorem, has the
consequence that any von Neumann algebra $M$ is the
closed linear span of $P(M)$.

The analogy with unitary group representations is 
fruitful. Suppose then that $M = p(G)'$, for a unitary
representation $p$ of $G$. Then the last sentence of the
previous paragraph implies that $p(G)' = \mathbb{C}$ precisely
when there exist no nontrivial $p$-stable subspaces
(here and in the sequel, we identify $\mathbb{C}$ with its image
under the unique unital homomorphism of $\mathbb{C}$ into
$L(H)$; and we reserve the symbol $Z(M)$ to denote the
center of $M$), that is, when $p$ is irreducible. In general,
the $p$-stable subspaces are precisely the ranges of
projection operators in $M$. The notion of unitary
equivalence of subrepresentations of $p$ is seen to
translate to the equivalence defined on the set $P(M)$
of projections in $M$, whereby $p \sim q$ if and only if
there exists an operator $u \in M$ such that $u^*u = p$ and
$uu^* = q$. (Such a $u$ is called a partial isometry, with
“initial space” = range $p$, and “final space” = range $q$.)
This is the definition of what is known as the
“Murray-von Neumann equivalence rel $M$” and is
denoted by $\sim_M$. The following accompanying
definition is natural: if $p, q \in P(M)$, say $p \le_M q$ if there
exists $p_0 \in P(M)$ such that $p \sim_M p_0 \le q$, where of
course $e \le f \Leftrightarrow \mathrm{range}(e) \subset \mathrm{range}(f)$.


The Murray-von Neumann 
Classification of Factors 


We start with a fact (whose proof is quite easy) and 
a consequent fundamental definition. 


Proposition 2 The following conditions on a von 
Neumann algebra M are equivalent: 


(i) for any $p, q \in \mathcal{P}(M)$, it is true that either $p \preceq_M q$
or $q \preceq_M p$.
(ii) $Z(M) = M \cap M' = \mathbb{C}$.


The von Neumann algebra M is called a “factor” if 
it satisfies the equivalent conditions above. 


The alert reader would have noticed that if $G$ is
a finite group, then $\pi(G)'$ is a factor precisely when
the representation $\pi$ is “isotypical.” Thus, the
“representation-theoretic fact,” that any unitary
representation is expressible as a direct sum of
isotypical subrepresentations, translates into the
“von Neumann algebraic fact” that any $*$-subalgebra
of $\mathcal{L}(\mathcal{H})$ is isomorphic, when $\mathcal{H}$ is finite dimensional,
to a direct sum of factors. In complete generality,
von Neumann (1949) showed that any von
Neumann algebra is expressible as a “direct integral
of factors.” We shall interpret this fact from
“reduction theory” as the statement that all the
magic/mystery of von Neumann algebras is contained
in factors and hence restrict ourselves, for a
while, to the consideration of factors.

Murray and von Neumann initiated the study of a
general factor $M$ via a qualitative as well as a
quantitative analysis of the relation $\preceq_M$ on $\mathcal{P}(M)$.
First, call a $p \in \mathcal{P}(M)$ infinite if there exists a $p_0 \le p$
such that $p \sim_M p_0$ and $p_0 \ne p$; otherwise, say $p$ is
finite. They obtained an analog, called the “dimension
function,” of the Haar measure, as follows.


Theorem 3 


(i) With $M$ as above, there exists a function
$D_M : \mathcal{P}(M) \to [0, \infty]$ which satisfies the following
properties, and is determined up to a multiplicative
constant, by them:

• $p \preceq_M q \Leftrightarrow D_M(p) \le D_M(q)$

• $p$ is finite if and only if $D_M(p) < \infty$

• If $\{p_n : n = 1, 2, \ldots\}$ is any sequence of pairwise
orthogonal projections in $\mathcal{P}(M)$ and
$p = \sum_n p_n$, then $D_M(p) = \sum_n D_M(p_n)$


(ii) $M$ falls into exactly one of five possible cases,
depending on which of the following sets is the
range of some scaling of $D_M$:

• ($\mathrm{I}_n$) $\{0, 1, 2, \ldots, n\}$

• ($\mathrm{I}_\infty$) $\{0, 1, 2, \ldots, \infty\}$

• ($\mathrm{II}_1$) $[0, 1]$

• ($\mathrm{II}_\infty$) $[0, \infty]$

• ($\mathrm{III}$) $\{0, \infty\}$


In words, we may say that a factor M is of: 


1. type I (i.e., of type $\mathrm{I}_n$ for some $1 \le n \le \infty$)
precisely when $M$ contains a minimal projection,

2. type II (i.e., of type $\mathrm{II}_1$ or $\mathrm{II}_\infty$) precisely when $M$
contains nonzero finite projections but no minimal
projections, and

3. type III precisely when $M$ contains no nonzero
finite projections.


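Examples $L^\infty(\Omega, \mu)$ may be regarded as a von
Neumann algebra acting on $L^2(\Omega, \mu)$ as multiplication
operators; thus, if we set $m_f(\xi) = f\xi$,
then $m : f \mapsto m_f$ defines an isomorphism of $L^\infty(\Omega, \mu)$
onto a commutative von Neumann subalgebra of
$\mathcal{L}(L^2(\Omega, \mu))$. In fact, “up to multiplicity,” this is how
any commutative von Neumann algebra looks.


It is a simple exercise to prove that $M \subset \mathcal{L}(\mathcal{H})$ is a
factor of type $\mathrm{I}_n$, $1 \le n \le \infty$, if and only if there exist
Hilbert spaces $\mathcal{H}_n$ and $\mathcal{K}$ and identifications
$\mathcal{H} = \mathcal{H}_n \otimes \mathcal{K}$, $M = \{x \otimes \mathrm{id}_{\mathcal{K}} : x \in \mathcal{L}(\mathcal{H}_n)\}$,
where $\dim \mathcal{H}_n = n$; and so $M \cong \mathcal{L}(\mathcal{H}_n)$.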

To discuss examples of the other types, it will be 
convenient to use “crossed products” of von 
Neumann algebras by ergodically acting groups of 
automorphisms. We shall now digress with a 
discussion of this generalization of the notion of a 
semidirect product of groups. 

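If $\alpha : G \to \mathrm{Aut}(M)$ is an action of a countable group
$G$ on $M$, where $M \subset \mathcal{L}(\mathcal{H})$ is a von Neumann
algebra, and $\tilde{\mathcal{H}} = \ell^2(G, \mathcal{H})$, there are representations
$\pi : M \to \mathcal{L}(\tilde{\mathcal{H}})$ and $\lambda : G \to \mathcal{U}(\mathcal{L}(\tilde{\mathcal{H}}))$ defined by

$$(\pi(x)\xi)(s) = \alpha_{s^{-1}}(x)\,\xi(s), \qquad (\lambda(t)\xi)(s) = \xi(t^{-1}s).$$

These representations satisfy the commutation relation
$\lambda(t)\pi(x)\lambda(t^{-1}) = \pi(\alpha_t(x))$, and the crossed product
$M \rtimes_\alpha G$ is the von Neumann subalgebra of $\mathcal{L}(\tilde{\mathcal{H}})$
defined by $\tilde{M} = (\pi(M) \cup \lambda(G))''$.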


Let us restrict ourselves to the case of
$M = L^\infty(\Omega, \mu)$ acting on $L^2(\Omega, \mu)$. In this case, it is
true that any automorphism of $M$ is of the form
$f \mapsto f \circ T^{-1}$, where $T$ is a “nonsingular transformation
of the measure space $(\Omega, \mu)$” (= a bijection
which preserves the class of sets of $\mu$-measure 0). So,
an action of $G$ on $M$ is of the form $\alpha_t(f) = f \circ T_t^{-1}$,
for some homomorphism $t \mapsto T_t$ from $G$ to the
group of nonsingular transformations of $(\Omega, \mu)$. We
have the following elegantly complete result from
Murray and von Neumann (1936).


Theorem 4 Let $M, G, \alpha$ be as in the last section,
and let $\tilde{M} = M \rtimes_\alpha G$. Assume the $G$-action is “free,”
meaning that if $t \ne 1 \in G$, then
$\mu(\{\omega \in \Omega : T_t(\omega) = \omega\}) = 0$. Then:


(i) $\tilde{M}$ is a factor if and only if $G$ acts ergodically on
$(\Omega, \mu)$, meaning that the only $G$-fixed functions
in $M$ are the constants.

(ii) Assume that $G$ acts ergodically. Then the type of
the factor $\tilde{M}$ is determined as follows:


• $\tilde{M}$ is of type I or II if and only if there exists
a $G$-invariant measure $\nu$ which is mutually
absolutely continuous with respect to $\mu$,
meaning $\nu(E) = 0 \Leftrightarrow \mu(E) = 0$ (the ergodicity
assumption implies that such a $\nu$ is necessarily
unique up to scaling by a positive constant);

• $\tilde{M}$ is of type $\mathrm{I}_n$ precisely when the $\nu$ as above is
totally atomic, and $\Omega$ is the disjoint union of $n$
atoms for $\nu$;

• $\tilde{M}$ is of type II precisely when the $\nu$ as above is
nonatomic;

• $\tilde{M}$ is of finite type, meaning that 1 is a finite
projection in $\tilde{M}$, precisely when the $\nu$ as
above is a finite measure;

• $\tilde{M}$ is of type III if and only if there exists no $\nu$
as above.


Thus, we get all the types of factors by this 
construction; for instance, we may take: 


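($\mathrm{I}_n$) $G = \mathbb{Z}_n$ acting on $\Omega = \mathbb{Z}_n$ by translation, and
$\mu = \nu$ = counting measure

($\mathrm{I}_\infty$) $G = \mathbb{Z}$ acting on $\Omega = \mathbb{Z}$ by translation, and
$\mu = \nu$ = counting measure

($\mathrm{II}_1$) $G = \mathbb{Z}$ acting on $\Omega = \mathbb{T} = \{z \in \mathbb{C} : |z| = 1\}$ by
powers of an aperiodic rotation, and $\mu = \nu$ =
arclength measure

($\mathrm{II}_\infty$) $G = \mathbb{Q}$ acting on $\Omega = \mathbb{R}$ by translations, and
$\mu = \nu$ = Lebesgue measure

($\mathrm{III}$) $G$ = $ax + b$ group acting in the obvious manner
on $\Omega = \mathbb{R}$, with $\mu$ = Lebesgue measure (by Theorem 4,
no equivalent invariant measure $\nu$ exists in this case).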


Such crossed products of a commutative von 
Neumann algebra by an ergodically acting countable
group were intensively studied by Krieger (1970,
1976). We shall simply refer to such factors as 
“Krieger factors.” The term “Krieger factor” is 
actually used for factors obtained from a slightly 
more general construction, with ergodic group 
actions replaced by more general ergodic equiva- 
lence relations. Since there is no difference in the 
two notions at least in good (amenable) cases, we 
will say no more about this. 


Abstract von Neumann Algebras 


So far, we have described matters as they were in 
von Neumann’s time. To come to the modern era, it 
is desirable to “free a von Neumann algebra from 
the ambient Hilbert space” and to regard it as an 
abstract object in its own right which can act on 
different Hilbert spaces; for example, $L^\infty(\Omega, \mu)$ is
an object worthy of study in its own right, without
reference to $L^2(\Omega, \mu)$.

The abstract viewpoint is furnished by a theorem
of Sakai (1983); let us define an abstract von
Neumann algebra to be an abstract C*-algebra
(this is a Banach algebra with an involution related
to the norm by the so-called C*-identity
$\|x\|^2 = \|x^*x\|$) $M$ which admits a pre-dual $M_*$, i.e., $M$ is
isometrically isomorphic to the Banach dual space
$(M_*)^*$. It turns out that a predual of such an abstract
von Neumann algebra is unique up to isometric
isomorphism. Consequently, an abstract von
Neumann algebra comes equipped with a canonical
“weak*-topology,” usually called the “$\sigma$-weak topology”
on $M$. The natural morphisms in the category
of abstract von Neumann algebras are $*$-homomorphisms
which are continuous with respect to
$\sigma$-weak topologies on domain and range. It is
customary to call a linear map between abstract
von Neumann algebras “normal” if it is continuous
with respect to $\sigma$-weak topologies on domain and
range.

The equivalence of the “abstract” definition of 
this section, with the “concrete” one given earlier 
(which depends on an ambient Hilbert space), relies 
on the following four facts: 


1. $\mathcal{L}(\mathcal{H})$ is an abstract von Neumann algebra, with
the predual $\mathcal{L}(\mathcal{H})_*$ being the so-called “trace class”
of operators, equipped with the “trace norm.”

2. A self-adjoint subalgebra of $\mathcal{L}(\mathcal{H})$ is closed in the
strong operator topology, and is hence a “concrete
von Neumann algebra,” precisely when it is
closed in the $\sigma$-weak topology on $\mathcal{L}(\mathcal{H})$.

3. If $M$ is an abstract von Neumann algebra, and $N$
is a $*$-subalgebra of $M$ which is closed in the
$\sigma$-weak topology of $M$, then $N$ is also an abstract
von Neumann algebra, with one candidate for $N_*$
being $M_*/N_\perp$ (where $N_\perp = \{\rho \in M_* : n(\rho) = 0\ \forall n \in N\}$).

4. Any abstract von Neumann algebra (with separable
predual) is isomorphic (in the category of
abstract von Neumann algebras) to a (concrete)
von Neumann subalgebra of $\mathcal{L}(\mathcal{H})$ (for a separable
$\mathcal{H}$).


With the abstract viewpoint available, we shall
look for modules over a von Neumann algebra $M$,
meaning pairs $(\mathcal{H}, \pi)$ where $\pi : M \to \mathcal{L}(\mathcal{H})$ is a normal
$*$-homomorphism.

A brief digression into the proof of fact (4)
above, which asserts the existence of faithful
$M$-modules, will be instructive and useful. Suppose
$M$ is an abstract von Neumann algebra. A linear
functional $\phi$ on $M$ is called a normal state if:


• (positivity) $\phi(x^*x) \ge 0\ \forall x \in M$;
• (normality) $\phi : M \to \mathbb{C}$ is normal; and
• (normalization) $\phi(1) = 1$.


(Normal states on $L^\infty(\Omega, \mu)$ correspond to
probability measures on $\Omega$ which are
absolutely continuous with respect to $\mu$.) It is true
that there exist plenty of normal states on $M$.
In fact, they linearly span $M_*$. This implies that if
$M_*$ is separable, then there exist normal states
on $M$ which are even “faithful,” meaning
$\phi(x^*x) = 0 \Rightarrow x = 0$.

Fix a faithful normal state $\phi$ on $M$. (Consistent
with our convention about separable $\mathcal{H}$'s, we shall
only consider $M$'s with separable preduals.) The
well-known “Gelfand-Naimark-Segal” construction
then yields a faithful $M$-module which is usually
denoted by $L^2(M, \phi)$, motivated by the fact that if
$M = L^\infty(\Omega, \mu)$, and $\phi(f) = \int f\, d\nu$, with $\nu$ a probability
measure mutually absolutely continuous with
respect to $\mu$, then $L^2(M, \phi) = L^2(\Omega, \nu)$ with $L^\infty(\Omega, \mu)$
acting as multiplication operators. The construction
mimics this case: the assumptions on $\phi$ ensure that
the equation

$$\langle x, y \rangle = \phi(y^*x)$$

defines a positive-definite inner product on $M$; let
$L^2(M, \phi)$ be the Hilbert space completion of $M$. It
turns out that the operator of left-multiplication by
an element of $M$ extends as a bounded operator to
$L^2(M, \phi)$, and it then follows easily that $L^2(M, \phi)$ is
indeed a faithful $M$-module, thereby establishing
fact (4) above.

Since we wish to distinguish between elements of
the dense subspace $M$ of $L^2(M, \phi)$ and the operators
of left-multiplication by members of $M$, let us write
$\hat{x}$ for an element of $M$ when thought of as an
element of $L^2(M, \phi)$, and $x$ for the operator of
left-multiplication by $x$; thus, for instance, $\hat{x} = x\hat{1}$, and
$x\hat{y} = \widehat{xy}$, $\langle x\hat{1}, \hat{1} \rangle = \phi(x)$, etc.


Modular Theory 


While type III factors were more or less an enigma
at the time of von Neumann, all that changed with
the advent of Connes. The first major result of this
“type III era” is the celebrated “Tomita-Takesaki
theorem” (cf. Takesaki (1970)), which views the
adjoint mapping on $M$ as an appropriate operator
on $L^2(M, \phi)$, and analyzes its polar decomposition.
Specifically, we have:


Theorem 5 If $\phi$ is any faithful normal state on $M$,
consider the densely defined conjugate-linear operator
$S_\phi^{(0)}$ given, with domain $\{\hat{x} : x \in M\}$, by
$S_\phi^{(0)}(\hat{x}) = \widehat{x^*}$. Then,


(i) there is a unique conjugate-linear operator
$S_\phi$ (the “closure of $S_\phi^{(0)}$”) whose graph is
the closure of the graph of $S_\phi^{(0)}$; if we write
$S_\phi = J_\phi \Delta_\phi^{1/2}$ for the polar decomposition of the
conjugate-linear closed operator $S_\phi$, then

(ii) $J_\phi$ is an antiunitary involution on $L^2(M, \phi)$ (i.e.,
it is a conjugate-linear norm-preserving bijection
of $L^2(M, \phi)$ onto itself which is its own
inverse);

(iii) $\Delta_\phi$ is an injective positive self-adjoint operator
on $L^2(M, \phi)$ such that $J_\phi f(\Delta_\phi) J_\phi = f(\Delta_\phi^{-1})$ for all
Borel functions $f : \mathbb{R} \to \mathbb{R}$, and most crucially

$$J_\phi M J_\phi = M' \quad \text{and} \quad \Delta_\phi^{it} M \Delta_\phi^{-it} = M \ \ \forall t \in \mathbb{R}.$$

(Here and elsewhere, we shall identify $x \in M$
with the operator of “left-multiplication by $x$”
on $L^2(M, \phi)$.)


Thus, each faithful normal state $\phi$ on $M$ yields a
one-parameter group $\{\sigma_t^\phi : t \in \mathbb{R}\}$ of automorphisms
of $M$, referred to as the group of “modular
automorphisms,” given by

$$\sigma_t^\phi(x) = \Delta_\phi^{it}\, x\, \Delta_\phi^{-it}.$$

The extent of dependence of the modular group on
the state is captured precisely by Connes' Radon-Nikodym
theorem (Connes 1973), which shows that
the modular groups associated to two different
faithful normal states are related by a “unitary
cocycle in $M$.” This has the consequence that if
$\epsilon : \mathrm{Aut}(M) \to \mathrm{Out}(M) = \mathrm{Aut}(M)/\mathrm{Int}(M)$ is the quotient
mapping, where $\mathrm{Int}(M)$ denotes the normal
subgroup of inner automorphisms given by unitary
elements of $M$, then the one-parameter subgroup
$\{\epsilon(\sigma_t^\phi) : t \in \mathbb{R}\}$ of $\mathrm{Out}(M)$ is independent of $\phi$.
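As an illustration of the modular group in the simplest setting, the following sketch takes $M$ to be the $n \times n$ complex matrices with the state $\phi(x) = \mathrm{Tr}(\rho x)$ for a diagonal density matrix $\rho = \mathrm{diag}(p_1, \ldots, p_n)$ with all $p_j > 0$. In this finite-dimensional case it is a standard fact (not spelled out in the text above) that $\sigma_t^\phi(x) = \rho^{it} x \rho^{-it}$ and that the modular operator has eigenvalues $p_j/p_k$. The function names and the sample values are our own choices for the illustration.

```python
import cmath
import math


def modular_automorphism(x, p, t):
    """sigma_t(x) = rho^{it} x rho^{-it} for rho = diag(p), all p[j] > 0.

    Entrywise this multiplies x[j][k] by (p[j] / p[k])^{it}.
    """
    n = len(p)
    return [[x[j][k] * cmath.exp(1j * t * (math.log(p[j]) - math.log(p[k])))
             for k in range(n)] for j in range(n)]


def state(x, p):
    """phi(x) = Tr(rho x) for rho = diag(p)."""
    return sum(p[j] * x[j][j] for j in range(len(p)))


def modular_spectrum(p):
    """Eigenvalues p[j] / p[k] of the modular operator on L^2(M, phi)."""
    return sorted({round(pj / pk, 12) for pj in p for pk in p})


# Two-level example with weights lambda/(1+lambda) and 1/(1+lambda).
lam = 0.5
p = [lam / (1 + lam), 1 / (1 + lam)]
x = [[1 + 2j, 3], [-1j, 4]]

y = modular_automorphism(x, p, 0.7)
print(abs(state(y, p) - state(x, p)))  # the state is invariant under sigma_t
print(modular_spectrum(p))             # the ratios lambda, 1, 1/lambda
```

The spectrum consists of powers of the single ratio $\lambda$, which is the finite-dimensional shadow of the invariant $S(M)$ considered in the next section.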


Connes’ Classification and 
Injective Factors 


Given a factor $M$, Connes defined

$$S(M) = \bigcap \{\mathrm{spec}(\Delta_\phi) : \phi \text{ a faithful normal state on } M\}$$

which is obviously an isomorphism invariant. He
then classified (Connes 1973) type III factors into a
continuum of types:


Theorem 6 Let $M$ be a factor. Then,


(i) $0 \in S(M) \Leftrightarrow M$ is of type III; and
(ii) if $M$ is a type III factor, there are three mutually
exclusive and exhaustive possibilities:


• ($\mathrm{III}_0$) $S(M) = \{0, 1\}$
• ($\mathrm{III}_\lambda$) $S(M) = \{0\} \cup \lambda^{\mathbb{Z}}$, for some $0 < \lambda < 1$
• ($\mathrm{III}_1$) $S(M) = [0, \infty)$


Example 7 Consider the compact group $\Omega = \prod_{n=1}^\infty G_n$,
where $G_n$ is a finite cyclic group of order $\nu_n$ for
each $n$. Let $\mu = \prod_n \mu_n$, where $\mu_n$ is a probability
measure defined on the subsets of $G_n$ which assigns
positive mass to each singleton. Let $G = \bigoplus_{n=1}^\infty G_n$ be
the dense subgroup of $\Omega$ consisting of finitely
nonzero sequences. It is not hard to see that each
translation $T_g$, $g \in G$ (given by $T_g(\omega) = g + \omega$) is a
nonsingular transformation of the measure space
$(\Omega, \mu)$. The density of $G$ in $\Omega$ shows that this action
of $G$ on $L^\infty(\Omega, \mu)$ is fixed-point-free and ergodic,
with the result that the crossed product $L^\infty(\Omega, \mu) \rtimes G$
is a factor.

Krieger showed that in the case of a Krieger factor
$M = L^\infty(\Omega, \mu) \rtimes G$, the invariant $S(M)$ agrees with the
so-called “asymptotic ratio set” of the group $G$ of
nonsingular transformations, which is computable
purely in terms of the Radon-Nikodym derivatives
$d(\mu \circ T_t)/d\mu$. Using this ratio set description, it is
not hard to see that the Krieger factor $M$ given by
the infinite product $\Omega$


• is a factor of type $\mathrm{III}_\lambda$ if $\nu_n = 2$ and $\mu_n\{0\} = \lambda/(1 + \lambda)$
for all $n$;

• is a factor of type $\mathrm{III}_1$ if $\nu_n = 2$ and $\mu_{2n}\{0\} = \lambda/(1 + \lambda)$,
$\mu_{2n+1}\{0\} = \kappa/(1 + \kappa)$, for all $n$, provided
that $\{\lambda, \kappa\}$ generates a dense multiplicative
subgroup of $\mathbb{R}_+^\times$;

• can be of type $\mathrm{III}_0$.

Among all factors, Connes identified one tractable
class, the so-called injective factors, which are
ubiquitous and amenable to classification. To start
with, he established the equivalence of several
(seemingly quite disparate) requirements on a von
Neumann algebra $M \subset \mathcal{L}(\mathcal{H})$, ranging from injectivity
(meaning the existence of a projection of norm
1 from $\mathcal{L}(\mathcal{H})$ onto $M$) to “approximate finite
dimensionality” (meaning $M = (\bigcup_n A_n)''$ for some
increasing sequence $A_1 \subset A_2 \subset \cdots \subset A_n \subset \cdots$ of
finite-dimensional $*$-subalgebras). In the same
paper, Connes (1976) essentially finished the complete
classification of injective factors. Only the
injective $\mathrm{III}_1$ factor withstood his onslaught; but
eventually even it had to surrender to the technical
virtuosity of Haagerup (1987) a few years later!

In the language we have developed thus far, the 
classification of injective factors may be summarized 
as follows: 


• Every injective factor is isomorphic to a Krieger
factor.

• Up to isomorphism, there is a unique injective
factor of each type with the solitary exception of
$\mathrm{III}_0$.

• Injective factors of type $\mathrm{III}_0$ are classified (up to
isomorphism) by an invariant of an ergodic-theoretic
nature called the “flow of weights”;
unfortunately, coming up with a crisp description
of this invariant, which is simultaneously accessible
to the nonexpert and is consistent with the
stipulated size of this survey, is beyond the scope
of this author.


The interested reader is invited to browse through 
one of the books (Connes 1994, Sunder 1986, 
Dixmier 1981) for further details; the third book is 
the oldest (a classic but the language has changed a 
bit since it was written), the second is more recent 
(but quite sketchy in many places), and the first is 
clearly the best choice (if one has the time to read it 
carefully and digest it). Alternatively, the interested
reader might want to browse through the encyclopedic
treatments (Kadison and Ringrose) or
(Takesaki).


See also: Algebraic Approach to Quantum Field Theory; 
Bicrossproduct Hopf Algebras and Noncommutative 
Spacetime; Braided and Modular Tensor Categories;
and q-Deformation Quantum Groups; The Jones


Introduction


Subfactor theory was initiated by Jones (1983) and 
has experienced rapid progress beyond the frame- 
work of operator algebras. Here we start with a 
basic introduction in this section. 

A factor is a von Neumann algebra with a trivial 
center. A von Neumann algebra M is an algebra of 
bounded linear operators on a Hilbert space H, 
which contains the identity operator and is closed
under the *-operation and weak operator topology, 
and its center is the intersection of M and its 
commutant 


M' = {x € B(H)|xy = yx for all y € M} 


where B(H) denotes the set of all the bounded linear 
operators on H. (We are mostly interested in 
separable, infinite-dimensional Hilbert spaces. A 
von Neumann algebra is automatically closed also 
in the norm topology and thus it is also a C*-algebra.) 
By definition, a factor $M$ acts on a certain Hilbert
space $H$, but we also consider its action on another
Hilbert space $K$, that is, a $\sigma$-weakly continuous
homomorphism preserving the $*$-operation from $M$
into $B(K)$. A subfactor is a factor $N$ which is
contained in another factor $M$ and has the same
identity. Factors are classified into types
$\mathrm{I}_n$ ($n = 1, 2, 3, \ldots$), $\mathrm{I}_\infty$, $\mathrm{II}_1$, $\mathrm{II}_\infty$, and $\mathrm{III}$. In most of
the interesting studies of subfactors, the two factors
are both of type $\mathrm{II}_1$ or both of type $\mathrm{III}$. A factor $M$ is
said to be of type $\mathrm{II}_1$ if it is infinite dimensional
and has a finite trace $\mathrm{tr} : M \to \mathbb{C}$. By definition, a
finite trace tr is a linear functional on $M$ satisfying
$\mathrm{tr}(1) = 1$, $\mathrm{tr}(xy) = \mathrm{tr}(yx)$ for all $x, y \in M$, and
$\mathrm{tr}(x^*x) \ge 0$ for all $x \in M$. When a factor $M$, not
isomorphic to $\mathbb{C}$, acts on a separable Hilbert space, it
is of type III if and only if for any two nonzero
projections $p, q \in M$, we have an operator $v \in M$
with $vv^* = p$ and $v^*v = q$. One obviously cannot have
a trace on such a factor. (See Takesaki (2002, 2003)
for a general theory on factors.)

Let $M$ be a type $\mathrm{II}_1$ factor acting on a Hilbert
space $H$. We then have the coupling constant of
Murray and von Neumann, which is denoted by
$\dim_M H$ and belongs to $(0, \infty]$. This measures the
relative dimension of $H$ with respect to $M$. Note
that the factor $M$ acts on $M$ itself by the left
multiplication. We introduce an inner product on
$M$ by $(x, y) = \mathrm{tr}(y^*x)$ and denote the completion by
$L^2(M)$. Then $M$ acts on this Hilbert space and we
have $\dim_M L^2(M) = 1$.

Let $N \subset M$ be a subfactor and suppose that both
$N$ and $M$ are of type $\mathrm{II}_1$. (We then simply say that
$N \subset M$ is a type $\mathrm{II}_1$ subfactor.) Suppose that $M$ acts
on a Hilbert space $H$ with $\dim_M H < \infty$. Then we
define the Jones index of $N$ in $M$ by

$$[M : N] = \frac{\dim_N H}{\dim_M H}.$$

This number is independent of the choice of $H$, as
long as $\dim_M H < \infty$, so we can take $H = L^2(M)$;
then we have $[M : N] = \dim_N L^2(M)$. The equality
$[M : N] = 1$ means $M = N$. The first major discovery
of Jones (1983) is that the value of the Jones index is
in the set

$$\{4\cos^2(\pi/m) \mid m = 3, 4, 5, \ldots\} \cup [4, \infty] \qquad [1]$$

and all the values in this set are indeed realized.
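To get a feeling for the discrete part of the set [1], the following minimal sketch lists the first few values $4\cos^2(\pi/m)$ and tests whether a given number lies in the set. The function names are hypothetical and introduced only for this illustration, and the membership test is a floating-point heuristic that becomes unreliable very close to 4, where the discrete values accumulate.

```python
import math


def jones_index_below_four(m):
    """Return 4*cos^2(pi/m), the m-th value in the discrete series (m >= 3)."""
    if m < 3:
        raise ValueError("m must be at least 3")
    return 4 * math.cos(math.pi / m) ** 2


def is_allowed_index(value, tol=1e-9):
    """Heuristic membership test for {4 cos^2(pi/m) : m >= 3} U [4, infinity]."""
    if value >= 4:
        return True
    if value < 1 - tol:
        return False
    angle = math.acos(min(1.0, math.sqrt(value) / 2))
    if angle == 0.0:
        return True  # numerically indistinguishable from 4
    m = math.pi / angle  # value = 4 cos^2(pi/m) exactly when m is an integer
    return m >= 3 - tol and abs(m - round(m)) < tol


for m in range(3, 9):
    print(m, jones_index_below_four(m))
print(is_allowed_index(2.5), is_allowed_index((3 + math.sqrt(5)) / 2))
```

The values for $m = 3, 4, 6$ are $1$, $2$, and $3$ up to rounding, and $m = 5$ gives the square of the golden ratio, $(3 + \sqrt{5})/2$.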

Suppose we have a $\mathrm{II}_1$ factor $M$ and an action of
an at most countable, discrete group $G$ on $M$, that
is, a homomorphism $\alpha : G \to \mathrm{Aut}(M)$, where $\mathrm{Aut}(M)$
is the automorphism group of $M$. Then we have a
construction $M \rtimes_\alpha G$, called the crossed product. If
$\alpha_g$ is not an inner automorphism of $M$ for any $g \in G$
other than the identity element of $G$, then $M \rtimes_\alpha G$ is
also a type $\mathrm{II}_1$ factor. (An automorphism $\sigma$ of $M$ is
said to be inner if it is of the form $\sigma(x) = uxu^*$ for
some unitary operator $u \in M$.) The index of a
subfactor $M \subset M \rtimes_\alpha G$ is the order of $G$, which can
be infinite. If we have a subgroup $H$ of $G$, then we
obtain a subfactor $M \rtimes_\alpha H \subset M \rtimes_\alpha G$ and its index is
given by the index $[G : H]$ of the subgroup $H$. This
analogy to the index of a subgroup is the origin of
the terminology of the Jones index for a subfactor.
The Jones index is also analogous to the degree of
an extension of a field. From the viewpoint of this
analogy, subfactor theory can be regarded as a
certain generalized analogue (or the “quantum”
version) of the classical Galois theory for field
extensions. (The direct analog of the classical Galois
correspondence for subfactors was studied by
Nakamura-Takeda in the early days, and Izumi-Longo-Popa
gave the most general form.)

The tools Jones (1983) has introduced to study
subfactors are as follows. Let $N \subset M$ be a subfactor of
type $\mathrm{II}_1$ with finite Jones index. We consider the
actions of $N$, $M$ on $L^2(M)$. The completion of $N$ with
respect to the inner product given by the trace gives
$L^2(N)$, which is naturally regarded as a closed
subspace of $L^2(M)$. Let $e_N$ be the projection on
$L^2(M)$ onto $L^2(N)$, which is called the Jones
projection. We define $M_1$ to be the von Neumann
algebra generated by $M$ and $e_N$ on $L^2(M)$. This is again
a type $\mathrm{II}_1$ factor. This construction
is called the basic construction. We obtain $[M_1 : M] = [M : N]$.
Repeat the same procedure for $M \subset M_1$
acting on $L^2(M_1)$ this time. In this way, we have an
increasing sequence of type $\mathrm{II}_1$ factors,

$$N \subset M \subset M_1 \subset M_2 \subset M_3 \subset \cdots$$

which is called the Jones tower. We label the
corresponding Jones projections as $e_1 = e_N$, $e_2 = e_M$,
$e_3 = e_{M_1}$, .... We then have the following celebrated
Jones relations:

$$e_j e_k = e_k e_j \ \text{ if } |j - k| > 1, \qquad e_j e_{j \pm 1} e_j = \frac{1}{[M : N]}\, e_j. \qquad [2]$$

Jones proved the above-mentioned restriction on
the possible values of the Jones index using these
relations. The realization of the index values below
4 in the set [1] by Jones also relies on these
relations of the Jones projection. The basic construction
is also possible for the other direction.
That is, we can construct a subfactor $N_1 \subset N$ so
that $N \subset M$ is the basic construction of $N_1 \subset N$.
This is called the downward basic construction.
This $N_1$ is not unique, but is unique up to an inner
automorphism of $N$.
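The Jones relations can be checked numerically in a concrete matrix model. The sketch below is not the construction of the Jones projections inside the tower; it uses instead the well-known Temperley-Lieb representation on $(\mathbb{C}^2)^{\otimes 4}$ (an assumption brought in for illustration), in which $e_j$ acts on the tensor factors $j, j+1$ as $\delta^{-1} |w\rangle\langle w|$ with $w = \sqrt{q}\,|01\rangle + |10\rangle/\sqrt{q}$ and $\delta = q + q^{-1}$. For real $q > 0$ this realizes the relations with $\delta^2 \ge 4$ playing the role of $[M : N]$; the index values below 4 are not covered by this simple model.

```python
def kron(a, b):
    """Kronecker product of square matrices stored as nested lists."""
    nb = len(b)
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(nb)]
            for i in range(len(a)) for k in range(nb)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def scale(a, c):
    return [[c * entry for entry in row] for row in a]


def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def tl_projections(q, sites=4):
    """Return ([e_1, ..., e_{sites-1}], delta**2) on the sites-fold tensor power of C^2."""
    delta = q + 1.0 / q
    # (1/delta) |w><w| in the basis 00, 01, 10, 11 of C^2 (x) C^2.
    e = [[0.0, 0.0, 0.0, 0.0],
         [0.0, q / delta, 1.0 / delta, 0.0],
         [0.0, 1.0 / delta, 1.0 / (q * delta), 0.0],
         [0.0, 0.0, 0.0, 0.0]]
    projections = [kron(kron(identity(2 ** j), e), identity(2 ** (sites - 2 - j)))
                   for j in range(sites - 1)]
    return projections, delta ** 2


(e1, e2, e3), index = tl_projections(q=2.0)
print(close(matmul(e1, e1), e1))                                # e_1 is idempotent
print(close(matmul(e1, e3), matmul(e3, e1)))                    # distant projections commute
print(close(matmul(matmul(e1, e2), e1), scale(e1, 1 / index)))  # e_1 e_2 e_1 = e_1 / index
```

Each $e_j$ here is a symmetric idempotent, hence an orthogonal projection, and the three checks correspond to the two relations in [2].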

A subfactor $N \subset M$ is said to be irreducible if the
relative commutant $N' \cap M$ is equal to $\mathbb{C}$. If a
subfactor has Jones index less than 4, then it is
automatically irreducible. The original realization of
the Jones index values above 4 by Jones was through
reducible subfactors. Popa proved that all the values
above 4 are realized with irreducible subfactors. A
factor is said to be hyperfinite if it has a dense
subalgebra given as the union of an increasing sequence
of finite-dimensional $*$-algebras. If $M$ is a hyperfinite
type $\mathrm{II}_1$ factor, then its subfactor is automatically
also hyperfinite by a deep theorem of Connes. For
hyperfinite, irreducible type $\mathrm{II}_1$ subfactors, it is still
an open problem to determine all the possible values
of the Jones index.

For type $\mathrm{II}_1$ factors $N \subset M \subset P$, the Jones index
$[P : N]$ is equal to the product $[P : M][M : N]$. Thus for
the Jones tower, we have $[M_k : N] = [M : N]^{k+1}$. In
general, if a subfactor $N \subset M$ has a finite Jones
index, then the relative commutant $N' \cap M$ is
automatically finite dimensional. So, if we start
with a type $\mathrm{II}_1$ subfactor $N \subset M$ with finite Jones
index, we have an increasing sequence of finite-dimensional
algebras as follows:

$$N' \cap M \subset N' \cap M_1 \subset N' \cap M_2 \subset N' \cap M_3 \subset \cdots \qquad [3]$$

These finite-dimensional algebras are called higher
relative commutants of $N \subset M$. We draw the
Bratteli diagram for the higher relative commutants
as follows. Consider $N' \cap M_k$ (with convention
$M_{-1} = N$, $M_0 = M$); it is a finite-dimensional
$*$-algebra, and thus it is of the form $\bigoplus_j M_{n_j}(\mathbb{C})$, where
we have only finitely many direct summands. We
draw a dot for each summand. We similarly draw a
dot for each summand in $\bigoplus_l M_{m_l}(\mathbb{C})$ for $N' \cap M_{k+1}$.
Let $\iota$ be the inclusion map from $N' \cap M_k = \bigoplus_j M_{n_j}(\mathbb{C})$
to $N' \cap M_{k+1} = \bigoplus_l M_{m_l}(\mathbb{C})$ and $p_l$ the
identity of $M_{m_l}(\mathbb{C})$, which is a projection in
$N' \cap M_{k+1}$. We denote by $\mu_{jl}$ the multiplicity of the
embedding map $x \mapsto \iota(x) p_l$ from $M_{n_j}(\mathbb{C})$ to $M_{m_l}(\mathbb{C})$.
Then we draw $\mu_{jl}$ edges from the $j$th dot for $M_{n_j}(\mathbb{C})$
to the $l$th dot for $M_{m_l}(\mathbb{C})$. We repeat this procedure
for all $k$, and get a picture as in Figure 1, which is
called the Bratteli diagram of the higher relative
commutants of $N \subset M$.

It turns out that the edges connecting the kth and 
(k +1)th steps of the Bratteli diagram consist of the 
reflection of those connecting the (k — 1)th and kth 
steps, and a (possibly empty) new part. The “new” 
parts taken altogether in the above Bratteli diagram 
constitute the principal graph of a subfactor N C M. 
In the example of Figure 1, the principal graph is the
Dynkin diagram $A_5$. In general, a principal graph
can be finite or infinite. If it is finite, we say that a 
subfactor is of finite depth. If a subfactor has the 
Jones index less than 4, it is automatically of 
finite depth and the principal graph must be one of 
the A-D-E Dynkin diagrams. 
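The link between principal graphs and index values can be made concrete. It is a standard fact, not stated in the text above, that for a finite-depth subfactor the Jones index equals the square of the norm (largest eigenvalue) of the adjacency matrix of the principal graph; for the Dynkin diagram $A_n$ this gives $4\cos^2(\pi/(n+1))$. The sketch below, with hypothetical helper names, estimates the squared norm of the path graph $A_n$ by power iteration on the square of the adjacency matrix, which avoids the oscillation caused by the symmetric spectrum of a bipartite graph.

```python
import math


def path_graph(n):
    """Adjacency matrix of the Dynkin diagram A_n (a path on n vertices)."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


def apply(adj, v):
    return [sum(adj[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]


def squared_norm(adj, iterations=500):
    """Estimate the largest eigenvalue of adj*adj by power iteration."""
    v = [1.0] * len(adj)
    for _ in range(iterations):
        w = apply(adj, apply(adj, v))
        length = math.sqrt(sum(c * c for c in w))
        v = [c / length for c in w]
    av = apply(adj, v)
    return sum(c * c for c in av)  # Rayleigh quotient of adj*adj at the unit vector v


for n in range(2, 7):
    print(n, squared_norm(path_graph(n)), 4 * math.cos(math.pi / (n + 1)) ** 2)
```

For $A_5$, the principal graph of Figure 1, both columns agree with the index value $4\cos^2(\pi/6) = 3$ up to rounding.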

Pimsner and Popa (1986) obtained the character- 
ization of the Jones index value in terms of the 
Pimsner—Popa inequality for a conditional expec- 
tation. This can be used as a definition of the index 
for a subfactor of arbitrary type (and even for 
C*-subalgebras). Kosaki obtained a definition of the 
index for type III subfactors based on works of 
Connes and Haagerup. 


Figure 1 The Bratteli diagram of the higher relative commutants (rows labeled $N' \cap N$, $N' \cap M$, $N' \cap M_1$, $N' \cap M_2$, $N' \cap M_3$, $N' \cap M_4$).


Analytic Classification Theory 


If $M$ is a hyperfinite type $\mathrm{II}_1$ factor, then it is unique
up to isomorphism. So any subfactor of such $M$ is
isomorphic to $M$ itself. We next consider the
classification problem of hyperfinite type $\mathrm{II}_1$ subfactors.
We say that a subfactor $N \subset M$ is isomorphic to
$P \subset Q$ if we have an isomorphism of $M$ onto $Q$
which maps $N$ onto $P$. The following tower of finite-dimensional
algebras is a natural invariant for a type
$\mathrm{II}_1$ subfactor $N \subset M$ with finite Jones index and it is
called the standard invariant for $N \subset M$:

$$\begin{array}{ccccccc}
M' \cap M & \subset & M' \cap M_1 & \subset & M' \cap M_2 & \subset & \cdots \\
\cap & & \cap & & \cap & & \\
N' \cap M & \subset & N' \cap M_1 & \subset & N' \cap M_2 & \subset & \cdots
\end{array} \qquad [4]$$

Each square

$$\begin{array}{ccc}
M' \cap M_k & \subset & M' \cap M_{k+1} \\
\cap & & \cap \\
N' \cap M_k & \subset & N' \cap M_{k+1}
\end{array}$$

is a special combination of inclusions called a
commuting square. Under a fairly general condition
(called extremality of a subfactor, which automatically
holds for an irreducible subfactor), the above
sequence [4] is anti-isomorphic to the following
sequence of finite-dimensional algebras, including
the trace values:

$$\begin{array}{ccccccc}
M' \cap M & \subset & N' \cap M & \subset & N_1' \cap M & \subset & \cdots \\
\cap & & \cap & & \cap & & \\
M' \cap M_1 & \subset & N' \cap M_1 & \subset & N_1' \cap M_1 & \subset & \cdots
\end{array} \qquad [5]$$

where $\cdots \subset N_3 \subset N_2 \subset N_1 \subset N \subset M$ is given by
repeated downward basic constructions. So, if the
closure of $\bigcup_j (N_j' \cap M_1)$ in the weak operator topology
is equal to $M_1$ for an appropriate choice of $N_j$'s,
then the closure of $\bigcup_j (N_j' \cap M)$ is also $M$, and the
isomorphism class of the subfactor $N \subset M$ is recovered
from the standard invariant. In such a case, we
say that a subfactor has a generating property, and
then we have a complete classification of subfactors
in terms of the standard invariant. Popa (1994)
introduced a notion called strong amenability and
proved that a subfactor of type $\mathrm{II}_1$ is strongly
amenable if and only if it has the generating property.
This is the fundamental result in the classification of
subfactors. A hyperfinite type $\mathrm{II}_1$ subfactor with finite
Jones index and finite depth is automatically strongly
amenable, so such a subfactor is covered by this
classification theorem of Popa. Popa also has a
similar result for subfactors of type III.


Constructions and Combinatorial
Classification


As mentioned in the above section, Jones constructed
hyperfinite type $\mathrm{II}_1$ subfactors for all
possible index values below 4. They have the
Dynkin diagrams $A_n$ as the principal graphs. It has
been an important problem to construct new
subfactors since then. Using the Hecke algebras,
Wenzl constructed a series of subfactors with index
values $\sin^2(N\pi/k)/\sin^2(\pi/k)$ with $N = 2, 3, 4, \ldots$,
where the series for $N = 2$ coincides with the ones
constructed by Jones. Wenzl's dimension estimate in
this work for the relative commutant has been an
important tool to study subfactors. It was soon
noticed that the subfactors of Jones and Wenzl are
related to the quantum groups $U_q(sl_N)$ of Drinfel'd-Jimbo,
at the value of the deformation parameter $q$
at $\exp(\pi i/k)$. Constructions of subfactors from other
quantum groups have been given by Wenzl.
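The statement that the $N = 2$ series coincides with the Jones values follows from the identity $\sin(2\theta) = 2\sin\theta\cos\theta$, which gives $\sin^2(2\pi/k)/\sin^2(\pi/k) = 4\cos^2(\pi/k)$. A small numerical check, with function names chosen only for this illustration and with the assumption $k > N$ so that the values are positive, might look as follows.

```python
import math


def wenzl_index(N, k):
    """Index value sin^2(N*pi/k) / sin^2(pi/k); we assume integers k > N >= 2."""
    return math.sin(N * math.pi / k) ** 2 / math.sin(math.pi / k) ** 2


def jones_index(m):
    """Jones value 4*cos^2(pi/m) for m >= 3."""
    return 4 * math.cos(math.pi / m) ** 2


for k in range(3, 9):
    print(k, wenzl_index(2, k), jones_index(k))
```

The two printed columns agree up to floating-point rounding, while larger $N$ produces values that are in general not of the form $4\cos^2(\pi/m)$.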

Ocneanu (1988) has introduced a notion of a
paragroup and characterized the higher relative
commutants arising from a type $\mathrm{II}_1$ subfactor with
finite Jones index and finite depth as a paragroup. If
we start with a subfactor $N \subset N \rtimes_\alpha G$ for a finite
group $G$, the corresponding paragroup contains
complete information on the group $G$ and its
representations. In this sense, a paragroup is a
generalization of a (finite) group. The basic idea is
to regard the bimodule ${}_N L^2(M)_M$ as an analog of the
fundamental representation of a compact Lie group
and make finite relative tensor products

$${}_N L^2(M) \otimes_M L^2(M) \otimes_N L^2(M) \otimes_M \cdots$$

Then one makes an irreducible decomposition and
studies various intertwiners arising from these
irreducible bimodules. In this way, we obtain a
certain combinatorial object and it is called a
paragroup. The vertices of the principal graphs
correspond to irreducible bimodules and the edges
correspond to basis vectors in the intertwiner spaces.
Note that by Popa's theorem explained in the
previous section, a classification of subfactors of a
hyperfinite type $\mathrm{II}_1$ factor with finite Jones index
and finite depth is reduced to one of paragroups.
Using this theory of paragroups, Ocneanu has
found that the Dynkin diagrams $A_n$, $D_{2n}$, $E_6$, and
$E_8$ are realized as principal graphs of subfactors, but
$D_{2n+1}$ and $E_7$ are not. Furthermore, each of the
graphs $A_n$ and $D_{2n}$ has a unique realization and each
of $E_6$, $E_8$ has two realizations. At the index value 4,
the principal graph must be one of the extended
Dynkin diagrams $A_n^{(1)}$, $D_n^{(1)}$, $E_6^{(1)}$, $E_7^{(1)}$, $E_8^{(1)}$, $A_\infty$,
$A_{\infty,\infty}$, and $D_\infty$, and all are realized. (The last
three correspond to subfactors of infinite depth.)
See Evans and Kawahigashi (1998) and Goodman et
al. (1989) for these constructions and classifications.
Evans-Kawahigashi and Xu studied the orbifold
construction of subfactors applied to the Hecke
algebra subfactors of Wenzl.

In a theory of integrable lattice models, we have 
squares with labeled edges, and we assign complex 
numbers to them. A paragroup has much formal 
similarity to such a lattice model, and the para- 
groups of subfactors of Jones and Wenzl correspond 
to the lattice models of Andrews—Baxter—Forrester. 

Goodman-de la Harpe-Jones have another construction
of subfactors from the Dynkin diagrams,
and for $E_6$ this gives a hyperfinite type $\mathrm{II}_1$ subfactor
with Jones index $3 + \sqrt{3}$ and finite depth. Haagerup
has made a combinatorial study on type $\mathrm{II}_1$ subfactors
with Jones index values between 4 and $3 + \sqrt{3}$
and obtained a list of candidates of possible higher
relative commutants. Haagerup himself showed one
in the list with Jones index $(5 + \sqrt{13})/2$ is indeed
realized. Asaeda-Haagerup showed that another in
the list having the Jones index $(5 + \sqrt{17})/2$ is also
realized. These two examples are still among the
most mysterious examples of subfactors today and
do not seem to arise from other constructions using
quantum groups or conformal field theory. Izumi
has another construction of a subfactor with the
Jones index $(7 + \sqrt{29})/2$ using an endomorphism of
the Cuntz algebra.

Popa has obtained a complete characterization of
higher relative commutants including the case of
infinite depth, and axiomatized the higher relative
commutants as standard λ-lattices. Xu has
constructed standard λ-lattices, hence subfactors,
from quantum groups. Popa's realization of a
given standard λ-lattice produces a nonhyperfinite
type II$_1$ subfactor. Popa–Shlyakhtenko later showed
that any standard λ-lattice is realized for a subfactor
of a single type II$_1$ factor, the group II$_1$ factor arising
from the free group $F_\infty$ having countably many
generators, which is not hyperfinite.

Jones (1999) has introduced a combinatorial 
characterization of standard λ-lattices as planar
algebras. This approach uses planar operads based 
on tangles and provides a new viewpoint on the 
structure of higher relative commutants. More 
studies on planar algebras have been done by 
Bisch—Jones. 


Topological Invariants in Three 
Dimensions and Tensor Categories 


Through the relations of the Jones projections, Jones
(1985) discovered the Jones polynomial as an
invariant for links. This was the beginning of a series
of entirely new theories in three-dimensional topology.
The Jones polynomial was quickly generalized
to the two-variable HOMFLY polynomial by Hoste,
Ocneanu, Millett, Freyd, Lickorish, and Yetter.

A three-dimensional topological quantum field
theory (TQFT$_3$) assigns a complex number to each
closed oriented 3-manifold and a finite-dimensional
vector space to each closed oriented surface.
Furthermore, to each compact oriented 3-manifold
with boundary, it assigns a vector in the vector space
corresponding to its boundary. Turaev–Viro have
constructed a TQFT$_3$ from combinatorial data called
quantum 6j-symbols arising from quantum groups.
Ocneanu has found that a subfactor of finite index
and finite depth also produces quantum 6j-symbols,
which give rise to a TQFT$_3$ generalizing the Turaev–Viro
construction. See Evans and Kawahigashi (1998) for
this construction. Reshetikhin–Turaev have another
construction of a TQFT$_3$ from a modular tensor
category, which is a braided tensor category with
nondegenerate braiding. Ocneanu has found a
subfactor version of the quantum double construction,
which produces a modular tensor category
from a type II$_1$ subfactor of finite index and finite
depth. Given such a subfactor, we can on the one hand
apply Ocneanu’s generalization of the Turaev–Viro
construction, and on the other hand apply the
Reshetikhin–Turaev construction to the
modular tensor category arising from the quantum
double construction of Ocneanu. The resulting two
TQFT$_3$s are shown to be equal by Kawahigashi–Sato–Wakui.
Concrete computations of these topological
invariants have been made by Sato–Wakui
based on Izumi’s work. Turaev and Wenzl have
other constructions of TQFT$_3$s and modular tensor
categories.


Algebraic Quantum Field Theory 


An operator algebraic approach to quantum field 
theory is called algebraic quantum field theory and 
the standard reference is Haag (1996). In this 
approach, instead of quantum fields which are 
operator-valued distributions, we consider a family 
{A(O)} of von Neumann algebras parametrized by 
spacetime regions O in a Minkowski space. Each 
A(O) is meant to be generated by self-adjoint 
operators which are observables in O. We axioma- 
tize such a family of von Neumann algebras and call 
one a local net of von Neumann algebras. It is 
enough to take O of a special form, called a double 
cone. The name “local” comes from the locality 
axiom which is a mathematical expression of the 
Einstein causality on a Minkowski space. The
Poincaré group is used as the spacetime symmetry of
the Minkowski space. Doplicher et al. (1971, 1974)
have introduced a representation theory of a local
net A of von Neumann algebras and found that a
“physically nice” representation is realized as an
endomorphism of a single von Neumann algebra A(O)
for some fixed O. They have a notion of a statistical
dimension for such a representation, and it is an
integer (or infinite) if the spacetime dimension is
larger than 2. Longo (1989, 1990) has shown that
this statistical dimension of a representation is equal
to the square root of the index [A(O) : λ(A(O))],
where λ is the endomorphism of A(O)
corresponding to the representation. The relation between
algebraic quantum field theory and subfactor theory
has been found in this way. Longo (1989, 1990) has
also started a theory of canonical endomorphisms
for a subfactor and Izumi has further studied it.
Longo has later obtained a characterization of when an
endomorphism of a factor becomes a canonical
endomorphism by introducing a Q-system.
Recently, conformal field theory has attracted 
much attention. An approach based on algebraic 
quantum field theory describes a conformal field 
theory with a local net of von Neumann algebras on 
a two-dimensional Minkowski space with diffeo- 
morphism group as the spacetime symmetry. We can 
restrict such a theory into a tensor product of two 
theories on the circle, the compactified one- 
dimensional Euclidean space. Each theory on the 
circle is called a chiral conformal field theory and 
described by a local conformal net of von Neumann 
algebras, which is a family of von Neumann 
algebras parametrized by intervals on the circle. 
The name “conformal” comes from the fact that we 
use the orientation preserving diffeomorphism group 
on the circle as the symmetry group of the space. For 
a local conformal net A of von Neumann algebras 
on the circle with natural irreducibility assumption, 
each von Neumann algebra A(I) is automatically a 
type III factor. The Doplicher-Haag—Roberts theory 
works in this setting after an appropriate adaptation 
as in Fredenhagen et al. (1989) and each representa- 
tion of a local conformal net of von Neumann 
algebras is realized by an endomorphism of A(I), 
where I is an arbitrarily fixed interval on the circle. 
(Here we do not need an assumption that a 
representation is “physically nice” since it now 
automatically holds.) Now the representations give 
a braided tensor category. 
Buchholz—Mack—Todorov constructed examples of 
local conformal nets of von Neumann algebras on the 
circle using the U(1)-current algebra. Wassermann 
(1998) has constructed more examples using positive 
energy representations of the loop groups LSU(N)
and computed their representation theory, and his
construction has been extended to other Lie groups
by Toledano Laredo and others. For the local
conformal net A of von Neumann algebras on the
circle arising from LSU(N), we take an endomorphism
λ of A(I) arising from a representation of the
local conformal net; then we have a subfactor
λ(A(I)) ⊂ A(I). This is isomorphic to the type II$_1$
subfactor constructed by Jones and Wenzl tensored
with a common type III factor.

Longo–Rehren (1995) started the study of a local
net of subfactors, A(I) ⊂ B(I). They have defined a
certain induction procedure which gives a representation
of the larger local conformal net B from that
of A. This procedure is today called α-induction. Xu
has studied this procedure and found several basic
properties. In the cases of local conformal nets of
subfactors arising from conformal embeddings, he
has found a simple construction of subfactors with
principal graphs $E_6$ and $E_8$ using α-induction.
In the context of subfactor theory, α-induction
has been further studied by Böckenhauer–Evans–Kawahigashi,
together with graphical methods of
Ocneanu on the Dynkin diagrams. More detailed
studies on local conformal nets of factors on the
circle have been pursued partly using various
techniques of subfactor theory, including the classification
of local conformal nets of von Neumann
algebras on the circle with central charge less than
1 by Kawahigashi–Longo.


See also: Algebraic Approach to Quantum Field Theory;
Braided and Modular Tensor Categories; C*-Algebras
and Their Classification; Hopf Algebras and
q-Deformation Quantum Groups; The Jones Polynomial;
Quantum 3-Manifold Invariants; Quantum Entropy; von
Neumann Algebras: Introduction, Modular Theory, and
Classification Theory; Yang–Baxter Equations.


Introduction


A vortex is commonly associated with the rotating 
motion of fluid around a common centerline. It is 
defined by the vorticity in the fluid, which measures 
the rate of local fluid rotation. Typically, the fluid 
circulates around the vortex, the speed increases as 
the vortex is approached and the pressure decreases. 
Vortices arise in nature and technology applications 
in a large range of sizes, as illustrated by the 
examples given in Table 1. The next section presents 
some of the mathematical background necessary to 
understand vortex formation and evolution. Next, 
some sample flows are described, including impor- 
tant instabilities and reconnection processes. Finally, 
some of the numerical methods used to simulate 
these flows are presented. 


Background 


Table 1 Sample vortices and typical sizes

Vortex                           Diameter
Superfluid vortices              $10^{-8}$ cm (= 1 Å)
Trailing vortex of Boeing 727    1–2 m
Dust devils                      1–10 m
Tornadoes                        10–500 m
Hurricanes                       100–2000 km
Jupiter's Red Spot               25 000 km
Spiral galaxies                  Thousands of light years

Let D be a region in three-dimensional (3D) space
containing a fluid, and let $\mathbf{x} = (x, y, z)^T$ be a point in
D. The fluid motion is described by its velocity
$\mathbf{u}(\mathbf{x}, t) = u(\mathbf{x}, t)\mathbf{i} + v(\mathbf{x}, t)\mathbf{j} + w(\mathbf{x}, t)\mathbf{k}$, and depends on
the fluid density $\rho(\mathbf{x}, t)$, temperature $T(\mathbf{x}, t)$,
gravitational field $\mathbf{g}$, and other external forces possibly
acting on it. The fluid vorticity is defined by $\boldsymbol{\omega} = \nabla \times \mathbf{u}$.
The vorticity measures the local fluid rotation about an
axis, as can be seen by expanding the velocity near
$\mathbf{x} = \mathbf{x}_0$,


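\[ \mathbf{u}(\mathbf{x}) = \mathbf{u}(\mathbf{x}_0) + D(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0) + \tfrac{1}{2}\,\boldsymbol{\omega}(\mathbf{x}_0) \times (\mathbf{x} - \mathbf{x}_0) + O(|\mathbf{x} - \mathbf{x}_0|^2) \tag*{[1]} \]

where

\[ D(\mathbf{x}) = \tfrac{1}{2}\left(\nabla\mathbf{u} + \nabla\mathbf{u}^T\right), \qquad \nabla\mathbf{u} = \begin{pmatrix} u_x & u_y & u_z \\ v_x & v_y & v_z \\ w_x & w_y & w_z \end{pmatrix} \tag*{[2]} \]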


The first term $\mathbf{u}(\mathbf{x}_0)$ corresponds to translation: all
fluid particles move with constant velocity $\mathbf{u}(\mathbf{x}_0)$.
The second term $D(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$ corresponds to a
strain field in the three directions of the eigenvectors
of the symmetric matrix D. If the eigenvalue
corresponding to a given eigenvector is positive,
the fluid is stretched in that direction; if it is
negative, the fluid is compressed. Note that, in
incompressible flow, $\nabla \cdot \mathbf{u} = 0$, so the sum of the
eigenvalues of D equals zero. Thus, at least one
eigenvalue is positive and one negative. If the third
eigenvalue is positive, fluid particles move towards
sheets (Figure 1a). If the third eigenvalue is negative,
fluid particles move towards tubes (Figure 1b). The
last term in eqn [1], $\tfrac{1}{2}\boldsymbol{\omega}(\mathbf{x}_0) \times (\mathbf{x} - \mathbf{x}_0)$,
corresponds to a rotation: near a point with $\boldsymbol{\omega}(\mathbf{x}_0) \neq 0$,
the fluid rotates with angular velocity $|\boldsymbol{\omega}|/2$ in a
plane normal to the vorticity vector $\boldsymbol{\omega}$. Fluid for
which $\boldsymbol{\omega} = 0$ is said to be irrotational.
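
The decomposition in eqns [1] and [2] can be sketched numerically. The following is a minimal illustration, not part of the original article: the function name and the sample flow (an axisymmetric strain of rate a = 1 superposed on a rigid rotation of rate 3) are assumptions chosen for the example.

```python
import numpy as np

def decompose_velocity_gradient(grad_u):
    """Split a 3x3 velocity gradient into the strain matrix D and the vorticity vector."""
    grad_u = np.asarray(grad_u, dtype=float)
    D = 0.5 * (grad_u + grad_u.T)   # symmetric part: strain field, eqn [2]
    W = 0.5 * (grad_u - grad_u.T)   # antisymmetric part: W @ r equals 0.5 * omega x r
    omega = 2.0 * np.array([W[2, 1], W[0, 2], W[1, 0]])
    return D, omega

# Hypothetical flow u = (x - 3y, 3x + y, -2z); rows of grad_u are gradients of u, v, w
grad_u = [[1.0, -3.0, 0.0],
          [3.0, 1.0, 0.0],
          [0.0, 0.0, -2.0]]
D, omega = decompose_velocity_gradient(grad_u)
strain_eigenvalues = np.sort(np.linalg.eigvalsh(D))
```

For this sample flow the eigenvalues of D sum to zero (incompressibility), two of them are positive (the sheet-forming case of Figure 1a), and |ω|/2 recovers the rotation rate 3.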

Figure 1 Strain field: (a) two positive eigenvalues, sheet
formation and (b) one positive eigenvalue, tube formation.

A vortex line is an integral curve of the vorticity.
For incompressible flow, $\nabla \cdot \boldsymbol{\omega} = \nabla \cdot (\nabla \times \mathbf{u}) = 0$,
which implies that vortex lines cannot end in the
interior of the flow, but must either form a closed loop
or start and end at a bounding surface. In 2D flow,
$\mathbf{u} = u\mathbf{i} + v\mathbf{j}$ and the vorticity is $\boldsymbol{\omega} = \omega\mathbf{k}$, where $\omega$ is the
scalar vorticity. Thus, in 2D, the vorticity points in the
z-direction and the vortex lines are straight lines
normal to the x-y plane. A vortex tube is a bundle
of vortex lines. The strength of a vortex tube is defined
as the circulation $\oint_C \mathbf{u} \cdot d\mathbf{s}$ about a curve C enclosing
the tube. By Stokes’ theorem,

\[ \oint_C \mathbf{u} \cdot d\mathbf{s} = \iint_A \boldsymbol{\omega} \cdot \mathbf{n}\, dS \tag*{[3]} \]


and thus the circulation can also be interpreted as 
the flux of vorticity through a cross section of the 
tube. In inviscid incompressible flow of constant 
density, Helmholtz’ theorem states that the tube
strength is independent of the curve C, and is 
therefore a well-defined quantity, and Kelvin’s 
theorem states that a tube’s strength remains 
constant in time. A vortex filament is an idealization 
in which a tube is represented by a single vortex line 
of nonzero strength. 

The evolution equation for the fluid vorticity, as 
derived from the Navier-Stokes equations, is 

\[ \frac{d\boldsymbol{\omega}}{dt} = \boldsymbol{\omega} \cdot \nabla\mathbf{u} + \nu\,\Delta\boldsymbol{\omega} \tag*{[4]} \]

where $d/dt = \partial/\partial t + \mathbf{u} \cdot \nabla$ is the total time derivative.
Equation [4] states that the vorticity is
transported by the fluid velocity (first term),
stretched by the fluid velocity gradient (second
term), and diffused by viscosity $\nu$ (last term). These
equations are usually nondimensionalized and written
in terms of the Reynolds number, a dimensionless
quantity inversely proportional to viscosity.

To understand high Reynolds number flow it is of 
interest to study the inviscid Euler equations. The 
corresponding vorticity evolution equation in 2D is 


\[ \frac{d\omega}{dt} = 0 \tag*{[5]} \]

which states that 2D vortex filaments in inviscid
flow move with the fluid velocity. Furthermore, in
incompressible flow, the fluid velocity is determined
by the vorticity, up to an irrotational far-field
component $\mathbf{u}_\infty$, through the Biot–Savart law,


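\[ \mathbf{u}(\mathbf{x}) = -\frac{1}{4\pi} \int \frac{(\mathbf{x} - \mathbf{x}') \times \boldsymbol{\omega}(\mathbf{x}')}{|\mathbf{x} - \mathbf{x}'|^3}\, d\mathbf{x}' + \mathbf{u}_\infty \tag*{[6]} \]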


In planar 2D flow, eqn [6] reduces to 


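\[ \mathbf{u}(\mathbf{x}) = K_{2D} * \omega, \qquad K_{2D}(\mathbf{x}) = \frac{-y\,\mathbf{i} + x\,\mathbf{j}}{2\pi |\mathbf{x}|^2} \tag*{[7]} \]

where $\omega(\mathbf{x})$ is the scalar vorticity. Equations [4], [5]
and [6], [7] are the basis of the numerical methods
discussed later in this article.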

A vortex is typically defined by a region in the
fluid of concentrated vorticity. A simple model is a
point vortex in 2D flow, which corresponds to a
straight vortex filament of unit circulation. The
associated scalar vorticity is a δ-function in the
plane, and the induced velocity is obtained from the
Biot–Savart law. For a point vortex at the origin,
this reduces to the radially symmetric velocity field
$\mathbf{u}(\mathbf{x}) = K_{2D} * \delta = K_{2D}(\mathbf{x})$. Corresponding particle
trajectories are shown in Figure 2a. The particle speed
$|\mathbf{u}| = 1/(2\pi r)$ increases unboundedly as the vortex
center is approached, and vanishes as $r \to \infty$
(Figure 2b). In general, the far-field velocity of a
concentrated vortex behaves similarly to that of
a point vortex, with speeds decaying as 1/r. Near
the vortex center, the velocity typically increases in
magnitude and, as a result, the fluid pressure
decreases (Bernoulli’s theorem). A vortex of arbitrary
shape can be approximated by a sum of point
vortices (in 2D) or vortex filaments (in 3D), as is
often done for simulation purposes.
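
As a minimal sketch of this point-vortex approximation, the snippet below evaluates the planar kernel of eqn [7] and sums the contributions of a set of point vortices. The function names are illustrative assumptions, and the far-field component is taken to be zero.

```python
import numpy as np

def k2d(x, y):
    """Planar Biot-Savart kernel K_2D(x) = (-y, x) / (2*pi*|x|^2), as in eqn [7]."""
    r2 = x * x + y * y
    return np.array([-y, x]) / (2.0 * np.pi * r2)

def induced_velocity(point, vortex_positions, strengths):
    """Velocity at `point` induced by a set of point vortices (no far-field flow)."""
    u = np.zeros(2)
    for (xv, yv), gamma in zip(vortex_positions, strengths):
        u += gamma * k2d(point[0] - xv, point[1] - yv)
    return u

# Unit point vortex at the origin, sampled at distance r = 0.5 on the x-axis
u_sample = induced_velocity((0.5, 0.0), [(0.0, 0.0)], [1.0])
speed = np.linalg.norm(u_sample)
```

The sampled speed equals 1/(2πr) and the velocity is perpendicular to the position vector, so particles circle the vortex as in Figure 2a.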

Vorticity can be generated by a variety of 
mechanisms. For example, vorticity can be gen- 
erated by density gradients, which in turn are 
induced by spatial temperature variations. This 
mechanism explains the formation of warm-air 
vortices when a layer of hot air is trapped 
underneath cooler air. Vorticity is also generated 
near solid walls in the form of boundary layers 
caused by viscosity. To illustrate, imagine
horizontal flow with speed U moving past a
solid wall at rest (Figure 3a). Since in viscous
flow the fluid sticks to the wall (the no-slip
boundary condition), the fluid velocity at the wall
is zero. As a result, there is a thin layer near the
wall in which the horizontal velocity u varies
greatly in the wall-normal direction while the gradients
of the vertical velocity v are small, yielding large
negative vorticity values $\omega = v_x - u_y$ (Figure 3b).
Similarity solutions to the approximating Prandtl
boundary-layer equations show that the boundary-layer
thickness d grows proportionally to $\sqrt{\nu t}$, where t measures the
time from the beginning of the motion. Boundary
layers can separate from the wall at corners or
regions of high curvature and move into the fluid
interior, as illustrated in several of the following
examples.

Figure 2 Flow induced by a point vortex: (a) streamlines and
(b) speed |u| vs. distance r.

Figure 3 Velocity and vorticity in boundary layer near a flat
wall.


Sample Vortex Flows 
Shear Layers 


A shear layer is a thin region of concentrated
vorticity across which the tangential velocity
component varies greatly. An example is the
constant-vorticity layer given by parallel 2D flow
$u(x, y) = U(y)$, $v(x, y) = 0$, where U is as shown in
Figure 4a. In this case, the velocity is constant
outside the layer and linear inside. The vorticity
$\omega = -U'(y)$ is zero outside the layer and constant
inside. Shear layers occur naturally in the ocean or
atmosphere when regions of distinct temperature or
density meet. To illustrate this scenario, consider a
tank containing two horizontal layers of fluids of
different densities, one on top of the other. If the
tank is tilted, the heavier bottom fluid moves
downstream, and the lighter one moves upstream,
creating a shear layer.

Figure 4 Shear layer: (a) velocity profile and (b) dispersion
relation.

Figure 5 Vortex sheet: (a) velocity profile and (b) dispersion
relation.

Flat shear layers are unstable to perturbations: 
they do not remain flat but roll up into a sequence 
of vortices. This is the Kelvin-Helmholtz instability, 
which can be deduced analytically using linear 
stability analysis. One shows that in a periodically
perturbed flat shear layer, the amplitude of a
perturbation with wave number k will initially
grow exponentially in time as $e^{\omega t}$, where $\omega = \omega(k)$
is the dispersion relation (here $\omega$ denotes the growth
rate, not the vorticity), leading to instability. The
wave number of largest growth depends on the layer
thickness. This is illustrated in Figure 4b, which
plots $\omega(k)$ for a constant-vorticity layer of thickness
2d. The wave number of maximal growth is
proportional to 1/d.
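
The 1/d scaling can be checked with a short sketch. The article does not state the dispersion relation explicitly; the formula below is Rayleigh's classical result for a piecewise-linear profile, assumed here to be normalized as U = U0·y/d inside the layer |y| < d and U = ±U0 outside. The function names and the grid search are illustrative choices.

```python
import numpy as np

def growth_rate(k, d, U0=1.0):
    """Growth rate of a mode with wave number k on a constant-vorticity layer.

    Assumed profile: U = U0*y/d for |y| < d and U = +/-U0 outside (thickness 2d).
    Modes that do not grow are reported with growth rate 0.
    """
    a = 2.0 * np.asarray(k, dtype=float) * d
    disc = np.exp(-2.0 * a) - (1.0 - a) ** 2
    return U0 / (2.0 * d) * np.sqrt(np.clip(disc, 0.0, None))

def most_unstable_wavenumber(d, U0=1.0, n=200001):
    """Locate the wave number of maximal growth by a simple grid search."""
    k = np.linspace(1e-6, 1.0 / d, n)
    return k[np.argmax(growth_rate(k, d, U0))]

k_thick = most_unstable_wavenumber(d=1.0)
k_thin = most_unstable_wavenumber(d=0.5)
```

Halving d doubles the most unstable wave number, and for k·d much smaller than 1 the growth rate approaches U0·k, the vortex-sheet behavior discussed next.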

A vortex sheet is a model for a shear layer. The 
layer is approximated by a surface of zero thickness 
across which the tangential velocity is discontinu- 
ous, as illustrated in Figure 5a. In this case, the 
dispersion relation reduces to $\omega(k) = \pm k$. That is,
for each wave number k there is a growing and a
decaying mode, and the growing mode grows faster
the higher the wave number is, as shown in
Figure 5b. The vortex sheet arises from a constant
vorticity shear layer as the thickness $d \to 0$ and the
vorticity $\omega \to \infty$ in such a way that the product $\omega d$
remains constant. Figure 6 shows the roll-up of a
periodically perturbed vortex sheet due to the
Kelvin–Helmholtz instability, computed using one of
the methods described in the next section.

Figure 6 Computation of vortex sheet roll-up.


Aircraft Trailing Vortices 


One can often observe trailing vortices that shed 
from the wings of a flying aircraft (also called 
contrails). These vortices are formed because the 
wing develops lift. The pressure on the top of the 
wing is lower than on the bottom, causing air to move
around the edge of the wing from the bottom 
surface to the top. The boundary layer on the wing 
separates as a shear layer that rolls up into a vortex 
attached to the tip of the wing (Figure 7). Since the 
velocity inside the vortex is high, the pressure is 
correspondingly low and causes water vapor in the 
air to condense, forming water droplets that 
visualize the vortices. The vortex strength increases 
with increasing lift, and is particularly strong in 
high-lift conditions such as take-off and landing. 
Since lift is proportional to weight, it also increases 
with the size of the airplane. Vortices of large planes 
are strong enough to flip a small one if it gets too 
close. Trailing vortices are the principal reason for
the required time delay between successive take-offs and
landings and are still a serious issue for crowded urban airports.

The trailing vortices can be modeled by a pair of 
counter-rotating vortex lines (Figure 8a). Two 
parallel vortex lines of opposite strength induce a 
downward motion on each other, similar to two 
point vortices, the zero-core limit. Two point
vortices of strength $\pm\Gamma$ at a distance 2d from each
other translate with self-induced velocity (Figure 9):

\[ U = \frac{\Gamma}{4\pi d} \tag*{[8]} \]

As a result, trailing vortices near takeoff hit the
ground as a strong downwash air current.
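
The translation speed of the pair can be verified directly from the point-vortex kernel. This is a small sketch under assumed conventions: the vortices sit at (−d, 0) and (+d, 0) with strengths −Γ and +Γ, and the function name is hypothetical.

```python
import numpy as np

def pair_velocities(gamma, d):
    """Velocities of point vortices at (-d, 0) and (+d, 0) with strengths -gamma, +gamma."""
    positions = np.array([[-d, 0.0], [d, 0.0]])
    strengths = np.array([-gamma, gamma])
    velocities = np.zeros_like(positions)
    for i in range(2):
        j = 1 - i  # each vortex moves only in the velocity field of the other
        dx, dy = positions[i] - positions[j]
        r2 = dx * dx + dy * dy
        velocities[i] = strengths[j] * np.array([-dy, dx]) / (2.0 * np.pi * r2)
    return velocities

v = pair_velocities(gamma=2.0, d=0.5)
```

Both vortices receive the same downward velocity of magnitude Γ/(4πd), so the pair descends without changing its separation.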

Vortex decay results generally from the develop- 
ment of instabilities. Two parallel vortex tubes are 
subject to the long-wavelength Crow instability. 
Triggered by turbulence in the surrounding air, or 
by local variations in air temperature or density, the 
vortices develop symmetric sinusoidal perturbations 
with long wavelength, of the same order as the 
vortex separation (Figure 8b). As the perturbations 
grow to finite amplitude, the tubes reconnect and 
produce a sequence of vortex rings. Note that the
two-dimensional schematic in Figure 8c does not
convey the three-dimensional structure of the rings.
The reconnection process destroys the initial wake
structure more rapidly than viscous decay of the
individual filaments.

Figure 7 Sketch. Shear layer separation and roll-up into
trailing vortices behind an airfoil.

Figure 8 Sketch. Onset of Crow instability in a pair of vortex
lines and ensuing reconnection.

Of much interest is the study of how to accelerate 
the vortex decay. High-aspect-ratio vortices are subject 
to a shorter-wavelength elliptic instability, which leads 
to earlier destruction. However, such vortices are not 
realistic in current aircraft wakes. Wing designs have 
been proposed in which more than two trailing 
vortices form which interact strongly and lead to 
faster decay. Other interesting aspects are the effect of 
ambient turbulence and vortex breakdown. Break- 
down refers to a disturbance in the vortex core in 
which it quickly, within an axial distance of a few core
diameters, develops a region of reversed flow and loses 
its laminar behavior. 

Unlike the counter-rotating vortices discussed so 
far, two equally signed vortices rotate under their 
self-induced velocity about a common axis. If the 
separation distance between them is too small, two 
equally signed patches merge into one. Vortex 
merging occurs in two- or three-dimensional flows, 
as opposed to vortex reconnection, which is a 
strictly three-dimensional phenomenon.

Figure 9 Self-induced downward motion of a vortex pair.


Vortex Rings


A vortex tube that forms a closed loop is called a 
vortex ring. Vortex rings can be formed by ejecting 
fluid from a circular opening, such as when a smoke 
ring is formed. The boundary layer wall vorticity 
separates at the opening as a cylindrical shear layer 
that rolls up at its edge into a ring (Figure 10). The 
vorticity is concentrated in a core, which may be 
thin or thick relative to the ring diameter. The 
limiting cases are an infinitely thin circular filament
of nonzero circulation and Hill’s vortex, in
which the vorticity occupies all the interior of a
sphere.

Just like a counter-rotating vortex pair, a ring
translates under its self-induced velocity U in the
direction normal to the plane of the ring (Figure 11).
However, unlike the vortex pair, the ring velocity
depends significantly on its core thickness. For a ring
with radius, circulation and core size, respectively,
$R$, $\Gamma$, $a$, the self-induced velocity is

\[ U = \frac{\Gamma}{4\pi R}\left(\log\frac{8R}{a} - \frac{1}{4}\right) \tag*{[9]} \]

asymptotically as $a \to 0$. Thus, the translation
velocity becomes unbounded for rings with decreasing
core size. In reality, at some point viscosity takes
over and spreads the core vorticity, slowing the ring
down.
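
To make the dependence on core size concrete, eqn [9] can be evaluated for a few values of a. The function name and the sample values below are illustrative assumptions; the formula is an asymptotic one and is only meaningful for a much smaller than R.

```python
import math

def ring_velocity(gamma, radius, core_size):
    """Self-induced velocity of a thin vortex ring, eqn [9] (valid for core_size << radius)."""
    return gamma / (4.0 * math.pi * radius) * (math.log(8.0 * radius / core_size) - 0.25)

# Unit circulation and radius, progressively thinner cores
core_sizes = [0.1, 0.01, 0.001]
speeds = [ring_velocity(1.0, 1.0, a) for a in core_sizes]
```

Each tenfold reduction of the core size adds the same increment, Γ·log(10)/(4πR), to the speed, which is the logarithmic divergence described above.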


Figure 10 Vortex ring, formed by ejecting fluid from a circular
tube.

Figure 11 Self-induced motion of a vortex ring.

Figure 12 Sketch. Onset of azimuthal vortex ring instability.


Vortex rings of small cross section are subject 
to an azimuthal instability. Theory, experiment, 
and simulations show that if a ring is perturbed 
in the azimuthal direction, there exists a domi- 
nant wave number which is unstable and grows 
(Figure 12). The unstable wave number increases 
as the core size decreases, while its spatial 
amplification rate is almost independent of the 
core size. 

Interesting dynamics are obtained when two or 
more rings interact. Two coaxial vortex rings of 
equally signed circulation move in the same 
direction and exhibit leap-frogging: the rear ring 
causes the front ring to grow in radius and the 
front ring causes the rear one to decrease. From 
eqn [9] it can be seen that the ring velocity is 
inversely proportional to its radius. Consequently, 
the front ring slows down and the rear ring
speeds up, until the rear ring travels through the
front ring. The roles of the two rings are then
exchanged and the process repeats itself. On the other hand, two
coaxial vortex rings of oppositely signed circulation
approach each other and grow in radius.
Their cores contract in order to preserve volume,
and their vorticity increases in order to preserve
circulation. Under certain experimental conditions,
the azimuthal instability develops, the
resulting waves on opposite rings reconnect, and
a sequence of smaller rings forms.


Vortices, Mixing, and Chaos 


Mixing is important in many natural processes and 
technological applications. For example, mixing in 
shear flows and wakes is relevant to aeronautics and 
combustion, mixing and diffusion determine chemi- 
cal reaction rates, and mixing of contaminants 
pollutes oceans and atmosphere. It is therefore 
important to understand and control mixing 
processes. 

Efficient mixing of two fluids is obtained by 
efficient stretching and folding of material lines.
Stretching and folding in turn are the fingerprint
of chaos; thus, mixing and chaos are intimately 
related. Mixing and associated chaotic fluid 
motion can be obtained by simple vortical 
motion. For example, two counter-rotating vor- 
tices subject to a periodic strain field oscillate in a 
regular fashion but induce chaos in a region of 
fluid moving with them. Similarly, two corotating 
vortices of equal strength that are turned on and 
off periodically so that one is on when the other 
is off, known as the blinking vortices, rotate 
around a common axis in a stepwise manner but 
induce chaos in nearby regions. On the other 
hand, if there are four or more vortices present, 
the vortex motion itself is generally chaotic. It 
should be noted that there are also nonchaotic 
equilibrium solutions of four or more vortices 
forming what is called a vortex crystal. 
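
A simplified version of the blinking-vortex idea can be written as a map on particle positions. The sketch below makes assumptions beyond the text: the two vortices are held at fixed positions (+b, 0) and (−b, 0) rather than moving stepwise, and the function names and parameter values are hypothetical. While one point vortex is on, a particle at distance r from it simply rotates about it with angular velocity Γ/(2πr²).

```python
import math

def rotate_about(x, y, xc, gamma, duration):
    """Advect a particle in the field of a single point vortex located at (xc, 0)."""
    dx, dy = x - xc, y
    r2 = dx * dx + dy * dy
    theta = gamma * duration / (2.0 * math.pi * r2)  # rotation angle, depends on r
    c, s = math.cos(theta), math.sin(theta)
    return xc + c * dx - s * dy, s * dx + c * dy

def blink_step(x, y, gamma=1.0, half_period=1.0, b=0.5):
    """One full cycle: vortex at (+b, 0) is on, then the vortex at (-b, 0)."""
    x, y = rotate_about(x, y, b, gamma, half_period)
    return rotate_about(x, y, -b, gamma, half_period)

# Iterate the map to trace one particle orbit
orbit = [(1.0, 0.2)]
for _ in range(100):
    orbit.append(blink_step(*orbit[-1]))
```

Because the rotation angle depends on the distance to the active vortex, neighbouring particles are sheared apart at every switch; plotting many such orbits gives the kind of Poincaré section used to study the chaotic regions.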

Information about chaotic particle motion is 
obtained by studying Poincaré sections, examining 
the associated stable and unstable manifolds, and 
investigating the existence of chaotic maps such as 
the horseshoe map. 


Atmospheric Vortices 


Atmospheric vortices are driven by temperature 
gradients, Earth’s rotation (Coriolis force), spatial 
landscape variations, and instabilities. For example, 
temperature differences between the equator and the 
poles and Earth’s rotation lead to large-scale 
vortices such as the trade winds (Hadley cell), the 
jet streams, and the polar vortex (Figure 13). Semi- 
annual temperature oscillations are responsible for 
the Indian monsoons. Daily oscillations cause land- 
and sea-breezes. Landscape variations can cause 
urban-rural wind flows and mountain—valley 
circulations. 

Instabilities are often responsible for large 
cyclonic vortices. Barotropic instability results 
from large horizontal velocity gradients, and has 
been deemed responsible for disturbances over the 
Sahara region that occasionally intensify into
tropical cyclones. Baroclinic instability, which
occurs when temperature advection is superposed
on a velocity field, can lead to cyclonic vortices at
the front between air of polar origin and that of
tropical origin. The inertial or centrifugal
instability occurs when air flows around
high-pressure systems and the pressure gradient force
is not large enough to balance the centripetal
acceleration and the Coriolis effect.

Figure 13 Vortices in the atmosphere.

Vortices also form on other planets with an 
atmosphere. On Mars, dust devils are quite 
common. They are ~10-50 times larger than the 
ones on Earth and can carry high-voltage electric 
fields caused by the rubbing of dust grains against 
each other. Jupiter’s characteristic spots are 
extremely large storm vortices. The Great Red 
Spot is a vortex spanning twice the diameter of 
the Earth. Unlike the low-pressure terrestrial 
storms and hurricanes, the Great Red Spot is a 
high-pressure system that has been stable for 
more than 300 years. Other vortices on Jupiter 
decay and vanish, such as the White Ovals, three 
large anticyclones which merged into one within 
two years. Recent computer simulations predict 
that many of Jupiter’s vortices will merge and 
disappear in the next decade. As a result, mixing 
of heat across zones will decay and the planet’s 
temperature is predicted to increase. 

Numerical simulations of the atmosphere are 
expensive due to the large number of parameters 
and the relatively small scales that need to be 
resolved. For climate models and medium-range 
forecast models, the governing 3D compressible 
Euler equations are simplified using the hydro- 
static approximation (in which only the pressure 
gradient and the gravitational forces are retained 
in the vertical-momentum equation) and the 
anelastic approximation (in which $d\rho/dt$ is
neglected), to obtain the primitive equations. 
Additional vertical averaging yields the shallow- 
water equations. One big hurdle is to accurately 
incorporate the effect of clouds, which is sig- 
nificant and is usually treated using subgrid 
models. 


Vortices in Superfluids and Superconductors 


At temperatures below 2.2K, liquid helium is a 
superfluid, meaning that it acts essentially like a 
fluid with zero viscosity governed by the Euler 
equations. The fluid is irrotational, except for 
extremely thin vortex filaments, which are formed 
by quantum-mechanical processes. Since the vortices 
cannot end in the interior of the flow, they can be 
generated only at the surface or they nucleate as
vortex rings inside the fluid. As an example, if
a cylindrical container with helium is rotated
sufficiently fast, vortex lines attached to both ends
of the container appear. These quantum vortices
have discrete values of circulation ($\Gamma = nh/m$, where
h = Planck’s constant, m = mass of helium atom,
n = integer), core sizes of about 1 Å (roughly the
diameter of a single hydrogen atom) and move
without viscosity.
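
For a sense of scale, the quantum of circulation h/m can be evaluated numerically. This is a rough sketch assuming the common isotope helium-4 with a mass of about 4.0026 atomic mass units; the constant and function names are illustrative.

```python
PLANCK_H = 6.62607015e-34              # Planck's constant in J s (exact SI value)
ATOMIC_MASS_UNIT = 1.66053906660e-27   # kg
HELIUM4_MASS = 4.002603 * ATOMIC_MASS_UNIT  # kg, assuming helium-4

def quantized_circulation(n):
    """Circulation n*h/m of a quantum vortex in superfluid helium-4, in m^2/s."""
    return n * PLANCK_H / HELIUM4_MASS

kappa = quantized_circulation(1)
```

The single-quantum value is close to 1e-7 m²/s (about 1e-3 cm²/s), and every allowed circulation is an integer multiple of it.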

Similarly, certain types of materials lose their 
electric resistance at low temperatures and 
become superconductors. One distinguishes type-I 
superconductors (most pure metals) from type-II 
superconductors (alloys). Using the Ginzburg- 
Landau theory it has been predicted that in 
type-II superconductors a lattice of vortex fila- 
ments forms, each carrying a quantized amount 
of magnetic flux. This was subsequently confirmed
by experimental observation. More precisely,
for temperatures T below a critical value
$T_c$, there are three regions corresponding to
increasing values of the magnetic field (Figure 14).
At low magnetic fields ($H < H_{c1}$), no vortices
exist (superconducting phase). At intermediate
values ($H_{c1} < H < H_{c2}$), the magnetic field
penetrates the superconductor in the form of
quantized vortices, also called flux lines (mixed
phase). The values $H_{c1}$ and $H_{c2}$ are determined by the
London penetration depth $\lambda$, which measures the
electromagnetic response of the superconductor.
With increasing magnetic field, the density of flux
lines increases until the vortex cores overlap
when the upper critical field $H_{c2}$ is reached,
beyond which one recovers the normal metallic
state (normal conductor).

When an external current density j is applied to
the vortex system, the flux lines start to move under
the action of the Lorentz force. As a result, a
dissipating electric field E appears that is parallel to
j, and the superconducting property of dissipation-free
current flow is lost. In order to recover the
desired property of dissipation-free flow, flux lines
have to be pinned, for example, by introducing
inhomogeneities and structural defects. For a given
pinning force, flux lines remain pinned as long as the
current density stays below a critical value. A major
research objective is to optimize the pinning force in
order to preserve superconductivity at larger current
densities.

Figure 14 Superconductor phase dependence on magnetic
field H and temperature T.


Numerical Vortex Methods 


Many numerical methods used to compute fluid 
flow are Eulerian schemes based on a fixed mesh, 
such as finite difference, finite element, and spectral 
methods, commonly used for example in atmo- 
sphere and ocean modeling. This section briefly 
describes alternative vorticity-tracking methods 
used to simulate incompressible inviscid vortex 
flows, and concludes with some extensions to 
viscous flows. The premise of these methods is 
that since the fluid velocity is determined by the 
vorticity through the Biot-Savart law (eqn [6]), it 
suffices to track only that portion of the fluid 
carrying nonzero vorticity. This region is often 
much smaller than the total fluid volume, and 
computational efficiency is gained. Numerical vor- 
tex methods are typically Lagrangian, that is, the 
computational elements move with the fluid 
velocity. 


Point-Vortex Approximation in 2D 


To compute the evolution of a vorticity distribution
$\omega(x, t)$ in 2D, the simplest approach is to approximate
the vorticity by a set of point vortices at $x_j(t)$
with circulation $\Gamma_j$ and evolve them under their
self-induced motion. The values $\Gamma_j$ are an estimate of the
initial circulation around $x_j(0)$. The vortex positions
$x_j(t)$ evolve in the induced velocity field

$$ \frac{dx_j}{dt} = \sum_{k=1,\, k \ne j}^{N} \Gamma_k \, K_{2D}(x_j - x_k) \qquad [10] $$

where the exclusion $k \ne j$ accounts for the fact that
a point vortex induces zero velocity on itself. The
system of ordinary differential
equations [10] can be integrated with any standard method,
such as Runge-Kutta or Adams-Bashforth.
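
The point-vortex system can be sketched in a few lines. The example below is a minimal illustration, not the article's own code: it assumes the singular kernel K(x, y) = (-y, x) / (2 pi (x^2 + y^2)), uses the classical fourth-order Runge-Kutta scheme, and the function names `velocities` and `rk4_step` are hypothetical.

```python
import math


def velocities(pos, gamma):
    """Velocity induced on each point vortex by all the other vortices.

    Assumes the kernel K(x, y) = (-y, x) / (2*pi*(x^2 + y^2)) and skips
    the self-interaction term k == j.
    """
    vel = []
    for j, (xj, yj) in enumerate(pos):
        u = v = 0.0
        for k, (xk, yk) in enumerate(pos):
            if k == j:
                continue
            dx, dy = xj - xk, yj - yk
            r2 = dx * dx + dy * dy
            u += -gamma[k] * dy / (2.0 * math.pi * r2)
            v += gamma[k] * dx / (2.0 * math.pi * r2)
        vel.append((u, v))
    return vel


def rk4_step(pos, gamma, dt):
    """Advance all vortex positions by one classical Runge-Kutta step."""
    def shifted(base, slope, h):
        return [(x + h * u, y + h * v) for (x, y), (u, v) in zip(base, slope)]

    k1 = velocities(pos, gamma)
    k2 = velocities(shifted(pos, k1, dt / 2), gamma)
    k3 = velocities(shifted(pos, k2, dt / 2), gamma)
    k4 = velocities(shifted(pos, k3, dt), gamma)
    return [
        (x + dt / 6 * (a[0] + 2 * b[0] + 2 * c[0] + d[0]),
         y + dt / 6 * (a[1] + 2 * b[1] + 2 * c[1] + d[1]))
        for (x, y), a, b, c, d in zip(pos, k1, k2, k3, k4)
    ]
```

A simple sanity check is a pair of vortices with equal circulation 1 placed a unit distance apart: they should rotate about their midpoint with angular velocity 1/pi while their separation stays constant.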

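The point-vortex approximation can be written in
Hamiltonian form as

$$ \frac{dx_j}{dt} = \frac{1}{\Gamma_j} \frac{\partial H}{\partial y_j}, \qquad \frac{dy_j}{dt} = -\frac{1}{\Gamma_j} \frac{\partial H}{\partial x_j} \qquad [11] $$

where $\mathbf{x}_j = (x_j, y_j)$ and the Hamiltonian

$$ H = -\frac{1}{4\pi} \sum_{j=1}^{N} \sum_{k=1,\, k \ne j}^{N} \Gamma_j \Gamma_k \ln |\mathbf{x}_j - \mathbf{x}_k| \qquad [12] $$

is conserved in time, dH/dt = 0. The
method also conserves the fluid circulation and the
linear and angular momenta.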

Ideally, the solution to [10] should converge as
$N \to \infty$ to the solution of the Euler equations.
This is true for smooth vorticity distributions, but 
for singular distributions such as a vortex sheet, 
the situation is more complicated. The vortex 
sheet, a curve in the plane, develops a singularity 
in finite time at which the curvature becomes 
unbounded at a point. The point-vortex approx- 
imation converges before the singularity formation 
time, provided the growth of spurious roundoff 
error due to Kelvin-Helmholtz instability is 
suppressed using a filter. However, past the 
singularity formation time, the  point-vortex 
approximation no longer converges. 

The general approach is to replace the singular
kernel $K_{2D}$ by a regularization $K_{2D}^{\delta}$, such as

$$ K_{2D}^{\delta}(\mathbf{x}) = \frac{1}{2\pi} \, \frac{-y\,\mathbf{i} + x\,\mathbf{j}}{|\mathbf{x}|^2 + \delta^2} \qquad [13a] $$

$$ K_{2D}^{\delta}(\mathbf{x}) = \frac{1}{2\pi} \, \frac{-y\,\mathbf{i} + x\,\mathbf{j}}{|\mathbf{x}|^2} \left( 1 - e^{-|\mathbf{x}|^2/\delta^2} \right) \qquad [13b] $$

where $\delta$ is a numerical parameter. The regularization
amounts to replacing the $\delta$-function vorticity
of a point vortex by an approximate $\delta$-function. In
order to recover the solution to the Euler equations,
it is necessary to study the limit $N \to \infty$, $\delta \to 0$. For
smooth vorticity distributions, this process converges.
For vortex sheet initial data, there is
evidence of convergence, but details of the limiting
behavior remain under investigation. Regularized
solutions with fixed value $\delta$ and vortex sheet initial
data are shown in Figures 6 and 15. Figure 6 shows 
the onset of the Kelvin-Helmholtz instability in a 
periodically perturbed flat vortex sheet. Figure 15 
shows the rollup of an elliptically loaded flat vortex 
sheet that models the evolution of an aircraft 
wake (see Figure 7). The correspondence between 
the two-dimensional simulation and the three- 
dimensional wake is made by replacing the spatial 
coordinate in the aircraft’s line of flight by a time 
coordinate. 
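
The effect of the regularization can be seen by comparing the kernels directly. The following sketch assumes the two regularized forms labeled [13a] and [13b] above; the function names are hypothetical and the singular kernel is taken to be (-y, x) / (2 pi |x|^2).

```python
import math


def kernel_singular(x, y):
    """Point-vortex kernel (-y, x) / (2*pi*|x|^2); undefined at the origin."""
    r2 = x * x + y * y
    return (-y / (2.0 * math.pi * r2), x / (2.0 * math.pi * r2))


def kernel_algebraic(x, y, delta):
    """Regularization of type [13a]: delta^2 is added to the denominator."""
    d = 2.0 * math.pi * (x * x + y * y + delta * delta)
    return (-y / d, x / d)


def kernel_gaussian(x, y, delta):
    """Regularization of type [13b]: singular kernel times 1 - exp(-|x|^2/delta^2)."""
    r2 = x * x + y * y
    if r2 == 0.0:
        return (0.0, 0.0)  # the cutoff factor vanishes faster than 1/|x| grows
    cutoff = -math.expm1(-r2 / (delta * delta))  # 1 - exp(-r2/delta^2)
    return (-y * cutoff / (2.0 * math.pi * r2), x * cutoff / (2.0 * math.pi * r2))
```

Both regularized kernels vanish at the origin and agree with the singular kernel at distances large compared with delta. For the form [13a] the induced speed is largest at distance delta, where it equals 1/(4 pi delta), so smaller delta permits faster and more tightly wound rollup.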


Figure 15 Computed evolution of an elliptically loaded flat
vortex sheet.


Contour Dynamics in 2D 


Consider a planar patch of constant vorticity $\omega_0$
bounded by a curve $x(s,t)$, $0 \le s \le L$, moving in
inviscid, incompressible flow. In view of Kelvin's
theorem and eqn [5], the vorticity in the patch
remains constant and equal to $\omega_0$ for all time, and
the patch area remains constant. Only the patch
boundary moves. The velocity at a point $x(\alpha,t)$ on
the boundary can be written as a line integral over
the boundary:

$$ \frac{dx}{dt}(\alpha,t) = -\frac{\omega_0}{2\pi} \int_0^L \ln|x(\alpha,t) - x(s,t)| \, \frac{\partial x}{\partial s}(s,t) \, ds \qquad [14] $$

The contour dynamics method consists of approx- 
imating a given vorticity distribution by a super- 
position of vortex patches, and moving their 
boundaries according to eqn [14]. This method 
has been applied to compute the evolution of 
single-vortex patches and shear layers, and to 
geophysical flows. Typically, filamentation occurs: 
the patch develops thin filaments which increase the 
boundary length significantly and thereby the 
computational expense. The approach generally 
taken is to remove the thin filaments at several 
times throughout the computation, which is 
referred to as contour surgery. The contour 
dynamics approach as well as the point-vortex 
approximation have also been generalized to treat 
quasigeostrophic flows. 
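
A minimal sketch of the boundary-integral evaluation used in contour dynamics is given below. It assumes the velocity formula u(x) = -(omega0 / 2 pi) times the integral of ln|x - x(s)| x'(s) ds over the boundary, traversed counterclockwise, with the boundary parametrized over [0, 2 pi). The quadrature (a trapezoidal rule that skips the singular node and adds a leading-order logarithmic correction) is one simple choice made here for illustration, and the function names are hypothetical.

```python
import math


def contour_velocity(points, tangents, omega0):
    """Velocity at each node of a closed vortex-patch boundary.

    points[j]   = x(s_j) with s_j = 2*pi*j/N, counterclockwise traversal
    tangents[j] = dx/ds at s_j
    The node j == i is skipped and replaced by the correction term
    h * ln(h * |x'_i| / (2*pi)) * x'_i for the logarithmic singularity.
    """
    n = len(points)
    h = 2.0 * math.pi / n
    result = []
    for i, (xi, yi) in enumerate(points):
        sx = sy = 0.0
        for j, (xj, yj) in enumerate(points):
            if j == i:
                continue
            log_r = math.log(math.hypot(xi - xj, yi - yj))
            sx += log_r * tangents[j][0] * h
            sy += log_r * tangents[j][1] * h
        tx, ty = tangents[i]
        corr = h * math.log(h * math.hypot(tx, ty) / (2.0 * math.pi))
        sx += corr * tx
        sy += corr * ty
        scale = -omega0 / (2.0 * math.pi)
        result.append((scale * sx, scale * sy))
    return result


def circle(n, radius=1.0):
    """Nodes and tangent vectors of a circular patch boundary (test case)."""
    s = [2.0 * math.pi * j / n for j in range(n)]
    pts = [(radius * math.cos(a), radius * math.sin(a)) for a in s]
    tans = [(-radius * math.sin(a), radius * math.cos(a)) for a in s]
    return pts, tans
```

For a circular patch the exact answer is known: the patch rotates rigidly, so the boundary speed is omega0 times the radius divided by 2, directed along the tangent. Comparing against this value is a convenient check before applying the routine to deforming contours, where the node positions would be advanced in time with the computed velocities.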


Vortex Filament Methods in 3D


Vortex simulations in 3D differ from those in 2D in 
that the stretching term in eqn [4] needs to be 
incorporated. The vortex filament method approx- 
imates the fluid vorticity by a finite number of 
filaments whose circulation remains constant in 
time. Each filament is marked by computational 
mesh points which move with the regularized 
induced velocity. The regularization is necessary to 
prevent the infinite self-induced velocities of curved 
vortex filaments. As in 2D, this method automati- 
cally conserves circulation. Vorticity stretching is 
accounted for by the stretching between computa- 
tional mesh points. As the filament length increases, 
more meshpoints are typically introduced to keep it 
resolved. Also, the number of filaments can be 
increased throughout the simulation to maintain 
resolution. 


Viscous Vortex Methods 


While inviscid models are expected to approximate 
small viscosity fluids well far from boundaries, near 
boundaries, where vortex shedding is an inherently 
viscous mechanism, it is important to incorporate 
the effects of viscosity. The first methods to do so 
used operator splitting in which inviscid and viscous 
terms of the Navier-Stokes equations were solved in 
a sequential manner. In each time step, the compu- 
tational elements would first be convected, and then 
they would be diffused by a random-walk scheme. 
The particle strength exchange method, introduced 
more recently, does not rely on operator splitting 
and has better accuracy. The particle position and 
vorticity evolve simultaneously, and viscous 
diffusion is accounted for in a consistent manner. 

Vortex dynamics continues to be a source of 
interesting problems of theoretical and practical 
importance. In particular, much remains to be 
learned to better understand turbulence and the 
transition to turbulence, a process dominated by 
deterministic vortex dynamics. 


Further Remarks 


Finally, some remarks on relevant literature on this
subject are in order. Lugt (1983) and Tritton (1988)
are recommended as elementary introductions to
vortex flows. van Dyke (1982) presents beautiful and
instructive flow visualizations. Comprehensive treatments
of incompressible fluid dynamics are given in
Batchelor (1967), Chorin and Marsden (1992), Lamb
(1932), and Saffman (1992), and compressible flow is
treated in Anderson (1990). Cottet and Koumoutsakos
(2000) give an overview of numerical vortex methods.
Special topics have also been addressed: the atmosphere
(Andrews et al. 1987), point vortex motion and chaos
(Aref 1983, Newton 2001, Ottino 1989), superfluids
and superconductors (Blatter et al. 1994, Donnelly
1991), turbulence theory using statistical mechanics
(Chorin 1994), vortex reconnection (Kida and
Takaoka 1994), theory for Euler and Navier-Stokes
equations (Majda and Bertozzi 2002), contour
dynamics (Pullin 1992), vortex rings (Shariff and
Leonard 1992), and aircraft trailing vortices (Spalart
1998). Green (1995) includes survey articles on
various topics.


Nomenclature

a                                        vortex ring core size
g                                        gravitational field
H                                        Hamiltonian
$K_{2D}$                                 singular velocity kernel
$K_{2D}^{\delta}$                        regularized velocity kernel
$\rho(x,t)$                              fluid density
R                                        vortex ring radius
T(x,t)                                   temperature
U                                        translation velocity
u(x,t) = u(x,t)i + v(x,t)j + w(x,t)k     fluid velocity
$\omega(k)$                              dispersion relation
$\omega = \nabla \times u$               vorticity
$\omega = v_x - u_y$                     scalar vorticity
$\Gamma$                                 ring circulation


See also: Abelian Higgs Vortices; Incompressible Euler 
Equations: Mathematical Theory; Integrable Systems: 
Overview; Interfaces and Multicomponent Fluids; 
Intermittency in Turbulence; Newtonian Fluids and 
Thermohydraulics; Point-Vortex Dynamics; Stochastic 
Hydrodynamics; Superfluids; Topological Knot Theory 
and Macroscopic Physics; Turbulence Theories. 


Introduction


The most basic wave equation is

$$ \frac{\partial^2 u}{\partial t^2} - \Delta u = 0 \qquad [1] $$

for $u = u(t,x)$, where $\Delta$ is the Laplace operator,
given by $\Delta u = \partial^2 u/\partial x_1^2 + \cdots + \partial^2 u/\partial x_n^2$
on n-dimensional Euclidean space $\mathbb{R}^n$. More generally, u might
be defined on $\mathbb{R} \times M$, where $\mathbb{R}$ is the t-axis and M is
a Riemannian manifold, with a metric tensor given
in local coordinates by $(g_{jk})$. Then the Laplace-Beltrami
operator is given, in local coordinates, by

$$ \Delta u = g^{-1/2} \sum_{j,k} \frac{\partial}{\partial x_j} \left( g^{jk} \, g^{1/2} \, \frac{\partial u}{\partial x_k} \right) \qquad [2] $$

where $(g^{jk})$ is the matrix inverse to $(g_{jk})$ and
$g = \det(g_{jk})$. Even if one concentrates on wave
propagation in Euclidean space, one frequently
wants to use curvilinear coordinates, and [2] is
useful. Equation [1] is supplemented by initial
conditions of the form

$$ u(0,x) = f(x), \qquad \partial_t u(0,x) = g(x) \qquad [3] $$

called Cauchy data. If the spatial domain M has a
boundary $\partial M$ (e.g., if M is a bounded region in $\mathbb{R}^n$),
then boundary conditions are imposed. The most
common are the Dirichlet boundary condition

$$ u(t,x) = 0 \quad \text{for } x \in \partial M \qquad [4] $$

and the Neumann boundary condition

$$ \partial_\nu u(t,x) = 0 \quad \text{for } x \in \partial M \qquad [5] $$

where $\partial_\nu u$ denotes the normal derivative of u at the
boundary. More generally, one might have a driving
force, and replace 0 on the right-hand side of [1] by
a function F(t,x). Similarly, one can consider
nonzero boundary data in [4] and [5].


The wave equation [1] models a number of
physical phenomena, at least in the linear approximation.
The vibration of a drum head is modeled by
[1], with M a planar domain, and with the Dirichlet
boundary condition [4]. The motion of sound waves
in a room with hard walls is modeled by [1], with M
a region in $\mathbb{R}^3$, and with the Neumann boundary
condition [5]. The propagation of electromagnetic
waves is given by Maxwell's equations:

$$ \frac{\partial E}{\partial t} - \operatorname{curl} B = -J, \qquad \frac{\partial B}{\partial t} + \operatorname{curl} E = 0, \qquad \operatorname{div} E = \rho, \qquad \operatorname{div} B = 0 \qquad [6] $$

where $\rho$ is the electric charge density and J the
current. These equations yield [1] (with the right-hand
side replaced by some function F(t,x) if J and $\rho$
are not zero) for the components of the electric field
E and the magnetic field B. If the propagation is in a
region M in $\mathbb{R}^3$ bounded by a perfect conductor,
then the boundary conditions are that E is normal to
$\partial M$ and B is tangential to $\partial M$. If $\partial M$ is flat, these
equations can be decomposed into Dirichlet problems
for some components and Neumann problems
for the rest, but if $\partial M$ is curved such a decomposition
is not possible.

Other models of vibrating objects produce variants
of [1]. Examples include vibrating elastic
solids, yielding an equation like [1] with $\Delta u$
replaced by $\mu \Delta u + (\lambda + \mu)\, \mathrm{grad}\, \mathrm{div}\, u$, for linear
elasticity. Here $\Delta$ acts componentwise on u, and $\mu$
and $\lambda$ are constants, called Lamé constants. Other
examples model vibrations of crystals and propaga- 
tion of electromagnetic waves in crystals. Further 
interesting phenomena arise in these various cases, 
such as Rayleigh waves in linear elasticity and 
conical refraction in crystal optics. 

Here we discuss the propagation of waves and their 
reflection and diffraction at boundaries. In the interest 
of providing reasonable coverage in a brief space, we 
restrict attention to the wave equation [1]. 


402 Wave Equations and Diffraction 


Basic Propagation Phenomena 


The simplest examples of waves propagating according
to [1] are plane waves, of the form

$$ u(t,x) = \varphi(x \cdot \omega - t) \qquad [7] $$

with $(t,x) \in \mathbb{R} \times \mathbb{R}^n$, $\omega$ a unit vector in $\mathbb{R}^n$, and $\varphi$ a
function on $\mathbb{R}$. If $\varphi$ has two continuous derivatives,
[7] defines a classical solution of [1]. More
generally, one can allow $\varphi$ to be less regular. For
example, it could be piecewise smooth with a jump
discontinuity at some point $a \in \mathbb{R}$. In such a case, u
will be piecewise smooth with a jump across the
n-dimensional surface $x \cdot \omega - t = a$ in $\mathbb{R} \times \mathbb{R}^n$, and it
will solve [1] in a weak, or distributional, sense. For
fixed t, $u(t,\cdot)$ has a jump across the (n-1)-dimensional
surface $\Sigma_t = \{x \in \mathbb{R}^n : x \cdot \omega = t + a\}$. As
t varies, $\Sigma_t$ moves in the direction $\omega$ with unit speed.
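
That a smooth plane wave solves the wave equation can be checked numerically. The sketch below is only an illustration under stated assumptions: the profile is an arbitrary smooth function, the direction vector is normalized inside the helper, and the residual of the wave equation is estimated with centered finite differences. The names `plane_wave` and `wave_residual` are hypothetical.

```python
import math


def plane_wave(phi, omega):
    """Return u(t, x) = phi(x . omega - t), with omega normalized to unit length."""
    norm = math.sqrt(sum(c * c for c in omega))
    unit = [c / norm for c in omega]
    return lambda t, x: phi(sum(xi * wi for xi, wi in zip(x, unit)) - t)


def wave_residual(u, t, x, h=1e-3):
    """Centered finite-difference estimate of u_tt - Laplacian(u) at (t, x)."""
    u0 = u(t, x)
    u_tt = (u(t + h, x) - 2.0 * u0 + u(t - h, x)) / h**2
    lap = 0.0
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        lap += (u(t, x_plus) - 2.0 * u0 + u(t, x_minus)) / h**2
    return u_tt - lap
```

For a Gaussian profile the residual is zero up to discretization and rounding error. If the direction vector were not normalized, the residual would not vanish, which reflects the unit propagation speed built into the equation.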

There are also spherical wave solutions to [1] on
$\mathbb{R} \times \mathbb{R}^n$, such as

$$ u(t,x) = \frac{\operatorname{sgn} t}{2\pi} \left( t^2 - |x|^2 \right)_+^{-1/2} \qquad [8] $$

for n = 2, and

$$ u(t,x) = \frac{1}{4\pi t} \, \delta(|x| - |t|) \qquad [9] $$

for n = 3. Here $s_+ = s$ for $s > 0$, $s_+ = 0$ for $s < 0$, and
$\delta(s)$ is the Dirac delta function. In fact, [8] and [9]
are "fundamental solutions" (more on which in the
section on harmonic analysis) to the wave equation
on $\mathbb{R} \times \mathbb{R}^n$, for n = 2 and 3, respectively. In such
cases, the singularity in $u(t,\cdot)$ for each fixed t lies in
$\Sigma_t = \{x \in \mathbb{R}^n : |x| = |t|\}$, a family of surfaces in $\mathbb{R}^n$
that moves, in the direction of the normal to $\Sigma_t$, at
unit speed.

The examples mentioned above illustrate two
general phenomena about the behavior of solutions
to [1]. The first is finite propagation speed. Its
general formulation is that, given a closed set
$K \subset M$,

$$ \operatorname{supp} f, g \subset K \implies \operatorname{supp} u(t,\cdot) \subset \{x \in M : \operatorname{dist}(x,K) \le |t|\} \qquad [10] $$

In fact, given that [8]-[9] are fundamental solutions,
[10] is a consequence of these formulas when
$M = \mathbb{R}^2$ or $\mathbb{R}^3$. The result [10] is true in great
generality, with well-known demonstrations involving
energy estimates.

The second phenomenon involves propagation of
singularities. Typically, if the Cauchy data f and g in
[3] are smooth on the complement of an (n-1)-dimensional
surface $\Sigma_0$, perhaps with a jump across
$\Sigma_0$, or such a singularity as in [8] or [9], the solution
u(t,x) will be a sum of two terms, with singularities
of a similar nature on the surfaces $\Sigma_t^{\pm}$ moving at
unit speed in the direction of their normals, $\Sigma_t^{+}$
flowing from $\Sigma_0$ in one direction and $\Sigma_t^{-}$ in the
other. This also holds for the manifold case [2]. That
happens at least until such surfaces develop singularities,
when matters become more elaborate.

An alternative way to describe how the set of
singularities evolves is the following. Let $S_1M$ denote
the space of unit vectors tangent to M; this is
a submanifold of the tangent bundle TM of M.
There is a natural projection $\pi : S_1M \to M$. Associated
to a smooth surface $\Sigma_0$ of dimension n-1 in
M (of dimension n) are two preimages $\Lambda_0^{+}$ and $\Lambda_0^{-}$ in
$S_1M$, consisting of unit vectors lying over points of
$\Sigma_0$ and normal to $\Sigma_0$. The geodesic flow is a flow on
$S_1M$, and it takes $\Lambda_0^{\pm}$ to smooth (n-1)-dimensional
surfaces $\Lambda_t^{\pm}$ in $S_1M$. The sets $\Sigma_t^{\pm}$ are the images of
$\Lambda_t^{\pm}$ under $\pi$. The geodesics starting out at points in
$\Lambda_0^{\pm}$ and sweeping out $\Lambda_t^{\pm}$ are the rays along which
the singularities of the solution u propagate.

This latter description works for all t if M has no
boundary and is complete, that is, all geodesics are
defined for all t, although singularities develop in
the images $\pi(\Lambda_t^{\pm}) = \Sigma_t^{\pm}$, at points $p \in \Sigma_t^{\pm}$ where $\Lambda_t^{\pm}$
meets $T_pM$ nontransversally. The behavior of u near
such singular points of $\Sigma_t^{\pm}$, known as caustics, is
more complicated than that near regular points, but
it can be captured in terms of integrals. Methods of
establishing this propagation of singularities are
discussed in the section on geometrical optics.

Such a description needs further elaboration if M 
has a boundary. One of the principal problems of 
diffraction theory is to explain how singularities of 
solutions to [1], with a boundary condition such as 
[4] or [5], propagate and reflect off the boundary. 

Considering the case where M is a half-space
in $\mathbb{R}^n$,

$$ M = \mathbb{R}^n_+ = \{x \in \mathbb{R}^n : x_n > 0\} \qquad [11] $$

provides a guide to the simplest reflection phenomena.
In such a case, one can solve the Dirichlet or
Neumann boundary problem for the wave equation
[1] by the method of images. One extends f and g
from $\mathbb{R}^n_+$ to $\mathbb{R}^n$. For the Dirichlet problem [4], one
takes odd extensions, $f(x', -x_n) = -f(x', x_n)$, and
similarly for g. For the Neumann problem [5], one
takes even extensions, $f(x', -x_n) = f(x', x_n)$, etc.
One then solves the wave equation [1] on $\mathbb{R} \times \mathbb{R}^n$
with the extended Cauchy data, and the restriction
to $\mathbb{R} \times \mathbb{R}^n_+$ solves the respective boundary problem.
Suppose $\Sigma_0$ is a smooth (n-1)-dimensional surface
that does not meet $\partial\mathbb{R}^n_+$, and that f and g have
singularities on $\Sigma_0$, as above. (Suppose for simplicity
that f and g vanish near $\partial\mathbb{R}^n_+$.) Those rays issuing
from normals to $\Sigma_0$ have mirror images, which are
rays in $\mathbb{R}^n$. If such a ray hits $\partial\mathbb{R}^n_+$, its mirror image
does so also, and continues into $\mathbb{R}^n_+$, as the reflected
ray. The singularities of u propagate along such
reflected rays.
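
The method of images is easiest to see in one space dimension, which is the special case n = 1 of the half-space. The sketch below makes several simplifying assumptions not in the text: the initial velocity g is zero, the solution on the full line is given by d'Alembert's formula, and the helper names `extend` and `solve_half_line` are hypothetical.

```python
def extend(f, parity):
    """Extend f from x >= 0 to the whole line; parity = -1 (odd) or +1 (even)."""
    return lambda x: f(x) if x >= 0 else parity * f(-x)


def solve_half_line(f, boundary):
    """Solve u_tt = u_xx on x > 0 with u(0, x) = f(x) and u_t(0, x) = 0.

    boundary = "dirichlet" uses the odd extension of f,
    boundary = "neumann" uses the even extension.
    The full-line problem is then solved by d'Alembert's formula.
    """
    parity = -1 if boundary == "dirichlet" else 1
    f_ext = extend(f, parity)
    return lambda t, x: 0.5 * (f_ext(x - t) + f_ext(x + t))
```

With a bump centered at x = 3 as initial data, the left-moving half of the pulse reaches the boundary and returns to x = 3 at time t = 6. Under the Dirichlet condition it comes back with its sign reversed, while under the Neumann condition the sign is preserved, which is the one-dimensional version of the reflected ray carrying the singularity.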

Such a description extends to a general complete
Riemannian manifold with boundary M, in the case
of rays that hit the boundary transversally. Such a
ray is reflected by retaining the tangential component
of its velocity vector at the point of intersection
with $\partial M$ and reversing the sign of the normal component.
One says that the ray is reflected according to the
laws of geometrical optics. Singularities of u carried
by such rays that hit $\partial M$ are correspondingly
reflected. Methods to establish such transversal
reflection of singularities are natural extensions of
those developed to treat the propagation away from
$\partial M$, mentioned above.
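
The reflection rule can be written as a one-line formula: if n is a unit normal to the boundary at the point of impact, the reflected velocity is v - 2 (v . n) n. The sketch below implements this in flat Euclidean space; the function name `reflect` is an illustrative choice.

```python
def reflect(velocity, normal):
    """Reflect a velocity vector off a boundary with the given normal vector.

    The tangential component is kept and the normal component changes sign:
    v_reflected = v - 2 (v . n) n, with n normalized to unit length.
    """
    norm = sum(c * c for c in normal) ** 0.5
    n = [c / norm for c in normal]
    v_dot_n = sum(v * c for v, c in zip(velocity, n))
    return [v - 2.0 * v_dot_n * c for v, c in zip(velocity, n)]
```

For example, a ray moving with velocity (1, -1) toward the boundary of the upper half-plane, whose normal is (0, 1), leaves with velocity (1, 1). The formula is unchanged if the normal is replaced by its negative.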

Matters become more delicate when there are rays
that are tangent to $\partial M$. A model example is given by

$$ M = \mathbb{R}^n \setminus B, \qquad B = \{x \in \mathbb{R}^n : |x| < 1\} \qquad [12] $$

which one takes when studying the scattering of
waves in $\mathbb{R}^n$ by the obstacle B. Consider a solution
to [1] with boundary condition given by [4] or [5]
that has a simple singularity on $\Sigma_t = \{x \in \mathbb{R}^n : x_n = t\}$
for $t < -1$. The associated rays are of the form
$\gamma(t) = (x', t)$, for $t < -1$, with $x' \in \mathbb{R}^{n-1}$. If $|x'| > 1$,
these rays continue on in $\mathbb{R}^n \setminus B$, for all $t > -1$. If
$|x'| < 1$, these rays hit $\partial M = \partial B$ transversally, and
their reflection is as described above. If $|x'| = 1$,
these rays hit $\partial B$ tangentially, at t = 0; they are
sometimes called grazing rays. One also continues
them past t = 0. One defines in this fashion $\Sigma_t$ for
$t > -1$. The region


$$ S = \{x = (x', x_n) \in \mathbb{R}^n \setminus B : |x'| < 1, \ x_n > 0\} \qquad [13] $$

is called the “shadow region.” It is disjoint from $\Sigma_t$
for all t. The solution u is smooth in S for all t,
although it is not identically zero. The set

$$ S^b = \{x = (x', x_n) \in \mathbb{R}^n \setminus B : |x'| = 1, \ x_n > 0\} \qquad [14] $$

is the “shadow boundary.”

One can replace B in [12] by a more general 
smooth, convex obstacle K, with positive Gauss 
curvature everywhere, and the same considerations 
of transversal and grazing rays and shadow regions 
apply. These notions also extend to a more general 
class of Riemannian manifolds with boundary, 
called manifolds with diffractive boundary. In the 
case K=B, one can use separation of variables to 
reduce the problem of analyzing solutions to [1] and 
showing that singularities propagate along such rays 
to a problem in harmonic analysis on the sphere 
$S^{n-1}$. For more general convex obstacles K or
manifolds with diffractive boundary, other techni- 
ques are required, to show that waves reflect off the 
boundary in a fashion similar to the case [12]. 

Another situation arises if instead of [12] one
takes M = B, or more generally M = K, a convex
region as described above. A ray starting off from
a point in $\partial M$, almost tangent to $\partial M$ but with a
small component in the direction of the normal
pointing into M, will undergo many reflections in
a short time. Upon shrinking the normal component
of the initial velocity to zero, one obtains in
the limit a geodesic in $\partial M$, known as a gliding ray.
In such a case, singularities of solutions to [1],
with such a boundary condition as [4] or [5],
propagate along both transversally reflected and
gliding rays.

For the generic smooth obstacle K in $\mathbb{R}^n$, the
second fundamental form can have a variety of 
signatures at various boundary points. Various types 
of “generalized rays” occur — generally speaking 
limits of sequences of transversally reflected rays. 
This situation also holds for general complete 
Riemannian manifolds with smooth boundary. The 
main result about propagation of singularities in 
such a case is that it is always along such generalized 
rays. This was established by Melrose and Sjöstrand 
(1978). 

Further diffraction effects arise when $\partial M$ has
singularities, such as edges and corners. The simplest
example is

$$ M = \{x \in \mathbb{R}^2 : a < \theta < b, \ r > 0\} \qquad [15] $$

where $(r, \theta)$ are the polar coordinates of $x \in \mathbb{R}^2$, and
we assume $0 \le a < b \le 2\pi$. Here one is studying the
diffraction of waves by a wedge. In the limiting case
a = 0, b = $2\pi$, the wedge becomes a half-line, that is,

$$ M = \mathbb{R}^2 \setminus \{(x_1, 0) : x_1 \ge 0\} \qquad [16] $$

Singularities of solutions to [1] on $\mathbb{R} \times M$ with
such a boundary condition as [4] or [5] propagate in
the interior of M and reflect off the regular points of
$\partial M$ as before. If a family of continuous, piecewise
smooth curves $\Sigma_t$ carrying the singularity of u hit the
corner x = 0 at t = a, this reflection creates a tear in
$\Sigma_t$ for t > a. In addition, a diffracted wave spreads
out from the corner at unit speed. This diffracted
wave carries a singularity that is weaker than that of
the incident wave. For example, if one has a solution
like [8], but shifted to have support in a disk of radius
$|t|$ about a point $p \ne 0$ in $\mathbb{R}^2$, for small $|t|$, then the
diffracted wave will have a jump discontinuity.

The space M in [15] is a special case of a cone.
More generally, if N is a complete Riemannian
manifold (possibly with boundary), then the cone
C(N) with base N is the set

$$ C(N) = [0, \infty) \times N \qquad [17] $$

with all points (0,x), $x \in N$, identified, with the
metric tensor

$$ ds^2 = dr^2 + r^2 g \qquad [18] $$

where g is the metric tensor on N, and points on
C(N) are denoted (r,x), $r \in [0,\infty)$, $x \in N$. The space
in [15] has the form M = C(N) with N = [a,b], an
interval. A cone in Euclidean space $\mathbb{R}^n$ is of the form
C(N) with N a domain in the unit sphere $S^{n-1}$.

The propagation of singularities for solutions to [1] 
on C(N), when N has smooth boundary, has a 
description similar to that above for the case [15]. 
Again, there is a diffracted wave set off from the conic 
point {r = 0} when a singularity of a wave hits it. The 
diffracted wave is typically (n — 1)/2 units smoother 
than the singular wave producing it, where 
n= dim C(N). For example, the fundamental solution 
to the wave equation on C(N) produces a diffracted 
wave which is the sum of a jump discontinuity and (in 
general) a logarithmic singularity. 

In fact, precise understanding of the behavior of 
the fundamental solution to the wave equation on 
C(N) is encoded in terms of the behavior of the 
solution operator to the wave equation on the base 
N. This is discussed in further detail in the section 
on harmonic analysis. In the case where C(N) is 
given by [15], we are dealing with the wave 
equation on an interval [a,b], whose behavior is 
elementary. 

One can use analysis of [15] together with finite
propagation speed to get a good qualitative picture of
diffraction of waves in $\mathbb{R}^2$ by a polygonal obstacle. A
variation of this argument allows one to understand
the behavior of the wave equation on a “polygonal”
domain N in $S^2$, that is, one whose boundary consists
of a finite number of geodesic segments in $S^2$. Going
from there to C(N), one can then analyze diffraction
of waves in $\mathbb{R}^3$ by a polyhedron.

It is worth remarking how the “shadow region”
for such an obstacle as a wedge in $\mathbb{R}^2$ differs from
that in [12]-[14]. For example, if one considers M
given by [16] and $u(t,x) = \delta(x_2 - t)$, for t < 0, then
the region

$$ \{(x_1, x_2) : x_1 > 0, \ x_2 > 0\} \qquad [19] $$

is the “shadow region,” in the sense that rays either
missing or reflecting off the obstacle $\{(x_1, 0) : x_1 \ge 0\}$
do not enter the region [19]. However, unlike the
case [13], the solution u(t,x) is not smooth in the
region [19] for t > 0. There is a singularity there,
although it is weaker than the singularity of the
main wave.

Taking Cartesian products of spaces of the form
[15] with $\mathbb{R}^k$ yields spaces with k-dimensional
edges. There are also spaces with curvy edges.
Rather than continuing with further general
description, one more particular case is discussed
next, which has had a historical significance.
Namely, we consider the reflection of waves in $\mathbb{R}^3$
off a disk, that is, take

$$ M = \mathbb{R}^3 \setminus D, \qquad D = \{(x_1, x_2, 0) : x_1^2 + x_2^2 \le 1\} \qquad [20] $$

Consider a wave given for t < 0 by $u(t,x) = \delta(x_3 - t)$.
This wave hits $D = \partial M$ at t = 0, giving off a diffracted
wave, traveling away from $\sigma = \{(x_1, x_2, 0) : x_1^2 + x_2^2 = 1\}$
at speed 1 for t > 0. This diffracted wave
carries a singularity that blows up like the -1/2 power
of the distance to the torus of points of distance t from
$\sigma$, for $t \in (0,1)$. For t > 1, there is a focusing effect
along the $x_3$-axis, producing a stronger singularity for
u(t,x) there.

This sort of phenomenon was understood, at 
least from a heuristic point of view, in the 
nineteenth century, and it played a role in an 
important argument of Poisson. At the time, there 
was a debate about whether the propagation of 
light was a wave phenomenon. Poisson did not 
think it was, and he noted that if it were, the light 
waves propagated past such an obstacle should
produce a bright spot along the axis normal to the 
disk and through its center. The experiment was 
performed and the bright spot was observed. 
This is now called the Poisson spot, and its 
occurrence convinced many physicists, including 
Poisson, that the propagation of light is a wave 
phenomenon. 


Harmonic Analysis and the Wave 
Equation 


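The wave equation [1] with Cauchy data [3] can be
regarded as an operator differential equation, with
solution

$$ u(t,x) = \cos t\sqrt{-\Delta} \, f(x) + \frac{\sin t\sqrt{-\Delta}}{\sqrt{-\Delta}} \, g(x) \qquad [21] $$

This brings one to investigate functions of the self-adjoint
operator $\Delta$. If $M = \mathbb{R}^n$, one can do this using
the Fourier transform, which is given by

$$ \mathcal{F} f(\xi) = \hat{f}(\xi) = (2\pi)^{-n/2} \int f(x) \, e^{-i x \cdot \xi} \, dx \qquad [22] $$

One defines $\mathcal{F}^*$ by changing $e^{-i x \cdot \xi}$ to $e^{i x \cdot \xi}$ in [22],
and the Fourier inversion formula says $\mathcal{F}$ and $\mathcal{F}^*$ are
inverses of each other on various function spaces,
including $L^2(\mathbb{R}^n)$. Then one has

$$ \varphi(\sqrt{-\Delta}) f(x) = (2\pi)^{-n/2} \int \varphi(|\xi|) \, \hat{f}(\xi) \, e^{i x \cdot \xi} \, d\xi \qquad [23] $$

Note that [23] is equal to

$$ \int \Phi(x - y) f(y) \, dy = \Phi * f(x) \qquad [24] $$

where

$$ \Phi(x) = (2\pi)^{-n} \int \varphi(|\xi|) \, e^{i x \cdot \xi} \, d\xi \qquad [25] $$

In particular, [21] becomes

$$ u(t,x) = \frac{\partial}{\partial t} R_t * f(x) + R_t * g(x) \qquad [26] $$

where

$$ R_t(x) = (2\pi)^{-n} \int \frac{\sin t|\xi|}{|\xi|} \, e^{i x \cdot \xi} \, d\xi \qquad [27] $$

is the fundamental solution to the wave equation.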
The integral [27] is not an easy integral when
n > 1, but the answer can be derived by analytic
continuation from the Poisson kernel, that is,

$$ e^{-y\sqrt{-\Delta}} f(x) = P_y * f(x), \qquad P_y(x) = C_n \, y \, (|x|^2 + y^2)^{-(n+1)/2} \qquad [28] $$

where $C_n = \pi^{-(n+1)/2} \, \Gamma((n+1)/2)$. One gets

$$ R_t(x) = -\frac{C_n}{n-1} \lim_{\varepsilon \searrow 0} \operatorname{Im} \left( |x|^2 - (t - i\varepsilon)^2 \right)^{-(n-1)/2} \qquad [29] $$

Taking this limit for n = 2, 3 yields the formulas
[8]-[9]. There are several ways to derive [28]. One,
which is flexible and useful for other situations,
derives it from the formula for the heat kernel,

$$ e^{t\Delta} f(x) = H_t * f(x), \qquad H_t(x) = (4\pi t)^{-n/2} \, e^{-|x|^2/4t} \qquad [30] $$

via the subordination identity:

$$ e^{-yA} = \frac{y}{2\pi^{1/2}} \int_0^\infty e^{-y^2/4t} \, e^{-tA^2} \, t^{-3/2} \, dt, \qquad A > 0, \ y > 0 \qquad [31] $$

with $A = \sqrt{-\Delta}$. The heat kernel can be computed via
[23], which becomes a well-known Gaussian integral.
The identity [31] can be proved using the fact
that the Fourier integral formula for $P_y(x)$ is
elementary to compute when n = 1.
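
The subordination identity for positive numbers A and y can be checked numerically. The sketch below assumes the form exp(-yA) = y / (2 sqrt(pi)) times the integral over t from 0 to infinity of exp(-y^2/(4t)) exp(-t A^2) t^(-3/2), and evaluates the integral with the trapezoidal rule after the substitution t = exp(s). The function name, the truncation limits, and the step count are illustrative choices.

```python
import math


def subordination_rhs(y, a, s_min=-40.0, s_max=12.0, steps=20000):
    """Trapezoidal estimate of the right-hand side of the subordination identity.

    After substituting t = exp(s), the integrand
        exp(-y^2/(4t)) * exp(-t a^2) * t^(-3/2) dt
    becomes
        exp(-y^2/(4t) - t a^2) * t^(-1/2) ds,
    which decays rapidly at both ends of the s-axis.
    """
    h = (s_max - s_min) / steps
    total = 0.0
    for i in range(steps + 1):
        t = math.exp(s_min + i * h)
        value = math.exp(-y * y / (4.0 * t) - t * a * a) / math.sqrt(t)
        total += value if 0 < i < steps else 0.5 * value
    return y / (2.0 * math.sqrt(math.pi)) * total * h
```

Comparing `subordination_rhs(y, a)` with `math.exp(-y * a)` for moderate values such as y = 1, a = 2 gives agreement to many digits. The default truncation limits are not suitable for very small a, where the integrand decays slowly in t.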


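To understand functions of the Laplace operator
on a cone C(N), one uses

$$ \Delta = \frac{\partial^2}{\partial r^2} + \frac{n-1}{r} \frac{\partial}{\partial r} + \frac{1}{r^2} \Delta_N \qquad [32] $$

where $\Delta_N$ is the Laplace operator on N, which
follows from [2] and [18]. Here n = dim C(N). This is
a modified Bessel operator. We define the operator

$$ \nu = (-\Delta_N + \alpha^2)^{1/2}, \qquad \alpha = -\frac{n-2}{2} \qquad [33] $$

For each $\nu_j$ in the spectrum of $\nu$, we consider the
Hankel transform

$$ H_{\nu_j} g(\lambda) = \int_0^\infty g(r) \, J_{\nu_j}(\lambda r) \, r \, dr \qquad [34] $$

where $J_{\nu_j}$ is the Bessel function of order $\nu_j$. The
Hankel inversion formula says $H_{\nu_j}$ is unitary
on $L^2(\mathbb{R}^+, r\,dr)$, and is its own inverse. Consequently,
we can write the action of $\varphi(\sqrt{-\Delta})$ on
$L^2(C(N))$ as

$$ \varphi(\sqrt{-\Delta}) g(r,x) = \int_0^\infty K_\varphi(r,s,\nu) \, g(s,x) \, s^{n-1} \, ds \qquad [35] $$

where $K_\varphi(r,s,\nu)$ is a family of operators on $L^2(N)$,
given by

$$ K_\varphi(r,s,\nu) = (rs)^{\alpha} \int_0^\infty \varphi(\lambda) \, J_\nu(\lambda r) \, J_\nu(\lambda s) \, \lambda \, d\lambda \qquad [36] $$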


To obtain the wave kernel on C(N), one can
analytically continue formulas for the Poisson
kernel, for $e^{-y\sqrt{-\Delta}}$. Such formulas arise from the
Lipschitz-Hankel identity:

$$ \int_0^\infty e^{-y\lambda} \, J_\nu(r\lambda) \, J_\nu(s\lambda) \, d\lambda = \frac{1}{\pi} (rs)^{-1/2} \, Q_{\nu-1/2}\!\left( \frac{r^2 + s^2 + y^2}{2rs} \right) \qquad [37] $$

Here $Q_{\nu-1/2}(\tau)$ is a Legendre function. The identity
[37] is one of the more difficult identities in the
theory of Bessel functions. It is useful to know that it
can be derived by applying a slight variant of the
subordination identity [31] to the more elementary
identity

$$ \int_0^\infty e^{-t\lambda^2} \, J_\nu(r\lambda) \, J_\nu(s\lambda) \, \lambda \, d\lambda = \frac{1}{2t} \, e^{-(r^2+s^2)/4t} \, I_\nu\!\left( \frac{rs}{2t} \right) \qquad [38] $$

(where $I_\nu(y) = e^{-\pi i \nu/2} J_\nu(iy)$ for y > 0), which describes
the behavior of the heat kernel on C(N).

Carrying out the analytic continuation of [37]
to imaginary y yields results stated in the section
on basic propagation phenomena, once one
understands the behavior of families of functions of
the operator $\nu$ so produced. An approach taken by
Cheeger and Taylor (1982) to this was to synthesize
these operators from $e^{is\nu}$, $s \in \mathbb{R}$, and deduce their
behavior from the behavior of the solution operator
to the wave equation on the base N.

One can apply similar considerations to M = R^n \ B,
which is the truncated cone [1,∞) × S^{n−1}, with
metric tensor [18], where g is the metric tensor on
S^{n−1}, and Laplace operator given by [32], with Δ_N
the Laplace operator on S^{n−1}. The problem of
diffraction of waves by the ball B can be recast as
solving

\[ \frac{\partial^2 u}{\partial t^2} - \Delta u = 0 \ \text{on } \mathbb{R} \times M, \qquad u|_{\mathbb{R} \times \partial M} = f, \qquad u(t,x) = 0 \ \text{for } t \ll 0 \qquad [39] \]


with f compactly supported on R × ∂M. Taking the
partial Fourier transform with respect to t yields the
reduced wave equation


\[ (\Delta + \lambda^2)\, v = 0 \ \text{for } |x| > 1, \qquad v|_{S^{n-1}} = g(x,\lambda) \qquad [40] \]


and the condition u(t,x) = 0 for t ≪ 0 yields for v
the outgoing radiation condition


\[ r^{(n-1)/2} \left( \frac{\partial v}{\partial r} - i\lambda v \right) \to 0 \quad \text{as } r \to \infty \qquad [41] \]
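
As a quick sanity check of what "outgoing" means here, the following worked example is a sketch only. It assumes the radiation condition has the standard Sommerfeld form r^{(n−1)/2}(∂v/∂r − iλv) → 0 and specializes to n = 3 with the two spherically symmetric solutions of the reduced wave equation; only the solution built from e^{+iλr} passes the test.

```latex
% Assumptions: n = 3, \lambda > 0, and v_{\pm} are the radial solutions of
% (\Delta + \lambda^2) v = 0 away from the origin.
\[
  v_{\pm}(r) = \frac{e^{\pm i\lambda r}}{r}, \qquad
  \frac{\partial v_{\pm}}{\partial r} = \left(\pm i\lambda - \frac{1}{r}\right) v_{\pm} .
\]
\[
  r\left(\frac{\partial v_{+}}{\partial r} - i\lambda v_{+}\right)
    = -\frac{e^{i\lambda r}}{r} \to 0,
  \qquad
  r\left(\frac{\partial v_{-}}{\partial r} - i\lambda v_{-}\right)
    = -2i\lambda\, e^{-i\lambda r} - \frac{e^{-i\lambda r}}{r} \not\to 0
  \quad (r \to \infty).
\]
```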


The solution is


\[ v(r,x,\lambda) = r^{-(n-2)/2}\, \frac{H^{(1)}_\nu(\lambda r)}{H^{(1)}_\nu(\lambda)}\, g(x,\lambda) \qquad [42] \]


with ν as in [33] and H^{(1)}_ν the Hankel function.

The behavior of H^{(1)}_{ν_j}(λr)/H^{(1)}_{ν_j}(λ) as ν_j, λ → ∞
with ratio in a small neighborhood of 1 can be
shown to control the behavior of the solution u to
[39] near grazing rays. There is an asymptotic
formula for this, which is one of the most delicate
analytical results in the theory of Bessel functions.
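The result is that, uniformly for z near 1, as
ν → ∞,

\[ H^{(1)}_\nu(\nu z) \sim 2 e^{-\pi i/3} \left( \frac{4\zeta}{1 - z^2} \right)^{1/4} \left[ \frac{A_+(\nu^{2/3}\zeta)}{\nu^{1/3}} \sum_{k \ge 0} a_k(\zeta)\, \nu^{-2k} + \frac{A_+'(\nu^{2/3}\zeta)}{\nu^{5/3}} \sum_{k \ge 0} b_k(\zeta)\, \nu^{-2k} \right] \qquad [43] \]

Here

\[ A_+(\xi) = \mathrm{Ai}(e^{2\pi i/3}\, \xi) \qquad [44] \]

where Ai is the Airy function. The coefficients a_k(ζ)
and b_k(ζ) are smooth functions of their argument,
ζ = ζ(z), which is defined by

\[ \frac{2}{3}\, \zeta^{3/2} = \int_z^1 \frac{\sqrt{1 - t^2}}{t}\, dt \qquad [45] \]


Making use of [43] in [42], one can obtain a
parametrix for u (i.e., a solution modulo a C^∞ error)
whose form is a special case of the formula [50],
which we will present in the next section.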


Geometrical Optics and Extensions 


By results of the last section, the solution to [1] 
when M =R” has the form 


\[ u(t,x) = \sum_{\pm} \int e^{i x \cdot \xi \pm i t |\xi|}\, h_{\pm}(\xi)\, d\xi \qquad [46] \]


where the functions h_± are produced from the initial
data via simpler transformations. For a general
metric tensor, one can produce a parametrix (i.e.,
an approximation to u(t,x) with a C^∞ error) in the
following form:


\[ u(t,x) = \sum_{\pm} \int a^{\pm}(t,x,\xi)\, e^{i\varphi^{\pm}(t,x,\xi)}\, h_{\pm}(\xi)\, d\xi \qquad [47] \]


Here the phase functions φ^±(t,x,ξ) are smooth for
ξ ≠ 0 and homogeneous of degree 1 in ξ. The
amplitudes a^±(t,x,ξ) are smooth and have asymptotic
expansions as |ξ| → ∞:


\[ a^{\pm}(t,x,\xi) \sim \sum_{k \ge 0} a_k^{\pm}(t,x,\xi) \qquad [48] \]


with a_k^±(t,x,ξ) homogeneous of degree −k in ξ. One
applies ∂_t² − Δ to both sides of [47], and obtains an
operator of a similar form, with new amplitudes
b^±(t,x,ξ) ~ Σ_{k≥0} b_k^±(t,x,ξ). Setting the terms in this
asymptotic expansion equal to zero yields, first for
φ^±(t,x,ξ), a partial differential equation known as
the eikonal equation:

\[ \frac{\partial \varphi^{\pm}}{\partial t} = \pm\, |\nabla_x \varphi^{\pm}| \qquad [49] \]

where |v| is the norm of a vector v ∈ T_x M,
determined by the metric tensor. Setting b_k^±(t,x,ξ)
= 0 for k ≥ 1 yields linear differential equations for
the amplitude terms in [48], known as transport
equations.
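
To make the eikonal equation concrete, here is a short worked check in the flat case. It is a sketch under the assumption that M = R^n with the Euclidean metric and that the eikonal equation reads ∂φ^±/∂t = ±|∇_x φ^±|; the phases are the ones appearing in the exact solution formula for R^n.

```latex
% Assumption: M = R^n with the flat metric, so |v| is the Euclidean norm.
\[
  \varphi^{\pm}(t,x,\xi) = x\cdot\xi \pm t|\xi|
  \quad\Longrightarrow\quad
  \frac{\partial\varphi^{\pm}}{\partial t} = \pm|\xi|, \qquad
  \nabla_x\varphi^{\pm} = \xi ,
\]
\[
  \frac{\partial\varphi^{\pm}}{\partial t} = \pm\,|\nabla_x\varphi^{\pm}|, \qquad
  (\partial_t^2 - \Delta)\, e^{i\varphi^{\pm}}
    = \left(-|\xi|^2 + |\xi|^2\right) e^{i\varphi^{\pm}} = 0 .
\]
```

In this case the amplitudes can be taken identically equal to 1, so all transport equations are trivially satisfied and the parametrix is an exact solution.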

Operators of the form [47] are special cases 
of Fourier integral operators. Seminal works of 
Keller (1953) and Lax (1957) gave an important 
stimulus to work on these operators, and work of 
Hörmander (1971) turned this into a systematic and
powerful theory. A particular advance regards
producing a parametrix valid for all t. Generally, one
can solve [49] and the associated transport equations
for t in some interval, past which the eikonal
equation might break down. Hörmander’s theory
treats products of Fourier integral operators, yielding
global constructions. This facilitates the treatment of
caustics mentioned earlier. Stationary-phase methods
can be brought to bear to relate the singularities of
Th to those of h, when T is a Fourier integral
operator.

To construct parametrices for waves reflecting off 
a boundary, one can again reduce the problem to 
one of the form [39]. Waves that reflect transver- 
sally are given by parametrices of the form [47], 
although with the role of the variables changed, so 
that t in [47]–[49] is replaced by a coordinate that
vanishes on R × ∂M.

A parametrix that treats grazing rays can be written 
in the form of a Fourier—Airy integral operator: 


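\[ u(y) = \int \left[ a(y,\xi)\, A_+(\zeta(y,\xi)) + i\, b(y,\xi)\, A_+'(\zeta(y,\xi)) \right] A_+(\zeta_0)^{-1}\, e^{i\theta(y,\xi)}\, \hat F(\xi)\, d\xi \qquad [50] \]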


Here y = (y_1, ..., y_{n+1}) denotes a coordinate system
on a neighborhood of a boundary point of R × M,
with y_{n+1} = 0 on R × ∂M. We have a pair of phase
functions θ(y,ξ) and ζ(y,ξ), homogeneous in ξ of
degree 1 and 2/3, respectively, and a pair of
amplitudes a(y,ξ) and b(y,ξ), each having asymptotic
expansions of the form [48]. The function A_+ is
the Airy function [44]. The phase functions satisfy a
coupled pair of eikonal equations:


\[ \langle \nabla_y \theta, \nabla_y \theta \rangle - \zeta\, \langle \nabla_y \zeta, \nabla_y \zeta \rangle = 0, \qquad \langle \nabla_y \theta, \nabla_y \zeta \rangle = 0 \qquad [51] \]

where ⟨·,·⟩ denotes the Lorentz inner product on
T_y(R × M) given by dt² − g. More precisely, [51] is
to hold in the region where ζ ≤ 0, and also to
infinite order at y_{n+1} = 0, for ζ > 0. One requires
∂θ/∂ξ_j to have linearly independent y-gradients, for
j = 1, ..., n, and


\[ \zeta(y,\xi) = \zeta_0(\xi) = \xi_1^{-1/3}\, \xi_n \quad \text{for } y_{n+1} = 0 \qquad [52] \]


The terms in the asymptotic expansions of a(y,ξ)
and b(y,ξ) satisfy coupled systems of transport
equations. One can arrange that b(y,ξ) = 0 for
y_{n+1} = 0. Then u|_{R×∂M} = TF, where T is a Fourier
integral operator, which can be inverted, modulo a
smooth error, by Hörmander’s theory, producing a
parametrix for [39].

The construction of solutions to [51] satisfying 
[52] is due to Melrose. This followed earlier works 
of Ludwig (1967), Melrose (1975), and Taylor 
(1976), which produced solutions satisfying [52] to
infinite order at ξ_n = 0. This earlier construction is
adequate to produce a grazing ray parametrix, but
the sharper result [52] is extremely valuable for
constructing a gliding ray parametrix. This has the
form


\[ u(y) = \int \left[ a(y,\xi)\, \mathrm{Ai}(\zeta(y,\xi)) + i\, b(y,\xi)\, \mathrm{Ai}'(\zeta(y,\xi)) \right] \mathrm{Ai}(\zeta_0)^{-1}\, e^{i\theta(y,\xi)}\, \hat F(\xi)\, d\xi \qquad [53] \]

It differs from [50] in the use of Ai rather than A_+.
Since Ai has real zeros, it is also convenient to pick
T > 0 and evaluate θ, ζ, a, and b at (ξ_1, ..., ξ_{n−1},
ξ_n + iT), and take ζ_0 = ξ_1^{−1/3}(ξ_n + iT). The treatment
of the eikonal and transport equations is as above,
though the Fourier-Airy integral operator [50] has a
different behavior from [53], reflecting the difference
between how singularities in solutions to the
wave equation are carried by grazing and by gliding
rays.
Introduction about Turbulence
and Wavelets 


What is Turbulence? 


Turbulence is a highly nonlinear regime encoun- 
tered in fluid flows. Such flows are described by 
continuous fields, for example, velocity or pressure, 
assuming that the characteristic scale of the fluid 
motions is much larger than the mean free path of 
the molecular motions. The prediction of the 
spacetime evolution of fluid flows from first 
principles is given by the solutions of the Navier- 
Stokes equations. The turbulent regime develops 
when the nonlinear term of Navier-Stokes equa- 
tions strongly dominates the linear term; the ratio 
of the norms of both terms is the Reynolds number 
Re, which characterizes the level of turbulence. In 
this regime nonlinear instabilities dominate, which
makes the flow sensitive to initial conditions and
hence unpredictable.

The corresponding turbulent fields are highly 
fluctuating and their detailed motions cannot be 
predicted. However, if one assumes some statistical 
stability of the turbulence regime, averaged quan- 
tities, such as mean and variance, or other related 
quantities, for example, diffusion coefficients, lift or 
drag, may still be predicted. 

When turbulent flows are statistically stationary 
(in time) or homogeneous (in space), as is
classically supposed, one studies their energy spec- 
trum, given by the modulus of the Fourier transform 
of the velocity autocorrelation. 

Unfortunately, since the Fourier representation 
spreads the information in physical space among the 
phases of all Fourier coefficients, the energy spec- 
trum loses all structural information in time or 
space. This is a major limitation of the classical way 
of analyzing turbulent flows. This is why we have 
proposed to use the wavelet representation instead 
and define new analysis tools that are able to 
preserve time and space locality. 

The same is true for computing turbulent flows. 
Indeed, the Fourier representation is well suited to 
study linear motions, for which the superposition 
principle holds and whose generic behavior is, either 
to persist at a given scale, or to spread to larger 
ones. In contrast, the superposition principle does 


not hold for nonlinear motions, their archetype 
being the turbulent regime, which therefore cannot 
be decomposed into a sum of independent motions 
that can be separately studied. Generically, their 
evolution involves a wide range of scales, exciting 
smaller and smaller ones, even leading to finite-time 
singularities, e.g., shocks. The “art” of predicting 
the evolution of such nonlinear phenomena consists 
of disentangling the active from the passive 
elements: the former should be deterministically 
computed, while the latter could either be discarded 
or their effect statistically modeled. The wavelet 
representation allows one to analyze the dynamics
in both space and scale, retaining only those degrees 
of freedom which are essential to predict the 
flow evolution. Our goal is to perform a kind 
of “distillation” and retain only the elements 
which are essential to compute the nonlinear 
dynamics. 


How Does One Study Turbulence?


When studying turbulence one is uneasy about the 
fact that there are two different descriptions, 
depending on which side of the Fourier transform 
one looks from. 


• On the one hand, looking from the Fourier space
representation, one has a theory which assumes
the existence of a nonlinear cascade in an
intermediate range of wavenumbers, called
the “inertial range”, where energy is conserved
and transferred towards high wavenumbers, but
only on average (i.e., considering either ensemble
or time or space averages). This implies that a
turbulent flow is excited at wavenumbers lower
than those of the inertial range and dissipated at
higher wavenumbers. Under these hypotheses, the
theory predicts that the energy spectrum in the
inertial range scales as k^{−5/3} in dimension 3
and as k^{−3} in dimension 2, k being
the wavenumber, i.e., the modulus of the wave
vector.

• On the other hand, if one studies turbulence from
the physical space representation, there is not yet
any universal theory. One relies instead on
empirical observations, from both laboratory
and numerical experiments, which exhibit the
formation and persistence of coherent vortices,
even at very high Reynolds numbers. They
correspond to the condensation of the vorticity
field into some organized structures that contain
most of the energy (L²-norm of velocity) and
enstrophy (L²-norm of vorticity).


Moreover, the classical method for modeling turbu- 
lent flows consists in neglecting high-wavenumber 
motions and replacing them by their average, suppos- 
ing their dynamics to be either linear or slaved to the 
low wavenumber motions. Such a method would work
if there existed a clear separation between low and high
wavenumbers, that is, a spectral gap.

Actually, there is now strong evidence, from
both laboratory and direct numerical simulation
(DNS) experiments, that this is not the case.
On the contrary, one observes that turbulent flows are
nonlinearly active all along the inertial range and that 
coherent vortices seem to play an essential dynamical 
role there, especially for transport and mixing. One 
may then ask the following questions: Are coherent 
vortices the elementary building blocks of turbulent 
flows? How can we extract them? Do their mutual 
interactions have a universal character? Can we 
compress turbulent flows and compute their evolu- 
tion with a reduced number of degrees of freedom 
corresponding to the coherent vortices? 

The DNS of turbulent flows, based on the integration
of the Navier-Stokes equations using either grid
points in physical space or Fourier modes in spectral
space, requires a number of degrees of freedom per
time step that varies as Re^{9/4} in dimension 3 (and as
Re in dimension 2). Due to the inherent limitation of
computer performance, one can presently only perform
DNS of turbulent flows up to Reynolds numbers
Re = 10^6. To compute flows at higher Reynolds numbers,
one should then design ad hoc turbulence models, whose
parameters are empirically adjusted to each type of
flow, in particular to its geometry and boundary
conditions, using data from either laboratory or
numerical experiments.
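
To give a feeling for how quickly the cost of DNS grows, the following sketch evaluates the two scalings quoted above. It is illustrative only: the function name `dns_degrees_of_freedom` is hypothetical, and all prefactors are set to 1 (an assumption), so only orders of magnitude are meaningful.

```python
import math


def dns_degrees_of_freedom(reynolds, dimension):
    """Order-of-magnitude estimate of the degrees of freedom per time step.

    Uses only the scalings quoted in the text: Re^(9/4) in dimension 3
    and Re in dimension 2. Prefactors are ignored (assumption).
    """
    if reynolds <= 0:
        raise ValueError("Reynolds number must be positive")
    if dimension == 3:
        return reynolds ** (9.0 / 4.0)
    if dimension == 2:
        return float(reynolds)
    raise ValueError("the quoted scalings cover dimensions 2 and 3 only")


if __name__ == "__main__":
    for reynolds in (1e3, 1e4, 1e5, 1e6):
        n3 = dns_degrees_of_freedom(reynolds, 3)
        n2 = dns_degrees_of_freedom(reynolds, 2)
        print(f"Re = {reynolds:.0e}: 3D ~ 10^{math.log10(n3):.2f}, "
              f"2D ~ 10^{math.log10(n2):.2f}")
```

Each factor of 10 in Re multiplies the three-dimensional estimate by 10^{2.25}, which is why turbulence models or compressed representations become necessary well before the Reynolds numbers of many natural flows are reached.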


What are Wavelets? 


The wavelet transform unfolds signals (or fields) 
into both time (or space) and scale, and possibly 
directions in dimensions higher than 1. The starting 
point is a function ψ ∈ L²(R), called the “mother
wavelet”, which is well localized in physical space
x ∈ R, is oscillating (ψ has at least a vanishing
integral, or better, its first m moments vanish), and
is smooth (its Fourier transform ψ̂(k) exhibits fast
decay for wavenumbers |k| tending to infinity). The
mother wavelet then generates a family of dilated
and translated wavelets


\[ \psi_{a,b}(x) = a^{-1/2}\, \psi\!\left( \frac{x - b}{a} \right) \]


with a ∈ R⁺ the scale parameter and b ∈ R the
position parameter, all wavelets being normalized
in L²-norm.
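
The effect of the a^{−1/2} prefactor can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the real-valued Mexican hat (1 − x²) exp(−x²/2) as mother wavelet, a plain midpoint rule on a truncated interval, and hypothetical helper names (`mexican_hat`, `wavelet`, `l2_norm_squared`). It verifies that dilation and translation leave the L²-norm unchanged.

```python
import math


def mexican_hat(x):
    """Mother wavelet used for illustration: (1 - x^2) exp(-x^2 / 2)."""
    return (1.0 - x * x) * math.exp(-0.5 * x * x)


def wavelet(a, b, x):
    """psi_{a,b}(x) = a^(-1/2) psi((x - b) / a), with scale a > 0 and position b."""
    return mexican_hat((x - b) / a) / math.sqrt(a)


def l2_norm_squared(func, lo, hi, n=20000):
    """Midpoint-rule estimate of the integral of func(x)^2 over [lo, hi]."""
    h = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * h) ** 2 for i in range(n)) * h


if __name__ == "__main__":
    reference = l2_norm_squared(mexican_hat, -40.0, 40.0)
    print("mother wavelet:", reference)
    for a, b in [(0.5, 0.0), (2.0, 3.0), (4.0, -5.0)]:
        value = l2_norm_squared(lambda x: wavelet(a, b, x), -40.0, 40.0)
        print(f"a = {a}, b = {b}:", value)
```

For this particular mother wavelet the squared norm is (3/4)√π, and every member of the family reproduces that value up to quadrature error, as long as the integration interval contains the essential support of the dilated wavelet.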
The wavelet transform of a function f ∈ L²(R) is
the inner product of f with the analyzing wavelets
ψ_{a,b}, which gives the wavelet coefficients: f̃(a,b) =
⟨f, ψ_{a,b}⟩ = ∫ f(x) ψ*_{a,b}(x) dx. They measure the
fluctuations of f around the scale a and the position
b. f can then be reconstructed without any loss as
the inner product of its wavelet coefficients f̃ with
the analyzing wavelets ψ_{a,b}:


\[ f(x) = C_\psi^{-1} \iint \tilde f(a,b)\, \psi_{a,b}(x)\, a^{-2}\, da\, db \]


C_ψ = ∫ |ψ̂(k)|² |k|^{−1} dk being a constant which depends
on the wavelet ψ.

Like the Fourier transform, the wavelet transform 
realizes a change of basis from physical space to 
wavelet space which is an isometry. It thus conserves 
the inner product (Plancherel theorem), and in 
particular energy (Parseval’s identity). Let us men- 
tion that, due to the localization of wavelets in 
physical space, the behavior of the signal at infinity 
does not play any role. Therefore, the wavelet 
analysis and synthesis can be performed locally, in 
contrast to the Fourier transform where the nonlocal 
nature of the trigonometric functions does not allow
one to perform a local analysis.

Moreover, wavelets constitute building blocks of
various function spaces, out of which some can be
used to construct orthogonal bases. The main
difference between the continuous and the orthogonal
wavelet transforms is that the latter is nonredundant,
but preserves the invariance by
translation and dilation only for a discrete subset of
wavelet space which corresponds to the dyadic grid
λ = (j, i), for which scale is sampled by octaves j and
space by positions 2^{−j} i. The advantage is that all
orthogonal wavelet coefficients are decorrelated, 
which is not the case for the continuous wavelet 
transform whose coefficients are redundant and 
correlated in space and scale. Such a correlation 
can be visualized by plotting the continuous wavelet 
coefficients of a white noise and the patterns one 
thus observes are due to the reproducing kernel of 
the continuous wavelet transform, which corre- 
sponds to the correlation between the analyzing 
wavelets themselves. 

In practice, to analyze turbulent signals or fields, 
one should use the continuous wavelet transform 
with complex-valued wavelets, since the modulus of 
the wavelet coefficients allows one to read the evolution
of the energy density in both space (or time) and 
scales. If one uses real-valued wavelets instead, the 
modulus of the wavelet coefficients will present the 
same oscillations as the analyzing wavelets and it 
will then become difficult to sort out features 
belonging to the signal or to the wavelets. In the case
of complex-valued wavelets, the quadrature between
the real and the imaginary parts of the wavelet
coefficients eliminates these spurious oscillations; this
is why we recommend using complex-valued wavelets,
such as the Morlet wavelet. To compress
turbulent flows, and a fortiori to compute their 
evolution at a reduced cost, compared to standard 
methods (finite difference, finite volume, or spectral 
methods), one should use orthogonal wavelets. This 
avoids redundancy, since one has the same number of 
grid points as wavelet coefficients. Moreover, there
exists a fast algorithm to compute the orthogonal
wavelet coefficients which is even faster than the fast
Fourier transform, having O(N) operations instead of
O(N log₂ N).
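
The O(N) cost and the absence of redundancy can be seen in a minimal sketch of a fast orthogonal wavelet transform. The example below uses the Haar wavelet, chosen here only because its filters have length 2 and the code stays short; it is not the wavelet favored in this article, and the function names `haar_forward` and `haar_inverse` are hypothetical.

```python
import math


def haar_forward(signal):
    """Orthonormal Haar transform of a sequence whose length is a power of two.

    Returns (mean_coeff, details, outputs): the single coarsest average,
    the detail coefficients ordered from coarse to fine (details[j] has
    2**j entries), and the number of filter outputs computed.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    s = math.sqrt(0.5)
    approx = list(signal)
    details = []
    outputs = 0
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.insert(0, [(u - v) * s for u, v in pairs])  # high-pass branch
        approx = [(u + v) * s for u, v in pairs]            # low-pass branch
        outputs += 2 * len(pairs)
    return approx[0], details, outputs


def haar_inverse(mean_coeff, details):
    """Rebuild the signal from the output of haar_forward."""
    s = math.sqrt(0.5)
    approx = [mean_coeff]
    for detail in details:  # from coarse to fine
        finer = []
        for avg, dif in zip(approx, detail):
            finer.append((avg + dif) * s)
            finer.append((avg - dif) * s)
        approx = finer
    return approx


if __name__ == "__main__":
    signal = [math.sin(0.3 * i) + (1.0 if i == 40 else 0.0) for i in range(64)]
    mean_coeff, details, outputs = haar_forward(signal)
    n_coeffs = 1 + sum(len(d) for d in details)
    energy_in = sum(v * v for v in signal)
    energy_out = mean_coeff ** 2 + sum(c * c for d in details for c in d)
    print("coefficients:", n_coeffs, "filter outputs:", outputs)
    print("energy in:", energy_in, "energy out:", energy_out)
```

The number of filter outputs is 2(N − 1), the number of coefficients equals the number of samples, and the sum of squared coefficients equals the sum of squared samples, which is the discrete counterpart of the isometry property mentioned earlier. Longer filters change the constant in front of N but not the linear scaling.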

The first paper about the continuous wavelet
transform was published by Grossmann and
Morlet (1984). Then, discrete wavelets were
constructed, leading to frames (Daubechies et al.
1986) and orthogonal bases (Lemarié and Meyer
1986). From there the formalism of multiresolution
analysis (MRA) was constructed, which led
to the fast wavelet algorithm (Mallat 1989). The
first application of wavelets to analyze turbulent
flows was published by Farge and Rabreau
(1988). Since then a long-term research program has
been developed for analyzing, computing, and
modeling turbulent flows using either continuous
wavelets, orthogonal wavelets, or wavelet packets.


Wavelet Analysis 
Wavelet Spectra 


Wavelet space To study turbulent signals one uses 
the continuous wavelet transform for analysis, and 
the orthogonal wavelet transform for compression 
and computation. To perform a continuous wavelet 
transform, one can choose: 


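• either a real-valued wavelet, such as the Marr
wavelet, also called “Mexican hat,” which is the
second derivative of a Gaussian,


\[ \psi(x) = (1 - x^2)\, \exp\!\left( -\frac{x^2}{2} \right) \qquad [1] \]


• or a complex-valued wavelet, such as the Morlet
wavelet,


\[ \hat\psi(k) = \exp\!\left( -\frac{(k - k_\psi)^2}{2} \right) \ \text{for } k > 0, \qquad \hat\psi(k) = 0 \ \text{for } k \le 0 \qquad [2] \]


with the wavenumber k_ψ denoting the barycenter of
the wavelet support in Fourier space computed as


\[ k_\psi = \frac{\int_0^\infty k\, |\hat\psi(k)|\, dk}{\int_0^\infty |\hat\psi(k)|\, dk} \qquad [3] \]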
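
The barycenter formula can be evaluated numerically. The following sketch assumes the Morlet wavelet in Fourier space is a Gaussian centered at a chosen wavenumber and cut off for k ≤ 0 (any constant normalization cancels in the ratio); the names `morlet_hat` and `centroid_wavenumber`, the cutoff `k_max`, and the midpoint rule are illustrative choices, not part of the original text.

```python
import math


def morlet_hat(k, k_center=6.0):
    """Assumed Fourier-space Morlet wavelet: Gaussian around k_center for k > 0."""
    if k <= 0.0:
        return 0.0
    return math.exp(-0.5 * (k - k_center) ** 2)


def centroid_wavenumber(psi_hat, k_max=40.0, n=40000):
    """Midpoint-rule estimate of int k |psi_hat| dk / int |psi_hat| dk on (0, k_max)."""
    h = k_max / n
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        k = (i + 0.5) * h
        weight = abs(psi_hat(k))
        numerator += k * weight
        denominator += weight
    return numerator / denominator


if __name__ == "__main__":
    for center in (2.0, 4.0, 6.0):
        k_psi = centroid_wavenumber(lambda k: morlet_hat(k, center))
        print(f"Gaussian center {center}: barycenter k_psi = {k_psi:.6f}")
```

When the Gaussian is centered well away from the origin, the truncation at k = 0 removes a negligible part of it and the barycenter coincides with the center to high accuracy; for small centers the truncation shifts the barycenter upward.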


For the orthogonal wavelet transform, there is 
a large collection of possible wavelets and the 
choice depends on which properties are preferred, 
for instance: compact support, symmetry, smooth- 
ness, number of cancelations, computational 
efficiency. 

From our own experience, we tend to prefer 
the Coifman wavelet 12, which is compactly 
supported, has four vanishing moments, is quasi- 
symmetric, and is defined with a filter of length 12, 
which leads to a computational cost for the fast 
wavelet transform in 24N operations, since two 
filters are used. 

As stated above, we recommend the complex- 
valued continuous wavelet transform for analysis. In 
this case, one plots the modulus and the phase of the 
wavelet coefficients in wavelet space, with a linear 
horizontal axis for the position b, and a logarithmic 
vertical axis for the scale a, with the largest scale at 
the bottom and the smallest scale at the top. 

In Figure 1a we show the wavelet analysis of 
a turbulent signal, corresponding to the time 
evolution of the velocity fluctuations of two succes- 
sive vortex breakdowns, measured by hot-wire 
anemometry at N = 32768 = 2^15 instants (Cuypers
et al. 2003). The modulus of the wavelet coefficients 
(Figure 1b) shows that during the vortex break- 
down, which is due to strong nonlinear flow 
instability, energy is spread over a wide range of 
scales. The phase of the wavelet coefficients 
(Figure 1c) is plotted only where the modulus is 
non-negligible, otherwise the phase information 
would be meaningless. In Figure 1c, one observes 
that the lines of constant phase point towards the 
instants where the signal is less regular, that is, 
during vortex breakdowns. 


Local wavelet spectrum Since the wavelet transform
conserves energy and preserves locality in
physical space, one can extend the concept of energy
spectrum and define a local energy spectrum, such
that


\[ \tilde E(k,x) = \frac{1}{C_\psi k_\psi} \left| \tilde f\!\left( \frac{k_\psi}{k}, x \right) \right|^2 \quad \text{for } k \ge 0 \qquad [4] \]

where k_ψ is the centroid wavenumber of the
analyzing wavelet ψ and C_ψ is defined in the
admissibility condition (respectively, eqns [10] and
[1] in the article Wavelets: Mathematical Theory).

Figure 1 Example of a one-dimensional continuous wavelet
analysis. (a) the signal to be analyzed, (b) the modulus of its
wavelet coefficients, (c) the phase of its wavelet coefficients.

By measuring E(k,x) at different instants or 
positions, one estimates which elements in the 
signal contribute most to the global Fourier energy 
spectrum, in order to suggest a way to decompose
the signal into different components. For example, 
if one considers turbulent flows, one can compare 
the energy spectrum of the coherent structures 
(such as isolated vortices in incompressible flows 
or shocks in compressible flows) and the energy 
spectrum of the incoherent background flow, since 
both elements exhibit different correlations and 
therefore different spectral slopes. 


Global wavelet spectrum Although the wavelet 
transform analyzes the flow using localized func- 
tions rather than complex exponentials, one can 
show that the global wavelet energy spectrum 
converges towards the Fourier energy spectrum, 
provided the analyzing wavelet has enough vanish- 
ing moments. More precisely, the global wavelet 


spectrum, defined by integrating [4] over all
positions,

\[ \tilde E(k) = \int \tilde E(k,x)\, dx \qquad [5] \]


gives the correct exponent for a power-law Fourier
energy spectrum E(k) ∝ k^{−β} if the analyzing wavelet
has at least M > (β − 1)/2 vanishing moments.
Thus, the steeper the energy spectrum one studies, 
the more vanishing moments the analyzing wavelet 
should have. 

The inertial range, which corresponds to the scales
where turbulent flows are dominated by nonlinear
interactions, exhibits a power-law behavior as
predicted by the statistical theory of homogeneous
and isotropic turbulence.

The ability to correctly evaluate the slope of the 
energy spectrum is an important property of the 
wavelet transform which is related to its ability to 
detect and characterize singularities. We will not 
discuss here how wavelet coefficients could be used 
to study singularities and fractal measures, since it is 
presented in detail elsewhere (see Wavelets: 
Applications). 


Relation to Classical Analysis 


Relation to Fourier spectrum The global wavelet
energy spectrum Ẽ(k) is actually a smoothed version
of the Fourier energy spectrum E(k). This can be
seen from the following relation between the two
spectra:


\[ \tilde E(k) = \frac{1}{C_\psi k} \int_0^\infty E(k')\, \left| \hat\psi\!\left( \frac{k_\psi k'}{k} \right) \right|^2 dk' \qquad [6] \]

which shows that the global wavelet spectrum is an 
average of the Fourier spectrum weighted by the 
square of the Fourier transform of the analyzing 
wavelets at wavenumber k. Note that the larger k, 
the larger the averaging interval, because wavelets 
are bandpass filters with Ak/k constant. This 
property of the global wavelet energy spectrum is 
particularly useful to study turbulent flows. Indeed, 
the Fourier energy spectrum of a single realization
of a turbulent flow oscillates too much for a slope
to be clearly detected, while this is no longer the case
for the global wavelet energy spectrum, which is a
better estimator of the spectral slope.
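
The requirement on the number of vanishing moments can be recovered from the relation between the two spectra by a change of variables. The derivation below is a sketch under explicit assumptions: a pure power law E(k') = k'^{−β} on the whole half-line, a prefactor 1/(C_ψ k) in the smoothing relation, and an analyzing wavelet with M vanishing moments, so that ψ̂(u) = O(u^M) as u → 0.

```latex
% Assumptions: E(k') = k'^{-\beta} on (0, \infty); \hat\psi(u) = O(u^M) as u \to 0;
% the smoothing relation carries the prefactor 1/(C_\psi k).
\[
  \tilde E(k) = \frac{1}{C_\psi k}\int_0^\infty k'^{-\beta}
     \left|\hat\psi\!\left(\frac{k_\psi k'}{k}\right)\right|^2 dk'
  \;\overset{u = k_\psi k'/k}{=}\;
  k^{-\beta}\,\frac{k_\psi^{\beta-1}}{C_\psi}\int_0^\infty u^{-\beta}\,|\hat\psi(u)|^2\,du .
\]
\[
  \text{Near } u = 0:\quad u^{-\beta}\,|\hat\psi(u)|^2 = O\!\left(u^{2M-\beta}\right),
  \quad\text{which is integrable when } 2M - \beta > -1,
  \ \text{i.e. } M > \frac{\beta-1}{2}.
\]
```

When the remaining integral converges it is a constant independent of k, so the global wavelet spectrum inherits the exponent −β; when it diverges at small u, the measured slope is set by the wavelet rather than by the signal.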

The real-valued Marr wavelet [1] has only two
vanishing moments and thus can correctly measure
the energy spectrum exponents up to β < 5. In the
case of the complex-valued Morlet wavelet [2], only
the zeroth-order moment is null, but the higher mth-order
moments are very small (∝ k_ψ^m e^{−k_ψ²/2}),
provided that k_ψ is larger than 5. For instance, the
Morlet wavelet transform with k_ψ = 6 gives accurate
estimates of the power-law exponent of the
energy spectrum up to β < 7.

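There is also a family of wavelets with an infinite
number of cancelations


\[ \hat\psi_n(k) = \alpha_n \exp\!\left( -\frac{1}{2} \left( k^2 + \frac{1}{k^{2n}} \right) \right), \quad n \ge 1 \qquad [7] \]


where α_n is chosen for normalization.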

These wavelets can therefore correctly measure
any power-law energy spectrum, and thus detect the
difference between a power-law energy spectrum
and a Gaussian energy spectrum (E(k) ∝ e^{−(k/k_0)²}).
For instance, it is important in turbulence to 
determine the wavenumber after which the 
energy spectrum decays exponentially, since this 
wavenumber defines the end of the inertial range, 
dominated by nonlinear interactions, and the begin- 
ning of the dissipative range, dominated by linear 
dissipation. 


Relation to structure functions In this subsection 
we will point out the limitations of classical 
measures of intermittency and present a set of 
wavelet-based alternatives. 


The classical measures based on structure func- 
tions can be thought of as a special case of wavelet 
filtering using a nonsmooth wavelet defined as the 
difference of two Diracs (DOD). It is this lack of 
regularity of the underlying wavelet that limits the 
adequacy of classical measures to analyze smooth 
signals. Wavelet-based diagnostics can overcome 
these limitations, and produce accurate results, 
whatever the signal to be analyzed. 

We will link the scale-dependent moments of the 
wavelet coefficients and the structure functions, 
which are classically used to study turbulence. In 
the case of second-order statistics, the global wavelet 
spectrum corresponds to the second-order structure 
function. Furthermore, a rigorous bound for the 
maximum exponent detected by the structure func- 
tions can be computed, but there is a way to 
overcome this limitation by using wavelets. 

The increments of a signal, also called the
modulus of continuity, can be seen as its wavelet
coefficients using the DOD wavelet


\[ \psi^\delta(x) = \delta(x + 1) - \delta(x) \qquad [8] \]


We thus obtain


\[ f(x + a) - f(x) = \tilde f_{x,a} = \langle f, \psi^\delta_{x,a} \rangle \qquad [9] \]


with ψ^δ_{x,a}(y) = 1/a [δ((y − x)/a + 1) − δ((y − x)/a)].
Note that the wavelet is normalized with respect to
the L¹-norm. The pth-order structure function S_p(a)
therefore corresponds to the pth-order moment of
the wavelet coefficients at scale a


\[ S_p(a) = \int (\tilde f_{x,a})^p\, dx \qquad [10] \]


As the DOD wavelet has only one vanishing
moment (its mean), the exponent of the pth-order
structure function in the case of a self-similar
behavior is limited by p, that is, if S_p(a) ∝ a^{ζ(p)},
then ζ(p) < p. To be able to detect larger exponents,
one has to use increments with a larger stencil, or 
wavelets with more vanishing moments. 
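
This saturation is easy to observe numerically. The sketch below is illustrative: it takes a periodic, infinitely smooth test signal (one period of a sine), approximates the structure function by a Riemann sum, uses absolute values of the increments (an assumption that matters only for odd p), and estimates the local scaling exponent between a and 2a. The function names are hypothetical.

```python
import math


def structure_function(samples, shift, order=2, spacing=1.0):
    """Riemann-sum estimate of S_p(a) for a periodic signal, with a = shift * spacing."""
    n = len(samples)
    total = sum(abs(samples[(i + shift) % n] - samples[i]) ** order
                for i in range(n))
    return total * spacing


def scaling_exponent(samples, shift, order=2):
    """Slope of log S_p(a) versus log a, measured between a and 2a."""
    s_a = structure_function(samples, shift, order)
    s_2a = structure_function(samples, 2 * shift, order)
    return math.log(s_2a / s_a) / math.log(2.0)


if __name__ == "__main__":
    n = 4096
    smooth = [math.sin(2.0 * math.pi * i / n) for i in range(n)]
    for p in (2, 4, 6):
        print(f"p = {p}: measured exponent = {scaling_exponent(smooth, 1, p):.6f}")
```

Although the signal is arbitrarily regular, the measured exponents sit at p and go no higher, because an increment of a smooth function behaves like a times its first derivative. Increments over a wider stencil, or wavelets with more vanishing moments, are needed to see beyond this bound.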

We now concentrate on the case p = 2, that is, the
energy norm. Equation [6] gives the relation
between the global wavelet spectrum Ẽ(k) and the
Fourier spectrum E(k) for an arbitrary wavelet ψ.
For the DOD wavelet we find, since ψ̂^δ(k) =
e^{ik} − 1 = e^{ik/2}(e^{ik/2} − e^{−ik/2}) and hence |ψ̂^δ(k)|² =
2(1 − cos k), that


\[ \tilde E(k) = \frac{1}{C_\psi k} \int_0^\infty E(k') \left( 2 - 2\cos\!\left( \frac{k_\psi k'}{k} \right) \right) dk' \qquad [11] \]


Setting a = k_ψ/k, we see that the wavelet spectrum
corresponds to the second-order structure function,
such that


\[ \tilde E(k) = \frac{1}{C_\psi k}\, S_2(a) \qquad [12] \]


The above results show that, if the Fourier spectrum
behaves like k^{−α} for k → ∞, Ẽ(k) ∝ k^{−α} if α < 2M +
1, where M denotes the number of vanishing
moments of the wavelets. Consequently, we find
for S_2(a) that S_2(a) ∝ a^{ζ(2)} = (k_ψ/k)^{ζ(2)} for a → 0 if
ζ(2) < 2M. For the DOD wavelet, we have M = 1,
therefore, the second-order structure function
can only detect slopes smaller than 2, corresponding
to an energy spectrum whose slope is shallower
than −3. Thus, the usual structure functions give
spurious results for sufficiently smooth signals. The 
relation between structure functions and wavelet 
coefficients can be generalized in the context of 
Besov spaces, which are classically used for non- 
linear approximation theory (see Wavelets: Mathe- 
matical Theory). 


Intermittency Measures


Intermittency is defined as localized bursts of high- 
frequency activity. This means that intermittent 
phenomena are localized in both physical and 
spectral spaces, and thus a suitable basis for 
representing intermittency should reflect this dual 
localization. The Fourier basis is well localized in 
spectral space, but delocalized in physical space. 
Therefore, when a turbulence signal is filtered using 
a high-pass Fourier transform and then recon- 
structed in physical space, for example, to calculate 
the flatness, some spatial information is lost. This 
leads to smoothing of strong gradients and spurious 
oscillations in the background, which come from the 
fact that the modulus and phase of the discarded 
high wavenumber Fourier modes have been lost. 
The spatial errors introduced by such a Fourier 
filtering lead to errors in estimating the flatness, and 
hence the signal’s intermittency. 

When a quantity (e.g., velocity derivative) is 
intermittent, it contains rare but strong events (i.e., 
bursts of intense activity), which correspond to 
large deviations reflected in the “heavy tails” of the 
PDF. Second-order statistics (e.g., energy spectrum, 
second-order structure function) are relatively 
insensitive to such rare events whose time or 
space supports are very small and thus do not 
dominate the integral. However, these events 
become increasingly important for higher-order 
statistics, where they finally dominate. High-order 
statistics therefore characterize intermittency. Of
course, intermittency is not essential for all problems: 
second-order statistics are sufficient to measure 
dispersion (dominated by energy-containing scales), 
but not to calculate drag or mixing (dominated by 
vorticity production in thin boundary or shear 
layers). 

To measure intermittency, one uses the space-scale
information contained in the wavelet coefficients
to define scale-dependent moments and
moment ratios. Useful diagnostics to quantify the
intermittency of a field f are the moments of its
wavelet coefficients at different scales j


\[ M_{p,j}(f) = 2^{-j} \sum_{i=0}^{2^j - 1} |\tilde f_{j,i}|^p \qquad [13] \]

Note that the distribution of energy scale by scale,
that is, the scalogram, can be computed from the
second-order moment of the orthogonal wavelet
coefficients: E_j = 2^{j−1} M_{2,j}. Due to orthogonality of
the decomposition, the total energy is just the sum:
E = Σ_{j≥0} E_j.

The sparsity of the wavelet coefficients at each
scale is a measure of intermittency, and it can be
quantified using ratios of moments at different
scales


\[ Q_{p,q,j}(f) = \frac{M_{p,j}(f)}{(M_{q,j}(f))^{p/q}} \qquad [14] \]


which may be interpreted as quotient norms
computed in two different functional spaces,
L^p- and L^q-spaces. Classically, one chooses q = 2 to
define typical statistical quantities as a function of
scale. Recall that for p = 4 we obtain the scale-dependent
flatness F_j = Q_{4,2,j}. It is equal to 3 for a
Gaussian white noise at all scales j, which proves that
this signal is not intermittent. The scale-dependent
skewness, hyperskewness, and hyperflatness are
obtained for p = 3, 5, and 6, respectively. For intermittent
signals Q_{p,q,j} increases with j, whatever p
and q.
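
The scale-dependent moments and the flatness can be computed in a few lines once orthogonal wavelet coefficients are available. The sketch below is an illustration under stated assumptions: it uses the Haar wavelet instead of a smoother orthogonal wavelet, takes the moment at scale j to be the mean of |coefficient|^p over the 2^j coefficients of that scale, and compares two synthetic signals (Gaussian white noise and a single isolated spike). All function names are hypothetical.

```python
import random


def haar_details(signal):
    """Orthonormal Haar detail coefficients, ordered from coarse to fine.

    details[j] holds the 2**j coefficients at scale index j.
    """
    n = len(signal)
    if n < 2 or n & (n - 1):
        raise ValueError("signal length must be a power of two, at least 2")
    s = 0.5 ** 0.5
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.insert(0, [(u - v) * s for u, v in pairs])
        approx = [(u + v) * s for u, v in pairs]
    return details


def moment(coeffs, p):
    """Scale-dependent moment: mean of |coefficient|^p at one scale."""
    return sum(abs(c) ** p for c in coeffs) / len(coeffs)


def moment_ratio(coeffs, p, q=2):
    """Ratio M_p / (M_q)^(p/q) at one scale."""
    return moment(coeffs, p) / moment(coeffs, q) ** (p / q)


def flatness_by_scale(signal):
    """Scale-dependent flatness (p = 4, q = 2), from coarse to fine scales."""
    return [moment_ratio(level, 4, 2) for level in haar_details(signal)]


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
    spike = [0.0] * 64
    spike[5] = 1.0
    print("white noise, three finest scales:", flatness_by_scale(noise)[-3:])
    print("isolated spike, all scales:", flatness_by_scale(spike))
```

For the white noise the flatness at well-populated scales stays close to 3, while for the isolated spike only one coefficient per scale is nonzero and the flatness doubles from each scale to the next finer one, which is the signature of intermittency described above. At the coarsest scales there are too few coefficients for the estimate to be statistically meaningful.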


Wavelet Compression 
Principle 


To study turbulent signals, we now propose to 
separate the rare and extreme events from the dense 
events, and then calculate their statistics indepen- 
dently. A major difficulty in turbulence research is 
that there is no clear scale separation between these 
two kinds of events. This lack of “spectral gap” 
excludes Fourier filtering for disentangling these 
two behaviors. Since the rare events are well 
localized in physical space, one might try to use an
on-off filter defined in physical space to extract
them. However, this approach changes the spectral
properties by introducing spurious discontinuities,
adding an artificial scaling (e.g., k^{−2} in one
dimension) to the energy spectrum. To avoid these
problems, we use the wavelet representation, which 
combines both physical and spectral space localiza- 
tions (bounded from below by Heisenberg’s uncer- 
tainty principle). In turbulence, the relevant rare 
events are the coherent vortices and the dense 
events correspond to the residual background flow. 
We have proposed a nonlinear wavelet filtering of 
the wavelet coefficients of vorticity to extract the 
coherent vortices out of turbulent flows. We now 
detail the different steps of this procedure. 


Extraction of Coherent Structures 


Principle We propose a new method to extract 
coherent structures from turbulent flows, as encoun- 
tered in fluids (e.g., vortices, shocklets) or plasmas 
(e.g., bursts), in order to study their role in transport 
and mixing. 

We first replace the Fourier representation by the 
wavelet representation, which keeps track of both 
time and scale, instead of frequency only. The 
second improvement consists in changing our view- 
point about coherent structures. Since there is not 
yet a universal definition of coherent structures, we 
prefer starting from a minimal but more consensual 
statement about them, that everyone hopefully could 
agree with: “coherent structures are not noise.” 
Using this apophatic method, we propose the 
following definition: “coherent structures are what 
remain after denoising.” 

For the noise we use the mathematical definition 
stating that a noise cannot be compressed in any 
functional basis. Another way to say this is to 
observe that the shortest description of a noise is the 
noise itself. Notice that often one calls “noise” what 
is actually “experimental noise,” but not noise in the 
mathematical sense. 

Considering our definition of coherent structures, 
turbulent signals can be split into two contribu- 
tions: coherent bursts, corresponding to that part of 
the signal which can be compressed in a wavelet 
basis, and incoherent noise, corresponding to that 
part of the signal which cannot be compressed, 
neither in wavelets nor in any other basis. We will 
then check a posteriori that the incoherent con- 
tribution is spread, and therefore does not com- 
press, in both Fourier and grid-point basis. Since we 
use the orthogonal wavelet representation, both
coherent and incoherent components are
orthogonal and therefore the $L^2$-norm, for example,
energy or enstrophy, is a superposition of coherent
and incoherent contributions (Mallat 1998).

Assuming that coherent structures are what 
remain after denoising, we need a model, not for 
the structures themselves, but for the noise. As a first 
guess, we choose the simplest model and suppose the 
noise to be additive, Gaussian and white, that is, 
uncorrelated. Having this model in mind, we use 
Donoho and Johnstone’s theorem to compute the 
value to threshold the wavelet coefficients. Since the 
threshold value depends on the variance of the noise, 
which in the case of turbulence is not a priori 
known, we propose a recursive method to estimate 
it from the variance of the weakest wavelet 
coefficients, that is, those whose modulus is below 
the threshold value. 


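Wavelet decomposition We describe the wavelet
algorithm to extract coherent vortices out of
turbulent flows and apply it, as an example, to a 3D
turbulent flow. We consider the vorticity field
$\omega = \nabla \times v$, computed at resolution $N = 2^{3J}$, $N$ being
the number of grid points and $J$ the number of
octaves in each spatial direction. Each vorticity
component is developed into an orthogonal wavelet
series from the largest scale $l_{\max} = 2^0$ to the smallest
scale $l_{\min} = 2^{J-1}$ using a three-dimensional (3D) MRA:

$$\omega(x) = \bar\omega_{0,0,0}\,\phi_{0,0,0}(x) + \sum_{j=0}^{J-1} \sum_{i_x=0}^{2^j-1} \sum_{i_y=0}^{2^j-1} \sum_{i_z=0}^{2^j-1} \sum_{d=1}^{7} \tilde\omega^d_{j,i_x,i_y,i_z}\,\psi^d_{j,i_x,i_y,i_z}(x) \qquad [15]$$

with $\phi_{j,i_x,i_y,i_z}(x) = \phi_{j,i_x}(x)\,\phi_{j,i_y}(y)\,\phi_{j,i_z}(z)$, and

$$\psi^d_{j,i_x,i_y,i_z}(x) = \begin{cases}
\psi_{j,i_x}(x)\,\phi_{j,i_y}(y)\,\phi_{j,i_z}(z) & d=1 \\
\phi_{j,i_x}(x)\,\psi_{j,i_y}(y)\,\phi_{j,i_z}(z) & d=2 \\
\phi_{j,i_x}(x)\,\phi_{j,i_y}(y)\,\psi_{j,i_z}(z) & d=3 \\
\psi_{j,i_x}(x)\,\phi_{j,i_y}(y)\,\psi_{j,i_z}(z) & d=4 \\
\psi_{j,i_x}(x)\,\psi_{j,i_y}(y)\,\phi_{j,i_z}(z) & d=5 \\
\phi_{j,i_x}(x)\,\psi_{j,i_y}(y)\,\psi_{j,i_z}(z) & d=6 \\
\psi_{j,i_x}(x)\,\psi_{j,i_y}(y)\,\psi_{j,i_z}(z) & d=7
\end{cases} \qquad [16]$$

where $\phi_{j,i}$ and $\psi_{j,i}$ are the one-dimensional
scaling function and the corresponding wavelet,
respectively. Due to orthogonality, the scaling coefficients
are given by $\bar\omega_{0,0,0} = \langle \omega, \phi_{0,0,0} \rangle$ and the wavelet
coefficients are given by $\tilde\omega^d_{j,i_x,i_y,i_z} = \langle \omega, \psi^d_{j,i_x,i_y,i_z} \rangle$, where
$\langle \cdot, \cdot \rangle$ denotes the $L^2$-inner product.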


Nonlinear thresholding The vorticity field is then
split into $\omega_C$ and $\omega_I$ by applying a nonlinear thresholding
to the wavelet coefficients. The threshold is defined
as $\epsilon = \left(\frac{4}{3} Z \ln N\right)^{1/2}$. It only depends on the total
enstrophy $Z = \frac{1}{2} \int |\omega|^2\, dx$ and on the number of grid
points $N$ without any adjustable parameter. The choice
of this threshold is based on theorems by Donoho
and Johnstone proving optimality of the wavelet
representation to denoise signals in the presence of
Gaussian white noise, since this wavelet-based
estimator minimizes the maximal $L^2$-error for functions
with inhomogeneous regularity (Mallat 1998).


Wavelet reconstruction The coherent vorticity field
$\omega_C$ is reconstructed from the wavelet coefficients
whose modulus is larger than $\epsilon$ and the incoherent
vorticity field $\omega_I$ from the wavelet coefficients whose
modulus is smaller than or equal to $\epsilon$. The two fields thus
obtained, $\omega_C$ and $\omega_I$, are orthogonal, which ensures
a separation of the total enstrophy into $Z = Z_C + Z_I$
because the interaction term $\langle \omega_C, \omega_I \rangle$ vanishes. We
then use Biot-Savart's relation $v = \nabla \times (\nabla^{-2} \omega)$ to
reconstruct the coherent velocity $v_C$ and the incoherent
velocity $v_I$ from the coherent and incoherent
vorticities, respectively.
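The three steps above (decomposition, thresholding, reconstruction) can be sketched for a one-dimensional scalar signal. This is an illustration under stated assumptions, not the authors' implementation: it uses the Haar wavelet, the scalar form of the Donoho-Johnstone threshold $\epsilon = (2\sigma^2 \ln N)^{1/2}$, and a recursive estimate of the noise variance $\sigma^2$ taken as the mean square of the coefficients currently below the threshold. The names `haar_forward`, `haar_inverse` and `extract_coherent` are hypothetical.

```python
import numpy as np

def haar_forward(signal):
    """Orthonormal Haar transform; returns one flat array of coefficients."""
    approx = np.asarray(signal, dtype=float)
    pieces = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        pieces.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    pieces.append(approx)  # the single remaining scaling coefficient
    return np.concatenate(pieces[::-1])

def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    coeffs = np.asarray(coeffs, dtype=float)
    approx = coeffs[:1]
    pos = 1
    while pos < coeffs.size:
        detail = coeffs[pos:pos + approx.size]
        pos += approx.size
        merged = np.empty(2 * approx.size)
        merged[0::2] = (approx + detail) / np.sqrt(2.0)
        merged[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = merged
    return approx

def extract_coherent(signal, max_iter=50):
    """Split a signal into coherent and incoherent parts by thresholding."""
    coeffs = haar_forward(signal)
    n = coeffs.size
    variance = np.mean(coeffs ** 2)  # first guess: the whole signal is noise
    for _ in range(max_iter):
        eps = np.sqrt(2.0 * variance * np.log(n))
        weak = np.abs(coeffs) <= eps
        new_variance = np.mean(coeffs[weak] ** 2) if weak.any() else 0.0
        if np.isclose(new_variance, variance):
            break
        variance = new_variance
    coherent = haar_inverse(np.where(weak, 0.0, coeffs))
    incoherent = haar_inverse(np.where(weak, coeffs, 0.0))
    return coherent, incoherent, eps

# Two localized bumps plus Gaussian white noise (test data made up here).
rng = np.random.default_rng(1)
n = 1024
clean = np.zeros(n)
clean[200:420] = 4.0
clean[700:760] = -3.0
signal = clean + 0.2 * rng.standard_normal(n)
coherent, incoherent, eps = extract_coherent(signal)
```

Because the transform is orthonormal and the two coefficient sets are disjoint, the squared norms of the coherent and incoherent parts add up to that of the signal, which is the one-dimensional analogue of $Z = Z_C + Z_I$.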


Application to 3D Turbulence 


We consider a 3D homogeneous isotropic turbulent
flow, computed by DNS at resolution $N = 256^3$,
which corresponds to a Reynolds number based
on the Taylor microscale $R_\lambda = 168$ (Farge et al.
2003). The computation uses a pseudospectral
code, with a Gaussian random vorticity field as initial
condition, and the flow evolution is integrated until a
statistically stationary state is reached. Figure 2 shows
the modulus of the vorticity fluctuations of the total
flow, zooming on a $64^3$ subcube to enhance structural
details. The flow exhibits elongated, distorted, and
folded vortex tubes, as observed in laboratory and
numerical experiments.

We apply to the total flow the wavelet compression
algorithm described above. We find that only
2.9% of the wavelet modes correspond to the coherent
flow, which retains 99% of the energy ($L^2$-norm of
velocity) and 79% of the enstrophy ($L^2$-norm of
vorticity), while the remaining 97.1% incoherent
modes contain only 1% of the energy and 21% of
the enstrophy. We display the modulus of the
coherent (Figure 3) and incoherent (Figure 4) vorticity
fluctuations resulting from the wavelet
decomposition.

Note that the values of the three isosurfaces chosen
for visualization ($|\omega| = 6Z^{1/2}$, $8Z^{1/2}$ and $10Z^{1/2}$, with
$Z$ the total enstrophy) are the same for the total and
coherent vorticities, but they have been reduced by a
factor 2 for the incoherent vorticity whose fluctuations
are much smaller. In the coherent vorticity (Figure 3)
we recognize the same vortex tubes as those present in
the total vorticity (Figure 2). In contrast, the remaining
vorticity (Figure 4) is much more homogeneous and
does not exhibit coherent structures. Hence, the
wavelet compression retains all the vortex tubes and
preserves their structure at all scales. Consequently, the
coherent flow is as intermittent as the total flow, while
the incoherent flow is structureless and nonintermittent.
Modeling the effect of the incoherent flow onto
the coherent flow should then be much simpler than
with methods based on Fourier filtering.

Figure 2 Isosurfaces of total vorticity field, for
$|\omega| = 3\sigma, 4\sigma, 5\sigma$ with opacity 1, 0.5, 0.1, respectively, and $\sigma^2$ the
total enstrophy. Simulation with resolution $N = 256^3$ for $R_\lambda = 168$.
Zoom on a subcube $64^3$. Reprinted with permission from Farge
et al. Coherent vortex extraction in three-dimensional homogeneous
turbulence: Comparison between CVS-wavelet and
POD-Fourier decompositions. Physics of Fluids 15(10): 2886-2896.
Copyright 2003, American Institute of Physics.

Figure 3 Isosurfaces of coherent vorticity field, for
$|\omega| = 3\sigma, 4\sigma, 5\sigma$ with opacity 1, 0.5, 0.1, respectively. Simulation
with resolution $N = 256^3$. Zoom on a subcube $64^3$. Reprinted with
permission from Farge et al. (2003). Copyright 2003, American
Institute of Physics.

Figure 4 Isosurfaces of incoherent vorticity field, for
$|\omega| = \frac{3}{2}\sigma, 2\sigma, \frac{5}{2}\sigma$ with opacity 1, 0.5, 0.1, respectively. Simulation
with resolution $N = 256^3$. Zoom on a subcube $64^3$. Reprinted
with permission from Farge et al. (2003). Copyright 2003, American
Institute of Physics.

Figure 5 shows the velocity PDF in semilogarithmic
coordinates. We observe that the coherent velocity has
the same Gaussian distribution as the total velocity,
while the incoherent velocity remains Gaussian, but its
variance is much smaller. The corresponding energy
spectra are plotted in Figure 6. We observe that the
spectrum of the coherent energy is identical to the
spectrum of the total energy all along the inertial
range. This implies that the vortex tubes are responsible
for the $k^{-5/3}$ energy scaling, which corresponds to
a long-range correlation, characteristic of 3D turbulence
as predicted by Kolmogorov's theory. In contrast,
the incoherent energy has a scaling close to $k^2$,
which corresponds to an energy equipartition between
all wave vectors $\mathbf{k}$, since the isotropic spectrum is
obtained by integrating energy in 3D $\mathbf{k}$-space over 2D
shells $k = |\mathbf{k}|$. The incoherent velocity field is therefore
spatially uncorrelated, which is consistent with the
observation that incoherent vorticity is structureless
and homogeneous.

Figure 5 Velocity PDF, resolution $N = 256^3$ with a zoom at
$64^3$. Reprinted with permission from Farge et al. Coherent vortex
extraction in three-dimensional homogeneous turbulence: Comparison
between CVS-wavelet and POD-Fourier decompositions.
Physics of Fluids 15(10): 2886-2896. Copyright 2003,
American Institute of Physics.

Figure 6 Energy spectrum, resolution $N = 256^3$ with a
zoom at $64^3$. Reprinted with permission from Farge et al. (2003).
Copyright 2003, American Institute of Physics.

From these observations, we propose the following 
scenario to interpret the turbulent cascade: the 
coherent energy injected at large scales is transferred 
towards small scales by nonlinear interactions between 
vortex tubes. In the meantime, these nonlinear inter- 
actions also produce incoherent energy at all scales, 
which is dissipated at the smallest scales by molecular 
kinematic viscosity. Thus, the coherent flow causes 
direct transfer of the coherent energy into incoherent 
energy. Conversely, the incoherent flow does not 
trigger any energy transfer to the coherent flow, as it 
is structureless and uncorrelated. We conjecture that 
the coherent flow is dynamically active, while the 
incoherent flow is slaved to it, being only passively 
advected and mixed by the coherent vortex tubes. This 
is a different view from the classical interpretation 
since it does not suppose any scale separation. Both 


coherent and incoherent flows are active all along the 
inertial range, but they are characterized by different 
probability distribution functions and correlations: 
non-Gaussian and long-range correlated for the 
former, while Gaussian and uncorrelated for the latter. 


Wavelet Computation 
Principle 


The mathematical properties of wavelets (see Wavelets:
Mathematical Theory) motivate their use for
solving partial differential equations (PDEs).

The localization of wavelets, both in scale and 
space, leads to effective sparse representations of 
functions and pseudodifferential operators (and their 
inverse) by performing nonlinear thresholding of the 
wavelet coefficients of the function and of the matrices 
representing the operators. Wavelet coefficients make it
possible to estimate the local regularity of solutions of PDEs
and thus can define autoadaptive discretizations with
local mesh refinements. The characterization of function
spaces in terms of wavelet coefficients and the
corresponding norm equivalences lead to diagonal
preconditioning of operators in wavelet space.

Moreover, the existence of the fast wavelet trans- 
form yields algorithms with optimal linear complex- 
ity. The currently existing algorithms can be 
classified in different ways. We can distinguish 
between Galerkin, collocation, and hybrid schemes. 
Hybrid schemes combine classical discretizations, 
for example, finite differences or finite volumes, and 
wavelets, which are only used to speed up the linear 
algebra and to define adaptive grids. On the other 
hand, Galerkin and collocation schemes employ 
wavelets directly for the discretization of the 
solution and the operators. Wavelet methods have
been developed to solve the Burgers, Stokes,
Kuramoto-Sivashinsky, nonlinear Schrödinger, Euler,
and Navier-Stokes equations. As an example, we
present an adaptive wavelet algorithm, of Galerkin 
type, to solve the 2D Navier-Stokes equations. 


Adaptive Wavelet Scheme 


We consider the 2D Navier-Stokes equations written
in terms of vorticity $\omega$ and stream function $\Psi$,
which are both scalars in two dimensions,

$$\partial_t \omega + v \cdot \nabla \omega - \nu \nabla^2 \omega = \nabla \times F \qquad [17]$$

$$\nabla^2 \Psi = \omega \quad \text{and} \quad v = \nabla^{\perp} \Psi \qquad [18]$$

for $x \in [0,1]^2$, $t > 0$. The velocity is denoted by $v$, $F$
is an external force, $\nu > 0$ is the molecular kinematic
viscosity, and $\nabla^{\perp} = (-\partial_y, \partial_x)$.
The above equations are completed with boundary
conditions and a suitable initial condition.


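Time discretization Introducing a classical semi-implicit
time discretization with a time step $\Delta t$ and
setting $\omega^n(x) \approx \omega(x, n\Delta t)$, we obtain

$$(1 - \nu \Delta t \nabla^2)\,\omega^{n+1} = \omega^n + \Delta t\,(\nabla \times F^n - v^n \cdot \nabla \omega^n) \qquad [19]$$

$$\nabla^2 \Psi^{n+1} = \omega^{n+1} \quad \text{and} \quad v^{n+1} = \nabla^{\perp} \Psi^{n+1} \qquad [20]$$

Hence, in each time step two elliptic problems
have to be solved and a differential operator has to
be applied.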

Formally the above equations can be written in
the abstract form $Lu = f$, where $L$ is an elliptic
operator with constant coefficients. This corresponds
to a Helmholtz type equation for $\omega$ with
$L = (1 - \nu \Delta t \nabla^2)$ and a Poisson equation for $\Psi$ with
$L = \nabla^2$.


Spatial discretization For the spatial discretization,
we use the method of weighted residuals, that is, a
Petrov-Galerkin scheme. The trial functions
are orthogonal wavelets $\psi$ and the test functions
are operator-adapted wavelets, called “vaguelettes,”
$\theta$. To solve the elliptic equation $Lu = f$ at time
step $t^{n+1}$, we develop $u^{n+1}$ into an orthogonal
wavelet series, that is, $u^{n+1} = \sum_\lambda \tilde u^{n+1}_\lambda \psi_\lambda$, where
$\lambda = (j, i_x, i_y, d)$ denotes the multi-index for scale $j$,
space $i$, and direction $d$. Requiring that the residual
vanishes with respect to all test functions $\theta_{\lambda'}$, we
obtain a linear system for the unknown wavelet
coefficients $\tilde u^{n+1}_\lambda$ of the solution $u$:

$$\sum_\lambda \tilde u^{n+1}_\lambda \langle L \psi_\lambda, \theta_{\lambda'} \rangle = \langle f, \theta_{\lambda'} \rangle \qquad [21]$$

The test functions $\theta$ are defined such that the
stiffness matrix turns out to be the identity.
Therefore, the solution of $Lu = f$ reduces to a
change of basis, that is, $u^{n+1} = \sum_\lambda \langle f, \theta_\lambda \rangle \psi_\lambda$. The
right-hand side (RHS) $f$ can then be developed into a
biorthogonal operator-adapted wavelet
basis $f = \sum_\lambda \langle f, \theta_\lambda \rangle \mu_\lambda$, with $\theta_\lambda = L^{*-1} \psi_\lambda$ and
$\mu_\lambda = L \psi_\lambda$, $*$ denoting the adjoint operator. By
construction, $\theta$ and $\mu$ are biorthogonal, that is,
such that $\langle \theta_\lambda, \mu_{\lambda'} \rangle = \delta_{\lambda,\lambda'}$. It can be shown that
both have similar localization properties in physical
and Fourier space as $\psi$, and that they form a Riesz
basis.


Adaptive discretization To get an adaptive space
discretization for the linear problem $Lu = f$, we
consider only the significant wavelet coefficients of
the solution. Hence, we only retain coefficients $\tilde u^n_\lambda$
whose modulus is larger than a given threshold $\epsilon$,
that is, $|\tilde u^n_\lambda| > \epsilon$. The corresponding coefficients
are shown in Figure 7 (white area under the solid
line curve).

Figure 7 Illustration of the dynamic adaption strategy in
wavelet coefficient space.


Adaption strategy To be able to integrate the
equation in time we have to account for the
evolution of the solution in wavelet coefficient
space (indicated by the arrow in Figure 7). Therefore,
we add at time step $t^n$ the neighbors to the
retained coefficients, which constitute a security
zone (gray area in Figure 7). The equation is then
solved in this enlarged coefficient set (white and
gray areas below the curves in Figure 7) to obtain
$u^{n+1}$. Subsequently, we threshold the coefficients
and retain only those whose modulus satisfies $|\tilde u^{n+1}_\lambda| > \epsilon$
(coefficients under the dashed curve in Figure 7).
This strategy is applied in each time step and hence
makes it possible to track automatically the evolution of the
solution in both scale and space.
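The bookkeeping of this strategy can be sketched in one dimension. The text only says that "the neighbors" of the retained coefficients are added. The particular choice below (left and right neighbors at the same scale, the two children at the next finer scale, and the parent at the next coarser scale, on a periodic dyadic grid) is an assumption made for illustration, and the names `add_security_zone` and `threshold` are hypothetical.

```python
def add_security_zone(active, max_level):
    """Return the active 1D wavelet indices (j, i) plus their neighbors."""
    enlarged = set(active)
    for j, i in active:
        n_pos = 2 ** j  # number of positions at scale j
        # neighbors in position at the same scale (periodic wrap-around)
        enlarged.add((j, (i - 1) % n_pos))
        enlarged.add((j, (i + 1) % n_pos))
        # the two children at the next finer scale
        if j + 1 <= max_level:
            enlarged.add((j + 1, 2 * i))
            enlarged.add((j + 1, 2 * i + 1))
        # the parent at the next coarser scale
        if j > 0:
            enlarged.add((j - 1, i // 2))
    return enlarged

def threshold(coeffs, eps):
    """Keep only the coefficients whose modulus is larger than eps."""
    return {index: c for index, c in coeffs.items() if abs(c) > eps}

# One adaption cycle on made-up coefficients indexed by (scale, position).
coeffs_n = {(2, 1): 0.8, (3, 2): 0.05, (3, 3): 0.4}
retained = threshold(coeffs_n, eps=0.1)
enlarged = add_security_zone(retained.keys(), max_level=5)
```

In a full solver the elliptic problem would be solved on `enlarged`, and `threshold` would then be applied again to the new coefficients before the next time step.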


Evaluation of the nonlinear term For the
evaluation of the nonlinear term $f(u^n)$, where the
wavelet coefficients $\tilde u^n$ are given, there are two
possibilities:

• Evaluation in wavelet coefficient space. As an
illustration, we consider a quadratic nonlinear
term, $f(u) = u^2$. The wavelet coefficients of $f$ can
be calculated using the connection coefficients,
that is, one has to calculate the bilinear expression
$\sum_{\lambda} \sum_{\lambda'} \tilde u_{\lambda}\, T_{\lambda \lambda' \lambda''}\, \tilde u_{\lambda'}$ with the interaction
tensor $T_{\lambda \lambda' \lambda''} = \langle \psi_{\lambda} \psi_{\lambda'}, \theta_{\lambda''} \rangle$. Although many coefficients
of $T$ are zero or very small, the size of $T$
leads to a computation which is quite intractable
in practice.

• Evaluation in physical space. This approach is
similar to the pseudospectral evaluation of the
nonlinear terms used in spectral methods; therefore
it is called the pseudowavelet technique. The
advantage of this scheme is that general nonlinear
terms, for example, $f(u) = (1 - u)\,e^{-C/u}$, can be
treated more easily. The method can be summarized
as follows: starting from the significant
wavelet coefficients, $|\tilde u_\lambda| > \epsilon$, one reconstructs $u$
on a locally refined grid and gets $u(x_\lambda)$. Then one
can evaluate $f(u(x_\lambda))$ pointwise and the wavelet
coefficients $\tilde f_\lambda$ are calculated using the adaptive
decomposition.


Finally, one computes the scalar products of the
RHS of [21] with the test functions $\theta$ to advance the
solution in time. We compute $\tilde u_\lambda = \langle f, \theta_\lambda \rangle$ belonging
to the enlarged coefficient set (white and gray
regions in Figure 7).

The algorithm is of O(N) complexity, where N 
denotes the number of wavelet coefficients retained 
in the computation. 


Application to 2D Turbulence 


To illustrate the above algorithm we present an
adaptive wavelet computation of a vortex dipole in
a square domain, impinging on a no-slip wall at
Reynolds number Re = 1000. To take into account
the solid wall, we use a volume penalization
method, for which both the fluid flow and the
solid container are modeled as a porous medium
whose porosity tends towards infinity in the fluid and
towards zero in the solid region.

The 2D Navier-Stokes equations are thus modified
by adding the forcing term $F = -(1/\eta)\,\chi\, v$
in eqn [17], where $\eta$ is the penalization parameter
and $\chi$ is the characteristic function whose value is 1
in the solid region and 0 elsewhere. The equations
are solved using the adaptive wavelet method in
a periodic square domain of size 1.1, in which
the square container of size 1 is embedded,
taking $\eta = 10^{-3}$. The maximal resolution corresponds
to a fine grid of $1024^2$ points. Figure 8a
shows snapshots of the vorticity field at times
t = 0.2, 0.4, 0.6, and 0.8 (in arbitrary units). We
observe that the vortex dipole is moving towards
the wall and that strong vorticity gradients are
produced when the dipole hits the wall. The
computational grid is dynamically adapted during
the flow evolution, since the nonlinear wavelet filter
automatically refines the grid in regions where
strong gradients develop. Figure 8b shows the
centers of the retained wavelet coefficients at
corresponding times.

Note that during the computation only 5% out of
$1024^2$ wavelet coefficients are used. The time
evolution of the total kinetic energy and the total
enstrophy $Z = \frac{1}{2} \int |\omega|^2\, dx$ are plotted in Figure 9 to
show the production of enstrophy and the concomitant
dissipation of energy when the vortex dipole
hits the wall.

Figure 8 Dipole wall interaction at Re = 1000. (a) Vorticity field, (b) corresponding centers of the active wavelets, at t = 0.2, 0.4, 0.6,
and 0.8 (from top to bottom).

This computation illustrates the fact that the 
adaptive wavelet method allows an automatic grid 
refinement, both in the boundary layers at the 
wall and also in shear layers which develop during 
the flow evolution far from the wall. As a result,
the number of grid points necessary for the
computation is significantly reduced, and we conjecture
that the resulting compression rate will
increase with the Reynolds number.


Figure 9 Time evolution of energy (solid line) and enstrophy
(dashed line).


Introduction


Wavelet analysis was first developed in the early 
1980s in the field of seismic signal analysis in the 
form of an integral transform with a localized kernel 
function with continuous parameters of dilation and 
translation. When a seismic wave or its derivative
has a singular point, the integral transform has a
scaling property with respect to the dilation parameter;
thus, this scaling behavior can be used to
locate the singular point. In the mid-1980s, the
orthonormal smooth wavelet was first constructed, 
and later the construction method was generalized 
and reformulated as multiresolution analysis 
(MRA). Since then, several kinds of wavelets have 
been proposed for various purposes, and the concept 
of wavelet has been extended to new types of basis 
functions. In this sense, the most important effect of 
wavelets may be that they have awakened deep 
interest in bases employed in data analysis and data 
processing. Wavelets are now widely used in various 
fields of research; some of their applications are 
discussed in this article.

http://wavelets.ens.fr: other papers about wavelets and turbulence
can be downloaded from this site.


From the perspective of time-frequency analysis, 
the wavelet analysis may be regarded as a windowed 
Fourier analysis with a variable window width, 
narrower for higher frequency. The wavelets can 
therefore give information on the local frequency 
structure of an event; they have been applied to 
various kinds of one-dimensional (1D) or multi- 
dimensional signals, for example, to identify an 
event or to denoise or to sharpen the signal. 

1D wavelets $\psi^{(a,b)}(x)$ are defined as

$$\psi^{(a,b)}(x) = \frac{1}{\sqrt{|a|}}\, \psi\!\left(\frac{x - b}{a}\right)$$

where $a\,(\neq 0)$, $b$ are real parameters and $\psi(x)$ is a
spatially localized function called “analyzing wavelet”
or “mother wavelet.” Wavelet analysis gives a
decomposition of a function into a linear combination
of those wavelets, where a perfect reconstruction
requires the analyzing wavelet to satisfy some
mathematical conditions.

For the continuous wavelet transform (CWT),
where the parameters $(a, b)$ are continuous, the
analyzing wavelet $\psi(x) \in L^2(\mathbb{R})$ has to satisfy the
admissibility condition

$$C_\psi = \int_{-\infty}^{\infty} \frac{|\hat\psi(\omega)|^2}{|\omega|}\, d\omega < \infty$$

where $\hat\psi(\omega)$ is the Fourier transform of $\psi(x)$:

$$\hat\psi(\omega) = \int_{-\infty}^{\infty} e^{-i\omega x}\, \psi(x)\, dx$$

The admissibility condition is known to be equivalent
to the condition that $\psi(x)$ has no zero-frequency
component, that is, $\hat\psi(0) = 0$, under some mild
condition for the decay rate at infinity. Then the
CWT and its inverse transform of a data function
$f(x) \in L^2(\mathbb{R})$ are defined as

$$T_\psi(a, b) = \int_{-\infty}^{\infty} \overline{\psi^{(a,b)}(x)}\, f(x)\, dx, \qquad f(x) = \frac{1}{C_\psi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} T_\psi(a, b)\, \psi^{(a,b)}(x)\, \frac{da\, db}{a^2}$$

In the case of the discrete wavelet transform
(DWT), the parameters $(a, b)$ are taken discrete; a
typical choice is $a = 1/2^j$, $b = k/2^j$, where $j$ and $k$ are
integers:

$$\psi_{j,k}(x) = 2^{j/2}\, \psi(2^j x - k)$$

In order that the wavelets $\{\psi_{j,k}(x) \mid j, k \in \mathbb{Z}\}$ may
constitute a complete orthonormal system in $L^2(\mathbb{R})$,
the analyzing wavelet should satisfy more stringent
conditions than the admissibility condition for the
CWT, and is now constructed in the framework of
MRA. A data function is then decomposed by the
DWT as

$$f(x) = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} \alpha_{j,k}\, \psi_{j,k}(x), \qquad \alpha_{j,k} = \int_{-\infty}^{\infty} \overline{\psi_{j,k}(x)}\, f(x)\, dx$$
Even when the discrete wavelets do not constitute
a complete orthonormal system, they often form a
wavelet frame if linear combinations of the wavelets
are dense in $L^2(\mathbb{R})$ and if there are two positive constants $A$,
$B$ such that the inequality

$$A \|f\|^2 \le \sum_{j,k} |\langle \psi_{j,k}, f \rangle|^2 \le B \|f\|^2$$

holds for an arbitrary $f(x) \in L^2(\mathbb{R})$. For the wavelet
frame $\{\psi_{j,k}\}$, there is a corresponding dual frame,
$\{\tilde\psi_{j,k}\}$, which permits the following expansion of $f(x)$:

$$f(x) = \sum_{j,k} \langle \tilde\psi_{j,k}, f \rangle\, \psi_{j,k}(x) = \sum_{j,k} \langle \psi_{j,k}, f \rangle\, \tilde\psi_{j,k}(x)$$

The wavelet frame is also employed in several
applications.

From the viewpoint of applications, the CWTs are
better adapted to the analysis of data functions,
including the detection of singularities and patterns,
while the DWTs are adapted to data processing,
including signal compression or denoising.


Singularity Detection and Multifractal 
Analysis of Functions 


Since its birth, wavelet analysis has been applied
to the detection of singularities of a data function.
The Hölder exponent $h(x_0)$ at $x_0$ of a
function $f(x)$ is defined here as the largest value of
the exponent $h$ such that there exists a polynomial
$P_n(x)$ of degree $n$ that satisfies for $x$ in the
neighborhood of $x_0$:

$$|f(x) - P_n(x - x_0)| = O(|x - x_0|^h)$$

The data function is not differentiable if $h(x_0) < 1$,
but if $h(x_0) > 1$ then it is differentiable and a
singularity may arise in its higher derivatives. The
wavelet transform is applied to find the Hölder
exponent $h(x_0)$, because $T_\psi(a, b)$ has an asymptotic
behavior $T_\psi(a, b) = O(a^{h(x_0)+1/2})$ $(a \to 0)$ if the analyzing
wavelet has $N\,(> h(x_0))$ vanishing moments,
that is,

$$\int_{-\infty}^{\infty} x^m\, \psi(x)\, dx = 0, \qquad m \in \mathbb{Z},\ 0 \le m < N$$

A commonly used analyzing wavelet for this purpose
is the $N$-th derivative of the Gaussian
function, $\psi(x) = d^N(e^{-x^2/2})/dx^N$. This method works
well to examine a single or some finite number of
singular points of the data function.
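The scaling law $T_\psi(a, b) = O(a^{h(x_0)+1/2})$ can be checked numerically on a function with a single known singularity. The sketch below is an illustration under stated assumptions: the test function $f(x) = |x|^{1/2}$ (Hölder exponent 1/2 at $x_0 = 0$), the Mexican hat wavelet (the second derivative of the Gaussian up to sign, hence $N = 2$ vanishing moments), the grid and the range of scales are all choices made here, and the function names are hypothetical.

```python
import numpy as np

def mexican_hat(u):
    """Second derivative of a Gaussian up to sign; two vanishing moments."""
    return (1.0 - u ** 2) * np.exp(-0.5 * u ** 2)

def cwt_at(f_values, x, a, b):
    """T(a, b) = a**-0.5 * integral of psi((x - b) / a) f(x) dx (Riemann sum)."""
    dx = x[1] - x[0]
    return np.sum(mexican_hat((x - b) / a) * f_values) * dx / np.sqrt(a)

h = 0.5                                   # Hölder exponent of |x|**h at x0 = 0
x = np.linspace(-1.0, 1.0, 200001)
f_values = np.abs(x) ** h
scales = np.logspace(-2, -1, 10)          # small scales, well resolved by the grid
transform = np.array([cwt_at(f_values, x, a, 0.0) for a in scales])

# Slope of log|T| against log a estimates h + 1/2.
slope = np.polyfit(np.log(scales), np.log(np.abs(transform)), 1)[0]
estimated_h = slope - 0.5
```

The wavelet is real here, so the complex conjugate in the definition of the CWT plays no role. For experimental data the fit would be restricted to the range of scales over which the log-log plot is actually straight.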

When the data function is a multifractal function 
with an infinite number of singular points of various
strengths, the multifractal property of the data 
function is often characterized by the singularity 
spectrum D(h) which denotes the Hausdorff dimen- 
sion of the set of points where h(x)=h. The 
singularity spectrum is, however, difficult to obtain 
directly from the CWT, and the Legendre transfor- 
mation is introduced to bypass the difficulty. 

Fully developed 3D fluid turbulence may be a
typical example of wavelet application to
singularity detection. The Kolmogorov similarity
law of fluid turbulence for the longitudinal velocity
increment $\Delta u(r) = e \cdot (u(x + re) - u(x))$, where $u(x)$
is the velocity field and $e$ is a constant unit vector,
predicts a scaling property of the structure function;
for $r$ in the inertial subrange,

$$\langle (\Delta u(r))^p \rangle \sim r^{\zeta_p}, \qquad \zeta_p = p/3$$

where $\langle \cdot \rangle$ denotes the statistical mean. In reality,
however, the scaling exponent $\zeta_p$ measured in
experiments shows a systematic deviation from $p/3$,
which is considered to be a reflection of intermittency,
namely the spatial nonuniformity or multifractal
property of active vortical motions in
turbulence. For simplicity, let us consider the
velocity field on a linear section of the turbulence
field. According to the multifractal formalism, the
turbulence velocity field has singularities of various
strengths described by the singularity spectrum
$D(h)$, which is related to the scaling exponent $\zeta_p$
through the Legendre transform, $D(h) = \inf_p (ph - \zeta_p + 1)$.
This relation is often used to determine $D(h)$
from the knowledge of $\zeta_p$ (structure function
method). However, this method does not necessarily
work well because, for example, it does not capture
the singular points of the Hölder exponent larger
than 1 and it is unstable for $h < 0$.
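The Legendre transform of the structure function method is easy to evaluate numerically once $\zeta_p$ is known on a set of orders. The sketch below is an illustration, not a data analysis: it uses the Kolmogorov exponents $\zeta_p = p/3$ quoted above and takes the infimum over a finite, symmetric range of orders chosen here for convenience. The names `singularity_spectrum` and `k41_zeta` are hypothetical.

```python
def singularity_spectrum(h, zeta, orders):
    """D(h) = inf_p (p*h - zeta(p) + 1), with the infimum over the given orders."""
    return min(p * h - zeta(p) + 1.0 for p in orders)

def k41_zeta(p):
    """Kolmogorov similarity law: zeta_p = p / 3."""
    return p / 3.0

orders = [0.5 * k for k in range(-20, 21)]   # p from -10 to 10 in steps of 0.5

d_at_third = singularity_spectrum(1.0 / 3.0, k41_zeta, orders)
d_at_half = singularity_spectrum(0.5, k41_zeta, orders)
```

With $\zeta_p = p/3$ the expression $ph - \zeta_p + 1$ is constant and equal to 1 at $h = 1/3$, while for any other $h$ it decreases without bound as the range of orders is enlarged: only the single exponent $h = 1/3$ survives. A measured, nonlinear $\zeta_p$ gives a spectrum supported on a range of $h$ instead.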

These difficulties are not restricted to
turbulence research, but arise commonly when the
structure function is employed to determine the
singularity spectrum. In these problems, the CWT
$T_\psi(a, b)$ provides an alternative method. An ingenious
technique is to take only the modulus maxima
of $T_\psi(a, b)$ (for each fixed $a$) to construct a
partition function

$$Z(a, q) = \sum_{l \in L_{\max}} \left( \sup_{(a, b') \in l} |T_\psi(a, b')| \right)^q$$

where $q \in \mathbb{R}$, and $L_{\max}$ denotes the set of all maxima
lines, each of which is a continuous curve for small
values of $a$, and there exists at least one maxima line
toward a singular point of the Hölder exponent
$h(x_0) < N$. In the limit of $a \to 0$, defining the
exponent $\tau(q)$ as $Z(a, q) \sim a^{\tau(q)}$, one can obtain the
singularity spectrum through the Legendre
transform:

$$D(h) = \inf_q \left[ q\left(h + \tfrac{1}{2}\right) - \tau(q) \right]$$

This method (wavelet-transform modulus-maxima
(WTMM) method) is advantageous in that it works
also for singularities of $h > 1$ and $h < 0$. Several
simple examples of multifractal functions have been
successfully analyzed by this method. For fluid
turbulence, this method gives a singularity spectrum
$D(h)$ which has a peak value of $\approx 1$ at $h \approx 1/3$,
consistent with the Kolmogorov similarity law, but
has a convex shape around $h = 1/3$ suggesting a
multifractal property. For a fractal signal, we note
that the WTMM method reveals the hierarchical
organization of the singularities, in the branching
structure of the WT skeleton defined by the
arrangement of the maxima lines in the $(a, b)$ half-plane.

Though the above discussion also applies to the 
DWT, the detection of the Hölder exponent $h$ in
experimental situations is usually performed by the 
CWT, which has no restriction on possible values of 
a, while the DWT is often employed for theoretical 
discussions of singularity and multifractal structure 
of a function. 


Multiscale Analysis 


Wavelet transform expands a data function in the 
time-frequency or the position-wavenumber space, 
which has twice the dimension of the original signal, 
and makes it easier to perform a multiscale analysis 
and to identify events involved in the signal. In the
wavelet transform, as stated above, the time resolution
is higher at higher frequency, in contrast with
the windowed Fourier transform where the time and
the frequency resolutions are independent of frequency.
Another advantage of wavelets is the wide
variety of analyzing wavelets, which enables us to
optimize the wavelet according to the purpose of
data analysis. Both the CWT and the DWT are
available for these time-frequency or position-wavenumber
analyses. However, the CWT has
properties quite different from those of familiar
orthonormal bases of discrete wavelets.


Multidimensional CWT 


The CWT can be formulated in an abstract way. We
can regard $G = \{(a, b) \mid a\,(\neq 0),\ b \in \mathbb{R}\}$ as an affine
group on $\mathbb{R}$ with the group operation
$(a, b)(a', b') = (aa', ab' + b)$ associated with the
invariant measure $d\mu = da\, db / a^2$. The group $G$ has
its unitary representation in the Hilbert space
$H = L^2(\mathbb{R})$:

$$(U(a, b) f)(x) = \frac{1}{\sqrt{|a|}}\, f\!\left(\frac{x - b}{a}\right)$$

and then the CWT can be constructed
as a linear map $W$ from $L^2(\mathbb{R})$ to $L^2(G;\, da\, db / a^2)$:

$$W : f(x) \mapsto T_\psi(a, b) = \frac{1}{\sqrt{C_\psi}}\, \langle U(a, b)\psi, f \rangle$$

where $\langle \cdot, \cdot \rangle$ is the inner product of $L^2(\mathbb{R})$ with the
complex conjugate taken at the first element, and
$\psi(x)$ is a unit vector (analyzing wavelet) satisfying
the abstract admissibility condition

$$C_\psi = \int_G |\langle U(a, b)\psi, \psi \rangle|^2\, d\mu < \infty$$

This formulation is applicable also to a locally
compact group $G$ and its unitary and square
integrable representation in a Hilbert space $H$.
Note that even the canonical coherent states are
included in this framework by taking the Weyl-Heisenberg
group and $L^2(\mathbb{R})$ for $G$ and $H$,
respectively. This abstract formulation allows us
to extend the CWT to higher-dimensional Euclidean
spaces and other manifolds: for example, the 2D
sphere $S^2$ for geophysical applications and the 4D
manifold of spacetime taking the Poincaré group
into consideration.

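In $\mathbb{R}^n$, the CWT of $f(x) \in L^2(\mathbb{R}^n)$ and its inverse
transform are given by

$$T_\psi(a, r, b) = \frac{1}{\sqrt{C_\psi}} \int_{\mathbb{R}^n} \overline{\psi^{(a,r,b)}(x)}\, f(x)\, dx$$

$$f(x) = \frac{1}{\sqrt{C_\psi}} \int T_\psi(a, r, b)\, \psi^{(a,r,b)}(x)\, \frac{da\, dr\, db}{a^{n+1}}$$

where $r \in SO(n)$, $b \in \mathbb{R}^n$, dr is the normalized
invariant measure of $SO(n)$, and the wavelets are
defined as $\psi^{(a,r,b)}(x) = (1/a^{n/2})\,\psi(r^{-1}(x - b)/a)$, with
the analyzing wavelet satisfying the admissibility
condition

$$C_\psi = \int_{\mathbb{R}^n} \frac{|\hat\psi(\omega)|^2}{|\omega|^n}\, d\omega < \infty$$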

Note that these wavelets are constructed not only
by dilation and translation but also by rotation,
which therefore gives the possibility of directional
pattern detection in a data function. In the case of
the 2D sphere $S^2$, on the other hand, the dilation
operation should be reinterpreted in such a way
that at the North Pole, for example, it is the normal
dilation in the tangent plane followed by lifting it
to $S^2$ by the stereographic projection from the
South Pole.

Generally, the abstract map W thus defined is
injective and therefore invertible on its range, but not
surjective, in contrast with the Fourier case. Actually,
in the case of the 1D CWT, $T_\psi(a, b)$ is subject to an
integral condition:

$$T_\psi(a, b) = \iint \frac{da'\, db'}{a'^2}\, K(a, b; a', b')\, T_\psi(a', b')$$

$$K(a, b; a', b') = \frac{1}{C_\psi} \int \overline{\psi^{(a,b)}(x)}\, \psi^{(a',b')}(x)\, dx$$

which defines the range of the CWT, a subspace
of $L^2(\mathbb{R}^2; da\,db/a^2)$. Therefore, if one wants to modify
$T_\psi(a, b)$ by, for example, setting its value to zero in
some parameter region just as in a filter process, care
should be taken that the resultant $T_\psi(a, b)$ is in the
image of the CWT. The reason may be understood
intuitively by noticing that the wavelets $\psi^{(a,b)}(x)$ are
linearly dependent on each other. The expression of
a data function by a linear combination of the
wavelets is therefore not unique, and thus is
redundant. The CWT gives the coefficients $T_\psi(a, b)$ of
least norm in $L^2(\mathbb{R}^2; da\,db/a^2)$. In physical
interpretations of the CWT, however, this nonuniqueness
is often ignored.


Pattern Detection 


Edge detection The edges of an object are often the
most important components for pattern detection.
An edge may be considered to consist of points of
sharp transition of image intensity. At an edge, the
modulus of the gradient of the image f(x, y) is
expected to take a local maximum in the 1D
direction perpendicular to the edge. Therefore, the
local maxima of $|\nabla f(x, y)|$ may serve as an indicator of
the edge. However, image textures can also give
similar sharp transitions of f(x, y), and one should
take into account the scale dependence, which
distinguishes between edges and textures. One
practical way to do this is to
use dyadic wavelets $\psi_j^m(x, y) = 2^j \psi^m(2^j x, 2^j y)$, which
are generated from the two wavelets
$(\psi^1, \psi^2) = (-\partial\theta/\partial x, -\partial\theta/\partial y)$, where $\theta$ is a localized
function (multiscale edge detection method). The dyadic
wavelet transform of the image f(x, y),

$$T_j^m(b_1, b_2) = \langle f(x, y), \psi_j^m(x - b_1, y - b_2) \rangle, \quad m = 1, 2$$

defines the multiscale edges as the set of points
$b = (b_1, b_2)$ where the modulus of the wavelet
transform, $|(T_j^1, T_j^2)|$, takes a locally maximum value
(WTMM) in a 1D neighborhood of b in the
direction of $(T_j^1(b), T_j^2(b))$. The scale dependence of
the magnitude of the modulus maxima is related to
the Hölder exponent of f(x, y), as in the 1D case,
and thus gives information to distinguish between
edges and textures.

Inversely, the WTMM information
$b_{j,p} = \{(b_{1,j,p}, b_{2,j,p})\}$ of the multiscale edges can be
used for an approximate reconstruction of the original
image, although perfect reconstruction cannot be
expected because of the noncompleteness of the
maxima wavelets. Assuming that
$\{\psi^1_{j,p}, \psi^2_{j,p}\} = \{\psi^1_j(x - b_{j,p}), \psi^2_j(x - b_{j,p})\}$ constitutes a
frame of the closed space generated by these wavelets,
an approximate image $\tilde f$ is obtained by
inverting the relation

$$L\tilde f = \sum_m \sum_{j,p} \langle \tilde f, \psi^m_{j,p} \rangle\, \psi^m_{j,p} = \sum_m \sum_{j,p} T_j^m(b_{j,p})\, \psi^m_{j,p}$$

using, for example, a conjugate gradient algorithm,
where a fast calculation is possible with a filter bank
algorithm for the dyadic wavelet (“algorithme à
trous”). This algorithm gives only the solution of
minimum norm among all possible solutions, but it
is often satisfactory for practical purposes and thus
is applicable also to data compression.


Directional detection To detect oriented features such as
segments or edges in images, a directionally selective
wavelet for the CWT is desired.
A useful wavelet for this purpose is one that has the
effective support of its Fourier transform in a convex
cone with apex at the origin in wave number space. A
typical example of a directional wavelet is the
2D Morlet wavelet:

$$\psi(x) = \exp(i k_0 \cdot x)\, \exp(-|Ax|^2)$$

where $k_0$ is the center of the support in Fourier
space, and A is the 2 x 2 matrix $\mathrm{diag}[\epsilon^{1/2}, 1]$ ($\epsilon < 1$),
where the admissibility condition for the CWT is
approximately satisfied for $|k_0| > 5$. Another example
is the Cauchy wavelet, which has its support
strictly in a convex cone in wave number space.

These wavelets have the directional selectivity 
with preference to a slender object in a specific 
direction. One of their applications is the analysis of
the velocity field of fluid motion from experimental
data, where many tiny plastic balls distributed in
the fluid give a lot of line segments in a picture
taken with a short exposure. The directional wavelet
analysis of the picture classifies the line segments 
according to their directions, indicating the direc- 
tions of fluid velocity. Another example may be a 
wave-field analysis where many waves in different 
directions are superimposed; the directional wavelets 
allow one to decompose the wave field into the 
component waves. Directional wavelets have also 
been applied successfully to detect symmetry of 
objects such as crystals or quasicrystals. 


Denoising and separation of signals The wavelet
frame as well as the CWT give a redundant
representation of a data function. If, instead of the
original data, the redundant expression is transmitted,
the redundancy can be used to reduce the noise
included in the received data, because the redundancy
requires the data to belong to a subspace, and
the projection of the received data onto this subspace
reduces the noise component orthogonal to it. More
specifically, the wavelet frame gives a representation
of a data function as $f(x) = \sum_{j,k} \alpha_{j,k}\, \tilde\psi_{j,k}(x)$, with
$\tilde\psi_{j,k}$ the dual frame, where the
expansion coefficients $\alpha_{j,k} = \langle \psi_{j,k}, f \rangle$ satisfy the
defining equation of the subspace

$$\alpha_{j,k} = \sum_{j',k'} \alpha_{j',k'}\, \langle \psi_{j,k}, \tilde\psi_{j',k'} \rangle$$

If the frame coefficients are transmitted, the projection
operator P, which is defined by the right-hand
side of the above equation, reduces the noise in the
received coefficients $\alpha_{j,k}$ contaminated during the
transmission.

However, this method is not applicable if the 
transmitted signal is not redundant. Then some 
a priori criterion is necessary to discriminate between 
signal and noise. Various criteria have been pro- 
posed in different fields. If the signal and the noise, 
or plural signals have different power-law forms of 
spectra, then their discrimination may be possible by 
the DWT at higher-frequency region where the 
difference in the magnitude of the coefficients is 
significant. In this approach, the wavelets of Meyer 
type, that is, an orthogonal wavelet with a compact 
support in Fourier space, may be preferable because 
the wavelets of different scales are separated, at least 
to some extent, in Fourier space. 

In fluid dynamics, the vorticity field of 2D 
turbulence is found to be decomposed into coherent 
and incoherent vorticity fields, according as the 
CWT is larger than a threshold value or not, 
respectively. These two fields give different Fourier 
spectra of the velocity field (k-° for coherent part 
while k? for incoherent part), showing that the 
coherent structures are responsible for the deviation 
from $k^{-3}$ predicted by the classical enstrophy
cascade theory. In an astronomical application, on 
the other hand, the data processing is performed by 
a more sophisticated method taking into account 
interscale relation in the wavelet transform, because 
an astronomical image contains various kinds 
of objects, including stars, double-stars, galaxies, 
nebulas, and clusters. In a medical image, however,
contrast analysis is indispensable for diagnostic
imaging to obtain a clear, detailed picture of organic
structure. A scale-dependent local contrast is defined 
as the ratio of the CWT to that given by an 
analyzing wavelet with a larger support. A multi- 
plicative scheme to improve the contrast is con- 
structed by using the local contrast. 


Signal Compression 


Signal compression is quite an important technology 
in digital communication. Speech, audio, image, and 
digital video are all important fields of signal 


compression, and plenty of compression methods 
have been put to practical use, but we mention here 
only a few. 

The MRA for orthogonal wavelets gives a
successive procedure to decompose a subspace of
$L^2(\mathbb{R})$ into a direct sum of two subspaces
corresponding to higher- and lower-frequency parts, only
the latter of which is decomposed again into its
higher- and lower-frequency parts. Algebraically,
this procedure was already known, before the
discovery of MRA, in filter theory in electrical
engineering, where a discretely sampled signal is
convolved with a filter series to give, for example, a
high-pass-filtered or low-pass-filtered series. An
appropriately designed pair of high-pass and
low-pass filters followed by downsampling
yields two new series corresponding to the higher-
and lower-frequency parts, respectively, which can
then be inverted by another two reconstruction filters
with upsampling. These four filters, which are
often employed in the widely used technique of
“sub-band coding,” then constitute a perfect
reconstruction filter bank. Under some conditions, successive
applications of this decomposition process to the
series of lower-frequency parts, which is equivalent
to the nesting structure of MRA, have been used for
data compression (quadrature mirror filter). A
famous example is the FBI's data compression system
for fingerprints, consisting of wavelet coding
with scalar quantization.

In MRA, however, it is only the lower-frequency 
parts that are successively decomposed. If both the 
lower- and the higher-frequency parts are repeatedly 
decomposed by the decomposition filters, then the 
successive convolution processes correspond to a 
decomposition of data function by a set of wavelet- 
like functions, called “wavelet packet,” where there 
are choices whether to decompose the higher- and/or 
the lower-frequency parts. The best wavelet packet, in
the sense of the entropy, for example, within a
specified number of decompositions, often provides
a powerful tool for data compression in several
areas, including speech analysis and image analysis.
We also note that, from the viewpoint of the best basis,
which minimizes the statistical mean square error of
the thresholded coefficients, an orthonormal wavelet
basis gives a good concentration of the energy if the
original signal is a piecewise smooth function with
superimposed white noise, which is thus efficiently
removed by thresholding the coefficients. The
efficiency of a wavelet expansion of a signal is sometimes
evaluated with the entropy of the “probability” defined as
$|\alpha_{j,k}|^2/\|f\|^2$. A better wavelet can be selected by
reducing the entropy, in practice from among some
set of wavelets, and its restricted expansion coefficients
give a compressed signal. One of the systematic
methods to generate such a suitable basis is also to
employ the wavelet packets.
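
The entropy criterion described above can be sketched numerically. The example below is only an illustration under stated assumptions: it uses the orthonormal Haar basis and a piecewise constant test signal (both chosen here for simplicity, not taken from the text), and the function names `haar_transform` and `expansion_entropy` are hypothetical. It compares the entropy of the normalized squared coefficients in the Haar basis with that of the raw samples.

```python
import math


def haar_transform(samples):
    """Full orthonormal Haar decomposition; len(samples) must be a power of two.

    Returns [coarsest average, coarsest detail, ..., finest details].
    """
    coeffs = []
    approx = list(samples)
    root2 = math.sqrt(2.0)
    while len(approx) > 1:
        half = len(approx) // 2
        details = [(approx[2 * i] - approx[2 * i + 1]) / root2 for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / root2 for i in range(half)]
        coeffs = details + coeffs  # keep coarser details in front
    return approx + coeffs


def expansion_entropy(coeffs):
    """Shannon entropy of the 'probabilities' |alpha|^2 / ||f||^2."""
    energy = sum(c * c for c in coeffs)
    probs = [c * c / energy for c in coeffs if c != 0.0]
    return -sum(p * math.log(p) for p in probs)


# Illustrative signal (an assumption): one jump in the middle of 16 samples.
step = [1.0] * 8 + [3.0] * 8
haar_coeffs = haar_transform(step)
entropy_in_haar_basis = expansion_entropy(haar_coeffs)
entropy_in_sample_basis = expansion_entropy(step)
```

For this signal the energy is concentrated in very few Haar coefficients, so the entropy in the Haar basis is lower than in the sample basis; a basis search (for instance over wavelet packets) would repeat this comparison over a larger family of candidate bases.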


Numerical Calculation 


The application of the wavelet transform, especially of
the DWT, to numerical solvers for differential equations
(DEs) has long been studied. At first sight,
wavelets appear to give a good DE solver because
the wavelet expansion is generally quite efficient
compared to Fourier series owing to its spatial
localization. But its implementation in efficient
computer code is not so straightforward; research is
still continuing for concrete problems. The application
of the CWT to spectral methods for partial
differential equations (PDEs) has been studied extensively.
There is no wavelet which diagonalizes the
differential operator $\partial/\partial x$; therefore, an efficient
numerical method is necessary for derivatives of wavelets.
Products of wavelets also pose a numerical
problem. MRA provides mesh points which are
adaptive to some extent, but the finite element method
still gives more flexible mesh points.

For some scaling-invariant differential or integral
operators, including $\partial^2/\partial x^2$, Abel transformations,
and the Riesz potential, adaptive biorthogonal wavelets
can be provided with block-diagonal Galerkin
representations, which have been applied to data
processing. Generally, simultaneous localization of
wavelets, both in space and in scale, leads to a
sparse Galerkin representation for many
pseudodifferential operators and their inverses. A
thresholding technique with the DWT has been introduced
in coherent vortex simulation of the 2D Navier-Stokes
equations, to reduce the number of relevant wavelet
coefficients. Another promising application of wavelets
is as a preprocessor for an iterative Poisson
solver, where a wavelet-based preconditioning leads
to a matrix with a bounded condition number.


Other Wavelets and Generalizations 


Several new types of wavelets have been proposed: 
“coiflet” whose scaling function has vanishing 
moments giving expansion coefficients approxi- 
mately equal to values of the data functions, and 
“symlet” which is an orthonormal wavelet with a 
nearly symmetric profile. Multiwavelets are wavelets
which give a complete orthonormal system in $L^2$
space. In 2D or multidimensional applications of the
DWT, separable orthonormal wavelets consisting of
tensor products of 1D orthonormal wavelets are
frequently used, while nonseparable orthonormal
wavelets are also available. Another generalization
of wavelets is the Malvar basis, which is also a
generalization of the local Fourier basis and gives a
perfect reconstruction. A new direction for wavelets is
the second-generation wavelets, which are
constructed by the lifting scheme and are free from the
regular dyadic procedure, and thus applicable to compact
regions such as $S^2$ and a finite interval.


See also: Fractal Dimensions in Dynamics; Image 
Processing: Mathematics; Intermittency in Turbulence; 
Wavelets: Application to Turbulence; Wavelets: 
Mathematical Theory.


Introduction


The wavelet transform unfolds functions into time
(or space) and scale, and possibly directions. The
continuous wavelet transform was discovered
by Alex Grossmann and Jean Morlet, who published
the first paper on wavelets in 1984. This mathematical
technique, based on group theory and
square-integrable representations, allows us to decompose a
signal, or a field, into both space and scale, and
possibly directions. The orthogonal wavelet transform
was discovered by Lemarié and Meyer
(1986). Then, Daubechies (1988) found orthogonal
bases made of compactly supported wavelets, and
Mallat (1989) designed the fast wavelet transform
(FWT) algorithm. Further developments were made
in 1991 by Raffy Coifman, Yves Meyer, and Victor
Wickerhauser, who introduced wavelet packets and
applied them to data compression. The development
of wavelets has been interdisciplinary, with
contributions coming from very different fields such as
engineering (sub-band coding, quadrature mirror
filters, time-frequency analysis), theoretical physics
(coherent states of affine groups in quantum
mechanics), and mathematics (Calderón-Zygmund
operators, characterization of function spaces,
harmonic analysis). Many reference textbooks are
available; some that we recommend are listed
in the “Further reading” section. Meanwhile, a large
spectrum of applications has grown and is still
developing, ranging from signal analysis and image
processing via numerical analysis and turbulence
modeling to data compression.


Further Reading 


Benedetto JJ and Frazier W (eds.) (1994) Wavelets: Mathematics 
and Applications. Boca Raton, FL: CRC Press. 

van den Berg JC (ed.) (1999) Wavelets in Physics. Cambridge: 
Cambridge University Press. 

Daubechies I (1992) Ten Lectures on Wavelets. CBMS-NSF
Regional Conference Series, vol. 61. Philadelphia: SIAM.

Mallat S (1998) A Wavelet Tour of Signal Processing. San Diego:
Academic Press.

Strang G and Nguyen T (1997) Wavelets and Filter Banks.
Wellesley: Wellesley-Cambridge Press.


In this article, we will first define the continuous 
wavelet transform and then the orthogonal wavelet 
transform based on a multiresolution analysis. 
Properties of both transforms will be discussed 
and illustrated by examples. For a general intro- 
duction to wavelets, see Wavelets: Applications. 


Continuous Wavelet Transform 


Let us consider the Hilbert space of square-integrable
functions $L^2(\mathbb{R}) = \{f : \|f\|_2 < \infty\}$, equipped
with the scalar product $\langle f, g \rangle = \int_{\mathbb{R}} f(x)\, g^*(x)\, dx$
(* denotes the complex conjugate in the case of
complex-valued functions) and where the norm is
defined by $\|f\|_2 = \langle f, f \rangle^{1/2}$.


Analyzing Wavelet 


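The starting point for the wavelet transform is to
choose a real- or complex-valued function
$\psi \in L^2(\mathbb{R})$, called the “mother wavelet,” which fulfills
the admissibility condition,

$$C_\psi = \int_0^\infty |\hat\psi(k)|^2\, \frac{dk}{|k|} < \infty$$  [1]

where

$$\hat\psi(k) = \int_{-\infty}^{\infty} \psi(x)\, e^{-i 2\pi k x}\, dx$$  [2]

denotes the Fourier transform, with $i = \sqrt{-1}$ and k
the wave number. If $\psi$ is integrable, that is,
$\psi \in L^1(\mathbb{R})$, this implies that $\psi$ has zero mean,

$$\int_{-\infty}^{\infty} \psi(x)\, dx = 0 \quad \text{or} \quad \hat\psi(0) = 0$$  [3]

In practice, however, one also requires the wavelet
$\psi$ to be well localized in both physical and Fourier
space, and possibly to have M vanishing moments:

$$\int_{-\infty}^{\infty} x^m\, \psi(x)\, dx = 0 \quad \text{for } m = 0, \ldots, M - 1$$  [4]

that is, monomials up to degree M - 1 are exactly
reproduced. In Fourier space, this property is
equivalent to

$$\frac{d^m}{dk^m} \hat\psi(k) \Big|_{k=0} = 0 \quad \text{for } m = 0, \ldots, M - 1$$  [5]

therefore, the Fourier transform of $\psi$ decays
smoothly at k = 0.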


Analysis 


From the mother wavelet $\psi$, we generate a family of
continuously translated and dilated wavelets,

$$\psi_{a,b}(x) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{x - b}{a}\right) \quad \text{for } a > 0 \text{ and } b \in \mathbb{R}$$  [6]

where a denotes the dilation parameter, corresponding
to the width of the wavelet support, and b the
translation parameter, corresponding to the position
of the wavelet. The wavelets are normalized in
energy norm, that is, $\|\psi_{a,b}\|_2 = 1$.

In Fourier space, eqn [6] reads

$$\hat\psi_{a,b}(k) = \sqrt{a}\, \hat\psi(ak)\, e^{-i 2\pi k b}$$  [7]

where the contraction with 1/a in [6] is reflected in
a dilation by a in [7] and the translation by b implies a
rotation in the complex plane.

The continuous wavelet transform of a function f
is then defined as the convolution of f with the
wavelet family $\psi_{a,b}$:

$$\tilde f(a, b) = \int_{-\infty}^{\infty} f(x)\, \psi^*_{a,b}(x)\, dx$$  [8]

where $\psi^*_{a,b}$ denotes, in the case of complex-valued
wavelets, the complex conjugate.
Using Parseval’s identity, we get

$$\tilde f(a, b) = \int_{-\infty}^{\infty} \hat f(k)\, \hat\psi^*_{a,b}(k)\, dk$$  [9]

and the wavelet transform can be interpreted as a
frequency decomposition using bandpass filters $\hat\psi_{a,b}$
centered at frequencies $k = k_\psi/a$. The wave number
$k_\psi$ denotes the barycenter of the wavelet support in
Fourier space

$$k_\psi = \frac{\int_0^\infty k\, |\hat\psi(k)|\, dk}{\int_0^\infty |\hat\psi(k)|\, dk}$$  [10]

Note that these filters have a variable width $\Delta k$ but
a constant relative width $\Delta k/k$;
therefore, when the wave number increases, the
bandwidth becomes wider.


Synthesis 


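The admissibility condition [1] implies the existence
of a finite energy reproducing kernel, which is a
necessary condition for being able to reconstruct the
function f from its wavelet coefficients $\tilde f$. One then
recovers

$$f(x) = \frac{1}{C_\psi} \int_0^\infty \int_{-\infty}^{\infty} \tilde f(a, b)\, \psi_{a,b}(x)\, \frac{da\, db}{a^2}$$  [11]

which is the inverse wavelet transform.

The wavelet transform is an isometry and one has
Parseval’s identity. Therefore, the wavelet transform
conserves the inner product and we obtain

$$\langle f, g \rangle = \int_{-\infty}^{\infty} f(x)\, g^*(x)\, dx = \frac{1}{C_\psi} \int_0^\infty \int_{-\infty}^{\infty} \tilde f(a, b)\, \tilde g^*(a, b)\, \frac{da\, db}{a^2}$$  [12]

As a consequence, the total energy E of a signal
can be calculated either in physical space or in
wavelet space, such as

$$E = \int_{-\infty}^{\infty} |f(x)|^2\, dx = \frac{1}{C_\psi} \int_0^\infty \int_{-\infty}^{\infty} |\tilde f(a, b)|^2\, \frac{da\, db}{a^2}$$  [13]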

This formula is also the starting point for the 
definition of wavelet spectra and scalogram (see 
Wavelets: Application to Turbulence). 





Examples 


In the following, we apply the continuous wavelet
transform to different academic signals using the
Morlet wavelet. The Morlet wavelet is complex
valued, and consists of a modulated Gaussian with
width $k_0/\pi$:

$$\psi(x) = \left(e^{2 i \pi x} - e^{-k_0^2/2}\right) e^{-2\pi^2 x^2/k_0^2}$$  [14]

The envelope factor $k_0$ controls the number of
oscillations in the wave packet; typically, $k_0 = 5$ is
used. The correction factor $e^{-k_0^2/2}$, to ensure its
vanishing mean, is very small and often neglected.
The Fourier transform is

$$\hat\psi(k) = \frac{k_0}{\sqrt{2\pi}}\, e^{-(k_0^2/2)(1 + k^2)} \left(e^{k_0^2 k} - 1\right)$$  [15]

Figure 1 shows wavelet analyses of a cosine, two
sines, a Dirac, and a characteristic function. Below
the four signals we plot the modulus and the phase
of the corresponding wavelet coefficients.
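
To make the cosine example concrete, the following minimal sketch evaluates a single wavelet coefficient by direct numerical quadrature. It assumes the Morlet wavelet is the modulated Gaussian $(e^{2i\pi x} - e^{-k_0^2/2})\,e^{-2\pi^2 x^2/k_0^2}$ and the $a^{-1/2}$ normalization of eqn [6]; the function names, the midpoint rule, the integration window of six scale units on either side of b, and the test wavenumber are all illustrative choices made here, not part of the article.

```python
import cmath
import math


def morlet(x, k0=5.0):
    """Morlet wavelet, including the small zero-mean correction term."""
    envelope = math.exp(-2.0 * math.pi ** 2 * x ** 2 / k0 ** 2)
    return (cmath.exp(2j * math.pi * x) - math.exp(-k0 ** 2 / 2.0)) * envelope


def cwt_coefficient(f, a, b, k0=5.0, half_width=6.0, n=4096):
    """Midpoint-rule estimate of the integral of f(x) * conj(psi_{a,b}(x)).

    The window [b - half_width*a, b + half_width*a] is an assumption that
    relies on the Gaussian envelope being negligible outside it.
    """
    lower = b - half_width * a
    dx = 2.0 * half_width * a / n
    total = 0j
    for m in range(n):
        x = lower + (m + 0.5) * dx
        psi_ab = morlet((x - b) / a, k0) / math.sqrt(a)
        total += f(x) * psi_ab.conjugate()
    return total * dx


def cosine(x):
    """Test signal with wavenumber 2 (period 0.5), chosen for illustration."""
    return math.cos(2.0 * math.pi * 2.0 * x)


# Modulus of the coefficients at three scales, all at position b = 0.
moduli = {a: abs(cwt_coefficient(cosine, a, 0.0)) for a in (0.25, 0.5, 1.0)}
```

Since this Morlet wavelet is centered near wavenumber 1 in Fourier space, the modulus is largest close to the scale a = 0.5, the inverse of the signal's wavenumber, and falls off rapidly at the neighboring octaves.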


Higher Dimensions 


The continuous wavelet transform can be extended to
higher dimensions in $L^2(\mathbb{R}^n)$ in different ways. Either
we define spherically symmetric wavelets by setting
$\psi(x) = \psi^{1d}(|x|)$ for $x \in \mathbb{R}^n$, or we introduce, in addition
to dilations $a \in \mathbb{R}^+$ and translations $b \in \mathbb{R}^n$, also
rotations to define wavelets with a directional sensitivity. In
the two-dimensional case, we obtain, for example,

$$\psi_{a,b,\theta}(x) = \frac{1}{a}\, \psi\!\left(R_\theta^{-1}\, \frac{x - b}{a}\right)$$  [16]

where $a \in \mathbb{R}^+$, $b \in \mathbb{R}^2$, and where $R_\theta$ is the rotation
matrix

$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$  [17]

The analysis formula [8] then becomes

$$\tilde f(a, b, \theta) = \int_{\mathbb{R}^2} f(x)\, \psi^*_{a,b,\theta}(x)\, dx$$  [18]

and for the corresponding inverse wavelet transform
[11] we obtain

$$f(x) = \frac{1}{C_\psi} \int_0^\infty \int_{\mathbb{R}^2} \int_0^{2\pi} \tilde f(a, b, \theta)\, \psi_{a,b,\theta}(x)\, \frac{da\, db\, d\theta}{a^3}$$  [19]

Similar constructions can be made in dimensions
larger than 2 using n - 1 angles of rotation.


Figure 1 Examples of a one-dimensional continuous wavelet analysis using the complex-valued Morlet wavelet. Each subfigure
shows on the top the function to be analyzed and below (left) the modulus of its wavelet coefficients and below (right) the phase of its
wavelet coefficients.


Discrete Wavelets 
Frames 


It is possible to obtain a discrete set of quasiorthogonal
wavelets by sampling the scale and position
axes a, b. For the scale a we use a logarithmic
discretization: a is replaced by $a_j = a_0^{-j}$, where $a_0$ is
the sampling rate of the log a axis ($\log a_0 = \Delta(\log a)$)
and where $j \in \mathbb{Z}$ is the scale index. The position b is
discretized linearly: b is replaced by $x_{ji} = i\, b_0\, a_0^{-j}$,
where $b_0$ is the sampling rate of the position axis at
the largest scale and where $i \in \mathbb{Z}$ is the position
index. Note that the sampling rate of the position
varies with scale, that is, for finer scales (increasing j
and hence decreasing $a_j$), the sampling rate
increases. Accordingly, we obtain the discrete
wavelets (cf. Figure 2)

$$\psi_{ji}(x) = a_j^{-1/2}\, \psi\!\left(\frac{x - x_{ji}}{a_j}\right)$$  [20]

and the corresponding discrete decomposition
formula is

$$\tilde f_{ji} = \langle f, \psi_{ji} \rangle = \int_{-\infty}^{\infty} f(x)\, \psi^*_{ji}(x)\, dx$$  [21]

Furthermore, the wavelet coefficients satisfy the
following estimate:

$$A \|f\|_2^2 \le \sum_{j} \sum_{i} |\tilde f_{ji}|^2 \le B \|f\|_2^2$$  [22]

with frame bounds $B \ge A > 0$. In the case A = B we
have a tight frame.

















Figure 2 Orthogonal quintic spline wavelets $\psi_{j,i}(x) = 2^{j/2}\psi(2^j x - i)$ at different scales and positions: (a) $\psi_{5,6}(x)$,
$\psi_{6,32}(x)$, $\psi_{7,108}(x)$, and (b) corresponding wavelet coefficients.

The discrete reconstruction formula is

$$f(x) = C \sum_{j=-\infty}^{\infty} \sum_{i=-\infty}^{\infty} \tilde f_{ji}\, \psi_{ji}(x) + R(x)$$  [23]

where C is a constant and R(x) is a residual, both
depending on the choice of the wavelet and the
sampling of the scale and position axes. For the
particular choice $a_0 = 2$ (which corresponds to a scale
sampling by octaves) and $b_0 = 1$, we have the dyadic
sampling, for which there exist special wavelets $\psi_{ji}$ that
form an orthonormal basis of $L^2(\mathbb{R})$, that is, such that

$$\langle \psi_{ji}, \psi_{j'i'} \rangle = \delta_{jj'}\, \delta_{ii'}$$  [24]

where $\delta$ denotes the Kronecker symbol. This means
that the wavelets $\psi_{ji}$ are orthogonal with respect to
their translates by discrete steps $2^{-j} i$ and their dilates
by discrete steps $2^{-j}$ corresponding to octaves. In
this case, the reconstruction formula is exact with
C = 1 and R = 0. Note that the discrete wavelet
transform has lost the invariance by translation and
dilation of the continuous one.


Orthogonal Wavelets and Multiresolution Analysis 


The construction of orthogonal wavelet bases and the
associated fast numerical algorithm is based on the
mathematical concept of multiresolution analysis
(MRA). The underlying idea is to consider
approximations $f_j$ of the function f at different scales j.
The amount of information needed to go from a coarse
approximation $f_j$ to a finer resolution approximation
$f_{j+1}$ is then described using orthogonal wavelets. The
orthogonal wavelet analysis can thus be interpreted as
decomposing the function into approximations of the
function at coarser and coarser scales (i.e., for
decreasing j), where the differences between the
approximations are encoded using wavelets.

The definition of the MRA was introduced by
Stéphane Mallat in 1988 (Mallat 1989). This
technique constitutes a mathematical framework for
orthogonal wavelets and the related FWT.

A one-dimensional orthogonal MRA of $L^2(\mathbb{R})$ is
defined as a sequence of successive approximation
spaces $V_j$, $j \in \mathbb{Z}$, which are closed embedded subspaces
of $L^2(\mathbb{R})$. They verify the following conditions:

$$V_j \subset V_{j+1} \quad \forall j \in \mathbb{Z}$$  [25]

$$\overline{\bigcup_{j \in \mathbb{Z}} V_j} = L^2(\mathbb{R})$$  [26]

$$\bigcap_{j \in \mathbb{Z}} V_j = \{0\}$$  [27]

$$f(x) \in V_j \Leftrightarrow f(2x) \in V_{j+1}$$  [28]

A scaling function $\phi(x)$ is required to exist. Its
translates generate a basis in each $V_j$, that is,

$$V_j = \overline{\mathrm{span}}\{\phi_{ji}\}_{i \in \mathbb{Z}}$$  [29]

where

$$\phi_{ji}(x) = 2^{j/2}\, \phi(2^j x - i), \quad j, i \in \mathbb{Z}$$  [30]

At a given scale j, this basis is orthonormal with respect
to its translates by steps $i/2^j$ but not to its dilates,

$$\langle \phi_{ji}, \phi_{jk} \rangle = \delta_{ik}$$  [31]

The nestedness of the approximation spaces [25]
generated by the scaling function $\phi$ implies that it
satisfies a refinement equation:

$$\phi_{j-1,i}(x) = \sum_{n=-\infty}^{\infty} h_{n-2i}\, \phi_{j,n}(x)$$  [32]

with the filter coefficients $h_n = \langle \phi_{j,n}, \phi_{j-1,0} \rangle$, which
determine the scaling function completely. In general,
only the filter coefficients $h_n$ are known and no
analytical expression of $\phi$ is given. Equation [32]
implies that the approximation of a function at
coarser scale can be described by linear combinations
of the same function at finer scales.
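
As a worked check of the refinement equation, one may take the Haar scaling function, the indicator function of the unit interval. This choice is an illustration made here because it is one of the rare cases with a closed-form scaling function; the notation follows eqns [30] and [32].

```latex
% Haar scaling function: indicator of [0, 1)
\phi(x) = \mathbf{1}_{[0,1)}(x)
\quad\Longrightarrow\quad
\phi(x) = \phi(2x) + \phi(2x - 1)

% Rewriting with \phi_{j,i}(x) = 2^{j/2}\,\phi(2^j x - i):
\phi_{j-1,0}(x) = 2^{(j-1)/2}\,\phi(2^{j-1} x)
  = 2^{(j-1)/2}\left[\phi(2^j x) + \phi(2^j x - 1)\right]
  = \frac{1}{\sqrt{2}}\left[\phi_{j,0}(x) + \phi_{j,1}(x)\right]

% Comparison with [32] for i = 0 gives the filter coefficients
h_0 = h_1 = \frac{1}{\sqrt{2}}, \qquad h_n = 0 \ \text{otherwise}
```

The two nonzero coefficients sum to $\sqrt{2}$ and have unit sum of squares, which is what the orthonormality [31] of the translates requires of the filter.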

The orthogonal projection of a function $f \in L^2(\mathbb{R})$
on $V_J$ is defined as

$$P_{V_J} : f \mapsto P_{V_J} f = f_J$$  [33]

with

$$f_J(x) = \sum_{k \in \mathbb{Z}} \langle f, \phi_{J,k} \rangle\, \phi_{J,k}(x)$$  [34]

This coarse graining at a given scale J is done by
filtering the function with the scaling function $\phi$. As
a filter, the scaling function $\phi$ does not have
vanishing mean but is normalized so that
$\int_{-\infty}^{\infty} \phi(x)\, dx = 1$.

As $V_{J-1}$ is included in $V_J$, we can define its
orthogonal complement space in $V_J$:

$$V_J = V_{J-1} \oplus W_{J-1}$$  [35]

Correspondingly, the approximation of the function
f at scale $2^{-J}$, belonging to $V_J$, can be
decomposed as a sum of orthogonal projections on
$V_{J-1}$ and $W_{J-1}$, such that

$$P_{V_J} f = P_{V_{J-1}} f + P_{W_{J-1}} f$$  [36]


Based on the scaling function $\phi$, one can construct a
function $\psi$, the so-called mother wavelet, given by
the relation

$$\psi_{j-1,i}(x) = \sum_{n \in \mathbb{Z}} g_{n-2i}\, \phi_{j,n}(x)$$  [37]

with $g_n = \langle \phi_{j,n}, \psi_{j-1,0} \rangle$ and where
$\psi_{ji}(x) = 2^{j/2}\, \psi(2^j x - i)$, $j, i \in \mathbb{Z}$ (cf. Figure 2). The filter
coefficients $g_n$ can be computed from the filter coefficients
$h_n$ using the relation

$$g_n = (-1)^{1-n}\, h_{1-n}$$  [38]

The translates and dilates of the wavelet $\psi$
constitute orthonormal bases of the spaces $W_j$,

$$W_j = \overline{\mathrm{span}}\{\psi_{ji}\}_{i \in \mathbb{Z}}$$  [39]

As in the continuous case, the wavelets have
vanishing mean, and also possibly vanishing
higher-order moments; therefore,

$$\int_{-\infty}^{\infty} x^m\, \psi(x)\, dx = 0 \quad \text{for } m = 0, \ldots, M - 1$$  [40]


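Let us now consider approximations of a function
$f \in L^2(\mathbb{R})$ at two different scales:

• at scale j

$$f_j(x) = \sum_{i=-\infty}^{\infty} \bar f_{ji}\, \phi_{ji}(x)$$  [41]

• at scale j - 1

$$f_{j-1}(x) = \sum_{i=-\infty}^{\infty} \bar f_{j-1,i}\, \phi_{j-1,i}(x)$$  [42]

with the scaling coefficients

$$\bar f_{ji} = \langle f, \phi_{ji} \rangle$$  [43]

which correspond to local averages of the function
f at position $i\, 2^{-j}$ and at scale $2^{-j}$.

The difference between the two approximations is
encoded by the wavelets

$$f_j(x) - f_{j-1}(x) = \sum_{i=-\infty}^{\infty} \tilde f_{j-1,i}\, \psi_{j-1,i}(x)$$  [44]

with the wavelet coefficients

$$\tilde f_{ji} = \langle f, \psi_{ji} \rangle$$  [45]

which correspond to local differences of the function
at position $(2i + 1)\, 2^{-(j+1)}$ between approximations
at scales $2^{-j}$ and $2^{-(j+1)}$.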

Iterating the two-scale decomposition [44], any
function $f \in L^2(\mathbb{R})$ can be expressed as a sum of a
coarse-scale approximation at a reference scale $j_0$,
which we set to 0 here, and the successive
differences. These details are needed to go from one
scale j to the next finer scale j + 1 for $j = j_0, \ldots, \infty$:

$$f(x) = \sum_{i=-\infty}^{\infty} \bar f_{0,i}\, \phi_{0,i}(x) + \sum_{j=0}^{\infty} \sum_{i=-\infty}^{\infty} \tilde f_{ji}\, \psi_{ji}(x)$$  [46]

For numerical applications, the sums in eqn [46]
have to be truncated in both scale j and position i.
The truncation in scale corresponds to a limitation
of f to a given finest scale J, which is in practice
imposed by the available sampling rate. Due to the
finite length of the available data, the sum over i
also becomes finite. The decomposition [46] is
orthogonal, as, by construction,

$$\langle \psi_{ji}, \psi_{lk} \rangle = \delta_{jl}\, \delta_{ik}$$  [47]

$$\langle \psi_{ji}, \phi_{lk} \rangle = 0 \quad \text{for } j \ge l$$  [48]

in addition to [31].


Fast Wavelet Transform 


Starting with a function $f \in L^2(\mathbb{R})$ given at the finest
resolution $2^{-J}$ (i.e., we know $f_J \in V_J$ and hence the
coefficients $\bar f_{J,i}$ for $i \in \mathbb{Z}$), the FWT computes its
wavelet coefficients $\tilde f_{ji}$ by decomposing successively
each approximation $f_j$ into a coarser scale
approximation $f_{j-1}$, plus the corresponding details, which
are encoded by the wavelet coefficients. The
algorithm uses a cascade of discrete convolutions
with the low-pass filter $h_n$ and the bandpass filter $g_n$,
followed by downsampling, in which only one
coefficient out of two is retained. The direct wavelet
transform algorithm is

• initialization
given $f \in L^2(\mathbb{R})$ and $\bar f_{J,i} = f(i/2^J)$ for $i \in \mathbb{Z}$

• decomposition
for j = J to 1, step -1, do

$$\bar f_{j-1,i} = \sum_{n \in \mathbb{Z}} h_{n-2i}\, \bar f_{j,n}$$  [49]

$$\tilde f_{j-1,i} = \sum_{n \in \mathbb{Z}} g_{n-2i}\, \bar f_{j,n}$$  [50]

The inverse wavelet transform is based on
successive reconstructions of fine-scale approximations
$f_j$ from coarser scale approximations $f_{j-1}$,
plus the differences between approximations at
scale j - 1 and the finer scale j, which are encoded
by $\tilde f_{j-1,i}$. The algorithm uses a cascade of discrete
convolutions with the filters $h_n$ and $g_n$, preceded by
upsampling, which adds zeros in between two
successive coefficients.

• reconstruction
for j = 1 to J, step 1, do

$$\bar f_{j,i} = \sum_{n=-\infty}^{\infty} h_{i-2n}\, \bar f_{j-1,n} + \sum_{n=-\infty}^{\infty} g_{i-2n}\, \tilde f_{j-1,n}$$  [51]


The FWT was introduced by Stéphane Mallat
in 1989. If the scaling functions (and wavelets) are
compactly supported, the filters $h_n$ and $g_n$ have only
a finite number of nonvanishing coefficients. In this
case, the numerical complexity of the FWT is O(N),
where N denotes the number of samples.
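
The decomposition and reconstruction steps can be sketched in a few lines. The version below is a minimal illustration, not a reference implementation: it assumes periodic wrap-around of the index n to handle the finite data length (one possible boundary treatment), stores each filter as a dictionary from index n to coefficient, builds the bandpass filter from the relation $g_n = (-1)^{1-n} h_{1-n}$, and uses the Haar filter as the default example. All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)
HAAR_H = {0: 1.0 / SQRT2, 1: 1.0 / SQRT2}  # low-pass filter h_n (Haar)


def quadrature_filter(h):
    """Bandpass filter g_n = (-1)^(1-n) h_(1-n), indexed by n = 1 - m."""
    return {1 - m: (-1) ** m * h_m for m, h_m in h.items()}


def decompose(fine, h, g):
    """One decomposition step: filter with h and g, keep every second value.

    Indices are wrapped periodically (an assumption about the boundaries).
    """
    size = len(fine)
    coarse, detail = [], []
    for i in range(size // 2):
        coarse.append(sum(c * fine[(k + 2 * i) % size] for k, c in h.items()))
        detail.append(sum(c * fine[(k + 2 * i) % size] for k, c in g.items()))
    return coarse, detail


def reconstruct(coarse, detail, h, g):
    """One reconstruction step: upsample, filter with h and g, and add."""
    size = 2 * len(coarse)
    fine = [0.0] * size
    for n in range(len(coarse)):
        for k, c in h.items():
            fine[(k + 2 * n) % size] += c * coarse[n]
        for k, c in g.items():
            fine[(k + 2 * n) % size] += c * detail[n]
    return fine


def fwt(samples, h, levels):
    """Return (coarsest approximation, details ordered from fine to coarse)."""
    g = quadrature_filter(h)
    approx, details = list(samples), []
    for _ in range(levels):
        approx, detail = decompose(approx, h, g)
        details.append(detail)
    return approx, details


def ifwt(approx, details, h):
    """Invert fwt by reconstructing from the coarsest scale upward."""
    g = quadrature_filter(h)
    approx = list(approx)
    for detail in reversed(details):
        approx = reconstruct(approx, detail, h, g)
    return approx


# Usage: len(samples) must be divisible by 2**levels.
samples = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coarse, details = fwt(samples, HAAR_H, levels=3)
rebuilt = ifwt(coarse, details, HAAR_H)
```

Each level touches every coefficient of the current approximation a fixed number of times (the filter length), and the approximation halves in length at every level, which is where the O(N) operation count comes from.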


Choice of Wavelets 


Orthogonal wavelets are typically defined by their
filter coefficients $h_n$, since in general no analytic
expression for $\psi$ is available. In the following, we
give the filter coefficients $h_n$ for some typical
orthogonal wavelets. The filter coefficients $g_n$ can
be obtained using the quadrature relation between
the two filters [38].


• Haar D1 (one vanishing moment):

  h_0 = 1/\sqrt{2}
  h_1 = 1/\sqrt{2}


• Daubechies D2 (two vanishing moments):

  h_0 = 0.482 962 913 145
  h_1 = 0.836 516 303 736
  h_2 = 0.224 143 868 042
  h_3 = −0.129 409 522 551


• Daubechies D3 (three vanishing moments):

  h_0 = 0.332 670 552 950
  h_1 = 0.806 891 509 311
  h_2 = 0.459 877 502 118
  h_3 = −0.135 011 020 010
  h_4 = −0.085 441 273 882
  h_5 = 0.035 226 291 882


• Coiflets C12 (four vanishing moments): the
  wavelets and the corresponding scaling function
  are shown in Figure 3.
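The listed coefficients can be checked numerically against three standard properties of an orthogonal low-pass filter. The sketch below assumes the normalization $\sum_n h_n = \sqrt{2}$ and tests it together with the double-shift orthonormality $\sum_n h_n h_{n+2k} = \delta_{k0}$ and the vanishing alternating sum $\sum_n (-1)^n h_n = 0$, which corresponds to the first vanishing moment. The tolerance and the function names are illustrative choices.

```python
import math

# Filter coefficients exactly as tabulated above (12 decimal digits).
FILTERS = {
    "Haar D1": [1 / math.sqrt(2), 1 / math.sqrt(2)],
    "Daubechies D2": [0.482962913145, 0.836516303736,
                      0.224143868042, -0.129409522551],
    "Daubechies D3": [0.332670552950, 0.806891509311, 0.459877502118,
                      -0.135011020010, -0.085441273882, 0.035226291882],
}


def shifted_product(h, k):
    """Sum over n of h_n * h_{n+2k}, restricted to indices inside the filter."""
    return sum(h[n] * h[n + 2 * k] for n in range(len(h) - 2 * k))


def check_filter(h, tol=1e-9):
    """Return True if h passes the three checks within the tolerance."""
    sum_ok = abs(sum(h) - math.sqrt(2)) < tol
    orthonormal = all(
        abs(shifted_product(h, k) - (1.0 if k == 0 else 0.0)) < tol
        for k in range(len(h) // 2)
    )
    first_moment = abs(sum((-1) ** n * c for n, c in enumerate(h))) < tol
    return sum_ok and orthonormal and first_moment


results = {name: check_filter(h) for name, h in FILTERS.items()}
```

The tolerance is set well above the rounding of the tabulated digits, so the check is sensitive to a misplaced sign or digit but not to the truncation of the table.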


Remarks The construction of orthogonal wavelets
in $L^2(\mathbb{R})$ can be modified to obtain wavelets on the
interval, that is, in $L^2([0,1])$. To this end, boundary
wavelets are introduced, while in the interior of the
interval the wavelets are not modified.

Figure 3 Orthogonal wavelets Coiflet C12. (a) Scaling function $\phi(x)$ (left) and $|\hat\phi(\omega)|$. (b) Wavelet $\psi(x)$ (left) and $|\hat\psi(\omega)|$.


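A periodic MRA of $L^2(\mathbb{T})$, where $\mathbb{T} = \mathbb{R}/\mathbb{Z}$
denotes the torus, can also be constructed by
periodizing the wavelets in $L^2(\mathbb{R})$, using

$$ \psi^{\mathrm{per}}_{j,i}(x) = \sum_{k \in \mathbb{Z}} \psi_{j,i}(x + k) $$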


Relaxing the condition of orthogonality allows
greater flexibility in the choice of the basis
functions. For example, biorthogonal wavelets can
be designed using different basis functions for
analysis ($a$) and synthesis ($s$), which are related
but no longer orthogonal. A pair of refinable
scaling functions $(\phi^a, \phi^s)$ with related wavelets
$(\psi^a, \psi^s)$, which are by construction biorthogonal,
generates a biorthogonal MRA $V_j^a, V_j^s$. From an
algorithmic point of view, only two different filter
pairs, $(g^a, h^a)$ for the forward and $(g^s, h^s)$ for the
backward FWT, are used, without changing the
algorithm.

The multiresolution approach can be further
generalized to samplings on nonequidistant
grids, leading to the so-called second-generation
wavelets.


Higher Dimensions 


The previously presented one-dimensional construc- 
tion can be extended to higher dimensions. For 
simplicity, we will consider only the two- 
dimensional case, since higher dimensions can be 
treated analogously. 


Tensor product construction Having developed a
one-dimensional orthonormal basis $\psi_{j,i}$ of $L^2(\mathbb{R})$, one
could use these functions as building blocks in
higher dimensions. One way of doing so is to take
the tensor product of two one-dimensional bases
and to define

$$ \psi_{j_x,j_y,i_x,i_y}(x,y) = \psi_{j_x,i_x}(x)\,\psi_{j_y,i_y}(y) $$   [52]

The resulting functions constitute an orthonormal
wavelet basis for $L^2(\mathbb{R}^2)$. Each function $f \in L^2(\mathbb{R}^2)$
can then be developed into

$$ f(x,y) = \sum_{j_x,i_x} \sum_{j_y,i_y} \tilde f_{j_x,j_y,i_x,i_y}\,\psi_{j_x,j_y,i_x,i_y}(x,y) $$   [53]

with $\tilde f_{j_x,j_y,i_x,i_y} = \langle f, \psi_{j_x,j_y,i_x,i_y} \rangle$. However, in this basis
the two variables x and y are dilated separately
and therefore no longer form an MRA. This means
that the functions $\psi_{j_x,j_y,i_x,i_y}$ involve two scales, $2^{-j_x}$ and
$2^{-j_y}$, and each of the functions is essentially supported
on a rectangle with these side lengths. Hence, the
decomposition is often called rectangular wavelet
decomposition (cf. Figure 4a). From the algorithmic
viewpoint, this is equivalent to applying the
one-dimensional wavelet transform to the rows and the
columns of a matrix or a function. For some
applications, such a basis is advantageous, for others
not. Often, however, the notion of scale carries a
specific meaning in the application, and one would
like to have a unique scale assigned to each basis function.


Multiresolution construction Another, much more
interesting construction is that of a truly
two-dimensional MRA of $L^2(\mathbb{R}^2)$. It can be obtained
through the tensor product of two one-dimensional
MRAs of $L^2(\mathbb{R})$. More precisely, one defines the
spaces $\mathbf{V}_j$, $j \in \mathbb{Z}$, by

$$ \mathbf{V}_j = V_j \otimes V_j $$   [54]

and $\mathbf{V}_j = \mathrm{span}\{\phi_{j,i_x,i_y}(x,y) = \phi_{j,i_x}(x)\,\phi_{j,i_y}(y),\ i_x, i_y \in \mathbb{Z}\}$,
fulfilling properties analogous to those of the
one-dimensional case.

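Likewise, we define the complement space $\mathbf{W}_j$ to
be the orthogonal complement of $\mathbf{V}_j$ in $\mathbf{V}_{j+1}$, that is,

$$ \mathbf{V}_{j+1} = V_{j+1} \otimes V_{j+1} $$

$$ = (V_j \oplus W_j) \otimes (V_j \oplus W_j) $$   [55]

$$ = (V_j \otimes V_j) \oplus \big( (W_j \otimes V_j) \oplus (V_j \otimes W_j) \oplus (W_j \otimes W_j) \big) $$   [56]

$$ = \mathbf{V}_j \oplus \mathbf{W}_j $$   [57]

It follows that the orthogonal complement $\mathbf{W}_j = \mathbf{V}_{j+1} \ominus \mathbf{V}_j$
consists of three different types of functions
and is generated by three different wavelets

$$ \psi^{\varepsilon}_{j,i_x,i_y}(x,y) = \begin{cases} \psi_{j,i_x}(x)\,\phi_{j,i_y}(y), & \varepsilon = 1 \\ \phi_{j,i_x}(x)\,\psi_{j,i_y}(y), & \varepsilon = 2 \\ \psi_{j,i_x}(x)\,\psi_{j,i_y}(y), & \varepsilon = 3 \end{cases} $$   [58]

Figure 4 Schematic representation of the 2D wavelet transforms: (a) Tensor product construction and (b) 2D MRA.

Observe that here the scale parameter $j$ simultaneously
controls the dilation in x and y. We recall
that in d dimensions this construction yields $2^d - 1$
types of wavelets spanning $\mathbf{W}_j$.

Using [58], each function $f \in L^2(\mathbb{R}^2)$ can be
developed into a multiresolution basis as

$$ f(x,y) = \sum_{j} \sum_{i_x,i_y} \sum_{\varepsilon=1,2,3} \tilde f^{\varepsilon}_{j,i_x,i_y}\,\psi^{\varepsilon}_{j,i_x,i_y}(x,y) $$   [59]

with $\tilde f^{\varepsilon}_{j,i_x,i_y} = \langle f, \psi^{\varepsilon}_{j,i_x,i_y} \rangle$. A schematic representation
of the wavelet coefficients is shown in
Figure 4b. The algorithmic structure of the
one-dimensional transforms carries over to the
two-dimensional case by simple tensorization, that is,
applying the filters at each decomposition step to
rows and columns.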
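One decomposition level of the two-dimensional MRA transform can be sketched by filtering first along rows and then along columns. The sketch uses the Haar filter for brevity and assumes that the array is stored as `image[y][x]`, so that filtering a row acts in the x direction; under that assumption the three detail blocks correspond to the wavelet types $\varepsilon = 1, 2, 3$ of [58]. All names are illustrative.

```python
import math

S = 1.0 / math.sqrt(2.0)


def haar_step(values):
    """One 1D Haar step: low-pass and bandpass outputs, each downsampled by two."""
    half = len(values) // 2
    low = [S * (values[2 * i] + values[2 * i + 1]) for i in range(half)]
    high = [S * (values[2 * i] - values[2 * i + 1]) for i in range(half)]
    return low, high


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def filter_rows(matrix):
    """Apply haar_step to every row; return the low and high halves as matrices."""
    pairs = [haar_step(row) for row in matrix]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def mra2d_step(image):
    """One level of the 2D MRA transform for image[y][x]."""
    low_x, high_x = filter_rows(image)                              # filter in x
    ll, lh = (transpose(m) for m in filter_rows(transpose(low_x)))   # then in y
    hl, hh = (transpose(m) for m in filter_rows(transpose(high_x)))
    # ll: scaling coefficients; hl, lh, hh: wavelet types eps = 1, 2, 3
    return ll, {1: hl, 2: lh, 3: hh}


# A 4 x 4 array that varies only in x: only the eps = 1 details are nonzero.
image = [[float(x) for x in range(4)] for _ in range(4)]
approx, details = mra2d_step(image)
```

Repeating `mra2d_step` on the returned approximation block gives the next coarser level, in the same way as the one-dimensional cascade.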


Remark The described two-dimensional wavelets
and scaling functions are separable. Their advantage is
the ease of generation starting from
one-dimensional MRAs. However, the main drawback
of this construction is that three wavelets are needed
to span the orthogonal complement space $\mathbf{W}_j$.
Another property should be mentioned. By construction,
the wavelets are anisotropic, that is, horizontal,
diagonal, and vertical directions are preferred.


Approximation Properties 
Reproduction of Polynomials 


A fundamental property of the MRA is the exact
reproduction of polynomials. The vanishing
moments of the wavelet $\psi$, that is, $\int_{\mathbb{R}} x^m \psi(x)\,dx = 0$
for $m = 0, \dots, M-1$, are equivalent to the fact that
polynomials up to degree $M-1$ can be expressed
exactly as a linear combination of scaling functions,
$p_m(x) = \sum_{n \in \mathbb{Z}} n^m \phi(x-n)$ for $m = 0, \dots, M-1$. This
so-called Strang–Fix condition proves that $\psi$ has M
vanishing moments if and only if any polynomial of
degree $M-1$ can be written as a linear combination
of scaling functions $\phi$. Note that, as $p_m \notin L^2(\mathbb{R})$, the
coefficients $n^m$ are not in $\ell^2(\mathbb{Z})$.


Regularity and Local Decay of Wavelet 
Coefficients 


The local or global regularity of a function is closely
related to the decay of its wavelet coefficients. If a
function is locally in $C^s(\mathbb{R})$ (the space of s-times
continuously differentiable functions), it can be well
approximated locally by a Taylor polynomial of degree s.
Consequently, its wavelet coefficients are small at
fine scales, as long as the wavelet $\psi$ has enough
vanishing moments. The decay of the coefficients
hence directly determines the error made when
truncating a wavelet sum at some scale.

Depending on the type of norm used and whether
global or local characterization is concerned, various
relations of this kind have been developed. Let us
take as an example the case of an $\alpha$-Lipschitz function.

Suppose $f \in L^2(\mathbb{R})$; then, for $[a,b] \subset \mathbb{R}$, the
function f is $\alpha$-Lipschitz with $0 < \alpha < 1$ for any $x_0 \in [a,b]$,
that is, $|f(x_0 + h) - f(x_0)| \le C|h|^{\alpha}$, if and
only if there exists a constant A such that
$|\tilde f_{j,i}| \le A\,2^{-j\alpha - j/2}$ for any $(j,i)$ with $i/2^j \in [a,b]$.

This shows the relation between the local regularity
of a function and the decay of its wavelet
coefficients in scale.


Example To illustrate the local decay of the
wavelet coefficients, we consider in Figure 5 the
function $f(x) = \sin(2\pi x)$ for $x < 1/4$ and $x > 3/4$
and $f(x) = -\sin(2\pi x)$ for $1/4 < x < 3/4$. The
corresponding wavelet coefficients for quintic spline
wavelets are plotted in logarithmic scale. The
wavelet coefficients show that the fine-scale
coefficients are significant only in a local region
around the singularities.


Linear Approximation 


The exact reproduction of polynomials can be used
to derive error estimates for the approximation of a
function f at a given scale, which corresponds to
linear approximation. We consider f belonging to
the Sobolev space $W^{s,p}(\mathbb{R}^d)$, that is, the weak
derivatives of f up to order s belong to $L^p(\mathbb{R}^d)$. The
linear approximation of f at scale J, corresponding
to the projection of f onto $V_J$, is then given by

$$ f_J(x) = \sum_{j=0}^{J-1} \sum_{i \in \mathbb{Z}} \tilde f_{j,i}\,\psi_{j,i}(x) $$   [60]

The approximation error can be estimated by

$$ \| f - f_J \|_{L^p} \le C\,2^{-J \min(s,m)} $$   [61]

where s denotes the smoothness of the function in
$L^p$, d the space dimension, and m the number of
vanishing moments of the wavelet $\psi$. In the case of
poor global regularity of f, that is, for small s, a
large number of scales J is needed to get a good
approximation of f.

Figure 5 Orthogonal wavelet decomposition using quintic
spline wavelets: (a) function $f(x) = \sin(2\pi x)$ for $x < 1/4$ and
$x > 3/4$ and $f(x) = -\sin(2\pi x)$ for $1/4 < x < 3/4$, sampled on a grid
$x_i = i/2^J$, $i = 0, \dots, 2^J - 1$ with $J = 9$ and (b) corresponding wavelet
coefficients $\log_{10} |\tilde f_{j,i}|$ for $i = 0, \dots, 2^j - 1$ and $j = 0, \dots, J-1$.

In Figure 6, we plot the linear approximation of
the function f shown in Figure 5. The function $f_J$ is
reconstructed using wavelet coefficients up to scale
$J - 1 = 5$, so that in total only 64 out of 512
coefficients are retained. We observe an oscillating
behavior of $f_J$ near the discontinuities of f, which
dominates the approximation error.


Nonlinear Approximation 


Retaining the N largest wavelet coefficients in the
wavelet expansion of f in [46], without imposing
any a priori cutoff scale, yields the best N-term
approximation $f_N$. In contrast to the linear
approximation [60], it is called nonlinear approximation,
since the choice of the retained coefficients depends
on the function f. The mathematical theory has been
formalized by Cohen, Dahmen, and DeVore.

Figure 6 (a) Linear approximation $f_J$ of the function f in
Figure 5 for $J = 6$, reconstructed from 64 wavelet coefficients
using quintic spline wavelets and (b) corresponding wavelet
coefficients $\log_{10} |\tilde f_{j,i}|$ for $i = 0, \dots, 2^j - 1$ and $j = 0, \dots, J-1$.
Note that the coefficients for $j > 5$ have been set to zero.

The nonlinear approximation of the function f can
then be written as


$$ f_N(x) = \sum_{(j,i) \in \Lambda_N} \tilde f_{j,i}\,\psi_{j,i}(x) $$   [62]

where $\Lambda_N$ denotes the ensemble of all multi-indices
$\lambda = (j,i)$ indexing the N largest coefficients
(measured in the $\ell^p$ norm),

$$ \Lambda_N = \{ \lambda_k,\ k = 1, \dots, N \mid \|\tilde f_{\lambda_k}\|_{\ell^p} \ge \|\tilde f_{\mu}\|_{\ell^p} \ \ \forall \mu \in \Lambda \setminus \Lambda_N \} $$   [63]

with $\Lambda = \{\mu = (j,i),\ j \ge 0,\ i \in \mathbb{Z}\}$. The nonlinear
approximation leads to the following error estimate:

$$ \| f - f_N \|_{L^p} \le C\,N^{-s/d} $$   [64]

where s denotes the smoothness of f in the larger
space $L^q(\mathbb{R}^d)$ with

$$ \frac{1}{q} = \frac{1}{p} + \frac{s}{d} $$

which corresponds to the Sobolev embedding line
(Figure 7). This estimate shows that the nonlinear
approximation converges faster than the linear one
if f has a larger regularity in $L^q$, that is, $f \in W^{s,q}(\mathbb{R}^d)$,
which is, for example, the case for functions
with isolated singularities and for small q.

Figure 7 Schematic representation of linear and nonlinear
approximation.


In Figure 8, we plot the nonlinear approximation
of the function f shown in Figure 5. The function $f_N$
is reconstructed using the 64 largest wavelet
coefficients out of 512 coefficients. Compared to
the linear approximation (cf. Figure 6), the
oscillations around the discontinuities disappear and the
approximation error is reduced while using the same
number of coefficients.


Compression and Preconditioning of Operators 


The nonlinear approximation of functions can be
extended to certain operators, leading to an efficient
representation in wavelet space, that is, to sparse
matrices. For integral operators, for example,
Calderón–Zygmund operators T on $\mathbb{R}$ defined by

$$ Tf(x) = \int_{\mathbb{R}} k(x,y)\,f(y)\,dy $$   [65]

where the kernel k satisfies

$$ |k(x,y)| \le \frac{C}{|x-y|} $$

and

$$ \left| \frac{\partial}{\partial x} k(x,y) \right| + \left| \frac{\partial}{\partial y} k(x,y) \right| \le \frac{C}{|x-y|^2} $$

their wavelet representation $\langle T\psi_{j,i}, \psi_{j',i'} \rangle$ is sparse
and a large number of weak coefficients can be
suppressed by simple thresholding of the matrix
entries while controlling the precision. The resulting
numerical scheme is called the BCR algorithm and is
due to Beylkin et al. (1991).

Figure 8 (a) Nonlinear approximation $f_N$ of the function f in
Figure 5 reconstructed from the 64 largest wavelet coefficients
using quintic spline wavelets, (b) retained wavelet coefficients
$\log_{10} |\tilde f_{j,i}|$ for $i = 0, \dots, 2^j - 1$ and $j = 0, \dots, J-1$.

The characterization of function spaces by the
decay of the wavelet coefficients and the
corresponding norm equivalences can be used for
diagonal preconditioning of integral or differential
operators, which leads to matrices with uniformly
bounded condition numbers. For elliptic differential
operators, for example, the Laplace operator $\nabla^2$, the
norm equivalence $\|\nabla^2 f\| \sim \|2^{2j} \tilde f_{j,i}\|$ can be used for
preconditioning the matrix $\langle \nabla^2 \psi_{j,i}, \psi_{j',i'} \rangle$ by a simple
diagonal scaling with $2^{-2j}$ to obtain a uniformly
bounded condition number. For further details, we
refer to the book of Cohen (2000).


Wavelet Denoising 


We consider a function f which is corrupted by
Gaussian white noise $n \in \mathcal{N}(0, \sigma^2)$. The noise is
spread over all wavelet coefficients $\tilde s_\lambda$, while,
typically, the original function f is determined by
only a few significant wavelet coefficients. The aim is
then to reconstruct the function f from the observed
noisy signal $s = f + n$.

The principle of the wavelet denoising can be 
summarized in the following procedure: 


• Decomposition. Compute the wavelet coefficients
  $\tilde s_\lambda$ using the FWT.

• Thresholding. Apply the thresholding function $\rho_\varepsilon$
  to the wavelet coefficients $\tilde s_\lambda$, thus reducing the
  relative importance of the coefficients with small
  absolute value.

• Reconstruction. Reconstruct a denoised version $s_C$
  from the thresholded wavelet coefficients using
  the fast inverse wavelet transform.


The thresholding parameter $\varepsilon$ depends on the
variance of the noise and on the sample size N.
The thresholding function $\rho_\varepsilon$ we consider
corresponds to hard thresholding:

$$ \rho_\varepsilon(a) = \begin{cases} a & \text{if } |a| > \varepsilon \\ 0 & \text{if } |a| \le \varepsilon \end{cases} $$   [66]

Donoho and Johnstone (1994) have shown that
there exists an optimal $\varepsilon$ for which the relative
quadratic error between the signal f and its
estimator $s_C$ is close to the minimax error for all
signals $f \in H$, where H belongs to a wide class of
function spaces, including Hölder and Besov spaces.
They showed that using the threshold

$$ \varepsilon_D = \sigma_n \sqrt{2 \ln N} $$   [67]

yields an error which is close to the minimum error.
The threshold $\varepsilon_D$ depends only on the number of samples N
and on the variance of the noise $\sigma_n^2$; hence, it is
called the universal threshold. However, in many
applications, $\sigma_n$ is unknown and has to be estimated
from the available noisy data s. For this, the present
authors have developed an iterative algorithm (see
Azzolini et al. (2005)), which is sketched in the
following:


1. Initialization

(a) given $s_k$, $k = 0, \dots, N-1$. Set $i = 0$ and
compute the FWT of s to obtain $\tilde s_\lambda$;

(b) compute the variance $\sigma_0^2$ of s as a rough
estimate of the variance of n and compute the
corresponding threshold $\varepsilon_0 = (2 \ln N\, \sigma_0^2)^{1/2}$;

(c) set the number of coefficients considered as
noise $N_{\mathrm{noise}} = N$.

2. Main loop: repeat

(a) set $N_{\mathrm{noise}}^{\mathrm{old}} = N_{\mathrm{noise}}$ and count the wavelet
coefficients $N_{\mathrm{noise}}$ with modulus smaller
than $\varepsilon_i$;

(b) compute the new variance $\sigma_{i+1}^2$ from the
wavelet coefficients whose modulus is smaller
than $\varepsilon_i$ and the new threshold
$\varepsilon_{i+1} = (2 (\ln N)\, \sigma_{i+1}^2)^{1/2}$;

(c) set $i = i + 1$

until ($N_{\mathrm{noise}}^{\mathrm{old}} = N_{\mathrm{noise}}$).

3. Final step

(a) compute $s_C$ from the coefficients with modulus
larger than $\varepsilon_i$ using the inverse FWT.
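The iteration above can be sketched directly on a list of wavelet coefficients. The sketch assumes that the coefficients come from an orthonormal FWT, so that Gaussian white noise remains white in coefficient space, and it estimates the variance as the mean of squares, which presumes zero-mean noise. The guard against an empty noise set, the iteration cap, and all names are additions of this sketch, not part of the published algorithm.

```python
import math
import random


def iterative_threshold(coeffs, max_iter=100):
    """Estimate the noise threshold from wavelet coefficients by the iteration above."""
    n = len(coeffs)
    log_term = 2.0 * math.log(n)
    # Step 1(b): variance of all coefficients as a first, rough noise estimate.
    variance = sum(c * c for c in coeffs) / n
    eps = math.sqrt(log_term * variance)
    n_noise = n
    for _ in range(max_iter):
        # Step 2(a): coefficients currently regarded as noise.
        noise = [c for c in coeffs if abs(c) < eps]
        n_old, n_noise = n_noise, len(noise)
        if n_noise == 0 or n_noise == n_old:
            break  # unchanged count means unchanged set, hence unchanged threshold
        # Step 2(b): new variance and threshold from the noise coefficients only.
        variance = sum(c * c for c in noise) / n_noise
        eps = math.sqrt(log_term * variance)
    return eps


def hard_threshold(coeffs, eps):
    """Hard thresholding [66]: keep a coefficient only if its modulus exceeds eps."""
    return [c if abs(c) > eps else 0.0 for c in coeffs]


# Synthetic coefficients: unit-variance noise plus 20 strong "signal" coefficients.
random.seed(0)
N = 4096
coeffs = [random.gauss(0.0, 1.0) for _ in range(N)]
for k in range(20):
    coeffs[200 * k] = 50.0

eps = iterative_threshold(coeffs)
denoised = hard_threshold(coeffs, eps)
```

In this synthetic setting the first threshold is inflated by the strong coefficients; once they are excluded, the estimate settles near the universal threshold for unit variance, $\sqrt{2 \ln 4096} \approx 4.08$.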


Example To illustrate the properties of the denoising
algorithm, we apply it to a one-dimensional test signal.
We construct a noisy signal s by superposing a
Gaussian white noise n, with zero mean and variance
$\sigma_n^2 = 1$, to a function f, normalized such that
$((1/N) \sum_k |f_k|^2)^{1/2} = 10$. The number of samples is
$N = 8192$. Figure 9a shows the function f together
with the noise n; Figure 9b shows the constructed
noisy signal s and Figure 9c shows the wavelet
denoised signal $s_C$ together with the extracted noise.

Figure 9 Construction (top) of a 1D noisy signal $s = f + n$ (middle), and results obtained by the recursive denoising algorithm (bottom).
Main Definition


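The WDVV equations of associativity (after E Witten,
R Dijkgraaf, E Verlinde, and H Verlinde) are
tantamount to the following problem: find a function
$F(v)$ of n variables $v = (v^1, v^2, \dots, v^n)$ satisfying
the conditions [1], [3], and [4] given below. First,

$$ \frac{\partial^3 F(v)}{\partial v^1 \partial v^\alpha \partial v^\beta} \equiv \eta_{\alpha\beta} $$   [1]

must be a constant symmetric nondegenerate matrix.
Denote $(\eta^{\alpha\beta}) = (\eta_{\alpha\beta})^{-1}$ the inverse matrix and
introduce the functions

$$ c^{\gamma}_{\alpha\beta}(v) = \eta^{\gamma\epsilon} \frac{\partial^3 F(v)}{\partial v^\epsilon \partial v^\alpha \partial v^\beta}, \quad \alpha, \beta, \gamma = 1, \dots, n $$   [2]

The main condition says that, for arbitrary
$v^1, \dots, v^n$, these functions must be structure
constants of an associative algebra, that is, introducing
a v-dependent multiplication law in the
n-dimensional space by

$$ a \cdot b := \big( c^{1}_{\alpha\beta}(v)\, a^\alpha b^\beta, \dots, c^{n}_{\alpha\beta}(v)\, a^\alpha b^\beta \big) $$

one obtains an n-parameter family of n-dimensional
associative algebras (these algebras will automatically
also be commutative). Spelling out this condition,
one obtains an overdetermined system of
nonlinear PDEs for the function $F(v)$, often also
called the WDVV associativity equations

$$ \frac{\partial^3 F(v)}{\partial v^\alpha \partial v^\beta \partial v^\lambda}\, \eta^{\lambda\mu}\, \frac{\partial^3 F(v)}{\partial v^\mu \partial v^\gamma \partial v^\delta} = \frac{\partial^3 F(v)}{\partial v^\delta \partial v^\beta \partial v^\lambda}\, \eta^{\lambda\mu}\, \frac{\partial^3 F(v)}{\partial v^\mu \partial v^\gamma \partial v^\alpha} $$   [3]

for arbitrary $1 \le \alpha, \beta, \gamma, \delta \le n$. (Summation over
repeated indices will always be assumed.) The last
one is the so-called quasihomogeneity condition

$$ EF = (3-d)F + \tfrac{1}{2} A_{\alpha\beta} v^\alpha v^\beta + B_\alpha v^\alpha + C $$   [4]

where

$$ E = \big( a^\alpha_\beta v^\beta + b^\alpha \big) \frac{\partial}{\partial v^\alpha} $$

for some constants $a^\alpha_\beta$, $b^\alpha$ satisfying

$$ a^\alpha_1 = \delta^\alpha_1, \quad b^1 = 0 $$

$A_{\alpha\beta}$, $B_\alpha$, C, d are some constants. E is called the Euler
vector field and d is the charge of the Frobenius
manifold.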

For $n = 1$ one has $F(v) = (1/6)v^3$. For $n = 2$ one
can choose

$$ F(u,v) = \tfrac{1}{2} u v^2 + f(u) $$

only the quasihomogeneity [4] makes a constraint
for $f(u)$. The first nontrivial case is $n = 3$. The
solution to WDVV is expressed in terms of a
function $f = f(x,y)$ in one of the two forms (in the
examples all indices are written as lower):

$$ d \ne 0: \quad F = \tfrac{1}{2} v_1^2 v_3 + \tfrac{1}{2} v_1 v_2^2 + f(v_2, v_3), \qquad f_{xxy}^2 = f_{yyy} + f_{xxx} f_{xyy} $$

$$ d = 0: \quad F = \tfrac{1}{6} v_1^3 + v_1 v_2 v_3 + f(v_2, v_3), \qquad f_{xxx} f_{yyy} - f_{xxy} f_{xyy} = 1 $$   [5]

The function $f(x,y)$ satisfies an additional constraint
imposed by [4]. Because of this, the above PDEs [5]
can be reduced (Dubrovin 1992, 1996) to a
particular case of the Painlevé-VI equation (see
Painlevé Equations).
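As a concrete check of the $d \ne 0$ equation $f_{xxy}^2 = f_{yyy} + f_{xxx} f_{xyy}$, one can take the polynomial potential usually associated with the $A_3$ singularity. This particular choice of f is brought in here as an illustration and is not specified in the text above; the short computation below confirms that it satisfies the equation.

```latex
% Illustrative choice (assumption): the polynomial part of the A_3 potential
f(x,y) = -\tfrac{1}{16}\,x^2 y^2 + \tfrac{1}{960}\,y^5

% Third derivatives
f_{xxx} = 0, \qquad
f_{xxy} = -\tfrac{1}{4}\,y, \qquad
f_{xyy} = -\tfrac{1}{4}\,x, \qquad
f_{yyy} = \tfrac{1}{16}\,y^2

% Both sides of the associativity equation agree
f_{xxy}^2 = \tfrac{1}{16}\,y^2
          = f_{yyy} + f_{xxx}\,f_{xyy}
```

The corresponding potential is then $F = \tfrac{1}{2} v_1^2 v_3 + \tfrac{1}{2} v_1 v_2^2 - \tfrac{1}{16} v_2^2 v_3^2 + \tfrac{1}{960} v_3^5$.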

The problem [1], [3], [4] is invariant with respect
to linear changes of coordinates preserving the
direction of the vector $\partial/\partial v^1$:

$$ v^\alpha \mapsto \tilde v^\alpha = P^\alpha_\beta v^\beta + Q^\alpha, \quad \det(P^\alpha_\beta) \ne 0, \quad P^\alpha_1 = \delta^\alpha_1 $$

It is also allowed to add to $F(v)$ a polynomial of
degree at most 2. To consider more general
nonlinear changes of coordinates one has to give a
coordinate-free form of the above equations [1], [3],
[4]. This gives rise to the notion of Frobenius
manifold introduced in Dubrovin (1992).

Recall that a Frobenius algebra is a pair $(A, \langle\,,\rangle)$,
where A is a commutative associative algebra with a
unity e over a field k (we will consider only the cases
$k = \mathbb{R}, \mathbb{C}$) and $\langle\,,\rangle$ is a k-bilinear symmetric
nondegenerate invariant form on A, that is,

$$ \langle x \cdot y, z \rangle = \langle x, y \cdot z \rangle $$

for arbitrary vectors x, y, z in A.


Definition A Frobenius structure $(\cdot, e, \langle\,,\rangle, E, d)$ on
the manifold M is a structure of a Frobenius algebra
on the tangent spaces $T_vM = (A_v, \langle\,,\rangle_v)$ depending
(smoothly, analytically, etc.) on the point $v \in M$. It
must satisfy the following axioms.

FM1. The curvature of the metric $\langle\,,\rangle_v$ on M
(not necessarily positive definite) vanishes. Denote $\nabla$
the Levi-Civita connection for the metric. The unity
vector field e must be flat, $\nabla e = 0$.

FM2. Let c be the 3-tensor $c(x,y,z) := \langle x \cdot y, z \rangle$,
$x, y, z \in T_vM$. The 4-tensor $(\nabla_w c)(x,y,z)$ must
be symmetric in $x, y, z, w \in T_vM$.

FM3. A linear vector field $E \in \mathrm{Vect}(M)$ (called the
Euler vector field) must be fixed on M, that is,
$\nabla\nabla E = 0$, such that

$$ \mathrm{Lie}_E(x \cdot y) - \mathrm{Lie}_E x \cdot y - x \cdot \mathrm{Lie}_E y = x \cdot y $$

$$ \mathrm{Lie}_E \langle\,,\rangle = (2 - d)\,\langle\,,\rangle $$

for some number $d \in k$ called the “charge.”


The last condition (also called quasihomogeneity)
means that the derivations $\partial_{\mathrm{Func}(M)} := E$,
$\partial_{\mathrm{Vect}(M)} := \mathrm{id} + \mathrm{ad}_E$ define on the space Vect(M) of vector fields
on M a structure of graded Frobenius algebra over
the graded ring of functions Func(M).

Flatness of the metric $\langle\,,\rangle$ implies local existence
of a system of flat coordinates $v^1, \dots, v^n$ on M.
Usually, they are chosen in such a way that

$$ e = \frac{\partial}{\partial v^1} $$

is the unity vector field. In such coordinates, the
problem of local classification of Frobenius manifolds
reduces to the WDVV associativity equations
[1], [3], [4]. Namely, $\eta_{\alpha\beta}$ is the constant Gram
matrix of the metric in these coordinates

$$ \eta_{\alpha\beta} := \left\langle \frac{\partial}{\partial v^\alpha}, \frac{\partial}{\partial v^\beta} \right\rangle $$

The structure constants of the Frobenius algebra
$A_v = T_vM$

$$ \frac{\partial}{\partial v^\alpha} \cdot \frac{\partial}{\partial v^\beta} = c^{\gamma}_{\alpha\beta}(v)\, \frac{\partial}{\partial v^\gamma} $$   [6]

can be locally represented by third derivatives [2] of
a function $F(v)$ satisfying [1], [3], [4]. The function
$F(v)$ is called the “potential” of the Frobenius manifold.
It is defined up to the addition of an at most quadratic
polynomial in $v^1, \dots, v^n$.

A generalization of the above definition to the 
case of Frobenius supermanifolds can be found in 
Manin (1999). For the more general class of the 
so-called F-manifolds, the requirement of the 
existence of a flat invariant metric has been relaxed. 


Deformed Flat Connection 


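One of the main geometrical structures of the theory
of Frobenius manifolds is the deformed flat connection.
This is a symmetric affine connection $\tilde\nabla$ on
$M \times \mathbb{C}^*$ defined by the following formulas:

$$ \tilde\nabla_x y = \nabla_x y + z\, x \cdot y, \quad x, y \in TM, \ z \in \mathbb{C}^* $$

$$ \tilde\nabla_{d/dz}\, y = \partial_z y + E \cdot y - \frac{1}{z} \mathcal{V} y $$   [7]

$$ \tilde\nabla_x \frac{d}{dz} = \tilde\nabla_{d/dz} \frac{d}{dz} = 0 $$

where, as above, $\nabla$ is the Levi-Civita connection for
the metric $\langle\,,\rangle$ and

$$ \mathcal{V} := \frac{2-d}{2} - \nabla E $$   [8]

is an operator on the tangent bundle TM antisymmetric
with respect to $\langle\,,\rangle$,

$$ \langle \mathcal{V} x, y \rangle = -\langle x, \mathcal{V} y \rangle $$

Observe that the unity vector field e is an eigenvector
of this operator with the eigenvalue

$$ \mathcal{V} e = -\frac{d}{2}\, e $$

The connection $\tilde\nabla = \tilde\nabla(z)$ is not metric but it satisfies

$$ \nabla \langle x, y \rangle = \langle \tilde\nabla(-z) x, y \rangle + \langle x, \tilde\nabla(z) y \rangle, \quad x, y \in TM $$

for any $z \in \mathbb{C}^*$. As was discovered in Dubrovin
(1992), vanishing of the curvature of the connection
$\tilde\nabla$ is essentially equivalent to the axioms of a
Frobenius manifold.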
Definition A “deformed flat function” $f(v;z)$ on a
domain in $M \times \mathbb{C}^*$ is defined by the requirement of
horizontality of the differential df

$$ \tilde\nabla\, df = 0 $$   [9]

Due to vanishing of the curvature of $\tilde\nabla$, locally
there exist n independent deformed flat functions
$f_1(v;z), \dots, f_n(v;z)$ such that their differentials,
together with the flat 1-form dz, span the cotangent
plane $T^*_{(v,z)}(M \times \mathbb{C}^*)$. They will be called “deformed
flat coordinates.” The global analytic properties of
deformed flat coordinates can be derived, for the
case of semisimple Frobenius manifolds, from the
results of the section “Moduli of semisimple
Frobenius manifolds” discussed later.

One can relax the definition of Frobenius manifold
by dropping the last axiom FM3. The potential $F(v)$ in
this case satisfies [1] and [3] but not [4]. In this case,
the deformed flat connection $\tilde\nabla$ is just a family of
affine flat connections on M depending on the
parameter $z \in \mathbb{C}$ given by the first line in [7]. The
curvature and torsion of this family of connections
vanish identically in z. The deformed flat functions
of $\tilde\nabla$ defined as in [9] can be chosen in the form of
power series in z. The flatness equations written in the
flat coordinates on M yield a recursion equation for
the coefficients of these power series

$$ \tilde\nabla\, df = 0, \quad f = \sum_{p \ge 0} \theta_p(v)\, z^p $$

$$ \partial_\alpha \partial_\beta f = z\, c^{\gamma}_{\alpha\beta}(v)\, \partial_\gamma f $$

$$ \partial_\alpha \partial_\beta \theta_0(v) = 0 $$

$$ \partial_\alpha \partial_\beta \theta_{p+1}(v) = c^{\gamma}_{\alpha\beta}(v)\, \partial_\gamma \theta_p(v), \quad p \ge 0 $$   [10]

Thus, $f(v;0)$ is just an affine linear function of the
flat coordinates $v^1, \dots, v^n$; the dependence on z can
be considered as a deformation of the affine
structure. This motivates the name “deformed flat
coordinates.” The coefficients of the expansions of
the deformed flat coordinates are the leading terms
of the $\epsilon$-expansion of the Hamiltonian densities
of the integrable hierarchies associated with the
Frobenius manifolds (see below).
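The recursion for the coefficients $\theta_p$ can be followed by hand in the simplest case $n = 1$, where $F(v) = v^3/6$ and the single structure constant equals 1. The normalization chosen below ($\theta_0 = v$, and $\theta_p(0) = \theta_p'(0) = 0$ for $p \ge 1$) is an assumption made for this illustration; other choices differ by adding affine functions of v at each order.

```latex
% n = 1: F = v^3/6, so \eta_{11} = 1 and c^1_{11} = 1
\theta_0'' = 0, \qquad \theta_{p+1}'' = \theta_p' \quad (p \ge 0)

% With \theta_0 = v and \theta_p(0) = \theta_p'(0) = 0 for p >= 1:
\theta_p(v) = \frac{v^{p+1}}{(p+1)!}

% Summing the series gives a deformed flat coordinate in closed form
f(v;z) = \sum_{p \ge 0} \frac{v^{p+1} z^p}{(p+1)!} = \frac{e^{zv} - 1}{z}

% Check of the flatness equation \partial_v^2 f = z\,\partial_v f:
\partial_v f = e^{zv}, \qquad \partial_v^2 f = z\,e^{zv}
```

At $z = 0$ this reduces to $f(v;0) = v$, the flat coordinate itself.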


Intersection Form of a 
Frobenius Manifold 


Another important geometric structure on M is the
intersection form of the Frobenius manifold. It is a
symmetric bilinear form on the cotangent bundle
$T^*M$ defined by the formula

$$ (\omega_1, \omega_2) = i_E(\omega_1 \cdot \omega_2), \quad \omega_1, \omega_2 \in T^*M $$   [11]

Here the multiplication law on the cotangent planes
is defined by means of the isomorphism

$$ \langle\,,\rangle : TM \to T^*M $$

The discriminant $\Sigma \subset M$ is a proper analytic (for an
analytic M) subset where the intersection form
degenerates. One can introduce a new metric on
the open subset $M \setminus \Sigma$ by taking the inverse of the
intersection form. A remarkable result of the theory
of Frobenius manifolds is the vanishing of the curvature
of this new metric. Moreover, the new flat metric
together with the following new multiplication:

$$ x * y := x \cdot y \cdot E^{-1} $$

defines on $M \setminus \Sigma$ a structure of an almost-dual
Frobenius manifold (Dubrovin 2004). In the original
flat coordinates $v^1, \dots, v^n$ the coordinate expressions
for the new metric and for the associated Levi-Civita
connection $\nabla^*$, called the Gauss–Manin connection,
read

$$ g^{\alpha\beta}(v) := (dv^\alpha, dv^\beta) = E^\epsilon(v)\, c^{\alpha\beta}_{\epsilon}(v) $$

$$ \nabla^{*\alpha} dv^\beta = \Gamma^{\alpha\beta}_{\gamma}(v)\, dv^\gamma $$   [12]

$$ \Gamma^{\alpha\beta}_{\gamma}(v) = c^{\alpha\epsilon}_{\gamma}(v) \left( \tfrac{1}{2} - \mathcal{V} \right)^{\beta}_{\epsilon} $$

where $\mathcal{V}$ is the operator [8]. The pair $(\,,)$ and $\langle\,,\rangle$ of bilinear forms on $T^*M$
possesses the following property crucial for understanding
the relationships between Frobenius manifolds
and integrable systems: they form a flat pencil.
That means that on the complement to the subset

$$ \Sigma_\lambda := \{ v \in M \mid \det( g^{\alpha\beta}(v) - \lambda \eta^{\alpha\beta} ) = 0 \} $$

the inverse to the bilinear form

$$ (\,,)_\lambda := (\,,) - \lambda \langle\,,\rangle $$   [13]

defines a metric with vanishing curvature. Flat
functions $p = p(v;\lambda)$ for the flat metric are determined
from the system

$$ (\nabla^* - \lambda \nabla)\, dp = 0 $$   [14]

They are called “periods” of the Frobenius manifold.
The periods $p(v;\lambda)$ are related to the deformed flat
functions $f(v;z)$ by the suitably regularized
Laplace-type integral transform

$$ p(v;\lambda) := \int e^{-\lambda z} f(v;z)\, \frac{dz}{\sqrt{z}} $$   [15]

Choosing a system of n independent periods, one
obtains a system of flat coordinates $p^1(v;\lambda), \dots,
p^n(v;\lambda)$ for the metric $(\,,)_\lambda$ on $M \setminus \Sigma_\lambda$,

$$ \big( dp^i(v;\lambda), dp^j(v;\lambda) \big)_\lambda = G^{ij} $$   [16]

for some constant nondegenerate matrix $G^{ij}$.


The structure of a flat pencil on the Frobenius
manifold M gives rise to a natural Poisson pencil
(= bi-Hamiltonian structure) on the infinite-dimensional
“manifold” $\mathcal{L}(M)$ consisting of smooth maps
of a circle to M (the so-called loop space). In the flat
coordinates $v^1, \dots, v^n$ for the metric $\langle\,,\rangle$ the
Poisson pencil has the form

$$ \{ v^\alpha(x), v^\beta(y) \}_1 = \eta^{\alpha\beta}\, \delta'(x - y) $$

$$ \{ v^\alpha(x), v^\beta(y) \}_2 = g^{\alpha\beta}(v(x))\, \delta'(x - y) + \Gamma^{\alpha\beta}_{\gamma}(v(x))\, v^\gamma_x\, \delta(x - y) $$   [17]

By definition of the Poisson pencil, the linear
combination $a_1 \{\,,\}_1 + a_2 \{\,,\}_2$ of the Poisson brackets
is again a Poisson bracket for arbitrary constants
$a_1, a_2$. Choosing a system of n independent periods
$p^i(v;\lambda)$, $i = 1, \dots, n$, as a new system of dependent
variables, one obtains a reduction of the Poisson
bracket $\{\,,\}_\lambda := \{\,,\}_2 - \lambda \{\,,\}_1$ for a given $\lambda$ to the
canonical form

$$ \{ p^i(v(x);\lambda), p^j(v(y);\lambda) \}_\lambda = G^{ij}\, \delta'(x - y) $$   [18]

Under an additional assumption of existence of a tau
function (Dubrovin 1996, Dubrovin and Zhang),
one can prove that any Poisson pencil on $\mathcal{L}(M)$ of
the form [17] with a nondegenerate matrix $(\eta^{\alpha\beta})$
comes from a Frobenius structure on M.


Canonical Coordinates on Semisimple 
Frobenius Manifolds 


Definition The Frobenius manifold M is called
semisimple if the algebras $T_vM$ are semisimple for
v belonging to an open dense subset in M.


Any n-dimensional semisimple Frobenius algebra
over $\mathbb{C}$ is isomorphic to the orthogonal direct sum of
n copies of one-dimensional algebras. In this section,
all the manifolds will be assumed to be complex
analytic.

Near a semisimple point, the roots $u_i = u_i(v)$,
$i = 1, \dots, n$, of the characteristic equation

$$ \det( g^{\alpha\beta}(v) - \lambda \eta^{\alpha\beta} ) = 0 $$   [19]

can be used as local coordinates. The vectors
$\partial/\partial u_i$, $i = 1, \dots, n$, are basic idempotents of the
algebras $T_vM$

$$ \frac{\partial}{\partial u_i} \cdot \frac{\partial}{\partial u_j} = \delta_{ij}\, \frac{\partial}{\partial u_i} $$

We call $u_1, \dots, u_n$ “canonical coordinates.” Observe
that we violate the indices convention by labeling the
canonical coordinates by subscripts. We will never
use summation over repeated indices when working
in the canonical coordinates. Actually, existence of
canonical coordinates can be proved without using
[4] (see details in Dubrovin (1992)).

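Choosing locally branches of the square roots

$$ \psi_{i1}(u) := \sqrt{ \langle \partial/\partial u_i, \partial/\partial u_i \rangle }, \quad i = 1, \dots, n $$   [20]

we obtain a transition matrix $\Psi = (\psi_{i\alpha}(u))$,

$$ \frac{\partial}{\partial v^\alpha} = \sum_{i=1}^{n} \frac{\psi_{i\alpha}(u)}{\psi_{i1}(u)}\, \frac{\partial}{\partial u_i} $$   [21]

from the basis $\partial/\partial v^\alpha$ to the orthonormal basis

$$ f_1 = \psi_{11}^{-1}(u)\, \frac{\partial}{\partial u_1}, \ \dots, \ f_n = \psi_{n1}^{-1}(u)\, \frac{\partial}{\partial u_n}, \qquad \langle f_i, f_j \rangle = \delta_{ij} $$   [22]

The matrix $\Psi(u)$ satisfies the orthogonality condition

$$ \Psi^*(u)\, \Psi(u) \equiv \eta, \qquad \eta = (\eta_{\alpha\beta}), \quad \eta_{\alpha\beta} := \left\langle \frac{\partial}{\partial v^\alpha}, \frac{\partial}{\partial v^\beta} \right\rangle $$

In this formula $\Psi^*$ stands for the transposed matrix.
The lengths [20] coincide with the first column of
this matrix.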

Denote $V(u) = (V_{ij}(u))$ the matrix of the antisymmetric
operator $\mathcal{V}$ [8] with respect to the orthonormal
frame

$$ V(u) := \Psi(u)\, \mathcal{V}\, \Psi^{-1}(u) $$   [23]

The antisymmetric matrix $V(u) = (V_{ij}(u))$ satisfies
the following system of commuting time-dependent
Hamiltonian flows on the Lie algebra $so(n)$
equipped with the standard Lie–Poisson brackets
$\{ V_{ij}, V_{kl} \} = V_{il}\delta_{jk} - V_{jl}\delta_{ik} + V_{jk}\delta_{il} - V_{ik}\delta_{jl}$:

$$ \frac{\partial V}{\partial u_i} = \{ V, H_i(V;u) \}, \quad i = 1, \dots, n $$   [24]

$$ H_i(V;u) = \frac{1}{2} \sum_{j \ne i} \frac{V_{ij}^2}{u_i - u_j} $$   [25]

The matrix $\Psi(u)$ satisfies

$$ \frac{\partial \Psi}{\partial u_i} = V_i(u)\, \Psi, \qquad V_i(u) := \mathrm{ad}_{E_i}\, \mathrm{ad}_U^{-1}(V(u)), \quad i = 1, \dots, n $$   [26]

Here the matrix unity $E_i$ has the entries $(E_i)_{ab} = \delta_{ai}\delta_{ib}$,
$U = \mathrm{diag}(u_1, \dots, u_n)$. Conversely, given a solution
to [24] and [26], one can reconstruct the
Frobenius manifold structure by quadratures
(Dubrovin 1998). The reconstruction depends on a
choice of an eigenvector of the constant matrix
$\mathcal{V} = \Psi^{-1}(u)\, V(u)\, \Psi(u)$.

The system [24] coincides with the equations of
isomonodromic deformations (see Isomonodromic
Deformations) of the following linear differential
operator with rational coefficients:

$$ \frac{dY}{dz} = \left( U + \frac{V}{z} \right) Y $$   [27]

The latter is nothing but the last component of the
deformed flat connection [7] written in the
orthonormal frame [22]. Other components of the
horizontality equations yield

$$ \partial_i Y = ( z E_i + V_i(u) )\, Y, \quad i = 1, \dots, n $$   [28]

The compatibility conditions of the system [27] and
[28] coincide with [24].

The integration of [24], [26] and, more generally,
the reconstruction of the Frobenius structure can be
reduced to a solution of a certain Riemann–Hilbert
problem (see Riemann–Hilbert Problem).

The isomonodromic tau function of the semisimple
Frobenius manifold is defined by

$$ d \log \tau(u) = \sum_{i=1}^{n} H_i(V(u);u)\, du_i $$   [29]

It is an analytic function on a suitable unramified
covering of the semisimple part of M.

Alternatively, eqns [24] can be represented as the
isomonodromy deformations of the dual Fuchsian
system

$$ (U - \lambda)\, \frac{d\phi}{d\lambda} = \left( \frac{1}{2} + V \right) \phi $$   [30]

The latter comes from the Gauss–Manin system for
the periods $p = p(v;\lambda)$ of the Frobenius manifold
written in the canonical coordinates [22].


Moduli of Semisimple 
Frobenius Manifolds 


All n-dimensional semisimple Frobenius manifolds 
form a finite-dimensional space. They depend on 
n(n — 1)/2 essential parameters. To parametrize the 
Frobenius manifolds one can choose, for example, 
the initial data for the isomonodromy deformation 
equations [24]. Alternatively, they can be parame- 
trized by monodromy data of the deformed flat 
connection according to the following construction. 

The first part of the monodromy data is the spectrum $(V, \langle\ ,\ \rangle, \hat\mu, R)$ of the Frobenius manifold associated with the Poisson pencil. Here $V$ is an $n$-dimensional linear space equipped with a symmetric nondegenerate bilinear form $\langle\ ,\ \rangle$. Two linear operators on $V$, a semisimple operator $\hat\mu: V \to V$ and a nilpotent operator $R: V \to V$, must satisfy the following properties. First, the operator $\hat\mu$ is antisymmetric:

$$\hat\mu^* = -\hat\mu$$ [31]

and the operator $R$ satisfies

$$R^* = -e^{\pi i \hat\mu} R\, e^{-\pi i \hat\mu}$$ [32]

Here the adjoint operators are defined with respect to the bilinear form $\langle\ ,\ \rangle$. The last condition to be imposed on the operator $R$ can be formulated in a simple way by choosing a basis $e_1, \ldots, e_n$ of eigenvectors of the semisimple operator $\hat\mu$,

$$\hat\mu e_\alpha = \mu_\alpha e_\alpha, \quad \alpha = 1, \ldots, n$$

We require the existence of a decomposition

$$R = R_0 + R_1 + R_2 + \cdots$$ [33]

where for any integer $k \ge 0$ the linear operator $R_k$ satisfies

$$R_k e_\alpha \in \mathrm{span}\{e_\beta \mid \mu_\beta = \mu_\alpha + k\} \quad \forall \alpha = 1, \ldots, n$$ [34]

In the nonresonant case, in which none of the differences of the eigenvalues of $\hat\mu$ is equal to a positive integer, all the matrices $R_1, R_2, \ldots$ are equal to zero. Observe a useful identity

$$z^{\hat\mu} R\, z^{-\hat\mu} = R_0 + z R_1 + z^2 R_2 + \cdots$$ [35]

More generally, for any operator $A: V \to V$ commuting with $e^{2\pi i \hat\mu}$ a decomposition is defined as

$$A = \bigoplus_{k \in \mathbb{Z}} [A]_k, \qquad z^{\hat\mu} A\, z^{-\hat\mu} = \sum_{k \in \mathbb{Z}} z^k [A]_k$$ [36]

In particular, $[R]_k = R_k$, $k \ge 0$, $[R]_k = 0$, $k < 0$.

One also has to choose an eigenvector $e$ of the operator $\hat\mu$ such that $R_0 e = 0$; denote by $-d/2$ the corresponding eigenvalue:

$$e \in V, \quad \hat\mu e = -\frac{d}{2}\, e, \quad R_0 e = 0$$ [37]

The second part of the monodromy data is a pair of linear operators

$$C: V \to \mathbb{C}^n, \qquad S: \mathbb{C}^n \to \mathbb{C}^n$$

The space $\mathbb{C}^n$ is assumed to be equipped with the standard complex Euclidean structure given by the sum of squares. The properties of the operators $S$, $C$ depend on the choice of an unordered set $u^0 = (u^0_1, \ldots, u^0_n)$ of $n$ pairwise distinct complex numbers and on a choice of a ray $\ell_+$ on an auxiliary complex $z$-plane starting at the origin such that

$$\mathrm{Re}\, z (u^0_i - u^0_j) \neq 0, \quad i \neq j, \quad z \in \ell_+$$ [38]

Let us order the complex numbers in such a way that

$$e^{z(u^0_i - u^0_j)} \to 0, \quad i < j, \quad |z| \to \infty, \quad z \in \ell_+$$ [39]

The operator $S$ must be upper triangular

$$S = (S_{ij}), \quad S_{ij} = 0, \ i > j, \qquad S_{ii} = 1, \ i = 1, \ldots, n$$ [40]
The operator $C$ must satisfy

$$C^* S\, C = e^{\pi i \hat\mu} e^{\pi i R}$$ [41]

Here the adjoint operator $C^*$ is understood as follows:

$$C^*: \mathbb{C}^n \simeq (\mathbb{C}^n)^* \to V^* \simeq V$$


The group of diagonal $n \times n$ matrices

$$D = \mathrm{diag}(\pm 1, \ldots, \pm 1)$$

acts on the pairs $(S, C)$ by

$$S \mapsto D S D, \qquad C \mapsto D C$$

One is to factor out the action of this diagonal group. Besides, the operator $C$ is defined up to a left action of a certain group of linear operators depending on the spectrum.

For the generic (i.e., nonresonant) case where $e^{2\pi i \hat\mu}$ has simple spectrum, the operator $C$ is defined up to left multiplication by any matrix commuting with $e^{2\pi i \hat\mu}$. In this situation, the monodromy data $(\hat\mu, R, S, C)$ are locally uniquely determined by the $n(n-1)/2$ entries of the matrix $S$. Therefore, near a generic point, the variety of the monodromy data is a smooth manifold of dimension $n(n-1)/2$. At nongeneric points, the variety can get additional strata.

The monodromy data S, C are determined at an 
arbitrary semisimple point of a Frobenius manifold 
in terms of the analytic properties of horizontal 
sections of the deformed flat connection $\tilde\nabla$ [7] in the
complex z-plane (the so-called “Stokes matrix” and 
the “central connection matrix” of the operator 
[27]). Locally, they do not depend on the point of 
the semisimple Frobenius manifold (the isomono- 
dromicity property). 

We will now describe the reconstruction procedure giving a parametrization of semisimple Frobenius manifolds in terms of the monodromy data $(\hat\mu, R, S, C)$.

To reconstruct the Frobenius manifold near a semisimple point with the canonical coordinates $u^0_1, \ldots, u^0_n$, one is to solve the following boundary-value problem. Let

$$\ell = (-\ell_-) \cup \ell_+$$

be the oriented line on the complex $z$-plane chosen as in [38]. Here the ray $\ell_-$ is opposite to $\ell_+$. Denote by $\Pi_R / \Pi_L$ the right/left half-planes with respect to $\ell$. To reconstruct the Frobenius manifold, one is to find three matrix-valued functions $\Phi_0(z; u)$, $\Phi_R(z; u)$, and $\Phi_L(z; u)$:

$$\Phi_0(z; u): V \to \mathbb{C}^n, \qquad \Phi_{R/L}(z; u): \mathbb{C}^n \to \mathbb{C}^n$$

for $u$ close to $u^0$ such that $\Phi_0(z; u)$ is analytic and invertible for $z \in \mathbb{C}$, $\Phi_R(z; u) / \Phi_L(z; u)$ are analytic and invertible for $z \in \Pi_R / \Pi_L$, respectively, continuous up to the boundary $\ell \setminus 0$, and

$$\Phi_{R/L}(z; u) \sim 1 + O(1/z), \quad |z| \to \infty, \quad z \in \Pi_R / \Pi_L$$

The boundary values of the functions $\Phi_0(z; u)$, $\Phi_R(z; u)$, and $\Phi_L(z; u)$ must satisfy the following boundary-value problem (as above, $U = \mathrm{diag}(u_1, \ldots, u_n)$):

$$\Phi_R(z; u) = \Phi_L(z; u)\, e^{zU} S\, e^{-zU}, \quad z \in \ell_+$$ [42]

$$\Phi_R(z; u) = \Phi_L(z; u)\, e^{zU} S^T e^{-zU}, \quad z \in \ell_-$$ [43]

$$\Phi_0(z; u)\, z^{\hat\mu} z^R = \Phi_R(z; u)\, e^{zU} C, \quad z \in \Pi_R$$
$$\Phi_0(z; u)\, z^{\hat\mu} z^R = \Phi_L(z; u)\, e^{zU} S\, C, \quad z \in \Pi_L$$ [44]

Here $z^{\hat\mu} := e^{\hat\mu \log z}$, $z^R := e^{R \log z}$ are considered as $\mathrm{Aut}(V)$-valued functions on the universal covering of $\mathbb{C} \setminus 0$; the branch cut in the definition of $\log z$ is chosen to be along $\ell_-$.

The solution of the above boundary-value problem [42]-[44], if it exists, is unique. It can be reduced to a certain Riemann-Hilbert problem, that is, to a problem of factorization of an analytic $n \times n$ nondegenerate matrix-valued function on the annulus

$$G(z; u), \quad r < |z| < R, \quad \det G(z; u) \neq 0$$

depending on the parameter $u = (u_1, \ldots, u_n)$ in a product

$$G(z; u) = G_0(z; u)^{-1} G_\infty(z; u)$$ [45]

of two matrix-valued functions $G_0(z; u)$ and $G_\infty(z; u)$ analytic for $|z| < R$ and $r < |z| \le \infty$, respectively, with nowhere-vanishing determinant.
Existence of a solution to the Riemann-Hilbert problem for a given $u = (u_1, \ldots, u_n)$, $u_i \neq u_j$ for $i \neq j$, means triviality of a certain $n$-dimensional vector bundle over the Riemann sphere with the transition functions given by $G(z; u)$. Existence of the solution for $u = u^0$ implies solvability of the Riemann-Hilbert problem for $u$ sufficiently close to $u^0$. From these arguments, it can be deduced that the matrices $\Phi_0(z; u)$, $\Phi_{R/L}(z; u)$ are analytic in $(z; u)$ for $u$ sufficiently close to $u^0$. Moreover, they can be analytically continued in $u$ to the universal covering of the space of configurations of $n$ distinct points on the complex plane:

$$\left(\mathbb{C}^n \setminus \bigcup_{i<j} \{u_i = u_j\}\right) / S_n$$ [46]

The resulting functions are meromorphic on the universal covering, according to the results of B Malgrange and T Miwa. The structure of the global analytic continuation is given (Dubrovin 1999) in terms of a certain action of the braid group

$$B_n = \pi_1\left(\left(\mathbb{C}^n \setminus \bigcup_{i<j} \{u_i = u_j\}\right) / S_n\right)$$

on the monodromy data.


Examples of Frobenius Manifolds 


Example 0 Trivial Frobenius manifold, $M = A_0$ a graded Frobenius algebra, $F(v) = \frac{1}{6}\langle e, v \cdot v \cdot v \rangle$ is a cubic polynomial.


First nontrivial examples appeared in the setting 
of 2D topological field theories (Dijkgraaf et al. 
1991, Witten 1991) (see Topological Quantum Field 
Theory: Overview). Mathematical formalization of 
these ideas gives rise to the following two classes of 
examples. 


Example 1 Frobenius structure on the base of an isolated hypersurface singularity. The construction (Hertling 2002, Sabbah 2002) uses the K Saito theory of periods of primitive forms. For the example of the $A_n$ singularity $f(x) = x^{n+1}$, the Frobenius structure on the base of the universal unfolding

$$M_{A_n} = \{ f_s(x) = x^{n+1} + s_1 x^{n-1} + \cdots + s_n \mid s_1, \ldots, s_n \in \mathbb{C} \}$$

is constructed as follows (Dijkgraaf et al. 1991):

$$e = \frac{\partial}{\partial s_n}, \qquad E = \frac{1}{n+1} \sum_{k=1}^{n} (k+1)\, s_k \frac{\partial}{\partial s_k}$$


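The multiplication is introduced by identifying the tangent space $T_s M_{A_n}$ with the quotient algebra

$$T_s M_{A_n} = \mathbb{C}[x] / (f_s'(x))$$

The metric has the form

$$\langle \partial_{s_i}, \partial_{s_j} \rangle = -(n+1)\, \mathrm{res}_{x=\infty} \frac{\partial f_s(x)/\partial s_i \; \partial f_s(x)/\partial s_j}{f_s'(x)}\, dx$$

The flat coordinates $v_\alpha = v_\alpha(s)$ can be found from the expansion of the solution to the equation $f_s(x) = k^{n+1}$,

$$x = k - \frac{1}{n+1}\left(\frac{v_n}{k} + \frac{v_{n-1}}{k^2} + \cdots + \frac{v_1}{k^n}\right) + O\left(\frac{1}{k^{n+1}}\right)$$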
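The potentials of the Frobenius manifolds $M_{A_n}$ for $n = 1, 2, 3$ read

$$F_{A_1} = \frac{1}{6} v_1^3$$
$$F_{A_2} = \frac{1}{2} v_1^2 v_2 - \frac{1}{72} v_2^4$$ [47]
$$F_{A_3} = \frac{1}{2} v_1 v_2^2 + \frac{1}{2} v_1^2 v_3 - \frac{1}{16} v_2^2 v_3^2 + \frac{1}{960} v_3^5$$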


The space of polynomials $M_{A_n}$ can be identified with the orbit space $\mathbb{C}^n / W(A_n)$ of the Weyl group of type $A_n$. More generally (Dubrovin 1996), the orbit space $M_W := \mathbb{C}^n / W$ of an arbitrary irreducible finite Coxeter group $W \subset O(n)$ carries a natural structure of a polynomial semisimple Frobenius manifold. Conversely, all irreducible polynomial semisimple Frobenius manifolds with positive degrees of the flat coordinates can be obtained by this construction (Hertling 2002). Generalizations for the orbit spaces of certain infinite groups were obtained in Dubrovin and Zhang (1998b) and Bertola (2000).
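As a sanity check on the $A_3$ potential, the associativity (WDVV) equations can be verified with exact rational arithmetic. The sketch below assumes the equations in the form $F_{\alpha\beta\lambda}\eta^{\lambda\mu}F_{\mu\gamma\delta} = F_{\delta\beta\lambda}\eta^{\lambda\mu}F_{\mu\gamma\alpha}$ and assumes that in the flat coordinates $\eta^{\lambda\mu} = \delta_{\lambda+\mu,\,n+1}$ (antidiagonal), which is what $F_{1\alpha\beta}$ gives for a potential with leading terms $\frac12 v_1 v_2^2 + \frac12 v_1^2 v_3$. The polynomial representation and all function names are illustrative and not taken from the article.

```python
from fractions import Fraction as Fr
from itertools import product


def diff(poly, i):
    """Partial derivative in variable i of a polynomial {exponent tuple: coefficient}."""
    out = {}
    for exps, c in poly.items():
        if exps[i]:
            lowered = exps[:i] + (exps[i] - 1,) + exps[i + 1:]
            out[lowered] = c * exps[i]
    return out


def mul(p, q):
    """Product of two polynomials in the same dictionary representation."""
    out = {}
    for (e1, c1), (e2, c2) in product(p.items(), q.items()):
        e = tuple(a + b for a, b in zip(e1, e2))
        out[e] = out.get(e, 0) + c1 * c2
    return out


def satisfies_wdvv(F, n):
    """Check F_{abl} eta^{lm} F_{mcd} = F_{dbl} eta^{lm} F_{mca} for all indices.

    Assumption: eta^{lm} is antidiagonal, i.e. m = n - 1 - l with 0-based indices.
    """
    third = {(a, b, c): diff(diff(diff(F, a), b), c)
             for a, b, c in product(range(n), repeat=3)}
    for a, b, c, d in product(range(n), repeat=4):
        defect = {}
        for l in range(n):
            m = n - 1 - l
            for e, coeff in mul(third[a, b, l], third[m, c, d]).items():
                defect[e] = defect.get(e, 0) + coeff
            for e, coeff in mul(third[d, b, l], third[m, c, a]).items():
                defect[e] = defect.get(e, 0) - coeff
        if any(defect.values()):
            return False
    return True


# Exponent tuples are (power of v1, power of v2, power of v3).
F_A3 = {(2, 0, 1): Fr(1, 2), (1, 2, 0): Fr(1, 2),
        (0, 2, 2): Fr(-1, 16), (0, 0, 5): Fr(1, 960)}

if __name__ == "__main__":
    print(satisfies_wdvv(F_A3, 3))
```

Changing the coefficient of $v_3^5$ away from $1/960$ makes the check fail, which illustrates how associativity ties the coefficients of the potential together.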


Example 2 Gromov-Witten (GW) invariants (see Topological Sigma Models). Let $X$ be a smooth projective variety. We will assume for simplicity that $H^{odd}(X) = 0$. To every such variety, one can associate a collection of rational numbers. They are expressed in terms of intersection theory of certain cycles on the moduli spaces $X_{g,m,\beta}$ of stable genus $g$ and degree $\beta$ curves on $X$ with $m$ marked points (see details in Kontsevich and Manin (1994)):

$$X_{g,m,\beta} := \{ f: (C_g, x_1, \ldots, x_m) \to X, \ f_*[C_g] = \beta \in H_2(X; \mathbb{Z}) \}$$ [48]

Denote $n := \dim H^*(X; \mathbb{C})$. Choosing a basis $\phi_1 = 1, \phi_2, \ldots, \phi_n$, we define the numbers

$$\langle \tau_{p_1}(\phi_{\alpha_1}) \cdots \tau_{p_m}(\phi_{\alpha_m}) \rangle_{g,\beta} := \int_{[X_{g,m,\beta}]^{virt}} ev_1^*(\phi_{\alpha_1}) \wedge c_1^{p_1}(\mathcal{L}_1) \wedge \cdots \wedge ev_m^*(\phi_{\alpha_m}) \wedge c_1^{p_m}(\mathcal{L}_m)$$ [49]

for arbitrary non-negative integers $p_1, \ldots, p_m$. Here the evaluation maps $ev_i$, $i = 1, \ldots, m$, are given by

$$ev_i: X_{g,m,\beta} \to X, \quad f \mapsto f(x_i)$$

The so-called tautological line bundles $\mathcal{L}_i$ over $X_{g,m,\beta}$ by definition have the fiber $T^*_{x_i} C_g$, $i = 1, \ldots, m$ (see the article Moduli Spaces: An Introduction regarding the construction of the so-called virtual fundamental class $[X_{g,m,\beta}]^{virt}$). The numbers [49] can be defined for an arbitrary compact symplectic manifold $X$, where one is to deal with the intersection theory on the moduli spaces of pseudoholomorphic curves, fixing a suitable almost-complex structure on $X$. They depend only on the symplectic structure on $X$. In particular, the numbers

$$\langle \tau_0(\phi_{\alpha_1}) \cdots \tau_0(\phi_{\alpha_m}) \rangle_{g,\beta}$$ [50]

are called the genus $g$ and degree $\beta$ GW invariants of $X$. In certain cases, they admit an interpretation in terms of enumerative geometry of the variety $X$ (Kontsevich and Manin 1994). The numbers [49] with some of $p_i > 0$ are called "gravitational descendents."


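One can form a generating function of the numbers [49]

$$\mathcal{F}_g^X = \sum_m \sum_{\beta \in H_2(X; \mathbb{Z})} \frac{1}{m!}\, t^{\alpha_1, p_1} \cdots t^{\alpha_m, p_m} \langle \tau_{p_1}(\phi_{\alpha_1}) \cdots \tau_{p_m}(\phi_{\alpha_m}) \rangle_{g,\beta}$$ [51]

(summation over repeated indices $1 \le \alpha_1, \ldots, \alpha_m \le n$ will always be assumed). Here $t^{\alpha,p}$ are indeterminates labeled by pairs $(\alpha, p)$ with $\alpha = 1, \ldots, n$, $p = 0, 1, 2, \ldots$. (Usually one is to insert in the definition of $\mathcal{F}_g^X$ elements $q^\beta$ of the Novikov ring $\mathbb{C}[H_2(X; \mathbb{Z})]$. However, due to the divisor axiom (Kontsevich and Manin 1994), these insertions can be compensated by a suitable shift in the space of couplings $t = (t^{\alpha,p})$.) We finally introduce the full generating function called the total GW potential (it is also called the free energy of the topological sigma model with the target space $X$)

$$\mathcal{F}^X(t; \epsilon) = \sum_{g \ge 0} \epsilon^{2g-2} \mathcal{F}_g^X$$ [52]

Restricting the genus-zero generating function onto the so-called small phase space

$$F(v) := \mathcal{F}_0^X \big|_{t^{\alpha,0} = v^\alpha, \ t^{\alpha,p} = 0 \ (p > 0)}$$ [53]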


one obtains a solution to the WDVV associativity equations. This solution defines a structure of (formal) Frobenius manifold on $H^*(X)$ with the bilinear form $\eta$ given by the Poincaré pairing

$$\eta_{\alpha\beta} = \int_X \phi_\alpha \wedge \phi_\beta$$

the unity

$$e = \frac{\partial}{\partial v^1}$$

and the Euler vector field

$$E = \sum_\alpha \left[ (1 - q_\alpha) v^\alpha + r_\alpha \right] \frac{\partial}{\partial v^\alpha}$$

Here the numbers $q_\alpha$, $r_\alpha$ are defined by the conditions

$$\phi_\alpha \in H^{2 q_\alpha}(X), \qquad c_1(X) = \sum_\alpha r_\alpha \phi_\alpha$$


The resulting Frobenius manifold will be denoted $M_X$. The corresponding $n$-parameter family of $n$-dimensional algebras on the tangent spaces $T_v M_X$ is also called "quantum cohomology" $QH^*(X)$. At the point $v_{cl} \in M_X$ of classical limit, the algebra $T_{v_{cl}} M_X$ coincides with the cohomology ring $H^*(X)$. In all known examples, the series [53] actually converges in a neighborhood of the point $v_{cl}$. Therefore, one obtains a genuine Frobenius structure on a domain $M_X \subset H^*(X; \mathbb{C}) / 2\pi i H^2(X; \mathbb{Z})$. However, a general proof of convergence is still missing.

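In particular, for $X = \mathbb{P}^d$ with $d = 1$, the quantum cohomology of the complex projective line $\mathbb{P}^1$ is a two-dimensional Frobenius manifold with the potential, unity, and the Euler vector field

$$F(u, v) = \frac{1}{2} u v^2 + e^u, \qquad e = \frac{\partial}{\partial v}, \qquad E = v \frac{\partial}{\partial v} + 2 \frac{\partial}{\partial u}$$

For $d = 2$ one has a three-dimensional Frobenius manifold $QH^*(\mathbb{P}^2)$ with

$$F(v_1, v_2, v_3) = \frac{1}{2} v_1^2 v_3 + \frac{1}{2} v_1 v_2^2 + \sum_{k \ge 1} N_k \frac{v_3^{3k-1}}{(3k-1)!}\, e^{k v_2}$$
$$e = \frac{\partial}{\partial v_1}, \qquad E = v_1 \frac{\partial}{\partial v_1} + 3 \frac{\partial}{\partial v_2} - v_3 \frac{\partial}{\partial v_3}$$ [54]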


where $N_k$ is the number of rational curves of degree $k$ on $\mathbb{P}^2$ passing through $3k - 1$ generic points. WDVV [5] yields (Kontsevich and Manin 1994) recursion relations for the numbers $N_k$, starting from $N_1 = 1$. A closed analytic formula for the function [54] is still unknown.
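The recursion for the numbers $N_k$ is easy to run. The sketch below uses the explicit form commonly known as Kontsevich's formula, $N_d = \sum_{d_1 + d_2 = d} N_{d_1} N_{d_2} \left[ d_1^2 d_2^2 \binom{3d-4}{3d_1-2} - d_1^3 d_2 \binom{3d-4}{3d_1-1} \right]$. This explicit form is not written out in the text above and is quoted here as an assumption; the function name is illustrative.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def rational_curve_count(d: int) -> int:
    """N_d: number of degree-d rational plane curves through 3d - 1 generic points."""
    if d < 1:
        raise ValueError("degree must be a positive integer")
    if d == 1:
        return 1  # a single line passes through two generic points
    total = 0
    for d1 in range(1, d):
        d2 = d - d1
        total += rational_curve_count(d1) * rational_curve_count(d2) * (
            d1 ** 2 * d2 ** 2 * comb(3 * d - 4, 3 * d1 - 2)
            - d1 ** 3 * d2 * comb(3 * d - 4, 3 * d1 - 1)
        )
    return total


if __name__ == "__main__":
    print([rational_curve_count(d) for d in range(1, 6)])
```

The first values produced this way are 1, 1, 12, 620, 87304, in agreement with the classical counts of lines, conics, and nodal cubics.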

Only for certain very exceptional $X$ is the Frobenius manifold $M_X$ semisimple (e.g., for $X = \mathbb{P}^d$). The general geometrical reasons for the semisimplicity of $M_X$ are yet to be understood.

For the case where $X$ is a Calabi-Yau manifold, the Frobenius manifold $QH^*(X)$ is never semisimple. This Frobenius structure can be computed in terms of the mirror symmetry construction (see Mirror Symmetry: A Geometric Survey).


Frobenius Manifold and Integrable 
Systems 


The identities in the cohomology ring generated by the cocycles $ev_i^*(\phi_\alpha)$ and $\psi_i := c_1(\mathcal{L}_i)$ can be recast into the form of differential equations for the generating function [52]. The variable $x := t^{1,0}$ corresponding to $\phi_1 = 1$ plays a distinguished role in these differential equations. According to the idea of Witten (1991), the differential equations for the generating functions can be written as a hierarchy of systems of $n$ evolutionary PDEs ($n = \dim H^*(X)$) for the unknown functions

$$w_\alpha = \langle\langle \tau_0(\phi_\alpha) \tau_0(\phi_1) \rangle\rangle = \epsilon^2 \frac{\partial^2 \mathcal{F}^X(t; \epsilon)}{\partial t^{1,0}\, \partial t^{\alpha,0}}$$ [55]

The variable $x$ is the spatial variable of the equations of the hierarchy. The remaining parameters (coupling constants) $t^{\alpha,p}$ of the generating function play the role of the time variables. Witten suggested to use the two-point correlators

$$h_{\alpha,p} = \langle\langle \tau_{p+1}(\phi_\alpha) \tau_0(1) \rangle\rangle = \epsilon^2 \frac{\partial^2 \mathcal{F}^X(t; \epsilon)}{\partial t^{1,0}\, \partial t^{\alpha,p+1}}$$

as the densities of the Hamiltonians of the flows of the hierarchy.

Existence of such a hierarchy can be proved for the case of GW invariants (and their descendents) of complex projective spaces $\mathbb{P}^d$ (the results of Givental (2001) along with Dubrovin and Zhang (2005) can be used). For $d = 0$ one obtains, according to the celebrated result by Kontsevich conjectured by Witten (see Topological Gravity, Two-Dimensional), the tau function of the solution to the KdV hierarchy (see Korteweg-de Vries Equation and Other Modulation Equations) specified by the initial condition

$$u(x) \big|_{t=0} = x$$ [56]

For $d = 1$ the hierarchy in question is the extended Toda lattice (see details in Dubrovin and Zhang (2004); see also Toda Lattices). For all other $d \ge 2$, the needed integrable hierarchy is a new one. It can be associated (Dubrovin and Zhang) with an arbitrary $n$-dimensional semisimple Frobenius manifold $M$. The equations of the hierarchy have the form


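$$w^i_t = A^i_j(w)\, w^j_x + \epsilon^2 \left[ B^i_j(w)\, w^j_{xxx} + C^i_{jk}(w)\, w^j_x w^k_{xx} + D^i_{jkl}(w)\, w^j_x w^k_x w^l_x \right] + O(\epsilon^4), \quad i = 1, \ldots, n$$ [57]

The coefficients of $\epsilon^{2g}$ are graded homogeneous polynomials in $w_x$, $w_{xx}$, etc., of the degree $2g + 1$,

$$\deg \frac{d^m w}{dx^m} = m$$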


The construction of the hierarchy is done in two steps. First, we construct the leading approximation (Dubrovin 1992). The equation of the hierarchy specifying the dependence on $t = t^{\alpha,p}$ at $\epsilon = 0$ reads

$$\frac{\partial v}{\partial t^{\alpha,p}} = \partial_x \left( \nabla \theta_{\alpha,p+1}(v) \right)$$ [58]

The functions $\theta_{\alpha,p}(v)$, $v \in M$, are the coefficients of expansion [10] of the deformed flat functions normalized by $\theta_{\alpha,0} = v_\alpha$. The solution $v = v(x, t)$ of interest is determined from the implicit function equations

$$v = x\, e + \sum_{\alpha,p} t^{\alpha,p}\, \nabla \theta_{\alpha,p}(v)$$ [59]


Next, one has to find a solution

$$\Delta \mathcal{F} = \sum_{g \ge 1} \epsilon^{2g-2} \mathcal{F}_g(v; v_x, \ldots, v^{(3g-2)})$$ [60]

of the following universal loop equation (closely related with the Virasoro conjecture of Eguchi and Xiong (1998)):


[Equation [61]: the universal loop equation for $\Delta \mathcal{F}$; the displayed formula is not recoverable from this text.]


Here $U = U(v)$ is the operator of multiplication by $E(v)$, and $p_\alpha = p_\alpha(v; \lambda)$, $\alpha = 1, \ldots, n$, is a system of flat coordinates [16] of the bilinear form [13]. The substitution

$$v_\alpha \mapsto w_\alpha = v_\alpha + \epsilon^2\, \partial_x \partial_{t^{\alpha,0}} \Delta \mathcal{F}(v; v_x, v_{xx}, \ldots; \epsilon^2)$$ [62]

transforms [58] to [57]. The terms of the expansion [60] are not polynomial in the derivatives. For example (Dubrovin and Zhang 1998a),

$$\mathcal{F}_1 = \frac{1}{24} \sum_{i=1}^{n} \log u_i' + \log \frac{\tau_I(u)}{J^{1/24}(u)}, \qquad J(u) = \det\left( \frac{\partial v^\alpha}{\partial u_i} \right) = \pm \prod_{i=1}^{n} \psi_{i1}(u)$$ [63]

(the canonical coordinates have been used), where $\tau_I(u)$ is the isomonodromic tau function [29]. The transformation [62] applied to the solution [59] expresses higher-genus GW invariants of a variety $X$ with semisimple quantum cohomology $QH^*(X)$ via the genus-zero invariants. For the particular case of $X = \mathbb{P}^2$, the formula [63] yields (Dubrovin and Zhang 1998a)

$$\frac{\phi''' - 27}{8(27 + 2\phi' - 3\phi'')} = -\frac{1}{8} + \sum_{k \ge 1} k\, N_k^{(1)} \frac{e^{kz}}{(3k)!}$$

Here

$$\phi(z) = \sum_{k > 0} N_k^{(0)} \frac{e^{kz}}{(3k-1)!}$$

is the generating function of the genus-zero GW invariants $N_k^{(0)} = N_k$ of $\mathbb{P}^2$ (see [54]) and $N_k^{(1)}$ is the number of elliptic plane curves of degree $k$ passing through $3k$ generic points.
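The identity for $\mathbb{P}^2$ can be solved order by order in $q = e^z$ to extract the elliptic counts from the rational ones. The following sketch does this with exact fractions. It assumes the explicit Kontsevich recursion for $N_k^{(0)}$ (not written out in the text) and reads the identity as an equality of power series in $q$; all function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def rational_counts(max_degree):
    """Genus-zero numbers via Kontsevich's recursion (assumed explicit form)."""
    n = [0, 1]  # n[k] = N_k^(0); index 0 is unused
    for d in range(2, max_degree + 1):
        n.append(sum(
            n[a] * n[d - a] * (a * a * (d - a) ** 2 * comb(3 * d - 4, 3 * a - 2)
                               - a ** 3 * (d - a) * comb(3 * d - 4, 3 * a - 1))
            for a in range(1, d)))
    return n


def elliptic_counts(max_degree):
    """Solve the genus-one identity order by order in q = e^z for N_k^(1)."""
    n0 = rational_counts(max_degree)
    phi = [Fraction(0)] + [Fraction(n0[k], factorial(3 * k - 1))
                           for k in range(1, max_degree + 1)]
    # q-series of the numerator phi''' - 27 and of (denominator / 8).
    num = [Fraction(-27)] + [k ** 3 * phi[k] for k in range(1, max_degree + 1)]
    den = [Fraction(27)] + [(2 * k - 3 * k * k) * phi[k]
                            for k in range(1, max_degree + 1)]
    # Write num / (8 den) = -1/8 + sum_k c_k q^k, so (num + den) / 8 = den * c.
    c = [Fraction(0)] * (max_degree + 1)
    for k in range(1, max_degree + 1):
        known = sum(den[j] * c[k - j] for j in range(1, k))
        c[k] = ((num[k] + den[k]) / 8 - known) / den[0]
    # c_k = k N_k^(1) / (3k)!
    return [int(c[k] * factorial(3 * k) / k) for k in range(1, max_degree + 1)]


if __name__ == "__main__":
    print(elliptic_counts(5))
```

The first nonzero value appears at degree 3, where a single smooth cubic passes through nine generic points.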


See also: Bi-Hamiltonian Methods in Soliton Theory; 
Functional Equations and Integrable Systems; Integrable 
Systems: Overview; Isomonodromic Deformations; 
Korteweg-de Vries Equation and Other Modulation 
Equations; Mirror Symmetry: A Geometric Survey; Moduli 
Spaces: An Introduction; Painlevé Equations; 
Riemann-Hilbert Problem; Toda Lattices; Topological 
Gravity, Two-Dimensional; Topological Quantum Field 
Theory: Overview; Topological Sigma Models. 
Introduction


Practically any physical, chemical, or biological system can exhibit rhythmic oscillatory activity, at least when the conditions are right. Winfree (2001) reviews the ubiquity of oscillations in nature, ranging from autocatalytic chemical reactions to pacemaker cells in the heart, to animal gaits, and to circadian rhythms. When coupled, even weakly, oscillators interact via adjustment of their phases, that is, their timing, often leading to synchronization. In this article, we review the most important concepts needed to study and understand the dynamics of coupled oscillators.

From a mathematical point of view, an oscillator is a dynamical system,

$$\dot{x} = f(x), \quad x \in \mathbb{R}^m$$ [1]

having a limit-cycle attractor, that is, a periodic orbit $\gamma \subset \mathbb{R}^m$. Its period is the minimal $T > 0$ such that

$$\gamma(t) = \gamma(t + T) \quad \text{for any } t$$

and its frequency is $\Omega = 2\pi / T$. Let $x(0) = x_0 \in \gamma$ be an arbitrary point on the attractor; then the state of the system, $x(t)$, is uniquely defined by its phase $\vartheta \in S^1$ relative to $x_0$, where $S^1$ is the unit circle.

Throughout this article, we assume that the periodic orbit $\gamma$ is exponentially stable, which implies normal hyperbolicity. In this case, there is a continuous transformation $\Theta: U \to S^1$ defined in a neighborhood $U \supset \gamma$ such that $\vartheta(t) = \Theta(x(t))$ for any trajectory in $U$, that is, $\Theta$ maps solutions of [1] to solutions of

$$\dot\vartheta = \Omega$$ [2]

Such a transformation removes the amplitude but saves the phase of oscillation.

Accordingly, there is a continuous transformation that maps solutions of the weakly coupled network of $n$ oscillators,

$$\dot{x}_i = f_i(x_i) + \varepsilon g_i(x_1, \ldots, x_n, \varepsilon), \quad \varepsilon \ll 1$$ [3]

onto solutions of the phase system

$$\dot\vartheta_i = \Omega_i + \varepsilon h_i(\vartheta_1, \ldots, \vartheta_n, \varepsilon), \quad \vartheta_i \in S^1$$ [4]

which is easier for studying the collective properties of [3].


Figure 1 A 2-torus and its representation on the square, panels (a), (b), (c). (Modified from Hoppensteadt and Izhikevich 1997.)


Figure 2 Various degrees of locking of oscillators; one labeled region is entrainment (1:1 frequency locking). (Modified from Izhikevich 2006.)


The oscillators are said to be frequency locked when [4] has a stable periodic orbit $\vartheta(t) = (\vartheta_1(t), \ldots, \vartheta_n(t))$ on the $n$-torus $\mathbb{T}^n$, as in Figure 1a. The "rotation vector" or "winding ratio" of the orbit is the set of integers $q_1 : q_2 : \cdots : q_n$ such that $\vartheta_1$ makes $q_1$ rotations while $\vartheta_2$ makes $q_2$ rotations, etc., as in the 2:3 frequency locking in Figure 1a. The oscillators are entrained when they are $1:1:\cdots:1$ frequency locked. The oscillators are phase locked when there is an $(n-1) \times n$ integer matrix $K$ having linearly independent rows such that $K\vartheta(t) = \text{const}$. For example, the two oscillators in Figure 1b are phase locked with $K = (2, 3)$, while those in Figure 1c are not. The oscillators are synchronized when they are entrained and phase locked. Synchronization is in-phase when $\vartheta_1(t) = \cdots = \vartheta_n(t)$ and out-of-phase otherwise. Two oscillators are said to be synchronized antiphase when $\vartheta_1(t) - \vartheta_2(t) = \pi$. Frequency locking without phase locking, as in Figure 1c, is called phase trapping. The relationship between all these definitions is depicted in Figure 2.


Phase Resetting 


An exponentially stable periodic orbit is a normally hyperbolic invariant manifold, hence its sufficiently small neighborhood, $U$, is invariantly foliated by stable submanifolds (Guckenheimer 1975) illustrated in Figure 3. The manifolds represent points having equal phases and, for this reason, they are called isochrons (from Greek "iso" meaning equal and "chronos" meaning time).

Figure 3 Isochrons of Andronov-Hopf oscillator ($\dot{z} = (1 + i) z - z|z|^2$, $z \in \mathbb{C}$) and van der Pol oscillator ($\dot{x} = x - x^3 - y$, $\dot{y} = x$).

The geometry of isochrons determines how the oscillators react to perturbations. For example, the pulse in Figure 3, right, moves the trajectory from one isochron to another, thereby changing its phase. The magnitude of the phase shift depends on the amplitude and the exact timing of the stimulus relative to the phase of oscillation $\vartheta$. Stimulating the oscillator at different phases, one can measure the phase transition curve (Winfree 2001)

$$\vartheta_{new} = \mathrm{PTC}(\vartheta_{old})$$

and the phase resetting curve

$$\mathrm{PRC}(\vartheta) = \mathrm{PTC}(\vartheta) - \vartheta \quad (\text{shift} = \text{new phase} - \text{old phase})$$


Positive (negative) values of the PRC correspond to 
phase advances (delays). PRCs are convenient when 
the phase shifts are small, so that they can be 
magnified and clearly seen, as in Figure 4. PTCs are 
convenient when the phase shifts are large and 
comparable with the period of oscillation. 


Figure 4 Examples of phase response curves (PRCs) of the oscillators in Figure 3 (panels: Andronov-Hopf oscillator and van der Pol oscillator; horizontal axis: stimulus phase $\vartheta$). $\mathrm{PRC}_1(\vartheta)$ and $\mathrm{PRC}_2(\vartheta)$ correspond to horizontal (along the first variable) and vertical (along the second variable) pulses with amplitudes 0.2. An example of oscillation is plotted as a dotted curve in each subplot (not to scale).





In Figure 5 we depict phase portraits of the 
Andronov—Hopf oscillator receiving pulses of 
magnitude 0.5 (left) and 1.5 (right). Notice the 
drastic difference between the corresponding PRCs 
or PTCs. Winfree (2001) distinguishes two cases: 


1. type 1 (weak) resetting results in continuous PRCs 
and PTCs with mean slope 1, and 

2. type 0 (strong) resetting results in discontinuous 
PRCs and PTCs with mean slope 0. 
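The two types of resetting can be reproduced for the Andronov-Hopf oscillator of Figure 3. The sketch below relies on an observation that is not stated in the text: in polar coordinates $\dot{z} = (1+i)z - z|z|^2$ becomes $\dot{r} = r - r^3$, $\dot\vartheta = 1$, so the isochrons are radial rays and the phase of any point is simply $\arg z$. A horizontal pulse of amplitude $A$ applied on the unit-circle limit cycle then gives $\mathrm{PTC}(\vartheta) = \arg(e^{i\vartheta} + A)$. The function names are illustrative.

```python
import cmath
import math


def ptc(theta: float, amplitude: float) -> float:
    """New phase after the horizontal pulse z -> z + amplitude applied at phase theta."""
    return cmath.phase(cmath.exp(1j * theta) + amplitude) % (2 * math.pi)


def prc(theta: float, amplitude: float) -> float:
    """Phase shift (new phase - old phase), wrapped to (-pi, pi]."""
    return cmath.phase(cmath.exp(1j * (ptc(theta, amplitude) - theta)))


def ptc_degree(amplitude: float, samples: int = 2000) -> int:
    """Net number of turns made by PTC(theta) as theta goes once around the circle."""
    phases = [ptc(2 * math.pi * k / samples, amplitude) for k in range(samples + 1)]
    total = sum(cmath.phase(cmath.exp(1j * (b - a)))
                for a, b in zip(phases, phases[1:]))
    return round(total / (2 * math.pi))


if __name__ == "__main__":
    for amplitude in (0.5, 1.5):
        print(amplitude, ptc_degree(amplitude))
```

Under these assumptions the degree is 1 for amplitude 0.5 and 0 for amplitude 1.5, matching the mean slopes of the type 1 and type 0 PTCs; the change happens at amplitude 1, where the shifted circle passes through the equilibrium.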


Figure 5 Types of phase resetting of the Andronov-Hopf oscillator in Figure 3 (panels: type 1 (weak) resetting and type 0 (strong) resetting, each shown as a phase resetting curve $\mathrm{PRC}(\vartheta)$ and a phase transition curve $\mathrm{PTC}(\vartheta) = \{\vartheta + \mathrm{PRC}(\vartheta)\} \bmod 2\pi$).

The discontinuity of the type 0 PRC in Figure 5 is a topological property that cannot be removed by relocating the initial point $x_0$ that corresponds to zero phase. The discontinuity stems from the fact that the shifted image of the limit cycle (dashed circle) goes beyond the central equilibrium at which the phase is not defined.

The stroboscopic mapping of $S^1$ to itself, called the Poincaré phase map,

$$\vartheta_{k+1} = \mathrm{PTC}(\vartheta_k)$$ [5]

describes the response of an oscillator to a $T$-periodic pulse train. Here, $\vartheta_k$ denotes the phase of oscillation when the $k$th input pulse arrives. Its fixed points correspond to synchronized solutions, and its periodic orbits correspond to phase-locked states.


Weak Coupling


Now consider dynamical systems of the form

$$\dot{x} = f(x) + \varepsilon s(t)$$ [6]

describing periodic oscillators, $\dot{x} = f(x)$, forced by a weak time-dependent input $\varepsilon s(t)$, for example, from other oscillators in a network. Let $\Theta(x)$ denote the phase of oscillation at point $x \in U$, so that the map $\Theta: U \to S^1$ is constant along each isochron. This mapping transforms [6] into the phase model

$$\dot\vartheta = \Omega + \varepsilon Q(\vartheta) \cdot s(t)$$

with function $Q(\vartheta)$, illustrated in Figure 6, satisfying three equivalent conditions:


Figure 6 Solutions $Q = (Q_1, Q_2)$ to the adjoint problem [7] for oscillators in Figure 3 (panels: Andronov-Hopf oscillator and van der Pol oscillator; horizontal axis: phase $\vartheta$ from 0 to $2\pi$; vertical axis: $Q(\vartheta)$).


1. Winfree: $Q(\vartheta)$ is the normalized PRC to infinitesimal pulsed perturbations;

2. Kuramoto: $Q(\vartheta) = \mathrm{grad}\, \Theta(x)$; and

3. Malkin: $Q$ is the solution to the adjoint problem

$$\dot{Q} = -\{Df(\gamma(t))\}^\top Q$$ [7]

with the normalization $Q(t) \cdot f(\gamma(t)) = \Omega$ for any $t$.


The function $Q(\vartheta)$ can be found analytically in a few simple cases:


1. a nonlinear phase oscillator $\dot{x} = f(x)$ with $x \in S^1$ and $f > 0$ has $Q(\vartheta) = \Omega / f(\gamma(\vartheta))$;

2. a system near saddle-node on invariant circle bifurcation has $Q(\vartheta)$ proportional to $1 - \cos\vartheta$; and

3. a system near supercritical Andronov-Hopf bifurcation has $Q(\vartheta)$ proportional to $\sin(\vartheta - \psi)$, where $\psi \in S^1$ is a constant phase shift.
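The Malkin condition can be checked numerically on the Andronov-Hopf oscillator of Figure 3. The sketch below assumes the limit cycle $\gamma(t) = (\cos t, \sin t)$ with $\Omega = 1$ and the candidate $Q(t) = (-\sin t, \cos t)$, which is the gradient of $\arg z$ on the unit circle; both are assumptions derived from the equation in the caption rather than formulas given in the text. The function names are illustrative.

```python
import math


def f(x, y):
    """Vector field of z' = (1 + i) z - z |z|^2 in real coordinates z = x + i y."""
    r2 = x * x + y * y
    return (x - y - x * r2, x + y - y * r2)


def jacobian(x, y):
    """Jacobian matrix Df at the point (x, y), as a pair of rows."""
    return ((1 - 3 * x * x - y * y, -1 - 2 * x * y),
            (1 - 2 * x * y, 1 - x * x - 3 * y * y))


def gamma(t):
    """Limit cycle: the unit circle traversed with frequency Omega = 1."""
    return (math.cos(t), math.sin(t))


def q(t):
    """Candidate solution Q(t) = grad(arg z) evaluated on the cycle."""
    return (-math.sin(t), math.cos(t))


def malkin_residuals(t, h=1e-5):
    """Return (|Q' + Df^T Q|, |Q . f - Omega|) at time t, with Omega = 1."""
    x, y = gamma(t)
    (a, b), (c, d) = jacobian(x, y)
    q1, q2 = q(t)
    dq = [(plus - minus) / (2 * h) for plus, minus in zip(q(t + h), q(t - h))]
    adjoint = (a * q1 + c * q2, b * q1 + d * q2)  # transpose of Df applied to Q
    ode_residual = math.hypot(dq[0] + adjoint[0], dq[1] + adjoint[1])
    fx, fy = f(x, y)
    return ode_residual, abs(q1 * fx + q2 * fy - 1.0)


if __name__ == "__main__":
    print(max(malkin_residuals(0.1 * k)[0] for k in range(63)))
```

Both residuals stay at the level of finite-difference and rounding error. The components of this $Q$ are $\sin(\vartheta - \pi)$ and $\sin(\vartheta + \pi/2)$, consistent with case 3 of the list above.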


Other interesting cases, including homoclinic, 
relaxation, and bursting oscillators are considered 
by Izhikevich (2006). 

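Treating $s(t)$ in [6] as the input from the network, we can transform weakly coupled oscillators

$$\dot{x}_i = f_i(x_i) + \varepsilon \underbrace{\sum_{j=1}^{n} g_{ij}(x_i, x_j)}_{s_i(t)}, \quad x_i \in \mathbb{R}^m$$ [8]

to the phase model

$$\dot\vartheta_i = \Omega_i + \varepsilon Q_i(\vartheta_i) \cdot \underbrace{\sum_{j=1}^{n} g_{ij}(x_i(\vartheta_i), x_j(\vartheta_j))}_{s_i(t)}$$ [9]

having the form [4] with $h_i = Q_i \cdot \sum_{j=1}^{n} g_{ij}$, or the form

$$\dot\vartheta_i = \Omega_i + \varepsilon \sum_{j=1}^{n} h_{ij}(\vartheta_i, \vartheta_j)$$

where $h_{ij} = Q_i \cdot g_{ij}$. Introducing phase deviation variables $\vartheta_i = \Omega_i t + \varphi_i$, we transform this system into the form

$$\dot\varphi_i = \varepsilon \sum_{j=1}^{n} h_{ij}(\Omega_i t + \varphi_i, \Omega_j t + \varphi_j)$$

which can be averaged to

$$\dot\varphi_i = \varepsilon \sum_{j=1}^{n} H_{ij}(\varphi_i - \varphi_j)$$ [10]

with the functions

$$H_{ij}(\chi) = \lim_{T \to \infty} \frac{1}{T} \int_0^T h_{ij}(\Omega_i t, \Omega_j t - \chi)\, dt$$ [11]

describing the interaction between oscillators (Ermentrout and Kopell 1984). To summarize, we transformed the weakly coupled system [8] into the phase model [10] with $H$ given by [11] and each $Q$ being the solution to the adjoint problem [7]. This constitutes the Malkin theorem for weakly coupled oscillators (Hoppensteadt and Izhikevich 1997, theorem 9.2).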

Existence of one equilibrium of the phase model [10] implies the existence of the entire circular family of equilibria, since translation of all $\varphi_i$ by a constant phase shift does not change the phase differences $\varphi_i - \varphi_j$ and hence the form of [10]. This family corresponds to a limit cycle of [8], on which all oscillators have equal frequencies and constant phase shifts, that is, they are synchronized, possibly out of phase.

We say that two oscillators, $i$ and $j$, have resonant (or commensurable) frequencies when the ratio $\Omega_i / \Omega_j$ is a rational number, for example, it is $p/q$ for some integers $p$ and $q$. They are nonresonant when the ratio is an irrational number. In this case, the function $H_{ij}$ defined above is constant regardless of the details of the oscillatory dynamics or the details of the coupling, that is, the dynamics of two coupled nonresonant oscillators is described by an uncoupled phase model. Apparently, such oscillators do not interact; that is, the phase of one of them cannot change the phase of the other one even on the long timescale of order $1/\varepsilon$.
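The difference between resonant and nonresonant pairs can be seen by evaluating the average [11] over a long but finite window. The sketch below uses a hypothetical interaction $h(\vartheta_i, \vartheta_j) = \sin(\vartheta_j - \vartheta_i)$ chosen only for illustration, a finite horizon in place of the limit $T \to \infty$, and a midpoint rule; none of these choices come from the text.

```python
import math


def averaged_coupling(h, omega_i, omega_j, chi, horizon=2000.0, steps=200_000):
    """Midpoint-rule estimate of (1/T) * integral_0^T h(omega_i t, omega_j t - chi) dt."""
    dt = horizon / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        total += h(omega_i * t, omega_j * t - chi)
    return total * dt / horizon


def h_example(theta_i, theta_j):
    """Hypothetical pairwise interaction, used only for illustration."""
    return math.sin(theta_j - theta_i)


if __name__ == "__main__":
    for chi in (0.5, 1.0, 2.0):
        resonant = averaged_coupling(h_example, 1.0, 1.0, chi)
        nonresonant = averaged_coupling(h_example, 1.0, math.sqrt(2.0), chi)
        print(chi, resonant, nonresonant)
```

For equal frequencies the average reproduces $-\sin\chi$, while for the frequency ratio $\sqrt{2}$ it is close to zero for every $\chi$ and decays like $1/T$ as the window grows, which is the constant (here vanishing) $H_{ij}$ described above.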


Synchronization 


Consider [8] with $n = 2$, describing two mutually coupled oscillators. Let us introduce "slow" time $\tau = \varepsilon t$ and rewrite the corresponding phase model [10] in the form

$$\varphi_1' = \omega_1 + H_{12}(\varphi_1 - \varphi_2)$$
$$\varphi_2' = \omega_2 + H_{21}(\varphi_2 - \varphi_1)$$

where $' = d/d\tau$ and $\omega_i = H_{ii}(0)$ is the frequency deviation from the natural oscillation, $i = 1, 2$. Let $\chi = \varphi_2 - \varphi_1$ denote the phase difference between the oscillators; then

$$\chi' = \omega + H(\chi)$$ [12]

where

$$\omega = \omega_2 - \omega_1 \quad \text{and} \quad H(\chi) = H_{21}(\chi) - H_{12}(-\chi)$$

are the frequency mismatch and the antisymmetric part of the coupling, respectively, illustrated in Figure 7, dashed curves. A stable equilibrium of [12] corresponds to a stable limit cycle of the phase model.


Weakly Coupled Oscillators 451 


Figure 7 Solid curves: functions $H_{ij}(\chi)$ defined by [11] corresponding to the gap-junction input $g(x_i, x_j) = (x_{j1} - x_{i1}, 0)$. Dashed curves: functions $H(\chi) = H_{ij}(\chi) - H_{ij}(-\chi)$. Parameters are as in Figure 3. (Panels: van der Pol oscillator and Andronov-Hopf oscillator; horizontal axis: phase difference $\chi$ from 0 to $2\pi$.)


All equilibria of [12] are solutions to $H(\chi) = -\omega$, and they are intersections of the horizontal line $-\omega$ with the graph of $H$. They are stable if the slope of the graph is negative at the intersection. If the oscillators are identical, then $H(\chi)$ is an odd function (i.e., $H(-\chi) = -H(\chi)$), and $\chi = 0$ and $\chi = \pi$ are always equilibria, possibly unstable, corresponding to the in-phase and antiphase synchronized solutions. The in-phase synchronization of gap-junction coupled oscillators in Figure 7 is stable because the slope of $H$ (dashed curves) is negative at $\chi = 0$. The max and min values of the function $H$ determine the tolerance of the network to the frequency mismatch $\omega$, since there are no equilibria outside this range.
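The graphical recipe above translates directly into a root search. The sketch below scans $[0, 2\pi)$ for sign changes of $\omega + H(\chi)$, refines each by bisection, and labels a root stable when the right-hand side of [12] decreases through it. The example coupling $H(\chi) = -2\sin\chi$ is a hypothetical choice (it is what $H_{21}(\chi) - H_{12}(-\chi)$ gives when both $H_{ij}$ equal $-\sin\chi$), and the function names are illustrative.

```python
import math


def phase_difference_equilibria(H, omega, samples=3600, tol=1e-12):
    """Roots of omega + H(chi) on [0, 2*pi), each paired with a stability flag."""
    def g(chi):
        return omega + H(chi)

    step = 2 * math.pi / samples
    roots = []
    for k in range(samples):
        lo, hi = k * step, (k + 1) * step
        g_lo, g_hi = g(lo), g(hi)
        if g_lo == 0.0:
            # Root exactly on a grid point: compare values on either side.
            roots.append((lo, g(lo + 1e-6) < g(lo - 1e-6)))
        elif g_lo * g_hi < 0:
            falling = g_lo > 0  # the flow decreases through the root: stable
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if (g(mid) > 0) == falling:
                    lo = mid
                else:
                    hi = mid
            roots.append((0.5 * (lo + hi), falling))
    return roots


def H_example(chi):
    """Hypothetical antisymmetric coupling, used only for illustration."""
    return -2.0 * math.sin(chi)


if __name__ == "__main__":
    print(phase_difference_equilibria(H_example, 1.0))
    print(phase_difference_equilibria(H_example, 3.0))
```

With this coupling a mismatch $\omega = 1$ gives a stable equilibrium near $\pi/6$ and an unstable one near $5\pi/6$, while $\omega = 3$ exceeds the range of $H$ and no equilibrium exists.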

Now consider a network of $n > 2$ weakly coupled oscillators [8]. To determine the existence and stability of synchronized states in the network, we need to study equilibria of the corresponding phase model [10]. The vector $\phi = (\phi_1, \ldots, \phi_n)$ is an equilibrium of [10] when

$$0 = \omega_i + \sum_{j \ne i}^{n} H_{ij}(\phi_i - \phi_j) \quad (\text{for all } i)$$ [13]

It is stable when all eigenvalues of the linearization matrix (Jacobian) at $\phi$ have negative real parts, except one zero eigenvalue corresponding to the eigenvector along the circular family of equilibria ($\phi$ plus a phase shift is a solution of [13] too, since the phase differences $\phi_i - \phi_j$ are not affected).

In general, determining the stability of equilibria is a difficult problem. Ermentrout (1992) found a simple sufficient condition. If


1. $a_{ij} = H_{ij}'(\phi_i - \phi_j) \le 0$, and

2. the directed graph defined by the matrix $a = (a_{ij})$ is connected (i.e., each oscillator is influenced, possibly indirectly, by every other oscillator),


then the equilibrium $\phi$ is neutrally stable, and the corresponding limit cycle $x(t + \phi)$ of [8] is asymptotically stable.

Another sufficient condition was found by Hoppensteadt and Izhikevich (1997). If system [10] satisfies


1. $\omega_1 = \cdots = \omega_n = \omega$ (identical frequencies)
2. $H_{ij}(-\chi) = -H_{ji}(\chi)$ (pairwise odd coupling)


for all $i$ and $j$, then the network dynamics converge to a limit cycle. On the cycle, all oscillators have equal frequencies $1 + \varepsilon\omega$ and constant phase deviations.

The proof follows from the observation that [10] is a gradient system in the rotating coordinates $\varphi = \omega\tau + \phi$ with the energy function

$$E(\phi) = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} R_{ij}(\phi_i - \phi_j)$$

where

$$R_{ij}(\chi) = \int_0^{\chi} H_{ij}(s)\, ds$$

One can check that $dE(\phi)/d\tau = -\sum_{i=1}^{n} (\phi_i')^2 \le 0$ along the trajectories of [10], with equality only at equilibria.


Mean-Field Approximations 


Let us represent the phase model [10] in the form

$$\varphi_i' = \omega_i + \sum_{j \ne i}^{n} H_{ij}(\varphi_i - \varphi_j)$$

where $' = d/d\tau$, $\tau = \varepsilon t$ is the slow time, and $\omega_i = H_{ii}(0)$ are random frequency deviations. Collective dynamics of this system can be analyzed in the limit $n \to \infty$. We illustrate the theory using the special case, $H(\chi) = -\sin\chi$, known as the Kuramoto (1984) model:

$$\varphi_i' = \omega_i + \frac{K}{n} \sum_{j=1}^{n} \sin(\varphi_j - \varphi_i), \quad \varphi_i \in [0, 2\pi]$$ [14]

where $K > 0$ is the coupling strength and the factor $1/n$ ensures that the model behaves well as $n \to \infty$. The complex-valued sum of all phases,

$$r e^{i\psi} = \frac{1}{n} \sum_{j=1}^{n} e^{i\varphi_j} \quad \text{(Kuramoto synchronization index)}$$ [15]

describes the degree of synchronization in the network. Apparently, the in-phase synchronized state $\varphi_1 = \cdots = \varphi_n$ corresponds to $r = 1$ with $\psi$ being the population phase. In contrast, the incoherent state with all $\varphi_i$ having different values randomly distributed on the unit circle corresponds to $r \approx 0$. Intermediate values of $r$ correspond to a partially synchronized or coherent state, depicted in Figure 8. Some phases are synchronized forming a cluster, while others roam around the circle.

Figure 8 Kuramoto synchronization index [15] describes the degree of coherence in the network [14].

Multiplying both sides of [15] by $e^{-i\varphi_i}$ and
considering only the imaginary parts, we can rewrite
[14] in the equivalent form


$$\varphi_i' = \omega_i + Kr\sin(\psi - \varphi_i)$$


that emphasizes the mean-field character of interactions
between the oscillators: they are all pulled into
the synchronized cluster ($\varphi_i \to \psi$) with an effective
strength proportional to the cluster size r. This pull
is offset by the random frequency deviations $\omega_i$ that
pull away from the cluster.
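
The mean-field form lends itself to a direct numerical experiment. The following is a minimal sketch, not part of the original article: it integrates the Kuramoto model with a forward-Euler step, using the synchronization index [15] to drive each oscillator. The function names, the step size, the population size, and the Gaussian frequency distribution with unit variance are all illustrative assumptions.

```python
import cmath
import math
import random


def order_parameter(phases):
    """Return (r, psi) with r*exp(i*psi) = (1/n) * sum_j exp(i*phi_j), cf. [15]."""
    z = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(z), cmath.phase(z)


def kuramoto_step(phases, omegas, K, dt):
    """One forward-Euler step of phi_i' = omega_i + K r sin(psi - phi_i)."""
    r, psi = order_parameter(phases)
    return [(p + dt * (w + K * r * math.sin(psi - p))) % (2 * math.pi)
            for p, w in zip(phases, omegas)]


def simulate(n=200, K=4.0, sigma=1.0, dt=0.05, steps=1000, seed=0):
    """Return r averaged over the second half of the run (assumed past the transient)."""
    rng = random.Random(seed)
    omegas = [rng.gauss(0.0, sigma) for _ in range(n)]      # frequency deviations
    phases = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
    samples = []
    for step in range(steps):
        phases = kuramoto_step(phases, omegas, K, dt)
        if step >= steps // 2:
            samples.append(order_parameter(phases)[0])
    return sum(samples) / len(samples)


if __name__ == "__main__":
    for K in (0.5, 4.0):
        print(f"K = {K}: time-averaged r = {simulate(K=K):.3f}")
```

Under these assumptions one expects a small r (of order $1/\sqrt{n}$) for weak coupling and a value of r close to 1 for strong coupling; the exact numbers depend on the seed and on the finite population size.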

Let us assume that the $\omega_i$ are distributed randomly
around 0 with a symmetric probability density
function $g(\omega)$, for example, Gaussian. Kuramoto has
shown that in the limit $n \to \infty$, the cluster size r
obeys the self-consistency equation


$$r = rK\int_{-\pi/2}^{+\pi/2} g(Kr\sin\varphi)\cos^2\varphi\,d\varphi \qquad [16]$$

Notice that r = 0, corresponding to the incoherent
state, is always a solution of this equation. When the
coupling strength K is greater than a certain critical
value,


$$K_c = \frac{2}{\pi g(0)}$$

an additional, nontrivial solution r > 0 appears,
which corresponds to a partially synchronized
state. Expanding g in a Taylor series, one gets the
scaling $r = \sqrt{16(K - K_c)/(-g''(0)\pi K_c^4)}$. Thus, the
stronger the coupling K relative to the random
distribution of frequencies, the more oscillators
synchronize into a coherent cluster. The issue of stability
of incoherent and partially synchronized states is
discussed by Strogatz (2000). Other generalizations
of the Kuramoto model are reviewed by Acebron et al.
(2005). An extended version of this article with the
emphasis on computational neuroscience can be found
in the recent book by Izhikevich (2006).
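
The self-consistency equation [16] can also be solved numerically. The sketch below is an illustration under stated assumptions rather than part of the original article: it takes a Gaussian density g with unit variance, divides [16] by r, evaluates the remaining integral with the midpoint rule, and finds the nontrivial root by bisection. The helper names and the quadrature resolution are hypothetical choices.

```python
import math


def gaussian(w, sigma=1.0):
    """Assumed frequency density g(omega): zero-mean Gaussian."""
    return math.exp(-0.5 * (w / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def critical_coupling(g0):
    """K_c = 2 / (pi * g(0))."""
    return 2.0 / (math.pi * g0)


def rhs_over_r(r, K, g=gaussian, m=2000):
    """K * integral of g(K r sin(phi)) cos^2(phi) over [-pi/2, pi/2], midpoint rule."""
    h = math.pi / m
    total = 0.0
    for k in range(m):
        phi = -math.pi / 2 + (k + 0.5) * h
        total += g(K * r * math.sin(phi)) * math.cos(phi) ** 2
    return K * total * h


def cluster_size(K, g=gaussian, tol=1e-10):
    """Nontrivial root r > 0 of [16], or 0.0 when only the incoherent state exists."""
    if rhs_over_r(0.0, K, g) <= 1.0:      # equivalent to K <= K_c
        return 0.0
    lo, hi = 0.0, 1.0                     # rhs_over_r decreases from K/K_c to below 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rhs_over_r(mid, K, g) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    Kc = critical_coupling(gaussian(0.0))
    print("K_c =", Kc)
    for K in (1.0, Kc + 0.01, 2.0, 4.0):
        print(K, cluster_size(K))
```

Just above $K_c$ the computed root can be compared with the square-root scaling quoted above; for a unit-variance Gaussian, $g''(0) = -g(0)$.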


See also: Bifurcations of Periodic Orbits; Dynamical 
Systems and Thermodynamics; Hamiltonian Systems: 
Stability and Instability Theory; Singularity and Bifurcation 
Theory; Stability Theory and KAM; Synchronization of 
Chaos. 

Introduction


It is recognized that one of the outstanding problems
in modern physics is to formulate the quantum
theory of gravity, synthesizing the principles of
quantum mechanics and the general theory of relativity.
The fundamental units for measuring time, length,
and energy, known as the Planck time, Planck length,
and Planck energy (quoted here through the equivalent
Planck mass), respectively, are defined to be
$t_{\rm Pl} = (\hbar G/c^5)^{1/2} = 5.39 \times 10^{-44}$ s,
$l_{\rm Pl} = (\hbar G/c^3)^{1/2} = 1.61 \times 10^{-33}$ cm, and
$m_{\rm Pl} = (\hbar c/G)^{1/2} = 2.17 \times 10^{-5}$ g,
in terms of Newton's constant, G, the
velocity of light, c, and $\hbar = h/2\pi$, h being
Planck's constant. We may conclude, on dimensional
arguments, that quantum gravity effects will
play an important role when we consider physical
phenomena in the vicinity of these scales. Therefore,
when we probe very short distances, consider
collisions at Planckian energies, or envisage the evolution
of the universe in the Planck era,
quantum gravity will come into play in a predominant
manner. The purpose of this article is to present an
overview of an approach to quantizing Einstein's
theory of gravity, pioneered by Wheeler and De
Witt almost four decades ago. We proceed to
recapitulate various prescriptions for quantizing
gravitation and then discuss a simple derivation of
the Wheeler-De Witt (WDW) equation in general
relativity and some of its applications in the study of
quantum cosmology. There are, broadly speaking,
three different approaches to quantize gravity.
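
As a quick check of the orders of magnitude quoted above, the Planck units can be evaluated directly from the defining formulas. This is a small illustrative sketch; the numerical values of G, c, and $\hbar$ are standard SI values inserted here as assumptions, and the function name is hypothetical.

```python
import math

# Assumed SI values of the constants (not taken from the article).
G = 6.67430e-11          # Newton's constant, m^3 kg^-1 s^-2
C = 2.99792458e8         # velocity of light, m/s
HBAR = 1.054571817e-34   # reduced Planck constant, J s
GEV_IN_JOULE = 1.602176634e-10


def planck_units():
    """Planck time (s), length (cm), mass (g) and energy (GeV)."""
    t_pl = math.sqrt(HBAR * G / C ** 5)
    l_pl = math.sqrt(HBAR * G / C ** 3)
    m_pl = math.sqrt(HBAR * C / G)
    return {
        "t_s": t_pl,
        "l_cm": l_pl * 100.0,            # metres to centimetres
        "m_g": m_pl * 1000.0,            # kilograms to grams
        "E_GeV": m_pl * C ** 2 / GEV_IN_JOULE,
    }


if __name__ == "__main__":
    for name, value in planck_units().items():
        print(f"{name} = {value:.3e}")
```

The Planck energy follows from the Planck mass through $E = mc^2$.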

The general theory of relativity has been tested to 
great degree of accuracy in the classical regime. The 
geometrical description of spacetime plays a cardinal 
role in Einstein’s theory. Therefore, the general 
relativists emphasize the geometrical attributes of the 
theory and the central role played by the spacetime 
structure in their formulation of quantum theory. 
It is natural to adopt a background-independent
approach. In contrast, quantum field theorists follow
a path in which the prescription is valid in the
weak-field approximation and the theory is
quantized in a given background, usually
Minkowski space. It is argued by the proponents of the
geometric approach that the background metric
should emerge from the theory in a self-consistent
manner rather than being introduced by hand when
we quantize the theory. One of the earliest attempts
to quantize gravity was to follow the route of 
canonical method. The canonical quantization 
approach has many advantages. One of the impor- 
tant features is that it is quite similar to the 
prescriptions adopted in quantum field theory where 
one uses notion of operators, commutation relations, 
etc. Moreover, the subtleties encountered in quantiz- 
ing gravity are transparent. Therefore, the canonical 
procedure is preferred over the path-integral formula- 
tion, although the latter has its own advantages too. 
Another positive aspect of the canonical approach is
that the requirement of a background-independent
formulation can be maintained to some extent.
Thus, there is room for exploring some of the
nonperturbative attributes of the theory. The relativists
favor the canonical formulation, since some of the
geometrical features of the general theory of relativity
can be incorporated here and explored to see
how far the quantum theory captures such properties
of the classical theory. As we shall discuss in the sequel,
some of the interesting issues of quantum cosmology
are addressed in this approach. However, there are
limitations and shortcomings in this formulation and
we refer the reader to the textbooks and review
articles for further reading and critical assessments of
the canonical approach to quantizing gravity.

The second approach is primarily the endeavor of 
physicists who have devoted their research to 
quantum field theory. Feynman’s seminal work on 
quantization of gravity from this perspective has 
profoundly influenced the subsequent developments. 
The quantization of gravity is carried out in the 
weak-field approximation such that the graviton is 
identified as the fluctuation over the Minkowski 
background metric. It is a massless spin-2 field as one 
concludes from the properties of low-energy gravita- 
tional interaction in the classical limit. Furthermore, 
the gauge invariance associated with a spin-2 mass- 
less field gets intimately related with invariance of 
Einstein’s theory under general coordinate transfor- 
mation. In this setup, the field-theoretic techniques 
could be employed to quantize theory and to consider 
perturbative expansions for the scattering amplitudes. 
It is realized that low-energy amplitudes computed
from the massless spin-2 theory match those
derived from the Einstein-Hilbert action in the
weak-field approximation. However, the theory is not
perturbatively renormalizable, since the coupling
constant is dimensionful. One of the most important
outcomes of the investigations from this perspective
is the discovery, due to Feynman, that the
introduction of ghost fields is necessary in order to
maintain unitarity of the S-matrix when one goes
beyond the tree level. As is well known, this work has
profoundly influenced frontiers of research in physics,
leading to the quantization of Yang-Mills theory which,
in turn, paved the way for electroweak theory and
QCD. It is worthwhile to mention in passing that the
quantum phenomena associated with gravity in the 
nonperturbative regime cannot be addressed in this 
framework. 

In recent years, superstring theory has been at the 
center stage in order to provide a unified theory of 
fundamental interactions. It is postulated that all 
elementary constituents of matter and the carriers of 
the interactions such as gauge bosons and graviton 
are excitations of one-dimensional extended objects:
the strings. The superstring theories are perturbatively
consistent in the critical dimension, ten. The
closed-superstring spectrum contains a massless spin-2
state which is identified with the graviton. It is
well known that perturbative computations of processes
involving gravitons turn out to be finite.
Moreover, the Einstein—Hilbert term appears natu- 
rally when one derives the string effective action. 
Therefore, it is expected that string theory will be 
able to provide answers to questions related to 
quantum gravity. Indeed, the theory has met with 
success in resolving some important issues. We note 
that cosmological scenario has been discussed in the 
string theory framework and the WDW equation 
has played an important role in study of quantum 
string cosmology. We shall comment on this aspect 
towards the end of this article. 


The Canonical Structure of Einstein 
Gravity 


The Einstein-Hilbert action is


$$S = \frac{1}{16\pi G}\int d^4x\,\sqrt{-g}\,(R - 2\Lambda) \qquad [1]$$


where R is the Ricci scalar derived from the metric,
$g_{\mu\nu}$, and $\Lambda$ is the cosmological constant. The field
equations are derived from the action by the
standard variational technique. Note that R involves
second derivatives of the metric. If we have compact
manifolds with boundary $\partial M$ such that variations of
the metric vanish on the boundary and the normal
derivatives do not, it is necessary to add a surface
term to this action. The exact form of this term will
be discussed later. Einstein's theory of gravitation
is manifestly covariant. The associated action
[1] is invariant under general coordinate transformations:
under $x^\mu \to x'^\mu(x)$,





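$$g'_{\mu\nu}(x') = \frac{\partial x^\rho}{\partial x'^\mu}\frac{\partial x^\sigma}{\partial x'^\nu}\,g_{\rho\sigma}(x) \qquad [2]$$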


Therefore, we expect that the theory will be 
endowed with constraints expressed in terms of the 
canonical variables. One can implement general 
coordinate transformations so that there are only 
two pairs of canonical phase-space variables on a 
spacelike hypersurface. In other words, from physi- 
cal considerations, graviton has only two polariza- 
tions whereas the metric has ten components. 
Therefore, the two physical degrees of freedom can
be obtained using the freedom of choosing the
“gauge” transformations in this context. It is
desirable to identify the constraints and analyze
their structure, most appropriately in Dirac’s
formalism, and to quantize the theory canonically as
the next step. This is the path we intend to follow in
order to arrive at the WDW equation.


The Classical Constraints 


The Hamiltonian approach is most appropriate for
employing the constraint formalism due to Dirac. We
recall that the Lagrangian formulation is manifestly
covariant, as is reflected in the field equations,
whereas the spacetime covariance is lost in the
passage to the Hamiltonian approach. Furthermore,
the spatial components of the metric are the
dynamical degrees of freedom. We adopt the
formalism introduced by Arnowitt, Deser, and
Misner (ADM) for the so-called 3 + 1 split of the
hyperbolic Riemannian spacetime metric, $g_{\mu\nu}$. One
introduces the lapse function, $N^\perp$, and the shift
function, $N^i$. We suppress the factors of $1/16\pi G$,
etc., for the time being for the general discussions
and shall reintroduce them later. A family of
spacelike hypersurfaces, $\Sigma_t$, is constructed, with
metric $h_{ij}$ induced on it. Here t is a timelike
parameter that parametrizes $\Sigma_t$. The distance between
points on two neighboring hypersurfaces, $\Sigma_t$ and
$\Sigma_{t+dt}$, with coordinates $(t, x^i)$ and $(t + dt, x^i + dx^i)$,
respectively, is given by


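$$ds^2 = -(N^\perp)^2 dt^2 + h_{ij}(N^i dt + dx^i)(N^j dt + dx^j) = g_{\mu\nu}\,dx^\mu dx^\nu \qquad [3]$$

The indices of tensors defined on $\Sigma_t$ are raised and
lowered by $h_{ij}$ and its inverse $h^{ij}$. The relations
between the components of $g_{\mu\nu}$ and $N^\perp$, $N^i$, $h_{ij}$ can
be obtained easily,


$$g_{00} = h_{ij}N^iN^j - (N^\perp)^2, \quad g_{0i} = h_{ij}N^j \qquad [4]$$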
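The above relations can be inverted to give

$$N^i = h^{ij}g_{0j}, \quad N^\perp = \frac{1}{\sqrt{-g^{00}}} \qquad [5]$$

The relations between the spatial components, $g^{ij}$, of
$g^{\mu\nu}$ and $h^{ij}$ and some other useful relations are listed
below for later convenience:

$$g^{ij} = h^{ij} - \frac{N^iN^j}{(N^\perp)^2}, \quad \sqrt{-g} = N^\perp\sqrt{h}, \quad g^{0i} = \frac{N^i}{(N^\perp)^2} \qquad [6]$$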
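
The ADM relations can be verified numerically for any sample lapse, shift, and positive-definite 3-metric. The sketch below is an illustration, not part of the original article: it assembles $g_{\mu\nu}$ from $g_{00} = h_{ij}N^iN^j - (N^\perp)^2$, $g_{0i} = h_{ij}N^j$, $g_{ij} = h_{ij}$, inverts it, and compares with the standard ADM expressions $g^{00} = -1/(N^\perp)^2$, $g^{0i} = N^i/(N^\perp)^2$, $g^{ij} = h^{ij} - N^iN^j/(N^\perp)^2$ and $\det g = -(N^\perp)^2 \det h$. All function names and the sample numbers are hypothetical.

```python
def inverse_and_det(m):
    """Gauss-Jordan elimination with partial pivoting; returns (inverse, determinant)."""
    n = len(m)
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-14:
            raise ValueError("singular matrix")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                       # a row swap flips the sign
        p = a[col][col]
        det *= p
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a], det


def adm_metric(lapse, shift, h):
    """Assemble the spacetime metric g_{mu nu} from lapse, shift N^i and 3-metric h_ij."""
    n = len(h)
    shift_low = [sum(h[i][j] * shift[j] for j in range(n)) for i in range(n)]
    g00 = sum(shift_low[i] * shift[i] for i in range(n)) - lapse ** 2
    g = [[g00] + shift_low]
    for i in range(n):
        g.append([shift_low[i]] + list(h[i]))
    return g


def adm_relation_error(lapse, shift, h):
    """Largest violation of the inverse-metric and determinant relations."""
    n = len(h)
    g_inv, det_g = inverse_and_det(adm_metric(lapse, shift, h))
    h_inv, det_h = inverse_and_det(h)
    errors = [abs(g_inv[0][0] + 1.0 / lapse ** 2),
              abs(det_g + lapse ** 2 * det_h)]
    for i in range(n):
        errors.append(abs(g_inv[0][i + 1] - shift[i] / lapse ** 2))
        for j in range(n):
            expected = h_inv[i][j] - shift[i] * shift[j] / lapse ** 2
            errors.append(abs(g_inv[i + 1][j + 1] - expected))
    return max(errors)


if __name__ == "__main__":
    h = [[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.2]]
    print(adm_relation_error(1.3, [0.2, -0.1, 0.4], h))
```

The determinant relation is the square of $\sqrt{-g} = N^\perp\sqrt{h}$.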


Note that $(N^\perp, N^i)$ are introduced to specify the
deformation of the hypersurface and, therefore, the
evolution equations through the Hamiltonian will
not determine them; they are arbitrary functions.
Consequently, [4] implies that $g_{00}$ and $g_{0i}$ will enter
the Hamiltonian as arbitrary functions. As alluded
to above, $h_{ij}$ and their conjugate momenta $\pi^{ij}$ are the
dynamical degrees of freedom. We may choose
$(N^\perp, N^i) \equiv N^\mu$ and $h_{ij}$ as independent variables
rather than $(g_{00}, g_{0i}) \equiv g_{0\mu}$ and $h_{ij}$ for convenience
and go back to the other set of variables through [4]
and [5] if we desire. Let $\pi_\mu$ be the momenta canonically
conjugate to $N^\mu$; then, owing to the arbitrariness of
$N^\mu$, a Lagrange multiplier, $\chi^\mu$, is necessary, so that a
term $\pi_\mu\chi^\mu$ has to be added to the Hamiltonian.
We remind the reader that in
electrodynamics an analogous situation arises while
analyzing its canonical structure; local gauge
symmetry plays a crucial role there. The
generic form of the Hamiltonian is (we shall
introduce $1/16\pi G$, etc., later)


$$H = \int d^3x\,\left(N^\perp\mathcal{H}_\perp[h, \pi] + N^i\mathcal{H}_i[h, \pi] + \chi^\mu\pi_\mu\right) \qquad [7]$$


From the perspective of constraint analysis, it is
natural that $\pi_\mu \approx 0$ appear as first-class constraints,
as they are multiplied by arbitrary functions. Moreover,
these constraints must continue to hold under the
deformation of the surface, which implies that $\{\pi_\mu, H\}_{PB}$
must vanish weakly, leading to $\mathcal{H}_\mu \approx 0$. As a
consistency requirement, these must be first-class
constraints if $N^\mu$ are to be arbitrary functions. We
identify $\pi_\mu \approx 0$ and $\mathcal{H}_\mu \approx 0$ as the primary and
secondary constraints, respectively. Thus far, we
have discussed the case for pure gravity; the
presence of matter fields in the full action modifies
the treatment appropriately.

Let us analyze the structure of the constraints for
the Einstein-Hilbert action [1]. For a compact
manifold with boundary $\partial M$, we have to add the
surface term, which takes the form:


$$\frac{1}{8\pi G}\int_{\partial M} d^3x\,\sqrt{h}\,K$$

Here K stands for the trace of the extrinsic curvature
of the boundary 3-surface and $h = \det h_{ij}$; note that
$h_{ij}$ is the induced metric on the 3-surface. If we
include matter fields, the corresponding action is to
be taken into account. Once we make the 3 + 1 split
of the metric, the action assumes the following form:


$$S = \frac{1}{16\pi G}\int d^3x\,dt\,N^\perp\sqrt{h}\,\left(K_{ij}K^{ij} - K^2 + {}^3R - 2\Lambda\right) \qquad [8]$$

where

$$K_{ij} = \frac{1}{2N^\perp}\left(-\frac{\partial h_{ij}}{\partial t} + D_iN_j + D_jN_i\right) \qquad [9]$$

Here $D_iN_j$ represents the covariant derivative of $N_j$ with
the connection computed from $h_{ij}$, and ${}^3R$ is the
curvature of the 3-surface. The canonical momenta
are


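$$\pi^{ij} = -\frac{\sqrt{h}}{16\pi G}\left(K^{ij} - h^{ij}K\right) \qquad [10]$$


and we can invert this relation to get


$$K^{ij} = -\frac{16\pi G}{\sqrt{h}}\left(\pi^{ij} - \frac{1}{2}h^{ij}\pi\right)$$


with $\pi = h_{ij}\pi^{ij}$. The Hamiltonian form of the action is given by


$$S_H = \int d^3x\,dt\,\left(\dot{h}_{ij}\pi^{ij} - N^\perp\mathcal{H}_\perp - N^i\mathcal{H}_i\right) \qquad [11]$$


Notice that, since [8] does not involve time derivatives of
$N^\perp$ and $N^i$, their corresponding canonical momenta
vanish,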


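$$\pi_\mu \approx 0 \qquad [12]$$


as expected from our earlier discussions about the
role of $N^\mu$. A straightforward constraint analysis
leads to the pair of constraints


$$\mathcal{H}_i = -2D_j\pi_i^{\ j} \approx 0 \qquad [13]$$

$$\mathcal{H}_\perp = \frac{16\pi G}{\sqrt{h}}\left(h_{ik}h_{jl} - \frac{1}{2}h_{ij}h_{kl}\right)\pi^{ij}\pi^{kl} - \frac{\sqrt{h}}{16\pi G}\left({}^3R - 2\Lambda\right) \approx 0 \qquad [14]$$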


We mention in passing that the above constraint
equations get modified in the presence of matter
fields in the theory. This is relevant because the WDW
equation plays an important role in quantum
cosmology in describing the evolution of the universe
in early epochs, and the equation is studied in the
presence of a generic matter content, that is, a scalar
field with a potential. Denoting the purely gravitational
expressions [13] and [14] by a superscript G, the
constraint equations are modified to


$$\mathcal{H}_i = \mathcal{H}_i^{G} + \mathcal{H}_i^{\rm matter} \approx 0 \qquad [15]$$


$$\mathcal{H}_\perp = \mathcal{H}_\perp^{G} + \mathcal{H}_\perp^{\rm matter} \approx 0 \qquad [16]$$


The Algebra of Constraints 


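In order to compute the classical Poisson bracket
algebra of the constraints [13] and [14], we use the
canonical Poisson bracket relations for the
phase-space variables on $\Sigma_t$:


$$\{h_{ij}(x), h_{kl}(x')\} = 0 \qquad [17]$$


$$\{\pi^{ij}(x), \pi^{kl}(x')\} = 0 \qquad [18]$$

$$\{h_{ij}(x), \pi^{kl}(x')\} = \frac{1}{2}\left(\delta_i^k\delta_j^l + \delta_i^l\delta_j^k\right)\delta(x, x') \qquad [19]$$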


Thus, Poisson brackets among the constraints [13] 
and [14] are 


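$$\{\mathcal{H}_i(x), \mathcal{H}_j(x')\} = \mathcal{H}_j(x)\,\partial_i\delta(x, x') - \mathcal{H}_i(x')\,\partial'_j\delta(x, x') \qquad [20]$$


$$\{\mathcal{H}_i(x), \mathcal{H}_\perp(x')\} = \mathcal{H}_\perp(x)\,\partial_i\delta(x, x') \qquad [21]$$


$$\{\mathcal{H}_\perp(x), \mathcal{H}_\perp(x')\} = h^{ij}(x)\mathcal{H}_i(x)\,\partial_j\delta(x, x') - h^{ij}(x')\mathcal{H}_i(x')\,\partial'_j\delta(x, x') \qquad [22]$$

Here $\partial_i$ and $\partial'_i$ denote derivatives with respect to x and x', respectively.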


When we resort to canonical quantization, the
starting point is the Hamiltonian action in the
first-order formalism, where the canonical variables are
subjected to the constraints [13] and [14] in terms of
$\mathcal{H}_\perp$ and $\mathcal{H}_i$ satisfying the algebra given by [20]-[22].
One encounters a number of important issues while
proceeding to canonically quantize the theory. We
shall mention only a few of them in what follows. It
is important to address issues related to the role of
the constraints in the quantized theory and how to
deal with the Lagrange multipliers $N^\perp$ and $N^i$. A
simple proposal is to solve the constraints at the
classical level, identify the physical degrees of
freedom, and quantize the theory subsequently.
There are four first-class constraints, $\mathcal{H}_\perp$ and $\mathcal{H}_i$;
therefore, out of the 12 phase-space variables,
$(h_{ij}, \pi^{ij})$, only eight are independent. We need to
supply four gauge conditions in order to render the
theory (classically) solvable. Thus, we are left with
four physical degrees of freedom in the Hamiltonian
phase space and we can quantize them. The
implementation of this idea is easier said than 
done. One obstacle is that the constraints cannot 
be solved in a closed form in this formalism. If we 
fix a gauge and quantize the theory, we obviously 
break the gauge invariance. It is essential to show, 
subsequently, that all physically observable quanti- 
ties are independent of the gauge choice. Another 
criticism of this formalism is that we already get rid 
of some of the components of the metric. Therefore, 
the spirit of the general theory of relativity, which is 
based on the geometrical structure of spacetime, is 
somewhat diluted. There are other suggestions
where $h_{ij}$ and their conjugate momenta are elevated
to quantum status before supplying the gauge
conditions. The issues of gauge fixing and dealing
with the constraints are then addressed at the quantum
level. We replace the canonical Poisson bracket
algebra by the canonical commutators and proceed
further. The momentum operator assumes the form


$$\hat{\pi}^{ij} = -i\hbar\frac{\delta}{\delta h_{ij}}$$

and the wave functional depends on $h_{ij}$, that is, $\Psi[h_{ij}]$.
There are many technical problems related to the
properties of the states and we shall not deal with
them due to limitations of space. It is essential to
discuss the role of the constraints in the quantum
theory. We demand that the quantum constraints
annihilate the physical states (recall the Gauss law
constraint in gauge theories). However, the issue of
operator ordering has to be dealt with, which in turn is
connected with the Hermiticity properties of the
quantum constraints. The Hamiltonian constraint
$\mathcal{H}_\perp \approx 0$ (henceforth denoted as $\mathcal{H}$ and referred to as the
Hamiltonian) involves products of the metric $h_{ij}$ and $\pi^{ij}$.
There is a certain ambiguity in defining the constraint operator.
Therefore, one has to choose a convention.
The condition that the Hamiltonian, $\hat{\mathcal{H}}$, consisting
of gravitational and matter components, annihilates
the state is expressed as


$$\hat{\mathcal{H}}\Psi = 0 \qquad [23]$$


When we adopt the coordinate representation for $\pi^{ij}$,
the above equation takes the form


$$\left[-16\pi G\,\hbar^2\,G_{ijkl}\frac{\delta^2}{\delta h_{ij}\,\delta h_{kl}} - \frac{\sqrt{h}}{16\pi G}\left({}^3R - 2\Lambda\right) + \mathcal{H}^{\rm matter}\right]\Psi[h_{ij}, \phi] = 0 \qquad [24]$$


This is the celebrated WDW equation. Here we have
considered a simple case where the matter Hamiltonian
density generically contains a single scalar field, $\phi$,
and therefore $\Psi$ is a functional of the 3-metric on $\Sigma_t$ and
$\phi$. $G_{ijkl}$ is the De Witt metric in the superspace:


$$G_{ijkl} = \frac{1}{2\sqrt{h}}\left(h_{ik}h_{jl} + h_{il}h_{jk} - h_{ij}h_{kl}\right) \qquad [25]$$
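
A short numerical illustration, added here as a sketch and not part of the original article, shows that the De Witt metric in the standard form $G_{ijkl} = (h_{ik}h_{jl} + h_{il}h_{jk} - h_{ij}h_{kl})/(2\sqrt{h})$ is indefinite: contracted twice with a pure-trace symmetric tensor it is negative, while traceless directions give positive values. This indefiniteness is the usual reason the WDW equation is described as hyperbolic. The function names and the flat sample metric are assumptions made for the example.

```python
import math


def det3(h):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (h[0][0] * (h[1][1] * h[2][2] - h[1][2] * h[2][1])
            - h[0][1] * (h[1][0] * h[2][2] - h[1][2] * h[2][0])
            + h[0][2] * (h[1][0] * h[2][1] - h[1][1] * h[2][0]))


def dewitt_metric(h, i, j, k, l):
    """Component G_ijkl of the De Witt metric for the 3-metric h."""
    return (h[i][k] * h[j][l] + h[i][l] * h[j][k] - h[i][j] * h[k][l]) / (
        2.0 * math.sqrt(det3(h)))


def dewitt_norm(h, p):
    """G_ijkl p^ij p^kl for a symmetric tensor p^ij."""
    idx = range(3)
    return sum(dewitt_metric(h, i, j, k, l) * p[i][j] * p[k][l]
               for i in idx for j in idx for k in idx for l in idx)


if __name__ == "__main__":
    flat = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    pure_trace = flat
    traceless = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 0.0]]
    print("pure trace:", dewitt_norm(flat, pure_trace))
    print("traceless: ", dewitt_norm(flat, traceless))
```

For the flat sample metric the pure-trace direction evaluates to $(3 + 3 - 9)/2$, which is negative, so one direction in the space of symmetric tensors at each point is "timelike".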


Remarks The space of all 3-metrics and scalar
field configurations, $(h_{ij}, \phi)$, on $\Sigma_t$, used for the description of classical
evolution, is called the superspace (no connection
with the superspace of supersymmetry). Thus,
$\Psi[h_{ij}, \phi]$ is a functional on superspace. Furthermore,
$\Psi$ carries no explicit dependence on t. This is a
consequence of the fact that “time” plays the role of
a parameter in the general theory of relativity; thus
the dynamical variables $h_{ij}$ and $\phi$ already describe the
evolutionary processes although t does not make its
appearance. As mentioned earlier, we always discuss
the case when $\Sigma_t$ is compact. Another point to note
is that the quantum momentum constraint, $\hat{\mathcal{H}}_i$, as an
operator annihilates the wave function, which is a
statement of the quantum-mechanical invariance of
the theory under three-dimensional diffeomorphisms.
The WDW equation, in turn, conveys invariance
of the theory under time reparametrization,
although careful analysis is necessary to prove this
point. Now we proceed to discuss the solutions of
the WDW equation.


WDW Equation and the Solutions 


It is recognized that the WDW equation [24] is a
second-order hyperbolic functional differential equation
and naturally it has an enormous number of
solutions. Therefore, if we want the WDW equation
to have any predictive power, it is necessary to
introduce boundary conditions. One possible
choice is to specify the wave function on the
boundary of the superspace. Indeed, the central
issue of quantum cosmology is about the choice of 
various boundary conditions which has been an 
important topic of debates. This point will be briefly 
discussed later. Notice that the boundary condition 
has to be introduced keeping in mind how the 
universe is expected to behave as it evolves. There is 
a proposition that the boundary condition for the 
quantum evolution of the universe be given the 
status of a physical law. Therefore, the role of the
wave functional, $\Psi[h_{ij}(x), \phi(x), B]$, its evolution, and
interpretation are central to the development of
quantum cosmology. Thus, $\Psi$ represents the amplitude
for the universe to have metric $h_{ij}(x)$ on the 3-surface,
B, and matter field $\phi(x)$. It is argued that the
path-integral formalism should be adopted as an alternative
to the canonical prescription to solve for the
wave function, or rather the transition amplitude,
satisfying the WDW equation. Here the first step is
to define the Euclidean version of the gravitational 
action keeping in mind the subtleties. As is well 
known, we deal with propagator (or transition 
amplitude) in the path-integral approach where the 
functional integral is carried out over a set of 
4-metrics and matter fields with Euclidean action 
inside the integral acting as the weight factor. We 
recall that while formulating quantum mechanics in 
the path-integral approach, we sum over all possible 
paths in the functional integral. However, in the
semiclassical approximation, the amplitude is dominated
by the action corresponding to the classical
path and we approximate the wave function as
$\psi \sim e^{iS/\hbar}$, with S the classical action; this expression
gets modified appropriately in the
Euclidean formulation. With this background, we
briefly discuss how the wave function of the
universe is obtained in the path-integral formalism.

According to the proposal of Hartle and Hawking,
one adopts the path-integral formalism for the Euclidean
action, where the functional integral is not only
carried out over the 4-metric, $g_{\mu\nu}$, and the scalar
field $\phi$, but one also sums over a class of
manifolds, M. Note that B is a part of the boundary
of this set of manifolds. If $\bar{h}_{ij}$ and $\bar{\phi}$ are the induced
metric and the configuration of the scalar field, $\phi$,
on the boundary, B, then the propagator (henceforth
we just call it the wave function) $\Psi[\bar{h}_{ij}, \bar{\phi}, B]$ can be
given a functional-integral representation. Indeed,
obtaining the most general form of the path integral,
summing over the 4-manifolds, is quite a formidable
task. On the other hand, if one chooses a class of
4-manifolds which can be decomposed as a product
(foliation) R x B, then, with Euclidean action $I[g_{\mu\nu}, \phi]$,
the wave function is expressed as


$$\Psi[\bar{h}_{ij}, \bar{\phi}, B] = \int DN^\mu \int Dh_{ij}\,D\phi\; f(N^\mu)\,\Delta_{FP}\,e^{-I[g_{\mu\nu}, \phi]} \qquad [26]$$


We have introduced the gauge-fixing condition 
as f(N“), which is usually taken to be N” =|" and 
then the corresponding Faddeev—Popov determinant, 
Arp, has to be inserted into the path-integral 
measure. We recall from our earlier discussions
that $N^\mu$ has to be unrestricted on the boundary, B,
since they have no dynamical role when we express
the action in terms of the variables defined on the
3-surface. As noted in the previous discussion,
explicit time dependence does not appear after the
3 + 1 split and $(h_{ij}(x), \phi(x))$ have no dependence on
t. Therefore, we introduce a parameter, $\tau$, to designate
the paths over which the functional integral is to be
taken. Recall that in the quantum-mechanical case,
the paths are parametrized as $q_i(t)$ for the coordinates.
However, when we resort to a parametrization
of the variables for the case at hand, certain
conditions must be fulfilled. We are permitted to
integrate $h_{ij}$ and $\phi$ over only those paths,
parametrized as $(h_{ij}(x, \tau), \phi(x, \tau))$, that
match the arguments of the wave function on the
boundary B. Therefore, we may define the metric
and the scalar field configuration so that at $\tau = 1$
they assume their functional values on the boundary:
in other words, $\bar{h}_{ij}(x) = h_{ij}(x, \tau = 1)$ and
$\bar{\phi}(x) = \phi(x, \tau = 1)$. It is worthwhile to go back to
the quantum-mechanical analogy once more. When
we compute amplitudes/propagators in quantum
mechanics, the functional integral is defined for the
amplitude of going from a configuration $q_i$ to $q_f$
while summing over all possible paths originating
from one endpoint $q_i$ and ending at the final point
$q_f$. On this occasion, we have imposed the constraint
on the final endpoint belonging to the
boundary B. Thus, in order to determine the wave
function of the universe, we are required to specify
the initial configurations of $h_{ij}$ and $\phi$ at $\tau = 0$. We
shall not enter into important issues related to the
properties of the Euclidean action, the problems
associated with the choice of contours of the path
integrals, and related topics. The reader will find
detailed discussions in the lectures and monographs
referred to in the “Further reading” section.

It is important to re-emphasize that boundary 
conditions are to be introduced while solving the 
WDW equation. It was argued by De Witt that the
wave function would be determined uniquely from the
mathematical consistency of the theory, but that
hope has not been realized. Whether one attempts to
solve the functional differential WDW equation or
to obtain the wave function in the path-integral
formalism, the issue of boundary condition is
unavoidable. There are mainly three different kinds 
of boundary conditions in quantum cosmology: 
Hartle-Hawking (HH) no-boundary proposal, 
Vilenkin’s tunneling mechanism, and Linde’s bound- 
ary condition. We shall briefly discuss the first two 
proposals. Instead of stating the boundary condi- 
tions in full generality, we shall envisage quantum 
cosmology in a minisuperspace and provide illus- 
trative examples to compare the main features of 
HH and Vilenkin solutions to the WDW equation. 

It is realized that the discussion and solutions of 
quantum cosmology in the superspace is rather 
difficult, since we deal with functional differential 
equations and the configuration space is infinite 
dimensional. Therefore, it is worthwhile to consider 
a system, as a simple model, which has finite degrees 
of freedom. Thus, we assume that the metric and 
matter fields depend only on cosmic time to begin 
with. There is a physical motivation behind this 
assumption, since the present classical state of the
universe is described by the Friedmann-Robertson-
Walker (FRW) metric corresponding to an isotropic
and homogeneous universe. Notice that the classical 
evolution equation resembles that of the motion of a 
particle. The quantum evolution equations are now 
given by differential equations of quantum 
mechanics rather than functional differential equations.
Similarly, the path-integral formulation
becomes analogous to the quantum-mechanical
framework. Of course, adopting such a simplified
approach prevents us from describing some of the
important aspects of quantum gravity. However,
within this framework, several essential features can
be exhibited and deep insight might be gained into
the physics of the very early universe. The first step
in getting the minisuperspace metric is to assume
that the lapse is homogeneous, that is, $N^\perp = N^\perp(t)$,
and the shift is set to zero, $N^i = 0$. Thus, the metric
takes the form


$$ds^2 = -(N^\perp(t))^2 dt^2 + h_{ij}(x, t)\,dx^i dx^j \qquad [27]$$


The relevant choice of 3-metric for the isotropic
and homogeneous FRW universe is


$$h_{ij}(x, t)\,dx^i dx^j = a^2(t)\,d\Omega_3^2 \qquad [28]$$


Note that $d\Omega_3^2$ is the metric on a 3-sphere. It is
straightforward to derive the Friedmann equations
for such a geometry.

The HH no-boundary condition can be interpreted
as a topological proposition about the set of
paths over which we have to sum. The 3-surface B is
to be taken as the only boundary of a compact
4-manifold M which is endowed with the metric
$g_{\mu\nu}$, and $h_{ij}$ and $\phi$ are the induced metric and the
scalar field on the surface. The wave function is
obtained by using the matching condition supplemented
with an initial condition. For the minisuperspace
case, initial conditions impose constraints on
the scale factor $a(\tau = 0)$ and $(da/d\tau)(\tau = 0)$, and $N^\perp$
is to be gauge fixed. These conditions are to be
implemented in the context of determining the wave
function of the universe. In the case of the tunneling 
boundary condition of Vilenkin, the qualitative 
scenario is as follows. If we look at the solution 
to the WDW equation (in the path-integral 
approach, Vilenkin considers Lorentzian action), 
the solution, crudely speaking, has both ingoing 
and outgoing modes at the boundary. In his 
proposal, the outgoing mode at the boundary is to 
be accepted. The exact prescription is a lot more
subtle than the above statement, since one has to 
define the meaning of outgoing mode carefully in 
the absence of a timelike Killing vector when we 
write the WDW equation on the superspace. The 
qualitative picture for Vilenkin’s boundary condi- 
tion, in the minisuperspace, is like tunneling solu- 
tions in quantum mechanics when a particle 
penetrates through a potential barrier. 

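Let us consider a minisuperspace model with the
scalar field and potential $V(\phi)$. The action is


$$S = \frac{1}{2}\int dt\,N^\perp a^3\left[-\frac{1}{(N^\perp)^2}\left(\frac{\dot{a}}{a}\right)^2 + \frac{\dot{\phi}^2}{(N^\perp)^2} - V(\phi) + \frac{1}{a^2}\right] \qquad [29]$$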





A few comments are in order here. For the FRW
metric, we have $\sqrt{-g}\,R = 6(-a\dot{a}^2 + ka)$ + a total
derivative term; the total derivative term can be
removed by adding a boundary term, and k is
positive since we take the spatial part to be closed.
We have redefined the scale factor, the scalar
field, the potential term, and k such that the
Einstein-Hilbert action with matter field assumes
the form of [29]; this action facilitates the
definition of conjugate momenta without cumbersome
numerical factors, and the Hamiltonian takes
a simple form. The conjugate momenta and resulting
Hamiltonian are


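$$\pi_a = -\frac{a\dot{a}}{N^\perp}, \quad \pi_\phi = \frac{a^3\dot{\phi}}{N^\perp} \qquad [30]$$

$$H = \frac{N^\perp}{2}\left[-\frac{\pi_a^2}{a} + \frac{\pi_\phi^2}{a^3} + a^3V(\phi) - a\right] \equiv N^\perp\mathcal{H} \qquad [31]$$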





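and the constraint is $\mathcal{H} = 0$. In the quantum
cosmology context, we solve the WDW equation:
$\hat{\mathcal{H}}\Psi = 0$. Since an exact solution is not possible, one
resorts to approximations with simple assumptions.
The differential equation is


$$\left[\frac{\partial^2}{\partial a^2} - \frac{1}{a^2}\frac{\partial^2}{\partial\phi^2} + a^4V(\phi) - a^2\right]\Psi = 0 \qquad [32]$$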
Let us consider the case when $V(\phi)$ does not grow 
very fast, that is, $|V'(\phi)/V(\phi)| \ll 1$, and consider the 
solution to the WDW equation where $\Psi$ has weak 
dependence on $\phi$. Consequently, we may ignore the 
$\phi$-derivative term in [32]. The purpose of these 
assumptions is to reduce the problem to a one- 
dimensional quantum mechanics problem and then 
employ the WKB method. It is hoped that at least some 
of the nonperturbative aspects can still be captured. 
When the effective potential appearing in [32] is 
negative (this is a classically inaccessible region), the 
wave function is 


$$\Psi(a,\phi) \propto \exp\left\{\pm\frac{1}{3V(\phi)}\left[1 - a^2V(\phi)\right]^{3/2}\right\}$$ [33]


We expect the wave function to have oscillatory 
behavior in the classically allowed domain, and it 
does have that property: 

$$\Psi(a,\phi) \propto \exp\left\{\pm\frac{i}{3V(\phi)}\left[a^2V(\phi) - 1\right]^{3/2}\right\}$$ [34]


The choice of signs is decided from the boundary 
conditions imposed and the usual matching of 
the wave functions of the two regions is done as is 
the case with the WKB approximation. Note that we 
are considering the metric and the scalar field on 
the boundary, which were denoted by $\bar{h}_{ij}$ and $\bar{\phi}$; 
strictly speaking, we should denote the solutions 
as $\bar{a}$ and $\bar{\phi}$. But from now on, we drop this bar on 
$a$ and $\phi$. 
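
The exponent in [33] can be checked numerically. The sketch below assumes that, once the $\phi$-derivative term is dropped, [32] reads $\Psi'' = U(a)\Psi$ with $U(a) = a^2(1 - a^2V)$ for a constant $V$, so that the WKB exponent under the barrier is the integral of $\sqrt{U}$ from $a$ to the turning point $a = 1/\sqrt{V}$. The function names and the sample values of $a$ and $V$ are illustrative choices, not taken from the text.

```python
import math

def barrier(a, V):
    """Effective potential U(a) = a^2 (1 - a^2 V) of the reduced equation."""
    return a * a * (1.0 - a * a * V)

def wkb_exponent_numeric(a, V, steps=20000):
    """Midpoint-rule integral of sqrt(U) from a up to the turning point 1/sqrt(V)."""
    a_turn = 1.0 / math.sqrt(V)
    h = (a_turn - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        # max(..., 0) guards against tiny negative round-off near the turning point
        total += math.sqrt(max(barrier(x, V), 0.0)) * h
    return total

def wkb_exponent_closed(a, V):
    """Closed form appearing in [33]: (1 - a^2 V)^(3/2) / (3 V)."""
    return (1.0 - a * a * V) ** 1.5 / (3.0 * V)

if __name__ == "__main__":
    print(wkb_exponent_numeric(0.3, 0.5), wkb_exponent_closed(0.3, 0.5))
```

The two numbers should agree to several digits, which is the statement that the exponent of [33] is the under-barrier WKB integral.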

Let us momentarily assume that $V$ is $\phi$-independent 
and therefore we have an effective cosmological 
constant. The problem is identical to the motion 
of a particle in a potential well. There are two 
turning points. In one region, the particle starts from 
$a = 0$, reaches one turning point $r_1$ and returns back. 
In another case, it starts from $a = \infty$, travels up to 
$a = r_2$ and reflects back. In the quantum-mechanical 
case, the particle can tunnel through the barrier. The 
wave function has both decaying and growing 
modes under the barrier, and boundary conditions 
tell us which mode to choose. One possibility is that 
the particle starts from $a = 0$, tunnels through and 
proceeds towards $a = \infty$, that is, it has only an outgoing 
mode. The other possibility is that the wave function 
has both outgoing and ingoing modes. In this simple 
scenario, the former corresponds to Vilenkin's 
tunneling boundary condition, where the universe 
is created at $a = 0$ and keeps growing. The latter is 
the HH no-boundary proposal, where the wave function 
has both modes and the universe contracts and 
expands. 

Now we discuss the two boundary conditions in 
the presence of the potential, with the approxima- 
tions mentioned above. The proposition of Vilenkin 
amounts to the following conditions on the wave 
function: the region of the boundary which is 
nonsingular is $\phi$ finite and $a = 0$. Other than this 
domain, either $a$ or $\phi$ diverges on any other region of 
the boundary; both can diverge on this singular 
boundary. Notice from the expressions [33] and 
[34] that the tunneling region corresponds to 
$a^2V(\phi) < 1$, whereas the oscillatory domain is 
$a^2V(\phi) > 1$. If we use the saddle-point approximation, 
$\Psi \sim e^{\pm iS_a}$, Vilenkin's boundary condition 
corresponds to $\Psi \sim e^{-iS_a}$, with 

$$S_a = \frac{\left[a^2V(\phi) - 1\right]^{3/2}}{3V(\phi)}$$


So far, we considered the situation where the differential 
operator for $\phi$ is dropped in [32]. In order to 
account for weak $\phi$-dependence, we could introduce 
it by multiplying by a slowly varying function, say $F(\phi)$, 
and write $\Psi(a,\phi) \sim F(\phi)e^{-iS_a}$. Similarly, the wave 
function can be obtained under the barrier and 
required to satisfy WKB matching conditions. 
Furthermore, the regularity condition on the wave 
function in the small scale factor limit and the behavior of 
its derivative with respect to $\phi$ in that limit 
determine the form of $F(\phi)$. In summary, the 
Vilenkin boundary conditions yield the following 
wave functions: 

$$\Psi_V(a,\phi) \approx \exp\left\{-\frac{1}{3V(\phi)}\left[1 - \left(1 - a^2V(\phi)\right)^{3/2}\right]\right\}$$ [35]

$$\Psi_V(a,\phi) \approx e^{-1/[3V(\phi)]}\exp\left\{-\frac{i}{3V(\phi)}\left[a^2V(\phi) - 1\right]^{3/2}\right\}$$ [36]


Note that [35] is the wave function under the barrier, 
that is, $a^2V(\phi) < 1$ in this region, whereas [36] is in 
the classically accessible domain ($a^2V(\phi) > 1$), which 
is reflected by the oscillatory character. The slowly 
varying function $F(\phi) \sim e^{-1/[3V(\phi)]}$ appears as the 
common factor for the wave functions in the two 
domains. 

The HH no-boundary proposal to derive the wave 
function of the universe was formulated in the 
Euclidean path-integral formalism. A considerable 
amount of attention has been focused in this area. 
We shall present the HH wave function providing 
only a sketchy argument. In the Euclidean description, 
the 4-metric is $ds^2 = N_\tau^2\,d\tau^2 + a^2(\tau)\,d\Omega_3^2$. The 
4-geometry should close in a regular way. If we 
make the bounding 3-space smaller and smaller, it 
can be closed with flat space. We can infer the 
behavior of the scale factor in the limit $\tau \to 0$ from 
this consideration. Furthermore, in the semiclassical 
approximation $\Psi(a,\phi) \sim e^{-S_E}$; we have replaced 
$(\bar{a},\bar{\phi})$ by $(a,\phi)$ as remarked earlier. Thus, our aim 
is to evaluate $S_E$ at the saddle point. This is achieved 
by writing down the (Euclidean version of the) field 
equations for $a$ and $\phi$ and the Hamiltonian 
constraint, and then solving for $a(\tau)$, $\phi(\tau)$, and $N_\tau(\tau)$. 
Eventually, we want to eliminate $N_\tau$ and then 
obtain $S_E$. After all, the path integral is dominated 
by the classical trajectory, $a(\tau)$, and one does not fix 
the gauge for $N_\tau$ while solving for $a$. In fact, the 
lapse gets eliminated by utilizing the Hamiltonian 
constraint, which involves $\tau$-derivatives of both $a$ and 
$\phi$. We mention, without going into details, that the 
classical action is not unique. One of the ways to 
visualize it is to note that the solutions obtained for 
the lapse from the Hamiltonian constraint have sign 
ambiguities. 

The classical action is 


$$S_E = -\frac{1}{3V(\phi)}\left[1 \pm \left(1 - a^2V(\phi)\right)^{3/2}\right]$$ [37]


Note that the two solutions correspond to 3-sphere 
boundary being closed off by sections of 4-sphere. 
Moreover, the Euclidean action is negative. Hartle 
and Hawking argue that the negative sign in [37] 
gives the correct answer since the wave function 
peaks for that choice. However, there is no unanimity 
on the HH argument, and some authors have put 
forward the point of view that additional inputs are 
necessary to arrive at the HH conclusion about 
choosing the negative sign for $S_E$ in [37]. We refer the 
reader to the reviews of Hartle and Halliwell for 
detailed discussions on the choice of contours for 
path integrals, subtleties involved in getting various 
solutions for the lapse and their interpretations. We 
give below the wave function under the barrier (with 
choice of negative sign in [37]): 


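$$\Psi_{HH}(a,\phi) \propto \exp\left\{\frac{1}{3V(\phi)}\left[1 - \left(1 - a^2V(\phi)\right)^{3/2}\right]\right\}$$ [38]

$$\Psi_{HH}(a,\phi) \sim e^{1/[3V(\phi)]}\cos\left(\frac{\left[a^2V(\phi) - 1\right]^{3/2}}{3V(\phi)} - \frac{\pi}{4}\right)$$ [39]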





Remarks The wave function in [38] is obtained in 
the classically inaccessible region under the condition 
$a^2V(\phi) < 1$, and wave function [39] corresponds to 
the case $a^2V(\phi) > 1$, where the particle motion is 
permissible classically. Note the factor $e^{1/[3V(\phi)]}$ in the 
wave functions in both regions, and compare it 
with Vilenkin's wave function, where the exponent has the 
opposite sign. We may conclude where the wave 
function will peak for each of the two boundary 
conditions. Whereas Vilenkin's proposal implies that 
$\Psi_V(a,\phi)$ peaks when $V(\phi)$ takes large values, the HH 
no-boundary condition tells us that it peaks when 
$V(\phi) \to 0$. Furthermore, we note that $\Psi_V$ is complex 
and $\Psi_{HH}$ is real in the oscillatory region. Although 
the debates on the merits and demerits of each of the 
boundary proposals have been going on for more than two 
decades, the issue is far from being settled. In the 
absence of any experimental tests, there is no way to 
favor one boundary proposal over another. Still, the 
boundary conditions do make predictions about the 
evolution of the universe after the quantum era, 
that is, in the (classical) regime. Therefore, 
determination of the wave function with specific 
boundary conditions does have some connections 
with the laws that govern the evolution of our 
universe in the present epoch. 
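
The opposite preferences of the two proposals can be made concrete with a small sketch. It assumes only the prefactors quoted above: in the oscillatory region $|\Psi_V|^2 = e^{-2/[3V]}$ from [36], while the envelope of $|\Psi_{HH}|^2$ in [39] is $e^{+2/[3V]}$. The function names and the sample values of $V$ are illustrative.

```python
import math

def vilenkin_weight(V):
    """|Psi_V|^2 in the oscillatory region, from the prefactor of [36]."""
    return math.exp(-2.0 / (3.0 * V))

def hartle_hawking_envelope(V):
    """Envelope of |Psi_HH|^2 in the oscillatory region, from the prefactor of [39]."""
    return math.exp(2.0 / (3.0 * V))

if __name__ == "__main__":
    for V in (0.5, 1.0, 2.0):
        print(V, vilenkin_weight(V), hartle_hawking_envelope(V))
```

The first weight increases with $V$ and the second decreases, in line with the statement that the tunneling wave function favors large $V(\phi)$ and the no-boundary wave function favors small $V(\phi)$.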


It is worthwhile to dwell on the WDW equation 
from the perspectives of string theories. Indeed, there 
have been important developments to understand the 
dynamics of the universe in the string-theoretic 
framework. It is important to note the key role 
played by the dilaton in string theory: (1) it is one of the 
massless states of the theory, and (2) the vacuum 
expectation value (VEV) of this field determines the 
coupling constants we hope to use in describing 
fundamental interactions. Therefore, the graviton is 
always accompanied by the dilaton in any string-theoretic 
approach to the study of the universe. The duality 
symmetries are recognized to provide a deep 
understanding of string dynamics. Therefore, the 
investigations of quantum gravity phenomena from 
the string-theory viewpoint are necessarily influenced 
by the above-mentioned facts. Indeed, classical 
cosmological solutions, derived from the string effective action, 
have several interesting characteristics. We mention in 
passing that the WDW equation has played an 
important role in the study of quantum evolution equations 
in string cosmology. The choice of operator-ordering 
prescription in defining the WDW Laplace—Beltrami 
operator can be resolved by appealing to the duality 
symmetries. Furthermore, the boundary conditions 
imposed on the wave function are dictated by string 
symmetries and therefore, the resulting wave function 
has very interesting properties. The string theory has 
addressed some of the most important problems in 
quantum gravity and it has provided resolutions to 
several key issues. It is expected that string theory 
will provide answers to challenging questions in 
quantum cosmology. In summary, we have conveyed 
some of the salient aspects of the WDW equation. 
The canonical quantization technique is adopted to 
study quantum gravity in this approach. We have 
illustrated the crucial role of the constraint formalism 
due to Dirac and argued that some of the nonpertur- 
bative aspects of quantum gravity could be retained. 
In a short article of this nature, it is not possible to 
provide detailed discussion about the general deriva- 
tion of the WDW equation and discuss the role of 
boundary conditions more exhaustively. Instead, we 
presented some of the key steps in the derivation of 
the WDW equation adopting the canonical formalism 
and provided simple examples. The subject is still an 
active area of research. The interested reader may 
benefit from the bibliography. 


See also: Canonical General Relativity; Loop Quantum 
Gravity; Quantum Cosmology; Quantum Dynamics in 
Loop Quantum Gravity; Quantum Geometry and its 
Applications; Superstring Theories. 
Introduction 


Historically, the first question in which Wulff shapes 
appeared is that of the formation of a droplet 
or a crystal of one substance inside another. The 
natural problem here is: what shape would such a 
formation take? The statement that such a shape should 
be defined by the minimum of the overall surface 
energy subject to the volume constraint is physically 
very natural. In the isotropic case, when the surface 
tension does not depend on the orientation of the 
surface, and so is just a positive number, the shape in 
question should be of course spherical (provided we 
neglect the gravitational effects). In a more general 
situation the shape in question is less symmetric. The 
corresponding variational problem is called the Wulff 
problem. Wulff (1901) formulated it in his paper, 
where he also presented a geometric solution to it, 
called the “Wulff construction.” 

The Wulff variational problem is formulated as 
follows. Let $\tau(n)$, $n \in S^{d-1}$, be some continuous 
function on the unit sphere $S^{d-1} \subset \mathbb{R}^d$. We suppose 
that $\tau > 0$, and that $\tau$ is even: $\tau(n) = \tau(-n)$. The value 
$\tau(n)$ plays the role of the surface tension between two 
phases separated by the hyperplane orthogonal to the 
vector $n$. For every closed compact (hyper)surface 
$M^{d-1} \subset \mathbb{R}^d$, we define its surface energy as 

$$W_\tau(M) = \int_M \tau(n_s)\,ds$$

where $n_s$ is the normal vector to $M$ at $s \in M$. The 
functional $W_\tau(M)$ has the meaning of the surface 
energy of the $M$-shaped droplet made from one of 
these two phases. It is called the Wulff functional. 
Let $\mathcal{W}_\tau$ be the surface which minimizes $W_\tau(\cdot)$ over 
all the surfaces enclosing the unit volume. Such a 
minimizer does exist and is unique up to translation. 
It is called the Wulff shape. 

The following is the geometric construction of 
$\mathcal{W}_\tau$. Consider the set 

$$K_\tau = \{x \in \mathbb{R}^d : \forall n \in S^{d-1},\ (x,n) \le \tau(n)\}$$

If we define the half-spaces 

$$L_{\tau,n} = \{x \in \mathbb{R}^d : (x,n) \le \tau(n)\}$$

then 

$$K_\tau = \bigcap_n L_{\tau,n}$$ [1]

In particular, $K_\tau$ is convex. It turns out that 

$$\mathcal{W}_\tau = \lambda_\tau\,\partial(K_\tau)$$

where the dilatation factor $\lambda_\tau$ is defined by the 
normalization $\mathrm{vol}(\lambda_\tau K_\tau) = 1$. The relation [1] is 
called the Wulff construction. For future use, 
we introduce the notation $w_\tau$ for the value of the 
surface energy of the Wulff shape: 

$$w_\tau = W_\tau(\mathcal{W}_\tau)$$
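
The construction [1] can be sketched numerically in $d = 2$. The surface tension $\tau(n) = |n_x| + |n_y|$ used below is an assumed example (it is positive and even, as required), and the function names and discretization parameters are illustrative. The code intersects finitely many half-planes by computing, for each direction $u$, the largest $r$ with $ru \in K_\tau$, then obtains the area of $K_\tau$ and the dilatation factor $\lambda_\tau$.

```python
import math

def tau(nx, ny):
    """Assumed example surface tension: tau(n) = |n_x| + |n_y|."""
    return abs(nx) + abs(ny)

def radial_extent(theta, n_dirs=360):
    """Largest r such that r*u lies in K_tau, with u = (cos theta, sin theta).

    r*u satisfies (x, n) <= tau(n) exactly when r <= tau(n) / (u, n)
    for every sampled normal n with (u, n) > 0.
    """
    ux, uy = math.cos(theta), math.sin(theta)
    best = math.inf
    for k in range(n_dirs):
        phi = 2.0 * math.pi * k / n_dirs
        nx, ny = math.cos(phi), math.sin(phi)
        dot = ux * nx + uy * ny
        if dot > 1e-9:
            best = min(best, tau(nx, ny) / dot)
    return best

def wulff_area(n_angles=720):
    """Area of K_tau from the polar formula (1/2) * integral of r(theta)^2 d(theta)."""
    d_theta = 2.0 * math.pi / n_angles
    total = sum(radial_extent((j + 0.5) * d_theta) ** 2 for j in range(n_angles))
    return 0.5 * total * d_theta

if __name__ == "__main__":
    area = wulff_area()
    lam = area ** -0.5  # in d = 2, vol(lam * K_tau) = lam^2 * area = 1
    print(area, lam)
```

For this choice of $\tau$ the set $K_\tau$ is the square $[-1,1]^2$, so the area should come out close to 4 and $\lambda_\tau$ close to 1/2; the Wulff shape is then the boundary of a unit square.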


The Wulff construction was considered by the 
rigorous statistical mechanics as just a phenomeno- 
logical statement, though the notion of the surface 
tension was among its central notions. The situation 
changed after the appearance of the book by 
Dobrushin et al. (1992). There it was shown that 
in the setting of the canonical ensemble formalism, 
in the regime of the first-order phase transition, the 
(random) shape occupied by one of the phases has 
asymptotically (in the thermodynamic limit) a 
nonrandom shape, given precisely by the Wulff 
construction! In other words, a typical macroscopic 
random droplet looks very close to the Wulff shape. 

In what follows we will explain the above result. 
Another important application of the concepts 
introduced above — the role played by the Wulff 


shapes in the theory of metastability — is also 
described (see Metastable States). 


Crystals in the Ising Model 


Ising spins $\sigma_x$ take values $\pm 1$, with $x \in \mathbb{Z}^d$. We will 
wrap $\mathbb{Z}^d$ into a torus $T_N^d$ by considering a factor lattice: 
$T_N^d = \mathbb{Z}^d/N\mathbb{Z}^d$. The Ising-model grand canonical Gibbs 
state in $T_N^d$ is the probability measure $\mu_N^\beta$: 

$$\mu_N^\beta(\sigma) = Z_{N,\beta}^{-1}\exp(-\beta H_N(\sigma))$$

Here $H_N(\sigma) = -\sum_{x,y\ \mathrm{n.n.},\ x,y \in T_N^d}\sigma_x\sigma_y$, $\beta > 0$ is the 
inverse temperature, and $Z_{N,\beta}$ is the normalization 
factor. The Ising-model canonical Gibbs state $\mu_N^{\beta,\rho}$ 
is the probability measure obtained from $\mu_N^\beta$ by 
taking its conditional distribution: 

$$\mu_N^{\beta,\rho}(\cdot) = \mu_N^\beta\Bigl(\,\cdot\ \Big|\ \sum_{x \in T_N^d}\sigma_x = \rho N^d\Bigr), \qquad |\rho| < 1$$

(Here we make a slight abuse of notation. More 
precisely, since $\sigma_x = \pm 1$, one has to consider 
the conditioning $\sum_x \sigma_x = \rho_N N^d$, where $\rho_N \to \rho$ as 
$N \to \infty$, while the numbers $(1 - \rho_N)N^d$ are even 
integers; otherwise the condition is empty.) We will 
characterize the canonical state $\mu_N^{\beta,\rho}$ by describing the 
properties of contours, $\{\gamma_i(\sigma)\}$, of configuration $\sigma$. 
Contours $\gamma_i$ of configuration $\sigma$ are hypersurfaces 
made of elementary $(d-1)$-dimensional unit cubes of 
the dual lattice, which separate the nearest-neighbor 
(n.n.) points $x, y \in T_N^d$, where $\sigma_x \neq \sigma_y$. 
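
The definitions above can be made concrete for $d = 2$. The sketch below computes $H_N(\sigma)$ on the $N \times N$ torus and the total contour length, that is, the number of nearest-neighbor pairs with $\sigma_x \neq \sigma_y$; on this torus there are $2N^2$ bonds, so $H_N = -2N^2 + 2\,(\text{contour length})$. The square droplet configuration and all function names are assumptions made for illustration.

```python
def energy(sigma):
    """H_N(sigma) = -(sum of sigma_x * sigma_y over n.n. pairs) on the N x N torus."""
    n = len(sigma)
    total = 0
    for i in range(n):
        for j in range(n):
            s = sigma[i][j]
            # each bond is counted once: right and down neighbors, with wrap-around
            total += s * sigma[i][(j + 1) % n] + s * sigma[(i + 1) % n][j]
    return -total

def contour_length(sigma):
    """Number of n.n. pairs with unequal spins (unit segments of the dual lattice)."""
    n = len(sigma)
    count = 0
    for i in range(n):
        for j in range(n):
            s = sigma[i][j]
            count += (s != sigma[i][(j + 1) % n]) + (s != sigma[(i + 1) % n][j])
    return count

def square_droplet(n, k):
    """Illustrative configuration: all spins +1 except a k x k block of -1 spins."""
    return [[-1 if (i < k and j < k) else 1 for j in range(n)] for i in range(n)]

def total_spin(sigma):
    return sum(sum(row) for row in sigma)

if __name__ == "__main__":
    sigma = square_droplet(8, 3)
    print(energy(sigma), contour_length(sigma), total_spin(sigma))
```

For the $3 \times 3$ droplet in an $8 \times 8$ torus the single contour has length 12, and $N^d - \sum_x \sigma_x = 18$ is even, as the parity remark above requires.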

Suppose that the temperature $\beta^{-1}$ is low enough, 
while the density parameter $\rho$ satisfies the constraints: 

$$m^*(\beta) > \rho > g_d$$

Here $m^*(\beta)$ is the spontaneous magnetization of the 
$d$-dimensional Ising model, while $g_d$ is some geometric 
factor, the role of which will be explained 
later. The above constraint forces some amount of 
the (—)-phase into the (+)-phase. It turns out that 
this amount gathers into one big droplet, which has 
approximately the Wulff shape. 

We first formulate the known rigorous results for 
the case d=2, and then indicate some extensions. 


Two-Dimensional Case 


The following holds with $\mu_N^{\beta,\rho}$-probability 
approaching 1 as $N \to \infty$: 

• The set $\{\gamma_i(\sigma)\}$ of contours of $\sigma$ has precisely one 
"big" contour, $\Gamma(\sigma)$; the diameters of other 
contours do not exceed $K\ln N$, $K = K(\beta)$. 

• The area $|\mathrm{Int}\,\Gamma(\sigma)|$ inside $\Gamma(\sigma)$ satisfies 


| Int P (o)| — vN*| < KN” (In N)“ 
where 


• There is a point $x = x(\sigma)$, the "center" of $\Gamma(\sigma)$, 
such that the shift of $\Gamma(\sigma)$ by $-x(\sigma)$ brings the 
contour $\Gamma(\sigma)$ very close to the scaled Wulff curve, 
defined by the Ising-model surface tension $\tau$: 


rH ro - x(a), nw) < KN? (ln N)* [2] 


(Here $r_H$ is the Hausdorff distance: for every two 
sets $A, C \subset \mathbb{R}^d$, $r_H(A, C)$ is defined as $\max\{\inf[r : 
A \subset C + B_r],\ \inf[r : C \subset A + B_r]\}$, where $B_r$ is the 
ball of radius $r$.) 
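
For finite point sets the Hausdorff distance just defined reduces to a max-min computation, since $\inf[r : A \subset C + B_r]$ is the largest distance from a point of $A$ to the set $C$. The following is a minimal sketch under that finite-set assumption; the function names and sample points are illustrative.

```python
import math

def one_sided(P, Q):
    """Smallest r with P contained in Q + B_r, for finite point sets P and Q."""
    return max(min(math.dist(p, q) for q in Q) for p in P)

def hausdorff(A, C):
    """Hausdorff distance between two finite point sets."""
    return max(one_sided(A, C), one_sided(C, A))

if __name__ == "__main__":
    A = [(0.0, 0.0), (1.0, 0.0)]
    C = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]
    print(one_sided(A, C), one_sided(C, A), hausdorff(A, C))
```

The example shows why both one-sided quantities are needed: every point of $A$ lies in $C$, yet the extra point of $C$ is at distance 3 from $A$.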


The proof of the above result is the content of 
the book by Dobrushin et al. (1992). In the 
two-dimensional case, it remains true for all 
temperatures $\beta^{-1}$ below the critical one (Ioffe and 
Schonmann 1998). The value 2/3 of the exponent is 
an improvement of the original 3/4 result 
(Alexander 1992). Probably, it can be further 
improved down to 1/2. Though Dobrushin et al. 
(1992) treat only the Ising model, their results are 
valid for a wide range of other models. 

The restriction $\rho > g_d$ in the theorem is needed 
because without it the droplet may prefer to assume 
the shape of a strip between two meridians rather 
than to take the Wulff shape. 


Three-Dimensional Case 


In the case $d \ge 3$, the statement that a 
typical configuration $\sigma$ has only one big contour 
$\Gamma(\sigma)$ is still true. But the analog of [2] is not known. 
It is natural to conjecture that it holds at low 
temperatures, even in a stronger version, with only a 
logarithmic term K(In N)“ in the RHS. It probably 
fails at higher subcritical temperatures. 

What is known to hold is a weaker version of this 
theorem, where the distance between the random 
droplet and the Wulff shape is measured not in the 
Hausdorff distance, but in the $L^1$ sense. To state the 
corresponding theorem, we will associate with every 
configuration $\sigma$ on a discrete torus $T_N^d$ a real-valued 
function $M_\sigma(t)$ on the unit torus $T^d = \mathbb{R}^d/\mathbb{Z}^d$, 
and we then compare this function with the 
indicator function $\mathbb{I}_{sK_\tau}$, where $sK_\tau \subset T^d$ is the Wulff 
body, properly scaled. 

The function $M_\sigma(t)$ is defined as follows. We 
denote by $i_N$ the natural embedding of the discrete 
torus $T_N^d$ into $T^d$, the image of $i_N$ being the grid with 
spacing $1/N$. For $t \in T^d$ we define $b_N(t) \subset T^d$ to be 
the ball centered at $t$ with radius $\sqrt{1/N}$, and let 
$B_N(t) \subset T_N^d$ be its preimage under $i_N$. Then 

$$M_\sigma(t) = \frac{1}{|B_N(t)|}\sum_{x \in B_N(t)}\sigma_x$$





We have to expect to see a droplet sK, with 


_4 d m3(b) =P 
w, 2m*(3) 


Let us introduce for every subset AC T? the 
indicator 


1, tEA 
lad) = = te AS 


For every function $v$ in $L^1(T^d)$ we denote by $U(v, \delta)$ 
its $\delta$-neighborhood in $L^1(T^d)$. 

The result can now be formulated. Suppose the 
temperature $\beta^{-1}$ is below the critical one. Then the 
function $M_\sigma(t)$ is close to the characteristic function 
of the Wulff shape: For every $\delta > 0$ 





$$\lim_{N\to\infty}\mu_N^{\beta,\rho}\Bigl\{\frac{1}{m^*(\beta)}M_\sigma(\cdot) \in \bigcup_{t \in T^d}U(\mathbb{I}_{sK_\tau + t};\delta)\Bigr\} = 1$$

The shifts by all $t$-s of the Wulff shape $sK_\tau$ appear 
in the statement since the location of the droplet can 
be arbitrary. Note that if a point $t$ is such that the 
ball $B_N(t)$ stays away from the boundary of the 
droplet $\Gamma(\sigma)$ present in the configuration $\sigma$, then the 
value $M_\sigma(t)$ should be expected to be $\pm m^*(\beta)$, 
depending on whether $t$ is outside or inside the 
droplet, which explains the factor $1/m^*(\beta)$. 

For a proof, see Bodineau (1999) and Cerf and 
Pisztora (1999). 


See also: Cluster Expansion; Large Deviations in 
Equilibrium Statistical Mechanics; Metastable States; 
Percolation Theory; Statistical Mechanics of Interfaces. 
Introduction 


The term Yang—Baxter equations (YBEs) was coined 
by Faddeev in the late 1970s to denote a principle of 
integrability, that is, exact solvability, in a wide 
variety of fields in physics and mathematics. Since 
then it has become a common name for several 
classes of local equivalence transformations in 
statistical mechanics, quantum field theory, differ- 
ential equations, knot theory, quantum groups, and 
other disciplines. We shall cover the various versions 
and their relationships, paying attention also to the 
early historical development. 


Electric Networks 


The first such transformation came up as early as 
1899 when the Brooklyn engineer Kennelly pub- 
lished a short paper, entitled “The equivalence of 
triangles and three-pointed stars in conducting net- 
works.” This work gave the definite answer to such 
questions as whether it is better to have the three 
coils in a dynamo - or three resistors in a network — 
arranged as a star or as a triangle, see Figure 1. 
Using Kirchhoff’s laws, the two situations in Figure 1 
can be shown to be equivalent provided 


$$Z_1\bar{Z}_1 = Z_2\bar{Z}_2 = Z_3\bar{Z}_3$$
$$= Z_1Z_2 + Z_2Z_3 + Z_3Z_1$$ [1]

$$= \bar{Z}_1\bar{Z}_2\bar{Z}_3/(\bar{Z}_1 + \bar{Z}_2 + \bar{Z}_3)$$ [2]


Here one has to take either [1] or [2] as second line 
of the equation, depending on which direction the 
transformation is to go. The star-triangle transfor- 
mation thus defined is also known under other 
names within the electric network theory literature 
as wye-delta ($Y - \Delta$), upsilon-delta ($\Upsilon - \Delta$), or 
tau-pi ($T - \Pi$) transformation. 
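
The equivalence can be verified directly by comparing the impedance seen between each pair of terminals. The sketch below assumes the convention that $Z_i$ is the star branch attached to terminal $i$ and $\bar{Z}_i$ is the triangle side opposite terminal $i$ (Figure 1 is not reproduced here, so this labeling is an assumption); the function names are illustrative.

```python
def star_to_triangle(z1, z2, z3):
    """Triangle impedances from star impedances: Zbar_i = (Z1 Z2 + Z2 Z3 + Z3 Z1) / Z_i."""
    s = z1 * z2 + z2 * z3 + z3 * z1
    return s / z1, s / z2, s / z3

def triangle_to_star(zb1, zb2, zb3):
    """Star impedances from triangle impedances: Z_i = Zbar_1 Zbar_2 Zbar_3 / (sum * Zbar_i)."""
    t = zb1 * zb2 * zb3 / (zb1 + zb2 + zb3)
    return t / zb1, t / zb2, t / zb3

def star_terminal_impedances(z1, z2, z3):
    """Impedance between terminal pairs (2,3), (3,1), (1,2) of the star."""
    return z2 + z3, z3 + z1, z1 + z2

def triangle_terminal_impedances(zb1, zb2, zb3):
    """Same pairs for the triangle: one side in parallel with the other two in series."""
    total = zb1 + zb2 + zb3
    return (zb1 * (zb2 + zb3) / total,
            zb2 * (zb3 + zb1) / total,
            zb3 * (zb1 + zb2) / total)

if __name__ == "__main__":
    star = (1.0, 2.0, 3.0)
    triangle = star_to_triangle(*star)
    print(triangle)
    print(star_terminal_impedances(*star))
    print(triangle_terminal_impedances(*triangle))
```

The two lists of terminal impedances coincide, and applying `triangle_to_star` to the result returns the original star, which illustrates the two directions of the transformation.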


Spin Models 


When Onsager wrote his monumental paper on the 
Ising model published in 1944, he made a brief 
remark on an obvious star-triangle transformation 
relating the model on the honeycomb lattice with 
the one on the triangular lattice. His details on this 
were first presented in Wannier’s review article of 
1945. However, the star-triangle transformation 
played a much more crucial role in Onsager’s 
reasoning, as it is also intimately connected with 
his elliptic function uniformizing parametrization. 

Furthermore, it implies the commutation of 
transfer matrices and spin-chain Hamiltonians. 
Only in his Battelle lecture of 1970 did Onsager 
explain how he used this remarkable observation in 
his derivation of the formula for the spontaneous 
magnetization which he had announced as a 
conference remark in 1948 and of which the first 
complete derivation had been published by Yang in 
1952 using a completely different method. 

Many other applications and generalizations have 
since appeared. Most generally, we can consider a 
system whose state variables — also called spins — take 
values from some suitable discrete or continuous sets. 
The interactions between spins a and b are given in 
terms of weight factors $W_{ab}$ and $\bar{W}_{ab}$, which are 
complex numbers in general, see Figure 2. One 
quantity of special interest is the partition function, the 
sum of the product of all weight factors over all 
allowed spin values. The integrability of the model is 
expressed by the existence of spectral variables 
(rapidities $p, q, r, \ldots$) that live on oriented lines, two 
of which cross between a and b as indicated by the 
dotted lines in Figure 2. Arrows from a to b are added 
to keep track of the ordering of a and b in case the 
weights are chiral (not symmetric). 

In Onsager’s special choice of the Ising model the 
spins take values a,b,c,... = +1 and the weight 
factors are the usual real positive Boltzmann weights 
depending on the product ab= +1, uniformizing 
variable p — q, and elliptic modulus k. In the integra- 
ble chiral Potts model the weights depend on a — b 
mod N, with a,b=1,...,N, whereas the rapidities p 
and q are living in general on a higher-genus curve. 
Figure 2 Spin model weights $W_{ab}(p, q)$ and $\bar{W}_{ab}(p, q)$. 
When the weights are asymmetric in the spins, there 


are two sets of star—triangle equations which can be 
expressed both pictorially (Figure 3) and algebraically: 


S| Walp, 4) Wala, t) Walp") 
d 


= R(p, 9,7) Wald, q4) Weald, r) W 6 (p, 7) [3] 


R(p, q, r) Wab (p, q) Wac(q, r)Wo(P, r) 


= Wye (P q) Woalq; 1) Waa(D; 1) [4] 
d 


Note that eqns [3] and [4] differ from each other by the 
transposition of both spin variables in all six weight 





Figure 3 Star—triangle equation. 


factors. In general, there may also appear scalar factors 
$R(p,q,r)$ and $\bar{R}(p,q,r)$, which can often be eliminated 
by a suitable renormalization of the weights. If $a$, $b$, 
and $c$ take values in the same set, we can sum over 
$a = b = c$, showing that $R = \bar{R}$ in that case. 

The Kennelly star—triangle equation [1], [2] can be 
recovered as a special limit of a spin model where 
the states are continuous variables. 
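
For the symmetric Ising case the star-triangle relation can be checked by direct enumeration. The sketch below assumes the simplest nonchiral setting, with star weights $e^{L_i\, d\, s_i}$ joining the central spin $d$ to the outer spins and triangle weights $e^{K_1 bc + K_2 ca + K_3 ab}$; the coupling names $L_i$, $K_i$ and the function names are illustrative and not taken from the text. Since both sides are invariant under flipping all three outer spins, four independent equations fix $K_1$, $K_2$, $K_3$ and the scalar factor $R$.

```python
import math
from itertools import product

def triangle_couplings(L1, L2, L3):
    """Solve sum_d exp(d (L1 a + L2 b + L3 c)) = R exp(K1 bc + K2 ca + K3 ab)."""
    c0 = math.log(2.0 * math.cosh(L1 + L2 + L3))
    c1 = math.log(2.0 * math.cosh(-L1 + L2 + L3))
    c2 = math.log(2.0 * math.cosh(L1 - L2 + L3))
    c3 = math.log(2.0 * math.cosh(L1 + L2 - L3))
    K1 = (c0 + c1 - c2 - c3) / 4.0
    K2 = (c0 - c1 + c2 - c3) / 4.0
    K3 = (c0 - c1 - c2 + c3) / 4.0
    R = math.exp((c0 + c1 + c2 + c3) / 4.0)
    return K1, K2, K3, R

def star_weight(L, a, b, c):
    """Sum over the central spin d = +1, -1 of the three star weights."""
    return sum(math.exp(d * (L[0] * a + L[1] * b + L[2] * c)) for d in (1, -1))

def triangle_weight(K, R, a, b, c):
    return R * math.exp(K[0] * b * c + K[1] * c * a + K[2] * a * b)

if __name__ == "__main__":
    L = (0.3, 0.5, 0.8)
    K1, K2, K3, R = triangle_couplings(*L)
    for a, b, c in product((1, -1), repeat=3):
        print(a, b, c, star_weight(L, a, b, c), triangle_weight((K1, K2, K3), R, a, b, c))
```

The two columns agree for all eight spin configurations, which is the honeycomb-to-triangular lattice transformation in its most elementary form.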


Knot Theory and Braid Group 


A seemingly totally different situation occurs in the 
theory of knots, links, tangles, and braids. In 1926, 
Reidemeister showed that only three types of moves 
suffice to show the equivalence between two 
different configurations, see Figure 4. Moves of 
type I — removing simple loops — do not apply to 
braids. Moves of type II, for which one strand crosses 
twice over another strand, can be reformulated for 
braids, namely that an overcrossing is the inverse of 
an undercrossing. The Reidemeister move of type III 
is a precursor of the more general Yang—Baxter 
moves and can be represented also by the defining 
relations of Artin's braid group. Let $R_{j,j+1}$ be the 
operator representing the situation in which the 
strand in position $j$ crosses over the one in position 
$j+1$. Then a braid can be represented by a product 
of $R_{j,j+1}$'s and their inverses, provided 

$$R_{j,j+1}R_{j+1,j+2}R_{j,j+1} = R_{j+1,j+2}R_{j,j+1}R_{j+1,j+2}$$ [5]

and 

$$[R_{i,i+1}, R_{j,j+1}] = 0, \quad \text{if } |i - j| \ge 2$$ [6]

and similar relations in which $R_{j,j+1}$ and/or $R_{j+1,j+2}$ 
are replaced by their inverses. 
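
Relations [5] and [6] can be checked in the simplest representation, in which $R_{j,j+1}$ merely exchanges the strands in positions $j$ and $j+1$. This is an assumed illustrative choice: it forgets the difference between overcrossings and undercrossings, so each operator is its own inverse, but it satisfies the same defining relations. The function names are illustrative and positions are 0-based.

```python
from itertools import permutations

def swap(j):
    """Map exchanging positions j and j+1 of a tuple (0-based j)."""
    def apply(state):
        s = list(state)
        s[j], s[j + 1] = s[j + 1], s[j]
        return tuple(s)
    return apply

def compose(*maps):
    """Operator product: the rightmost map acts first."""
    def apply(state):
        for m in reversed(maps):
            state = m(state)
        return state
    return apply

def same_map(f, g, n):
    """Compare two maps on all arrangements of n labeled strands."""
    return all(f(p) == g(p) for p in permutations(range(n)))

if __name__ == "__main__":
    n = 4
    print(same_map(compose(swap(0), swap(1), swap(0)),
                   compose(swap(1), swap(0), swap(1)), n))   # relation [5]
    print(same_map(compose(swap(0), swap(2)),
                   compose(swap(2), swap(0)), n))            # relation [6]
    print(same_map(compose(swap(0), swap(1)),
                   compose(swap(1), swap(0)), n))            # neighbors do not commute
```

The last check shows that the condition $|i - j| \ge 2$ in [6] cannot be dropped.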


Factorizable S-Matrices and Bethe Ansatz 


In the early 1960s, Lieb and Liniger solved the one- 
dimensional Bose gas with delta-function interaction 
using the Bethe ansatz. Yang and McGuire then tried 
to generalize this result to systems with internal 
degrees of freedom and to fermions. This led to the 
Figure 4 Reidemeister moves of types I, II, and III. 





Figure 5 Vertex model YBE. 


discovery of the condition for factorizable S-matrices 
by McGuire in 1964, represented pictorially by 
Figure 5, where the world lines of the particles are 
given. Upon collisions the particles can only exchange 
their rapidities $p, q, r$, so that there is no dispersion. 
Also indicated are the internal degrees of freedom in 
Greek letters. In other words, the three-body S-matrix 
can be factorized in terms of two-body contributions 
and the order of the collisions does not affect the 
final outcome. McGuire also realized that this 
condition is all one needs for the consistency of 
factoring the $n$-body S-matrix in terms of two-body 
S-matrices. The consistency condition is obviously 
related to the Reidemeister move of type III in 
Figure 4. 

Yang succeeded in solving the spin-1/2 fermionic 
model using a nested Bethe ansatz, utilizing a 
generalization of Artin's braid relations [5] and [6], 

$$R_{j,j+1}(p-q)\,R_{j+1,j+2}(p-r)\,R_{j,j+1}(q-r) = R_{j+1,j+2}(q-r)\,R_{j,j+1}(p-r)\,R_{j+1,j+2}(p-q)$$ [7]

He submitted his findings in two short papers in 
1967. The $R$ operators in eqn [7] (a notation 
introduced later by the Leningrad school) depend 
on differences of two momenta or two relativistic 
rapidities. Sutherland solved the general spin case 
using repeated nested Bethe ansätze, while Lieb and 
Wu used Yang's work to solve the one-dimensional 
Hubbard model. 
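
A rapidity-dependent relation of the form [7], with operators acting on neighboring sites $j, j+1$ and depending on rapidity differences, can be tested numerically. The sketch below uses the operator $R_{j,j+1}(u) = 1 + u\,P_{j,j+1}$ on three spin-1/2 sites, where $P_{j,j+1}$ exchanges the two neighboring sites. This rational form is an assumed illustrative solution, chosen because it is the simplest one, and is not quoted from the text; all function names are illustrative and plain nested lists are used instead of a matrix library.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def perm_operator(j):
    """8x8 matrix exchanging sites j and j+1 (j = 0 or 1) of three spin-1/2 sites."""
    P = [[0.0] * 8 for _ in range(8)]
    for state in range(8):
        bits = [(state >> (2 - k)) & 1 for k in range(3)]
        bits[j], bits[j + 1] = bits[j + 1], bits[j]
        image = bits[0] * 4 + bits[1] * 2 + bits[2]
        P[image][state] = 1.0
    return P

def r_operator(j, u):
    """Assumed example: R_{j,j+1}(u) = 1 + u * P_{j,j+1}."""
    P = perm_operator(j)
    one = identity(8)
    return [[one[a][b] + u * P[a][b] for b in range(8)] for a in range(8)]

def ybe_defect(p, q, r):
    """Largest entry of (left side minus right side) of the relation [7]."""
    lhs = matmul(matmul(r_operator(0, p - q), r_operator(1, p - r)), r_operator(0, q - r))
    rhs = matmul(matmul(r_operator(1, q - r), r_operator(0, p - r)), r_operator(1, p - q))
    return max(abs(lhs[a][b] - rhs[a][b]) for a in range(8) for b in range(8))

if __name__ == "__main__":
    print(ybe_defect(0.7, -0.2, 1.3))
```

The defect vanishes up to rounding for arbitrary $p$, $q$, $r$. At equal arguments the operator no longer satisfies the rapidity-free relation [5] in general, which is why the dependence on differences matters.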


Vertex Models 


Since Lieb’s solution of the ice model by a Bethe 
ansatz, there have been many developments on 
vertex models, in which the state variables live on 
line segments and weight factors $\omega^{\lambda\beta}_{\alpha\mu}$ are assigned to 
a vertex where four line segments with the four 
states $\alpha$, $\mu$, $\lambda$, $\beta$ on them meet, see Figure 6. 

Baxter solved the eight-vertex model in 1971, using 
a method based on commuting transfer matrices, 
starting from a solution of what he then called the 
generalized star-triangle equation, but what is now 
commonly called the Yang—Baxter equation (YBE): 
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Figure 6 Vertex model weight wa? (p, q), mixed model weight 
W325 p,q) and IRF model weight wee (p, q). 
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=5 5 > o na a (pr) [8] 
This equation is represented graphically in Figure 5. 
From it one can also derive a sufficient condition for 
the commutation of transfer matrices and spin-chain 
Hamiltonians, generalizing the work of McCoy and 
Wu, who had earlier initiated the search by showing 
that the general six-vertex model transfer matrix 
commutes with a Heisenberg spin-chain Hamilto- 
nian. To be more precise, Baxter found that if 
wa? = 6267 for some choice of p and q, some spin- 
chain Hamiltonians could be derived as logarithmic 
derivatives of the transfer matrix. 


Interaction-Round-a-Face Model 


Baxter introduced another language, namely that of the 
IRF or "interaction-round-a-face" model, in 
connection with his solution of the hard-hexagon 
model. This formulation is convenient when 
studying one-point functions using the corner-transfer- 
matrix method. Now the integrability condition can be 
represented graphically as in Figure 7 or algebraically as 


N wf (p, qi’ la, r wip, r) 
d 


=Y wp, awii la rwo) 9 
d' 


The spins live on faces enclosed by rapidity lines and 
the weights w? (p,q) are assigned as in Figure 6. 





Figure 7 IRF model YBE. 

Baxter discovered a new principle based on eqns [8] 
and [9], which he called Z-invariance, as it expresses 
an invariance of the partition function Z under moves 
of rapidity lines. This also implies that typical one- 
point functions are independent of the values of the 
rapidities, while two-point functions can only depend 
on the values of the rapidities of rapidity lines crossing 
between the two spins considered. Many recent results 
on correlation functions in integrable models depend 
on this observation of Baxter. 


IRF-Vertex Model 


In Figure 6, we have also To mixed IRF-vertex 
model vn wasje P ab(p»q4). (We could put further 
state variables on ‘the wth but then the natural 
thing to do is to introduce new effective weights 
summing over the states at each vertex.) With the 
choice made a more general YBE can be represented 
as in Figure 8, or by 


i Waa ley (P4) 


o" p g 
x WIZE (a, ae 7 A 
“SOLD wks a) 
o" p n 
x We galg r) Wow aps) — [10] 


Quantum Inverse-Scattering Method 


The Leningrad school of Faddeev incorporated the 
methods of Baxter and Yang in their so-called 
quantum inverse-scattering method (QISM), coining 
the term quantum YBEs (QYBEs) for eqns [8]. If 
special limiting values of $p$ and $q$ can be found, say as 
$\hbar \to 0$, such that the vertex weights take the form 
$\delta\,\delta + O(\hbar)$, that is, a product of two Kronecker deltas 
in the state labels plus a correction of order $\hbar$, one can reduce 
[8] to the classical Yang-Baxter equations (CYBEs) by 
expanding up to the first nontrivial order in the expansion 
variable $\hbar$. These determine the integrability of certain 
models of classical mechanics by the inverse-scattering 
method and the existence of Lax pairs. 

Figure 8 General YBE. 

Figure 9 Checkerboard versions of the weights. 


Checkerboard Generalizations 


Star-triangle equations [3] and [4] imply that there are 
further generalizations of the YBEs, namely those for 
which the faces enclosed by the rapidity lines are 
alternatingly colored black and white in a checkerboard 
pattern. We can then introduce either vertex model 
weights wa? (Pq ) W ane „(p,q), or IRF-vertex model 
weights wre (p,q and Wb, ), or IRF 
model weights er. ) and w% (p,q ), see Figure 9. 







The black faces are those where the spins of the 
spin model with weights defined in Figure 2 live; the 
white faces are to be considered empty in Figures 2 
and 3 (or, equivalently, they can be assumed to host 
trivial spins that take on only a single value). 
Clearly, the IRF-vertex model description contains 
all the other versions. 


Checkerboard Vertex Model 


First we oo the checkerboard vertex model 
with weights wô (D> ) and w? „(p,q) as assigned in 
Figure 9. The YBE [8] then a to two sets of 
equations: 


DD Mia’ P 


Jwan (q, D (p, r) 


al! Br ryt 
= 20.4. Dwg 
q! 0" a 
sT (ao (D1) [11] 


ACE e (p, r) 


Rp4,7) DD 0a (P, 


q" Bt ae 


DDI e 


(q,r) (r) [12] 
a! eu q" 


where scalar factors $R$ and $\bar{R}$ have been added as in 
[3] and [4]. These equations are represented graphi- 
cally by Figure 10. 





Figure 10 Checkerboard vertex model YBE. 
Checkerboard IRF Model 


The checkerboard IRF version of the YBE [8] 
becomes 


> wal (p, q) wie (q, r) pa (p, r) 


= R(p,q,r) X _ Tia P, Wya rwar) [13] 
d' 


R(p, 4,7) 2 Wey (P, We (1) Wal: r) 


= 2 wiel (p qawwi la rE or) 14 


again with scalar factors $R$ and $\bar R$ added as in [3] 
and [4]. These equations can now be represented 
graphically as in Figure 11. Note that these 
equations reduce to eqns [3] and [4] if the spins on 
the white faces are allowed to take only one value, 
which means that they can be ignored. 


Checkerboard IRF-Vertex Model 


Finally, the most general case is represented by the 
checkerboard IRF-vertex model, with weights 
defined in Figure 9. For this case the YBEs are 
given by 


22 d Waa leo (Pa) 


a! B A 
x wie wile (q,r 1) Wy, J£ (p, r) 


DDD DD DD 


a" Be q" 


x War Elar WE een) [15] 


R(p,q,r) LL Woa ley ky Pq) 


Q" p" nl 


a! 1 ‘de’ 
x Wi, q" a. r)W 4 bia(P r) 


Ae 
— = 2 2 >, Wo alaa q) 
a my 
x WL Ela NW Ripr) [16] 


with its graphical representation in Figure 12. 

Figure 12 Checkerboard YBE. 


Formal Equivalence of Languages 
The Square Weight 


Combining four weights of a checkerboard model in 
a square, as is done with four spin model weights 
in Figure 13, we find a regular vertex model weight 
with rapidities that are now pairs of the original 
ones. This gives 


Won (P1; 41) Wus(b1, 92) Wor(P2, 41) Wra(p2; q2) 
= wi (P1, P23 91,92) [17] 




Figure 13 Square weight as vertex weight. 


From any solution of [3] and [4] we can thus 
construct a solution of YBE [8]. This has been used 
by Bazhanov and Stroganov to relate the integrable 
chiral Potts model with a cyclic representation of the 
six-vertex model. 


Map to Checkerboard Vertex Model 


The checkerboard IRF-vertex model formulation 
contains all other versions mentioned above as 
special cases. However, collecting the state variables 
in triples, we can immediately translate it to a vertex 
model version, writing 


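$\omega^{\boldsymbol\lambda\boldsymbol\mu}_{\boldsymbol\alpha\boldsymbol\beta}(p,q) = \omega^{\lambda\mu}_{\alpha\beta}\big|^{dc}_{ab}(p,q), \quad \bar\omega^{\boldsymbol\lambda\boldsymbol\mu}_{\boldsymbol\alpha\boldsymbol\beta}(p,q) = \bar\omega^{\lambda\mu}_{\alpha\beta}\big|^{dc}_{ab}(p,q),$ 

if $\boldsymbol\lambda = (d,\lambda,c)$, $\boldsymbol\beta = (b,\beta,c)$, $\boldsymbol\alpha = (a,\alpha,d)$, $\boldsymbol\mu = (a,\mu,b)$, [18] 

$\omega^{\boldsymbol\lambda\boldsymbol\mu}_{\boldsymbol\alpha\boldsymbol\beta}(p,q) = \bar\omega^{\boldsymbol\lambda\boldsymbol\mu}_{\boldsymbol\alpha\boldsymbol\beta}(p,q) = 0$ otherwise [19] 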


In eqn [19], we have set to zero all vertex model weights 
that are inconsistent with IRF-vertex configurations. 
Clearly, the translation of IRF models and 
spin models to vertex models can be done similarly. 
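
The bookkeeping behind eqns [18] and [19] can be sketched in a few lines of Python. This is an illustration only: the function names and the argument order `(alpha, beta, lam, mu, a, b, c, d)` of the IRF-vertex weight are assumptions, and the toy weight at the end is invented purely to exercise the code. Each composite line state is a triple (face spin, line state, face spin), and a vertex weight is nonzero only when the four triples agree on the face spins they share.

```python
def as_vertex_weight(irf_vertex_weight):
    """Wrap an IRF-vertex weight as a vertex weight on composite (triple) states.

    Assumed reading of eqn [18]: alpha = (a, alpha, d), mu = (a, mu, b),
    lam = (d, lam, c), beta = (b, beta, c).
    """
    def weight(alpha_t, beta_t, lam_t, mu_t):
        a, alpha, d = alpha_t
        a2, mu, b = mu_t
        d2, lam, c = lam_t
        b2, beta, c2 = beta_t
        # Eqn [19]: triples that disagree on a shared face spin get weight zero.
        if (a, b, c, d) != (a2, b2, c2, d2):
            return 0
        return irf_vertex_weight(alpha, beta, lam, mu, a, b, c, d)
    return weight

# Hypothetical toy weight, used only to exercise the mapping.
def toy(alpha, beta, lam, mu, a, b, c, d):
    return 1 + a + 2 * b + 4 * c + 8 * d + 16 * int(alpha == lam)

w = as_vertex_weight(toy)
# Consistent face spins a=0, b=1, c=0, d=1: the toy weight is passed through.
print(w((0, 1, 1), (1, 0, 0), (1, 1, 0), (0, 0, 1)))
# The lam triple claims d=0 while the alpha triple claims d=1: weight 0.
print(w((0, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 1)))
```

The price of this translation is a larger state space per line, with most composite weights vanishing, which is why it is a formal rather than an economical equivalence.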


Map to Spin Model 


We can, furthermore, translate each vertex model 
with weights assigned as in Figures 6 or 9 into a spin 
model with weights as in Figure 2 by defining 
suitable spins in the black faces, after checkerboard 
coloring. Each spin is then defined to be the ordered 
set of states on the line segments of the vertex 
model, $a = (\alpha_1, \alpha_2, \ldots)$, ordering the line segments 
counterclockwise starting at, say, 12 o’clock. We 
can then identify $\omega^{\lambda\mu}_{\alpha\beta}(p,q) = W_{ab}(p,q)$, $\bar\omega^{\lambda\mu}_{\alpha\beta}(p,q) = \bar W_{ab}(p,q)$. 
This is surely not very economical, as 
many of the weights will be equal, but it helps show 
that all different versions of the checkerboard YBE 
are formally equivalent. 

Hence, we shall only use the vertex model 
language in the following. It is fairly straightforward 
to convert to the other formulations. 


An sl(m|n) Example 


One fundamental example is a Q-state model for 
which the rapidities have 2Q+1 components, 
$p = (p_{-Q}, \ldots, p_Q)$, $q = (q_{-Q}, \ldots, q_Q)$, etc., and the 
states on the line segments are arranged in strings 
of continuing conserved color. The vertex weights, 
for $\alpha, \beta, \lambda, \mu = 1, \ldots, Q$, are given by 


AB = are P+r4-6 
wga Pq) = Woay\P0s 40) p, [20] 
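with ($\rho \ne \sigma$) 

$\omega^{\rho\rho}_{\rho\rho}(p_0,q_0) = N \sinh[\eta + \varepsilon_\rho (p_0 - q_0)]$ 
$\omega^{\rho\sigma}_{\rho\sigma}(p_0,q_0) = N\, G_{\rho\sigma} \sinh(p_0 - q_0)$ [21] 
$\omega^{\sigma\rho}_{\rho\sigma}(p_0,q_0) = N\, e^{(p_0 - q_0)\,\mathrm{sign}(\rho - \sigma)} \sinh\eta$ 
$\omega^{\lambda\mu}_{\alpha\beta}(p_0,q_0) = 0$, otherwise 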


where N is an arbitrary overall normalization factor 
and $\eta$ is a constant. Furthermore, $\varepsilon_\rho = \pm 1$ for 
$\rho = 1, \ldots, Q$, where m of them equal +1 and n of 
them equal −1. The $G_{\rho\sigma}$’s are constants satisfying 
$G_{\rho\sigma} = 1/G_{\sigma\rho}$, a freedom that is allowed because the 
number of $\rho$-$\sigma$ crossings minus the number of $\sigma$-$\rho$ 
crossings is fixed by the states on the boundary only, 
that is, by the choice of $\alpha, \alpha', \beta, \beta', \gamma, \gamma'$ in YBE [8] and 
Figure 5. 

The solution [20], [21] has many applications. 
The case m=0, n=2 leads to the general six-vertex 
model; the m=0, n=n case produces the fundamental 
intertwiner of the affine quantum group $U_q\widehat{sl}(n)$, 
whereas the case m=2, n=1 corresponds to the 
supersymmetric one-dimensional t-J model. 
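
A numerical spot check of this family can be sketched as follows. The sketch makes several assumptions that are not taken from the text: it uses only the scalar rapidity difference u = p_0 - q_0 with N = 1, leaves out the extra rapidity-component factor of eqn [20], reads the weights of eqn [21] as matrix elements with (c, d) the incoming and (a, b) the outgoing colors, and tests the YBE in the operator form R_23 R_13 R_12 = R_12 R_13 R_23 on three lines. All function names are illustrative.

```python
import math
from itertools import product

def perk_schultz_weight(u, eta, eps, G):
    """Element <a,b|R(u)|c,d> built from the three nonzero classes of eqn [21], N = 1."""
    def w(a, b, c, d):
        if a == b == c == d:
            return math.sinh(eta + eps[a] * u)          # one color on both lines
        if (a, b) == (c, d):
            return G[c][d] * math.sinh(u)               # two colors pass straight through
        if (a, b) == (d, c):
            sign = 1 if c > d else -1
            return math.exp(sign * u) * math.sinh(eta)  # the two colors are exchanged
        return 0.0                                      # color is conserved
    return w

def embed(weight, i, j, Q, n=3):
    """Matrix of a two-line weight acting on lines i and j out of n."""
    states = list(product(range(Q), repeat=n))
    index = {s: k for k, s in enumerate(states)}
    M = [[0.0] * len(states) for _ in states]
    for s in states:                              # s: incoming states (column)
        for a, b in product(range(Q), repeat=2):  # (a, b): outgoing states on lines i, j
            t = list(s)
            t[i], t[j] = a, b
            M[index[tuple(t)]][index[s]] = weight(a, b, s[i], s[j])
    return M

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

def ybe_residual(eps, G, eta, p0):
    """Largest entry of R23 R13 R12 - R12 R13 R23 for three lines with rapidities p0."""
    Q = len(eps)
    R = lambda i, j: embed(perk_schultz_weight(p0[i] - p0[j], eta, eps, G), i, j, Q)
    lhs = matmul(matmul(R(1, 2), R(0, 2)), R(0, 1))
    rhs = matmul(matmul(R(0, 1), R(0, 2)), R(1, 2))
    return max(abs(x - y) for rl, rr in zip(lhs, rhs) for x, y in zip(rl, rr))

g = 1.7   # arbitrary constant, with G[0][1] * G[1][0] = 1
six_vertex = ybe_residual(eps=(-1, -1), G=[[1, g], [1 / g, 1]], eta=0.7, p0=(0.9, 0.3, -0.4))
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
sl21 = ybe_residual(eps=(1, 1, -1), G=ones, eta=0.7, p0=(0.9, 0.3, -0.4))
print(six_vertex < 1e-9, sl21 < 1e-9)
```

The first call corresponds to m=0, n=2 with a nontrivial choice of the constants G; the second to m=2, n=1. A residual at the level of floating-point rounding is a consistency check at the sampled rapidities, not a proof.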


Operator Formulations 
The R-Matrix 


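For a problem with N rapidity lines, carrying 
rapidities $p_1, \ldots, p_N$, we can introduce a set of 
matrices $R_{ij}(p_i,p_j)$, for $1 \le i < j \le N$, with elements 


$R_{ij}(p_i,p_j)^{\beta_1 \ldots \beta_N}_{\alpha_1 \ldots \alpha_N} = \omega^{\beta_i\beta_j}_{\alpha_i\alpha_j}(p_i,p_j) \prod_{k \ne i,j} \delta^{\beta_k}_{\alpha_k}$ [22] 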


In terms of these, the YBE [8] can be rewritten in 
matrix form as 


$R_{jk}(p_j,p_k)\, R_{ik}(p_i,p_k)\, R_{ij}(p_i,p_j) = R_{ij}(p_i,p_j)\, R_{ik}(p_i,p_k)\, R_{jk}(p_j,p_k)$ [23] 


where $1 \le i < j < k \le N$. 
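
As a concrete check of eqn [23], the following sketch verifies the matrix YBE for the rational R-matrix R(u) = u·1 + P (Yang's solution, with P the permutation of the states on two lines and u the rapidity difference). This solution is not discussed in the text above and is chosen here only because it is the simplest one of difference form; the helper names `yang_weight`, `embed`, `matmul`, and `ybe_holds` are illustrative assumptions. Integer rapidities keep the arithmetic exact.

```python
from itertools import product

def yang_weight(u):
    """Element <a,b|R(u)|c,d> of R(u) = u*1 + P, where P swaps the states of two lines."""
    return lambda a, b, c, d: u * int(a == c and b == d) + int(a == d and b == c)

def embed(weight, i, j, Q, n=3):
    """Matrix of a two-line weight acting on lines i and j out of n, with Q states per line."""
    states = list(product(range(Q), repeat=n))
    index = {s: k for k, s in enumerate(states)}
    M = [[0] * len(states) for _ in states]
    for s in states:                              # s: incoming states (column)
        for a, b in product(range(Q), repeat=2):  # (a, b): outgoing states on lines i, j
            t = list(s)
            t[i], t[j] = a, b
            M[index[tuple(t)]][index[s]] = weight(a, b, s[i], s[j])
    return M

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

def ybe_holds(p, Q=2):
    """Test eqn [23] for lines (i, j, k) = (0, 1, 2) carrying rapidities p[0], p[1], p[2]."""
    R = lambda i, j: embed(yang_weight(p[i] - p[j]), i, j, Q)
    lhs = matmul(matmul(R(1, 2), R(0, 2)), R(0, 1))   # R_jk R_ik R_ij
    rhs = matmul(matmul(R(0, 1), R(0, 2)), R(1, 2))   # R_ij R_ik R_jk
    return lhs == rhs

print(ybe_holds((5, 2, -1)))        # two states per line: True
print(ybe_holds((7, 3, 1), Q=3))    # three states per line: True
```

The equality relies on the three arguments being differences of the same three rapidities; feeding unrelated arguments into the three factors generally breaks it.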


The Ř-Matrix 


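If we transpose the indices $\beta_i$ and $\beta_j$ in eqn [22], 
we can define a set of matrices $\check R_{i,i+1}(p,q)$ with 
elements 


$\check R_{i,i+1}(p,q)^{\beta_1 \ldots \beta_N}_{\alpha_1 \ldots \alpha_N} = \omega^{\beta_{i+1}\beta_i}_{\alpha_i\alpha_{i+1}}(p,q) \prod_{k \ne i,i+1} \delta^{\beta_k}_{\alpha_k}$ [24] 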


Using these, the YBE [8] can be rewritten in matrix 
form as 


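$\check R_{i,i+1}(q,r)\, \check R_{i+1,i+2}(p,r)\, \check R_{i,i+1}(p,q) = \check R_{i+1,i+2}(p,q)\, \check R_{i,i+1}(p,r)\, \check R_{i+1,i+2}(q,r)$ [25] 


and 


$[\check R_{i,i+1}(p,q),\, \check R_{j,j+1}(r,s)] = 0$, if $|i-j| \ge 2$ [26] 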


In this formulation, it is clear that many solutions 
can be found by “Baxterizing” Temperley-Lieb and 
Iwahori-Hecke algebras. 
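
Eqns [25] and [26] can be tested in the same spirit. The sketch below assumes the rational example Ř(u) = 1 + uP, that is, the permutation P times the rational R-matrix R(u) = u·1 + P, evaluated at rapidity differences. This example and the helper names are assumptions made for illustration and do not appear in the text.

```python
from itertools import product

def braid_weight(u):
    """Element <a,b|Rcheck(u)|c,d> of Rcheck(u) = 1 + u*P."""
    return lambda a, b, c, d: int(a == c and b == d) + u * int(a == d and b == c)

def embed(weight, i, Q, n):
    """Matrix of a two-line weight acting on neighboring lines i, i+1 out of n."""
    states = list(product(range(Q), repeat=n))
    index = {s: k for k, s in enumerate(states)}
    M = [[0] * len(states) for _ in states]
    for s in states:
        for a, b in product(range(Q), repeat=2):
            t = list(s)
            t[i], t[i + 1] = a, b
            M[index[tuple(t)]][index[s]] = weight(a, b, s[i], s[i + 1])
    return M

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

def check_25(p, q, r, Q=2):
    """Eqn [25] on three lines, with Rcheck(x, y) taken as Rcheck(x - y)."""
    Rc = lambda i, x, y: embed(braid_weight(x - y), i, Q, 3)
    lhs = matmul(matmul(Rc(0, q, r), Rc(1, p, r)), Rc(0, p, q))
    rhs = matmul(matmul(Rc(1, p, q), Rc(0, p, r)), Rc(1, q, r))
    return lhs == rhs

def check_26(p, q, r, s, Q=2):
    """Eqn [26] on four lines: operators on lines (0, 1) and (2, 3) commute."""
    A = embed(braid_weight(p - q), 0, Q, 4)
    B = embed(braid_weight(r - s), 2, Q, 4)
    return matmul(A, B) == matmul(B, A)

print(check_25(5, 2, -1), check_26(5, 2, -1, 7))   # True True
```

In this form the operators only ever act on neighboring positions, which is what makes the comparison with braid-like algebras natural.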


Classical YBEs 


If we expand 


$R_{ij}(p_i,p_j) = 1 + \hbar\, X_{ij}(p_i,p_j) + O(\hbar^2)$ [27] 


in [23], we get in second order in $\hbar$ the classical YBE 
(CYBE) as the vanishing of a sum of three commutators, 
that is, 


$[X_{ij}(p_i,p_j), X_{ik}(p_i,p_k)] + [X_{ij}(p_i,p_j), X_{jk}(p_j,p_k)] + [X_{ik}(p_i,p_k), X_{jk}(p_j,p_k)] = 0$ [28] 


introduced by Belavin and Drinfel’d, where $X_{ij}$ is 
called the classical r-matrix. 
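
For a quick sanity check of eqn [28], one can take the rational classical r-matrix X(u) = P/u, with P the permutation of two lines and u the rapidity difference. It arises from R(u) = 1 + ħP/u, which is the rational R-matrix u·1 + ħP divided by u. The example and all helper names in the sketch below are assumptions for illustration. Exact fractions are used so that the cancellation among the three commutators is not obscured by rounding.

```python
from fractions import Fraction
from itertools import product

def r_weight(u):
    """Element <a,b|X(u)|c,d> of the rational classical r-matrix X(u) = P/u."""
    return lambda a, b, c, d: Fraction(int(a == d and b == c), u)

def embed(weight, i, j, Q, n=3):
    """Matrix of a two-line operator acting on lines i and j out of n."""
    states = list(product(range(Q), repeat=n))
    index = {s: k for k, s in enumerate(states)}
    M = [[0] * len(states) for _ in states]
    for s in states:
        for a, b in product(range(Q), repeat=2):
            t = list(s)
            t[i], t[j] = a, b
            M[index[tuple(t)]][index[s]] = weight(a, b, s[i], s[j])
    return M

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(AB, BA)]

def cybe_residual(p, Q=2):
    """Left-hand side of eqn [28] for lines (i, j, k) = (0, 1, 2) with distinct rapidities p."""
    X = lambda i, j: embed(r_weight(p[i] - p[j]), i, j, Q)
    terms = [commutator(X(0, 1), X(0, 2)),
             commutator(X(0, 1), X(1, 2)),
             commutator(X(0, 2), X(1, 2))]
    return [[sum(entries) for entries in zip(*rows)] for rows in zip(*terms)]

print(all(x == 0 for row in cybe_residual((5, 2, -1)) for x in row))   # True
```

Each of the three commutators is nonzero on its own; only their sum vanishes, and only because the three arguments are differences of the same three rapidities.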


Reflection YBEs 


Figure 14 Reflection YBE. 


Cherednik and Sklyanin found a condition determining 
the solvability of systems with boundaries, 
the reflection YBEs (RYBEs), see Figure 14. Upon 
collisions with a left or right wall the rapidity 
variable changes from $p$ to $\bar p$ and back. In most 
examples, in which the rapidities are difference 
variables such that $R(p,q) = R(p - q)$, one also has 
$\bar p = \mu - p$, with $\mu$ some constant. The corresponding 
left boundary weights are $K^{\beta}_{\alpha}(p,\bar p)$ satisfying 


$K_1(q,\bar q)\, \check R_{12}(\bar p,q)\, K_1(p,\bar p)\, \check R_{12}(\bar q,\bar p) = \check R_{12}(p,q)\, K_1(p,\bar p)\, \check R_{12}(\bar q,p)\, K_1(q,\bar q)$ [29] 


with $K_1(p,\bar p)$ defined by a direct product as in [24], 
appending unit matrices for positions $i \ge 2$, and a 
similar equation must hold for the right boundary. 
Most work has been done for vertex models, while 
Pearce and co-workers wrote several papers on the 
IRF-model version. 
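
A small worked instance may help fix the structure of the reflection equation. The sketch below rests on several assumptions that are not taken from the text: it reads eqn [29] as K_1(q) Ř(p̄ - q) K_1(p) Ř(q̄ - p̄) = Ř(p - q) K_1(p) Ř(q̄ - p) K_1(q) with p̄ = μ - p, it uses the rational example Ř(u) = 1 + uP on two-state lines, and it takes the diagonal boundary matrix K(p) = diag(t + c, t - c) with t = p - μ/2 and c a free constant. All names are illustrative.

```python
from fractions import Fraction
from functools import reduce
from itertools import product

STATES = list(product(range(2), repeat=2))   # two lines with two states each

def braid_matrix(u):
    """Rcheck(u) = 1 + u*P on two lines (P swaps the two states)."""
    return [[int(s == t) + u * int(s == t[::-1]) for t in STATES] for s in STATES]

def boundary_matrix(p, mu, c):
    """K_1 = K(p) (x) 1, with the assumed K(p) = diag(t + c, t - c), t = p - mu/2."""
    t = Fraction(2 * p - mu, 2)
    k = (t + c, t - c)
    return [[k[s[0]] * int(s == s2) for s2 in STATES] for s in STATES]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

def reflection_holds(p, q, mu, c):
    """Compare both sides of the assumed form of eqn [29]."""
    pbar, qbar = mu - p, mu - q
    Kp, Kq = boundary_matrix(p, mu, c), boundary_matrix(q, mu, c)
    lhs = reduce(matmul, [Kq, braid_matrix(pbar - q), Kp, braid_matrix(qbar - pbar)])
    rhs = reduce(matmul, [braid_matrix(p - q), Kp, braid_matrix(qbar - p), Kq])
    return lhs == rhs

print(reflection_holds(5, 2, mu=3, c=4))   # True
```

Setting c = 0 makes K proportional to the unit matrix, in which case the equation reduces to the statement that Ř(u) and Ř(v) commute for this example.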


Higher-Dimensional Generalizations 


In 1980 Zamolodchikov introduced a three- 
dimensional generalization of the YBE, the so-called 
tetrahedron equations (TEs), and he found a special 
solution. Baxter then succeeded in proving that 
this solution satisfies all TEs. Baxter and Bazhanov 
showed in 1992 that this solution can be seen as 
a special case of the $sl(\infty)$ chiral Potts model. 
Several authors found further generalizations more 
recently. 


Inversion Relations 


When $\omega^{\lambda\mu}_{\alpha\beta}(p,p) \propto \delta^{\mu}_{\alpha}\delta^{\lambda}_{\beta}$, that is, the weight decouples 
when the two rapidities are equal, one can derive the 
local inverse relation depicted in Figure 15, which is 
a generalization of the Reidemeister move of type II 
in Figure 4. It is easily shown that $C(q,p) = C(p,q)$. 
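
The local inversion relation can be illustrated with the rational example Ř(u) = 1 + uP, for which Ř(0) is the unit matrix, so the weight decouples at equal rapidities. This example, the operator form Ř(p - q) Ř(q - p) = C(p,q)·1 used for the relation, and the function names are assumptions of this sketch and are not taken from the text.

```python
from itertools import product

def braid_matrix(u, Q):
    """Matrix of Rcheck(u) = 1 + u*P on two lines with Q states each."""
    states = list(product(range(Q), repeat=2))
    return [[int(s == t) + u * int(s == t[::-1]) for t in states] for s in states]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

def inversion_constant(p, q, Q=2):
    """Return C if Rcheck(p - q) Rcheck(q - p) equals C times the unit matrix, else None."""
    prod_ = matmul(braid_matrix(p - q, Q), braid_matrix(q - p, Q))
    c = prod_[0][0]
    dim = len(prod_)
    scalar = all(prod_[i][j] == (c if i == j else 0)
                 for i in range(dim) for j in range(dim))
    return c if scalar else None

print(inversion_constant(5, 2))   # 1 - (5 - 2)**2 = -8
print(inversion_constant(2, 5))   # same value, so C(q, p) = C(p, q) here
```

For this example C(p,q) = 1 - (p - q)^2, which is manifestly symmetric in p and q.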

Figure 15 Local inversion relation. 


Figure 16 Heuristic derivation of inversion relation. 


This local relation also implies a global inversion 
relation, which can be found in many ways. The 
following heuristic way is the easiest: consider the 
situation in Figure 16, with N closed p-rapidity lines 
and M closed q-rapidity lines. For M and N large, 
we may expect the partition function of Figure 16 
to factor asymptotically in top- and bottom-half 
contributions. If each line segment carries a state 
variable that can assume Q values, then the total 
partition function factors by repeated application of 
the relation in Figure 15 into the contribution of 
M + N circles. Therefore, 


$Z = Q^{M+N}\, C(p,q)^{MN} \approx Z_{M,N}(p,q)\, Z_{N,M}(q,p)$ [30] 


Taking the thermodynamic limit, 


$z(p,q) = \lim_{M,N \to \infty} Z_{M,N}(p,q)^{1/MN}$ [31] 

one finds 

$z(p,q)\, z(q,p) = C(p,q)$ [32] 


In many models, eqn [32], supplemented with some 
suitable symmetry and analyticity conditions, can be 
used to calculate the free energy per site. 


See also: Affine Quantum Groups; Bethe Ansatz; 
Classical r-matrices, Lie Bialgebras, and Poisson Lie 
Groups; Eight Vertex and Hard Hexagon Models; Hopf 
Algebras and q-Deformation Quantum Groups; 
Integrability and Quantum Field Theory; Integrable 
Discrete Systems; Integrable Systems: Overview; The 
Jones Polynomial; Knot Invariants and Quantum Gravity; 
Knot Theory and Physics; Sine-Gordon Equation; 
Topological Knot Theory and Macroscopic Physics; 
Two-Dimensional Ising Model; von Neumann Algebras: 
Subfactor Theory. 